Mining and exploiting domain-specific corpora in the PANACEA platform

To access the full text of the document, please follow this link: http://hdl.handle.net/10230/20416

Title:	Mining and exploiting domain-specific corpora in the PANACEA platform
Author(s):	Bel Rafecas, Núria; Prokopidis, Prokopis; Toral, Antonio; Arranz, Victoria; Papavassiliou, Vassilis
Abstract:	The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. To evaluate the CAC methodology extrinsically, we conducted several experiments in which sentence alignment was used to identify and extract parallel sentences from the crawled parallel corpora. The corpora were then successfully used for domain adaptation of Machine Translation systems.
Subject(s):	Web crawling; Boilerplate removal; Corpus acquisition; IPR for language resources
Rights:	© 2012 ELRA - European Language Resources Association. All rights reserved.
Document type:	Conference object; Article - Published version
Publisher:	ELRA (European Language Resources Association)

Related documents

Other documents by the same author

Language Resources Factory: case study on the acquisition of Translation Memories

Poch, Marc; Toral, Antonio; Bel Rafecas, Núria

Tradução online

Poch, Marc; Toral, Antonio; Hamon, Olivier; Quochi, Valeria; Bel Rafecas, Núria

PANACEA (Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies)

Bel Rafecas, Núria; Poch, Marc; Toral, Antonio

Towards a User-Friendly Platform for Building Language Resources based on Web Services

Poch, Marc; Hamon, Olivier; Quochi, Valeria; Bel Rafecas, Núria; Toral, Antonio

Constituency and Dependency Parsers Evaluation

Comelles Pujadas, Elisabet; Arranz, Victoria; Castellón Masalles, Irene
