On the impact of morphology in English to Spanish statistical MT

To access the full text documents, please follow this link: http://hdl.handle.net/2117/79198

dc.contributor	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.author	Gispert Ramis, Adrià de
dc.contributor.author	Mariño Acebal, José Bernardo
dc.date	2008-12-31
dc.identifier.citation	de Gispert, A., Mariño, J.B. On the impact of morphology in English to Spanish statistical MT. "Speech communication", 31 December 2008, vol. 50, no. 11, p. 1034-1046.
dc.identifier.citation	0167-6393
dc.identifier.citation	10.1016/j.specom.2008.05.003
dc.identifier.uri	http://hdl.handle.net/2117/79198
dc.language.iso	eng
dc.relation	http://www.sciencedirect.com/science/article/pii/S0167639308000769
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject	Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
dc.subject	Speech processing systems
dc.subject	Machine learning
dc.subject	Morphology generation
dc.subject	N-gram based translation
dc.subject	Statistical machine translation
dc.subject	Processament de la parla
dc.subject	Aprenentatge automàtic
dc.title	On the impact of morphology in English to Spanish statistical MT
dc.type	info:eu-repo/semantics/publishedVersion
dc.type	info:eu-repo/semantics/article
dc.description.abstract	This paper presents a thorough study of the impact of morphology derivation on N-gram-based Statistical Machine Translation (SMT) models from English into a morphology-rich language such as Spanish. For this purpose, we define a framework under the assumption that a certain degree of morphology-related information is not only being ignored by current statistical translation models, but also has a negative impact on their estimation due to the data sparseness it causes. Moreover, we describe how this information can be decoupled from the standard bilingual N-gram models and introduced separately by means of a well-defined and better-informed feature-based classification task. Results are presented for the European Parliament Plenary Sessions (EPPS) English → Spanish task, with oracle scores showing the extent to which SMT models can benefit from simplifying Spanish morphological surface forms for each Part-Of-Speech category. We show that verb form morphological richness greatly weakens the standard statistical models, and we carry out a posterior morphology classification by defining a simple set of features and applying machine learning techniques. In addition, we propose a simple technique to deal with Spanish enclitic pronouns. Both techniques are empirically evaluated, and final translation results show improvements over the baseline by just dealing with Spanish morphology. In principle, the study is also valid for translation from English into any other Romance language (Portuguese, Catalan, French, Galician, Italian, etc.). The proposed method can be applied to both monotonic and non-monotonic decoding scenarios, thus revealing the interaction between word-order decoding and the proposed morphology simplification techniques. Overall, the results show a statistically significant improvement over baseline performance in this demanding task.
dc.description.abstract	Peer Reviewed
