Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

Publication date

2021-03-11T11:21:57Z

2020-08-27

Abstract

Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs are extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than the scores reported in related work.

Document type

Article


Published version

Language

English

Published by

Public Library of Science (PLoS)

Related documents

Reproduction of the document published at: https://doi.org/10.1371/journal.pone.0237767

PLoS One, 2020, vol. 15, num. 8, p. e0237767

Recommended citation

This citation was generated automatically.

Rights

cc-by (c) Inurrieta, Uxoa et al., 2020

http://creativecommons.org/licenses/by/3.0/es

This item appears in the following collection(s)