Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification

dc.contributor.author
Inurrieta, Uxoa
dc.contributor.author
Aduriz, Itziar
dc.contributor.author
Diaz de Ilarraza, Arantza
dc.contributor.author
Labaka, Gorka
dc.contributor.author
Sarasola, Kepa
dc.date.issued
2021-03-11T11:21:57Z
dc.date.issued
2020-08-27
dc.identifier
1932-6203
dc.identifier
https://hdl.handle.net/2445/174917
dc.identifier
703445
dc.identifier
32853283
dc.description.abstract
Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs are extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than the scores reported in related work.
dc.format
18 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Public Library of Science (PLoS)
dc.relation
Reproduction of the document published at: https://doi.org/10.1371/journal.pone.0237767
dc.relation
PLoS One, 2020, vol. 15, num. 8, p. e0237767
dc.relation
https://doi.org/10.1371/journal.pone.0237767
dc.rights
cc-by (c) Inurrieta, Uxoa et al., 2020
dc.rights
http://creativecommons.org/licenses/by/3.0/es
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject
Morfologia (Gramàtica)
dc.subject
Semàntica
dc.subject
Aprenentatge automàtic
dc.subject
Morphology (Grammar)
dc.subject
Semantics
dc.subject
Machine learning
dc.title
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion

