A Proposal for Wide-Coverage Spanish Named Entity Recognition

Publication date

2019-03-12T14:02:32Z

2002

Abstract

This paper presents a proposal for wide-coverage Named Entity Recognition for Spanish. First, a linguistic description of the typology of Named Entities is proposed. Following this definition, an architecture of sequential processes is described for the recognition and classification of strong and weak Named Entities. The former are treated using Machine Learning techniques (AdaBoost) and simple attributes requiring untagged corpora, complemented with external information sources (a list of trigger words and a gazetteer). The latter are approached through a context-free grammar for recognizing syntactic patterns. An in-depth evaluation of the first task on real corpora is presented to validate the appropriateness of the approach. A preliminary version of the context-free grammar is evaluated qualitatively, also with good results, on a small hand-tagged corpus.
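The abstract describes strong Named Entities being handled by AdaBoost over simple attributes, backed by a trigger-word list and a gazetteer. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The feature names, the tiny `TRIGGER_WORDS` and `GAZETTEER` sets, the token-level ±1 labelling, and the choice of plain discrete AdaBoost with decision stumps as weak learners are all assumptions made for illustration; the paper's actual AdaBoost variant and attribute set may differ.

```python
import math

# Hypothetical placeholder resources: the paper uses a trigger-word list and a
# gazetteer, but these tiny sets are invented for illustration only.
TRIGGER_WORDS = {"señor", "doctor", "presidente"}
GAZETTEER = {"madrid", "barcelona", "españa"}


def features(tokens, i):
    """Return the set of simple binary attributes active for tokens[i]."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    feats = {"word=" + word.lower()}
    if word[:1].isupper():
        feats.add("capitalized")
    if i == 0:
        feats.add("sentence_initial")
    if prev in TRIGGER_WORDS:
        feats.add("prev_is_trigger")
    if word.lower() in GAZETTEER:
        feats.add("in_gazetteer")
    return feats


def stump(feat, polarity, feats):
    """Weak learner: predict `polarity` if `feat` is active, else -polarity."""
    return polarity if feat in feats else -polarity


def train_adaboost(examples, rounds=10):
    """Discrete AdaBoost over decision stumps.

    examples: list of (feature_set, label) pairs with label in {+1, -1}.
    Returns a list of (alpha, feature, polarity) weak learners.
    """
    n = len(examples)
    weights = [1.0 / n] * n
    all_feats = sorted(set().union(*(f for f, _ in examples)))
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for feat in all_feats:
            for polarity in (1, -1):
                err = sum(w for w, (f, y) in zip(weights, examples)
                          if stump(feat, polarity, f) != y)
                if best is None or err < best[0]:
                    best = (err, feat, polarity)
        err, feat, polarity = best
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, polarity))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump(feat, polarity, f))
                   for w, (f, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, feats):
    """Sign of the weighted vote of all stumps (ties go to +1)."""
    score = sum(a * stump(feat, p, feats) for a, feat, p in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy training data (invented): +1 if the token is part of a named entity.
    sents = [
        (["El", "señor", "Pérez", "vive", "en", "Madrid"], [-1, -1, 1, -1, -1, 1]),
        (["Ayer", "llegó", "el", "doctor", "García"], [-1, -1, -1, -1, 1]),
        (["Barcelona", "es", "grande"], [1, -1, -1]),
    ]
    data = [(features(toks, i), y) for toks, ys in sents for i, y in enumerate(ys)]
    model = train_adaboost(data, rounds=5)
    test = ["La", "doctora", "Ruiz", "visitó", "Madrid"]
    print([predict(model, features(test, i)) for i in range(len(test))])
```

Decision stumps are a natural weak learner when every attribute is binary, since each boosting round reduces to choosing one attribute and a sign. A realistic recognizer would need far richer attributes and real lexical resources; the sketch only shows how boosting combines many weak clues such as capitalization, a preceding trigger word, or gazetteer membership.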

Document type

Article


Published version

Language

English

Published by

Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)

Related documents

Reproduction of the document published at: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3305

Procesamiento del lenguaje natural, 2002, no. 28, pp. 63-80

Rights

(c) Arévalo, Montse et al., 2002
