CatDetect, a framework for detecting Catalan tweets

Plaza Cagigós, Sergi

To access the full text documents, please follow this link: http://hdl.handle.net/10459.1/60252

Author

Plaza Cagigós, Sergi

Other authors

Solsona Tehàs, Francesc

Vilaplana Mayoral, Jordi

Universitat de Lleida. Escola Politècnica Superior

Publication date

2017-09-22T10:46:45Z

2020-03-30T22:11:30Z

2017-09

Abstract

This work addresses language detection, a field in which new proposals range from lexical and morphological analysis to an increasing use of machine learning. Here the study focuses on Catalan, a minority language. Detection is harder still on tweets, the short messages written on the Twitter social network. To tackle this, a Twitter-Catalan corpus has been generated using lexical and morphological approaches and is then used to train supervised models based on machine learning techniques. The models are evaluated to determine which one obtains the best prediction score and is therefore the most suitable for use. The best model is to be used on a website, where users can test the algorithm interactively through a front-end web page and in the background through a web service exposed as a RESTful API.
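
The pipeline described in the abstract (label tweets with a lexicon, train a supervised model on those labels, then score it) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the word lists, the `lexicon_label` rule, the `TrigramNaiveBayes` class and the four sample sentences are all hypothetical and chosen only to keep the example self-contained. A character-trigram Naive Bayes classifier stands in for whichever models the thesis actually compares.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons of function words (assumption: not from the thesis).
CATALAN_WORDS = {"amb", "això", "però", "molt", "els", "avui", "dels", "és"}
SPANISH_WORDS = {"con", "esto", "pero", "muy", "los", "hoy", "del", "es"}


def tokenize(text):
    """Lowercase, drop URLs, @mentions and #hashtags, and return runs of letters."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    return re.findall(r"[^\W\d_]+", text)


def lexicon_label(text):
    """Label a tweet 'ca' if Catalan lexicon hits outnumber Spanish ones, else 'other'."""
    tokens = tokenize(text)
    ca_hits = sum(t in CATALAN_WORDS for t in tokens)
    es_hits = sum(t in SPANISH_WORDS for t in tokens)
    return "ca" if ca_hits > es_hits else "other"


def trigrams(text):
    """Character trigrams over the cleaned, space-padded text."""
    s = " " + " ".join(tokenize(text)) + " "
    return [s[i:i + 3] for i in range(len(s) - 2)]


class TrigramNaiveBayes:
    """Multinomial Naive Bayes over character trigrams with add-one smoothing."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)
        self.counts = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(trigrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        n_docs = sum(self.priors.values())

        def score(label):
            c = self.counts[label]
            total = sum(c.values()) + len(self.vocab)
            log_prior = math.log(self.priors[label] / n_docs)
            return log_prior + sum(
                math.log((c[g] + 1) / total) for g in trigrams(text)
            )

        return max(self.counts, key=score)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


# Toy corpus (assumption: illustrative sentences, not real tweets).
corpus = [
    "Avui fa molt bon dia amb els amics",
    "Això és molt important però no ho sabem",
    "Hoy hace muy buen día con los amigos",
    "Esto es muy importante pero no lo sabemos",
]
labels = [lexicon_label(t) for t in corpus]   # corpus generation step
model = TrigramNaiveBayes().fit(corpus, labels)  # supervised training step
train_accuracy = accuracy(model, corpus, labels)  # evaluation step
```

In practice the score would be computed on held-out tweets rather than on the training data, and several candidate models would be compared on that same held-out set before one is selected.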

Document Type

bachelorThesis

Language

English

Subjects and keywords

Catalan; Language Detection; Twitter corpus; Machine Learning; Website; Twitter; Catalan language -- Usage

Rights

cc-by-nc-nd

http://creativecommons.org/licenses/by-nc-nd/4.0/

This item appears in the following Collection(s)

Treballs de l'estudiantat [3381]
