dc.contributor
Solsona Tehàs, Francesc
dc.contributor
Vilaplana Mayoral, Jordi
dc.contributor
Universitat de Lleida. Escola Politècnica Superior
dc.contributor.author
Plaza Cagigós, Sergi
dc.date.accessioned
2024-12-05T23:05:24Z
dc.date.available
2024-12-05T23:05:24Z
dc.date.issued
2017-09-22T10:46:45Z
dc.date.issued
2020-03-30T22:11:30Z
dc.identifier
http://hdl.handle.net/10459.1/60252
dc.identifier.uri
http://hdl.handle.net/10459.1/60252
dc.description.abstract
This work deals with language detection. It includes new proposals ranging from lexical and morphological
analysis to an increasing use of machine learning solutions. The study focuses on Catalan, a minority
language. The difficulty increases further when detecting Catalan in tweets, the messages written on the
Twitter social network. To that end, a Twitter-Catalan corpus has been generated using lexical and
morphological approaches, and it is then used to create supervised models based on machine learning
techniques. The models are evaluated to determine which one obtains the best prediction score and is thus
the most suitable for use. The best model is to be deployed on a website, where users can test the algorithm
interactively through a front-end webpage and in the background by means of a web service exposed through a RESTful API.
dc.rights
info:eu-repo/semantics/openAccess
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
Language Detection
dc.subject
Twitter corpus
dc.subject
Machine Learning
dc.title
CatDetect, a framework for detecting Catalan tweets