Comparison of automatic classifiers' performances using word-based feature extraction techniques in an e-government setting

To access the full text of the document, please follow this link: http://hdl.handle.net/2099.1/14547

Title:	Comparison of automatic classifiers' performances using word-based feature extraction techniques in an e-government setting
Author:	Marin Rodenas, Alfonso
Other authors:	Kungliga Tekniska högskolan; Velupillai, Sumithra; Dalianis, Hercules
Abstract:	Project carried out through a mobility programme. KUNGLIGA TEKNISKA HÖGSKOLAN, STOCKHOLM
Abstract:	Nowadays, citizens commonly use email to communicate with their government. Many of the emails a government receives concern recurring queries and subjects, which handling officers must answer manually. Automatically classifying incoming emails can make this communication more efficient by reducing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution for the Swedish Social Insurance Agency (SSIA) ("Försäkringskassan" in Swedish). Its goal is to analyze and compare the classification performance of different sets of features extracted from SSIA emails across several automatic classifiers. The features extracted from the emails also depend on the preprocessing carried out beforehand: compound splitting, lemmatization, stop-word removal, Part-of-Speech (PoS) tagging and n-grams are the processes applied to the data set. Classification is performed with Support Vector Machines (SVM), k-Nearest Neighbors (kNN) and Naive Bayes, and the results are analyzed and compared using precision, recall and F-measure. In the experiments of this thesis, SVM gives the best overall classification, with an F-measure of 0.787. However, Naive Bayes classifies most of the individual email categories better than SVM, so it cannot be concluded whether SVM classifies better than Naive Bayes or not. Furthermore, the results are compared with those of Dalianis et al. (2011), and the approach taken here outperforms them: SVM reaches an F-measure of 0.858 when PoS tagging is applied to the original emails, an improvement of almost 3 percentage points (0.858 vs. 0.83) over the result reported in Dalianis et al. (2011). In this setting, SVM is clearly better than Naive Bayes.
Subject(s):	-UPC subject areas::Computer science::Information systems -Internet in public administration -e-government -machine learning -WEKA -SVM -Naive Bayes -kNN -Swedish -PoS tagging -feature extraction -feature selection -automatic e-mail classification -Electronic administration
Document type:	Final degree project
Publisher:	Universitat Politècnica de Catalunya; Kungl. Tekniska högskolan (Stockholm)
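The abstract describes a pipeline of word-based feature extraction, classification and evaluation. The following is a minimal sketch of that pipeline in plain Python; it is not the thesis's implementation, which used WEKA according to the subject terms. It assumes a hypothetical English stop-word list and made-up toy emails in two hypothetical categories, `payment` and `application`. Of the preprocessing steps listed, it only shows stop-word removal and word n-grams (unigrams and bigrams), omitting compound splitting, lemmatization and PoS tagging. Of the three classifiers, only multinomial Naive Bayes with add-one smoothing is shown, and the per-category F-measures are combined by macro-averaging, which is an assumption since the abstract does not say how scores were averaged.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list and toy emails (assumptions for illustration);
# the thesis worked on Swedish SSIA emails, which are not reproduced here.
STOP_WORDS = {"a", "be", "do", "for", "how", "i", "is", "my", "of", "the", "to", "want", "when", "will"}


def extract_features(text, max_n=2):
    """Lowercase, tokenize, drop stop words, then emit word n-grams (1..max_n)."""
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(extract_features(text))
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        # Features never seen in training are ignored, as is usual for this model.
        features = [f for f in extract_features(text) if f in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            c = self.counts[label]
            denom = sum(c.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            score += sum(math.log((c[f] + 1) / denom) for f in features)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f(gold, predicted, label):
    """Per-category precision, recall and balanced F-measure (F1)."""
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


train = [
    ("When will my parental benefit be paid", "payment"),
    ("My benefit payment is late", "payment"),
    ("How do I apply for sickness benefit", "application"),
    ("I want to apply for housing allowance", "application"),
]
test = [
    ("Payment of my benefit is late", "payment"),
    ("I want to apply for sickness allowance", "application"),
    ("How do I apply for parental benefit", "application"),
]

model = MultinomialNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
gold = [y for _, y in test]
predictions = [model.predict(t) for t, _ in test]
scores = {label: precision_recall_f(gold, predictions, label) for label in sorted(set(gold))}
macro_f = sum(f for _, _, f in scores.values()) / len(scores)
for label, (p, r, f) in scores.items():
    print(f"{label}: P={p:.3f} R={r:.3f} F={f:.3f}")
print(f"macro-averaged F-measure: {macro_f:.3f}")
```

On this toy data the third email ("How do I apply for parental benefit") is labelled `payment`, because the unigram "parental" and the bigram "parental benefit" only occur in the `payment` training emails. The script therefore reports P=0.500, R=1.000 for `payment`, P=1.000, R=0.500 for `application`, and an F-measure of 0.667 for both categories and for the macro average, which shows how the choice of features can shift per-category results.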