Automatic call classification using machine learning and advanced NLP approaches

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica
dc.contributor
Moreno Eguilaz, Juan Manuel
dc.contributor.author
Vives Garcia Del Real, José-Nicolás
dc.date.accessioned
2026-03-26T18:20:52Z
dc.date.available
2026-03-26T18:20:52Z
dc.date.issued
2026-01-26
dc.identifier
https://hdl.handle.net/2117/459338
dc.identifier
PRISMA-203863
dc.identifier.uri
https://hdl.handle.net/2117/459338
dc.description.abstract
This Bachelor’s Thesis addresses the development, training, and comparison of several Transformer-based models for classifying transcriptions of pharmaceutical calls according to whether or not the patient reports an adverse event (AE). The work aims to contribute to the automation of this process through natural language processing (NLP) and machine learning (ML). The project uses a dataset provided by a pharmaceutical company, consisting of a limited number of transcribed calls. These transcriptions are long and noisy, and they reflect real conversational language, which introduces additional challenges compared to more structured text sources. In this context, the proposed methodology defines a complete pipeline for data preparation and model training that treats the task as a supervised binary classification problem. The study compares several BERT-based architectures, including BERT base, BERT large, DistilBERT, RoBERTa, and ALBERT, with the goal of identifying which configuration performs best in this scenario. In addition, the best Transformer-based model is compared with the best classical ML approach developed in parallel for the same problem, in order to assess which paradigm is more effective for AE detection in conversations. The selection of the final model is not based on a single metric, but on a multidimensional criterion that combines different aspects relevant to a safety-critical application: overall model effectiveness, the ability to detect AE cases with high sensitivity, reliable generalization to unseen calls, and practical feasibility under computational cost constraints. The thesis presents the pipeline and the selected configuration that is most suitable for classifying and detecting adverse events in telephone call transcripts.
dc.format
application/pdf
dc.language
eng
dc.publisher
Universitat Politècnica de Catalunya
dc.rights
Open Access
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
dc.subject
Machine learning
dc.subject
Natural language processing (Computer science)
dc.subject
Artificial intelligence--Medical applications
dc.subject
Aprenentatge automàtic
dc.subject
Tractament del llenguatge natural (Informàtica)
dc.subject
Intel·ligència artificial--Aplicacions a la medicina
dc.title
Automatic call classification using machine learning and advanced NLP approaches
dc.type
Bachelor thesis


Files in this item

There are no files associated with this item.