Automatic call classification using machine learning and advanced NLP approaches

Vives Garcia Del Real, José-Nicolás


dc.contributor

Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica

dc.contributor

Moreno Eguilaz, Juan Manuel

dc.contributor.author

Vives Garcia Del Real, José-Nicolás

dc.date.accessioned

2026-03-26T18:20:52Z

dc.date.available

2026-03-26T18:20:52Z

dc.date.issued

2026-01-26

dc.identifier

https://hdl.handle.net/2117/459338

dc.identifier

PRISMA-203863

dc.identifier.uri

https://hdl.handle.net/2117/459338

dc.description.abstract

This Bachelor’s Thesis addresses the development, training, and comparison of several Transformer-based models for classifying pharmaceutical call transcriptions according to whether the patient reports an adverse event (AE). The work aims to contribute to the automation of this process through natural language processing (NLP) and machine learning (ML). The project uses a dataset provided by a pharmaceutical company, consisting of a limited number of transcribed calls. These transcriptions are long and noisy and reflect real conversational language, which introduces additional challenges compared to more structured text sources. In this context, the proposed methodology defines a complete pipeline for data preparation and model training that treats the task as a supervised binary classification problem. The study compares several BERT-based architectures (BERT-base, BERT-large, DistilBERT, RoBERTa, and ALBERT) to identify which configuration performs best in this scenario. In addition, a final comparison is made between the best Transformer-based model and the best classical ML approach, developed in parallel for the same problem, to assess which paradigm is more effective for AE detection in conversations. The final model is selected not on a single metric but on a multidimensional criterion that combines aspects relevant to a safety-critical application: overall model effectiveness, the ability to detect AE cases with high sensitivity, reliable generalization to unseen calls, and practical feasibility under computational cost constraints. The thesis presents the pipeline and the selected configuration most suitable for classifying and detecting adverse events in telephone call transcripts.
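The abstract does not state how the four selection criteria are combined. As a minimal sketch only, assuming a sensitivity floor on the AE class followed by F1 ranking with a cost tie-break, a multidimensional selection rule for the binary AE task could look like the code below. The `Candidate` type, the `min_recall` threshold, the cost values, and the model names are all hypothetical, and the labels are toy data, not results from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str           # hypothetical model identifier
    y_true: list[int]   # gold labels on held-out calls: 1 = AE reported, 0 = no AE
    y_pred: list[int]   # model predictions on the same held-out calls
    cost: float         # hypothetical relative computational cost


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall (sensitivity) and F1 for the positive (AE) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def select_model(candidates, min_recall=0.9):
    """Hypothetical rule: drop models below a sensitivity floor,
    then pick the highest F1, preferring lower cost on ties."""
    eligible = []
    for c in candidates:
        m = binary_metrics(c.y_true, c.y_pred)
        if m["recall"] >= min_recall:
            eligible.append((c, m))
    if not eligible:
        return None
    best, _ = max(eligible, key=lambda cm: (cm[1]["f1"], -cm[0].cost))
    return best.name


if __name__ == "__main__":
    y = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels, not thesis data
    a = Candidate("model_a", y, [1, 1, 1, 1, 1, 1, 0, 0], cost=3.0)
    b = Candidate("model_b", y, [1, 1, 1, 0, 0, 0, 0, 0], cost=1.0)
    c = Candidate("model_c", y, [1, 1, 1, 1, 1, 0, 0, 0], cost=2.0)
    print(select_model([a, b, c]))  # prints "model_c"
```

Here `model_b` has the best precision but misses one AE call, so the sensitivity floor excludes it. This mirrors the safety-critical priority the abstract describes, where a missed adverse event is costlier than a false alarm. Evaluating on held-out calls is what stands in for the generalization criterion in this sketch.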

dc.format

application/pdf

dc.language

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights

Open Access

dc.subject

Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial (UPC subject areas::Computer science::Artificial intelligence)

dc.subject

Machine learning

dc.subject

Natural language processing (Computer science)

dc.subject

Artificial intelligence--Medical applications

dc.subject

Aprenentatge automàtic

dc.subject

Tractament del llenguatge natural (Informàtica)

dc.subject

Intel·ligència artificial--Aplicacions a la medicina

dc.title

Automatic call classification using machine learning and advanced NLP approaches

dc.type

Bachelor thesis

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

Treballs acadèmics (Academic works) [82482]