Automatic call classification using machine learning and advanced NLP approaches

Other authors

Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica

Moreno Eguilaz, Juan Manuel

Publication date

2026-01-26



Abstract

This Bachelor’s Thesis addresses the development, training, and comparison of several Transformer-based models for classifying transcriptions of pharmaceutical calls according to whether or not the patient reports an adverse event (AE). The work aims to contribute to the automation of this process through natural language processing (NLP) and machine learning (ML). The project uses a dataset provided by a pharmaceutical company, consisting of a limited number of transcribed calls. These transcriptions are long and noisy, and they reflect real conversational language, which introduces additional challenges compared to more structured text sources. In this context, the proposed methodology defines a complete pipeline for data preparation and model training that treats the task as a supervised binary classification problem. The study compares several BERT-based architectures, including BERT base, BERT large, DistilBERT, RoBERTa, and ALBERT, with the goal of identifying which configuration performs best in this scenario. In addition, the best Transformer-based model is compared with the best classical ML approach developed in parallel for the same problem, in order to assess which paradigm is more effective for AE detection in conversations. The final model is selected not on a single metric but on a multidimensional criterion that combines aspects relevant to a safety-critical application: overall model effectiveness, the ability to detect AE cases with high sensitivity, reliable generalization to unseen calls, and practical feasibility under computational cost constraints. The thesis presents the pipeline and the selected configuration that is most suitable for detecting adverse events in telephone call transcripts.

Document type

Bachelor thesis

Language

English

Published by

Universitat Politècnica de Catalunya

Recommended citation

This citation was generated automatically.

Rights

Open Access
