Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica
Moreno Eguilaz, Juan Manuel
2026-01-26
This Bachelor’s Thesis addresses the development, training, and comparison of several Transformer-based models for classifying transcriptions of pharmaceutical calls according to whether or not the patient reports an adverse event (AE). The work aims to contribute to the automation of this process through natural language processing (NLP) and machine learning (ML). The project uses a dataset provided by a pharmaceutical company, consisting of a limited number of transcribed calls. These transcriptions are long and noisy, and they reflect real conversational language, which introduces additional challenges compared to more structured text sources. In this context, the proposed methodology defines a complete pipeline for data preparation and model training that treats the task as a supervised binary classification problem. The study compares several BERT-based architectures, including BERT base, BERT large, DistilBERT, RoBERTa, and ALBERT, with the goal of identifying which configuration performs best in this scenario. In addition, the best Transformer-based model is compared with the best classical ML approach developed in parallel for the same problem, in order to assess which paradigm is more effective for AE detection in conversations. The final model is selected not on a single metric but on a multidimensional criterion that combines several aspects relevant to a safety-critical application: overall model effectiveness, the ability to detect AE cases with high sensitivity, reliable generalization to unseen calls, and practical feasibility under computational cost constraints. The thesis presents the pipeline and the selected configuration that is most suitable for detecting adverse events in telephone call transcripts.
Bachelor thesis
English
UPC subject areas::Computer science::Artificial intelligence; Machine learning; Natural language processing (Computer science); Artificial intelligence--Medical applications
Universitat Politècnica de Catalunya
Open Access
Academic works [82483]