2026-04-13T07:01:34Z https://recercat.cat/oai/request

oai:recercat.cat:2117/445762 2025-11-08T07:19:10Z com_2072_1033 col_2072_452951

RECERCAT author Espasa Rosell, Jordi 2025-11-08T07:19:10Z 2025-11-08T07:19:10Z 2025-10-20 http://hdl.handle.net/2117/445762

This master's thesis optimizes large language models (LLMs) for multiple-choice question answering (MCQA) to evaluate employee performance from spoken transcripts in personalized training platforms. Current LLMs achieve only 63% accuracy in dynamic assessments due to biases, reasoning failures, and inefficiencies. We develop a systematic framework that balances precision, cost, and execution time through iterative evaluation refinement, corpus preparation, baseline selection, and phased experiments, including one-factor-at-a-time (OFAT) screening, multi-factor interactions, and parameter-efficient fine-tuning (PEFT). Key factors assessed include model scale, in-context learning, chain-of-thought (CoT), chain-of-density (CoD), self-correction, and agentic ensembles. Contributions include a replicable optimization pipeline and strategies to mitigate biases such as positional bias and literal-interpretation errors. Results show accuracy improving from 63% to 80%, along with higher F1-scores, enabling ethical, scalable AI-driven assessments for individualized enterprise learning.
Open Access

UPC subject areas::Computer science::Artificial intelligence::Machine learning

Deep learning (Machine learning); Questions and answers; Large language models; Multiple-choice question answering; Employee performance evaluation; Spoken transcripts; Personalized training platforms; Model optimization; Accuracy in dynamic assessments; AI model biases; Reasoning failures; Systematic framework; Iterative evaluation refinement; Corpus preparation; Baseline selection; Phased experiments; One-factor-at-a-time screening; Multi-factor interactions; Parameter-efficient fine-tuning; Model scale; In-context learning; Chain-of-thought

Natural language models for learning assessment from unstructured data

Master thesis