Quality-Driven Synthetic Text Generation for Multilingual Speech Translation with Audio Large Language Models
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Hernando Pericás, Francisco Javier
2025-10-28
This thesis explores quality-driven synthetic data generation as a scalable solution for multilingual speech-to-text translation (S2TT), focusing on Iberian languages with little naturally occurring (non-synthetic) data. Leveraging large language models (LLMs) and rigorous reference-free quality filtering with BLASER 2.0, the thesis implements an end-to-end pipeline that generates millions of high-quality synthetic translations. The approach substantially improves translation quality and semantic similarity for low-resource languages such as Asturian and Occitan, and scales efficiently to diverse linguistic domains. Experimental results show that models trained on filtered synthetic data achieve competitive and often state-of-the-art performance on S2TT tasks, and narrow the gap between direct and Chain-of-Thought cascade architectures. This work provides foundational evidence that scalable, quality-centric synthetic data pipelines are powerful enablers of inclusive, robust multilingual speech technologies, especially where manual annotation remains costly or infeasible.
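The abstract names reference-free quality filtering as the core step but gives no detail. The following is a minimal sketch, assuming each synthetic candidate already carries a quality-estimation score computed separately (for example by a BLASER 2.0 quality-estimation model); it shows threshold filtering and per-language keep rates. The `SyntheticPair` record layout, the function names, the placeholder strings and the 3.5 cutoff are hypothetical illustrations, not the thesis's actual pipeline or settings.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SyntheticPair:
    """One LLM-generated translation candidate (hypothetical record layout)."""
    source: str       # source-language sentence, e.g. an English transcript
    translation: str  # synthetic target-language translation produced by the LLM
    lang: str         # target language code, e.g. "ast" (Asturian) or "oci" (Occitan)
    qe_score: float   # reference-free quality score, assumed precomputed (e.g. by BLASER 2.0)


def filter_by_quality(pairs: Iterable[SyntheticPair], threshold: float) -> List[SyntheticPair]:
    """Keep only candidates whose quality score is at or above the threshold."""
    return [p for p in pairs if p.qe_score >= threshold]


def keep_rate_by_lang(pairs: List[SyntheticPair], kept: List[SyntheticPair]) -> Dict[str, float]:
    """Fraction of candidates retained for each target language."""
    totals: Dict[str, int] = {}
    kept_counts: Dict[str, int] = {}
    for p in pairs:
        totals[p.lang] = totals.get(p.lang, 0) + 1
    for p in kept:
        kept_counts[p.lang] = kept_counts.get(p.lang, 0) + 1
    return {lang: kept_counts.get(lang, 0) / n for lang, n in totals.items()}


if __name__ == "__main__":
    # Toy candidates with made-up scores and placeholder translations, for illustration only.
    candidates = [
        SyntheticPair("Good morning.", "<synthetic Occitan translation>", "oci", 4.6),
        SyntheticPair("Where is the station?", "<synthetic Asturian translation>", "ast", 4.1),
        SyntheticPair("I like reading books.", "<synthetic Asturian translation>", "ast", 2.7),
    ]
    kept = filter_by_quality(candidates, threshold=3.5)  # hypothetical cutoff
    print(len(kept))                            # → 2
    print(keep_rate_by_lang(candidates, kept))  # → {'oci': 1.0, 'ast': 0.5}
```

Reporting keep rates per language is one way to notice when a single global threshold discards a disproportionate share of a low-resource language's candidates; whether to use per-language thresholds instead is a design choice this sketch leaves open.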
Master thesis
English
UPC subject areas::Computer science::Artificial intelligence::Machine learning; Text-to-speech software; Machine learning; Speech; LLM; Text; Generation; Synthetic; Quality; Speech synthesis (Software)
Universitat Politècnica de Catalunya
Dissemination of this work is authorized under a Creative Commons or similar 'Attribution-NonCommercial-NoDerivatives' license
Open Access