Quality-driven synthetic text generation for multilingual speech translation with audio large language models

Val Vila, Xavier; Val Vila, Xavier

Quality-driven synthetic text generation for multilingual speech translation with audio large language models

;
Quality-Driven Synthetic Text Generation for Multilingual Speech Translation with Audio Large Language Models;

To access the full text documents, please follow this link: https://hdl.handle.net/2117/456950

Author

Val Vila, Xavier

Other authors

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions

Hernando Pericás, Francisco Javier

Publication date

2025-10-28

Abstract

This thesis explores quality-driven synthetic data generation as a scalable solution for mul- tilingual speech-to-text translation (S2TT), focusing on Iberian languages with limited natural resources. Leveraging large language models (LLMs) and rigorous reference-free quality filtering via BLASER 2.0, an end-to-end pipeline was implemented to generate millions of high-quality synthetic translations. The approach demonstrates substantial improvements in translation quality and semantic similarity for low-resource languages such as Asturian and Occitan, while enabling efficient scaling to diverse linguistic do- mains. Experimental results reveal that models trained on filtered synthetic data achieve competitive and often state-of-the-art performance in S2TT tasks, and narrow the gap between direct and Chain-of-Thought cascade architectures. This work lays foundational evidence that scalable, quality-centric synthetic data pipelines are powerful enablers for inclusive, robust multilingual speech technologies, especially where manual annotation remains costly or infeasible.

Document Type

Master thesis

Language

English

Subjects and keywords

Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic; Text-to-speech software; Machine learning; Speech; LLM; Text; Generation; Synthetic; Quality; Síntesi de la parla (Programari); Aprenentatge automàtic

Publisher

Universitat Politècnica de Catalunya

Recommended citation

This citation was generated automatically.

Export

DIDL MARC MARC_CCUC METS OAI_DC ORE QDC RDF

Rights

S'autoritza la difusió de l'obra mitjançant la llicència Creative Commons o similar 'Reconeixement-NoComercial- SenseObraDerivada'

Open Access

This item appears in the following Collection(s)

Treballs acadèmics [82075]

Quality-driven synthetic text generation for multilingual speech translation with audio large language models

Author

Other authors

Publication date

Share

Abstract

Document Type

Language

Subjects and keywords

Publisher

Recommended citation

Export

Rights

This item appears in the following Collection(s)