Open-Domain Zero-Shot Audio Tagging: Evaluation via Semantic Embeddings

Yapici, Tolga

To access the full-text documents, please follow this link: https://hdl.handle.net/10230/72485

Author

Yapici, Tolga

Publication date

2026-02-06T14:06:56Z

2025

Abstract

Master's thesis for: Master in Sound and Music Computing

Supervisor: Panagiota Anastasopoulou

Co-Supervisor: Frederic Font

This thesis investigates open-domain zero-shot audio tagging on the BSD10k dataset, a curated heterogeneous subset of Freesound, using Contrastive Language–Audio Pretraining (CLAP) audio embeddings. To reduce the impact of rare and noisy labels, we apply a document frequency (DF) weighting scheme, which leads to substantial performance gains. We further introduce a semantic evaluation approach based on SBERT text embeddings, which captures semantically valid tags missed by exact string matching. This yields notable gains across systems, with the largest improvements in the baseline model and consistent improvements for both the DF-weighted variant and Freesound's supervised tag recommender used for comparison. Together, the tag weighting and semantic evaluation demonstrate performance improvements beyond standard metrics. While the results show clear advances, zero-shot tagging with CLAP remains limited by incomplete generalization to folksonomy labels and sparse annotation coverage. Nevertheless, this work highlights the potential of zero-shot approaches to enable consistent and standardized audio annotation directly from raw audio.

Document type

Master's thesis

Language

English

Subjects and keywords

Public domain

Recommended citation

This citation was generated automatically.

Rights

Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International

https://creativecommons.org/licenses/by-nc-nd/4.0/

This item appears in the following collection(s)

Student works [4946]
