Open-Domain Zero-Shot Audio Tagging: Evaluation via Semantic Embeddings

dc.contributor.author
Yapici, Tolga
dc.date.accessioned
2026-02-07T20:25:44Z
dc.date.available
2026-02-07T20:25:44Z
dc.date.issued
2026-02-06T14:06:56Z
dc.date.issued
2025
dc.identifier
https://hdl.handle.net/10230/72485
dc.identifier.uri
https://hdl.handle.net/10230/72485
dc.description.abstract
Master's thesis for: Master in Sound and Music Computing
dc.description.abstract
Supervisor: Panagiota Anastasopoulou
dc.description.abstract
Co-Supervisor: Frederic Font
dc.description.abstract
This thesis investigates open-domain zero-shot audio tagging on the BSD10k dataset, a curated heterogeneous subset of Freesound, using Contrastive Language–Audio Pretraining (CLAP) audio embeddings. To reduce the impact of rare and noisy labels, we apply a document frequency (DF) weighting scheme, which leads to substantial performance gains. We further introduce a semantic evaluation approach based on SBERT text embeddings, which captures semantically valid tags missed by exact string matching. This yields notable gains across systems, with the largest improvements in the baseline model and consistent improvements for both the DF-weighted variant and Freesound’s supervised tag recommender used for comparison. Together, the tag weighting and semantic evaluation demonstrate performance improvements beyond standard metrics. While the results show clear advances, zero-shot tagging with CLAP remains limited by incomplete generalization to folksonomy labels and sparse annotation coverage. Nevertheless, this work highlights the potential of zero-shot approaches to enable consistent and standardized audio annotation directly from raw audio.
dc.format
application/pdf
dc.language
eng
dc.rights
Creative Commons license Attribution-NonCommercial-NoDerivs 4.0 International
dc.rights
Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.subject
Public domain
dc.title
Open-Domain Zero-Shot Audio Tagging: Evaluation via Semantic Embeddings
dc.type
info:eu-repo/semantics/masterThesis

