Bayesian bandits for algorithm selection: latent-state modeling and spatial reward structures

Ernst, Marvin Michel; Gelabert Cortés, Oriol; Vadenja, Melisa

dc.contributor.author

Ernst, Marvin Michel

dc.contributor.author

Gelabert Cortés, Oriol

dc.contributor.author

Vadenja, Melisa

dc.date.accessioned

2025-11-28T20:35:08Z

dc.date.available

2025-11-28T20:35:08Z

dc.date.issued

2025-11-26T12:28:41Z

dc.date.issued

2025-06-04

dc.identifier

http://hdl.handle.net/10230/72017

dc.identifier.uri

http://hdl.handle.net/10230/72017

dc.description.abstract

Master's thesis for: Master's Degree in Data Science. Methodology Program. Academic year 2024-2025

dc.description.abstract

Supervisors: David Rossel and Christian Brownlees

dc.description.abstract

This thesis extends the classical Multi-Armed Bandit (MAB) framework to dynamic and spatial environments. In dynamic settings, Bayesian latent-state models with Thompson Sampling and UCB are evaluated for their ability to adapt to non-stationary rewards, with comparisons to simpler autoregressive (AR) models. For spatially structured problems, Gaussian Process (GP) and Lipschitz bandits are used to exploit correlations between arms. Algorithms such as GP-UCB and Zoom-In demonstrate improved learning efficiency. Empirical results highlight the benefits of modeling temporal and spatial structure, while also emphasizing the computational trade-offs compared to classical, more tractable bandit algorithms.

dc.description.abstract

This thesis extends the classical Multi-Armed Bandit (MAB) framework to dynamic and spatial environments. In dynamic settings, Bayesian latent-state models combined with classical algorithms are evaluated for their ability to adapt to non-stationary rewards and compared with simpler autoregressive (AR) models. For spatially structured problems, GP bandits and Lipschitz bandits are used to exploit correlations between arms. Algorithms such as GP-UCB and Zoom-In show greater learning efficiency in this setting. The empirical results highlight the advantages of modeling temporal and spatial structure, while also emphasizing the computational costs relative to the more accessible classical algorithms.

dc.format

application/pdf

dc.language

eng

dc.rights

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

dc.rights

https://creativecommons.org/licenses/by-nc-nd/4.0

dc.rights

info:eu-repo/semantics/openAccess

dc.subject

Master's thesis – Academic year 2024-2025

dc.subject

Multi-armed bandits

dc.subject

Latent-state models

dc.subject

Gaussian process bandits

dc.title

Bayesian bandits for algorithm selection: latent-state modeling and spatial reward structures

dc.type

info:eu-repo/semantics/masterThesis

This item appears in the following collection(s)

Student works (Treballs d'estudiants) [4945]