2026-04-13T13:29:15Z https://recercat.cat/oai/request

oai:recercat.cat:2117/394322 2025-07-23T04:05:20Z com_2072_1033 col_2072_452951

Multi-agent reinforcement learning in two-player zero-sum games
Paulo Molina, Marc
UPC subject areas::Computer science::Artificial intelligence::Machine learning; Reinforcement learning; Supervised learning (Machine learning); Zero-sum games
The main objective of this project is to compare how different Reinforcement Learning algorithms learn to play zero-sum games. Specifically, we focus on Connect 4 (also known as Four in a Row). We first introduce the basic theoretical concepts needed to understand the project. We then propose a training process that combines Supervised Learning and Reinforcement Learning. Initially, the agents learn to mimic the moves of a mid-level player. Building on this learned knowledge, we apply different Reinforcement Learning algorithms to improve each agent's level of play. To evaluate the trained agents, we have them compete against each other and compare the results to determine which algorithm achieved the highest level of play. Finally, we present a simple User Interface that lets the reader play Connect 4 against all the agents trained in this project.
2023-06-27
Bachelor thesis
Open Access
Universitat Politècnica de Catalunya