2026-04-17T11:42:51Z https://recercat.cat/oai/request

oai:recercat.cat:2117/394596 2025-07-23T06:11:50Z com_2072_1033 col_2072_452951

Algoritmo generador de datos artificiales para Machine Learning (Artificial data generation algorithm for Machine Learning)
Pérez Moliner, Sergio
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Belanche Muñoz, Luis Antonio
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
Algorithms; Machine learning; missing values; CART; KNN; Gower distance; Gaussian; classification
In Machine Learning, developing and thoroughly testing prediction models often requires datasets with specific characteristics. Obtaining such datasets can be difficult because few tools produce data tailored to a given need; the only resources currently available are large repositories of existing datasets and manual dataset creation. To address this gap, this project develops an algorithm that lets users generate complete datasets with varied characteristics quickly and efficiently. The algorithm supports several types of variables, several methods for generating different datasets, the addition of missing values, and the addition of difficulty to the target variable. The experimental results obtained during the project show that the techniques used work correctly and yield datasets that are useful for examining the characteristics and behavior of some existing Machine Learning models. In conclusion, the algorithm is a new tool that can serve as a basis for other projects, and it is open to further improvements that would add more interesting features to the datasets and so allow it to reach more users.
2023-06-27
Bachelor thesis
https://hdl.handle.net/2117/394596
177894
spa
Open Access
application/pdf
Universitat Politècnica de Catalunya