<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-14T08:35:21Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/169293" metadataPrefix="oai_dc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/169293</identifier><datestamp>2026-01-24T08:20:44Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:title>PRESISTANT: Learning based assistant for data pre-processing</dc:title>
   <dc:creator>Bilalli, Besim</dc:creator>
   <dc:creator>Abelló Gamazo, Alberto</dc:creator>
   <dc:creator>Aluja Banet, Tomàs</dc:creator>
   <dc:creator>Wrembel, Robert</dc:creator>
   <dc:contributor>Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació</dc:contributor>
   <dc:contributor>Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa</dc:contributor>
   <dc:contributor>Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering</dc:contributor>
   <dc:subject>UPC subject areas::Computer science::Information systems</dc:subject>
   <dc:subject>Decision trees</dc:subject>
   <dc:subject>Data mining</dc:subject>
   <dc:subject>Information storage and retrieval systems</dc:subject>
   <dc:subject>Data pre-processing</dc:subject>
   <dc:subject>Meta-learning</dc:subject>
   <dc:description>Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator can have a positive, negative, or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of available pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only “syntactically” applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms, namely Decision Tree (J48), Naive Bayes, PART, Logistic Regression, and Nearest Neighbor (IBk). Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytic tasks.</dc:description>
   <dc:description>Peer Reviewed</dc:description>
   <dc:description>Postprint (author's final draft)</dc:description>
   <dc:date>2019-09</dc:date>
   <dc:type>Article</dc:type>
   <dc:identifier>Bilalli, B. [et al.]. PRESISTANT: Learning based assistant for data pre-processing. "Data and knowledge engineering", vol. 123, September 2019, no. 101727, p. 1-22.</dc:identifier>
   <dc:identifier>0169-023X</dc:identifier>
   <dc:identifier>https://hdl.handle.net/2117/169293</dc:identifier>
   <dc:identifier>10.1016/j.datak.2019.101727</dc:identifier>
   <dc:language>eng</dc:language>
   <dc:relation>https://www.sciencedirect.com/science/article/pii/S0169023X18305123</dc:relation>
   <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
   <dc:rights>Open Access</dc:rights>
   <dc:rights>Attribution-NonCommercial-NoDerivs 3.0 Spain</dc:rights>
   <dc:format>31 p.</dc:format>
   <dc:format>application/pdf</dc:format>
   <dc:publisher>Elsevier</dc:publisher>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>