Other authors

Universitat Politècnica de Catalunya. Doctorat en Computació

Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació

Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering

Publication date

2025



Abstract

Denial Constraints (DCs) are a flexible formalism for expressing many types of data rules, which has made them widely adopted across many applications. This flexibility has led to the development of numerous algorithms that discover DCs automatically from data. However, few studies have examined the quality of the discovered DCs. We experimentally quantify the lack of quality in the results of state-of-the-art algorithms, showing that the proportion of discovered DCs that are false is rarely below 95%. We hypothesize that these erroneous DCs share a common source: the adoption of the current definition of DC validity. We use a statistical approach to explain the mechanism that leads to these results, and propose a redefinition of the DC validity properties that avoids accepting false DCs. We validate this redefinition experimentally, showing that it accepts only true constraints of the data and is reliable enough to discover DCs missed by domain experts. Additionally, we provide curated sets of golden DCs for each dataset used in our study: those produced by domain experts and those discovered using our approach.
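As commonly stated in the DC literature, a DC such as ¬(t.zip = t'.zip ∧ t.city ≠ t'.city) holds on a relation when no ordered pair of distinct tuples satisfies all of its predicates at once; approximate variants accept a DC when the fraction of violating pairs stays at or below a threshold ε. The Python sketch that follows is a minimal, hypothetical illustration of that conventional pair-based check only. It does not implement the redefinition of validity proposed in this work, and the helper names (`pred`, `violates`, `violation_ratio`, `is_valid`) and the toy zip/city data are invented for illustration.

```python
import operator
from itertools import permutations

# Comparison operators allowed in a predicate (hypothetical helper table).
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def pred(left_attr, op, right_attr):
    """Hypothetical predicate t.left_attr <op> t'.right_attr."""
    return (left_attr, OPS[op], right_attr)

def violates(dc, t, u):
    """A pair (t, u) violates the DC when ALL of its predicates hold."""
    return all(op(t[left], u[right]) for left, op, right in dc)

def violation_ratio(dc, rows):
    """Fraction of ordered pairs of distinct tuples that violate the DC."""
    pairs = list(permutations(rows, 2))
    if not pairs:
        return 0.0
    return sum(violates(dc, t, u) for t, u in pairs) / len(pairs)

def is_valid(dc, rows, epsilon=0.0):
    """epsilon=0 gives exact validity; epsilon>0 gives an approximate DC."""
    return violation_ratio(dc, rows) <= epsilon

# not(t.zip = t'.zip and t.city != t'.city): the FD zip -> city written as a DC.
dc_zip_city = [pred("zip", "=", "zip"), pred("city", "!=", "city")]

clean = [
    {"zip": "08034", "city": "Barcelona"},
    {"zip": "08034", "city": "Barcelona"},
    {"zip": "28001", "city": "Madrid"},
]
# One extra tuple that contradicts zip -> city.
dirty = clean + [{"zip": "08034", "city": "Badalona"}]

print(is_valid(dc_zip_city, clean))                # True
print(violation_ratio(dc_zip_city, dirty))         # 4 of 12 ordered pairs violate
print(is_valid(dc_zip_city, dirty))                # False
print(is_valid(dc_zip_city, dirty, epsilon=0.35))  # True
```

Because every ordered pair is enumerated, this check is quadratic in the number of tuples; DC discovery algorithms typically avoid that cost with precomputed evidence sets, which the sketch omits.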


This work is supported by the Horizon Europe Programme under GA.101135513 (CyclOps) and by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00 / AEI/10.13039/501100011033 (DOGO4ML). Anna Queralt is a Serra-Húnter fellow. E. Almeida is funded by the CNPq grants 302909/2022-2 and 444192/2024-7. Albert Martin is funded by the Joan Oró predoctoral program AGAUR-FI grants (2025 FI-1 00967), which is backed by the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia, as well as by the European Social Fund Plus.


Peer Reviewed


Postprint (published version)

Document Type

Conference report

Language

English

Publisher

Association for Computing Machinery (ACM)

Related items

https://dl.acm.org/doi/10.14778/3748191.3748209

info:eu-repo/grantAgreement/EC/HE/101135513/EU/Automated end-to-end data life cycle management for FAIR data integration, processing and re-use/CyclOps

info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/


Rights

http://creativecommons.org/licenses/by-nc-nd/4.0/

Open Access

Attribution-NonCommercial-NoDerivatives 4.0 International
