How and why false denial constraints are discovered

dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Computació
dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering
dc.contributor.author
Martin Garcia, Albert
dc.contributor.author
Cunha de Almeida, Eduardo
dc.contributor.author
Romero Moral, Óscar
dc.contributor.author
Queralt Calafat, Anna
dc.date.accessioned
2026-03-03T01:53:53Z
dc.date.available
2026-03-03T01:53:53Z
dc.date.issued
2025
dc.identifier
Martin, A. [et al.]. How and why false denial constraints are discovered. In: International Conference on Very Large Data Bases. «Proceedings of the VLDB Endowment (vol. 18, no. 10, July 2025)». New York: Association for Computing Machinery (ACM), 2025, p. 3477-3489. ISSN 2150-8097. DOI 10.14778/3748191.3748209.
dc.identifier
2150-8097
dc.identifier
https://hdl.handle.net/2117/456287
dc.identifier
10.14778/3748191.3748209
dc.identifier.uri
https://hdl.handle.net/2117/456287
dc.description.abstract
Denial Constraints (DCs) are a flexible formalism for expressing many types of data rules, which has made them a widely adopted tool across many applications. This flexibility has led to the development of numerous algorithms that automatically discover DCs directly from data. However, few studies have examined the quality of the discovered DCs. We experimentally quantify the lack of quality in the results obtained by state-of-the-art algorithms, showing that the proportion of discovered DCs that are false is rarely below 95%. We hypothesize that these erroneous DCs share a common source: the adoption of the current DC validity definition. We use a statistical approach to explain the mechanism leading to these results, and propose a redefinition of DC validity properties that avoids accepting false DCs. We validate this redefinition experimentally, showing that it exclusively accepts true constraints of the data and is reliable enough to discover DCs missed by domain experts. Additionally, we provide curated sets of golden DCs for each dataset used in our study, comprising both those generated by domain experts and those discovered using our approach.
dc.description.abstract
This work is supported by the Horizon Europe Programme under GA.101135513 (CyclOps) and the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00 / AEI/10.13039/501100011033 (DOGO4ML). Anna Queralt is a Serra-Húnter fellow. E. Almeida is funded by the CNPQ grants 302909/2022-2 and 444192/2024-7. Albert Martin is funded by the predoctoral program AGAUR-FI grants (2025 FI-1 00967) Joan Oró, which is backed by the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia, as well as the European Social Plus Fund.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (published version)
dc.format
13 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Association for Computing Machinery (ACM)
dc.relation
https://dl.acm.org/doi/10.14778/3748191.3748209
dc.relation
info:eu-repo/grantAgreement/EC/HE/101135513/EU/Automated end-to-end data life cycle management for FAIR data integration, processing and re-use/CyclOps
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
Open Access
dc.rights
Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject
Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject
Data mining
dc.subject
Integrity rules
dc.subject
Data quality
dc.subject
Denial constraints
dc.subject
Functional dependencies
dc.title
How and why false denial constraints are discovered
dc.type
Conference report


Files in this item

There are no files associated with this item.

This item appears in the following collection(s)

E-prints [72263]