Other authors

Universitat Politècnica de Catalunya. Doctorat en Computació

Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació

Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering

Publication date

2025



Abstract

Among the many aspects of data quality, determining the reliability of conflicting values from different sources is especially challenging. Traditional data fusion approaches can often infer correct values in simple cases, but they struggle with variations in data granularity (such as differences in temporal, spatial, or categorical aggregation) and offer limited insight into the nature of disagreements. We therefore propose a new source evaluation approach for numerical attributes that measures discordance, i.e., the extent to which sources differ from each other. Unlike existing methods that focus solely on point estimation, our approach supports both fine-grained and coarse-grained analysis, enabling more sophisticated data quality assessments. We employ a linear programming solver that transparently adapts to any data alignment expressed with a set of operators resembling relational algebra. Extensive experiments on real-world datasets demonstrate that our method generalizes existing truth discovery techniques that measure differences with Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), and that it can adapt to diverse and complex scenarios.
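The abstract does not detail the method, so the following is only a rough illustration of why granularity matters when comparing sources. It is not the authors' linear-programming approach. It assumes a hypothetical source A that reports monthly values and a hypothetical source B that reports quarterly values. A is rolled up to quarters with a group-by sum, and then MAE and RMSE are computed over the keys the two sources share. The data, the function names (`roll_up`, `discordance`) and the use of summation as the aggregation are all illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical data (illustrative only): source A is monthly, source B is quarterly.
monthly_a = {
    ("2024", 1): 10.0, ("2024", 2): 12.0, ("2024", 3): 11.0,
    ("2024", 4): 9.0, ("2024", 5): 14.0, ("2024", 6): 13.0,
}
quarterly_b = {("2024", "Q1"): 35.0, ("2024", "Q2"): 30.0}


def to_quarter(month: int) -> str:
    """Map a month number (1-12) to its quarter label, e.g. 5 -> 'Q2'."""
    return f"Q{(month - 1) // 3 + 1}"


def roll_up(monthly):
    """Align monthly values to quarters by summing them (a group-by/SUM step).

    Summation is an assumed aggregation; averages would need a different rule.
    """
    totals = defaultdict(float)
    for (year, month), value in monthly.items():
        totals[(year, to_quarter(month))] += value
    return dict(totals)


def discordance(a, b):
    """Return (MAE, RMSE) of a - b over the keys both sources share."""
    keys = sorted(a.keys() & b.keys())
    if not keys:
        raise ValueError("sources share no aligned keys")
    diffs = [a[k] - b[k] for k in keys]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


aligned_a = roll_up(monthly_a)
mae, rmse = discordance(aligned_a, quarterly_b)
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

With these numbers, A sums to 33 for Q1 and 36 for Q2. The script therefore prints `MAE=4.00 RMSE=4.47`. RMSE exceeds MAE because it gives more weight to the larger Q2 disagreement (6 versus 2).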


Y. A. Akter is funded by the EC Horizon 2020 research and innovation programme (DEDS: grant agreement No 955895). A. Abelló and P. Jovanovic are funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033 (DOGO4ML) and the EC Horizon Europe programme (ExtremeXP: grant agreement No 101093164).


Peer Reviewed


Postprint (author's final draft)

Document type

Conference report

Language

English

Published by

Springer

Related documents

https://link.springer.com/chapter/10.1007/978-3-032-05281-0_10

info:eu-repo/grantAgreement/EC/H2020/955895/EU/Data Engineering for Data Science/DEDS

info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/

info:eu-repo/grantAgreement/EC/HE/101093164/EU/EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions/ExtremeXP


Rights

Restricted access - publisher's policy
