Universitat Politècnica de Catalunya. Doctorat en Computació
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering
2025
Among the many aspects of data quality, determining the reliability of conflicting values from different sources is especially challenging. Traditional data fusion approaches often infer correct values in simple cases, but struggle to handle variations in data granularity (such as differences in temporal, spatial, or categorical aggregations) and offer limited insight into the nature of disagreements. We therefore propose a new source evaluation approach for numerical attributes that measures discordance (i.e., the extent to which sources differ from each other). Unlike existing methods that focus solely on point estimation, our approach supports both fine-grained and coarse-grained analysis, enabling more sophisticated data quality assessments. We employ a linear programming solver that transparently adapts to any data alignment expressed in a set of operators resembling relational algebra. Extensive experiments on real-world datasets demonstrate that our method generalizes existing truth discovery techniques, measuring differences with Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), and can adapt to diverse and complex scenarios.
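As a rough illustration of what measuring discordance with MAE and RMSE can look like, the following minimal sketch scores each source by its deviation from a per-item consensus value. It is not the method described in the abstract: it uses no linear programming solver, assumes all sources already report at the same granularity, and takes the per-item median as the consensus. The data, the source and item names, and the `discordance` function are hypothetical.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical toy data: three sources reporting a numerical value per item.
reports = {
    "source_a": {"item1": 10.0, "item2": 20.0},
    "source_b": {"item1": 12.0, "item2": 20.0},
    "source_c": {"item1": 11.0, "item2": 26.0},
}


def discordance(reports, consensus=median):
    """Score each source by its MAE and RMSE against a per-item consensus.

    Assumption: the consensus is simply the median (or another aggregate
    passed in) of the values reported for each item.
    """
    items = sorted({item for values in reports.values() for item in values})
    reference = {
        item: consensus([v[item] for v in reports.values() if item in v])
        for item in items
    }
    scores = {}
    for name, values in reports.items():
        errors = [values[item] - reference[item] for item in values]
        scores[name] = {
            "mae": mean([abs(e) for e in errors]),
            "rmse": sqrt(mean([e * e for e in errors])),
        }
    return scores


scores = discordance(reports)
for name, score in scores.items():
    print(name, score)
```

The median is the value that minimizes the sum of absolute deviations and the mean is the value that minimizes the sum of squared deviations, so passing `consensus=mean` gives the reference that matches the RMSE view.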
Y. A. Akter is funded by the EC Horizon 2020 research and innovation programme (DEDS: grant agreement No 955895). A. Abelló and P. Jovanovic are funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033 (DOGO4ML) and the EC Horizon Europe programme (ExtremeXP: grant agreement No 101093164).
Peer Reviewed
Postprint (author's final draft)
Conference report
English
Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació; Data fusion; Truth discovery; Discordance; Linear programming
Springer
https://link.springer.com/chapter/10.1007/978-3-032-05281-0_10
info:eu-repo/grantAgreement/EC/H2020/955895/EU/Data Engineering for Data Science/DEDS
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/
info:eu-repo/grantAgreement/EC/HE/101093164/EU/EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions/ExtremeXP
Restricted access - publisher's policy