Evaluating quality of disparate data sources: A discord-driven approach

dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Computació
dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Services, Information and Data Engineering
dc.contributor.author
Akter, Yeasmin Ara
dc.contributor.author
Abelló Gamazo, Alberto
dc.contributor.author
Jovanovic, Petar
dc.contributor.author
Sagi, Tomer
dc.contributor.author
Hose, Katja
dc.date.accessioned
2026-03-03T00:25:41Z
dc.date.available
2026-03-03T00:25:41Z
dc.date.issued
2025
dc.identifier
Akter, Y. [et al.]. Evaluating quality of disparate data sources: A discord-driven approach. In: European Conference on Advances in Databases and Information Systems. «Advances in Databases and Information Systems: 29th European Conference, ADBIS 2025: Tampere, Finland, September 23–26, 2025: proceedings». Springer, 2025, p. 147-163. ISBN 978-3-032-05281-0. DOI 10.1007/978-3-032-05281-0_10.
dc.identifier
978-3-032-05281-0
dc.identifier
https://hdl.handle.net/2117/456283
dc.identifier
10.1007/978-3-032-05281-0_10
dc.identifier.uri
https://hdl.handle.net/2117/456283
dc.description.abstract
Among other measures of data quality, determining the reliability of conflicting values from different sources is especially challenging. Traditional data fusion approaches often infer correct values in simple cases, but struggle to handle variations in data granularity (such as differences in temporal, spatial, or categorical aggregations) and offer limited insight into the nature of disagreements. Thus, we propose a new source evaluation approach for numerical attributes that measures discordance (i.e., the extent to which sources differ from each other). Unlike existing methods that focus solely on point estimation, we support both fine-grained and coarse-grained analysis, enabling more sophisticated data quality assessments. We employ a linear programming solver that transparently adapts to any data alignment expressed in a set of operators resembling relational algebra. Extensive experiments on real-world datasets demonstrate that our method generalizes existing truth discovery techniques measuring differences with Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), and can adapt to diverse and complex scenarios.
dc.description.abstract
Y. A. Akter is funded by the EC Horizon 2020 research and innovation programme (DEDS: grant agreement No 955895). A. Abelló and P. Jovanovic are funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00/AEI/10.13039/501100011033 (DOGO4ML) and the EC Horizon Europe programme (ExtremeXP: grant agreement No 101093164).
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (author's final draft)
dc.format
17 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Springer
dc.relation
https://link.springer.com/chapter/10.1007/978-3-032-05281-0_10
dc.relation
info:eu-repo/grantAgreement/EC/H2020/955895/EU/Data Engineering for Data Science/DEDS
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-117191RB-I00/ES/DESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO/
dc.relation
info:eu-repo/grantAgreement/EC/HE/101093164/EU/EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions/ExtremeXP
dc.rights
Restricted access - publisher's policy
dc.subject
Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació
dc.subject
Data fusion
dc.subject
Truth discovery
dc.subject
Discordance
dc.subject
Linear programming
dc.title
Evaluating quality of disparate data sources: A discord-driven approach
dc.type
Conference report


Files in this item

No files are associated with this item.

This item appears in the following collection(s)

E-prints [72263]