Agencia Estatal de Investigación
2025-10
Machine learning regression models are increasingly used to improve management, decision-making, and monitoring of drinking water quality, leveraging growing data from real-time sensors and laboratory analyses. However, most models provide only point predictions, ignoring inherent uncertainty caused by unobserved factors that can produce varying outcomes under similar conditions. This study benchmarks state-of-the-art regression algorithms and uncertainty quantification methods for predicting E. coli concentrations in a drinking water catchment. Gradient-boosted decision trees (GBDT) proved effective for real-time tracking, with CatBoost achieving the lowest error (RMSLE = 0.877), improving on the naïve baseline (1.160) and outperforming Random Forest by 5%. Uncertainty quantification techniques successfully generated valid prediction intervals to identify high-risk contamination events, with Conformalized Quantile Regression emerging as the most reliable method. By combining accurate GBDT predictions with well-calibrated uncertainty estimates, this approach enhances microbial water quality forecasting, offering improved risk assessment and supporting more robust decision-making in drinking water management.
David Abert-Fernández thanks UdG for a predoctoral grant under the program IFUdG2023/2, and Hèctor Monclús gratefully acknowledges the Ramón y Cajal Research Fellowship (RYC2019-026434-I). This study was supported by the ShERLOcK (PID2020-112615RA-I00) and WaterCLUE (CNS2023-143664) projects, financed by the Ministerio de Ciencia e Innovación (MICIN) and Agencia Estatal de Investigación (AEI) (Spain). LEQUIA has been recognized as a consolidated research group by the Catalan government (2021-SGR-1352). Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier.
6
Article
Published version
peer-reviewed
English
Aprenentatge automàtic; Machine learning; Incertesa -- Models matemàtics; Uncertainty -- Mathematical models; Aigua potable; Drinking water; Aigua -- Qualitat; Water quality
Elsevier
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jwpe.2025.108734
info:eu-repo/semantics/altIdentifier/eissn/2214-7144
PID2020-112615RA-I00
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112615RA-I00/ES/DESARROLLO DE UNA METODOLOGIA PARA UNA GESTION RESILIENTE EN LOS SISTEMAS DE TRATAMIENTO DE AGUA POTABLE. DE LA INVESTIGACION APLICADA A LA VALIDACION A ESCALA REAL/
Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/