Beyond point predictions: Quantifying uncertainty in E. coli ML-based monitoring

Other authors

Agencia Estatal de Investigación

Publication date

2025-10



Abstract

Machine learning regression models are increasingly used to improve management, decision-making, and monitoring of drinking water quality, leveraging growing data from real-time sensors and laboratory analyses. However, most models provide only point predictions, ignoring the inherent uncertainty caused by unobserved factors that can produce varying outcomes under similar conditions. This study benchmarks state-of-the-art regression algorithms and uncertainty quantification methods for predicting E. coli concentrations in a drinking water catchment. Gradient-boosted decision trees (GBDT) proved effective for real-time tracking, with CatBoost achieving the lowest error (RMSLE = 0.877), improving on the naïve baseline (1.160) and outperforming Random Forest by 5%. Uncertainty quantification techniques successfully generated valid prediction intervals to identify high-risk contamination events, with Conformalized Quantile Regression emerging as the most reliable method. By combining accurate GBDT predictions with well-calibrated uncertainty estimates, this approach enhances microbial water quality forecasting, offering improved risk assessment and supporting more robust decision-making in drinking water management.
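As a rough illustration of two quantities named in the abstract, the sketch below computes an RMSLE score and the calibration step of Conformalized Quantile Regression in plain Python. It is not the study's implementation: the function names (`rmsle`, `cqr_correction`, `cqr_interval`), the use of log(1 + x) in the error metric, and the miscoverage level `alpha` are assumptions made for this example, and the lower and upper quantile predictions are taken as given from any quantile regressor.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (assumes the log(1 + x) form)."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def cqr_correction(y_cal, lo_cal, hi_cal, alpha=0.1):
    """Conformal correction computed on a held-out calibration set.

    lo_cal and hi_cal are lower and upper quantile predictions for the
    calibration points; alpha is the target miscoverage (hypothetical value).
    """
    # Conformity score: positive when the observation falls outside the interval.
    scores = sorted(max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample quantile
    if k > n:
        return math.inf  # calibration set too small for the requested coverage
    return scores[k - 1]


def cqr_interval(lo, hi, q):
    """Widen (or shrink, if q is negative) a predicted interval by q."""
    return lo - q, hi + q
```

With a correction `q` obtained from the calibration set, `cqr_interval(lo, hi, q)` gives the adjusted interval for a new prediction. For non-negative targets such as concentrations, a user might additionally clip the lower bound at zero.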


David Abert-Fernández thanks UdG for a predoctoral grant under the program IFUdG2023/2, and Hèctor Monclús gratefully acknowledges the Ramón y Cajal Research Fellowship (RYC2019-026434-I). This study was supported by the ShERLOcK (PID2020-112615RA-I00) and WaterCLUE (CNS2023-143664) projects, financed by the Ministerio de Ciencia e Innovación (MICIN) and Agencia Estatal de Investigación (AEI) (Spain). LEQUIA has been recognized as a consolidated research group by the Catalan government (2021-SGR-1352). Open Access funding provided thanks to the CRUE-CSIC agreement with Elsevier.


Document Type

Article


Published version


peer-reviewed

Language

English

Publisher

Elsevier

Related items

info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jwpe.2025.108734

info:eu-repo/semantics/altIdentifier/eissn/2214-7144

PID2020-112615RA-I00

info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112615RA-I00/ES/DESARROLLO DE UNA METODOLOGIA PARA UNA GESTION RESILIENTE EN LOS SISTEMAS DE TRATAMIENTO DE AGUA POTABLE. DE LA INVESTIGACION APLICADA A LA VALIDACION A ESCALA REAL/

Rights

Attribution-NonCommercial 4.0 International

http://creativecommons.org/licenses/by-nc/4.0/