Beyond point predictions: Quantifying uncertainty in E. coli ML-based monitoring

Other authors

Agencia Estatal de Investigación

Publication date

2025-10



Abstract

Machine learning regression models are increasingly used to improve management, decision-making, and monitoring of drinking water quality, leveraging growing data from real-time sensors and laboratory analyses. However, most models provide only point predictions, ignoring inherent uncertainty caused by unobserved factors that can produce varying outcomes under similar conditions. This study benchmarks state-of-the-art regression algorithms and uncertainty quantification methods for predicting E. coli concentrations in a drinking water catchment. Gradient-boosted decision trees (GBDT) proved effective for real-time tracking, with CatBoost achieving the lowest error (RMSLE = 0.877), improving on the naïve baseline (1.160) and outperforming Random Forest by 5%. Uncertainty quantification techniques successfully generated valid prediction intervals to identify high-risk contamination events, with Conformalized Quantile Regression emerging as the most reliable method. By combining accurate GBDT predictions with well-calibrated uncertainty estimates, this approach enhances microbial water quality forecasting, offering improved risk assessment and supporting more robust decision-making in drinking water management.
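The two quantities named in the abstract, RMSLE and the prediction intervals from Conformalized Quantile Regression, can be illustrated with a minimal sketch. This is not the authors' code. It assumes the common definition of RMSLE based on log(1 + x) and the standard split-conformal form of Conformalized Quantile Regression, in which lower and upper quantile predictions from any already fitted model are adjusted by a margin computed on a held-out calibration set. The function names (`rmsle`, `cqr_margin`, `cqr_interval`) and the toy numbers are hypothetical.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x) on both sides."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def cqr_margin(cal_lower, cal_upper, cal_true, alpha):
    """Conformal correction computed on a held-out calibration set.

    Each score measures how far an observation falls outside its predicted
    quantile interval (the score is negative when it lies inside).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    scores = sorted(
        max(lo - y, y - hi)
        for lo, hi, y in zip(cal_lower, cal_upper, cal_true)
    )
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration points for the requested coverage: infinite margin.
    return scores[k - 1] if k <= n else math.inf


def cqr_interval(lower, upper, margin):
    """Widen (or shrink, if margin < 0) one quantile interval by the margin."""
    return lower - margin, upper + margin


# Toy numbers for illustration only; they are not data from the study.
margin = cqr_margin(
    [1.0, 2.0, 3.0, 4.0],   # predicted lower quantiles (calibration set)
    [3.0, 4.0, 5.0, 6.0],   # predicted upper quantiles (calibration set)
    [2.0, 5.0, 2.5, 7.0],   # observed values (calibration set)
    alpha=0.5,
)
print(cqr_interval(2.0, 4.0, margin))
```

Under the usual exchangeability assumption, intervals adjusted this way cover a new observation with probability of at least 1 - alpha. Whether the study used this exact variant is not stated in the abstract.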


David Abert-Fernández thanks UdG for a predoctoral grant under the program IFUdG2023/2, and Hèctor Monclús gratefully acknowledges the Ramón y Cajal Research Fellowship (RYC2019-026434-I). This study was supported by the ShERLOcK (PID2020-112615RA-I00) and WaterCLUE (CNS2023-143664) projects, financed by the Ministerio de Ciencia e Innovación (MICIN) and the Agencia Estatal de Investigación (AEI) (Spain). LEQUIA has been recognized as a consolidated research group by the Catalan government (2021-SGR-1352). Open Access funding was provided thanks to the CRUE-CSIC agreement with Elsevier.



Document type

Article


Published version


peer-reviewed

Language

English

Published by

Elsevier

Related documents

info:eu-repo/semantics/altIdentifier/doi/10.1016/j.jwpe.2025.108734

info:eu-repo/semantics/altIdentifier/eissn/2214-7144

PID2020-112615RA-I00

info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112615RA-I00/ES/DESARROLLO DE UNA METODOLOGIA PARA UNA GESTION RESILIENTE EN LOS SISTEMAS DE TRATAMIENTO DE AGUA POTABLE. DE LA INVESTIGACION APLICADA A LA VALIDACION A ESCALA REAL/

Recommended citation

This citation was generated automatically.

Rights

Attribution-NonCommercial 4.0 International

http://creativecommons.org/licenses/by-nc/4.0/

This item appears in the following collection(s)