Benchmarking Machine Learning Algorithms for Microbial Electro-methanogenesis: A Comprehensive Assessment with SHAP-based Insights

Abstract

Microbial electromethanogenesis (EM) presents a promising pathway for sustainable biogas upgrading, but accurately predicting its performance is challenging due to complex, nonlinear process dynamics. Here, we systematically compared seven supervised machine learning (ML) algorithms, including one-dimensional convolutional neural network (1D-CNN), multilayer perceptron (MLP), gradient boosting regressor (GBR), adaptive boosting regressor (AdaBoost), stacking regressors, and K-nearest neighbors (kNN), for their ability to predict biomethane production using experimental data from EM bioelectrochemical systems (EM-BESs). The data set encompassed operational parameters such as optical density (OD600), pH, electrical conductivity (EC, mS/cm), average applied current (A m⁻²), and CO₂ availability (mol). After hyperparameter optimization, the 1D-CNN model exhibited superior predictive performance (R² = 0.934), significantly outperforming traditional ML methods. To move beyond prediction and uncover mechanistic insights, a feature importance analysis was conducted on the CNN model using SHapley Additive exPlanations (SHAP). The analysis revealed that average current, OD600, and pH were the most influential features in biomethane production, confirming that the model learned relationships grounded in fundamental bioelectrochemical principles. The SHAP analysis also identified complex, nonmonotonic effects of other variables, providing deeper process understanding. This study not only demonstrates the promising ability of ML, especially deep learning architectures, to advance EM optimization but also provides mechanistic insights into the factors governing bioelectrochemical methanogenesis.
These findings are broadly applicable to analogous BESs, particularly microbial electrosynthesis (i.e., commodity chemical) and microbial electrolysis cells (i.e., biohydrogen), offering potential for enhancing system performance through data-driven operational control across sustainable biotechnology applications.
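
As a rough illustration of the attribution idea behind SHAP, the sketch below computes exact Shapley values by brute force for a small five-feature function. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in rather than the study's 1D-CNN, the coefficients and the input and baseline values are arbitrary, and features are "removed" by replacing them with a single baseline value, which is a simplification of what the SHAP library does in practice.

```python
from itertools import combinations
from math import factorial

# Feature names follow the abstract; the values used below are made up.
FEATURES = ["OD600", "pH", "EC", "current", "CO2"]


def toy_model(x):
    """Hypothetical stand-in for a trained regressor (not the paper's CNN)."""
    od, ph, ec, current, co2 = x
    # Arbitrary illustrative terms, including a nonmonotonic pH effect
    # and an EC x CO2 interaction.
    return 2.0 * current + 1.5 * od - (ph - 7.0) ** 2 + 0.1 * ec * co2


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` are "present".
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Assumed operating point and reference point (illustrative only).
x = [0.8, 6.5, 12.0, 3.0, 0.5]
baseline = [0.4, 7.0, 10.0, 1.0, 0.2]

phi = shapley_values(toy_model, x, baseline)
for name, p in zip(FEATURES, phi):
    print(f"{name}: {p:+.3f}")
```

The attributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that gives SHAP its name. Brute-force enumeration is only practical for a handful of features; with five inputs it needs 32 coalitions per feature pair, which is why approximate explainers are normally used for larger models.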


Siddharth Gadkari would like to acknowledge the financial support from the Natural Environment Research Council (NERC) UK project grant: NE/W003627/1. Sebastià Puig (S.P.) and Silvia Bolognesi’s time was funded by the Spanish Ministry of Science (AEI) through the grant Proyectos de Transición Ecológica y Digital (TED2021-129452B-I00) in the framework of the De-Cent project. S.P. is a Serra Hunter Fellow (UdG-AG-575) and gratefully acknowledges the funding from the AGAUR–ICREA Academia Programme, supported by the Department of Research and Universities of the Government of Catalonia. LEQUIA has been recognized by the Catalan Government (ref. 2021 SGR01352). We would also like to thank the Surrey Institute for People-Centred AI and the National Council for Scientific and Technological Development (CNPq) for their institutional support. Erick G. Sperandio Nascimento is a CNPq technological development fellow (Proc. 308963/2022-9).

Document Type

Article


Published version


peer reviewed

Language

English

Related items

info:eu-repo/semantics/altIdentifier/doi/10.1021/acssuschemeng.5c09770

info:eu-repo/semantics/altIdentifier/issn/2168-0485

info:eu-repo/semantics/altIdentifier/eissn/2168-0485

Rights

Attribution 4.0 International

http://creativecommons.org/licenses/by/4.0