Cardiometabolic risk estimation using exposome data and machine learning

dc.contributor.author
Atehortúa, Angélica
dc.contributor.author
Gkontra, Polyxeni
dc.contributor.author
Camacho, Marina
dc.contributor.author
Díaz, Oliver
dc.contributor.author
Bulgheroni, Maria
dc.contributor.author
Simonetti, Valentina
dc.contributor.author
Chadeau-Hyam, Marc
dc.contributor.author
Felix, Janine F.
dc.contributor.author
Sebert, Sylvain
dc.contributor.author
Lekadir, Karim, 1977-
dc.date.issued
2025-02-20T08:24:58Z
dc.date.issued
2023-11
dc.date.issued
2025-02-20T08:24:59Z
dc.identifier
1386-5056
dc.identifier
https://hdl.handle.net/2445/219023
dc.identifier
742387
dc.description.abstract
Background: The human exposome encompasses all exposures that individuals encounter throughout their lifetime. It is now widely acknowledged that health outcomes are influenced not only by genetic factors but also by the interactions between these factors and various exposures. Consequently, the exposome has emerged as a significant contributor to the overall risk of developing major diseases, such as cardiovascular disease (CVD) and diabetes. Personalized early risk assessment based on exposome attributes might therefore be a promising tool for identifying high-risk individuals and improving disease prevention.
Objective: To develop and evaluate a novel and fair machine learning (ML) model for CVD and type 2 diabetes (T2D) risk prediction based on a set of readily available exposome factors. We evaluated our model using internal and external validation groups from a multi-center cohort. To be considered fair, the model was required to demonstrate consistent performance across different subgroups of the cohort.
Methods: From the UK Biobank, we identified 5,348 and 1,534 participants who were diagnosed with CVD and T2D, respectively, within 13 years of the baseline visit. An equal number of participants who did not develop these pathologies were randomly selected as the control group. We considered 109 readily available exposure variables from six categories (physical measures, environmental, lifestyle, mental health events, sociodemographics, and early-life factors), all recorded at the participant’s baseline visit. We adopted the XGBoost ensemble model to predict individuals at risk of developing the diseases. The model’s performance was compared to that of an integrative ML model based on a set of biological, clinical, physical, and sociodemographic variables and, for CVD, additionally to the Framingham risk score. Moreover, we assessed the proposed model for potential bias related to sex, ethnicity, and age. Lastly, we interpreted the model’s results using SHAP, a state-of-the-art explainability method.
Results: The proposed ML model performs comparably to the integrative ML model despite using solely exposome information, achieving a ROC-AUC of 0.78 ± 0.01 and 0.77 ± 0.01 for CVD and T2D, respectively. For CVD risk prediction, the exposome-based model also outperforms the traditional Framingham risk score. No bias in terms of key sensitive variables was identified.
Conclusions: We identified exposome factors that play an important role in identifying patients at risk of CVD and T2D, such as naps during the day, age completed full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status. Overall, this work demonstrates the potential of exposome-based machine learning as a fair CVD and T2D risk assessment tool.
dc.format
12 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Elsevier B.V.
dc.relation
Reproduction of the document published at: https://doi.org/10.1016/j.ijmedinf.2023.105209
dc.relation
International Journal of Medical Informatics, 2023, vol. 179, p. 105209
dc.relation
https://doi.org/10.1016/j.ijmedinf.2023.105209
dc.rights
cc-by-nc-nd (c) Angélica Atehortúa et al., 2023
dc.rights
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Mathematics and Computer Science)
dc.subject
Malalties cardiovasculars
dc.subject
Diabetis
dc.subject
Aprenentatge automàtic
dc.subject
Cardiovascular diseases
dc.subject
Diabetes
dc.subject
Machine learning
dc.title
Cardiometabolic risk estimation using exposome data and machine learning
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion

