Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction
Keywords: Type 2 diabetes; Random Forest; Feature learning; Predictive model; Gini importance

dc.contributor
Ministerio de Economía y Competitividad (Spain)
dc.contributor.author
López Ibáñez, Beatriz
dc.contributor.author
Torrent-Fontbona, Ferran
dc.contributor.author
Viñas, Ramon
dc.contributor.author
Fernández-Real Lemos, José Manuel
dc.date.accessioned
2024-06-18T14:38:46Z
dc.date.available
2024-06-18T14:38:46Z
dc.date.issued
info:eu-repo/date/embargoEnd/2018-09-22
dc.date.issued
2018-04
dc.identifier
http://hdl.handle.net/10256/14506
dc.identifier.uri
http://hdl.handle.net/10256/14506
dc.description.abstract
Objective: Artificial intelligence techniques are used in medical research to find out which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease, as such techniques may aid early diagnosis and help in prescribing preventive measures. In particular, the aim is to help physicians identify the SNPs relevant to Type 2 diabetes and to build a decision-support tool for risk prediction. Methods: We use the Random Forest (RF) technique to search for the most important attributes (SNPs) related to diabetes, giving each attribute a weight (degree of importance) between 0 and 1. Support Vector Machines and Logistic Regression, two other machine learning techniques that are well established in the health community, have also been used, and their performance has been compared with that achieved by RF. Furthermore, the attribute relevance obtained with RF has been used to perform predictions with the k-Nearest Neighbour method, weighting the attributes in the similarity measure according to that relevance. Results: Testing is performed on a set of 677 subjects. RF is able to handle the complexity of feature interactions, overfitting, and unknown attribute values, and provides the SNPs' relevance while achieving an area under the ROC curve of up to 0.89 for risk prediction. RF outperforms all the other tested machine learning techniques in terms of prediction accuracy and of the stability of the estimated attribute relevance. Conclusions: The Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.
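The relevance-weighted k-Nearest Neighbour step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 0/1/2 genotype coding, the weighted-mismatch distance, the toy samples, and the weight values are all assumptions made for illustration. In the paper the weights are learned with RF; here they are simply given as a hypothetical list of values in [0, 1].

```python
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted mismatch distance between two genotype vectors.

    Each attribute adds its weight when the two genotypes differ.
    Positions where either value is None (unknown) are skipped.
    This distance is an illustrative assumption, not the paper's measure.
    """
    mismatch = 0.0
    total = 0.0
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue
        total += w
        if x != y:
            mismatch += w
    # With no comparable positions, treat the pair as maximally distant.
    return mismatch / total if total > 0 else 1.0


def knn_predict(query, samples, labels, weights, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: 4 hypothetical SNPs coded 0/1/2; label 1 = case, 0 = control.
samples = [
    [2, 0, 1, 0],
    [2, 0, 1, 2],
    [0, 0, 1, 0],
    [0, 1, 0, 2],
]
labels = [1, 1, 0, 0]

# Hypothetical relevance weights in [0, 1]; the first SNP dominates.
weights = [0.9, 0.05, 0.03, 0.02]

query = [2, 1, 0, 2]
weighted_prediction = knn_predict(query, samples, labels, weights, k=1)
uniform_prediction = knn_predict(query, samples, labels, [1, 1, 1, 1], k=1)
```

In this toy setting the two predictions differ: with uniform weights the nearest neighbour is the control that matches the query on three low-relevance SNPs, whereas with the relevance weights the nearest neighbour is a case that shares the highly weighted first SNP.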
dc.description.abstract
This work was supported by the European Union's Horizon 2020 research and innovation programme [grant number 689810, PEPPER]; the University of Girona [grant number MPCUdG2016]; and the Spanish MINECO [grant number DPI2013-47450-C2-1-R].
dc.format
application/pdf
dc.language
eng
dc.publisher
Elsevier
dc.relation
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.artmed.2017.09.005
dc.relation
info:eu-repo/semantics/altIdentifier/issn/0933-3657
dc.relation
info:eu-repo/semantics/altIdentifier/eissn/1873-2860
dc.relation
info:eu-repo/grantAgreement/MINECO//DPI2013-47450-C2-1-R/ES/PLATAFORMA PARA LA MONITORIZACION Y EVALUACION DE LA EFICIENCIA DE LOS SISTEMAS DE DISTRIBUCION EN SMART CITIES/
dc.relation
info:eu-repo/grantAgreement/EC/H2020/689810/EU/Patient Empowerment through Predictive PERsonalised decision support/PEPPER
dc.rights
All rights reserved
dc.rights
info:eu-repo/semantics/openAccess
dc.source
© Artificial Intelligence in Medicine, 2017, vol. 85, p. 45-49
dc.source
Published articles (D-EEEiA)
dc.source
López Ibáñez, Beatriz; Torrent-Fontbona, Ferran; Viñas, Ramon; Fernández-Real Lemos, José Manuel. 2017. Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction. Artificial Intelligence in Medicine. Keywords: Type 2 diabetes; Random Forest; Feature learning; Predictive model; Gini importance
dc.subject
Diabetis no-insulinodependent.
dc.subject
Non-insulin-dependent diabetes.
dc.subject
Diàtesi
dc.subject
Disease susceptibility
dc.subject
Intel·ligència artificial -- Aplicacions a la medicina
dc.subject
Artificial intelligence -- Medical applications
dc.title
Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction. Keywords: Type 2 diabetes; Random Forest; Feature learning; Predictive model; Gini importance
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/acceptedVersion
dc.type
peer-reviewed

