2026-01-30T15:05:48Z
2025
In prostate cancer (PCa), risk calculators relying on clinical parameters and magnetic resonance imaging (MRI) have been proposed to enable early prediction of clinically significant prostate cancer (CsPCa). In these calculators, the Prostate Imaging-Reporting and Data System (PI-RADS) is combined with clinical variables, predominantly through logistic regression models. This study explores regularized regression (ridge regression, LASSO and elastic net), a classification tree, tree ensemble models (random forest and XGBoost) and neural networks to predict CsPCa in a dataset of 4799 patients from Catalonia (Spain). An 80%/20% split was used for training and validation. Predictor variables included age, prostate-specific antigen (PSA), prostate volume, PSA density (PSAD), digital rectal examination (DRE) findings, family history of PCa, a previous negative biopsy, and PI-RADS category. At a sensitivity of 0.9 in the validation set, the XGBoost model outperforms the others with a specificity of 0.640, followed closely by random forest (0.638) and the neural network (0.634), while logistic regression reaches 0.620. In terms of clinical utility, when 10% of CsPCa cases are misclassified, XGBoost can avoid 41.77% of unnecessary biopsies, followed closely by random forest (41.67%) and neural networks (41.46%), whereas logistic regression avoids a lower 40.62%. Using SHAP values for model explainability, PI-RADS emerges as the most influential risk factor, particularly for individuals with PI-RADS 4 and 5. Additionally, a positive DRE or a family history of PCa proves highly influential for certain individuals, while a previous negative biopsy acts as a protective factor for others.
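The sensitivity-matched comparison described above (fix sensitivity at 0.9, then compare specificity and avoided biopsies) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names `threshold_at_sensitivity` and `operating_point` and the toy scores are hypothetical, and counting avoided unnecessary biopsies as true negatives divided by all patients is an assumption, since the abstract does not state the exact denominator.

```python
import numpy as np


def threshold_at_sensitivity(y_true, y_score, target_sens=0.9):
    """Highest threshold t such that flagging score >= t reaches target_sens.

    Hypothetical helper: patients with score >= t are sent to biopsy.
    """
    if not 0.0 < target_sens <= 1.0:
        raise ValueError("target_sens must be in (0, 1]")
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos_scores = np.sort(y_score[y_true])[::-1]  # CsPCa scores, highest first
    if pos_scores.size == 0:
        raise ValueError("need at least one CsPCa case")
    # Minimum number of CsPCa cases that must be flagged; the epsilon guards
    # against floating-point error in target_sens * n.
    k = max(1, int(np.ceil(target_sens * pos_scores.size - 1e-9)))
    return pos_scores[k - 1]


def operating_point(y_true, y_score, target_sens=0.9):
    """Sensitivity, specificity and share of avoided unnecessary biopsies."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    t = threshold_at_sensitivity(y_true, y_score, target_sens)
    biopsy = y_score >= t  # ties at the threshold are biopsied
    tp = np.sum(biopsy & y_true)
    fn = np.sum(~biopsy & y_true)
    tn = np.sum(~biopsy & ~y_true)
    fp = np.sum(biopsy & ~y_true)
    return {
        "threshold": float(t),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        # ASSUMPTION: avoided unnecessary biopsies = true negatives / all patients.
        "avoided_unnecessary": float(tn / y_true.size),
    }


# Toy validation set: 10 CsPCa cases (label 1) followed by 10 non-CsPCa cases.
y_true = [1] * 10 + [0] * 10
y_score = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
           0.70, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]

print(operating_point(y_true, y_score))
# {'threshold': 0.55, 'sensitivity': 0.9, 'specificity': 0.9, 'avoided_unnecessary': 0.45}
```

Substituting each model's validation-set probabilities for the toy scores would yield one comparable operating point per model. Flagging every score at or above the threshold is a design choice that keeps sensitivity at or above the target even when scores are tied.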
L.M.E., M.E.E., J.E.-E. and A.B.-F. received support from the Government of Aragon [Grant Number T69_23R]; L.M.E. and A.B.-F. received support from the Ministerio de Ciencia e Innovación [Grant Number PID2020-116873GB-I00]; J.M.A., P.S. and J.M. received support from the Instituto de Salud Carlos III and the European Union [Grant Number PI20/01666].
Article
Published version
English
Clinical utility; Clinically significant prostate cancer; Machine learning; SHAP values
Nature Research
Scientific Reports. 2025;15(1):4261
info:eu-repo/grantAgreement/ES/2PE/PID2020-116873GB-I00
© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
http://creativecommons.org/licenses/by-nc-nd/4.0/