Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations

Kocak, Burak; Klontzas, Michail E.; Stanzione, Arnaldo; Meddeb, Aymen; Demircioğlu, Aydın; Bluethgen, Christian; Bressem, Keno K.; Ugga, Lorenzo; Mercaldo, Nathaniel; Díaz, Oliver; Cuocolo, Renato

dc.contributor.author

Kocak, Burak

dc.contributor.author

Klontzas, Michail E.

dc.contributor.author

Stanzione, Arnaldo

dc.contributor.author

Meddeb, Aymen

dc.contributor.author

Demircioğlu, Aydın

dc.contributor.author

Bluethgen, Christian

dc.contributor.author

Bressem, Keno K.

dc.contributor.author

Ugga, Lorenzo

dc.contributor.author

Mercaldo, Nathaniel

dc.contributor.author

Díaz, Oliver

dc.contributor.author

Cuocolo, Renato

dc.date.accessioned

2026-03-06T02:43:50Z

dc.date.available

2026-03-06T02:43:50Z

dc.date.issued

2026-03-04T12:03:38Z

dc.date.issued

2025-09

dc.identifier

https://hdl.handle.net/2445/227851

dc.identifier

766730

dc.identifier.uri

https://hdl.handle.net/2445/227851

dc.description.abstract

Robust assessment of artificial intelligence (AI) models in medical imaging is paramount for reliable clinical integration. This international collaborative review paper provides an overview of key evaluation metrics across diverse tasks, including classification, regression, survival analysis, detection, and segmentation, as well as specialized metrics for calibration, foundation models, large language models, and synthetic images. Challenges of comparing models statistically and translating metric scores to clinical practice are also discussed. For each section, the paper outlines fundamental metrics, identifies common pitfalls and misapplications, and offers recommendations for more robust evaluations. Key recommendations often involve utilizing multiple, complementary metrics tailored to the specific task and dataset properties, transparent reporting of methodology, and critically, considering the clinical utility and real-world implications of model performance. Ultimately, effective evaluation requires a comprehensive, context-aware approach that goes beyond statistical metrics to ensure.

dc.format

24 p.

dc.format

application/pdf

dc.language

eng

dc.publisher

Elsevier B.V.

dc.relation

Reproduction of the document published at: https://doi.org/10.1016/j.ejrai.2025.100030

dc.relation

European Journal of Radiology Artificial Intelligence, 2025, vol. 3, p. 100030

dc.relation

https://doi.org/10.1016/j.ejrai.2025.100030

dc.rights

cc-by (c) Burak Kocak et al., 2025

dc.rights

http://creativecommons.org/licenses/by/4.0/

dc.rights

info:eu-repo/semantics/openAccess

dc.subject

Intel·ligència artificial en medicina

dc.subject

Diagnòstic per la imatge

dc.subject

Aprenentatge automàtic

dc.subject

Algorismes computacionals

dc.subject

Medical artificial intelligence

dc.subject

Diagnostic imaging

dc.subject

Machine learning

dc.subject

Computer algorithms

dc.title

Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations

dc.type

info:eu-repo/semantics/article

dc.type

info:eu-repo/semantics/publishedVersion
