Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations

dc.contributor.author
Kocak, Burak
dc.contributor.author
Klontzas, Michail E.
dc.contributor.author
Stanzione, Arnaldo
dc.contributor.author
Meddeb, Aymen
dc.contributor.author
Demircioğlu, Aydın
dc.contributor.author
Bluethgen, Christian
dc.contributor.author
Bressem, Keno K.
dc.contributor.author
Ugga, Lorenzo
dc.contributor.author
Mercaldo, Nathaniel
dc.contributor.author
Díaz, Oliver
dc.contributor.author
Cuocolo, Renato
dc.date.accessioned
2026-03-06T02:43:50Z
dc.date.available
2026-03-06T02:43:50Z
dc.date.issued
2026-03-04T12:03:38Z
dc.date.issued
2025-09
dc.identifier
https://hdl.handle.net/2445/227851
dc.identifier
766730
dc.identifier.uri
https://hdl.handle.net/2445/227851
dc.description.abstract
Robust assessment of artificial intelligence (AI) models in medical imaging is paramount for reliable clinical integration. This international collaborative review paper provides an overview of key evaluation metrics across diverse tasks, including classification, regression, survival analysis, detection, and segmentation, as well as specialized metrics for calibration, foundation models, large language models, and synthetic images. Challenges of comparing models statistically and translating metric scores to clinical practice are also discussed. For each section, the paper outlines fundamental metrics, identifies common pitfalls and misapplications, and offers recommendations for more robust evaluations. Key recommendations often involve utilizing multiple, complementary metrics tailored to the specific task and dataset properties, transparent reporting of methodology, and critically, considering the clinical utility and real-world implications of model performance. Ultimately, effective evaluation requires a comprehensive, context-aware approach that goes beyond statistical metrics to ensure.
dc.format
24 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Elsevier B.V.
dc.relation
Reproduction of the document published at: https://doi.org/10.1016/j.ejrai.2025.100030
dc.relation
European Journal of Radiology Artificial Intelligence, 2025, vol. 3, p. 100030
dc.relation
https://doi.org/10.1016/j.ejrai.2025.100030
dc.rights
cc-by (c) Burak Kocak et al., 2025
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.subject
Intel·ligència artificial en medicina
dc.subject
Diagnòstic per la imatge
dc.subject
Aprenentatge automàtic
dc.subject
Algorismes computacionals
dc.subject
Medical artificial intelligence
dc.subject
Diagnostic imaging
dc.subject
Machine learning
dc.subject
Computer algorithms
dc.title
Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion