The Data Artifacts Glossary : a community-based repository for bias on health datasets

Gameiro, Rodrigo R.; Woite, Naira Link; Sauer, Christopher M.; Hao, Sicheng; Fernandes, Chrystinne; Premo, Anna E.; Teixeira, Alice Rangel; Resli, Isabelle; Wong, An-Kwok Ian; Celi, Leo Anthony


dc.contributor.author

Gameiro, Rodrigo R.

dc.contributor.author

Woite, Naira Link

dc.contributor.author

Sauer, Christopher M.

dc.contributor.author

Hao, Sicheng

dc.contributor.author

Fernandes, Chrystinne

dc.contributor.author

Premo, Anna E.

dc.contributor.author

Teixeira, Alice Rangel

dc.contributor.author

Resli, Isabelle

dc.contributor.author

Wong, An-Kwok Ian

dc.contributor.author

Celi, Leo Anthony

dc.date.issued

2025

dc.identifier

https://ddd.uab.cat/record/321027

dc.identifier

urn:10.1186/s12929-024-01106-6

dc.identifier

urn:oai:ddd.uab.cat:321027

dc.identifier

urn:pmcid:PMC11792693

dc.identifier

urn:pmc-uid:11792693

dc.identifier

urn:pmid:39901158

dc.identifier

urn:oai:pubmedcentral.nih.gov:11792693

dc.identifier

urn:articleid:14230127v32p14

dc.description.abstract

The deployment of Artificial Intelligence (AI) in healthcare has the potential to transform patient care through improved diagnostics, personalized treatment plans, and more efficient resource management. However, the effectiveness and fairness of AI are critically dependent on the data it learns from. Biased datasets can lead to AI outputs that perpetuate disparities, particularly affecting social minorities and marginalized groups. This paper introduces the "Data Artifacts Glossary", a dynamic, open-source framework designed to systematically document and update potential biases in healthcare datasets. The aim is to provide a comprehensive tool that enhances the transparency and accuracy of AI applications in healthcare and contributes to understanding and addressing health inequities. Utilizing a methodology inspired by the Delphi method, a diverse team of experts conducted iterative rounds of discussions and literature reviews. The team synthesized insights to develop a comprehensive list of bias categories and designed the glossary's structure. The Data Artifacts Glossary was piloted using the MIMIC-IV dataset to validate its utility and structure. The Data Artifacts Glossary adopts a collaborative approach modeled on successful open-source projects such as Linux and Python. Hosted on GitHub, it utilizes robust version control and collaborative features, allowing stakeholders from diverse backgrounds to contribute. Through a rigorous peer review process managed by community members, the glossary ensures the continual refinement and accuracy of its contents. The implementation of the Data Artifacts Glossary with the MIMIC-IV dataset illustrates its utility: it categorizes biases and facilitates their identification and understanding. The Data Artifacts Glossary serves as a vital resource for enhancing the integrity of AI applications in healthcare by providing a mechanism to recognize and mitigate dataset biases before they impact AI outputs. It not only aids in avoiding bias in model development but also contributes to understanding and addressing the root causes of health disparities.

dc.format

application/pdf

dc.language

eng

dc.publisher

dc.relation

Journal of Biomedical Science ; Vol. 32, art. 14 (February 2025)

dc.rights

open access

dc.rights

This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work, as well as the creation of derivative works, are permitted, including for commercial purposes, provided that the authorship of the original work is acknowledged.

dc.rights

https://creativecommons.org/licenses/by/4.0/

dc.subject

Bias

dc.subject

Health equity

dc.subject

Dataset

dc.subject

Data Artifacts Glossary

dc.subject

Artificial intelligence

dc.subject

Machine learning

dc.title

The Data Artifacts Glossary : a community-based repository for bias on health datasets

dc.type

Article


This item appears in the following Collection(s)

Articles written by UAB staff or published by the university [88076]