Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics

Ferreira Moreira, Pedro José

dc.contributor.author

Ferreira Moreira, Pedro José

dc.date.accessioned

2025-11-05T20:23:58Z

dc.date.available

2025-11-05T20:23:58Z

dc.date.issued

2025-11-04T17:38:20Z

dc.date.issued

2025

dc.identifier

http://hdl.handle.net/10230/71770

dc.identifier.uri

http://hdl.handle.net/10230/71770

dc.description.abstract

Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)

dc.description.abstract

Supervisors: Vicenç Gómez & Leo Anthony Celi. Academic tutor: Vicenç Gómez

dc.description.abstract

Transformer-scale language models can now ace many medical exams, but their frozen parametric memory risks propagating outdated guidelines and systemic bias to the bedside. To counter this, we re-imagine the diagnostic assistant as a navigator that plans, retrieves, executes code, and verifies evidence rather than guessing from memory. We introduce DeepMed, a 4B-parameter multi-agent framework that attempts to shift the paradigm of medical assistants from diagnostic oracles to information retrievers. Agents invoke external tools via the open Model Context Protocol (MCP), including M3, a natural-language gateway to the MIMIC-IV EHR, and a sandboxed Python REPL for on-the-fly calculations. Performance is audited on the newly proposed MedBrowseComp benchmark (1,089 quarterly-regenerating, multi-hop, oncology-related queries), legacy QA suites, the EquityMedQA counterfactual set, and the EHRSQL challenge. With just a 4-billion-parameter LLM as the cognitive engine, DeepMed achieves 26% single-pass accuracy on MedBrowseComp, outperforming larger enterprise-grade systems that rely on fine-tuned models 10 to 100 times larger, while running locally on a consumer laptop. On EquityMedQA it increases correctness from 50.8% to 57.4%, a 13% relative reduction in demographic disparity. Coupling MCP to the M3 EHR interface lifts pass@1 on EHRSQL from 2% to 9%. By fusing agentic planning, typed tool use, and evidence-first reporting, DeepMed shows that bias-aware, verifiable clinical AI can be achieved without frontier-scale models or costly GPU clusters. The open-sourced multi-agent framework, MCP server tools such as M3, and the MedBrowseComp benchmark provide a reproducible path toward transparent, low-cost decision support in safety-critical healthcare settings.
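
The abstract does not say how the 13% relative figure was derived. As a rough check, the sketch below recomputes relative changes from the reported percentages only. The helper name `relative_change` and the two readings of the EquityMedQA figure (relative gain in correctness, relative drop in error rate) are assumptions for illustration, not the thesis's stated method.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Percentages as reported in the abstract.
equity_before, equity_after = 50.8, 57.4   # EquityMedQA correctness (%)
ehrsql_before, ehrsql_after = 2.0, 9.0     # EHRSQL pass@1 (%)

# Reading 1 (assumption): relative gain in correctness.
equity_gain = relative_change(equity_before, equity_after)

# Reading 2 (assumption): relative drop in the error rate (100 - correctness).
error_drop = relative_change(100 - equity_before, 100 - equity_after)

# EHRSQL: absolute gain in percentage points and the ratio of the two scores.
ehrsql_points = ehrsql_after - ehrsql_before
ehrsql_ratio = ehrsql_after / ehrsql_before

print(f"EquityMedQA relative gain in correctness: {equity_gain:.1%}")
print(f"EquityMedQA relative change in error rate: {error_drop:.1%}")
print(f"EHRSQL pass@1: +{ehrsql_points:.0f} points, {ehrsql_ratio:.1f}x")
```

Both readings land near 13%, so the reported numbers are consistent with either interpretation; the full thesis would be needed to confirm which one the author intends.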

dc.format

application/pdf

dc.language

eng

dc.rights

CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)

dc.rights

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.rights

info:eu-repo/semantics/openAccess

dc.subject

Multi-agent systems

dc.title

Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics

dc.type

info:eu-repo/semantics/masterThesis

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Treballs d'estudiants [4945]