Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics

dc.contributor.author
Ferreira Moreira, Pedro José
dc.date.accessioned
2025-11-05T20:23:58Z
dc.date.available
2025-11-05T20:23:58Z
dc.date.issued
2025-11-04T17:38:20Z
dc.date.issued
2025
dc.identifier
http://hdl.handle.net/10230/71770
dc.identifier.uri
http://hdl.handle.net/10230/71770
dc.description.abstract
Master's thesis for: Erasmus Mundus joint Master in Artificial Intelligence (EMAI)
dc.description.abstract
Supervisors: Vicenç Gómez & Leo Anthony Celi. Academic Tutor: Vicenç Gómez
dc.description.abstract
Transformer-scale language models can now ace many medical exams, but their frozen parametric memory risks propagating outdated guidelines and systemic bias to the bedside. To counter this, we reimagine the diagnostic assistant as a navigator that plans, retrieves information, executes code, and verifies evidence rather than guessing from memory. We introduce DeepMed, a 4B-parameter multi-agent framework that attempts to shift the paradigm of medical assistants from diagnostic oracles to information retrievers. Agents invoke external tools via the open Model Context Protocol (MCP), including M3, a natural-language gateway to the MIMIC-IV EHR, and a sandboxed Python REPL for on-the-fly calculations. Performance is audited on the newly proposed MedBrowseComp benchmark (1,089 multi-hop, oncology-related queries, regenerated quarterly), legacy QA suites, the EquityMedQA counterfactual set, and the EHRSQL challenge. With just a 4-billion-parameter LLM as its cognitive engine, DeepMed achieves 26% single-pass accuracy on MedBrowseComp, outperforming enterprise-grade systems that rely on fine-tuned models 10 to 100 times larger, while running locally on a consumer laptop. On EquityMedQA it increases correctness from 50.8% to 57.4%, a 13% relative reduction in demographic disparity. Coupling MCP to the M3 EHR interface lifts pass@1 on EHRSQL from 2% to 9%. By fusing agentic planning, typed tool use, and evidence-first reporting, DeepMed shows that bias-aware, verifiable clinical AI can be achieved without frontier-scale models or costly GPU clusters. The open-source multi-agent framework, the MCP server tool contributions such as M3, and the MedBrowseComp benchmark provide a reproducible path toward transparent, low-cost decision support in safety-critical healthcare settings.
dc.format
application/pdf
dc.language
eng
dc.rights
CC Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.subject
Multi-agent systems
dc.title
Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics
dc.type
info:eu-repo/semantics/masterThesis