dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial
dc.contributor
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.contributor
Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
dc.contributor.author
Estupiñan-Ojeda, Cristian
dc.contributor.author
Sandomingo Freire, Raúl Jesús
dc.contributor.author
Padró, Lluís
dc.contributor.author
Turmo Borras, Jorge
dc.identifier
Estupiñan-Ojeda, C. [et al.]. High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes. «JAMIA Open», October 2025, vol. 8, no. 5, article ooaf120.
dc.identifier
https://hdl.handle.net/2117/446589
dc.identifier
10.1093/jamiaopen/ooaf120
dc.description.abstract
Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.
Materials and Methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).
Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages.
Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.
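The low-rank adapter idea behind the PEFT techniques compared above (LoRA and its variants) can be sketched in plain PyTorch: the pretrained weight is frozen and only a small low-rank update is trained. The layer size, rank, and scaling below are illustrative assumptions, not the paper's configuration (which used ranks of 128 and above on full transformer models).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze the base weight W and learn a
    low-rank update B @ A. Illustrative only; production setups
    (e.g. the peft library) add dropout, per-module targeting,
    quantization (QLoRA), and weight-merging logic."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # base output plus scaled low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total}")  # only A and B receive gradients
```

With a 768x768 layer and rank 8, only 12 288 of roughly 600 000 parameters are trainable, which is the source of the memory and parameter savings the abstract reports.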
dc.description.abstract
This research was supported by the Spanish Ministry of Science and Innovation, through project TADIA-MED (https://futur.upc.edu/28881334/), grant number [PID2019-106942RB-C33].
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (published version)
dc.format
application/pdf
dc.publisher
Oxford University Press
dc.relation
https://academic.oup.com/jamiaopen/article/8/5/ooaf120/8287824
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-106942RB-C33/ES/ANALISIS DE TEXTO MEDICO PARA LA ASSISTENCIA A LA PREDICCION DE DIAGNOSIS/
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
Attribution 4.0 International
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject
Natural language processing
dc.subject
Joint entity recognition and linking
dc.subject
Parameter-efficient fine-tuning
dc.title
High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes