<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-18T05:23:49Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/460646" metadataPrefix="marc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/460646</identifier><datestamp>2026-04-17T01:52:34Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452951</setSpec></header><metadata><record xmlns="http://www.loc.gov/MARC21/slim" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
   <leader>00925njm 22002777a 4500</leader>
   <datafield ind2=" " ind1=" " tag="042">
      <subfield code="a">dc</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Dié, Jean, Pierre, Dao-Koin</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="260">
      <subfield code="c">2026-01-27</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Large language models fine-tuned on domain-specific data are vulnerable to membership inference attacks, which can reveal whether particular examples were used in training. While prior work has established that fine-tuned models exhibit higher vulnerability than pre-trained models, this research has focused almost exclusively on endpoint comparisons: evaluating vulnerability only after fine-tuning is complete, without examining how it develops during training. This thesis investigates the progressive emergence of membership inference vulnerability across training epochs and its relationship with overfitting. We evaluate five membership inference attacks across five fine-tuning methods (full fine-tuning, LoRA, BitFit, adapter tuning, and prefix tuning), three model scales (1B, 6.9B, and 12B parameters), and five training epochs, yielding 375 attack evaluations. To ensure methodological rigor, we employ bag-of-words validation to verify that evaluation datasets are free from the distribution artifacts that have confounded prior benchmarks. The central finding is a strong correlation between the training-validation loss gap (a standard measure of overfitting) and attack effectiveness across all experimental conditions. Pearson correlations range from 0.838 to 0.996 across attack methods, with all correlations statistically significant (p &lt; 0.001). This relationship holds consistently across fine-tuning methods and model scales, suggesting that membership inference attacks succeed primarily when models are overfitted, rather than by exploiting fundamental architectural vulnerabilities. Reference-based attacks, which compare the fine-tuned model's behavior against the original base model, show amplified sensitivity compared to attacks that examine only the fine-tuned model, achieving high effectiveness at lower overfitting levels.
These findings suggest that standard generalization practices may reduce membership inference vulnerability in addition to improving model quality. The loss gap, already monitored by practitioners for model selection, could serve as a practical privacy risk indicator during fine-tuning without requiring attack implementation. The core contributions of this thesis have been accepted for publication at RECSI 2026 (XVIII Reunión Española sobre Criptología y Seguridad de la Información).</subfield>
   </datafield>
   <datafield ind1="8" ind2=" " tag="024">
      <subfield code="a">https://hdl.handle.net/2117/460646</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Informàtica::Seguretat informàtica</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Machine learning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Computer security</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Atacs d'inferència de pertinença</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Models de llenguatge grans</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Ajust fi</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Privacitat</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Ajust fi eficient en paràmetres</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Sobreajust</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Membership inference attacks</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Large language models</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Fine-tuning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Privacy</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Parameter-efficient fine-tuning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Overfitting</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Aprenentatge automàtic</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Seguretat informàtica</subfield>
   </datafield>
   <datafield ind2="0" ind1="0" tag="245">
      <subfield code="a">When do membership inference attacks succeed? An empirical study of overfitting in fine-tuned Large Language Models (LLMs)</subfield>
   </datafield>
</record></metadata></record></GetRecord></OAI-PMH>