Set covering machine on T-cell receptor LLM representations for lung cancer prediction

Publication date

2025-11-04T16:03:40Z

2025



Abstract

Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)


Supervisors: Benny Chain and John Shawe-Taylor


Tutor: Massimo Mecella


T-cell receptors (TCRs) provide insights into immune recognition of cancer. We explore whether interpretable rule-based classifiers derived from SCEPTR embeddings of TCR sequences can differentiate cancer repertoires from healthy controls. Using the Set Covering Machine algorithm, we developed models with hyperplane-based and similarity-based rules across alpha and beta chains. Despite strong performance on training data, the models failed to generalize to external datasets. Unexpectedly, alpha-chain models often outperformed beta-chain models, and single rules sometimes achieved high training accuracy, suggesting overfitting. Our findings highlight challenges in detecting cancer-specific TCR signatures and indicate that current embeddings may capture technical patterns rather than biological signal. We propose future directions including improved rule generation strategies and validation with functionally annotated repertoires.

Document type

Master's thesis

Language

English

Subjects and keywords

Cancer

Recommended citation

This citation was generated automatically.

Rights

CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)

https://creativecommons.org/licenses/by-nc-nd/4.0/
