Publication date

2025-11-06T16:29:10Z

2025



Abstract

Master's thesis for: Erasmus Mundus joint Master in Artificial Intelligence (EMAI)


Supervisor: Jude Wells; Co-Supervisor: Vicenç Gómez


Recent advances in chemical language models have enabled rapid exploration of chemical space through generative design of novel molecules. However, precise control over key molecular properties, such as size, aqueous solubility, and lipophilicity, remains challenging without retraining or introducing complex optimization steps. This thesis investigates a lightweight approach based on contrastive activation addition, in which differences in model activations between molecules with favorable and unfavorable properties are used to compute steering vectors. These vectors are applied during generation to bias the model towards producing molecules with desired characteristics, without modifying model weights. Using a GPT-style molecular generator conditioned on protein targets, we demonstrate that steering can consistently shift molecular property distributions: reducing median heavy-atom counts, improving predicted solubility by up to 1.4 logS units, and increasing the fraction of molecules within the optimal lipophilicity window for oral drugs. The approach preserves high validity rates, typically above 90%, and requires minimal computation, making it suitable for early-stage drug discovery workflows. Two variants of the method are compared: a global steering vector applied uniformly, and a token-aligned vector field that adapts dynamically to each generation step. While the latter amplifies property shifts, it also increases the risk of generating invalid molecules under certain settings. Overall, this work demonstrates that activation steering offers an interpretable, low-overhead mechanism for fine-tuning molecular properties, providing a practical tool to accelerate the design–make–test cycle in drug development. Future directions include extending this strategy to multi-property optimization and to models that capture three-dimensional molecular structures.
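The core idea of contrastive activation addition described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`mean_vector`, `steering_vector`, `steer`), the `strength` parameter, and the toy three-dimensional activations are all assumptions made for illustration, and only the global (uniformly applied) variant is shown.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def steering_vector(favorable, unfavorable):
    """Mean activation of favorable examples minus mean of unfavorable ones."""
    pos = mean_vector(favorable)
    neg = mean_vector(unfavorable)
    return [p - n for p, n in zip(pos, neg)]


def steer(hidden_state, vector, strength=1.0):
    """Add the scaled steering vector to one hidden state.

    Model weights are never touched; only the activation is shifted.
    `strength` is a hypothetical scaling knob, not a value from the thesis.
    """
    return [h + strength * v for h, v in zip(hidden_state, vector)]


# Toy activations (made-up numbers, not data from the thesis).
favorable = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
unfavorable = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(favorable, unfavorable)
steered = steer([0.5, 0.5, 0.5], v, strength=0.5)

print(v)        # [2.0, -2.0, 0.0]
print(steered)  # [1.5, -0.5, 0.5]
```

In a real model the vectors would be hidden states read from a chosen layer, and the addition would happen inside the forward pass at each generation step. Which layer and scaling the thesis uses is not stated in this record.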

Document type

Master's thesis

Language

English

Subjects and keywords

Molecules

Recommended citation

This citation was generated automatically.

Rights

CC Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)

https://creativecommons.org/licenses/by-nc-nd/4.0/

This item appears in the following collection(s)