dc.contributor.author
Dimitrievikj, Aleksandar
dc.date.accessioned
2025-11-07T20:25:06Z
dc.date.available
2025-11-07T20:25:06Z
dc.date.issued
2025-11-06T16:29:10Z
dc.identifier
http://hdl.handle.net/10230/71794
dc.identifier.uri
http://hdl.handle.net/10230/71794
dc.description.abstract
Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
dc.description.abstract
Supervisor: Jude Wells
Co-Supervisor: Vicenç Gómez
dc.description.abstract
Recent advances in chemical language models have enabled rapid exploration of chemical space through generative design of novel molecules. However, precise control over key molecular properties (such as size, aqueous solubility, and lipophilicity) remains challenging without retraining or introducing complex optimization steps. This thesis investigates a lightweight approach based on contrastive activation addition, where differences in model activations between molecules with favorable and unfavorable properties are used to compute steering vectors. These vectors are applied during generation to bias the model towards producing molecules with desired characteristics, without modifying model weights. Using a GPT-style molecular generator conditioned on protein targets, we demonstrate that steering can consistently shift molecular property distributions: reducing median heavy-atom counts, improving predicted solubility by up to 1.4 logS units, and increasing the fraction of molecules within the optimal lipophilicity window for oral drugs. The approach preserves high validity rates, typically above 90%, and requires minimal computation, making it suitable for early-stage drug discovery workflows. Two variants of the method are compared: a global steering vector applied uniformly, and a token-aligned vector field adapting dynamically to each generation step. While the latter amplifies property shifts, it also increases the risk of generating invalid molecules under certain settings. Overall, this work demonstrates that activation steering offers an interpretable, low-overhead mechanism for fine-tuning molecular properties, providing a practical tool to accelerate the design–make–test cycle in drug development. Future directions include extending this strategy to multi-property optimization and models that capture three-dimensional molecular structures.
dc.format
application/pdf
dc.rights
CC Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.title
Steering vector-guided molecular generation using language models
dc.type
info:eu-repo/semantics/masterThesis