<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-19T22:14:39Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/460649" metadataPrefix="oai_dc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/460649</identifier><datestamp>2026-04-17T04:01:50Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452951</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:title>Improvement and integration of a deep learning-based semantic model into a hybrid lexical-semantic search engine</dc:title>
   <dc:creator>Fernández Coronado, Alba</dc:creator>
   <dc:contributor>Universitat Politècnica de Catalunya. Universitat Rovira i Virgili</dc:contributor>
   <dc:contributor>Universitat Rovira i Virgili</dc:contributor>
   <dc:contributor>Universitat de Barcelona</dc:contributor>
   <dc:contributor>PUNTO-FA, SL</dc:contributor>
   <dc:contributor>Ferrando Hernández, Pol</dc:contributor>
   <dc:contributor>Moreno Ribas, Antonio</dc:contributor>
   <dc:subject>Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic</dc:subject>
   <dc:subject>Àrees temàtiques de la UPC::Economia i organització d'empreses::Comerç electrònic</dc:subject>
   <dc:subject>Deep learning</dc:subject>
   <dc:subject>Semantics--Data processing</dc:subject>
   <dc:subject>Computer vision</dc:subject>
   <dc:subject>Electronic commerce</dc:subject>
   <dc:subject>Cerca semàntica</dc:subject>
   <dc:subject>Cerca en comerç electrònic</dc:subject>
   <dc:subject>Recuperació multilingüe</dc:subject>
   <dc:subject>Sistemes de cerca híbrids</dc:subject>
   <dc:subject>Cerca multimodal</dc:subject>
   <dc:subject>Models visió-llenguatge</dc:subject>
   <dc:subject>CLIP</dc:subject>
   <dc:subject>BERT</dc:subject>
   <dc:subject>Recuperació de moda</dc:subject>
   <dc:subject>Adaptació al domini</dc:subject>
   <dc:subject>Embeddings imatge-text</dc:subject>
   <dc:subject>Semantic search</dc:subject>
   <dc:subject>E-commerce search</dc:subject>
   <dc:subject>Multilingual retrieval</dc:subject>
   <dc:subject>Hybrid search systems</dc:subject>
   <dc:subject>Multimodal search</dc:subject>
   <dc:subject>Vision-language models</dc:subject>
   <dc:subject>Image-text embeddings</dc:subject>
   <dc:subject>Aprenentatge profund</dc:subject>
   <dc:subject>Semàntica--Informàtica</dc:subject>
   <dc:subject>Visió per ordinador</dc:subject>
   <dc:subject>Comerç electrònic</dc:subject>
   <dc:description>This thesis presents a series of improvements to the semantic search system deployed on the mango.com e-commerce platform, with the goal of enhancing retrieval accuracy, robustness, and relevance across multiple languages and query types. The work focuses on addressing key limitations of the existing semantic component within a hybrid lexical-semantic search architecture. The main contributions include the training of a multilingual BERT uncased model to improve robustness to capitalization and diacritics, as well as fine-tuning on fashion-specific data to enhance domain understanding. In addition, limitations in handling attribute-only queries are addressed through the use of intelligent image cropping and a weighted fusion strategy that combines image embeddings with short textual metadata. Furthermore, the CLIP image encoder is fine-tuned to generate semantically richer and more discriminative visual representations, leading to higher similarity scores and improved ranking stability. Experimental results, evaluated on a manually annotated multilingual dataset, demonstrate consistent improvements across locales, increased semantic similarity scores, and more stable ranking performance, without introducing catastrophic forgetting or degrading standard query retrieval. The integration of these enhancements into the full production search pipeline provides a robust foundation for future improvements, including tighter alignment between multilingual text embeddings and the fine-tuned image encoder.</dc:description>
   <dc:date>2026-01-27</dc:date>
   <dc:type>Master thesis</dc:type>
   <dc:identifier>https://hdl.handle.net/2117/460649</dc:identifier>
   <dc:identifier>201017</dc:identifier>
   <dc:language>eng</dc:language>
   <dc:rights>Restricted access - confidentiality agreement</dc:rights>
   <dc:format>application/pdf</dc:format>
   <dc:publisher>Universitat Politècnica de Catalunya</dc:publisher>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>