2026-04-19T22:14:52Z https://recercat.cat/oai/request

oai:recercat.cat:2117/460649 2026-04-17T04:01:50Z com_2072_1033 col_2072_452951

Fernández Coronado, Alba (author)
2026-01-27
This thesis presents a series of improvements to the semantic search system deployed on the mango.com e-commerce platform, with the goal of enhancing retrieval accuracy, robustness, and relevance across multiple languages and query types. The work focuses on addressing key limitations of the existing semantic component within a hybrid lexical-semantic search architecture. The main contributions include the training of a multilingual BERT uncased model to improve robustness to capitalization and diacritics, as well as fine-tuning on fashion-specific data to enhance domain understanding. In addition, limitations in handling attribute-only queries are addressed through the use of intelligent image cropping and a weighted fusion strategy that combines image embeddings with short textual metadata. Furthermore, the CLIP image encoder is fine-tuned to generate semantically richer and more discriminative visual representations, leading to higher similarity scores and improved ranking stability. Experimental results, evaluated on a manually annotated multilingual dataset, demonstrate consistent improvements across locales, increased semantic similarity scores, and more stable ranking performance, without introducing catastrophic forgetting or degrading standard query retrieval. The integration of these enhancements into the full production search pipeline provides a robust foundation for future improvements, including tighter alignment between multilingual text embeddings and the fine-tuned image encoder.
https://hdl.handle.net/2117/460649
UPC subject areas::Computer science::Artificial intelligence::Machine learning
UPC subject areas::Economics and business organization::Electronic commerce
Deep learning; Semantics--Data processing; Computer vision; Electronic commerce
Semantic search; E-commerce search; Multilingual retrieval; Hybrid search systems; Multimodal search; Vision-language models; CLIP; BERT; Fashion retrieval; Domain adaptation; Image-text embeddings
Improvement and integration of a deep learning-based semantic model into a hybrid lexical-semantic search engine