<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-14T07:31:57Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/127077" metadataPrefix="oai_dc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/127077</identifier><datestamp>2025-07-17T16:06:05Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452951</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:title>How concepts emerge in neural networks</dc:title>
   <dc:creator>Surís Coll-Vinent, Dídac</dc:creator>
   <dc:contributor>Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions</dc:contributor>
   <dc:contributor>Massachusetts Institute of Technology</dc:contributor>
   <dc:contributor>Torralba, Antonio</dc:contributor>
   <dc:contributor>Giró Nieto, Xavier</dc:contributor>
   <dc:subject>Àrees temàtiques de la UPC::Enginyeria de la telecomunicació</dc:subject>
   <dc:subject>Neural networks (Computer science)</dc:subject>
   <dc:subject>Computer vision</dc:subject>
   <dc:subject>Neural networks</dc:subject>
   <dc:subject>concepts</dc:subject>
   <dc:subject>vision and language</dc:subject>
   <dc:subject>sound</dc:subject>
   <dc:subject>speech</dc:subject>
   <dc:subject>convolutional networks</dc:subject>
   <dc:subject>multimodal learning</dc:subject>
   <dc:subject>unsupervised learning</dc:subject>
   <dc:subject>Xarxes neuronals (Informàtica)</dc:subject>
   <dc:subject>Visió per ordinador</dc:subject>
   <dc:description>To be defined at MIT.</dc:description>
   <dc:description>Deep learning models, and computer vision systems in particular, have achieved strong results in recent years. However, the interpretability and understanding of these models are still at an early stage. Interpretability can be approached from a low-level, filter-level perspective, but the representations learned by neural networks encode much higher-level knowledge that must be approached from a semantic point of view, with concepts in mind. The goal of this project is to investigate the concepts that neural networks learn implicitly when they are trained in an unsupervised setting, with a special focus on the multimodal matching of words to visual objects and attributes. We study how these concepts can be detected, as well as how networks can be forced to learn more meaningful ones, providing both analytical insights and practical results.</dc:description>
   <dc:date>2018-10-17</dc:date>
   <dc:type>Master thesis</dc:type>
   <dc:identifier>https://hdl.handle.net/2117/127077</dc:identifier>
   <dc:identifier>ETSETB-230.134403</dc:identifier>
   <dc:language>eng</dc:language>
   <dc:rights>Restricted access - author's decision</dc:rights>
   <dc:format>application/pdf</dc:format>
   <dc:format>application/zip</dc:format>
   <dc:publisher>Universitat Politècnica de Catalunya</dc:publisher>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>