<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-13T07:01:32Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/376954" metadataPrefix="marc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/376954</identifier><datestamp>2025-07-23T01:09:56Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452951</setSpec></header><metadata><record xmlns="http://www.loc.gov/MARC21/slim" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
   <leader>00925njm 22002777a 4500</leader>
   <datafield ind2=" " ind1=" " tag="042">
      <subfield code="a">dc</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Pujol Perich, David</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="260">
      <subfield code="c">2022-07-10</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Recent years have revealed the vast potential of the Transformer model, arguably the first general-purpose architecture in the sense that it achieves state-of-the-art performance in numerous fields (e.g., computer vision, natural language processing, and autonomous driving) with minimal architectural modifications. The success of Transformers relies heavily on the self-attention mechanism, whose inner workings remain only partially understood. In this thesis, we first address this gap from an empirical perspective, studying the mechanism's main inductive biases and limitations. We then propose the ReLA Nyströmformer, a novel architecture that attains linear complexity (a considerable improvement over the quadratic complexity of standard self-attention) while proving empirically superior to a vast set of state-of-the-art benchmarks. We further observe that Transformer architectures are ultimately third-order-interaction models (i.e., they can be formalized as third-order polynomials) whose tractability depends strongly on a number of inductive biases. This motivates the last part of the thesis, where we discuss the suitability of devising higher-order models (i.e., models based on higher-order polynomials) from both a predictive and an interpretability point of view. Finally, we propose two novel architectures, the Low-rank Deep Polynomial Network and the Adaptive attention, based on low-rank projections and automatic attention pattern learning, respectively. The state-of-the-art performance of both models underscores the need for further research in this direction to fully elucidate their potential.</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Deep learning (Machine learning)</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Natural language processing (Computer science)</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Computer vision</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">aprenentatge profund</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">self-attention</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">processament de llenguatge natural</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">visió per computació</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">models d'ordre superior</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">deep learning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">efficient Transformers</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">natural language processing</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">computer vision</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">high-order models</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">aprenentatge automàtic</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">machine learning</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Transformers</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Aprenentatge profund</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Tractament del llenguatge natural (Informàtica)</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Visió per ordinador</subfield>
   </datafield>
   <datafield ind2="0" ind1="0" tag="245">
      <subfield code="a">Understanding and improving self-attention mechanisms</subfield>
   </datafield>
</record></metadata></record></GetRecord></OAI-PMH>