<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-17T12:57:38Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/445011" metadataPrefix="marc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/445011</identifier><datestamp>2026-02-04T05:50:30Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452950</setSpec></header><metadata><record xmlns="http://www.loc.gov/MARC21/slim" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
   <leader>00925njm 22002777a 4500</leader>
   <datafield ind2=" " ind1=" " tag="042">
      <subfield code="a">dc</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Martínez Pérez, Héctor</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Catalán Pallarés, Sandra</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Igual Peña, Francisco D.</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Herrero Zaragoza, José Ramón</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Rodríguez Sánchez, Rafael</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="720">
      <subfield code="a">Quintana Ortí, Enrique Salvador</subfield>
      <subfield code="e">author</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="260">
      <subfield code="c">2025-09</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">This paper advocates for a careful customization of the special general matrix multiplication (GEMM) kernels that are invoked from blocked routines for several relevant matrix factorizations in LAPACK, in order to improve their performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage a refined analytical model to dynamically tune the cache configuration parameters of GEMM for these kernels, taking into account the matrix operands’ dimensions, in order to improve cache occupation. In addition, toward the same goal, we accommodate a flexible development of architecture-specific micro-kernels for GEMM that allows us to select the option that, depending on the operands’ dimensions, improves cache utilization. Our experiments for the LU and QR factorizations on two platforms, equipped with ARM (NVIDIA Carmel) and x86 (AMD EPYC) multi-core processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. Moreover, they also reveal the delicate balance between optimizing for multi-threaded parallelism and optimizing for cache usage, as well as the positive effects of software prefetching.</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">This work was supported by grants PID2020-113656RB-C22, PID2019-107255GB, PID2021-126576NB-I00 and PID2021-123627OB-C52 of MCIN/AEI/10.13039/501100011033, by “ERDF A way of making Europe”, and 2021-SGR-01007 of the Generalitat de Catalunya. Héctor Martínez is a postdoctoral fellow supported by the Consejería de Transformación Económica, Industria, Conocimiento y Universidades de la Junta de Andalucía. Sandra Catalán was supported by the grant RYC2021-033973-I, funded by MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Peer Reviewed</subfield>
   </datafield>
   <datafield ind2=" " ind1=" " tag="520">
      <subfield code="a">Postprint (published version)</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Dense linear algebra</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Computer architecture</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Multicore processors</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Cache memory</subfield>
   </datafield>
   <datafield tag="653" ind2=" " ind1=" ">
      <subfield code="a">Matrix factorization</subfield>
   </datafield>
   <datafield ind2="0" ind1="0" tag="245">
      <subfield code="a">Cache-aware optimization of matrix multiplication and matrix factorizations on multicore processors</subfield>
   </datafield>
</record></metadata></record></GetRecord></OAI-PMH>