dc.contributor.author
Quintana-Ortí, Enrique S.
dc.date.accessioned
2026-01-14T02:12:59Z
dc.date.available
2026-01-14T02:12:59Z
dc.date.issued
2023-03-21
dc.identifier
Quintana-Ortí, E.S. A continuum of matrix multiplications: from scientific computing to deep learning. A: Severo Ochoa Research Seminars at BSC. «8th Severo Ochoa Research Seminar Lectures at BSC, Barcelona, 2022-23». Barcelona: Barcelona Supercomputing Center, 2023, p. 77-78.
dc.identifier
https://hdl.handle.net/2117/450332
dc.identifier.uri
http://hdl.handle.net/2117/450332
dc.description.abstract
Matrix multiplication (GEMM) is a key, pervasive computational kernel that spans multiple domains. On the one hand, many applications arising in scientific computing require the solution of linear systems of equations, least-squares problems, and eigenvalue problems. For portability, these applications often rely on linear algebra routines from LAPACK (Linear Algebra PACKage). In turn, to deliver high performance, LAPACK relies heavily on GEMM and other Basic Linear Algebra Subprograms (BLAS). On the other hand, the computational cost of the convolutional neural networks (CNNs) that dominate machine learning algorithms for signal processing and computer vision tasks, as well as of the transformers behind recent deep learning (DL) applications such as ChatGPT, is largely determined by the performance of GEMM.
In this talk we will first expose caveats of current instances of GEMM in linear algebra libraries for conventional multicore architectures: suboptimal performance and missing support for DL-oriented data types. Starting from that point, we will then demonstrate how these problems can be overcome via tools for the (semi-)automatic generation of the only architecture-specific piece of GEMM, known as the micro-kernel, together with an analytical model that captures the cache hierarchy configuration. In addition, we will show that this approach carries over to more "exotic" architectures, from high-end vector accelerators and the Xilinx Artificial Intelligence Engine (AIE) to low-power designs such as RISC-V processors and ARM-based (Arduino) microcontrollers.
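To illustrate the structure the abstract refers to, the following is a minimal sketch (plain Python, not the speaker's actual code or a real BLAS) of a blocked GEMM whose innermost micro-kernel is the only piece one would specialize per architecture; the block sizes MC/NC/KC and the micro-tile MR x NR are hypothetical placeholders for values that an analytical cache-hierarchy model would select per machine.

```python
# Hypothetical blocking parameters; in practice an analytical model of the
# cache hierarchy (and the register file, for MR x NR) would choose these.
MC, NC, KC = 8, 8, 8   # cache-blocking sizes
MR, NR = 2, 2          # micro-kernel tile size

def micro_kernel(A, B, C, i, j, p, kc):
    """Update an MR x NR tile of C with a rank-kc product.
    On real hardware this loop nest is the hand-tuned (or tool-generated)
    vectorized, architecture-specific code."""
    for ii in range(i, min(i + MR, len(C))):
        for jj in range(j, min(j + NR, len(C[0]))):
            acc = C[ii][jj]
            for kk in range(p, p + kc):
                acc += A[ii][kk] * B[kk][jj]
            C[ii][jj] = acc

def gemm(A, B):
    """C = A * B via blocked loops around the micro-kernel."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, NC):                      # column panels of B/C
        for pc in range(0, k, KC):                  # "depth" panels
            kc = min(KC, k - pc)
            for ic in range(0, m, MC):              # row panels of A/C
                for jr in range(jc, min(jc + NC, n), NR):
                    for ir in range(ic, min(ic + MC, m), MR):
                        micro_kernel(A, B, C, ir, jr, pc, kc)
    return C
```

Only `micro_kernel` touches machine-specific detail; the surrounding loops and the blocking parameters are portable, which is what makes (semi-)automatic generation of just that piece attractive.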
dc.format
application/pdf
dc.publisher
Barcelona Supercomputing Center
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject
Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject
High performance computing
dc.subject
Càlcul intensiu (Informàtica)
dc.title
A continuum of matrix multiplications: from scientific computing to deep learning
dc.type
Conference report