dc.contributor.author
Quintana-Ortí, Enrique S.
dc.date.accessioned
2026-01-14T02:12:59Z
dc.date.available
2026-01-14T02:12:59Z
dc.date.issued
2023-03-21
dc.identifier
Quintana-Ortí, E.S. A continuum of matrix multiplications: from scientific computing to deep learning. A: Severo Ochoa Research Seminars at BSC. «8th Severo Ochoa Research Seminar Lectures at BSC, Barcelona, 2022-23». Barcelona: Barcelona Supercomputing Center, 2023, p. 77-78.
dc.identifier
https://hdl.handle.net/2117/450332
dc.identifier.uri
http://hdl.handle.net/2117/450332
dc.description.abstract
Matrix multiplication (GEMM) is a key, pervasive computational kernel that spans multiple domains. On the one hand, many applications arising in scientific computing require the solution of linear systems of equations, least-squares problems, and eigenvalue problems. For portability, these applications often rely on linear algebra routines from LAPACK (Linear Algebra PACKage). In turn, to deliver high performance, LAPACK relies heavily on GEMM and other Basic Linear Algebra Subprograms (BLAS). On the other hand, the computational cost of the convolutional neural networks (CNNs) that dominate machine learning algorithms for signal processing and computer vision tasks, as well as of the transformers behind recent deep learning (DL) applications such as ChatGPT, is largely determined by the performance of GEMM.
In this talk we will first expose caveats of current instances of GEMM in linear algebra libraries for conventional multicore architectures: suboptimal performance and missing support for DL-oriented data types. Starting from that point, we will then demonstrate how these problems can be overcome via tools for the (semi-)automatic generation of the only architecture-specific piece of GEMM, known as the micro-kernel, together with an analytical model that captures the cache hierarchy configuration. In addition, we will show that this approach carries over to more "exotic" architectures, from high-end vector accelerators and the Xilinx Artificial Intelligence Engine (AIE) to low-power designs such as RISC-V processors and ARM-based (Arduino) microcontrollers.
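To illustrate the structure the abstract refers to, the following is a minimal sketch (plain Python, not the speaker's actual code or a real BLAS) of a blocked GEMM whose innermost micro-kernel is the only piece one would specialize per architecture; the block sizes MC/NC/KC and the micro-tile MR x NR are hypothetical placeholders for values that an analytical cache-hierarchy model would select per machine.

```python
# Hypothetical blocking parameters; in practice an analytical model of the
# cache hierarchy (and the register file, for MR x NR) would choose these.
MC, NC, KC = 8, 8, 8   # cache-blocking sizes
MR, NR = 2, 2          # micro-kernel tile size

def micro_kernel(A, B, C, i, j, p, kc):
    """Update an MR x NR tile of C with a rank-kc product.
    On real hardware this loop nest is the hand-tuned (or tool-generated)
    vectorized, architecture-specific code."""
    for ii in range(i, min(i + MR, len(C))):
        for jj in range(j, min(j + NR, len(C[0]))):
            acc = C[ii][jj]
            for kk in range(p, p + kc):
                acc += A[ii][kk] * B[kk][jj]
            C[ii][jj] = acc

def gemm(A, B):
    """C = A * B via blocked loops around the micro-kernel."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, NC):                      # column panels of B/C
        for pc in range(0, k, KC):                  # "depth" panels
            kc = min(KC, k - pc)
            for ic in range(0, m, MC):              # row panels of A/C
                for jr in range(jc, min(jc + NC, n), NR):
                    for ir in range(ic, min(ic + MC, m), MR):
                        micro_kernel(A, B, C, ir, jr, pc, kc)
    return C
```

Only `micro_kernel` touches machine-specific detail; the surrounding loops and the blocking parameters are portable, which is what makes (semi-)automatic generation of just that piece attractive.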
dc.format
application/pdf
dc.publisher
Barcelona Supercomputing Center
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject
Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject
High performance computing
dc.subject
Càlcul intensiu (Informàtica)
dc.title
A continuum of matrix multiplications: from scientific computing to deep learning
dc.type
Conference report