<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-17T23:18:50Z</responseDate><request verb="GetRecord" identifier="oai:www.recercat.cat:2117/428142" metadataPrefix="qdc">https://recercat.cat/oai/request</request><GetRecord><record><header><identifier>oai:recercat.cat:2117/428142</identifier><datestamp>2025-07-16T22:28:11Z</datestamp><setSpec>com_2072_1033</setSpec><setSpec>col_2072_452949</setSpec></header><metadata><qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
   <dc:title>Parallelizing recurrent neural networks and variants using OmpSs</dc:title>
   <dc:creator>Sharma, Robin Kumar</dc:creator>
   <dc:creator>Casas, Marc</dc:creator>
   <dc:subject>Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors</dc:subject>
   <dc:subject>High performance computing</dc:subject>
   <dc:subject>Deep neural network (DNN)</dc:subject>
   <dc:subject>Wavefront parallelization</dc:subject>
   <dc:subject>Task parallelism</dc:subject>
   <dc:subject>Recurrent neural networks (RNNs)</dc:subject>
   <dc:subject>Bidirectional recurrent neural networks (BRNNs)</dc:subject>
   <dc:subject>Long short-term memory (LSTM)</dc:subject>
   <dc:subject>Gated recurrent units (GRU)</dc:subject>
   <dc:subject>Càlcul intensiu (Informàtica)</dc:subject>
   <dcterms:abstract>Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, and text analysis tasks [1], often in combination with convolutional neural networks (CNNs). RNNs contain memory units that capture dynamic, temporal connections between past and future data. The outstanding text and signal analysis properties of RNNs and of other recurrent models such as Long Short-Term Memories (LSTMs) [2] and Gated Recurrent Units (GRUs) [3] make them the prevalent choice for analyzing sequential and unsegmented data like text or speech signals.
RNNs have two widely used variants: uni-directional RNNs [1], which preserve only past information because the only inputs they have seen come from the past, and bi-directional RNNs (BRNNs) [4], which preserve both past and future information. The data and control dependencies across the fundamental numerical kernels of RNN inference and training complicate the exploitation of model parallelism, which is why traditionally only data parallelism has been applied to accelerate RNNs [1]. Model parallelism has not been fully exploited to accelerate the forward and backward propagation of RNNs on multi-core CPUs.
We present two model-parallelism-based approaches for RNN inference and training on CPUs: W-Par (Wavefront Parallelization), a comprehensive approach for uni-directional RNNs, and B-Par (Bidirectional Parallelization) for bi-directional RNNs, both of which apply model parallelism to RNN models. We use fine-grained, task-based pipeline parallelism to accelerate multi-layer RNNs running on multi-core CPUs.</dcterms:abstract>
   <dcterms:issued>2023-05</dcterms:issued>
   <dc:type>Conference report</dc:type>
   <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
   <dc:rights>Open Access</dc:rights>
   <dc:rights>Attribution-NonCommercial-NoDerivatives 4.0 International</dc:rights>
   <dc:publisher>Barcelona Supercomputing Center</dc:publisher>
</qdc:qualifieddc></metadata></record></GetRecord></OAI-PMH>