Title:
|
General purpose task-dependence management hardware for task-based dataflow programming models
|
Author:
|
Tan, Xubin; Bosch, Jaume; Vidal-Piñol, Miquel; Alvarez, Carlos; Jimenez-Gonzalez, Daniel; Ayguadé Parra, Eduard; Valero Cortés, Mateo
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Abstract:
|
Task-based programming models such as OpenMP, Intel TBB and OmpSs offer the possibility of expressing dependences among tasks to drive their execution at runtime. Managing these dependences introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we present a general purpose hardware accelerator, Picos++, to manage the inter-task dependences efficiently in both time and energy. Our design also includes novel support for nested tasks. To this end, a new hardware/software co-design is presented to overcome the fact that nested tasks with dependences could result in system deadlocks due to the limited amount of resources in hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in a Linux embedded system with two ARM Cortex-A9 cores and an FPGA. The scalability and energy consumption of the implemented system have been studied and compared against a software runtime. Even in a system limited to 2 threads, using Picos++ results in more than 1.8x speedup and 40% energy savings in the most demanding parallelizations of real benchmarks. In fact, a hardware task dependence manager should be able to achieve much higher speedup and provide more energy savings with more threads. |
Abstract:
|
This work is supported by the Spanish Government (projects SEV-2015-0493 and TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051 and 2014-SGR-1272), by the European Research Council (RoMoL GA 321253) and by the “Port of OmpSs to the Android platform and Hardware support for Nanos++ runtime” Project Cooperation Agreement with LG Electronics. We also thank the Xilinx University Program. |
Abstract:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Computer science::Computer architecture::Parallel architectures -Parallel programming (Computer science) -Hardware -Programming -Discrete cosine transforms -Runtime -Parallel processing -Instruction sets |
Document type:
|
Article - Published version; Conference Object |
Published by:
|
Institute of Electrical and Electronics Engineers (IEEE)
|