Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier


To access the full-text documents, please follow this link: http://hdl.handle.net/2117/166069

Title:	Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier
Author(s):	Pujol, Roger; Tabani, Hamid; Kosmidis, Leonidas; Mezzetti, Enrico; Abella, Jaume; Cazorla, Francisco J.
Other authors:	Barcelona Supercomputing Center
Abstract:	Deep learning-based solutions and, in particular, deep neural networks (DNNs) are at the heart of several functionalities in critical real-time embedded systems (CRTES), from vision-based perception (object detection and tracking) systems to trajectory planning. As a result, several DNN instances run simultaneously at any time on the same computing platform. However, while modern GPU-based SoCs offer a variety of computing elements (e.g., CPUs, GPUs, and specific accelerators) on which those DNN tasks can be executed depending on their computational requirements and temporal constraints, current DNNs are mainly programmed to exploit only one of them, namely, the regular cores in the GPU. This creates resource imbalance and under-utilization of platform resources when executing several DNN instances, which increases the DNN tasks' execution time requirements. In this paper, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving (AD) software for each of the computing elements of the latest NVIDIA Xavier SoC. Each variant can be configured to balance resource requirements and performance: the regular CPU core implementation can run on 2, 4, or 6 cores; the GPU regular and Tensor core variants can run on 4 or 8 of the GPU's Streaming Multiprocessors (SMs); and the NVIDIA Deep Learning Accelerator (NVDLA) variant can run on 1 or 2 accelerators; (b) we show that each particular variant/configuration offers a different resource utilization/performance point; finally, (c) we show how those heterogeneous computing elements can be exploited by a static scheduler to sustain the execution of multiple and diverse DNN variants on the same platform.
Acknowledgements:	This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under a Ramón y Cajal postdoctoral fellowship (RYC-2013-14717), Enrico Mezzetti under a Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-2016-27396), and Leonidas Kosmidis under a Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095).
Review status:	Peer reviewed
Subject(s):	UPC subject areas::Computer science; High performance computing; Deep Neural Network (DNN); GPU; Heterogeneous Resources; Supercomputers
Document type:	Article - Published version; Conference object

Related documents

Other documents by the same author

Assessing the Adherence of an Industrial Autonomous Driving Framework to ISO 26262 Software Guidelines

Tabani, Hamid; Kosmidis, Leonidas; Abella, Jaume; Cazorla, Francisco J.; Bernat, Guillem

Alcaide, Sergi; Kosmidis, Leonidas; Tabani, Hamid; Hernandez, Carles; Abella, Jaume; Cazorla, Francisco J.

Modelling multicore contention on the AURIX™ TC27x

Díaz, Enrique; Mezzetti, Enrico; Kosmidis, Leonidas; Abella, Jaume; Cazorla, Francisco J.

High-Integrity Performance Monitoring Units in Automotive Chips for Reliable Timing V&V

Mezzetti, Enrico; Kosmidis, Leonidas; Abella, Jaume; Cazorla, Francisco J.

Probabilistic Worst-Case Timing Analysis: Taxonomy and Comprehensive Survey

Cazorla, Francisco J.; Kosmidis, Leonidas; Mezzetti, Enrico; Hernandez, Carles; Abella, Jaume; Vardanega, Tullio
