Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications

To access the full-text documents, please follow this link: http://hdl.handle.net/2117/96866

Title:	Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications
Author:	García Flores, Víctor; Gomez Luna, J.; Grass, Thomas Dieter; Rico, Alejandro; Ayguadé Parra, Eduard; Pena, A. J.
Other authors:	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Abstract:	Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; furthermore, on-die integration of GPUs and general-purpose cores (CPUs) enables unified virtual address spaces and seamless sharing of data structures, improving programmability and softening the entry barrier for heterogeneous programming. Although on-die GPU integration seems to be the trend among the major microprocessor manufacturers, there are still many open questions regarding the architectural design of these systems. This paper is a step forward towards understanding the effect of on-chip resource sharing between GPU and CPU cores, and in particular, the impact of last-level cache (LLC) sharing in heterogeneous computations. To this end, we analyze the behavior of a variety of heterogeneous GPU-CPU benchmarks on different cache configurations. We perform an evaluation of the popular Rodinia benchmark suite modified to leverage the unified memory address space. We find such GPGPU workloads to be mostly insensitive to changes in the cache hierarchy due to the limited interaction and data sharing between GPU and CPU. We then evaluate a set of heterogeneous benchmarks specifically designed to take advantage of the fine-grained data sharing and low-overhead synchronization between GPU and CPU cores that these integrated architectures enable. We show how these algorithms are more sensitive to the design of the cache hierarchy, and find that when GPU and CPU share the LLC, execution times are reduced by 25% on average, and energy-to-solution by over 20% for all benchmarks.
Abstract:	This work has been supported by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the BSC/UPC NVIDIA GPU Center of Excellence.
Abstract:	Peer Reviewed
Subjects:	UPC subject areas::Computer science::Computer architecture; Microprocessors; Computer architecture; Cache storage; Graphics processing units; Microprocessor chips; Last-level cache sharing; Integrated GPU-CPU system; Heterogeneous system; High-performance computing; HPC; Graphics processing unit; Energy efficiency; Virtual address space; On-die GPU integration; On-chip resource sharing; Rodinia benchmark; Unified memory address space; GPGPU
Rights:
Document type:	Article - Published version; Conference object
Published by:	Institute of Electrical and Electronics Engineers (IEEE)

Related documents

Other documents by the same author

TaskPoint: sampled simulation of task-based programs

Grass, Thomas Dieter; Rico, Alejandro; Casas, Marc; Moreto Planas, Miquel; Ayguadé Parra, Eduard

On the maturity of parallel applications for asymmetric multi-core processors

Chronaki, Kallia; Moreto Planas, Miquel; Casas, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Valero Cortés, Mateo

POSTER: Exploiting asymmetric multi-core processors with flexible system software

Chronaki, Kallia; Moreto Planas, Miquel; Casas Guix, Marc; Rico, Alejandro; Badia Sala, Rosa Maria; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo

MUSA: a multi-level simulation approach for next-generation HPC machines

Grass, Thomas; Allande, César; Armejach, Adrià; Rico, Alejandro; Ayguadé Parra, Eduard; Labarta, Jesús; Valero Cortés, Mateo; Casas, Marc; Moreto Planas, Miquel

The Mont-Blanc prototype: an alternative approach for HPC systems

Rajovic, Nikola; Rico, Alejandro; Mantovani, Filippo; Ruiz, Daniel; Vilarrubi, Josep O.; Gomez, Constantino; Backes, Luna; Nieto, Diego; Servat, Harald; Martorell Bofill, Xavier; Labarta Mancho, Jesús José; Ayguadé Parra, Eduard; Adeniyi-Jones, Chris; Derradji, Said; Gloaguen, Hervé; Lanucara, Piero; Sanna, Nico; Mehaut, Jean-François; Pouget, Kevin; Videau, Brice; Boyer, Eric; Allalen, Momme; Auweter, Axel; Brayford, David; Tafani, Daniele; Weinberg, Volker; Brömmel, Dirk; Halver, René; Meinke, Jan H.; Beivide Palacio, Ramon; Benito, Mariano; Vallejo, Enrique; Valero Cortés, Mateo; Ramirez, Alex
