To access the full text documents, please follow this link:

Mapping parallel loops on multicore systems
Tabik, Siham; Romero, Felipe; Utrera Iglesias, Gladys Miriam; Plata, Oscar
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d´Altes Prestacions
The compute nodes in contemporary HPC systems contain one or more multicore processors. As a result, these nodes constitute a shared-memory multiprocessor, often combining CMP and SMT concurrency technologies. This configuration introduces different levels of sharing in the cache hierarchy, resulting in non-uniform data sharing overheads. In this paper we analyze the data-sharing patterns that exhibit a real multithreaded application when executing on a multicore system, with emphasis in the use of the shared last level cache (LLC) for the concurrent threads. As a consequence of this study, we explore the loop mapping problem in such systems with the aim of optimizing the shared use of the the LLC by all parallel threads. We propose a three-phase loop mapping strategy that deals with workload imbalances, minimizes cache sharing interferences, and maximizes intra-core and inter-core data reuse in the cache hierarchy. Preliminary results show some benefits of our approach. However, this is a work in progress and much more research is being done.
Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
High performance computing
Càlcul intensiu (Informàtica)
Attribution-NonCommercial-NoDerivs 3.0 Spain

Show full item record