Title:
|
Node architecture implications for in-memory data analytics on scale-in clusters
|
Author:
|
Awan, Ahsan Javed; Vlassov, Vladimir; Brorsson, Mats; Ayguadé Parra, Eduard
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Abstract:
|
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multithreading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor. |
Abstract:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Computer science -Big data -NUMA -SMT -Spark |
Document type:
|
Article - Published version Conference Object |
Published by:
|
Association for Computing Machinery (ACM)
|