Node architecture implications for in-memory data analytics on scale-in clusters

To access the full text documents, please follow this link: http://hdl.handle.net/2117/99705

Title:	Node architecture implications for in-memory data analytics on scale-in clusters
Author:	Awan, Ashan Javed; Vlassov, Vladimir; Brorsson, Mats; Ayguadé Parra, Eduard
Other authors:	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Abstract:	While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of the host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual-socket server, we have found that (i) simultaneous multithreading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.
Description:	Peer Reviewed
Subject(s):	UPC subject areas::Computer science; Big data; NUMA; SMT; Spark; Macrodades (Catalan: big data)
Document type:	Article - Published version; Conference Object
Published by:	Association for Computing Machinery (ACM)
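
Finding (v) in the abstract compares a single large executor with multiple small executors. This record does not give the configuration the authors used, so the following is only an illustrative sketch of how one node's cores and memory might be divided evenly among executors using standard Spark configuration properties. The helper name `executor_layout` and the 24-core, 48 GB node are hypothetical, and the split ignores memory overhead and any resources reserved for the operating system.

```python
def executor_layout(total_cores: int, total_memory_gb: int, num_executors: int) -> dict:
    """Divide a node's cores and memory evenly across Spark executors.

    Hypothetical helper for illustration; not taken from the paper.
    Returns a dict keyed by standard Spark configuration properties.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    if total_cores % num_executors or total_memory_gb % num_executors:
        raise ValueError("cores and memory must divide evenly across executors")
    return {
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(total_cores // num_executors),
        "spark.executor.memory": f"{total_memory_gb // num_executors}g",
    }


# Assumed node size (hypothetical): 24 cores, 48 GB available to Spark.
single_large = executor_layout(24, 48, 1)    # one executor owning the whole node
multiple_small = executor_layout(24, 48, 4)  # four executors with 6 cores each
```

The resulting dictionaries could be passed as `--conf key=value` pairs to `spark-submit`. Which layout performs better depends on the workload and has to be measured.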