Tous Liesa, Rubén
Torres Viñals, Jordi
2015-04-28
Big data analysis has become increasingly relevant for enterprises because the efficient handling of information is a competitive advantage, and its applications are as diverse as the data itself: fraud detection, advertising strategies, web traffic monitoring, and so on. Apache Spark is an engine for large-scale data processing, intended as a drop-in replacement for Hadoop MapReduce that offers improved performance. The main goal of this project is to demonstrate the capabilities of this system by developing and implementing a distributed pipeline that processes and indexes, at high speed and in real time, the multimedia data streams generated by social networks and detects trends in them, using the Spark-related libraries Spark Streaming and Spark MLlib. To verify the effectiveness of the algorithm, benchmarks will be run under different configurations and their results analyzed.
Master thesis
English
UPC subject areas::Civil engineering::Transport infrastructure and modelling::Traffic; UPC subject areas::Computer science::Information systems; Big data; Spark; LDA; streaming; clustering
Universitat Politècnica de Catalunya
Open Access
Academic works [82541]