Abstract:
|
Big data analysis has become increasingly relevant for enterprises because the efficient
handling of information represents a unique competitive advantage, and its applications are
as diverse as the nature of the data itself: for example, fraud detection, advertising
strategies, and web traffic monitoring.
Apache Spark is an engine for large-scale data processing, intended to be a drop-in
replacement for Hadoop MapReduce that provides improved performance. The main goal of this
project is to demonstrate the capabilities of this system through the development and
implementation of a distributed pipeline that processes and indexes, at high speed and in
real time, multimedia data streams generated by social networks, and that detects trends in
those streams. For this purpose, the pipeline uses the Spark-related projects and libraries
Spark Streaming and Spark MLlib.
To verify the effectiveness of the algorithm, different benchmarks (with different
configurations) will be performed, and their results will be analyzed. |