Abstract:
Apache Spark’s capabilities offer new possibilities for making software systems more scalable and
reliable. The framework can be used to improve legacy network visibility platforms. Previously,
these systems ran on a single node and used Deep Packet Inspection (DPI) techniques to classify
network flows. Because DPI methods have a high computational cost, they limited the throughput
of these systems: classifiers were forced to sample the input data in order to process it in
real time, which caused a significant loss of information.
This project uses Spark’s features to build a distributed, fault-tolerant platform that can
analyse many more flows per second, using Machine Learning to achieve high precision and
accuracy at a low computational cost.