Abstract:
|
According to antivirus (AV) vendors, the volume of malicious software has grown
exponentially in recent years. One of the main reasons for this growth is that,
to evade detection, malware authors have started using polymorphic and metamorphic
techniques. As a result, traditional signature-based approaches to
detecting malware are insufficient against new malware, and the categorization
of malware samples has become essential for understanding the basis of
malware behavior and for fighting back against cybercriminals.
During the last decade, solutions that fight malicious software have
begun using machine learning approaches. Unfortunately, few open-source
datasets are available to the academic community. One of the biggest
datasets available was released last year in a competition hosted on Kaggle
with data provided by Microsoft for the Big Data Innovators Gathering
(BIG 2015). This thesis presents two novel and scalable approaches using
Convolutional Neural Networks (CNNs) to assign malware to its corresponding
family. The first approach uses CNNs to learn a
feature hierarchy that discriminates among malware samples represented as
gray-scale images. The second approach uses the CNN
architecture introduced by Yoon Kim [12] to classify malware samples according
to their x86 instructions. The proposed methods achieved improvements
of 93.86% and 98.56%, respectively, with respect to the equal probability benchmark. |