Abstract:
|
Deep neural networks have gained popularity in recent years, obtaining outstanding results in
a wide range of applications, most notably in computer vision and natural language
processing tasks. Despite this newly found interest, research in neural networks spans many
decades, and some of today's most used network architectures were invented many years
ago. Nevertheless, the progress made during this period cannot be understood without taking
into account the technological advances in key contiguous domains such as massive
data storage and computing systems, more specifically Graphics Processing Units (GPUs).
These two components are responsible for the enormous performance gains in neural
networks that have made Deep Learning a common term in the Artificial
Intelligence and Machine Learning community.
These kinds of networks need massive amounts of data to effectively train the millions of
parameters they contain, and training can take days or weeks depending on the computer
architecture used. The size of newly published datasets keeps growing, and the
tendency to create deeper networks that outperform shallower architectures means that, in
the medium and long term, the hardware required for this kind of training will
only be found in high performance computing facilities with enormous clusters
of computers. However, using these machines is not straightforward, as both the framework and
the code need to be appropriately tuned to take full advantage of these distributed
environments.
For this reason, we test TensorFlow, an open-source Deep Learning framework from
Google with built-in distributed support, on MinoTauro, the GPU cluster at the
Barcelona Supercomputing Center (BSC). We aim to implement a defined workload using the
distributed features the framework offers, in order to speed up the training process, acquire
knowledge of the inner workings of the framework, and understand the similarities and
differences with respect to classic single-node training. |