Abstract:
|
Lifelogging is a rising field, driven by the spread of devices that
collect data from our daily routines. Egocentric cameras are particularly
interesting devices because they capture very rich information about the
life of the wearer, including their social interactions, activities and
the contexts where they spend the day. Context, or scene, is one of the
things that influences us most, in almost every aspect of our lives, and it
is also one of the most challenging things to log, analyze and visualize
with an automatic device.
Among all kinds of contexts, one of the most important is the one related
to food. We are what we eat, and we eat depending on where we are. To keep
track of a person’s relation with food-related environments, we propose a
deep-learning-based approach to food-related scene recognition in images
gathered from an egocentric camera.
We explore the problem in detail and propose an optimal framework for
food-related environment recognition. Moreover, we introduce a new egocentric
dataset called Egoplaces, which contains over 60,000 labeled images distributed
across 28 categories, corresponding to 27 food-related scenes and one
non-food-related scene, and we propose several techniques to automatically
classify the environment the user is seeing.
We had to face several challenges, including a small number of images,
images with a narrow field of view and noise, and, particularly, a highly
imbalanced dataset. We propose several techniques to deal with this
imbalance, using deep convolutional networks for classification and varying
the training strategy. We explore incremental learning, running several
training iterations and introducing new categories in each, starting with
the most frequent labels. We also propose a hierarchical learning strategy
that exploits the semantic relations among the labels and learns from less
specific to more specific categories. We explore the possibility of applying
Bayesian inference during hierarchical classification. Finally, we propose
introducing repeated images into our dataset to overcome the class
imbalance, and a post-classification smoothing technique based on the
K-Nearest Neighbours algorithm that exploits the fact that egocentric images
arrive in a sequence. |
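
The last two techniques mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `oversample_by_repetition` and `smooth_sequence`, the window size `k`, and the use of a majority vote among the `k` temporally nearest frames as the neighbourhood rule are all assumptions made for illustration only.

```python
import random
from collections import Counter


def oversample_by_repetition(samples, labels, seed=0):
    """Repeat minority-class samples until every class matches the largest one.

    Hypothetical helper: the abstract only says repeated images are introduced,
    not how many or how they are chosen.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates at random from the class until it reaches the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def smooth_sequence(predicted, k=2):
    """Replace each label with the majority label among frames within k positions.

    Assumption: "nearest neighbours" is taken to mean nearest in time.
    """
    smoothed = []
    for i in range(len(predicted)):
        window = predicted[max(0, i - k): i + k + 1]
        counts = Counter(window)
        best = max(counts.values())
        winners = [label for label, count in counts.items() if count == best]
        # On ties, keep the original prediction if it is among the winners.
        smoothed.append(predicted[i] if predicted[i] in winners else winners[0])
    return smoothed


if __name__ == "__main__":
    frames = ["kitchen", "kitchen", "bar", "kitchen", "kitchen"]
    print(smooth_sequence(frames, k=1))
    _, balanced = oversample_by_repetition(["a", "b", "c", "d"], ["x", "x", "x", "y"])
    print(Counter(balanced))
```

Here an isolated prediction that disagrees with its temporal neighbours is overwritten by the surrounding label, which reflects the intuition that consecutive egocentric frames usually show the same scene.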