Abstract:
|
The straight advantages of the use of graphs for representation purposes
appear to be useless in some applications due to the lack of mathematical
structure in graph domains. An illustrative example is the problem of finding
a representative of a set of graphs. While in vector spaces it is easy to
compute representatives such as medians and means with respect to a wide
range of distances, in the graph domain the analogy turns out to be a highly
non-trivial task.
In this work we introduce and compare two different approaches to compute a representative of a given set of graphs, namely the median graph and
the barycenter graph. Such ideal concepts suffer from a prohibitive computational cost in practice. Therefore, approximate methods for their computation are presented.
The median graph is defined as the graph minimizing the sum of distances
(SOD) to all the graphs in a given set. A first option is to restrict the search
of the graph with minimum SOD to the original given set, which leads to
the so called set median graph. In order to overcome this restriction, this
minimum can be taken over a more general search space. The resulting
graph is called generalized median graph, which is expected to provide better
representatives but is much harder to obtain. The strategy presented in this
work for its computation consists in mapping graphs into a vector space and
computing the median in this image space. The result is then mapped back
to the graph domain, to obtain the median graph. Several variations of this
standard methodology are presented.
In a second approach we present the new concept of barycenter graph,
more concretely, we present the set barycenter graph and generalized barycenter graph. The barycenter graph is inspired in the concept of barycenter or
centroid in vector spaces, which minimizes the sum of the square of the distances (SOSD) to the vectors for which we want to find a representative.
The strategy proposed to compute the generalized barycenter graph can be
straightly performed in the graph domain, in analogy to the barycenter computation in Euclidean spaces, by iteratively computing the weighted mean of
pairs of graphs. Thus, in this case no auxiliary vector space is needed. We
present a collection of variants of this algorithm. Similarly to the case of the
median graph, we search for the graph in the given set with minimum SOSD,
mainly for evaluation purposes.
In order to compare the different methods presented in the work, we perform some clustering experiments on one semi{artificial and three real-world datasets. The evaluation is carried out using standard clustering performance
indexes. It is worth mentioning that, up to our knowledge, no algorithms for
graph representatives' computation can be found which are able to handle
the sizes of both graphs and graph-sets we work with.
Results presented in this work lead to promising conclusions on the proposed techniques. The methods presented to compute the generalized median
and barycenter graph outperform in general the set median and barycenter
graphs, respectively. In the case of the median graph, also some previous
methods are overcome. Finally, we come to the conclusion that none of the
two approaches is clearly better than the other, but that it depends on the
characteristics of the underlying data. In other words, that the new concept
of barycenter graph is as good a representative as the median graph. |