dc.contributor.author
Rajani, Nazneen
dc.contributor.author
McArdle, Kate
dc.contributor.author
Dhillon, Inderjit S.
dc.identifier
Rajani, Nazneen; McArdle, Kate; Dhillon, Inderjit S. Parallel k nearest neighbor graph construction using tree-based data structures. A: HPGM: High Performance Graph Mining. "1st High Performance Graph Mining workshop, Sydney, 10 August 2015". 2015.
dc.identifier
https://hdl.handle.net/2117/76382
dc.identifier
10.5821/hpgm15.1
dc.description.abstract
Construction of a nearest neighbor graph is often a neces-
sary step in many machine learning applications. However,
constructing such a graph is computationally expensive, es-
pecially when the data is high dimensional. Python's open
source machine learning library Scikit-learn uses k-d trees
and ball trees to implement nearest neighbor graph construc-
tion. However, this implementation is ine cient for large
datasets. In this work, we focus on exploiting these under-
lying tree-based data structures to optimize parallel execu-
tion of the nearest neighbor algorithm. We present parallel
implementations of nearest neighbor graph construction us-
ing such tree structures, with parallelism provided by the
OpenMP and the Galois framework. We empirically show
that our parallel and exact approach is e cient as well as
scalable, compared to the Scikit-learn implementation. We
present the rst implementation of k-d trees and ball trees
using Galois. Our results show that k-d trees are faster when
the number of dimensions is small (2d N); ball trees on
the other hand scale well with the number of dimensions.
Our implementation of ball trees in Galois has almost linear
speedup on a number of datasets irrespective of the size and
dimensionality of the data.
dc.format
application/pdf
dc.publisher
Barcelona Supercomputing Center
dc.relation
High Performance Graph Mining workshop (1st: 2015: Sydney)
dc.rights
http://creativecommons.org/licenses/by-sa/3.0/es/
dc.rights
Attribution-ShareAlike 3.0 Spain
dc.subject
Àrees temàtiques de la UPC::Matemàtiques i estadística
dc.title
Parallel k nearest neighbor graph construction using tree-based data structures
dc.type
Conference report