Scalable nearest neighbour methods for high dimensional data

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Scalable nearest neighbour methods for high dimensional data Muja, Marius

Abstract

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbour matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbour matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this thesis, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbour algorithm and its parameters depend on the dataset characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular dataset. In order to scale to very large datasets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbour matching framework that can be used with any of the algorithms described in the thesis. All this research has been released as an open source library called FLANN (Fast Library for Approximate Nearest Neighbours), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbour matching.

Item Metadata

Title	Scalable nearest neighbour methods for high dimensional data
Creator	Muja, Marius
Publisher	University of British Columbia
Date Issued	2013
Description	For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbour matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbour matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this thesis, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbour algorithm and its parameters depend on the dataset characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular dataset. In order to scale to very large datasets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbour matching framework that can be used with any of the algorithms described in the thesis. All this research has been released as an open source library called FLANN (Fast Library for Approximate Nearest Neighbours), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbour matching.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2013-05-01
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0052198
URI	http://hdl.handle.net/2429/44402
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2013-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

Scalable nearest neighbour methods for high dimensional data Muja, Marius

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights