UBC Theses and Dissertations

Algorithms for large-scale multi-codebook quantization

Martinez-Covarrubias, Julieta

Abstract

Combinatorial vector compression is the task of expressing a set of vectors as accurately as possible in terms of discrete entries in multiple bases. The problem is of interest in the context of large-scale similarity search, as it provides a memory-efficient, yet ready-to-use compact representation of high-dimensional data on which vector similarities such as Euclidean distances and dot products can be efficiently approximated. Combinatorial compression poses a series of challenging optimization problems that are often a barrier to its deployment on very large scale systems (e.g., of over a billion entries). In this thesis we explore algorithms and optimization techniques that make combinatorial compression more accurate and efficient in practice, and thus provide a practical alternative to current methods for large-scale similarity search.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International