BIRS Workshop Lecture Videos

Banff International Research Station Logo

BIRS Workshop Lecture Videos

BigData: Efficient Search and Learning using Sparse Random Projections and Probabilistic Hashing Li, Ping

Description

Modern applications of search and learning have to deal with datasets with billions of examples in billion or even billion square dimensions (e.g., text documents represented by high-order n-grams). In this talk, we will first present the use of very sparse random projections (Li, Hastie, Church, KDD 2006) for learning with high-dimensional data. It is evident that the projection matrix can be extremely sparse (e.g., 0.1% or less nonzeros) without hurting the learning performance. For binary sparse data (which are common in practice), however, b-bit minwise hashing (Li and Konig, Communications of the ACM 2011) turns out to be much more efficient than random projections. In addition, the recent development of one-permutation hashing (Li, Owen, Zhang, NIPS 2012) substantially reduced the processing time of (b-bit) minwise hashing, from (e.g.,) 500 permutations to merely one. There are many other exciting new progresses in the basic research of random projections and hashing, for example, the new work on sign Cauchy random projections for approximating chi- square distances (Li, Samorodnitsky, Hopcroft, NIPS 2013) and the work on using stable random projections for very fast and accurate compressed sensing (Li, Zhang, Zhang, 2013).

Item Media

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada