Title	BigData: Efficient Search and Learning using Sparse Random Projections and Probabilistic Hashing
Creator	Li, Ping
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2014-02-10
Description	Modern applications of search and learning must handle datasets with billions of examples in a billion or even billion-squared dimensions (e.g., text documents represented by high-order n-grams). In this talk, we first present the use of very sparse random projections (Li, Hastie, Church, KDD 2006) for learning with high-dimensional data. It is evident that the projection matrix can be extremely sparse (e.g., 0.1% or fewer nonzeros) without hurting learning performance. For binary sparse data (which are common in practice), however, b-bit minwise hashing (Li and Konig, Communications of the ACM 2011) turns out to be much more efficient than random projections. In addition, the recent development of one-permutation hashing (Li, Owen, Zhang, NIPS 2012) substantially reduced the processing time of (b-bit) minwise hashing, from, for example, 500 permutations to just one. There has been much other exciting progress in basic research on random projections and hashing, for example, the new work on sign Cauchy random projections for approximating chi-square distances (Li, Samorodnitsky, Hopcroft, NIPS 2013) and the work on using stable random projections for very fast and accurate compressed sensing (Li, Zhang, Zhang, 2013).
Extent	42 minutes
Subject	Mathematics; Statistics; Biology and other natural sciences; Applied statistics
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: Rutgers University
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2014-08-07
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0043880
URI	http://hdl.handle.net/2429/49833
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace
