- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- BIRS Workshop Lecture Videos /
- BigData: Efficient Search and Learning using Sparse...
Open Collections
BIRS Workshop Lecture Videos
BIRS Workshop Lecture Videos
BigData: Efficient Search and Learning using Sparse Random Projections and Probabilistic Hashing Li, Ping
Description
Modern applications of search and learning have to deal with datasets with billions of examples in billion or even billion square dimensions (e.g., text documents represented by high-order n-grams). In this talk, we will first present the use of very sparse random projections (Li, Hastie, Church, KDD 2006) for learning with high-dimensional data. It is evident that the projection matrix can be extremely sparse (e.g., 0.1% or less nonzeros) without hurting the learning performance. For binary sparse data (which are common in practice), however, b-bit minwise hashing (Li and Konig, Communications of the ACM 2011) turns out to be much more efficient than random projections. In addition, the recent development of one-permutation hashing (Li, Owen, Zhang, NIPS 2012) substantially reduced the processing time of (b-bit) minwise hashing, from (e.g.,) 500 permutations to merely one. There are many other exciting new progresses in the basic research of random projections and hashing, for example, the new work on sign Cauchy random projections for approximating chi- square distances (Li, Samorodnitsky, Hopcroft, NIPS 2013) and the work on using stable random projections for very fast and accurate compressed sensing (Li, Zhang, Zhang, 2013).
Item Metadata
Title |
BigData: Efficient Search and Learning using Sparse Random Projections and Probabilistic Hashing
|
Creator | |
Publisher |
Banff International Research Station for Mathematical Innovation and Discovery
|
Date Issued |
2014-02-10
|
Description |
Modern applications of search and learning have to deal with datasets with billions of examples in billion or even billion square dimensions (e.g., text documents represented by high-order n-grams). In this talk, we will first present the use of very sparse random projections (Li, Hastie, Church, KDD 2006) for learning with high-dimensional data. It is evident that the projection matrix can be extremely sparse (e.g., 0.1% or less nonzeros) without hurting the learning performance. For binary sparse data (which are common in practice), however, b-bit minwise hashing (Li and Konig, Communications of the ACM 2011) turns out to be much more efficient than random projections. In addition, the recent development of one-permutation hashing (Li, Owen, Zhang, NIPS 2012) substantially reduced the processing time of (b-bit) minwise hashing, from (e.g.,) 500 permutations to merely one. There are many other exciting new progresses in the basic research of random projections and hashing, for example, the new work on sign Cauchy random projections for approximating chi- square distances (Li, Samorodnitsky, Hopcroft, NIPS 2013) and the work on using stable random projections for very fast and accurate compressed sensing (Li, Zhang, Zhang, 2013).
|
Extent |
42 minutes
|
Subject | |
Type | |
File Format |
video/mp4
|
Language |
eng
|
Notes |
Author affiliation: Rutgers University
|
Series | |
Date Available |
2014-08-07
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivs 2.5 Canada
|
DOI |
10.14288/1.0043880
|
URI | |
Affiliation | |
Peer Review Status |
Unreviewed
|
Scholarly Level |
Faculty
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivs 2.5 Canada