BIRS Workshop Lecture Videos
Optimal lower bound for samplers, finding duplicates, and universal relation
Nelson, Jelani
Description
In turnstile $\ell_0$ sampling, a vector $x$ receives coordinatewise updates, and during a query one must return a uniformly random element of $\mathrm{support}(x)$. Data structures solving this problem were first used as a subroutine for various dynamic graph streaming problems in (Ahn, Guha, McGregor SODA'12), and have since been used crucially in seemingly every dynamic graph streaming problem studied across several papers (connectivity, k-connectivity, spanners, cut sparsifiers, spectral sparsifiers, minimum spanning tree, maximal matching, maximum matching, vertex cover, hitting set, b-matching, disjoint paths, k-colorable subgraph, several other maximum subgraph problems, densest subgraph, vertex and hyperedge connectivity, and approximating graph degeneracy, to name a few).
If $x$ is $n$-dimensional with $\mathrm{poly}(n)$-bounded coordinates, (Jowhari, Saglam, Tardos PODS'11) gave an $\Omega(\lg^2 n)$-bit space lower bound for data structures which fail with constant probability and whose query responses are otherwise close to uniform in statistical distance. They also gave an upper bound with failure probability $\delta$, which in fact yields $\Theta(\lg(1/\delta))$ uniform samples from the support, using $O(\lg^2 n \cdot \lg(1/\delta))$ bits of space. No lower bound was known in terms of $\delta$.
We prove that for any $\ell_0$ sampling data structure, the correct space complexity is $\Omega(\max(k, \lg(1/\delta)) \cdot \lg^2 n)$ bits to recover $k$ samples from $\mathrm{support}(x)$ with failure probability $\delta$. This is optimal for all settings of $k$, $\delta$, and $n$ (as long as the expression stays below, say, $n^{.99}$; there is always a trivial $O(n \lg n)$-bit algorithm by storing $x$ in memory). Furthermore, our lower bound holds even if the data structure is only required to output any $k$ elements of the support, as opposed to nearly uniform ones. The previous lower bound did not hold against this weaker guarantee, despite the fact that this weaker guarantee is all that is needed for most dynamic graph streaming applications. Moreover, since our lower bound holds regardless of which support elements the data structure finds, it also holds against $\ell_p$ samplers for every $p$ (or even other distributions).
Our approach is loosely inspired by recent activity in adaptive data analysis and differential privacy and may be of independent interest. Joint work with Jakub Pachocki and Zhengyu Wang.
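To make the object of study concrete, here is a toy sketch of the data-structure interface the abstract describes: a vector receives coordinatewise updates (insertions and deletions), and a query returns some element of the support. This is a simplified textbook-style construction (geometric subsampling plus 1-sparse recovery), not the specific construction from the talk or the papers cited; all class and function names below are invented for this illustration.

```python
import random


def _geometric_level(rng, max_level):
    # Assign level j with probability 2^-(j+1), capped at max_level.
    j = 0
    while j < max_level and rng.random() < 0.5:
        j += 1
    return j


class OneSparseSketch:
    """Recovers the index of the single nonzero coordinate when the
    sketched vector is exactly 1-sparse (edge cases are ignored here)."""

    def __init__(self):
        self.c0 = 0  # sum of all updates
        self.c1 = 0  # updates weighted by index
        self.c2 = 0  # updates weighted by index squared

    def update(self, i, delta):
        self.c0 += delta
        self.c1 += i * delta
        self.c2 += i * i * delta

    def recover(self):
        # If exactly one coordinate i is nonzero with value v, then
        # c0 = v, c1 = i*v, c2 = i^2*v, so i = c1/c0 and c2*c0 == c1^2.
        if self.c0 != 0 and self.c1 % self.c0 == 0:
            i = self.c1 // self.c0
            if i >= 0 and self.c2 == i * i * self.c0:
                return i
        return None  # empty, or not 1-sparse


class ToyL0Sampler:
    """Toy l0 sampler: partition indices into ~lg n geometric levels and
    keep a 1-sparse sketch per level; a query returns the index recovered
    from any level that happens to be exactly 1-sparse."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.num_levels = max(1, n.bit_length())
        # Fixed random level per index, standing in for a hash function.
        self.level = [_geometric_level(rng, self.num_levels - 1)
                      for _ in range(n)]
        self.sketches = [OneSparseSketch() for _ in range(self.num_levels)]

    def update(self, i, delta):
        self.sketches[self.level[i]].update(i, delta)

    def query(self):
        for s in self.sketches:
            i = s.recover()
            if i is not None:
                return i
        return None  # failure case (probability delta in real constructions)
```

Each level stores $O(\lg n)$-bit counters, giving $O(\lg^2 n)$ bits overall for a constant failure probability, which matches the shape of the upper bound discussed in the abstract; driving the failure probability down to $\delta$, or extracting $k$ samples, costs more, and the talk's result shows the $\Omega(\max(k, \lg(1/\delta)) \cdot \lg^2 n)$ cost is unavoidable.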
Item Metadata
Title 
Optimal lower bound for samplers, finding duplicates, and universal relation

Creator
Nelson, Jelani

Publisher 
Banff International Research Station for Mathematical Innovation and Discovery

Date Issued 
2017-03-20T10:35


Extent 
42 minutes

Subject  
Type  
File Format 
video/mp4

Language 
eng

Notes 
Author affiliation: Harvard University

Series  
Date Available 
2017-09-17

Provider 
Vancouver : University of British Columbia Library

Rights 
Attribution-NonCommercial-NoDerivatives 4.0 International

DOI 
10.14288/1.0355674

URI  
Affiliation  
Peer Review Status 
Unreviewed

Scholarly Level 
Faculty

Rights URI  
Aggregated Source Repository 
DSpace
