Title	Optimal lower bound for samplers, finding duplicates, and universal relation
Creator	Nelson, Jelani
Publisher	Banff International Research Station for Mathematical Innovation and Discovery
Date Issued	2017-03-20T10:35
Description	In turnstile $l_0$ sampling, a vector x receives coordinate-wise updates, and during a query one must return a uniformly random element from support(x). Data structures solving this problem were first used as a subroutine to solve various dynamic graph streaming problems in (Ahn, Guha, McGregor SODA’12) and since then have been crucially used in seemingly every dynamic graph streaming problem studied across several papers (connectivity, k-connectivity, spanners, cut sparsifiers, spectral sparsifiers, minimum spanning tree, maximal matching, maximum matching, vertex cover, hitting set, b-matching, disjoint paths, k-colorable subgraph, several other maximum subgraph problems, densest subgraph, vertex and hyperedge connectivity, and approximating graph degeneracy, to name a few). If x is n-dimensional with poly(n)-bounded coordinates, (Jowhari, Saglam, Tardos PODS’11) gave an $\Omega(lg^2 n)$ bit space lower bound for data structures which fail with constant probability, and otherwise whose query responses are close to uniform in statistical distance. They also gave an upper bound with failure probability delta, which in fact gave $\Theta(lg(1/delta))$ uniform samples from the support, with space $O(lg^2 n lg(1/delta))$ bits. No lower bound was known in terms of delta. We prove that for any $l_0$ sampling data structure, the correct space complexity is $\Omega(max(k, lg(1/delta)) lg^2 n)$ bits to recover k samples from support(x) with failure probability delta. This is optimal for all settings of k, delta, and n (as long as the expression stays below, say, $n^{.99}$ -- there is always a trivial $O(n lg n)$ bit algorithm by storing x in memory). Furthermore, our lower bound holds even if the data structure is only required to output any k elements from the support, as opposed to nearly uniform ones. The previous lower bound did not hold against this weaker guarantee, despite that the fact that this weaker guarantee is actually all that is needed for most dynamic graph streaming applications. Furthermore, since our lower bound holds regardless of which support elements the data structure finds, it holds against l_p samplers for every p (or even other distributions). Our approach is loosely inspired by recent activity in adaptive data analysis and differential privacy and may be of independent interest. Joint work with Jakub Pachocki and Zhengyu Wang.
Extent	42 minutes
Subject	Mathematics; Computer science; Information and communication, circuits; Theoretical computer science
Type	Moving Image
File Format	video/mp4
Language	eng
Notes	Author affiliation: Harvard University
Series	BIRS Workshop Lecture Videos (Banff, Alta)
Date Available	2017-09-17
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0355674
URI	http://hdl.handle.net/2429/63057
Affiliation	Non UBC
Peer Review Status	Unreviewed
Scholarly Level	Faculty
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

BIRS Workshop Lecture Videos

Optimal lower bound for samplers, finding duplicates, and universal relation Nelson, Jelani

Description

Item Metadata

Item Media

Item Citations and Data

Rights