BIRS Workshop Lecture Videos

Banff International Research Station Logo

BIRS Workshop Lecture Videos

Optimal lower bound for samplers, finding duplicates, and universal relation Nelson, Jelani


In turnstile $l_0$ sampling, a vector x receives coordinate-wise updates, and during a query one must return a uniformly random element from support(x). Data structures solving this problem were first used as a subroutine to solve various dynamic graph streaming problems in (Ahn, Guha, McGregor SODA’12) and since then have been crucially used in seemingly every dynamic graph streaming problem studied across several papers (connectivity, k-connectivity, spanners, cut sparsifiers, spectral sparsifiers, minimum spanning tree, maximal matching, maximum matching, vertex cover, hitting set, b-matching, disjoint paths, k-colorable subgraph, several other maximum subgraph problems, densest subgraph, vertex and hyperedge connectivity, and approximating graph degeneracy, to name a few). If x is n-dimensional with poly(n)-bounded coordinates, (Jowhari, Saglam, Tardos PODS’11) gave an $\Omega(lg^2 n)$ bit space lower bound for data structures which fail with constant probability, and otherwise whose query responses are close to uniform in statistical distance. They also gave an upper bound with failure probability delta, which in fact gave $\Theta(lg(1/delta))$ uniform samples from the support, with space $O(lg^2 n lg(1/delta))$ bits. No lower bound was known in terms of delta. We prove that for any $l_0$ sampling data structure, the correct space complexity is $\Omega(max(k, lg(1/delta)) lg^2 n)$ bits to recover k samples from support(x) with failure probability delta. This is optimal for all settings of k, delta, and n (as long as the expression stays below, say, $n^{.99}$ -- there is always a trivial $O(n lg n)$ bit algorithm by storing x in memory). Furthermore, our lower bound holds even if the data structure is only required to output any k elements from the support, as opposed to nearly uniform ones. The previous lower bound did not hold against this weaker guarantee, despite that the fact that this weaker guarantee is actually all that is needed for most dynamic graph streaming applications. Furthermore, since our lower bound holds regardless of which support elements the data structure finds, it holds against l_p samplers for every p (or even other distributions). Our approach is loosely inspired by recent activity in adaptive data analysis and differential privacy and may be of independent interest. Joint work with Jakub Pachocki and Zhengyu Wang.

Item Media

Item Citations and Data


Attribution-NonCommercial-NoDerivatives 4.0 International