UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Embracing nonuniform samples López, Oscar Fabian 2019

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2019_september_lopez_oscar.pdf [ 1.41MB ]
JSON: 24-1.0380720.json
JSON-LD: 24-1.0380720-ld.json
RDF/XML (Pretty): 24-1.0380720-rdf.xml
RDF/JSON: 24-1.0380720-rdf.json
Turtle: 24-1.0380720-turtle.txt
N-Triples: 24-1.0380720-rdf-ntriples.txt
Original Record: 24-1.0380720-source.json
Full Text

Full Text

EMBRACING NONUNIFORM SAMPLESbyOscar Fabian LópezA THESIS SUBMITTED IN PARTIALFULFILLMENT OF THE REQUIREMENTS FORTHE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE ANDPOSTDOCTORAL STUDIES(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2019c© Oscar Fabian López, 2019The following individuals certify that they have read, and recommendto the Faculty of Graduate and Postdoctoral Studies for acceptance, a the-sis/dissertation entitled:EMBRACING NONUNIFORM SAMPLESsubmitted by OSCAR FABIAN LÓPEZ in partial fulfillment of the require-ments for the degree of Doctor of Philosophy in MathematicsExamining Committee:Özgür Ylmaz, Faculty of ScienceSupervisorFelix Herrmann, Faculty of ScienceSupervisorYaniv Plan, Faculty of ScienceSupervisory Committee MemberBrian Wetton, Faculty of ScienceUniversity ExaminerMichael Bostock, Faculty of ScienceUniversity ExaminerMichael Wakin, Colorado School of MinesExternal ExamineriiAbstractMany empirical studies suggest that samples of continuous-time signals takenat locations randomly deviated from an equispaced grid can benefit signalacquisition (e.g., undersampling and anti-aliasing). However, rigorous state-ments of such advantages and the respective conditions are scarce in theliterature. This thesis provides some theoretical insight on this topic whenthe deviations are known and generated i.i.d. from a variety of distributions.By assuming the signal of interest is s-compressible (i.e., can be expandedby s coefficients in some basis), we show that O(s polylog(N)) samples ran-domly deviated from an equispaced grid are sufficient to recover the N/2-bandlimited approximation of the signal. For sparse signals (i.e., s  N),this sampling complexity is a great reduction in comparison to equispacedsampling whereO(N)measurements are needed for the same quality of recon-struction (Nyquist-Shannon sampling theorem). The methodology consistsof incorporating an interpolation kernel into the basis pursuit problem. Themain result shows that this technique is robust with respect to measurementnoise and stable when the signal of interest is not strictly sparse.Analogous results are provided for signals that can be represented as anN ×N matrix with rank r. In this context, we show that O(rN polylog(N))random nonuniform samples provide robust recovery of the N/2-bandlimitedapproximation of the signal via the nuclear norm minimization problem.This result has novel implications for the noisy matrix completion problemby improving known error bounds and providing the first result that is stablewhen the data matrix is not strictly low-rank.iiiLay SummaryIn signal processing, sampling theory deals with the acquisition of continuousdata via a discrete set of samples (e.g., images, audio, video and metereolog-ical signals). The classical results in this field give a complete understandingof the number of equispaced (or uniform) samples needed to capture a sig-nal according to its oscillatory behavior (the largest frequency component).However, in many applications equispaced samples are difficult to honor dueto natural factors and equipment malfunction that deviate the sampling loca-tions from a desired equispaced grid. This makes non-equispaced (or nonuni-form) sampling an inherent theme in signal processing.While nonuniform samples are typically seen as a nuisance in practice,many works in the literature argue experimentally that such deviated mea-surements can reduce the number of required samples for signal acquisition(in contrast to the uniform case). Although several authors have provideda theoretical treatment of such claims, the topic remains largely incompleteand vague for practical purposes. This thesis extends the comprehension ofthis phenomenon by providing methodology, sampling complexity and signalrecovery error terms. The main result proves that when the deviations arerandom, one can exploit nonuniform samples to capture a signal with lesssamples than those required from equispaced samples.ivPrefaceThis thesis consists of my original research, conducted at the University ofBritish Columbia under supervision of Dr. Özgür Ylmaz and Dr. Felix Her-rmann (now at the Georgia Institute of Technology). The following chaptersare composed of unpublished but soon to be submitted work for which I willbe the principal author. However, some of the ideas and writing scatteredthroughout this thesis are taken from [68] (and cited accordingly), for whichI was the principal author.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Sampling Theory . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Nonuniform Sampling . . . . . . . . . . . . . . . . . . . . . . . 31.2.1 Benefits of Nonuniform Sampling . . . . . . . . . . . . 41.2.2 An Open Problem . . . . . . . . . . . . . . . . . . . . 51.3 Compressive Sensing . . . . . . . . . . . . . . . . . . . . . . . 61.4 Low-Rank Matrix Recovery . . . . . . . . . . . . . . . . . . . 71.5 Overview of the Main Results . . . . . . . . . . . . . . . . . . 91.6 Organization and Notation . . . . . . . . . . . . . . . . . . . . 102 Compressive Off-the-Grid Sampling . . . . . . . . . . . . . . . 122.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.1.1 The Anti-aliasing Nature of Nonuniform Samples . . . 122.2 Notation, Assumptions and Methodology . . . . . . . . . . . . 152.2.1 Signal Model . . . . . . . . . . . . . . . . . . . . . . . 152.2.2 Deviation Model . . . . . . . . . . . . . . . . . . . . . 17vi2.2.3 Dirichlet Kernel . . . . . . . . . . . . . . . . . . . . . . 172.3 Main Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.3.1 Simplified Result . . . . . . . . . . . . . . . . . . . . . 202.3.2 Full Result . . . . . . . . . . . . . . . . . . . . . . . . . 212.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.4.1 Deviation Model . . . . . . . . . . . . . . . . . . . . . 232.4.2 Signal Model . . . . . . . . . . . . . . . . . . . . . . . 252.4.3 Novelty of the Results . . . . . . . . . . . . . . . . . . 272.5 Proof of Main Result . . . . . . . . . . . . . . . . . . . . . . . 282.5.1 Restricted Isometry Property of SΨ . . . . . . . . . . . 292.5.2 Proof of Theorem 2.3.3 . . . . . . . . . . . . . . . . . . 372.6 Interpolation Error of Dirichlet Kernel: Proof . . . . . . . . . 392.7 Proof of Lemma 2.5.2 . . . . . . . . . . . . . . . . . . . . . . . 442.8 DFT-incoherence of Discretized Smooth Functions . . . . . . . 472.9 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . 492.9.1 Effect of DFT-incoherence . . . . . . . . . . . . . . . . 512.9.2 Effect of Deviation Model Parameter θ . . . . . . . . . 512.9.3 Noise Attenuation . . . . . . . . . . . . . . . . . . . . . 533 Off-the-Grid Sampling of Low-Rank Matrices . . . . . . . . . 553.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553.2 Notation, Assumptions and Methodology . . . . . . . . . . . . 553.2.1 Signal Model . . . . . . . . . . . . . . . . . . . . . . . 563.2.2 2D Deviation Model . . . . . . . . . . . . . . . . . . . 583.2.3 2D Dirichlet Kernel . . . . . . . . . . . . . . . . . . . . 583.3 Main Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603.3.1 Implications for Matrix Completion: Stability and Ro-bustness . . . . . . . . . . . . . . . . . . . . . . . . . . 613.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633.4.1 r-incoherence Condition . . . . . . . . . . . . . . . . . 633.4.2 2D Deviation Model . . . . . . . . . . . . . . . . . . . 673.5 Proof of Main Result . . . . . . . . . . . . . . . . . . . . . . . 683.5.1 Dual Certificate and Required Lemmas . . . . . . . . . 693.5.2 Proof of Theorem 3.3.1 . . . . . . . . . . . . . . . . . . 713.6 Proof of Dual Certificate Recovery and Required Lemmas . . . 733.6.1 Proof of Lemma 3.5.1 . . . . . . . . . . . . . . . . . . . 733.6.2 Proof of Lemma 3.5.2 . . . . . . . . . . . . . . . . . . . 753.6.3 Proof of Lemma 3.5.3 . . . . . . . . . . . . . . . . . . . 83vii3.6.4 Proof of Lemma 3.5.4 . . . . . . . . . . . . . . . . . . . 863.7 Interpolation Error of 2D Dirichlet Kernel: Proof . . . . . . . 903.8 Proof of Theorem 3.3.2: Stable and Robust Matrix Completion 943.9 Uniform Matrix Bernstein Inequality . . . . . . . . . . . . . . 983.10 Proof of Additional Lemmas . . . . . . . . . . . . . . . . . . . 1054 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113viiiList of Figures2.1 Illustration of alias error for uniform and nonuniform samples.Credit is given to [25]. (Top) alias error caused by equispacedsamples. Notice that all three curves pass through the samplepoints. (Bottom) alias free sampling due to nonuniform sam-ples. Notice that only the black curve passes through all fourpoints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2 Different undersampling schemes and their imprint in the Fourierdomain for a signal that is the superposition of three cosinefunctions. Credit for this image is given to [35]. (Top) sig-nal uniformly sampled above Nyquist rate and its respectivespectrum on the left. (Middle) same signal randomly nonuni-formly three-fold undersampled according to a discrete uni-form distribution. (Bottom) same signal uniformly three-foldundersampled. Notice that only in the case of nonuniformundersampling can the significant coefficients be detected byapplying a threshold. . . . . . . . . . . . . . . . . . . . . . . . 142.3 Illustrations of example perturbations generated by our devi-ation model. In the top two examples, the red areas indicatethe allowed positions of the deviations from the grid pointkm.(Top) deviations pertaining to example 1) with µ = 0 andp = 1, the samples lie on the interval [2k−12m, 2k+12m]. (Middle)deviations pertaining to example 2) with µ = 0 and p = 1,the samples lie on a discrete subgrid centered atkm. (Bottom)deviations pertaining to example 3) with µ = 0 where the bellcurve indicates the Gaussian pdf of these centered deviations.Notice that these samples may lie outside of Ω, but recall thatwe are sampling on the torus (see Section 2.2.2). . . . . . . . . 24ix2.4 Plot of average relative reconstruction error vs average nonuni-form step size for both signal models. In the complex expo-nential model (Ψ = DFT) we have γ = 1 and in the Gaus-sian signal model we have γ ≈ 40.78 (Daubechies 2 wavelet).Notice that the complex exponential model allows for recon-struction from larger step sizes in comparison to the Gaussiansignal model. . . . . . . . . . . . . . . . . . . . . . . . . . . . 522.5 (Left) plot of average relative reconstruction error vs ρ pa-rameter and (right) plot of corresponding θ parameters vs ρparameter. The plot on the right includes the constant valueθ = 1√2required to apply Theorem 2.3.3 (the red line). No-tice that although our results only holds for three values of ρ(.5, .49, .48), the plot on the left demonstrates that accuraterecovery is still possible otherwise. . . . . . . . . . . . . . . . . 532.6 Plot of average relative reconstruction error (‖f−Ψg]‖2/‖f‖2)vs average nonuniform step size (blue curve) and average inputrelative measurement error (‖d‖2/‖f‖2) vs average nonuniformstep size (red curve). Notice that for the first four step sizevalues (2,2.25,2.5,2.75) noise attenuation is achieved, i.e., thereconstruction error is lower than the input noise level. . . . . 543.1 Illustration of two 500 × 500 data matrices (from [68]) ofrank 5 with distinct 5-incoherence parameters (γ). (Left) low-rank data matrix with inappropriate 5-incoherence structureγ ∼ O(√N). (Right) low-rank data matrix with appropriate5-incoherence structure γ ∼ O(1). Both data matrices are dis-cretizations of functions of the form (3.13) with same centerlocations and parameter c = 1000 for the left data matrix andc = 20 for the right data matrix. . . . . . . . . . . . . . . . . . 65x3.2 Illustration of two 500 × 500 data matrices of rank 6 withdistinct 6-incoherence parameters (γ) but same aliasing energy(∑|`|>N/2 |c`|). (Left) low-rank data matrix with inappropriate6-incoherence structure γ ∼ O(√N). (Right) low-rank datamatrix with appropriate 6-incoherence structure γ ∼ O(1).Both data matrices are discretizations of functions of the form(3.14) modified from respectively from Figure 3.1, where theleft image has no additional aliasing energy introduced (i.e.,ω < 250 and d  1) and the image on the right has beengenerated with additional aliasing error (ω > 250 and d ∼ 1)to match that of the image on the left (these additional aliasingartifacts are subtle due to their high frequency nature). . . . . 663.3 Illustration of the 2D sampling scenario in Ω. (Left) denseequispaced samples. (Right) less dense nonuniform samples(on average by a factor of ≈ 12) which can be generated by ex-ample 1) or 2) in this section. Both sampling schemes providethe same quality of reconstruction, but random off-the-gridsamples require less measurements according to our results. . . 68xiAcknowledgementsI want to thank my supervisors Dr. Özgür Ylmaz and Dr. Felix Herrmann.I would like to give credit to Dr. Herrmann for proposing the initial method-ology (inspired by [9]) that led to the work in [68, 82] and eventually becamethe core approach of this thesis. Dr. Ylmaz identified the potential av-enue this methodology offered to solve the jittered sampling problem, i.e., thegeneral problem considered in this thesis. He encouraged me to adapt andanalyze the approach and this eventually became my doctoral work. Asidefrom this, both of them have always inspired me to grow as an independentresearcher and person.xiiDedicationPara Luisi, Maribel, Rosi y Jorge.xiiiChapter 1IntroductionOur understanding of nature's many complex processes is inevitably donevia a discrete set of observations. Restricted by our own mosaic perception,we endeavor to take such samples in a periodic, patterned and predictablefashion. However, by nature itself this task is impossible (no matter howaccurate), forcing us to view the outside world in a nonuniform and chaoticmanner. Should we then continue our adversarial pursuit of knowledge? Orcan we instead benefit by adapting to nature's stochastic ways? Can weembrace the nonuniform samples?This thesis argues in the affirmative. The main argument is that in com-parison to classical sampling theory (equispaced sampling), random nonuni-form samples can capture a signal of interest with less samples (by simul-taneously attenuating the aliasing error). Such benefits have been knownsince the 1960's, with no large analytical understanding. This thesis extendsthe comprehension of this phenomenon by providing methodology, samplingcomplexity and signal recovery error bounds. The main novelty is due torecent advances in random matrix theory, compressive sensing and low-rankmatrix recovery.1.1 Sampling TheoryThe Nyquist-Shannon sampling theorem is perhaps the most impacting resultin the theory of signal processing, fundamentally shaping the practice ofacquiring and processing data [8, 37] (also attributed to Kotel'nikov [91],Ferrar [92], Cauchy [1], Ogura [50], E.T. and J.M. Whittaker [17, 44]). In1this setting, typical acquisition of a continuous-time signal involves takingequispaced samples at a rate slightly higher than a prescribed frequency ωHz in order to obtain a bandlimited approximation via a quickly decayingkernel. Such techniques provide accurate approximations of noisy signalswhose spectral energy is largely contained in the band [−ω/2, ω/2] [28].To be precise, let f(x) be a function whose Fourier transform vanishes out-side of [−ω/2, ω/2] (i.e., belongs to a Paley-Wiener space [36]), then Shannonand Kotel'nikov (independently) provided the expansionf(x) =∞∑k=−∞f(kω)sin (pi(ωx− k))pi(ωx− k) := Sωf(x). (1.1)Therefore, one can completely capture a continuous bandlimited signal via adiscrete sequence of its equispaced (or uniform) samples with step size 1/ω.Such initial results were later generalized to non-bandlimited functions[69], where for f ∈ L2(R) ∩ C(R) with Fourier transform fˆ ∈ L1(C) oneobtains the error|f(x)− Sωf(x)| ≤√2pi∫|v|≥ω/2|ˆf(v)|dv. (1.2)This is also known as the aliasing error, a standard (and inevitable) errorterm absorbed in applications. In view of (1.2), oversampling is a commontechnique to attenuate the aliasing error (i.e., anti-aliasing) by increasing thefrequency ω of samples.Furthermore, in practice only a finite number of samples may be takenand practitioners must adopt the truncated approximationf(x) ≈N∑k=−Nf(kω)sin (pi(ωx− k))pi(ωx− k) ,for some N large enough. This provides an additional error term of order1/√N (see Theorem 3.20 in [2] or [10, 28]), which may be unacceptable inmany applications. Furthermore, such truncation approaches can lead to sta-bility issues since they may produce ill-posed problems (see [86]). To remedythis, many works have proposed instead to use trigonometric polynomials asa stable finite dimensional model ([86, 19] and Section 6.2 in [28]). In essence,given samples in [−1/2, 1/2), this approach adopts the periodic extension off(x)|[−1/2,1/2) to the whole line and approximates it as a periodic bandlimitedfunction (i.e., a trigonometric polynomial).21.2 Nonuniform SamplingAs a consequence of the results in the previous section, industrial signal ac-quisition and post-processing methods tend to be designed to incorporateuniform sampling. However, such sampling schemes are difficult to honorin practice due to physical constraints and natural factors that perturb thesampling locations from the uniform grid, i.e., off-the-grid samples. In re-sponse, nonuniform analogs of the sampling theorem have been developed,where an average sampling density proportional to the highest frequency ωof the signal guarantees accurate interpolation, e.g., Landau density [39, 28].In this context, given {tp}p∈Z a complete interpolating sequence for aPaley-Wiener space ([81] Section 4.5, or a Bernstein space see [36] Section7.3.2) and a12-bandlimited function f(x) (i.e., whose Fourier transform van-ishes outside [−1/2, 1/2]) we may expandf(x) =∞∑p=−∞f(tp)G(x)G′(tp)(t− tp) , (1.3)whereG(x) = (x− t0)∞∏p=1(1− xtp)(1− xt−p),is an entire function (see [81] Section 4.1 equation (3) or [2] equation 3.1.3).Therefore (1.3) can be viewed as the infinite version of the Lagrange interpo-lation formula (since (1.3) now involves infinite terms). Furthermore, whentp = p then (1.3) reduces to (1.1) with ω = 1 so that this interpolationformula can also be seen as a generalization of the equispaced version.One sufficient condition for a set of nonuniform samples {tp}p∈Z to be acomplete interpolating sequence (i.e., (1.3) holds) is given by Kadec's 1/4-theorem [55, 28, 81] (initially proven in weaker form by Paley-Wiener [73]),which requiressupp∈Z|tp − p| < 14.In the case the tp are randomly nonuniform, the restriction |tp − p| < 14can be removed. In [32] it was shown that if tp = p + ∆p where {∆p}p∈Zare independent, centered random variables with uniformly bounded secondmoments then {tp}p∈Z is a complete interpolating sequence almost surely.3This allows the case |∆p| ≥ 14 , but on average the sampling density agreeswith Kadec's, Landau's and Nyquist's rate.Such results allow for practical signal acquisition, as long as the practi-tioner keeps accurate tracking of the nonuniform sampling locations. As inthe previous section, standard implementations approximate the function ofinterest by absorbing the error of truncating (1.3) and aliasing error (i.e., thesampling rate provides a bandlimited approximation).Although nonuniform sampling theory is mainly on par with the clas-sical uniform counterpart, off-the-grid samples are typically unwanted andregarded as a burden. This is in large part due to the extra computationalcost involved in regularization, i.e., interpolating the nonuniform samplesonto a desired equispaced grid. Regularization is a common procedure inpractice, required for many post-processing techniques designed according toclassical sampling theory (involving uniform samples).1.2.1 Benefits of Nonuniform SamplingIn contrast to the mind set of the previous section, many works in the litera-ture have considered the potential benefits of deliberate nonuniform sampling[38, 26, 27, 76, 80, 67, 63, 61, 90, 51, 5, 93, 12, 13, 84, 34, 72, 33, 57, 18, 71,78, 70, 52, 47, 15, 40, 35]. Suppression of aliasing error, i.e., anti-aliasing, is awell known advantage of randomly perturbed samples. For example, jitteredsampling is a common technique for anti-aliasing that also provides a well dis-tributed set of samples [11, 74, 14, 72]. To the best of the author's knowledge,this phenomenon seems to have been first noticed by Harold S. Shapiro andRichard A. Silverman [38] (also by Federick J. Beutler [26, 27] and implicitlyby Henry J. Landau [39]) and remained unused in applications until redis-covered in Pixar animation studios by Robert L. Cook [76]. In the context ofcomputer graphics, nonuniform samples produce aliasing artifacts in imagesthat are less noticeable to the human eye (see also [12, 13, 72]). Interestingly,there are indications that the human visual system exploits nonuniform pho-toreceptor structures to process high frequency images [75, 93].Intrinsically, these observations also suggest that a bandlimited functionmay be accurately captured in an undersampled sense (in comparison tothe Nyquist rate) via nonuniform samples. Indeed, by lessening the effectof aliasing we are reducing the right hand side of (1.2). Intuitively, this4corresponds to a diminished error of the form∼√2pi∫|v|≥ω˜/2|ˆf(v)|dv, (1.4)for some ω˜ > ω. In effect, it seems we are able to capture higher frequencycontent of the signal and potentially the full signal if ω˜ exceeds the bandwidthof f(x) (this is experimentally argued in [33]). Thus, nonuniform samplingprovides an interesting alternative to oversampling for anti-aliasing. If fullyunderstood, a practitioner could attenuate aliasing artifacts without the needfor additional samples and in fact may be able to do so with a reducedsampling complexity. Furthermore, this may be achieved in a natural mannersince deviated samples are inevitable in many situations.To the best of the author's knowledge, such observations and the ma-jority citations mentioned here are largely experimental and in need of fur-ther analytical understanding. The exceptions in this literature review are[38, 26, 27, 39]. In [38], Shapiro and Silverman show that Poisson randomsamples can provide alias free representation of stationary random processeswhose spectrum is absolutely continuous and has an L2 integrable derivative.The average sampling rate of this result agrees with the Nyquist rate, butprovides an explicit benefit of nonuniform samples that cannot be achievedby equispaced samples (since such stationary random processes need not bebandlimited). This result was generalized by Federick J. Beutler in [27] byextending the result to stationary random processes with any spectrum andonly requiring Poisson sampling on the half-axis. Beutler provides similaralias free results for random jitter sampling at Nyquist rate, and arbitrarilybelow the Nyquist rate for nonuniform but on-the-grid samples (i.e., ran-domly expunging samples from a dense equispaced grid). The results in[26, 39] are more aligned with sampling theory, where in [26] it was shownthat for generalized bandlimited signals (representable by a Fourier-Stieltjesintegral on a bounded interval) nonuniform samples can provide error freeexpansions with average sampling rates below Nyquist. A similar conclusioncan be interpreted from the work of Landau [39], though he does not stateit explicitly or provide examples.1.2.2 An Open ProblemIn the spirit of the previous section, we are left with a vague but interest-ing problem: state concrete methods and conditions for signal acquisition5via nonuniform samples that provide explicit benefits in contrast to uniformsamples.Specifically, we wish to answer the following questions simultaneously:• How many nonuniform samples are required? At what average sam-pling density?• What kind of grid deviations are permissible? What structure on thenonuniform samples?• What class of signals can we recover?• What numerical method can recover the signal in a stable manner?• How large can the reconstruction error be? Is perfect acquisition pos-sible?This thesis provides novel theoretical insight to each of these questions.Note that from the discussion in the previous section, all of these questionshave been investigated experimentally but only a few have been consideredin a rigorous (but arguably incomplete) manner [38, 26, 27, 39]. With this inmind, this thesis endeavors to provide analytical treatment of these questionswith answers that are informative for the practice and design of nonuniformsampling techniques. However, this is an extensive topic that seems to belacking a tangible theoretical framework. As a consequence, many potentialavenues of research remain with subsequent questions. In this sense, thisdissertation can be seen as the initial step to a more rigid theory of thisphenomenon.The contributions of this work are largely due to the relatively new fieldsof compressive sensing and low-rank matrix recovery. The numerical andanalytical methods of these areas of study are important for the remainderof the thesis.1.3 Compressive SensingCompressive sensing (CS) aims to simultaneously acquire and compress asignal that is sparse or compressible with respect to a given basis or frame.Suppose a discrete signal f ∈ CN is s-sparse with respect to a basis Ψ ∈CN×N , i.e., f = Ψg for some vector g ∈ CN with only s ≤ N non-zero entries6(g is said to be s-sparse). In CS, one collects m  N linear measurementsof f given byb = Af + d ∈ Cm.Here, A ∈ Cm×N is the measurement matrix that models the sampling pro-tocol and d ∈ Cm represents measurement noise with ‖d‖2 ≤ η. In thisfield, perhaps the most popular method to obtain an approximation to fis by means of the basis pursuit problem. More precisely, the basis pursuitproblem approximates f ≈ Ψg] where g] given byg] = arg minh∈CN‖h‖1 s.t. ‖AΨh− b‖2 ≤ η. (1.5)Here ‖h‖1 :=∑Nk=1 |hk| is the `1-norm, a standard penalty function for sparsevector recovery [43, 85, 36].An appropriate measurement matrix A and sparsifying transform Ψ arecrucial for the success of (1.5). Standard results show that if f ≈ Ψg for somes-sparse vector g, then f can be robustly and stably approximated as longas AΨ is a suitable measurement matrix and has m = O(s polylog(N))rows (i.e., measurements). In this case the approximation error ‖f − Ψg]‖2is proportional to η (robust) and the sparsity model mismatch (stability, i.e.,error of best s-sparse approximation) [85]. Typical assumptions imposed onAΨ for successful recovery include the null space property (NSP) [83] andthe restricted isometry property (RIP, see Section 2.5).Therefore, one can manage to solve a linear system of equations in anunder determined sense (since m < N) for sparse vector recovery. These ap-proaches are valuable to industrial applications since signals may be acquiredand processed in a frugal and compressed manner. Standard measurementmatrices known to be successful for sparse vector recovery (e.g., satisfy NSPor RIP) typically involve some degree of randomness. Matrices A whoseentries are independent and centered subgaussian random variables can beshown to allow for s-sparse vector recovery if m ∼ s log(N/s) [58, 49, 83, 85].Random matrices with more structure, such as random Fourier matrices, alsoallow for sparse vector recovery fromO(s polylog(N)) samples [58, 85, 42, 41].1.4 Low-Rank Matrix RecoveryIn the same spirit as CS, the field of low-rank matrix recovery (also known asmatrix sensing) has the goal of recovering an N ×M low-rank matrix from7linear measurements and is particularly of interest in the case the numberof measurements is much smaller than NM [6, 59, 65, 66]. As a popularexample, the matrix completion problem of reconstructing a data matrixgiven only a few observed entries has received increased amounts of attentiondue to its extensive applications and implementation success [21, 22, 7] (seeSection 3.3.1 for an elaborated discussion). The key assumption here is thatthe underlying matrix of interest has few nonzero or quickly decaying singularvalues, i.e., the matrix exhibits low-rank structure.More precisely, let D ∈ CN×M be a rank r matrix (or a matrix that canbe well approximated by a rank r matrix) and suppose that we have thenoisy linear measurementsB = A(D) + E ∈ Cn×m,where A : CN×M 7→ Cn×m is linear map with nm < NM and E ∈ Cn×mrepresents our measurement noise with ‖E‖F ≤ η. It is important to noticethat in the literature, it is typically assumed that B,A(D) and E are vectors.Here we allow for the general structure of n × m matrices which best fitour scenario (see Chapter 3.2.1), but also include the assumptions in theliterature by taking m = 1 and considering an n× 1 matrix as a vector with‖E‖2 = ‖E‖F .We compensate for our small number of measurements by assuming thatr  min{N,M}, i.e., we assume that our matrix of interest is of low-rankrelative to the ambient dimensions (or well approximated by such a rank rmatrix). With this assumption, perhaps the most popular method to ap-proximate D is to obtain a data estimate D] as the argument output of theoptimization procedureminX∈CN×M‖X‖∗ subject to ‖A(X)−B‖F ≤ η, (1.6)where ‖X‖∗ is the nuclear norm ofX and is defined as ‖X‖∗ :=∑min{N,M}k=1 σk(X)where σk(X) denotes the k-th largest singular value of the matrix.This methodology has been extensively studied with the concepts of rankrestricted isometry property [6, 59, 85, 94], robust rank null space property[65, 66] and dual certificates [22, 21]. In this context, standard results showthat m ∼ r(N + M) log(N + M) measurements via an appropriate A (e.g.,satisfying one of the properties previously mentioned) provides an accurateapproximation of the signal via its best r rank approximation. This sampling8complexity is a great reduction in contrast to the NM measurements neededin general to parametrize an N ×M matrix. Standard results provide errorbounds robust to noise and proportional to the rank r approximation of thedata matrix. However, results using dual certificates typically do not providestability with respect to the low-rank model [16, 7, 22, 21] (i.e., the resultsonly apply to matrices that are exactly rank r).1.5 Overview of the Main ResultsThis thesis provides novel theoretical understanding of the benefits of randomnonuniform sampling for signal acquisition (as opposed to uniform sampling).In essence, the main results formalize the advantages of nonuniform samplesfor anti-aliasing and undersampling outlined in Section 1.2.1 and addressthe questions posed in Section 1.2.2. Adopting the methodology and prooftechniques developed in compressive sensing and low-rank matrix recovery,the work develops sampling complexity and recovery error bounds in termsof the bandlimited approximation.Along the lines of compressive sensing, our results in Chapter 2 hinge on asparse signal model assumption. With an underlying function f(x) to be sam-pled in Ω, we assume its discrete counterpart f ∈ CN (sampled uniformly) iscompressible with respect to a basis Ψ ∈ CN×N (f ≈ Ψg as outlined in Sec-tion 2.3.3). Our methodology consists of solving the `1 minimization problem(1.5) with an interpolation kernel S in lieu of A to model the nonuniform ob-servations. We show that under a random deviation model, O(s polylog(N))off-the-grid samples approximate f(x) (∀x ∈ Ω) robustly with error propor-tional to the s-sparse approximation of g and N/2-bandlimited approxima-tion of f(x) with high probability (w.h.p.). For highly compressible signalswith s N , we may therefore recover the bandlimited approximation of thesignal of interest with less samples than those required in the equispaced casefor the same quality of reconstruction (O(N) by Nyquist rate).Similarly, when a 2D signal of interest D(x, y) can be discretized (via uni-form samples) as an approximately low-rank matrix D ∈ CN×N , the theoryof low-rank matrix recovery provides analogous results in Chapter 3. In thiscase we show that under a less general deviation model, O(rN polylog(N))off-the-grid measurements can be incorporated into the nuclear norm mini-mization problem (1.6) with a 2D interpolation kernel to estimate D via itsrank r approximation (w.h.p.). This in turn gives the full signal D(x, y) (for9(x, y) in the sampling domain) robustly with error proportional to the r rankapproximation of D and N/2-bandlimited approximation of D(x, y).The proof technique in the 2D context utilizes a dual certificate, makingthe result rather novel since it provides stability with respect to the low-rankdata model. In other words, the analysis concerns full rank matrices andguarantees recovery of a low-rank approximation of D which is uncommonfor low-rank matrix recovery proofs that employ dual certificates (e.g., [16, 21,22, 7]). In particular, the recovery error bounds apply to matrix completionand thus produce the first results in this popular problem that do not requirethe data matrix (D) to be exactly low-rank and guarantee error proportionalto the r rank approximation (with similar sampling complexity to standardresults in the matrix completion literature).1.6 Organization and NotationThe remainder of the thesis is organized as follows: Chapter 2 discusses theresults outlined in the previous section with the sparse signal model assump-tion. This chapter includes an extended discussion of the anti-aliasing natureof nonuniform samples, providing intuition behind this phenomenon. Chap-ter 3 proceeds with the analogous results under the low-rank data model.Extra consideration is given to the proof technique in this chapter in orderto outline the novelties and implications for other fields (e.g., the matrixcompletion problem) and future work.Before proceeding to the next chapter, we find it best to introduce thegeneral notation that will be used throughout the thesis. However, eachsubsequent chapter will introduce additional notation helpful in the specificcontext.Notation: We denote complex valued functions of real variables usingbold letters, e.g., f : R2 → C. For any integer n ∈ N, [n] denotes the set{` ∈ N : 1 ≤ ` ≤ n}. For k, ` ∈ N, bk indicates the k-th entry of the vector b,Dk` denotes the (k, `) entry of the matrix D and Dk∗ (D∗`) denotes the k-throw (resp. `-th column) of the matrix. We reserve x, y to denote real variablesand write the complex exponential as e(x) := e2piix, where i is the imaginaryunit. For a vector f ∈ Cn, ‖f‖1 :=∑nk=1 |fk| is the `1 norm, ‖f‖2 :=[∑nk=1 |fk|2]1/2is the Euclidean norm, and ‖f‖0 gives the total number ofnon-zero elements of f . For a matrix X ∈ Cn×m, σk(X) denotes the k-thlargest singular value of X, ‖X‖∗ :=∑min (n,m)k=1 σk(X) is the nuclear norm,10‖X‖ := σ1(X) is the spectral norm, and ‖X‖F :=[∑nk=1∑m`=1 |Xk`|2]1/2isthe Frobenious norm of X. L2(Ω) is the Lebesgue space and H1(Ω) is theSobolev space W 1,2(Ω) (with domain Ω). The adjoint of an operator A isdenoted by A∗.11Chapter 2Compressive Off-the-GridSampling2.1 IntroductionIn this chapter, we produce our results under the sparse signal model dis-cussed in Section 1.5. In Section 2.1.1, we begin with some motivation andintuition for our problem of interest and analysis approach. Sections 2.2and 2.3 proceed to lay the foundation necessary to state the methodologyand main result. The remaining sections of the chapter are dedicated to anelaborated discussion of the main result and its proof.2.1.1 The Anti-aliasing Nature of Nonuniform SamplesThough there is few theoretical work explicitly stating the anti-aliasing be-havior of nonuniform samples, the intuition of this phenomenon is not hardto grasp. In this section we provide some illustrations to aid the readercomprehend this matter and lightly explain the proof technique used in thethesis.When trying to capture a signal via equispaced samples, the main issueis that one can encounter higher and lower frequency signals that satisfy thesame set of measurements. This is illustrated in Figure 2.1 (credit for theseimages is given to [25]). In the top image, all three sinusoids (black, blueand yellow) pass through the four uniform samples. In effect, no recoverymethod should be able to distinguish between these signals in order to capture12the black sinusoid of interest. In general, additional equispaced samples areneeded to remove this aliasing bias (i.e., oversampling).On the other hand, the bottom image in Figure 2.1 illustrates the anal-ogous situation for nonuniform samples. Notice that in this case, only theblack sinusoid fits all four measurements and can thus be distinguished fromthe other curves. This simple example provides the intuition behind theanti-aliasing nature of nonuniform samples.Figure 2.1: Illustration of alias error for uniform and nonuniform samples.Credit is given to [25]. (Top) alias error caused by equispaced samples.Notice that all three curves pass through the sample points. (Bottom) aliasfree sampling due to nonuniform samples. Notice that only the black curvepasses through all four points.We provide some more intuition, now in the frequency domain. We con-sider the effect of uniform and nonuniform undersampling on the spectrumof a continuous function. Consider Figure 2.2 (credit for this image is given13to [35]). The top image corresponds to a densely sampled signal (aboveNyquist rate) and its respective spectrum (on the right), which has few sig-nificant Fourier coefficients (only 6 non-zero components). The middle andbottom images correspond to the same signal nonuniformly and uniformlyundersampled along with the respective spectra. We see that both under-sampled cases introduce noise into the spectrum, but only in the case ofrandom nonuniform samples is this noise low in amplitude. This allows fordetection of all 6 signal coefficients using nonuniform samples (by applyinga threshold). On the other hand, periodic undersampling makes this taskrather difficult.Figure 2.2: Different undersampling schemes and their imprint in the Fourierdomain for a signal that is the superposition of three cosine functions. Creditfor this image is given to [35]. (Top) signal uniformly sampled above Nyquistrate and its respective spectrum on the left. (Middle) same signal randomlynonuniformly three-fold undersampled according to a discrete uniform dis-tribution. (Bottom) same signal uniformly three-fold undersampled. Noticethat only in the case of nonuniform undersampling can the significant coef-ficients be detected by applying a threshold.This last argument (illustrated in Figure 2.2) is important for the workof this thesis. Basically, this is the approach that will be used to provethe anti-aliasing nature of random nonuniform samples. However, we will14generalize this argument to work for coefficients in any basis (rather thanjust the Fourier domain).2.2 Notation, Assumptions and MethodologyBefore being able to state our methodology and main result, we must intro-duce important definitions and assumptions. In this section we introduce thesignal model, deviation model and interpolation kernel used as a measure-ment matrix in Sections 2.2.1, 2.2.2 and 2.2.3 respectively. This will allow usto proceed to Section 2.3 where the main result of this chapter is elaborated.2.2.1 Signal ModelLet Ω = [−12, 12) and let f : Ω→ C be the function of interest to be sampledin Ω. We assume f ∈ H1(Ω) with Fourier expansionf(x) =∞∑`=−∞c`e(`x), (2.1)valid only for x ∈ Ω. Note that our regularity assumption implies that∞∑`=−∞|c`| <∞,which will be crucial for our error bound.Henceforth, let N ∈ N be odd. We denote the discretized regular datavector by f ∈ CN , which is obtained by sampling f on the uniform gridτ = {t1, · · · , tN} ⊂ Ω, where tk := k−1N − 12 , which is a collection of equispacedpoints, so that fk = f(tk). Here, f will be our discrete signal of interestto recover via few nonuniform samples (f will provide an N−12-bandlimitedapproximation of f(x)). Similar results can be obtained in the case N is even,our current assumption is only needed to simplify our exposition.The observed discretized nonuniform data vector is denoted by f˜ ∈ Cmwith underlying unstructured grid τ˜ = {t˜1, · · · , t˜m} ⊂ Ω where t˜k := k−1m −12+ ∆k is now a collection of generally non-equispaced points. The entries ofthe perturbation vector ∆ ∈ Rm defines the pointwise deviations of τ˜ fromthe equispaced grid {k−1m− 12}mk=1, where f˜k = f(t˜k). We assume that the15nonuniform points remain in our sampling space, i.e., τ˜ ⊂ Ω so that (2.1)remains valid for these measurements. See sampling on the torus discussionin Section 2.2.2 for further discussion of this important restriction.In our sampling scenario, noisy nonuniform samples are given asb = f˜ + d ∈ Cm,where the noise model, d with ‖d‖2 ≤ η, does not incorporate off-the-gridcorruption. We assume that we know τ˜ , and η (or at least an upper boundon the noise energy).In this chapter, we impose a compressibility condition on f ∈ CN . Tothis end, let n ≤ N and Ψ : CN×n be a full rank matrix with 0 < σn(Ψ) := αand σ1(Ψ) := β. We assume there exists some g ∈ Cn such that f = Ψg,where g can be accurately approximated by an s ≤ n sparse vector. Moreprecisely, for s ∈ [n] we define the error of best s-sparse approximation of gass(g) := min‖h‖0≤s‖h− g‖1,and assume s has been chosen so that s(g) is within a prescribed errortolerance determined by the practitioner. Further, we require m ≤ n (neededin the proof of Theorem 2.5.3, see Lemma 3.6 from [58]).The transform Ψ will have to be incoherent with respect to the 1D cen-tered discrete Fourier basis F ∈ CN×N (see Section 2.2.3 for definition of F).To be precise, we define the DFT-incoherence parameter asγ = max`∈[n]N∑k=1|〈F∗k,Ψ∗`〉| ,which provides a uniform bound on the `1-norm of the DFT coefficients of thecolumns of Ψ. This parameter will play a role in the sampling complexityof our result (see Section 2.3.1). We discuss γ in detail in Section 2.4.2,including several examples of the value of γ for several transforms commonin compressive sensing.In this chapter, the goal is to estimate g via basis pursuit (1.5), where Awill be replaced by an interpolation kernel S ∈ Cm×N that achieves Sf ≈ f˜accurately (the Dirichlet kernel, see Section 2.2.3).162.2.2 Deviation ModelOur work will provide analysis for deviations ∆ ∈ Rm (defined in Section2.2.1) whose entries are i.i.d. with any distribution, D, that obeys thefollowing: for δ ∼ D, there exists some θ ≥ 0 such that for all integers0 < |j| ≤ 2(N−1)mwe have|Ee(jmδ)| ≤ θm2N.This will be known as our deviation model. In our results, distributionswith smaller θ parameter will require less samples and provide reduced errorbounds.Sampling on the torus : It is important to notice that our deviation modelcan potentially allow for nonuniform samples outside of Ω := [−12, 12). Thisviolates our assumption in Section 2.2.1, as (2.1) will no longer hold for τ˜ .To remedy this, we modify our sampling scheme to be on the torus. To beprecise, if f|Ω(x) is given asf|Ω(x) ={f(x) if x ∈ [−12, 12)0 if x /∈ [−12, 12),then we define f˜(x) as the periodic extension of f|Ω(x) to the whole linef˜(x) =∞∑`=−∞f|Ω(x+ `).We now apply samples generated from our deviation model to f˜(x). In-deed, for any t˜k generated outside of Ω by our deviation model we will havef˜(t˜k) = f(t∗) for some t∗ ∈ Ω. In this way we may proceed with our unre-stricted deviation model, which will provide nonuniform samples for f in Ω(in particular the expansion (2.1) will remain valid for these samples).We postpone further discussion of the deviation model until Section 2.4.1,where we will also provide examples of deviations that fit this model.2.2.3 Dirichlet KernelWe model our nonuniform samples via an interpolation kernel S ∈ Rm×Nthat achieves Sf ≈ f˜ accurately. In this thesis, we consider the Dirichletkernel defined by S = NF∗ : CN → Cm, where F ∈ CN×N is a 1D centered17discrete Fourier transform (DFT) and N ∈ Cm×N is a 1D centered nonuni-form discrete Fourier transform (NDFT, see [45, 53]) with normalized rowsand irregular frequencies chosen according to τ˜ . In other words, let N˜ = N−12,then the (k, `) ∈ [m]× [N ] entry of N is given asNk` = 1√Ne((`− N˜ − 1)t˜k).This NDFT is referred to as a nonuniform discrete Fourier transform of type2 in [53]. Thus, the action of S on f ∈ CN can be given as follows: we firstapply the centered inverse DFT to our discrete uniform datafˇu := (F∗f)u =N∑p=1fpF∗up :=1√NN∑p=1fpe((u− N˜ − 1)tp), ∀u ∈ [N ], (2.2)followed by the NDFT in terms of τ˜(Sf)k := (N fˇ)k =N∑u=1fˇuNku := 1√NN∑u=1fˇue(−t˜k(u− N˜ − 1)), ∀k ∈ [m].(2.3)Equivalently,(Sf)k = 1NN∑p=1fpK(t˜k − tp), (2.4)whereK(θ) = sin (Npiθ)sin (piθ)is the Dirichlet kernel. This equality is well known andholds by applying the geometric series formula upon expansion (notice thatS ∈ Rm×N is real valued). This kernel is commonly used for trigonometricinterpolation and is therefore accurate when acting on signals that can bewell approximated by trigonometric polynomials of finite order, as shown inthe following theorem.Theorem 2.2.1. Let S, f and f˜ be defined as above with τ˜ ⊂ Ω. For eachk ∈ [m], if t˜k = tp for some p ∈ [N ] then(f˜ − Sf)k= 0 (2.5)18and otherwise(f˜ − Sf)k=∑|`|>N˜c`(e(`t˜k)− (−1)b `+N˜N ce(r(`)t˜k)), (2.6)where r(`) = rem(` + N˜ ,N) − N˜ with rem(`,N) giving the remainder afterdivision of ` by N . As a consequence, for any 1 ≤ p <∞‖f˜ − Sf‖p ≤ 2m1p∑|`|>N˜|c`| , (2.7)and‖f˜ − Sf‖∞ ≤ 2∑|`|>N˜|c`| . (2.8)The proof of this theorem is postponed until section 2.6.Therefore, the error of S is proportional to the `1-norm of the Fouriercoefficients of f that correspond to frequencies higher than N˜ = N−12. Inparticular notice that if c` = 0 for all ` > N˜ we obtain perfect interpolation,as expected from standard results in signal processing (this will only happenfor trigonometric polynomials of degree ≤ N˜). Despite the wide usage oftrigonometric interpolation in applications [86, 19, 41], such a result thatgives an exact error term does not seem to exist in the literature.Notice that Theorem 2.2.1 only holds for τ˜ ⊂ Ω as restricted in Section2.2.1. However, the results continues to hold for τ˜ unrestricted if we sampleon the torus as discussed in Section 2.2.2 (in particular the error bound willalways hold for our deviation model in this sense).2.3 Main ResultWith the definitions and assumptions introduced in Section 2.2, our method-ology in this chapter will consist of approximating the s largest coefficientsof f in Ψ (in the representation f = Ψg) asg ≈ g] := arg minh∈Cn‖h‖1 s.t. ‖SΨh− b‖2 ≤ η + 2√m∑|`|>N−12|c`| . (2.9)This will give us the approximation f ≈ Ψg]. Recall that S was intro-duced in Section 2.2.3 and Ψ, b, η where defined in Section 2.2.1. The term192√m∑|`|>N˜ |c`| in the noise constraint is due to the error of our interpola-tion kernel S given in (2.7). Thus, since SΨg = Sf ≈ f˜ , (2.9) models ournonuniform samples within the noise level and interpolation error. Underour sparse signal model imposed on g in Section 2.2.1, this approach shouldsuccessfully recovery f from the few nonuniform samples f˜ as shown in thefollowing main results of this chapter.2.3.1 Simplified ResultThis section provides a simplified version of our result assuming that Ψ is anorthonormal basis. We present this result as a corollary of the main resultin Section 2.3.2.Corollary 2.3.1. Let Ψ ∈ CN×N be an orthonormal basis with DFT-incoherenceparameter γ. Define S with the entries of ∆ i.i.d. from any distribution thatsatisfies our deviation model with θ < 1√2.Defineg] = arg minh∈CN‖h‖1 s.t. ‖SΨh− b‖2 ≤ η + 2√m∑|`|>N−12|c`|. (2.10)Let τ := 14√2− θ4> 0. If√m ≥ c1γ√s log2(N)√1 + θ + ττ(2.11)where c1 is an absolute constant, then for c2, c3 > 0 depending only on θ‖f −Ψg]‖2 ≤ c2s(g)√s+c3√Nη√m+ 2c3√N∑|`|>N−12|c`| (2.12)with probability exceeding1− exp(− mτ8sγ2log(1 + 2 log(1 +τ2τ + 1 + 2θ))). (2.13)Therefore, with m ∼ s log4(N) random off-the-grid samples we can re-cover f with error (2.12) proportional to the sparse model mismatch (s(g)),the noise level (η) and the error of the N−12-bandlimited approximation of f(∑|`|>N−12|c`|). As a consequence, we can approximate f(x) for all x ∈ Ω asstated in the following corollary.20Corollary 2.3.2. Let h : Ω → CN be the vector valued function definedentry-wise for ` ∈ [N ] ash(x)` :=1√Ne((`− N˜ − 1)x),and define the function f] : Ω→ C viaf](x) = 〈h(x),F∗Ψg]〉,where g] is given by (2.10).Then, under the assumptions of Corollary 2.3.1,|f(x)− f ](x)| ≤ c2s(g)√s+c3√Nη√m+ 2(c3√N + 1)∑|`|>N−12|c`| (2.14)holds for all x ∈ Ω = [−12, 12) with probability exceeding (2.13).The proof of this corollary is presented in Section 2.6.In the case s(g) = η = 0, the result intuitively says that we can recovertheN−12-bandlimited approximation of f(x) with O(s polylog(N)) randomnonuniform samples. In the case of uniform samples, O(N) measurementsare needed for the same quality of reconstruction by the Nyquist-Shannonsampling theorem (or by Theorem 2.2.1 directly). Thus, for compressiblesignals s  N , random nonuniform samples provide a significant reductionin sampling complexity (see Section 2.4 for further discussion).Notice that general denoising is not guaranteed via our undersamplingscenario (m ≤ N), due to the term√Nη√min (2.12) and (2.14). In other words,one cannot expect to reduce the measurement noise η since√N√m≥ 1 appearingin our error bound implies an amplification of the input noise level. In generala practitioner must oversample (i.e., N < m) to attenuate the effects of noise(our result does not apply in this case since we assumed m ≤ N in Section2.2.1, see proof of Theorem 2.5.3). However, the main result states thatnonuniform samples can handle alias related noise efficiently.2.3.2 Full ResultWe now present the full result, assuming that Ψ is a full column-rank matrix.Corollary 2.3.1 will follow from Theorem 2.3.3 by taking α = β = 1.21Theorem 2.3.3. Let n ≤ N and Ψ ∈ CN×n be a full rank matrix withDFT-incoherence parameter γ andσn(Ψ) := α >√1− 1√2and σ1(Ψ) := β ≤√1 +1√2.Let the entries of ∆ be i.i.d. with any distribution that satisfies our deviationmodel withθ < max{1β2(1 +1√2)− 1, 1 + 1α2(1√2− 1)}.Define g] as in (2.9). Let τ := 14√2− max{β2θ+β2−1,α2θ+1−α2}4> 0.If√m ≥ c˜1γ√s log2(n)√β2 + β2θ + ττ(2.15)where c˜1 is an absolute constant, then for c˜2, c˜3 > 0 depending only on θ, αand β,‖f −Ψg]‖2 ≤ βc˜2s(g)√s+βc˜3√Nη√m+ 2βc˜3√N∑|`|>N−12|c`|with probability exceeding1− exp(− mτ8sγ2log(1 + 2 log(1 +τ2τ + β2(1 + 2θ)))).The proof of this theorem is found in Section 2.5.This theorem generalizes the result in the previous section to more generaltransformations Ψ for sparse representation. This is more practical since thecolumns of Ψ need not be orthogonal, instead linear independence suffices(with knowledge of the singular values α, β). In particular notice that (2.15)requires m ∼ s log4(n) as opposed to m ∼ s log4(N) in (2.11). Since n ≤ N ,this general result allows for a potential reduction in sample complexity if thepractitioner may construct Ψ in such an efficient manner while still allowinga sparse and accurate representation of f .222.4 DiscussionThis section elaborates on several aspects of the main result. Section 2.4.1provides examples of distributions that satisfy our deviation model and in-tuition of its meaning. Section 2.4.2 explores the γ parameter in Corollary2.3.1 and Theorem 2.3.3 with examples of transformations Ψ that produce asatisfiable sampling complexity. This section also considers the practicalityof our signal model. Section 2.4.3 argues for the novelty of our results incontrast to related work in the literature.2.4.1 Deviation ModelIn this section, we present several examples of distributions that are suitablefor our deviation model. These are illustrated in Figure 2.3.• 1) D = U [− 12m, 12m] gives θ = 0. To generalize this example, we maytake D = U [µ− p2m, µ+ p2m], for any µ ∈ R and p ∈ N/{0}. Here θ = 0.• 2) D = U{− 12m+ kmn¯}n¯−1k=0 with n¯ := d2(N−1)m e + 1 gives θ = 0. Togeneralize this example, we may take D = U{µ− p2m+ pkmn¯}n¯−1k=0 , for anyµ ∈ R and p ∈ N/{0} and n¯ ∈ N/[d2(N−1)pme]. Here θ = 0.• 3) D = N (µ, σ¯2), for any µ ∈ R and σ¯2 > 0. Here θ = 2Nme−2(σ¯pim)2.In particular, for fixed σ¯, m may be chosen large enough to satisfy theconditions of Theorem 2.3.3 and vice versa.• 4) Jittered sampling: notice that examples 1) and 2) include cases ofjittered sampling [11, 74, 14, 72]. Indeed, (let µ = 0 and p = 1) wemay choose N,m, n¯ in such a way that partitions Ω into m regions ofequal size and these distributions will choose a point randomly fromeach region (in a continuous or discrete sense).We leave it to the reader to verify the examples above and consider otherdistributions of interest.Interpretation of the deviation model : For any∆ ∈ R, notice that e(jm∆)will be a point on the unit circle. A small θ ≈ 0 parameter in our deviationmodel implies that|Ee(jm∆)| ≈ 0.23Figure 2.3: Illustrations of example perturbations generated by our deviationmodel. In the top two examples, the red areas indicate the allowed positionsof the deviations from the grid pointkm. (Top) deviations pertaining to ex-ample 1) with µ = 0 and p = 1, the samples lie on the interval [2k−12m, 2k+12m].(Middle) deviations pertaining to example 2) with µ = 0 and p = 1, the sam-ples lie on a discrete subgrid centered atkm. (Bottom) deviations pertainingto example 3) with µ = 0 where the bell curve indicates the Gaussian pdfof these centered deviations. Notice that these samples may lie outside of Ω,but recall that we are sampling on the torus (see Section 2.2.2).24Intuitively, θ is in some sense measuring how biased a given distribution is ingenerating deviations. Indeed, |Ee(jm∆)| = 0 means that the distribution inquestion is centered and unbiased to any particular directions. On the otherhand, |Ee(jm∆)| ≈ 1 gives the opposite interpretation where deviations willbe generated favoring a certain direction in an almost deterministic sense.Our result is not applicable to such biased distributions since in Theorem2.3.3, asθ → max{1β2(1 +1√2)− 1, 1 + 1α2(1√2− 1)}we have τ → 0 and c˜2, c˜3 → ∞ (see Section 2.5). Therefore our number ofsamples and error terms blow up in this case. In conclusion, our deviationmodel cannot be satisfied by such biased and quasi-deterministic deviationssince in the case of the degenerate distribution|Ee(jm∆)| = |e(jm∆)| = 1,which gives θ = 2Nm> 1. Such a parameter will never satisfy the conditionsof the main result.2.4.2 Signal ModelWe begin this section with the discussion of the DFT-incoherence parameterγ introduced in Section 2.2.1 asγ = max`∈[n]N∑k=1|〈F∗k,Ψ∗`〉| .This a uniform upper bound on the `1-norm of the discrete Fourier coeffi-cients of the columns of Ψ. Since the decay of the Fourier coefficients of afunction are related to its smoothness, intuitively γ can be seen as a measureof the smoothness of the columns of Ψ. Implicitly, this also measures thesmoothness of f(x) since its uniform discretization admits a representationvia this transformation f = Ψg. Therefore, the role of γ on the samplingcomplexity is clear since smaller γ implies that our signal our interest issmooth and therefore requires less samples. This observation is intuitive,since non-smooth functions will require additional samples to capture jumpdiscontinuities in accordance with Gibbs phenomenon.We now consider several common choices for Ψ and discuss the respectiveγ parameter:25• 1) Ψ = F (the DFT), then γ = 1. This is the optimal case and canalso be achieved for the discrete cosine and sine transforms.• 2) When Ψ = H is the 1D Haar wavelet basis, we have γ ∼ O(log(N)).In [30] it is shown that |〈F∗k,H∗`〉| ∼ 1|k| (see Lemma 6.1), which givesthe desired upper bound for γ via an integral comparison. Notice thatthese basis function have jump discontinuities, and yet we still obtain anacceptable DFT-incoherence parameter for nonuniform undersampling.• 3) Ψ = I (the N × N identity) gives γ = √N . This is the worst casescenario for normalized transforms sincemaxx∈SN−1N∑k=1|〈F∗k, x〉| = maxx∈SN−1N∑k=1|〈F∗k,F∗x〉| = maxx∈SN−1N∑k=1|xk| ≤√N.In general, our smooth signals of interest are not fit for this sparsitymodel.• 4) Let p ≥ 1 be an integer. Matrices Ψ whose columns are uniformdiscretizations of p-differentiable functions, with p − 1 periodic andcontinuous derivatives and with p-th derivative that is piecewise con-tinuous. In this case γ ∼ O(log(N)) if p = 1 and γ ∼ O(1) if p ≥ 2.An informal argument for these computations is provided in Section2.8.Example 4) in particular is informative due to its generality and ability tosomewhat formalize the intuition behind γ previously discussed. This ex-ample implies the applicability of our result to a general class of smoothfunctions that agree nicely with our signal model defined in Section 2.2.1(functions in H1(Ω)).We finish this section by motivating the practicality of our signal model.Note that any f ∈ L2(Ω) provides an infinite Fourier expansion with decayingFourier coefficients |c`|. Such functions can be approximated (according tothe practitioner's error tolerance) by a trigonometric polynomial of degree N˜ ,where N˜ may be extremely large in order to provide∑|`|>N˜ |c`|   (withina desired error tolerance). In essence, our main result shows that randomnonuniform samples allow for robust and stable recovery of f(x) with anerror term proportional to∑|`|>(N−1)/2 |c`|, the error of a N/2-bandlimitedapproximation. This term is a typical (and inevitable) error accepted when26discretizing non-bandlimited signals. Thus, we may capture f to within qual-ity of standard practice, but economically so with sampling complexity pro-portional to the compression level provided by Ψ and only dependent onN logarithmically. In particular, N may be chosen substantially large toaccommodate∑|`|>(N−1)/2 |c`|   for signals with slowly decaying Fouriercoefficients and the number of samples required will not drastically increase.Thus, our work is relevant to a variety applications that are prone tononuniform sampling and have lossy compression techniques available. Thisis the case for many smooth signals of interest including audio [3], radar [60],seismic [29] and other signals that represent dicretizations of analog finite-energy wavefields (e.g., Lamb waves [46]). Many technologies developed forsuch applications have proven through their every day use that most signalsof interest can be significantly compressed (MP3, JPEG 2000, DjVu, WAV,ZIP, PGF, MP4, ICER).2.4.3 Novelty of the ResultsSeveral studies in the literature are similar to our results in this chapter[42, 41], and we would like to stress the novelties of this work and give creditdue before proceeding to the proofs. Our main proof under the sparse signalmodel adopts the approach of [58, 42, 41] in the case S forms a boundedorthonormal system (when θ = 0). However, we derive recovery guaranteesfor non-orthonormal systems (when θ 6= 0) and focus the scope of the paperwithin the context of classical sampling theory (introducing error accord-ing to bandlimited approximation). The work in [41] considers sampling ofsparse trigonometric polynomials and coincides with our application in thecase Ψ = F . Our results generalize this work (in the basis pursuit case) toallow for other signal models and sparsifying transforms. Furthermore, [41]assumes that the samples are chosen uniformly at random from a continuousinterval or a discrete set of N equispaced points. In contrast, our resultspertain to general deviations from an equispaced grid with sampling density∼ s polylog(N) and allow for these and many other distributions of the per-turbations (according to parameter θ). Finally, we also derive results in thelow-rank matrix recovery scenario (see Chapter 3). These results generalizethe previous method of proof to the low-rank data case and may be of intereston their own to establish other results for low-rank matrix recovery theory.In particular, the results under the low-rank data model also provide novelinsights for the popular matrix completion problem.27The differences might seem subtle, but the sampling model we consideris a naturally occurring scenario in many applications that is not covered byprevious work. To illustrate, we consider a typical marine seismic data survey.In this application, cables of sources and receivers equipped with GPS aretowed by a vessel over a survey area. Due to ocean currents and varying shipspeeds, the measurements are inevitably deviated from the desired uniformgrid designed for post-processing [68, 82, 9, 34]. This provides ideal condi-tions for our methodology as the deviations are accurately monitored andcan be appropriately modeled as random phenomena. In comparison, the as-sumptions in [41] would require practitioners to remove sources and receiversuniformly at random from the equipment arrays independently. Since surveytools are pre-designed and fixed in many applications, such sampling schemeswould require mayor modifications of equipment and acquisition design. Infact, most sampling strategies from the compressive sensing literature wouldrequire multiple sources to be fired simultaneously with subgaussian weights,[58, 49, 65, 64, 66, 83, 4, 88], and are therefore impractical for applicationsthat rely on equipment developed according to classical sampling theory. Onthe other hand, we adopt acquisition scenarios that occur frequently in cur-rent practice and our main results show that these situations can be exploitedfor economical sampling of compressible signals.2.5 Proof of Main ResultAs mentioned in Section 2.3.3, the restricted isometry property (RIP) of ameasurement operator is a common analytical tool used to establish sparsevector recovery results via basis pursuit. We will obtain our main result byestablishing this property for our interpolation kernel S with respect to thetransform Ψ. Specifically, we wish to show that SΨ satisfies the following[85, 89]:Definition: Suppose A ∈ Cm×n is a measurement matrix and 1 ≤ s ≤ nis an integer. The restricted isometry constant (RIC) of order s of A is thesmallest number δs such that for all s-sparse vectors v ∈ Cn (i.e., ‖v‖0 ≤ s),(1− δs)‖v‖22 ≤ ‖Av‖22 ≤ (1 + δs)‖v‖22. (2.16)Informally, the RIP is said to hold if the RIC is small enough for sparsevector recovery. Many results exist in the literature that give such conditions28on the RIC. To the best of the author's knowledge, [89] provides the bestresult in terms of the RIC:Theorem 2.5.1 (Theorem 2.1 in [89]). Consider the basis pursuit problem(1.5) with feasible vector g and minimizer g]. If the RIC δ2s of AΨ satisfiesδ2s <1√2,then‖g − g]‖2 ≤δ2s√2 +√2δ2s(1√2− δ2s)2(1√2− δ2s) + 1 2s(g)√s + 2η√2(1 + δ2s)1− δ2s√2:=c˜2s(g)√s+ c˜3η.Notice that c˜2, c˜3 only depend on δ2s. These constants provide the errorbounds in Corollary 2.3.1 and Theorem 2.3.3 where δ2s will depend on θ, αand β.This result provides recovery error bounds proportional to the noise leveland sparse model mismatch. We will thus compute the RIC of SΨ and thenapply Theorem 2.5.1 to obtain our main result.2.5.1 Restricted Isometry Property of SΨTo prove Theorem 2.3.3 we will bound the RIC of A :=√N√mSΨ ∈ Cm×n,which will then provide our recovery error bound using well known results inthe literature [89, 85] (Theorem 2.5.1 in the previous section).For an index set T ⊂ [n], let AT ∈ Cm×|T | be the sub-matrix of A thatresults by only considering the columns of A specified by T . We consider thetermXm := sup|T |≤s‖A∗TAT − EA∗TAT‖2.Notice that in contrast to many works in the literature that adopt the sameapproach ([41, 58]), we have EA∗TAT 6= Is×s in general (equality only holdsin the case θ = 0 and α = β = 1). We must therefore deal with this termexplicitly and modify our approach respectively.29In order to bound the RIC, our goal is to boundXm ≤ τ. (2.17)for some τ ≥ 0. This will in turn show that the order s RIC of A is boundedasδs ≤ τ + max{β2θ + β2 − 1, α2θ + 1− α2},where the additional term max{β2θ + β2 − 1, α2θ + 1 − α2} is due to ourdeviation model and assumptions on Ψ. This is shown in the following argu-ment:The variational characterization of the spectral norm for a Hermitianmatrix givesXm = sup|T |≤smaxx∈Ss−1|〈(A∗TAT − EA∗TAT )x, x〉|= supx∈Sn−1,‖x‖0≤s|〈(A∗A− EA∗A)x, x〉|= supx∈Sn−1,‖x‖0≤s∣∣‖Ax‖22 − E‖Ax‖22∣∣ .Therefore, Xm ≤ τ is equivalent toE‖Ax‖22 − τ‖x‖22 ≤ ‖Ax‖22 ≤ E‖Ax‖22 + τ‖x‖22,and to establish our claim it suffices to show thatE‖Ax‖22 ≤ β2(1 + θ)‖x‖22 and E‖Ax‖22 ≥ α2(1− θ)‖x‖22. (2.18)To this end, let w ∈ Cn and normalize N˜ = √NN so that for k ∈ [m], ` ∈[N ]N˜k` = e(t˜k(`− n˜− 1)).Throughout, let ∆˜ ∈ R be an independent copy of the entries of ∆ ∈ Rm.30Then with v := F∗Ψw,E‖Aw‖22 =1mE‖N˜F∗Ψw‖22 :=1mE‖N˜ v‖22= E1mm∑k=1∣∣∣〈N˜k∗, v〉∣∣∣2 = E 1mm∑k=1∣∣∣∣∣N∑`=1e(t˜k(`− n˜− 1))v`∣∣∣∣∣2= E1mm∑k=1 N∑`=1N∑˜`=1e(t˜k(`− ˜`))v`v¯˜`=N∑`=1N∑˜`=1v`v¯˜`(E1mm∑k=1e(t˜k(`− ˜`)))=N∑`=1|v`|2 +b(N−1)/mc∑j=1∑`−˜`=jmv`v¯˜`Ee(jm(∆˜− 1/2))+b(N−1)/mc∑j=1∑`−˜`=−jmv`v¯˜`Ee(−jm(∆˜− 1/2)).The last equality can be obtained as follows,E1mm∑k=1e(t˜k(`− ˜`)) = E 1mm∑k=1e((k − 1m− 12+ ∆k)(`− ˜`))=1mm∑k=1e((k − 1m− 12)(`− ˜`))Ee(∆k(`− ˜`))=1mm∑k=1e((k − 1m− 12)(`− ˜`))Ee(∆˜(`− ˜`))= Ee((∆˜− 1/2)(`− ˜`)) m∑k=11me(k − 1m(`− ˜`))=1 if ` = ˜`Ee(jm(∆˜− 1/2))if `− ˜`= jm, j ∈ Z/{0}0 otherwisewhere the third equality uses the fact that Ee(∆k(`− ˜`)) = Ee(∆˜(`− ˜`)) forall k ∈ [m] in order to properly factor out this constant from the sum in the31fourth equality. The last equality is due to the properties of this geometricseries.Returning to our original calculation, we bound the last term using ourdeviation model assumptions∣∣∣∣∣∣b(N−1)/mc∑j=1∑`−˜`=−jmv`v¯˜`Ee(−jm(∆˜− 1/2))∣∣∣∣∣∣=∣∣∣∣∣∣b(N−1)/mc∑j=1∑`∈Qjv`v¯`+jmEe(−jm(∆˜− 1/2))∣∣∣∣∣∣≤b(N−1)/mc∑j=1∑`∈Qj|v`||v`+jm|∣∣∣Ee(−jm(∆˜− 1/2))∣∣∣≤ θm2Nb(N−1)/mc∑j=1∑`∈Qj|v`||v`+jm| ≤ θm2Nb(N−1)/mc∑j=1|〈v, v〉|=θm‖v‖222Nb(N−1)/mc∑j=11 =θm‖v‖222NbN − 1mc ≤ θ‖v‖222.Here, Qj ⊂ [N ] is the index set of allowed ` indices according to j, i.e.,that satisfy ` ∈ [N ] and ` + jm ∈ [N ]. The remaining sum can be boundedsimilarly. Combine these inequalities with the singular values of Ψ to obtainE‖Aw‖22 ≤ ‖v‖22 +2θ‖v‖222= ‖Ψw‖22 (1 + θ) ≤ β2‖w‖22 (1 + θ) ,andE‖Aw‖22 ≥ α2‖w‖22 (1− θ) .We will apply this inequality and similar orthogonality properties in whatfollows, and ask the reader to keep this in mind. To establish (2.17), we willapply a concentration inequality (Theorem 5.2 in [41], established in [87]).The following lemma will be useful for this purpose.Lemma 2.5.2. Let v ∈ Cn be an s-sparse vector and define N˜ as abovewith the entries of ∆ i.i.d. with any distribution D, such that for integers0 < |j| ≤ 2(N−1)m, if δ ∼ D then|Ee(jδm)| ≤ θm2N,32for some θ ≥ 0. Defineγ = max`∈[n]N∑k=1|〈F∗k,Ψ∗`〉| .Then for all k ∈ [m]|〈Ak∗, v〉| ≤ γ√s‖v‖2√m,andEm∑k=1|〈Ak∗, v〉|4 ≤ sβ2γ2(1 + 2θ)‖v‖42mThe proof of this lemma can be found in Section 2.7.We are now in a position to bound the RIC. The proof will follow thearguments in [41, 58]. In particular, the main difference here is that the rowsof A are not identically distributed and EA∗A 6= In×n in general (equalityonly holds in the case θ = 0 and α = β = 1). However, the proof goesthrough similarly, where we need only to apply the previous lemma.Theorem 2.5.3. Let the entries of ∆ be i.i.d. with any distribution D, suchthat for integers 0 < |j| ≤ 2(N−1)m, if δ ∼ D then|Ee(jδm)| ≤ θm2N,for some θ ≥ 0. Defineγ = max`∈[n]N∑k=1|〈F∗k,Ψ∗`〉| .If √m√log(m)≥ 2cγ√s log(s)√log(n)√1 + θ + ττ,then the order s restricted isometry constant of A :=√N√mSΨ ∈ Cm×n satisfiesδs ≤ 2τ + max{β2θ + β2 − 1, α2θ + 1− α2}with probability exceeding1− exp(− mτ4sγ2log(1 + 2 log(1 +τ2τ + β2(1 + 2θ)))).33Proof. In what follows, for k ∈ [m] let AkT ∈ C|T | denote the k-th row of AT .DefineXm := sup|T |≤s‖A∗TAT − EA∗TAT‖2 = sup|T |≤s‖m∑k=1(AkT ∗AkT − EAkT ∗AkT ) ‖2.As noted in the beginning of the section, our RIC bound will be establishedif we show Xm ≤ 2τ with high probability. We begin by bounding EDXm,and then show that Xm does not deviate much from its expectation withhigh probability.Symmetrizing as Lemma 6.3 in [56], we haveEDXm ≤ 2EDE sup|T |≤s‖m∑k=1kAkT ∗AkT‖2,where the k are independent Rademacher random variables. We will nowapply Lemma 3.6 in [58], which requires m ≤ n and an upper bound of‖Ak∗‖∞ for all k ∈ [m] accordingly.Notice that for k ∈ [m]‖Ak∗‖∞ = max`∈[n]|Ak`| = max`∈[n]1√m∣∣∣〈N˜k∗, (F∗Ψ)`〉∣∣∣ = max`∈[n]1√m∣∣∣∣∣N∑p=1N˜kp(F∗Ψ)p`∣∣∣∣∣≤ 1√mmax`∈[n]N∑p=1|N˜kp| |(F∗Ψ)p`| = 1√mmax`∈[n]N∑p=1|(F∗Ψ)p`| := γ√m.Applying Lemma 3.6 in [58] with this observation gives that2E sup|T |≤s‖m∑k=1kAkT ∗AkT‖2 ≤ Q(s,m, n) sup|T |≤s‖m∑k=1AkT ∗AkT‖1/22 .We have definedQ(s,m, n) :=2cγ√s log(s)√log(n) log(m)√m,34where c is an absolute constant given in Lemma 3.6 from [58]. We maytherefore boundEDXm ≤ Q(s,m, n)ED sup|T |≤s‖m∑k=1AkT ∗AkT‖1/22 ≤ Q(s,m, n)(ED sup|T |≤s‖m∑k=1AkT ∗AkT‖2)1/2≤ Q(s,m, n)(ED sup|T |≤s‖m∑k=1(AkT ∗AkT − EDAkT ∗AkT ) ‖2 + sup|T |≤s‖EDA∗TAT‖2)1/2≤ Q(s,m, n)(ED sup|T |≤s‖m∑k=1(AkT ∗AkT − EDAkT ∗AkT ) ‖2 + β2 + β2θ)1/2= Q(s,m, n)(EDXm + β2 + β2θ)1/2.The last inequality holds as in the proof of (2.18), sincesup|T |≤s‖EDA∗TAT‖2 ≤ ‖EDA∗A‖2 = supx∈Sn−1ED〈A∗Ax, x〉 = supx∈Sn−1ED‖Ax‖22 ≤ β2 + β2θ.In conclusion,EDXm(EDXm + β2 + β2θ)1/2≤ 2cγ√s log(s)√log(n) log(m)√mand we may achieveEDXm ≤ τif2cγ√s log(s)√log(n) log(m)√m≤ τ√τ + β2 + β2θ. (2.19)We now apply a concentration inequality to show that Xm is close to itsexpected value with high probability. We use the following result (Theorem5.2 in [41], established in [87]).Theorem 2.5.4. Let Y1, ..., Ym be a sequence of independent random vari-ables with values in some Polish space X . Let F be a countable collection ofreal-valued measurable and bounded functions f on X with ‖f‖∞ ≤ B for allf ∈ F . Let Z be the random variableZ = supf∈Fm∑k=1f(Yk).35Assume Ef(Yk) = 0 for all k ∈ [m] and all f ∈ F . Define σ2 = supf∈F E∑mk=1 f(Y`)2.Then for all t ≥ 0P (Z ≥ EZ + t) ≤ exp(− t4Blog(1 + 2 log(1 +Bt2BEZ + σ2))).To apply the theorem, letDs := {x ∈ Sn−1 : ‖x‖0 ≤ s},andX := {Z ∈ Cn×n : 〈Zx, x〉 ∈ R ∀x ∈ Ds and supx∈Ds〈Zx, x〉 ≤ γ2sm},which is a closed subset of Cn×n (a Polish space), and therefore itself a Polishspace (see for example [31]). For x ∈ Ds, we define the function fx : X 7→ Rasfx(Z) := 〈Zx, x〉,which is real valued by definition of X . Then notice thatXm = sup|T |≤s‖m∑k=1(AkT ∗AkT − EAkT ∗AkT ) ‖2= sup|T |≤ssupx∈Ss−1m∑k=1〈(AkT ∗AkT − EAkT ∗AkT )x, x〉= supx∈Dsm∑k=1〈(A∗k∗Ak∗ − EA∗k∗Ak∗)x, x〉 = supfx,x∈Dsm∑k=1fx (A∗k∗Ak∗ − EA∗k∗Ak∗)= supfx,x∈D∗sm∑k=1fx (A∗k∗Ak∗ − EA∗k∗Ak∗) ,where D∗s is a dense countable subset of Ds. Furthermore, by the first partof Lemma 2.5.2, for all k ∈ [m] and x ∈ Dsfx(A∗k∗Ak∗ − EA∗k∗Ak∗) = |〈Ak∗, x〉|2 − E|〈Ak∗, x〉|2 ≤γ2sm,so that {A∗k∗Ak∗ − EA∗k∗Ak∗}mk=1 ⊂ X .36Otherwise, we meet all the assumptions of the Theorem with B := γ2sm,and need only to compute σ2. To this end we apply the second part of Lemma2.5.2, and obtain for any x ∈ DsEm∑k=1fx(A∗k∗Ak∗ − EA∗k∗Ak∗)2 = Em∑k=1|〈Ak∗, x〉|4 −(E|〈Ak∗, x〉|2)2≤ Em∑k=1|〈Ak∗, x〉|4 ≤ sβ2γ2(1 + 2θ)m:= σ2.To finish, apply Theorem 2.5.4 with t = τ > 0 and assume√m√log(m)≥ 2cγ√s log(s)√log(n)√β2 + β2θ + ττ,which gives EXm ≤ τ according to (2.19). ThenXm ≤ EXm + t ≤ 2τwith probability exceeding1− exp(− mτ4sγ2log(1 + 2 log(1 +τ2τ + β2(1 + 2θ)))).Thus, by our remarks in the beginning of the section, A :=√N√mSΨ satifiesthe RIP inequality with constant 2τ + max{β2θ + β2 − 1, α2θ + 1− α2} andthe derived probability.2.5.2 Proof of Theorem 2.3.3We finish by applying the RIC bound established in the previous sectionto well known results in compressive sensing and obtain our recovery errorbound.Proof of Theorem 2.3.3. Let f = Ψg and e = f˜ − Sf . Then for any g¯ ∈ Cn‖SΨg¯ − f˜ − d‖2 = ‖SΨg¯ − (Sf + e)− d‖2=‖SΨg¯ − SΨg − e− d‖2.37Thus, g] equivalently solvesming¯∈Cn‖g¯‖1 subject to ‖SΨg¯ − SΨg − e− d‖2 ≤ 2√m∑|`|≥N−12|c`|+ η= ming¯∈Cn‖g¯‖1 subject to ‖A(g¯ − g)−√N√m(e+ d)‖2 ≤√N√m2√m ∑|`|≥N−12|c`|+ η ,where as before A :=√N√mSΨ. Notice that g is feasible since√N√m‖e+ d‖2 ≤√N√m‖e‖2 +√N√m‖d‖2 ≤√N√m2√m ∑|`|≥N−12|c`|+ √Nη√m,where the last equality follows since ‖e‖2 ≤ 2√m∑|`|≥N−12|c`| by Theorem2.2.1. Apply Theorem 2.5.3 withτ :=14√2− max{β2θ + β2 − 1, α2θ + 1− α2}4> 0,which is positive by our assumptions on θ, α and β. Then if√m√log(m)≥ 2cγ√2s log(2s)√log(n)√β2 + β2θ + ττ, (2.20)A satisfies the 2s-restricted isometry property with constant satisfyingδ2s ≤ 2τ + max{β2θ + β2 − 1, α2θ + 1− α2}:=12√2+max{β2θ + β2 − 1, α2θ + 1− α2}2<1√2and probability exceeding1− exp(− mτ8sγ2log(1 + 2 log(1 +τ2τ + β2(1 + 2θ)))).Since δ2s <1√2, Theorem 2.1 in [89] (the case t = 2 or see Theorem 2.5.1)and the singular values of Ψ give‖f−Ψg]‖2 = ‖Ψ(g−g])‖2 ≤ β‖g−g]‖2 ≤ βc˜2s(g)√s+βc˜3√N√m2√m ∑|`|≥N−12|c`|+ η ,(2.21)38where c˜2, c˜3 > 0 depend only on12√2+ max{β2θ+β2−1,α2θ+1−α2}2(c˜2, c˜3 are givenin [89] or see Theorem 2.5.1).In particular, since s ≤ n and by assumption m ≤ n, if√m ≥ 2cγ√2s log2(2n)√β2 + β2θ + ττ, (2.22)then (2.20) holds. In the statement of the theorem we have absorbed all theabsolute constants in this expression into the definition of c˜1.2.6 Interpolation Error of Dirichlet Kernel: ProofIn this section we provide the error term of our interpolation operator whenapplied to our signal model (Theorem 2.2.1) and also the error bound givenin Corollary 2.3.2.Proof of Theorem 2.2.1. We begin by showing that if t˜k = tp˜ for some p˜ ∈ [N ](i.e., our nonuniform sample lies on the dense interpolation grid), then theerror is zero. This is easy to see by orthogonality of the complex exponentials,combining (2.2), (2.3) (recall that N˜ = N−12) we have(Sf)k = 〈f,Sk∗〉 =N∑p=1fpSkp := 1NN∑p=1fp N˜∑u=−N˜e(utp)e(−ut˜k):=1NN∑p=1fp N˜∑u=−N˜e(utp)e(−utp˜) = fp˜ = f(tp˜) = f(t˜k) = f˜k.This establishes (2.5).We now deal with the general case (2.6). Recall the Fourier expansion ofour underlying functionf(x) =∞∑`=−∞c`e(`x).39Again, using (2.2), (2.3) and the Fourier expansion at f(tp) = fp we obtain(Sf)k = 〈f,Sk∗〉 =N∑p=1fpSkp:=1NN∑p=1( ∞∑`=−∞c`e(`tp)) N˜∑u=−N˜e(utp)e(−ut˜k) .At this point, we wish to switch the order of summation and sum over allp ∈ [N ]. We must assume the corresponding summands are non-zero. To thisend, we continue assuming fp,Skp 6= 0 for all p ∈ [N ]. We will deal with thesecases separately afterward. In particular we will remove this assumption forthe fp's and show that Skp 6= 0 under our assumption τ˜ ⊂ Ω.Proceeding, we may now sum over all p ∈ [N ] since these summands arenon-zero and switch the order of summations to obtain(Sf)k = 1NN˜∑u=−N˜∞∑`=−∞c`e(−ut˜k)N∑p=1e((u+ `)tp)=N˜∑u=−N˜∞∑j=−∞(−1)jNcjN+ue(ut˜k) =∞∑j=−∞(−1)b j+N˜N ccje(r(j)t˜k).The second equality is obtained by orthogonality of the exponential basisfunctions,∑Np=1 e((u + `)tp) = 0 when ` + u /∈ NZ and otherwise equal toN(−1)jN for some j ∈ Z (where u+ ` = jN). The last equality results froma reordering of the series where the mapping r is defined as in the statementof Theorem 2.2.1. To illustrate, we consider j ≥ 0 (for simplicity) and firstnotice that (−1)jN = (−1)j since N is assumed to be odd in Section expanding the previous sum givesN˜∑u=−N˜∞∑j=0(−1)jcjN+ue(ut˜k) = e(−N˜ t˜k)(c−N˜ ,−cN−N˜ , c2N−N˜ , · · ·)+ e((−N˜ + 1)t˜k)(c−N˜+1,−cN−N˜+1, c2N−N˜+1, · · ·)· · ·+ (c0,−cN , c2N , · · · )· · ·+ e(N˜ t˜k)(cN˜ ,−cN+N˜ , c2N+N˜ , · · ·).Notice that in the first line we have indices N−N˜ = N˜+1 and 2N−N˜ =N + N˜ + 1. Therefore, if start at the term c−N˜ and traverse this array ofFourier coefficients we will obtain the ordered sequence {(−1)b j+N˜N ccj}∞j=−N˜(with no repetitions). The coefficients in row q ∈ [N ] correspond to frequencyvalue −N˜ + q − 1 and have indices of the form cpN−N˜+q−1 for some p ∈ N.Let us check that for a given index the mapping r gives the correct frequencyvalue, i.e., r(pN − N˜ + q − 1) = −N˜ + q − 1 for all q ∈ [N ]:r(pN − N˜ + q − 1) := rem(pN − N˜ + q − 1 + N˜ ,N)− N˜= rem(pN + q − 1, N)− N˜ = q − 1− N˜ .We can therefore reorder the series as desired and incorporate the sumover j < 0 via the same logic to establish the equality. Since for ` ∈{−N˜ ,−N˜ +1, · · · , N˜} we have r(`) = ` and (−1)b `+N˜N c = 1, we finally obtainf(t˜k)− (Sf)k =∑|`|>N˜c`(e(`t˜k)− (−1)b `+N˜N ce(r(`)t˜k)).The definition of `p-norms along with the triangle inequality give the remain-41ing claim. In particular,‖f˜ − Sf‖p =(m∑k=1∣∣f(t˜k)− (Sf)k∣∣p)1/p= m∑k=1∣∣∣∣∣∣∑|`|>N˜c`(e(`t˜k)− (−1)b `+N˜N ce(r(`)t˜k))∣∣∣∣∣∣p1/p ≤ m∑k=1∑|`|>N˜2|c`|p1/p=m∑|`|>N˜2|c`|p1/p = 2m1/p ∑|`|>N˜|c`|.This finishes the proof in the case fp,Skp 6= 0 for all p ∈ [N ]. To removethis condition for the fp's, we may find a real number µ such that the functionh(x) := f(x) + µ =∑`∈(−∞,∞)∩Z/{0}c`e(`x) + c0 + µis non-zero when x ∈ {tp}Np=1. In particular notice that if we define h =f + µ ∈ CN , then hp 6= 0 for all p ∈ [N ]. Therefore, only assuming now thatSkp 6= 0 for p ∈ [N ], the previous argument can be applied to concludeh(t˜k)− (Sh)k =∑|`|>N˜c`(e(`t˜k)− (−1)b `+N˜N ce(r(`)t˜k)).However, if 1N ∈ CN denotes the all ones vector and eN˜+1 ∈ CN is theN˜ + 1-th standard basis vector, notice that(Sh)k = 〈Sk∗, h〉 = 〈Sk∗, f〉+ µ〈Sk∗, 1N〉 = 〈Sk∗, f〉+ µ〈Nk∗,F∗1N〉= 〈Sk∗, f〉+ µ√N〈Nk∗, eN˜+1〉 = 〈Sk∗, f〉+ µ = (Sf)k + µ.The fourth equality holds by orthogonality of F∗ and since F∗(N˜+1)∗ =1√N1N .Thereforeh(t˜k)− (Sh)k = f(t˜k) + µ− ((Sf)k + µ) = f(t˜k)− (Sf)k,and the claim holds in this case as well.42The assumption Skp 6= 0 will always hold under our assuption τ˜ ⊂ Ω, i.e.,t˜k ∈ [−12 , 12) for all k ∈ [m]. We show this case by deriving conditions underwhich this occurs. As noted before, we haveSkp :=N˜∑u=−N˜e(u(tp − t˜k)) =N−1∑u=0e(u(tp − t˜k))e(−N˜(tp − t˜k))= e(−N˜(tp − t˜k))1− e(N(tp − t˜k))1− e(tp − t˜k)and we see that Skp = 0 iff N(tp − t˜k) ∈ Z/{0} and tp − t˜k /∈ Z. However,notice thatN(tp − t˜k) = N(p− 1N− k − 1m−∆k)= p− 1− N(k − 1)m−N∆k,so that N(tp − t˜k) ∈ Z/{0} iff N(k−1)m + N∆k = Nt˜k + N2 ∈ Z/{p− 1}. Thiscondition equivalently gives that t˜k =jN− 12for some j ∈ Z/{p− 1}. Sincethis must hold for all p ∈ [N ], we have that N(tp− t˜k) ∈ Z/{0} iff t˜k = jN − 12for some j ∈ Z/{0, 1, · · · , N − 1}.We see that such a condition would imply that t˜k /∈ Ω := [−12 , 12), whichviolates the assumptions of Theorem 2.2.1. This finishes the proof.We end this section with the proof of Corollary 2.3.2.proof of Corollary 2.3.2. The proof will consist of applying Corollary 2.3.1(since we have adopted these assumptions) and Theorem 2.2.1.By Corollary 2.3.1, we have that‖f −Ψg]‖2 ≤ c2s(g)√s+c3√Nη√m+ 2c3√N∑|`|>N−12|c`|with probability exceeding (2.13) and by Theorem 2.2.1 for x ∈ Ωf(x)− 〈h(x),F∗f〉 =∑|`|>N˜c`(e(`x)− (−1)b `+N˜N ce(r(`)x)).43Therefore|f(x)− f](x)| := |f(x)− 〈h(x),F∗Ψg]〉|≤ |f(x)− 〈h(x),F∗f〉|+ |〈h(x),F∗f〉 − 〈h(x),F∗Ψg]〉|≤∣∣∣∣∣∣∑|`|>N˜c`(e(`x)− (−1)b `+N˜N ce(r(`)x))∣∣∣∣∣∣+ ‖h(x)‖2‖F∗(f −Ψg])‖2≤ 2∑|`|>N˜|c`|+ c2s(g)√s+c3√Nη√m+ 2c3√N∑|`|>N−12|c`|.The last inequality holds since ‖h(x)‖2 = 1 (here x is considered fixed andh(x) ∈ CN). This finishes the proof.2.7 Proof of Lemma 2.5.2We end the required proofs of this chapter by establishing Lemma 2.5.2. Thislemma was used in the RIC bound to apply the respective concentrationinequality.proof of Lemma 2.5.2. We upper bound |〈Ak∗, v〉| as follows,|〈Ak∗, v〉| = |〈 1√mN˜k∗,F∗Ψv〉|=1√m∣∣∣∣∣N∑`=1N˜k`(F∗Ψv)`∣∣∣∣∣ ≤ 1√mN∑`=1|N˜k`| |(F∗Ψv)`|=1√mN∑`=1|(F∗Ψv)`| = 1√mN∑`=1∣∣∣∣∣n∑p=1(F∗Ψ)`p vp∣∣∣∣∣ = 1√mN∑`=1∣∣∣∣∣n∑p=1〈F∗`,Ψ∗p〉vp∣∣∣∣∣≤ 1√mn∑p=1N∑`=1|〈F∗`,Ψ∗p〉| |vp| ≤ 1√m(maxq∈[n]N∑`=1|〈F∗`,Ψ∗q〉|)n∑p=1|vp|:=γ‖v‖1√m≤ γ√s‖v‖2√m.44Note that we have shown ‖F∗Ψv‖1 ≤ γ√s‖v‖2 (this will be used in the proofof the remaining claim).We now bound the sum of the fourth moments, using our deviation modelas in the proof of (2.18). Throughout, let ∆˜ ∈ R be an independent copy ofthe entries of ∆ ∈ Rm. Expanding and taking expected value we obtainEm∑k=1|〈Ak∗, v〉|4 = 1m2Em∑k=1|〈N˜k∗,F∗Ψv〉|4 = 1m2Em∑k=1∣∣∣∣∣N∑`=1N˜k`(F∗Ψv)`∣∣∣∣∣4=1m2Em∑k=1(N∑`1,`2,`3,`4=1N˜k`1N˜ k`2N˜k`3N˜ k`4(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4):=N∑`1,`2,`3,`4=11m2(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4(Em∑k=1e(t˜k(`1 − `2 + `3 − `4)))=∑`1−`2=`4−`31m(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4+b2(N−1)/mc∑j=1∑`1−`2=`4−`3+jm1m(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4Ee(jm(∆˜− 1/2))+b2(N−1)/mc∑j=1∑`1−`2=`4−`3−jm1m(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4Ee(−jm(∆˜− 1/2)).The last equality holds as in the proof of (2.18), whereE1mm∑k=1e(t˜k(`1 − `2 + `3 − `4))=1 if `1 − `2 + `3 − `4 = 0Ee(jm(∆˜− 1/2))if `1 − `2 + `3 − `4 = jm, j ∈ Z/{0}0 otherwise.45We bound each of these last three terms. The first term can be bounded as∣∣∣∣∣ ∑`1−`2=`4−`31m(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4∣∣∣∣∣=∣∣∣∣∣∣ 1mN∑`3,`4=1(F∗Ψv)`3(F∗Ψv)`4∑`2∈Q`3,`4(F∗Ψv)`4+`2−`3(F∗Ψv)`2∣∣∣∣∣∣≤ 1mN∑`3,`4=1|(F∗Ψv)`3 ||(F∗Ψv)`4 |∣∣∣∣∣∣∑`2∈Q`3,`4(F∗Ψv)`4+`2−`3(F∗Ψv)`2∣∣∣∣∣∣≤ 1mN∑`3,`4=1|(F∗Ψv)`3 | |(F∗Ψv)`4| |〈F∗Ψv,F∗Ψv〉|=1m‖F∗Ψv‖22‖F∗Ψv‖21 =1m‖Ψv‖22‖F∗Ψv‖21≤ 1mβ2‖v‖42γ2s.In the first equality, Q`3,`4 is the set of allowed index values for `2 according to`3 and `4, i.e., so that `2 ∈ [N ] and `4 +`2−`3 ∈ [N ]. The last bound consistsof bounding ‖F∗Ψv‖1 ≤ γ√s‖v‖2 as in the computations of |〈Ak∗, v〉|, and‖Ψv‖2 ≤ β‖v‖2 from the singular values of Ψ.The remaining terms can be bounded similarly. Using our deviationmodel, notice thatb2(N−1)/mc∑j=1|Ee(±jm(∆˜− 1/2))| ≤b2(N−1)/mc∑j=1θm2N≤ θ,and therefore we can similarly obtain∣∣∣∣∣∣b2(N−1)/mc∑j=1∑`1−`2=`4−`3±jm1m(F∗Ψv)`1(F∗Ψv)`2(F∗Ψv)`3(F∗Ψv)`4Ee(±jm(∆˜− 1/2))∣∣∣∣∣∣≤ θmβ2‖v‖42γ2s.46This gives,Em∑k=1|〈Ak∗, v〉|4 ≤ 1 + 2θmβ2‖v‖42γ2s,as desired.2.8 DFT-incoherence of Discretized Smooth Func-tionsThe goal of this section is to informally bound the DFT-incoherence param-eter (γ) for Ψ whose columns are discretizations of smooth functions (withat least one piece-wise continuous derivative). This is example 4) discussedin Section 2.4.2. This section will also provide useful intuition for our resultsin Chapter 3 (see Section 3.4.1).These computation are not crucial in any way for the main result. Rather,this statement is only for the sake of intuition for the reader and to establishthe practicality of the main results. With this in mind, the following will bea very informal argument for the sake of brevity. It follows the lecture notesof Dr. Lei Li at Duke University [54] (intro to numerical PDEs), we refer thereader to these notes for a formal argument, proofs and references therein.Recall that for Ψ ∈ CN×n, the DFT-incoherence parameter γ is definedin Section 2.2.1 asγ = max`∈[n]N∑k=1|〈F∗k,Ψ∗`〉| .Let Ψ have columns that are uniform discretizations of p-differentiable func-tions with p − 1 continuous and periodic derivatives (with period of 1) andwith p-th derivative that is piecewise continuous (p ≥ 1). In this section wewill to argue that:• γ ∼ O(log(N)) if p = 1• γ ∼ O(1) if p ≥ 2.To this end, for k ∈ [N ] and ` ∈ [n] we may assume thatΨk` = g`(k − 1N− 12):= g` (tk),47where each g` : R → C is a function such that g` ∈ Cp−1(Ω) (has p − 1continuous derivates), g(p)` is piecewise continuous and g(j)`(−12)= g(j)`(12)for all derivatives j ∈ {0, 1, · · · , p− 1} (i.e., has p− 1 periodic derivatives).Let the Fourier coefficients of each g` be given asa(`)u :=∫ 1/2−1/2g`(x)e(ux)dxfor u ∈ Z. Further, denote the discrete Fourier coefficients asb(`)v =N∑q=1g`(q − 1N− 12)e(v(q − 1N− 12))=N∑q=1Ψq`e (vtq) :=√N〈Ψ∗`,F∗(v+N˜+1)〉for v ∈ {1−N2, · · · , N−12} := {−N˜ ,−N˜ + 1, · · · , N˜ − 1, N˜} (recall that N˜ :=N−12).By Corollary 2 in [54] (notice our distinct normalization here), we havethat ∣∣∣∣a(`)u − 1√N b(`)u∣∣∣∣ ≤ ω1Np (2.23)for all u ∈ {−N˜ ,−N˜ + 1, · · · , N˜ − 1, N˜} where 0 < ω1 < ∞ is a constantthat depends on the g`'s. Furthermore by Theorem 2 in [54] we have that|a(`)u| ≤ ω2|u|p , (2.24)for all u ∈ {−N˜ ,−N˜ + 1, · · · , N˜ − 1, N˜} where 0 < ω2 < ∞ is a constantthat depends on the g`'s.48Applying (2.23) and (2.24), for each ` ∈ [n] we haveN∑k=1|〈F∗k,Ψ∗`〉| :=N∑k=1∣∣∣∣ 1√N b(`)k−N˜−1∣∣∣∣=N˜∑k=−N˜∣∣∣∣ 1√N b(`)k∣∣∣∣ ≤ N˜∑k=−N˜∣∣∣∣ 1√N b(`)k − a(`)k∣∣∣∣+ N˜∑k=−N˜|a(`)k|≤N˜∑k=−N˜ω1Np+N˜∑k=−N˜ω2|k|p ≤ ω1 +N˜∑k=−N˜ω2|k|p .The last inequality holds since for p ≥ 1 we have∑N˜k=−N˜ 1Np ≤∑N˜k=−N˜ 1N = 1.For the remaining sum, if p = 1 we haveN˜∑k=−N˜1|k| ∼ O(log(N))by an integral comparison andN˜∑k=−N˜1|k|p ∼ O(1)if p ≥ 2 (again by an integral comparison). This holds for all ` ∈ [n] whichgives the desired bounds for γ. This finishes our argument.2.9 Numerical ExperimentsIn this section we present numerical experiments to elaborate several aspectsof our methodology and results. We first introduce several terms and modelsto describe the setup of the experiments. Throughout we let N = 2015 bethe uniformly discretized signal size.Our methodology (2.9) is solved by SPGL1 [20], a Pareto curve approachthat uses duality theory to solve the basis pursuit problem via a sequenceof numerically cheaper LASSO subproblems. Each basis pursuit problem issolved by limiting the number of SPGl1 iterations to 200.49We implement the Dirichlet kernel using (2.4) directly to construct a Spotoperator [24], a matrix free linear operator toolbox that is compatible withSPGL1. We warn the reader that in this section we have not dedicated mucheffort to optimize the numerical complexity of the interpolation kernel. For afaster implementation, we recommend instead applying the Fourier transformrepresentation S = NF∗ (see Section 2.2.3) using NFFT 3 software from [45]or its parallel counterpart [62].Given output f ] = Ψg] of any of our programs with true solution f ,we consider the relative reconstruction error norm as a measure of outputquality, given asRelative Reconstruction Error =‖f ] − f‖2‖f‖2 .Grid perturbations: To construct the nonuniform grid τ˜ , we introduce anirregularity parameter ρ ∈ R+. We define our perturbations by samplingfrom a uniform distribution, so that each ∆k is drawn uniformly at randomfrom [− ρm, ρm] for all k ∈ [m] independently. τ˜ is generated independently foreach signal reconstruction experiment.Complex exponential signal model: We consider bandlimited complex ex-ponentials with random harmonic frequencies. With signal size N = 2015,bandlimit ω = N−12= 1007, and sparsity level s = 50 we generate ~ω ∈ Zs bychoosing s frequencies uniformly at random from {−ω,−ω + 1, · · · , ω} andlettingf(x) =s∑k=1e (~ωkx) .We use the DFT as a sparsifying transform Ψ = F so that g = Ψ∗f = Ψ−1f isindeed a 50-sparse vector. This transform is implemented as a Spot operatorwhich utilizes MATLAB's fft function. The frequency vector, ~ω, is generatedrandomly for each independent set of experiments. Note that in this case wehave optimal DFT-incoherence parameter γ = 1 (see Section 2.4.2).Gaussian signal model: We consider a non-bandlimited signal consistingof sums of Gaussian functions. With signal size N = 2015, this signal modelis defined asf(x) = −e−100x2 + e−100(x−.104)2 − e−100(x+.217)2 .For this dataset, we use the Daubechies 2 wavelet as a sparsifying transformΨ. This operator is implemented as a Spot operator which utilizes the Rice50Wavelet Toolbox [79]. This provides g = Ψ∗f = Ψ−1f that can be wellapproximated by a 50-sparse vector. In other words, all entries of g are non-zero but 50(g) < .088 ≈ ‖f‖2250 . In this case we have γ ≈ 40.78, which wascomputed numerically.2.9.1 Effect of DFT-incoherenceThis section is dedicated to exploring the effect of the DFT-incoherence pa-rameter in signal reconstruction. We consider the complex exponential andGaussian signal models described above, notice that both signals are effec-tively 50-sparse. Recall that in the complex exponential model we haveΨ = F (the DFT) with optimal DFT-incoherence parameter γ = 1. In theGaussian model Ψ is the Daubechies 2 wavelet with γ ≈ 40.78 (computednumerically).Here we set ρ = 12to generate the deviations (so that θ = 0) and varythe average step size of the nonuniform samples. We do so by letting mvary through the set {b N1.5c, bN2c, b N2.5c, · · · , b N9.5c}. For each fixed value ofm, the relative reconstruction error is obtained by averaging the result of10 independent reconstruction experiments. The results are shown in Fig-ure 2.4, where we plot the average nonuniform step size vs average relativereconstruction error.These experiments demonstrate the negative effect of larger DFT-incoherenceparameters in signal reconstruction. Indeed, in Figure 2.4 we see that thecomplex exponential model with γ = 1 allows for accurate reconstructionfrom larger step sizes. This is to be expected from our results in Section2.3 since the results imply that the Daubechies 2 wavelet will require moresamples and worsen the probability of successful reconstruction according toits parameter γ ≈ Effect of Deviation Model Parameter θIn this section we generate the deviations in such a way that vary the parame-ter θ, in order to explore its effect on signal reconstruction. We only considerthe complex exponential signal model for this purpose and fixm = 287 = bN7cnonuniform measurements.We vary the θ parameter by generating the deviations with ρ varyingover {.06, .07, .08, · · · , .5}. For each fixed ρ value we plot the average relativereconstruction error of 50 independent experiments. Notice that for each51Figure 2.4: Plot of average relative reconstruction error vs average nonuni-form step size for both signal models. In the complex exponential model (Ψ =DFT) we have γ = 1 and in the Gaussian signal model we have γ ≈ 40.78(Daubechies 2 wavelet). Notice that the complex exponential model allowsfor reconstruction from larger step sizes in comparison to the Gaussian signalmodel.52k ∈ [m] and any jEe (jm∆k) =sin (2pijρ)2pijρ.We use this observation to compute the θ parameter numerically by con-sidering all j ∈(Z ∩ [2(1−N)m, 2(N−1)m])/{0} = {−14,−13, · · · ,−1, 1, · · · , 14}.The results are shown in Figure 2.5.Figure 2.5: (Left) plot of average relative reconstruction error vs ρ parameterand (right) plot of corresponding θ parameters vs ρ parameter. The plot onthe right includes the constant value θ = 1√2required to apply Theorem 2.3.3(the red line). Notice that although our results only holds for three valuesof ρ (.5, .49, .48), the plot on the left demonstrates that accurate recovery isstill possible otherwise.Computing the θ parameters numerically (plot on the right in Figure 2.5)shows that our main result (Theorem 2.3.3) is only strictly applicable in threecases (ρ = .5, .49, .48). However, the left plot in Figure 2.5) demonstratesthat decent signal reconstruction can still be achieved when the conditionθ < 1√2does not hold. Therefore, the applicability of the methodology goesbeyond the restrictions of the theorem.2.9.3 Noise AttenuationThis section explores the robustness of the methodology when presented withmeasurement noise. We only consider the complex exponential signal modelfor this purpose. We generate additive random noise d ∈ Rm from a uniformdistribution. Each entry of d is i.i.d. from [− χ100, χ100] where χ is the averagevalue of the entries of |f | ∈ RN .53We set ρ = 12to generate the deviations (so that θ = 0) and vary the aver-age step size of the nonuniform samples. We do so by letting m vary throughthe set {bN2c, b N2.25c, b N2.5c, · · · , bN7c}. For each fixed value of m, the relativereconstruction error is obtained by averaging the result of 50 independentreconstruction experiments. The results are shown in Figure 2.6, where weplot the average nonuniform step size vs average relative reconstruction errorand average relative noise level ‖d‖2/‖f‖2.Figure 2.6: Plot of average relative reconstruction error (‖f−Ψg]‖2/‖f‖2) vsaverage nonuniform step size (blue curve) and average input relative measure-ment error (‖d‖2/‖f‖2) vs average nonuniform step size (red curve). Noticethat for the first four step size values (2,2.25,2.5,2.75) noise attenuation isachieved, i.e., the reconstruction error is lower than the input noise level.Although our results do not imply that noise attenuation is possible (thiswas discussed in Section 2.3.1), these experiments show that nonuniformsamples do have a restricted ability for denoising. This is seen in Figure2.6, where the first four step size values (2,2.25,2.5,2.75) output an averagerelative reconstruction error smaller than the input measurement noise level.Thus, when nonuniform samples are not heavily undersampled, reduction inthe noise level is possible.54Chapter 3Off-the-Grid Sampling ofLow-Rank Matrices3.1 IntroductionThis chapter discusses our results under the low-rank signal model discussedin Section 1.2.2. The components of this chapter are in many aspects anal-ogous (in an intuitive way) to the concepts of Chapter 2, but utilizing theideas of low-rank matrix recovery. For the sake of brevity, we will omit orshorten many topics in this chapter that have been thoroughly developed inChapter 2 and will refer the reader to the respective section therein. Section3.2 introduces the assumptions and definitions needed to state the method-ology and main result. Section 3.3 produces the main result of this chapter,including novel analysis of the matrix completion problem. The elementsand implications of these results are discussed in Section 3.4. The remainingsections of the chapter are dedicated to an elaborated discussion of the proof.3.2 Notation, Assumptions and MethodologyIn order to state our methodology and main result, we first introduce nec-essary definitions, assumptions in this section. We introduce the 2D signalmodel, 2D deviation model and 2D interpolation kernel in Sections 3.2.1,3.2.2 and 3.2.3 respectively. This will allow us to proceed to Section 3.3where the main result of this chapter is elaborated.553.2.1 Signal ModelWe now let Ω = [−12, 12)2 and D : Ω 7→ C be our function of interest to besampled in Ω. Assume that D ∈ H1(Ω), which allows the Fourier expansionD(x, y) =∞∑k=−∞∞∑`=−∞ck,`e(kx)e(`y), (3.1)valid only for (x, y) ∈ Ω. For our main result, it is important to notice thatour regularity assumption implies that∞∑k=−∞∞∑`=−∞|ck,`| <∞.Let N ∈ N with N ≥ 9 be odd and D ∈ CN×N denote the matrixwhose entries are samples of D on the 2D equispaced grid τ × ν ⊂ Ω withτ = {τk := k−1N − 12 : k ∈ [N ]} and ν = {ν` := `−1N − 12 : ` ∈ [N ]}, i.e.,Dk` = D(τk, ν`). Our goal will be to reconstruct D from few nonuniformsamples. We have assumed D is a square matrix with N odd for simplicity,our results can be easily adapted to the case D is a rectangular matrix witheven dimensions.Let n,m ∈ N such that nm ≤ N2. Here, D˜ ∈ Cn×m encompassesour observed discretized nonuniform signal (modulo noise). Here, D˜k` =D(τ˜k,`, ν˜k,`) where τ˜ × ν˜ is the nonuniform 2D grid with τ˜k,` := k−1n − 12 + ∆k`and ν˜k,` :=`−1m− 12+ Γk`. The entries of the perturbation matrices ∆,Γ ∈Rn×m give the grid deviations in each respective axis. We assume that thesenonuniform samples remain in the sampling domain, i.e., τ˜ × ν˜ ⊂ Ω. Inparticular, this allows (3.1) to hold for our nonuniform observations. The re-striction nm ≤ N2 is required in the proof of Theorem 3.5.2 (when applyingLemma 3.6.2 therein)Our noisy nonuniform observations are given viaB = D˜ + Ewhere E ∈ Cn×m with ‖E‖F ≤ η, models the noise introduced in the obser-vations. Our goal is to obtain an approximation to D given B, τ˜ , ν˜ and η ina robust and computationally tractable manner.In the 2D setting, the results will rely on the low-rank structure of D ∈CN×N . In other words, given r ∈ [N ], we define the error of the best rank r56approximation of D in nuclearn norm asσr,∗(D) := minrank(X)≤r‖D −X‖∗ =N∑k=r+1σk(D),and say that D has low-rank structure if this term is within a prescribedtolerance for some r  N .We impose some structure on the first r singular vectors of D, which werefer to as the r-incoherence condition with parameter γ (defined in whatfollows): For W ∈ Cd1×d2 defineγ(W ) := maxk∈[d2]d1∑p=1|〈F1∗p,W∗k〉|,where F1 : Cd1 7→ Cd1 is the 1D centered DFT (see Section 2.2.3) and F1∗psignifies its p-th column (in accordance to our notation). Let D = UΣV ∗ bethe full singular value decomposition (SVD) of our matrix. We decomposeD = UΣV ∗ = UrΣrV ∗r + U+Σ+V∗+in terms of its first r singular vectors and last N − r singular vectors inUrΣrV∗r and U+Σ+V∗+ respectively. We define the r-incoherence parameterof D asγ := max{γ(Ur), γ(Vr)}.This parameter will play a crucial role in our main result. We postponefurther discussion of γ until Section 3.4.1, including the intuition of its sig-nificance on the signal and computation of its value for different examples(this concept is similar in many aspects to the DFT-incoherence in Chapter2).Our goal is to recover D (which provides a bandlimited approximation ofD(x, y)) from B = D˜+E. In what follows we will consider a 2D interpolationkernel S : CN×N → Cn×m that achieves S(D) ≈ D˜ accurately (the 2DDirichlet kernel see Section 3.2.3). We will then achieve our reconstructionby solving the nuclear norm minimization problem (1.6) with measurementoperator S.573.2.2 2D Deviation ModelOur main result of this chapter will consider grid deviations ∆,Γ whoseentries are i.i.d. with distributions D1,D2 (respectively), that satisfy thefollowing: for integers 0 < |j| ≤ 2(N−1)n, if δ ∼ D1 thenED1e(jδn) = 0,and similarly for integers 0 < |j| ≤ 2(N−1)m, if δ ∼ D2 thenED2e(jδm) = 0.This model is similar to the one considered in Chapter 2, defined in Section2.2.2. However, notice that we have not introduced an analogy of the pa-rameter θ used in the sparse signal model (i.e., we require θ = 0 here). Thisis done for simplicity and brevity, in order to produce a substantially cleanerresult and proof. It is the author's belief that this deviation model can begeneralized by allowing θ 6= 0 in this context, but at the moment this seemsa daunting task since the proof is already complicated and long as is.Finally, it is important to reiterate that we are sampling on the torus.This is discussed in detail in Section 2.2.2 and we refer the reader to thisreading. We adopt this concept of sampling the periodic extension of D|Ω(x)here, which can be defined analogously in the 2D case. Basically, this isneeded in order to maintain our samples of D in Ω in accordance to Section3.2.1 so that (3.1) remains valid for our nonuniform samples.The deviation model is discussed further in Section 2D Dirichlet KernelWe will use the 2D Dirichlet kernel S : CN×N → Cn×m to model ournonuniform observations S(D) ≈ D˜. This interpolation kernel is definedas S := NF∗, where F : CN×N → CN×N is the 2D centered DFT andN : CN×N → Cn×m is a 2D centered NDFT that incorporates the unstruc-tured grid τ˜ × ν˜.Let N˜ = N−12. Specifically, we apply the centered inverse 2D DFT to ourregular data where for (u, v) ∈ [N ]2Dˇuv := (F∗(D))uv1NN∑p=1N∑q=1Dpqe((u− N˜ − 1)τp)e((v − N˜ − 1)νq), (3.2)58followed by the centered 2D NDFT according to τ˜ , ν˜(S(D))k` :=(N (Dˇ))k`=1NN∑u=1N∑v=1Dˇuve(−τ˜k,`(u− N˜ − 1))e(−ν˜k,`(v − N˜ − 1)). (3.3)As in the 1D case, the action of this operator can also be written in termsof the Dirichlet kernel (see [68]) and is thus a real valued operator. In ourcurrent signal model, we obtain an error bound analogous to the 1D case.Theorem 3.2.1. Let S, D, D˜ be defined as above with τ˜ × ν˜ ⊂ Ω. For(k, `) ∈ [n]× [m], if τ˜k,` = τp and ν˜k,` = νq for some (p, q) ∈ [N ]2 then(D˜ − S(D))k`= 0, (3.4)and otherwise(D˜ − S(D))k`=∑|p|>N˜∑|q|>N˜cp,q(e(pτ˜k,`)e(qν˜k,`)− (−1)bp+N˜Nc+b q+N˜Nce(rN(p)τ˜k,`)e(rN(q)ν˜k,`)),(3.5)where rN(`) = rem(`+ N˜ ,N)− N˜ with rem(`,N) giving the remainder afterdivision of ` by N . As a consequence‖D˜ − S(D)‖F ≤ 2√nm∑|p|>N˜∑|q|>N˜|cp,q| .The proof of this error term is given in Section 3.7. The intuition andproof of this error bound are similar to Theorem 2.2.1. In particular, theresult shows that D provides the bandlimited approximation ofD(x, y). Thisresult is crucial to obtain the error bound of our main result.593.3 Main ResultWith the Dirichlet kernel achieving S(D) ≈ D˜ accurately, we approximateD ≈ D] given viaD] := arg minX∈CN×N‖X‖∗ s.t. ‖S(X)−B‖F ≤ η + 2√nm∑|p|>N−12∑|q|>N−12|cp,q| ,(3.6)where the term 2√nm∑|p|,|q|>N−12|cp,q| is to due to the interpolation kernelerror in Theorem 3.2.1.We now proceed to the main result.Theorem 3.3.1. Let the entries of ∆,Γ be i.i.d. from distributions thatsatisfy our 2D deviation model and D] be given by (3.6).Ifmn ≥ C˜Nrγ4 log6(N) (3.7)where C˜ is an absolute constant, then‖D−D]‖F ≤ 23.4 σr,∗(D)+(5.3+6.9√r)N η√nm+ 2∑|k|>N−12∑|`|>N−12|ck,`| ,(3.8)with probability exceeding1− 4nm− exp(− nm40γ4r√Nlog(1 + 2 log(1 +t15√N+ 1)))−4N exp(− nm8 log(nm)r2 + 4Nr3)− 4N exp(− nm576Nr log(nm) + 8γ2Nr).(3.9)Therefore, with nm ∼ Nr log6(N) random off-the-grid samples we can ap-proximate D with recovery error (3.8) proportional to the low-rank mismatch(σr,∗(D)), noise level (Nη/√nm) and the N−12-bandlimited approximation ofD(x, y) (∑|k|,|`|>N−12|ck,`|).This allows us to construct a function D](x, y) : Ω → C using D] that60achieves|D(x, y)−D](x, y)| ≤23.4 σr,∗(D) +(5.3 + 6.9√r)Nη√nm+ 2(N(5.3 + 6.9√r) + 1) ∑|k|>N−12∑|`|>N−12|ck,`|(3.10)for all (x, y) ∈ Ω. The construction of D](x, y) via D] and proof of the errorbound (3.10) are similar to the 1D case (Corollary 2.3.2) and we omit theproof of this 2D statement.In conclusion, the result states that random nonuniform samples effi-ciently handle aliasing artifacts in an undersampled sense. When η = σr,∗(D) =0, we may recover the N−12-bandlimited approximation of D(x, y) accordingto (3.10) with O(rN polylog(N)) stochastic nonuniform samples. In compar-ison, uniform sampling requires O(N2) measurements for the same quality ofreconstruction. When the uniform discretization of our signal of interest canbe well approximated by a rank r  N matrix, our nonuniform samplingmethodology provides a stark contrast in acquisition complexity.3.3.1 Implications for Matrix Completion: Stability andRobustnessTheorem 3.3.1 applies almost directly to the noisy matrix completion prob-lem. To be specific, a relatively minor modification of the deviation modeland proof of Theorem 3.3.1 provides a novel result for the matrix completionproblem under the sampling with replacement model from [7]. For simplicity,we only consider this sampling model but note that other possible matrixcompletion sampling strategies that can be derived in a similar manner (e.g.,the standard uniform random sampling model).To elaborate, in the matrix completion problem one is given a set ofmulti-indices {(pk, qk)}mk=1 ⊂ [N ]2 indicating the set of observed matrix en-tries (modulo noise) of D ∈ CN×N . Let Λ := {(pk, qk)}mk=1 and DΛ ∈ Cmencompass the observed matrix elements, i.e., each entry of DΛ is given asDΛk = Dpkqk , ∀k ∈ [m].Then our noisy observations are given asB = DΛ + E ∈ Cm,61where E ∈ Cm models the additive noise with ‖E‖2 ≤ η. Notice that here Dneed not be generated as a discretization of a continuous function D(x, y).This will also hold for our result in this section, where the signal model is nolonger needed. Our novel result in this context is the followingTheorem 3.3.2. Let D be an N × N matrix with r-incoherence parameterγ (not necessarily generated as a discretization of a function). Suppose Λ :={(pk, qk)}mk=1, where each multi-index (pk, qk) is selected independently fromthe uniform distribution on [N ]2 (i.e., sampling with replacement).LetB = DΛ + E,with ‖E‖2 ≤ η andD] := arg minX∈CN×N‖X‖∗ s.t. ‖XΛ −B‖2 ≤ η. (3.11)Ifm ≥ C˜Nrγ4 log6(N),where C˜ is the absolute constant, then D] satisfies‖D −D]‖F ≤ 23.4 σr,∗(D) + (5.3 + 6.9√r)Nη√nm,with probability exceeding1− 4nm− exp(− nm40γ4r√Nlog(1 + 2 log(1 +t15√N+ 4)))−4N exp(− nm8 log(nm)r2 + 4Nr3)− 4N exp(− nm576Nr log(nm) + 8γ2Nr).(3.12)The proof of this theorem is presented in Section 3.8, it follows from aproof similar to that of Theorem 3.3.1.Notice that the result does not require D to be rank r and provides anerror bound proportional to the error of the rank r approximation (σr,∗(D)).Such methodologies and results are referred to as stable (see for example[85]). This is the main contribution of this result, where all other relatedworks in the literature require the data matrix to be rank r [21, 22, 7, 23].62The reference [22] allows for noisy measurements, so that one could po-tentially remove the rank r constraint by absorbing the error of the rank r ap-proximation in the noise model, i.e., modify the noise level as η˜ := η+σr,∗(D).However, this would imply that the practitioner must have an estimate ofσr,∗(D) and would produce an error bound ∼ N√r polylog(N)(η+σr,∗(D)) in [22].On the other hand, our results apply to full rank matrices and give the sig-nificantly improved error bound ∼ σr,∗(D) +√N√polylog(N)η. Furthermore, ourresult has the same sampling complexity as [22] in terms of the logarithmicdependency, log6(N). However, this comparison is difficult to make fairlydue to the distinct incoherence and sampling considered.Theorem 3.3.2 applies to the sampling with replacement model. Onecould argue as in Proposition 3.1 in [7] and extend this result to the uniformrandom sampling model from standard results on the topic [21, 23]. However,this seems strange to do in the noisy case, since observing repeated entrieswill modify the noise energy level η. We leave such results as future work andend this section by noting that other sampling strategies can be consideredfor the matrix completion problem via our 2D deviation model, especiallyby generalizing this model to the case θ 6= 0 (see Section 3.2.2 and Section2.2.2).3.4 DiscussionThis section provides additional discussion on the elements of our main re-sult in this section, Theorem 3.3.1. Section 3.4.1 discusses the incoherencecondition on the singular vectors of the data matrix. Section 3.4.2 considersthe 2D deviation model.3.4.1 r-incoherence ConditionOur r-incoherence condition defined in Section 3.2.1 is related to the DFT-incoherence parameter of Chapter 2 from Section 2.2.1. Thus, the intutionelaborated in Section 2.4.2 remains informative but now applies to the sin-gular values of the data matrix D.Therefore, as discussed in Section 2.4.2, our results apply to data matricesD whose main rank r component (UrV∗r ) corresponds to the discretizationof a smooth function. This is perhaps best described as in example 4) from63Section 2.4.2:r-incoherence for Discretized Smooth Functions : Let p ≥ 1 be an integer.Consider singular vectors Ur, Vr whose columns are uniform discretizationsof p-differentiable functions with p − 1 periodic and continuouss derivativesand with p-th derivative that is piecewise continuous. In this case we haveγ(Ur), γ(Vr) ∼ O(log(N)) if p = 1 and γ(Ur), γ(Vr) ∼ O(1) if p ≥ 2. Weremind the reader that this example is argued informally in Section 2.8.Alternatively, our r-incoherence concept aligns well with the typical inco-herence structure from the matrix completion literature [21, 22, 7, 23] (e.g.,coherence condition and strong incoherence property). Intuitively, the inco-herence conditions from these references assure that the data matrix containsits energy (i.e., norm) evenly spread out throughout the matrix. Small inco-herence parameters avoid the case where a significant portion of the matrixenergy lies in a small subset of the matrix entries. In matrix completion,data matrices with large incoherence paramter are not appropriate for ma-trix completion (with sampling oblivious to the singular vectors) since onemight miss crucial information upon sampling entry wise.This same intuition is informative for our r-incoherence structure. This isillustrated by a simple example in Figure 3.1, where two data matrices (takenfrom [68]) with distinct γ values are presented. Both of these matrices areuniform discretizations of continuous functions of the formD(x, y) =5∑k=1e−c(x−xk)2−c(y−yk)2(3.13)for some parameter c > 0 and Gaussian centers {(xk, yk)}5k=1. Both uniformlydiscretized signals (D) are rank-5 in this example.In Figure 3.1, the data matrix on the left corresponds to (3.13) withc = 1000 (i.e., relatively large Lipschitz constant). As a consequence, mostof its energy lies in a small portion of the matrix and we obtain inappropriatevalue γ ∼ O(√N) which also agrees with the standard concept of incoherencefrom matrix completion. On the other hand, the data matrix on the right ismore appropriate for our application (and matrix completion) as its energyis more evenly spread. This matrix corresponds to (3.13) with parameterc = 20. Such moderate smooth signals provide favorable sampling complexityand reconstruction error bounds via our methodology.However, this simple example may be misleading in the context of thisthesis since the discretizations in Figure 3.1 differ in both r-incoherence pa-64Figure 3.1: Illustration of two 500 × 500 data matrices (from [68]) of rank5 with distinct 5-incoherence parameters (γ). (Left) low-rank data matrixwith inappropriate 5-incoherence structure γ ∼ O(√N). (Right) low-rankdata matrix with appropriate 5-incoherence structure γ ∼ O(1). Both datamatrices are discretizations of functions of the form (3.13) with same centerlocations and parameter c = 1000 for the left data matrix and c = 20 for theright data matrix.rameter and aliasing energy (i.e.,∑|`|>N/2 |c`|). Such differences make theeffect of r-incoherence unclear in the images and error bounds, since themain result also includes the error of N/2-bandlimited approximation. Toprovide a fair example that decouples these two concepts we modify (3.13)and consider functions of the formD(x, y) =5∑k=1e−c(x−xk)2−c(y−yk)2 + d sin (2piωx) sin (2piωy) , (3.14)where d, ω > 0 will be used to modify the bandwidth and aliasing energy ofthese simple examples. This modified example is shown in Figure 3.2. Inthis case, both discretizations (D) are now rank-6.In this modified example, we may now choose distinct bandwidth ω andaliasing energy d parameters so that both discretizations have the same alias-ing energy, while not substantially affecting the respective incoherence pa-rameters γ of the original examples from Figure 3.1. In addition, both datamatrices are equal in rank so that this simple example provides a more fairillustration of the r-incoherence concept that does not vary other aspects ofthe data matrices.65Figure 3.2: Illustration of two 500 × 500 data matrices of rank 6 with dis-tinct 6-incoherence parameters (γ) but same aliasing energy (∑|`|>N/2 |c`|).(Left) low-rank data matrix with inappropriate 6-incoherence structure γ ∼O(√N). (Right) low-rank data matrix with appropriate 6-incoherence struc-ture γ ∼ O(1). Both data matrices are discretizations of functions of the form(3.14) modified from respectively from Figure 3.1, where the left image hasno additional aliasing energy introduced (i.e., ω < 250 and d  1) and theimage on the right has been generated with additional aliasing error (ω > 250and d ∼ 1) to match that of the image on the left (these additional aliasingartifacts are subtle due to their high frequency nature).663.4.2 2D Deviation ModelAs discussed in Section 3.2.2, our 2D deviation model is similar to the devia-tion model from Chapter 2 in Section 2.2.2 but more restrictive. In particular,the examples with θ = 0 provided in Section 2.4.1 apply in this 2D setting.Specifically, these are• 1) D1 = U [− 12n , 12n ] or D2 = U [− 12m , 12m ]. To generalize this example,we may take D1 = U [µ− p2n , µ+ p2n ] or D2 = U [µ− p2m , µ+ p2m ], for anyµ ∈ R and p ∈ N/{0}.• 2) D1 = U{− 12n + knn¯1}n¯1−1k=0 or D2 = U{− 12m + kmn¯2}n¯2−1k=0 with n¯1 :=d2(N−1)ne + 1 and n¯2 := d2(N−1)m e + 1. To generalize this example, wemay take D1 = U{µ− p2n + pknn¯1}n¯1−1k=0 or D2 = U{µ− p2m + pkmn¯2}n¯2−1k=0 , forany µ ∈ R, p ∈ N/{0}, n¯1 ∈ N/[d2(N−1)pn e] and n¯2 ∈ N/[d2(N−1)pm e].Figure 3.3 offers a simple illustration of the possible undersampling in our2D random nonuniform sampling scenario. The red samples correspond todense uniform samples and the blue samples depict random nonuniform sam-ples achievable by both examples 1) and 2) above. Nonuniform samples underthe 2D deviation model achieve the same quality of reconstruction as denseequispaced samples, but with significantly less measurements in agreementwith Theorem 3.3.1. In particular notice that our assumptions, where sam-ples are deviated from a less dense equispaced grid {(k−1n− 12, `−1m− 12)}k∈[n],`∈[m]allow for samples that are evenly distributed throughout Ω (these examplesinclude jittered sampling [11, 74, 14, 72]).An analogy of the parameter θ from Chapter 2 is not used in this chapterfor simplicity, simply because the proof of the main result is long and com-plicated as is. Therefore, we should expect our low-rank signal model resultsto hold in a more general setting that does not require θ = 0. For example,when D1,D2 = N (µ, σ¯2), for any µ ∈ R and σ¯2 > 0 we have for j ∈ Z/{0}and δ ∼ D1,D2|ED1e(jδn)| = e−2(σ¯pin)2and |ED2e(jδm)| = e−2(σ¯pim)2as discussed in Section 2.4.1. For anym and σ¯, this example will never satisfyour current 2D deviation model but can come arbitrarily close.67Figure 3.3: Illustration of the 2D sampling scenario in Ω. (Left) dense eq-uispaced samples. (Right) less dense nonuniform samples (on average by afactor of ≈ 12) which can be generated by example 1) or 2) in this section.Both sampling schemes provide the same quality of reconstruction, but ran-dom off-the-grid samples require less measurements according to our results.3.5 Proof of Main ResultWe will prove this main result via a dual certificate (Lemma 3.5.1 below,proven in Section 3.6). This lemma is a generalization of dual certificateguarantees for sparse vector recovery to the low-rank matrix recovery case(see Theorem 4.33 in [85]).We begin with some necessary notation, definitions. Using the decompo-sition D = UrΣrV∗r + U+Σ+V∗+, we will be interested in the following spaceof matricesT = {X ∈ CN×N : X =r∑k=1λkU∗kV ∗∗k, λk ∈ R+} = Span ({U∗kV ∗∗k}rk=1) ,and its orthogonal complimentT⊥ = Span({U∗kV ∗∗`}(k,`)∈[N ]2/{U∗kV ∗∗k}rk=1) .In particular, notice that UrΣrV∗r ∈ T and U+Σ+V ∗+ ∈ T⊥.Let PT (X),PT⊥(X) denote the respective orthogonal projections to eachof these spaces and let S be the unit sphere in CN×N . The following sectionstates the dual certificate conditions that provide our recovery error bounds.This is followed by additional lemmas that will be used to establish the dualcertificate.683.5.1 Dual Certificate and Required LemmasThe following lemma will establish our recovery error bounds, stated for ageneral linear operator.Lemma 3.5.1. Let A : CN×N 7→ Cn×m be a linear operator and D =UrΣrV∗r + U+Σ+V∗+ decomposed as above. Assume thatsupX∈T∩S〈(A∗A− I)(X), X〉 ≤ β1, ‖A‖F 7→F ≤ β2, (3.15)and there exists a matrix M = A∗Y such that‖PT (M)− UrV ∗r ‖F ≤ β3, ‖PT⊥(M)‖ ≤ β4, ‖Y ‖F ≤ β5√r. (3.16)Ifβ2β3√1−β1 + β4 < 1, thenD] := arg minX‖X‖∗ s.t. ‖A(X)−A(D)− E‖F ≤ ηwith ‖E‖F ≤ η satisfies‖D −D]‖F ≤ C1σr,∗(D) + (C2 + C3√r)η,for constants C1, C2, C3 that depend only on the βk's.The proof of this lemma is postponed until Section 3.6.1.We therefore obtain our main result if we establish (3.15) and (3.16) forour sampling operator under the r-incoherence assumptions. These inequal-ities will follow from the next three Lemmas (also proven in Section 3.6).Throughout, we will denoteA := 1√nmN˜F∗where N˜ := NN .The following lemma will be used to compute the parameters β1, β3 andβ5 in (3.15) and (3.16) of Lemma 3.5.1.Lemma 3.5.2. Let A, T and γ be defined as above. If√mn√log(nm)≥ 2C√Nrγ2 log5/2(N)√1 + δ√Nδ(3.17)69where C > 0 is an absolute constant, thensupX∈T∩S〈(A∗A− I)(X), X〉 ≤ 2δ√N, (3.18)with probability exceeding1− exp(− nmδ4γ4r√Nlog(1 + 2 log(1 +t2 δ√N+ 1))).The proof can be found in Section 3.6.2.The next lemma gives parameter β2 from (3.20) directly.Lemma 3.5.3. With the previous notation and definitions, we have‖A‖F→F ≤√3N√rwith probability greater than1− 2nm− 4N exp(− nm8 log(nm)r2 + 4Nr3).The proof is provided in Section 3.6.3.The final lemma handles the term ‖PT⊥(M)‖ for parameter β4 in (3.16)of Lemma 3.5.1.Lemma 3.5.4. With the previous notation and definitions, we have‖PT⊥ (A∗A(UrV ∗r )) ‖ ≤ 2δwith probability greater than1− 2nm− 4N exp(− nmδ216Nr log(nm) + 4γ2Nrδ3).The proof is postponed until Section Proof of Theorem 3.3.1We are now in a position to prove the main result, Theorem 3.3.1, via Lemmas3.5.1-3.5.4.Proof of Theorem 3.3.1. We obtain our result by applying Lemma 3.5.1. Toestablish the required conditions (3.15) and (3.16), we use Lemma 3.5.3,Lemma 3.5.2 with δ = 1/10 and Lemma 3.5.4 with δ = 1/6 and proceedto compute the βk parameters in Lemma 3.5.1 (and the Ck constants in theerror bound).Applying Lemma 3.5.2 with δ = 1/10, we assume√mn√log(nm)≥ 20C√Nrγ2 log5/2(N)√1 +110√N, (3.19)to obtainsupX∈T∩S〈(A∗A− I)(X), X〉 ≤ 15√N, (3.20)with probability exceeding1− exp(− nm40γ4r√Nlog(1 + 2 log(1 +t15√N+ 1))).Inequality (3.20) gives parameter β1 :=15√N≤ 115directly (Since weassume N ≥ 9).For β5, define Y := A(UrV ∗r ), so that M = A∗A(UrV ∗r ). Notice that theLHS of (3.20) is equivalent tosupX∈T∩S〈(A∗A− I)(X), X〉 = supX∈T∩S|〈(A∗A(X), X〉 − 〈X,X〉|= supX∈T∩S∣∣‖A(X)‖2F − ‖X‖2F ∣∣ .Therefore, since UrV∗r ∈ T , (3.20) gives‖Y ‖F := ‖A(UrV ∗r )‖F ≤√1 +15√N‖UrV ∗r ‖F ≤√1 +115√r =√16r√15,so that β5 =√16√15.71To compute β3, notice thatsupX∈T∩S〈(A∗A− I)(X), X〉 = supX∈S〈(A∗A− I)(PT (X)),PT (X)〉= supX∈S〈PT ◦ (A∗A− I) ◦ PT (X), X〉 = ‖PT ◦ (A∗A− I) ◦ PT‖F 7→F ,where the last inequality holds since PT ◦ (A∗A − I) ◦ PT is a Hermitianoperator. Therefore, (3.20) implies‖PT (M)− UrV ∗r ‖F = ‖PT (M − UrV ∗r )‖F := ‖PT ◦ (A∗A− I)(UrV ∗r )‖F=‖PT ◦ (A∗A− I) ◦ PT (UrV ∗r )‖F ≤‖UrV ∗r ‖F5√N=√r5√N:= β3.Applying Lemma 3.5.3 gives β2 =√3N√rdirectly with probability greaterthan1− 2nm− 4N exp(− nm8 log(nm)r2 + 4Nr3).Finally, to compute β4 we apply Lemma 3.5.4 with δ =16. Thus‖PT⊥(M)‖ := ‖PT⊥ ◦ A∗A(UrV ∗r )‖ ≤13:= β4with probability greater than1− 2nm− 4N exp(− nm576Nr log(nm) + 8γ2Nr).Notice thatβ2β3√1− β1+ β4 =√35√14/15+13≤ .7 < 1,as required. We obtain the conclusion of Theorem 3.5.1, with constantsC1 := 2(β21− β1 + 1)(1− β2β3√1− β1− β4)−1≤ 23.4,C2 :=2√1− β1+2β3√1− β1(β21− β1 + 1)(1− β2β3√1− β1− β4)−1≤ 5.3,72andC3 := 2β5(1− β2β3√1− β1− β4)−1≤ 6.9,where the probability of failure holds by a union bound. In particular, sincewe assume nm ≤ N2 and N ≥ 9, if√mn ≥ 20√2C√Nrγ2 log3(N)√1 +130then (3.19) holds. This is the advertised sampling complexity in the state-ment of Theorem 3.3.1, where we have used√1 + 110√N≤√1 + 130andabsorbed the absolute constants into the definition of√C˜.3.6 Proof of Dual Certificate Recovery and Re-quired LemmasIn this section we prove the required lemmas from Section 3.5.1 for the mainresult: Lemma 3.5.1, Lemma 3.5.2, Lemma 3.5.3 and Lemma Proof of Lemma 3.5.1We begin with the proof of Lemma 3.5.1. This lemma is a generalization ofdual certificate guarantees for sparse vector recovery to the low-rank matrixrecovery case (see Theorem 4.33 in [85]), and the proof will be analogous.Proof of Lemma 3.5.1. Denote W = D] − D, our goal is to bound ‖W‖F .Notice that since D] is feasible‖A(W )‖F ≤ ‖A(D])−A(D)− E‖F + ‖E‖F ≤ 2η.We will apply this inequality throughout the proof.Let Z ∈ T⊥ be such that 〈D +W,Z〉 = ‖PT⊥(D +W )‖∗. By optimalityof D], we obtain‖D‖∗ ≥ ‖D]‖∗ = ‖D +W‖∗ ≥ |〈D +W,UrV ∗r + Z〉|=|〈D +W,UrV ∗r 〉+ ‖PT⊥(D +W )‖∗|=|‖PT (D)‖∗ + 〈PT (W ), UrV ∗r 〉+ ‖PT⊥(D +W )‖∗|≥‖PT (D)‖∗ − |〈PT (W ), UrV ∗r 〉|+ ‖PT⊥(W )‖∗ − ‖PT⊥(D)‖∗.73Where the second inequality holds by the variational characterization of thenuclear norm, ‖X‖∗ = sup‖Y ‖≤1〈X, Y 〉. In the third equality, we have used〈D,UrV ∗r 〉 = ‖PT (D)‖∗.Rearranging, we have shown‖PT⊥(W )‖∗ ≤ 2‖PT⊥(D)‖∗ + |〈PT (W ), UrV ∗r 〉|. (3.21)Under the assumption ‖UrV ∗r − PT (M)‖F ≤ β3, the last term in (3.21) canbe bounded as|〈PT (W ), UrV ∗r 〉| ≤ |〈PT (W ), UrV ∗r − PT (M)〉|+ |〈PT (W ),PT (M)〉|≤β3‖PT (W )‖F + |〈PT (W ),PT (M)〉|=β3‖PT (W )‖F + |〈W − PT⊥(W ),M − PT⊥(M)〉|=β3‖PT (W )‖F + |〈W,M〉 − 〈PT⊥(W ),PT⊥(M)〉|≤β3‖PT (W )‖F + |〈W,M〉|+ |〈PT⊥(W ),PT⊥(M)〉|. (3.22)In the second equality, we used the fact that 〈W,PT⊥(M)〉 = 〈PT⊥(W ),PT⊥(M)〉.We now bound the three terms in (3.22), beginning with ‖PT (W )‖F .The assumed inequalities (3.15) give that for any Z ∈ T we have ‖A(Z)‖2F ≥(1− β1)‖Z‖2F , and ‖A(H)‖F ≤ β2‖H‖F for any H. Therefore,‖PT (W )‖F ≤ 1√1− β1‖A(PT (W ))‖F ≤ 1√1− β1‖A(W )‖F + 1√1− β1‖A(PT⊥(W ))‖F≤ 2η√1− β1+β2√1− β1‖PT⊥(W )‖F .(3.23)The remaining terms, |〈PT⊥(W ),PT⊥(M)〉| and |〈W,M〉|, can be boundedby our assumptions (3.16)|〈PT⊥(W ),PT⊥(M)〉| ≤ ‖PT⊥(W )‖∗‖PT⊥(M)‖ ≤ β4‖PT⊥(W )‖∗and|〈W,M〉| := |〈W,A∗(Y )〉| = |〈A(W ), Y 〉| ≤ ‖A(W )‖F‖Y ‖F ≤ 2ηβ5√r.Using these inequalities to bound |〈PT (W ), UV ∗〉| in (3.21) we obtain‖PT⊥(W )‖∗ ≤ 2‖PT⊥(D)‖∗ +2ηβ3√1− β1+β2β3√1− β1‖PT⊥(W )‖F + β4‖PT⊥(W )‖∗ + 2ηβ5√r≤2‖PT⊥(D)‖∗ +(β2β3√1− β1+ β4)‖PT⊥(W )‖∗ +(2β3√1− β1+ 2β5√r)η.74Since by assumption ρ := β2β3√1−β1 + β4 < 1, we can rearrange and obtain‖PT⊥(W )‖∗ ≤21− ρ‖PT⊥(D)‖∗ +(2β3√1−β1 + 2β5√r)1− ρ η.From our previous calculations, (3.23),‖PT (W )‖F ≤ 2η√1− β1+β2√1− β1‖PT⊥(W )‖F ≤2η√1− β1+β2√1− β1‖PT⊥(W )‖∗,so that both of these inequalities give‖W‖F ≤ ‖PT (W )‖F + ‖PT⊥(W )‖F ≤ ‖PT (W )‖F + ‖PT⊥(W )‖∗≤ 2η√1− β1+(β2√1− β1+ 1)‖PT⊥(W )‖∗ ≤ C1‖PT⊥(D)‖∗ +(C2 + C3√r)η,with appropriate constants, where by definition σr,∗(D) = ‖PT⊥(D)‖∗. Wefinish by noting that these constants are given asC1 := 2(β21− β1 + 1)(1− β2β3√1− β1− β4)−1,C2 :=2√1− β1+2β3√1− β1(β21− β1 + 1)(1− β2β3√1− β1− β4)−1,andC3 := 2β5(1− β2β3√1− β1− β4)− Proof of Lemma 3.5.2We now endeavor to prove Lemma 3.5.2. This result requires the two lemmas,which we state here and prove in Section 3.10.Before continuing, we establish some useful notation and observations forour linear operators. We consider N : CN×N → Cn×m,F∗ : CN×N → CN×Nas ensembles of matrices {N k,`}(k,`)∈[n]×[m] ⊂ CN×N and {(F∗)k,`}(k,`)∈[N ]2 ⊂CN×N respectively. Here the superscripts order the matrices such that theaction of each operator on X ∈ CN×N is given entry-wise as(N (X))k` = 〈N k,`, X〉,75for (k, `) ∈ [n]× [m] with an analogous definition for F∗ and any other linearoperator to be defined.We remind the reader that we have normalized A := 1√nmN˜F∗ whereN˜ := NN , so that for (k, `) ∈ [n]× [m] and (p, q) ∈ [N ]× [N ]N˜ k,`pq = e(τ˜k,`(p− N˜ − 1))e(ν˜k,`(q − N˜ − 1)).Our deviation model ensures that1√nmN˜ forms an isotropic ensemble, i.e.,for any X ∈ CN×NE1nmN˜ ∗N˜ (X) = X. (3.24)We produce the calculation here, which will also be useful in establishing thelemmas that follow:Throughout, let ∆˜, Γ˜ ∈ R be independent copies of the entries of ∆,Γrespectively and let p, q ∈ [N ]. Expanding givesE1nm(N˜ ∗N˜ (X))pq= E1nmn∑k=1m∑`=1N˜ k,`pq 〈N˜ k,`, X〉= E1nmn∑k=1m∑`=1N˜ k,`pq(N∑p˜,q˜=1¯˜N k,`p˜q˜ Xp˜q˜):=N∑p˜,q˜=1Xp˜q˜E∑k`1nme(τ˜k,`(p− p˜))e(ν˜k,`(q − q˜)).76At this point, we use our deviation model to obtainE∑k`1nme(τ˜k,`(p− p˜))e(ν˜k,`(q − q˜)) = ED1e(∆˜(p− p˜))ED2e(Γ˜(q − q˜))·(n∑k=11ne((k − 1n− 12)(p− p˜)))( m∑`=11me((`− 1m− 12)(q − q˜))),=1 if p = p˜ and q = q˜ED2e(zm(Γ˜− 1/2))= 0 if p = p˜ and q − q˜ = zm,for z ∈ Z/{0}ED1e(jn(∆˜− 1/2))= 0 if p− p˜ = jn and q = q˜,for j ∈ Z/{0}ED1e(jn(∆˜− 1/2))ED2e(zm(Γ˜− 1/2))= 0 if p− p˜ = jn and q − q˜ = zm,for j, z ∈ Z/{0}0 otherwise.The midterms vanish by our deviation model (∆˜ ∼ D1, Γ˜ ∼ D2) and be-cause q − q˜ = zm, p − p˜ = jn imply that z ∈ (Z ∩ [1−Nm, N−1m])/{0} andj ∈ (Z ∩ [1−Nn, N−1n])/{0} (since p, p˜, q, q˜ ∈ [N ]). The final term is zero byorthogonality of the complex exponentials. Therefore,E1nm(N˜ ∗N˜ (X))pq= Xpq,as desired. We will use this isotropy property frequently and similar orthog-onality calculations in what follows.We now provide a lemma that will be useful to establish both Lemma 3.5.2and Lemma 3.5.4, in order to apply respective concentration inequalities ineach. The proof is postponed until Section 3.10.Lemma 3.6.1. Define N˜ ,F as above. Then for D ∈ T and all (k, `) ∈[n]× [m]|〈N˜ k,`,F∗(D)〉| ≤ γ2√r‖D‖F (3.25)and1nmEn∑k=1m∑`=1|〈N˜ k,`,F∗(D)〉|4 ≤ γ4r‖D‖4F . (3.26)77The next Lemma is a generalization of Lemma 3.6 in [58] to the low-rank matrix recovery case. The proof is due to [94], but requires a slightmodification for our setting. Adopting the author's notation, in what followsfor a matrix A ∈ CN×N we denote |A)(A| as the operator that maps X 7→A〈A,X〉.Lemma 3.6.2. Let m ≤ N2, U ⊂ CN×N andU2 := {X ∈ CN×N : ‖X‖F ≤ 1, ‖X‖∗ ≤√r‖X‖F}.Fix some V1, ..., Vm ∈ CN×N that satisfymaxk∈[m]supX∈U|〈Vk, X〉| ≤ K√r‖X‖F .Let 1, ..., m be i.i.d. Rademacher random variables. ThenE supX∈U∩U2m∑k=1k〈|Vk)(Vk|(X), X〉≤√rCK log5/2(N) log1/2(m)(supX∈U∩U2m∑k=1〈|Vk)(Vk|(X), X〉)1/2,where C is an absolute constant.See Section 3.10 for the proof.We now proceed to the proof of Lemma 3.5.2. This lemma computes theparameters β1, β3 and β5 in (3.15) and (3.16).Proof of Lemma 3.5.2. Let A = N√nmS = 1√nmN˜F∗ andX := supX∈T∩S〈(A∗A− I)(X), X〉.We aim to showX ≤ 2δ√N, (3.27)which we bound by noting thatsupX∈T∩S〈(A∗A− I)(X), X〉 = supX∈T〈(A∗A− I)(X), X〉, (3.28)78whereT := U2 ∩ T := {X ∈ CN×N : ‖X‖F = 1, ‖X‖∗ ≤√r‖X‖F}⋂T.Here we have adopted notation U2 from Lemma 3.6.2 (and from [94]). Indeed,notice that for X ∈ T we haverank(X) ≤ r,which gives ‖X‖∗ ≤√r‖X‖F and the containment T ∩ S ⊂ T holds (andthus T = T ).We now proceed along the lines of [94, 41, 58]. We reiterate that we areadopting some notation from Lemma 3.6.2 (and from [94]), where for a matrixA ∈ CN×N we denote |A)(A| as the operator that maps X 7→ A〈A,X〉. WriteX := supX∈T〈(A∗A− I)(X), X〉 := ‖A∗A− I‖T=‖n∑k=1m∑`=1(|Ak,`)(Ak,`|)− I‖T=‖n∑k=1m∑`=1(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|) ‖T ,where the last equality holds due to isotropy of our ensemble (3.24). We willfirst bound ED1,D2X , and then apply a concentration inequality to show thisrandom variable is concentrated around its mean.Using symmetrization (as in equation (42) of [94], which uses Lemma 6.3in [56]) we obtainED1,D2X ≤ 2ED1,D2E‖n∑k=1m∑`=1k,`|Ak,`)(Ak,`|‖T ,where k,` are Rademacher random variables. We now apply Lemma 3.6.2which requires nm ≤ N2 and computation of the parameter K, where in ourcase we have U = T . We obtain,maxk,`supX∈T|〈Ak,`, X〉| := maxk,`supX∈T1√nm|〈N˜ k,`,F∗(X)〉|≤ γ2√r‖X‖F√nm,79where the last inequality follows from Lemma 3.6.1. Therefore, with K =γ2√nm, Lemma 3.6.2 givesE‖∑k,`k,`|Ak,`)(Ak,`|‖T ≤ C1‖∑k,`|Ak,`)(Ak,`|‖1/2T ,whereC1 :=C√rγ2 log5/2(N)√log(nm)√nm,and C > 0 is an absolute constant given in Lemma 3.6.2.Summarizing and continuing these calculations, we haveED1,D2X ≤ 2C1ED1,D2(‖∑k,`|Ak,`)(Ak,`|‖T)1/2≤2C1ED1,D2(‖∑k,`(|Ak,`)(Ak,`|)− I‖T + 1)1/2≤2C1(ED1,D2‖∑k,`(|Ak,`)(Ak,`|)− I‖T + 1)1/2=2C1 (ED1,D2X + 1)1/2 .ThereforeED1,D2X√ED1,D2X + 1≤ 2C√rγ2 log5/2(N)√log(nm)√nm,and we achieve ED1,D2X ≤ τ if2C√rγ2 log5/2(N)√log(nm)√nm≤ τ√τ + 1. (3.29)We now apply a concentration inequality to show that X is close to itsexpected value with high probability. As in the sparse vector recovery setting,we use Theorem 5.2 in [41] (proven in [87]).80Theorem 3.6.3. Let Y1, ..., Ym be a sequence of independent random vari-ables with values in some Polish space H. Let F be a countable collection ofreal-valued measurable and bounded functions f on H with ‖f‖∞ ≤ B for allf ∈ F . Let Z be the random variableZ = supf∈Fm∑k=1f(Yk).Assume Ef(Yk) = 0 for all k ∈ [m] and all f ∈ F . Define σ2 = supf∈F E∑mk=1 f(Y`)2.Then for all t ≥ 0P (Z ≥ EZ + t) ≤ exp(− t4Blog(1 + 2 log(1 +Bt2BEZ + σ2))).To apply the theorem, defineH := {Z : CN×N → CN×N : 〈Z(X), X〉 ∈ R ∀X ∈ T and supX∈T〈Z(X), X〉 ≤ γ4rnm},which is a closed subset of {Z : CN×N → CN×N} (a Polish space via homeo-morphism to CN4), and therefore itself a Polish space (see [31]). For X ∈ Tdefine the function fX : H → R acting on operators asfX(Z) := 〈Z(X), X〉,which is real valued by definition of H. Then notice thatX := ‖∑k,`(|Ak,`)(Ak,`|)− I‖T= supX∈T∑k,`〈(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|) (X), X〉= supX∈T∑k,`fX(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|)= supX∈T ∗∑k,`fX(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|) ,where T ∗ is a dense countable subset of T .81To see that {|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|}(k,`)∈[n]×[m] ⊂ H, we use the firstpart of Lemma 3.6.1 and our previous remarks. For all (k, `) ∈ [n]× [m] andX ∈ T , we obtainfX(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|) = |〈Ak,`, X〉|2 − E|〈Ak,`, X〉|2=1nm|〈N k,`,F∗(X)〉|2 − E 1nm|〈N k,`,F∗(X)〉|2≤γ4rnm,and we may choose B = γ4rnm.Now for σ2, we apply the second part of Lemma 3.6.1 with our previousreasoning to obtainE∑k,`fX(|Ak,`)(Ak,`| − E|Ak,`)(Ak,`|)2 = E∑k,`|〈Ak,`, X〉|4 − (E|〈Ak,`, X〉|2)2≤ E∑k,`|〈Ak,`, x〉|4 = 1n2m2E∑k,`|〈N k,`,F∗(X)〉|4≤ γ4rnm:= σ2.To finish, apply Theorem 3.6.3 with t = δ√Nand choose τ = δ√N. Thenassuming√mn√log(nm)≥ 2C√Nrγ2 log5/2(N)√1 + δ√Nδ, (3.30)gives EX ≤ δ√Naccording to (3.29). ThenX ≤ EX + t ≤ 2δ√Nwith probability of failure not exceedingexp(− nmδ4γ4r√Nlog(1 + 2 log(1 +t2 δ√N+ 1))).823.6.3 Proof of Lemma 3.5.3We now prove Lemma 3.5.3, which provides the parameter β2 in (3.20).proof of Lemma 3.5.3. We wish to bound‖A‖F→F := 1√nm‖N˜F∗‖F→F = 1√nm‖N˜ ‖F→F .LetM ∈ Cnm×N2 be the matrix representation of the operator 1√nmN˜ :CN×N → Cn×m. In other words, if vecd1,d2 : Cd1×d2 → Cd1d2 denotes theinvertible operator that reorganizes a d1 × d2 matrix into a vector of lengthd1d2, Then for v ∈ CN2 the action ofM is given asMv = vecn,m(1√nmN˜ (vec−1N,N(v))) .With this operator defined, we see that1√nm‖N˜ ‖F→F = ‖M‖,since any matrix X ∈ S will be uniquely identified with vecN,N(X)∈ SN2−1and the supremums agree.Using a matrix Bernstein inequality, we will show that‖M∗M−I‖ ≤ 2Nr,which will then show by Proposition A.15 in [85] that‖M‖ ≤√1 +2Nr≤√3Nr.We will denote the rows of our ensemble asM(k,`)∗ = 1√nmvecN,N(N˜ k,`)where this expression is allowed by our definition of M and the strangeindexing is for convenience to refer to our original ensemble in a natural way(the result is irrelevant of the ordering of the rows).83We expand‖M∗M−I‖ = ‖n∑k=1m∑`=1(M∗(k,`)∗M(k,`)∗ − EM∗(k,`)∗M(k,`)∗) ‖:= ‖n∑k=1m∑`=1(Sk,` − ESk,`) ‖,where the first equality holds by isotropy of1nmN˜ ∗N˜ (3.24) shown in Section3.6.2. We wish to apply a Matrix Bernstein inequality (Corollary 6.1.2 in[48]) to this sum of independent and centered random matrices. However thismakes computing the matrix variance statistic difficult, i.e., the parameter νin [48] defined asν :=max{‖E∑k,`(Sk,` − ESk,`) (Sk,` − ESk,`)∗ ‖, ‖E∑k,`(Sk,` − ESk,`)∗ (Sk,` − ESk,`) ‖}= max{‖E∑k,`(Sk,`S∗k,` − ESk,`ES∗k,`) ‖, ‖E∑k,`(S∗k,`Sk,` − ES∗k,`ESk,`) ‖}.Specifically we wish to handle many different deviation distributions simul-taneously (i.e., all D1,D2 satisfying our deviation model), making the crossterms (∑k,` ESk,`ES∗k,` and∑k,` ES∗k,`ESk,`) particularly messy and perhapsonly possible to handle in a case by case basis.Instead, we circumvent this complication by applying our modified uni-form matrix Bernstein inequality, Theorem 3.9.1. Indeed notice that thisresult avoids dealing with the cross terms, at the cost of a reduced proba-bility of success (which will still hold with high probability). Applying thisresult will greatly simply our computations, and allow us to proceed with allof our deviation models simultaneously.We thus apply Theorem 3.9.1 and compute the corresponding parametersL and ν˜. To compute L, notice that‖M∗(k,`)∗M(k,`)∗ −1nmI‖ ≤ ‖M∗(k,`)∗M(k,`)∗‖+1nm= ‖M∗(k,`)∗M(k,`)∗‖F +1nm= ‖M(k,`)∗‖22 +1nm=1nm‖N˜ k,`‖2F +1nm=N2nm+1nm≤ 2N2nm:= L.The first equality holds sinceM∗(k,`)∗M(k,`)∗ is a rank 1 matrix.84Now to compute the variance, notice thatED1,D2n∑k=1m∑`=1((M∗(k,`)∗M(k,`)∗) (M∗(k,`)∗M(k,`)∗)∗ − 1(nm)2I)= ED1,D2n∑k=1m∑`=1(‖M(k,`)∗‖22(M∗(k,`)∗M(k,`)∗)− 1(nm)2I)= ED1,D2n∑k=1m∑`=1(N2nm(M∗(k,`)∗M(k,`)∗)− 1(nm)2I)=N2nmI − 1nmI.Similarly,ED1,D2n∑k=1m∑`=1((M∗(k,`)∗M(k,`)∗)∗ (M∗(k,`)∗M(k,`)∗)− 1(nm)2I)=N2nmI − 1nmI.and we can therefore we can chooseν˜ =N2nm.We now apply Theorem 3.9.1 to obtain for any δ ≥ 0P(‖∑k,`(M∗(k,`)∗M(k,`)∗ −Inm)‖ ≤ 2δ)≥1− 2nm− 4N exp(− δ28 log(nm)ν˜ + 2Lδ3)=1− 2nm− 4N exp(− nmδ28 log(nm)N2 + 4N2δ3).Choose δ = Nr, which gives‖M∗M−I‖ ≤ 2Nr,85with probability greater than1− 2nm− 4N exp(− nm8 log(nm)r2 + 4Nr3).Therefore‖A‖F→F = ‖M‖ ≤√1 +2Nr≤√3Nr,by proposition A.15 in [85] with the advertised probability.3.6.4 Proof of Lemma 3.5.4We finish by proving the last lemma used to prove Theorem 3.3.1, whichprovides the parameter β4 in (3.16).proof of Lemma 3.5.4. We simplify our workload with several tricks. Toavoid dealing with PT⊥ and F , notice that‖PT⊥ ◦ A∗A(UrV ∗r )‖ = ‖PT⊥(A∗A(UrV ∗r )− UrV ∗r )‖ ≤ ‖(A∗A− I)(UrV ∗r )‖:=‖F ◦ ( 1nmN˜ ∗N˜ − I) ◦ F∗(UrV ∗r )‖ ≤ ‖(1nmN˜ ∗N˜ − I) ◦ F∗(UrV ∗r )‖:=‖( 1nmN˜ ∗N˜ − I)(X)‖,where we definedX := F∗(UrV ∗r ). The first equality holds since PT⊥(UrV ∗r ) =0, and the first inequality holds since PT⊥ is an orthogonal projection.We may therefore bound the term ‖( 1nmN˜ ∗N˜ −I)(X)‖ instead. As in theproof of Lemma 3.5.3, we will achieve our goal by applying a matrix Bern-stein inequality for operator norm (Corollary 6.1.2 in [48]) via comparisonto the uniform counterpart of our operator (Theorem 3.9.1, uniform matrixBernstein inequality).Using again the notation of Lemma 3.6.2, we expand(1nmN˜ ∗N˜ − I)(X) =n∑k=1m∑`=1(1nm|N˜ k,`)(N˜ k,`|(X))−X:=n∑k=1m∑`=1(Sk,`)−X,86and proceed by applying our uniform matrix Bernstein inequality, Theorem3.9.1. As in the proof of Lemma 3.5.3, this approach will allow us to dealwith all distributions from our deviation model simultaneously (as opposedto a direct application of Corollary 6.1.2 in [48]).We now apply Theorem 3.9.1. To bound the operator norm of each matrix(parameter L in Theorem 3.9.1), we have1nm‖|N˜ k,`)(N˜ k,`|(X)‖ = 1nm|〈N˜ k,`, X〉|‖N˜ k,`‖ := 1nm|〈N˜ k,`,F∗(UrV ∗r )〉|‖N˜ k,`‖≤γ2rnm‖N˜ k,`‖ = γ2rnm‖N˜ k,`‖F = γ2rNnm.The inequality is due to Lemma 3.6.1 since UrV∗r ∈ T , which holds for anymulti-index (k, `) ∈ [n]× [m]. The second to last equality holds since N˜ k,` isalways rank 1. It is therefore easy to see that‖Sk,` − Xnm‖ := 1nm‖|N˜ k,`)(N˜ k,`|(X)−X‖ ≤ 2γ2rNnm:= L.Now for the variance matrix statistic of the sum (parameter ν˜ in Theorem3.9.1). We see that‖ED1,D2∑k,`(S∗k,`Sk,` −X∗X(nm)2)‖ ≤ ‖ED1,D2∑k,`S∗k,`Sk,`‖+‖X∗X‖nm≤ ‖ED1,D2∑k,`S∗k,`Sk,`‖+1nm.The last inequality holds since‖X∗X‖ := ‖ (F∗(UrV ∗r ))∗F∗(UrV ∗r )‖ ≤ ‖F∗(UrV ∗r )‖‖F∗(UrV ∗r )‖ = ‖UrV ∗r ‖2 = 1.To bound the remaining term, we show that ED1,D2∑k,` S∗k,`Sk,` is a diagonal87matrix. Let u, v ∈ [N ], then(ED1,D2n∑k=1m∑`=1S∗k,`Sk,`)uv:=ED1,D2n∑k=1m∑`=11(nm)2(N˜ k,`∗N˜ k,`)uv|〈N˜ k,`, X〉|2=1(nm)2ED1,D2∑k,`〈N˜ k,`∗u , N˜ k,`∗v 〉|〈N˜ k,`, X〉|2=1(nm)2ED1,D2∑k,`(N∑q=1N˜ k,`qu N˜k,`qv)|〈N˜ k,`, X〉|2:=1(nm)2ED1,D2∑k,`(N∑q=1e (ν˜k,`(u− v)))|〈N˜ k,`, X〉|2=N(nm)2ED1,D2∑k,`e (ν˜k,`(u− v)) |〈N˜ k,`, X〉|2=N(nm)2ED1,D2∑k,`e (ν˜k,`(u− v))∣∣∣∣∣N∑q,j=1X¯qjN˜ k,`qj∣∣∣∣∣2=N(nm)2ED1,D2∑k,`e (ν˜k,`(u− v))N∑q,q˜,j,j˜=1X¯qjXq˜j˜e (τ˜k,`(q − q˜)) e(ν˜k,`(j − j˜))=N(nm)2N∑q,q˜,j,j˜=1X¯qjXq˜j˜(ED1,D2∑k,`e (τ˜k,`(q − q˜)) e(ν˜k,`(u− v + j − j˜)))=NnmN∑q=1∑j−j˜=v−uX¯qjXqj˜ =NnmN∑q=1∑j∈QuvX¯qjXq(j+u−v).We take a moment to breathe and explain our calculations thus far. Thesecond to last equality holds by our deviation model, similar to the argumentused to establish isotropy (3.24) and the proof of the second part of Lemma3.6.1. In the last line Quv is the index set of allowed values for j accordingto u and v, i.e., so that j, j + u − v ∈ [N ]. Notice that if u = v, Quv = [N ]88and we have shown for all u ∈ [N ] that(ED1,D2S∗pSp)uu=NnmN∑q,j=1XqjX¯qj =N‖X‖2Fnm=N‖UrV ∗r ‖2Fnm=Nrnm.We continue assuming u 6= v. Since X := F∗(UrV ∗r ) := F∗(Z), we obtainNnm∑q∑j∈QuvX¯qjXq(j+u−v) :=Nnm∑q∑j∈Quv〈(F∗)q,j, Z〉〈(F∗)q,(j+u−v), Z〉=Nnm∑q∑j∈QuvN∑k,`=1N∑k˜,˜`=1(F∗)q,jk` Zk`(F∗)q,(j+u−v)k˜ ˜` Z¯k˜ ˜`:=1Nnm∑q∑j∈QuvN∑k,`=1N∑k˜,˜`=1e (q(τk − τk˜)) e (ν`j − ν˜`(j + u− v))Zk`Z¯k˜ ˜`=1Nnm∑j∈QuvN∑k,`=1N∑k˜,˜`=1e (ν`j − ν˜`(j + u− v))Zk`Z¯k˜ ˜`(N∑q=1e (q(τk − τk˜)))=1nm∑j∈QuvN∑k=1N∑`=1N∑˜`=1e (ν`j − ν˜`(j + u− v))Zk`Z¯k ˜`=1nm∑j∈QuvN∑`=1N∑˜`=1e (ν`j − ν˜`(j + u− v))(N∑k=1Zk`Z¯k ˜`)=1nm∑j∈QuvN∑`=1e (ν`(j − (j + u− v)) = 1nm∑j∈QuvN∑`=1e (ν`(v − u))=|Quv|nmN∑`=1e (ν`(v − u)) = 0.In the sevent equality, we usedN∑k=1Zk`Z¯k ˜` :=N∑k=1(UrV∗r )k`(UrV∗r )k ˜` ={0 if ` 6= ˜`1 if ` = ˜`,which holds since the columns of Ur, Vr are orthonormal.Therefore‖ED1,D2∑k,`S∗k,`Sk,`‖ =Nrnm‖I‖ = Nrnm.89Similarly, we can show‖ED1,D2∑k,`Sk,`S∗k,`‖ =Nrnm,allowing us to bound ν˜ ≤ (Nr+1)nm≤ 2Nrnm. We now apply Theorem 3.9.1 toobtain for any δ ≥ 0P(‖∑k,`(1nm|N˜ k,`)(N˜ k,`|(X))−X‖ ≤ 2δ)≥1− 2nm− 4N exp(− δ28 log(nm)ν˜ + 2Lδ3)=1− 2nm− 4N exp(− nmδ216Nr log(nm) + 4γ2Nrδ3).We conclude‖PT⊥ ◦ A∗A(UrV ∗r )‖ ≤ ‖(1nmN˜ ∗N˜ − I)(F∗(UrV ∗r ))‖ ≤ 2δ,with the prescribed probability.3.7 Interpolation Error of 2D Dirichlet Kernel:ProofIn this section we establish the error term of our 2D Dirichlet interpolationkernel when applied to our signal model. The argument utilizes the ideas ofthe proof of Theorem 2.2.1 (the analogous 1D statement), and we will referto this proof for brevity.Proof of Theorem 3.2.1. We begin by establishing the error-free case (3.4).Under this scenario we assume τ˜k,` = τp˜ and ν˜k,` = νq˜ for some p˜, q˜ ∈ [N ].The claim follows easily by orthogonality of the complex exponential basis.90Combining (3.2), (3.3) we have (recall that N˜ := N−12)(S(D))k` = 1N2N∑p=1N∑q=1Dpq N˜∑u=−N˜N˜∑v=−N˜e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`):=1N2N∑p=1N∑q=1Dpq N˜∑u=−N˜N˜∑v=−N˜e(uτp)e(−uτp˜)e(vνq)e(−vνq˜)=1N2N∑p=1N∑q=1Dpq N˜∑u=−N˜e(u(τp − τp˜)) N˜∑v=−N˜e(v(νq − νq˜))= Dp˜q˜ = D(τp˜, νq˜) := D(τ˜k,`, ν˜k,`) = D˜k`.To establish (3.5), recall the Fourier expansion of our 2D functionD(x, y) =∞∑k˜=−∞∞∑˜`=−∞ck˜,˜`e(k˜x)e(˜`y).Combining (3.2), (3.3) and the Fourier expansion we obtain(S(D))k` = 1N2N∑p=1N∑q=1Dpq N˜∑u=−N˜N˜∑v=−N˜e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`)=1N2N∑p=1N∑q=1D(τp, νq) N˜∑u=−N˜N˜∑v=−N˜e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`)=1N2N∑p=1N∑q=1 ∞∑k˜=−∞∞∑˜`=−∞ck˜,˜`e(k˜τp)e(˜`νq)· N˜∑u=−N˜N˜∑v=−N˜e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`) .At this point, we wish to switch the order of summation to sum over allp, q. To this end, we proceed assuming Dpq =∑k˜,˜`ck˜,˜`e(k˜τp)e(˜`νq) 6= 0 and∑u,v e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`) 6= 0 for all p, q ∈ [N ]. We will deal withthe case where these terms vanish afterward. In particular we can remove the91assumption Dpq 6= 0 and we will show the remaining term does not vanishunder our assumption τ˜ × ν˜ ⊂ Ω.Continuing under our added assumption, we may switch the order ofsummation and sum over all p, q since these summands are non-zero to obtain(S(D))k` =1N2∑u,v∞∑k˜=−∞∞∑˜`=−∞ck˜,˜`e(−uτ˜k,`)e(−vν˜k,`)(∑p,qe((u+ k˜)τp)e((v + ˜`)νq))=∑u,v∞∑j=−∞∞∑j˜=−∞(−1)jN+j˜Nc(jN+u),(j˜N+v)e(uτ˜k,`)e(vν˜k,`)=∞∑j=−∞∞∑j˜=−∞(−1)b j+N˜N c+b j˜+N˜N ccj,j˜e(rN(j)τ˜k,`)e(rN(j˜)ν˜k,`).The second equality is obtained by orthogonality of the exponential basisfunctions,∑np=1 e((u + k˜)τp) = 0 when k˜ + u /∈ NZ and otherwise equal to(−1)jN = (−1)j for some j ∈ Z (where u + k˜ = jN). A similar conclusionholds for the independent sum over index q (with v + ˜` = j˜N). The lastinequality results from a reordering of the series as in the proof of Theorem2.2.1. To elaborate this reordering, note that (−1)jN+j˜N = (−1)j+j˜ (N isassumed to be odd) and we ask the reader to recall the reordering in theproof of Theorem 2.2.1 in Section 2.6 in order to apply the same argumenthere. Starting with the previous term, we apply the reordering argument92from the 1D case twice as follows∑u,v∞∑j=−∞∞∑j˜=−∞(−1)jN+j˜Nc(jN+u),(j˜N+v)e(uτ˜k,`)e(vν˜k,`)=∑v∞∑j˜=−∞(−1)j˜e(vν˜k,`)(∑u∞∑j=−∞(−1)jc(jN+u),(j˜N+v)e(uτ˜k,`))=∑v∞∑j˜=−∞(−1)j˜e(vν˜k,`)( ∞∑j=−∞(−1)b j+N˜N ccj,(j˜N+v)e(rN(j)τ˜k,`))=∞∑j=−∞(−1)b j+N˜N ce(rN(j)τ˜k,`)∑v∞∑j˜=−∞(−1)j˜cj,j˜N+ve(vν˜k,`)=∞∑j=−∞(−1)b j+N˜N ce(rN(j)τ˜k,`) ∞∑j˜=−∞(−1)b j˜+N˜N ccj,j˜e(rN(j˜)ν˜k,`) ,where the 1D reordering was applied in the second and fourth equalities.For j, j˜ ∈ {−N˜ ,−N˜+1, · · · , N˜−1, N˜} we have (−1)b j+N˜N c = (−1)b j˜+N˜N c =1, rN(j) = j and rN(j˜) = j˜. Using the Fourier expansion of D(τ˜k,`, ν˜k,`) =D˜k` we have shownD˜k` − (S(D))k` = D(τ˜k,`, ν˜k,`)− (S(D))k` =∑|p|>N˜∑|q|>N˜cp,q(e(pτ˜k,`)e(qν˜k,`)− (−1)bp+N˜Nc+b q+N˜Nce(rN(p)τ˜k,`)e(rN(q)ν˜k,`)).The definition of the Frobenious norm and the triangle inequality finish theproof in this case.We have left to consider the cases where Dpq = 0 or∑u,ve(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`) = 0for some p, q ∈ [N ]. Both of these will be handled as in the proof of Theorem2.2.1 in Section 2.6 as we refer the reader to this proof for a more detailedexplanation.93The constraint Dpq 6= 0 for all p, q ∈ [N ] can be removed by finding someµ ∈ R such that Hpq := Dpq + µ 6= 0 for all p, q ∈ [N ] and let H := D + µ.Then the argument above holds assuming only∑u,v e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`) 6=0 for all p, q ∈ [N ]. We concludeH(τ˜k,`, ν˜k,`)− (S(H))k` =∑|p|>N˜∑|q|>N˜cp,q(e(pτ˜k,`)e(qν˜k,`)− (−1)bp+N˜Nc+b q+N˜Nce(rN(p)τ˜k,`)e(rN(q)ν˜k,`)),but it is easy to show thatH(τ˜k,`, ν˜k,`)− (S(H))k` = D(τ˜k,`, ν˜k,`)− (S(D))k`so the result holds in this case as well.For the final case, one case use the orthogonality of the complex exponen-tial basis to show that∑u,v e(uτp)e(−uτ˜k,`)e(vνq)e(−vν˜k,`) = 0 if τ˜k,` = jN− 12or ν˜k,` =j˜N− 12for some j, j˜ ∈ Z/{0, 1, · · · , N−1}. Both of these cases violatethe assumption τ˜ × ν˜ ⊂ Ω, so the previous arguments always apply.3.8 Proof of Theorem 3.3.2: Stable and RobustMatrix CompletionAs mentioned in Section 3.3.1, the proof of our matrix completion resultTheorem 3.3.2 is almost a corollary of Theorem 3.3.1. However, it requiresseveral changes in our assumptions and methodology. We present these de-tails here in the form of a quick informal argument rather than a proof forthe sake of brevity. With these differences noted, the proofs are very similar.In this context, let {(τ˜k, ν˜k)}mk=1 ⊂ Ω denote our non-equispaced samples(notice we have dropped the ` index). Our sampled matrix entries Λ in thiscontext are drawn from the uniform distribution on [N ]2. This is achievedby generating for each k ∈ [m](τ˜k, ν˜k) = (τpk , νqk)where pk, qk are chosen independently uniformly at random from [N ]. Here{(τp, νq)}p,q∈[N ] still correspond to our equispaced samples from Section 3.2.1,94which indeed give the matrix entries of D, i.e., Dpq = D(τp, νq) (the signalmodel D will no longer be needed here, see below).Several things to notice. First, for all k ∈ [m] we have (τ˜k, ν˜k) = (τp, νq)for some p, q ∈ [N ] so that we may apply the error-free interpolation result(3.4) from Theorem 3.2.1. This justifies our noise constraint ∼ η in (3.11)that does not include the error of theN−12-bandlimited approximation, i.e., Dis feasible under this noise constraint. Notice that this observation removesour need for the signal modelD(x, y), since we are only working with samplesof the form Dpq which are fit for any matrix (not necessarily generated as adiscretization of a function).Second and most important, we no longer strictly fit our 2D deviationmodel from Section 3.2.2 since we have gotten rid of the deviation structure,i.e., ∆,Γ no longer play a role. However, our new ensemble still exhibitsisotropy (3.24) since for `, ˜`∈ [N ] with ` 6= ˜`we haveEe(τ˜k(`− ˜`))=1NN∑p=1e(τp(`− ˜`))=1Ne(−`−˜`2)1− e(`− ˜`)1− e(`−˜`N) = 0,where the last equality holds since |`− ˜`| ≤ N−1. This observation allows usto keep the crucial isotropy property (3.24) as shown in Section 3.6.2, which isapplied several times throughout the proof of Theorem 3.3.1 and its requiredlemmas. Furthermore, Lemma 3.5.4 still holds, where the orthogonality isneeded to show that E∑k,` S∗k,`Sk,` and E∑k,` Sk,`S∗k,` are diagonal matrices.This claim remains true here and can be proved in a very similar manner.The only computation remaining that is lost using our modified samplemodel is in the proof of Lemma 3.6.1, where this orthogonality is used tobound the sum of the fourth moments in establishing (3.26). However, avariant of this claim still holds under this context. In particular we can showthat for D ∈ T we have1mEm∑k=1|〈N˜ k,k,F∗(D)〉|4 ≤ 4γ4r‖D‖4F , (3.31)where N˜ k,k is meant to adopt our previous notation, i.e., for k ∈ [m] andp, q ∈ [N ]N˜ k,kpq = e(τ˜k(p− N˜ − 1))e(ν˜k(q − N˜ − 1)).Notice that the only difference between (3.31) and (3.26) is the factor of 4.The claim (3.31) can be shown as follows: assume D ∈ T . Expand and95obtain1mEm∑k=1|〈N˜ k,k,F∗(D)〉|4=N∑p1,p2,p4,p3=1N∑q1,q2,q4,q3=1F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4·(1mEm∑k=1¯˜N k,kp1q1N˜ k,kp2q2 ¯˜N k,kp3q3N˜ k,kp4q4)=∑p1−p2=p4−p3∑q1−q2=q4−q3F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4−∑p1−p2=p4−p3∑q1−q2+q3−q4=±NF∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4−∑p1−p2+p3−p4=±N∑q1−q2=q4−q3F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4+∑p1−p2+p3−p4=±N∑q1−q2+q3−q4=±NF∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4 .(3.32)The last equality holds by isotropy of our ensemble due to our sampling96model in this context. More precisely, we have1mEm∑k=1¯˜N k,kp1q1N˜ k,kp2q2 ¯˜N k,kp3q3N˜ k,kp4q4:=1mE(m∑k=1e(τ˜k(−p1 + p2 − p3 + p4))e(ν˜k(−q1 + q2 − q3 + q4)))=(1NN∑u=1e(τu(−p1 + p2 − p3 + p4)))(1NN∑v=1e(νv(−q1 + q2 − q3 + q4)))=(1Ne(−−p1 + p2 − p3 + p42)1− e(−p1 + p2 − p3 + p4)1− e (−p1+p2−p3+p4N) )·(1Ne(−−q1 + q2 − q3 + q42)1− e(−q1 + q2 − q3 + q4)1− e (−q1+q2−q3+q4N) )=1 if − p1 + p2 − p3 + p4 = 0and − q1 + q2 − q3 + q4 = 0,e (∓N/2) = −1 if − p1 + p2 − p3 + p4 = 0and − q1 + q2 − q3 + q4 = ±N,e (∓N/2) = −1 if − p1 + p2 − p3 + p4 = ±N,and − q1 + q2 − q3 + q4 = 0,e (∓N/2) e (∓N/2) = 1 if − p1 + p2 − p3 + p4 = ±N,and − q1 + q2 − q3 + q4 = ±N,0 otherwise.The midterms hold since e (∓N/2) = e∓piiN = −1 as N is assumed to be odd.The last case holds since |−p1+p2−p3+p4|, |−q1+q2−q3+q4| ≤ 2N−2 < 2N(this is why we only single out the mid-cases ±N).Continuing from (3.32), we can bound the first term of this last equality97as∣∣∣∣∣ ∑p1−p2=p4−p3∑q1−q2=q4−q3F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4∣∣∣∣∣=∣∣∣∣∣∣N∑p1,p2=1N∑q1,q2=1F∗(D)p1q1F∗(D)p2q2∑p3,q3∈Qp1,p2,q1,q2F∗(D)p3q3F∗(D)(p1−p2+p3)(q1−q2+q3)∣∣∣∣∣∣≤N∑p1,p2=1N∑q1,q2=1|F∗(D)p1q1F∗(D)p2q2||〈F∗(D),F∗(D)〉|= ‖D‖2F(∑p,q|F∗(D)pq|)2≤ γ4r‖D‖4F .In the first equality, Qp1,p2,q1,q2 ⊂ [N ]2 is the index set of allowed valuesfor p3, q3 according to p1, p2, q1 and q2 (i.e., so that p3, q3 ∈ [N ] and (p1 −p2 + p3), (q1 − q2 + q3) ∈ [N ]). The last inequality holds since D ∈ T gives∑Np,q=1 |〈(F∗)p,q, D〉| ≤ γ2√r‖D‖F as in the proof of (3.25) from Lemma3.6.1.The remaining terms in (3.32) can be bounded similarly, this producesthe factor of 4 since we apply the bound for each of the 4 terms in (3.32).Hence, (3.31) holds.Using these observations allows the proof of Theorem 3.3.2 to proceedas in the proof of Theorem 3.3.1. The only difference being the factor of 4in (3.31) in contrast to (3.26) used to establish Theorem 3.3.1. This differ-ence only affects the probability by which Lemma 3.5.2 holds (in applyingTheorem 3.6.3 therein). This last observation gives the slightly modifiedprobability bound (3.12) in Theorem Uniform Matrix Bernstein InequalityThis section is dedicated to a modified matrix Bernstein inequality, crucial forthe proof of Lemma 3.5.3 and Lemma 3.5.4. As mentioned in those respectiveproofs, the advantage of this result is that it does not require one to deal withthe cross terms in the computation of the variance statistic in Corollary 6.1.2in [48] and other similar results. This is a valuable property for cases wherethe summands are not identically distributed or each individual summand is98not isotropic. Instead, this theorem only requires isotropy on the ensembleas a whole. For this reason, this result may be of interest on its own.Theorem 3.9.1. Let {Xk}mk=1 ⊂ Cd1×d2 be a sequence of independent randommatrices, where each matrix is generated from a distribution Dk. Assume thatm∑k=1EDkXk = X,‖Xk − Xm‖ ≤ L, ∀k ∈ [m]andmax{‖m∑k=1(EDkXkX∗k −XX∗m2)‖, ‖m∑k=1(EDkX∗kXk −X∗Xm2)‖} ≤ ν˜.Then for all δ ≥ 0P(‖m∑k=1Xk −X‖ ≤ 2δ)≥ 1− 2m− 4(d1 + d2) exp(− δ28 log(m)ν˜ + 2Lδ3).Proof. To establish the result, we will show that ‖∑mk=1Xk − X‖ can bebounded with high probability by two terms consisting of sums of indepen-dent and centered random matrices. We will then apply Corollary 6.1.2 in[48] to bound the operator norm of these sums.To this end, we consider the sequence of i.i.d. randommatrices {Xjp}m˜p=1 ⊂Cd1×d2 , where each jp ∼ U([m]) and Xjp is generated according to Djp . Inother words, each Xjp will be of the form Xk for some k chosen uniformly atrandom from [m] and generated according to Dk as per the original ensemble.By an elementary analysis of the coupon collector's problem, if m˜ ≥λm log(m) then with probability greater than 1−(m)1−λ the generated indices{jp}m˜p=1 will collect the whole set [m] (see for example [77] Section 3.6). Inother words{jp}m˜p=1 = [m] ∪ J,where |J | = m(λ log(m)− 1).With this in mind, the proof goes as follows. Let m˜ = 4m log(m) so that{jp}m˜p=1 will contain [m] at least twice with probability greater than 1 − 2m(apply the coupon collector's problem twice). We will now create a subset of99indices Ω ⊂ [m˜] such that {jp}p∈Ω = [m]. This can be done in many ways,for the result we only require that P(p ∈ Ω) 6= 0 for all p ∈ [m˜] (which willalways hold by the uniform distribution of the jp's). However, for simplicitywe will actually create Ω in such a way that P(p ∈ Ω) = P(p˜ ∈ Ω) for allp 6= p˜ (though we will not prove this since it is not crucial).For each k ∈ [m], define the setCk := {p ∈ [m˜] : jp = k},and notice that |Ck| ≥ 2 for all k (with probability larger than 1 − 2m).Then for each k choose index pk ∈ [m˜] uniformly at random from Ck, i.e.,pk ∼ U(Ck). We achieve our set of interest by definingΩ :=m⋃k=1{pk}.Thus {jp}m˜p=1 = [m]⋃{jp}p/∈Ω = {jp}p∈Ω⋃{jp}p/∈Ω, where the equalityholds w.h.p., and generate each matrix Xjp ∼ Djp . We have {Xjp}p∈Ω ={Xk}mk=1 and we have thus also generated an instance of our original ensemblewith high probability.Then‖m∑k=1Xk −X‖=‖m˜∑p=1Xjp −∑p/∈ΩXjp −X‖=‖m˜∑p=1Xjp −∑p/∈ΩXjp −(m˜m)X +(m˜−mm)X‖≤‖m˜∑p=1Xjp −(m˜m)X‖+ ‖∑p/∈ΩXjp −(m˜−mm)X‖= ‖m˜∑p=1(Xjp −Xm)‖+ ‖∑p/∈Ω(Xjp −Xm)‖, (3.33)and we may now bound the two last terms in (3.33) instead. This approachwill allow a simpler application of Corollary 6.1.2 in [48], where the main100advantage is that each matrix in the first sum over all p ∈ [m˜] satisfiesEjp,DjpXjp =m∑k=1P(jp = k)EDkXk =1mm∑k=1EDkXk =Xm, (3.34)by construction. Therefore, the first sum consists of independent and cen-tered random matrices.The same conclusion holds for the second sum over p /∈ Ω, but this willrequire additional work to show (i.e., we aim to show (3.34) and independencehold given that p /∈ Ω). First, we show that the summands in the sum overp /∈ Ω are centered.Fix p ∈ [m˜]. By (3.34) and the law of total expectation we haveXm= Ejp,DjpXjp= Ejp∈Ω,DjpXjpP(p ∈ Ω) + Ejp/∈Ω,DjpXjpP(p /∈ Ω), (3.35)where the subscripts in jp∈Ω (jp/∈Ω) are used the indicate the condition onexpectation. It is important to notice that P(p ∈ Ω) ≤ 12and P(p /∈ Ω) ≥ 12by construction of Ω and since we are guaranteed (w.h.p.) that {jp}p∈[m˜]contains [m] at least twice.Then since {jp}p∈Ω = [m], we haveEjp∈Ω,DjpXjp =m∑k=1P(jp = k|p ∈ Ω)EDkXk. (3.36)We will show that P(jp = k|p ∈ Ω) = 1m for all k ∈ [m]. Let k˜ 6= k andrecall the definition of Ck. Since |Ck| ≥ 2 for all k ∈ [m], by the law of totalprobabilityP(p ∈ Ω|jp = k) =m˜∑q=2P(p ∈ Ω|jp = k & |Ck| = q)P(|Ck| = q)=m˜∑q=21qP(|Ck| = q) =m˜∑q=21qP(|Ck˜| = q)=m˜∑q=2P(p ∈ Ω|jp = k˜ & |Ck˜| = q)P(|Ck˜| = q) = P(p ∈ Ω|jp = k˜).101The second equality holds by construction of Ω and the third equality holdssince the indices are drawn uniformly at random. Therefore, P(p ∈ Ω|jp =k) = P(p ∈ Ω|jp = k˜) for all k˜ 6= k. By Bayes theorem and our uniformdistribution of jp, we obtainP(jp = k|p ∈ Ω) = P(p ∈ Ω|jp = k)P(jp = k)P(p ∈ Ω)=P(p ∈ Ω|jp = k˜)P(jp = k˜)P(p ∈ Ω) = P(jp = k˜|p ∈ Ω).Let α := P(jp = k|p ∈ Ω), then1 =m∑k=1P(jp = k|p ∈ Ω) = αmand we must have α := P(jp = k|p ∈ Ω) = 1m for all k ∈ [m], as desired. Tofinish, our observation applied to (3.36) gives thatEjp∈Ω,DjpXjp =m∑k=11mEDkXk =Xm.Rearranging (3.35) we have shownXm(1− P (p ∈ Ω)) = Ejp/∈Ω,DjpXjpP(p /∈ Ω),so that diving by P(p /∈ Ω) 6= 0 gives the claim.Next, we show that this sequence of random matrices is independent. Forthis claim we must solely consider the random indices jp/∈Ω, since once {jp}p/∈Ωare set the matrices {Xjp}p/∈Ω will be generated independently according to{Djp}p/∈Ω.Our goal is to show that P(jp = k|{jp˜}p˜/∈Ω) = P(jp = k) for all p /∈ Ω andk ∈ [m]. To this end, let p, p˜ ∈ [m˜]/Ω (p 6= p˜) and k, k˜ ∈ [m] (k 6= k˜). Byconstruction, it is clear that P(jp = k) = P(jp = k|jp˜ = k˜). Therefore, by the102law of total probability we obtainP(jp = k) =m∑k˜=1P(jp = k|jp˜ = k˜)P(jp˜ = k˜) =m∑k˜=11mP(jp = k|jp˜ = k˜)=∑k˜ 6=k1mP(jp = k|jp˜ = k˜) + 1mP(jp = k|jp˜ = k)=m− 1mP(jp = k) +1mP(jp = k|jp˜ = k).The second equality uses P(jp = k|p /∈ Ω) = 1m , which holds due to ourprevious argument that showed P(jp = k|p ∈ Ω) = 1m . Combining termsand rearranging gives P(jp = k) = P(jp = k|jp˜ = k). This establishespairwise independence and a recursive application of the argument provesindependence of the ensemble.In conclusion, both terms in (3.33) correspond to sums of independentand centered random matrices and we now apply Corollary 6.1.2 in [48] tobound both operator norms by computing the corresponding parameters Land ν. This finishes our proof sinceL := maxp∈[m˜]‖Xjp − EXjp‖ = maxp∈[m˜]‖Xjp −Xm‖ = maxk∈[m]‖Xk − Xm‖agrees with our definition and‖m˜∑p=1Ejp,Djp(Xjp −Xm)(Xjp −Xm)∗‖ = ‖m˜∑p=1Ejp,Djp(XjpX∗jp −XX∗m2)‖=m˜‖Ejp,Djp(XjpX∗jp −XX∗m2)‖ = m˜m‖m∑k=1(EDkXkX∗k −XX∗m2)‖≤ m˜ν˜m= 4 log(m)ν˜.The second equality holds since the terms are i.i.d. and the last line appliesthe definition of ν˜ and m˜ := 4m log(m).A similar argument shows that‖m˜∑p=1Ejp,Djp(Xjp −Xm)∗(Xjp −Xm)‖ ≤ 4 log(m)ν˜103and therefore ν ≤ 4 log(m)ν˜.The conclusion provided by Corollary 6.1.2 in [48] isP(‖m˜∑p=1(Xjp −Xm)‖2 ≥ δ)≤ 2(d1 + d2) exp(− δ22ν + 2Lδ3)≤2(d1 + d2) exp(− δ28 log(m)ν˜ + 2Lδ3).By our previous observation, the same approach can be used for the thesecond term in (3.33) with sum over p /∈ Ω to obtainP‖∑p/∈Ω(Xjp −Xm)‖2 ≥ δ ≤ 2(d1 + d2) exp(− δ22(4 logm− 1)ν˜ + 2Lδ3)≤2(d1 + d2) exp(− δ28 log(m)ν˜ + 2Lδ3).Therefore‖m∑k=1Xk −X‖2≤‖m˜∑p=1(Xjp −Xm)‖2 + ‖∑p/∈Ω(Xjp −Xm)‖2≤2δ,where the first inequality fails with probability less than2m(by the couponcollector's problem) and the last inequality fails with probability less than4(d1 + d2) exp(− δ28 log(m)ν˜ + 2Lδ3)by a union bound.1043.10 Proof of Additional LemmasIn this section we prove Lemmas 3.6.1 and 3.6.2 stated in Section 3.6.2, usedto establish Lemma 3.5.2 and Lemma 3.5.4. We begin with the proof oflemma 3.6.1.proof of lemma 3.6.1. We begin by showing (3.25). Let D ∈ T and noticethat we can writeD =r∑j=1λjU∗jV ∗∗j.Notice that rank(D) ≤ r and ‖D‖F =√∑j λ2j .For any (k, `) ∈ [n]× [m], we obtain|〈N˜ k,`,F∗(D)〉| =∣∣∣∣∣N∑p,q=1N˜ k,`pq F∗(D)pq∣∣∣∣∣ ≤N∑p,q=1|F∗(D)pq|=N∑p,q=1|〈(F∗)p,q, D〉| ≤r∑j=1λjN∑p,q=1|〈(F∗)p,q, U∗jV ∗∗j〉|=r∑j=1λjN∑p,q=1|〈F1∗p, U∗j〉||〈F1∗q, V ∗∗j〉| ≤r∑j=1λjγ(Ur)γ(Vr) ≤ γ2√r‖D‖F .This establishes the first claim. Notice that we have shown∑Np,q=1 |〈(F∗)p,q, D〉| ≤γ2√r‖D‖F (this will be used to show the remaining inequality).For (3.26), we use our deviation model and computations similar to thosethat establish the isotropy property of our ensemble (see (3.24) and the ar-105gument that follows). We expand and obtain1nmEn∑k=1m∑`=1|〈N˜ k,`,F∗(D)〉|4=N∑p1,p2,p4,p3=1N∑q1,q2,q4,q3=1F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4·(1nmEn∑k=1m∑`=1¯˜N k,`p1q1N˜ k,`p2q2 ¯˜N k,`p3q3N˜ k,`p4q4)=∑p1−p2=p4−p3∑q1−q2=q4−q3F∗(D)p1q1F∗(D)p2q2F∗(D)p3q3F∗(D)p4q4=N∑p1,p2=1N∑q1,q2=1F∗(D)p1q1F∗(D)p2q2∑p3,q3∈Qp1,p2,q1,q2F∗(D)p3q3F∗(D)(p1−p2+p3)(q1−q2+q3)≤N∑p1,p2=1N∑q1,q2=1|F∗(D)p1q1F∗(D)p2q2||〈F∗(D),F∗(D)〉|= ‖D‖2F(∑p,q|F∗(D)pq|)2≤ γ4r‖D‖4F .The last inequality holds since we previously showed∑Np,q=1 |〈(F∗)p,q, D〉| ≤γ2√r‖D‖F . In the third equality, Qp1,p2,q1,q2 ⊂ [N ]2 is the index set of allowedvalues for p3, q3 according to p1, p2, q1 and q2. To reiterate, the second equalityholds by isotropy of our ensemble due to our deviation model. More precisely,106if ∆˜, Γ˜ ∈ R are independent copies of the entries of ∆,Γ respectively, we have1nmEn∑k=1m∑`=1¯˜N k,`p1q1N˜ k,`p2q2 ¯˜N k,`p3q3N˜ k,`p4q4=ED1e(∆˜(−p1 + p2 − p3 + p4))ED2e(Γ˜(−q1 + q2 − q3 + q4))·(n∑k=11ne((k − 1n− 12)(−p1 + p2 − p3 + p4)))·(m∑`=11me((`− 1m− 12)(−q1 + q2 − q3 + q4)))=1 if − p1 + p2 − p3 + p4 = 0and − q1 + q2 − q3 + q4 = 0,ED2e(zm(Γ˜− 1/2))= 0 if − p1 + p2 − p3 + p4 = 0and − q1 + q2 − q3 + q4 = zm,for z ∈ Z/{0},ED1e(jn(∆˜− 1/2))= 0 if − p1 + p2 − p3 + p4 = jn,for j ∈ Z/{0},and − q1 + q2 − q3 + q4 = 0,ED1e(jn(∆˜− 1/2))ED2e(zm(Γ˜− 1/2))= 0 if − p1 + p2 − p3 + p4 = jn,and − q1 + q2 − q3 + q4 = zm,for j, z ∈ Z/{0},0 otherwise.The midterms vanish by our deviation model (∆˜ ∼ D1, Γ˜ ∼ D2) and be-cause −q1 + q2 − q3 + q4 = zm,−p1 + p2 − p3 + p4 = jn imply that z ∈(Z ∩ [2(1−N)m, 2(N−1)m])/{0} and j ∈(Z ∩ [2(1−N)n, 2(N−1)n])/{0} (since pk's, qk's∈[N ]).We now prove Lemma 3.6.2. The proof is due to [94], but requires aslight modification for our setting. Adopting the author's notation, in whatfollows for a matrix A ∈ CN×N we denote |A)(A| as the operator that mapsX 7→ A〈A,X〉.proof of Lemma 3.6.2. We will rely heavily on the proof of Lemma 3.1 in[94], and refer to this text and its notation for brevity. Indeed notice that U2107is defined as in [94] and the only difference is that we consider the supremumover U ∩ U2 (and redefine K with respect to U accordingly). Therefore weobtain our result in the same manner, but replace U2 in [94] with U2 ∩U andonly require a few additional observations.To elaborate, as in the beginning of the proof of Lemma 3.1 in [94], viacomparison principle and Dudley's inequality we can showE supX∈U∩U2m∑k=1k〈|Vk)(Vk|(X), X〉 ≤ 24√2pi∫ ∞0log1/2N(1√rU2 ∩ U , ‖ · ‖X , 2R√r)d,where N(1√rU2 ∩ U , ‖ · ‖X , )is a covering number (the number of balls inCN×N of radius  in the metric ‖ · ‖X needed to cover 1√rU2 ∩ U),R :=(supX∈U∩U2m∑k=1〈|Vk)(Vk|(X), X〉)1/2and ‖ · ‖X is a semi-norm CN×N defined as‖M‖X = maxk∈[m]|〈Vk,M〉|.For M ∈ U , notice that by assumption‖M‖X ≤ K√r‖M‖F . (3.37)This leaves the containments used throughout the proof unchanged with ourmodified value K. To elaborate, by (3.37) we have 1√rU2∩U ⊂ K ·BX (whereBX is the unit ball in ‖ · ‖X). Since 1√rU2 ⊂ B1 (where B1 is the unit ball in‖ · ‖∗) we have 1√rU2 ∩ U ⊂ B1.With these observations, we can now accept the covering number boundsprovided by [94] since these clearly hold for our subsets of interest, i.e.,N(1√rU2 ∩ U , ‖ · ‖X , )≤ N (K ·BX , ‖ · ‖X , ) ≤(1 +2K)2N2for small  andN(1√rU2 ∩ U , ‖ · ‖X , )≤ N (B1, ‖ · ‖X , ) ≤ exp(C21K2log3(N) log(m)),108for large , where C1 is an absolute constant given by Maurey's empiricalmethod (this is proven in Section 3.3 of [94]).The remainder of the proof is as in [94], where the covering numbersbound the integral to obtainE supX∈U∩U2m∑k=1k〈|Vk)(Vk|(X), X〉 ≤ CR√rK log5/2(N) log1/2(m),where C is an absolute constant.109Chapter 4ConclusionThis thesis considers the benefits of randomly generated off-the-grid samplesfor signal acquisition (in comparison to equispaced sampling), with the goal ofproviding explicit statements that are informative for practitioners. In largepart, this goal has been achieved where the main results provide novel insighton the anti-aliasing nature of nonuniform samples. The methodology andanalysis uses ideas from compressive sensing and low-rank matrix recoveryto develop a random deviation model, sampling complexity and recovery errorbounds in terms of classical signal processing concepts (error of bandlimitedapproximation).The specific contributions are listed below:• In Chapter 2, we consider 1D functions f ∈ H1([−1/2, 1/2)) that admitan s-sparse uniform discretization f ∈ CN in some basis (i.e., f = Ψgwith ‖g‖0 ≤ s). We show that O(s polylog(N)) random off-the-gridsamples of f from our deviation model are sufficient to recover theN−12-bandlimited approximation of f via the basis pursuit problem with aninterpolation kernel (2.9). When (s  N), this is a stark contrast touniform sampling where O(N) equispaced samples are needed for thesame quality of reconstruction. Furthermore, the results are robustwhen the measurements are noisy and stable when g is not exactlys-sparse (but is well approximated by an s-sparse vector).• Chapter 3 extends the results of Chapter 2 to a low-rank signal modelvia the theory of low-rank matrix recovery. In this context we con-sider 2D functions D ∈ H1([−1/2, 1/2)2) whose uniform discretization110D ∈ CN×N exhibits low-rank structure. By incorporating a 2D Dirich-let kernel into the nuclear norm minimization problem (3.6), we showthat O(Nr polylog(N)) random nonuniform samples provide the N−12-bandlimited approximation of D with error proportional to the bestrank r approximation of D. The methodology allows for additive noise,where the result gives a robust error bound proportional to the noiseenergy level. Therefore, when D can be well approximated by a rankr  N matrix, we may also achieve frugal signal acquisition in thisscenario via random off-the-grid samples. In comparison, uniform sam-ples would requireO(N2)measurements to obtain the N−12-bandlimitedapproximation of D.• Chapter 3 also produces a novel result for the noisy matrix completionproblem (Section 3.3.1). The main contribution of the work is a resultthat applies to full rank matrices with stable and robust recovery er-ror proportional to the low-rank approximation of the signal and noiselevel. We consider the sampling with replacement model where entriesof the matrix are observed uniformly at random with the possibility ofobserving the same entry more than once. Under this sampling andour r-incoherence structure, we show that O(Nr polylog(N)) observednoisy matrix entries are sufficient to recovery the rank r approximationof the data matrix with error robust to the noise. The result providesa contribution to the matrix completion literature as it applies to gen-eral full rank matrices while improving all known error bounds withstandard sampling complexity.This work allows for many avenues of future research. First, the overallmethodology of this thesis requires the practitioner to know the nonuniformsampling locations τ˜ accurately. While this is typical for signal reconstruc-tion techniques that involve non-equispaced samples, it would be of practicalinterest to extend the methodology is such a way that allows for robustnessto inaccurate sampling locations and even self-calibration.As mentioned in Section 2.9, this work has not dedicated much effortto a numerically efficient implementation of the Dirichlet kernel S. This iscrucial for large-scale applications, especially in the 2D case where a directimplementation of the Dirichlet kernel via its Fourier representation (3.2) and(3.3) or Dirichlet representation (see [68]) may be too inefficient for practicalpurposes. As future work, it would be useful to consider other interpolation111kernels with greater numerical efficiency (e.g., a low order Lagrange interpo-lation operator).Finally, while novel and informative, the matrix completion results inSection 3.3.1 are not very useful in applications due to the sampling withreplacement model. Indeed, in practice one would not be likely to samplethe same matrix entry twice. Extending the analysis for more practical sam-pling schemes is left as future work, where for example the standard uniformrandom sampling model may be achieved arguing as in Proposition 3.1 in[7].112Bibliography[1] A.J. Jerri [1977], The Shannon Sampling Theorem-its various exten-sions and applications: a tutorial review, Proc. IEEE, 65(11), 1565-1596.[2] A.I. Zayed [1993], Advances in Shannon's Sampling Theory. CRC Press.[3] A. Griffin and P. Tsakalides [2008], Compressed sensing of audio signalsusing multiple sensors. 16th European Signal Processing Conference.[4] A.Y. Aravkin, R. Kumar, H. Mansour, B. Recht and F.J. Herrmann[2014], Fast methods for denoising matrix completion formulations,with applications to robust seismic data interpolation. SIAM Journalon Scientific Computing, 36, S237-S266.[5] A. Zandieh, A. Zareian, M. Azghani and F. Marvasti [2014], Recon-struction of Sub-Nyquist Random Sampling for Sparse and Multi-BandSignals. arXiv.[6] B. Recht, M. Fazel, and P.A. Parillo [2010], Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Mini-mization. SIAM Review, 52, 471-501.[7] B. Recht [2011], A Simpler Approach to Matrix Completion. The Jour-nal of Machine Learning Research, 12, 3413-3430.[8] C.E. Shannon [1949], Communication in the Presence of Noise, Proc.IRE, 37(1), 10-21.[9] C. Li, C.C. Mosher and S.T. Kaplan [2012], Interpolated compres-sive sensing for seismic data reconstruction. 82nd annual internationalmeeting, SEG, Expanded Abstracts.113[10] D. Jagerman [1966], Bounds for Truncation Error of the Sampling Ex-pansion. SIAM J. Appl. Math., 14(4), 714-723.[11] D.R. Bellhouse [1981], Area Estimation by Point-counting Techniques.Biometrics, 37(2), 303-312.[12] D.P. Mitchell [1987], Generating Antialiased Images at Low SamplingDensities. ACM SIGGRAPH Computer Graphics, 21(4), 65-72.[13] D.P. Mitchell [1990], The Antialiasing Problem in Ray Tracing. SIG-GRAPH 90.[14] D.P. Dobkin, D. Eppstein and D.P. Mitchell [1996], Computing the Dis-crepancy with Applications to SuperSampling Patterns. ACM Trans-actions on Graphics, 15(4), 354-376.[15] D.M. Bechir and B. Ridha [2009], Non-uniform Sampling Schemes forRF Bandpass Sampling Receiver. International Conference on SignalProcessing Systems.[16] D. Gross [2011], Recovering Low-Rank Matrices from Few Coefficientsin Any Basis. IEEE Transactions on Information Theory, 57(3), 1548-1566.[17] E.T. Whittaker [1915], On the Functions Which are Represented bythe Expansion of Interpolating Theory, Proc. Roy. Soc. Edinburgh, 35,181-194.[18] E. Shlomot and Y.Y. Zeevi [1989], A Nonuniform Sampling and Repre-sentation Scheme for Images Which Are Not bandlimited. The SixteenthConference of Electrical and Electronics Engineers in Israel.[19] E. Margolis and Y.C. Eldar [2008], Nonuniform Sampling of PeriodicBandlimited Signals. IEEE Transactions on Signal Processing, 56(7),2728 - 2745.[20] Ewout van den Berg and Michael P. Friedlander [2008], Probing thepareto frontier for basis pursuit solutions. SIAM Journal on ScientificComputing, 31(2), 890-912.[21] E.J. Candès, and B. Recht [2009], Exact Matrix Completion via ConvexOptimization. Foundations of Computational Mathematics, 9, 717-772.114[22] E.J. Candès, and Y. Plan [2009], Matrix Completion With Noise. Pro-ceedings of the IEEE , 98, 925 - 936.[23] E.J. Candès and T. Tao [2010], The Power of Convex Relaxation: Near-Optimal Matrix Completion. IEEE Transactions on Information The-ory, 56(5), 2053-2080.[24] E. van den Berg and M.P. Friedlander [2014], Spot  A Linear-OperatorToolbox. [Online]. https://www.cs.ubc.ca/labs/scl/spot/index.html.[Accessed: 13- June- 2019].[25] Edi.lv. [2019]. Avoiding aliasing by Nonuniform sampling. [online] Avail-able at: http://www.edi.lv/media/uploads/UserFiles/dasp-web/sec-5.htm [Accessed 9 May 2019].[26] F.J. Beutler [1966], Error-Free Recovery of Signals from IrregularlySpaced Samples. Society for Industrial and Applied Mathematics, 8(3),328-335.[27] F. Beutler [1970], Alias-free Randomly Timed Sampling of StochasticProcesses. IEEE Transactions on Information Theory, 16(2), 147-152.[28] F. Marvasti [2001], Nonuniform sampling: theory and practice.Springer.[29] F.J. Herrmann, M.P. Friedlander and Ö. Ylmaz [2012], Fighting theCurse of dimensionality: compressive sensing in exploration seismology.IEEE Signal Processing Magazine, 29(3), 88 - 100.[30] F. Krahmer and R. Ward [2014], Stable and Robust Sampling Strate-gies for Compressive Imaging. IEEE Transactions on Image Processing,23(2), 612-622.[31] G. Beer [1991], A Polish Topology for the Closed Subsets of a Pol-ish Space. Proceedings of the American Mathematical Society, 113(4),1123 - 1133.[32] G. Chistyakov and Y. Lyubarskii [1997], Random Perturbations of Ex-ponential Ries Basis in L2(−pi, pi). Annales de l'institut Fourier, 47(1),201-255.115[33] G.L. Bretthorst [2001], Nonuniform Sampling: Bandwidth and Alias-ing. AIP Conference Proceedings, 567(1).[34] G. Hennenfent and F.J. Herrmann [2006], Seismic Denoising withNonuniformly Sampled Curvelets. Computing in Science and Engineer-ing, 8(3), 16-25.[35] G. Hennenfent and F.J. Herrmann [2008], Simply Denoise: WavefieldReconstruction via Jittered Undersampling. Geophysics, 73(3), V19-V28.[36] G.E. Pfander [2015], Sampling Theory, A Renaissance, BirkhäuserBasel.[37] H. Nyquist [1928], Certain Topics in Telegraph Transmission Theory,AIEE Trans., 47, 617-644.[38] H.S. Shapiro and R.A. Silverman [1960], Alias-free Sampling of RandomNoise. Journal of Society for Industrial and Applied Mathematics, 8(2),225-248.[39] H. Landau [1967], Necessary Density Condition for Sampling and In-terpolation of Certain Entire Functions, Acta Math, 117, 37-52.[40] H. Lee and Z. Bien [2005], Sub-Nyquist Nonuniform Sampling and Per-fect Reconstruction of Speech Signals. TENCON 2005 - 2005 IEEERegion 10 Conference.[41] H. Rauhut [2008], Stability Results for Random Sampling of SparseTrigonometric Polynomials. IEEE Transactions on Information The-ory, 54(12), 5661 - 5670.[42] H. Rauhut [2011], Compressive Sensing and Structured Random Ma-trices. Radon Series Comp. Appl., 1-94.[43] H. Boche, R. Calderbank, G. Kutyniok, J. Vybíral [2013], Compressedsensing and its applications. Birkhäuser.[44] J.M. Whittaker [1929], The Fourier Theory of the Cardinal Functions,Proc. Math. Soc. Edinburgh, 1, 169-176.116[45] J. Keiner, S. Kunis and D. Potts [2008], Using NFFT 3  a softwarelibrary for various non-equispaced fast Fourier transforms. ACM Trans.Math. Softw., 36, 19:119:30.[46] J.B. Harley and J.M.F. Moura [2013], Sparse recovery of the multi-modal and dispersive characteristics of Lamb waves. The Journal ofthe Acoustic Society of America, 133(5), 2732-2745.[47] J. Koh, W. Lee, T.K. Sarkar and M. Salazar-Palma [2013], Calculationof Far-Field Radiation Pattern Using Nonuniformly Spaced Antennasby a Least Square Method. IEEE Transactions on Antennas and Prop-agation, 62(4), 1572-1578.[48] J.A. Tropp [2015], An Introduction to Matrix Concentration Inequali-ties. Now Publishers.[49] J.A. Tropp [2015], Convex Recovery of a Structured Signal from In-dependent Random Linear Measurements. Pfander G. (eds) SamplingTheory, a Renaissance. Applied and Numerical Harmonic Analysis,Birkhäuser.[50] K. Ogura [1920], On a Certain Transcendental Function in the Theoryof Interpolation. Tôhoku Math. J., 17, 64-72.[51] K. Czy» [2004], Nonuniformly Sampled Active Noise Control System.IFAC Proceedings Volumes, 37(20), 351-355.[52] K. Han, Y. Wei and X. Ma [2016], An Efficient Non-uniform FilteringMethod for Level-crossing Sampling. IEEE International Conferenceon Digital Signal Processing.[53] L. Greengard and J. Lee [2004], Accelerating the Nonuniform fastFourier transform. Applied and Computational Harmonic Analysis, 35,111-129.[54] L. Li [2016], Math 660-Lecture 12: Spec-tral methods: Fourier. [Online]. Available:https://services.math.duke.edu/ leili/teaching/duke/math660s16/lectures/lec12.pdf.[Accessed: 13- May- 2019].117[55] M.I. Kadec [1964],The Exact Value of the Paley-Wiener Constant, Sov.Math. Dokl., 5, 559-561.[56] M. Ledoux and M. Talagrand [1991], Probability in Banach spaces.Springer.[57] M. Gastpar and Y. Bresler [2000], On The Necessary Density forSpectrum-Blind Nonuniform Sampling Subject to Quantization. Proc.IEEE Int. Conf. Acoustics, Speech, Signal Process., 1, 348-351.[58] M. Rudelson and R. Vershynin [2007], On sparse reconstruction fromFourier and Gaussian measurements. Communications on Pure and Ap-plied Mathematics, 61(8), 1025-1045.[59] M. Fazel, E.J. Candès, B. Recht and P.A. Parrilo [2008], Compressedsensing and robust recovery of low rank matrices. Signals, Systems andComputers, 2008 42nd Asilomar Conference on, 1043-1047.[60] M.A. Herman and T. Strohmer [2009]. High-resolution radar via com-pressed sensing. IEEE Transactions on Signal Processing, 57(6), 2275-2284.[61] M.W. Maciejewski, H.Z. Qui, M. Mobli and J.C. Hoch [2009], Nonuni-form Sampling and Spectral Aliasing. Journal of Magnetic Resonance,199(1), 88-93.[62] M. Pippig and D. Potts [2013], Parallel three-dimensional non-equispaced fast Fourier transforms and their applications to particlesimulation. SIAM Journal of Scientific Computing, 35(4), C411-C437.[63] M. Jia, C. Wang, K. Ting Chen and T. Baba [2013], An Non-uniformSampling Strategy for Physiological Signals Component Analysis. Di-gest of Technical Papers - IEEE International Conference on ConsumerElectronics, 526-529.[64] M. Kabanava and H. Rauhut [2015]. Cosparsity in compressed sensing.Boche H., Calderbank R., Kutyniok G., Vybíral J. (eds) CompressedSensing and its Applications. Applied and Numerical Harmonic Anal-ysis. Birkhäuser.118[65] M. Kabanava, H. Rauhut and U. Terstiege [2015]. Analysis of low rankmatrix recovery via Mendelson's small ball method. 2015 InternationalConference on Sampling Theory and Applications (SampTA).[66] M. Kabanava, R. Kueng, H. Rauhut and U. Terstiege [2016]. Stablelow-rank matrix recovery via null space properties. Information andInference: A Journal of the IMA, 5(4), 405-441.[67] M. Hajar, M. El Badaoui, A. Raad and F. Bonnardot [2019], DiscreteRandom Sampling: Theory and Practice in Machine Monitoring. Me-chanical Systems and Signal Processing, 123, 386-402.[68] O. López, R. Kumar, Ö. Ylmaz and F.J. Herrmann [2016], Off-the-Grid Low-Rank Matrix Recovery and Seismic Data Reconstruction.IEEE Journal of Selected Topics in Signal Processing, 10(4).[69] P.L. Butzer and R.L. Stens [1992], Sampling Theory for Not Necessarilybandlimited Functions: A Historical Overview, SIAM Rev., 34(1), 40-53.[70] P.W. Cary [1997], 3D Stacking of Irregularly Sampled Data by Wave-field Reconstruction. SEG Technical Program Expanded Abstracts.[71] P.S. Penev and L.G. Iordanov [2001], Optimal Estimation of SubbandSpeech from Nonuniform Non-recurrent Signal-driven Sparse Samples.IEEE International Conference on Acoustics, Speech and Signal Pro-cessing Proceedings.[72] P. Christensen, A. Kensler and C. Kilpatrick [2018], Progressive Multi-Jittered Sample Sequences. Eurographics Symposium on Rendering,37(4).[73] R. Paley and N. Wiener [1934], Fourier Transforms in the ComplexDomain. Amer. Math. Soc. Colloq. Publs., 19.[74] R. L. Cook, T. Porter and L. Carpenter [1984], Distributed Ray Trac-ing. ACM Siggraph 84 Conference Proceedings, 18(4), 165-174.[75] R. Kronauer and Y.Y. Zeevi [1985], Reorganization and Diversifica-tion of Signals in Vision. IEEE Transactions on Systems, Man, andCybernetics, 15(1), 91-101.119[76] R.L. Cook [1986], Stochastic Sampling in Computer Graphics. ACMTransactions on Graphics, 6(1).[77] R. Motwani and P. Raghavan [1995], Randomized Algorithms. Cam-bridge University Press.[78] R.D. Wisecup [1998], Unambiguous Signal Recovery Above the NyquistUsing Random-sample-interval Imaging. Geophysics, 63(2), 331-789.[79] R. Baraniuk, H. Choi, F. Fernandes, B. Hendricks, R. Nee-lamani, V. Ribeiro, J. Romberg, R. Gopinath, H. Guo, M.Lang, J.E. Odegard and D. Wei [2001], Rice Wavelet Toolbox.[Online]. https://www.ece.rice.edu/dsp/software/rwt.shtml. [Accessed:13- June- 2019].[80] R. Venkataramani and Y. Bresler [2001], Optimal Sub-Nyquist Nonuni-form Sampling and Reconstruction for Multiband Signals. IEEE Trans-actions on Signal Processing, 48(10), 2301 - 2313.[81] R. Young [2001], An Introduction to Non-Harmonic Fourier Series, El-sevier.[82] R. Kumar, O. López, E. Esser and F.J. Herrmann [2015], Matrix com-pletion on unstructured grids : 2-D seismic data regularization andinterpolation. EAGE Annual Conference Proceedings.[83] R. Kueng and P. Jung [2017]. Robust Nonnegative Sparse Recoveryand the Nullspace Property of 0/1 Measurements. IEEE Transactionson Information Theory, 64(2), 689-703.[84] S. Maymon and A.V. Oppenheim [2011], Sinc Interpolation of Nonuni-form Samples. IEEE Transactions on Signal Processing, 59(10), 4745- 4758.[85] S. Foucart and H. Rauhut [2013], A Mathematical Introduction to Com-pressive Sensing. Birkhäuser.[86] T. Strohmer [2000], Numerical Analysis of the Non-Uniform SamplingProblem. Journal of Computational and Applied Mathematics, 122,297-316.120[87] T. Klein and E. Rio [2005], Concentration around the mean for maximaof empirical processes, Annals of Probability, 33(3), 10601077.[88] T.T.Y. Lin and F.J, Herrmann [2009], Designing simultaneous acquisi-tions with compressive sensing. EAGE Annual Conference Proceedings.[89] T.T. Cai and A. Zhang [2014], Sparse Representation of a Polytope andRecovery of Sparse Signals and Low-Rank Matrices. IEEE Transactionson Information Theory, 60(1), 122-132.[90] T. Wu, S. Dey, and Mike Shuo-Wei Chen [2016], A Nonuniform Sam-pling ADC Architecture With Reconfigurable Digital Anti-Aliasing Fil-ter. IEEE Journal of Selected Topics in Signal Processing, 63(10), 1639- 1651.[91] V.A. Kotel'nikov [1933], On the Transmission Capacity of ether andwire in electrocommunications, (material for the first all-union confer-ence on questions of communication) Izd. Red. Upr. Svyazi RKKA.[92] W.L. Ferrar [1928], On the Consistency of Cardinal Function Interpo-lation, Proc. Roy. Soc. Edinburgh, 47, 230-242.[93] Y.Y. Zeevi and E. Shlomot [1993], Nonuniform Sampling and Antialias-ing in Image Representation. IEEE Transactions on Signal Processing,41(3), 1223-1236.[94] Y. Liu [2011], Universal Low-Rank Matrix Recovery from Pauli Mea-surements. NIPS'11 Proceedings of the 24th International Conferenceon Neural Information Processing Systems, 1638-1646.121


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items