{"Affiliation":[{"label":"Affiliation","value":"Science, Faculty of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."},{"label":"Affiliation","value":"Mathematics, Department of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."}],"AggregatedSourceRepository":[{"label":"AggregatedSourceRepository","value":"DSpace","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","classmap":"ore:Aggregation","property":"edm:dataProvider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","explain":"A Europeana Data Model Property; The name or identifier of the organization who contributes data indirectly to an aggregation service (e.g. 
Europeana)"}],"Campus":[{"label":"Campus","value":"UBCV","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","classmap":"oc:ThesisDescription","property":"oc:degreeCampus"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the name of the campus from which the graduate completed their degree."}],"Creator":[{"label":"Creator","value":"L\u00f3pez, Oscar Fabian","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/creator","classmap":"dpla:SourceResource","property":"dcterms:creator"},"iri":"http:\/\/purl.org\/dc\/terms\/creator","explain":"A Dublin Core Terms Property; An entity primarily responsible for making the resource.; Examples of a Contributor include a person, an organization, or a service."}],"DateAvailable":[{"label":"DateAvailable","value":"2019-08-30T16:05:20Z","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"edm:WebResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"DateIssued":[{"label":"DateIssued","value":"2019","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"oc:SourceResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"Degree":[{"label":"Degree","value":"Doctor of Philosophy - PhD","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","classmap":"vivo:ThesisDegree","property":"vivo:relatedDegree"},"iri":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","explain":"VIVO-ISF Ontology V1.6 Property; The thesis degree; Extended Property specified by UBC, as per 
https:\/\/wiki.duraspace.org\/display\/VIVO\/Ontology+Editor%27s+Guide"}],"DegreeGrantor":[{"label":"DegreeGrantor","value":"University of British Columbia","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","classmap":"oc:ThesisDescription","property":"oc:degreeGrantor"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the institution where the thesis was granted."}],"Description":[{"label":"Description","value":"Many empirical studies suggest that samples of continuous-time\r\nsignals taken at locations randomly deviated from an equispaced grid can\r\nbenefit signal acquisition (e.g., undersampling and anti-aliasing).\r\nHowever, rigorous statements of such advantages and the respective\r\nconditions are scarce in the literature. This thesis provides some theoretical insight into\r\nthis topic when the deviations are known and generated i.i.d. from a variety of distributions. \r\n\r\nBy assuming the signal of interest is s-compressible (i.e., can be expanded by s\r\ncoefficients in some basis), we show that O(s polylog(N)) samples randomly deviated from an equispaced grid are sufficient to recover the\r\nN\/2-bandlimited approximation of the signal. For sparse signals (i.e.,\r\ns \u226a N), this sampling complexity is a great reduction in comparison to\r\nequispaced sampling where O(N) measurements are needed for the\r\nsame quality of reconstruction (Nyquist-Shannon sampling theorem). The methodology consists of incorporating an interpolation kernel into the basis pursuit problem. The main result shows that this technique is robust with respect to measurement noise and stable when the signal of interest is not strictly sparse. \r\n\r\nAnalogous results are provided for signals that can be represented as an N \u00d7 N matrix with rank r. 
In this context, we show that O(rN polylog(N)) random nonuniform samples provide robust recovery of the N\/2-bandlimited approximation of the signal via the nuclear norm minimization problem. This result has novel implications for the noisy matrix completion problem by improving known error bounds and providing the first result that is stable when the data matrix is not strictly low-rank.","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/description","classmap":"dpla:SourceResource","property":"dcterms:description"},"iri":"http:\/\/purl.org\/dc\/terms\/description","explain":"A Dublin Core Terms Property; An account of the resource.; Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource."}],"DigitalResourceOriginalRecord":[{"label":"DigitalResourceOriginalRecord","value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/71538?expand=metadata","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","classmap":"ore:Aggregation","property":"edm:aggregatedCHO"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","explain":"A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. 
This could be a full linked open data URI or an internal identifier"}],"FullText":[{"label":"FullText","value":"EMBRACING NONUNIFORM SAMPLES by Oscar Fabian L\u00f3pez. A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Mathematics), THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver), August 2019. \u00a9 Oscar Fabian L\u00f3pez, 2019. The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis\/dissertation entitled: EMBRACING NONUNIFORM SAMPLES, submitted by OSCAR FABIAN L\u00d3PEZ in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics. Examining Committee: \u00d6zg\u00fcr Y\u0131lmaz, Faculty of Science, Supervisor; Felix Herrmann, Faculty of Science, Supervisor; Yaniv Plan, Faculty of Science, Supervisory Committee Member; Brian Wetton, Faculty of Science, University Examiner; Michael Bostock, Faculty of Science, University Examiner; Michael Wakin, Colorado School of Mines, External Examiner. Abstract: Many empirical studies suggest that samples of continuous-time signals taken at locations randomly deviated from an equispaced grid can benefit signal acquisition (e.g., undersampling and anti-aliasing). However, rigorous statements of such advantages and the respective conditions are scarce in the literature. This thesis provides some theoretical insight into this topic when the deviations are known and generated i.i.d. from a variety of distributions. By assuming the signal of interest is s-compressible (i.e., can be expanded by s coefficients in some basis), we show that O(s polylog(N)) samples randomly deviated from an equispaced grid are sufficient to recover the N\/2-bandlimited approximation of the signal. 
For sparse signals (i.e., s \u226a N), this sampling complexity is a great reduction in comparison to equispaced sampling, where O(N) measurements are needed for the same quality of reconstruction (Nyquist-Shannon sampling theorem). The methodology consists of incorporating an interpolation kernel into the basis pursuit problem. The main result shows that this technique is robust with respect to measurement noise and stable when the signal of interest is not strictly sparse. Analogous results are provided for signals that can be represented as an N \u00d7 N matrix with rank r. In this context, we show that O(rN polylog(N)) random nonuniform samples provide robust recovery of the N\/2-bandlimited approximation of the signal via the nuclear norm minimization problem. This result has novel implications for the noisy matrix completion problem by improving known error bounds and providing the first result that is stable when the data matrix is not strictly low-rank. Lay Summary: In signal processing, sampling theory deals with the acquisition of continuous data via a discrete set of samples (e.g., images, audio, video and meteorological signals). The classical results in this field give a complete understanding of the number of equispaced (or uniform) samples needed to capture a signal according to its oscillatory behavior (the largest frequency component). However, in many applications equispaced samples are difficult to honor due to natural factors and equipment malfunction that deviate the sampling locations from a desired equispaced grid. This makes non-equispaced (or nonuniform) sampling an inherent theme in signal processing. While nonuniform samples are typically seen as a nuisance in practice, many works in the literature argue experimentally that such deviated measurements can reduce the number of required samples for signal acquisition (in contrast to the uniform case). 
Although several authors have provided a theoretical treatment of such claims, the topic remains largely incomplete and vague for practical purposes. This thesis extends the comprehension of this phenomenon by providing methodology, sampling complexity and signal recovery error terms. The main result proves that when the deviations are random, one can exploit nonuniform samples to capture a signal with fewer samples than are required by equispaced sampling. Preface: This thesis consists of my original research, conducted at the University of British Columbia under the supervision of Dr. \u00d6zg\u00fcr Y\u0131lmaz and Dr. Felix Herrmann (now at the Georgia Institute of Technology). The following chapters are composed of unpublished but soon-to-be-submitted work for which I will be the principal author. However, some of the ideas and writing scattered throughout this thesis are taken from [68] (and cited accordingly), for which I was the principal author. Table of Contents: Abstract (iii); Lay Summary (iv); Preface (v); Table of Contents (vi); List of Figures (ix); Acknowledgements (xii); Dedication (xiii); 1 Introduction (1); 1.1 Sampling Theory (1); 1.2 Nonuniform Sampling (3); 1.2.1 Benefits of Nonuniform Sampling (4); 1.2.2 An Open Problem (5); 1.3 Compressive Sensing (6); 1.4 Low-Rank Matrix Recovery (7); 1.5 Overview of the Main Results (9); 1.6 Organization and Notation (10); 2 Compressive Off-the-Grid Sampling (12); 2.1 Introduction (12); 2.1.1 The Anti-aliasing Nature of Nonuniform Samples (12); 2.2 Notation, Assumptions and Methodology (15); 2.2.1 Signal Model (15); 2.2.2 Deviation Model (17); 2.2.3 Dirichlet Kernel (17); 2.3 Main Result (19); 2.3.1 Simplified Result (20); 2.3.2 Full Result (21); 2.4 Discussion (23); 2.4.1 Deviation Model (23); 2.4.2 Signal Model (25); 2.4.3 Novelty of the Results (27); 2.5 Proof of Main Result (28); 2.5.1 Restricted Isometry Property of S\u03a8 (29); 2.5.2 Proof of Theorem 2.3.3 (37); 2.6 Interpolation Error of Dirichlet Kernel: Proof (39); 2.7 Proof of Lemma 2.5.2 (44); 2.8 DFT-incoherence of Discretized Smooth Functions (47); 2.9 Numerical Experiments (49); 2.9.1 Effect of DFT-incoherence (51); 2.9.2 Effect of Deviation Model Parameter \u03b8 (51); 2.9.3 Noise Attenuation (53); 3 Off-the-Grid Sampling of Low-Rank Matrices (55); 3.1 Introduction (55); 3.2 Notation, Assumptions and Methodology (55); 3.2.1 Signal Model (56); 3.2.2 2D Deviation Model (58); 3.2.3 2D Dirichlet Kernel (58); 3.3 Main Result (60); 3.3.1 Implications for Matrix Completion: Stability and Robustness (61); 3.4 Discussion (63); 3.4.1 r-incoherence Condition (63); 3.4.2 2D Deviation Model (67); 3.5 Proof of Main Result (68); 3.5.1 Dual Certificate and Required Lemmas (69); 3.5.2 Proof of Theorem 3.3.1 (71); 3.6 Proof of Dual Certificate Recovery and Required Lemmas (73); 3.6.1 Proof of Lemma 3.5.1 (73); 3.6.2 Proof of Lemma 3.5.2 (75); 3.6.3 Proof of Lemma 3.5.3 (83); 3.6.4 Proof of Lemma 3.5.4 (86); 3.7 Interpolation Error of 2D Dirichlet Kernel: Proof (90); 3.8 Proof of Theorem 3.3.2: Stable and Robust Matrix Completion (94); 3.9 Uniform Matrix Bernstein Inequality (98); 3.10 Proof of Additional Lemmas (105); 4 Conclusion (110); Bibliography (113). List of Figures: 2.1 Illustration of alias error for uniform and nonuniform samples. Credit is given to [25]. (Top) alias error caused by equispaced samples. Notice that all three curves pass through the sample points. (Bottom) alias-free sampling due to nonuniform samples. Notice that only the black curve passes through all four points. (13); 2.2 Different undersampling schemes and their imprint in the Fourier domain for a signal that is the superposition of three cosine functions. Credit for this image is given to [35]. (Top) signal uniformly sampled above the Nyquist rate and its respective spectrum on the left. (Middle) same signal randomly nonuniformly three-fold undersampled according to a discrete uniform distribution. (Bottom) same signal uniformly three-fold undersampled. Notice that only in the case of nonuniform undersampling can the significant coefficients be detected by applying a threshold. (14); 2.3 Illustrations of example perturbations generated by our deviation model. In the top two examples, the red areas indicate the allowed positions of the deviations from the grid point k\/m. (Top) deviations pertaining to example 1) with \u00b5 = 0 and p = 1; the samples lie on the interval [(2k\u22121)\/(2m), (2k+1)\/(2m)]. (Middle) deviations pertaining to example 2) with \u00b5 = 0 and p = 1; the samples lie on a discrete subgrid centered at k\/m. (Bottom) deviations pertaining to example 3) with \u00b5 = 0, where the bell curve indicates the Gaussian pdf of these centered deviations. Notice that these samples may lie outside of \u2126, but recall that we are sampling on the torus (see Section 2.2.2). (24); 2.4 Plot of average relative reconstruction error vs average nonuniform step size for both signal models. In the complex exponential model (\u03a8 = DFT) we have \u03b3 = 1 and in the Gaussian signal model we have \u03b3 \u2248 40.78 (Daubechies 2 wavelet). Notice that the complex exponential model allows for reconstruction from larger step sizes in comparison to the Gaussian signal model. (52); 2.5 (Left) plot of average relative reconstruction error vs \u03c1 parameter and (right) plot of corresponding \u03b8 parameters vs \u03c1 parameter. The plot on the right includes the constant value \u03b8 = 1\/\u221a2 required to apply Theorem 2.3.3 (the red line). Notice that although our result only holds for three values of \u03c1 (.5, .49, .48), the plot on the left demonstrates that accurate recovery is still possible otherwise. (53); 2.6 Plot of average relative reconstruction error (\u2016f \u2212 \u03a8g\u266f\u20162\/\u2016f\u20162) vs average nonuniform step size (blue curve) and average input relative measurement error (\u2016d\u20162\/\u2016f\u20162) vs average nonuniform step size (red curve). Notice that for the first four step size values (2, 2.25, 2.5, 2.75) noise attenuation is achieved, i.e., the reconstruction error is lower than the input noise level. (54); 3.1 Illustration of two 500 \u00d7 500 data matrices (from [68]) of rank 5 with distinct 5-incoherence parameters (\u03b3). (Left) low-rank data matrix with inappropriate 5-incoherence structure \u03b3 \u223c O(\u221aN). (Right) low-rank data matrix with appropriate 5-incoherence structure \u03b3 \u223c O(1). Both data matrices are discretizations of functions of the form (3.13) with the same center locations and parameter c = 1000 for the left data matrix and c = 20 for the right data matrix. (65); 3.2 Illustration of two 500 \u00d7 500 data matrices of rank 6 with distinct 6-incoherence parameters (\u03b3) but the same aliasing energy (\u2211_{|\u2113|>N\/2} |c_\u2113|). (Left) low-rank data matrix with inappropriate 6-incoherence structure \u03b3 \u223c O(\u221aN). (Right) low-rank data matrix with appropriate 6-incoherence structure \u03b3 \u223c O(1). Both data matrices are discretizations of functions of the form (3.14), modified respectively from Figure 3.1, where the left image has no additional aliasing energy introduced (i.e., \u03c9 < 250 and d \u226a 1) and the image on the right has been generated with additional aliasing error (\u03c9 > 250 and d \u223c 1) to match that of the image on the left (these additional aliasing artifacts are subtle due to their high frequency nature). (66); 3.3 Illustration of the 2D sampling scenario in \u2126. (Left) dense equispaced samples. (Right) less dense nonuniform samples (on average by a factor of \u2248 1\/2) which can be generated by example 1) or 2) in this section. Both sampling schemes provide the same quality of reconstruction, but random off-the-grid samples require fewer measurements according to our results. (68). Acknowledgements: I want to thank my supervisors Dr. \u00d6zg\u00fcr Y\u0131lmaz and Dr. Felix Herrmann. I would like to give credit to Dr. Herrmann for proposing the initial methodology (inspired by [9]) that led to the work in [68, 82] and eventually became the core approach of this thesis. Dr. Y\u0131lmaz identified the potential avenue this methodology offered to solve the jittered sampling problem, i.e., the general problem considered in this thesis. He encouraged me to adapt and analyze the approach, and this eventually became my doctoral work. Aside from this, both of them have always inspired me to grow as an independent researcher and person. Dedication: For Luisi, Maribel, Rosi and Jorge. Chapter 1: Introduction. Our understanding of nature's many complex processes is inevitably achieved via a discrete set of observations. Restricted by our own mosaic perception, we endeavor to take such samples in a periodic, patterned and predictable fashion. 
However, by nature itself this task is impossible (no matter how accurate we are), forcing us to view the outside world in a nonuniform and chaotic manner. Should we then continue our adversarial pursuit of knowledge? Or can we instead benefit by adapting to nature's stochastic ways? Can we embrace the nonuniform samples? This thesis argues in the affirmative. The main argument is that, in comparison to classical sampling theory (equispaced sampling), random nonuniform samples can capture a signal of interest with fewer samples (while simultaneously attenuating the aliasing error). Such benefits have been known since the 1960s, though without a broad analytical understanding. This thesis extends the comprehension of this phenomenon by providing methodology, sampling complexity and signal recovery error bounds. The main novelty is due to recent advances in random matrix theory, compressive sensing and low-rank matrix recovery. 1.1 Sampling Theory: The Nyquist-Shannon sampling theorem is perhaps the most influential result in the theory of signal processing, fundamentally shaping the practice of acquiring and processing data [8, 37] (also attributed to Kotel'nikov [91], Ferrar [92], Cauchy [1], Ogura [50], and E.T. and J.M. Whittaker [17, 44]). In this setting, typical acquisition of a continuous-time signal involves taking equispaced samples at a rate slightly higher than a prescribed frequency \u03c9 Hz in order to obtain a bandlimited approximation via a quickly decaying kernel. Such techniques provide accurate approximations of noisy signals whose spectral energy is largely contained in the band [\u2212\u03c9\/2, \u03c9\/2] [28]. To be precise, let f(x) be a function whose Fourier transform vanishes outside of [\u2212\u03c9\/2, \u03c9\/2] (i.e., belongs to a Paley-Wiener space [36]); then Shannon and Kotel'nikov (independently) provided the expansion f(x) = \u2211_{k=\u2212\u221e}^{\u221e} f(k\/\u03c9) \u00b7 sin(\u03c0(\u03c9x \u2212 k))\/(\u03c0(\u03c9x \u2212 k)) =: S_\u03c9 f(x). (1.1) Therefore, one can completely capture a continuous bandlimited signal via a discrete sequence of its equispaced (or uniform) samples with step size 1\/\u03c9. Such initial results were later generalized to non-bandlimited functions [69], where for f \u2208 L2(R) \u2229 C(R) with Fourier transform f\u0302 \u2208 L1(R) one obtains the error bound |f(x) \u2212 S_\u03c9 f(x)| \u2264 \u221a(2\/\u03c0) \u222b_{|v|\u2265\u03c9\/2} |f\u0302(v)| dv. (1.2) This is also known as the aliasing error, a standard (and inevitable) error term absorbed in applications. In view of (1.2), oversampling is a common technique to attenuate the aliasing error (i.e., anti-aliasing) by increasing the frequency \u03c9 of the samples. Furthermore, in practice only a finite number of samples may be taken and practitioners must adopt the truncated approximation f(x) \u2248 \u2211_{k=\u2212N}^{N} f(k\/\u03c9) \u00b7 sin(\u03c0(\u03c9x \u2212 k))\/(\u03c0(\u03c9x \u2212 k)), for some N large enough. This introduces an additional error term of order 1\/\u221aN (see Theorem 3.20 in [2] or [10, 28]), which may be unacceptable in many applications. Furthermore, such truncation approaches can lead to stability issues since they may produce ill-posed problems (see [86]). To remedy this, many works have proposed instead to use trigonometric polynomials as a stable finite-dimensional model ([86, 19] and Section 6.2 in [28]). In essence, given samples in [\u22121\/2, 1\/2), this approach adopts the periodic extension of f(x)|_{[\u22121\/2,1\/2)} to the whole line and approximates it as a periodic bandlimited function (i.e., a trigonometric polynomial). 1.2 Nonuniform Sampling: As a consequence of the results in the previous section, industrial signal acquisition and post-processing methods tend to be designed to incorporate uniform sampling. However, such sampling schemes are difficult to honor in practice due to physical constraints and natural factors that perturb the sampling locations from the uniform grid, i.e., off-the-grid samples. 
In response, nonuniform analogs of the sampling theorem have been developed, where an average sampling density proportional to the highest frequency \u03c9 of the signal guarantees accurate interpolation, e.g., Landau density [39, 28]. In this context, given {t_p}_{p\u2208Z} a complete interpolating sequence for a Paley-Wiener space ([81] Section 4.5; or, for a Bernstein space, see [36] Section 7.3.2) and a 1\/2-bandlimited function f(x) (i.e., whose Fourier transform vanishes outside [\u22121\/2, 1\/2]), we may expand f(x) = \u2211_{p=\u2212\u221e}^{\u221e} f(t_p) G(x)\/(G\u2032(t_p)(x \u2212 t_p)), (1.3) where G(x) = (x \u2212 t_0) \u220f_{p=1}^{\u221e} (1 \u2212 x\/t_p)(1 \u2212 x\/t_{\u2212p}) is an entire function (see [81] Section 4.1 equation (3) or [2] equation 3.1.3). Therefore (1.3) can be viewed as the infinite version of the Lagrange interpolation formula (since (1.3) now involves infinitely many terms). Furthermore, when t_p = p then (1.3) reduces to (1.1) with \u03c9 = 1, so this interpolation formula can also be seen as a generalization of the equispaced version. One sufficient condition for a set of nonuniform samples {t_p}_{p\u2208Z} to be a complete interpolating sequence (i.e., for (1.3) to hold) is given by Kadec's 1\/4-theorem [55, 28, 81] (initially proven in weaker form by Paley and Wiener [73]), which requires sup_{p\u2208Z} |t_p \u2212 p| < 1\/4. In the case the t_p are randomly nonuniform, the restriction |t_p \u2212 p| < 1\/4 can be removed. In [32] it was shown that if t_p = p + \u2206_p, where {\u2206_p}_{p\u2208Z} are independent, centered random variables with uniformly bounded second moments, then {t_p}_{p\u2208Z} is a complete interpolating sequence almost surely. This allows the case |\u2206_p| \u2265 1\/4, but on average the sampling density agrees with Kadec's, Landau's and Nyquist's rates. Such results allow for practical signal acquisition, as long as the practitioner keeps accurate track of the nonuniform sampling locations. 
As in the previous section, standard implementations approximate the function of interest by absorbing the error of truncating (1.3) and the aliasing error (i.e., the sampling rate provides a bandlimited approximation). Although nonuniform sampling theory is mainly on par with the classical uniform counterpart, off-the-grid samples are typically unwanted and regarded as a burden. This is in large part due to the extra computational cost involved in regularization, i.e., interpolating the nonuniform samples onto a desired equispaced grid. Regularization is a common procedure in practice, required for many post-processing techniques designed according to classical sampling theory (involving uniform samples). 1.2.1 Benefits of Nonuniform Sampling: In contrast to the mindset of the previous section, many works in the literature have considered the potential benefits of deliberate nonuniform sampling [38, 26, 27, 76, 80, 67, 63, 61, 90, 51, 5, 93, 12, 13, 84, 34, 72, 33, 57, 18, 71, 78, 70, 52, 47, 15, 40, 35]. Suppression of aliasing error, i.e., anti-aliasing, is a well-known advantage of randomly perturbed samples. For example, jittered sampling is a common technique for anti-aliasing that also provides a well-distributed set of samples [11, 74, 14, 72]. To the best of the author's knowledge, this phenomenon seems to have been first noticed by Harold S. Shapiro and Richard A. Silverman [38] (also by Frederick J. Beutler [26, 27] and implicitly by Henry J. Landau [39]) and remained unused in applications until rediscovered at Pixar Animation Studios by Robert L. Cook [76]. In the context of computer graphics, nonuniform samples produce aliasing artifacts in images that are less noticeable to the human eye (see also [12, 13, 72]). 
Interestingly, there are indications that the human visual system exploits nonuniform photoreceptor structures to process high frequency images [75, 93]. Intrinsically, these observations also suggest that a bandlimited function may be accurately captured in an undersampled sense (in comparison to the Nyquist rate) via nonuniform samples. Indeed, by lessening the effect of aliasing we are reducing the right-hand side of (1.2). Intuitively, this corresponds to a diminished error of the form \u223c \u221a(2\/\u03c0) \u222b_{|v|\u2265\u03c9\u0303\/2} |f\u0302(v)| dv, (1.4) for some \u03c9\u0303 > \u03c9. In effect, it seems we are able to capture higher frequency content of the signal, and potentially the full signal if \u03c9\u0303 exceeds the bandwidth of f(x) (this is experimentally argued in [33]). Thus, nonuniform sampling provides an interesting alternative to oversampling for anti-aliasing. If fully understood, a practitioner could attenuate aliasing artifacts without the need for additional samples, and in fact may be able to do so with a reduced sampling complexity. Furthermore, this may be achieved in a natural manner since deviated samples are inevitable in many situations. To the best of the author's knowledge, such observations and the majority of the citations mentioned here are largely experimental and in need of further analytical understanding. The exceptions in this literature review are [38, 26, 27, 39]. In [38], Shapiro and Silverman show that Poisson random samples can provide alias-free representation of stationary random processes whose spectrum is absolutely continuous and has an L2-integrable derivative. The average sampling rate of this result agrees with the Nyquist rate, but provides an explicit benefit of nonuniform samples that cannot be achieved by equispaced samples (since such stationary random processes need not be bandlimited). This result was generalized by Frederick J. Beutler in [27] by extending the result to stationary random processes with any spectrum and only requiring Poisson sampling on the half-axis. Beutler provides similar alias-free results for random jitter sampling at the Nyquist rate, and arbitrarily below the Nyquist rate for nonuniform but on-the-grid samples (i.e., randomly expunging samples from a dense equispaced grid). The results in [26, 39] are more aligned with sampling theory, where in [26] it was shown that for generalized bandlimited signals (representable by a Fourier-Stieltjes integral on a bounded interval) nonuniform samples can provide error-free expansions with average sampling rates below Nyquist. A similar conclusion can be interpreted from the work of Landau [39], though he does not state it explicitly or provide examples. 1.2.2 An Open Problem: In the spirit of the previous section, we are left with a vague but interesting problem: state concrete methods and conditions for signal acquisition via nonuniform samples that provide explicit benefits in contrast to uniform samples. Specifically, we wish to answer the following questions simultaneously: \u2022 How many nonuniform samples are required? At what average sampling density? \u2022 What kind of grid deviations are permissible? What structure on the nonuniform samples? \u2022 What class of signals can we recover? \u2022 What numerical method can recover the signal in a stable manner? \u2022 How large can the reconstruction error be? Is perfect acquisition possible? This thesis provides novel theoretical insight into each of these questions. Note that, from the discussion in the previous section, all of these questions have been investigated experimentally but only a few have been considered in a rigorous (but arguably incomplete) manner [38, 26, 27, 39]. With this in mind, this thesis endeavors to provide analytical treatment of these questions with answers that are informative for the practice and design of nonuniform sampling techniques. 
However, this is an extensive topic that seems to be lacking a tangible theoretical framework. As a consequence, many potential avenues of research remain, with subsequent questions. In this sense, this dissertation can be seen as the initial step towards a more rigid theory of this phenomenon.

The contributions of this work are largely due to the relatively new fields of compressive sensing and low-rank matrix recovery. The numerical and analytical methods of these areas of study are important for the remainder of the thesis.

1.3 Compressive Sensing

Compressive sensing (CS) aims to simultaneously acquire and compress a signal that is sparse or compressible with respect to a given basis or frame. Suppose a discrete signal $f \in \mathbb{C}^N$ is $s$-sparse with respect to a basis $\Psi \in \mathbb{C}^{N\times N}$, i.e., $f = \Psi g$ for some vector $g \in \mathbb{C}^N$ with only $s \le N$ non-zero entries ($g$ is said to be $s$-sparse). In CS, one collects $m \ll N$ linear measurements of $f$ given by

$$b = Af + d \in \mathbb{C}^m.$$

Here, $A \in \mathbb{C}^{m\times N}$ is the measurement matrix that models the sampling protocol, and $d \in \mathbb{C}^m$ represents measurement noise with $\|d\|_2 \le \eta$. In this field, perhaps the most popular method to obtain an approximation to $f$ is by means of the basis pursuit problem. More precisely, the basis pursuit problem approximates $f \approx \Psi g^\sharp$, where $g^\sharp$ is given by

$$g^\sharp = \arg\min_{h\in\mathbb{C}^N} \|h\|_1 \quad \text{s.t.} \quad \|A\Psi h - b\|_2 \le \eta. \qquad (1.5)$$

Here $\|h\|_1 := \sum_{k=1}^N |h_k|$ is the $\ell_1$-norm, a standard penalty function for sparse vector recovery [43, 85, 36].

An appropriate measurement matrix $A$ and sparsifying transform $\Psi$ are crucial for the success of (1.5). Standard results show that if $f \approx \Psi g$ for some $s$-sparse vector $g$, then $f$ can be robustly and stably approximated as long as $A\Psi$ is a "suitable" measurement matrix and has $m = O(s\,\mathrm{polylog}(N))$ rows (i.e., measurements).
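As an aside, the noiseless case of basis pursuit (1.5) (with $\eta = 0$ and $\Psi = I$, so that $f = g$) can be sketched as a linear program via the standard positive/negative split of the variable. The instance below (a Gaussian measurement matrix, the dimensions, and the random seed) is a hypothetical illustration, not the setting of this thesis:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, m, s = 40, 20, 3  # ambient dimension, measurements, sparsity (illustrative)

A = rng.standard_normal((m, N)) / np.sqrt(m)  # centered subgaussian entries
g = np.zeros(N)
g[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
b = A @ g  # noiseless measurements (eta = 0)

# min ||h||_1  s.t.  A h = b, rewritten as an LP with h = u - v, u, v >= 0;
# the objective sum(u) + sum(v) equals ||h||_1 at the optimum.
res = linprog(c=np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * N))
h = res.x[:N] - res.x[N:]
```

With $m$ on the order of $s\log(N/s)$ Gaussian measurements, $h$ typically coincides with $g$; the complex, noisy version ($\eta > 0$) is a second-order cone program rather than an LP.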
In this case the approximation error $\|f - \Psi g^\sharp\|_2$ is proportional to $\eta$ (robustness) and to the sparsity model mismatch, i.e., the error of the best $s$-sparse approximation (stability) [85]. Typical assumptions imposed on $A\Psi$ for successful recovery include the null space property (NSP) [83] and the restricted isometry property (RIP, see Section 2.5).

Therefore, one can manage to solve an underdetermined linear system of equations (since $m < N$) for sparse vector recovery. These approaches are valuable in industrial applications, since signals may be acquired and processed in a frugal and compressed manner. Standard measurement matrices known to be successful for sparse vector recovery (e.g., satisfying the NSP or RIP) typically involve some degree of randomness. Matrices $A$ whose entries are independent and centered subgaussian random variables can be shown to allow for $s$-sparse vector recovery if $m \sim s\log(N/s)$ [58, 49, 83, 85]. Random matrices with more structure, such as random Fourier matrices, also allow for sparse vector recovery from $O(s\,\mathrm{polylog}(N))$ samples [58, 85, 42, 41].

1.4 Low-Rank Matrix Recovery

In the same spirit as CS, the field of low-rank matrix recovery (also known as matrix sensing) has the goal of recovering an $N \times M$ low-rank matrix from linear measurements, and it is particularly of interest when the number of measurements is much smaller than $NM$ [6, 59, 65, 66]. As a popular example, the matrix completion problem of reconstructing a data matrix given only a few observed entries has received increasing attention due to its extensive applications and implementation success [21, 22, 7] (see Section 3.3.1 for an elaborated discussion).
The key assumption here is that the underlying matrix of interest has few nonzero or quickly decaying singular values, i.e., the matrix exhibits low-rank structure.

More precisely, let $D \in \mathbb{C}^{N\times M}$ be a rank $r$ matrix (or a matrix that can be well approximated by a rank $r$ matrix) and suppose that we have the noisy linear measurements

$$B = \mathcal{A}(D) + E \in \mathbb{C}^{n\times m},$$

where $\mathcal{A} : \mathbb{C}^{N\times M} \mapsto \mathbb{C}^{n\times m}$ is a linear map with $nm < NM$ and $E \in \mathbb{C}^{n\times m}$ represents our measurement noise with $\|E\|_F \le \eta$. It is important to notice that in the literature it is typically assumed that $B$, $\mathcal{A}(D)$ and $E$ are vectors. Here we allow for the general structure of $n \times m$ matrices, which best fits our scenario (see Chapter 3.2.1), but we also include the assumptions in the literature by taking $m = 1$ and considering an $n \times 1$ matrix as a vector with $\|E\|_2 = \|E\|_F$.

We compensate for our small number of measurements by assuming that $r \ll \min\{N, M\}$, i.e., we assume that our matrix of interest is of low rank relative to the ambient dimensions (or is well approximated by such a rank $r$ matrix). With this assumption, perhaps the most popular method to approximate $D$ is to obtain a data estimate $D^\sharp$ as the argument output of the optimization procedure

$$\min_{X\in\mathbb{C}^{N\times M}} \|X\|_* \quad \text{subject to} \quad \|\mathcal{A}(X) - B\|_F \le \eta, \qquad (1.6)$$

where $\|X\|_*$ is the nuclear norm of $X$, defined as $\|X\|_* := \sum_{k=1}^{\min\{N,M\}} \sigma_k(X)$, with $\sigma_k(X)$ denoting the $k$-th largest singular value of the matrix.

This methodology has been extensively studied via the concepts of the rank restricted isometry property [6, 59, 85, 94], the robust rank null space property [65, 66] and dual certificates [22, 21]. In this context, standard results show that $m \sim r(N+M)\log(N+M)$ measurements via an appropriate $\mathcal{A}$ (e.g., satisfying one of the properties previously mentioned) provide an accurate approximation of the signal via its best rank $r$ approximation.
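The two objects appearing in (1.6), the nuclear norm and the best rank-$r$ approximation it promotes, can be illustrated directly with an SVD. This is a small synthetic sketch (arbitrary dimensions and seed), not the recovery procedure itself:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, r = 30, 20, 4  # illustrative dimensions and rank

# An exactly rank-r matrix D.
D = rng.standard_normal((N, r)) @ rng.standard_normal((r, M))

# Nuclear norm: the sum of singular values (numpy exposes it as ord='nuc').
sigma = np.linalg.svd(D, compute_uv=False)
nuclear = sigma.sum()

# Best rank-r approximation: truncate the SVD (Eckart-Young); for an
# exactly rank-r matrix the truncation error is zero.
U, s, Vh = np.linalg.svd(D, full_matrices=False)
D_r = (U[:, :r] * s[:r]) @ Vh[:r]
```

For a full-rank data matrix the same truncation gives the "rank $r$ approximation error" that the error bounds discussed below are stated in terms of.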
This sampling complexity is a great reduction in contrast to the $NM$ measurements needed in general to parametrize an $N \times M$ matrix. Standard results provide error bounds that are robust to noise and proportional to the rank $r$ approximation error of the data matrix. However, results using dual certificates typically do not provide stability with respect to the low-rank model [16, 7, 22, 21] (i.e., the results only apply to matrices that are exactly rank $r$).

1.5 Overview of the Main Results

This thesis provides novel theoretical understanding of the benefits of random nonuniform sampling for signal acquisition (as opposed to uniform sampling). In essence, the main results formalize the advantages of nonuniform samples for anti-aliasing and undersampling outlined in Section 1.2.1 and address the questions posed in Section 1.2.2. Adopting the methodology and proof techniques developed in compressive sensing and low-rank matrix recovery, the work develops sampling complexity and recovery error bounds in terms of the bandlimited approximation.

Along the lines of compressive sensing, our results in Chapter 2 hinge on a sparse signal model assumption. With an underlying function $f(x)$ to be sampled in $\Omega$, we assume its discrete counterpart $f \in \mathbb{C}^N$ (sampled uniformly) is compressible with respect to a basis $\Psi \in \mathbb{C}^{N\times N}$ ($f \approx \Psi g$, as outlined in Section 2.3.3). Our methodology consists of solving the $\ell_1$ minimization problem (1.5) with an interpolation kernel $S$ in lieu of $A$ to model the nonuniform observations. We show that under a random deviation model, $O(s\,\mathrm{polylog}(N))$ off-the-grid samples approximate $f(x)$ (for all $x \in \Omega$) robustly, with error proportional to the $s$-sparse approximation error of $g$ and the $N/2$-bandlimited approximation error of $f(x)$, with high probability (w.h.p.).
For highly compressible signals with $s \ll N$, we may therefore recover the bandlimited approximation of the signal of interest with fewer samples than those required in the equispaced case for the same quality of reconstruction ($O(N)$ by the Nyquist rate).

Similarly, when a 2D signal of interest $D(x,y)$ can be discretized (via uniform samples) as an approximately low-rank matrix $D \in \mathbb{C}^{N\times N}$, the theory of low-rank matrix recovery provides analogous results in Chapter 3. In this case we show that under a less general deviation model, $O(rN\,\mathrm{polylog}(N))$ off-the-grid measurements can be incorporated into the nuclear norm minimization problem (1.6) with a 2D interpolation kernel to estimate $D$ via its rank $r$ approximation (w.h.p.). This in turn gives the full signal $D(x,y)$ (for $(x,y)$ in the sampling domain) robustly, with error proportional to the rank $r$ approximation error of $D$ and the $N/2$-bandlimited approximation error of $D(x,y)$.

The proof technique in the 2D context utilizes a dual certificate, making the result rather novel since it provides stability with respect to the low-rank data model. In other words, the analysis concerns full rank matrices and guarantees recovery of a low-rank approximation of $D$, which is uncommon for low-rank matrix recovery proofs that employ dual certificates (e.g., [16, 21, 22, 7]). In particular, the recovery error bounds apply to matrix completion, and thus produce the first results for this popular problem that do not require the data matrix $D$ to be exactly low-rank and that guarantee error proportional to the rank $r$ approximation (with sampling complexity similar to standard results in the matrix completion literature).

1.6 Organization and Notation

The remainder of the thesis is organized as follows: Chapter 2 discusses the results outlined in the previous section under the sparse signal model assumption. This chapter includes an extended discussion of the anti-aliasing nature of nonuniform samples, providing intuition behind this phenomenon.
Chapter 3 proceeds with the analogous results under the low-rank data model. Extra consideration is given to the proof technique in this chapter in order to outline the novelties and implications for other fields (e.g., the matrix completion problem) and future work.

Before proceeding to the next chapter, we find it best to introduce the general notation that will be used throughout the thesis. Each subsequent chapter will introduce additional notation helpful in its specific context.

Notation: We denote complex-valued functions of real variables using bold letters, e.g., $\mathbf{f} : \mathbb{R}^2 \to \mathbb{C}$. For any integer $n \in \mathbb{N}$, $[n]$ denotes the set $\{\ell \in \mathbb{N} : 1 \le \ell \le n\}$. For $k, \ell \in \mathbb{N}$, $b_k$ indicates the $k$-th entry of the vector $b$, $D_{k\ell}$ denotes the $(k,\ell)$ entry of the matrix $D$, and $D_{k*}$ ($D_{*\ell}$) denotes the $k$-th row (resp. $\ell$-th column) of the matrix. We reserve $x, y$ to denote real variables and write the complex exponential as $e(x) := e^{2\pi i x}$, where $i$ is the imaginary unit. For a vector $f \in \mathbb{C}^n$, $\|f\|_1 := \sum_{k=1}^n |f_k|$ is the $\ell_1$ norm, $\|f\|_2 := \left[\sum_{k=1}^n |f_k|^2\right]^{1/2}$ is the Euclidean norm, and $\|f\|_0$ gives the total number of non-zero elements of $f$. For a matrix $X \in \mathbb{C}^{n\times m}$, $\sigma_k(X)$ denotes the $k$-th largest singular value of $X$, $\|X\|_* := \sum_{k=1}^{\min(n,m)} \sigma_k(X)$ is the nuclear norm, $\|X\| := \sigma_1(X)$ is the spectral norm, and $\|X\|_F := \left[\sum_{k=1}^n \sum_{\ell=1}^m |X_{k\ell}|^2\right]^{1/2}$ is the Frobenius norm of $X$. $L^2(\Omega)$ is the Lebesgue space and $H^1(\Omega)$ is the Sobolev space $W^{1,2}(\Omega)$ (with domain $\Omega$). The adjoint of an operator $A$ is denoted by $A^*$.

Chapter 2

Compressive Off-the-Grid Sampling

2.1 Introduction

In this chapter, we produce our results under the sparse signal model discussed in Section 1.5. In Section 2.1.1, we begin with some motivation and intuition for our problem of interest and analysis approach.
Sections 2.2 and 2.3 proceed to lay the foundation necessary to state the methodology and main result. The remaining sections of the chapter are dedicated to an elaborated discussion of the main result and its proof.

2.1.1 The Anti-aliasing Nature of Nonuniform Samples

Though there is little theoretical work explicitly stating the anti-aliasing behavior of nonuniform samples, the intuition behind this phenomenon is not hard to grasp. In this section we provide some illustrations to help the reader comprehend this matter and briefly explain the proof technique used in the thesis.

When trying to capture a signal via equispaced samples, the main issue is that one can encounter higher and lower frequency signals that satisfy the same set of measurements. This is illustrated in Figure 2.1 (credit for these images is given to [25]). In the top image, all three sinusoids (black, blue and yellow) pass through the four uniform samples. In effect, no recovery method should be able to distinguish between these signals in order to capture the black sinusoid of interest. In general, additional equispaced samples are needed to remove this aliasing bias (i.e., oversampling).

On the other hand, the bottom image in Figure 2.1 illustrates the analogous situation for nonuniform samples. Notice that in this case, only the black sinusoid fits all four measurements and can thus be distinguished from the other curves. This simple example provides the intuition behind the anti-aliasing nature of nonuniform samples.

Figure 2.1: Illustration of alias error for uniform and nonuniform samples. Credit is given to [25]. (Top) Alias error caused by equispaced samples; notice that all three curves pass through the sample points. (Bottom) Alias-free sampling due to nonuniform samples; notice that only the black curve passes through all four points.

We provide some more intuition, now in the frequency domain. We consider the effect of uniform and nonuniform undersampling on the spectrum of a continuous function.
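Before moving to the frequency domain, the ambiguity depicted in Figure 2.1 is easy to check numerically: a low-frequency sinusoid and its alias agree exactly on an equispaced grid at rate $f_s$, but disagree once the grid is jittered. The rates, grid size and jitter values below are arbitrary illustrative choices:

```python
import numpy as np

fs, n = 8.0, 8
k = np.arange(n)
t_uniform = k / fs                                   # equispaced samples
jitter = np.array([0.0, 0.03, -0.02, 0.035, -0.03, 0.01, -0.035, 0.02])
t_jittered = t_uniform + jitter                      # deviated samples

low = np.cos(2 * np.pi * 1.0 * t_uniform)            # 1 Hz signal of interest
high = np.cos(2 * np.pi * (1.0 + fs) * t_uniform)    # 9 Hz alias: same samples

low_j = np.cos(2 * np.pi * 1.0 * t_jittered)
high_j = np.cos(2 * np.pi * (1.0 + fs) * t_jittered)  # now distinguishable
```

On the uniform grid the two sampled vectors coincide, so no method can tell the sinusoids apart; on the jittered grid they separate.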
Consider Figure 2.2 (credit for this image is given to [35]). The top image corresponds to a densely sampled signal (above the Nyquist rate) and its respective spectrum (on the right), which has few significant Fourier coefficients (only 6 non-zero components). The middle and bottom images correspond to the same signal nonuniformly and uniformly undersampled, along with the respective spectra. We see that both undersampled cases introduce noise into the spectrum, but only in the case of random nonuniform samples is this noise low in amplitude. This allows for detection of all 6 signal coefficients using nonuniform samples (by applying a threshold). On the other hand, periodic undersampling makes this task rather difficult.

Figure 2.2: Different undersampling schemes and their imprint in the Fourier domain for a signal that is the superposition of three cosine functions. Credit for this image is given to [35]. (Top) Signal uniformly sampled above the Nyquist rate and its respective spectrum on the right. (Middle) The same signal randomly nonuniformly three-fold undersampled according to a discrete uniform distribution. (Bottom) The same signal uniformly three-fold undersampled. Notice that only in the case of nonuniform undersampling can the significant coefficients be detected by applying a threshold.

This last argument (illustrated in Figure 2.2) is important for the work of this thesis. Essentially, this is the approach that will be used to prove the anti-aliasing nature of random nonuniform samples. However, we will generalize this argument to work for coefficients in any basis (rather than just the Fourier domain).

2.2 Notation, Assumptions and Methodology

Before being able to state our methodology and main result, we must introduce important definitions and assumptions. In this section we introduce the signal model, the deviation model and the interpolation kernel used as a measurement matrix, in Sections 2.2.1, 2.2.2 and 2.2.3 respectively.
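As a numerical aside, the spectral behavior described around Figure 2.2 can be reproduced in a few lines: random three-fold undersampling spreads aliased energy into a low, incoherent noise floor above which the true peaks remain thresholdable, while periodic three-fold undersampling produces coherent spectral replicas as strong as the true peaks. The signal, frequencies and seed below are arbitrary illustrative choices:

```python
import numpy as np

N = 240
t = np.arange(N)
freqs = [5, 17, 34]
signal = sum(np.cos(2 * np.pi * f * t / N) for f in freqs)
true_bins = sorted({f % N for f in freqs} | {-f % N for f in freqs})

# Random three-fold undersampling (zero-filled).
rng = np.random.default_rng(3)
mask_rand = np.zeros(N)
mask_rand[rng.choice(N, size=N // 3, replace=False)] = 1.0
spec_rand = np.abs(np.fft.fft(signal * mask_rand))

# Periodic three-fold undersampling (zero-filled): keep every third sample.
mask_per = np.zeros(N)
mask_per[::3] = 1.0
spec_per = np.abs(np.fft.fft(signal * mask_per))

# Random case: the 6 true peaks stand well above the noise floor.
noise_floor = np.median(np.delete(spec_rand, true_bins))
# Periodic case: the alias of the bin-5 peak at bin 5 + N/3 = 85 (and at
# bin 165) is an exact replica, so a threshold cannot isolate the signal.
```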
This will allow us to proceed to Section 2.3, where the main result of this chapter is elaborated.

2.2.1 Signal Model

Let $\Omega = [-\frac12, \frac12)$ and let $f : \Omega \to \mathbb{C}$ be the function of interest to be sampled in $\Omega$. We assume $f \in H^1(\Omega)$ with Fourier expansion

$$f(x) = \sum_{\ell=-\infty}^{\infty} c_\ell\, e(\ell x), \qquad (2.1)$$

valid only for $x \in \Omega$. Note that our regularity assumption implies that

$$\sum_{\ell=-\infty}^{\infty} |c_\ell| < \infty,$$

which will be crucial for our error bound.

Henceforth, let $N \in \mathbb{N}$ be odd. We denote the discretized regular data vector by $f \in \mathbb{C}^N$, obtained by sampling $f$ on the uniform grid $\tau = \{t_1, \cdots, t_N\} \subset \Omega$, where $t_k := \frac{k-1}{N} - \frac12$, a collection of equispaced points, so that $f_k = f(t_k)$. Here, $f$ will be our discrete signal of interest to recover via few nonuniform samples ($f$ will provide an $\frac{N-1}{2}$-bandlimited approximation of $f(x)$). Similar results can be obtained in the case that $N$ is even; our current assumption is only needed to simplify the exposition.

The observed discretized nonuniform data vector is denoted by $\tilde f \in \mathbb{C}^m$, with underlying unstructured grid $\tilde\tau = \{\tilde t_1, \cdots, \tilde t_m\} \subset \Omega$, where $\tilde t_k := \frac{k-1}{m} - \frac12 + \Delta_k$ is now a collection of generally non-equispaced points. The entries of the perturbation vector $\Delta \in \mathbb{R}^m$ define the pointwise deviations of $\tilde\tau$ from the equispaced grid $\{\frac{k-1}{m} - \frac12\}_{k=1}^m$, and $\tilde f_k = f(\tilde t_k)$. We assume that the nonuniform points remain in our sampling space, i.e., $\tilde\tau \subset \Omega$, so that (2.1) remains valid for these measurements. See the discussion of sampling on the torus in Section 2.2.2 for further detail on this important restriction.

In our sampling scenario, noisy nonuniform samples are given as

$$b = \tilde f + d \in \mathbb{C}^m,$$

where the noise model, $d$ with $\|d\|_2 \le \eta$, does not incorporate off-the-grid corruption.
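The perturbations $\Delta_k$ will be constrained in Section 2.2.2 through the smallness of $|\mathbb{E}\, e(jm\delta)|$. As a preview, for a uniform jitter $\delta \sim U[-\frac{1}{2m}, \frac{1}{2m}]$ this expectation vanishes for every nonzero integer $j$ (so the corresponding parameter $\theta$ of Section 2.2.2 is zero), which the following midpoint-rule computation confirms numerically (the grid sizes are arbitrary):

```python
import numpy as np

m = 16  # illustrative number of nonuniform samples

def expected_exp(j, m, n=4096):
    # Midpoint-rule approximation of E e(j m delta) for
    # delta ~ U[-1/(2m), 1/(2m)], where e(x) = exp(2 pi i x).
    delta = (np.arange(n) + 0.5) / (n * m) - 1.0 / (2.0 * m)
    return np.mean(np.exp(1j * 2 * np.pi * j * m * delta))

# Analytically E e(j m delta) = sin(pi j)/(pi j) = 0 for every integer j != 0.
vals = [abs(expected_exp(j, m)) for j in range(1, 5)]
```

A similar computation with Gaussian jitter $\delta \sim \mathcal{N}(0, \bar\sigma^2)$ reproduces the characteristic-function decay $|\mathbb{E}\, e(jm\delta)| = e^{-2(\pi j m \bar\sigma)^2}$.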
We assume that we know $\tilde\tau$ and $\eta$ (or at least an upper bound on the noise energy).

In this chapter, we impose a compressibility condition on $f \in \mathbb{C}^N$. To this end, let $n \le N$ and let $\Psi \in \mathbb{C}^{N\times n}$ be a full rank matrix with $0 < \sigma_n(\Psi) =: \alpha$ and $\sigma_1(\Psi) =: \beta$. We assume there exists some $g \in \mathbb{C}^n$ such that $f = \Psi g$, where $g$ can be accurately approximated by an $s \le n$ sparse vector. More precisely, for $s \in [n]$ we define the error of the best $s$-sparse approximation of $g$ as

$$\epsilon_s(g) := \min_{\|h\|_0 \le s} \|h - g\|_1,$$

and assume $s$ has been chosen so that $\epsilon_s(g)$ is within a prescribed error tolerance determined by the practitioner. Further, we require $m \le n$ (needed in the proof of Theorem 2.5.3; see Lemma 3.6 from [58]).

The transform $\Psi$ will have to be incoherent with respect to the 1D centered discrete Fourier basis $F \in \mathbb{C}^{N\times N}$ (see Section 2.2.3 for the definition of $F$). To be precise, we define the DFT-incoherence parameter as

$$\gamma = \max_{\ell\in[n]} \sum_{k=1}^N |\langle F_{*k}, \Psi_{*\ell}\rangle|,$$

which provides a uniform bound on the $\ell_1$-norm of the DFT coefficients of the columns of $\Psi$. This parameter will play a role in the sampling complexity of our result (see Section 2.3.1). We discuss $\gamma$ in detail in Section 2.4.2, including several examples of the value of $\gamma$ for transforms common in compressive sensing.

In this chapter, the goal is to estimate $g$ via basis pursuit (1.5), where $A$ will be replaced by an interpolation kernel $S \in \mathbb{C}^{m\times N}$ that achieves $Sf \approx \tilde f$ accurately (the Dirichlet kernel, see Section 2.2.3).

2.2.2 Deviation Model

Our work will provide analysis for deviations $\Delta \in \mathbb{R}^m$ (defined in Section 2.2.1) whose entries are i.i.d.
with any distribution $\mathcal{D}$ that obeys the following: for $\delta \sim \mathcal{D}$, there exists some $\theta \ge 0$ such that for all integers $0 < |j| \le \frac{2(N-1)}{m}$ we have

$$|\mathbb{E}\, e(jm\delta)| \le \frac{\theta m}{2N}.$$

This will be known as our deviation model. In our results, distributions with a smaller $\theta$ parameter will require fewer samples and provide reduced error bounds.

Sampling on the torus: It is important to notice that our deviation model can potentially allow for nonuniform samples outside of $\Omega := [-\frac12, \frac12)$. This violates our assumption in Section 2.2.1, as (2.1) will no longer hold for $\tilde\tau$. To remedy this, we modify our sampling scheme to be on the torus. To be precise, if $f_{|\Omega}(x)$ is given as

$$f_{|\Omega}(x) = \begin{cases} f(x) & \text{if } x \in [-\frac12, \frac12), \\ 0 & \text{if } x \notin [-\frac12, \frac12), \end{cases}$$

then we define $\tilde f(x)$ as the periodic extension of $f_{|\Omega}(x)$ to the whole line:

$$\tilde f(x) = \sum_{\ell=-\infty}^{\infty} f_{|\Omega}(x + \ell).$$

We now apply samples generated from our deviation model to $\tilde f(x)$. Indeed, for any $\tilde t_k$ generated outside of $\Omega$ by our deviation model, we will have $\tilde f(\tilde t_k) = f(t^*)$ for some $t^* \in \Omega$. In this way we may proceed with our unrestricted deviation model, which will provide nonuniform samples of $f$ in $\Omega$ (in particular, the expansion (2.1) will remain valid for these samples).

We postpone further discussion of the deviation model until Section 2.4.1, where we will also provide examples of deviations that fit this model.

2.2.3 Dirichlet Kernel

We model our nonuniform samples via an interpolation kernel $S \in \mathbb{R}^{m\times N}$ that achieves $Sf \approx \tilde f$ accurately. In this thesis, we consider the Dirichlet kernel defined by $S = \mathcal{N}F^* : \mathbb{C}^N \to \mathbb{C}^m$, where $F \in \mathbb{C}^{N\times N}$ is a 1D centered discrete Fourier transform (DFT) and $\mathcal{N} \in \mathbb{C}^{m\times N}$ is a 1D centered nonuniform discrete Fourier transform (NDFT, see [45, 53]) with normalized rows and irregular frequencies chosen according to $\tilde\tau$.
In other words, let $\tilde N = \frac{N-1}{2}$; then the $(k,\ell) \in [m]\times[N]$ entry of $\mathcal{N}$ is given as

$$\mathcal{N}_{k\ell} = \frac{1}{\sqrt N}\, e\big(-(\ell - \tilde N - 1)\tilde t_k\big).$$

This NDFT is referred to as a nonuniform discrete Fourier transform of type 2 in [53]. Thus, the action of $S$ on $f \in \mathbb{C}^N$ can be given as follows: we first apply the centered inverse DFT to our discrete uniform data,

$$\check f_u := (F^* f)_u = \sum_{p=1}^N f_p F^*_{up} := \frac{1}{\sqrt N}\sum_{p=1}^N f_p\, e\big((u - \tilde N - 1)t_p\big), \quad \forall u \in [N], \qquad (2.2)$$

followed by the NDFT in terms of $\tilde\tau$:

$$(Sf)_k := (\mathcal{N}\check f)_k = \sum_{u=1}^N \check f_u \mathcal{N}_{ku} := \frac{1}{\sqrt N}\sum_{u=1}^N \check f_u\, e\big(-\tilde t_k(u - \tilde N - 1)\big), \quad \forall k \in [m]. \qquad (2.3)$$

Equivalently,

$$(Sf)_k = \frac{1}{N}\sum_{p=1}^N f_p\, K(\tilde t_k - t_p), \qquad (2.4)$$

where $K(\theta) = \frac{\sin(N\pi\theta)}{\sin(\pi\theta)}$ is the Dirichlet kernel. This equality is well known and holds by applying the geometric series formula upon expansion (notice that $S \in \mathbb{R}^{m\times N}$ is real valued). This kernel is commonly used for trigonometric interpolation and is therefore accurate when acting on signals that can be well approximated by trigonometric polynomials of finite order, as shown in the following theorem.

Theorem 2.2.1. Let $S$, $f$ and $\tilde f$ be defined as above with $\tilde\tau \subset \Omega$. For each $k \in [m]$, if $\tilde t_k = t_p$ for some $p \in [N]$, then

$$\big(\tilde f - Sf\big)_k = 0, \qquad (2.5)$$

and otherwise

$$\big(\tilde f - Sf\big)_k = \sum_{|\ell| > \tilde N} c_\ell \Big( e(\ell \tilde t_k) - (-1)^{\lfloor\frac{\ell + \tilde N}{N}\rfloor}\, e\big(r(\ell)\tilde t_k\big) \Big), \qquad (2.6)$$

where $r(\ell) = \mathrm{rem}(\ell + \tilde N, N) - \tilde N$, with $\mathrm{rem}(\ell, N)$ giving the remainder after division of $\ell$ by $N$. As a consequence, for any $1 \le p < \infty$,

$$\|\tilde f - Sf\|_p \le 2 m^{\frac1p} \sum_{|\ell| > \tilde N} |c_\ell|, \qquad (2.7)$$

and

$$\|\tilde f - Sf\|_\infty \le 2 \sum_{|\ell| > \tilde N} |c_\ell|. \qquad (2.8)$$

The proof of this theorem is postponed until Section 2.6.

Therefore, the error of $S$ is proportional to the $\ell_1$-norm of the Fourier coefficients of $f$ that correspond to frequencies higher than $\tilde N = \frac{N-1}{2}$.
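The exactness promised by Theorem 2.2.1 for trigonometric polynomials of degree at most $\tilde N$ is easy to verify numerically with the kernel form (2.4); the specific $N$, sample points and random coefficients below are arbitrary illustrative choices:

```python
import numpy as np

N = 9
Nt = (N - 1) // 2                      # N tilde
t = np.arange(N) / N - 0.5             # uniform grid t_p (0-indexed)
t_nu = np.array([-0.41, -0.13, 0.02, 0.19, 0.33])  # nonuniform points in Omega

rng = np.random.default_rng(4)
freqs = np.arange(-Nt, Nt + 1)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # degree <= Nt

def trig_poly(x):
    # f(x) = sum_{|l| <= Nt} c_l e(l x), a trigonometric polynomial.
    return (c * np.exp(2j * np.pi * np.outer(x, freqs))).sum(axis=1)

def dirichlet(theta, N):
    # K(theta) = sin(N pi theta)/sin(pi theta), with K(0) = N.
    theta = np.asarray(theta, dtype=float)
    den = np.sin(np.pi * theta)
    small = np.abs(den) < 1e-12
    return np.where(small, float(N),
                    np.sin(N * np.pi * theta) / np.where(small, 1.0, den))

f = trig_poly(t)                       # uniform samples
Sf = np.array([(f * dirichlet(tk - t, N)).sum() / N for tk in t_nu])
```

Here `Sf` matches the off-the-grid samples of the polynomial to machine precision, in agreement with (2.5) and the vanishing tail sum in (2.6).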
In particular, notice that if $c_\ell = 0$ for all $|\ell| > \tilde N$, we obtain perfect interpolation, as expected from standard results in signal processing (this will only happen for trigonometric polynomials of degree $\le \tilde N$). Despite the wide usage of trigonometric interpolation in applications [86, 19, 41], such a result giving an exact error term does not seem to exist in the literature.

Notice that Theorem 2.2.1 only holds for $\tilde\tau \subset \Omega$, as restricted in Section 2.2.1. However, the result continues to hold for unrestricted $\tilde\tau$ if we sample on the torus as discussed in Section 2.2.2 (in particular, the error bound will always hold for our deviation model in this sense).

2.3 Main Result

With the definitions and assumptions introduced in Section 2.2, our methodology in this chapter will consist of approximating the $s$ largest coefficients of $f$ in $\Psi$ (in the representation $f = \Psi g$) as

$$g \approx g^\sharp := \arg\min_{h\in\mathbb{C}^n} \|h\|_1 \quad \text{s.t.} \quad \|S\Psi h - b\|_2 \le \eta + 2\sqrt m \sum_{|\ell| > \frac{N-1}{2}} |c_\ell|. \qquad (2.9)$$

This will give us the approximation $f \approx \Psi g^\sharp$. Recall that $S$ was introduced in Section 2.2.3 and $\Psi$, $b$, $\eta$ were defined in Section 2.2.1. The term $2\sqrt m \sum_{|\ell|>\tilde N} |c_\ell|$ in the noise constraint is due to the error of our interpolation kernel $S$ given in (2.7). Thus, since $S\Psi g = Sf \approx \tilde f$, (2.9) models our nonuniform samples within the noise level and interpolation error. Under the sparse signal model imposed on $g$ in Section 2.2.1, this approach should successfully recover $f$ from the few nonuniform samples $\tilde f$, as shown in the following main results of this chapter.

2.3.1 Simplified Result

This section provides a simplified version of our result, assuming that $\Psi$ is an orthonormal basis. We present this result as a corollary of the main result in Section 2.3.2.

Corollary 2.3.1. Let $\Psi \in \mathbb{C}^{N\times N}$ be an orthonormal basis with DFT-incoherence parameter $\gamma$.
Define $S$ with the entries of $\Delta$ i.i.d. from any distribution that satisfies our deviation model with $\theta < \frac{1}{\sqrt 2}$. Define

$$g^\sharp = \arg\min_{h\in\mathbb{C}^N} \|h\|_1 \quad \text{s.t.} \quad \|S\Psi h - b\|_2 \le \eta + 2\sqrt m \sum_{|\ell| > \frac{N-1}{2}} |c_\ell|. \qquad (2.10)$$

Let $\tau := \frac{1}{4\sqrt 2} - \frac{\theta}{4} > 0$. If

$$\sqrt m \ge c_1 \gamma \sqrt s \log^2(N)\, \frac{\sqrt{1 + \theta + \tau}}{\tau}, \qquad (2.11)$$

where $c_1$ is an absolute constant, then for $c_2, c_3 > 0$ depending only on $\theta$,

$$\|f - \Psi g^\sharp\|_2 \le \frac{c_2\, \epsilon_s(g)}{\sqrt s} + \frac{c_3 \sqrt N \eta}{\sqrt m} + 2 c_3 \sqrt N \sum_{|\ell| > \frac{N-1}{2}} |c_\ell| \qquad (2.12)$$

with probability exceeding

$$1 - \exp\left( -\frac{m\tau}{8 s \gamma^2}\, \log\Big(1 + 2\log\big(1 + \tfrac{\tau}{2\tau + 1 + 2\theta}\big)\Big) \right). \qquad (2.13)$$

Therefore, with $m \sim s\log^4(N)$ random off-the-grid samples, we can recover $f$ with error (2.12) proportional to the sparse model mismatch ($\epsilon_s(g)$), the noise level ($\eta$), and the error of the $\frac{N-1}{2}$-bandlimited approximation of $f$ ($\sum_{|\ell|>\frac{N-1}{2}} |c_\ell|$). As a consequence, we can approximate $f(x)$ for all $x \in \Omega$, as stated in the following corollary.

Corollary 2.3.2. Let $h : \Omega \to \mathbb{C}^N$ be the vector-valued function defined entry-wise for $\ell \in [N]$ as

$$h(x)_\ell := \frac{1}{\sqrt N}\, e\big((\ell - \tilde N - 1)x\big),$$

and define the function $f^\sharp : \Omega \to \mathbb{C}$ via

$$f^\sharp(x) = \langle h(x), F^* \Psi g^\sharp \rangle,$$

where $g^\sharp$ is given by (2.10). Then, under the assumptions of Corollary 2.3.1,

$$|f(x) - f^\sharp(x)| \le \frac{c_2\, \epsilon_s(g)}{\sqrt s} + \frac{c_3 \sqrt N \eta}{\sqrt m} + 2\big(c_3 \sqrt N + 1\big)\sum_{|\ell| > \frac{N-1}{2}} |c_\ell| \qquad (2.14)$$

holds for all $x \in \Omega = [-\frac12, \frac12)$ with probability exceeding (2.13).

The proof of this corollary is presented in Section 2.6.

In the case $\epsilon_s(g) = \eta = 0$, the result intuitively says that we can recover the $\frac{N-1}{2}$-bandlimited approximation of $f(x)$ with $O(s\,\mathrm{polylog}(N))$ random nonuniform samples. In the case of uniform samples, $O(N)$ measurements are needed for the same quality of reconstruction by the Nyquist-Shannon sampling theorem (or by Theorem 2.2.1 directly).
Thus, for compressible signals with $s \ll N$, random nonuniform samples provide a significant reduction in sampling complexity (see Section 2.4 for further discussion).

Notice that general denoising is not guaranteed in our undersampling scenario ($m \le N$), due to the term $\frac{\sqrt N \eta}{\sqrt m}$ in (2.12) and (2.14). In other words, one cannot expect to reduce the measurement noise $\eta$, since the factor $\frac{\sqrt N}{\sqrt m} \ge 1$ appearing in our error bound implies an amplification of the input noise level. In general, a practitioner must oversample (i.e., $N < m$) to attenuate the effects of noise (our result does not apply in this case, since we assumed $m \le N$ in Section 2.2.1; see the proof of Theorem 2.5.3). However, the main result states that nonuniform samples can handle alias-related noise efficiently.

2.3.2 Full Result

We now present the full result, assuming that $\Psi$ is a full column-rank matrix. Corollary 2.3.1 will follow from Theorem 2.3.3 by taking $\alpha = \beta = 1$.

Theorem 2.3.3. Let $n \le N$ and let $\Psi \in \mathbb{C}^{N\times n}$ be a full rank matrix with DFT-incoherence parameter $\gamma$ and

$$\sigma_n(\Psi) := \alpha > \sqrt{1 - \tfrac{1}{\sqrt 2}} \quad \text{and} \quad \sigma_1(\Psi) := \beta \le \sqrt{1 + \tfrac{1}{\sqrt 2}}.$$

Let the entries of $\Delta$ be i.i.d. with any distribution that satisfies our deviation model with

$$\theta < \max\left\{ \frac{1}{\beta^2}\Big(1 + \frac{1}{\sqrt 2}\Big) - 1,\; 1 + \frac{1}{\alpha^2}\Big(\frac{1}{\sqrt 2} - 1\Big) \right\}.$$

Define $g^\sharp$ as in (2.9).
Let \u03c4 := 14\u221a2\u2212 max{\u03b22\u03b8+\u03b22\u22121,\u03b12\u03b8+1\u2212\u03b12}4> 0.If\u221am \u2265 c\u02dc1\u03b3\u221as log2(n)\u221a\u03b22 + \u03b22\u03b8 + \u03c4\u03c4(2.15)where c\u02dc1 is an absolute constant, then for c\u02dc2, c\u02dc3 > 0 depending only on \u03b8, \u03b1and \u03b2,\u2016f \u2212\u03a8g]\u20162 \u2264 \u03b2c\u02dc2\u000fs(g)\u221as+\u03b2c\u02dc3\u221aN\u03b7\u221am+ 2\u03b2c\u02dc3\u221aN\u2211|`|>N\u221212|c`|with probability exceeding1\u2212 exp(\u2212 m\u03c48s\u03b32log(1 + 2 log(1 +\u03c42\u03c4 + \u03b22(1 + 2\u03b8)))).The proof of this theorem is found in Section 2.5.This theorem generalizes the result in the previous section to more generaltransformations \u03a8 for sparse representation. This is more practical since thecolumns of \u03a8 need not be orthogonal, instead linear independence suffices(with knowledge of the singular values \u03b1, \u03b2). In particular notice that (2.15)requires m \u223c s log4(n) as opposed to m \u223c s log4(N) in (2.11). Since n \u2264 N ,this general result allows for a potential reduction in sample complexity if thepractitioner may construct \u03a8 in such an efficient manner while still allowinga sparse and accurate representation of f .222.4 DiscussionThis section elaborates on several aspects of the main result. Section 2.4.1provides examples of distributions that satisfy our deviation model and in-tuition of its meaning. Section 2.4.2 explores the \u03b3 parameter in Corollary2.3.1 and Theorem 2.3.3 with examples of transformations \u03a8 that produce asatisfiable sampling complexity. This section also considers the practicalityof our signal model. Section 2.4.3 argues for the novelty of our results incontrast to related work in the literature.2.4.1 Deviation ModelIn this section, we present several examples of distributions that are suitablefor our deviation model. These are illustrated in Figure 2.3.\u2022 1) D = U [\u2212 12m, 12m] gives \u03b8 = 0. 
To generalize this example, we may take $\mathcal{D} = U[\mu - \frac{p}{2m},\, \mu + \frac{p}{2m}]$, for any $\mu \in \mathbb{R}$ and $p \in \mathbb{N}\setminus\{0\}$. Here $\theta = 0$.

• 2) $\mathcal{D} = U\{-\frac{1}{2m} + \frac{k}{m\bar n}\}_{k=0}^{\bar n - 1}$ with $\bar n := \lceil\frac{2(N-1)}{m}\rceil + 1$ gives $\theta = 0$. To generalize this example, we may take $\mathcal{D} = U\{\mu - \frac{p}{2m} + \frac{pk}{m\bar n}\}_{k=0}^{\bar n - 1}$, for any $\mu \in \mathbb{R}$, $p \in \mathbb{N}\setminus\{0\}$ and $\bar n \in \mathbb{N}\setminus[\lceil\frac{2(N-1)}{pm}\rceil]$. Here $\theta = 0$.

• 3) $\mathcal{D} = \mathcal{N}(\mu, \bar\sigma^2)$, for any $\mu \in \mathbb{R}$ and $\bar\sigma^2 > 0$. Here $\theta = \frac{2N}{m} e^{-2(\bar\sigma \pi m)^2}$. In particular, for fixed $\bar\sigma$, $m$ may be chosen large enough to satisfy the conditions of Theorem 2.3.3, and vice versa.

• 4) Jittered sampling: notice that examples 1) and 2) include cases of jittered sampling [11, 74, 14, 72]. Indeed (let $\mu = 0$ and $p = 1$), we may choose $N, m, \bar n$ in such a way that $\Omega$ is partitioned into $m$ regions of equal size and these distributions choose a point randomly from each region (in a continuous or discrete sense).

We leave it to the reader to verify the examples above and to consider other distributions of interest.

Interpretation of the deviation model: For any $\Delta \in \mathbb{R}$, notice that $e(jm\Delta)$ will be a point on the unit circle. A small parameter $\theta \approx 0$ in our deviation model implies that

$$|\mathbb{E}\, e(jm\Delta)| \approx 0.$$

Figure 2.3: Illustrations of example perturbations generated by our deviation model. In the top two examples, the red areas indicate the allowed positions of the deviations from the grid point $\frac{k}{m}$. (Top) Deviations pertaining to example 1) with $\mu = 0$ and $p = 1$; the samples lie on the interval $[\frac{2k-1}{2m}, \frac{2k+1}{2m}]$. (Middle) Deviations pertaining to example 2) with $\mu = 0$ and $p = 1$; the samples lie on a discrete subgrid centered at $\frac{k}{m}$. (Bottom) Deviations pertaining to example 3) with $\mu = 0$, where the bell curve indicates the Gaussian pdf of these centered deviations.
Notice that these samples may lie outside of $\Omega$, but recall that we are sampling on the torus (see Section 2.2.2).

Intuitively, $\theta$ measures, in some sense, how biased a given distribution is in generating deviations. Indeed, $|\mathbb{E}\,e(jm\Delta)| = 0$ means that the distribution in question is centered and unbiased toward any particular direction. On the other hand, $|\mathbb{E}\,e(jm\Delta)| \approx 1$ gives the opposite interpretation, where deviations are generated favoring a certain direction in an almost deterministic sense. Our result is not applicable to such biased distributions, since in Theorem 2.3.3, as
$$\theta \to \max\Big\{\frac{1}{\beta^2}\Big(1+\frac{1}{\sqrt{2}}\Big)-1,\ 1+\frac{1}{\alpha^2}\Big(\frac{1}{\sqrt{2}}-1\Big)\Big\}$$
we have $\tau\to 0$ and $\tilde c_2, \tilde c_3\to\infty$ (see Section 2.5). Our number of samples and error terms therefore blow up in this case. In conclusion, our deviation model cannot be satisfied by such biased, quasi-deterministic deviations, since in the case of a degenerate distribution
$$|\mathbb{E}\,e(jm\Delta)| = |e(jm\Delta)| = 1,$$
which gives $\theta = \frac{2N}{m} > 1$. Such a parameter will never satisfy the conditions of the main result.

2.4.2 Signal Model

We begin this section with a discussion of the DFT-incoherence parameter $\gamma$ introduced in Section 2.2.1 as
$$\gamma = \max_{\ell\in[n]}\sum_{k=1}^{N}|\langle F_{*k}, \Psi_{*\ell}\rangle|.$$
This is a uniform upper bound on the $\ell_1$-norm of the discrete Fourier coefficients of the columns of $\Psi$. Since the decay of the Fourier coefficients of a function is related to its smoothness, intuitively $\gamma$ can be seen as a measure of the smoothness of the columns of $\Psi$. Implicitly, this also measures the smoothness of $f(x)$, since its uniform discretization admits a representation via this transformation, $f = \Psi g$. The role of $\gamma$ in the sampling complexity is therefore clear: smaller $\gamma$ implies that our signal of interest is smooth and hence requires fewer samples.
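The deviation-model examples of Section 2.4.1 lend themselves to a quick numerical check. The following sketch (an illustration only, not part of the formal development; the particular values of `N`, `m` and `sigma_bar` are our own choices) estimates $|\mathbb{E}\,e(jm\Delta)|$ by Monte Carlo for the uniform, Gaussian and degenerate cases, and compares the Gaussian case against the closed form $\frac{2N}{m}e^{-2(\bar\sigma\pi m)^2}\cdot\frac{m}{2N} = e^{-2(\bar\sigma\pi m)^2}$ at $j=1$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 101, 20          # grid size and number of samples (illustrative values)
K = 200_000             # Monte Carlo draws

def char_abs(delta, j):
    # |E e(j*m*Delta)| with e(x) := exp(2*pi*i*x), estimated from draws of Delta
    return abs(np.exp(2j * np.pi * j * m * delta).mean())

j = 1
# Example 1): Delta ~ U[-1/(2m), 1/(2m)] -- the expectation vanishes, theta = 0
unif = rng.uniform(-1 / (2 * m), 1 / (2 * m), K)
# Example 3): Delta ~ N(0, sigma_bar^2) -- |E e(m*Delta)| = exp(-2*(sigma_bar*pi*m)^2)
sigma_bar = 0.02
gauss = rng.normal(0.0, sigma_bar, K)
exact = np.exp(-2 * (sigma_bar * np.pi * m) ** 2)
# Degenerate distribution: Delta constant -- |E e(j*m*Delta)| = 1, model fails
degen = np.full(K, 0.013)

print(char_abs(unif, j))          # ~ 0, up to Monte Carlo noise
print(char_abs(gauss, j), exact)  # Monte Carlo estimate vs closed form
print(char_abs(degen, j))         # exactly 1
```

As expected, the uniform and Gaussian deviations concentrate $|\mathbb{E}\,e(jm\Delta)|$ near zero (hence a small $\theta$), while the degenerate distribution pins it at one.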
The observation that smaller $\gamma$ accompanies smoother signals is intuitive, since non-smooth functions require additional samples to capture jump discontinuities, in accordance with the Gibbs phenomenon.

We now consider several common choices for $\Psi$ and discuss the respective $\gamma$ parameter:

• 1) $\Psi = F$ (the DFT); then $\gamma = 1$. This is the optimal case, and it can also be achieved for the discrete cosine and sine transforms.

• 2) When $\Psi = H$ is the 1D Haar wavelet basis, we have $\gamma\sim O(\log(N))$. In [30] it is shown that $|\langle F_{*k}, H_{*\ell}\rangle|\sim\frac{1}{|k|}$ (see Lemma 6.1), which gives the desired upper bound for $\gamma$ via an integral comparison. Notice that these basis functions have jump discontinuities, and yet we still obtain an acceptable DFT-incoherence parameter for nonuniform undersampling.

• 3) $\Psi = I$ (the $N\times N$ identity) gives $\gamma = \sqrt{N}$. This is the worst-case scenario for normalized transforms, since
$$\max_{x\in S^{N-1}}\sum_{k=1}^{N}|\langle F_{*k}, x\rangle| = \max_{x\in S^{N-1}}\sum_{k=1}^{N}|\langle F_{*k}, Fx\rangle| = \max_{x\in S^{N-1}}\sum_{k=1}^{N}|x_k| \le \sqrt{N}.$$
In general, our smooth signals of interest are not fit for this sparsity model.

• 4) Let $p\ge 1$ be an integer, and consider matrices $\Psi$ whose columns are uniform discretizations of $p$-times differentiable functions, with $p-1$ periodic and continuous derivatives and with $p$-th derivative that is piecewise continuous. In this case $\gamma\sim O(\log(N))$ if $p = 1$ and $\gamma\sim O(1)$ if $p\ge 2$. An informal argument for these computations is provided in Section 2.8.

Example 4) in particular is informative due to its generality and its ability to somewhat formalize the intuition behind $\gamma$ discussed previously.
This example implies the applicability of our result to a general class of smooth functions that agree nicely with our signal model defined in Section 2.2.1 (functions in $H^1(\Omega)$).

We finish this section by motivating the practicality of our signal model. Any $f\in L^2(\Omega)$ has an infinite Fourier expansion with decaying Fourier coefficients $|c_\ell|$. Such functions can be approximated (according to the practitioner's error tolerance) by a trigonometric polynomial of degree $\tilde N$, where $\tilde N$ may be extremely large in order to provide $\sum_{|\ell|>\tilde N}|c_\ell|\ll\epsilon$ (within a desired error tolerance). In essence, our main result shows that random nonuniform samples allow for robust and stable recovery of $f(x)$ with an error term proportional to $\sum_{|\ell|>(N-1)/2}|c_\ell|$, the error of an $N/2$-bandlimited approximation. This term is a typical (and inevitable) error accepted when discretizing non-bandlimited signals. Thus, we may capture $f$ to within the quality of standard practice, but economically so, with sampling complexity proportional to the compression level provided by $\Psi$ and only logarithmically dependent on $N$. In particular, $N$ may be chosen substantially large to accommodate $\sum_{|\ell|>(N-1)/2}|c_\ell|\ll\epsilon$ for signals with slowly decaying Fourier coefficients, and the number of samples required will not drastically increase.

Thus, our work is relevant to a variety of applications that are prone to nonuniform sampling and have lossy compression techniques available. This is the case for many smooth signals of interest, including audio [3], radar [60], seismic [29] and other signals that represent discretizations of analog finite-energy wavefields (e.g., Lamb waves [46]).
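Returning to the DFT-incoherence examples above, the values of $\gamma$ claimed for the DFT, the identity and the Haar basis can be checked numerically at small $N$. The sketch below is illustrative only; it assumes the unitary normalization of the DFT and a standard orthonormal Haar construction (the helper `haar` is our own, not from the text):

```python
import numpy as np

def gamma(Psi):
    # gamma = max over columns of Psi of the l1-norm of that column's DFT coefficients
    n = Psi.shape[0]
    F = np.fft.fft(np.eye(n), axis=0) / np.sqrt(n)  # unitary DFT matrix
    return np.abs(F.conj().T @ Psi).sum(axis=0).max()

def haar(n):
    # orthonormal Haar matrix (rows are basis functions), n a power of 2
    if n == 1:
        return np.array([[1.0]])
    h = haar(n // 2)
    top = np.kron(h, [1.0, 1.0])               # scaling functions
    bot = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale wavelets
    return np.vstack([top, bot]) / np.sqrt(2)

N = 64
I = np.eye(N)
F = np.fft.fft(I) / np.sqrt(N)
H = haar(N).T  # columns are Haar basis vectors

print(gamma(F))  # 1: the optimal case of example 1)
print(gamma(I))  # sqrt(N) = 8: the worst case of example 3)
print(gamma(H))  # intermediate, growing like log(N) per example 2)
```

For any orthonormal $\Psi$ the value lands between $1$ and $\sqrt{N}$, since each column is a unit vector.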
Many technologies developed for such applications have proven, through their everyday use, that most signals of interest can be significantly compressed (MP3, JPEG 2000, DjVu, WAV, ZIP, PGF, MP4, ICER).

2.4.3 Novelty of the Results

Several studies in the literature are similar to our results in this chapter [42, 41], and we would like to stress the novelties of this work and give due credit before proceeding to the proofs. Our main proof under the sparse signal model adopts the approach of [58, 42, 41] in the case where $S$ forms a bounded orthonormal system (when $\theta = 0$). However, we derive recovery guarantees for non-orthonormal systems (when $\theta\ne 0$) and focus the scope of the paper within the context of classical sampling theory (introducing error according to bandlimited approximation). The work in [41] considers sampling of sparse trigonometric polynomials and coincides with our application in the case $\Psi = F$. Our results generalize this work (in the basis pursuit case) to allow for other signal models and sparsifying transforms. Furthermore, [41] assumes that the samples are chosen uniformly at random from a continuous interval or from a discrete set of $N$ equispaced points. In contrast, our results pertain to general deviations from an equispaced grid with sampling density $\sim s\,\mathrm{polylog}(N)$ and allow for these and many other distributions of the perturbations (according to the parameter $\theta$). Finally, we also derive results in the low-rank matrix recovery scenario (see Chapter 3). These results generalize the previous method of proof to the low-rank data case and may be of interest on their own to establish other results in low-rank matrix recovery theory. In particular, the results under the low-rank data model also provide novel insights for the popular matrix completion problem.

The differences might seem subtle, but the sampling model we consider is a naturally occurring scenario in many applications that is not covered by previous work.
To illustrate, we consider a typical marine seismic data survey. In this application, cables of sources and receivers equipped with GPS are towed by a vessel over a survey area. Due to ocean currents and varying ship speeds, the measurements inevitably deviate from the uniform grid designed for post-processing [68, 82, 9, 34]. This provides ideal conditions for our methodology, as the deviations are accurately monitored and can be appropriately modeled as random phenomena. In comparison, the assumptions in [41] would require practitioners to remove sources and receivers uniformly at random from the equipment arrays independently. Since survey tools are pre-designed and fixed in many applications, such sampling schemes would require major modifications of equipment and acquisition design. In fact, most sampling strategies from the compressive sensing literature would require multiple sources to be fired simultaneously with subgaussian weights [58, 49, 65, 64, 66, 83, 4, 88], and are therefore impractical for applications that rely on equipment developed according to classical sampling theory. On the other hand, we adopt acquisition scenarios that occur frequently in current practice, and our main results show that these situations can be exploited for economical sampling of compressible signals.

2.5 Proof of Main Result

As mentioned in Section 2.3.3, the restricted isometry property (RIP) of a measurement operator is a common analytical tool used to establish sparse vector recovery results via basis pursuit. We will obtain our main result by establishing this property for our interpolation kernel $S$ with respect to the transform $\Psi$. Specifically, we wish to show that $S\Psi$ satisfies the following [85, 89]:

Definition: Suppose $A\in\mathbb{C}^{m\times n}$ is a measurement matrix and $1\le s\le n$ is an integer.
The restricted isometry constant (RIC) of order $s$ of $A$ is the smallest number $\delta_s$ such that for all $s$-sparse vectors $v\in\mathbb{C}^n$ (i.e., $\|v\|_0\le s$),
$$(1-\delta_s)\|v\|_2^2 \le \|Av\|_2^2 \le (1+\delta_s)\|v\|_2^2. \qquad (2.16)$$
Informally, the RIP is said to hold if the RIC is small enough for sparse vector recovery. Many results exist in the literature that give such conditions on the RIC. To the best of the author's knowledge, [89] provides the best result in terms of the RIC:

Theorem 2.5.1 (Theorem 2.1 in [89]). Consider the basis pursuit problem (1.5) with feasible vector $g$ and minimizer $g^\sharp$. If the RIC $\delta_{2s}$ of $A\Psi$ satisfies $\delta_{2s} < \frac{1}{\sqrt{2}}$, then
$$\|g - g^\sharp\|_2 \le \left(\frac{\delta_{2s}\sqrt{2}+\sqrt{2\delta_{2s}\big(\frac{1}{\sqrt{2}}-\delta_{2s}\big)}}{2\big(\frac{1}{\sqrt{2}}-\delta_{2s}\big)}+1\right)\frac{2\epsilon_s(g)}{\sqrt{s}} + \frac{2\eta\sqrt{2(1+\delta_{2s})}}{1-\delta_{2s}\sqrt{2}} := \tilde c_2\frac{\epsilon_s(g)}{\sqrt{s}} + \tilde c_3\,\eta.$$
Notice that $\tilde c_2, \tilde c_3$ depend only on $\delta_{2s}$. These constants provide the error bounds in Corollary 2.3.1 and Theorem 2.3.3, where $\delta_{2s}$ will depend on $\theta$, $\alpha$ and $\beta$.

This result provides recovery error bounds proportional to the noise level and the sparse model mismatch. We will thus compute the RIC of $S\Psi$ and then apply Theorem 2.5.1 to obtain our main result.

2.5.1 Restricted Isometry Property of $S\Psi$

To prove Theorem 2.3.3 we will bound the RIC of $A := \frac{\sqrt{N}}{\sqrt{m}}S\Psi\in\mathbb{C}^{m\times n}$, which will then provide our recovery error bound via well known results in the literature [89, 85] (Theorem 2.5.1 in the previous section).

For an index set $T\subset[n]$, let $A_T\in\mathbb{C}^{m\times|T|}$ be the sub-matrix of $A$ that results from keeping only the columns of $A$ specified by $T$.
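Definition (2.16) can be made concrete with a brute-force computation on a toy matrix: the order-$s$ RIC is determined by the extreme eigenvalues of $A_T^*A_T$ over all supports $|T| = s$. The sketch below is illustrative only; the Gaussian measurement matrix and the dimensions are our own choices, not the operator $S\Psi$ of the text:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, n, s = 40, 12, 2
A = rng.normal(size=(m, n)) / np.sqrt(m)  # columns have expected unit norm, E[A^T A] = I

# delta_s = max over supports T with |T| = s of ||A_T^* A_T - I||_2,
# computed from the extreme eigenvalues of each s x s Gram matrix
delta_s = 0.0
for T in combinations(range(n), s):
    G = A[:, T].T @ A[:, T]
    eigs = np.linalg.eigvalsh(G)
    delta_s = max(delta_s, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))

print(delta_s)  # a small value certifies (2.16) for every s-sparse v
```

By construction, every $s$-sparse vector then satisfies the two-sided bound (2.16) with this $\delta_s$; the exhaustive search over supports is of course only feasible at toy scale.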
We consider the term
$$X_m := \sup_{|T|\le s}\|A_T^*A_T - \mathbb{E}A_T^*A_T\|_2.$$
Notice that, in contrast to many works in the literature that adopt the same approach ([41, 58]), we have $\mathbb{E}A_T^*A_T\ne I_{s\times s}$ in general (equality only holds in the case $\theta = 0$ and $\alpha=\beta=1$). We must therefore deal with this term explicitly and modify our approach accordingly.

In order to bound the RIC, our goal is to bound
$$X_m \le \tau \qquad (2.17)$$
for some $\tau\ge 0$. This will in turn show that the order-$s$ RIC of $A$ is bounded as
$$\delta_s \le \tau + \max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\},$$
where the additional term $\max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}$ is due to our deviation model and assumptions on $\Psi$. This is shown in the following argument.

The variational characterization of the spectral norm of a Hermitian matrix gives
$$X_m = \sup_{|T|\le s}\max_{x\in S^{s-1}}|\langle(A_T^*A_T-\mathbb{E}A_T^*A_T)x, x\rangle| = \sup_{x\in S^{n-1},\,\|x\|_0\le s}|\langle(A^*A-\mathbb{E}A^*A)x, x\rangle| = \sup_{x\in S^{n-1},\,\|x\|_0\le s}\big|\|Ax\|_2^2 - \mathbb{E}\|Ax\|_2^2\big|.$$
Therefore, $X_m\le\tau$ is equivalent to
$$\mathbb{E}\|Ax\|_2^2 - \tau\|x\|_2^2 \le \|Ax\|_2^2 \le \mathbb{E}\|Ax\|_2^2 + \tau\|x\|_2^2,$$
and to establish our claim it suffices to show that
$$\mathbb{E}\|Ax\|_2^2 \le \beta^2(1+\theta)\|x\|_2^2 \quad\text{and}\quad \mathbb{E}\|Ax\|_2^2 \ge \alpha^2(1-\theta)\|x\|_2^2. \qquad (2.18)$$
To this end, let $w\in\mathbb{C}^n$ and normalize $\tilde{\mathcal N} = \sqrt{N}\mathcal N$ so that for $k\in[m]$, $\ell\in[N]$,
$$\tilde{\mathcal N}_{k\ell} = e(\tilde t_k(\ell-\tilde n-1)).$$
Throughout, let $\tilde\Delta\in\mathbb{R}$ be an independent copy of the entries of $\Delta\in\mathbb{R}^m$. Then, with $v := F^*\Psi w$,
$$\mathbb{E}\|Aw\|_2^2 = \frac{1}{m}\mathbb{E}\|\tilde{\mathcal N}F^*\Psi w\|_2^2 := \frac{1}{m}\mathbb{E}\|\tilde{\mathcal N}v\|_2^2 = \mathbb{E}\frac{1}{m}\sum_{k=1}^m|\langle\tilde{\mathcal N}_{k*}, v\rangle|^2 = \mathbb{E}\frac{1}{m}\sum_{k=1}^m\Big|\sum_{\ell=1}^N e(\tilde t_k(\ell-\tilde n-1))v_\ell\Big|^2$$
$$= \mathbb{E}\frac{1}{m}\sum_{k=1}^m\Big(\sum_{\ell=1}^N\sum_{\tilde\ell=1}^N e(\tilde t_k(\ell-\tilde\ell))v_\ell\bar v_{\tilde\ell}\Big) = \sum_{\ell=1}^N\sum_{\tilde\ell=1}^N v_\ell\bar v_{\tilde\ell}\Big(\mathbb{E}\frac{1}{m}\sum_{k=1}^m e(\tilde t_k(\ell-\tilde\ell))\Big)$$
$$= \sum_{\ell=1}^N|v_\ell|^2 + \sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell-\tilde\ell=jm}v_\ell\bar v_{\tilde\ell}\,\mathbb{E}\,e\big(jm(\tilde\Delta-\tfrac{1}{2})\big) + \sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell-\tilde\ell=-jm}v_\ell\bar v_{\tilde\ell}\,\mathbb{E}\,e\big(-jm(\tilde\Delta-\tfrac{1}{2})\big).$$
The last equality can be obtained as follows:
$$\mathbb{E}\frac{1}{m}\sum_{k=1}^m e(\tilde t_k(\ell-\tilde\ell)) = \mathbb{E}\frac{1}{m}\sum_{k=1}^m e\Big(\big(\tfrac{k-1}{m}-\tfrac{1}{2}+\Delta_k\big)(\ell-\tilde\ell)\Big) = \frac{1}{m}\sum_{k=1}^m e\Big(\big(\tfrac{k-1}{m}-\tfrac{1}{2}\big)(\ell-\tilde\ell)\Big)\mathbb{E}\,e(\Delta_k(\ell-\tilde\ell))$$
$$= \frac{1}{m}\sum_{k=1}^m e\Big(\big(\tfrac{k-1}{m}-\tfrac{1}{2}\big)(\ell-\tilde\ell)\Big)\mathbb{E}\,e(\tilde\Delta(\ell-\tilde\ell)) = \mathbb{E}\,e\big((\tilde\Delta-\tfrac{1}{2})(\ell-\tilde\ell)\big)\sum_{k=1}^m\frac{1}{m}e\Big(\tfrac{k-1}{m}(\ell-\tilde\ell)\Big)$$
$$= \begin{cases}1 & \text{if } \ell=\tilde\ell,\\ \mathbb{E}\,e\big(jm(\tilde\Delta-\tfrac{1}{2})\big) & \text{if } \ell-\tilde\ell=jm,\ j\in\mathbb{Z}\setminus\{0\},\\ 0 & \text{otherwise,}\end{cases}$$
where the third equality uses the fact that $\mathbb{E}\,e(\Delta_k(\ell-\tilde\ell)) = \mathbb{E}\,e(\tilde\Delta(\ell-\tilde\ell))$ for all $k\in[m]$, in order to properly factor this constant out of the sum in the fourth equality.
The last equality is due to the properties of this geometric series.

Returning to our original calculation, we bound the last term using our deviation model assumptions:
$$\Big|\sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell-\tilde\ell=-jm}v_\ell\bar v_{\tilde\ell}\,\mathbb{E}\,e\big(-jm(\tilde\Delta-\tfrac{1}{2})\big)\Big| = \Big|\sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell\in Q_j}v_\ell\bar v_{\ell+jm}\,\mathbb{E}\,e\big(-jm(\tilde\Delta-\tfrac{1}{2})\big)\Big|$$
$$\le \sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell\in Q_j}|v_\ell||v_{\ell+jm}|\,\big|\mathbb{E}\,e\big(-jm(\tilde\Delta-\tfrac{1}{2})\big)\big| \le \frac{\theta m}{2N}\sum_{j=1}^{\lfloor(N-1)/m\rfloor}\sum_{\ell\in Q_j}|v_\ell||v_{\ell+jm}| \le \frac{\theta m}{2N}\sum_{j=1}^{\lfloor(N-1)/m\rfloor}|\langle v, v\rangle| = \frac{\theta m\|v\|_2^2}{2N}\Big\lfloor\frac{N-1}{m}\Big\rfloor \le \frac{\theta\|v\|_2^2}{2}.$$
Here, $Q_j\subset[N]$ is the set of allowed indices $\ell$ for a given $j$, i.e., those satisfying $\ell\in[N]$ and $\ell+jm\in[N]$. The remaining sum can be bounded similarly. Combining these inequalities with the singular values of $\Psi$, we obtain
$$\mathbb{E}\|Aw\|_2^2 \le \|v\|_2^2 + 2\cdot\frac{\theta\|v\|_2^2}{2} = \|\Psi w\|_2^2(1+\theta) \le \beta^2\|w\|_2^2(1+\theta)$$
and
$$\mathbb{E}\|Aw\|_2^2 \ge \alpha^2\|w\|_2^2(1-\theta).$$
We will apply these inequalities and similar orthogonality properties in what follows, and ask the reader to keep this in mind. To establish (2.17), we will apply a concentration inequality (Theorem 5.2 in [41], established in [87]). The following lemma will be useful for this purpose.

Lemma 2.5.2. Let $v\in\mathbb{C}^n$ be an $s$-sparse vector and define $\tilde{\mathcal N}$ as above, with the entries of $\Delta$ i.i.d. with any distribution $D$ such that, for integers $0<|j|\le\frac{2(N-1)}{m}$, if $\delta\sim D$ then
$$|\mathbb{E}\,e(j\delta m)| \le \frac{\theta m}{2N}$$
for some $\theta\ge 0$.
Define
$$\gamma = \max_{\ell\in[n]}\sum_{k=1}^{N}|\langle F_{*k}, \Psi_{*\ell}\rangle|.$$
Then for all $k\in[m]$,
$$|\langle A_{k*}, v\rangle| \le \frac{\gamma\sqrt{s}\,\|v\|_2}{\sqrt{m}},$$
and
$$\mathbb{E}\sum_{k=1}^m|\langle A_{k*}, v\rangle|^4 \le \frac{s\beta^2\gamma^2(1+2\theta)\|v\|_2^4}{m}.$$
The proof of this lemma can be found in Section 2.7.

We are now in a position to bound the RIC. The proof follows the arguments in [41, 58]. In particular, the main difference here is that the rows of $A$ are not identically distributed and $\mathbb{E}A^*A\ne I_{n\times n}$ in general (equality only holds in the case $\theta = 0$ and $\alpha=\beta=1$). However, the proof goes through similarly, where we need only apply the previous lemma.

Theorem 2.5.3. Let the entries of $\Delta$ be i.i.d. with any distribution $D$ such that, for integers $0<|j|\le\frac{2(N-1)}{m}$, if $\delta\sim D$ then
$$|\mathbb{E}\,e(j\delta m)| \le \frac{\theta m}{2N}$$
for some $\theta\ge 0$. Define
$$\gamma = \max_{\ell\in[n]}\sum_{k=1}^{N}|\langle F_{*k}, \Psi_{*\ell}\rangle|.$$
If
$$\frac{\sqrt{m}}{\sqrt{\log(m)}} \ge 2c\gamma\sqrt{s}\log(s)\sqrt{\log(n)}\,\frac{\sqrt{\beta^2+\beta^2\theta+\tau}}{\tau},$$
then the order-$s$ restricted isometry constant of $A := \frac{\sqrt{N}}{\sqrt{m}}S\Psi\in\mathbb{C}^{m\times n}$ satisfies
$$\delta_s \le 2\tau + \max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}$$
with probability exceeding
$$1-\exp\left(-\frac{m\tau}{4s\gamma^2}\log\Big(1+2\log\Big(1+\frac{\tau^2}{2\tau+\beta^2(1+2\theta)}\Big)\Big)\right).$$
Proof. In what follows, for $k\in[m]$ let $A_{kT}\in\mathbb{C}^{|T|}$ denote the $k$-th row of $A_T$. Define
$$X_m := \sup_{|T|\le s}\|A_T^*A_T-\mathbb{E}A_T^*A_T\|_2 = \sup_{|T|\le s}\Big\|\sum_{k=1}^m\big(A_{kT}^*A_{kT}-\mathbb{E}A_{kT}^*A_{kT}\big)\Big\|_2.$$
As noted at the beginning of the section, our RIC bound will be established if we show $X_m\le 2\tau$ with high probability.
We begin by bounding $\mathbb{E}_D X_m$, and then show that $X_m$ does not deviate much from its expectation with high probability.

Symmetrizing as in Lemma 6.3 of [56], we have
$$\mathbb{E}_D X_m \le 2\,\mathbb{E}_D\mathbb{E}_\epsilon\sup_{|T|\le s}\Big\|\sum_{k=1}^m\epsilon_k A_{kT}^*A_{kT}\Big\|_2,$$
where the $\epsilon_k$ are independent Rademacher random variables. We will now apply Lemma 3.6 in [58], which requires $m\le n$ and an upper bound on $\|A_{k*}\|_\infty$ for all $k\in[m]$.

Notice that for $k\in[m]$,
$$\|A_{k*}\|_\infty = \max_{\ell\in[n]}|A_{k\ell}| = \max_{\ell\in[n]}\frac{1}{\sqrt{m}}\big|\langle\tilde{\mathcal N}_{k*}, (F^*\Psi)_{*\ell}\rangle\big| = \max_{\ell\in[n]}\frac{1}{\sqrt{m}}\Big|\sum_{p=1}^N\tilde{\mathcal N}_{kp}(F^*\Psi)_{p\ell}\Big| \le \frac{1}{\sqrt{m}}\max_{\ell\in[n]}\sum_{p=1}^N|\tilde{\mathcal N}_{kp}||(F^*\Psi)_{p\ell}| = \frac{1}{\sqrt{m}}\max_{\ell\in[n]}\sum_{p=1}^N|(F^*\Psi)_{p\ell}| := \frac{\gamma}{\sqrt{m}}.$$
Applying Lemma 3.6 in [58] with this observation gives
$$2\mathbb{E}_\epsilon\sup_{|T|\le s}\Big\|\sum_{k=1}^m\epsilon_k A_{kT}^*A_{kT}\Big\|_2 \le Q(s,m,n)\sup_{|T|\le s}\Big\|\sum_{k=1}^m A_{kT}^*A_{kT}\Big\|_2^{1/2}.$$
We have defined
$$Q(s,m,n) := \frac{2c\gamma\sqrt{s}\log(s)\sqrt{\log(n)\log(m)}}{\sqrt{m}},$$
where $c$ is an absolute constant given in Lemma 3.6 of [58].
We may therefore bound
$$\mathbb{E}_D X_m \le Q(s,m,n)\,\mathbb{E}_D\sup_{|T|\le s}\Big\|\sum_{k=1}^m A_{kT}^*A_{kT}\Big\|_2^{1/2} \le Q(s,m,n)\Big(\mathbb{E}_D\sup_{|T|\le s}\Big\|\sum_{k=1}^m A_{kT}^*A_{kT}\Big\|_2\Big)^{1/2}$$
$$\le Q(s,m,n)\Big(\mathbb{E}_D\sup_{|T|\le s}\Big\|\sum_{k=1}^m\big(A_{kT}^*A_{kT}-\mathbb{E}_D A_{kT}^*A_{kT}\big)\Big\|_2 + \sup_{|T|\le s}\|\mathbb{E}_D A_T^*A_T\|_2\Big)^{1/2} \le Q(s,m,n)\big(\mathbb{E}_D X_m + \beta^2 + \beta^2\theta\big)^{1/2}.$$
The last inequality holds as in the proof of (2.18), since
$$\sup_{|T|\le s}\|\mathbb{E}_D A_T^*A_T\|_2 \le \|\mathbb{E}_D A^*A\|_2 = \sup_{x\in S^{n-1}}\mathbb{E}_D\langle A^*Ax, x\rangle = \sup_{x\in S^{n-1}}\mathbb{E}_D\|Ax\|_2^2 \le \beta^2+\beta^2\theta.$$
In conclusion,
$$\frac{\mathbb{E}_D X_m}{\big(\mathbb{E}_D X_m+\beta^2+\beta^2\theta\big)^{1/2}} \le \frac{2c\gamma\sqrt{s}\log(s)\sqrt{\log(n)\log(m)}}{\sqrt{m}},$$
and we may achieve $\mathbb{E}_D X_m\le\tau$ if
$$\frac{2c\gamma\sqrt{s}\log(s)\sqrt{\log(n)\log(m)}}{\sqrt{m}} \le \frac{\tau}{\sqrt{\tau+\beta^2+\beta^2\theta}}. \qquad (2.19)$$
We now apply a concentration inequality to show that $X_m$ is close to its expected value with high probability. We use the following result (Theorem 5.2 in [41], established in [87]).

Theorem 2.5.4. Let $Y_1,\dots,Y_m$ be a sequence of independent random variables with values in some Polish space $\mathcal X$. Let $\mathcal F$ be a countable collection of real-valued, measurable and bounded functions $f$ on $\mathcal X$ with $\|f\|_\infty\le B$ for all $f\in\mathcal F$. Let $Z$ be the random variable
$$Z = \sup_{f\in\mathcal F}\sum_{k=1}^m f(Y_k).$$
Assume $\mathbb{E}f(Y_k) = 0$ for all $k\in[m]$ and all $f\in\mathcal F$.
Define $\sigma^2 = \sup_{f\in\mathcal F}\mathbb{E}\sum_{k=1}^m f(Y_k)^2$. Then for all $t\ge 0$,
$$P(Z \ge \mathbb{E}Z + t) \le \exp\left(-\frac{t}{4B}\log\Big(1+2\log\Big(1+\frac{Bt}{2B\,\mathbb{E}Z+\sigma^2}\Big)\Big)\right).$$
To apply the theorem, let
$$D_s := \{x\in S^{n-1} : \|x\|_0\le s\}$$
and
$$\mathcal X := \Big\{Z\in\mathbb{C}^{n\times n} : \langle Zx, x\rangle\in\mathbb{R}\ \forall x\in D_s\ \text{and}\ \sup_{x\in D_s}\langle Zx, x\rangle \le \frac{\gamma^2 s}{m}\Big\},$$
which is a closed subset of $\mathbb{C}^{n\times n}$ (a Polish space), and therefore itself a Polish space (see for example [31]). For $x\in D_s$, we define the function $f_x:\mathcal X\mapsto\mathbb{R}$ as
$$f_x(Z) := \langle Zx, x\rangle,$$
which is real-valued by definition of $\mathcal X$. Then notice that
$$X_m = \sup_{|T|\le s}\Big\|\sum_{k=1}^m\big(A_{kT}^*A_{kT}-\mathbb{E}A_{kT}^*A_{kT}\big)\Big\|_2 = \sup_{|T|\le s}\sup_{x\in S^{s-1}}\sum_{k=1}^m\langle(A_{kT}^*A_{kT}-\mathbb{E}A_{kT}^*A_{kT})x, x\rangle$$
$$= \sup_{x\in D_s}\sum_{k=1}^m\langle(A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*})x, x\rangle = \sup_{f_x,\,x\in D_s}\sum_{k=1}^m f_x\big(A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*}\big) = \sup_{f_x,\,x\in D_s^*}\sum_{k=1}^m f_x\big(A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*}\big),$$
where $D_s^*$ is a dense countable subset of $D_s$. Furthermore, by the first part of Lemma 2.5.2, for all $k\in[m]$ and $x\in D_s$,
$$f_x\big(A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*}\big) = |\langle A_{k*}, x\rangle|^2 - \mathbb{E}|\langle A_{k*}, x\rangle|^2 \le \frac{\gamma^2 s}{m},$$
so that $\{A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*}\}_{k=1}^m\subset\mathcal X$.

We have thus met all the assumptions of the theorem with $B := \frac{\gamma^2 s}{m}$, and need only compute $\sigma^2$.
To this end we apply the second part of Lemma 2.5.2 and obtain, for any $x\in D_s$,
$$\mathbb{E}\sum_{k=1}^m f_x\big(A_{k*}^*A_{k*}-\mathbb{E}A_{k*}^*A_{k*}\big)^2 = \mathbb{E}\sum_{k=1}^m\Big(|\langle A_{k*}, x\rangle|^4 - \big(\mathbb{E}|\langle A_{k*}, x\rangle|^2\big)^2\Big) \le \mathbb{E}\sum_{k=1}^m|\langle A_{k*}, x\rangle|^4 \le \frac{s\beta^2\gamma^2(1+2\theta)}{m} := \sigma^2.$$
To finish, apply Theorem 2.5.4 with $t = \tau > 0$ and assume
$$\frac{\sqrt{m}}{\sqrt{\log(m)}} \ge 2c\gamma\sqrt{s}\log(s)\sqrt{\log(n)}\,\frac{\sqrt{\beta^2+\beta^2\theta+\tau}}{\tau},$$
which gives $\mathbb{E}X_m\le\tau$ according to (2.19). Then
$$X_m \le \mathbb{E}X_m + t \le 2\tau$$
with probability exceeding
$$1-\exp\left(-\frac{m\tau}{4s\gamma^2}\log\Big(1+2\log\Big(1+\frac{\tau^2}{2\tau+\beta^2(1+2\theta)}\Big)\Big)\right).$$
Thus, by our remarks at the beginning of the section, $A := \frac{\sqrt{N}}{\sqrt{m}}S\Psi$ satisfies the RIP inequality with constant $2\tau+\max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}$ and the derived probability.

2.5.2 Proof of Theorem 2.3.3

We finish by applying the RIC bound established in the previous section to well known results in compressive sensing, obtaining our recovery error bound.

Proof of Theorem 2.3.3. Let $f = \Psi g$ and $e = \tilde f - Sf$. Then for any $\bar g\in\mathbb{C}^n$,
$$\|S\Psi\bar g - \tilde f - d\|_2 = \|S\Psi\bar g - (Sf+e) - d\|_2 = \|S\Psi\bar g - S\Psi g - e - d\|_2.$$
Thus, $g^\sharp$ equivalently solves
$$\min_{\bar g\in\mathbb{C}^n}\|\bar g\|_1 \ \text{ subject to }\ \|S\Psi\bar g - S\Psi g - e - d\|_2 \le 2\sqrt{m}\sum_{|\ell|\ge\frac{N-1}{2}}|c_\ell| + \eta$$
$$= \min_{\bar g\in\mathbb{C}^n}\|\bar g\|_1 \ \text{ subject to }\ \Big\|A(\bar g - g) - \frac{\sqrt{N}}{\sqrt{m}}(e+d)\Big\|_2 \le \frac{\sqrt{N}}{\sqrt{m}}\Big(2\sqrt{m}\sum_{|\ell|\ge\frac{N-1}{2}}|c_\ell| + \eta\Big),$$
where as before $A := \frac{\sqrt{N}}{\sqrt{m}}S\Psi$.
Notice that $g$ is feasible, since
$$\frac{\sqrt{N}}{\sqrt{m}}\|e+d\|_2 \le \frac{\sqrt{N}}{\sqrt{m}}\|e\|_2 + \frac{\sqrt{N}}{\sqrt{m}}\|d\|_2 \le \frac{\sqrt{N}}{\sqrt{m}}\Big(2\sqrt{m}\sum_{|\ell|\ge\frac{N-1}{2}}|c_\ell|\Big) + \frac{\sqrt{N}\eta}{\sqrt{m}},$$
where the last inequality follows since $\|e\|_2\le 2\sqrt{m}\sum_{|\ell|\ge\frac{N-1}{2}}|c_\ell|$ by Theorem 2.2.1. Apply Theorem 2.5.3 with
$$\tau := \frac{1}{4\sqrt{2}} - \frac{\max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}}{4} > 0,$$
which is positive by our assumptions on $\theta$, $\alpha$ and $\beta$. Then if
$$\frac{\sqrt{m}}{\sqrt{\log(m)}} \ge 2c\gamma\sqrt{2s}\log(2s)\sqrt{\log(n)}\,\frac{\sqrt{\beta^2+\beta^2\theta+\tau}}{\tau}, \qquad (2.20)$$
$A$ satisfies the $2s$-restricted isometry property with constant satisfying
$$\delta_{2s} \le 2\tau + \max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\} := \frac{1}{2\sqrt{2}} + \frac{\max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}}{2} < \frac{1}{\sqrt{2}}$$
and probability exceeding
$$1-\exp\left(-\frac{m\tau}{8s\gamma^2}\log\Big(1+2\log\Big(1+\frac{\tau^2}{2\tau+\beta^2(1+2\theta)}\Big)\Big)\right).$$
Since $\delta_{2s}<\frac{1}{\sqrt{2}}$, Theorem 2.1 in [89] (the case $t=2$, or see Theorem 2.5.1) and the singular values of $\Psi$ give
$$\|f-\Psi g^\sharp\|_2 = \|\Psi(g-g^\sharp)\|_2 \le \beta\|g-g^\sharp\|_2 \le \frac{\beta\tilde c_2\,\epsilon_s(g)}{\sqrt{s}} + \frac{\beta\tilde c_3\sqrt{N}}{\sqrt{m}}\Big(2\sqrt{m}\sum_{|\ell|\ge\frac{N-1}{2}}|c_\ell| + \eta\Big), \qquad (2.21)$$
where $\tilde c_2, \tilde c_3 > 0$ depend only on $\frac{1}{2\sqrt{2}}+\frac{\max\{\beta^2\theta+\beta^2-1,\ \alpha^2\theta+1-\alpha^2\}}{2}$ ($\tilde c_2, \tilde c_3$ are given in [89]; see Theorem 2.5.1).

In particular, since $s\le n$ and by assumption $m\le n$, if
$$\sqrt{m} \ge 2c\gamma\sqrt{2s}\log^2(2n)\,\frac{\sqrt{\beta^2+\beta^2\theta+\tau}}{\tau}, \qquad (2.22)$$
then (2.20) holds.
In the statement of the theorem we have absorbed all the absolute constants in this expression into the definition of $\tilde c_1$.

2.6 Interpolation Error of Dirichlet Kernel: Proof

In this section we derive the error term of our interpolation operator when applied to our signal model (Theorem 2.2.1), along with the error bound given in Corollary 2.3.2.

Proof of Theorem 2.2.1. We begin by showing that if $\tilde t_k = t_{\tilde p}$ for some $\tilde p\in[N]$ (i.e., our "nonuniform" sample lies on the dense interpolation grid), then the error is zero. This is easy to see by orthogonality of the complex exponentials: combining (2.2) and (2.3) (recall that $\tilde N = \frac{N-1}{2}$), we have
$$(Sf)_k = \langle f, S_{k*}\rangle = \sum_{p=1}^N f_p S_{kp} := \frac{1}{N}\sum_{p=1}^N f_p\Big(\sum_{u=-\tilde N}^{\tilde N}e(ut_p)e(-u\tilde t_k)\Big) := \frac{1}{N}\sum_{p=1}^N f_p\Big(\sum_{u=-\tilde N}^{\tilde N}e(ut_p)e(-ut_{\tilde p})\Big) = f_{\tilde p} = f(t_{\tilde p}) = f(\tilde t_k) = \tilde f_k.$$
This establishes (2.5).

We now deal with the general case (2.6). Recall the Fourier expansion of our underlying function,
$$f(x) = \sum_{\ell=-\infty}^{\infty}c_\ell e(\ell x).$$
Again, using (2.2), (2.3) and the Fourier expansion at $f(t_p) = f_p$, we obtain
$$(Sf)_k = \langle f, S_{k*}\rangle = \sum_{p=1}^N f_p S_{kp} := \frac{1}{N}\sum_{p=1}^N\Big(\sum_{\ell=-\infty}^{\infty}c_\ell e(\ell t_p)\Big)\Big(\sum_{u=-\tilde N}^{\tilde N}e(ut_p)e(-u\tilde t_k)\Big).$$
At this point, we wish to switch the order of summation and sum over all $p\in[N]$. We must assume the corresponding summands are non-zero. To this end, we continue assuming $f_p, S_{kp}\ne 0$ for all $p\in[N]$, and deal with the remaining cases separately afterward.
In particular, we will remove this assumption for the $f_p$'s and show that $S_{kp}\ne 0$ under our assumption $\tilde\tau\subset\Omega$.

Proceeding, we may now sum over all $p\in[N]$, since these summands are non-zero, and switch the order of summation to obtain
$$(Sf)_k = \frac{1}{N}\sum_{u=-\tilde N}^{\tilde N}\sum_{\ell=-\infty}^{\infty}c_\ell e(-u\tilde t_k)\sum_{p=1}^N e((u+\ell)t_p) = \sum_{u=-\tilde N}^{\tilde N}\sum_{j=-\infty}^{\infty}(-1)^{jN}c_{jN+u}\,e(u\tilde t_k) = \sum_{j=-\infty}^{\infty}(-1)^{\lfloor\frac{j+\tilde N}{N}\rfloor}c_j\,e(r(j)\tilde t_k).$$
The second equality is obtained by orthogonality of the exponential basis functions: $\sum_{p=1}^N e((u+\ell)t_p) = 0$ when $\ell+u\notin N\mathbb{Z}$, and otherwise equals $N(-1)^{jN}$ for some $j\in\mathbb{Z}$ (where $u+\ell = jN$). The last equality results from a reordering of the series, where the mapping $r$ is defined as in the statement of Theorem 2.2.1. To illustrate, we consider $j\ge 0$ (for simplicity) and first notice that $(-1)^{jN} = (-1)^j$, since $N$ is assumed to be odd in Section 2.2.1.

Expanding the previous sum schematically gives
$$\sum_{u=-\tilde N}^{\tilde N}\sum_{j=0}^{\infty}(-1)^j c_{jN+u}\,e(u\tilde t_k) = e(-\tilde N\tilde t_k)\big(c_{-\tilde N}, -c_{N-\tilde N}, c_{2N-\tilde N}, \cdots\big) + e((-\tilde N+1)\tilde t_k)\big(c_{-\tilde N+1}, -c_{N-\tilde N+1}, c_{2N-\tilde N+1}, \cdots\big)$$
$$\cdots + \big(c_0, -c_N, c_{2N}, \cdots\big) + \cdots + e(\tilde N\tilde t_k)\big(c_{\tilde N}, -c_{N+\tilde N}, c_{2N+\tilde N}, \cdots\big).$$
Notice that in the first line we have indices $N-\tilde N = \tilde N+1$ and $2N-\tilde N = N+\tilde N+1$. Therefore, if we start at the term $c_{-\tilde N}$ and traverse this array of Fourier coefficients, we obtain the ordered sequence $\{(-1)^{\lfloor\frac{j+\tilde N}{N}\rfloor}c_j\}_{j=-\tilde N}^{\infty}$ (with no repetitions).
The coefficients in row $q\in[N]$ correspond to frequency value $-\tilde N+q-1$ and have indices of the form $c_{pN-\tilde N+q-1}$ for some $p\in\mathbb{N}$. Let us check that for a given index the mapping $r$ gives the correct frequency value, i.e., $r(pN-\tilde N+q-1) = -\tilde N+q-1$ for all $q\in[N]$:
$$r(pN-\tilde N+q-1) := \mathrm{rem}(pN-\tilde N+q-1+\tilde N,\ N)-\tilde N = \mathrm{rem}(pN+q-1,\ N)-\tilde N = q-1-\tilde N.$$
We can therefore reorder the series as desired, and incorporate the sum over $j<0$ via the same logic to establish the equality. Since for $\ell\in\{-\tilde N, -\tilde N+1, \cdots, \tilde N\}$ we have $r(\ell) = \ell$ and $(-1)^{\lfloor\frac{\ell+\tilde N}{N}\rfloor} = 1$, we finally obtain
$$f(\tilde t_k) - (Sf)_k = \sum_{|\ell|>\tilde N}c_\ell\Big(e(\ell\tilde t_k) - (-1)^{\lfloor\frac{\ell+\tilde N}{N}\rfloor}e(r(\ell)\tilde t_k)\Big).$$
The definition of $\ell_p$-norms along with the triangle inequality gives the remaining claim. In particular,
$$\|\tilde f - Sf\|_p = \Big(\sum_{k=1}^m\big|f(\tilde t_k)-(Sf)_k\big|^p\Big)^{1/p} = \Big(\sum_{k=1}^m\Big|\sum_{|\ell|>\tilde N}c_\ell\big(e(\ell\tilde t_k)-(-1)^{\lfloor\frac{\ell+\tilde N}{N}\rfloor}e(r(\ell)\tilde t_k)\big)\Big|^p\Big)^{1/p} \le \Big(\sum_{k=1}^m\Big(\sum_{|\ell|>\tilde N}2|c_\ell|\Big)^p\Big)^{1/p} = \Big(m\Big(\sum_{|\ell|>\tilde N}2|c_\ell|\Big)^p\Big)^{1/p} = 2m^{1/p}\sum_{|\ell|>\tilde N}|c_\ell|.$$
This finishes the proof in the case $f_p, S_{kp}\ne 0$ for all $p\in[N]$. To remove this condition on the $f_p$'s, we may find a real number $\mu$ such that the function
$$h(x) := f(x)+\mu = \sum_{\ell\in\mathbb{Z}\setminus\{0\}}c_\ell e(\ell x) + c_0 + \mu$$
is non-zero when $x\in\{t_p\}_{p=1}^N$. In particular, notice that if we define $h = f+\mu 1_N\in\mathbb{C}^N$, then $h_p\ne 0$ for all $p\in[N]$.
Therefore, assuming now only that $S_{kp}\ne 0$ for $p\in[N]$, the previous argument can be applied to conclude
$$h(\tilde t_k) - (Sh)_k = \sum_{|\ell|>\tilde N}c_\ell\Big(e(\ell\tilde t_k)-(-1)^{\lfloor\frac{\ell+\tilde N}{N}\rfloor}e(r(\ell)\tilde t_k)\Big).$$
However, if $1_N\in\mathbb{C}^N$ denotes the all-ones vector and $e_{\tilde N+1}\in\mathbb{C}^N$ is the $(\tilde N+1)$-th standard basis vector, notice that
$$(Sh)_k = \langle S_{k*}, h\rangle = \langle S_{k*}, f\rangle + \mu\langle S_{k*}, 1_N\rangle = \langle S_{k*}, f\rangle + \mu\langle\mathcal N_{k*}, F^*1_N\rangle = \langle S_{k*}, f\rangle + \mu\sqrt{N}\langle\mathcal N_{k*}, e_{\tilde N+1}\rangle = \langle S_{k*}, f\rangle + \mu = (Sf)_k + \mu.$$
The fourth equality holds by orthogonality of $F^*$ and since $F^*_{(\tilde N+1)*} = \frac{1}{\sqrt{N}}1_N$. Therefore
$$h(\tilde t_k) - (Sh)_k = f(\tilde t_k) + \mu - \big((Sf)_k + \mu\big) = f(\tilde t_k) - (Sf)_k,$$
and the claim holds in this case as well.

The assumption $S_{kp}\ne 0$ always holds under our assumption $\tilde\tau\subset\Omega$, i.e., $\tilde t_k\in[-\frac{1}{2},\frac{1}{2})$ for all $k\in[m]$. We show this by deriving the conditions under which $S_{kp} = 0$ occurs. As noted before, we have
$$S_{kp} := \sum_{u=-\tilde N}^{\tilde N}e(u(t_p-\tilde t_k)) = \sum_{u=0}^{N-1}e(u(t_p-\tilde t_k))\,e(-\tilde N(t_p-\tilde t_k)) = e(-\tilde N(t_p-\tilde t_k))\,\frac{1-e(N(t_p-\tilde t_k))}{1-e(t_p-\tilde t_k)},$$
and we see that $S_{kp} = 0$ iff $N(t_p-\tilde t_k)\in\mathbb{Z}\setminus\{0\}$ and $t_p-\tilde t_k\notin\mathbb{Z}$. However, notice that
$$N(t_p-\tilde t_k) = N\Big(\frac{p-1}{N}-\frac{k-1}{m}-\Delta_k\Big) = p-1-\frac{N(k-1)}{m}-N\Delta_k,$$
so that $N(t_p-\tilde t_k)\in\mathbb{Z}\setminus\{0\}$ iff $\frac{N(k-1)}{m}+N\Delta_k = N\tilde t_k+\frac{N}{2}\in\mathbb{Z}\setminus\{p-1\}$. This condition equivalently gives that $\tilde t_k = \frac{j}{N}-\frac{1}{2}$ for some $j\in\mathbb{Z}\setminus\{p-1\}$.
Since this must hold for all $p \in [N]$, we have that $N(t_p - \tilde t_k) \in \mathbb Z \setminus \{0\}$ iff $\tilde t_k = \frac jN - \frac12$ for some $j \in \mathbb Z \setminus \{0, 1, \cdots, N-1\}$. Such a condition would imply that $\tilde t_k \notin \Omega := [-\frac12, \frac12)$, which violates the assumptions of Theorem 2.2.1. This finishes the proof.

We end this section with the proof of Corollary 2.3.2.

Proof of Corollary 2.3.2. The proof consists of applying Corollary 2.3.1 (since we have adopted its assumptions) and Theorem 2.2.1. By Corollary 2.3.1, we have
\[
\|f - \Psi g^\sharp\|_2 \le \frac{c_2\, \epsilon_s(g)}{\sqrt s} + \frac{c_3 \sqrt N \eta}{\sqrt m} + 2 c_3 \sqrt N \sum_{|\ell| > \frac{N-1}{2}} |c_\ell|
\]
with probability exceeding (2.13), and by Theorem 2.2.1, for $x \in \Omega$,
\[
f(x) - \langle h(x), F^* f \rangle = \sum_{|\ell| > \tilde N} c_\ell \left( e(\ell x) - (-1)^{\lfloor \frac{\ell + \tilde N}{N} \rfloor} e(r(\ell) x) \right).
\]
Therefore
\[
|f(x) - f^\sharp(x)| := |f(x) - \langle h(x), F^* \Psi g^\sharp \rangle|
\le |f(x) - \langle h(x), F^* f \rangle| + |\langle h(x), F^* f \rangle - \langle h(x), F^* \Psi g^\sharp \rangle|
\le \Bigg| \sum_{|\ell| > \tilde N} c_\ell \left( e(\ell x) - (-1)^{\lfloor \frac{\ell + \tilde N}{N} \rfloor} e(r(\ell) x) \right) \Bigg| + \|h(x)\|_2 \|F^*(f - \Psi g^\sharp)\|_2
\le 2 \sum_{|\ell| > \tilde N} |c_\ell| + \frac{c_2\, \epsilon_s(g)}{\sqrt s} + \frac{c_3 \sqrt N \eta}{\sqrt m} + 2 c_3 \sqrt N \sum_{|\ell| > \frac{N-1}{2}} |c_\ell|.
\]
The last inequality holds since $\|h(x)\|_2 = 1$ (here $x$ is considered fixed and $h(x) \in \mathbb C^N$). This finishes the proof.

2.7 Proof of Lemma 2.5.2

We end the required proofs of this chapter by establishing Lemma 2.5.2. This lemma was used in the RIC bound to apply the respective concentration inequality.

Proof of Lemma 2.5.2.
We upper bound $|\langle A_{k*}, v \rangle|$ as follows:
\[
|\langle A_{k*}, v \rangle| = \left| \left\langle \tfrac{1}{\sqrt m} \tilde N_{k*}, F^* \Psi v \right\rangle \right|
= \frac{1}{\sqrt m} \left| \sum_{\ell=1}^N \tilde N_{k\ell} (F^* \Psi v)_\ell \right|
\le \frac{1}{\sqrt m} \sum_{\ell=1}^N |\tilde N_{k\ell}|\, |(F^* \Psi v)_\ell|
= \frac{1}{\sqrt m} \sum_{\ell=1}^N |(F^* \Psi v)_\ell|
= \frac{1}{\sqrt m} \sum_{\ell=1}^N \left| \sum_{p=1}^n (F^* \Psi)_{\ell p}\, v_p \right|
= \frac{1}{\sqrt m} \sum_{\ell=1}^N \left| \sum_{p=1}^n \langle F_{*\ell}, \Psi_{*p} \rangle v_p \right|
\le \frac{1}{\sqrt m} \sum_{p=1}^n \sum_{\ell=1}^N |\langle F_{*\ell}, \Psi_{*p} \rangle|\, |v_p|
\le \frac{1}{\sqrt m} \left( \max_{q \in [n]} \sum_{\ell=1}^N |\langle F_{*\ell}, \Psi_{*q} \rangle| \right) \sum_{p=1}^n |v_p|
:= \frac{\gamma \|v\|_1}{\sqrt m} \le \frac{\gamma \sqrt s\, \|v\|_2}{\sqrt m},
\]
where we used $|\tilde N_{k\ell}| = 1$. Note that we have shown $\|F^* \Psi v\|_1 \le \gamma \sqrt s \|v\|_2$ (this will be used in the proof of the remaining claim).

We now bound the sum of the fourth moments, using our deviation model as in the proof of (2.18). Throughout, let $\tilde\Delta \in \mathbb R$ be an independent copy of the entries of $\Delta \in \mathbb R^m$.
Expanding and taking expectations, we obtain
\[
\mathbb E \sum_{k=1}^m |\langle A_{k*}, v \rangle|^4
= \frac{1}{m^2} \mathbb E \sum_{k=1}^m |\langle \tilde N_{k*}, F^* \Psi v \rangle|^4
= \frac{1}{m^2} \mathbb E \sum_{k=1}^m \left| \sum_{\ell=1}^N \tilde N_{k\ell} (F^* \Psi v)_\ell \right|^4
= \frac{1}{m^2} \mathbb E \sum_{k=1}^m \left( \sum_{\ell_1, \ell_2, \ell_3, \ell_4 = 1}^N \tilde N_{k\ell_1} \overline{\tilde N_{k\ell_2}} \tilde N_{k\ell_3} \overline{\tilde N_{k\ell_4}}\, (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}} \right)
\]
\[
:= \sum_{\ell_1, \ell_2, \ell_3, \ell_4 = 1}^N \frac{1}{m^2} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}} \left( \mathbb E \sum_{k=1}^m e(\tilde t_k(\ell_1 - \ell_2 + \ell_3 - \ell_4)) \right)
\]
\[
= \sum_{\ell_1 - \ell_2 = \ell_4 - \ell_3} \frac{1}{m} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}}
+ \sum_{j=1}^{\lfloor 2(N-1)/m \rfloor} \sum_{\ell_1 - \ell_2 = \ell_4 - \ell_3 + jm} \frac{1}{m} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}}\, \mathbb E\, e(jm(\tilde\Delta - 1/2))
+ \sum_{j=1}^{\lfloor 2(N-1)/m \rfloor} \sum_{\ell_1 - \ell_2 = \ell_4 - \ell_3 - jm} \frac{1}{m} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}}\, \mathbb E\, e(-jm(\tilde\Delta - 1/2)).
\]
The last equality holds as in the proof of (2.18), where
\[
\mathbb E \frac{1}{m} \sum_{k=1}^m e(\tilde t_k(\ell_1 - \ell_2 + \ell_3 - \ell_4)) =
\begin{cases}
1 & \text{if } \ell_1 - \ell_2 + \ell_3 - \ell_4 = 0, \\
\mathbb E\, e(jm(\tilde\Delta - 1/2)) & \text{if } \ell_1 - \ell_2 + \ell_3 - \ell_4 = jm, \ j \in \mathbb Z \setminus \{0\}, \\
0 & \text{otherwise.}
\end{cases}
\]
We bound each of these last three terms.
The first term can be bounded as
\[
\left| \sum_{\ell_1 - \ell_2 = \ell_4 - \ell_3} \frac{1}{m} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}} \right|
= \left| \frac{1}{m} \sum_{\ell_3, \ell_4 = 1}^N (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}} \sum_{\ell_2 \in Q_{\ell_3, \ell_4}} (F^* \Psi v)_{\ell_4 + \ell_2 - \ell_3} \overline{(F^* \Psi v)_{\ell_2}} \right|
\le \frac{1}{m} \sum_{\ell_3, \ell_4 = 1}^N |(F^* \Psi v)_{\ell_3}|\, |(F^* \Psi v)_{\ell_4}| \left| \sum_{\ell_2 \in Q_{\ell_3, \ell_4}} (F^* \Psi v)_{\ell_4 + \ell_2 - \ell_3} \overline{(F^* \Psi v)_{\ell_2}} \right|
\le \frac{1}{m} \sum_{\ell_3, \ell_4 = 1}^N |(F^* \Psi v)_{\ell_3}|\, |(F^* \Psi v)_{\ell_4}|\, |\langle F^* \Psi v, F^* \Psi v \rangle|
= \frac{1}{m} \|F^* \Psi v\|_2^2 \|F^* \Psi v\|_1^2
= \frac{1}{m} \|\Psi v\|_2^2 \|F^* \Psi v\|_1^2
\le \frac{1}{m} \beta^2 \|v\|_2^4 \gamma^2 s.
\]
In the first equality, $Q_{\ell_3, \ell_4}$ is the set of allowed index values for $\ell_2$ given $\ell_3$ and $\ell_4$, i.e., such that $\ell_2 \in [N]$ and $\ell_4 + \ell_2 - \ell_3 \in [N]$. The last bound consists of bounding $\|F^* \Psi v\|_1 \le \gamma \sqrt s \|v\|_2$ as in the computation of $|\langle A_{k*}, v \rangle|$, and $\|\Psi v\|_2 \le \beta \|v\|_2$ from the singular values of $\Psi$. The remaining terms can be bounded similarly.
Using our deviation model, notice that
\[
\sum_{j=1}^{\lfloor 2(N-1)/m \rfloor} \left| \mathbb E\, e(\pm jm(\tilde\Delta - 1/2)) \right| \le \sum_{j=1}^{\lfloor 2(N-1)/m \rfloor} \frac{\theta m}{2N} \le \theta,
\]
and therefore we can similarly obtain
\[
\left| \sum_{j=1}^{\lfloor 2(N-1)/m \rfloor} \sum_{\ell_1 - \ell_2 = \ell_4 - \ell_3 \pm jm} \frac{1}{m} (F^* \Psi v)_{\ell_1} \overline{(F^* \Psi v)_{\ell_2}} (F^* \Psi v)_{\ell_3} \overline{(F^* \Psi v)_{\ell_4}}\, \mathbb E\, e(\pm jm(\tilde\Delta - 1/2)) \right|
\le \frac{\theta}{m} \beta^2 \|v\|_2^4 \gamma^2 s.
\]
This gives
\[
\mathbb E \sum_{k=1}^m |\langle A_{k*}, v \rangle|^4 \le \frac{1 + 2\theta}{m} \beta^2 \|v\|_2^4 \gamma^2 s,
\]
as desired.

2.8 DFT-incoherence of Discretized Smooth Functions

The goal of this section is to informally bound the DFT-incoherence parameter ($\gamma$) for $\Psi$ whose columns are discretizations of smooth functions (with at least one piecewise continuous derivative). This is example 4) discussed in Section 2.4.2. This section will also provide useful intuition for our results in Chapter 3 (see Section 3.4.1).

These computations are not crucial in any way for the main result; they are included only to build intuition for the reader and to establish the practicality of the main results. With this in mind, the following is a very informal argument for the sake of brevity. It follows the lecture notes of Dr. Lei Li at Duke University [54] (an introduction to numerical PDEs), and we refer the reader to these notes for a formal argument, proofs and references therein.

Recall that for $\Psi \in \mathbb C^{N \times n}$, the DFT-incoherence parameter $\gamma$ is defined in Section 2.2.1 as
\[
\gamma = \max_{\ell \in [n]} \sum_{k=1}^N |\langle F_{*k}, \Psi_{*\ell} \rangle|.
\]
Let $\Psi$ have columns that are uniform discretizations of $p$-differentiable functions with $p - 1$ continuous and periodic derivatives (with period 1) and with $p$-th derivative that is piecewise continuous ($p \ge 1$).
In this section we argue that:

• $\gamma \sim O(\log(N))$ if $p = 1$,

• $\gamma \sim O(1)$ if $p \ge 2$.

To this end, for $k \in [N]$ and $\ell \in [n]$ we may assume that
\[
\Psi_{k\ell} = g_\ell\left( \frac{k-1}{N} - \frac12 \right) := g_\ell(t_k),
\]
where each $g_\ell : \mathbb R \to \mathbb C$ is a function such that $g_\ell \in C^{p-1}(\Omega)$ (has $p-1$ continuous derivatives), $g_\ell^{(p)}$ is piecewise continuous, and $g_\ell^{(j)}(-\frac12) = g_\ell^{(j)}(\frac12)$ for all derivatives $j \in \{0, 1, \cdots, p-1\}$ (i.e., has $p-1$ periodic derivatives).

Let the Fourier coefficients of each $g_\ell$ be given as
\[
a_u^{(\ell)} := \int_{-1/2}^{1/2} g_\ell(x) e(ux)\, dx
\]
for $u \in \mathbb Z$. Further, denote the discrete Fourier coefficients as
\[
b_v^{(\ell)} = \sum_{q=1}^N g_\ell\left( \frac{q-1}{N} - \frac12 \right) e\left( v\left( \frac{q-1}{N} - \frac12 \right) \right) = \sum_{q=1}^N \Psi_{q\ell}\, e(v t_q) := \sqrt N \langle \Psi_{*\ell}, F_{*(v + \tilde N + 1)} \rangle
\]
for $v \in \{\frac{1-N}{2}, \cdots, \frac{N-1}{2}\} := \{-\tilde N, -\tilde N + 1, \cdots, \tilde N - 1, \tilde N\}$ (recall that $\tilde N := \frac{N-1}{2}$).

By Corollary 2 in [54] (notice our distinct normalization here), we have that
\[
\left| a_u^{(\ell)} - \frac{1}{\sqrt N} b_u^{(\ell)} \right| \le \frac{\omega_1}{N^p} \qquad (2.23)
\]
for all $u \in \{-\tilde N, -\tilde N + 1, \cdots, \tilde N - 1, \tilde N\}$, where $0 < \omega_1 < \infty$ is a constant that depends on the $g_\ell$'s.
Furthermore, by Theorem 2 in [54] we have that
\[
|a_u^{(\ell)}| \le \frac{\omega_2}{|u|^p} \qquad (2.24)
\]
for all $u \in \{-\tilde N, -\tilde N + 1, \cdots, \tilde N - 1, \tilde N\}$, where $0 < \omega_2 < \infty$ is a constant that depends on the $g_\ell$'s.

Applying (2.23) and (2.24), for each $\ell \in [n]$ we have
\[
\sum_{k=1}^N |\langle F_{*k}, \Psi_{*\ell} \rangle| := \sum_{k=1}^N \left| \frac{1}{\sqrt N} b_{k - \tilde N - 1}^{(\ell)} \right|
= \sum_{k = -\tilde N}^{\tilde N} \left| \frac{1}{\sqrt N} b_k^{(\ell)} \right|
\le \sum_{k = -\tilde N}^{\tilde N} \left| \frac{1}{\sqrt N} b_k^{(\ell)} - a_k^{(\ell)} \right| + \sum_{k = -\tilde N}^{\tilde N} |a_k^{(\ell)}|
\le \sum_{k = -\tilde N}^{\tilde N} \frac{\omega_1}{N^p} + \sum_{k = -\tilde N}^{\tilde N} \frac{\omega_2}{|k|^p}
\le \omega_1 + \sum_{k = -\tilde N}^{\tilde N} \frac{\omega_2}{|k|^p}.
\]
The last inequality holds since for $p \ge 1$ we have $\sum_{k = -\tilde N}^{\tilde N} \frac{1}{N^p} \le \sum_{k = -\tilde N}^{\tilde N} \frac1N = 1$. For the remaining sum, if $p = 1$ we have
\[
\sum_{k = -\tilde N}^{\tilde N} \frac{1}{|k|} \sim O(\log(N))
\]
by an integral comparison, and
\[
\sum_{k = -\tilde N}^{\tilde N} \frac{1}{|k|^p} \sim O(1)
\]
if $p \ge 2$ (again by an integral comparison). This holds for all $\ell \in [n]$, which gives the desired bounds for $\gamma$. This finishes our argument.

2.9 Numerical Experiments

In this section we present numerical experiments that elaborate several aspects of our methodology and results. We first introduce several terms and models to describe the setup of the experiments. Throughout, we let $N = 2015$ be the uniformly discretized signal size.

Our methodology (2.9) is solved by SPGL1 [20], a Pareto curve approach that uses duality theory to solve the basis pursuit problem via a sequence of numerically cheaper LASSO subproblems. Each basis pursuit problem is solved by limiting the number of SPGL1 iterations to 200.

We implement the Dirichlet kernel using (2.4) directly to construct a Spot operator [24], a matrix-free linear operator toolbox that is compatible with SPGL1.
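To make the interpolation operator concrete, here is a minimal NumPy sketch (our own illustrative code, not the Spot/SPGL1 implementation used in the experiments) of the action $f \mapsto Sf$ via the direct, unaccelerated route: a centered DFT of the uniform samples followed by an NDFT evaluation on the nonuniform points. The sign convention of $e(\cdot)$ is our own choice.

```python
import numpy as np

def dirichlet_interp(f, t_nonuniform):
    """Evaluate the Dirichlet-kernel interpolant S f of the length-N
    (N odd) equispaced samples f at points t_nonuniform in [-1/2, 1/2)."""
    N = len(f)
    Ntil = (N - 1) // 2
    t = np.arange(N) / N - 0.5                 # uniform grid t_p
    u = np.arange(-Ntil, Ntil + 1)             # centered frequency band
    # centered inverse DFT: Fourier coefficients of the uniform samples
    fhat = np.exp(2j * np.pi * np.outer(u, t)) @ np.asarray(f) / N
    # NDFT: evaluate the bandlimited expansion at the nonuniform points
    return np.exp(-2j * np.pi * np.outer(np.asarray(t_nonuniform), u)) @ fhat
```

For a bandlimited input the interpolation is exact: sampling a single complex exponential of in-band frequency on the uniform grid and evaluating at arbitrary points reproduces it to machine precision. This direct version costs $O(mN)$ per application; the NFFT route recommended above computes the same operator in $O(N \log N + m)$.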
We warn the reader that in this section we have not dedicated much effort to optimizing the numerical complexity of the interpolation kernel. For a faster implementation, we recommend instead applying the Fourier transform representation $S = NF^*$ (see Section 2.2.3) using the NFFT 3 software from [45] or its parallel counterpart [62].

Given output $f^\sharp = \Psi g^\sharp$ of any of our programs with true solution $f$, we consider the relative reconstruction error norm as a measure of output quality, given as
\[
\text{Relative Reconstruction Error} = \frac{\|f^\sharp - f\|_2}{\|f\|_2}.
\]

Grid perturbations: To construct the nonuniform grid $\tilde\tau$, we introduce an irregularity parameter $\rho \in \mathbb R^+$. We define our perturbations by sampling from a uniform distribution, so that each $\Delta_k$ is drawn uniformly at random from $[-\frac\rho m, \frac\rho m]$ for all $k \in [m]$ independently. $\tilde\tau$ is generated independently for each signal reconstruction experiment.

Complex exponential signal model: We consider bandlimited complex exponentials with random harmonic frequencies. With signal size $N = 2015$, bandlimit $\omega = \frac{N-1}{2} = 1007$, and sparsity level $s = 50$, we generate $\vec\omega \in \mathbb Z^s$ by choosing $s$ frequencies uniformly at random from $\{-\omega, -\omega + 1, \cdots, \omega\}$ and letting
\[
f(x) = \sum_{k=1}^s e(\vec\omega_k x).
\]
We use the DFT as a sparsifying transform $\Psi = F$, so that $g = \Psi^* f = \Psi^{-1} f$ is indeed a 50-sparse vector. This transform is implemented as a Spot operator which utilizes MATLAB's fft function. The frequency vector $\vec\omega$ is generated randomly for each independent set of experiments. Note that in this case we have optimal DFT-incoherence parameter $\gamma = 1$ (see Section 2.4.2).

Gaussian signal model: We consider a non-bandlimited signal consisting of sums of Gaussian functions.
With signal size $N = 2015$, this signal model is defined as
\[
f(x) = -e^{-100x^2} + e^{-100(x - .104)^2} - e^{-100(x + .217)^2}.
\]
For this dataset, we use the Daubechies 2 wavelet as a sparsifying transform $\Psi$. This operator is implemented as a Spot operator which utilizes the Rice Wavelet Toolbox [79]. This provides $g = \Psi^* f = \Psi^{-1} f$ that can be well approximated by a 50-sparse vector. In other words, all entries of $g$ are non-zero but $\epsilon_{50}(g) < .088 \approx \frac{\|f\|_2^2}{50}$. In this case we have $\gamma \approx 40.78$, which was computed numerically.

2.9.1 Effect of DFT-incoherence

This section is dedicated to exploring the effect of the DFT-incoherence parameter on signal reconstruction. We consider the complex exponential and Gaussian signal models described above; notice that both signals are effectively 50-sparse. Recall that in the complex exponential model we have $\Psi = F$ (the DFT) with optimal DFT-incoherence parameter $\gamma = 1$. In the Gaussian model, $\Psi$ is the Daubechies 2 wavelet with $\gamma \approx 40.78$ (computed numerically).

Here we set $\rho = \frac12$ to generate the deviations (so that $\theta = 0$) and vary the average step size of the nonuniform samples. We do so by letting $m$ vary through the set $\{\lfloor \frac{N}{1.5} \rfloor, \lfloor \frac N2 \rfloor, \lfloor \frac{N}{2.5} \rfloor, \cdots, \lfloor \frac{N}{9.5} \rfloor\}$. For each fixed value of $m$, the relative reconstruction error is obtained by averaging the result of 10 independent reconstruction experiments. The results are shown in Figure 2.4, where we plot the average nonuniform step size vs the average relative reconstruction error.

These experiments demonstrate the negative effect of larger DFT-incoherence parameters on signal reconstruction. Indeed, in Figure 2.4 we see that the complex exponential model with $\gamma = 1$ allows for accurate reconstruction from larger step sizes.
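The DFT-incoherence values quoted above can be reproduced numerically for any transform available as an explicit matrix. Below is a minimal NumPy sketch (our own; for brevity we test the DFT itself, where $\gamma = 1$, and the spike basis $\Psi = I$, where $\gamma = \sqrt N$, rather than invoking a wavelet toolbox):

```python
import numpy as np

def centered_dft(N):
    """Columns of the N x N centered unitary DFT (N odd)."""
    Ntil = (N - 1) // 2
    t = np.arange(N) / N - 0.5
    u = np.arange(-Ntil, Ntil + 1)
    return np.exp(-2j * np.pi * np.outer(t, u)) / np.sqrt(N)

def dft_incoherence(Psi):
    """gamma = max_l sum_k |<F_k, Psi_l>| over the columns l of Psi."""
    F = centered_dft(Psi.shape[0])
    return np.abs(F.conj().T @ Psi).sum(axis=0).max()

N = 15
print(dft_incoherence(centered_dft(N)))   # 1.0 up to rounding: Psi = F
print(dft_incoherence(np.eye(N)))         # sqrt(N): the spike basis, worst case
```

The same routine applied to an explicit Daubechies 2 wavelet matrix would reproduce the $\gamma \approx 40.78$ figure quoted in the text (we have not run that case here; it requires a wavelet toolbox).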
This is to be expected from our results in Section 2.3, which imply that the Daubechies 2 wavelet will require more samples and worsen the probability of successful reconstruction according to its parameter $\gamma \approx 40.78$.

2.9.2 Effect of Deviation Model Parameter $\theta$

In this section we generate the deviations in a way that varies the parameter $\theta$, in order to explore its effect on signal reconstruction. We only consider the complex exponential signal model for this purpose and fix $m = 287 = \lfloor \frac N7 \rfloor$ nonuniform measurements.

We vary the $\theta$ parameter by generating the deviations with $\rho$ varying over $\{.06, .07, .08, \cdots, .5\}$. For each fixed $\rho$ value, we plot the average relative reconstruction error of 50 independent experiments. Notice that for each $k \in [m]$ and any $j$,
\[
\mathbb E\, e(jm\Delta_k) = \frac{\sin(2\pi j \rho)}{2\pi j \rho}.
\]
We use this observation to compute the $\theta$ parameter numerically by considering all $j \in \left( \mathbb Z \cap \left[ \frac{2(1-N)}{m}, \frac{2(N-1)}{m} \right] \right) \setminus \{0\} = \{-14, -13, \cdots, -1, 1, \cdots, 14\}$. The results are shown in Figure 2.5.

Figure 2.5: (Left) plot of average relative reconstruction error vs $\rho$ parameter and (right) plot of corresponding $\theta$ parameters vs $\rho$ parameter. The plot on the right includes the constant value $\theta = \frac{1}{\sqrt 2}$ required to apply Theorem 2.3.3 (the red line).
Notice that although our results only hold for three values of $\rho$ (.5, .49, .48), the plot on the left demonstrates that accurate recovery is still possible otherwise.

Computing the $\theta$ parameters numerically (the right plot in Figure 2.5) shows that our main result (Theorem 2.3.3) is only strictly applicable in three cases ($\rho = .5, .49, .48$). However, the left plot in Figure 2.5 demonstrates that decent signal reconstruction can still be achieved when the condition $\theta < \frac{1}{\sqrt 2}$ does not hold. Therefore, the applicability of the methodology goes beyond the restrictions of the theorem.

2.9.3 Noise Attenuation

This section explores the robustness of the methodology when presented with measurement noise. We only consider the complex exponential signal model for this purpose. We generate additive random noise $d \in \mathbb R^m$ from a uniform distribution. Each entry of $d$ is i.i.d. from $[-\frac{\chi}{100}, \frac{\chi}{100}]$, where $\chi$ is the average value of the entries of $|f| \in \mathbb R^N$.

We set $\rho = \frac12$ to generate the deviations (so that $\theta = 0$) and vary the average step size of the nonuniform samples. We do so by letting $m$ vary through the set $\{\lfloor \frac N2 \rfloor, \lfloor \frac{N}{2.25} \rfloor, \lfloor \frac{N}{2.5} \rfloor, \cdots, \lfloor \frac N7 \rfloor\}$. For each fixed value of $m$, the relative reconstruction error is obtained by averaging the result of 50 independent reconstruction experiments. The results are shown in Figure 2.6, where we plot the average nonuniform step size vs the average relative reconstruction error and the average relative noise level $\|d\|_2 / \|f\|_2$.

Figure 2.6: Plot of average relative reconstruction error ($\|f - \Psi g^\sharp\|_2 / \|f\|_2$) vs average nonuniform step size (blue curve) and average input relative measurement error ($\|d\|_2 / \|f\|_2$) vs average nonuniform step size (red curve).
Notice that for the first four step size values (2, 2.25, 2.5, 2.75) noise attenuation is achieved, i.e., the reconstruction error is lower than the input noise level.

Although our results do not imply that noise attenuation is possible (this was discussed in Section 2.3.1), these experiments show that nonuniform samples do have a restricted ability for denoising. This is seen in Figure 2.6, where the first four step size values (2, 2.25, 2.5, 2.75) output an average relative reconstruction error smaller than the input measurement noise level. Thus, when nonuniform samples are not heavily undersampled, reduction in the noise level is possible.

Chapter 3

Off-the-Grid Sampling of Low-Rank Matrices

3.1 Introduction

This chapter discusses our results under the low-rank signal model discussed in Section 1.2.2. The components of this chapter are in many aspects analogous (in an intuitive way) to the concepts of Chapter 2, but utilize the ideas of low-rank matrix recovery. For the sake of brevity, we will omit or shorten many topics in this chapter that have been thoroughly developed in Chapter 2, and will refer the reader to the respective sections therein. Section 3.2 introduces the assumptions and definitions needed to state the methodology and main result. Section 3.3 produces the main result of this chapter, including novel analysis of the matrix completion problem. The elements and implications of these results are discussed in Section 3.4. The remaining sections of the chapter are dedicated to an elaborated discussion of the proof.

3.2 Notation, Assumptions and Methodology

In order to state our methodology and main result, we first introduce the necessary definitions and assumptions. We introduce the 2D signal model, 2D deviation model and 2D interpolation kernel in Sections 3.2.1, 3.2.2 and 3.2.3 respectively.
This will allow us to proceed to Section 3.3, where the main result of this chapter is elaborated.

3.2.1 Signal Model

We now let $\Omega = [-\frac12, \frac12)^2$ and $D : \Omega \to \mathbb C$ be our function of interest to be sampled in $\Omega$. Assume that $D \in H^1(\Omega)$, which allows the Fourier expansion
\[
D(x, y) = \sum_{k=-\infty}^\infty \sum_{\ell=-\infty}^\infty c_{k,\ell}\, e(kx) e(\ell y), \qquad (3.1)
\]
valid only for $(x, y) \in \Omega$. For our main result, it is important to notice that our regularity assumption implies that
\[
\sum_{k=-\infty}^\infty \sum_{\ell=-\infty}^\infty |c_{k,\ell}| < \infty.
\]
Let $N \in \mathbb N$ with $N \ge 9$ be odd and let $D \in \mathbb C^{N \times N}$ denote the matrix whose entries are samples of $D$ on the 2D equispaced grid $\tau \times \nu \subset \Omega$ with $\tau = \{\tau_k := \frac{k-1}{N} - \frac12 : k \in [N]\}$ and $\nu = \{\nu_\ell := \frac{\ell-1}{N} - \frac12 : \ell \in [N]\}$, i.e., $D_{k\ell} = D(\tau_k, \nu_\ell)$. Our goal will be to reconstruct $D$ from few nonuniform samples. We have assumed $D$ is a square matrix with $N$ odd for simplicity; our results can be easily adapted to the case where $D$ is a rectangular matrix with even dimensions.

Let $n, m \in \mathbb N$ be such that $nm \le N^2$. Here, $\tilde D \in \mathbb C^{n \times m}$ encompasses our observed discretized nonuniform signal (modulo noise), with $\tilde D_{k\ell} = D(\tilde\tau_{k,\ell}, \tilde\nu_{k,\ell})$, where $\tilde\tau \times \tilde\nu$ is the nonuniform 2D grid with $\tilde\tau_{k,\ell} := \frac{k-1}{n} - \frac12 + \Delta_{k\ell}$ and $\tilde\nu_{k,\ell} := \frac{\ell-1}{m} - \frac12 + \Gamma_{k\ell}$. The entries of the perturbation matrices $\Delta, \Gamma \in \mathbb R^{n \times m}$ give the grid deviations along each respective axis. We assume that these nonuniform samples remain in the sampling domain, i.e., $\tilde\tau \times \tilde\nu \subset \Omega$. In particular, this allows (3.1) to hold for our nonuniform observations.
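The 2D sampling geometry just described is easy to simulate. A small sketch follows (our own variable names; uniform perturbations are one admissible choice of $\Delta, \Gamma$, and we wrap onto the torus so the samples stay in $\Omega$, in the spirit of the torus discussion referenced in Section 3.2.2):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 21, 7, 9                                  # nm <= N^2

# base equispaced n x m grid on Omega = [-1/2, 1/2)^2
tau0 = (np.arange(1, n + 1) - 1) / n - 0.5          # (k-1)/n - 1/2
nu0 = (np.arange(1, m + 1) - 1) / m - 0.5           # (l-1)/m - 1/2

# i.i.d. grid deviations along each axis
Delta = rng.uniform(-0.5 / n, 0.5 / n, size=(n, m))
Gamma = rng.uniform(-0.5 / m, 0.5 / m, size=(n, m))

tau = tau0[:, None] + Delta     # tau[k, l] = (k-1)/n - 1/2 + Delta[k, l]
nu = nu0[None, :] + Gamma       # nu[k, l]  = (l-1)/m - 1/2 + Gamma[k, l]

# wrap onto the torus so the perturbed samples stay in [-1/2, 1/2)
tau = (tau + 0.5) % 1.0 - 0.5
nu = (nu + 0.5) % 1.0 - 0.5
assert tau.min() >= -0.5 and tau.max() < 0.5
```

One would then form the observations entrywise as $\tilde D_{k\ell} = D(\tau_{k\ell}, \nu_{k\ell})$ from these two arrays.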
The restriction $nm \le N^2$ is required in the proof of Theorem 3.5.2 (when applying Lemma 3.6.2 therein).

Our noisy nonuniform observations are given via
\[
B = \tilde D + E,
\]
where $E \in \mathbb C^{n \times m}$ with $\|E\|_F \le \eta$ models the noise introduced in the observations. Our goal is to obtain an approximation to $D$ given $B$, $\tilde\tau$, $\tilde\nu$ and $\eta$ in a robust and computationally tractable manner.

In the 2D setting, the results will rely on the low-rank structure of $D \in \mathbb C^{N \times N}$. In other words, given $r \in [N]$, we define the error of the best rank $r$ approximation of $D$ in nuclear norm as
\[
\sigma_{r,*}(D) := \min_{\mathrm{rank}(X) \le r} \|D - X\|_* = \sum_{k=r+1}^N \sigma_k(D),
\]
and say that $D$ has low-rank structure if this term is within a prescribed tolerance for some $r \ll N$.

We impose some structure on the first $r$ singular vectors of $D$, which we refer to as the $r$-incoherence condition with parameter $\gamma$ (defined in what follows): for $W \in \mathbb C^{d_1 \times d_2}$ define
\[
\gamma(W) := \max_{k \in [d_2]} \sum_{p=1}^{d_1} |\langle F1_{*p}, W_{*k} \rangle|,
\]
where $F1 : \mathbb C^{d_1} \to \mathbb C^{d_1}$ is the 1D centered DFT (see Section 2.2.3) and $F1_{*p}$ signifies its $p$-th column (in accordance with our notation). Let $D = U \Sigma V^*$ be the full singular value decomposition (SVD) of our matrix. We decompose
\[
D = U \Sigma V^* = U_r \Sigma_r V_r^* + U_+ \Sigma_+ V_+^*
\]
in terms of its first $r$ singular vectors and last $N - r$ singular vectors in $U_r \Sigma_r V_r^*$ and $U_+ \Sigma_+ V_+^*$ respectively. We define the $r$-incoherence parameter of $D$ as
\[
\gamma := \max\{\gamma(U_r), \gamma(V_r)\}.
\]
This parameter will play a crucial role in our main result.
We postpone further discussion of $\gamma$ until Section 3.4.1, including the intuition behind its significance for the signal and the computation of its value in different examples (this concept is similar in many aspects to the DFT-incoherence in Chapter 2).

Our goal is to recover $D$ (which provides a bandlimited approximation of $D(x, y)$) from $B = \tilde D + E$. In what follows we will consider a 2D interpolation kernel $S : \mathbb C^{N \times N} \to \mathbb C^{n \times m}$ that achieves $S(D) \approx \tilde D$ accurately (the 2D Dirichlet kernel, see Section 3.2.3). We will then achieve our reconstruction by solving the nuclear norm minimization problem (1.6) with measurement operator $S$.

3.2.2 2D Deviation Model

Our main result of this chapter will consider grid deviations $\Delta, \Gamma$ whose entries are i.i.d. with distributions $\mathcal D_1, \mathcal D_2$ (respectively) that satisfy the following: for integers $0 < |j| \le \frac{2(N-1)}{n}$, if $\delta \sim \mathcal D_1$ then
\[
\mathbb E_{\mathcal D_1} e(j \delta n) = 0,
\]
and similarly for integers $0 < |j| \le \frac{2(N-1)}{m}$, if $\delta \sim \mathcal D_2$ then
\[
\mathbb E_{\mathcal D_2} e(j \delta m) = 0.
\]
This model is similar to the one considered in Chapter 2, defined in Section 2.2.2. However, notice that we have not introduced an analogue of the parameter $\theta$ used in the sparse signal model (i.e., we require $\theta = 0$ here). This is done for simplicity and brevity, in order to produce a substantially cleaner result and proof. It is the author's belief that this deviation model can be generalized by allowing $\theta \neq 0$ in this context, but at the moment this seems a daunting task since the proof is already long and complicated as is.

Finally, it is important to reiterate that we are sampling on the torus. This is discussed in detail in Section 2.2.2, and we refer the reader to that discussion. We adopt the concept of sampling the periodic extension of $D|_\Omega(x)$ here, which can be defined analogously in the 2D case.
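The parameter $\gamma$ can be computed directly from an SVD. A minimal NumPy sketch follows (our own; the rank-one sanity check is built from columns of the centered DFT, for which $\gamma = 1$ by construction):

```python
import numpy as np

def centered_dft(d):
    """d x d centered unitary DFT matrix (d odd)."""
    dtil = (d - 1) // 2
    t = np.arange(d) / d - 0.5
    return np.exp(-2j * np.pi * np.outer(t, np.arange(-dtil, dtil + 1))) / np.sqrt(d)

def r_incoherence(D, r):
    """gamma = max{gamma(U_r), gamma(V_r)} with
    gamma(W) = max_k sum_p |<F1_p, W_k>|."""
    U, _, Vh = np.linalg.svd(D)
    F1 = centered_dft(D.shape[0])
    gam = lambda W: np.abs(F1.conj().T @ W).sum(axis=0).max()
    return max(gam(U[:, :r]), gam(Vh.conj().T[:, :r]))

# rank-one example whose singular vectors are single complex exponentials
F1 = centered_dft(11)
D_example = np.outer(F1[:, 2], F1[:, 7].conj())
print(r_incoherence(D_example, 1))   # 1.0 up to rounding
```

Since the singular vectors of the example are (up to phase) columns of $F1$, each inner-product column of the sum contains a single unit entry, which is why the optimal value $\gamma = 1$ is returned.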
Basically, this is needed in order to keep our samples of $D$ in $\Omega$ in accordance with Section 3.2.1, so that (3.1) remains valid for our nonuniform samples. The deviation model is discussed further in Section 3.4.2.

3.2.3 2D Dirichlet Kernel

We will use the 2D Dirichlet kernel $S : \mathbb C^{N \times N} \to \mathbb C^{n \times m}$ to model our nonuniform observations $S(D) \approx \tilde D$. This interpolation kernel is defined as $S := N F^*$, where $F : \mathbb C^{N \times N} \to \mathbb C^{N \times N}$ is the 2D centered DFT and $N : \mathbb C^{N \times N} \to \mathbb C^{n \times m}$ is a 2D centered NDFT that incorporates the unstructured grid $\tilde\tau \times \tilde\nu$.

Let $\tilde N = \frac{N-1}{2}$. Specifically, we apply the centered inverse 2D DFT to our regular data, where for $(u, v) \in [N]^2$
\[
\check D_{uv} := (F^*(D))_{uv} = \frac1N \sum_{p=1}^N \sum_{q=1}^N D_{pq}\, e((u - \tilde N - 1)\tau_p)\, e((v - \tilde N - 1)\nu_q), \qquad (3.2)
\]
followed by the centered 2D NDFT according to $\tilde\tau, \tilde\nu$:
\[
(S(D))_{k\ell} := (N(\check D))_{k\ell} = \frac1N \sum_{u=1}^N \sum_{v=1}^N \check D_{uv}\, e(-\tilde\tau_{k,\ell}(u - \tilde N - 1))\, e(-\tilde\nu_{k,\ell}(v - \tilde N - 1)). \qquad (3.3)
\]
As in the 1D case, the action of this operator can also be written in terms of the Dirichlet kernel (see [68]) and is thus a real-valued operator. In our current signal model, we obtain an error bound analogous to the 1D case.

Theorem 3.2.1. Let $S$, $D$, $\tilde D$ be defined as above with $\tilde\tau \times \tilde\nu \subset \Omega$. For $(k, \ell) \in [n] \times [m]$, if $\tilde\tau_{k,\ell} = \tau_p$ and $\tilde\nu_{k,\ell} = \nu_q$ for some $(p, q) \in [N]^2$, then
\[
\left( \tilde D - S(D) \right)_{k\ell} = 0, \qquad (3.4)
\]
and otherwise
\[
\left( \tilde D - S(D) \right)_{k\ell} = \sum_{|p| > \tilde N} \sum_{|q| > \tilde N} c_{p,q} \left( e(p \tilde\tau_{k,\ell}) e(q \tilde\nu_{k,\ell}) - (-1)^{\lfloor \frac{p + \tilde N}{N} \rfloor + \lfloor \frac{q + \tilde N}{N} \rfloor} e(r_N(p) \tilde\tau_{k,\ell}) e(r_N(q) \tilde\nu_{k,\ell}) \right), \qquad (3.5)
\]
where $r_N(\ell) = \mathrm{rem}(\ell + \tilde N, N) - \tilde N$, with $\mathrm{rem}(\ell, N)$ giving the remainder after division of $\ell$ by $N$.
As a consequence,
\[
\|\tilde D - S(D)\|_F \le 2 \sqrt{nm} \sum_{|p| > \tilde N} \sum_{|q| > \tilde N} |c_{p,q}|.
\]
The proof of this error term is given in Section 3.7. The intuition and proof of this error bound are similar to Theorem 2.2.1. In particular, the result shows that $D$ provides the bandlimited approximation of $D(x, y)$. This result is crucial to obtain the error bound of our main result.

3.3 Main Result

With the Dirichlet kernel achieving $S(D) \approx \tilde D$ accurately, we approximate $D \approx D^\sharp$ given via
\[
D^\sharp := \arg\min_{X \in \mathbb C^{N \times N}} \|X\|_* \quad \text{s.t.} \quad \|S(X) - B\|_F \le \eta + 2 \sqrt{nm} \sum_{|p| > \frac{N-1}{2}} \sum_{|q| > \frac{N-1}{2}} |c_{p,q}|, \qquad (3.6)
\]
where the term $2 \sqrt{nm} \sum_{|p|, |q| > \frac{N-1}{2}} |c_{p,q}|$ is due to the interpolation kernel error in Theorem 3.2.1.

We now proceed to the main result.

Theorem 3.3.1. Let the entries of $\Delta, \Gamma$ be i.i.d. from distributions that satisfy our 2D deviation model and let $D^\sharp$ be given by (3.6). If
\[
mn \ge \tilde C N r \gamma^4 \log^6(N), \qquad (3.7)
\]
where $\tilde C$ is an absolute constant, then
\[
\|D - D^\sharp\|_F \le 23.4\, \sigma_{r,*}(D) + (5.3 + 6.9 \sqrt r)\, N \left( \frac{\eta}{\sqrt{nm}} + 2 \sum_{|k| > \frac{N-1}{2}} \sum_{|\ell| > \frac{N-1}{2}} |c_{k,\ell}| \right), \qquad (3.8)
\]
with probability exceeding
\[
1 - \frac{4}{nm} - \exp\left( -\frac{nm}{40 \gamma^4 r \sqrt N} \log\left( 1 + 2 \log\left( 1 + \frac{t}{15 \sqrt N} + 1 \right) \right) \right)
- 4N \exp\left( -\frac{nm}{8 \log(nm)\, r^2 + 4 N r^3} \right) - 4N \exp\left( -\frac{nm}{576 N r \log(nm) + 8 \gamma^2 N r} \right). \qquad (3.9)
\]
Therefore, with $nm \sim N r \log^6(N)$ random off-the-grid samples we can approximate $D$ with recovery error (3.8) proportional to the low-rank mismatch ($\sigma_{r,*}(D)$), the noise level ($N\eta / \sqrt{nm}$) and the $\frac{N-1}{2}$-bandlimited approximation error of $D(x, y)$ ($\sum_{|k|, |\ell| > \frac{N-1}{2}} |c_{k,\ell}|$).

This allows us to construct a function $D^\sharp(x, y) : \Omega \to \mathbb C$ using $D^\sharp$ that achieves
\[
|D(x, y) - D^\sharp(x, y)| \le 23.4\, \sigma_{r,*}(D) + (5.3 + 6.9 \sqrt r) \frac{N \eta}{\sqrt{nm}} + 2 \left( N(5.3 + 6.9 \sqrt r) + 1 \right) \sum_{|k| > \frac{N-1}{2}} \sum_{|\ell| > \frac{N-1}{2}} |c_{k,\ell}| \qquad (3.10)
\]
for all $(x, y) \in \Omega$.
The construction of $D^\sharp(x, y)$ via $D^\sharp$ and the proof of the error bound (3.10) are similar to the 1D case (Corollary 2.3.2), and we omit the proof of this 2D statement.

In conclusion, the result states that random nonuniform samples efficiently handle aliasing artifacts in an undersampled sense. When $\eta = \sigma_{r,*}(D) = 0$, we may recover the $\frac{N-1}{2}$-bandlimited approximation of $D(x, y)$ according to (3.10) with $O(rN\, \mathrm{polylog}(N))$ stochastic nonuniform samples. In comparison, uniform sampling requires $O(N^2)$ measurements for the same quality of reconstruction. When the uniform discretization of our signal of interest can be well approximated by a rank $r \ll N$ matrix, our nonuniform sampling methodology provides a stark contrast in acquisition complexity.

3.3.1 Implications for Matrix Completion: Stability and Robustness

Theorem 3.3.1 applies almost directly to the noisy matrix completion problem. To be specific, a relatively minor modification of the deviation model and proof of Theorem 3.3.1 provides a novel result for the matrix completion problem under the sampling with replacement model from [7]. For simplicity, we only consider this sampling model, but note that other matrix completion sampling strategies can be derived in a similar manner (e.g., the standard uniform random sampling model).

To elaborate, in the matrix completion problem one is given a set of multi-indices $\{(p_k, q_k)\}_{k=1}^m \subset [N]^2$ indicating the set of observed matrix entries (modulo noise) of $D \in \mathbb C^{N \times N}$. Let $\Lambda := \{(p_k, q_k)\}_{k=1}^m$ and let $D_\Lambda \in \mathbb C^m$ encompass the observed matrix elements, i.e., each entry of $D_\Lambda$ is given as
\[
D_{\Lambda k} = D_{p_k q_k}, \quad \forall k \in [m].
\]
Then our noisy observations are given as
\[
B = D_\Lambda + E \in \mathbb C^m,
\]
where $E \in \mathbb C^m$ models the additive noise with $\|E\|_2 \le \eta$.
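The observation model just described is straightforward to simulate. A minimal sketch follows (our own names; the recovery step would then solve the nuclear norm program with a convex solver, which we omit):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, eta = 50, 200, 1e-3

D = rng.standard_normal((N, N))     # data matrix (full rank is allowed)

# sampling with replacement: m multi-indices uniform on [N]^2
p = rng.integers(0, N, size=m)      # row indices p_k (0-based here)
q = rng.integers(0, N, size=m)      # column indices q_k

# noise E rescaled so that ||E||_2 <= eta
E = rng.standard_normal(m)
E *= eta / np.linalg.norm(E)

B = D[p, q] + E                     # B_k = D_{p_k q_k} + E_k
assert np.linalg.norm(B - D[p, q]) <= eta * (1 + 1e-12)
```

Note that duplicates among the $(p_k, q_k)$ may occur; that is precisely the with-replacement feature of this sampling model.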
Notice that here $D$ need not be generated as a discretization of a continuous function $D(x, y)$. This will also hold for our result in this section, where the signal model is no longer needed. Our novel result in this context is the following.

Theorem 3.3.2. Let $D$ be an $N \times N$ matrix with $r$-incoherence parameter $\gamma$ (not necessarily generated as a discretization of a function). Suppose $\Lambda := \{(p_k, q_k)\}_{k=1}^m$, where each multi-index $(p_k, q_k)$ is selected independently from the uniform distribution on $[N]^2$ (i.e., sampling with replacement). Let
\[
B = D_\Lambda + E,
\]
with $\|E\|_2 \le \eta$, and
\[
D^\sharp := \arg\min_{X \in \mathbb C^{N \times N}} \|X\|_* \quad \text{s.t.} \quad \|X_\Lambda - B\|_2 \le \eta. \qquad (3.11)
\]
If
\[
m \ge \tilde C N r \gamma^4 \log^6(N),
\]
where $\tilde C$ is the absolute constant, then $D^\sharp$ satisfies
\[
\|D - D^\sharp\|_F \le 23.4\, \sigma_{r,*}(D) + (5.3 + 6.9 \sqrt r) \frac{N \eta}{\sqrt{nm}},
\]
with probability exceeding
\[
1 - \frac{4}{nm} - \exp\left( -\frac{nm}{40 \gamma^4 r \sqrt N} \log\left( 1 + 2 \log\left( 1 + \frac{t}{15 \sqrt N} + 4 \right) \right) \right)
- 4N \exp\left( -\frac{nm}{8 \log(nm)\, r^2 + 4 N r^3} \right) - 4N \exp\left( -\frac{nm}{576 N r \log(nm) + 8 \gamma^2 N r} \right). \qquad (3.12)
\]
The proof of this theorem is presented in Section 3.8; it follows a proof similar to that of Theorem 3.3.1.

Notice that the result does not require $D$ to be rank $r$ and provides an error bound proportional to the error of the rank $r$ approximation ($\sigma_{r,*}(D)$). Such methodologies and results are referred to as stable (see for example [85]).
This is the main contribution of this result: all other related works in the literature require the data matrix to be exactly rank $r$ [21, 22, 7, 23].

The reference [22] allows for noisy measurements, so one could potentially remove the rank-$r$ constraint by absorbing the error of the rank-$r$ approximation into the noise model, i.e., by modifying the noise level as $\tilde\eta := \eta + \sigma_{r,*}(D)$. However, this would imply that the practitioner must have an estimate of $\sigma_{r,*}(D)$, and would produce an error bound $\sim N\sqrt{r}\,\mathrm{polylog}(N)(\eta + \sigma_{r,*}(D))$ in [22]. On the other hand, our results apply to full-rank matrices and give the significantly improved error bound $\sim \sigma_{r,*}(D) + \frac{\sqrt{N}}{\mathrm{polylog}(N)}\eta$. Furthermore, our result has the same sampling complexity as [22] in terms of the logarithmic dependency, $\log^6(N)$. However, this comparison is difficult to make fairly due to the distinct incoherence and sampling models considered.

Theorem 3.3.2 applies to the sampling-with-replacement model. One could argue as in Proposition 3.1 in [7] and extend this result to the uniform random sampling model via standard results on the topic [21, 23]. However, this extension is awkward in the noisy case, since observing repeated entries modifies the noise energy level $\eta$. We leave such results as future work, and end this section by noting that other sampling strategies can be considered for the matrix completion problem via our 2D deviation model, especially by generalizing this model to the case $\theta \ne 0$ (see Section 3.2.2 and Section 2.2.2).

3.4 Discussion

This section provides additional discussion of the elements of our main result in this chapter, Theorem 3.3.1. Section 3.4.1 discusses the incoherence condition on the singular vectors of the data matrix.
Section 3.4.2 considers the 2D deviation model.

3.4.1 r-incoherence Condition

Our $r$-incoherence condition defined in Section 3.2.1 is related to the DFT-incoherence parameter of Chapter 2 from Section 2.2.1. Thus, the intuition elaborated in Section 2.4.2 remains informative, but now applies to the singular vectors of the data matrix $D$.

Therefore, as discussed in Section 2.4.2, our results apply to data matrices $D$ whose main rank-$r$ component ($U_rV_r^*$) corresponds to the discretization of a smooth function. This is perhaps best described as in example 4) from Section 2.4.2:

$r$-incoherence for Discretized Smooth Functions: Let $p \ge 1$ be an integer. Consider singular vectors $U_r, V_r$ whose columns are uniform discretizations of $p$-differentiable functions with $p-1$ periodic and continuous derivatives, and with $p$-th derivative that is piecewise continuous. In this case we have $\gamma(U_r), \gamma(V_r) \sim O(\log(N))$ if $p = 1$ and $\gamma(U_r), \gamma(V_r) \sim O(1)$ if $p \ge 2$. We remind the reader that this example is argued informally in Section 2.8.

Alternatively, our $r$-incoherence concept aligns well with the typical incoherence structure from the matrix completion literature [21, 22, 7, 23] (e.g., the coherence condition and the strong incoherence property). Intuitively, the incoherence conditions from these references ensure that the data matrix has its energy (i.e., norm) evenly spread throughout the matrix. Small incoherence parameters avoid the case where a significant portion of the matrix energy lies in a small subset of the matrix entries. Data matrices with a large incoherence parameter are not appropriate for matrix completion (with sampling oblivious to the singular vectors), since one might miss crucial information when sampling entry-wise.

This same intuition is informative for our $r$-incoherence structure.
This is illustrated by a simple example in Figure 3.1, where two data matrices (taken from [68]) with distinct $\gamma$ values are presented. Both of these matrices are uniform discretizations of continuous functions of the form

$$D(x,y) = \sum_{k=1}^{5} e^{-c(x-x_k)^2 - c(y-y_k)^2} \tag{3.13}$$

for some parameter $c > 0$ and Gaussian centers $\{(x_k, y_k)\}_{k=1}^5$. Both uniformly discretized signals ($D$) are rank-5 in this example.

In Figure 3.1, the data matrix on the left corresponds to (3.13) with $c = 1000$ (i.e., relatively large Lipschitz constant). As a consequence, most of its energy lies in a small portion of the matrix and we obtain the inappropriate value $\gamma \sim O(\sqrt{N})$, which also agrees with the standard concept of incoherence from matrix completion. On the other hand, the data matrix on the right is more appropriate for our application (and matrix completion), as its energy is more evenly spread. This matrix corresponds to (3.13) with parameter $c = 20$. Such moderately smooth signals provide favorable sampling complexity and reconstruction error bounds via our methodology.

Figure 3.1: Illustration of two $500 \times 500$ data matrices (from [68]) of rank 5 with distinct 5-incoherence parameters ($\gamma$). (Left) low-rank data matrix with inappropriate 5-incoherence structure $\gamma \sim O(\sqrt{N})$. (Right) low-rank data matrix with appropriate 5-incoherence structure $\gamma \sim O(1)$. Both data matrices are discretizations of functions of the form (3.13) with the same center locations, with parameter $c = 1000$ for the left data matrix and $c = 20$ for the right data matrix.

However, this simple example may be misleading in the context of this thesis, since the discretizations in Figure 3.1 differ in both $r$-incoherence parameter and aliasing energy (i.e., $\sum_{|\ell| > N/2} |c_\ell|$). Such differences make the effect of $r$-incoherence unclear in the images and error bounds, since the main result also includes the error of the $N/2$-bandlimited approximation.
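The energy-concentration intuition behind this example can be reproduced directly. The parameter $\gamma$ itself is defined in Section 3.2.1 (not reproduced here), so the sketch below uses a crude proxy instead: the fraction of squared Frobenius energy carried by the largest 1% of entries. The grid convention and center locations are hypothetical, but the contrast between $c = 1000$ and $c = 20$ mirrors the discussion of Figure 3.1:

```python
import numpy as np

def gaussian_mixture(N, c, centers):
    """Uniform discretization of (3.13) on a hypothetical [0,1]^2 grid."""
    x = (np.arange(N) + 0.5) / N
    X, Y = np.meshgrid(x, x, indexing="ij")
    return sum(np.exp(-c * (X - xk) ** 2 - c * (Y - yk) ** 2)
               for xk, yk in centers)

N = 200
centers = [(0.2, 0.3), (0.5, 0.5), (0.7, 0.2), (0.3, 0.8), (0.8, 0.7)]
peaked = gaussian_mixture(N, c=1000.0, centers=centers)   # sharp bumps
smooth = gaussian_mixture(N, c=20.0, centers=centers)     # spread-out bumps

def energy_top_fraction(D, frac=0.01):
    """Fraction of squared Frobenius energy in the largest 1% of entries."""
    e = np.sort((D ** 2).ravel())[::-1]
    k = max(1, int(frac * e.size))
    return e[:k].sum() / e.sum()

print(energy_top_fraction(peaked))   # large: energy concentrated in few entries
print(energy_top_fraction(smooth))   # small: energy evenly spread
```

Note that this proxy reflects only energy concentration (incoherence); it says nothing about the aliasing energy, which the modified example (3.14) is introduced to decouple.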
To provide a fair example that decouples these two concepts, we modify (3.13) and consider functions of the form

$$D(x,y) = \sum_{k=1}^{5} e^{-c(x-x_k)^2 - c(y-y_k)^2} + d\sin(2\pi\omega x)\sin(2\pi\omega y), \tag{3.14}$$

where $d, \omega > 0$ will be used to modify the bandwidth and aliasing energy of these simple examples. This modified example is shown in Figure 3.2. In this case, both discretizations ($D$) are now rank-6.

In this modified example, we may now choose distinct bandwidth ($\omega$) and aliasing energy ($d$) parameters so that both discretizations have the same aliasing energy, while not substantially affecting the respective incoherence parameters $\gamma$ of the original examples from Figure 3.1. In addition, both data matrices are equal in rank, so that this simple example provides a fairer illustration of the $r$-incoherence concept that does not vary other aspects of the data matrices.

Figure 3.2: Illustration of two $500\times500$ data matrices of rank 6 with distinct 6-incoherence parameters ($\gamma$) but the same aliasing energy ($\sum_{|\ell|>N/2}|c_\ell|$). (Left) low-rank data matrix with inappropriate 6-incoherence structure $\gamma \sim O(\sqrt{N})$. (Right) low-rank data matrix with appropriate 6-incoherence structure $\gamma \sim O(1)$. Both data matrices are discretizations of functions of the form (3.14), modified respectively from Figure 3.1, where the left image has no additional aliasing energy introduced (i.e., $\omega < 250$ and $d \ll 1$) and the image on the right has been generated with additional aliasing error ($\omega > 250$ and $d \sim 1$) to match that of the image on the left (these additional aliasing artifacts are subtle due to their high-frequency nature).

3.4.2 2D Deviation Model

As discussed in Section 3.2.2, our 2D deviation model is similar to the deviation model from Chapter 2 in Section 2.2.2, but more restrictive.
In particular, the examples with $\theta = 0$ provided in Section 2.4.1 apply in this 2D setting. Specifically, these are:

1) $\mathcal{D}_1 = U[-\frac{1}{2n}, \frac{1}{2n}]$ or $\mathcal{D}_2 = U[-\frac{1}{2m}, \frac{1}{2m}]$. To generalize this example, we may take $\mathcal{D}_1 = U[\mu - \frac{p}{2n}, \mu + \frac{p}{2n}]$ or $\mathcal{D}_2 = U[\mu - \frac{p}{2m}, \mu + \frac{p}{2m}]$, for any $\mu \in \mathbb{R}$ and $p \in \mathbb{N}\setminus\{0\}$.

2) $\mathcal{D}_1 = U\{-\frac{1}{2n} + \frac{k}{n\bar{n}_1}\}_{k=0}^{\bar{n}_1-1}$ or $\mathcal{D}_2 = U\{-\frac{1}{2m} + \frac{k}{m\bar{n}_2}\}_{k=0}^{\bar{n}_2-1}$ with $\bar{n}_1 := \lceil\frac{2(N-1)}{n}\rceil + 1$ and $\bar{n}_2 := \lceil\frac{2(N-1)}{m}\rceil + 1$. To generalize this example, we may take $\mathcal{D}_1 = U\{\mu - \frac{p}{2n} + \frac{pk}{n\bar{n}_1}\}_{k=0}^{\bar{n}_1-1}$ or $\mathcal{D}_2 = U\{\mu - \frac{p}{2m} + \frac{pk}{m\bar{n}_2}\}_{k=0}^{\bar{n}_2-1}$, for any $\mu \in \mathbb{R}$, $p \in \mathbb{N}\setminus\{0\}$, $\bar{n}_1 \in \mathbb{N}\setminus[\lceil\frac{2(N-1)}{pn}\rceil]$ and $\bar{n}_2 \in \mathbb{N}\setminus[\lceil\frac{2(N-1)}{pm}\rceil]$.

Figure 3.3 offers a simple illustration of the possible undersampling in our 2D random nonuniform sampling scenario. The red samples correspond to dense uniform samples, and the blue samples depict random nonuniform samples achievable by both examples 1) and 2) above. Nonuniform samples under the 2D deviation model achieve the same quality of reconstruction as dense equispaced samples, but with significantly fewer measurements, in agreement with Theorem 3.3.1. In particular, notice that our assumptions, where samples are deviated from a less dense equispaced grid $\{(\frac{k-1}{n} - \frac{1}{2}, \frac{\ell-1}{m} - \frac{1}{2})\}_{k\in[n],\ell\in[m]}$, allow for samples that are evenly distributed throughout $\Omega$ (these examples include jittered sampling [11, 74, 14, 72]).

An analogue of the parameter $\theta$ from Chapter 2 is not used in this chapter for simplicity, since the proof of the main result is long and complicated as is. However, we should expect our low-rank signal model results to hold in a more general setting that does not require $\theta = 0$.
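Returning to example 1) above, a minimal numpy sketch (hypothetical grid sizes) generates jittered samples around the coarse equispaced grid and checks that every sample stays within half a coarse cell of its grid node, which is the property that keeps the nonuniform samples evenly spread:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 16, 12

# Coarse equispaced grid {((k-1)/n - 1/2, (l-1)/m - 1/2)}.
tau_grid = np.arange(n) / n - 0.5
nu_grid = np.arange(m) / m - 0.5

# Example 1): i.i.d. jitter delta ~ U[-1/(2n), 1/(2n)] (resp. U[-1/(2m), 1/(2m)]).
tau = tau_grid[:, None] + rng.uniform(-1 / (2 * n), 1 / (2 * n), size=(n, m))
nu = nu_grid[None, :] + rng.uniform(-1 / (2 * m), 1 / (2 * m), size=(n, m))

# Each jittered coordinate stays within half a coarse cell of its grid node.
print(np.max(np.abs(tau - tau_grid[:, None])) <= 1 / (2 * n))  # True
print(np.max(np.abs(nu - nu_grid[None, :])) <= 1 / (2 * m))    # True
```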
For example, when $\mathcal{D}_1, \mathcal{D}_2 = \mathcal{N}(\mu, \bar\sigma^2)$, for any $\mu \in \mathbb{R}$ and $\bar\sigma^2 > 0$, we have for $j \in \mathbb{Z}\setminus\{0\}$ and $\delta \sim \mathcal{D}_1, \mathcal{D}_2$

$$|\mathbb{E}_{\mathcal{D}_1} e(j\delta n)| = e^{-2(\bar\sigma\pi j n)^2} \quad\text{and}\quad |\mathbb{E}_{\mathcal{D}_2} e(j\delta m)| = e^{-2(\bar\sigma\pi j m)^2},$$

as discussed in Section 2.4.1. For any $m$ and $\bar\sigma$, this example will never satisfy our current 2D deviation model, but can come arbitrarily close.

Figure 3.3: Illustration of the 2D sampling scenario in $\Omega$. (Left) dense equispaced samples. (Right) less dense nonuniform samples (on average by a factor of $\approx \frac{1}{2}$), which can be generated by example 1) or 2) in this section. Both sampling schemes provide the same quality of reconstruction, but random off-the-grid samples require fewer measurements according to our results.

3.5 Proof of Main Result

We will prove this main result via a dual certificate (Lemma 3.5.1 below, proven in Section 3.6). This lemma is a generalization of dual certificate guarantees for sparse vector recovery to the low-rank matrix recovery case (see Theorem 4.33 in [85]).

We begin with some necessary notation and definitions. Using the decomposition $D = U_r\Sigma_rV_r^* + U_+\Sigma_+V_+^*$, we will be interested in the following space of matrices

$$T = \left\{X \in \mathbb{C}^{N\times N} : X = \sum_{k=1}^r \lambda_k U_{*k}V_{*k}^*,\ \lambda_k \in \mathbb{R}^+\right\} = \mathrm{Span}\left(\{U_{*k}V_{*k}^*\}_{k=1}^r\right),$$

and its orthogonal complement

$$T^\perp = \mathrm{Span}\left(\{U_{*k}V_{*\ell}^*\}_{(k,\ell)\in[N]^2} \setminus \{U_{*k}V_{*k}^*\}_{k=1}^r\right).$$

In particular, notice that $U_r\Sigma_rV_r^* \in T$ and $U_+\Sigma_+V_+^* \in T^\perp$.

Let $\mathcal{P}_T(X), \mathcal{P}_{T^\perp}(X)$ denote the respective orthogonal projections onto each of these spaces, and let $S$ be the unit sphere in $\mathbb{C}^{N\times N}$.
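The spaces $T$ and $T^\perp$ and their projections can be made concrete with a small numerical sketch (real-valued and with hypothetical sizes, so $V^*$ becomes a transpose). Since the dyads $U_{*k}V_{*k}^*$ are orthonormal in the Frobenius inner product, $\mathcal{P}_T$ is a sum of inner products against them; the sketch checks that $U_r\Sigma_rV_r^*$ lies in $T$ while the spectral tail lies in $T^\perp$:

```python
import numpy as np

rng = np.random.default_rng(5)
N, r = 30, 4
D = rng.standard_normal((N, N))
U, s, Vh = np.linalg.svd(D)
Ur, Vr = U[:, :r], Vh[:r].T

def P_T(X):
    # <U_{*k} V_{*k}^T, X> = u_k^T X v_k for each k = 1, ..., r
    coeffs = np.einsum("ik,ij,jk->k", Ur, X, Vr)
    return (Ur * coeffs) @ Vr.T          # sum_k coeffs[k] * u_k v_k^T

def P_Tperp(X):
    return X - P_T(X)

head = (Ur * s[:r]) @ Vr.T               # U_r Sigma_r V_r^T
tail = D - head                          # U_+ Sigma_+ V_+^T
print(np.allclose(P_T(head), head))      # True: head is in T
print(np.allclose(P_T(tail), 0))         # True: tail is in T^perp
```

The nuclear norm of the $T^\perp$ component of $D$ here equals $\sum_{k>r}\sigma_k$, which is the quantity $\sigma_{r,*}(D)$ appearing in the error bounds.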
The following section states the dual certificate conditions that provide our recovery error bounds. This is followed by additional lemmas that will be used to establish the dual certificate.

3.5.1 Dual Certificate and Required Lemmas

The following lemma establishes our recovery error bounds, stated for a general linear operator.

Lemma 3.5.1. Let $\mathcal{A} : \mathbb{C}^{N\times N} \to \mathbb{C}^{n\times m}$ be a linear operator and $D = U_r\Sigma_rV_r^* + U_+\Sigma_+V_+^*$ decomposed as above. Assume that

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle \le \beta_1, \qquad \|\mathcal{A}\|_{F\to F} \le \beta_2, \tag{3.15}$$

and that there exists a matrix $M = \mathcal{A}^*(Y)$ such that

$$\|\mathcal{P}_T(M) - U_rV_r^*\|_F \le \beta_3, \qquad \|\mathcal{P}_{T^\perp}(M)\| \le \beta_4, \qquad \|Y\|_F \le \beta_5\sqrt{r}. \tag{3.16}$$

If $\frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} + \beta_4 < 1$, then

$$D^\sharp := \arg\min_X \|X\|_* \quad \text{s.t.} \quad \|\mathcal{A}(X) - \mathcal{A}(D) - E\|_F \le \eta$$

with $\|E\|_F \le \eta$ satisfies

$$\|D - D^\sharp\|_F \le C_1\sigma_{r,*}(D) + (C_2 + C_3\sqrt{r})\eta,$$

for constants $C_1, C_2, C_3$ that depend only on the $\beta_k$'s.

The proof of this lemma is postponed until Section 3.6.1.

We therefore obtain our main result if we establish (3.15) and (3.16) for our sampling operator under the $r$-incoherence assumptions. These inequalities will follow from the next three lemmas (also proven in Section 3.6). Throughout, we will denote

$$\mathcal{A} := \frac{1}{\sqrt{nm}}\tilde{\mathcal{N}}\mathcal{F}^* \quad\text{where}\quad \tilde{N} := \frac{N-1}{2}.$$

The following lemma will be used to compute the parameters $\beta_1, \beta_3$ and $\beta_5$ in (3.15) and (3.16) of Lemma 3.5.1.

Lemma 3.5.2. Let $\mathcal{A}$, $T$ and $\gamma$ be defined as above.
If

$$\frac{\sqrt{mn}}{\sqrt{\log(nm)}} \ge \frac{2C\sqrt{Nr}\gamma^2\log^{5/2}(N)\sqrt{1 + \frac{\delta}{\sqrt{N}}}}{\delta}, \tag{3.17}$$

where $C > 0$ is an absolute constant, then

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle \le \frac{2\delta}{\sqrt{N}}, \tag{3.18}$$

with probability exceeding

$$1 - \exp\left(-\frac{nm\delta}{4\gamma^4r\sqrt{N}}\log\left(1 + 2\log\left(1 + \frac{t}{\frac{2\delta}{\sqrt{N}} + 1}\right)\right)\right).$$

The proof can be found in Section 3.6.2.

The next lemma gives parameter $\beta_2$ from (3.15) directly.

Lemma 3.5.3. With the previous notation and definitions, we have

$$\|\mathcal{A}\|_{F\to F} \le \frac{\sqrt{3N}}{\sqrt{r}}$$

with probability greater than

$$1 - \frac{2}{nm} - 4N\exp\left(-\frac{nm}{8\log(nm)r^2 + \frac{4Nr}{3}}\right).$$

The proof is provided in Section 3.6.3.

The final lemma handles the term $\|\mathcal{P}_{T^\perp}(M)\|$ for parameter $\beta_4$ in (3.16) of Lemma 3.5.1.

Lemma 3.5.4. With the previous notation and definitions, we have

$$\|\mathcal{P}_{T^\perp}(\mathcal{A}^*\mathcal{A}(U_rV_r^*))\| \le 2\delta$$

with probability greater than

$$1 - \frac{2}{nm} - 4N\exp\left(-\frac{nm\delta^2}{16Nr\log(nm) + \frac{4\gamma^2Nr\delta}{3}}\right).$$

The proof is postponed until Section 3.6.4.

3.5.2 Proof of Theorem 3.3.1

We are now in a position to prove the main result, Theorem 3.3.1, via Lemmas 3.5.1-3.5.4.

Proof of Theorem 3.3.1. We obtain our result by applying Lemma 3.5.1. To establish the required conditions (3.15) and (3.16), we use Lemma 3.5.3, Lemma 3.5.2 with $\delta = 1/10$ and Lemma 3.5.4 with $\delta = 1/6$, and proceed to compute the $\beta_k$ parameters in Lemma 3.5.1 (and the $C_k$ constants in the error bound).

Applying Lemma 3.5.2 with $\delta = 1/10$, we assume

$$\frac{\sqrt{mn}}{\sqrt{\log(nm)}} \ge 20C\sqrt{Nr}\gamma^2\log^{5/2}(N)\sqrt{1 + \frac{1}{10\sqrt{N}}}, \tag{3.19}$$

to obtain

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle \le \frac{1}{5\sqrt{N}}, \tag{3.20}$$

with probability exceeding

$$1 - \exp\left(-\frac{nm}{40\gamma^4r\sqrt{N}}\log\left(1 + 2\log\left(1 + \frac{t}{\frac{1}{5\sqrt{N}} + 1}\right)\right)\right).$$

Inequality (3.20) gives parameter $\beta_1 := \frac{1}{5\sqrt{N}} \le \frac{1}{15}$ directly (since we assume $N \ge 9$).

For $\beta_5$, define $Y := \mathcal{A}(U_rV_r^*)$, so that $M = \mathcal{A}^*\mathcal{A}(U_rV_r^*)$.
Notice that the LHS of (3.20) is equivalent to

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle = \sup_{X\in T\cap S}|\langle\mathcal{A}^*\mathcal{A}(X), X\rangle - \langle X, X\rangle| = \sup_{X\in T\cap S}\left|\|\mathcal{A}(X)\|_F^2 - \|X\|_F^2\right|.$$

Therefore, since $U_rV_r^* \in T$, (3.20) gives

$$\|Y\|_F := \|\mathcal{A}(U_rV_r^*)\|_F \le \sqrt{1 + \frac{1}{5\sqrt{N}}}\|U_rV_r^*\|_F \le \sqrt{1 + \frac{1}{15}}\sqrt{r} = \frac{\sqrt{16r}}{\sqrt{15}},$$

so that $\beta_5 = \frac{\sqrt{16}}{\sqrt{15}}$.

To compute $\beta_3$, notice that

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle = \sup_{X\in S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(\mathcal{P}_T(X)), \mathcal{P}_T(X)\rangle = \sup_{X\in S}\langle\mathcal{P}_T\circ(\mathcal{A}^*\mathcal{A} - \mathcal{I})\circ\mathcal{P}_T(X), X\rangle = \|\mathcal{P}_T\circ(\mathcal{A}^*\mathcal{A} - \mathcal{I})\circ\mathcal{P}_T\|_{F\to F},$$

where the last equality holds since $\mathcal{P}_T\circ(\mathcal{A}^*\mathcal{A} - \mathcal{I})\circ\mathcal{P}_T$ is a Hermitian operator. Therefore, (3.20) implies

$$\|\mathcal{P}_T(M) - U_rV_r^*\|_F = \|\mathcal{P}_T(M - U_rV_r^*)\|_F := \|\mathcal{P}_T\circ(\mathcal{A}^*\mathcal{A} - \mathcal{I})(U_rV_r^*)\|_F = \|\mathcal{P}_T\circ(\mathcal{A}^*\mathcal{A} - \mathcal{I})\circ\mathcal{P}_T(U_rV_r^*)\|_F \le \frac{\|U_rV_r^*\|_F}{5\sqrt{N}} = \frac{\sqrt{r}}{5\sqrt{N}} := \beta_3.$$

Applying Lemma 3.5.3 gives $\beta_2 = \frac{\sqrt{3N}}{\sqrt{r}}$ directly, with probability greater than

$$1 - \frac{2}{nm} - 4N\exp\left(-\frac{nm}{8\log(nm)r^2 + \frac{4Nr}{3}}\right).$$

Finally, to compute $\beta_4$ we apply Lemma 3.5.4 with $\delta = \frac{1}{6}$. Thus

$$\|\mathcal{P}_{T^\perp}(M)\| := \|\mathcal{P}_{T^\perp}\circ\mathcal{A}^*\mathcal{A}(U_rV_r^*)\| \le \frac{1}{3} := \beta_4$$

with probability greater than

$$1 - \frac{2}{nm} - 4N\exp\left(-\frac{nm}{576Nr\log(nm) + 8\gamma^2Nr}\right).$$

Notice that

$$\frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} + \beta_4 = \frac{\sqrt{3}}{5\sqrt{14/15}} + \frac{1}{3} \le 0.7 < 1,$$

as required.
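The arithmetic behind this smallness condition is easy to verify. A minimal sketch (using the worst-case values $\beta_1 \le 1/15$ and the product $\beta_2\beta_3 = \sqrt{3}/5$, in which the $N$ and $r$ dependencies cancel) confirms the claim numerically:

```python
import numpy as np

# Worst-case parameter values from the proof (valid whenever N >= 9):
beta1 = 1 / 15                 # beta1 = 1/(5*sqrt(N)) <= 1/15
beta2_beta3 = np.sqrt(3) / 5   # (sqrt(3N)/sqrt(r)) * (sqrt(r)/(5*sqrt(N)))
beta4 = 1 / 3
beta5 = np.sqrt(16 / 15)

rho = beta2_beta3 / np.sqrt(1 - beta1) + beta4
print(rho)                     # about 0.692, indeed <= 0.7 < 1
print(2 * beta5 / (1 - 0.7))   # about 6.89, consistent with the 6.9 in C_3
```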
We obtain the conclusion of Theorem 3.3.1 via Lemma 3.5.1, with constants

$$C_1 := 2\left(\frac{\beta_2}{1-\beta_1} + 1\right)\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1} \le 23.4,$$

$$C_2 := \frac{2}{\sqrt{1-\beta_1}} + \frac{2\beta_3}{\sqrt{1-\beta_1}}\left(\frac{\beta_2}{1-\beta_1} + 1\right)\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1} \le 5.3,$$

and

$$C_3 := 2\beta_5\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1} \le 6.9,$$

where the probability of failure bound holds by a union bound. In particular, since we assume $nm \le N^2$ and $N \ge 9$, if

$$\sqrt{mn} \ge 20\sqrt{2}C\sqrt{Nr}\gamma^2\log^3(N)\sqrt{1 + \frac{1}{30}}$$

then (3.19) holds. This is the advertised sampling complexity in the statement of Theorem 3.3.1, where we have used $\sqrt{1 + \frac{1}{10\sqrt{N}}} \le \sqrt{1 + \frac{1}{30}}$ and absorbed the absolute constants into the definition of $\sqrt{\tilde{C}}$.

3.6 Proof of Dual Certificate Recovery and Required Lemmas

In this section we prove the required lemmas from Section 3.5.1 for the main result: Lemma 3.5.1, Lemma 3.5.2, Lemma 3.5.3 and Lemma 3.5.4.

3.6.1 Proof of Lemma 3.5.1

We begin with the proof of Lemma 3.5.1. This lemma is a generalization of dual certificate guarantees for sparse vector recovery to the low-rank matrix recovery case (see Theorem 4.33 in [85]), and the proof is analogous.

Proof of Lemma 3.5.1. Denote $W = D^\sharp - D$; our goal is to bound $\|W\|_F$. Notice that since $D^\sharp$ is feasible,

$$\|\mathcal{A}(W)\|_F \le \|\mathcal{A}(D^\sharp) - \mathcal{A}(D) - E\|_F + \|E\|_F \le 2\eta.$$

We will apply this inequality throughout the proof.

Let $Z \in T^\perp$ be such that $\langle D + W, Z\rangle = \|\mathcal{P}_{T^\perp}(D + W)\|_*$.
By optimality of $D^\sharp$, we obtain

$$\|D\|_* \ge \|D^\sharp\|_* = \|D + W\|_* \ge |\langle D + W, U_rV_r^* + Z\rangle| = |\langle D + W, U_rV_r^*\rangle + \|\mathcal{P}_{T^\perp}(D + W)\|_*| = |\|\mathcal{P}_T(D)\|_* + \langle\mathcal{P}_T(W), U_rV_r^*\rangle + \|\mathcal{P}_{T^\perp}(D + W)\|_*| \ge \|\mathcal{P}_T(D)\|_* - |\langle\mathcal{P}_T(W), U_rV_r^*\rangle| + \|\mathcal{P}_{T^\perp}(W)\|_* - \|\mathcal{P}_{T^\perp}(D)\|_*,$$

where the second inequality holds by the variational characterization of the nuclear norm, $\|X\|_* = \sup_{\|Y\|\le 1}\langle X, Y\rangle$. In the third equality, we have used $\langle D, U_rV_r^*\rangle = \|\mathcal{P}_T(D)\|_*$.

Rearranging, we have shown

$$\|\mathcal{P}_{T^\perp}(W)\|_* \le 2\|\mathcal{P}_{T^\perp}(D)\|_* + |\langle\mathcal{P}_T(W), U_rV_r^*\rangle|. \tag{3.21}$$

Under the assumption $\|U_rV_r^* - \mathcal{P}_T(M)\|_F \le \beta_3$, the last term in (3.21) can be bounded as

$$|\langle\mathcal{P}_T(W), U_rV_r^*\rangle| \le |\langle\mathcal{P}_T(W), U_rV_r^* - \mathcal{P}_T(M)\rangle| + |\langle\mathcal{P}_T(W), \mathcal{P}_T(M)\rangle| \le \beta_3\|\mathcal{P}_T(W)\|_F + |\langle\mathcal{P}_T(W), \mathcal{P}_T(M)\rangle| = \beta_3\|\mathcal{P}_T(W)\|_F + |\langle W - \mathcal{P}_{T^\perp}(W), M - \mathcal{P}_{T^\perp}(M)\rangle| = \beta_3\|\mathcal{P}_T(W)\|_F + |\langle W, M\rangle - \langle\mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(M)\rangle| \le \beta_3\|\mathcal{P}_T(W)\|_F + |\langle W, M\rangle| + |\langle\mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(M)\rangle|. \tag{3.22}$$

In the second equality, we used the fact that $\langle W, \mathcal{P}_{T^\perp}(M)\rangle = \langle\mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(M)\rangle$.

We now bound the three terms in (3.22), beginning with $\|\mathcal{P}_T(W)\|_F$. The assumed inequalities (3.15) give that for any $Z \in T$ we have $\|\mathcal{A}(Z)\|_F^2 \ge (1 - \beta_1)\|Z\|_F^2$, and $\|\mathcal{A}(H)\|_F \le \beta_2\|H\|_F$ for any $H$.
Therefore,

$$\|\mathcal{P}_T(W)\|_F \le \frac{1}{\sqrt{1-\beta_1}}\|\mathcal{A}(\mathcal{P}_T(W))\|_F \le \frac{1}{\sqrt{1-\beta_1}}\|\mathcal{A}(W)\|_F + \frac{1}{\sqrt{1-\beta_1}}\|\mathcal{A}(\mathcal{P}_{T^\perp}(W))\|_F \le \frac{2\eta}{\sqrt{1-\beta_1}} + \frac{\beta_2}{\sqrt{1-\beta_1}}\|\mathcal{P}_{T^\perp}(W)\|_F. \tag{3.23}$$

The remaining terms, $|\langle\mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(M)\rangle|$ and $|\langle W, M\rangle|$, can be bounded by our assumptions (3.16):

$$|\langle\mathcal{P}_{T^\perp}(W), \mathcal{P}_{T^\perp}(M)\rangle| \le \|\mathcal{P}_{T^\perp}(W)\|_*\|\mathcal{P}_{T^\perp}(M)\| \le \beta_4\|\mathcal{P}_{T^\perp}(W)\|_*$$

and

$$|\langle W, M\rangle| := |\langle W, \mathcal{A}^*(Y)\rangle| = |\langle\mathcal{A}(W), Y\rangle| \le \|\mathcal{A}(W)\|_F\|Y\|_F \le 2\eta\beta_5\sqrt{r}.$$

Using these inequalities to bound $|\langle\mathcal{P}_T(W), U_rV_r^*\rangle|$ in (3.21), we obtain

$$\|\mathcal{P}_{T^\perp}(W)\|_* \le 2\|\mathcal{P}_{T^\perp}(D)\|_* + \frac{2\eta\beta_3}{\sqrt{1-\beta_1}} + \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}}\|\mathcal{P}_{T^\perp}(W)\|_F + \beta_4\|\mathcal{P}_{T^\perp}(W)\|_* + 2\eta\beta_5\sqrt{r} \le 2\|\mathcal{P}_{T^\perp}(D)\|_* + \left(\frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} + \beta_4\right)\|\mathcal{P}_{T^\perp}(W)\|_* + \left(\frac{2\beta_3}{\sqrt{1-\beta_1}} + 2\beta_5\sqrt{r}\right)\eta.$$

Since by assumption $\rho := \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} + \beta_4 < 1$, we can rearrange and obtain

$$\|\mathcal{P}_{T^\perp}(W)\|_* \le \frac{2}{1-\rho}\|\mathcal{P}_{T^\perp}(D)\|_* + \frac{\frac{2\beta_3}{\sqrt{1-\beta_1}} + 2\beta_5\sqrt{r}}{1-\rho}\eta.$$

From our previous calculation (3.23),

$$\|\mathcal{P}_T(W)\|_F \le \frac{2\eta}{\sqrt{1-\beta_1}} + \frac{\beta_2}{\sqrt{1-\beta_1}}\|\mathcal{P}_{T^\perp}(W)\|_F \le \frac{2\eta}{\sqrt{1-\beta_1}} + \frac{\beta_2}{\sqrt{1-\beta_1}}\|\mathcal{P}_{T^\perp}(W)\|_*,$$

so that these inequalities together give

$$\|W\|_F \le \|\mathcal{P}_T(W)\|_F + \|\mathcal{P}_{T^\perp}(W)\|_F \le \|\mathcal{P}_T(W)\|_F + \|\mathcal{P}_{T^\perp}(W)\|_* \le \frac{2\eta}{\sqrt{1-\beta_1}} + \left(\frac{\beta_2}{\sqrt{1-\beta_1}} + 1\right)\|\mathcal{P}_{T^\perp}(W)\|_* \le C_1\|\mathcal{P}_{T^\perp}(D)\|_*$$
$+ (C_2 + C_3\sqrt{r})\eta$, with appropriate constants, where by definition $\sigma_{r,*}(D) = \|\mathcal{P}_{T^\perp}(D)\|_*$. We finish by noting that these constants are given as

$$C_1 := 2\left(\frac{\beta_2}{1-\beta_1} + 1\right)\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1},$$

$$C_2 := \frac{2}{\sqrt{1-\beta_1}} + \frac{2\beta_3}{\sqrt{1-\beta_1}}\left(\frac{\beta_2}{1-\beta_1} + 1\right)\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1},$$

and

$$C_3 := 2\beta_5\left(1 - \frac{\beta_2\beta_3}{\sqrt{1-\beta_1}} - \beta_4\right)^{-1}.$$

3.6.2 Proof of Lemma 3.5.2

We now endeavor to prove Lemma 3.5.2. This result requires two lemmas, which we state here and prove in Section 3.10.

Before continuing, we establish some useful notation and observations for our linear operators. We consider $\mathcal{N} : \mathbb{C}^{N\times N}\to\mathbb{C}^{n\times m}$ and $\mathcal{F}^* : \mathbb{C}^{N\times N}\to\mathbb{C}^{N\times N}$ as ensembles of matrices $\{\mathcal{N}^{k,\ell}\}_{(k,\ell)\in[n]\times[m]} \subset \mathbb{C}^{N\times N}$ and $\{(\mathcal{F}^*)^{k,\ell}\}_{(k,\ell)\in[N]^2} \subset \mathbb{C}^{N\times N}$, respectively. Here the superscripts order the matrices such that the action of each operator on $X \in \mathbb{C}^{N\times N}$ is given entry-wise as

$$(\mathcal{N}(X))_{k\ell} = \langle\mathcal{N}^{k,\ell}, X\rangle,$$

for $(k,\ell)\in[n]\times[m]$, with an analogous definition for $\mathcal{F}^*$ and any other linear operator to be defined.

We remind the reader that we have normalized $\mathcal{A} := \frac{1}{\sqrt{nm}}\tilde{\mathcal{N}}\mathcal{F}^*$ where $\tilde{N} := \frac{N-1}{2}$, so that for $(k,\ell)\in[n]\times[m]$ and $(p,q)\in[N]\times[N]$

$$\tilde{\mathcal{N}}^{k,\ell}_{pq} = e(\tilde\tau_{k,\ell}(p - \tilde{N} - 1))\,e(\tilde\nu_{k,\ell}(q - \tilde{N} - 1)).$$

Our deviation model ensures that $\frac{1}{\sqrt{nm}}\tilde{\mathcal{N}}$ forms an isotropic ensemble, i.e., for any $X\in\mathbb{C}^{N\times N}$

$$\mathbb{E}\frac{1}{nm}\tilde{\mathcal{N}}^*\tilde{\mathcal{N}}(X) = X. \tag{3.24}$$

We produce the calculation here, which will also be useful in establishing the lemmas that follow. Throughout, let $\tilde\Delta, \tilde\Gamma \in \mathbb{R}$ be independent copies of the entries of $\Delta, \Gamma$ respectively, and let $p, q \in [N]$.
Expanding gives

$$\mathbb{E}\frac{1}{nm}(\tilde{\mathcal{N}}^*\tilde{\mathcal{N}}(X))_{pq} = \mathbb{E}\frac{1}{nm}\sum_{k=1}^n\sum_{\ell=1}^m\tilde{\mathcal{N}}^{k,\ell}_{pq}\langle\tilde{\mathcal{N}}^{k,\ell}, X\rangle = \mathbb{E}\frac{1}{nm}\sum_{k=1}^n\sum_{\ell=1}^m\tilde{\mathcal{N}}^{k,\ell}_{pq}\left(\sum_{\tilde p,\tilde q=1}^N\overline{\tilde{\mathcal{N}}^{k,\ell}_{\tilde p\tilde q}}\,X_{\tilde p\tilde q}\right) := \sum_{\tilde p,\tilde q=1}^N X_{\tilde p\tilde q}\,\mathbb{E}\sum_{k\ell}\frac{1}{nm}e(\tilde\tau_{k,\ell}(p-\tilde p))\,e(\tilde\nu_{k,\ell}(q-\tilde q)).$$

At this point, we use our deviation model to obtain

$$\mathbb{E}\sum_{k\ell}\frac{1}{nm}e(\tilde\tau_{k,\ell}(p-\tilde p))\,e(\tilde\nu_{k,\ell}(q-\tilde q)) = \mathbb{E}_{\mathcal{D}_1}e(\tilde\Delta(p-\tilde p))\,\mathbb{E}_{\mathcal{D}_2}e(\tilde\Gamma(q-\tilde q))\cdot\left(\sum_{k=1}^n\frac{1}{n}e\left(\left(\frac{k-1}{n}-\frac{1}{2}\right)(p-\tilde p)\right)\right)\left(\sum_{\ell=1}^m\frac{1}{m}e\left(\left(\frac{\ell-1}{m}-\frac{1}{2}\right)(q-\tilde q)\right)\right)$$

$$= \begin{cases} 1 & \text{if } p=\tilde p \text{ and } q=\tilde q,\\[2pt] \mathbb{E}_{\mathcal{D}_2}e(zm(\tilde\Gamma - 1/2)) = 0 & \text{if } p=\tilde p \text{ and } q-\tilde q = zm, \text{ for } z\in\mathbb{Z}\setminus\{0\},\\[2pt] \mathbb{E}_{\mathcal{D}_1}e(jn(\tilde\Delta - 1/2)) = 0 & \text{if } p-\tilde p = jn \text{ and } q=\tilde q, \text{ for } j\in\mathbb{Z}\setminus\{0\},\\[2pt] \mathbb{E}_{\mathcal{D}_1}e(jn(\tilde\Delta - 1/2))\,\mathbb{E}_{\mathcal{D}_2}e(zm(\tilde\Gamma - 1/2)) = 0 & \text{if } p-\tilde p = jn \text{ and } q-\tilde q = zm, \text{ for } j,z\in\mathbb{Z}\setminus\{0\},\\[2pt] 0 & \text{otherwise.} \end{cases}$$

The middle terms vanish by our deviation model ($\tilde\Delta\sim\mathcal{D}_1$, $\tilde\Gamma\sim\mathcal{D}_2$) and because $q-\tilde q = zm$, $p-\tilde p = jn$ imply that $z\in\left(\mathbb{Z}\cap\left[\frac{1-N}{m}, \frac{N-1}{m}\right]\right)\setminus\{0\}$ and $j\in\left(\mathbb{Z}\cap\left[\frac{1-N}{n}, \frac{N-1}{n}\right]\right)\setminus\{0\}$ (since $p, \tilde p, q, \tilde q \in [N]$). The final term is zero by orthogonality of the complex exponentials. Therefore,

$$\mathbb{E}\frac{1}{nm}(\tilde{\mathcal{N}}^*\tilde{\mathcal{N}}(X))_{pq} = X_{pq},$$

as desired. We will use this isotropy property, and similar orthogonality calculations, frequently in what follows.

We now provide a lemma that will be useful to establish both Lemma 3.5.2 and Lemma 3.5.4, in order to apply the respective concentration inequalities in each. The proof is postponed until Section 3.10.

Lemma 3.6.1.
Define $\tilde{\mathcal{N}}, \mathcal{F}$ as above. Then for $D\in T$ and all $(k,\ell)\in[n]\times[m]$,

$$|\langle\tilde{\mathcal{N}}^{k,\ell}, \mathcal{F}^*(D)\rangle| \le \gamma^2\sqrt{r}\|D\|_F \tag{3.25}$$

and

$$\frac{1}{nm}\mathbb{E}\sum_{k=1}^n\sum_{\ell=1}^m|\langle\tilde{\mathcal{N}}^{k,\ell}, \mathcal{F}^*(D)\rangle|^4 \le \gamma^4r\|D\|_F^4. \tag{3.26}$$

The next lemma is a generalization of Lemma 3.6 in [58] to the low-rank matrix recovery case. The proof is due to [94], but requires a slight modification for our setting. Adopting that author's notation, in what follows, for a matrix $A\in\mathbb{C}^{N\times N}$ we denote by $|A)(A|$ the operator that maps $X\mapsto A\langle A, X\rangle$.

Lemma 3.6.2. Let $m\le N^2$, $U\subset\mathbb{C}^{N\times N}$ and

$$U_2 := \{X\in\mathbb{C}^{N\times N} : \|X\|_F\le 1,\ \|X\|_*\le\sqrt{r}\|X\|_F\}.$$

Fix some $V_1,\dots,V_m\in\mathbb{C}^{N\times N}$ that satisfy

$$\max_{k\in[m]}\sup_{X\in U}|\langle V_k, X\rangle| \le K\sqrt{r}\|X\|_F.$$

Let $\epsilon_1,\dots,\epsilon_m$ be i.i.d. Rademacher random variables. Then

$$\mathbb{E}_\epsilon\sup_{X\in U\cap U_2}\sum_{k=1}^m\epsilon_k\langle|V_k)(V_k|(X), X\rangle \le \sqrt{r}\,CK\log^{5/2}(N)\log^{1/2}(m)\left(\sup_{X\in U\cap U_2}\sum_{k=1}^m\langle|V_k)(V_k|(X), X\rangle\right)^{1/2},$$

where $C$ is an absolute constant.

See Section 3.10 for the proof.

We now proceed to the proof of Lemma 3.5.2. This lemma computes the parameters $\beta_1, \beta_3$ and $\beta_5$ in (3.15) and (3.16).

Proof of Lemma 3.5.2. Let $\mathcal{A} = \frac{1}{\sqrt{nm}}\mathcal{N}\mathcal{S} = \frac{1}{\sqrt{nm}}\tilde{\mathcal{N}}\mathcal{F}^*$ and

$$X := \sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle.$$

We aim to show

$$X \le \frac{2\delta}{\sqrt{N}}, \tag{3.27}$$

which we bound by noting that

$$\sup_{X\in T\cap S}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle = \sup_{X\in\mathcal{T}}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle, \tag{3.28}$$

where

$$\mathcal{T} := U_2\cap T := \{X\in\mathbb{C}^{N\times N} : \|X\|_F = 1,\ \|X\|_*\le\sqrt{r}\|X\|_F\}\bigcap T.$$

Here we have adopted the notation $U_2$ from Lemma 3.6.2 (and from [94]).
Indeed, notice that for $X\in T$ we have

$$\mathrm{rank}(X) \le r,$$

which gives $\|X\|_*\le\sqrt{r}\|X\|_F$, and the containment $T\cap S\subset\mathcal{T}$ holds (and thus $\mathcal{T} = T\cap S$).

We now proceed along the lines of [94, 41, 58]. We reiterate that we are adopting some notation from Lemma 3.6.2 (and from [94]), where for a matrix $A\in\mathbb{C}^{N\times N}$ we denote by $|A)(A|$ the operator that maps $X\mapsto A\langle A, X\rangle$. Write

$$X := \sup_{X\in\mathcal{T}}\langle(\mathcal{A}^*\mathcal{A} - \mathcal{I})(X), X\rangle := \|\mathcal{A}^*\mathcal{A} - \mathcal{I}\|_{\mathcal{T}} = \left\|\sum_{k=1}^n\sum_{\ell=1}^m\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) - \mathcal{I}\right\|_{\mathcal{T}} = \left\|\sum_{k=1}^n\sum_{\ell=1}^m\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right)\right\|_{\mathcal{T}},$$

where the last equality holds due to isotropy of our ensemble (3.24). We will first bound $\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X$, and then apply a concentration inequality to show this random variable is concentrated around its mean.

Using symmetrization (as in equation (42) of [94], which uses Lemma 6.3 in [56]), we obtain

$$\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X \le 2\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\mathbb{E}_\epsilon\left\|\sum_{k=1}^n\sum_{\ell=1}^m\epsilon_{k,\ell}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right\|_{\mathcal{T}},$$

where the $\epsilon_{k,\ell}$ are Rademacher random variables. We now apply Lemma 3.6.2, which requires $nm\le N^2$ and computation of the parameter $K$, where in our case we have $U = \mathcal{T}$. We obtain

$$\max_{k,\ell}\sup_{X\in\mathcal{T}}|\langle\mathcal{A}^{k,\ell}, X\rangle| := \max_{k,\ell}\sup_{X\in\mathcal{T}}\frac{1}{\sqrt{nm}}|\langle\tilde{\mathcal{N}}^{k,\ell}, \mathcal{F}^*(X)\rangle| \le \frac{\gamma^2\sqrt{r}\|X\|_F}{\sqrt{nm}},$$

where the last inequality follows from Lemma 3.6.1.
Therefore, with $K = \frac{\gamma^2}{\sqrt{nm}}$, Lemma 3.6.2 gives

$$\mathbb{E}_\epsilon\left\|\sum_{k,\ell}\epsilon_{k,\ell}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right\|_{\mathcal{T}} \le C_1\left\|\sum_{k,\ell}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right\|_{\mathcal{T}}^{1/2},$$

where

$$C_1 := \frac{C\sqrt{r}\gamma^2\log^{5/2}(N)\sqrt{\log(nm)}}{\sqrt{nm}},$$

and $C > 0$ is the absolute constant given in Lemma 3.6.2.

Summarizing and continuing these calculations, we have

$$\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X \le 2C_1\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\left(\left\|\sum_{k,\ell}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right\|_{\mathcal{T}}\right)^{1/2} \le 2C_1\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\left(\left\|\sum_{k,\ell}\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) - \mathcal{I}\right\|_{\mathcal{T}} + 1\right)^{1/2} \le 2C_1\left(\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\left\|\sum_{k,\ell}\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) - \mathcal{I}\right\|_{\mathcal{T}} + 1\right)^{1/2} = 2C_1\left(\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X + 1\right)^{1/2}.$$

Therefore

$$\frac{\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X}{\sqrt{\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X + 1}} \le \frac{2C\sqrt{r}\gamma^2\log^{5/2}(N)\sqrt{\log(nm)}}{\sqrt{nm}},$$

and we achieve $\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}X \le \tau$ if

$$\frac{2C\sqrt{r}\gamma^2\log^{5/2}(N)\sqrt{\log(nm)}}{\sqrt{nm}} \le \frac{\tau}{\sqrt{\tau + 1}}. \tag{3.29}$$

We now apply a concentration inequality to show that $X$ is close to its expected value with high probability. As in the sparse vector recovery setting, we use Theorem 5.2 in [41] (proven in [87]).

Theorem 3.6.3. Let $Y_1,\dots,Y_m$ be a sequence of independent random variables with values in some Polish space $\mathcal{H}$. Let $\mathcal{F}$ be a countable collection of real-valued, measurable and bounded functions $f$ on $\mathcal{H}$ with $\|f\|_\infty\le B$ for all $f\in\mathcal{F}$. Let $Z$ be the random variable

$$Z = \sup_{f\in\mathcal{F}}\sum_{k=1}^m f(Y_k).$$

Assume $\mathbb{E}f(Y_k) = 0$ for all $k\in[m]$ and all $f\in\mathcal{F}$. Define $\sigma^2 = \sup_{f\in\mathcal{F}}\mathbb{E}\sum_{k=1}^m f(Y_k)^2$. Then for all $t\ge 0$,

$$P(Z\ge\mathbb{E}Z + t) \le \exp\left(-\frac{t}{4B}\log\left(1 + 2\log\left(1 + \frac{Bt}{2B\,\mathbb{E}Z + \sigma^2}\right)\right)\right).$$

To apply the theorem, define

$$\mathcal{H} := \left\{\mathcal{Z} : \mathbb{C}^{N\times N}\to\mathbb{C}^{N\times N} : \langle\mathcal{Z}(X), X\rangle\in\mathbb{R}\ \forall X\in\mathcal{T}\ \text{and}\ \sup_{X\in\mathcal{T}}\langle\mathcal{Z}(X), X\rangle\le\frac{\gamma^4r}{nm}\right\},$$

which is a closed subset of $\{\mathcal{Z} : \mathbb{C}^{N\times N}\to\mathbb{C}^{N\times N}\}$ (a Polish space via homeomorphism to $\mathbb{C}^{N^4}$), and therefore itself a Polish space (see [31]). For $X\in\mathcal{T}$ define the function $f_X : \mathcal{H}\to\mathbb{R}$ acting on operators as

$$f_X(\mathcal{Z}) := \langle\mathcal{Z}(X), X\rangle,$$

which is real-valued by definition of $\mathcal{H}$.
Then notice that

$$X := \left\|\sum_{k,\ell}\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) - \mathcal{I}\right\|_{\mathcal{T}} = \sup_{X\in\mathcal{T}}\sum_{k,\ell}\langle\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right)(X), X\rangle = \sup_{X\in\mathcal{T}}\sum_{k,\ell}f_X\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) = \sup_{X\in\mathcal{T}^*}\sum_{k,\ell}f_X\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right),$$

where $\mathcal{T}^*$ is a dense countable subset of $\mathcal{T}$.

To see that $\{|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\}_{(k,\ell)\in[n]\times[m]}\subset\mathcal{H}$, we use the first part of Lemma 3.6.1 and our previous remarks. For all $(k,\ell)\in[n]\times[m]$ and $X\in\mathcal{T}$, we obtain

$$f_X\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right) = |\langle\mathcal{A}^{k,\ell}, X\rangle|^2 - \mathbb{E}|\langle\mathcal{A}^{k,\ell}, X\rangle|^2 = \frac{1}{nm}|\langle\mathcal{N}^{k,\ell}, \mathcal{F}^*(X)\rangle|^2 - \mathbb{E}\frac{1}{nm}|\langle\mathcal{N}^{k,\ell}, \mathcal{F}^*(X)\rangle|^2 \le \frac{\gamma^4r}{nm},$$

and we may choose $B = \frac{\gamma^4r}{nm}$.

Now for $\sigma^2$, we apply the second part of Lemma 3.6.1 with our previous reasoning to obtain

$$\mathbb{E}\sum_{k,\ell}f_X\left(|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}| - \mathbb{E}|\mathcal{A}^{k,\ell})(\mathcal{A}^{k,\ell}|\right)^2 = \mathbb{E}\sum_{k,\ell}|\langle\mathcal{A}^{k,\ell}, X\rangle|^4 - \left(\mathbb{E}|\langle\mathcal{A}^{k,\ell}, X\rangle|^2\right)^2 \le \mathbb{E}\sum_{k,\ell}|\langle\mathcal{A}^{k,\ell}, X\rangle|^4 = \frac{1}{n^2m^2}\mathbb{E}\sum_{k,\ell}|\langle\mathcal{N}^{k,\ell}, \mathcal{F}^*(X)\rangle|^4 \le \frac{\gamma^4r}{nm} := \sigma^2.$$

To finish, apply Theorem 3.6.3 with $t = \frac{\delta}{\sqrt{N}}$ and choose $\tau = \frac{\delta}{\sqrt{N}}$. Then assuming

$$\frac{\sqrt{mn}}{\sqrt{\log(nm)}} \ge \frac{2C\sqrt{Nr}\gamma^2\log^{5/2}(N)\sqrt{1 + \frac{\delta}{\sqrt{N}}}}{\delta} \tag{3.30}$$

gives $\mathbb{E}X\le\frac{\delta}{\sqrt{N}}$ according to (3.29). Then

$$X \le \mathbb{E}X + t \le \frac{2\delta}{\sqrt{N}}$$

with probability of failure not exceeding

$$\exp\left(-\frac{nm\delta}{4\gamma^4r\sqrt{N}}\log\left(1 + 2\log\left(1 + \frac{t}{\frac{2\delta}{\sqrt{N}} + 1}\right)\right)\right).$$

3.6.3 Proof of Lemma 3.5.3

We now prove Lemma 3.5.3, which provides the parameter $\beta_2$ in (3.15).

Proof of Lemma 3.5.3. We wish to bound

$$\|\mathcal{A}\|_{F\to F} := \frac{1}{\sqrt{nm}}\|\tilde{\mathcal{N}}\mathcal{F}^*\|_{F\to F} = \frac{1}{\sqrt{nm}}\|\tilde{\mathcal{N}}\|_{F\to F}.$$

Let $\mathcal{M}\in\mathbb{C}^{nm\times N^2}$ be the matrix representation of the operator $\frac{1}{\sqrt{nm}}\tilde{\mathcal{N}} : \mathbb{C}^{N\times N}\to\mathbb{C}^{n\times m}$.
In other words, if \(\mathrm{vec}_{d_1,d_2} : \mathbb{C}^{d_1\times d_2} \to \mathbb{C}^{d_1 d_2}\) denotes the invertible operator that reorganizes a \(d_1\times d_2\) matrix into a vector of length \(d_1 d_2\), then for \(v \in \mathbb{C}^{N^2}\) the action of \(\mathcal{M}\) is given as
\[
\mathcal{M}v = \mathrm{vec}_{n,m}\left( \frac{1}{\sqrt{nm}}\widetilde{\mathcal{N}}\big( \mathrm{vec}_{N,N}^{-1}(v) \big) \right).
\]
With this operator defined, we see that
\[
\frac{1}{\sqrt{nm}}\|\widetilde{\mathcal{N}}\|_{F\to F} = \|\mathcal{M}\|,
\]
since any matrix \(X \in S\) will be uniquely identified with \(\mathrm{vec}_{N,N}(X) \in S^{N^2-1}\) and the suprema agree.

Using a matrix Bernstein inequality, we will show that
\[
\|\mathcal{M}^*\mathcal{M} - I\| \le \frac{2N}{r},
\]
which will then show, by Proposition A.15 in [85], that
\[
\|\mathcal{M}\| \le \sqrt{1 + \frac{2N}{r}} \le \sqrt{\frac{3N}{r}}.
\]
We will denote the rows of our ensemble as
\[
\mathcal{M}_{(k,\ell)*} = \frac{1}{\sqrt{nm}}\,\mathrm{vec}_{N,N}\big(\widetilde{N}^{k,\ell}\big),
\]
where this expression is allowed by our definition of \(\mathcal{M}\), and the strange indexing is for convenience, to refer to our original ensemble in a natural way (the result is independent of the ordering of the rows).

We expand
\[
\|\mathcal{M}^*\mathcal{M} - I\| = \Big\| \sum_{k=1}^n \sum_{\ell=1}^m \big( \mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*} - \mathbb{E}\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*} \big) \Big\| =: \Big\| \sum_{k=1}^n \sum_{\ell=1}^m \big( S_{k,\ell} - \mathbb{E}S_{k,\ell} \big) \Big\|,
\]
where the first equality holds by isotropy of \(\frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}}\) (3.24), shown in Section 3.6.2. We wish to apply a matrix Bernstein inequality (Corollary 6.1.2 in [48]) to this sum of independent and centered random matrices.
However, this makes computing the matrix variance statistic difficult, i.e., the parameter \(\nu\) in [48] defined as
\begin{align*}
\nu :=&\ \max\Big\{ \Big\|\mathbb{E}\sum_{k,\ell} (S_{k,\ell} - \mathbb{E}S_{k,\ell})(S_{k,\ell} - \mathbb{E}S_{k,\ell})^*\Big\|,\ \Big\|\mathbb{E}\sum_{k,\ell} (S_{k,\ell} - \mathbb{E}S_{k,\ell})^*(S_{k,\ell} - \mathbb{E}S_{k,\ell})\Big\| \Big\} \\
=&\ \max\Big\{ \Big\|\sum_{k,\ell} \big( \mathbb{E}S_{k,\ell}S_{k,\ell}^* - \mathbb{E}S_{k,\ell}\,\mathbb{E}S_{k,\ell}^* \big)\Big\|,\ \Big\|\sum_{k,\ell} \big( \mathbb{E}S_{k,\ell}^*S_{k,\ell} - \mathbb{E}S_{k,\ell}^*\,\mathbb{E}S_{k,\ell} \big)\Big\| \Big\}.
\end{align*}
Specifically, we wish to handle many different deviation distributions simultaneously (i.e., all \(\mathcal{D}_1, \mathcal{D}_2\) satisfying our deviation model), making the cross terms (\(\sum_{k,\ell} \mathbb{E}S_{k,\ell}\,\mathbb{E}S_{k,\ell}^*\) and \(\sum_{k,\ell} \mathbb{E}S_{k,\ell}^*\,\mathbb{E}S_{k,\ell}\)) particularly messy and perhaps only possible to handle on a case-by-case basis.

Instead, we circumvent this complication by applying our modified uniform matrix Bernstein inequality, Theorem 3.9.1. Indeed, notice that this result avoids dealing with the cross terms, at the cost of a reduced probability of success (which will still hold with high probability). Applying this result will greatly simplify our computations and allow us to proceed with all of our deviation models simultaneously.

We thus apply Theorem 3.9.1 and compute the corresponding parameters \(L\) and \(\tilde{\nu}\).
To compute \(L\), notice that
\begin{align*}
\Big\| \mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*} - \frac{1}{nm}I \Big\| &\le \|\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\| + \frac{1}{nm} = \|\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\|_F + \frac{1}{nm} \\
&= \|\mathcal{M}_{(k,\ell)*}\|_2^2 + \frac{1}{nm} = \frac{1}{nm}\|\widetilde{N}^{k,\ell}\|_F^2 + \frac{1}{nm} \\
&= \frac{N^2}{nm} + \frac{1}{nm} \le \frac{2N^2}{nm} =: L.
\end{align*}
The first equality holds since \(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\) is a rank-1 matrix.

Now to compute the variance, notice that
\begin{align*}
&\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2} \sum_{k=1}^n\sum_{\ell=1}^m \Big( \big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big)\big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big)^* - \frac{1}{(nm)^2}I \Big) \\
&\qquad = \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2} \sum_{k=1}^n\sum_{\ell=1}^m \Big( \|\mathcal{M}_{(k,\ell)*}\|_2^2\, \big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big) - \frac{1}{(nm)^2}I \Big) \\
&\qquad = \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2} \sum_{k=1}^n\sum_{\ell=1}^m \Big( \frac{N^2}{nm}\big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big) - \frac{1}{(nm)^2}I \Big) = \frac{N^2}{nm}I - \frac{1}{nm}I.
\end{align*}
Similarly,
\[
\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2} \sum_{k=1}^n\sum_{\ell=1}^m \Big( \big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big)^*\big(\mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*}\big) - \frac{1}{(nm)^2}I \Big) = \frac{N^2}{nm}I - \frac{1}{nm}I,
\]
and we can therefore choose
\[
\tilde{\nu} = \frac{N^2}{nm}.
\]
We now apply Theorem 3.9.1 to obtain, for any \(\delta \ge 0\),
\begin{align*}
\mathbb{P}\Big( \Big\|\sum_{k,\ell} \Big( \mathcal{M}^*_{(k,\ell)*}\mathcal{M}_{(k,\ell)*} - \frac{I}{nm} \Big)\Big\| \le 2\delta \Big)
&\ge 1 - \frac{2}{nm} - 4N\exp\left( -\frac{\delta^2}{8\log(nm)\tilde{\nu} + \frac{2L\delta}{3}} \right) \\
&= 1 - \frac{2}{nm} - 4N\exp\left( -\frac{nm\,\delta^2}{8\log(nm)N^2 + \frac{4N^2\delta}{3}} \right).
\end{align*}
Choose \(\delta = \frac{N}{r}\), which gives
\[
\|\mathcal{M}^*\mathcal{M} - I\| \le \frac{2N}{r}
\]
with probability greater than
\[
1 - \frac{2}{nm} - 4N\exp\left( -\frac{nm}{8\log(nm)r^2 + \frac{4Nr}{3}} \right).
\]
Therefore
\[
\|\mathcal{A}\|_{F\to F} = \|\mathcal{M}\| \le \sqrt{1 + \frac{2N}{r}} \le \sqrt{\frac{3N}{r}},
\]
by Proposition A.15 in [85], with the advertised probability.

3.6.4 Proof of Lemma 3.5.4

We finish by proving the last lemma used to prove Theorem 3.3.1, which provides the parameter \(\beta_4\) in (3.16).

Proof of Lemma 3.5.4. We simplify our workload with several tricks.
To avoid dealing with \(P_{T^\perp}\) and \(\mathcal{F}\), notice that
\begin{align*}
\|P_{T^\perp} \circ \mathcal{A}^*\mathcal{A}(U_r V_r^*)\| &= \|P_{T^\perp}(\mathcal{A}^*\mathcal{A}(U_r V_r^*) - U_r V_r^*)\| \le \|(\mathcal{A}^*\mathcal{A} - I)(U_r V_r^*)\| \\
&:= \Big\| \mathcal{F} \circ \Big( \frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}} - I \Big) \circ \mathcal{F}^*(U_r V_r^*) \Big\| \le \Big\| \Big( \frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}} - I \Big) \circ \mathcal{F}^*(U_r V_r^*) \Big\| \\
&:= \Big\| \Big( \frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}} - I \Big)(X) \Big\|,
\end{align*}
where we defined \(X := \mathcal{F}^*(U_r V_r^*)\). The first equality holds since \(P_{T^\perp}(U_r V_r^*) = 0\), and the first inequality holds since \(P_{T^\perp}\) is an orthogonal projection.

We may therefore bound the term \(\|(\frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}} - I)(X)\|\) instead. As in the proof of Lemma 3.5.3, we will achieve our goal by applying a matrix Bernstein inequality for the operator norm (Corollary 6.1.2 in [48]) via comparison to the uniform counterpart of our operator (Theorem 3.9.1, the uniform matrix Bernstein inequality).

Using again the notation of Lemma 3.6.2, we expand
\[
\Big( \frac{1}{nm}\widetilde{\mathcal{N}}^*\widetilde{\mathcal{N}} - I \Big)(X) = \sum_{k=1}^n\sum_{\ell=1}^m \Big( \frac{1}{nm}|\widetilde{N}^{k,\ell})(\widetilde{N}^{k,\ell}|(X) \Big) - X =: \sum_{k=1}^n\sum_{\ell=1}^m S_{k,\ell} - X,
\]
and proceed by applying our uniform matrix Bernstein inequality, Theorem 3.9.1. As in the proof of Lemma 3.5.3, this approach will allow us to deal with all distributions from our deviation model simultaneously (as opposed to a direct application of Corollary 6.1.2 in [48]).

We now apply Theorem 3.9.1. To bound the operator norm of each matrix (parameter \(L\) in Theorem 3.9.1), we have
\begin{align*}
\frac{1}{nm}\big\| |\widetilde{N}^{k,\ell})(\widetilde{N}^{k,\ell}|(X) \big\| &= \frac{1}{nm}|\langle \widetilde{N}^{k,\ell}, X\rangle|\, \|\widetilde{N}^{k,\ell}\| := \frac{1}{nm}|\langle \widetilde{N}^{k,\ell}, \mathcal{F}^*(U_r V_r^*)\rangle|\, \|\widetilde{N}^{k,\ell}\| \\
&\le \frac{\gamma^2 r}{nm}\|\widetilde{N}^{k,\ell}\| = \frac{\gamma^2 r}{nm}\|\widetilde{N}^{k,\ell}\|_F = \frac{\gamma^2 r N}{nm}.
\end{align*}
The inequality is due to Lemma 3.6.1 (with \(\|U_r V_r^*\|_F = \sqrt{r}\)) since \(U_r V_r^* \in T\), and it holds for any multi-index \((k,\ell) \in [n]\times[m]\). The second-to-last equality holds since \(\widetilde{N}^{k,\ell}\) is always rank 1.
It is therefore easy to see that
\[
\Big\| S_{k,\ell} - \frac{X}{nm} \Big\| := \frac{1}{nm}\big\| |\widetilde{N}^{k,\ell})(\widetilde{N}^{k,\ell}|(X) - X \big\| \le \frac{2\gamma^2 r N}{nm} =: L.
\]
Now for the variance matrix statistic of the sum (parameter \(\tilde{\nu}\) in Theorem 3.9.1). We see that
\[
\Big\| \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} \Big( S_{k,\ell}^* S_{k,\ell} - \frac{X^*X}{(nm)^2} \Big) \Big\| \le \Big\| \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell} \Big\| + \frac{\|X^*X\|}{nm} \le \Big\| \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell} \Big\| + \frac{1}{nm}.
\]
The last inequality holds since
\[
\|X^*X\| := \big\| (\mathcal{F}^*(U_r V_r^*))^*\, \mathcal{F}^*(U_r V_r^*) \big\| \le \|\mathcal{F}^*(U_r V_r^*)\|\,\|\mathcal{F}^*(U_r V_r^*)\| = \|U_r V_r^*\|^2 = 1.
\]
To bound the remaining term, we show that \(\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell}\) is a diagonal matrix. Let \(u, v \in [N]\); then
\begin{align*}
\Big( \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k=1}^n\sum_{\ell=1}^m S_{k,\ell}^* S_{k,\ell} \Big)_{uv}
&:= \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k=1}^n\sum_{\ell=1}^m \frac{1}{(nm)^2}\big( \widetilde{N}^{k,\ell\,*}\widetilde{N}^{k,\ell} \big)_{uv}\, |\langle \widetilde{N}^{k,\ell}, X\rangle|^2 \\
&= \frac{1}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} \big\langle \widetilde{N}^{k,\ell}_{*u}, \widetilde{N}^{k,\ell}_{*v} \big\rangle\, |\langle \widetilde{N}^{k,\ell}, X\rangle|^2 \\
&= \frac{1}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} \Big( \sum_{q=1}^N \overline{\widetilde{N}^{k,\ell}_{qu}}\, \widetilde{N}^{k,\ell}_{qv} \Big) |\langle \widetilde{N}^{k,\ell}, X\rangle|^2 \\
&:= \frac{1}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} \Big( \sum_{q=1}^N e\big( \tilde{\nu}_{k,\ell}(u-v) \big) \Big) |\langle \widetilde{N}^{k,\ell}, X\rangle|^2 \\
&= \frac{N}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} e\big( \tilde{\nu}_{k,\ell}(u-v) \big) |\langle \widetilde{N}^{k,\ell}, X\rangle|^2 \\
&= \frac{N}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} e\big( \tilde{\nu}_{k,\ell}(u-v) \big) \Big| \sum_{q,j=1}^N \bar{X}_{qj}\widetilde{N}^{k,\ell}_{qj} \Big|^2 \\
&= \frac{N}{(nm)^2}\,\mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} e\big( \tilde{\nu}_{k,\ell}(u-v) \big) \sum_{q,\tilde{q},j,\tilde{j}=1}^N \bar{X}_{qj} X_{\tilde{q}\tilde{j}}\, e\big( \tilde{\tau}_{k,\ell}(q-\tilde{q}) \big) e\big( \tilde{\nu}_{k,\ell}(j-\tilde{j}) \big) \\
&= \frac{N}{(nm)^2}\sum_{q,\tilde{q},j,\tilde{j}=1}^N \bar{X}_{qj} X_{\tilde{q}\tilde{j}} \Big( \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} e\big( \tilde{\tau}_{k,\ell}(q-\tilde{q}) \big) e\big( \tilde{\nu}_{k,\ell}(u-v+j-\tilde{j}) \big) \Big) \\
&= \frac{N}{nm}\sum_{q=1}^N \sum_{j-\tilde{j}=v-u} \bar{X}_{qj} X_{q\tilde{j}} = \frac{N}{nm}\sum_{q=1}^N \sum_{j\in Q_{uv}} \bar{X}_{qj} X_{q(j+u-v)}.
\end{align*}
We take a moment to breathe and explain our calculations thus far.
The second-to-last equality holds by our deviation model, similar to the argument used to establish isotropy (3.24) and the proof of the second part of Lemma 3.6.1. In the last line, \(Q_{uv}\) is the index set of allowed values for \(j\) according to \(u\) and \(v\), i.e., so that \(j, j+u-v \in [N]\). Notice that if \(u = v\), then \(Q_{uv} = [N]\) and we have shown for all \(u \in [N]\) that
\[
\Big( \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell} \Big)_{uu} = \frac{N}{nm}\sum_{q,j=1}^N X_{qj}\bar{X}_{qj} = \frac{N\|X\|_F^2}{nm} = \frac{N\|U_r V_r^*\|_F^2}{nm} = \frac{Nr}{nm}.
\]
We continue assuming \(u \ne v\). Since \(X := \mathcal{F}^*(U_r V_r^*) := \mathcal{F}^*(Z)\), we obtain
\begin{align*}
\frac{N}{nm}\sum_q \sum_{j\in Q_{uv}} \bar{X}_{qj} X_{q(j+u-v)}
&:= \frac{N}{nm}\sum_q \sum_{j\in Q_{uv}} \overline{\langle (\mathcal{F}^*)^{q,j}, Z\rangle}\, \langle (\mathcal{F}^*)^{q,(j+u-v)}, Z\rangle \\
&= \frac{N}{nm}\sum_q \sum_{j\in Q_{uv}} \sum_{k,\ell=1}^N \sum_{\tilde{k},\tilde{\ell}=1}^N \overline{(\mathcal{F}^*)^{q,j}_{k\ell}}\, Z_{k\ell}\, (\mathcal{F}^*)^{q,(j+u-v)}_{\tilde{k}\tilde{\ell}}\, \bar{Z}_{\tilde{k}\tilde{\ell}} \\
&:= \frac{1}{Nnm}\sum_q \sum_{j\in Q_{uv}} \sum_{k,\ell=1}^N \sum_{\tilde{k},\tilde{\ell}=1}^N e\big( q(\tau_k - \tau_{\tilde{k}}) \big)\, e\big( \nu_\ell j - \nu_{\tilde{\ell}}(j+u-v) \big)\, Z_{k\ell}\bar{Z}_{\tilde{k}\tilde{\ell}} \\
&= \frac{1}{Nnm}\sum_{j\in Q_{uv}} \sum_{k,\ell=1}^N \sum_{\tilde{k},\tilde{\ell}=1}^N e\big( \nu_\ell j - \nu_{\tilde{\ell}}(j+u-v) \big)\, Z_{k\ell}\bar{Z}_{\tilde{k}\tilde{\ell}} \Big( \sum_{q=1}^N e\big( q(\tau_k - \tau_{\tilde{k}}) \big) \Big) \\
&= \frac{1}{nm}\sum_{j\in Q_{uv}} \sum_{k=1}^N \sum_{\ell=1}^N \sum_{\tilde{\ell}=1}^N e\big( \nu_\ell j - \nu_{\tilde{\ell}}(j+u-v) \big)\, Z_{k\ell}\bar{Z}_{k\tilde{\ell}} \\
&= \frac{1}{nm}\sum_{j\in Q_{uv}} \sum_{\ell=1}^N \sum_{\tilde{\ell}=1}^N e\big( \nu_\ell j - \nu_{\tilde{\ell}}(j+u-v) \big) \Big( \sum_{k=1}^N Z_{k\ell}\bar{Z}_{k\tilde{\ell}} \Big) \\
&= \frac{1}{nm}\sum_{j\in Q_{uv}} \sum_{\ell=1}^N e\big( \nu_\ell(j - (j+u-v)) \big) = \frac{1}{nm}\sum_{j\in Q_{uv}} \sum_{\ell=1}^N e\big( \nu_\ell(v-u) \big) \\
&= \frac{|Q_{uv}|}{nm}\sum_{\ell=1}^N e\big( \nu_\ell(v-u) \big) = 0.
\end{align*}
In the seventh equality, we used
\[
\sum_{k=1}^N Z_{k\ell}\bar{Z}_{k\tilde{\ell}} := \sum_{k=1}^N (U_r V_r^*)_{k\ell}\,\overline{(U_r V_r^*)_{k\tilde{\ell}}} = \begin{cases} 0 & \text{if } \ell \ne \tilde{\ell}, \\ 1 & \text{if } \ell = \tilde{\ell}, \end{cases}
\]
which holds since the columns of \(U_r, V_r\) are orthonormal.

Therefore
\[
\Big\| \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell} \Big\| = \frac{Nr}{nm}\|I\| = \frac{Nr}{nm}.
\]
Similarly, we can show
\[
\Big\| \mathbb{E}_{\mathcal{D}_1,\mathcal{D}_2}\sum_{k,\ell} S_{k,\ell} S_{k,\ell}^* \Big\| = \frac{Nr}{nm},
\]
allowing us to bound \(\tilde{\nu}\)
\u2264 (Nr+1)nm\u2264 2Nrnm. We now apply Theorem 3.9.1 toobtain for any \u03b4 \u2265 0P(\u2016\u2211k,`(1nm|N\u02dc k,`)(N\u02dc k,`|(X))\u2212X\u2016 \u2264 2\u03b4)\u22651\u2212 2nm\u2212 4N exp(\u2212 \u03b428 log(nm)\u03bd\u02dc + 2L\u03b43)=1\u2212 2nm\u2212 4N exp(\u2212 nm\u03b4216Nr log(nm) + 4\u03b32Nr\u03b43).We conclude\u2016PT\u22a5 \u25e6 A\u2217A(UrV \u2217r )\u2016 \u2264 \u2016(1nmN\u02dc \u2217N\u02dc \u2212 I)(F\u2217(UrV \u2217r ))\u2016 \u2264 2\u03b4,with the prescribed probability.3.7 Interpolation Error of 2D Dirichlet Kernel:ProofIn this section we establish the error term of our 2D Dirichlet interpolationkernel when applied to our signal model. The argument utilizes the ideas ofthe proof of Theorem 2.2.1 (the analogous 1D statement), and we will referto this proof for brevity.Proof of Theorem 3.2.1. We begin by establishing the error-free case (3.4).Under this scenario we assume \u03c4\u02dck,` = \u03c4p\u02dc and \u03bd\u02dck,` = \u03bdq\u02dc for some p\u02dc, q\u02dc \u2208 [N ].The claim follows easily by orthogonality of the complex exponential basis.90Combining (3.2), (3.3) we have (recall that N\u02dc := N\u221212)(S(D))k` = 1N2N\u2211p=1N\u2211q=1Dpq\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dcN\u02dc\u2211v=\u2212N\u02dce(u\u03c4p)e(\u2212u\u03c4\u02dck,`)e(v\u03bdq)e(\u2212v\u03bd\u02dck,`)\uf8f6\uf8f8:=1N2N\u2211p=1N\u2211q=1Dpq\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dcN\u02dc\u2211v=\u2212N\u02dce(u\u03c4p)e(\u2212u\u03c4p\u02dc)e(v\u03bdq)e(\u2212v\u03bdq\u02dc)\uf8f6\uf8f8=1N2N\u2211p=1N\u2211q=1Dpq\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dce(u(\u03c4p \u2212 \u03c4p\u02dc))\uf8f6\uf8f8\uf8eb\uf8ed N\u02dc\u2211v=\u2212N\u02dce(v(\u03bdq \u2212 \u03bdq\u02dc))\uf8f6\uf8f8= Dp\u02dcq\u02dc = D(\u03c4p\u02dc, \u03bdq\u02dc) := D(\u03c4\u02dck,`, \u03bd\u02dck,`) = D\u02dck`.To establish (3.5), recall the Fourier expansion of our 2D functionD(x, y) 
=\u221e\u2211k\u02dc=\u2212\u221e\u221e\u2211\u02dc`=\u2212\u221eck\u02dc,\u02dc`e(k\u02dcx)e(\u02dc`y).Combining (3.2), (3.3) and the Fourier expansion we obtain(S(D))k` = 1N2N\u2211p=1N\u2211q=1Dpq\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dcN\u02dc\u2211v=\u2212N\u02dce(u\u03c4p)e(\u2212u\u03c4\u02dck,`)e(v\u03bdq)e(\u2212v\u03bd\u02dck,`)\uf8f6\uf8f8=1N2N\u2211p=1N\u2211q=1D(\u03c4p, \u03bdq)\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dcN\u02dc\u2211v=\u2212N\u02dce(u\u03c4p)e(\u2212u\u03c4\u02dck,`)e(v\u03bdq)e(\u2212v\u03bd\u02dck,`)\uf8f6\uf8f8=1N2N\u2211p=1N\u2211q=1\uf8eb\uf8ed \u221e\u2211k\u02dc=\u2212\u221e\u221e\u2211\u02dc`=\u2212\u221eck\u02dc,\u02dc`e(k\u02dc\u03c4p)e(\u02dc`\u03bdq)\uf8f6\uf8f8\u00b7\uf8eb\uf8ed N\u02dc\u2211u=\u2212N\u02dcN\u02dc\u2211v=\u2212N\u02dce(u\u03c4p)e(\u2212u\u03c4\u02dck,`)e(v\u03bdq)e(\u2212v\u03bd\u02dck,`)\uf8f6\uf8f8 .At this point, we wish to switch the order of summation to sum over allp, q. To this end, we proceed assuming Dpq =\u2211k\u02dc,\u02dc`ck\u02dc,\u02dc`e(k\u02dc\u03c4p)e(\u02dc`\u03bdq) 6= 0 and\u2211u,v e(u\u03c4p)e(\u2212u\u03c4\u02dck,`)e(v\u03bdq)e(\u2212v\u03bd\u02dck,`) 6= 0 for all p, q \u2208 [N ]. We will deal withthe case where these terms vanish afterward. 
In particular, we can remove the assumption \(D_{pq} \ne 0\), and we will show that the remaining term does not vanish under our assumption \(\tilde{\tau}\times\tilde{\nu} \subset \Omega\).

Continuing under our added assumption, we may switch the order of summation and sum over all \(p, q\), since these summands are non-zero, to obtain
\begin{align*}
(S(D))_{k\ell} &= \frac{1}{N^2}\sum_{u,v}\sum_{\tilde{k}=-\infty}^{\infty}\sum_{\tilde{\ell}=-\infty}^{\infty} c_{\tilde{k},\tilde{\ell}}\, e(-u\tilde{\tau}_{k,\ell})e(-v\tilde{\nu}_{k,\ell}) \Big( \sum_{p,q} e\big((u+\tilde{k})\tau_p\big)e\big((v+\tilde{\ell})\nu_q\big) \Big) \\
&= \sum_{u,v}\sum_{j=-\infty}^{\infty}\sum_{\tilde{j}=-\infty}^{\infty} (-1)^{jN+\tilde{j}N}\, c_{(jN+u),(\tilde{j}N+v)}\, e(u\tilde{\tau}_{k,\ell})e(v\tilde{\nu}_{k,\ell}) \\
&= \sum_{j=-\infty}^{\infty}\sum_{\tilde{j}=-\infty}^{\infty} (-1)^{\lfloor\frac{j+\tilde{N}}{N}\rfloor + \lfloor\frac{\tilde{j}+\tilde{N}}{N}\rfloor}\, c_{j,\tilde{j}}\, e\big(r_N(j)\tilde{\tau}_{k,\ell}\big)e\big(r_N(\tilde{j})\tilde{\nu}_{k,\ell}\big).
\end{align*}
The second equality is obtained by orthogonality of the exponential basis functions: \(\sum_{p=1}^N e((u+\tilde{k})\tau_p) = 0\) when \(\tilde{k}+u \notin N\mathbb{Z}\), and is otherwise equal to \((-1)^{jN} = (-1)^j\) for some \(j\in\mathbb{Z}\) (where \(u+\tilde{k} = jN\)). A similar conclusion holds for the independent sum over the index \(q\) (with \(v+\tilde{\ell} = \tilde{j}N\)). The last equality results from a reordering of the series, as in the proof of Theorem 2.2.1. To elaborate on this reordering, note that \((-1)^{jN+\tilde{j}N} = (-1)^{j+\tilde{j}}\) (\(N\) is assumed to be odd), and we ask the reader to recall the reordering in the proof of Theorem 2.2.1 in Section 2.6 in order to apply the same argument here.
Starting with the previous term, we apply the reordering argument from the 1D case twice, as follows:
\begin{align*}
&\sum_{u,v}\sum_{j=-\infty}^{\infty}\sum_{\tilde{j}=-\infty}^{\infty} (-1)^{jN+\tilde{j}N}\, c_{(jN+u),(\tilde{j}N+v)}\, e(u\tilde{\tau}_{k,\ell})e(v\tilde{\nu}_{k,\ell}) \\
&= \sum_v \sum_{\tilde{j}=-\infty}^{\infty} (-1)^{\tilde{j}}\, e(v\tilde{\nu}_{k,\ell}) \Big( \sum_u \sum_{j=-\infty}^{\infty} (-1)^j\, c_{(jN+u),(\tilde{j}N+v)}\, e(u\tilde{\tau}_{k,\ell}) \Big) \\
&= \sum_v \sum_{\tilde{j}=-\infty}^{\infty} (-1)^{\tilde{j}}\, e(v\tilde{\nu}_{k,\ell}) \Big( \sum_{j=-\infty}^{\infty} (-1)^{\lfloor\frac{j+\tilde{N}}{N}\rfloor}\, c_{j,(\tilde{j}N+v)}\, e\big(r_N(j)\tilde{\tau}_{k,\ell}\big) \Big) \\
&= \sum_{j=-\infty}^{\infty} (-1)^{\lfloor\frac{j+\tilde{N}}{N}\rfloor}\, e\big(r_N(j)\tilde{\tau}_{k,\ell}\big) \Big( \sum_v \sum_{\tilde{j}=-\infty}^{\infty} (-1)^{\tilde{j}}\, c_{j,\tilde{j}N+v}\, e(v\tilde{\nu}_{k,\ell}) \Big) \\
&= \sum_{j=-\infty}^{\infty} (-1)^{\lfloor\frac{j+\tilde{N}}{N}\rfloor}\, e\big(r_N(j)\tilde{\tau}_{k,\ell}\big) \Big( \sum_{\tilde{j}=-\infty}^{\infty} (-1)^{\lfloor\frac{\tilde{j}+\tilde{N}}{N}\rfloor}\, c_{j,\tilde{j}}\, e\big(r_N(\tilde{j})\tilde{\nu}_{k,\ell}\big) \Big),
\end{align*}
where the 1D reordering was applied in the second and fourth equalities. For \(j, \tilde{j} \in \{-\tilde{N}, -\tilde{N}+1, \dots, \tilde{N}-1, \tilde{N}\}\) we have \((-1)^{\lfloor\frac{j+\tilde{N}}{N}\rfloor} = (-1)^{\lfloor\frac{\tilde{j}+\tilde{N}}{N}\rfloor} = 1\), \(r_N(j) = j\) and \(r_N(\tilde{j}) = \tilde{j}\). Using the Fourier expansion of \(D(\tilde{\tau}_{k,\ell}, \tilde{\nu}_{k,\ell}) = \tilde{D}_{k\ell}\), we have shown
\begin{align*}
\tilde{D}_{k\ell} - (S(D))_{k\ell} &= D(\tilde{\tau}_{k,\ell}, \tilde{\nu}_{k,\ell}) - (S(D))_{k\ell} \\
&= \sum_{|p|>\tilde{N}}\sum_{|q|>\tilde{N}} c_{p,q}\Big( e(p\tilde{\tau}_{k,\ell})e(q\tilde{\nu}_{k,\ell}) - (-1)^{\lfloor\frac{p+\tilde{N}}{N}\rfloor + \lfloor\frac{q+\tilde{N}}{N}\rfloor}\, e\big(r_N(p)\tilde{\tau}_{k,\ell}\big)e\big(r_N(q)\tilde{\nu}_{k,\ell}\big) \Big).
\end{align*}
The definition of the Frobenius norm and the triangle inequality finish the proof in this case.

We are left to consider the cases where \(D_{pq} = 0\) or
\[
\sum_{u,v} e(u\tau_p)e(-u\tilde{\tau}_{k,\ell})e(v\nu_q)e(-v\tilde{\nu}_{k,\ell}) = 0
\]
for some \(p, q \in [N]\).
Both of these will be handled as in the proof of Theorem 2.2.1 in Section 2.6, and we refer the reader to that proof for a more detailed explanation.

The constraint \(D_{pq} \ne 0\) for all \(p, q \in [N]\) can be removed by finding some \(\mu\in\mathbb{R}\) such that \(H_{pq} := D_{pq} + \mu \ne 0\) for all \(p, q \in [N]\), and letting \(H := D + \mu\). Then the argument above holds assuming only \(\sum_{u,v} e(u\tau_p)e(-u\tilde{\tau}_{k,\ell})e(v\nu_q)e(-v\tilde{\nu}_{k,\ell}) \ne 0\) for all \(p, q \in [N]\). We conclude
\[
H(\tilde{\tau}_{k,\ell}, \tilde{\nu}_{k,\ell}) - (S(H))_{k\ell} = \sum_{|p|>\tilde{N}}\sum_{|q|>\tilde{N}} c_{p,q}\Big( e(p\tilde{\tau}_{k,\ell})e(q\tilde{\nu}_{k,\ell}) - (-1)^{\lfloor\frac{p+\tilde{N}}{N}\rfloor + \lfloor\frac{q+\tilde{N}}{N}\rfloor}\, e\big(r_N(p)\tilde{\tau}_{k,\ell}\big)e\big(r_N(q)\tilde{\nu}_{k,\ell}\big) \Big),
\]
but it is easy to show that
\[
H(\tilde{\tau}_{k,\ell}, \tilde{\nu}_{k,\ell}) - (S(H))_{k\ell} = D(\tilde{\tau}_{k,\ell}, \tilde{\nu}_{k,\ell}) - (S(D))_{k\ell},
\]
so the result holds in this case as well.

For the final case, one can use the orthogonality of the complex exponential basis to show that \(\sum_{u,v} e(u\tau_p)e(-u\tilde{\tau}_{k,\ell})e(v\nu_q)e(-v\tilde{\nu}_{k,\ell}) = 0\) if \(\tilde{\tau}_{k,\ell} = \frac{j}{N} - \frac{1}{2}\) or \(\tilde{\nu}_{k,\ell} = \frac{\tilde{j}}{N} - \frac{1}{2}\) for some \(j, \tilde{j} \in \mathbb{Z}\setminus\{0, 1, \dots, N-1\}\). Both of these cases violate the assumption \(\tilde{\tau}\times\tilde{\nu} \subset \Omega\), so the previous arguments always apply.

3.8 Proof of Theorem 3.3.2: Stable and Robust Matrix Completion

As mentioned in Section 3.3.1, the proof of our matrix completion result, Theorem 3.3.2, is almost a corollary of Theorem 3.3.1. However, it requires several changes in our assumptions and methodology. For the sake of brevity, we present these details here in the form of a quick informal argument rather than a proof. With these differences noted, the proofs are very similar. In this context, let \(\{(\tilde{\tau}_k, \tilde{\nu}_k)\}_{k=1}^m \subset \Omega\) denote our non-equispaced samples (notice we have dropped the \(\ell\) index).
Our sampled matrix entries \(\Lambda\) in this context are drawn from the uniform distribution on \([N]^2\). This is achieved by generating, for each \(k \in [m]\),
\[
(\tilde{\tau}_k, \tilde{\nu}_k) = (\tau_{p_k}, \nu_{q_k}),
\]
where \(p_k, q_k\) are chosen independently and uniformly at random from \([N]\). Here \(\{(\tau_p, \nu_q)\}_{p,q\in[N]}\) still correspond to our equispaced samples from Section 3.2.1, which indeed give the matrix entries of \(D\), i.e., \(D_{pq} = D(\tau_p, \nu_q)\) (the signal model \(D\) will no longer be needed here; see below).

There are several things to notice. First, for all \(k \in [m]\) we have \((\tilde{\tau}_k, \tilde{\nu}_k) = (\tau_p, \nu_q)\) for some \(p, q \in [N]\), so that we may apply the error-free interpolation result (3.4) from Theorem 3.2.1. This justifies our noise constraint \(\sim\eta\) in (3.11), which does not include the error of the \(\frac{N-1}{2}\)-bandlimited approximation; i.e., \(D\) is feasible under this noise constraint. Notice that this observation removes our need for the signal model \(D(x, y)\), since we are only working with samples of the form \(D_{pq}\), which are fit for any matrix (not necessarily generated as a discretization of a function).

Second, and most important, we no longer strictly fit our 2D deviation model from Section 3.2.2, since we have gotten rid of the deviation structure, i.e., \(\Delta, \Gamma\) no longer play a role. However, our new ensemble still exhibits isotropy (3.24), since for \(\ell, \tilde{\ell} \in [N]\) with \(\ell \ne \tilde{\ell}\) we have
\[
\mathbb{E}\, e\big( \tilde{\tau}_k(\ell - \tilde{\ell}) \big) = \frac{1}{N}\sum_{p=1}^N e\big( \tau_p(\ell - \tilde{\ell}) \big) = \frac{1}{N}\, e\Big( -\frac{\ell-\tilde{\ell}}{2} \Big)\, \frac{1 - e(\ell - \tilde{\ell})}{1 - e\big( \frac{\ell-\tilde{\ell}}{N} \big)} = 0,
\]
where the last equality holds since \(|\ell - \tilde{\ell}| \le N - 1\). This observation allows us to keep the crucial isotropy property (3.24) as shown in Section 3.6.2, which is applied several times throughout the proof of Theorem 3.3.1 and its required lemmas.
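The vanishing average above is just a finite geometric sum, and it is easy to check numerically. The sketch below is a minimal sanity check of this isotropy property, under stated assumptions: the equispaced grid is taken to be \(\tau_p = \frac{p-1}{N} - \frac{1}{2}\) and \(e(t) := e^{2\pi i t}\) (the exact grid convention is an assumption, not taken verbatim from the text).

```python
import numpy as np

N = 7                                        # ambient dimension (odd, as in the text)
tau = (np.arange(1, N + 1) - 1) / N - 0.5    # assumed equispaced grid on [-1/2, 1/2)

def e(t):
    """e(t) := exp(2*pi*i*t), the complex exponential used throughout."""
    return np.exp(2j * np.pi * t)

# E e(tau_k (l - l~)) over a uniformly drawn grid node is the average over the grid;
# it vanishes for every offset 0 < |l - l~| <= N - 1, which is the isotropy property.
for diff in range(1, N):
    assert abs(e(tau * diff).mean()) < 1e-12

# Offsets that are multiples of N survive: diff = 0 gives 1, while diff = +-N gives
# e(-N/2) = -1 for odd N (the "+-N mid-cases" that reappear in the fourth-moment bound).
assert abs(e(tau * 0).mean() - 1) < 1e-12
assert abs(e(tau * N).mean() - (-1)) < 1e-12
```

The same cancellation, applied once per tensor factor, is what kills all but the diagonal (and \(\pm N\)-shifted) index constraints in the moment computations that follow.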
Furthermore, Lemma 3.5.4 still holds, where the orthogonality is needed to show that \(\mathbb{E}\sum_{k,\ell} S_{k,\ell}^* S_{k,\ell}\) and \(\mathbb{E}\sum_{k,\ell} S_{k,\ell} S_{k,\ell}^*\) are diagonal matrices. This claim remains true here and can be proved in a very similar manner. The only remaining computation that is lost under our modified sample model is in the proof of Lemma 3.6.1, where this orthogonality is used to bound the sum of the fourth moments in establishing (3.26). However, a variant of this claim still holds in this context. In particular, we can show that for \(D \in T\) we have
\[
\frac{1}{m}\mathbb{E}\sum_{k=1}^m |\langle \widetilde{N}^{k,k}, \mathcal{F}^*(D)\rangle|^4 \le 4\gamma^4 r\, \|D\|_F^4, \tag{3.31}
\]
where \(\widetilde{N}^{k,k}\) is meant to adopt our previous notation, i.e., for \(k \in [m]\) and \(p, q \in [N]\),
\[
\widetilde{N}^{k,k}_{pq} = e\big( \tilde{\tau}_k(p - \tilde{N} - 1) \big)\, e\big( \tilde{\nu}_k(q - \tilde{N} - 1) \big).
\]
Notice that the only difference between (3.31) and (3.26) is the factor of 4. The claim (3.31) can be shown as follows: assume \(D \in T\). Expanding, we obtain
\begin{align*}
\frac{1}{m}\mathbb{E}\sum_{k=1}^m |\langle \widetilde{N}^{k,k}, \mathcal{F}^*(D)\rangle|^4
&= \sum_{p_1,p_2,p_3,p_4=1}^N \sum_{q_1,q_2,q_3,q_4=1}^N \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}} \\
&\qquad \cdot \Big( \frac{1}{m}\mathbb{E}\sum_{k=1}^m \overline{\widetilde{N}^{k,k}_{p_1q_1}}\,\widetilde{N}^{k,k}_{p_2q_2}\,\overline{\widetilde{N}^{k,k}_{p_3q_3}}\,\widetilde{N}^{k,k}_{p_4q_4} \Big) \\
&= \sum_{p_1-p_2=p_4-p_3}\ \sum_{q_1-q_2=q_4-q_3} \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}} \\
&\quad - \sum_{p_1-p_2=p_4-p_3}\ \sum_{q_1-q_2+q_3-q_4=\pm N} \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}} \\
&\quad - \sum_{p_1-p_2+p_3-p_4=\pm N}\ \sum_{q_1-q_2=q_4-q_3} \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}} \\
&\quad + \sum_{p_1-p_2+p_3-p_4=\pm N}\ \sum_{q_1-q_2+q_3-q_4=\pm N} \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}}. \tag{3.32}
\end{align*}
The last equality holds by isotropy of our ensemble, due to our sampling model in this context.
More precisely, we have
\begin{align*}
\frac{1}{m}\mathbb{E}\sum_{k=1}^m \overline{\widetilde{N}^{k,k}_{p_1q_1}}\,\widetilde{N}^{k,k}_{p_2q_2}\,\overline{\widetilde{N}^{k,k}_{p_3q_3}}\,\widetilde{N}^{k,k}_{p_4q_4}
&:= \frac{1}{m}\mathbb{E}\Big( \sum_{k=1}^m e\big( \tilde{\tau}_k(-p_1+p_2-p_3+p_4) \big)\, e\big( \tilde{\nu}_k(-q_1+q_2-q_3+q_4) \big) \Big) \\
&= \Big( \frac{1}{N}\sum_{u=1}^N e\big( \tau_u(-p_1+p_2-p_3+p_4) \big) \Big)\Big( \frac{1}{N}\sum_{v=1}^N e\big( \nu_v(-q_1+q_2-q_3+q_4) \big) \Big) \\
&= \Big( \frac{1}{N}\, e\Big( -\frac{-p_1+p_2-p_3+p_4}{2} \Big)\, \frac{1 - e(-p_1+p_2-p_3+p_4)}{1 - e\big( \frac{-p_1+p_2-p_3+p_4}{N} \big)} \Big) \\
&\qquad \cdot \Big( \frac{1}{N}\, e\Big( -\frac{-q_1+q_2-q_3+q_4}{2} \Big)\, \frac{1 - e(-q_1+q_2-q_3+q_4)}{1 - e\big( \frac{-q_1+q_2-q_3+q_4}{N} \big)} \Big) \\
&= \begin{cases}
1 & \text{if } -p_1+p_2-p_3+p_4 = 0 \text{ and } -q_1+q_2-q_3+q_4 = 0, \\
e(\mp N/2) = -1 & \text{if } -p_1+p_2-p_3+p_4 = 0 \text{ and } -q_1+q_2-q_3+q_4 = \pm N, \\
e(\mp N/2) = -1 & \text{if } -p_1+p_2-p_3+p_4 = \pm N \text{ and } -q_1+q_2-q_3+q_4 = 0, \\
e(\mp N/2)\, e(\mp N/2) = 1 & \text{if } -p_1+p_2-p_3+p_4 = \pm N \text{ and } -q_1+q_2-q_3+q_4 = \pm N, \\
0 & \text{otherwise}.
\end{cases}
\end{align*}
The middle cases hold since \(e(\mp N/2) = e^{\mp\pi i N} = -1\), as \(N\) is assumed to be odd. The last case holds since \(|-p_1+p_2-p_3+p_4|,\, |-q_1+q_2-q_3+q_4| \le 2N-2 < 2N\) (this is why we only single out the mid-cases \(\pm N\)).

Continuing from (3.32), we can bound the first term of this last equality as
\begin{align*}
&\Big| \sum_{p_1-p_2=p_4-p_3}\ \sum_{q_1-q_2=q_4-q_3} \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}}\,\mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{p_4q_4}} \Big| \\
&= \Big| \sum_{p_1,p_2=1}^N \sum_{q_1,q_2=1}^N \mathcal{F}^*(D)_{p_1q_1}\overline{\mathcal{F}^*(D)_{p_2q_2}} \sum_{p_3,q_3\in Q_{p_1,p_2,q_1,q_2}} \mathcal{F}^*(D)_{p_3q_3}\overline{\mathcal{F}^*(D)_{(p_1-p_2+p_3)(q_1-q_2+q_3)}} \Big| \\
&\le \sum_{p_1,p_2=1}^N \sum_{q_1,q_2=1}^N |\mathcal{F}^*(D)_{p_1q_1}\mathcal{F}^*(D)_{p_2q_2}|\,|\langle \mathcal{F}^*(D), \mathcal{F}^*(D)\rangle|
\end{align*}
\u2016D\u20162F(\u2211p,q|F\u2217(D)pq|)2\u2264 \u03b34r\u2016D\u20164F .In the first equality, Qp1,p2,q1,q2 \u2282 [N ]2 is the index set of allowed valuesfor p3, q3 according to p1, p2, q1 and q2 (i.e., so that p3, q3 \u2208 [N ] and (p1 \u2212p2 + p3), (q1 \u2212 q2 + q3) \u2208 [N ]). The last inequality holds since D \u2208 T gives\u2211Np,q=1 |\u3008(F\u2217)p,q, D\u3009| \u2264 \u03b32\u221ar\u2016D\u2016F as in the proof of (3.25) from Lemma3.6.1.The remaining terms in (3.32) can be bounded similarly, this producesthe factor of 4 since we apply the bound for each of the 4 terms in (3.32).Hence, (3.31) holds.Using these observations allows the proof of Theorem 3.3.2 to proceedas in the proof of Theorem 3.3.1. The only difference being the factor of 4in (3.31) in contrast to (3.26) used to establish Theorem 3.3.1. This differ-ence only affects the probability by which Lemma 3.5.2 holds (in applyingTheorem 3.6.3 therein). This last observation gives the slightly modifiedprobability bound (3.12) in Theorem 3.3.2.3.9 Uniform Matrix Bernstein InequalityThis section is dedicated to a modified matrix Bernstein inequality, crucial forthe proof of Lemma 3.5.3 and Lemma 3.5.4. As mentioned in those respectiveproofs, the advantage of this result is that it does not require one to deal withthe cross terms in the computation of the variance statistic in Corollary 6.1.2in [48] and other similar results. This is a valuable property for cases wherethe summands are not identically distributed or each individual summand is98not isotropic. Instead, this theorem only requires isotropy on the ensembleas a whole. For this reason, this result may be of interest on its own.Theorem 3.9.1. Let {Xk}mk=1 \u2282 Cd1\u00d7d2 be a sequence of independent randommatrices, where each matrix is generated from a distribution Dk. 
Assume that
\[
\sum_{k=1}^m \mathbb{E}_{\mathcal{D}_k} X_k = X, \qquad \Big\| X_k - \frac{X}{m} \Big\| \le L \quad \forall k \in [m],
\]
and
\[
\max\Big\{ \Big\| \sum_{k=1}^m \Big( \mathbb{E}_{\mathcal{D}_k} X_k X_k^* - \frac{XX^*}{m^2} \Big) \Big\|,\ \Big\| \sum_{k=1}^m \Big( \mathbb{E}_{\mathcal{D}_k} X_k^* X_k - \frac{X^*X}{m^2} \Big) \Big\| \Big\} \le \tilde{\nu}.
\]
Then for all \(\delta \ge 0\),
\[
\mathbb{P}\Big( \Big\| \sum_{k=1}^m X_k - X \Big\| \le 2\delta \Big) \ge 1 - \frac{2}{m} - 4(d_1+d_2)\exp\left( -\frac{\delta^2}{8\log(m)\tilde{\nu} + \frac{2L\delta}{3}} \right).
\]
Proof. To establish the result, we will show that \(\|\sum_{k=1}^m X_k - X\|\) can be bounded with high probability by two terms consisting of sums of independent and centered random matrices. We will then apply Corollary 6.1.2 in [48] to bound the operator norm of these sums.

To this end, we consider the sequence of i.i.d. random matrices \(\{X_{j_p}\}_{p=1}^{\tilde{m}} \subset \mathbb{C}^{d_1\times d_2}\), where each \(j_p \sim U([m])\) and \(X_{j_p}\) is generated according to \(\mathcal{D}_{j_p}\). In other words, each \(X_{j_p}\) will be of the form \(X_k\) for some \(k\) chosen uniformly at random from \([m]\) and generated according to \(\mathcal{D}_k\), as per the original ensemble.

By an elementary analysis of the coupon collector's problem, if \(\tilde{m} \ge \lambda m\log(m)\), then with probability greater than \(1 - m^{1-\lambda}\) the generated indices \(\{j_p\}_{p=1}^{\tilde{m}}\) will collect the whole set \([m]\) (see for example [77], Section 3.6). In other words,
\[
\{j_p\}_{p=1}^{\tilde{m}} = [m] \cup J,
\]
where \(|J| = m(\lambda\log(m) - 1)\).

With this in mind, the proof goes as follows. Let \(\tilde{m} = 4m\log(m)\), so that \(\{j_p\}_{p=1}^{\tilde{m}}\) will contain \([m]\) at least twice with probability greater than \(1 - \frac{2}{m}\) (apply the coupon collector's problem twice). We will now create a subset of indices \(\Omega \subset [\tilde{m}]\) such that \(\{j_p\}_{p\in\Omega} = [m]\). This can be done in many ways; for the result we only require that \(\mathbb{P}(p\in\Omega) \ne 0\) for all \(p \in [\tilde{m}]\) (which will always hold by the uniform distribution of the \(j_p\)'s).
However, for simplicity we will actually create \(\Omega\) in such a way that \(\mathbb{P}(p\in\Omega) = \mathbb{P}(\tilde{p}\in\Omega)\) for all \(p \ne \tilde{p}\) (though we will not prove this, since it is not crucial).

For each \(k \in [m]\), define the set
\[
C_k := \{p \in [\tilde{m}] : j_p = k\},
\]
and notice that \(|C_k| \ge 2\) for all \(k\) (with probability larger than \(1 - \frac{2}{m}\)). Then for each \(k\), choose an index \(p_k \in [\tilde{m}]\) uniformly at random from \(C_k\), i.e., \(p_k \sim U(C_k)\). We achieve our set of interest by defining
\[
\Omega := \bigcup_{k=1}^m \{p_k\}.
\]
Thus \(\{j_p\}_{p=1}^{\tilde{m}} = [m]\cup\{j_p\}_{p\notin\Omega} = \{j_p\}_{p\in\Omega}\cup\{j_p\}_{p\notin\Omega}\), where the equality holds w.h.p., and we generate each matrix \(X_{j_p} \sim \mathcal{D}_{j_p}\). We have \(\{X_{j_p}\}_{p\in\Omega} = \{X_k\}_{k=1}^m\), and we have thus also generated an instance of our original ensemble with high probability.

Then
\begin{align*}
\Big\| \sum_{k=1}^m X_k - X \Big\|
&= \Big\| \sum_{p=1}^{\tilde{m}} X_{j_p} - \sum_{p\notin\Omega} X_{j_p} - X \Big\| \\
&= \Big\| \sum_{p=1}^{\tilde{m}} X_{j_p} - \sum_{p\notin\Omega} X_{j_p} - \Big(\frac{\tilde{m}}{m}\Big)X + \Big(\frac{\tilde{m}-m}{m}\Big)X \Big\| \\
&\le \Big\| \sum_{p=1}^{\tilde{m}} X_{j_p} - \Big(\frac{\tilde{m}}{m}\Big)X \Big\| + \Big\| \sum_{p\notin\Omega} X_{j_p} - \Big(\frac{\tilde{m}-m}{m}\Big)X \Big\| \\
&= \Big\| \sum_{p=1}^{\tilde{m}} \Big( X_{j_p} - \frac{X}{m} \Big) \Big\| + \Big\| \sum_{p\notin\Omega} \Big( X_{j_p} - \frac{X}{m} \Big) \Big\|, \tag{3.33}
\end{align*}
and we may now bound the last two terms in (3.33) instead. This approach will allow a simpler application of Corollary 6.1.2 in [48], where the main advantage is that each matrix in the first sum over all \(p \in [\tilde{m}]\) satisfies
\[
\mathbb{E}_{j_p,\mathcal{D}_{j_p}} X_{j_p} = \sum_{k=1}^m \mathbb{P}(j_p = k)\,\mathbb{E}_{\mathcal{D}_k} X_k = \frac{1}{m}\sum_{k=1}^m \mathbb{E}_{\mathcal{D}_k} X_k = \frac{X}{m}, \tag{3.34}
\]
by construction. Therefore, the first sum consists of independent and centered random matrices.

The same conclusion holds for the second sum over \(p \notin \Omega\), but this will require additional work to show (i.e., we aim to show that (3.34) and independence hold given that \(p \notin \Omega\)). First, we show that the summands in the sum over \(p \notin \Omega\) are centered. Fix \(p \in [\tilde{m}]\).
By (3.34) and the law of total expectation, we have
\[
\frac{X}{m} = \mathbb{E}_{j_p,\mathcal{D}_{j_p}} X_{j_p} = \mathbb{E}_{j_{p\in\Omega},\mathcal{D}_{j_p}} X_{j_p}\,\mathbb{P}(p\in\Omega) + \mathbb{E}_{j_{p\notin\Omega},\mathcal{D}_{j_p}} X_{j_p}\,\mathbb{P}(p\notin\Omega), \tag{3.35}
\]
where the subscripts in \(j_{p\in\Omega}\) (\(j_{p\notin\Omega}\)) are used to indicate the conditioning of the expectation. It is important to notice that \(\mathbb{P}(p\in\Omega) \le \frac{1}{2}\) and \(\mathbb{P}(p\notin\Omega) \ge \frac{1}{2}\), by construction of \(\Omega\) and since we are guaranteed (w.h.p.) that \(\{j_p\}_{p\in[\tilde{m}]}\) contains \([m]\) at least twice.

Then, since \(\{j_p\}_{p\in\Omega} = [m]\), we have
\[
\mathbb{E}_{j_{p\in\Omega},\mathcal{D}_{j_p}} X_{j_p} = \sum_{k=1}^m \mathbb{P}(j_p = k \,|\, p\in\Omega)\,\mathbb{E}_{\mathcal{D}_k} X_k. \tag{3.36}
\]
We will show that \(\mathbb{P}(j_p = k \,|\, p\in\Omega) = \frac{1}{m}\) for all \(k \in [m]\). Let \(\tilde{k} \ne k\) and recall the definition of \(C_k\). Since \(|C_k| \ge 2\) for all \(k \in [m]\), by the law of total probability,
\begin{align*}
\mathbb{P}(p\in\Omega \,|\, j_p = k) &= \sum_{q=2}^{\tilde{m}} \mathbb{P}(p\in\Omega \,|\, j_p = k \ \&\ |C_k| = q)\,\mathbb{P}(|C_k| = q) \\
&= \sum_{q=2}^{\tilde{m}} \frac{1}{q}\,\mathbb{P}(|C_k| = q) = \sum_{q=2}^{\tilde{m}} \frac{1}{q}\,\mathbb{P}(|C_{\tilde{k}}| = q) \\
&= \sum_{q=2}^{\tilde{m}} \mathbb{P}(p\in\Omega \,|\, j_p = \tilde{k} \ \&\ |C_{\tilde{k}}| = q)\,\mathbb{P}(|C_{\tilde{k}}| = q) = \mathbb{P}(p\in\Omega \,|\, j_p = \tilde{k}).
\end{align*}
The second equality holds by construction of \(\Omega\), and the third equality holds since the indices are drawn uniformly at random. Therefore, \(\mathbb{P}(p\in\Omega \,|\, j_p = k) = \mathbb{P}(p\in\Omega \,|\, j_p = \tilde{k})\) for all \(\tilde{k} \ne k\). By Bayes' theorem and the uniform distribution of \(j_p\), we obtain
\[
\mathbb{P}(j_p = k \,|\, p\in\Omega) = \frac{\mathbb{P}(p\in\Omega \,|\, j_p = k)\,\mathbb{P}(j_p = k)}{\mathbb{P}(p\in\Omega)} = \frac{\mathbb{P}(p\in\Omega \,|\, j_p = \tilde{k})\,\mathbb{P}(j_p = \tilde{k})}{\mathbb{P}(p\in\Omega)} = \mathbb{P}(j_p = \tilde{k} \,|\, p\in\Omega).
\]
Let \(\alpha := \mathbb{P}(j_p = k \,|\, p\in\Omega)\); then
\[
1 = \sum_{k=1}^m \mathbb{P}(j_p = k \,|\, p\in\Omega) = \alpha m,
\]
and we must have \(\alpha := \mathbb{P}(j_p = k \,|\, p\in\Omega) = \frac{1}{m}\) for all \(k \in [m]\), as desired.
To finish, our observation applied to (3.36) gives
\[
\mathbb{E}_{j_{p\in\Omega},\mathcal{D}_{j_p}} X_{j_p} = \sum_{k=1}^m \frac{1}{m}\,\mathbb{E}_{\mathcal{D}_k} X_k = \frac{X}{m}.
\]
Rearranging (3.35), we have shown
\[
\frac{X}{m}\big( 1 - \mathbb{P}(p\in\Omega) \big) = \mathbb{E}_{j_{p\notin\Omega},\mathcal{D}_{j_p}} X_{j_p}\,\mathbb{P}(p\notin\Omega),
\]
so that dividing by \(\mathbb{P}(p\notin\Omega) \ne 0\) gives the claim.

Next, we show that this sequence of random matrices is independent. For this claim we must solely consider the random indices \(j_{p\notin\Omega}\), since once \(\{j_p\}_{p\notin\Omega}\) are set, the matrices \(\{X_{j_p}\}_{p\notin\Omega}\) will be generated independently according to \(\{\mathcal{D}_{j_p}\}_{p\notin\Omega}\).

Our goal is to show that \(\mathbb{P}(j_p = k \,|\, \{j_{\tilde{p}}\}_{\tilde{p}\notin\Omega}) = \mathbb{P}(j_p = k)\) for all \(p \notin \Omega\) and \(k \in [m]\). To this end, let \(p, \tilde{p} \in [\tilde{m}]\setminus\Omega\) (\(p \ne \tilde{p}\)) and \(k, \tilde{k} \in [m]\) (\(k \ne \tilde{k}\)). By construction, it is clear that \(\mathbb{P}(j_p = k) = \mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = \tilde{k})\). Therefore, by the law of total probability we obtain
\begin{align*}
\mathbb{P}(j_p = k) &= \sum_{\tilde{k}=1}^m \mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = \tilde{k})\,\mathbb{P}(j_{\tilde{p}} = \tilde{k}) = \sum_{\tilde{k}=1}^m \frac{1}{m}\,\mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = \tilde{k}) \\
&= \sum_{\tilde{k}\ne k} \frac{1}{m}\,\mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = \tilde{k}) + \frac{1}{m}\,\mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = k) \\
&= \frac{m-1}{m}\,\mathbb{P}(j_p = k) + \frac{1}{m}\,\mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = k).
\end{align*}
The second equality uses \(\mathbb{P}(j_p = k \,|\, p\notin\Omega) = \frac{1}{m}\), which holds due to our previous argument that showed \(\mathbb{P}(j_p = k \,|\, p\in\Omega) = \frac{1}{m}\). Combining terms and rearranging gives \(\mathbb{P}(j_p = k) = \mathbb{P}(j_p = k \,|\, j_{\tilde{p}} = k)\). This establishes pairwise independence, and a recursive application of the argument proves independence of the ensemble.

In conclusion, both terms in (3.33) correspond to sums of independent and centered random matrices, and we now apply Corollary 6.1.2 in [48] to bound both operator norms by computing the corresponding parameters \(L\) and \(\nu\).
This finishes our proof since
\[
L := \max_{p \in [\tilde m]} \|X_{j_p} - \mathbb{E} X_{j_p}\| = \max_{p \in [\tilde m]} \|X_{j_p} - \bar{X}_m\| = \max_{k \in [m]} \|X_k - \bar{X}_m\|
\]
agrees with our definition and
\[
\Big\| \sum_{p=1}^{\tilde m} \mathbb{E}_{j_p, D_{j_p}} (X_{j_p} - \bar{X}_m)(X_{j_p} - \bar{X}_m)^* \Big\|
= \Big\| \sum_{p=1}^{\tilde m} \mathbb{E}_{j_p, D_{j_p}} \Big( X_{j_p} X_{j_p}^* - \frac{XX^*}{m^2} \Big) \Big\|
= \tilde m\, \Big\| \mathbb{E}_{j_p, D_{j_p}} \Big( X_{j_p} X_{j_p}^* - \frac{XX^*}{m^2} \Big) \Big\|
= \frac{\tilde m}{m} \Big\| \sum_{k=1}^m \Big( \mathbb{E}_{D_k} X_k X_k^* - \frac{XX^*}{m^2} \Big) \Big\|
\le \frac{\tilde m \tilde\nu}{m} = 4 \log(m)\, \tilde\nu.
\]
The second equality holds since the terms are i.i.d. and the last line applies the definition of $\tilde\nu$ and $\tilde m := 4 m \log(m)$. A similar argument shows that
\[
\Big\| \sum_{p=1}^{\tilde m} \mathbb{E}_{j_p, D_{j_p}} (X_{j_p} - \bar{X}_m)^* (X_{j_p} - \bar{X}_m) \Big\| \le 4 \log(m)\, \tilde\nu,
\]
and therefore $\nu \le 4 \log(m)\, \tilde\nu$.

The conclusion provided by Corollary 6.1.2 in [48] is
\[
\mathbb{P}\Big( \Big\| \sum_{p=1}^{\tilde m} (X_{j_p} - \bar{X}_m) \Big\| \ge \delta \Big)
\le 2(d_1 + d_2) \exp\Big( {-}\frac{\delta^2}{2\nu + \frac{2L\delta}{3}} \Big)
\le 2(d_1 + d_2) \exp\Big( {-}\frac{\delta^2}{8 \log(m)\, \tilde\nu + \frac{2L\delta}{3}} \Big).
\]
By our previous observation, the same approach can be used for the second term in (3.33), with the sum over $p \notin \Omega$, to obtain
\[
\mathbb{P}\Big( \Big\| \sum_{p \notin \Omega} (X_{j_p} - \bar{X}_m) \Big\| \ge \delta \Big)
\le 2(d_1 + d_2) \exp\Big( {-}\frac{\delta^2}{2(4\log m - 1)\tilde\nu + \frac{2L\delta}{3}} \Big)
\le 2(d_1 + d_2) \exp\Big( {-}\frac{\delta^2}{8 \log(m)\, \tilde\nu + \frac{2L\delta}{3}} \Big).
\]
Therefore
\[
\Big\| \sum_{k=1}^m X_k - X \Big\| \le \Big\| \sum_{p=1}^{\tilde m} (X_{j_p} - \bar{X}_m) \Big\| + \Big\| \sum_{p \notin \Omega} (X_{j_p} - \bar{X}_m) \Big\| \le 2\delta,
\]
where the first inequality fails with probability less than $\frac{2}{m}$ (by the coupon collector's problem) and the last inequality fails with probability less than
\[
4(d_1 + d_2) \exp\Big( {-}\frac{\delta^2}{8 \log(m)\, \tilde\nu + \frac{2L\delta}{3}} \Big)
\]
by a union bound.

3.10 Proof of Additional Lemmas

In this section we prove Lemmas 3.6.1 and 3.6.2, stated in Section 3.6.2 and used to establish Lemma 3.5.2 and Lemma 3.5.4.
We begin with the proof of Lemma 3.6.1.

Proof of Lemma 3.6.1. We begin by showing (3.25). Let $D \in T$ and notice that we can write
\[
D = \sum_{j=1}^r \lambda_j U_{*j} V_{*j}^*.
\]
Notice that $\mathrm{rank}(D) \le r$ and $\|D\|_F = \sqrt{\textstyle\sum_j \lambda_j^2}$. For any $(k, \ell) \in [n] \times [m]$, we obtain
\[
|\langle \tilde{\mathcal N}^{k,\ell}, \mathcal F^*(D) \rangle| = \Big| \sum_{p,q=1}^N \tilde{\mathcal N}^{k,\ell}_{pq}\, \mathcal F^*(D)_{pq} \Big| \le \sum_{p,q=1}^N |\mathcal F^*(D)_{pq}|
= \sum_{p,q=1}^N |\langle (\mathcal F^*)_{p,q}, D \rangle| \le \sum_{j=1}^r \lambda_j \sum_{p,q=1}^N |\langle (\mathcal F^*)_{p,q}, U_{*j} V_{*j}^* \rangle|
= \sum_{j=1}^r \lambda_j \sum_{p,q=1}^N |\langle (F_1)_{*p}, U_{*j} \rangle|\, |\langle (F_1)_{*q}, V_{*j} \rangle| \le \sum_{j=1}^r \lambda_j\, \gamma(U_r)\, \gamma(V_r) \le \gamma^2 \sqrt{r}\, \|D\|_F.
\]
This establishes the first claim. Notice that we have shown $\sum_{p,q=1}^N |\langle (\mathcal F^*)_{p,q}, D \rangle| \le \gamma^2 \sqrt{r}\, \|D\|_F$ (this will be used to show the remaining inequality).

For (3.26), we use our deviation model and computations similar to those that establish the isotropy property of our ensemble (see (3.24) and the argument that follows). We expand and obtain
\[
\frac{1}{nm}\, \mathbb{E} \sum_{k=1}^n \sum_{\ell=1}^m |\langle \tilde{\mathcal N}^{k,\ell}, \mathcal F^*(D) \rangle|^4
= \sum_{p_1,p_2,p_3,p_4=1}^N \sum_{q_1,q_2,q_3,q_4=1}^N \mathcal F^*(D)_{p_1 q_1} \mathcal F^*(D)_{p_2 q_2} \mathcal F^*(D)_{p_3 q_3} \mathcal F^*(D)_{p_4 q_4}
\cdot \Big( \frac{1}{nm}\, \mathbb{E} \sum_{k=1}^n \sum_{\ell=1}^m \bar{\tilde{\mathcal N}}^{k,\ell}_{p_1 q_1} \tilde{\mathcal N}^{k,\ell}_{p_2 q_2} \bar{\tilde{\mathcal N}}^{k,\ell}_{p_3 q_3} \tilde{\mathcal N}^{k,\ell}_{p_4 q_4} \Big)
\]
\[
= \sum_{p_1 - p_2 = p_4 - p_3}\; \sum_{q_1 - q_2 = q_4 - q_3} \mathcal F^*(D)_{p_1 q_1} \mathcal F^*(D)_{p_2 q_2} \mathcal F^*(D)_{p_3 q_3} \mathcal F^*(D)_{p_4 q_4}
= \sum_{p_1,p_2=1}^N \sum_{q_1,q_2=1}^N \mathcal F^*(D)_{p_1 q_1} \mathcal F^*(D)_{p_2 q_2} \sum_{p_3, q_3 \in Q_{p_1,p_2,q_1,q_2}} \mathcal F^*(D)_{p_3 q_3}\, \mathcal F^*(D)_{(p_1 - p_2 + p_3)(q_1 - q_2 + q_3)}
\]
\[
\le \sum_{p_1,p_2=1}^N \sum_{q_1,q_2=1}^N |\mathcal F^*(D)_{p_1 q_1} \mathcal F^*(D)_{p_2 q_2}|\, |\langle \mathcal F^*(D), \mathcal F^*(D) \rangle|
= \|D\|_F^2 \Big( \sum_{p,q} |\mathcal F^*(D)_{pq}| \Big)^2 \le \gamma^4 r\, \|D\|_F^4.
\]
The last inequality holds since we previously showed $\sum_{p,q=1}^N |\langle (\mathcal F^*)_{p,q}, D \rangle| \le \gamma^2 \sqrt{r}\, \|D\|_F$.
In the third equality, $Q_{p_1,p_2,q_1,q_2} \subset [N]^2$ is the index set of allowed values for $p_3, q_3$ according to $p_1, p_2, q_1$ and $q_2$. To reiterate, the second equality holds by isotropy of our ensemble due to our deviation model. More precisely, if $\tilde\Delta, \tilde\Gamma \in \mathbb{R}$ are independent copies of the entries of $\Delta, \Gamma$ respectively, we have
\[
\frac{1}{nm}\, \mathbb{E} \sum_{k=1}^n \sum_{\ell=1}^m \bar{\tilde{\mathcal N}}^{k,\ell}_{p_1 q_1} \tilde{\mathcal N}^{k,\ell}_{p_2 q_2} \bar{\tilde{\mathcal N}}^{k,\ell}_{p_3 q_3} \tilde{\mathcal N}^{k,\ell}_{p_4 q_4}
= \mathbb{E}_{D_1} e\big( \tilde\Delta (-p_1 + p_2 - p_3 + p_4) \big)\, \mathbb{E}_{D_2} e\big( \tilde\Gamma (-q_1 + q_2 - q_3 + q_4) \big)
\cdot \Big( \sum_{k=1}^n \frac{1}{n}\, e\Big( \Big( \frac{k-1}{n} - \frac{1}{2} \Big)(-p_1 + p_2 - p_3 + p_4) \Big) \Big)
\cdot \Big( \sum_{\ell=1}^m \frac{1}{m}\, e\Big( \Big( \frac{\ell-1}{m} - \frac{1}{2} \Big)(-q_1 + q_2 - q_3 + q_4) \Big) \Big)
\]
\[
= \begin{cases}
1 & \text{if } -p_1+p_2-p_3+p_4 = 0 \text{ and } -q_1+q_2-q_3+q_4 = 0, \\[2pt]
\mathbb{E}_{D_2}\, e\big( zm (\tilde\Gamma - 1/2) \big) = 0 & \text{if } -p_1+p_2-p_3+p_4 = 0 \text{ and } -q_1+q_2-q_3+q_4 = zm, \text{ for } z \in \mathbb{Z} \setminus \{0\}, \\[2pt]
\mathbb{E}_{D_1}\, e\big( jn (\tilde\Delta - 1/2) \big) = 0 & \text{if } -p_1+p_2-p_3+p_4 = jn, \text{ for } j \in \mathbb{Z} \setminus \{0\}, \text{ and } -q_1+q_2-q_3+q_4 = 0, \\[2pt]
\mathbb{E}_{D_1}\, e\big( jn (\tilde\Delta - 1/2) \big)\, \mathbb{E}_{D_2}\, e\big( zm (\tilde\Gamma - 1/2) \big) = 0 & \text{if } -p_1+p_2-p_3+p_4 = jn \text{ and } -q_1+q_2-q_3+q_4 = zm, \text{ for } j, z \in \mathbb{Z} \setminus \{0\}, \\[2pt]
0 & \text{otherwise.}
\end{cases}
\]
The middle terms vanish by our deviation model ($\tilde\Delta \sim D_1$, $\tilde\Gamma \sim D_2$) and because $-q_1+q_2-q_3+q_4 = zm$ and $-p_1+p_2-p_3+p_4 = jn$ imply that $z \in \big( \mathbb{Z} \cap \big[ \tfrac{2(1-N)}{m}, \tfrac{2(N-1)}{m} \big] \big) \setminus \{0\}$ and $j \in \big( \mathbb{Z} \cap \big[ \tfrac{2(1-N)}{n}, \tfrac{2(N-1)}{n} \big] \big) \setminus \{0\}$ (since the $p_k$'s and $q_k$'s lie in $[N]$).

We now prove Lemma 3.6.2. The proof is due to [94], but requires a slight modification for our setting.
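The two mechanisms in the case analysis above can be checked numerically: the normalized geometric sum over $k$ (and likewise over $\ell$) vanishes unless the frequency offset $t = -p_1+p_2-p_3+p_4$ is a multiple of $n$, and the surviving aliased offsets $t = jn \ne 0$ are killed by the characteristic-function factor $\mathbb{E}\, e(jn(\tilde\Delta - 1/2))$. The snippet below uses $e(x) = \exp(2\pi i x)$ and, purely as an illustrative assumption, a uniform deviation $\tilde\Delta \sim \mathrm{U}[0, 1)$; any $D_1$ satisfying the deviation model would do.

```python
import cmath
import random

def e(x):
    # e(x) := exp(2*pi*i*x)
    return cmath.exp(2j * cmath.pi * x)

n = 7

def ksum(t):
    # (1/n) * sum_{k=1}^{n} e(((k-1)/n - 1/2) * t) for an integer offset t
    return sum(e(((k - 1) / n - 0.5) * t) for k in range(1, n + 1)) / n

diag = abs(ksum(0))       # diagonal case t = 0: modulus 1
off = abs(ksum(3))        # t not a multiple of n: the geometric sum vanishes
alias = abs(ksum(2 * n))  # aliased case t = jn: survives the k-sum with modulus 1 ...

# ... and is killed instead by the deviation factor E e(jn(Delta - 1/2)) = 0,
# estimated here by Monte Carlo for a uniform Delta on [0, 1)
random.seed(0)
j = 2
char = abs(sum(e(j * n * (random.random() - 0.5)) for _ in range(200_000)) / 200_000)
print(diag, off, alias, char)
```

Only the diagonal case contributes, which is exactly the isotropy statement used in the second equality above.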
Adopting the author's notation, in what follows, for a matrix $A \in \mathbb{C}^{N \times N}$ we denote by $|A)(A|$ the operator that maps $X \mapsto A \langle A, X \rangle$.

Proof of Lemma 3.6.2. We will rely heavily on the proof of Lemma 3.1 in [94], and refer to this text and its notation for brevity. Indeed, notice that $U_2$ is defined as in [94] and the only difference is that we consider the supremum over $U \cap U_2$ (and redefine $K$ with respect to $U$ accordingly). Therefore we obtain our result in the same manner, but replace $U_2$ in [94] with $U_2 \cap U$ and only require a few additional observations.

To elaborate, as in the beginning of the proof of Lemma 3.1 in [94], via the comparison principle and Dudley's inequality we can show
\[
\mathbb{E}_\epsilon \sup_{X \in U \cap U_2} \sum_{k=1}^m \epsilon_k \langle |V_k)(V_k|(X), X \rangle \le 24 \sqrt{2\pi} \int_0^\infty \log^{1/2} N\Big( \tfrac{1}{\sqrt r} U_2 \cap U,\, \|\cdot\|_X,\, \tfrac{\epsilon}{2R\sqrt r} \Big)\, d\epsilon,
\]
where $N\big( \frac{1}{\sqrt r} U_2 \cap U, \|\cdot\|_X, \epsilon \big)$ is a covering number (the number of balls in $\mathbb{C}^{N \times N}$ of radius $\epsilon$ in the metric $\|\cdot\|_X$ needed to cover $\frac{1}{\sqrt r} U_2 \cap U$),
\[
R := \Big( \sup_{X \in U \cap U_2} \sum_{k=1}^m \langle |V_k)(V_k|(X), X \rangle \Big)^{1/2},
\]
and $\|\cdot\|_X$ is a semi-norm on $\mathbb{C}^{N \times N}$ defined as
\[
\|M\|_X = \max_{k \in [m]} |\langle V_k, M \rangle|.
\]
For $M \in U$, notice that by assumption
\[
\|M\|_X \le \frac{K}{\sqrt r}\, \|M\|_F. \tag{3.37}
\]
This leaves the containments used throughout the proof unchanged with our modified value $K$. To elaborate, by (3.37) we have $\frac{1}{\sqrt r} U_2 \cap U \subset K \cdot B_X$ (where $B_X$ is the unit ball in $\|\cdot\|_X$).
Since $\frac{1}{\sqrt r} U_2 \subset B_1$ (where $B_1$ is the unit ball in $\|\cdot\|_*$), we have $\frac{1}{\sqrt r} U_2 \cap U \subset B_1$. With these observations, we can now adopt the covering number bounds provided by [94], since these clearly hold for our subsets of interest, i.e.,
\[
N\Big( \tfrac{1}{\sqrt r} U_2 \cap U,\, \|\cdot\|_X,\, \epsilon \Big) \le N(K \cdot B_X, \|\cdot\|_X, \epsilon) \le \Big( 1 + \frac{2K}{\epsilon} \Big)^{2N^2}
\]
for small $\epsilon$, and
\[
N\Big( \tfrac{1}{\sqrt r} U_2 \cap U,\, \|\cdot\|_X,\, \epsilon \Big) \le N(B_1, \|\cdot\|_X, \epsilon) \le \exp\Big( \frac{C_1^2 K^2}{\epsilon^2} \log^3(N) \log(m) \Big)
\]
for large $\epsilon$, where $C_1$ is an absolute constant given by Maurey's empirical method (this is proven in Section 3.3 of [94]).

The remainder of the proof is as in [94], where the covering numbers bound the integral to obtain
\[
\mathbb{E}_\epsilon \sup_{X \in U \cap U_2} \sum_{k=1}^m \epsilon_k \langle |V_k)(V_k|(X), X \rangle \le C R \sqrt{r}\, K \log^{5/2}(N) \log^{1/2}(m),
\]
where $C$ is an absolute constant.

Chapter 4
Conclusion

This thesis considers the benefits of randomly generated off-the-grid samples for signal acquisition (in comparison to equispaced sampling), with the goal of providing explicit statements that are informative for practitioners. In large part this goal has been achieved, with the main results providing novel insight into the anti-aliasing nature of nonuniform samples. The methodology and analysis use ideas from compressive sensing and low-rank matrix recovery to develop a random deviation model, sampling complexities, and recovery error bounds in terms of classical signal processing concepts (the error of bandlimited approximation).

The specific contributions are listed below:

• In Chapter 2, we consider 1D functions $f \in H^1([-1/2, 1/2))$ that admit an $s$-sparse uniform discretization $\mathbf{f} \in \mathbb{C}^N$ in some basis (i.e., $\mathbf{f} = \Psi g$ with $\|g\|_0 \le s$).
We show that $\mathcal{O}(s\, \mathrm{polylog}(N))$ random off-the-grid samples of $f$ from our deviation model are sufficient to recover the $\frac{N-1}{2}$-bandlimited approximation of $f$ via the basis pursuit problem with an interpolation kernel (2.9). When $s \ll N$, this is a stark contrast to uniform sampling, where $\mathcal{O}(N)$ equispaced samples are needed for the same quality of reconstruction. Furthermore, the results are robust when the measurements are noisy and stable when $g$ is not exactly $s$-sparse (but is well approximated by an $s$-sparse vector).

• Chapter 3 extends the results of Chapter 2 to a low-rank signal model via the theory of low-rank matrix recovery. In this context we consider 2D functions $D \in H^1([-1/2, 1/2)^2)$ whose uniform discretization $\mathbf{D} \in \mathbb{C}^{N \times N}$ exhibits low-rank structure. By incorporating a 2D Dirichlet kernel into the nuclear norm minimization problem (3.6), we show that $\mathcal{O}(Nr\, \mathrm{polylog}(N))$ random nonuniform samples provide the $\frac{N-1}{2}$-bandlimited approximation of $D$ with error proportional to the best rank-$r$ approximation error of $\mathbf{D}$. The methodology allows for additive noise, where the result gives a robust error bound proportional to the noise energy level. Therefore, when $\mathbf{D}$ can be well approximated by a rank-$r$ matrix with $r \ll N$, we may also achieve frugal signal acquisition in this scenario via random off-the-grid samples. In comparison, uniform sampling would require $\mathcal{O}(N^2)$ measurements to obtain the $\frac{N-1}{2}$-bandlimited approximation of $D$.

• Chapter 3 also produces a novel result for the noisy matrix completion problem (Section 3.3.1). The main contribution of this work is a result that applies to full-rank matrices, with stable and robust recovery error proportional to the low-rank approximation error of the signal and the noise level. We consider the sampling-with-replacement model, where entries of the matrix are observed uniformly at random with the possibility of observing the same entry more than once.
Under this sampling and our $r$-incoherence structure, we show that $\mathcal{O}(Nr\, \mathrm{polylog}(N))$ observed noisy matrix entries are sufficient to recover the rank-$r$ approximation of the data matrix with error robust to the noise. The result provides a contribution to the matrix completion literature, as it applies to general full-rank matrices while improving all known error bounds with standard sampling complexity.

This work allows for many avenues of future research. First, the overall methodology of this thesis requires the practitioner to know the nonuniform sampling locations $\tilde\tau$ accurately. While this is typical for signal reconstruction techniques that involve non-equispaced samples, it would be of practical interest to extend the methodology in such a way that allows for robustness to inaccurate sampling locations, and even self-calibration.

As mentioned in Section 2.9, this work has not dedicated much effort to a numerically efficient implementation of the Dirichlet kernel $S$. This is crucial for large-scale applications, especially in the 2D case, where a direct implementation of the Dirichlet kernel via its Fourier representation ((3.2) and (3.3)) or Dirichlet representation (see [68]) may be too inefficient for practical purposes. As future work, it would be useful to consider other interpolation kernels with greater numerical efficiency (e.g., a low-order Lagrange interpolation operator).

Finally, while novel and informative, the matrix completion results in Section 3.3.1 are not very useful in applications due to the sampling-with-replacement model. Indeed, in practice one would not be likely to sample the same matrix entry twice. Extending the analysis to more practical sampling schemes is left as future work, where for example the standard uniform random sampling model may be achieved by arguing as in Proposition 3.1 in [7].

Bibliography

[1] A.J. Jerri [1977], The Shannon Sampling Theorem: its various extensions and applications: a tutorial review. Proc.
IEEE, 65(11), 1565-1596.
[2] A.I. Zayed [1993], Advances in Shannon's Sampling Theory. CRC Press.
[3] A. Griffin and P. Tsakalides [2008], Compressed sensing of audio signals using multiple sensors. 16th European Signal Processing Conference.
[4] A.Y. Aravkin, R. Kumar, H. Mansour, B. Recht and F.J. Herrmann [2014], Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation. SIAM Journal on Scientific Computing, 36, S237-S266.
[5] A. Zandieh, A. Zareian, M. Azghani and F. Marvasti [2014], Reconstruction of Sub-Nyquist Random Sampling for Sparse and Multi-Band Signals. arXiv.
[6] B. Recht, M. Fazel and P.A. Parrilo [2010], Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52, 471-501.
[7] B. Recht [2011], A Simpler Approach to Matrix Completion. The Journal of Machine Learning Research, 12, 3413-3430.
[8] C.E. Shannon [1949], Communication in the Presence of Noise. Proc. IRE, 37(1), 10-21.
[9] C. Li, C.C. Mosher and S.T. Kaplan [2012], Interpolated compressive sensing for seismic data reconstruction. 82nd Annual International Meeting, SEG, Expanded Abstracts.
[10] D. Jagerman [1966], Bounds for Truncation Error of the Sampling Expansion. SIAM J. Appl. Math., 14(4), 714-723.
[11] D.R. Bellhouse [1981], Area Estimation by Point-counting Techniques. Biometrics, 37(2), 303-312.
[12] D.P. Mitchell [1987], Generating Antialiased Images at Low Sampling Densities. ACM SIGGRAPH Computer Graphics, 21(4), 65-72.
[13] D.P. Mitchell [1990], The Antialiasing Problem in Ray Tracing. SIGGRAPH 90.
[14] D.P. Dobkin, D. Eppstein and D.P. Mitchell [1996], Computing the Discrepancy with Applications to SuperSampling Patterns. ACM Transactions on Graphics, 15(4), 354-376.
[15] D.M. Bechir and B. Ridha [2009], Non-uniform Sampling Schemes for RF Bandpass Sampling Receiver. International Conference on Signal Processing Systems.
[16] D.
Gross [2011], Recovering Low-Rank Matrices from Few Coefficients in Any Basis. IEEE Transactions on Information Theory, 57(3), 1548-1566.
[17] E.T. Whittaker [1915], On the Functions Which are Represented by the Expansion of Interpolating Theory. Proc. Roy. Soc. Edinburgh, 35, 181-194.
[18] E. Shlomot and Y.Y. Zeevi [1989], A Nonuniform Sampling and Representation Scheme for Images Which Are Not Bandlimited. The Sixteenth Conference of Electrical and Electronics Engineers in Israel.
[19] E. Margolis and Y.C. Eldar [2008], Nonuniform Sampling of Periodic Bandlimited Signals. IEEE Transactions on Signal Processing, 56(7), 2728-2745.
[20] E. van den Berg and M.P. Friedlander [2008], Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2), 890-912.
[21] E.J. Candès and B. Recht [2009], Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics, 9, 717-772.
[22] E.J. Candès and Y. Plan [2009], Matrix Completion With Noise. Proceedings of the IEEE, 98, 925-936.
[23] E.J. Candès and T. Tao [2010], The Power of Convex Relaxation: Near-Optimal Matrix Completion. IEEE Transactions on Information Theory, 56(5), 2053-2080.
[24] E. van den Berg and M.P. Friedlander [2014], Spot - A Linear-Operator Toolbox. [Online]. https://www.cs.ubc.ca/labs/scl/spot/index.html. [Accessed: 13 June 2019].
[25] Edi.lv [2019], Avoiding aliasing by Nonuniform sampling. [Online]. Available at: http://www.edi.lv/media/uploads/UserFiles/dasp-web/sec-5.htm [Accessed 9 May 2019].
[26] F.J. Beutler [1966], Error-Free Recovery of Signals from Irregularly Spaced Samples. Society for Industrial and Applied Mathematics, 8(3), 328-335.
[27] F. Beutler [1970], Alias-free Randomly Timed Sampling of Stochastic Processes. IEEE Transactions on Information Theory, 16(2), 147-152.
[28] F. Marvasti [2001], Nonuniform sampling: theory and practice. Springer.
[29] F.J. Herrmann, M.P. Friedlander and Ö.
Yılmaz [2012], Fighting the Curse of Dimensionality: Compressive Sensing in Exploration Seismology. IEEE Signal Processing Magazine, 29(3), 88-100.
[30] F. Krahmer and R. Ward [2014], Stable and Robust Sampling Strategies for Compressive Imaging. IEEE Transactions on Image Processing, 23(2), 612-622.
[31] G. Beer [1991], A Polish Topology for the Closed Subsets of a Polish Space. Proceedings of the American Mathematical Society, 113(4), 1123-1133.
[32] G. Chistyakov and Y. Lyubarskii [1997], Random Perturbations of Exponential Riesz Bases in $L^2(-\pi, \pi)$. Annales de l'institut Fourier, 47(1), 201-255.
[33] G.L. Bretthorst [2001], Nonuniform Sampling: Bandwidth and Aliasing. AIP Conference Proceedings, 567(1).
[34] G. Hennenfent and F.J. Herrmann [2006], Seismic Denoising with Nonuniformly Sampled Curvelets. Computing in Science and Engineering, 8(3), 16-25.
[35] G. Hennenfent and F.J. Herrmann [2008], Simply Denoise: Wavefield Reconstruction via Jittered Undersampling. Geophysics, 73(3), V19-V28.
[36] G.E. Pfander [2015], Sampling Theory, a Renaissance. Birkhäuser Basel.
[37] H. Nyquist [1928], Certain Topics in Telegraph Transmission Theory. AIEE Trans., 47, 617-644.
[38] H.S. Shapiro and R.A. Silverman [1960], Alias-free Sampling of Random Noise. Journal of the Society for Industrial and Applied Mathematics, 8(2), 225-248.
[39] H. Landau [1967], Necessary Density Conditions for Sampling and Interpolation of Certain Entire Functions. Acta Math., 117, 37-52.
[40] H. Lee and Z. Bien [2005], Sub-Nyquist Nonuniform Sampling and Perfect Reconstruction of Speech Signals. TENCON 2005 - 2005 IEEE Region 10 Conference.
[41] H. Rauhut [2008], Stability Results for Random Sampling of Sparse Trigonometric Polynomials. IEEE Transactions on Information Theory, 54(12), 5661-5670.
[42] H. Rauhut [2011], Compressive Sensing and Structured Random Matrices. Radon Series Comp. Appl., 1-94.
[43] H. Boche, R. Calderbank, G. Kutyniok, J.
Vybíral [2013], Compressed sensing and its applications. Birkhäuser.
[44] J.M. Whittaker [1929], The Fourier Theory of the Cardinal Functions. Proc. Math. Soc. Edinburgh, 1, 169-176.
[45] J. Keiner, S. Kunis and D. Potts [2008], Using NFFT 3 - a software library for various non-equispaced fast Fourier transforms. ACM Trans. Math. Softw., 36, 19:1-19:30.
[46] J.B. Harley and J.M.F. Moura [2013], Sparse recovery of the multimodal and dispersive characteristics of Lamb waves. The Journal of the Acoustical Society of America, 133(5), 2732-2745.
[47] J. Koh, W. Lee, T.K. Sarkar and M. Salazar-Palma [2013], Calculation of Far-Field Radiation Pattern Using Nonuniformly Spaced Antennas by a Least Square Method. IEEE Transactions on Antennas and Propagation, 62(4), 1572-1578.
[48] J.A. Tropp [2015], An Introduction to Matrix Concentration Inequalities. Now Publishers.
[49] J.A. Tropp [2015], Convex Recovery of a Structured Signal from Independent Random Linear Measurements. Pfander G. (ed.), Sampling Theory, a Renaissance. Applied and Numerical Harmonic Analysis, Birkhäuser.
[50] K. Ogura [1920], On a Certain Transcendental Function in the Theory of Interpolation. Tôhoku Math. J., 17, 64-72.
[51] K. Czyż [2004], Nonuniformly Sampled Active Noise Control System. IFAC Proceedings Volumes, 37(20), 351-355.
[52] K. Han, Y. Wei and X. Ma [2016], An Efficient Non-uniform Filtering Method for Level-crossing Sampling. IEEE International Conference on Digital Signal Processing.
[53] L. Greengard and J. Lee [2004], Accelerating the Nonuniform Fast Fourier Transform. Applied and Computational Harmonic Analysis, 35, 111-129.
[54] L. Li [2016], Math 660 - Lecture 12: Spectral methods: Fourier. [Online]. Available: https://services.math.duke.edu/~leili/teaching/duke/math660s16/lectures/lec12.pdf. [Accessed: 13 May 2019].
[55] M.I. Kadec [1964], The Exact Value of the Paley-Wiener Constant. Sov. Math. Dokl., 5, 559-561.
[56] M. Ledoux and M.
Talagrand [1991], Probability in Banach Spaces. Springer.
[57] M. Gastpar and Y. Bresler [2000], On the Necessary Density for Spectrum-Blind Nonuniform Sampling Subject to Quantization. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., 1, 348-351.
[58] M. Rudelson and R. Vershynin [2007], On sparse reconstruction from Fourier and Gaussian measurements. Communications on Pure and Applied Mathematics, 61(8), 1025-1045.
[59] M. Fazel, E.J. Candès, B. Recht and P.A. Parrilo [2008], Compressed sensing and robust recovery of low rank matrices. Signals, Systems and Computers, 2008 42nd Asilomar Conference on, 1043-1047.
[60] M.A. Herman and T. Strohmer [2009], High-resolution radar via compressed sensing. IEEE Transactions on Signal Processing, 57(6), 2275-2284.
[61] M.W. Maciejewski, H.Z. Qui, M. Mobli and J.C. Hoch [2009], Nonuniform Sampling and Spectral Aliasing. Journal of Magnetic Resonance, 199(1), 88-93.
[62] M. Pippig and D. Potts [2013], Parallel three-dimensional nonequispaced fast Fourier transforms and their applications to particle simulation. SIAM Journal on Scientific Computing, 35(4), C411-C437.
[63] M. Jia, C. Wang, K. Ting Chen and T. Baba [2013], A Non-uniform Sampling Strategy for Physiological Signals Component Analysis. Digest of Technical Papers - IEEE International Conference on Consumer Electronics, 526-529.
[64] M. Kabanava and H. Rauhut [2015], Cosparsity in compressed sensing. Boche H., Calderbank R., Kutyniok G., Vybíral J. (eds), Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis, Birkhäuser.
[65] M. Kabanava, H. Rauhut and U. Terstiege [2015], Analysis of low rank matrix recovery via Mendelson's small ball method. 2015 International Conference on Sampling Theory and Applications (SampTA).
[66] M. Kabanava, R. Kueng, H. Rauhut and U. Terstiege [2016], Stable low-rank matrix recovery via null space properties. Information and Inference: A Journal of the IMA, 5(4), 405-441.
[67] M. Hajar, M. El Badaoui, A.
Raad and F. Bonnardot [2019], Discrete Random Sampling: Theory and Practice in Machine Monitoring. Mechanical Systems and Signal Processing, 123, 386-402.
[68] O. López, R. Kumar, Ö. Yılmaz and F.J. Herrmann [2016], Off-the-Grid Low-Rank Matrix Recovery and Seismic Data Reconstruction. IEEE Journal of Selected Topics in Signal Processing, 10(4).
[69] P.L. Butzer and R.L. Stens [1992], Sampling Theory for Not Necessarily Bandlimited Functions: A Historical Overview. SIAM Rev., 34(1), 40-53.
[70] P.W. Cary [1997], 3D Stacking of Irregularly Sampled Data by Wavefield Reconstruction. SEG Technical Program Expanded Abstracts.
[71] P.S. Penev and L.G. Iordanov [2001], Optimal Estimation of Subband Speech from Nonuniform Non-recurrent Signal-driven Sparse Samples. IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings.
[72] P. Christensen, A. Kensler and C. Kilpatrick [2018], Progressive Multi-Jittered Sample Sequences. Eurographics Symposium on Rendering, 37(4).
[73] R. Paley and N. Wiener [1934], Fourier Transforms in the Complex Domain. Amer. Math. Soc. Colloq. Publs., 19.
[74] R.L. Cook, T. Porter and L. Carpenter [1984], Distributed Ray Tracing. ACM SIGGRAPH 84 Conference Proceedings, 18(4), 165-174.
[75] R. Kronauer and Y.Y. Zeevi [1985], Reorganization and Diversification of Signals in Vision. IEEE Transactions on Systems, Man, and Cybernetics, 15(1), 91-101.
[76] R.L. Cook [1986], Stochastic Sampling in Computer Graphics. ACM Transactions on Graphics, 6(1).
[77] R. Motwani and P. Raghavan [1995], Randomized Algorithms. Cambridge University Press.
[78] R.D. Wisecup [1998], Unambiguous Signal Recovery Above the Nyquist Using Random-sample-interval Imaging. Geophysics, 63(2), 331-789.
[79] R. Baraniuk, H. Choi, F. Fernandes, B. Hendricks, R. Neelamani, V. Ribeiro, J. Romberg, R. Gopinath, H. Guo, M. Lang, J.E. Odegard and D. Wei [2001], Rice Wavelet Toolbox. [Online]. https://www.ece.rice.edu/dsp/software/rwt.shtml.
[Accessed: 13 June 2019].
[80] R. Venkataramani and Y. Bresler [2001], Optimal Sub-Nyquist Nonuniform Sampling and Reconstruction for Multiband Signals. IEEE Transactions on Signal Processing, 48(10), 2301-2313.
[81] R. Young [2001], An Introduction to Non-Harmonic Fourier Series. Elsevier.
[82] R. Kumar, O. López, E. Esser and F.J. Herrmann [2015], Matrix completion on unstructured grids: 2-D seismic data regularization and interpolation. EAGE Annual Conference Proceedings.
[83] R. Kueng and P. Jung [2017], Robust Nonnegative Sparse Recovery and the Nullspace Property of 0/1 Measurements. IEEE Transactions on Information Theory, 64(2), 689-703.
[84] S. Maymon and A.V. Oppenheim [2011], Sinc Interpolation of Nonuniform Samples. IEEE Transactions on Signal Processing, 59(10), 4745-4758.
[85] S. Foucart and H. Rauhut [2013], A Mathematical Introduction to Compressive Sensing. Birkhäuser.
[86] T. Strohmer [2000], Numerical Analysis of the Non-Uniform Sampling Problem. Journal of Computational and Applied Mathematics, 122, 297-316.
[87] T. Klein and E. Rio [2005], Concentration around the mean for maxima of empirical processes. Annals of Probability, 33(3), 1060-1077.
[88] T.T.Y. Lin and F.J. Herrmann [2009], Designing simultaneous acquisitions with compressive sensing. EAGE Annual Conference Proceedings.
[89] T.T. Cai and A. Zhang [2014], Sparse Representation of a Polytope and Recovery of Sparse Signals and Low-Rank Matrices. IEEE Transactions on Information Theory, 60(1), 122-132.
[90] T. Wu, S. Dey and M. Shuo-Wei Chen [2016], A Nonuniform Sampling ADC Architecture With Reconfigurable Digital Anti-Aliasing Filter. IEEE Journal of Selected Topics in Signal Processing, 63(10), 1639-1651.
[91] V.A. Kotel'nikov [1933], On the Transmission Capacity of "Ether" and Wire in Electrocommunications (material for the first All-Union Conference on Questions of Communication). Izd. Red. Upr. Svyazi RKKA.
[92] W.L.
Ferrar [1928], On the Consistency of Cardinal Function Interpolation. Proc. Roy. Soc. Edinburgh, 47, 230-242.
[93] Y.Y. Zeevi and E. Shlomot [1993], Nonuniform Sampling and Antialiasing in Image Representation. IEEE Transactions on Signal Processing, 41(3), 1223-1236.
[94] Y. Liu [2011], Universal Low-Rank Matrix Recovery from Pauli Measurements. NIPS'11 Proceedings of the 24th International Conference on Neural Information Processing Systems, 1638-1646.