Compressed Sensing with DeterministicMeasurement MatricesbyArman ArianB.Sc., Sharif University of Technology, 2008M.Sc., Sharif University of Technology, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinThe Faculty of Graduate and Postdoctoral Studies(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2019c Arman Arian 2019The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Examining Committee: submitted byin partial fulfillment of the requirements forthe degree ofinSupervisor Supervisory Committee Member Supervisory Committee MemberUniversity ExaminerUniversity ExaminerArman ArianMathematicsOzgur YilmazYaniv PlanBrian MarcusJulia Yulia GordonLutz Hans-Joachim LampeCompressed Sensing with Deterministic Measurement MatricesDoctor of PhilosophyiiAbstractCompressed sensing (CS) is a signal acquisition paradigm to simultaneouslyacquire and reduce dimension of signals that admit sparse representations.This is achieved by collecting linear, non-adaptive measurements of a signal,which can be formalized as multiplying the signal with a “measurement ma-trix". If the measurement matrix satisfies the so-called restricted isometryproperty (RIP), then it will be appropriate for compressed sensing. Whilewide classes of random matrices provably satisfy the RIP with high proba-bility, explicit and deterministic constructions have been shown (so far) tosatisfy the RIP only in a significantly suboptimal regime.In this thesis, our focus is on deterministic measurement matrices incompressed sensing. In a nutshell, we investigate quantization methods for aclass of deterministic matrices (Chapter 2); introduce a novel deterministicconstruction of a family of binary, circulant measurement matrices usingthe Legendre symbol (Chapter 3); and propose two novel approaches forimproving the RIP constant estimates based on Gershgorin circle theorem,obtaining an improvement over the Gershgorin bounds by a multiplicativeconstant. One of our approaches here, together with a conjecture we makeregarding the distribution of quadratic residues in a finite field provides apotential path to break the so-called “square-root barrier"–we give a proofbased on the assumption that the conjecture holds.iiiLay SummaryCompressed sensing (CS) is a revolutionary sampling paradigm born in 2006.In this field, signals are viewed as vectors, and measurements are obtained viaa measurement matrix. The theory in CS provides tools to prove that certainrandom matrices perform “well" as measurement matrices with high proba-bility, and the performance of deterministic matrices have been considered(so far) to be “weak". In this thesis, we propose a novel class of measurementmatrices which has certain advantages compared to the structures given inthe literature so far. We will also improve (a bit) the “weakness" of deter-ministic matrices mentioned above for a particular construction. Moreover,we will derive an approach to show how one can quantize the collected mea-surements obtained from a deterministic matrix efficiently, in order to storeor transmit data.ivPrefaceThis thesis consists of my original research, conducted at the Departmentof Mathematics at the University of British Columbia, Vancouver, Canada,under the supervision of Prof. Ozgur Yilmaz. The results in Chapter 2 havebeen submitted as a conference paper to the 13th International conferenceon Sampling Theory and Applications (SampTA 2019).vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Compressed sensing– a new sampling paradigm . . . . . . . . 11.1.1 Basic idea . . . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Measurement matrices in compressed sensing . . . . . 31.1.3 Restricted Isometry Property (RIP) . . . . . . . . . . 61.1.4 RIP for random matrices . . . . . . . . . . . . . . . . 81.1.5 Optimality of random matrices and Gelfand widths . 101.2 Deterministic constructions in compressed sensing . . . . . . 111.2.1 Coherence . . . . . . . . . . . . . . . . . . . . . . . . 141.2.2 Measurement matrices based on Reed-Muller codes . 161.2.3 DeVore’s construction . . . . . . . . . . . . . . . . . 171.2.4 Chirp sensing measurement matrices . . . . . . . . . 181.2.5 The construction of Bourgain et al. . . . . . . . . . . 211.3 Quantization in compressed sensing . . . . . . . . . . . . . . 241.3.1 Memoryless scalar quantization . . . . . . . . . . . . . 251.3.2 Sigma-Delta (⌃) quantization . . . . . . . . . . . . 271.3.3 Approximating signals using ⌃ quantization . . . . 271.3.4 Higher order ⌃ quantization . . . . . . . . . . . . . 301.3.5 One-stage recovery for ⌃ quantization . . . . . . . . 341.4 Organization of the thesis . . . . . . . . . . . . . . . . . . . . 36viTable of Contents2 One-stage recovery for ⌃-quantized compressed sensing 372.1 Approach 1: Modifying the measurement matrix . . . . . . . 402.1.1 Implications for bounded orthonormal systems . . . . 442.1.2 Numerical experiments . . . . . . . . . . . . . . . . . 472.2 Approach 2: Using a digital buffer . . . . . . . . . . . . . . . 472.2.1 Numerical experiments . . . . . . . . . . . . . . . . . 532.3 ⌃-quantized compressed sensing with chirp sensing matrices 542.3.1 Approximation error as the number of measurementsgrows . . . . . . . . . . . . . . . . . . . . . . . . . . . 552.3.2 Approximation error as the sparsity level varies . . . 592.3.3 Numerical experiments . . . . . . . . . . . . . . . . . 612.4 Further encoding of ⌃-quantized compressive measurements 623 Deterministic partial binary circulant compressed sensingmatrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673.1 Introduction and preliminaries . . . . . . . . . . . . . . . . . 673.2 Compressed sensing matrices using Legendre sequence . . . . 693.3 Circulant matrices . . . . . . . . . . . . . . . . . . . . . . . . 703.4 A novel, explicit construction . . . . . . . . . . . . . . . . . . 733.5 One-stage recovery for ⌃-quantized compressed sensing withdeterministic partial circulant matrices . . . . . . . . . . . . 773.6 Numerical experiments . . . . . . . . . . . . . . . . . . . . . 784 RIP constants for deterministic compressed sensing matrices-beyond Gershgorin . . . . . . . . . . . . . . . . . . . . . . . . 834.1 Paley tight frames for compressed sensing . . . . . . . . . . . 854.2 Improving the Gershgorin bound using skew-adjacency matri-ces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894.3 Improving the Gershgorin bound using Dembo bounds . . . . 914.4 A generalized Dembo approach . . . . . . . . . . . . . . . . . 994.5 A path to break the square-root barrier using Dembo bounds 1075 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . 116Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118viiList of Figures1.1 Error in approximation using first order and second order ⌃quantization versus number of measurements m. The graphsare plotted in log-log scale and they are compared with theupper bounds on the error, namely, f(m) = C/pm, andg(m) = D/pm3 (the graphs of f(m) = 1pmand g(m) = 1mpmare multiplied with proper constants to shift them close to thenumerical graph) respectively. . . . . . . . . . . . . . . . . . 332.1 Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme and with a“modified" random partial Fourier matrix for 10-sparse signalsand the comparison with the graphs of f(m) = Cm1/2andg(m) = Dm3/2in log-log scale. . . . . . . . . . . . . . . . . . . 482.2 Quantizing the signal x by first using MSQ with a very smallstep size 0, then applying U which is a fast transform followedby a ⌃ quantization scheme. . . . . . . . . . . . . . . . . . . 482.3 Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme and withan extra MSQ step (before applying the matrix U on the mea-surement vector). . . . . . . . . . . . . . . . . . . . . . . . . 532.4 Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme for a 4-sparse signal and the comparison with the graphs of f(p) =Cpp and g(p) =Dp3/2(each one shifted properly to match theoriginal graphs) in log-log scale. . . . . . . . . . . . . . . . . 622.5 Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme with fixednumber of measurements (p = 541) and the comparison withthe graphs of f(k0) = 1pk0and g(k0) = 1pk03. . . . . . . . . . . 633.1 The binary matrix given by the new construction for p = 997. 80viiiList of Figures3.2 Coherence of the matrices introduced in this chapter, and alsochirp sensing matrices in log-log scale, accompanied with bestfitted lines. As seen in this Figure, the coherence of our con-struction behaves as ⇠ m1/3, and for chirp sensing matricesbehaves as ⇠ p1/2 = m1/2. . . . . . . . . . . . . . . . . . . 803.3 The fraction of exactly recovered vectors versus sparsity for afixed number of measurements. The number of measurementsis chosen as m = 256 for the Reed-Muller matrix, m = 178for the new construction and also for the Bernoulli matrix,and m = 169 for the DeVore’s construction. The ambientdimension of all signals is n = 300 . . . . . . . . . . . . . . . 813.4 The graph of fraction of exactly recovered vectors (for 10 ex-periments) versus prime number p for a fixed level of sparsity(k = 10 or 20) for the new construction and the Bernoullimatrices. Note that only three graphs are shown because thegraphs corresponding to k = 20 for the new construction andrandom Bernoulli exactly overlap with each other. This sug-gests that our proposed deterministic construction has a verysimilar performance to random Bernoulli. . . . . . . . . . . . 823.5 The maximum sparsity level of recoverable signals g(p) versusthe prime number p compared with the graph of f(p) = p3/4. 824.1 The graph of lower bound of the RIP constants, comparedwith the Gershgorin bound and the new improved bound, asgiven in Section 4.2, on the RIP constants. . . . . . . . . . . 874.2 Comparison of lower bounds of the RIP constants obtainedfrom a single random support set and using the worst caseamong 1000 random support sets. We observe that the slopeof the graph (in log-log scale) obtained from a single supportset almost remains constant, as we increase the number ofsupport sets from 1 to 1000. . . . . . . . . . . . . . . . . . . 884.3 Comparing the sharpness of Dembo bounds and Gershgorinbounds for estimating the maximum eigenvalue of a semi-positive definite Hermitian matrix by considering the ratioof these bounds over the actual maximum eigenvalues for afixed Paley matrix with p = 103. . . . . . . . . . . . . . . . . 93ixAcknowledgementsFirst and foremost, I express my most sincere gratitude to my supervisorProfessor Ozgur Yilmaz. I learned so many things from him during the yearsof completing my Ph.D. degree. In addition to learning a lot about topicsrelated to my research area, I learned how to enjoy beauty of Mathematicsmore deeply by tackling the topics and problems using a “broad perspective".I am deeply thankful to him because of all the insightful ideas he gave meand because of all the time he spent for me to help me develop those ideas.I would like to thank the supervisory committee for their time, and theiradvice that always guided me through this journey.Lastly, I would like to thank my family who have always given me loveand support, and helped me a lot emotionally during the time of writing thisdissertation.xChapter 1IntroductionThis thesis focuses on several problems in compressed sensing (CS)– a novelsignal acquisition paradigm that relies upon a new sampling theory for“sparse signals". One of the main problems in compressed sensing is the de-sign of appropriate measurement matrices with (nearly) optimal dimensionrelations. While nearly optimal measurement matrices are ubiquitous (anyrandom matrix with i.i.d. Gaussian entries will do with high probability), noexplicit deterministic construction is known that produces (nearly) optimalmeasurement matrices. In this thesis, we address several problems relatedto CS with deterministic matrices including a novel construction, quantiza-tion, and improved bounds on the RIP constants of certain deterministicconstructions.In this chapter, we give an overview of the theory of CS, deterministicconstructions in CS, and quantization in CS.1.1 Compressed sensing– a new samplingparadigm1.1.1 Basic ideaOne of the key ingredients of today’s digital technology is our ability toacquire, store, process, and transmit signals digitally. The signals of interestcan be images, audio, video, text, etc., which can be modelled as finite (buttypically high) dimensional vectors. Furthermore, signals often possess low-dimensional structure that can be exploited to accomplish the various taskswe listed above.Compressed sensing relies on such a structural property that variousclasses of signals enjoy: often signals are sparse or compressible with respectto some basis (i.e., transform). To be more precise, a signal x 2 Rn is k-sparseif it has at most k non-zero entries. The support of x = (x1, x2, · · · , xn) isthe set of indices j such that xj 6= 0, and is denoted by supp(x). We alsodefine kxk0 = |supp(x)|, and ⌃nk := {x : kxk0 k} denotes the set of allk-sparse vectors.11.1. Compressed sensing– a new sampling paradigmModelling a given signal class, say the collection of all natural images, as⌃nk , for k ⌧ n, is naive at best. However, we can generalize this approachby allowing two additional relaxations on the requirement of sparsity:(i) Signals may not often satisfy the sparsity assumption, but may admitsparse representations with respect to a known basis. More precisely,a signal x is said to be sparse with respect to a basis B (here B is aunitary matrix, i.e., its columns form an orthonormal basis for Rn) ifx = Bu where u 2 Rn is sparse.(ii) A signal x may not be sparse (with respect to a basis B) but compress-ible. This means that there is a u 2 ⌃nk such that kx Buk is smallin some norm. This would be the case if, e.g., the sorted entries of udecay rapidly.Once we make these more relaxed assumptions, we obtain a realisticmodel that is indeed encountered in many practical scenarios. For example,natural images are compressible with respect to wavelets and audio signalswith respect to Fourier. For various other classes of signals, one needs torelax the requirement in the condition (i) (mentioned above) that B is abasis and work with frames instead. But for our purposes in this thesis, wewill assume both (i) and (ii) hold. Often, we will also make the assumptionthat B = I, and work directly with sparse vectors in ⌃nk .Next, we describe the main idea behind CS. Suppose x 2 ⌃nk with k ⌧ n.A generalized sample of x is given by its inner product with a measurementvector i 2 Rn. In CS, we pick m ⌧ n non-adaptive measurement vec-tors 1, ...m 2 Rn (which we now assume are row vectors) and collect themeasurementsyj = hj , xi, j = 1, 2, ....,mor equivalently, setting y = [y1, · · · , ym]T ,y = xwhere is the m ⇥ n matrix whose jth row is j . The measurementsyj collected this way are called compressed measurements or compressiblesamples of x. The m ⇥ n matrix where m ⌧ n, is the (compressed)measurement matrix. Next, we summarize the important features that aproper CS measurement matrix must satisfy in order to guarantee that aconvex minimization algorithm can be used for the recovery of sparse orcompressible signals.21.1. Compressed sensing– a new sampling paradigm1.1.2 Measurement matrices in compressed sensingOne of the main goals in CS is to design measurement matrices that preservethe information that identifies the original signal x. One of the main toolsfor assessing if a matrix is an appropriate CS measurement matrix is theso-called Restricted Isometry Property (RIP).Definition 1. A matrix is said to satisfy RIP of order k if there exists aconstant 2 (0, 1) satisfying(1 )kxk22 kxk22 (1 + )kxk22 (1.1)for every k-sparse vector x. The RIP constant of order k for a matrix ,denoted by k, is defined as the minimum over all numbers satisfying (1.1).One of the important implications of RIP is the uniqueness property in thefollowing sense.Lemma 1. Let 2 Rm⇥n be a matrix satisfying RIP of order 2k. Let xand y be k-sparse vectors and assume that x = y, then x = y.Proof. Let z = xy. Since x and y are k-sparse vectors, each of them have atmost k non-zero entries. Thus, z has at most 2k non-zero entries which meansthat z is a 2k-sparse vector. On the other hand, we can use the assumptionx = y to conclude z = 0. Therefore by (1.1), (1 2k)kzk22 = 0 andsince 1 2k > 0 this implies z = 0, i.e., x = y as desired.Based on Lemma 1 that guarantees injectivity of on ⌃nk , the theorembelow gives an algorithm to reconstruct the original k-sparse signal from themeasurement vector.Theorem 1. Let be a matrix satisfying RIP of order 2k. Suppose that xis a k-sparse vector and let y = x be the measurement vector. If xˆ is thereconstructed vector for x defined byxˆ = argmin kzk0 such that y = z, (1.2)then xˆ = x.Proof. Since x is a k-sparse vector satisfying y = x, we can conclude thatany solution to (1.2) is also k-sparse. Therefore, by Lemma 1, the solutionto (1.2) is unique, and it must be equal to x, i.e., xˆ = x.31.1. Compressed sensing– a new sampling paradigmThe reconstruction scheme given in Theorem 1 is in fact intractable and isNP-hard [59]. On the other hand, the convex relaxation of this algorithmwhich is computationally efficient, also reconstructs the original signal per-fectly in the absence of noise. In their 2006 breakthrough article, Candès,Romberg, and Tao [20] proved the following theorem.Theorem 2. [20] Let be a matrix satisfying RIP of order 4k, with theproperty that 3k + 34k < 2. Let x be a k-sparse vector and let y = x bethe measurement vector. If xˆ is the reconstruction vector for x defined byxˆ = argmin kzk1 such that y = z,then xˆ = x.This theorem is of extreme importance since we can say based on it, thefield of CS was born. We should mention, however, that parallel to Candèset al. [20] in 2006, Donoho [30] also published a breakthrough article in thesame year that can also be considered as the starting point of CS.The condition given in this theorem, i.e., 3k + 34k < 2 was improvedgradually between 2006 and 2014. In 2014, Cai et al. improved this conditionto 2k < 1/p2, and more importantly they proved that this condition isoptimal and can not be improved further. In particular, they proved thefollowing theorem.Theorem 3. [18] Assume that the conditions of Theorem 1 hold and that2k <1p2⇡ 0.7071. Let xˆ be the reconstruction vector obtained by BasisPursuit (BP) algorithm given byxˆ = argmin kzk1 such that y = z, (1.3)then xˆ = x.Note that, as shown in [18], the condition of this theorem can be gener-alized to tk <qt1t for t 43 .Noisy Recovery: One of the key requirements of CS to be practicallyrelevant is that it is robust to noise and stable under model mismatch. Inother words, suppose x is not necessarily sparse but compressible, and sup-pose that its compressed measurements are corrupted by noise. Formally, ifx is the original signal and 2 Rm⇥n is the measurement matrix, we canmodel our problem asy = x+ w (1.4)41.1. Compressed sensing– a new sampling paradigmwhere w is a noise term that satisfies kwk ✏ for some known ✏ > 0. Inthis case, the vector x can be well approximated with the solution to BasisPursuit Denoising (BPDN) algorithm defined asxˆ = argmin kzk1 subject to ky zk2 ✏ (1.5)where ✏ is an upper bound on the size of the noisy contribution. Moreprecisely, we have the following theorem, due to Candès, Romberg, and Tao.Below, xk := argminu2⌃nk kxuk denotes the best k-term approximationof x, and can be obtained by keeping the largest (in absolute value) k entriesof x, and replacing the rest be zeros. The error in approximation (in `p norm)using the best k-term approximation is denoted by k(x)p, and is defined ask(x)p := kx xkkp (1.6)Theorem 4. [20] Assume that 3k + 34k < 2 and kwk2 ✏. Then, thesolution to (1.5) obeyskx xˆk2 D0pkk(x)1 +D1✏ (1.7)for some universal constants D0 and D1 that depend only on the value of4k.Similar to what we explained for the noise-free case, the condition 3k+34k <2 was later improved to the optimized condition 2k < 1/p2 [18] in thefollowing sense.Theorem 5. [18] Assume that 2k < 1p2 and kwk2 ✏. Then, the solutionto (1.5) obeyskx xˆk2 C0pkk(x)1 + C1✏ (1.8)for some universal constants C0 and C1 which depend only on the value of2k.Remark 1. Explicit formulas for the constants C0 and C1 mentioned inTheorem 5 were derived in [18]. For the sake of completeness, we mentionthem here:C0 = 2⇣p22k +q2( 1p2 2k)2k2( 1p2 2k)+ 1⌘,andC1 =2p2(1 + 2k)1p22k.51.1. Compressed sensing– a new sampling paradigmRemark 2. The Basis Pursuit and Basis Pursuit Denoising algorithms de-fined in (1.3) and (1.5) are indeed two of the most important algorithms inCS, but they are not the only ones. In fact, finding a fast algorithm for re-construction is one of the two major problems in CS (the other problem isfinding an appropriate measurement matrix). The major algorithms in CScan be categorized into two groups. The first class is convex optimization al-gorithms; BP and BPDN are important examples in this group. The secondclass is greedy algorithms, and important algorithms of this group includeOrthogonal Matching Pursuit (OMP), Compressive Sampling Matching Pur-suit (CoSaMP), and Iterative Hard Thresholding (IHT). Note that this thesisfocuses on the design and analysis of CS measurement matrices, and we willnot go through different types of algorithms in detail.1.1.3 Restricted Isometry Property (RIP)Let x 2 Rn be a given signal, and let be an m ⇥ n measurement matrix.In this section, we will review an important approach to evaluate the RIPconstants that can be exploited directly from the matrix and without con-sidering k-sparse signals. Let T be a subset of {1, 2, 3, ..., n} with k elements.By xT , we mean the vector in Rk obtained by restricting x to the entriesindexed by the elements of T . Similarly, by T we mean the m⇥ |T | matrixdefined by the restriction of to the columns indexed by the elements ofthe set T . Define kmin and kmax as follows.kmax = maxT :|T |kmax(⇤TT ),kmin = minT :|T |kmin(⇤TT ),(1.9)where max and min denote the maximum and minimum eigenvalue of amatrix respectively, and G := ⇤TT is called the Gramian matrix corre-sponding to T .Proposition 1. For every matrix , the RIP constant k of the matrix canbe computed by the following formulak = max{1 kmin,kmax 1} (1.10)Proof. Let T0 to be the index set for which k⇤T0T0Ik2 becomes maximum,i.e., k⇤T0T0 Ik2 = maxT :|T |k k⇤TT Ik2. We know that the singularvalues of a Hermitian matrix are the absolute values of eigenvalues of thesame matrix. Hence, to obtain k⇤T0T0 Ik2 (the maximum singular value61.1. Compressed sensing– a new sampling paradigmof (⇤T0T0 I)), we should simply consider |1|, where 1 is the eigenvalueof this matrix with the largest absolute value. Let v1 2 Ck be the eigenvectorof this matrix corresponding to 1, then(⇤T0T0 I)v1 = 1v1 = ±k⇤T0T0 Ik2v1Therefore,kT0v1k22 kv1k22 = h⇤T0T0v1, v1i hv1, v1i= h(⇤T0T0 I)v1, v1i = ±k⇤T0T0 Ik2kv1k22(1.11)So(1 k⇤T0T0 Ik2))kv1k22 kT0v1k22 (1 + k⇤T0T0 Ik2)kv1k22which implies(1 k⇤T0T0 Ik2))kw1k22 kw1k22 (1 + k⇤T0T0 Ik2)kw1k22where w1 2 Cn is a k-sparse vector obtained by padding zeros to v1, i.e.,v1 = (w1)T0 . Accordingly, k k⇤T0T0 Ik2 = maxT :|T |k k⇤TT ISk2.On the other hand, according to (1.1), we have|kw1k22 kw1k22| = |kT0v1k22 kv1k22| kkv1k22Thus,|kT0v1k22 kv1k22|kv1k22 kSo by using (1.11), we havek⇤T0T0 Ik2 = maxT :|T |k k⇤TT Ik2 kTherefore,k = maxT :|T |kk⇤TT ISk2,which implies (1.10).Note that verifying whether a given matrix is RIP with a small constant ornot is intractable because we need to calculate the minimum and maximumeigenvalues ofnkmatrices. This problem is circumvented by using randommatrices.71.1. Compressed sensing– a new sampling paradigm1.1.4 RIP for random matricesConsider a matrix 2 Rm⇥n with the property that each entry of is arandom variable. Such a matrix is called a random matrix. As stated in[37], the random matrices were initially introduced in the context of Statisticsby Wishart and in Mathematical Physics by Wigner.Definition 2. Let be an m⇥ n random matrix.(i) The matrix is called a Bernoulli random matrix if the entries of are i.i.d. Rademacher random variables, i.e., random variables taking1 or -1, each with probability p = 1/2.(ii) The matrix is called a Gaussian random matrix if the entries of are i.i.d. standard Gaussian random variables.(iii) The matrix is called a sub-Gaussian random matrix if the entries of are independent (not necessarily identically distributed) sub-Gaussianrandom variables with zero mean and variance one.The random variable X is defined to be a sub-Gaussian random variablewith variance proxy denoted by X ⇠ subG(2) if E(X) = 0, and if themoment-generating function of X satisfiesmX(t) = E(etX) et22/2for any t 2 R. The name sub-Gaussian should becomes clear from this defi-nition as et22/2 is the moment generating function of the Gaussian randomvariable with mean zero and variance 2. Note that unlike the ordinaryconvention in probability theory, the sub-Gaussian random variable with agiven variance proxy determines infinitely many distributions. In fact, if Xis a sub-Gaussian random variable with variance proxy 2, then it is also asub-Gaussian random variable with variance proxy 2 with any .Remark 3. It is obvious that a Gaussian random variable is a sub-Gaussianrandom variable. Also, if X is a Rademacher random variable, then themoment generating function of X would bemX(t) =et + et2= cosh t et2/2and therefore, X is sub-Gaussian. Thus, a random matrix whose some ofits entries are standard Gaussian and some of its entries are Rademacherrandom variables is considered to be a sub-Gaussian random matrix. (notethat as stated in Definition 2, the entries of a sub-Gaussian matrix do nothave to be identically distributed).81.1. Compressed sensing– a new sampling paradigmStudying random matrices and their properties have been of a great inter-est in the past few years because of interesting properties of these matrices.For example, if A is a matrix, one could always ask about finding or estimat-ing the extreme singular value of A. The reason for interest in these quan-tities is that the maximum singular value i.e., max(A) is same as operatornorm of A, and the condition number of A, i.e., cond(A) = max(A)/min(A)measures how far the matrix is from being an isometry. The closer to onethis number is, the closer the matrix is to be an isometry. In general, an-swering these question is difficult, but important results have been derivedin the case that A is a random matrix, e.g., [23, 55, 70, 74].In the context of CS, the sub-Gaussian random matrices satisfy the RIPwith high probability, if the sparsity level k is slightly less than m. Specifi-cally, the following theorem holds.Theorem 6. [37] Let be an m⇥n sub-Gaussian random matrix. Then forany 0 < < 1 there exists a constant C > 0 (depending only on sub-Gaussianvariance proxy) such that restricted isometry constant of 1pm satisfies k with probability at least 1 2 exp (2m/(2C)) provided that the numberof measurements satisfiesm 2C2k log (en/k) (1.12)Note that a factor 1pmis appeared in front of the measurement matrix inthe theorem above because if a matrix satisfies RIP, then it is expected to beclose to be an isometry. Hence we would expect the columns of the measure-ment matrix to be normalized to one. Since the entries of the measurementmatrix are random variables with variance one, we multiply the matrix withthe factor 1pmso that the columns become unit-norm in expectation.As we can see in (1.12), the bound in minimum number of measurementsm is just slightly larger than k. Recall that in practice the signal x livesin Rn where the ambient dimension n is a large number, while the sparsitylevel k is small for many classes of signals. Therefore, based on theoremabove, we can reconstruct these signals with high probability when we usesub-Gaussian random matrices as the measurement matrix with using onlymmeasurements satisfying (1.12), and this number of measurements is seem-ingly insufficient. It can be shown [36, 38] that the bound (1.12) is optimal.In particular, as we will discuss in the next section, the factor log (n/k) cannot be improved.91.1. Compressed sensing– a new sampling paradigm1.1.5 Optimality of random matrices and Gelfand widthsHere, we consider the measurement matrix 2 Rm⇥n as an encoder (for thesignal x) and the algorithm that reconstructs x as a decoder and we denoteit by : Rm ! Rn. For a set K ✓ Rn, k(K)p is defined as follows.k(K)p := supx2Kk(x)pwhere k(x)p is as defined in (1.6). Moreover, the worst reconstruction errorfor the best pair of encoder/decoder is denoted by Em(K, `p), and is definedasEm(K, `p) := inf supx2Kkx(Ax)kpwhere the infimum is taken over all pairs of (A,) with A 2 Rm⇥n and : Rm ! Rn. The bounds for Em(K, `p) depend on a quantity calledGelfand width of a set. To define the Gelfand widths, first we need to definethe codimension of a subspace of a normed space.Definition 3. If W is a subspace of a vector space V , then the codimensionof W is defined as the dimension of the quotient space V/W which is givenbycodim(W ) = dim(V/W ) = dimV dimWDefinition 4. If K is a compact set in a normed space X, then the Gelfandwidth of K of order m, denoted by dm(K)X , dm(K,X), or simply dm(K) ifthe normed space X is clear, is defined asdm(K) := inf sup{kxk : x 2 K \ Y }where the infimum is taken over all subsets Y of X with codim Y m.Proposition 2. [33] Let K ✓ X = Rn (where X = Rn is equipped with `pnorm) be a compact set with K = K and K+K ✓ C0K for some constantC0. Then,dm(K) Em(K, `p) C0dm(K)Now, assume that Bnp denotes the unit ball respect to the `p norm, i.e.,Bnp := {x 2 Rn : kxkp 1}Also, suppose that the vector space X = Rn is equipped with `2 norm. Thenext theorem provides bounds for dm(Bnp ).101.2. Deterministic constructions in compressed sensingTheorem 7. [36] Let 0 < p 1. There exist universal constants Cp andDp > 0 such that the Gelfand width of dm(Bnp ) satisfiesCpmin{1, log(en/m)m}1/p1/2 dm(Bnp ) Dpmin{1,log(en/m)m}1/p1/2Now, if we set p = 1, for large enough m we will obtainC1rlog(en/m)m dm(Bn1 ) D1rlog(en/m)mTherefore, by setting K = Bn1 in Proposition 2, we obtainC1rlog(en/m)m dm(Bn1 ) Em(Bn1 , `2) 2dm(Bn1 ) 2D1rlog(en/m)m(1.13)where we used the fact that Bn1 +Bn1 ✓ 2Bn1 , and hence, we can use C0 = 2.Next, assume that 2 Rm⇥n satisfies RIP with 2k < 1p2 , then for x 2 Bn1we havekx(Ax)k2 C0k(x)1pkwhere : Rm ! Rn is the `1 minimization reconstruction algorithm. Then,since k(x) kxk1 1, we will have kx (x)k2 C0 1pk . Hence,Em(Bn1 , `2) C0pk . Combining this inequality with (1.13), we concludeC1rlog(en/m)m Em(Bn1 , `2) C0pkand this implies m C20C21k log(en/m) which proves that the bound (1.12) isoptimal for the minimum number of measurements in order to have a matrixwith small enough RIP constant.1.2 Deterministic constructions in compressedsensingIn Section 1.1, we saw that random matrices are important examples ofRIP matrices. However, there are two major drawbacks for these matrices.First, storing the entries of a random matrix and performing matrix-vector111.2. Deterministic constructions in compressed sensingmultiplication require significant computational resources. However, this isnot the case for deterministic matrices, since as we will see below, most ofthese matrices are structured matrices and the full matrix does not have tobe stored. Second, even though sub-Gaussian matrices are RIP, with RIPconstants satisfying k s with high probability in the optimal regime mks2 log n, there is no tractable algorithm to check whether a given randommatrix is RIP or not in that regime, and the RIP verification is an NP-hardproblem in such a regime. This has been shown in [76] by Wang et al. andin particular, it has been shown that there is no polynomial time algorithmfor RIP verification–even in an average case scenario–when the parameterssatisfy m k1+↵s2 log n, for any ↵ 2 [0, 1). In fact, the polynomial timealgorithm exists only if the parameters satisfy m k2s2 log n. In thefollowing, we explain briefly what is meant by average case RIP verificationand we summarize the main results of [76].The NP-hardness mentioned above occurs when a “designed" set of ma-trices, e.g., a set of random matrices with i.i.d. entries, or a structured classof deterministic matrices is given and we want to know whether the RIPconstant of the matrix with largest RIP constant, i.e., the worst case sce-nario, satisfies k s (for a fixed value of s). However, it is useful to providea tool (called certifier) that determines whether the RIP constant of givenmatrices satisfies k s in the average case. To formulate this idea, for0 ↵ 1, define the set of R↵ as the set containing all quadruples of thefollowing form.R↵ := {(m,n, k, s) : m k1+↵s2 log n}Also, let RIPm,n(k, s) denote the set of all m ⇥ n matrices whose RIP con-stants of order k satisfies k s. With this notation, Theorem 6 can bestated as follows. Fix s 2 (0, 1), and let (m,n, k, s) 2 R0. In the asymptoticregime m,n, k !1, if X is a sub-Gaussian random matrix of the size m⇥n,then P(X 2 RIPm,n(k, s))! 1.Now, the question is for a given set of designed matrices, can we find a toolthat verifies whether most of the matrices in that set with the parameters(m,n, k, s) 2 R↵ have the RIP constants satisfying k s. To do so, asequence of certifiers can be defined as follows.Definition 5. Given parameters m,n, k,s, and a set of of m ⇥ n designedmatrices A, a certifier for A is defined as the sequence of measurable func-tions m : Rm⇥n ! {0, 1} such that(i) m(X) = 1 iff X 2 RIPm,n(k, s).121.2. Deterministic constructions in compressed sensing(ii) P( m(X) = 0, X 2 A) 1/3 in the asymptotic regime m,n, k !1.Note that the certifier above depends on the parameters m,n, k, and sas well as the set A. In [76], the set A is considered to be the set of allm⇥n sub-Gaussian random matrices. It would be ideal to obtain a certifierthat is computationally tractable and works for the parameters in the setR↵, with ↵ as small as possible. In Proposition 1, we saw that the RIPconstant of a matrix can be computed via (1.10). This provides a certifier m : Rm⇥n ! {0, 1} defined by m(X) =⇢1, if maxT :|T |k kX⇤TXT Ikk2 s0, if maxT :|T |k k⇤TT Ikk2 > sAlthough this certifier works for the class of sub-Gaussian matrices withthe parameters chosen from R0, but it is computationally intractable sincecomputing the operator norm mentioned above is NP-hard. However, if theparameters are chosen from R1 (specifically, for m 1964k2s2 log n), andif we set the set of design matrices to be the set of sub-Gaussian matrices,there is a polynomial time certifier m : Rm⇥n ! {0, 1} given in [37] whichcan be defined as follows. m(X) =8<: 1, if kX⇤X Ink1 142qlognm0, if kX⇤X Ink1 > 142qlognmNow, the important question is whether a polynomial time certifier exists,when the parameters are chosen from the set R↵ for ↵ 2 [0, 1), and the setA is set to be the set of m ⇥ n random matrices with i.i.d. entries with adistribution Q. It is shown in [76] that the answer to this question is “no"assuming that a version of planted clique hypothesis holds. In particular,they prove the following Theorem under the assumption mentioned above.Theorem 8. Fix ↵ 2 [0, 1). Then, there exists parameters (m,n, k, s) 2 R↵such that no certifier/distribution couple ( , Q) (with being a polynomialtime certifier) exists for this sequence of parameters.These caveats of random matrices explained above leads us to consider“deterministic matrices" as an alternative for measurement matrices. How-ever, note that if by doing so, we want to resolve the non-existence oftractable algorithm for RIP verification as explained above, we need to con-struct a class of deterministic measurement matrices that satisfy RIP andhave the parameters satisfying m k1+↵s2 log n for ↵ 2 [0, 1) (since as131.2. Deterministic constructions in compressed sensingmentioned above for m k2s2 log n, a polynomial time algorithm existsfor RIP verification of random matrices). This means that to solve this is-sue, we need to construct a class of deterministic matrices that breaks thesquare-root barrier, see Section 1.2.1. Next, we explain one of the main toolsused for deterministic CS matrices which is a property called coherence.1.2.1 CoherenceThe coherence of a measurement matrix with unit norm columns is definedasµ() = max1i 6=jn|hi,ji|where i is the ith column of . A matrix can be used for measurementof sparse signals if the coherence of that matrix is small. More precisely, if(2k 1)µ < 1, there will be a reconstruction algorithm for the recovery ofevery k-sparse signal in k iterations [31].For any CS matrix, the coherence and the RIP constant can be related toeach other. These quantities are in fact related to each other via a theoremcalled Gershgorin circle theorem which provides bounds on the eigenvaluesof a matrix. This theorem can be stated as follows [8, 39].Theorem 9. Suppose that A = (aij) is an n⇥ n matrix. LetRi =nXj=1,j 6=i|aij |,then each eigenvalue of A is in at lease one of the disks{z : |z aii| Ri}Proposition 3. Suppose is a matrix with coherence µ. Then satisfiesthe RIP of order k withk µ(k 1) (1.14)whenever k < 1µ + 1.Proof. Consider an index set T ✓ {1, 2, ..., n} with |T | = k. Also, assumethat max,T and min,T are the maximum and minimum eigenvalues of ⇤TTrespectively. Since all columns of have unit norm, the diagonal elementsof ⇤TT are all one. Hence, by using Gershgorin circle theorem we concludethat141.2. Deterministic constructions in compressed sensing|max,T 1| max1ikXj 6=i|hj |ii| µ(k 1)and similarly|min,T 1| max1ikXj 6=i|hj |ii| µ(k 1)Since the inequalities above are valid for any index set T , we conclude thatthe quantities kmax and kmin defined as kmax := maxT,|T |k max,T , andkmin := minT,|T |k min,T satisfy |kmax 1| µ(k 1) and |kmin 1| µ(k 1) respectively. Therefore,k = max{kmax 1, 1 kmin} µ(k 1)Since computing the coherence of a deterministic matrix is rather a simpleproblem, we can use the proposition above to determine values of k for whichthe RIP constant k satisfies k < 1/p2 which implies that we can use thosematrices as CS measurement matrices for reconstruction of k-sparse signals.Here, of course we prefer the value of k to be as large as possible which canbe achieved for matrices with small coherence. Unfortunately, there exists alower bound for the coherence of any matrix called the Welch bound [77].Theorem 10. (Welch Bound) Let 2 Rm⇥n and assume that columns of have unit norm. Thenµ() rnmm(n 1)Based on this theorem, we can derive an upper bound for the maximumsparsity level of signals in the case that coherence is used along with Propo-sition 3 to prove that a matrix is RIP with small enough isometry constant.To do that, we consider a matrix that satisfies the Welch bound, and in thatcase, in order to have 2k < 1/p2, it is enough to have µ(2k 1) < 1/p2.This inequality will hold ifk . 1/p22µ=12p2rm(n 1)nmwhere the notation f . g means |f | C|g| for some universal constantC. In the context of CS, where n is a large number, and m ⌧ n, we have151.2. Deterministic constructions in compressed sensingn1nm ⇡ 1, and hence, the maximum sparsity level of the signals satisfiesk = O(pm). This barrier on the sparsity of signals using deterministic CSis called the square-root barrier for deterministic CS, and is indeed muchworse than sub-Gaussian case, where as we saw, the maximum sparsity levelis given by k = O( mlog(nk )).Note that this bound is imposed on the maximum sparsity level by com-bining Welch bound with the bound on the RIP constant given by Propo-sition 3. While the Welch bound can not be improved, as we will see inChapter 4, the bounds on RIP constants obtained from Gershgorin circletheorem (Proposition 3) can be improved. This can suggest a novel waythat leads to breaking the square-root barrier.There have been several deterministic CS constructions in the literature.Most of these constructions rely on graph theory, coding theory, or polyno-mials. There are also some individual constructions. In the following, wesummarize some of the major deterministic constructions. In Chapter 3, wewill construct a novel deterministic CS and we will compare this construc-tion with the existing constructions and we will provide the advantages ofthis construction.1.2.2 Measurement matrices based on Reed-Muller codesLet P be a fixed s⇥s binary symmetric matrix and let a = (a0, a1, ..., as1)Tand b = (b0, b1, ..., bs1)T be arbitrary binary vectors in Zs2. A second-orderReed-Muller (RM) sequence of length 2s is obtained from these parametersbyP,b(a) =(1)w(b)p2si(2b+Pa)T a (1.15)where w(b) denotes the weight of b, i.e., the number of ones in b. In [41], thedeterministic measurement matrix RM is constructed asRM = [UP1 UP2 .... UP2s(s1)/2 ]where each UPj is a 2s⇥2s orthogonal matrix whose columns are the second-order RM sequence obtained by fixing (any binary zero-diagonal symmetrics ⇥ s matrix) Pj in (1.15) and letting b range through all possible binarys-vectors. Since there are 2s(s1)/2 possible s⇥ s binary zero-diagonal sym-metric matrices Pj , and each of them are 2s ⇥ 2s, the associated full RMmatrix has the dimension 2s ⇥ 2s(s+1)/2.161.2. Deterministic constructions in compressed sensingThe inner products between columns of RM have been calculated in [53].If the first column, Pi,. is chosen from UPi and the second column, Pj,. ischosen from UPj then|hPi,. ,Pj,.i| =⇢1/p2l, 2l times;0, 2s 2l times. (1.16)where l = rank(Pi Pj). Note that for large values of s, the Welch Boundis given by µ(RM ) =qnmm(n1) ' 1pm = 1p2s where n = 2s(s+1)2 is theambient dimension and m = 2s is the number of measurements. Therefore,Reed-Muller matrices nearly reach the Welch bound. In [41], Howard et al.provide a fast reconstruction algorithm when Reed-Muller matrices are usedas measurement matrices. However, Ni et al. state in [61] that the algorithmgiven in [41] is not efficient enough for 2D signals such as medical images,and so they propose an algorithm that is more useful in this context. Intheir paper, they introduce a reconstruction algorithm when a Reed-Mullermatrix of the following form is used as the measurement matrix. = [UP1 UP2 UP3 UP4 ]Here, the matrices P1, P2, P3, and P4 are s ⇥ s zero-diagonal binary sym-metric matrices and the difference of any two of these matrices is full rank.1.2.3 DeVore’s constructionOne of the most important deterministic class of matrices in CS is the classof binary matrices introduced by DeVore in [28]. The significance of thisconstruction is because it is a binary deterministic construction, and alsothere is an explicit and simple formula for entries of these matrices, i.e., it isnot recursive, existential, or dependent of other entities such as special typesof graphs. Moreover, unlike most deterministic constructions, this construc-tion has the property that for a fixed number of measurements, the ambientdimension can be chosen as large as we want which is an ideal feature inthe context of CS. In particular, Devore’s construction provides a class ofp2 ⇥ pr+1 matrices, where p is a prime number, and r 2 is an integer. Todefine these matrices, fix a postive integer r 2 and a polynomial q(x) inZp[x] with deg(q) = r. Here, Zp = {0, 1, 2, ..., p 1} is the field of integersmod p. For each i 2 Zp we know that q(i) 2 Zp. Consider the column of themeasurement matrix as a p2 dimensional vector vq whose (p` + s + 1)-stentry (for `, s 2 Zp) is defined as 1/pp if vq(`) = s and 0 otherwise.171.2. Deterministic constructions in compressed sensingSince there are pr+1 polynomials of degree r, this matrix is of the size p2 ⇥pr+1. If vq and vs are two different columns of , then vq(j)vs(j) = 0 unlessq(j) = s(j), i.e. (q s)(j) = 0. Note that deg(q s) r, thus, there are atmost r elements j with this property. Therefore,|hvq, vsi| rpAccordingly, by Proposition 3, RIP constant of these matrices satisfiesk (k 1)rpfor k < p/r + 1. A generalization of DeVore’s construction is given in [51],and includes evaluation of functions at the points on algebraic curves. InChapter 3, we will also introduce a binary deterministic construction whichhas the advantage of being a partial circulant matrix and hence, we can do afast matrix-vector multiplication or fast reconstruction using these matrices.1.2.4 Chirp sensing measurement matricesThis is a class of deterministic CS matrices, first introduced by Applebaumet al. [2]. There are different aspects for significance of these matrices. Oneaspect is that each entry (before normalization) is a root of unity with anexplicit formula and as we will see these matrices almost reach the Welchbound. Another aspect lies in the fact that there is a specific fast recon-struction algorithm for these matrices. The standard recovery algorithmsin CS are the basis pursuit (BP) which has the computational complexityof O(n3), and the matching pursuit which has the computational algorithmof O(kmn). However, Applebaum et al. [2] suggested a new algorithm forchirp sensing matrices that includes fast Fourier transform (FFT) and has acomputational complexity of O(mk2 log k). Another aspect is that Bourgainet al.’s deterministic construction [16], which is the only construction knownso far that breaks the square-root barrier for deterministic matrices is in facta submatrix of chirp sensing matrices. The definition and properties of chirpsensing matrices is related to Gauss sums as defined as follows.Definition 6. Let p be a prime number, and let ! be a primitive pth root ofunity. The quadratic Gauss sum mod p is defined asGp :=p1Xn=0!n2181.2. Deterministic constructions in compressed sensingThe exact value of quadratic Gauss sums which were originally computed byGauss, are given by the following formula [43].Gp =⇢p1/2 if p ⌘ 1 mod 4 ;ip1/2 if p ⌘ 3 mod 4 ; (1.17)where i =p1. Applebaum et al. [2] used this fact to construct the classof chirp sensing matrices, of the size p ⇥ p2. To define these matrices, let0 r p 1 and 0 m p 1 be two arbitrary integers in Zp. Define thecolumns of as follows(r,m) = rp+m+1 =1pp26666664!r·02+m·0!r·12+m·1...!r·(p1)2+m·(p1)37777775 (1.18)Here, j denotes the jth column of . As r and m range between 0 andp 1 they generate p2 columns of . By this definition the (`, j)th entry ofthe matrix is defined as `,j = !r`2+m` with rp+m+ 1 = j. The p⇥ p2matrices that are constructed this way are called chirp sensing matrices.Next, in order to compute the coherence of these matrices, one needs ageneralization of the Gauss sums (1.17) in the following sense.Proposition 4. Let p be a prime number, ! be the primitive pth root ofunity, and r1, r2 2 Zp be two distinct numbers. Then,p1Xx=0!(r1r2)x2=8<:⇣r1r2p⌘p1/2 if p ⌘ 1 mod 4 ;⇣r1r2p⌘ip1/2 if p ⌘ 3 mod 4 ;(1.19)Here,⇣ap⌘denotes the Legendre symbol which is equal to 1 if r1 r2 is aquadratic residue, and is equal to -1 if r1 r2 is a quadratic non-residue.Proof. Case (I) : r1r2 is a quadratic residue. In this case, the sum in (1.19)is the sum of all quadratic residues whose value was given in (1.17).Case (II) : r1 r2 is a quadratic non-residue. Note that if a and b are aquadratic residue and quadratic non-residue respectively, thenp1Xx=0!ax2+p1Xx=0!bx2= 2⇣1 + ! + !2 + ...!p1⌘= 0191.2. Deterministic constructions in compressed sensingHere, we used the fact that for any 0 6= j 2 Zp, this number is either aquadratic residue in which case, !j will be appeared twice in the first sumabove, or is a quadratic non-residue in which case it will be appeared twicein the second sum above. Therefore, if r1 r2 is a quadratic non-residue,thenp1Xx=0!(r1r2)x2= p1Xx=0!x2as desired.Now, the coherence of these matrices can be computed easily since as thefollowing Proposition suggests, there is only two possibility for the magnitudeof inner product of two distinct columns of these matrices.Proposition 5. Suppose is a p ⇥ p2 chirp sensing matrix. If j and kare two distinct columns of , then hj ,ki = 0 or |hj ,ki| = 1pp . Inparticular, µ = 1pp .Proof. The norm of inner product of j and k can be written as :|hj ,ki| = 1p p1X`=0!(r1r2)`2+(m1m2)`where r1, m1 correspond to the jth column and r2, m2 correspond to thekth column respectively. If r1 = r2 and m1 6= m2, then hj ,ki = 0 sincefor any prime p, the sum of all pth roots of unity is zero. If r1 6= r2, then|hj ,ki| = 1p p1X`=0!(r1r2)`2+(m1m2)`=1p!(r1r2)↵2 p1X`=0!(r1r2)(`+↵)2=1p p1X`=0!(r1r2)(`+↵)2 = 1p p1X`=0!(r1r2)`2 = 1ppwhere ↵ 2 Zp is the solution to equation 2(r1r2)x = m1m2 mod p, andwhere we used Proposition 4. Hence, for a chirp sensing matrix,201.2. Deterministic constructions in compressed sensingµ = max1j 6=kn|hj ,ki| = 1ppNote that the Welch bound for chirp sensing matrices is given byrnmm(n 1) =sp2 pp(p2 1)which is almost same as 1pp for large enough p, and is same as coherence ofthese matrices.A generalization of chirp sensing codes was given by Nelson et al. [60].They constructed a class of p ⇥ pr CS matrices, where r is an integer withr 2. Their construction is based on Weil Theorem which as stated asfollows is a generalization for quadratic Gauss sum formula (1.17).Theorem 11. Let r 2 be an integer and let p > r be a prime number. Leta := (a1, a2, ..., ar), where each aj is an integer in Zp, and letF (a, u) := arur + ...+ a1uThen for a 6= (0, ..., 0), we have|S(a)| (r 1)ppwhere S(a) =Ppu=1 e2⇡iF (a,u)p .Using this theorem, Nelson et al. defined a class of p⇥pr matrices (similar towhat we saw for defining the p⇥p2 chirp sensing matrices), and the coherenceof matrices in their construction satisfies µ r1pp .The last construction we review in this chapter, is the deterministic con-struction given by Bourgain et al. [16] which is a class of submatrices of chirpsensing matrices and breaks the square-root barrier although by a small im-provement, i.e., m1/2+✏ for a very small number ✏.1.2.5 The construction of Bourgain et al.In [15, 16], Bourgain et al. give an explicit construction of deterministicmatrices of the size p⇥ p1+✏ (for a small number ✏) that breaks the square-root barrier. Below, we give a brief overview of the construction given in211.2. Deterministic constructions in compressed sensing[16]. We also refer to [56] for a detailed discussion and review of [16] andimportant extensions.Fix a prime number p, and an even number m 100. Let ! = e2⇡i/p,and let Zp be the set of integers mod p as usual. Set↵ :=12m, :=12.01m, r := b log plog 2c, M := b22.01m1cand define the sets A and B as follows.A := {1, 2, ..., bp↵c}, B :=n rXj=1xj(2M)j1 : x1, ..., xr 2 {0, 1, ...,M1}oNext, define the p ⇥ n measurement matrix ˜ to be the submatrix of chirpsensing matrix as follows.ua,b =1pp(!ax2+bx)x2Zp (1.20)where a 2 A , and b 2 B. The number of columns of ˜, n, is clearly |A ||B|.Now, the number of elements of B can be counted in the following way.|B| ⇣M r ⇣ (2 11) log2 p = 2log2 p2log2 p = p.p = p1where we used Hardy notation a ⇣ b, i.e., a ⇣ b if there are positive constantsC1 and C2 such that C1b < a < C2b. Hence,n = |A ||B| ⇣ p↵p1 = p1+ 1402mIt turns out that the matrix as defined above breaks the square-root bar-rier. The proof is based on the following lemma.Lemma 2. [16] Let k 210 and s be a positive integer. Assume that thecoherence parameter of the matrix is µ 1/k. Also, assume that for some 0 and any disjoint J1, J2 ✓ {1, 2, ..., n} with |J1| k, |J2| k we haveDXj2J1uj ,Xj2J2ujE k.Then, = [u1 u2 · · · un] satisfies the RIP of order 2sk with constant44sp log k.221.2. Deterministic constructions in compressed sensingNext, it is shown in [16] that for sufficiently large enough p, the matrix ˜as defined in (1.20) satisfies the condition of Lemma 2 with k = bppc, and = O(p✏1(log p)2). Here, ✏1 is a constant defined as✏1 :=c08 47↵232m1 + 93/m+ c0/2where c0 is a constant that originates from additive combinatorics and isestimated as c0 = 110430 [17] (though this estimate is not necessarily sharp), is a constant that can be set to := 141.4m . This reduces the expressionabove for ✏1 to the following expression.✏1 =c0m 3799.6m2331.2(1 + 93/m+ c0/2)(1.21)Therefore, if we set s = 2bp✏0c , with ✏0 < ✏1/2, then Lemma 2 implies that ˜satisfies RIP with order p1/2+✏0 , and constantO(p✏1/2+✏0(log p)2). Hence,The square-root barrier would be broken for the CS matrix ˜ as definedin (1.20). Note that in [16], the value of m is set to be m = 2b3799.6c0 c.Using the approximation for the value of c0 mentioned above, we obtainm = 79, 259, 656, which yields ✏1 ⇡ 1.8261⇥ 1015.Discussion about the construction of Bourgain et al.The construction by Bourgain et al. constitutes a landmark (and the only onethus far) in the effort for breaking the so-called square-root barrier. One ofour main goals in this thesis is to propose two novel approaches that improvethe Gershgorin bound by a multiplicative or additive constant, respectively,in the case of a specific construction. We will also propose a conjecture re-garding the distribution of quadratic residues in Zp and we will show thatthis conjecture combined with one of the two approaches mentioned aboveleads to breaking the square-root barrier with much milder conditions onparameters compared to the construction given in [16]. Specifically, thereare three aspects that we should consider when we talk about breaking thesquare-root barrier. The first (and probably the most important one) is howclose to the unity the power ↵ in k = O(m↵) can be chosen. In the con-struction given in [16], ↵ = 1/2 + ✏0, with ✏0 < ✏1/2 < 1015 in the instanceconsidered above. If our proposed conjecture holds, then our result suggestsk = O(m5/7). Another aspect that potentially can be improved is the boundon the minimum number of measurements p. In the construction given in[16], the RIP constant is O(p✏1/2+✏0(log p)2). Recall that the minimum231.3. Quantization in compressed sensingconstant that guarantees `1 recovery through BP or BPDN is 2k < 1/p2.Note that in order to havep✏1/2(log p)2 ⇡ p9.1304⇥1016(log p)2 < 1/p2,we must have p > 101016 = 1010000000000000000, since one can verify thatp = 101016 is not large enough to satisfy the inequality above. It can beseen that a similar bound on the minimum number of measurements musthold in order for the aspect ratio of the measurement matrices (n/p) to bereasonably large, e.g., n/p = 2, and also in order for the bound on theRIP constant given in [16] to be smaller (tighter) compared to the classicalGershgorin bound k k1pp . Therefore, our goal in Chapter 4 is to proposean approach that can lead to improved version of breaking the so calledsquare-root barrier (assuming our conjecture holds).1.3 Quantization in compressed sensingQuantization of compressive samples. As summarized in Section 1, us-ing compressed sensing one can reconstruct a given sparse signal using fewmeasurements. In fact, we saw that for a given ambient dimension n, andsparsity level k, one can construct m⇥n measurement matrices that guar-antee that (1.3) holds ifm Ck log (n/k) for some constant C. For example,certain sub-Gaussian matrices satisfy this condition with high probability, asmentioned in Theorem 6. So, effectively compressed sensing transforms thesignal in high-dimensional space to a space of lower dimension. On the otherhand, one of the original motivations of compressed sensing has been com-pression. In the context of signal acquisition, we must not only “sample" (ormeasure) the signal in such a way that we can accurately reconstruct it later,but we must also quantize the measurements so that we may store/transmitthem using digital devices. Quantization makes the range of a signal discrete,so that the quantized signal takes on only a discrete, usually finite, set ofvalues. Unlike sampling (where we saw that under suitable conditions exactreconstruction is possible), quantization is generally irreversible and resultsin loss of information. It therefore introduces a distortion into the quantizedsignal that can not be eliminated. A quantizer is a device or a function thatperforms quantization. The round-off error introduced by quantization iscalled quantization error.Main Problem: Given a vector y, which, in our context will be thecompressed measurements of a sparse or compressible signal, we want to241.3. Quantization in compressed sensingreplace it by a vector whose elements are chosen from a discrete set A, calledthe quantization alphabet. We consider the general form of A = Z. Forexample for = 12 our alphabet will be, A = 12Z = {....,1,12 , 0, 12 , 1, ....}1.3.1 Memoryless scalar quantizationOne of the main goals in quantization is to obtain an approximation ofthe original signal as accurately as possible given a fixed alphabet A. It isalso important to have a “progressive" approximation scheme in the sensethat once more measurements are collected, i.e., the larger m is, the betterresulting approximation is. Consider a signal x 2 Rn and a measurementvector y = x 2 Rn. One of the easiest approaches to quantization is amethod called memoryless scalar quantization (MSQ). In this method, wereplace each measurement y(i), i 2 {1, 2, ...,m}, by its closest neighbourqMSQ(i) in A = Z. By the definition of our alphabet we have |y(i) qMSQ(i)| < 2 and therefore,ky qMSQk2 < 2pm (1.22)Next, we observe that we can interpret the error due to quantization as“noise", i.e., in (1.4) we set ! = y qMSQ. In other words,qMSQ = x+ !where k!k2 = ky qMSQk2 pm2 . Using Theorem 5, it can be concludedthat the solution xˆ toxˆ = argmin kzk1 subject to kqMSQ zk2 pm2satisfieskx xˆk2 C1pmTherefore, as we increase the number of measurements m, the upper boundon the error in reconstruction increases, which is contrary to what we expect.Note here, that as a convention, we have worked so far with matrices withexpected unit-norm columns. This is to ensure that the matrices satisfy theRIP condition. Since the entries of m⇥n measurement matrices are randomvariables with unit variance, we need a normalization factor 1/pm so thatthe columns have unit-norm (in expectation).251.3. Quantization in compressed sensingIn the context of quantization, if we want to work with the same nor-malization convention, we have to decrease the quantizer step size as weincrease the number of measurements m. For example, consider a sig-nal x and the measurement vector y = 1pm0x where 0 is an unnor-malized measurement matrix. Suppose that when m = 1, y(1) = 2.3478and the quantization alphabet size is = 0.5. Therefore, we will haveq(1) = 2.5. Now suppose that we increase the number of measurementsto m0 = 10000. Assuming the same values for entries of 0 as above, wewill have y(1) = 0.023478. Therefore, we must choose the new quantizationalphabet size as 0 = /pm = /100 = 0.005 to map this entry to 0.025and be able to keep the scaled original information about the value of y(1),otherwise all entries of the measurement vector including y(1) will map to 0for sufficiently large m and we will lose all the information.Therefore, if we want to keep the quantizer step size constant as we in-crease m, we need to change the normalization convention for the measure-ment matrices and work with unnormalized measurement matrices whoseentries are random variables with the unit variance (or unit-magnitude inthe case of deterministic matrices).Using this convention, we still haveky qk2 = k0x qk2 pm (1.23)(where 0 is the unnormalized measurement matrix), but in order to be ableto use Theorem 5 we need to divide both sides of (1.23) by the factor ofpmto obtaink 1pm0x qpmk2 and we conclude that the solution xˆ to the `1 minimization problemxˆ = argmin kzk1 subject to k 1pm0z qpmk2 satisfies kx xˆk2 C1.Therefore, we observe that if we use MSQ, as we increase the numberof measurements m, even after resolving the normalization issue, the recon-struction error does not decrease as we expect. Hence, as we want to use aquantization scheme that has this property, we introduce another method ofquantization called “Sigma-Delta quantization". In the following, we give abrief summary about this method of quantization in CS.261.3. Quantization in compressed sensing1.3.2 Sigma-Delta (⌃) quantization⌃ Quantization is a quantization scheme used predominantly in quantizingovercomplete bandlimited functions and redundant frame expansions [9, 10,12, 40, 42, 48, 49, 73, 79, 80]. The 1st order ⌃ Quantization scheme isdefined as follows.Let u0 = 0, and define uj and qj for j 1 recursively as follows.qj = argminp2A|uj1 + yj p|uj = uj1 + yj qj ,(1.24)When the input y = (yj) is a finite sequence, i.e. y 2 Rm, we can use matrixnotation to abbreviate (1.24). In this case, we can represent (1.24) asy q = Du (1.25)where D = [Dij ]m⇥m is the difference matrix whose entries satisfyDij =8<: 1 if i = j;1 if i = j + 1;0 otherwise.1.3.3 Approximating signals using ⌃ quantizationAs we saw before, if we perform MSQ as the method of quantization, and ifwe assume that the entries of our measurement matrices have the unit norm(if the matrix is deterministic) or unit-norm in expectation (if the matrix israndom), the reconstruction vector xˆ satisfieskx xˆk2 C1for some constant C1. Consequently, the error bound does not improve aswe increase m. In this section, we compute the error in approximating theoriginal signal using ⌃ quantization with the same normalization conditionas above.We know that yq = xq = Du. As before, let x be a k-sparse vectorand assume that we have recovered support of x, i.e., we know T = supp(x).Also, let T be the restriction of to its columns indexed by the set T . Wecan view T as a frame matrix. The concept of “frames" is a generalizationof the concept of “basis". Specifically, a set of vectors {fn} in a Hilbert spaceH is called a frame if there exists positive constants A and B called framebounds satisfyingAkxk2 X|hx, fii|2 Bkxk2271.3. Quantization in compressed sensingfor every x 2 H, and here, kxk denotes the norm induced by the innerproduct of the Hilbert space. In finite dimensional case, where H = Rk,if we have a set of vectors {fn} with n = 1, 2, ...,m, the frame matrix Eis obtained by considering the transpose of these vectors as the rows of thismatrix. It can be seen that the set {fn} is a frame if and only if the resultingframe matrix is full rank. Any left inverse of the frame matrix E is calleda dual of E, and the Moore-Penrose pseudo-inverse is called the canonicaldual.In the context of CS, we have y = x = TxT , where T 2 Rm⇥kwhich is a tall and full rank matrix, and can be viewed as a frame matrix.Thus, it admits a dual matrix; let F be a dual (left inverse) of T . Sincey = x = TxT , multiplying both sides of (1.25) from left by F impliesxT Fq = FDuNow define xˆ0⌃ = Fq . Then the error in approximation using ⌃ quanti-zation will be bounded bykxT xˆ0⌃k kFDuk2 kFDk2kuk2On the other hand, it is shown in [47] that there exists a ⌃ scheme suchthat for every quantization alphabet A = Z and for every input signal y,kuk1 C for a universal constant C not depending on m (see Proposition6). Using this inequality we can conclude kx xˆ⌃k2 CkFDk2pm. Thismotivates minimizing kFDk2 in order to minimize the upper bound on theerror.The frame F that minimizes kFDk2 subject to FT = I is called the(first order) Sobolev dual frame of T and is given by F = (D1T )†D1[11], where † denotes the Moore-Penrose inverse of a matrix. The Moore-Penrose inverse of a tall matrix A is given by A† = (A⇤A)1A⇤. Knowingthat the operator norm of a matrix is its maximum singular value, for thischoice of F we can writekFDk2 = k(D1T )†k2 = max[(D1T )†] = [min(D1T )]1Therefore, the error in approximation is bounded bykxT xˆ0⌃k2 Cpmmin(D1T )(1.26)On the other hand, kx xˆ⌃k2 = kxT xˆ0⌃k2 where xˆ⌃ 2 Rn is the vectorobtained from xˆ0⌃ by padding zeros in the components that are not indexed281.3. Quantization in compressed sensingby T = supp(x). Therefore,kx xˆ⌃k2 Cpmmin(D1T ). (1.27)Approximation via Two-Stage Reconstruction ⌃ Quantization:Given the quantized measurement vector q 2 Rm, approximation of x using⌃ quantization includes two major steps. In the first step, the supportof x is recovered using `1 minimization problem. In the second stage, ap-proximation to vector x is calculated using Sobolev-dual of a submatrix of determined by the support recovered in stage 1.Stage 1: Support Recovery: Let x be a given k-sparse signal, be an m ⇥ n sub-Gaussian unnormalized measurement matrix. We set aconstant smaller than 1/p2, say, 2k = 0.5 in Theorem 6. We also assumethat m 16Ck log(en/(2k)), where C is the constant given by this Theorem6. Let q be the corresponding quantized measurement vector. In order toestimate the support of x, consider the following `1 minimization problemx0 := argminkzk1 subject to kz qk2 ✏ := pm (1.28)By Theorem 5, with probability at least 1 2 exp( m8C ), we havekx x0k2 C1 12 := ⌘where we used the fact that for the constant C1 whose expression is given inRemark 1, and for 2k that satisfies 2k 0.5, we have C1 12. Therefore,if the smallest entry of x is greater than 2⌘ in magnitude, then with theprobability mentioned above, picking the positions of k largest entries of x0,recovers the support of x perfectly.Stage 2: Signal Approximation: The support of x is recovered inStage 1. Now, let T = supp(x), and E = T . If we put F 0 = (D1E)†D1,and xˆ0⌃ = F 0q, then by (1.27) we havekx xˆ⌃k2 = kxT xˆ0⌃k2 Cpmmin(D1E)Hence, to bound the error in approximation, it remains to find lower boundfor min(D1E). To that end, Gunturk et al. [40] proved the followingtheorem.Theorem 12. Let E be an m ⇥ k random matrix whose entries are i.i.dN (0, 1). Given ↵ 2 (0, 1), there exist constants c1, c2, and c3 such that ifm k(c1 logm)11↵ , then with probability at least 1 exp(c2 k↵m1↵ ),min(D1E) c3m↵+12k↵2.291.3. Quantization in compressed sensingTherefore, if is a Gaussian matrix with entries N (0, 1) and with the min-imum number of measurements mentioned above, then for a fixed signal xand for any choice of T , with high probability min(D1E) c00m↵+12 , fora constant c00. Therefore, the bound on the error in approximation can bewritten as follows.kxT xˆ0⌃k2 Cpmc00m↵+12=Cc001m↵2with ↵ 2 (0, 1). Comparing the above error with the error of approximationusing MSQ which as we saw, could be written in the formkx xˆMSQk2 C1we see that this time the error of approximation decreases as the numberof measurements increases. Accordingly, we expect a better performance inthe sense of having smaller error in approximation using ⌃ quantizationinstead of MSQ, especially when the number of measurements is large.1.3.4 Higher order ⌃ quantizationAbove, we summarized the first order ⌃ quantization scheme. One canalso define higher-order ⌃ quantizers. For example, a second-order ⌃quantizer with alphabet A is defined as follows.Given a vector y = (y1, ..., yn), the quantization vector q is defined byrunning the iterationqi = QA(⇢(ui1, ui2, yi))ui = yi qi + 2ui1 ui2(1.29)where i 1, and u0 = u1 = 0. Above, QA : R! R is the scalar quantizerwith alphabet A as defined in (1.24), and ⇢ : R3 ! R is a specified quanti-zation rule. Note that if we set vi := ui ui1 (for i 0), then (1.29) canbe rewritten in the following form.qi = QA(⇢(ui1, vi1, yi))vi = vi1 + yi qiui = ui1 + vi(1.30)with u0 = v0 = 0.In (1.29), we can set ⇢(ui1, ui2, yi) := 2ui1ui2+ yi (see (1.34) below),or alternatively, we can set ⇢(ui1, vi1, yi) := ui1 + vi1 + yi in (1.30).301.3. Quantization in compressed sensingWe also set QA(x) = b10xe10 where b...e is the nearest integer function andQA(x) rounds a number to its first decimal place. Thus, the second order⌃ quantization can be written in the following form.q1 =b10y1e10v1 = y1 q1u1 = v1(1.31)And for i 2 :qi =b10(ui1 + vi1 + yi)e10vi = vi1 + yi qiui = ui1 + vi(1.32)Using matrix notation, we observe that y, q, u, and v in (1.29) satisfyy q = Dv , v = Du) y q = D2u.Similar to what we observed for defining the second order ⌃ quantizationin (1.29), the rth order ⌃ quantization can be defined as follows.qi = QA(⇢(yi, ui1, ui2, ..., uir))ui = yi qi rXj=1✓nj◆(1)iuij . (1.33)Above ⇢ : Rr+1 7! R is an appropriate quantization rule, and can be consid-ered as [72]⇢(yi, ui1, ui2, ..., uir) :=rXj=1(1)j1✓rj◆uij + yi. (1.34)Also, QA is the scalar quantizer with alphabet A as defined in (1.24) withthe initial condition u0 = u1 = .... = ur+1 = 0. In the case of an rth-order ⌃ quantizer, the relationship between y, q, and u can be expressedusing matrix notation as y q = Dru. It is shown in [48] that the ⌃quantization scheme as given in (1.33), is stable in the sense that the socalled state variable u remains bounded in infinity norm (uniformly in m).In other words, the following proposition has been proved.311.3. Quantization in compressed sensingProposition 6. [48] Set the quantization alphabet to be A = Z, then foreach r 2 N, there exists a constant Cr such that the ⌃ scheme given in(1.33) is stable for all input signals y with kuk1 Cr. Furthermore, forthe quantization rule given in (1.34), one can assume Cr = 1/2.Replacing D with Dr in our equations for first-order ⌃ quantization, wesee that in the case of rth order ⌃ quantization, the reconstruction vectoris given by xˆ⌃ = Fq = (DrE)†Drq , and the bound in error of approx-imation is proportional to 1min(DrE) . In particular, the following theoremholds for estimating the error in reconstruction of signals using rth order ⌃quantization.Theorem 13. [40] Let E be an m⇥k random matrix whose entries are i.i.d.N (0, 1). Given r 2 N and ↵ 2 (0, 1), there exist constants c = c(r) > 0, andc0 = c0(r) > 0 such that if we set := mk , and if (c logm)1/(1↵), thenwith probability at least 1 exp(c0m↵), we havekx xˆ⌃k2 . ↵(r1/2)Therefore, if we fix the value of k, then the error in reconstruction shoulddecay like f(m) = 1pmin the case of first order ⌃ quantization, and itshould decay like g(m) = 1mpmin the case of second order ⌃ quantization.Figure 1.1 confirms this fact. In this experiment, the sparsity level k = 10 isbeing fixed, and for each m, 200 experiments is performed. In each experi-ment, a 10-sparse signal x 2 R200 with a random support T ✓ {1, 2, ..., 200},and standard normal entries is chosen, and an m ⇥ 200 Gaussian matrix isconsidered as the measurement matrix. Then, x is quantized using first or-der and second order ⌃ quantization schemes (with the step size = 0.1).Let q(r) denote the rth order quantized vector. Then the reconstruction vec-tors are computed using via xˆ := F 0q(r) = (DrT )†Dr, and the error inreconstruction, i.e., kx xˆk2 is computed. The average errors in these 200experiments are plotted versus number of measurements m in the log-logscale.Unfortunately, there are a number of caveats for the two-stage recon-struction for ⌃ quantization. First, in order to recover the support of asignal x, apart from a universal constant, the quantizer step must be assmall (in magnitude) as the lower bound on the smallest non-zero entries ofthe signal x (see Theorem B in [40]). Furthermore, two-stage reconstructionfor ⌃ quantization is not stable and robust (unlike the usual situation inCS). For these reasons, another reconstruction method, this time one stage,321.3. Quantization in compressed sensing50 100 150Number of measurements m10-310-2Mean error in approximationFigure 1.1: Error in approximation using first order and second order ⌃quantization versus number of measurements m. The graphs are plotted inlog-log scale and they are compared with the upper bounds on the error,namely, f(m) = C/pm, and g(m) = D/pm3 (the graphs of f(m) = 1pmand g(m) = 1mpmare multiplied with proper constants to shift them closeto the numerical graph) respectively.331.3. Quantization in compressed sensingwas later introduced in [72], which we review next.1.3.5 One-stage recovery for ⌃ quantizationAs an alternative to the two-stage reconstruction method of [40] and [48],[72] introduced a one-stage reconstruction scheme. Specifically, if q is the⌃ quantization vector obtained from noisy measurements x + ⌘ withk⌘k1 ✏, then an approximation x˜ to the signal x is recovered by solving(xˆ, ⌫ˆ) := argmin(z,⌫)kzk1 subject to kDr(z + ⌫ q)k2 Crpmand k⌫k2 ✏pm(1.35)where D denotes the difference matrix as above, r denotes the order of the⌃ scheme, and Cr is the constant obtained from Proposition 6.Theorem 14. (Theorem 1 in [72]) Let k, `, m, N be integers satisfyingm ` ck log(n/k) and let be an m ⇥ n sub-Gaussian matrix. Let q bethe ⌃ quantization of x+! (with k!k1 ✏). Then with high probabilityon the draw of , for every signal x 2 Rn, the solution xˆ of (2.10) satisfieskxˆ xk2 C⇣(m`)r+1/2 +k(x)pk+rm`✏⌘(1.36)where c, C are constants that do not depend on m, `, n.Note that the parameter ` can be chosen as any number in the interval[ck log(n/k),m] and various choices of ` have different implications. Forthe case where there is no noise, ` would be chosen as small as possible tominimize the error in approximation. Also, note that in practice we aregiven a k-sparse signal x, and we know the sparsity level k. Based on thisk, we have to choose at least mmin = O(k log(n/k)) measurements in orderto obtain an RIP matrix with small enough constant (with high probability)and hence, use the standard CS results. In the context of quantization, wecan choose the parameter ` = O(k log(n/k)) (similar to mmin). Now, if weuse m = mmin measurements, we can not obtain a guarantee for the error ofsignal reconstruction being small enough, but as we increase the number ofmeasurements (while we keep ` fixed), the error becomes smaller and smaller(decaying like mr+1/2 if there is no noise). If there is noise, or if the signalis not k-sparse, the bound on the error can still be found according to (1.36).341.3. Quantization in compressed sensingUsing one-stage reconstruction method has important advantages. First,unlike the two-stage reconstruction method, this method is robust with re-spect to noise. This is a significant advantage of this method since in practicethere is always noise present. Second, as we saw before, two-stage reconstruc-tion method requires the minimum entry of x to be at least 24. However,the one-stage reconstruction method does not have any restrictions of thissort. For these reasons, this method is a preferred method to use for re-construction. Another advantage of using one-stage reconstruction methodis that by choosing an appropriate value for ` in (1.36) and assuming that = m/k is constant, it can be shown [72] that the error in reconstructiondecays root-exponentially in the case with no noise involved.Further encoding and exponential accuracyAs mentioned above if one-stage quantization is used, in the absence of noisea root-exponential accuracy can be achieved. However, it was shown in [71]that the exponential error decay can be achieved if the quantization stage isfollowed by an encoding stage. We explain the idea as follows.We start with a compressible signal x 2 Rn. Then, we measure this signalusing a measurement matrix , and we obtain the measurement vector y =x. Next, using a quantization alphabet A, we quantize the measurementsusing a ⌃ quantization scheme Q : Rm ! Am, obtaining the quantizedvector q = Q(y) = Q(x) 2 Am. In the following step, we encode thequantized vector using an encoding map E : Am ! C, where C is a finite set,called, the codebook satisfying log2 |C| ⌧ log2 |Am|. This way, to representthe quantized measurements we will need much less number of bits. Weshould however, encode in such a way that we can still obtain an accuratereconstruction using an appropriate reconstruction algorithm. In fact, theencoding map is defined as follows.E : q ! BDrqwhere D is, as usual, the difference matrix, B is an `⇥m matrix with i.i.d.Bernoulli random entries, andm ` ck log(n/k)for an appropriate constant c. The algorithms to find the reconstructionvector xˆ are given in (20) and (21) in [71] for the cases where the noise isabsent or present respectively. In particular, the reconstruction algorithmfor the noise-free case is as follows.xˆ := argminzkzk1, subject to kBDr(z q)k2 3Cm351.4. Organization of the thesiswhere C is a constant that depends on the ⌃ quantizer.Next, let : C ! Rn denote the reconstruction algorithm. Also, theworst case error of reconstruction, the distortion D, is defined viaD := sup k⇣E(Q(x))⌘ xk2where the supremum is over all signals under consideration, which is eitherthe set of all k-sparse signals or the set of all compressible signals. Further-more, the bit rate R is defined asR := log2 |C|Using this notation, it was proved in [71] that the exponential accuracy canbe achieved in the following sense.Theorem 15. [71] Let k0 = b `C0 lognc, and assume that x is a k-sparse signalwith k k0. Then,D . 2C2 Rk0 logn1.4 Organization of the thesisIn this thesis, we focus on three important problems related to compressedsensing with explicit, deterministic constructions.In Chapter 2, we investigate how to extend existing results on ⌃ quan-tization to the cases when compressed measurements are obtained usingmatrices that do not belong to sub-Gaussian ensemble.In Chapter 3, we present a novel construction that results in a binary,partially circulant measrement matrix that is easy to store and fast to apply.In Chapter 4, we develop techniques that can be used to improve boundson the RIP constants obtained by using the Gershgorin circle Theorem. Weend by making a conjecture, which, we prove, will lead to breaking the squareroot barrier if true.36Chapter 2One-stage recovery for⌃-quantized compressedsensingThe quantization problem is an important and difficult problem in the con-text of CS. CS is a signal acquisition paradigm that performs dimensionreduction when the acquired signals live in a high-dimensional ambient spacebut have low-dimensional structure. Specifically, the focus is (approximately)sparse signals in Rn where n is large. Consequently, the “compressed" rep-resentations that result are still real valued vectors, this time in Rm, withm⌧ n.While quantization was mostly omitted in the early CS literature, therehas been several recent papers that address this problem. The approaches inthe literature focus mostly on either “memoryless scalar quantizers" (MSQ)or “noise-shaping quantizers".MSQ for CS: Suppose that the entries of y = x+⌘ 2 Rm are (possiblynoisy) measurements of an (approximately) sparse x 2 Rn. An MSQ withalphabet A rounds off each entry of y (independently) to the closest elementof A. [13, 24, 65].A special case of MSQ is the 1-bit quantizers, where each measurementis replaced by its sign [14, 44, 63, 64]. While such quantizers typically loseinformation regarding the magnitude of x, they are still interesting in certainapplications due to their simplicity.As discussed in Section 1.3, a direct interpretation of the quantizationerror as additive noise leads to an estimate x˜ (via, for example Basis PursuitDenoise [22, 30]), where kx x˜k is proportional to the quantizer step size ,and does not improve by increasing the number of measurements m. On theother hand, it was observed empirically in [40] that in a two-stage recoverymethod where the Penrose-Moore pseudo-inverse is used in the second stage(after support recovery), the error kx x˜k is O( 1pm).Motivated by this, [54] shows that kx x˜k is bounded by the sum of a37Chapter 2. One-stage recovery for ⌃-quantized compressed sensingterm that does not depend on m, but unobservably small in any realisticsetting, and a term that is indeed O( 1pm), at least for a wide class of sub-Gaussian matrices with high probability. Similarly, it was also shown inthe 1-bit CS context, that the approximation error decays as m increases.Specifically, it was shown in [64] that for a fixed level of sparsity, the errorin approximation using a specific convex minimization program decays asO( 1m1/5) up to a logarithmic factor.While these improved results show some decay as a function of m, thisdecay is mild, suggesting that MSQ does not utilize extra measurementsefficiently. This leads us to noise-shaping quantizers.Noise-shaping quantizers for CS: Noise-shaping quantizers were orig-inally introduced in the context of analogue-to-digital (A/D) conversion ofbandlimited signals [42]. These A/D convertors, called ⌃ quantizers be-come popular in A/D conversion technology [73] as they can be implementedusing low-accuracy circuit elements and still produce high-accuracy approxi-mations by oversampling the bandlimited signals at a rate much higher thanthe Nyquist rate. For many classes of signals oversampling is much easier (toimplement on circuitry) compared to using high-accuracy circuit elements,for example a scalar quantizers Q with very small .Motivated by their efficiency in exploiting redundancy, ⌃ quantizerswere considered in the context of frame expansions (which are inherentlyredundant). Indeed, they were shown to yield approximations that improveas the redundancy increases in the contexts of Gabor frames [79, 80], fi-nite frames in Rd with certain regularity assumptions [9, 10, 12] , Gaussianrandom frames [40] and sub-Gaussian random frames [48, 49].As explained in Section 1.3, the frame theory setting is inherently con-nected to CS when the underlying signal is sparse. This observation moti-vated the “two-stage reconstruction method" in CS after the measurementsare quantized using an rth order ⌃ scheme. In a nutshell, suppose x 2 ⌃nk , 2 Rm⇥n be an appropriate CS measurement matrix, and y = x be thenoise free compressive measurements. Also, let q be obtained by quantizingy using an rth order ⌃ scheme and let D be the difference matrix as in(1.3.2). In order to perform two-stage ⌃ quantization, we have to firstrecover the support set T = supp(x). Then, the reconstruction vector xˆ isgiven by xˆ⌃ = Fq where F = (DrT )†Dr. Under such assumptions, theerror in approximation satisfieskx xˆ⌃k2 Cpmmin(DrT ).However, performing two-stage ⌃ quantization for sub-Gaussian matrices38Chapter 2. One-stage recovery for ⌃-quantized compressed sensinghave two main disadvantages :(1) The recovery of signals is not necessarily robust respect to additive noise(unlike the usual situation in CS).(2) The smallest entry of x must be at least 2⌘ := 24 in magnitude. There-fore, in general we have to work with a fine alphabet so that the smallestentry of the signal becomes greater than 2⌘. Hence, two-stage ⌃ quan-tization is not adaptable to coarse quantization.As discussed in Section 1.3, to solve these issues, and to recover theoriginal signal from quantized measurements simply in one stage and bysolving a convex minimization problem, the following algorithm is proposedin [72].(xˆ, ⌫ˆ) := argmin(z,⌫)kzk1 s.t. kDr(z + ⌫ q)k2 Crpmand k⌫k2 ✏pm(2.1)Here Cr is a constant that can depend on the order r and and in the specificcase of an rth order greedy ⌃ quantizer, Cr = 1/2 [72]. In this thesis wemake the latter assumption, i.e., we assume Cr = 1/2. As seen in Section1.3, if x is k-sparse, and if the value of ` is large enough, then for any mwhich is at least as large as `, the error in approximation satisfieskxˆ xk2 C⇣(m`)r+1/2 +rm`✏⌘(2.2)where c, C are constants that do not depend on m, `, n.Although this method solves the issues mentioned above for the two-stage ⌃ quantization, it can be applied only for the class of sub-Gaussianmatrices. However, there are other important class of measurement matri-ces that are used in CS such as random restrictions of Fourier matrices orbounded orthonormal systems in general, and deterministic measurementmatrices. Therefore, one important task is to propose a method to perform⌃ quantization using these type of matrices.The idea in this chapter is to abstract out the underlying property thatenables us to perform one-stage quantization for sub-Gaussian matrices andto make it work for other CS random matrices and for deterministic CS ma-trices. Specifically, we will propose two novel approaches by using a modifiedmeasurement matrix and by using a digital buffer given in Sections 2.1 and2.2 respectively.392.1. Approach 1: Modifying the measurement matrix2.1 Approach 1: Modifying the measurementmatrixIn [72], a wide class of sub-Gaussian matrices was considered and it wasshown that one-stage reconstruction ⌃ quantization can be performed us-ing (2.1). In order to generalize the results of [72] to other classes of randommatrices and also certain deterministic matrices, we next isolate one mainproperty, which we call (P1), that the measurement matrices must satisfy forsuch a generalization. Specifically, we define the property (P1) as follows.Property (P1). Suppose that is an m⇥ n unnormalized CS matrix,with (expected) column norm ofpm. We say that satisfies the prop-erty (P1) of order (k, `) if the RIP constant of 1p`()`– where ()` is therestriction of to its first ` rows– satisfies 2k < 1/9.Note that sub-Gaussian matrices, and random restrictions of BoundedOrthonormal Systems (including the DFT matrix) satisfy this property withhigh probability for appropriate choices of k and `.Next, let y = x+⌘, and k⌘k1 ✏. Set H := [CrDr ✏ I], i.e., an m⇥2mmatrix whose first m columns are obtained from CrDr and whose next mcolumns are obtained from ✏ I. Suppose that H = U⌃VT is the singularvalue decomposition of H. We shall prove that one-stage reconstructionfollowing ⌃ quantization can be performed if :(i) satisfies (P1), and(ii) the compressed measurements are obtained using U as our measure-ment matrix (as opposed to ).Remark 4. If U is a unitary matrix and is a CS measurement matrix,then 1pmU can also be used as a CS measurement matrix for the recovery ofsignals with the same level of sparsity. This is because multiplying a matrixwith a unitary matrix does not change its RIP constant.The following results are instrumental.Proposition 7. [35] Let f, g 2 Cn, and 2 Cm,n. Suppose that is RIPwith constant 2k < 1/9. Then for any 1 p 2, we havekf gkp C4k1/p1/2k(f g)k2 + C5k11/p(kfk1 kgk1 + 2k(g)1)where C4 and C5 are constants that only depend on 2k.402.1. Approach 1: Modifying the measurement matrixProposition 8. [48] The jth singular value of the matrix Dr satisfies1(3⇡r)r(mj)r j(Dr) (6r)r(mj)rNow, we are ready to show that if is a matrix that satisfies the property(P1), then certain unitary transform of can be used as the measurementmatrix to perform ⌃ quantization followed by a one-stage reconstructionscheme.Theorem 16. Suppose that is an m ⇥ n CS matrix, x 2 ⌃nk , and H =U⌃V T as above. Assume that k and ` are chosen so that satisfies (P1)of order (k, `). Suppose we measure the signal x using the matrix ˜ = U,yielding the measurements y = ˜x, and perform the ⌃ quantization to yto obtain the vector q. Then xˆ, obtained from (2.1), approximates x with theerror satisfyingkx xˆk2 2CrC4(3⇡r)r(m`)r+1/2 + 2C4rm`✏+2C5pkk(x) (2.3)where C4 and C5 are constants only depending on RIP constant of .The proof of this theorem when is a sub-Gaussian matrix is given in[72]. However, the only condition used in that proof is in fact property (P1)as defined above. For the sake of clarity and completeness, we next give theproof of this theorem.Proof of Theorem 16. Let xˆ and ⌫ˆ be the solutions to the minimization prob-lem (2.1). Also, let ˜ := U and w := Dr(˜xˆ ⌫ˆ q). Then˜xˆ q = Drw + ⌫ˆ = [CrDr, ✏I]24 1Crw✏ ⌫ˆ35 = Hpwhere p :=24 1Crw✏ ⌫ˆ35 is the vector whose first m entries are obtained from1Crw and whose next m rows are obtained from ✏ ⌫ˆ, and H = [CrDr, ✏ I] isas defined earlier.Next, let H = U⌃V T be the singular value decomposition of H (whereU is m⇥m, ⌃ is m⇥ 2m, and V is 2m⇥ 2m). The pseudo-inverse of H will412.1. Approach 1: Modifying the measurement matrixbe given by H† = V ⌃1UT where ⌃1 is the matrix whose (i, i)th entry (for1 i m) (⌃1)ii satisfies (⌃1)ii = 1⌃ii , so that⌃1⌃ =Im⇥m 0m⇥m0m⇥m 0m⇥mHence,H†H = VIm⇥m 0m⇥m0m⇥m 0m⇥mV T = [v1 v2 . . . vm 0 · · · 0]26664vT1vT2...vT2m37775where vi denotes the ith column of the matrix V . Accordingly, for any vectorx 2 R2m,H†Hx = hv1, xiv1 + . . .+ hvm, xivmThat is, H†H is the orthogonal projection operator into the subspaceW = span{v1, v2, ..., vm} and so kH†Hkop 1. Now,kH†(˜xˆ q)k2 = kH†Hpk2 kH†Hkopkpk2 kpk2 p2m (2.4)where we used the fact that kwk2 Crpm and k⌫ˆk2 ✏pm (by (2.1)).Next, we find a lower bound on the singular values of H†. Note that H =[CrDr✏ I]. Thus,HH⇤ = [CrDr,✏I]Cr(Dr)T✏ I= C2r (Dr)(Dr)T + (✏)2I.Let j(Dr) denote the jth largest singular value ofDr. Then, the eigenvaluesof HH⇤ are given by C2r2j (Dr) + (✏ )2. Using the fact that singular valuesof a matrix A are given as the square roots of eigenvalues of AA⇤, andalso using the connection between the singular values of a matrix A and itspseudo-inverse `(A†) = 1m`+1(A) (with ` = 1, 2, ...,m 1,m), we obtain :`(H†) =⇣C2r2m`+1(Dr) + (✏)2⌘1/2=⇣C2r12` (Dr)+ (✏)2⌘1/2 ⇣C2r (3⇡r`m )2r + ( ✏ )2⌘1/2. (2.5)422.1. Approach 1: Modifying the measurement matrixOn the other hand, we havekH†(˜x q)k2 = kH†(˜x+ ⌘ ⌘ q)k2 = kH†(y q ⌘)k2 = kH†(Dru ⌘)k2= kH†[CrDr ✏I]24 1Cr u✏ ⌘35k2 kHTHkopk24 1Cr u✏ ⌘35k2 p2mwhere we used kHTHkop 1 , y q = Dru with kuk2 Crpm (byProposition 6) and k⌘k2 ✏pm. Combining the inequality above with(2.4), we obtainkH†˜(x xˆ)k2 kH†(˜x q)k2 + kH†(˜xˆ q)k2 p2m+ p2m = 2p2m, (2.6)Therefore,2p2m kH†˜(x xˆ)k2 = kV ⌃1UTU(x xˆ)k2= k⌃1(x xˆ)k2 `(H†)k`(x xˆ)k2where `(H†) is the `th singular value of H† and ` is the matrix obtainedby considering the first ` rows of . Hence,k`(x xˆ)k2 2p2m`(H†).Now by hypothesis, k and ` are chosen so that 1p`` satisfies RIP with2k < 1/9. Thus, by using p = 2 in Proposition 7, we obtainkx xˆk2 C4p`k`(x xˆ)k2 + 2C5pkk(x)`1where we used the fact that kxˆk1 kxk1. Hence, for such ` and k,kx xˆk2 C4p`k`(x xˆ)k2 + 2C5pkk(x) 2C4pm`(H†)p`+2C5pkk(x) 2C4rm`⇣C2r (3⇡r`m)2r + (✏)2⌘1/2+2C5pkk(x) 2C4rm`⇣Cr(3⇡r`m)r +✏⌘+2C5pkk(x)= 2C4Cr(3⇡r)r(m`)r+12 + 2C4rm`✏+2C5pkk(x)432.1. Approach 1: Modifying the measurement matrixwhere we used the inequality (t2 + s2)1/2 t+ s for t, s > 0.2.1.1 Implications for bounded orthonormal systemsThe initial matrices used in CS were all non-structured random matrices suchas sub-Gaussian matrices. Using them came with at least two importantcaveats, namely, multiplying non-structured matrices with vectors is a longprocess and also storing them is costly and difficult. For these reasons, animportant class of random matrices in CS are considered choosing randomrows of Fourier matrices. Since these random matrices are structured, theysolve the issues mentioned above. Another reason for using these matrices isthat in some applications such as MRI [52] or tomographic imaging [19] thedevices are designed in a way that they measure the coefficients of signals inthe transform domain. Using these matrices was first suggested by Candèset al. [21] to recover sparse signals using few measurements. The number ofmeasurements was later improved by Rudelson et al. [69]. Specifically, it isshown in [69] that for a normalized n⇥ n discrete Fourier transform (DFT)matrix F (n) whose (k, j)th entry is given byF (n)k,j =1pne2⇡i(j1)(k1)n (2.7)If the number of measurements m satisfies m = O(k log4 n) , then the sub-matrix consisting of m rows of F (n) satisfies RIP condition with highprobability.A generalization of Fourier matrices is given in [34], and is defined asBounded Orthonormal Systems (BOS). Let D ✓ Rd be equipped with aprobability measure ⌫. A BOS is defined as follows.Definition 7. Let = [1,2, . . . ,n] be a system of complex-valued func-tions on D. We call a Bounded Orthonormal System (BOS) if(i) is an orthonormal system, that is, for all j, k 2 {1, 2, ..., n},Zj(t)k(t)d⌫(t) = j,k =⇢1 if j = k0 otherwise , and(ii) is uniformly bounded, that is, for all j 2 {1, 2, ..., n},kjk1 = supt2D|j(t)| Kfor some constant K.442.1. Approach 1: Modifying the measurement matrixIn the context of CS, we are mostly interested in discrete orthonormal sys-tems. In this case, D = {1, 2, ..., n} and the discrete uniform measure ⌫ isgiven by ⌫(B) = card(B)n for B ✓ {1, 2, ..., n}. Then, for a unitary matrixU = [Ut,k] = [u1, u2, ..., un] 2 Cn⇥n, the normalized vectors pnuk 2 Cn (fork = 1, 2, ..., n) form a BOS ifpn maxt,k2{1,2,...,n}|Ut,k| K (2.8)for some constant K.One important example of BOS is the DFT matrix defined in (2.7). Itis obvious that this matrix is unitary and it satisfies (2.8). Other importantexamples of BOS are discrete cosine transform (DCT) C(n) and discrete sinetransform (DST) S(n) defined byC(n)j,k =r2ncos⇣⇡n(j 12)(k 12)⌘, S(n)j,k =r2nsin⇣(2j 1)k⇡2n⌘(2.9)It is routine to check that C(n) and S(n) are unitary matrices and they satisfy(2.8) with K =p2. Another important example of BOS is the class ofHadamard matrices defined as follows. A Hadamard matrix H of order2n ⇥ 2n is the matrix whose (j, `)th entry is given byHj,` =12n/2(1)Pnk=1 jk`kwhere for any j 2 {1, 2, ..., 2n}, jk 2 {0, 1} is defined such thatj =Pnk=1 jk2k1 + 1. It can be shown that H is unitary and satisfies(2.8) with K = 1.If U is a discrete BOS, by choosing m random rows ofpnU , one canobtain the random matrix A =pnRTU where RT : Cn ! Cm is the randomoperator that samples m rows of U . After proper normalization, such matrixA satisfies RIP with high probability if the number of measurements is largeenough and thus it can be used as a CS measurement matrix.Theorem 17. [37] Let A 2 Cm⇥n be the random sampling matrix associatedwith a BOS with constant K 1. If for 2 (0, 1),m CK22k ln4(n)(for a universal constant C > 0), then with probability at least 1 n ln3 nthe restricted isometry constant k of 1pmA satisfies k .452.1. Approach 1: Modifying the measurement matrixCorollary 1. For a k-sparse signal x 2 Rn, we can use a Fourier matrixF (n), Discrete Cosine Transform matrix C(n), or Discrete Sine TransformS(n) and consider m0 to be the smallest value (obtained by Theorem 17) forwhich the corresponding measurement matrix satisfies RIP with 2k < 1/9with high probability. Next, set ` := m0, and choose m ` rows of F (n), C(n),S(n) randomly and denote them by F (m,n), C(m,n), and S(m,n) respectively.Then, measure x using UF (m,n) , UC(m,n), or US(m,n). Let xˆ be the solutionto (2.1) with replaced by one of the matrices mentioned here. Then, theerror in approximation using one-stage ⌃ quantization satisfies (2.3) as weincrease the number of measurements m.Remark 5. Here, we show that computing the signal with UF (m,n), UC(m,n),or US(m,n) is a fast process at least when r = 1. First, note that an explicitformula for entries of U is given in the case of r = 1 in [47] :Uk` =s2n+ 1/2cos⇣2(k 1/2)(n `+ 1/2)⇡2n+ 1⌘=s2n+ 1/2cos⇣(2k 1)(2n 2`+ 1)⇡/22n+ 1⌘=s2n+ 1/2cos⇣(2k 1)⇡/2 (2k 1)`⇡2n+ 1⌘=s2n+ 1/2(1)k+1 sin⇣(2k 1)`⇡2n+ 1⌘On the other hand, Discrete Sine Transform (DST) of type III is given by[62]S(n)k` =r2nsin⇣(2k 1)`⇡2n⌘Therefore, we can obtain entries of U using a submatrix of S(2n+1). Thereason is that we can write (k, 2`)th element of S(2n+1) asS(2n+1)k,2` =r22n+ 1sin⇣2(2k 1)`⇡4n+ 2⌘=r22n+ 1sin⇣(2k 1)`⇡2n+ 1⌘which is same as (k, `)th entry of matrix U in absolute value up to a constant.We will also use the expression above for the entries of U in order to to showthat evaluating Uy for a vector y is fast. See Remark 9.462.2. Approach 2: Using a digital bufferRemark 6. While the singular value decomposition of D can be computedexplicitly, to our knowledge an explicit formula for singular value decompo-sition of Dr with r 2 is not known. Note, however, that one can estimatethe singular values of Dr using Weyl’s inequalities [40, 47].Remark 7. Alternatively, one could apply U after collecting the measure-ments using F (m,n), C(m,n), or S(m,n). Of course, this would require that wekeep all m analogue measurements in memory, at least until we apply U stillin analogue domains which is not practically feasible in applications when mis large. We will propose a remedy in Section 2.2.2.1.2 Numerical experimentsIn order to verify the results given in Theorem 16, and in particular, given inCorollary 1, we perform a numerical experiment. In this experiment, we fixthe ambient dimension of signals to n = 200, the sparsity level to k = 5, andthe quantization step to = 0.1. We consider them⇥200 matrix UFm,200 assuggested by Corollary 1 withm 2 {20, 30, 40, 50, 60, 70} as the measurementmatrix. For each value of m, we consider 20 signals in R200, random supportT ✓ {1, 2, ..., 200}, and with non-zero entries chosen from normal Gaussiandistribution. For each of these signals, we find the measurement vector, andsubsequently quantize it using first or second order ⌃ quantization. Next,we find xˆ, the solution to (2.1), and we find the error in approximation. Wetake an average for the error for all 20 signals and move to the next value ofm. The results are plotted in Figure 2.1 in log-log scale. As we observe inthis Figure, the error bounds decays as predicted in (2.3).2.2 Approach 2: Using a digital bufferAside from the issues raised in Remark 7, the above approach is not ideal alsobecause the measurement matrix U (specifically U) depends on m. Thismeans that we must use a different measurement matrix if we wish to increasethe number of measurements m, i.e., we can not “reuse” the measurementsalready collected. This problem would be resolved if we could modify thescheme so that• We first collect y = x and quantize y;• We then use U (or any other matrix that admits a fast implementation)on the quantized measurements, which are now in the digital domain.472.2. Approach 2: Using a digital buffer20 30 40 50 60 700.050.10.150.20.25Figure 2.1: Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme and with a “modified"random partial Fourier matrix for 10-sparse signals and the comparison withthe graphs of f(m) = Cm1/2and g(m) = Dm3/2in log-log scale.Figure 2.2: Quantizing the signal x by first using MSQ with a very smallstep size 0, then applying U which is a fast transform followed by a ⌃quantization scheme.482.2. Approach 2: Using a digital bufferTo that end, we propose the following scheme.(1) Given a standard CS measurement matrix , we collect the compressedmeasurements y = x + ⌘, where ⌘, as before, denotes the noise suchthat k⌘k1 ✏.(2) We fix a small 0 (much smaller than the desired final accuracy) andquantize y using an MSQ with step size 0 resulting in yMSQ. This is ahigh bit-budget representation of y and will be discarded after the nextstages so, it is just kept in a buffer (with sufficiently large memory).(3) We compute UyMSQ, which finely approximates Uy = Ux as U is anisometry.(4) We use a ⌃ quantizer (of appropriate order r that matches the matrixU in step (3)) with step size to quantize UyMSQ. This will be thedigital representation of x that we will keep.Finally, we will reconstruct an approximation to x by means of convex opti-mization problem similar to (2.1) given by(xˆ, ⌫ˆ) := argmin(z,⌫)kzk1 subject to kDr(Uz + ⌫ q)k2 Crpmand k⌫k2 00pm(2.10)with 00 defined as 00 = ✏+ 0/2.Note that this method will be successful provided 0 in step (2) is suf-ficiently small to match the quantization error corresponding to the ⌃quantization of step (4). Thus, we will have to ensure that m mmax where0 will be chosen depending on mmax (or vice versa). Collecting all these, wehave the following theorem, which we will prove after stating few remarks.Theorem 18. Let x 2 Rn, be a CS measurement matrix, k and ` be suchthat satisfies (P1), suppose that q is obtained from x following the schemesuggested above where• U is tailored to a ⌃ quantizer of order r (as described in section 2.1).• 0 := (3⇡r)rmrmax492.2. Approach 2: Using a digital buffer• k⌘k1 ✏If xˆ is obtained via (2.10), the approximation error satisfieskx xˆk2 2C4(3⇡r)r(m`)r+1/2 + 2C4rm`✏+2C5pkk(x) (2.11)for ` m mmax. Here C4 and C5 depend only on the RIP constants of .Remark 8. Since in practice, the original measurements in CS are physicalquantities (such as currents), the MSQ step mentioned in Theorem 18 wasperformed in order to assign numbers to the measurements which enables usto store the measurements in the processor and multiply with U later.Remark 9. In step (3) above, we need to compute UyMSQ. Here, we showthat this computation can be done fast. To that end, we use the fact that forS(m) as defined in (2.9), computing S(m)y is fast for any positive integer m,and any vector y. Let y = (y1, y2, ..., ym), then(Uy)1 = U11y1 + U12y2 + ....+ U1mym(Uy)2 = U21y1 + U22y2 + ...+ U2mym...(Uy)m = Um1y1 + Um2y2...+ UmmymThus,(Uy)1 =s2m+ 1/2⇣sin(⇡2m+ 1)y1 + sin(2⇡2m+ 1)y2 + ....+ sin(m⇡2m+ 1)ym⌘(Uy)2 = s2m+ 1/2⇣sin(3⇡2m+ 1)y1 + sin(6⇡2m+ 1)y2 + ...+ sin(3m⇡2m+ 1)ym⌘...(Uy)m = (1)m+1s2m+ 1/2⇣sin((2m 1)⇡2m+ 1)y1 + ...+ sin(m(2m 1)⇡2m+ 1)ym⌘502.2. Approach 2: Using a digital bufferThus, we can write the above equations in the following form.(Uy)1 =p2S(2m+1)12 y1 +p2S(2m+1)14 y2 + ...+p2S(2m+1)1,2m ym(Uy)2 = p2S(2m+1)22 y1 p2S(2m+1)24 y2 ...p2S(2m+1)2,2m ym...(Uy)m = (1)m+1p2S(2m+1)m2 y1 + (1)m+1p2S(2m+1)m4 y2 + ...+ (1)m+1p2S(2m+1)m,2m ymTherefore, we have(Uy)j = (1)j+1p2(S(2m+1)y˜)jfor j = 1, 2, ...,m, and where y˜ 2 R2m+1 is a vector whose odd entries arezero, and whose (2k)th entry (k = 1, 2, ...,m) is defined as yk. Accordingly,computing Uy is a fast process.Proof of Theorem 18. Let x 2 Rn be the given signal, y = x + ⌘ be themeasurement vector (as usual), and y˜ := yMSQ be the vector obtained from ybe performing MSQ (with the step size 0 mentioned above). Then, we havey˜ = y + ⌘0 = x+ ⌘ + ⌘0 with k⌘k2 ✏pm and k⌘0k2 = ky y˜k2 02pm.Moreover, since we apply ⌃ quantization scheme on the vector Uy˜ (toobtain the quantized vector q), we can writeUy˜ q = Druwith kuk2 Crpm (by Proposition 6). Thus,Ux q = Dru+ µ = Hp0where µ = U(⌘ + ⌘0), H = [CrDr 00 I] and p0 =24 1Cr u00µ35. Note thatkµk2 k⌘k2 + k⌘0k2 (✏+ 02)pm = 00pmwhere 00 is defined as 00 = ✏+ 02 . Hence, kp0k22 2m+ 2m = 22m.Therefore, by defining w := Dr(xˆ ⌫ˆq) where xˆ and ⌫ˆ are the solutionsto minimization problem (2.10) , we havekH†U(x xˆ)k2 kH†(Ux q)k2 + kH†(Uxˆ q)k2= kH†Hp0k2 + kH†(Drw + ⌫ˆ)k2 kH†Hkopkp0k2 + kH†Hkopkp00k2 p2m+ p2m = 2p2m(2.12)512.2. Approach 2: Using a digital bufferwhere p00 =24 1Crw00 ⌫ˆ35 , and kp00k2 p2m (since kwk2 Crpm andk⌫ˆk2 00pm by (2.10)).On the other hand for every 1 ` m,2p2m kH†U(x xˆ)k2 = kV ⌃1UTU(x xˆ)k2= k⌃1(x xˆ)k2 `(H†)k`(x xˆ)k2where `(H†) is the `th singular value of H†. Hence,k`(x xˆ)k2 2pm`(H†)and the lower bound for `(H†) is given in (2.5) (with ✏ replaced by 00).Now by Proposition 7 , if k and ` are chosen so that 1p`` satisfies RIP with2k < 1/9, then by using p = 2 we obtainkx xˆk2 C4p`k`(x xˆ)k2 + 2C5pkk(x)where we used the fact that kxˆk1 kxk1. Hence, for such ` and k :kx xˆk2 C4p`k`(x xˆ)k2 + 2C5pkk(x) 2C4pm`(H†)p`+2C5pkk(x)Now, we use (2.5), with ✏ replaced by00 =✏+0/2 to simplify the boundabove.kx xˆk2 2C4Cr(3⇡r)r(m`)r+1/2 + 2C4rm`(✏+ 0/2) +2C5pkk(x) 2C4(3⇡r)r(m`)r+1/2 + 2C4rm`✏+2C5pkk(x)for values of m satisfying ` m mmax. In the last inequality above, weused ` 1, and we assumed Cr = 1/2, and 0 = (3⇡r)rmrmax.522.2. Approach 2: Using a digital buffer20 30 40 50 60 7010-1Figure 2.3: Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme and with an extra MSQstep (before applying the matrix U on the measurement vector).2.2.1 Numerical experimentsIn this section, we verify the result given in Theorem 18 empirically. In orderto do that we repeat the experiment explained in Section 2.1.2. The onlydifference is that in this experiment, to obtain the measurement vector y,we use the original random partial Fourier matrix Fm⇥200 (as opposed toUFm⇥200), then we use the step size 0 = (3⇡r)rmrmax to obtain the high-budgetquantized vector yMSQ (which will be stored in the buffer and will be dis-carded later). Next, we find UyMSQ and quantize it using rth order (r = 1, 2)⌃ quantization (with the step size = 0.1) to obtain the vector q. Next,we use (2.3) to obtain the vector xˆ and we find the error in approximation.Similar to what we did in Section 2.1.2, we repeat the experiment for 20signals, and we take an average for the error in approximation. The graphof errors along with the reference graphs f(m) = Cpmand g(m) = Dmpmareshown in Figure 2.3 in log-log scale.532.3. ⌃-quantized compressed sensing with chirp sensing matrices2.3 ⌃-quantized compressed sensing with chirpsensing matricesAs we saw before, the chirp sensing matrices are defined as follows. For aprime number p and the primitive root of unity ! = e2⇡ip , the columns ofp⇥ p2 matrix are defined viarp+m+1 =26666664!r·02+m·0!r·12+m·1...!r·(p1)2+m·(p1)37777775 (2.13)In above, j denotes the jth column of , and r and m range between 0and p 1. By this definition, the (k, j)th entry of the matrix is definedas k,j = !rk2+mk with rp +m + 1 = j. Motivated by the fact that chirpsensing matrices can be used as CS measurement matrices, we try to use ⌃schemes to quantize CS measurements obtained using these matrices. Weknow that we can do so if they satisfy (P1). However, we observe that (P1)does not hold for these matrices. Consider a p ⇥ p2 chirp sensing matrix and let T = {1, 2} (hence, we shall consider the first and second columns ofthis matrix). Note that we prefer the parameter ` in (2.3) to be as small aspossible in order to minimize the error in approximation, but as we illustratebelow the property (P1) does not hold even for ` = p1✏ (for any ✏ > 0),and large enough p. Set E = ` (where as above, ` denotes the restrictionof to its first ` rows). Next, consider the matrix A = 1`E⇤TET . Obviously,A11 = A22 = 1 and for any given ✏0 > 0,|A12| = |A21| =1`⇣1 + e2⇡ip + e4⇡ip + e6⇡ip + ...+ e2`⇡ip⌘ 1 ✏0for large enough p since each term in the sum above goes to 1 as p!1. Theeigenvalues of this matrix satisfy (1)2 = |A12|2 and so min = 1 |A12| ✏0. Hence, 2 = max{1 min,max 1} 1 ✏0 for large enough p, andtherefore 2 < 1/9 can not hold.However, this issue can be resolved if we use a certain submatrix of thechirp sensing matrix by choosing certain values of m. Specifically, we definea p⇥ pbppc matrix ¯ as follows.Definition 8. Let p be a prime number, and = (!rx2+mx)x2Zp be ap ⇥ p2 chirp sensing matrix, where the columns are indexed by two param-542.3. ⌃-quantized compressed sensing with chirp sensing matriceseters r and m in Zp. Define ¯ = (!rx2+mx)x2Zp as a p ⇥ pbppc subma-trix of if the values of r and m are chosen from {0, 1, 2., , , p 1}, and{bppc, 2bppc, ...., (bppc)2} respectively.We will show that such matrices satisfy (P1) and hence, one can performone-stage ⌃ quantization using them as measurement matrices. We willanalyze the corresponding approximation error in two scenarios: First, wefix the sparsity level and vary the number of measurements. Next, we fix thenumber of measurements and vary the sparsity level.2.3.1 Approximation error as the number of measurementsgrowsIn this case, we fix the signal x and we will increase the ambient dimensionand the number of measurements while fixing the vector x by embeddingx into higher dimensional space. This is because for the class of matricesdefined above, to increase the number of measurements p, we must also in-crease the ambient dimension, which is equal to pbppc. As such, we evaluatethe error in quantization using one-stage ⌃ quantization as the number ofmeasurements p increases.First, we prove that the class of matrices defined in Definition 8 satisfythe property (P1 ) of order (k, `) for appropriate choices of k, and `.Theorem 19. Consider the p⇥ pbppc matrix ¯ as defined in Definition 8.Then there exists a prime number p0 such that for p p0, the matrix sat-isfies the property (P1) of order (k, `) for k 4pp log p and ` = bp3/4 log2 pc.To prove this theorem, we will use the following result about an estimatefor exponential sums, given by Weyl [57, p. 41].Theorem 20. (a) Suppose that P (x) = ↵x2 + x+ where ↵ satisfies↵ aq 1q2for some relatively prime integers a and q. Then NXn=1e(P (n)) . Npq+pN log q +pq log qwhere e(x) = e2⇡ix and the notation f . g means |f | Cg for a constant Cand for all values of the free variables under consideration.552.3. ⌃-quantized compressed sensing with chirp sensing matrices(b) Suppose that P (x) = x+ . Then NXn=1e(P (n)) 12kkwhere kk is the distance to the nearest integer.Proof of Theorem 19. First we define the incomplete Gauss type sum S(r,m, p, `)for r, m, p, and ` as given in the theorem viaS(r,m, p, `) := e2⇡ip [r+m] + e2⇡ip [4r+2m] + ...+ e2⇡ip [r`2+m`]Suppose that va and vb are two distinct columns of 1p`` corresponding tothe values of r1,m1 and r2,m2 (i.e., a = r1p+m1+1 and b = r2p+m2+1).Then|hva, vbi| = 1` |S(r2 r1,m2 m1, p, `)| (2.14)To bound the RHS in above, we need to consider two cases.Case 1. If r1 6= r2, we bound |hva, vbi| by setting ` = bp3/4 log2 pc.For this purpose, we apply part (a) of Theorem 20 mentioned above with↵ = r2r1p which is of the formaq for relatively prime integers a = r2 r1and q = p. Hence, by using part (a) of Theorem 20, and using the fact thatp3/4 log2 p2 p3/4 log2 p 1 ` p3/4 log2 p for p 3, we obtain|hva, vbi| . 1p3/4 log2 p|S(r2 r1,m2 m1, p, `)|. 1p3/4 log2 p⇣p1/4 log2 p+ p3/8 log3/2 p+pp log p⌘. 1p1/4 log3/2 p(2.15)Case 2. If r1 = r2, we set ` = bp3/4 log2 pc and we use the fact that theset of possible values of m are {bppc, 2bppc, 3bppc, ...., (bppc)2}, and weuse part (b) of Theorem 20. Note that in our problem = m2m1p , and1 1. Accordingly, to evaluate kk, we evaluate min{||, 1 ||}.Next,|| bppcppp 1p 0.1ppp=0.1ppfor any prime number p. Also,|| (bppc 1)bppcp (pp 1)ppp= 1 1pp562.3. ⌃-quantized compressed sensing with chirp sensing matriceswhich implies 1 || 1pp . Therefore, in any case, kk 0.1pp , and since weset the value of ` to be ` = bp3/4 log2 pc, we will have|hva, vbi| . 1p3/4 log2 ppp . 1p1/4 log2 p. (2.16)Combining the equations (2.15) and (2.16) we obtain|hva, vbi| . 1p1/4 log3/2 pSince the columns of the matrix ` have unit norm, we can conclude thatthe coherence of this matrix, µ, satisfies µ . 1p1/4 log3/2 p.Therefore, there exists a prime number p0 such that for p p0 :k kµ . p1/4 log p 1p1/4 log3/2 p=1plog p<19Next, we prove the following corollary which shows that we can indeed use¯ along with a rth-order ⌃ quantizer.Corollary 2. Let x 2 ⌃nk , let p0 be as defined in Theorem 19, and supposethat p1 > p0 is a prime number such that k 4pp1 log p1. Then, for anyp p1, x can be approximated by xˆ, the solution to (2.1), if(i) the measurement matrix is U ¯, where ¯ is the p⇥pbppc matrix definedas in Definition 8, and(ii) q is obtained by quantizing U ¯ using an rth order ⌃ scheme.In the noise-free case, as we increase the number of measurements p, theapproximation error satisfieskx xˆk2 C(3⇡r)r(log p)2r1p 14 (r 12 ) (2.17)where C is a constant that does not depend on r, p0, p, and p1.Proof. Set ` = bp3/4 log2 pc. Then, since p > p1, we have k 4pp log p.Thus, by Theorem 19, the p⇥ pbppc matrix ¯ satisfies (P1) of order (k, `),and hence the vector x can be approximated by xˆ. Moreover, by Theorem16, as p increases in the noise-free case, the error in approximation satisfieskx xˆk2 C(3⇡r)r(log p)2r1p 14 (r 12 ) (2.18)where C is a constant that does not depend on r, p0, p, and p1.572.3. ⌃-quantized compressed sensing with chirp sensing matricesNote that the error decay rate O(p14 (r 12 )) (up to a factor logarithmic inp) given in Corollary 2 is inferior to O(p(r12 )) which we obtain with randommatrices (with m = p measurements). This behaviour is due to the fact thatthe both dimensions of ¯ increase as we increase p. One way to circumventthis issue is to restrict the maximum number of measurements to some pmax.In the following theorem, we will prove that under such circumstances, theapproximation error behaves like p(r12 ), similar to the case with randommatrices.Theorem 21. Fix ↵, > 0, with ↵ + /2 < 1/2. Let x 2 ⌃nk , and assumethat p0 be as defined in Theorem 19. Suppose that p1 > p0 is a prime numbersuch that k p↵1 . Then, for any p1 p pmax, where pmax = O(p1+1 ), thesignal x can be approximated by xˆ, the solution to (2.1), if(i) the measurement matrix is U ¯, where ¯ is the p⇥pbppc matrix definedas in Definition 8, and(ii) q is obtained by quantizing Ux using an rth order ⌃ scheme.In the noise-free case, as we increase the number of measurements p, theapproximation error satisfieskx xˆk2 Dp(r1/2) (2.19)where D is a constant that depends on p1, and order r, but does not dependon p0 or p.Proof. Set ` = bp1/2+↵+/21 log2 p1c. Then, by using Theorem 20, and similarto the argument given for the proof of Theorem 19, we conclude that thecoherence of ˜ satisfiesµ . 1`⇣ `pp+p` log p+pp log p+pp⌘. 1`pp log p . p1/2+/21plog p1p1/2+↵+/21 log2 p1Hence, the RIP constant of 1p`¯ satisfiesk < kµ .p1/2+/2+↵1plog pp1/2+↵+/21 log2 p1=1log3/2 p1< 1/9where we used the fact that 1log3/2 p0< 1/9, and p1 p0. This means that¯ satisfies the property (P1) of order (k, `), and hence the vector x can bereconstructed using the solution of (2.1) if U ¯ is used as the measurementmatrix. Also, by using ` = bp1/2+↵+/21 log2 p1c, and m = p in (2.1) weobtain the bound on the error in approximation (2.20) as desired.582.3. ⌃-quantized compressed sensing with chirp sensing matricesAs an example of Theorem above, we can set k = 4, ↵ = 0.34, and = 0.3. Then ↵ + /2 = 0.49 < 1/2, and we must choose p1 such thatk = 4 < p0.341 . We can observe that p1 = 61 satisfies this inequality. Hence,if the number of measurements satisfies 61 p 611.3, then the guaranteeon the error bound (2.20) will hold.Combining Theorem 21 and Corollary 2, we obtain the following result.Corollary 3. Fix ↵, > 0, with ↵ + /2 < 1/2. Let x 2 ⌃nk , and assumethat p0 be as defined in Theorem 19. Suppose that p1 > p0 is a prime numbersuch that k p↵1 . Then, for any p p1, the signal x can be approximatedby xˆ, the solution to (2.1), if(i) the measurement matrix is U ¯, where ¯ is the p⇥pbppc matrix definedas in Definition 8, and(ii) q is obtained by quantizing Ux using an rth order ⌃ scheme.In the noise-free case, as we increase the number of measurements p, theapproximation error satisfieskx xˆk2 Dp(r1/2) (2.20)if p p1+1 , andkx xˆk2 C(log p)2r1p 14 (r 12 ) (2.21)if p > p1+1 .2.3.2 Approximation error as the sparsity level variesIn the previous section, we saw that if we use an appropriate measurementmatrix, and an appropriate approximation scheme, then as we increase thenumber of measurements, the error in approximation decreases. Our objec-tive in this section is to fix the number of measurement (which also fixes theambient dimension) and reduce the sparsity level k. We expect to observea similar behaviour to what we observed above, and see a decay in error inapproximation.Theorem 22. Consider the CS matrix ¯ as defined in Definition 8. Thereexists a prime number p1 such that for a fixed number of measurements p,with p p1, 1 k pplog p and ` = bkpp log pc, the matrix ¯ satisfies theproperty (P1) of order (k, `).592.3. ⌃-quantized compressed sensing with chirp sensing matricesProof. By doing a similar calculation to the one given in the proof of The-orem 19, and using ` = bkpp log pc, the equations (2.15) and (2.16) will bereplaced by|hva, vbi| . 1kpp log p⇣k log p+pk 4pp log p+pp log p⌘and|hva, vbi| .ppkpp log prespectively. This implies|hva, vbi| . max{ 1pp,1pk 4pp,1kplog p,1k log p}Therefore there exists a prime number p1 such that for p p1, the RIPconstant of the matrix ` satisfiesk < kµ . max{ kpp,pk4pp,1plog p,1log p} . 1plog p< 1/9where we used the assumption on the sparsity level k pplog p .Similar to what we observed in Section 2.3.1, we state a corollary regardingthe bound on the error term when the matrix U ¯ is used as the measurementmatrix, and one-stage recovery scheme is used to reconstruct x. To matchthis corollary with the similar results, where we had a decreasing functionfor the error term, we consider the error as a function of k0 = 1/k. Note thatwe expect the error term to decay as we decrease the value of k, i.e., as weincrease the value of k0.Corollary 4. There exists a prime number p0 such that for a fixed primenumber p with p p0, any k-sparse signal x can be approximated with thevector xˆ, the solution to (2.1), provided that the following holds.(i) The sparsity level satisfies k bkmax :=pplog pc.(ii) The measurement matrix is U ¯, with ¯ defined as in Definition 8;(iii) q is obtained by quantizing U ¯x using an rth order ⌃ scheme – as in(1.33).602.3. ⌃-quantized compressed sensing with chirp sensing matricesThe error in approximation satisfieskx xˆk2 2CrC4(3⇡r)r(pplog p)r+1/2!(k0)r+1/2 (2.22)assuming that no noise is present. Here, k0 = 1/k, and the constant C4 onlydepends on RIP constant of .Proof. Let x be a k-sparse signal, and xˆ be the approximation vector. Also,let p0 be the prime number given by Theorem 19. Replace the value ofm = pand ` = bkpp log pc kpp log p into (2.3) and use the fact that k(x) = 0for a k-sparse signal to concludekx xˆk2 2CrC4(3⇡r)r(pplog p)r+1/2!(k)r1/2as desired.2.3.3 Numerical experimentsIn this section, we verify the results we obtained in Sections 2.3.1 and 2.3.2.We run two numerical experiments. In the first experiment, we considerprime numbers p = 61, 137, 223, 307, 397, 487, 593, 677, 787, and foreach prime p, we draw 20 signals, each of which is a 4-sparse signal witha random support chosen from the set {1, 2, · · · , 61⌅p61⇧}, and whose en-tries are chosen independently from a standard Gaussian distribution. Inother words, the actual ambient dimension of signals that are consideredis 61⌅p61⇧= 427. For each such signal, we compute the CS measurementsy = U ¯ which we subsequently quantize using a stable rth-order ⌃ schemeto obtain q with r = 1 or r = 2. Next, we reconstruct an approximation xˆof x using (2.1) where we set = U ¯, = 0.1, r = 1, 2, and ✏ = 0. Finally,for each p, we compute the average kx xˆk2. We plot the average error asa function of p in log-log scale in Figure 2.4. As mentioned in Section 2.3.1,for 4-sparse signals, we expect the bound on the error in approximation tobehave like p(r1/2) at least for 61 p 611.3. Figure 2.4 confirms thisfact and shows the p(r1/2) behaviour even for p values beyond this range.In the second experiment, we fixed the number of measurements to bep = 541, and we considered k-sparse signals with 3 k 15. Then foreach k0 = 1k , we consider 50 signals which are k-sparse and have a randomsupport T ✓ {1, 2, ..., 1400} and have entries chosen independently from the612.4. Further encoding of ⌃-quantized compressive measurements102 Number of measurements p10-310-2Mean Error in ApproximationFigure 2.4: Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme for a 4-sparse signal andthe comparison with the graphs of f(p) = Cpp and g(p) =Dp3/2(each oneshifted properly to match the original graphs) in log-log scale.standard Gaussian distribution. For each of these signals, the reconstructionvector xˆ is obtained from (2.1) with r = 1 or r = 2. We average over all theerrors for each value of k, and we plot the graph of average errors as well asthe upper bounds on the error obtained in Section 2.3.2 in log-log scale inFigure 2.5.2.4 Further encoding of ⌃-quantized compressivemeasurementsIn one-stage recovery of ⌃ quantized measurements, we start with a mea-surement vector y and since we have to store/transmit data we quantize thisvector using an alphabet A to obtain a quantized vector q 2 Am. To en-code q, we need log2 |A|m = m log2 |A| bits. In [71], Saab et al. proposed amethod to encode using much less number of bits without affecting the errorin reconstruction significantly. In the following, we give a brief review abouttheir result.In a nutshell, they reduce the dimension of q to encode using less numberof bits. In particular, suppose that L m, and consider the encoder E :Am ! C defined as E(q) = BDrq. where B is an L ⇥m Bernoulli matrix622.4. Further encoding of ⌃-quantized compressive measurements0.1 0.15 0.2 0.25 0.3k'=1/k10-310-2Error in ApproximationFigure 2.5: Error in approximation using first order and second order ⌃quantization with one-stage reconstruction scheme with fixed number of mea-surements (p = 541) and the comparison with the graphs of f(k0) = 1pk0andg(k0) = 1pk03.with i.i.d. equiprobable entries.First, we find how many bits we are saving by using this encoder. Weconsider the alphabet AK := {K, ...,, , ...,K}. Since kDrk1 mr,and kBk m [71], we obtain kBDrqk1 mr+1kqk1 mr+1K. Thus,for each entry of E we need an alphabet of the formA0 = AKmr+1There are L such entries, so in total we should useL log2 |A0| = L(r + 1) log2m+ L log2 2Kbits to represent E(q). Thus, by enlarging the size of alphabet and reducingthe dimension, Saab et al. [71] reduced the number of bits because the sizeof alphabet appears only as logarithmic factor.Now, the goal is to find an algorithm to reconstruct x with the vector xˆgiven the encoded vector q˜ = E(q) = BDrq, and with kx xˆk2 to be assmall as possible. This algorithm is given in [71] as follows.(xˆ, uˆ, eˆ) = argminkx˜k1 subject to BDr(x˜+ e˜)Bu˜ = BDrqand kBu˜k2 3Cm and ke˜k pm✏(2.23)632.4. Further encoding of ⌃-quantized compressive measurementsNext, we prove that this algorithm can be applied using the measurementmatrix defined in Definition 8. In order to do so, first we choose a Bernoullimatrix B of the size L ⇥ p with L = bp5/8 log2 pc and consider the p ⇥ pmatrix Dr. Then, write the singular value decomposition of BDr in theform BDr = TSRT . Using this notation, we prove the following theorem.Theorem 23. Consider a k-sparse signal x 2 Rn, with k b 8pp log pc.Suppose that we use the matrix R¯ as the measurement matrix, where R isas above, and ¯ is the matrix given in Definition 8, to find the measurementvector y. Then, we use the rth order ⌃ quantization to obtain the quantizedvector q. Next, find the reconstruction vector xˆ via (2.23). The error inreconstruction satisfieskx xˆk C1⇣ log ppp)r/23/4 + C2s pplog p✏+ C3k(x)1pkwith probability at least 1C5ec6p11/16 log p for some constants C1, C2, C3, C5,and c6.Note that if we want to have decreasing bound (as a function of p) for theerror in approximation in the noise-free case, we need to have r/23/4 > 0.This means we must have r 2.Proof. First, let L = bp5/8 log2 pc, and we verify that 1pL¯L satisfies theRIP with 2k < 1/9, if k 4pp log p. To that end, we use (2.14) along withTheorem 20, to conclude that|hva, vbi| . 1p5/8 log2 p⇣p1/8 log2 p+ p5/16 log3/2 p+pp log p⌘.pp log pp5/8 log2 p=1p1/8 log3/2 pHence, the coherence of 1pL¯L satisfies µ . 1p1/8 log3/2 p , and this implies thatthe RIP constant satisfiesk < kµ p1/8 log pp1/8 log3/2 p< 1/9for large enough p.Similar to the what mentioned in the proof of Theorem 18, if we usep = 2 in Proposition 7, and the value of L as stated above, since 1pL¯L642.4. Further encoding of ⌃-quantized compressive measurementssatisfies the RIP with 2k < 1/9, we can conclude thatkx xˆk2 d1pLk¯L(x xˆ)k2 + d2k(x)1pk(2.24)for some constants d1 and d2. Next, we find an upper bound for k 1pL ¯L(xxˆ)k2. To do that, we consider the set E = E1 \ E2 whereE1 := {B 2 Bern(L,m) : L(BDr) ⇣mL⌘r/21/4pm},andE2 := {B 2 Bern(L,m) : kBk`2!`2 pL+ 2pm}It is shown in [71] thatP (E) 1 2ec1pmL ec2L, (2.25)for some constants , c1, and c2. It is also shown that for any B 2 E , if wedecompose BDr in the form BDr = TSRT , and if we set ˜ = RT (here, is the measurement matrix, and in our case, = R¯ with ¯ as given inDefinition 8, and so ˜ = RT (R¯) = ¯), then we have1pLk˜L(x xˆ)k2 6C⇣ Lm⌘r/23/4+ 2rmL✏for a constant C. Hence, using the value of L as given above, we obtaink 1pL¯L(x xˆ)k2 6C⇣ log2 p8pp⌘r/23/4+ 2s8pplog2 p✏ (2.26)Accordingly, by combining (2.24) and (2.26), we obtainkx xˆk2 C1⇣ log2 p8pp)r/23/4 + C2s8pplog2 p✏+ C3k(x)1pkNoting that m = dp3/4e, and L = dp5/8 log2 pe, we conclude that ec1pmL .ec2L, which implies that 1 ec2L . 1 ec1pmL. Therefore, by (2.25),the inequality above holds with probability at least 1 C5ec6pmL, i.e.,1 C5ec6p11/16 log p, for some constants C1, C2, C3, C5, and c6.652.4. Further encoding of ⌃-quantized compressive measurementsTherefore, based on what we summarized from [71] in Section 1.3.5, thefollowing corollary holds regarding the exponential accuracy. Here, RT ¯,with ¯ as defined in Definition 8 is used as the measurement matrix, themeasurements are quantized using rth order ⌃ quantization, and furtherencoding is performed at the end.Corollary 5. There exist constants C0, C2 such that in the noise-free case,and for k0 = bC0p5/8 log pc, the distortion rate D, as defined in Section 1.3.5,satisfiesD . 2C2 Rk0 log pwhere R is the bit rate defined as R := log |C|.66Chapter 3Deterministic partial binarycirculant compressed sensingmatrices3.1 Introduction and preliminariesIn this chapter, we present a novel construction for deterministic CS matri-ces based on decimated Legendre sequences. As outlined below, Legendresequence provides a binary sequence with ±1 entries which initially seemsideal to use in the context of CS. However, in order to be able to use thesesequences as rows or columns of a measurement matrix, one has to guaranteea low maximum correlation between two such sequences. This was done firstby Zhang et al. [82] in 2002 (before the birth of CS) in the context of codingtheory, and by considering decimated Legendre sequences. The use of Leg-endre symbol in CS with random matrices was proposed by Bandeira et al.[6] in 2016. The summary of their work is given in Section 3.2. In the sameyear, Zhang et al. [81] proposed the use of Legendre symbol for the construc-tion of deterministic CS matrices. In fact, their construction was based ontheir previous work in 2002 which was done in the context of coding theory,and has the feature of being binary, and having low coherence. Moreover,since any prime number can be used as the number of measurements, thedifference between size of two adjacent matrices in their construction is lowcompared to many other deterministic constructions.As outlined below, another important feature that a CS matrix can haveis the circulant matrix structure (see Section 3.3). The use of circulantmatrices in random CS was first proposed by Bajwa et al. [4] in 2007. Anequivalent approach was proposed by Romberg [68] in 2009. In the latterapproach, given a signal x 2 Rn, first a convolution of x, of the form Hx isconsidered, where H can be written in the form ofH =1pnF ⇤⌃F673.1. Introduction and preliminariesHere, n is as usual the ambient dimension, F is the discrete Fourier matrix,and ⌃ is a diagonal matrix whose diagonal entries are complex numberswith unit norm, and random phases. Following this step, we subsample themeasurements. Therefore, the measurement matrix in this approach can bewritten as = R⌦H, where ⌦ ✓ {1, 2, · · · , n} is a set with m elements,and R⌦ is a sampling operator that restricts the rows to a random set ⌦,i.e., a random choice of a set ⌦ among all possiblenmsuch sets. Basedon this idea, Li et al. [50] considered a measurement matrix of the form = R⌦A, where A is a deterministic matrix. Since R⌦ here is a randomsampling operator, the measurement matrix in their construction can not beconsidered as deterministic yet. To the best of our knowledge, the only classof circulant deterministic matrices considered in the literature so far is theclass of matrices introduced by Cui [25]. In their paper, they constructed thecirculant matrix A by first writing it of the form A = 1pnF ⇤diag()F , then,considering the sequence as a decimated Legendre sequence, and finallyconsidering the first m rows of A as the measurement matrix. They showempirically that this matrix performs very well as a measurement matrix,but no proof was given in this regard. Note that there are of course otherdeterministic binary matrices given in the literature. In Section 1.2, wegave a summary for deterministic binary construction introduced by DeVore.Other binary deterministic constructions have been introduced in [1, 51, 58].To the best of our knowledge, the construction we introduce in this chap-ter is the first deterministic binary circulant construction in CS which isproved to have low coherence, and hence, can be used for recovery of sparsesignals. Compared to the work of Cui [25], in addition to giving a proof forwhy the construction can be used in CS, our construction has the advantageof having a simple, explicit formula for each entry of the measurement ma-trix itself (as opposed to its diagonalization). The circulant structure of ourconstruction allows us to perform a fast matrix-vector multiplication, and afast recovery algorithm. Moreover, our construction has a small differencebetween the sizes of two adjacent matrices. (as we will see below, the num-ber of measurements in our construction is chosen as dp3/4e, where n = p isa prime number and is assumed to be the ambient dimension). Lastly, wewill show that we can perform the one-stage ⌃ quantization as describedin Chapter 2 using our construction. Similar to the constructions given in[6, 25, 81], our construction exploits Legendre symbol. Therefore, we firstgive a summary regarding Legendre symbol and its properties.Let p 3 be a prime number. For a 2 R, we assume that dae representsthe smallest integer greater than or equal to a. For a 2 Z, we assume that683.2. Compressed sensing matrices using Legendre sequence⇣ap⌘denotes the Legendre symbol and is equal to 0 if a is divisible by p,is equal to 1 if a is a quadratic residue mod p, and is equal to -1 if a is aquadratic non-residue mod p. These basic properties of the Legendre symbolcan be found in any elementary number theory textbook, e.g., in [45] :1.⇣abp⌘=⇣ap⌘⇣bp⌘for any a, b 2 Z and a prime number p.2.⇣ap⌘= ap12 mod p for a 2 Z and an odd prime number p.3.⇣qp⌘=⇣pq⌘(1) p12 q12 for all odd primes p and q.Legendre symbol is a special example of Dirichlet characters. A Dirichletcharacter is a function : Z! C that has the following properties:1. There exists a positive integer k such that (n) = (n + k) for alln 2 Z. The number k is called the modulus of the character.2. If (n, k) > 1 (where (n, k) denotes the greatest common divisor of nand k), then (n) = 0; If (n, k) = 1, then (n) 6= 0.3. (mn) = (m)(n) for all m,n 2 Z. Note that this property impliesthat (1) = 1 as (1) 6= 0 by 2.A character is called principal if it assumes the value 1 for the argumentscoprime to its modulus and otherwise is 0.It is obvious that when p is a prime,⇣·p⌘is a Dirichlet character withmodulus k = p.Remark 10. Dirichlet characters with modulus k = p form a group undermultiplication where the principal character is the identity element. If isan element of this group, then (ap1) = (1) = 1 for every integer a whichis not divisible by p since ap1 = 1 mod p. Therefore p1 is always equalto the principal character. Hence, if is a character modulus p of order d,then d|p 1.3.2 Compressed sensing matrices using LegendresequenceFix a prime number p, and an element x 2 Zp. Using the Legendre symbolintroduced above, we observe that the entries of any vector of the form v =693.3. Circulant matrices(⇣1+xp⌘,⇣2+xp⌘, · · · ,⇣p+xp⌘) are evenly distributed with ±1 entries. Hence,one can construct CS matrices using these vectors (or similar vectors). In[6], Bandeira et al. constructed a class of m ⇥ n random matrices whose(i, j)th entry is obtained viai,j :=1pm⇣x+m(j 1) + ip⌘(3.1)where p is a large prime number, and x is drawn uniformly from a set of theform {0, 1, 2, · · · , 2h1}. They show that for appropriate choices of k,m, n, ,and h, such matrices are RIP of order k and constant with high probability.They also make a conjecture implying that the class of deterministic matricesobtained from fixing x = 0 in (3.1) are RIP in the optimal regime (and hence,would break the square-root barrier). As mentioned above, one can also usethe Legendre symbol to construct binary deterministic matrices. Examplesof such constructions are given in [25, 81].Since our goal in this chapter is to consider a class of deterministic binarycirculant matrices, we next summarize the definition and main features ofcirculant matrices in the context of CS.3.3 Circulant matricesThe use of (random) circulant matrices in CS has been originally suggestedby Bajwa et al. [4] in 2007, followed by the results by Romberg [66, 68] in2009. As stated in [66], there is an advantage for using circulant matricescompared to Gaussian and Bernoulli random matrices in CS, as generatingthese matrices is faster, and more importantly the matrix-vector multiplica-tion process is faster when we use these matrices which in turn makes thereconstruction algorithm faster. Moreover, they arise in applications such asidentifying linear time-invariant systems [3]. We know that CS can be ap-plied in MRI using partial Fourier matrices. However, application of CS inMRI with using a generalization of circulant matrices called Toeplitz matri-ces has been proposed in [52], and has been shown that it has an advantageover the classical method in the sense that it spreads out the image energymore evenly.703.3. Circulant matricesDefinition 9. A circulant matrix is a matrix of the formC =26664c1 c2 .... cncn c1 .... cn1...c2 c3 ... c137775We say that C is generated by the vector v = (c1, c2, ..., cn).It is seen in the definition above that a circulant matrix is a squarematrix and hence, is not appropriate to use in the context of CS. Therefore,in practice, random rows of such matrices have been considered to obtaina class of matrices called “partial circulant matrices". It is shown in [67]that if a circulant matrix is generated by a Rademacher sequence, then withhigh probability, the partial circulant matrix obtained from choosing m rowsof the (square) n ⇥ n circulant matrix satisfies RIP with 2k < 1/p2 ifm & k3/2 log3/2 n. This condition was later improved by Krahmer et al.[46] to the (suboptimal) condition m & k log2 k · log2 n. Hence, to comparethese random matrices with the sub-Gaussian random matrices, we observethat there is an additional log2 k factor in the expression for the minimumnumber of measurements. However, on the positive side, these matricescan be diagonalized using Fourier transform, and therefore, using FFT, theprocess of matrix-vector multiplication as well as running the reconstructionalgorithm become faster.Here, we justify these facts. First, we show that the Discrete FourierTransform matrix (DFT) diagonalizes the circulant matrix above. Let ! =e2⇡i/n, and let vk = (1,!k,!2k, ...,!(n1)k)T (for 0 k n 1). Then,Cvk =26664c1 c2 .... cncn c1 .... cn1...c2 c3 ... c137775266641!k...!(n1)k37775 =26664c1 + c2!k + · · · cn!(n1)kcn + c1!k + · · ·+ cn1!(n1)k...c2 + c3!k + · · ·+ c1!(n1)k37775=⇣c1 + c2!k + · · ·+ cn!(n1)k⌘266641!k...!(n1)k37775Hence, vk is an eigenvector of C with eigenvalue k = c1 + c2!k + · · · +713.3. Circulant matricescn!(n1)k. Thus, the Discrete Fourier Transform (DFT) matrixF =1pn26666641 1 1 · · · 11 ! !2 · · · !(n1)1 !2 !4 · · · !2(n1)...1 !n1 !2(n1) · · · !(n1)(n1)3777775diagonalizes the matrix C. Therefore, to compute Cx = FDF ⇤x (for a vectorx), we need to compute F ⇤x using fast fourier transform (FFT) (which takesas we know O(n log n) operations instead of the standard n2 operations),then we multiply the ith entry of the resultant vector with i. Note thatby computing FFT of x = (1, 1, · · · , 1)T , we find the vector consisting allthe eigenvalues. At last, we multiply F with the resultant vector usingFFT which takes another O(n log n) operations. Accordingly, Cx can becomputed using O(n log n) operations. Note that if we use a partial circulantmatrix as the measurement matrix, we should multiply the resultant vectorby the matrix RT from left, where T = {j1, ..., jm} determines the rows ofthe circulant matrix that are going to be selected, and RT is an m⇥n matrixwhose (i, ji)th entry is one, and all its other entries are zeros. Since RT is asparse matrix, this last multiplication is also fast.As in this thesis, we focus on deterministic matrices as well as quantiza-tion, we should also investigate about performing quantization when partialcirculant matrices are used as CS matrices. In the recent publications, Dirk-sen et al. [29] showed how to do one-bit quantization using random partialcirculant matrices, and also Feng et al. [32] showed how to do ⌃ quanti-zation using these matrices. The essence of the main result in [32] is in factProposition 7, which as we saw in Section 2.1 leads to Theorem 16. Usingthe notation used in Chapter 2, the main technical challenge in [32] is toprove that UT satisfies (P1) for appropriate choices of k and ` wheneverthe matrix is a random partial circulant matrix. In Section 3.5, we willconsider the problem of one-stage recovery using ⌃ quantization for a spe-cific class of deterministic partial binary circulant matrices which we proposeas follows.Now, we give an explicit formula for the novel class of deterministic partialcirculant matrices. We obtain bounds for the coherence of these matrices,which we then use to identify requirements that relate the sparsity level k tothe number of measurements m (as well as the ambient dimension n) to en-sure that these matrices can be used as CS matrices. We also perform somenumerical experiments to compare the performance of these matrices with723.4. A novel, explicit constructionother deterministic and random CS matrices. Finally, we investigate theproblem of quantization when these matrices are used as measurement ma-trices, and we provide theoretical guarantees for the error in reconstructionusing one-stage recovery method in the case of rth order ⌃ quantization.3.4 A novel, explicit constructionConsider the p⇥ p deterministic matrix A defined byAi,j =(⇣jip⌘if i 6= j1 if i = jwhere⇣.p⌘denotes the Legendre symbol, and 1 i, j p.Proposition 9. Let p be any prime number, then the matrix A as definedabove is a circulant matrix.Proof. First, we define the operator S : Rp ! Rp as follows.S⇣(x1, x2, ..., xp)⌘:= (xp, x1, · · · , xp1)Now, if we denote the rows of A by A1, A2, ..., Ap, then to prove A is acirculant matrix, we need to show S(Ai) = Ai+1 for 1 i p 1. Next,S(Ai) = S(Ai1, Ai2, ..., Aip) = S⇣(1 ip), (2 ip), ..., (p ip)⌘=⇣(p i)p), (1 ip), ..., (p 1 ip)⌘=⇣(1 (i+ 1)p), (2 (i+ 1)p), ..., (p (i+ 1)p)⌘= (Ai+1,1, Ai+1,2, ..., Ai+1,p) = Ai+1as desired.Given the matrix A above, we define the dp3/4e ⇥ p measurement matrix by considering the first dp3/4e rows of the matrix A (and then normalizingit).733.4. A novel, explicit constructionDefinition 10. Let p 3 be a prime number. The (i, j) th entry of themeasurement matrix , of our construction is defined as follows.i,j =8<:1pdp3/4e⇣jip⌘if i 6= j1pdp3/4e if i = jwith 1 i dp3/4e, and 1 j pBased on this definition, is a deterministic partial circulant matrix,and hence, all the features and benefits of using a circulant measurementmatrix, such as less space required to store the matrix, or fast matrix-vectormultiplication, and fast reconstruction scheme can be applied to these ma-trices.Theorem 24. There exists p0 23 such that for p p0, the coherence µ ofthe dp3/4e ⇥ p matrix defined above satisfiesµ 3p1/2 log pp3/4=3 log pp1/4Hence, for these matricesk kµ 3k log pp1/4Thus, in the context of CS, these matrices can be used for measurement and`1 recovery of vectors with the sparsity level of k = O( p1/4log p).Note that if we use as a measurement matrix, then the maximum sparsitylevel k, and the number of measurements m are related via k = O⇣m1/3logn⌘since as we observed above, the number of measurements is of order p3/4,and the the maximum sparsity level k is of order p1/4log p . This clearly comparesunfavourably to the random case; however note that our construction isexplicit and fast.Remark 11. Similar results will hold if we construct a dp↵e⇥p CS matrix with the similar definition (with 1/2 < ↵ < 1) given in section II. Specifically,we define the dp↵e ⇥ p matrix asi,j =1pdp↵e⇣j ip⌘743.4. A novel, explicit constructionwhere 1 i dp↵e and 1 j p. Such matrix can be used for the exactrecovery of signals with the sparsity level k satisfyingk p↵1/218p2 log p= O⇣m↵1/2↵log n⌘where we used the fact that m = dp↵e, and n = p. Hence, if we choose alarger value for ↵ (close to 1 ), then the matrix can be used for recovery ofa larger class of signals but we need more number of measurements. While ifwe choose a smaller value for ↵ (close to 1/2), then the matrix is closer tobe an ideal CS matrix (where the number of measurements is much less thanthe ambient dimension) but on the other hand can be used for recovery ofa smaller class of signals. Note that 0 < ↵1/2↵ < 1/2 for 1/2 < ↵ < 1,which is within boundaries of square-root barrier for deterministic matrices.To prove Theorem 24, we should consider the inner product of two distinctcolumns of our measurement matrix. As we will see below, the inner prod-ucts are related to a quantity called “incomplete Weyl sums". Let p be acharacter modulus p, and let f(x) 2 Zp[x] be a polynomial. A complete Weylsum is a sum of the formp1Xx=0p⇣f(x)⌘.It can be shown [78] that if f(x) 2 Zp[x] is monic of degree d 1 and f hasdistinct roots in Zp, then p1Xx=0p⇣f(x)⌘ < dppAn incomplete Weyl sum is a sum of the formNXx=Mp⇣f(x)⌘,where M,N 2 Zp. In the case of f(x) = x, an upper bound was found byPolya in 1918 : NXx=Mp(x) pp log pIn the case of an arbitrary polynomial f(x), the following bound on theIncomplete Weyl sums can be derived.753.4. A novel, explicit constructionTheorem 25. (Incomplete Weyl-Sum Estimate, Theorem 9.2 in [78]) : Thereexists p0 > 0 such that for any prime number p p0, and for any monicpolynomial f(x) 2 Zp of degree d 1 and with distinct roots in Zp we have NXx=0p⇣f(x)⌘ d(1 + log p)ppfor any integer N 2 Zp.Proof of Theorem 24. Suppose that a and b are two distinct columns of, i.e., a 6= b. Thenha,bi = 1dp3/4edp3/4eXi=1⇣b ip⌘⇣a ip⌘(3.2)Next, let f(x) := (x a)(x b), (n) :=⇣np⌘(which is a character modulusp), and N = dp3/4e. Then, f(x) is of degree d = 2, and hence we canuse Theorem 25. Also, note that for any integer i 2 Zp, we have f(i) =(b i)(a i), and hence,⇣f(i)⌘=⇣(b i)(a i)p⌘= (b ip)(a ip)Therefore, dp3/4eXi=1⇣b ip⌘⇣a ip⌘ = dp3/4eXi=1p⇣f(i)⌘ dp3/4eXi=0p⇣f(i)⌘+ 1 2(1 + log p)pp+ 1 (3 + 2 log p)pp (3 log p)ppwhere we used the assumption p 23 to conclude log p 3, i.e., p e3.Since the inequality above is valid for any two distinct values of a and b,using (3.2) we can concludeµ = maxa 6=bha,bi 3p1/2 log pp3/4=3 log pp1/4as desired.763.5. One-stage recovery for ⌃-quantized compressed sensing with deterministic partial circulant matrices3.5 One-stage recovery for ⌃-quantizedcompressed sensing with deterministic partialcirculant matricesWe saw in Chapter 2 that we can perform a one-stage reconstruction ⌃quantization method for any CS measurement matrix as far as the mea-surement matrix satisfies the property (P1) of order (k, `), as defined inSection 2.1. In particular, Theorem 16 states that if we write the singularvalue decomposition of Dr in the form Dr = U⌃V T , and if the (original)measurement matrix satisfies (P1) of order (k, `), then we can use a (mod-ified) measurement matrix ˜ = U, and the error in reconstruction usingthe algorithm (2.1) satisfies the bound given in (2.3).Proposition 10. The partial circulant matrix , as defined in Definition10, satisfies the property (P1) of order (k, `) for ` = dp5/8e, and k = d p1/8log pe.Hence, in terms of number of measurements m = dp3/4e, and the ambientdimension n = p, we have ` = O(m5/6), and k = O(m1/6logn ).Proof. Recall that in one stage reconstruction method, we start with a matrixthat satisfies RIP. Then, we consider the first ` (for ` as small as possible)rows of the m ⇥ n matrix , and the resultant matrix still satisfies RIP.The technical issue is that such property does not hold for an arbitrarymeasurement matrix, e.g., see Section 2.3. However, our construction herehas the advantage that the matrix obtained by considering its first ` rows isstill a partial circulant binary matrix. Hence, to obtain the RIP constantsof a matrix of the form =1p`` (3.3)where ` is the restriction of as defined above to its first rows, we simplyreplace m = dp3/4e in the proof of Theorem 24, with ` = dp5/8e. Then, weconclude that the coherence of this matrix given in (3.3), satisfiesµ 18pp log pp5/8=18 log pp1/8Hence, if the sparsity level satisfies k p1/8log p , we havek < kµ =19773.6. Numerical experimentsThe following Corollary is immediate by combining the theorem above, andTheorem 16.Corollary 6. Consider the deterministic partial circulant matrix of orderdp3/4e ⇥ p, as defined in Definition 10. Then, for k p1/8log p , any k-sparsesignal x can be approximated with the vector xˆ, the solution to (2.1). Here,the measurement matrix that we use is U, and q is obtained from rth order⌃ quantization. Moreover, as we increase the number of measurements p,the error in approximation decreases according tokx xˆk2 2CrC4(3⇡r)r(m1/6)r+1/2 + 2C4✏pm1/6 (3.4)for constant C4 depending only on RIP constant of , and where m = dp3/4edenotes the number of measurements.3.6 Numerical experimentsFirst, we give an example of the new construction when p = 997. Figure 3.1shows the matrix constructed with this method.To compare the construction we introduced in this chapter with otherexisting deterministic constructions extensively, we run four numerical ex-periments.In the first experiment, we compute the coherence of matrices as definedin Definition 10, in order to find out about the tightness of the theoreticalbound we derived in this chapter. In this experiment, for each value of p, with71 p 1193, we start with the dp3/4e⇥p matrix , then we construct ⇤.Next, we calculate the maximum off-diagonal entry (in absolute value) of thisGramian matrix. This gives the coherence of the matrix and we plot thisin log-log scale in Figure 3.2. As seen in this Figure, the coherence of thesematrices as a function of p, behaves almost like f(p) = 1p1/4, which apartfrom a logarithm factor, matches with the bound we derived in Theorem 24.In order to make a comparison, we also plot the coherence of chirp sensingmatrices (of the size p ⇥ p2) as a function of p. Note that the coherence ofchirp sensing matrices matches precisely with f(p) = p1/2.In the second experiment, we fix a measurements matrix (randomor deterministic) and we consider a k-sparse signal x 2 R300. For each2 k 102, we choose a random support T ✓ {1, 2, ..., 300} with k elements,and we choose non-zero entries of x independently from standard Gaussiandistribution. Then, we compute the measurement vector y = x, and we783.6. Numerical experimentsuse BP to to find the reconstruction vector xˆ. Next, we compute SNR forthe signal x defined bySNR(x) = 10 · log10⇣ kxk2kx xˆk2⌘dB (3.5)If we obtain SNR(x) > 50, we consider the reconstruction as a successfulrecovery and otherwise an unsuccessful recovery. For each k, we repeat thisexperiment 10 times and we let f(k) := number of successful recoveries10 . InFigure 3.3, we plot f(k) vs. k for different constructions. For our proposedconstruction, we choose a prime number close to 1000, say p = 997, whichleads to the number of measurements m = d9973/4e = 178. We can choosethe same number of measurements for random Bernoulli matrix. For Reed-Muller construction, since the number of measurements is a power of 2, wechoose the smallest power of 2, greater than 178, namely, m = 256. ForDeVore construction, we choose m = 169 (the closest integer of the formp2 to 178). Accordingly, in Figure 3.3, we use Reed-Muller construction ofthe size 256 ⇥ 236, restricted to its first 300 columns (as x 2 R300), DeVoreconstruction of the size 132 ⇥ 133, restricted to its first 300 columns, thenovel construction of the size d9973/4e ⇥ 997 = 178 ⇥ 997, restricted to itsfirst 300 columns, and also random Bernoulli of the size 178⇥300. As we seein this Figure, the novel construction introduced in this chapter has the bestperformance among the other deterministic constructions mentioned above,and its performance is comparable to the random Bernoulli matrices.In the third experiment, we fix k = 10, and we consider a k-sparsesignal x 2 R40, with random support and non-zero entries chosen indepen-dently from the standard Gaussian distribution, and we consider the newconstruction with m = bp3/4c, and n = p (with 41 p 293). For eachvalue of p, we evaluate y = x, we approximate x with xˆ using BP, andwe evaluate SNR using (3.5). Similar to above, if SNR > 50, we con-sider the reconstruction as a successful recovery, and otherwise an unsuc-cessful recovery. We repeat the same experiment 10 times and we definef(p) := number of successful recoveries10 . We plot f(p) vs. p in Figure 3.4.We repeat the same process above with k = 20. Lastly, we perform thesame process with k = 10 and k = 20 using random Bernoulli matrices withexactly same sizes, i.e., bp3/4c ⇥ p, with 41 p 293. Figure 3.4 also sug-gests that the performance of the new construction is comparable with therandom Bernoulli matrices.In the last experiment, for a fixed value of p (which fixes the number ofmeasurements m = dp3/4e), and with 113 p 197, we consider a 1-sparse793.6. Numerical experiments100 200 300 400 500 600 700 800 90050100150Figure 3.1: The binary matrix given by the new construction for p = 997.102 103p10-1100CoherenceFigure 3.2: Coherence of the matrices introduced in this chapter, and alsochirp sensing matrices in log-log scale, accompanied with best fitted lines. Asseen in this Figure, the coherence of our construction behaves as ⇠ m1/3,and for chirp sensing matrices behaves as ⇠ p1/2 = m1/2.803.6. Numerical experiments0 20 40 60 80 100 120Sparsity level k00.10.20.30.40.50.60.70.80.91Fraction of successful recoveryFigure 3.3: The fraction of exactly recovered vectors versus sparsity for afixed number of measurements. The number of measurements is chosen asm = 256 for the Reed-Muller matrix, m = 178 for the new construction andalso for the Bernoulli matrix, and m = 169 for the DeVore’s construction.The ambient dimension of all signals is n = 300signal x 2 R100 with random support and the non-zero entry chosen from astandard Gaussian distribution. Then, we reconstruct it using BP, and wemeasure SNR(x). We repeat the experiment 50 times for 50 signals x and ifthe minimum value of SNR(x) becomes greater than 50 dB, we consider thatlevel of sparsity as exactly recoverable and we increase the sparsity level by1 unit. Then, we repeat the same experiment until the minimum value ofSNR(x) (among all 50 experiments) becomes less than 50 dB for a sparsitylevel k1. Then we define g(p) := k1 1 as the maximum level of recoverablesparsity corresponding to p. The graph of g(p) versus p is plotted in Figure3.5 in log-log scale. In the suboptimal case, we know that the maximumlevel of sparsity is m/ log(n/m) which will be p3/4/(4 log p). Therefore, inaddition to the graph of g(p), we plot the graph of f(p) = p3/4 and wecompare these graphs. As it can be seen in this Figure, our construction,like the random CS matrices has the feature that numerically, the maximumrecoverable sparsity level behaves close to the suboptimal case.813.6. Numerical experiments0 50 100 150 200 250 300Ambient dimension p00.10.20.30.40.50.60.70.80.91Fraction of successful recoveryFigure 3.4: The graph of fraction of exactly recovered vectors (for 10 ex-periments) versus prime number p for a fixed level of sparsity (k = 10 or20) for the new construction and the Bernoulli matrices. Note that onlythree graphs are shown because the graphs corresponding to k = 20 forthe new construction and random Bernoulli exactly overlap with each other.This suggests that our proposed deterministic construction has a very similarperformance to random Bernoulli.120 130 140 150 160 170 180 190Ambient dimension p8101214161820Sparsity level kFigure 3.5: The maximum sparsity level of recoverable signals g(p) versusthe prime number p compared with the graph of f(p) = p3/4.82Chapter 4RIP constants for deterministiccompressed sensingmatrices-beyond GershgorinWe saw in Section 1.2 that given a deterministic CS matrix, one of the mostcommon ways to bound its RIP constant is by relating its RIP constant withits mutual coherence viak µ(k 1) (4.1)Throughout this chapter, we will call this bound, the Gershgorin bound onthe RIP constants, or simply the Gershgorin bound as this is the boundobtained from Gershgorin circle theorem. Moreover, the tightest bound thatrelates the RIP constants of a matrix to its performance as a CS measurementmatrix is due to [18] and it states that to ensure recovery of k-sparse (orcompressible) signals, we need 2k < 1p2 . On the other hand, using (4.1), inorder for a matrix to have small enough RIP constant, it is sufficient thatthe maximum sparsity level satisfies k < 12µp2+ 12 . Considering the Welchbound for the coherence of an m ⇥ n matrix, this imposes a square-rootbarrier on the sparsity level of signals, namely, k = O(pm). Comparingthis level of sparsity with the maximum level of sparsity for sub-Gaussianmatrices, which is found to be k = O( mlog nm ), we observe that there is a hugedifference between these two.In fact, as mentioned in [37], p. 141, finding a deterministic CS matrixthat satisfies RIP in the optimal regime is a major open problem. Herewe quote a few sentences from [37] that explains the intrinsic difficulty ofreaching RIP in the optimal regime :“The intrinsic difficulty in bounding the restricted isometry constants ofexplicit matrices A lies in the basic tool for estimating the eigenvalues ofA⇤SAS Id, namely, Gershgorin’s disk Theorem ... the quadratic bottle-neck is unavoidable when using Gershgorin’s theorem to estimate restrictedisometry constants. It seems that not only the magnitude of the entries of83Chapter 4. RIP constants for deterministic compressed sensing matrices-beyond GershgorinGramian A⇤A but also their signs should be taken into account in order toimprove estimates for deterministic matrices, but which tools to be used forthis purpose remain unclear." We will verify the fact that one needs to takeaccount the signs of entries of the Gramian matrix (in addition to the mag-nitudes) to obtain a bound that improves the Gershgorin bound. See Section4.1. Moreover, recall from Section 1.2 that as shown in [76], whenever thenumber of measurements satisfies m k1+↵s2 log n, with ↵ 2 [0, 1), thereis no polynomial time algorithm that can certify the RIP constants satisfyk s. This shows the significance of finding RIP matrices, even in thesuboptimal regime m k1+↵s2 log n, with ↵ 2 [0, 1). The first step to doso, is going beyond the Gershgorin bound, namely, the bound given in (4.1).In this chapter, we will propose two different tools to replace Gershgorincircle theorem for bounding eigenvalues of the Gramian matrix to estimatethe isometry constants for a specific construction. In the first approach, wecompare the Gramian matrices of this construction with the skew-adjacencymatrices of specific graphs to obtain bounds on the extreme eigenvalues ofthe Gramian matrices and hence, will estimate the RIP constants.To explain the idea used in the second approach, first note that Gersh-gorin circle theorem bounds every eigenvalue of a matrix uniformly. That is,it does not distinguish the extreme eigenvalues with other eigenvalues and itstates that every eigenvalue lies in one of Gershgorin circles. However, theisometry constants only depend on the minimum and maximum eigenvaluesof the Gramian matrix. There is in fact, a bound called “Dembo bound"which provides bounds for the maximum and minimum eigenvalues of a pos-itive semidefinite Hermitian matrix. The goal in this chapter is to use oneof these two approaches to achieve an improved bound for the isometry con-stant, i.e., something better than k (k 1)µ. We will see that using thefirst approach mentioned above, one can improve the classical Gerhsgorinbound by a multiplicative constant while using the second approach one canhave a small additive improvement. However, the second approach has itsown significance because using this approach we will give a pathway by pro-viding an explicit conjecture regarding the distribution of quadratic residues,to break the square-root barrier via k = O(m5/7) (if the conjecture holds).All results in the literature on sparse recovery using the standard RIP relyon the Welch bound or its variants using `1-coherence. The only exception tothis, until our work, is the work of Bourgain et al. [16]. For a prime numberp, they constructed an explicit CS matrix of the order p⇥ p1+✏ (where ✏ > 0is a small number and m = p is the number of measurements) such that thismatrix satisfies RIP with k < 1/p2 when k = bp 12+✏0c (with ✏0 < ✏ is also a844.1. Paley tight frames for compressed sensingsmall constant) and p is large enough. As mentioned above, while we can notbreak the square-root barrier, we will propose novel approaches to improvethe bounds based on coherence or `1-coherence. Lastly, we will propose aconjecture that if it holds, we would have an improved version of breakingthe square-root barrier compared to the one given in [16]. This improvementwill be on how close to unity the power ↵ can be chosen in k = O(m↵), andon the lower bound on the minimum number of measurements.4.1 Paley tight frames for compressed sensingIn this chapter, we will investigate the behaviour of the RIP constants of aspecific class of matrices, and will show that it behaves better than what isexpected using the Gerhsgorin circle theorem, i.e., the bound given by (4.1).In order to choose such a class of matrices, first note that for a (normalized)measurement matrix , with the coherence µ, the 2 ⇥ 2 Gramian matrixof the form1 cc⇤ 1, with |c| = µ, has the extreme eigenvalues 1 ± µ, andhence, 2 = µ as predicted by (4.1). In the next step, we consider a Gramianmatrix of order 3 of the form241 µ µµ 1 µµ µ 135, and we observe that the extremeeigenvalues of this matrix are of the form 1± 2µ. However, if we consider aGramian matrix of the form24 1 iµ iµiµ 1 µiµ iµ 135 (with i = p1), the extremeeigenvalues are of the form 1 ± p3µ. Moreover, it can be seen that fora larger value of k, the spectral radius of the Gramian matrix of order kcan reduce further if all non-diagonal entries are imaginary numbers anda mixture of the above diagonal entries have negative imaginary parts (asopposed to all above diagonal entries having positive imaginary parts, or allhaving negative imaginary parts). Therefore, we search among measurementmatrices with the property that the inner product of distinct columns areimaginary numbers, and also for large enough k, a mixture of above-diagonalentries of Gramian matrices of order k, have negative imaginary parts. Sucha construction is based on Paley tight frame as proposed in [7]. Specifically,we will consider the following matrices.Definition 11. Let p ⌘ 3 mod 4 be a prime number, and consider thep⇥ p DFT matrix whose (m,n)th entry is given by e 2⇡ip mn. Next, choose the(p+1)/2 rows of the DFT matrix whose indices are quadratic residues mod p854.1. Paley tight frames for compressed sensing(starting with the row corresponding to m = 0). We denote this (p+1)/2⇥pmatrix by H, which we normalize to obtain the measurement matrix : := DH,where D is the diagonal matrix whose first diagonal entry isq1p , and therest of its diagonal entries areq2p .Hence, our measurement matrix is a (p + 1)/2 ⇥ p matrix with unitnorm columns. For example, for p = 7, we should consider the 7 ⇥ 7 DFTmatrix, and then consider the 1st, 2nd, 3rd, and 5th rows (corresponding tothe quadratic residues m = 0, 1, 2, 4 in Z7), and subsequently normalize theresultant matrix as mentioned above to obtain the 4 ⇥ 7 Paley CS matrix.Using Proposition 4, we compute the inner product between the columnscorresponding to n, n0 2 Zp ashn,n0i = 1p +2pp12Xm=1e2⇡i(nn0)p m2=1pp·p1Xm=0e2⇡i(nn0)p m2=⇣n n0p⌘ ipp.Here,⇣nn0p⌘denotes the Legendre symbol.One way to bound the RIP constants of this construction is using (4.1),which givesk k 1pp (4.2)On contrary, numerically, we observe that at least the lower bound on theRIP constant behaves much better. In fact, what we observe in Figure 4.1would be consistent withk kpp(4.3)for ⇡ 0.65. Note that if (4.3) is proved, then the square-root barrier wouldbe broken. In this chapter, we will show that for this construction, the bound(4.2) can be improved by an additive or a multiplicative constant using twonovel approaches. We will also propose a conjecture regarding distributionof quadratic residues in Zp that leads to (4.3). Next, we explain how weobtain Figure 4.1.Fix a value of p, say p = 103, and consider the Paley CS matrix asdefined above. Also, fix a value of k, say k = 30, and choose a signal withrandom support T ✓ {1, 2, ..., p} and with |T | = k. Let T = {r1, ..., rk}, and864.1. Paley tight frames for compressed sensing101Sparsity level k100RIP constant estimationFigure 4.1: The graph of lower bound of the RIP constants, compared withthe Gershgorin bound and the new improved bound, as given in Section 4.2,on the RIP constants.for each 1 j k, let d(j) be an estimation for the RIP constant of orderj defined byd(j) = max{max(Gj) 1, 1 min(Gj)}where Gj := ⇤TjTj , and Tj := {r1, ..., rj}. The graph of d(j) as a functionof j is shown in Figure 4.1 in log-log scale. In this figure, we also plot theclassical Gershgorin bound as well as the new improved bound (which will bederived in Section 4.2) on the RIP constants. As we observe in this figure andas suggested by the method of least squares, the lower bound function d(j) forthe RIP constants behaves like j for ⇡ 0.65308. Note that as mentionedabove, d(j) is a lower bound for the RIP constants since it is obtained byusing only one random support set, while the precise value of RIP constantsare obtained by considering the worst case over exponentially many supportsets. Accordingly, we perform another experiment in which we compare thebehaviour of d(j) as defined above obtained from a single random supportset with d0(j) obtained from the worst case of 1000 random support sets. Aswe observe in Figure 4.2, as we increase the number of random support sets,the behaviour of RIP constant estimation remains almost the same. We willuse the estimated value of in d(j) = j later in this chapter.Remark 12. In the construction used in [7], it is assumed that p ⌘ 1.However, as our goal in this chapter is to improve the Greshgorin bound, wewould not be able to do so with the same assumption. The reason is that the874.1. Paley tight frames for compressed sensing12 14 16 18 20 22 24 26 28 30Sparsity level k0.60.650.70.750.80.850.9RIP constant estimationEstimation using single support setEstimation using the worst case in 1000 support setsFigure 4.2: Comparison of lower bounds of the RIP constants obtained froma single random support set and using the worst case among 1000 randomsupport sets. We observe that the slope of the graph (in log-log scale) ob-tained from a single support set almost remains constant, as we increase thenumber of support sets from 1 to 1000.computation above shows that the inner product of nth and n0th columns of is given by⇣nn0p⌘· 1pp . Now, consider a set T = {r1, ..., rk} such that⇣rirjp⌘= 1 whenever i < j. Then, the Gramian matrix will be a k ⇥ kmatrix of the following formG =26641 1pp · · · 1pp...1pp1pp · · · 13775The Gershgorin bound for the maximum eigenvalue of this matrix is ⌘ =1 + k1pp . This is in fact the maximum eigenvalue of this matrix sincedet2664k1pp 1pp · · · 1pp...1pp1pp · · · k1pp3775 = 0This can be verified by adding rows 2, 3, ..., k to the first row, which makesthe first row the zero vector. For this reason, we change the assumption top ⌘ 3 mod 4, which as we will see later will lead to improving the Gershgorinbounds.884.2. Improving the Gershgorin bound using skew-adjacency matrices4.2 Improving the Gershgorin bound usingskew-adjacency matricesIn this section, we propose an approach that will enable us to improve theGershgorin bound by a multiplicative constant for the construction given inDefinition 11.We start by considering the construction given in Definition 11, anddecompose the Gramian matrix of order k for this construction, denoted byGk as follows.Gk = (gij) = Ik +Ak (4.4)where Ik is the identity matrix of order k, Ak = (aij), aii = 0, and aij =p1ppor p1pp for i 6= j.Recall that to compute the RIP constant of order k of the measurementmatrix ˜, we need to consider the Gramian matrices Gmaxk , and Gmink withlargest maximum and smallest minimum eigenvalues respectively (among allGramian matrices of the same order). Decompose these matrices as Gmaxk =I+Amaxk , and Gmink = I+Amink as in (4.4). For any matrix M , let max(M),min(M), and ⇢(M) denote the maximum and minimum eigenvalues of M ,and the spectral radius of M respectively. Then, we havemax(Gmaxk ) = 1 + max(Amaxk )andmin(Gmink ) = 1 + min(Amink )On the other hand, each Ak is a skew-symmetric matrix and hence, if isan eigenvalue of Ak, then is also an eigenvalue for Ak. This means thatmax(Amaxk ) = ⇢(Amaxk ) and min(Amink ) = ⇢(Amaxk ). Therefore,k = max{max(Gmaxk ) 1, 1 min(Gmink )} = ⇢(Amaxk ) (4.5)It remains to find a bound for ⇢(Amaxk ). Note that Amaxk can be written inthe form ofAmaxk =ippCk, (4.6)where i =p1, and Cmaxk is a skew-symmetric matrix with zero diagonals,and whose every other entry is 1, or -1, and it has the largest spectral radius(among all matrices of the same form). In order to bound the spectral radiusof Cmaxk , we view Cmaxk as the skew adjacency of an oriented graph, and use894.2. Improving the Gershgorin bound using skew-adjacency matricesthe results in the literature about the spectral radius of these matrices tofind bounds on the extreme eigenvalues of Cmaxk . First, we need the followingdefinition.Definition 12. Let G be a simple undirected graph of order n. By G wedenote a directed (or oriented) graph that assigns a direction to every edge ofG. The skew-symmetric adjacency matrix of G, denoted by S(G) = (sij)is an n⇥ n skew symmetric matrix such that si,j = 1 and sj,i = 1 if i! jis an arc of G. If there is no arc between the vertices i and j, we definesi,j = sj,i = 0. The skew spectral radius of G, denoted by ⇢S(G) is definedas spectral radius of S(G).Now, to find a bound for ⇢(Amaxk ), we need to consider the skew adjacencyof a simple graph with largest (among all oriented graphs of order k) spectralradius. It turns out [27] that the oriented graph whose skew adjacencymatrix has zero diagonals, whose upper diagonals are all 1, and whose lowerdiagonals are all -1 has the largest spectral radius. In particular, let Kn bethe complete graph of order n, and let K⌧n be the oriented complete graphwith the adjacency matrix with zero diagonals, with 1’s located in the upperdiagonal entries, and -1’s located in the lower diagonal entries. In otherwords,S(K⌧n) =266640 1 1 · · · 11 0 1 · · · 1...1 1 1 · · · 037775Theorem 26. For any oriented graph G of order n,⇢S(G) ⇢S(K⌧n) = cot(⇡2n)Equality holds if and only if S(G) = QTS(K⌧n)Q for some signed permuta-tion matrix Q.Based on this theorem and using our notation, we can conclude that⇢(Cmaxk ) cot(⇡2k) 2k⇡(4.7)where we used the fact that cot(x) 1/x for x > 0. This inequality comesfrom the standard inequality x < tanx for x > 0. Lastly, by combining(4.5), (4.6), and (4.7) we obtain the following theorem.904.3. Improving the Gershgorin bound using Dembo boundsTheorem 27. Let p 3 be a prime number. The RIP constant of the matrix˜ as defined in Definition 11 satisfiesk 2⇡ ·kppfor any k p.Therefore, the maximum sparsity level k for which we have a guaranteefor recovery through BPDN using this construction must now satisfy2k <rp2· ⇡2(instead of the standard bound 2k <qp2 + 1). For example, if we setp = 1009, the maximum sparsity level becomes 2k bqp2 · ⇡2 c = 35 (insteadof the standard bound 2k bqp2c+ 1 = 23).4.3 Improving the Gershgorin bound using DemboboundsWe have observed so far that the Gershgorin bound can be improved with amultiplicative constant using the specific construction given in Definition 11.For the rest of this chapter, we propose another tool that can also improvethe Gershgorin bound. Although this approach improves this bound onlyby an additive constant but it has advantage over the previous approach inthe sense that it can be applied to other constructions. More importantly,we will propose a conjecture that if it holds, then using this approach thesquare-root barrier can be broken for the construction given in Definition 11.First, we explain the idea behind this approach.We know that the RIP constants of a (normalized) measurement ma-trix is computed using the extreme eigenvalues of a matrix of the formA := ⇤TT , where T is a set T with |T | = k. Now, if the coherence of is C/pm (which is the suboptimal value considering the Welch bound),then by Gershgorin circle theorem, every eigenvalue (including the extremeeigenvalues) lies in (1kC/pm, 1+kC/pm). However, we would naturallyexpect to have sharper bounds if we try to bound only the extreme eigenval-ues. Below, we attempt to do this using the so-called Dembo bounds, andthen we will generalize it later. These bounds estimate the extreme eigen-values of a positive semidefinite Hermitian matrix, and is the main tool thatwe use in this approach.914.3. Improving the Gershgorin bound using Dembo boundsTheorem 28. (Dembo Bounds, [26]) Suppose that a positive semidefinteHermitian matrix R can be written as R =c b⇤b Qwhere Q is a k ⇥ kpositive semidefinite Hermitian matrix, b is a k⇥ 1 vector, and c 0. Also,suppose that 1 2 ... k+1 are the eigenvalues of R. Then, Dembobounds can be stated ask+1 c+ ⌘k2 +r(c ⌘k)24+ b⇤b (4.8)and1 c+ ⌘12r(c ⌘1)24+ b⇤b (4.9)where ⌘1 is any lower bound on the minimum eigenvalue of Q and ⌘k is anyupper bound on the maximum eigenvalue of Q.Before stating and deriving our results formally, we perform a numericalexperiment. Consider a Paley CS matrix as defined in Definition 11, withp = 103. We also fix a k-value, say k = 30, and we choose a random setT = {r1, r2, ..., rk} ✓ {1, 2, ..., p}. For 1 j k, let Tj = {r1, ..., rj},Aj = Tj , and Dj = ⇤TjTj . For 2 j k, we can write Dj of the formDj =1 b⇤b Dj1. Let j1 be the maximum eigenvalue of Dj1, and j bethe maximum eigenvalue of Dj . Since each entry of b is ±i/pp, we haveb⇤b = j1p . Therefore, using Dembo bounds, the bound on j which wedenote by ⌘j can be written as follows.⌘j =1 + j12+s(1 j1)24+j 1pWe also find the upper bound for the maximum eigenvalue of Dj given byGershgorin bound, i.e.,⌘0j = 1 +j 1ppNext, we calculate the ratio of the upper bounds for the maximum eigenval-ues of Dj to the actual maximum eigenvalues of Dj for these two differentbounds, i.e., Dembo boundsactual eigenvalues and alsoGershgorin boundsactual eigenvalues , and we plot thegraphs in log-log scale. Note that if the graph is closer to y = 1 it means wehave a better and tighter estimate. The graphs are shown in Figure 4.3 andwe can clearly see that Dembo bound gives a better estimate for the maxi-mum eigenvalues compared to Gershgorin bounds. A similar reasoning can924.3. Improving the Gershgorin bound using Dembo bounds101Sparsity level k1.11.21.31.41.51.61.71.81.92RatioDembo boundsGershgorin boundsFigure 4.3: Comparing the sharpness of Dembo bounds and Gershgorinbounds for estimating the maximum eigenvalue of a semi-positive definiteHermitian matrix by considering the ratio of these bounds over the actualmaximum eigenvalues for a fixed Paley matrix with p = 103.be given regarding the estimates for the minimum eigenvalues. However, thisapproach can not be applied in practice since it assumes that we have accessto exact eigenvalues of previous order in each step. Despite this fact, usingDembo bounds or generalizations of this bound can lead to an improvementof Gershgorin bounds. We begin by finding a non-trivial bound for k fora fixed small value of k, i.e., a bound tighter than the one given by Gersh-gorin bound. Then, we apply Dembo bounds or the so-called “GeneralizedDembo bounds" to obtain non-trivial bounds on k for the next values of kinductively.Theorem below provides a bound for the RIP constants when the con-struction given in Definition 11 are used as the measurement matrices.Theorem 29. Let p 7 be a prime such that p ⌘ 3 mod 4. Then fork 3, the RIP constant k of the matrix ˜, as defined above, satisfiesk k 1 1c(k1)pp(4.10)where c = 12(2p3) .934.3. Improving the Gershgorin bound using Dembo boundsRemark 13. The bound given in (4.10) is not achievable either using Ger-shgorin bound, or using `1-coherence as introduced in [34]. The `1-coherencefunction µ1 of the m⇥n matrix = [1 2 · · · n] is defined for s 2 [n1]byµ1(s) := maxi2[n]max{Xj2S|hi,ji|, S ✓ [n], card(S) = s, i 62 S}where [n] = {1, 2, ..., n}. It is shown in [34] that µ µ1(s) sµ. It is alsoshown that the RIP constant of a matrix satisfiesk µ1(k 1) (k 1)µSo in general `1-coherence may give a better bound compared to k (k1)µ.However, for the class of matrices considered in Definition 11 this is not thecase. In fact, if k p 1, then since the inner product of any two distinctcolumns of has magnitude 1/pp, we have µ1(k) = kµ = kpp .To prove Theorem 29, first we will prove the following lemmas.Lemma 3. Suppose p 7 is a prime number which is 3 mod 4. The RIPconstants 3 of the matrix ˜ as defined in Definition 11 satisfies3 =r3pRemark 14. The value of RIP constant above is consistent with the onegiven by Gerhsgorin bound, i.e., 3 2pp , and also with the “improved" boundas it was given in Theorem 27, i.e., 3 6⇡pp (note thatp3 < 6/⇡ < 2).Proof of Lemma 3. Let G3 be an arbitrary Gramian matrix of order 3. Wecan write G3 in the following formG3 =24 1 b cb⇤ 1 dc⇤ d⇤ 135where each b, c, and d is±i/pp. Let p(x) = det(G3xI) be the characteristicpolynomial of the matrix G3. Then, we havep(x) = (1 x)⇣(1 x)2 dd⇤⌘ b⇣b⇤(1 x) dc⇤⌘+ c⇣b⇤d⇤ c⇤(1 x)⌘= (1 x)3 (dd⇤ + bb⇤ + cc⇤)(1 x) + 2Re(bdc⇤) = (1 x)3 3(1 x)p(4.11)944.3. Improving the Gershgorin bound using Dembo boundswhere we used the fact that each of b, c, and d is ± ipp , and hence, Re(bdc⇤) =0. It can be easily verified that the roots of the polynomial above are x =1q3p , x = 1, x = 1+q3p . Thus, all Gramian matrices of order 3 (includingthe ones with the largest maximum eigenvalue, and the smallest minimumeigenvalue) have the same eigenvalues and hence,3 = max{(1 +r3p) 1, 1 (1r3p)} =r3pLemma 4. If c 1 is a constant, thenk 1/ck2+r(k 1/ck)24+ k + 1 k + 1 1/c(k + 1)for each k 2 N = {1, 2, ...}.Proof. Note that for each k 2 N we have4k + 4kc 4k + 4c 4c2k(k + 1)2 0Hence, we have4k 4kc(k + 1) + 4c(k + 1)2 4(k + 1)c2k(k + 1)2 0which implies4c2(k + 1)2 4c(k + 1)+4ck 4c2k(k + 1) 0Therefore,4c2(k + 1)2 4c(1 1k + 1) +2c 8c(k + 1)+4ck 4c2k(k + 1) 2cHence,k2 + 4 +4c2(k + 1)2+1c2k2+ 4k 4kc(k + 1)+2c 8c(k + 1)+4ck 4c2k(k + 1) k2 + 1c2k2 2c+ 4k + 4954.3. Improving the Gershgorin bound using Dembo boundsThus,(k + 2 2c(k + 1)+1ck)2 (k 1/ck)2 + 4k + 4Hence, ⇣k + 2 2c(k+1) + 1ck2⌘2 (k 1/ck)24+ k + 1On the other hand, it is obvious that the expression whose square we computeon the left hand side is positive, i.e., k + 2 2c(k+1) + 1ck > 0, therefore,k + 1 1/c(k + 1) k 1/ck2r(k 1/ck)24+ k + 1which implies the Lemma.Next, we give the proof for Theorem 29.Proof of Theorem 29. To prove this theorem, we use induction on k. Letp ⌘ 3 mod 4 be a prime satisfying the condition of the theorem, and let ˜be the (p + 1)/2 ⇥ p measurement matrix as defined in Definition 11. Let ✓ {1, 2, ..., p}, and by ˜ we mean an m⇥ || matrix defined by restrictionof ˜ to the columns indexed by the elements of the set . Define mink andmaxk asmaxk = max:||kmax(⇤) = max(⇤00),mink = min:||kmin(⇤) = min(⇤11),(4.12)where 0 and 1 are sets with k elements, max(A) and min(A) denotethe maximum and minimum eigenvalues of a matrix A respectively, andGmaxk := ⇤00 and Gmink := ⇤11 denote the Gramian matrices withlargest maximum eigenvalue and smallest minimum eigenvalue respectively.Note that the RIP constant k of ˜ is given byk = max{1 mink ,maxk 1} (4.13)In order to prove the theorem, we find an upper bound for the maximumeigenvalue of Gmaxk and a lower bound for the minimum eigenvalue of Gminkrespectively.Proving max(Gmaxk ) 1+k1 1c(k1)pp for k 3: First, note that the resultholds for k = 3 by Lemma 3. Indeed by this lemma, max(Gmax3 ) 1+p3pp =964.3. Improving the Gershgorin bound using Dembo bounds1 +2 12cpp for c satisfying12c = 2p3, i.e., c = 142p3 .Now assume that the statement is valid for k, then maxk+1 1 + k1/ckpp . Wewill show that maxk+2 1 + k+11/c(k+1)pp .To bound maxk+2 in terms of maxk+1, we will use the Dembo bounds as stated inTheorem 28. In particular, if R is a positive semidefinite Hermitian matrixsuch that R =c b⇤b Q, Q is a (k + 1) ⇥ (k + 1), positive semidefiniteHermitian matrix, and 1 2 . . . k+2 are eigenvalues of R, thenk+2 c+ ⌘k+12 +r(c ⌘k+1)24+ b⇤b (4.14)and1 c+ ⌘12r(c ⌘1)24+ b⇤b (4.15)for any ⌘1 and ⌘k+1 such that max(Q) ⌘k+1 and min(Q) ⌘1. In ourcase, Q is the matrix ˜⇤0˜0 , and R is the matrix ˜⇤˜, where ,0 ✓{1, 2, ..., p} with |0| = k + 1 and ◆ 0 with || = k + 2. Say, 0 ={j2, j3, ..., jk+2}, and = {j1, j2, ..., jk+2}. Then ˜ = [j1 ,j2 , ...,jk+2 ],and˜⇤˜ =26664hj1 ,j1i hj1 ,j2i . . . hj1 ,jk+2ihj2 ,j1i hj2 ,j2i . . . hj2 ,jk+2i...hjk+2 ,j1i hjk+2 ,j2i . . . hjk+2 ,jk+2i37775so we have c = 1 (as kjk = 1 for all j), b = [hj2 ,j1i hj3 ,j1i . . . hjk+2 ,j1i]T ,andQ =26664hj2 ,j2i hj2 ,j3i . . . hj2 ,jk+2ihj3 ,j2i hj3 ,j3i . . . hj3 ,jk+2i...hjk+2 ,j2i hjk+2 ,j3i . . . hjk+2 ,jk+2i37775Also, by induction hypothesis we have ⌘k+1 1 + k1/ckpp . Hence (4.14)implies that,k+2 1 + k 1/ck2pp +s(k 1/ck)24p+ b⇤b974.3. Improving the Gershgorin bound using Dembo boundsOn the other hand, using the fact that each entry of b is ± ipp we haveb⇤b =Pk+1i=11p =k+1p . Therefore, to provek+2 1 +k + 1 1c(k+1)pp, (4.16)it is enough to prove :k 1/ck2pp+s(k 1/ck)24p+k + 1p k + 1 1/c(k + 1)pp(4.17)This inequality holds by Lemma 4. Next, we observe that to calculate maxk+2,we have to consider all such matrices R as mentioned above and take maxi-mum over all such choices. In other words,maxk+2 = maxk+2(˜⇤˜)However, as it was seen in (4.16), the value of k+2(˜⇤˜) only depends on|| = k + 2, and not the elements of . Therefore, the same upper boundholds for maxk+2, i.e.,maxk+2 1 +k + 1 1c(k+1)ppProving mink 1 k1 1c(k1)pp for k 3: Similar to the argument givenabove, the induction base holds by Lemma 3. Assuming that the statementis valid for (k + 1) (induction hypothesis), we prove it for (k + 2). Usingthe same notation used above, and using Dembo bound (4.15), we can finda lower bound for the minimum eigenvalue of R, 1(R), as follows.1 1k 1/ck2pps(k 1/ck)24p+ b⇤b = 1k 1/ck2pms(k 1/ck)24p+k + 1pwhere we used the fact that b⇤b = k+1p . Hence, using (4.17), we obtain1 1 k + 1 1/c(k + 1)pp(4.18)Again, note that mink+2 can be calculated by considering all such matrices Rand taking a minimum over them, i.e.,984.4. A generalized Dembo approachmink+2 = min1(˜⇤˜)and as seen in (4.18), the value ofmin 1(˜⇤˜) depends only on || = k+2,and not the elements of . Therefore, the same lower bound holds for mink+2,i.e.,mink+2 1k + 1 1c(k+1)ppTherefore,k+2 = max{maxk+2 1, 1 mink+2} k + 1 1/c(k + 1)ppas desired.4.4 A generalized Dembo approachThroughout this chapter, we have focused so far on the bounds on the max-imum and minimum eigenvalues of a Hermitian positive semidefinite matrixA given by Dembo bounds as stated in Theorem 28. These bounds are ob-tained by considering the maximum and minimum eigenvalues of 2⇥2 blockmatrices R1 and R2 satisfying R1 A, and A R2 respectively. In thissection, our goal is to tighten these bounds by following a similar idea. Inparticular, we would like to consider the maximum and minimum eigenvaluesof 3⇥ 3 block matrices Q1 and Q2 satisfying Q1 A, and A Q2, in orderto obtain bounds on the extreme eigenvalues of A.Lemma 5. Suppose that a (k + 2)⇥ (k + 2) positive semidefinte Hermitianmatrix R can be written as R =24 a b cb⇤ a dc⇤ d⇤ Q35 where Q is a k ⇥ k positivesemidefinite Hermitian matrix, and c and d are k ⇥ 1 vectors, and a, b 2 C.Also, suppose that 1 2 ... k+1 k+2 are the eigenvalues of R.Then k+2 ⌫max and 1 ⌫min where ⌫max and ⌫min are the maximumand minimum roots of characteristic polynomials of R1 =24 a b cb⇤ a dc⇤ d⇤ ⌘kI35994.4. A generalized Dembo approachand R2 =24 a b cb⇤ a dc⇤ d⇤ ⌘1I35 respectively. Here, as before, ⌘1 is any lowerbound on the minimum eigenvalue of Q and ⌘k is any upper bound on themaximum eigenvalue of Q.Proof. Since Q ⌘1I 0, we have RR2 0, and so min(R) min(R2).Similarly, ⌘kI Q 0, implies that max(R) max(R1).Next, to estimate the extreme eigenvalues of matrices of the form R1 orR2 mentioned above, we need to obtain a formula for determinant of thesematrices. To that end, we use the so called Schur determinant formula.Lemma 6. [5, p. 50] (Schur determinant formula) Let M be a 2 ⇥ 2 blockmatrix of the formM =P QR Swhere P is a p⇥ p matrix, S is an s⇥ s matrix, Q is a p⇥ s matrix, and Ris an s⇥ p matrix. If P is invertible, thendet(M) = det(P ) · det(S RP1Q)Similarly, if S is invertible, thendet(M) = det(S) det(P QS1R)An immediate corollary of the lemma above is that for a 2⇥ 2 block matrixof the formR =a bc ⌘Ikwhere ⌘ 6= 0, and b, and c are 1⇥ k, and k⇥ 1 vectors respectively, we havedet(R) = ⌘k(a b⌘1c)Now, we are ready to derive a formula regarding the determinant of the 3⇥3block matrices of the form mentioned above.Lemma 7. Let R be a (k + 2) ⇥ (k + 2) matrix that can be written of theform R =24 a b cb⇤ a dc⇤ d⇤ ⌘Ik35, where c = (c1, c2, ..., ck), d = (d1, d2, ..., dk) are1004.4. A generalized Dembo approach1⇥k vectors. For each 1 i k, define the vectors ci, and di as 1⇥ (k1)vectors obtained by removing the ith entry ci and di from the vectors c andd respectively. Then we havedet(R) = ⌘k2⇣a2⌘2 a⌘(dd⇤ + cc⇤) bb⇤⌘2 + 2Re(bdc⇤) + ⌘where is defined as :=Pki=1 |ci|2did⇤i Pki=1 cid⇤idic⇤i .Note that the determinant of a Hermitian matrix must be a real number.Now, apart from the termPki=1 cid⇤idic⇤i , other terms are obviously real.This term is also real because if say c`, d` 6= 0, then the term c`d⇤`d`c⇤` willbe appeared in the sum above. Now, if cm, dm 6= 0, for some m 6= `, thenc⇤mdm will be one of the terms appearing in the expansion of d`c⇤` . Hence,c`d⇤`c⇤mdm will be a generic non-zero term of the expansion of c`d⇤`d`c⇤` . Next,note that this term will be accompanied by cmd⇤mc⇤`d` which is a genericterm in the expansion of cmd⇤mdmc⇤m. Thus, every term in this sum will beaccompanied by its complex conjugate, and hence, this term is also real.Proof of Lemma 7. Expanding the determinant along the first row we obtaindet(R) = a deta dd⇤ ⌘Ik b detb⇤ dc⇤ ⌘Ik+ c1 det2666664b⇤ a d2 d3 ... dkc⇤1 d⇤1 0 0 ... 0c⇤2 d⇤2 ⌘ 0 ... 0...c⇤k d⇤k 0 0 ... ⌘3777775 c2 det266666664b⇤ a d1 d3 . . . dkc⇤1 d⇤1 ⌘ 0 . . . 0c⇤2 d⇤2 0 0 . . . 0c⇤3 d⇤3 0 ⌘ ... 0...c⇤k d⇤k 0 0 . . . ⌘377777775+ . . .+ (1)k+1ck266666664b⇤ a d1 d2 d3 . . . dk1c⇤1 d⇤1 ⌘ 0 0 . . . 0c⇤2 d⇤2 0 ⌘ 0 . . . 0c⇤3 d⇤3 0 0 ⌘ ... 0...c⇤k d⇤k 0 0 0 . . . 03777777751014.4. A generalized Dembo approachFor the first two terms above, we use Schur determinant formula, and weexpand the remaining terms along the rows with more number of zeros.det(R) = a⌘k(a ⌘1dd⇤) b⌘k(b⇤ ⌘1dc⇤) + c1⇣ c⇤1 deta d1d⇤1 ⌘Ik1+ d⇤1b⇤ d1c⇤1 ⌘Ik1⌘ c2⇣c⇤2a d2d⇤2 ⌘Ik1 d⇤2 detb⇤ d2c⇤2 ⌘Ik1⌘+ . . .+ (1)k+1ck⇣(1)k+2c⇤ka dkd⇤k ⌘Ik1+ (1)k+3d⇤k detb⇤ dkc⇤k ⌘Ik1⌘Next, we use the Schur determinant formula to expand the determinant of2⇥ 2 block matrices above.det(R) = a2⌘k a⌘k1dd⇤ bb⇤⌘k + b⌘k1dc⇤⇣c1c⇤1⌘k1a ⌘k2c1c⇤1d1d⇤1 + c2c⇤2⌘k1a ⌘k2c2c⇤2d2d⇤2 + ...+ ckc⇤k⌘k1a ⌘k2ckc⇤kdkd⇤k⌘+ c1d1b⇤⌘k1 c1d1d1c⇤1⌘k2 + . . .+ ckd⇤kb⇤⌘k1 ckd⇤kdkc⇤k⌘k2= ⌘k2 a2⌘2 a⌘dd⇤ bb⇤⌘2 + b⌘dc⇤ acc⇤⌘ +kXi=1cic⇤idid⇤i+ ⌘b⇤cd⇤ kXi=1cid⇤idic⇤i!= ⌘k1 a2⌘ add⇤ bb⇤⌘ + 2Re(bdc⇤) acc⇤!+ ⌘k2 kXi=1cic⇤idid⇤i kXi=1cid⇤idic⇤i!= ⌘k2⇣a2⌘2 a⌘(dd⇤ + cc⇤) bb⇤⌘2 + 2Re(bdc⇤) + ↵⌘where is the expression defined as := Pki=1 cic⇤idid⇤i Pki=1 cid⇤idic⇤i!.Using the lemmas proved above, we show that the RIP constant of thePaley CS matrices– as defined in Definition 11– can be improved where theimprovement term is a universal constant, unlike the situation in the previoussection (see Theorem 29), where the improvement term was dependent onthe sparsity level.1024.4. A generalized Dembo approachTheorem 30. Let p 7 be a prime number such that p ⌘ 3 mod 4. TheRIP constants of the measurement matrix ˜ as given in Definition 11 satisfiesk k 123(2p3)ppProof. The idea of the proof is similar to the one given for Theorem 29, andwe use similar notation. Hence, we assume that the result holds for k whichgives an upper bound and a lower bound for the eigenvalues of a Gramianmatrix of the size k ⇥ k, and we prove the statement for (k + 2). Therefore,the proof includes two main steps, one regarding the maximum eigenvalue,and one regarding the minimum eigenvalue.1. We will prove that max(Gmaxk ) 1 +k1 23 (2p3↵)pp for k 3 usinginduction. To that end, we first verify the statement for k = 3 andk = 4; then we finish by assuming it holds for k, and proving thatthis implies it holds for (k + 2). The induction base (k = 3) holds byLemma 3, since by this lemma,max3 1 +p3pp= 1 +2 (2p3)pp 1 + 223(2p3)pp.The other induction base (k = 4) also holds by Theorem 29. Settingk = 4 in this theorem, we obtain 4 313cpp , which impliesmax4 1 +3 13cpp= 1 +3 23 12cpp= 1 +3 23(2p3)ppNext, consider Gmaxk+2, the (k + 2) ⇥ (k + 2) matrix obtained from theGramian matrix indexed by the set = {r1, r2, ..., rk+2} ✓ {1, 2, ..., p}(with || = k + 2,) and we write it in the following form.Gmaxk+2 =24 1 b cb⇤ 1 dc⇤ d⇤ Q35 (4.19)Here, b = h˜r1 , ˜r2i, c = (c1, ..., ck), d = (d1, ..., dk), ci = h˜r1 , ˜ri+2i,di = h˜r2 , ˜ri+2i (with 1 i k). By the construction of ˜, eachnon-diagonal entry of Gmaxk+2 (including b, ci’s and di’s) is ± ipp . On the1034.4. A generalized Dembo approachother hand, we know by Lemma 5, that the maximum eigenvalue ofGmaxk+2 is bounded from above by the maximum eigenvalue ofBmaxk+2 =24 1 b cb⇤ 1 dc⇤ d⇤ ⌘kI35 (4.20)where ⌘k is the upper bound on the maximum eigenvalue of Q and byinduction hypothesis can be written in the form ⌘k = 1 + Dpp , withD = k 1 23(2 p3). Next, let = 1 + Cpp , with C > D + 2 =k + 1 23(2 p3). If we show that p() 6= 0, where p(x) is thecharacteristic polynomial ofBmaxk+2 , then this shows that 1+k+1 23 (2p3)ppis an upper bound for the maximum eigenvalue of Bmaxk+2 , and hence, forGmaxk+2 as desired. Note that p(x) = det(Bmaxk+2 xI) can be written asp(x) = det241 x b cb⇤ 1 x dc⇤ d⇤ (⌘k x)I35, where each non-diagonal entryis ±i/pp. Following the notation of Lemma 4, we havep(x) = (⌘k x)k2⇣(1 x))2(⌘k x)2 (1 x)(⌘k x)(dd⇤ + cc⇤)(⌘k x)2bb⇤ + 2(⌘k x)Re(bdc⇤) + ⌘(4.21)where :=Pki=1 |ci|2did⇤i Pki=1 cid⇤idic⇤i . Next, we prove that 0. Note that each entry of vectors c and d is ± ipp . Thus,Pki=1 cic⇤idid⇤i =k(k1)p2 . Also, for each 1 i k, dic⇤i contains(k 1) terms, each with magnitude 1p . Thus,Pki=1 cid⇤idic⇤i k(k1)p2 .This implies that 0.Furthermore, since C > D, we have ⌘k < 0. Accordingly, to showp() 6= 0, considering the fact that 0, it is enough to show thatq() = (1 )2(⌘k ) (1 )(dd⇤ + cc⇤) (⌘k )bb⇤ < 0where we used the fact that Re(bdc⇤) = 0. Also, using ⌘k = 1 + Dpp , = 1 + Cpp , c⇤c = kp , and d⇤d = kp , we haveq() =1ppp⇣C2(D C) + C(2k + 1)D⌘(4.22)1044.4. A generalized Dembo approachTo show that q() < 0 for any C > D + 2, first let C = D + 2 =k 1/3 + 2p3/3, and recall that D = k 7/3 + 2p3/3. Then,q() =1ppp⇣ 2C2 + C(2k + 1) (k 7/3 + 2p3/3)⌘=1ppp⇣ 2(k + 2p33 13)2 + (k +2p33 13)(2k + 1) k + 73 2p33⌘=1ppp⇣ 23(2p3 1)k + 2 29(2p3 1)2⌘ 1ppp⇣ 23(2p3 1)k + 0.651⌘ < 0for k 1.On the other hand, if we substitute D = k 7/3 + 2p3/3, and subse-quently differentiate the right hand side of (4.22), we obtainddC⇣C2(k 7/3 + 2p3/3 C) + C(2k + 1)⌘= 3C2 + 2(k 7/3 + 2p3/3)C + 2k + 1 := g(C) < 0for C > k. This is because g(k) = k2 + (8/3 + 4p3/3)k + 1 < 0,and g(C) is decreasing for C > 2k14/3+43 (p3)6 . Therefore, q() < 0 forevery = 1 + Cpp and C > k 1/3 + 2p3/3. Hence, an upper boundfor the maximum eigenvalue of (k+2)⇥ (k+2) matrix R2 (and hencefor Gmaxk+2) is = 1 +k1/3+2p3/3pp , as desired. Note this bound onlydepends on ||, and not the elements of .2. We will prove that min(Gmink ) 1 k1 23 (2p3)pp for k 3 usinginduction. The proof is similar to above. For the sake of completeness,we state it briefly. The induction base (k = 3) holds by Lemma 3,since by this lemma,min3 1p3pp= 1 2 (2p3)pp 1 223(2p3)ppThe other induction base (k = 4) also holds by Theorem 29: Set k = 41054.4. A generalized Dembo approachin this theorem. Then we obtain 4 313cpp , which impliesmin4 13 13cpp= 1 32312cpp= 1 323(2p3)ppNext, consider the (k + 2)⇥ (k + 2) Gramian matrix Gmink+2. We writethis matrix of the form given in (4.19), and we consider the matrixBmink+2 of the form given in (4.20), and with ⌘k replaced by ⌘1, namely,the lower bound for the minimum eigenvalue of Q. We write ⌘1 of theform ⌘1 = 1D/pp, and we consider := 1C/pp with C > D+2.We will consider p(x), the characteristic polynomial of Bmink+2, and willshow that p() 6= 0. To do so, it is enough to show thatq() = (1 )2(⌘k ) (1 )(dd⇤ + cc⇤) (⌘k )bb⇤ > 0The expression for q() can be simplified asq() =1ppp⇣C2(C D) C(2k + 1) +D⌘(4.23)Now, let C = k 1/3 + 2p3/3, and recall that D = k 7/3 + 2p3/3.Next,q() =1ppp⇣2C2 C(2k + 1) + (k 7/3 + 2p3/3)⌘ 1ppp⇣23(2p3 1)k 0.651⌘ > 0for k 1.On the other hand, if we substitute D = k 7/3 + 2p3/3, and subse-quently differentiate the right hand side of (4.23), we will getddC⇣C2(k + 7/3 2p3/3 + C) C(2k + 1)⌘= 3C2 2(k 7/3 + 2p3/3)C 2k 1 := g(C) > 0for C > k. This is because g(k) = k2 + (8/3 4p3/3)k 1 > 0,and g(C) is increasing for C > 2k14/3+43 (p3)6 . Therefore, q() > 0 for1064.5. A path to break the square-root barrier using Dembo boundsevery = 1 Cpp and C > k 1/3+ 2p3/3. Hence, a lower bound forthe minimum eigenvalue of (k+2)⇥ (k+2) matrix R2 (and hence forGmink+2) is = 1 k1/3+2p3/3pp , as desired. Note that this bound alsoonly depends on ||, and not the elements of .Considering the calculations we did for the maximum and minimumeigenvalues of the Gramian matrices R1 and R2, we conclude that theRIP constant of order (k + 2) satisfiesk+2 k 1/3 + 2p3/3ppproving the theorem using induction.4.5 A path to break the square-root barrier usingDembo boundsIn this section, we propose an approach that can lead to breaking the square-root barrier for the construction given in Definition 11, if a specific conjec-ture regarding the distribution of quadratic residues holds. Our approach isbased on the generalized Dembo bounds as derived and explained in section4.4. Let denote the measurement matrix as defined in Definition 11. Wesaw in Section 4.2 that if all upper diagonal entries of the Gramian matrixG = ⇤TT , corresponding to the index set T = {r1, r2, ..., rk}, are i/pp(and all the lower elements are i/pp) then we would have a multiplicativeimprovement for Gershgorin bound but the square root barrier can not bebroken. Therefore, in such case,⇣r1 r3p⌘=⇣r1 r4p⌘= ... =⇣r1 rkp⌘= 1,⇣r2 r3p⌘=⇣r2 r4p⌘= ... =⇣r2 rkp⌘= 1If we re-tag the columns as r01 = rk, r02 = rk1, ..., r0k := r1, then⇣r01 r03p⌘=⇣r01 r04p⌘= ... =⇣r01 r0kp⌘= 1,⇣r02 r03p⌘=⇣r02 r04p⌘= ... =⇣r02 r0kp⌘= 11074.5. A path to break the square-root barrier using Dembo boundsIn either case, we have Xi2I(i)(i+ a) = |I|where (x) =⇣xp⌘denotes the Legendre symbol (and hence, is a Dirichletcharacter), I := {r1 r3, r1 r4, ..., r1 rk}, and a = r2 r1. Therefore, onecan hope that if the opposite to this situation occurs in the following sense,then the square-root barrier may be broken.Conjecture 1. There exists constants 0 < ↵ < 1, and ⌫ > 1/2, and apositive integer m↵ such that for any set {r1, ..., rk} in Zp, with m↵ k p⌫there exist indices 1 i < j k satisfying the following inequality|P`2Iri,rj (`)(`+ a)||Iri,rj |< ↵where (x) =⇣xp⌘, a = rjri, and Iri,rj := {rir` : 1 ` k, ` 6= i, ` 6= j}.We call Iri,rj a one-sided difference set.Note that this conjecture is not just based on what is needed to breakthe square-root barrier, but also based on a similar result already known inNumber Theory. We briefly mention this result as stated in [75].Let G be a finite (additive) group, and let D be a subset of G (called adifference set) with k elements such that every non-zero element of G canbe uniquely written as d1 d2. Then,|Xd2D(d)| = pk 1Hence, for any 0 < ↵ < 1, we have|Pd2D (d)||D| =pk 1k< ↵provided that k is sufficiently large. We also verify this conjecture numeri-cally with few examples.Set ↵ := 0.8, and m↵ := 5. The first prime satisfying p m↵ is p = 5in which case we should start with a set with 5 elements. We only have onechoice, and that is A = Z5 (or any permutation of it). Setr1 = 0, r2 = 1, r3 = 2, r4 = 3, r5 = 41084.5. A path to break the square-root barrier using Dembo boundsThen,⇣r1 r3p⌘⇣r2 r3p⌘= 1,⇣r1 r4p⌘⇣r2 r4p⌘= 1,⇣r1 r5p⌘⇣r2 r5p⌘= 1Thus,|P5i=3 ⇣ r1rip ⌘⇣ r2rip ⌘|3= 1/3 < ↵We also verify for p = 19. As we know, we should start with a set with at least5 elements, say, we start with a set with 12 elements. We choose a randomsupport set T with 12 elements, say T = {8, 15, 5, 13, 10, 2, 17, 4, 1118, 16, 19}(i.e., r1 = 8, r2 = 15, ..., r12 = 19). Then, among 10 elements of the sequence{⇣r1r3p⌘·⇣r2r3p⌘, ....,⇣r1r012p⌘·⇣r2r12p⌘}, we have three -1’s and seven 1’s.Hence,|P12i=3 ⇣ r1rip ⌘⇣ r2rip ⌘|10= 0.4 < ↵In the next experiment, we again work with p = 19, but we try to considerthe worst case (that could potentially fail the conjecture). Such case wouldoccur if we choose a set {r1, ..., rk} such that⇣rirjp⌘= 1 whenever i < j.In particular, we can choose T := {1, 2, 18, 16, 15, 14, 8, 7, 6, 4}. Then, all10 elements of the sequence {⇣r1r3p⌘·⇣r2r3p⌘, ....,⇣r1r12p⌘·⇣r2r12p⌘} are1’s which is opposite to what we need. However, we can simply chooser01 := r7 = 8, and r02 = r8 = 7, {r03, ..., r010} = T \ {7, 8} (the order isirrelevant). Then, we would have|P12i=3 ⇣ r1rip ⌘⇣ r2rip ⌘|10=210< ↵Therefore, in all these experiments the conjecture was verified for ↵ = 0.8(in these cases, we could even choose a smaller ↵, e.g., ↵ = 0.5).Now, based on this conjecture, we prove that the square-root barrier canbe broken for the construction given in Definition 11. In the following, weprovide a proof using induction on k. For the induction base, we need to usea value for the power in k kpp . Since numerical experiments (see Figure4.1) suggests < 0.7, we use = 0.7 as it seems that this is the value thatworks for any k-value.1094.5. A path to break the square-root barrier using Dembo boundsProposition 11. (breaking the square-root barrier). Suppose Conjecture 1holds with ↵, ⌫,m↵ as defined in the statement of this conjecture. Let = 0.7,and let c↵ be a fixed integer such that the following inequality holds:12c1+↵ < (1 ↵)c2↵ 2c↵,Also, let b↵ = max{m↵ + 2, c↵ + 2}, and suppose that p b2↵. Then, for theconstruction given in Definition 11, and for k min{p⌫2 , p122 }, we havek < 1/p2Hence, the square-root barrier would be broken for this construction.Remark 15. Here, we make the above statement more concrete. Set ↵ =0.8,m↵ = 5, and ⌫ = 0.8. Then, c↵ is the smallest integer satisfying12x1.7 0.2x2 2xNumerically we check that we can set c↵ = 899, 998, and this gives b↵ =900, 000. So the proposition above then reduces to:“Suppose Conjecture 1 holds with the values mentioned above, and letp ⌘ 3 mod 4 be a prime number satisfying p 81 ⇥ 1010, and considerthe construction given in Definition 11. Then, the RIP constants of suchconstruction satisfy 2k < 1/p2 for any k < p5/72 ".Proof of Proposition 11. First, let k b↵. Then, k pp, and hence, byTheorem 27, the RIP constant of the measurement matrix ˜ satisfiesk 2⇡kpp 2⇡<1p2as desired. Next, let k = k0 b↵ be an arbitrary integer (with k0 p⌫),and we prove thatk kpp(4.24)holds for k = k0 using induction. To do that, first verify (4.24) for k = b↵ 1, b↵ 2 numerically (induction base). Now, since k0 m↵, by Conjecture1, there exist indices 1 < i < j k0 such that if we set r01 = ri, r02 = rj , thenPk0`=3 ⇣ r01r0`p ⌘⇣ r02r0`p ⌘k0 2 < ↵ (4.25)1104.5. A path to break the square-root barrier using Dembo boundswhere r03, ..., r0k0 are the elements of the set {r1, ..., rk0} \ {r01, r02} (the orderis irrelevant). In the next step, we consider I3 := {r03, r04, ..., r0k0}, and ifk0 2 = |I3| m↵, then we apply Conjecture 1 again, and after possibly apermutation, we can assume thatPk0`=5 ⇣ r03r0`p ⌘⇣ r04r0`p ⌘k0 4 < ↵. (4.26)Continuing this process, in the last step we reachPk0`=k0b↵+2 ⇣ r0k0b↵r0`p ⌘⇣ r0k0b↵+1r0`p ⌘b↵ 1 < ↵ (4.27)if k0 ⌘ b↵ + 1 mod 2, andPk0`=k0b↵+3 ⇣ r0k0b↵+1r0`p ⌘⇣ r0k0b↵+2r0`p ⌘b↵ 2 < ↵ (4.28)if k0 ⌘ b↵ mod 2. Now, let maxk0 and mink0 denote the maximum and minimumeigenvalues of the Gramian matrices of order k0 with the largest maximumeigenvalue and the smallest minimum eigenvalues respectively. We prove thetheorem using two main steps.Step 1: Proving maxk0 1 +k0pp . As mentioned above, our inductionbase consists of verifying (4.24) for k = b↵ 1, b↵ 2 numerically. Next, weconsider two cases.Case (I): k0 ⌘ b↵ mod 2. Our goal is to estimate the eigenvalues of amatrix of the form G = Gmaxk0 = ⇤TT , with T = {r1, r2, ..., rk0}. Afterpossibly a proper permutation, we can assume that T = {r01, ..., r0k0} is suchthat all of (4.25), (4.26),...,(4.28) hold. Now, let T1 = {rk0b↵+3, ..., rk0},k1 := b↵2, G1 = ⇤T1T1 (G1 is obtained by considering the last k1 = b↵2rows and columns of G). We start by estimating eigenvalues of this matrix.By induction hypothesis, an upper bound for the largest eigenvalue of G1 isgiven by⌘k1 = 1 +k1ppNow, we show that⌘k2 = 1 +(k1 + 2)pp1114.5. A path to break the square-root barrier using Dembo bounds(with k2 = k1 + 2) is an upper bound for the maximum eigenvalue of theGramian matrix G2 obtained by considering the last k1+2 rows and columnsof G. In order to do that, we write G2 in the formG2 =24 1 b cb⇤ 1 dc⇤ d⇤ G135with c = (c1, ..., ck), d = (d1, ..., dk), ci = h˜rk0k11 , ˜rk0k1+ii, and di =h˜rk0k1 , ˜rk0k1+ii (with 1 i k1). Note that each of the entries b, ci’sand di’s are ± ipp . (similar to what we did in (4.19)). We also define B2 viaB2 =24 1 b cb⇤ 1 dc⇤ d⇤ ⌘k135 (4.29)where ⌘k1 = 1 + Dpp , with D = k1 , and we let = 1 + C/pp with C >(k1 + 2) . We will show that p() 6= 0, where p(x) is the characteristicpolynomial of B2. First, we write the expression for p(x) as given in (4.21).p(x) = (⌘k x)k2⇣(1 x))2(⌘k x)2 (1 x)(⌘k x)(dd⇤ + cc⇤)(⌘k x)2bb⇤ + 2(⌘k x)Re(bdc⇤) + ⌘Using Re(bdc⇤) = 0, c⇤c = d⇤d = k1p , and ⌘k 6= 0, we observe that toshow p() 6= 0, it is enough to showq(C) := C2(D C)2 (C)(D C)(2k1) (D C)2 + > 0 (4.30)i.e.,q(C) = C2(C D)2 C(C D)(2k1) (C D)2 + > 0In order to show this inequality, we show thatC(C D)(2k1) + (C D)2 < (4.31)where we used the fact that 0. We call the left hand side and right handside of the inequality above as LHS and RHS respectively. Next,LHS = (C D)⇣2k1C + (C D)⌘= (C D)⇣(2k1 + 1)C D⌘1124.5. A path to break the square-root barrier using Dembo boundsOur goal is to prove (4.31) for any C > (k1+2) , but we start by consideringC = (k1 + 2) . Then,C D = (k1 + 2) k1 < 2where we used the fact that if we set f(x) = ax (a 2)x, then f(1) = 2,and f(x) is increasing for 0 x 1. Hence,LHS 2⇣(2k1 + 1)(k1 + 2) k1⌘< 2⇣(3k1)(2k1)⌘< 12k1+1where we used the fact that 2k1 + 1 < 3k1, and k1 + 2 < 2k1.On the other hand,RHS = =k1Xi=1cic⇤idid⇤i k1Xi=1cid⇤idic⇤i = k1(k1 1)k1Xi=1cid⇤idic⇤iIn above, ci, di are vectors c and d excluding their ith entry (so they are (k11)-dimensional vectors). Now, we find an upper bound forPk1i=1 cid⇤idic⇤i :k1Xi=1cid⇤idic⇤i k1Xi=1|d⇤i ci| k1maxi|d⇤i ci|On the other hand, for each 1 j k1, by (4.28), we have|d⇤jcj | k0Xi=k0k1+1⇣r0k0k11 r0ip⌘⇣r0k0k1 r0ip⌘+ 1 ↵k1 + 1Thus, k1(k1 1) ↵k21 k1 = (1 ↵) · k21 2k1and hence, 12k1+1 . In above, we used the fact that for k c↵, we have12k1+ (1↵)k2 2k. Therefore, q(C) > 0 for C = (k1+2) . Next, notethat the value of does not depend on C, and note that CD is increasing asa function of C. Hence, if we show that g(C) := C2(CD)2k1C(CD)is an increasing function of C, then we can conclude that q(C) > 0 forC > (k1 + 2) . Now, we haveddCg(C) = 3C2 2k1 2CD 1 = 3C2 2C · k1 2k1 1 > 01134.5. A path to break the square-root barrier using Dembo boundsfor C (k1 + 2) . To justify the last inequality, we define h(C) := 3C2 2C · k1 2k1 1, and we verify that h⇣(k1 + 2)⌘> 0, and ddCh(C) > 0 forC > (k1 + 2) :h⇣(k1 + 2)⌘= 3(k1 + 2)2 2(k21 + 2k1) 2k1 1= 3(k21 + 4k1 + 4) 2(k21 + 2k1) 2k1 1 3(k21 + 2k1) 2(k21 + 2k1) 2k1 1 = (k21 + 2k1) 2k1 1 > 0,for k1 4. We also haveddC(3C2 2Ck1 2k1 1) = 6C 2k1 > 0for C > k1 /3. Therefore, if we set k2 := k1+2,we have shown that 1+(k2)pp isan upper bound for the maximum eigenvalue of G2 = T2T2 , with T2 beingthe set containing the last k2 entries of T , i.e., T2 = T1[{rkb↵+2, rkb↵+1}.After repeating a similar reasoning mentioned above, we can conclude that1+k3pp is an upper bound for G3 = ⇤T3T3 , with k3 := k2 +2, and T3 beingthe set containing the last k3 elements of T . By continuing this process, weconclude that in the last step, 1 + kp`p is an upper bound for the maximumeigenvalue of G` = G = T`T` , where k` = k0, and T` = T .Case (II). k0 ⌘ b↵+1 mod 2. In this case, to find an upper bound for themaximum eigenvalue ofG = Gmaxk0 = ⇤TT (again with T = {r1, r2, ..., rk0}),we begin with estimating the eigenvalues of G1 := ⇤T1T1 , with T1 being theset containing the last k1 := b↵ 1 elements of T . By induction hypothesis,the upper bound for the largest eigenvalue of G1 is given by1 +k1ppNext, it can be shown (similar to above) that 1 + k2pp is an upper bound forthe largest eigenvalue of G2 = ⇤T2T2 , with k2 := k1+2, and T2 being the setcontaining the last k2 entries of T . Continuing this process, we conclude thatthe 1+ kp`p is an upper bound for the largest eigenvalue of G = G` = ⇤T`T` ,with k` = k0 and T` = T .Step 2: Proving mink0 1 k0pp . The proof of this step is similar tothe one given for Step 1 (and consists of two cases). In each case, we startby concluding from induction hypothesis that ⌘1,k1 := 1 Dpp , with D = k11144.5. A path to break the square-root barrier using Dembo boundsis a lower bound for the minimum eigenvalue of G1 = ⇤T1T1 . Then, welet = 1 Cpp with C > D + 2, and we show that p() 6= 0, where p(x)is the characteristic polynomial of the matrix B02 (obtained from the matrixB2 as given in (4.29), but ⌘k1 replaced by ⌘1,k1). To do so, it is enough toshow (4.30) holds, which we already know its validity as proved in Step 1.Therefore, ⌘1,k2 = 1 k2pp (with k2 = k1+2) is a lower bound for the minimumeigenvalue of B02 (and hence the minimum eigenvalue of G2). Continuing thisprocess, after ` steps, we conclude that ⌘1,k` = 1 kp`p is a lower bound forthe minimum eigenvalue of G` = G = ⇤T`T` , with k` = k0, and T` = T .Gathering the results proved in Steps 1 and 2, we conclude that the RIPconstants of ˜ satisfyk = max{maxk 1, 1 mink } kppfor b↵ k p⌫ . Therefore, k < 1/p2 for k min{p⌫2 , p122 }, as desired.115Chapter 5Concluding remarksCompressed sensing (CS) is a sampling paradigm in which signals are consid-ered as vectors in Rn, for a large n, and are generally assumed to be sparseor compressible with respect to a basis or frame. Under such assumption,the goal of CS is to recover the original signal with seemingly insufficientnumber of measurements using a proper measurement matrix and an effi-cient algorithm. The standard results of CS indicate that certain randommatrices can be used as measurement matrices, and they satisfy “RestrictedIsometry Property" (RIP) with high probability, for optimal number of mea-surements. Using these matrices comes with one caveat: storing these ma-trices are costly and for this reason certain deterministic matrices have beenconsidered. Using these matrices also comes with one significant caveat: inorder for these matrices to satisfy RIP, the number of measurements mustsatisfy m = O(k2), i.e., k = O(pm). This is called the “square-root barrier"for deterministic matrices in CS, which in fact is derived by combining theso-called Welch bound with Gershgorin circle theorem.In Chapter 3, we proposed a novel deterministic construction which hasthe advantage of being partial circulant, binary, and each entry of this con-struction has a simple and explicit formula. This means that these matricesare easy to store, and also using FFT, these matrices come with fast matrix-vector multiplication, and a fast reconstruction algorithm. We proved thatthese matrices have small coherence (and hence, can be used as CS matri-ces). In addition, we provided numerical experiments comparing our pro-posed construction with some existing deterministic constructions. In fact,we observed that our construction has a better performance compared to thedeterministic constructions considered here, and has a performance compa-rable with random Bernoulli matrices.In today’s digital world, quantizing the measurement vector is a crucialstep in the sampling process, which was mostly ignored in early literatureof CS. One known efficient method of quantization in CS is a method calledrth-order ⌃ quantization, which was accompanied with a one-stage recon-struction method. This method was shown to be robust respect to noiseand stable respect to compressible signals, but came with one caveat: it116Chapter 5. Concluding remarkswas applied only for the class of sub-Gaussian matrices. In Chapter 2, weproposed two novel approaches to generalize this method to random restric-tions of bounded orthonormal systems, such as random restrictions of DFTmatrices (which are of high importance due to the applications in MRI). Wealso generalized this method to certain class of deterministic measurementmatrices, namely, certain submatrices of chirp sensing matrices. For eachof these cases, we provided numerical experiments confirming the boundsderived for the errors in approximation.In CS, the performance of a measurement matrix is normally judged bythe estimates for its RIP constants, since calculating the exact values of RIPconstants is an NP-hard problem– at least in a vast regime for number ofmeasurements m (vs. the sparsity level k). A common bound for RIP con-stants of a matrix, namely, k (k 1)µ is derived by applying Gershgorincircle theorem on Gramian matrices. However, one should note that Gersh-gorin circle theorem estimates the eigenvalues of a matrix uniformly, whilethe RIP constants depend only on the maximum and minimum eigenval-ues of the Gramian matrices. Furthermore, Gershgorin circle theorem canbe applied to any square matrix, and hence it does not use the fact thatthe Gramian matrices are positive semidefinite. In Chapter 4, we deployedthe so-called Dembo bounds, which estimate the maximum and minimumeigenvalues of a positive semidefinite matrix, to improve the classical boundk (k 1)µ by an additive constant for the so-called Paley tight frames.However, we showed that this method has a great potential in general. Infact, we showed that if a particular conjecture regarding the distribution ofquadratic residues holds, then we can generalize Dembo bounds to break thesquare-root barrier via k = O(m5/7). We substantiated this conjecture bynumerical experiments, and we also theoretically discussed it. Furthermore,we used the notion of skew-symmetric adjacency matrices and a recent (2018)result regarding a bound on the spectral radius of an oriented graph to derivea multiplicative constant improvement on the classical bound k (k 1)µfor the Paley tight frames. In particular, we showed that the maximumsparsity level satisfies 2k < ⇡2 · 1µp2 (opposed to 2k < 1µp2 + 1).117Bibliography[1] A. Amini and F. Marvasti. Deterministic construction of binary, bipolarand ternary compressed sensing matrices. Information Theory, IEEEtransactions on, 57(4):2360–2370, 2011.[2] L. Applebaum, S. D. Howard, S. Searle, and R. Calderbank. Chirp sens-ing codes: Deterministic compressed sensing for fast recovery. Appliedand Computational Harmonic Analysis, 26(2):283–290, 2009.[3] W. Bajwa, J. Haupt, G. Raz, and R. Nowak. Compressed channelsensing. Proc. of Conf. on Information Sciences and Systems, 2008.[4] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak. Toeplitz-structured compressed sensing matrices. IEEE Workshop SSP., 2007.[5] T. Banachiewicz. Zur berechnung der determinanten, wie auch der in-versen, und zur darauf basierten auflosung der systeme linearer gle-ichungen. Acta Astronomica, Serie C(3):41–67, 1937.[6] A. Bandeira, M. Fickus, D. G. Mixon, and J. Moreira. Derandomizngrestricted isometries via the legendre symbol. Constructive Approxima-tion, June 2014.[7] A.S. Bandeira, M. Fickus, D. Mixon, and P. Wong. The road to de-terministic matrices with the restricted isometry property. Journal ofFourier Analysis and Applications, 19:1123–1149, 2013.[8] H. E. Bell. Gerschgorin’s theorem and the zeros of polynomials. Amer.Math. Monthly, 72:292–295, 1965.[9] J. Benedetto, A. Powell, and O. Yilmaz. Sigma-delta quantization andfinite frames. IEEE International Conference on Acoustics, 52(5):1990–2005, 2004.[10] J. Benedetto, A. Powell, and O. Yilmaz. Second-order sigma-deltaquantization of finite frame expansions. Applied and ComputationalHarmonic Analysis, 20(1), 2006.118Bibliography[11] J. Blum, M. Lammers, A. Powell, and O. Yilmaz. Sobolev duals in frametheory and sigma-delta quantization. Journal of Fourier Analysis andApplications, 16(3):365–381, 2010.[12] B. Bodmann and V. Paulsen. Frames, graphs, and erasures. LinearAlgebra and its Applications, 404, 2005.[13] P. Boufounos, L. Jacques, F. Krahmer, and R. Saab. Quantization andcompressed sensing, chapter in “Compressed sensing and its applica-tions" (edited by H. Boche, R. Calderbank, G. Kutyniuk, J. Vybiral).Springer, 2015.[14] P. T. Boufounos and R. G. Baraniuk. One-bit compressive sensing.Proc. of Conf. on Information Sciences and Systems, March 2008.[15] J. Bourgain, S. Dilworth, K. Ford, S. Konyagin, and D. Kutzarova.Breaking the k2 barrier for explicit rip matrices. STOC, 2011.[16] J. Bourgain, S. Dilworth, K. Ford, S. Konyagin, and D. Kutzarova.Explicit constructions of rip matrices and related problems. Duke Math,159(1):145–185, 2011.[17] J. Bourgain and A. Glibichuk. Exponential sum estimate over subgroupin an arbitrary finite field. Journal d’Analyse Mathematique, 115(1):51–70, 2011.[18] T. T. Cai and A. Zhang. Sparse representation of a polytope and re-covery of sparse signals and low-rank matrices. IEEE Transactions onInformation Theory, 60(1), January 2014.[19] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principle: Ex-act signal reconstruction from highly incomplete frequency information.IEEE Transactions on Information Theory, 52(2):489–509, 2006.[20] E. Candes, J. Romberg, and T.Tao. Stable signal recovery from in-complete and inaccurate measurements. Communications on pure andApplied Mathematics, 59(8):1207–1223, August 2006.[21] E. Candes and T. Tao. Near-optimal signal recovery from random pro-jections: universal encoding strategies ? IEEE Transactions on Infor-mation Theory, 52:5406–5425, 2006.[22] E. Candes and T. Tao. Decoding by linear programming. InformationTheory, IEEE transactions on, 51(12), December 2005.119Bibliography[23] Z. Chen and J. Dongarra. Condition numbers of gaussian random ma-trices. SIAM J. Matrix Anal. and Appl., 27(3):603–620, 2005.[24] E. Chou, S. Gunturk, F. Krahmer, R. Saab, and O. Yilmaz. Noise-shaping quantization methods for frame-based and compressive samplingsystems, chapter 4 in "Sampling Theory, A Renaissance" (edited by G.Pfander). Birkhauser, Boston, 2015.[25] X. Cui. Construction of deterministic measurement matrices using dec-imated legendre sequence. MATEC Web of conferences, page 22, 2015.[26] A. Dembo. Bounds on the extreme eigenvalues of positive-definitetoeplitz matrices. Information Theory, IEEE transactions on,34(2):352–355, 1988.[27] B. Deng, X. Li, B. Shader, and W. So. On the maximum skew spectralradius and minimum skew energy of tournaments. Linear and Multilin-ear Algebra, 66(7), 2018.[28] R. DeVore. Deterministic cinstructions of compressed sensing matrices.Journal of Complexity, 23(4-6):918–925, 2007.[29] S. Dirksen, H. Jung, and H. Rauhut. One-bit compressed sensing withpartial gaussian circulant matrices. ArXiv preprint ArXiv: 1710.03287,2017.[30] D. Donoho. Compressed sensing. IEEE Transactions on Signal Pro-cessing, 52(4):1289–1306, 2006.[31] D. Donoho, M. Elad, and V. Telmyakov. Stable recovery of sparse over-complete representations in the presence of noise. Information Theory,IEEE transactions on, 52(1):6–18, January 2006.[32] J. Feng, F. Krahmer, and R. Saab. Quantized compressed sensing forpartial random circulant matrices. 2017 International Conference onSampling Theory and Applications, 2017.[33] M. Fornasier. Numerical methods for sparse recovery, chapter 2 in:“Theoretical foundations and numerical methods for sparse recovery"(edited by M. Fornasier). Walter de Gruyter & Co., 2010.[34] S. Foucart. Sparse recovery algorithms: sufficient conditions in terms ofrestricted isometry constants. Chapter 5 in: Approximation theory XIII:San Antonio 2010, volume 13. Springer Proceedings in Mathematics,2012.120Bibliography[35] S. Foucart. Stability and robustness of `1 minimization with weibull ma-trices and redundtant dictionaries. Linear Algebra and its Applications,441(15):4–21, 2014.[36] S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The gellfand widths oflp-balls for 0 < p 1. Journal of Complexity, 26(6):629–640, December2010.[37] S. Foucart and H. Rauhut. A mathematical introduction to compressivesensing. Birkhauser Verlag, 2013.[38] A. Y. Garnaev and E. D. Gluskin. On widths of the euclidean ball.Soviet Mathematics. Doklady, 30(200-204), 1984.[39] S. Gerschgorin. Uber die abgrenzung der eigenwerte einer matrix. Bul-letin de l’Academie des Sciences de l’URSS. Classe des sciences mathe-matiques et na, (6):749–754, 1931.[40] C.S. Gunturk, M. Lammers, A. Powell, R. Saab, and O. Yilmaz. Sobolevduals for random frames and sigma-delta quantization of compressedsensing measurements. Foundation of Computational Mathematics,13(1):1–36, 2013.[41] S.D. Howard, A. R. Calderbank, and S. J. Searle. A fast reconstructionalgorithm for deterministic compressive sensing using second order reed-muller codes. Proc. of Conf. on Information Sciences and Systems,pages 11–15, 2008.[42] H. Inose and Y. Yasuda. A unity bit coding method by negative feed-back. Proc. IEEE, 51(11):1524–1535, 1963.[43] K. Ireland and M. Rosen. A classical introduction to modern numbertheory. Springer Science+ Business Media New York, 2000.[44] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors.IEEE Transactions on Information Theory, 59(4), April 2013.[45] G. A. Jones and J. M. Jones. Elementary Number Theory. Berlin:Springer-Verlag, 1998.[46] F. Krahmer, S. Mendelson, and H. Rauhut. Suprema of chaos processesand the restricted isometry property. Communications on pure andApplied Mathematics, 67(11):1877–1904, 2014.121Bibliography[47] F. Krahmer, R. Saab, and R. Ward. Root-exponential accuracy forcoarse quantization of finite frame expansions. Information Theory,IEEE transactions on, 58(2):1069–1079, 2012.[48] F. Krahmer, R. Saab, and O. Yilmaz. Sigma-delta quantization of sub-gaussian frame expamsions and its application to compressed sensing.Information and Inference, 3(1):40–58, 2014.[49] F. Krahmer and R. Ward. Lower bounds for the error decay incurredby coarse quantization schemes. Applied and Computational HarmonicAnalysis, 32(1):131–138, 2012.[50] K. Li, L. Gan, and C. Ling. Convolutional compressed sensing us-ing deterministic sequences. IEEE Transactions on Signal Processing,61(3):740–752, 2013.[51] S. Li, F. Gao, G. Ge, and S. Zhang. Deterministic construction ofcompressed sensing matrices via algebraic curves. Information Theory,IEEE transactions on, 58(8):5035–5041, 2012.[52] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressedsensing mri. IEEE Signal Processing Magazine, 25(2):72–82, 2008.[53] F. J. MacWilliams and N. J. A. Sloane. The theory of error correctingcodes. Elsevier, 1976.[54] K. Melnykova and O. Yilmaz. Memoryless scalar quantization for ran-dom frames. preprint, 2018.[55] S. Mendelson and A. Pajor. On singular values of matrices with inde-pendent rows. Bernoulli, 12(5):761–773, 2006.[56] D. G. Mixon. Explicit matrices with the rip: Breaking the square-rootbottleneck. Book chapter, Compressed sensing and applications, 2014.[57] Hugh L. Montgomery. Ten lectures on the interface between analyticnumber theory and harmonic analysis. American Mathematical Society,1994.[58] R. R. Naidu, P. V. Jampana, and C. S. Sastry. Deterministic compressedsensing matrices: Construction via euler squares and applications. IEEETransactions on Signal Processing, 64(14):3566–3575, April 2016.[59] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAMJournal on Computing, 24:227–234, 1995.122Bibliography[60] J. L. Nelson and V. N. Temlyakov. On the size of incoherent systems.Journal of Approximation Theory, 163(9):1238–1245, September 2011.[61] K. Ni, S. Datta, P. Mahanti, S. Roudenko, and D. Cochran. Using reed-muller sequences as deterministic compressed sensing matrices for imagereconstruction. Acoustic Speech and Signal Processing, pages 465–468,March 2010.[62] J. A. Nikara, J. H. Takala, and J. T. Astola. Discrete cosine and sinetransforms: regular algorithms and pipeline architectures. Signal Pro-cessing, 86(2):230–249, 2006.[63] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparselogistic regression: a convex programming approach. Information The-ory, IEEE transactions on, 59(1):482–494, December 2012.[64] Y. Plan and R. Vershynin. One-bit compressed sensing by linearprogramming. Communications on pure and Applied Mathematics,66(8):1275–1297, 2013.[65] A. Powell, R. Saab, and O. Yilmaz. Quantization and finite frames,chapter 8 in “Finite frames: Theory and Applications" (edited by P.Casazza and G. Kutyniok). Birkhauser, Boston, 2012.[66] H. Rauhut. Circulant and toeplitz matrices in compressed sensing. Proc.SPARS’09, 2009.[67] H. Rauhut, J. Romberg, and J. Tropp. Restricted isometries for par-tial random circulant matrices. Applied and Computational HarmonicAnalysis, 32:242–254, 2012.[68] J. Romberg. Compressive sensing by random convolution. SIAM J.Imaging Sci., 2(4):1098–1128, 2009.[69] M. Rudelson and R. Vershynin. Sparse reonstruction by convex relax-ation: Fourier and gaussian measurements. 40th Annual Conference onInformation Science and Systems, pages 207–212, 2006.[70] M. Rudelson and R. Vershynin. Non-asymptotic theory of randommatrices: extreme singular values. Proceedings of the InternationalCongress of Mathematicians, Hyderabad, India, 2010.[71] R. Saab, R. Wang, and O. Yilmaz. From compressed sensing to com-pressed bit-streams: practical enncoders, tractable decoders. IEEETransactions on Information Theory, 64(9):6098–6114, 2017.123Bibliography[72] R. Saab, R. Wang, and O. Yilmaz. Quantization of compressive sampleswith stable and robust recovery. Applied and Computational HarmonicAnalysis, 44(1):123–143, 2018.[73] R. Schreier and G. Temes. Understanding delta-sigma data converters.Wiley, Piscataway, New Jersey, 2004.[74] A. Soshnikov and Y. Fyadorov. On the largest singular values of randommatrices with independent cauchy entries. Journal of MathematicalPhysics, 46, 2005.[75] R. J. Turyn. Character sums and difference sets. Pacific Journal ofMathematics, 15(1), 1965.[76] T. Wang, Q. Berthet, and Y. Plan. Average-case hardness of rip cer-tification. Proceedinngs of the 30th International Conference on Neu-ral Information Processing Systems. NIPS’16, Curran Associates Inc.,USA, 2016.[77] L. R. Welch. Lower bounds on the maximum cross-correlation of signals.IEEE Transactions on Information Theory, 20(3):397–399, 1974.[78] S. Wright. Quadratic residues and non-residues. Springer InternationalPublishing, 2016.[79] O. Yilmaz. Coarse quantization of highly redundant time-frequency rep-resentations of square-integrable functions. Applied and ComputationalHarmonic Analysis, 14(2):107–132, 2003.[80] O. Yilmaz. On coarse quantization of tight gabor frame expansions.International Journal of Wavelets, 3(2):283–299, 2005.[81] G. Zhang, R. Mathar, and Q. Zhou. Deterministic bipolar measurementmatrices with flexible sizes from legendre sequence. Electronic Letters,52(11), May 2016.[82] G. Zhang and Q. Zhou. Pseudonoise codes constructed constructed bylegendre sequence. Electronic Letters, 38(8):376–377, 2002.124
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Compressed sensing with deterministic measurement matrices
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Compressed sensing with deterministic measurement matrices Arian, Arman 2019
pdf
Page Metadata
Item Metadata
Title | Compressed sensing with deterministic measurement matrices |
Creator |
Arian, Arman |
Publisher | University of British Columbia |
Date Issued | 2019 |
Description | Compressed sensing (CS) is a signal acquisition paradigm to simultaneously acquire and reduce dimension of signals that admit sparse representations. This is achieved by collecting linear, non-adaptive measurements of a signal, which can be formalized as multiplying the signal with a “measurement matrix". If the measurement matrix satisfies the so-called restricted isometry property (RIP), then it will be appropriate for compressed sensing. While wide classes of random matrices provably satisfy the RIP with high probability, explicit and deterministic constructions have been shown (so far) to satisfy the RIP only in a significantly suboptimal regime. In this thesis, our focus is on deterministic measurement matrices in compressed sensing. In a nutshell, we investigate quantization methods for a class of deterministic matrices (Chapter 2); introduce a novel deterministic construction of a family of binary, circulant measurement matrices using the Legendre symbol (Chapter 3); and propose two novel approaches for improving the RIP constant estimates based on Gershgorin circle theorem, obtaining an improvement over the Gershgorin bounds by a multiplicative constant. One of our approaches here, together with a conjecture we make regarding the distribution of quadratic residues in a finite field provides a potential path to break the so-called “square-root barrier"–we give a proof based on the assumption that the conjecture holds. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2019-09-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0381024 |
URI | http://hdl.handle.net/2429/71789 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2019-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2019_november_arian_arman.pdf [ 2.42MB ]
- Metadata
- JSON: 24-1.0381024.json
- JSON-LD: 24-1.0381024-ld.json
- RDF/XML (Pretty): 24-1.0381024-rdf.xml
- RDF/JSON: 24-1.0381024-rdf.json
- Turtle: 24-1.0381024-turtle.txt
- N-Triples: 24-1.0381024-rdf-ntriples.txt
- Original Record: 24-1.0381024-source.json
- Full Text
- 24-1.0381024-fulltext.txt
- Citation
- 24-1.0381024.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0381024/manifest