Using Prior Support Information in CompressedSensingbyNavid GhadermarzyB.Sc., Sharif University of Technology, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Mathematics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2013c Navid Ghadermarzy 2013AbstractCompressed sensing is a data acquisition technique that entails recovering estimates of sparseand compressible signals from n linear measurements, significantly fewer than the signal ambi-ent dimension N . In this thesis we show how we can reduce the required number of measure-ments even further if we incorporate prior information about the signal into the reconstructionalgorithm. Specifically, we study certain weighted nonconvex `pminimization algorithms anda weighted approximate message passing algorithm.In Chapter 1 we describe compressed sensing as a practicable signal acquisition method inapplication and introduce the generic sparse approximation problem. Then we review someof the algorithms used in compressed sensing literature and briefly introduce the method weused to incorporate prior support information into these problems.In Chapter 2 we derive sufficient conditions for stable and robust recovery using weighted `pminimization and show that these conditions are better than those for recovery by regular `pand weighted `1. We present extensive numerical experiments, both on synthetic examplesand on audio, and seismic signals.In Chapter 3 we derive weighted AMP algorithm which iteratively solves the weighted `1minimization. We also introduce a reweighting scheme for weighted AMP algorithms whichenhances the recovery performance of weighted AMP. We also apply these algorithms on syn-thetic experiments and on real audio signals.iiPrefaceChapter 1 is an introduction and motivation for the problems studied in this thesis. Sections1.1 and 1.2 reviews well-known results in the field, no originality claimed. Figures 1.4 and 1.7has been adopted from [2] and [4], respectively, by permission.Chapter 2 is a joint work with Hassan Mansour and ?zg?r Y?lmaz. I conducted all theexperiments. A version of this chapter will be published by the title "Recovering CompressivelySampled Signals by non-convex optimization and using partial support information". Thischapter is based on works conducted by M. P. Friedlander, H. Mansour, R. Saab, and O.Y?lmaz [1] and R. Saab and O.Y?lmaz [2].Chapter 3 incorporates prior support information into the algorithm introduced by D. L.Donoho, A. Maleki, and A. Montanari [3] and is based on their work in [34]. all the lemmasand theorems in [34] has been modified by the author except for Lemma 3.1 which has beenrestated. A version of this chapter will be published by the title "Weighted and reweightedapproximate message passing".iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Compressed sensing and the sparse approximation problem . . . . . . . . . . . 21.2 Alternative algorithms and recovery guarantees . . . . . . . . . . . . . . . . . 51.2.1 Convex relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.2 Nonconvex optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.3 Approximate message passing . . . . . . . . . . . . . . . . . . . . . . . 91.3 Using prior information in the recovery algorithm . . . . . . . . . . . . . . . . 101.3.1 Recovery by weighted `1minimization . . . . . . . . . . . . . . . . . . 121.3.2 Recovery by weighted `pminimization . . . . . . . . . . . . . . . . . . 131.3.3 Recovery by weighted AMP . . . . . . . . . . . . . . . . . . . . . . . . 142 Recovering compressively sampled signals by non-convex optimization and using partial supportinformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.1 Introduction and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.1.1 Recovery by `1minimization . . . . . . . . . . . . . . . . . . . . . . . . 162.1.2 Recovery by `pminimization . . . . . . . . . . . . . . . . . . . . . . . . 172.1.3 Recovery by weighted `1minimization . . . . . . . . . . . . . . . . . . 18ivTable of Contents2.2 Main results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.2.1 Weighted `pminimization with estimated support . . . . . . . . . . . . 202.2.2 Comparison to weighted `1recovery . . . . . . . . . . . . . . . . . . . . 222.2.3 Comparison to `precovery . . . . . . . . . . . . . . . . . . . . . . . . . 242.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.3.1 Algorithmic issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.3.2 Projected gradient method . . . . . . . . . . . . . . . . . . . . . . . . . 272.3.3 Iterative reweighted `p. . . . . . . . . . . . . . . . . . . . . . . . . . . 272.3.4 Iterative reweighted least squares . . . . . . . . . . . . . . . . . . . . . 282.4 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.4.1 The sparse case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.4.2 The compressible case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322.5 Stylized application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392.5.1 Audio signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392.5.2 Seismic signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402.6 Proof of Theorem 2.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453 Weighted AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.2 Construction of the graphical model for weighted BP . . . . . . . . . . . . . . 513.3 Large system limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533.4 Large limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553.5 From message passing to AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . 563.6 Reweighted AMP and reweighted W-AMP . . . . . . . . . . . . . . . . . . . . 593.6.1 Reweighted AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593.6.2 Reweighted W-AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633.7 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643.8 Stylized applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73vList of Tables3.1 Comparison of recovery time of AMP, reweighted AMP, and `1minimization inseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65viList of Figures1.1 Image compression with discrete cosine transform (DCT). (a) Original image.(b) DCT coefficients sorted in descending order. (c) Image, reconstructed byzeroing out all the DCT coefficients but largest 10%. . . . . . . . . . . . . . . . 3(a) Original image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3(b) Magnitude of DCT coefficients sorted in descending order . . . . . . . . . 3(c) Image, reconstructed by zeroing out all the DCT coefficients but largest10% . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 `0, `1and `2recovery for a simple 2-D example. The line shows the constraintdetermined by the measurements. The square shows the 1-norm ball and thedashed circle shows the 2-norm ball. . . . . . . . . . . . . . . . . . . . . . . . . 61.3 Convex hull generated by columns of A and the polytopes generated by differentp?s. The `pminimization can recover x if Ax corresponds to a point on the faceof ABNp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.4 Theoritical and empirical recovery guarantees using `pminimization to recoveran S-sparse signal, when the the measurement matrix is a Gaussian matrixA 2 R100300: Adapted from [2] with permission. . . . . . . . . . . . . . . . . . 9(a) Theoretical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9(b) Empirical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.5 Amplitude of frequency coefficients of an audio signal in DCT domain. Most ofthe large coefficients correspond to low frequencies. . . . . . . . . . . . . . . . . 111.6 (a) Example of a shot gather from a seismic line from the Gulf of Suez. (b)Example of a high resolution time slice in the source-receiver (SR) domain froma seismic line from the Gulf of Suez. . . . . . . . . . . . . . . . . . . . . . . . . 121.7 Example of the accuracy achieved from using adjacent frequencies to predictthe support of the curvelet coefficients in a seismic line. Adapted from [4] withpermission. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13viiList of Figures2.1 Comparison of the sufficient conditions for recovery with weighted `precon-struction with various . In all the Figures, we set a = 3 and = 1 and p = 25. 232.2 Comparison between the phase diagrams of measurement matrices with Gaus-sian entries satisfying the sufficient recovery conditions of weighted `pmini-mization and weighted `1minimization with ! = 0 and = 0:3, 0:6, and 0:8and standard `1. The plots are calculated using the upper bounds on the re-stricted isometry constants derived in [5]. Points below each curve determinethe sparsity-undersampling ratios that satisfy the sufficient bounds on the RIPconstants introduced in (2.7) and (2.13). . . . . . . . . . . . . . . . . . . . . . . 252.3 Comparison of the recovery constants for weighted `preconstruction with vari-ous . In all the figures, we set a = 3 and = 1 and p = 25. . . . . . . . . . . . 262.4 Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 10 experiments for sparse signals with variable weights andmeasurements and = 1 and p = 0:5. . . . . . . . . . . . . . . . . . . . . . . . 342.5 Comparison of SNR for variable weights and p for = 1, k = 40 and n = 100. . 352.6 Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for sparse signals x with n = 100; N = 500with variable support size and variable and ! . . . . . . . . . . . . . . . . . . 362.7 Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for compressible signals x with n = 100; N =500. The coefficients decay with a power d = 1:1. The accuracy of the supportestimate is calculated with respect to the best k = 40 term approximation. . 372.8 Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for compressible signals x with n = 100; N =500. The coefficients decay with a power d = 1:5. The accuracy of the supportestimate is calculated with respect to the best k = 20 term approximation. . 382.9 SNRs of reconstructed signal from compressed sensing measurements plottedagainst !. An intermediate value of ! yields the best performance. . . . . . . . 402.10 (a) Example of a high resolution time slice at t = 0:32 s in the source-receiver(SR) domain, (b) the random subsampling mask where the black lines corre-spond to the locations of inactive receivers, and (c) the subsampled time slice.The subsampling ratio is 50%. . . . . . . . . . . . . . . . . . . . . . . . . . . . 412.11 (a) Shot gather number 32 from a seismic line from the Gulf of Suez. (b)Subsampled shot gather using column 32 from the mask in Figure 2.10.b. . . . 43viiiList of Figures2.12 (a) Recovered shot gather using `pminimization in the SR domain. (b) Recov-ered shot gather using weighted `pminimization in the SR domain. . . . . . . . 432.13 (a) Error plots showing the difference between the original shot gather andthe reconstruction from `pminimization in the source-receiver domain. (b)Error plots showing the difference between the original shot gather and thereconstruction from weighted `pminimization in the SR domain. . . . . . . . . 442.14 Comparison of the SNRs achieved by `1, `p, weighted `1, and weighted `pmin-imization in recovering shot gathers applied to source-receiver domain . . . . . 452.15 Illustration of the signal x and weight vector w emphasizing the relationshipbetween the sets T0and eT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463.1 Factor graph associated to the probability distribution 3.4. Circles correspondto variables si, i 2 [n] and squares correspond to measurements ya, a 2 [m]. . . 523.2 Histograms of the recovery time of weighted `1and weighted AMP. Plots showthe times it takes for (a) weighted `1, and (b) weighted AMP to recover 1000k-sparse signals x 2 R4000 with k = 400 and n = 1500 measurements when wehave a support estimate with 50% accuracy. . . . . . . . . . . . . . . . . . . . . 583.3 Comparison of recovery time of each one of the 1000 signals by weighted `1andweighted AMP. The experiments are sorted with respect to the time it takes forweighted `1to recover the signal. Specifically, the first experiments is one thatweighted `1minimization recovers it faster than all the others and experimentnumber 1000 is the one which is recovered slowest by weighted `1. . . . . . . . 593.4 Percentage of the support recovered by AMP versus iteration number when thesignal x 2 R1000 is 100-sparse. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603.5 Comparison between regular AMP and reweighted AMP when m=250 andm=350, recovering a 100-sparse signal in R1000. . . . . . . . . . . . . . . . . . . 613.6 Comparison of percentage of exact recovery by reweighted AMP, IRL1, SDRL1,WSPGL1, and regular AMP recovering 100 sparse signals x 2 R1000 with dif-ferent number of measurement, n, and different levels of sparsity. . . . . . . . 623.7 Histogram of ratio of mean squared error (MSE) between reweighted AMP andIRL1 for recovering compressible signals x 2 R1000, sorted coefficients of whichdecay like jp with j 2 f1; 2; : : : ; 100g and with different decay rates p. . . . 63ixList of Figures3.8 (Noise-free case) Comparison of performance of (a, c, e) weighted AMP andweighted `1and (b, d, f) reweighted W-AMP, IRL1, SDRL1, and WSPGL1in terms of SNR averaged over 20 experiments for sparse signals with variableweights and measurements when there is no noise. . . . . . . . . . . . . . . . . 673.9 (Noisy case) Comparison of performance of (a, c, e) weighted AMP and weighted`1and (b, d, f) reweighted W-AMP, IRL1, SDRL1, and WSPGL1 in terms ofSNR averaged over 20 experiments for sparse signals with variable weights andmeasurements when there is 5% noise. . . . . . . . . . . . . . . . . . . . . . . . 703.10 Empirical phase transition of 41, regular AMP and reweighted AMP recoveringsparse signals in R200. Figures show the percentage of successful recovery over20 experiments when the measurement matrix is Gaussian. . . . . . . . . . . . 713.11 Illustration of the phasediagrams of AMP, reweighted AMP, and 41presentedin Figure 3.10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713.12 SNRs of reconstructed audio signals from compressed sensing measurementsplotted against !. Figure shows the result for reconstruction via weighted `1minimization, weighted `pminimization, weighted AMP, and reweighted W-AMP. 72xAcknowledgementsI would like to thank my supervisor, ?zg?r Y?lmaz, for his support and guidance throughoutthese last two years. His patience with me and his insightful suggestions are much appreciated.I would also like to thank Hassan Mansour for all his help during my graduate studies and allthe meetings where he patiently shared his ideas with me. I would also want to thank FelixHerrmann and all the other members of SLIM group for their support. Last but not least Iwould want to thank my family who I owe where I am right now to them.xiChapter 1Introduction and overviewCompressed sensing is a data acquisition technique for efficiently acquiring and recoveringsparse or approximately sparse signals from seemingly incomplete and noisy linear measure-ments. We say that a signal is sparse when the signal admits a representation in some transformdomain where most of its coefficients are zero. There are many applications where the targetsignals admit a nearly sparse representation in some transform domain. In other words, mostof the underlying information of the signal can be obtained by relatively fewer coefficients. Forexample, natural images are nearly sparse in discrete cosine transform domain (DCT) and inthe wavelet domain which is crucial in applications like JPEG and JPEG2000. Similarly audiosignals are approximately sparse in short time DCT domain and short time Fourier domainwhich allows MP3 compression.Exploiting this observation and using appropriate reconstruction schemes, compressive sens-ing enables signal acquisition with fewer measurements than traditional sampling. Compressedsensing is especially promising in applications where taking measurements is costly (e.g., hyper-spectral imaging [6]) as well as in applications where the ambient dimension of the underlyingsignal is very large, i.e., medical [7] and seismic imaging [8, 9] (seismic images are approxi-mately sparse in the curvelet domain).Compressed sensing utilizes the sparsity information of the signal to reduce the necessarynumber of measurements in order to capture all (or most) the information of the signal. Onthe other hand, several studies, e.g., [10?12] show that the necessary number of measurementscan be reduced even more if we use prior information about the signal to be recovered. Priorinformation about the signal can be obtained by considering the properties of the signal whichwe want to recover. For example, we know that speech signals have large low-frequency co-efficients. Other than this, signals such as video and audio exhibit correlation over temporalframes that can be exploited to estimate a portion of the support using previously decodedframes. As an other example correlations between locations of significant coefficients of differ-ent partitions of seismic data, e.g., shot records, common-offset gathers, or frequency slices ofthe acquired data can be used as a prior information to interpolate seismic data sets.11.1. Compressed sensing and the sparse approximation problemIn this thesis we introduce modified versions of two well-known recovery algorithms when addi-tional partial support information is available and examine the performance of these modifiedalgorithms. Chapter 2 focuses on weighted non-convex optimization for signal reconstructionand Chapter 3 investigates the weighted approximate message passing (AMP) algorithm. Be-fore summarizing our main contribution, we first provide the necessary context. Section 1.1describes compressed sensing as a practicable signal acquisition method in applications andintroduces the generic sparse approximation problem. Section 1.2 reviews some of the algo-rithms used in compressed sensing literature. Section 1.3 clarifies the main idea of using priorinformation to enhance the recovery performance and summarizes our main contributions,weighted `pminimization, and weighted AMP.1.1 Compressed sensing and the sparse approximation problemRole of sparsityConsider the problem of acquiring a signal f 2 RN via n linear measurements:y = f: (1.1)Here is called the sensing matrix and y is the measurement vector. Recalling fundamentaltheorem of linear algebra, if we want to reconstruct f from y in (1.1), the number of measure-ments, n, should be at least equal to the number of unknowns N . In many applications, thesignal size N , is very big and it is very costly and time consuming to take that many mea-surements. On the other hand many signals can be approximated by a sparse signal in someconvenient basis. In other words most of the information of the signal can be captured by arelatively few nonzero coefficients if the signal is represented in an appropriate basis. Figures1.1 illustrate this observation: In Figure 1.1.a we show a grayscale image. Figure 1.1.b showsthe magnitude of the DCT coefficients of this image in descending order. Notice how fast theDCT coefficients tend to zero. Figure 1.1.c is a reconstruction of the original image in 1.1.afrom only the largest 10% of its DCT coefficients. As we can see in Figure 1.1.b, the smallest90% coefficients of the image in DCT basis are very close to zero and therefore, we do not losemuch information when we throw them out.21.1. Compressed sensing and the sparse approximation problemOriginal image(a)0 0.5 1 1.5 2 2.5 3x 1050500100015002000250030003500 DCT coefficients in descending order(b)Reconstructed image by 10% of DCT coefficients(c)Figure 1.1: Image compression with discrete cosine transform (DCT). (a) Original image. (b)DCT coefficients sorted in descending order. (c) Image, reconstructed by zeroing out all theDCT coefficients but largest 10%.Compressed sensingSimilar to the observation we made in Figure 1.1, for many classes of signals, such as naturalimages, audio, video, seismic data and images, an appropriate basis can be found such thatsignal is approximately sparse in that basis.Assume that the signal f is sparse (or approximately sparse) in the basis = f 1; 2; : : : ; Ng.Then f can be written as a linear combination of the basis vectors 1, 2, ..., Nasf =NXi=1 ixi= x; (1.2)such that x 2 RN is sparse. Notice that with slight abuse of notation we use to refer to boththe matrix [ 1; 2; : : : ; N] and the basis generated by the columns of this matrix. Substi-tuting (1.2) in (1.1), we get y = x. Defining A := , we have y = Ax with the additionalinformation that x is sparse. Exploiting this sparsity lets us reconstruct x and hence f , bymuch fewer measurements than the number of unknowns N . The sampling paradigm thatis based on this observation and hence allows us to acquire sparse high dimensional signalsby making a small number of non-adaptive linear measurements is called compressed sens-ing [13, 14]. Of course, our discussion above is sketchy in that we have not specified how torecover the original sparse vector, i.e., how to solve the "sparse recovery problem".31.2. Alternative algorithms and recovery guaranteesSparse recovery problemAs mentioned above, sparse recovery problem is at the foundation of compressed sensing.First we quantify what we mean by sparsity. Let k < N be two positive integers. We say thatx 2 RN is k-sparse if x 2 Nk:= fu 2 RN : kuk0 kg (kuk0, denotes the number of non-zerocoefficients of u).Let x 2 Nkand assume that y 2 Rn, the vector of n linear and potentially noisy measurementsof x is acquired via y := Ax+ e. Here A is an nN measurement matrix with nfi N and edenotes the noise in our measurements with kek2 ". We wish to recover x from y by solvingthe sparse recovery problem. This entails finding the sparsest vector, say z, that is feasible,i.e., kAz yk . In the noise-free case, i.e., when = 0, the decoder 40: RnN Rn 7! RNis defined as40(A; y) := argminz2RNjjzjj0subject to Az = y: (1.3)It has been proven, e.g., in [15], that if n > 2k, then (1.3) recovers x 2 Nkperfectly when Ais in general position. That is, in this case 40(A; y) = x. However, (1.3) is a combinatorialproblem which becomes intractable as the dimensions of the problem increase. Therefore, oneseeks to modify the optimization problem so that it can be solved (at least approximately) withmethods that are more tractable than combinatorial search. In the next section we describesome of the algorithms that are used to find the solution of (1.3).1.2 Alternative algorithms and recovery guaranteesThe optimization problem (1.3) is a combinatorial problem. Hence alternative algorithms hasbeen introduced to find its solution (or approximate it) in certain cases that can be identifiedquantitatively. In this section we describe three of these algorithms.1.2.1 Convex relaxationThe most common approach to solve (1.3) is to replace the `0norm in (1.3) by a convex `1norm. More precisely, 41: RnN Rn R 7! RN is defined as41(A; y; ) := argminz2RNjjzjj1subject to jjAz yjj2 : (1.4)Figure 1.2 illustrates the idea of using `1norm instead of `0norm with a simple 2-D example.The line shows a constraint determined by the measurements. The point indicated by the41.2. Alternative algorithms and recovery guarantees?1.5 ?1 ?0.5 0 0.5 1 1.5?1?0.8?0.6?0.4?0.200.20.40.60.81Figure 1.2: `0, `1and `2recovery for a simple 2-D example. The line shows the constraintdetermined by the measurements. The square shows the 1-norm ball and the dashed circleshows the 2-norm ball.dotted circle shows one of the solutions of the `0minimization problem which is the intersectionof the constraint line with the x-axis. The square shows the smallest 1-norm ball that touchesthe constraint line which shows that one of the solutions to the `0problem coincides with theunique solution of the `1problem. On the other hand, this figure shows that an `2minimizationproblem does not find the sparsest solution on the constraint line in this example. The solutionto `2minimization is determined by the solid circle which indicates the intersection of theconstraint line and the dashed 2-norm ball.The `1minimization problem (1.4) is a convex relaxation of the `0problem and hence canbe solved in polynomial cost. However, the computational tractability of `1minimizationcomes at the cost of increasing the number of measures taken. Several works has been done toclose the gap in the required number of measurements for recovery by `0and `1minimizationproblem, including solving a non-convex `pproblem when 0 < p < 1 which is explained in the51.2. Alternative algorithms and recovery guaranteesnext section.1.2.2 Nonconvex optimizationAs mentioned in the previous section, the `1minimization problem in (1.4) can be formulatedas a linear program, which can be solved in polynomial time using standard algorithms (incontrast to combinatorial complexity of the `0problem (1.3)). However, this advantage comeswith the cost of increasing the number of measurements that has to be taken. One idea toclose this gap is to solve a nonconvex `pminimization problem with 0 < p < 1 instead of the`1minimization.Chartrand [16] and Saab, and Y?lmaz [2], cf. [17], considered the `pminimization based sparserecovery method when 0 < p < 1. They have shown that the `pminimization problem enjoysbetter recovery guarantees compared to the `1problem. Here the `1norm in (1.4) is replacedby the `pnorm. Notice that when 0 < p < 1, this is not a norm, but a quasi-norm. However,in what follows, we shall abuse notation and refer to the `pquasi-norm as the `pnorm. Thedecoder 4p: RnN Rn R 7! RN is defined as4p(A; y; ) := argminz2RNjjzjjpsubject to jjAz yjj2 ; (1.5)where 0 < p < 1. Although this method involves solving a nonconvex minimization problem,it can be solved, at least locally, much faster than (1.3). The algorithms used to approximatethe solution of this problem will be explained in detail in Chapter 2.We illustrate the benefits of using such an optimization by a simple example, shown in Figure1.3. Assume that a1; :::; a4denotes the columns of a matrix A 2 R34. Figure 1.3.a showsthe convex hull generated by the columns of A. For simplicity, we ignore a1; :::;a4in thefigure; note that including these does not change the outcome of our example. Assume thatx 2 42and the nonzero components of x correspond to the columns a1and a2with x1> 0 andx2> 0. Hence, y is placed on the solid line segment connecting a1and a2(for simplicity, wename this line segment a12). Assume BNpis the unit p-norm ball in RN ; then Figures 1.3.b?dshow a cross-section of ABNpfor p = 1; 0:7; 0:5 respectively. Recall that the `pminimizationproblem can recover x if the support of x corresponds to a point on the face of ABNp. As wecan see in this figure when p = 1 the line a12is completely inside the polytope which meansthat `1fails to recover x. As we decrease p, points on a12gradually become points on someface of ABNp, making x recoverable. This example illustrates how using `pminimization with0 < p < 1 can be advantageous.61.2. Alternative algorithms and recovery guarantees?0.8 ?0.6 ?0.4 ?0.2 0 0.2 0.4 0.6?0.500.500.20.40.60.81a2a3a4a1(a) Convex hull generated by columns of A?0.8 ?0.6 ?0.4 ?0.2 0 0.2 0.4 0.6 0.800.10.20.30.40.50.60.70.80.91(b) p = 1?0.8 ?0.6 ?0.4 ?0.2 0 0.2 0.4 0.6 0.800.10.20.30.40.50.60.70.80.91(c) p = 0:7?0.8 ?0.6 ?0.4 ?0.2 0 0.2 0.4 0.6 0.800.10.20.30.40.50.60.70.80.91(d) p = 0:5Figure 1.3: Convex hull generated by columns of A and the polytopes generated by differentp?s. The `pminimization can recover x if Ax corresponds to a point on the face of ABNp.Figure 1.4, taken from [2], illustrates the advantage of using 4pfor sparse recovery. Thisdiagram shows the success rate of recovering S-sparse signals using `pminimization, when themeasurement matrix is a Gaussian matrix A 2 R100300. In Figure 1.4.a the light-shaded areasshow the pairs (p,S) that we have guaranteed recovery using `pminimization to recover anS-sparse signal. In Figure 1.4.b Saab and Y?lmaz [2] show the empirical results. Again light-shaded regions correspond to higher rates of successful recovery. This figure illustrates howusing `pminimization improves the recovery conditions. For example, empirically with 90%probability, 20-sparse signals can be recovered by `1minimization whereas, `pminimizationwith p = 12, recovers 40-sparse signal with probability greater than 90%.71.2. Alternative algorithms and recovery guaranteesRegion where recovery with ?p is guaranteed for p and S (Light Shading = Recoverable)pS10 20 30 40 500.10.20.30.40.50.60.70.80.91(a) Theoretical resultsSpEmpirical Recovery Rates with ?p10 20 30 40 500.10.20.30.40.50.60.70.80.910.10.20.30.40.50.60.70.80.9(b) Empirical resultsFigure 1.4: Theoritical and empirical recovery guarantees using `pminimization to recoveran S-sparse signal, when the the measurement matrix is a Gaussian matrix A 2 R100300:Adapted from [2] with permission.1.2.3 Approximate message passingApplicability of simple linear programming algorithms to the basis pursuit problem (1.4) hasmade it an appealing choice for compressed sensing problems. However, in large applications,high computational complexity of these algorithms is a problem. Therefore, significant efforthas been made to find fast first order methods to recover signals from compressed measure-ments in applications ranging from biology to imaging where very large dimensional signalsare involved. Iterative thresholding algorithms is one family of algorithms which has attractedmuch attention in the literature [18?20]. However, until recently the iterative thresholdingalgorithms introduced, had worse recovery conditions than those of convex optimization. Inother words, the required number of measurements which had to be taken for recovery by ITalgorithms were more than those needed for convex relaxation. Inspired by ideas from theoryof graphical models, message passing algorithms and statistical physics, Donoho, Maleki, andMontanari [21] proposed a new iterative thresholding algorithm for sparse recovery, which isreferred to as approximate message passing (AMP), which was shown to have empirical perfor-81.3. Using prior information in the recovery algorithmmance matching BP while still enjoying the low complexity nature of iterative algorithms. Theperformance of AMP has been justified in the large system limit for Gaussian measurementmatrices. The AMP algorithm goes asxt+1= (xt+ Azt; fi^t)zt= y Axt+ 1zt1h0(xt1+ Azt1; fi^t1)i;(1.6)where = nNis the undersampling ratio, is a scalar thresholding function which actscomponent-wise on the vector inside, e.g., a soft thresholding function defined as (a; b) =sign(a)(a b)+, and fi^ t is a small threshold number which dictates the sparsity of the signal.Furthermore, 0(x; b) = @@x(x; b) and hi is the arithmetic mean. xt is the current estimateof the sparse signal x, zt is the current residual and A is the Hermitian transpose of themeasurement matrix, A.1.3 Using prior information in the recovery algorithmThe algorithms explained in the previous section do not use any prior information on thesignal to be recovered?though for successful recovery the signal needs to be sparse. However,in many applications it is possible to use prior information on the signal to reduce the numberof measurements even further.For example we know that speech signals have large low-frequency coefficients. Figure 1.5shows the amplitude of coefficients of an audio signal in discrete cosine transform domain.Notice how the largest coefficients are concentrated around low frequencies. This informationabout the audio signals can be used to enhance the recovery conditions. As another example weconsider the problem of interpolating irregularly sampled and incomplete seismic data. Assumethat we have Nssources located on earth surface which send sound waves into the earth and Nrreceivers record the reflection in Nttime samples. Hence the seismic data is organized in a 3-Dseismic line with Nssources, Nrreceivers, and Nttime samples. Rearranging the seismic line,we have a signal f 2 RN , where N = NsNrNt. Figure 1.6.a shows an example of a shotgatherfrom a seismic line from the Gulf of Suez, which is extracted from the data using one sourceand different receivers and times. Figure 1.6.b shows an example of a timeslice from the sameseismic line from the Gulf of Suez, which is extracted from the data using one time slice anddifferent sources and receivers. Seismic data is approximately sparse in curvelet domain [22,23]and hence we can formulate the seismic data interpolation problem as an instance of recoveryfrom compressive samples. Assume x = Sf where x is the sparse representation of f in curvelet91.3. Using prior information in the recovery algorithm0 500 1000 1500 2000 250000.20.40.60.811.21.41.61.8 Discrete Fourier tTransform of an audio signalFrequency numberAmplitudeFigure 1.5: Amplitude of frequency coefficients of an audio signal in DCT domain. Most ofthe large coefficients correspond to low frequencies.domain. We want to recover a very high dimensional seismic data volume x by interpolatingbetween a smaller number of measurements b = RMS, where RM is a sampling operatorcomposed of the product of a restriction matrix R, which specifies the sample locations thathave data in them and a measurement basis matrix M , which represents the basis in whichthe measurements are taken [4]. To overcome the problem of high dimensionality, recovery isperformed by first partitioning the seismic data volumes into frequency slices, or into commonoffset-azimuth gathers and then solving a sequence of individual subproblems [4]. Doing thishelps us in two ways. It reduces the size of the problem and we can also use the support setof each partition as an estimate of the support set of the next partition. Figure 1.7 takenfrom [4] is an example of the fraction of overlapping support when adjacent frequencies is usedto predict the support of the curvelet coefficients in a seismic line.Next, we provide a brief introduction to incorporating prior support information into thealgorithms introduced in the previous section.1.3.1 Recovery by weighted `1minimizationSuppose that x is an arbitrary vector in RN and let xkbe the best k-term approximationof x, i.e., set T0to be the set of indices of k largest coefficients of x in magnitude and setxk(j) = x(j) for j 2 T0and xk(j) = 0 for j =2 T0. Also assume that using prior information101.3. Using prior information in the recovery algorithmExample of a shot gatherDistance (m)Time (sec)200 400 600 800 1000 1200 1400 1600 1800 2000 22000.20.40.60.811.21.41.61.82(a)Example of a fully Sampled time slice in source?receiver domainSource numberReceiver number20 40 60 80 100 120 140 16020406080100120140160(b)Figure 1.6: (a) Example of a shot gather from a seismic line from the Gulf of Suez. (b) Exampleof a high resolution time slice in the source-receiver (SR) domain from a seismic line from theGulf of Suez.about the signal we can estimate the locations of the largest coefficients by eT , i.e., eT is anestimate for T0. It was noted in [1] that one can improve the recovery performance if there issuch prior information about the signal that is sufficiently accurate. Specifically, in this case,the required number of measurements for exact or approximate recovery can be reduced byincorporating the prior support information into the `1minimization based recovery algorithm.In particular [1] proposes the weighted `1decoder 41;w: RnN Rn RRN 7! RN definedas41;w(A; y; ;w) := argminz2RNjjzjj1;wsubject to jjAz yjj2 ; (1.7)where w 2 f!; 1gN is the weight vector and kzk1;w:= iwijzij is the weighted `1norm. Theoptimization problem in (1.7) uses bigger weights for coefficients which are not in the supportestimate and smaller weights for the ones in the support estimate. Therefore, the coefficientsin the support estimate are more likely to remain nonzero. Details of this algorithm has beenpresented in Chapter 2.1.3.2 Recovery by weighted `pminimizationAs mentioned in Section 1.2.1, solving the `1minimization problem (1.4) gives us the benefitsof solving a convex optimization problem but comes with the cost of increasing number of111.3. Using prior information in the recovery algorithm0 20 40 60 80 100 12000.10.20.30.40.50.60.70.80.91Frequency (Hz)Fraction of overlapping support Source?receiver domainMidpoint?offset domainFigure 1.7: Example of the accuracy achieved from using adjacent frequencies to predict thesupport of the curvelet coefficients in a seismic line. Adapted from [4] with permission.measurements. So far we have reviewed two algorithms which attempted to close the gap be-tween `0and `1, namely, `pminimization with p < 1 and weighted `1minimization. In Section2 we propose combining these two algorithms to reduce the required number of measurementseven further. In particular when we have a support estimate eT , we propose to approximate xby 4p;w(A; y; ;w) where 4p;w: RnN Rn R RN 7! RN is defined as4p;w(A; y; ;w) := argminz2RNkzkp;wsubject to kAz yk2 with wi=8<:1; if i 2 eT c!; if i 2 eT:(1.8)Here w 2 f!; 1gN is the weight vector and kzkp;w:= (iwpijzijp)1p is the weighted `pnorm.If the support estimate is sufficiently accurate, The sufficient recovery conditions for thisoptimization problem are weaker than those of weighted `1?see Theorem 2.12?and alsothis algorithm outperforms `pminimization if the support estimate is accurate enough. Athorough discussion including theoretical guarantees associated with weighted `pminimization,algorithmic issues, and numerical experiments on synthetic data, audio and seismic signals isgiven in Chapter 2.121.3. Using prior information in the recovery algorithm1.3.3 Recovery by weighted AMPIncorporating prior information into `1and `pminimization improves the recovery conditionsof sparse and approximately sparse signals. This observation motivated us to derive a weightedAMP algorithm which benefits from the low complexity nature of the AMP algorithm and usesthe prior support information about the signal. Let x 2 RN be a sparse vector. As before wetry to recover x from n < N linear measurements acquired via y = Ax+ e with kek . HereA is an nN matrix whose coefficients are drawn from a sub-Gaussian distribution. Assumethat we have a support estimate eT of the nonzero coefficients of the signal x. We incorporatethis information into the AMP algorithm by the following iterative algorithm: At t = 0 startfrom x0 = 0 and fi^ 0 = 1 and z0 = y. Assume w = [w1; w2; : : : ; wN]T is the weight vectorwhere wi= ! < 1 for i 2 eT and wi= 1 for i 2 eT c. Then the weighted AMP algorithmproceeds asxt+1= (xt+ Azt; fi^tw)zt= y Axt+ 1zt1h0(xt1+ Azt1; fi^t1w)ifi^t=fi^t1h0(xt1+ Azt1; fi^t1w)i;(1.9)where is the soft thresholding function defined in (1.6).In Chapter 3 we explain the derivation of this algorithms and show numerical results whichcompare this algorithm with regular AMP and weighted `1.13Chapter 2Recovering compressively sampled signals bynon-convex optimization and using partialsupport information2.1 Introduction and overviewIn this section we review the `1, `pand weighted `1minimization algorithms introduced inthe previous chapter. Our setting is as before: We have a sparse signal x 2 Nkand we wantto recover this signal from y 2 Rn, n linear and potentially noisy measurements of x whichare acquired via y := Ax + e. As before A is an n N measurement matrix with n fi Nand e denotes the noise in our measurements with kek2 ". This entails finding the sparsestvector x that is feasible, i.e., kAx yk ". Restating (1.3), in the noise-free case, the decoder40: RnN Rn 7! RN is defined as40(A; y) := argminz2RNjjzjj0subject to Az = y; (2.1)and 40(A; y) recovers x if n > 2k and A is in general position (see, e.g., [15]). As mentionedearlier, generally (2.1) is a combinatorial problem which becomes intractable as the dimensionsof the problem increase. Therefore, one seeks to modify the optimization problem so that it canbe solved (at least approximately) with methods that are more tractable than combinatorialsearch. We introduced some of these algorithms in Chapter 1, namely, `1, `pand weighted`1minimization. In this section we review these algorithms more carefully and provide thestability and robustness theorems of each.In all these algorithms the restricted isometry constants, defined in [24], play a central role.Definition 1. A matrix A satisfies restricted isometry property (RIP) of order k with con-142.1. Introduction and overviewstant kif for all k-sparse vectors z 2 Nk,(1 k)jjzjj22 jjAzjj22 (1 + k)jjzjj22: (2.2)2.1.1 Recovery by `1minimizationDonoho [13] and Cand?s, Romberg, and Tao [14] showed that if A obeys a certain restrictedisometry property, then solving (1.4) can stably and robustly recover x from seemingly incom-plete and inaccurate measurements y = Ax+ e. The following theorem, proved in [14], showsthe error guarantees of finding the best k-sparse solution with (1.4):Theorem 2.1. (Cand?s, Romberg and Tao [14] ) Let k, N be positive integers with k < N .Suppose that x is an arbitrary vector in RN and let xkbe the best k-term approximationof x. Let y = Ax+ e with kek2 . If A satisfies the restricted isometry property (RIP)with ak+ a(a+1)k< a 1, thenjj41(A; y; ) xjj`2 C`11 + C`12jjx xkjj`1pk:Remark 2.2. The constants C`11and C`12in Theorem 2.1 are given explicitly byC`11=2(1 + a12)q1 (a+1)k a12p1 + ak;C`12=2a12(q1 (a+1)k+p1 + ak)q1 (a+1)k a12p1 + ak:(2.3)Remark 2.3. Theorem 2.1 states that if ak+ a(a+1)k< a 1, then (1.4) can recover anyk-sparse vector x with an error proportional to the 2-norm of the measurement noise. Forexample in this case if we set a = 3, k 41(A; y; ) xk`2 C`11, and for reasonable values of4k, C`11is well behaved, e.g., C`11< 12:04 for 4k=15.Remark 2.4. It is worth noting that calculating the RIP constants for an arbitrary matrixA is computationally expensive. On the other hand it was proved in [25] that if columnsof A are independent, identically distributed (i.i.d.) random vectors with any sub-Gaussiandistribution, then A satisfies RIP property (2.2) for any 0 < < 1 with probability greaterthan 12ec2k, whenever k c1nlogNn; here c1, and c2are positive constants that only dependon and the distribution of A.152.1. Introduction and overviewAs mentioned in Chapter 1, the `1minimization problem (1.4) is a convex optimizationproblem and can be solved in polynomial time. However, the computational tractability of `1minimization comes at the cost of increasing the number of measurements taken. For exampleif the columns of the measurement matrix A are independent, identically distributed randomvectors with any sub-Gaussian distribution, Theorem 2.1 together with Remark 2.4 impliesthat 41can recover any k-sparse vector x when n & k log(Nk) rather than the n > 2kproperty which is sufficient for recovery by 40. Therefore, as mentioned earlier, several workshas been done to close the gap in the required number of measurements for recovery via `0and `1minimization problems, including solving a non-convex `pproblem when p < 1 [2, 16]and using prior knowledge about the signal to be recovered [1].2.1.2 Recovery by `pminimizationIt was shown in [2,16,17,26] that recovery by `pminimization is stable and robust under weakersufficient conditions than the analogous `1minimization conditions. Therefore, the globalminimizer of (1.5) is an approximate solution to (2.1) for a more general class of measurementmatrices A than the minimizer of (1.4). This result is made explicit by the following theoremfrom [2].Theorem 2.5. (Saab and Y?lmaz [2] ) Suppose that x is an arbitrary vector in RN and letxkbe the best k-term approximation of x and let y = Ax+ e with kek2 . If A satisfiesak+ a2p1(a+1)k< a2p1 1, for a 2 1kN and k > 1 and 0 < p < 1 thenjj4p(A; y; ) xjjp2 C`p1 p+ C`p2jjx xkjjppk1p=2:Remark 2.6. It is sufficient that A satisfies(a+1)k<^`p:=a2p1 1a2p1+ 1(2.4)for Theorem 2.5 to hold, i.e., to guarantee stable and robust recovery described in the theoremwith the same constants C`p1and C`p2.162.1. Introduction and overviewRemark 2.7. Constants C`p1and C`p2in Theorem 2.5 are given explicitly by:C`p1=2p0B@1 +1a2p1(2p1)p21CA(1 (a+1)k)p2 (1 + ak)p2ap21C`p2=20B@(1 + ak)p2ap21+(1(a+1)k)p2a2p1(2p1)p21CA(1 (a+1)k)p2 (1 + ak)p2ap21:(2.5)As stated in [2] the constants C`p1and C`p2are well behaved. When p = 1 these constantsreduce to those of Theorem 2.1. If p = 12and 4k=15, then C`p1< 3:87 and C`p2< 3:18. Itwas shown in [26] that smaller bounds are achievable in the noise-free case. Moreover, theresult in [2] says that given a matrix A that satisfies (a+1)k<a1a+1and suppose that k1is thelargest k for which the inequality holds, then Theorem 2.8 guarantees that (1.4) can recover allk1-sparse signals whereas Theorem 2.5 guarantees that (1.5) can recover all kp-sparse vectorswhere kp=a+1ap2p+1k1which is bigger than k1when p < 1.2.1.3 Recovery by weighted `1minimizationAs stated also in Chapter 1, the `1problem (1.4) does not use any prior information aboutthe signal (though for successful recovery the signal needs to be sparse). However, in manyapplications it is possible to estimate the support of the signal which we want to recover. Itwas noted in [1] that one can improve the recovery conditions if there is prior informationabout the signal. This result is demonstrated by the following theorem from [1].Theorem 2.8. (FMSY [1] ) Suppose that x is an arbitrary vector in RN and y = Ax+e withkek2 . Let xkbe the best k-term approximation of x with suppfxkg = T0. Let eT be anarbitrary subset of f1; 2; :::; Ng and define and such that j eT j = k and jT0\eT j = k.(Here j eT j denotes the cardinality of eT .) Suppose there exists an a 2 1kZ with a (1)and a > 1 and the measurement matrix A has RIP withak+a(! + (1 !)p1 + 2)2(a+1)k<a(! + (1 !)p1 + 2)2 1172.1. Introduction and overviewfor some 0 ! 1. Thenjj41;w(A; y; ;w) xjj2 Cw`11+ Cw`12k12(!kx xkk1+ (1 !)kxeTc\Tc0k1):Remark 2.9. Constants Cw`11and Cw`12in Theorem 2.8 are given explicitly by:Cw`11=2(1 +w+(1w)p1+2pa)q1 (a+1)kw+(1w)p1+2pap1 + akCw`12=2a12(q1 (a+1)k+p1 + ak)q1 (a+1)kw+(1w)p1+2pap1 + ak:(2.6)Remark 2.10. It is sufficient that A satisfies(a+1)k<^w`1:=a (! + (1 !)p1 + 2)2a+ (! + (1 !)p1 + 2)2(2.7)for Theorem 2.8 to hold, i.e., to guarantee stable and robust recovery described in the theoremwith the same constants Cw`11and Cw`12.Notice that determines the ratio of the size of the estimated support to the size of theactual support of xk(or the support of x if x is k-sparse) and determines the accuracy ofthe support estimate which is the ratio of the size of eT \ T , to the the size of our estimate eT .Remark 2.11. Theorem 2.8 shows that if > 0:5 the sufficient recovery conditions of weighted`1become weaker than those of regular `1. Moreover, the error bound constants becomesmaller. The result also indicates that when the support estimate is accurate, i.e., > 0:5, achoice of the weight ! = 0 results in the weakest sufficient conditions for recovery and smallesterror bounds, whereas choosing ! = 1 achieves the weakest sufficient conditions for recoveryand smallest error bounds when the support accuracy is low ( < 0:5). However, numericalexperiments conducted in [1] have shown that weighted `1minimization achieves the bestrecovery when intermediate values of ! are used for different accuracy levels. We discuss thisissue more extensively in the next section.Combining these two ideas, in this chapter we study the sparse recovery guarantees of theweighted `pminimization problem (1.8). Specifically, we show that the weighted `pminimiza-tion algorithm outperforms both `pminimization and weighted `1minimization under certaincircumstances.In Section 2.2, we describe the proposed weighted `pminimization algorithm, derive stability182.2. Main resultsand robustness guarantees for this algorithm and compare it with regular `pand weighted `1.Specifically, we prove that the recovery guarantees of the weighted `palgorithm with 0 < p < 1are better than those of weighted `1and regular `pwhen we have a prior support estimatewith accuracy better than 50%. In Section 2.3, we explain the algorithmic issues that comewith solving the proposed non-convex optimization problem and the approach we take to em-pirically overcome them. In Section 2.4, we present numerical experiments where we applythe weighted `pmethod to recover sparse and compressible signals. In Section 2.5, we showthe result of applying these algorithms to audio and seismic signals. In Section 2.6, we providethe proof for our main theorem.2.2 Main resultsIn this section we explain the recovery by weighted `pminimization. In particular when wehave a prior support estimate eT , we approximate x by 4p;w(A; y; ;w) where 4p;w: RnN Rn R RN 7! RN is defined as4p;w(A; y; ;w) := argminz2RNkzkp;wsubject to kAz yk2 with wi=8<:1; if i 2 eT c!; if i 2 eT:(2.8)Here w 2 f!; 1gN is the weight vector and kzkp;w:= (iwpijzijp)1p is the weighted `pnorm.Next we provide the stable and robust recovery conditions of this algorithm and compare itwith weighted `1and `pin terms of sufficient conditions for stable and robust recovery andassociated constants.2.2.1 Weighted `pminimization with estimated supportAs mentioned in the previous section, one can improve the recovery guarantees of 41by using4pand by incorporating prior support information into the the optimization problem. Inthis section we provide the recovery conditions when we combine both these approaches. Thefollowing theorem states the main result.Theorem 2.12. Suppose that x is an arbitrary vector in RN and y = Ax+ e with kek2 .Let xkbe the best k-term approximation of x with suppfxkg = T0. Let eT be an arbitrarysubset of f1; 2; :::; Ng and define and such that j eT j = k and jT0\eT j = k. Supposethere exist an a 2 1kZ, with a (1 ) and a > 1 and the measurement matrix A has192.2. Main resultsRIP withak+a2p1(!p+ (1 !p)(1 + 2)1p2)2p(a+1)k<a2p1(!p+ (1 !p)(1 + 2)1p2)2p 1;(2.9)for some 0 ! 1 and 0 < p < 1. Thenk 4p;w(A; y; ;w) xkp2 C1p+ C2kp21(!pkx xkkpp+ (1 !p)kxeTc\Tc0kpp): (2.10)Remark 2.13. Note that and have the same definitions as in Theorem 2.8, i.e., denotesthe ratio of the size of the estimated support to the size of the actual support of xkand denotes the accuracy of our estimate which is the ratio of the size of eT \ T0, to the the size ofour estimate eT .Remark 2.14. The constants C1and C2are explicitly given by:C1=2p0B@1 +!p+(1!p)(1+2)1p2a2p1(2p1)p21CA(1 (a+1)k)p2 (1 + ak)p2ap21!p+ (1 !p)(1 + 2)1p2; (2.11)C2=20B@(1 + ak)p2ap21+(1(a+1)k)p2a2p1(2p1)p21CA(1 (a+1)k)p2 (1 + ak)p2ap21!p+ (1 !p)(1 + 2)1p2: (2.12)Remark 2.15. Theorem 2.12 is consistent with the analogous theorem for `p[2], i.e., Theorem2.5 and for weighted `1[1], i.e., Theorem 2.8. That is, if we set ! = 1, Theorem 2.12 reducesto Theorem 2.5 and setting p = 1, Theorem 2.12 yields Theorem 2.8 as a special case.Remark 2.16. It is sufficient that A satisfies(a+1)k<^w`p:=a2p1 (!p+ (1 !p)(1 + 2)1p2)2pa2p1+ (!p+ (1 !p)(1 + 2)1p2)2p(2.13)for Theorem 2.12 to hold, i.e., to guarantee stable and robust recovery described in the theoremwith the same constants C1and C2. Setting ! = 1 gives us sufficient conditions for recoveryby 4pand setting p = 1 yields sufficient conditions for recovery by 41;w. Notice that these202.2. Main resultsconditions are in terms of bounds on RIP constants. In Section 2.2.2 we compare these bounds.Remark 2.17. In [27] Cand?s used a slightly different approach to guarantee the stable androbust recovery when a = 1. Cand?s proved that if 2k< (1 +p2)1, then 41achieves stableand robust recovery. Using the same approach as in [27] we get the condition2k<11 +p2(!p+ (1 !p)(1 + 2)1p2)(2.14)which gives us an alternative stable and robust recovery condition for 4p;w. When > 0:5this condition provides weaker conditions on the measurement matrices compared to 4p. Weomit the derivation of this condition as it closely mimics the respective calculations in [27].2.2.2 Comparison to weighted `1recoveryIn this section we compare the conditions for which Theorem 2.12 holds with the correspond-ing conditions for Theorem 2.8. The following observations are easy to verify.Proposition 2.18. Let C1, C2, Cw`11and Cw`12be as defined above. If p = 1 then C1= Cw`11andC2= Cw`12and the sufficient condition for Theorem 2.12 would be identical to Theorem2.8.Remark 2.19. If = 0:5, then the sufficient condition for stable and robust recovery with(2.8) would be (a+1)k<a2p11a2p1+1and the analogous condition for stable and robust recoverywith (1.7) would be (a+1)k<a1a+1. Since a > 1 for 0 < p < 1, a2p1> a. Therefore, theweighted `pconditions would be much weaker than the weighted `1conditions, for example,when a = 3 and p = 0:5, then the sufficient condition for weighted `pis 4k<2628which ismuch weaker than the sufficient condition for weighted `1which is 4k< 0:5.Figure 2.1 illustrates how the sufficient conditions on the RIP constants vary with and !in the case of weighted `1and weighted `p. In particular these sufficient conditions are intro-duced in Theorem 2.8 and Theorem 2.12, i.e., ^w`1 defined in (2.7) and ^w`p defined in (2.13)which determine bounds on the RIP constants. In Figure 2.1, we plot ^w`p versus ! for p = 1and p = 25, with a = 3 in both cases and different values of . The bounds on RIP constantsget larger as increases which means that a wider range of measurement matrices satisfy thesufficient condition when the support estimate is more accurate. Note that when = 0:5 thesufficient conditions for recovery by weighted `pwould be identical to sufficient conditions forrecovery by standard `pfor 0 < p < 1. Comparing these results with recovery by weighted `1,212.2. Main results0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10.40.50.60.70.80.91Weights (?)Bounds on the RIP constant wlp ? = 0.1wlp ? = 0.3lpwlp ? = 0.7wlp ? = 0.9wl1 ?=0.3l1wl1 ?=0.9(a) ^w vs !Figure 2.1: Comparison of the sufficient conditions for recovery with weighted `preconstructionwith various . In all the Figures, we set a = 3 and = 1 and p = 25.we see that in recovery by weighted `pthe measurement matrix A has to satisfy much weakerconditions than the analogous conditions in recovery by weighted `1even when we do not havea good support estimate. For example with a = 3 and ! = 0:2, when the support estimate is70% accurate, we get the sufficient condition ^wlp < 0:9897 which is so close to the best thatwe can get (^ < 1) and it is weaker than the sufficient recovery condition for weighted `1whichis ^w`1 < 0:763 and regular `1which is ^1 < 0:5.It is worth considering the sufficient recovery conditions for the special case of zero weight. Asseen in Figure 2.1 setting ! = 0 is beneficial when > 0:5, however, the sufficient recoverycondition is stricter when < 0:5. This suggests that setting the weight ! = 0 is beneficial inthe case where the support estimate is highly accurate.Figure 2.2 compares the recovery guarantees we obtain in the zero-weight case with condi-tions that guarantee recovery via weighted `pand weighted `1minimization. To this end, wepresent the phase diagrams of measurement matrices A with Gaussian entries that satisfy theconditions on the restricted isometry constants (a+1)kgiven in (2.7) and (2.13) with ! = 0, = 1, and = 0:3; 0:6; and 0:8. The phase diagrams are calculated using the upper boundson the RIP constants derived in [5] and reflect the sparsity levels for which the three theoremsguarantee exact signal recovery as a function of the aspect ratio of the measurement matrix A.222.2. Main results0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.900.0020.0040.0060.0080.010.0120.0140.0160.0180.02nNk nphase diagram w?Lp, ? = 0.8 (? = 0.9994)w?Lp,? = 0.6 (? = 0.9899)Lp (? = 0.9756)w?Lp, ? = 0.3 (? = 0.9094)w?L1, ? = 0.8 (? = 0.7647)w?L1, ? = 0.6 (? = 5789)L1 (? = 0.5)w?L1, ? = 0.3 (? = 0.3636)Figure 2.2: Comparison between the phase diagrams of measurement matrices with Gaussianentries satisfying the sufficient recovery conditions of weighted `pminimization and weighted`1minimization with ! = 0 and = 0:3, 0:6, and 0:8 and standard `1. The plots are calculatedusing the upper bounds on the restricted isometry constants derived in [5]. Points below eachcurve determine the sparsity-undersampling ratios that satisfy the sufficient bounds on theRIP constants introduced in (2.7) and (2.13).2.2.3 Comparison to `precoveryIn this section we compare the sufficient conditions of Theorem 2.5 and Theorem 2.12. Thefollowing observations are easy to verify.Proposition 2.20. Let C1, C2, C`p1and C`p2be as defined above .(i) If ! = 1 then C1= C`p1and C2= C`p2and the sufficient condition for Theorem 2.12and Theorem 2.5 would be identical.(ii) If = 0:5 then again C1= C`p1and C2= C`p2and the sufficient condition for Theorem2.12 would be identical to Theorem 2.5.(iii) Suppose 0 ! < 1 then C1< C`p1and C2< C`p2if and only if > 0:5.232.3. Algorithms0 0.2 0.4 0.6 0.8 11.522.533.54Weights (?)Error bound noise constant (C 0) ? = 0.1? = 0.3? = 0.5? = 0.7? = 0.9(a) C1vs !0 0.2 0.4 0.6 0.8 11.522.533.544.55Weights (?)Error bound compressibility constant (C 2) ? = 0.1? = 0.3? = 0.5? = 0.7? = 0.9(b) C2vs !Figure 2.3: Comparison of the recovery constants for weighted `preconstruction with various. In all the figures, we set a = 3 and = 1 and p = 25.Proposition 2.20 reflects the results shown in Figure 2.3. Figures 2.3.a and 2.3.b show howconstants C1and C2in (2.10) change with ! for different values of . Notice that when weincrease we get smaller error constants.When < 0:5, i.e., when our estimate is less than 50% accurate, using bigger weights wouldresult in more robust recovery which is useful when we do not know the estimate accuracy ex-actly. However, numerical results?Section 2.4?show that using intermediate weights usuallyresults in better recovery not only when < 0:5 but also in some cases of > 0:5. When! = 1, the conditions of Theorems 2.12 and 2.5 become identical. For all values of ! < 1,having a support estimate accuracy > 0:5 results in a weaker condition on the RIP constantand smaller error bound constants compared with the conditions of standard `p. On the otherhand, if < 0:5, i.e., the support estimate has low accuracy, then standard `phas weaker suf-ficient recovery conditions and smaller error bound constants compared to weighted `p. Thisbehavior is similar to that derived for weighted `1minimization in [1].2.3 AlgorithmsIn this section, we study the algorithm that we have used for the weighted `pminimizationproblem (2.8).242.3. Algorithms2.3.1 Algorithmic issuesIn this section, we study the performance of three different algorithms that attempt to solve the`pminimization problem. There are a few algorithms which are commonly used to solve thisminimization problem including the projected gradient method [16], the iterative reweighted `1method [28], and the iterative reweighted least squares method [29]. Since the `pminimizationproblem is non-convex and several local minima exist, these algorithms attempt to convergeto local minima that are close to the global minimizer of the problem. To that end, the onlyproofs of global convergence that currently exist assume that the global minimizer can be foundif a feasible point can be found. However, numerical experiments show that these algorithmsperform well when the measurement matrix has i.i.d. Gaussian random entries.We describe these algorithms in the remainder of this section.2.3.2 Projected gradient methodAn approach based on the projected gradient method has been proposed in [16] and also used,for example, in [2] to compute approximate solutions of (1.5).An iteration of this algorithm minimizes a smoothed `pobjective instead of the `pnorm. Thesmoothed `pcost function is given by (Pi(x2i+ ff)p=2)1=p. The smoothing parameter ff isinitialized with a large value, say 10. After taking a projected gradient step in every iteration,the value of ff is reduced. In every iteration, the new iterant is projected onto the affine spacedefined by Ax = b.Adopting to the weighted `pminimizationThis method can be adopted to solve the weighted `pproblem by replacing the `p-norm, k kp,with the weighted `pnorm, k kp;w, and making corresponding changes to the algorithm towork for the weighted version. Algorithm 1 explains the details of this algorithm. Here r(fx)i= p wpi (xti (xti)+ ff2)p=21 xti.2.3.3 Iterative reweighted `pThe iterative reweighted `pmethod, proposed in [28], is a modification of the iterative reweighted`1(IRL1) algorithm of [30]. This method uses a modified version of IRL1 to solve the followingpenalized likelihood signal restoration problem:minx2RNkAx bk22+ kxkpp: (2.15)252.3. AlgorithmsAlgorithm 1 Weighted projected gradient method1: Input b = Ax+ e, p, A, ! 2 [0; 1], 2: Output x(t)3: Initialize ff = 10, t = 0, x0 = Ab, [M N ] = size(A), Q = Ay A, wi=!; i 2 1; i 2 c4: loop5: fx=Pi(w2i ((xti)2+ ff))p=26: d = r(fx)7: pd = dQ d8: t = t+ 19: line search10: xt = xt1 + l pd11: Indicator=p1pxt1pp12: Idx=find(Indicator < w ff)13: ff = min(0:98 ff;max(Indicator(Idx))14: end loopHere 0 < p < 1, is a positive penalty parameter, and kxkpp=Pijxijp. To attempt to solve(2.15) the algorithm starts from an initial guess x1 and at each iteration k, it defines thefollowing weights:wki=p(jxkij+ ff)1p; (2.16)and solves the following `1minimization:xk+1= arg minx2RNkAx bk22+ kxk1;wk: (2.17)2.3.4 Iterative reweighted least squaresUsing iterative reweighted least squares (IRLS) to solve the `pminimization problem has beenstudied in [31] and [29]. We have implemented the algorithm proposed in [29]. At each iterationk, of this method we solve the following least squares problem:minu2RNi=NXi=1wiu2isubject to Au = b; (2.18)where wi= ((u(k1)i)2+ ff)p21 and ff is a regularization parameter which takes care of theproblem of having ui= 0. As stated in [29], at each iteration k, we can solve (2.18) explicitlyby solving the Euler-Lagrange equation of (2.18) and using the constraint to solve for the262.4. Numerical examplesLagrange multipliers. This yieldsu(k)= QkAT(AQkAT)1b; (2.19)where Qk= diag(1wi) for w as defined above. The solution at each iteration is used to generatethe weights for the next iteration. After some iterations we get close to the optimal solutionand then we make ff smaller until we get to the desired "convergence level", i.e., when the normof the difference between consequent recovered signals is smaller than a predefined threshold.Our implementation of the weighted versions of these three algorithms show that the the pro-jected gradient method is more compatible with the way we do weighting and the performanceof the modified projected gradient method is better than the modified versions of the othertwo. Therefore, our numerical results, presented in the next section, has been produced byusing modified projected gradient method described in Algorithm 1.2.4 Numerical examplesIn this section, we provide numerical results to show how4p;wimproves the recovery conditionsof sparse and approximately sparse signals compared to 4pand 41;w. We show the resultsfor sparse signals and compressible signals. In these examples, we use the projected gradientalgorithm that is described in Section 2.3.2 to solve the weighted `pminimization problem(2.8).2.4.1 The sparse caseIn this section, we generate signals x 2 RN where N = 500 and with fixed sparsity k = 40. Wecompute the (noisy) compressed measurements of x using a Gaussian random measurementmatrix A with dimensions nN where n varies between 80 and 200 with an increment of 20.In the case of noisy measurements, we have assumed 5% noise, i.e., signal-to-noise ratio of 26db.Figure 2.4 shows the reconstruction signal-to-noise ratio (SNR) averaged over 10 experiments,using weighted `pand weighted `1minimization, versus the number of measurements, i.e., n.In Figures 2.4.a?c, we show result in the noise-free case when = 0:3; 0:5; 0:7 respectively.In figures 2.4.d?f we repeat the above experiment in the noisy case. In all the experiments,p = 0:5 and the recovery was done with the projected gradient algorithm described in Section272.4. Numerical examples2.3.2. Here the SNR is measured in db and is given bySNR(x; x^) = 10 log10(kxk22kx x^k22): (2.20)Recall that when ! = 1 weighted `pis equivalent to regular `pand weighted `1is equivalentto regular `1. Figures 2.4.a?c illustrate that in the noise-free case, the experimental results areconsistent with the theoretical results derived in Theorem 2.12. More precisely when > 0:5,the best recovery is achieved when the weights are set equal to zero and as our estimate getsworse ( decreases), the best recovery is achieved when larger weights are used. Also we ob-serve that weighted `precovers significantly better than weighted `1, especially when we havefew measurements, which is consistent with our analysis in Section 2.2.Remark 2.21. In Figures 2.1 and 2.3 we can see that when < 0:5 both the sufficient recoveryconditions and error bound constants point towards using ! = 1 for better recovery. However,Figure 2.4 shows that this is not always true. We attribute this behavior to the best k-termapproximation term in the error bound of Theorem 2.12. Consider the noise-free case wherethe error bound becomes:k 4p;w(A; y; 0;w) xkp2 C2kp21(!pkx xkkpp+ (1 !p)kxeTc\Tc0kpp):As we can we can see in Figure 2.3, C2decreases when we use bigger weights. Notice that onTc0, xk= 0 so we have kxeTc\Tc0k = k(x xk)eTc\Tc0k which means that kxeTc\Tc0kppis always lessthan kx xkkpp. When we use bigger weights kx xkk would have more impact on the errorwhich results in bigger error when we use bigger weights. Hence using bigger weights decreasesthe constant C2and increases the factor !pkxxkkpp+(1!p)kxeTc\Tc0kpp. Consequently whenthe algorithm cannot recover the full support of x, i.e., when kx xkk > 0, an intermediatevalue of ! in [0; 1] may result in the smallest recovery error. A full mathematical analysis ofthis observation needs to consider all the parameters in Theorem 2.12 including !; k; whichis beyond the scope of this thesis.Figures 2.4.d?f show results for the noisy case. As we can see using intermediate weightsresults in best recovery and we can see that weighted `pis outperforming weighted `1espe-cially when the number of measurements is small. It is also evident that when we have a bettersupport estimate, we reach the best possible recovery with fewer number of measurements n.Next, we investigate the effect of the choice of weights and p on the approximation error, whenwe have support estimates with various accuracy levels. Figure 2.5 shows the average SNR282.4. Numerical examplesover 10 experiments with = 1 and n = 100 for different values of p, !, and when wehave 5% noise and when we have no noise. Figures 2.5.a?c show the no-noise case. As wecan see using smaller p is beneficial when we do not have a good support estimate. However,Notice that using smaller p results in slower recovery and dealing with the non-convexity of `pnorm is more challenging when p is small. It is also worth noting that we again observe betterperformance with intermediate weights when < 0:5.Figures 2.5.d, 2.5.e, and 2.5.f show the recovery for different values of p when we have 5%noise. As we observe generally using smaller p results in better recovery and again intermedi-ate weights are recovering better. One important thing to observe here is how using smaller pimproves the results when we do not have a good support estimate, which is a result of weakersufficient recovery conditions when p is smaller.Figure 2.6 shows the averaged SNR for recovering a 40-sparse signal using weighted `pwithdifferent support estimate sizes. Figures 2.6.a?c show the results when we do not have anynoise and Figures 2.6.d?f show the results when we have 5% noise. We can see that generallyusing weighted solvers is useful especially in the non-convex case where we get better resultseven when = 0:3. Note that having a larger support estimate size usually results in betterrecovery, but the results are more sensitive to the accuracy of the support estimate than itssize. Another important point here is that when we are in the noisy case or when we do nothave a good support estimate, using intermediate weights results in better recovery which isvery important for applications. Also this figure emphasizes the benefit of using weighted `pwhen we do not have a good support estimate.2.4.2 The compressible caseIn this section we generate signals x 2 RN , sorted coefficients of which decay like jd wherej 2 f1; 2; :::; Ng and d > 1. Figure 2.7 illustrates the results when d = 1:1. Here, we attemptto estimate the support of the best k-term approximation of the signal when n = 100. Ac-cordingly, we find with respect to the best 40-term approximation. Figures 2.7.a?c showthe no-noise case where Figures 2.7.d?f have 5% noise. As we can see when we have a goodsupport estimate, having a larger support estimate ends in better recovery and like the sparsecase when we do not have a good support estimate, i.e., <0.5, using intermediate weightsresults in better reconstruction. When =0.7 using zero weights is giving us the best recovery.Generally in this figure we see that unlike the sparse case using weighted `pfor recoveringcompressible signals does not give us much better results compared to weighted `1, specifically292.4. Numerical examplesin Figure 2.7.d we see that weighted `1with zero weight is recovering better than weighted `p.We believe that this is an artifact of the algorithm we are using. As we said before we do nothave any proof for global convergence of the algorithm and the projected gradient algorithmhandles the local minima by a smoothing parameter ff. In the noisy compressible case we havea large number of these local minima which may be a reason that in some of the compressiblenoisy cases we see that weighted `1is recovering better than weighted `p. However, an exactexplanation of why this happens is beyond the scope of this thesis.Figure 2.8 shows the case when d = 1:5. Here the decay of coefficients of the signal is muchfaster so we have used fewer coefficients as our support. As we can see the results are similarto the case with d = 1:1 and generally when we have larger support estimate we get betterrecovery. Here we find using the best 20-sparse approximation. Again in the noisy com-pressible case we see that weighted `1is recovering better than weighted `pwhen we have agood support estimate.302.4. Numerical examples80 100 120 140 160 180 200050100150200250number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(a) = 0:7 with no noise80 100 120 140 160 180 200050100150200250number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(b) = 0:5 with no noise80 100 120 140 160 180 200050100150200250number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(c) = 0:3 with no noise80 100 120 140 160 180 200051015202530number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(d) = 0:7 with 5% noise80 100 120 140 160 180 200051015202530number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(e) = 0:5 with 5% noise80 100 120 140 160 180 200051015202530number of measurments nSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(f) = 0:3 with 5% noiseFigure 2.4: Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 10 experiments for sparse signals with variable weights and measurementsand = 1 and p = 0:5.312.4. Numerical examples0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7050100150200250pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(a) = 0:7 with no noise0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7050100150200250pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(b) = 0:5 with no noise0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7020406080100120140160180200pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(c) = 0:3 with no noise0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.710152025pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(d) = 0:7 with 5% noise0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.76810121416182022pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(e) = 0:5 with 5% noise0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.724681012141618pSNR regular Lpweighted Lp, ? = 0weighted Lp, ?= 0.1weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 0.9(f) = 0:3 with 5% noiseFigure 2.5: Comparison of SNR for variable weights and p for = 1, k = 40 and n = 100.322.4. Numerical examples20 25 30 35 40 45 50 55 60050100150200250support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(a) = 0:7 with no noise20 25 30 35 40 45 50 55 60020406080100120140160support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(b) = 0:5 with no noise20 25 30 35 40 45 50 55 600102030405060708090100110support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(c) = 0:3 with no noise20 25 30 35 40 45 50 55 600510152025303540support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(d) = 0:7 with 5% noise20 25 30 35 40 45 50 55 6005101520253035support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(e) = 0:5 with 5% noise20 25 30 35 40 45 50 55 600510152025support estimates sizeSNR weighted Lp, ? = 0weighted Lp, ? = 0.3weighted Lp, ?= 0.5weighted Lp, ? = 0.7weighted Lp, ?= 1weighted L1, ?= 0weighted L1, ?= 0.5weighted L1, ?= 1(f) = 0:3 with 5% noiseFigure 2.6: Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for sparse signals x with n = 100; N = 500 with variablesupport size and variable and !332.4. Numerical examples20 25 30 35 40 45 50 55 601314151617181920support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(a) = 0:7 with no noise20 25 30 35 40 45 50 55 601313.51414.51515.51616.5support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(b) = 0:5 with no noise20 25 30 35 40 45 50 55 601010.51111.51212.51313.51414.515support estimates sizeSNR weighted L_p, ?= 0 weighted L_p, ?= 0.3 weighted L_p, ?= 0.5 weighted L_p, ?= 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(c) = 0:3 with no noise20 25 30 35 40 45 50 55 6013141516171819support estimates sizeSNR weighted L_p, ?= 0 weighted L_p, ?= 0.3 weighted L_p, ?= 0.5 weighted L_p, ?= 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(d) = 0:7 with 5% noise20 25 30 35 40 45 50 55 601313.51414.51515.5support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ? = 0.5 weighted L_p, ? = 0.7 weighted L_p, ? = 1 weighted L_1, ? = 0 weighted L_1, ? = 0.5 weighted L_1, ? = 1(e) = 0:5 with 5% noise20 25 30 35 40 45 50 55 601010.51111.51212.51313.51414.5support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ? = 0.5 weighted L_p, ? = 0.7 weighted L_p, ? = 1 weighted L_1, ? = 0 weighted L_1, ? = 0.5 weighted L_1, ? = 1(f) = 0:3 with 5% noiseFigure 2.7: Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for compressible signals x with n = 100; N = 500. Thecoefficients decay with a power d = 1:1. The accuracy of the support estimate is calculatedwith respect to the best k = 40 term approximation.342.4. Numerical examples12 14 16 18 20 22 24 26 282324252627282930313233support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(a) = 0:7 with no noise12 14 16 18 20 22 24 26 2823242526272829303132support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(b) = 0:5 with no noise12 14 16 18 20 22 24 26 2823242526272829303132support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3 weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(c) = 0:3 with no noise12 14 16 18 20 22 24 26 282020.52121.52222.52323.52424.525support estimates sizeSNR weighted L_p, ?= 0weighted L_p, ?= 0.3 weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(d) = 0:7 with 5% noise12 14 16 18 20 22 24 26 282020.52121.52222.5support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(e) = 0:5 with 5% noise12 14 16 18 20 22 24 26 2819.619.82020.220.420.620.82121.221.421.6support estimates sizeSNR weighted L_p, ? = 0 weighted L_p, ? = 0.3weighted L_p, ?= 0.5 weighted L_p, ? = 0.7 weighted L_p, ?= 1 weighted L_1, ?= 0 weighted L_1, ?= 0.5 weighted L_1, ?= 1(f) = 0:3 with 5% noiseFigure 2.8: Comparison of performance of weighted `pand weighted `1recovery in terms ofSNR averaged over 20 experiments for compressible signals x with n = 100; N = 500. Thecoefficients decay with a power d = 1:5. The accuracy of the support estimate is calculatedwith respect to the best k = 20 term approximation.352.5. Stylized application2.5 Stylized applicationIn this section, we apply standard and weighted `pminimization to recover real audio andseismic signals that are compressively sampled.2.5.1 Audio signalsIn this section we examine the performance of weighted `pminimization for the recovery ofcompressed sensing measurements of speech signals. Here the speech signals are sampled at44.1 kHz and we randomly choose only 14th of the samples (their indices chosen randomlyfrom uniform distribution). Assuming that s is the speech signal we have the measurementsy = Rs where R is a restriction operator. We divide our measurements y into 21 blocks, i.e.,y = [yT1; yT2; : : : ; yT21]T . Assuming the speech signal to be compressible in DCT domain (forexample a version of standard DCT is used to compress audio signals in standard MP3), wetry to recover the speech signal using each block measurement.This helps us in two ways: It reduces the size of the problem Considering the fact that the support set corresponding to the largest coefficients does notchange much from one block to another, we can use the indices of the largest coefficientsof each block as a support estimate for the next one.So for each block, we find the speech signal by solving yj= Rjsj, where Rj2 RnjN isthe associated restriction matrix. We also know that speech signals have large low-frequencycoefficients, so we use this fact and the recovered signal at previous block to build our supportestimate and find the speech signal at each block by weighted `p. We choose the supportestimate to be eT = eT 1 [ eT 2. Here eT 1 is the set corresponding to frequencies up to 4 kHz andeT2 is the set corresponding to the largest nj16recovered coefficients of the previous block (for thefirst block eT 2 is empty). The results of using weighted `pand weighted `1for reconstructionof two audio signals (one male and one female) are illustrated in Figure 2.9. Here N = 2048 ,and ! 2 f0; 16;26; : : : ; 1g. Weighted `pgives about 1 db improvement in reconstruction.2.5.2 Seismic signalsThe problem of interpolating irregularly sampled and incomplete seismic data to a regularperiodic grid often occurs in 2D and 3D seismic settings [4].362.5. Stylized application 0 1/6 2/6 3/6 4/6 5/6 1 246810121416?SNR (in dB) male wl1female wl1male wlpfemale wlpFigure 2.9: SNRs of reconstructed signal from compressed sensing measurements plottedagainst !. An intermediate value of ! yields the best performance.Our setting is as explained in Section 1.3.1. We have a seismic line with Nssources, Nrreceivers, and Nttime samples. We are dealing with a signal f 2 RN , where N = NsNrNt.We want to recover a very high dimensional seismic data volume f = Sx by interpolatingbetween a smaller number of measurements b = RMSx, where R is a restriction matrix, Mrepresents the basis in which the measurements are taken, and S is the 2D curvelet transform.Seismic data is approximately sparse in curvelet domain and hence we can formulate theseismic data interpolation problem as an instance of recovery from compressive samples [22,23]. We partition the seismic data volume into frequency slices and approximate x(1) by4p(R(1)MS; b(1); ) where is a small number (estimate of the noise level) and R(1) is thesubsampling operator restricted to the first partition and b(1) is the subsampled measurementsof the data f (1) in the first partition. After this we use the support of each recovered partitionas a support estimate for next partition. In particular for j 1 we approximate x(j+1) by4p;w(R(j)MSH; b(j); ;w) where w is the weight vector which puts smaller weights on thecoefficients that correspond to the support of the previous recovered partition. In [4] theperformance of weighted `1minimization has been tested for recovering a seismic line using50% randomly subsampled receivers. Exploiting the ideas in [4] we test the weighted `pminimization algorithm to recover a test seismic problem when we subsample 50% of the thereceivers using the mask shown in Figure 2.10.b. We omit the details of this algorithm as itmimics the steps taken in [4] when weighted `1is replaced by weighted `p.372.5. Stylized applicationFully Sampled time slice in source?receiver domainSource numberReceiver number10 20 30 40 50 60102030405060(a)Subsampling maskSource numberReceiver number10 20 30 40 50 60102030405060(b)Subsampled time sliceSource numberReceiver number10 20 30 40 50 60102030405060(c)Figure 2.10: (a) Example of a high resolution time slice at t = 0:32 s in the source-receiver (SR)domain, (b) the random subsampling mask where the black lines correspond to the locationsof inactive receivers, and (c) the subsampled time slice. The subsampling ratio is 50%.The Seismic line at full resolution has Ns= 64 sources, Nr= 64 receivers with a sampledistance of 12.5 meters, and Nt= 256 time samples acquired with a sampling interval of4 milliseconds. Consequently, the seismic line contains samples collected in a 1s temporalwindow with a maximum frequency of 125 Hz. To access frequency slices, we take the onedimensional discrete Fourier transform (DFT) of the data along the time axis. We solve the`pand weighted `pminimization problems. In each of the weighted `pproblems, the supportestimate set of a partition is derived from the analysis coefficients of the previously recoveredpartition. Moreover, p is set to be 0:5 and the weight is set to 0.3.382.5. Stylized applicationOriginal shot gatherDistance (m)Time (sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(a)Subsampled shot gatherDistance (m)Time (sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(b)Figure 2.11: (a) Shot gather number 32 from a seismic line from the Gulf of Suez. (b)Subsampled shot gather using column 32 from the mask in Figure 2.10.b.Lp minimization in SRDistance (m)Time(sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(a)Weighted Lp minimization in SRDistance (m)Time(sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(b)Figure 2.12: (a) Recovered shot gather using `pminimization in the SR domain. (b) Recoveredshot gather using weighted `pminimization in the SR domain.Figures 2.11.a and 2.11.b show a fully sampled and the corresponding subsampled shot gather,respectively. The shot gather corresponds to shot number 32 of the seismic line. Figures2.12.a and 2.12.b show the reconstructed shot gathers using `pminimization and weighted`pminimization, respectively. The error plots of both reconstructions are shown in Figures2.13.a and 2.13.b. The error plots show that the magnitude of the reconstruction error ofweighted `pminimization is smaller than that of standard `p. Figure 2.14 shows the SNRsof all shot gathers recovered by using regular and weighted `pand `1minimization problems.392.5. Stylized applicationLp error image in SRDistance (m)Time(sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(a)Weighted Lp error image in SRDistance (m)Time(sec)100 200 300 400 500 600 700 8000.10.20.30.40.50.60.70.80.91(b)Figure 2.13: (a) Error plots showing the difference between the original shot gather and thereconstruction from `pminimization in the source-receiver domain. (b) Error plots show-ing the difference between the original shot gather and the reconstruction from weighted `pminimization in the SR domain.The plots demonstrate that recovery by weighted `pin the frequency-source-receiver domainis always better than recovery by regular `p. In this plot we also see that although recoveryby weighted `pminimization is better than regular `1minimization but recovery by weighted`1minimization is still about 1 db better than recovery by weighted `pminimization. Webelieve that similar to the case we see in the noisy compressible case this is an artifact of thealgorithm we are using.402.6. Proof of Theorem 2.120 10 20 30 40 50 60 705678910111213Shot gather numberSNR Lp on freq. in SRW?Lp on freq. in SRL1 on freq. in SRW?L1 on freq. in SRFigure 2.14: Comparison of the SNRs achieved by `1, `p, weighted `1, and weighted `pmini-mization in recovering shot gathers applied to source-receiver domain2.6 Proof of Theorem 2.12Recall that eT , an arbitrary subset of f1; 2; :::; Ng, is of size k where 0 a and a is somenumber larger than 1. Let the set eT= T0\eT and eT= Tc0\eT where, j eTj = eT = k and+ = 1.Let x = x+ h be a minimizer of the weighted `pproblem. Then:kx+ hkp;w kxkp;w) kx+ hkpp;w kxkpp;w:Using the weights we have:!pkxeT+ heTkpp+ kxeTc+ heTckpp !pkxeTkpp+ kxeTckpp:412.6. Proof of Theorem 2.12Figure 2.15: Illustration of the signal x and weight vector w emphasizing the relationshipbetween the sets T0and eT .Consequently,!pkxeT\T0+ heT\T0kpp+ !pkxeT\Tc0+ heT\Tc0kpp+ kxeTc\T0+ heTc\T0kpp+ kxeTc\Tc0+ heTc\Tc0kpp !pkxeT\T0kpp+ !pkxeT\Tc0kpp+ kxeTc\T0kpp+ kxeTc\Tc0kpp:We use the forward and reverse triangle inequalities to get:!pkheT\Tc0kpp+ kheTc\Tc0kpp !pkheT\T0kpp+ kheTc\T0kpp+ 2(!pkxeT\Tc0kpp+ kxeTc\Tc0kpp):Adding and subtracting !pkheTc\Tc0kppto the left hand side and adding and subtracting !pkheTc\T0kpp+!pkxeTc\Tc0kppto the right hand side we get:!pkheT\Tc0kpp+ !pkheTc\Tc0kpp+ kheTc\Tc0kpp !pkheTc\Tc0kpp !pkheT\T0kpp+ !pkheTc\T0kpp+ kheTc\T0kpp !pkheTc\T0kpp+2(!pkxeT\Tc0kpp+ !pkxeTc\Tc0kpp+ kxeTc\Tc0kpp !pkxeTc\Tc0kpp):Since khTcokpp= kheT\Tc0kpp+ kheTc\Tc0kppwe get:!pkhTc0kpp+ (1 !p)kheTc\Tc0kpp !pkhT0k+ 2(!pkxTc0kpp+ (1 !p)kxeTc\Tc0kpp): (2.21)422.6. Proof of Theorem 2.12We also have khTcokpp= !pkhTcokpp+ (1 !p)kheT\Tc0kpp+ (1 !p)kheTc\Tc0kpp: Combining thiswith (2.21) we get:khTc0kpp !pkhT0kpp+ (1 !p)(kheTc\T0kpp+ kheT\Tc0kpp)+2(!pkxTc0kpp+ (1 !p)(kxeTc\Tc0kpp):(2.22)eT= T0\eT ) kheTc\T0kpp+ kheT\Tc0kpp= khT0[eTneTkpp:khTc0kpp !pkhT0kpp+ (1 !p)khT0[eTneTkpp+ 2(!pkxTc0kpp+ (1 !p)(kxeTc\Tc0kpp): (2.23)Now partition T c0into sets of T1; T2; :::; jTjj = ak for j 1, such that T1is the set of indicesof the ak largest (in magnitude) coefficients of hTc0and so on. Finally let T01:= T0[ T1. Nowwe can find a lower bound for kAhkp2using the RIP constants of the matrix A. We have:kAhkp2= kAhT01+Xj2AhTjkp2 kAhT01kp2Xj2kAhTjkp2 (1 ak+jT0j)p2khT01kp2 (1 + ak)p2Xj2khTjkp2:(2.24)Here we also use the fact that k:kp2satisfies the triangle inequality for 0 < p < 1.Now we should note that jhTj+1(l)jp jhTj(l0)jp for all l 2 Tj+1and l0 2 Tj, and thusjhTj+1(l)jpkhTjkppak. It follows that khTjk22 (ak)12pkhTjk2pand consequently :Xj2khTjkp2 (ak)p21Xj1khTjkpp= (ak)p21khTc0kpp: (2.25)Using (2.25)in (2.24) we get:kAhkp2 (1 ak+jT0j)p2khT01kp2 (1 + ak)p2(ak)p21khTc0kpp: (2.26)Next, consider the feasibility of x and x. Both vectors are feasible, so we have kAhk2 2".Also note that jT0[eT neTj = (1 + 2)k and khT0kpp jT0j1p2khT0kp2. Using these and432.6. Proof of Theorem 2.12(2.23) in (2.26) we get:(1 ak+jT0j)p2khT01kp2 (2")p+ 2(1 + ak)p2(ak)p21!pkxTc0kpp+ (1 !p)kxeTc\Tc0kpp+(1 + ak)p2(ak)p21!pjT0j1p2khT0kp2+ (1 !p) ((1 + 2)k)1p2khT0[eTneTkp2:(2.27)T1contains the largest ak coefficients of hTc0with a > 1 so j eT n eTj = (1 )k ak thenkhT0[eTneTk2 khT01k2, also we have khT0k2 khT01k2so we get:khT01kp2(2")p+ 2(1 + ak)p2(ak)p21!pkxTc0kpp+ (1 !p)kxeTc\Tc0kpp(1 ak+jT0j)p2 (1 + ak)p2(ak)p21!pjT0j1p2+ (1 !p) ((1 + 2)k)1p2:(2.28)To complete the proof denote by hTc0[m] the m-th largest coefficient of hTc0and observe thatjhTc0[m]jpkhTc0kppm. As hTc01[m] = hTc0[m+ ak] we have:khTc01k22=Xmak+1jhTc0[m]j2Xmak+1(khTc0kppm)2pkhTc0k2p(ak)2p1(2p 1): (2.29)Here the last inequality follows because for 0 < p < 1:Xmak+1m2pZ1akt2pdt =1(ak)2p1(2p 1):Combining (2.29) with (2.23) we get:khTc01kp2(ak)2p1(2p 1)p2!pkhT0kpp+ (1 !p)khT0[eTneTkpp+ 2!pkxTc0kpp+ (1 !p)(kxeTc\Tc0kpp):(2.30)We showed that khT0[eTneTk2 khT01k2and khT0k2 khT01k2:Using these in (2.30) we get:khTc01kp2(ak)2p1(2p 1)p2!pjT0j1p2+ (1 !p) ((1 + 2)k)1p2khT01kp2+ 2!pkxTc0kpp+ (1 !p)(kxeTc\Tc0kpp):(2.31)442.6. Proof of Theorem 2.12Now we can find a bound for khk2using (2.28) and (2.31):khk22= (khT01kp2)2p+ (khTc01kp2)2pkhT01kp2+ khTc01kp22p:(2.32)khkp20B@1 +!pjT0j1p2+(1!p)((1+2)k)1p2(ak)2p1(2p1)p21CA(2")p(1 ak+jT0j)p2 (1 + ak)p2(ak)p21!pjT0j1p2+ (1 !p) ((1 + 2)k)1p2+20B@(1 + a)p2ap21+(1(a+1)k)p2a2p1(2p1)p21CA!pkxTc0kpp+ (1 !p)kxeTc\Tc0kpp(1 ak+jT0j)p2 (1 + ak)p2(ak)p21!pjT0j1p2+ (1 !p) ((1 + 2)k)1p2;(2.33)with the condition that the denominator is positive, equivalently:ak+a2p1(!p+ (1 !p)(1 + 2)1p2)2p(a+1)k<a2p1(!p+ (1 !p)(1 + 2)1p2)2p 1:(2.34)45Chapter 3Weighted AMP3.1 IntroductionSignificant effort has been made recently to find fast algorithms for recovering sparse signalsfrom a small number of linear measurements. Our setting is as before: Let x 2 RN be asparse vector. We try to recover x from n < N linear and potentially noisy measurementsacquired via y = Ax + e. Here A is an n N matrix whose coefficients are drawn from asub-Gaussian distribution and kek . As mentioned before the BP problem (1.4) is perhapsthe most common approach to recover x and can be solved by linear programming algorithms.Relatively high computational complexity of these algorithms has made them difficult to usein applications where the signals are very high dimensional. On the other hand, the lowcomputational complexity of iterative algorithms has made them an appealing alternative forBP [18?20]. A general form of these algorithms is asxt+1= (xt+ Azt; fi^t)zt= y Axt;(3.1)where as before xt is the current estimate of x, zt is the current residual, A is the Hermitianof the measurement matrix and is a non-linear thresholding function which acts component-wise on its vector-valued argument. Two popular examples are the soft thresholding function(a; b) = sign(a)(a b)+, and the hard thresholding function where H(a; b) = aI(jaj b)+.More precisely these functions act on the coefficients of the signal a and zero out any valuewhich is less than the scalar b in magnitude.Recently an extensive numerical study [32] found that even under optimal tunings, the recoveryconditions achieved by IT is worse than those of BP. As mentioned in the introduction Donoho,Maleki and Montanari proposed a new iterative thresholding algorithm which is referred toas approximate message passing [21] that was shown to enjoy both the low complexity of ITalgorithms and the superior recovery conditions of BP. As stated also in Section 1.2.3, theAMP algorithm starts from an initial x0 and an initial threshold fi^ 0 = 1 and iteratively goes463.2. Construction of the graphical model for weighted BPbyxt+1= (xt+ Azt; fit);zt= y Axt+ 1zt1h0(xt1+ Azt1; fit1)i:(3.2)Notice that the only difference between AMP (3.2) and generic IT algorithm (3.1) is theextra term 1zt1h0(xt1 + Azt1; fi t1)i in the calculation of the residual. This term hasbeen derived in [21] using the theory of belief propagation in graphical models and has beenempirically shown to improve the recovery conditions. Statistical physicists call this term theOnsager reaction term [33].In this section we design a weighted approximate message passing algorithm for recoveringsparse signals when there exists prior information about the support of the signal. We buildup the weighted AMP algorithm by following [34] step by step and empirically show thatwhen the support estimate is accurate enough, weighting results in faster recovery and bettersparsity-undersampling tradeoff. In particular we derive a "weighted AMP" algorithm tosolve the weighted `1minimizationargmins2RNjjsjj1;wsubject to y = As; (3.3)where w 2 f!; 1gN is the weight vector and ksk1;w:= iwijsij is the weighted `1norm. Notethat in (3.3) we have restated the weighted `1minimization (1.7) when = 0. As before givena support estimate eT f1; :::; Ng, wj= ! < 1 for j 2 eT and wj= 1 for j =2 eT .3.2 Construction of the graphical model for weighted BPEstimating marginalsAssume [w1;w2; :::;wN]T are the weights we use for the coefficients of the signal s. Considerthe following distribution over variables s1; s2; : : : ; sN:(ds) =1ZNYi=1exp(wijsij)nYa=1fya=(As)ag; (3.4)where fya=(As)agdenotes a Dirac distribution on the hyperplane ya= (As)a. As !1 themass of this distribution concentrates around the solutions of y = As which has more zerocoefficients and hence the solution of (3.3)?the maximum of exp(jtj) is achieved when t = 0.If the solution of (3.3) is unique, then finding the marginals of (3.4) gives us the solution.473.2. Construction of the graphical model for weighted BP1 a n1 i NFigure 3.1: Factor graph associated to the probability distribution 3.4. Circles correspond tovariables si, i 2 [n] and squares correspond to measurements ya, a 2 [m].Belief propagationBelief propagation provides a low complexity tool to estimate the marginals of (3.4). Thismethod has been introduced in [34]. In this section we review this method and make thecorresponding changes for the weighted version. For any d 2 N define [d] := f1; 2; : : : ; dg.Consider the bipartite factor graph G = (V; F;E), shown in Figure 3.1, which includes avariable node i 2 V = [N ] for each variable siand a factor node a 2 F = [n] for each termfya=(As)ag. E = [N ] [n] = f(i; a) : i 2 [N ]; a 2 [n]g where variable i and factor a areconnected by an edge if fya=(As)agdepends non-trivially on si, i.e., if Aai6= 0. As in our caseA is a dense matrix, G is a complete bipartite matrix [34]. The state variables of this beliefpropagation are the messages fi!agi2V;a2Fand f^a!igi2V;a2Fwhich are associated to eachedge of the factor graph G. The update rules for these densities are:t+1i!a(si)=ewijsijYb 6=a^tb!i(si);^ta!i(si)=ZYj 6=itj!a(si) fya=(As)agds:(3.5)Here and below the subscripts denote the iteration number and =means identity betweendistributions up to a normalization constant.This message passing has two challenges in implementation. First the messages are den-sity functions over the real line and keeping track of them is difficult unless they have somestructure. Furthermore at each iteration, 2nN messages should be calculated which is com-putationally expensive. In Sections 3.3 and 3.4 we show that in the large system limit andas !1 this message passing algorithm is equivalent to the following simple iterative algo-483.3. Large system limitrithm:As mentioned in (3.2) at t = 0 we start from x0 = 0 and fi^ 0 = 1 and z0 = y. Assumew = [w1; w2; : : : ; wN]T is the weight vector. Then the algorithm proceeds as follows:xt+1= (xt+ Azt; fi^tw)zt= y Axt+ 1zt1h0(xt1+ Azt1; fi^t1w)ifi^t=fi^t1h0(xt1+ Azt1; fi^t1w)i:(3.6)In each iteration, xtiis the mean value of the message ti!a, fi^ t is the variance of ti!a, and theresidual ztacorresponds to the mean of ^ta!i. An analysis of these terms?following the stepsin Section 3 of [34]?is given in the next section.3.3 Large system limitIn this section we explain the derivation of (3.6) which is a straight-forward adaptation tothe weighted case of the derivation of Section 3 of Donoho et al. in [34]. We restate someof lemmas and theorems in that paper without providing the proofs and justify the simplechanges which should be done to solve the weighted `1problem (1.7) by weighted AMP.The following lemma in [34, Lemma 3.1] approximates ^ta!iby a Gaussian distribution. Werestate this lemma without making any changes.Lemma 3.1. Let xtj!aandfitj!abe, respectively, the mean and variance of the distribution^tj!a. Further assumeRjs jj3dtj!a(sj) Ctuniformly in N and n. Then there exist aconstant C 0tsuch thatjj^ta!i^ffita!ijjKC0tN12(fi^ta!i)32;^ffita!i(dsi) :=sA2ai2fi^ta!iexpf2fi^ta!i(Aaisi zta!i)2gdsi;(3.7)where the distribution parameters are given byzta!i:= yaXj 6=iAajxtj!a;fi^ta!i:=Xj 6=iA2ajfitj!a:(3.8)493.3. Large system limitHere jjjjKis Kolmogorov distance, which for two distributions 1and 2is defined asjj1 2jjK:= supa2RjZa11(dx)Za11(dx)j: (3.9)Notice that fi^ ta!i=Pj 6=iA2ajfitj!a= Ca A2aifiti!a, where Ca=PjA2ajfitj!a. Assuming thatmatrix A is drawn from a random Gaussian distribution with mean 0 and variance 1nwe canapproximate fi^ ta!iby an edge independent quantity fi^ t.Motivated by this lemma we find the mean and variance of the messages fi t+1i!a(si). Considerthe following family of densities introduced in [34]f(s;x; b) :=1z(x; b)expfjsj2b(s x)2g; (3.10)where z(x; b) is a normalization constant.Also denote the mean and variance of these distributions asF(x; b) := Ef(;x;b)(Z); G(x; b) := Varf(;x;b)(Z); (3.11)where Z has density f(;x; b) [34]. Simple modification of the Lemma 3.2 in [34] gives us themean and variance of fi t+1i!a(si). Notice that in our case, fis replaced by fwiand similarlywe replace Fand Gby Fwiand Gwirespectively.Lemma 3.2. Suppose at iteration t, the messages from factor nodes to the variable nodesare set to be fi^ ta!i(si) =^ffita!i(si), with ^ffita!ias defined in (3.7) with parameters zta!iandfi^ta!i= fi^t. Then at the next iteration we havet+1i!a(si) = ffit+1i!a(si)f1 +O(s2in)g; ffit+1i!a(si) = fwi(si;Xb 6=aAbiztb!i; fi^t): (3.12)Combining this lemma with Lemma 3.1, the mean and variance is given byxt+1i!a= Fwi(Xb 6=aAbiztb!i; fi^t); fiti!a= Gwi(Xb 6=aAbiztb!i; fi^t): (3.13)503.4. Large limitProof. Combining (3.7) and (3.5) we have:t+1i!a(si)=ewijsijYb 6=a^tb!i(si) = expfw ijsijXb6=a2fi^t(Aaisi ztb!i)2g=expfwijsij2fi^t(n 1ns2i 2siXb 6=aAbiztb!i)2g=expf(wi)jsij(wi)2(wifi^t)(n 1ns2i 2siXb 6=aAbiztb!i)2g;(3.14)which coincides with ffit+1i!a(si) up to terms of s2in. Notice that here we have used the fact thatA2ai1pnand =means identity between distributions up to a normalization constant. Andfinally the formulae for xt+1i!aand fi ti!afollows from (3.11).xt+1i!a= Fwi(Xb 6=aAbiztb!i; wifi^t);zta!i= yaXj 6=iAajztj!a;fi^t+1i=nGwi(NXi=1Abiztb!i; wifi^t):(3.15)3.4 Large limitAs explained in Section 3.2 we are interested in the case where ! 1. In this section wesimplify the belief propagation formulas (3.15). In Section 3.3 of [34] the functions Fand Ghave been studied in the large limit. Here we follow the same argument. Consider the softthresholding function:(x; b) =8>>>><>>>>:x b; if x > b0; if b x bx+ b; if x < b: (3.16)We can easily confirm that(x; wib) = argmins2Rfjsj+12wib(s x)2g = argmins2Rfwijsj+12b(s x)2g: (3.17)In the ! 1 limit, the integral that defines Fwi(x; wib) is dominated by its maximumvalue which is s = (x; wib). Therefore Fwi(x; wib) ! (x; wib). The variance can be513.5. From message passing to AMPestimated by approximating fwi(s;x;wib) near s. If s = 0 then fwi(s;x;wib) can beapproximated by a Laplace distribution which leads to Gwi(x; wib) = (1(wi)2) and if s 6= 0then fwi(s;x;wib) can be approximated by a Gaussian distribution distribution which leadsto Gwi(x; wib) = (1wi). Hence using Lemma 3.3 of [34] we get the following lemma:Lemma 3.3.limwi!1Fwi(x; wib) = (x; wib);limwi!1Gwi(x; wib) =limwi!1wiGwi(x; wib)wi=wib 0(x; wib)wi= b0(x; wib):(3.18)Accordingly, we are led to the following simplified message passing algorithm:xt+1i!a= (Xb 6=aAbiztb!i; wifi^t);zta!i= yaXj 6=iAajztj!a;fi^t+1=fi^tNNXi=10(XbAbiztb!i; wifi^t):(3.19)3.5 From message passing to AMPThe simplified message passing algorithm (3.19), still needs 2nN updates at each iterationwhich is computationally intractable for large problems. In Section 3.4 of [34] the regularmessage passing algorithm has been approximated by AMP with approximations xti!a=xti+ xti!a+ O(1N) and zti!a= zti+ zti!a+ O(1N) which leads to the algorithm (3.2). Anidentical approach can be applied to (3.19) which approximates this message passing algorithmby the following algorithm. As calculations are the same as calculations in Section 3.4 of [34]once we include the weighting, we omit them.xt+1= (xt+ Azt; fi^tw)zt= y Axt+ 1zt1h0(xt1+ Azt1; fi^t1w)ifi^t=fi^t1h0(xt1+ Azt1; fi^t1w)i:(3.20)As mentioned earlier the main advantage of weighted AMP over weighted `1minimization isthe low computational complexity of this algorithm. Figures 3.2 and Figure 3.3 illustrate this523.5. From message passing to AMP0 2 4 6 8 10 12050100150200250300 Recovery time for weighted ?1 in secondsTime in seconds# of experiments(a) Weighted `10 0.5 1 1.5 2 2.5 3 3.5050100150200250300 Recovery t ime for weighted AMP in secondsTime in seconds# of experiments(b) Weighted AMPFigure 3.2: Histograms of the recovery time of weighted `1and weighted AMP. Plots showthe times it takes for (a) weighted `1, and (b) weighted AMP to recover 1000 k-sparse signalsx 2 R4000 with k = 400 and n = 1500 measurements when we have a support estimate with50% accuracy.advantage. Here we use weighted AMP and weighted `1to recover 1000 400-sparse signalsx 2 R4000 by n = 1500 linear measurements obtained via y = Ax where A 2 R15004000is a Gaussian random measurement matrix. For each instance of the experiment, assumingthat the signal x is supported on the set T?with jT j = 400?we have a support estimate eTwith 50% accuracy, i.e., jT\eT jjT j= 0:5. Our goal is to compare the recovery time of weightedAMP and weighted `1. To do this we compute the time it takes for each algorithm to findan approximation x such that kx xk 108, i.e., 80-db SNR. Figure 3.2.a shows thehistogram of the time it takes for weighted `1to recover the signals and Figure 3.2.b showsthe histogram for weighted AMP. Notice the difference in the horizontal axes. As we can seein this figure, more than 98% of the signals was recovered with weighted AMP in less than 2.5seconds, whereas, the fastest recovery with weighted `1minimization is 2.5 seconds.Figure 3.3 shows the recovery times of each one of the 1000 signals sorted with respect to therecovery time of weighted `1. In this figure we can see that weighted AMP is always?and insome cases, up to 10 times?faster than weighted `1.In Section 3.6 we introduce a reweighted AMP algorithm to recover sparse signals and inSection 3.7 we present extensive numerical results comparing weighted `1, weighted AMP, andreweighted AMP.533.6. Reweighted AMP and reweighted W-AMP0 100 200 300 400 500 600 700 800 900 1000024681012Re c o ve r y tim e s o f we ig h te d am p a n d we ig h te d ? 1 in s e c o n d sRecovery time in seconds Weighted L1Weighted AMPFigure 3.3: Comparison of recovery time of each one of the 1000 signals by weighted `1andweighted AMP. The experiments are sorted with respect to the time it takes for weighted `1to recover the signal. Specifically, the first experiments is one that weighted `1minimizationrecovers it faster than all the others and experiment number 1000 is the one which is recoveredslowest by weighted `1.3.6 Reweighted AMP and reweighted W-AMPIn this section we introduce the reweighted AMP and reweighted W-AMP algorithms for re-covering sparse and compressible signals and show preliminary results to justify the advantagesof reweighting.3.6.1 Reweighted AMPFigure 3.4 shows the percentage of the true support recovered versus the iteration numberwhen using regular AMP to recover a 100-sparse signal in R1000 and taking 250, 300, 350 and400 measurements. The measurement matrix is a Gaussian normalized random matrix.As we see in this figure even when the number of measurements is not enough (when AMPdoes not reach full recovery), more than 50% of the support is recovered after a few iterations.This observation motivates us to design a reweighted AMP algorithm which finds the supportof the current estimate every 20 iterations and uses that as an approximation for the true543.6. Reweighted AMP and reweighted W-AMP0 50 100 150 200102030405060708090100Iteration numberPercentage of support recoveredk=100 , n=1000 m=250m=300m=350m=400Figure 3.4: Percentage of the support recovered by AMP versus iteration number when thesignal x 2 R1000 is 100-sparse.support and recovers the signal by solving a weighted AMP after that. Algorithm 2 explainsthe details of this algorithm.Reweighted AMP increases the convergence speed and gives us better sparsity-undersamplingtrade-off. In other words there are cases that the number of measurements are not enoughto recover the signal using regular AMP (and hence `1) but reweighted AMP can recover thesignal using the same number of measurements. Figure 3.5 compares reweighted and regularAMP. Here we recover a 100-sparse signal in R1000 using an 250 1000 Gaussian matrix (left)and an 350 1000 Gaussian matrix (right). Both figures show the misfit error kxtrecxkkxk( tis the current iteration) versus the iteration number. The left figure shows the case that thenumber of measurements is not enough in order to get full recovery by regular AMP and theright figure shows the case that regular AMP achieves full recovery.To test our reweighted AMP algorithm, we compare the recovery performance of this algorithmwith two well known algorithms that use reweighting to enhance the recovery conditions, i.e.,iterative reweighted `1minimization (IRL1) [30], and support driven reweighted `1minimiza-tion (SDRL1) [35] in recovering signals x 2 RN with N = 1000. First we compare the resultswhen we use these algorithms to recover 100 k-sparse signals using compressed measurementsobtained by Gaussian matrices A 2 RnN where n 2 fN10;N4;N2g. In Figure 3.6 we compare553.6. Reweighted AMP and reweighted W-AMPAlgorithm 2 Reweighted AMP1: Input b = Ax, tmax, !2: Output x(t)3: Initialize p^ = 0:98, ^k =nlog(Nn)2, wi= 1 for all i 2 f1; : : : ; Ng, = nN, = ;, t = 0, x0 = 0,z0= y, fi^ 0 = 14: while t < tmax do5: t=t+16: xt = (xt1 + Azt1; fi^ t1w)7: zt = y Axt + 1zt1h0(xt1 + Azt1; fi^ t1w)i8: fi^ t = fi^t1h0(xt1+ Azt1; fi^t1w)i9: if mod(t; 20) = 10 then10: l = minjj s.t. kxtk p^kxtk11: s = min(l; ^k)12: = supp(xtjs)13: Set the weights equal to wi=(1; if i 2 c!; if i 2 14: end if15: end while0 50 100 1500.10.20.30.40.50.60.70.80.91Iteration numberrecovery errork=100 , n=1000 Regular AMP Reweighted AMP(a) m = 2500 20 40 60 80 10000.10.20.30.40.50.60.70.80.9Iteration numberrecovery errork=100 , n=1000 Regular AMP Reweighted AMP(b) m = 350Figure 3.5: Comparison between regular AMP and reweighted AMP when m=250 and m=350,recovering a 100-sparse signal in R1000.full recovery success rate when kn2 f110;210;310;410;510g. For the sake of comparison wealso include the recovery results of regular AMP and WSPGL1 algorithm introduced in [36].The figure shows that our proposed reweighted AMP has comparable performance with IRL1and SDRL1 (in some cases reweighted AMP outperforms the other two). Notice that both563.6. Reweighted AMP and reweighted W-AMP0.1 0.2 0.3 0.4 0.500.20.40.60.81k/nSuccess raten/N=1/10 SDRL1Reweighted AMPIRL1WSPGL1AMP(a) nN=1100.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.500.20.40.60.81k/nSuccess raten/N=1/4 SDRL1Reweighted AMPIRL1WSPGL1AMP(b) nN=140.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.500.20.40.60.81k/nSuccess raten/N=1/2 SDRL1Reweighted AMPIRL1WSPGL1AMP(c) nN=12Figure 3.6: Comparison of percentage of exact recovery by reweighted AMP, IRL1, SDRL1,WSPGL1, and regular AMP recovering 100 sparse signals x 2 R1000 with different number ofmeasurement, n, and different levels of sparsity.IRL1 and SDRL1 algorithms solve a series of weighted `1minimizations to recover the signaland our proposed reweighted AMP is much faster than both these algorithms. On the otherhand the recovery results of reweighted AMP are significantly better than algorithms whichhas comparable complexities, i.e., regular AMP and WSPGL1.Next, we generate compressible signals x 2 RN , sorted coefficients of which decay like jpwhere j 2 f1; 2; : : : ; Ng with dimension N = 1000. We consider the case where nN= 0:3 anddecay power p 2 f1:1; 1:5; 2g. Figure 3.7 shows the histogram of ratio of the reconstructionerror of reweighted AMP over that of IRL1 for 100 experiments. Notice that a ratio smallerthan one means that reweighted AMP has smaller reconstruction error than IRL1 and a ratiogreater than one that the the reconstruction error of IRL1 is smaller than that of reweightedAMP.3.6.2 Reweighted W-AMPApplying the same reweighting idea, explained in Section 3.6.1, to the weighted AMP algorithm(3.20), we derive the reweighted W-AMP algorithm when we have prior support informationabout the signal which is explained in Algorithm 3. Notice that here the only difference is inthe way we do reweighting, i.e., as before we use small weights for the coefficients which are inthe support estimates and also every 20 iterations we estimate the support set of the currentestimate and use smaller weights for those coefficients.573.6. Reweighted AMP and reweighted W-AMP0.9 0.95 1 1.05 1.1 1.15 1.2 1.25051015202530Reweighted AMP/IRL1(a) p = 1:1, MSE = 0:09641 1.2 1.4 1.6 1.8 205101520253035Reweighted AMP/IRL1(b) p = 1:5, MSE = 0:01731 1.1 1.2 1.3 1.4 1.5024681012141618Reweighted AMP/IRL1(c) p = 2, MSE = 0:0018Figure 3.7: Histogram of ratio of mean squared error (MSE) between reweighted AMP andIRL1 for recovering compressible signals x 2 R1000, sorted coefficients of which decay like jpwith j 2 f1; 2; : : : ; 100g and with different decay rates p.Algorithm 3 Reweighted W-AMP1: Input b = Ax, tmax, eT , !2: Output x(t)3: Initialize p^ = 0:98, ^k =nlog(Nn)2, wi= 1 for i 2 eT c and wi= ! for i 2 eT , = nN, = ;,t = 0, x0 = 0, z0 = y, fi^ 0 = 14: while t < tmax do5: t=t+16: xt = (xt1 + Azt1; fi^ t1w)7: zt = y Axt + 1zt1h0(xt1 + Azt1; fi^ t1w)i8: fi^ t = fi^t1h0(xt1+ Azt1; fi^t1w)i9: if mod(t; 20) = 10 then10: l = minjj s.t. kxtk p^kxtk11: s = min(l; ^k)12: = supp(xtjs)13: Set the weights equal to wi=8>><>>:1; if i 2 eT c \ c!; if i 2 eT \ c!5; if i 2 14: end if15: end while583.7. Numerical results3.7 Numerical resultsIn this section we provide numerical examples to compare regular, weighted and reweightedAMP with `1and weighted `1algorithms (we use the SPGL1 software to solve the weighted`1minimization problem [37]).Table 3.1 compares the time it takes for AMP, reweighted AMP, and `1minimization to recovera 100-sparse signal in R1000 with maximum SNR of 40 db. Here we take the average recoverytime over 20 experiments when the number of measurements changes from 300 to 500 with anincrement of 50. 1 means that the method fails full recovery in at least one of the examples.Notice that in Table 1 we see that reweighted AMP can recover 100-sparse signals with fewermeasurements compared to regular AMP and `1minimization.To examine this observation more carefully we consider the case that we have prior supportinformation (Notice that in these experiments we also include the cases that the recoveryalgorithm does not use any prior support information). We generate signals x 2 RN whereN = 500 and with fixed sparsity k = 40. We compute the (noisy) compressed measurementsof x using a Gaussian random measurement matrix A with dimensions nN where n variesbetween 80 and 200 with an increment of 20. In the case of noisy measurements, we haveassumed 5% noise, i.e., 26 db SNR. Figure 3.8 shows the reconstruction signal-to-noise ratioaveraged over 20 experiments, using weighted AMP and reweighted W-AMP to recover sparsesignals x 2 RN in the noise-free case and for = 0:3; 0:5; 0:7 ( determines the accuracy ofthe support estimate). The SNR is measured in dB, as defined in (2.20). In these figures wealso include the results of weighted `1, IRL1, SDRL1, and WSPGL1. Notice that when ! = 1,weighted AMP reduces to regular AMP, reweighted W-AMP reduces to reweighted AMP, andweighted `1reduces to regular `1. Figures 3.8.a, 3.8.c, and 3.8.e show the results for weightedAMP and weighted `1when = 0:7; 0:5; and 0:3 respectively. Notice that although weightedAMP is much faster than weighted `1,its recovery performance is slightly better than weighted`1. Similarly Figures 3.8.b, 3.8.d, and 3.8.f show the results for reweighted W-AMP, IRL1,SDRL1, and WSPGL1.Remark 3.1. As increases?support estimate is more accurate?both weighted and reweightedW-AMP achieve full recovery with fewer measurements. Also notice that intermediate valueof ! gives us the best recovery when > 0:5. This behavior is similar to what we see withweighted `1and weighted `pminimization.Remark 3.2. We can use Figure 3.8 to compare weighted AMP and weighted `1minimiza-tion with reweighted W-AMP empirically. In all the cases, reweighted W-AMP outperforms593.7. Numerical resultsmethodnm 300 350 400 450 500411 1 0.3367 0.2071 0.1538regular AMP 1 1 0.1111 0.0823 0.0727reweighted AMP 2.1029 0.2481 0.1102 0.0783 0.0724Table 3.1: Comparison of recovery time of AMP, reweighted AMP, and `1minimization inseconds.weighted AMP and weighted `1significantly. For example when = 0:5, n = 100, and! = 0:5 reweighted W-AMP reaches full recovery for all the 40-sparse signals (average SNR of180 db), whereas both weighted AMP and weighted `1recover with average SNR of less than10 db. As another example when = 0:3 reweighted W-AMP achieves full recovery with 120measurements whereas weighted AMP needs 160 measurement for full recovery.Remark 3.3. In all the experiments reweighted W-AMP outperforms IRL1 and SDRL1 (whileit is still much faster than both). For example when n = 100 reweighted W-AMP recovers thesignals with an average SNR of 75 db, whereas both IRL1 and SDRL1 recover the signals withan average SNR of less than 15 db. Also notice that in this case when ! = 0:3 or 0:5 we reachfull recovery for all the signals when we have a 50% accurate support estimate.Remark 3.4. Notice the results of reweighted W-AMP when we have a 70% accurate supportestimate (Figure 3.8.b). In this experiment when ! = 0:3 reweighted W-AMP has recoveredall the signals when the number of measurements is as small as twice the number of non-zerocoefficients of the signal.Next we repeat the above experiment except this time the measurements have 5% noise,i.e., 26 db SNR. The results are shown in Figure 3.9. Figures 3.9.a, 3.9.c, and 3.9.e show theweighted AMP and weighted `1reconstruction signal-to-noise ratio averaged over 20 experi-ments and Figures 3.9.b, 3.9.d, 3.9.f show the results when we use reweighted W-AMP, IRL1,SDRL1, and WSPGL1. Reweighted W-AMP outperforms weighted AMP again. As mentionedearlier when ! = 1 weighted AMP reduces to regular AMP and reweighted W-AMP reducesto reweighted AMP. Similar to the noise-free case reweighted W-AMP always outperforms allthe other algorithms. Notice that using intermediate weights is always beneficial in the noisycase even when the support estimate is inaccurate, i.e., < 0:5. For example when = 0:3and n = 120 with weighted AMP and ! = 0 we get SNR6 db and with regular AMP we getSNR10 db whereas with weighted AMP and ! = 0:5 we get SNR12 db.Remark 3.5. Notice the improvement in recovery performance of weighted W-AMP when wehave an accurate prior support information ( = 0:7). For example when n = 80, on average,603.7. Numerical results80 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(a) Weighted AMP, = 0:780 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(b) Reweighted W-AMP, = 0:780 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(c) Weighted AMP, = 0:580 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(d) Reweighted W-AMP, = 0:580 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(e) Weighted AMP, = 0:380 100 120 140 160 180 200020406080100120140160180number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(f) Reweighted W-AMP, = 0:3Figure 3.8: (Noise-free case) Comparison of performance of (a, c, e) weighted AMP andweighted `1and (b, d, f) reweighted W-AMP, IRL1, SDRL1, and WSPGL1 in terms of SNRaveraged over 20 experiments for sparse signals with variable weights and measurements whenthere is no noise.613.7. Numerical resultsusing reweighted W-AMP with ! = 0:3 improves the recovery SNR of reweighted AMP byabout 15 db.The interesting advantage of reweighted AMP over regular AMP and `1minimization isthat the sufficient number of measurements for recovery by reweighted AMP seems to be lessthan those of AMP and `1minimization. Figure 3.10 empirically compares the recovery successpercentage of these algorithms recovering signals x 2 R200 with different levels of sparsity anddifferent number of measurements. Horizontal axis determines the undersampling fraction, =mn, and the vertical axis represents the sparsity fraction, = km. Figures 3.10.a?c show theresults for recovery by41, regular AMP, and reweighted AMP respectively. In this figures eachpoint indicates the fraction of realizations with successful recovery, when we use that methodto recover signals with corresponding and . This figure empirically shows the advantage ofreweighted AMP over the other two, for example, when m=100, 41and AMP has recoveredall the 25-sparse signals, where reweighted AMP has recovered all 30-sparse signals.Figure 3.11 summarizes the empirical results of Figure 3.10. At the points below each curve,we empirically get more than 80% successful recovery and at the points above the curve, we getless than 80% successful recovery when the corresponding method is used to recover signals inR200. Notice that if we use 100% successful recovery the results of 41and AMP are essentiallysimilar to each other. However in our experiments we see that in the cases that these algorithmsdo not recover all the signals, AMP reaches full recovery for more signals compared to 41.Therefore in this experiment we show the result when we use 80% full recovery threshold.623.7. Numerical results80 100 120 140 160 180 200051015202530number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(a) Weighted AMP, = 0:780 100 120 140 160 180 2000510152025number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(b) Reweighted W-AMP, = 0:780 100 120 140 160 180 2000510152025number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(c) Weighted AMP, = 0:580 100 120 140 160 180 2000510152025number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(d) Reweighted W-AMP, = 0:580 100 120 140 160 180 2000510152025number of measurments nSNR Weighted AMP, ? = 0Weighted AMP, ? = 0.3Weighted AMP, ? = 0.5Weighted AMP, ? = 0.7Regular AMPWeighted L1, ?= 0Weighted L1, ?= 0.5Regular L1(e) Weighted AMP, = 0:380 100 120 140 160 180 2000510152025number of measurments nSNR Reweighted W?AMP, ? = 0Reweighted W?AMP, ? = 0.3Reweighted W?AMP, ?= 0.5Reweighted W?AMP, ? = 0.7Reweighted AMPIRL1SDRL1WSPGL1(f) Reweighted W-AMP, = 0:3Figure 3.9: (Noisy case) Comparison of performance of (a, c, e) weighted AMP and weighted`1and (b, d, f) reweighted W-AMP, IRL1, SDRL1, and WSPGL1 in terms of SNR averagedover 20 experiments for sparse signals with variable weights and measurements when there is5% noise.633.8. Stylized applications? = nN?=k nSPGL10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110.90.80.70.60.50.40.30.20.1(a) 41? = nN?=k nRegular AMP0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110.90.80.70.60.50.40.30.20.1(b) Regular AMP? = nN?=k nReweighted AMP 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 110.90.80.70.60.50.40.30.20.1 00.10.20.30.40.50.60.70.80.91(c) Reweighted AMPFigure 3.10: Empirical phase transition of 41, regular AMP and reweighted AMP recoveringsparse signals in R200. Figures show the percentage of successful recovery over 20 experimentswhen the measurement matrix is Gaussian.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.10.20.30.40.50.60.70.80.91? = nN?=k n SPGL1AMPReweighted AMPFigure 3.11: Illustration of the phasediagrams of AMP, reweighted AMP, and 41presented inFigure 3.10.3.8 Stylized applicationsIn this section, we apply weighted AMP and reweighted W-AMP to recover real speech signalsthat are compressively sampled. The speech signals are the same speech signals used in Chapter2 and we choose the weights the same way. We break the measurements into 21 blocks anduse smaller weights on the low frequencies. We also use the large coefficients in each blockas an approximation for the large coefficients in the next block. For the reweighted AMP we643.8. Stylized applications 0 1/6 2/6 3/6 4/6 5/6 1 246810121416?SNR (in dB) Male wl1Female wl1Male weighted AMPFemale weighted W?AMPMale reweighted W?AMPFemale reweighted W?AMPMale wlpFemale wlpFigure 3.12: SNRs of reconstructed audio signals from compressed sensing measurementsplotted against !. Figure shows the result for reconstruction via weighted `1minimization,weighted `pminimization, weighted AMP, and reweighted W-AMP.start with with these weights for each block and renew the weights every 20 iterations.Figure 3.12 shows the results when we apply weighted AMP, reweighted W-AMP, weighted`1, and weighted `pto recover two compressively sampled speech signals (one male and onefemale). The results are presented for ! 2 [0; 16;26; : : : ; 1]. Generally, using intermediateweights results in better recovery and the results of reweighted W-AMP is slightly better thanthe results of weighted AMP and weighted `1minimization.65Bibliography[1] M. P. Friedlander, H. Mansour, R. Saab, and ?zg?r Y?lmaz, ?Recovering compressivelysampled signals using partial support information,? Arxiv preprint arXiv:1010.4612v2,2011.[2] R. Saab and ?zg?r Y?lmaz, ?Sparse recovery by non-convex optimization -instance op-timality,? Applied and Computational Harmonic Analysis, vol. 29, no. 1, pp. 30?48,2010.[3] D. L. Donoho, A. Maleki, and A. Montanari, ?Message passing algorithms for compressedsensing: I. motivation and construction,? Proc. ITW, Cairo,Egypt, 2010.[4] H. Mansour, F. Herrmann, and O. Y?lmaz, ?Improved wavefield reconstruction fromrandomized sampling via weighted one-norm minimization,? to appear in Geophysics,GEO-2012-0383, 2012.[5] B. Bah and J. Tanner, ?Improved bounds on restricted isometry constants for gaussianmatrices,? CoRR, 2010. [Online]. Available: arXiv:1003.3299v2[6] Q. Du and J. E. Fowler, ?Hyperspectral image compression using jpeg2000 and principalcomponent analysis,? IEEE Geosci. Remote Sens. Lett., vol. 4, no. 4, pp. 201?205, April2007.[7] M. Lustig, D. Donoho, and J. Pauly, ?Sparse MRI: The Application of Compressed Sensingfor Rapid MR Imaging,? Preprint, 2007.[8] G. Hennenfent and F. Herrmann, ?Application of stable sig-nal recovery to seismic interpolation,? 2006. [Online]. Available:http://slim.eos.ubc.ca/Publications/Private/Conferences/SEG/hennenfent06seg.pdf[9] G. Hennenfent and F. J. Herrmann, ?Seismic denoising with non-uniformly sampledcurvelets,? IEEE Comp. in Sci. and Eng., vol. 8, no. 3, pp. 16?25, 2006.66Bibliography[10] R. von Borries, C. Miosso, and C. Potes, ?Compressed sensing using prior information,?in 2nd IEEE International Workshop on Computational Advances in Multi-SensorAdaptive Processing, CAMPSAP 2007., 12-14 2007, pp. 121 ? 124.[11] N. Vaswani and W. Lu, ?Modified-CS: Modifying compressive sensing for problems withpartially known support,? IEEE Trans. on Signal Processing, vol. 58, no. 9, pp. 4595 ?4607, September 2010.[12] M. Amin Khajehnejad, W. Xu, A. Salman Avestimehr, and B. Hassibi, ?Weighted l1 mini-mization for sparse recovery with prior information,? in IEEE International Symposiumon Information Theory, ISIT 2009, June 2009, pp. 483 ? 487.[13] D. Donoho, ?Compressed sensing.? IEEE Transactions on Information Theory, vol. 52,no. 4, pp. 1289?1306, 2006.[14] E. J. Cand?s, J. Romberg, and T. Tao, ?Stable signal recovery from incomplete and inac-curate measurements,? Communications on Pure and Applied Mathematics, vol. 59,pp. 1207?1223, 2006.[15] D. Donoho and M. Elad, ?Optimally sparse representation in general (nonorthogonal)dictionaries via `1 minimization,? Proceedings of the National Academy of Sciences ofthe United States of America, vol. 100, no. 5, pp. 2197?2202, 2003.[16] R. Chartrand, ?Exact reconstructions of sparse signals via nonconvex minimization,?IEEE Signal Processing Letters, vol. 14, no. 10, pp. 707?710, 2007.[17] S. Foucart and M. Lai, ?Sparsest solutions of underdetermined linear systems via `q-minimization for 0 < q 1,? Applied and Computational Harmonic Analysis, vol. 26,no. 3, pp. 395?407, 2009.[18] T. Blumensath and M. Davies, ?Iterative thresholding for sparse approximations,? FourierAnal. Appl., vol. 14(5), pp. 629?654, 2008.[19] T. Blumensath and M. E. Davies, ?Iterative hard thresholding for compressed sensing,?Appl. Comput. Harmon. Anal., vol. 27, p. 265?274, 2009.[20] K. K. Herrity, A. C. Gilbert, , and J. A. Tropp, ?Sparse approximation via iterative thresh-olding,? in Proceedings of the Int. Conf. on Acoustics, Speech and Signal Processing,,2006.67Bibliography[21] D. Donoho, A. Maleki, and A. Montanari, ?Message passing algorithms for compressedsensing,? Proceedings of the National Academy of Sciences, vol. 106(45), pp. 18 914?18 919, 2009.[22] L. Demanet and E. J. Cand?s, ?The curvelet representation of wave propagators is opti-mally sparse,? vol. 58, 2005, pp. 1472?1528.[23] F. J. Herrmann, P. P. Moghaddam, and C. C. Stolk, ?Sparsity- and continuity- promotingseismic imaging with curvelet frames,? Journal of Applied and Computational Har-monic Analysis, vol. 24, pp. 150?173, 2008.[24] E. J. Cand?s and T. Tao, ?Decoding by linear programming.? IEEE Transactions onInformation Theory, vol. 51, no. 12, pp. 489?509, 2005.[25] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, ?A simple proof of the restrictedisometry property for random matrices,? Constructive Approximation, vol. 28, no. 3,pp. 253?263, 2008.[26] R. Saab, R. Chartrand, and O. Yilmaz, ?Stable sparse approximations via nonconvexoptimization,? in IEEE International Conference on Acoustics, Speech and SignalProcessing (ICASSP), 2008, pp. 3885?3888.[27] E. J. Cand?s, ?The restricted isometry property and its implications for compressed sens-ing,? Comptes rendus-Math?matique, vol. 346, no. 9-10, pp. 589?592, 2008.[28] X. Chen and W. Zhou, ?Convergence of reweighted `1minimization algorithms and uniquesolution of truncated `pminimization.?[29] R. Chartrand and W. Tin, ?Iteratively reweighted algorithms for compressive sens-ing,? IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP), 2008., pp. 3869?3872, 31 2008-April 4 2008.[30] E. Candes, M. Wakin, and S. Boyd, ?Enhancing sparsity by reweighted l1 minimization,?Journal of Fourier Analysis and Applications, vol. 14, pp. 877?905, 2008.[31] B. D. Rao and K. Kreutz-Delgado, ?An affine scaling methodology for best basis selection,?IEEE Trans. Signal Process., vol. 47, pp. 187?200, 1999.[32] A. Maleki and D. L. Donoho, ?Optimally tuned iterative reconstruction algorithms forcompressed sensing,? IEEE J. Sel. Topics in Signal Processing, vol. 4, no. 2, pp. 330?341, 2010.68Bibliography[33] T. DJ, A. PW, and P. RG, ?Solution of ?solvable model of a spin glass?,? Philos Mag,vol. 35, pp. 593?601, 1977.[34] D. L. Donoho, A. Maleki, , and A. Montanari, ?Constructing message passing algorithmsfor compressed sensing,? submitted to IEEE Trans. Inf. Theory.[35] H. Mansour and O. Y?lmaz, ?Support driven reweighted 1-norm minimization,? Interna-tional Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012.[36] H. Mansour, F. Herrmann, and O. Y?lmaz, ?Beyond l1 minimization for sparse signalrecovery,? Proc. of the IEEE Statistical Signal Processing Workshop (SSP), AnnArbor, Michigan, August 2012.[37] E. van den Berg and M. P. Friedlander, ?SPGL1: A solver for large-scale sparse recon-struction,? June 2007, http://www.cs.ubc.ca/labs/scl/spgl1.69
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Using prior support information in compressed sensing
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Using prior support information in compressed sensing Ghadermarzy, Navid 2013
pdf
Page Metadata
Item Metadata
Title | Using prior support information in compressed sensing |
Creator |
Ghadermarzy, Navid |
Publisher | University of British Columbia |
Date Issued | 2013 |
Description | Compressed sensing is a data acquisition technique that entails recovering estimates of sparse and compressible signals from n linear measurements, significantly fewer than the signal ambient dimension N. In this thesis we show how we can reduce the required number of measurements even further if we incorporate prior information about the signal into the reconstruction algorithm. Specifically, we study certain weighted nonconvex Lp minimization algorithms and a weighted approximate message passing algorithm. In Chapter 1 we describe compressed sensing as a practicable signal acquisition method in application and introduce the generic sparse approximation problem. Then we review some of the algorithms used in compressed sensing literature and briefly introduce the method we used to incorporate prior support information into these problems. In Chapter 2 we derive sufficient conditions for stable and robust recovery using weighted Lp minimization and show that these conditions are better than those for recovery by regular Lp and weighted L1. We present extensive numerical experiments, both on synthetic examples and on audio, and seismic signals. In Chapter 3 we derive weighted AMP algorithm which iteratively solves the weighted L1 minimization. We also introduce a reweighting scheme for weighted AMP algorithms which enhances the recovery performance of weighted AMP. We also apply these algorithms on synthetic experiments and on real audio signals. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2013-08-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0074157 |
URI | http://hdl.handle.net/2429/44912 |
Degree |
Master of Science - MSc |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2013-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2013_fall_ghadermarzy_navid.pdf [ 1.48MB ]
- Metadata
- JSON: 24-1.0074157.json
- JSON-LD: 24-1.0074157-ld.json
- RDF/XML (Pretty): 24-1.0074157-rdf.xml
- RDF/JSON: 24-1.0074157-rdf.json
- Turtle: 24-1.0074157-turtle.txt
- N-Triples: 24-1.0074157-rdf-ntriples.txt
- Original Record: 24-1.0074157-source.json
- Full Text
- 24-1.0074157-fulltext.txt
- Citation
- 24-1.0074157.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0074157/manifest