UBC Theses and Dissertations

Sparse signal recovery: analysis and synthesis formulations with prior support information. Hargreaves, Brock Edward (2014)

Full Text

Sparse Signal Recovery: Analysis and Synthesis Formulations with Prior Support Information

by

Brock Edward Hargreaves

B.Sc. (Hons.), The University of Calgary, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Mathematics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2014

© Brock Edward Hargreaves 2014

Abstract

The synthesis model for signal recovery has been the model of choice for many years in compressive sensing. Various weighting schemes that use prior support information to adjust the objective function associated with the synthesis model have been shown to improve the accuracy of signal recovery. Generally, even with no prior knowledge of the support, iterative methods can build support estimates and incorporate them into the recovery, which has also been shown to increase both the speed and the accuracy of the recovery. However, when the original signal is sparse with respect to a redundant dictionary (rather than an orthonormal basis), there is a counterpart to the synthesis model, namely the analysis model, which has been less popular but has recently attracted more attention. The analysis model is much less understood, and thus fewer theorems are available in the contexts of both non-weighted and weighted signal recovery.

In this thesis, we investigate weighting in both the analysis model and the synthesis model via weighted ℓ1-minimization. Theoretical guarantees on reconstruction and various weighting strategies for each model are discussed. We give conditions for weighted synthesis recovery with frames which do not require strict incoherence conditions; these are based on recent results for regular synthesis with frames using optimal dual ℓ1-analysis. A novel weighting technique is introduced in the analysis case which outperforms its traditional counterparts in the case of seismic wavefield reconstruction.
We also introduce a weighted split Bregman algorithm for analysis and optimal dual analysis. We then investigate these techniques on seismic data and synthetically created test data using a variety of frames.

Preface

This thesis is original, unpublished, independent work by the author, Brock Edward Hargreaves.

Table of Contents

Abstract
Preface
Table of Contents
List of Figures
Acknowledgements
Dedication
1 Introduction to Compressive Sensing
  1.1 The Compressed Sensing Problem
  1.2 Applications of Sparse Approximation
  1.3 Thesis Outline
2 Frames and Bases in Sparse Approximation
  2.1 Motivation
  2.2 Finite Frames for R^d
3 The Sparse Recovery Problem
  3.1 Sparsity with respect to Orthonormal Bases
    3.1.1 The Synthesis Formulation
    3.1.2 The Analysis Formulation
  3.2 Equivalence of Synthesis and Analysis for Orthonormal Bases
  3.3 Synthesis and Analysis Formulations for Redundant Frames
    3.3.1 General Frames and Duals
    3.3.2 Optimal Dual ℓ1-Analysis
  3.4 Uniqueness of ℓ0 Recovery for Synthesis
  3.5 Recovery Guarantees for ℓ1-Minimization
    3.5.1 Synthesis
    3.5.2 Analysis
    3.5.3 General and Optimal Dual Based Analysis
4 Weighted Methods and Partial Support Information
  4.1 Weighted Synthesis
    4.1.1 Fixed Weights
    4.1.2 Variable Weights
    4.1.3 Iterative Reweighting
  4.2 Weighted Analysis
    4.2.1 Iterative Reweighting
  4.3 Main Theoretical Contributions
    4.3.1 A Novel Weighting Technique for Analysis
    4.3.2 Weighted General Dual Analysis
    4.3.3 Weighted Optimal Dual Analysis
5 Synthetic Experiments
  5.1 Optimal Dual ℓ1-Analysis
    5.1.1 Weighted vs Non-Weighted
    5.1.2 Iterative Reweighting
    5.1.3 The Optimal Dual Frame Bounds
6 Seismic Wavefield Reconstruction
  6.1 Introduction
  6.2 Seismic Data Interpolation
  6.3 Seismic Data Interpolation by Sparse Recovery
    6.3.1 Incorporating Weights
  6.4 Gulf of Mexico Experiment Results
    6.4.1 Computational Considerations
    6.4.2 Conclusions and Future Work
Bibliography

List of Figures

1.1 Haar wavelet coefficients computed via the Discrete Wavelets Toolbox by Patrick J. Van Fleet [65]
1.2 Left: Seismic shot record. Right: Seismic shot record with missing traces
2.1 Two sparse signals
2.2 The sum of the two sparse signals
5.1 Average relative error of 10 reconstructed signals at a given outer iteration of the Split Bregman algorithm
5.2 Measurements are now corrupted by a small amount of white noise η, i.e., y = ΦADx + η
5.3 Iterative reweighting with no initial support estimate vs no weighting for the noisy case. We update our weights by calculating the coefficients of d^{k+1} which contribute 90% of d^{k+1}'s cumulative energy.
5.4 Average upper frame bound of the optimal dual reconstructed from signals of varying sparsity.
6.1 Basic seismic processing workflow. M.V.A. stands for Migration Velocity Analysis and C.I.G. stands for Common Image Gathers
6.2 An example of a top-down view of a seismic acquisition geometry provided by Total E & P USA. Each line represents a set of seismic receivers which are trailing a seismic vessel.
6.3 A regular grid overlaying the seismic acquisition. Each intersection of the grid corresponds to a physical location for which we require information.
6.4 A regular grid overlaying the seismic acquisition. The circled areas represent regions for which there are no receivers collecting information.
6.5 A seismic acquisition from the Gulf of Mexico expressed as a 3D volume; here we show several time slices of the volume.
6.6 A seismic acquisition from the Gulf of Mexico expressed as a 3D volume; here we show several common shot gathers.
6.7 A common shot gather sliced from the 3D cube.
6.8 The shot gather sliced from the 3D cube in Figure 6.7 with randomly missing receivers.
6.9 A frequency slice from the 3D cube.
6.10 The frequency slice from the 3D cube in Figure 6.9 with randomly missing receivers.
6.11 Illustration of the weighting strategy (s: source, r: receiver, f: frequency). One begins by recovering low-frequency slices, which are hopefully non-aliased, and then uses those recovered slices to generate weights for recovering higher-frequency slices.
6.12 Seismic wavefield reconstruction of the Gulf of Mexico data using analysis and synthesis with various weighting techniques. "New weighting analysis" refers to (4.8); the others are the traditional weighting techniques.
6.13 Recovered Synthesis
6.14 Difference plot
6.15 Recovered Analysis
6.16 Difference plot
6.17 Recovered Weighted Synthesis
6.18 Difference plot
6.19 Recovered Weighted Analysis
6.20 Difference plot
6.21 Recovered New Weighted Analysis
6.22 Difference plot

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my supervisors, Dr. Özgür Yilmaz and Dr. Felix Herrmann, for their patience and understanding, and for always pushing me to be a better mathematician and researcher.
Without their continuous support I would certainly not be where I am today.

I would also like to thank all the members of the Seismic Laboratory for Imaging and Modelling (SLIM), along with their support staff, for their expertise.

Dedication

To my mother and father for their continuous love and support throughout my university life. To my sister for joyfully boasting of my accomplishments, though her triumphs supersede my own. To my brother and his wife for their affection and support while far from home, for which I will be forever grateful.

Chapter 1

Introduction to Compressive Sensing

1.1 The Compressed Sensing Problem

Compressive sensing is a sampling paradigm introduced in two seminal papers, Candès et al. [10] and Donoho [14], which asserts that one can recover signals from fewer samples or measurements than traditional methods such as the Shannon-Nyquist sampling theorem [53, 58, 64] require, assuming certain prior knowledge of the structure of the signal.

Before posing the mathematical problem of compressive sensing, let us motivate one of the underlying principles of compressed sensing. Consider how a basic compact digital camera collects and stores an image. The camera's sensor is exposed to light passing through the lens, and the image is then generally stored using a lossy compression method such as JPEG. A typical lossy compression algorithm might assume a sparse (or compressible) representation of the image in a particular basis or frame and store only those coefficients that contribute the most energy to the image. Often this type of compression reduces the storage cost by orders of magnitude. Of course, the computed coefficients come from the transform of a fully sampled image, so only after the fact can we perform lossy compression. However, one may pose the following question: if a signal is characterized by a small number of transform coefficients, is it possible to change the acquisition design to reduce the number of samples collected?
Among other things, compressive sensing attempts to answer this question by acquiring a compressed signal representation. Let us begin the mathematical framework behind compressive sensing with the definition of sparsity.

Definition 1.1. A vector x ∈ R^N is called k-sparse if ||x||_0 ≤ k, where ||x||_0 represents the number of non-zero coefficients of x. For k ∈ {1, 2, ..., N}, define Σ_k^N as the set of all k-sparse vectors in R^N:

    Σ_k^N := { x ∈ R^N : ||x||_0 ≤ k }    (1.1)

We do not typically observe sparse vectors directly in practical applications. Consider the following continuous-valued signal f(t), where t represents time:

    f(t) = 1 if t = 0.5,  and  f(t) = 0 if t ∈ [0, 1] \ {0.5}    (1.2)

Suppose we had a measurement device that measures f(t) every Δt seconds. In order to observe the only non-zero portion of this signal, we would require 0.5 mod Δt = 0, i.e., that we measure at the precise moment at which the signal is non-zero. Such a condition on the measurement device is impractical, since in general we will not know a priori where the non-zero element of the signal is. What happens more often is that a particular signal is sparse with respect to a basis (or frame).

Definition 1.2. A signal f ∈ R^d is said to be k-sparse with respect to a matrix D ∈ R^{d×N} if f = Σ_{j=1}^N d_j x_j for some x ∈ Σ_k^N, where the d_j ∈ R^d are the column vectors¹ of D. Equivalently, f = Dx for some x ∈ Σ_k^N.

Denote by T_0 the set of indices on which x is non-zero, where |T_0| ≤ k. In this case, we would have the following decomposition of the signal:

    f = Σ_{j∈T_0} d_j x_j

and the signal is a linear combination of only k out of the N vectors d_j. Though the vector f is typically dense, it is characterized by the vector x, which is sparse. The collection {d_j}_{j=1}^N is typically a complete system in R^d, such as a basis (in which case d = N) or a frame (N ≥ d). See Chapter 2 for a discussion on frames.
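Definitions 1.1 and 1.2 can be made concrete in a few lines of code. The following minimal numpy sketch (illustrative, not from the thesis; the dictionary D and the helper `l0_norm` are invented for this example) builds a dense signal f that is 1-sparse with respect to a small dictionary:

```python
import numpy as np

def l0_norm(x, tol=1e-12):
    """||x||_0: the number of entries of x with magnitude above a tolerance."""
    return int(np.sum(np.abs(x) > tol))

# A 3x4 dictionary D whose columns d_j are the building blocks of the signal.
D = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])

x = np.array([0.0, 0.0, 0.0, 2.0])   # k = 1: only the last column is used
f = D @ x                            # f = [2, 2, 2]^T is dense, but x is sparse

print(l0_norm(x))   # 1
print(f)            # [2. 2. 2.]
```

Here T_0 = {4} (the support of x), and f is a linear combination of just one of the four columns of D, even though every entry of f is non-zero.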
For now, it is sufficient to think of the d_j as the building blocks of the signal f.

Typically, in compressed sensing, measurements are collected from f in the form y = Φf, where Φ ∈ R^{n×d} and n ≪ d. The matrix Φ is referred to as the sensing or measurement matrix. The compressed sensing problem is then simply stated as follows [38]: recover x from knowledge of

    y = ΦDx

That is, from a seemingly incomplete set of linear measurements, recover the sparse vector x.

A major question, which is still actively researched, is: for which measurement matrices Φ is recovery of the sparse vector x possible? And, more generally, what does the recovery process look like? Without any knowledge of where the k non-zero entries of the sparse vector x are, performing Gaussian elimination will be fruitless and yield infinitely many solutions. The seminal papers referred to at the beginning of this section were the first to begin answering these questions; indeed, they gave theoretical conditions under which the recovery is successful and the recovery process is computationally feasible. In this thesis we will focus on the recovery problem, and in particular on how we can incorporate prior knowledge of the locations of some of the non-zero entries, possibly inaccurate, into the recovery process. For more comprehensive introductions and surveys of compressive sensing, the reader is referred to [8, 19, 20, 22].

¹ Note that, as a slight abuse of notation, d by itself is an integer representing the ambient dimension of the dictionary D, while the d_j are the column vectors of D. Context should make clear which is meant.

1.2 Applications of Sparse Approximation

One might ask whether the sparsity assumption is valid in practice. Indeed, signals in the real world (such as natural images) are typically difficult to model.
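To see why the recovery problem is non-trivial, the following numpy sketch (illustrative names and sizes, not from the thesis) shows that the classical linear-algebra answer to the underdetermined system y = Φx, the minimum-ℓ2-norm solution, is consistent with the measurements yet reveals nothing about the sparse support:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n, k = 64, 16, 3                   # ambient dimension, measurements, sparsity
Phi = rng.standard_normal((n, N))     # Gaussian sensing matrix (D = I for simplicity)

x = np.zeros(N)                       # a k-sparse vector with random support
x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)

y = Phi @ x                           # n = 16 measurements of a length-64 signal

# The minimum-l2-norm solution matches the measurements exactly, yet it is
# dense: it spreads energy over (generically) every coordinate.
x_l2 = np.linalg.pinv(Phi) @ y
print(np.linalg.norm(Phi @ x_l2 - y))          # ~0: consistent with y
print(np.count_nonzero(np.abs(x_l2) > 1e-8))   # far more than k non-zeros
```

Any solver that ignores sparsity picks one of the infinitely many consistent solutions; the point of compressive sensing is to pick the sparse one.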
Some theoretical work has been done modelling images as functions of bounded variation or as belonging to an appropriate smoothness space such as certain Sobolev or Besov spaces. However, these characterizations are not perfect; see, e.g., [28].

Nonetheless, it has been empirically shown that natural images, as well as images gathered from medical and seismic experiments, have sparse approximations with respect to a variety of transforms. It is this fundamental observation that compressive sensing takes advantage of. Let us look at a very simple type of wavelet transformation of an image, namely the Haar wavelet [65].

Figure 1.1: Haar wavelet coefficients computed via the Discrete Wavelets Toolbox by Patrick J. Van Fleet [65]. (a) People playing hockey. (b) Haar wavelet decomposition. (c) Cumulative energy: the first 7000 terms out of 65536.

This image has 256 by 256 pixels, and thus 65536 coefficients. In Figure 1.1c we see that over 99% of the image's energy is stored in only 3% of the wavelet coefficients, which is about 2000 terms. We could compress the image by recording only those 2000 terms. This is an example of lossy compression via transform coding, where a transform takes the signal to a different domain and only a selection of the coefficients is kept. A famous example of such compression is the JPEG compression scheme, which utilizes the discrete cosine transform (DCT). This method was later updated to JPEG2000, which uses a wavelet transform and tends to be more effective. Wavelets are based on multiresolution analysis, a tool from applied harmonic analysis which has grown into a stable research field over the last two decades. For wavelet-based compression schemes, see [65].

Sparse approximation is also used in the context of imaging while conducting mineral, oil, and gas exploration. In this thesis we are most concerned with the interpolation of seismic data, commonly referred to as seismic wavefield reconstruction.
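The energy-compaction idea behind transform coding can be illustrated with a one-dimensional analogue. The sketch below (illustrative, not the toolbox cited above) applies a full orthonormal Haar transform to a piecewise-constant signal; the signal is dense in the standard basis, yet only two of its 64 Haar coefficients are non-zero:

```python
import numpy as np

def haar(f):
    """Full orthonormal 1D Haar transform (length must be a power of two)."""
    out = np.asarray(f, dtype=float).copy()
    n = len(out)
    while n > 1:
        a = (out[0:n:2] + out[1:n:2]) / np.sqrt(2.0)   # pairwise averages
        d = (out[0:n:2] - out[1:n:2]) / np.sqrt(2.0)   # pairwise details
        out[:n // 2], out[n // 2:n] = a, d
        n //= 2                                        # recurse on the averages
    return out

# Piecewise-constant signal: every entry is non-zero, yet the transform is 2-sparse.
f = np.concatenate([np.full(32, 3.0), np.full(32, -1.0)])
c = haar(f)

print(np.count_nonzero(np.abs(c) > 1e-10))        # 2
energy = np.sort(c ** 2)[::-1]
print(np.sum(energy[:2]) / np.sum(energy))        # 1.0: all energy in 2 terms
```

Keeping only the largest-magnitude coefficients and discarding the rest is exactly the transform-coding step described above, here with no loss at all because the signal is exactly sparse in the Haar domain.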
Seismic receivers are placed on the surface and record seismic events from reflections of waves in the earth. We restrict ourselves to the scenario of randomly missing receivers on an otherwise regularly sampled grid; this is portrayed by the example below.

Figure 1.2: Left: Seismic shot record. Right: Seismic shot record with missing traces.

One can take advantage of sparsity in the curvelet domain to recover the missing traces, which we will elaborate on in Chapter 6. There is now a vast literature exploring how to take advantage of this sparsity in the context of seismic data and image interpolation in a variety of domains, including curvelet, Radon, and Fourier. Some exposure to these techniques can be found in the following: Herrmann et al. [33], Zwartjes and Sacchi [70], Shahidi et al. [57], and Trad et al. [61].

Another common challenge in any seismic workflow is dealing with the massive amount of data being collected and processed. In seismic exploration it is not uncommon to deal with data in up to 5 dimensions, and the complexity of seismic imaging problems does not scale linearly with size or dimension. This is referred to as the curse of dimensionality [33]. Compressive sensing can help alleviate these problems in two ways: by requiring fewer measurements, thus driving down the cost of acquisition, and by increasing the speed with which an image is processed.
There is now a collection of seismic imaging algorithms which take advantage of the sparse approximation framework. Among them are "Efficient least-squares imaging with sparsity promotion and compressive sensing" by Herrmann and Li [32]; "Randomized marine acquisition with compressive sampling matrices" by Mansour, Wason, Lin, and Herrmann [49]; "Robust inversion, dimensionality reduction, and randomized sampling" by Aravkin, Friedlander, Herrmann, and van Leeuwen [1]; "Robust inversion via semistochastic dimensionality reduction" by Aravkin, Friedlander, and van Leeuwen [2]; and "Improved wavefield reconstruction from randomized sampling via weighted one-norm minimization" by Mansour, Herrmann, and Yilmaz [48].

1.3 Thesis Outline

In Chapter 2 we review bases in sparse approximation theory and motivate the necessity for frames, followed by a brief introduction to frame theory in finite dimensions. In Chapter 3 we review the literature on the sparse recovery problem with respect to orthonormal bases and frames. This includes a variety of theorems on uniqueness, equivalence, and recovery conditions for the analysis and synthesis formulations. Chapter 4 is dedicated to recent work on how partial support information can be incorporated into the analysis and synthesis formulations of the sparse recovery problem. Recovery conditions for the weighted formulations are given, much akin to the recovery conditions of Chapter 3. Chapters 4 through 6 are where the main contributions of this thesis lie. In Chapter 4 a condition for the weighted synthesis formulation is introduced which does not require strict incoherence conditions, along with a novel weighting method for the analysis formulation. Chapter 5 reviews a split Bregman method for solving the optimal dual analysis formulation, along with the introduction of a weighted split Bregman method for solving the weighted optimal dual analysis formulation.
Chapter 6 applies the formulations discussed in this thesis to the seismic interpolation problem and compares their results on data from a seismic experiment conducted in the Gulf of Mexico.

Chapter 2

Frames and Bases in Sparse Approximation

2.1 Motivation

In the context of sparse approximation, many signals do not have sufficiently sparse coefficient sequences with respect to a basis. Often we can overcome this inefficiency by considering frames. Consider signals that are sums of sines and spikes. For example, consider the Fourier basis {d_k}, with entries d_k(n) = e^{-2πi(k-1)n/m}, and the standard basis {e_k}, for k = 1, ..., m, where the standard basis is the familiar one, i.e., e_1 = [1, 0, 0, ..., 0]^T. As a toy example, suppose we have two signals f_1 and f_2 which are sparse with respect to {d_k} and {e_k} respectively, with m = 64, as shown in Figure 2.1.

Figure 2.1: Two sparse signals. (a) The real part of a signal f_1 created by one Fourier mode. (b) A signal f_2 created by a single non-zero element.

Let us create a third signal f_3, shown in Figure 2.2, which is the sum of the signals f_1 and f_2 in Figure 2.1. The signal f_3 cannot be sparsely represented in either the standard basis or the Fourier basis alone. However, if we consider f_3 as a linear combination of vectors in {d_k}_{k=1}^m ∪ {e_k}_{k=1}^m, we can indeed give an exact sparse representation. It is in fact true that the union of two bases is a frame; next we formalize the definition of a frame.

Figure 2.2: The sum of the two sparse signals

2.2 Finite Frames for R^d

This section is for reference and gives a brief review of basic frame theory. For comprehensive coverage, see [13].

Definition 2.1. A set of vectors {d_k}_{k=1}^N in R^d is a finite frame for R^d if there exist constants 0 < A ≤ B < ∞ such that

    A ||f||_2^2 ≤ Σ_{k=1}^N |⟨f, d_k⟩|^2 ≤ B ||f||_2^2  for all f ∈ R^d    (2.1)

where A and B are called frame bounds and the {d_k}_{k=1}^N are referred to as the frame elements. The frame bounds are not unique.
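The claim that a union of two orthonormal bases is a frame is easy to verify numerically. In fact such a union is a tight frame with A = B = 2, since each basis alone contributes exactly ||f||_2^2 to the sum in (2.1). A minimal numpy check (illustrative, not from the thesis):

```python
import numpy as np

# Union of two orthonormal bases for R^4: the standard basis and a random
# orthogonal basis Q. The 8 columns form a tight frame with A = B = 2.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q has orthonormal columns
D = np.hstack([np.eye(4), Q])

# Check the frame inequality on a batch of random vectors: for this D,
# sum_k |<f, d_k>|^2 = ||f||^2 + ||Q^T f||^2 = 2 ||f||^2 for every f.
for _ in range(5):
    f = rng.standard_normal(4)
    total = np.sum((D.T @ f) ** 2)   # sum of squared analysis coefficients
    print(np.isclose(total, 2 * np.linalg.norm(f) ** 2))   # True
```

Equivalently, D D^T = I + Q Q^T = 2I, which is the matrix-form statement of tightness discussed next.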
The optimal lower frame bound is the supremum over all lower frame bounds, and the optimal upper frame bound is the infimum over all upper frame bounds. The definition can be reformulated in matrix form:

    A ||f||_2^2 ≤ f*(DD*)f ≤ B ||f||_2^2    (2.2)

where {d_k}_{k=1}^N are the columns of the matrix D ∈ R^{d×N}. That is, A ≤ σ_min^2(D) and B ≥ σ_max^2(D), where σ_min(D) is the smallest (in magnitude) non-zero singular value of D and σ_max(D) is the largest (in magnitude) singular value of D. Throughout the rest of this thesis, if we refer to D as a frame, then we consider the columns of D to be the frame elements. This definition holds if and only if DD* is invertible, which is equivalent to saying that D is full rank, i.e., rank(D) = d. In the specific case when D is a basis, this implies that D is also a frame.

Definition 2.2. A frame {d_k}_{k∈I} is called a tight frame if we can choose A = B, and is called Parseval (a unit-norm tight frame) if A = B = 1.

A tight frame preserves angles between vectors. In this sense, the frame bounds can be thought of as measuring how well the frame preserves the geometry of the vector space. Applied and computational scientists often prefer to work with tight frames, as they tend to admit fast computational implementations, which is important for large-scale problems; typically this translates into fast matrix-vector products, a key component in a wide variety of imaging algorithms. An important example of a frame for R^d is the harmonic frame defined below.

Definition 2.3 (Harmonic Frame). A harmonic frame [39] for R^d is generated by the columns of a Fourier matrix; the definition H^d_N = {h^N_k}_{k=1}^N, N ≥ d, depends on whether d is odd or even. If d is even, let

    h^N_k = √(2/d) [ cos(2πk/N), sin(2πk/N), cos(4πk/N), sin(4πk/N), cos(6πk/N), sin(6πk/N), ..., cos(2π(d/2)k/N), sin(2π(d/2)k/N) ]^T

for k = 1, 2, ..., N. If d is odd, let

    h^N_k = √(2/d) [ 1/√2, cos(2πk/N), sin(2πk/N), cos(4πk/N), sin(4πk/N), cos(6πk/N), sin(6πk/N), ...
, cos(2π((d−1)/2)k/N), sin(2π((d−1)/2)k/N) ]^T

It was shown by Zimmermann [69] that H^d_N as defined above is a unit-norm tight frame for R^d with frame bound A = N/d. We will now introduce the synthesis and analysis operators associated with a frame. This will allow us to formalize the notion of synthesis and analysis coefficients.

Definition 2.4. For R^d equipped with a frame {d_k}_{k=1}^N, the synthesis operator is the linear mapping defined as follows:

    T : R^N → R^d,  T{c_k}_{k=1}^N = Σ_{k=1}^N c_k d_k

We have already seen the synthesis operator in use in the introduction. The matrix representation of the synthesis operator in the standard basis is the matrix D ∈ R^{d×N} whose columns are {d_k}_{k=1}^N. Indeed, when we build a signal f = Dx, the entries of x fulfill the role of the c_k in the above definition. We will refer to these coefficients as the synthesis coefficients. As in the introduction, these synthesis coefficients are the weights of the building blocks of the signals of interest. On the other hand, given a signal f, can we decompose it into its constituent building blocks? This is the role of the analysis operator.

Definition 2.5. The analysis operator is defined as the adjoint of the synthesis operator and is given by:

    T* : R^d → R^N,  T*f = {⟨f, d_k⟩}_{k=1}^N

Using matrix notation, T*f = D*f. Composing T with its adjoint T*, we obtain the frame operator:

    S : R^d → R^d,  Sf = TT*f = Σ_{k=1}^N ⟨f, d_k⟩ d_k    (2.3)

Again, using matrix notation, S = DD*.

Theorem 2.2.1 (Theorem 1.1.5 [13]). Let {d_k}_{k=1}^N be a frame for R^d with frame operator S. Then:

1. S is invertible and self-adjoint.
2. Every f ∈ R^d can be represented as

    f = Σ_{k=1}^N ⟨f, S^{-1}d_k⟩ d_k = Σ_{k=1}^N ⟨f, d_k⟩ S^{-1}d_k    (2.4)

The vectors {S^{-1}d_k}_{k=1}^N also form a frame, referred to as the canonical dual frame of {d_k}_{k=1}^N. In matrix notation, we write the canonical dual as D̄ = (DD*)^{-1}D, where as usual the columns of D are the d_k.
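The canonical dual and the reconstruction formula (2.4) can be checked numerically. The sketch below (illustrative frame, not from the thesis) forms the frame operator S = DD*, the canonical dual D̄ = S^{-1}D, and verifies that D D̄* = I, so any f is rebuilt from its canonical dual analysis coefficients:

```python
import numpy as np

# A small frame for R^3: the standard basis plus one extra normalized direction.
D = np.array([[1.0, 0.0, 0.0, 1.0 / np.sqrt(3.0)],
              [0.0, 1.0, 0.0, 1.0 / np.sqrt(3.0)],
              [0.0, 0.0, 1.0, 1.0 / np.sqrt(3.0)]])

S = D @ D.T                        # frame operator S = D D* (3x3, invertible)
D_bar = np.linalg.inv(S) @ D       # canonical dual frame D_bar = S^{-1} D

# Reconstruction as in (2.4): f = sum_k <f, S^{-1} d_k> d_k, i.e. f = D (D_bar* f).
f = np.array([1.0, -2.0, 0.5])
coeffs = D_bar.T @ f               # canonical dual analysis coefficients
print(np.allclose(D @ coeffs, f))          # True
print(np.allclose(D @ D_bar.T, np.eye(3))) # True: D D_bar* = I
```

The identity D D̄* = (DD*)(DD*)^{-1} = I holds for any full-rank D, which is exactly why the canonical coefficients always synthesize f back.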
In linear algebra, D̄* is commonly referred to as the pseudo-inverse of D and is a right inverse of D, where (*) denotes the Hermitian conjugate. An important theorem regarding the canonical dual frame:

Theorem 2.2.2 (Proposition 5.1.4 [29]). If {d_k}_{k=1}^N is a frame for R^d and f = Σ_{k=1}^N c_k d_k for some coefficients c_k, then

    Σ_{k=1}^N |c_k|^2 ≥ Σ_{k=1}^N |⟨f, S^{-1}d_k⟩|^2

with equality only if c_k = ⟨f, S^{-1}d_k⟩ for all k.

This theorem says that among all coefficients c_k which satisfy f = Σ_{k=1}^N c_k d_k, the canonical coefficients have the smallest ℓ2-norm. This is important, for example, in the context of recovering signals when noise is involved. For the many theorems relating the denoising problem and frames, the reader is referred to the works of Dragotti and Vetterli [17], Donoho and Johnstone [16], and Hennenfent and Herrmann [30]. For proofs and finer details of this section, the reader is referred to the works of Christensen [13], Casazza et al. [12], Gröchenig [29], and Kovacevic and Chebira [37].

We should note that there are generally, but not always, other frames which lead to decompositions as above. Any frame {v_k}_{k=1}^N for R^d satisfying

    f = Σ_{k=1}^N ⟨f, d_k⟩ v_k  for all f ∈ R^d    (2.5)

is called a dual frame to {d_k}_{k=1}^N. If {v_k}_{k=1}^N is not the canonical dual frame, then it is referred to as an alternative dual frame.

Chapter 3

The Sparse Recovery Problem

3.1 Sparsity with respect to Orthonormal Bases

3.1.1 The Synthesis Formulation

Suppose a signal f ∈ R^d has a sparse synthesis representation with respect to an orthonormal basis D ∈ R^{d×d}. That is,

    f = Dx    (3.1)

where x ∈ Σ_k^d and the columns of D form an orthonormal basis of R^d. Furthermore, we only have access to a set of linear non-adaptive measurements y via the measurement matrix Φ ∈ R^{m×d} with m ≪ d:

    y = Φf = ΦDx    (3.2)

Solving the sparse recovery problem means finding the sparsest vector of coefficients that satisfies the measurements.
This can be posed as the following optimization problem:

    x̂ = argmin_{x̃ ∈ R^d} ||x̃||_0  subject to  ΦDx̃ = y    (3.3)

where ||x̃||_0 is the number of non-zero elements of x̃. Given such a recovery, a reconstruction of our signal would be

    f̂ = Dx̂    (3.4)

Often in practice there is noise in the measurements, and the constraint adapts to this:

    x̂ = argmin_{x̃ ∈ R^d} ||x̃||_0  subject to  ||ΦDx̃ − y||_2 ≤ ε    (3.5)

where ε is an estimate of the noise level in the measurements. It is well known that the combinatorial problem (3.5) is NP-hard, and thus alternative optimization problems must be considered. One such alternative is the convex relaxation of (3.5), given by

    x̂ = argmin_{x̃ ∈ R^d} ||x̃||_1  subject to  ||ΦDx̃ − y||_2 ≤ ε    (3.6)

where the objective function now uses the ℓ1-norm rather than the ℓ0-"norm". As before, the signal is then reconstructed as f̂ = Dx̂.

3.1.2 The Analysis Formulation

Recall that the synthesis-based formulation of the sparse recovery problem places a sparsity assumption on the synthesis coefficients of the signal. The analysis-based formulation proposes instead that the analysis coefficients be sparse. In practice, signals are never exactly sparse, and dictionaries are often designed to make Ωf as sparse as possible for some class of f, where Ω is an appropriate analysis operator. Common examples of analysis operators include the shift-invariant wavelet transform [47]; the finite difference operator, which concatenates the horizontal and vertical derivatives of an image and is closely connected to total variation [56]; and the curvelet transform [59]. In this case, the following optimization problem is considered:

    f̂ = argmin_{f̃ ∈ R^d} ||Ωf̃||_0  subject to  y = Φf̃    (3.7)

An important recent theoretical study of the analysis and synthesis formulations comes from the cosparse analysis model of Nam et al. [51], in which both the synthesis and analysis models are given as union-of-subspaces models.
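Returning to the synthesis relaxation (3.6): with ε = 0 it is the basis pursuit problem, which is a linear program, since splitting x = u − v with u, v ≥ 0 turns the ℓ1 objective into the linear objective Σ(u_i + v_i). A hedged sketch, assuming SciPy's generic LP solver is available (this is not the solver used later in the thesis):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(M, y):
    """Solve min ||x||_1 s.t. M x = y via the standard LP split x = u - v, u, v >= 0."""
    n, N = M.shape
    c = np.ones(2 * N)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([M, -M])                # constraint: M(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(1)
N, n, k = 40, 20, 3
Phi_D = rng.standard_normal((n, N))          # plays the role of the product Phi D

x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = Phi_D @ x_true

x_hat = basis_pursuit(Phi_D, y)
print(np.linalg.norm(x_hat - x_true))        # near 0 when l1 recovery succeeds
```

For a k-sparse vector observed through enough Gaussian measurements, the ℓ1 minimizer typically coincides with the ℓ0 minimizer, which is the reason the convex relaxation is useful at all.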
In the compressive sensing literature, when comparing analysis and synthesis, the analysis operator Ω has often been chosen as the canonical dual of some frame D, and (3.7) becomes

    f̂ = argmin_{f̃ ∈ R^d} ||D̄*f̃||_0  subject to  y = Φf̃    (3.8)

where D̄ ≡ (DD*)^{-1}D denotes the canonical dual frame of D. This formulation will be our focus in this thesis. As in the synthesis case, we can consider the convex relaxation along with denoising:

    f̂ = argmin_{f̃ ∈ R^d} ||D̄*f̃||_1  subject to  ||Φf̃ − y||_2 ≤ ε    (3.9)

Notice that in this formulation there is no reconstruction step after the optimization problem is solved. In words, the analysis formulation attempts to directly find a signal whose measurements fit the data and whose associated analysis coefficient sequence is sparse.

3.2 Equivalence of Synthesis and Analysis for Orthonormal Bases

An important, and still ongoing, research question is: when are the analysis and synthesis formulations equivalent, and when does one outperform the other in terms of recovery? Let us start with the case when D is an orthonormal basis.

Theorem 3.2.1. If D is an orthonormal basis, the analysis and synthesis formulations are equivalent.

Proof. In this case DD* = D*D = I, i.e., D^{-1} = D* and D̄ = D. Starting with the synthesis formulation:

    f̂ = Dx̂ = D · argmin_{x ∈ R^N} ||x||_1  subject to  ||y − ΦDx||_2 ≤ ε    (3.10)

We have x = D^{-1}f, so that

    f̂ = D · argmin_{D^{-1}f ∈ R^N} ||D^{-1}f||_1  subject to  ||y − Φf||_2 ≤ ε
       = argmin_{f ∈ R^N} ||D̄*f||_1  subject to  ||y − Φf||_2 ≤ ε

and we arrive at the analysis formulation. Conversely, starting from the analysis formulation:

    f̂ = argmin_{f ∈ R^N} ||D̄*f||_1  subject to  ||y − Φf||_2 ≤ ε
       = argmin_{Dx ∈ R^N} ||D̄*Dx||_1  subject to  ||y − ΦDx||_2 ≤ ε
which simplifies to the synthesis formulation:

$$\hat{f} = D \cdot \arg\min_{x\in\mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad \|y - \Phi Dx\|_2 \leq \epsilon$$

3.3 Synthesis and Analysis Formulations for Redundant Frames

3.3.1 General Frames and Duals

What changes when we consider the case where $D \in \mathbb{R}^{d\times N}$ is a redundant frame? The recovery process in the synthesis formulation stays the same,

$$\hat{x} = \arg\min_{\tilde{x}\in\mathbb{R}^N} \|\tilde{x}\|_1 \quad \text{subject to} \quad \|\Phi D\tilde{x} - y\|_2 \leq \epsilon \qquad (3.11)$$

where the optimization still takes place over an $N$-dimensional space; however, we then take the obtained sparse coefficient vector $\hat{x}$ and map it from the high-dimensional $N$-dimensional space to the lower $d$-dimensional space via the reconstruction $\hat{f} = D\hat{x}$. This is somewhat of a blessing in disguise if our only goal is to recover the signal $f$. Note that the mapping is certainly not one-to-one, so it is possible to have a recovery $\hat{x}$ of the sparse coefficient vector $x$ for which the error $\|\hat{x} - x\|_2$ is large, while $\hat{x}$ still maps to the correct signal $f$, making the error $\|f - \hat{f}\|_2$ small. For example, if we add anything in the null space of $D$ to our recovered coefficient sequence, we still obtain the same reconstruction.

The major change in the analysis formulation for frames, as seen in (3.12), is that the space over which the optimization takes place is now $d$-dimensional instead of $N$-dimensional. (One should not be quick to assume that, since the optimization space is smaller, the analysis formulation is computationally easier to solve than the synthesis one. In fact, there are far more solvers available for synthesis problems than for analysis problems. The difficulty essentially boils down to the coupling of the matrix $\bar{D}^*$ with $f$ inside the $\ell_1$-norm, which makes it harder to develop fast solvers for the analysis formulation.)

$$\hat{f} = \arg\min_{f\in\mathbb{R}^d} \|\bar{D}^*f\|_1 \quad \text{subject to} \quad \|\Phi f - y\|_2 \leq \epsilon \qquad (3.12)$$

The approach of using a change of variables to prove equivalence breaks down when $D$ is a redundant frame, because $D$ is no longer
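The null-space remark can be checked numerically. The following sketch (with a random frame of our own choosing, not from the text) perturbs the coefficient vector by a null-space component and observes that the synthesized signal is unchanged even though the coefficients move far from sparse.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 4, 7
D = rng.standard_normal((d, N))                  # a generic redundant frame for R^d

x = np.zeros(N)
x[2] = 1.5                                       # a sparse coefficient vector
f = D @ x

# Orthogonal projector onto null(D): any projected vector can be added
# to x without changing f = Dx.
P = np.eye(N) - D.T @ np.linalg.solve(D @ D.T, D)
z = P @ rng.standard_normal(N)                   # a null-space vector
x_alt = x + z                                    # no longer sparse ...
f_alt = D @ x_alt                                # ... but maps to the same signal
```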
invertible. Let us illustrate this non-equivalence with a simple example. Consider the signal $f = Dx$ with $D$ and $x$ defined as

$$D = \begin{bmatrix} 1 & 2 & -1 & -2 & 0 \\ 0 & 1 & 2 & -4 & 3 \\ -4 & -2 & 4 & 1 & 4 \\ 2 & 3 & -3 & 1 & -4 \end{bmatrix}, \qquad x = \begin{bmatrix} 0 & 0 & 0 & -2 & 0 \end{bmatrix}^T$$

so that

$$f = \begin{bmatrix} 4 & 8 & -2 & -2 \end{bmatrix}^T$$

We also construct the canonical dual frame $\bar{D} = (DD^*)^{-1}D$ for solving the analysis formulation:

$$\bar{D} = \begin{bmatrix} -0.5691 & 0.1436 & -1.2336 & -0.0241 & 0.7423 \\ 0.3103 & 0.1034 & 0.7555 & -0.1192 & -0.3637 \\ -0.2654 & 0.2449 & -0.0241 & 0.1243 & 0.1001 \\ 0.0548 & 0.3516 & 0.5504 & 0.1038 & -0.3458 \end{bmatrix}$$

The matrix $D$ has rank 4, and thus its columns form a frame for $\mathbb{R}^4$; note also that the canonical dual frame $\bar{D}$ satisfies $D\bar{D}^* = I$. Suppose now we observe Gaussian measurements $y = Af$, obtained by multiplying $f$ by a matrix $A$ whose entries are sampled from a standard normal distribution and whose rows have been normalized:

$$A = \begin{bmatrix} 0.9447 & 0.0668 & -0.0059 & 0.3209 \\ 0.4910 & -0.5374 & 0.6784 & -0.0998 \\ -0.4415 & -0.5692 & -0.3935 & 0.5712 \end{bmatrix}$$

Using the software SPGL1 by van den Berg and Friedlander [66], we solve the synthesis formulation (3.11) with the noise parameter set to 0, obtaining the approximation:

$$\hat{f} = \begin{bmatrix} 3.9964 & 7.9944 & -1.9948 & -2.0007 \end{bmatrix}^T$$

Using the software NESTA by Bobin, Becker, and Candès [5], we solve the analysis formulation (3.12) with the noise and smoothing parameters set to 0, obtaining the approximation:

$$\hat{f} = \begin{bmatrix} 3.9703 & 8.0509 & -1.9267 & -1.9217 \end{bmatrix}^T$$

We observe that the solutions of the analysis and synthesis recoveries do not coincide. It should be noted that these solutions are quite similar, and one might suspect numerical accuracy to be at fault.
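The frame quantities in this example are easy to reproduce numerically (the solver runs themselves require SPGL1 and NESTA, so only the linear algebra is sketched here; the matrix `W` below is an arbitrary illustration of the parametrization $\bar{D} + W^*P$ of all dual frames that appears below).

```python
import numpy as np

# The frame D and sparse coefficient vector x from the example above
D = np.array([[ 1,  2, -1, -2,  0],
              [ 0,  1,  2, -4,  3],
              [-4, -2,  4,  1,  4],
              [ 2,  3, -3,  1, -4]], dtype=float)
x = np.array([0.0, 0.0, 0.0, -2.0, 0.0])
f = D @ x

# Canonical dual frame D_bar = (D D*)^{-1} D used by the analysis decoder
D_bar = np.linalg.inv(D @ D.T) @ D

# Duality check: D D_bar* = I, so f = D (D_bar* f) for every f in R^4
duality = D @ D_bar.T

# Non-canonical duals also exist: D_bar + W*P for any W, where P is the
# orthogonal projector onto null(D)
P = np.eye(5) - D.T @ np.linalg.inv(D @ D.T) @ D
W = np.ones((5, 4))                      # an arbitrary choice of W
D_alt = D_bar + W.T @ P
```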
However, these solutions were computed to machine precision, and their differences are a consequence of the non-equivalence of the analysis and synthesis formulations. Though not included in this thesis, one can replace the pseudo-inverse with an optimal dual analysis operator (described in the following section) and see the solutions coincide.

Some of the first theoretical work on the analysis and synthesis formulations was conducted by Elad et al. [18], where the authors investigated the geometry of the signals associated with the synthesis and analysis models. More recently, Candès and Needell [9] gave the first recovery condition which imposes no incoherence restriction on the dictionary. These papers use the canonical dual as the analysis operator. It should not be surprising that there might exist other duals which give a sparser representation. Indeed, this has been investigated; [18, 45, 51] all consider analysis operators which are not the canonical dual. Li et al. [45] generalize the analysis formulation by proposing the general dual $\ell_1$-analysis formulation:

$$\hat{f} = \arg\min_{f\in\mathbb{R}^d} \|\tilde{D}^*f\|_1 \quad \text{s.t.} \quad \|y - \Phi f\|_2 \leq \epsilon \qquad (3.13)$$

where the columns of the analysis operator $\tilde{D}$ form any dual frame of $D$. We could choose the canonical dual, though there might be a more sparsifying dual, i.e., a $\tilde{D}$ such that $\|\tilde{D}^*f\|_1$ is smaller than $\|\bar{D}^*f\|_1$.

3.3.2 Optimal Dual $\ell_1$-Analysis

For a given frame $D$, there are in fact infinitely many dual frames of $D$. Among those duals, we hope that some provide an optimal sparsification. One possible solution is the optimal-dual-based $\ell_1$-analysis of [45], which optimizes not only over the signal space but also over the set of all duals:

$$\hat{f} = \arg\min_{\tilde{f}\in\mathbb{R}^d,\; D\tilde{D}^*=I} \|\tilde{D}^*\tilde{f}\|_1 \quad \text{s.t.} \quad \|y - \Phi\tilde{f}\|_2 \leq \epsilon \qquad (3.14)$$

Note that the class of all dual frames of $D$ is given by [40]

$$\tilde{D} = (DD^*)^{-1}D + W^*\left(I_N - D^*(DD^*)^{-1}D\right) = \bar{D} + W^*P$$
where $\bar{D}$ is the canonical dual frame of $D$, $P$ is the orthogonal projection onto the null space of $D$, and $W \in \mathbb{R}^{N\times d}$ is an arbitrary matrix. Plugging this formula into (3.14), we obtain

$$(\hat{f}, \hat{g}) = \arg\min_{\tilde{f}\in\mathbb{R}^d,\; g\in\mathbb{R}^N} \|\bar{D}^*\tilde{f} + Pg\|_1 \quad \text{s.t.} \quad \|y - \Phi\tilde{f}\|_2 \leq \epsilon \qquad (3.15)$$

The solution of (3.15) corresponds to the solution of (3.13) with some optimal dual frame, say $\tilde{D}_0$, as the analysis operator. As a byproduct of the optimization, one can construct the optimal dual $\tilde{D}_0$. The construction the authors of [44] propose is the following:

$$\tilde{D}_0 = \bar{D} + \frac{\hat{f}\otimes\hat{g}}{\|\hat{f}\|_2^2} \qquad (3.16)$$

The optimality here is in the sense that $\|\tilde{D}_0^*\tilde{f}\|_1$ achieves the smallest value of $\|\tilde{D}^*\tilde{f}\|_1$ among all dual frames $\tilde{D}$ of $D$ and feasible signals $\tilde{f}$ satisfying the constraint in (3.15). Further, and quite remarkably, it can then be shown that the formulation (3.15) is equivalent to the synthesis formulation (3.11), leading to the following theorem, whose short proof we give now.

Theorem 3.3.1 (Liu, Li, Mi, Lei, and Yu [44]). $\ell_1$-synthesis and optimal-dual-based $\ell_1$-analysis are equivalent.

Proof. Start with the optimal-dual-based $\ell_1$-analysis formulation (3.15). Let $\tilde{x} = \bar{D}^*\tilde{f} + Pg$; then we have

$$D\tilde{x} = D\bar{D}^*\tilde{f} + DPg = I\tilde{f} + 0 = \tilde{f}$$

Since the columns of $[\bar{D}^*, P]$ span the whole $N$-dimensional space and both $\tilde{f}$ and $g$ are free, any $\tilde{x}\in\mathbb{R}^N$ can be put in this form. Putting these two facts into (3.15), we obtain the $\ell_1$-synthesis formulation (3.11). Conversely, starting with the $\ell_1$-synthesis formulation, we have that for any $\tilde{x}\in\mathbb{R}^N$ the following decomposition always holds:

$$\tilde{x} = \tilde{x}_R + \tilde{x}_N = D^*(DD^*)^{-1}D\tilde{x} + P\tilde{x} = \bar{D}^*D\tilde{x} + P\tilde{x}$$

where $\tilde{x}_R$ and $\tilde{x}_N$ are the components of $\tilde{x}$ belonging to the row space and null space of $D$, respectively. By definition of $\ell_1$-synthesis, we have $\tilde{f} = D\tilde{x} \in \mathbb{R}^d$.
Letting $g = \tilde{x} \in \mathbb{R}^N$, we arrive at the optimal-dual-based $\ell_1$-analysis formulation. $\square$

In other words, there exist appropriate analysis operators (namely, optimal duals) for which the analysis and synthesis formulations are equivalent in the case of frames.

3.4 Uniqueness of $\ell_0$ Recovery for Synthesis

Before moving to recovery conditions via $\ell_1$-minimization, we consider the problem of uniqueness for both the analysis and synthesis formulations. As before, let us distinguish whether $D$ is a basis or a frame. Consider a signal $f = Dx$ where the columns of $D$ form a basis; then this linear system has a unique solution $x$. Now consider a signal $f = Dx$ where $D$ is a redundant frame; then there are infinitely many solutions to this equation. That is, even before sampling we have a non-uniqueness problem which did not exist when $D$ was a basis. A notion introduced early in compressive sensing is the spark of a matrix, which will help us better understand uniqueness.

Definition 3.1 (Spark). The spark of a given matrix $\Phi$ is the smallest number of columns of $\Phi$ that are linearly dependent.

The following theorem connects sparse signal recovery and the spark of a matrix in the orthonormal basis case:

Theorem 3.4.1 (Donoho, Elad [15]). Let $\Phi$ be an $m\times d$ matrix and let $k\in\mathbb{N}$. Then the following conditions are equivalent:

1. If a solution $x$ to (3.3) satisfies $\|x\|_0 \leq k$, then it is the unique solution.
2. $\mathrm{spark}(\Phi) > 2k$

Thus, when dealing with exactly sparse vectors, the spark provides a complete characterization of sparse signal recovery. However, even for modestly sized problems, the spark is computationally infeasible to check. The definition has been extended to handle sparsity with respect to a frame $D$.

Definition 3.2 (D-Spark). Given a matrix $\Phi$, we define $\mathrm{D\text{-}Spark}(\Phi)$ as the smallest possible number of columns from $D$, indexed by $T$, such that $\mathrm{range}(D_T)\cap\mathrm{Null}(\Phi) \neq \{0\}$.
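The computational infeasibility of the spark is easy to see from a direct implementation: every subset of columns must be examined. The sketch below (our own toy matrix with a planted three-column dependency, not from the text) makes this explicit.

```python
import itertools
import numpy as np

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi.

    Brute force over all column subsets: infeasible beyond toy sizes,
    which is exactly the practical limitation noted in the text."""
    m, n = Phi.shape
    for k in range(1, n + 1):
        for S in map(list, itertools.combinations(range(n), k)):
            # The columns indexed by S are dependent iff their rank < k
            if np.linalg.matrix_rank(Phi[:, S], tol=tol) < k:
                return k
    return n + 1   # all columns linearly independent

# Columns 0, 1, 2 satisfy c0 + c1 - c2 = 0, so spark = 3; by
# Theorem 3.4.1, every 1-sparse solution is then unique (3 > 2*1).
Phi = np.array([[1.0, 0.0, 1.0, 0.0],
                [0.0, 1.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
```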
Coming back to the issues introduced when using a redundant frame $D$, notice that $\mathrm{D\text{-}Spark}(\Phi) \leq \mathrm{spark}(\Phi)$, with equality if and only if $D$ is an invertible matrix, thus quantifying the uniqueness problem associated with frames. As in the orthonormal case, there is an associated sparse recovery theorem.

Theorem 3.4.2 (Giryes, Elad [26]). Let $y = \Phi f$ where $f$ has a $k$-sparse representation $x$ under $D$. If $k < \mathrm{D\text{-}Spark}(\Phi)/2$, then $f = D\hat{x}$ for $\hat{x}$ the minimizer of (3.3).

The theorems above give us conditions for uniqueness in the synthesis setting. Clearly, if we recover the correct sparse vector $x$, then we would indeed recover the correct signal $f$ via $f = D\hat{x}$. In practice, the convex relaxations of both the synthesis and analysis formulations are preferred. The question is then: when do the convex relaxations of the analysis and synthesis formulations give the same solution as their non-convex counterparts? We investigate this in the following section. A small note on the case when $D$ is redundant: when one is only interested in recovering the signal $f = Dx$, it is not necessary to first find its sparsest representation. As discussed previously, adding any vector in the null space of $D$ to $x$ (which will most likely give a less sparse solution) does not change its reconstruction. Thus a large amount of research is focused on bounding the error $\|f - \hat{f}\|_2$ of the recovery of the convex relaxations.

3.5 Recovery Guarantees for $\ell_1$-Minimization

In practice, the convex relaxation of the sparse recovery problem is considered. In this section we summarize existing recovery conditions for the convex relaxations of both the analysis and synthesis formulations.

3.5.1 Synthesis

In order to quantify the errors in the reconstruction we will require two definitions, namely the best $k$-term approximation error of a vector and the restricted isometry property (RIP) of a matrix.

Definition 3.3.
The best $k$-term approximation error of a vector $x\in\mathbb{C}^N$ in $\ell_p$ is defined as

$$\sigma_k(x)_p = \inf_{z\in\Sigma_k} \|x - z\|_p \qquad (3.17)$$

Definition 3.4 (The Restricted Isometry Property (RIP)). For an $m\times n$ measurement matrix $\Phi$, the $s$-restricted isometry constant $\delta_s$ of $\Phi$ is the smallest quantity such that

$$(1-\delta_s)\|x\|_2^2 \leq \|\Phi x\|_2^2 \leq (1+\delta_s)\|x\|_2^2 \qquad (3.18)$$

holds for all $s$-sparse signals $x$.

Constructing measurement matrices which satisfy the RIP is a difficult and ongoing research area. This is beyond the scope of this thesis; we will rather focus on using this condition to prove error estimates for signal recovery. The following was the first theorem using the RIP for sparse signal recovery.

Theorem 3.5.1 (Candès, Tao, Romberg [10]). The recovery (3.11) satisfies

$$\|\hat{x} - x\|_2 \leq C_0\frac{\sigma_s(x)_1}{\sqrt{s}} + C_1\epsilon \qquad (3.19)$$

provided that the $2s$-restricted isometry constant of the matrix composition $\Phi D$ obeys $\delta_{2s} < \sqrt{2}-1$.

This result was improved to $\delta_{2s} \leq \frac{3}{4+\sqrt{6}} \approx 0.4652$ in 2010 by Foucart [21], and further still to $\delta_{2s} < \frac{4}{\sqrt{41}} \approx 0.6246$ in 2013 by Foucart and Rauhut [22]; a sharp bound of $\delta_{2s} < \frac{\sqrt{2}}{2}$ has now been found by Cai and Zhang in 2013 [7].

Notice that if $x$ is $s$-sparse and there is no noise, then Theorem 3.5.1 guarantees perfect recovery. At this point we should note that when $\Phi$ is Gaussian and $D$ is a redundant frame, it is well known that even if $\Phi$ has a small RIP constant, $\Phi D$ may no longer satisfy the assumptions of the above theorem. Thus, when using frames as a sparsity-inducing transform, traditional methods of analyzing the residual of the recovered signal break down.

3.5.2 Analysis

The definition of the RIP has been extended for use in the analysis formulation of the sparse recovery problem.

Definition 3.5 (D-RIP). Let $\Sigma_s$ be the union of all subspaces spanned by all subsets of $s$ columns of $D$. We say that the measurement matrix $\Phi$ obeys the restricted isometry property adapted to $D$ (abbreviated D-RIP) with constant $\delta_s$ if
$$(1-\delta_s)\|v\|_2^2 \leq \|\Phi v\|_2^2 \leq (1+\delta_s)\|v\|_2^2 \qquad (3.20)$$

holds for all $v\in\Sigma_s$.

Notice that $\Sigma_s$ is the image under $D$ of all $s$-sparse vectors. Again, we will not focus on which matrices satisfy the D-RIP, but rather on what D-RIP conditions are required to ensure accurate signal reconstruction. The following is the first theorem using the D-RIP to estimate the reconstruction error for the analysis formulation.

Theorem 3.5.2 (Candès, Needell [9]). Let $D$ be an arbitrary tight frame and let $\Phi$ be a measurement matrix satisfying the D-RIP with $\delta_{2s} < 0.08$. Then the solution $\hat{f}$ to (3.9) satisfies

$$\|\hat{f} - f\|_2 \leq C_0\epsilon + C_1\frac{\|D^*f - (D^*f)_s\|_1}{\sqrt{s}}$$

for some well-behaved constants $C_0$ and $C_1$.

Again, we see perfect recovery if $D^*f$ is $s$-sparse and there is no noise. We should mention an important recent result by Baker [4] in the case of tight frames, which provides a sharp upper bound of $\delta_{2s} < \frac{\sqrt{2}}{2}$, a result analogous to its synthesis counterpart.

These ideas have been generalized to arbitrary frames, dropping the tightness requirement (though without claiming optimality), and we investigate this in the following section.

3.5.3 General and Optimal Dual Based Analysis

Using (3.13) as a decoder, Li et al. [45] prove the following theorem regarding the reconstruction error:

Theorem 3.6 (Liu, Mi, Li [45]). Let $D$ be a general frame of $\mathbb{R}^n$ with frame bounds $0 < A \leq B < \infty$. Let $\tilde{D}$ be any dual frame of $D$ with frame bounds $0 < \tilde{A} \leq \tilde{B} < \infty$, and let $\rho = s/b$. Suppose

$$\left(1 - \sqrt{\rho B\tilde{B}}\right)^2 \delta_{s+a} + \rho B\tilde{B}\,\delta_b < 1 - 2\sqrt{\rho B\tilde{B}} \qquad (3.21)$$

holds for some positive integers $a$ and $b$ satisfying $0 < b-a \leq 3a$. Then the solution $\hat{f}$ to (3.13) satisfies
$$\|\hat{f} - f\|_2 \leq C_0\epsilon + C_1\frac{\|\tilde{D}^*f - (\tilde{D}^*f)_s\|_1}{\sqrt{s}} \qquad (3.22)$$

where $C_0$ and $C_1$ are some constants and $(\tilde{D}^*f)_s$ denotes the vector consisting of the largest $s$ entries of $\tilde{D}^*f$ in magnitude.

As a direct consequence of the equivalence between optimal-dual-based $\ell_1$-analysis and synthesis, we can apply Theorem 3.6 with an optimal dual to arrive at a theorem for the synthesis formulation.

Theorem 3.7 (Liu, Li, Mi, Lei, and Yu [44]). In Theorem 3.6, replace $\tilde{D}$ with some optimal dual frame $\tilde{D}_0$, as defined via (3.14). Then the solution $\hat{f}$ to (3.14) or (3.11) satisfies (3.22).

Note that the optimal dual cannot be known a priori, since it is signal dependent. Rather than use optimal-dual-based $\ell_1$-analysis as a decoder, the authors use the equivalence to give theoretical conditions for synthesis-based recovery with frames, in contrast to traditional compressive sensing arguments, which are known to break down because they rely on incoherence between the measurement matrix and the frame. That said, the conditions obtained are pointwise dependent on the signal and the associated optimal dual, so we cannot calculate the error bound without explicitly finding the optimal dual.

In conclusion, we have given recovery conditions for the synthesis, analysis, and optimal dual analysis formulations of the sparse recovery problem. Over the past decade, the compressive sensing community has pursued weaker RIP and D-RIP conditions while simultaneously improving error bounds. Suppose we had some specific prior knowledge of the solution of the problem. As it turns out, this can be used both to help numerical methods in sparse recovery and to improve theoretical conditions, as we investigate in the following chapter.

Chapter 4

Weighted Methods and Partial Support Information

Within the optimization framework, one can consider having prior support information about the signal one wishes to recover.
For example, we might know where some of the non-zero synthesis coefficients are located if we are trying to solve the synthesis formulation. It is natural to suspect that having extra information about a signal should allow for faster convergence or a more accurate reconstruction. In this chapter we investigate weighting in both the analysis and synthesis formulations. Though we will primarily focus on the weighting schemes of Friedlander et al. [23] and Candès et al. [11], we note that the recovery of compressively sampled signals using prior support information has also been studied by von Borries, Miosso, and Potes [54], Vaswani and Lu [67] [46], Jacques [35], and Khajehnejad et al. [36].

4.1 Weighted Synthesis

One approach that utilizes prior information in the recovery algorithm is to replace $\ell_1$ minimization with weighted $\ell_1$ minimization:

$$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \|x\|_{1,w} \quad \text{subject to} \quad \|\Phi Dx - y\|_2 \leq \epsilon \qquad (4.1)$$

where $\|x\|_{1,w} := \sum_i w_i|x_i|$ is the weighted $\ell_1$ norm. There are various ways to choose the weights. Here we consider two possibilities: fixed weights and variable weights.

4.1.1 Fixed Weights

The first case we consider is a fixed weighting strategy [23]:

$$w_i = \begin{cases} 1 & \text{if } i\in\tilde{T}^c \\ \omega & \text{if } i\in\tilde{T} \end{cases} \qquad (4.2)$$

where $\tilde{T}$ is a support estimate of $x$. The following theorem gives a recovery condition for this weighting scheme:

Theorem 4.1.1 (Friedlander, Mansour, Saab, Yilmaz [23]). Let $x$ be in $\mathbb{R}^N$ and let $x_k$ be its best $k$-term approximation, supported on $T_0$. Let $\tilde{T}\subset\{1,\dots,N\}$ be an arbitrary set, and define $\rho$ and $\alpha$ such that $|\tilde{T}| = \rho k$ and $|\tilde{T}\cap T_0| = \alpha\rho k$. Suppose that there exists an $a\in\frac{1}{k}\mathbb{Z}$ with $a \geq (1-\alpha)\rho$ and $a \geq 1$, and that the measurement matrix satisfies the RIP with

$$\delta_{ak} + \frac{a}{\gamma^2}\,\delta_{(a+1)k} < \frac{a}{\gamma^2} - 1 \qquad (4.3)$$

where $\gamma = \omega + (1-\omega)\sqrt{1+\rho-2\alpha\rho}$ for some $0\leq\omega\leq 1$.
Then the solution $\hat{x}$ to (4.1) obeys

$$\|\hat{x} - x\|_2 \leq C_0'\epsilon + C_1'k^{-1/2}\left(\omega\|x - x_k\|_1 + (1-\omega)\|x_{\tilde{T}^c\cap T_0^c}\|_1\right) \qquad (4.4)$$

Moreover, when $\alpha > 0.5$ (which indicates a support estimate accuracy greater than 50 percent) and $\omega < 1$, (4.1) outperforms (3.11) in terms of accuracy, stability, and robustness guarantees [23]. Practically, this means that whenever one has reasonably accurate prior support information, the weighted formulation is preferred over its non-weighted counterpart.

4.1.2 Variable Weights

The second weighting scheme popular in the compressive sensing literature is the variable weighting scheme of [11], where the goal is to design the weights to be inversely proportional to the magnitudes of the non-zero entries:

$$w_i = \frac{1}{|x_i| + a} \qquad (4.5)$$

where $a$ is a stability parameter which will be explained shortly.

Let us begin with the heuristic reasoning behind this choice of weights given in [11]. As a rough rule of thumb, the weights should relate inversely to the true signal magnitudes; more generally, large weights can be used to discourage non-zero entries in the recovered signal, while small weights can be used to encourage them, a heuristic also reflected in the strategy of Friedlander et al. [23]. Suppose the sparse signal were known and the weights were set as $w_i = \frac{1}{|x_i|}$. The weights would then be infinite at all locations outside the support of $x$, forcing the coordinates of the solution vector at these locations to be zero. In this case, as long as the number of measurements taken is at least as large as the number of non-zeros of the actual signal, we would have $x = \hat{x}$. In practice we do not know the signal exactly; otherwise we would not have to solve any problem.
Moreover, it is not possible to use infinity as a weight, and thus $a$ is used as a stability parameter.

We will see in the following section that Needell [52] gave theoretical guarantees for the case of iterative reweighting.

4.1.3 Iterative Reweighting

Even without prior support information, it is still possible to incorporate weights into the recovery problem. The approach is to solve the non-weighted formulation, use that solution to compute weights, and then solve the associated weighted formulation. Iterative reweighting in the synthesis case was first introduced by Candès, Wakin, and Boyd [11] and has the following structure:

Algorithm 1 Iterative reweighting with $\ell_1$-synthesis
Set $l = 0$ and $w_j^{(l)} = 1$, $j\in J$ ($J$ indexes the dictionary), and let $W^{(l)} = \mathrm{diag}(w^{(l)})$
while $\|x^{(l)} - x^{(l+1)}\| \geq \sigma$ and $l <$ maximum iterations do
  $x^{(l)} = \arg\min \|W^{(l)}x\|_1$ subject to $\|\Phi Dx - y\|_2 \leq \epsilon$
  Update the weights $w_j^{(l+1)}$ according to the solution $x^{(l)}$
  $l = l + 1$
end while

As mentioned in the previous section, there exist theoretical results on the convergence of the reweighting strategy:

Theorem 4.1.2 (Needell [52]). Assume $\Phi$ satisfies the RIP with constant $\delta_{2s} < \sqrt{2}-1$. Let $x$ be an $s$-sparse vector with noisy measurements $u = \Phi x + e$ where $\|e\|_2 \leq \epsilon$. Assume the smallest non-zero coordinate $\mu$ of $x$ satisfies $\mu \geq \frac{4\alpha\epsilon}{1-\rho}$. Then the limiting approximation of reweighted $\ell_1$-minimization using the weights defined in (4.5) satisfies

$$\|x - \hat{x}\|_2 \leq C''\epsilon \qquad (4.6)$$

where $\rho = \frac{\sqrt{2}\,\delta_{2s}}{1-\delta_{2s}}$, $\alpha = \frac{2\sqrt{1+\delta_{2s}}}{1-\delta_{2s}}$, and $C'' = \frac{2\alpha}{1+\rho}$.

Choosing smart and robust rules for selecting the parameter $a$ remains a topic of current research, though its purpose is clearly to provide stability and also to ensure that a zero-valued component in $x$ does not strictly prohibit a non-zero estimate at the next step. That is, if a component is wrongly identified as zero, it is still possible to recover it in further iterations.
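On toy problems, both the weighted subproblem and the reweighting loop of Algorithm 1 can be sketched exactly. The enumeration-based solver below is our own illustration (in practice one uses solvers such as SPGL1); it is exact only because an optimum of the weighted $\ell_1$ linear program with positive weights may be taken to lie on a support of at most $m$ linearly independent columns, so all such basic supports can be enumerated when the dimensions are tiny.

```python
import itertools
import numpy as np

def weighted_bp(A, y, w, tol=1e-9):
    """Exact min ||x||_{1,w} s.t. Ax = y on tiny problems, by
    enumerating basic supports (at most m independent columns)."""
    m, n = A.shape
    best_x, best_val = None, np.inf
    for k in range(1, m + 1):
        for S in map(list, itertools.combinations(range(n), k)):
            if np.linalg.matrix_rank(A[:, S]) < k:
                continue
            xs, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
            if np.linalg.norm(A[:, S] @ xs - y) > tol:
                continue
            x = np.zeros(n)
            x[S] = xs
            val = float(np.sum(w * np.abs(x)))
            if val < best_val:
                best_x, best_val = x, val
    return best_x

def reweighted_l1(A, y, a=0.1, iters=4):
    """Algorithm 1 specialized to epsilon = 0, with weights (4.5)."""
    n = A.shape[1]
    w = np.ones(n)                      # first pass is plain l1
    x = None
    for _ in range(iters):
        x = weighted_bp(A, y, w)
        w = 1.0 / (np.abs(x) + a)       # inversely proportional to magnitudes
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 8))
x_true = np.zeros(8)
x_true[[0, 3]] = [1.0, -2.0]
y = A @ x_true
x_l1 = weighted_bp(A, y, np.ones(8))    # non-weighted l1 synthesis (epsilon = 0)
x_rw = reweighted_l1(A, y)
```

Since the solver is exact, the non-weighted solution is guaranteed to have an $\ell_1$ norm no larger than that of the true sparse vector, which is itself feasible.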
The weighted analysis problem is posed as follows:

$$\hat{f} = \arg\min_{\tilde{f}\in\mathbb{R}^d} \|\bar{D}^*\tilde{f}\|_{1,w} \quad \text{subject to} \quad \|\Phi\tilde{f} - y\|_2 \leq \epsilon \qquad (4.7)$$

We can again choose the weights based on some form of prior information, applying either a variable weight strategy as in [11] or a fixed weight strategy as in [23]. Recall that in the analysis case we hope that $\bar{D}^*f$ is sparse. Furthermore, if we assume that $f$ has a sparse representation with respect to a dictionary $D$, we hope that $\bar{D}^*f$ is a close approximation to that representation, noting that if $D$ is a basis, then it is in fact the exact sparse representation. More often than not, though, $\bar{D}^*f$ will not be sparse but rather compressible, in the sense that the majority of the signal's energy is stored in a small number of its coefficients. The weights should then be calculated based on the entries which contribute the most to the signal's energy. This is the approach taken in Chapter 6 on seismic wavefield reconstruction.

4.2.1 Iterative Reweighting

Just as in the synthesis formulation, we consider iterative reweighting in the analysis formulation.
The authors of [11] examine iterative reweighting in the analysis case empirically, using examples from MRI imaging and radar. However, we are unaware of any theorems which give guarantees analogous to Theorem 4.1.2. As before, we start by solving a non-weighted analysis formulation and then solve a sequence of weighted analysis formulations, using previous solutions to calculate the weights.

Algorithm 2 Iterative reweighting with $\ell_1$-analysis
Set $l = 0$ and $w_j^{(l)} = 1$, $j\in J$ ($J$ indexes the dictionary), and let $W^{(l)} = \mathrm{diag}(w^{(l)})$
while $\|f^{(l)} - f^{(l+1)}\| \geq \sigma$ and $l <$ maximum iterations do
  $f^{(l)} = \arg\min \|W^{(l)}\bar{D}^*f\|_1$ subject to $\|\Phi f - y\|_2 \leq \epsilon$
  Update the weights $w_j^{(l+1)}$ according to the solution $f^{(l)}$
  $l = l + 1$
end while

4.3 Main Theoretical Contributions

Weighted Methods for Recovering Signals that are Sparse with respect to Coherent Dictionaries

In the remainder of this chapter, we describe the major theoretical contributions of this thesis. Specifically, in Section 4.3.1 we introduce a novel weighting technique for the analysis formulation. This technique is heuristically motivated and is observed to perform well for certain classes of signals, including seismic data; for more on this, see Chapter 6. In Section 4.3.2, we introduce and discuss weighted general dual analysis recovery. In Section 4.3.3, we apply these results to the optimal duals and establish the equivalence of the optimal-dual-based weighted analysis and weighted synthesis algorithms (see Theorem 4.3.2). As a consequence, we obtain recovery guarantees for the weighted synthesis algorithm of [23] in the case when the sparsity is with respect to highly coherent dictionaries.

4.3.1 A Novel Weighting Technique for Analysis

The last method of weighting which we examine in this thesis introduces a weight matrix inside the canonical dual. It is posed as the following optimization problem:

$$\hat{f} = \arg\min_{\tilde{f}\in\mathbb{R}^d} \|\overline{DW}^*\tilde{f}\|_1 \quad \text{subject to} \quad \|\Phi\tilde{f} - y\|_2 \leq \epsilon \qquad (4.8)$$

where $\overline{DW}$ is the canonical dual of $DW$.
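For an orthonormal $D$, the action of this weighted dual can be checked numerically: applying $\overline{DW}^*$ to $f = Dx$ returns $W^{-1}x$. The sketch below uses an arbitrary orthonormal basis; the support estimate and the value of $\omega$ are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # an orthonormal basis D
x = np.array([0.0, 3.0, 0.0, -1.0, 0.0])           # sparse coefficients
f = Q @ x

# Reversed fixed weights: 1 on the support estimate, omega off it
omega = 0.1
W = np.diag(np.where(np.abs(x) > 0, 1.0, omega))

QW = Q @ W
QW_dual = np.linalg.inv(QW @ QW.T) @ QW            # canonical dual of DW
coeffs = QW_dual.T @ f                             # equals W^{-1} x when D is orthonormal
```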
However, we make the following choice of weights, which is the reverse of that of Friedlander et al. [23]:

$$w_i = \begin{cases} 1 & \text{if } i\in\tilde{T} \\ \omega & \text{if } i\in\tilde{T}^c \end{cases} \qquad (4.9)$$

where $0 < \omega < 1$ and $\tilde{T}$ is a support estimate of $x$. As we shall see in Chapter 6 on seismic wavefield reconstruction, this method sometimes outperforms the weighted synthesis and analysis formulations with fixed weights.

A heuristic justification for this choice of weighting can be seen in the case where $D$ is an orthonormal basis. Start with a sparse representation of a signal $f$ with respect to an orthonormal basis $D$, multiply both sides by $\overline{DW}^*$, and expand the right-hand side:

$$f = Dx$$

$$\overline{DW}^*f = \overline{DW}^*Dx = \left((DW(DW)^*)^{-1}DW\right)^*Dx = WD^*(DW^2D^*)^{-1}Dx = WD^*(D^{-1})^*W^{-2}D^{-1}Dx = W^{-1}x$$

The final identity, $\overline{DW}^*f = W^{-1}x$, is in fact reminiscent of the variable weighting of [11], in that the weights should relate inversely to the true signal magnitudes. Notice that since we have reversed the choice of weights in our definition, their inverses now correspond to assigning a large weight to the off-support and a small weight to the true support, as in the synthesis weighting case.

4.3.2 Weighted General Dual Analysis

We now consider weighting in the general dual formulation. The following optimization problem can be considered:

$$\hat{f} = \arg\min_{f\in\mathbb{R}^d} \|\tilde{D}^*f\|_{1,w} \quad \text{s.t.} \quad \|y - \Phi f\|_2 \leq \epsilon \qquad (4.10)$$

where $\tilde{D}$ is an arbitrary dual frame of $D$. We combine the techniques of Friedlander et al. [23] and Li et al. [45] to arrive at the following theorem.

Theorem 4.3.1 (Weighted general dual $\ell_1$-analysis). Consider the signal $f = Dx$, where $x\in\mathbb{R}^N$ and $D\in\mathbb{R}^{d\times N}$ is an arbitrary frame of $\mathbb{R}^d$ with frame bounds $0 < A \leq B < \infty$, and suppose we have noisy linear measurements

$$y = \Phi Dx + z \qquad (4.11)$$

Let $\tilde{D}$ be any dual frame of $D$ with frame bounds $0 < \tilde{A} \leq \tilde{B} < \infty$. Let $\tilde{T}\subset\{1,\dots,N\}$ be an arbitrary set. Let $T_0$ index the support of the best $s$-term approximation of $\tilde{D}^*f$, so that $|T_0| = s$. Define $\rho$ and $\alpha$ such that $|\tilde{T}| = \rho s$ and $|\tilde{T}\cap T_0| = \alpha\rho s$.
Suppose $\Phi$ has the D-RIP with constants $\delta_{s+a}$ and $\delta_b$ satisfying

$$\left(1 - \gamma\sqrt{\frac{sB\tilde{B}}{b}}\right)^2\delta_{s+a} + \frac{sB\tilde{B}}{b}\,\delta_b < 1 - 2\gamma\sqrt{\frac{sB\tilde{B}}{b}} \qquad (4.12)$$

for some positive integers $a$ and $b$ satisfying $0 < b-a \leq 3a$, where $\gamma = \omega + (1-\omega)\sqrt{1+\rho-2\alpha\rho}$ and $0\leq\omega\leq 1$, with the weights chosen as follows:

$$w_i = \begin{cases} 1 & \text{if } i\in\tilde{T}^c \\ \omega & \text{if } i\in\tilde{T} \end{cases} \qquad (4.13)$$

where $\tilde{T}$ is a support estimate of $\tilde{D}^*f$. Then the solution $\hat{f}$ to (4.10) satisfies

$$\|f - \hat{f}\|_2 \leq c_1\left(\omega\|\tilde{D}_{T_0^c}^*f\|_1 + (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1\right) + c_2\epsilon \qquad (4.14)$$

where $c_1$ and $c_2$ are well-behaved constants and $\epsilon$ is an upper bound on the noise.

If we consider the case when $D$ is a Parseval frame and the analysis operator $\tilde{D}$ is its canonical dual frame, then $B\tilde{B} = 1$. If we set $a = 3s$, $b = 12s$, and $\omega = 1$, then we reduce to the non-weighted case and obtain the condition $\delta_{2s} < 0.1398$ given in [45]. However, if we set $\omega = 0.5$, $\rho = 1$, and $\alpha = 0.6$, then we only require $\delta_{2s} < 0.1456$, a weaker condition. Moreover, whenever $\alpha > 0.5$, i.e., whenever the support estimate is more than 50% accurate, we obtain weaker conditions and also a decrease in the error bound for the recovery. This means that whenever one has a reasonably accurate estimate of the support, the weighted analysis formulation is preferred over its non-weighted counterpart.

Proof of Theorem 4.3.1

An important contribution of this thesis is the proof of Theorem 4.3.1. The techniques of the proof are in the spirit of Friedlander et al. [23], Li et al. [45], and Candès and Needell [9]. An important technique that appears in various compressive sensing proofs is the partitioning of the indices of whatever we consider our support set. We follow the partitioning technique of Li et al. [45], for which the following lemma is used.

Lemma 4.1 (The Shifting Inequality). Let $q$ and $r$ be positive integers satisfying $q \leq 3r$.
Then for any non-increasing sequence of real numbers $a_1 \geq \cdots \geq a_r \geq b_1 \geq \cdots \geq b_q \geq c_1 \geq \cdots \geq c_r \geq 0$, we have

$$\sqrt{\sum_{i=1}^q b_i^2 + \sum_{i=1}^r c_i^2} \leq \frac{\sum_{i=1}^r a_i + \sum_{i=1}^q b_i}{\sqrt{q+r}} \qquad (4.15)$$

Without loss of generality, we assume the first $s$ entries of $\tilde{D}^*f$ are the largest in magnitude, so that $T_0 = \{1,\dots,s\}$. Rearranging if necessary, we also assume the entries of $\tilde{D}^*f$ satisfy

$$|\tilde{D}^*f(s+1)| \geq |\tilde{D}^*f(s+2)| \geq \cdots$$

We now partition $T_0^c$ into smaller disjoint subsets. Let $T_1 = \{s+1, s+2, \dots, s+a\}$ and $T_i = \{s+a+(i-2)b+1, \dots, s+a+(i-1)b\}$ for $i = 2, 3, \dots$, with the last subset of size at most $b$, where $a$ and $b$ are positive integers satisfying $0 < b-a \leq 3a$. Further, divide each $T_i$, $i\geq 2$, into two parts:

$$T_{i1} = \{s+a+(i-2)b+1, \dots, s+(i-1)b\}, \qquad T_{i2} = \{s+(i-1)b+1, \dots, s+(i-1)b+a\}$$

We have $|T_{i1}| = b-a$ and $|T_{i2}| = a$ for $i\geq 2$, and notice that $T_{i2} = T_i\setminus T_{i1}$. Denote $T_{01} := T_0\cup T_1$ and let $h = f - \hat{f}$. We have

$$\|h\|_2 = \|D\tilde{D}^*h\|_2 = \|(D_{T_{01}}\tilde{D}_{T_{01}}^* + D_{T_{01}^c}\tilde{D}_{T_{01}^c}^*)h\|_2 = \|D_{T_{01}}\tilde{D}_{T_{01}}^*h + D_{T_{01}^c}\tilde{D}_{T_{01}^c}^*h\|_2$$
$$\leq \|D_{T_{01}}\tilde{D}_{T_{01}}^*h\|_2 + \|D_{T_{01}^c}\tilde{D}_{T_{01}^c}^*h\|_2, \quad \text{by the triangle inequality,}$$
$$\leq \|D_{T_{01}}\tilde{D}_{T_{01}}^*h\|_2 + \|D_{T_{01}^c}\|_2\|\tilde{D}_{T_{01}^c}^*h\|_2 \leq \|D_{T_{01}}\tilde{D}_{T_{01}}^*h\|_2 + \sqrt{B}\|\tilde{D}_{T_{01}^c}^*h\|_2, \quad \text{by (2.1).}$$

In other words,

$$\|h\|_2 \leq \|D_{T_{01}}\tilde{D}_{T_{01}}^*h\|_2 + \sqrt{B}\|\tilde{D}_{T_{01}^c}^*h\|_2 \qquad (4.16)$$

It remains to bound $\|D_{T_{01}}\tilde{D}_{T_{01}}^*h\|_2$ and $\|\tilde{D}_{T_{01}^c}^*h\|_2$.

Lemma 4.2 (Bound on the tail $\|\tilde{D}_{T_{01}^c}^*h\|_2$).

$$\|\tilde{D}_{T_{01}^c}^*h\|_2 \leq \sqrt{\frac{s}{b}}\left(\gamma\sqrt{\tilde{B}}\|h\|_2 + \eta\right) \qquad (4.17)$$

where $\gamma = \omega + (1-\omega)\sqrt{1+\rho-2\alpha\rho}$ and $\eta = \frac{2}{\sqrt{s}}\left(\omega\|\tilde{D}_{T_0^c}^*f\|_1 + (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1\right)$.

Proof. As in Li et al. [45], if $0 < b-a \leq 3a$, then applying the shifting inequality to the vectors $\left[(\tilde{D}_{T_1}^*h)^*, (\tilde{D}_{T_{21}}^*h)^*, (\tilde{D}_{T_{22}}^*h)^*\right]^*$ and $\left[(\tilde{D}_{T_{(i-1)2}}^*h)^*, (\tilde{D}_{T_{i1}}^*h)^*, (\tilde{D}_{T_{i2}}^*h)^*\right]^*$ for $i = 3, 4, \dots$, we have

$$\|\tilde{D}_{T_2}^*h\|_2 \leq \frac{\|\tilde{D}_{T_1}^*h\|_1 + \|\tilde{D}_{T_{21}}^*h\|_1}{\sqrt{b}}, \qquad \|\tilde{D}_{T_i}^*h\|_2 \leq \frac{\|\tilde{D}_{T_{(i-1)2}}^*h\|_1 + \|\tilde{D}_{T_{i1}}^*h\|_1}{\sqrt{b}}, \quad i = 3, 4, \dots
$$

Using the above, together with the fact that $\sqrt{u^2+v^2} \leq u+v$ for $u, v \geq 0$, it follows that

$$\|\tilde{D}_{T_{01}^c}^*h\|_2 \leq \sum_{i\geq 2}\|\tilde{D}_{T_i}^*h\|_2 \leq \frac{\|\tilde{D}_{T_0^c}^*h\|_1}{\sqrt{b}}$$

We now need to bound $\|\tilde{D}_{T_0^c}^*h\|_1$, where $h = \hat{f} - f$. Here we follow the strategy of Friedlander et al. [23]. By the choice of weights, and since $f$ and $\hat{f}$ are both feasible while $\hat{f}$ is the minimizer, we have

$$\|\tilde{D}^*\hat{f}\|_{1,\omega} \leq \|\tilde{D}^*f\|_{1,\omega} \iff \omega\|\tilde{D}_{\tilde{T}}^*\hat{f}\|_1 + \|\tilde{D}_{\tilde{T}^c}^*\hat{f}\|_1 \leq \omega\|\tilde{D}_{\tilde{T}}^*f\|_1 + \|\tilde{D}_{\tilde{T}^c}^*f\|_1$$

Expanding further,

$$\omega\left(\|\tilde{D}_{\tilde{T}\cap T_0}^*\hat{f}\|_1 + \|\tilde{D}_{\tilde{T}\cap T_0^c}^*\hat{f}\|_1\right) + \|\tilde{D}_{\tilde{T}^c\cap T_0}^*\hat{f}\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*\hat{f}\|_1 \leq \omega\left(\|\tilde{D}_{\tilde{T}\cap T_0}^*f\|_1 + \|\tilde{D}_{\tilde{T}\cap T_0^c}^*f\|_1\right) + \|\tilde{D}_{\tilde{T}^c\cap T_0}^*f\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1$$

Using the reverse triangle inequality and $\hat{f} = f + h$,

$$\omega\|\tilde{D}_{\tilde{T}\cap T_0^c}^*h\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*h\|_1 \leq \omega\|\tilde{D}_{\tilde{T}\cap T_0}^*h\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0}^*h\|_1 + 2\left(\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1 + \omega\|\tilde{D}_{\tilde{T}\cap T_0^c}^*f\|_1\right) \qquad (4.18)$$

On the left-hand side of (4.18) we add and subtract $\omega\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*h\|_1$. Similarly, we add and subtract $\omega\|\tilde{D}_{\tilde{T}^c\cap T_0}^*h\|_1$ and $2\omega\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1$ on the right-hand side of (4.18). This, along with the facts that

1. $\|\tilde{D}_{T_0^c}^*h\|_1 = \|\tilde{D}_{\tilde{T}\cap T_0^c}^*h\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*h\|_1$
2. $\|\tilde{D}_{T_0^c}^*f\|_1 = \|\tilde{D}_{\tilde{T}\cap T_0^c}^*f\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1$
3. $\|\tilde{D}_{T_0}^*h\|_1 = \|\tilde{D}_{\tilde{T}\cap T_0}^*h\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0}^*h\|_1$

yields

$$\omega\|\tilde{D}_{T_0^c}^*h\|_1 + (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*h\|_1 \leq \omega\|\tilde{D}_{T_0}^*h\|_1 + (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0}^*h\|_1 + 2\left(\omega\|\tilde{D}_{T_0^c}^*f\|_1 + (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*f\|_1\right)$$

We also have $\omega\|\tilde{D}_{T_0^c}^*h\|_1 = \|\tilde{D}_{T_0^c}^*h\|_1 - (1-\omega)\|\tilde{D}_{\tilde{T}^c\cap T_0^c}^*h\|_1 - (1-\omega)\|\tilde{D}_{\tilde{T}\cap T_0^c}^*h\|_1$, so that

$$\|\tilde{D}_{T_0^c}^*h\|_1 \leq (1-\omega)\left(\|\tilde{D}_{\tilde{T}\cap T_0^c}^*h\|_1 + \|\tilde{D}_{\tilde{T}^c\cap T_0}^*h\|_1\right) + \omega\|\tilde{D}_{T_0}^*h\|_1$$
Main Theoretical Contributions+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1)Recall, T˜α = T0 ∩ T˜ and noticing that (T˜ ∩ T c0 ) ∪ (T˜c ∩ T0) = (T0 ∪ T˜ )\T˜αwe arrive at the following:||D˜∗T c0h||1 ≤ ω||D˜∗T0h||1 + (1− ω)||D˜∗T0∪T˜\Tαh||1+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1)Then we have||D˜∗T c01h||2 ≤1√b(ω||D˜∗T0h||1 + (1− ω)||D˜∗T0∪T˜\Tαh||1+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1))≤1√b(ω√s||D˜∗T0h||2 + (1− ω)√s(1 + ρ− 2αρ)||D˜∗T0∪T˜\Tαh||2+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1))≤1√b(ω√s||D˜∗h||2 + (1− ω)√s(1 + ρ− 2αρ)||D˜∗h||2+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1))≤1√b(ω√s√B˜||h||2 + (1− ω)√s(1 + ρ− 2αρ)√B˜||h||2+ 2(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1))=√sb(γ√B˜||h||2 + η)where γ = ω+(1−ω)√1 + ρ+ 2αρ and η = 2√s(ω||D˜∗T c0 f ||1 + (1− ω)||D˜∗T˜ c∩T c0f ||1)354.3. Main Theoretical ContributionsLemma 4.3. ||DT01D˜∗T01h||2 satisfies2 ≥√1− δs+a||ΦDT01D˜∗T01h||2−√(1 + δb)B(√sb(γ√B˜||h||2 + η))Proof. Recall that h = f − fˆ and since fˆ is feasible we have2 ≥ ||Φh||2= ||ΦDD˜∗h||2= ||ΦDT01D˜∗T01h+ ΦDT c01D˜∗T c01h||2≥ ||ΦDT01D˜∗T01h||2 − ||ΦDT c01D˜∗T c01h||2≥√1− δs+a||DT01D˜∗T01h||2 − ||Φ(DT2D˜∗T2 +DT3D˜∗T3 + . . . )h||2≥√1− δs+a||DT01D˜∗T01h||2 −(||ΦDT2D˜∗T2h||2 + ||ΦDT3D˜∗T3h||2 + . . .)=√1− δs+a||DT01D˜∗T01h||2 −∑j≥2||ΦDTjD˜∗Tjh||2≥√1− δs+a||DT01D˜∗T01h||2 −√1 + δb∑j≥2||DTjD˜∗Tjh||2≥√1− δs+a||DT01D˜∗T01h||2 −√1 + δb∑j≥2||DTj ||2||D˜∗Tjh||2≥√1− δs+a||DT01D˜∗T01h||2 −√(1 + δb)B∑j≥2||D˜∗Tjh||2≥√1− δs+a||DT01D˜∗T01h||2 −√(1 + δb)B(√sb(γ√B˜||h||2 + η))We are now in a position to prove the theorem. We have (4.16), i.e.,||h||2 ≤ ||DT01D˜∗T01h||2 +√B||D˜∗T c01h||2and applying Lemma 4.2 and Lemma 4.3 we obtainK1||h||2 ≤ 2+K2η (4.19)where364.3. 
Main Theoretical ContributionsK1 =√1− δs+a − η√sBB˜(1− δs+a)b+√sBB˜(1 + δb)bK2 =√sB(1− δs+1)b+√sB(1 + δb)bWe require K1 to be positive, in which case1− γ√sBB˜b2δs+a +sBB˜bδb < 1− 2γ√sBB˜b(4.20)and this proves our theorem.4.3.3 Weighted Optimal Dual AnalysisRecall the optimal-dual `1 analysis formulation (3.14). We consider a weightedversion of this problem:fˆ = arg minf˜∈Rd,DD˜∗=I||D˜∗f˜ ||1,w s.t. ||y − Φf˜ ||2 ≤  (4.21)As a similar derivation as in the non-weighted case will show, we refor-mulate this problem as follows:fˆ = arg minf˜∈Rd,g∈RN||D¯∗f˜ + Pg||1,w s.t. ||y − Φf˜ ||2 ≤  (4.22)In the previous section, we gave the results from [44] that stated optimaldual-based `1-analysis and synthesis are equivalent. We extend their proofto the weighted case.374.3. Main Theoretical ContributionsTheorem 4.3.2. Weighted `1-synthesis and weighted optimal-dual-based `1-analysis are equivalent.Proof. First note that we can put the weighted-optimal-dual-based `1-analysisformulation (4.22) into matrix notationfˆ = arg minf˜∈Rd,g∈RN||D¯∗f˜ + Pg||1,w s.t. ||y − Φf˜ ||2 ≤  (4.23)= arg minf˜∈Rd,g∈RN||W(D¯∗f˜ + Pg)||1 s.t. ||y − Φf˜ ||2 ≤  (4.24)where W is a diagonal matrix whose entries are ω or 1 on the diagonal, and0 otherwise. Following the proof of Theorem 3.3.1, start with the weighted-optimal-dual-based `1-analysis formulation (4.24). Let x˜ = D¯∗f˜ + Pg, thenwe haveDx˜ = DD¯∗f˜ +DPg= If˜ + 0= f˜Since the columns of [D¯∗, P ] span the whole N -dimensional space and bothf˜ and g are free, then any x˜ ∈ RN can be put in this form. We then rewrite(4.24),fˆ = D arg minx˜∈RN||Wx˜||1 subject to ||ΦDx˜− y||2 ≤  (4.25)Conversely, starting with (4.25) we have that for any x ∈ RN , the followingdecomposition always holdsx˜ = x˜R + x˜N = D∗(DD∗)−1Dx˜+ Px˜= D¯∗Dx˜+ Px˜where x˜R and x˜N are the components of x˜ belonging to the row space andnull space of D, respectively. By definition of weighted-`1-synthesis, we havef˜ = Dx˜ ∈ Rd. 
Again, since $g$ is free, letting $g = \tilde{x} \in \mathbb{R}^N$ we arrive at the weighted optimal-dual-based $\ell_1$-analysis formulation. $\square$

Recovery Conditions for Weighted $\ell_1$-Synthesis via Weighted $\ell_1$-Analysis

As a direct consequence of Theorem 4.3.1 and Theorem 4.3.2, we have a theorem for weighted $\ell_1$-synthesis that holds even when using highly coherent dictionaries, obtained by simply replacing the general dual with an optimal dual. This is analogous to the non-weighted work of Li et al [44].

Theorem 4.3.3. Consider a signal $f = Dx$, where $x \in \mathbb{R}^N$ and $D \in \mathbb{R}^{d\times N}$ is an arbitrary frame of $\mathbb{R}^d$ with frame bounds $0 < A \le B < \infty$, and suppose we have noisy linear measurements
\[
y = \Phi Dx + z. \tag{4.26}
\]
Let $\tilde{D}_o$ be an optimal dual frame of $D$ with frame bounds $0 < \tilde{A}_o \le \tilde{B}_o < \infty$. Let $\tilde{T} \subset \{1, \dots, N\}$ be an arbitrary set. Let $T_0$ index the support of the best $s$-term approximation of $\tilde{D}_o^* f$, so that $|T_0| = s$. Define $\rho$ and $\alpha$ such that $|\tilde{T}| = \rho s$ and $|\tilde{T}\cap T_0| = \alpha\rho s$. Suppose $\Phi$ has the $D$-RIP with constants satisfying
\[
\left(1-\gamma\sqrt{\frac{sB\tilde{B}_o}{b}}\right)^{\!2}\delta_{s+a} + \gamma^2\,\frac{sB\tilde{B}_o}{b}\,\delta_b < 1 - 2\gamma\sqrt{\frac{sB\tilde{B}_o}{b}} \tag{4.27}
\]
for some positive integers $a$ and $b$ satisfying $0 < b-a \le 3a$, where $\gamma = \omega + (1-\omega)\sqrt{1+\rho-2\alpha\rho}$ and $0 \le \omega \le 1$, with our weights chosen as follows:
\[
w_i = \begin{cases} 1 & \text{if } i \in \tilde{T}^c\\ \omega & \text{if } i \in \tilde{T} \end{cases} \tag{4.28}
\]
where $\tilde{T}$ is a support estimate of $\tilde{D}_o^* f$. Then the solution $\hat{f}$ to (4.21) (or to (4.1) after reconstruction via $D$) satisfies
\[
\|f - \hat{f}\|_2 \le c_1\left(\omega\|(\tilde{D}_o)^*_{T_0^c} f\|_1 + (1-\omega)\|(\tilde{D}_o)^*_{\tilde{T}^c\cap T_0^c} f\|_1\right) + c_2\epsilon \tag{4.29}
\]
where $c_1$ and $c_2$ are some well-behaved constants and $\epsilon$ is an upper bound on the noise.

Chapter 5
Synthetic Experiments

5.1 Optimal Dual $\ell_1$-Analysis

5.1.1 Weighted vs Non-Weighted

Recall that optimal-dual $\ell_1$-analysis can be formulated as follows:
\[
\hat{f} = \arg\min_{f\in\mathbb{R}^d,\, g\in\mathbb{R}^N} \|\bar{D}^*f + Pg\|_1 \quad \text{s.t.} \quad \|y-\Phi f\|_2 \le \epsilon \tag{5.1}
\]
where $P$ is the orthogonal projection onto the null space of $D$. In this section we briefly outline an iterative algorithm developed by Li et al [45], based on the split Bregman iteration, to solve (5.1).
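The core nonlinearity in the split Bregman iterations that follow is the component-wise soft-thresholding (shrink) operator; the weighted variant simply applies a per-coefficient threshold $w_i/\lambda$ instead of the uniform $1/\lambda$. A minimal NumPy sketch (function names are ours, for illustration only, not the thesis code):

```python
import numpy as np

def shrink(x, tau):
    """Component-wise soft thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def weighted_shrink(x, w, lam):
    """Weighted variant: per-coefficient thresholds w_i / lam.

    Entries on the support estimate carry the smaller weight omega,
    so they are thresholded (penalized) less."""
    return shrink(x, np.asarray(w) / lam)
```

With $w \equiv 1$ the weighted operator reduces to the ordinary shrink, so the weighted algorithm contains the unweighted one as a special case.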
Here is the algorithm they derived based on the split Bregman iteration [27]:

Algorithm 3 Split Bregman iteration for optimal dual analysis
Initialization: $f^0 = 0$, $d^0 = b^0 = Pg^0 = 0$, $c^0 = 0$, $\mu > 0$, $\lambda > 0$, nOuter, nInner, tol;
while $k <$ nOuter and $\|\Phi f^k - y\|_2 >$ tol do
    for $n = 1$ : nInner do
        $f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^k - Pg^k - b^k)\right]$
        $d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{k+1} + Pg^k + b^k,\ \frac{1}{\lambda})$
        $Pg^{k+1} = P(d^{k+1} - \bar{D}^*f^{k+1} - b^k)$
        $b^{k+1} = b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1})$
    end for
    $c^{k+1} = c^k + (\Phi f^{k+1} - y)$
    Increase $k$
end while

As a numerical contribution of this thesis, we adapt this algorithm to the weighted version of (5.1), namely (4.24). Following the derivation of Li et al [45], this modification merely introduces weights into the soft-thresholding operator shrink of Algorithm 3.

Algorithm 4 Weighted split Bregman iteration for optimal dual analysis
Initialization: $f^0 = 0$, $d^0 = b^0 = Pg^0 = 0$, $c^0 = 0$, $\mu > 0$, $\lambda > 0$, nOuter, nInner, tol;
while $k <$ nOuter and $\|\Phi f^k - y\|_2 >$ tol do
    for $n = 1$ : nInner do
        $f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^k - Pg^k - b^k)\right]$
        $d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{k+1} + Pg^k + b^k,\ \frac{w}{\lambda})$, i.e., weighted shrink
        $Pg^{k+1} = P(d^{k+1} - \bar{D}^*f^{k+1} - b^k)$
        $b^{k+1} = b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1})$
    end for
    $c^{k+1} = c^k + (\Phi f^{k+1} - y)$
    Increase $k$
end while

This form of weighting is not new, though. Indeed, it has been examined in papers such as Asif and Romberg [3] and Bioucas-Dias et al [6], and, in the context of approximate message passing for the purposes of compressed sensing, by Navid Ghadermarzy in his thesis [24]. This form of weighting also comes up in the weighted SPGL1 algorithm [66].

Numerical Results

We now look at a numerical example. Consider the frame $D \in \mathbb{C}^{n\times 2n}$ made of the concatenation of the Fourier basis and the identity, i.e., $D = [F, I]/\sqrt{2}$, where $F \in \mathbb{C}^{n\times n}$ is the normalized discrete Fourier transform matrix and $I \in \mathbb{R}^{n\times n}$ is the identity. Letting $n = 128$, we construct signals $f = Dx$ where $x$ is 20-sparse with respect to $D$.
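This construction is easy to verify numerically: since the normalized DFT matrix is unitary, $DD^* = (FF^* + I)/2 = I$, so the concatenation is a Parseval (tight) frame. A small sketch (the particular sparse-signal draw is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # normalized (unitary) DFT matrix
D = np.hstack([F, np.eye(n)]) / np.sqrt(2)    # D = [F, I]/sqrt(2), in C^{n x 2n}

# D D* = (F F* + I)/2 = I: the concatenation is a Parseval (tight) frame.
assert np.allclose(D @ D.conj().T, np.eye(n))

# A signal f = Dx that is 20-sparse with respect to D:
x = np.zeros(2 * n, dtype=complex)
support = rng.choice(2 * n, 20, replace=False)
x[support] = rng.standard_normal(20)
f = D @ x
```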
We will collect 64 Gaussian measurements of the form
\[
y = ADx
\]
where $A$ is a standard normal Gaussian matrix. We now wish to compare weighted and non-weighted optimal dual $\ell_1$-analysis using the split Bregman algorithm. For this experiment we will suppose that we know a priori that the signal is 20-sparse. If we indicate that a weighting strategy is 50% accurate, that means we assign a weight to 20 indices, half of which are correct: 10 indices are correctly identified and 10 are not. The same logic applies to the 25% and 75% accuracy levels.

Figure 5.1: Average relative error of 10 reconstructed signals at a given outer iteration of the split Bregman algorithm (relative error vs. number of outer iterations; curves: no weighting, 50 percent accurate, 25 percent accurate).

Figure 5.2: Measurements are now corrupted by a small amount of white noise $\eta$, i.e., $y = \Phi ADx + \eta$.

5.1.2 Iterative Reweighting

In the case where we have no support estimate, we can still use weighting as a way of accelerating convergence of the split Bregman algorithm, by using the support of the current iterate as an estimate of the support of the actual signal, as in iterative reweighting for analysis and synthesis.

Algorithm 5 Iterative reweighting split Bregman iteration for optimal dual analysis
Initialization: $f^0 = 0$, $d^0 = b^0 = Pg^0 = 0$, $c^0 = 0$, $\mu > 0$, $\lambda > 0$, nOuter, nInner, tol, weights $= 1$;
while $k <$ nOuter and $\|\Phi f^k - y\|_2 >$ tol do
    for $n = 1$ : nInner do
        $f^{k+1} = (\mu\Phi^*\Phi + \lambda\bar{D}\bar{D}^*)^{-1}\left[\mu\Phi^*(y - c^k) + \lambda\bar{D}(d^k - Pg^k - b^k)\right]$
        $d^{k+1} = \mathrm{shrink}(\bar{D}^*f^{k+1} + Pg^k + b^k,\ \frac{w}{\lambda})$, i.e., weighted shrink
        $Pg^{k+1} = P(d^{k+1} - \bar{D}^*f^{k+1} - b^k)$
        $b^{k+1} = b^k + (\bar{D}^*f^{k+1} + Pg^{k+1} - d^{k+1})$
        Update the weights $\frac{w}{\lambda}$ based on the largest-magnitude coefficients of $d^{k+1}$.
    end for
    $c^{k+1} = c^k + (\Phi f^{k+1} - y)$
    Increase $k$
end while

Figure 5.3: Iterative reweighting with no initial support estimate vs. no weighting for the noisy case (relative error vs. number of outer iterations). We update our weights by calculating the coefficients of $d^{k+1}$ which contribute 90% of $d^{k+1}$'s cumulative energy.

5.1.3 The Optimal Dual Frame Bounds

As stated before, we cannot know the frame bounds of the optimal dual a priori: obtaining them requires actually calculating the optimal dual by solving an optimization problem. This poses a serious problem theoretically, since we have recovery conditions which depend on the upper frame bound of the optimal dual. Here we wish to provide some empirical evidence that, for a particular frame, the optimal dual upper frame bound is small and, in some sense, largely independent of the signal.

Define the redundancy of a frame $D \in \mathbb{R}^{d\times N}$ as $\kappa = \frac{N}{d}$. Here we wish to provide evidence that a particular frame has the property $\tilde{B}_o \approx \sqrt{\kappa}$. As in the previous section, we construct a set of signals and apply the split Bregman algorithm; here, however, we calculate the optimal dual as a byproduct of the algorithm and compute its upper frame bound.

Figure 5.4: Average upper frame bound of the optimal dual reconstructed from signals of varying sparsity, compared with $\sqrt{\kappa}$.

We will now take the example of a partial Fourier matrix, i.e., take the $N \times N$ discrete Fourier transform matrix and keep only $d$ rows.
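For a concrete frame, frame bounds are just the extreme eigenvalues of $DD^*$. The sketch below computes them for a partial Fourier matrix, along with the redundancy $\kappa$; note this checks only the frame bounds of $D$ itself, while the optimal dual bound $\tilde{B}_o$ must still be obtained from the algorithm as described above:

```python
import numpy as np

N, d = 64, 16
rng = np.random.default_rng(1)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)        # unitary DFT matrix
rows = np.sort(rng.choice(N, d, replace=False))
D = F[rows, :]                                 # partial Fourier frame for C^d

kappa = N / d                                  # redundancy
eigs = np.linalg.eigvalsh(D @ D.conj().T)      # frame bounds = extreme eigenvalues of D D*
A, B = eigs.min(), eigs.max()

# Rows of a unitary matrix are orthonormal, so the partial Fourier
# matrix is a tight frame with A = B = 1.
assert np.isclose(A, 1.0) and np.isclose(B, 1.0)
```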
For a variety of $d$, we calculate the upper frame bound of the optimal dual computed in the experiment and plot it in Figure 5.4 for various sparsity levels. We now have a reasonable expectation that $\tilde{B}_o$ is small, which is an important theoretical aspect of the recovery conditions for optimal dual $\ell_1$-analysis, and thus for synthesis.

Chapter 6
Seismic Wavefield Reconstruction

6.1 Introduction

Reflection seismology is a technique used by the oil industry for hydrocarbon reservoir exploration, to explore and identify potential oil-rich areas in the earth. This technique involves propagating energy down into the earth and then collecting the reflected energy via some measurement device.

The propagated energy is generated by a given source. This source sends out a complicated and largely unknown waveform, which attenuates, reflects, transmits, changes modes, and scatters about while a set of receivers records what comes their way. On land, the source is typically either a Vibroseis truck or dynamite; in marine environments, airguns are used. On land, this energy is recorded via geophone receivers, whereas in marine environments hydrophone receivers are used. The recorded information gives an indirect measurement of the earth's structure. This recorded information is then processed to create an image of the subsurface.

The procedure of processing acquired data to develop an image of the subsurface can be described in a workflow chart. In Figure 6.1 we see a basic geophysical workflow. Though certainly non-exhaustive, these are some of the main movements from acquisition to interpretation of seismic data.

Figure 6.1: Basic seismic processing workflow. M.V.A. stands for Migration Velocity Analysis and C.I.G. stands for Common Image Gathers.

Let us briefly outline some of the key components in the geophysical workflow and note recent progress with respect to sparse approximation and compressive sampling, leaving regularization for the following section, as it is the main objective of this chapter.

1. Acquisition: The collection of raw seismic data, where the objective is to probe the unknown geological subsurface. As stated above, this is done by sending seismic waves through the earth's surface from a source. The seismic waves are then collected by the receivers. Leveraging principles of compressive sensing for seismic acquisition, Herrmann and Wason investigated randomization for ocean bottom sensors [68]. In another instance, Mansour et al [49] propose a practical randomized marine acquisition scheme in which the sequential sources fire airguns at only randomly time-dithered instances.

2. Multiples: When a seismic signal makes one reflection before being recorded by a receiver, we call this a primary reflection. On the other hand, when a seismic signal bounces more than once, we call this a multiple reflection. Multiples can be characterized in a variety of ways, e.g., near-surface, interformational, and peg-leg [55]. Herrmann et al [34] exploit curvelet-domain sparsity of seismic data and propose a data-adaptive method that corrects amplitude errors, which vary smoothly as a function of location, scale (frequency band), and angle.

3. Deconvolution: The premise of deconvolution is that seismic traces are the convolution of a reflectivity series with a seismic wavelet. Thus the goal of deconvolution is to separate a seismic trace into its constituent parts. For an investigation of deconvolution using advances in sparse inversion, see Herrmann and Lin [42]. Gholami and Sacchi use sparsity in a wavelet domain as a source prior for the purposes of blind seismic deconvolution [25].

4. Migration: Also referred to as event movement, the goal of migration is to give a true representation of the subsurface by converting from time to depth, e.g., moving primary reflections to their true spatial locations. The following papers all pose this problem as a sparsity-promoting formulation [31, 41, 62, 63].

What is not mentioned in the workflow is the fact that a large majority of the algorithms existing in seismic processing rely on the data being given on a regular grid; we investigate this now.

6.2 Seismic Data Interpolation

In the remainder of this chapter, we will focus entirely on the regularization portion of the workflow. By regularization, we mean the process of taking seismic data on an irregular grid (e.g., Figure 6.2) and projecting it onto a regularly sampled grid (e.g., Figure 6.3). The fact that acquired data lies on an irregular grid is a byproduct of acquisition constraints, which are various in nature. Some physical constraints might be nearby oil wells, weather, or other seismic vessels. Even if ocean bottom sensors are used, in which case placement of the node is more accurate, the ocean bottom could be jagged or sloped, causing further grief.

Figure 6.2: An example of a top-down view of a seismic acquisition geometry provided by Total E & P USA. Each line represents a set of seismic receivers which are trailing a seismic vessel.

Figure 6.3: A regular grid overlaying the seismic acquisition. Each intersection of the grid corresponds to a physical location for which we require information.

There are a variety of strategies available to accomplish the process of projecting data onto a regular grid. For example, one might take nearest-receiver approximations or a weighted average of neighboring receivers to estimate what a receiver at a target location would record. An issue with these methods is how to choose a threshold for what is considered a close neighbor.
If the threshold is taken too small, no interpolation will take place and the problem of missing receivers remains; certainly those grid points located in the gaps shown in Figure 6.4 will be missed by a nearest-neighbour algorithm. If the threshold is chosen too high, the accuracy of the recovery will be sacrificed.

Figure 6.4: A regular grid overlaying the seismic acquisition. The circled areas represent regions for which there are no receivers collecting information.

Reconstruction methods can be divided into two main classes: wave-equation based and signal-processing based. The majority of methods in the latter category utilize transform domains [50]. These transforms include the Fourier [50], [43], Radon [60], and Curvelet [30] transforms. Transform-based methods can be easily adapted to irregularly sampled data if the transform has a non-uniform counterpart.

6.3 Seismic Data Interpolation by Sparse Recovery

This section considers the methodology conducted in [48], however with the inclusion of the analysis formulation of the sparse recovery problem. Consider a seismic line with $N_s$ sources, $N_r$ receivers, and $N_t$ time samples; thus there are $N = N_sN_rN_t$ data points under consideration. We assume that all sources see the same receivers. These $N$ points can be represented in a 3-D cube as in Figure 6.6. In this particular experiment, the seismic line at full resolution has $N_s = 178$ sources and $N_r = 178$ receivers with a sample distance of 12.5 meters, and $N_t = 500$ time samples acquired with a sampling interval of 4 milliseconds.

Figure 6.5: A seismic acquisition from the Gulf of Mexico expressed as a 3D volume; here we show several time slices of the volume.

We consider the scenario in which each source is randomly missing 50% of its receivers. This is illustrated by removing a single shot gather, corresponding to a slice in the 3D cube as represented in Figure 6.7, with 50% of its traces missing as shown in Figure 6.8. The problem is then interpolating those randomly missing traces onto a complete periodic grid.

Figure 6.6: A seismic acquisition from the Gulf of Mexico expressed as a 3D volume; here we show several common shot gathers.

Expressing the shot gather as a single $N$-dimensional vector, our goal then is to recover an approximation $\hat{f}$ of the discretized wavefield $f$ from the measurements
\[
b = RMf \tag{6.1}
\]

Figure 6.7: A common shot gather sliced from the 3D cube (receiver vs. time sample).

Figure 6.8: The shot gather sliced from the 3D cube in Figure 6.7 with randomly missing receivers.

where $RM$ is a sampling operator composed of the product of a restriction matrix $R$ with a measurement basis matrix $M$. The measurement matrix $M$ represents the basis in which the measurements are taken and corresponds to the Dirac (identity) basis in the missing-trace interpolation problem. The synthesis formulation attempts to find a sparse representation of the wavefield in a transform domain whose inverse transform coefficients fit the acquired data:
\[
\hat{f}_{\text{synthesis}} = D \cdot \arg\min_{\tilde{x}} \|\tilde{x}\|_1 \quad \text{subject to} \quad \|RMD\tilde{x} - b\|_2 \le \epsilon \tag{6.2}
\]
One issue with this approach is that the coefficients recovered may not be the forward transform of an actual wavefield, i.e., those coefficients may not represent an actual physical phenomenon. On the other hand, the traditional analysis formulation attempts to directly find a wavefield which fits the acquired data and whose forward transform coefficients are sparse:
\[
\hat{f}_{\text{analysis}} = \arg\min_{\tilde{f}} \|D^{\dagger}\tilde{f}\|_1 \quad \text{subject to} \quad \|RM\tilde{f} - b\|_2 \le \epsilon \tag{6.3}
\]
where $D^{\dagger}$ is the pseudoinverse (canonical dual) of $D$, i.e., $D^{\dagger} = \bar{D}^*$. For the Curvelet transform this is simply the transpose, for which fast matrix-vector products exist.
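The sampling operator in (6.1) is just a restriction composed with the identity measurement basis; a toy sketch for a 1-D "trace" vector (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
f = rng.standard_normal(N)                  # discretized wavefield (toy)

keep = np.sort(rng.choice(N, N // 2, replace=False))
R = np.eye(N)[keep, :]                      # restriction to the observed traces
M = np.eye(N)                               # Dirac (identity) measurement basis

b = R @ M @ f                               # acquired data b = RMf, as in (6.1)
assert b.shape == (N // 2,) and np.allclose(b, f[keep])
```

Recovering $\hat f$ then amounts to solving (6.2) or (6.3) with $RM$ as the measurement operator.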
Note here that it would likely be difficult to find a general dual frame for the Curvelet frame for which there still exists a fast implementation, which is the reason we only consider the canonical dual.

6.3.1 Incorporating Weights

As in [48], we incorporate weighting by performing the sparse recovery in the frequency domain. Rather than consider the wavefield as an $N = N_sN_rN_t$-dimensional vector, we take the Fourier transform of the data volume in the time domain and partition the volume into frequency slices. Each frequency slice is randomly missing 50% of its traces, and we wish to recover those slices. An example of this is shown in Figure 6.9 and Figure 6.10. After all the frequency slices have been recovered, one simply takes the inverse Fourier transform to see the recovered time-domain solution.

We begin with no initial support estimate and then, sweeping from low frequencies to high frequencies, develop a support estimate, using the support of the current frequency slice as an estimate for the next frequency slice, as illustrated in Figure 6.11. The continuity of wavefronts across adjacent partitions of a seismic line manifests itself as a high correlation in the support of the Curvelet coefficients of the partitions, and thus the support of adjacent solutions will overlap.

Figure 6.9: A frequency slice from the 3D cube (source number vs. receiver number).

Figure 6.10: The frequency slice from the 3D cube in Figure 6.9 with randomly missing receivers.

Figure 6.11: Illustration of the weighting strategy (s: source, r: receiver, f: frequency). One begins by recovering low-frequency slices, which are hopefully non-aliased, and then uses those recovered slices to generate weights for recovering higher-frequency slices.
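The frequency sweep just described can be sketched as follows. The helper names and the 10% support-size heuristic are ours, chosen only for illustration; the weights follow the pattern of (4.28), with $\omega$ on the support estimate and 1 elsewhere:

```python
import numpy as np

def weights_from_support(support, n, omega=0.3):
    """Weights as in (4.28): omega on the support estimate, 1 elsewhere."""
    w = np.ones(n)
    w[np.asarray(support, dtype=int)] = omega
    return w

def frequency_sweep_weights(recovered_slices, keep_frac=0.1, omega=0.3):
    """Sweep low to high frequency: the largest-magnitude coefficients of
    the previously recovered slice give the support estimate for the next."""
    n = recovered_slices[0].size
    k = max(1, int(keep_frac * n))
    all_weights = [np.ones(n)]                  # no estimate for the first slice
    for c in recovered_slices[:-1]:
        support = np.argsort(np.abs(c))[-k:]    # largest-magnitude entries
        all_weights.append(weights_from_support(support, n, omega))
    return all_weights
```

In practice each `recovered_slices[i]` would hold transform coefficients of one recovered frequency slice, and the corresponding weight vector would be passed to the weighted solver for the next slice.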
6.4 Gulf of Mexico Experiment Results

The experiment conducted here is exactly the same as described in [48], except that we consider both the analysis and synthesis approaches for solving the associated optimization problem. Note that if one acquires seismic data and then applies a nearest-neighbor algorithm to approximate missing receivers, it is a natural problem to consider what happens when the nearest-neighbor algorithm misses some traces. Also, this scenario is certainly a reality if a jittered-sampling strategy is employed during acquisition. There are several benefits to applying such a strategy; for details the reader is referred to [57] and [30].

Figure 6.12: Seismic wavefield reconstruction of the Gulf of Mexico data using analysis and synthesis with various weighting techniques (SNR vs. source number). New weighting analysis refers to (4.8); the others are the traditional weighting techniques.

It is clear that the non-weighted analysis and synthesis formulations perform worse than their weighted counterparts. An interesting observation is that non-weighted analysis only outperforms non-weighted synthesis in some shot ranges and vice versa, a phenomenon that does not occur to the same degree in the weighted methods.
Figure 6.13: Recovered Synthesis
Figure 6.14: Difference plot
Figure 6.15: Recovered Analysis
Figure 6.16: Difference plot
(All panels show receiver vs. time sample.)

Note that, upon inspecting Figure 6.12, the novel weighted method for analysis from Section 4.3.1 of Chapter 4 has a larger SNR than all other methods investigated for the majority of shots. It is only in the shot range 80-100 that weighted synthesis performs slightly better. However, at the lowest-SNR point of the shot range 140-160, the novel weighted method for analysis has at least a 2 dB increase over its closest competitor. Figures 6.13-6.22 show the recovery of the various algorithms for the 150th shot in the acquisition.

Figure 6.17: Recovered Weighted Synthesis
Figure 6.18: Difference plot
Figure 6.19: Recovered Weighted Analysis
Figure 6.20: Difference plot
Figure 6.21: Recovered New Weighted Analysis
Figure 6.22: Difference plot

6.4.1 Computational Considerations

It should be noted that [48] uses SPGL1 [66] to solve the optimization problem. Unfortunately, SPGL1 cannot solve the analysis formulation of the basis pursuit problem.
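The SNR values quoted above and in Figure 6.12 are reconstruction signal-to-noise ratios; assuming the standard convention (an assumption on our part, as the thesis does not restate the formula here), the computation is a one-liner:

```python
import numpy as np

def snr_db(f, f_hat):
    """Reconstruction SNR in dB: 20 log10( ||f|| / ||f - f_hat|| )."""
    return 20.0 * np.log10(np.linalg.norm(f) / np.linalg.norm(f - f_hat))

f = np.ones(100)
assert np.isclose(snr_db(f, 0.9 * f), 20.0)
```

Under this convention, a 2 dB gain corresponds to roughly a 20% reduction in relative reconstruction error.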
A strength of NESTA [5] is that it can solve both the synthesis and analysis formulations; thus, for a fair comparison, we have opted for NESTA.

One of the main goals of using compressive sensing in a seismic setting is to drive down the expense of computation. Calculating pseudoinverses is an expensive computation, and the new weighting method considered here has to recalculate how the pseudoinverse acts on a vector at every iteration. This overhead might be too expensive for only a 1 dB increase in the SNR for the majority of the sources.

6.4.2 Conclusions and Future Work

We applied the method of [48], i.e., weighted one-norm minimization in a framework for the recovery of missing traces caused by physical obstacles or deliberate subsampling in otherwise regularly sampled seismic wavefields, to the analysis and synthesis formulations of the seismic interpolation problem, along with their weighted extensions discussed in this thesis. Weighted one-norm minimization exploits correlations between the locations of significant coefficients of different partitions. Though we only considered source-receiver gathers in this experiment, it was shown in [48] that converting the data to time-midpoint and midpoint-offset coordinates increases the SNR of the recovery for synthesis and weighted synthesis. We expect the same results for analysis, weighted analysis, and the novel weighted analysis method discussed in this thesis, which we leave for future work.

Bibliography

[1] A. Aravkin, M. P. Friedlander, F. Herrmann, and T. van Leeuwen. Robust inversion, dimensionality reduction, and randomized sampling. Mathematical Programming, 134(1):101-125, 2012.

[2] Aleksandr Y. Aravkin, Michael P. Friedlander, and Tristan van Leeuwen. Robust inversion via semistochastic dimensionality reduction. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 5245-5248, March 2012.

[3] Muhammad Salman Asif and Justin K. Romberg.
Fast and accurate algorithms for re-weighted l1-norm minimization. CoRR, abs/1208.0651, 2012.

[4] Christopher A. Baker. A note on sparsification by frames. CoRR, abs/1308.5249, 2013.

[5] S. Becker, J. Bobin, and E. Candès. NESTA: A fast and accurate first-order method for sparse recovery. ArXiv e-prints, April 2009.

[6] Jose M. Bioucas-Dias and Mario A. T. Figueiredo. A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12):2992-3004, 2007.

[7] T. Tony Cai and Anru Zhang. Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. CoRR, abs/1306.1154, 2013.

[8] E. J. Candès and M. B. Wakin. An introduction to compressive sampling. Signal Processing Magazine, IEEE, 25(2):21-30, March 2008.

[9] Emmanuel J. Candès, Yonina C. Eldar, Deanna Needell, and Paige Randall. Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31(1):59-73, 2011.

[10] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489-509, 2006.

[11] Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted l1 minimization. J. Fourier Anal. Appl., 14(5-6):877-905, 2008.

[12] Peter G. Casazza, Gitta Kutyniok, and Friedrich Philipp. Introduction to finite frame theory. In Finite Frames, Appl. Numer. Harmon. Anal., pages 1-53. Birkhäuser/Springer, New York, 2013.

[13] Ole Christensen. Frames and Bases: An Introductory Course. Applied and Numerical Harmonic Analysis. Birkhäuser Boston Inc., Boston, MA, 2008.

[14] David L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289-1306, 2006.

[15] David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197-2202, 2003.

[16] David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425-455, 1994.

[17] P. L. Dragotti and M. Vetterli. Wavelet footprints: theory, algorithms, and applications. Signal Processing, IEEE Transactions on, 51(5):1306-1323, May 2003.

[18] Michael Elad, Peyman Milanfar, and Ron Rubinstein. Analysis versus synthesis in signal priors. Technical report, 2005.

[19] Y. C. Eldar and G. Kutyniok. Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.

[20] Massimo Fornasier and Holger Rauhut. Compressive sensing. In Handbook of Mathematical Methods in Imaging, 2010.

[21] Simon Foucart. A note on guaranteed sparse recovery via l1-minimization. Applied and Computational Harmonic Analysis, 29(1):97-103, 2010.

[22] Simon Foucart and Holger Rauhut. A Mathematical Introduction to Compressive Sensing. Appl. Numer. Harmon. Anal. Birkhäuser, Boston, 2013.

[23] Michael P. Friedlander, Hassan Mansour, Rayan Saab, and Özgür Yilmaz. Recovering compressively sampled signals using partial support information. IEEE Trans. Inform. Theory, 58(2):1122-1134, 2012.

[24] Navid Ghadermarzy. Using prior support information in compressed sensing. Master's thesis, University of British Columbia, 2013.

[25] A. Gholami and M. Sacchi. Fast 3D blind seismic deconvolution via constrained total variation and GCV. SIAM Journal on Imaging Sciences, 6(4):2350-2369, 2013.

[26] R. Giryes and M. Elad. Can we allow linear dependencies in the dictionary in the sparse synthesis framework? ArXiv e-prints, March 2013.

[27] Tom Goldstein and Stanley Osher. The split Bregman method for L1-regularized problems. SIAM J. Img. Sci., 2(2):323-343, April 2009.

[28] Y. Gousseau and J. Morel. Are natural images of bounded variation? SIAM Journal on Mathematical Analysis, 33(3):634-648, 2001.

[29] Karlheinz Gröchenig. Foundations of Time-Frequency Analysis, chapter 4.3, pages 61-63. Applied and Numerical Harmonic Analysis. Birkhäuser Boston Inc., Boston, MA, 2001.

[30] Gilles Hennenfent and Felix J. Herrmann. Simply denoise: wavefield reconstruction via jittered undersampling. Geophysics, 73(3):V19-V28, 2008.

[31] Felix J. Herrmann, Cody R. Brown, Yogi A. Erlangga, and Peyman P. Moghaddam. Curvelet-based migration preconditioning and scaling. Geophysics, 74:A41, 09 2009.

[32] Felix J. Herrmann and Xiang Li. Efficient least-squares imaging with sparsity promotion and compressive sensing. Geophysical Prospecting, 60(4):696-712, 2012.

[33] F. J. Herrmann, M. P. Friedlander, and O. Yilmaz. Fighting the curse of dimensionality: Compressive sensing in exploration seismology. Signal Processing Magazine, IEEE, 29(3):88-100, May 2012.

[34] F. J. Herrmann, D. Wang, and D. Verschuur. Adaptive curvelet-domain primary-multiple separation. GEOPHYSICS, 73(3):A17-A21, 2008.

[35] Laurent Jacques. A short note on compressed sensing with partially known signal support. CoRR, abs/0908.0660, 2009.

[36] M. Amin Khajehnejad, Weiyu Xu, A. Salman Avestimehr, and Babak Hassibi. Weighted l1 minimization for sparse recovery with prior information. In Proceedings of the 2009 IEEE International Conference on Symposium on Information Theory - Volume 1, ISIT'09, pages 483-487, Piscataway, NJ, USA, 2009. IEEE Press.

[37] Jelena Kovacevic and Amina Chebira. An introduction to frames. Signal Processing, 2(1):1-94, 2008.

[38] Gitta Kutyniok. Compressed sensing: Theory and applications. CoRR, abs/1203.3815, 2012.

[39] M. Lammers, A. Powell, and O. Yilmaz. Alternative dual frames for digital-to-analog conversion in sigma-delta quantization. Advances in Computational Mathematics, 32(1):73-102, 2010.

[40] Shidong Li. On general frame decompositions. Numerical Functional Analysis and Optimization, 16(9-10):1181-1191, 1995.

[41] Xiang Li and Felix J. Herrmann. Sparsity-promoting migration accelerated by message passing. In SEG Technical Program Expanded Abstracts. SEG, 04 2012.

[42] Tim T. Y. Lin and Felix J. Herrmann. Robust source signature deconvolution and the estimation of primaries by sparse inversion. In SEG Technical Program Expanded Abstracts, volume 30, pages 4354-4359. Dept. of Earth and Ocean Sciences, University of British Columbia, 04 2011.

[43] B. Liu and M. Sacchi. Minimum weighted norm interpolation of seismic records. GEOPHYSICS, 69(6):1560-1568, 2004.

[44] Yulong Liu, S. Li, Tiebin Mi, Hong Lei, and Weidong Yu. Performance analysis of l1-synthesis with coherent frames. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 2042-2046, 2012.

[45] Yulong Liu, Tiebin Mi, and Shidong Li. Compressed sensing with general frames via optimal-dual-based l1-analysis. IEEE Trans. Inform. Theory, 58(7):4201-4214, 2012.

[46] Wei Lu and Namrata Vaswani. Exact reconstruction conditions for regularized modified basis pursuit. IEEE Transactions on Signal Processing, 60(5):2634-2640, 2012.

[47] Stephane Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Access Online via Elsevier, 2008.

[48] H. Mansour, F. J. Herrmann, and Ö. Yilmaz. Improved wavefield reconstruction from randomized sampling via weighted one-norm minimization. GEOPHYSICS, 78(5):V193-V206, 2013.

[49] Hassan Mansour, Haneet Wason, Tim T. Y. Lin, and Felix J. Herrmann. Randomized marine acquisition with compressive sampling matrices. Geophysical Prospecting, 60(4):648-662, 2012.

[50] M. Naghizadeh and K. Innanen. Seismic data interpolation using a fast generalized Fourier transform. GEOPHYSICS, 76(1):V1-V10, 2011.

[51] Sangnam Nam, Mike E. Davies, Michael Elad, and Rémi Gribonval. The cosparse analysis model and algorithms. Applied and Computational Harmonic Analysis, 34(1):30-56, 2013. Preprint available on arXiv since 24 Jun 2011.

[52] Deanna Needell. Noisy signal recovery via iterative reweighted l1-minimization. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 113-117. IEEE, 2009.

[53] H. Nyquist. Certain topics in telegraph transmission theory. American Institute of Electrical Engineers, Transactions of the, 47(2):617-644, April 1928.

[54] R. Von Borries, C. J. Miosso, and Cristhian Potes. Compressed sensing using prior information. The Second International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 2007.

[55] Enders A. Robinson and Sven Treitel. Digital Imaging and Deconvolution: The ABCs of Seismic Exploration and Processing. Society of Exploration Geophysicists, Tulsa, USA, November 2008.

[56] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259-268, 1992.

[57] Reza Shahidi, Gang Tang, Jianwei Ma, and Felix J. Herrmann. Application of randomized sampling schemes to curvelet-based sparsity-promoting seismic data recovery. Geophysical Prospecting, 61(5):973-997, 2013.

[58] Claude E. Shannon. Communication in the presence of noise. Proc. I.R.E., 37:10-21, 1949.

[59] Jean-Luc Starck, Emmanuel J. Candès, and David L. Donoho. The curvelet transform for image denoising. Image Processing, IEEE Transactions on, 11(6):670-684, 2002.

[60] D. Trad, T. Ulrych, and M. Sacchi. Accurate interpolation with high-resolution time-variant Radon transforms. GEOPHYSICS, 67(2):644-656, 2002.

[61] Daniel Trad, Tadeusz Ulrych, and Mauricio Sacchi. Latest views of the sparse Radon transform. Geophysics, pages 386-399, 2003.

[62] Ning Tu, Aleksandr Y. Aravkin, Tristan van Leeuwen, and Felix J. Herrmann. Fast least-squares migration with multiples and source estimation. In EAGE, 06 2013.

[63] Ning Tu and Felix J. Herrmann. Least-squares migration of full wavefield with source encoding. In EAGE Technical Program. EAGE, 01 2012.

[64] M. Unser.
Sampling-50 years after shannon. Proceedings of the IEEE,88(4):569 –587, april 2000.[65] Fleet Van. Discrete Wavelet Transformations An Elementary Approachwith Applications. Wiley-Interscience. Hoboken, NJ: John Wiley &Sons. xxiv, 2008.[66] E. van den Berg and M. P. Friedlander. Probing the pareto frontierfor basis pursuit solutions. SIAM Journal on Scientific Computing,31(2):890–912, 2008.[67] Namrata Vaswani and Wei Lu. Modified-cs: modifying compressivesensing for problems with partially known support. Trans. Sig. Proc.,58(9):4595–4607, September 2010.66Bibliography[68] Haneet Wason and Felix J. Herrmann. Time-jittered ocean bottomseismic acquisition. In SEG, 9 2013.[69] G Zimmermann. Normalized tight frames in finite dimensions. In RecentProgress in Multivariate Approximation, Birkhauser (2001, 2001.[70] P. M. Zwartjes and M. D. Sacchi. Fourier reconstruction of nonuni-formly sampled, aliased seismic data. Geophysics, 72(1):V21–V32, 2007.67

