UBC Faculty Research and Publications

Curvelet-based non-linear adaptive subtraction with sparseness constraints Herrmann, Felix J.; Moghaddam, Peyman P. 2004-03-26


Full Text

Curvelet-based non-linear adaptive subtraction with sparseness constraints

Felix Herrmann and Peyman Moghaddam, EOS, University of British Columbia

Abstract

In this paper an overview is given of the application of directional basis functions, known under the name Curvelets/Contourlets, to various aspects of seismic processing and imaging that involve adaptive subtraction. Key concepts in the approach are the use of (i) directional basis functions that localize in both domains (e.g. space and angle), and (ii) non-linear estimation, which corresponds to localized muting of the coefficients, possibly supplemented by constrained optimization. We discuss applications that include multiple removal, ground-roll removal and migration denoising.

Introduction

Everybody would agree that a seismologist's dream would be the existence of a basis-function decomposition which localizes in space and angle; is sparse for seismic data and images; is well behaved under certain (imaging) operators; and is independent of the local phase characteristics. To make a long story short, this dream has come true, at least in part for 2-D data, with the recent introduction of directional non-separable wavelets, known under the names Curvelets or Contourlets [4, 2]. As their names suggest, these basis functions are designed to optimally represent images with edges on piece-wise smooth curves ($f \in C^2$ with a finite number of jumps, to be precise). For our application, this optimality corresponds to a non-adaptive decomposition (the transform does not depend on the function to be transformed, like the FFT) that is near optimal (read: as sparse as you can get, i.e. the non-linear approximation rate is near optimal) for seismic reflection data as well as for seismic images, both of which tend to consist of events that are relatively smooth in the tangential direction and oscillatory/singular across.

For some time, orthogonal separable discrete wavelet transforms looked like they would deliver on the above dream, since they represent a transform that is optimal for non-stationary piece-wise smooth 1-D functions. Indeed, for this class of functions, wavelets form an unconditional basis and consequently lead to an almost diagonalization of the data's covariance. Even though wavelets have been very successful in many different fields, their application to functions and operators that involve directivity, e.g. seismic data with wavefronts and non-stationary directional filters such as the normal operators for seismic imaging, has been less successful. For instance, discrete wavelet transforms have not really made an impact in seismic processing and imaging, because their lack of directional selectivity renders them less effective. Curvelets (from now on we use the name Curvelets to refer to both Curvelets and Contourlets), on the other hand, remain more or less invariant under imaging/modeling operators, while they maintain near optimal sparse representations for functions with singularities on curves. These properties make Curvelets the ideal representation in which to formulate seismic processing and imaging problems [8, 11, see also other contributions by the authors to the Proceedings of this Conference].

The success of identifying and effectively removing coherent (noise) components from data depends on the interplay of two factors: (i) the ability to accurately model the noise and signal components, and (ii) the ability to separate/filter these components in a possibly noisy environment.
Adaptive-subtraction methods aim to robustly remove the noise component, given imperfections in the noise prediction and the existence of an additional incoherent noise term. Without being all-inclusive, applications of adaptive subtraction range from multiple and ground-roll elimination [14] to the robust computation of 4-D difference cubes and the removal of noise after migration [8, and other contributions by the authors to the Proceedings of this Conference].

We cast the adaptive subtraction method into the general framework of inversion theory. We argue that one is in much better shape to solve an inversion problem if one is able to sparsely represent the data, the noise and the denoised data (the model). The appropriate choice of basis functions allows us to remove the bulk of the noise using a diagonal minimax estimation operator based on thresholding [9, 5]. Sparseness/minimal structure is imposed by solving an additional global optimization problem on the estimated coefficients that minimizes an additional penalty function [2, 13].

First, we formulate the adaptive subtraction problem as a least-squares inverse problem. Then, we recast this problem onto optimal basis functions and present the non-linear estimation-by-thresholding rule, which is based on the magnitudes of the coefficients only. Basic properties of Curvelets are presented next, followed by a description of the global optimization techniques we employ. We conclude by presenting a number of examples.

The inverse problem underlying adaptive subtraction

Virtually any linear problem in seismic processing and imaging can be seen as a special case of the following generic problem: how to obtain the model $\mathbf{m}$ from data $\mathbf{d}$ in the presence of (coherent) noise $\mathbf{n}$:

$$\mathbf{d} = K\mathbf{m} + \mathbf{n}, \quad (1)$$

in which, for the adaptive subtraction problem, $K$ and $K^\ast$ represent time convolution and correlation, respectively, with an effective wavelet $\Phi$ that minimizes the following functional [see e.g. 6]

$$\min_{\Phi}\ \|\mathbf{d} - \Phi \ast \mathbf{m}\|_p, \quad (2)$$

where $\mathbf{d}$ represents data with coherent noise; $\mathbf{m}$ the predicted noise; and $\hat{\mathbf{d}} = \min_{\Phi}\|\mathbf{d} - \Phi\ast\mathbf{m}\|_p$ the noise-free data, obtained by minimizing the $L_p$-norm. For $p = 2$, Eq. 2 corresponds to a standard least-squares problem where the energy of the "noise" is minimized.

Seismic data and images are notoriously non-stationary and non-Gaussian, which may give rise to a loss of coherent events in the denoised component when employing the $L_2$-norm. The non-stationarity and non-Gaussianity have with some success been addressed by solving Eq. 2 within sliding windows and for $p = 1$ [6]. In this paper, we follow a different strategy, replacing the above variational problem by

$$\hat{\mathbf{m}}:\ \min_{\mathbf{m}}\ \tfrac{1}{2}\big\|C_n^{-1/2}(\mathbf{d} - \mathbf{m})\big\|_2^2 + \lambda J(\mathbf{m}). \quad (3)$$

In this formulation, the necessity of estimating a wavelet $\Phi$ has been omitted. Noisy data is again represented by $\mathbf{d}$, but now $\mathbf{m}$ is the noise-free model on which an additional penalty function is imposed, such as an $L_1$-norm, i.e. $J(\mathbf{m}) = \|\mathbf{m}\|_1$. The noise term is now given by the predicted noise, and this explains the emergence of the covariance operator, whose kernel is given by

$$C_n = E\{\mathbf{n}\mathbf{n}^T\}. \quad (4)$$

The question now is: can we find a basis-function representation which (i) is sparse and local on the model (noise-free data) $\mathbf{m}$, the data $\mathbf{d}$ and the predicted noise $\mathbf{n}$, and (ii) almost diagonalizes the covariance of the model and the predicted noise? The answer is affirmative, even though the condition of locality is non-trivial. For instance, decompositions in terms of principal components/Karhunen-Loève bases (KL) or independent components may diagonalize the covariance operators, but these decompositions are generally not local and not necessarily the same for both noise and model. By selecting a basis-function decomposition that is local, sparse and almost diagonalizing (i.e.
$\tilde{C}_n \approx \mathrm{diag}\{\tilde{C}_n\} = \mathrm{diag}\{\Gamma_{\tilde{n}}^2\}$), we find a solution to the above denoising problem that minimizes (mini) the maximal (max) mean-square error given the worst possible Bayesian prior for the model [9, 5]. This minimax estimator minimizes

$$\hat{\mathbf{m}}_0:\ \min_{\mathbf{m}}\ \tfrac{1}{2}\big\|\Gamma_{\tilde{n}}^{-1}\big(\tilde{\mathbf{d}} - \tilde{\mathbf{m}}\big)\big\|_2^2 \quad (5)$$

by a simple diagonal non-linear thresholding operator

$$\hat{\mathbf{m}} = B^\dagger \Gamma\,\Theta_\lambda\big(\Gamma^{-1}B\mathbf{d}\big) = B^\dagger\,\Theta_{\lambda\Gamma}\big(\tilde{\mathbf{d}}\big). \quad (6)$$

The threshold is set to $\lambda\,\mathrm{diag}\{\Gamma\}$ (we dropped the subscript $\tilde{n}$), with $\lambda$ an additional control parameter that sets the confidence interval (e.g. $\lambda = 3$ corresponds to a roughly 99.7% confidence interval), (de-)emphasizing the thresholding. This thresholding operation removes the bulk of the noise by shrinking the coefficients that are smaller than the noise to zero, and brings us into the convex region of the global optimization problem of Eq. 3. The symbol $\dagger$ indicates the (pseudo-)inverse, allowing us to use frames rather than orthonormal bases. Before going into detail on how to impose the additional penalty term, let us first focus on the selection of the appropriate basis-function decomposition that accomplishes the above task of replacing the adaptive subtraction problem, given in Eq. 2, by its diagonalized counterpart (cf. Eq. 6).

Basis functions for seismic imaging and processing

Non-linear estimators of the type given in Eq. 6 are known to asymptotically attain minimax [5] for particular combinations of basis functions and classes of functions. For example, thresholding of the wavelet coefficients of piece-wise smooth functions (to be more specific, functions that lie in particular Besov spaces) is minimax and hence gives results with near optimal SNR. Can these results be extended to functions that contain singularities/edges on curves, and to cases where an (imaging) operator is involved?
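Before turning to that question, the diagonal estimator of Eq. 6 can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: an identity (orthonormal) coefficient domain stands in for the Curvelet frame, and the name `threshold_denoise` is ours.

```python
import numpy as np

def threshold_denoise(d_coef, gamma, lam=3.0):
    """Diagonal hard thresholding (cf. Eq. 6): mute every coefficient whose
    magnitude falls below lam times the local noise level gamma."""
    return np.where(np.abs(d_coef) >= lam * gamma, d_coef, 0.0)

# Toy setup: a sparse model observed in coloured noise whose per-coefficient
# standard deviation (the diagonal of Gamma) is assumed known.
rng = np.random.default_rng(0)
m_true = np.zeros(256)
m_true[[10, 50, 200]] = [5.0, -7.0, 4.0]        # three "reflectors"
gamma = 0.3 + 0.2 * rng.random(256)             # local noise levels
d = m_true + gamma * rng.standard_normal(256)   # noisy data coefficients
m_hat = threshold_denoise(d, gamma)
```

With `lam = 3`, only coefficients that exceed three local standard deviations of the noise survive, and the survivors are left untouched: this is exactly the localized muting described in the text.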
In other words, can we find basis functions that form an unconditional basis for seismic data, which generally can be considered the result of the (repeated) action of the scattering operator and its adjoint (both asymptotically Fourier Integral Operators (FIOs)), and of the composition of these two operators, the normal operator (which is an elliptic Pseudo-Differential Operator ($\Psi$DO))?

To answer this question, let us first explain what is meant by an unconditional basis. Without being overly technical, (almost) unconditional bases obey the following relationship

$$\|\alpha'_\mu\|^{\mathrm{discrete}}_{\mathrm{fancy}} \le \|\alpha_\mu\|^{\mathrm{discrete}}_{\mathrm{fancy}}, \quad \forall\mu \quad (7)$$

for their discrete norms. These discrete norms on the $\mu$-indexed coefficients can be quite intricate (e.g. Besov norms, i.e. non-$L_2$), hence the name "fancy". In an unconditional basis, these discrete norms are equivalent to norms on the original continuous function, i.e. $\|f\|^{\mathrm{continuous}}_{\mathrm{fancy}} \asymp \|\alpha_\mu\|^{\mathrm{discrete}}_{\mathrm{fancy}}$. Moreover, these discrete norms decay whenever we shrink the coefficients, i.e. $\|\alpha'_\mu\| = \|s_\mu\,\alpha_\mu\|$ with $|s_\mu| \le 1$. The implications of this property are profound, because it means that the (fancy) norm always decreases, irrespective of the sign or phase of $s_\mu$; the decrease depends only on the magnitude of $s_\mu$. Moreover, the function can be shown to remain in the same "class", which roughly means that the edges are preserved. Translated to our language, this property means that we no longer need to know the "phase" of our predicted noise. It suffices to know only the local standard deviation of the predicted noise (proportional to the magnitude of the Curvelet coefficients of the predicted noise), which is used to define the threshold in the non-linear estimator (cf. Eq. 6). We use quotes for the "phase" because we are only referring to local phase rotations over the ("compact") support of the localized basis functions.
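The shrinkage property of Eq. 7 is easy to verify numerically. The sketch below (our illustration, not from the paper) shrinks a random coefficient sequence by arbitrary factors $|s_\mu| \le 1$, with random sign flips standing in for phase rotations, and checks that a weighted $\ell_p$ "fancy" norm never increases:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.standard_normal(100)                 # coefficients of f
# arbitrary shrinkage factors with |s| <= 1; random signs mimic phase flips
s = rng.uniform(0.0, 1.0, 100) * rng.choice([-1.0, 1.0], size=100)
alpha_shrunk = s * alpha

w = rng.uniform(0.5, 2.0, 100)                   # weights of a "fancy" norm
for p in (1.0, 1.5, 2.0):
    norm = (w * np.abs(alpha) ** p).sum() ** (1 / p)
    norm_shrunk = (w * np.abs(alpha_shrunk) ** p).sum() ** (1 / p)
    assert norm_shrunk <= norm   # shrinking can only decrease the norm
```

Only the magnitudes $|s_\mu|$ matter, which is precisely why the estimator of Eq. 6 needs the size, not the phase, of the predicted-noise coefficients.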
This local phase is different from the phase of the Fourier coefficients, which completely governs the location of events. Needless to say, Fourier bases are not unconditional.

Besides the favorable unconditional-basis property, (near) unconditional bases also attain almost optimal non-linear approximation rates. Curvelets, as proposed by [4] and [2] respectively, constitute a family of relatively new, nearly unconditional, non-separable wavelet bases that sparsely represent functions with singularities (read: wavefronts) that lie on piece-wise smooth curves. For these types of functions, Curvelets obtain near optimal non-linear approximation rates (NARs). For comparison, the first $m$-term reconstruction of the sorted 2-D Fourier coefficients of a 2-D function $x$ with curved singularities obtains [3, 2, 4] the following NAR (measured in the $L_2$-norm),

$$\|x - \tilde{x}_m\|_2^2 \propto m^{-1/2},$$

while separable wavelets (ordinary 2-D discrete wavelet transforms, again ordered in decreasing size over their index set) obtain

$$\|x - \tilde{x}_m\|_2^2 \propto m^{-1}.$$

This improvement has been mainly responsible for the success of wavelets in signal/image processing [9]. Clearly, the further improvement in NAR by Curvelets, which obtain (ordered over the $\mu$-index in decreasing order)

$$\|x - \tilde{x}_m\|_2^2 \propto C\,m^{-2}(\log m)^3,$$

will likely open up a whole suite of successful applications. Besides the logarithmic term, the above NAR is equivalent to the best possible approximation rate (e.g. by an adaptive triangular meshing, given the location of the edges/reflection events) attainable for this type of function [2, 4].

In addition to the optimal NAR, Curvelets almost diagonalize FIOs and $\Psi$DOs [1], the asymptotic "building blocks" of seismic modeling and imaging. Consequently, we can expect the covariance matrices of seismic data and of possible sources of coherent noise, such as multiples and ground-roll, to be almost diagonalized by Curvelets, i.e.
Curvelets are close to the KL basis which, by definition, is the basis that diagonalizes the covariance operator.

So how do Curvelets obtain such a high non-linear approximation rate? Without being all-inclusive [see [2, 4] for details], the answer lies in the fact that Curvelets are

- multi-scale, i.e. they live in different dyadic coronae in the FK-domain (see Figure 1);
- multi-directional, i.e. they live on wedges within these coronae (see Figure 1);
- anisotropic, i.e. they obey the scaling law width $\propto$ length$^2$;
- directionally selective, with the number of orientations increasing as the scale decreases;
- local in both space, $(x,t)$, and spatial frequency, $(k,f)$;
- almost orthogonal: they are tight frames with moderate redundancy. Contourlets implement the pseudo-inverse in closed form, while Curvelets provide the transform and its adjoint, yielding a pseudo-inverse computed by iterative conjugate gradients.

Fig. 1: Left: Curvelet partitioning of the frequency plane [modified from [3]]. Right: comparison of the non-linear approximation rates of Curvelets and wavelets [modified from [4]].

In Figure 1, a number of Curvelet properties are detailed [adapted from [3, 2, 4]]. Figure 1 depicts the Curvelet partitioning of the frequency-wavenumber plane. Each Curvelet lives in a wedge and becomes more directionally selective and anisotropic at higher frequencies (moving away from the origin). Curvelets are localized in both the space ($(x,t)$) and spatial-frequency ($(k,f)$) domains and have, as a consequence of this partitioning, the tendency to align themselves with curves/wavefronts (see Figure 1). As such, they are more flexible than a representation yielded by e.g. high-resolution Radon [as described by e.g. [13]], because they are local and able to follow any piece-wise smooth curve. Only Curvelets that align with reflectors yield large inner products, and this explains their success in solving the denoising problem for models that contain events on curves.
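The approximation-rate hierarchy quoted above can be illustrated one dimension down, where wavelets already beat Fourier for piecewise smooth signals; the 2-D curvelet-versus-wavelet gain is analogous but requires a curvelet code. The following is our self-contained numpy sketch with a hand-rolled orthonormal Haar transform, not the authors' experiment:

```python
import numpy as np

def haar(x):
    """Multi-level orthonormal Haar transform of a length-2^J signal."""
    out, a = [], x.astype(float)
    while a.size > 1:
        out.append((a[0::2] - a[1::2]) / np.sqrt(2.0))  # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)          # approximation
    out.append(a)
    return np.concatenate(out[::-1])

def m_term_error(coef, m, energy):
    """Relative L2 error of the best m-term approximation."""
    mags = np.sort(np.abs(coef))[::-1]
    return 1.0 - (mags[:m] ** 2).sum() / energy

# Piecewise smooth 1-D signal with one jump (a crude "reflector").
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
x = np.where(t < 0.37, np.sin(4 * np.pi * t), 1.0 + 0.5 * t)
energy = (x ** 2).sum()

err_fourier = m_term_error(np.fft.fft(x) / np.sqrt(x.size), 64, energy)
err_haar = m_term_error(haar(x), 64, energy)
# The jump spoils the decay of all Fourier coefficients, while only O(log N)
# Haar coefficients feel it, so err_haar comes out below err_fourier.
```

The unitary normalization of the FFT makes both coefficient sets satisfy Parseval, so the two $m$-term errors are directly comparable.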
The noise is canceled because the basis functions adapt themselves locally to the dip, optimizing their overlap with the data and integrating out the incoherent noise.

To demonstrate the insensitivity of Curvelet denoising to variations in the local phase, we include an example comparing the thresholding results (cf. Eq. 6) for ground-roll removal given two noise predictions that are 90 degrees out of phase. We used the noise predicted by high-resolution parabolic Radon and subsequently took the Hilbert transform to apply the phase shift. The results are summarized in Fig. 2.

Sparseness constraints by global optimization

Having some basic understanding of why Curvelets work, we now turn our attention to how to impose additional sparseness constraints ($J(\mathbf{m})$) that reflect our prior knowledge. The thresholding operator (cf. Eq. 6) is applied with the threshold set according to either (i) the standard deviation of the Curvelet transform of the predicted noise, $\lambda|B\mathbf{n}|$, with $\mathbf{n}$ the predicted noise and $B$ the Curvelet transform; or (ii) a Monte-Carlo sampling of the diagonal of the covariance operator of the noise. The first method uses actual predictions for the noise, which are either based on some physical model [6] (e.g. physical models for multiples or ground roll [10]) or on noise predicted by conventional denoising techniques such as high-resolution Radon [13]. The second method uses information on an operator that colors the noise; migration denoising is an example, where the migration operator colors the noise. For this latter application and for multiple elimination, refer to [7, 8] or to other contributions by the first author to the Proceedings of this Conference.

It can be shown that thresholding (cf. Eq. 6) brings us into the convex region of the denoising problem formulated in Eq. 3 for $J(\mathbf{m}) = \|\mathbf{m}\|_1$. It remains to impose the additional prior information residing in $J(\mathbf{m})$.
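Returning briefly to threshold selection: option (ii) above, reading off the diagonal of the noise covariance when only a noise-coloring operator is known, can be sketched by Monte-Carlo sampling. This is our minimal stand-in, not the authors' implementation: a smoothing convolution plays the migration-like coloring operator and a unitary FFT plays the Curvelet transform.

```python
import numpy as np

def mc_diag_std(color_op, transform, n, n_samples=200, seed=0):
    """Estimate the per-coefficient noise standard deviation diag{Gamma} by
    pushing white-noise realizations through the colouring operator and the
    transform, and averaging the squared coefficient magnitudes."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(n)
    for _ in range(n_samples):
        w = rng.standard_normal(n)
        acc += np.abs(transform(color_op(w))) ** 2
    return np.sqrt(acc / n_samples)

n = 128
kernel = np.exp(-0.5 * (np.arange(-8, 9) / 3.0) ** 2)  # smoother = "colourer"
color_op = lambda w: np.convolve(w, kernel, mode="same")
transform = lambda x: np.fft.fft(x) / np.sqrt(x.size)   # unitary FFT
gamma = mc_diag_std(color_op, transform, n)
# gamma is large at low frequencies and small near Nyquist, so the threshold
# lam * gamma automatically adapts to the colouring of the noise.
```

The same recipe carries over to any linear operator and transform pair; only the cost of the forward applications changes.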
By setting this constraint to the $L_1$-norm we hope to (i) impose minimum structure, and (ii) remove possible estimation and side-band artifacts, enhancing the continuity along the imaged reflectors. To accomplish these goals, we formulate the following constrained optimization problem [2]

$$\hat{\mathbf{m}}:\ \min_{\mathbf{m}} J(\mathbf{m}) \quad \text{s.t.} \quad |\tilde{m}_\mu - \tilde{m}^0_\mu| \le \epsilon_\mu, \quad \forall\mu, \quad (8)$$

where our initial guess estimated by thresholding, $\tilde{\mathbf{m}}^0$, is updated subject to the constraints

$$\epsilon_\mu = \begin{cases} \Gamma_\mu & \text{if } |\tilde{m}^0_\mu| \ge \lambda\Gamma_\mu\\ \lambda\Gamma_\mu & \text{if } |\tilde{m}^0_\mu| < \lambda\Gamma_\mu. \end{cases} \quad (9)$$

Fig. 2: Illustration of the robustness of adaptive subtraction by thresholding, applied to ground-roll removal for Yilmaz's ozdata20 dataset. Left panel: noisy data with ground roll. Second panel: high-resolution parabolic Radon denoised example, which was used to predict the noise [13]; notice the discontinuity at zero offset. Third panel: data denoised by thresholding with the noise predicted by Radon. Fourth panel: the same, but now for the 90-degree phase-rotated predicted noise. The insensitivity of thresholding w.r.t. local phase is clear from this example.

This global optimization procedure is solved using an Augmented-Lagrangian technique [12, 7] with the Lagrange multipliers initialized by the thresholded result, $\tilde{\mathbf{m}}^0$. The optimization runs over the whole index space $\mu$, and the tolerance differs by a factor $\lambda$ between the coefficients that were initially thresholded to obtain the first estimate and those that survived the thresholding. The coefficients that were perceived to be noise are allowed to vary more as part of the optimization.

Applications

Applications of the presented non-linear adaptive subtraction method are numerous. It can be applied to problems involving coherent or incoherent noise removal. As shown in another paper submitted to these Proceedings, our method can readily be extended to include an (imaging) operator [7]. The applications can roughly be divided into adaptive subtraction for
- multiple elimination, where the predicted multiples and the multiple-free data are represented by the coherent noise $\mathbf{n}$ and the model $\mathbf{m}$, respectively [in these Proceedings and in 8]. See Fig. 3 for an example.
- ground-roll elimination, where the ground-roll-free data is the model $\mathbf{m}$ and the ground roll the noise $\mathbf{n}$ [14]. See Fig. 2 for an example.
- computation of 4-D difference cubes, where one vintage dataset is the noise of the other and vice versa [see 11].

Fig. 3: Synthetic example of multiple elimination. Left panel: windowed least-squares method [see e.g. 6]. Right panel: our sparseness-constrained denoising based on initial thresholding followed by the constrained optimization (cf. Eq. 8).

Fig. 4: Synthetic common-offset Kirchhoff-migration example. Top row: noisy migrated and least-squares migrated image with 10 iterations of CG. Notice the coloring of the noise and the failure of CG to correct for the normal operator. Bottom row: thresholded image (left) and constrained, optimized least-squares image (right). Notice the significant improvement in the signal-to-noise ratio by the thresholding. The constrained optimization (cf. Eq. 8) removes the artifacts and restores the least-squares amplitudes, leading to a drastically improved image.

A further application is sparseness-constrained least-squares migration and AVO inversion, where an operator is involved (e.g. least-squares migration, where the migration operator colors the noise [7, see also these Proceedings]). As we can see from the example shown in Fig. 4, thresholding followed by optimization yields a substantially improved image, where most of the coherent energy of the noise has been removed. Conversely, least-squares migration by conjugate gradients concentrates the energy of the noise towards regions that are not well insonified and fails to restore the amplitudes.
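For intuition on the optimization step used in these examples: when the basis is orthonormal, the constrained problem of Eq. 8 decouples per coefficient and has a closed-form answer, because each coefficient simply moves as far toward zero as its tolerance from Eq. 9 allows. The sketch below is ours; the Curvelet frame in the paper is redundant, which is why the authors resort to an Augmented-Lagrangian iteration instead.

```python
import numpy as np

def tolerances(m0, gamma, lam=3.0):
    """Eq. 9: coefficients that survived thresholding may move by gamma;
    muted coefficients get the looser tolerance lam * gamma."""
    return np.where(np.abs(m0) >= lam * gamma, gamma, lam * gamma)

def l1_constrained(m0, eps):
    """min ||m||_1  s.t.  |m - m0| <= eps, coefficient-wise (orthonormal
    case): shrink every coefficient toward zero by at most eps."""
    return np.sign(m0) * np.maximum(np.abs(m0) - eps, 0.0)

m0 = np.array([5.0, -4.0, 0.0, 0.0])   # thresholded estimate (cf. Eq. 6)
gamma = np.full(4, 0.5)                # diagonal of the noise covariance
m = l1_constrained(m0, tolerances(m0, gamma))
# m -> [4.5, -3.5, 0.0, 0.0]: survivors shrink by gamma, muted stay zero
```

In the redundant-frame case the constraints couple through the synthesis operator and the closed form no longer applies, but the sketch shows why the looser tolerance on muted coefficients gives the optimization room to restore continuity without re-admitting noise.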
Our method, on the other hand, removes most of the artifacts and restores the amplitudes for the larger angles.

Conclusions

Why do Curvelets seem to be the appropriate choice for formulating problems in seismic processing and imaging? The answer is relatively simple. Curvelets provide an almost unconditional basis not only for functions with singularities on curves, but also for functions that are generated by operators that approximately solve the wave equation. The superior non-linear approximation rate and the locality in both position and dip allow for the construction of non-linear estimators based on thresholding, possibly supplemented by global optimization. Unlike subtraction, thresholding simply sets a certain "feature" to zero if it belongs to the noise. The decision to mute depends only on the magnitude of the noise, and our method is therefore robust under local phase rotations. In addition, the unconditional-basis property corresponds to an almost diagonalization of the covariance operators (and also of imaging operators), which explains the success of Curvelets in bringing us close to the solution of the sparseness-constrained adaptive subtraction problem. Because Curvelets are a non-adaptive transform, they form an attractive alternative to data-adaptive methods, while allowing for maximal flexibility in dealing with non-stationary data.

Acknowledgments

The authors thank Emmanuel Candès and David Donoho for making an early version of their Curvelet code (Digital Curvelet Transforms via Unequispaced Fourier Transforms, presented at the ONR Meeting, University of Minnesota, May 2003) available. We also would like to thank Mauricio Sacchi for his migration code and Carson Yarham and Daniel Trad for helping to make the ground-roll example. This work was in part financially supported by an NSERC Discovery Grant.

References

[1] E. J. Candès and L. Demanet. Curvelets and Fourier integral operators. 2002. URL http://www.acm.caltech.edu/?emmanuel/publications.html. To appear in Comptes-Rendus de l'Académie des Sciences, Paris, Série I.
[2] E. J. Candès and F. Guo. New multiscale transforms, minimum total variation synthesis: applications to edge-preserving image reconstruction. Signal Processing, pages 1519-1543, 2002. URL http://www.acm.caltech.edu/?emmanuel/publications.html.
[3] Emmanuel J. Candès and David L. Donoho. Recovering edges in ill-posed problems: optimality of curvelet frames. Ann. Statist., 30:784-842, 2002.
[4] M. Do and M. Vetterli. Beyond Wavelets, chapter Contourlets. Academic Press, 2002.
[5] David L. Donoho and Iain M. Johnstone. Minimax estimation via wavelet shrinkage. Annals of Statistics, 26(3):879-921, 1998. URL citeseer.nj.nec.com/donoho92minimax.html.
[6] A. Guitton. Adaptive subtraction of multiples using the L1-norm. Geophysical Prospecting, 52(1):27-38, 2004. URL http://www.blackwell-synergy.com/links/doi/10.1046/j.1365-2478.2004.00401.x/abs.
[7] Felix J. Herrmann and Peyman Moghaddam. Curvelet-domain least-squares migration with sparseness constraints. In Expanded Abstracts. EAGE, 2004.
[8] Felix J. Herrmann and Eric Verschuur. Separation of primaries and multiples by non-linear estimation in the curvelet domain. In Expanded Abstracts. EAGE, 2004.
[9] S. G. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1997.
[10] George A. McMechan and Mathew J. Yedlin. Analysis of dispersive waves by wave field transformation. Geophysics, 46, 1981. URL http://link.aip.org/link/?GPY/46/869/1.
[11] Moritz Beyreuther, Jamin Cristall and Felix J. Herrmann. Curvelet denoising of 4-D seismic. In Expanded Abstracts. EAGE, 2004.
[12] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 1999.
[13] D. Trad, T. Ulrych, and M. Sacchi. Latest views of the sparse Radon transform. Geophysics, 68(1):386-399, 2003.
[14] Carson Yarham, Daniel Trad and Felix J. Herrmann. Curvelet processing and imaging: adaptive ground roll removal. In Expanded Abstracts. CSEG, 2004.

