Development of an instantaneous frequency estimation pipeline for compressional and shear wave arrival picking: application to quarry blast data by Ahmed Hasin Issa Sigiuk B.A.Sc, University of Tripoli, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Applied Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) May 2019 © Ahmed Hasin Issa Sigiuk, 2019 ii The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis/dissertation entitled: Development of an instantaneous frequency estimation pipeline for compressional and shear wave arrival picking: application to quarry blast data submitted by Ahmed Hasin Issa Sigiuk in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering Examining Committee: Matthew Yedlin, Electrical and Computer Engineering Supervisor Edmond Cretu, Electrical and Computer Engineering Supervisory Committee Member Mieszko Lis, Electrical and Computer Engineering Supervisory Committee Member iii Abstract Many seismology applications such as earthquake hypocenter determination, source mechanism analysis, and hydrocarbon reservoir imaging, require accurate compressional and shear wave arrival time picking. Traditionally different seismological organizations have relied on human expert manual picking of seismic wave arrival and labeling. However as the number of seismic stations around the world rapidly grows, the need for automatic phase picking algorithms is increasing and requires the employment of large datasets. These algorithms need to distinguish between the different seismic phases, and accurately measure their onset. This thesis proposes a seismic signal processing pipeline application for automatic time picking of seismic compression waves (P) and shear waves (S). The algorithm picking accuracy is evaluated on quarry blasts seismic traces, and compared to manual expert picking. The proposed pipeline consists of three main segments: The data pre-processing segment focuses on Empirical Mode Decomposition (EMD) as a method of deconstructing multicomponent signals into monocomponent signals, followed by the Singular Spectral Analysis (SSA) method as a denoising filter. The data processing segment computes the instantaneous frequency (IF) using weighted least squares and Tychonov regularization with quadratic constraints. The data post-processing segment uses seismic signal IF estimation to pick the compressional and shear waves arrival times. iv Lay summary Applications such as earthquake source location determination, and object proximity determination using sonar, require accurate wave arrival time measurements. Traditionally, for seismic signal different seismological organizations have relied on human expert manual picking of seismic wave arrival and labeling, or, at least reviewing picks that have been automatically picked. However, in recent years multiple deployments of dense seismic arrays make manual picking impractical due to the huge data that these arrays are collecting. This gives rise to the need for reliable, efficient and accurate automatic phase-picking algorithms. This thesis proposes a seismic signal processing pipeline application for automatic time picking of seismic compression waves (P) and shear waves (S). The algorithm picking accuracy is evaluated on quarry blast seismic traces, and compared to manual expert picking. v Preface This dissertation is an original and independent work by the author, A. Sigiuk. This is based on the contribution of the author to the work published or will be published as a result of collaborations with other researchers. In Chapter 1, section 1.2, pages 20-22, the classical instantaneous frequency definition was primarily derived by Matthew Yedlin and myself. In Chapter 3, pages (58-60), section 3.1.5 was primarily derived by myself, and section 3.1.6, was primarily derived by Matthew Yedlin and myself. In Chapter 5, pages (100-101), section 5.3.1, the implementation of empirical mode decomposition on seismic signals was primarily derived by myself, which are partially based on the following publication: 1. Ahmed H. Sigiuk, Matthew J. Yedlin. “Instantaneous frequency estimation using weighted least squares Tychonov regularization with box constraints.” CREWES Research Report, 28(69), pp. 1-13, 2016. The derivation of ! (pages 58-59) was obtained for a final project submitted to ELEC 565: 2. Sigiuk, A., “Instantaneous Frequency Estimation Using Least Square Algorithm with Tychonov Regularization.” Unpublished manuscript, EECE 565, University of British Columbia, 2016. vi Table of contents Abstract ......................................................................................................................................... iii Lay summary ................................................................................................................................ iv Preface ............................................................................................................................................ v Table of contents .......................................................................................................................... vi List of tables ................................................................................................................................... x List of figures ................................................................................................................................ xi List of abbreviations ................................................................................................................... xv Acknowledgements .................................................................................................................... xvi Dedication .................................................................................................................................. xvii 1- Introduction .............................................................................................................................. 1 1.1 Seismic phase picking ........................................................................................................ 2 1.2 Classical instantaneous frequency definition ..................................................................... 3 1.3 Stationary and non-stationary signals ................................................................................ 5 1.4 Monocomponent and multicomponent signals .................................................................. 8 1.5 Thesis objectives, novelty and intuition ............................................................................. 9 1.6 Thesis organization .......................................................................................................... 10 2- Theory ...................................................................................................................................... 12 2.1 Instantaneous frequency theory ....................................................................................... 12 2.1.1 History of instantaneous frequency .......................................................................... 14 2.1.2 Interpretation of Hilbert generated instantaneous frequency and analytical signal .. 23 vii 2.2 Empirical mode decomposition (EMD) ........................................................................... 25 2.2.1 The sifting procedure ................................................................................................ 26 3- Algorithm design ..................................................................................................................... 30 3.1 Data processing ................................................................................................................ 30 3.1.1 Problem formulation ................................................................................................. 32 3.1.2 Performance evolution criteria and intuition ............................................................ 33 3.1.2.1 Performance evaluation ..................................................................................... 34 3.1.3 Spectral method ........................................................................................................ 36 3.1.4 Weighted least squares .............................................................................................. 38 3.1.4.1 Performance evaluation ..................................................................................... 40 3.1.5 Quadratic constraints as a quadratic inequality constraints ...................................... 41 3.1.5.1 Performance evaluation ..................................................................................... 41 3.1.6 Least squares Tychonov regularization .................................................................... 42 3.1.6.1 Performance evaluation ..................................................................................... 43 3.1.7 Putting it all together ................................................................................................. 44 3.1.7.1 Performance evaluation ..................................................................................... 44 4- Algorithm design extension .................................................................................................... 48 4.1 Data preprocessing ........................................................................................................... 50 4.1.1 Data restructuring ...................................................................................................... 50 4.1.2 Empirical mode decomposition ................................................................................ 51 4.1.2.1 Performance evaluation ..................................................................................... 52 4.1.3 Principal component analysis (PCA) ........................................................................ 56 4.1.4 Singular spectrum analysis (SSA) ............................................................................ 59 viii 4.1.4.1 Performance evaluation ..................................................................................... 62 4.1.5 Performance comparison with state-of-the-art methods for chirp model estimation 64 4.2 Data postprocessing ......................................................................................................... 70 4.2.1 Power threshold cutoff .............................................................................................. 71 4.2.2 Feature detection ....................................................................................................... 73 5- Seismic case study ................................................................................................................... 75 5.1 Input data set .................................................................................................................... 75 5.1.1 Input data source ....................................................................................................... 75 5.1.2 Data restructuring ...................................................................................................... 76 5.2 Algorithm structure and parameter setting ...................................................................... 78 5.2.1 Data preprocessing .................................................................................................... 78 5.2.2 Data processing ......................................................................................................... 81 5.2.3 Data postprocessing .................................................................................................. 81 5.3 Algorithm implementation and results ............................................................................. 82 5.3.1 Empirical mode decomposition ................................................................................ 83 5.3.2 Principal component analysis ................................................................................... 84 5.3.3 Singular spectrum analysis ....................................................................................... 87 5.3.4 Data processing ......................................................................................................... 90 5.3.5 Data postprocessing .................................................................................................. 93 5.3.5.1 Power threshold cutoff ....................................................................................... 93 5.3.5.2 Feature detection ................................................................................................ 94 5.4 Summary and comparative analysis of results ................................................................. 99 6- Conclusion ............................................................................................................................. 104 ix 6.1 Summary ........................................................................................................................ 104 6.2 Future work .................................................................................................................... 106 Bibliography .............................................................................................................................. 108 x List of tables Table 3-1 Performance comparison between STFT centroid method, and Aω = b IF solution. . 36 Table 3-2 Performance comparison between STFT centroid solution, and A!C!Aω = A!C!b IF solution. ......................................................................................................................................... 40 Table 3-3 Performance comparison between STFT centroid solution, and ((A!A+ C!)ω =A!b+ C!ω IF solution. ................................................................................................................. 42 Table 3-4 Performance comparison between STFT centroid solution, and (A!A+ αW!W)ω =A!b IF solution. ............................................................................................................................. 44 Table 3-5 Performance comparison between STFT centroid solution, and (A!C!A+ αW! +C!)ω = A!C!b+ C!ω IF solution. ............................................................................................... 45 Table 3-6 Performance comparison between STFT centroid solution, and ((A!C!A+ αW! +C!)ω = A!C!b+ C!ω IF solution. ............................................................................................... 47 Table 4-1 Performance comparison between EMD + WLSTR method, and EMD + STFT centroid frequency method. .......................................................................................................... 55 Table 4-2 Performance comparison between EMD + WLSTR method, and EMD + STFT centroid frequency method. .......................................................................................................... 64 Table 5-1 Comparison summary of compressional and shear wave characteristics. .................... 97 xi List of figures Figure 2.1 Simple harmonic motion projection x(t) of an object rotating on circle with uniform angular speed ω, where θ = ωt. ................................................................................................... 12 Figure 2.2 Harmonic motion represented in complex form where S t = A e! !!!! , x t =A cos ωt+ θ , φ t = ωt+ θ. .................................................................................................... 16 Figure 2.3 Simple harmonic signal spectrum. .............................................................................. 20 Figure 2.4: Illustration of the sifting process. (a) Original time series data example x(t). (b) Maxima spline function (red dashed line), Minima spline function (blue dashed line) and local mean function (thick solid black line). (c) Proto-IMF function that is the difference between original time series function x(t) and local mean function. ......................................................... 27 Figure 3.1 Data processing algorithm flow diagram. ................................................................... 31 Figure 3.2 Simple monotone signal s t = sin 2πf!t + noise amplitude and spectrogram plots. The monotone signal duration is only from 25 seconds to 180 seconds, and the rest of the interval is pure noise. ................................................................................................................................. 35 Figure 3.3 G ω function plot, where f! = 1. ............................................................................... 37 Figure 3.4 Top plot: IF estimation by solving the equation (A!C!A+ αW! + C!)ω = A!C!b+C!ω. Bottom plot: IF estimation using STFT centroid frequency. ............................................... 45 Figure 3.5 Down-chirp signal s t = chirp+ noise amplitude and spectrogram plots. The chirp signal duration is only from 25 seconds to 180 seconds, and the rest of the interval is pure noise. The average SNR is equal to 10 dB .............................................................................................. 46 xii Figure 3.6 Top plot: IF estimation by solving the equation (A!C!A+ αW! + C!)ω = A!C!b+C!ω. Bottom plot: IF estimation using STFT centroid frequency. The average SNR is equal to 10 dB .................................................................................................................................................. 47 Figure 4.1 Data preprocessing algorithm flow diagram. .............................................................. 49 Figure 4.2 Signal amplitude and spectrogram plots for s! t , and s! t chirp signals. ............... 53 Figure 4.3 IF estimation without applying EMD. Top plot: IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IF estimation using STFT centroid frequency method. ..................................................................... 54 Figure 4.4 Signal amplitude and spectrogram plots for both IMF1 and IMF2. ............................ 55 Figure 4.5 Top plot: IF estimation using EMD + WLSTR method. Bottom plot: IF estimation using EMD + STFT centroid frequency method. ......................................................................... 56 Figure 4.6 Signal amplitude and spectrogram plots after SSA processing for both IMF1 and IMF2. ............................................................................................................................................ 63 Figure 4.7 Top plot: IMF1 IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IMF2 IF estimation using STFT centroid frequency method. .......................................................................................................... 64 Figure 4.8 Signal amplitude and spectrogram plots for s(t), chirp signal. ................................... 67 Figure 4.9 The performance comparison between, proposed method (SSA+WLSTR), Method 1 proposed in [45], (see Figure 9 in [45]), and Method 2 proposed in [50] for a single linear chirp signal. ............................................................................................................................................ 68 Figure 4.10 Data postprocessing algorithm flow diagram. ........................................................... 71 xiii Figure 4.11 Top plot: IMF1 and IMF2 IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IMF1 and IMF2 IF estimation after a threshold cutoff is applied. ............................................................................... 73 Figure 5.1 Raw seismic data recording from HRFI station (Z component), which was recorded on 1th Jan 2015. The dotted line specifies the expert pick arrival time for the compressional waves. ............................................................................................................................................ 77 Figure 5.2 Extracted seismic data segment taken from the fourth compressional wave arrival. .. 77 Figure 5.3 Algorithm flow diagram, consisting of the three the main blocks. First is the Data Preprocessing, second is the Data Processing and third is the Data postprocessing. ................... 80 Figure 5.4 Example seismic event s!. ........................................................................................... 83 Figure 5.5 Plot of the first three IMF’s s!_!"#$, s!_!"#$ and s!_!"#$. ........................................... 84 Figure 5.6 Singular value plot for S!"#$%_!"#$ training set, with a threshold value as indicated. .. 85 Figure 5.7 Singular value plot for S!"#$%_!"#$ training set, with a threshold value as indicated. .. 85 Figure 5.8 Singular value plot for S!"#$%_!"#$ training set, with a threshold value as indicated. .. 86 Figure 5.9 Projected IMF signals s!_!"#_!"#$, s!_!"#_!"#$ and s!_!"#_!"#$. ................................ 87 Figure 5.10 Singular Value plot for IMF1 component s!_!"#_!"#$. ............................................. 88 Figure 5.11 Singular Value plot for IMF2 component s!_!"#_!"#$. ............................................. 88 Figure 5.12 Singular Value plot for IMF3 component s!_!"#_!"#$. ............................................. 89 Figure 5.13 SSA process output signals s!_!!"_!"#$, s!_!!"_!"#$ and s!_!!"_!"#$. ...................... 89 Figure 5.14 SSA process output signals s!_!!"_!"#$, s!_!!"_!"#$ and s!_!!"_!"#$ with a smaller Y-axis scale. .................................................................................................................................. 90 xiv Figure 5.15 PSD plot. (A) IMF1 envelope spectral functions. (B) IMF1 carrier spectral functions. (C) IMF2 envelope spectral functions. (D) IMF2 carrier spectral functions. (E) IMF3 envelope spectral functions. (F) IMF3 carrier spectral functions. ............................................................... 92 Figure 5.16 Instantaneous frequency estimation for IMF1, IMF2 and IMF3. The output IF estimation after the processing block. ........................................................................................... 93 Figure 5.17 Instantaneous frequency estimation after power threshold cutoff. ............................ 94 Figure 5.18 Instantaneous frequency plots (A) IMF1 compression wave onset pick of 29.96 sec. (B) IMF2 compression wave onset pick of 31.2 sec. (C) IMF1 shear wave onset pick of 47.1 sec. (D) IMF2 shear wave onset pick of 47 sec. .................................................................................. 98 Figure 5.19 Normalized cumulative number of arrival as a function of the absolute difference between our proposed algorithm and expert picks. .................................................................... 101 Figure 5.20 Probability density histograms of the difference between our proposed algorithm and expert picks. ................................................................................................................................ 102 xv List of abbreviations AS Analytical Signal AWGN Additive White Gaussian Noise DFRFT Discrete Fractured Fourier Transform EMD Empirical Mode Decomposition FFT Fast Fourier Transform FRFT Fractured Fourier Transform FT Fourier transform HT (H) Hilbert Transform IAF Instantaneous Angular Frequency IF Instantaneous Frequency IMF Intrinsic Mode Function PCA Principal Component Analysis PSD Power Spectral Density P-Wave Compressional Wave RMSE Root Mean Square Error SSA Singular Spectral Analysis STFT Short Time Fourier Transform SNR Signal to Noise Ratio S-Wave Shear Wave WLSTR Weighted Least Squares Tychonov Regularization with Quadratic Constraints WVD Wigner-Ville Distribution xvi Acknowledgements I would like to pay special acknowledgement to my supervisor Prof. Matthew J. Yedlin for his continuous support, patience and encouragement throughout my masters program. I would like to thank Dr. Yochai Ben Horin, SOREQ, Israel, for providing the data and giving assistance in interpreting the expert-picking database. I would also like to express my deepest gratitude to my mother Aziza, and my late father Hussein, may God rest his soul, for their love and advice throughout my life. A special thank you to my family, and loved ones for their help. Thank you to the Libyan - North American Scholarship Program who funded this work. xvii Dedication In honor of my Mother Aziza, and in memory of my late Father Hassain. 1 Chapter 1 1- Introduction As the number of seismic stations around the world rapidly grows, the need for automatic phase picking algorithms is increasing and requires the employment of large datasets. These algorithms need to distinguish between the different seismic phases, and accurately measure their onset. Different approaches to automate phase picking have been developed over the years. The goal of which is to develop a robust, and accurate automatic picking algorithm [34]. Allen in [30] defines the concept of the characteristic function, as the “function that is evaluated over segments of the seismogram, and identifies changes that correspond to the arrival of the seismic phase of interest.” Such functions can be energy and frequency functions, or the envelope functions. In this work, we take advantage of the instantaneous frequency property of any signal, and in particular seismic signals, to define a characteristic function to detect compressional and shear wave arrivals. The proposed algorithm provides phase arrival time estimates. We also measure the quality and the robustness of the algorithm picks, which is based on the instantaneous frequency function of the seismic signal. Instantaneous frequency is a time-varying parameter that defines the location of the signal spectral peak as a function of time. This thesis covers the development of an IF estimation accompanied with an algorithm implementation to estimate the instantaneous frequency of quarry blasts seismic traces, and to use this IF representation to accurately detect the compressional and shear wave arrival times. 2 Instantaneous frequency representation can sometimes have physical meaning. One example of this is when the signal is a monocomponent signal with only a single or a narrow band of frequencies at each time instant [1]. For multicomponent signals, the single instantaneous frequency value gives only an average frequency value at each time instant. This has very little physical meaning and hence, a decomposition of the signal is required to extract the several signal components or monocomponent modes present in the signal prior to estimating the instantaneous frequency of each mode. 1.1 Seismic phase picking Many seismology applications such as earthquake hypocenter determination [4], source mechanism analysis [5], and hydrocarbon reservoir imaging [6], require accurate compressional and shear wave arrival time picking. Traditionally different seismological organizations have relied on human expert manual picking of seismic wave arrival and labeling, or, at least reviewing picks that have been automatically picked [7]. However, in recent years multiple deployments of dense seismic arrays such as the K-net array in the region of Japan [8] and the USArray deployed in the USA [9], make manual picking impractical due to the huge data that these arrays are collecting. This gives rise to the need for reliable, efficient, and accurate automatic phase-picking algorithms. The field of seismic signal analysis and phase onset detection still remains an active field of research [34]. One of the first and still often used algorithms uses the method of average short-term and long-term ratio (STA/LTA) as a characteristic function[35], [31]. Other approaches utilizing different variations of the characteristic function where also used[36], [37]. More recently, research studies show that seismic waves are non-stationary, and hence, non-stationary methods that involve higher order statistics should be used for analysis [38]. 3 Compressional and shear wave onset picking are commonly achieved by expert seismologists. This could result in the introduction of error, dependent on the seismologist’s experience and personal assessment, and on the signal-to-noise ratio level of seismograms. This type of error was studied by Zeiler and Velasco [29], they compared how various institutions pick seismograms. They gathered data through the International Seismological Center (ISC) for five regions throughout the world. They presented measurement results from a single highly experienced analyst, and then presented measurement results obtained by different institutions that reported common arrival times. As far as we are aware this is the largest study of its kind. We use these results in our comparison in Chapter 5. In addition, we compare our compressional and shear wave time picking performance to a recently published paper by Bogiatzis and Ishii [34]. They propose a continuous wavelet decomposition algorithm for automatic detection of compressional and shear wave arrival times. In Chapter 5 we compare the results from both [29], and [34], with the results obtained using the proposed algorithm in this thesis. 1.2 Classical instantaneous frequency definition Gabor proposed a method for generating a unique complex signal “Analytical signal (AS)” from a real signal by using the Hilbert transform. Gabor’s method for obtaining the AS was achieved by first finding the Fast Fourier transform (FFT) of the real signal. He then suppressed the negative frequency amplitudes and multiplied the positive frequency amplitudes by two, demonstrating equivalence to the following time domain representation: ! ! = ! ! + ! ℋ ! ! = ! ! + !" ! , (1.1) 4 where ! ! is the analytical signal (AS), ! ! is the original real signal, and ! ! is the Hilbert transform ℋ of the signal ! ! with the Hilbert transform defined as ℋ ! ! = ! !! ! − !!!!! !". (1.2) The analytical signal can also be written in polar form as shown in equation (1.3). ! ! = ! ! !!" ! = ! ! + !" ! , (1.3) where ! ! represents the instantaneous magnitude with respect to time, and ! ! represents the instantaneous phase with respect to time. We can calculate the instantaneous magnitude ! ! and instantaneous phase ! ! from the analytical signal as follows ! ! = ! ! ! + ! ! ! , (1.4) ! ! = tan!! ! !! ! . (1.5) By differentiating the instantaneous phase ! ! with respect to time, we can obtain the instantaneous angular frequency (IAF) ! ! , and the instantaneous frequency ! ! as demonstrated in equations (1.6) and (1.7) respectively: 5 ! ! = ! ! !!" , (1.6) ! ! = ! !2! . (1.7) This work was previously presented in [65]. 1.3 Stationary and non-stationary signals When dealing with any time-series analysis, we are faced with the important question of determining whether the analyzed signal is stationary or non-stationary. A stationary time series signal loosely speaking is one whose statistical properties remain constant over time, whereas, non-stationary times series signals have changing statistical properties over time. Generally speaking, analysis methods used for stationary time series signals should not be used on non-stationary time series signals, as doing so runs the risk of obtaining completely misleading or meaningless results. When analyzing time series signals, there are two common and established methods or models of analysis, the probability model and the spectral model. Statisticians in particular lean more towards the probability model. This is because of the familiarity of concepts such as the mean and covariance values and the whole statistical process terminology and methods. However, there is an equivalent and parallel way of analyzing time series signals. The spectral or “frequency” approach is apparently a more natural approach when dealing with signal processing and engineering problems. This model determines how much energy is present in the time series signal as a function of frequency. However it is worth noting that the statistical model and the spectral model are two sides of the same coin [2]. First let us define some important concepts in the probability model. Consider a random process ! ! . Let ! !! , ! !! , …, ! !! denote the values of the random process observed at times !!, !!, …, !! respectively. Let the joint distribution function for this random process be 6 !! !! ,! !! ,…,! !! !!, !!,… , !! . If we shift the observation of the random process by time !, we obtain a different set of random process observations ! !! + ! , ! !! + ! , …, ! !! + ! and the joint distribution function for this data set is !! !!!! ,! !!!! ,…,! !!!! !!, !!,… , !! [3] A random process is said to be strictly stationary if this condition is satisfied !! !! ,! !! ,…,! !! !!, !!,… , !! = !! !!!! ,! !!!! ,…,! !!!! !!, !!,… , !! . (1.8) A more relaxed definition of stationary is wide-sense stationary (or second order stationary). A random process ! ! is said to be wide-sense stationary if the following conditions hold[3]: 1. For k = 1, we have !! ! ! = !! !!! ! !"# !"" ! !"# !. (1.9) In other words, if the first order distribution function of the random process is independent of time and hence the mean value is constant. 2. For k = 2 and τ = t!, we have !! !! ,! !! !!, !! = !! ! ,! !!!!! !!, !! !"# !"" !! !"# !!. (1.10) In other words, the second order distribution function of the random process depends only on the time difference and not on the absolute time value. Hence the covariance is only dependent on the time shift ! and not on the particular time at which the random process is observed. Let us now move to the connection between the probability model and the spectral model [2]. A time series ! ! is said to be stationary if it can be written as a discrete sum of sinusoids. 7 ! ! = !! exp ! 2!!!! + !!!∈! !"# ! !"#$%&' !"#$%& (1.11) It exists as a sum of elements, which have constant instantaneous amplitude !! and constant instantaneous frequency !!. As previously stated, a signal ! ! is said to be stationary (throughout the rest of this thesis we will refer to stationary as meaning wide sense stationary unless otherwise stated as strictly stationary) if both the mean and covariance or autocorrelation function E ! !! !∗ !! are independent of time. We can then show that the associated signal ! ! or its unique analytical signal equivalent are stationary only if its elements have constant instantaneous amplitude A! and constant instantaneous frequency !!. Thus, a time series is said to be non-stationary if one of these assumptions become invalid. To illustrate this with a simple example [3] let us consider the monotone sinusoidal signal ! ! = ! cos 2!!!! + ! . (1.12) Let us assume that both the amplitude ! and the center frequency !! are constant, and the phase angle ! is a random variable with a uniform distribution function over the interval –!,! . That is !! ! = 12! − ! ≤ ! ≤ !!"#$ !"#!$ℎ!"!. (1.13) First we calculate the mean value µ! as follows 8 !! = ! ! ! = ! ! !! ! !"!!! = !2! cos 2!!!! + ! !"!!! = 0. (1.14) Hence the mean value is constant and independent of time. Now we move into calculating the second order distribution using the autocorrelation function !! ! of ! ! as follows: !! ! = ! ! ! + ! ! ! = ! !! cos 2!!!! + 2!!!! + ! cos 2!!!! + ! = !!4! cos 4!!!! + 2!!!! + 2! !"!!! + !!4! cos 2!!!! !"!!! !! ! = !!2 cos 2!!!! . (1.15) The first term integrates to zero due to symmetry, and we are left with the term that only depends on the time shift ! and independent of the absolute time value. The results from equations (1.14) and (1.15) demonstrate that the sinusoidal time series ! ! is stationary when the phase ! is a random variable. However, if the amplitude ! or the frequency !! are random variables, the mean value in equation (1.14) will become time-dependent, and the first term in the covariance calculation equation (1.15) will be present. Therefore, the autocorrelation function !! ! will depend on absolute time !. 1.4 Monocomponent and multicomponent signals As stated earlier, the instantaneous frequency is a time-varying parameter (for non-stationary signals), which defines the location of the signal spectral peak as a function of time. This only gives physical meaning for a monocomponent signal with only a single frequency or a narrow band of frequencies at each time instant. Therefore, for multicomponent signals 9 consisting of several monocomponent signals, the direct instantaneous frequency estimation has no physical meaning. Since most real signals are multicomponent in nature, a decomposition of the signal into its dominant monocomponent signals is necessary prior to estimating the instantaneous frequency for each monocomponent signal. 1.5 Thesis objectives, novelty and intuition Our prime objective is to create a seismic signal-processing pipeline application to pick the arrival times of seismic compression waves (P) and shear waves (S) from noisy seismic traces generated by multiple quarry blasts. Accurate onset time picking is vital in constraining physical seismic models. This is especially important and is difficult for shear waves. The proposed pipeline consists of three segments: 1. The data pre-processing segment focuses on Empirical Mode Decomposition (EMD) as a method of deconstructing multicomponent signals into monocomponent signals, followed by the Singular Spectral Analysis (SSA) method as a denoising filter. 2. The data processing segment computes the instantaneous frequency (IF) for the denoised, intrinsic mode signals (narrowband signals which are the output of the EMD process) using weighted least squares and Tychonov regularization with quadratic constraints. 3. The data post-processing segment uses seismic signal IF estimation to pick the compressional and shear waves arrival times by examining each waves’ time-frequency characteristics to discriminate between P and S wave arrivals, whose onsets have different frequency components. 10 The embodiment of the novel contribution is the unique integration of the three segments described above to achieve the novel time picking of P waves and S waves from noisy quarry blast data. In Chapter 5 we look at algorithm implementation on real seismic data generated from quarry blasts and its application in seismic feature detection using the output instantaneous frequency estimation to detect compressional and shear wave arrival times. The intuition applied to develop the algorithm design space is based on the following: 1. Implementation of Empirical Mode Decomposition (EMD) to decompose seismic wideband signals into its primary narrowband components, which function like quasi-carriers, and enabling the application of classical IF estimation. In addition we derive a method for threshold cutoff selection based on the pre-signal noise level. 2. Integration of a data-driven noise compensated model with signal IF slope-weighted regularization and quadratic inequality constraints. This results in an algorithm well matched to the noisy quarry data target application. We derivative a penalty weight, !, for the IF slope (!"!" ) term that is dependent on the noise variance. The use of the L2 norm is based on the absence of frequency outliers in the seismic signals. 3. Noise versus signal power needed to quantify and establish the validity of the final results, based on noise power threshold. 1.6 Thesis organization The thesis is structured as follows. In Chapter 2, in the first section we start out with the basic instantaneous frequency definition, background theory, and elaborate on the history of instantaneous frequency and its limitations. In the second section we cover the Empirical Mode Decomposition method for multicomponent signal decomposition and describe the process used. 11 In Chapter 3, we begin the process of algorithm design, where we cover first the core algorithm (data processing algorithm) to obtain the instantaneous frequency. This chapter is divided into three parts, each of which deals with an individual term in the minimization problem and performance evaluation results associated with each term using synthetic data. In Chapter 4, we continue the algorithm design process adding in two additional processing blocks. The preprocessing block covers empirical mode decomposition, singular spectrum analysis, and principal component analysis. In Chapter 5, we perform a detailed case study where we present results on seismic data obtained from quarry blast seismic recordings from a quarry mine in Jordan. 12 Chapter 2 2- Theory 2.1 Instantaneous frequency theory Frequency is defined in machines as the number of oscillations per unit of time. A unique case is a simple harmonic motion, in which the acceleration is proportional to the displacement and points towards the equilibrium position [10]. A simple illustration of this is a body moving in a circle at a uniform speed. If we project the radius vector that represents the body movement on to the horizontal axis, we obtain a harmonic motion as show in Figure 2.1. At a particular time instant !, this projection has a displacement, velocity and acceleration as shown in equations (2.1), (2.2) and (2.3). Figure 2.1 Simple harmonic motion projection !(!) of an object rotating on circle with uniform angular speed !, where ! = !". 13 ! ! = ! cos !" , (2.1) !" !!" = −!" sin !" , (2.2) !!! !!!! = −!!! cos !" = −!!! ! . (2.3) Equation (2.3) is a second order differential equation. We can relate displacement to the frequency by solving Equation (2.3) and obtain a more general solution as follows ! ! = !! !!"# + !! !!!"# , (2.4) where !! and !! depend on the initial values of displacement and velocity. In what follows we will only consider the first term, which corresponds to positive frequency (counter-clockwise rotation). The simple harmonic motion can be used to describe some practical application such as the movement of a traveling waving inside a material (seismic wave traveling through the ground) in which the motion of the particles at a fixed point represents a harmonic motion with the number of waves passing through a point representing the frequency of the wave. As stated earlier in our introduction, and more generally speaking, any signal ! ! can be presented as a sum of weighted harmonics. We can find the contribution (coefficients) of each harmonic using the Fourier transform (FT), defined as ! ! = ! ! !!!"#!!! !", (2.5) where ! ! characterizes the whole signal ! ! . In turn, we can use ! ! to reconstruct the original synthetic signal as ! ! = ! ! !!"#!!! !", (2.6) where frequency spectrum ! ! is time independent.. To summarize, stationary signals can be represented as a weighted sum of sine and cosine harmonics, where the amplitude and phase are 14 time independent. However, for non-stationary signals both amplitude and/or phase can be time dependent. 2.1.1 History of instantaneous frequency Before we move ahead with defining the instantaneous frequency (IF) through the Hilbert Transform, we will first give a brief historical background on IF. A detailed examination of this history can found in Boashash [10]. In this section we will discuss particular fundamental historical milestones that will help in giving a better understanding of the definition. We start with Van Der Pol’s [11] definition of instantaneous frequency. He proposed an expression of the phase angle as an integral of the IF. In his efforts to formulate a definition for IF, he starts by analyzing an expression for a simple harmonic signal ! ! = ! cos !" + !! . (2.7) Van Der Pol uses the following nomenclature: ! is the angular frequency, !! is the phase constant, ! is the amplitude and the whole argument of the cosine !" + !! is the phase. He argues that defining the phase as the whole argument has the advantage of allowing for a consistent definition for phase difference between two signal oscillations with different frequencies. This phase difference is then simply a linear function of time, just as in one signal oscillation the phase is already a function of time. The approach taken by Van Der Pol begins with a definition of an amplitude modulation scheme, by making ! in equation (2.7) a function of time: ! ! = !! 1+!" ! . (2.8) In equation (2.8), ! ! is the modulating signal, ! is the modulation depth and !! is the amplitude constant. ! or ! ! in this case is the instantaneous amplitude. 15 In a similar manner, we can define the phase modulation by modulating the phase constant ! in equation (2.7) and thus obtain ! ! = !! 1+!" ! . (2.9) In addition, by substituting equation (2.9) into equation (2.7) we obtain ! ! = ! cos !" + ! ! = ! cos !" + !!!" ! + !! . (2.10) However, to define frequency modulation Van Der Pol noted that it would be wrong simply to substitute the angular frequency ! in equation (2.7) with ! ! = !! 1+!" ! . (2.11) By substituting equation (2.11) into equation (2.7), the resulting phase in equation (2.12) does not have the same format as equation (2.10), leading to the following physical inconsistency as shown: ! ! = ! cos ! ! ∗ ! + !! = ! cos !!! + !!!"# ! + !! . (2.12) Instead, he argued that by reformulating equation (2.7) as ! ! = ! cos ! ! !"!! + !! , (2.13) an equation similar to equation (2.7) is obtained, with frequency modulation encapsulated as ! ! . By substituting equation (2.11) into equation (2.13) we obtain ! ! = ! cos ! ! !"!! + !! = ! cos !!! +!!! ! ! !"!! + !! , (2.14) which is the right expression for frequency modulation oscillation 16 Van Der Pol relates the above formulation to the definition of instantaneous frequency. We note that a vector can represent any harmonic oscillation in the complex plane (Argand plane). The length of the vector represents the amplitude. The angle between the vector and the real axis represents the phase as demonstrated in Figure 2.2. We can find our simple harmonic signal in equation (2.7) by finding the projection of the complex vector on the real axis. Figure 2.2 Harmonic motion represented in complex form where ! ! = ! !! !"!! , ! ! =! !"# !"+ ! , ! ! = !"+ !. From the instantaneous phase, we obtain the angular velocity (or angular frequency) as the time derivative of the instantaneous phase. In the simple harmonic oscillation of equation (2.7), the instantaneous phase is derived from !" + !! . Therefore the angular frequency is obtained as follows: !"#$%!& !"#$%#&'( = ! !ℎ!"#!" = ! !" + !!!" = !. (2.15) We can also us this in our frequency modulation equation (2.14), where by differentiating the phase with respect to time we obtain the instantaneous angular frequency as 17 !"#$%"$%"&'(# !"#$%!& !"#$%#&'(= !!" !! 1+!" ! !"!! + !! = ! != !! 1+!" ! . (2.16) The next important step in the definition of instantaneous frequency was made by Gabor. In his paper titled Theory of Communication published in 1945 [12], Gabor proposed a method for generating a unique complex signal from a real signal. Before moving ahead, it is worth noting that this complex signal will only have physical meaning under certain conditions which will be discussed later. Gabor’s motivation for finding a complex representation of a signal was to obtain the central frequency moments for a signal. The n’th moment is defined for a signal ! ! with a frequency spectrum ! ! as !! = !! ! ! !!!! .!"! ! !!!! .!" . (2.17) This definition is not applicable if the signal ! ! is a real signal, as ! ! ! is an even function and hence all the odd moments would be zero. Only by using the complex representation of a signal is this possible to calculate it’s central frequency moments. To arrive at this complex signal representation, or as it is more formally known, the analytical signal (AS), we must first look at the simple harmonic case. Gabor noted that any sine or cosine functions maybe represented as a complex exponential (!!"#), which is a compact mathematical representation that is easy to work with. One such representation is 18 cos !" = ℜ! !!"# , sin !" = −ℜ! !"!!" . (2.18) The harmonics are represented by the real part of a single rotating complex vector. Taking this into account let us choose a simple real signal ! ! given by ! ! = ! cos !" + !"#$ !" . (2.19) We can write x(t) in complex form, ! ! , given by ! ! = ! ! + !" ! = ! − !" !!"# . (2.20) Equation (2.20) can be thought of as, adding an imaginary signal !" ! to the real signal ! ! to form this complex function ! ! . Alternatively, we can write equation (2.20) as ! ! = ! cos !" + ! sin !" + ! ! sin !" − ! cos !" . (2.21) By comparing equation (2.20) and (2.21), we observe that the function ! ! was derived from the real signal ! ! by the replacement of cos !" with sin !" and sin !" with −cos !" . In other words the function ! ! is in quadrature to the function ! ! . Therefore, when combining ! ! and !" ! we form the complex function ! ! . This is achieved systematically by replacing cos !" with !!"# and sin !" with −!"!"#. More generally, for any real signal ! ! , (not necessary a simple harmonic signal) we can obtain its analytical signal (complex function) ! ! by first representing ! ! as a real Fourier integral, and then replacing the cos !" and sin !" with !!"# and −!"!"# respectively. This is shown in equation (2.22). 19 ! ! = ! ! cos !" + ! ! sin !"!! !" ! ! = ! ! − j! !!!! !!"# !" . (2.22) An alternative way of finding the complex function arises from a deeper understanding of the previous process. First, we have to introduce an alternative representation of the basic harmonic function cos !" and sin !" , which is given as cos !" = 12 !!"# + !!!"# , sin !" = 1!2 !!!" − !!!!" . (2.23) The double-sided amplitude spectrum plot for one of our harmonic function is given in Figure 2.3. This is the same for both the cos !" and sin !" functions. We observe from Figure 2.3 that the two frequency peaks are at ! and –!. Therefore, when we represent a harmonic signal (cos !" for example) with a complex exponential !!"#, we are essentially suppressing the negative frequency component !!!"# in equation (2.23), and multiplying the positive frequency component !!"# by two. This important observation was made by Gabor and lent an easier way to obtain the analytical signal ! ! from our real signal. The exact process that was outline by Gabor in [12] was as follows, “suppress the amplitudes belonging to negative frequencies and multiply the amplitudes of the positive frequencies by two”. 20 Figure 2.3 Simple harmonic signal spectrum. Gabor also showed that we could obtain a time domain transformation between the real signal ! ! and complex signal j! ! using the inverse Fourier transform. The signal ! ! that is associated with ! ! can be obtaining using equation (2.24), with the equation derivation found in [13]: ! ! = 1! ! ! !"! − !!!! . (2.24) This integration, deconstructed as follows, is given by = lim!→! +!!!!! !!!!!!! , (2.25) which denotes the Cauchy’s principal value. In a similar manner we can obtain ! ! from ! ! ! ! = −1! ! ! !"! − !!!! . (2.26) The associated functions ! ! in equations (2.24), and ! ! in equation (2.26) are known as the Hilbert transform pairs. 21 Ville [14] combined the works of Gabor [12] and Carson and Fry [15] to define the instantaneous frequency using Gabor’s complex analytical signal representation. Any complex signal maybe considered as the result of the modulation of their envelope by a carrier, which is itself frequency modulated. We can define a signal ! ! using its complex signal unique equivalent as, ! ! = ℜ! ! ! !! !"#$(!) , (2.27) with ! ! defined as the envelope of analytical signal ! ! , with instantaneous phase !"#$ ! .. Equation (2.27) defines the signal ! ! as the real part of the amplitude modulation of !! !"#$ (!) by the envelope ! ! . Alternatively, ! ! can be defined as the real part of the frequency modulation of the envelope ! ! by the carrier !! !"#$ (!) . Therefore, we can define the instantaneous frequency as. !(!) = 12! !!" arg ! ! . (2.28) Let us consider two examples. For a simple monotone harmonic signal ! ! we can obtain its complex signal representation as ! ! = cos !" + ! , ! ! = !! !"!! , (2.29) with ! ! in this case being equal to one. The instantaneous frequency can therefore be derived from the phase, arg ! ! = !" + ! , as follows ! = 12! !!" arg ! ! = !2!. (2.30) As a second example, let us consider a modulated sinusoidal signal ! ! 22 ! ! = cos !" cos !" = 12 cos ! + ! ! + cos ! − ! ! . !ℎ!"! ! > ! > 0 (2.31) We can obtain its unique complex signal as follows. ! ! = 12 !! !!! ! + !! !!! ! = 12 !! !" + !!! !" !!"# = cos !" !!"# , (2.32) where the envelope ! ! = cos !" and the phase arg ! ! = !" and so the instantaneous frequency equals ! = 12! !!" βt = !2! . (2.33) The instantaneous frequency is the carrier frequency !!! and ! ! is the envelope as defined. Note the importance of the fact that the coefficients of ! in the exponential are positive; neglecting this would lead to the inversion of the roles of ! and ! and would produce absurd results. From the previous two examples Ville noted that the higher frequency value !/2! is the carrier frequency. This was not a random result, but rather was an outcome of the following proposition. For any real signal ! ! we can write a function ! ! [14] ! ! = ! ! !!"# !ℎ!"! ! > 0. (2.34) We think of ! ! as the frequency modulation of the signal ! ! by !!"#. This is essentially the frequency translation of the spectrum of ! ! by the amount !. At the point when all the negative frequency spectrum of ! ! is shifted to the positive frequencies side the function ! ! becomes the analytical complex signal ! ! [14]. This proposition is based on a more general fact: For a signal to be an analytical complex signal, for all values of t, its spectrum must have only positive frequency values with no negative frequency components [14]. 23 2.1.2 Interpretation of Hilbert generated instantaneous frequency and analytical signal Interpretation of the IF generated from Hilbert transform (HT) is often a subject of controversy. The most obvious of which is the fact that a real signal can be represented in complex form in many other ways. We can represent a signal with either an amplitude modulated (AM) signal and a constant carrier (phase), or, as a frequency modulated (FM) signal with a constant amplitude or a mixture of both. Although the analytical signal (AS) generated from the Hilbert transform is unique in giving rise to such complex expression as discussed earlier, this complex expression does not always correspond to any physical meaning. In this section we will cover the main conditions that a signal or the HT-generated AS must meet to give rise to a meaningful IF. The main two conditions as discussed by Boashash [10] are: A. The signal has to be a monocomponent (narrow band-pass) signal with a locally zero mean; B. The condition under which the Hilbert transform represents the quadrature component. The motivations for these conditions are described in what follows. A. Monocomponent and zero mean condition: Deriving the IF from the AS of a multicomponent signal makes little physical sense. This is because the computed IF is a result of the weighted average sum of all the frequency components at a particular instant. As for the zero mean criteria, we use a simple example to illustrate this condition. We chose the simple function ! ! , employed by Huang et al [1], given by ! ! = ! + cos !" , (2.35) where ! is a random constant. 24 The Hilbert transform can be found directly as ! ! = sin !" . (2.36) Using equation (2.28) to compute IF we obtain ! = 12! !(1+ a sin !" )1+ 2! cos !" + !! . (2.37) The IF computed from equation (2.37) is dependent on the constant !. To obtain the true IF value, which we know to be ! = !!!, we must set the constant ! in equation (2.37) to zero. Therefore, to obtain a meaningful IF estimation, the mean signal value must be equal to zero. Both these conditions can be satisfied, as we will see in the next section using the Empirical Mode Decomposition method. B. Hilbert transform condition: The conditions for this are summarized most successfully by the Bedrosian and Nuttall theorems [16]. The Bedrosian theorem is a condition that sets the limitation on the frequency separation of the HT of the carrier from the envelope. To illustrate this let us consider an FM signal ! ! given by ! ! = ! ! cos ! ! , (2.38) where ! ! is the time-dependent amplitude, and ! ! is the time-dependent phase. The corresponding AS for ! ! is ! ! = ! ! cos ! ! + !ℋ ! ! cos ! ! . (2.39) For the Hilbert transform separation described above to hold true, we must have that ℋ ! ! cos ! ! = ! ! ℋ cos ! ! . (2.40) The separation of the envelope is only possible if the Fourier spectrum of the envelope and the Fourier spectrum of the carrier are non-overlapping. In other words, the spectrum 25 ! ! = ℱ ! ! lies in ! < !! and the spectrum ℱ cos (!(!) lies in ! > !!. The proof for this can be obtained using Bedrosian Product Theorem found in [10]. As for the Nuttall Theorem which is discussed in [10] and [16], this theorem questions the conditions under which we can accurately compute the quadrature component using the HT, or more simply stated when could we write ℋ cos ! ! = sin ! ! (2.41) Considerable research has been conducted on this topic [16], [17], which try to identify the error measurement between the ideal quadrature and HT-generated quadrature. However, in practice for most real signals this error is negligible. 2.2 Empirical mode decomposition (EMD) Huang first introduced the empirical mode decomposition method in 1998 [1]. The motivation behind his work was that the Hilbert transform could only be meaningfully applied to a limited number of signals, which are narrow band-pass signals. As discussed in the previous section, to obtain a meaningful IF from the Hilbert transform, the signal has to be monocomponent with zero mean. One way to satisfy both conditions is by applying the empirical mode decomposition method. The empirical mode decomposition method assumes that for a typical time-series dataset, the data consists of many distinctive single intrinsic modes of oscillation. The data comprises many different coexisting modes at any given time, with each of the intrinsic modes superimposed on the remaining modes. Each of these oscillatory modes can be linear or non-linear, but will have the same number of extrema and zero-crossings. Furthermore, each of these intrinsic modes will be symmetrical with respect to the local mean. 26 To summarize, for an oscillatory mode to represent an intrinsic mode function (IMF) it must satisfy: A. The number of the extrema and of zero-crossings must be equal or have a difference of one at most. B. The mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero at any time point. In addition, it is also worth noting that each IMF represents the simple harmonic function counterpart, with the distinction that for the IMF, the amplitude and frequency are functions of time, whereas the harmonic function has a constant amplitude and frequency. Let us now move on to the procedure of extracting IMF’s from a given signal. 2.2.1 The sifting procedure Details of this procedure can be found in Huang et al [1]. In this section we will summarize the six main steps described in the list below: 1. First, we take the whole time series data ! ! , shown in Figure 2.4 (a), and locate all the local maxima points. We then connect all the local maxima points together using a cubic spline function. Then we identify all the local minima points and connect them also with a cubic spline function. 2. Then, we find the mean function for the local maxima and minima spline function for the whole data set, which is designated as !!.This is shown in Figure 2.4 (b). 3. We obtain the first proto-IMF function ℎ! as shown in Figure 2.4 (c), by subtracting the local mean function !! from our data set to obtain ℎ! = ! ! −!!. (2.42) 27 Figure 2.4: Illustration of the sifting process. (a) Original time series data example ! ! . (b) Maxima spline function (red dashed line), Minima spline function (blue dashed line) and local mean function (thick solid black line). (c) Proto-IMF function that is the difference between original time series function ! ! and local mean function. 4. We repeat steps 1 through 3 again taking our proto-IMF function as the original ! ! function. We do this by taking the ℎ! data set and identifying the maxima and minima cubic spline envelopes. Then we subtract the new mean function !!! from ℎ!, where the new mean function !!!is calculated from maxima and minima cubic spline envelops. We repeat this step ! number of times as shown in equation (2.43). The reason we repeat this step several times is because for a real signal we won’t be able to identify all the maxima and minima points from the first round. 28 ℎ!! = ℎ! −!!! . . ℎ!! = ℎ! !!! −!!! (2.43) After repeating the sifting processing ! number of times we obtain our first IMF which we will designate as !! = ℎ!!. The number of times we need to perform the sifting process in practice is between 4 to 8 times for each IMF component [18]. 5. After obtaining the first IMF !! which should contain the majority of the high frequency (shortest periods) components of the signal ! ! . We subtract !! from our signal ! ! to obtain the residual !!. !! = ! ! − !!. (2.44) 6. After obtaining our first IMF !! we repeat the steps 1 through 4 again to obtain the second IMF !! but using !! instead of ! ! in equation (2.42) as our input signal for the sifting process. This procedure is repeated k number of times to obtain all subsequent residual signal !! and the corresponding IMF’s !!. !! = !! − !! . . !! = !!!! − !! (2.45) The sifting process can be stopped finally when the components !! or !! become too small or the residual function !! becomes a monotonic function. 29 The original signal ! ! can be represented as ! ! = !!!!!! + !! (2.46) Thus, a decomposition of the signal into ! IMF functions is achieved, which satisfy the conditions needed for a meaningful HT. 30 Chapter 3 3- Algorithm design In this chapter and Chapter 4 we cover the algorithm design for instantaneous frequency estimation. The approach we take to examine the algorithm design in this thesis is based upon addressing the first of two broad issues when dealing with signal analysis. The first issue is solution stabilization. By this we mean the accuracy of the final solution, which is addressed in the data processing section of current chapter. The second issue is multicomponent signal decomposition and low signal to noise ratio. This will be covered in Chapter 4. Although in this chapter we focus only on the data processing section, the overall algorithm design consists of three main parts as follows 1. Data Preprocessing, which we will cover in Chapter 4; 2. Data Processing, which we will cover in this chapter; 3. Data Postprocessing, which we will cover in Chapter 4. 3.1 Data processing The way this section is organized is summarized in the flow diagram presented in Figure 3.1. Each processing block in the flow diagram represents a subsection of the current chapter, where we examine in detail the function of the process and the goals we aim to achieve. We also exemplify some of the processes with performance evaluation segments using synthetic data. 31 Figure 3.1 Data processing algorithm flow diagram. 32 3.1.1 Problem formulation To put it simply, our problem is taking a real signal ! ! , which can be non-stationary and nonlinear, and numerically using compute its instantaneous frequency. Using the Hilbert transform (HT) and consequently the analytical signal (AS), we can find a unique complex representation for our signal as follows ! ! = ! ! + !" ! = ! ! !!" ! !ℎ!"!,! ! = !! ! + !! ! !"# ! ! = tan!! ! !! ! . (3.1) We can then use equation (2.28) to find the Instantaneous Frequency (IF) from our instantaneous phase as follows !(!) = 12! !" !!" = 12! !!" tan!! ! !! ! , (3.2) !(!) = 2!" = ! ! !! ! − ! ! !! !!! ! + !! ! = ! ! !! ! − ! ! !! !!! ! , (3.3) where !! ! and !! ! are the first order derivatives of ! ! and ! ! respectively. ! ! is the envelope of the analytical signal ! ! computed in equation (3.1). To solve equation (3.3) numerically, we need to reformulate the equation into matrix format. To do this we first must sample our continuous time signal ! ! . Thus, obtain the discrete time version ! ! !" !, where ! and its corresponding discrete Hilbert transform ! are represented by vectors of length N. We can than write equation (3.3) as follows !" = !, (3.4) 33 where ! is a diagonal square matrix with ! dimensions whose elements represent the sampled squared envelope function !! ! . We can directly compute the diagonal elements of the matrix ! as follows ! = ! ! + ! !. (3.5) Furthermore, the measurement vector !, of length !, can also be derived as shown in equation (3.6). ! = ! ⋅ ∆!∆! − ! ⋅ ∆!∆! , (3.6) where ⋅ is an element-wise multiplication of two vectors. Both ∆!∆! and ∆!∆! are the first order derivatives of ! and ! with respect to time respectively. We will use the spectral method to compute the first order derivatives of ! and !. This turns out to be vital in reducing the first derivative error and obtaining accurate IF estimation. 3.1.2 Performance evolution criteria and intuition In this section we set out some of the intuition behind our selection of the simulated signals properties, and specify the error metrics used to evaluate the performance of our proposed method. We also stipulate the comparative algorithm (STFT) used here, and hope to clarify the reasoning behind our selection. 1. The criteria we used for selecting the synthetic signals are two properties: the frequency-band (0 Hz to 20 Hz), and SNR levels (0 dB and, 10 dB). These are similar to the properties of seismic traces that are generated by man-made events, more specifically, properties of quarry blast traces, which are of main interest to us as part of our case study in Chapter 5. 34 2. We compare the performance of our algorithms using two types of synthetic signals: constant frequency harmonic signals, and chirp signals. The intuition behind the former is to demonstrate the contribution each of the terms has in our proposed method has in stabilizing the solution. In particular, we focus on reducing the effect of noise, by incorporating the noise variance into the minimization problem using the ! and, !! parameters. The chirp signal is used later on in this chapter to demonstrate the effect of the frequency variance on the IF estimation. 3. We compare our proposed algorithm results to the classical time-frequency centroid spectral method (STFT) as point of a reference throughout the evaluation. The reason for our selection is that STFT is based on an alternative time-frequency technique for computing the IF, where as our proposed method is based on the analytic signal method. The STFT method is a well-established method for IF estimation, and we think is equivalent to our proposed model in terms computational complexity. 4. We use the root mean square error (RMSE) metric as the error evolution metrics through this thesis, which is given as follow(3.11)s: !"#$ = !! − !! !!!!! ! , (3.7) where !! represents the actual ideal frequency value, !! represents the estimated frequency value, and ! is the number of samples. 3.1.2.1 Performance evaluation In this section we evaluate the solution obtained by directly estimating the instantaneous frequency. We solve for ! using equation (3.4), by computing the matrix ! using equation (3.5) 35 and the vector ! using equation (3.6) using the one-sided finite-difference method to evaluate the derivatives. We now consider the following examples: 1. Synthetic signal. We use a constant frequency sinusoidal signal ! ! that is corrupted by zero mean AWGN represented in equation (3.8), where !! = 10 !". At the end of this chapter and the following chapter we compare the results from two constant frequency signals with different !"# levels (!"# = 10.0 !", !"# = 0 !", which represents worse case scenario with signal power equal to noise power.) ! ! = sin 2!!!! + !"#$% (3.8) Figure 3.2 Simple monotone signal ! ! = !"# !"!!! + !"#$% amplitude and spectrogram plots. The monotone signal duration is only from 25 seconds to 180 seconds, and the rest of the interval is pure noise. 36 2. Comparative algorithm. We compare the results obtained from our proposed algorithm to a time-frequency energy distribution method, based on the first frequency moment of the squares modulus of the short-time Fourier transform (STFT) using a Gaussian window. The algorithm uses a 32-sample window size with a 16-sample overlap. SNR (dB) RMSE !" = ! IF solution 10 3.631 STFT centroid frequency 10 0.026 Table 3-1 Performance comparison between STFT centroid method, and !" = ! IF solution. From Table 3-1, we observe that solving for IF directly using equation (3.4) produces an unstable result with a large IF error when compared to the true IF value. In addition, we find that the STFT centroid frequency method outperforms the direct solution approach. The STFT algorithm works by allocating the maximum PSD value for each time instance and then connects the corresponding frequency values to obtain the instantaneous frequency estimation. 3.1.3 Spectral method If we take a real signal x t and sample it at a sampling frequency f! we obtain a vector x of length N. The Fourier spectrum for x n can be found using Fast Fourier transform (FFT), which is denoted as X X k . ! ! = !!" ! ! (3.9) We also know that the relationship between a signal and its first order differentiation with relation to its Fourier transform is given by !" !!" !"#$%&$ !"#$%&"'( !"# ! . (3.10) Hence using both equations (3.11)(3.9), and (3.10), we can numerically compute ∆!∆! for our vector ! as illustrated in equation (3.11): 37 ∆!∆! = !""# ! ⋅ ! !ℎ!"!,! ! = !!!! 0 ≤ ! < !−!!! 2! − ! ! ≤ ! < 2! (3.11) with the function ! ! plotted in Figure 3.3. The first order differentiation function ! ! is dependent on both angular frequency !, and the sampling frequency !! used to sample data set ! ! . Figure 3.3 ! ! function plot, where !! = !. After computing and constructing both matrix !, which is given in equation (3.5) and vector ! computed from equation (3.6), we can now look into solving our linear equations using weighted least squares. 38 3.1.4 Weighted least squares First, let us reformulate equation (3.4) as a weighted least squares minimization problem, with the corresponding solution ! = arg min! !!!! !" − ! !!, (3.12) where !! is the weight matrix, !! is the second norm (Euclidean norm) squared and min! is notation that specifics the value ! that minimizes the objective function. The normal solution to equation (3.12) ! can be obtained from equation (3.13). !!!!!! = !!!!!. (3.13) To obtain more insight into the weighted least squares solution and the role of !!, we take a step back to equation (3.4), where in practice, equation (3.4) can be amended to include the noise in our measurement vector ! as follows. !" = ! − !, (3.14) where ! is the error (or noise) in our measurement vector !. This noise can be characterized by its mean ! and variance !!! for independent noise and !!"! (for dependent noise). ! = ! !! , !!! = ! !!! , !!"! = ! !!!! . (3.15) The variance scale value infers the following: A small !!! value implies a more accurate !! measurement, where as a large !!! value implies that the measurement values !! are less reliable. If we weight each equation by selecting the elements of the matrix !! in such away that (!!")! = !!!"! , where (!!")! are the corresponding elements of the matrix !!. The best-fit solution ! can then be calculated from equation (3.13), which takes into account these weights and the 39 reliability of each measurement value !!. We are able to show this is true for the more general case, where !! = Σ!! and Σ!! is the inverse of the noise covariance matrix [19]. Σ = ! !!! =!!!!!!!..!!!!!!!!!!!..!!!! … !!!!!!!!..!!! . (3.16) We now show that the choice of !! = Σ!! minimizes the expected error in our best-fit solution !. The normal equation for any choice of !! and the corresponding solution can be obtained as follows. !!!!!! = !!!!!. !ℎ!"ℎ !"#$%, ! = !!!!! !!!!!!! = !", (3.17) where ! = !!!!! !!!!!!, and !" = !!!!! !!!!!!! = ! is the identity matrix. We can also compute the covariance matrix for our output error ! − ! ; which is the output error in our estimates, when ! = ! − !" is the input error in our measurements. Therefore, !"#$"# !""#": ! − ! = !" − !" = !"# − !" = ! !" − ! = −!". (3.18) In a similar manner to equation (3.16) we can compute the covariance matrix ! of the output error ! − ! in equation (3.18) as follows. ! = ! ! − ! ! − ! ! = ! !"!!!! = ! ! !!! !! = !Σ!! (3.19) We can show that ! is as small as possible when the !! matrix used in ! is Σ!!. This gives the best linear unbiased estimate for the solution ! [19]. Therefore, if we select !! = Σ!! the corresponding matrix ! that is denoted as !∗ for distinction. The new covariance matrix ! can be calculated as follows. 40 ! = ! ! − ! ! − ! ! = !∗Σ!∗! = !!Σ!!! !!!!Σ!! Σ Σ!!! !!Σ!!! !! = !!Σ!!! !!. (3.20) Selecting any other value for !! will give a different value for !, which is expected. However the value of ! is smallest when the matrix !! used in ! equals Σ!!. The full proof for this can be found in Strang’s book computational science and engineering [19]. 3.1.4.1 Performance evaluation We repeat the same performance evaluation described in section 3.1.2 and use the same criteria. We solve for ! using equation (3.13) by computing the matrix ! from equation (3.5) and the vector ! from equation (3.6). As for the value of !!, we use a simple constant value diagonal matrix, where (!!)! = !!!!. This is done under the assumption that the noise is AWGN, where the value of !!! = ! !!! is computed from the pre-signal (noise only) section of the signal. SNR (dB) RMSE !!!!!! = !!!!! IF solution 10 0.0516 STFT centroid frequency 10 0.0268 Table 3-2 Performance comparison between STFT centroid solution, and !!!!!! = !!!!! IF solution. From Error! Reference source not found. we can compare the performance of both methods, where it is clear that the STFT centroid frequency method outperforms the weighted least squares method. However, there is great improvement in the IF estimation compared to the direct solution. 3.1.5 Quadratic constraints as a quadratic inequality constraints 41 3.1.5 Quadratic constraints as a quadratic inequality constraints In finding the solution for equation (3.13), we likely know the range of frequency values for the target solution. This additional information can be incorporated into the least squares minimization problem as a quadratic inequality constraint as follows: min! !" − ! !! !. ! !! − !! ! ≤ !!, !ℎ!"!, ! < ! < !, !! = ! − ! !4 ,! = ! + !2 , (3.21) where !, ! are the upper and lower frequency cutoffs respectively. For simplicity we assumed that frequency over the interval ! < ! < ! has a uniform distribution with mean ! and variance !!. By finding the Lagrange dual function for equation (3.21), and applying a penalty matrix !! (where !! = !"#$ !! ), we then solve for ! as follows: ! = !"#min! !" − ! !! + !!!! ! − ! ! ! (3.22) The normal equation and solution to equation (3.22) , ! can be obtained from equation (3.23): (!!! + !!)! = !!! + !!!. (3.23) In practice, to find the optimal values for the penalty matrix elements of !!, we run the solution through a loop until the constraints are satisfied [20]. This work was previously presented in [65]. 3.1.5.1 Performance evaluation In this section we present some synthetic signal results using only the quadratic inequality constraints as an additional term to our least squares minimization problem. This is to show the independent contribution this term brings in helping to stabilize the solution. The evaluation criteria used here is the same as the one use in section 3.1.2. 42 SNR (dB) RMSE (!!! + !!)! = !!! + !!! IF solution 10 0.0441 STFT centroid frequency 10 0.0268 Table 3-3 Performance comparison between STFT centroid solution, and (!!!+ !!)! =!!!+ !!! IF solution. The results obtained in Table 3-3 show only slight improvement in the evaluation metrics compared to the weighted least squares. However, that was somewhat expected. The quadratic constraints term required the solution to fall within a predefined frequency band, which in this case was mostly satisfied with weighted least squares term. 3.1.6 Least squares Tychonov regularization In equation (3.4) the matrix A has a determinant near zero. Therefore, in order to solve for the instantaneous frequency vector (!), we reformulate the problem by introducing the objective functional given by !(!) = !!!! !" − ! !! + ! !" !! (3.24) In equation (3.24), the first term represents the weighted least squares as previously derived in equation (3.12), and the second term represents the L2 regularization term, known as Tychonov regularization. The objective of the regularization term is to stabilize the solution for !. However, this requires a judicious choice of the numerical value of !. The solution (!) is obtained by minimizing !(!), i.e. ! = !"#min! !(!) = !"#min! !!!! !" − ! !! + ! !" !! (3.25) 43 The choice of ! for the minimization in equation (3.25) was previously derived in [64], and is given by ! = !!!2 ∆! ! ! !. (3.26) where !!! represents the noise variance, and ∆! is the sampling interval. For the application in this thesis we choose ! = ! 2∆!, where !, the first order central difference matrix, is given by, ! =01.. 0 0−10.. 0 00−1.. 0 0 … 00..1 0 00..0 1…00..−1 0 (3.27) The final normal equation that yields our solution, !, is given by, (!!! + !!!!)! = !!! !ℎ!"!, ! = !/4∆!! (3.28) 3.1.6.1 Performance evaluation In this section we conduct a performance evaluation with the additional Tychonov regularization term to our least squares terms, where we solve for ! in equation (3.28) using the value of ! from equation (3.26). As in the previous section, the same criteria and synthetic signal are used as described in section 3.1.2. We note that the introduction of the Tychonov regularization term significantly improved the performance of our proposed method solution, and outperforms the STFT method. 44 SNR (dB) RMSE (!!! + !!!!)! = !!! IF solution 10 0.0173 STFT centroid frequency 10 0.0268 Table 3-4 Performance comparison between STFT centroid solution, and (!!!+ !!!!)! =!!! IF solution. 3.1.7 Putting it all together Finally, we are at the stage of combining all the terms together to obtain one minimization objective function. By combining equations (2.12), (2.22), and (3.25), we obtain ! = !"#min! !!!! !" − ! !! + ! !" !! + !!!! ! − ! !! . (3.29) And to solve for ! we use the following normal equation !"#$%& !"#$%&'(: (!!!!! + !!! + !!)! = !!!!! + !!!. !ℎ!"!, !! =!! ! (3.30) 3.1.7.1 Performance evaluation In this section, we conduct a performance evaluation with all the terms combined where we solve for ω in equation (3.30). This evaluation is performed on two types of synthetic signals, constant frequency harmonic signals and chirp signals as shown below: 1. Constant frequency harmonic signal. The synthetic signal is represent by ! ! given in equation (3.8). The signal consists of constant frequency harmonic signal and AWGN. The results presented in Table 3-5 compares the performance of our proposed method with a classical STFT method at two SNR (0 dB, and 10 dB) levels. 45 Figure 3.4 Top plot: IF estimation by solving the equation (!!!!!+ !!! + !!)! = !!!!!+!!!. Bottom plot: IF estimation using STFT centroid frequency. SNR (dB) RMSE (!!!!! + !!! + !!)! = !!!!! + !!! IF solution 10 0.0173 STFT centroid frequency 10 0.0268 (!!!!! + !!! + !!)! = !!!!! + !!! IF solution 0 0.0205 STFT centroid frequency 0 0.0744 Table 3-5 Performance comparison between STFT centroid solution, and (!!!!!+ !!! +!!)! = !!!!!+ !!! IF solution. The results from Figure 3.4 and Table 3-5 are very encouraging. The performance of the weighted least squares Tychonov regularization method slightly outperforms the STFT centroid method. 2. Chirp signal We now present the simulation results for tracking a linear IF function. s t , which represents a chirp signal that is corrupted by zero mean AWGN as given in equation (3.31). 46 The results presented in Table 3-6 compares the performance of our proposed method with a classical STFT method at two SNR (0 dB, and 10 dB) levels. ! ! = sin 2! !! +!2 ! ! + !"#$% !ℎ!"!, ! = (!! − !!)(!! − !!) (3.31) !! = 15 !", !! = 5 !", !! = 25 !"#, !! = 180 !"#. Figure 3.5 Down-chirp signal ! ! = !"#$%+ !"#$% amplitude and spectrogram plots. The chirp signal duration is only from 25 seconds to 180 seconds, and the rest of the interval is pure noise. The average SNR is equal to 10 dB 47 Figure 3.6 Top plot: IF estimation by solving the equation (!!!!!+ !!! + !!)! = !!!!!+!!!. Bottom plot: IF estimation using STFT centroid frequency. The average SNR is equal to 10 dB SNR (dB) RMSE (!!!!! + !!! + !!)! = !!!!! + !!! IF solution 10 0.259 STFT centroid frequency 10 0.166 (!!!!! + !!! + !!)! = !!!!! + !!! IF solution 0 1.18 STFT centroid frequency 0 1.03 Table 3-6 Performance comparison between STFT centroid solution, and (!!!!!+ !!! +!!)! = !!!!!+ !!! IF solution. The performance of the weighted least squares Tychonov regularization method slightly underperforms when compared with the STFT centroid method, as shown Figure 3.6 and Table 3-6. However, we note that for the weighted least squares Tychonov regularization method most of the error in the IF estimation, is incurred at the edges. This edge effect is expected because of the sharp frequency change, which is heavily penalized. 48 Chapter 4 4- Algorithm design extension In this chapter we add and discuss two additional blocks to our algorithm and the motivation behind them. As we alluded to in the previous chapter, the goal for the algorithm expansion is to allow for the implementation of the algorithm on a wider class of signals. More specifically, we will be looking into empirical mode decomposition as a method to deconstruct multicomponent signals into monocomponent signals, which can then be analyzed to obtain the instantaneous frequency using the data processing method. In addition, due to the performance drop in the data processing block in a high noise environment, we will implement the Singular Spectral Analysis method as a denoising step to improve the SNR of the signal. Lastly, we introduce the postprocessing block later in this chapter, the aim of which is the extraction of certain features of interest from the signal under consideration. 49 Figure 4.1 Data preprocessing algorithm flow diagram. 50 4.1 Data preprocessing This processing block can be divided into four main sub-blocks as illustrated in Figure 4.1. The main goals for these processing blocks are: 1. Decompose a multicomponent signal into monocomponent intrinsic mode functions (IMF). This is achieved using Empirical Mode Decomposition (EMD). 2. The second goal is noise reduction using singular value decomposition. This is achieved using two methods: Principal Component Analysis (PCA) and Singular Spectrum Analysis (SSA). 4.1.1 Data restructuring The aim of this section is to organize the data set and introduce some notations. Frequently, in data analysis and algorithm design, we start out with a collection of data that was aggregated over time and stored. This data is then used to design an optimal data analysis algorithm. However, it has become good practice to split the data set prior to the analysis and design phase into at least two sets: 1. A training set that is used to build and train or develop the algorithm and find the optimal parameter values. 2. A test set that is only used as a performance measure for the algorithm. The training set usually contains most of the data to be processed, whereas the test set has a small fraction of the data and its main purpose is to avoid over-fitting of the model to the data. Specifically for this thesis application, the input data set used is a collection of seismic time series, where we refer to each time series as an event (or example) consisting of ! samples. In the data-restructuring block, we split the input data set into a training set denoted as !!"#$% that contains 80% of the events used. The !!"#$% set is constructed as a matrix with ! × ! 51 dimensions, where ! is the number of events used in the training, and ! is the length of each time series event. The second set is the !!"#!, which contains the remaining 20% of the events. The !!"#! set is also structured into a matrix with ! × ! dimensions, where ! is the number of events used in the test set, and ! is the length of each time series event which is the same as the training set. The general notation used here is, any data set is denoted by a capital letter (!) followed by a subscript denoting which set we are referencing to (i.e. training set, test set or portions of that set), and to which process this matrix is an output. For example, the training set matrix that is an output of the EMD process is denoted as !!"#$%_!"#. We also denote each event or equivalently each column in these matrices with a small letter (!) notation that is followed by a subscript of the column indices ! and the process it belongs too, for example !!_!"#. The focus in this chapter and the following chapter will be on the training set unless specified otherwise. 4.1.2 Empirical mode decomposition The process for decomposing a multicomponent signal into its IMFs was described in Chapter 2 using Huang et al [1] sifting process. The implementation of this process in this study was achieved using an algorithm design that was published in the Matlab Center File Exchange by Alan Tan [21]. The algorithm implements the sifting process to generate multiple IMF’s from a multicomponent signal. It is worth noting two points with regards to the implementation of the EMD process in this thesis: 1. The signal used in the EMD process is a multicomponent signal, and hence if the signal is a monocomponent signal the process is redundant; 52 2. The number of IMF’s selected for the IF analysis will depend on the relative power contribution of each IMF to the whole signal. This is exemplified in Chapter 5, where only the first three IMF’s are used for the IF computation. 4.1.2.1 Performance evaluation In this section we evaluate the performance of the Empirical Mode Decomposition processing block. The evaluation is based on the IF values computed using equation (4.1). The difference between equation (4.1) and equation (3.30) is the addition of the weighting factor λ, which is used to optimize the solution for each IMF. We follow the same evaluation criteria outlined in the previous chapter but only present results for signals composed of two chirp signals as follows. !"#$%& !"#$%&'(: (!!!!! + !"!! + !!)! = !!!!! + !!!. (4.1) The signal s t represents two chirp signals s! t , and s! t and corrupted by zero mean AWGN, represented by equation (4.2) and plotted in Figure 4.2, where SNR = 10 dB and, f!, f! for s! t and f!, f! for s! t represent the start and end frequency for each chirp signal, respectively. !! ! = sin 2! !! +!!2 ! ! !! ! = sin 2! !! +!!2 ! ! ! ! = !! ! + !! ! + !"#$%. !ℎ!"!, !! = (!! − !!)(!! − !!) , !! = (!! − !!)(!! − !!) (4.2) !! = 10 !", !! = 5 !", !! = 6 !", !! = 1 !", !! = 25 !"#, !! = 180 !"#. 53 Figure 4.2 Signal amplitude and spectrogram plots for !! ! , and !! ! chirp signals. Similarly, to the constant frequency harmonic signal we present the results for the IF estimation using both WLSTR and STFT for two cases: IF estimation without applying EMD and, IF estimation after applying EMD to the input signal. 54 A. IF estimation without applying the EMD processes. Figure 4.3 IF estimation without applying EMD. Top plot: IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IF estimation using STFT centroid frequency method. From Figure 4.3 we see that both algorithms are unable to correctly identify the true IF values (two constant frequency signals in this case), and instead estimate the average IF value. B. IF estimation after applying the EMD process. Figure 4.4 illustrates the signal amplitude and spectrogram plots for both !"#1 and !"#2, where it is apparent that the EMD algorithm block was successful in separating the two chirp signals !! ! and !! ! . 55 Figure 4.4 Signal amplitude and spectrogram plots for both IMF1 and IMF2. The IF plots in Figure 4.5 (Top) illustrate the IF estimation solution using the weighted least squares Tychonov regularization with quadratic constraints (WLSTR) algorithm. The IF plots in Figure 4.5 (Bottom) illustrate the IF solution using the STFT centroid frequency method. Methods SNR (dB) RMSE IMF1 RMSE IMF2 RMSE IMF2 EMD + WLSTR 10 0.28 (!=1) 0.11 (!=1) 0.03 (!=1e2) EMD + STFT 10 0.24 0.15 0.15 EMD + WLSTR 0 0.85 (!=1) 0.905 (!=1) 0.58 (!=1e3) EMD + STFT 0 0.98 1.21 1.21 Table 4-1 Performance comparison between EMD + WLSTR method, and EMD + STFT centroid frequency method. We note that for IMF2 in last column of Table 4-1 the error is reduced in our proposed method by increasing the value of λ. From Figure 4.5, and Table 4-1, we find that the 56 WLSTR method outperforms the STFT method for IMF2 estimation. This is more apparent at lower frequency (IMF2), and lower SNR levels (SNR = 0). Nevertheless, both methods see degradation in performance as the SNR decreases. This degradation in performance prompts the need for a filtering/denoising stage prior to performing the IF estimation. This is examined in the next section. Figure 4.5 Top plot: IF estimation using EMD + WLSTR method. Bottom plot: IF estimation using EMD + STFT centroid frequency method. 4.1.3 Principal component analysis (PCA) Principal component analysis is a well-established technique that has been employed successfully in feature reduction, data compression, and noise reduction. The three main features that underpin the PCA technique are, 1. The technique is non-stationary; 2. PCA is a data reduction technique and some of the original data will be lost; 3. The technique deconstructs a matrix ! into its orthonormal column space (principal components), and variance (eigenvalues) associated with each column vector. 57 Principal component analysis approximates a matrix ! as a sum of rank one matrices. As an example, let us take a matrix ! that is an !-by- ! matrix. Its singular value decomposition is given in equation (4.3). ! = !Σ!! , (4.3) where ! is the orthonormal column space matrix, !! is the orthonormal row space matrix, and Σ is the diagonal singular value matrix (standard deviation !) in descending order. An approximate representation of the matrix ! can be written as follow. ! = !! + !!…+ !!, (4.4) where ! = min !,! , and each !! matrix is rank one matrices. Furthermore, we can represent the matrix !!, given in equation (4.5) as the outer product of !! (!"ℎ column of !), with !!! (transpose of the !"ℎ column of !), and multiplied by the singular value !!. The contribution !! makes to the matrix ! is determined by the value of !!. !! = !!!!!!! , (4.5) Alternatively, we can think of equation (4.5) as the standard deviation (!!) of the data (!) along the !! dimension. A large !! value specifies that most of the data is present in the !! dimension, and vice versa. If we truncate the summation in equation (4.4) to ! terms, where ! < !. Then the matrix ! can be approximated as !! (rank ! matrix approximation to the matrix !) given by, !! = !! + !!…+ !! . (4.6) Subsequently, since the singular values !! in the matrix Σ are in decreasing order, the accuracy of this approximation increases as the rank increases. 58 One way to evaluate the value of ! is by setting a threshold value that represents a percentage of standard deviation !! for the data set along different dimensions. As we will see in Chapter 5, the seismic data is formatted into a matrix (!!"#$%) with each column represents a trace. We construct the matrix !!"#$%_!"#$% from the training set !!"#$% that represents the pre-signal noise section of each seismic event. The matrix !!"#$%_!"#$% has the same number of columns as !!"#$%. However, it only consists of the first 1000 rows (time points, which represent pure background noise. By performing the singular value decomposition on both matrices !!"#$% and !!"#$%_!"#$% we obtain !!"#$% = !!"#$%Σ!"#$%!!"#$%! !!"#$%_!"#$% = !!"#$%_!"#$% Σ!"#$%_!"#$% !!"#$%_!"#$% ! , (4.7) we then compute the threshold value using equation (4.7), where we sum the elements across the diagonal matrix Σ i.e. !ℎ!"#ℎ!"# = 1− !"# !"#$%&#' Σ!"#$%_!"#$% !"# !"#$%&#' Σ!"#$% . (4.8) The threshold value represents the percentage of noise standard deviation compared to the whole signal noise standard deviation. In practice the threshold values used are above 90%. The rank ! of the reduced matrix can then be computed by looping over the diagonal elements of the matrix Σ!"#$%, and summing the singular values in descending order, as illustrated in the algorithm below 59 Algorithm: Rank reduction Input: Threshold value from Equation (4.6), !ℎ!"#ℎ!"# Output: Rank of reduced matrix, ! Initialization: !,! = !"#$ Σ!"#$% , ! = min !,! For ! = 1 !" ! ! = !"# !"#$%&#' Σ!"#$% 1 !" !!"# !"#$%&#' Σ!"#$% If ! ≥ !ℎ!"#ℎ!"#, ! = !, End End. (4.9) This process is done for each IMF matrix separately. After which, we compute the denoised signal !!_!""#$% from the signal !! signal, as shown below. !!_!""#$% = !!"#$%_!"#$%"# !!"#$%_!"#$%"#!!! , !!"#$%_!"#$%"# =..!! ....!! .. … ..!! .. , (4.10) where !!"#$%_!"#$%"# is the reduced rank orthonormal column matrix, !! is a single seismic event, and !!_!""#$% is the same seismic event after denoising. 4.1.4 Singular spectrum analysis (SSA) Singular spectrum analysis (SSA) is implemented here as a denoise step for a time series signal. Although in the previous section PCA was also used as a denoising method, we will use SSA as an additional step to more effectively boost the SNR in a high noise environment. 60 Singular spectrum analysis is a non-stationary time series spectral estimation method. It combines the works of Karhunen [22] and Loève [23]in spectral decomposition and Mañé [24] and Taken [25]in an embedding theorem. This is also a well-established method for noise reduction of a time series signal and a full description of the process can be found in [26]. The following steps are implemented in the algorithm. 1. First step: Embedding The time series is mapped into a sequence of lagged vectors of size !, by forming ! lagged vectors as shown in equation (4.11), where ! = ! − ! + 1 and ! is the length of the time series signal ! [26]. The embedded matrix ! is then given as ! =!!!!..!!!!!!..!!!! … !!!!!!..!! , ! = !!, !!, !!… , !! . (4.11) 2. Second step: Singular value decomposition This implements a similar technique used in principal component analysis to find a reduced rank matrix !!"#$%"# from the embedded matrix !. The value of ! is obtained by setting a threshold that represents a ratio between, the actual denoised data standard deviation present in the signal, and the whole signal standard deviation that includes noise. For our times series signal !, we assume that the first ! samples are pre-signal noise only (channel noise), which we denote as !!"#$%. However, prior to performing the SSA method on the whole signal !, we first preform the embedding step on the noise !!"#$% to obtain the matrix !!"#$%, which is given in equation (4.12): 61 !!"#$% = !!, !!, !!… , !! !!"#$% =!!!!..!!!!!!..!!!! … !!!!!!..!! . (4.12) We now compute the singular value decomposition of the matrix !!"#$% = !!"#$%Σ!"#$%!!"#$%! and obtain Σ!"#$%. Σ!"#$% is then used as input in computing the threshold cutoff value as follows. !ℎ!"#ℎ!"# = 1− !"# !"#$%&#' Σ!"#$%!"# !"#$%&#' Σ! , (4.13) where the Σ! matrix is the singular value matrix computed from the input signal embedded matrix ! = !!Σ!!!!. The threshold value in equation (4.13) represents the percentage of the denoise signal standard deviation compared to the whole signal plus noise standard deviation. The rank ! of the reduced matrix !!"#$%"# can then be computed by, looping over and summing the singular values along the diagonal of the matrix Σ! in descending order, as illustrated in the algorithm below: Algorithm: Rank reduction Input: Threshold value from Equation (4.12), !ℎ!"#ℎ!"# Output: Rank of reduced matrix, ! Initialization: !,! = !"#$ Σ! , ! = min !,! For ! = 1 !" ! ! = !"# !"#$%&#' Σ! 1 !" !!"# !"#$%&#' Σ! If ! ≥ !ℎ!"#ℎ!"#, ! = !, End End. (4.14) 62 The reduced rank matrix !!"#$%"# is then derived as follows !!"#$%"# = !!Σ!_!"#$%"#!!! Σ!_!"#$%"# = !!0.. 0.00!!.. 0.0 … 00..!!.0 00..0.0…00..0.0 . (4.15) 3. Third step: reconstruction Finally, we transform the matrix !!"#$%"# back into a time series signal using the diagonal averaging method [26]. The implementation of the singular spectral analysis in this thesis is done using an algorithm published at MATLAB Center File Exchange by Andreas [27]. However, the rank reduction process, described in step two, is an added modification to the original algorithm and implemented as a part of this research. 4.1.4.1 Performance evaluation In this section we implement the EMD and SSA processing blocks, without any PCA block. This final process is applied prior to the data processing step, which estimates the instantaneous frequency. As was the case in the previous performance evaluation section, we compare the performance of the weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method equation (4.1), and the short time frequency transform (STFT) centroid frequency method. 63 We use the same criteria as described in section 4.1.2.1. The same signal s t , previously analyzed, represents two chirp signals s! t , and s! t corrupted by zero mean AWGN and is represented by equation (4.2). . After applying the EMD process (Figure 4.4), and then the SSA process to the input signal s t we obtain IMF1 and IMF2, which is given in Figure 4.6. Figure 4.6 Signal amplitude and spectrogram plots after SSA processing for both IMF1 and IMF2. From Figure 4.7, and Table 4-2, we find that the SSA+WLSTR method outperforms the STFT method for both IMF1 and IMF2 estimations. We also note that for IMF2, in last column of Table 4-2, the error is reduced in our proposed method by increasing the value of λ. In addition, by comparing the results from Table 4-1 (without SSA) with the results from Table 4-2, the SSA process improves the overall performance of the algorithm for IMF1, especially at high SNR levels. However, the slight degradation in performance for IMF2 is due to the over regularization for low frequency components 64 Figure 4.7 Top plot: IMF1 IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IMF2 IF estimation using STFT centroid frequency method. Methods SNR (dB) RMSE IMF1 RMSE IMF2 RMSE IMF2 EMD + WLSTR 10 0.2 (!=1) 0.09 (!=1) 0.06 (!=1e2) EMD + STFT 10 0.24 0.15 0.15 EMD + WLSTR 0 0.74 (!=1) 1.006 (!=1) 0.81 (!=1e3) EMD + STFT 0 0.98 1.21 1.21 Table 4-2 Performance comparison between EMD + WLSTR method, and EMD + STFT centroid frequency method. 4.1.5 Performance comparison with state-of-the-art methods for chirp model estimation Many natural signals are not stationary, ranging from radar and biomedical signals to seismic measurements and human speech, may be modeled as signals with instantaneous frequencies (IF) that vary slowly over time [54]. An investigation of the harmonic chirp model 65 [55 -58] found that these models were more accurate than the traditional harmonic models in many practical scenarios where the IF of the signal is not stationary. The formulation of efficient estimation algorithms for chirp signal models has received a lot of attention in the literature. These algorithms focused on estimating the start frequency, and frequency rate for signals only containing a single chirp. Some of the methods described in the literature use maximum likelihood formulation [46], least squares minimization [47] , sample covariance matrix estimation techniques [48] ,as well as integrated cubic phase function estimators [49] . Alternatively, Fourier based time-frequency estimation methods, like the Wigner-Ville distribution, and the reassigned spectrogram methods, are used as rough initial estimates, which are refined using image processing techniques to fit a linear chirp model [59]–[61]. These methods also tend to generally perform better for a wider range of signal models [51]. However, methods such as the reassignment spectrogram method typically require large data sets for high accuracy IF localization [62]. These nonparametric methods typically suffer from poor resolution and high variance (as observed on spectrograms), but have a computational efficiency advantage. Alternatively, parametric methods often have better resolution and accuracy. However, they suffer two major disadvantages [51]: typically they require some a priori knowledge of the signal, and sometimes have a convergence-problem due to poor initialization. When using the chirp model the parameters are the amplitudes of the harmonics, the start frequency, and the fundamental chirp rate. Most of these works assume that the amplitude of the linear chirp signals is constant or normalized. Such signals are characterized by the phase 66 function, the instantaneous frequency, which is a function of the starting frequency and the frequency rate [45]. To perform the comparison, we compare our proposed IF estimation model (SSA+WLSTR), using the chirp signal model, with two parametric methods: The first method published in 2018, by X. Meng, A. Jakobsson, et al [45], proposes a parametric model amplitude and IF estimation for chirp signals, were they introduce the predefined dictionary Z!, to solve for the unknown parameters. The second method published in 2004, by L. Qi et al [50], presents a parameter estimation of multicomponent LFM signal based on the discrete fractional Fourier transform (DFRFT). We use the amplitude modulated chirp signal model defined in [45], where the starting frequency is !! = 0.105 !", with a frequency rate of ! = 2×10!! !"/!, sampling rate of 1 !", and the AM part of the signal defined by ! ! The analytical representation in Figure 4.8 is given as follows. ! ! = ! ! sin 2! !! +!2 ! ! + !"#$%. !ℎ!"!, ! = (!! − !!)(!! − !!), ! ! = 0.5 sin 0.06! + 0.16 + 0.04! + 0.9 (4.16) !! = 0.105 !", !! = 0.2 !", !! = 50 !"#, !! = 150 !"#. 67 Figure 4.8 Signal amplitude and spectrogram plots for ! ! , chirp signal. We evaluate the estimation performance of the proposed algorithm using the root mean squared error (RMSE). The results are presented in Figure 4.10, which combine the simulation results from our proposed algorithm (SSA+WLSTR) with the simulation results of the two alternative methods. 68 Figure 4.9 The performance comparison between, proposed method (SSA+WLSTR), Method 1 proposed in [45], (see Figure . 9 in [45]), and Method 2 proposed in [50] for a single linear chirp signal. From Figure 4.10, we observe that our proposed algorithm considerably underperforms when compared to the current state-of-the-art (Model 1) [45]. However, we are not far from the performance level of Model 2 [50]. In attempt to justify and clarify our results, we present the underpinning assumption for the above methods. The fractured Fourier transform method (FRFT) is interpreted as a decomposition of signal into orthonormal linear frequency modulated (LFM) functions in the fractional Fourier domain, in which it is considered a time-frequency analysis method and has close relationships with other time-frequency analysis tools (e.g. WVD). The discrete FRFT (DFRFT) implementation uses the decomposition algorithm proposed in [63], which is derived from the FRFT definition. This algorithm decomposes the computation of DFRFT into a convolution, which can be computed by FFT. It is worth noting that the numerical computation of the DFRFT is much more complicated than that of discrete Fourier transform (DFT) [50]. 69 The Model 1 proposed in [45] employs a parametric model using a weighted combination of splines to model the time-varying nature of the signal amplitudes. In addition a complex dictionary refinement technique is employed to obtain high accuracy IF estimations, as observed from our results. The Model 1 algorithm predefines the dictionary search parameters for the chirp signal model, and restricts its solution space. The authors achieved this high accuracy by increasing the number of dictionary parameters (for spline modeling), which increased the computational complexity (large-dimensional matrices) that would be exasperated with large data size. The authors utilized a parallel-distributed optimization framework (ADMM) to overcome these issues. As noted earlier, under the assumption stated for each algorithm, the results using Model 1 represent the state-of-the-art results in IF estimation. Moreover, the DFRFT method proposed in [50], and our proposed method also works well under the same assumptions. It should be noted that unlike both methods, our proposed method does not make a prior assumption on the signal form (chirp model and LFM), and finds the parameters directly from the time series, and is not based on the approximation using the spline basis and the dictionary used in Model 1. The resulting low RMSE for Model 1 is not surprising, because this estimation is not extracted directly from the very noisy waveform. The advantage Model 1 algorithm has is the application of modeling the amplitude using the spline basis functions resulting in the increased accuracy of estimating the frequency parameters. As can be seen, the SSA+WLSTR estimate can achieve low RMSE in high SNR situations, and also performs as well as the DFRFT (Model 2) method for lower SNRs, though the proposed method does suffer an error-offset from the DFRFT method. The reason, we think, for this offset between our proposed method and DFRFT is the high susceptibility of our model to error due to directly differentiating the signal. Finally, It 70 should be stressed that the Model 1 estimate assumed that the phase model of the chirp signal is quadratic in time. In contrast, our proposed method does not utilize any a priori information, and is not based on a specific signal model. We further note that as the DFRFT is implemented using a discrete form algorithm, it will suffer the effects of its resolution limitation. It can be concluded that our proposed method can provide an effective way to estimate the frequency parameters simultaneously, although there is considerable work to be done to improve the accuracy to the level matching the state-of-art models. 4.2 Data postprocessing In this last section, we go through the two processing blocks used in the postprocessing section, which is outlined in the algorithm flow diagram in Figure 4.10. The output instantaneous frequency from the data processing section is used as input postprocessing section. This section is application specific, meaning that, the feature detection process in this case, is specifically designed for the problem of seismic wave (compressional and shear) onset detection. 71 Figure 4.10 Data postprocessing algorithm flow diagram. 4.2.1 Power threshold cutoff The application of the power cutoff algorithm to the output Instantaneous frequency estimation, is achieved through the following steps: 1. Compute pre-signal (channel noise) instantaneous signal power. Assuming the input signal is ! ! , and the first !! samples are only noise, denoted as !!! ! , we compute the instantaneous signal and noise power as follows: 72 !"#$%& ! ! !"#$% !" = 20 log ! ! !"#$% !!! ! !"#$% !"!"#$% = 20 log !!! ! (4.17) 2. Set threshold value and evaluate !". The threshold value is set to the maximum power value !"!"#$%, which is calculated from equation (4.18), shown below. !ℎ!"#ℎ!"# = max !"!"#$% !"#$ !!!!"#!!"# !" ! ! ∶ !" ≤ !ℎ!"#ℎ!"# (4.18) 3. The instantaneous frequency estimation is filtered using !!!!"#!!"#, where all IF values with indices !!!!"#!!"# are omitted. This is illustrated in the bottom plot of Figure 4.11. The power threshold cutoff step is performed separately for each instantaneous frequency IMF output. To help illustrate this, we use the same synthetic signal (two constant frequency signals) The computed output IF for each IMF is then filtered using the power threshold cutoff process to produce the filtered (or cutoff) IF estimation. This is illustrated in Figure 4.11, which shows the IF estimation before and after the threshold cutoff is applied. 73 Figure 4.11 Top plot: IMF1 and IMF2 IF estimation using weighted least squares Tychonov regularization with quadratic constraints (WLSTR) method. Bottom plot: IMF1 and IMF2 IF estimation after a threshold cutoff is applied. 4.2.2 Feature detection In this section, the objective is to define features of interest that can be extracted from the seismic data set. As stated at the postprocessing introduction, the method used to extract these features is application and feature dependent. Therefore, this section will not contain a specific algorithm; rather, the definition and characteristics for these features are given. In Chapter 5 a specific detection algorithm is given, with some onset phase picking results. The compressional and shear waves are types of seismic phases that are the two main modes of propagation of acoustic energy in solid material. In the case of a compressional wave, the material particles oscillate in the direction of the propagating wave, and the speed at which the compressional wave travels through a medium depends on the bulk elasticity modulus. The oscillation of a shear wave occurs perpendicular to the direction of the propagating wave, and the 74 speed of shear wave propagation depends on the shear elasticity modulus. The compressional wave is the fastest wave to propagate through the solid material. Hence it is the first to arrive at the detection station. In addition, the compressional wave frequency spectrum occupies a higher frequency band than the other seismic phases (only specifically for quarry blast seismic events). Therefore, if we can accurately detect this frequency spike, we can estimate the arrival time of the compressional wave. However, since shear waves travel slower than compressional waves, they arrive later. The shear wave frequency spectrum also occupies a high frequency band that is slightly lower than the compressional wave frequency band. Using these characteristics for the compressional and shear wave, a method is developed in Chapter 5 to detect the phase onset picking using the filtered IF output. 75 Chapter 5 5- Seismic case study In this chapter we cover the implementation of the algorithm design discussed in Chapters 3 and 4. The chapter is divided into the following sections: 1. Input data set - This section describes the raw data set source and type, as well as data restructuring method and notations. 2. Algorithm structure and parameter setting - We cover the overall algorithm structure, bringing together all the processing blocks discussed in the previous chapters. We also explain the parameters settings for the final algorithm; 3. Algorithm implementation and results - In this section, a detailed discussion of the algorithm implementation process is given, with a summary of the results at the end. 5.1 Input data set 5.1.1 Input data source The data set is taken from a collection of seismic events recorded over a two-year period, using a three-component seismometer. The recording station HRFI, located at latitude 30.04° ! and longitude 35.04° !, recorded a three-component sequence of 338 quarry blasts, shot by the Jordan Phosphate Mines Company from January 2015 to March 2016. The geographic centroid of these blasts has an estimated location 29.9° ! and longitude 36.3° !. The station HRFI, located in southern Israel, approximately 50 km north of Eilat, is part of the NDC Cooperating National Facility (CNF as defined by the Comprehensive Test Ban Treaty Organization) seismic 76 network, which is part of the Israeli Seismic Network operated by the Geophysical Institute of Israel. HRFI is home to a three-component broadband STS-2 seismometer connected to Quanterra digitizer with a sampling rate of 40Hz. 5.1.2 Data restructuring The raw daily seismic data recordings are segmented into 338 events using the compressional wave (P-wave) arrival time picks as reference points. The P-wave arrival time picks were estimated by an expert geophysicist, and denoted as !!"#$% !"#$. The seismic data segments are extracted from the raw daily seismic data recording, by setting the segment start time to !!"#$% !"#$ − 30 seconds, and the stop time to !!"#$% !"#$ + 50 seconds. The resulting seismic data segment then consists of 3200 samples. Figure 5.1 illustrates a daily seismic recording example from the HFRI recording station, on the 1th of January 2015. The compressional wave arrival time picks (!!"#$% !"#$) are also highlighted on Figure 5.1, where the fourth phase arrival from Figure 5.1 is extracted and shown in Figure 5.2. As described in the data restructuring section of Chapter 4, the total set of 338 seismic events are split into two sets. The training, used to train the algorithm and optimize the parameter settings, consists of 80% of the events, or 250 seismic events. The test set consists of the remaining 20% of the events. This set is only used to evaluate the general performance of the algorithm. We also band-pass filter each of the seismic events using a seventh order Butterworth filter, with a frequency band between 0.1 and 16 Hz, prior to the EMD process. 77 Figure 5.1 Raw seismic data recording from HRFI station (Z component), which was recorded on 1th Jan 2015. The dotted line specifies the expert pick arrival time for the compressional waves. Figure 5.2 Extracted seismic data segment taken from the fourth compressional wave arrival. 78 5.2 Algorithm structure and parameter setting Illustrated in Figure 5.3 are the main three processing blocks. In this section we cover each of the processing blocks separately, and emphasize the main parameter setting for each block. The term parameter is used loosely here to mean the general algorithm settings. This includes some standard parameter settings, such as, the Tychonov regularization parameter !, in addition to non-standard parameters setting such as the training set matrix size or the number of IMF components used in the analysis. 5.2.1 Data preprocessing In Chapter 4 we covered the data-preprocessing algorithm in detail, which is summarized in the left flow diagram block of Figure 5.3. Other than the restructuring block, this section consists of an additional three blocks, which will be discussed here. 1. Empirical mode decomposition (EMD). This processing block decomposes a multicomponent signal into its monocomponent intrinsic mode functions (IMF). The main parameter setting here is the number of IMF’s chosen for analysis, which is set to the first three IMF’s, denoted as IMF1, IMF2 and IMF3. 2. Principal component analysis (PCA). PCA is one of the measures used for noise reduction, in addition to SSA. The main parameter settings for PCA are: the matrix size for both !!"#$% and !!"#$%_!"#$% , as given in equations (4.7). The !!"#$% matrix is of 3200 by 250 dimensions, and the corresponding !!"#$%_!"#$% matrix has dimensions of 1000 by 250. 3. Singular spectrum analysis (SSA) The goal here is further noise reduction using the singular spectrum analysis method. Similar to PCA, the main parameters are the size of the embedded matrices ! and !!"#$% 79 given by equations (4.11), and (4.12). For !, the number of time series samples is ! = 3200, and if we set the number of rows ! = 1024, we obtain ! = ! − ! + 1 = 2177, which is the number of lagged columns. In an analogous way for the !!"#$% noise matrix, where has ! = 1000, and we set ! = 900 to obtain ! = ! − ! + 1 = 101 lagged columns. 80 Figure 5.3 Algorithm flow diagram, consisting of the three the main blocks. First is the Data Preprocessing, second is the Data Processing and third is the Data postprocessing. 81 5.2.2 Data processing The data processing block, middle block in Figure 5.3, is the next process, which is used to compute the instantaneous frequency for each intrinsic mode function. This processing block function is described in Chapter 3. Therefore, in this section, we only highlight the main parameter settings used for the case study: 1. Compute !! and !. To compute !! and !, we first find Σ!"#$%_!"#$% from the singular value decomposition of the matrix !!"#$%_!"#!" found in equation (4.7). Then we find !! and ! using the following equations !!! = !"# !"#$%&#' Σ!"#$%_!"#$%! !! = Σ!! = (!!! ∗ ! (!"#$%&%' !"#$%&))!! ! = !!! ! !. (5.1) 2. Compute !! !! is found in the quadratic constraints term in equation (3.30), which is used to solve for the instantaneous frequency. To compute !!, we set the values of ! = 0.1 !" and ! = 16 !" in equation (3.21). 5.2.3 Data postprocessing The post signal-processing block is the following step in the data processing stream. The main goal for this process is to extract information regarding certain features of interest. This section is divided into two subsections 82 1. Power threshold cutoff The power threshold cutoff process, is used to filter the IF estimation as described in Chapter 4. The size for noise matrix !!"#$%_!"#$% has already been set in the PCA section, which is the only parameter setting. Section 4.2.1 presents the steps to compute the filtered IF output. 2. Feature detection The two main features we aim to detect are the first arrival times for the compressional and shear waves. The compressional wave is the first to be detected, and therefore it is the fastest wave to travel through ground. It usually occupies a higher frequency spectrum than any other seismic phase. Thus, when analyzing the IF for a seismic signal we are able to distinguish a compressional wave onset, as the first high frequency peak on an IF plot. The shear wave is the second fastest wave to travel through ground after the compressional wave. A high frequency spectrum also characterizes the shear wave. However, the average frequency spectrum values are less than that of the compressional wave. In practice, compressional wave detection using the signal IF is much easier to accomplish than the shear wave. This is due to the multiple compressional wave echoes that conceal the shear wave arrival. This will become evident as we go through the implementation of the algorithm in the next section. 5.3 Algorithm implementation and results In this section we go through each of the processing blocks shown in Figure 5.3 and exemplify some intermediate and final results of the algorithm. In doing so, we first start with our training set of 250 segments of seismic events, which are used mainly for noise 83 characterization. We then take a single event from the training set to illustrate and compute results. In our case to explain the results we will use the seismic event given in Figure 5.4,, which was one of the events recorded from the HFRI station (Z-component) on the 1st January 2015. Since this is the fourth event or equivalently the forth column in the training matrix, we denote this as !!. 5.3.1 Empirical mode decomposition The input for this processing block is a multicomponent signal, which for our case is the seismic signal event !! in Figure 5.4. The results obtained are three dominant and unique monocomponent signals, the Intrinsic Mode Functions (IMF), which for our seismic event are !!_!"#!, !!_!"#! and !!_!"#!, as shown in Figure 5.5. Figure 5.4 Example seismic event !!. 84 Figure 5.5 Plot of the first three IMF’s !!_!"#$, !!_!"#$ and !!_!"#$. 5.3.2 Principal component analysis To demonstrate the function and simplify the results for this processing block we split it into two steps. First, we perform PCA on the training set matrices !!"#$%_!"#!, !!"#$%_!"#! and !!"#$%_!"#!. For this we estimate the threshold value for each IMF matrix in our training set using equation (4.8). Then using equation (4.9) we compute ! (the reduced rank) for each IMF matrix. The figures shown below are the singular values log-linear plots for each IMF matrix. For example, Figure 5.6 illustrates the singular value log-linear plot for the !!"#$%_!"#! training matrix, which has a threshold value of 0.9778 resulting in the rank reduction of the IMF1 training matrix from 250 to ! = 150. Similarly, we present the equivalent results for the !!"#$%_!"#! training matrix in Figure 5.7, and the !!"#$%_!"#! training matrix in Figure 5.8. 85 Figure 5.6 Singular value plot for !!"#$%_!"#$ training set, with a threshold value as indicated. Figure 5.7 Singular value plot for !!"#$%_!"#$ training set, with a threshold value as indicated. 86 Figure 5.8 Singular value plot for !!"#$%_!"#$ training set, with a threshold value as indicated. The next step is to project the IMF seismic events components !!_!"#!, !!_!"#! and !!_!"#!, that are shown in Figure 5.5, into the reduced rank matrix space using equation (4.10). The resulting IMF seismic signals are denoted as !!_!"#_!"#!, !!_!"#_!"#! and !!_!"#_!"#!, which are illustrated in Figure 5.9. 87 Figure 5.9 Projected IMF signals !!_!"#_!"#$, !!_!"#_!"#$ and !!_!"#_!"#$. 5.3.3 Singular spectrum analysis The singular spectrum analysis process as discussed in the previous chapter serves a similar function as the PCA method, in that we are trying to filter the signal and reduce the noise prior to estimating the instantaneous frequency. Furthermore, because the SSA process incorporates an SVD stage that is applied to the embedded matrix (Chapter 4), the threshold value and the reduced rank ! are also computed as part of the SSA process. The input for this process are the three IMF signals !!_!"#_!"#!, !!_!"!_!"#! and !!_!"#_!"#!, and after following the steps outlined in the SSA process by first converting the time series signal into a time-lag matrix in the embedding step, an SVD step follows. We use equation (4.13) to compute the threshold value, and equation (4.14) to compute the reduced rank ! for each IMF embedded matrix. This is illustrated in Figure 5.10, Figure 5.11, and Figure 5.12 for !!_!"#_!"#!, !!_!"#_!"#!, and !!_!"#_!"#! respectively. 88 Figure 5.10 Singular Value plot for IMF1 component !!_!"#_!"#$. Figure 5.11 Singular Value plot for IMF2 component !!_!"#_!"#$. 89 Figure 5.12 Singular Value plot for IMF3 component !!_!"#_!"#$. The output of these processes are also three IMF signals denoted as !!_!!"_!"#!, !!_!!"_!"#! and !!_!!"_!"#!, as illustrated in Figure 5.13 below. Figure 5.13 SSA process output signals !!_!!"_!"#$, !!_!!"_!"#$ and !!_!!"_!"#$. 90 To better visualize the output signals !!_!!"_!"#!, !!_!!"_!"#! and !!_!!"_!"#! in Figure 5.13. The plot in Figure 5.14 shows the same output signals with a smaller y-axis scale. Figure 5.14 SSA process output signals !!_!!"_!"#$, !!_!!"_!"#$ and !!_!!"_!"#$ with a smaller Y-axis scale. 5.3.4 Data processing The data-processing block is used to solve for the instantaneous frequency, the details of which are found in Chapter 3. The inputs to this processing block are the three IMF components s!_!!"_!"#$, s!_!!"_!"#$ and s!_!!"_!"#$. However, before moving ahead and solving for the output IF, we first recap an important point made in Chapter 2, in relation to the condition under which the Hilbert transform closely represents quadrature component. 91 For a complex analytic signal the polar representation can be written as follows. ! ! = ! ! !!!" ! , (5.2) where the envelope ! ! , and the carrier phase ! ! , must have sufficient spectral separation for ! ! to be valid analytical signal. If the power spectral density (PSD) for the envelope and carrier functions are non-overlapping then the condition in section 2.1.2 is satisfied. The PSD plots for the envelope and carrier spectral functions, for our example, seismic signal are illustrated in Figure 5.15. The plots demonstrate the spectral frequency band occupied by each function for each IMF, where we assume that only 90% of the signal power is used to determine the frequency band for each function. The results from Figure 5.15 demonstrate the following results. For the IMF1 signal there is no overlapping between the envelope and carrier spectral functions, with a small frequency gap of 0.2 Hz. However, for IMF2 and IMF3 the envelope and carrier spectral functions overlap, where for IMF2 the overlap is around 0.6 Hz, and for IMF3 the over overlap is around 0.5 Hz. The exact frequency overlap value depends on the seismic signal. However, after analyzing the entire seismic training set we observe these results. For the IMF1 signal there is rarely any overlap between the envelope and carrier spectral functions with a varying frequency gap. For the IMF2 and IMF3 signals the opposite is true and the envelope and carrier spectral functions overlap with varying overlap bandwidth. The effect of this frequency overlap is the resultant distortion of the Hilbert transform. This distortion causes significant error in the IF estimation. In summary, we conclude that the IMF1 signal has the most reliable IF estimate. It closely resembles real seismic signal frequency characteristics. However, both IMF2 and IMF3 have some degree of distortion in their IF estimates. This distortion is inversely proportional to the overlapping frequency width of the envelope and carrier spectral functions. 92 Figure 5.15 PSD plot. (A) IMF1 envelope spectral functions. (B) IMF1 carrier spectral functions. (C) IMF2 envelope spectral functions. (D) IMF2 carrier spectral functions. (E) IMF3 envelope spectral functions. (F) IMF3 carrier spectral functions. 93 We now run the processing block to obtain the IF for each of the IMF component as shown in Figure 5.16. Figure 5.16 Instantaneous frequency estimation for IMF1, IMF2 and IMF3. The output IF estimation after the processing block. 5.3.5 Data postprocessing This processing block is optional and application dependent. In other words, the processing done here is to extract certain features or characteristics about the signal under analysis using the estimated instantaneous frequency. In our case we are analyzing seismic signals and our goal is to be able to detect the arrival of two types of waves, which are the compressional and shear waves. 5.3.5.1 Power threshold cutoff In order to facilitate the detection process, we first apply a simple power threshold cutoff process using the noise power level, the details of which were covered in the previous chapter. 94 The input to this processing blocks are the three instantaneous frequency estimation for IMF1, IMF2 and IMF3 as shown in Figure 5.16, and the output is the filtered instantaneous frequency plot for each IMF as given in Figure 5.17. Similar work in using the power threshold cutoff method was presented in [65]. Figure 5.17 Instantaneous frequency estimation after power threshold cutoff. 5.3.5.2 Feature detection Now we come to the feature detection step, where we aim to detect the compressional and shear waves onset time. In many seismology applications such as earthquake hypocenter determination [4], source mechanism analysis [5] and hydrocarbon reservoir imaging [6], in which accurate compressional and shear wave arrival time picking is required. Traditionally, different seismological organizations have relied on human expert manual picking of seismic wave arrival and labeling, or at least reviewing picks that have been automatically picked [7]. However, manual picking is not totally objective and is prone to error. The accuracy of the phase 95 picking depends on several factors such as the distance from station to source, signal-to-noise ratio (SNR), sampling rate, filters, and waveform shape (sharp or emergent). In [28] Diehi et al argue that a significant number of errors are made by the network analyst in picking phase onsets, Zeiler and Velasco[29] show similar phase picking error results from several seismological centers. Zeiler and Velasco[29] estimate that the average root mean square error (RMSE) in onset arrival time is around 0.435 seconds for P-wave arrivals, and 5.53 seconds for S-wave arrivals. The development of automatic algorithms for different phase onset picking has been approached by researchers such as Allen (1978) on automatic earthquake recognition [30], Bear and Kardolfer (1987) on automatic phase picker for local and teleseismic events [31], Taylor et al (2011) on arrival time estimation from seismic waves using a manifold-based approach [32], Baillard et al (2014) on automatic P and S wave picking for local seismic networks [33] and Gentili and Michelini on application of neural networks and high-order statistics for arrival picking [44]. However, the problem still remains a non-trivial one and an active field for research[34]. The seismic data used in this thesis is from a quarry blast mining facility, as detailed at the beginning of the chapter, and is noted for being very noisy and difficult for shear wave onset detection. The source of this difficulty in detection is outside the scope of this thesis, but we will briefly mention some of these issues later on. What we propose in this section is a method through the inspection of the IF curve given in Figure 5.17 to estimate onset time picks for each of the arrivals. Figure 5.18 illustrates the compressional and shear wave portion of the IF estimation in Figure 5.17 and the predicted onset times obtained by using this method. Compressional waves (P-waves) are much easier to detect when compared to the shear waves. They travel faster than any other wave and are the first to arrive at the monitoring station. 96 P-waves have a higher energy in comparison to shear waves, making them easier to detect. Compressional waves also have a sharp onset shape compared to the shear waves, which have an emerging onset characteristic. Although compressional waves are comparatively easier to detect, a high non-stationary noise environment still present a challenge. We now investigate a method of P-wave onset and S-wave onset detection that utilizes the instantaneous frequency characteristics of a seismic signal. The top two plots of Figure 5.18 (A) and (B) only focus on the P-wave arrival section of the instantaneous frequency estimates for both IMF1 and IMF2. As the compressional wave front arrives, we initially observe a small fluctuation in the IF. The IF value then gradually increases to a maximum frequency value. The compressional wave onset is calculated by averaging the arrival time of the first trough and peak in the initial fluctuation. This value is estimated at 29.96 sec from IMF 1 curve and 31.2 sec form IMF 2. In comparison, the expert analysts P-wave onset time pick is 30 sec, where the time pick value is rounded up to the nearest integer value. Shear wave onset detection is a notoriously difficult problem to solve [29]. The main difficulty in shear wave detection arises from the multiple wave echo arrivals after the initial compressional wave arrival (also know as coda waves). They are due to the multiple phase reflection and back scattering as the wave travels through the different layers of the ground. The coda waves arrive after the compressional wave and start with high amplitude and frequency, which then gradually decrease as a function of time. In some cases, the coda waves obscure the shear wave arrival making the shear wave detection difficult. This is especially true when the shear wave and compressional wave arrive very close to each other. The proposed shear wave onset detection method utilizes the instantaneous frequency estimation to overcome some of theses detection problems. The shear wave IF plots for IMF1 97 and IMF2 are illustrated in Figure 5.18 (C) and (D) respectively. As demonstrated within, we identified all the local maxima points and connect them with a straight-line function. Similarly, we identified all the local minima points and connect them with a straight-line function as well. As the shear wave arrives at the detector station two things occur to the IF curve. Firstly, the local maxima IF value gradually increases until it reaches a local peak frequency value. This is highlighted in Figure 5.18 (C) and (D) with a positive line slope. Secondly, the local minima IF value decreases until it reaches a local minimum IF value. This is also highlighted in Figure 5.18 (C) and (D) with a negative line slope. The shear wave onset is calculated by averaging the arrival time of the first trough and peak, which is located at the start of the positive and negative line slope section. The estimated shear wave arrival time from the IMF1 is 47.1 sec and for IMF2 is 47 sec. In comparison, the expert analysts shear wave onset time pick is 46 sec, where the time pick value is rounded up to the nearest integer value. The seven main waveform characteristics for the P- and S-wave detection are summarized in Table 5-1. P-wave detection S-wave detection 1. Sequence of arrival First wave to arrive Second wave to arrive 2. Signal energy High energy wave Medium energy wave 3. Frequency band Highest frequency band Medium frequency band 4. Signal shape Sharp wave arrival shape Emerging wave arrival shape 5. Detection problems High noise environment Coda wave interference 6. Detection method Detection using signal peak frequency Detection using signal envelope slope 7. Wave onset pick Average of first trough and peak arrival Average of first trough and peak arrival Table 5-1 Comparison summary of compressional and shear wave characteristics. 98 Figure 5.18 Instantaneous frequency plots (A) IMF1 compression wave onset pick of 29.96 sec. (B) IMF2 compression wave onset pick of 31.2 sec. (C) IMF1 shear wave onset pick of 47.1 sec. (D) IMF2 shear wave onset pick of 47 sec. 99 The P-wave onset detection is an easier problem principally because of the sharp wave arrival shape characteristic. In contrast, the S-wave has an emerging shape characteristic that makes onset detection difficult. In addition, due to the similar characteristics that S-waves and coda waves share in energy and frequency, mislabeling coda waves as S-waves is also a common detection issue. 5.4 Summary and comparative analysis of results In this chapter we covered the implementation of the proposed instantaneous frequency estimation algorithm on quarry blast seismic data. We first explained the multiple parameter settings used for this specific data set. Then we walked through a step-by-step implementation of the algorithm. We presented the results obtained at each stage, and gave insight into the analysis method as well as practical aspects of the algorithm implementation. For the final feature detection stage, we used the IF results obtained for our seismic signal and attempt to apply an alternative method for compressional and shear wave onset detection. Studying earthquakes or explosions and modeling the source location process of any event is based on three main steps, which define the process of picking seismic phases: identification or detection, classification or naming, and finally measurement or timing. Specifically, arrivals from seismograms can be associated with a particular event and then used to solve for the origin time, location, and the magnitude [29]. Extensive research has been conducted in seismic phase picking and its associated products. However, some studies [29] argue that the field lacks common datasets, and little recent published work has been performed on estimating errors from picking and misidentifying seismic phase. The results of Zeiler and Velasco[29] compared how various institutions pick seismograms, and obtained data from various networks for five regions 100 throughout the world. Their findings estimate that the average RMS P-wave time picking error is 0.43 seconds, while the RMS, S-wave time picking error is around 5.53 seconds. Alternatively, the method in [34] proposes a continuous wavelet decomposition algorithm for automatic detection of compressional and shear wave arrival times. The authors apply the automatic picking algorithm to data recorded by the High Sensitivity Seismograph Network of Japan. This network consisted of more than 750 stations distributed throughout the Japanese islands. Their findings estimate that the average RMS P-wave time picking error is 0.53 seconds, while the RMS, S-wave time picking error is around 1.71 seconds. Following the same criteria for error measurement as in [34], we measure the error between our proposed algorithm for compressional and shear wave picking and manual expert picking. To assess the performance of our proposed algorithms, we calculate the time-picking difference (Error) between, our proposed algorithm time-picks, and the export’s time-picks. Of the 338 available quarry blast traces, we exclude the very noisy traces, and traces that have an arrival times greater than 60 sec. This reduces the examined the dataset size to 246 and 50 for training and test sets, respectively. Figure 5.19 presents the Normalized cumulative number of arrivals as a function of the absolute difference between our proposed algorithm and expert picks, we observe that eighty percent of P-wave (Test set) picks, are within 2 sec of the expert picks, and are within 3.4 sec for S-wave (Test set). 101 Figure 5.19 Normalized cumulative number of arrival as a function of the absolute difference between our proposed algorithm and expert picks. The picking error-measurement for our proposed algorithm for both compressional and shear arrival times and, for both training and test data sets are presented in Figure 5.20. The Error is defined as the difference between our proposed algorithm’s picks and the expert’s picks. We observe from Figure 5.20, that the P-wave (Test set) mean-error is 1.4 sec with a standard deviation of 0.7 sec, and the S-wave (Test set) mean-error is 1 sec with a standard deviation of 3.1 sec. 102 Figure 5.20 Probability density histograms of the difference between our proposed algorithm and expert picks. In addition, the average RMS P-wave (Test set) time picking error is 1.58 sec, and 3.3 sec for the S-wave (Test set). Therefore, our results for shear wave time picking error are lower than, the shear wave RMS error reported between institutions in [29]. We acknowledge that the approach used in [34] outperforms our proposed algorithm, and the automatic picks are in better agreement with manual picks. However, we would like to point out some factors that could explain this variation in performance. The seismic data source, presented in [29] and [34] are all regional earthquake (< 200 km) sources. They are typically much larger in magnitude and easier to detect and measure, given their higher SNR. In comparison, quarry blast data, present a much more challenging problem. They are considerably 103 lower in magnitude than earthquake events, with much lower SNR and therefore, are harder to detect and measure. In addition, the automatic phase picking algorithm proposed in [34] utilizes a network consisting of more than 750 stations (sampling frequency of 100 Hz, and 27 bit resolution), which are installed inside boreholes at depths greater than 100 m below ground. This setup facilitates the acquisition of high resolution, and high SNR data. In comparison, the quarry blast data was recorded on one station (sampling frequency of 40Hz, and 24 bit resolution), resulting in lower SNR data. Finally, the results from our proposed algorithm expose slightly large discrepancies between the algorithm S-wave picking, and the expert picking. This can be attributed to several factors: the simultaneous arrival of different phases, in particular, surface waves [34] and P- waves and their coda sometime overlap with S-wave arrivals, further complicating the S-wave arrival time detection. 104 Chapter 6 6- Conclusion 6.1 Summary The algorithms presented in this thesis for automatic compressional and shear wave arrival picking perform at a level that is comparable to that of human analysts. Applying the EMD method allows for the abstraction of the IMF components of the signal. Then working in the time domain provides the means to distinguish the phase arrival from noise. We note that the shear wave arrival detection is especially difficult. This is due to the compressional wave coda arrivals overlapping with the shear wave arrivals. However, our proposed algorithm shows good performance even when the noise level is high. In this thesis we began with an introduction of the concept of instantaneous frequency, in which we first introduced the analytical signal and complex representation of a signal. We also introduced and discuss two categories of signals: Stationary and non-stationary signals, monocomponent and multicomponent signals. In Chapter 2 we presented some background history on classical instantaneous frequency definitions and its evolution, starting from Van Der Pol and the definition of AM and FM signals to Ville and his definition of the instantaneous frequency using the AS. At the end of Chapter 2 we described the conditions under which the Hilbert transform of a signal accurately represents the quadratic component of that signal. The key point we emphasize is that for the Hilbert transform to accurately representation the quadratic component of a signal the envelope and phase signals must have sufficient spectral separation. Equivalently, for the IF estimation of an AM-FM signal to accurately represent the 105 frequency of the carrier part of the signal (FM), the frequency spectrum for the AM and FM parts of the signal must not overlap. Chapter 2 is then concluded with the empirical mode decomposition method for multicomponent signal decomposition into its IMF’s, which allows for meaningful instantaneous frequency representation of a multicomponent signal. In the algorithm design chapters (Chapter 3 and 4) we proposed a numerical approach to estimating the instantaneous frequency for a multicomponent non-stationary signal. This is achieved by reformulating our problem into a weighted least squares Tychonov regularization problem with quadratic constraints, where we expanded on the advantages of each term in our minimization problem. We also present the Empirical Mode Decomposition (EMD) method to decompose seismic wideband signals into its primary narrowband components, which function like quasi-carriers, and enabling the application of classical IF estimation. In Chapter 3 we applied an optimal penalty weight, α, to the IF slope (!!!" ) penalty, that is dependent on the noise variance. This penalty was effective in reducing the error in the IF estimation that was due to noise. We also compared the results obtained from this proposed (SSA+WLSTR) method, using the chirp signal model, with two parametric methods: The first method published in 2018, by X. Meng, A. Jakobsson, et al [45], proposes a parametric model amplitude and IF estimation for chirp signals, where they introduce the predefined dictionary Z!, to solve for the unknown parameters. The second method published in 2004, by L. Qi et al [50], presents a parameter estimation of multicomponent LFM signal based on the discrete fractional Fourier transform (DFRFT). We concluded that although the first method outperformed our proposed method, the method was much more complex and thus more computational expensive, especially with large datasets. We concluded by stating that although our proposed method can 106 provide an effective, and efficient way for IF estimates, there is still considerable work to be done to improve the accuracy to the level-matching and data-fitting state-of-art models. In Chapter 5 we investigated the application of our algorithm on quarry blast seismic signal analysis. We focused on the implementation aspect of the algorithm for real quarry blast seismic data. For the final feature detection stage, we used the IF results obtained from our proposed method to estimate the compressional and shear wave arrival times. We concluded this chapter by comparing the performance of our proposed algorithm to the results obtained by Bogiatzis and Ishii in [34], and point to some factors that could explain the variation in performance. We finally concluded by showing that the performance of our proposed algorithm is comparable to that of human analysts that is presented by Zeiler and Velasco in [29]. Although not implemented for this thesis, other applications such as sonar signal analysis, and ultrasonic imaging in biomedical engineering, can utilize our proposed method for IF estimation. 6.2 Future work Future research work to develop and improve our proposed method can be identified into two areas. 1. IF Data processing: More accurate parameter selection techniques can be utilized in optimizing the solution. More accurate assumptions regarding the noise model employed for Tychonov regularization parameter could be researched, which would increase the stability of the solution and produce more accurate results. 2. Feature Detection: Developing on the feature detection algorithm proposed, as part of the case study would greatly improve the detection accuracy for onset phase picking. In addition, for larger seismic data sets there is greater potential for machine learning 107 implementation, where we can utilized the IF estimation for the multiple IMF components to learn a detection scheme. 3. Expand the algorithm: Potential for expanding the feature detection algorithm by using the seismic data logged on the horizontal plane; East axis (E-component) and North axis (N-component), in addition to the vertical axis or Z-component axis that was used in our case study. This in turn would increase the data size by factor of three allowing for better accuracy in IF estimation in this application 108 Bibliography [1] N. E. Huang et al. “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis.” Proceedings of the Royal Society of London.Series A: Mathematical, Physical and Engineering Sciences, 454(1971), pp. 903-995, 1998. [2] J. D. Hamilton. Time Series Analysis. Princeton: Princeton university press, 1994, pp. 435-447. [3] S. S. Haykin. Communication Systems, 4th edition, John Wiley & Sons. 2001, pp 31-78. [4] S. Billings, M. Sambridge and B. Kennett. “Errors in hypocenter location: Picking, model, and magnitude dependence.” Bulletin of the Seismological Society of America, 84(6), pp. 1978-1990, 1994. [5] J. L. Hardebeck and P. M. Shearer. “A new method for determining first-motion focal mechanisms.” Bulletin of the Seismological Society of America, 92(6), pp. 2264-2276, 2002. [6] V. Oye and M. Roth. “Automated seismic event location for hydrocarbon reservoirs.” Comput. Geosci, 29(7), pp. 851-863, 2003. [7] R. J. Willemann and D. A. Storchak. “Data collection at the international seismological centre.” Seismol. Res. Lett, 72(4), pp. 440-453, 2001. 109 [8] H. Goto et al. “Very dense seismic array observations in Furukawa district, Japan.” Seismol. Res. Lett, 83(5), pp. 765-774, 2012. [9] A. Levander and G. Nolet. “Perspectives on array seismology and USArray.” Seismic Earth: Array Analysis of Broadband Seismograms, pp. 1-6, 2005. [10] B. Boashash. “Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals.” Proceedings of the IEEE, 80(4), pp. 520-538, 1992. [11] B. van der Pol. “The Fundamental Principals of Frequency Modulation.” Electrical Engineers - Part III: Radio and Communication Engineering, Journal of the Institution Of, 93(23), pp. 153-158, 1946. [12] D. Gabor. “Theory of communication. Part 1: The analysis of information.” Journal of the Institution of Electrical Engineers-Part III: Radio and Communication Engineering, 93(26), pp. 429-441, 1946. [13] M. J. Yedlin, G. F. Margrave and Y. B. Horin. “Instantaneous frequency computation: theory and practice.” CREWES Research Report, 26(85), pp. 1-13, 2016. [14] J. Ville. “Theorie et application de la notion de signal analytic, Cables et Transmissions.” Translation by I.Selin, Theory and Applications of the Notion of Complex Signal, Report T-92, Paris, France, 2A(1), pp. 61-74, 1948 [15] J. R. Carson and T. C. Fry. “Variable Frequency Electric Circuit Theory with Application to the Theory of Frequency-Modulation.” Bell System Technical Journal, 16(4), pp. 513-540, 1937. 110 [16] A. H. Nuttall and E. Bedrosian. “On the quadrature approximation to the Hilbert transform of modulated signals.” Proceedings of the IEEE, 54(10), pp. 1458-1459, 1966. [17] T. Qian, Q. Chen and L. Li. “Analytic unit quadrature signals with nonlinear phase.” Physica D: Nonlinear Phenomena, 203(1), pp. 80-87, 2005. [18] N. E. Huang et al. “A confidence limit for the empirical mode decomposition and Hilbert spectral analysis.” Proceedings of the Royal Society of London.Series A: Mathematical, Physical and Engineering Sciences, 459(2037), pp. 2317-2345, 2003. [19] G. Strang. Computational Science and Engineering. Wellesley-Cambridge Press Wellesley, 2007, pp. 100-345. [20] J. L. Mead and, R. A. Renaut. “Least squares problems with inequality constraints as quadratic constraints.” Linear Algebra and its Applications, 432(8), pp. 1936-1949, 2010. [21] A. Tan. “Hilbert-Huang Transform.” MATLAB Central File Exchange. Internet: https://www.mathworks.com/matlabcentral/fileexchange/19681-hilbert-huang-transform, Mar. 31, 2016 [Mar. 1, 2019]. [22] K. Karhunen. “Über Lineare Methoden in Der Wahrscheinlichkeitsrechnung.” Universitat Helsinki, 1947, pp. 230-242. [23] P. Lévy and M. Loève. “Processus Stochastiques Et Mouvement Brownien.” Gauthier-Villars Paris, 1948, pp. 299-304. 111 [24] R. Mañé. “On the dimension of the compact invariant sets of certain non-linear maps.” Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, 898(1), pp. 230-242, 1981. [25] F. Takens. “Detecting strange attractors in turbulence.” Lecture Notes in Mathematics, 898(1), pp. 366-381, 1981. [26] N. Golyandina and A. Zhigljavsky, Singular Spectrum Analysis for Time Series. Springer Science & Business Media, 2013, pp. 11-54. [27] A. Groth. “Singular Spectrum Analysis - Beginners guide.” MATLAB Central File Exchange. Internet: https://www.mathworks.com/matlabcentral/fileexchange/58967-singular-spectrum-analysis-beginners-guide, Sep. 1, 2016 [Feb. 1, 2017]. [28] T. Diehl et al. “Consistent phase picking for regional tomography models: application to the greater Alpine region.” Geophysical Journal International, 176(2), pp. 542-554, 2009. [29] C. Zeiler and A. A. Velasco. “Seismogram picking error from analyst review (SPEAR): Single-analyst and institution analysis.” Bulletin of the Seismological Society of America, 99(5), pp. 2759-2770, 2009. [30] R. V. Allen, “Automatic earthquake recognition and timing from single traces.” Bulletin of the Seismological Society of America, 68(5), pp. 1521-1532, 1978. [31] M. Baer, and U. Kradolfer. “An automatic phase picker for local and teleseismic events.” Bulletin of the Seismological Society of America, 77(4), pp. 1437-1445, 1987. 112 [32] K. M. Taylor et al. “Estimation of arrival times from seismic waves: a manifold-based approach.” Geophysical Journal International, 185(1), pp. 435-452, 2011. [33] C. Baillard et al. “An automatic kurtosis based P- and S-phase picker designed for local seismic networks.” Bulletin of the Seismological Society of America, 104(1), pp. 394-409, 2013. [34] P. Bogiatzis, and M. Ishii. “Continuous wavelet decomposition algorithms for automatic detection of compressional and shear wave arrival times.” Bulletin of the Seismological Society of America, 105(3), pp. 1628-1641, 2015. [35] R. Allen. “Automatic phase pickers: their present use and future prospects.” Bulletin of the Seismological Society of America, 72(6B), pp. S225-S242, 1982. [36] C. Panagiotakis, E. Kokinou, and F. Vallianatos. “Automatic P-Phase Picking Based on Local-Maxima Distribution.” IEEE Trans. Geosci. Remote Sens, 46(8), pp. 2280-2287, 2008. [37] P. S. Earle, and P. M. Shearer. “Characterization of global seismograms using an automatic-picking algorithm.” Bulletin of the Seismological Society of America, 84(2), pp. 366-376, 1994. [38] L. Persson. “Statistical tests for regional seismic phase characterizations.” J. Seismol, 7(1), pp. 19-33, 2003. 113 [39] C. D. Saragiotis, L. J. Hadjileontiadis, and S. M. Panas. “PAI-S/K: A robust automatic seismic P phase arrival identification scheme.” IEEE Trans. Geosci. Remote Sens, 40(6), pp. 1395-1404, 2002. [40] J. J. Galiana-Merino, J. L. Rosa-Herranz, and S. Parolai. “Seismic P-Phase Picking Using a Kurtosis-Based Criterion in the Stationary Wavelet Domain.” IEEE Trans. Geosci. Remote Sens, 46(11), pp. 3815-3826, 2008. [41] L. Küperkoch et al. “Automated determination of P-phase arrival times at regional and local distances using higher order statistics.” Geophysical Journal International, 181(2), pp. 1159-1170, 2010. [42] A. Lois et al. “A new automatic S-onset detection technique: Application in local earthquake data.” Geophysics, 78(1), pp. KS1-KS11, 2012. [43] H. Zhang, C. Thurber and, C. Rowe. “Automatic P-wave arrival detection and picking with multiscale wavelet analysis for single-component recordings.” Bulletin of the Seismological Society of America, 93(5), pp. 1904-1912, 2003. [44] S. Gentili, and A. Michelini. “Automatic picking of P and S phases using a neural tree.” J. Seismol, 10(1), pp. 39-63, 2006. [45] X. Meng, A. Jakobsson, et al. “Estimation of chirp signals with time-varying amplitudes.” Signal Processing, 147, pp. 1-10, 2018. 114 [46] T.L. Jensen, J.K. Nielsen, J.R. Jensen, M.G. Christensen, S.H. Jensen. “A fast algorithm for maximum-likelihood estimation of harmonic chirp parameters.” IEEE Transactions on Signal Processing, 65(19), pp. 5137-5152, 2017. [47] S. Nandi, D. Kundu. “Asymptotic properties of the least squares estimators of the parameters of the chirp signals.” Annals of the Institute of Statistical Mathematics, 56(3), pp. 529-544, 2004. [48] B. Völcker, B. Ottersten. “Chirp parameter estimation from a sample covariance matrix.” IEEE Transactions on Signal Processing, 49(3), pp. 603-612, 2001. [49] P. Wang, H. Li, I. Djurovi ´c, B. Himed. “Integrated cubic phase function for linear FM signal analysis.” IEEE Transactions on Aerospace and Electronic Systems, 46(3), pp. 963-977, 2010. [50] L. Qi, R. Tao, S. Zhou, Y. Wang. “Detection and parameter estimation of multicomponent LFM signal based on the fractional Fourier transform.” Science in China series F: information sciences, 47(2), pp. 184, 2004. [51] J. Swärd, J. Brynolfsson, A. Jakobsson, M. Hansson-Sandsten, “Sparse semi-parametric estimation of harmonic chirp signals.” IEEE Transactions on Signal Processing, 64(7), pp. 1798-1807, 2016. [52] O. Besson, P. Stoica. “Nonlinear least-squares approach to frequency estimation and detection for sinusoidal signals with arbitrary envelope.” Digital Signal Processing, 9(1), pp. 45-56, 1999. 115 [53] O. Besson, P. Stoica, “Sinusoidal signals with random amplitude: least-squares estimators and their statistical analysis.” IEEE Transactions on Signal Processing, 43(11), pp. 2733-2744, 1995. [54] P. Flandrin. “Time frequency and chirps.” In Wavelet Applications VIII International Society for Optics and Photonics. 4391, pp. 161-176, 2001. [55] Y. Pantazis, O. Rosec, and Y. Stylianou, “Chirp rate estimation of speech based on a time-varying quasi-harmonic model” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 3985–3988, 2009. [56] M. G. Christensen and J. R. Jensen, “Pitch estimation for non-stationary speech,” in Conf. Rec. 48th Asilomar Conf. Signals, Syst. Comput. , pp. 1400–1404, 2014. [57] Y. Doweck, A. Amar, and I. Cohen, “Joint model order selection and parameter estimation of chirps with harmonic components,” IEEE Trans. Signal Process. 63(7), pp. 1765–1778, 2015. [58] S. M. Nørholm, J. R. Jensen, and M. G. Christensen, “Instantaneous pitch estimation with optimal segmentation for non-stationary voiced speech,” IEEE Trans. Audio, Speech, Lang. Process. 24(12), pp. 2354–2367, 2016. [59] J. Xiao and P. Flandrin, “Multitaper time-frequency reassignment for nonstationary spectrum estimation and chirp enhancement,” IEEE Trans. Signal Process. 55(6), pp. 2851–2860, 2007. 116 [60] J. Guo, H. Zou, X. Yang, and G. Liu, “Parameter estimation of multicomponent chirp signals via sparse representation,” IEEE Trans. Aerosp. Electron. Syst. 47(3), pp. 2261–2268, 2011. [61] B. Wang and J. Huang, “Instantaneous frequency estimation of multicomponent chirp signals in noisy enviroments,” J. Marine Sci. Appl. 6(4), pp. 13–17, 2007. [62] F. Auger and P. Flandrin, “Improving the readability of timefrequency and time-scale representations by the reassignment method,” IEEE Trans. Signal Process. 43(5), pp. 1068–1089, 1995. [63] Ozaktas, H. M., Arikan, O., Kutay, M. A. et al., “Digital computation of the fractional Fourier transform.” IEEE Trans. on SP, 44(9), pp. 2141—2150, 1996. [64] Sigiuk, A., “Instantaneous Frequency Estimation Using Least Square Algorithm with Tychonov Regularization.” Unpublished manuscript, EECE 565, University of British Columbia, 2016. [65] Sigiuk. A, Yedlin. M, “Instantaneous frequency estimation using weighted least squares Tychonov regularization with box constraints.” CREWES Research Report, 28(69), pp. 1-13, 2016.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Development of an instantaneous frequency estimation...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Development of an instantaneous frequency estimation pipeline for compressional and shear wave arrival… Sigiuk, Ahmed Hasin Issa 2019
pdf
Page Metadata
Item Metadata
Title | Development of an instantaneous frequency estimation pipeline for compressional and shear wave arrival picking : application to quarry blast data |
Creator |
Sigiuk, Ahmed Hasin Issa |
Publisher | University of British Columbia |
Date Issued | 2019 |
Description | Many seismology applications such as earthquake hypocenter determination, source mechanism analysis, and hydrocarbon reservoir imaging, require accurate compressional and shear wave arrival time picking. Traditionally different seismological organizations have relied on human expert manual picking of seismic wave arrival and labeling. However as the number of seismic stations around the world rapidly grows, the need for automatic phase picking algorithms is increasing and requires the employment of large datasets. These algorithms need to distinguish between the different seismic phases, and accurately measure their onset. This thesis proposes a seismic signal processing pipeline application for automatic time picking of seismic compression waves (P) and shear waves (S). The algorithm picking accuracy is evaluated on quarry blasts seismic traces, and compared to manual expert picking. The proposed pipeline consists of three main segments: The data pre-processing segment focuses on Empirical Mode Decomposition (EMD) as a method of deconstructing multicomponent signals into monocomponent signals, followed by the Singular Spectral Analysis (SSA) method as a denoising filter. The data processing segment computes the instantaneous frequency (IF) using weighted least squares and Tychonov regularization with quadratic constraints. The data post-processing segment uses seismic signal IF estimation to pick the compressional and shear waves arrival times. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2019-05-22 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution 4.0 International |
DOI | 10.14288/1.0378928 |
URI | http://hdl.handle.net/2429/70251 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2019-09 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2019_september_sigiuk_ahmed.pdf [ 44.32MB ]
- Metadata
- JSON: 24-1.0378928.json
- JSON-LD: 24-1.0378928-ld.json
- RDF/XML (Pretty): 24-1.0378928-rdf.xml
- RDF/JSON: 24-1.0378928-rdf.json
- Turtle: 24-1.0378928-turtle.txt
- N-Triples: 24-1.0378928-rdf-ntriples.txt
- Original Record: 24-1.0378928-source.json
- Full Text
- 24-1.0378928-fulltext.txt
- Citation
- 24-1.0378928.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0378928/manifest