Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

An investigation of the Outlier Processing Method for single-trial-event-related potentials Tajwar, Samina 1995

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_1996-0084.pdf [ 2.77MB ]
Metadata
JSON: 831-1.0065068.json
JSON-LD: 831-1.0065068-ld.json
RDF/XML (Pretty): 831-1.0065068-rdf.xml
RDF/JSON: 831-1.0065068-rdf.json
Turtle: 831-1.0065068-turtle.txt
N-Triples: 831-1.0065068-rdf-ntriples.txt
Original Record: 831-1.0065068-source.json
Full Text
831-1.0065068-fulltext.txt
Citation
831-1.0065068.ris

Full Text

An Investigation of the Outlier Processing Method for Single-Trial Event-Related Potentials by S A M I N A T A J W A R B . A . S c , The University of Waterloo, 1991 A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T O F T H E R E Q U I R E M E N T S FOR T H E D E G R E E O F M A S T E R O F APPLIED S C I E N C E in T H E F A C U L T Y O F G R A D U A T E STUDIES D E P A R T M E N T O F E L E C T R I C A L ENGINEERING We accept this thesis as conforming to the required standard T H E UNIVERSITY O F BRITISH C O L U M B I A December 1995 © Samina Tajwar, 1995 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada Date DE-6 (2/88) ABSTRACT This thesis is an investigation of the Outlier Processing Method, which was developed to eliminate the dependence of conventional techniques on a priori information for the detection of event-related potentials (ERPs) from EEG signals. Instead of attempting to model the ERP, the OPM assumes the ERP to be an outlier signal which is imbedded in a background EEG signal. The background EEG is modelled as an auto-regressive signal and approximated using robust statistical parameter and signal estimation. The ERP estimate is obtained by subtracting the estimate of the background EEG signal from the observed EEG signal. The basis of the robust statistical parameter estimation used in OPM was the generalized maximum likelihood (GM) estimate. It required the generation of initial parameter estimates to start the process. MEM, a least squares estimator used in the original OPM, was found to be a more consistent estimator when compared to median regression estimation. Also, a method of automatically generating the tuning constants used by the influence functions in the GM estimate was proposed. , The robust signal estimation was a robust variation of a Kalman filter which required a cleaner function to minimize the effect of outliers such as ERPs. Of the three different cleaners that were evaluated, the fixed-lag cleaner had the best overall performance. A method of detecting the location of the outlier was developed and was found to be effective for signals with low SNRs and a large amount of spectral overlap between the outlier and the background EEG signal. i i The improvements to the OPM were incorporated into a new algorithm referred to as 0PM2 and tested against the original OPM, referred to as 0PM1, and MMSE, a previously published method used as a benchmark. The three algorithms were tested on real EEG signals of a ballistic movement of the index finger. Al l three techniques successfully extracted information related to the finger move-ment. However, both 0PM1 and 0PM2 were successful without any a priori information, whereas MMSE uses a filter based on the average of the raw data. 0PM1 and 0PM2 were significantly better, than MMSE in rejecting false positives. iii Table of Contents ABSTRACT ii List of Tables vii List of Figures viii Chapter 1) INTRODUCTION 1 1.1 Background 1 1.2 Goal 3 Chapter 2) ROBUST PARAMETER ESTIMATION 4 2.1 The EEG Model 4 2.2 Initial Parameter Estimates 6 2.3 The Influence Function and Its Tuning Constants 8 2.3.1 M-Estimate 8 2.4 Automatic Generation of Tuning Constants 12 2.4.1 Asymptotic Theory 12 2.4.2 Asymptotic Variance Function 13 2.4.3 Adaptation to the OPM 17 2.5 Summary of Issues 23 iv Chapter 3) ROBUST SIGNAL ESTIMATION 25 3.1 Filter-Cleaner 26 3.2 Fixed-Lag and Smoother Cleaners 29 3.2.1 Fixed-Lag Cleaner 29 3.2.2 Smoother Cleaner 30 3.3 Time-Varying Influence Function 31 3.3.1 SNR and Spectral Overlap 32 3.4 Summary of Issues 34 Chapter 4) THE IMPROVED OPM 35 4.1 Data Simulation 35 4.2 Initial Parameter Estimates 37 4.3 Automatic Generation of Tuning Constants 41 4.4 Cleaners 44 4.5 Time Varying Influence Function 47 4.6 Summary 52 Chapter 5) TESTS ON REAL EEG 53 5.1 Analysis of EEG signals 53 5.1.1 Ensemble Averages 53 5.1.2 SNR 56 5.1.3 Ranking Trials 57 V 5.2 Test Procedure 59 5.2.1 MMSE 60 5.2.2 Performance Measures 61 5.3 Results 64 5.3.1 Ensemble Averages and Single-Trials 64 5.3.2 Performance Measurements 72 Chapter 6) CONCLUSIONS 76 6.1 Contributions 76 6.2 Future Work 78 References 80 vi List of Tables Table 1 MSEs averaged over 100 trials 45 Table 2 Difference in MSEs between cleaners 46 Table 3 The AR parameters for the different outlier sequences. . . 47 Table 4 Average and maximum SNRs at Fz, C3 and Fz-C3 . . . . 57 Table 5 Results of DTW cost comparisons 75 v i i List of Figures Figure 1 OPM block diagram 3 Figure 2 , The Huber Psi Function 11 Figure 3 Tukey's Biweight Psi Function 17 Figure 4 GM2. 18 Figure 5 A block diagram of the IWLS procedure for GM 20 Figure 6 A block diagram of GM-estimation including the automatic generation of tuning constants. The dashed boxes signify the proposed changes to the OPM 24 Figure 7 Hampel's Psi Function 27 Figure 8 Comparison of Estimated Parameters between MR and MEM 38 Figure 9 Percentage of best trials between MEM and MR 39 Figure 10 The sum of the squared errors for the varying contamination levels 40 Figure 11 Comparison of the parameter estimates generated by GM and GM_agtc 42 Figure 12 Percentage of best trials between GM and GM_agtc. . . . 43 Figure 13 Percentage of Best Trials by each Cleaner 44 Figure 14 The percentage of best trials, excluding the smoother-cleaner 46 viii Figure 15 The spectral plots of the four different outlier sequences corresponding to the AR parameters listed in Table 3. The spectrum of the background sequence is the dotted line in each plot 47 Figure 16 The MSEs of GM2 and MEM for the three different types of outliers 49 Figure 17 Examples of CD profiles generated from observed signals with -5dB SNR and outlier no.3 51 Figure 18 Ensemble averages of the active and idle trials at electrodes F z and C 3 55 Figure 19 Test Procedure 60 Figure 20 Classification with Dynamic Time Warping 63 Figure 21 Ensemble averages of the active ERP estimates 65 Figure 22 Ensemble averages of the active ERP estimates 66 Figure 23 Examples of single trials of active ERP estimates 68 Figure 24 Examples of single trials of active ERP estimates 69 Figure 25 Examples of single trials of idle ERP estimates 70 Figure 26 Examples of single trials of idle ERP estimates 71 Figure 27 Standard template used for DTW for OPM1 estimates . . 73 Figure 28 Standard template used for DTW for OPM2 estimates . . 73 Figure 29 Standard template used for DTW for MMSE estimates . . 74 Chapter 1 INTRODUCTION Advances in assistive technology have the potential to provide some levels of independence for people with severe movement-related disabilities. But, often, the effectiveness of current technology is limited by the inadequacies of the human-machine interface (HMI). The ideal universal HMI coupled with recent advances in technology for peripheral devices could provide a much improved life-style for people of this group. This research is an attempt to put us one step closer to developing such an interface. 1.1 Background Electrical brain signals are one of the many human physiological signals that could be used for an ideal HMI. Movement information from signals elicited from the brain could be directly utilized to communicate with peripheral devices such as a robotic arm. The electrical activity of the brain can be recorded in a non-invasive manner with an electroencephalograph (EEG). EEG potentials, measured at the surface of the scalp, represent the combined effect of potentials from the cerebral cortex (the cellular layer forming the top layer of the brain) and various points beneath. The continuous neural activity of a person will be referred to as background EEG. When an event occurs, such as a movement or a response to a stimulus, the brain generates a characteristic signal which is referred to as an event-related potential (ERP). ERPs are transient signals imbedded in the non-stationary background EEG. In addition, the SNR of an ERP lies below OdB [1]. Given these conditions, researchers have been challenging the problem of ERP extraction 1 Chapter 1— INTRODUCTION for decades with varying levels of success. Initial studies modelled the ERP as a deterministic function for which the best estimate could be achieved by averaging over a number of responses, assuming the background E E G was random and zero mean. However, it was recognized by several independent researchers [2], [3], [4], that ERP components vary in amplitude and latency from trial to trial rendering the ensemble averaged model as an inaccurate representation of an ERP. Although, it does provide a general idea of the underlying ERP waveform, there is a significant loss of unique single-trial information. Current research has been focussed towards developing signal processing methods for ERP extraction on a single-trial basis. To date, researchers have based their single-trial techniques on using training sets of data-to obtain averaged templates or statistical characteristics of ERPs. This included methods incorporating filtering [5], correlation matrices [6] and neural networks [7]. Although these methods have achieved some level of success, their reliance on a priori knowledge of the ERP would limit the scope of a universal human-machine interface. Birch et. al [8] developed a method, called the Outlier Processing Method (OPM), that successfully extracted ERPs without any requirement for a priori information of the ERP. Instead of attempting to model the ERP, Birch modelled the background E E G signal. The algorithm was based on the assumption that ERPs were additive outliers which did not conform to the structure of the background EEG. 2 Chapter 1— INTRODUCTION 1.2 Goal The O P M is basically comprised of two fundamental elements. The first generates a set of robust autoregressive parameters to capture the statistical structure of the background E E G . Based on these parameters, the second element, a signal estimator, generates an estimate of the background signal. In the final step, the difference is taken between the background estimate and the observed signal to reveal the outlier content which contains E R P information. Figure 1: OPM block diagram Observed E E G signal Robust Parameter Estimator AR parameters Robust oignai Estimator Estimate of Background EEG signal Difference Estimate of Single-Trial ERP • The goal of this thesis was to independently analyze and improve the robust parameter and signal estimation to lead to the development of a new O P M in terms of accuracy, efficiency and facility. 3 Chapter 2 ROBUST PARAMETER ESTIMATION 2.1 The EEG Model The O P M was developed by Birch in 1988 with the deliberate intention of designing a single-trial E R P extraction technique that was not dependent on a priori information. The method and its development are described in detail in [9]. However, it would be beneficial to briefly review the E E G model for the purposes of this thesis. Martin and Thompson [10] had suggested that the Additive Outlier model described by (2.1.1-2.1.2) would apply to patchy type outliers whose behaviour seemed to be uncorrected with the behaviour of the rest of the data. E R P ' s are transient waveforms that would be like patchy type outliers. Seeing its applicability to the E E G waveform, Birch modelled the observed E E G signal as the sum of an autoregressive (AR) process, xi and an additive outlier, v{. where y,-: observed E E G signal xi: background E E G signal v,-: E R P uf. residual error term yl=xl + vi (2.1.1) p (2.1.2) 4 Chapter 2— ROBUST PARAMETER ESTIMATION AR parameters The Additive Outlier model required the background EEG signal to be modelled by an AR process. Parametric models such as the AR model (2.1.2) are often used to describe a set of data. Classical statistical methods typically assume that the data will exactly follow the chosen parametric model. However, outliers in the data set, in this case ERPs, can greatly hinder the results of classical statistical procedures. Robust statistics takes into account the existence of outliers realizing that parametric models are only approximations to reality. Therefore, in order to produce a set of AR parameters that would best model the background EEG signal, it was necessary to use a robust parameter estimator. Observed EEG signal Robust Parameter Estimator AR parameters 5 Chapter 2— ROBUST PARAMETER ESTIMATION 2.2 Initial Parameter Estimates The first step in robust parameter estimation is to generate a set of initial parameter estimates, based on classical statistical methods. These initial parameter estimates are then improved upon through an iterative process of estimation and removal of outlier data. In the original OPM, M E M (Maximum Entropy Method), a least squares estimator had been selected for the initial parameter estimates. A least squares estimate of 9 is that 9 which minimizes the sum of the squared errors or residuals (2.2.1). This type of estimator is also referred to as an L-2 estimator. When the errors are normally distributed, the least squares estimator is optimal in the class of all unbiased linear estimators. Unfortunately, it is extremely sensitive to additive outliers and the efficiency dramatically decreases when the error distribution is non-Gaussian. An alternative to least squares are L - l estimators. An L - l estimate minimizes the sum of the absolute errors (2.2.2). Birch [8] and Lind [9] had suggested investigating the L - l estimate, inferring that it might prove to be a better starting point than least squares. L - l estimators are known to be optimal for linear regressions where the errors have double exponential tails [10]. They also resist undue effects from a few, large outliers but the effects of patchy additive (2.2.1) i=l n (2.2.2) i=l 6 Chapter 2— ROBUST PARAMETER ESTIMATION outlier has not been well-documented. Median regression (MR) is an example of an L - l estimator. MR is based on a modified simplex algorithm proposed by Barrodale and Roberts [2] with a slight change by Koenker [11]. M E M and MR were compared, in Section 4.1, to determine which one would produce the best initial parameter estimates with patchy type additive outliers. 7 Chapter 2— ROBUST PARAMETER ESTIMATION 2.3 The Influence Function and Its Tuning Constants Another critical step in robust parameter estimation is to minimize the effect of outlier data. One approach involves the application of the influence function. The influence function can be interpreted as a heuristic tool which discriminates against the effect of outliers by weighting the residual error appropriately. The influence function originated with the development of the M-estimate. M-Estimate The M-estimate is a generalization of the well-known maximum likelihood estimate. Suppose that (xi, £ 2 , x n ) are an independently and identically distributed (i.i.d.) set of observations with probability density function / (x — 9) for an unknown true location parameter 9. The maximum likelihood estimate is that value of 9 that maximizes (2.3.1) or minimizes (2.3.2). n Il-fte-') (2.3.1) i=l n (2.3.2) Huber [12] proposed to generalize (2.3.2) to n (2.3.3) »=i 8 Chapter 2— ROBUST PARAMETER ESTIMATION where p is a differentiable continuous function. If the derivative of p is defined as ib(u) = then the estimate 9 satisfies n ^ # c , - - 0 ) = O. (2.3.4) j=i An M-estimate is identified by the choice of ip or influence function used in (2.3.4). The key to robust M-estimation is to find influence functions that will discriminate against outliers for better estimation results. Taking it one step further, (2.3.5) is a scale invariant M-estimator where cr is a normalization term representing a robust estimate of the standard deviation around 9. ijj is then independent of the scale of the process. i=l When applied to a linear regression model, yi = xj# + u{, the M-estimate is defined by E K ^ ^ j (2-3-6) i= l ^ ^ where yi : ith observation x / : ith row of design matrix X n x p 9: is a p x lvector of unknown parameters ui\ ith residual error with distribution F(u/a) a: robust estimate of the standard deviation of u 9 Chapter 2— ROBUST PARAMETER ESTIMATION Taking the partial derivatives with respect to 9 results in the linear regression M-estimator By defining x j = yi-2, Vi-p}, (2.3.7) becomes the M-estimator for the autore-gression model (2.1.2). In (2.3.7), yi — x j 9 is called the residual error term. By scaling the residuals, we hope to reduce their variance to approximately one, but that is largely dependent on their distribution, F. Influence functions are specifically designed to handle data with particular residual error distributions. If the scaled residuals are large, there is a high probability that the data contains outliers. Thereby, the role of the influence function is to set a threshold beyond which all residuals would be down-weighted to counteract the effect of outliers. An influence function is characterized by its shape and its relative size which is defined by tuning constants. One of the simplest influence functions is the Huber Psi function (2.3.8) designed for an error distribution, F, which is normally distributed in the center with double exponential tails. The tuning constant, k, is the point of discrimination. When the magnitude of the residual is less than or equal to k, it is treated as data with no outlier content and the output equals the input. When the residual magnitude increases above k, the function compensates for outlier content by limiting the output to k. (2.3.7). (2.3.7) (2.3.8) 10 Chapter 2— ROBUST PARAMETER ESTIMATION Figure 2: The Huber Psi Function \|/(U) k -k k u -k In practice, the amount of outlier content is unknown and so it is important for the accuracy of a parameter estimation technique to use an optimum value of the tuning constant. A sub-optimal setting of the tuning constant could lead to the downweighting of valid data or the inclusion of data with outliers, thereby adversely affecting the parameter estimates. Trial and error methods are typically used for determining the tuning constants of an influence function. However, in the following section, a method of automatically generating the optimal tuning constants is developed for certain influence functions. 11 Chapter 2— ROBUST PARAMETER ESTIMATION 2.4 Automatic Generation of Tuning Constants Asymptotic Theory In order to understand the premise for calculating the optimal tuning constant for a particular influence function, we must first consider the properties of asymptotic theory [13]. Let (yi,...,yn) be a set of independent and identically distributed observations with empirical distribution F n . Let T n ( y i , y n ) or T n(F n) be the estimators of the set of observations, where {Tn; n >1} can be viewed as the sequence of estimators, one for each possible sample size n. If T n ( y i , y n ) -»• T(F), T(F) is called the asymptotic value of {Tn; n >1} at F, n—t-oo the true distribution. From this, the property of asymptotic normality is derived as CF(MTN-T(F)}) W e S l y 7V(0,y(T,F)) (2.4.1) n—>oo Cp is defined as the distribution of its argument under F and V(T,F) is the asymptotic variance of {Tn; n >1} at F. For a parametric model, {Tn; n >1} can be expressed as > 1 j representing a sequence of estimators of 6, one for each possible sample size n. A sequence of estimators J6>„; n > 1 j is said to be consistent for a parameter 9 if J6>„ j —> 9. Thus, (2.4.1) can be written as CFeM0n-0\) - N(Q,V(Fe)) (2.4.2) \ L J / n—•oo 12 Chapter 2— ROBUST PARAMETER ESTIMATION The asymptotic efficiency of an estimator is e oc yj^. Intuitively, the smaller the variance of the difference between the parameter estimate (based on n observations) and the true parameter, the better the performance of the estimator. Based on this relationship, the minimization of the asymptotic variance maximizes the asymptotic efficiency of the estimator. This, in turn, implies that the optimal value of the tuning constant for a particular influence function is that value which minimizes the asymptotic variance of the estimator. These concepts can all be expanded to the multi-dimensional case with finite-dimensional vector parameters as described in [13]. Asymptotic Variance Function John Lind [9] presented a method for choosing optimal tuning constants based on minimizing the asymptotic variance of some M-estimators. This method could not be directly applied to the OPM because the OPM uses a slightly different parameter estimation technique. The development of his proposed method will be discussed in this section and the adaptation of this method to the OPM will be discussed in Section 2.5. The use of M-estimators in calculating robust estimates of regression parameters requires the practitioner to choose an appropriate influence function and specify the tuning constants. The choice of influence function is dependent on the underlying error distribution, F, of the data set. The tuning constants, which are typically chosen by trial and error, could be chosen through the minimization of the asymptotic variance function (AVF). 13 Chapter 2— ROBUST PARAMETER ESTIMATION Huber designed the influence function described by (2.3.8) to specifically accommo-date heavy-tailed distributions. Yohai [14] showed that the regression M-estimator with Huber's influence function had the properties of asymptotic normality and therefore, the AVF could be used to generate an optimal value for the tuning constant k. If k is the estimate of k and 8^ is the solution for the M-estimator, Yohai stated that y/n(0k — d^j converges to the normal distribution with mean 0 and multivariate covariance matrix X-'V^F) (2.4.3) where X0 = l i m X'X/n (2.4.4) and k / k J u2dF(u) + k2 1 - / dF{u) -k V -k I dF(u) -k The sample estimate of the asymptotic variance function (2.4.5) is (2.4.5) V^k, F) = / Y y.J/-2 (2A6) f 1 —k < t < k where I r _ v t i is the indicator function defined asl(t) — < n ' , ~ . — L *'KJ w \ 0 , otherwise 1 Note that Yohai's original equation for (2.4.6) did not include the scaling factor in the argument for the ip function because he did not scale the residuals in the solution of (2.3.7). 14 Chapter 2— ROBUST PARAMETER ESTIMATION Thus, for k belonging to a given interval, k is chosen as that value of k that minimizes the moment estimate of the asymptotic variance function (2.4.6). He also proved that this procedure had the same asymptotic efficiency for the unknown distribution of error. Yohai's work was the basis of Hlynka et al's [15] derivations used to estimate the tuning constants, aQ and u0, for where e and w were known or predetermined. This influence function was referred to as the optimum solution to the M-estimator (2.3.7) given that the distribution of F was known to belong to a defined class of distributions. The advantages of their approach was that they chose a redescending influence function (returned to zero) that would totally eliminate gross outliers and allowed for a more general error distribution. By establishing the asymptotic normality of QUo,a0, it was found that a reasonable estimate of the tuning constants would be that which minimized the asymptotic variance (2.4.8) for this M-estimator. sgn(u), \u\ < u0 u0 < |u| < u0 — w \u\ > a0 — w (2.4.7) aa—w J rio!a0(u)dF(u) aQ+w (2.4.8) 2 The minimization of (2.4.8) becomes virtually impossible when the distribution of F is unknown, which is the case in most practical situations. Hlynka et al. tackled 15 Chapter 2— ROBUST PARAMETER ESTIMATION this problem of choosing suitable values for the tuning constants of (2.4.7) where F was unknown. They stated that the tuning constants could be chosen by minimizing the moment estimate of the asymptotic variance function (2.4.9). Due to the nature of the ip function, the minimization of even the moment estimate (2.4.9) was still a complex problem. According to John Lind's [9] report, favorable results were obtained with a conjugate gradient minimization algorithm by Powell [16]. However, based on work subsequent to his technical report, he found that the AVF, as a function of two tuning constants was a fairly flat surface. As a result, in the finite sample case, Newton or quasi-Newton methods tended to hang up in small surface depressions. In practice, he suggested the implementation of a simpler redescending influence function such as Tukey's Biweight (2.4.10) which is a function of only one parameter. He also recommended using a grid search to locate the minimum of the moment estimate of the AVF (2.4.11) as it would only require a one dimensional search. (2.4.9) 2 (2.4.10) 16 Chapter 2— ROBUST PARAMETER ESTIMATION Figure 3: Tukey's Biweight Psi Function V(u)| (2.4.11) Adaptation to the OPM GM2-estimation As stated earlier, the OPM utilizes an estimation technique that is slightly different to the M-estimator. Denby and Martin [17] found that classical M -estimators lacked robustness when applied to the Additive Outlier model. They claimed that the generalized M-estimate (GM-estimate), for which the summands of the estimating equation are bounded and continuous functions of the data, is robust for the Additive Outlier case. The OPM utilizes a GM2-estimate which is two iterations of GM, each followed by a cleaning. The purpose of the "cleaning" was not to extract the ERP, but to remove large gross outliers that may have been introduced into the signal from a number of sources. It was found that two iterations (GM2) was slightly better than one iteration 17 Chapter 2— ROBUST PARAMETER ESTIMATION (GM1) and that GM1 was significantly better than GM. A GM2-estimate is obtained through the following steps: 1. Calculate a GM-estimate of the parameters from the observed signal. 2. Remove large outliers in the observed signal with a filter-cleaner as described in Section 3.1. 3. Calculate a GMl-estimate of the parameters using the output of the filter-cleaner. 4. Repeat step 2. 5. Calculate a GM2-estimate of the parameters using the output of the filter-cleaner. Figure 4: G M 2 Cleaned Observed . Signal GM1-estimate of AR parameters GM2-estimate of AR parameters GM-estimation Since the basis of GM2 is the GM-estimate, the minimization of the asymptotic variance function was applied to the GM-estimate. It can be seen, by comparing the M-estimate (2.4.12) and the GM-estimate (2.4.13) that the essential difference between the two are the weighting factors, W. The purpose of the weighting factors is to downweight the summands in (2.4.13) when one or more components in x,-is unreasonably large and produces a poor prediction. i=p+l (2.4.12) (2.4.13) 18 Chapter 2— ROBUST PARAMETER ESTIMATION Birch solved the GM-estimates using an IWLS ( Iterated Weighted Least Squares) procedure as shown in Figure 5. The parameter estimates were sequentially obtained starting at order one to the desired order. At the beginning of each order, a least squares estimator, M E M produced the initial set of parameter estimates. Within each order, the parameter estimates were updated with every iteration of GM-estimation. Three iterations were performed, the first two with the Huber Psi function and the last iteration with Tukey's Biweight. W is dependent on the covariance matrix of the system which was updated with each order. 19 Chapter 2— ROBUST PARAMETER ESTIMATION Figure 5: A b lock diagram o f the I W L S procedure for G M . C Start } T order = 1 MEM estimates iteration = 1 v|/=Tukey's Biweight [iterations^*] Function No U % \ Yes \\r= Huber's Psi Function • update parameter estimates with GM-estimation update covariance matrix C s t o p ) increment iteration increment order 20 Chapter 2— ROBUST PARAMETER ESTIMATION The Asymptotic Variance Function for G M The GM-estimate can be treated as an M-estimate where $ is a function of x and u by letting $(x, u) = W(-x)ip(u). (2.4.13) can be rewritten as n ' J2 *,-$(x,-,ui)- (2.4.14) i=p+l Marrona and Yohai [18] showed that this type of estimator has the properties of asymptotic normality given certain assumptions. Based on those assumptions, the moment estimate of the asymptotic variance function was derived to be V($,F) = —!=i ^ '—s (2.4.15) Assuming that u, is independent of x,-, (2.4.15) could be written in terms of W and tp as < 2 / Vi-** i i E w2(^ F) = - r ^ • (2-4.16) ( i E ^ ) ^ ^ ) ) Therefore, the tuning constant that minimizes (2.4.16) would theoretically contribute to producing the best parameter estimates. Before directly implementing the automatic generation of tuning constants we had to determine the effectiveness of the minimization with the influence functions. Birch used Tukey's Biweight and the Huber Psi functions in the GM-estimation. For this reason, an initial study [19] was performed where both influence functions were applied to (2.4.16). The results of these simulations provided some insight into the characteristics of their respective AVFs. 21 Chapter 2— ROBUST PARAMETER ESTIMATION The AVFs produced using the Huber Psi function were not very reliable. They were very "noisy" in the vicinity of the minima making it difficult to select the optimal tuning constant. Also, they were ineffective with small sample sizes, in that on many occasions there was no minimum. On the other hand, the AVFs for the Tukey's Biweight were fairly smooth curves where the minima were easily detectable with small sample sizes. The importance of processing the EEG in short data segments, and hence with small sample sizes, is so that the signal demonstrates a semblance of stationarity, as indicated in Birch [8]. Based on these results, it would not be feasible to implement the automatic generation of tuning constants using the Huber Psi function. Birch chose the Huber Psi to ensure that (2.4.13) produced a unique root, as discussed by Martin [20]. Martin stated that (2.4.13) has an essentially unique root with the Huber Psi function. The final iteration with Tukey's Biweight was to make use of the rejection character of the redescending function. Further iterations with Tukey's Biweight, according to Martin, could lead to extraneous roots. This raised the question: would it be worthwhile to perform all three iterations with Tukey's Biweight to allow for the automatic generation of tuning constants and the complete elimination of trial and error with the trade-off of risking extraneous roots? This issue is addressed in Section 4.1. 22 Chapter 2— ROBUST PARAMETER ESTIMATION 2.5 Summary of Issues The proposed issues with regards to the robust parameter estimation are summarized below and shown in Figure 6. 1. The automatic generation of tuning constants was compared to the trial and error methods to resolve whether the implementation of the OPM could be made easier. 2. MR was compared with M E M to determine which produced the best initial parameter estimates with a patchy additive outlier. 23 Chapter 2— ROBUST PARAMETER ESTIMATION Figure 6: A block diagram of GM-estimation including the automatic generation of tuning constants. The dashed boxes signify the proposed changes to the OPM. C Start order = 1 i J M E M or MR i estimates? I i generate I optimal j tuning constants y — iteration = 1 \|/=Tukey's Biweighti Function ! update parameter estimates with GM-estimation update covariance matrix Chapter 3 ROBUST SIGNAL ESTIMATION The technique used for signal estimation was also required to be "robust" to en-sure that the background EEG was estimated without adverse effects due to the ERP. And as in the case of the parameter estimator, the influence function is the chosen ap-proach, described in the next section. The autoregressive parameters generated through GM2-estimation are input, along with the observed signal, to a robust signal estimator which computes a robust estimate of the background EEG signal. Observed E E G signal Robust Signal Estimator Estimate of Background EEG signal AR parameters 25 Chapter 3— ROBUST SIGNAL ESTIMATION 3.1 Filter-Cleaner The original OPM algorithm used a filter-cleaner proposed by Martin and Thompson [20] for the signal estimation. It is basically a robust version of a Kalman filter as will be shown later. The filter-cleaner relies on the Additive Outlier model, reviewed below. yt=xi + Vi (3.1.1) The background EEG signal is represented by a pth order AR process defined as v xt = ^2xi_k9k + ui (3.1.2) k=i where 9p} are the AR parameters and w,- is a random error term. Written in the state variable form, (3.1.3) where —i = xi— 1) xi—^+l) y j = («,-, o,..,o) '01 h . 9p-i 9p 1 0 . 0 0 0 1 . 0 0 . .0 1 0 . (3.1.4) Robust estimates of X 8 are calculated according to the following recursion: X,- = + m&tf» ( 1 Si (3.1.5) 26 Chapter 3— ROBUST SIGNAL ESTIMATION where m' = mjs2 with rrij being the first column of the pxp matrix M i - M i is recursively calculated as P.. = M ; - W | yj - Vi \ n w (3.1.6) . The M i and matrixes are prediction and filtering error-covariance matrixes, respectively. Q is a pxp matrix with all zero elements except for Q = a2, si is a time varying estimate of the scale of the process and is defined by s2 = m 1 1 . Birch used w(i) = ip(t)/t with Hampel's three-part redescending ijj (3.1.7). t, |*| < a a, a < \t\ < b ^ ( c - < ) , 6 < l * l < c 0, \t\ > c (3.1.7) Figure 7: Hampel's Psi Function ¥(u)| -b -a c ~i—1 yi denotes a robust one-step prediction of yi and is defined by y~* 4 -1 Assuming that xi is independent of v,-, as in the AO model, a best predictor of yi is also a best predictor of X[. The robust one-step predictor x\ 1 of JC,- satisfies x ~i—1 27 Chapter 3— ROBUST SIGNAL ESTIMATION The cleaned data at time i is given by x i = ( x i ) i , (3.1.8) the first element of X ; . Note that when ip(t) = i , w = 1 and sf = m11:i + al (where al is the variance of v,), eq. (3.1.5) is effectively a Kalman filter. Unfortunately, the Kalman filter is not robust and a single additive outlier would throw off the predictions of X,-. 28 Chapter 3— ROBUST SIGNAL ESTIMATION 3.2 Fixed-Lag and Smoother Cleaners Fixed-Lag Cleaner Suppose that we take a particular look at the prediction of x; where / is a fixed point in time. The recursive equation (3.1.5) could be expressed at time / as X , X 7 - 1 xi ~XI-1" = $ _xl-p+l. Xl—p . y i - y i st •l-l (3.2.1) Now, according to Martin's filter-cleaner, the best prediction of is at time / when xj is the first element of Xj- However, consider the next recursion when time is l+l. 'xi+i' xi xi Xl-l = 1 .xl—p _ Xl-p-1. I ' / ( y l + 1 + mi+isi+iip \ -yli+i V (3.2.2) We still have a prediction of xi stored as the second element in X j + 1 . That prediction is, theoretically, a better prediction because it is not only based on past and present observations (y/, yj_i, ••••) but also on a future observation, yi+i. xj will continue to be improved with future observations until it is the last element in the vector, 2L/+P_i • Since 29 Chapter 3— ROBUST SIGNAL ESTIMATION X - only holds p elements, it is only possible to improve the prediction of any data value with up to p—1 future observations. In general, a better prediction of x{ is the pth element of X l + p - i -Martin [20] also noticed that there was room for improvement with the filter-cleaner. One of his suggestions was the fixed-lag cleaner, as described above, where the cleaned data is taken after a lag of p-1. Smoother Cleaner His second suggestion, a smoother cleaner, was to perform a backward recursion following the forward recursion using all the observations for a prediction of x™, where n denotes using the total number of samples. The backward recursion is defined as The background recursion is time consuming and requires the storage of the , M and P matrixes. The smoother-cleaner would be, theoretically, the best choice as it uses all the available information for the prediction of each data value, but it does put demands on memory and processing time. There are also some fundamental problems with the smoother cleaner because the backward recursion depends on the inverse of the M matrix which is the prediction error-covariance matrix and has the potential of being singular at times. (3.2.3) (3.2.4) where i = n — 1, n — 2 , 1 with initial condition X% = Xn and x1-.30 Chapter 3— ROBUST SIGNAL ESTIMATION 3.3 Time-Varying Influence Function The cleaner2 relies on the influence function to discriminate between outlier and non-outlier content in the data. This in turn implies that the tuning constants of the influence function play a critical role in the accuracy of the results as they directly affect the estimate of the ERP. In Birch's implementation of the OPM, the tuning constants a, b and c for the Hampel xj) function, Figure 7 were time invariant. A time invariant influence function would apply the same weighting during the entire EEG epoch. The ERP, however, is a transient signal that is only present for a portion of the EEG epoch. As a result, the influence function would be applying undue influence in the periods before and after the ERP and not enough influence during the ERP. Parts of the background EEG signal would be unduly weighted and considered as outlier whereas parts of the ERP would not be appropriately downweighted and treated as background EEG. This would lead to the extraction of noise and the degradation of the ERP estimate. Noise suppression and ERP enhancement could be achieved with an adaptive in-fluence function. The influence function would be required to be fairly insensitive (a large tuning constant ) for the time periods before and after the ERP but would have to clamp down during the ERP. Mason [21] proposed a method to achieve this with an ERP location estimator. His modified OPM had required two passes through the cleaner. On the first pass, the cleaner estimated the location of the ERP using a time-invariant 2 The word "cleaner" would be applicable to any of the three cleaners as they are all just modifications of the basic filter-cleaner. 31 Chapter 3— ROBUST SIGNAL ESTIMATION influence function. By keeping track of the statical characteristics of the prediction error he was able to use a binary implementation of a time-varying influence function on the second pass. Mason achieved good results for varying combinations of simulated data at and above OdB. The SNR of real EEG signals, however, tend to be less than OdB, which suggested that Mason's location estimator may not work as effectively with real data. Mason had tested his location estimator over a range of SNRs and spectral overlaps between the ERP and the background EEG signals. His simulations identified that the algorithm performed better as the SNR was increased and amount of spectral overlap was decreased. SNR and Spectral Overlap Spectral overlap is a measure of how similar the ERP is to the background EEG signal in the frequency domain. If the frequency profiles are very similar, a predictive filter like the cleaner would have difficulty differentiating between the two signals. SNR is the ratio of the power in the ERP to the power in the background EEG. In this case, it is not so much how similar they are but how much smaller the power of the ERP is relative to the background signal. To date, there is very little known about single-trial ERP. Through averaging and other techniques it is known that they exhibit low SNRs. But we have little information regarding their spectral characteristics. Realizing that the sensitivity of the OPM to both of these factors is primarily a function of the signal estimator, we decided to analyze the effects of spectral overlap and SNR on the parameter estimator. Perhaps, the parameter estimates could provide 32 Chapter 3— ROBUST SIGNAL ESTIMATION some insight into the spectral characteristics of the ERP. Further, if there is very little or no spectral overlap between the ERP and the background sequence, a location estimator could be developed for a time-varying influence function. The results of the simulations will be presented in Section 4.1. 33 Chapter 3— ROBUST SIGNAL ESTIMATION 3.4 Summary of Issues In summary, two issues were explored with regards to the robust signal estimator: 1. The three types of cleaners, a. Filter-Cleaner b. Fixed-Lag Cleaner c. Smoother-Cleaner were tested on simulated data to determine if an improvement could be made to the filter-cleaner. 2. The effects of SNR and spectral overlap on the parameter estimator and how that information could be used to develop a time-varying influence function. 34 Chapter 4 THE IMPROVED OPM Simulation studies were performed to test the issues discussed in the previous chapters and analyze their trade-offs. A new and improved OPM will be presented, based on the results of these simulations, and it will be tested on real EEG in the following chapter. 4.1 Data Simulation The background EEG was simulated by applying a filter, based on a known set of autoregressive parameters, to a random Gaussian error distribution. Recall that an autoregressive sequence is described as p Xi = Y2xi-k9k + Vi ( 4 .1 .1 ) k=l where represents an error term and { 0 i , 0 P } are the AR parameters. For the purposes of these simulations the AR parameters were specified as 01 = 0 . 8 3 8 92 = - 0 . 4 7 1 0 3 = 0.638, 0 4 = - 0 . 4 2 9 (4 .1 .2 ) 95 = 0 . 5 1 8 0 6 = - 0 . 3 0 4 0 7 = 0 . 1 8 2 0 8 = - 0 . 2 4 3 . The ERP was also simulated as an autoregressive sequence ( 4 .1 .3 ) which will be referred to as the outlier sequence. The reason that this sequence was chosen was because the spectral distribution of this sequence is somewhat overlapped with the spectral distribution of the background EEG signal. As will be discussed later in this chapter, spectral distribution can greatly affect the.-results of ERP extraction and for this reason 35 Chapter 4— THE IMPROVED OPM a middle ground spectral distribution was chosen. Oi = 0.6 62 = -0.2 03 = -0.2 04 = 0.4 (4.1.3) The length of the background sequence was set to 100 samples. The ratio of the lengths of the two sequences determined the level of contamination. For example, for 10% contamination, the length of the outlier sequence would be set to 10 samples. The outlier sequence was scaled to achieve a specified SNR and then added to the background sequence at a randomly chosen location, resulting in the simulated observation.. 36 Chapter 4— THE IMPROVED OPM 4.2 Initial Parameter Estimates The first test dealt with the capabilities of M E M and MR to estimate the auto-regressive parameters characterizing the background EEG signals. The mean squared error between the parameter estimates and the true parameters (4.1.2) were calculated, where the mean was taken over the 100 trials for varying levels of contamination. Figure 8 depicts the mean squared error of each AR parameter as generated by the two estimation techniques. As expected, M E M outperformed MR in the case of no outlier content. The least squares estimator is optimal for that case, especially since the error distribution in the simulated data was Gaussian. However, the results were not so clear cut with contaminated data. In general, MR appeared to provide better estimates of the first few parameters and M E M appeared to produce the same or slightly better estimates of the last few parameters. 37 Chapter 4— THE IMPROVED OPM Figure 8: Comparison of Estimated Parameters between MR and M E M Comparison of Estimated Parameters between MR and MEM Because these results were not definitive, a second measure of comparison was investigated. Two new sets of background sequences based on the parameter estimates of M E M and MR were generated. The mean squared error was calculated between these signals and the original background sequences, which were based on the true parameters. The estimation technique that exhibited the lowest MSE was denoted as the "best" technique for that particular trial. The percentage of best trials for the different amounts of contamination is plotted in Figure 9. As expected, M E M had a far greater number of best trials than MR for the no contamination case. But MEM's performance deteriorated and MR's improved as the amount of contamination was increased. Following this trend 38 Chapter 4— THE IMPROVED OPM would indicate that in the case of a significant amount of outlier content (over 50%), MR might be the better choice. Figure 9: Percentage of best trials between M E M and MR i i MEM Iii MR 10% 20% Amount of Contamination 50% However, Figure 10 indicated otherwise. It is a plot of the actual MSEs for each trial. Although better than M E M in some cases, MR produced extremely large errors for a number of trials. MR did not appear to be a very consistent estimator and even at 50% contamination, it outperformed M E M only 46% of the time. M E M appeared to be more successful in tracking the signal and maintaining a stable level of error. This could be due to the low SNR (OdB) or the patchy type of outlier. In any case, since both situations are expected in real EEG, M E M should be retained for the initial parameter estimates. 39 Chapter 4— THE IMPROVED OPM 40 Chapter 4— THE IMPROVED OPM 4.3 Automatic Generation of Tuning Constants Since G M is the fundamental component of GM2, this test studied the advantages and disadvantages of using the automatic generation of tuning constants compared to using trial and error for GM-estimation. If the tuning constants in G M are to be automatically customized to the data set, it was necessary to use only Tukey's Biweight for the GM-estimation. This, however, introduced a risk of producing extraneous roots for the solution of GM. To analyze the effects of this trade-off, the simulated data at OdB, was processed by the original GM and the proposed GM_agtc ( automatic generation of tuning constants). The tuning constants selected for GM, determined through trial and error, were ^Huber:k=1.0 V"rukey:c=3.0 . The mean squared error between the parameter estimates and the true parameters (4.1.2) were calculated, where the mean was taken over 100 trials for varying levels of contamination. The results are shown in Figure 11. For 50% and 0%contamination, both techniques were extremely close. For the cases of 20% and 10% contamination, G M produced slightly better estimates. Although, this was a positive result, it was intriguing that GM_agtc did not do better than GM, considering that it's tuning constants were supposedly optimal. This could be due to the replacement of the Huber Psi function with the Tukey's Biweight for the first two iterations. Also, the AVF was derived based on 41 Chapter 4— THE IMPROVED OPM the assumption that ra —> oo and the small sample size of 100 may incur deviations from the true optimal tuning constant. Figure 11: Comparison of the parameter estimates generated by GM and GM_agtc. 0.45-, 0-1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 Parameter No. Since these results were not definitive, two new sets of background sequences were generated based on the GM and GM^agtc parameter estimates. The mean squared error was calculated between these signals and the original background signals. The estimation 42 Chapter 4— THE IMPROVED OPM technique that had the lowest MSE was denoted as the "best" technique for that particular trial. Figure 12 shows the percentage of best trials for both techniques for the various levels of contamination. Once again, no one technique dominated. For all cases, the percentages hovered around 50%. These results seemed to indicate that we should expect to achieve at least equal performance from GM_agtc. Figure 12: Percentage of best trials between GM and GM_agtc. 100-9 CM 8CH GM agtc 7CH 6CM CD Q_ 0-07. 10% 20% Amount of Contamination 50% 43 Chapter 4— THE IMPROVED OPM 4.4 Cleaners The niter-cleaner, fixed-lag cleaner and smoother cleaner were compared to each other by measuring how well each cleaner estimated the background sequences. The inputs to each cleaner were the true known AR parameters (4.1.2) and the simulated observation sequences. Each cleaner then output an estimate of the background sequence. The signal mean squared error between the estimated and the actual background sequences were calculated and used as a measure of performance. The cleaner that produced the lowest mean squared error was denoted as the "best" technique for a particular trial. The smoother cleaner, by far, outperformed the other two as shown in Figure 13. The fixed-lag cleaner ranked second and the filter-cleaner was the worst. Figure 13: Percentage of Best Trials by each Cleaner 100-80-1 f&M Smoother §§i Fixed—Lag SI Filter 70A 60A 201 30-1 1<H 0-0% 10% 20% Amount of Contamination 50% 44 Chapter 4— THE IMPROVED OPM The MSEs between the cleaners, averaged over 100 trials, are listed in Table 1. The average MSEs for the fixed-lag cleaner are approximately 3/4 that of the original filter cleaner and, as predicted, the average MSEs of the smoother cleaner are even better. Although it would seem obvious that the smoother cleaner is the best method, there are a number of issues to consider with regards to this method. First, the smoother cleaner algorithm is susceptible to problems because it requires the calculation of the inverse of the M matrix which can be singular at times. Table 2 lists the differences between the MSE's and it is clear that the improvement with the fixed-lag cleaner over the filter cleaner is an order of magnitude greater than the improvement achieved with the smoother cleaner over the fixed-lag cleaner. In fact, when the smoother cleaner was removed from the comparison, the fixed-lag cleaner picked up every single trial for which the smoother cleaner had been ranked best, as shown in Figure 14. Table 1: MSEs averaged over 100 trials Percent Contamination Filter-Cleaner Fixed-Lag Cleaner Smoother Cleaner 0% 1.041 0.745 0.682 10% 1.074 0.760 0.689 20% 0.981 0.748 0.699 50% 1.243 1.12 1.10 45 Chapter 4— THE IMPROVED OPM Table 2: Difference in MSEs between cleaners. Percent Contamination M S E Difference Fixed-Lag and Filter (avg. over 100 trials) M S E Difference Smoother and Fixed-Lag (avg. over 100 trials) 0% 0.296 0.063 10% 0.314 0.070 20% 0.232 0.049 50% 0.128 0.016 Figure 14: The percentage of best trials, excluding the smoother-cleaner. 0% | O X 20* S O X The results of these simulations showed that the original filter-cleaner could definitely be improved upon. Considering that the smoother cleaner was only slightly more accurate than the fixed-lag cleaner, it would be more efficient to use the fixed-lag cleaner and save on time, memory and possible complications due to singular matrixes. 46 Chapter 4— THE IMPROVED OPM 4.5 Time Varying Influence Function In Section 3.3, it was suggested that some insight could be gained regarding the spectral characteristics of an ERP by analyzing the effect of spectral overlap on the parameter estimates. Outlier sequences based on three different sets of AR parameters were chosen to test varying amounts of spectral overlap. The spectral plots of the outlier sequences are shown in Figure 15 and their corresponding AR parameters are listed in Table 3. The first sequence exhibits the least amount of overlap whereas the third outlier sequence is heavily overlapped with the background signal. Note that outlier no.2 was used in the previous simulation tests. 100 trials were generated with each outlier sequence (1-3) at SNRs of 5dB, OdB and -5dB. Figure 15: The spectral plots of the four different outlier sequences corresponding to the AR parameters listed in Table 3. The spectrum of the background sequence is the dotted line in each plot. 3) / 1 J 1 J ' i V i V ^ \ J * f 1 N Table 3: The AR parameters for the different outlier sequences. Outlier No. Oi 02 03 04 1. 0.3 -0.1 -0.3 0.4 2. 0.6 -0.2 -0.2 0.4 3. 0.72 -0.31 0.15 -0.05 47 Chapter 4— THE IMPROVED OPM The data was then processed by GM2 and MEM. GM2 was chosen as the best representative of a robust estimator and M E M was used as a reference. As seen in Figure 16, GM2's error level remained reasonably consistent across SNRs for each outlier sequence, reaffirming the superior performance of this robust estimator. The difference was in the M E M estimates. The error in the M E M estimates decreased as the SNR decreased and as the spectral overlap increased. In other words, as the outlier became less distinct and more similar to the background sequence, MEM's accuracy in estimating the background AR parameters improved. This quality of the M E M estimator was exploited to assist in revealing the spectral characteristics and the location of the outlier sequence. The SNR of —5dB was of particular interest because this is the range at which real ERP information is expected. At — 5dB the M E M and G M estimates were very similar. Earlier studies by Birch [8] on simulated data showed that when there was no outlier content, M E M was actually a better performer than GM2. Utilizing this information, for the case of SNR -5dB, the difference between the GM2 and M E M estimate would be significant when there was no outlier content and minimal in the vicinity of the outlier. By summing the square of the difference between the M E M and GM2 estimates for each parameter, a single quantity (Cumulative Difference) could be calculated to indicate the location of the outlier content. A set of 100 trials with each type of outlier was created to test the ability of this (4.5.1) i=i 48 Chapter 4— THE IMPROVED OPM 49 Chapter 4— THE IMPROVED OPM method to locate the outlier content. Each trial was generated with a background sequence of 1000 samples, a randomly placed outlier sequence of 200 samples with an SNR of —5dB. The background sequence length was increased to 1000 to accommodate a moving window of 100 samples to calculate the CD profile. Figure 17 consists of four examples of CD profiles with outlier no.3 where the true location of the outlier is within the dashed lines. The results from the trials with outlier no.l and no.2 showed that the outlier regions could not be clearly identified as they were in Figure 17. The reason for this might be that outlier no. 3 was the most heavily overlapped with the background EEG with respect to its spectral distribution and the M E M estimate would not be greatly affected by the outlier content. It would be expected that the MEM estimate would be consistent across the entire signal. However, the GM2 estimate would be a little off outside the outlier vicinity and improve in the outlier region, which is exactly why the difference is much larger outside the outlier in Figure 17. There was a slight lag in the CD profiles because a moving window of 100 samples in length was used. Since, outliers no.l and 2 begin to have spectral differentiation with the background EEG, the outlier content would begin to affect the M E M estimate and its performance would deteriorate with respect to GM2. Consequently, the two estimates would differ both outside and within the outlier region. 50 Chapter 4— THE IMPROVED OPM Chapter 4— THE IMPROVED OPM 4.6 Summary The results of these simulation studies are summarized below with the corresponding modifications to the OPM. 1. The automatic generation of tuning constants successfully equalled the performance of the original GM. By eliminating the trial and error methods, GM_agtc is an easier tool to use. There is also an additional security in knowing that if the data, for some reason, required dramatically different tuning constants, this technique would adapt to the data. This method was incorporated into the new OPM. 2. It was recommended that MEM remain as the initial parameter estimator for GM. MR appeared to be an inconsistent estimator. 3. The smoother cleaner was the most accurate cleaner but the fixed-lag cleaner proved to be the most efficient cleaner. The fixed-lag cleaner was only slightly less accurate than the smoother cleaner and didn't require as much memory or time. The fixed-lag cleaner replaced the filter-cleaner in the new OPM. 4. The cumulative sum of the difference between the GM2 and M E M parameter estimates would be effective at locating outlier content at low SNRs (—5dB) if the spectral characteristics of the outlier signal and the background EEG were heavily overlapped. Since this technique is specific to a particular type of signal, it was not implemented in the new OPM. The new OPM will be referred to as OPM2. This new method was tested on real EEG signals and compared to two other algorithms in the next section. 52 Chapter 5 TESTS ON REAL EEG 5.1 Analysis of EEG signals Ensemble Averages Real EEG signals had been previously collected for a conference paper by Mason et. al [22]. A detailed description of the experiment and data collection can be found in that paper. 100 trials of active and 100 trials of idle EEG signals from a single subject were selected for this study. The active trials consisted of 2 seconds prior to, and 1.5 seconds after a ballistic flexion of the right index finger. The idle trials were of the same total duration (3.5 seconds) during which the subject had been asked to relax but remain alert. The data had been sampled at 256Hz. There is a direct correlation between the sampling frequency and the AR model order. A higher sampling rate requires the use of a higher model order since the time dependency is spread over a greater number of sample points. It would be desirable to use the lowest model order possible without sacrificing information contained in the EEG signal. Since, the majority of the power in an EEG signal lies below 30Hz [8], that would be the highest frequency of interest. The EEG signals were lowpass filtered (cutoff 30Hz) and decimated by a factor of 4, reducing the sampling frequency to 64Hz. Fifteen signals had been recorded for each trial. The first two signals were indicators of EOG 3 activity and finger position. The remaining thirteen signals were EEG recorded from scalp electrodes. C 3 , C 4 , F z , C z , P z and O z are standard International 10/20 system 3 EOG is a measure of the variations in the coreneal-retinal potential as affected by the position and movement of the eye. 53 Chapter 5— TESTS ON REAL EEG electrode sites [22]. The additional electrodes had been added at equally spaced intervals between these electrodes. Al l of these signals had been referenced to the O z electrode. It is generally believed [23], [24] that the supplementary motor area (SMA) and motor cortex are parts of the brain that play significant roles in the execution of voluntary movements. The SMA is located in the frontal lobe of the brain in the vicinity of the F z electrode. The contralateral region responsible for right index finger movement corresponds to C 3 , which is located on the motor cortex in the left hemisphere of the brain. Ensemble averages (sample averages over 100 trials) of both these electrode sites are depicted in Figure 18. Assuming that the background EEG had a zero mean, the process of averaging was used as an aid to reveal the underlying ERP that may be present but would not be apparent in the single-trial. The ensemble averages of the active trials revealed distinct waves of electrical activity coinciding with the finger movement. Note that the finger movement is marked at 2.0 seconds. In sharp contrast, the ensemble averages of the idle trials were random low amplitude signals with means close to zero. The third signal of interest was F z — C 3 . The ensemble average of the difference signal was a very characteristic signal with a sharp transition from a negative peak to a positive peak.4 It would be easier for a method like OPM, which has demonstrated better performance in extracting sharp transitions, to pick out such a distinct signal. Once again, the ensemble average of the idle F z - C 3 signals, Figure 18f, was a random low amplitude signal. All EEG signals will be portrayed with negative voltages above the zero line as is the convention. 54 Chapter 5— TESTS ON REAL EEG Figure 18: Ensemble averages of the active and idle trials at electrodes F z and C3. a) Fz - active 0 J v ^ ^ \ ^ u w Y - " A A - ^vy—v 0.5 b) Fz - idle 1.5 0.5 c) C3 - active d) C3 - idle 1.5 2.5 3.5 — 1 — 0.5 -1 3.5 1.5 2.5 e) Fz-C3 - active / A ^ V V V M - V A f) Fz-C3 - idle 55 Chapter 5— TESTS ON REAL EEG j=i L » = i (5.1.1) SNR The calculation of SNR was not a straight-forward process. The SNR of any one active trial could not be calculated without attempting to extract the single-trial ERP. The extraction algorithm would then introduce an entire new set of factors that would have to be identified and accounted for. A simpler approximation of SNR is presented below. The noise power was defined as the average power in the idle trials. This translated to calculating the power in each idle trial and then taking the average of the powers over the 100 trials. ^ 100 " ~ Too The signal power was obtained from the ensemble average of the active trials' which was an approximation of a single-trial ERP. Two variations of signal power were calculated. The first was the average power in the ERP, where the ERP was defined by a window from a to b. P>-«» = (b-a)*f. + l ^ *' ' f s = 6 4 H z ( 5 A 2 ) i=a*fs The second was a calculation of the maximum signal power in the ERP. This was simply defined as the square of the maximum amplitude. Ps-max = max(sf), i = (o,...,6) * fs (5.1.3) The average and maximum SNRs were calculated using the corresponding signal power with the noise power as described by (5.1.1). Because the signal powers were based on ensemble average ERPs, the SNRs listed in Table 4 are slightly under-estimated. 56 Chapter 5— TESTS ON REAL EEG Single-trial ERPs are known to shift in time from trial to trial. By averaging across the 100 active trials, peaks that might exist in the time-shifted single-trial ERPs would tend to be smoothed over in the ensemble average. Consequently, these SNRs are conservative approximations and the actual SNRs should be slightly higher. Table 4: Average and maximum SNRs at Fz, C3 and Fz-C3 SNR F z c 3 Fz-C3 average SNR -8.9dB -9.0dB -9.3dB maximum SNR -5.3dB -4.0dB -4.0dB a 1.85sec l.Osec l.Osec b 2.15sec 3.5sec 3.5sec Ranking Trials Since not all of the trials were equally "good" trials, they were ranked according to the quality of the EOG and finger movement. An automatic system of rejecting trials with EOG artifact had been used during the data collection. The rejection criterion was defined by a pre-determined threshold. However, there were still trials with poor quality EOG that slipped through the system. Ideally, a good EOG signal would have a constant mean across the EEG epoch and a low variance. In other words, the flatter the EOG the better. The quality of the finger movement was judged by a signal that recorded the position of the finger relative to a 'white light' sensor. A weak finger movement may result in a poor ERP. Therefore, it was important to use only those trials with strong finger movements characterized by a sharp distinct change in the position of the finger as it was flexed. The active trials were initially given two ranks, EOG and finger movement. 57 Chapter 5— TESTS ON REAL EEG A combination of the EOG ranking and finger movement ranking was used to provide an overall ranking for each active trial. The idle trials did not contain any finger movement and were therefore ranked by EOG exclusively. 58 Chapter 5— TESTS ON REAL EEG 5.2 Test Procedure The improvements to the original OPM, as discussed in Chapter 4, were implemented and tested on the real EEG signals. F z — C 3 was the signal chosen for processing because it displayed the most distinct features. It was found that 50 of the 100 trials would not be suitable for processing due to either poor EOG and/or finger movement. Therefore, only the top 50 ranked trials were tested. The procedure was comprised of three extraction techniques; 1) the original OPM (OPM1) , 2) the improved OPM (OPM2) and 3) Minimum Mean Square Error (MMSE) filtering. MMSE was included to provide a comparison with a well-established and well-known extraction method. MMSE will be briefly overviewed in this section. The performance of each technique could not be evaluated with regards to how well it extracted the ERP because there was no true single-trial ERP to serve as a reference. The ensemble average would not be a good choice because it lacks unique single-trial information. The performance of each technique was measured by how well it could differentiate between idle and active trials. This was measured by a classifier which output the decision of whether the trial had contained an ERP or not. The classifier that was specifically used for these studies was Dynamic Time Warping. 59 Chapter 5— TESTS ON REAL EEG Figure 19: Test Procedure extracted top 50 idle trials top 50 active trials OPM1 signal • extracted OPM2 signal • • extracted MMSE signal • 50 active ERP estimates 50 idle ERP estimates 50 idle ERP estimates 50 idle ERP estimates MMSE MMSE filters have been used for various types of signal extraction. In particular, MMSE was used by McGillem et al. [6] to extract visually evoked potentials, a type of ERP. These filters are based on the principles of minimizing the mean squared error. The EEG signal is represented by yi = Xi + Vi (5.2.1) where yf. observed signal xf. background EEG vi: ERP 60 Chapter 5— TESTS ON REAL EEG An estimate of vi is obtained by applying a filter to y,-, where the filter coefficients are chosen such that they minimize the sum of the squared errors between v{ and v%. Thus, the filter coefficients are chosen to minimize Y^iyi-vi). (5.2.2) i The filter coefficients are obtained from the following equation. h = R _ 1 g (5.2.3) where R: sample correlation matrix of y g: sample cross-correlation vector between y and v h: filter. As the ERP is an unknown entity, the filter coefficients are based on using the ensemble average for v. The ERP estimate, v, by applying the filter to the observed signals, y. Performance Measures A critical problem encountered in this type of research is the inefficiency of available performance measures for the evaluation of ERP extraction methods. This problem is not limited to the scope of ERP signals alone and is applicable to other methods that extract transient signals. Performance measures are needed to quantify the results obtained from extraction methods because it is not possible to visually inspect the large quantities of data processed 61 Chapter 5— TESTS ON REAL EEG during our studies. In addition, visual inspection is not useful to quantify the performance of any given method. A performance measure for a signal extraction technique is a quantity describing how closely the extracted signal matches the true signal. Dynamic Time Warping Due to the variable latencies of ERP components, Dynamic Time Warping as described by Roberts [25], was chosen as the performance measure for evaluating the performance of each technique. Given two waveforms, it finds the optimal time adjustment through shifting, stretching and contracting the time axis to minimize a specified cost. In this case, DTW produced a cost of warping an ERP estimate along the time axis to best match the standard wave template. The cost of warping is defined by Roberts et al. [2] in terms of a correlation, Q, and a penalty function, P, as Q(a, 6, w) — aP(w) The higher the cost, the worse the match. A randomly selected subset of 25 active ERP estimates were chosen as the active test set and the remaining 25 active trials were used to generate a standard wave template for DTW. To compare the same number of active and idle trials, an idle test set of 25 ERP estimates was also randomly selected. Both test sets were processed by DTW and a cost for warping each trial to the standard wave was generated. 62 Chapter 5— TESTS ON REAL EEG Figure 20: Classification with Dynamic Time Warping 50 active ERP estimates randomly select 25 trials 50 idle ERP estimates randomly select) 25 trials active test set Remaining — • Standard 25 trials Wave idle test set The costs for the active test set were compared to the costs for the idle test set to determine the best threshold for discrimination. Each technique was evaluated on how well it discriminated between the active and idle trials. 63 Chapter 5— TESTS ON REAL EEG 5.3 Results Ensemble Averages and Single-Trials The ensemble averages of the top 50 active and idle ERP estimates were computed and plotted for each technique, Figures 21 and 22. The averages of the active ERP estimates indicate that the ERP related information was preserved by all three extraction methods. The finger movement is characterized by a negative peak at 2.0 sec. and a positive peak between 2.1 and 2.2 sec, which is consistent with the ensemble average of the original data, Figure 18e). In the case of MMSE, the waveform shape is very similar to Figure 18e), which is not surprising, since the MMSE filter that was applied to each signal was derived from Figure 18e). The ensemble averages of the ERP estimates resulting from OPM1 and OPM2 distinctly show the two peaks characterizing the ERP and exhibit the same negative and positive biases, on a smaller scale, preceding and following the two peaks as in Figure 18e). It is important to highlight at this point that the OPM algorithms were successful in preserving the ERP related information without any a priori information about the shape of the ERP waveform. As expected, the idle ERP estimates in Figure 22 do not exhibit any characteristic peaks. The ensemble average of the raw idle data, Figure 18f), represents an ensemble average of the background EEG signal which appears to have a positive bias. This positive bias is also seen in the ensemble average of the idle signals processed by MMSE in Figure 22. However, the idle signals processed by OPM1 and OPM2 are reasonably fiat signals with zero DC bias. This is probably due to the fact that the OPM1 and OPM2 64 Chapter 5— TESTS ON REAL EEG techniques attempt to remove the background E E G signal which should ideally produce a completely flat signal. Figure 21: Ensemble averages of the active ERP estimates 5 O0-. 1 1 : I : i 400- | 65 Chapter 5— TESTS ON REAL EEG Figure 22: Ensemble averages of the active ERP estimates 500-400-3 0 0 - \ 200H IOOH -100--200--300-- 4 0 0 H -500- —r— 0.5 0PM1 0PM2 MMSE — I — 2.5 Time (sec) 3.5 Figures 23 and 24 show the processed- ERP estimates for four different active trials. For the majority of the trials, OPM1, OPM2 and MMSE successfully extracted the peak at 2.0 sec. and the second peak between 2.1 and 2.2 sec. Given the extremely low SNR conditions, the successful extraction of the ERP-related information on a single-trial basis without a priori information, for the OPM1 and OPM2 algorithms, was a very encouraging result. This reaffirmed Birch's results [8] which used the OPM1 method. 66 Chapter 5— TESTS ON REAL EEG The results of processing four different idle signals through the three different extraction techniques are shown in Figures 25 and 26. As they should be, the OPM1 and OPM2 single trial idle signals were fairly flat and centered around zero volts, as there should not be any ERP related information or background EEG in these signals. The MMSE idle estimates were not so clean because the MMSE filter is based on the ensemble average of the active trials and does not attempt to remove the background EEG signal. 67 Chapter 5— TESTS ON REAL EEG 68 Chapter 5— TESTS ON REAL EEG Chapter 5— TESTS ON REAL EEG 70 Chapter 5— TESTS ON REAL EEG 71 Chapter 5— TESTS ON REAL EEG Performance Measurements A s described earlier in this section, Dynamic Time Warping was used to measure the performance of each extraction technique. The effectiveness of each technique was measured by determining how well each technique was able to detect an active signal and reject an idle signal. This was accomplished by determining a cost threshold, based on D T W , to differentiate between active and idle trials. A l l trials with costs above the threshold would be rejected whereas all trials with costs below the threshold would be accepted as active trials. The optimum cost threshold was determined for each extraction method by determining the cost that would maximize the number of active trials that were detected and minimize the number of idle trials that were mistaken for active trials, false positives. A standard template of an active E R P estimate was generated for each technique using the method described by Roberts [25]. These standard templates are shown in Figures 27, 28 and 29. 72 Chapter 5— TESTS ON REAL EEG 73 Chapter 5— TESTS ON REAL EEG Figure 29: Standard template used for DTW for MMSE estimates A separate template was generated for each extraction technique to ensure that the optimum template was used to determine the cost of warping the ERP estimates. Each active and idle processed signal was warped to its corresponding template and a cost was calculated for each signal. The idle and active costs were then analyzed for each extraction technique and a cost threshold was determined which would maximize the number of active trials that were detected and minimize the number of idle trials that were mistaken for signals containing ERP information. Table 5 lists the results of the DTW cost comparisons. 74 Chapter 5— TESTS ON REAL EEG Table 5: Results of D T W cost comparisons Extraction method Active trials detected False Positives detected OPM1 15/25 2/25 OPM2 13/25 0/25 MMSE 16/25 10/25 The ideal extraction method would have detected 25/25 of the active trials and detected 0/25 false positives, where a false positive is an idle trial that is mistaken for an active trial. However, since all of these methods fall short of ideal, it is important to note that in the real world situation of dealing with severely disabled individuals, a false positive is a far more serious error than rejecting an active trial. OPM1 and OPM2 proved to be better methods for rejecting idle data. 75 Chapter 6 CONCLUSIONS 6.1 Contributions Three different ERP extraction methods, MMSE, OPM1 and OPM2, were imple-mented and tested on simulated and real EEG signals. MMSE is the conventional approach to the problem and served as a benchmark for these tests. OPM1 was the first method that successfully extracted ERP related information from single trial EEGs without any a priori information. OPM2 was a method developed to improve upon the original OPM1. Four components of the OPM1 algorithm were studied and tested with simulations to develop the OPM2 method. M E M was retained as the initial parameter estimator for the robust parameter estimation in OPM2. Median Regression, which was an alternative method for parameter estimation, was shown to be an inconsistent estimator. The automatic generation of tuning constants, used in OPM2, successfully equalled the performance of the original OPM1, which determined the tuning constants using trial and error methods. By eliminating the trial and error, OPM2 was an easier tool to use. The smoother cleaner was the most accurate cleaner but the fixed-lag cleaner proved to be the most efficient cleaner. The fixed-lag cleaner was only slightly less accurate than the smoother cleaner and did not require as much memory or time. In the OPM2, the filter-cleaner was replaced by the fixed-lag cleaner. 76 Chapter 6— CONCLUSIONS The cumulative sum of the difference between the G M 2 and M E M parameter estimates was shown to be effective at locating outlier content at low SNRs (—5dB) i f the spectral characteristics of the outlier signal and the background E E G were heavily overlapped. Since this technique was specific to a particular type of outlier, it was not implemented in O P M 2 . The results of the tests on real E E G showed that all three algorithms detected approximately the same number of active E R P trials. However, M M S E detected the largest number of false positives (10/25), O P M 1 detected only 2/25 false positives, O P M 2 was the best performer with 0/25 false positives. This study showed that the O P M 1 and O P M 2 algorithms successfully extracted E R P related information on a single-trial basis from real E E G signals without any a priori information. 77 Chapter 6— CONCLUSIONS 6.2 Future Work The most difficult part of this research was acquiring valid usable data. 50% of the original collected data had to be discarded due to EOG artifact and/or poor finger movement. This statistic could be improved by collecting data based on a more pronounced movement and by using an automated and more sophisticated system for EOG artifact detection. The tests on real EEG were performed on signals containing information related to a ballistic movement of the index finger. The SNR of these signals was quite low, approximately -9dB on average and only -5dB at the maximum. The low SNR could be attributed to the reflex nature of the finger movement. A more controlled and determined movement would probably result in a higher SNR and could significantly improve the performance of these extraction techniques. Although these studies were limited to a single EEG signal, FZ-C3, each electrode location has a unique characteristic waveshape when an event occurs. Several theories have suggested that the ERP related information travels across the scalp and that spatial information could provide key characteristics of the ERP for better location detection. The OPM algorithm is based on modelling the background EEG signal. The current model of a stationary autoregressive signal could be improved by using adaptive modelling techniques that would take into account the non-stationary characteristics of the background signal. Dynamic time warping was used as the performance measure for these tests. This 78 Chapter 6— CONCLUSIONS performance measure was very susceptible to large peaks because it is based on a direct correlation of the two waveforms being compared. Also, the penalty function is chosen by trial and error. An automated method of choosing the optimal penalty function for a set of data and an automated normalization incorporated into the algorithm would greatly improve the efficiency of this method. 79 Chapter 6— CONCLUSIONS References [1] D. Childers, N. Perry, I. Fischler, T. Boaz, and A. Arroyo. Event-related potentials: A critical review of methods for single-trial detection. CRC Critical Reviews in Biomedical Engineering, 14(3): 185-200, 1989. • [2] J. I. Aunon, C. D. McGillem, and D. G. Childers. Signal processing in evoked potential research: Averaging principal components, modelling. Critical Reviews in Bio engineering, pages 323-367, 1981. [3] S. K. Burns and R. Melzak. A method for analyzing variations in evoked responses. Electroencephalography and clinical Neurophysiology, 20, 1966. [4] R. Coppola, R. Tabor, and M. S. Buchsbaum. Signal to noise ratio and response variability measurements in signle trial evoked potentials. Electroencephalography and clinical Neurophysiology, 44, 1978. [5] T. Nogawa, K. Katayama, Y. Tabata, and T. Ohshio. Visual eviked potentials estimated by wiener filter. Electroencephalography and clinical Neurophysiology, 35, 1973. [6] Clare D. McGillem and Jorge I. Aunon. Measurementss of signal components in single visually evoked brain potentials. IEEE Transactions on Biomedical Engineering, BME-24(3):232-241, 1977. 80 Chapter 6— CONCLUSIONS [7] Shigeto Nishida, Masatoshi Nakamura, and Hiroshi Shibasaki. Method for single-trial recordings of somatosensory evoked potentials. Journal of Biomedical Engineering, 15:257-262, May 1993. [8] G.E. Birch, P. D. Lawrence, and R.D. Hare. Single-trial processing of event-related potentials using outlier information. IEEE Transactions on Biomedical Engineering, 40(l):59-73, 1993. [9] John C. Lind. The calculation of tuning constants for some re-descending ip functions in linear models. Technical report, Clinical Diagnostic and Research Center, Alberta Hospital, Edmonton, Alberta, Canada., 1992. [10]Yadolah Dodge, editor. L\ — Statistical Analysis and Related Methods. Elsevier Science Publishers B.V., 1992. [11]R.W. Koenker and V. D'Orey. Computing regression quantiles. Royal Statistical Society, pages 383-393, 1987. [12]Peter J. Huber. Robust estimation of location parameters. Annals of Mathematical Statistics, 35:73-101, 1964. [13]F. R. Hampel, E. M . Ronchetti, P.J. Rousseeuw, and W.A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons Inc., 1986. [14]Victor J. Yohai. Robust estimation in the linear model. The Annals of Statistics, 2:562-567, 1974. [15]M. Hlynka, J.N. Sheahan, and D.P. Weins. On the choice of support of re-descending functions in linear models with asymmetric error distributions. Metrika, 37:365-81 Chapter 6— CONCLUSIONS 381, 1990. [16]M.J.D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Computer Journal, 7:155-162, 1964. [17]L. Denby and R. D. Martin. Robust estimation of the first order autoregressive parameter. Journal of the American Statistical Association, 74(365): 140-146, 1979. [18]Ricardo A. Marronna and Victor J. Yohai. Asymptotic behavior of general m-estimates for regression and scale with random carriers. Zeitschrift fur Wahrschein-lichkeitstheorie und verwandte Gebiete, 58:7-20, 1981. [19]Samina Tajwar. The analysis of the asymptotic variance function for calculating tuning constants. Technical report, University of British Columbia, 1992. [20]R.D. Martin and D.J. Thomson. Robust-resistant spectrum estimation. Proceedings of the IEEE, 70(9): 1097-1115, 1982. [21]Steven G. Mason, Gary E. Birch, and Mabo R. Ito. Single-trial signal extraction of low snr events. IEEE Transactions Signal Processing (in press), January 1994. [22]S. G. Mason, G.E. Birch, S. Tajwar, and M.R. Palmer. The space-time characteristics of brain potentials related to single voluntary movements. In IEEE Sixth SP International Workshop on Statistical Signal and Array Processing, October 1992. [23]Akio Ikeda, Hans O. Luders, Richard C. Burgess, and Hiroshi Shibasaki. Movement-related potentials associated with single and repetitive movements recorded from human supplementary motor area. Electroencephalography and clinical Neurophys-iology, 89:269-277, 1993. 82 Chapter 6— CONCLUSIONS [24]Vander, Sherman, and Luciano. Human Physiology- The Mechanisms of Body Function. McGraw-Hill Publishing Company, 1990. [25]Kim Byron Roberts. Digital Processing of Somatosensory Evoked Potentials Applied to the Early Diagnosis of Multiple Sclerosis. PhD thesis, University of British Columbia, 1984. 83 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0065068/manifest

Comment

Related Items