Person Authentication Using EEG Brainwave Signals by Chen He B.Eng., McMaster University, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) September 2009 © Chen He 2009 Abstract Studies have shown that the electroencephalogram (EEC) recordings have unique pattern for each individual and thus have potential for biometric ap plications. There are two major problems associated with EEC biometrics. One is that the large EEG features size and the relatively limited EEG data size make it difficult to train a robust model; the other is that the signals from EEG scalp may not be reliable in many situations. Thus in this the sis we proposed new methods for increasing the accuracy and robustness of EEC-based authentication systems. First, to address the concern of the high dimensionality of EEC features, we proposed a novel dimension reduction method of EEG features based on the Fast Johnson-Lindeustrauss Transform (FJLT). We showed that this method has potential of mapping EEG features from a high dimension space to a lower one while keeping discrimination power between the features of subjects. The features we used are Multivariate Autoregressive (mAR) coefficients. We tested this method on a motor task related EEG data set. Second, to increase the reliability of scalp EEC signals, we employed an Independent Component Analysis (ICA)-based approach in our authen tication procedure, with the assumption that EEG recordings are linear combinations of the underlying brain source signals. We estimated the In dependent Components (ICs) from several physical regions on the scalp and determine the Dominating Independent Components (DIC) in the corre sponding regions. Then we extracted the Univariate Autoregressive (AR) coefficients from DICs as features. We tested our algorithm on two data sets, a motor task related EEC data set and an EEC data set of P300 potential. The proposed algorithm appeared to be promising, and when applied to EEG data collected from different days yields better performance 11 Abstract than other methods. This relative consistence over time is essential in person authentication systems. U’ Table of Contents Abstract ii Table of Contents iv List of Tables vii List of Figures ix Acknowledgements xi Statement of Co-Authorship xii 1 Introduction 1 1.1 Motivation 1 1.2 Background on EEG 3 1.2.1 EEG Artifacts 4 1.3 Literature Review 6 1.3.1 EEG-based Person Authentication and Identification 6 1.3.2 EEG Features 7 1.4 Research Objectives 11 Bibliography 15 2 Electroencephalogram (EEG) Hashing for Person Authen tication 17 2.1 Introduction 17 2.2 Method 18 2.2.1 Preprocessing of EEG Signals 18 iv Table of Contents 2.2.2 The Multivariate Autoregressive Model 19 2.2.3 Estimation of Multivariate Autoregressive Model . . . 20 2.2.4 Hashing MAR Coefficients via Fast Jonhson-Lindenstrauss Transform 24 2.2.5 Authentication Decision 28 2.3 Experiments and Results 29 2.3.1 EEG Database 29 2.3.2 Feature Extraction and Hashing 29 2.3.3 Experimental Results 31 2.4 Conclusion 31 Bibliography 34 3 An Independent Component Analysis (ICA) Based Approach for EEG Person Authentication 36 3.1 Introduction 36 3.2 Method 37 3.2.1 Independent Component Analysis 37 3.2.2 Estimation of ICA 40 3.2.3 Preprocessing of Data Before ICA 43 3.2.4 Dominating Independent Component Analysis for EEG 44 3.2.5 The Univariate Autoregressive Model 45 3.2.6 A Simple Model for Person Authentication 46 3.3 Experiments and Results 47 3.3.1 Experimental Results: Results on The Motor-task related Data Set 47 3.3.2 Experimental Results: Results on P300 EEG Data Set 53 3.3.3 Method of Gaussian Mixture and Maximum A Poste non Model Adaption for EEG Authentication . . . . 54 3.4 Conclusion 59 Bibliography 60 v Table of Contents 4 Conclusions and Future Work 61 4.1 Conclusions and Contributions 61 4.2 Future Work 62 Bibliography 64 vi List of Tables 1.1 A summary of EEG frequency band. 5 2.1 The error rates: FAR, FRR and HTER in %. Si to S4 denotes subject 1 to 4, and Average(S) is the average of 4 subjects. Ti to T5 denotes task 1 to 5 and Average(T) is the average of 5 tasks 32 3.1 Error rates of our method on motor-task-related data set, the proposed method: FAR,FRR and HTER error rates in % with the 5-th order AR modeling. Si to S7 denotes subject 1 to 7, Ti to T5 denotes task 1 to 5 and S denotes the average of a given task over all the subjects 50 3.2 Error rates of the proposed method on motor-task-related data set: FAR, FRR and HTER in %. P denotes the order of AR model, Ti to T5 denotes task i to 5 and T denotes the average of a given order over all tasks 51 3.3 Error rates of the method of [8] on motor-task-related data set: FAR,FRR and HTER in %. Si to S7 denotes subject 1 to 7, Ti to T5 denotes task 1 to 5 and S denotes the average of a given task over all the subjects 52 3.4 Results of the proposed method on the P300 data set: The power percentages of DICs over the corresponding regions. DIC1 ,..., DIC5 represntsent the dominating ICs from regions ito5 54 vii List of Tables 3.5 Error rates of the method in [8] on P300 data set: FAR, FRR and HTER are in %. Si to S7 denotes subject 1 to 7 S denotes the average of a given task over all the subjects. Dayi means testing EEG data of the first day by using parameters that is trained in the first day and day2 means testing the EEG data of a second day by using the parameters trained in the first day 54 3.6 Error rates of our method on P300 data set: FAR, FRR and HTER are in 9o with 5-th order AR modeling. Si to S7 denotes subject i to 7 5 denotes the average of a given task over all the subjects. Dayi means testing EEC data of the first day by using parameters that is trained in the first day and day2 means testing the EEG data of a second day by using the parameters trained in the first day 56 viii List of Figures 1.1 A set of EEC devices. The picture is from [1] 4 1.2 An early EEG recording, obtained by Hans Berger in 1924. The upper trace is EEC, and the lower is a 10 Hz timing signal. The picture is from [2] 4 1.3 The general scheme of the EEC authentication system 7 1.4 An illustration of the procedure of EEC-based person authen tication system. The two important components that we add into the general EEC-based person authentication system are: Reliability Increase of EEC signals via ICA-based approaches, and Size Reduction of the EEC features via FJLT hashing. 12 2.1 The location illustration of the EEC channels on the scalp. Those shaded are used for our experiments 30 2.2 This figure shows the the separation of hash vectors between subjects, the first three dimensions of the hash vectors are plotted 30 2.3 An example of log-likelihood values of the EEC hash vectors from the client and impostors 33 3.1 The scalp signal Cl is a linear combination of underlying brain signals Si, S2 and S3 37 3.2 The source signals and the mixtures of the signals 38 3.3 The estimated source signals by ICA 39 3.4 An example of log-likelihood values of AR coefficient vector from DICs. Circles indicate that of client and squares are from impostors 49 ix List of Figures 3.5 An illustration of average HTER, FAR and FRR over seven subjects as a function of threshholds. This example is from task 5 with AR model p = 7 49 3.6 Results of the proposed methods on the P300 data set: plots of the mean values of the AR coefficients for 5 DICs from 7 subjects. From top-left to bottom: DId, ..., DIC5. We can see that the first three AR coefficients are separated for each subjects 55 3.7 An illustration of the location of electrodes on the scalp. Elec trodes used in [81 are indicated in grey 57 x Acknowledgements I would like to thank everyone who has helped and inspired me during my master study. I especially want to convey my enduring gratitude to my supervisor, Dr. Z. Jane Wang, for providing me the opportunity to work in her research group, for her patient guidance, persistent encouragement and deep insight in the research area that have helped me throughout the course of my thesis research. I would like to extend my thanks to Dr. Martin McKeown who provides data for this research and the researchers in our lab for their support and help. My deepest gratitude goes to my parents for their endless love and con stant support over the year. Without their encouragement, I would have not made as far as I did. This work was supported by the NSERC STRGP grant and the Collab orative Health Research Projects grant. xi Statement of Co-Authorship The research was initiated by Dr. Z. Jane Wang and the majority of the research, including literature survey, model design, numerical simulation, statistical data analysis and compilation of results, was conducted by the author of this thesis, with suggestions from Dr. Z. Jane Wang. This thesis is based on two conference papers: • Chen He, Xudong Lv and Z. Jane Wang, “Hashing The MAR Coeffi cients from EEG Data for Person Authentication,” in Proc. 33th IEEE mt. Conf Acoustics, Speech, and Signal Proc. (ICASSP), 2009. • Chen He and Z. Jane Wang, “An Independent Component Analy sis (ICA) Based Approach for EEG Person Authentication,” in Proc. 3rd IEEE mt. Conf. Bioinformatics and Biomedical Engineering (ICBBE), 2009. Dr. Z. Jane Wang also helped greatly on revisions of the conference manuscripts. Mr. Xudong Lv helped on implementing the Fast Jonhson Lindenstrauss Transform for the work in Chapter 2. xii Chapter 1 Introduction 1.1 Motivation Biometrics, the measurement of personal physical features, actions or be havioral characteristics which distinguish between individuals, has a long history. People from Asia and the Middle East used their fingerprints as signature as long ago as 1500 BC. However, technology development in bio metric, is a new concept, and it often refers to automatic biometrics system. Recently, intense research has been conducted in biometrics technology and now many biometric systems reach the levels of success originally touted [6]. An automatic biometrics system has extensive application such as a form of identity access management and access control, areas of identity background checking, and detainee processing. The systems offer several advantages over current recognition methods, including the adding the pos sibility of measuring features (e.g. iris pattern) that carmot be readily sensed by humans. Additionally, more and more research and development is being devoted to multimodal biometric systems which use more than one biomet ric or more than one measure of the same biometric, by which the levels of accuracy and security can be increased. Automated biometric systems play an adjunct role to existing personal identification systems, adding to techniques already used. An automatic biometric system can be used for two main purposes: person authentication and identification. The goal of person identification is to identify an individual from a group of persons (i.e. matching the biometric features of one person against all the records in a database), while the goal of person authentication is to confirm or deny an identity claim by a particular individual. In this thesis we focus on person authentication. A good introduction about person authentication can be 1 1.1. Motivation found in [10]. Biometrics can be classified into two main classes: Physiological and Be havioral. Physiological biometrics is related to the physical characteristics of human body, such as facial patterns, fingerprints eye irises, hand geolne try, DNA and odor/scent. Behavioral biometrics refers to those related to the behavior of an individual. Such as signature, voice, typing rhythm and gait. Among them, facial patterns, fingerprints and eye irises are extensively used for person authentication or identification purposes, but they have cer tain limitations. For instance, they are prone to forgery. Researchers are therefore motivated to investigate alternative biometric traits. In previous studies, it has been shown that the brainwave pattern for each individual is unique and thus can be used for biometric purpose [15]. The advantage of biometry from brainwave is that it is almost impossible to duplicate human brain activity. Also, such biometric traits naturally allow aliveness detection to enhance the security of a traditional fingerprint-biometric-based system. The brain activity can be monitored via several methods, which can be classified as invasive and noninvasive. The invasive method need to per manently implant devices in the brain which generated many risks and it is not feasible in particle applications. The noninvasive methods include magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), optical imaging and elec troencephalography (EEG). The brain activity produces several types of signals such as electrical, magnetic and metabolic signals. Magnetic signals can be recorded by MEG while fMRI and PET can observe brain metabolic activity which is reflected in changes in blood flow. Unfortunately, such techniques require sophis ticated and expansive devices and long time of measuring which are not feasible for practical application. The EEG which measures the electri cal brain signals is a simple noninvasive method to monitor brain activity. Moreover the EEG device is portable and much simpler and is desired in practice. Therefore, we focus on studying EEG, i.e. the electrical brain activity recorded from electrodes on the scalp. 2 1.2. Background on BEG 1.2 Background on EEG EEG is the recording of electrical activity along the scalp generated by changes in the electrical charge of thousands of neurons of the cerebral cor tex. A synapse is formed by a very small cleft which separates neurons and the specialized membrane between two neurons. The neurons have a resting potential. The resting potential is resulted from the electrical po tential difference between the interior of the cell and the extracellular space. The synapse is a polarized structure which allows the information to travel from one neuron to another. The resting potential fluctuates as a result of impulses or nerve stimulus arriving from other neurons at synapses. The impulses create the local postsynaptic potentials causing electrical current to flow along the membrane of the cell body. Once the changes reduce the membrane potential to a critical level, an action potential is generated. The action potential, which propagates along the neuronal membrane, is a transient disruption of the membrane potential. The synaptic activity gives rise to constantly changing ionic currents across the cell membrane and the electrical potential field. Because of the briefness of the action potentials, the fluctuations in the surface EEG are produced mainly by the temporal and spatial summation of electrical currents caused by the relatively slow postsynaptic potentials. The electrical signals are manifestations of these complex systematic physical and chemical processes which create these po tential differences [4, 16, 17]. Figure 1.1 shows a set of EEC device. Figure 1.2 is an early EEG recording obtained by Hans Berger in 1924. EEG has five major wave patterns, which are Delta rhythm, Theta rhythm, Alpha rhythm, Beta rhythm and Gamma rhythm. Delta rhythm is in the frequency between 0 and 4 Hz and is seen normally in adults in slow wave sleep, and it occurs with high voltage. Theta rhythm is in the frequency range between 4 Hz and 8 Hz and is associated with drowsiness, childhood, adolescence and young adulthood. Alpha rhythm is in the fre quency range between 8 Hz and 12 Hz and is brought out by closing the eyes and by relaxation. Beta is the in the frequency range between 12 and 30 Hz. It is most evident in the frontal region and associated with active, 3 1.2. Background on EEG Figure 1.1: A set of EEG devices. The picture is from [lj. Figure 1.2: An early EEG recording, obtained by Hans Berger in 1924. The upper trace is EEG, and the lower is a 10 Hz timing signal. The picture is from [2]. busy or anxious thinking and active concentration. Gamma rhythm is in the frequency range between 30 and 100 Hz and it is associated with certain cognitive or motor functions. Table 1.2 summarizes the descriptions of EEG frequency band. 1.2.1 EEG Artifacts EEG signals that originate from non-cerebral origin are called artifacts and EEG data is almost always contaminated by artifacts [5]. The amplitude of artifacts can be quite large relative to the size of amplitude of the cor tical signals of interest. This is one of the reasons why it takes consid erable experience to correctly interpret EEG clinically. The artifacts can 4 12. Background on EEG Table 1.1: A summary of BEG frequency band. Rhythm Frequency band Description Delta < 4Hz Prominent feature of sleep in adult; The activity has a frontal predominance in adults while in children the activity has an occipital predominance Theta 4-8Hz Drowsiness and sleep in adults and the waking state for young children Alpha 8-12Hz Normal, relaxed, awake with their eyes closed; the posterior dominant rhythm Beta 12 — 30Hz Most evident in the frontal region; associated with active,busy or anxious thinking and active concentration Gamma 30 — 100 Hz Certain cognitive or motor functions be typically clustered in two categories: internal (biological) and external [181. Internal artifacts have biological but non-cognitive origin, for example, electro-oculogram (EOG) and electromyographic signals (EMG). The exter nal artifacts have external origin, most of times linked to the electrical line noise, to the impedance or settling of the electrodes. One major type of internal artifacts is ocular artifact. There is a po tential difference of about 100 mV between a positively charged cornea and negative retina of the human eye, thus forming an electrical dipole. First, the rotation of the eyeball results in changes of the electrical field across the skull. Secondly, eye blinks are usually not associated with ocular rotation; however, the eyelids pick up the positive potential as they slide over the cornea. This creates an electrical field that is also propagated through the skull. But small or large reflexive eye movements occurs very frequently and they generate a potential which can be recorded in the frontopolar and frontal leads and which are in the frequency range of 1-3Hz. Movements of ocular muscles that cause vertical or horizontal movements (saccades) will also generates EMG potentials. The spectrum of this artifact is concentrated in the frequencies above 20 Hz. External artifacts originate from outside the human body (such as 50/60 5 1.3. Literature Review Hz power-line noise or changes in electrode impedances). These artifacts can be easily separated by proper filtering, shielding [5]. 1.3 Literature Review This section gives a brief review on the previous work in EEG-based au thentication arid identification arid summarizes the EEG features that rriay be suitable for EEG-based biometrics.. 1.3.1 EEG-based Person Authentication and Identification Person identification and person authentication are two different types of applications and thus pose different challenges on decision making of bio metric systems. The goal of person identification is to identify an individual from a group of persons (i.e. matching the biometric features of one person against all the records in a database), while the goal of person authenti cation is to confirm or deny an identity claim by a particular individual. We are particularly interested in person authentication in this thesis. An authentication/identification system often consists of two main components: EEG feature extraction and classification. Figure 1.3 illustrates a general framework for EEG-based authentication/identification system. Poulos et al. [7] introduced a method based on spectral information extracted from the EEG non-parametrically via the FFT and employed a neural network to classify unknown EEGs as belonging to one of a finite number of individuals. Their work focused on healthy individuals and aimed to establish an one-to-one correspondence between the genetic information of the individual. In Palaniappan’s work [11, 13, 14], the potential of brain electrical activ ity generated as a response to a visual stimulus was examined in the context of the identification of individuals. They established a framework based on Elman Neutal Network for the Visual Evoked Potential (VEP)-based bio metrics and energy features of the gamma band within VEP signals were of particular interest. 6 L3. Literature Review Raw / signals / Preprvcss€d signals J Figure 1.3: The general scheme of the EEC authentication system. Paranjap et al.[15J examined the potential of using EEG data as a bio metric to identify human subjects. They employed the autoregressive (AR) modeling of EEG from a single channel and applied discriminant functions to the AR coefficients for identification. Their work focused on normal subjects in the mental states of eyes open and eyes closed. EEG-base person authentication was first proposed by Marcel et al. in [8]. They proposed the use of Power Spectral Density (PSD) as the feature, and a statistical framework based on Gaussian Mixture Models (GMM) and Maximum A Posteriori (MAP) Model Adaptation. 1.3.2 EEG Features In this section we summarize common EEC features that may be used in EEG-based biometrics. Power Spectral Density Power spectral density is a positive real function of a frequency variable as sociated with a stationary stochastic process. It is the measure of the power strength at each frequency. In other words, it shows at which frequencies variations are strong and at which frequencies variations are weak. The unit of PSD is energy per frequency (width). Computation of PSD can be done f”parameter 1’ se 7 1.3. Literature Review directly by the method of FFT or computing autocorrelation function and then transforming it. The Discrete Fourier transform is given by X(f) = Zx(OwNi - 1), (1.1) where ‘WN = exp (2iri)/N, (1.2) is the Nth root of unity. Power spectral density is given by Sx(f) = X(f). (1.3) There are several methods to estimate the spectral density of a ran dom signal from a sequence of time samples. Estimation techniques can be classified as parametric or non-parametric approaches depending on what is known about the signal, and may be based on time-domain or frequency- domain analysis. For instance, fitting the observations to an autoregres sive model is a popular parametric estimation technique. A common non- parametric technique is the periodogram, which is often computed from a finite-length digital sequence using the fast Fourier transform (FFT). Other methods include Welch’s method [19] and the maximum entropy method. Due to the finite size of the EEG data, the spectra can only be estimated by averaging the periodogram over the epochs of the EEG signals. Channel Spectral Powers and Inter-hemispheric Channel Spectral Power Differences The channel spectral power is given by [12]: ff2 Pf1,f2 = J Sx(f)clf, (1.4)fi where (f’, f2) is the frequency band and Sx (f) is the power spectral density. The inter-hemispheric channel spectral power differences in each spectral 8 1.3. Literature Review band is given by Pdiff = + 2’ (1.5) where P1 and P2 are the powers in different channels in the same spectral band but in the opposite hemispheres. Coherence Coherence is a measurement of the linear correlation between two signals as a function of the frequency, it uncovers the correlation between two time series at different frequencies and has been applied to the study of EEC signals in several cognitive and clinical conditions. The magnitude of the squared coherence estimate, which is a frequency function with values ranging from 0 to 1 , quantizes how well x corresponds to y at each frequency. The coherence C(f) is a function of the power spectral density P and P of x and y and the cross-power spectral density P of x and y, as defined in the following expression: D — .LyJJ xyJ) — xxJ) yy In this case, the EEC feature is represented by the set of points of the coherence function. Cross-correlation The cross-correlation (CC), also known as sliding dot product, measures the similarity of two signals. Usually it is used to find occurrences of a known signals sequence in an unknown one. For example, consider two real valued time sequence x and y that differ only by a time shift. One can calculate the cross-correlation to figure out how much y must be shifted to make it identical to x: CC(r) N_Tt+Tt), (1.7) 9 1.3. Literature Review where x and y are two normalized (zero mean and unit variance) obser vations, N is the number of the observations and T is the delay. CC is within the range [— 1, 1]. C < 0 implies an inverse correlation and a ten dency of both signals having similar absolute values but with opposite signs and C > 0 implies a direct correlation and a tendency of both signals hav ing similar values with the same sign. If C = 0, then it indicates the lack of the linear dependency between two signals. In this case, the EEG feature can be represented by the set of points of the cross correlation function. Inter-hemispheric Channel Linear Complexity For N-channel signals, linear complexity is defined as [12]: = exp (— (1.8) where is the normalized eigenvalue that is computed from the covariance of the N-channel EEG signal matrix. The linear complexity measures the amount of spatial synchronization [12]. Large values of 12 indicates low correlation between the signals in the channels and vice versa. Mutual Information In probability and information theory, the mutual information (MI), also known as transinformation [3] [9], of two random variables measures the mutual dependence or information gained about one signal from another. Given two random variables X and Y the MI is defined as, I(X,Y) = Pxy(x,y) log2 (1.9) where P(x) is the probability that x is drawn from X and the probability distribution of values observed for the measurement x and Pxy (x, y) is the joint probability density for the measurements of X and Y that produce the values x and y. The probability densities, P(x) and Pxy(x,y) can 10 1.4. Research Objectives be approximated by histogram. The most common unit of measurement of MI is the bit, when logarithms of base 2 are used in its computation. The EEG feature in this case is defined as the difference between the sum of the entropies within two channels’ time series and their mutual entropy. Multivariate Autoregressive Coefficients A mathematical formulation of mAR model is given as M M xi(n) = — i) + ... a1N(i)xN(m — i) +e1(n) M M XN(fl) = aN1(i)x1(n — i) + ... aNN(i)xN(n — i) +eN(n), (1.10) where N is the number of time sequences (channels), x1(n) ... XN(fl) repre sent the current values of each channel. M is the model order, indicating the number of previous data points used for prediction, aa (i) ‘s are mAR coef ficients at delay i, and ei(n), ..., ejr(n) represent errors at time n which are modeled as uncorrelated random variables with zero mean. In this case the coefficients of mAR are used as EEG feature. More about mAR estimation can be find in Chapter 2. 1.4 Research Objectives In this thesis we focus on investigating the potential of using scalp EEG signals for person authentication. We examine the scalp EEG signals on two data sets: a motor-task-related data set and a EEC data set of P300 potential. Descriptions of the data sets can be found in Chapter 2 and Chapter 3. 11 1.4. Research Objectives Common Framework for EEG-based Authentication —I I II II II I H L Reliabilityincreaseof EEG Signals Our Research I et Size Reduction of EEG Features Figure 1.4: An illustration of the procedure of EEG-based person authenti cation system. The two important components that we add into the general EEG-based person authentication system are: Reliability Increase of EEG signals via ICA-based approaches, and Size Reduction of the EEG features via FJLT hashing. 12 1.4. Research Objectives A general framework of EEG-based person authentication systems is illustrated in Figure 1.4, where the common major components include: EEG preprocessing; Feature Extraction; Mathematical Modeling; and De cision Making. Previous work of EEG-based biometrics reviewed in Section 1.3 mainly focused on either the Mathematical Modeling component or the Feature Extraction component. However, as we mentioned earlier, there are two major additional concerns associated with EEG biometrics: the large size of EEG features and the relative limited size of EEC samples; the scalp signals may not be reliable in many situations since such scalp signals may not correspond well the features of underlying brain sources. To address these two challenges, we propose adding two components, namely Reliabil ity Increase of EEG Features and Size Reduction of EEG Features, into the general EEG-based person authentication system, as illustrated in Figure 1.4. In this thesis we propose new methods for increasing the accuracy and robustness of EEG-based authentication systems. The objectives of this research are summarized as follows: 1. to reduce the dimension size of the EEG features, so that robust au thentication models can be trained by using EEG data sets with lim ited sample size. 2. to increase the reliability of scalp EEC signals by extracting the under lying brain source signals, so that the EEC-based biometric features are more robust. To achieve the first objective, we propose a method of dimension reduc tion of EEC features via Fast Jonson-Lindenstrauss hashing. To address the second objective, we propose Independent-Component-Analysis-based approach for person authentication. Our work, however, presents a funda mental departure from existing methods in the area of EEG-biometry for person authentication. To our knowledge, there was no previous work pro posed to focus on the two components that we have added into the general procedure. The rest of the thesis is organized as follows: Chapter 2 presents the detail of a novel EEC hashing method (a method of dimension reduction), 13 1.4. Research Objectives the multivariate autoregressive coefficients are extracted from EEG channels and were hashed via FJLT. A naive bayes model is used to classify the hash vectors and an authentication decisions is made accordingly. In chapter 3, we present a method of Dominating Independent Component Analysis (DICA) for person authentication. We compare our methods with previous work for person authentication when testing on the same data set. We show the advantages of our approach. We conclude our research and contributions along with suggestions for future work in Chapter 4. 14 Bibliography [1] http://archlab.gmu.edu/peop1e/cba1dwi4/eeg.jpg. [2] http://upload.wikimedia.org/wikipedia/commons/7/7e/lst-eeg.png. [3] M. Deriche and A. Al-Ani. A New Algorithm for EEG Feature Selection Using Mutual Information. In IEEE ICA 5SF, pages 1057—1060, May 2001. [4] K.A. Kooi et al. Fundamentals of Electroencephalography. Technical Report, MD Medical Department, Harper Row, 1978. [5] Mehrdad Fatourechi, Ali Bashashati, Rabab Ward and Gary Birch. EMG and EOG artifacts in brain computer interface systems: A survey. Clinical Neurophysiology, 118:480—494, 2007. [6] R. Heyer. Biometics Technology Review 2008. Technical report, Land Operations Divsion, Defence Science and Techology Organisation, 2008. [7] M. Poulos, M. Rangoussi. and N. Alexandris. Neural Network Based Person Identification Using BEG Features. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing ICA 5SF ‘99, volume 2, pages 1117—1120, 15—19 March 1999. [8] S. Marcel and J. Millan. Person Authentication Using Brainwaves (EEG) and Maximum A Posteriori Model Adaptation. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(4):743—752, April 2007. [9] R. Moddemeijer. On Estimation of Entropy and Mutual Information of Continuous Distributions. Signal Processing, 16:233—248, 1989. 15 Chapter 1. Bibliography [10] G. Chollet, P. Verlinde and M. Acheroy. Multi-modal Identity Verifi cation Using Expert Fusion. Information Fusion, 1:17—33, 2000. [11] R. Palaniappan. Method of Identifying Individuals Using VEP Signals and Neural Networks. In lEE Proc.-Sci. Meas. Technol, pages 1386— 1389, January 2004. [12] Ramaswamy Palaniappan. Electroencephalogram Signals from Imag ined Activities: A Novel Biometric Identifier for A Small Population. IDEAL, 4224:604—611, 2006. [13] R. Palaniappan and D. Mandic. Biometrics from Brain Electrical Ac tivity: A Machine Learning Approach. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(4):738—742, April 2007. [14] K. Ravi and R. Palaniappan. Leave-one-out Authentication of Person Using 40 Hz EEG Oscillations. In IEEE EUROCON, pages 1386—1389, November 2005. [15] R. Paranjape, J. Mahovsky, L. Benedicenti and Z. Koles. The Electroen cephalogram as A Biometric. In Proc. Canadian Conference on Elect trical and Computer Engineering, volume 2, pages 1363—1366, 2001. [16] R. Spehlmann. EEG Primer. NY: Elsevier North-Holland, New York, 1981. [17] C. Stam. Nonlinear Brain Dynamics. NY: Nova Science Publishers, New York, 2006. [181 C. Humphries, T. Lee, M. Mckeown, V. Iragui, T. Jung, S. Makeig and T. Sejnowski. Removing Electroencephalographic Artifacts by Blind Scource Separation. Psychophyiology, 37:163—178, 2000. [19] P. Welch. The Use of Fast Fourier Transform for The Esitmation of Power Spectra: A Method Based on Time Averaging Over Short, Mod ified Periodograms. IEEE Trans. Audio and Electroacoust., 15:70—73, 1967. 16 Chapter 2 Electroencephalogram (EEG) Hashing for Person Authentication’ 2.1 Introduction We note that current research on EEG-biometry is characterized by directly modeling or classifying the extracted EEG features (e.g. AR coefficients). However, it is challenging to extract robust features. Also, since the dimen sion of the features used is generally huge, feature selection or reduction is an additional challenge for designing the classifiers or training the statistical models. Realizing that the problem of image authentication and our prob lem of person authentication share similar problem formulations and system requirements, to address these concerns, we propose a fundamental depar ture from the existing approaches in EEG-biometry area by investigating the EEG hashing direction for person authentication. Image hashing, which is a content-based compact and exclusive feature descriptor of a specific image, has been proved to be a significant tool in mu! timedia security applications such as image authentication. Several hash ing schemes based on dimension reduction techniques, including our Fast Jonson-Lindenstrauss Transform (FJLT)-based hashing algorithm proposed very recently, have been recently reported to provide superior performance. Motivated by the promising applications of the FJLT-based image hashing, ‘A version of this chapter has been published. Chen He, Xudong Lv and Z. Jane Wang (2009) Hashing The MAR Coefficients from EEG Data for Person Authentication. In Proc. 33th IEEE mt. Conf. Acoustics, Speech, and Signal Proc. (ICASSP). 17 2.2. Method in this paper, we present a FJLT-based EEG hashing scheme for person authentication. In Section 2.2, we first estimate the multivariate Autore gressive (mAR) coefficients as a feature set of EEG recordings, and then a dimension reduction technique FJLT is applied to hash the mAR coefficients from multiple EEG channels. In Section 2.3, we describe the EEG data set collected during motor tasks and present the results on person authentica tion when applying the proposed EEG hashing approach. 2.2 Method 2.2.1 Preprocessing of EEG Signals The first step for EEG-based authentication is prepossessing raw EEG data. There are two purpose for preprocessing of EEG signals before further anal ysis, one is that we want to remove the artifacts and the other is that we may only interest in EEG signals in certain frequency band. Preprocessing normally consists of filtering and Independent Components Analysis. Filtering Filtering is useful for removing artifacts located in certain frequency bands that do not overlap with those of the neurological phenomena of interest [9]. For example, low-pass filtering can be used to remove EMG artifacts and high-pass filtering can be used to remove EOG artifacts. Our data are bandpass-filtered by a 3rd order Butterworth filter between 5-100Hz in order to focus on the frequency range of interest. Also, this filtering removes the EOG artifacts. ICA Noise Removal Blind source separation (BSS) techniques separate the EEG signals into components that create them [5], and they can used to remove EEG artifacts. Among the BSS methods, Independent Component Analysis is more widely used. ICA has been proven capable of isolating both artifactual and neurally generated EEG sources. As various contaminants of EEG recordings such 18 2.2. Method as eye movements, eye blinks, cardiac signals, muscle contamination, etc., can be considered temporally independent from ongoing brain activity, ICA is useful for artifact removal. ICA decomposes mixture of time courses into a sum of temporally statistical maximally independent components. The EEG measurements from the scalp, x = xi(t), ..., XN(t), are mixtures of the source signals, s = si(t), ..., sjr(t) as x=As, (2.1) where A is the unknown mixture matrix. Estimation of the mixture matrix can be found in Chapter 3. ICA is applied to the EEG data to obtain the maximally independent time courses of different brain and artifact sources and the signals are corrected by eliminating the artifactual sources. After these two steps, EEG signals are ready for further processing. 2.2.2 The Multivariate Autoregressive Model A single channel Autoregressive (AR) model can predict the current value of a time series from the previous observations of the same time series. A Multivariate Autoregressive (mAR) model, however, is a model that for each value of one time series, the prediction depends not only on the history of the same time series but also the history of other time series. An M-variate AR(M) model for a stationary time series of state vectors x e R” observed at equally spaced instants n, can be defined by ao + Ax_ + e, (2.2) where e E RN are uncorrelated random vectors with mean zero and co variance matrix E RN)<N, and the matrices A1,... ,AM E RN are the coefficient matrices of the mAR modeling. The parameter vector a0 E RN is a vector of intercept terms that is included to allow for a nonzero mean of the time series, 19 2.2. Method E(Xn) (I — A1 — ... — A)’ao, (2.3) where E(x) is the expected value of random vector Zn. I is the identity matrix. Normally we assuming zero mean of the EEG time series, therefor we assume ao 0. The applications of mAR include, but not limited to, communications, controls, sensor array processing and biomedical signal processing. Studies showed that there is particular functional connectivity between brain re gions, mAR modeling of multi-channel EEG recordings is therefore encour aging, since mAR can capture information of the interactions between brain regions which can be used to enhance the discriminating power between in dividuals. Previous work using mAR coefficients as EEG features has been reported in neurology studies, including work by Anderson [4] who extracted mAR coefficients from EEG as features used to discriminate different mental tasks. But very little or no work has been proposed using mAR coefficients from EEG for person authentication or person identification purposes. 2.2.3 Estimation of Multivariate Autoregressive Model We review the Multivariate AR model in this section and present methods of estimation for mAR parameters. There are several methods to estimate multivariate AR model which can mainly be classified as two catalogs, Least Square Estimation and Maximum Likelihood Estimation [16]. Least Square Estimation The method of least squares is used to approximately solve overdetermined systems, i.e. systems of equations in which there are more equations than unknowns. Least squares is often applied in statistical contexts, particularly regression analysis. Least squares can be interpreted as a method of fitting data. The best fit, between modeled and observed data, in the least-squares sense is that instance of the model for which the sum of squared residuals has its least value, a residual being the difference between an observed value and the value given by the model. The method was first described by Carl 20 2.2. Method Friedrich Gauss around 1794 [3]. In this section, we will present a least square method [10, 14] to estimate the mAR parameters of a fixed order M, with a interception and a noise with zero mean (don’t have to be gaussian). Given an N-dimensional time series of k + M state vectors x,-, with 1 — M < m < k, the time series consisting of M pre-sample state vectors X1_M, ,x0 and k state vectors x1, ...,xk that form what we call the effec tive sample. The parameters A1, ..., AM, a0 and the the covariance matrix E of an mAR(M) model of fixed order M are to be estimated. The mAR(M) model can be casted as a form of a regression model: (2.4) where Bz=(ao,Al,...,AM), (2.5) is the parameter matrix and w = , (2.6) X_M are the predictors of dimension dM = NM + 1. This casting an AR model into the form of a regression model amounts to treating the first predictor: (2.7) X1M where x1_M,• , are called the pre-sample state vectors, as a constant vectors of initial values. The parameter estimates for the regression model are consistent and asymptotically unbiased estimates for the AR model since the relative influence of the initial condition (the pre-sample X1_M,• . . , xo) 21 2.2. Method decreases as the sample size k increases [101. With the moment matrices: W=ww, (2.8) X=xx, (2.9) and V = (2.10) the least square estimate of the parameter matrix B is given by B = VW’. (2.11) The covariance matrix of the noise is given by = k — NM 1 (x — Ew)(x — Ewn)T (2.12) = k — NM — 1(X — VW_1T). (2.13) The derivation of the least squares estimators for mAR(M) model can be found in [14]. Another way of computing the least square estimate is to find the QR factorization. The estimated noise covariance matrix is proportional to a Schur complement of the matrix r=(W )=(::)w (2.14) 22 2.2. Method which is the moment matrix F = pTp belonging to the data matrix: w x (2.15) T TWN XN A QR factorization of the data matrix P is given by P = QR, (2.16) where Q is an orthogonal matrix and R is an upper triangular matrix given by R = ( R11 R12 , (2.17) 0 R2) The QR factorization of the data matrix P leads to the Cholesky factoriza tion r = pTp of the moment matrix: / VT\ /DTD DTI 1 = RTR = I I1ll1L11 .LL11R2 1 . (2.18) V X ) ‘\RRll RR12 + RR22) Then the least square estimates of the parameter matrix B is given by (2.19) More details about least square estimation of mAR model can be found in [14J. Maximum Likelihood Estimation Maximum likelihood estimation (MLE) is to determine the parameters that maximize the probability (likelihood) of the sample data underlying a given probability model. Maximum likelihood estimation gives a unique and easy way to determine solution in the case of the normal distribution. Given the probability model of noise, the mAR model can also be estimated by MLE [15j. Least square corresponds to the maximum likelihood criterion if the 23 2.2. Method experimental errors have a normal distribution and can also be derived as a method of moments estimator. The advantage of least squares estimation over MLE is that one do not need to assume the noise follows a gaussian or any other distribution. In this thesis, we use the least square estimation to extract the mAR coefficients from EEG signals. 2.2.4 Hashing MAR Coefficients via Fast Jonhson-Lindenstrauss Transform Although mAR coefficients are very informative, the dimension of the mAR coefficients extracted from EEG channels is very large. Hence, it is neces sary for us to apply some dimension reduction techniques, especially in the case that the sample size is small. In this section, we introduce a recently developed dimension reduction technique that has been shown to be robust and time-efficient in image processing and other applications 111]. Metric Embedding The dimension reduction is often associated with metric embedding, here we give a brief review of metric space and metric embedding. For more details about metric embedding, one can find in [7]. A metric space is an ordered pair (M, d) where M is a set and d is a metric on M, that is, a function d: M x M —* R such that for any x, y and z in M. • d(x,y)>O • d(x,y)=Oif and only ifx=y • d(x,y)=d(y,x) • d(x,z)<d(x,y)+d(y,z) The function d is called distance function or distance. For example, in Eu clidean space,the distance function d(x, y) = Iy—xI is complete metric space. A metric embedding is a transformation of one metric space to another. Or more rigorously, given two metric spaces (U, d) and (17, dv) an injective mapping f: U ‘—* V is called an embedding of U into V. Metric embedding 24 2.2. Method have been proven to be very useful algorithmic tools. In the most general setting, one wishes to map a finite metric space (U, d) to another met ric space (V, d) belonging to some restricted family, without distorting the pairwise distances too much. Denoting the mapping f we usually care about minimizing the relative distortion, which we can define here as max Idu(xi,x2) — dv (f(xi),f(x2) (2.20) x1,x2EU du(xi,x2) Johnson-Lindenstrauss Lemma and Its Modifications A metric embedding technique called Johnson-Lindenstrauss transform [8] states that n points in Euclidean space can be projected down to k = O(s2log(ri)) dimensions while incurring a distortion within 1 ± in their pairwise distance, where 1 ± . Mathematically, we have — x2I2(1 — E) i — X2I2 < Ilxi — X2j2(1 + ), (2.21) where is a projection matrix with dimension k x d. The first row of is a random unit vector uniformly chosen from Sd_i. Sd_i denotes the (d — 1)- dimensional unit sphere in Rd. The second row is a random unit vector from the space orthogonal to the first row, the third is a random unit vector from the space orthogonal to the first two rows, and so on. This choice of F satisfies the following properties: 1. Spherical symmetry: For any orthogonal matrix A E 0(d), A and have the same distribution. 2. Orthogonality: The rows of are orthogonal to each other. 3. Normality: The rows of F are unit-length vectors. A number of modifications on the projection matrix were proposed after wards. Indyk and Motwani [6] found that the entry of can be chosen from the normal distribution jV(0, 1/d) to satisfy JL. As a result the entries of 25 2.2. Method is independent distributed, the norm of each row of 4) becomes a random variable and the rows are no longer necessarily orthogonal. Another important modification was done by Achlioptas [1] who noticed that the only property of 4) needed in projection is that the squared inner product (< 4)j, x >)2 is tightly distributed with the center 1/d for all unit vector x. The projection matrix 4) proposed in [1] is given by: choose each entry of 4) uniformly from {—d’/2,d’/2}. Furthermore he showed that if the entries of 4) are chosen independently according to the following distribution: ( +d/3 w.p. 1/6 4)ij = 0 w.p. 2/3 , (2.22) —d/3 w.p. 1/6 then the JL holds and the normality is now dropped. A nice property of this distribution is its sparsity: on average, a (2/3)-fraction of the projection matrix 4) is 0. Fast Jonhson-Lindenstrauss Transform Based on the previous studies, Alion and Chazelle [2, 13] proposed a new low d O(logn)distortion embedding of 12 into 1, , called the Fast-Johnson-Lindenstrauss Transform (FJLT), which is faster than standard random projections and easy to implement. A FJLT q5 = FJLT(m, d, f,) is a k x d matrix, where d is the original dimension number of the original signal, k is a lower dimension number which is set to be e’e2log(n), m is the number of data points, is the distortion rate and c’ is a positive constant real value. Suppose we employ the 4th-order mAR model and estimate the mAR coefficients from 16 EEG channels, then the feature size is 16 x 16 x 4. In this case we can consider the coefficients at each matrix as a data points, therefore n = 4 and M = 256. A Fast-Johnson-Lindenstrauss-Transform can be obtained by a product of three real valued matrices: q5=PxHxD, (2.23) 26 2.2. Method where P and D are random and H is deterministic. • P is a k-by-d matrix whose elements Pj are drawn independently ac cording to the following distribution, where A/(0, q’) means a Normal distribution with zero-mean and variance q1, f Pj .N(0, q’) with probability q 1 P- = 0 with probability 1 — q where Iclog2n 1q=mm d ‘j’’ for a large enough constant c. • H is a d-by-d normalized Hadamard matrix with the elements as: = d (_1)(i_13_1) (2.24) where (i, > is the dot-product of the rn-bit vectors i, j expressed in binary. • D is a d-by-d diagonal matrix, where each diagonal element is drawn independently from {-1,1} with probability 0.5. With this kind of construction, we can get our intermediate hash (IH) by FJLT as IH = cb(Feature) = P x H x D x Feature. (2.25) Now the original feature vector is mapped into a lower dimensional space with small distortion. However, the size of our intermediate hash is still large. Consider again the 4th-order mAR model from a 16-channel EEG setting, the size of the original features in this case is 256 x 4. By setting E = 0.1 and c = 0.72, we will end up with an IH with size 100 x 4. This problem can be solved by Random Weight Incorporation. Similar to the NMF-SQ hashing scheme proposed in [12], we introduce the pseudorandom 27 2.2. Method weight vectors {w}1,with w e Rc, and then calculate the final hash as (2.26) where II[ is the i-th column vector in IH and < IH, w > is the inner product of IH and w. Note that the distance between the hash vectors lH and 1H3 could be distorted by the weight vector, and hence degrades the classification performance. We sort the elements of IH and w in a descending order before the inner product and make sure a bigger weight will be assigned to a bigger component. By this operation, the perceptual quality of the hash vector is retained. Finally we end up with a hash vector with length M for the M-th order mAR model for EEG signals. This hash vector is very compact and robust, and can capture the essential features of the original mAR coefficients. 2.2.5 Authentication Decision Here, due to its simplicity and generally good performance in real-world applications, we apply the naive Bayes models for decision making in our EEG person authentication system. A naive Bayes model is a simple prob abilistic model based on Bayes’ theorem with strong (naive) independence assumptions. It is particularly attractive when the dimensionality of the feature variables is relatively high with respect to the sample size of the ob servations. Here we assume each element x, of the hash vector X follows a Gaussian distribution and is statistically independent with each other. For each subject, we train a specific naive Bayes probabilistic model of the hash vector and further use the model for person authentication. Based on the training data, we can estimate the parameters 8 of naive Bayes models by Maximum Likelihood Estimation(MLE). Then the likelihood of a claim is calculated as P(XI&) IIP(xjl8j), (2.27) where X is a calculated hash vector with length M. The person authentication decision will be made as follow: given a thred 28 2.3. Experiments and Results hold T, the claim is accepted if P(XIO) > T and otherwise rejected. 2.3 Experiments and Results 2.3.1 EEG Database The EEG data was collected from four normal subjects while performing motor-related-tasks. Subjects were seated about two meters away from a large screen, onto which a virtual environment (VE) was back-projected. The VE consisted of target and distracter balls rapidly approaching the subject. Subjects were asked to interact with the VE display by ‘blocking’ virtual target balls. They were fitted with an EEG cap recording signals from 19 electrodes. An illustration of the locations of EEC channels used is shown in Figure 2.1. EEC data from five tasks of this virtual reality experiment are collected and then used for person authentication. The trajectories of balls are different under the five tasks. Raw EEG recordings are too noisy and should not be analyzed directly. A denoised EEC data set is obtained by subtracting the common reference from the EEC and EYE channels, then performing Independent Component Analysis (ICA). The raw data was collected at 1000Hz whereas the denoised data was obtained by downsampling the raw data to 250Hz and bandpassing it between 5-100Hz. 2.3.2 Feature Extraction and Hashing The EEG signals from 16 channels out of the 19 channels were used for the analysis, as illustrated in of in Figure 2.1. The first step is the feature ex traction by performing the 4th-order mAR modeling of the time series from 16 EEG channels. Then the mAR coefficients are hashed by the proposed FJLT hashing algorithm. Finally, we obtain a hash vector with length 4, which can be denoted as X = (Xl, X2, X3, x4). To visually illustrate the sepa rating differences of hash vectors from different subjects, Figure 2.2 shows a particular angle of the spatial plots of the first three hash values (Xl, x2, x3) from four subjects in Task 1. Points marked by different signs denote the 29 2.3. Experiments and Results Figure 2.1: The location illustration of the EEG channels on the scalp. Those shaded are used for our experiments. Figure 2.2: This figure shows the the separation of hash vectors between subjects, the first three dimensions of the hash vectors are plotted. 21J0 150 + ++ * + 4- 150 + x2 3:51 150 xl 30 2.4. Conclusion trials of different subjects. Since the first three hash values are well sepa rated already, the naive Bayes model based on all 4 hash values provides a good identification performance. 2.3.3 Experimental Results Here we introduce two types of errors that are used for evaluating a person authentication system: False Rejection (FR), which occurs when the system refuses a true claimant, and False Acceptation (FA) which occurs when the system accepts an impostor. The performance is usually measured by False Rejection Rate (FRR) and False Acceptation Rate (FAR). The average of the two measures is called Half Total Error Rate (HTER). As described in Section 3.1, for each motor task we have 11 trials for each subject. We employ the leave-one-out cross validation approach in calculating the perfor mance. Each time we train a client model using 10 out of the 11 trials and the left one trial of the client is used for testing. An impostor set is created by including 33 trials from the other 3 subjects. This process is repeated for each trial and for each subject. We heuristically setup a threshold for each subject. To demonstrate the performance of the proposed EEG authenti cation approach, Figure 2.3 shows one example of the log-likelihood (LL) values from one task based on leave-one-cross-validation, circles indicate the LL values from the client trials and the dots are for impostors. The FAR and FRR results based on cross validation are given in Table 2.1. It is worth mentioning that a 9.1% of FAR in the table simply means that 1 out of 11 test trials is wrongly recognized. The average HTER through five tasks and four subjects is 6.7% which is encouraging. 2.4 Conclusion This chapter has studied the potential of dimension reduction technique on EEG features for person authentication. We proposed the use of Fast Johnson-Lindenstrauss Transform for EEC mAR coefficients hashing. The proposed method presents a fundamental departure from existing methods 31 2.4. Conclusion Table 2.1: The error rates: FAR, FRR and HTER in . Si to S4 denotes subject 1 to 4, and Average(S) is the average of 4 subjects. Ti to T5 denotes task 1 to 5 and Average(T” is the average of 5 tasks. Si S2 S3 S4 Average(S) FRR(Ti) 0 0 0 9.1 2.3 FAR 7.2 3.03 10.8 7.4 7.2 HTER 3.58 1.52 5.4 8.26 4.7 FRR(T2) 9.1 9.1 9.1 9.1 9.1 FAR 8.54 4.1 9.3 3.0 6.2 HTER 8.82 6.6 9.2 6.0 7.6 FRR(T3) 0 0 9.1 9.1 4.5 FAR 3.0 5.2 7.1 9.3 6.2 HTER 1.5 2.6 8.1 9.2 5.3 FRR(T4) 9.1 9.1 9J 9J 9.1 FAR 2.7 6.1 8.9 10.5 7,1 HTER 5.9 7.6 9.0 9.8 8.1 FRR(T5) 9.1 9.1 9.1 0 6.8 FAR 6.6 4.5 7.2 7.6 6.5 HTER 7.9 6.8 8.2 3.8 6.7 FRR Average(T) 5.5 5.5 7.3 7.3 6.4 FAR 5.6 5.7 8.7 7.6 6.9 HTER 5.6 5.6 8.0 7.5 6.7 32 2.4. Conclusion U -50 I -lcJJ I .150 . * •ao •25D Trial ode, Figure 2.3: An example of log-likelihood values of the EEG hash vectors from the client and impostors. in EEG biometry study and shows the potential of mapping an EEG fea ture from a high dimension space to a lower one while keeping the power of discrimination between the features of different subjects. This method makes mathematical model trainable from large features with small data set. Also, the usage of FJLT significantly reduces the computational cost which is essential in real time applications. 33 Bibliography [1] D. Achiloptas. Data-base Freindly Random Projections: Johnson Lindenstrauss with Binary Coins. J. Comput. System Sci., 66:671—687, 2003. [2] Nir Ailon and Bernard Chazelle. The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM J. Comput, 39:302—322, 2009. [3] H. Akaike. Autoregressive Model Fitting for Control. Ann. Inst.Stat. Math., 23:163—180, 1971. [4] E. Anderson, C. Stolz and S. Shamsunder. Multivariate Autoregres sive Models for Classification of Spontaneous Electroencephalographic Signals During Mental Tasks. IEEE Trans. Biomedical Engineering, 45(3):277—286, March 1998. [5] S. Choi et al. Blind Source Separation and Independent Component Analysis: A Review. Neural Inf Process-Lett Rev, 6:1—57, 2005. [6] P. Indyk and J. Matousek. Approximate Nearest Neighors: Towards Removing The Curse of Dimensionality. In the 30th Annual ACM Sym posium on Theory of Computing, pages 604—613, January 1998. [71 Yair Bartal, Ittai Abraham and Ofer Neimany. Advances in Metric Em bedding theory. In Annual ACM Symposium on Theory of Computing, pages 271—286, May 2006. [8] B. Johnson and J. Lindenstrauss. Extensions of Lipschitz Mappings to A Hilbert Space. Contemp. Math., 26:189—206, 1984. 34 Chapter 2. Bibliography [9] J. Barlow. EMG Artifact Minimization During Clinical EEG Record ings by Special Analog Filtering. Electroenceph. Gun. Neurophysiol, 58:161—174, 1986. [10] H. Lutkepohl. Introduction, to Multiple Time Series Analysis, volume 3 of NA. Springer-Verlag, Germany, second edition, 1998. [11] X. Lv and Z.J. Wang. Fast Johnson-Lindenstrauss Transform for Ro bust and Secure Image Hashing. In proc. of IEEE MMSP 2008, 2008. [12] V. Monga and M. Mihcak. Robust Image Hashing Via Non-Negative Matrix Factorizations. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2006, volume 2, pages lI—Il, 14—19 May 2006. [13] N. Alion and B. Chazelle. Approximate Nearest Neighbors and The Fast Johnson-Liendenstrauss Transform. In Proceedings of the S8st Annual Symposium on the Theory of Computing (STOC), WA, 2006. [14] Arnold Neumaier and Tapio Schneider. Estimation of Parameters and Eigenmodes of Multivariate Autoregressive Models. ACM Trans. Math. Softw., 27(1):27—57, 2001. [15] Din Than Pham and Dinh Quy Tong. Maximum Likelihood Estimation for A Multivariate Autoregressive Model. IEEE transactions on signal processing, 42:3061—3072, 1994. [16] P. Whttle. On The Fitting of Multicariate Autoregressions, and The Approximate Canonical Factorization of A Spectral Density Matrix. Biometrika, 50:129—134, 1963. 35 Chapter 3 An Independent Component Analysis (ICA) Based Approach for EEG Person Authentication 2 3.1 Introduction We note that the current research on EEG-biometry is characterized by modeling or classifying the extracted EEG features directly from EEG sig nals collected from scalp channels. However, these scalp EEG signals are not the source signals inside the brain. Due to volume conduction, scalp EEG recordings do not correspond with the underlying sources but with a (noisy) linear mixture of its components [3] (see Figure 3.1 as an illustra tion). Features from scalp EEG signals directly do not necessary correspond well the features of underlying sources. For instance, volume conduction ef fects induce spurious connectivity relationships between scalp EEG signals that do not necessarily imply the same relationships between the underly ing neural sources. Also, such linear mixture matrix can be sensitive to the exact positions and placements of the EEG nodes, a factor that limits the success of EEG-based biometrics in practice. Therefore, it is worth to explore the underlying brain source signals for EEG biometrics because of their robustness to EEG node positions and relative consistence over time, 2A version of this chapter has been published. Chen He and Z. Jane Wang (2009) An Independent Component Analysis (ICA) Based Approach for EEG Person Authentication. In Proc. 3rd IEEE Int Conf. Bioinformatics and Biomedical Engineering (ICBBE). 36 3.2. Method Figure 3.1: The scalp signal Cl is a linear combination of underlying brain signals Si, S2 and S3. which are crucial requirements for person authentication purpose. With the assumption that EEG recordings are linear combinations of the underlying brain sources, we are motivated to extract the sources by employing blind source separation (BSS) techniques in our authentication approach. Among BSS techniques, Independent Components analysis (ICA) has been investi gated for EEG analysis to derive mutually independent sources from highly correlated scalp EEG recordings [2]. In Section 3.2, we first estimate the ICs from five EEG brain regions, and for each region, a dominating independent component (DIC) is derived. Then the scalar autoregressive coefficients are estimated as a feature set of each DIC. In Section 3.3, we describe two EEG data sets and present the results on person authentication when applying the proposed EEG ICA-based approach. 3.2 Method 3.2.1 Independent Component Analysis Independent Component Analysis (ICA), a particular technology of blind source separation, can separate a multivariate signal into additive subcom ponents assuming the mutual statistical independence of the nongaussian source signals. It is a method in which the goal is to find a linear rep resentation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation captures 37 3.2. Method \ •// 7 / / / / 0 103 2W 4W 4W 4W 4W 7W 4W 0 103 2W 4W 4W 4W 4W 7W 4W --<L-- 0 ICC 203 4W 4W 4W 4W 7W 4W *---H 0 103 203 4W 403 4W 4W 703 4W Figure 3.2: The source signals and the mixtures of the signals. the essential structure of the data in many applications, including feature extraction and signal separation. To have a quick understanding of what is ICA, let’s imagine that we are in a room where two people are speaking simultaneously. We have two microphones, which are in different locations. The microphones give us two recorded time signals, denoted by xi(t) and x2(t), with x1 and x2 the amplitudes, and t the time index. Each of these recorded signals is a weighted sum of the speech signals emitted by the two speakers, which we denote by si(t) and s2(t). We could express this as a linear equation: xi(t) = a11s +a12s (3.1) x2(t) = a218 + a22s2, (3.2) where all, a12, a21 and a22 are some parameters that depend on the distances of the microphones from the speakers. Without the knowledge of the ICA can estimated the two speech signals sl (t) and s2 (t) by using only the recorded signals x1 (t) and z2(t). Figure 3.2 is original source signals si(t) and s2(t) (the upper two) and the recorded signals (or mixtures of the source signals) xl (t) and x2 (t) (the lower two). Figure 3.3 shows the estimated source signals by ICA, we can see that the source signals are recovered except there are a phase shift and a scaling on the estimated signals. 38 3.2. Method .2 -V. -\ / I / .. / \V // / / \ 1W 2W 2W .102 502 020 702 0) -2 -V ---. r h-- -.-- V-VV. 1112 2W 31) 402 502 WI) SW Figure 3.3: The estimated source signals by ICA. A more general mathematical formulation of ICA is given as xi(t) = aiisi(t) + a12s2(t) + ... + ais(t), x2(t) = a2lsl(t) + a22s2(t) + ... +a2s(t), Xk(t) = aklsl(t) + ak282(t) + ... + aksfl(t). where s(t)’s are the n source signals and x(t)’s are the k observations. The mixing matrix with size k consists of weight coefficients {a} which are depend on some unknown factors (e.g. modeling volume conduction from the location of the sources to the scalp electrodes). Here we will limit ourselves to the case of over-determined sources, i.e. k n in the equations above, it is assumed that the number of EEG channels exceeds the number of sources. ICA has applications in many areas such as telecommunication, sig nal/image processing, biomedical signal processing and finance. It was re ported to provide good performances in EEG analysis. An EEG data con sists of recordings of electrical potentials in a number of different locations on the scalp and it is widely accepted that these scalp recordings are actually linear mixtures of underlying unknown components of neural source activ ity. As mentioned in Introduction, it is worth to recover the brain sources (3.3) 39 3.2. Method and investigate their features for person authentication purpose. Due to its popularity, we are motivated to investigate ICA for source extraction. Previous work of using ICA on EEG data has been reported in neurology study. But, to our best knowledge, no ICA work has been proposed for person authentication or identification purposes. 3.2.2 Estimation of ICA Here we present a brief explanation on ICA estimation, more details can be found in [7]. The source signals can be expressed as s = Wx, (3.4) where W is the inverse of A to be estimated. The fundamental restriction in ICA is that the independent components must be nongaussian, otherwise ICA is possible. To see this, let’s first consider a special case. We assume that our mixing matrix is orthogonal and the s, are Gaussian then x1 and x2 are Gaussian, uncorrelated, and of unit variance. Hence, the joint density of x1 and x2 is given by 1 / x+x\f(x1,x2)=—exp 2 (3.5) It should not be hard to see that this distribution is completely symmetric for each direction, therefore, no information of the directions of the columns of the mixing matrix A is given by this distribution. And we cannot es timate A in this case. Indeed it has been proved that the distribution of any orthogonal transformation of the Gaussian (x1, x2) has exactly the same distribution as (x1, x2), and that x1 and x2 are independent. Thus, in the case of Gaussia.n variables the matrix A is not identifiable for Gaussian inde pendent components. However, if just one of the independent components is Gaussian, the ICA model can still be estimated. An important result of information theory is that a gaussian variable has the largest entropy among all random variables of equal variance. It means that entropy can be used to measure nongaussianity. Negentropy, a slightly modified version of the 40 3.2. Method definition of differential entropy, is defined as J(s) = H(sgauss) — H(s), (3.6) where is a Gaussian random variable of the same covariance matrix as s and H(.) denotes the entropy. Negentropy is in some sense the optimal estimator of nongaussianity, it is always non-negative, and is zero if and only if s follows a Gaussian distribution. Computational cost of estimating negentropy is usually very high and therefore, simpler approximations of negentropy are very useful, they can be find in [4]. Minimization of Mutual Information The ICA can be estimated via minimization of Mutual Information (MI) [5]. Using the concept of differential entropy, we define the MI I between m (scalar) random variables, I = (si, ..., s) = H(s) — H(x) — log I det(W)I. (3.7) An important property of MI is that we have an invertible linear transfor mation s. Now, let us consider what happens if we constrain the s, to be uncorre lated and of unit variance. This means E(ssT) = WE(xxT)W, (3.8) which implies det(I) = 1 = det(WE(xxT) = det(W) det(E(xxT)det(WT), (3.9) where det(.) denotes the determinant. The mutual information takes into account the whole dependence structure of the variables, and this implies that det(W) must be constant. Moreover, for s of unit variance, entropy 41 3.2. Method and negentropy differ only by a constant, and the sign, and we have, s) = C — J(s), (3.10) where C is a constant and it does not depend on W. This shows a important relation between negentropy and mutual information. Recall that mutual information is a measure of the independence of random variables, therefore we could use MI as the criterion for finding the ICA mixture matrix W. We define the ICA of a random vector x as an invertible transformation and find the matrix W so that the mutual information of the transformed components s, is minimized. This is equivalent to find an invertible transformation W that minimizes the mutual information and is roughly equivalent to finding directions in which the negentropy is maximized and is equivalent to maxi mizing the sum of nongaussianities of the estimates, when the estimates are constrained to be uncorrelated. Actually the constraint of uncorrelatedness is not necessary, but it can reduce the computational cost. Maximum Likelihood Estimation ICA can also be obtained via Maximum Likelihood Estimation [6]. The the log-likelihood is given by L = logf (wx(t)) + Tlogdet(W). (3.11) t=1 i=1 The method is based on the assumption that the density function s is given. Here f are the density functions of the s, and x(t) = (xi(t), ..., x(t)) is the matrix form of observations. The goal is to find a mixing matrix W such that (3.11) is maximized. It can be prove that MLE of ICA is essentially equivalent to minimization of mutual information [7]. Recently, a fast and robust algorithm for estimate ICA is proposed, more details about this algorithm can be found in [4]. 42 3.2. Method 3.2.3 Preprocessing of Data Before ICA Before applying ICA algorithm to the data, it is better to do some prepro cessing which can make the ICA estimation better conditioned and faster. Mean-subtraction and Whitening Mean-subtraction and Whitening are the common steps before applying ICA, they are always desired because they make ICA simpler and faster. The mean substraction makes the observation a zero mean random vector. The estimated source signals are therefore a zero mean random vector as well. The purpose of this step is to simplify the ICA algorithms but this does not mean that the mean could not be estimated. After estimating the mixing matrix of the zero mean data, the source signal s can be obtained by adding the mean vector of s back to the centered estimates of s. Whitening means after subtracting the mean of the data, applying a linear transform to the observations x such that the components of x are uncorrelated and the variances equal unity, or E(xxT) = I. Whitening is simple and always possible. One of approaches of implementing whitening is to use the Eigendecomposition of the covariance matrix: E(xxT) = pAPT where P is the orthogonal matrix of eigenvectors and A is diagonal matrix of eigenvalues. The whitening of x is given by = PA”Px. (3.12) From (3.12) we have = PA1s = As. (3.13) We can see that after whitening linear transformation, a new mixing matrix A is obtained, and not surprisingly A is orthogonal, because E(T) = AE(ssT)A = I. Because A is orthogonal, it contains only n(n — 1)/2 degrees of freedom, and we only need to estimate n(n— 1)/2 parameters. That’s why we say that whitening solves half of the problem of ICA. The simplicity of whitening and the complicity of ICA make it always desired to whitening the data first. Withening is also helpful for dimension reduction, and this can 43 3.2. Method be implemented by removing the eigenvalues which are too small compared with others. This step sometimes removes the noise of signals and avoid overlearning. Effect of Filtering As we discuss before, linear filtering is a popular method to get frequency of interesting or to reduce noise by depressing the power of certain band(s). A good result of ICA is that if we apply linear filtering to the signal and get a new signal, say , then the mixing matrix of the new signal , estimated by ICA is same as that of x. Let X be the matrix of the signals, then the filtered signals can be expressed as X = XF where F is the transformation matrix which is equivalent to perform filtering in frequency domain. Then we have X = XF = ASF = AS. (3.14) This means that the mixing matrix A is still valid. Furthermore, this result implies that it has no significant to perform ICA first or linear filtering first, since the source signals of the filtered observations can be easily obtained by SF. (3.15) 3.2.4 Dominating Independent Component Analysis for EEG One of the difficulties of using ICA for authentication purpose is that al though a source signal sj(t) can be estimated, the estimation procedure could not tell us where this source signal comes from and hence we are not able to label and match the index across subjects. This label-matching in formation, however, is indispensable, since the corresponding source signals of different individuals need to be processed and compared in the authenti cation procedure. Here we propose solving this problem by a simple but very effective method: instead of labeling each IC, for each physical brain region of interest, we apply ICA to the corresponding EEG recordings within the brain region, then we choose the IC with the greatest power as the source 44 3.2. Method of interest and name this IC as Dominating Independent Component (DIC) for the particular scalp region. Estimating ICs based on all the EEG scalp channels and selecting only one DIC may not be informative, alternatively we divide EEC channels into a number of regions based on their physical neighborhood relationships, and find one DIC for each region. The index d of DIC Sd(t) is defined by d = argmaxP (s(t)), (3.16) where i e (1, ...,n) (n is the number of ICs in a given region) and P(s(t)) is the power of the source signal s(t) projected to the EEG channels. There are two advantages associated with this multiple-region approach. First of all, with a large number of EEC channels, the overall DIC may not be obvious. Dividing EEC channels into regions makes the DICs be found easily. Second, in this way an appropriate number of DICs can be obtained and can provide more information to the authentication system. 3.2.5 The Univariate Autoregressive Model In this paper, we model the DICs using Univariate Autoregression. A Uni variate Autoregressive (AR) model can predict the current value of a time series from the previous observations of the same time series. A mathemat ical formulation of a pth order AR model is given as x(t) = ax(t—i)+e(t), (3.17) where x(t) represents the current values of each channel, a is the AR coef ficients at delay i, and e(t) represents the error at time t which is modeled as uncorrelated random variables with zero mean. p is the model order, indicating the number of previous data points used for prediction. The AR model is appropriate when a system is a function of its own behavior. Previous work of AR modeling of EEC has been reported in 45 3.2. Method neurology study, including work by Anderson [1] who extracted Univariate AR coefficients from EEG as features to discriminate different mental tasks, and Poulos who used AR coefficients for identification purpose. A robust AR coefficient estimation can be obtained based on the least-square criteria, as described in Chapter 2. 3.2.6 A Simple Model for Person Authentication Feature Extraction By estimating the AR coefficients from m DICs representing the m brain regions, finally we end up with a feature vector Y with length M = p x m for the pth-order AR model for each DIC. With yielding sufficiently small residues in AR modeling of DICs, this feature vector can capture the essen tial features of the original DICs. Person Authentication Here, due to its simplicity and generally good performance in real-world applications, we apply the naive Bayes models for decision making in our EEG person authentication system. A naive Bayes model is a simple prob abilistic model based on Bayes’ theorem with strong (naive) independence assumptions. It is particularly attractive when the dimensionality of the feature variables is relatively high with respect to the sample size of the observations. Here we simply assume each element y of the feature vector Y follows a Gaussian distribution and is statistically independent from each other. For each subject, we train a specific naive Bayes probabilistic model of the feature vector and further use the model for person authentication. Based on the training data, we can estimate the parameters 8 of naive Bayes models by Maximum Likelihood Estimation (MLE). Then the likeli hood of a claim is calculated as P(Yj8) flP(yIOi), (3.18) 46 3.3. Experiments and Results where Y is an extracted feature vector with length M. The person authentication decision will be made as follow: given a threshold r, the claim is accepted if P(Yj) T and otherwise rejected. 3.3 Experiments and Results In this section we examine our algorithm over two data set: a motor-task- related data set and a P300 EEG data set [9]. For both of the data sets the EEG signals from 17 channels are used for the analysis. The first step is to divide the EEG channels into five regions, depending on their physical position. (F7, Fpl, F3), (Fp2, F8, F4), (FZ, C3, CZ, C4, PZ), (P3, P7, 01) and (P4, P8, 02) forms region 1, 2, 3, 4 and 5, respectively. Then in each region, Independent Component Analysis is performed and the IC with greatest power is selected as the DIC. The second step is feature extraction by performing an appropriate AR modeling of the time series from DIC of each EEG channel region. Finally we obtain a AR coefficient vector with length 5p, where p is the order of AR modeling. This vector can be denoted as Y = (yl, Y2 ..., y5p). In the following part, we will give a detailed description of the EEG data sets we used and demonstrated our result. We also will implement the algorithm proposed in [8] and make comparisons. 3.3.1 Experimental Results: Results on The Motor-task-related Data Set In this data set, the EEG data was collected from seven normal subjects while performing motion related tasks. Subjects were seated about two meters away from a large screen, onto which a virtual environment (VE) was back-projected. The VE consisted of target and distracter balls rapidly approaching the subjects. Subjects were asked to interact with the VE display by ‘blocking’ virtual target balls. They were fitted with an EEG cap (CompumedicsR/NeuroscanR) recording signals from 17 electrodes: F7, Fpl, F3, Fp2, F8, F4, FZ, C3, CZ, C4, PZ, P3, P7, 01, P4, P8 and 02. EEG data from five tasks of this virtual reality experiment are collected and 47 3.3. Experiments and Results then used for person authentication. The trajectories of balls are different under the five tasks. A denoised EEG data set is obtained by subtracting the common reference from the EEG and EYE channels and then removing the artifact components. The raw data was collected at 1000Hz whereas the denoised data was obtained by dowusampling the raw data to 250Hz and bandpassing it between 5-100 Hz. For each motor task, we have 11 trials for each subject. Leave-one-out cross validation approach is employed in our experiments. Each time we train a client model using 10 out of the 11 trials and the left one trial of the client is used for testing. An impostor set is created by including 66 trials from the other 6 subjects. The process is repeated for each trial and for each subject. Figure 3.4 demonstrates the discriminating power of our authentication system. In real applications the decision threshold T needs to be chosen a priori, and should be selected such that HTER is minimized providing that both FAR and FRR are small. In this paper for a same task with a same AR modeling, we set a same threshold for all subjects. Figure 3.5 shows the average error rates over all subjects as a function of thresholds for task 5, 7th-oder AR modeling. The optimal threshold gives out a HTER of 2.2% which is very encouraging. We also summarized FRR, FAR and HTER for all tasks under 5-th order AR modeling in Table 3.1, from which we can see that there are no significant differences through tasks on average, but performance for different individuals varies. This is because of the same threshold setting: to get overall lowest HTER, some subjects are sacrificed. In practice, we can set different thresholds for each subjects by which a better performance can be obtained. However, in this paper we mainly focus on exploring ICA for EEG-authentication purpose and adaptive thresholds setting is left for future work. Since the order of the AR model can affect on the feature extraction process and hence the overall performance of our authentication system. We tested the error rates at several different orders and summarized the results in Table 3.2. These error rates are obtained with using optimal thresholds. We noticed from the Table 3.2 that although the performances for using different orders have no significant difference, the fifth and seventh-order models yield the lowest HTER. 48 3.3. Experiments and Results B -1000 0 -1200 0 0 -1400 B ii TriI Indox Figure 3.4: An example of log-likelihood values of AR coefficient vector from DICs. Circles indicate that of client and squares are from impostors. 0.14 0.12 —FAR / 0.1 —00— HTER / 0.08 02 ________________________________ o -10 -5 0 5 10 15 threshold Figure 3.5: An illustration of average HTER, FAR and FRR over seven subjects as a function of threshholds. This example is from task 5 with AR model p = 7. 49 3.3. Experiments and Results Table 3.1: Error rates of our method on motor-task-related data set, the proposed method: FAR,FRR and HTER error rates in % with the 5-th order AR modeling. Si to S7 denotes subject 1 to 7, Ti to T5 denotes task 1 to 5 and S denotes the average of a given task over all the subjects. Si S2 S3 S4 S5 S6 S7 S FRR(T1) 9.1 9.1 9.1 9.1 9.1 9.1 0.0 7.8 FAR 4.8 0.0 0.0 1.4 0.7 2.1 0.0 1.3 HTER 7.0 4.5 0.0 5.2 4.9 5.6 0.0 4.5 FRR(T2) 9.1 0.0 0.0 0.0 9.1 9.1 0.0 3.9 FAR 1.1 0.0 0.0 9.4 0.6 0.0 1.4 2.2 HTER 6.7 0.0 0.0 4.7 4.8 4.5 0.7 3.1 FRR(T3) 9.1 9.1 9.1 0.0 9.1 9.1 9.1 7.8 FAR 2.8 0.0 8.8 0.0 3.3 0.0 0.0 2.1 HTER 6.0 4.5 9.0 0.0 6.2 4.5 4.5 5.0 FRR(T4) 0.0 0.0 18.2 0.0 9.1 9.1 0.0 5.2 FAR 8.3 0.0 3.7 0.0 0.1 0.6 4.5 2.5 HTER 4.1 0.0 11.0 0.0 4.6 4.8 2.2 3.8 FRR(T5) 0.0 9.1 9.1 9.1 0.0 0.0 0.0 3.9 FAR 5.7 0.0 3.4 14.9 0.3 0.0 0.0 3.5 HTER 2.8 4.5 6.3 12.0 0.1 0.0 0.0 3.7 50 3.3. Experiments and Results Table 3.2: Error rates of the proposed method on motor-task-related data set: FAR, FRR and HTER in %. P denotes the order of AR model, Ti to T5 denotes task 1 to 5 and T denotes the average of a given order over all tasks. P=3 P=5 P=7 P=9 P=11 FRR(T1) 5.2 7.8 7.8 5.2 5.2 FAR 4.2 1.3 5.4 6.9 6.3 HTER 4.7 4.5 6.6 6.0 5.8 FRR(T2) 3.9 3.9 3.9 6.5 13.0 FAR 3.1 2.2 3.5 2.9 1.5 HTER 3.5 3.1 3.7 4.7 7.3 FRR(T3) 9.1 7.8 3.9 6.5 9.1 FAR 2.6 2.4 5.1 3.8 3.1 HTER 5.9 5.1 4.5 5.1 6.1 FRR(T4) 3.9 5.2 5.2 2.6 2.6 FAR 2.8 2.5 1.4 2.9 4.0 HTER 3.3 3.8 3.3 2.8 3.3 FRR(T5) 2.6 3.9 2.6 0.0 2.6 FAR 6.2 3.5 1.8 6.4 3.0 HTER 4.4 3.7 2.2 3.2 2.8 FRR(T) 4.9 5.8 4.7 4.2 6.5 FAR 3.8 2.4 3.4 4.6 3.6 HTER 4.4 4.1 4.1 4.4 5.1 51 3.3. Experiments and Results Table 3.3: Error rates of the method of [81 on motor-task-related data set: FAR,FRR and HTER in %. Si to S7 denotes subject 1 to 7, Ti to T5 denotes task 1 to 5 and S denotes the average of a given task over all the subjects.___________ _____ _____ Si S2 S3 S4 S5 S6 S7 S FRR(Ti) 9.1 9.1 0.0 9.1 9.1 0.0 0.0 5.2 FAR 8.3 5.7 0.0 3.5 5.7 0.0 2.5 3.9 HTER 8.7 7.4 0.0 6.3 7.4 0.0 1.3 4.6 FRR(T2) 9.1 0.0 0.0 0.0 0.0 9.1 0.0 2.6 FAR 7.5 1.5 9.1 4.5 0.0 0.0 0.0 3.2 HTER 8.3 0.8 4.6 2.3 0.0 4.6 0.0 2.9 FRR(T3) 0.0 9.1 9.1 i8.2 0.0 0.0 9.1 6.5 FAR 4.5 1.5 8.8 0.0 0.0 1.5 3.0 2.8 HTER 2.3 5.3 9.0 9.1 0.0 0.8 6.i 4.7 FRR(T4) 0.0 9.1 0.0 9.1 9.1 0.0 9.1 5.2 FAR 3.3 12.1 1.5 4.5 0.0 1.5 4.5 3.9 HTER 1.6 10.1 0.8 6.8 4.6 0.8 6.8 4.5 FRR(T5) 0.0 0.0 0.0 0.0 0.0 9.1 0.0 1.3 FAR 3.1 0.6 1.5 0.0 0.3 0.3 0.0 0.8 HTER 1.6 0.3 0.8 0.0 0.2 4.7 0.0 1.1 52 3.3. Experiments and Results 3.3.2 Experimental Results: Results on P300 EEG Data Set The second data set is obtained through a P300 EEG experiment. The data set is from [9]. In this experiment, the subjects were facing a screen on which 6 images were displayed. The 6 images are a television, a telephone, a lamp, a door, a window, and a radio. The images were flashed in random sequences, one image at a time. Each flash of an image lasted for 100 ms and during the following 300 ms none of the images was flashed, i.e. the inter-stimulus interval was 400 ms. The EEG was recorded at 2048 Hz sampling rate from 32 electrodes placed at the standard positions of the 10-20 international system. A Biosemi Active Two amplifier was used for ampliation and analog to digital conversion of the EEG signals. Each subject completed recording of two different days which separates up to two weeks. Each of the sessions consisted of 6 runs, one run for each of the 6 images. The following protocol was used in each of the runs: • Subjects were asked to count silently how often a prescribed image was flashed (For example: Now please count how often the image with the television is flashed) • The six images were displayed on the screen and a warning tone was issued. • Four seconds after the warning tone, a random sequence of flashes was started and the EEG was recorded. The sequence of flashes was block- randomized, this means that after 6 flashes each image was flashed once, after twelve flashes each image was flashed twice, and etc.. The number of blocks was chosen randomly between 20 and 25. On average 22.5 blocks of 6 flashes were displayed in one run. • After each run subjects were asked what their counting result was. This was done in order to monitor performance of the subjects. The original description of the data set is from [9]. 53 3.3. Experiments and Results Table 3.4: Results of the proposed method on the P300 data set: The power percentages of DICs over the corresponding regions. DIC1 ,..., DIC5 represntsent the dominating ICs from regions 1 to 5. subjctlD subi sub2 sub3 sub4 sub5 sub6 sub7 DIC1 58 54 67 64 60 53 58 DIC2 53 63 62 62 59 52 50 DIC3 49 51 62 72 58 58 54 DIC4 54 62 55 67 63 63 48 DIC5 34 79 33 43 50 39 72 Table 3.5: Error rates of the method in [81 on P300 data set: FAR, FRR and HTER are in %. Si to S7 denotes subject 1 to 7 S denotes the average of a given task over all the subjects. Dayi means testing EEG data of the first day by using parameters that is trained in the first day and day2 means testing the EEG data of a second day by using the parameters trained in the first day. dayl Si S2 S3 S4 S5 S6 S7 S FRR 0.0 0.0 7.1 0.0 14.2 0.0 0.0 3.0 FAR 3.6 1.2 7.2 7.2 0.0 10.8 4.8 5.0 HTER 1.8 0.6 7.2 3.6 7.1 5.4 2.4 4.0 day2 Si S3 S4 S5 S6 S7 3 FRR 78.6 86.0 64.3 78.6 71.4 71.4 64.3 73.5 FAR 21.4 2.4 7.2 13.9 14.3 22.6 16.7 14.1 HTER 50.0 44.2 35.8 46.3 42.9 47.0 41.0 43.8 iiig regions. Figure 3.6 plots the mean values of the first three AR coefficients of DICs for the subjects. We can see that the AR coefficients are well sep arated. Table 3.5 and Table 3.6 report the error rates of the method in [8j and the proposed method, respectively. In the tables, dayl means testing EEG data of the first day by using parameters that is trained in the first day. Well, day2 means testing the EEG data of a second day by using the parameters trained in the first day. As we can see that our method is much consistent over time and this is essential for person authentication systems. Also we can see that for this data set, the error rates of our method are higher than the error rates of the motor-task-related data set. This suggests that there are some tasks that are more suitable for EEG-base biometrics. 54 3.3. Experiments and Results 111 Pfi2 00 .15 -1 6 AOl .1 5 401 l94C: 14 13 12 Figure 3.6: Results of the proposed methods on the P300 data set: plots of the mean values of the AR coefficients for 5 DICs from 7 subjects. From top-left to bottom: DId, ..., DIC5. We can see that the first three AR coefficients are separated for each subjects. 3.3.3 Method of Gaussian Mixture and Maximum A Posteriori Model Adaption for EEG Authentication In this section we present the algorithms in 8] that were used to compare the results with that of the proposed method. Features The feature extraction of EEG data is described as follow. Every 62.5 ms the power spectral density (PSD) in the band 8-30 Hz was estimated for the eight centro-parietal channels C3, Cz, C4, CP1, CP2, P3, Pz, and P4. The 55 3.3. Experiments and Results Table 3.6: Error rates of our method on P300 data set: FAR, FRR and HTER are in % with 5-th order AR modeling. Si to S7 denotes subject 1 to 7 S denotes the average of a given task over all the subjects. Dayl means testing EEG data of the first day by using parameters that is trained in the first day and day2 means testing the EEG data of a second day by using the parameters trained in the first day. dayl Si S2 S3 S4 S5 S6 S7 FRR 0.0 0.0 7.1 7.1 14.3 7.1 14.3 7.1 FAR 17.8 10.7 9.5 26.2 20.2 15.5 8.3 15.5 HTER 8.9 5.4 8.3 16.7 17.3 11.3 11.3 11.3 day2 Si S2 S3 S4 S5 S6 S7 S FRR 7.1 0.0 7.1 0.0 35.7 0.0 7.1 8.2 FAR 27.4 11.9 41.7 44.0 26.2 39.3 40.5 33.0 HTER 17.3 6.0 24.4 22.0 31.0 19.6 23.8 20.6 PSD is estimated via the Welch periodogram algorithm [10]. Specifically, the FFT of three segments of 0.5 second with 50 percent overlap were av eraged, which yields a frequency resolution of 2 Hz. Finally, the values in the frequency band 8-30 Hz were normalized according to the total energy in this same band. As a result, an EEG feature is a 96-dimensional vector (eight channels times 12 frequency components). Then we define a data set X = {xi, X2, ..., XT}, where Xt E RD is the feature point at a observa tion time t, D is dimension of the feature. In the Gaussian Mixture Model (GMM) approach, all feature vectors x are assumed to be independent with each other. Given the GMM parameter set 0, the likelihood of a set of T feature vectors X = {x}L1 is given by P(XI0) = flP(x), (3.19) where P(xIO) = WkN(XIUk,k), (3.20) and 0 = {wk,uk, k}i, (3.21) 56 3.3. Experiments and Results - —‘-E)-- € - €- -—€--—)- --€ @ C) O---“k ‘.--ê---6’ + ——-i - Figure 3.7: An illustration of the location of electrodes on the scalp. Elec trodes used in [8]are indicated in grey. and with the constraint = 1 Vwk 0. (3.22) Here, N(u, ) is a D-dimensional Gaussian density function [4j with mean i and covariance matrix . In [8], the diagonal Gaussian mixture model was used, which means is a diagonal matrix. N is the number of Gaussian clusters and Wk is the weight for the k-th Gaussian. The Global Model and The Client Model The parameter set for client C is denote by &c and the parameter set de scribing a generic non client is denoted by &j. Given a claim for client C’s identity and a set of feature vectors X of the claim, an opinion on the claim is given by L(X) = log P(XI6) — log P(XIOj), (3.23) where P(XO) is the likelihood of the claim coming from the true claimant and P(Xj1) is the likelihood of the claim coming from an impostor. The above probabilities are represented by diagonal Gaussian Mixture Models. The generic EEG model is trained using data from many people. Finally, 57 3.4. Conclusion the authentication decision is reached as follows: Given a threshold r, the claim is accepted if (X) > -r and declined if z(X) <r. In [8J, they trained the client model via Maximum A Posteriori(MAP). MAP estimated of model parameter is defined by 0MAP = argmaxP(6iX), (3.24) where P(O) is the prior density of . An implementation of MAP training for client model adaptation consists of using a global parameter to tune the relative importance of the prior, and only the means are adapted: = ak + (1 — ) P(klxt)xt (3.25) Z=1p(klxt) where Ilk is the new mean of the kth Gaussian, uk is the corresponding parameters in the generic model, P(klxt)is the posterior probability of kth Gaussian (from the client model from the previous iteration), and [0, 1 is the adaptation factor chosen empirically. More about estimation of GMM can be found in [11]. 3.4 Conclusion This chapter has studied the potential of Independent Component Analysis for person authentication. We proposed the use of AR modeling for Domi nating Independent Component of each brain region. Real EEG experimen tal resuits based on two small groups of normal subjects demonstrated the potential of the proposed ICA-based EEG authentication approach. Our al gorithm is examined on two EEG data set: an EEG data set of motor tasks and a EEG data set of P300 potential. Results suggest that our method is promising and has much better performance than the method in [8] for the EEG signals collected from different days, which is essential in authen tication system. Also we note that our algorithm has better performance for motor tasks than the task of P300 potential, this suggests that there are some motor/mental tasks that are suitable for EEG biometrics. 58 Bibliography [1] E. A. Anderson, C. W. Stolz and Shamsunder S. Multivariate Au toregressive Models for Classification of Spontaneous Electroencephalo graphic Signals During Mental Tasks. IEEE Trans. Biomedical Engi neering, 45(3):277—286, March 1998. [2] Laura Astolfi et al. Estimate of Causality Between Independent Corti cal Spatial Patterns During Movement Volition in Spinal Cord Injured Patients. Brain Topography, 19:107—123, 2007. [3] K. Egiazarian, G. Gomez-Herrero, M. Atienza and J.L. Cantero. Mea suring Directional Coupling Between EEG Sources. Neurolmage, 43:497—508, 1994. [4] A. Hyvarinen. Fast and Robust Fixed-point Algorithm for Independent Component Analysis. IEEE Trans. Neural Networks, 10:626—634, 1998. [5] A. Hyvarinen. New Approximations of Differential Entropy for Indepen dent Component Analysis. Advances in neural information processing systems, 10:273—279, 1998. [6] A. Hyvarinen. The Fixed point Algorithm and Maximum Likelihood Estimation for Independent Component Analysis. IEEE Trans. Neural Networks, 10:1—5, 1999. [7] A. Hyvarinen and E. Oja. Independent Component Analysis: Algo rithms and Applications. Neural Networks, 13:411—430, 2000. [8] S. Marcel and Millan J. D. R. Person Authentication Using Brainwaves (EEG) and Maximum A Posteriori Model Adaptation. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(4):743—752, April 2007. 59 Chapter 3. Bibliography [9] Touradj Ebrahimia, Ulrich Hoffmann, Jean-Marc Vesina and Karin Dis erensb. An Efficient P300-based Brain-computer Interface for Disabled Subjects. Journal of Neuroscience Methods, 164:115—125, 2008. [101 P. Welch. The Use of Fast Fourier Transform for The Esitmation of Power Spectra: A Method Based on Time Averaging Over Short, Mod ified Periodograms. IEEE Trans. Audio and Electroacoust., 15:70—73, 1967. [11] H. Zeng and Y. Cheung. A New Feature Selection Method for Gaussian Mixture Clustering. Pattern Recognition, 42:243—250, 2009. 60 Chapter 4 Conclusions and Future Work Chapter 4 presents the conclusions and outlines recommendations for future research arising from the study. 4.1 Conclusions and Contributions Two major problems involved in EEC-based person authentication are the large feature size of the scalp EEG signals compared with the limited EEG data set and the reliability of EEG signals due to inexact positioning and noise. To address the first concern, we propose an EEG feature hashing ap proach for person authentication (Chapter 2). The mAR coefficients are extracted from multiple EEG channels and then hashed via Fast Johnson Lindenstrauss Transform based dimension reduction algorithm. Based on the hash vectors, a Naive Bayes probabilistic model is employed for person authentication. The proposed method presents a fundamental departure from existing methods in EEG biometry study and shows the potential of mapping EEC features from high dimension to a lower one while keeping discrimination power between the features of different subjects. This makes statistical framework trainable from large EEG features with small data set. To address the second concern, and ICA-based modeling approach is pro posed. Five Dominating Independent Components (DICs) are determined from five brain regions represented by EEC channels, then Univariate Au toregressive coefficients of the DICs are extracted as features. Based on the AR coefficients of the DICs, a Naive Bayes probabilistic model is employed 61 4.2. Future Work for person authentication. We tested our algorithm on two EEG data sets: a data set of real motor tasks and a data set of P300 potential. Results suggest that the proposed method is promising and yields better performance than other methods when applied to EEG data collected from different days. This relative consistence over time is essential in person authentication systems. Although the EEG-based biometrics has recently attracted more atten tions and increasing research and good results have been published in this emerging field, the EEG—based biometrics has sonic limitations. These limi tations come from several inherent properties of the scalp EEG signals. First, the current EEG devices are not accurate enough and the spatial resolution of scalp EEG signals is poor, these limit the investigation of brainwave activ ities. Second, even for a same subject, the brainwave may change over time, this makes it very challenging to find robust EEG features. Hence, given the current hardware devices and biological theories on human brain, the EEG-based biometrics is in its infant stage and still far away from building up industrial applications. However, with the improvement on EEG devices and development of biological theories, maybe one day, the EEG-based bio metrics can be seen in real applications. 4.2 Future Work In this thesis, we proposed two algorithms that dealt with two different aspects of EEG authentication technique and the future work will focus on the following aspects: • Exploring other approaches which can increase the reliability of scalp EEG recordings, such as other blind source separation techniques. This includes exploring approach that can completely label the IC components [2]. • Exploring other dimension reduction algorithms [1] that can reduce the size of the EEG features while keeping discrimination power. This will include studying methods such as Karhunen-Loeve transform, a 62 4.2. Future Work dimension reduction technique used in image compression. Methods of combining multiple features [3] will also be studied. • Since different mental/motor tasks may give different performance (see Chapter 3) on EEC biometrics, we will design different experiments and test on different EEC databases to study which tasks are more robust and suitable for EEG biometrics. 63 Bibliography [1] Imola Fodor. A Survey of Dimension Reduction Techniques. Technical report, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 2002. [2] Aapo Hyvarinen Johan Himberg and Fabrizio Esposito. Validating The Independent Components of Neuroimaging Time-series via Clustering and Visualization. Neroimage, 22:1214—1222, 2003. {3] A. Riera, A.Soria-Frisch, M.Caparrini, C.Grau, and G.Ruffini. Un obtrusive Biometric System Based on Electroencephalogram Analysis. EURA SIP Journal on Advances in Signal Processing, 2008:57—64, 2008. 64
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Person authentication using EEG brainwave signals
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Person authentication using EEG brainwave signals He, Chen 2009
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Person authentication using EEG brainwave signals |
Creator |
He, Chen |
Publisher | University of British Columbia |
Date Issued | 2009 |
Description | Studies have shown that the electroencephalogram (EEC) recordings have unique pattern for each individual and thus have potential for biometric applications. There are two major problems associated with EEC biometrics. One is that the large EEG features size and the relatively limited EEG data size make it difficult to train a robust model; the other is that the signals from EEG scalp may not be reliable in many situations. Thus in this thesis we proposed new methods for increasing the accuracy and robustness of EEC-based authentication systems. First, to address the concern of the high dimensionality of EEC features, we proposed a novel dimension reduction method of EEG features based on the Fast Johnson-Lindeustrauss Transform (FJLT). We showed that this method has potential of mapping EEG features from a high dimension space to a lower one while keeping discrimination power between the features of subjects. The features we used are Multivariate Autoregressive (mAR) coefficients. We tested this method on a motor task related EEG data set. Second, to increase the reliability of scalp EEC signals, we employed an Independent Component Analysis (ICA)-based approach in our authentication procedure, with the assumption that EEG recordings are linear combinations of the underlying brain source signals. We estimated the Independent Components (ICs) from several physical regions on the scalp and determine the Dominating Independent Components (DIC) in the corresponding regions. Then we extracted the Univariate Autoregressive (AR)coefficients from DICs as features. We tested our algorithm on two data sets, a motor task related EEC data set and an EEC data set of P300 potential. The proposed algorithm appeared to be promising, and when applied to EEG data collected from different days yields better performance than other methods. This relative consistence over time is essential in person authentication systems. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-03-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0069371 |
URI | http://hdl.handle.net/2429/22475 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2009-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2009_fall_he_chen.pdf [ 1.19MB ]
- Metadata
- JSON: 24-1.0069371.json
- JSON-LD: 24-1.0069371-ld.json
- RDF/XML (Pretty): 24-1.0069371-rdf.xml
- RDF/JSON: 24-1.0069371-rdf.json
- Turtle: 24-1.0069371-turtle.txt
- N-Triples: 24-1.0069371-rdf-ntriples.txt
- Original Record: 24-1.0069371-source.json
- Full Text
- 24-1.0069371-fulltext.txt
- Citation
- 24-1.0069371.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0069371/manifest