Open Collections

UBC Theses and Dissertations

Design of a self-paced brain computer interface system using features extracted from three neurological.. Fatourechi, Mehrdad 2008

DESIGN OF A SELF-PACED BRAIN COMPUTER INTERFACE SYSTEM USING FEATURES EXTRACTED FROM THREE NEUROLOGICAL PHENOMENA

by

Mehrdad Fatourechi

B.Sc., University of Tehran, 1998
M.Sc., University of Tehran, 2001

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

January 2008

© Mehrdad Fatourechi, 2008

ABSTRACT

Self-paced brain computer interface (SBCI) systems allow individuals with motor disabilities to use their brain signals to control devices whenever they wish. These systems are required to identify the user's "intentional control (IC)" commands, and they must remain inactive during all periods in which users do not intend control (called "no control (NC)" periods).

This dissertation addresses three issues related to the design of SBCI systems: 1) their presently high false positive (FP) rates, 2) the presence of artifacts and 3) the identification of a suitable evaluation metric.

To improve the performance of SBCI systems, the following are proposed: 1) a method for the automatic user-customization of a 2-state SBCI system; 2) a two-stage feature reduction method for selecting wavelet coefficients extracted from movement-related potentials (MRPs); 3) an SBCI system that classifies features extracted from three neurological phenomena: MRPs and changes in the power of the Mu and Beta rhythms; 4) a novel method that effectively combines the methods developed in 2) and 3); and 5) a generalization of the system developed in 3) from detecting a right index finger flexion to detecting a right hand extension.
Results of these studies using actual movements show an average true positive (TP) rate of 56.2% at an FP rate of 0.14% for the finger flexion study and an average TP rate of 33.4% at an FP rate of 0.12% for the hand extension study. These FP rates are significantly lower than those achieved in other SBCI systems, where FP rates vary between 1% and 10%.

We also conduct a comprehensive survey of the BCI literature. We demonstrate that many BCI papers do not properly deal with artifacts. We show that the proposed BCI achieves a good performance of TP = 51.8% and FP = 0.4% in the presence of eye movement artifacts. Further tests of the performance of the proposed system in a pseudo-online environment show an average TP rate of 48.8% at an FP rate of 0.8%.

Finally, we propose a framework for choosing a suitable evaluation metric for SBCI systems. This framework shows that the Kappa coefficient is more suitable than other metrics for evaluating performance during the model selection procedure.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Co-authorship Statement

Chapter 1  Introduction and background
  1.1  Introduction and motivation
    1.1.1  High false positive rates (FPR)
    1.1.2  Presence of artifacts
    1.1.3  Evaluation metrics
  1.2  Functional model of a brain computer interface system
  1.3  Background
    1.3.1  Signal recording
    1.3.2  Choice of neurological phenomenon
    1.3.3  Timing of BCI control
  1.4  Design of self-paced BCI systems
  1.5  Use of multiple neurological phenomena in BCI systems
    1.5.1  Simultaneous application of MRPs and changes in the power of Mu/Beta rhythms
    1.5.2  Using multiple neurological phenomena in BCI systems
  1.6  Artifacts in BCI systems
    1.6.1  Artifact avoidance
    1.6.2  Artifact rejection
    1.6.3  Artifact removal
  1.7  Evaluating the performance of SBCI systems
  1.8  Thesis contributions
    1.8.1  Reducing high false positive rates
    1.8.2  Addressing artifacts in SBCI systems
    1.8.3  Finding a suitable evaluation metric for SBCI systems
  1.9  Organization of the thesis
  1.10  References

Chapter 2  Automatic user customization for improving the performance of a self-paced brain computer interface system
  2.1  Introduction
  2.2  Background
  2.3  Problem statement
  2.4  Methods
  2.5  Experimental results
  2.6  Discussion and conclusions
  2.7  Acknowledgements
  2.8  References

Chapter 3  Application of a hybrid wavelet feature selection method in the design of a self-paced brain computer interface system
  3.1  Background
  3.2  Data collection
  3.3  Method
  3.4  Results
  3.5  Discussion and conclusions
  3.6  Acknowledgements
  3.7  References

Chapter 4  A self-paced brain computer interface system that uses movement related potentials and changes in the power of brain rhythms
  4.1  Introduction
  4.2  Background
    4.2.1  Neurological phenomenon background
    4.2.2  Multiple neurological phenomena in BCI systems
  4.3  Data collection
  4.4  Methods
    4.4.1  Feature extraction
    4.4.2  Feature classifier
    4.4.3  Feature selection
    4.4.4  Performance evaluation
  4.5  Results
  4.6  Discussion
    4.6.1  Observations on the BCI designs based on a single neurological phenomenon
    4.6.2  Observations on Study 1
    4.6.3  Observations on Study 2
    4.6.4  Statistical analysis
  4.7  Acknowledgements
  4.8  References

Chapter 5  A self-paced brain computer interface system with a low false positive rate
  5.1  Introduction
  5.2  Methods
    5.2.1  Feature extraction
    5.2.2  Feature classification
    5.2.3  Hybrid genetic algorithm (HGA)
  5.3  Experimental results
    5.3.1  Data collection and evaluation
    5.3.2  Results
  5.4  Discussion and conclusions
  5.5  Acknowledgements
  5.6  References

Chapter 6  EMG and EOG artifacts in brain computer interface systems: a survey
  6.1  Introduction
  6.2  Current neurological phenomena and associated artifacts
    6.2.1  Current neurological phenomena
    6.2.2  Artifacts in BCI systems
  6.3  Methods of handling artifacts
    6.3.1  Artifact avoidance
    6.3.2  Artifact rejection
    6.3.3  Artifact removal
  6.4  Literature survey
    6.4.1  EOG artifacts
    6.4.2  EMG artifacts
  6.5  Discussion and conclusions
  6.6  References

Chapter 7  Performance of a self-paced brain computer interface on data contaminated with eye blinks and on data recorded in subsequent sessions
  7.1  Introduction
  7.2  Methods
    7.2.1  Self-paced brain computer interface design
    7.2.2  Data collection
    7.2.3  Evaluation
  7.3  Results
    7.3.1  Analysis of SBCI performance on artifact-contaminated data
    7.3.2  Test on data recorded in subsequent sessions
    7.3.3  The effect of adding a debounce component
  7.4  Discussion
  7.5  Acknowledgements
  7.6  References

Chapter 8  Selection of a suitable evaluation metric for a self-paced brain computer interface system
  8.1  Introduction
  8.2  Problem statement
  8.3  A framework for comparing evaluation metrics
    8.3.1  Suitability of an evaluation metric
    8.3.2  Guidelines for comparing two evaluation metrics
    8.3.3  Degree of consistency (DoC)
    8.3.4  The Degree of discriminancy (DoD)
    8.3.5  Comparison of two evaluation metrics
    8.3.6  Using sub-sampling grids for calculating the comparison measures
  8.4  Selected evaluation metrics in SBCIs
    8.4.1  Overall accuracy (OA)
    8.4.2  Information transfer rate (mutual information)
    8.4.3  Kappa
    8.4.4  HF-difference
    8.4.5  TPR/FPR ratio
    8.4.6  ROC curve and related metrics
  8.5  Simulations
    8.5.1  Application
    8.5.2  Results
  8.6  Discussion and conclusions
  8.7  References

Chapter 9  New studies on the design of a 2-state self-paced brain computer interface system with a low false activation rate
  9.1  Introduction
  9.2  Experimental paradigm
    9.2.1  Data recording
    9.2.2  Artifact monitoring
  9.3  System design methods
    9.3.1  Generating the IC and NC data
    9.3.2  Feature extraction
    9.3.3  Feature classification
    9.3.4  Multiple classifier system
    9.3.5  Calculating the TPs and FPs
    9.3.6  Metric selection for model evaluation
    9.3.7  Model selection
    9.3.8  Evaluation
    9.3.9  Using ROC curves for summarizing the performance on test sets
  9.4  Results
    9.4.1  Choosing the evaluation metric for model selection
    9.4.2  Performance of the system
  9.5  Discussion and future work
    9.5.1  Discussion
    9.5.2  Future works
  9.6  Acknowledgements
  9.7  References

Chapter 10  Summary and conclusions
  10.1  Summary
    10.1.1  Chapter 2: Improving the performance of LF-ASD by automatic user-customization
    10.1.2  Chapter 3: Using DWT to extract features
    10.1.3  Chapter 4: Using three neurological phenomena as the source of control
    10.1.4  Chapter 5: Design of an automated SBCI system with low FP rates
    10.1.5  Chapter 6: Analysis of the effect of artifacts in BCI systems
    10.1.6  Chapter 7: Analysis of the performance of the proposed SBCI on artifact-contaminated data
    10.1.7  Chapter 8: A framework for evaluating the performance of SBCI systems
    10.1.8  Chapter 9: Applying the proposed SBCI with hand extension data
  10.2  Summary of contributions
    10.2.1  Reducing high false positive rates
    10.2.2  Addressing artifacts in SBCI systems
    10.2.3  Finding a suitable evaluation metric for SBCI systems
  10.3  Future research directions
  10.4  References

Appendix A  UBC Research Ethics Board Certificate
Appendix B  Theoretical analysis of the proposed SBCI
  B.1.  Formulating the problem
  B.2.  Constraints
  B.3.  Objective functions
  B.4.  Results
  B.5.  References

LIST OF TABLES

Table 1-1. Comparison of the TPR and FPR rates achieved in different SBCI studies.
Table 2-1. The confusion matrix for a 2-state self-paced BCI system.
Table 2-2. Comparison of the fitness value of the initial and final populations (tested on the validation sets).
Table 2-3. TP rates of the LF-ASD and the ALF-ASD (FP=2%).
Table 2-4. Delay parameter values used in the design of the LF-ASD based on the ensemble averages of the MRP patterns in the training data set. Note that ? i and ? j are set to zero and that the same delay parameter values are used for the rest of the bipolar channels. The table is reproduced from [15].
Table 3-1. The confusion matrix for a 2-state self-paced BCI system.
Table 3-2. Comparison of the average TP rates, average FP rates, average TPR/FPR ratio and the average number of features.
Table 3-3. The average number of selected features per channel after applying the hybrid feature selection algorithm.
Table 4-1. The average TP and FP rates (%) for Study 1 (the numbers in parentheses show the standard deviation).
Table 4-2. The average TP and FP rates (%) for each neurological phenomenon in Study 2 (the numbers in parentheses show the standard deviation).
Table 4-3. The average TP and FP rates (%) for User AB1 in Study 2 (the numbers in parentheses show the standard deviation).
Table 4-4. The average TP and FP rates (%) for User AB2 in Study 2 (the numbers in parentheses show the standard deviation).
Table 4-5. The average TP and FP rates (%) for User AB3 in Study 2 (the numbers in parentheses show the standard deviation).
Table 4-6. The average TP and FP rates (%) for User AB4 in Study 2 (the numbers in parentheses show the standard deviation).
Table 5-1. The time schedule of recording the data.
Table 5-2. The performance results for the proposed SBCI system.
Table 5-3. Comparison of the performance results.
Table 6-1. Methods of handling artifacts in BCI literature.
Table 6-2. Methods of automatic EOG rejection in BCI studies.
Table 6-3. Methods of automatic EMG rejection in BCI studies.
Table 6-4. Methods of automatic EOG removal in BCI studies.
Table 6-5. Methods of automatic EMG removal in BCI studies.
Table 7-1. The time schedule of recording the data. For each participant, Day 1 is the first day that the participant attended the experiments. The remaining days are numbered with respect to Day 1 of that particular participant.
Table 7-2. Comparison of the average test results on artifact-contaminated and non-contaminated data. The averages are calculated over 5 outer validation sets. The numbers in parentheses indicate standard deviations.
Table 7-3. Comparison of the average results using data recorded in the first five sessions with those using data recorded in subsequent sessions. The averages are calculated over 5 outer validation sets. The numbers in parentheses indicate standard deviations.
Table 8-1. DoS results for the evaluation metrics studied in this paper. Res1 stands for the finest resolution, Mean stands for the average of 10 resolution values and Res10 stands for the coarsest resolution.
Table 8-2. The DoC results for the evaluation metrics studied in this paper (the first three comparisons). Res1, Mean and Res10 stand for the finest resolution, the average of 10 resolution values and the coarsest resolution, respectively.
Table 8-3. The DoC results for the evaluation metrics studied in this paper (the last three comparisons). Res1, Mean and Res10 stand for the finest resolution, the average of 10 resolution values and the coarsest resolution, respectively.
Table 8-4. The DoD results for the evaluation metrics studied in this paper (the first three comparisons). Res1, Mean and Res10 stand for the finest resolution, the average of 10 resolution values and the coarsest resolution, respectively.
Table 8-5. The DoD results for the evaluation metrics studied in this paper (the last three comparisons). Res1, Mean and Res10 stand for the finest resolution, the average of 10 resolution values and the coarsest resolution, respectively.
Table 8-6. The effect of weights on average values of DoC.
Table 8-7. The effect of weights on average values of DoD.
Table 9-1. Comparison of the TP rates of monopolar and bipolar montages for different false activation rates. The numbers in parentheses show the standard deviations.
Table 9-2. Comparison of the TPR and FAR rates achieved in different SBCI studies.

LIST OF FIGURES

Figure 1-1. A BCI system allows users to control a device using their brain signals only.
Figure 1-2. A typical SBCI system that identifies an IC command related to the execution of right finger flexion.
Figure 1-3. High false positive rates can significantly impact the performance of an SBCI system, even if the TP rates are high. (a) Brain states of a user; (b) The output of the SBCI system.
Figure 1-4. Functional model of a BCI system. Note the control display is optional.
Figure 1-5. Two examples of neurological phenomena. (a) Changes in the power of Beta rhythms over time; (b) A movement-related potential. Vertical line shows the time of activation of the movement. Note that these shapes are generated by averaging over many epochs.
Figure 1-6. Synchronized vs. self-paced control. (a) In a synchronized BCI system, control can be done only in certain intervals specified by the system; (b) In a self-paced BCI system, the control is done at the user's own pace.
Figure 1-7. An example of how artifacts can affect the performance of an SBCI system. (a) The brain state of the user; (b) The periods when artifacts have occurred; (c) The output of the SBCI system (note: FP: false positive, TN: true negative, FN: false negative and TP: true positive).
Figure 1-8. Types of evaluation metrics used in synchronized and self-paced BCI systems.
Figure 1-9. The overall schematic of the SBCI system developed and studied in Chapters 4, 5, 7, and 9.
Figure 1-10. Outline of the thesis.
Figure 2-1. Components of the LF-ASD system (from [32]).
Figure 2-2. Points selected by the feature generator when applied to a sample bipolar EEG signal.
Figure 2-3. The fitness of the best chromosomes as a function of the generation number for two representative individuals. (a) AB2; (b) SCI4.
Figure 3-1. The overall structure of the proposed hybrid method for extracting MRP features.
Figure 3-2. Spatial distribution of the average number of selected features for AB1.
Figure 3-3. Spatial distribution of the average number of selected features for AB2.
Figure 3-4. Spatial distribution of the average number of selected features for AB3.
Figure 3-5. Spatial distribution of the average number of selected features for AB4.
Figure 3-6. Comparison of the fitness of the best chromosome vs. other subsets of features.
Figure 4-1. Functional model of a BCI system (adapted from [1]).
Figure 4-2. Synchronized vs. SBCI systems. (a) In a synchronized BCI system, control is only possible during System Ready periods; (b) In an SBCI system, the system continuously accepts the input signals.
Figure 4-3. NC periods are generated by shifting a window over NC datasets.
Figure 4-4. The overall structure of the SBCI system implemented in Study 1.
Figure 4-5. The overall structure of the two-stage MCS implemented in Study 2.
Figure 4-6. The process of generating templates.
Figure 4-7. The spatial distribution of the selected features for individuals in Study 2. (a) User AB1; (b) User AB2; (c) User AB3; and (d) User AB4.
Figure 5-1. The overall structure of the improved SBCI incorporating three neurological phenomena.
Figure 5-2. An example of how features are extracted using the proposed cross-covariance method.
Figure 5-3. (a) The structure of a chromosome; (b) Representation of the parameter values for each SVM in a chromosome.
Figure 5-4. Method of calculating the TP rate; (a) EEG signal; (b) Output of the finger switch; (c) Output of the SBCI.
Figure 6-1. The functional model of a BCI system depicting its principal functional components.
Figure 6-2. The number of papers published on different methods of handling EOG artifacts in BCI studies.
Figure 6-3. The number of papers published on different methods of handling EMG artifacts in BCI studies.
Figure 7-1. The overall structure of the improved SBCI.
Figure 7-2. An example of extracting the maximum of the cross-correlogram using the proposed cross-covariance method.
Figure 7-3. Method of calculating the TP rate; (a) EEG signal; (b) Output of the hand switch; (c) Output of the SBCI.
Figure 7-4. The SBCI output during periods when finger movements were executed for (a) Participant AB1; (b) Participant AB2; and (c) the output of the SBCI during NC sessions when movements did not occur for Participant AB1.
Figure 7-5. The operation of a debounce component.
Figure 7-6. The TP rate, FP rate and the TPR/FPR ratio as a function of the length of the debounce window for (a) Participant AB1; (b) Participant AB2; (c) Participant AB3; (d) Participant AB4; (e) Averages of all four participants.
Figure 7-7. (a) The output of the SBCI during periods when finger movements were executed for participant AB4; (b) the output of the SBCI during NC sessions when movements did not occur for participant AB4.
Figure 8-1. a) An example of a confusion matrix for a balanced dataset; b) An example of a confusion matrix for an imbalanced dataset; c) A second example of a confusion matrix for an imbalanced dataset.
Figure 8-2. A sample fitness landscape for a classification problem with two classes.
Figure 8-3. An example of dividing the (TPR, FPR) domain into regions. Different movements on the (TPR, FPR) space may be associated with different weights. Note that the numbers on each axis denote (%).
Figure 8-4. Two examples of more complex break-down of the (TPR, FPR) domain with more complex partition and weighting schemes.
Figure 8-5. An example of using grids on the (TPR, FPR) domain.
Figure 8-6. Consistency between different resolutions in reaching the same conclusion for the cases studied here. The chart attributes to DoD results.
Figure 9-1. The overall structure of the SBCI (from [17]). The dashed lines show the parts of the system whose values are determined by the hybrid genetic algorithm (HGA).
Figure 9-2. An example of how features are extracted using the proposed cross-covariance method.
Figure 9-3. Method of calculating the TP rate; (a) EEG Signal; (b) Output of the finger switch; (c) Output of the SBCI.
Figure 9-4. An example of dividing the (TPR, FPR) domain into regions. Different movements on the (TPR, FPR) space may be associated with different weights. Note that the numbers on each axis denote (%).
Figure 9-5. ROC plots for (a) AB1; (b) AB2; (c) AB3; (d) AB4; (e) AB5.
Figure 9-6. Average MRPs for Channel C1 over two sessions. (a) participant AB1; (b) participant AB2; (c) participant AB3; (d) participant AB4.
LIST OF ABBREVIATIONS

1-NN  One nearest neighbor
AB  Able-bodied
AEP  Auditory evoked potentials
ANC  Activity of neural cells
AR  Auto-regressive
AUC  Area under the receiver operating characteristic curve
BCI  Brain computer interface
BI  Brain Interface
BSS  Blind source separation
CAR  Common average reference
CBR  Changes in the brain rhythms
CPBR  Changes in the power of Beta rhythms
CPMR  Changes in the power of Mu rhythms
CT  Cognitive task
DoC  Degree of consistency
DoD  Degree of discriminancy
DoS  Degree of suitability
DROP  Desired region of operation
DWT  Discrete wavelet transform
ECG  Electrocardiography
ECoG  Electrocorticogram
EEG  Electroencephalogram
EMG  Electromyogram
ENT  Energy normalization transform
EOG  Electrooculogram
ERD  Event-related desynchronization
ERP  Event-related potential
ERS  Event-related synchronization
FAR  False activation rate
FDR  False discovery rate
FIR  Finite impulse response
FP  False positive
FPR  False positive rate
GA  Genetic algorithm
HGA  Hybrid genetic algorithm
IC  Intentional control
ICA  Independent component analysis
ITR  Information transfer rate
k-NN  k-nearest neighbor
LF-ASD  Low-frequency asynchronous switch design
MCS  Multiple classifier system
MEG  Magnetoencephalography
MI  Mutual information
MN  Multiple neurological phenomena
MRA  Movement-related activity
MRP  Movement-related potential
NC  No control
NN  Neural networks
OA  Overall accuracy
OPM  Outlier processing method
PCA  Principal component analysis
RMS  Root mean square
ROC  Receiver operating characteristic
SBCI  Self-paced brain computer interface
SCI  Spinal cord injury
SNR  Signal-to-noise ratio
SSEP  Somatosensory evoked potential
SSVEP  Steady-state visual evoked potential
STD  Standard deviation
SVM  Support vector machine
SWT  Stationary wavelet transform
TEM  Time of expected attempted movement
TP  True positive
TPR  True positive rate
VEP  Visual evoked potential

ACKNOWLEDGEMENTS

This thesis is the result of nearly five years of research. I would like to express my sincere gratitude to all those who have supported me in completing my thesis.

First, I would like to thank my research supervisors, Prof. Rabab K. Ward and Dr. Gary E. Birch, for giving me the opportunity to work in their research group. I am greatly indebted to them for their guidance, support and encouragement throughout the course of my studies. I would also like to thank my committee members, Dr. Dave Michelson, Dr. Jane Wang, and Dr. Tim Salcudean, for investing their time to read this thesis and give me valuable feedback on it.

Next, I extend my thanks to the researchers in the brain interface laboratory of the Neil Squire Society and the Image and Signal Processing lab of UBC for their support and help. In particular, I would like to thank Dr. Lino Coria, Dr. Steven G. Mason, Dr. Jaimie Borisoff, Gordon Handford, Borna Noureddin, Xin Yi Yong, Angela Chuang, Qiang Tang, and Zicong Mai for their valuable technical comments and support on my work.

Most importantly, I would like to express my deepest gratitude to my wife, Nona, my parents, my sister, my family and my friends for their love, help and endless support. If it were not for their sincere encouragement, I would not have made it as far as I did.

This work was supported in part by NSERC under Grant 90278-06 and CIHR under Grant MOP-72711.

DEDICATION

This thesis is dedicated to my lovely wife, Nona,
Who has offered me unconditional love and support and has stood by me all along. She believed in me, when I did not and for that, I shall always remain grateful.

And it is also dedicated to my parents, Nasrin and Hassan,
Who have raised me to be the person I am today. They have been with me in every step. I hope I have made them proud.
And it is also dedicated to Shahnaz and Hossein,
Who have always supported me and encouraged me.

CO-AUTHORSHIP STATEMENT

1- Fatourechi, M., Bashashati, A., Birch, G.E. and Ward, R.K., "Automatic User Customization for Improving the Performance of an Asynchronous Brain Interface System", Journal of Medical & Biological Engineering and Computing, Vol.44, No.12, Dec 2006, pp.1093-1104.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
AB contributed to the study design, helped in shaping the manuscript, interpreting the results and evaluating the manuscript.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

2- Fatourechi, M., Birch, G. E., and Ward, R. K., "Application of a Hybrid Wavelet Feature Selection Method in the Design of a Self-paced Brain Interface System", Journal of NeuroEngineering and Rehabilitation, Vol.4, No.1, Apr 2007.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

3- Fatourechi, M., Birch, G. E., and Ward, R. K., "A Self-paced Brain Interface System that Uses Movement Related Potentials and Changes in the Power of Brain Rhythms", Journal of Computational Neuroscience, Vol.23, No.1, Aug 2007, pp.21-37.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

4- Fatourechi, M., Birch, G. E., Ward, R. K., "A Self-paced Brain Interface System with a Low False Positive Rate", Journal of Neural Engineering, vol.5, 2008, pp.9-23.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

5- Fatourechi, M., Bashashati, A., Ward, R. K., and Birch, G. E., "EOG and EMG Artifacts in Brain Interface Systems: a Survey", Clinical Neurophysiology, Vol.118, No.3, Mar 2007, pp.480-494 (Invited Paper).
MF reviewed the papers, created the tables, interpreted the results, wrote the manuscript and acted as the corresponding author.
AB helped in writing the manuscript, interpreting the results and evaluating the manuscript.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

6- Fatourechi, M., Birch, G.E., Ward, R.K., "Performance of a Self-paced Brain Computer Interface on Data Contaminated with Eye Blinks and on Data Recorded in Subsequent Sessions", submitted.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

7- Fatourechi, M., Mason, S. G., Ward, R.K., and Birch, G. E., "A New Framework for Comparing Metrics Used in Pattern Classification Problems with Large Test Samples", submitted.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
SGM contributed to the development of the initial concept, interpreting the results and evaluating the manuscript.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

8- Fatourechi, M., Ward, R.K., and Birch, G. E., "New studies on the design of a 2-state self-paced brain computer interface system with a low false positive rate", submitted.
MF developed the method, analyzed the data, interpreted the results, wrote the manuscript and acted as the corresponding author.
RKW and GEB supervised the development of the work, helped in writing and evaluating the manuscript.

CHAPTER 1  INTRODUCTION AND BACKGROUND

1.1  Introduction and motivation

Many physiological disorders such as Amyotrophic Lateral Sclerosis (ALS) or injuries such as high-level spinal cord injury can disrupt the communication path between the brain and the body. People with severe motor disabilities may lose all voluntary muscle control, including eye movements. These people are forced to accept a reduced quality of life, resulting in dependence on caretakers and escalating social costs [1]. Most existing assistive technology devices cannot be used by these patients because the devices depend on motor activity from specific parts of the body. Alternative control paradigms for these individuals are thus desirable.

Over the last two decades, the brain-computer interface (BCI) has emerged as a new frontier in assistive technology, since it could provide an alternative communication channel between a user's brain and the outside world [2] (see Figure 1-1 for a high-level block diagram of a BCI system). Other terms used in the literature to refer to a BCI system include brain interface (BI), direct brain interface (DBI), and brain machine interface (BMI). A successful BCI design would enable people to control objects in their environment (such as a light switch in their room, a television, wheelchairs, neural prostheses and computers) by thought only. This could be accomplished by measuring specific features of the user's brain activity that relate to his/her intent to perform the control. This specific type of brain activity is termed a "neurological phenomenon".
As an example, when a particular movement such as right index finger flexion is performed, specific neurological phenomena that correspond to that movement are generated. The corresponding neurological phenomena are then translated into signals that are eventually used to control devices [3].

Figure 1-1. A BCI system allows users to control a device using their brain signals only.

Currently, two different approaches are pursued in the design of BCI systems: synchronized and self-paced [4]. In the synchronized approach, which is the traditional approach to the design of BCI systems, the user can only perform the control in certain time intervals that are specified by the system. While synchronized BCI systems can achieve high classification accuracy (>90%), their application is limited because the user cannot perform the control at all times. Moreover, many of these systems assume that the user will exert an intentional control (IC) command during the specified control periods. In other words, they do not consider periods during which the user does not wish to exert control (called no control, NC, periods). As a result, they may become unstable during NC periods [3].

To address these shortcomings of synchronized BCI systems, the concept of self-paced BCI (SBCI) has been proposed. An SBCI system is constantly available to the user, so it must be able to distinguish IC patterns from NC periods. Figure 1-2 shows a typical example of a 2-state SBCI system that should distinguish IC patterns generated as the result of right finger flexion from the NC states. The output of this SBCI system is "IC" when the system detects an IC command and is "NC" at all other times.

Figure 1-2. A typical SBCI system that identifies an IC command related to the execution of right finger flexion.

The main aim of this thesis is to study the following three issues pertaining to SBCI systems:

1.1.1  High false positive rates (FPR)

The performance of SBCI systems is usually summarized by two measures: 1) the correct detection rate of IC commands (denoted as the true positive or TP rate), and 2) the amount of false activations during NC periods (the false positive or FP rate). At present, the performance of SBCI technology is not high enough for use in a practical setting. While these systems can achieve an arguably good detection rate of TP>50%, their FP rates remain too high for practical applications (e.g., a false positive occurs every few seconds [5, 6]). For example, it has been argued that for an SBCI system that makes a decision every 1/16th of a second, FP rates higher than 2% can cause excessive user frustration, since the SBCI system generates a false positive every 6.25 seconds on average [5].

As another example, consider the self-paced control of lighting in a room using an SBCI system. The system has two states: I (turn on/turn off the light) and N (no control). Figure 1-3 (a) and Figure 1-3 (b) show the brain states of a user and the output of the SBCI system, respectively. As seen, the system generated an FP at the beginning of monitoring the brain signals. The user then attempted to compensate for this error by issuing an intentional control command. After a short period, the system generated a second false positive, and the user had to compensate for it again. Clearly, during this period, the user only managed to compensate for the errors generated by the system. This process becomes frustrating when errors happen frequently, especially if the TP rate is not very high.
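The arithmetic behind such figures can be sketched as follows. The sketch assumes, as a simplification not stated in the text, that every decision is made during an NC period and that each decision independently turns out to be a false positive with probability equal to the FP rate; the function names are illustrative only. Under these assumptions, 16 decisions per second give one false positive every 6.25 seconds at an FP rate of 1% and roughly every 3.1 seconds at 2%, and keeping false positives to about one per minute requires an FP rate near 0.1%.

```python
def mean_seconds_between_false_positives(decisions_per_second, fp_rate):
    """Expected time between false positives during NC periods.

    Assumption (illustrative model): every decision falls in an NC period
    and is a false positive independently with probability `fp_rate`.
    """
    if decisions_per_second <= 0:
        raise ValueError("decisions_per_second must be positive")
    if not 0 < fp_rate <= 1:
        raise ValueError("fp_rate must be in (0, 1]")
    # Expected false positives per second = decisions_per_second * fp_rate.
    return 1.0 / (decisions_per_second * fp_rate)


def max_fp_rate_for_interval(decisions_per_second, min_seconds_between_fp):
    """Largest per-decision FP rate that keeps false positives at least
    `min_seconds_between_fp` apart on average (same assumptions)."""
    if decisions_per_second <= 0 or min_seconds_between_fp <= 0:
        raise ValueError("arguments must be positive")
    return 1.0 / (decisions_per_second * min_seconds_between_fp)


# A system that outputs a decision every 1/16th of a second.
interval_at_1_percent = mean_seconds_between_false_positives(16, 0.01)
interval_at_2_percent = mean_seconds_between_false_positives(16, 0.02)
rate_for_one_per_minute = max_fp_rate_for_interval(16, 60)
```

The second function makes the design target explicit: at 16 decisions per second, one false positive per minute corresponds to roughly one false positive in every 960 decisions.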
Based on these arguments, it is clear that the ultimate value of this new technology will largely depend on the degree to which its performance can be improved, e.g., to the point where false positives occur no more than once per minute.

Figure 1-3. High false positive rates can significantly impact the performance of an SBCI system, even if the TP rates are high. (a) Brain states of a user; (b) The output of the SBCI system.

1.1.2  Presence of artifacts

A second factor that limits the application of SBCI systems is the presence of artifacts. Artifacts are unwanted signals that can degrade the performance of the system. If artifacts occur at the same time as the initiation of an IC command by the user, they may change the shape of a neurological phenomenon and decrease the TP rate. If they occur during NC periods, they can generate false positives and increase the FP rate. Since some artifacts, such as physiological artifacts (e.g., eye blinks), occur frequently, methods should be developed to handle them effectively. Unfortunately, most BCI systems either do not handle artifacts at all or do not handle them efficiently. This is a serious drawback in online applications of BCI systems in general and SBCI systems in particular.

1.1.3  Evaluation metrics

Yet another important factor in the design of SBCI systems is the availability of a suitable "evaluation metric". In synchronized BCI systems, the overall classification accuracy (OA) and the information transfer rate (ITR) are widely used metrics. They are also accepted by the research community as reliable measures for comparing the performances of different synchronized BCI systems. This is not the case for SBCIs. It is very difficult to compare the performances of different SBCI systems.
A wide variety of metrics such as OA [7], HF-difference [8], the mutual information (and ITR) [9], Kappa [10], the area under the receiver operating characteristic (ROC) curve [10], the TP rate at a fixed FP rate [5] and others have been proposed in the literature. However, no consensus yet exists amongst self-paced BCI researchers regarding which metric is more suitable for summarizing the performance, or how a suitable evaluation metric should be chosen for a particular self-paced BCI system [4].

Please note that the neurological phenomena generated as the result of attempted movements by able-bodied individuals are similar to those generated by individuals with motor disabilities, as discussed later in this section. For this reason, in this thesis, data collected from able-bodied individuals are used for the analysis.

Before we address the existing work in the literature related to the above topics, we first provide some background information about the operation of BCI systems. This is done in the next two sections.

1.2  Functional model of a brain computer interface system

Figure 1-4 shows a traditional BCI system in which a person controls a device in an operating environment (e.g., a powered wheelchair in a house) through a series of functional components [11]. In this context, the user's brain activity is used to generate IC commands that operate the BCI system. The user monitors the state of the device to determine the result of his/her control efforts.

Figure 1-4. Functional model of a BCI system. Note the control display is optional.
The building components of a BCI system (shown in Figure 1-4) have the following tasks. The electrodes placed on the head of the user record the brain signal (e.g., electroencephalography (EEG) signals from the scalp, electrocorticography (ECoG) signals from the brain, or neuronal activity recorded using microelectrodes implanted in the brain). The "artifact processor" block deals with artifacts in the EEG signals after the signals have been amplified. This block can either remove artifacts from the EEG signals or simply mark some EEG epochs as artifact-contaminated. The "feature generator" block transforms the resultant signals into feature values that relate to the underlying neurological phenomena employed by the user for control. For example, if the user is using the power of his/her Mu (8-12 Hz) rhythm for the purpose of control, the feature generator could continually generate features relating to the power-spectral estimates of the user's Mu rhythms. The feature generator generally consists of three components: the "signal enhancement", the "feature extraction", and the "feature selection" components, as shown in Figure 1-4.

In some BCI designs, "signal enhancement" or some form of "pre-processing" is performed to increase the signal-to-noise ratio of the brain signal(s) prior to extracting the features. To reduce the dimensionality of the problem, it is desirable to reduce the number of features and/or the number of EEG channels. "Feature selection" could be performed after or at the feature extraction stage to reduce the number of features and/or EEG channels used. Ideally, the features that are meaningful or useful in the classification stage are identified and chosen, while others are omitted.

The "feature translator" block translates the features into logical control signals, e.g., 0 and 1, where 0 denotes NC and 1 denotes IC.
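A minimal sketch of the feature generator and feature translator described above, assuming a single EEG channel, a hypothetical 128 Hz sampling rate and a one-second analysis window. The Mu-band power is estimated with a plain discrete Fourier transform, and the translator is a fixed threshold standing in for a trained classifier. The rule that a drop in Mu power signals IC, the threshold value and all function names are assumptions made for illustration, not the design used in this thesis.

```python
import cmath
import math


def band_power(samples, sampling_rate_hz, low_hz, high_hz):
    """One-sided power of `samples` in the band [low_hz, high_hz],
    estimated from a direct DFT (fine for short windows)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the DC offset
    power = 0.0
    # Positive-frequency bins only, excluding DC and the Nyquist bin.
    for k in range(1, (n + 1) // 2):
        freq = k * sampling_rate_hz / n
        if low_hz <= freq <= high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(centered))
            # Factor 2 folds in the matching negative-frequency bin.
            power += 2.0 * abs(coeff) ** 2 / n ** 2
    return power


def translate(mu_power, threshold):
    """Hypothetical feature translator: 1 (IC) when Mu power falls below
    the threshold, 0 (NC) otherwise."""
    return 1 if mu_power < threshold else 0


def sine(amplitude, freq_hz, sampling_rate_hz, n):
    """Synthetic test signal standing in for one EEG window."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sampling_rate_hz)
            for i in range(n)]


FS = 128  # assumed sampling rate in Hz
rest_window = sine(2.0, 10.0, FS, FS)      # strong 10 Hz Mu rhythm
movement_window = sine(0.5, 10.0, FS, FS)  # attenuated Mu rhythm

rest_output = translate(band_power(rest_window, FS, 8, 12), threshold=1.0)
movement_output = translate(band_power(movement_window, FS, 8, 12), threshold=1.0)
```

In a running system the same two steps would be applied to a sliding window, so that one logical output is produced per decision interval.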
The translation algorithm uses linear classification methods (e.g., linear discriminant analysis) or nonlinear ones (e.g., neural networks). As shown in Figure 1-4, a feature translator may consist of two components: "feature classification" and "post-processing". The main aim of the feature classification component is to classify the features into logical control signals. Post-processing methods such as a moving average may be used after feature classification to reduce the number of activations of the system.

The control interface translates the logical control signals from the feature translator into semantic control signals that are appropriate for the particular type of device used. Finally, the device controller translates the semantic control signals into physical control signals that are used by the device. For more detail refer to [3].

In the next section, we provide a brief review of some of the work done in the literature.

1.3  Background

Since the introduction of the concept of BCI control in the early 1970s (e.g., [12]), many BCI systems have been developed. Despite these efforts, many design issues remain under debate. In this section, we briefly review these design issues.

1.3.1  Signal recording

Activity in a normal human brain can generate various responses, including electrical, magnetic, and metabolic responses. These signals can be detected by appropriate sensors and used for controlling a BCI system. For example, brain activity produces magnetic fields that can be recorded using magnetoencephalography (MEG). Brain activity also has metabolic consequences in the form of changes in blood flow and metabolism. Imaging methods such as functional magnetic resonance imaging (fMRI) can image these activities.
At present, because of the cost and physical dimensions of the recording equipment, methods that measure the electrical activity of the brain are favored [1].

There are various ways to record the electrical activity of the brain. Non-invasive BCI approaches mostly use EEG signals as the source of information. EEG signals are recorded by means of electrodes placed on the scalp. Invasive approaches, on the other hand, use electrocorticography (ECoG) signals recorded from the surface of the brain, or action potentials of single neurons in the cerebral cortex recorded using implanted microelectrodes.

EEG signals have good temporal resolution, but their spatial resolution is poor compared to other recording technologies [1]. A recent study showed that only 12% of published BCI studies use implanted electrodes, 5% use microelectrode arrays, and more than 80% use EEG signals [3]. The main reason is that EEG recording equipment is commercially produced and its cost is lower than that of other brain signal recording technologies. Also, since no surgery is necessary for placing the electrodes, more individuals are willing to participate in such BCI experiments.

1.3.2  Choice of neurological phenomenon

Neurological phenomena are specific features of the brain activity that appear in the brain signals and can be used to control a BCI system. They are time-locked to a physical stimulus or to the cognitive responses of the brain. Neurological phenomena are characterized by their voltage amplitude, their latency relative to the internal or external stimuli, and their spatiotemporal distribution. Their amplitude is usually much smaller than that of the background EEG signal.
The more common neurological phenomena in BCI systems are:

Changes in the brain rhythms (CBR) such as the Mu and Beta rhythms related to a movement: Mu (8-12 Hz) and Beta (18-30 Hz) rhythms are frequency bands in the brain signals that are known to be suitable neurological phenomena for controlling BCI systems. The reason is that they are closely associated with those cortical areas directly connected to the normal motor channels of the brain. Voluntary movement results in a circumscribed desynchronization in the Mu and Beta bands that is localized close to the sensorimotor areas [13, 14]. This desynchronization, termed "event-related desynchronization (ERD)", starts about two seconds prior to the onset of movement [15]. The enhanced rhythmic activity following the movement is called "event-related synchronization (ERS)". The post-movement Beta ERS is found in the first second after the termination of a voluntary movement, when the Mu rhythm might still display a desynchronized pattern [15]. The Beta ERS is a relatively robust phenomenon and is found in nearly all users after a finger, hand, arm or foot movement (see Figure 1-5 (a) for an example of the Beta ERS) [16].

Many research groups have developed BCI systems using features extracted from the Mu and Beta rhythms. However, the works of two research groups are more prominent. Wolpaw, McFarland and their associates at the Wadsworth Center have focused on developing such a CBR-based synchronized BCI system. Their proposed BCI system allows users to control the amplitude of the Mu and Beta rhythms. This amplitude is then used to move a cursor on a computer screen [17-20]. Users of this system usually need training that may take up to a few weeks, but eventually they can achieve high accuracies (e.g., above 90%) [21].
The other research group, the Graz BCI, uses the ERD and the ERS of the Mu/Beta rhythms in the design of synchronized BCI systems [22-26]. Similar to the first group, after a few sessions of training, the users of the Graz BCI can also achieve high accuracies.

Figure 1-5. Two examples of neurological phenomena. (a) Changes in the power of Beta rhythms over time; (b) A movement-related potential. Vertical line shows the time of activation of the movement. Note that these shapes are generated by averaging over many epochs.

Movement-related potentials (MRPs): Averaging the EEG data with respect to movement onset results in the generation of slow potentials called "movement-related potentials" (MRPs) [27]. MRPs start about 1.5 to 1 seconds before the onset of a particular movement and have a bilateral distribution (see Figure 1-5 (b) for an example of an MRP) [27-31]. High-resolution EEG studies have modeled the main sources of MRPs as arising in the supplementary motor area and the primary sensorimotor cortex [32, 33]. MRPs have been used as the neurological phenomenon in several BCI studies. These studies include the work carried out by Mason and Birch's research group [5, 7, 34-36], Muller and Blankertz et al. [37, 38], as well as Yom-Tov and Inbar [6, 39, 40].

Other movement-related activities (OMRAs): We categorize the movement-related activities that do not belong to any of the preceding categories as OMRAs. OMRAs are usually not restricted to a particular frequency band or scalp location and usually cover different frequency ranges. They may be a combination of specific and non-specific neurological phenomena. Levine and Huggins' research group is amongst the prominent research groups that have used OMRAs related to different movements to design their ECoG-based BCI system.
They recorded ECoG activity from patients with 16-126 subdural electrodes prior to epilepsy surgery. They have used topographically focused potentials associated with different movements to develop various 2-state self-paced BCI designs [8, 41, 42].

Slow cortical potentials (SCPs): SCPs are slow, usually non-movement-related potential changes generated by the user. They reflect changes in the cortical polarization of the EEG, lasting from 300 ms up to several seconds [2, 43]. Birbaumer and his colleagues have developed a BCI system called the "Thought Translation Device (TTD)" that uses an SCP as the source of control [44-47]. They have shown that patients with severe motor disabilities, such as late-stage ALS, can learn to control their SCPs and thus use the TTD to communicate with the outside world.

Cognitive tasks (CTs): Changes in the brain signals as a result of non-movement mental tasks (e.g., mental counting, solving a multiplication problem) are usually categorized as CTs [48]. The works of Millan et al. [49] and Anderson et al. [50] are among the prominent BCI research carried out using cognitive tasks. The work of Millan et al. involves using mental tasks to control a mobile robot, while Anderson et al. have focused on the design of a multi-class BCI system that detects cognitive tasks such as 3D object recognition, mental counting, etc. [51, 52].

P300: When infrequent or particularly significant auditory, visual or somatosensory stimuli are interspersed with frequent or routine stimuli, they evoke a positive peak about 300 ms after the stimulus is received. This peak is called the P300 [48, 53]. Using this so-called "oddball" response, Donchin and his colleagues have built a successful P300-based BCI system [54, 55].
More recently, a number of studies have shown that P300-based control can be used as an alternative communication channel for people with spinal cord injury and ALS [56, 57]. Also, for individuals with visual impairments, solutions based on auditory or tactile stimuli have been proposed [58, 59].

Visual evoked potentials (VEPs): VEPs are small changes in the brain signal, generated in response to a visual stimulus such as flashing lights. Their characteristics depend on the type of the visual stimulus [48]. Many BCI systems use VEPs as the source of control, including those of Vidal [60], Sutter [61] and Middendorf [62].

Steady-state visual evoked potentials (SSVEPs): If a visual stimulus is presented repetitively at a rate of 5-6 Hz or greater, a continuous oscillatory electrical response is elicited in the visual pathways. Such a response is termed an SSVEP. The distinction between a VEP and an SSVEP depends on the repetition rate of the stimulation [63]. The work carried out by Gao's research group is notable in this area [63-66].

Multiple neurological phenomena (MNs): BCI systems based on multiple neurological phenomena use a combination of two or more of the above neurological phenomena for the purpose of control. We will review this category of BCI systems in more detail later in this chapter.

Activity of neural cells (ANC): Some BCI research groups have used microelectrode arrays to record the activity of single neurons in the motor cortex for the purpose of BCI control [67-73]. These BCI systems are usually based on reconstructing a movement from recorded spike trains. Experiments with monkeys have shown a relatively good ability to control movement in multiple directions with these systems [74].
Recently, there have been reports of a patient learning to use his neuronal activity to move a computer cursor in several directions using ANCs [73]. These encouraging results provide hope for BCI control with multiple options and high accuracy. The downside is the invasive nature of the microelectrode implants, which may result in infection and side effects in the brain.

The above neurological phenomena can be categorized into two groups based on the origin of the phenomenon in the brain. Those generated as the result of cognitive responses of the brain are called endogenous. Those evoked by an external stimulus are called exogenous.

BCI systems that use exogenous neurological phenomena usually do not need any user training [54]. The downside of these systems is that they require a constant commitment of one of the sensory pathways to an external stimulus [75]. Furthermore, not all users may tolerate repetitive sensory stimulation. On the other hand, endogenous-based BCI systems rely on the generation of a phenomenon that is more natural and is thus expected to cause the users less fatigue. This may be the reason why more than 80% of BCI studies use endogenous neurological phenomena to control BCI systems [3].

To generate a suitable neurological phenomenon, endogenous-based BCI systems usually need user training. This training may take a long time, sometimes up to a few months. The use of complex signal processing schemes for detecting weak neurological phenomena can greatly reduce or even eliminate the training process [76]. Another advantage of employing endogenous neurological phenomena is that it is possible to select and use a combination of some of them to improve the performance of the system.

1.3.3  Timing of BCI control

So far, most BCI researchers have focused their attention on "synchronized" control applications.
In synchronized applications, a user can initiate a command only during specific times specified by the system (see Figure 1-6 (a)). In these systems, the users are required to generate an intentional control (IC) command during the periods allowed by the BCI system. In the example shown in Figure 1-6 (a), the user should generate one of the IC1 or IC2 commands during the control period (the control period is shown as a "box").

In contrast, in a self-paced BCI system, a user does not need to be constantly engaged in initiating the control command. In these systems, the users only consciously control their state when they desire to control the device (see Figure 1-6 (b)) [35]. In the example shown in Figure 1-6 (b), the user is in the no control (NC) state at all times, except for those periods when he/she initiates an IC command. In the latter case, the system will be in an IC state. During NC periods, the user can be idle, thinking about a problem or performing any action other than attempting to control the device. This property of SBCI systems, which allows them to support the presence of NC periods, is called "NC support". Whenever a BCI system involves control actions with periods of inaction, it needs to have NC support.

Figure 1-6. Synchronized vs. self-paced control. (a) In a synchronized BCI system, control can be done only in certain intervals specified by the system; (b) In a self-paced BCI system, the control is done at the user's own pace.

Synchronized BCI systems usually require the user to initiate an IC command during the control periods. In other words, during the control periods, the users are expected to be engaged with controlling the device. For this reason, these systems usually do not support NC periods. In some cases, the output of the system might even become unstable if an IC command is not issued.
In the next four sections, we briefly describe the literature related to this thesis. First, in Section 1.4, the previous work pertaining to the design of SBCI systems is discussed. In Section 1.5, we address the use of more than one neurological phenomenon as the control source. Then in Section 1.6, we address the previous work on handling artifacts in the design of SBCI systems. Finally, in Section 1.7, we discuss the previous research related to evaluating the performance of SBCI systems.

1.4  Design of self-paced BCI systems

Self-paced BCI systems provide the user with more freedom and more control flexibility. From the signal processing point of view, they are much more challenging to design than synchronized BCI systems. The main reason is that there are many types of NC states (e.g., idle, different mental tasks, etc.). As a result, an SBCI system should be able to handle various types of NC signals once it is turned on. For this reason, only a few BCI groups have pursued the design of self-paced BCI systems [7, 36, 40, 77, 78].

The concept of self-paced control originated in the early 1990s with the development of the outlier processing method (OPM), which aimed at detecting movement-related potentials (MRPs) in the EEG signals [36]. The results from this work were promising, as true positive (TP) rates greater than 90% were achieved on a thumb movement task. However, its poor performance over NC epochs (FP rates ranging from 10% to 30%) restricted its use as a BCI system.

To overcome the vulnerabilities of the OPM, another SBCI system called the low-frequency asynchronous switch design (LF-ASD) was proposed in 2000 by Mason and Birch [35]. Similar to the OPM, the LF-ASD is also designed to detect MRPs in the EEG signals.
It uses features extracted from the 0.1-4 Hz band in six bipolar EEG channels recorded from F1-FC1, Fz-FCz, F2-FC2, FC1-C1, FCz-Cz and FC2-C2 on the scalp, sampled at 128 Hz. A detector that was a simplified version of the discrete wavelet transform was applied as the feature extractor, and a 1-nearest neighbor (1-NN) classifier was used as the feature classifier. In an analysis of the EEG signals of five individuals, the features related to MRP (or IC) periods showed a definite difference from those in NC periods [35]. During the past few years, several changes have been applied to the structure of the LF-ASD to improve its performance. These changes include the addition of an energy normalization transform [79], the addition of a debounce window as a post-processing component to decrease the FP rate [5], user-customization of the feature extractor's parameter values [80], and the addition of knowledge of the past paths of features [81].

Despite these improvements, the performance of the LF-ASD is still not suitable for many practical applications. The most recent design of the LF-ASD achieves an average TP rate of 54.0% at an FP rate of 1% [82]. Since the LF-ASD generates an output every 1/16th of a second, this translates into, on average, one false positive every 6.25 seconds, while only about half of the IC commands are detected. For most practical applications, such a high FP rate may result in excessive user frustration.

Another SBCI design, which improves upon the feature extractor of the LF-ASD, was proposed by Yom-tov et al. [6]. The proposed method combines the LF-ASD feature extractor with a matched filter, resulting in a hybrid detector. This method also results in a high FP rate. For FP rates below 2%, the TP rates are lower than 30%. This system generates an output every 1/25th of a second.
An FP rate of 2% therefore translates into one false positive every two seconds. As a result, the large number of FPs limits the application of the proposed design.

While the above studies are based on features extracted from EEG signals, researchers from the University of Michigan have focused on extracting features from ECoG signals [8, 41, 77, 83-85]. To detect IC commands, their designs use either the cross-correlation with a template [83] or the energy of the wavelet packet transform [8]. In these studies, a threshold-based classifier is used for classifying the features. While these systems usually achieve TP rates above 50%, their performance on NC epochs is not very clear. First, none of these studies has determined the number of NC epochs. Moreover, to quantify the false positives, a new metric called the false discovery rate (FDR, i.e., the percentage of total activations of the switch that were false) was used [86]. Since the number and the length of NC epochs are not determined in these studies, it is impossible to calculate the FP rate for these systems. In a recent study by this group, the reported FDRs were in the range of 0% to 82%, with 24 out of the 31 reported FDRs being higher than 10% [8]. However, since the numbers of IC and NC epochs were not specifically determined, no comment can be made on the performance of these systems over NC data.

Table 1-1 compares the TP rates (TPR) and FP rates (FPR) achieved in selected SBCI studies. Please keep in mind that although a direct comparison is not possible, this table roughly shows the performances of some of the existing SBCI systems. The rows of this table show the different SBCI studies. The columns show the rate at which the system generates an output, the TPR, and the FPR, respectively. As shown, with the exception of the first study, which uses ECoG signals, these studies have low TPRs for FPRs of 2% or less.
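The conversions quoted above (an FP rate of 1% at 16 outputs per second, and an FP rate of 2% at 25 outputs per second) can be reproduced with a short calculation. The sketch below also shows how the FDR differs from the FP rate: the FDR depends only on the counts of false and true activations, not on the amount of NC data. The function names and the activation counts in the last line are hypothetical, and the interval calculation assumes that the user remains in the NC state throughout.

```python
def seconds_between_false_positives(outputs_per_second, fp_rate):
    """Average time between false positives during NC periods.

    fp_rate is the fraction of NC outputs that are false activations.
    """
    return 1.0 / (outputs_per_second * fp_rate)


def false_discovery_rate(false_activations, true_activations):
    """Fraction of all switch activations that were false."""
    total = false_activations + true_activations
    if total == 0:
        raise ValueError("FDR is undefined when there are no activations")
    return false_activations / total


# Figures quoted in the text.
lf_asd_interval = seconds_between_false_positives(16, 0.01)   # about 6.25 s
hybrid_interval = seconds_between_false_positives(25, 0.02)   # about 2 s

# Hypothetical counts, for illustration only.
example_fdr = false_discovery_rate(false_activations=1, true_activations=9)

print(lf_asd_interval, hybrid_interval, example_fdr)
```

Note that the same FDR can arise from very different amounts of NC data, which is why an FP rate cannot be recovered from an FDR unless the number and length of the NC epochs are known.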
Please note that, given the rates at which these SBCI systems generate outputs, these FP rates translate into one false positive every few seconds on average.

Table 1-1. Comparison of the TPR and FPR achieved in different SBCI studies.

Study                            Output rate (Hz)   TPR (%)      FPR (%)
Graimann et al. [8]              100                Up to 100    -
Mason and Birch [35], LF-ASD     16                 <20          2
Mason and Birch [35], OPM                           <10
Mason and Birch [35], Mu-ASD*                       <10
Yom-tov and Inbar [6]            25                 30           2
Townsend et al. [87]             -                  <20          2
Bashashati et al. [82]           16                 54.0         1

* Mu-ASD is a self-paced BCI system that uses Mu rhythms as the neurological phenomenon.

A review of self-paced endogenous BCI studies shows that, with the exception of one paper [8] (which will be discussed in more detail later in this chapter), all the proposed designs have relied on a single neurological phenomenon. In the next section, we bring evidence from the literature that supports the advantage of using the following three neurological phenomena (instead of only one) in a self-paced BCI system: MRPs and changes in the power of the Mu and Beta rhythms.

1.5  Use of multiple neurological phenomena in BCI systems

1.5.1  Simultaneous application of MRPs and changes in the power of Mu/Beta rhythms

A number of papers provide some evidence that MRPs and changes in the power of brain rhythms (usually characterized as the event-related desynchronization (ERD) and event-related synchronization (ERS)) provide complementary information for exploring the cognitive functions of the brain. In [88], the analysis of subdural EEG recordings from primary sensorimotor areas in epileptic patients showed that the amplitude of the ERD of the Alpha rhythm was not always correlated with the corresponding MRPs. It is suggested in the same paper that these neurological phenomena represent different aspects of cortical motor processes.
In [89], the ERD of the Alpha rhythm is not always detected in cortical sites generating MRPs. In [31], a high-resolution EEG study shows that MRPs and the ERD of the Mu rhythm provide complementary information on human brain responses accompanying the preparation and execution of a finger movement. Further evidence from the analysis of EEG signals [90, 91] and magnetoencephalography (MEG) [92-94] strengthens these findings.

There is also some evidence regarding the differences between the Mu and the Beta rhythms. Several papers show that the reactivities of the Mu and Beta rhythms related to the movement onset are different [95, 96]. Both the Mu and Beta rhythms desynchronize before the occurrence of a voluntary self-paced movement. However, after the movement, the ERD of the Mu rhythm is usually followed by a slow return to baseline (and sometimes by a slight synchronization), while the Beta rhythms synchronize rapidly after the movement onset [96].

This evidence from the literature shows that MRPs and the Mu and Beta rhythms provide complementary information that can be used for improving the performance of BCI systems. In the next sub-section, we review the simultaneous use of these phenomena in the BCI literature.

1.5.2  Using multiple neurological phenomena in BCI systems

The main advantage of using more than one neurological phenomenon at the same time is that more information is available for the BCI system to detect an IC command related to a particular movement. The downside is that as the size of the input data increases, the complexity of the pattern recognition algorithm increases as well.

Although most BCI researchers use a single neurological phenomenon as the source of control, there have been reports of using multiple neurological phenomena in BCI systems [8, 76, 92, 97-99].
In [92], the authors analyzed different combinations of 1) features extracted from an early component of the MRP called the Bereitschaftspotential (BP), 2) features extracted from the ERD of neurological phenomena above 4 Hz (through autoregressive modeling) and 3) common spatial pattern (CSP) features related to the ERD of the Mu rhythms. The BCI system had to discriminate between left and right index finger movements. A linear discriminant analysis (LDA) classifier was used for classification. Different combination schemes were explored. The study showed that a certain combination of classifiers could result in a lower error rate than the case where a single classifier is used. The results of combining the ERD of the Mu rhythm and the BP were not reported, although the authors mention that those results were slightly worse than the results obtained when all three neurological phenomena were used in the design of the BCI system.

In [97], the authors applied a combination of microstate analysis and common spatial subspace decomposition to extract features belonging to three different frequency bands: Theta + Delta, Mu and Beta. MRPs were not treated as a separate neurological phenomenon. Instead, features were extracted from the frequency band covering both the Delta and Theta rhythms. These features were then used to discriminate between left and right hand movements. Using the data of three participants, the proposed method achieved an average accuracy higher than 80%.

In [100], the authors used the BP and the ERD of the brain rhythms in the 10 to 33 Hz frequency band (including both the Mu and Beta rhythms) to classify left vs. right finger movement.
The features extracted from all neurological phenomena and from all EEG channels were then combined, the dimension of the feature vector was reduced and the final vector was classified using a perceptron neural network. The results showed a classification accuracy of 84% on the test set. In [101], the authors used features extracted from the BP and the ERD of the Mu rhythms for classifying left and right index finger movements. It was shown that combining features decreased the classification error for four out of the five subjects whose data were studied.

The above studies all pertain to synchronized BCI systems. Only one SBCI system that uses multiple neurological phenomena has been reported so far [8]. In this study, the authors combined a number of neurological phenomena in order to design an ECoG-based SBCI system. Using a wavelet packet transform, ECoG signals were divided into 18 different frequency bands covering the range from 0 to 100 Hz. This range covered a wide range of neurological phenomena including the Mu, Beta and Gamma rhythms, as well as other movement-related activities (OMRAs). Then, for each band, wavelet-filtered signals were reconstructed. The wavelet-filtered signals were then squared to obtain power values, and a genetic algorithm was applied to reduce the dimension of the feature space to one. Using a thresholding classifier, the test samples were classified as movement or no movement. As mentioned earlier in Section 1.4, the reported false discovery rates of this study were in the range of 0% to 82%, with 24 out of the 31 reported FDRs being higher than 10%. This study, however, did not consider MRPs as one of the neurological phenomena. Instead, it solely focused on detecting the power of different frequency bands. Furthermore, it extracted features from ECoG signals instead of EEG signals.
As noted earlier in Section 1.3.1, recording ECoG signals requires surgery and is invasive in nature. For this reason, this method of recording brain signals may not be fully accepted by the research community until the health-related issues are fully investigated.

As we will discuss in Section 1.9, in this thesis we design a new SBCI system that simultaneously uses information extracted from MRPs as well as changes in the power of the Mu and the Beta rhythms. To the best of our knowledge, this is the first time in the BCI literature that such a study is carried out in the context of self-paced BCI systems.

1.6  Artifacts in BCI systems

Artifacts are undesirable potentials that contaminate brain signals and are mostly of non-cerebral origin. Unfortunately, they can modify the shape of the neurological phenomenon that drives a BCI system. They can also mistakenly result in an unintentional control of the device [102]. Therefore, there is a need to avoid, reject or remove artifacts from the recordings of brain signals.

In an SBCI system, artifacts can impact the performance of the system in two ways: 1) by changing the shape of the neurological phenomenon during an IC period, they cause a decrease in the TP rate; 2) by mimicking the shape or properties of the neurological phenomenon during the NC periods, they cause an increase in the FP rate.

Figure 1-7. An example of how artifacts can affect the performance of an SBCI system. (a) The brain state of the user; (b) The periods when artifacts have occurred; (c) The output of the SBCI system (note: FP: false positive, TN: true negative, FN: false negative and TP: true positive).

Figure 1-7 shows how this can happen. Figure 1-7 (a) shows the brain states of a user during a specific time frame. As seen, the user is mostly in an NC state; at two time instants, however, the user initiates an IC command.
Figure 1-7 (b) shows the periods of EEG signals that are contaminated with artifacts. The term "ART" denotes "artifact-contaminated periods" and "NO" refers to the periods not contaminated with artifacts. The second period coincides with the first IC command. Figure 1-7 (c) shows the output of the SBCI system. The occurrence of the first artifact results in a false positive. The second artifact masks the first IC command (a false negative or FN).

Artifacts originate from non-physiological as well as physiological sources. Non-physiological artifacts originate from outside the human body (such as 50/60 Hz power-line noise or changes in electrode impedances), and are usually avoided by proper filtering, shielding, etc.

Physiological artifacts arise from a variety of bodily activities. Electrocardiography (ECG) artifacts are caused by heart beats and may introduce a rhythmic activity into the EEG signal. Respiration can also cause artifacts by introducing a rhythmic activity that is synchronized with the body's respiratory movements. Skin responses such as sweating may alter the impedance of electrodes and cause artifacts in the EEG signals [103]. The two physiological artifacts that have been most examined in BCI studies, however, are ocular (electrooculography or EOG) and muscle (electromyography or EMG) artifacts.

EOG artifacts are generally high-amplitude patterns in the brain signal caused by blinking of the eyes, or low-frequency patterns caused by movements (such as rolling) of the eyes [104]. EOG activity has a wide frequency range, being maximal at frequencies below 4 Hz, and is most prominent over the anterior head regions [105].

EMG activity (movement of the head, body, jaw or tongue) can cause large disturbances in the brain signal. EMG activity has a wide frequency range, being maximal at frequencies higher than 30 Hz [104, 105].
Difficult tasks may cause an increase in EMG activity related to the movement of facial muscles [106, 107].

Some studies have shown that EOG and EMG activities may generate artifacts that affect the neurological phenomena used in a BCI system [108, 109]. For example, [109] demonstrated that brain rhythms are contaminated with EMG artifacts during the early training sessions of a BCI system that used the Mu and Beta rhythms as sources of control.

Physiological artifacts such as EOG and EMG artifacts are much more challenging to handle than non-physiological ones. Moreover, controlling them during the signal acquisition stage is not easy. There are different ways of handling artifacts in BCI systems. In this section, we briefly examine the reported methods for handling EOG and EMG artifacts, as these are among the most important sources of contamination in BCI systems.

1.6.1  Artifact avoidance

The first step in handling artifacts is to avoid their occurrence by issuing proper instructions to users. For example, users are instructed to avoid blinking or moving their bodies during the experiments.

Instructing individuals to avoid generating artifacts during data collection has the advantage of being the least computationally demanding among the artifact handling methods, since it is assumed that no artifact is present in the signal (or that the presence of artifacts is minimal). However, it has several drawbacks. First, since many physiological signals, such as heart beats, are involuntary, artifacts will always be present in brain signals. Even in the case of EOG and EMG activities, it is not easy to control eye and other movement activities during the process of data recording. Second, the occurrence of ocular and muscle activity during the online operation of any BCI system is not avoidable.
Third, collecting a sufficient amount of data without artifacts may be difficult, especially in cases where a user has a neurological disability [110]. Finally, avoiding artifacts may introduce an additional cognitive task for the individual. For example, it has been shown that refraining from eye blinking results in changes in the amplitude of some evoked potentials [111, 112].

1.6.2  Artifact rejection

Artifact rejection refers to the process of rejecting the trials affected by artifacts. It is perhaps the simplest way of dealing with brain signals contaminated with artifacts. It has some important advantages over the "artifact avoidance" approach. For example, it is easier for individuals to participate in the experiments and perform the required tasks, especially those individuals with motor disabilities. Also, the "secondary" cognitive task, resulting from an individual trying to avoid generating a particular artifact, will not be present in the EEG signal.

"Artifact rejection" is usually done by visually inspecting the EEG or the artifact signals, or by using an automatic detection method [113].

Manual rejection

Manual rejection of epochs contaminated with artifacts is a common practice in the BCI field. Trials are visually checked by an expert, and those that are contaminated with artifacts are removed from the analysis.

Similar to "artifact avoidance", manual rejection also has the advantage of not being computationally demanding, as it is assumed that a human expert has identified all the artifact-contaminated epochs and removed them from the analysis. On the other hand, there are many disadvantages in using "manual rejection". First, "manual rejection" comes at the cost of intensive human labor, especially if the study involves a large number of individuals or a large amount of recorded data. Second, the process of selecting the artifact-free trials may become subjective.
It has been argued that, because of the selection bias, the sample trials that are artifact-free may not be representative of the entire population of the trials [113]. Third, in the case of offline analysis, the rejection of artifact-contaminated trials may lead to a substantial loss of data. This may become a major drawback, especially in the case of individuals with motor disabilities, where offline data recording is not as convenient as it is for able-bodied individuals.

Automatic rejection

In "automatic rejection", the BCI system automatically discards the epochs of brain signals that are contaminated with particular artifacts. This procedure is commonly carried out in offline investigations. Automatic rejection of epochs can be done in the following two ways:

Rejection using the EOG (EMG) signal: When one of the characteristics of the EOG (EMG) signal in an epoch exceeds a pre-determined threshold, the epoch is considered artifact-contaminated and is automatically rejected.

Rejection using the EEG signal: This rejection methodology is similar to the above, except that the EEG signal is used instead of the EOG (EMG) signal. This approach has the advantage of being independent of the EOG (EMG) signal, and is useful if the EOG (EMG) signal is not recorded during data collection.

An advantage of the "automatic rejection" approach over "manual rejection" is that it is less labor intensive. However, automatic rejection still suffers from the loss of valuable data [114, 115]. In the case of EOG artifacts, the automatic rejection approach also does not allow the rejection of contaminated trials when the EOG amplitude is small [116, 117].
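The threshold-based rejection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from the cited studies: the signal characteristic (peak-to-peak amplitude), the threshold value, the function name and the toy data are all chosen here for illustration. The same function could be applied to EOG, EMG or EEG epochs.

```python
import numpy as np


def keep_mask(epochs, threshold):
    """Return a boolean mask that is True for epochs to keep.

    epochs:    array of shape (n_epochs, n_samples) holding the monitored
               signal (e.g., EOG) for each epoch.
    threshold: maximum allowed peak-to-peak amplitude (same units as epochs).
    """
    peak_to_peak = epochs.max(axis=1) - epochs.min(axis=1)
    return peak_to_peak <= threshold


# Toy data (illustrative values only): the second epoch contains a
# large deflection of the kind an eye blink might produce.
eog_epochs = np.array([
    [1.0, -2.0, 3.0, 0.5],
    [10.0, -120.0, 90.0, 5.0],
    [0.0, 4.0, -4.0, 2.0],
])

keep = keep_mask(eog_epochs, threshold=100.0)
clean_epochs = eog_epochs[keep]
print(keep, clean_epochs.shape)
```

The sketch also makes the two limitations noted above visible: rejected epochs are simply lost, and a contaminated epoch whose amplitude stays under the threshold passes through unnoticed.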
Two issues need to be addressed for BCI systems that reject artifacts. First, because of the vast number of artifacts that exist in BCI systems (eye blinking, eye movements, movements of different parts of the body, breathing, etc.), not all the artifact-contaminated trials can be rejected. Usually only the epochs with a strong presence of artifacts are excluded from the analysis. Therefore, the so-called "clean" data are unfortunately not completely free of artifacts.

The second issue is that, while the rejection of artifact-contaminated data during an offline analysis may generate "cleaner" data, it may pose a major drawback for online real-time applications of a BCI system. In online applications, artifacts are unavoidable. If artifacts are rejected during the offline analysis, the same rejection mechanism can be used to reject them during the online analysis. The only problem is that during the specific time periods when artifact-contaminated signals are rejected, the system is unavailable and cannot be used for controlling the device.

1.6.3  Artifact removal

Artifact removal is the process of identifying and removing artifacts from brain signals. An artifact-removal method should be able to remove the artifacts while keeping the related neurological phenomenon intact. Common methods for removing artifacts from EEG signals are linear filtering [118, 119], linear combination and regression [116], blind source separation [120], principal component analysis [121], the wavelet transform [122], nonlinear adaptive filtering [123] and source dipole analysis (SDA) [124].

A survey of all BCI studies published before January 2006 shows that most BCI papers do not report whether or not they have considered EMG and/or EOG artifacts in their analysis.
This is an important issue, since offline analysis methods that do not account for physiological artifacts are likely to face problems when tested in an online study. It is therefore important that BCI researchers pay more attention to this issue and report the method that they have employed for handling artifacts.

A number of BCI studies state that EMG activity will not be present in the EEG signal when the EEG signal is analyzed before a movement has occurred [125]. This argument may not be valid for BCI systems, because peripheral changes such as EMG tension can affect the EEG signal, even though the amount by which the EEG signal is affected remains unclear [126]. It is pointed out in [126] that even when the movements of individuals are very restricted, they still preserve motor control over some muscle groups. Although the activities of several muscle groups are monitored in BCI studies, there remain some muscles whose activities are not recorded.

BCI systems that employ "manual rejection" of EOG and EMG artifacts should also consider the fact that "manual rejection" is only a preliminary step in the design of a BCI system, since it can only be used for offline analysis. For a particular BCI system to work in an online fashion, a scheme for handling artifacts should be incorporated. Requesting that individuals avoid generating artifacts should be considered only a temporary solution. In a practical application, EMG and EOG artifacts do occur, so methods of handling these artifacts during an online experiment should be investigated.

One solution for handling artifacts, which has not been explored well in BCI studies, is to design a BCI that is robust in the presence of artifacts. If such a BCI is designed, then the need for a separate method of handling artifacts will be minimized.
Another solution that has not been explored well in the BCI literature is the use of more than one neurological phenomenon, which may increase the robustness to the occurrence of artifacts [76]. Since EOG artifacts mostly affect the low-frequency components of the EEG signals, BCI systems that use low-frequency ERPs, such as the MRP and the SCP, are mostly affected by EOG artifacts. EMG artifacts, on the other hand, mainly affect the high-frequency components of the EEG signals; hence, BCI systems that use high-frequency phenomena, such as the Mu and Beta rhythms, are mostly affected by EMG artifacts. Thus, a BCI system that uses multiple neurological phenomena whose frequency content spans both the low and the high frequency bands may become more robust to the presence of artifacts.

1.7  Evaluating the performance of SBCI systems

Model selection is the process of finding or adjusting the model parameters for a classification problem. For BCI systems, model selection is a crucial part of the design. This process may include selecting the features, the type of the feature extractor, the classifier, the EEG channels, the neurological phenomenon, the frequency band of interest, the values of the classifier's parameters and the preprocessing and post-processing components. As an example, to find the optimal set of features for a certain BCI, different sets of features are considered. For every set, the performance of the system is calculated, and the resulting performances are compared. The set of features that yields the best performance is then selected. The performance of this best model can then be compared with those achieved by similar BCI systems (i.e., systems with the same experimental as well as evaluation protocols).
Therefore, the performance of an SBCI must be evaluated in the following two cases: 1) during the model selection procedure and 2) when comparing the performance with that of other systems.

The performance of a BCI with discrete states is usually summarized by a confusion matrix. The (i,j) entry of this matrix represents the number of samples from class i that are classified as belonging to class j. A confusion matrix provides valuable information regarding how well each class is classified by the BCI system. It is, however, not usually straightforward to compare different confusion matrices. Evaluation metrics are thus needed to summarize a confusion matrix into a single value. For classification problems with balanced datasets, such as synchronized BCI systems (where $P(\mathrm{Class}_1) \approx P(\mathrm{Class}_2) \approx \dots \approx P(\mathrm{Class}_N)$ for an N-class problem), the overall classification accuracy (OA) is the most common evaluation metric presently used to summarize the performance [10]. The use of OA for problems with highly imbalanced classes (e.g., $P(\mathrm{Class}_1) \gg P(\mathrm{Class}_2)$ for a two-class problem) is not satisfactory [127].

The choice of the evaluation metric is of great importance and is application-dependent. A poorly defined evaluation metric may guide the model selection procedure to a far-from-optimal model, or it can lead to erroneous conclusions when comparing the performances of two SBCI systems. As a result, all the effort spent in the design of a sophisticated SBCI may be lost, simply because of a poor choice of the evaluation metric. Recently, the choice of OA as the default evaluation metric has been questioned, even in classification applications with balanced datasets. Specifically, it was shown that for many applications, the area under the receiver operating characteristic curve (AUC) can summarize the performance better than OA [128].
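A small numerical illustration may help show why OA is unsatisfactory for highly imbalanced classes. The sketch below uses the (i,j) convention given above (rows are true classes, columns are assigned classes); the counts are hypothetical, and the helper names are introduced here only for illustration.

```python
def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_rates(confusion):
    """For each true class i, the fraction of its samples assigned to class i."""
    return [confusion[i][i] / sum(confusion[i]) for i in range(len(confusion))]


# Hypothetical 2-class case: index 0 = NC (990 samples), index 1 = IC (10 samples).
# This "system" never activates: every sample is labelled NC.
always_nc = [[990, 0],
             [10, 0]]

oa = overall_accuracy(always_nc)      # dominated by the NC class
rates = per_class_rates(always_nc)    # reveals that no IC sample is detected
```

Here OA is 0.99 although not a single IC command is detected, which is why a per-class view, or a metric that accounts for class imbalance, is needed.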
Although OA is not suitable for classification problems with imbalanced classes, the choice of an alternative evaluation metric is not obvious. Several attempts have been made to define more suitable evaluation metrics for these problems. Examples include the weighted overall accuracy (WOA) [129], receiver operating characteristic (ROC) curves and related measures such as the area under the ROC curve (AUC) [130], and the Kappa coefficient [131]. In the SBCI literature, the evaluation metrics used include overall accuracy [7], the HF-difference [8], mutual information (information transfer rate) [9], Kappa [10], AUC [10], the true positive rate (TPR) at a fixed false positive rate (FPR) [5] and the TPR/FPR ratio [132]. Figure 1-8 shows the proposed evaluation metrics for synchronized and self-paced BCI systems. As seen, the number of proposed evaluation metrics is significantly higher for self-paced BCI systems than for synchronized BCI systems.

Figure 1-8. Types of evaluation metrics used in synchronized and self-paced BCI systems.

OA is the proportion of test samples that are correctly classified by an SBCI system. It has been frequently used in evaluating synchronized BCI systems [133-135]. Its use in SBCIs, however, has so far been limited [7]. This is because, for an SBCI system, OA assigns a large weight to the more frequent class (NC) and a very small weight to the less frequent class (IC), which may lead to misleading conclusions about the performance of the system.

The information transfer rate (ITR) has been specifically proposed for evaluating the performance of synchronized BCI systems [136]. This metric is based on the similarities between an SBCI and a communication channel, and on Shannon's communication theory. The rationale is that ITR measures the amount of information transferred between two reference points.
The output Y of an SBCI is the interpretation (information) of the current state of the brain, and Y conveys this information to the downstream components. It is thus argued in [136] that the amount of information in Y is a useful tool for comparing the results obtained from different synchronized BCI designs. It is also argued that ITR by itself is not a suitable single evaluation metric for an SBCI system, because this metric can have more than one maximum (see [137] for a detailed discussion).

Cohen's Kappa coefficient is a measure of agreement between two estimators [138]. Since it accounts for chance agreements, it is regarded as a more robust measure than OA [10].

The HF-difference is a newly proposed metric that summarizes the confusion matrix [85]. It is defined as the difference between the TP rate and the percentage of total activations that are incorrect (the false discovery rate (FDR) [86]). The advantage of using the HF-difference is that it is sensitive to the ratio of FPs to the total number of detections. Its downside is that it does not consider the length of NC periods.

The TPR/FPR ratio is another evaluation metric that was recently proposed for 2-class SBCI systems [132, 139]. This metric gives more weight to cases with low FPRs. As a result, during the model tuning process, any model with a high FPR is assigned a low fitness, even though its TPR might be high. The downside is that for FPR = 0, the metric cannot differentiate between confusion matrices with different TPRs.

The receiver operating characteristic (ROC) curve is a popular tool for evaluating systems with imbalanced classes. The ROC curve depicts the relationship between TPR and FPR.
Popular methods that use the ROC curve for measuring the performance employ one of the following two criteria: 1) the area under the ROC curve (AUC) is used as the fitness of the system [10]; 2) a critical FPR value ($FPR_{Critical}$) is defined, and the value of the TP rate at $FPR_{Critical}$ is used as the fitness [5]. The advantage of using the ROC curve over the previous metrics is that a whole range of solutions (in terms of the tradeoff between TPR and FPR) is provided.

One problem with using the ROC curve is that when it is plotted over the whole range of TPR and FPR, most SBCI systems produce a curve that is similar to a perfect ROC curve [4]. The other problem (and perhaps the more important one) is that it is computationally more demanding than other evaluation metrics. Several points need to be evaluated before a partial ROC curve that is accurate enough for estimating the AUC can be drawn. Similarly, several points need to be calculated in order to obtain the value of TPR at $FPR_{Critical}$. Even if the ROC curve is estimated using the more computationally efficient algorithm described in [128], it remains much more time consuming than the metrics described above, as these only need the value of a single point to assess the performance. When ROC-based metrics are used to evaluate the performance and select a model from thousands of confusion matrices during a model selection procedure, the computational burden becomes problematic. For these reasons, evaluation metrics that summarize the performance based on a single evaluation of a confusion matrix are more desirable during the model selection procedure.

Each of these metrics has strengths and weaknesses [10]; however, the published SBCI studies do not usually discuss why a particular evaluation metric is chosen for evaluating the performance.
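The single-point metrics discussed in this section can each be computed from one 2-class confusion matrix. The sketch below follows the verbal definitions given above (Kappa as chance-corrected agreement, HF-difference as TP rate minus FDR, and the TPR/FPR ratio); the function name, the dictionary keys, the example counts and the handling of the degenerate cases are assumptions made for illustration and are not taken from the cited works.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Single-value summaries of a 2-class (IC vs. NC) confusion matrix.

    tp: IC samples detected as IC     fn: IC samples that were missed
    fp: NC samples detected as IC     tn: NC samples correctly left as NC
    Assumes at least one IC sample and one NC sample are present.
    """
    n = tp + fn + fp + tn
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    oa = (tp + tn) / n

    # Cohen's Kappa: observed agreement (OA) corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (oa - p_chance) / (1.0 - p_chance) if p_chance < 1.0 else 0.0

    # HF-difference: TP rate minus the false discovery rate (FDR).
    # Assumption: FDR is taken as 0 when the system never activates.
    detections = tp + fp
    fdr = fp / detections if detections else 0.0
    hf_difference = tpr - fdr

    # TPR/FPR ratio: unbounded at FPR = 0, whatever the TPR is.
    ratio = tpr / fpr if fpr > 0 else math.inf

    return {"OA": oa, "TPR": tpr, "FPR": fpr, "Kappa": kappa,
            "HF": hf_difference, "TPR/FPR": ratio}


# Hypothetical counts: 10 IC samples and 990 NC samples.
m = binary_metrics(tp=6, fn=4, fp=20, tn=970)

# Two systems with FPR = 0 but very different TPRs.
weak = binary_metrics(tp=1, fn=9, fp=0, tn=990)
strong = binary_metrics(tp=9, fn=1, fp=0, tn=990)
```

For the first set of counts, OA is high because the NC class dominates, while Kappa is far lower and the HF-difference is negative because most activations are false. The last two calls illustrate the limitation noted for the TPR/FPR ratio: both systems receive the same unbounded value even though their TPRs differ.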
The lack of such justification indicates that finding suitable evaluation metrics is an important and needed study for SBCI systems. This need has been emphasized in a recently published technical report on evaluating SBCI systems [4].

1.8  Thesis contributions

As discussed above, this thesis addresses three issues of importance to the design of SBCI systems:
1) Decreasing the false positive rates in SBCI systems,
2) Handling artifacts in SBCI systems, and
3) Evaluating the performance of SBCI systems.

To address the high FP rates and the presence of artifacts, in Chapters 4, 5, 7 and 9 we propose and evaluate a new 2-state SBCI system that can distinguish an IC command related to a specific movement pattern from the NC state in EEG signals. In the design of this system, the main focus is to improve the performance over those of previous EEG-based SBCIs. To achieve this goal, we propose and investigate the simultaneous detection of three neurological phenomena to recognize IC commands: movement-related potentials (MRPs), changes in the power of Mu rhythms (CPMR) and changes in the power of Beta rhythms (CPBR). These neurological phenomena are known to be time-locked to the onset of movement, so we postulated that detecting all of them at the same time improves the system's performance. This is the first time in the BCI literature that the simultaneous analysis of these neurological phenomena is proposed for detecting IC commands. A systematic approach for feature extraction, selection and classification for each neurological phenomenon is presented. We also propose a 2-stage multiple classifier system (MCS) to efficiently combine the information extracted from these neurological phenomena. The performance of the proposed system is compared with those of other state-of-the-art EEG-based SBCI systems.
It is shown that the proposed method results in a superior performance. A theoretical analysis of the performance of the proposed SBCI is presented, and it is shown that under certain conditions the proposed methodology can theoretically approach perfect classification accuracy.

Since the proposed SBCI relies on detecting more than one neurological phenomenon at the same time, its performance is expected to be robust in the presence of most artifacts. This is because artifacts are usually more prominent over a certain frequency band and do not affect other frequency bands as much. In Chapter 7, we show that the proposed SBCI performs well over periods contaminated with artifacts.

Finally, in Chapter 8 a framework for comparing and selecting evaluation metrics for SBCI systems is proposed. It is shown that this framework can be successfully applied to select, from a number of available metrics, the evaluation metric that is most suitable for evaluating SBCI systems. The findings of this chapter are applied in Chapter 9 to select the most suitable evaluation metric for evaluating the performance.

The thesis provides a detailed description of our methods and results. The main contributions of this thesis fall into three categories as follows:

1.8.1  Reducing high false positive rates

1) Introducing the idea of using features from MRPs, CPMR and CPBR at the same time to detect the possible presence of IC commands.
2) Developing a new SBCI system that efficiently extracts and classifies features from the above neurological phenomena. A new two-stage multiple classifier system (MCS) is proposed. At the first stage, an MCS is separately designed for each neurological phenomenon. At the second stage, another MCS combines the outputs of the MCSs in the first stage.
3) Studying the performance of the 2-stage MCS using linear programming theory.
4) Investigating the performance of the proposed SBCI system on two datasets: one related to the right index finger flexion and the other related to the right hand extension. It will be shown that the proposed SBCI system achieves error rates that are significantly lower than those of other state-of-the-art EEG-based SBCI systems.
5) Comparing the use of monopolar and bipolar EEG electrodes for detecting right hand extension movements and demonstrating that bipolar electrodes provide superior results.
6) Studying the effect of automatic user-customization on the performance of a state-of-the-art self-paced BCI system previously developed in the Brain Interface Lab of the Neil Squire Society. It will be demonstrated that automatic user customization significantly improves the performance compared to manual customization by an expert.

1.8.2  Addressing artifacts in SBCI systems

1) Presenting a detailed review of the methods that handle artifacts in BCI systems. Surprisingly, this review shows that most BCI systems do not address the presence of artifacts properly.
2) Investigating the performance of the proposed SBCI system over periods contaminated with eye-blink artifacts. It will be shown that the system has a reasonably good performance over periods contaminated with large eye movement artifacts.
3) Investigating the performance of the proposed SBCI system using data from a session recorded a few days after the data used for training the SBCI system. Again, it will be demonstrated that the proposed SBCI system achieves a good performance for three out of four participants whose data are studied.
4) Proposing an artifact monitoring system that detects large eye movement artifacts as well as EMG artifacts related to the frontalis muscles at the same time.
1.8.3  Finding a suitable evaluation metric for SBCI systems

1) Proposing a framework for comparing the evaluation metrics during the model selection process in SBCI systems.
2) Applying the proposed framework to a particular SBCI system and finding the most suitable evaluation metric.
3) Demonstrating that the Kappa coefficient is the most suitable evaluation metric for the proposed SBCI system.

1.9  Organization of the thesis

The organization of this thesis is as follows:

We first study the Low Frequency - Asynchronous Switch Design (LF-ASD) in Chapter 2. The LF-ASD is a state-of-the-art EEG-based SBCI system that is used as the basis for performance comparison in some of the following chapters. It is thus reasonable that, at the first stage of the research, the structure of the LF-ASD as well as its performance are examined in detail. The parameter values of the feature generator of the LF-ASD have usually been determined by the designer based on trial and error. This process is suboptimal, subjective and time-consuming for the researchers. In Chapter 2, we propose the use of a genetic algorithm (GA) to automatically tune the parameter values of the feature generator of one of the designs of the LF-ASD. The purpose of this study is 2-fold: 1) to automate the tuning process of the feature generator and 2) to improve the performance of the LF-ASD. Specifically, we are interested in finding an upper limit for the performance of this design of the LF-ASD. This could be a starting point for the next stage of the research, as it would determine whether the current feature generator should be kept or replaced by a more powerful one. In Chapter 2, we show that only moderate improvements in the performance of the LF-ASD occur after automatically tuning the parameters of the feature generator.
For this reason, in the subsequent chapters, the use of the wavelet transform for extracting the features is explored.

In Chapter 3, we study the use of the discrete wavelet transform (DWT) for extracting MRP features. The reason behind choosing the DWT is 2-fold: 1) the LF-ASD uses a transform for feature extraction that is a simplified version of the wavelet transform. We thus postulate that a more sophisticated version of this detector (i.e., the DWT) should be able to better extract features related to MRPs; and 2) the DWT provides both time and frequency information, so it can provide more information than the traditional frequency-based approaches and can improve the performance [11]. The evidence from the BCI literature also supports this hypothesis [140, 141]. We compare two different variations of this design. The first is based on MRP features extracted from monopolar EEG channels and the other is based on MRP features extracted from bipolar EEG channels. We argue that the system based on the bipolar MRP features yields a superior performance.

In parallel with the research carried out in Chapter 3, we carried out a second study (Chapter 4) that focused on a simple design of an SBCI system based on features extracted from three neurological phenomena: MRP, CPMR and CPBR. This study serves as a proof of concept to show that the combination of the three neurological phenomena discussed earlier in this chapter would improve the performance of the system. For this reason, we only applied simple feature extraction and classification methods (matched filtering and the K-nearest neighbor classifier). A 2-stage multiple classifier system (MCS) is proposed to "fuse" the classification results attributed to each neurological phenomenon. Figure 1-9 shows the overall structure of the system studied in Chapter 4.

Figure 1-9. The overall schematic of the SBCI system developed and studied in Chapters 4, 5, 7, and 9.
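The idea of a 2-stage MCS can be illustrated with a very small sketch. This chapter does not state which combination rule is used at either stage, so the majority vote below is purely an assumption chosen for simplicity, and the function names and example votes are hypothetical.

```python
def majority_vote(votes):
    """Return 1 (IC) only if more than half of the binary votes are 1.

    Ties resolve to 0 (NC), which favors a low false positive rate.
    """
    return int(2 * sum(votes) > len(votes))


def two_stage_decision(stage1_votes):
    """Fuse classifier outputs in two stages.

    stage1_votes maps each neurological phenomenon to the 0/1 outputs of
    its own classifiers. Stage 1 fuses the votes within a phenomenon;
    stage 2 fuses the three phenomenon-level decisions.
    Majority voting at both stages is an assumed rule, not the thesis's.
    """
    per_phenomenon = {name: majority_vote(votes)
                      for name, votes in stage1_votes.items()}
    final = majority_vote(list(per_phenomenon.values()))
    return final, per_phenomenon


# Hypothetical classifier outputs for one analysis window.
final, detail = two_stage_decision({
    "MRP": [1, 1, 0],
    "CPMR": [1, 0, 0],
    "CPBR": [1, 1, 1],
})
```

In this example the MRP and CPBR classifiers agree on an IC detection while the CPMR classifiers do not, so the second stage still reports IC. With only one phenomenon in favor, the same rule would report NC.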
In Chapter 5, we use the findings from Chapters 3 and 4 and propose an improved design. This design uses a new feature extraction method (a stationary wavelet transform (SWT) followed by matched filtering) and proposes the use of a hybrid genetic algorithm (HGA) to simultaneously select the features, the parameter values of the classifiers and the combination method for the 2-stage MCS. We demonstrate that the new design achieves much lower FP rates than previous EEG-based SBCI systems, while maintaining a modest TP rate. These promising results facilitate the practical application of the proposed SBCI system.

In Chapters 6 and 7, we focus on artifacts in BCI systems in general and in the proposed SBCI system in particular. Artifacts are unwanted potentials that can change the shape of the neurological phenomena and thus decrease the system's performance. As a result, handling artifacts is an important part of the design of BCI systems. In Chapter 6, the handling of artifacts in the BCI literature is reviewed. The results of this review show that the BCI literature does not properly report artifact handling. In other words, most BCI studies do not report whether or not they have considered the presence of artifacts. A large number of studies reject the artifact-contaminated periods either manually or automatically. We argue that a proper solution is either to remove artifacts efficiently or to design a BCI system whose performance is robust to the presence of artifacts.

In Chapter 7, we further analyze the performance of the proposed SBCI. We first consider the performance over periods contaminated with eye-blink artifacts. Next, we test the performance of the system on data collected in a session recorded after the sessions used for training and testing the performance of the system. In both cases, we demonstrate that our proposed system maintains its good performance in the presence of artifacts.
These results also demonstrate that during online testing, the system does not need to reject periods marked with artifacts. This, in turn, greatly increases the periods during which the system is available for control.

In Chapter 8, we address the critical issue of selecting a suitable evaluation metric in the design of SBCI systems. We revise and improve a framework that was proposed earlier for comparing the classification accuracy and the AUC metrics [128]. Our revised framework can be used to compare various metrics as well as to study new metrics. It can also be used for selecting the metric(s) most suitable for evaluating a certain classification system. We also analyze the application of the proposed framework to the field of SBCI systems. In particular, we consider four evaluation metrics, namely the overall classification accuracy (OA), the TPR/FPR ratio, Cohen's Kappa coefficient and the HF-difference, and compare their performances during the model selection procedure for a particular SBCI system. We demonstrate that some evaluation metrics, such as Kappa and the HF-difference, are more suitable for SBCI systems, and some, such as OA and the TPR/FPR ratio, are less suitable.

In Chapters 2 to 8, the type of movement considered for the generation of IC commands was the index finger flexion. For the system to be generalized to more control options, its performance on new mental tasks (related to other types of movements) should also be investigated. It is desirable that the same system also performs well on other types of movements. In Chapter 9, we examine the performance of the SBCI system proposed in Chapter 5 on a new dataset. In this dataset, IC commands are generated by hand extension movements. NC data are also recorded in a more engaging environment than those used in previous studies for training the system.
It is demonstrated that our proposed design maintains a superior performance compared to other EEG-based SBCI designs in the literature. In addition, electromyography (EMG) signals from the frontalis muscles are recorded to rule out the activation of these muscles during the movement executions. Furthermore, we use the framework developed in Chapter 8 to select the most suitable evaluation metric for the system. We conclude that Cohen's Kappa coefficient is the most suitable evaluation metric for the model selection procedure of the proposed SBCI system. Finally, we compare the performance of monopolar and bipolar EEG electrode montages. This study shows that the bipolar montage generates more suitable features and thus yields a superior performance to the monopolar montage.

In Chapter 10, we summarize the contributions of this thesis to the field of SBCI systems. We also present some of the potential research subjects that can immediately follow this research. Figure 1-10 shows a summary of the organization of this thesis.

In Appendix A, we provide a copy of the approval from the Behavioral Research Ethics Board (BREB) of the University of British Columbia to conduct this study. In Appendix B, we use linear programming theory to show how the proposed multiple classifier system could achieve perfect classification accuracy under certain conditions.

Figure 1-10. Outline of the thesis.
Chapter 1: Introduction

Design of a 2-state SBCI system with a low FP rate:
  Chapter 2: Automatic user customization of the LF-ASD
  Chapter 3: Using DWT for extracting features from MRPs
  Chapter 4: Design of an SBCI using MRP, CPMR and CPBR features
  Chapter 5: Automating the design of an SBCI with low FP rates
  Chapter 9: Testing the performance of new movements; comparison of monopolar and bipolar montages

Analysis of the effect of artifacts in SBCI systems:
  Chapter 6: Comprehensive review of methods of handling artifacts in BCI systems
  Chapter 7: Testing the performance of the system developed in Chapter 5 over artifact-contaminated periods

Finding a suitable evaluation metric for SBCI systems:
  Chapter 8: Proposing a new framework for comparing evaluation metrics during model selection
  Chapter 9: Using the framework developed in Chapter 8 to find a suitable evaluation metric for an SBCI system

Chapter 10: Conclusions and directions for future work

1.10  References

[1] T. Vaughan, W. J. Heetderks, L. J. Trejo, W. Z. Rymer, M. Wienrich, M. M. Moore, A. Kubler, B. H. Dobkin, N. Birbaumer, E. Donchin, E. W. Wolpaw and J. R. Wolpaw, "Brain-computer interface technology: a review of the second international meeting", IEEE Trans. Neural Syst. Rehab. Eng., vol. 11, no. 2, pp. 94-109, 2003.
[2] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller and T. M. Vaughan, "Brain-computer interfaces for communication and control", Clin. Neurophysiol., vol. 113, no. 6, pp. 767-791, Jun. 2002.
[3] S. G. Mason, A. Bashashati, M. Fatourechi, K. F. Navarro and G. E. Birch, "A comprehensive survey of brain interface technology designs", Ann. Biomed. Eng., vol. 35, no. 2, pp. 137-169, Feb. 2007.
[4] S. G. Mason, J. Kronegg, J. Huggins, M. Fatourechi and A.
Schloegl, "Evaluating the performance of self-paced BCI technology", Technical Report, available online: http://www.bci-info.tugraz.at/Research_Info/documents/articles/self_paced_tech_report-2006-05-19.pdf, 2006.
[5] J. F. Borisoff, S. G. Mason, A. Bashashati and G. E. Birch, "Brain-computer interface design for asynchronous control applications: improvements to the LF-ASD asynchronous brain switch", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 985-992, Jun. 2004.
[6] E. Yom-Tov and G. F. Inbar, "Detection of Movement-Related Potentials from the Electro-Encephalogram for possible use in a Brain-Computer Interface", Medical and Biological Engineering and Computing, vol. 41, no. 1, pp. 85-93, Jan. 2003.
[7] G. E. Birch, Z. Bozorgzadeh and S. G. Mason, "Initial on-line evaluations of the LF-ASD brain-computer interface with able-bodied and spinal-cord subjects using imagined voluntary motor potentials", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 10, no. 4, pp. 219-224, Dec. 2002.
[8] B. Graimann, J. E. Huggins, S. P. Levine and G. Pfurtscheller, "Toward a direct brain interface based on human subdural recordings and wavelet-packet analysis", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 954-962, Jun. 2004.
[9] J. Kronegg, S. Voloshynovskiy and P. Pun, "Analysis of bit rate definitions for brain-computer interfaces," in Proc. Int. Conf. on Human-Computer Interaction (HCI'05), Las Vegas, Nevada, 2005.
[10] A. Schlögl, J. Kronegg, J. Huggins and S. G. Mason, "Evaluation criteria in BCI research," in Towards Brain-Computer Interfacing (G. Dornhege, J. R. Millan, T. Hinterberger, D. McFarland and K. R. Muller, Eds.), MIT Press, 2007.
[11] A. Bashashati, M. Fatourechi, R. K. Ward and G. E. Birch, "A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals", J. Neural Eng., vol. 4, no. 2, pp. R32-57, Jun. 2007.
[12] J. J.
Vidal, "Toward direct brain-computer communication", Annu. Rev. Biophys. Bioeng., vol. 2, pp. 157-180, 1973.
[13] G. Pfurtscheller and A. Aranibar, "Event-related cortical desynchronization detected by power measurements of scalp EEG", Electroencephalogr. Clin. Neurophysiol., vol. 42, no. 6, pp. 817-826, Jun. 1977.
[14] L. Leocani, C. Toro, P. Manganotti, P. Zhuang and M. Hallett, "Event-related coherence and event-related desynchronization/synchronization in the 10 Hz and 20 Hz EEG during self-paced movements", Electroencephalogr. Clin. Neurophysiol., vol. 104, no. 3, pp. 199-206, May 1997.
[15] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles", Clin. Neurophysiol., vol. 110, no. 11, pp. 1842-1857, Nov. 1999.
[16] G. Pfurtscheller, K. Pichler-Zalaudek, B. Ortmayr, J. Diez and F. Reisecker, "Postmovement beta synchronization in patients with Parkinson's disease", J. Clin. Neurophysiol., vol. 15, no. 3, pp. 243-250, May 1998.
[17] D. J. McFarland, W. A. Sarnacki, T. M. Vaughan and J. R. Wolpaw, "Brain-computer interface (BCI) operation: signal and noise during early training sessions", Clin. Neurophysiol., vol. 116, no. 1, pp. 56-62, Jan. 2005.
[18] J. R. Wolpaw and D. J. McFarland, "Control of a two-dimensional movement signal by a noninvasive brain-computer interface in humans", Proc. Natl. Acad. Sci. U. S. A., vol. 101, no. 51, pp. 17849-17854, Dec. 21, 2004.
[19] J. R. Wolpaw and D. J. McFarland, "Multichannel EEG-based brain-computer communication", Electroencephalogr. Clin. Neurophysiol., vol. 90, no. 6, pp. 444-449, Jun. 1994.
[20] D. J. McFarland, L. M. McCane and J. R. Wolpaw, "EEG-based communication and control: short-term role of feedback", IEEE Trans. Rehabil. Eng., vol. 6, no. 1, pp. 7-11, Mar. 1998.
[21] L. A. Miner, D. McFarland and J. R.
Wolpaw, "Answering Questions with an Electroencephalogram-Based Brain Computer Interface", Arch. Phys. Med. Rehabil., vol. 79, pp. 1029-1033, 1998.
[22] G. R. Muller-Putz, R. Scherer, G. Pfurtscheller and R. Rupp, "EEG-based neuroprosthesis control: a step towards clinical practice", Neurosci. Lett., vol. 382, no. 1-2, pp. 169-174, Jul. 2005.
[23] G. Pfurtscheller, G. R. Müller-Putz, J. Pfurtscheller and R. Rupp, "EEG-Based Asynchronous BCI Controls Functional Electrical Stimulation in a Tetraplegic Patient", EURASIP Journal on Applied Signal Processing, vol. 2005, no. 19, pp. 3152-3155, 2005.
[24] G. Pfurtscheller, B. Graimann, J. E. Huggins and S. P. Levine, "Brain-computer communication based on the dynamics of brain oscillations", Suppl. Clin. Neurophysiol., vol. 57, pp. 583-591, 2004.
[25] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, H. Ramoser, A. Schlogl, B. Obermaier and M. Pregenzer, "Current trends in Graz Brain-Computer Interface (BCI) Research", IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 216-219, Jun. 2000.
[26] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication", Proc. IEEE, vol. 89, no. 7, pp. 1123-1134, 2001.
[27] L. Deecke, B. Grozinger and H. H. Kornhuber, "Voluntary finger movement in man: cerebral potentials and theory", Biol. Cybern., vol. 23, no. 2, pp. 99-119, Jul. 14, 1976.
[28] H. Shibasaki, G. Barrett, E. Halliday and A. M. Halliday, "Components of the movement-related cortical potential and their scalp topography", Electroencephalogr. Clin. Neurophysiol., vol. 49, no. 3-4, pp. 213-226, Aug. 1980.
[29] I. M. Tarkka and M. Hallett, "Cortical topography of premotor and motor potentials preceding self-paced, voluntary movement of dominant and non-dominant hands", Electroencephalogr. Clin. Neurophysiol., vol. 75, no. 2, pp. 36-43, Feb. 1990.
[30] M. Hallett, "Movement-related cortical potentials", Electromyogr. Clin.
Neurophysiol., vol. 34, no. 1, pp. 5-13, Jan.-Feb. 1994.
[31] C. Babiloni, F. Carducci, F. Cincotti, P. M. Rossini, C. Neuper, G. Pfurtscheller and F. Babiloni, "Human movement-related potentials vs desynchronization of EEG alpha rhythm: a high-resolution EEG study", Neuroimage, vol. 10, no. 6, pp. 658-665, Dec. 1999.
[32] A. Urbano, C. Babiloni, P. Onorati and F. Babiloni, "Human cortical activity related to unilateral movements. A high resolution EEG study", Neuroreport, vol. 8, no. 1, pp. 203-206, Dec. 20, 1996.
[33] A. Urbano, C. Babiloni, P. Onorati, F. Carducci, A. Ambrosini, L. Fattorini and F. Babiloni, "Responses of human primary sensorimotor and supplementary motor areas to internally triggered unilateral and simultaneous bilateral one-digit movements. A high-resolution EEG study", Eur. J. Neurosci., vol. 10, no. 2, pp. 765-770, Feb. 1998.
[34] Z. Bozorgzadeh, G. E. Birch and S. G. Mason, "The LF-ASD brain computer interface: On-line identification of imagined finger flexions in the spontaneous EEG of able-bodied subjects," in Proc. IEEE ICASSP'00, vol. 6, pp. 2385-2388, 2000.
[35] S. G. Mason and G. E. Birch, "A brain-controlled switch for asynchronous control applications", IEEE Trans. Biomed. Eng., vol. 47, no. 10, pp. 1297-1307, Oct. 2000.
[36] G. E. Birch, P. D. Lawrence and R. D. Hare, "Single Trial Processing of Event Related Potentials Using Outlier Information", IEEE Trans. Biomed. Eng., vol. 40, no. 1, pp. 59-73, 1993.
[37] B. Blankertz, G. Dornhege, C. Schäfer, R. Krepki, J. Kohlmorgen, K. R. Müller, V. Kunzmann, F. Losch and G. Curio, "Boosting bit rates and error detection for the classification of fast-paced motor commands based on single-trial EEG analysis", IEEE Trans. Neural Syst. Rehab. Eng., vol. 11, no. 2, 2003.
[38] G. Dornhege, B. Blankertz and G.
Curio, "Speeding up classification of multi-channel brain-computer interfaces: Common spatial patterns for slow cortical potentials," in Proc. 1st IEEE EMBS Int. Conf. on Neural Engineering, pp. 595-598, 2003.
[39] E. Yom-Tov and G. F. Inbar, "Feature Selection for the Classification of Movements From Single Movement-Related Potentials", IEEE Trans. Neural Syst. Rehab. Eng., vol. 10, no. 3, pp. 170-177, Sep. 2002.
[40] E. Yom-Tov and G. F. Inbar, "Selection of relevant features for classification of movements from single movement-related potentials using a genetic algorithm," in the Proc. 23rd IEEE/EMBS Int. Conf., vol. 2, pp. 1364-1366, 2001.
[41] S. P. Levine, J. E. Huggins, S. L. Bement, R. K. Kushwaha, L. A. Schuh, E. A. Passaro, M. M. Rohde and D. A. Ross, "Identification of Electrocorticogram Patterns as the Basis for a Direct Brain Interface", J. Clin. Neurophysiol., vol. 16, no. 5, pp. 439-447, Sep. 1999.
[42] J. E. Huggins, S. P. Levine, R. Kushwaha, S. L. Bement, L. A. Schuh and D. A. Ross, "Identification of cortical signal patterns related to human tongue protrusion," in pp. 670-672, 1995.
[43] N. Neumann, A. Kubler, J. Kaiser, T. Hinterberger and N. Birbaumer, "Conscious perception of brain states: mental strategies for brain-computer communication", Neuropsychologia, vol. 41, no. 8, pp. 1028-1036, 2003.
[44] T. Hinterberger, B. Wilhelm, J. Mellinger, B. Kotchoubey and N. Birbaumer, "A device for the detection of cognitive brain functions in completely paralyzed or unresponsive patients", IEEE Trans. Biomed. Eng., vol. 52, no. 2, pp. 211-220, Feb. 2005.
[45] N. Birbaumer, "The thought-translation-device (TTD): Taming cognition for action", Brain Cogn., vol. 54, no. 2, pp. 130-130, Mar. 2004.
[46] T. Hinterberger, S. Schmidt, N. Neumann, J. Mellinger, B. Blankertz, G. Curio and N. Birbaumer, "Brain-computer communication and slow cortical potentials", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp.
1011-1018, Jun. 2004.
[47] N. Birbaumer, A. Kubler, N. Ghanayim, T. Hinterberger, J. Perelmouter, J. Kaiser, I. Iversen, B. Kotchoubey, N. Neumann and H. Flor, "The thought translation device (TTD) for completely paralyzed patients", IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 190-193, Jun. 2000.
[48] A. Kubler, B. Kotchoubey, J. Kaiser, J. R. Wolpaw and N. Birbaumer, "Brain-Computer Communication: Unlocking the Locked In", Psych Bulletin, vol. 127, no. 3, pp. 358-375, May 2001.
[49] J. d. R. Millan, J. Mourino, M. G. Marciani, F. Babiloni, F. Topani, I. Canale, J. Heikkonen and K. Kaski, "Adaptive brain interfaces for physically-disabled people," in Proc. IEEE EMBS Conf., vol. 4, pp. 2008-2011, 1998.
[50] C. W. Anderson, S. V. Devulapalli and E. A. Stolz, "Signal Classification with Different Signal Representations", Neural Networks for Signal Processing, pp. 475-483, 1995.
[51] D. Garrett, D. A. Peterson, C. W. Anderson and M. H. Thaut, "Comparison of linear, nonlinear, and feature selection methods for EEG signal classification", IEEE Trans. Neural Syst. Rehab. Eng., vol. 11, no. 2, pp. 141-144, Jun. 2003.
[52] C. W. Anderson, E. A. Stolz and S. Shamsunder, "Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks", IEEE Trans. Biomed. Eng., vol. 45, no. 3, pp. 277-286, Mar. 1998.
[53] B. Z. Allison and J. A. Pineda, "ERPs evoked by different matrix sizes: Implications for a brain computer interface (BCI) system", IEEE Trans. Neural Syst. Rehab. Eng., vol. 11, no. 2, pp. 110-113, Jun. 2003.
[54] E. Donchin, K. M. Spencer and R. Wijesinghe, "The mental prosthesis: assessing the speed of a P300-based brain-computer interface", IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 174-179, Jun. 2000.
[55] L. A. Farwell and E.
Donchin, "Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials", Electroencephalogr. Clin. Neurophysiol., vol. 70, no. 6, pp. 510-523, Dec. 1988.
[56] E. W. Sellers, A. Kubler and E. Donchin, "Brain-computer interface research at the University of South Florida Cognitive Psychophysiology Laboratory: the P300 Speller", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 221-224, Jun. 2006.
[57] F. Piccione, F. Giorgi, P. Tonin, K. Priftis, S. Giove, S. Silvoni, G. Palmas and F. Beverina, "P300-based brain computer interface: reliability and performance in healthy and paralysed participants", Clin. Neurophysiol., vol. 117, no. 3, pp. 531-537, Mar. 2006.
[58] A. A. Glover, M. C. Onofrj, M. F. Ghilardi and I. Bodis-Wollner, "P300-like potentials in the normal monkey using classical conditioning and an auditory 'oddball' paradigm", Electroencephalogr. Clin. Neurophysiol., vol. 65, no. 3, pp. 231-235, May 1986.
[59] B. Roder, F. Rosler, E. Hennighausen and F. Nacker, "Event-related potentials during auditory and somatosensory discrimination in sighted and blind human subjects", Brain Res. Cogn. Brain Res., vol. 4, no. 2, pp. 77-93, Sep. 1996.
[60] J. J. Vidal, "Real-Time Detection of Brain Events in EEG," Proc. IEEE, vol. 65, pp. 633-641, 1977.
[61] E. E. Sutter, "The brain response interface: communication through visually-induced electrical brain responses", J Micro Comp App, vol. 15, pp. 31-45, 1992.
[62] M. Middendorf, G. McMillan, G. Calhoun and K. S. Jones, "Brain-Computer Interfaces Based on the Steady-State Visual-Evoked Response", IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 211-214, Jun. 2000.
[63] X. Gao, D. Xu, M. Cheng and S. Gao, "A BCI-based environmental controller for the motion-disabled", IEEE Trans. Neural Syst. Rehab. Eng., vol. 11, no. 2, pp. 137-140, Jun. 2003.
[64] Y. Wang, R. Wang, X.
Gao and S. Gao, "Brain-computer interface based on the high-frequency steady-state visual evoked potential," in the Proc. 1st Int. Conf. in Neural Interface and Control, pp. 37-39, 2005.
[65] Cheng Ming, Gao Xiaorong, Gao Shangkai and Wang Boliang, "Stimulation frequency extraction in SSVEP-based brain-computer interface," in the Proc. 1st Int. Conf. Neural Interface and Control, pp. 64-67, 2005.
[66] Yijun Wang, Zhiguang Zhang, Xiaorong Gao and Shangkai Gao, "Lead selection for SSVEP-based brain-computer interface," in the Proc. 26th IEEE/EMBS Int. Conf., vol. 2, pp. 4507-4510, 2004.
[67] J. K. Chapin, K. A. Moxon, R. S. Markowitz and M. A. Nicolelis, "Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex", Nat. Neurosci., vol. 2, no. 7, pp. 664-670, Jul. 1999.
[68] M. A. L. Nicolelis and J. K. Chapin, "Controlling robots with the mind", Sci. Am., vol. 287, no. 4, pp. 46-53, Oct. 2002.
[69] J. T. Francis and J. K. Chapin, "Force field apparatus for investigating movement control in small animals", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 963-965, Jun. 2004.
[70] S. Darmanjian, Sung Phil Kim, M. C. Nechyba, S. Morrison, J. Principe, J. Wessberg and M. A. L. Nicolelis, "Bimodal brain-machine interface for motor control of robotic prosthetic," in the Proc. IEEE Int. Conf. Intelligent Robots and Systems, vol. 4, pp. 3612-3617, 2003.
[71] J. P. Donoghue, A. Nurmikko, G. Friehs and M. Black, "Development of neuromotor prostheses for humans", Suppl. Clin. Neurophysiol., vol. 57, pp. 592-606, 2004.
[72] F. Wood, M. J. Black, C. Vargas-Irwin, M. Fellows and J. P. Donoghue, "On the variability of manual spike sorting", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 912-918, Jun. 2004.
[73] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn and J. P.
Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia", Nature, vol. 442, no. 7099, pp. 164-171, Jul. 13, 2006.
[74] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows and J. P. Donoghue, "Instant neural control of a movement signal", Nature, vol. 416, no. 6877, pp. 141-142, Mar. 2002.
[75] T. M. Vaughan, J. R. Wolpaw and E. Donchin, "EEG-Based Communication: Prospects and Problems", IEEE Trans. Rehab. Eng., vol. 4, no. 4, pp. 425-430, 1996.
[76] G. Dornhege, B. Blankertz, G. Curio and K. R. Muller, "Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 993-1002, Jun. 2004.
[77] B. Graimann, J. E. Huggins, A. Schlogl, S. P. Levine and G. Pfurtscheller, "Detection of movement-related desynchronization patterns in ongoing single-channel electrocorticogram", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 3, pp. 276-281, Sep. 2003.
[78] R. Scherer, G. R. Muller, C. Neuper, B. Graimann and G. Pfurtscheller, "An asynchronously controlled EEG-based virtual keyboard: improvement of the spelling rate", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 979-984, Jun. 2004.
[79] Z. Yu, S. G. Mason and G. E. Birch, "Enhancing the performance of the LF-ASD brain-computer interface," in Proc. of the 2nd Joint EMBS/BMES Conference, vol. 3, pp. 2443-2444, 2002.
[80] A. Bashashati, M. Fatourechi, R. K. Ward and G. E. Birch, "User customization of the feature generator of an asynchronous brain interface", Ann. Biomed. Eng., vol. 34, no. 6, pp. 1051-1060, Jun. 2006.
[81] A. Bashashati, R. K. Ward and G. E. Birch, "A new design of the asynchronous brain computer interface using the knowledge of the path of features," in Proc. 2nd IEEE EMBS Int. Conf. on Neural Engineering, pp. 101-104, 2005.
[82] A. Bashashati, R. K. Ward and G. E.
Birch, "Towards development of a 3-state self-paced brain computer interface", Computational Intelligence and Neuroscience, vol. 2007, pp. 1-8, Oct. 2007.
[83] J. E. Huggins, S. P. Levine, J. A. Fessler, W. M. Sowers, G. Pfurtscheller, B. Graimann, A. Schloegl, D. N. Minecan, R. K. Kushwaha, S. L. BeMent, O. Sagher and L. A. Schuh, "Electrocorticogram as the basis for a direct brain interface: Opportunities for improved detection accuracy," in Proc. 1st IEEE EMBS Int. Conf. on Neural Engineering, pp. 587-590, 2003.
[84] S. P. Levine, J. E. Huggins, S. L. Bement, R. K. Kushwaha, L. A. Schuh, M. M. Rohde, E. A. Passaro, D. A. Ross, K. V. Elisevich and B. J. Smith, "A Direct Brain Interface Based on Event-Related Potentials", IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 180-185, Jun. 2000.
[85] J. E. Huggins, S. P. Levine, S. L. Bement, R. K. Kushwaha, L. A. Schuh, E. A. Passaro, M. M. Rohde, D. A. Ross, K. V. Elisevich and B. J. Smith, "Detection of Event-Related Potentials for Development of a Direct Brain Interface", J. Clin. Neurophysiol., vol. 16, no. 5, pp. 448-455, Sep. 1999.
[86] Y. Benjamini and Y. Hochberg, "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing", Journal of the Royal Statistical Society. Series B (Methodological), vol. 57, no. 1, pp. 289-300, 1995.
[87] G. Townsend, B. Graimann and G. Pfurtscheller, "Continuous EEG classification during motor imagery--simulation of an asynchronous BCI", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 12, no. 2, pp. 258-265, Jun. 2004.
[88] C. Toro, G. Deuschl, R. Thatcher, S. Sato, C. Kufta and M. Hallett, "Event-related desynchronization and movement-related cortical potentials on the ECoG and EEG", Electroencephalogr. Clin. Neurophysiol., vol. 93, no. 5, pp. 380-389, Oct. 1994.
[89] S. Arroyo, R. P. Lesser, B. Gordon, S. Uematsu, D. Jackson and R.
Webber, "Functional significance of the mu rhythm of human cortex: an electrophysiologic study with subdural electrodes", Electroencephalogr. Clin. Neurophysiol., vol. 87, no. 3, pp. 76-87, Sep. 1993.
[90] G. Pfurtscheller and A. Aranibar, "Evaluation of event-related desynchronization (ERD) preceding and following voluntary self-paced movement", Electroencephalogr. Clin. Neurophysiol., vol. 46, no. 2, pp. 138-146, Feb. 1979.
[91] L. Defebvre, J. L. Bourriez, K. Dujardin, P. Derambure, A. Destee and J. D. Guieu, "Spatiotemporal study of Bereitschaftspotential and event-related desynchronization during voluntary movement in Parkinson's disease", Brain Topogr., vol. 6, no. 3, pp. 237-244, Spring 1994.
[92] K. R. Muller, G. Curio, B. Blankertz and G. Dornhege, "Combining features for BCI," in the Proc. Advances in Neural Inf. Proc. Systems (NIPS 02), vol. 15, 2003.
[93] L. Narici, V. Pizzella, G. L. Romani, G. Torrioli, R. Traversa and P. M. Rossini, "Evoked alpha- and mu-rhythm in humans: a neuromagnetic study", Brain Res., vol. 520, no. 1-2, pp. 222-231, Jun. 18, 1990.
[94] B. Feige, R. Kristeva-Feige, S. Rossi, V. Pizzella and P. M. Rossini, "Neuromagnetic study of movement-related changes in rhythmic brain activity", Brain Res., vol. 734, no. 1-2, pp. 252-260, Sep. 23, 1996.
[95] G. Pfurtscheller, "Central beta rhythm during sensorimotor activities in man", Electroencephalogr. Clin. Neurophysiol., vol. 51, no. 3, pp. 253-264, Mar. 1981.
[96] W. Szurhaj, P. Derambure, E. Labyt, F. Cassim, J. L. Bourriez, J. Isnard, J. D. Guieu and F. Mauguiere, "Basic mechanisms of central rhythms reactivity to preparation and execution of a voluntary movement: a stereoelectroencephalographic study", Clin. Neurophysiol., vol. 114, no. 1, pp. 107-119, Jan. 2003.
[97] H. S. Liu, X. Gao, F. Yang and S. Gao, "Imagined hand movement identification based on spatio-temporal pattern recognition of EEG," in Proc.
of the 1st Joint EMBS/BMES Conference, pp. 599-602, 2003.
[98] B. D. Mensh, J. Werfel and H. S. Seung, "BCI Competition 2003--Data set Ia: combining gamma-band power with slow cortical potentials to improve single-trial classification of electroencephalographic signals", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1052-1056, Jun. 2004.
[99] T. Hinterberger and G. Baier, "Parametric orchestral sonification of EEG in real time", Multimedia, IEEE, vol. 12, no. 2, pp. 70-79, 2005.
[100] Y. Wang, Z. Zhang, Y. Li, X. Gao, S. Gao and F. Yang, "BCI Competition 2003--Data set IV: an algorithm based on CSSD and FDA for classifying single-trial EEG", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1081-1086, Jun. 2004.
[101] M. Krauledat, G. Dornhege, B. Blankertz, F. Losch, G. Curio and K.-R. Muller, "Improving speed and accuracy of brain-computer interfaces using readiness potential features," in the Proc. 26th IEEE/EMBS Int. Conf., vol. 2, pp. 4511-4515, 2004.
[102] T. M. Vaughan, W. J. Heetderks, L. J. Trejo, W. Z. Rymer, M. Weinrich, M. M. Moore, A. Kubler, B. H. Dobkin, N. Birbaumer, E. Donchin, E. W. Wolpaw and J. R. Wolpaw, "Brain-computer interface technology: a review of the Second International Meeting", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 94-109, Jun. 2003.
[103] J. S. Barlow, "Artifact processing (rejection and minimization) in EEG data processing", Handbook of Electroencephalography and Clinical Neurophysiology (Revised Series Ed.), Amsterdam: Elsevier, vol. 2, pp. 15-62, 1986.
[104] P. Anderer, S. Roberts, A. Schlogl, G. Gruber, G. Klosch, W. Herrmann, P. Rappelsberger, O. Filz, M. J. Barbanoj, G. Dorffner and B. Saletu, "Artifact processing in computerized analysis of sleep EEG - a review", Neuropsychobiology, vol. 40, no. 3, pp. 150-157, Sep. 1999.
[105] D. J. McFarland, L. M. McCane, S. V. David and J. R.
Wolpaw, "Spatial filter selection for EEG-based communication", Electroencephalogr. Clin. Neurophysiol., vol. 103, no. 3, pp. 386-394, Sep. 1997.
[106] W. Waterink and A. van Boxtel, "Facial and jaw-elevator EMG activity in relation to changes in performance level during a sustained information processing task", Biol. Psychol., vol. 37, no. 3, pp. 183-198, Jul. 1994.
[107] B. H. Cohen, R. J. Davidson, J. A. Senulis, C. D. Saron and D. R. Weisman, "Muscle tension patterns during auditory attention", Biol. Psychol., vol. 33, no. 2-3, pp. 133-156, Jul. 1992.
[108] I. I. Goncharova, D. J. McFarland, T. M. Vaughan and J. R. Wolpaw, "EMG contamination of EEG: spectral and topographical characteristics", Clin. Neurophysiol., vol. 114, no. 9, pp. 1580-1593, Sep. 2003.
[109] D. J. McFarland, W. A. Sarnacki, T. M. Vaughan and J. R. Wolpaw, "Brain-computer interface (BCI) operation: signal and noise during early training sessions", Clin. Neurophysiol., vol. 116, no. 1, pp. 56-62, Jan. 2005.
[110] R. N. Vigario, "Extraction of ocular artefacts from EEG using independent component analysis", Electroencephalography and Clinical Neurophysiology, vol. 103, no. 3, pp. 395-404, Sep. 1997.
[111] R. Verleger, "The instruction to refrain from blinking affects auditory P3 and N1 amplitudes", Electroencephalogr. Clin. Neurophysiol., vol. 78, no. 3, pp. 240-251, Mar. 1991.
[112] C. J. Ochoa and J. Polich, "P300 and blink instructions", Clin. Neurophysiol., vol. 111, no. 1, pp. 93-98, Jan. 2000.
[113] G. Gratton, "Dealing with artifacts: The EOG contamination of the event-related brain potential", Behavior Research Methods, Instruments, & Computers, vol. 30, no. 1, pp. 44-53, 1998.
[114] H. Ramoser, J. Muller-Gerking and G. Pfurtscheller, "Optimal Spatial Filtering of Single Trial EEG During Imagined Hand Movement", IEEE Trans. Rehab. Eng., vol. 8, no. 4, pp.
441-446, Dec. 2000.
[115] J. Millan, M. Franze, J. Mourino, F. Cincotti and F. Babiloni, "Relevant EEG features for the classification of spontaneous motor-related tasks", Biol. Cybern., vol. 86, no. 2, pp. 89-95, Feb. 2002.
[116] R. J. Croft and R. J. Barry, "Removal of ocular artifact from the EEG: a review", Neurophysiol. Clin., vol. 30, no. 1, pp. 5-19, Feb. 2000.
[117] V. Rowland, "Cortical steady potential (direct current potential) in reinforcement and learning", Progress in Physiological Psychology, vol. 2, pp. 1-77, 1968.
[118] J. S. Barlow, "EMG artifact minimization during clinical EEG recordings by special analog filtering", Electroencephalogr. Clin. Neurophysiol., vol. 58, no. 2, pp. 161-174, Aug. 1984.
[119] J. R. Ives and D. L. Schomer, "A 6-pole filter for improving the readability of muscle contaminated EEGs", Electroencephalogr. Clin. Neurophysiol., vol. 69, no. 5, pp. 486-490, May 1988.
[120] S. Choi, A. Cichocki, H. M. Park and S. Y. Lee, "Blind Source Separation and Independent Component Analysis: A Review", Neural Information Processing-Letters and Review, vol. 6, no. 1, pp. 1-57, 2005.
[121] T. D. Lagerlund, F. W. Sharbrough and N. E. Busacker, "Spatial filtering of multichannel electroencephalographic recordings through principal component analysis by singular value decomposition", J. Clin. Neurophysiol., vol. 14, no. 1, pp. 73-82, Jan. 1997.
[122] M. Browne and T. R. Cutmore, "Low-probability event-detection and separation via statistical wavelet thresholding: an application to psychophysiological denoising", Clin. Neurophysiol., vol. 113, no. 9, pp. 1403-1411, Sep. 2002.
[123] P. He, G. Wilson and C. Russell, "Removal of ocular artifacts from electro-encephalogram by adaptive filtering", Med. Biol. Eng. Comput., vol. 42, no. 3, pp. 407-412, May 2004.
[124] P. Berg and M. Scherg, "A multiple source approach to the correction of eye artifacts", Electroencephalogr.
Clin. Neurophysiol., vol. 90, no. 3, pp. 229-241, Mar. 1994.
[125] D. Burke, S. Kelly, P. de Chazal and R. Reilly, "A simultaneous filtering and feature extraction strategy for direct brain interfacing," in Proc. of the 2nd Joint EMBS/BMES Conference, vol. 1, pp. 279-280, 2002.
[126] A. Kuebler, B. Kotchoubey, H. P. Salzmann, N. Ghanayim, J. Perelmouter, V. Homberg and N. Birbaumer, "Self-regulation of slow cortical potentials in completely paralyzed human patients", Neurosci. Lett., vol. 252, no. 3, pp. 171-174, Aug. 1998.
[127] F. Provost and T. Fawcett, "Robust Classification for Imprecise Environments", Mach. Learning, vol. 42, no. 3, pp. 203-231, 2001.
[128] J. Huang and C. X. Ling, "Using AUC and accuracy in evaluating learning algorithms", IEEE Trans. Knowled. Data Eng., vol. 17, no. 3, pp. 299-310, 2005.
[129] J. Zhu and T. Yao, "An evaluation of statistical spam filtering techniques", ACM Transactions on Asian Language Information Processing (TALIP), vol. 3, no. 4, pp. 243-269, 2004.
[130] A. P. Bradley, "Use of the area under the ROC curve in the evaluation of machine learning algorithms", Pattern Recognit., vol. 30, no. 7, pp. 1145-1159, 1997.
[131] N. T. Choplin and D. C. Lundy, "The sensitivity and specificity of scanning laser polarimetry in the detection of glaucoma in a clinical setting", Ophthalmology, vol. 108, no. 5, pp. 899-904, May 2001.
[132] M. Fatourechi, G. E. Birch and R. K. Ward, "A self-paced brain interface system that uses movement related potentials and changes in the power of brain rhythms", J. Comput. Neurosci., vol. 23, no. 1, pp. 21-37, Aug. 2007.
[133] N. Yamawaki, C. Wilke, Z. Liu and B. He, "An enhanced time-frequency-spatial approach for motor imagery classification", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 250-254, Jun. 2006.
[134] A. Buttfield, P. W. Ferrez and R.
Millan Jdel, "Towards a robust BCI: error potentials and online learning", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 164-168, Jun. 2006.
[135] G. R. Muller-Putz, R. Scherer, C. Neuper and G. Pfurtscheller, "Steady-state somatosensory evoked potentials: suitable brain signals for brain-computer interfaces?", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 1, pp. 30-37, Mar. 2006.
[136] J. R. Wolpaw, D. McFarland and G. Pfurtscheller, "EEG-based Communication: Improved Accuracy by Response Verification", IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 326-333, 1998.
[137] M. Fatourechi, S. G. Mason, G. E. Birch and R. K. Ward, "Is information transfer rate a suitable performance measure for self-paced brain interface systems?" in Proc. IEEE Int. Symp. Signal Processing and Information Technology, pp. 212-216, 2006.
[138] J. Cohen, "A coefficient of agreement for nominal scales", Educational and Psychological Measurement, vol. 20, no. 1, pp. 37-46, 1960.
[139] M. Fatourechi, G. E. Birch and R. K. Ward, "Applying a hybrid genetic algorithm in the design of a self-paced brain interface with a low false positive rate," in Proc. IEEE ICASSP'07, vol. 4, pp. IV-1157 to IV-1160, Apr. 2007.
[140] V. Bostanov, "BCI Competition 2003--Data sets Ib and IIb: feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1057-1061, Jun. 2004.
[141] L. Qin and B. He, "A wavelet-based time-frequency analysis approach for classification of motor imagery for brain-computer interface applications", J. Neural Eng., vol. 2, no. 4, pp. 65-72, Dec. 2005.
CHAPTER 2 AUTOMATIC USER CUSTOMIZATION FOR IMPROVING THE PERFORMANCE OF A SELF-PACED BRAIN COMPUTER INTERFACE SYSTEM¹

¹ A version of this chapter has been published. Fatourechi, M., Bashashati, A., Birch, G.E. and Ward, R.K. "Automatic User Customization for Improving the Performance of an Asynchronous Brain Interface System", Journal of Medical & Biological Engineering and Computing, Vol.44, No.12, Dec 2006, pp.1093-1104.

2.1 Introduction

A self-paced brain computer interface (BCI) system allows individuals with severe motor disabilities to control objects in their environment using only their brain signals and at any time, i.e., at their own pace [1-11]. The output of a self-paced BCI system should be activated only when the user intends control, and should remain inactive at all other times. Implementing such a BCI system is much more difficult than implementing a traditional synchronized BCI system, in which the user can control a device only during periods of time specified by the system [12].

BCI systems use specific features of a neurological phenomenon in the brain activity for the purpose of control. Various neurological phenomena can be used, including neural firing rates, changes in the Mu and Beta rhythms, movement-related potentials (MRPs), slow cortical potentials (SCPs) and P300. For a complete list of neurological phenomena used in BCI systems and pertinent references, please see [13]. In designing a feature extractor for a BCI system, an important factor that needs to be addressed is the variability in the chosen neurological phenomenon; i.e., the characteristics of the neurological phenomenon may change from one user to another. For example, it has been shown that the Mu and Beta frequency bands [14] and the shape of an MRP [15] may vary from one user to another. As a result, if the feature extractor does not extract user-specific features, the performance of the BCI system may degrade [16], or the system may even detect an incorrect pattern [15]. A successful BCI system must therefore select features that correctly characterize the underlying neurological phenomenon of the specific user. We call this process user customization of the feature extractor.

Traditionally, BCI systems have not employed user customization for extracting features. Recent studies, however, have shown that user customization of the feature extractor leads to improved performance for most users [5, 15-19]. User customization can be achieved either "manually" or "automatically". In manual user customization, the neurological phenomenon of interest is visually inspected by a human expert (usually through inspecting the ensemble average of many single trials); the expert then determines the parameter values of the feature extractor [15]. This customization process has two main advantages: it is relatively fast and it is not computationally demanding. Thus, when the total number of users and EEG channels is small and the signal-to-noise ratio (SNR) of the neurological phenomenon is sufficiently high, the manual approach can be used for customizing the parameter values.

When the number of users grows, however, this process becomes increasingly time-consuming and exhausting. The problem becomes more challenging when features are extracted from a large number of EEG channels, since many EEG channels (and not only one or two) need to be visually inspected. If the SNR of the neurological phenomenon is low, visual estimation becomes subjective and inaccuracies are introduced in the estimates of the parameter values.
Furthermore, if some kind of pre-processing that changes the shape of the neurological phenomenon of interest is employed, then this change should be considered in the design of the feature extractor. For these reasons, an automatic user customization algorithm is desired.

In this chapter, we employ automatic user customization of the feature extractor of a self-paced BCI system called the Low Frequency-Asynchronous Switch Design (the LF-ASD) [1]. The LF-ASD detects an Intentional Control (IC) command in the EEG signal. The IC command corresponds to an MRP pattern generated by the flexion of the right index finger. When users are not in an IC state, they are said to be in a no-control (NC) state. In an NC state, a user may be idle or may perform some action other than trying to control the BCI system. We chose the LF-ASD for this study because of our intimate knowledge of this BCI system. Also, the LF-ASD has been used as the basis for potential design improvements by other researchers [6], and it is one of the few BCI systems that has been successfully tested online [20].

Because the shape of MRP patterns differs from one user to another, determining the specific design parameter values for each individual is expected to improve the performance of a BCI system. In [15], the parameter values of the feature extractor of the LF-ASD are estimated by a human expert (see Section 2.3 for details). It is shown that such user customization results in improved performance of the LF-ASD. There are some limitations, however, in the application of that method. First, as mentioned above, the process can become very time-consuming, especially for a relatively large number of EEG channels, which is the case here (the LF-ASD uses 6 bipolar EEG channels). Second, the LF-ASD incorporates a pre-processing component that changes the shapes of MRPs.
Third, the SNR of the MRPs is usually very low; this makes estimating the parameter values from the ensemble averages unreliable for some individuals [15].

In this study, we propose the use of a genetic algorithm (GA) to automatically estimate the shape of MRPs for each user and thus user customize the parameter values of the LF-ASD. A GA is a heuristic search method that provides a framework for effectively sampling large search spaces [21]. GAs are designed based upon the genetic processes of biological organisms, which evolve over many generations according to the principles of natural selection and survival of the fittest. By mimicking this process, they are able to evolve solutions to real-world problems. They have been shown to be effective in optimization problems where a large-dimensional feature space is involved, especially when the optimization problem cannot be solved by analytical tools [21, 22]. Since in this study we plan to automatically estimate the shape of the MRP pattern for each EEG channel, and thus we are dealing with a high-dimensional parameter space, we employ GAs.

The use of a GA for automatic user customization of the LF-ASD was also motivated by the results of our earlier work in [23]. There, we used a GA to automatically customize the parameter values of the post-processing component in the LF-ASD for two individuals. The improvements in performance of the two individuals studied demonstrated the effectiveness of employing a GA.

This study demonstrates that automatic user customization of the LF-ASD results in statistically significant improvements in the performance over the BCI system whose design parameter values are user customized by a human expert [15]. This finding further supports existing evidence that automatic user customization leads to performance improvement in BCI systems.
2.2 Background

This section briefly reviews MRPs and the overall structure of the LF-ASD. MRPs are low-frequency potentials that start about 1-1.5 seconds before a movement. They have a bilateral distribution and present maximum amplitude at the vertex [24-26]. An MRP is a robust phenomenon observed in the brain signal. It has been shown that there are similarities between the shapes of MRPs resulting from a real execution of a movement and those resulting from an attempt to perform a movement [1]. In some BCI systems, MRPs have thus been chosen as the neurological phenomenon from which the presence of an IC command is extracted [1, 27-29]. An MRP consists of different components, such as the Bereitschaftspotential, a motor potential (MP), a post-movement positive potential (PMPP), etc. [30]. Different BCI systems focus on the detection of different components. For example, in [31] the Bereitschaftspotential is detected, whereas in [1] the whole MRP is detected. There are various methods for detecting the components of MRPs, such as using autoregressive parameters [27], the wavelet transform [28] and the Fourier transform [29]. One solution is to use a simple feature extractor that detects the peaks of MRPs, since the peaks are found to be robust over different individuals [1]. The LF-ASD system and its variations have used this idea for detecting MRP patterns [1, 15, 32].

The block diagram of the LF-ASD [15, 32] is shown in Figure 2-1. This design uses features extracted from six bipolar EEG channels located on the sensorimotor cortex. After amplification and low-pass filtering using a linear-phase FIR filter with a 4 Hz cut-off frequency, all six EEG channels are normalized with an energy normalization transform (ENT) [33].
The ENT (see Figure 2-1) normalizes the input energy and has been shown to result in a better class separation by increasing the difference between the means of the IC and NC features [32-34]. The output of the ENT is calculated using

$$ e(n) = \frac{x(n)}{\sqrt{\sum_{s=-W_N/2}^{W_N/2} x^2(n-s)}} \qquad (2\text{-}1) $$

where x(n) is the input EEG channel, W_N is the width of the "sliding" window used to normalize x(n), and e(n) is the normalized EEG channel (the output of the ENT). The only design parameter of the ENT is W_N, i.e., the "normalization parameter". Its value was originally determined through an exhaustive search on the data collected from one individual. This value was then used for all other individuals [33].

[Figure 2-1. Components of the LF-ASD system (from [32]). Block labels: electrode array, amp, ENT, LF-ASD feature generator, KLT, feature translator, 1-NN classifier, codebook generation mechanism, moving average, debounce.]

A specific feature generator is then applied to detect the presence of an MRP pattern in the single-trial bipolar EEG signals [1]. Figure 2-2 shows the points used to calculate the features from a sample EEG signal at a particular point in time (t = n'). As shown in Figure 2-2, each of the elemental features E_i(n') and E_j(n') is defined as the difference in e(n) at two points in time, described by (2-2) and (2-3) below:

E_i(n') = [...]        (2-2)

E_j(n') = [...]        (2-3)

[Figure 2-2. Points selected by the feature generator when applied to a sample bipolar EEG signal.]

where e(n) is the ENT-normalized EEG signal. Throughout this Chapter, the above parameters (α_i, α_j, β_i, β_j) are referred to as the "delay parameters" and are used to estimate the shape of a bipolar MRP.
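As an illustration only, the energy normalization described above can be sketched in Python. The sketch assumes one reading of the ENT: each sample is divided by the square root of the signal energy inside a window of about W_N samples centred on that sample. The function name energy_normalize, the truncation of the window at the signal edges and the zero-energy guard are assumptions of this sketch and are not taken from the LF-ASD implementation.

```python
import math


def energy_normalize(x, window):
    """Divide each sample by the root of the energy in a centred sliding window.

    x      : sequence of samples from one bipolar EEG channel
    window : window width W_N in samples (hypothetical convention: the window
             covers window // 2 samples on each side of the current sample)
    """
    half = window // 2
    n_samples = len(x)
    out = []
    for n in range(n_samples):
        # Assumption: the window is truncated at the start and end of the signal.
        lo = max(0, n - half)
        hi = min(n_samples, n + half + 1)
        energy = sum(v * v for v in x[lo:hi])
        # Guard against division by zero on flat (all-zero) segments.
        out.append(x[n] / math.sqrt(energy) if energy > 0 else 0.0)
    return out


# At a 128 Hz sampling rate, a 1-second window would be 128 samples wide.
normalized = energy_normalize([0.5, 1.0, -2.0, 4.0, 1.0, -0.5], window=4)
```

One property worth noting is that the output does not change when the whole input is multiplied by a constant gain, which is the sense in which the transform normalizes the input energy.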
To emphasize the samples for which two large elemental features appear concurrently, compound features are defined by pairing the elemental features (E_i, E_j), as shown below:

$$ g_{ij}(n') = \begin{cases} E_i(n')\,E_j(n') & \text{if } E_i(n') > 0 \text{ and } E_j(n') > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2\text{-}4) $$

For robustness, the compound features are maximized over a window as follows:

[...]        (2-5)

Since there are six bipolar EEG channels, this procedure is repeated for each of these channels. The compound features of the six EEG signals then form a 6-dimensional feature vector. The Karhunen-Loève Transform (KLT) is used to reduce the 6-dimensional feature space produced by the feature generator to a 2-dimensional space [35]. A 1-nearest neighbor (1-NN) classifier is used as the feature classifier. The codebook generation mechanism for the classifier is explained elsewhere [1]. Finally, a moving average and a debounce algorithm are employed to improve the classification accuracy of the system by reducing the number of false activations (for details, see [1, 32]). After training, the LF-ASD classifies the input patterns as one of two classes: NC or IC.

2.3 Problem statement

In designing the LF-ASD, the parameter values of the ENT and the feature generator must be estimated. The aim is to determine these estimates so that it is possible to detect MRP patterns in a single trial. The ENT has one parameter to be determined. This is the window size, W_N. Its value should be estimated for each of the six EEG channels. The feature generator has four delay parameters (α_i, α_j, β_i, β_j) for each of the six EEG channels, resulting in a total of 24 delay parameters whose values should be estimated. This means that, to detect the presence of an MRP pattern, the values of 30 parameters should be determined.
For the rest of this Chapter, we refer to these 30 parameters as the "design parameters". These parameters were originally estimated by a human expert from the ensemble average of MRP patterns for one individual and then were used for all subsequent individuals [1, 32]. As the MRP pattern related to a specific movement may differ from one individual to another, using the same design parameter values for all individuals may lead to erroneous results. Therefore, the design parameter values should be estimated for each individual. The same argument applies to any BCI system that uses a user-dependent pattern for its IC state.

When determining the design parameter values, two points should be considered. First, these values cannot be determined using an exhaustive search approach. Without an efficient automatic method, it is prohibitively time-consuming to determine all parameter values simultaneously by using an exhaustive search method. Second, an improper choice of design parameter values may lead the BCI system to detect an incorrect pattern in the EEG signal. This, in turn, degrades the performance of the system, since the detected pattern would not correspond to an MRP pattern, as it may have resulted from a particular artifact.

To improve the performance of the LF-ASD, the (α_i, α_j) delay parameter values were user customized by a human expert in [15]. The β_i's and β_j's were set to 0, equal α_i and α_j values were used for each of the six EEG channels, and the size of the normalization window was fixed for all individuals and for all EEG channels. For each individual, the MRP pattern associated with the flexion of the right index finger was determined using the ensemble average of the MRP pattern. The delay parameter values were then estimated by visually inspecting the user's ensemble average of the MRP patterns.
The rationale behind using the ensemble average was that it enhanced the SNR, and that the resulting waveform better showed the desired pattern that the LF-ASD aimed at  detecting.  As  for  the  normalization  parameters,  since  no  analytical  method  for estimating these values existed, the values found earlier in [33] by trial and error were used. The data of eight individuals were analyzed [15]. Improvements from 2.0% to 6.8% were reported for four individuals, but the results for the rest of the individuals did not improve [15].  Although implementing the above customization approach seems straightforward, there were some problems associated with it. First, estimating the delay parameter values from  the  ensemble  averages  was  not  trivial.  The  number  of  available  trials  had  a significant effect on the quality of the generated ensemble averages and ultimately on the estimated values of the delay parameters. For some individuals, there were a number of closely  located  peaks  in  the  ensemble  averages  that  made  the  estimation  of  the  delay parameter values very difficult. As a result, several points had to be tested before the desired delay parameter values were estimated [15]. Also, the values of the normalization parameters and the delay parameters were not estimated simultaneously. Since the ENT was applied first, the delay parameter values were estimated subsequently. Thus, for each value of the normalization parameter of the ENT, the delay parameter values had to be estimated. Since no analytical method currently exists for estimating the normalization parameter  values,  the  resulting  estimates  of  the  delay  parameter  values  may  not  be reliable. Finally, the amount of improvement in the performance of the system found in [15] over that of the non-user customized system [32], was not as high as expected. 
This is probably due to the fact that estimating the delay parameter values based on the ensemble averages does not guarantee optimal performance in single-trial analysis.

To address these limitations, in the next section we propose the use of a GA to automatically user customize the design parameter values.

2.4 Methods

In applying GAs to select the parameter values, each parameter of interest is first coded in the form of a randomly generated binary string. Each bit in this binary string is called a gene. The concatenation of all the binary strings forms a "chromosome", and the set of "chromosomes" forms a "population". Each chromosome is then evaluated and a fitness value assigned. For example, the fitness value can be the classification accuracy of the BCI system for a particular set of parameter values. The chromosomes are then combined using operators such as "selection", "crossover" and "mutation" in order to generate new chromosomes. The "selection" operator selects a proportion of the existing population to breed a new generation. The selected chromosomes are usually the ones with higher fitness compared to other chromosomes in the population. After selection of the "fitter" chromosomes, a pair of "parent" chromosomes is selected for generating the "child" chromosomes. A child chromosome is a new solution that typically shares many of the characteristics of its "parents". The "crossover" operator ensures that this is the case by copying some of the genes of each parent to the child. The "mutation" operator is used to maintain genetic diversity from one generation of a population to the next. This process is repeated until a new population of chromosomes is generated. It is expected that the population evolves gradually and that fitness improves over generations.
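The generic loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the toy fitness (the number of 1 bits in the chromosome), the population size and the number of generations are all hypothetical, and the fitness here stands in for the far more expensive step of training and evaluating the BCI system. The default operator settings (tournament size 3, uniform crossover with p=0.9, mutation with p=0.01) follow the values reported later in this section.

```python
import random


def tournament(population, scores, size, rng):
    """Selection: return the fittest of `size` randomly drawn chromosomes."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def uniform_crossover(a, b, p_cross, rng):
    """Crossover: with probability p_cross, copy each gene from a random parent."""
    if rng.random() >= p_cross:
        return a[:]  # no crossover: the child is a copy of the first parent
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom, p_mut, rng):
    """Mutation: flip each bit independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]


def run_ga(fitness, n_bits, pop_size=20, generations=30,
           p_cross=0.9, p_mut=0.01, tour_size=3, seed=0):
    """Evolve binary chromosomes and return the best one seen and its fitness."""
    rng = random.Random(seed)
    # Random initialization: each chromosome is a list of n_bits genes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        for chrom, score in zip(population, scores):
            if score > best_score:
                best, best_score = chrom[:], score
        # Breed the next generation from tournament-selected parents.
        population = [
            mutate(uniform_crossover(tournament(population, scores, tour_size, rng),
                                     tournament(population, scores, tour_size, rng),
                                     p_cross, rng), p_mut, rng)
            for _ in range(pop_size)
        ]
    return best, best_score


# Toy usage: maximize the number of 1 bits in a 16-bit chromosome.
best, score = run_ga(fitness=sum, n_bits=16)
print(best, score)
```

In a real application each group of bits would be decoded into one parameter value before the fitness is computed.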
This process is continued until a stopping criterion for the GA is met [21].

The GA we apply for user customization has the following characteristics. Each chromosome consists of a concatenated binary version of 31 parameter values. These parameters comprise the 30 design parameters previously stated and the "scale factor" parameter, which determines the operating point of the BCI system on the receiver operating characteristic (ROC) curve. The ROC curve shows the relationship between the true positive (TP) and the false positive (FP) results for each parameter configuration (for more details on the scale factor and on plotting the ROC curve for the LF-ASD, see [32]). The function of the scale factor is explained below.

The width of the normalization window was chosen to be from 0 to 1.5 seconds. The initial values of the delay parameters were visually estimated from the ensemble averages of the MRP patterns in the training data with the ENT removed from the system. For simplicity, the same initial delay values were chosen for all channels. The ranges for the delay parameter values were then chosen as follows (all numbers refer to sample numbers):

Range of α_i: [α_i-est - 32 to α_i-est + 96]
Range of α_j: [α_j-est - 96 to α_j-est + 32]
Range of β_i: [-32 to +32]
Range of β_j: [-32 to +32]        (2-6)

where α_i-est and α_j-est are the approximate values of the delay parameters estimated from the ensemble averages. These parameter ranges were chosen to cover the range over which the peaks associated with the pattern shown in Figure 2-2 are expected to occur. Their values thus give an estimate of the shape of the MRP pattern.

The range of the scale factor (which determines the operating point on the ROC curve) was chosen to be from 0.1 to 4.
Our experience has shown that this selection covers the range of the operating points on the ROC curve of the LF-ASD that should be at low FP rates [32].

Following an initial estimate of the delay parameter values, a suitable fitness function for the GA was chosen as follows. A confusion matrix, shown in Table 2-1, was used to summarize the classification performance of a 2-state self-paced BCI system. In Table 2-1, the FP rate is the percentage of misclassifying an NC state as an IC state, the true negative (TN) rate is the percentage of correctly classifying an NC, the TP rate is the percentage of correctly classifying an IC and the false negative (FN) rate is the percentage of misclassifying an IC state as an NC state. A suitable fitness function for a self-paced BCI should be able to effectively summarize the confusion matrix. For a two-state self-paced BCI system such as the LF-ASD, we have

FN(%) = 100(%) - TP(%)  and  TN(%) = 100(%) - FP(%)        (2-7)

Based on (2-7), the fitness function needs to contain only TP and FP rates. One choice of a good fitness function can be one that maximizes the TP rate for a reasonably low fixed FP rate. This choice is based on our previous results, where it was found that an FP rate above 2% caused excessive frustration and distraction in users using a self-paced BCI system [32]. Thus, it is important to keep the FP rates below 2%.

Table 2-1. The confusion matrix for a 2-state self-paced BCI system.

                    Predicted class
Actual class        IC        NC
IC                  TP        FN
NC                  FP        TN

Our earlier attempts at calculating a suitable performance measure based on the confusion matrix were based on reporting the TP rate at a fixed FP rate (which was set at 2%; see [32] for details). In order to achieve this, various points on the ROC curve were analyzed by varying the scale factor until a desired point, with an FP rate of 2%, was found.
Such an approach is undesirable for calculating the fitness function because of the huge computational load involved. Currently, each evaluation of the fitness function, including training the classifier and evaluating the system on the validation set, takes about two minutes on a PC with a Pentium IV 2.8 GHz CPU and 512 MB of RAM. Since finding a specific point on the ROC curve requires several such evaluations, and this process should be repeated for all chromosomes in the population, the computational load increases dramatically. To be more specific, if the time needed for each evaluation of the fitness function is denoted by T_Evaluation, N_Chromosome evaluations are needed to find a specific point on the ROC curve, and the GA needs to evaluate N_Evaluations chromosomes during its operation, the running time of the GA can be calculated as follows:

$$ T_{GA} = N_{Evaluations} \times N_{Chromosome} \times T_{Evaluation} \qquad (2\text{-}8) $$

Since T_Evaluation is about 2 minutes and N_Evaluations is in the order of thousands (e.g., 5000), it is evident that even for a small N_Chromosome, T_GA will become very large. For the same reason, using the area under the ROC curve is not practical at this stage, since several points on the ROC curve should be estimated for a single evaluation of the fitness function.

Our final configuration incorporated the FP rate as a constraint in the fitness function. We defined the fitness function as follows:

$$ fitness(Chromosome) = \begin{cases} TP & \text{if } FP \le 2\% \\ 0 & \text{if } FP > 2\% \end{cases} \qquad (2\text{-}9) $$

where the TP and the FP rates are expressed in %. In (2-9), the TP rates remain intact only for FP values that do not exceed 2%. For FP>2%, we attenuated the fitness of these chromosomes dramatically in order to prevent the less fit chromosomes from becoming active members of the population. Although such chromosomes had high TP rates, they also had high FP rates, and were considered "unfit"
from a practical point of view. The scale factor was added to the structure of the chromosome because of the expectation that the algorithm is able to find the value of the scale factor that yields the highest TP rate when FP ≤ 2%. In [23], we showed that this was indeed the case. The GA was able to find the scale factor value yielding the highest TP rate for FP ≤ 2%.

The remaining operators of the GA were chosen as follows. Tournament-based selection (tournament size = 3) was used as the selection operator. Uniform crossover (p=0.9) and uniform mutation (p=0.01) operators were used. The sizes of the initial population and of the population in the next generations were chosen as 200 and 100, respectively. We used random initialization for initializing the GA. The number of evaluations was set to 5000 and this criterion was used for the termination of the algorithm. If the improvement in the best solution was found to be less than 1% for more than 10 consecutive generations before the total number of evaluations was reached, the algorithm was terminated. Because of the computational load involved, we did not tune the GA parameter values such as the mutation and crossover rates.

2.5 Experimental results

In this section, the performance of the proposed algorithm is evaluated using the data collected from eight individuals. Off-line data were collected from users positioned 150 cm in front of a computer monitor. The EEG signals were recorded from six bipolar electrode pairs positioned over the users' supplementary motor area and primary motor cortex at F1-FC1, Fz-FCz, F2-FC2, FC1-C1, FCz-Cz, and FC2-C2, in accordance with the International 10-20 System. Features extracted from these channels had been shown to provide more discriminant information for the separation of IC and NC features [1].
Electrooculography (EOG) activity was measured as the potential difference between two electrodes, placed at the corner of and below the right eye. The ocular artifacts were automatically rejected when the difference between the EOG electrodes exceeded ±25 µV. All signals were sampled at 128 Hz and referenced to the ear electrodes (see [1, 36] for details).

Data from four individuals with a high-level spinal cord injury (location of injury between C4-5 and C6-7 on the spinal cord) and four able-bodied individuals were used in this study. The individuals with spinal cord injury were coded as SCI (spinal cord injury) individuals and the able-bodied individuals were coded as AB individuals. None of the individuals with spinal cord injury had residual sensation or motor function in their hands. The users' descriptions are shown in Table 2-3.

The data were collected from the users as they performed a guided task. At each interval, a white circle of 2 cm diameter was displayed on the user's monitor for ? second, prompting the user to attempt a movement. In response to this cue, the user had to attempt to flex his right index finger one second after the cue appeared. The 1-second delay was used to avoid visual evoked potential (VEP) effects from the cue, and the users were trained to estimate it. The 1-second time after the cue is denoted by the "time of the expected attempted movement (TEM)". Note that this is the time when the user is expected to attempt to perform the movement, and that this time may vary from one user to another and from trial to trial. This task resulted in an attempted movement in individuals with spinal cord injury, i.e., no physical finger movement, and an actual finger flexion in able-bodied individuals (see [36] for more details). For each user, an average of 80 trials was collected every day over a period of 6 days.
The data in the EEG signals were divided into segments, each of length equal to seven seconds. A 7-second window was wide enough to contain an MRP pattern as well as NC periods. A training set, a validation set and a test set were then randomly generated for each user from these 7-second windows. The training set was used to train the classifier. The validation set was used to select the optimal values of the design parameters using the proposed GA. The parameter values yielding the least error on the validation set were then selected. The performance of the system was evaluated using the test set. For each user, the epochs were randomly divided into five non-overlapping sets of equal size. The data in the first set were used for training, the data in sets two and three were used for estimating the parameters and the data in sets four and five were used for testing the performance of the selected model. The number of epochs in the training, validation and test sets for each user is reported in the fifth column of Table 2-3.

The features in (2-4) were generated by moving the feature generator over epochs, each of a 7-second length. Since the EEG signal is filtered to frequencies below 4 Hz, the feature generator was shifted by 0.0625 seconds (8 samples), resulting in a total of 112 features in a 7-second epoch.

To determine whether or not an IC command was detected by the system, we defined a sliding window around the TEM. The length of this window was 1.5 seconds (from 0.5 seconds before the TEM to 1 second after the TEM). If an MRP pattern was detected at any time within any such window, the output of the BCI system was activated. This method is similar to those used by other researchers [3, 6, 7]. False positives were assessed in the periods before the system cue appeared and after the user was expected to perform the movement.
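The epoch arithmetic and the scoring rule described above can be illustrated with a short Python sketch. It is only one possible reading of the text: the function names are hypothetical, the position of the TEM inside the 7-second epoch is an assumed input, the cue is taken to appear 1 second before the TEM, and the interval between the cue and the start of the detection window is left unscored.

```python
FS = 128        # sampling rate in Hz
STEP = 8        # shift of the feature generator in samples (0.0625 s)
EPOCH_SEC = 7   # epoch length in seconds


def feature_times(epoch_sec=EPOCH_SEC, fs=FS, step=STEP):
    """Times (in seconds) at which the feature generator produces an output."""
    return [k * step / fs for k in range(epoch_sec * fs // step)]


def score_epoch(detections, times, tem, cue_lead=1.0, before=0.5, after=1.0):
    """Score one epoch.

    detections : one boolean per feature time (True = MRP pattern detected)
    tem        : time of the expected attempted movement within the epoch
                 (hypothetical input; its position is not fixed by the text)
    Returns (ic_detected, false_positives, nc_samples).
    """
    cue = tem - cue_lead
    ic_detected = False
    false_positives = 0
    nc_samples = 0
    for t, hit in zip(times, detections):
        if tem - before <= t <= tem + after:
            # Any detection inside the window around the TEM activates the output.
            ic_detected = ic_detected or hit
        elif t < cue or t > tem + after:
            # NC periods: before the cue and after the expected movement.
            nc_samples += 1
            false_positives += hit
    return ic_detected, false_positives, nc_samples


times = feature_times()
print(len(times))  # number of feature outputs in one 7-second epoch
```

Dividing false_positives by nc_samples, accumulated over all epochs, would give an FP rate in the sense used here, while the fraction of epochs with ic_detected set would give the TP rate.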
In [15], a 5-fold stratified cross-validation process was used to assess the performance of the LF-ASD. The trials in the training sets, validation sets and test sets were chosen randomly. The performance over different validation sets varied very little. Thus, to save on computational time, we did not perform cross-validation over the different validation sets in this study, saving about 20% of the time needed for a 5-fold stratified cross-validation.

Figure 2-3 shows the fitness of the best chromosome in each generation as a function of the generation number for two representative individuals (AB2 and SCI4). Figure 2-3 clearly shows the evolution of the fitness of the best chromosome as the GA explores the search space. Note that in the early stages of the GA, the improvement rate of the fitness of the best chromosome is fast. As the population evolves, the rate of improvement drops. This is because in the early stages of the GA, the value of the scale factor is not properly chosen. The design parameter values are also far from optimal. As the population evolves, the GA is able to find the scale factor value that yields the highest TP rate for FP=2%. This in turn results in a significant improvement in the fitness. As the generation number increases, the scale factor value is more properly set. The rate of improvement thus drops.

Table 2-2 summarizes the performance of the GA. In this table, the average fitness of the population, the fitness of the best chromosome and the fitness of the worst chromosome are reported for both the initial and the final populations. As Table 2-2 shows, the average fitness of the initial population is very low. This result may be due to the following reasons: (1) The parameter values are randomly selected and are far from optimal. (2) The scale factor value is not properly set.
Many chromosomes in the population are thus assigned a fitness value equal to zero since their FP rates are above the threshold of FP=2% (see (2-9)).

[Figure 2-3. The fitness of the best chromosomes as a function of the generation number for two representative individuals. a) AB2; b) SCI4. Both panels plot fitness against generation number (0 to 50).]

As the population evolves through generations, the GA is able to find the optimal value of the scale factor that yields the highest TP rates for FP=2%. Moreover, the choice of optimal parameter values leads to the generation of chromosomes with high fitness, resulting in an increased average fitness of the population. Since the GA found the suitable scale factor values for the chromosomes, the fitness of the weakest chromosome in the population is also dramatically increased.

The performance of the proposed "Automatically User Customized LF-ASD" system, or ALF-ASD, on the test sets is shown in Table 2-3. We compared the performance of the ALF-ASD with that of the latest design of the LF-ASD, whose parameter values were tuned by a human expert [15]. The estimates of the delay parameter values in [15] are shown in Table 2-4.

Table 2-2. Comparison of the fitness value of the initial and final populations (tested on the validation sets).
User      Initial population                           Final population
          Worst fitness  Mean fitness   Best fitness   Worst fitness  Mean fitness   Best fitness
AB1       0              13.45          63.75          76.79          76.90          77.65
AB2       0              13.74          66.19          81.31          81.50          82.99
AB3       0              6.33           54.55          78.03          79.22          80.65
AB4       0              13.66          65.42          82.93          83.82          86.51
SCI1      0              16.00          64.68          75.88          77.46          78.34
SCI2      0              8.93           63.69          79.21          81.74          83.19
SCI3      0              15.77          64.46          70.76          72.42          73.33
SCI4      0              11.94          51.14          73.12          75.81          76.24
Average   0              12.48          61.73          77.25          78.61          79.86

We tested both designs on 10 different randomly chosen datasets. The TP results were then averaged over the 10 sets for a fixed FP rate of 2%. Table 2-3 shows the results of running both algorithms on the data of all individuals. The numbers in parentheses show the standard deviations. The last column shows the difference in TP rate for each user as well as the significance levels of the results, found by applying a two-sample t-test. Before carrying out the t-test, Levene's test for equality of variances was used to determine whether the variances in the t-test should be assumed equal or unequal [37]. The results of Levene's test showed the homogeneity of the variances.

As Table 2-3 shows, the average TP rate was increased to 67.78% from the 61.13% achieved using the method described in [15]. This improvement was statistically significant for 5 users (p<0.01) and non-significant for the remaining three (p>0.05). The average improvement in the TP rate for individuals with spinal cord injury was larger than that for able-bodied individuals. To be more specific, the average TP rate for individuals with spinal cord injury was increased to 64.90% in the current study from the 55.08% achieved using the customization by a human expert (an increase of 9.82%).
As for able-bodied users, the average TP rate was increased to 70.76% in the current study from the 67.17% achieved using the customization by a human expert (an increase of 3.58%).

Interestingly, the standard deviations of the TP rate also dropped from those achieved using the customization by a human expert. For individuals with spinal cord injury, the standard deviation of the TP rate decreased to 4.58% from the 12.32% achieved using the customization by a human expert, while for able-bodied users the standard deviation fell to 2.76% from 3.39%. Overall, the standard deviation of the TP rate was reduced to 4.65%, compared to the 10.57% achieved using the customization by a human expert. These findings indicate that as we remove the inaccuracies introduced by having a human expert estimate the design parameter values, the performances of the individuals become closer to one another. In other words, these results indicate that if the parameter values of the feature generator are correctly determined, the inter-subject variability in terms of performance will decrease.

Table 2-3. TP rates of the LF-ASD and the ALF-ASD (FP=2%).
User                  Disability    Age  Gender  Number of epochs                          LF-ASD       ALF-ASD      Difference in
                      description                Training      Validation    Test          (%)          (%)          the TP (%)
AB1                   N/A           56   M       128           256           256           65.5 (3.6)   70.0 (1.9)   4.5 (p<0.01)
AB2                   N/A           43   M       103           206           206           72.2 (2.3)   72.6 (3.2)   0.5 (p>0.05)
AB3                   N/A           31   F       133           266           266           66.2 (1.4)   67.5 (3.2)   1.2 (p>0.05)
AB4                   N/A           45   M       97            194           194           64.7 (3.4)   72.9 (3.8)   8.2 (p<0.005)
Average (AB users)    N/A           -    -       115.2 (17.9)  230.5 (35.8)  230.5 (35.8)  67.2 (3.4)   70.8 (2.8)   3.6 (p=0.07)
SCI1                  C4/5 (17 y²)  53   M       128           256           256           63.5 (2.0)   64.6 (3.5)   1.1 (p>0.05)
SCI2                  C4/5 (23 y)   56   M       103           206           206           66.0 (4.4)   70.6 (2.6)   4.5 (p<0.005)
SCI3                  C5/6 (4 y)    33   M       91            182           182           39.1 (5.1)   59.3 (4.1)   20.2 (p<0.0001)
SCI4                  C4/5 (5 y)    35   M       85            170           170           51.7 (5.7)   65.0 (5.3)   13.3 (p<0.0001)
Average (SCI)         -             -    -       101.7 (19.0)  203.5 (38.1)  203.5 (38.1)  55.1 (12.3)  64.9 (4.6)   9.8 (p=0.09)
Overall average       -             -    -       108.5         217           217           61.1 (10.6)  67.8 (4.6)   6.7 (p=0.06)

² Indicates number of years since injury.

Table 2-4. Delay parameter values used in the design of the LF-ASD based on the ensemble averages of the MRP patterns in the training data set. Note that β_i and β_j are set to zero and that the same delay parameter values are used for the rest of the bipolar channels. The table is reproduced from [15].

User    α_i    α_j
AB1     95     87
AB2     83     114
AB3     37     21
AB4     128    43
SCI1    112    99
SCI2    95     53
SCI3    39     64
SCI4    89     69

2.6 Discussion and conclusions

An important issue in the design of many BCI systems is the correct detection of the IC pattern (if present) for each user. Since the shape of a neurological phenomenon varies to some extent from one individual to another, it is necessary to consider this variation in the design of BCI systems.
As a result, adjusting the parameter values of the feature extractor (user customization of the feature generator of the BCI system) is necessary for each user. If such user customization is done visually by a human expert, the results may be subjectively biased and unreliable; the customization process also becomes time consuming and exhausting. An automatic method therefore needs to be developed to perform user customization without the intervention of a human expert.

In this Chapter, the effect of automatic user customization of the design parameter values of a self-paced BCI system was analyzed. More specifically, we proposed an automatic method for estimating the shape of an MRP used to drive the output of a self-paced BCI. Since MRPs have been used as the neurological phenomenon in a number of BCI systems, an automatic algorithm to estimate their shape can be used as an effective feature extraction method in those systems.

A GA was implemented to customize, for each user, a self-paced BCI called the LF-ASD. The LF-ASD is one of the few self-paced BCI systems that have been successfully tested online [20], and it has been used by other researchers as well [6]. In the design of the LF-ASD, estimates of the delay parameter values obtained from the ensemble averages may be far from optimal because of the noisy nature of the EEG signals, the presence of artifacts and the psychological factors of each user. In addition, no analytical method currently exists for estimating the normalization parameter values. Until recently, these have been estimated in an ad hoc manner through an exhaustive search of possible values. Automatic customization resolves this problem, since it estimates the parameter values according to their associated cost functions.
We showed that by using a GA, the performance of the LF-ASD improves considerably over the case where the design parameter values were estimated by a human expert [15]. This finding provides additional evidence that automatic user customization boosts the performance of a BCI system. Moreover, the designer is relieved of the cumbersome task of choosing the parameter values of the feature extractor for each user.

One of the interesting findings of this Chapter is that the highest improvements were achieved for individuals with spinal cord injury when the delay parameter values were automatically customized. When the customization was done by a human expert, the highest improvements were achieved for able-bodied users [15], whereas the performance of individuals with spinal cord injury did not improve much. In contrast, the results presented in Table 2-3 show that with automatic user customization the highest improvements were achieved for individuals with spinal cord injury: the average improvement in the TP rate was 3.58% for able-bodied users and 9.82% for individuals with spinal cord injury (an overall improvement of 6.68%, p=0.06). This is probably because individuals with spinal cord injury did not perform an actual movement, so their MRP patterns were not as strong as those of able-bodied users. This resulted in noisier ensemble-average MRP templates for these users, for which visual estimation of the delay parameter values was not straightforward. The proposed automatic user customization method, however, optimizes the performance over single epochs and was thus able to find more suitable delay parameter values. Because of the low number of users, these findings cannot be generalized.
They do, however, provide some preliminary evidence that automatic user customization is necessary for achieving acceptable BCI performance for individuals with spinal cord injury.

We also found that for every individual, the values of the delay parameters found by the GA differed from one channel to another. There are two reasons for this result. First, the spatial distribution of the measured EEG signals was taken into consideration. Since the spatial distribution differs from one channel to another, it is expected that the delay parameter values should also differ. The other reason is the presence of the ENT. The value chosen for each normalization parameter changes the shape of the resultant EEG signals to some extent. Thus, for every value of the normalization window, a new set of delay parameter values should be estimated to correctly detect the presence of a bipolar MRP pattern in the EEG signal. The design parameter values found by the GA also differed from one user to another, providing further evidence that user customization is necessary to achieve acceptable performance values.

Comparison of the average results on test sets in Table 2-2 and Table 2-3 shows a drop of 12.05% in the performance. This drop indicates that the use of more sophisticated classifiers may be beneficial. For example, a support vector machine (SVM) can be used as a classifier, since it minimizes not only the empirical risk (the training error) but also the confidence error (the test error) [38].

Future work includes finding better cost functions. Such a study has not been well explored in self-paced BCI systems. Finding better cost functions that can summarize the confusion matrix more effectively is especially desired in optimization problems. Future work should also include online testing of the ALF-ASD. Specifically, we shall investigate the performance of the ALF-ASD over time.
Since the literature indicates that the shapes of MRPs may change from one day to another, a method that locally tunes the parameter values of the feature generator ahead of each session should be developed.

2.7 Acknowledgements
This work was supported in part by the NSERC under Grant 90278-06 and the CIHR under Grant MOP-72711. The authors would also like to thank Mr. Craig Wilson for his valuable comments on this Chapter.

2.8 References
[1] S. G. Mason and G. E. Birch, "A brain-controlled switch for asynchronous control applications", IEEE Trans. Biomed. Eng., vol. 47, no. 10, pp. 1297-1307, Oct. 2000.
[2] G. E. Birch, P. D. Lawrence and R. D. Hare, "Single-trial processing of event-related potentials using outlier information", IEEE Trans. Biomed. Eng., vol. 40, no. 1, pp. 59-73, Jan. 1993.
[3] S. P. Levine, J. E. Huggins, S. L. BeMent, R. K. Kushwaha, L. A. Schuh, M. M. Rohde, E. A. Passaro, D. A. Ross, K. V. Elisevich and B. J. Smith, "A direct brain interface based on event-related potentials", IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 180-185, Jun. 2000.
[4] J. del R. Millan and J. Mourino, "Asynchronous BCI and local neural classifiers: an overview of the Adaptive Brain Interface project", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 11, no. 2, pp. 159-161, Jun. 2003.
[5] R. Scherer, G. R. Muller, C. Neuper, B. Graimann and G. Pfurtscheller, "An asynchronously controlled EEG-based virtual keyboard: improvement of the spelling rate", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 979-984, Jun. 2004.
[6] E. Yom-Tov and G. F. Inbar, "Detection of movement-related potentials from the electro-encephalogram for possible use in a brain-computer interface", Med. Biol. Eng. Comput., vol. 41, no. 1, pp. 85-93, Jan. 2003.
[7] G. Townsend, B. Graimann and G. Pfurtscheller, "Continuous EEG classification during motor imagery--simulation of an asynchronous BCI", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 12, no. 2, pp. 258-265, Jun. 2004.
[8] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn and J. P. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia", Nature, vol. 442, pp. 164-171, Jul. 13, 2006.
[9] J. F. Borisoff, S. G. Mason and G. E. Birch, "Brain interface research for asynchronous control applications", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 160-164, Jun. 2006.
[10] J. T. Francis and J. K. Chapin, "Neural ensemble activity from multiple brain regions predicts kinematic and dynamic variables in a multiple force field reaching task", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 172-174, Jun. 2006.
[11] G. Pfurtscheller, G. R. Muller-Putz, A. Schlogl, B. Graimann, R. Scherer, R. Leeb, C. Brunner, C. Keinrath, F. Lee, G. Townsend, C. Vidaurre and C. Neuper, "15 years of BCI research at Graz University of Technology: current projects", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 205-210, Jun. 2006.
[12] S. G. Mason and G. E. Birch, "Temporal control paradigms for direct brain interfaces - rethinking the definition of asynchronous and synchronous", in Proc. HCI International Conference, Las Vegas, USA, 2005.
[13] S. G. Mason, A. Bashashati, M. Fatourechi, K. F. Navarro and G. E. Birch, "A Comprehensive Survey of Brain Interface Technology Designs", Ann. Biomed. Eng., vol. 35, no. 2, pp. 137-169, Feb. 2007.
[14] M. Pregenzer and G. Pfurtscheller, "Frequency component selection for an EEG-based brain to computer interface", IEEE Trans. Rehabil. Eng., vol. 7, no. 4, pp. 413-419, Dec. 1999.
[15] A. Bashashati, M. Fatourechi, R. K. Ward and G. E. Birch, "User customization of the feature generator of an asynchronous brain interface", Ann. Biomed. Eng., vol. 34, no. 6, pp. 1051-1060, Jun. 2006.
[16] M. Pregenzer and G. Pfurtscheller, "Frequency component selection for an EEG-based brain to computer interface", IEEE Trans. Rehabil. Eng., vol. 7, no. 4, pp. 413-419, Dec. 1999.
[17] G. Blanchard and B. Blankertz, "BCI Competition 2003--Data set IIa: spatial patterns of self-controlled brain rhythm modulations", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1062-1066, Jun. 2004.
[18] Wenjie Xu, Cuntai Guan, Chng Eng Siong, S. Ranganatha, M. Thulasidas and Jiankang Wu, "High accuracy classification of EEG signal", in Proc. 17th Int. Conf. Pattern Recognition (ICPR 2004), vol. 2, pp. 391-394, 2004.
[19] T. N. Lal, M. Schroder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer and B. Scholkopf, "Support vector channel selection in BCI", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003-1010, Jun. 2004.
[20] S. G. Mason, R. Bohringer, J. F. Borisoff and G. E. Birch, "Real-time control of a video game with a direct brain-computer interface", J. Clin. Neurophysiol., vol. 21, no. 6, pp. 404-408, Nov.-Dec. 2004.
[21] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley Publishing Company, 1989.
[22] T. Back, D. B. Fogel and Z. Michalewicz, Evolutionary Computation. Bristol and Philadelphia: Institute of Physics Publishing, 2000.
[23] M. Fatourechi, A. Bashashati, R. K. Ward and G. E. Birch, "A hybrid genetic algorithm approach for improving the performance of the LF-ASD brain computer interface", in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '05), vol. 5, pp. 345-348, 2005.
[24] C. Babiloni, F. Carducci, F. Cincotti, P. M. Rossini, C. Neuper, G. Pfurtscheller and F. Babiloni, "Human movement-related potentials vs desynchronization of EEG alpha rhythm: a high-resolution EEG study", Neuroimage, vol. 10, no. 6, pp. 658-665, Dec. 1999.
[25] L. Deecke, B. Grozinger and H. H. Kornhuber, "Voluntary finger movement in man: cerebral potentials and theory", Biol. Cybern., vol. 23, no. 2, pp. 99-119, Jul. 14, 1976.
[26] M. Hallett, "Movement-related cortical potentials", Electromyogr. Clin. Neurophysiol., vol. 34, no. 1, pp. 5-13, Jan.-Feb. 1994.
[27] D. P. Burke, S. P. Kelly, P. de Chazal, R. B. Reilly and C. Finucane, "A parametric feature extraction and classification strategy for brain-computer interfacing", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 13, no. 1, pp. 12-17, Mar. 2005.
[28] E. L. Glassman, "A wavelet-like filter based on neuron action potentials for analysis of human scalp electroencephalographs", IEEE Trans. Biomed. Eng., vol. 52, no. 11, pp. 1851-1862, Nov. 2005.
[29] M. Krauledat, G. Dornhege, B. Blankertz, F. Losch, G. Curio and K.-R. Muller, "Improving speed and accuracy of brain-computer interfaces using readiness potential features", in Proc. EMBC Int. Conf., vol. 6, pp. 4511-4515, 2005.
[30] R. Q. Cui and L. Deecke, "High resolution DC-EEG analysis of the Bereitschaftspotential and post movement onset potentials accompanying uni- or bilateral voluntary finger movements", Brain Topogr., vol. 11, no. 3, pp. 233-249, Spring 1999.
[31] B. Blankertz, C. Schäfer, G. Dornhege and G. Curio, "Single trial detection of EEG error potentials: A tool for increasing BCI transmission rates", in Proc. Int. Conf. Artificial Neural Networks (ICANN'02), pp. 1137-1143, 2002.
[32] J. F. Borisoff, S. G. Mason, A. Bashashati and G. E. Birch, "Brain-computer interface design for asynchronous control applications: improvements to the LF-ASD asynchronous brain switch", IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 985-992, Jun. 2004.
[33] Z. Yu, S. G. Mason and G. E. Birch, "Enhancing the performance of the LF-ASD brain-computer interface", in Proc. 2nd Joint IEEE-EMBS/BMES Conference, Houston, TX, USA, vol. 3, pp. 2443-2444, Oct. 2002.
[34] Z. Yu, S. G. Mason and G. E. Birch, "Impact of an energy normalization transform on the performance of the LF-ASD brain computer interface", in Proc. Advances in Neural Information Processing Systems (NIPS'03), vol. 16, pp. 725-732, 2003.
[35] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Prentice Hall, 1984.
[36] G. E. Birch, Z. Bozorgzadeh and S. G. Mason, "Initial on-line evaluations of the LF-ASD brain-computer interface with able-bodied and spinal-cord subjects using imagined voluntary motor potentials", IEEE Trans. Neural Syst. Rehabil. Eng., vol. 10, no. 4, pp. 219-224, Dec. 2002.
[37] K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, 2nd ed. New York: Wiley, 1965.
[38] H. Yoon, K. Yang and C. Shahabi, "Feature subset selection and feature ranking for multivariate time series", IEEE Trans. Knowledge and Data Eng., vol. 17, no. 9, pp. 1186-1198, 2005.

CHAPTER 3 APPLICATION OF A HYBRID WAVELET FEATURE SELECTION METHOD IN THE DESIGN OF A SELF-PACED BRAIN COMPUTER INTERFACE SYSTEM³

3.1 Background
A successful brain computer interface (BCI) system enables individuals with severe motor disabilities to control objects in their environment (such as a light switch, a neural prosthesis or a computer) by using only their brain signals. Such a system measures specific features of a person's brain signal that relate to his or her intent to affect control, and then translates them into control signals that are used to control a device [1, 2].

Brain computer interface systems are implemented in two ways: system-paced (synchronized) or self-paced (asynchronous).
In system-paced BCI systems, a user can initiate a command only during certain periods specified by the system. In a self-paced BCI system, users can affect the output of the BCI system whenever they want, by intentionally changing their brain state. The state in which a user is intentionally attempting to control a device is called an intentional control (IC) state. At other times, users are said to be in a no-control (NC) state, where they may be idle, thinking about a problem, or performing some action other than trying to control the device [3, 4]. To operate in this paradigm, BCI systems should be designed to respond only when the user is in an IC state and to remain inactive when the user is in an NC state. So far, only a few BCI systems (e.g. [3, 5-10]) have been specifically designed and tested for self-paced control applications. But as recognized in [2], self-paced BCI systems deserve more attention.

³ A version of this chapter has been published. Fatourechi, M., Birch, G. E., and Ward, R. K., "Application of a Hybrid Wavelet Feature Selection Method in the Design of a Self-paced Brain Interface System", Journal of NeuroEngineering and Rehabilitation, Vol. 4, No. 1, Apr 2007.

The discrete wavelet transform (DWT) can be used as a powerful feature extraction tool to extract time-frequency features similar in shape to that of a particular wavelet function. It therefore has an advantage over other feature extraction methods that operate in only one domain, such as the Fourier transform or autoregressive modeling.

The DWT has been extensively applied in the analysis of event-related potentials (ERPs) because of its ability to effectively explore both the time and frequency information of these signals [11, 12]. It has also been successfully used to generate wavelet features in BCI systems.
In [13], DWT was employed in the design of a synchronized BCI system that used wavelet coefficients extracted from slow cortical potentials (SCPs) as well as other ERPs. This system performed better than other designs that used EEG time series and a mixed filtering method. In [14], the energies of various frequency bands decomposed by a wavelet packet transform (18 frequency bands in total) were used as features in detecting different movement patterns in a self-paced BCI system. These features were linearly combined to generate a single feature, with the coefficients of the linear mapping determined by a genetic algorithm (GA). In [15], a custom-made wavelet function was employed in two different studies: the detection of P300 in a single EEG channel, and the detection of the Bereitschaftspotential from two EEG channels. In [16], a weighted linear combination of all available wavelet coefficients (15 in total) extracted from a single EEG channel was used to detect P300 patterns. To estimate the weights for each feature in the linear combination, a neural network was employed. Finally, in [17], investigators applied DWT to extract the 0-4 Hz component of the EEG signal in a P300-based BCI system. Based on the above encouraging results, in this study we explore applying DWT to extract movement-related potential (MRP) features for driving a self-paced BCI system.

Although the above BCI studies provide promising evidence that DWT can be employed to extract features in BCI systems, two main issues still need to be addressed. First, studies that used discrete wavelet coefficients as features (rather than wavelet-filtered EEG signals) used only one or two EEG channels. In these cases, the resulting dimensionality of the space does not pose a serious problem, since it is not very large.
Having a BCI system that uses data recorded from only one or two electrodes seems very appealing, since the setup is fast and uses less hardware/software infrastructure. Most of the above-mentioned papers, however, reported a relatively high classification error when only one or two EEG channels were used. For example, in [16], the reported error rates were relatively high (nearly 40% error). In [17], where wavelet-filtered EEG signals were used, the system did not perform well (30% misclassification). For the only self-paced BCI system that has applied wavelet coefficients so far [14], false discovery rates (the percentage of hits that were not true positives) varied up to 67%; however, the authors did not indicate the number of NC epochs used in their study, so critical commentary on the performance of their BCI system cannot be made. The invasiveness of the recording technology of the BCI system in [14] is also an important issue that needs to be considered.

The above observations strongly motivate the use of additional EEG electrodes in BCI systems. With signals recorded from multiple channels, we can explore spatial information, which is expected to yield improvements in classification performance.

Another issue that must be addressed when using DWT to extract features in BCI systems is the feature selection procedure. That is, how many features should be selected and how should they be selected? In [13], all of the 64 wavelet features used for classification were extracted from only one EEG channel. In [15], because of the computational limitations affecting the classifier, only a number of top wavelet features (ranked by the amount of discriminability) were selected. Neither of these approaches is guaranteed to yield the best results, since the feature selection process used was not necessarily optimal.
Using all features does not necessarily provide the best results, because some of the less discriminant features may degrade the classifier's performance [18]. On the other hand, using only a few features that have the highest rank (and filtering out the rest) does not necessarily lead to the optimal classification performance, since there is no guarantee that using only top-ranked features leads to the best classifier performance [19].

Based on the related literature review, we postulate that the information extracted from multiple-electrode signals is necessary for achieving acceptable performance. This in turn leads us to the high dimensionality problem of the feature space, since the feature space dimension is directly affected by the number of electrodes used as well as by the number of features per EEG signal. Since not all the wavelet coefficients provide discriminatory information between the output classes, we postulate that features that better discriminate between the output classes need to be selected to obtain better classification performance. A mechanism for selecting the most discriminating features is thus needed.

Wrapper methods, such as GAs, use the classifier's performance to evaluate a particular feature vector. They provide a good solution for finding the features that work well together by choosing the ones that lead to better classifier performance [20]. The downside of using wrapper methods is time inefficiency. As the dimension of the search space increases, it becomes harder for a wrapper method to find a suitable subset of features that leads to high performance.

In order to benefit from the advantages of both filter and wrapper methods, we decided to employ a hybrid approach. Features carrying the least discriminative information about the output classes were filtered out first.
Then a wrapper method was applied to the reduced feature space to find the features that work well together, i.e., the combination that leads to the best classification performance. We used mutual information (MI) in the filtering stage. Mutual information is a powerful tool for ranking features based on the amount of discriminative information each carries [21]. We then applied a GA in a wrapper approach to select the features that lead to the best classification performance. Genetic algorithms are heuristic methods that can effectively sample large search spaces [22]. They are implemented based on the principles of evolutionary biology, and evolve over many generations. By mimicking this process, GAs are able to evolve solutions to real-world problems. They have been shown to be useful tools in automatically customizing many practical systems [22, 23].

We used a support vector machine (SVM) to classify the selected features into one of two classes: no control (NC) or intentional control (IC). The results of this study show that applying the proposed approach to the offline data collected from four able-bodied individuals yields low false positive (FP) rates at a reasonably high true positive (TP) rate. We also examine the spatial distribution of the selected features. We show that this distribution varies considerably from one individual to another. This finding shows the importance of user customization of BCI systems.

3.2 Data collection
People with severe motor disabilities cannot physically execute certain movements such as a finger flexion, but they are usually able to attempt it. Several studies have shown that recordings of brain signals obtained from attempted and real movements for able-bodied individuals bear many similarities [14, 24-29].
Based on these studies, both attempted and executed movements have been shown to activate similar cortical areas and to generate similar movement patterns. This evidence enables us to base our analysis on the data of able-bodied individuals, who actually execute a particular movement. It is then possible to detect the occurrence of the control command by analyzing signals such as the electromyography (EMG) signal or the output of an actual switch. Such signals can be used to label the brain signals and to evaluate the performance of a BCI. The data analysis of individuals with motor disabilities was thus left to future studies.

The data of four (three male and one female) able-bodied individuals were used in this study. All individuals were right-handed and between 31 and 56 years old. They had all signed consent forms prior to participation in the experiment.

Individuals were positioned 150 cm in front of a computer monitor. The EEG signals were recorded from 13 monopolar electrodes positioned over the individuals' supplementary motor area and primary motor cortex (according to the International 10-20 System at F1, Fz, F2, FC3, FC1, FCz, FC2, FC4, C3, C1, Cz, C2 and C4 locations). Electrooculography (EOG) activity was measured as the potential difference between two electrodes, placed at the corner of and below the right eye. An ocular artifact was considered present when the difference between the EOG electrodes exceeded ±25 µV. All signals were sampled at 128 Hz and referenced to ear electrodes (see [30] for details of the data recording). The recorded signals were then saved on the computer and converted to bipolar EEG signals by calculating the difference between adjacent EEG channels. This procedure was used since it has been shown that bipolar electrodes generate more discriminating MRP features than monopolar electrodes do [3].
This conversion generated the following 18 bipolar EEG channels: F1-FC1, F1-Fz, F2-Fz, F2-FC2, FC3-FC1, FC3-C3, FC1-FCz, FC1-C1, FCz-FC2, C1-Cz, C2-C4, FC2-FC4, FC4-C4, FC2-C2, FCz-Cz, C3-C1, Cz-C2 and Fz-FCz.

Data were collected from individuals as they performed the following guided task. At each interval, a white, 2 cm diameter circle was displayed on the individual's monitor for ? second, prompting the individual to attempt a movement. In response to this cue, the user had to perform a right index finger flexion one second after the cue appeared. The 1-second delay was used to avoid visual evoked potential (VEP) effects caused by the cue (see [31] for more details). For each individual, an average of 80 IC epochs were collected every day over a period of 5 days.

An IC epoch consisted of data collected over an interval containing the movement onset (measured as the finger switch activation), provided that no artifact was detected in that interval. The interval starts t_start seconds before movement onset and ends t_finish seconds after it. There were limitations in choosing the total length (t_start + t_finish). If this length is too long, more artifacts may be present in an IC epoch; as a result, the number of training epochs that are artifact-free based on the criterion used to reject ocular artifacts is reduced. If it is too short, the potential features are poorly explored. Since a simple finger flexion MRP usually starts about 1.5 seconds before the movement and returns to the baseline around 1 second after the movement [32], data obtained from 1.5 seconds before to 1.0 second after the movement onset were analyzed (i.e., t_start = 1.5 seconds and t_finish = 1.0 second).

NC epochs were selected as follows. A window of width (t_start + t_finish) seconds was considered (t_start = 1.5 seconds and t_finish = 1.0 second).
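The bipolar derivation and the epoch window described above can be illustrated with a minimal Python sketch. It assumes that the monopolar recordings are held in a dictionary of equal-length sample lists keyed by electrode name, and that a channel written "A-B" means A minus B; the names `BIPOLAR_PAIRS`, `bipolar`, `epoch_length` and `ic_epoch` are hypothetical and are not taken from the software used in the study. Only the channel pairs, the 128 Hz sampling rate and the window limits come from the text.

```python
FS = 128        # sampling rate in Hz (from the text)
T_START = 1.5   # seconds before movement onset (from the text)
T_FINISH = 1.0  # seconds after movement onset (from the text)

# The 18 bipolar pairs listed in the text; "A-B" is taken to mean A minus B
# (the sign convention is an assumption).
BIPOLAR_PAIRS = [
    ("F1", "FC1"), ("F1", "Fz"), ("F2", "Fz"), ("F2", "FC2"),
    ("FC3", "FC1"), ("FC3", "C3"), ("FC1", "FCz"), ("FC1", "C1"),
    ("FCz", "FC2"), ("C1", "Cz"), ("C2", "C4"), ("FC2", "FC4"),
    ("FC4", "C4"), ("FC2", "C2"), ("FCz", "Cz"), ("C3", "C1"),
    ("Cz", "C2"), ("Fz", "FCz"),
]


def bipolar(monopolar):
    """Return {"A-B": samples}, each sample being monopolar[A] - monopolar[B]."""
    return {
        f"{a}-{b}": [x - y for x, y in zip(monopolar[a], monopolar[b])]
        for a, b in BIPOLAR_PAIRS
    }


def epoch_length(fs=FS, t_start=T_START, t_finish=T_FINISH):
    """Number of samples in a window of (t_start + t_finish) seconds."""
    return int(round((t_start + t_finish) * fs))


def ic_epoch(signal, onset_index, fs=FS, t_start=T_START, t_finish=T_FINISH):
    """Slice one channel from t_start s before to t_finish s after the onset.

    Returns None when the window would fall outside the recording.
    """
    first = onset_index - int(round(t_start * fs))
    last = onset_index + int(round(t_finish * fs))
    if first < 0 or last > len(signal):
        return None
    return signal[first:last]
```

With these values each epoch spans 2.5 s, i.e. 320 samples per bipolar channel. Artifact rejection is not shown here.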
To extract NC epochs, the window was shifted over each EEG signal recorded during NC sessions in steps of 16 samples (0.125 s). Wavelet coefficients were extracted for each epoch that did not contain artifacts.

3.3 Method
The overall structure of the proposed scheme is shown in Figure 3-1. EEG signals were checked for the presence of EOG artifacts. The contaminated epochs were rejected, as explained in Section 3.2.

Figure 3-1. The overall structure of the proposed hybrid method for extracting MRP features.

The continuous wavelet transform (CWT) is defined as the convolution of the signal $x(t)$ with the wavelet functions $\psi_{a,b}(t)$, where $\psi_{a,b}(t)$ is the dilated and shifted version of the wavelet function $\psi(t)$ and is defined as follows:

\[ \psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{3-1} \]

where a and b are the scale and translation parameters, respectively. The CWT maps a signal of one independent variable t into a function of two independent variables a, b. This procedure is redundant and not efficient for algorithmic implementations. Therefore, it is more practical to define the wavelet transform at a discrete scale a and a discrete time b by choosing the set of parameters (such a transform is called a discrete wavelet transform, or DWT) such that

\[ a_j = 2^{j}, \qquad b_{j,k} = 2^{j}\,k \qquad (j, k \text{ are integers}) \tag{3-2} \]

The contracted versions of the wavelet function will match the high-frequency components of the original signal and the dilated versions will match the low-frequency oscillations. Then by correlating the original signal with the wavelet functions of different sizes, the details of the signal at different scales are obtained. The resulting correlation features can be arranged in a hierarchical scheme called multi-resolution decomposition [33], which separates the signal into "details" at different frequency bands and a coarser representation of the signal called an "approximation".

In this study, the rbio3.3 wavelet from the B-spline family was chosen as the wavelet function because it has some similarities with the shape of the classic bipolar MRP pattern. Using a 5-level decomposition resulted in wavelet coefficients corresponding to the following frequency bands (the sampling frequency was 128 Hz): [32-64], [16-32], [8-16], [4-8], [2-4], and [0-2] Hz.

Based on the previous findings in [3], which showed that MRP features are mostly located in the frequency range below 4 Hz, only the lowest frequency bands (i.e., 0-2 Hz and 2-4 Hz) were considered for further analysis of MRPs. Even with this reduction, the resulting feature space dimension ($N_{features}$), which is the product of the number of electrodes ($N_{electrodes}$) and the number of wavelet features per EEG signal ($N_{wavelet}$), that is,

\[ N_{features} = N_{electrodes} \times N_{wavelet}, \]

remained very high. Thus, a feature selection procedure had to be used that could select the features that lead to optimal classification performance. This procedure should specify the selected EEG channels as well as the features selected per channel.

We devised a hybrid feature selection algorithm to meet these requirements. Mutual information (MI) was employed in the filtering stage and a GA was then used to select the optimal set of features.

Although MI has been used elsewhere to filter out the less informative features [21, 34], it is not usually successful at finding features that lead to optimal classification performance. This is because when there are more than three feature dimensions, the calculation of MI is computationally demanding, and impossible for large feature spaces (since the calculation of MI requires the joint probability of features in a high dimension) [21, 34].
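As a quick check of the frequency bands quoted above for the 5-level decomposition at 128 Hz, the nominal dyadic band edges can be computed with a short Python sketch. The function names `dwt_bands` and `feature_space_dimension` are hypothetical, and the band edges are nominal: real wavelet filters are not ideal band-pass filters, so neighbouring bands overlap somewhat.

```python
def dwt_bands(fs, levels):
    """Nominal (low, high) band edges in Hz for a dyadic DWT.

    Returns the detail bands for levels 1..levels (highest band first),
    followed by the approximation band of the last level.
    """
    nyquist = fs / 2.0
    bands = []
    for level in range(1, levels + 1):
        high = nyquist / 2 ** (level - 1)
        bands.append((high / 2.0, high))
    bands.append((0.0, nyquist / 2 ** levels))
    return bands


bands = dwt_bands(fs=128, levels=5)

# Bands retained for MRP analysis in the text: everything below 4 Hz.
mrp_bands = [band for band in bands if band[1] <= 4.0]


def feature_space_dimension(n_electrodes, n_wavelet):
    """N_features as the product of channel count and features per channel."""
    return n_electrodes * n_wavelet
```

The coefficients themselves could be obtained with a wavelet library; with PyWavelets, for instance, `pywt.wavedec(x, "rbio3.3", level=5)` returns the approximation coefficients followed by the detail coefficients from the coarsest to the finest level. Whether the original study used such a library is not stated in the text.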
Thus, MI was only used in our algorithm to discard the least informative features based on the amount of information that each feature carries regarding the output classes.

The MI between the input feature vector X and the output classes Y was calculated as follows:

$$I(X;Y) = H(Y) - H(Y|X) \qquad \text{(3-3)}$$

where

$$H(Y) = -\sum_{j=1}^{M} P(y_j)\,\log_2 P(y_j) \qquad \text{(3-4)}$$

$$H(Y|X) = -\sum_{i=1}^{N} P(x_i) \sum_{j=1}^{M} P(y_j|x_i)\,\log_2 P(y_j|x_i) \qquad \text{(3-5)}$$

$$P(y_j) = \sum_{i=1}^{N} P(x_i)\,P(y_j|x_i) \qquad \text{(3-6)}$$

In these formulae, I represents the mutual information between X and Y, where X = {x_i} (i = 1, 2, 3, ..., N) and Y = {y_j} (j = 1, 2, 3, ..., M), N is the number of input states and M is the number of output states (M = N = 2, since the input and output can only take two values: IC and NC), P(x_i) is the probability of occurrence of an input state x_i, P(y_j) is the probability of the output class y_j when the input is unknown, and P(y_j|x_i) is the probability of the output class y_j when the input state x_i is known.

For each individual, the wavelet coefficient (feature) values corresponding to all the training set data were calculated. Then, using histograms with 10 bins each, the probability function of each feature was estimated and its mutual information with each of the output classes was calculated. The values of MI were calculated for all $N_{features}$ features and then ranked in descending order. The top L features were then selected. In this study, we arbitrarily chose L = 50 to avoid having a feature space with a very high dimension.

After reducing the dimension of the feature space, a GA was used to select a subset of m features from the top L features. To represent each possible combination of features, a binary chromosome of length L was defined. Bit i of the binary chromosome specified whether or not feature i was selected by the GA. A value of "1"
indicated the presence of feature i and a value of "0" indicated its absence in a chromosome.

An important decision in the design of a GA is the definition of a proper fitness function. In the proposed design, a suitable fitness function should consider at least three objectives: maximizing the TP rate, minimizing the FP rate and minimizing the number of features selected by the hybrid feature selection procedure.

The classification performance of a 2-state, self-paced BCI system is usually determined by a confusion matrix, as shown in Table 3-1. In Table 3-1, the FP rate is the percentage of instances for which an NC epoch is misclassified as an IC epoch, the true negative (TN) rate is the percentage of NC epochs being correctly classified, the true positive (TP) rate is the percentage of IC epochs being correctly classified and the false negative (FN) rate is the percentage of IC epochs misclassified as NC epochs. The fitness function should summarize this confusion matrix. For a 2-state self-paced BCI system, we have

$$TP + FN = 100\ (\%) \qquad \text{(3-7)}$$

and

$$FP + TN = 100\ (\%) \qquad \text{(3-8)}$$

Table 3-1. The confusion matrix for a 2-state self-paced BCI system.

                  Predicted Class
Actual Class      IC      NC
IC                TP      FN
NC                FP      TN

Based on (3-7) and (3-8), only TP rates (TPR) and FP rates (FPR) need to be included in the fitness function. One example of a fitness function is a function that maximizes the TPR/FPR ratio. In this paper, the following objective function was used:

$$f(Z) = \begin{cases} \dfrac{TPR(Z)}{FPR(Z)}, & TPR(Z) \ge 20\% \\ 0, & TPR(Z) < 20\% \end{cases} \qquad \text{(3-9)}$$

where Z is a chromosome and f is the fitness function. This fitness function gives a higher fitness level to chromosomes that generate a higher TPR/FPR ratio.
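As a minimal sketch of the fitness function in (3-9), the following Python fragment computes the TPR/FPR ratio from confusion-matrix counts and assigns zero fitness when the TP rate is below 20%. The function names and the handling of a zero FP rate are assumptions made for illustration; the source does not say how that case was treated.

```python
from typing import Tuple


def rates_from_counts(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Convert confusion-matrix counts (Table 3-1 layout) to TPR and FPR in percent."""
    tpr = 100.0 * tp / (tp + fn)
    fpr = 100.0 * fp / (fp + tn)
    return tpr, fpr


def fitness(tpr: float, fpr: float, min_tpr: float = 20.0) -> float:
    """TPR/FPR ratio, or 0 when the TP rate is below min_tpr (both in percent)."""
    if tpr < min_tpr:
        return 0.0
    if fpr == 0.0:
        # Assumption (not from the source): a zero FP rate is treated as
        # unbounded fitness so that the ratio stays defined.
        return float("inf")
    return tpr / fpr


# Hypothetical counts: 34 of 50 IC epochs detected, 10 of 1000 NC epochs misclassified.
tpr, fpr = rates_from_counts(tp=34, fn=16, fp=10, tn=990)
print(fitness(tpr, fpr))
```

A chromosome with a very low FP rate but a TP rate of, say, 15% receives a fitness of 0 under this rule, regardless of its ratio.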
We also postulated that TP rates below 20% were too low for the successful operation of a self-paced BCI system (since they correspond to the detection of fewer than one out of every five IC states, which may lead to user frustration, even though the FP rates might be very low). Such chromosomes were considered "unfit" and were assigned a fitness value of "0".

Next, a lexicographic approach was applied for multi-objective optimization of the GA population [23]. Very briefly, in this approach, the objectives are ranked according to the priorities assigned to them prior to optimization. The objective with the highest priority is used first for comparing the members of the population. In our case, the average of TPR/FPR over the validation sets was first selected as the objective function with the highest priority. The chromosomes were then ranked in a single-objective fashion. Any ties were resolved by comparing the relevant chromosomes again with respect to objectives that were assigned lower priority. The other three objectives were chosen as (1) the average of FP rate over the validation sets, (2) the average of TP rate over the validation sets, and (3) the number of features, resulting in four objectives per chromosome in the GA population. The 2nd and 3rd objectives were ordered such that for two chromosomes with the same TPR/FPR ratio, the one with the lower FP rate was considered to be the fitter chromosome.

The remaining operators of the GA were tournament-based selection (tournament size = 3), uniform crossover and uniform mutation. The sizes of the initial population and the population in the next generations were chosen as 100 and 50, respectively. We used random initialization to initialize the GA. Elitism was used to keep the best performing chromosome of each population in the subsequent populations. The number of evaluations was set to 2000.
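The lexicographic ranking described above can be sketched with a tuple sort key, since Python compares tuples element by element in priority order. The `Candidate` type and its field names are hypothetical, and exact equality is assumed for ties; a practical implementation might compare floating-point objectives within a tolerance.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    ratio: float      # mean TPR/FPR over validation sets (higher is better)
    fpr: float        # mean FP rate in percent (lower is better)
    tpr: float        # mean TP rate in percent (higher is better)
    n_features: int   # number of selected features (lower is better)


def lexicographic_key(c: Candidate) -> Tuple[float, float, float, int]:
    # Negate the "higher is better" objectives so an ascending sort puts
    # the best candidate first; later entries only break earlier ties.
    return (-c.ratio, c.fpr, -c.tpr, c.n_features)


def rank(population: List[Candidate]) -> List[Candidate]:
    """Return the population ordered from fittest to least fit."""
    return sorted(population, key=lexicographic_key)


population = [
    Candidate("A", ratio=50.0, fpr=2.0, tpr=100.0, n_features=30),
    Candidate("B", ratio=50.0, fpr=1.0, tpr=50.0, n_features=28),
    Candidate("C", ratio=60.0, fpr=1.5, tpr=90.0, n_features=40),
    Candidate("D", ratio=50.0, fpr=1.0, tpr=50.0, n_features=25),
]
print([c.name for c in rank(population)])
```

In this toy population, C wins on the first objective, B and D beat A on FP rate, and D beats B only because it uses fewer features.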
If the improvement in the TPR/FPR ratio of the best solution was found to be less than 1% for more than 10 consecutive generations, the algorithm was terminated. Because of the computational load, tuning the GA parameter values (such as the mutation and crossover rates) was not performed.

A support vector machine (SVM) that uses kernel-based learning was chosen as the classifier for evaluating each chromosome in the GA population. In kernel-based learning, all of the beneficial properties of linear classification methods, such as simplicity, are maintained, but the overall classification is nonlinear in the input space, since the feature and input spaces are nonlinearly related [35]. Another reason for selecting an SVM as a classifier is that SVMs not only minimize the empirical risk (training error), they also minimize the confidence error (test error) [36]. We used the LIBSVM software [37], which has also been used in other BCI papers [38, 39].

The evaluation process was as follows. For each individual, IC and NC epochs were randomized and divided into training, validation and test sets. The training set was used to train the classifier, and the validation set was used to select the best set of features. The configuration yielding the best results on the validation set in the multi-objective sense mentioned above was selected, and the performance of the system calculated on the test set was reported. We used a five-fold nested cross-validation for evaluating the performance of the system. For each outer cross-validation set, 20% of the data were used for testing and the rest were used for training and model selection (selection of the optimal subset of features). In order to select the models, the datasets were further divided into five folds. For each fold, 80% of the data were used for training the classifier and 20% were used for model selection.
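The nested five-fold split described above can be illustrated with a small index generator. This is only a sketch under stated assumptions: the function names, the fixed random seed and the interleaved fold assignment are illustrative choices, not details taken from the study.

```python
import random
from typing import Iterator, List, Tuple


def k_fold(indices: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (rest, held_out) index lists for each of k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield rest, held_out


def nested_splits(
    n_epochs: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists for nested k-fold CV."""
    indices = list(range(n_epochs))
    random.Random(seed).shuffle(indices)  # randomize epoch order once
    for development, test in k_fold(indices, k):          # outer loop: 20% test
        for train, validation in k_fold(development, k):  # inner loop: 80/20
            yield train, validation, test


splits = list(nested_splits(100))
train, validation, test = splits[0]
print(len(splits), len(train), len(validation), len(test))
```

With 100 epochs and k = 5 this produces 25 splits, each with 64 training, 16 validation and 20 test epochs.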
To deal with the problem of unbalanced training sets (there were at least 20 times more NC epochs than IC epochs), the size of the NC training feature set was reduced to be the same as the size of the IC training feature set. This was done by randomly selecting epochs from the NC training set.

3.4  Results

In this section, we present the offline analysis of the data of the four individuals described in Section 3.2. We performed a search on the classifier's parameters during the model selection. Our findings showed that a 5th degree polynomial kernel function performed better than the other kernel functions studied (linear, polynomial of degree 3, 4, 6 or 7, and RBF).

Since a five-fold nested cross-validation was used for the performance evaluation, the results were averaged over five runs of the outer validation sets. Columns 1 to 5 of Table 3-2 show the individual identification number, the average TP rate on the test sets, the average FP rate on the test sets, the average TPR/FPR ratio and the average number of features selected by the hybrid feature selection process. The latest performance results of another state-of-the-art self-paced BCI system (the LF-ASD) [40], applied to the data of individuals AB1 to AB4, are presented in columns 6 to 9 of Table 3-2. The numbers in parentheses are the standard deviations. As Table 3-2 shows, our proposed design achieved low FP rates for three of the four individuals (individuals AB1, AB2 and AB4) at a relatively high TP rate. For individual AB3, the TPR results on the test sets were low (although the FP rates remained less than 4%).

Table 3-2. Comparison of the average TP rates, average FP rates, average TPR/FPR and the average number of features.
                 Test set (current study)             Number of features   Test set ([40])                Number of features
Individual ID    TPR (%)       FPR (%)     TPR/FPR    (current study)      TPR (%)      FPR (%)  TPR/FPR  ([40])
AB1              68.0 (4.8)    1.0 (0.3)   68.0       30.6 (1.1)           67.8 (1.4)   2.0      33.9     6
AB2              73.3 (2.6)    1.4 (0.4)   52.4       29.2 (3.3)           74.0 (1.7)   2.0      37.0     6
AB3              33.1 (14.0)   3.9 (1.0)   8.5        23.4 (2.4)           64.0 (1.3)   2.0      32.0     6
AB4              56.1 (4.9)    1.4 (0.7)   40.0       27.0 (2.8)           73.1 (1.8)   2.0      36.6     6
Average          57.4          1.9         30.2       27.5                 69.7         2.0      34.9     6

Next, the spatial distributions of the selected features were examined. The average number of selected features per channel is shown in Table 3-3. The numbers in parentheses show the standard deviation over five runs of outer cross-validation. Figure 3-2 to Figure 3-5 show the number of selected features per channel for all individuals after applying the hybrid selection method (averaged over the number of cross-validation sets). The low standard deviation obtained for all cases shows the robustness of the proposed method over different runs of the algorithm.

Table 3-3. The average number of selected features per channel after applying the hybrid feature selection algorithm.

             Individual ID
Channel      AB1          AB2          AB3          AB4
F1-FC1       3.6 (1.1)    3 (1.2)      1.8 (0.8)    3 (0.7)
F1-Fz        0.0 (0.0)    0.0 (0.0)    0.0 (0.0)    3.4 (0.5)
F2-Fz        0.0 (0.0)    1.6 (0.9)    0.4 (0.5)    0.0 (0.0)
F2-FC2       0.2 (0.4)    2 (0.7)      0.8 (0.8)    0.4 (0.5)
FC3-FC1      1.0 (0.0)    1.0 (0.0)    1.6 (0.9)    0.0 (0.0)
FC3-C3       1 (0.71)     3.0 (0.0)    2.4 (1.14)   1.6 (0.5)
FC1-FCz      0.0 (0.0)    1.0 (0.0)    0.6 (0.5)    1.2 (0.84)
FC1-C1       4.6 (0.5)    2.8 (0.4)    0.0 (0.0)    1.2 (0.4)
FCz-FC2      0.0 (0.0)    2.2 (0.4)    0.6 (0.5)    0.0 (0.0)
C1-Cz        1.6 (0.5)    0.4 (0.5)    3.6 (1.1)    1.2 (0.4)
C2-C4        0.6 (0.5)    2.2 (0.4)    4.4 (0.9)    2.6 (0.9)
FC2-FC4      4.2 (0.4)    1.6 (0.9)    2.2 (1.1)    3.4 (1.1)
FC4-C4       3.2 (0.45)   2 (1.0)      1.8 (0.8)    4.4 (0.5)
FC2-C2       2.0 (0.0)    2.2 (0.4)    0.6 (0.5)    2.2 (0.4)
FCz-Cz       1.6 (0.9)    0.6 (0.5)    0.2 (0.4)    0.8 (0.4)
C3-C1        1 (0.7)      2.0 (0.0)    2.0 (0.0)    0.0 (0.0)
Cz-C2        3.8 (0.4)    0.0 (0.0)    0.0 (0.0)    0.6 (0.5)
Fz-FCz       2.2 (1.3)    1.6 (0.5)    0.4 (0.5)    1.0 (0.7)

3.5  Discussion and conclusions

Discrete wavelet transform (DWT) is a useful feature extraction tool since it explores the time as well as the frequency information of the signal. Although DWT has been employed with some degree of success in a number of synchronized BCI systems, there remain some limitations in its application to self-paced BCI systems (in terms of the large size of the feature space).

Brain computer interface systems that use DWT features have mostly employed only one or two channels (perhaps due to the large dimensionality of the feature space or to limitations imposed by the experimental protocol).
To simultaneously explore the wavelet coefficients (features) of BCIs with more channels (so as to explore the spatial information) and to avoid the problems associated with the resultant large feature space, a two-stage (hybrid) feature selection algorithm is proposed. The first stage uses mutual information (MI) to discard the least informative features. In the second stage, a genetic algorithm (GA) selects those remaining features that lead to better system performance in the sense of meeting multiple objectives.

In our study, the features selected per channel varied considerably from one individual to another, as shown in Figure 3-2 to Figure 3-5. For example, for individual AB1, more features were selected from channels FC1-C1, F1-FC1, Fz-FCz, FC4-C4, FC2-FC4 and Cz-C2, while for individual AB4, more features were selected from channels FC4-C4, FC2-FC4, F1-Fz, C2-C4, F1-FC1, and FC2-C2. These results support the hypothesis that proper channel selection for every individual is necessary to obtain superior performance.

Figure 3-2. Spatial distribution of the average number of selected features for AB1.

Figure 3-3. Spatial distribution of the average number of selected features for AB2.

Figure 3-4. Spatial distribution of the average number of selected features for AB3.

Figure 3-5. Spatial distribution of the average number of selected features for AB4.

Another finding from Figure 3-2 to Figure 3-5 is that the relevant features for each individual were unique. These findings are in contrast to an earlier study done by our group that empirically determined six pairs of electrodes for all individuals (channels F1-FC1, F2-FC2, FC1-C1, FC2-C2, FCz-Cz, and Fz-FCz) [3]. Our findings in this regard are not surprising. The evidence from the literature supports the hypothesis that there is a significant amount of inter-subject variability in terms of generating MRP patterns [41].
The literature also shows that the selected features are not necessarily located in the standard frequency bands or on specific scalp locations, and that the set of selected features differs from individual to individual [42]. These studies support the notion that a customized BCI system should be designed for each individual.

Table 3-3 shows that for each individual, a number of bipolar channels were not selected by the feature selection process (such as channel F1-Fz for individuals AB1, AB2 and AB3, and channel FC3-FC1 for individual AB4). These results indicate that these channels can be eliminated from the analysis in future studies. Moreover, Table 3-3 and Figure 3-2 to Figure 3-5 show that the degree of contribution to the classification performance varies from one channel to another. These results indicate that a channel elimination methodology could be incorporated into the proposed method to further decrease the number of channels used for the operation of the system. Such an approach would rank the channels according to the number of selected features. It would then repeatedly eliminate the channel with the lowest contribution to fitness until the performance drops below a certain threshold (recursive elimination of channels). Systematic elimination of channels can lead to a faster setup of the system as well as decreased computational time. This could be part of future research aimed at moving towards a more practical system.

It should be mentioned that it is difficult to directly compare the results of our study with other BCI studies. This is because the user population (whether or not individuals are able-bodied), the experimental protocols, the evaluation protocol and the neurological phenomenon differ from one study to another.
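The recursive channel elimination idea outlined earlier in this section could be sketched as follows. This is a hypothetical illustration of a proposal that the study leaves to future work: the function name, the `evaluate` callback and the toy fitness used below are all assumptions, and the number of selected features per channel is used as a stand-in for a channel's contribution to fitness.

```python
from typing import Callable, Dict, List


def eliminate_channels(
    feature_counts: Dict[str, float],
    evaluate: Callable[[List[str]], float],
    min_fitness: float,
) -> List[str]:
    """Drop the lowest-ranked channel repeatedly while fitness stays >= min_fitness."""
    # Rank channels from most to fewest selected features.
    kept = sorted(feature_counts, key=feature_counts.get, reverse=True)
    while len(kept) > 1:
        candidate = kept[:-1]  # tentatively drop the lowest-ranked channel
        if evaluate(candidate) < min_fitness:
            break              # performance would fall below the threshold
        kept = candidate
    return kept


# Average feature counts for five AB1 channels, taken from Table 3-3.
counts = {"FC1-C1": 4.6, "FC2-FC4": 4.2, "Cz-C2": 3.8, "F1-Fz": 0.0, "F2-Fz": 0.0}


def toy_fitness(channels: List[str]) -> float:
    # Placeholder fitness (an assumption): in practice this would retrain
    # and validate the classifier using only the given channels.
    return sum(counts[c] for c in channels)


print(eliminate_channels(counts, toy_fitness, min_fitness=10.0))
```

In this toy run the two channels that contributed no features are removed first, and elimination stops before a channel whose removal would push the placeholder fitness below the threshold.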
In addition, the degree of training individuals receive before participating in a BCI experiment varies among studies.

We can, however, compare our current results with the latest design of a state-of-the-art self-paced BCI system called the low frequency-asynchronous switch design (the LF-ASD) [40]. Both studies use the same individuals, the same experimental protocol, the same EEG data and a similar evaluation protocol.

The LF-ASD (originally reported in [3] and later modified as reported in [40]) uses a feature extractor with a shape similar to a wavelet function, and extracts features from six bipolar EEG channels. The Karhunen-Loève Transform (KLT) is used to reduce the 6-dimensional feature space produced by the feature generator to a 2-dimensional space. A 1-NN classifier is used as the feature classifier. A moving average and a debounce algorithm are employed to improve the performance of the system by reducing the number of false activations. The parameter values of the system were estimated by an expert (for details, see [3, 30, 40]).

The latest performance results of the LF-ASD [40], applied to the data of individuals AB1 to AB4, are presented in columns 6 to 9 of Table 3-2. As can be seen from the table, our proposed system has resulted in an increased TPR/FPR ratio for all individuals (with the exception of individual AB3). Specifically, the TPR/FPR ratio increased from 33.9 to 67.7 for individual AB1 (relative improvement of 99.5%), from 37.0 to 52.4 for individual AB2 (relative improvement of 41.6%), and from 36.5 to 39.8 for individual AB4 (relative improvement of 8.9%). These results show that our proposed approach improved the performance of most individuals compared with the latest design of the LF-ASD. The degree of improvement in the TPR/FPR ratio, however, is not statistically significant (at the 0.05 significance level), so tests on the data of more individuals are needed to further substantiate this improvement.
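For reference, the relative improvement figures quoted above follow the usual definition (new value minus baseline, divided by baseline). The short sketch below applies it to the AB2 ratios; the function name is an assumption. Recomputing from the rounded ratios quoted in the text can differ from the reported percentages in the last digit for the other individuals, which would be expected if the reported figures were derived from unrounded values.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return 100.0 * (new - baseline) / baseline


# TPR/FPR ratios for individual AB2 as quoted in the text (LF-ASD, then proposed).
print(round(relative_improvement(37.0, 52.4), 1))  # prints 41.6
```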
Note that the improved performance was achieved at the expense of using more features (please see columns 5 and 9 in Table 3-2).

The relatively poor results obtained for individual AB3 may be partly related to our choice of wavelet function. Note that the wavelet function chosen for this study was based on the similarities between the chosen wavelet function and a typical bipolar MRP ensemble average pattern. However, there is substantial inter-subject variability in the shape of MRPs, especially in single trials [40]. It is expected that by analyzing a more diverse family of wavelet functions, a different wavelet function might be chosen for each individual that would produce superior results.

As mentioned in Section 3.3, we designated the number of features chosen by the MI to be L = 50. Fewer features would have sped up the process of feature selection at the second stage, but might have resulted in a lower fitness value. To test this possibility, we compared the fitness of the best subset of features (see Table 3-2) with that of all features for individual AB1 (see Figure 3-6). In this figure, the thick line shows the fitness of the best configuration (calculated from Table 3-2). The thin line shows the fitness of the classifier as a function of the number of top features. We began by training and testing the classif