"Applied Science, Faculty of"@en . "Electrical and Computer Engineering, Department of"@en . "Non UBC"@en . "DSpace"@en . "Journal of NeuroEngineering and Rehabilitation. 2007 Apr 30;4(1):11"@en . "Fatourechi et al."@en . "Fatourechi, Mehrdad"@en . "Birch, Gary E."@en . "Ward, Rabab Kreidieh"@en . "2016-02-03T23:02:42Z"@* . "2007-04-30"@en . "Background:\r\n Recently, successful applications of the discrete wavelet transform have been reported in brain interface (BI) systems with one or two EEG channels. For a multi-channel BI system, however, the high dimensionality of the generated wavelet features space poses a challenging problem.\r\n \r\n \r\n Methods:\r\n In this paper, a feature selection method that effectively reduces the dimensionality of the feature space of a multi-channel, self-paced BI system is proposed. The proposed method uses a two-stage feature selection scheme to select the most suitable movement-related potential features from the feature space. The first stage employs mutual information to filter out the least discriminant features, resulting in a reduced feature space. Then a genetic algorithm is applied to the reduced feature space to further reduce its dimensionality and select the best set of features.\r\n \r\n \r\n Results:\r\n An offline analysis of the EEG signals (18 bipolar EEG channels) of four able-bodied subjects showed that the proposed method acquires low false positive rates at a reasonably high true positive rate. The results also show that features selected from different channels varied considerably from one subject to another.\r\n \r\n \r\n Conclusion:\r\n The proposed hybrid method effectively reduces the high dimensionality of the feature space. The variability in features among subjects indicates that a user-customized BI system needs to be developed for individual users."@en . "https://circle.library.ubc.ca/rest/handle/2429/56856?expand=metadata"@en . "ralJournal of NeuroEngineering and ssBioMed CentRehabilitationOpen AcceResearchApplication of a hybrid wavelet feature selection method in the design of a self-paced brain interface systemMehrdad Fatourechi*1, Gary E Birch1,2,3 and Rabab K Ward1,3Address: 1Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada, 2Neil Squire Society, Burnaby, BC V5M 3Z3, Canada and 3Institute for Computing, Information and Cognitive Systems, Vancouver, BC V6T 1Z4, CanadaEmail: Mehrdad Fatourechi* - mehrdadf@ece.ubc.ca; Gary E Birch - garyb@neilsquire.ca; Rabab K Ward - rababw@ece.ubc.ca* Corresponding author AbstractBackground: Recently, successful applications of the discrete wavelet transform have beenreported in brain interface (BI) systems with one or two EEG channels. For a multi-channel BIsystem, however, the high dimensionality of the generated wavelet features space poses achallenging problem.Methods: In this paper, a feature selection method that effectively reduces the dimensionality ofthe feature space of a multi-channel, self-paced BI system is proposed. The proposed method usesa two-stage feature selection scheme to select the most suitable movement-related potentialfeatures from the feature space. The first stage employs mutual information to filter out the leastdiscriminant features, resulting in a reduced feature space. Then a genetic algorithm is applied tothe reduced feature space to further reduce its dimensionality and select the best set of features.Results: An offline analysis of the EEG signals (18 bipolar EEG channels) of four able-bodiedsubjects showed that the proposed method acquires low false positive rates at a reasonably hightrue positive rate. The results also show that features selected from different channels variedconsiderably from one subject to another.Conclusion: The proposed hybrid method effectively reduces the high dimensionality of thefeature space. The variability in features among subjects indicates that a user-customized BI systemneeds to be developed for individual users.BackgroundA successful brain interface (BI) system enables individu-als with severe motor disabilities to control objects intheir environment (such as a light switch, neural prosthe-sis or computer) by using only their brain signals. Such asystem measures specific features of a person's brain sig-nal that relate to his or her intent to affect control, thenBrain interface systems are implemented in two ways: sys-tem-paced (synchronized) or self-paced (asynchronous).In system-paced BI systems, a user can initiate a commandonly during certain periods specified by the system. In aself-paced BI system, users can affect the output of the BIsystem whenever they want, by intentionally changingtheir brain state. The state in which a user is intentionallyPublished: 30 April 2007Journal of NeuroEngineering and Rehabilitation 2007, 4:11 doi:10.1186/1743-0003-4-11Received: 13 May 2006Accepted: 30 April 2007This article is available from: http://www.jneuroengrehab.com/content/4/1/11\u00C2\u00A9 2007 Fatourechi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Page 1 of 13(page number not for citation purposes)translates them into control signals that are used to con-trol a device [1,2].attempting to control a device is called an intentional con-trol (IC) state. At other times, users are said to be in a no-Journal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11control (NC) state, where they may be idle, thinking abouta problem, or performing some action other than trying tocontrol the device[3,4]. To operate in this paradigm, BIsystems should be designed to respond only when theuser is in an IC state and to remain inactive when the useris in an NC state. So far, only a few BI systems (e.g. [3,5-10]) have been specifically designed and tested for self-paced control applications. But as recognized in [2], self-paced BI systems deserve more attention.The discrete wavelet transform (DWT) can be used as apowerful feature extraction tool to extract time-frequencyfeatures similar in shape to that of a particular waveletfunction. It therefore has an advantage over other featureextraction methods that operate in only one domain, suchas the Fourier transform, or autoregressive modeling.The DWT has been extensively applied in the analysis ofevent-related potential (ERP) because of its ability toeffectively explore both the time and frequency informa-tion of these signals [11,12]. It has also been successfullyused to generate wavelet features in BI systems. In [13],DWT was employed in the design of a system-paced BIsystem that used wavelet coefficients extracted from slowcortical potentials (SCPs) as well as other ERPs. This sys-tem performed better than other designs that used EEGtime series and a mixed filtering method. In [14], the ener-gies of various frequency bands decomposed by a waveletpacket transform (18 frequency bands in total) were usedas features in detecting different movement patterns in aself-paced BI system. These features were linearly com-bined to generate a single feature, with coefficients of thelinear mapping determined by a genetic algorithm (GA).In [15], a custom-made wavelet function was employed intwo different studies: the detection of P300 in a singleEEG channel, and the detection of the Bereitschafts poten-tial from two EEG channels. In [16], a weighted linearcombination of all available wavelet coefficients (15 intotal) extracted from a single EEG channel was used todetect P300 patterns. To estimate weights for each featurein the linear combination, a neural network wasemployed. Finally, in [17], DWT was applied to extract the0\u00E2\u0080\u00934 Hz component of the EEG signal in a P300-based BIsystem. Based on the above encouraging results, in thisstudy we explore applying DWT to extract movement-related potential (MRP) features for driving a self-paced BIsystem.Although the above BI studies provide promising evi-dence that DWT can be employed to extract features in BIsystems, two main issues still need to be addressed. First,studies that used discrete wavelet coefficients as features(rather than wavelet-filtered EEG signals), used only one orsince it is not very large. Having a BI system that uses datarecorded from only one or two electrodes seems veryappealing, since the setup is fast and uses less hardware/software infrastructure. Most of the above-mentionedpapers, however, achieved a relatively high degree of clas-sification error when only one or two EEG channels wereused. For example, in [16], the reported error rates wererelatively high (nearly 40% error). In [17], where wavelet-filtered EEG signals were used, the system did not performwell (30% misclassification). For the only self-paced BIsystem that has applied wavelet coefficients so far [14],false activation rates (the percentage of hits that were nottrue positives) varied up to 67%, however, the authors didnot indicate the number of NC epochs used in their study,so critical commentary on the performance of their BI sys-tem cannot be made. The invasiveness of the recordingtechnology of the BI system in [14] is also an importantissue that needs to be considered.The above observations strongly motivate the use of addi-tional EEG electrodes in BI systems. With signals recordedfrom multiple channels, we can explore spatial informa-tion, which is expected to yield improvements in classifi-cation performance.Another issue that must be addressed when using DWT toextract features in BI systems is the feature selection proce-dure. That is, how many features should be selected andhow should they be selected? In [13], all of the 64 waveletfeatures used for classification were extracted from onlyone EEG channel. In [15], because of the computationallimitations affecting the classifier, only a number of topwavelet features (ranked by the amount of discriminabil-ity) were selected. None of the above-mentionedapproaches yielded best results (since the feature selectionprocess used was not necessarily optimal). Using all fea-tures does not necessarily provide the best results, becausesome of the less discriminant features may degrade theclassifier's performance [18]. On the other hand, usingonly few features that have the highest rank (and filteringout the rest of features) does not necessarily lead to opti-mal classification performance, since there is no guaranteethat using only top-ranked features leads to the best clas-sifier performance[19].Based on the related literature review, we postulate thatthe information extracted from multiple-electrode signals isnecessary for achieving acceptable performance. This inturn leads us to the high dimensionality problem of thefeature space, since the feature space dimension is directlyaffected by the number of electrodes used as well as by thenumber of features per EEG signal. Since not all the wave-let coefficients provide discriminatory informationPage 2 of 13(page number not for citation purposes)two EEG channels. In these cases, the resulting dimen-sionality of the space does not pose a serious problem,between the output classes, we postulate that features thatbetter discriminate between the output classes need to beJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11selected to obtain better classification performance. Amechanism for selecting the most discriminating featuresis thus needed.Wrapper methods, such as GAs, use the classifier's per-formance to evaluate a particular feature vector. They pro-vide a good solution for finding the features that workwell together by choosing the ones that lead to better clas-sifier performance [20]. The downside of using wrappermethods is time inefficiency. As the dimension of thesearch space increases, it becomes harder for a wrappermethod to find a suitable subset of features that lead to ahigh performance.In order to benefit from the advantages of both filter andwrapper methods, we decided to employ a hybridapproach. Features carrying the least discriminative infor-mation about the output classes were filtered out first.Then a wrapper method was applied to the reduced fea-ture space to find the features that work well together, i.e.,the combination that leads to the best classification per-formance. We used mutual information (MI) in the filter-ing stage. Mutual information is a powerful tool forranking features based on the amount of discriminativeinformation each carries [21]. We then applied a GA in awrapper approach to select the features that lead to thebest classification performance. Genetic algorithms areheuristic methods that can effectively sample large searchspaces [22]. They are implemented based on the princi-ples of evolutionary biology, and evolve over many gener-ations. By mimicking this process, GAs are able to evolvesolutions to real-world problems. They have been shownto be useful tools in automatically customizing manypractical systems [22,23].We used a support vector machine (SVM) to classify theselected features into one of two classes: no control (NC)or intentional control (IC). The results of this study showthat applying the proposed approach to the offline datacollected from four able-bodied subjects yields low falsepositive (FP) rates at a reasonably high true positive (TP)rate. We also examine the spatial distribution of theselected features. We show that this distribution variesconsiderably from one subject to another. This findingshows the importance of user customization of BI sys-tems.Data collectionPeople with severe motor disabilities cannot physicallyexecute certain movements such as a finger flexion, butthey are usually able to attempt it. Several studies haveshown that recordings of brain signals obtained fromattempted and real movements for able-bodied subjectsshown to activate similar cortical areas and to generatesimilar movement patterns. This evidence enables us tobase our analysis on the data of able-bodied subjects, whoactually execute the particular movement. It is then possi-ble to detect the occurrence of the control command byanalyzing signals such as electro-myographic (EMG) sig-nal or the output of an actual switch. Such signals can beused to label the brain signals and to evaluate the per-formance of a BI. The data analysis of individuals withmotor disabilities was thus left to future studies.The data of four (three male and one female) able-bodiedsubjects were used in this study. All subjects were right-handed and between 31 and 56 years old. They had allsigned consent forms prior to participation in the experi-ment.Subjects were positioned 150 cm in front of a computermonitor. The EEG signals were recorded from 13 monop-olar electrodes positioned over the subjects' supplemen-tary motor area and primary motor cortex (according tothe International 10\u00E2\u0080\u009320 System at F1, Fz, F2, FC3, FC1, FCz,FC2, FC4, C3, C1, Cz, C2 and C4 locations). Electro-oculo-graphic (EOG) activity was measured as the potential dif-ference between two electrodes, placed at the corner ofand below the right eye. An ocular artifact was consideredpresent when the difference between the EOG electrodesexceeded \u00C2\u00B1 25 \u00C2\u00B5V. All signals were sampled at 128 Hz andreferenced to ear electrodes (see [30] for details of the datarecording). The recorded signals were then saved on thecomputer and converted to bipolar EEG signals by calcu-lating the difference between the adjacent EEG channels.This procedure was used since it has been shown thatbipolar electrodes generate more discriminating featuresthan do monopolar electrodes [3]. This conversion gener-ated the following 18 bipolar EEG channels: F1-FC1, F1-Fz,F2-Fz, F2-FC2, FC3-FC1, FC3-C3, FC1-FCz, FC1-C1, FCz-FC2,C1-Cz, C2-C4, FC2-FC4, FC4-C4, FC2-C2, FCz-Cz, C3-C1, Cz-C2 and Fz-FCz.The data were collected from subjects as they performedthe following guided task. At each interval, a white, 2 cmdiameter circle was displayed on the subject's monitor for1/4 second, prompting the subject to attempt a move-ment. In response to this cue, the subject had to performa right index finger flexion one second after the cueappeared. The 1-second delay was used to avoid visualevoked potential (VEP) effects caused by the cue (see [31]for more details). For each subject, 80 IC epochs were col-lected on average every day over a period of 5 days.An IC epoch consisted of data collected over an intervalcontaining the movement onset (measured as the fingerPage 3 of 13(page number not for citation purposes)bear many similarities [14,24-29]. Based on these studies,both attempted and executed movements have beenswitch activation) if no artifact was detected in that partic-ular interval. The interval starts at tstart seconds beforeJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11movement onset and ends at tfinish seconds after it. Therewere limitations in choosing the total length of (tstart + tfin-ish). If the length of (tstart + tfinish) increases, more artifactsmay be present in an IC epoch. As a result, the number oftraining epochs that are artifact-free based on the criterionused to reject ocular artifacts will be reduced. If the lengthof (tstart + tfinish) is too short, a poor exploration of potentialfeatures results. Since a simple finger flexion MRP usuallystarts about 1.5 seconds before the movement and returnsback to the normal baseline around 1 second after themovement [32], the data obtained from 1.5 secondsbefore to 1.0 second after the movement onset were ana-lyzed (i.e., tstart = 1.5 seconds and tfinish = 1.0 second).The NC epochs were selected as follows. A window ofwidth (tstart + tfinish) seconds was considered (tstart = 1.5 sec-onds and tfinish = 1.0 second). To extract NC epochs, thewindow was shifted over each EEG signal recorded duringNC sessions by a step of 16 samples (0.1250 sec). Waveletcoefficients were extracted for each epoch that did notcontain artifacts.MethodThe overall structure of the proposed scheme is shown inFigure 1. EEG signals were checked for the presence ofEOG artifacts. The contaminated epochs were rejected, asexplained in the \"Data Collection\" Section.The continuous wavelet transform (CWT) is defined as theconvolution of the signal x(t) with the wavelet functions\u00CF\u0088a,b(t) where \u00CF\u0088a,b(t) is the dilated and shifted version ofthe wavelet function \u00CF\u0088(t) and is defined as follows:where a and b are the scale and translation parameters,respectively. The CWT maps a signal of one independentvariable t into a function of two independent variables a,b. This procedure is redundant and not efficient for algo-rithmic implementations. Therefore, it is more practical todefine the wavelet transform at a discrete scale a and a dis-crete time b by choosing the set of parameters (this trans-form is called a discrete wavelet transform, or DWT), suchthataj = 2-j, bj,k = 2-j\u00C2\u00B7k (j, k are integers) (2)The contracted versions of the wavelet function will matchthe high-frequency components of the original signal andthe dilated versions will match the low-frequency oscilla-tions. Then by correlating the original signal with thewavelet functions of different sizes, the details of the sig-nal at different scales are obtained. The resulting correla-tion features can be arranged in a hierarchical schemecalled multi-resolution decomposition [33]. Multi-resolu-tion decomposition separates the signal into \"details\" atdifferent frequency bands and a coarser representation ofthe signal called an \"approximation\".For our study, the rbio3.3 wavelet from the B-spline familywas chosen as the wavelet function because it has somesimilarities with the shape of the classic bipolar MRP pat-tern. Using a 5-level decomposition method resulted inwavelet coefficients corresponding to the following fre-quency bands (the sampling frequency was 128 Hz): [32-64], [16-32], [8-16], [4-8], [2-4], and [0\u00E2\u0080\u00932] Hz.Based on the previous findings in [3], which showed thatMRP features are mostly located in the frequency rangebelow 4 Hz, only the lowest frequency bands (i.e., 0\u00E2\u0080\u00932 Hzand 2\u00E2\u0080\u00934 Hz) were considered for further analysis of MRPs.Even with this reduced feature space, the resulting featurespace dimension (Nfeatures), which is the product of thenumber of electrodes (Nelectrodes) and the number of wave-let features per EEG signal (Nwavelet). That is, Nfeatures = Nelec-trodes \u00C3\u0097 Nwavelet remained very high. Thus, a feature selectionprocedure had to be used that could select thefeatures thatlead to optimal classification performance. This proce-dure should specify the selected EEG channels as well asthe features selected per channel.We devised a hybrid feature selection algorithm to meetthese requirements. Mutual Information (MI) wasemployed in the filtering stage and a GA was then used toselect the optimal set of features.Although MI has been used elsewhere to filter out the lessinformative features [21,34], it is not usually successful at\u00CF\u0088 \u00CF\u0088a b t at ba,( ) ( )= \u00E2\u008B\u0085\u00E2\u0088\u0092 (1)The overall structure of the proposed methodFigure 1Page 4 of 13(page number not for citation purposes)The overall structure of the proposed method.Journal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11finding features that lead to optimal classification per-formance. This is because when there are more than threefeature dimensions, the calculation of MI is computation-ally demanding, and impossible for large feature spaces(since the calculation of MI requires the joint probabilityof features in a high dimension) [21,34]. Thus, MI wasonly used in our algorithm to discard the least informativefeatures based on the amount of information that eachfeature carries regarding the output classes.The MI between the input feature vector X and the outputclasses Y was calculated as follows:I(X, Y) = H(Y) - H(Y|X) (3)WhereIn these formulae, I represents the mutual informationbetween X and Y, where X = {xi}, (i = 1,2,3,..., N) and Y ={yj}, (j = 1,2,3,..., M), N is the number of input states andM is the number of outputs states (M = N = 2, since theinput and output can only take two values: IC and NC),P(xi) is the probability of occurrence of an input state xi,P(yj) is the probability of the output class yj when theinput is unknown, and P(yj|xi) is the probability of theoutput class yj when the input state xi is known.For each subject, the wavelet coefficient (feature) valuescorresponding to all the training set data were calculated.Then, using histograms with 10 bins each, the probabilityfunction of each feature was estimated and its mutualinformation with each of the output classes was calcu-lated. The values of MI were calculated for all Nfeatures fea-tures and then ranked in descending order. The top Lfeatures were then selected. In this study, we arbitrarilychose L = 50 to avoid having a feature space with a veryhigh dimension.After reducing the dimension of the feature space, a GAwas used to select a subset of m features from the top L fea-tures. To represent each possible combination of features,a binary chromosome of length L was defined. The bit i ofpresence of feature i and a value of \"0\" indicated itsabsence in a chromosome.An important decision in the design of a GA is the defini-tion of a proper fitness function. In the proposed design,a suitable fitness function should consider at least threeobjectives: maximizing the TP rate, minimizing the FP rateand minimizing the number of features selected by thehybrid feature selection procedure.The classification performance of a 2-state, self-paced BIsystem is usually determined by a confusion matrix, asshown in Table 1. In Table 1, the FP rate is the percentageof instances for which an NC epoch is misclassified as anIC epoch, the true negative (TN) rate is the percentage ofNC epochs being correctly classified, the true positive (TP)rate is the percentage of IC epochs being correctly classi-fied and the false negative (FN) rate is the percentage ofmisclassifying an IC epoch as an NC epoch. The fitnessfunction should summarize this confusion matrix. For a2-state self-paced BI system, we haveFN(%) = 100(%) - TP(%) (7)andTN(%) = 100(%) - FP(%) (8)Based on equations (7) and (8), only TP and FP rates needto be included in the fitness function. One example of afitness function is a function that maximizes the ratio. In this paper, the following objective function wasused:where Z is a chromosome and f is the fitness function.This fitness function gives a higher fitness level to chromo-somes that generate a higher ratio. We also postulatedthat TP rates below 20% were too low for the successfuloperation of a self-paced BI system (since they correspondto detection of less than one IC out of every five IC states,H P y P yjjMj( ) ( ) log ( )Y = \u00E2\u0088\u0092 \u00E2\u008B\u0085=\u00E2\u0088\u009112 (4)H P x P y x P y xi j i j ijMiN( | ) ( ) ( | ) log ( | )Y X = \u00E2\u0088\u0092 \u00E2\u008B\u0085 \u00E2\u008B\u0085==\u00E2\u0088\u0091\u00E2\u0088\u0091 211(5)P y P x P y xj i j iiN( ) ( ) ( | )= \u00E2\u008B\u0085=\u00E2\u0088\u00911(6)TPFPf ZTPTP ZFP ZTP( ), %( )( ), %=<\u00E2\u0089\u00A5\u00EF\u00A3\u00B1\u00EF\u00A3\u00B2\u00EF\u00A3\u00B4\u00EF\u00A3\u00B3\u00EF\u00A3\u00B40 2020(9)TPFPTable 1: The confusion matrix for a 2-state self-paced BI system.Actual Class/Predicted Class IC NCIC TP FNPage 5 of 13(page number not for citation purposes)the binary chromosome specified whether or not the fea-ture i was selected by the GA. A value of \"1\" indicated theNC FP TNJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11which may lead to user frustration, even though the FPrates might be very low). Such chromosomes were consid-ered \"unfit\" and were assigned a \"0\" fitness value.Next, a lexicographic approach was applied for multi-objective optimization of the GA population [23]. Verybriefly, in this approach, the objectives were rankedaccording to the priorities assigned to them prior to opti-mization. The objective with the highest priority was usedfirst for comparing the members of the population. In ourcase, the average of over the validation sets was firstselected as the objective function with the highest priority.The chromosomes were then ranked in a single-objectivefashion. Any ties were resolved by comparing the relevantchromosomes again with respect to objectives that wereassigned lower priority. The other three objectives werechosen as (1) the average of FP rate over the validationsets, (2) the average of TP rate over the validation set, and(3) the number of features, resulting in four objectives perchromosome in the GA population. The 2nd and 3rd objec-tives were ordered such that for two chromosomes withthe same ratio, the one with the lower FP rate wasconsidered to be the fit chromosome.The remaining operators of the GA were tournament-based selection (tournament size = 3), uniform crossoverand uniform mutation. The sizes of the initial populationand the population in the next generations were chosen as100 and 50, respectively. We used random initializationto initialize the GA. Elitism was used to keep the best per-forming chromosome of each population in the subse-quent populations.The number of evaluations was set to 2000. If theimprovement in the ratio of the best solution wasfound to be less than 1% for more than 10 consecutivegenerations, the algorithm was terminated. Because of thecomputational load, tuning the GA parameter values(such as the mutation and crossover rates) was not per-formed.A support vector machine (SVM) that uses kernel-basedlearning was chosen to classify each chromosome in theGA population. In kernel-based learning, all of the bene-ficial properties of linear classification methods, such assimplicity, are maintained, however, the overall classifica-for selecting an SVM as a classifier is that SVMs not onlyminimize the empirical risk (training error), they alsominimize the confidence error (test error) [36]. We usedthe LIBSVM software[37], which has also been used inother BI papers [38,39].The evaluation process was as follows. For each subject, ICand NC epochs were randomized and divided into train-ing, validation and test sets. The training set was used totrain the classifier, and the validation set was used to selectthe best set of features. The configuration yielding the bestresults on the validation set in the multi-objective sensementioned above was selected, and the performance ofthe system calculated on the test set was reported. We useda five-fold nested cross-validation for evaluating the per-formance of the system. For each outer cross-validationset, 20% of the data were used for testing and the rest wereused for training and model selection (selection of opti-mal subset of features). In order to select the models, thedatasets were further divided into five folds. For each fold,80% of the data were used for training the classifier and20% were used for model selection.To deal with the problem of unbalanced training sets(there were at least 20 times more NC epochs than ICepochs), the size of the NC training feature set wasreduced to be the same as the size of the training IC fea-ture sets. This was done by randomly selecting epochsfrom the NC training set.ResultsIn this section, we present our offline analysis of the dataof the four subjects described in the \"Data Collection\"Section. We performed a search on the classifier's param-eters during the model selection. Our findings showedthat a 5th degree polynomial kernel function performedbetter than other kernel functions studied (linear, polyno-mial with a degree other than 5 (3, 4, 6 and 7) and RBFkernel).Since a five-fold nested cross-validation was used for theperformance evaluation, the results were averaged overfive runs of the outer validation sets. The columns 1 to 5of Table 2 show the subject identification number, theaverage TP rate on the test sets, the average FP rate on thetest sets, the average ratio and the average number offeatures selected by the hybrid feature selection process.The numbers in parentheses are the standard deviations.As Table 2 shows, low FP rates for three of the four sub-jects (subjects AB1, AB2 and AB4) were achieved for a rel-atively high TP rate. For subject AB3, the TP results on theTPFPTPFPTPFPTPFPPage 6 of 13(page number not for citation purposes)tion is nonlinear in the input space, since the feature andinput spaces are nonlinearly related [35]. Another reasonJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11test sets were low (although the FP rates remained lessthan 4%).Next, the spatial distributions of the selected features wereexamined. The average number of selected features perchannel is shown in Table 3. The numbers in parenthesesshow the standard deviation over five runs of outer cross-validation. Figures 2 to 5 show the number of selected fea-tures per channel for all subjects after applying the hybridselection method (averaged over the number of cross-val-idation sets). The low standard deviation obtained for allcases shows the robustness of the proposed method overdifferent runs of the algorithm.Discussion and conclusionsDiscrete wavelet transform (DWT) is a useful featureextraction tool since it explores the time as well as the fre-quency information of the signal. Although DWT hasbeen employed to some degree of success in a number ofsynchronized BI systems, there remain some limitationsin its application to self-paced BI systems (in terms of thelarge size of the feature space).Brain interface systems that use DWT features have mostlyemployed only one or two channels (perhaps due to thelarge dimensionality of the feature space or to limitationsimposed by the experimental protocol). To simultane-ously explore the wavelet coefficients (features) of BIswith more channels (so as to explore the spatial informa-tion) and to avoid the problems associated with theresultant large feature space, a two-stage (hybrid) featureselection algorithm is proposed. The first stage usesmutual information (MI) to discard the least informativefeatures. In the second stage, a genetic algorithm (GA)selects those remaining features that lead to better systemIn our study, the features selected per channel varied con-siderably from one subject to another, as shown in Figures2 to 5. For example, for subject AB1, more features wereselected from channels FC1-C1, F1-FC1, Fz-FCz, FC4-C4,FC2-FC4 and Cz-C2, while for subject AB4, more featureswere selected from channels FC4-C4, FC2-FC4, F1-Fz, C2-C4,F1-FC1, and FC2-C2. These results support the hypothesisthat proper channel selection for every subject is necessaryto obtain superior performance.Another finding from Figures 2 to 5 is that the relevantfeatures for each subject were unique. These findings arein contrast to an earlier study done by our group thatempirically determined six pairs of electrodes for all sub-jects (channels F1-FC1, F2-FC2, FC1-C1, FC2-C2, FCz-Cz, andFz-FCz) [3]. Our findings in this regard are not surprising.The evidence from the literature supports the hypothesisthat there is a significant amount of intersubject variabil-ity in terms of generating MRP patterns [40]. The literaturealso shows that the selected features are not necessarilylocated in the standard frequency bands or on specificscalp locations, and that the set of selected features differsfrom subject to subject [41]. These studies support thenotion that a customized BI system should be designedfor each subject.Table 3 shows that for each subject, a number of bipolarchannels were not selected by the feature selection process(such as channel F1-Fz for subjects AB1, AB2 and AB3, andchannel FC3-FC1 for subject AB4). These results indicatethat these channels can be eliminated from the analysis infuture studies. Moreover, Table 3 and Figures 2 to 5 showthat the degree of contribution to the classification per-formance varies from one channel to another. TheseTable 2: Comparison of the average TP, average FP rates, average and the average number of features.Subject ID Test Set (Current Study) Number of features (Current Study)Test Set ([42]) Number of Features ([42])TP FP TP FPAB1 66.96 (4.79) 0.99 (0.39) 67.64 30.6 (1.14) 67.80 (1.4) 2.0 33.90 6AB2 73.34 (2.63) 1.40 (0.42) 52.39 29.2 (3.27) 74.0 (1.7) 2.0 37.0 6AB3 33.08 (14.03) 3.88 (1.04) 8.53 23.4 (2.41) 64.0 (1.3) 2.0 32.0 6AB4 56.10 (4.90) 1.41 (0.75) 39.79 27.0 (2.83) 73.1 (1.8) 2.0 36.55 6Average 57.37 1.92 29.88 27.55 69.73 2.0 34.86 6TPFPTPFPTPFPPage 7 of 13(page number not for citation purposes)performance in the sense of meeting multiple objectives. results indicate that a channel elimination methodologycould be incorporated into the proposed method to fur-Journal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11Page 8 of 13(page number not for citation purposes)Spatial distribution of the average number of selected features for Subject AB1Figure 2Spatial distribution of the average number of selected features for Subject AB1.Table 3: The average number of selected features per channel after applying the hybrid feature selection algorithm.Channel/Subject ID AB1 AB2 AB3 AB4F1-FC1 3.6 (1.14) 3 (1.22) 1.8 (0.84) 3 (0.71)F1-Fz 0 (0) 0 (0) 0 (0) 3.4 (0.55)F2-Fz 0 (0) 1.6 (0.89) 0.4 (0.55) 0 (0)F2-FC2 0.2 (0.45) 2 (0.71) 0.8 (0.84) 0.4 (0.55)FC3-FC1 1 (0) 1 (0) 1.6 (0.89) 0 (0)FC3-C3 1 (0.71) 3 (0) 2.4 (1.14) 1.6 (0.55)FC1-FCz 0 (0) 1 (0) 0.6 (0.55) 1.2 (0.84)FC1-C1 4.6 (0.55) 2.8 (0.45) 0 (0) 1.2 (0.45)FCz-FC2 0 (0) 2.2 (0.45) 0.6 (0.55) 0 (0)C1-Cz 1.6 (0.55) 0.4 (0.55) 3.6 (1.14) 1.2 (0.45)C2-C4 0.6 (0.55) 2.2 (0.45) 4.4 (0.89) 2.6 (0.89)FC2-FC4 4.2 (0.45) 1.6 (0.89) 2.2 (1.10) 3.4 (1.14)FC4-C4 3.2 (0.45) 2 (1) 1.8 (0.84) 4.4 (0.55)FC2-C2 2 (0) 2.2 (0.45) 0.6 (0.55) 2.2 (0.45)FCz-Cz 1.6 (0.89) 0.6 (0.55) 0.2 (0.45) 0.8 (0.45)C3-C1 1 (0.71) 2 (0) 2 (0) 0 (0)Cz-C2 3.8 (0.45) 0 (0) 0 (0) 0.6 (0.55)Fz-FCz 2.2 (1.30) 1.6 (0.55) 0.4 (0.55) 1 (0.71)Journal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11ther decrease the number of channels used for the opera-tion of the system. This approach would rank thechannels according to the number of selected features. Itwould then repeatedly eliminate the channel with thelowest contribution to fitness until the performance dropsbelow a certain threshold (recursive elimination of chan-nels). Systematic elimination of channels can lead to afaster setup of the system as well as decreased computa-tional time. This could be part of future research worksaimed at moving towards a more practical system.It should be mentioned that it is difficult to directly com-pare the results of our study with other BI studies. This isbecause the number of subjects, the type of subject(whether or not subjects are able-bodied), the experimen-tal protocols, the evaluation protocol and the neurologi-cal phenomenon differ from one study to another. Inaddition, the number of available EEG epochs, as well asthe degree of training subjects receive before participatingin a BI experiment, vary among studies.the low frequency-asynchronous switch design (the LF-ASD) [42]. Both studies use the same subjects, the sameexperimental protocol, the same EEG data and the sameevaluation protocol.The LF-ASD (originally reported in [3] and later modifiedas reported in [42]) uses a feature extractor with a shapesimilar to a wavelet function, and extracts features fromsix bipolar EEG channels. The Karhunen-Lo\u00C3\u00A8ve Transform(KLT) is used to reduce the 6-dimensional feature spaceproduced by the feature generator to a 2-dimensionalspace. A 1-NN classifier is used as the feature classifier. Amoving average and a debounce algorithm are employedto improve the performance of the system by reducing thenumber of false activations. The parameter values of thesystem were estimated by an expert (for details, see[3,30,42]). The latest performance results of the LF-ASD[42], applied to the data of subjects AB1 to AB4 are pre-sented in columns 6 to 9 of Table 2. As can be seen fromSpatial distribution of the average number of selected features for Subject AB2Figure 3Spatial distribution of the average number of selected features for Subject AB2.Page 9 of 13(page number not for citation purposes)We can, however, compare our current results with the lat-est design of a state-of-the-art self-paced BI system calledthe table, our proposed system has resulted in anJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11increased ratio for all subjects (with the exception ofsubject AB3). Specifically, the ratio increased from33.90 to 67.74 for subject AB1 (a relative improvement of99.5%), from 37 to 52.39 for subject AB2 (a relativeimprovement of 41.6%), and from 36.55 to 39.79 for sub-ject AB4 (a relative improvement of 8.9%). These resultsshow that our proposed approach improved the perform-ance of most subjects compared with the latest design ofthe LF-ASD. The degree of improvements in the ratio,however, is not statistically significant (p > 0.05), so testson the data of more subjects are needed to further sub-stantiate this improvement. Note that the improved per-formance was achieved at the expense of using morefeatures (please see columns 5 and 9 in Table 2).The relatively poor results obtained for subject AB3 maybe partly related to our choice of wavelet function. Noteand a typical bipolar MRP ensemble average pattern.However, there is substantial inter-subject variability inthe shape of MRPs, especially in single trials [42]. It isexpected that by analyzing a more diverse family of wave-let functions, a different wavelet function might be chosenfor each subject that would produce superior results.As mentioned in \"Methods\" Section, we designated thenumber of features chosen by the MI to be L = 50. Fewerfeatures would have sped up the process of feature selec-tion at the second stage, but might have resulted in alower fitness value. To test this possibility, we comparedthe fitness of the best subset of features (see Table 2) withthat of all features for subject AB1 (see Figure 6). In thisfigure, the black line shows the fitness of the best configu-ration (calculated from Table 2). The blue line shows thefitness of the classifier as a function of the number of topfeatures. We began by training and testing the classifierusing only the feature with the highest MI score, and thencalculated the fitness. Then we added features one at atime (according to their MI scores) and trained and testedTPFPTPFPTPFPSpatial distribution of the average number of selected features for Subject AB3Figure 4Spatial distribution of the average number of selected features for Subject AB3.Page 10 of 13(page number not for citation purposes)that the wavelet function chosen for this study was basedon the similarities between the chosen wavelet functionthe classifier using the new set of features. This processwas repeated until we reached L = 50. Although the fitnessJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11of the classifier increased as more features were added, itstayed well below the optimal value achieved by the GA.These results indicate that a lower L (especially when onlylimited top features are used for training the classifier)does not necessarily lead to better performance.A useful area to explore is the automation of the classifier.Currently, the feature selection procedure is automatedbut the selection of other parameter valuea, such as thoseof the classifier, is carried out through cross-validation.Incorporating these parameters into the automation proc-ess would relieve the designer from the tiresome processof selecting the classifier's parameter values, while poten-tially yielding better classification results. Expanding thecurrent results to continuous signals and ultimatelyonline testing are also worthwhile topics for future work.These results should be considered as preliminary resultsin the development of a self-paced brain interface systemwith a low FP rate. Our future work will also include test-ing of the proposed system on a larger pool of subjects toAbbreviationsAbbreviation Complete NameAB Able-bodiedBI brain interfaceCWT continuous wavelet transformDWT discrete wavelet transformEEG ElectroencephalographyEMG ElectromyographyEOG ElectrooculographyERP event-related potentialFN false negativeSpatial distribution of the average number of selected features for Subject AB4Figure 5Spatial distribution of the average number of selected features for Subject AB4.Page 11 of 13(page number not for citation purposes)further investigate its usability. FP false positiveJournal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/11GA genetic algorithmIC intentional controlMI mutual informationMRP movement-related potentialsNC no controlSCP slow cortical potentialsSVM support vector machineTN true negativeTP true positiveWT wavelet transformAuthors' contributionsMF proposed the method, carried out the data analysisand drafted the manuscript. GEB and RKW criticallyrevised the proposed scheme, and were involved in draft-ing the manuscript. All authors read and approved thefinal manuscript.AcknowledgementsThis work was supported in part by NSERC under Grant 90278-06 and CIHR under Grant MOP-72711. This research has been enabled by the use of WestGrid computing resources, which are funded in part by the Canada Foundation for Innovation, Alberta Innovation and Science, BC Advanced Education, and the participating research institutions. The authors would References1. Vaughan TM: Brain-computer interface technology: A reviewof the second international meeting. IEEE Trans Neural SystRehabil Eng 2003, 11:94-109.2. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, VaughanTM: Brain-computer interfaces for communication and con-trol. Clin Neurophysiol 2002, 113:767-791.3. Mason SG, Birch GE: A brain-controlled switch for asynchro-nous control applications. IEEE Trans Biomed Eng 2000,47:1297-1307.4. Mason SG, Birch GE: Temporal control paradigms for directbrain interfaces \u00E2\u0080\u0093 rethinking the definition of asynchronousand synchronous. Proc of HCI Int Conf Las Vegas, USA 2005.5. Birch GE, Lawrence PD, Hare RD: Single-trial processing ofevent-related potentials using outlier information. IEEE TransBiomed Eng 1993, 40:59-73.6. Levine SP, Huggins JE, BeMent SL, Kushwaha RK, Schuh LA, RohdeMM, Passaro EA, Ross DA, Elisevich KV, Smith BJ: A direct braininterface based on event-related potentials. IEEE Trans RehabilEng 2000, 8:180-185.7. Millan Jdel R, Mourino J: Asynchronous BCI and local neuralclassifiers: An overview of the adaptive brain interfaceproject. IEEE Trans Neural Syst Rehabil Eng 2003, 11:159-161.8. Scherer R, Muller GR, Neuper C, Graimann B, Pfurtscheller G: Anasynchronously controlled EEG-based virtual keyboard:Improvement of the spelling rate. IEEE Trans Biomed Eng 2004,51:979-984.9. Yom-Tov E, Inbar GF: Detection of movement-related poten-tials from the electro-encephalogram for possible use in abrain-computer interface. Med Biol Eng Comput 2003, 41:85-93.10. Townsend G, Graimann B, Pfurtscheller G: Continuous EEG clas-sification during motor imagery \u00E2\u0080\u0093 simulation of an asynchro-nous BCI. IEEE Trans Neural Syst Rehabil Eng 2004, 12:258-265.11. Demiralp T, Yordanova J, Kolev V, Ademoglu A, Devrim M, Samar VJ:Time-frequency analysis of single-sweep event-relatedpotentials by means of fast wavelet transform. Brain Lang1999, 66:129-145.12. Samar VJ, Bopardikar A, Rao R, Swartz K: Wavelet analysis of neu-roelectric waveforms: A conceptual tutorial. Brain Lang 1999,66:7-60.13. Hinterberger T, Kubler A, Kaiser J, Neumann N, Birbaumer N: Abrain-computer interface (BCI) for the locked-in: Compari-son of different EEG classifications for the thought transla-tion device. Electroencephalogr Clin Neurophysiol 2003, 114:416-425.14. Graimann B, Huggins JE, Levine SP, Pfurtscheller G: Toward a directbrain interface based on human subdural recordings andwavelet-packet analysis. IEEE Trans Biomed Eng 2004, 51:954-962.15. Glassman EL: A wavelet-like filter based on neuron actionpotentials for analysis of human scalp electroencephalo-graphs. IEEE Trans Biomed Eng 2005, 52:1851-1862.16. Fukuda S, Tatsumi D, Tsujimoto H, Inokuchi S: Studies of inputspeed of word inputting system using event-related poten-tial. Proc 20th Int Conf IEEE Engineering in Medicine and Biology SocietyHong Kong, China 1998, 3:1458-1460.17. Jansen BH, Allam A, Kota P, Lachance K, Osho A, Sundaresan K: Anexploratory study of factors affecting single trial P300 detec-tion. IEEE Tran Biomed Eng 2004, 51:975-978.18. Ding CH: Unsupervised feature selection via two-way order-ing in gene expression analysis. Bioinformatics 2003,19:1259-1266.19. Talavera L: An evaluation of filter and wrapper methods forfeature selection in categorical clustering. Proc 6th Int Symp onIntelligent Data Analysis(IDA05) Madrid, Spain 2005, 3646:440-451.20. Kohavi R, John GH: Wrappers for feature subset selection. ArtifIntell 1997, 97:273-324.21. Battiti R: Using mutual information for selecting features insupervised neural net learning. IEEE Trans on Neural Networks1994, 5:537-550.22. Goldberg DE: Genetic Algorithms in Search, Optimization and MachineLearning Reading, MA: Addison-Wesley Publishing Company; 1989. 23. Back T, Fogel DB, Michalewicz T: Evolutionary Computation Bristol andPhiladelphia: Institute of Physics Publishing; 2000. 24. Beisteiner R, Hollinger P, Lindinger G, Lang W, Berthoz A: MentalComparison of the fitness of the best chromosome vs. other subset of featuresFigure 6Comparison of the fitness of the best chromosome vs. other subset of features.Page 12 of 13(page number not for citation purposes)like to thank Mr. Craig Wilson for his valuable comments on this paper. representations of movements. brain potentials associatedwith imagination of hand movements. Electroencephalogr ClinNeurophysiol 1995, 96:183-193.Publish with BioMed Central and every scientist can read your work free of charge\"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime.\"Sir Paul Nurse, Cancer Research UKYour research papers will be:available free of charge to the entire biomedical communitypeer reviewed and published immediately upon acceptancecited in PubMed and archived on PubMed Central Journal of NeuroEngineering and Rehabilitation 2007, 4:11 http://www.jneuroengrehab.com/content/4/1/1125. Chatrian GE, Petersen MC, Lazarte JA: The blocking of the rolan-dic wicket rhythm and some central changes related tomovement. Electroencephalogr Clin Neurophysiol Suppl 1959,11:497-510.26. Pfurtscheller G, Neuper C: Motor imagery activates primarysensorimotor area in humans. Neurosci Lett 1997, 239:65-68.27. Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M: EEG-baseddiscrimination between imagination of right and left handmovement. Electroencephalogr Clin Neurophysiol 1997, 103:642-651.28. Porro CA, Francescato MP, Cettolo V, Diamond ME, Baraldi P, ZuianiC, Bazzocchi M, Di Prampero PE: Primary motor and sensorycortex activation during motor performance and motorimagery: A functional magnetic resonance imaging study. JNeurosci 1996, 16:7688-7698.29. Cunnington R, Iansek R, Bradshaw JL, Phillips JG: Movement-related potentials associated with movement preparationand motor imagery. Exp Brain Res 1996, 111:429-436.30. Borisoff JF, Mason SG, Bashashati A, Birch GE: Brain-computerinterface design for asynchronous control applications:Improvements to the LF-ASD asynchronous brain switch.IEEE Trans Biomed Eng 2004, 51:985-992.31. Birch GE, Bozorgzadeh Z, Mason SG: Initial on-line evaluations ofthe LF-ASD brain-computer interface with able-bodied andspinal-cord subjects using imagined voluntary motor poten-tials. IEEE Trans Neural Syst Rehabil Eng 2002, 10:219-224.32. Babiloni C, Carducci F, Cincotti F, Rossini PM, Neuper C, Pfurtschel-ler G, Babiloni F: Human movement-related potentials vsdesynchronization of EEG alpha rhythm: A high-resolutionEEG study. Neuroimage 1999, 10:658-665.33. Mallat SG: Multifrequency channel decompositions of imagesand wavelet models. IEEE Trans Acoustics, Speech, and SignalProcessing 1989, 37:2091-2106.34. Kwak N, Chong-Ho Choi: Input feature selection for classifica-tion problems. IEEE Trans Neural Networks 2002, 13:143-159.35. Muller KR, Anderson CW, Birch GE: Linear and nonlinear meth-ods for brain-computer interfaces. IEEE Trans on Neural Syst andRehab Eng 2003, 11:165-169.36. Yoon H, Yang K, Shahabi C: Feature subset selection and fea-ture ranking for multivariate time series. IEEE Trans Knowledgeand Data Eng 2005, 17:1186-1198.37. Chang C, Lin C: LIBSVM: A Library for Support Vector Machines 2001[http://www.csie.ntu.edu.tw/~cjlin/libsvm].38. Kaper M, Meinicke P, Grossekathoefer U, Lingner T, Ritter H: BCIcompetition 2003 \u00E2\u0080\u0093 data set IIb: Support vector machines forthe P300 speller paradigm. IEEE Trans Biomed Eng 2004,51:1073-1076.39. Kaper M, Ritter H: Generalizing to new subjects in brain-com-puter interfacing. Proc 26th IEEE EMBS Annual Int Conf (EMBC'04)San Francisco, USA 2004, 2:4363-4366.40. Evidente VG, Caviness JN, Jamieson B, Weaver A, Joshi N: Intersub-ject variability and intrasubject reproducibility of the bere-itschaftspotential. Mov Disord 1999, 14:313-319.41. Millan J, Franze M, Mourino J, Cincotti F, Babiloni F: Relevant EEGfeatures for the classification of spontaneous motor-relatedtasks. Biol Cybern 2002, 86:89-95.42. Bashashati A, Fatourechi M, Ward RK, Birch GE: User customiza-tion of the feature generator of an asynchronous brain inter-face. Ann Biomed Eng 2006, 34:1051-1060.yours \u00E2\u0080\u0094 you keep the copyrightSubmit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.aspBioMedcentralPage 13 of 13(page number not for citation purposes)"@en . "Article"@en . "10.14288/1.0223918"@en . "eng"@en . "Reviewed"@en . "Vancouver : University of British Columbia Library"@en . "BioMed Central"@en . "10.1186/1743-0003-4-11"@en . "Attribution 4.0 International (CC BY 4.0)"@en . "http://creativecommons.org/licenses/by/4.0/"@en . "Faculty"@en . "Application of a hybrid wavelet feature selection method in the design of a self-paced brain interface system"@en . "Text"@en . "http://hdl.handle.net/2429/56856"@en .