Multimodal Biomedical Signal Processing forCorticomuscular Coupling AnalysisbyXun ChenB.Sc., University of Science and Technology of China, 2009A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Biomedical Engineering)The University Of British Columbia(Vancouver)January 2014c? Xun Chen, 2014AbstractCorticomuscular coupling analysis using multiple data sets such as electroencepha-logram (EEG) and electromyogram (EMG) signals provides a useful tool for under-standing human motor control systems. A popular conventional method to assesscorticomuscular coupling is the pair-wise magnitude-squared coherence (MSC).However, there are certain limitations associated with MSC, including the diffi-culty in robustly assessing group inference, only dealing with two types of datasets simultaneously and the biologically implausible assumption of pair-wise in-teractions. In this thesis, we propose several novel signal processing techniquesto overcome the disadvantages of current coupling analysis methods. We proposecombining partial least squares (PLS) and canonical correlation analysis (CCA)to take advantage of both techniques to ensure that the extracted components aremaximally correlated across two data sets and meanwhile can well explain theinformation within each data set. Furthermore, we propose jointly incorporatingresponse-relevance and statistical independence into a multi-objective optimiza-tion function, meaningfully combining the goals of independent component anal-ysis (ICA) and PLS under the same mathematical umbrella. In addition, we ex-tend the coupling analysis to multiple data sets by proposing a joint multimodalgroup analysis framework. Finally, to acquire independent components but notjust uncorrelated ones, we improve the multimodal framework by exploiting thecomplementary property of multiset canonical correlation analysis (M-CCA) andjoint ICA. Simulations show that our proposed methods can achieve superior per-formances than conventional approaches. We also apply the proposed methods toconcurrent EEG, EMG and behavior data collected in a Parkinson?s disease (PD)study. The results reveal highly correlated temporal patterns among the multimodaliisignals and corresponding spatial activation patterns. In addition to the expectedmotor areas, the corresponding spatial activation patterns demonstrate enhancedoccipital connectivity in PD subjects, consistent with previous medical findings.iiiPrefaceThis thesis is written based on a collection of manuscripts, resulting from the col-laboration of several researchers. The majority of the research, including liter-ature survey, algorithm development and implementation, numerical simulation,real data analysis and result report, are conducted by the author, with suggestionsfrom Prof. Z. Jane Wang and Prof. Martin J. McKeown. The manuscripts are pri-marily drafted by the author, with helpful revisions and comments from Prof. Z.Jane Wang (papers in Chapters 2?5), Prof. Martin J. McKeown (papers in Chap-ters 3?5) and Prof. Rabab K. Ward (papers in Chapter 4).Chapter 2 is based on the following manuscript:? X. Chen, A. Liu, Z. J. Wang, H. Peng, ?Modeling Corticomuscular Activityby Combining Partial Least Squares and Canonical Correlation Analysis,?Journal of Applied Mathematics, vol. 2013, Article ID 401976, 11 pages,2013.Chapter 3 is based on the following manuscript:? X. Chen, C. He, Z. J. Wang, M. J. McKeown, ?An IC-PLS Framework forCorticomuscular Coupling Analysis,? IEEE Transactions on Biomedical En-gineering, vol. 60, no. 7, pp. 2022-2033, 2013.Chapter 4 is based on the following manuscripts:? X. Chen, X. Chen, R. K. Ward, Z. J. Wang, ?A Joint Multimodal GroupAnalysis Framework for Modeling Corticomuscular Activity,? IEEE Trans-actions on Multimedia, Special Issue on Multimodal Biomedical Imaging,vol. 15, no. 5, pp. 1049-1059, 2013.iv? X. Chen, Z. J. Wang, M. J. McKeown, ?A Tridirectional Statistical Model forGroup Corticomuscular Activity Analysis,? Multimedia Signal Processing(MMSP), 2012 IEEE 14th International Workshop on, pp. 309-312, Banff,Canada, Sep. 2012.Chapter 5 is based on the following manuscript:? X. Chen, Z. J. Wang, M. J. McKeown, ?A Three-step Method for Cortico-muscular Activity Modeling with Application to Parkinson?s Disease,? IEEETransactions on Information Technology in Biomedicine, in press, 2013,doi:10.1109/JBHI.2013.2284480.Appendix A is based on the following manuscripts:? X. Chen, Z. J. Wang, ?Design and Implementation of a Wearable, WirelessEEG Recording System,? The 5th International Conference on Bioinformat-ics and Biomedical Engineering, pp. 1-4, May 2011.? X. Chen, Z. J. Wang, ?Pattern Recognition of Number Gestures Based on AWireless Surface EMG System,? Biomedical Signal Processing and Control,vol. 8, no. 2, pp. 184-192, 2013.All experimental procedures in this work were reviewed and approved by theUniversity of British Columbia Clinical Research Ethics Board under CertificateNumbers H04-70291 and H05-70555.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2.1 Wireless EEG and sEMG systems . . . . . . . . . . . . . 21.2.2 Corticomuscular Coupling Analysis . . . . . . . . . . . . 61.3 Challenges and Motivation . . . . . . . . . . . . . . . . . . . . . 81.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Modeling Corticomuscular Activity by Combining PLS and CCA . 122.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2 Experimental Data . . . . . . . . . . . . . . . . . . . . . . . . . 152.3 A Combined PLS and CCA Method . . . . . . . . . . . . . . . . 18vi2.3.1 Motivation and Objectives . . . . . . . . . . . . . . . . . 182.3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 192.3.3 Data Processing and Results . . . . . . . . . . . . . . . . 232.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 333 An IC-PLS Framework for Corticomuscular Coupling Analysis . . 343.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . 343.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353.2.1 A Multi-objective Optimization . . . . . . . . . . . . . . 353.2.2 Determining the Optimization Weights . . . . . . . . . . 373.2.3 Initialization Setting . . . . . . . . . . . . . . . . . . . . 393.2.4 A Group Analysis Framework . . . . . . . . . . . . . . . 413.3 Data Processing and Results . . . . . . . . . . . . . . . . . . . . 423.3.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 423.3.2 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . 483.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 A Joint Multimodal Group Analysis Framework for Modeling Cor-ticomuscular Activity . . . . . . . . . . . . . . . . . . . . . . . . . . 534.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . 534.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544.2.1 The Joint Multimodal Statistical Framework . . . . . . . . 544.2.2 A Group Analysis Architecture . . . . . . . . . . . . . . 604.3 Data Processing and Results . . . . . . . . . . . . . . . . . . . . 614.3.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 614.3.2 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . 644.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 A Three-step Method for Corticomuscular Activity Modeling . . . . 755.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . 755.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765.2.1 Multiset Canonical Correlation Analysis . . . . . . . . . . 765.2.2 Joint Independent Component Analysis . . . . . . . . . . 785.3 Data Processing and Results . . . . . . . . . . . . . . . . . . . . 78vii5.3.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 785.3.2 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . 825.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846 Conclusions and Future Works . . . . . . . . . . . . . . . . . . . . . 886.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886.2 Future Research Directions . . . . . . . . . . . . . . . . . . . . . 91Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93A Prototype Development . . . . . . . . . . . . . . . . . . . . . . . . . 105A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105A.2 A Wireless Wearable EEG Recording System . . . . . . . . . . . 107A.2.1 System Design . . . . . . . . . . . . . . . . . . . . . . . 107A.2.2 Experiments and Results . . . . . . . . . . . . . . . . . . 111A.3 A Wireless Portable sEMG Recognition System . . . . . . . . . . 116A.3.1 System Design . . . . . . . . . . . . . . . . . . . . . . . 116A.3.2 An Application for Chinese Number Guesture Recognition 116B Algorithm Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . 133B.1 The Derivation for The PLS+CCA Method . . . . . . . . . . . . 133B.1.1 The First Step: PLS . . . . . . . . . . . . . . . . . . . . . 133B.1.2 The Second Step: CCA . . . . . . . . . . . . . . . . . . . 136B.2 The Derivation for The IC-PLS Model . . . . . . . . . . . . . . . 137viiiList of TablesTable 2.1 The correlation coefficients between the corresponding sourcepairs of X and Y . . . . . . . . . . . . . . . . . . . . . . . . . . 25Table 4.1 The estimation performances of the proposed method at differ-ent noise levels. . . . . . . . . . . . . . . . . . . . . . . . . . 65Table A.1 Some combinations of different kernels and corresponding results.128ixList of FiguresFigure 1.1 EEG signal examples at different frequency bands. . . . . . . 3Figure 1.2 An example EMG signal with its amplitude envelope. . . . . . 4Figure 1.3 The overview of challenges and objectives of the thesis work. 9Figure 2.1 The squeezing task illustration. . . . . . . . . . . . . . . . . . 16Figure 2.2 Groupings of electrodes in five brain regions. . . . . . . . . . 17Figure 2.3 The simulated source signals . . . . . . . . . . . . . . . . . . 24Figure 2.4 LVs estimated using PLS. . . . . . . . . . . . . . . . . . . . 26Figure 2.5 LVs estimated using CCA. . . . . . . . . . . . . . . . . . . . 27Figure 2.6 LVs estimated using PLS+CCA. . . . . . . . . . . . . . . . . 28Figure 2.7 Two components extracted by PLS+CCA with EEG correla-tion and EMG amplitude as features. . . . . . . . . . . . . . . 32Figure 3.1 The IC-PLS model. . . . . . . . . . . . . . . . . . . . . . . . 42Figure 3.2 The four source signals. . . . . . . . . . . . . . . . . . . . . 43Figure 3.3 Mixed data . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Figure 3.4 LVs estimated using PLS. . . . . . . . . . . . . . . . . . . . 45Figure 3.5 LVs estimated using ICA. . . . . . . . . . . . . . . . . . . . 46Figure 3.6 LVs estimated using IC-PLS. . . . . . . . . . . . . . . . . . . 47Figure 3.7 Two components estimated by the IC-PLS framework withEEG correlations and EMG amplitude as features . . . . . . . 49Figure 3.8 Two components estimated by the IC-PLS framework withEEG coherence and EMG amplitude as features . . . . . . . . 50Figure 4.1 The diagram of the joint multimodal statistical framework. . . 57xFigure 4.2 The diagram of the group analysis architecture . . . . . . . . 61Figure 4.3 The extracted four supLVs and the original four sources. . . . 63Figure 4.4 The original subLVs. . . . . . . . . . . . . . . . . . . . . . . 63Figure 4.5 The modified subLVs. . . . . . . . . . . . . . . . . . . . . . 64Figure 4.6 The postprocessing results for Multi-LVs. . . . . . . . . . . . 65Figure 4.7 Component 1 estimated by group JMSF with EEG correla-tions, EMG amplitude and force output as features. . . . . . . 68Figure 4.8 Component 2 estimated by group JMSF with EEG correla-tions, EMG amplitude and force output as features. . . . . . . 69Figure 4.9 Component 3 estimated by group JMSF with EEG correla-tions, EMG amplitude and force output as features. . . . . . . 70Figure 4.10 Component 1 estimated by group JMSF with EEG band-limitedenergies, EMG amplitude and force output as features. . . . . 71Figure 4.11 Component 2 estimated by group JMSF with EEG band-limitedenergies, EMG amplitude and force output as features. . . . . 72Figure 5.1 The diagram of the three-step method. . . . . . . . . . . . . . 76Figure 5.2 The six simulated source signals. . . . . . . . . . . . . . . . . 79Figure 5.3 The extracted underlying components by separate ICA, jICA,M-CCA and our proposed method. . . . . . . . . . . . . . . . 81Figure 5.4 The significant component jointly extracted from the EEG co-herence, EMG amplitude and BEH data sets. . . . . . . . . . 85Figure 6.1 The relationships between the four proposed methods. . . . . 89Figure A.1 Illustration of the Chinese number gestures. . . . . . . . . . . 106Figure A.2 The architecture of the designed EEG recording system. . . . 107Figure A.3 The schematic of a single channel EEG acquisition circuit. . . 108Figure A.4 The direct current (DC) restorator model. . . . . . . . . . . . 109Figure A.5 Hardware components of the wireless EEG system. . . . . . . 112Figure A.6 ECG and EEG readings from the designed system . . . . . . . 113Figure A.7 Energy analysis of the EEG readings by FFT . . . . . . . . . 114Figure A.8 Alpha waves detected on posterior regions of head. . . . . . . 115Figure A.9 Illustration of the sEMG electrode pair placement . . . . . . . 117xiFigure A.10 The basic diagram of the recognition procedure. . . . . . . . . 118Figure A.11 An example of 4-channel sEMG signals and the correspondingmotion detection results of Chinese number gestures. . . . . . 119Figure A.12 Some confusion matrices of different combinations. . . . . . . 126Figure A.13 Comparisons of the overall recognition accuracies when em-ploying different feature sets and different classifiers. . . . . . 127Figure A.14 Pictures of the hardware and software implementation of theproposed real-time sEMG recognition system. . . . . . . . . . 129Figure A.15 The average recognition accuracy rates of the six subjects overthe eight training sessions. . . . . . . . . . . . . . . . . . . . 130Figure A.16 The average recognition accuracy rates for the ten number ges-ture movements over the eight training sessions. . . . . . . . . 131xiiList of AcronymsACCC autocorrelation and cross-correlation coefficientsADC analog-to-digital converterANN artificial neural networkBEH behavior dataBCI brain-computer interfaceBSS blind source separationCMRR common-mode rejection ratioCCA canonical correlation analysisDC direct currentDRL driven right legECG electrocardiogramEEG electroencephalogramEMG electromyogramEOG electrooculogramPC personal computerfMRI functional magnetic resonance imagingFFT fast Fourier transformHCI human-computer interfacexiiiHOS high order statisticsICA independent component analysisICR Independent component regressionICs independent componentsIVA independent vector analysisJMSF joint multimodal statistical frameworkjICA joint independent component analysisk-NN k-nearest neighborLDA linear discriminant analysisLVs latent variablesLV latent variableM1 primary motor cortexMAV mean absolute valueMAVR mean absolute value ratioMAVS mean absolute value slopeM-CCA multiset canonical correlation analysisMCU microcontrollerMEG magnetoencephalogramMSC magnitude-squared coherenceMKL multiple kernel learningMKL-SVM multiple kernel learning support vector machineMVC maximum voluntary contractionop-amp operational amplifierOSC orthogonal signal correctionxivPCA principal component analysisPCs principal componentsPCB printed circuit boardPD Parkinson?s diseasePLS partial least squaresPPG photoplethysmographQDA quadratic discriminant analysisRBF radial basis functionsEMG surface EMGSPM spectral power magnitudesSSC slope sign changesSTFT short-time Fourier transformsubLVs sub-latent variablessupLV super latent variableSVM support vector machineTD Hudgins? time-domainUPDRS Unified Parkinson?s Disease Rating ScaleUSB Universal Serial BusWL waveform lengthWT wavelet transformZC zero crossingsxvAcknowledgmentsFirst and foremost, I would like to thank my PhD supervisors, Prof. Z. Jane Wangand Prof. Martin J. McKeown, for being great mentors, both professionally andpersonally. This thesis would never have been written without their patient guid-ance and insightful suggestions. I am in particular indebted to them for supportingme over four years and giving me sufficient freedom to explore interesting researchtopics, so that I could finally grow up as an independent researcher.I would like to thank my thesis examination committee members for their timeand efforts. I would also like to thank Prof. Rabab K. Ward and Prof. Victor C.M.Leung for their care and support for my research.I would like to thank my labmates, friends and colleagues at UBC, with whomI have had a memorable campus life over the years. I would also like to expressspecial thanks again to my senior labmates, including Junning Li, Xiaohui Chen,Xudong Lv, Chen He and Joyce Chiang. Without their professional suggestionsand personal encouragement, I would never have grown up that fast.Finally, I am deeply indebted to my family for their endless love and support.A special thank goes to my wife, Aiping Liu, for sharing all the happy as well asfrustrated moments with me all the time.xviChapter 1Introduction1.1 BackgroundDuring the last two decades, due to advances in molecular biology, electronics,computational techniques and biosignal acquisition technologies, there have beenincreasingly active research activities in studying the nervous system. Recent suchadvances allow scientists to investigate fundamental questions in neuroscience re-search ? how the brain is able to perceive, process and act upon the tremendousamount of information flowing across in a seemingly effortless manner. While theanatomical structure of the brain is relatively well studied, the underlying mecha-nism that coordinates various components to act together synergistically is yet to beelucidated. Of particular interest is how various components of the motor systeminteract together to produce coordinated movements. From movement planningto actual execution, this process involves a series of coordinated actions in func-tionally specialized brain regions and muscle groups. Understanding movementcontrol has important implications for the development of brain-computer inter-face (BCI), prosthetic control, deep brain stimulation therapies and treatment formovement disorders [1, 2].The emergence of powerful new measurement techniques such as neuroimag-ing and electrophysiology has provided researchers new opportunities to exploreunknown aspects of brain functioning from different angles. Among these tech-niques, electroencephalogram (EEG) and surface EMG (sEMG) have been the1most popular candidates in many practical biomedical applications, principallydue to their high temporal resolution, noninvasibility, low cost and suitability forlong-term monitoring [11]. For instance, wireless body sensor networks, integrat-ing electrophysiological signals and positioning sensors, have been attracting in-creasing attention for health-monitoring applications, e.g., greatly facilitating homehealth care in a long-distance manner.With the availability of multichannel neural signals, earlier works investigatinghuman motor control systems mainly focus on localization of functionally spe-cialized brain regions during specific motor tasks [1]. However, such approachesonly provide a limited view of the motor control mechanism. Alternative meth-ods have been developed to investigate the integration of functionally related neu-ronal groups, termed as brain connectivity [3]. The study of brain connectivityhas provided not only a system-level view of brain functioning in the normal statebut also an explanatory framework for pathological conditions such as Parkinson?sdisease (PD) [4, 5]. Nevertheless, during motor tasks, it is not convincing to onlyanalyze brain activity measured by EEG but neglect muscle activity measured bysEMG. One main reason is that EEG contains a lot of non-task related backgroundactivities. It is desirable to investigate the coupling between EEG and sEMG. Suchcoupling analysis allows the identification of underlying components from EEGsignals whose temporal patterns are maximally correlated with those of sEMG(i.e., highly modulated by a motor task), while meanwhile discards any non-taskrelated background components. Therefore, jointly analyzing sEMG together withEEG could highly benefit task-related motor control studies.1.2 Related Works1.2.1 Wireless EEG and sEMG systemsEEG SystemsIn the late 1800s, Richard Caton (1842-1926) first reported the presence of bipoten-tials on the surface of the human skull [6]. Later, in 1924, Hans Berger measuredthese electrical signals in the human brain for the first time and provided the first2systematic description for EEG [7]. EEG monitors the electrical activity caused bythe firing of cortical neurons across the human scalp. EEG activity always reflectsthe summation of the synchronous activity of thousands or millions of neurons andshows oscillations at a variety of frequencies. The human EEG activity is mainlycategorized into five bands by frequency: Delta (1?4 Hz), Theta (4?8 Hz), Alpha(8?12 Hz), Beta (12?30 Hz) and Gamma (30?100 Hz) [8]. An example is shownin Fig. 1.1.Figure 1.1: EEG frequency bands [9].EEG has been widely used for studying various neurological conditions suchas epilepsy and coma [10]. Diagnostic applications generally focus on spectralcontent of EEG, reflecting the type of neural oscillations observable in EEG sig-nals. As EEG recording remains the most widespread, noninvasive and inexpen-sive technology with sufficient temporal resolution and it is suitable for continuousmonitoring, the EEG recording systems play an important role in brain studies,especially in diagnosis of brain diseases such as epilepsy, sleeping disorder andabnormal behavior [11].In current EEG recording systems, the EEG recorders based on personal com-puter (PC) generally communicate with the medical instruments through the com-puter I/O interface. Such systems usually adopt a wired, serial port interface, such3as RS-232C standard, to transmit the measured EEG data. It can be inconvenientand uncomfortable for daily-life usage because of the wired transmission lines.Recent advances in electronic, communications and information technologies havestimulated great interest world-wide in the development of portable, battery op-erated biomedical instruments [12]. This has been particularly true in the caseof electrocardiogram (ECG) systems, which have been increasingly portable. Forcontinuous monitoring of EEG signals, it is also desirable to have a portable EEGmeasurement system for reliably measuring brain activity. Size, power consump-tion, and wireless communication capability are important factors to be consideredin designing such EEG systems [13].sEMG SystemsThe first actual recording of electromyogram (EMG) was made by Marey in 1890.Later in 1922, Gasser and Erlanger used an oscilloscope to show the electricalsignals from muscles. Then, researchers began to use improved electrodes morewidely for the study of muscles. The clinical use of EMG is mainly for diagnosis ofneurological and neuromuscular problems. A typical EMG signal and its amplitudeenvelope are illustrated in Fig. 1.2.Figure 1.2: An example of an EMG signal [9].4Significant insights have been acquired regarding the underlying neural mech-anisms controlling movements, via the simultaneous measurement of sEMG andbody kinematics during tasks such as walking, swimming, and scratching [14].Due to the presence of electrical wires between the electrodes and the sEMG de-vice, as well as the wire between the sEMG device and the computer, sEMG mea-surements were subject to an inherent limitation in the past. These sEMG devicescould also be affected by noise, resulting from the movements of the wires. Inrecent years, although some commercial wireless sEMG devices, such as TrignoWireless System produced by Delsys, have appeared, many research groups arestill developing their own wireless sEMG systems since it is more convenient forresearchers to design and modify their systems according to their specific researchneeds. We also have designed a sEMG system by appropriately modifying an orig-inal wireless EEG system.Among sEMG applications, hand gesture recognition based on forearm sEMGhas been an active research area due to its broad applications in myoelectric control.With using wired or wireless sEMG sensors, human motions can be captured non-invasively by sEMG signals and such sEMG signals can be intelligently recognizedas control commands in many myoelectric systems such as multifunction pros-thesis, wheelchairs, virtual keyboards, gesture-based interfaces for virtual realitygames etc. [15, 16]. Depending on the involved movements, current sEMG-basedrecognition and classification research can be divided into three main categories:gross hand, wrist and arm movement recognition; individual finger activation andmovement detection; and multiple finger gesture classification.The majority of previous research has been focused on gross hand, wrist andarm movement recognition and quite high recognition accuracy can be achieved.For instance, with two, four, five or eight sEMG channels, a high classificationaccuracy which is above 95% was reported for classifying four or six movements[17?19]. However, it is worth noting that these movements, such as palm extensionand closure, wrist flexion and extension, wrist pronation and supination, representrelatively easy classification problems since they generate distinguishing sEMGactivation patterns.For the category of individual finger activation and movements, such as fingertyping, generally involving flexion and extension of the individual thumb, index,5middle, ring and little fingers, several recent works have been reported along withthis research direction [20?22]. Researchers in [21] used five channels to collectforearm sEMG signals for a piano-tapping task, in which the subjects tapped a key-board with each of the five fingers, and a 85% recognition accuracy was achievedby using artificial neural network classifiers. In [22], though a 98% high accuracycould be achieved for classifying 12 flexion and extension movements of individ-ual fingers, 32 sEMG channels were used which is highly impractical for real-lifeonline applications.Regarding the research direction of multiple finger gesture classification, to ourknowledge, relatively few research papers were published and they didn?t investi-gate benchmark multi-finger movement tasks.1.2.2 Corticomuscular Coupling AnalysisCorticomuscular coupling analysis, i.e. studying simultaneous cortical and mus-cular activities typically during sustained isometric muscle contraction, is a keytechnique to assess functional interactions in the motor control system. The mostpopular method is magnitude-squared coherence (MSC), a normalized measure ofcorrelation between two waveforms or signals in the frequency domain, which iscalculated byCoh( f ) = |Se?e?( f )|2 /(Se?e?( f )Se?e?( f )), (1.1)where |Se?e? | is the cross-spectrum between the signals e? and e?; Se?e? is the au-tospectrum of the signal e?; Se?e? is the autospectrum of the signal e?. MSC hasbeen successfully used to assess the interactions between motor-related brain areasand the muscles. For example, in monkeys, MSC in the 20-30 Hz band can be de-tected between cortical local field potentials and the rectified EMG from contralat-eral hand muscles that are modulated during different phases of a precision griptask [23]. In humans, similar findings of beta-band corticomuscular coherenceshave also been detected during isometric contractions utilizing both magnetoen-cephalogram (MEG) [24] and electroencephalography EEG [25] recordings overthe primary motor cortex. MSC has proved useful in assessing diseases of the hu-man motor system, particularly PD. An inverse relation exists between beta-bandcorticomuscular MSC and the clinical sign of bradykinesia in PD [26]. Cerebro-6muscular MSC appears unaffected in early PD, yet beta band oscillations of bilat-eral primary sensorimotor cortices are already increased at the earliest stages ofPD [27]. Later on, cerebro-muscular MSC is reduced in non-medicated PD, andmedication [28] or deep brain stimulation [26] normalizes this deficiency. Finallyprimary motor cortex (M1)?muscular MSC is strongly reduced for both alpha andbeta bands during repetitive movement compared to static contraction, but this isunaffected by administration of levodopa [29].Recently, several data-driven multivariate methods, such as partial least squares(PLS), have been developed for analyzing biological data that may be appropriatefor assessing corticomuscular coupling as they establish a dependency relation-ship between two types of data sets. PLS relies on latent variables (LVs), whichmay enhance aid in the biological interpretation of the results. PLS is widely usedin practical applications, probably because PLS can handle high dimensional andcollinear data - frequently the case in real-world biological applications. PLS ex-ploits the covariation between predictor variables and response variables and ex-tracts a new set of latent components that maximally relate them [32]. In otherwords, the covariance between the extracted LVs should be maximized asmaxw1,w2(w1T XTY w2)2,s.t. wiT wi = 1, i = 1,2(1.2)where wi?s (i = 1,2) are the weight vectors, X(N? p) is the predictor matrix andY (N? q) is the response matrix with N representing the number of observations.PLS and its variants have been investigated in many medical applications, such asdetermining the spatial patterns of brain activity in functional magnetic resonanceimaging (fMRI) data associated with behavioral measures [33], and the commontemporal components between EEG and fMRI signals [34]. In a recent PD study,PLS has been extended to perform group level analysis and even accommodatemultiway (e.g., time, PLS channel, and frequency) data [30].71.3 Challenges and MotivationAlthough MSC has been popular in studying corticomuscular coupling, it suffersfrom several limitations. First, incorporating inter-subject variability in order tomake a robust group inference is not straightforward with MSC because the exactfrequency of maximum coupling may be inconsistent across subjects, necessitatingpost hoc pooled coherence analyses in an attempt to combine the original coher-ence estimates into a single representative estimate [35]. Second, the pair-wiseMSC concentrates on assessing the role of an individual locus in the brain in driv-ing the motor system while motor activity is known to be more distributed [36]. Infact, recent work has suggested that interactions between brain regions correspondmore closely to ongoing EMG than activity at discrete sites [30]. Moreover, whenthe brain activity is measured by EEG, applying MSC directly to raw EEG andEMG signals normally yields a very low coherence value, because only a smallfraction of ongoing EEG activity is related to motor control [31]. Therefore ex-tensive statistical testing, with all the accompanying assumptions, is required todetermine whether the EEG/EMG coherence is, in fact, significant.Regarding PLS, there exist several concerns with it and its extended methods,which hinder their applications to multimodal corticomuscular activity analysis.First, in the regular PLS, multiway PLS [33] and multiblock PLS [37] frameworks,only two types of data sets can be processed at the same time, e.g. fMRI andbehavior measurements [33] or EEG and EMG signals [30], while in many casesmore than two types of data sets are available and a better understanding couldbe achieved from analyzing multimodal data jointly. Another concern lies in thegoal of PLS, which is to maximize covariance but not correlation. This may makethe extracted underlying components across the data sets highly correlated, but notmaximally correlated. In addition, PLS can only extract uncorrelated LVs and theirinterpretations may be difficult in real applications [38] since it might be insuffi-cient to consider only up to the second-order statistics (e.g. correlation and covari-ance) for obtaining a unique LV model [39] if the data are not strictly multivariateGaussian.According to the aforementioned limitations, the main technical challengescould be summarized as poor signal-to-noise ratio, inter-subject variability, hard8biological interpretation and multimodality. The goal of this research work is todevelop novel multimodal signal processing techniques to address these challengesarising when modeling corticomuscular activity from EEG, EMG and behavioralrecordings. To make this happen, the corresponding objectives are to correlate EEGwith the task, perform group analysis, explore higher order statistics and establishmultimodal cost functions. Specifically, the main technical contributions are:1. Investigate the complementary relationship between PLS and canonical cor-relation analysis (CCA), and combine them to model the cortical and mus-cular activity.2. Develop a novel coupling analysis method, termed as IC-PLS, for model-ing EEG-EMG activity which incorporates the concept of independence tocircumvent the statistical insufficiency of the second-order statistics.3. Propose a joint multimodal group analysis framework (JMSF) for investi-gating the relationships between cortical, muscular and behavioral measure-ments under the uncorrelatedness assumption.4. Design a three-step method based on the above approach under the indepen-dence assumption.Figure 1.3: The overview of challenges and objectives of the thesis work.Fig. 1.3 illustrates all the challenges, objectives, contributions and their re-lationships. The developed techniques are applied to EEG, EMG and behavioral9data collected from both healthy subjects and PD patients when they perform adynamic, visually guided tracking task.1.4 Thesis OutlineThe thesis outline is summarized as follows:In Chapter 2, we first overview existing coupling analysis methods and summa-rize their advantages and disadvantages. We then describe the data sets collectedfrom a visually guided tracking task and explain the data preprocessing procedures.After that, we propose combining PLS and CCA to improve the performance of thejoint latent variable (LV) extraction. The proposed PLS+CCA has a two-step mod-eling strategy. PLS first extracts LVs which are able to most explain individual dataset and meanwhile are well correlated to the LVs in the other data set. Then CCAis utilized to extract the LVs by maximizing the correlation coefficients. The ex-tracted components are guaranteed to be maximally correlated across two data setsand meanwhile well explain the information within individual data sets.In Chapter 3, to overcome the insufficiency of the uncorrelatedness assump-tion, we propose an IC-PLS framework to simultaneously incorporate response-relevance and independence into the regression procedure, keeping the LVs maxi-mally independent and uniquely sorting the LVs in order of relevance. When ap-plied to corticomuscular coupling analysis, the proposed IC-PLS is able to extractthe most significant LV pairs from concurrent EEG and EMG data in an orderlymanner.In Chapter 4, to accommodate multimodal data sets, we propose a joint multi-modal statistical framework to simultaneously model multiple data spaces, keepingthe LVs uncorrelated within each data set and meanwhile highly correlated acrossmultiple data sets.In Chapter 5, to facilitate interpretations in real medical applications, we incor-porate independence into the joint multimodal framework and propose a three-stepmethod by combining multiset canonical correlation analysis (M-CCA) and jointindependent component analysis (ICA). This method is able to explore the re-lationships between multimodal data sets and meanwhile extract independent LVswithin each data set. We applied these proposed methods to concurrent EEG, EMG10and BEH data collected from normal subjects and patients with PD when perform-ing a dynamic force tracking task. We utilized the proposed methods to extracthighly correlated temporal patterns among the three types of signals (features) andreported meaningful connectivity patterns.Finally, the conclusions of the thesis and suggestions for future research aresummarized in Chapter 6.In Appendix A, we present the entire design process of a wearable and wire-less EEG recording system and report a few testing examples. Then, we introducea real-time sEMG hand gesture recognition system, including the hardware design,data collection, feature extraction and classification components. For feature ex-traction and classification, we investigate the most popular methods and comparetheir performances systematically. Further, we propose employing multiple kernellearning support vector machine (MKL-SVM) for hand gesture recognition anddemonstrate its superior performance. The reason we put this part in Appendixis that we did not use our developed system to collect data for corticomuscularcoupling analysis although our original purpose was to utilize an integrated wire-less EEG and EMG system for patients? convenience. We were required to usecommercialized medical devices in scientific publications.11Chapter 2Modeling CorticomuscularActivity by Combining PLS andCCA2.1 IntroductionCorticomuscular activity modeling is important for assessing functional interac-tions in the motor control system, i.e. studying simultaneous cortical and mus-cular activities during a sustained isometric muscle contraction. As mentioned inSection 1.2.2, the most common method to assess the interactions between motor-related brain areas and the muscles is MSC, which is a normalized measure ofcorrelation between two waveforms or signals in the frequency domain. AlthoughMSC has been popular in studying corticomuscular coupling, it suffers from sev-eral limitations. First, addressing the inter-subject variability challenge to makea robust group inference is not straightforward with MSC because the exact fre-quency of maximum coupling may be inconsistent across subjects. Second, MSCemphasizes the role of individual locus in the brain in driving the motor systemwhile motor activity is known to be more distributed [36]. In fact, recent workhas suggested that interactions between brain regions correspond more closely toongoing EMG than activity at discrete sites [30, 71?74]. Moreover, when the brain12activity is measured by EEG, applying MSC directly to raw EEG and EMG sig-nals normally yields a very low coherence value, because only a small fraction ofongoing EEG activity is related to the motor control [31]. This implies that exten-sive statistical testing is required to determine whether the EEG/EMG coherence isstatistically significant.Recently, several data-driven multivariate methods have been developed for an-alyzing biological data, and they seem to be appropriate for modeling corticomus-cular activity because these methods explore dependency relationships betweendata sets. These methods include multiple linear regression, principal componentregression, PLS and CCA [75]. Among these methods, the LV based approaches,such as PLS and CCA, play a dominating role, probably due to the fact that theextracted LVs could help the biological interpretations of the results.PLS, first developed for process monitoring in chemical industry, exploits thecovariation between predictor variables and response variables and finds a new setof latent components that maximally relate them [32]. An advantage of PLS isthat it can handle high dimensional and collinear data, which is often the case inreal-world biological applications. PLS and its variants have been investigated inmany medical applications, such as assessing the spatial patterns of brain activityin fMRI data associated with behavioral measures [33], and the common temporalcomponents between EEG and fMRI signals [34]. In addition to the ability ofhandling high dimensional and collinear data, PLS is sufficiently flexible that itcan be extended to perform group level analysis and to accommodate multiwaydata [30].CCA is commonly used to seek a pair of linear transformations between twosets of variables, such that the data are maximally correlated in the transformedspace. Generally CCA is not as popular as PLS in practical applications [76]. Thisis probably because real-world data are usually high dimensional and collinearand thus applying CCA directly to the raw data can be ill-conditioned. However,with some appropriate preprocessing strategies, CCA has been shown to be quiteuseful in many medical applications. For instance, in [77] Clercq et al. success-fully removed muscle artifacts from a real ictal EEG recording without alteringthe recorded underlying ictal activity. In [79], Gumus et al. found that there weresignificant correlations at expected places, indicating a palindromic behavior sur-13rounding the viral integration site. CCA can be extended to accommodate multipledata sets simultaneously [78].However, PLS and CCA can only extract uncorrelated LVs and their interpre-tations may be difficult in real applications [38]. ICA is based on the notion thatit is insufficient to consider only up to the second-order statistics (e.g. correlationand covariance) for obtaining a unique LV model [39] if the data are not strictlymultivariate Gaussian. ICA assumes that the multivariate data are composed of alinear superposition of mutually statistically independent signal sources. In statis-tics, independence is a much stronger condition than uncorrelatedness, and thusICA algorithms typically employ criteria related to information theory and/or non-Gaussianity. ICA has found many potential biomedical applications [80?82]. Forexample, ICA can be used to reliably separate fMRI data into meaningful con-stituent components, including consistently and transiently task-related physiolog-ical changes, non-task-related physiological phenomena and machine or movementartifacts [80].Independent component regression (ICR) has been developed as an alternativeto traditional LV-based methods (e.g. PLS) [83]. In ICR, independent compo-nents (ICs) are first extracted from the measurements and then linear regression isperformed to relate the ICs with the response. However, as a two-stage method,ICR inherits the following drawbacks of ICA. First, ICA can only decomposeone data set at a time and thus ignores the effects of the response variables. Eventhough the extracted ICs are maximally independent with each other, they may, andare typically not, directly informative about the response. Second, unlike princi-pal component analysis (PCA) which yields unique and ranked principal compo-nents (PCs) by best explaining the variance in the data, classical ICA with stochas-tic learning rules can result in ICs that vary by permutation and/or dilation evenwhen repeatedly performed on the same data set. Such inherent indeterminacy ofclassical ICA is due to the fact that the assumptions are mathematically insufficientto extract the independent sources exactly in their original form. To overcome thesedrawbacks, joint independent component analysis (jICA), was developed to max-imize the independence of joint sources of multiple data sets [84]. However, inthe jICA framework, all modalities are assumed to share the same mixing matrix,which may not be true in practice [84].14To address the aforementioned technical challenges arising when modelingcorticomuscular activity, we aim to develop novel signal processing techniques inthe following sections and chapters. We evaluate the performance of the proposedmethods both numerically and practically. We first apply the proposed methods tosynthetic data to illustrate their performance where the underlying truth is well un-derstood. We then apply the proposed group analysis methods to concurrent EEG,EMG and behavior data (BEH) collected from normal subjects and patients withPD when performing a force tracking task.2.2 Experimental DataThe study was approved by the University of British Columbia Ethics Board, andall subjects gave written, informed consent prior to participating. Nine PD patients(mean age: 66 yrs) were recruited from the Pacific Parkinson?s Research Centre atthe University of British Columbia (Vancouver, Canada). They all displayed mildto moderate levels of PD severity (stage 1-2 on the Hoehn and Yahr scale) and werebeing treated with L-dopa medication (mean daily dose of 720mg). All PD subjectswere assessed after a minimum of 12-hour withdrawal of L-dopa medication, andtheir motor symptoms were assessed using the Unified Parkinson?s Disease RatingScale (UPDRS), resulting in a mean score of 23. In addition, eight age-matchedhealthy subjects were recruited as controls. During the experiment, subjects seated2 m away from a large computer screen. The visual target was displayed on thescreen as a vertical yellow bar oscillating in height at 0.4 Hz. Subjects were askedto squeeze a pressure-responsive bulb with their right hand. The visual feedbackrepresenting the force output of the subject was displayed as a vertical green barsuperimposed on the target bar as shown in Fig. 2.1. Applying greater pressureto the bulb increased the height of the green bar, and releasing pressure from thebulb decreased the height of the green bar. Subjects were instructed to make theheight of the green bar match the height of target bar as closely as possible. Eachsqueezing period lasted for 15 seconds and was followed by a 15-second rest pe-riod. The squeezing task was performed twice. The force required was up to 10%of each subject?s maximum voluntary contraction (MVC), which was measured atthe beginning of each recording session.15Figure 2.1: The squeezing task: The subject was instructed to follow the tar-get bar (yellow) as close as possible. The force exerted by the subjectwas shown by the green bar.The EEG data were collected using an EEG cap (Quick-Cap, Compumedics,Texas, USA) with 19 electrodes based on the International 10-20 system, refer-enced to linked mastoids. The EEG and EMG data were sampled at 1kHz usingSynAmps2 amplifiers (NeuroScan, Compumedics, Texas, USA). A surface elec-trode on the tip of the nose was used as ground. Ocular movement artifacts weremeasured using surface electrodes placed above and below the eyes (Xltek, On-tario, Canada). Data were later processed by a band-pass filter (1 to 70Hz) off-lineand down-sampled to 250 Hz. Artifacts associated with eye blinks and muscularactivities were removed using the Automated Artifact Removal in the EEGLABMatlab Toolbox (Gomez-Herrero, 2007). To simplify analysis, the EEG signalswere divided into five regions as shown in Fig. 2.2. The five regions were: Fronto-Central (FCentral) - FP1, FP2, F7, F3, Fz, F4 and F8; Left Sensorimotor (LSM) -T7, C3, P7 and P3; Right Sensorimotor (RSM) - T8, C4, P8 and P4; Central - Cz16and Pz; and Occipital - O1 and O2. While the proposed framework is capable ofhandling data at an individual electrode level, we found in our prior work [100] thataveraging recordings within a region has several potential benefits: 1) it reduces theeffects of possible spurious activity measured at individual electrodes, 2) it reducesinter-subject variability due to slight variations in the placement of the electrodecap, and 3) regions can be chosen as they are biologically related to the motor taskwe are studying based on prior medical knowledge, making neuroscience interpre-tations easier. Thus, the raw time courses of the electrodes within each region wereaveraged as the overall activity of the region, and the averaged time course werethen zero-meaned and normalized to unit variance. For subsequent analysis, datacollected during the squeezing periods were concatenated in time into a single ma-trix for each individual subject. Data from the rest periods were excluded from theanalysis.Figure 2.2: Five brain regions: Fronto-Central (FCentral) - FP1, FP2, F7,F3, Fz, F4 and F8, Left Sensorimotor (LSM) - T7, C3, P7 and P3, RightSensorimotor (RSM) - T8, C4, P8 and P4, Central - Cz and Pz andOccipital - O1 and O2.The EMG signals were recorded using self-adhesive, silver, silver-chloride pel-let surface electrodes with 7 mm diameter. A bipolar montage was used with afixed inter-electrode distance of 30 mm. The surface EMG signals were simulta-neously collected together with the EEG signals and were amplified and sampledat 1000 Hz. To be consistent with the EEG preprocessing, the EMG signals were17down-sampled off-line to 250 Hz and only the squeezing periods were used forsubsequent analysis. The BEH signals were recorded from the pressure-responsivebulb, representing the force preformed by the subjects. Similarly, the BEH signalfrom each subject was resampled to ensure its proper alignment with other signals.It is worth noting that it took a graduate student in Neuroscience about six monthsto collect these data sets, including recruiting subjects and designing experimentalprotocol.2.3 A Combined PLS and CCA Method2.3.1 Motivation and ObjectivesPLS and CCA have been investigated in many medical applications. However, tothe best of our knowledge, no report has profoundly explored their underlying dif-ferences, compared their characteristic performances, and combined their advan-tages to overcome their drawbacks. For corticomuscular activity modeling, as wewill elaborate more in Section 2.3.2, both PLS and CCA have their advantages anddisadvantages, but perhaps more importantly, these two methods can be consideredcomplementary. In this chapter, we propose combining PLS and CCA to improvethe performance of the joint LV extraction and the proposed method is denoted asPLS+CCA. More specifically, the proposed PLS+CCA has a two-step modelingstrategy: we first adopt PLS to obtain LVs across two data sets and then performCCA on the extracted LVs. In the first step, PLS is performed for preliminary LVpreparation. The aim of this step is to extract LVs which can most explain its owndata set and meanwhile are well correlated to the LVs in the other data set. Besides,this step can also prevent the ill-conditioned problem when applying CCA directlyto the raw data. In the second step, CCA is applied to the extracted LVs by PLSto construct the LVs by maximizing the correlation coefficients. With these twosteps, it is ensured that the extracted components are maximally correlated acrosstwo data sets and meanwhile can well explain the information within individualdata sets.182.3.2 MethodsIn this subsection, we first analyze the properties of PLS and CCA and demonstratetheir complementarity. Based on this observation, we then propose combining thetwo approaches to have the PLS+CCA method. The two zero-meaned data setsare stored in two matrices, the predictor matrix X(N? p) and the response matrixY (N ? q), where N means the number of observations and p and q indicate thenumbers of variables in corresponding matrices.Partial Least SquaresPLS exploits the covariation between predictor variables and response variablesand tries to find a new set of LVs that maximally relate them [76]. In other words,the covariance between the extracted LVs should be maximized asmaxw1,w2(w1T XTY w2)2,s.t. wiT wi = 1, i = 1,2(2.1)where wi?s (i = 1,2) are the weight vectors. A typical PLS can be implementedby the classical NIPALS algorithm [32]. Also, an alternative calculation way isto perform eigenvalue-eigenvector decomposition [85]. Therefore, the maximumof Equation (2.1) is achieved by having w1 and w2 as the largest eigenvectors ofthe matrices XTYY T X and Y T XXTY respectively. To obtain subsequent weights,the algorithm is repeated with deflated X and Y matrices. The detailed calculationprocedure can be found in Appendix B.1.1.The number of components to be extracted is a very important parameter ofa PLS model. Although it is possible to extract as many PLS components as therank of the data matrix X , not all of them are generally used. The main reasonsfor this are the following: the measured data are never noise-free and some trivialcomponents only describe noise. Therefore appropriate measures are needed todetermine when to stop. Typically, the number of components needed to describethe data matrices is determined based on the amount of variation remained in theresidual data [32].19Canonical Correlation AnalysisDifferent from PLS, CCA is to find linear combinations of both X and Y variableswhich have maximum correlation coefficient with each other. This leads to thesame objective function but different constraints compared with Equ. (2.1):maxv1,v2(v1T XTY v2)2s.t. v1T XT Xv1 = 1, v2TY TY v2 = 1(2.2)where vi?s (i = 1,2) are the weight vectors.The solutions to this problem are the largest eigenvectors of the matrices ?(XT X)?1XTY (Y TY )?1Y T X and (Y TY )?1Y T X(XT X)?1XTY ? respectively. Thesubsequent weights are the eigenvectors of the same matrix in the order of decreas-ing eigenvalues. The predictor LVs UX can be calculated directly from the originalX matrix as UX = XV1, the columns of which are uncorrelated with each other. Thedetailed derivation is shown in Appendix B.1.2. However, the solution dependsheavily on whether or not the covariance matrix XT X is invertible. In practice, itis possible to have rank(XT X) < p so that the invertibility cannot be satisfied anddirectly applying eigenvalue decomposition in the raw data space may lead to theill-conditioned problem. Therefore, some appropriate preprocessing strategies areneeded in practice before applying CCA.The Combined PLS+CCA MethodBased on the discussion above, we can see that the fundamental difference be-tween PLS and CCA is that PLS maximizes the covariance while CCA maximizesthe correlation. The objective of PLS is to construct LVs which could most explaintheir own data set and meanwhile are well correlated to the corresponding LVs inthe other set. In other words, the first priority of PLS is to find the LVs which canexplain significant proportion of variance in each data set and the second priorityis to find the LVs with relatively high correlation coefficients between the two datasets. In contrast, the only objective of CCA in the construction of LVs is to maxi-mize their correlation coefficients with the LVs in another data set. From this pointof view, the LVs extracted by PLS are able to represent major information for indi-20vidual data sets while the ones extracted by CCA may be trivial (e.g. noises withsimilar patterns) even if their correlation coefficient is maximum. This is an advan-tage of PLS over CCA. Besides, PLS can handle high dimensional and collineardata, which is often the case in real-world biological applications, while applyingCCA directly to the raw data may be ill-conditioned. However, we should notethat our goal is to find the relationships between two data sets, not just to explorethe information within individual data sets. It is possible that a higher covariancemerely results from the larger variance of LVs, which may not necessarily implystrong correlations. To overcome this, CCA is a powerful tool to ensure that theextracted LVs have similar patterns across the data sets.For corticomuscular activity modeling, the coupling relationship between EEGand EMG signals is what to be explored. In practice, EEG and EMG signals canbe contaminated by other types of signals and are never noise-free. In addition, thesignals from adjacent channels generally are similar, which leads to collinear data.By employing PLS, we can deal with the collinear EEG/EMG data sets and extractsignificant LVs, but it can not guarantee that the corresponding LVs are highlycorrelated with each other. With using CCA, we can extract highly correlated LVsfrom EEG and EMG signals, but it can not ensure that such LVs are non-trivial andwe may face the ill-conditioned problem.For corticomuscular coupling analysis, both PLS and CCA have their advan-tages and disadvantages, but perhaps most importantly, these two methods can beconsidered complementary. It is natural for us to think of combining PLS andCCA to form a two-step modeling strategy. In the first step, PLS is performedfor preliminary LV preparation. The aim of this step is to extract LVs which canmost explain its own data set and meanwhile are well correlated to the LVs in an-other data set. In this case, the trivial and irrelevant information across data setscould be removed. Besides, this step can also prevent the ill-conditioned problemwhen applying CCA directly to the raw data. In the second step, CCA is appliedto the prepared LVs by PLS to construct the LVs by maximizing the correlationcoefficients. After these two steps, it is ensured that the extracted components aremaximally correlated across data sets and meanwhile can well explain the infor-mation within each individual data set. The details of the proposed PLS+CCAmethod are given in Appendix B.1 and the specific implementation procedure is21Algorithm 1 The Combined PLS+CCA MethodInput: two data sets X (with size N? p) and Y (with size N?q)Output: corresponding LVs matrices UX , and UYThe First Step:1: Solve the eigen decomposition problems:(XTYY T X)w1 = ?1w1 and(Y T XXTY)w2 = ?2w2.2: Determine R1 and R2, the numbers of LVs extracted, corresponding to theabove two problems by the ratio of explained variance.3: Determine the final number of LVs: R = min(R1,R2).4: Set count = R.5: Initialize both LVs matrices to be empty, i.e., TX =[ ] and TY =[ ].6: while count > 0 do7: Set w1 and w2 to be the largest eigenvectors of the matrices XTYY T X andY T XXTY respectively.8: Calculate the LVs as tX = Xw1 and tY = Y w2.9: Set TX =[TX tX ] and TY =[TY tY ].10: Deflate X by subtracting the effects of the LV tX from the data space: X =X? tX(tX T tX)?1tX T X .11: Deflate Y by subtracting the effects of the LV tY from the data space: Y =Y ? tY (tY T tY )?1tY TY .12: Let count = count?1.13: end whileThe Second Step:14: Solve the following eigen decomposition problems:[(TX T TX)?1TX T TY (TY T TY )?1TY T TX]v1 = ?1v1 and[(TY T TY )?1TY T TX(TX T TX)?1TX T TY]v2 = ?2v2.15: Set V1 and V2 to be the R associated eigenvectors respectively.16: The recovered LVs UX and UX can be calculated byUX = TXV1 and UY = TYV2.22shown in Algorithm 1.2.3.3 Data Processing and ResultsSimulationIn this simulation, we apply the proposed method to synthetic data and also reportthe results of the PLS and CCA approaches for comparison. As an illustrativeexample, without loss of generality, four sources are generated and analyzed foreach data set.Synthetic Data The following four source signals are considered for the data setX :s11 = 1.5sin(0.025(t +63))sin(0.2t),s12 = 1.5sin(0.025t),s13 = sign(sin(0.3t)+3cos(0.1t)),s14 = uniformly distributed noise in the range [?1.5,1.5],(2.3)where t denotes the time index vector, valued from 1 to 1000, and s1i?s (i =1,2,3,4) represent four simulated sources, as shown in Fig. 2.3a. Note that heres1i?s are column vectors.Also, four source signals are considered for the data set Y :s21 = 1.5sin(0.025(t +69))sin(0.2(t +6))s22 = 1.5sin(0.025(t +20))s23 = sign(sin(0.3(t +7))+3cos(0.1(t +7)))s24 = uniformly distributed noise (the same as s14)(2.4)where the notations are similarly defined. The four simulated sources are shown inFig. 2.3b.Two mixed data sets X and Y are generated as follows with each row denotingone observation in their respective data space:X = S1 ?A, Y = S2 ?B, (2.5)23(a)(b)Figure 2.3: The four simulated source signals: (a) for X ; (b) for Y .where S1 = [s11 s12 s13 s14] and S2 = [s21 s22 s23 s24] withA =??????0.76 ?0.65 0.77 0.83 0.820.49 0.25 0.12 0.22 ?0.170.28 ?0.21 0.11 0.19 ?0.110.07 0.06 ?0.08 0.07 ?0.04??????, (2.6)24Table 2.1: The correlation coefficients between the cor-responding source pairs of X and Y .s11 and s21 s12 and s22 s13 and s23 s14 and s24CC* 0.3655 0.8787 0.5520 1.00* Here CC stands for correlation coefficient between two sourcesignals.B =??????0.73 ?0.82 0.91 ?0.79 0.880.42 ?0.27 0.17 ?0.20 ?0.300.27 0.26 ?0.18 0.17 ?0.240.08 ?0.01 0.01 0.09 ?0.01??????. (2.7)The patterns of the corresponding sources are similar across the two data sets,representing common information. However, from Equations (2.3) and (2.4), wecan see that there are some time-shifts between corresponding source pairs andtheir correlation coefficients are given in Table 2.1. The first pair of sources havethe lowest CC, but in the mixed data sets we intentionally assign the highest weightsto this pair of sources, as shown in the mixing matrices A and B. This pair can repre-sent the major information within individual data sets, but can not reflect too muchthe coupling relationships between the two sets. The second and third pairs haverelatively high CCs and moderate weights in the mixed data sets. These two pairsgenerally not only contain the major information within individual data sets butalso represent the coupling relationships across data sets. The fourth pair sourceshave the highest CC, but we assign the smallest weights. Although this pair sourceshave the highest CC, they do not represent significant information due to the smallweights. Generally, they could be regarded as trivial information. Moreover, dif-ferent white Gaussian noise with 10% power was added to each source in each dataspace.Results The extracted components using PLS, CCA and the proposed PLS+CCAmethods are shown in Figs. 2.4, 2.5 and 2.6 respectively. The LVs extracted byPLS are automatically ordered in terms of their significance. To some extent, theLVs successfully reflect the corresponding relationships of the underlying sources25(a)(b)Figure 2.4: (a) The LVs estimated in X using PLS. (b) The LVs estimated inY using PLS.between X and Y . However, compared with the original sources, the extracted LVsare distorted, suggesting that a higher covariance may merely result from the largervariance of LVs, which may not necessarily imply strong correlations. We can seethat CCA can recover the original sources accurately in both data spaces and theLVs are ordered strictly according to their correlation coefficients, but it completelyignores the influence of the variance and thus the extracted LVs may only reflect26(a)(b)Figure 2.5: (a) The LVs estimated in X using CCA. (b) The LVs estimated inY using CCA.trivial information of the data sets (e.g., the 1st LV). For instance, although thefirst pair of LVs have the highest correlation coefficient, they do not contain majorinformation of the data spaces. In practice, such LVs generally represent the noiseswith similar patterns simultaneously coupled into the two data modalities. Whenjointly modeling the data sets, they should be removed. We also note that PLS onlyextracts three LVs since they are sufficient to describe the data sets. These LVs27(a)(b)Figure 2.6: (a) The LVs estimated in X using the proposed PLS+CCA. (b)The LVs estimated in Y using the proposed PLS+CCA.do not include the first pair recovered by CCA due to their triviality. The aboveobservations motivates us to employ the proposed PLS+CCA method.When the proposed method is employed, the dominant sources which makesignificant contributions to both data spaces are first identified and ordered in termsof covariance. At the same time, trivial information is removed. Then, withinthe extracted major information, sources that are highly correlated are accurately28recovered with the focus on correlation. In this case, it is ensured that the extractedLVs are maximally correlated across two data sets and meanwhile can well explainthe information within each individual data set.Real DataIn many medical applications, the results of analyzing one subject?s data can not begeneralized to the population level because of the inter-subject variability concern.Therefore, it is necessary to recruit a proper number of subjects to perform a groupanalysis. For modeling the corticomuscular activity, we apply the proposed methodto concurrent EEG and EMG signals collected from normal subjects and patientswith PD during a motor task, described in Section 2.2.Feature Extraction In most existing studies, the analysis for corticomuscular cou-pling is performed directly on the raw EEG and EMG data. This typically yieldsquite small correlation values. Nonetheless, with appropriate preprocessing steps,highly correlated EEG and EMG feature(s) can be extracted from the raw signals.In this chapter, we examine the coupling relationships between time-varying EEGfeatures and amplitudes of the EMG signals, constituting Xb and Yb respectivelyfor each subject b (for b = 1,2, ...,B). We have a total of B subjects (B = 17 inthis study). To achieve a group analysis, all subjects? data sets are concatenatedtogether as:X = [X1,X2, ...,XB] , ?b = 1,2, ...,BY = [Y1,Y2, ...,YB] , ?b = 1,2, ...,B(2.8)with the assumption that all subjects share common group patterns in the temporaldimension [82].EEG Features: pair-wise Pearson?s correlations [86] are considered in thisstudy. Pearson?s correlation measures the dependency between a pair of EEG sig-nals e? = (e?1,e?2, ...,e?n) and e? = (e?1,e?2, ...,e?n) in the time domain as follows:?e?e? =?ni=1(e?i ? e??)(e?i ? e??)??ni=1(e?i ? e??)2?ni=1(e?i ? e??)2, (2.9)where e?? and e?? are the sample means of e? and e?. In this chapter, we calculate the29time-varying pair-wise correlations between EEG channels, using a Hamming win-dow with length 300 and with a 95% overlap. Therefore, the raw EEG informationcan be represented by a two-dimensional matrix with size N?M, where the rowscorrespond to the samples at different time points and the columns correspond tothe features, i.e. pair-wise correlations between the EEG channels.EMG Features: an individual EMG channel signal can be considered as a zero-mean, band-limited and wide-sense stationary stochastic process modulated by theEMG amplitude, which represents the overall muscle activity of individual under-lying muscle fibers [87]. While different techniques have been proposed for accu-rate amplitude estimation, in this study, we employ the root-mean-square approachto calculate the EMG amplitude of short-duration EMG signals e = (e1,e2, ...,en):erms =?1n(e12 + e22 + ? ? ?+ en2). (2.10)A moving window with length n = 300 and a 95% overlap is applied here, thesame as in the EEG feature calculation, to ensure that the obtained EEG and EMGfeatures are temporally aligned and matched.In the above setting, for each subject b (for b = 1,2, ...,B), Xb and Yb representthe time-varying feature matrices of EEG and EMG respectively. The length ofthe time sequences here is 480 associated with the 300-length moving window anda 95% overlap. For the EEG correlation feature, since we have 19 EEG channelsbased on the International 10-20 system and thus there are a total of C192 = 171 cor-relation connections. Therefore, Xb is with size 480?171. For the EMG amplitudefeature, since there are three surface EMG channels, Yb is with size 480?3.Significance Assessment To determine the statistical significance levels of the ex-tracted LVs, we employ a non-parametric permutation test [88] in which the tempo-ral order of EEG features Xb is uniformly permuted for all subjects while keepingthe EMG features Yb intact. Two hundred random permutations are generated. Theproposed PLS+CCA method is applied to each of these permutations. The cor-relation coefficients among the extracted temporal patterns from permuted EEGfeatures and unchanged EMG features are then calculated to form an empiricalnull distribution. The p-value of the original EEG/EMG correlation coefficient is30then computed from the null distribution as the proportion of sampled permutationswhose correlation coefficients are greater than or equal to the original correlationcoefficient. The components with p-value being less than 0.05 are considered tobe statistically significant, denoted as LV EEG and LV EMG, both with size (N?K),where K means the number of significant components.Spatial Pattern Extraction Our goal is to investigate the differences in spatial pat-terns of EEG channels between the normal and PD patient groups when the subjectsperform a motor task. After the identification of significant temporal patterns, wecan regress the EEG-related components LV EEG back onto the EEG features Xb(for b = 1,2, ...,B) for each subject as follows:pbk =?1lvTk XbXbT lvkXbT lvk, k = 1,2, ...,K. (2.11)where lvk is the k-th column of LV EEG and pbk is the spatial pattern of the k-th component for subject b. In addition, we also want to determine which EEGfeatures in the spatial patterns have significant contributions to the correspondingtemporal patterns. This is done by identifying EEG features that have weightsstatistically different from zero. To determine the group-level spatial pattern, foreach significant component, we apply a two-tailed t-test to each element of thespatial patterns of all subjects with each group.Results In this real data study, we apply the proposed method for corticomuscu-lar activity modeling to the EEG and EMG features generated using the proceduredescribed in Section 2.3.3 from 8 normal and 9 PD subjects simultaneously. Thejoint modeling of normal and PD data allows the identification of common tem-poral patterns across the groups. Meanwhile, the spatial patterns may be differentacross subjects, from which we could identify specific correlation connections thatare differently recruited by PD subjects during the motor task.Using the permutation test, two components were deemed significant (P ?0.05) (Fig. 2.7). Note that in the figure only connections whose weights are sta-tistically different from zero are shown. The results based on real data from PD31Figure 2.7: The two components from the proposed PLS+CCA methodwhen using the EEG correlation features and the EMG amplitude fea-tures as data sets. Top panel: Temporal patterns of the EEG (blue,solid) and the EMG (red, dashed). The oscillation of the target bar isalso shown (black, solid). Bottom panel: EEG spatial patterns of nor-mal subjects (left) and PD subjects (right). The connections reflect therespective spatial patterns in the two groups. CC means correlation co-efficient.and normal subjects performing a dynamic motor task are promising. In the past,most EEG/EMG coupling studies have compared EEG activity at a specific locus(e.g. sensorimotor cortex contralateral to the hand performing the task) with theEMG during sustained contractions. However, we found that in normal subjects,correlations between the contralateral sensorimotor cortex and other regions areclosely associated with ongoing EMG features during dynamic motor tasks (Fig.2.7). It is likely that the dynamic nature of the task might require the recruitmentof additional regions such as frontal regions for motor selection [89], contralateral(i.e. ipsilateral to hand movement) sensorimotor cortex for fine modulatory control[90] and occipital regions for post-error adaptations [91].322.3.4 DiscussionFrom Fig. 2.7, we note similar connections between the PD and control groups,especially when comparing connections associated with each component, but wealso note significant differences when comparing the PD and control groups. Itis noted that PD subjects have increased connectivity between the frontal regionsand central and sensorimotor cortices, compared with control subjects. This mayreflect the enhanced effort required by PD subjects for motor task switching [92],a problem common in this PD population [93]. In addition, PD subjects have asignificant connection between the left sensorimotor and occipital regions that isnot present in the control group. We note that the connections with occipital re-gions are prominent in PD subjects. Compared to normal subjects, the PD subjectsheavily rely on visual cues for the initiation [94] and ongoing control of movement[95]. Moreover, the increased intra and inter-hemispheric connections observed inthe PD subjects are consistent with the findings in previous MEG studies [5].33Chapter 3An IC-PLS Framework forCorticomuscular CouplingAnalysis3.1 Motivation and ObjectivesIn the previous chapter, we have presented the combined PLS+CCA method forcorticomuscular coupling analysis. However, both PLS and CCA can only ex-tract uncorrelated LVs and their interpretations may be difficult in real applications[38]. ICA is based on the notion that it is insufficient to consider only up to thesecond-order statistics (e.g. correlation and covariance) for obtaining a unique LVmodel [39] if the data are not strictly multivariate Gaussian. ICA assumes that themultivariate data are composed of a linear superposition of mutually statisticallyindependent signal sources. In statistics, independence is a much stronger condi-tion than uncorrelatedness. As mentioned in Section 2.1, both PLS and ICA havetheir advantages and disadvantages, but most importantly, these two methods canbe considered complementary.In this chapter, we propose combining their advantages and minimizing theirdrawbacks by formulating a multi-objective optimization problem to simultane-ously incorporate response-relevance and independence into the regression proce-34dure, to produce a so-called IC-PLS model. The proposed IC-PLS extracts LVsfrom both the measured data X and the response Y , keeping the LVs maximallyindependent and uniquely sorting the LVs in order of relevance. When appliedto corticomuscular coupling analysis, the proposed IC-PLS can extract the mostsignificant LV pairs from concurrent EEG and EMG data in an orderly manner.Furthermore, to infer consistent activation patterns across subjects within a groupin the face of inter-subject variability, we also develop a group analysis frameworkbased on the proposed IC-PLS model to accommodate the multi-subject case andachieve robust group inference.3.2 MethodsIn this section, we first formulate the corticomuscular coupling analysis problemas a multi-objective optimization problem. The basic idea is to extract highly cor-related, but still maximally independent components in both the EEG and EMGso that the covariance between each of the extracted EEG and EMG componentsis maximized. Since the components are a result of an optimization based on aweighted combination of statistical independence and response-similarity goals,we then design some strategies to adjust each sub-objective?s parameters simulta-neously and make the corresponding sub-objectives change in parallel. Becauseof the importance of the initial solution, we also describe the initialization settingprocedure.3.2.1 A Multi-objective OptimizationTo combine the advantages of PLS and ICA and design an overall optimizationobjective, the following conditions should be satisfied simultaneously:First, PLS exploits the covariation between predictor variables and responsevariables and tries to find a new set of LVs that maximally relate them [76]. Inother words, the covariance between the extracted LVs should be maximized asmaxw1,w2(E((w1T x)(w2T y)))2s.t. wiT wi = 1, i = 1,2.(3.1)35where wi?s (i = 1,2) are the weight vectors, x is the predictor vector with size p?1and y is the response vector with size q? 1. Here it is assumed that x and y arepreprocessed (the specific procedure will be presented in Section 3.2.3).Second, to incorporate the advantages of ICA, the extracted LVs should be asindependent with each other as possible. According to Hyva?rinen and Oja [39],this can be achieved by solving the following problems:maxw1(E(G(w1T x))?E(G(u1)))2s.t. w1T w1 = 1,(3.2)andmaxw2(E(G(w2T y))?E(G(u2)))2s.t. w2T w2 = 1,(3.3)where u1 and u2 are standard Gaussian variables and G(?) is a non-quadratic func-tion. The following choices of G(?) have been previously suggested:G1(u) =1a1log cosh(a1u) and G2(u) =?exp(?u22) (3.4)where the constant a1 is generally 1 ? a1 ? 2. In this chapter, we adopt G1(?) forG(?) in the following optimization procedure.To encapsulate the above three maximization objectives, an intuitive approachis to combine the three objectives into a single aggregate objective function andachieve a good trade-off between their respective goals. We therefore employ thewell-known weighted linear sum of the sub-objectives, and propose the followingmulti-objective optimization problem:maxw1,w2?[E((w1T x)(w2T y))]2+?[E(G(w1T x))?E(G(u1))]2+?[E(G(w2T y))?E(G(u2))]2s.t. wiT wi = 1, i = 1,2.(3.5)where ? , ? and ? are the weights for the corresponding sub-objectives respectively,36and satisfy ?+? +? = 1. In Section 3.2.2, we will further discuss how to adjustthese parameters in detail.Based on the above optimization formulation, a solution can be calculated us-ing an approximate Newton iteration approach with the detailed derivation beingshown in Appendix B.2. At each iteration, wi?s are updated as:wi? wi +di,i.e., wi? wi? (J?(wi))?1?Fwi , i = 1,2.(3.6)where J?(wi) is the Jacobian matrix and ?Fwi is the first-order derivative of F(w1,w2,?1,?2) defined in Equation (B.20) with respect to wi.3.2.2 Determining the Optimization WeightsThe primary goal during the multi-objective optimization is to simultaneously op-timize two or more separate objectives subject to certain constraints. Due to thecomplicated and possibly conflicting nature of the sub-objectives, the overall opti-mal solution obtained depends on the relative values of the weights specified (i.e.,? , ? and ? here). Therefore, to achieve a good trade-off between different sub-objectives, it is important to determine the weights appropriately. Here two keyissues should be taken into consideration: the different scales and the differentconvergence speeds of the sub-objectives [96].The following strategy was employed to avoid the undesirable situation inwhich one sub-objective overwhelms the others: the aforementioned weights ? ,? and ? are decomposed into three factors respectively: ?sig ??scale ??ad j, ?sig ??scale ??ad j and ?sig ? ?scale ? ?ad j. Here ?sig, ?sig and ?sig, called significance fac-tors, denote the relative significance attached to each sub-objective. They can be setsubjectively according to the specific application. ?scale, ?scale and ?scale, namedscale factors, are adopted to unify the scales of the three sub-objectives. Just astheir names imply, the scale factors are defined as follows:?scale =1|F1,w1w2 |, ?scale =1|F2,w1 |, ?scale =1|F3,w2 |, (3.7)where the denominators are the absolute values of the corresponding sub-objective37functions defined asF1,w1w2 =(E((w1T x)(w2T y)))2+?11(w1T w1?1)+?12(w2T w2?1),F2,w1 =(E(G(w1T x))?E(G(u1)))2+?2(w1T w1?1),F3,w2 =(E(G(w2T y))?E(G(u2)))2+?3(w2T w2?1),(3.8)where ?11, ?12, ?2 and ?3 are Lagrange multipliers. ?ad j, ?ad j and ?ad j, called ad-justable factors, are used to balance their different convergence speeds and can beupdated in real-time during the iterative searching procedure. In general, the gra-dient of a function can be employed to estimate speed of convergence. Therefore,the adjustable factors can be defined as below:?ad j =1(??F1,w1?+??F1,w2?)/2?ad j =1??F2,w1?, ?ad j =1??F3,w2?(3.9)where the denominators are the L2 norm of the corresponding first-order deriva-tives, which are calculated in a similar way to the calculation of ?Fwi (i = 1,2) inAppendix:?F1,w1 = 2E((w1T x)(w2T y))E(x(w2T y))?2w1T E((w1T x)(w2T y))E(x(w2T y))w1?F1,w2 = 2E((w1T x)(w2T y))E(y(w1T x))?2w2T E((w1T x)(w2T y))E(y(w1T x))w2?F2,w1 = 2(E(G(w1T x))?E(G(u1)))E(xg(w1T x))?2w1T (E(G(w1T x))?E(G(u1)))E(xg(w1T x))w1?F3,w2 = 2(E(G(w2T y))?E(G(u2)))E(yg(w2T y))?2w2T (E(G(w2T y))?E(G(u2)))E(yg(w2T y))w2.(3.10)where g(?) represents the first-order derivative of G(?). From the above definitions,we can see that ?ad j, ?ad j and ?ad j will be adjusted online during the optimization38procedure. If one sub-objective changes faster, i.e., its gradient norm is larger, thecorresponding adjustable factor will be smaller and vice versa. This makes all sub-objectives change in parallel. However, when the iteration results are close to theoptima, the gradients will approach zero. Therefore we set the three weights asbelow: ???????????????????????????? = ?sig ??scale ??ad j, ?ad j ? T H? = ?sig ??scale ??ad j, ?ad j ? T H? = ?sig ??scale ??ad j, ?ad j ? T H? = ?sig ??scale, ?ad j > T H? = ?sig ??scale, ?ad j > T H? = ?sig ??scale, ?ad j > T H(3.11)where T H is a predefined small threshold that envelops a narrow range aroundthe optimum. Whenever the searching route enters the range, the correspondinggradient approximates zero so that the weights should be determined only by thefirst two factors. In short, the final weights take into account the effects of all thethree factors.3.2.3 Initialization SettingWhen solving optimization problems, the final solutions not only depend uponthe search algorithm employed, but also upon the initial setting, which determinesthe searching path and starting point respectively. In the proposed IC-PLS model,to prevent the optimization being stuck in an uninformative local minimum, weintroduce a joint statistical analysis method [70, 71] and use the extracted LVs asinitial conditions for the subsequent iterative search.Suppose the original predictor and response data are denoted by Z1 (with sizeN?P) and Z2 (with size N?Q) respectively and define the sub-latent variables(subLVs) in each data space to be linear combinations of the original variables, i.e.,Z1v1,Z2v2. One super latent variable (supLV) tg is designed to relate the subLVs39by solving the following optimization problem:max (tgT Z1v1)2 +(tgT Z2v2)2s.t. tgT tg = 1, viT vi = 1, ?i = 1,2(3.12)with all data assumed to be zero-mean and normalized to unit variance in advance.The subLV Zivi in each data space carries its associated variation information and(tgT Zivi)2 models the covariance information between the subLV Zivi and the su-pLV tg. The supLV tg relates all subLVs simultaneously and actually plays a role asa link. It is expected that, by solving the above optimization problem, the extractedsubLVs will carry as much variation as possible in each data space and at the sametime be correlated as closely as possible.By the method of Lagrange multipliers, the supLV and subLVs can be readilyderived as:(Z1Z1T +Z2Z2T )tg = ?gtg, (3.13)ti = Zivi =?1?iZiZiT tg, (3.14)where ?i = tgT ZiZiT tg. Thus with Lagrange multipliers the optimization problemis now characterized as a standard algebra problem.Several supLVs, represented by Tg and ranked by descending ?g?s can be ex-tracted and a same number of subLVs, represented by Ti, can be readily calculatedby using Eq. (3.14):Ti = ZiZiT TgDi, (3.15)where Di is a diagonal matrix with corresponding?1?i ?s as its diagonal elements.In practice, we need to specify the number of supLVs to be extracted, which canbe done by extracting a sufficient number to explain an adequate fraction of thevariance (e.g. 90%).From the above derivation, it is apparent that the supLVs Tg are actually theprincipal components (PCs) of the concatenated data shown in Equation (3.13)and the subLVs Ti are extracted by regressing each data space on the supLVs asshown in Equation (3.15). Hence, the supLVs can represent the general systematicvariations for both data spaces and the subLVs can capture the local systematic40variations within each data space. However, collinearity may exist in the subLVscalculated through the above procedure since each data set is used repetitively asshown in Equation (3.14). Furthermore, the subLVs are not sorted according tothe correlation of the corresponding between-set subLV pairs. To efficiently imple-ment the subsequent Newton iteration, we need to address the above concerns andtherefore modify the original subLVs as follows:First, find the maximal correlation and the corresponding supLV t?g as well as sub-LVs t?1 , t?2 ;Second, exclude t?g from Tg and update each data space using t?1 , t?2 in a deflationmanner;Third, update the original subLVs based on the new supLVs and the updated dataspaces by using Equation (3.15).After repeating these steps, it is ensured that the modified subLVs (MsubLVs)are orthogonal to each other and automatically ordered. The extracted MsubLVsare then used as the initial solutions for the proposed multi-objective optimizationproblem.3.2.4 A Group Analysis FrameworkIn many medical applications, the desire to make group inferences requires recruit-ment of an adequate number of subjects and the application of a suitable groupanalysis which accommodates inter-subject variability. Here we propose a groupanalysis framework for modeling corticomuscular coupling activity, and apply theproposed group analysis framework to the EEG/EMG data collected from normalsubjects and subjects with PD during a motor task.As illustrated in Fig. 3.1, there are two stages in this framework. Suppose wehave B subjects in total (e.g. B = 17 in this study) and two types of data sets foreach subject. In the first stage, the subLVs Tb j?s (b = 1,2, ...,B; j = 1,2) for eachsubject are calculated by using Equation (3.13) and Equation (3.15). Note thatthese subLVs include most of the useful information for further analysis, and thereis no need to use the modified algorithm here because the collinearity and orderingproblems will be solved together in the initialization step of the group level IC-PLSmodel.41Figure 3.1: The diagram of the group analysis framework based on the pro-posed IC-PLS model.In the second stage, all subjects? subLVs are correspondingly concatenated as:Tgroup j = [T1 j,T2 j, ...,TB j] , ? j = 1,2, (3.16)with the assumption that all subjects share common temporal group patterns [82].Then the whole initialization procedure described in Section 3.2.3 is applied to theconcatenated data Tgroup j. Finally, the proposed IC-PLS model is employed to pro-cess the initialized data sets and the expected LVs could be extracted to representgroup patterns.3.3 Data Processing and Results3.3.1 SimulationSynthetic DataIn this simulation, we apply the proposed IC-PLS model to synthetic data andalso implement PLS and ICA for comparison. As an illustrative example, without42loss of generality, four sources are generated and analyzed, similar to that used byHakyin [98].Figure 3.2: The four source signals.The following four source signals are considered:s1 = 2cos(0.08t)sin(0.006t)s2 = sign(sin(0.3t)+3cos(0.1t))s3 = uniformly distributed noise in the range [?1.5,1.5]s4 =?????????0.01t 1? t ? 200?0.01t +4 201? t ? 6000.01t +8 601? t ? 1000(3.17)where t denotes the time point index, valued from 1 to 1000, and si?s (i = 1,2,3,4)represented four simulated sources, as shown in Fig. 3.2.Two mixed data sets X and Y , shown in Fig. 3.3, were generated as follows:x = Asx and y = Bsy (3.18)43(a)(b)Figure 3.3: (a) The mixed data X. (b) The mixed data Y.where sx = [s1 s2 s3]T and sy = [s1 s2 s4]T withA =????????0.86 0.79 0.67?0.55 0.65 0.460.17 0.32 ?0.28?0.33 0.12 0.270.89 ?0.97 ?0.74????????(3.19)44(a)(b)Figure 3.4: (a) The LVs estimated in X using PLS. (b) The LVs estimated inY using PLS.B =????????0.21 0.69 0.050.22 0.98 0.030.68 0.74 0.100.05 0.63 0.420.83 0.10 0.79????????(3.20)Here x, y, sx and sy are all column vectors, denoting one observation in their re-45spective data space.(a)(b)Figure 3.5: (a) The LVs estimated in X using ICA. (b) The LVs estimated inY using ICA.The sources s1 and s2 exist in both data spaces X and Y , representing commoninformation. The source s3 only contributes to X and s4 only to Y , representingtheir own unique information. In Y , we intentionally assigned a relatively highweight to the source s2. Moreover, different white Gaussian noise with 5% powerwas added to each source in each data space.46(a)(b)Figure 3.6: (a) The LVs estimated in X using IC-PLS. (b) The LVs estimatedin Y using IC-PLS.ResultsThe extracted components using PLS, ICA and the proposed IC-PLS model areshown in Figs. 3.4, 3.5 and 3.6 separately. The LVs extracted by PLS are au-tomatically ordered in terms of their significance and successfully reflected thecorresponding relationships of the underlying sources between X and Y . However,47compared with the original sources, the extracted LVs are distorted, suggesting thatthe fact that the PLS sources are uncorrelated are insufficient to accurately recoverthe underlying sources. In contrast, ICA recovers the original sources accuratelyin both data spaces, but it does not relate the two data spaces and the LVs are un-ordered. When the proposed IC-PLS model is employed, the dominant sourceswhich make significant contributions to both data spaces are accurately identifiedand ordered, with the focus entirely on sources that are common across the predic-tor and the response.3.3.2 Real DataThe experimental data sets have been described in Section 2.2. The concurrentlycollected EEG and EMG from controls and PD subjects are utilized here.Feature ExtractionThe basic idea is similar to that in Section 2.3.3. We examined the coupling rela-tionships between time-varying EEG features and amplitudes of the EMG signals,constituting Zb1 and Zb2 respectively for Subject b.EEG Features Two types of EEG features were considered: pair-wise Pearson?scorrelation [86] and band-limited, pair-wise EEG coherences [97]. Pearson?s cor-relation has been defined in Section 2.3.3 and can be referred to Equation (2.9).EEG coherence can also be used to examine the relation between two EEG sig-nals, which includes both power and frequency information. The calculation canbe referred to Equation (1.1). In this chapter, we calculated both the time-varyingpair-wise correlations and coherences between EEG channels, using a Hammingwindow with length 300 and with a 95% overlap. To be consistent with com-mon EEG practice, we further divide the coherence into four frequency bands, i.e.,theta band (4-8 Hz), alpha band (8-13 Hz), beta band (13-30 Hz) and gamma band(30-70 Hz). The time-varying energy signals in each band were then used as theEEG features. Therefore, the raw EEG information can be represented by a two-dimensional matrix with size N ?M, where the rows correspond to the samplesat different time points and the columns correspond to the features, i.e. pair-wise48correlations or band-limited coherences between the EEG channels.EMG Features In this study, we still employ the root-mean-square approach de-scribed in Equation (2.10) to calculate the EMG amplitude of short-duration EMGsignals by a moving window with length n = 300 and a 95% overlap, the same asin the EEG feature calculation, to ensure that the obtained EEG and EMG featuresare temporally aligned and matched.Regarding significance assessment and spatial pattern extraction, we could re-fer to Section 2.3.3.ResultsFigure 3.7: The two components of the group analysis framework when usingthe EEG correlations and the EMG amplitude data sets. Top panel:Temporal patterns of the EEG (red, dashed) and the EMG (blue, solid).The oscillation of the target bar is also shown (black, solid). Bottompanel: EEG spatial patterns of normal subjects (left) and PD subjects(right). The connections reflect the respective spatial patterns in thetwo groups and the width is made proportional to the weighting of therespective connections in the group EEG spatial pattern. CC meanscorrelation coefficient.When the proposed group analysis framework is applied to EEG and EMG49Figure 3.8: The two components of the group analysis framework when usingthe EEG coherence and the EMG amplitude data sets.features generated using the procedure described in Section 3.3.2 from all 8 normaland 9 PD subjects simultaneously, the identification of common temporal patterns,but different spatial patterns could be identified. Using the permutation test, twocomponents are deemed significant (P ? 0.05) (Figs. 3.7 and 3.8). Note that inthe figures only connections whose weights are statistically significantly differentfrom zero are shown.When using the correlation between brain regions as the EEG feature, many50connections between are similar between PD and normals, especially when con-nections seen in either components are considered. The only difference is a con-nection between the L sensorimotor and frontal regions that is present in controlsbut missing in PD (Fig. 3.7).When the pairwise EEG coherence features are used to couple with the EMGfeatures, there are also two components deemed significant via the permutation test(Fig. 3.8). Again, when considering across components, there are similarities in theconnections between PD and controls, but also significant differences. In the betaband, normal subjects have connections between the R sensorimotor cortex andoccipital, central and frontal regions that are completely missing in PD subjects.In the gamma band, PD subjects have increased connectivity between the frontalregions and central and sensorimotor cortices that are missing in controls. Finally,PD subjects have a significant connection between the L sensorimotor and occipitalregions in the theta and alpha bands that is not present in controls.3.4 DiscussionIn this chapter, we have proposed a new method to assess EEG/EMG couplingbased on combining the favourable properties of ICA and PLS. The simulationssupport utilization of proposed method, and results based on real data from Parkin-son?s and normal control subjects performing a dynamic motor task are promising.The results we observed in the PD and control subjects are consistent with priorstudies describing the changes in EEG observed in PD. When using the correlationbetween brain regions as EEG features, we found widespread connectivity betweencentral regions and the ipsilateral (right) sensorimotor regions and ongoing EMGactivity. A body of literature suggests that widespread oscillatory disruption in PDis detectable in the EEG during visually guided tasks (e.g. [100]). This likely rep-resents compensatory expansion of cortical regions to overcome deficiencies in thebasal ganglia to perform motor tasks, as has been observed in fMRI studies [101].The lack of fronto-central alpha activity seen in PD subjects compared to controlsmay reflect impaired spatial attention [102], as impairment of maintenance of at-tention has been well described in PD subjects (e.g. [115]). The increased frontalconnectivity we observed in the gamma region in PD may reflect the enhanced ef-51fort PD subjects required for motor task switching [92], a problem common in thispopulation [93].The enhanced occipital connectivity seen in PD in the theta andbeta bands likely relates to the fact that PD subjects heavily rely on visual cues forthe initiation [94] and ongoing control of movement [95].We are able to find common temporal patterns of corticomuscular couplingin both PD and controls despite having different spatial origins of this commonsignal. We note that our results are consistent with a recent study that suggeststhe magnitude of traditional MSC is not significantly different early in PD [106].However, we have shown that the distributed EEG connectivity patterns requiredto generate these common temporal patterns differ between PD and controls. Onecould envision situations where the spatial extent is similar, but the temporal pro-files are significantly different (e.g. focal stroke affecting the sensorimotor cortexcontralateral to the hand performing the task) which may not be well-served by theproposed approach. However, even in this scenario, compensatory mechanismswould still presumably modify the connectivity patterns giving rise to corticomus-cular coherence, and this would still be captured by the current approach.A limitation of our approach is that it treats corticomuscular interactions asbidirectional; however, in addition to the cortex driving the muscle, there may beproprioceptive information from the muscle to the cortex. This reflects the inherentlimitation of the PLS step, however recent work on directed partial least squares(e.g. [107]), may provide future directions to extend the current approach. Anotherpossible refining is that the cross-validation and bootstrap method may be used totest the robustness and infer the population level information.52Chapter 4A Joint Multimodal GroupAnalysis Framework forModeling CorticomuscularActivity4.1 Motivation and ObjectivesIn the previous two chapters, we have proposed two novel coupling analysis meth-ods. However, in both methods, only two types of data sets can be processed atthe same time, while in many cases more than two types of data sets can be avail-able and a better understanding could be achieved from analyzing multimodal datajointly. To address the above concern, in this chapter, we design a joint multimodalstatistical framework (JMSF) for corticomuscular coupling analysis to relate mul-tiple data sets.Unlike previous approaches where only two data sets are considered and theinterest is to interpret one data set by the other in a unidirectional fashion, the pro-posed framework models multiple data spaces simultaneously in a multidirectionalfashion. Here, we extend the ?bidirection? concept in [108] to be ?multidirection?to accommodate multimodal cases. The framework has a two-step modeling strat-53egy. In the first step, a multidirectional LV extraction solution (denoted as multi-LVextraction) is established for preliminary LV preparation. It is formulated by solv-ing a simple optimization problem. In the second step, a joint postprocessing isperformed on the extracted LVs to acquire common and specific information ineach data space. Furthermore, to address the challenging issue of inter-subjectvariability in biomedical applications where the problem is to infer certain activa-tion patterns consistently shared within a population, we develop a group analysisarchitecture based on the proposed JMSF to accommodate the multi-subject caseand to summarize the information from each individual subject?s data to achievegroup inference.4.2 Methods4.2.1 The Joint Multimodal Statistical FrameworkThe main contributions of the proposed method lie in two folds: (1) it changes thetraditional unidirectional regression fashion to a multidirectional fashion wheredata spaces are explored under the supervision of each other and (2) it modelsmultiple data sets/signals simultaneously overcoming the traditional constraint ofhaving only two types of data sets. In the following subsections, the two-step mod-eling strategy of the proposed JMSF, which is illustrated in Fig. 4.1, is describedin details.Multi-LV ExtractionSuppose we have m data sets X1, which is with size N?P1, X2 with size N?P2,..., and Xm with size N ?Pm, and we define the sub-latent variables (denoted assubLVs) in each data space to be linear combinations of the original variables, i.e.,X1w1,X2w2, ...,Xmwm. One super latent variable (denoted as supLV) tg is designedto relate the subLVs, and it can be obtained by solving the following optimizationproblem:maxtg,wim?i=1(tgT Xiwi)2,s.t. tgT tg = 1, wiT wi = 1, ?i = 1,2, ...,m.(4.1)54All columns of the m data matrices are assumed to be zero-mean and normalizedto unit variance in advance. The subLV Xiwi for the ith data space carries its as-sociated variation information and (tgT Xiwi)2 models the covariance informationbetween each subLV Xiwi and the supLV tg. The supLV tg relates all subLVs si-multaneously and actually plays a role as a link bridge. By solving the above op-timization problem, it is expected that the extracted subLVs should carry as manyvariations as possible in each data space and at the same time be correlated to eachother as closely as possible.By employing the method of Lagrange multipliers, we rewrite the initial costfunction as:Ltg,wi =m?i=1(tgT Xiwi)2??g(tgT tg?1)?m?i=1?i(wiT wi?1), (4.2)where ?g and ?i?s are Lagrange multipliers.Taking the derivatives of Ltg,wi with respect to tg and wi?s and setting them tobe zero, we have:?Ltg,wi? tg= 2m?i=1(tgT Xiwi)Xiwi?2?gtg = 0, (4.3)?Ltg,wi?wi= 2(tgT Xiwi)XiT tg?2?iwi = 0. (4.4)By left multiplying Equation (4.3) with tgT and Equation (4.4) with wiT , wecan easily derive the following equations:m?i=1(tgT Xiwi)2 = ?g, (4.5)(tgT Xiwi)2 = ?i, ?i = 1,2, ...,m. (4.6)The above equations indicate that ?g is the underlying optimization objectiveand ?i?s stand for the suboptimal objective parameters, which is actually equivalentto the following relation:m?i=1?i = ?g. (4.7)55According to Equations (4.5) and (4.6), we can modify Equations (4.3) and(4.4) as follows:m?i=1??iXiwi = ?gtg, (4.8)1??iXiT tg = wi, ?i = 1,2, ...,m. (4.9)Then, the above equations can be combined as(m?i=1XiXiT )tg = ?gtg, (4.10)which means that the optimization problem is now transferred to be a standardalgebra problem. The solution can be easily found through the eigenvalue decom-position of the matrix (?mi=1 XiXiT ).After tg is calculated, the suboptimal objective parameters ?i?s can be obtainedby combining Equations (4.6) and (4.9):tgT XiXiT tg = ?i, ?i = 1,2, ...,m. (4.11)Finally, in terms of Equations (4.9) and (4.11), the subLVs can be expressed asti = Xiwi =?1tgT XiXiT tgXiXiT tg, ?i = 1,2, ...,m. (4.12)Usually, using Equation (4.10), several supLVs tg?s, denoted by the matrix Tg,can be derived according to the descending ?g?s. For each data set Xi, the samenumber of subLVs, denoted by Ti, can be readily calculated by Equation (4.12),rewritten asTi = XiXiT TgDi, ?i = 1,2, ...,m, (4.13)where Di is a diagonal matrix with corresponding?1?i ?s being its diagonal ele-ments. A practical issue is to determine the number of supLVs. In our study, wedetermine the number by setting a threshold that corresponds to the ratio of ex-plained variance (e.g. 90%).From the entire derivation process, we can see that the supLVs Tg are actu-56ally the principle components (PCs) in the concatenation data space, as shownin Equation (4.10), and the subLVs Ti?s are extracted by regressing each data setonto the supLVs, as shown in Equation (4.13). Hence, the supLVs can representthe general systematic variations for all data spaces and the subLVs Ti?s can cap-ture the local systematic variations in the ith data space. However, the collinearityproblem may exist in the subLVs calculated through the above procedure sinceeach data set is used repetitively as shown in Equation (4.12). The extracted sub-LVs are not necessarily orthogonal to each other. Furthermore, in practice, weare interested in identifying the most highly correlated components across mul-tiple data sets. However, the subLVs derived above are not sorted according tothe average correlation coefficient (acc) between corresponding subLV pairs, i.e.acc(l) = ?mi, j=1i 6= jcc(ti,l, t j,l)/(m(m? 1)/2). To effectively implement the joint post-processing step which is to be described shortly, we need to address these orthog-onality and sorting issues and thus we design a modified algorithm as described inAlgorithm 2. In Algorithm 2, it can be ensured that the modified subLVs (Msub-LVs) for each data space are orthogonal to each other and are automatically orderedaccording to the descending average correlation.Figure 4.1: The diagram of the joint multimodal statistical framework.57Algorithm 2 Multi-LV ExtractionInput: multiple data sets X1 with size N?P1, X2 with size N?P2, ..., and Xmwith size N?PmOutput: corresponding MsubLVs matrices mT1, mT2, ..., mTm1: Set count = L, and extract L supLVs Tg = [tg,1, tg,2, ..., tg,L] from the concate-nated data space [X1,X2, ...,Xm] by using Equation (4.10).2: Calculate the original subLVs Ti = [ti,1, ti,2, ..., ti,L], ?i = 1,2, ...,m withineach data space by Equation (4.13).3: Initialize all MsubLVs matrices to be empty, i.e., mTi =[ ], i = 1,2, ...,m.4: while count > 0 do5: For l = 1,2, ...L, calculate the correlation coefficients (cc) between each pairof subLVs (e.g. cc(ti,l, t j,l), i 6= j, l denotes the l-th subLV).6: For each l, compute the average correlation coefficient (acc), i.e., acc(l) =?mi, j=1i6= jcc(ti,l, t j,l)/A, where A = m(m? 1)/2 is the total number of possibleunique pairs.7: Find the maximum acc and the corresponding supLV t?g as well as the sub-LVs t?1 , t?2 , ..., t?m.8: Set mTi =[mTi t?i ], i = 1,2, ...,m9: Deflate Xi?s by substracting the effects of the corresponding subLV fromeach data space as follows:10: for i = 1 to m do11: piT = (t?iT t?i )?1t?iT Xi12: Ei = Xi? t?i piT13: end for14: Exclude t?g from Tg to obtain an updated supLVs matrix T?g .15: Calculate the updated subLVs matrices by using Equation (4.13) based onT?g and Ei16: Let count = count?1.17: end while58Joint PostprocessingAlthough the corresponding subLVs for the data spaces may show similar patternsunder the link of supLVs, the correlation among the subLVs can not be necessar-ily ensured to be maximized. Therefore, a joint postprocessing step is needed tofurther decompose the underlying information of the results from the first-step. Re-cently, joint blind source separation (BSS) techniques on multiple data sets havebeen successfully applied to a wide range of applications such as the estimation ofbrain activations [82]. Some popular methods include group ICA [82], indepen-dent vector analysis (IVA) [109] and M-CCA [110]. Through numerical and realdata study, it has been shown that M-CCA can achieve better performance than theothers, especially when the data sets are large [110]. Therefore, in this chapter, weadopt M-CCA as the joint postprocessing method.M-CCA extends the theory of CCA to more than two random vectors to identifycanonical variates that summarize the correlation structure among multiple randomvectors by linear transformations. Unlike CCA where correlation between twocanonical variates is maximized, M-CCA aims to optimize an objective functionof the correlation matrix of the canonical variates from multiple random vectorsin order to make the canonical variates achieve the maximum overall correlation.Details about the implementation procedure of M-CCA can be found in [110].In this study, the inputs to M-CCA are the extracted MsubLVs from the first-step, i.e. mT1, mT2, ..., mTm, and the outputs are the extracted canonical componentswith maximum correlation, revealing their common information, i.e., T1c, T2c, ...,Tmc. The associated subspace decomposition can be expressed as below:PicT = (TicT Tic)?1TicT XiXi = Xic +Xie = TicPicT +Xie,?i = 1,2, ...,m(4.14)where Pic is the loading matrix and Xie is the residual information within the ith dataspace. Regarding the number of the canonical components, we can first keep it thesame as the number of subLVs and then apply the permutation test (see Section4.3.2) to identify significant components as the final common information.Since the residual Xie may contain some specific information within the data59space Xi besides noises, it is desirable to explore this kind of underlying infor-mation, reflecting the unique characteristics of each data space. Here, we adoptthe method of orthogonal signal correction (OSC) [111] to draw the orthogonalcomponents from the M-CCA residuals, i.e., T1o, T2o, ..., Tmo. The associated de-composition can be described asPioT = (TioT Tio)?1TioT XiXie = Xio +Xir = TioPioT +Xir,?i = 1,2, ...,m(4.15)where Pio is the loading matrix, and Xir is the final residual within each data space.In summary, each original data space Xi is decomposed into three subspaces:Xi = Xic +Xio +Xir = TicPicT +TioPioT +Xir, (4.16)where the first subspace stands for the similarity among multiple data sets and thesecond reveals the unique information in each data space.4.2.2 A Group Analysis ArchitectureTo accommodate the multi-subject case, the proposed method in Section 4.2.1 isextended to the group level and we propose a group analysis method. For modelingthe corticomuscular activity, we apply the proposed group analysis method to con-current EEG, EMG and BEH signals collected from normal subjects and patientswith PD during a motor task.Suppose we have a total of B subjects (B = 17 in this study) and m types ofdata sets for each subject (m = 3 in this study). As illustrated in Fig. 4.2, thereare mainly three steps in this architecture. In the first step, the subLVs Tb j?s (b =1,2, ...,B; j = 1,2, ...m) for each subject are calculated using Equations (4.10) and(4.13). These subLVs include most of the useful information for further analysis. Itis not necessary to use the modification algorithm here because the collinearity andordering problems will be solved together later. In the second step, all subjects?60Figure 4.2: The diagram of the group analysis architecture based on the pro-posed JMSF in Section 4.2.1.subLVs are correspondingly concatenated as:Tgroupi = [T1i,T2i, ...,TBi] , ?i = 1,2, ...,m, (4.17)with the assumption that all subjects share common group patterns in the temporaldimension. In the third step, the proposed JMSF is applied to the m grouped datasets and correlated components can be extracted.4.3 Data Processing and Results4.3.1 SimulationSynthetic DataIn this simulation, without loss of generality, three data sets are generated and ana-lyzed as an illustrative example. The following three data space with five variablesare considered, where each data space contains a common source and a unique61source:X1 = [s1,s1,s2,s2,s2] , with size 100?5X2 = [s2,s2,s3,s3,s3] , with size 100?5X3 = [s4,s4,s2,s2,s4] , with size 100?5s1 = cos(0.01t)sin(t)s2 = sin(0.015t)+2cos(0.005t)s3 = 2sin(0.025t)s4 = 2cos(0.08t)sin(0.006t)(4.18)where t denotes the time point index, valued from 1 to 100, and si?s (i = 1,2,3,4)represent four major simulated sources, which are uncorrelated with each other. Ineach data space, the common source s2 represents the common information sharedwith other spaces and the unique source contributes to its different behavior. WhiteGaussian noises with 5% power level are added to each source in each data set.Moreover, to further demonstrate the robustness, we add another two higher noiselevels and summarize the results in Table 4.1.ResultsWe apply the proposed JMSF to synthetic data to demonstrate its performance.We illustrate the entire process of modeling the underlying variations in each dataspace under the supervision of each other.All variables in three data sets are normalized to have zero-means and unitvariances. According to the accumulative explained variance, four supLVs, de-noted by Tg, are extracted to explain the general systematic information in theconcatenated data space [X1,X2,X3], accounting for more than 90% of the overallvariation. The four supLVs are shown in Fig. (4.3), compared with the originalsources si?s (i = 1,2,3,4). The extracted supLVs are sorted in a descending orderof their associated variances. We can see that the supLVs recover the major sourcesin some sense. As expected from our method, the first supLV resembles the com-mon source s2 in the three data spaces. The second and third supLVs are actuallythe combinations of s3 and s4. The fourth is quite similar to s1.Furthermore, the corresponding subLVs and MsubLVs are also extracted from62Figure 4.3: The extracted four supLVs (left hand side) and the original foursources si?s (right hand side).Figure 4.4: From left to right: the original subLVs of the data spaces X1, X2and X3 respectively.each data space, and the results are shown in Fig. (4.4) and Fig. (4.5) respec-tively. Obviously, the subLVs from each data space are not orthogonal to eachother, which indicates that the information in each data space is repetitively used.By using the modified algorithm presented in Section 4.2.1, orthogonal MsubLVscan be extracted and sorted automatically according to their correlation relation-ships with the supLVs. Compared with the true underlying signal model, we can63Figure 4.5: From left to right: the modified subLVs for the data spaces X1, X2and X3 respectively.see that the first two MsubLVs can summarize the major variation in each dataspace and the third one is mainly due to noise.Finally, the postprocessing step is performed to separate the highly correlatedinformation from the irrelevant parts. As shown in Fig. (4.6), the recovered CCAcomponents and OSC components are comparatively present for the three dataspaces. The CCA components are similar to the common source s2 and meanwhilethe OSC components are similar to their corresponding unique source.This simulation study illustrates the main steps of the proposed frameworkand shows that the underlying information in each data space can be effectivelyextracted. To further demonstrate its robustness, we consider another two highernoise levels (i.e., 10% and 15%) and summarize the results in Table 4.1. From thetable, we can see that although the noise power significantly increases, cc and accstill maintain at quite high values, indicating the good robustness of the proposedmethod against additional noise.4.3.2 Real DataThe detailed description for the experiment can be referred to Section 2.2. Theconcurrently collected EEG, EMG and BEH signals from controls and PD subjectsare utilized here.64Figure 4.6: The postprocessing results for Multi-LVs.Table 4.1: The estimation performances of theproposed method at different noise levels.Noise Level cc1 cc2 cc3 acc5% 0.9985 0.9982 0.9978 0.996910% 0.9943 0.9920 0.9923 0.987115% 0.9842 0.9831 0.9880 0.9752a Here noise level indicates the noise power in terms of thepercentage of the source signal power; cc1 means thecorrelation coefficient between the original source s2 andthe correlated LV extracted from X1; cc2 and cc3 aresimilarly defined; acc is defined in Algorithm 2.Feature ExtractionIn this chapter, we examine the coupling relationships among the EEG signals,the amplitudes of the EMG signals and the behavioral performance measurements,constituting X1, X2, and X3 respectively for each subject.EEG Features Two types of EEG features are considered: Pearson?s correlation[86] and EEG spectrum [112]. Pearson?s correlation has been defined in Sec-tion 2.3.3 and can be referred to Equation (2.9). Spectrum represents the energy65distribution of an EEG signal x = (x1,x2, ...,xn) in the frequency domain, whichcan be generated via a discrete Fourier transform:Xk =n?i=1xie? j2pik in , k = 0,1, ...,n?1, (4.19)where Xk?s represent the frequency domain information. In this chapter, we calcu-late the time-varying correlations between each two EEG channels and the time-varying spectra of each EEG channel, using a Hamming window with length 300and with a 95% overlap. To be consistent with the EEG practice, we further dividethe spectrum into three frequency bands, i.e., theta band (4-8 Hz), alpha band (8-13 Hz) and beta band (13-30 Hz). The time-varying energy signals in each bandare used as the EEG features. Therefore, now the raw EEG signals can be trans-ferred into a two-dimensional matrix with size N?M, where the rows correspondto the samples at different time points and the columns correspond to the features,i.e. pair-wise correlations between the EEG channels or band-limited energies ofindividual EEG channel signals.EMG and BEH Features The same EMG feature described in Equation (2.10) isemployed here. Note that the window size used here is the same as in the EEGcorrelation calculation. This is to ensure that the obtained EEG and EMG featuresare temporally aligned and matched. Similarly, the BEH measurement sequencefrom each subject is resampled to ensure the same temporal length as that of theEEG and EMG features.Significance AssessmentTo determine the statistical significance levels of the extracted components, weexploit the non-parametric permutation test [88] in which temporal correlationsamong EEG, EMG and BEH are removed by permuting the temporal order of EEGfeatures X1 and EMG features X2 uniformly for all subjects while keeping the BEHfeatures X3 unchanged. In this study, 200 random permutations are generated andthe proposed group analysis architecture is applied to each of these permutations.The average correlation coefficients among the extracted temporal patterns from66permuted EEG features, permuted EMG features, and unchanged BEH features arethen calculated to form an empirical null distribution. The p-value of the originalEEG/EMG/BEH average correlation can be computed from the null distribution asthe probability of observing a value at least as extreme as the original correlationin the null distribution. The components with p-value being less than 0.05 areconsidered to be statistically significant, denoted as Tgroup1c, Tgroup2c and Tgroup3c,all with size (N?K), where K is the number of significant components.Spatial Pattern ExtractionSpatial patterns of EEG channels represent functional connections between brainregions. Especially in this study, we want to investigate the differences betweenthe normal group and the PD patient group when performing a motor task. Afterthe identification of significant temporal patterns, we regress the EEG-related com-ponents Tgroup1c back onto the EEG features Xb1 (for b = 1,2, ...,B) for all subjectsas follows:pbk =?1tkT Xb1Xb1T tkXb1T tk, k = 1,2, ...,K. (4.20)where tk is the k-th column of Tgroup1c and pbk is the spatial pattern of the k-th com-ponent for subject b. For subject b, pbk can reflect which correlation connectionsmake important contributions to the corresponding significant temporal pattern. Toallow for easier visualization and comparison of spatial patterns between the twogroups (normal vs. PD), for each significant component, we concatenate the spa-tial patterns of the subjects within a group horizontally, and perform PCA on theconcatenated spatial patterns. The group level spatial pattern is then representedby the first principal component [30].ResultsIn this real data study, we apply the proposed group analysis method for corti-comuscular activity analysis to concurrent EEG, EMG and BEH signals collectedfrom normal subjects and patient subjects with PD when they perform a force track-ing task. We aim to extract common temporal patterns among the three types ofsignals and explore the differences in their spatial patterns between the control and67PD groups.The proposed group analysis method is applied to the EEG, EMG and BEHfeatures generated using the procedure described in Section 4.3.2 from all 8 nor-mal and 9 PD subjects simultaneously. The joint modeling of normal and PD dataallows the identification of common temporal patterns across the groups. Mean-while, the spatial patterns may be different across subjects, from which we canidentify the specific correlation connections or brain regions that are differentiallyrecruited by PD subjects during the motor task.Figure 4.7: Component 1 of the group JMSF when using the EEG correla-tions, the EMG amplitude and the force output data sets. Top panel:Temporal patterns of the EEG (solid), the EMG (dashed) and the BEHsignals (dotted). The oscillation of the target bar is also shown (black,solid). Bottom panel: EEG spatial patterns of normal subjects (left)and PD subjects (right). The connections reflect the respective spatialpatterns in the two groups.In this study, we repeat the analysis to two types of EEG features. For eachcase, X1, X2 and X3 represent the time-varying feature matrices of EEG, EMG and68BEH respectively. The length of the time sequences here is 480, associated with the300-length moving window and a 95% overlap. For the EEG correlation case, sincethe EEG signals are divided into five regions as mentioned in Section 2.2, there area total of C52 = 10 correlation connections. Thus, X1 is with size 480? 10. Forthe EEG band-energy case, there are five brain regions and three frequency bands,indicating 15 energy features can be generated. Thus, X1 is with size 480?15. Forboth cases, X2 and X3 are with size 480?1.Figure 4.8: Component 2 of the group JMSF when using the EEG correla-tions, EMG amplitude and force output signals as the three data sets.EEG Correlation Components Corresponding to EMG and BEH Data Using thepermutation test, three significant components (P? 0.05) are identified. The tem-poral and the corresponding spatial patterns for Component 1 are shown in Fig.4.7. The average correlation coefficient among the EEG, EMG and BEH scores is0.86 (with the p-value P = 0.0005). The group-level EEG spatial activation pat-terns of normal and PD subjects given by PCA are shown in the bottom panel of69Figure 4.9: Component 3 of the group JMSF when using the EEG correla-tions, EMG amplitude and force output as the three data sets.Fig. 4.7. In this EEG feature setting, there are a total of 10 correlation connections.For easier interpretation, only connections whose weights are greater than 80% ofthe maximum value are shown. From Fig. 4.7, we can see that, for both normaland PD groups, this component is dominated by the connection Central ? FCen-tral, while in PD subjects, there is an additional connection Occipital ? RSM. Thetemporal and spatial patterns for Component 2 are shown in Fig. 4.8. The averagecorrelation coefficient among the EEG, EMG and BEH scores is 0.83 (with P =0.0012). From Fig. 4.8, we can see that, for both normal and PD groups, thereexists the connection Central ? FCentral, whereas for the PD group, there are twoadditional connections LSM ? RSM and Occipital ? FCentral. The temporal andspatial patterns for another significant component are shown in Fig. 4.9. The aver-age correlation coefficient among the EEG, EMG and BEH scores is 0.77 (with P= 0.0068). In normal subjects, this component is based largely on the connectionsLSM ? FCentral, Occipital ? LSM and Central ? RSM, while in PD subjects, itis based on the connections LSM ? FCentral, Occipital ? FCentral and Central ?70LSM.Figure 4.10: Component 1 of the group JMSF when using EEG band-limitedenergies, EMG amplitude and force output signals as data sets. Toppanel: Temporal patterns of EEG (solid), EMG (dashed) and BEH(dotted). The oscillation of the target bar is also shown (black, solid).Bottom panel: EEG spatial patterns of normal subjects (left) and PDsubjects (right). The colors reflect the relative weighting of the respec-tive band-limited energies in five brain regions.71Figure 4.11: Component 2 of the group JMSF when using the EEG band-limited energies, EMG amplitude and force output signals as data sets.EEG Band-Energy Components Corresponding to EMG and BEH Data Usingthe permutation test, two significant components (P ? 0.05) are identified. Thetemporal and spatial patterns for Component 1 are shown in Fig. 4.10. The av-72erage correlation coefficient among the EEG, EMG and BEH scores is 0.91 (withP = 0.0004). The group-level EEG spatial activation patterns of normal and PDsubjects given by PCA are shown in the bottom panel of Fig. 4.10. In this EEGfeature setting, there are five brain regions and three frequency bands, i.e. 15 en-ergy features. For the theta band, in both normal and PD groups, FCentral demon-strates higher weights, while in PD subjects LSM, Central and Occipital also haverelatively higher weights. For the alpha band, in normal subjects FCentral andLSM play important roles, whereas in PD subjects there are no obvious differencesamong the five regions. For the beta band, in the normal group this component islargely associated with activities in Central, RSM and Occipital regions, while inPD the activity is seen more in the RSM and LSM regions. The temporal and spatialpatterns for the other component are shown in Fig. 4.11. The average correlationcoefficient among the EEG, EMG and BEH scores is 0.80 (with P = 0.0284). Forthe theta band, in both normal and PD groups RSM and Occipital demonstrateshigher weights, while in normal subjects LSM shows a particularly high weight.For the alpha band, in normal subjects the right brain area plays an important role,whereas in PD subjects it is seen more in the frontocentral area. For the beta band,all regions display low values in normal subjects, while in PD this component isassociated with activity in the Central region.4.4 DiscussionWhen the EEG correlation feature is used, we note that the connections with oc-cipital regions are prominent in PD subjects. Compared to normal subjects, thePD subjects heavily rely on visual cues for the initiation [94] and ongoing controlof movement [95]. In addition, the increased intra and inter-hemispheric connec-tions observed in the PD subjects are in accord with the findings in previous MEGstudies during the resting state [5].For the case when the EEG Band-Energy feature is used, we note that an in-creased number of brain regions in the theta band are recruited in PD subjects,which is in line with the observed association between synchronization in the thetaband and motor symptoms, in particular tremor [5]. In addition, the PD subjectsshow increased interhemispheric synchronization in the alpha range. This find-73ing may be positively related to cognitive perseveration in Parkinson?s disease [5].Besides, compared to normal subjects, the PD subjects heavily rely on the Cen-tral region in the beta band during the motor task, which is quite consistent withour previous study in the same experiment setting by mutual information network[112].74Chapter 5A Three-step Method forCorticomuscular ActivityModeling5.1 Motivation and ObjectivesIn the last chapter, we have proposed a framework to handle multimodal data setsunder the uncorrelatedness assumption. However, LVs that are uncorrelated cansometimes complicate interpretations in real medical applications [38]. We willdemonstrate this point in the simulation part. In this chapter, we aim to developa framework that can explore the relationship between multimodal data sets andmeanwhile can extract independent LVs within each data set. For multimodal cor-ticomuscular activity analysis, both M-CCA and jICA have their individual ad-vantages and disadvantages and fortunately, these two methods can be consideredcomplementary.In this chapter, we propose combining M-CCA and jICA to improve the over-all performance of the joint source extraction. More specifically, we first adoptM-CCA to obtain CVs across multiple data sets and then perform jICA on theextracted LVs. In contrast to prior approaches [113, 114], we generate commoncomponents from sources rather than the mixing matrices and concentrate on mul-75timodal/multiset data fusion. In fact, we employ a three-step modeling strategy.The first two steps are similar to JMSF and the last one is to help improve thelimitation of M-CCA. Because M-CCA requires a stringent assumption that corre-lation coefficients of the corresponding sources between multiple data sets shouldbe sufficiently distinct [110], which will be specified in Section 5.2.1 by Equation(5.2). This assumption may not be satisfied in practice.5.2 MethodsThe main contributions of the proposed method lies in its ability to: (1) modelmultiple data sets/signals simultaneously, overcoming the traditional constraint ofanalyzing only two types of data sets, and (2) extract maximally correlated compo-nents across multiple data sets while simultaneously keeping components withineach individual data set maximally statistically independent. In the following sub-sections, the three-step modeling strategy, illustrated in Fig. 5.1, is described indetail. However, we will begin from the second step since the first step can bereferred to Section 4.2.1.Figure 5.1: The diagram of the proposed joint multimodal analysis method.5.2.1 Multiset Canonical Correlation AnalysisAlthough the corresponding subLVs for multiple data sets could show similar pat-terns because of the common supLVs, the correlation coefficients among the sub-LVs can not be necessarily ensured to be maximized. Therefore, a second step isneeded to further decompose the underlying information of the results from the76first-step. M-CCA extends the theory of CCA to more than two random vectorsto identify CVs that summarize the correlation structure among multiple randomvectors by linear transformations. Unlike CCA where correlation between twocanonical variates is maximized, M-CCA aims to optimize an objective functionof the correlation matrix of the canonical variates from multiple random vectorsin order to make the canonical variates achieve the maximum overall correlation[110]. It has been shown that M-CCA can achieve excellent performances, espe-cially when the data sets are large. Details about the implementation procedure ofM-CCA can be found in [110].In this study, the inputs to M-CCA are the extracted MsubLVs from the first-step, i.e. mT1, mT2, ..., mTM, and the outputs are the extracted canonical variateswith maximum correlation, revealing their common information, i.e., CV1, CV2, ...,CVM. The associated decomposition can be expressed as below:mTi =CViAi, ?i = 1,2, ...,M (5.1)where Ai is the mixing matrix within the ith data set. Regarding the number ofthe canonical variates in each data set, we can keep it the same as the number ofMsubLVs.As mentioned before, M-CCA may fail to separate sources whose correlationcoefficients are equal or very close, which could often occur in real biomedicalapplications. In other words, if K sources can be extracted from each of M data setscorrespondingly, the following requirement must be met to successfully recover thesources by M-CCA:|r(?)m,n| 6= |r(? )m,n|, (1? ? < ? ? K,?m,n ? 1,2, ...,M) (5.2)where |r(? )m,n| represents the correlation coefficient between the ? -th source from them-th data set and the ? -th source from the n-th data set. Therefore, CVi should beregarded as incompletely decomposed sources, i.e. mixtures of the real indepen-dent components.775.2.2 Joint Independent Component AnalysisDue to the possible incompleteness of the source separation, in Step-3, we per-form jICA on the concatenated CVs to maximize the independence among jointcomponents by reducing their higher order statistical dependencies as below [84]:[CV1;CV2; ...;CVM] = [S1;S2; ...;SM]A (5.3)where A represents the common mixing matrix and Si is the extracted ICs from thei-th data set. For the implementation of ICA, many algorithms have been developedbased on different cost functions. Here we employ the classical one, FastICA [39].We then correlate the corresponding columns of Si and calculate the average cor-relation coefficients, in terms of which the corresponding columns are sorted fromhigh to low. As to the number of the joint ICs in each data set, we can first keepit the same as the number of CVs and then apply the permutation test to identifysignificant components.We should note that jICA assumes that all modalities share the same mixingmatrix A. This constraint is not always easily satisfied in practice. In the proposedmethod, M-CCA first links multiple data sets via correlation and prepares morerelevant CVs correspondingly across the data sets. Therefore we can perform jICAon the extracted CVs with more confidence, meaning that M-CCA helps jICA relaxthe required constraint. In turn, jICA further decomposes the remained mixturesin CVs and also helps M-CCA relax the constraint of the distinctiveness for cor-relation coefficients. In summary, M-CCA and jICA in the proposed method arecomplementary to each another and combining them appropriately can somehowovercome their limitations.5.3 Data Processing and Results5.3.1 SimulationIn this simulation, we apply the proposed method to synthetic data and also imple-ment separate ICA, M-CCA and jICA respectively for comparison.78Synthetic DataAs an illustrative example, without loss of generality, six sources are generated andanalyzed.The following six source signals are considered:s1 = sin(0.015t)+ cos(0.005t)s2 = 2cos(0.08t)sin(0.006t)s3 = ECG, s4 = EMGs5 = 1.5cos(0.01t)sin(0.5t)s6 = 1.5sin(0.025(t +63))sin(0.2t)(5.4)where t denotes the time index vector, valued from 1 to 1000, and si?s (i= 1,2, ...,6)represent six simulated sources, as shown in Fig. 5.2. Note that here si?s are columnvectors.Figure 5.2: The six simulated source signals.Three mixed data sets X1, X2 and X3 are generated as follows, with each rowdenoting one observation in their respective data space:Xi = Si ?Ai, i = 1,2,3 (5.5)79where S1 = [s1 s2 s3 s4], S2 = [s1 s2 s3 s5] and S3 = [s1 s2 s3 s6] withA1 =??????0.76 ?0.55 0.17 ?0.33 0.820.79 0.65 0.32 0.12 ?0.970.87 0.46 ?0.58 0.27 ?0.740.59 0.45 0.37 0.22 0.11??????, (5.6)A2 =??????0.73 ?0.52 0.21 ?0.29 0.780.82 ?0.67 0.6 ?0.20 ?0.900.78 ?0.1 0.71 0.29 ?0.510.52 0.39 0.30 0.27 0.15??????, (5.7)A3 =??????0.69 ?0.55 0.22 ?0.34 0.770.76 0.69 0.22 0.25 ?0.600.71 ?0.49 ?0.35 0.4 ?0.90.60 0.41 0.30 0.16 0.20??????. (5.8)The sources s1, s2 and s3 exist in all data sets, representing common informa-tion. The source s4 only contribute to X1, s5 only to X2 and s6 only to X3, represent-ing their own unique information. Random Gaussian noise with 5% power wereadded to Si?s before generating the mixed data sets.ResultsThe extracted underlying components using separate ICA, joint ICA, M-CCA andthe proposed method are shown in Fig. 5.3a, 5.3b, 5.3c and 5.3d respectively. Wealso made a quantitative comparison between these methods by calculating averagecorrelation coefficients for each method (Fig. 5.3e.) The separate ICA approach re-covered the original sources accurately in all data sets as expected, but was unableto meaningfully relate the three data sets or rank the ICs appropriately. Althougha subsequent source match via cross-correlation could be used to oder the sourcesthis may introduce ambiguity especially when the estimated component numberis high. By using joint ICA, the first jointly extracted sources, although perhapsless accurate, are at least highly relevant to each other. But the remaining extractedsources seem to be uninformative due to the stringent assumption that all modalities80(a) (b)(c) (d)(e)Figure 5.3: The extracted underlying components by the following fourmethods: (a) separate ICA; (b) joint ICA; (c) M-CCA; (d) the proposedmethod; (e) performance comparison of the different methods.share the same mixing matrix. The LVs extracted by the M-CCA were at least auto-matically and neatly ordered in terms of their average correlation coefficient valuesamong the data sets. However, compared with the original sources, the extractedLVs were distorted, suggesting 1) performance of M-CCA may suffer when thecondition indicated by Equation (5.2) is not satisfied, and 2) uncorrelatedness may81not be a sufficiently rigorous criterion to accurately recover the underlying sources.When the proposed method was employed, the dominant sources which made sig-nificant contributions to all data sets were accurately identified and ordered, withthe focus entirely on sources common across multiple data sets. In summary, theproposed method mitigates the deficiencies of both M-CCA and joint ICA and canseparate sources accurately and link them correctly in a less-constrained condition.5.3.2 Real DataThe experiment description can be referred to Section 2.2. In this study, we utilizeconcurrently EEG, EMG and BEH signals collected from controls and PD subjects.Feature ExtractionIn this chapter, we examine the coupling relationships among time-varying EEGfeatures, the amplitude of the EMG signals, and behavioral performance mea-surements, constituting X1b, X2b, and X3b respectively for each subject b (for b =1,2, ...,B). Suppose we have a total of B subjects (B = 17 in this study) and m typesof data sets for each subject (m = 3 in this study). To achieve a group analysis, allsubjects? data sets are correspondingly concatenated as:X1 = [X11,X12, ...,X1B] ,X2 = [X21,X22, ...,X2B] ,X3 = [X31,X32, ...,X3B] ,(5.9)with the assumption that all subjects share common group patterns in the temporaldimension [82].Band-limited, pair-wise EEG coherences [97] are considered in this chaptersince this type of time-varying features include temporal, spatial and spectrum in-formation simultaneously. The calculation can be referred to Equation (1.1) andSection 3.3.2. EMG and BEH Features are exactly the same as in Section 4.3.2.82Significance AssessmentThe non-parametric permutation test performed here is similar to Section 4.3.2.The p-value of the original EEG/EMG/BEH average correlation was then esti-mated from the null distribution as the probability of observing a value at least asextreme as the original correlation in the null distribution. The components with p-value being less than 0.05 were considered statistically significant, denoted as U1,U2 and U3, all with size (N?K), where K is the number of significant components.Spatial Pattern ExtractionIn order to aid in the biological interpretation of the results, we also derived connec-tivity patterns between EEG channels representing functional connections betweenbrain regions. After the identification of significant temporal patterns, we regressedthe EEG-related components U1 back onto the EEG features X1b (for b = 1,2, ...,B)for all subjects as follows:pbk =?1ukT X1bX1bT ukX1bT uk, k = 1,2, ...,K. (5.10)where uk is the k-th column of U1 and pbk is the connectivity pattern of the k-thcomponent for subject b. In addition, we also want to determine which EEG fea-tures in the connectivity patterns have significant contributions to the correspond-ing temporal patterns. This is done by identifying EEG features that have weightsstatistically significant different from zero. To determine the group-level connec-tivity pattern, for each significant component, we applied a two-tailed t-test to eachconnectivity feature across all subjects in each subject group.ResultsThe proposed method was applied to the EEG, EMG and BEH features generatedusing the procedure described in Section 5.3.2 from 8 normal and 9 PD subjectssimultaneously. The joint modeling of normal and PD data allows for the iden-tification of common temporal patterns across the groups, while the connectivitypatterns may be different across subjects. This allows for identification of specificcorrelation connections that are differentially recruited by PD subjects during the83motor task.Here X1b, X2b and X3b (b = 1,2, ...,B) represent the time-varying feature ma-trices of EEG, EMG and BEH for the b-th subject respectively. The length of theoriginal raw EEG and simultaneously recorded EMG data for each subject was7485 associated with the sampling frequency 250 Hz. As mentioned before, af-ter using the 300-length moving window and a 95% overlap, the length of timepoints (the matrix row) here was 480. For the EEG coherences, there are five brainregions and four frequency bands, indicating there are a total of C52 ?4 = 40 vari-ables. Thus, X1b is of size 480? 40. For the EMG amplitude feature, since thereare two surface EMG channels, X2b was 480?2, and the BEH data set X3b was size480? 1. The group analysis data sets X1, X2 and X3 were generated by Equation(5.9).Using the permutation test, one significant component (p-value ? 0.05) wasidentified. The temporal and spatial patterns for this component are shown in Fig.3.8. The acc among the EEG, EMG and BEH scores was 0.7946 (with p-value =0.0015). Note that in the figure only connections whose weights were statisticallysignificantly different from zero are shown.Although there were some similarities in the brain regions and connections re-cruited by PD and control groups, there still existed significant differences Fig. 5.4.In theta and alpha bands, normal subjects demonstrated frontoparietal and fronto-occipital connections which were completely missing in PD subjects. In the betaband, normal subjects had connections between the L sensorimotor cortex and thefrontal and R sensorimotor regions that were absent or reduced in PD subjects. Inthe gamma band, PD subjects had increased connectivity between the frontal re-gions and central and sensorimotor cortices that were missing in controls. Finally,PD subjects had a significant connection between the L sensorimotor and occipitalregions in the theta, alpha and gamma bands that was not present in controls.5.4 DiscussionIn this chapter, we have proposed a three-step multimodel data analysis methodto assess corticomuscular coupling based on combining the advantages of M-CCAand jICA. The simulation results support the usefulness of the proposed method.84Figure 5.4: The significant component jointly extracted from the EEG coher-ence, EMG amplitude and BEH data sets. Top panel: Temporal patternsof the EEG (solid), EMG (dashed) and BEH signals (dotted). The oscil-lation of the target bar is also shown (gray, solid). Bottom panel: EEGconnectivity patterns of normal subjects (left) and PD subjects (right).The connections reflect the respective connectivity patterns in the twogroups with the width of each connection proportional to the weightingin the group EEG spatial pattern. Here acc refers to average correlationcoefficient.85Furthermore, when applied to real data promising results based on detecting sig-nificant differences between Parkinson?s disease subjects and age-matched controlswere detected.The observed results in PD and control groups are in accord with previousmedical findings. We found widespread connectivity between the frontal, centraland ipsilateral (right) sensorimotor regions and ongoing EMG/BEH activity fromFig. 5.4. It is suggested that during visually guided tasks PD subjects may generatedetectable widespread oscillatory disruption in the EEG (e.g. [100]). This likelyrepresents a compensatory mechanism of cortical regions to overcome deficienciesin the basal ganglia to perform motor tasks, as has been observed in fMRI studies[101]. The lack of fronto-central and fronto-occipital activities in the theta andalpha bands seen in PD subjects compared to controls may reflect impaired spatialattention [102], as impairment of maintenance of attention has been well describedin PD subjects (e.g. [115]). The weaker beta band connection between the L andR sensorimotor regions in PD may reflect the decreased ability for fine modula-tory control [90]. The increased gamma connectivity in PD likely represents theextra effort PD subjects require for motor task switching [73, 92, 116], a problemcommon in PD [93]. We also found that connections with occipital regions areprominent in PD subjects. The enhanced occipital connectivity in the theta, betaand gamma bands may relate to the fact that PD subjects heavily rely on visualcues for the initiation [94] and ongoing control of movement [95]. In addition, theincreased intra and inter-hemispheric connections we observed in PD subjects areconsistent with the findings in previous MEG studies during the resting state [5].It may seem paradoxical that PD subjects have reduced beta-band connectivityin the sensorimotor cortex compared with controls, especially since PD subjectsmay have enhanced beta-band EEG activity associated with the clinical signs ofbradykinesia and rigidity [104, 116]. We believe that this is because the currentmethod specifically looks at the common sources among the EEG, EMG and be-havior signals. Because a key aspect in PD is the inability to modulate the beta-band oscillations [105], it is very likely that the fraction of beta-band activity thatvaries with phasic EMG/BEH activity is reduced in PD.In this chapter, we were able to find common temporal patterns in both PDand controls assuming different connectivity origins of the common signal. We86have shown that the distributed EEG connectivity patterns required to generate thecommon temporal pattern differ between PD and controls. One could envision sit-uations where the connectivity patterns are similar, but the temporal profiles aresignificantly different. Nevertheless, usually behavioral tasks are specifically cho-sen so that PD and control subjects can still perform the task with roughly equalaccuracy. Otherwise any differences in brain activation may be dismissed as simplythat the PD and control subjects were performing different tasks.87Chapter 6Conclusions and Future Works6.1 ConclusionsIn this thesis, we proposed four novel multimodal signal processing methods forcorticomuscular coupling analysis and investigated different brain connectivity pat-terns between normal and PD subjects during a dynamic visually guided trackingtask. The goal of all these methods is to maximize the estimated source indepen-dence within an individual data set and to maximize the source dependence acrossdatasets. As illustrated in Fig. 6.1, the relationships between these methods can besummarized as follows: both PLS+CCA and IC-PLS are bimodal approaches forhandling only two types of data; both JMSF and 3-step method can handle multi-modal data; both PLS+CCA and JMSF can only extract uncorrelated componentswithin each data set, while either of IC-PLS and 3-step method is able to obtainindependent components using higher-order statistics.PLS+CCA is a bimodal method which only explores second-order statisticswith a two-stage procedure. It is a simple, yet efficient method to extract uncor-related components within each data set and meanwhile keep corresponding com-ponents across data sets highly correlated. It is especially suitable for a situationwhere the uncorrelatedness assumption is competent to decompose the signals andcomputational speed is the major concern (e.g. online detection). IC-PLS is alsoa bimodal method which incorporates both second-order and high-order statisticsinto a multi-objective optimization problem. It ensures that the components within88Figure 6.1: The relationships between the four proposed methods.each data set are as independent as possible and their orders across data sets are thesame. When two types of data are available and the uncorrelatedness assumptionis insufficient to decompose the signals, IC-PLS is a good option. However, as amulti-objective optimization problem, IC-PLS represents a compromise betweenthe conventional PLS and ICA.JMSF is preferred when more than two types of data are available. It is able toextract similar temporal patterns from multimodal data sets. It particularly accom-modates the needs in recent years, e.g., performing data fusion when concurrentdata sets are available (e.g. fMRI, EEG, EMG, etc). However, the estimated com-ponents within each data set are only uncorrelated, which may not be enough incertain applications where independent components are preferred. To address thisconcern, the proposed 3-step method aims to extract independent components byassuming all data sets share the same mixing matrix. For the purpose of groupanalysis, we could employ both JMSF and the proposed 3-step methods.One should note that in this thesis we have analyzed the brain spatial patternsby several data-driven coupling analysis methods based on multimodal data, whilethe conventional EEG-EMG coherence only explores individual locus and tradi-tional EEG-based connectivity analysis ignores EMG-related information. In thepast, most EEG/EMG coupling analysis studies have compared EEG activity at a89specific locus (e.g. sensorimotor cortex contralateral to the hand performing thetask) with the EMG during sustained contraction. However, we found that spec-trum/correlation/coherence features between the contralateral sensorimotor cortexand other regions were closely associated with ongoing EMG/BEH features duringdynamic motor performance. It is likely that the dynamic nature of the task mightrequire the recruitment of additional regions such as frontal regions for motor se-lection [89], contralateral (i.e. ipsilateral to hand movement) sensorimotor cortexfor fine modulatory control [90] and occipital regions for post-error adaptations[91].While the proposed methods are intentionally developed for corticomuscularcoupling analysis, we would like to emphasize that they can also be applied toanalyze other forms of concurrent signals including but not limited to fMRI, pho-toplethysmograph (PPG), ECG and kinematic data. Therefore, they are promisingtools for multi-subject and multi-modal data analysis.In Appendix A, we designed and implemented a wireless wearable EEG/sEMGrecording system. Based on the developed system, we investigated a novel ap-plication ? Chinese number gesture recognition. Number gestures are the mostfrequently used hand gestures in real-life especially when there are language com-munication difficulties. One is able to develop similar applications for their ownlanguages based on our reported work. We believe that such applications couldbe very useful in multifunction prosthesis, remote control, human-computer inter-face (HCI), etc. We also conducted a complete performance comparison by investi-gating several most popular feature extraction and recognition methods. To furtherimprove the recognition accuracy, we proposed using MKL-SVM as a more effec-tive classifier by fitting the features with multiple kernel functions. Among thesecombinations, we noted that 3F + MKL-SVM provided the best performance, butthe efficiency of MKL-SVM is much lower than that of the others. To implementan online recognition system, we suggest that Hudgins? time-domain (TD) + lin-ear discriminant analysis (LDA) can be the best trade-off between efficiency andperformance.906.2 Future Research DirectionsTo better model corticomuscular activity by data-driven coupling methods, we stillneed to address several challenges, including relaxing the statistical assumption(e.g. sharing the same mixing matrix) for independence, dealing with the underde-termined situation when there exist more sources than the measurement channels,and developing novel approaches for dynamic/time-varying coupling analysis.Decomposing signals into independent components has been proven very use-ful in many biomedical applications [80?82]. An interesting problem is to jointlymodel multiple data sets to decompose each into independent components and re-late corresponding components across the data sets. In this thesis, we have finallyproposed a three-step framework to realize this goal. However, a stringent sta-tistical assumption has been made in the framework that the mixing matrices arethe same. This assumption could be too strong in practice. We have to relax thispractically-unfavored assumption to explore underlying information in a more re-alistic setting. A possible solution might lie in a recently popular joint BSS method? the IVA [109]. IVA, as an extension of ICA from one to multiple datasets, has at-tracted increasing research attention during the past few years. IVA was originallydesigned to address the permutation problem in the frequency domain for the sep-aration of acoustic sources [117]. Recently, it has been used to do group inferencefor fMRI data [118]. In [119], IVA has been investigated under the multivariateGaussian model. Nevertheless, the distribution of sources in practice can not beeasily modeled as Gaussian. For the corticomuscular coupling analysis problem,we will explore the specific distributions of EEG and EMG signals and designspecific coupling analysis methods under the IVA framework.Another possible future direction is to develop underdetermined coupling anal-ysis methods. In many biomedical applications, only a limit number of sensors areavailable (e.g. one or two EEG/EMG channels), while more underlying sourcesactually exist. It is of particular importance for the situations where a large num-ber of sensors should be avoided such as in the application of health care at home.Possible solutions may include decomposing signals as a preprocessing step (e.g.empirical mode decomposition) [120], separating signals into multiple segments[121] and performing sparse representation [122].91It is also desirable to develop dynamic models for time-varying coupling anal-ysis. All previous methods are based on the assumption that the overall signalsare stationary and the corresponding patterns are the same. However, it may notbe the case in practice. A more natural assumption should be that the brain pat-terns are dynamically changed in terms of their states (e.g. different force levelsduring motor tasks). We could combine hidden Markov models with existing cou-pling methods and dynamically analyze the learned brain patterns. This may allowresearchers obtaining insights into the changing process of brain patterns duringsome motor tasks and provide more information about specific diseases.92Bibliography[1] T. Mima and M. Hallett, ?Corticomuscular Coherence: A Review,? Journal ofClinical Neurophysiology, vol. 16, no. 6, p. 501, 1999.[2] P. Grosse, M. Cassidy, and P. Brown, ?EEGEMG, MEGEMG and EMGEMGfrequency analysis: physiological principles and clinical applications,? Clini-cal Neurophysiology, vol. 113, no. 10, pp. 1523-1531, 2002.[3] K. Friston, ?Functional and effective connectivity in neuroimaging: a synthe-sis,? Human Brain Mapping, vol. 2, no. 1-2, pp. 56-78, 1994.[4] S. Farmer, Neural rhythms in Parkinsons disease, Brain, vol. 125, no. 6, pp.1175-1176, 2002.[5] D. Stoffers, J. Bosboom, etc., ?Increased cortico-cortical functional connectiv-ity in early-stage Parkinson?s disease: An MEG study,? NeuroImage, vol. 41,no. 2, pp. 212-222, 2008.[6] L. F. Haas, ?Hans Berger (1873-1941), Richard Caton (1842-1926), and elec-troencephalography,? Journal of Neurology, Neurosurgery and Psychiatry, vol.74, no. 1, pp. 9-9, 2003.[7] H. Berger, ?On the Electroencephalogram of Man,? Electroencephalogr ClinNeurophysiol, Suppl 28, pp. 37-73, 1969.[8] E. Niedermeyer, F. H. L. da Silva, ?Electroencephalography: Basic Principles,Clinical Applications, and Related Fields,? Lippincot Williams and Wilkins,2004.[9] http://cec.sonus.ca/econtact/14 2/ortiz biofeedback.html[10] J. J. Carr and J. M. Brown, ?Introduction to Biomedical Equipment Technol-ogy,? Prentice Hall, 4th Edition, 2001.93[11] Webster JG, ?Medical Instrumentation: Application and Design,? John Wileyand Sons Inc, 3th Edition, 1998.[12] M.J. Burke and D.T. Gleeson, ?A micropower dry-electrode ECG preampli-fier,? IEEE Trans. Biomed. Eng., vol. 47, pp. 155-162, 2000.[13] M. Engine, T. Dalbasti, M. Gulduren, E. Davash, E. Z. Engin, ?A prototypeportable system for EEG measurements,? Measurement, vol. 40, pp. 936-942,2007.[14] J. L. Smith, S. H. Chug, R. F. Zernicke, ?Gait-related motor patterns andhindlimb kinetic for the cat trot and gallop,? Exp Brain Res, vol. 94, pp. 308-322, 1993.[15] K. R. Wheeler and C. C. Jorgensen, ?Gestures as input: Neuroelectric joy-sticks and keyboards,? Pervasive Comput., vol. 2, no. 2, pp. 56-61, Apr.-Jun2003.[16] M. A. Oskoei and H. Hu, ?Myoelectric control systems-a survey,? Biomed.Signal Process. Control, vol. 2, no. 4, pp. 275-294, Oct. 2007.[17] B. Karlik, M. O. Tokhi and M. Alci, ?A fuzzy clustering neural network archi-tecture for multifunction upper-limb prosthesis,? IEEE Trans. Biomed. Eng.,vol. 50, no. 11, pp. 1255-1261, 2003.[18] J. U. Chu, I. Moon and M. S. Mun, ?A Real-Time EMG Pattern Recogni-tion System Based on Linear-Nonlinear Feature Projection for a MultifunctionMyoelectric Hand,? IEEE Trans. Biomed. Eng., vol. 53, no. 11, pp. 2232-2239,2006.[19] K. S. Kim, H. H. Choi, C. S. Moon, C. W. Mun, ?Comparison of k-nearestneighbor, quadratic discriminant and linear discriminant analysis in classifica-tion of electromyogram signals based on the wrist-motion directions,? CurrentApplied Physics, vol. 11, no. 3, pp.740-745, 2011.[20] G. Tsenov, A. H. Zeghbib, F. Palis, N. Shoylev, V. Mladenov, ?Neural net-works for online classification of hand and finger movements using surfaceEMG signals,? NEUREL 2006, pp. 167-171, Sep. 2006.[21] G. Kondo, R. Kato, H. Yokoi, T. Arai, ?Classification of individual finger mo-tions hybridizing electromyogram in transient and converged states,? in Proc.IEEE Eng. Med. Biol. Soc. Conf., pp. 2909-2915, 2010.94[22] F. Tenore, A. Ramos, A. Fahmy, S. Acharya, R. Etienne-Cummings, and N. V.Thakor, ?Towards the control of individual fingers of a prosthetic hand usingsurface EMG signals,? in Proc. IEEE Eng. Med. Biol. Soc. Conf., pp. 6145-6148, 2007.[23] S. N. Baker, E. Olivier, R. N. Lemon, ?Coherent oscillations in monkey motorcortex and hand muscle EMG show task-dependent modulation,? J Physiol,vol. 501, no. 1, pp. 225-241, 1997.[24] B. A. Conway, D. M. Halliday, S. F. Farmer, U. Shahani, P. Maas, A. I. Weirand J. R. Rosenberg, ?Synchronization between motor cortex and spinal mo-toneuronal pool during the performance of a maintained motor task in man,? JPhysiol, vol. 489, pp. 917-924, 1995.[25] D. Halliday, B. Conway, S. Farmer, J. Rosenberg, ?Using electroencephalog-raphy to study functional coupling between cortical activity and electromyo-grams during voluntary contractions in humans,? Neuroscience Letters, vol.241, no. 1, pp. 5-8, 1998.[26] P. Brown, J. Marsden, etc., ?Intermuscular coherence in Parkinson?s disease:relationship to bradykinesia,? Neuroreport, vol. 12, no. 11, pp. 2577-2581,2001.[27] B. Pollok, V. Krause, W. Martsch, C. Wach, A. Schnitzler, M. Sudmeyer,?Motor-cortical oscillations in early stages of Parkinson?s disease,? The Jour-nal of Physiology, vol. 590, no. 13, pp. 3203-3212, 2012.[28] S. Salenius, S. Avikainen, S. Kaakola, R. Hari and P. Brown, ?Defective corti-cal drive to muscle in Parkinson?s disease and its improvement with levodopa,?Brain, vol. 125, pp. 491-500, 2002.[29] J. Hirschmann, T. E. zkurt, Markus Butz, M. Homburger, S. Elben, C. J.Hartmann, J. Vesper, L. Wojtecki, and A. Schnitzler, ?Differential modulationof STN-cortical and cortico-muscular coherence by movement and levodopain Parkinson?s disease,? NeuroImage, vol. 68, pp. 203-213, March 2013.[30] J. Chiang, Z. J. Wang, M. J. McKeown, ?A multiblock PLS model of cortico-cortical interactions in Parkinson?s disease during corticomuscular activity,?NeuroImage, vol. 63, no. 3, pp. 1498-1509, 2012.[31] R. Bortel, P. Sovka, ?EEG-EMG coherence enhancement,? Signal Process-ing, vol. 86, no. 7, pp. 1737-1751, 2006.95[32] P. Geladi and B. Kowalski, ?Partial least-squares regression: a tutorial,? Ana-lytica Chimica Acta, vol. 185, pp. 1-17, 1986.[33] A. McIntosh, F. Bookstein, J. Haxby, C. Grady, ?Spatial pattern analysis offunctional brain images using partial least squares,? NeuroImage, vol.3, no. 3,pp. 143-157, 1996.[34] E. Martinez-Montes, P. A. Valdes-Sosa, F. Miwakeichi, R. I. Goldman, M.S. Cohen, ?Concurrent EEG/fMRI analysis by multiway partial least squares,?NeuroImage, vol. 22, no. 3, pp. 1023-1034, 2004.[35] A.M. Amjad, D.M. Halliday, J.R. Rosenberg, B.A. Conway, ?An extendeddifference of coherence test for comparing and combining several independentcoherence estimates: theory and application to the study of motor units andphysiological tremor,? J Neurosci Methods, vol. 73, no. 1, pp. 69-79, 1997.[36] V. Murthy, E. Fetz, ?Coherent 25-to 35-Hz oscillations in the sensorimotorcortex of awake behaving monkeys,? Proceedings of the National Academy ofSciences, vol. 89, no. 12, pp. 5670-5674, 1992.[37] S. Wold, N. Kettaneh, K. Tjessem, ?Hierarchical multiblock PLS and PCmodels for easier model interpretation and as an alternative to variable selec-tion,? Journal of Chemometrics, vol. 10, no. 5-6, pp. 463-482, 1998.[38] M. G. Gustafsson, ?Independent Component Analysis Yields Chemically In-terpretable Latent Variables in Multivariate Regression,? J. Chem. Inf. Model.,vol. 45, no. 5, pp. 1244-1255, 2005.[39] A. Hyvarinen, E. Oja, ?Fast fixed-point algorithm for independent componentanalysis,? Neural Comput., vol. 9, no. 7, pp. 1483-1492, 1997.[40] G. R. Naik, D. K. Kumar, H. Weghorn, M. Palaniswami, ?Subtle hand ges-ture identification for hci using temporal decorrelation source separation bss ofsurface EMG,? In: Digital image computing techniques and applications, 9thBiennial conference of the Australian pattern recognition society, Adelaide,Australia, pp. 30-37, 2007.[41] G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. Stafford Noble.?A statistical framework for genomic data fusion,? Bioinfomatics, vol. 20, no.16, pp. 2626-2635, 2004.[42] R. Agarwal, J. Gotman, ?Adaptive Segmentation of ElectroencephalographicData Using a Nonlinear Energy Operator,? Proc. IEEE ISCAS 1999, pp. 199-202, 1999.96[43] B. B. Winter, J. G. Webster, ?Driven-Right-Leg Circuit Design,? IEEE Trans.Biomed. Eng., vol. 30, pp. 62-66, 1983.[44] H. L. Lew, S. J. TSAI, Pictorial guide to muscles and surface anatomy, John-son?s practical electromyography, Lippincott Williams and Wilkins; pp. 145-212, 2007.[45] G. Li, Y. Li, Z. Zhang, Y. Geng, R. Zhou, ?Selection of sampling rate forEMG pattern recognition based prosthesis control,? in Proc. IEEE Eng. Med.Biol. Soc. Conf., pp. 5058-5061, 2010.[46] X. Zhang, X. Chen, Z. Zhao, Y. Tu, J. Yang, V. Lantz, K. Wang, ?Researchon gesture definition and electrode placement in pattern recognition of handgesture action SEMG,? In Medical Biometrics, pp. 33-40, Springer Berlin Hei-delberg, 2007.[47] K. Englehart, B. Hudgins, P. A. Parker, M. Stevenson, ?Classification of themyoelectric signal using time-frequency based representations,? Medical En-gineering and Physics, vol. 21, no. 6-7, pp. 431-438, 1999.[48] B. Hudgins, P. A. Parker, R. Scott, ?A new strategy for multifunction myo-electric control,? IEEE Trans. Biomed. Eng., vol. 40, no. 1, pp. 82-94, 1993.[49] S. Leowinata, B. Hudgins, P.A. Parker, ?A multi function myoelectric con-trol strategy using an array of electrodes,? 16th Annual Congress of the Inter-national Society Electrophysiology and Kinesiology, Montreal, PQ, Canada,1998.[50] N. Uchida, A. Hiraiwa, N. Sonehara, K. Shimohara, ?EMG pattern recogni-tion by neural networks for multi fingers control,? Proc. Ann. Int. Conf. IEEEEng. Med. Biol. Soc., vol. 3, pp. 1016-1018, 1992.[51] K. A. Farry, I. D. Walker, R.G. Baraniuk, ?Myoelectric teleoperation of acomplex robotic hand,? IEEE Transactions on Robotics and Automation, vol.12, no. 5, pp. 775-788, 1996.[52] K. Englehart, B. Hudgins, P. A. Parker, ?A wavelet-based continuous classi-fication scheme for multifunction myoelectric control,? IEEE Trans. Biomed.Eng., vol. 48, pp. 302-311, 2001.[53] K. Nazarpour, A. R. Sharafat, S. M. P. Firoozabadi, ?Application of higherorder statistics to surface electromyogram signal classification,? IEEE Trans.Biomed. Eng., vol. 54, no. 10, pp. 1762-1769, Oct. 2007.97[54] F. Chan, Y.-S. Yang, F. K. Lam, Y.-T. Zhang, P. A. Parker, ?Fuzzy EMGclassification for prosthesis control,? IEEE Trans. Rehab. Eng., vol. 8, no. 3,pp. 305-311, 2000.[55] K. Englehart, B. Hudgins, ?A robust, real-time control scheme for multifunc-tion myoelectric control,? IEEE Trans. Biomed. Eng., vol. 50, no. 7, pp. 848-854, 2003.[56] Y. Huang, K. Englehart, B. Hudgins, A. D.C. Chan, ?A Gaussian mixturemodel based classification scheme for myoelectric control of powered upperlimb prostheses,? IEEE Trans. Biomed. Eng., vol. 52, no. 11, pp. 1801-1811,2005.[57] M. R. Ahsan, M. I. Ibrahimy, O. O. Khalifa, ?EMG signal classification forhuman computer interaction: a review,? Eur J Sci Res, vol. 33, no. 3, pp. 480-501, 2009.[58] J. Kim, S. Mastnik, E. Andre, ?EMG-based hand gesture recognition for re-altime biosignal interfacing,? in Proc. 13th Int. Conf. Intell. User Interfaces,Gran Canaria, Spain, pp. 30-39, 2008.[59] T. Lorrain, N. Jiang, D. Farina, ?Influence of the training set on the accuracyof surface EMG classification in dynamic contractions for the control of mul-tifunction prostheses,? Journal of NeuroEngineering and Rehabilitation, 8:25,2011.[60] L. Hargrove, E. Scheme, K. Englehart and B. Hudgins, ?Principal compo-nents analysis tuning for improved myoelectric control,? Proceeding of the30th Annual Conference of Canadian Medical and Biological Engineering So-ciety, 2007, Toronto.[61] M.F. Lucas, A. Gaufriau, S. Pascual, C. Doncarli, D. Farina, ?Multi-channelsurface EMG classification using support vector machines and signal-basedwavelet optimization,? Biomed. Signal Process. Control, vol. 3, no. 2, 169-174, 2008.[62] M. A. Oskoei, H. Hu, ?Support vector machine-based classification schemefor myoelectric control applied to upper limb,? IEEE Trans. Biomed. Eng., vol.55, pp.1956-1965, 2008.[63] C. W. Hsu, C. J. Lin, ?A comparison of methods for multiclass support vectormachines,? IEEE Trans. Neural Networks, vol. 13, pp.415-425, 2002.98[64] Y. H. Liu, H. P. Huang, C. H. Weng, ?Recognition of electromyographic sig-nals using cascaded kernel learning machine,? IEEE/ASME Trans. Mechatron.,vol. 12, no. 3, pp. 253-264, 2007.[65] F. R. Bach, G. R. G. Lanckriet, M. I. Jordan, ?Multiple kernel learning, conicduality, and the SMO algorithm,? Twenty-first international conference on Ma-chine learning, 2004.[66] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning -Data Mining, Inference and Prediction, 2nd ed., Springer, New York, 2001.[67] S. Sonnenburg, G. Raetsch, C. Schaefer, and B. Schoelkopf. ?Large scalemultiple kernel learning,? J. Mach. Learn. Res., 7:1531-1565, 2006.[68] A. Rakotomamonjy, F.R. Bach, S. Canu and Y. Grandvalet, ?SimpleMKL,? J.Mach. Learn. Res., 9:2491-2521, 2008.[69] G. Tropini, J. Chiang, Z. J. Wang, E. Ty, M. J. McKeown, ?Altered DirectionalConnectivity in Parkinson?s Disease During Performance of a Visually GuidedTask,? Neuroimage, vol. 56, no. 4, pp. 2144-2156, 2011.[70] X. Chen, Z. J. Wang, M. J. McKeown, ?A tridirectional method for cortico-muscular coupling analysis in Parkinson?s disease,? Multimedia Signal Pro-cessing (MMSP), 2012 IEEE 14th International Workshop on, pp. 309-312,17-19 Sept. 2012.[71] X. Chen, X. Chen, R. K. Ward, Z. J. Wang, ?A Joint Multimodal Group Anal-ysis Framework for Modeling Corticomuscular Activity,? IEEE Trans. on Mul-timedia, vol. 15, no. 5, pp. 1049-1059, 2013.[72] X. Chen, A. Liu, Z. J. Wang, H. Peng, ?Modeling Corticomuscular Activ-ity by Combining Partial Least Squares and Canonical Correlation Analysis,?Journal of Applied Mathematics, vol. 2013, Article ID 401976, 11 pages, 2013.[73] X. Chen, C. He, Z. J. Wang, M. J. McKeown, ?An IC-PLS Framework forGroup Corticomuscular Coupling Analysis,? IEEE Trans. Biomed. Eng., vol.60, no. 7, pp. 2022-2033, 2013.[74] X. Chen, Z. J. Wang, M. J. Mckeown, ?A Three-step Method for Cor-ticomuscular Activity Modeling with Application to Parkinson?s Disease,?IEEE Transactions on Information Technology in Biomedicine, in press, 2013,doi:10.1109/JBHI.2013.2284480.99[75] B. S. Everitt, G. Dunn, Applied Multivariate Data Analysis, 2nd Ed., JohnWiley and Sons, 2010.[76] H. L. Yu, J. F. MacGregor, ?Post processing methods (PLS-CCA): simplealternatives to preprocessing methods (OSC-PLS),? Chemom. Intell. Lab. Syst.,vol. 73, pp. 199-205, 2004.[77] W. D. Clercq, A. Vergult, B. Vanrumste, W. V. Paesschen, S. V. Huffel,?Canonical Correlation Analysis Applied to Remove Muscle Artifacts Fromthe Electroencephalogram,? IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp.2583-2587, 2006.[78] N. M. Correa, T. Adali, Y. Li and V. D. Calhoun , ?Canonical CorrelationAnalysis for Data Fusion and Group Inferences,? IEEE Signal ProcessingMagazine, vol. 27, no. 4, pp. 39-50, 2010.[79] E. Gumus, O. Kursun, A. Sertbas and D. Ustek, ?Application of canonicalcorrelation analysis for identifying viral integration preferences,? Bioinformat-ics, vol. 28, no. 5, pp. 651-655, 2012.[80] M. J. McKeown, S. Makeig, G. G. Brown, T. P. Jung, S. S. Kindermann,A. J. Bell, T. J. Sejnowski, ?Analysis of fMRI data by blind separation intoindependent spatial components,? Hum. Brain Mapp., vol. 6, no. 3, pp. 160-188, 1998.[81] R. Vigario, J. Sarela, V. Jousmaki, M. Hamalainen, and E. Oja, ?IndependentComponent Approach to the Analysis of EEG and MEG Recordings,? IEEETrans. Biomed. Eng., vol. 47, no. 5, pp. 589-593, 2000.[82] V. D. Calhoun, J. Liu, T. Adali, ?A review of group ICA for fMRI data andICA for joint inference of imaging, genetic, and ERP data,? NeuroImage, vol.45, no. 1, pp. S163-S172, 2009.[83] H. Kaneko, M. Arakawa, K. Funatsu, ?Development of a new regression anal-ysis method using independent component analysis,? J. Chem. Inf. Model, vol.48, pp. 534-541, 2008.[84] V. D. Calhoun, T. Adali, K. A. Kiehl, R. Astur, J. J. Pekar, G. D. Pearlson,?A method for multitask fMRI data fusion applied to schizophrenia,? Humanbrain mapping, vol. 27, no. 7, pp. 598-610, 2006.[85] F. Lindgren, P. Geladi, S. Wold, ?The kernel algorithm for PLS,? Journal ofChemometrics, vol. 7, no. 1, pp. 45-59, 1993.100[86] S. Weiss, P. Rappelsberger, ?Long-range EEG synchronisation during wordencoding correlates with successful memory performance,? Cognitive BrainResearch, vol. 9, pp. 299-312, 2000.[87] E. Clancy, E. Morin, and R. Merletti, ?Sampling, noise-reduction and ampli-tude estimation issues in surface electromyography,? Journal of Electromyog-raphy and Kinesiology, vol. 12, no. 1, pp. 1-16, 2002.[88] P. Good, Permutation Tests: A Practical Guide to Resampling Methods forTesting Hypotheses, 2nd Ed., Springer Series in Statistics, Springer-Verlag,Heidelberg, 2000.[89] P. Praamstra, L. Boutsen, G. W. Humphreys, ?Frontoparietal control of spatialattention and motor intention in human EEG,? Journal of Neurophysiology,vol. 94, no. 1, pp. 764-774, 2005.[90] J. Liepert, C. Dettmers, C. Terborg, C. Weiller, ?Inhibition of ipsilateral motorcortex during phasic generation of low force,? Clinical Neurophysiology, vol.112, no. 1, pp. 114-121, 2001.[91] C. Danielmeier, T. Eichele, B. U. Forstmann, M. Tittgemeyer, M. Ullsperger,?Posterior medial frontal cortex activity predicts post-error adaptations in task-related visual and motor areas,? The Journal of Neuroscience, vol. 31, no. 5,pp. 1780-1789, 2011.[92] P. Sauseng, W. Klimesch, R. Freunberger, T. Pecherstorfer, S. Hanslmayr,M. Doppelmayr, ?Relevance of EEG alpha and theta oscillations during taskswitching,? Experimental Brain Research, vol. 170, no. 3, pp. 295-301, 2006.[93] I. G. M. Cameron, M. Watanabe, G. Pari, D. P. Munoz, ?Executive impair-ment in Parkinson?s disease: response automaticity and task switching,? Neu-ropsychologia, vol. 48, no. 7, pp. 1948-1957, 2010.[94] P. Praamstra, D. F. Stegeman, A. R. Cools, M. W. Horstink, ?Reliance onexternal cues for movement initiation in Parkinson?s disease. Evidence frommovement-related potentials,? Brain, vol. 121, no. 1, pp. 167-177, 1998.[95] J. K. R. Stevenson, M. M. K. Oishi, S. Farajian, E. Cretu, E. Ty, M. J. McK-eown, ?Response to sensory uncertainty in Parkinson?s disease: a marker ofcerebellar dysfunction?? European Journal of Neuroscience, vol. 33, no. 2,pp. 298-305, 2011.[96] X. Lu, Optimization Applications and Basis , Shanghai: Tongji UniversityPress; 2003.101[97] M. A. Guevara, M. Corsi-Cabrera, ?EEG coherence or EEG correlation?? Int.J. Psychophysiol., vol. 23, no. 3, pp. 145-153, 1996.[98] S. Haykin, Neural Networks, Upper Saddle River, NJ: Prentice Hall Interna-tional; 1999.[99] D. G. Luenberger, Optimization by Vector Space Methods, New York: Wiley,1969.[100] G. Tropini, J. Chiang, Z. J. Wang, E. Ty, M. J. McKeown, ?Altered Direc-tional Connectivity in Parkinson?s Disease During Performance of a VisuallyGuided Task,? Neuroimage, vol. 56, no. 4, pp. 2144-2156, 2011.[101] B. Ng, S. Palmer, R. Abugharbieh, M. J. McKeown, ?Focusing effects ofL-dopa in Parkinson?s disease,? Hum. Brain Mapp., vol. 31, no. 1, pp. 88-97,Jan 2010.[102] P. Capotosto, C. Babiloni, G. L. Romani, M. Corbetta, ?Frontoparietalcortex controls spatial attention through modulation of anticipatory alpharhythms,? J Neurosci, vol. 29, pp. 5863-5872, 2009.[103] M.J. Wright, R.J. Burns, G.M. Geffen, L.B. Geffen, ?Covert orientation ofvisual attention in Parkinson?s disease: An impairment in the maintenance ofattention,? Neuropsychologia, vol. 28, no. 2, pp. 151-159, 1990.[104] P. Brown, ?Abnormal oscillatory synchronisation in the motor system leadsto impaired movement,? Current Opinion in Neurobiology, vol. 17, no. 6, pp.656-664, 2007.[105] N. Jenkinson, P. Brown, ?New insights into the relationship betweendopamine, beta oscillations and motor function,? Trends in Neurosciences,vol. 34, no. 12, pp. 611-618, 2011.[106] B. Pollok, V. Krause, W. Martsch, C. Wach, A. Schnitzler, M. Sudmeyer,?Motor-cortical oscillations in early stages of Parkinson?s disease,? J Physiol,DOI: 10.1113/jphysiol.2012.231316, 2012.[107] H. Gu, Z. Pan, B. Xi, V. Asiago, B. Musselman, D. Raftery, ?Principal com-ponent directed partial least squares analysis for combining nuclear magneticresonance and mass spectrometry data in metabolomics: application to the de-tection of breast cancer,? Anal Chim Acta, vol. 686, no. 1-2, pp. 57-63, 2011.[108] C. Zhao, F. Gao, ?A bidirectional between-set statistical analysis methodand its applications,? JAIChE Journal, 57(5):1233-1249, 2011.102[109] H. Attias, S. Lee, T. Lee, ?Blind Source Separation Exploiting Higher-OrderFrequency Dependencies,? IEEE Transactions on Audio, Speech, and Lan-guage Processing, 15(1):70-79, 2007.[110] Y. Li, T. Adali, W. Wang, V. Calhoun, ?Joint Blind Source Separation byMultiset Canonical Correlation Analysis,? IEEE Transactions on Signal Pro-cessing, vol. 57, no. 10, pp. 3918-3929, 2009.[111] T. Fearn, ?On orthogonal signal correction,? Chemom Intell Lab Syst., vol.50, pp. 47-52, 2000.[112] Z. J. Wang, P. W. Lee, and M. J. McKeown, ?A Novel Segmentation, MutualInformation Network Framework for EEGAnalysis of Motor Tasks,? BioMed-ical Engineering Online, vol. 8, no. 9, 2009.[113] J. Sui, T. Adali, G. Pearlson, H. Yang, S. Sponheim, T. White, and V. D.Calhoun, ?A CCA+ ICA based model for multi-task brain imaging data fusionand its application to schizophrenia,? Neuroimage, vol. 51, no. 1, pp. 123-134,2010.[114] J. Sui, G. Pearlson, A. Caprihan, T. Adali, K. A. Kiehl, J. Liu, J. Yamamoto,and V. D. Calhoun, ?Discriminating schizophrenia and bipolar disorder by fus-ing fMRI and DTI in a multimodal CCA+ joint ICA model,? Neuroimage, vol.57, no. 3, pp. 839-855, 2011.[115] M. J. Wright, R.J. Burns, G.M. Geffen, L.B. Geffen, ?Covert orientation ofvisual attention in Parkinson?s disease: An impairment in the maintenance ofattention,? Neuropsychologia, vol. 28, no. 2, pp. 151-159, 1990.[116] E. Florin, R. Erasmi, C. Reck, M. Maarouf, A. Schnitzler, G. R. Fink,L. Timmermann, ?Does increased gamma activity in patients suffering fromParkinson?s disease counteract the movement inhibiting beta activity?,? Neu-roscience, vol. 237, pp. 42-50, 2013.[117] T. Kim, T. Eltoft, TW Lee, ?Independent vector analysis: An extension ofICA to multivariate components,? Independent Component Analysis and BlindSignal Separation, pp. 165-172, Springer Berlin Heidelberg, 2006.[118] J. H. Lee, T. W. Lee, F. A. Jolesz, S. S. Yoo, ?Independent vector analysis(IVA): multivariate approach for fMRI group study,? Neuroimage, vol. 40, no.1, pp. 86-109, 2008.103[119] M. Anderson, T. Adali, XL Li, ?Joint blind source separation with multi-variate Gaussian model: algorithms and performance analysis,? IEEE Trans.Signal Process., vol. 60, no. 4, pp. 1672-1683, 2012.[120] M. J. McKeown, R. Saab, R. Abu-Gharbieh, ?A combined independent com-ponent analysis (ICA)/empirical mode decomposition (EMD) method to infercorticomuscular coupling,? IEEE EMBS 2nd International Conference on Neu-ral Engineering, pp. 679-682, 2005.[121] M. E. Davies, C. J. James, ?Source separation using single channel ICA,?Signal Processing, vol. 87, no. 8, pp. 1819-1832, 2007.[122] P. Bofill, M. Zibulevsky, ?Underdetermined blind source separation usingsparse representations,? Signal processing, vol. 81, no. 11, pp. 2353-2362,2001.104Appendix APrototype DevelopmentThe reason we put this part in Appendix is that we did not use our developedsystem to collect data for corticomuscular coupling analysis although our originalpurpose was to utilize an integrated wireless EEG and EMG system for patients?convenience. We were required to use commercialized medical devices in sci-entific publications. However, the prototype is still very useful to develop novelapplications for daily-life usage.A.1 IntroductionBiosensor monitoring systems have been attracting increasing interest, stimulatedby recent advances in electronic, communications and information technologies.For instance, EEG is the most popular non-invasive recording technology stud-ied in BCI, a hot topic aiming at creating a direct link between the brain and acomputer or a control device. However, most BCI systems use bulky and wiredEEG measurements, which is uncomfortable and inconvenient for users to performdaily-life tasks. To address this concern, in Section A.2 we design a wearable, wire-less EEG acquisition and recording system, including a data acquisition unit and adata transmission/receiving unit. Our testing results show that the proposed designis effective with high common-mode rejection ratio (CMRR). The developed EEGsystem is suitable for biomedical applications such as patient monitoring and BCI.Based on the developed EEG system, in Section A.3 we modify it and build a105sEMG hand gesture recognition system. As mentioned in Section 1.2.1, regardingthe research direction of multiple finger gesture classification, to our knowledge,there are relatively few research works in literature and there has been no multiplefinger gesture benchmark proposed yet. In [40], the authors used BSS and artifi-cial neural network (ANN) techniques to off-line classify four subtle hand gesturesby four wired sEMG channels and achieved 97% average accuracy for seven sub-jects, however, only three of the four gestures can be categorized as multiple fingergesture tasks. In this chapter, we plan to define a set of multi-finger movementtasks which could be commonly used for benchmark testing. Based on our exten-sive preliminary investigation, we choose number gestures as an appropriate set ofmulti-finger movements because of the following reasons: they satisfy the basicdefinition of multi-finger movements; and people use number gestures frequentlyin real-life especially when there are language communication difficulties, whichis indeed a major reason why originally number gestures were invented.In this chapter, we plan to study Chinese number gestures representing thenumbers zero to nine, as illustrated in Fig. A.1. We propose to recognize theseten classes of subtle hand gestures based on a 4-channel wireless sEMG system.The recognition procedure is implemented in two phases. In Phase-1, we developoff-line recognition algorithms to study their recognition performances and checkthe feasibility of implementing a real-time recognition system. We first investigatethe most popular feature extraction and classification algorithms; then based onobserved performances we further propose to combine all three features togetherand employ multiple kernels to adapt the combined feature set structure by multi-Figure A.1: Illustration of the Chinese number gestures.106ple kernel learning (MKL). In support vector machine (SVM), a kernel functionimplicitly maps samples to a feature space given by a feature map. It is often un-clear what the most suitable kernel for the task at hand is, and hence the user maywish to combine several possible kernels. One problem with simply adding kernelsis that using uniform weights is possibly not optimal. For instance, if one kernelis not correlated with the labels at all, then giving it a positive weight just addsnoise [41]. MKL is an efficient way of optimizing kernel weights. In Phase-2,we implement a real-time sEMG recognition system for Chinese number gesturesand demonstrate its online performance as a preliminary study for possible practi-cal applications. For the two phases, we achieved 97.93% and 95% high averagerecognition accuracies respectively.A.2 A Wireless Wearable EEG Recording SystemA.2.1 System DesignThe architecture of the designed EEG recording system is shown in Fig. A.2.Figure A.2: The architecture of the designed EEG recording system.EEG Acquisition CircuitIn order to achieve a reliable EEG recording system, the design of the data acqui-sition circuit is the most important step. The schematic of a single channel EEGacquisition circuit is shown in Fig. A.3. The details of each element in the circuitwill be presented as follows.1. EEG features: A typical human adult EEG signal is about 10 to 100 ?V in107Figure A.3: The schematic of a single channel EEG acquisition circuit.amplitude when measured from the scalp and it is about 10 to 20 mV whenmeasured from subdural electrodes. Since EEG signals are usually acquiredthrough passive electrodes from head surface, they are easily contaminatedby artifacts that are not directly related to brain electrical activity [42]. Theartifacts in the recorded EEG can be either technical such as AC power linenoise or person-related such as EMG, ECG and electrooculogram (EOG).2. Voltage follower: The voltage follower is constructed by applying a full se-ries negative feedback to an operational amplifier (op-amp) simply by con-necting its output to its inverting input, and connecting the signal source tothe non-inverting input. This circuit has an output which is identical to theinput voltage. The importance of this circuit does not come from any changein voltage, but from the input and output impedances of the op-amp. The in-put impedance of the op-amp is very high (usually 1 M? to 10 T?), meaningthat the input of the op-amp does not load down the source or draw any cur-rent from it. On the other hand, the output impedance of the op-amp is verylow and thus it behaves as a perfect voltage source. Both the connectionsto and from the voltage follower are therefore voltage bridging connections,which can reduce power consumption in the source, distortion from over-loading, crosstalk and other electromagnetic interferences. In this design, allop-amps are OPA333 (Texas Instruments, Inc., Dallas, TX). The OPA333series of CMOS op-amps adopt a proprietary auto-calibration technique to108simultaneously provide very low offset voltage (10 ?V max) and near-zerodrift over time and temperature. These miniature, high-precision, low qui-escent current amplifiers offer high-impedance inputs and low-impedanceoutputs with micro-power, making them quite suitable for battery-powered,micro-sized portable medical devices.3. Instrumentation Amplifier: The function of instrumentation amplifier is toamplify the voltage difference between the signals from REF and CH1 EEGelectrodes, and simultaneously reject common-mode noise, such as powerline interference, at both inputs. The INA333 (Texas Instruments, Inc., Dal-las, TX) is the instrumentation amplifier chosen for this design since it is theindustrial lowest power zero-drift instrumentation amplifier. It provides verylow offset voltage, excellent offset voltage drift, micro power consumption,very low quiescent current and high CMRR (110 dB typically), which en-sures excellent precision and stability while extending battery life in portablemedical devices. In our design, RG is set to 10 k? and the amplification gainis 11.4. DC restorator: The direct current (DC) restorator is used to eliminate DCoffset which would otherwise saturate the op-amps subsequent to the instru-mentation amplifier. Here the DC restorator is implemented by using anop-amp in the feedback loop of INA333. The model in Fig. A.4 is used todesign the DC restorator.Figure A.4: The DC restorator model.1095. Drive Right Leg: To reduce common-mode noise, such as power line in-terference (50/60Hz) which is unavoidably coupled into the human subject,the driven right leg (DRL) is necessary to be used in the design. The name,DRL, has been preserved from its first use in ECG equipment [43], althoughthe feedback electrode is not connected to the subject?s right leg for EEGapplications. As shown in Fig. A.3, the DRL is a feedback circuit, feedingthe inverse of common-mode voltage back to the human subject, which actsto reduce the common-mode noise present at the electrodes.6. Band pass filter: The band pass filter is composed of a first order active highpass filter and a first order active low pass filter, cutting off the signals withfrequency components below 0.5 Hz and above 150 Hz. The high pass filteris designed to remove DC voltage offsets and to reduce the baseline drift,while the low pass filter is used to prevent aliasing in the analog-to-digitalconverter (ADC) step. The gain of the band pass filter is set to 1000, whichcan be changed by adjusting the adjustable resistor P1. Therefore, the totalamplification gain of the EEG acquisition circuit is 11000.7. Power supply: Power for the board is supplied by a button battery, Pana-sonic CR2032 with 3V (VCC) output voltage. Through a voltage dividerand a voltage follower, a stable 1.5V (VCC/2) can be obtained and used as avirtual ground for the entire system. If we use a CR2032 to supply power forboth EEG acquisition circuit and the TelosB mote (Crossbow, Inc., Milpitas,CA), the entire system can work normally for at least 8 hours, with mostenergy being consumed by the TelosB mote for wireless data transmission.8. Physical circuit: A two-layer 3.2 cm ? 6.6 cm printed circuit board (PCB)is designed for the 4-channel EEG acquisition unit. The analog channelsare laid out in both sides of the board. A high board density is achieved byusing the second smallest available hand-solderable parts (size 0603 for thepassive components), amplifiers with MSOP8 footprint (3.0 mm? 3.0 mm),8 mil signal trace width and 35 mil vias with 20 mil drill holes. We feel thatsuch design is suitable for a portable EEG recording system.110Wireless Transmission UnitThe wireless transmission unit consists of three parts, which are all based on theTinyOS platform. Two of the three are TelosB motes (one for the transmission sideand the other for receiver on the server side), and another is a Java package for theserver. TelosB mote is an open source platform designed to enable cutting-edge ex-perimentation for research community. The TelosB mote bundles all the essentialsfor lab studies into a single platform, which includes: Universal Serial Bus (USB)programming capability, an IEEE 802.15.4 radio with integrated antenna, a low-power microcontroller (MCU) with extended memory, and an optional sensor suite.To transmit data wirelessly, the EEG signals acquired from the analogue circuit gothrough the ADC equipped with MSP430 on the TelosB mote. It provides 12-bitresolution, and can be configured in different modes. The sampling frequency isselectable and is set to be 333 Hz here, satisfying the Nyquist Theory in terms ofthe -3dB cutoff frequency of 150 Hz. The data receiver side uses another TelosBmote as the base station which is connected with the PC via USB, and the PC con-tinuously saves the received data into files by a Java-based application. The indoortransmission range can achieve 15 m, which is satisfactory for people?s daily us-age. The TelosB mote is also a 3.2 cm ? 6.6 cm PCB. With the two PCBs, theheight and the weight of the designed system are only 1.2 cm and 56 g (includingthe button battery) respectively as in Fig. A.5a, which includes data acquisitionunit, Telosb mote and their combination from top to bottom. Furthermore, sinceZigBees can sleep most of the time, the average power consumption can be verylow, resulting in a longer battery life. To make the device portable, we design ahead band which can be easily worn as shown in Fig. A.5b.A.2.2 Experiments and ResultsThe developed EEG recording system has been tested on one human subject. Toevaluate the signal quality of the designed system, the system was first used tomeasure ECG signals by adjusting the adjustable resistor P1 to make the gain ofband pass filter equal to 100. We connected three electrodes to V4, V5 and right legof the subject, and asked the subject to remain static in order to avoid movementartifacts. From Fig. A.6a, we can see that the designed system can successfully111(a) (b)Figure A.5: (a) Data acquisition unit, telosb mote and their combination fromtop to bottom; (b) The wireless EEG head band.capture P wave, QRS complex and T wave of the subject?s ECG signal. The mea-sured ECG signal seems to have high quality with little noise.Further, the designed system was used to measure EEG signals by six elec-trodes placed on the frontal lobe of the head. Four electrodes were placed on FP1,FP2, F7 and F8 (10-20 System of Electrode Placement), and the reference andDRL electrodes were placed on midline position of the frontal head. The subjectwas lying in bed at night, being mentally prepared for sleep. The measurementlasted for about 40 minutes, and an observation should be noticed that the subjectshowed light symptoms of waking up temporarily around the 10-minute time point.The EEG signals recorded from the designed 4-channel EEG system were shownin Fig. A.6b. The subject experienced from awake and relaxed state to asleep, assupported by the observations that Delta wave, normally seen in adults in slow-wave sleep, was the highest in amplitude and Alpha wave, emerging with closingof the eyes and with relaxation, was relatively higher in amplitude.112(a)(b)Figure A.6: (a) ECG reading from the designed EEG recording system; (b)EEG readings from the 4-channel EEG recording system.After performing fast Fourier transform (FFT) of the EEG signal from Channel1, we illustrated its frequency properties in Fig. A.7a, where the energies between1 and 4 Hz (the Delta band) and that between 8 and 12 Hz (the Alpha band) areobviously stronger than that in other frequency bands. Also, from Fig. A.7a, wecan see that there is no obvious unwanted interference in the EEG signals.113(a)(b)Figure A.7: (a) FFT results of the Channel-1 EEG signal; (b) The temporalenergy changes of Delta and Alpha waves;To have an insight into the measured EEG signals, we used the EEG signalfrom Channel 3 for further analysis. The EEG signal was uniformly divided into19 equal segments, with each segment including two-minute EEG signal. For eachsegment, the average energy between 1 and 4 Hz and that between 8 and 12 Hz114were calculated, and thus we could plot the energies of the Delta and Alpha bandsas a function of time. As shown in Fig. A.7b, the energy in Delta band was gen-erally increasing and had a trend to further increase with the subject falling asleepdeeper and deeper, while the energy in Alpha band seemed keep stable after 20minutes, which means that the subject had relaxed totally after 20 minutes, but notentered the deepest sleep state yet. We also noted another interesting observation:around the 10-minute time point, both the energies in Delta and Alpha bands tendedto increase first and then decreased, which coincided with what we observed aboutthe subject at that time during the experiment.Figure A.8: Alpha waves detected on posterior regions of head.At last, to demonstrate that the EEG signals were with high quality, we placedtwo electrodes on Cz and Pz positions to test Alpha waves when the subject closedeyes and relaxed. The testing results were illustrated in Fig. A.8, which presentedclear mu rhythm and alpha waves.115A.3 A Wireless Portable sEMG Recognition SystemA.3.1 System DesignThe basic architecture of our developed 4-channel wireless portable sEMG record-ing system is the same as that of the EEG system. However, there are some im-portant differences between them. First, each channel has a reference electrode inthe sEMG system while all channels in the EEG system share one common ref-erence electrode; second, the bandwidth of the band pass filter in sEMG systemis from 20 Hz to 250 Hz since the most useful information reside in the selectedband and then the sampling frequency should be changed; third, the overall gain ofthe sEMG circuit is set to be about 1000 while the gain in EEG system was about10000 since the amplitude of sEMG measured from human forearms is generally10 times higher that that of EEG from human scalp; fourth, the feedback (DRL) ofsEMG system could be the direct circuit ground while it was not the case for EEG.Besides, a real-time recognition module has been added into the sEMG recordingsystem which would be shown in the next section.A.3.2 An Application for Chinese Number Guesture RecognitionElectrode PlacementBased on the movement tasks and the muscle anatomy of human upper limb, sev-eral forearm muscles can contribute to the designed multi-finger movements, withdetails of which can be found in [44]. Through a large number of preliminaryexperiments and our previous study [46], we identify four forearm muscles as suit-able candidates for the designed recognition problem of Chinese number gestures.The four selected muscles are Extensor Pollicis Brevis, Extensor Digitorum, FlexorDigitorum Profundus for little finger and Flexor Digitorum Superficialis. The cor-responding four sEMG electrode pairs are placed over these muscles as shown inFig. A.9.116Figure A.9: Illustration of the sEMG electrode pair placement: CH1: overExtensor Pollicis Brevis muscle; CH2: over Extensor Digitorum mus-cle; CH3: over Flexor Digitorum Profundus muscle; and CH4: overFlexor Digitorum Superficialis muscle.Experimental ProtocolSix healthy subjects (three females and three males, aged from 23 to 28, all right-handed) volunteered for this study. We used the 4-channel wireless sEMG systemto collect sEMG signals from each subject by disposable AgCl electrodes (JunkangMedical Equipment, Inc., Shanghai) placed on the locations as indicated in Fig.A.9. The electrodes have the size 3 cm? 2 cm with pre-placed electric conductiongel in the middle circle part (1 cm diameter). The sampling frequency was set to be500 Hz since the most useful energy in sEMG signals is in the range of 0-250 Hz[45]. The experimental procedure consists of two phases: the first one is for off-lineanalysis and the second one for online analysis. At the beginning of each sessionin the experiments, we needed to re-place the electrodes. To find correspondingmuscles correctly, we asked the subjects to perform certain hand actions suggestedby an anatomist. For example, to locate Extensor Pollicis Brevis, the subjects wererequired to extend their thumbs. Also, before the electrode placement, we usedalcohol prep pads to clean the corresponding locations for reducing the electrode-skin impedance.In the off-line experiment, each subject sat in a comfortable chair and naturallyperformed the ten Chinese number gesture movements. Each movement lasted for117about 0.5 second and the time interval between two adjacent movements was about1 to 2 seconds. For each of the ten number gestures in the set, we collected 50 trialsfrom each subject to provide enough data samples for training and classification.All sEMG data from one subject were collected within one session. The popularleave-one-out cross validation method is employed on trial bases to calculate theclassification accuracy during off-line analysis.In the online experiment, we collected sEMG data from each subject in nineseparate sessions, with 20 trails for each number gesture during each session. Eachsubject completed the nine separate sessions within ten days. The time intervalbetween two adjacent sessions was more than 8 hours and at the beginning of eachsession the new electrodes were re-placed as in Fig. A.9. When a subject did the i-th session recording, sEMG data from the 1-st to the (i?1)-th sessions were treatedas the training set. Therefore, subjects can adjust his/her movements appropriatelyaccording to the feedback results of the real-time recognition system.Recognition ProcedureIn this part, we first describe how to detect active segments in multichannel sEMGsignals which represent the multi-finger movements. We then describe several pop-ular feature extraction methods and classification algorithms, and further proposea MKL-SVM approach by combining all the features. Finally, we evaluate theirclassification performances for Chinese number gesture recognition. The basic di-agram of the recognition procedure is shown in Fig. A.10.Figure A.10: The basic diagram of the recognition procedure.sEMG motion detection The simple moving average method is employed here toprocess the transient energy of sEMG signals for motion detection, which generally118consists of three steps.First, the summation of sEMG signals sc(t)?s is computed and the average s(t)is calculated ass(t) =1CC?c=1sc(t), (A.1)where C means the total number of sEMG channels and c means the sEMG channelindex. Then, the square of s(t) is calculated to get the transient energy E(t) asE(t) = s2(t). (A.2)Second, the width of the moving window is set to be W = 32 points (about64 ms when the sampling frequency is 500 Hz). Using the moving window tocalculate the average of the transient energy E(t), we have EMA(t) asEMA(t) =1Wt?i=t?W+1E(i). (A.3)Third, a suitable threshold value T H is chosen and the start point and endFigure A.11: An example of 4-channel sEMG signals and the correspondingmotion detection results of Chinese number gestures.119point of each motion segment can be detected. The start point is defined as thepoint where the value of EMA(t) is larger than T H and the subsequent W values ofEMA(t) are also larger than T H. The end point is defined as the point where thevalue of EMA(t) is smaller than T H and the subsequent W values of EMA(t) arealso smaller than T H. If the time interval between a pair of the start point and theend point is too short, e.g. less than 50 points, we suggest that the correspondingsegment is due to noise and should be discarded. Due to our good-quality sEMGsignals, we set T H to be 1% of the maximum value of EMA(t)?s.An example of the 4-channel sEMG signals and the motion detection results ofthe ten Chinese number gestures are illustrated in Fig. A.11.Feature extraction After the motion detection operation, suitable features shouldbe extracted from each active sEMG segment to compose a feature vector for fur-ther classification. Because feature extraction has been shown to have a greatereffect on classification accuracy than the type of classifier selected [47], it hasbeen studied thoroughly in previous sEMG-based recognition and classificationresearch. The most popular features are the following six types: TD [48], auto-correlation and cross-correlation coefficients (ACCC) [49], spectral power mag-nitudes (SPM) [50, 51], short-time Fourier transform (STFT) [47], wavelet trans-form (WT) [52] and high order statistics (HOS) [53]. However, we feel that STFT,WT and HOS are unsuitable for our case due to several reasons: to use STFT,the original active segments are required to have the same length to acquire trans-formed feature vectors with the same dimension, however the lengths of activesegments for the transient hand gestures are highly unlikely to be the same; Sincethe time duration of a transient motion is generally about 300 ms, if STFT is used,the frequency resolution would be so low that there will be no meaningful informa-tion existing even if it produces high dimensional feature vectors; WT has a morestrict requirement that the length of active segments must be a power of two, i.e.2n; HOS usually performs well for stationary or weak-sense stationary signals suchas sEMG signals with continuous muscle contraction, which is different from ourcase. Therefore, only TD, ACCC and SPM will be investigated in details in thischapter.Hudgins? time-domain features: Hudgins? time-domain feature set was intro-120duced in 1993 [48] and has been widely used in the myoelectric control field.It was originally comprised of five different features, including mean absolutevalue (MAV), mean absolute value slope (MAVS), zero crossings (ZC), slope signchanges (SSC) and waveform length (WL). However, it was reported that the inclu-sion of SSC in the feature set contributed either a negative effect or no significanteffect on classification performances [54]. The MAVS was another feature usu-ally excluded from the TD feature set [55, 56] perhaps due to the similar reasonalthough no clear justification was stated in the literature. Based on our prelim-inary study, to improve the classification accuracy, we add a new feature calledthe mean absolute value ratio (MAVR) into the Hudgins? time-domain feature set.MAVR can eliminate the influence of inequable strengthes when the subject per-forms the same hand gesture at different times. Therefore, we adopt the followingfour features: MAV, MAVR, ZC and WL.Autocorrelation and cross-correlation coefficients: Using ACCC as featuresfor myoelectric control was first proposed by Leowinata et al. in 1998 [49], whosuggested that useful information might reside in the crosstalk between channels.Therefore, the autocorrelation for each channel and the cross-correlation betweenchannels are adopted as features.Spectral power magnitudes: A feature set comprised of SPM has been pro-posed in several studies [50, 51] and was reported to provide good performances.SPM are calculated by taking the average of power spectrum within disjoint band-widths after performing FFT for the data in active segments. In our case, for eachchannel we calculate SPM for four equal bandwidths between 75 Hz and 250 Hz.Classification algorithms The classification algorithm is a key element of therecognition procedure. Its reliability, computational complexity and recognitionefficiency have a great effect on the overall performance of the whole system.In recent years, many algorithms for sEMG recognition have been developed tospecifically serve certain scenarios [57]. Among them, classical algorithms suchas k-nearest neighbor (k-NN), LDA, quadratic discriminant analysis (QDA) andSVM have been most widely employed and reported to perform robustly in manystudies [19, 58, 59, 61, 64]. In this chapter, associated with the three sets of fea-tures described above, we first evaluate these four popular classical classification121approaches and study their effects on the recognition accuracy of Chinese numbergestures.k-nearest neighbor: The k-NN classification algorithm predicts the test sam-ple?s class according to its k nearest training samples by classifying the test sampleto a class using majority vote among the k neighbors. The Euclidean distance in thefeature space is used to determine which k training samples are the nearest neigh-bors [66]. In this chapter, since there are fifty samples for each hand gesture foreach subject, we choose k = 10 as a proper setting.Linear discriminant analysis: The LDA classifier is carried out by calculatinglinear discriminant functions and selecting the maximum one as the classificationrule. LDA assumes that the feature vector x is multivariate normally distributed ineach class group and the K classes have a common covariance matrix ?k = ?, ?k.Quadratic discriminant analysis: QDA is closely related to, but different fromLDA, due to different assumption of the covariance matrix. In QDA, the matrices?k?s of the K classes are not assumed to be identical as in LDA. The covariancematrix needs to be estimated separately for each class. In general, QDA and LDAare interchangeable, and which to use depends on the personal preference and theavailability of the training data to support the QDA analysis. Both LDA and QDAperform well on a large and diverse set of classification tasks [66].Support vector machine: SVM aims to find the optimal separating hyperplanebetween classes by focusing on the training samples that lie at the edge of the classdistributions, named the support vectors, and with other training samples being ef-fectively discarded. The basic idea of the SVM classifier is that only the trainingsamples lying on the class boundaries are necessary for discrimination. Detaileddiscussions of SVM can be found in [66]. SVM was originally designed for binaryclassification as described above, but can be extended to multiclass classification.Several approaches have been suggested for multiclass classification by SVM [63],and here we adopt the one-against-all approach. In this approach, a set of binaryclassifiers, each of which is trained to separate one class from the rest, are under-taken and each test sample is allocated to the class for which the largest decisionvalue is determined. For the kernel type, we select radial basis function (RBF)kernel.We are interested in whether the performance of recognizing the 10 Chinese122number gestures can be further improved. Based on our preliminary results shownin Fig. A.13, we can see that all three types of features contain useful, but proba-bly compensating information and could yield good recognition performances, wetherefore propose combining the three feature sets together. From Fig. A.13, wealso note that SVM provides good performances consistently when using differentfeatures. In SVM, a kernel function k implicitly maps samples x to a feature space? given by a feature map k(xi,x j) =??(xi),?(x j)?. Since there are different ker-nel types such as RBF kernel, Polynomial kernel, Hyperbolic tangent kernel etc.,it is often unclear what the most suitable kernel for the task at hand is, and hencethe user may wish to combine several possible kernels. One problem with simplyadding kernels is that using uniform weights is possibly not optimal. For instance,if one kernel is not correlated with the labels at all, then giving it positive weightwill just add noise and degrade the performance [41]. It is thus desirable to use anoptimized kernel function that could fit the data structure very well. Therefore, toefficiently combine multiple features and to optimally combine multiple kernels inSVM, we propose employing the following multiple kernel learning approach forChinese number gesture recognition.Multiple kernel learning: MKL is an efficient way of optimizing kernel weightsand can design a kernel which is optimal for a given data set. In the MKL approach,now let us consider convex combinations of M kernelsk(xi,x j) =M?m=1wmkm(xi,x j) (A.4)where wm ? 0, ?Mm=1 wm = 1, and M is the total number of kernels used. Inthe multiple kernel learning problem, for binary classification, suppose we havea training set of N samples {xi,yi}, i = 1, ...,N,yi ? {1,?1}, where xi is trans-lated via M mappings ?m(xi) 7??RDm , m = 1, ...,M, from the input into M featurespaces (?1(xi), ...,?M(xi)), where Dm denotes the dimensionality of the m-th fea-ture space. We need to solve the following MKL primal problem [65], which is123equivalent to the linear SVM for M = 1:min12(M?m=1wm ??m?2)2+CN?i=1?iw.r.t. ?m ? RDm ,? ? RN ,?0 ? Rs.t. ?i ? 0 and yi(M?m=1wm ??m,?m(xi)?+?0)? 1??i,?i = 1, ...,N(A.5)where ?m is the normal vector to the hyperplane for the feature space ?m. Theauthors in [65] derived the MKL dual for the problem (A.5) as below:min ?w.r.t. ? ? R,? ? RNs.t. 0? ? ? 1C,N?i=1?iyi = 012N?i, j=1?i? jyiy jkm(xi,x j)?N?i=1?i ? ?,?m = 1, ...,M.(A.6)where ? = (?1,?2, ...,?N), ?i is a Lagrange multiplier and ?m = ?Ni=1?iyi?m(xi).In [68], the authors tackled the MKL problem through a weighted 2-norm reg-ularization formulation with an additional constraint on the weights that encour-ages sparse kernel combinations. This algorithm, called SimpleMKL, can convergerapidly with comparable efficiency. In this chapter, we will adopt SimpleMKL asour MKL algorithm.Results and DiscussionOff-line recognition results and discussion Three general feature types describedin Section A.3.2 and four popular classification algorithms described in SectionA.3.2, representing a total of twelve combinations, are first applied to our recogni-tion problem of Chinese number gestures. Then based on their performances, wesuggest to combine the three feature (3F) together and apply MKL-SVM method.124To make a fair comparison between single kernel and multiple kernels, we alsoemploy the single kernel SVM combined with 3F for classification. In total, 14approaches were investigated in this study, as shown in Fig. A.13.There were six subjects in total and each subject performed 50 sEMG experi-mental trials for each of the 10 designed Chinese number gestures. Therefore, therewere totally 300 trials for each movement in the defined number gesture move-ment set. For each subject, the leave-one-out method, a commonly used methodof cross-validation, was used to calculate the classification accuracy and form asubject-level confusion matrix. Adding the six subject-level confusion matricestogether produces an overall confusion matrix for a given approach. The result-ing classification results from different approaches are summarized by the overallconfusion matrices, some of which are presented in Fig. A.12 and the overall clas-sification accuracies are reported in Fig. A.13.From the confusion matrices, we can see that in general the recognition accu-racies for individual number gestures and the overall average accuracies are quitesatisfying. By studying the existing literature, we also note that the classificationresults in our multi-finger number gesture case is comparable to, or even outper-form, the ones in gross hand, wrist and arm movement cases when employingsimilar recognition approaches. For instance, in [19], the difference absolute meanvalue was used to construct a feature map and the k-NN, LDA and QDA algorithmswere used to classify five wrist-motion direction movements, including up, down,right, left, and the rest state. The reported recognition accuracy rates were 84.9%for k-NN, 82.4% for QDA, and 81.1% for LDA, while our resulting accuracy rateswere 96.17% for k-NN, 95.90% for LDA, and 95.67% for QDA. Although onlytwo forearm sEMG channels were used in [19] while we used four, it is worth not-ing that our problem is to recognize twice the number of movements and that ourdefined subtle number gestures belong to the category of subtle multi-finger move-ments, which is much more difficult to be classified than the category of grosshand, wrist and arm movements. Using eight sEMG channels, the authors in [61]applied discrete wavelet transform and SVM to classify six gross hand movementssuch as wrist flexion, wrist extension, hand supination, hand pronation, hand open-ing and hand closure, and their misclassification rate for six subjects was 4.7% ?3.7% while ours was 3.37% ? 1.19% by using TD and SVM.125(a) (b)(c) (d)Figure A.12: Some confusion matrices of different combinations. Here thenotation ?ACCC+k-NN? is used to represent the recognition ap-proach that uses the ACCC features and the k-NN classifier, and othernotations are similarly defined. (a) ACCC+k-NN. (b) TD+LDA.(c) SPM+SVM (SK), here SK means single kernel. (d) 3F+SVM(MKL).However, we also note some limitations in the recognition results reported inFig. A.12. Several pairs of number gestures are relatively more difficult to be dis-tinguished, such as the gestures representing the numbers 3 and 4, and the gesturesrepresenting the numbers 0 and 9. This is probably due to the high movement sim-ilarity in such pairs, as seen in Fig. A.1. From Fig. A.12 and Fig. A.13, we notethat the recognition approach using ACCC features and the k-NN classifier, whichis notated as ?ACCC + k-NN?, performs very different from the other recognitionapproaches and yields the worst recognition results, as presented in Fig. A.12a. Itseems that each number gesture can be misclassified into other classes, even if thetwo gestures (e.g. the numbers 0 and 3) do not seem similar and not been misclas-sified by any of the rest recognition approaches. We think this observation mightreflect k-NN?s possible disadvantages associated with the Euclidean distance mea-126Figure A.13: Comparisons of the overall recognition accuracies when em-ploying different feature sets and different classifiers.sure. When two class groups are very close to each other, the classifier based onthe Euclidean distance could misclassify the points residing around the boundary.It is worse if the two class groups overlap. When one class group has several otherclass groups surrounded with, the points lying around the boundary in the classcan be randomly classified into the surrounding classes by k-NN and only the onesin the center can be classified accurately. This can explain our results observed inFig. A.12a.From Fig. A.13, we also note that the ACCC + LDA approach yields thesecond worst accuracy 88.40%, which suggests that ACCC might not be a goodfeature set for the LDA classifier. We note that the combinations of ACCC andQDA or SVM provide quite high accuracies, both above 95%. In general, it seemsthat k-NN and LDA may not be as robust as QDA and SVM, probably due to theexplanations above for k-NN and the common covariance assumption for LDA.Compared with the ACCC feature set, SPM and TD can provide more stable per-formances no matter which classifier is employed, and it is noted that all combina-tions involving them achieve the accuracy rates above 91%. SPM generally yieldsa slightly worse recognition performance than TD does. Therefore, the TD featuresis slightly preferred than other features. Regarding the studied popular classifiers,127Table A.1: Some combinations of different kernels and corresponding results.Index Kernel Type Kernel Number Kernel Parameter* Accuracy1 RBF 10 [0.5 1 2 5 8 10 12 15 17 20] 97.47%2 RBF 4 [0.5 1 2 5] 97.73%3 RBF 5 [0.5 1 2 5 8] 97.77%4 RBF, Poly 7 [0.5 1 2 5 8], [1 2] 97.80%5 RBF, Poly 8 [0.5 1 2 5 8], [1 2 3] 97.83%6 RBF, Poly 6 [1 2 5 8], [1 2] 97.87%7 RBF, Poly 7 [1 2 5 8], [1 2 3] 97.93%* Here Kernel Parameters include the Gaussian width of RBF kernel and the freedom degree ofPolynomial kernel, e.g. ?[1 2 5 8], [1 2]? indicates that four RBF kernels with Gaussian width 1,2, 5, 8 and two Poly kernels with freedom degree 1, 2 are used.QDA and SVM are both reliable and well-performed. The approaches combiningQDA (and SVM) with TD achieved 95.67% and 96.63% average accuracies withsmall standard errors respectively.Since all the three types of feature sets contain useful information based ontheir performance, we further investigate to combine the three features together.We first use single kernel SVM with 3F and achieve the accuracy 97.03% which ishigher than that of the above 12 combinations. To fit the combined feature set bet-ter and further improve the recognition performance, we apply MKL-SVM methoddescribed in Section A.3.2. Regarding the kernel type selection, RBF is generallysuggested [62] and we also examine the polynomial kernel function. Regardingthe parameter selection, a parameter range is first set for each kernel function:RBF (min:0.5;max:20) and polynomial (min:1;max:5); then the number of ker-nels and the corresponding parameters (all are integers except 0.5 for RBF) areautomatically selected uniformly according to a uniform distribution by MATLABwhich is repeated 100 times; finally, the combinations with the best performancesare selected. We provide some kernel parameters and report their performances inTable A.1 for reference. From Table A.1, we can see that using polynomial ker-nel function can help improve the performance and a highest rate 97.93% can beachieved by combination-7. In addition, the results also indicate that though SVM128with single kernel provides excellent performance, it does not perform as well asMKL-SVM does. In particular, the combination TD + LDA was reported multipletimes to provide the most stable performance for myoelectric control [59, 60]. Tovalidate the significance of the results achieved by 3F + MKL-SVM upon that byTD + LDA, we perform a paired samples t-test to compare the recognition rates bythe two combinations. The test indicates that there is a significant difference in theaccuracy (t = 5.303, p < 0.01) and the proposed combination outperforms TD +LDA for classification rate. However, the efficiency of MKL-SVM is much lowerthan that of LDA since the former involves a slow optimization procedure. Over-all, MKL-SVM is a promising method for sEMG-based number gesture patternrecognition.On-line recognition results and discussion Based on high classification accura-cies observed in the offline sEMG recognition analysis in Section A.3.2, we imple-ment a real-time classification system for Chinese number gestures. The purposeof this section is to examine the possibility of practical applications in a controlledFigure A.14: Pictures of the hardware and software implementation of theproposed real-time sEMG recognition system.129laboratory setting, that is, to check if the accuracy can be improved to a satisfy-ing level when subjects become familiar with the system and if the re-placementof electrodes at the same locations at different time can affect the classificationaccuracy. From the results in Section A.3.2 and considering the tradeoff betweenaccuracy and algorithm complexity, we select QDA combined with TD as the algo-rithm for the real-time recognition system of Chinese number gestures. Fig. A.14presents a demo of the implemented real-time system, in which the recognition al-gorithm part and graphical user interface are realized by the programming languageJava.Figure A.15: The average recognition accuracy rates of the six subjects overthe eight training sessions. The spread of data points are also provided.Note that some points are overlapped.The corresponding real-time experiment was described in the Section A.3.2.The classification results for each subject were shown in Fig. A.15 and the resultsfor each of the designed number gesture movements were shown in Fig. A.16.From Fig. A.15, we note that the average accuracy rates for the six subjects gen-erally increase gradually as the number of training sessions increases. After fourtraining sessions, the accuracy rate for each subject remains nearly steady. All sixsubjects could achieve above 90% accuracy in average and three of them couldachieve above 95% accuracy for the ten number gestures. From Fig. A.16, similarobservations can be noted for each number gesture movement. The average recog-130Figure A.16: The average recognition accuracy rates for the ten number ges-ture movements over the eight training sessions. The spread of datapoints are also provided. Note that some points are overlapped.nition accuracy rates for each number gesture movement were above 90% afterfour training sessions and above 95% accuracy can be achieved for seven of the tenmovements.The reason for our excellent recognition performance should be the follow-ing factors: reasonable electrode placement, effective feature extraction methodsand advanced classification algorithms. Reasonable electrode placement ensuresthat meaningful sEMG signals from corresponding muscles are acquired; effectiveextraction methods ensure that useful information contained in the signals are ex-tracted appropriately; advanced classification algorithms guarantee that differentpatterns can be distinguished from each other. In this , through a large numberof preliminary experiments and our previous study [46], we can design the supe-rior electrode placement for our proposed number gestures using only four sEMGchannels, while in most previous studies (e.g. [22]) a large number of channelswere employed and arranged orderly in which case only a part of them were ac-tually useful. Also, we extensively investigate the most popular feature extractionand recognition methods so that we can clearly tell the most proper approachesfor number gestures and select the most effective features and classification al-gorithms. We believe that our super recognition performance of recognizing ten131multi-finger gestures is jointly contributed by the above factors. With that excellentperformance, we believe the number gestures are therefore promising for practicalapplications such as HCI.132Appendix BAlgorithm DerivationB.1 The Derivation for The PLS+CCA MethodIn this appendix, we show how to mathematically derive the solution of the pro-posed PLS+CCA method.B.1.1 The First Step: PLSThe cost function of PLS is as follows (the same as Equation (2.1) in Section 2.3.2):maxw1,w2(w1T XTY w2)2s.t. wiT wi = 1, i = 1,2(B.1)where wi?s (i = 1,2) are the weight vectors.By employing the method of Lagrange multipliers, we rewrite the initial costfunction as:L(wi,?i) =(w1T XTY w2)2?2?i=1?i(wiT wi?1), (B.2)where ?i?s are Lagrange multipliers.Now we only present the detailed derivations regarding w1, since w2 can besimilarly derived. Taking the derivatives of L(wi,?i) with respect to w1 and ?1 and133setting them to be zero, we have:?Lw1 = 2??w1T XTY w2??XTY w2?2?1w1 = 0, (B.3)?L?1 = w1T w1?1 = 0. (B.4)Left multiplying both sides of Eq. (B.3) by w1T , we have:2(w1T XTY w2)2?2?1w1T w1 = 0. (B.5)According to Eq. (B.4), ?1 can be calculated as?1 =(w1T XTY w2)2. (B.6)Through the similar procedure, ?Lw2 and ?2 can be easily derived as?Lw2 = 2??w1T XTY w2??Y T Xw1?2?2w2 = 0, (B.7)?2 =(w1T XTY w2)2. (B.8)Substituting Eq. (B.8) into Eq. (B.3) and Eq. (B.7) respectively, we have thefollowing two expressions:??2XTY w2 = ?1w1, (B.9)1??2Y T Xw1 = w2. (B.10)By substituting Eq. (B.10) into Eq. (B.9), we can formulate an eigenvalue-eigenvectordecomposition problem:(XTYY T X)w1 = ?1w1. (B.11)Similarly, we can have the formulation for w2 as:(Y T XXTY)w2 = ?2w2. (B.12)134The above solutions are straightforward. A practical issue is to determine thenumber of LVs. In our study, we determine the number R by setting a thresholdthat corresponds to the ratio of explained variance (e.g. 95%). Therefore, thecorresponding LVs in X and Y can be calculated byTX = XW1, TY = YW2, (B.13)where W1 is composed of the first R eigenvectors associated with Eq. (B.11) andthe columns of TX represent the R components extracted from X . W2 and TY aresimilarly defined.However, the collinearity problem may exist in the LVs calculated through theabove procedure, since each data set is used repetitively for each LV?s calculation.The extracted LVs are not necessarily uncorrelated to each other. To effectively im-plement the second step and avoid the ill-conditioned problem, we need to addressthis uncorrelatedness concern and thus we design a deflation procedure: Beforeextracting the second common LV in each data space, X and Y are deflated by theircorresponding first LVs as follows:X = X? tX(tXT tX)?1tXT X , Y = Y ? tY (tYT tY )?1tYTY . (B.14)Then the above procedure will be repeated for the further extraction of commonLVs. In this way, the following new LVs are uncorrelated to the previous ones.The purpose of this step is to extract LVs which can most explain the individualdata sets and meanwhile are well correlated to the LVs in another data set. With thisstep, trivial and irrelevant information across data sets could be removed. However,a higher covariance may merely result from the larger variance of LVs, which maynot necessarily imply strong correlations. To address this concern, the 2nd stepwill help further refine the results.135B.1.2 The Second Step: CCABased on the extracted LVs in the first step, the objective function of CCA can beconstructed as follows:maxv1,v2(v1T TXT TY v2)2s.t. v1T TXT TX v1 = 1, v2T TYT TY v2 = 1(B.15)where vi?s (i = 1,2) are the weight vectors.By employing the method of Lagrange multipliers, we rewrite the initial objec-tive function as:L(vi,?i) =(v1T TXT TY v2)2??1(v1T TX T TX v1?1)??2(v2T TY T TY v2?1), (B.16)where ?i?s are Lagrange multipliers. Similar to the derivation in the first step, wecan obtain the following eigenvalue-eigenvector decomposition problem:[(TXT TX)?1TXT TY (TYT TY )?1TYT TX]v1 = ?1v1. (B.17)Similarly, for v2, we have:[(TYT TY )?1TYT TX(TXT TX)?1TXT TY]v2 = ?2v2. (B.18)The solutions to this problem are the R largest eigenvectors of the correspondingmatrices. The recovered LVs UX and UY can be calculated directly from the ma-trices TX and TY byUX = TXV1, UY = TYV2, (B.19)where V1 is composed of the R eigenvectors associated with Eq. (B.17) and thecolumns of UX represent the R components extracted from TX . V2 and UY aresimilarly defined.After these two steps, it is ensured that the extracted components UX and UYare maximally correlated across data sets and meanwhile can well explain the in-formation within each individual data set.136B.2 The Derivation for The IC-PLS ModelIn this appendix, we show how to mathematically derive the optimization solutionof the proposed IC-PLS model.By employing the method of Lagrange multipliers, we can rewrite the initialcost function as follows:F(w1,w2,?1,?2) = ?(E((w1T x)(w2T y)))2+?(E(G(w1T x))?E(G(u1)))2+?(E(G(w2T y))?E(G(u2)))2+?1(w1T w1?1)+?2(w2T w2?1)(B.20)where ?1 and ?2 are Lagrange multipliers.Here, we only present the detailed derivations regarding w1. As to those ofw2, it is straightforward from the results obtained on w1. Taking the derivatives ofF(w1,w2,?1,?2) with respect to w1 and ?1 and setting them to be zero, we have:?Fw1 =?F?w1= 2?E((w1T x)(w2T y))E(x(w2T y))+2? (E(G(w1T x))?E(G(u1)))E(xg(w1T x))+2?1w1 = 0,(B.21)?F?1 = w1T w1?1 = 0, (B.22)where g(?) represents the corresponding first-order derivative of G(?).Left multiplying both sides of Eq. (B.3) by w1T , we have:2?w1T E((w1T x)(w2T y))E(x(w2T y))+2? (E(G(w1T x))?E(G(u1)))w1T E(xg(w1T x))+2?1w1T w1 = 0.(B.23)According to Eq. (B.4), ?1 can be calculated as?1 =??w1T E((w1T x)(w2T y))E(x(w2T y))?? (E(G(w1T x))?E(G(u1)))w1T E(xg(w1T x)).(B.24)137Based on Kuhn-Tucker conditions [99], the optima of the multi-objective func-tion shown in Eq. (3.5) will be at the points shown in Eq. (B.21) under the con-straint (B.22). In this work, to improve the convergence speed, Newton?s methodis employed to solve this problem.Suppose ?1 = ?Fw1 and then its Jacobian matrix can be derived asJ?1 = 2?E(x(w2T y))E(xT (w2T y))+2? (E(G(w1T x))?E(G(u1)))E(xxT g?(w1T x))+2?E(xg(w1T x))E(xT g(w1T x))+2?1I,(B.25)where g?(.) means the second-order derivative of G(.).Substituting Eq. (B.24) into Eq. (B.25), we have the following expression forJ?1:J?1 = 2?E(x(w2T y))E(xT (w2T y))+2? (E(G(w1T x))?E(G(u1)))E(xxT g?(w1T x))+2?E(xg(w1T x))E(xT g(w1T x))? (2?w1T E((w1T x)(w2T y))E(x(w2T y))+2? (E(G(w1T x))?E(G(u1)))w1T E(xg(w1T x)))I.(B.26)Since the data have been initialized, to simplify the inverse of the Jacobianmatrix, the second term can be approximated as in the original FastICA algorithmas [39]:E(xxT g?(w1T x))? E(xxT )E(g?(w1T x)) = E(g?(w1T x))I. (B.27)Therefore, the Jacobian matrix can be approximately expressed in a simpleform:J?1 = 2?E(x(w2T y))E(xT (w2T y))+2?E(xg(w1T x))E(xT g(w1T x))? c1I,(B.28)where c1 is a constant defined asc1 = 2?w1T E((w1T x)(w2T y))E(x(w2T y))+2? (E(G(w1T x))?E(G(u1)))w1T E(xg(w1T x))?2? (E(G(w1T x))?E(G(u1)))E(g?(w1T x)).(B.29)138Substituting Eq. (B.24) into Eq. (B.3), we can express the derivatives ofF(w1,w2,?1,?2) with respect to w1 as:?Fw1 = 2?E((w1T x)(w2T y))E(x(w2T y))+2? (E(G(w1T x))?E(G(u1)))E(xg(w1T x))?2(?w1T E((w1T x)(w2T y))E(x(w2T y))+? (E(G(w1T x))?E(G(u1)))w1T E(xg(w1T x)))w1,(B.30)Through the similar procedure, J?2 and ?Fw2 can be easily derived asJ?2 = 2?E(y(w1T x))E(yT (w1T x))+2?E(yg(w2T y))E(yT g(w2T y))? c2I,(B.31)where c2 is a constant defined asc2 = 2?w2T E((w1T x)(w2T y))E(y(w1T x))+2?(E(G(w2T y))?E(G(u2)))w2T E(yg(w2T y))?2?(E(G(w2T y))?E(G(u2)))E(g?(w2T y)).(B.32)and?Fw2 = 2?E((w1T x)(w2T y))E(y(w1T x))+2?(E(G(w2T y))?E(G(u2)))E(yg(w2T y))?2(?w2T E((w1T x)(w2T y))E(y(w1T x))+?(E(G(w2T y))?E(G(u2)))w2T E(yg(w2T y)))w2.(B.33)The Newton iteration direction vectors di (i = 1,2) are derived by solving thefollowing equations:J?(wi) ?di =??Fwi , i = 1,2. (B.34)Therefore, finally, wi (i = 1,2) can be derived as followswi? wi +di,i.e., wi? wi? (J?(wi))?1?Fwi , i = 1,2.(B.35)139
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Multimodal biomedical signal processing for corticomuscular...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Multimodal biomedical signal processing for corticomuscular coupling analysis Chen, Xun 2014
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Multimodal biomedical signal processing for corticomuscular coupling analysis |
Creator |
Chen, Xun |
Publisher | University of British Columbia |
Date Issued | 2014 |
Description | Corticomuscular coupling analysis using multiple data sets such as electroencepha-logram (EEG) and electromyogram (EMG) signals provides a useful tool for understanding human motor control systems. A popular conventional method to assess corticomuscular coupling is the pair-wise magnitude-squared coherence (MSC). However, there are certain limitations associated with MSC, including the difficulty in robustly assessing group inference, only dealing with two types of data sets simultaneously and the biologically implausible assumption of pair-wise interactions. In this thesis, we propose several novel signal processing techniques to overcome the disadvantages of current coupling analysis methods. We propose combining partial least squares (PLS) and canonical correlation analysis (CCA) to take advantage of both techniques to ensure that the extracted components are maximally correlated across two data sets and meanwhile can well explain the information within each data set. Furthermore, we propose jointly incorporating response-relevance and statistical independence into a multi-objective optimization function, meaningfully combining the goals of independent component analysis (ICA) and PLS under the same mathematical umbrella. In addition, we extend the coupling analysis to multiple data sets by proposing a joint multimodal group analysis framework. Finally, to acquire independent components but not just uncorrelated ones, we improve the multimodal framework by exploiting the complementary property of multiset canonical correlation analysis (M-CCA) and joint ICA. Simulations show that our proposed methods can achieve superior performances than conventional approaches. We also apply the proposed methods to concurrent EEG, EMG and behavior data collected in a Parkinson's disease (PD) study. The results reveal highly correlated temporal patterns among the multimodal signals and corresponding spatial activation patterns. In addition to the expected motor areas, the corresponding spatial activation patterns demonstrate enhanced occipital connectivity in PD subjects, consistent with previous medical findings. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2014-01-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0165841 |
URI | http://hdl.handle.net/2429/45811 |
Degree |
Doctor of Philosophy - PhD |
Program |
Biomedical Engineering |
Affiliation |
Applied Science, Faculty of |
Degree Grantor | University of British Columbia |
GraduationDate | 2014-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2014_spring_chen_xun.pdf [ 14.94MB ]
- Metadata
- JSON: 24-1.0165841.json
- JSON-LD: 24-1.0165841-ld.json
- RDF/XML (Pretty): 24-1.0165841-rdf.xml
- RDF/JSON: 24-1.0165841-rdf.json
- Turtle: 24-1.0165841-turtle.txt
- N-Triples: 24-1.0165841-rdf-ntriples.txt
- Original Record: 24-1.0165841-source.json
- Full Text
- 24-1.0165841-fulltext.txt
- Citation
- 24-1.0165841.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0165841/manifest