Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Prior-informed multivariate models for functional magnetic resonance imaging Ng, Bernard 2011

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2012_spring_ng_bernard.pdf [ 2.76MB ]
JSON: 24-1.0072231.json
JSON-LD: 24-1.0072231-ld.json
RDF/XML (Pretty): 24-1.0072231-rdf.xml
RDF/JSON: 24-1.0072231-rdf.json
Turtle: 24-1.0072231-turtle.txt
N-Triples: 24-1.0072231-rdf-ntriples.txt
Original Record: 24-1.0072231-source.json
Full Text

Full Text

PRIOR-INFORMED MULTIVARIATE MODELS FOR FUNCTIONAL MAGNETIC RESONANCE IMAGING  by  Bernard Ng  B.A.Sc., Simon Fraser University, 2005 M.A.Sc., The University of British Columbia, 2007  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  DOCTOR OF PHILOSOPHY  in  The Faculty of Graduate Studies  (Electrical and Computer Engineering)  THE UNIVERSITY OF BRITISH COLUMBIA  (Vancouver)     September 2011  © Bernard Ng, 2011 ii  Abstract Neurological diseases constitute the leading disease burden worldwide. Existing symptom- based diagnostic methods are often insufficient to detect many of these diseases in their early stages. Recent advances in neuroimaging technologies have enabled non-invasive examination of the brain, which facilitates localization of disease-induced effects directly at the source. In particular, functional magnetic resonance imaging (fMRI) has become one of the dominant means for studying brain activity in healthy and diseased subjects. However, the low signal-to-noise ratio, the typical small sample size, and the large inter-subject variability present major challenges to fMRI analysis. Standard analysis approaches are largely univariate, which underutilize the available information in the data. In this thesis, we present novel strategies for activation detection, region of interest (ROI) characterization, functional connectivity analysis, and brain decoding that address many of the key challenges in fMRI research. Specifically, we propose: 1) new formulations for incorporating connectivity and group priors to better inform activation detection, 2) the use of invariant spatial features for capturing the often-neglected spatial information in ROI characterization, 3) an evolutionary group-wise approach for dealing with the high inter-subject variability in functional connectivity analysis, and 4) a generalized sparse regularization technique for handling ill-conditioned brain decoding problems. On both synthetic and real data, we showed that exploitation of prior information enables more sensitive activation detection, more refined ROI characterization, more robust functional connectivity analysis, and more accurate brain decoding over the current state-of-the-art. All of our results converged to the conclusion that integrating prior information is beneficial, and oftentimes, essential for tackling the challenges that fMRI research present.  iii  Preface The list of publications resulting from my thesis work is provided on pages iv to vi arranged according to chapters.  [P1−P5, P7−P9, P12−P16]: I was the primary author of these papers and the main contributor to the design, development, and testing of the presented methods under the supervision of Dr. Rafeef Abugharbieh. The papers were edited by the co-authors. Dr. Martin McKeown of the Pacific Parkinson’s Research Center (PPRC) provided data (Clinical Research Ethics Board, H04-70177) and input to [P7−P9]. Dr. Ghassan Hamarneh of Simon Fraser University (SFU) provided input to [P1, P3−P5]. Initial method testing of [P3] was partly conducted by Kelvin So of Berkeley University under my supervision.  [P6, P11]: I was the primary author of these papers and the main contributor to the design, development, and testing of the presented methods under the supervision of Dr. Rafeef Abugharbieh and Dr. Bertrand Thirion of the French National Institution for Research in Computer Science and Control (INRIA). The papers were edited by the co-authors. Dr. Jean Baptiste Poline of INRIA provided data and input to these works. Dr. Gael Varoquaux of INRIA also provided input.  [P10]: Dr. Samantha Palmer of PPRC was the primary author of this paper. The co-authors and I were responsible for editing. The experimental data were provided by Dr. Martin McKeown of PPRC (Clinical Research Ethics Board, H04-70177). The spatial analysis portion of this work was performed by me. The network analysis portion of this work was conducted by Dr. Samantha Palmer.  [P17−P20]: I was the primary author of these papers and the main contributor to the design, development, and testing of the presented methods under the supervision of Dr. Rafeef Abugharbieh. The papers were edited by the co-authors. The data were made publicly available online by Dr. Tom Mitchell of Carnegie Mellon University. Initial method testing of [P17] was partly conducted by Arash Vahdat of SFU under my supervision. iv  Chapter 2: Prior-informed Activation Detection [P1] B. Ng, R. Abugharbieh, G. Hamarneh, and M. J. McKeown, "Random walker based estimation and spatial analysis of probabilistic fMRI activation maps," in Proc. Workshop fMRI Data Analysis held in conjunction with Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), London, U.K., 2009, pp. 37-44. [P2] B. Ng, R. Abugharbieh, and M. J. McKeown, "Functional segmentation of fMRI data using adaptive non-negative sparse PCA (ANSPCA)," in Proc. Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), London, U.K., 2009, pp. 490-497. [P3] B. Ng, R. Abugharbieh, and G. Hamarneh, "Group MRF for fMRI activation detection," in IEEE Int. Conf. Computer Vision and Pattern Recognition (CVPR), San Francisco, C.A., 2010, pp. 2887-2894. [P4] B. Ng, G. Hamarneh, and R. Abugharbieh, "Detecting brain activation in fMRI using group random walker," in Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), Beijing, China, 2010, pp. 331-338. [P5] B. Ng, R. Abugharbieh, and G. Hamarneh, "Modeling brain activation in fMRI using group MRF," submitted.. [P6] B. Ng, R. Abugharbieh, G. Varoquaux, J.B. Poline, and B. Thirion, "Connectivity- informed fMRI activation detection," accepted in Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), Toronto, Canada, 2011. v  Chapter 3: Spatial Characterization of Regional Activation Patterns [P7] B. Ng, R. Abugharbieh, X. Huang, and M. J. McKeown, "Spatial characterization of fMRI activation maps using invariant 3D moment descriptors," IEEE Trans. Med. Imaging, vol. 28, pp. 261-268, 2009. [P8] B. Ng, R. Abugharbieh, and M. J. McKeown, "Adverse effects of template-based warping on spatial fMRI analysis," in Proc. Int. Conf. Medical Imaging of Int. Society for Optics and Photonics (SPIE), Orlando, Florida, 2009, p. 72621Y (12 pages). [P9] B. Ng, S. Palmer, R. Abugharbieh, and M. J. McKeown, "Focusing effects of L-dopa in Parkinson's disease," Hum. Brain Mapp., vol. 31, pp. 88-97, 2010. [P10] S. J. Palmer, B. Ng, R. Abugharbieh, L. Eigenraam, and M. J. McKeown, "Motor reserve and novel area recruitment: Amplitude and spatial characteristics of compensation in Parkinson's disease," Eur. J. Neurosci., vol. 29, pp. 2187-2196, 2009. [P11] B. Ng, R. Abugharbieh, J. B. Poline, and B. Thirion, "Inferring brain activation from spatial modulations of fMRI BOLD distribution," accepted in Int. Conf. Human Brain Mapping (HBM), Quebec, Canada, 2011. [P12] B. Ng, R. Abugharbieh, and M. J. McKeown, "Enhanced fMRI response detection and reduced latency through spatial analysis of BOLD signals," in Proc. Workshop Analysis of Functional Medical Images held in conjunction with Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), New York, N.Y., 2008, pp. 81-88. [P13] B. Ng, R. Abugharbieh, and M. J. McKeown, "Detecting maximal directional changes in spatial fMRI response using canonical correlation analysis," in Proc. Int. Symp. Biomedical Imaging (ISBI), Boston, M.A., 2009, pp. 650-653. [P14] B. Ng, R. Abugharbieh, and M. J. McKeown, "Inferring functional connectivity using spatial modulation measures of fMRI signals within brain regions of interest," in Proc. IEEE Int. Symp. Biomedical Imaging (ISBI), Paris, France, 2008, pp. 572-575. vi  Chapter 4: Group-wise Functional Network Analysis [P15] B. Ng, R. Abugharbieh, and M. J. McKeown, "Discovering sparse functional brain networks using group replicator dynamics (GRD)," in Proc. Int. Conf. Information Processing in Medical Imaging (IPMI), Williamsburg, V.A., 2009, pp. 76-87. [P16] B. Ng, R. Abugharbieh, and M.J. McKeown, "Group replicator dynamics: A novel group-wise evolutionary approach for sparse brain network detection," submitted.  Chapter 5: Generalized Sparse Classification [P17] B. Ng and R. Abugharbieh, "Generalized sparse regularization with application to fMRI brain decoding," in Proc. Int. Conf. Information Processing in Medical Imaging (IPMI), Monastery Irsee, Germany, 2011, pp. 612-623. [P18] B. Ng and R. Abugharbieh, "Generalized group sparse classifiers with application in fMRI brain decoding," in Proc. IEEE Int. Conf. Computer Vision Pattern Recognition (CVPR), Colorado Springs, C.O., 2011, pp. 1065-1071. [P19] B. Ng and R. Abugharbieh, "Modeling spatiotemporal structure in fMRI brain decoding using generalized sparse classifiers," in Proc. IEEE Int. Workshop Pattern Recognition in Neuroimaging (PRNI), Seoul, South Korea, 2011, pp. 65-68. [P20] B. Ng, A. Vahdat, G. Hamarneh, and R. Abugharbieh, "Generalized sparse classifiers for decoding cognitive states in fMRI," in Proc. Workshop Machine Learning in Medical Imaging held in conjunction with Int. Conf. Medical Image Computing and Computer Assisted Intervention (MICCAI), Beijing, China, 2010, pp. 108-115. vii  Table of Contents Abstract .................................................................................................................................... ii  Preface ..................................................................................................................................... iii  Table of Contents .................................................................................................................. vii  List of Tables ........................................................................................................................... x  List of Figures ......................................................................................................................... xi  List of Abbreviations ........................................................................................................... xiii  Acknowledgements ............................................................................................................... xv  Dedication ............................................................................................................................. xvi  1 Introduction .......................................................................................................................... 1  1.1  Problem Statement and Motivation ....................................................................................... 1  1.2  Functional Magnetic Resonance Imaging ............................................................................. 3  1.3  Activation Detection ............................................................................................................. 6  1.3.1  Intra-subject Analysis ....................................................................................................... 6  1.3.2  Group Analysis ................................................................................................................. 9  1.4  ROI Characterization .......................................................................................................... 11  1.5  Functional Connectivity Analysis ....................................................................................... 12  1.5.1  Intra-subject Analysis ..................................................................................................... 12  1.5.2  Group Analysis ............................................................................................................... 14  1.6  Brain Decoding ................................................................................................................... 15  1.7  Thesis Contributions ........................................................................................................... 18  1.7.1  Activation Detection ....................................................................................................... 18  1.7.2  ROI Characterization ...................................................................................................... 19  1.7.3  Functional Connectivity Analysis ................................................................................... 20  1.7.4  Brain Decoding ............................................................................................................... 20  2 Prior-informed Activation Detection ............................................................................... 21  2.1  Univariate Activation Detection with GLM ....................................................................... 22  2.2  Univariate Activation Detection with GMM ...................................................................... 23  viii  2.3  Modeling Local Spatial Correlations with MRF ................................................................. 25  2.4  Modeling Functional Connectivity with RW ...................................................................... 26  2.5  Modeling Group Information with GMRF ......................................................................... 28  2.5.1  GMRF Formulation ........................................................................................................ 29  2.5.2  Implementation Details ................................................................................................... 30  2.6  Synthetic Task Data Experiments ....................................................................................... 31  2.6.1  Data Generation .............................................................................................................. 31  2.6.2  Results and Discussion ................................................................................................... 32  2.7  Real Task Data Analysis ..................................................................................................... 36  2.7.1  Materials ......................................................................................................................... 36  2.7.2  Results and Discussion ................................................................................................... 37  2.8  Resting State-informed Activation Detection ..................................................................... 40  2.8.1  RS-informed Activation Model ...................................................................................... 41  2.8.2  Posterior Activation Effects Estimation ......................................................................... 42  2.8.3  Functional Connectivity Estimation ............................................................................... 42  2.8.4  Hyper-parameters Estimation ......................................................................................... 43  2.9  Synthetic RS/Task Paired Data Experiments ...................................................................... 45  2.9.1  Data Generation .............................................................................................................. 45  2.9.2  Results and Discussion ................................................................................................... 45  2.10  Real RS/Task Paired Data Analysis .................................................................................... 47  2.10.1  Materials ..................................................................................................................... 47  2.10.2  Results and Discussion ............................................................................................... 47  2.11  Summary ............................................................................................................................. 49  3 Spatial Characterization of ROI Activation .................................................................... 50  3.1  3D Moment-based Invariant Spatial Features ..................................................................... 51  3.2  Effects of Spatial Normalization ......................................................................................... 53  3.2.1  Assessment Methods ...................................................................................................... 54  3.2.2  Results and Discussion ................................................................................................... 56  Effects of Spatial Normalization on Smaller ROIs ................................................ 57  Effects of Spatial Normalization on Larger ROIs .................................................. 61  3.3  Effects of L-dopa on Parkinson’s Disease .......................................................................... 64  3.3.1  Statistical Analysis ......................................................................................................... 64  3.3.2  Results and Discussion ................................................................................................... 65  3.4  Large Scale Real Data Experiment ..................................................................................... 67  ix  3.4.1  Statistical Analysis ......................................................................................................... 67  3.4.2  Results and Discussion ................................................................................................... 68  3.5  Summary ............................................................................................................................. 70  4 Group-wise Functional Network Analysis ....................................................................... 71  4.1  Replicator Dynamics ........................................................................................................... 72  4.2  Group Replicator Dynamics ................................................................................................ 73  4.2.1  GRD Formulation ........................................................................................................... 73  4.2.2  GRD Optimization .......................................................................................................... 74  4.2.3  Group Inference .............................................................................................................. 76  4.2.4  Implementation Details ................................................................................................... 77  4.3  Synthetic Data Experiments ................................................................................................ 78  4.3.1  Data Generation .............................................................................................................. 78  4.3.2  Results and Discussion ................................................................................................... 79  4.4  Real Data Analysis .............................................................................................................. 85  4.4.1  Materials ......................................................................................................................... 85  4.4.2  Results and Discussion ................................................................................................... 86  4.5  Summary ............................................................................................................................. 89  5 Generalized Sparse Classification .................................................................................... 90  5.1  Overview of Sparse Linear Models..................................................................................... 91  5.2  Generalized Sparse Regularization ..................................................................................... 94  5.3  GSR for Classification ........................................................................................................ 96  5.4  Application to fMRI Spatiotemporal Classification ........................................................... 97  5.5  Materials ............................................................................................................................. 99  5.6  Results and Discussion ...................................................................................................... 100  5.7  Summary ........................................................................................................................... 105  6 Conclusions ....................................................................................................................... 106  6.1  Activation Detection ......................................................................................................... 106  6.2  ROI Characterization ........................................................................................................ 108  6.3  Functional Connectivity Analysis ..................................................................................... 110  6.4  Brain Decoding ................................................................................................................. 111  References ............................................................................................................................ 112  x  List of Tables Table 2.1  Summary of activation detection methods. ............................................................ 49  Table 4.1  Summary of functional connectivity analysis methods. ........................................ 89  Table 5.1  Summary of feature selection approaches for fMRI classification. ..................... 105  xi  List of Figures Figure 1.1  Typical fMRI block design experiment. ................................................................. 4  Figure 1.2  Sample raw intensity time course of an activated voxel. ....................................... 4  Figure 2.1  Pictorial depiction of GLM. ................................................................................. 22  Figure 2.2  Graphical depiction of GMM. .............................................................................. 24  Figure 2.3  Histogram of correlation values between random Gaussian noise time courses. . 27  Figure 2.4  Proposed GMRF graph structure. ......................................................................... 29  Figure 2.5  Synthetic data results. ........................................................................................... 34  Figure 2.6  TPR, FPR, and DSC at two typical SNR levels. .................................................. 35  Figure 2.7  Homunculus. ......................................................................................................... 37  Figure 2.8  Real data results for control subjects. ................................................................... 38  Figure 2.9  Real data results for PD subjects. ......................................................................... 39  Figure 2.10  Synthetic data results. ......................................................................................... 46  Figure 2.11  Real data results. ................................................................................................. 48  Figure 3.1  Spatiotemporal ROI analysis pipeline. ................................................................. 52  Figure 3.2  Extraction of ROI activation statistics. ................................................................. 55  Figure 3.3  Discriminability comparisons between the contrasted procedures for extracting ROI activation statistics. ........................................................................................... 57  Figure 3.4  ROI activation statistics in the right thalamus under the slow condition (Section 2.7.1). ........................................................................................................................ 58  Figure 3.5  ROI activation statistics in the right thalamus under the fast condition (Section 2.7.1). ........................................................................................................................ 59  Figure 3.6  ROI activation statistics in the right cerebellum under the slow condition (Section 2.7.1). ........................................................................................................................ 62  Figure 3.7  ROI activation statistics in the right cerebellum under the fast condition (Section 2.7.1). ........................................................................................................................ 63  Figure 3.8  Spatial focusing effect of L-dopa seen in a typical PD subject. ........................... 66  Figure 3.9  Normalization of the spatial extent of ROI activation statistics by L-dopa. ........ 66  Figure 3.10  Percentage of ROIs detected with significant activation differences using J1. .. 69  Figure 3.11  Activation detection comparisons between J1 and mean intensity. .................... 69  xii  Figure 4.1  Synthetic data results. ........................................................................................... 80  Figure 4.2  Convergence of GRD. .......................................................................................... 82  Figure 4.3  Outlier analysis. .................................................................................................... 83  Figure 4.4  Effects of varying λ. ............................................................................................. 84  Figure 4.5  Real fMRI data results. ......................................................................................... 87  Figure 5.1  Predictor matrix. ................................................................................................... 98  Figure 5.2  Spatiotemporal neighborhood. ............................................................................. 99  Figure 5.3  Prediction accuracy comparisons. ...................................................................... 101  Figure 5.4  Classifier weight patterns. .................................................................................. 102   xiii  List of Abbreviations ACCc ACCr AD AMY ANOVA ANSPCA AR ASLDA ASSLDA ASTSLDA  BOLD CAU CER CM CTC DSC ED EEG ENLDA EPI FDR fMRI FRP GE GL GL-CM GLM GMM GMRF GRD GRF GRW GSR HAMMER HDP HIP HRF ICA i.i.d. LASSO LDA Caudal anterior cingulate cortex Rostral anterior cingulate cortex Alzheimer’s disease Amygdale Analysis of variance Adaptive negative sparse principal component analysis Autoregressive Anatomically-informed sparse linear discriminant analysis Anatomically-informed spatial smooth sparse linear discriminant analysis Anatomically-informed spatiotemporally smooth sparse linear discriminant analysis Blood oxygen level dependent Caudate Cerebellum Connectivity-informed model Cerebello-thalamo-cortical Dice similarity coefficient Euclidean distance Electroencephalography Elastic net linear discriminant analysis Echo planar imaging False discovery rate Functional magnetic resonance imaging False positive rate Graph embedding Graphical least absolute shrinkage and selection operator Connectivity-informed model with graphical LASSO General linear model Gaussian mixture model Group Markov random field Group replicator dynamics Gaussian random field Group random walker Generalized sparse regularization Hierarchical attribute matching mechanism for elastic registration Hierarchical Dirichlet process Hippocampus Hemodynamic response function Independent component analysis Independently and identically distributed Least absolute shrinkage and selection operator Linear discriminant analysis xiv  L-Dopa MCICA MEG ML MM MNI MRF MRI MS M1 OAS OAS-CM PAL PCA PCC p.d. PD PDar PDt PET PM PMA PUT RD ROI RS R-UM RW SLDA SMA SNR SPG SPECT SSA SSLDA STC STSLDA S-UM SVM SVM-RFE THA TRP UM UPDRS WHO Levodopa Motion-corrected independent component analysis Magnetoencephalography Maximum likelihood Mixture model Montreal Neurological Institutes Markov random field Magnetic resonance imaging Multiple Sclerosis Primary motor cortex Oracle approximating shrinkage Connectivity-informed model with oracle approximating shrinkage Pallidum Principal component analysis Posterior cingulate cortex Positive definite Parkinson’s disease Akinetic Parkison’s disease Tremor dominant Parkison’s disease Positron emission tomography Premotor are Primary motor area Putamen Replicator dynamics Region of interest Resting state Univariate model with ridge penalty Random walker Sparse linear discriminant analysis Supplementary motor area Signal-to-noise ratio Spectral projected gradient Single photon emission computed tomography Somatosensory area Spatially smooth sparse linear discriminant analysis Striato-thalamo-cortical Spatiotemporally smooth sparse linear discriminant analysis Standard univariate model Support vector machine Support vector machine-recursive feature extraction Thalamus True positive rate Univariate model Unified Parkinson’s disease rating scale World health organization xv  Acknowledgements I would like to thank my advisor, Dr. Rafeef Abugharbieh, for supervising me throughout my graduate studies and providing me the freedom to explore different research directions. I would also like to thank Dr. Martin McKeown for data sharing and orienting me to the clinical side of neuroscience research. Moreover, I would like to thank Dr. Ghassan Hamarneh for his input on my research from a computer scientist’s perspective. In addition, I would like to thank Dr. Bertrand Thirion and his colleagues, Dr. Gael Varoquaux and Dr. Jean-Baptiste Poline, for hosting me at INRIA and sharing data. Furthermore, I would like to thank my friends inside and outside of BiSICL for their support. I would also like to thank NSERC, MSFHR, Walter C. Sumner Association, SPIE, Government of France, and UBC for their generous funding. Last but not least, I would like to thank my family for their encouragement. xvi  Dedication  To my family   1 1     Introduction Functional magnetic resonance imaging (fMRI) is currently one of the most widely-used imaging modality for studying human brain function. However, the low signal-to-noise ratio (SNR) coupled with the small sample size presents major challenges to fMRI analysis. Standard analysis techniques are largely univariate in nature, which neglect important information encoded by the interactions between voxels. Also, the high inter-subject variability both anatomically and functionally renders establishment of subject correspondence difficult, which complicates group analysis. In this thesis, we propose new prior-informed approaches for dealing with the key challenges in fMRI analysis. Our proposed methods enable an elaborate exploration of the functional organization of the human brain from detecting localized activation effects to deciphering mental representations encoded in whole-brain activity patterns. In this chapter, we first discuss the impact of neurological diseases, which serves as the main motivation for our work. We then provide an overview of fMRI and the key research questions and challenges associated with this imaging modality. The limitations of the existing analysis methods are then discussed, concluding with a summary of the thesis contributions.  1.1 Problem Statement and Motivation Based on the Global Burden of Disease study conducted by the World Health Organization (WHO) in collaboration with the World Bank and the Harvard School of Public Health, neurological diseases rank as the number one disease burden in the world [1]. In fact, over 1,000 neurological diseases, including Alzheimer’s disease (AD), Parkinson’s disease (PD), Multiple Sclerosis (MS), Huntington’s disease, Stroke, Migraine, and Epilepsy among others, are currently affecting the world population [2]. In Canada, neurological diseases attribute to an estimated annual cost of $8.8 billion with $2.3 billion devoted to hospital care, physician care, and drug expenditures [3]. These costs represent a major financial burden on the country. Also, functional impairments associated with these neurodiseases greatly hinder the  2 daily activities of those affected and place a major toll on their family, both physically and emotionally. For instance, AD is known to induce memory loss, confusion, aggressiveness, and language breakdown among other symptoms. These debilitating effects may result in unintended inappropriate behaviors in social situations or one getting lost even in his or her own home. AD can thus cause great inconvenience, and at times, danger for the patients and those around them. Other neurodisease-induced symptoms, such as hand tremors and rigidity arising from PD, render tasks as simple as eating difficult for the patients. Worse yet, many of the neurological diseases are incurable and tend to worsen over time. Thus, with the life expectancy of the general population increasing, neurological diseases are predicted to become an even greater burden to healthcare systems worldwide.  Among the common neurological diseases, AD and PD are by far the most prevalent [4, 5]. Diagnosis of these neurodegenerative diseases is typically based on clinical symptoms. For instance, AD is often diagnosed using the Alzheimer’s Criteria set by the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association, where clinical tests are performed to assess impairments of cognitive functions, such as memory and attention [6]. Similarly, PD is typically diagnosed using the Unified Parkinson’s Disease Rating Scale (UPDRS), where mentation, behaviour, mood, activities of daily living, and motor changes are examined [7]. However, clinical symptoms are easily confused with effects of normal ageing and do not appear until years after disease onset [8]. Also, current therapies help reduce symptom severity, but typically do not modify disease progression. Thus, objective biomarkers are needed to facilitate earlier disease detection, where more effective treatments can be designed to slow disease progression or even prevent disease onset [8, 9].  In recent years, neuroimaging has greatly contributed to the characterization and understanding of neurological diseases. The application of MRI, for instance, has unveiled abnormal anatomical changes in the brain, such as hippocampal atropy in AD patients [10] and brain volume reduction in MS patients [11]. Other disease-induced anatomical changes, such as degeneration of white matter fiber tracks in AD patients [12], have also been observed using diffusion MRI. In addition to anatomical changes, many neurological  3 diseases are associated with abnormal functional changes. For example, brain studies with electroencephalography (EEG) and magnetoencephalograpy (MEG) have shown impaired neural synchrony between brain regions in AD and PD patients [13]. Positron emission tomography (PET) and single photon emission computed tomography (SPECT) have also been used to quantify the extent of striatal dopaminergic degeneration in PD [14] and the reduction of cerebral metabolic rate for glucose in AD [15]. Furthermore, the use of fMRI has detected decreased functional synchrony in the hippocampus of AD patients [16], compensatory changes in brain activity in MS patients [17], and functional reorganization in stroke patients during recovery [18], among other clinical findings.  In this thesis, we focused on studying the functional aspects of the human brain with the ultimate goal of identifying reliable biomarkers for various neurological diseases. The most commonly-used imaging modalities for such purposes include EEG, MEG, PET, SPECT, and fMRI. Since fMRI provides much higher spatial resolution than EEG and MEG without the need for injecting radioactive tracers as in PET and SPECT, we have chosen to concentrate on fMRI throughout this work.  1.2 Functional Magnetic Resonance Imaging Functional MRI provides a non-invasive means for studying brain activity based on changes in blood flow, blood volume, and blood oxygenation level. Among the techniques for creating image contrast, blood oxygen level dependent (BOLD) contrast is by far the most widely used [19]. The assumption is that neural activity directly affects the local blood flow near the active brain areas through a process called, “hemodynamic response.” Specifically, neuronal firing consumes oxygen, which increases local magnetic inhomogeneity, and hence decreases the BOLD signals near the brain areas involved. When an inflow of oxygenated blood rushes into these brain areas in response to the oxygen demand, the relative decrease in de-oxygenated blood reduces the local magnetic inhomogeneity, which increases the BOLD signals. In a typical fMRI experiment, a subject lying in the MR scanner performs a certain task as brain volumes are acquired at regular time intervals, as shown in Figure 1.1.  4  … Rest Rest Stim Stim time (s) … ≈ Expected Response Activation Statistics Maps Voxel Time Course BOLD Volumes Pre-processing   Figure 1.1  Typical fMRI block design experiment. A subject placed in the MR scanner performs a certain task, e.g. finger tapping, as BOLD volumes are acquired at regular time intervals. Alternating between rest and task generates the image contrast required for inferring brain activity. Pre-processing, such as motion correction to account for the subject’s head movements and high-pass filtering to remove low frequency scanner drifts, is then performed. The pre-processed voxel time courses are compared with an expected response to generate activation statistics maps, which are thresholded to localize the activated brain voxels.   Figure 1.2  Sample raw intensity time course of an activated voxel. Blue = voxel time course. Red = expected response.  5 The contrast in BOLD signals between the baseline and stimulus conditions is then used to infer brain activity. However, the inherently low SNR [20] (typically between 0.2 to 0.5 [21, 22]) and the complex noise structure of BOLD signals [23] greatly complicate analysis. Shown in Figure 1.2 is a sample intensity time course of an active brain voxel, plotted against the expected response (a boxcar function convolved with the canonical hemodynamic response function (HRF) [24]). As evident from Figure 1.2, the raw BOLD signals are highly contaminated by noise, and must be preprocessed prior to analysis [23].  Standard preprocessing steps include slice timing correction [25], head motion correction [26], and temporal autocorrelation correction [27, 28]. Spatial smoothing [29] and spatial normalization [26] may be optionally performed depending on the type of analysis (i.e. intra- subject or group, Section 1.3). Slice timing correction accounts for how brain slices are not acquired simultaneously, and is corrected by shifting the phase spectrums of the voxel time courses, while retaining their original magnitude spectrums [25]. Head motions are corrected by rigidly registering all brain volumes to the first volume assuming the shape of the brain stays fixed throughout the fMRI experiment [26]. Temporal autocorrelations mainly arise from scanner drifts, cardiac noise, respiratory noise, and head motions [28]. Since the noise component of the voxel time courses is often assumed to be identical independently distributed (i.i.d.), pre-whitening is typically performed to ensure correct statistical inference [28]. Specifically, temporal autocorrelations can be decomposed into a short-range and a long-range component [28]. The long-range temporal correlations are removed by high-pass filtering, while the short-range autocorrelations are accounted for using autoregressive (AR) models [28]. Since standard pre-processing techniques are widely available [30], the focus of this thesis is instead on the analysis of the pre-processed fMRI data.  The key research questions addressed by fMRI revolve around the two fundamental organization principles to which the brain adheres, namely functional specialization and functional integration [30]. Functional specialization refers to how specific brain regions are specialized for certain functions. This brain property is typically studied using fMRI by examining which areas of the brain are activated during various tasks [31] as well as by characterizing the activity patterns within brain regions of interest (ROIs) [32, 33]. The other  6 organizational principle, functional integration, refers to how different cognitive processes are mediated through the interactions between brain regions [30]. These interactions are usually inferred from fMRI data based on signal correlations [34], commonly referred to as “functional connectivity.” Also, recent advances in pattern classification techniques have ignited major research interest in examining how different cognitive states are encoded in whole-brain activity patterns [35, 36]. In this thesis, we present our contributions within these four aspects of fMRI analysis: activation detection, ROI characterization, functional connectivity analysis, and brain decoding. These aspects play complimentary roles in providing a more complete picture of the functional organization of the brain. In particular, activation detection and ROI characterization offer information concerning the individual brain units (i.e. voxels) and modules (i.e. ROIs), while connectivity analysis and brain decoding provide a more system level perspective of how cognitive functions are encoded through the interactions between the various brain units and modules. State-of-the-art for each of these four aspects of fMRI analysis are summarized in Sections 1.3, 1.4, 1.5, and 1.6, respectively. For the interested readers, details regarding other active areas of fMRI research, such as characterizing functional integration in terms of “effective connectivity” and HRF estimation, can be found in [37] and [38].  1.3 Activation Detection The goal of brain activation detection is to map brain regions to functions by exerting different stimuli and examining which areas of the brain respond to which stimuli. Such analysis may be performed either at the intra-subject level or at the group level. Intra-subject analysis is more often used in clinical applications, such as neurosurgical planning [39], whereas group analysis is typically for identifying functional characteristics that are generally common within a population. 1.3.1 Intra-subject Analysis The standard approach for inferring intra-subject brain activation from fMRI data involves comparing each voxel’s intensity time course independently against an expected response  7 using a general linear model (GLM) [31] to estimate what is referred to as “activation effect” (Section 2.1). Statistics, such as t-values, pertaining to the probabilities (p-values) of observing the estimated effects given that no activation occurred are then computed and assembled into an activation statistics map, referred to as a t-map. Voxels with t-values above a user-defined threshold (conventionally set to a t-value corresponding to an experimental- wise p-value of 0.05) are declared as activated. To control the number of voxels falsely declared as active, one must account for multiple comparisons during threshold selection [29]. Since the voxels are certainly correlated due to factors, such as physiological noise, interpolation during motion correction, and the functional integration property of the brain in general [30], correcting for multiple comparisons using the traditional Bonferroni correction technique, which assumes independence between voxels, will result in an overly stringent threshold [40]. To account for correlations between spatially neighboring voxels, Worsley et al. proposed a thresholding technique based on Gaussian random field (GRF) theory [29]. Deriving this GRF-based threshold requires estimating the smoothness of the activation statistics maps, which is non-trivial. The standard way of dealing with this issue is to spatially smooth the BOLD volumes using a fixed Gaussian kernel and assume the smoothness of the activation statistics maps is equal to the kernel width [30]. Spatial smoothing also helps increase the SNR based on the matched filter theorem [30], provided that the Gaussian kernel matches the shape of the active spatial clusters. However, the shape of the active clusters is likely to vary across the brain. To remedy this limitation, scale-space approaches have been proposed [41, 42], where kernels of varying widths are applied with the threshold adjusted for the additional scale dimension [41].  The main drawback of GRF thresholding is the need for spatial smoothing, which blurs the details of the activation statistics maps [41]. The use of permutation tests [40] has been proposed to mitigate this limitation at the expense of longer computational time. Alternatively, one may model the activation statistics maps using mixture models (MM) [43- 45] (Section 2.2), which assumes the activation statistics of the active and non-active voxels are drawn from two distinct probability distributions. Under this setting, activated voxels are delineated by comparing the posterior probabilities of a voxel being active or non-active given data, which mitigates the need for multiple comparisons correction [44]. However,  8 mixture models alone do not account for spatial correlations. To mitigate this limitation, Woolrich et al. proposed a spatial mixture model based on continuous Markov random field (MRF) [46, 47]. Discrete MRF approaches have also been proposed [48-50], which convert activation detection into a class labeling problem, i.e. active or non-active, with discrepancies in labels between neighboring voxels penalized (Section 2.3). Spatial continuity is thereby encouraged, which helps suppress false declarations of isolated voxels as being activated.  The methods described thus far model local voxel interactions at the inference stage after the activation statistics maps are computed. The alternative approach would be to model these local interactions during the activation effect estimation stage. For instance, Ruttimann et al. proposed fitting spatial wavelets to BOLD volumes to account for local spatial correlations [51]. In their approach, spatial wavelet transform is separately applied to each BOLD volume in generating wavelet coefficient time courses. GLM is then performed on these time courses to compute activation statistics. The statistically significant wavelet coefficients are then identified by thresholding the activation statistics maps with e.g. Bonferroni or false discovery rate (FDR) correction [52]. The advantage of performing statistical inference in the wavelet domain is that the decorrelation property of wavelet transform renders the independence assumption in Bonferroni and FDR correction more appropriate [53]. However, the optimal way for transforming the results back to the spatial domain is not obvious. If the inverse wavelet transform is naively applied to the activation statistics of the significant wavelet coefficients to reconstruct an activation statistics map in the spatial domain, most voxels will be assigned non-zero activation statistics [53], thus thresholding will again be required. Therefore, Van De Ville et al. proposed using wavelet transform as a denoising step and performing statistical testing in the spatial domain but with the effect of the wavelet processing taken into account during threshold selection [54].  In addition to the wavelet approach, substantial research efforts have been placed on using Bayesian approaches [46, 47, 55-59] to model local voxel interactions through a spatial prior. For instance, Penny et al. proposed using a spatial Laplacian prior on the activation effects to encourage spatial smoothness [55]. Extending Penny et al.’s work, Harrison et al. employed a diffusion-based prior that accounts for how the spatial correlation structure is non-stationary  9 across the brain [56]. Other approaches include using spatiotemporal autoregressive models to account for correlations in the noise [58]. Most of these models require deploying approximate inference or computationally expensive sampling methods in estimating the activation effects. Also, only local spatial correlations are modeled with longer range voxel interactions completely ignored. Such long-range interactions may be implicative of activation, and hence beneficial in improving activation detection. 1.3.2 Group Analysis In standard fMRI group analysis [31], brain images of all subjects are first warped onto a common brain template [60] to create voxel correspondence across subjects. This procedure is referred to as “spatial normalization.” Activation effects at each voxel location are then statistically compared across subjects, e.g. using a t-test, to generate a group activation statistics map. The resulting group map is subsequently thresholded to isolate the commonly activated voxels. The implicit assumption is that a one-to-one correspondence exists between the active voxels of the subjects, and that all voxels are perfectly aligned in the template space after spatial normalization. However, the large inter-subject variability renders whole- brain spatial normalization prone to mis-registration errors [61]. High dimensional registration techniques, such as Hierarchical Attribute Matching Mechanism for Elastic Registration (HAMMER) [62], have been shown to improve alignment over commonly-used low dimensional techniques [63, 64] with more precise, localized activation foci detected [65]. However, even if perfect anatomical alignment can be achieved, whether a one-to-one functional correspondence exists between voxels across subject is debatable [65, 66].  To account for inter-subject variability in the location of the activated voxels, the conventional way is to spatially smooth the BOLD volumes to increase the functional overlaps across subjects, which also serves the purpose of deriving a GRF-based threshold [29]. However, spatial smoothing tends to smear the activations statistics maps. To mitigate this drawback, a couple of methods have been proposed to deal with this challenging problem of inter-subject functional misalignment. Thirion et al. proposed using a “structural” strategy to draw correspondence at the cluster level [67, 68]. Specifically, the activation statistics map of each subject is first thresholded to segment out the active voxel clusters.  10 Associated with each cluster is a local maximum, whose location is used to draw probabilistic correspondences between subjects, as estimated with belief propagation [69]. Replicator dynamics (RD) [70] is then applied to the “belief matrix” to segregate clusters that are commonly present in the subjects. Another approach, proposed by Keller et al., is to introduce a “spatial jitter” variable into GLM and marginalize out that variable to account for the various ways that the activation maps may be misaligned across subjects [71].  In addition to the standard approach of drawing group inference by thresholding a group activation statistics map, a number of hybrid methods, where group information is integrated into intra-subject activation detection, have been explored. Such “group-wise” approach facilitates modeling of subject differences while exploiting group information to better disambiguate signal from noise. For instance, Calhoun et. al. [72] proposed “group independent component analysis” (ICA), where voxel time courses are concatenated across subjects and decomposed into independent spatial components and associating time courses of the respective spatial components. A “back reconstruction” procedure is then performed to extract subject-specific spatial component maps. The optimal way to delineate active voxels from these intra-subject spatial component maps, however, is not trivial. One approach is to apply a threshold to z-scores estimated from the spatial component maps [73]. However, these z-scores do not have a clear statistical interpretation, in contrast to activation statistics generated from GLM [73]. Therefore, the spatial component maps are more often used for generating group maps instead [72]. Recently, Bathula et al. [74] proposed a leave-one-out strategy, where all subjects except one are analyzed to compute a group map, which is then used as a prior to segment the active voxels of the left-out subject. However, mis-registration errors arising from whole-brain spatial normalization, as required for generating group maps, may limit the accuracy of this strategy. To relax the one-to-one voxel correspondence criterion, Kim et al. [75] proposed using a hierarchical Dirichlet process (HDP) model with “random effects” to extract Gaussian-shaped functional clusters, as opposed to detecting active voxels. The authors showed that adding “random effects” to the HDP model permits slight inter-subject variations in the location of the detected clusters without losing cluster correspondence between subjects. However, considering the possible shapes that a functional cluster may take, the assumption that all functional clusters are Gaussian-shaped can be  11 limiting. Removal of this assumption, along with the one-to-one voxel correspondence criterion, would thus be desirable.  1.4 ROI Characterization ROI analysis is typically employed for testing specific hypotheses. For instance, one might be interested in studying how Parkinson’s disease affects ROIs within the motor system in addition to causing cell death in the substantia nigra [76]. This analysis approach involves specifying ROIs for each subject and comparing some “properties” of the regional activation statistics across subjects to draw group inference. The main advantage of this ROI analysis approach is that an exact voxel correspondence across subjects is not required, hence no need for spatial normalization. The original information in the activation statistics maps, which is certainly altered during spatial normalization, is thus retained. The challenge is that defining the ROIs and specifying a metric to rigorously characterize and compare the regional activation statistics across subjects without merely amplifying inter-subject variability is non- trivial. ROIs are typically defined anatomically [32, 33, 77], but such ROI definition may incorporate non-activated voxels into the analysis, which could result in reduced sensitivity in detecting task-related effects, depending on the metric used.  By far, mean activation statistics, percentage of activated voxels, and variants of these metrics are most widely used in fMRI ROI analysis [32, 33]. For instance, Buck et al. used the mean activation statistics of different proportions of voxels within an ROI to examine the reproducibility of results for various cognitive tasks [33]. Allen and Courchesne used the percentage of activated voxels and mean percentage signal changes to study the activation differences between subjects with and without autism [78]. Similarly, Aizenstin et al. used the percentage of voxels with negative activation statistics values to compare the activation distributions of young and elderly subjects [79]. All these features are derived based on the amplitude of the activation statistics with the underlying assumption that the spatial distribution of the regional activation statistics is immaterial.   12 Characterizing ROI activation statistics using amplitude-based features provides a simple means for subject comparisons, since these features are insensitive to variability in brain shape, brain size, subjects’ orientation, and location of the activated voxels. However, Haxby et al. in their seminal work [80] have shown that the spatial distribution of the BOLD signals also encode important cognitive information. Thus, instead of only looking at mean amplitude changes in activation statistics averaged over an ROI, exploring spatial distribution changes may also be beneficial [81-83].  1.5 Functional Connectivity Analysis Cognitive functions are known to be manifested through functional integration in addition to function specialization [84, 85]. In fact, recent studies suggest that voxel interactions are present even at rest without any external stimulus [20, 86-92]. These interactions are typically characterized in terms of functional connectivity [34] and effective connectivity [37]. Functional connectivity refers to the statistical dependencies between brain regions, whereas effective connectivity refers to the causal interactions among regions. We focus on the exploratory detection of brain networks based on functional connectivity inferred from fMRI data, and refer readers to [93-95] for descriptions of methods, such as dynamic causal modeling and structural equation modeling, for analyzing effective connectivity. 1.5.1 Intra-subject Analysis The objective of functional connectivity analysis is to infer brain networks based on BOLD signal correlations. Viewed from a graph-theoretic perspective, voxels correspond to graph nodes with edges connecting nodes that form networks. One may thus infer brain networks by either finding cliques of connected nodes or by finding presence/absence of edges.  Methods that search for connected nodes can be broadly divided into seed-based and exploratory multivariate approaches [96]. The seed-based approach infers functional connectivity by examining the correlations between a seed voxel or region and all other voxels in the brain using GLM. This approach is equivalent to the standard univariate  13 analysis for activation detection except the expected response is replaced with the intensity time course of the seed. Though this approach has shown promising results [86, 97], the detected networks are inherently restricted by the choice of seed [98]. For instance, if a seed is selected from a motor region, only motor networks will be detected with the rest of the brain left unexplored. Also, joint interactions between multiple voxels are neglected [99]. To alleviate these limitations, various multivariate techniques, such as cluster analysis and decomposition-based methods, have been explored [96]. Cluster analysis [100, 101] partitions voxels based on similarity between the intensity time courses, but results are often sensitive to initialization and the number of clusters chosen [96]. Decomposition techniques, like principal component analysis (PCA) [34] and ICA [73, 99], factorize the voxel time courses into spatial components and associating time courses of the respective spatial components based on statistical criteria, such as maximum variance and independence. Functional networks are then inferred based on the weights assigned to the voxels in the spatial component maps. Specifically, voxels jointly assigned “large” weights within the same spatial component are considered to belong to the same network. However, grouping voxels into networks using this strategy is complicated by how non-zero weights are assigned to majority of the voxels due to the diffused weighting property of PCA and ICA [102, 103]. Also, determining which spatial components pertain to true brain activity can be non-trivial for complex experiments [104]. To extract stimulus-invoked spatial components, Todd et al. proposed using constrained PCA, where GLM is first performed to isolate the task-related variance from the voxel time courses prior to applying PCA [105]. The limitation of this approach is that any misspecifications in the expected stimulus responses will result in removal of signals from which brain networks are inferred. Some brain networks might thus be mistakenly neglected.  The alternative approach for discovering brain networks is to look for the presence/absence of graph edges. Pairwise correlations between voxel time courses are typically used as estimates of the degree of voxel interactions. By computing the correlations between all possible pairs of voxels, one can generate an inverse covariance matrix, i.e. precision matrix, where elements with non-zero values imply the corresponding pairs of voxels are connected. However, since there are typically more voxels than time points, estimation of the precision  14 matrix by inverting the empirical covariance matrix will be unstable. Also, due to noise, most (if not all) elements of the estimated precision matrix will be assigned non-zero weights. A recently proposed strategy to deal with these problems is to enforce sparsity during precision estimation [106-108]. The main advantage is that it shrinks elements of the precision matrix that correspond to non-connected voxels to exactly zero, hence provides an unambiguous answer as to which voxels are disconnected. However, existing methods for sparse inverse covariance estimation are very computationally intensive and can only handle up to thousands of variables. Thus, these methods are typically used for region level analysis [108]. 1.5.2 Group Analysis The seed-based approach naturally extends to group analysis since the same machinery for standard group activation detection can be applied. However, the limitations of standard group activation analysis, such as voxel misalignment, are also propagated. Moreover, the inherent difficulties of the seed-based approach, such as seed selection, limit the possible group networks that may be discovered.  Cluster analysis learns the seeds (i.e. cluster centers) in a data-driven, but the optimal way for statistically inferring commonly recruited clusters across subjects is not obvious. One approach that avoids spatial normalization is to pool the voxel time courses across all subjects and apply clustering based on time course similarity. However, inter-subject variability in HRF (if analyzing task-based data) renders this approach prone to errors. In fact, in the case of resting state (RS) data where no external stimulus is present to guide the timing of the BOLD signals, this approach would be completely infeasible. To ease the HRF variability issue in multi-stimuli task-based experiments, Lashkari et al. proposed applying GLM to generate “activation profiles,” i.e. vectors of activation effect estimates associated with different stimuli, normalizing and pooling all subjects’ activation profiles into a single set, and clustering these profiles using a mixture model [109]. However, it is unclear whether and how this approach can be generalized to RS data.  The decomposition methods, such as PCA and ICA, also do not required seed selection. However, drawing a correspondence between the intra-subject spatial component maps for  15 group analysis can be challenging [72]. One way to mitigate this problem is to average the voxel time courses across subjects and apply the decomposition methods on the average time courses to estimate average group networks [110, 111]. Alternatively, in the case of PCA, one can compute the covariance matrix for each subject and applying PCA or RD (which we showed to be a solution to the non-negative sparse PCA optimization problem, Section 4.1) on the average group covariance matrix [112]. The main drawback of these averaging approaches is that the average group network may not reflect individual differences [113]. Also, these averaging approaches prohibit statistical group inference. To circumvent this limitation, group ICA has been proposed for extracting subject-specific spatial components to enable group inference [72]. Alternatively, Beckmann et al. proposed a technique called, “dual regression” for a similar purpose [114]. This technique involves concatenating the voxel time courses across subjects, applying ICA to the concatenated time course matrix to extract group spatial component maps, performing regression on each subject’s data with the group spatial components as regressors to extract subject-specific time courses, and applying regression again on the each subject’s data but with the subject-specific time courses as regressors to extract subject-specific spatial component maps. A limitation common to both methods is that all subjects are assumed to have similar spatial noise components, which is rather unlikely.  For the recent graph edge detection approaches [106-108], one way of drawing group inference is to apply a separate t-test to each element of the subjects’ covariance/precision matrices [115]. Alternatively, one may derive certain network properties based on graph theory to compare networks across subjects [116]. Since this graph edge detection approach is rather recent, the preferred methods for making group inference are yet to be established.  1.6 Brain Decoding One of the aims of brain decoding is to exploit the collective information encoded by the signal patterns of the fMRI brain volumes to determine the experimental conditions to which these brain volumes belong. Under this classification framework, signal intensity of each  16 voxel is treated as a feature with brain volumes taken as samples [35, 36]. Alternatively, one may concatenate multiple brain volumes within a trial of the same experimental condition and treat each concatenated set of brain volumes as a sample [117]. Typical fMRI datasets consist of considerably more voxels (~tens of thousands) than brain volumes (~hundreds). Hence, direct application of standard classifiers, such as support vector machines (SVM) [117] and linear discriminant analysis (LDA) [118], with all brain voxels used as features will likely result in overfitting [35, 36]. To reduce the dimensionality of the feature space, a common strategy is to apply standard univariate analysis [31] to restrict the feature set to only those voxels displaying significant activation or discriminability [117]. However, this approach discards the collective discriminant information encoded by the voxel patterns, which may lead to suboptimal feature selection [119-122]. Alternatively, PCA may be applied to reduce feature dimension [123]. However, the new features created by taking a weighted sum of the original features do not allow for simple identification of relevant voxels, which hinders result interpretation [124]. More advanced methods, such as support vector machine with recursive feature selection (SVM-RFE) [125] and random forest [126], that enable multivariate feature selection have been explored, but tend to be more computationally demanding.  Recently, there is a surge of interest in applying sparsity-enforcing techniques (Section 5.1) to fMRI classification. Many of these techniques exploits the sparse property of the least absolute shrinkage and selection operator (LASSO) penalty [127] to shrink classifier weights associated with irrelevant features to exactly zero. Hence, feature selection is implicitly integrated as part of classifier learning. However, naively enforcing sparsity may result in spurious classifier weight patterns [119]. In particular, classifier weights derived from enforcing sparsity alone often deviate from how brain activity tends to distribute in localized spatial clusters [119]. To encourage spatial contiguity, spatial SVM [128] has been proposed. However, this technique assigns weights to all input voxels, which renders identification of relevant voxels nontrivial. Group LASSO has also been explored to jointly select spatially proximal voxels as well as voxels within ROIs [129]. However, group LASSO does not model the associations between features, hence may not assign similar weights to neighboring voxels to reflect how adjacent voxels tend to display similar level of brain  17 activity [130]. Elastic net [119, 121], which encourages correlated voxels to be jointly selected, has also been employed, but suffers from similar limitations as Group LASSO. More recently, van Gerven et al. proposed a Bayesian formulation for incorporating a spatiotemporal prior, where the authors opted to model uncertainty by estimating the posterior probabilities of the classifier weights as opposed to obtaining sparse weightings through finding the maximum a posterior solution [131].  18 1.7 Thesis Contributions We propose in this thesis novel directions for fMRI analysis, where informative priors are seamlessly integrated into activation detection, ROI characterization, functional connectivity analysis, and brain decoding. We demonstrate the power of such prior-informed approaches on carefully designed simulations as well as extensive sets of real fMRI data. We list below our specific contributions in advancing the current state-of-the-art. Our proposed methods enable an extensive analysis of fMRI data from studying the functional role of individual brain units and modules (i.e. voxels and ROIs) to investigating at the system level how cognitive functions are mediated through the interactions between brain modules and represented in whole-brain activity patterns. Details of these contributions are provided in Chapters 2, 3, 4, and 5, respectively. We note that we have kept the original mathematical notations used in the respective publications to ease cross-referencing. There might hence be some overlaps in notations. To disambiguate these overlaps, we have explicitly redefined the notations in each subsection. 1.7.1 Activation Detection  We proposed incorporating functional connectivity priors estimated from task-based [P1, P2] as well as RS data [P6] into activation detection. Our proposed formulations model both local and long-range voxel interactions, which are completely neglected in existing techniques. We demonstrated on large sets of synthetic and real data that integrating connectivity priors increases detection sensitivity over the standard univariate approach.   We proposed integrating a group prior into intra-subject activation detection [P3−P5]. Our proposed group-extended graph structure enables information coalescing between subjects without the need for a one-to-one voxel correspondence. We showed on large sets of synthetic and real data that incorporating a group prior enhances detection sensitivity over the standard univariate approach.   19 1.7.2 ROI Characterization  We proposed inferring brain activation from the spatial modulations of regional activity patterns as measured using invariant spatial features [81, 82]. We validated the generality of this spatial analysis approach over large sets of real fMRI data [P7, P11]. We showed that brain activation can generally be inferred from spatial changes in ROI activity patterns. Our results thus demonstrate that signal amplitude is not the only means by which to infer brain activation.   Applying our proposed invariant spatial features [81, 82] to specific datasets have resulted in other important findings. We showed that spatial normalization reduces discriminability of task-related activation differences in ROI analysis [P8]. Thus, spatial normalization, which is standardly performed in group activation analysis, may be distorting important spatial information encoded in the original activation statistics maps. We also demonstrated that a predominant effect of levodopa (L-dopa) on PD subjects is a spatial focusing of regional activation patterns, not signal amplitude changes [P9]. We further showed that increasing spatial extent of ROI recruitment is another strategy for PD subjects to handle increasing task difficulties [P10]. Other findings include earlier peak time in spatial response as compared to BOLD amplitude changes [P12] and that functional connectivity can be inferred from the spatial modulations of regional activity patterns [P14].   We proposed measuring the skewness of the regional activity patterns [P13], in addition to spatial extent. Our results on real data showed that this skewness feature is more sensitive to directional changes than measuring spatial extent, and thus can play a complementary role to facilitate a more elaborate spatial analysis.  20 1.7.3 Functional Connectivity Analysis  We showed that replicator dynamics is a solution of the challenging non-negative sparse PCA optimization problem [P15, P16].   We proposed jointly integrating sparsity and group priors into intra-subject network detection. We demonstrated on synthetic and real data that enforcing sparsity simplifies network identification, while adding a group prior effectively handles the large inter- subject variability often seen in fMRI studies. 1.7.4 Brain Decoding  We proposed a simple regularization technique for incorporating domain-specific knowledge into a wide range of sparse linear models [P17−P20].   We validated on real fMRI data and showed how our prior-informed sparse classifiers outperform standard classifiers, such as SVM and LDA, as well as a number of existing sparse classifiers both in terms of prediction accuracy and result interpretability.   21 2     Prior-informed Activation Detection The traditional way for studying functional specialization through fMRI is by invoking different stimuli and examining which areas of the brain are responding. To localize the activated brain areas, the standard approach is to analyze each voxel in isolation using GLM (Section 2.1). The activated voxels are then delineated by thresholding the resulting activation statistics maps based on GRF theory. This univariate approach has the drawbacks of neglecting voxel interactions and requiring multiple comparisons correction. One way of mitigating multiple comparisons correction is to model the activation statistics maps using Gaussian mixture models (GMM) (Section 2.2), but GMM alone does not account for voxel interactions. One may integrate GMM into an MRF framework to model local spatial correlations (Section 2.3), but longer range interactions will still be neglected. We thus propose incorporating a functional connectivity prior into GMM using a random walker (RW) formulation [P1] to model the interactions between both adjacent and distant voxels (Section 2.4). Also, since intra-subject activation statistics maps are typically very noisy due to the low SNR inherent in BOLD signals, we propose a group MRF (GMRF) [P3, P5] that coalesces information across subjects to handle noise (Section 2.5). Validations of the proposed methods were performed on both synthetic (Section 2.6) and real data (Section 2.7). In addition, recent studies suggest that an intrinsic relationship might exist between brain activity at rest and during task, which may serve as another source of information for activation detection. We thus propose a RS-informed activation model for integrating RS- connectivity in estimating activation effects [P6] (Section 2.8). This model was also tested on both synthetic (Section 2.9) and real data (Section 2.10). Our other related contributions that are not discussed in this chapter include a new sparse multivariate method that we refer to as “adaptive non-negative sparse PCA” (ANSPCA) [P2] for incorporating functional connectivity into activation detection with local spatial correlations explicitly modeled, and a group RW (GRW) [P4] that adaptively controls the amount of intra- and inter-subject spatial regularization based on similarities in the frequency spectrums of the BOLD signals within and between subjects.  22 2.1 Univariate Activation Detection with GLM Let yj be an n×1 vector corresponding to the intensity time course of voxel j of a subject, where n is the number of time points. The activation effect, βj, and its associating t-value, tj, can be estimated using a GLM:  jjj Xy   , (2.1) )(/. jjj set  , (2.2)  where βj is an m×1 vector, m is the number of experimental conditions, ωj is assumed to be white Gaussian noise after preprocessing (Section 1.2), ./ stands for element-wise division, and se(βj) is an m×1 vector containing the standard error of each element of βj [132]. X is an n×m design matrix with boxcar functions time-locked to task stimuli convolved with the canonical HRF as regressors along the columns [31]. A pictorial depiction of GLM is shown in Figure 2.1.  = + yj X βj ωj …   Figure 2.1  Pictorial depiction of GLM.  To isolate the activated voxels for each subject, the standard approach is to apply a GRF- threshold corresponding to a typical experiment-wise p-value of 0.05 to the t-map [29].    23 For detecting group activation, the standard approach involves warping the β-maps onto a common brain template and applying a t-test to the subjects’ βj at each voxel location to generate a group t-map. GRF thresholding is then applied for isolating the commonly activated voxels. For both intra-subject and group activation detection, spatial smoothing of the BOLD volumes is required for deriving the GRF-based threshold, but in the case of group analysis, spatial smoothing additionally serves the purpose of increasing the functional overlaps across subjects. However, spatial smoothing hinders precise localization of the functional boundaries of the active clusters.  2.2 Univariate Activation Detection with GMM Another approach for isolating the activated voxels is to model the activation statistics maps using GMMs, which alleviates the need for multiple comparisons correction [44]. In this model, the t-values, tj’s, of an activation statistics map (intra-subject or group) are assumed to be generated from a mixture of K Gaussians with mixing coefficients πk, means µk, and variance σk2:    K k kkkj Nt 1 2 ),(~  , (2.3)  where K = 2 if classifying the voxels as either active (k = 1) or non-active (k = 2). The maximum likelihood (ML) solution of πk, µk, and σk2 can be estimated e.g. using expectation maximization [69], but the singularity problem inherent in the ML solution, i.e. µk collapsing onto one of the tj’s with σk2 approaching 0, may result in a tj  being assigned infinite likelihood. One way to deal with this singularity problem is to add priors to the parameters:  ),(~ 2 Nk , ),(~2 baIGk , )(~  Dir , (2.4)  where IG(a,b) and Dir(α) denote inverse Gamma and Dirichlet distributions, respectively. A graphical summary of the resulting model is shown in Figure 2.2.  24  tj zπα σk2µk η τ a b Nv K K   Figure 2.2  Graphical depiction of GMM. Nv is the number of voxels. z is a Nv×K matrix with each element being an indicator variable set to 1 if voxel j is estimated to be in state k, and 0 otherwise.  In addition to mitigating the singularity problem of the ML solution, adding priors allows us to encode our knowledge into the model. Specifically, we know that tj of non-active voxels should theoretically be 0 and the t-threshold for active voxels is typically set between 3 and 4, based on GRF theory [29]. We can encode this prior knowledge on t-values of active and non-active voxels through η with τ2 set to 1 to model uncertainty in η. As for σk2, an uninformative prior is typically used by setting both a and b to 0.5 [38], since little is known about this parameter. Finally, α is set to 1/K assuming equal prior class probabilities.  To estimate the parameters, πk, µk, and σk2, we use Gibbs sampling [69] by drawing samples from the following posterior conditional distributions:        vv N j jKK N j j zzDirzp 11 11 ,,),|(   , (2.5)                   vv v N j kjk k N j kjk N j kjjk k zz tz ztp 1 22 22 1 22 1 22 ,),|(      ,  (2.6)          v vN j N j kjjkjkk tzbzaIGbaztp 1 1 22 )(5.05.0),,,|(  ,  (2.7)  25 ),|(),,,|1( 22 kkjkkkkjjk tNtzp   , (2.8)  where Nv is the number of voxels and z is an Nv×K matrix with the (j,k)th element, zjk, being an indicator variable set to 1 if voxel j is estimated to be in state k, and 0 otherwise. A voxel j is declared as activated if p(zj1 = 1|tj,µk,σk2,πk) > p(zj2 = 1|tj,µk,σk2,πk).  2.3 Modeling Local Spatial Correlations with MRF To model local spatial correlations, one can integrate the GMM into an MRF framework. Specifically, let G = (V,E) be a graph, where V and E are sets of vertices (voxels) and edges, respectively. Also, let jp E be the edge between voxels j and p. The activated voxels can be delineated by minimizing the MRF energy function:     Vj Ejp pjjpjj xxxxE ),()()(  , (2.9)  where xj {1, …, K} is the label assigned to voxel j, K is set to 2 to label voxels as either active or non-active, θj(xj) is a unary potential for modeling the activation statistics, and θjp(xj,xp) is a pairwise potential for encouraging spatial smoothness as controlled by adjusting λ. We define the unary potential as:  )|(1)( jjjj tkxpkx  , (2.10)  where tj is the t-value of voxel j and p(xj = k|tj) is the probability of xj being assigned label k given tj as estimated using GMM (Section 2.2). Thus, if voxel j has a low probability of being activated, we penalize labeling xj as active, and vice versa. k = 1 or 2 corresponding to active or non-active. As for the pairwise potential, we employ the widely-used Pott’s model [133]:  )(1),( pjpjjp xxxx   , (2.11)  26 where δ(y) = 1 if y = 0 and 0 otherwise. Edges are added between every voxel j and its 6- connected spatial neighbors to encourage adjacent voxels to have similar labels. λ is set to 1 to weight the unary and pairwise potentials equally. We note that this MRF approach is applicable for both intra-subject and group analysis, since the algorithm is indifferent to the type of input t-maps. The globally optimal label configuration can be determined using graph cuts [134] with the activated voxels delineated based on the voxel labels.  2.4 Modeling Functional Connectivity with RW In addition to modeling local spatial correlations, it may also be beneficial to model longer range interactions. We thus propose using RW to integrate voxel correlations (both local and long range) and activation statistics in estimating probabilistic activation maps [P1]. Under the original RW framework [135], each voxel corresponds to a graph vertex with edges between voxels weighted by their correlations to bias the paths for which a random walker may transverse. Voxels are labeled (i.e. active or non-active) based on the probability that a random walker starting at each voxel location will first reach a pre-labeled seed. This framework, however, not only requires specifying seed voxels but also does not model activation statistics. Therefore, we adopt an augmented RW formulation [136] that facilitates incorporation of activation statistics through label priors. This formulation is equivalent to adding an artificial seed vertex for each label and connecting these seeds to every vertex in the original graph with label priors being the edge weights [136]. Hence, this augmented RW formulation is analogous to MRF except the optimization is performed over real-valued probabilities instead of binary labels. Also, this augmented RW formulation has an exact, unique closed-form solution for computing the posterior label probabilities with global optimality guaranteed for an arbitrary number of labels [136]. The posterior probabilities of the voxels belonging to label class s, xs, can be estimated by solving:  ssK k k xL      1 , (2.12)   27 where xs is an d×1 vector, d is the number of voxels, K is the number of labels set to 2, and s = 1 or 2 corresponding to active or non-active, respectively. Λs is a diagonal matrix containing the d×1 label prior vector, λs , which we set as the probabilities of the voxels belonging to class s given their associating t-values as estimated using GMM (Section 2.2). L is an d×d weighted graph Laplacian matrix:         otherwise,0 , , i ij ij ij jiw Eijw L , (2.13)  where wij is the correlation between the intensity time courses of voxels i and j, ij denotes a graph edge between voxels i and j, and E denote the sets of edges. Graph edges are added between every pair of voxels with correlation greater than a conventional threshold of 0.4 [137]. This threshold can be empirically derived based on the assumption that time courses of non-active voxels after preprocessing are distributed as white Gaussian noise. Specifically, the correlation values between the non-active voxels likely lie between -0.4 and 0.4, as shown in Figure 2.3, where a histogram of pairwise correlations between 1,000 random Gaussian noise time courses is plotted. As evident from Figure 2.3, setting a threshold of 0.4 will remove majority of the correlations due to white noise.  -0.4 -0.2 0 0.2 0.4 0 0.2 0.4 0.6 0.8 1 Correlation Values Re la tiv e F re qu en cy   Figure 2.3  Histogram of correlation values between random Gaussian noise time courses.  28 To explicitly encourage spatial smoothness, which may be underrepresented in the estimated voxel correlation values due to noise, we add a small constant of 0.001 to the wij between every voxel and its 6-connected spatial neighbors. We have tested on large synthetic datasets (Section 2.6) that adding a constant two orders of magnitude lower than the signal correlation values promotes spatial continuity while permitting functional connectivity to remain dominant over spatial smoothness in driving the activation detection process. Upon solving (2.12) for xs, identification of the activated voxels is achieved by comparing x1 against x2, i.e. if element j of x1 is greater than that of x2, then voxel j is declared as active. We note RW naturally permits intra-subject analysis, but not group analysis since delineating a group activation statistics map is complicated by the need for a group L. One possible way of generating a group L is to average the subjects’ correlation matrices and then apply (2.13).  2.5 Modeling Group Information with GMRF The methods described thus far in this chapter, i.e. GLM, GMM, MRF, and RW, all operate on a single activation statistics map at a time in detecting the activated voxels. In the case of intra-subject analysis, the derived activation statistics maps might be very noisy even with functional connectivity integrated due to the low SNR inherent in BOLD signals. In the case of group analysis, averaging over subjects should reduce noise, but the considerable amount of functional inter-subject variability [138] renders the one-to-one voxel correspondence assumption questionable. Therefore, we propose a group MRF that does not require a one-to- one voxel correspondence to integrate group information into intra-subject activation detection [P3, P5]. Modeling each subject’s activation statistics map as an MRF, we extend the neighborhood system to other subjects so that information can be coalesced across subjects. The intuition behind this approach is that, although the existence of a one-to-one functional correspondence between voxels is very unlikely, true brain activation should presumably appear in similar proximal locations across subjects [138], whereas false positives are more likely to be randomly scattered across the brain with little cross-subject consistency. Thus, regularizing inter-subject neighbors reinforces brain areas that are consistently recruited across subjects, while suppressing false positives. Hence, even if intra-  29 subject neighbors are deemed unreliable due to noise, additional neighbors from other subjects are available to better approximate the state of each voxel. Since only voxels that are spatially proximal to each other are hypothesized to be in similar state, our approach alleviates the need for finding an exact one-to-one voxel correspondence across subjects. Instead, a relaxed condition of requiring only the brain structures of the subjects to be approximately aligned is in place. 2.5.1 GMRF Formulation Let Gi = (Vi,Ei) be a graph where Ei = εi U ε~i. Vi and εi are the sets of graph vertices (voxels) and edges of subject i, respectively, and ε~i is the set of edges between subject i and all other subjects. Also, let jpεi denotes the intra-subject edge between voxels j and p and jqε~i be the inter-subject edge between voxels j and q. Our proposed group-extended graph structure for a single subject i is shown in Figure 2.4.  subject i voxel j subject Ns subject 1 voxel q voxel p   Figure 2.4  Proposed GMRF graph structure. Each subject’s activation statistics map is modeled as an MRF with its neighborhood system extended to other subjects. Ns is the number of subjects in the group.  If we connect the voxels of every subject in an analogous manner, yielding a “GMRF”, all subjects’ activation statistics maps can be simultaneously and collaboratively delineated by minimizing a single MRF energy function:   30        ii i s jq qjjq Vj jp pjjpjj N i xxxxxxE ~ ),(),()()( 1   , (2.14)  where xj{1, …, K} is the label assigned to voxel j, K is set to 2 to label voxels as active or non-active, Ns is the number of subjects, θj(xj) denotes a unary potential, and θjp(xj, xp) and θjq(xj, xq) are intra- and inter-subject pairwise potentials. λ is used to adjust the contribution of the pairwise potentials while γ balances the contribution from the intra- and inter-subject pairwise potentials. We define the unary potential exactly the same as (2.10). As for the intra- subject and inter-subject pairwise potentials, we employ the Pott’s model [133]:  ),(1),( mjmjjm xxxx    (2.15)  where δ(y) = 1 if y = 0 and 0 otherwise, and m = p or q. Edges are added such that they link every voxel of subject i to its 6-connected intra-subject spatial neighbors as well as its c closest inter-subject spatial neighbors for every subject pair (i,h), i≠h. To define inter-subject neighbors, brain structures of all subjects are first aligned (Section 2.5.2). The resulting MRF energy (2.14) can be minimized using graph cuts [134] to find the globally optimal labeling for simultaneous delineation of all subjects’ activation statistics maps. 2.5.2 Implementation Details To properly define inter-subject spatial neighbors, we need to first align the subjects’ brain structures. The conventional strategy is to perform whole-brain spatial normalization, but the large anatomical inter-subject variability renders accurate whole-brain registration difficult [61], especially for diseased subjects. Another approach is to extract anatomical ROIs and perform alignment at the region level for which highly accurate techniques are available [139]. This alignment approach has several advantages over whole-brain registration. First, no brain structures will be mistakenly taken as part of another structure. Second, the improvement in alignment has shown to enhance activation localization [140]. We have also shown that this region level alignment approach better preserves the spatial information in  31 the regional activation statistics patterns than whole-brain spatial normalization, as discussed in Section 3.2. Moreover, computational cost is substantially reduced.  From an implementation perspective, GMRF essentially creates a “mega-subject” by combining the voxels of all subjects into a single set with edges added between intra- and inter-subject spatial neighbors. We set λ to 1/Ns to scale the cumulative pairwise potential over subjects to a level similar to that of a single subject, which ensures that the pairwise potentials will not dominate over the unary potential. The choice of γ depends on the degree of inter-subject variability, which is unknown. We thus set γ to 0.5 to weight subject i’s own information more than that of other subjects. As for the parameter c, we set it to 3 in agreement with how a one-to-one voxel correspondence unlikely exists between subjects.  2.6 Synthetic Task Data Experiments 2.6.1 Data Generation We generated 200 synthetic datasets, each consisting of ten subjects. The activation pattern comprised three clusters (Figure 2.5). The signal intensity of the active voxels (marked with circles in Figure 2.5) was set to decrease exponentially as a function of distance from the respective activation centroid of each cluster. The smaller cluster was added to test whether the compared methods can detect spatially isolated active voxels that are consistently present across subjects. The synthetic time courses of the active voxels were generated by convolving a box-car function, having the same stimulus timing as in our real data experiment (Section 2.7.1), with a canonical HRF and adding Gaussian noise. We also added low frequency temporally-correlated noise to emulate confounds, such as scanner drifts, aliased artifacts from cardiac and respiratory cycles, and mind wandering. Furthermore, we randomly varied the locations of the two larger clusters across subjects to introduce inter- subject variability in location of the active regions. The maximum cluster misalignment was set to two voxels (~30% of the cluster width) in both the vertical and horizontal directions. Two SNR levels were tested: 0.2 and 0.5, which are typical bounds for fMRI studies [21, 22]. We defined SNR as the standard deviation of the signal over that of noise as in [141]. We  32 note that only voxels near the cluster centers attained SNRs close to the maximum value of 0.2 or 0.5. SNR for voxels near the cluster borders are well below the maximum SNR values. 100 synthetic datasets were generated at each maximum SNR level. 2.6.2 Results and Discussion We applied GLM, GMM, MRF, RW, and GMRF to the synthetic datasets to compare the effects of different priors on intra-subject activation detection. The qualitative results of an exemplar synthetic dataset (three arbitrary subjects included) at an SNR of 0.35 (i.e. midpoint between the typical SNR bounds of 0.2 and 0.5 [21, 22]) are shown in Figure 2.5. GLM missed many of the active voxels, whereas GMM detected more active voxels but also included many isolated false positives. Enforcing intra-subject spatial continuity using MRF reduced the amount of false positives, but failed to detect some of the active voxels near the cluster borders. The smallest cluster was also neglected due to lack of intra-subject active neighbors. Incorporating functional connectivity using RW led to more detections near the cluster borders than MRF since RW adapts the amount of spatial regularization based on the level of correlations between the voxels, i.e. active and non-active voxels are presumably uncorrelated hence less encouraged to be assigned the same class label. Employing RW also resulted in fewer isolated false positives than GMM, though an ample amount remained due to noise-induced false correlations between the active and non-active voxels, which were mistakenly taken as long-range interactions. Integrating group information using GMRF correctly detected most of the active voxels with much stricter control over false positives than GMM and RW. Also, compared to MRF, GMRF detected the smallest cluster and more active voxels at the cluster borders, thus demonstrating the benefit of adding inter-subject neighbors to better disambiguate the state of the voxels.  To quantify and compare the various methods’ performances, we computed the average true positive rate (TPR) and average false positive rate (FPR) over the 100 synthetic datasets for each SNR level. We also computed the average Dice similarity coefficient (DSC) [142] to provide a single, overall measure of performance:   33 FNFPTP TPDSC  2 2 , (2.16)  where TP, FP, and FN denote the number of true positives, false positives, and false negatives, respectively. The quantitative results are summarized in Figure 2.6. At an SNR of 0.2, GLM achieved the lowest FPR at the expense of attaining the worst TPR. GMM obtained ~30 times higher TPR than GLM, but a horrific FPR of 0.31. Incorporating functional connectivity into GMM using RW further increased TPR and mildly decreased FPR. Encouraging only spatial continuity using MRF reduced FPR by ~3 times compared to RW, but resulted in ~2 times lower TPR. Our results thus show that modeling functional connectivity using RW increases detection sensitivity but has less control over false positives compared to MRF due to noise-induced correlations. In contrast, integrating group information using GMRF decreased FPR by an order of magnitude compared to GMM, MRF, and RW, while maintaining a competitive TPR that was only ~1.2 times lower than the highest TPR as obtained with RW, hence resulting in the best overall DSC among the compared methods. These results suggest that relying on the information within a single dataset of a subject might be insufficient to obtain satisfactory intra-subject activation detection at low SNR, even with spatial regularization or incorporation of a functional connectivity prior due to the lack of reliable intra-subject neighbors and noisy signal correlation estimates. At an SNR of 0.5, adequate signal appeared to be present for MRF and RW to be more effective, resulting in ~20 times and ~4 times decrease in FPR, respectively. Nevertheless, even at higher SNR where more reliable information was available within each subject, the addition of group information using GMRF still improved results over the compared methods with TPR increased to a level similar to that of RW and a FPR close to 0, hence achieving the highest DSC overall.   34   -2 0 2 4 6  -2 0 2 4 6 -2 0 2 4 6 (a) GLM  -3 -2 -1 0 1 2 3 4 5  -3 -2 -1 0 1 2 3 4 5 -3 -2 -1 0 1 2 3 4 5 (b) GMM  -3 -2 -1 0 1 2 3 4 5  -3 -2 -1 0 1 2 3 4 5 -3 -2 -1 0 1 2 3 4 5 (c) MRF  -3 -2 -1 0 1 2 3 4 5  -3 -2 -1 0 1 2 3 4 5 -3 -2 -1 0 1 2 3 4 5 (d) RW  -3 -2 -1 0 1 2 3 4 5  -3 -2 -1 0 1 2 3 4 5 -3 -2 -1 0 1 2 3 4 5 (e) GMRF  Figure 2.5  Synthetic data results. Results at an SNR of 0.35 for three exemplar subjects of a synthetic data set shown. Circle = ground truth activated voxels. Dots = detected activated voxels. Note: Data were spatially smoothed only for the case of GLM as required for deriving a GRF threshold.  35  0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 Tr ue  Po si tiv e R at e GLM GMM MRF RW GMRF 0.20 0.50 SNR  (a) TPR 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 1 2 Fa ls e P os iti ve  Ra te GLM GMM MRF RW GMRF 0.20 0.50 SNR  (b) FPR 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 Di ce  Sim ila rit y C oe ff ic ie nt GLM GMM MRF RW GMRF 0.20 0.50 SNR  (c) DSC  Figure 2.6  TPR, FPR, and DSC at two typical SNR levels. GMRF obtained the highest DSC for both SNR levels, demonstrating a better balance between TPR and FPR than the other compared methods. Note: GLM achieved FPRs of approximately 0 at the expense of lower TPRs.  In addition to the presented test case, we also tested the methods above under a couple of other scenarios. First, we tested the case where location of the clusters were consistent across subjects but the placement of the activation centroids were varied and permitted to be anywhere within the clusters. This scenario emulates the situation where the active regions completely overlap between subjects but the location at which the BOLD signal concentrates varies. Hence, it will appear that there is little overlap between the active regions of the subjects. Results were similar to those described above. Details of which can be found in [P3]. Second, we tested the case where artificial activations are injected into real ROIs. The general trends of the results were again similar to those above. Details can be found in [P4, P5]. We note that some of the activation detection results presented in this thesis slightly differ from those in our original publications due to a couple of changes made. First, we now define SNR as the standard deviation of the signal over that of noise [141], as opposed to squared mean of the signal over variance of the noise. Second, in detecting activation with GLM, spatial smoothing is now applied to the voxel time courses instead of on the activation statistics maps. In [143], it was mentioned that spatially smoothing the activation statistics maps provides similar results as spatially smoothing the voxel time courses. However, in our synthetic experiments, we obtained higher sensitivity with the latter approach, and have thus adopted this latter approach in applying GLM to our synthetic and real data (Section 2.7).  36 2.7 Real Task Data Analysis 2.7.1 Materials Functional MRI data were provided by Dr. Martin McKeown of the Pacific Parkinson Research Center. 10 PD patients off and on medication (4 men, 6 women, mean age 66 ± 8 years) and 10 age-matched healthy controls (3 men, 7 women, mean age 57.4  14 years) were recruited. Each subject used their right hand to squeeze a bulb with sufficient pressure to maintain a bar shown on a screen within an undulating pathway. The pathway remained straight during baseline periods and became sinusoidal at a frequency of 0.25 Hz (slow), 0.5 Hz (medium) or 0.75 Hz (fast) during the time of stimulus. Each session lasted 260 s, alternating between baseline and stimulus of 20 s duration.  Functional MRI was performed on a Philips Gyroscan Intera 3.0 T scanner (Philips, Best, Netherlands) equipped with a head-coil. T2*-weighted images with BOLD contrast were acquired using an echo-planar (EPI) sequence with an echo time of 3.7 ms, a repetition time of 1985 ms, a flip angle of 90°, an in plane resolution of 128×128 pixels, and a pixel size of 1.9×1.9 mm. Each volume consisted of 36 axial slices of 3 mm thickness with a 1 mm gap. A 3D T1-weighted image consisting of 170 axial slices was further acquired to facilitate anatomical localization of activation.  Slice timing and motion correction were performed using Brain Voyager (Brain Innovation B.V.). Further motion correction was performed using motion corrected independent component analysis (MCICA) [144]. The voxel time courses were high-pass filtered to account for temporal drifts and temporally whitened using an AR(1) model [28]. No whole- brain spatial normalization or spatial smoothing was performed. Anatomical delineation of the ROIs, including the primary motor cortex (M1), supplementary motor cortex (SMA), prefrontal cortex, caudate, putamen, globus pallidus, thalamus, cerebellum, and anterior cingulate cortex, was performed by an expert based on anatomical landmarks and guided by a neurological atlas. The segmented ROIs were resliced at the fMRI resolution for extracting the preprocessed voxel time courses within each ROI.  37 2.7.2 Results and Discussion For testing the various activation detection methods, we selected the left M1 as the region of interest, since this region is known to activate during right-hand motor movements (Figure 2.7). For applying GMRF, the left M1 of all subjects were non-rigidly aligned using a point set registration algorithm called, “coherent point drift” [139], which treats the point sets as Gaussian mixtures and registers the point sets by aligning the mixture distributions. Enhanced robustness to noise and outliers has been shown with this algorithm over standard techniques, such as iterative close point [139]. We note that since the subjects were required to perform motor squeezing during both baseline and task periods, the activation differences in the left M1 between these two conditions are expected to be subtle.    Figure 2.7  Homunculus. Hand area of M1 is circled in Homunculus diagram. Approximate corresponding area in the aligned left M1 is indicated. Homunculus courtesy of  Results obtained with GLM, GMM, MRF, RW, and GMRF on the healthy controls’ data are shown in Figure 2.8. GLM missed many of the voxels in the hand area (Figure 2.7). GMM detected more voxels in the hand area, but also included many isolated false positives. MRF reduced the amount of isolated false positives, but mistakenly declared adjacent non-hand areas as active, and RW led to similar results. In contrast, GMRF correctly detected the hand area in all subjects. Our results thus demonstrate the benefits of incorporating group information, especially for tasks where we expect weak brain activations as in our study.   38 ‐6 6  (a) non-smoothed t-maps  (b) GLM  (c) GMM  (d) MRF  (e) RW  (f) GMRF  Figure 2.8  Real data results for control subjects. 5 arbitrary exemplar subjects shown.  Results on the PD data are shown in Figure 2.9. The top and bottom rows correspond to PD patients off and on medication, respectively. GLM detected part of the hand area in most subjects, but also falsely declared islands of activation outside the hand area. GMM declared most voxels along the left M1 as active, which likely included many false positives. MRF detected spatially-contiguous clusters of voxels in the hand area, but also declared the nearby hip and leg areas, which should not be active unless the subjects’ hip and leg moved in sync with the stimulus. Using RW reduced the amount of false positives compared to GMM, but ample false positives remained, which were likely due to correlations arising from factors, such as physiological confounds, MR reconstruction, and head motion, mistakenly taken as voxel interactions. In contrast, GMRF consistently detected the hand area in all subjects without falsely declaring the hip and leg areas. In addition, the GMRF results suggest a very interesting trend between PD on and off medication. Specifically, PD off medication seemed to require recruiting a wider area of the left M1, which normalized back to an extent similar to the controls upon medication. Such spatial focusing effect of L-dopa medication has also been observed in past cognitive studies [145], thus confirming the validity of our results.  39 ‐6 6  ‐6 6  (a) Non-smoothed t-maps   (b) GLM   (c) GMM   (d) MRF   (e) RW   (f) GMRF  Figure 2.9  Real data results for PD subjects. 5 arbitrary exemplar subjects shown. Top and bottom rows of each sub-figure correspond to PD off and on medication, respectively.  40 2.8 Resting State-informed Activation Detection The discovery of signal structure in ongoing brain activity in the absence of external stimulus has ignited enormous research interest [20, 88, 90, 92, 97]. Beginning with the seminal work by Biswal et al. [86], where coherent spontaneous fluctuations in neural activity were observed between brain regions in the somatomotor system, a huge body of work studying these spontaneous brain activities has followed [146]. For instance, a pioneer study by Raichle et al. investigating the baseline state of the human brain demonstrated a high degree of functional connectivity between the precuneus, posterior cingulate cortex, medial prefrontal cortex, and medial, lateral and inferior parietal cortex during rest and deactivation of these same regions during goal-directed tasks [147]. This set of brain regions has been coined the “default mode network” [147], which is one of the most widely-studied RS networks [148]. Others have found high signal correlations between the dorsolateral prefrontal cortex, inferior parietal cortex, and SMA at rest [149]. These regions are known to be associated with increased alertness during task performance, and have been termed the “task-positive network” [149]. There are many other RS networks that exhibit high resemblance to those engaged during various tasks [90], and all these findings are suggesting potential relationships between brain activity during task and rest, which may serve as an additional source of information for detecting brain activation. In particular, task-based fMRI data typically display low SNR, especially in patients due to difficulties in performing certain tasks [20]. Since acquiring RS data requires minimal task demands and RS data are less susceptible to behavioural confounds [20], extracting connectivity priors from RS data to inform activation detection may enhance detection sensitivity, which is particularly beneficial for studying diseased populations. We thus proposed to integrate RS-connectivity and task-evoked responses under a unified generative model [P6].  41 2.8.1 RS-informed Activation Model Let Y be an d×n matrix containing the intensity time courses of d voxels or d brain regions of a subject. Our proposed model can be summarized as follows:         AXYVAXYtrVVAXNY Tn 1111 2 1exp 2 1),(~ 2 , (2.17)        KMAVMAtrV K KVMMNA T m d 1 2 2 1 2 2 1exp 2 ),,(~ 2 2   , (2.18)  where X is an m×n matrix with m regressors along the rows for modeling stimulus-invoked response and confounds [31]. A is the d×m activation effect matrix to be estimated (Section 2.8.2). V1 is the d×d covariance matrix of Y, V2 is an d×d covariance matrix modeling the correlations between the activation effects of the d brain regions, and K is an m×m covariance matrix modeling the correlations between the experimental conditions. MN(M,V2,K) denotes the matrix normal distribution, which serves as the conjugate prior of (2.17) [150] with α controlling the degree of influence of this prior on A (Section 2.8.4). To ensure that our estimate of A is invariant to affine transformations on Y and X, we set M to 0d×m and K to XXT [150]. Setting M to 0d×m also ensures that the activation effect estimates will not be biased towards non-zero values, which could induce false detections. V1 and V2 are assumed to be known. Compared to the model in [150] where V1 and V2 are assumed to be equal, we show that exact closed-form solutions for the posterior estimate of A and the model evidence can be derived even for the more general case with V1 and V2 being distinct. Permitting distinct V1 and V2 accounts for how Y and A might have different correlation structures. Also, we hypothesize that brain regions that are functionally correlated at rest are more likely to display similar levels of brain activity during task performance. We thus set V2 to the covariance estimates learned from RS data (Section 2.8.3). We assume V1 = Id×d as conventionally done for analytical simplicity, and defer learning V1 from data for future work.  42 2.8.2 Posterior Activation Effects Estimation To estimate A, we first derive the joint distribution of Y and A by taking the product of (2.17) and (2.18) and isolating the terms involving A from those involving Y:                       TTTTT TTT YYVAYXVVVAAXXVVtr AXVAXAXYVAXYtrVVXAYP 1 1 1 1 11 2 1 1 1 2 1 1 1 2 1 121 2 2 1exp 2 1exp,,|,   . (2.19)  Since the terms involving A take a quadratic form, the posterior distribution of A is again a matrix normal distribution, as expected from the conjugacy of (2.17) and (2.18). By completing the square, the maximum a posteriori mean of A can be derived:      11111211   TTA XXYXVVVM  . (2.20)  Computing MA requires inverting V1 and V2, which is unstable if V1 and V2 are set as sample covariance estimated from data with more brain regions than time points. In this work, we employ and compare two state-of-the-art techniques, namely graphical LASSO (GL) [106] and oracle approximating shrinkage (OAS) [151], for obtaining a well-conditioned V2 with V1 assumed to be Id×d. 2.8.3 Functional Connectivity Estimation Graphical LASSO. Given a sample covariance matrix, S, computed from data that are assumed to follow a centered multivariate Gaussian distribution, one can estimate a well- conditioned sparse inverse covariance matrix, ̂ , by minimizing the negative log-likelihood of the data distribution over the space of positive definite (p.d.) matrices while imposing an l1 penalty on ̂ [106]:    1 0 )det(logminargˆ   Str , (2.21)  43  where || · ||1 is the element-wise l1 norm and λ controls the level of sparsity. Having a sparse inverse covariance matrix simplifies interpretation, since ij̂ = 0 implies variables i and j are conditionally independent given all other variables. In the context of fMRI, ij̂ = 0 implies brain regions i and j are not connected. To optimize (2.21), we employ the Two-Metric Projection method for its efficiency [152].  Oracle Approximating Shrinkage. Assume the data for estimating the ground truth covariance, Σ, is generated from a multivariate Gaussian distribution. The most well- conditioned covariance estimate of Σ is F = tr(S)/d·Id×d [151]. The idea of OAS is to shrink the ill-conditioned sample covariance, S, towards F so that a well-conditioned covariance estimate, ̂ , can be obtained. Specifically, OAS optimizes the following cost function [151]:    FStsE F       1ˆ..,ˆminargˆ 2 , (2.22)  where ρ controls the amount of shrinkage with the optimal value given by [151]:                   d StrStrn StrStr d d 2 22 222 1 1̂ . (2.23)  Thus, no parameter selection is required and the inverse covariance matrix can be obtained by inverting ̂ with stable inversion warranted. 2.8.4 Hyper-parameters Estimation A critical hyper-parameter in our model is α, which controls the degree of influence of the connectivity prior on the activation effect estimates. To set α, we first derive the model evidence by integrating P(Y,A|α,V1,V2) over A:  44                  11 2 1 1 1 1 1 1 1 2 1 1 11 2 1 1 21 ...2 1exp 2 ),,|( 22 2 VVVXYXXYX YYVtr VV VV VVYP TTT T mn m     . (2.24)  Since 12 V estimated from GL or OAS is p.d., 12 V can be decomposed into QΓQT where Q are the eigenvectors of 12 V and Γ contains the eigenvalues of 12V along the diagonal. By exploiting this property of p.d. matrix and that 11 V is assumed to be I = QIQT, the log of (2.24) can be simplified into the following expression:      CB m mVVYP d i i ii ii       121 1 1ln1ln 2 ),,|(ln  , (2.25)   XYQXXYXQB TTT 1 , (2.26)  where γi is the ith eigenvalue of 12V , and terms that do not depend on α, V1, or V2 are grouped into C. Since (2.25) is a single variable function of α, the optimal α at which (2.25) is maximized can be efficiently determined using any generic optimization routines.  To optimize the choice of λ for the case where 12V is estimated using GL, we define a grid of λ values at which the percentage of non-zero elements in 12V roughly ranges from 10% to 90%. For each 12 V  associated with a given λ, we determine the optimal α at which (2.25) is maximized and compute the log model evidence. The λ associated with the largest log model evidence is then taken as the optimal λ.   45 2.9 Synthetic RS/Task Paired Data Experiments 2.9.1 Data Generation We generated 200 synthetic datasets based on our model described in Section 2.8.1. Each dataset consisted of 10 subjects. Each subject’s data comprised 100 regions with the first 20 regions set to be mutually correlated and activated across all subjects. The next 20 regions were set to be mutually correlated but not activated to test whether our model will falsely declare correlated regions as activated. The remaining regions were not correlated nor activated. To simulate the described scenario, we first generated 100 signal time courses, where the first 20 time courses were sine functions with Gaussian noise added. The next 20 time courses were cosine functions with Gaussian noise added. The remaining 60 time courses were simply Gaussian noise. We then computed the empirical covariance of these signal time courses, which was taken as the group covariance matrix of the activation effects and the resting state networks, Ωg. A random p.d. matrix was added to Ωg to introduce inter- subject variability into the individual covariance matrix of each subject, Ωi. The degree of variability was controlled by restricting the Kullback-Leibler divergence of N(0,Ωg) and N(0,Ωi) to be ~1.5. For each subject i, 100 RS time courses of length Nt were drawn from a multivariate Gaussian with zero mean and covariance Ωi. We set Nt to 25 to emulate the typical situation where the number of regions exceeds the number of time points. To generate task data, samples of A were independently drawn from (2.18) for each subject i with V2 set to Ωi. To simulate activation, the means of the first 20 regions in (2.18) were set to a small positive number, δ, that depended on SNR, δ2/σ2, at which we were testing. Random Gaussian noise was added to δ to further introduce inter-subject variability. The resulting A were then used to draw samples of Y from (2.17) with V1 set to σ2I and X being boxcar functions with 25 time points convolved with the canonical hemodynamic response function [24]. Two typical SNR levels were tested: 0.2 and 0.5 [21, 22], with 100 synthetic datasets generated at each SNR level. 2.9.2 Results and Discussion For validation, we compared the sensitivity of our model with connectivity prior estimated from OAS and GL against the ridge regression model, i.e. V2 set to I, and the standard  46 univariate model [31] in detecting group activation. We denote these models as OAS-CM, GL-CM, R-UM, and S-UM, respectively. Since the ground truth activated brain regions are unknown for real data, we employed the max-t permutation test [40] to enforce strict control on FPR so that we can safely base our validation on the number of regions detected with significant activation differences. We note that for each permutation, the entire activation effect map of each subject was multiplied by either 1 or -1 chosen at random. Hence, the spatial covariance structure of the activation effect maps was preserved. Also, we emphasize that our validation criterion is independent of the criterion used for optimizing the model parameters. This independence mitigates bias from being introduced during parameter estimation.  The mean receiver operating characteristics curves averaged over the synthetic datasets for each model are shown in Figure 2.10. At all FPR and both SNR levels, OAS-CM and GL- CM achieved higher TPR than R-UM and S-UM, thus confirming that given the activation effects are inherently correlated, our model can exploit this information to improve detection. We note that there is less need for imposing a prior (which introduces a bias) at higher SNR since more signals are available to estimate A, hence the decrease in sensitivity for OAS-CM and GL-CM.  0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 False Positive Rate Tr ue  P os iti ve  R at e OAS‐CM GL‐CM R‐UM S‐UM  (a) SNR = 0.20 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 False Positive Rate Tr ue  P os iti ve  R at e OAS‐CM GL‐CM R‐UM S‐UM  (b) SNR = 0.50  Figure 2.10  Synthetic data results. OAS-CM (red) and GL-CM (blue) achieved higher TPR for all FPR and both SNR levels than R-UM (green) and S-UM (black).   47 2.10 Real RS/Task Paired Data Analysis 2.10.1 Materials Functional MRI data were provided by Dr. Jean Baptiste Poline of INRIA. 65 healthy subjects were recruited and scanned at multiple imaging centers. Each subject performed 10 language, computation, and sensorimotor tasks over a period of ~5 min (140 brain volumes) similar to those in [153]. RS data of ~7 min duration (187 brain volumes) were also collected. 3T scanners from multiple manufacturers were used for acquiring the data with TR = 2200 ms, TE = 30 ms, and flip angle = 75o. Standard preprocessing, including slice timing correction, motion correction, temporal detrending, and spatial normalization, were performed on the task-based data using the SPM8 software. Similar preprocessing was performed on the RS data except a band-pass filter with cutoff frequencies at 0.01 to 0.1 Hz was applied to isolate the signal of interest [88]. Signals from white-matter and cerebrospinal fluid voxels were regressed out from the gray-matter voxels.  To ensure stable sparse inverse covariance estimation using GL [106], we reduced the dimension of the data by grouping the voxels into 1000 parcels. Specifically, we concatenated the RS voxel time courses across subjects and applied the functional parcellation technique in [130] to generate a group parcellation map. Each subject’s brain images (in normalized space) were then parcellated using the group parcel labels. The mean voxel time courses of each parcel from rest and task were taken as the input to our model. To account for scanner variability across imaging centers, we normalized the parcel time courses by subtracting the mean and dividing by the standard deviation. 2.10.2 Results and Discussion To test the generality of our model, we examined 21 contrasts between the 10 experimental conditions. Contrasts included computation vs. sentence processing task, auditory vs. visual task among others. Shown in Figure 2.11(a) is the percentage of parcels detected with significant activation differences averaged over contrasts for p-value thresholds ranging from 0 to 0.5. Incorporating a connectivity prior using OAS-CM and GL-CM outperformed R-UM and S-UM even under the simplifying yet common assumption of V1 = Id×d. We note that  48 adding a shrinkage prior to control overfitting, as in the case of R-UM, only improved detection mildly. Thus, our results suggest an intrinsic relationship between the correlation structure of activation effects and RS-connectivity, and this relationship is consistent across subjects, hence the improved group activation detection. Qualitatively, incorporating a RS- connectivity prior led to more detections of bilateral activation in brain regions implicated for the contrasts examined, examples of which are shown in Figure 2.11(b) and (c).  0 0.1 0.2 0.3 0.4 0.5 6 8 10 12 14 16 p-value Thresholds %  o f P ar ce ls OAS‐CM GL‐CM R‐UM S‐UM  (a) (b) p-value threshold = 0.01, corrected (c) p-value threshold = 0.01, corrected  Figure 2.11  Real data results. (a) Percentage of parcels with significant activation differences averaged across contrasts vs. p-value thresholds. (b) Parcels detected by contrasting computation against sentence processing task, and (c) auditory against visual task. Red = detected by only OAS-CM. Purple = detected by both OAS-CM and GL-CM. Blue = detected by all methods.   49 2.11 Summary In this chapter, we contrasted the effects of different priors on activation detection. The strengths and weaknesses of the current state-of-the-art are summarized in Table 2.1.  Table 2.1  Summary of activation detection methods. GA = group analysis. MCC = multiple comparisons correction. SS = spatial smoothing. VI = voxel interactions. Priors Existing Methods Strengths Weaknesses Local VI Long-range VI SS MCC Assumes one-to-one voxel correspondence for GA Univariate GLM with GRF threshold [31]   X X X MM [43-45]     X Spatial MRF [48-50] X    X Bayesian [46, 47, 55-59] X   X X Wavelet [51, 53, 54] X   X X  As evident from Table 2.1, a common missing attribute in all of the existing methods is that they do not model long-range voxel interactions, despite distant voxels also interact with each other. We thus proposed integrating a connectivity prior to model both local and long- range voxel interactions. We showed that incorporating a connectivity prior increases detection sensitivity in intra-subject activation over using univariate and spatial priors, but is confounded by noise-induced correlations. The reason was likely due to limited discernible signals within a single dataset of a subject. As an additional source of information, we proposed incorporating a group prior using a novel graph-based approach that enables information coalescing between subjects without the need for a one-to-one voxel correspondence. We showed that integrating a group prior with our proposed graph-based approach improves detection sensitivity over using univariate and spatial priors, and provides stricter control over false positives than using a connectivity prior estimated from the same task-based datasets employed for activation statistics estimation. In addition, as an alternative source of information, we showed that extracting a connectivity prior from RS data provided enhanced sensitivity in group activation detection over the standard univariate approach.  50 3     Spatial Characterization of ROI Activation A major limitation of the standard approach for fMRI group analysis is the need for drawing a one-to-one voxel correspondence across subjects. Due to the large inter-subject variability, spatial normalization tends to introduce errors into the analysis. An approach for mitigating mis-registration errors is to skip spatial normalization in all and perform group inference at the ROI level by comparing certain characteristics derived from regional activation statistics across subjects. Conventional features include thresholded mean activation statistics and the percentage of activated voxels [32, 33], which are solely based on the amplitude of the activation statistics without accounting for how these statistics are spatially distributed. As evident from recent brain decoding studies [35, 80], the spatial patterns of brain activity also encode important cognitive information. We thus proposed, during my Master studies, to use invariant spatial features to capture this spatial information [81] (Section 3.1). We also explored applying these spatial features directly to the BOLD signals to infer brain activation based on the spatial modulations of regional BOLD distributions [82]. In this thesis, we further validated our spatial characterization approach [81] on extended real data [P7] and investigated the benefits of adopting an ROI approach for spatial analysis by applying our proposed invariant spatial features to regional activation statistics with and without spatial normalization [P8] (Section 3.2). We also compared these spatial features against the traditional amplitude-based features in studying the effects of L-dopa on PD [P9] (Section 3.3). Furthermore, we applied our spatiotemporal characterization approach [82] on a large cohort of 65 subjects undergoing various experimental conditions to demonstrate its generality [P11] (Section 3.4). Our other related works not described in this chapter include investigation into the compensatory mechanisms in brain activations of PD patients using invariant spatial features [P10], comparisons between the temporal characteristics of spatial and BOLD responses [P12], proposal of a new spatial feature for measuring spatial skewness in BOLD signal patterns [P13], and inferring functional connectivity based on spatial modulations of regional BOLD distributions [P14].  51 3.1 3D Moment-based Invariant Spatial Features Our proposed invariant spatial features [81] are derived based on 3D moments:           dxdydzzyxzyxm rqppqr ),,( , (3.1)  where n = p+q+r is the order of the 3D moment, (x,y,z) are the coordinates of a voxel, and ρ(x,y,z) is the normalized t-value of  a voxel located at (x,y,z) within an ROI, as estimated using GLM. We restrict our analysis to only positive t-values due to the presently unclear interpretations of negative t-values [154]. The regional t-maps are normalized such that the t- values sum to one to decouple the effects of any overall amplitude changes.  Naively applying 3D moments to characterize the spatial distribution of the regional activation statistics can be problematic, since brain size and orientation differences across subjects would be inappropriately interpreted as spatial changes, which greatly amplifies functionally-irrelevant inter-subject variability. Therefore, metrics invariant to similarity transformations, i.e. rotation, translation, and scaling, are desired. Translational invariance can be obtained by using central moments defined as:  dxdydzzyxzzyyxx rqppqr ),,()()()(          , (3.2)  where x , y , and z  are the centroid coordinates of ρ(x,y,z):  000 001 000 010 000 100 ,, m mz m my m mx  , (3.3)  and scale invariance can be obtained by normalizing the moments:  52 1 3 000   rqp pqr pqr   . (3.4)  To obtain rotational invariance, the 3D moments need to be combined in specific ways. For all experiments in Sections 3.2 to 3.4, we focused on an invariant spatial feature [155]:  0020202001  J , (3.5)  that measures the spatial extent of regional activation statistics, and refer readers to [81] and [P7] for other 3D moment invariants that we derived.  m020 x y z m002 m200 m020 x y z m002 m200 m020 x y z m002 m200 … … J1(t) = m200(t)+m020(t)+m002(t) = + J1 X β ε … 1. Feature  extraction from  regional BOLD  distributions 2. Activate effects, β, estimation using GLM statistical  testing 3. Group  analysis on β β ROIs with significant  activation changes Normalized  Regional BOLD  Distribution   Figure 3.1  Spatiotemporal ROI analysis pipeline.  To extend the analysis to the spatiotemporal domain, we proposed computing J1 of every regional BOLD volume to generate spatial feature time courses, i.e. ρ(x,y,z) in (3.2) is now the signal intensity of the voxel located at (x,y,z) at a given time point. We can then directly apply GLM to the resulting spatial feature time courses, as in the standard approach for activation detection [31], to identify activated ROIs [82], as summarized in Figure 3.1.  53 3.2 Effects of Spatial Normalization The standard approach for detecting group activation [31] entails whole-brain spatial normalization, where each subject’s brain images are warped to a common template to generate a one-to-one voxel correspondence. The assumption is that aligning anatomical structures would correspondingly align the functional regions across subjects. However, due to the large anatomical and functional inter-subject variability, spatial normalization is prone to mis-registration errors. For instance, as discussed in a review by Uylings et al. [61], the location of Brodmann areas can vary considerably across normal individuals even after non- linear elastic registration. This issue of anatomical variability is further magnified in studies investigating the effects of age [156, 157] and neurological diseases [158], where the shape and size of the subjects’ brains may significantly depart from the commonly used templates [60, 159]. To circumvent this issue, Brett et al. proposed masking out regions of the brain that differ from the template so that these problematic regions would not affect the registration process [158]. However, manual delineation of such regions is quite laborious and automating the procedure is a highly challenging problem on its own.  To reduce the amount of mis-registration, region-level spatial normalization has been proposed and shown to provide higher sensitivity to activation detection compared to whole- brain spatial normalization [140, 160, 161]. Under this approach, an ROI is first segmented for each subject in its native space. The segmented ROIs are then aligned and averaged across subjects to generate an ROI template. The ROI of each subject is then warped onto the template to generate voxel correspondence. This region-level spatial normalization approach is especially important for analyzing smaller brain regions such as the thalamus and other subcortical structures, which tend to be mis-aligned in whole-brain spatial normalization. The reason is that smaller regions contribute considerably less to the objective function in driving the whole-brain registration process compared to larger brain structures. Hence, adopting a region-level spatial normalization approach, where each ROI is separately aligned, is likely to reduce anatomical mis-registration. However, functional mis-alignments between subjects remain an issue. Also, any spatial normalization will certainly distort the activation statistics patterns, which may affect the outcomes of the analysis.  54 An alternative approach for drawing group inference is to specify ROIs for each subject and compare certain properties of their regional activation statistics. Although this ROI analysis approach does not localize the active voxels, it addresses a more hypothesis-driven question of whether a particular brain region is commonly activated. A benefit of this approach is that it mitigates spatial distortions introduced by spatial normalization. We quantitatively assess the benefits of retaining the original activation statistics patterns by comparing the task discriminability of J1 (3.5) for ROIs extracted 1) from the subject’s native space, 2) using whole-brain spatial normalization, and 3) using region-level spatial normalization [P8]. 3.2.1 Assessment Methods We use the fMRI data of the ten healthy subjects described in Section 2.7.1 to assess the effects of spatial normalization. We restrict our attention to four motor-related ROIs, namely the bilateral thalamus and the bilateral cerebellum, which are expected incur task-related activation changes. The respective sizes of the thalamus and cerebellum render these ROIs representative for testing the impact of spatial normalization on smaller and larger ROIs. The activation statistics, i.e. t-maps, are computed using the standard GLM [31]. To proceed, we first extract ROI activation statistics using three different procedures as summarized in Figure 3.2. For the warp-free case, each ROI is manually segmented from the whole-brain anatomical image by an expert based on anatomical landmarks and guided by a neurological atlas [159]. The segmented ROI is then used to extract the corresponding ROI activation statistics from the whole-brain activation statistics map without any spatial normalization. For the whole-brain spatial normalization case, the whole-brain anatomical image is warped to the Montreal Neurological Institutes (MNI) template using SPM 5 [30]. The same warp is then applied to the whole-brain activation statistics map with the activation statistics of each ROI extracted using predefined labels associated with the MNI template. Lastly, for the region-level spatial normalization case, the segmented anatomical ROIs from the warp-free procedure are rigidly aligned and averaged across subjects to generate an ROI template for each brain structure. This method for generating an ROI template is similar to the approach proposed by Stark and Okado [140], which provides a template that more closely matches the ROI shapes of the subjects under study. The anatomical ROI of each subject is then warped onto the ROI template with the same warp applied to the ROI activation statistics.  55    Figure 3.2  Extraction of ROI activation statistics. (a) Under the warp-free scheme, ROI activation statistics were extracted for each subject by masking out the appropriate regions from the whole-brain activation map using manually defined labels. (b) The whole-brain warping scheme involved warping the anatomical brain image of each subject to a template, applying the same warp to the whole-brain activation map, and extracting the ROI activation statistics using the pre-defined labels of the template. (c) The region-level warping scheme required creating an ROI template and warping of each subject’s ROI to this template with the same warp applied the corresponding activation statistics.  56 For each ROI activation statistics extraction procedure, we compute J1 for each ROI and apply a paired t-test to contrast the activation differences between the fast and slow conditions of the experiment described in Section 2.7.1. Significance is declared at a family- wise p-value of 0.05 (with Bonferroni correction to account for the number of ROIs). The corrected critical p-value is 0.0127, which corresponds to a t-value of 3.10. For comparison, we also calculated the mean ROI activation statistics without spatial normalization. 3.2.2 Results and Discussion Employing the traditionally-used mean activation statistics detected no significant task- related activation differences in any of the ROIs (left thalamus t-value = 0.213, right thalamus t-value = 0.160, left cerebellum t-value = 1.613, and right cerebellum t-value = 0.356). In contrast, J1 without spatial normalization detected significant activation differences in all four ROIs as summarized in Figure 3.3, thus suggesting that important task- related information was encoded in the spatial distribution of the ROI activation statistics. J1 with region-level spatial normalization detected the bilateral thalami and the left cerebellum but not the right cerebellum. J1 with whole-brain spatial normalization detected only the left thalamus and at reduced discriminability compared to J1 without spatial normalization. The quantitative results thus suggest that J1 was more discriminant to task-related spatial changes under the warp-free settings than after whole-brain spatial normalization. J1 without spatial normalization also appeared more discriminant than after region-level spatial normalization but to a lesser extent, except for the left thalamus. We suspect that this exception was due to an increase in functional overlaps across subjects upon region-level warping, which would increase the inter-subject consistency in the spatial activation patterns, hence the enhanced discriminability. However, this benefit of region-level spatial normalization was not evident for the other ROIs, thus might not be generalizable. To determine the reasons for the differences in discriminability between the compared methods, we generated 3D renderings of the ROI activation statistics under each of the three ROI activation statistics extraction schemes to visualize the effects of spatial normalization.  57   0 1 2 3 4 5 6 left thalamus right thalamus left cerebellum right cerebellum ROIs t-v al ue Warp-free analysis Region-level warping Whole-brain warping t-threshold at 3.10   Figure 3.3  Discriminability comparisons between the contrasted procedures for extracting ROI activation statistics. The blue, magenta, and yellow bars correspond to warp-free analysis, region-level warping, and whole-brain warping, respectively. The green dash line is the t-threshold for a family-wise p-value of 0.05 with Bonferroni correction. Significant activation differences were detected in all ROIs for J1 in the warp-free case. Region-level warping resulted in reduced discriminability (except in the left thalamus) and a loss of significance in the right cerebellum. Whole-brain warping led to further loss of significance in the right thalamus and the bilateral cerebellum. Effects of Spatial Normalization on Smaller ROIs The top two rows of Figure 3.4 show the 3D rendering of the warp-free ROI activation statistics within the right thalamus under the slow condition for all ten healthy subjects. As apparent from the figure, the locations of the more highly activated voxels varied across subjects and similar level of functional inter-subject variability was also observed for the fast condition (Figure. 3.5). Hence, even if the ROIs were anatomically aligned perfectly, group activation differences might be difficult to detect using the standard univariate approach [31] due to functional misalignment. Instead, comparing Figures 3.4 and 3.5, a wider area of the right thalamus appeared to be consistently recruited in all subjects under the fast condition. Thus, measuring the spatial extent of ROI activation should provide higher sensitivity than examining each voxel in isolation for the type of activation changes observed.  58  (a) (b) (c)  (d) (e)  (f) (g) (h)  (i) (j)  (k) (l) (m)  (n) (o)  (p) (q) (r)  (s) (t)  (u) (v) (w)  (x) (y)  (z) (α) (β)  (γ) (η)  Figure 3.4  ROI activation statistics in the right thalamus under the slow condition (Section 2.7.1). The top, middle, and bottom two rows correspond to warp-free, whole-brain warped, and region-level warped ROI activation statistics.  59  (a) (b) (c)  (d) (e)  (f)  (g) (h)  (i) (j)  (k)  (l) (m)  (n) (o)  (p)  (q) (r)  (s) (t)  (u)  (v) (w)  (x) (y)  (z)  (α) (β)  (γ) (η)  Figure 3.5  ROI activation statistics in the right thalamus under the fast condition (Section 2.7.1). The top, middle, and bottom two rows correspond to warp-free, whole-brain warped, and region-level warped ROI activation statistics.  60 One might argue that the lack of functional overlap was due to ROI misalignment, and conventional whole-brain spatial normalization should be performed before any conclusions could be drawn. The middle two rows of Figure 3.4 show the activation maps of the right thalamus extracted using the whole-brain spatial normalization scheme. Consistent with Crivello et al.’s findings [162], the amount of functional overlap did not seem to increase after whole-brain spatial normalization. Worse yet, whole-brain spatial normalization appeared to have pooled neighboring regions as part of the right thalamus. For instance, the more activated voxels were originally concentrated on the left side of the right thalamus in Figure 3.4(f), but an extra cluster emerged in the lower right region after whole-brain spatial normalization in Figure 3.4(p). This mis-registration problem was more pronounced in Figure 3.4(t), where the entire upper region of the right thalamus appeared to be active when only a few voxels may actually be activated (Figure 3.4(j)). Moreover, as evident by comparing Figure 3.4(c) and (m), whole-brain spatial normalization might have so heavily distorted the ROI activation patterns that the patterns no longer resemble the original data. Since both problems of mis-registration and spatial distortions were inherent in the whole- brain spatial normalization process, similar artifacts were found in the ROI activation maps for the fast condition (middle two rows of Figure 3.5) and in the left thalamus. Thus, although significant activation differences were detected in the left thalamus under whole- brain spatial normalization scheme, results should be interpreted with caution since the detected differences could have partly arise from the pooling of neighboring active regions as opposed to the true differences observed in the warp-free data.  A plausible explanation for the observed mis-registration and spatial distortions under the whole-brain spatial normalization scheme is that the thalamus is smaller than many other regions in the brain, hence may have little influence on the spatial normalization process. One way to reduce the amount of mis-registration is to perform region-level spatial normalization, which mitigates the problem of pooling neighboring regions. The bottom two rows of Figures 3.4 and 3.5 show the activation statistics of the right thalamus after region- level spatial normalization. Compared to the original warp-free data (top two rows of Figures 3.4 and 3.5), the overall spatial patterns appeared to be conserved although slightly distorted and shifted. The amount of functional inter-subject variability also seemed to have mildly  61 decreased, but this gain might have been offset by the smearing and spatial distortions introduced, which led to reduced discriminability (Figure 3.3). For the case of the left thalamus, the decrease in functional variability actually appeared to have outweighed the artifacts introduced, though this trend was not consistent for the other ROIs examined. Compared to whole-brain spatial normalization, a reduced amount of mis-registration and spatial distortions was definitely evident, and thus the increase in discriminability. Effects of Spatial Normalization on Larger ROIs The top two rows of Figures 3.6 and 3.7 show the warp-free activation statistics of the right cerebellum for the slow and fast conditions, respectively. As can be seen from the figures, considerable functional inter-subject variability was present in the spatial activation pattern. Also, comparing Figures 3.6 and 3.7, the spread of activation only mildly increased under the fast condition. For instance, only the middle region of the right cerebellum (Figure 3.6(e) and (j)) became more active (Figure 3.7(e) and (j)). Thus, any distortions introduced by spatial normalization might remove these subtle spatial differences.  Since larger ROIs have more influence on whole-brain spatial normalization process, the cerebellum should theoretically be less misaligned than the thalamus. Examining the middle two rows of Figure 3.6 supports this hypothesis. Although spatial distortion was apparent and neighboring regions might have been pooled into the border of the right cerebellum (e.g. top of right cerebellum in Figure 3.6(m)), correspondence in the overall spatial pattern could roughly be drawn between the warp-free and whole-brain warped activation maps. Nevertheless, region-level spatial normalization again resulted in less spatial distortions than whole-brain spatial normalization as shown in the bottom two rows of Figures 3.6 and 3.7. Correspondence in the overall spatial patterns could be easily identified between the warp- free and region-level warped cases (e.g. Figure 3.6(j) and (η)), but ample amount of smearing was clearly present (e.g. Figure 3.6(i) and (γ)). Since only mild spatial differences were observed between the slow and fast conditions in the original warp-free activation maps, this smearing effect seemed to have reduced task discriminability, leading to a loss of significance. Thus, our results again suggest the importance of retaining the original ROI activation statistics patterns in fMRI spatial analysis.  62  (a) (b) (c)  (d) (e)  (f) (g) (h)  (i) (j)  (k) (l) (m)  (n) (o)  (p) (q) (r)  (s) (t)  (u)  (v)  (w)  (x)  (y)  (z)  (α)  (β)  (γ)  (η)  Figure 3.6  ROI activation statistics in the right cerebellum under the slow condition (Section 2.7.1). The top, middle, and bottom two rows correspond to warp-free, whole-brain warped, and region-level warped ROI activation statistics.   63  (a)  (b)  (c)  (d) (e)  (f)  (g)  (h)  (i)  (j)  (k) (l) (m)  (n) (o)  (p) (q) (r)  (s) (t)  (u)  (v)  (w)  (x)  (y)  (z)  (α)  (β)  (γ)  (η)  Figure 3.7  ROI activation statistics in the right cerebellum under the fast condition (Section 2.7.1). The top, middle, and bottom two rows correspond to warp-free, whole-brain warped, and region-level warped ROI activation statistics.   64 3.3 Effects of L-dopa on Parkinson’s Disease Despite the clinical benefits of L-dopa on the motor symptoms of PD, neuroimaging studies have revealed conflicting results for the effects of L-dopa on brain activity. Resting state PET studies of PD subjects have shown a pattern of hypermetabolism in lentiform and thalamic areas and hypometabolism of cortical motor regions, which is improved by L-dopa [163- 165]. Earlier studies, however, showed no effect of L-dopa on regional cerebral glucose metabolism [166]. During voluntary movement, studies have also produced conflicting results. While some have demonstrated hypoactivity of the SMA [167], others have found increased activity of caudal portions of the SMA [168]. Similarly, Haslinger et al. [169] showed hyperactivity in M1, while Buhmann et al. [170] found hypoactivity in M1. Nonetheless, both studies showed that L-dopa “normalized” brain activity.  A possible explanation of these conflicting results is that the aforementioned studies have focused on examining the amplitude of ROI activation statistics, and neglected possible characteristic changes in the regional spatial patterns. The assumption that only amplitude is modulated by disease can be overly simplistic, since a “spatially diffused” and a “spatially focused” pattern may have similar mean ROI activation statistics. We hypothesize that one of the effects of L-dopa may in fact be a “re-focusing” of brain activity in specific brain regions. We test this hypothesis by measuring the spatial extent of regional activation statistics in the PD subjects off and on L-dopa medication (Section 2.7.1) using J1 (3.5) [P9]. 3.3.1 Statistical Analysis For each ROI and experimental condition (Section 2.7.1), we computed J1 of the regional activation statistics of the PD subjects off and on medication and the age-matched controls. For comparison, we also computed the mean ROI activation statistics thresholded at a t-value of 1.96. A 2×3 repeated-measures design analysis of variance (ANOVA) was performed on the PD data to analyze the main effects of medication, movement frequency, and any interactions between these two factors. A paired-test was also performed to examine the effects of medication at each task frequency. Statistical significance is declared at a family- wise p-value of 0.05 with FDR correction [52].  65 3.3.2 Results and Discussion Applying an ANOVA on the mean ROI activation statistics detected significant effects of L- dopa on the bilateral M1 and the contralateral SMA. Frequency had a significant main effect in left putamen, bilateral M1, and bilateral SMA. Assessing the effects of L-dopa at each frequency separately detected no significant changes within any ROIs at the slow frequency. At the medium frequency, only the left M1 demonstrated a significant decrease in amplitude. At the fastest frequency, significant decreases in amplitude were detected in the contralateral SMA, ipsilateral putamen, thalamus, cerebellum, and M1.  Using J1, significant main effects of L-dopa were detected in bilateral cerebellum, bilateral M1, bilateral SMA, and ipsilateral prefrontal cortex. A significant interaction in the left prefrontal cortex was found. No main effect of frequency was detected, but frequency specific effects of L-dopa were found in the ipsilateral cerebellum and SMA, and contralateral primary motor cortex at the slow frequency. At medium frequency, significant medication effects were detected in the contralateral thalamus, ipsilateral cerebellum, and bilateral M1 and SMA. At the highest frequency, significant effects were detected in bilateral prefrontal cortex and SMA as well as contralateral cerebellum and ipsilateral M1.  To visual the effects of L-dopa, we plotted the t-values within the right M1 of an exemplar PD subject off and on L-dopa Figure 3.8. Comparing the spatial distribution of t-values, L- dopa appears to ‘refocus’ the area of activation. This trend was consistently found in all PD subjects as shown in Figure 3.9, where the spatial extent of regional activation statistics in PD subjects off and on medication as compared to healthy age-matched controls within the left SMA and right M1 (two regions where both amplitude and spatial distribution of activation was significantly affected by L-dopa) are plotted. The spatial extent of activation statistics within these two regions was reduced to a similar level as controls upon medication, thus demonstrating the normalizing effect of L-dopa.      66 (a) (b)   Figure 3.8  Spatial focusing effect of L-dopa seen in a typical PD subject. The t-maps of right M1 shown with t-values normalized between [0,1] to emphasize the spatial distribution changes. (a) A much wider recruitment of the right M1 was required to perform the same motor task prior to L-dopa medication. (b) The ROI activation pattern appears to focus upon medication.  predrug postdrug Normal J1  (a) Left SMA predrug postdrug Normal J1  (b) Right M1  Figure 3.9  Normalization of the spatial extent of ROI activation statistics by L-dopa. (a) The spatial extent of activation statistics in the left SMA is plotted. The arrows represent the effects of L-dopa on the spatial extent of ROI activation statistics in each PD subject. The feature values for the normal subjects are shown for comparison. (b) Normalization of spatial extent of activation statistics in the right M1. L- dopa appeared to refocus the regional activation in PD with the resultant spatial extent becoming more similar to that of the healthy control subjects.  67 Based on our results, a predominant effect of L-dopa appeared to be a change in the spatial extent of activation in both cortical and subcortical regions. In fact, the spatial effects of medication were significant in the contralateral M1 and ipsilateral cerebellum even at the lowest frequency, whereas medication-related changes in amplitude, as estimated with mean ROI activation statistics, were only evident at the higher frequencies. Similar findings have been observed in cognitive studies [145], and we are demonstrating here that such “focussing” effect of L-dopa is also present for motor tasks. This important brain characteristic would have been missed if only amplitude-based features were employed.  3.4 Large Scale Real Data Experiment We have previously demonstrated that spatial modulations in regional BOLD distributions can be used to infer brain activation for motor tasks [82]. The question is thus whether this is generally true for all different tasks. To address this question, we apply our spatiotemporal characterization approach [82] to fMRI data (Section 2.10.1) collected from a large cohort of 65 subjects [P11]. Each subject performed 10 tasks: 1) passive viewing of flashing horizontal checkerboards, 2) passive viewing of flashing vertical checkerboards, 3) pressing a button with the left thumb button according to visual instructions, 4) pressing a button with right thumb according to visual instruction, 5) pressing a button with left thumb according to auditory instruction, 6) pressing a button with right thumb according to auditory instruction, 7) silently reading short visual sentences, 8) listening to short sentences, 9) solving silently visual subtraction problems, and 10) solving silently auditory subtraction problems [153]. This dataset thus encompasses tasks, such as auditory perception, visual perception, motor actions, reading, language comprehension, and mental calculation, and hence particularly suitable for testing the generality of our spatiotemporal characterization approach. 3.4.1 Statistical Analysis The overall analysis pipeline is summarized in Figure 3.1. After extracting ROIs using the AAL atlas [171], we normalized the regional BOLD distributions such that, at each time point, the intensity of the voxels is non-negative and sums to one. This step decouples  68 amplitude changes from spatial changes. J1 was then computed at each time point to generate a spatial feature time course. Standard GLM [31] was employed to estimate the activation effects. To identify ROIs with significant activation changes, a non-parametric permutation test with strict control over false positives was used [40]. Statistical significance was declared at an experiment-wise p-value of 0.05. For comparison, the same analysis was performed using mean intensity within parcels as feature time courses. We employed parcel-level analysis since the mean intensity over the large AAL ROIs might be too coarse a measure for representing the ROI response. Parcels were functionally defined [130] based on resting-state data of the subjects (Section 2.10.1). 3.4.2 Results and Discussion 21 contrasts between the 10 experimental conditions were examined. Contrasts included auditory vs. visual task, language vs. computation task among others. Figure 3.10 shows the percentage of AAL ROIs detected with significant activation differences using J1. On average, significant activation differences were found in 9% of the AAL ROIs across contrasts. Thus, task-related spatial modulations of the BOLD distribution appeared to be present for various tasks, and not specific to only motor tasks. Also, a moderate number of ROIs detected with J1 were different from those found using mean intensity, as shown in Figure 3.11(a) and (b). This observation suggests that spatial characterization of the regional BOLD response can play a complementary role to traditional intensity-based analysis.  69 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 %  RO Is Contrasts 1 1 1 1 10   Figure 3.10  Percentage of ROIs detected with significant activation differences using J1. L = left, R = right, H = horizontal, V = vertical.   (a) Auditory vs. visual task  (b) Sentence vs. computation task  Figure 3.11  Activation detection comparisons between J1 and mean intensity. Red = detected by J1. Blue = detected by mean intensity. (a) Auditory vs. visual task. (b) Sentence vs. computation task.  70  3.5 Summary We validated the feasibility of inferring brain activation from spatial changes in regional activity patterns as measured with our proposed invariant spatial features. We showed that task-related spatial modulations are generally present for different tasks. This important finding indicates that analyzing signal amplitude is not the only means for detecting brain activation. Another point to emphasize is that our spatial characterization approach takes into account the collective information encoded by both the strongly and weakly responding voxels. From a spatial pattern analysis perspective, weakly responding voxels might just be as relevant, yet are often discarded in standard univariate analysis. Analyzing fMRI data at the ROI level thus opens up a new dimension − the spatial dimension, in complementing traditional univariate approaches.  71 4     Group-wise Functional Network Analysis The standard seed-based approach infers brain networks by comparing the intensity time courses of all voxels in the brain against that of a seed. Joint voxel interactions are thus ignored. Also, the detectable networks are restricted by the choice of seeds. Cluster analysis and decomposition techniques can alleviate these limitations, but identifying common networks across subjects using these approaches is very challenging, especially in the present of strong noise as inherent in fMRI data [108]. If each subject is analyzed separately, the less pronounced commonalities are likely to be obscured by noise. To deal with the observed inter-subject variability, we propose a new method, group replicator dynamics (GRD) [P15, P16], to exploit our prior knowledge of how subjects belonging to the same population likely recruit similar networks [108]. GRD is based on RD [70], which we show to be a simple solution to the NSPCA problem (Section 4.1). Exploiting the sparse property of RD thus eases the diffused weighting problem of PCA [102, 103] and enables explicit modeling of the sparse connections between spatially-remote brain regions [172]. To facilitate statistical group inference, we propose learning each subject’s network individually using RD (as opposed to applying RD to the average similarity matrix of the group [112]) but with group information integrated into each subject’s RD process (Section 4.2). This strategy encourages the individual networks to evolve towards the common network of the group. This results in sparse networks comprising the same brain regions across subjects yet with subject-specific regional weightings that reflect individual differences. Our approach thus enables sharing of statistical strength among subjects, while permitting inter-subject variability to be modeled. Statistical group inference can thereby be performed. We validated GRD on synthetic data (Section 4.3) and applied it to study network differences between tremor-dominant PD and akinetic-rigid PD subjects (Section 4.4).  72 4.1 Replicator Dynamics RD is a well-known concept that originated from theoretical biology for modeling the evolution of interacting, self-replicating entities [70]. Under the RD model, only the fittest entities survive over time. Let w(k) be a vector with the jth element being the probability that allele j remains in the gene pool during the kth generation and C be a ‘fitness’ matrix with each element reflecting the fitness of a genotype (a pairwise combination of alleles), then based on the theory of natural selection, w(k) can be estimated as follows [70]:  )()( )(*).()1( kCwkw kCwkwkw T   , (4.1)  where .* represents element-wise multiplication. According to the fundamental theorem of natural selection [70], if C is real, symmetric, and non-negative, then the function w(k)TCw(k), which corresponds to the average fitness of the surviving alleles, is strictly increasing with increasing k along any non-stationary trajectory w(k), and any such trajectory converges to a stationary point ws. Furthermore, ws is asymptotically stable under (4.1) if and only if ws is a strict local maximizer of wTCw. A proof of this theorem is provided in [173]. Based on this theorem, (4.1) maximizes the same objective function as PCA, i.e. wTCw, but with a different set of constraints. Specifically, the non-negative constraint on C restricts w to be non-negative. Also, since the sum of the elements in the numerator equals the value at the denominator in (4.1), elements of w are enforced to sum to one. Hence, (4.1) is a local maximizer of the following optimization problem:  0,1..,max 1  wwtsCwwTw , (4.2)  where the w is constrained to lie on the standard simplex, i.e. ||w||1 = 1, w ≥ 0. As shown in the seminal paper by Tibshirani [127], constraining the l1 norm induces sparsity. Thus, (4.1) provides an elegant solution to the challenging problem of NSPCA [102, 103] that can be easily implemented in just a few lines of code. Also, since only vector-matrix multiplications  73 are involved, the complexity at each iteration is O(n2), which renders RD computationally attractive against other NSPCA methods [103]. In the context of functional connectivity analysis, a brain region is analogous to an allele with its fitness reflected by the similarity between its time course and those of other ROIs. The most correlated set of ROIs can thus be found by iterating (4.1), where the non-zero elements in w indicate the ROIs belonging to the dominant network.  4.2 Group Replicator Dynamics If we apply (4.1) separately for each subject, the detected networks will likely be different across subjects [113]. However, if a core functional network that is common across subjects exists, we can exploit group information to identify this core network. The underlying assumption is that subjects belonging to the same group presumably use the same core functional network, comprising the same brain regions, to perform the same task. However, the common core network may not necessarily be the most coherent network for each subject, potentially due to different processing strategies or simply due to noise [113]. 4.2.1 GRD Formulation To identify the common core networks, we propose integrating group information in learning each subject’s network by maximizing the following objective function:  s ii W NiwwtsWEWJW ,...,1,0,1..)()(maxargˆ 1   , (4.3)    s r N i NiiiT eWEwCwWJ 1 )2(ln)(,)(  , (4.4)  where W is an Nr × Ns matrix with the weight vector wi of each subject i along the columns, Nr is the number of ROIs, and Ns is the number of subjects. J(W) corresponds to the sum of average network correlations across subjects (recall that wTCw is the average fitness of the surviving alleles in the original context of RD, Section 4.1), and E(W) is the group entropy of  74 the weight vectors assuming the wi’s are instances of a random variable, w, that follows a multivariate normal distribution [174] with λ governing the degree of group support. Σ is the covariance of w. Since entropy measures the uncertainty in predicting any instance of w given the other instances, e.g. predicting wp given wq, minimizing the entropy increases the mutual similarity of the wi’s based on commonalities shared between the subject’s network. 4.2.2 GRD Optimization To maximize (4.3), we propose the following optimization strategy. Let wi(k) be the weight vector of subject i during the kth iteration. At each iteration k, we first update each wi(k) individually using RD:  , )()( )(*).()1( kCwkw kCwkwkw T  (4.5)  We then adjust wRDi(k) to reduce group entropy:  ),()()1( kkwkw iiRD i   (4.6)  where δi(k) is the adjustment on wRDi(k) derived based on gradient descent [175]. Specifically, let WRD(k) be an Nr × Ns matrix with wRDi(k) of each subject along the columns, and Wc(k) be the same as WRD(k) but with the subject mean removed from each row. We approximate Σ in (4.4) with the sample covariance, WcWcT/(Ns-1), and take the matrix derivative of E(Wc) with respect to Wc:    .)( 1 cT c c WWcWc dW WdE   (4.7)  Based on (4.7), all wRDi(k) can be simultaneously adjusted:  )()()1( kkWkW RD  , (4.8)  75 )())()(()( 1 kWkWkWk c T cc  , (4.9)  where λ corresponds to the step size in the context of gradient descent optimization (see Section 4.2.4), and δi(k) in (4.6) is the ith column of δ(k). After applying (4.8), (4.5) is applied again in the next iteration, k+1, and the same procedure is repeated until convergence. We note that provided the step size, λ, is not “too large” as discussed in Section 4.2.4, applying (4.5) will project the weight vectors back onto the standard simplex, which ensures the constraints in (4.3) are satisfied. Since (4.5) searches for the network with highest average correlation within each subject, while (4.8) encourages networks to be similar across subjects, the combined effect of (4.5) and (4.8) results in networks with high average correlations that comprise the same ROIs across subjects, i.e. wi(k)’s having the same sparsity pattern, but with subject-specific ROI weightings.  For detecting subsequent networks, one may remove ROIs detected as the dominant network and reapply the procedures above [112]. Alternatively, if we consider ROIs as nodes of a graph and elements of Ci as edge weights, we can instead remove only those edges present in the dominant network, which enables the same ROI to be in multiple networks. Both of these approaches enable all functional networks in the brain to be potentially discovered in a hierarchical manner, in contrast to the seed-based approach where the detectable networks are restricted by the choice of seed [98]. We have adopted the latter approach to examine which ROIs serve as communication channels between the brain networks.  We note an interesting analogy between GRD and simulated annealing [176]. Specifically, applying (4.8) at each iteration is analogous to perturbing the RD solution of each subject to avoid local maxima. The difference is that the perturbation in GRD is informatively designed based on group consistency, as opposed to being random. Also, as the RD process of each subject evolves towards the common group network, the amount of perturbation will automatically decrease. Thus, explicitly designing an annealing schedule is not required. Furthermore, this designed perturbation renders GRD robust to initialization, as we show later in Section 4.3.  76 4.2.3 Group Inference To test if a detected network is statistically significant, we first compute the average correlation of the brain regions in the detected network of each subject, ci = wiTCiwi. We then apply a one-sample t-test to the Fisher’s Z-transform of the ci’s, which we denote as zi, against an estimated null mean to determine the probability that the zi’s are drawn from a null distribution generated from random noise. To estimate the mean of the null distribution, we temporally permute all preprocessed ROI time courses N = 1000 times, calculate the mean z averaged across subjects for each permutation, and take the average mean z over permutations as the null mean. Details regarding the general theories behind application of permutation tests for drawing statistical inference can be found in [40].  Since a network is characterized by its connections, the average connection strength, as (i.e. ci = wiTCiwi), which GRD maximizes, would be a representative characteristics of a network. However, comparing wiTCiwi across subjects is legitimate only if the same set of ROIs is detected as the common core functional network of the subjects. Otherwise, each estimated ci would represent the average correlation of a different network and declaring the presence of a significant common group network would be invalid. Also, for the same reason, comparing average network correlations would not be appropriate for statistically quantifying network differences between groups, since network of different groups may comprise different ROIs. Therefore, for statistically comparing the networks between two groups, we employ a permutation test [177]. In brief, we first compute the Euclidean distance (ED) between the mean weight vectors of the two groups. We then permute the group labels of the weight vectors and count the number of times, q, that the ED of the permuted groups exceeds the original ED. The p-value is then computed as q/N, where N is the number of permutations set to 100,000.  77 4.2.4 Implementation Details Functional connectivity analysis is typically performed at the voxel level [84] with subject correspondence drawn by whole-brain spatial normalization. However, voxel time courses are often very noisy and whole-brain spatial normalization is prone to mis-registration [61]. Hence, the implicit assumption that a perfect voxel correspondence is established upon whole-brain spatial normalization is quite tenuous. We thus instead define ROIs based on anatomy and set Ci as the pairwise correlations between the average intensity time courses of the ROIs. The underlying assumption is that the same anatomical ROIs pertain to similar functions across subjects. To ensure Ci is non-negative as required for employing RD, we take the absolute value of the correlations [112] assuming positive and negative correlations are equally important. Nevertheless, networks with only positively correlated ROIs can be separately identified by nulling out the negative elements in Ci. To avoid self connections, the main diagonal of Ci is set to zero. Furthermore, to avoid bias during initialization, wi(0) is set to 1/Nr. Convergence of GRD is defined as having all elements of wi(k) change by < 10-4.  Wc(k)Wc(k)T in (4.9) is typically singular since there are usually more ROIs than subjects, i.e. Nr >> Ns. Therefore, we add a small positive constant, α, to the main diagonal of Wc(k)Wc(k)T to regularize the inversion. Adding α to Wc(k)WcT(k) reduces the influence of the off-diagonal elements of Wc(k)WcT(k). These off-diagonal elements encode information regarding which pairs of ROIs are consistently selected (given non-zero weights) across subjects. Hence, one would ideally want to add as small an α value as possible such that all eigenvalues of Wc(k)WcT(k) are positive to ensure stable inversion of Wc(k)WcT(k), while the influence of the off-diagonal elements is preserved. Searching for such an α value at every iteration k, however, can be computationally expensive. We thus opt to fix α at 0.1, which is an order of magnitude lower than the value bounds of Wc(k)WcT(k), i.e. (-1,1). We tested on a wide set of synthetic cases with different number of ROIs and noise levels, and found that selecting an α value as such provides a good balance between ensuring stable inversion of Wc(k)WcT(k) and preserving the influence of the off-diagonal elements, as confirmed by how the detected networks perfectly matched the ground truth in our synthetic examples shown in Section 4.3.   78 As for λ in (4.8), which corresponds to step size in the context of gradient descent optimization, there are two standard methods for setting this parameter [175]. Specifically, one may employ exact line search [175] to find the λ that minimizes the group entropy at every iteration, but this can be very computationally expensive. Therefore, we instead adopted the backtracking line search strategy [175]. In brief, starting from an λ of 1, the idea is to reduce λ until the Armijo condition is satisfied:  )())(())(())()(( kkWEkWEkkWE TRDRDRD   , (4.10)  where γ ϵ (0,0.5). This condition guarantees sufficient decrease in group entropy at each iteration. For a typical γ of 0.01 [175], we found that setting λ to 0.1 satisfies the Armijo condition at every iteration for all synthetic cases and real data tested. We thus opt to fix λ to 0.1 as opposed to checking the Armijo condition at every iteration to find a potentially larger λ. The robustness of GRD as we vary λ is discussed in Section 4.3.  4.3 Synthetic Data Experiments 4.3.1 Data Generation We validated GRD under two test scenarios. 500 synthetic datasets were generated for each test. Scenario 1 consisted of 10 subjects and 100 ROIs, as summarized in Figure 4.1(k). ROIs 1 to 10 were designed to be mutually correlated and constituted the primary group network (i.e. 10% of the ROIs belong to the primary group network). These ROIs were also set to be connected to ROI 21 for subject 1, ROI 22 for subject 2, etc. to introduce inter-subject variability. We note that these extra subject-dependent ROIs are not supposed to be part of the primary group network. Furthermore, we set ROIs 11 to 20 to be mutually correlated but with lower correlations except for subjects 1 and 2 to test whether GRD will get trapped in a local maximum when a secondary network is present and the primary group network is not the primary network for some of the subjects. Sinusoidal signals were used as synthetic ROI time courses to emulate the hypothesized response of the continual task employed in our clinical study, which is likely to be sinusoidal (Section 4.4.1):  79  pp wtststf  )6.0sin()1()5.0sin()( , (6) ss wtststf  )4.0sin()5.0sin()1()( , (7)  where fp(t) and fs(t) are time courses of ROIs belonging to the primary and secondary group networks, respectively. wp and ws represent random Gaussian noise with wp < ws (except for subjects 1 and 2) so that ROIs of the primary network have higher mutual correlations. s is a uniform random variable with values between 0.7 and 1. We note that fp(t) is deliberately designed to be mildly correlated with fs(t) to test whether GRD can correctly separate the two networks. For ROIs that do not belong to the primary and secondary network, random Gaussian noise is used as ROI time courses. 4.3.2 Results and Discussion Results obtained by applying GRD to the synthetic data of scenario 1 are plotted in Figure 4.1. For comparison, we applied PCA, RD, and ICA separately to each subject’s data. Group ICA was also tested. For ICA and Group ICA, the spatial component associated with the temporal component that has highest correlation with sin(0.5t) was taken as wi, i.e. sin(0.5t) is taken as the hypothesized response of ROIs in the primary group network.  Using PCA (red “.”), higher weights were assigned to ROIs 1 to 10 and the extra subject- dependent ROI for subjects 3 to 10, and ROIs 11 to 20 for subject 1 and 2. However, if we examine each dataset separately, weights assigned to ROIs of the primary and secondary group networks were in fact very similar with higher weights occasionally assigned to ROIs of the secondary group network. Defining a threshold to separate the primary and secondary group networks is thus very difficult. Also, non-zero weights were assigned to ROIs with random noise as time courses. Since PCA does not account for group information, detecting the extra subject-dependent ROI and having the primary network of subjects 1 and 2 deviating from the primary group network were expected.    80 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (a) Subject 1 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (b) Subject 2 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (c) Subject 3 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (d) Subject 4 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (e) Subject 5 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (f) Subject 6 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (g) Subject 7 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (h) Subject 8 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (i) Subject 9 0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts ∙ = PCA x = RD + = ICA □ = Group ICA ○ = GRD  (j) Subject 10  Subjects/ROIs 1-10 11-20 21 22 23 24 25 26 27 28 29 30 31-100 1 S P S - - - - - - - - - - 2 S P - S - - - - - - - - - 3 P S - - P - - - - - - - - 4 P S - - - P - - - - - - - 5 P S - - - - P - - - - - - 6 P S - - - - - P - - - - - 7 P S - - - - - - P - - - - 8 P S - - - - - - - P - - - 9 P S - - - - - - - - P - - 10 P S - - - - - - - - - P - (k)  Figure 4.1  Synthetic data results. (a) to (j) Average weight vectors of subjects 1 to 10 over 500 datasets (k) Ground truth subject-specific networks. “P” and “S” indicate if an ROI belongs to the primary or secondary network of each subject. ROIs 1 to 10 constitute the primary group network. “-” implies the ROI does not belong to a network. We zoomed into the weights of ROIs 1 to 40 for clarity. Weights of ROIs 41 to 100 were similar to those of ROIs 31 to 40 since all these ROIs do not belong to any network.   81 RD (green “x”) also assigned higher weights to ROIs 1 to 10 and the extra subject-dependent ROI for subjects 3 to 10, and ROIs 11 to 20 for subjects 1 and 2 on average. These deviations from the ground truth primary group network were expected since RD applied separately to each subject will obviously not be able to exploit group information. We note that zero weights were assigned to ROIs with random noise as time courses, thus demonstrating the sparse property of RD. Surprisingly, non-zero weights were sporadically assigned to ROIs of the secondary network when we examine results of each synthetic dataset individually. We suspect the amount of noise added might have obscured the true degree of correlations between ROI time courses of the two networks.  ICA (cyan “+”) resulted in higher weights assigned to ROIs 1 to 10 for all subjects, since we knew a priori that the temporal component having highest correlation with sin(0.5t) corresponds to the primary group network. Such prior knowledge is rarely available for real experiments, especially for complex tasks, such as the continual motor squeezing task employed in our study (Section 4.4.1). Also, the extra subject-dependent ROI was incorrectly detected, since ICA does not incorporate group information. Moreover, non-zero weights were assigned to ROIs of the secondary group network, though the magnitude was substantially reduced compared to PCA.  Results obtained using Group ICA (blue “”) were very similar to ICA except for a reduction in standard deviation of the weights (not shown in Figure 4.1 to avoid cluttering the plot). Thus, the back-reconstruction step in Group ICA [72] does not preserve the group network structure. If we apply a t-test to the subjects’ weights for each ROI separately as proposed in [72], the subject-dependent ROI would be correctly pruned out, but ROIs in the secondary group network would be mistakenly taken as part of the primary group network since small positive weights of similar magnitude were consistently assigned to these ROIs across all subjects. If we instead threshold the weights to avoid falsely including ROIs from the secondary group network, the subject-dependent ROI would be erroneously treated as part of the primary group network. Thus, no simple automated heuristics would be sufficient to separate out the primary group network.   82 Using GRD (black “o”), weights were correctly assigned to only those ROIs belonging to the ground truth primary group network for all subjects. Also, zero weights were assigned to all other ROIs, thus no heuristics were required to identify the primary group network. We note that with λ set to 0.1, which satisfied the Armijo condition, GRD was found to converge for all synthetic datasets. A plot of (4.3) vs iteration k is shown in Figure 4.2 for an exemplar synthetic dataset.  0 50 100-26 -25 -24 -23 -22 -21 Iteration k J( W (k )) -λE (W (k ))   Figure 4.2  Convergence of GRD. GRD was found to converge for all synthetic data tested.  Notice that GRD converged approximately within the first ten iterations. In addition to convergence, we also tested the robustness of GRD under random initializations of wi(0). Over 500 random initializations for an arbitrary synthetic dataset, GRD correctly assigned weights to only ROIs of the primary group network for 99% of the cases with negligible weights assigned to the secondary group network for the remaining cases. GRD thus appears highly robust to choice of initialization. Also, we tested the case where outlier subjects were present. Specifically, we generated another 500 synthetic datasets, where all subjects’ network structure remained the same as in Figure 4.1(k), except for subjects 9 and 10. For subjects 9 and 10, random noise was used as time courses for ROIs 1 to 20, and ROIs 21 to 30 were made mutually correlated. Weights obtained using GRD for exemplar subjects 1 and 9 are shown in Figure 4.3.   83   0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts  (a) Subject 1   0 5 10 15 20 25 30 35 40 0 0.02 0.04 0.06 0.08 0.1 0.12 ROIs w ei gh ts  (b) Subject 9  Figure 4.3  Outlier analysis. (a) and (b) Average weight vectors of subjects 1 and 9 over 500 datasets using GRD. Networks of subjects 1 to 8 were the same as in Fig. 1(k). For subjects 9 and 10, ROIs 1 to 20 were random noise, and ROIs 21 to 30 were mutually correlated. Notice negligible weights were assigned to ROIs 1 to 10 for subject 9, thus demonstrating that GRD would not falsely assign high weights to ROIs that do not form a network. Weight vectors of subjects 2 to 8 were similar to subject 1. Weight vector of subject 10 was similar to subject 9. Identifying subjects 9 and 10 as outliers is thus trivial.  For subjects 1 to 8, substantially higher weights were assigned to ROIs of the primary group network than ROIs 21 to 30, and conversely, more weights were assigned to ROIs 21 to 30 than ROIs of the primary group network for subjects 9 and 10. These results thus demonstrate that GRD would not falsely assign high weights to ROIs that do not form a network. Also, based on these differences in weight vectors, subjects 9 and 10 could be trivially identified as outliers.  Furthermore, we tested the robustness of GRD to the choice of λ (Figure 4.4). We observed that setting λ beyond 0.15 leads to violation of the Armijo condition. In fact, some elements of WRD(k)+λδ(k) became negative, thus RD was not be able to project the weight vectors back onto the standard simplex. When λ is set below 0.05, the sparsity patterns of the weight vectors no longer matched across subjects. Within the range of 0.05 and 0.15, where the Armijo condition is satisfied, GRD was found to be very robust.  84  Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15  (a) λ = 0.2 Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15 (b) λ = 0.15 Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15 (c) λ = 0.1 Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15  (d) λ = 0.05 Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15 (e) λ = 0.01 Subject# R O I#   2 4 6 8 10 5 10 15 20 25 30 35 40 -0.05 0 0.05 0.1 0.15 (f) λ = 0  Figure 4.4  Effects of varying λ. wi’s for an exemplar synthetic dataset of scenario 1. Weight values of the first 40 ROIs displayed. Each row of a subfigure corresponds to an ROI and each column corresponds to a subject. (a) Setting λ to 0.2 resulted in violation of the Armijo condition and a few negative weights due to too large a step taken during group entropy minimization. (b)-(d) Setting λ between 0.05 and 0.15 generated weight vectors with sparsity patterns that exactly match the ground truth (Figure 4.1(k)). Varying λ resulted in mild changes to the magnitude of the weights. (e) When λ was set to 0.01, the level of group influence appeared too low to steer each subject’s RD process towards the common group network. (f) Setting λ to 0 generated weight patterns identical to those obtained by applying RD separately to each subject as expected.  In scenario 2, we tested how well each method scales up to larger size problems by generating 500 synthetic datasets similar to Figure 4.1(k) but with 1000 ROIs (i.e. 1% of the ROIs belong to the primary group network). The results were exactly the same as in Figure 4.1 with GRD correctly detecting only those ROIs belonging to the primary group network, and the other methods prone to subject-specific network structure. GRD could thus be scaled up to whole-brain analysis if we parcellate the brain into thousands of ROIs using automated ROI extraction tools, such as Freesurfer. In fact, GRD could theoretically handle problems of arbitrary size, but memory size limits GRD’s scalability in practice. Results for a smaller scale synthetic test (typical for clinical studies) can be found in [P15].  85 4.4 Real Data Analysis 4.4.1 Materials Functional MRI data were provided by Dr. McKeown of the Pacific Parkinson Research Center. 9 healthy subjects, 6 patients with tremor-dominant Parkinson’s disease (PDt) off medication, and 8 patients with akinetic-rigid Parkinson’s disease (PDar) off medication were recruited. Subjects were required to use their right-hands to squeeze a bulb with sufficient pressure such that a horizontal bar shown on a screen was kept within an undulating pathway. The pathway remained straight during rest periods and became sinusoidal at a frequency of 1 Hz during time of stimulus. Increasing level of noise was added to the pathway for studying network changes as the task transitions from being externally-guided (guided by visual stimulus) to being internally-guided (guided by subjects’ memory). We denote the three examined noise levels as A, B, and C with level A having no noise and level C being the noisiest. Each run lasted 600s, alternating between stimulus and rest of 100 s durations.  Functional MRI was performed on a Philips Achieva 3.0 T scanner (Philips, Best, Netherlands). T2*-weighted images with BOLD contrast were acquired using an EPI sequence with an echo time of 37 ms, a repetition time of 1985 ms, a flip angle of 90°, an in plane resolution of 128128 pixels, and a pixel size of 1.91.9 mm. Each volume consisted of 36 axial slices of 3 mm thickness with a 1 mm gap. A 3D T1-weighted image consisting of 170 axial slices was also acquired.  Each subject’s fMRI data were pre-processed using Brain Voyager’s (Brain Innovation B.V.) trilinear interpolation for 3D motion correction and sinc interpolation for slice time correction. Further motion correction was performed using motion corrected ICA [144]. The voxel time courses were then high-pass filtered to account for temporal drifts. No spatial normalization or spatial smoothing was performed. In this study, our main objective was to identify differences in functional networks between the two PD subtypes, namely PDt and PDar. We had thus focused our analysis on ROIs that are pertinent to PD, as chosen by a neurology expert. ROIs included bilateral cerebellum (CER), thalamus (THA), caudate  86 (CAU), putamen (PUT), pallidum (PAL), hippocampus (HIP), amygdale (AMY), primary motor area (PMA), premotor area (PM), somatosensory area (SSA), caudal anterior cingulate cortex (ACCc), posterior cingulate cortex (PCC), and rostal anterior cingulate cortex (ACCr). All ROIs were manually segmented by a trained technician who was blind to each subject’s diagnosis. 4.4.2 Results and Discussion The primary, secondary, and tertiary group networks detected using GRD on real data are summarized in Figure 4.5. GRD converged in all cases with all found networks being significant at a p-value of 0.05 with Bonferroni correction. For noiseless case (i.e. “noise level A”), bilateral PMA, bilateral PM, left SSA, and left PCC were detected as the primary group network of the healthy subjects. As we increased the noise (i.e. levels B and C), the right SSA and right PCC were additionally recruited. Since ipsilateral regions are often recruited to handle increasing task difficulty [178], our results are consistent with prior neuroscience knowledge. Removing edges between the ROIs detected as the primary group network and reapplying GRD resulted in bilateral CER, bilateral THA, right PM, and right PCC being detected as the secondary group network, and bilateral CAU, left PM, right SSA, bilateral ACCc and right PCC as the tertiary group network for noise level A. Since the cerebello-thalamo-cortical (CTC) loop is known to be more involved than the striato- thalamo-cortical (STC) loop during externally-guided tasks [179], these results further confirm the validity of GRD. Little changes were observed in the secondary and tertiary networks at noise level B with left CAU becoming part of the secondary network and bilateral PUT recruited into the tertiary network. At noise level C, bilateral CAU, right PM, and bilateral ACCc became the secondary network, and bilateral CER, bilateral THA, left CAU, and left PM became the tertiary network. Since noise level C requires higher memory demands from the subjects, observing this switch in the level of recruitment of the CTC and STC loops again matches expectation.      87  LPMA LSSA LPM RPMA RPM LPCC Primary Network p = 8.465x10‐8 Secondary Network p = 6.529x10‐6 RPM RPCC RTHA LTHA RCER LCER LPM LCAURCAU RSSA LACCc RACCc RPCC Tertiary Network p = 6.661x10‐7  (a) Healthy subjects, noise level A Tertiary Network p = 4.744x10‐3 Primary Network p = 3.051x10‐4 Secondary Network p = 8.995x10‐4 RPMA LPMA LPM LSSA RPM LACCcRACCc LPCC RPCCRPM RSSA RSSA LPM LPUT LCAU RCAU RPUT RACCr (d) Tremor-dominant PD patients Tertiary Network p = 1.250x10‐5 Primary Network p = 1.467x10‐5 Secondary Network p = 1.802x10‐5 RPMA LPMA LPM LSSA RPM RSSA RPCC LPCC LCER RCER LTHA RTHA LCAU RPM LPUT LCAU RCAURPUT LACCc RACCc RPCC LPM  (b) Healthy subjects, noise level B Tertiary Network p = 2.158x10‐5 Primary Network p = 5.293x10‐7 Secondary Network p = 1.280x10‐6 RPMA LPMA LPM LSSA RPM RPMRPCC RCER LCER LPCC LPM LPCC RCAU RTHA LCAU LTHA  (e) Akinetic-rigid PD patients Tertiary Network p = 1.110x10‐5 Primary Network p = 1.075x10‐5 Secondary Network p = 1.085x10‐6 RPMA LPMA LSSA RPCC LPCC LPM RSSA RPM RPM RACCc RCAU LCAU LACCc LCAU LCERRCER RTHA LTHA LPM  (c) Healthy subjects, noise level C Legend  CER = cerebellum  THA = thalamus  CAU = caudate  PUT = putamen  PAL = pallidum  HIP = hippocampus  AMY = amygdale  PMA = primary motor area  PM = premotor area  SSA = somatosensory area  ACCc = caudal anterior cingulate  PCC = posterior cingulate  ACCr = rostal anterior cingulate   Figure 4.5  Real fMRI data results. The primary, secondary, and tertiary group networks detected using GRD and their p-values are shown. All networks were statistically significant at p-value of 0.05 with Bonferroni correction.  88  For PD subjects, similar networks were detected for all noise levels. We thus combined the data of the three noise levels by concatenating the ROI time courses in detecting representative networks for each PD subgroup. Although the primary group network of both PD subgroups comprised the same ROIs, namely bilateral PMA, bilateral PM, and left SSA, the relative weighting of the ROIs were found to be significantly different with a p-value of 0.0017. The secondary and tertiary group networks were also found to be vastly different (p- value = 0 over 100,000 permutations). Specifically, PDt recruited right PM, right SSA, bilateral ACCc and bilateral PCC as the secondary group network, and bilateral CAU, bilateral PUT, left PM, right SSA, and right ACCr as the tertiary group network. In contrast, PDar recruited bilateral cerebellum, right PM, and bilateral PCC as secondary group network, and bilateral THA, bilateral CAU, left PM, and left PCC as the tertiary group network. Our results thus suggest that differences in symptoms between PDt and PDar may originate from brain network differences, in addition to abnormal alterations in brain activity [180, 181]. In particular, excessive activation in the cerebellum has been hypothesized as part of a compensatory mechanism [180], which was found to be more pronounced in akinetic- rigid PD [181]. In support of these findings, our results further suggest that differences between the PD subtypes may be signified by the presence of hyper-active CTC loop in PDar. Since PDar normally portends a worse prognosis, we postulate that an over-active CTC loop may be maladaptive.  We note that we have also applied GRD to real data described in Section 2.7.1 to examine the effects of increasing task difficulties. We found that even at the lowest task frequency, PD subjects off medication were required to recruit a network that was employed by healthy subjects at the highest frequency. Details of this analysis can be found in [P15].   89 4.5 Summary In this chapter, we presented a novel group-wise sparse network detection method that we proposed for handling some of the limitations associated with state-of-the-art functional connectivity analysis techniques, as summarized in Table 4.1.  Table 4.1  Summary of functional connectivity analysis methods. GA = group analysis. Methods Strengths Weaknesses Multivariate Extendable to GA Sensitive to Parameter Selection Seed-based [86]  X X Clustering [100, 101] X  X Decomposition [34, 73] X X X  In brief, the standard seed-based approach is easily extendable to group analysis, but does not model joint voxel interactions and is sensitive to the choice of threshold in isolating the network nodes. Cluster analysis accounts for joint voxel interactions, but is not easily extendable to group analysis and is sensitive to the number of clusters chosen. Decomposition techniques, such as PCA and ICA, are multivariate, but their diffused weighting property renders identification of network nodes difficult. Moreover, drawing a correspondence between spatial component maps that are separately extracted from each subject is non-trivial due to the large inter-subject variability. Group extensions of ICA have been proposed, but assume all subjects share similar spatial components for both signal and noise. Thus ideally, one would want a multivariate method that facilitates simple network identification, while having some mechanisms for handling the large inter-subject variability. To jointly satisfy these criteria, we proposed incorporating sparsity and group priors into PCA for network detection. Specifically, we showed that RD is a solution of the NSPCA optimization problem. Thus, exploiting the sparse property of RD can simplify network identification. We further showed that incorporating a group prior provides an effective means for handling the large inter-subject variability with network correspondence implicitly drawn across subjects. We note that GRD extracts networks in a hierarchical manner, and stops when the extracted network has a correlation value below that of random noise. Hence, GRD does not assume nor require all subjects to have similar spatial noise structures.  90 5     Generalized Sparse Classification In fMRI classification, one is often faced with exceedingly more features (e.g. voxels) than samples (e.g. brain volumes). Under such ill-conditioned settings, direct application of standard classifiers will likely result in overfitting. The standard approach to deal with the curse of dimensionality is to perform univariate feature selection, but this strategy ignores the collective information encoded in the voxel signal patterns. A recent approach that has gained major research interest is to incorporate sparsity into classifier learning. Enforcing sparsity implicitly removes classifier weights associated with irrelevant features, i.e. shrinks them to exactly zero. In effect, this approach enables feature selection and classifier learning to be simultaneous and collaboratively performed.  The majority of state-of-the-art sparse linear models [182-188] (Section 5.1) are based on the seminal work by Tibshirani [127], in which he showed that the LASSO penalty induces sparsity. These models provide effective means for imposing structural constraints on the model parameters [184], but lack the flexibility for incorporating potentially advantageous problem-specific properties beyond sparsity [187]. For example, adjacent pixels in an image are typically correlated. Model parameters associated with these pixels should thus be assigned similar magnitudes to reflect the underlying correlations. However, merely enforcing sparsity does not model such associations between model parameters. To encourage smoothness in model parameters, in addition to sparsity, Tibshirani et al. proposed the fused LASSO model [188], which combines the LASSO penalty with a term that penalizes the l1 norm of the differences between successive model parameters. Similarly, Hebiri proposed the smooth LASSO [189], in which a term that penalizes the l2 norm of the differences between successive model parameters is added to the LASSO penalty. To extend beyond smoothness, Tibshirani and Taylor recently proposed the Generalized LASSO model [187], which penalizes the l1 norm of a weighted combination of the model parameters. By varying the choice of weights, Generalized LASSO facilitates numerous applications, such as trend filtering, wavelet smoothing, and outlier detection [187].   91 To enable greater modeling flexibility without creating overly complex optimization problems, we propose a simple technique, “generalized sparse regularization” (GSR) [P17] that enables incorporation of prior knowledge [P17-P20] into a wide collection of sparse linear models (Section 5.2). In contrast to [187], we show that GSR is applicable to a much broader set of sparse linear models than just the LASSO regression model, and can be easily extended to the classification settings (Section 5.3) for applications, such as brain decoding (Section 5.4), which is our primary interest. The adaptability of GSR to such a wide range of models stems from how any l2 norm penalty can be merged into an l2 data fitting loss through a simple augmentation trick. Thus, adding a generalized ridge penalty to any sparse linear models with an l2 data fitting loss enables problem-specific properties to be easily integrated while preserving the functional form of the original optimization problem that these models entail. This desirable property of GSR facilitates direct deployment of existing sparse optimizers [152, 190-192]. We built various prior-informed sparse classifiers using GSR and applied them to the publicly available StarPlus data (Section 5.5) to demonstrate the benefits of modeling application-specific properties beyond sparsity (Section 5.6).  5.1 Overview of Sparse Linear Models In this section, we focus on the problem of regression, since most sparse linear models [182- 188] are inherently designed for such application. We defer discussion of how the sparse linear models described in this section can be employed for classifier learning in Section 0. Consider the standard least square regression problem:  2 2minˆ Xaya a  , (5.1)  where y is an N×1 response vector, X is an N×M predictor matrix, a is an M×1 coefficient vector, N is the number of samples, and M is the number of predictors. The closed-form solution of (5.1) is given by:   92   yXXXa TT 1ˆ  . (5.2)  When N << M, (XTX)-1 is ill-conditioned. Thus, direct estimation of â using (5.2) usually results in overfitting. To obtain a more generalizable estimate of â , a common strategy is to employ regularization. In particular, Tibshirani proposed enforcing sparsity on a to achieve the dual objective of reducing overfitting and enhancing interpretability [127]:  1 2 2minˆ aXaya a  , (5.3)  where α ≥ 0. The model (5.3) is commonly referred to as the LASSO regression model, where Tibshirani showed that penalizing the l1 norm induces sparsity on a [127].  The success of the LASSO regression model in numerous applications resulted in an explosion of research on sparse linear models [192]. Although many LASSO variants involve only a simple addition of other penalty terms to (5.3), the modeling power that these extensions facilitate proved substantial. For example, Zou and Hastie proposed adding a ridge penalty to (5.3), which is known as the elastic net model [182]:  2 21 2 2minˆ aaXaya a   , (5.4)  where β ≥ 0. This model has two key advantages over LASSO. First, in contrast to LASSO, the number of non-zero elements in a is no longer limited by the number of samples [182]. This property is especially important in medical imaging applications, since the number of samples is often much smaller than the number of predictors. Second, in cases where the predictors are correlated, elastic net tends to jointly select the correlated predictors, whereas LASSO would arbitrarily select only one predictor among each correlated set [182].  Another widely-used extension of LASSO is the group LASSO, proposed by Yuan and Lin for applications where a natural grouping of the predictors is present [183]:  93  1,2 2 2minˆ ga aXaya  , (5.5) 211,2   H h gg haa , (5.6)  where agh are the coefficients associated with predictors belonging to the predefined group gh, and h ϵ {1, …, H} where H is the number of predefined groups [183]. As evident from (5.6), ||ag||2,1 is exactly the l1 norm of ||agh||2. Thus, minimizing ||ag||2,1 encourages sparse subsets of groups to be selected with non-zero coefficients assigned to all predictors within each selected group. However, since not all predictors within a group are necessarily relevant, Sprechmann et al. [184] proposed the hierarchical LASSO model, also known as sparse group LASSO [185]:  11,2 2 2minˆ aaXaya ga   , (5.7)  which is essentially group LASSO combined with a LASSO penalty. Reintroducing the LASSO penalty encourages internal sparsity such that only a sparse subset of predictors within each selected group is chosen [184]. This property of (5.7) is particularly useful in cases where one is uncertain about the exact group assignment.  To extend (5.7) to scenarios where multiple sets of samples, ys, are generated from different processes, e.g. when observations are collected from multiple subjects [193], but associated with the same set of predictors, e.g. using the same set of semantic features to predict brain activation patterns of different subjects [193], Sprechmann et al. [184] proposed the collaborative hierarchical LASSO model:    S s sK k F gFa aAXAYa h 1 11 2minˆ  , (5.8)  94  where S is the number of processes, Y = (y1, …, yS), A = (a1, …, aS), Agh are the rows of A belonging to group gh, and as is the coefficient vector of process s. || · ||F denotes the Frobenius norm. Minimizing (5.8) promotes the same groups to be selected across processes with the chosen sparse subset of predictors within each selected group being potentially different [184]. This model thus enables information across processes to be shared while providing flexibility in handling inter-process variability by allowing sparsity patterns within the selected groups to vary across processes.  All of the LASSO extensions described above provide effective means for modeling data structure through promoting groups of predictors to be jointly selected. However, the associations between predictors are ignored in these models, i.e. associated predictors may be jointly selected but the coefficients assigned to these predictors can substantially vary. Since the associations between predictors can be an important source of information for further constraining the ill-conditioned problem (5.1), we thus propose a simple approach to incorporate predictor associations into all of the models above, as discussed next.  5.2 Generalized Sparse Regularization To integrate properties beyond sparsity into sparse linear models as those described in Section 5.1, we propose complementing these models with the following penalty:  2 2)( aaJGSR  , (5.9)  where Γ = (γ1, …, γR)T is an R×M penalty matrix for modeling the associations between predictors, γr is an M×1 vector encoding the penalty for each association r, and R is the number of associations being modeled. For instance, if predictors xp and xq are associated with each other, say by virtue of being spatially adjacent voxels, then ap and aq should presumably be assigned similar values to reflect this intrinsic association. This can be accomplished by, e.g. setting γrp to 1 and γrq to –1 such that differences in ap and aq are  95 penalized. Many other context-specific properties can be analogously modeled by varying Γ as discussed in Section 5.4 and [187].  One may envisage other penalties to facilitate similar modeling flexibility. However, not all penalties added to the models in Section 5.1 will result in a practical optimization problem. The critical question is thus whether the optimization problem resulting from integrating (5.9) into models in Section 5.1 can be efficiently minimized. To address this question, we draw upon a basic property of the Euclidean norm:              z z zzz  1 2 21 2 2 ,  , (5.10)  where zω is a vector of arbitrary length. Since models (5.3), (5.4), (5.5) and (5.7) in combination with (5.10), can all be written in the form:  )(minˆ 22 2 2 aJaXaya a   , (5.11)  where λ ≥ 0 and J(a) is the sparse penalty term of the respective models, invoking (5.10) on (5.11) results in the following optimization problem:  )~(~~~min1ˆ 2 2~ aJaXya a   , (5.12)             X X y y 2 1 1~, 0 ~ , (5.13)  which has the exact same functional form as the optimization problems of the original sparse linear models. Thus, existing solvers of the respective sparse linear models [152, 190-192] can be directly employed to efficiently minimize (5.11).  96 For the multi-process model (5.8), if the associations between predictors are similar across processes, then the same matrix augmentation trick can be applied with the zero vector in (5.13) replaced by an R×S zero matrix. If associations between predictors vary across processes, a minor modification is needed:                      X X X Yvector y 00 0 0 00 1~, 0 )(~ 2 1     , (5.14)  where vector(·) is an operator that stacks the columns of its argument into a vector, and a in the first term of (5.11) is now vector(A). We highlight that, in addition to enabling different predictor associations to be modeled for each process, (5.14) permits modeling of predictor associations across processes. This modification of the matrix augmentation trick thus builds even greater flexibility into (5.8).  5.3 GSR for Classification The models described in Section 5.1 are inherently designed for regression problems with y being a continuous variable. To extend the GSR-augmented sparse regression models to the classification setting without altering the functional form of the associated optimization problems, we employ the spectral regression technique [194], which comprises two steps: Step 1. Learn the constraint-free optimal projection of the training data X, e.g. using graph embedding (GE) [195]:  DyWy  , (5.15)  where y is the projection of X on the subspace defined by W [195]. Wij models the intrinsic relationships between samples i and j of X, and D is a diagonal matrix with Dii = ∑jWij. We  97 note that the key advantage of GE is that it enables various subspace learning algorithms to be used as classifiers by simply varying W [195].  Step 2. Determine the classifier weights, a, such that y and Xa are as similar as possible under the desired constraints, Q(a):  )(minˆ 22 aQXaya a  . (5.16)  Clearly, setting Q(a) to 22a +J(a) results in our proposed GSR model (5.11). Hence, exploiting spectral regression in combination with GSR facilitates sparsity and predictor associations to be jointly integrated into classifier learning.  5.4 Application to fMRI Spatiotemporal Classification Given N M×1 feature vectors, xi, forming the rows of an N×M matrix, X, the general classification problem involves finding the corresponding N×1 label vector, l, containing the class label li of xi. We assume here that each feature xip can be naturally assigned to a group gh ϵ G = {g1, …, gH}. In the context of spatiotemporal fMRI classification, we treat the signal intensity of each brain voxel p at time tk within a trial as a feature, and all brain volumes within a trial of the same experimental condition as a sample xi, as illustrated in Figure 5.1. Our goal is to determine to which condition, li, does each concatenated set of brain volumes xi belongs. Since the brain is known to be functionally organized into specialized neural regions [196], this provides a natural grouping of the voxels.  98 T1 TNT t1 tNt v1 vM ... (V1,t1,T1) (vM,tNt,TNT) t1 tNt x1 xN vp v = voxel t = time T = trial   Figure 5.1  Predictor matrix. Brain volumes within the same trial are concatenated and treated a single sample xi. Nt is the number of volumes within a trial and NT is the number of trials for a particular experimental condition.  To model this modular structure of the human brain, as well as the intrinsic spatiotemporal correlations in brain activity, we build an anatomically-informed spatiotemporally smooth sparse LDA (ASTSLDA) classifier by first solving for y in (5.15) with:     otherwise cllm W jicij ,0 ,/1 , (5.17)  where mc is the number of samples in class c [194]. We then optimize (5.11) with J(a) set to the group LASSO penalty (6), and the signal intensity of the voxels within each ROI (Section 5.5) at each time point tk treated as a group. Γ is set as the spatiotemporal Laplacian operator:        ks skkk k sk pq qppp ps qp otherwise Nq ,, ,0 ,1  (5.18)  where Npk is the spatiotemporal neighborhood of voxel p at time tk, as shown in Figure 5.2. Specifically, 6-connected spatial neighbors and signal intensity of voxel p itself at adjacent time points are defined as the spatiotemporal neighbors.  99 tk+1tk-1 vpvp tk vp v = voxel t = time vq   Figure 5.2  Spatiotemporal neighborhood. 6-connected voxels and signal value of voxel p itself at adjacent time points are defined as neighbors of voxel p.  We can easily build other sparse LDA variants in an analogous manner. To build an anatomically-informed spatially smooth sparse LDA (ASSLDA) classifier, we employ (5.18) but with Npk being the 6-connected spatial neighborhood of voxel p. To build a sparse LDA classifier that is only anatomically-informed (ASLDA), we set λ in (5.11) to 0 with J(a) being the group LASSO penalty (5.6). Voxel-level sparse LDA classifiers that incorporates a spatiotemporal prior (STSLDA), a spatial prior (SSLDA), and no prior (SLDA) can be built in a similar manner as their anatomically-informed counterparts, but with J(a) in (5.11) being the LASSO penalty (5.3). LDA classifier with elastic net penalty (ENLDA) can also be built from (5.11) by setting Γ to identity and J(a) as the LASSO penalty (5.3). Our proposed model (5.11) thus encompasses many of the widely-used sparse linear models.  5.5 Materials The publicly available StarPlus database [117] was used for validation. We provide a brief description of the data and the experimental setup for convenience. Details can be found in [117]. In the StarPlus experiment, all subjects performed 40 trials of a sentence/picture matching task. In each trial, subjects were required to look at a picture (or sentence) followed by a sentence (or picture), and then decide whether the sentence (picture) correctly described the picture (sentence). The first stimulus was presented for 4 s followed by a blank screen for 4 s. The second stimulus was then presented for up to 4 s followed by a 15 s rest period. In half of the trials, the picture preceded the sentence, and vice versa.  100  Functional MRI brain volumes were acquired from 13 normal subjects at a TR of 500 ms, but only 6 of the subjects’ data are available online. Each subject’s dataset comprised voxel time courses within 25 ROIs, resulting in approximately 5000 voxels per subject. Inter-subject differences in the number of voxels were due to anatomical variability. Motion-correction and temporal detrending were applied on the voxel time courses to account for head motions and low frequency signal drifts. No spatial normalization was performed.  5.6 Results and Discussion To examine the benefits of incorporating a spatiotemporal prior in addition to enforcing sparsity, we treated the signal intensity of each voxel at each time point within a trial as a feature, and all brain volumes within the same trial of each experimental condition as a sample. The classification task was thus to discriminate sets of brain volumes associated with a picture from those associated with a sentence. To account for the delay in the HRF [24], we only used the 8 brain volumes collected 4 s after stimulus onset within each trial. Each sample thus consisted of approximately 40,000 features (i.e. roughly 5000 voxels × 8 volumes, see Section 3) with 40 samples per class (i.e. 40 trials per class) for each subject. For comparison, we contrasted ASTSLDA against LDA, linear SVM, SLDA, ENLDA, SSLDA, ASLDA, and ASSLDA. We employed the spectral projected gradient technique (SPG) [191] to minimize the optimization problem of the respective classifier models. SPG was chosen for its ability to efficiently handle large-scale problems (classifiers with ~40,000 coefficients in this study). All contrasted classifiers could be learned within a few seconds using SPG for a fixed set of parameter values λ and α (β was not involved in the classifier models contrasted in this work). Ten-fold nested cross validation was used [117] to select λ and α, and estimate the prediction accuracy. The average prediction accuracies over subjects are shown in Figure 5.3. An axial slice of the classifier weight patterns corresponding to 5.5 s and 6 s after stimulus onset (i.e. when HRF typically peaks) for 3 exemplar subjects are shown in Figure 5.4 for result interpretations.   101 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1 Av er ag e P re di ct io n A cc ur ac y .015 .015 .055 .068 .031 .349 .056 .138   Figure 5.3  Prediction accuracy comparisons. The p-values obtained by statistically comparing ASTSLDA against the other methods are indicated. Significance was declared at a p-value threshold of 0.05. ASTSLDA significantly outperformed the standard classifiers, LDA and SVM. The improvement over the other existing methods, namely SLDA, ENLDA, and ASLDA, were close to significant.  LDA resulted in the worst average prediction accuracy, which was likely due to overfitting. The classifier weight patterns also appeared spatially-random. Using linear SVM led to similar predictive performance and randomly-distributed weight patterns. Reducing overfitting using SLDA increased accuracy, but the weight patterns seemed overly-sparse. The reason was likely due to LASSO’s constraint on the number of non-zero elements in the classifier weight vector a, i.e. restricted by the sample size [2]. In fact, since we treated the signal intensity of the brain volumes within a trial as a single feature vector, weights would be spread across time, resulting in even sparser spatial patterns than what would be obtained in the conventional case where each brain volume is taken as a sample. Also, we observed little consistency in the SLDA weight patterns across subjects and time.   102 * ^ ++ * (a) LDA (b) SVM (c) SLDA (d) ENLDA (e) SSLDA (f) STSLDA (g) ASLDA (h) ASSLDA (i) ASTSLDA tk = 5.5 s tk = 5.5 s tk = 5.5 stk = 6 s tk = 6 s tk = 6 s  Figure 5.4  Classifier weight patterns. Weight patterns 5.5 s and 6 s after stimulus onset for 3 exemplar subjects. “*” = dorsolateral prefrontal cortex. “+” = temporal lobe. “^” = visual cortex. (a) LDA and (b) SVM resulted in randomly-distributed patterns. (c) SLDA generated overly-sparse patterns, which was mildly improved with (d) EN-LDA. (e) SSLDA and (f) STSLDA produced spatially smoother patterns. (g) ASLDA and (h) ASSLDA weight patterns displayed little consistency across time and subjects. (i) ASTSLDA patterns were more consistent across subjects and time compared to the contrasted classifiers.  103 Alleviating LASSO’s constraint on the number of non-zero elements in a using ENLDA [182] improved prediction accuracy over SLDA. However, the weight patterns remained overly-sparse, which demonstrate the highly-overlooked fact that higher prediction accuracy does not necessarily translate to more neurologically-sensible weight patterns. In contrast, modeling spatial correlations using SSLDA resulted in higher prediction accuracy and spatially smoother weight patterns. Our results thus illustrate the added-value of incorporating prior knowledge, both in terms of predictive performance and result interpretability. These benefits of exploiting prior knowledge were further exemplified using STSLDA, which obtained similar spatially smooth patterns but a further increase in accuracy. This increase likely arose from how STSLDA pooled information across brain volumes through penalizing discrepancies in classifier weights between spatiotemporal neighbors.  Accounting for the modular structure of the human brain using ASLDA improved prediction accuracy over the non-anatomically-informed classifiers except STSLDA, which again demonstrate the importance of constraining the model learning problem with prior knowledge (i.e. through grouping of associated features) to handle the curse of dimensionality. However, the weight patterns displayed little consistency between subjects and across time points. Modeling spatial correlations using ASSLDA resulted in higher accuracy than ASLDA, but it too obtained (slightly) lower accuracy than STSLDA. These results further support that exploiting the commonality between adjacent brain volumes, in addition to modeling spatial correlations, can greatly improve predictive performance.  Modeling both the modular property and spatiotemporal characteristics of brain activity using ASTSLDA resulted in the best overall predictive performance with an average accuracy of 91%, which is an 11% increase over LDA and SVM. ASTSLDA also achieved the lowest variability in prediction accuracy, thus demonstrating stable performance despite the considerable inter-subject variability seen in fMRI studies. Applying paired t-tests to statistically compare ASTSLDA against the other methods (Figure 5.3) showed that the increases in accuracy over the standard classifiers, LDA and SVM, were significant at a p- value threshold of 0.05. The accuracy increases over the other existing methods, namely SLDA, ENLDA, and ASLDA, were also close to significant. Provided that ASTSLDA  104 maintains the observed stability in performance, hence achieves similar level of accuracy for new subjects, we expect that the improvements over SLDA, ENLDA, and ASLDA would become statistically significant if more subjects are available [132]. We note that if we pool all samples together and ignore the factor of subject, we can assess whether two methods’ accuracy levels are significantly different using the McNemar’s test, which accounts for how accuracy levels are not normally-distributed. Qualitatively, higher consistency in weight patterns was observed across both subjects and time, with weights correctly assigned to the dorsolateral prefrontal cortex, which is responsible for verification task (e.g. decide if a sentence matches a picture) [197], as well as the visual cortex and the temporal lobe, which pertain to picture/sentence discrimination [198].   105 5.7 Summary The main challenge in fMRI classification is the need to learn very high dimensional classifiers given few noisy samples. The state-of-the-art feature selection methods that are commonly-used to handle this curse-of-dimensionality problem are summarized in Table 5.1.  Table 5.1  Summary of feature selection approaches for fMRI classification. Feature Selection Approaches Multivariate Data Structure Feature Associations Univariate [117] Multivariate [125, 126]  X Sparsity [120] X Structured Sparsity [119, 121, 129] X X  Selecting features using univariate techniques ignores the collective information encoded by the voxel signal patterns. Using multivariate feature selection techniques can alleviate this limitation, but these techniques do not model the structure within the data, e.g. adjacent voxels might not be jointly selected. Enforcing sparsity enables classifier learning and feature selection to be simultaneously and collaboratively performed, but again does not model data structure. Imposing structured sparsity penalties, such as group LASSO and elastic nets, account for data structure, but not feature associations, e.g. adjacent voxels might be jointly selected but might not be assigned similar weights. In fact, negligence of feature associations in classifier learning appears to be a common limitation to most existing methods for fMRI classification. Therefore, we proposed a simple, effective regularization technique that facilitates modeling of feature associations in addition to structured sparsity during classifier learning. We showed that adding sparsity, anatomical, and spatiotemporal priors to constrain the ill-conditioned fMRI classifier learning problem can substantially improve predictive accuracy over standard classifiers. Moreover, we demonstrated that incorporating relevant priors results in more neurologically-plausible classifier weight patterns.  106 6     Conclusions We presented in this thesis our technical contributions in the area of fMRI analysis. Our methods were based on novel incorporation of various pertinent priors to address key challenges facing current state-of-the-art approaches. We conclude here with a discussion of the strengths and weaknesses of our methods and some promising directions to explore.  6.1 Activation Detection The standard univariate approach completely ignores voxel interactions despite that such interactions may encode additional information for identifying the activated voxels. Current techniques that impose spatial priors account for local voxel interactions, but not the longer range interactions. We thus proposed modeling both types of interactions using RW [P1] and ANSPCA [P2], and showed that such information can improve detection sensitivity at the expense of potentially including noise-induced false correlations. This limitation was especially evident during synthetic testing at low SNR, hence suggesting that the extra information provided by voxel correlations estimated from the same dataset as that used for generating activation statistics might be limited. We thus proposed GMRF [P3, P5] to incorporate group information into intra-subject activation detection with improved detection sensitivity and stronger false positives control over RW shown. Also, GMRF detected a spatial focusing effect of L-dopa on the brain activation within the left M1 that was consistent across all PD subjects examined.  In addition to GMRF, we proposed GRW [P4] to integrate functional connectivity and group information. The activation detection results were similar to those obtained with GMRF, but as opposed to generating only binary output (i.e. active or non-active), GRW additionally provides posterior activation probabilities. From an interpretation perspective, having probability maps enables the experimenters to assess the uncertainty in the results. From a spatial pattern analysis point of view, since the probability maps are essentially “refined”  107 versions of the corresponding activation statistics maps, the spatial distribution of the activation probabilities would also encode important task-pertinent information, which can be captured using our proposed spatial characterization approach [P1]. A drawback to both GMRF and GRW is scalability. Having to store and analyze all subjects’ data within a group simultaneously would not scale well to databases with hundreds of subjects. Hence, GMRF and GRW are more suitable for clinical studies, where only ten to twenty subjects are typically recruited. For larger studies, some methods for sub-grouping the subjects based on demographic data or functional similarities will be needed.  Another complication to using GMRF and GRW, as well as RW and ANSPCA, is the need to “properly” balance the degree of influence between the activation statistics and the connectivity and group priors in delineating the activated voxels. Currently, we set all functional attributes on equal footings, but learning the relative weighting from data would be advantageous. One rigorous way to determine this weighting is to define a generative model and choose the weighting based on maximum model evidence. We thus proposed a connectivity-informed generative activation model [P6] that permits exact closed-form solutions for both the posterior activation effects estimates and the model evidence. We validated our model on a large cohort of 65 subjects and demonstrated that integrating a RS- connectivity prior improves sensitivity in detecting group activation. Our results thus support that the correlation structure of task activation effects is pertinent to RS connectivity, and this relationship is common across subjects. The flexibility of this model permits integration of other priors, such as group priors as used in GMRF and GRW and anatomical connectivity priors learned from diffusion MRI data, which would be promising directions to explore. Also, it would be interesting to add RS-connectivity prior to RW, ANSPCA, GMRF, and GRW, and compare results with our connectivity-informed activation model in the future.   108 6.2 ROI Characterization Conventional ROI analysis employs features that only capture amplitude changes in activation statistics without accounting for the spatial distribution of regional activation. During my Master studies, we proposed using invariant spatial features to model information encoded in the spatial patterns [81, 82]. In this thesis, we focused on further validating this analysis approach. In particular, we showed over a large cohort of 65 subjects undergoing ten different experimental conditions that brain activation can generally be inferred from the spatial modulations of regional BOLD distributions, as measured with our proposed invariant spatial features [P11].  We also explored the benefits of our spatial characterization approach for analyzing ROI activation statistics [81]. Specifically, we showed over more subjects that using our proposed spatial features provides greater task discriminability than traditional amplitude-based features [P7]. In fact, we detected the same spatial focusing effect of L-dopa on PD subjects’ regional activation [P9] as found with GMRF and GRW, as well as task-difficulty-related changes in the spread of regional activation [P10]. Also, we investigated into the temporal dynamics of spatial response, and found that its peak time antecedes that of BOLD amplitude changes [P12]. Furthermore, we showed that applying our spatial characterization approach [81] to warp-free activation statistics patterns outperformed the same approach applied to spatially normalized data [P8], at least with the registration technique employed [64]. Our results thus suggest that certain information might be lost during spatial normalization. Nevertheless, high dimensional registration techniques, such as HAMMER [62], might better preserve the commonalities in spatial patterns across subjects than the registration technique used in this work. Comparing these high dimensional techniques against warp-free spatial analysis would thus be beneficial in providing further insights into the pro’s and con’s of spatial normalization.  In most of our studies, we concentrated on examining the spatial extent of activation, since this feature is easy to interpret. Nevertheless, there are other spatial phenomena that have  109 simple physical meanings. In particular, we proposed a spatial feature that measures the skewness in the spatial distribution of regional BOLD distributions [P13]. Our results on real data (Section 2.7.1) suggest that this skewness feature can play a complementary role to more fully describe task-induced spatial changes. An interesting question is which feature is most dominant. We currently examined each of our spatial features separately to infer brain activation. Determining the optimal weightings to combine these spatial features in detecting activation may further improve sensitivity, and provide us an intuition as to which spatial feature is most task-relevant.  A limitation to our spatial characterization approach, in fact ROI analysis in general, is the need to define ROIs. Currently, we define the ROIs based on neuroanatomy, but some of the anatomical ROIs may consist of multiple functional subregions. Thus, extending our prior- informed activation model to divide the anatomical ROIs into functional subregions would be beneficial. Other applications of our spatial characterization approach include inferring functional connectivity from the spatial feature time courses [P14]. Our results demonstrate such feasibility, but the exact benefits and physical interpretation of the results are currently unclear, and thus, in need for further investigations.  110 6.3 Functional Connectivity Analysis The standard seed-based approach is limited in the type of networks that may be detected. Using decomposition techniques, such as PCA and ICA, enables functional networks to be identified in an exploratory manner. However, drawing a correspondence between the intra- subject spatial maps derived from PCA and ICA to infer group networks is nontrivial due to the large inter-subject variability seen in fMRI studies. Also, the diffused weighting problem of PCA and ICA renders delineation of network nodes from the spatial maps difficult. We thus proposed GRD [P15, P16] to incorporate group information in identifying common sparse networks across a group of subjects. Enforcing sparsity simplifies the delineation of network nodes, while incorporating group information eases the correspondence problem. Applying GRD to synthetic data correctly detected networks that perfectly matched with ground truth. In contrast, PCA, RD, ICA, and Group ICA were found to be prone to inter- subject variability, demonstrating the effectiveness of GRD in handling subject variability. When applied to real fMRI data, GRD detected task-specific group networks that matched well with prior neuroscience knowledge. Moreover, our results provide new explanations for the observed differences in symptoms between PDt and PDar, suggesting that altered brain networks may be implicative to the PD subtype differences.  A limitation of GRD is the need for storing large correlation matrices. Hence, GRD does not easily extend to voxel level analysis. Also, the memory requirement scales up with the number subjects. Nevertheless, if we instead divide the brain into ~1,000 regions, GRD can easily handle up to ~100 subjects. For future work, it will be interesting to examine whether differences in subject-specific regional weightings are correlated with e.g. diseased severity or genetic factors. Also, applying GRD to diffusion MRI data and comparing the resulting networks with that derived from RS-fMRI data would enable us to further our understanding of the relationships between anatomical and functional connectivity.   111 6.4 Brain Decoding In fMRI classification, the number of features typically far exceeds the number of samples. Naively applying standard classifiers will thus result in overfitting. Implicitly selecting features by enforcing sparsity can lead to spurious classifier weight patterns. We thus proposed GSR [P17] to model feature associations, in addition to sparsity. We showed that incorporating spatial [P20], spatiotemporal [P19], and anatomical priors [P18] can enhance predictive performance and generate more neurologically plausible classifier weight patterns. However, drawing a balance between mis-classification errors and the amount of regularization usually involves internal cross validation, which is very computationally demanding. Also, the anatomical prior that we used jointly assigns weights to all voxels within the selected ROIs. However, some larger anatomical ROIs may comprise multiple functional sub-regions, thus dividing these larger ROIs into functional parcels [130] might further improve results. Alternatively, using hierarchical LASSO [184] to prune out the non- relevant voxels within the selected anatomical ROIs would be a promising direction to investigate. Furthermore, it would be interesting to explore other priors, such as RS- connectivity prior and group prior, analogous to our approach taken for activation detection.   112 References [1] World Health Organization, Neurological disorders: Public health challenges. Switzerland: WHO Press, 2006. [2] NeuroScience Canada. Brain Facts. Available: [3] Canadian Institute for Health Information, The burden of neurological diseases, disorders and injuries in Canada. Ottawa: CIHI, 2007. [4] M. F. Beal, A. E. Lang, and A. C. Ludolph, Neurodegenerative diseases: Neurobiology, pathogenesis, and therapeutics: Cambridge University Press, 2005. [5] P. Mehta, A. Kifley, J. Wang, E. Rochtchina, P. Mitchell, and C. Sue, "Population prevalence and incidence of Parkinson’s disease in an Australian community," Intern. Med. Journal, vol. 37, pp. 812-814, 2007. [6] G. McKhann, D. Drachman, M. Folstein, R. Katzman, D. Price, and E. M. Stadlan, "Clinical diagnosis of Alzheimer's disease," Neurology, vol. 34, pp. 939-944, 1984. [7] D. J. Gelb, E. Oliver, and S. Gilman, "Diagnostic criteria for Parkinson disease," Arch. Neurology, vol. 56, pp. 33-39, 1999. [8] I. Halperin, M. Morelli, A. D. Korczyn, M. B. H. Youdim, and S. A. Mandel, "Biomarkers for evaluation of clinical efficacy of multipotential neuroprotective drugs for Alzheimer's and Parkinson's diseases," Neurotherapeutics, vol. 6, pp. 128- 140, 2009. [9] S. T. DeKosky and K. Marek, "Looking backward to move forward: Early detection of neurodegenerative disorders," Science, vol. 302, pp. 830-834, 2003. [10] C. R. Jack, R. C. Petersen, Y. Xu, P. C. O'Brien, G. E. Smith, R. J. Ivnik, E. G. Tangalos, and E. Kokmen, "The rate of medial temporal lobe atrophy in typical aging and Alzheimer's disease," Neurology, vol. 51, pp. 993-999, 1998. [11] N. De Stefano, M. Battaglini, and S. M. Smith, "Measuring brain atrophy in multiple sclerosis," J. Neuroimaging, vol. 17, pp. 10S-15S, 2007. [12] R. Stahl, O. Dietrich, S. J. Teipel, H. Hampel, M. F. Reiser, and S. O. Schoenberg, "White matter damage in Alzheimer disease and mild cognitive impairment:  113 Assessment with diffusion-tensor MR imaging and parallel imaging techniques," Radiology, vol. 243, pp. 483-492, 2007. [13] H. W. Berendse and C. J. Stam, "Stage-dependent patterns of disturbed neural synchrony in Parkinson's disease," Parkinsonism Relat. Disord., vol. 13, pp. S440- S445, 2007. [14] S. Thobois, S. Guillouet, and E. Broussolle, "Contributions of PET and SPECT to the understanding of the pathophysiology of Parkinson's disease," Neurophysiol. Clin., vol. 31, pp. 321-340, 2001. [15] L. Mosconi, A. Pupi, and M. J. De Leon, "Brain glucose hypometabolism and oxidative stress in preclinical Alzheimer's disease," Ann. N. Y. Acad. Sci., vol. 1147, pp. 180-195, 2008. [16] S. J. Li, Z. Li, G. Wu, M. J. Zhang, M. Franczak, and P. G. Antuono, "Alzheimer disease: Evaluation of a functional MR imaging index as a marker," Radiology, vol. 225, pp. 253-259, 2002. [17] P. Pantano, C. Mainero, and F. Caramia, "Functional brain reorganization in multiple sclerosis: Evidence from fMRI studies," J. Neuroimaging, vol. 16, pp. 104-114, 2006. [18] K. J. Kokotilo, J. J. Eng, and L. A. Boyd, "Reorganization of brain function during force production after stroke: A systematic review of the literature," J. Neurol. Phys. Ther., vol. 33, p. 45, 2009. [19] S. Ogawa, T. Lee, A. Kay, and D. Tank, "Brain magnetic resonance imaging with contrast dependent on blood oxygenation," Proc. Natl. Acad. Sci. U.S.A., vol. 87, pp. 9868-9872, 1990. [20] M. D. Fox and M. Greicius, "Clinical applications of resting state functional connectivity," Front. Syst. Neurosci., vol. 4, p. 19, 2010. [21] G. E. Sarty, Computing brain activity maps from fMRI time-series images. New York: Cambridge University Press, 2007. [22] V. Calhoun. Preprocessing III & IV: Anatomic transformation to standard brain spaces. Available: [23] S. C. Strother, "Evaluating fMRI preprocessing pipelines," IEEE Engin. Med. Biol., vol. 25, pp. 27-41, 2006.  114 [24] G. H. Glover, "Deconvolution of impulse response in event-related BOLD fMRI," NeuroImage, vol. 9, pp. 416-429, 1999. [25] R. N. A. Henson, C. Buechel, O. Josephs, and K. J. Friston, "The slice-timing problem in event-related fMRI," NeuroImage, vol. 9, p. S125, 1999. [26] K. J. Friston, J. Ashburner, C. D. Frith, J. B. Poline, J. D. Heather, and R. S. J. Frackowiak, "Spatial registration and normalization of images," Hum. Brain Mapp., vol. 3, pp. 165-189, 1995. [27] A. P. Holmes, O. Josephs, C. Buchel, and K. J. Friston, "Statistical modeling of low- frequency confounds in fMRI," NeuroImage, vol. 5, p. S480, 1997. [28] K. J. Worsley, C. H. Liao, J. Aston, V. Petre, G. H. Duncan, F. Morales, and A. C. Evans, "A general statistical analysis for fMRI data," NeuroImage, vol. 15, pp. 1-15, 2002. [29] K. J. Worsley, S. Marrett, P. Neelin, A. C. Vandal, K. J. Friston, and A. C. Evans, "A unified statistical approach for determining significant signals in images of cerebral activation," Hum. Brain Mapp., vol. 4, pp. 58-73, 1996. [30] R. S. Frackowiak, K. J. Friston, C. D. Frith, R. Dolan, J. Ashburner, C. Price, W. Penny, and S. Zeki, Human brain function: Academic Press, 2004. [31] K. J. Friston, A. P. Holmes, K. J. Worsley, J. B. Poline, C. D. Frith, and R. S. J. Frackowiak, "Statistical parametric maps in functional imaging: A general linear approach," Hum. Brain Mapp., vol. 2, pp. 189-210, 1995. [32] R. T. Constable, P. Skudlarski, E. Mencl, K. R. Pugh, R. K. Fulbright, C. Lacadie, S. E. Shaywitz, and B. A. Shaywitz, "Quantifying and comparing region-of-interest activation patterns in functional brain MR imaging: Methodology considerations," Magn. Reson. Imaging, vol. 16, pp. 289-300, 1998. [33] R. Buck, H. Singhal, J. Arora, H. Schlitt, and R. T. Constable, "Detecting change in BOLD signal between sessions for atlas-based anatomical ROIs," NeuroImage, vol. 40, pp. 1157-1165, 2008. [34] K. Friston, C. Frith, P. Fiddle, and R. Frackowiak, "Functional connectivity: The principal-component analysis of large (PET) data sets," J. Cereb. Blood Flow Metab., vol. 3, pp. 5-14, 1993.  115 [35] K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby, "Beyond mind-reading: Multi-voxel pattern analysis of fMRI data," Trends Cogn. Sci., vol. 10, pp. 424-430, 2006. [36] J. D. Haynes and G. Rees, "Decoding mental states from brain activity in humans," Nat. Rev. Neurosci., vol. 7, pp. 523-534, 2006. [37] K. Friston, C. Frith, and R. Frackowiak, "Time dependent changes in effective connectivity measured with PET," Hum. Brain Mapp., vol. 1, pp. 69-79, 1993. [38] S. Makni, P. Ciuciu, J. Idier, and J. B. Poline, "Joint detection-estimation of brain activity in functional MRI: A multichannel deconvolution solution," IEEE Trans. Sig. Proc., vol. 53, pp. 3488-3502, 2005. [39] P. M. Matthews, G. D. Honey, and E. T. Bullmore, "Applications of fMRI in translational medicine and clinical practice," Nat. Rev. Neurosci., vol. 7, pp. 732-44, 2006. [40] T. E. Nichols and A. P. Holmes, "Nonparametric permutation tests for functional neuroimaging: A primer with examples," Hum. Brain Mapp., vol. 15, pp. 1-25, 2002. [41] K. Worsley, S. Marrett, P. Neelin, and A. Evans, "Searching scale space for activation in PET images," Hum. Brain Mapp., vol. 4, pp. 74-90, 1996. [42] J. B. Poline and B. M. Mazoyer, "Analysis of individual brain activation maps using hierarchical description and multiscale detection," IEEE Trans. Med. Imaging, vol. 13, pp. 702-710, 1994. [43] S. Makni, J. Idier, T. Vincent, B. Thirion, G. Dehaene-Lambertz, and P. Ciuciu, "A fully Bayesian approach to the parcel-based detection-estimation of brain activity in fMRI," NeuroImage, vol. 41, pp. 941-969, 2008. [44] B. S. Everitt and E. T. Bullmore, "Mixture model mapping of brain activation in functional magnetic resonance images," Hum. Brain Mapp., vol. 7, pp. 1-14, 1999. [45] N. V. Hartvig and J. L. Jensen, "Spatial mixture modeling of fMRI data," Hum. Brain Mapp., vol. 11, pp. 233-248, 2000. [46] M. W. Woolrich, T. E. J. Behrens, C. F. Beckmann, and S. M. Smith, "Mixture models with adaptive spatial regularization for segmentation with an application to fMRI data," IEEE Trans. Med. Imaging, vol. 24, pp. 1-11, 2005.  116 [47] M. W. Woolrich and T. E. Behrens, "Variational Bayes inference of spatial mixture models for segmentation," IEEE Trans. Med. Imaging, vol. 25, pp. 1380-1391, 2006. [48] X. Descombes, F. Kruggel, and D. Y. Von Cramon, "Spatio-temporal fMRI analysis using Markov random fields," IEEE Trans. Med. Imaging, vol. 17, pp. 1028-1039, 1998. [49] M. Svensen, F. Kruggel, and D. Y. Von Cramon, "Probabilistic modeling of single- trial fMRI data," IEEE Trans. Med. Imaging, vol. 19, pp. 25-35, 2000. [50] W. Ou, W. M. Wells III, and P. Golland, "Combining spatial priors and anatomical information for fMRI detection," Med. Image Ana., vol. 14, pp. 318-331, 2010. [51] U. E. Ruttimann, M. Unser, R. R. Rawlings, D. Rio, N. F. Ramsey, V. S. Mattay, D. W. Hommer, J. A. Frank, and D. R. Weinberger, "Statistical analysis of functional MRI data in the wavelet domain," IEEE Trans. Med. Imaging, vol. 17, pp. 142-154, 1998. [52] C. R. Genovese, N. A. Lazar, and T. Nichols, "Thresholding of statistical maps in functional neuroimaging using the false discovery rate," NeuroImage, vol. 15, pp. 870-878, 2002. [53] D. Van De Ville, T. Blu, and M. Unser, "Surfing the brain," IEEE Engin. Med. Biol., vol. 25, pp. 65-78, 2006. [54] D. Van De Ville, T. Blu, and M. Unser, "Integrated wavelet processing and spatial statistical testing of fMRI data," NeuroImage, vol. 23, pp. 1472-1485, 2004. [55] W. D. Penny, N. J. Trujillo-Barreto, and K. J. Friston, "Bayesian fMRI time series analysis with spatial priors," NeuroImage, vol. 24, pp. 350-362, 2005. [56] L. Harrison, W. Penny, J. Ashburner, N. Trujillo-Barreto, and K. Friston, "Diffusion- based spatial priors for imaging," NeuroImage, vol. 38, pp. 677-695, 2007. [57] G. Flandin and W. D. Penny, "Bayesian fMRI data analysis with sparse spatial basis function priors," NeuroImage, vol. 34, pp. 1108-1125, 2007. [58] M. W. Woolrich, M. Jenkinson, J. M. Brady, and S. M. Smith, "Fully Bayesian spatio-temporal modeling of fMRI data," IEEE Trans. Med. Imaging, vol. 23, pp. 213-231, 2004. [59] C. Gössl, D. P. Auer, and L. Fahrmeir, "Bayesian spatiotemporal inference in functional magnetic resonance imaging," Biometrics, vol. 57, pp. 554-562, 2001.  117 [60] A. C. Evans, D. L. Collins, S. R. Mills, E. D. Brown, R. L. Kelly, and T. M. Peters, "3D statistical neuroanatomical models from 305 MRI volumes," San Francisco, C.A., 1993, pp. 1813-1817. [61] H. B. M. Uylings, G. Rajkowska, E. Sanz-Arigita, K. Amunts, and K. Zilles, "Consequences of large interindividual variability for human brain atlases: Converging macroscopical imaging and microscopical neuroanatomy," Anat. Embryol., vol. 210, pp. 423-431, 2005. [62] D. Shen and C. Davatzikos, "HAMMER: hierarchical attribute matching mechanism for elastic registration," IEEE Trans. Med. Imaging, vol. 21, pp. 1421-1439, 2002. [63] J. Ashburner and K. J. Friston, "Nonlinear spatial normalization using basis functions," Hum. Brain Mapp., vol. 7, pp. 254-266, 1999. [64] J. Ashburner and K. J. Friston, "Unified segmentation," NeuroImage, vol. 26, pp. 839-851, 2005. [65] A. M. Tahmasebi, P. Abolmaesumi, Z. Z. Zheng, K. G. Munhall, and I. S. Johnsrude, "Reducing inter-subject anatomical variation: Effect of normalization method on sensitivity of functional magnetic resonance imaging data analysis in auditory cortex and the superior temporal region," NeuroImage, vol. 47, pp. 1522-1531, 2009. [66] M. Brett, I. S. Johnsrude, and A. M. Owen, "The problem of functional localization in the human brain," Nat. Rev. Neurosci., vol. 3, pp. 243-249, 2002. [67] B. Thirion, P. Pinel, A. Tucholka, A. Roche, P. Ciuciu, J. F. Mangin, and J. B. Poline, "Structural analysis of fMRI data revisited: Improving the sensitivity and reliability of fMRI group studies," IEEE Trans. Med. Imaging, vol. 26, pp. 1256-1269, 2007. [68] B. Thirion, A. Tucholka, M. Keller, P. Pinel, A. Roche, J. F. Mangin, and J. B. Poline, "High level group analysis of FMRI data based on Dirichlet process mixture models," in Proc. Int. Conf. Information Processing in Medical Imaging, KerKrade, Netherlands, 2007, pp. 482-494. [69] C. M. Bishop, Pattern recognition and machine learning. New York, N.Y.: Springer 2006. [70] P. Schuster and K. Sigmund, "Replicator dynamics," J. Theor. Biol., vol. 100, pp. 533-538, 1983.  118 [71] M. Keller, A. Roche, A. Tucholka, and B. Thirion, "Dealing with spatial normalization errors in fMRI group inference using hierarchical modeling," Stat. Sinica, vol. 18, pp. 1357–1374, 2008. [72] V. Calhoun, T. Adali, G. Pearlson, and J. Pekar, "A method for making group inferences from functional MRI data using independent component analysis," Hum. Brain Mapp., vol. 14, pp. 140-151, 2001. [73] M. J. McKeown, S. Makeig, G. G. Brown, T. P. Jung, S. S. Kindermann, A. J. Bell, and T. J. Sejnowski, "Analysis of fMRI data by blind separation into independent spatial components," Hum. Brain Mapp., vol. 6, pp. 160-188, 1998. [74] D. R. Bathula, L. H. Staib, H. D. Tagare, X. Papademetris, R. T. Schultz, and J. S. Duncan, "Multi-group functional MRI analysis using statistical activation priors," in Proc. Workshop fMRI Data Analysis held in conjunction with Int. Conf. Medical Image Computing and Computer Assisted Intervention, New York, N.Y., 2009, pp. 5- 12. [75] S. Kim, P. Smyth, and H. Stern, "A Bayesian mixture approach to modeling spatial activation patterns in multisite fMRI data," IEEE Trans. Med. Imaging, vol. 29, pp. 1260-1274, 2010. [76] D. J. Moore, A. B. West, V. L. Dawson, and T. M. Dawson, "Molecular pathophysiology of Parkinson's disease," Ann. Rev. Neurosci., vol. 28, pp. 57-87, 2005. [77] A. Nieto-Castanon, S. S. Ghosh, J. A. Tourville, and F. H. Guenther, "Region of interest based analysis of functional imaging data," NeuroImage, vol. 19, pp. 1303- 1316, 2003. [78] G. Allen and E. Courchesne, "Differential effects of developmental cerebellar abnormality on cognitive and motor functions in the cerebellum: An fMRI study of autism," Am. J. Psychiatry, vol. 160, pp. 262-273, 2003. [79] H. J. Aizenstein, K. A. Clark, M. A. Butters, J. Cochran, V. A. Stenger, C. C. Meltzer, C. F. Reynolds, and C. S. Carter, "The BOLD hemodynamic response in healthy aging," J. Cog. Neurosci., vol. 16, pp. 786-793, 2004.  119 [80] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, and P. Pietrini, "Distributed and overlapping representations of faces and objects in ventral temporal cortex," Science, vol. 293, pp. 2425-2429, 2001. [81] B. Ng, R. Abugharbieh, X. Huang, and M. J. McKeown, "Characterizing fMRI activations within regions of interest (ROIs) using 3D moment invariants," in Proc. IEEE Workshop Mathematical Methods in Biomedical Image Analysis held in conjunction with Int. Conf. Computer Vision and Pattern Recognition, New York, N.Y., 2006, p. 63 (8 pages). [82] B. Ng, R. Abugharbieh, S. J. Palmer, and M. J. McKeown, "Characterizing task- related temporal dynamics of spatial activation distributions in fMRI BOLD signals," in Proc. Int. Conf. Medical Image Computing and Computer Assisted Intervention, Brisbane, Australia, 2007, pp. 767-774. [83] A. Uthama, R. Abugharbieh, S. J. Palmer, A. Traboulsee, and M. J. McKeown, "SPHARM-based spatial fMRI characterization with intersubject anatomical variability reduction," IEEE J. Sel. Topics Sig. Proc. , vol. 2, pp. 907-918, 2008. [84] B. P. Rogers, V. L. Morgan, A. T. Newton, and J. C. Gore, "Assessing functional connectivity in the human brain by fMRI," Magn. Reson. Imaging, vol. 25, pp. 1347- 1357, 2007. [85] M. Guye, F. Bartolomei, and J. P. Ranjeva, "Imaging structural and functional connectivity: Towards a unified definition of human brain organization?," Curr. Opin. Neurol., vol. 21, pp. 393-403, 2008. [86] B. Biswal, F. Z. Yetkin, V. M. Haughton, and J. S. Hyde, "Functional connectivity in the motor cortex of resting human brain using echo-planar MRI," Magn. Reson. Med., vol. 34, pp. 537-541, 1995. [87] M. P. Van Den Heuvel and H. E. Hulshoff Pol, "Exploring the brain network: A review on resting-state fMRI functional connectivity," Eur. Neuropsychopharmacology, vol. 20, pp. 519-534, 2010. [88] M. D. Fox and M. E. Raichle, "Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging," Nat. Rev. Neurosci., vol. 8, pp. 700- 711, 2007.  120 [89] M. D. Greicius, K. Supekar, V. Menon, and R. F. Dougherty, "Resting-state functional connectivity reflects structural connectivity in the default mode network," Cerebral Cortex, vol. 19, pp. 72-78, 2009. [90] S. M. Smith, P. T. Fox, K. L. Miller, D. C. Glahn, P. M. Fox, C. E. Mackay, N. Filippini, K. E. Watkins, R. Toro, and A. R. Laird, "Correspondence of the brain's functional architecture during activation and rest," Proc. Nat. Acad. Sci., vol. 106, pp. 13040-13045, 2009. [91] M. Mennes, C. Kelly, X. N. Zuo, A. Di Martino, B. B. Biswal, F. X. Castellanos, and M. P. Milham, "Inter-individual differences in resting-state functional connectivity predict task-induced BOLD activity," NeuroImage, vol. 50, pp. 1690-1701, 2010. [92] J. S. Damoiseaux, S. Rombouts, F. Barkhof, P. Scheltens, C. J. Stam, S. M. Smith, and C. F. Beckmann, "Consistent resting-state networks across healthy subjects," Proc. Nat. Acad. Sci., vol. 103, pp. 13848-13853, 2006. [93] K. E. Stephan, L. Kasper, L. M. Harrison, J. Daunizeau, H. E. M. den Ouden, M. Breakspear, and K. J. Friston, "Nonlinear dynamic causal models for fMRI," NeuroImage, vol. 42, pp. 649-662, 2008. [94] J. Kim, W. Zhu, L. Chang, P. M. Bentler, and T. Ernst, "Unified structural equation modeling approach for the analysis of multisubject, multivariate functional MRI data," Hum. Brain Mapp., vol. 28, pp. 85-93, 2007. [95] W. D. Penny, K. E. Stephan, A. Mechelli, and K. J. Friston, "Modelling functional integration: a comparison of structural equation and dynamic causal models," NeuroImage, vol. 23, pp. S264-S274, 2004. [96] K. Li, L. Guo, J. Nie, G. Li, and T. Liu, "Review of methods for functional brain connectivity detection using fMRI," Comput. Med. Imaging Graph., vol. 33, pp. 131- 139, 2009. [97] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, "Functional connectivity in the resting brain: A network analysis of the default mode hypothesis," Proc. Natl. Acad. Sci. U.S.A., vol. 100, p. 253, 2003. [98] L. Ma, B. Wang, X. Chen, and J. Xiong, "Detecting functional connectivity in the resting brain: A comparison between ICA and CCA," Magn. Reson. Imaging, vol. 25, pp. 47-56, 2007.  121 [99] V. G. Van De Ven, E. Formisano, D. Prvulovic, C. H. Roeder, and D. E. J. Linden, "Functional connectivity as revealed by spatial independent component analysis of fMRI measurements during rest," Hum. Brain Mapp., vol. 22, pp. 165-178, 2004. [100] C. Goutte, P. Toft, E. Rostrup, F. Å. Nielsen, and L. K. Hansen, "On clustering fMRI time series," NeuroImage, vol. 9, pp. 298-310, 1999. [101] D. Cordes, V. Haughton, J. D. Carew, K. Arfanakis, and K. Maravilla, "Hierarchical clustering to measure connectivity in fMRI resting-state data," Magn. Reson. Imaging, vol. 20, pp. 305-317, 2002. [102] R. Zass and A. Shashua, "Nonnegative sparse PCA," in Proc. Int. Conf. Neural Information Processing Systems, 2007, pp. 1561-1568. [103] C. D. Sigg and J. M. Buhmann, "Expectation-maximization for sparse and non- negative PCA," in Proc. Int. Conf. Machine Learning, Helsinki, Finland, 2008, pp. 960-967. [104] F. De Martino, F. Gentile, F. Esposito, M. Balsi, F. Di Salle, R. Goebel, and E. Formisano, "Classification of fMRI independent components using IC-fingerprints and support vector machine classifiers," NeuroImage, vol. 34, pp. 177-194, 2007. [105] P. Metzak, E. Feredoes, Y. Takane, L. Wang, S. Weinstein, T. Cairo, E. T. C. Ngan, and T. S. Woodward, "Constrained principal component analysis reveals functionally connected load dependent networks involved in multiple stages of working memory," Hum. Brain Mapp., vol. 32, pp. 856-871, 2010. [106] J. Friedman, T. Hastie, and R. Tibshirani, "Sparse inverse covariance estimation with the graphical LASSO," Biostat., vol. 9, pp. 432-441, 2008. [107] S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. Nichols, J. D. Ramsey, and M. W. Woolrich, "Network modelling methods for fMRI," NeuroImage, vol. 2, pp. 875-891, 2010. [108] G. Varoquaux, A. Gramfort, J. B. Poline, and B. Thirion, "Brain covariance selection: Better individual functional connectivity models using population prior," in Proc. Int. Conf. Advances in Neural Information Processing Systems, Vancouver, Canada, 2010, pp. 2334-2342. [109] D. Lashkari, E. Vul, N. Kanwisher, and P. Golland, "Discovering structure in the space of fMRI selectivity profiles," NeuroImage, vol. 50, pp. 1085-1098, 2010.  122 [110] V. J. Schmithorst and S. K. Holland, "Comparison of three methods for generating group statistical inferences from independent component analysis of functional magnetic resonance imaging data," J. Magn. Reson. Imaging, vol. 19, pp. 365-368, 2004. [111] A. L. W. Bokde, M. A. Tagamets, R. B. Friedman, and B. Horwitz, "Functional interactions of the inferior frontal cortex during the processing of words and word- like stimuli," Neuron, vol. 30, pp. 609-617, 2001. [112] G. Lohmann and S. Bohn, "Using replicator dynamics for analyzing fMRI data of the human brain," IEEE Trans. Med. Imaging, vol. 21, pp. 485-492, 2002. [113] M. S. Goncalves, D. A. Hall, I. S. Johnsrude, and M. P. Haggard, "Can meaningful effective connectivities be obtained between auditory cortical regions?," NeuroImage, vol. 14, pp. 1353-1360, 2001. [114] C. F. Beckmann, C. E. Mackay, N. Filippini, and S. M. Smith, "Group comparison of resting-state fMRI data using multi-subject ICA and dual regression," NeuroImage, vol. 47, p. S148, 2009. [115] D. A. Fair, N. U. F. Dosenbach, J. A. Church, A. L. Cohen, S. Brahmbhatt, F. M. Miezin, D. M. Barch, M. E. Raichle, S. E. Petersen, and B. L. Schlaggar, "Development of distinct control networks through segregation and integration," Proc. Nat. Acad. Sci., vol. 104, pp. 13507-13512, 2007. [116] L. Deuker, E. T. Bullmore, M. Smith, S. Christensen, P. J. Nathan, B. Rockstroh, and D. S. Bassett, "Reproducibility of graph metrics of human brain functional networks," NeuroImage, vol. 47, pp. 1460-1468, 2009. [117] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman, "Learning to decode cognitive states from brain images," Mach. Learn., vol. 57, pp. 145-175, 2004. [118] J. D. Haynes and G. Rees, "Predicting the orientation of invisible stimuli from activity in human primary visual cortex," Nat. Neurosci., vol. 8, pp. 686-691, 2005. [119] M. K. Carroll, G. A. Cecchi, I. Rish, R. Garg, and A. R. Rao, "Prediction and interpretation of distributed neural activity with sparse models," NeuroImage, vol. 44, pp. 112-122, 2009.  123 [120] O. Yamashita, M. Sato, T. Yoshioka, F. Tong, and Y. Kamitani, "Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns," NeuroImage, vol. 42, pp. 1414-1429, 2008. [121] S. Ryali, K. Supekar, D. A. Abrams, and V. Menon, "Sparse logistic regression for whole-brain classification of fMRI data," NeuroImage, vol. 51, pp. 752-764, 2010. [122] V. Michel, E. Eger, C. Keribin, and B. Thirion, "Multi-class sparse Bayesian regression for neuroimaging data analysis," in Proc. Workshop on Machine Learning in Medical Imaging held in conjunction with Int. Conf. Medical Image Computing and Computer Assisted Intervention, Beijing, China, 2010, pp. 50-57. [123] T. A. Carlson, P. Schrater, and S. He, "Patterns of activity in the categorical representations of objects," J. Cogn. Neurosci., vol. 15, pp. 704-717, 2003. [124] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Patt. Ana. Mach. Intell., vol. 22, pp. 4-37, 2000. [125] S. J. Hanson and Y. O. Halchenko, "Brain reading using full brain support vector machines for object recognition: There is no “face” identification area," Neural Comput., vol. 20, pp. 486-503, 2008. [126] G. Langs, B. H. Menze, D. Lashkari, and P. Golland, "Detecting stable distributed patterns of brain activation using gini contrast," NeuroImage, vol. 56, pp. 497-507, 2010. [127] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal Stat. Soc. Series B, vol. 58, pp. 267-288, 1996. [128] L. Liang, V. Cherkassky, and D. A. Rottenberg, "Spatial SVM for feature selection and fMRI activation detection," in Proc. Int. Joint Conf. on Neural Networks, Vancouver, Canada, 2006, pp. 1463-1469. [129] M. Van Gerven, A. Takashima, and T. Heskes, "Selecting and identifying regions of interest using groupwise regularization," in Proc. Int. Workshop on New Directions in Statistical Learning for Meaningful and Reproducible fMRI Analysis held in conjunction with Ann. Conf. Neural Information Processing Systems, Vancouver, Canada, 2008.  124 [130] B. Thirion, G. Flandin, P. Pinel, A. Roche, P. Ciuciu, and J. B. Poline, "Dealing with the shortcomings of spatial normalization: Multi-subject parcellation of fMRI datasets," Hum. Brain Mapp., vol. 27, pp. 678-693, 2006. [131] M. A. J. Van Gerven, B. Cseke, F. P. De Lange, and T. Heskes, "Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior," NeuroImage, vol. 50, pp. 150-161, 2010. [132] W. W. Daniel, Biostatistics: A foundation for analysis in the health sciences: John Wiley & Sons, 2008. [133] R. B. Potts, "Some generalized order-disorder transformations," Math. Proc. Cambridge Philosophical Soc., vol. 48, pp. 106-109, 1952. [134] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Patt. Ana. Mach. Intell., vol. 23, pp. 1222-1239, 2001. [135] L. Grady, "Random walks for image segmentation," IEEE Trans. Patt. Ana. Mach. Intell., vol. 28, pp. 1768-1783, 2006. [136] L. Grady, "Multilabel random walker image segmentation using prior models," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, San Diego, California 2005, pp. 763-770. [137] M. Van Den Heuvel, R. Mandl, and H. Hulshoff Pol, "Normalized cut group clustering of resting-state FMRI data," PLoS One, vol. 3, p. e2001, 2008. [138] M. Vandenbroucke, R. Goekoop, E. Duschek, J. Netelenbos, J. Kuijer, F. Barkhof, P. Scheltens, and S. Rombouts, "Interindividual differences of medial temporal lobe activation during encoding in an elderly population studied by fMRI," NeuroImage, vol. 21, pp. 173-180, 2004. [139] A. Myronenko and X. Song, "Point set registration: coherent point drift," IEEE Trans. Patt. Ana. Mach. Intell., vol. 32, pp. 2262-2275, 2010. [140] C. E. L. Stark and Y. Okado, "Making memories without trying: Medial temporal lobe activity associated with incidental memory formation during recognition," J. Neurosci., vol. 23, p. 6748, 2003. [141] W. Penny, G. Flandin, and N. Trujillo Barreto, "Bayesian comparison of spatially regularised general linear models," Hum. Brain Mapp., vol. 28, pp. 275-293, 2007.  125 [142] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, pp. 297-302, 1945. [143] P. Skudlarski, R. T. Constable, and J. C. Gore, "ROC analysis of statistical methods used in functional MRI: individual subjects," NeuroImage, vol. 9, pp. 311-329, 1999. [144] R. Liao, J. L. Krolik, and M. J. McKeown, "An information-theoretic criterion for intrasubject alignment of FMRI time series: Motion corrected independent component analysis," IEEE Trans. Med. Imaging, vol. 24, pp. 29-44, 2005. [145] O. Monchi, M. Petrides, J. Doyon, R. B. Postuma, K. Worsley, and A. Dagher, "Neural bases of set-shifting deficits in Parkinson's disease," J. Neurosci., vol. 24, p. 702, 2004. [146] S. J. Broyd, C. Demanuele, S. Debener, S. K. Helps, C. J. James, and E. J. S. Sonuga- Barke, "Default-mode brain dysfunction in mental disorders: A systematic review," Neurosci. Biobehav. Rev., vol. 33, pp. 279-296, 2009. [147] M. E. Raichle, A. M. MacLeod, A. Z. Snyder, W. J. Powers, D. A. Gusnard, and G. L. Shulman, "A default mode of brain function," Proc. Nat. Acad. Sci., vol. 98, pp. 676-682, 2001. [148] R. L. Buckner, J. R. Andrews Hanna, and D. L. Schacter, "The brain's default network," Ann. New York Acad. Sci., vol. 1124, pp. 1-38, 2008. [149] M. D. Fox, A. Z. Snyder, J. L. Vincent, M. Corbetta, D. C. Van Essen, and M. E. Raichle, "The human brain is intrinsically organized into dynamic, anticorrelated functional networks," Proc. Nat. Acad. Sci., vol. 102, pp. 9673-9678, 2005. [150] T. Minka, "Bayesian linear regression," Technical Report, 2001. [151] Y. Chen, A. Wiesel, Y. C. Eldar, and A. O. Hero, "Shrinkage algorithms for MMSE covariance estimation," IEEE Trans. Sig. Proc., vol. 58, pp. 5016-5029, 2010. [152] M. Schmidt, G. Fung, and R. Rosales, "Optimization methods for L1-regularization," Technical Report, 2009. [153] P. Pinel, B. Thirion, S. Meriaux, A. Jobert, J. Serres, D. Le Bihan, J. B. Poline, and S. Dehaene, "Fast reproducible identification and large-scale databasing of individual functional cognitive networks," BMC neurosci., vol. 8, p. 91, 2007.  126 [154] N. Harel, S. P. Lee, T. Nagaoka, D. S. Kim, and S. G. Kim, "Origin of negative blood oxygenation level-dependent fMRI signals," J. Cereb. Blood Flow Metab., vol. 22, pp. 908-917, 2002. [155] F. A. Sadjadi and E. L. Hall, "Three-dimensional moment invariants," IEEE Trans. Patt. Ana. Mach. Intell., vol. 2, pp. 127-136, 1980. [156] G. R. Samanez-Larkin and M. D’Esposito, "Group comparisons: Imaging the aging brain," Soc. Cogn. Affective Neurosci., vol. 3, pp. 290-297, 2008. [157] M. Wilke, V. J. Schmithorst, and S. K. Holland, "Assessment of spatial normalization of whole brain magnetic resonance images in children," Hum. Brain Mapp., vol. 17, pp. 48-60, 2002. [158] M. Brett, A. P. Leff, C. Rorden, and J. Ashburner, "Spatial normalization of brain images with focal lesions using cost function masking," NeuroImage, vol. 14, pp. 486-500, 2001. [159] J. Talairach and P. Tournoux, Co-planar stereotaxic atlas of the human brain. New York, N.Y.: Thieme Medical Publishers Inc., 1988. [160] M. I. Miller, M. F. Beg, C. Ceritoglu, and C. Stark, "Increasing the power of functional maps of the medial temporal lobe by using large deformation diffeomorphic metric mapping," Proc. Nat. Acad. Sci., vol. 102, pp. 9685-9690, 2005. [161] P. A. Yushkevich, J. A. Detre, D. Mechanic-Hamilton, M. A. Fernandez-Seara, K. Z. Tang, A. Hoang, M. Korczykowski, H. Zhang, and J. C. Gee, "Hippocampus-specific fMRI group activation analysis using the continuous medial representation," NeuroImage, vol. 35, pp. 1516-1530, 2007. [162] F. Crivello, T. Schormann, N. Tzourio Mazoyer, P. E. Roland, K. Zilles, and B. M. Mazoyer, "Comparison of spatial normalization procedures and their impact on functional maps," Hum. Brain Mapp., vol. 16, pp. 228-250, 2002. [163] D. Eidelberg, J. R. Moeller, V. Dhawan, P. Spetsieris, S. Takikawa, T. Ishikawa, T. Chaly, W. Robeson, D. Margouleff, and S. Przedborski, "The metabolic topography of parkinsonism," J. Cereb. Blood Flow Metab., vol. 14, pp. 783-801, 1994. [164] D. Eidelberg, J. R. Moeller, K. Kazumata, A. Antonini, D. Sterio, V. Dhawan, P. Spetsieris, R. Alterman, P. J. Kelly, and M. Dogali, "Metabolic correlates of pallidal neuronal activity in Parkinson's disease," Brain, vol. 120, pp. 1315-1324, 1997.  127 [165] A. Feigin, M. Fukuda, V. Dhawan, S. Przedborski, V. Jackson–Lewis, M. J. Mentis, J. R. Moeller, and D. Eidelberg, "Metabolic correlates of levodopa response in Parkinson’s disease," Neurology, vol. 57, pp. 2083-2088, 2001. [166] D. Rougemont, J. C. Baron, P. Collard, P. Bustany, D. Comar, and Y. Agid, "Local cerebral glucose utilisation in treated and untreated patients with Parkinson's disease," J. Neurol. Neurosurg. Psychiatry, vol. 47, pp. 824-830, 1984. [167] E. D. Playford, I. H. Jenkins, R. E. Passingham, J. Nutt, R. S. J. Frackowiak, and D. J. Brooks, "Impaired mesial frontal and putamen activation in Parkinson's disease: a positron emission tomography study," Ann. Neurol., vol. 32, pp. 151-161, 1992. [168] U. Sabatini, K. Boulanouar, N. Fabre, F. Martin, C. Carel, C. Colonnese, L. Bozzao, I. Berry, J. L. Montastruc, and F. Chollet, "Cortical motor reorganization in akinetic patients with Parkinson's disease," Brain, vol. 123, pp. 394-403, 2000. [169] B. Haslinger, P. Erhard, N. Kämpfe, H. Boecker, E. Rummeny, M. Schwaiger, B. Conrad, and A. O. Ceballos-Baumann, "Event-related functional magnetic resonance imaging in Parkinson's disease before and after levodopa," Brain, vol. 124, pp. 558- 570, 2001. [170] C. Buhmann, V. Glauche, H. J. Stürenburg, M. Oechsner, C. Weiller, and C. Büchel, "Pharmacologically modulated fMRI - Cortical responsiveness to levodopa in drug naive hemiparkinsonian patients," Brain, vol. 126, pp. 451-461, 2003. [171] N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer, and M. Joliot, "Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain," NeuroImage, vol. 15, pp. 273-289, 2002. [172] D. S. Bassett and E. Bullmore, "Small-world brain networks," The Neuroscientist, vol. 12, pp. 512-523, 2006. [173] J. Kingman, "A matrix inequality," Quart. J. Math, vol. 12, pp. 78-80, 1961. [174] N. A. Ahmed and D. Gokhale, "Entropy expressions and their estimators for multivariate distributions," IEEE Trans. Info. Theory, vol. 35, pp. 688-692, 1989. [175] S. P. Boyd and L. Vandenberghe, Convex optimization: Cambridge University Press, 2008.  128 [176] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983. [177] Y. S. K. Vetsa, M. Styner, S. M. Pizer, J. A. Lieberman, and G. Gerig, "Caudate shape discrimination in schizophrenia using template-free non-parametric tests," in Proc. Int. Conf. Medical Image Computing and Computer Assisted Intervention, Montreal, Canada, 2003, pp. 661-669. [178] T. Verstynen, J. Diedrichsen, N. Albert, P. Aparicio, and R. B. Ivry, "Ipsilateral motor cortex activity during unimanual hand movements relates to task complexity," J. Neurophysiol., vol. 93, pp. 1209-1222, 2005. [179] F. Debaere, N. Wenderoth, S. Sunaert, P. Van Hecke, and S. P. Swinnen, "Internal vs external generation of movements: differential neural pathways involved in bimanual coordination performed in the presence or absence of augmented visual feedback," NeuroImage, vol. 19, pp. 764-776, 2003. [180] H. Yu, D. Sternad, D. M. Corcos, and D. E. Vaillancourt, "Role of hyperactive cerebellum and motor cortex in Parkinson's disease," NeuroImage, vol. 35, pp. 222- 233, 2007. [181] O. Rascol, U. Sabatini, N. Fabre, C. Brefel, I. Loubinoux, P. Celsis, J. Senard, J. Montastruc, and F. Chollet, "The ipsilateral cerebellar hemisphere is overactive during hand movements in akinetic parkinsonian patients," Brain, vol. 120, pp. 103- 110, 1997. [182] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J. Royal Stat. Soc. Series B, vol. 67, pp. 301-320, 2005. [183] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Royal Stat. Soc. Series B, vol. 68, pp. 49-67, 2006. [184] P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar, "Collaborative hierarchical sparse modeling," Technical Report, 2010. [185] J. Friedman, T. Hastie, and R. Tibshirani, "A note on the group LASSO and a sparse group LASSO," Technical Report, 2010. [186] L. Jacob, G. Obozinski, and J. P. Vert, "Group LASSO with overlap and graph LASSO," in Proc. Int. Conf. Machine Learning, Montreal, Canada, 2009, pp. 433- 440.  129 [187] R. Tibshirani and J. Taylor, "The solution path of the generalized LASSO," Technical Report, 2010. [188] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused LASSO," J. Royal Stat. Soc. Series B, vol. 67, pp. 91-108, 2005. [189] M. Hebiri, "Regularization with the smooth-lasso procedure," arXiv:0803.0668, 2008. [190] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Ann. Stat., vol. 32, pp. 407-499, 2004. [191] E. Van Den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM J. Sci. Comp., vol. 31, pp. 890-912, 2008. [192] J. Friedman and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," J. Stat. Softw., vol. 33, pp. 1-22, 2010. [193] T. M. Mitchell, S. V. Shinkareva, A. Carlson, K. M. Chang, V. L. Malave, R. A. Mason, and M. A. Just, "Predicting human brain activity associated with the meanings of nouns," Science, vol. 320, pp. 1191-1195, 2008. [194] D. Cai, X. He, and J. Han, "Spectral regression: A unified approach for sparse subspace learning," in IEEE Int. Conf. Data Mining, Omaha, N.E., 2007, pp. 73-82. [195] S. Yan, D. Xu, B. Zhang, H. J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Patt. Ana. Mach. Intell., vol. 29, pp. 40-51, 2007. [196] J. A. Fodor, The modularity of mind. Cambridge, M.A.: MIT Press 1983. [197] R. Manenti, S. F. Cappa, P. M. Rossini, and C. Miniussi, "The role of the prefrontal cortex in sentence comprehension: An rTMS study," Cortex, vol. 44, pp. 337-344, 2008. [198] R. Vandenberghe, C. Price, R. Wise, O. Josephs, and R. Frackowiak, "Functional anatomy of a common semantic system for words and pictures," Nature, vol. 383, pp. 254-256, 1996.  


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items