Open Collections

UBC Theses and Dissertations

Joint source based brain imaging analysis for classification of individuals Ramezani, Mahdi 2014

Joint Source Based Brain Imaging Analysis for Classification of Individuals

by

Mahdi Ramezani

B.Sc., Sharif University of Technology, 2007
M.Sc., Sharif University of Technology, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

December 2014

© Mahdi Ramezani 2014

Abstract

Diagnosis and clinical management of neurological disorders that affect brain structure, function and networks would benefit substantially from the development of techniques that combine multi-modal and/or multi-task information. Here, we propose a joint Source Based Analysis (jSBA) framework to identify common information across structural and functional contrasts in data from MRI and fMRI experiments, for classification of individuals with neurological and psychiatric disorders. The framework consists of three components: 1) individual’s feature generation, 2) joint group analysis, and 3) classification of individuals based on the group’s generated features. In the proposed framework, information from brain neuroimaging datasets is reduced to a feature that is a lower-dimensional representation of a selected brain structure or task-related activation pattern. For each individual, features are used within a joint analysis method to generate basis brain activation sources and their corresponding modulation profiles. Modulation profiles are used to classify individuals into different categories. We perform two experiments to demonstrate the potential of the proposed framework to classify groups of subjects based on structural and functional brain data. In the fMRI analysis, functional contrast images derived from a study of auditory and speech perception of 16 young and 16 older adults are used for classification of individuals. First, we investigate the effect of using multi-task fMRI data to improve the classification accuracy.
Then, we propose a novel joint Sparse Representation Analysis (jSRA) to identify common information across different functional contrasts in data. We further assess the reliability of jSRA, and visualize the brain patterns obtained from such analysis. In the sMRI analysis, features representing position, orientation and size (i.e. pose), shape, and local tissue composition of the brain are used to classify 19 depressed and 26 healthy individuals. First, we incorporate pose and shape measures of morphology, which are not usually analyzed in neuromorphometric studies, to measure structural changes. Then, we combine brain tissue composition and morphometry using the proposed jSBA framework. In a leave-one-out cross-validation experiment, we show that we can classify the subjects with an accuracy of 67% solely based on the information gathered from the joint analysis of features obtained from multiple brain structures.

Preface

This thesis is primarily based on five journal and five conference papers, resulting from the collaboration between multiple researchers. In all publications, the contribution of the author was in developing, implementing, and evaluating the method. All co-authors contributed to the editing of the manuscript. Ethics approval for conducting this research has been provided by the Clinical Research Ethics Board, certificate numbers: PSYC-066-07, PSYC-071-07.

A study described in Chapter 3 has been published in:

• Mahdi Ramezani, Purang Abolmaesumi, Kristopher Marble, Heather Trang, and Ingrid Johnsrude, Fusion analysis of functional MRI data for classification of individuals based on patterns of activation, Brain Imaging and Behavior, DOI 10.1007/s11682-014-9292-1, 2013.

The contribution of the author was in developing, implementing, and evaluating the method. K. Marble, H. Trang and Dr. Johnsrude collected the dataset. Profs. Abolmaesumi and Johnsrude helped with their valuable suggestions in improving the methodology.

A study described in Chapter 4 has been published in:

• Mahdi Ramezani, Kristopher Marble, Heather Trang, Ingrid Johnsrude, and Purang Abolmaesumi, Joint Sparse Representation of Brain Activity Patterns in Multi-task fMRI Data, IEEE Transactions on Medical Imaging, 2014.

• Mahdi Ramezani, Kristopher Marble, Heather Trang, Ingrid Johnsrude, and Purang Abolmaesumi, Joint Sparse Representation of Brain Activity Patterns Related to Perceptual and Cognitive Components of a Speech Comprehension Task, IEEE Workshop on Pattern Recognition in Neuroimaging, London, UK, 2012.

• Mahdi Ramezani, Purang Abolmaesumi, Kristopher Marble, Heather Trang, and Ingrid Johnsrude, Classification of individuals based on Sparse Representation of brain cognitive patterns: A functional MRI study, IEEE Conference Engineering in Medicine and Biology, San Diego, US, 2012.

The contribution of the author was in developing, implementing, and evaluating the method. K. Marble, H. Trang and Dr. Johnsrude collected the dataset. Profs. Abolmaesumi and Johnsrude helped with their valuable suggestions in improving the methodology.

A study described in Chapter 5 has been submitted:

• Mahdi Ramezani, Saman Nouranian, Ingrid Johnsrude, and Purang Abolmaesumi, Reliability Analysis and Visualization of Sparse Representation Methods for Neuroimaging Data, submitted, 2014.

The contribution of the author was in developing, implementing, and evaluating the method. S. Nouranian, Dr. Abolmaesumi and Dr. Johnsrude helped with their valuable suggestions in improving the methodology.

A study described in Chapter 6 has been published in:

• Mahdi Ramezani, Ingrid Johnsrude, Abtin Rasoulian, Rachael Bosma, Ryan Tong, Tom Hollenstein, Kate Harkness, and Purang Abolmaesumi, Temporal-lobe morphology differs between healthy adolescents and those with early-onset of depression, Neuroimage: Clinical, 2014.

• Mahdi Ramezani, Abtin Rasoulian, Purang Abolmaesumi, Tom Hollenstein, Ingrid Johnsrude, and Kate Harkness, Multi-object statistical analysis of late adolescent depression, SPIE Medical Imaging: Image Processing, Orlando, US, 2013.

The contribution of the author was in developing, implementing, and evaluating the method. Dr. Rasoulian helped with developing the statistical model. R. Bosma, R. Tong, Dr. Hollenstein and Dr. Harkness collected the MRI dataset. Dr. Abolmaesumi and Dr. Johnsrude helped with their valuable suggestions in improving the methodology.

A study described in Chapter 7 has been published in:

• Mahdi Ramezani, Purang Abolmaesumi, Amir Tahmasebi, Rachael Bosma, Ryan Tong, Tom Hollenstein, Kate Harkness, and Ingrid Johnsrude, Fusion Analysis of First Episode Depression: Where Brain Shape Deformations Meet Local Composition of Tissue, Neuroimage: Clinical.

The contribution of the author was in developing, implementing, and evaluating the method. R. Bosma, R. Tong, Dr. Hollenstein and Dr. Harkness collected the MRI dataset. Dr. Abolmaesumi and Dr.
Johnsrude helped with their valuable suggestions in improving the methodology.

A study described in Chapter 8 has been published in:

• Mahdi Ramezani, Abtin Rasoulian, Ingrid Johnsrude, Tom Hollenstein, Kate Harkness, and Purang Abolmaesumi, Independent component analysis on Lie groups for multi-object analysis of first episode depression, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, 2013.

• Mahdi Ramezani, Abtin Rasoulian, Tom Hollenstein, Kate Harkness, Ingrid Johnsrude, and Purang Abolmaesumi, Joint source based analysis of multiple brain structures in studying major depressive disorder, SPIE Medical Imaging: Image Processing, San Diego, US, 2014.

The contribution of the author was in developing, implementing, and evaluating the method. Dr. Rasoulian helped with developing the statistical model. R. Bosma, R. Tong, Dr. Hollenstein and Dr. Harkness collected the MRI dataset. Dr. Abolmaesumi and Dr. Johnsrude helped with their valuable suggestions in improving the methodology.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgements
Dedication
1 Introduction
  1.1 Brain Imaging Data Analysis
  1.2 Classification of Individuals
  1.3 Multi-task or Multi-modal Analysis
  1.4 Proposed Framework
    1.4.1 Objective
    1.4.2 Contributions
  1.5 Structure of Thesis
2 Datasets
  2.1 Multi-task fMRI data: Speech Comprehension Study
    2.1.1 Listening Study
    2.1.2 Participants
    2.1.3 Data Acquisition
    2.1.4 Data Preprocessing
  2.2 Structural MRI data: Major Depressive Disorder
    2.2.1 Participants
    2.2.2 Clinical Examination
    2.2.3 Behavioural Results
    2.2.4 Data Acquisition
3 Fusion Analysis of Functional MRI Data for Classification of Individuals
  3.1 Introduction
  3.2 Method
    3.2.1 Joint Independent Component Analysis (jICA)
    3.2.2 Selection of Optimal Joint Sources
    3.2.3 Automatic Classification of Young and Older Subjects
  3.3 Results
    3.3.1 Statistical Difference Among Joint Sources
    3.3.2 Selection of Optimal Joint Sources
    3.3.3 Classification
  3.4 Discussion
4 Joint Sparse Representation Analysis
  4.1 Introduction
  4.2 Method
  4.3 Simulations
    4.3.1 Data Generation
    4.3.2 Experiments and Results
  4.4 Results
  4.5 Discussions
5 Reliability Analysis and Visualization
  5.1 Introduction
  5.2 Method
    5.2.1 Preprocessing
    5.2.2 Sparse Representation Analysis
    5.2.3 Reliability Analysis
    5.2.4 Visualization using t-SNE
  5.3 Simulations
    5.3.1 Data Generation
    5.3.2 Experiments and Results
  5.4 Experiments and Results
  5.5 Discussions
6 Multi-object Statistical Analysis of Major Depressive Disorder
  6.1 Introduction
  6.2 Method
    6.2.1 Preprocessing
    6.2.2 Pose and Shape Analysis
    6.2.3 Statistical Analysis
  6.3 Results
    6.3.1 Volume Analysis
    6.3.2 Pose and Shape Analysis
  6.4 Discussion
7 Fusion Analysis of Brain Shape Deformations and Local Composition of Tissue
  7.1 Introduction
  7.2 Method
    7.2.1 Features
    7.2.2 Joint Independent Component Analysis
  7.3 Experiments and Results
  7.4 Discussion
8 Simultaneous Analysis of Pose, Shape, and Tissue Composition
  8.1 Introduction
  8.2 Method
    8.2.1 Feature Generation
    8.2.2 Group Joint Analysis
    8.2.3 Classification
  8.3 Results and Discussion
9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work
Bibliography

List of Tables

3.1 Talairach coordinates for the most discriminative source map in three contrasts.
4.1 ROIs used for each group for each task.
4.2 Similarity coefficients showing the result of jICA and jSRA methods for different combinations of tasks.
5.1 ROIs used for each group to create the simulated data.
5.2 Normalized AUC for the four estimated components using the sparse analysis and ICA methods.
6.1 Normalized pose parameters of brain structures. L and R show the assigned anatomical left and right hemispheres.
7.1 MNI coordinates for the most discriminative source map in three contrasts.
7.2 P-values of the most significant joint source.
7.3 Classification error obtained from discriminant analysis of the mixing coefficients generated by jICA.

List of Figures

1.1 Proposed joint source-based analysis (jSBA) framework for classification of individuals.
2.1 Distribution of BDI for the control and depressed groups.
3.1 Schematic of the joint Independent Component Analysis (jICA) method.
3.2 jICA of brain patterns.
3.3 The effect of combining different contrasts on differentiation between histograms.
3.4 Classification accuracy for different number of features used.
3.5 ROC curves and detection reliability resulting from four different numbers of features used.
3.6 ROC curves and detection reliability for different contrasts.
3.7 ROC curves and detection reliability for different numbers of ICs.
4.1 Flowchart of the analysis, including preprocessing, feature selection, joint sparse representation analysis, and visualizing the output sources.
4.2 Schematic of the joint Sparse Representation Analysis (jSRA) method.
4.3 ROIs representing the activated regions within the brain.
4.4 fMRI signal of a voxel before (a) and after randomization (b), the created synthetic activation signal (c), and simulated fMRI signal (d).
4.5 Average Jaccard (a), precision (b), and sensitivity (c) for combinations of two, three and four tasks.
4.6 Significant joint source for jSRA (a-d) and jICA (f-h) methods, using 4 sources.
4.7 Significant joint source for jSRA (a-d) and jICA (f-h) methods, using 8 sources.
4.8 Significant joint source for jSRA (a-d) and jICA (f-h) methods, using 12 sources.
5.1 Flowchart of the analysis, including preprocessing, sparse representation analysis, reliability analysis, and visualization.
5.2 Reliability and visualization analysis of the simulated data for two different sets of parameters (first and second row).
5.3 Trustworthiness index as a function of neighborhood size.
5.4 Distribution of cophenet correlation coefficient in multiple runs of the visualization and clustering.
5.5 Activation maps obtained for each group using different parameter settings.
5.6 Reliability and visualization analysis of the simulated data for two different sets of parameters (first and second row).
6.1 Schematic of the pose and shape statistical analysis of multiple brain structures.
6.2 Segmented structures in both hemispheres of the brain which are used for multi-object statistical analysis.
6.3 Distribution of the volume of each structure between the two groups.
6.4 Shape principal component that was significantly different between the two groups.
6.5 Shape principal component that was significantly different between the two groups.
6.6 Pose (a) and shape (b) scores that generated the significant difference between the MDD subjects and controls across the Beck Depression Inventory Index (BDI).
7.1 Joint Independent Component Analysis (jICA) of brain tissue composition and shape deformation.
7.2 Group-average histogram for the whole dataset on GM (a), WM (b), and deformation field (c).
7.3 Renyi divergence criteria values for different combination of features on differentiation between histograms.
8.1 Proposed joint source-based analysis (jSBA) framework for classification of individuals.
8.2 Schematic of the joint sparse representation method [150].
8.3 Reliability and visualization analysis of the 45 subjects with and without depression.
8.4 Sparse coefficients for the significant joint sources for MDD subjects and healthy controls (a, b), respectively.

Glossary

ACC anterior cingulate cortex.
AUC Area Under Curve.
BDI Beck Depression Inventory.
CSF Cerebrospinal Fluid.
DARTEL Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra.
DBM Deformation Based Morphometry.
DF Deformation Fields.
DSM-IV-TR Diagnostic and Statistical Manual of Mental Disorders.
DTI Diffusion Tensor Imaging.
EEG Electroencephalography.
fMRI Functional Magnetic Resonance Imaging.
FN False Negative.
FP False Positive.
GLM General Linear Model.
GM Grey Matter.
Hc hippocampus.
ICA Independent Component Analysis.
ITG inferior temporal gyrus.
jICA joint Independent Component Analysis.
jSBA joint Source Based Analysis.
jSRA joint Sparse Representation Analysis.
K-SVD K-Singular Value Decomposition.
LDA Linear Discriminant Analysis.
LONI Laboratory of Neuro Imaging.
LPBA40 LONI Probabilistic Brain Atlas.
MDD Major Depressive Disorder.
MDL Minimum Description Length.
MEG Magnetoencephalography.
MNI Montreal Neurological Institute.
MRI Magnetic Resonance Imaging.
MTG middle temporal gyrus.
OMP Orthogonal Matching Pursuit.
PCA Principal Component Analysis.
PVD Provoked Vestibulodynia.
RFA Recursive Feature Addition.
ROC Receiver Operating Characteristic.
sMRI Structural Magnetic Resonance Imaging.
STG superior temporal gyrus.
SVD Singular Value Decomposition.
SVM Support Vector Machine.
TBM Tensor Based Morphometry.
TN True Negative.
TP True Positive.
TR Repetition Time.
VBM Voxel Based Morphometry.
VVS Vulvar Vestibulitis Syndrome.
WM White Matter.

Acknowledgements

I offer my enduring gratitude to the faculty, staff and my fellow students at UBC and Queen’s University, who have inspired me to continue my work in this field. I owe particular thanks to my supervisors, Dr. Purang Abolmaesumi and Dr. Ingrid Johnsrude, for their guidance, their scientific as well as personal support, and dedication to research.

A number of faculty members and staff in the Department of Electrical and Computer Engineering, UBC, and the Department of Radiology and the Department of Psychology, Queen’s University, have had a significant role in my research. I would like to acknowledge the contributions of Dr. Kate Harkness, Dr. Tom Hollenstein, Dr. Caroline Pukall, Dr. Roger Tam, Kris Marble, Dr. Amir Tahmasebi, Dr. Kate Sutton, Heather MacDonald, and Rachael Bosma. Special thanks to Dr. Abtin Rasoulian and Saman Nouranian for their insightful feedback and for sharing their knowledge in medical image analysis and software development.

I would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Institutes of Health Research (CIHR), and UBC for funding this work.

Special thanks are owed to my parents, who have supported me throughout my years of education, both morally and financially.
Last but not least, I would like to thank my wife, Marjan, who has been with me in every step with her support, encouragement, quiet patience and unwavering love.

Dedication

I would like to dedicate my thesis to my beloved parents, Maryam and Mahmoud, and my wife, Marjan.

Chapter 1
Introduction

1.1 Brain Imaging Data Analysis

Many neurological disorders are associated with changes in brain structures or patterns of activity that can be observed using functional and anatomical neuroimaging. Over the past two decades, functional and structural Magnetic Resonance Imaging (fMRI and sMRI) have been used to identify regions of activity, to determine volume, shape, and position of brain structures, to diagnose diseases and lesions, and also for neurological and cognitive psychology research.

fMRI studies are typically analyzed to reveal regional specialization for cognitive functions or tasks, or to compare patterns of activity between two groups, such as patients and healthy control participants [43]. Usually simple comparisons of conditions are performed, and contrasts of interest are created. These contrasts reveal the regions showing the most consistent effects, and regions that differ consistently between groups [73, 75, 76]. Marked variability within groups can make it difficult to determine whether groups differ reliably with respect to the localization and extent of the activation. Furthermore, these approaches, in which all comparisons are considered independently, are not sensitive to shared information among different tasks on which functional imaging data are collected.

sMRI studies have revealed the neuroanatomical correlates of neurological disorders, characterizing differences in shape, volume, and local tissue composition. Proposed approaches are classified into those that measure differences in brain shape, and those that measure differences in the local volume (and concentration) of brain tissue after macroscopic differences in shape have been discounted [6].
The former approaches use the deformation fields that map any individual brain onto some standard reference to characterize neuroanatomy. The latter approaches, such as Voxel Based Morphometry (VBM) [6], focus on the local composition of brain tissue, and compare images for their tissue composition on a voxel-by-voxel basis after the deformation fields have been used to spatially normalize the images [13]. In other words, conventional computational neuroanatomic techniques have either used the deformation fields themselves to characterize brain structural variation, or have used these fields to normalize images that are then entered into an analysis of regionally specific differences. Ideally, a procedure like VBM should be able to automatically identify any structural abnormalities in a single Grey Matter (GM) concentration image. However, even with many hundreds of subjects in a database, the method may not be powerful enough to detect subtle abnormalities in individuals [6]. A more powerful procedure might be to use some form of voxel-wise multivariate approach. Within a multivariate framework, in addition to images of GM concentration, other image features such as White Matter (WM) concentration, and information from the spatial normalization procedure could be included [6].

1.2 Classification of Individuals

Our understanding of the brain basis of disorders has benefited greatly during the past decade from important advances in machine learning and classification techniques. Given the predicted increase in prevalence of brain disorders including developmental, psychiatric and neurodegenerative diseases in the coming decades [160, 61], early detection and intervention in persons with mild brain impairment is of great importance. To better characterize patients with early impairment, it is critical to develop tools that can be used in reliable classification of patients in their early stage of the disease.
These classification tools will initially be used in clinical research and ultimately be used in the clinical treatment of patients at risk for brain disorders [160]. Generally, classification based on neuroimaging data is not trivial, due to the high dimensionality of the input feature space and the small set of subjects that is usually available.

Although scant previous work deals with the specific problem of classification of individuals based on brain imaging data, the closely related goal of using machine learning to decode stimuli, mental states, and behaviors from fMRI data is rapidly gaining in popularity, particularly the set of methods called representational similarity analysis, or pattern-information analysis (see [132, 145] for tutorial reviews). In this context, Haxby and colleagues showed that fMRI activation patterns differ between viewing a photograph of a face and viewing a house, a shoe, or a chair [88]. Using a similar dataset, Cox and Savoy successfully classified patterns of fMRI activation evoked by the presentation of photographs of various categories of objects, by applying Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) [48]. Mitchell et al. successfully trained classifiers to automatically decode the subject’s cognitive state at a single time instant or interval [129]. De Martino et al. combined multivariate voxel selection and SVM for classification of fMRI spatial patterns [121]. Pereira combined dimensionality reduction and classification into a single learning objective to achieve better learning performance [144]. Unlike these studies that used relatively simple stimuli or images drawn from fixed categories, Kay et al. used natural receptive-field models to identify a specific image, viewed by an observer, from a large set of natural images [95].
Their group further combined structural and semantic encoding models, and prior information about the structure and semantic content of natural images, to produce accurate reconstructions of observed natural images from brain activities [133]. Schrouff et al. used feature extraction methods with different classifiers to decode semi-constrained brain activity patterns, where number and duration of mental events were not externally imposed [171].

In this thesis, the goal is to characterize brain structural and functional changes and to use them for classification of individuals, rather than to detect transient cognitive states. This procedure typically consists of three steps. In functional MRI analysis, the first step is to determine the activation maps using a data-driven method such as group-Independent Component Analysis (group-ICA) [27], or a model-based approach such as the General Linear Model (GLM) [69]. In structural MRI analysis, the first step is to determine brain tissue composition maps [13], or brain deformation maps [6]. The second step is to reduce the dimensionality of the data and compute representative features using Principal Component Analysis (PCA) [58, 70], Singular Value Decomposition (SVD) [3], Independent Component Analysis (ICA) [69], GLM or Recursive Feature Addition (RFA) [171]. In the third step, a classification is performed on the obtained features. These approaches have only focused on classification based on a single comparison of conditions or structural maps, and are not sensitive to shared information among different contrasts generated from those comparisons.

1.3 Multi-task or Multi-modal Analysis

Recently, considerable attention has been focused on combining data across multiple modalities or tasks, to provide knowledge of the joint information that may exist among those sources [182]. These analyses may be important to better understand complex disorders that affect many aspects of the brain (such as its structure, function, and organization; [30]).
The premise of multi-modality approaches is that each imaging modality provides complementary information about different tissue characteristics, and at different spatial and temporal resolutions. Many techniques have been proposed to combine multi-modal or multi-task fMRI information. These techniques can be categorized into two main types: data-integration and data-fusion methods [29, 52, 168]. Data-integration techniques use one imaging modality to improve the results of another modality (for example, registration of EEG or MEG to MRI [85]; and the use of fMRI to estimate the location of dipoles or the distribution of neural sources prior to EEG [109]). On the other hand, data fusion techniques utilize multiple modalities [30] or tasks [31] to take advantage of combined information. Generally, due to weak cross-modality relationships and intersubject variability, finding one-to-one correspondence in multi-modal images is difficult; however, performing fusion analysis across multiple subjects makes this an easier problem to solve. In this type of analysis, each modality is usually reduced to a feature that is a lower-dimensional representation of a selected brain structure or task-related activation pattern. Using variations across individual subjects, associations across the features can be explored [29]. Joint Independent Component Analysis (jICA) [30] is one multivariate technique for fusion analysis. jICA is a group-level analysis technique that uses extracted features from individual subjects’ data (i.e. multiple modalities or functional contrast images) and tries to maximize the independence among joint components.

Classification based on multi-modal and/or multi-task information may improve our knowledge of neurological disorders that affect multiple aspects of brain structure, function and networks.
However, to the best of our knowledge, prior studies did not show the effect of combining datasets across tasks or modalities to increase the sensitivity of the classification for neuroimaging applications, where usually only a small number of subjects is available. jICA represents a powerful way to reduce the dimensionality of fMRI data sets, permitting classification of subjects based on different functional and structural features. jICA and its extensions have been successfully applied to study aphasia [180] and schizophrenia [30, 31, 108, 182, 183]; however, jICA data have not been used before to quantitatively classify individuals as belonging to one group or another.

Although results of previous studies have shown that ICA-based joint analysis techniques, such as joint ICA [30], parallel ICA [108], and coefficient-constrained ICA [183], accurately identify sources of common variance among features, the theoretical assumption of independence of the functional patterns extracted by ICA algorithms is not guaranteed in practice, and components are separated on the basis of spatial sparsity rather than dependence [51]. Therefore, mathematical properties of brain fMRI data other than independence should be used. Sparsity is a natural characteristic satisfied by fMRI sources in the spatial domain [114]. Therefore, sparse representation methods can be used instead of ICA decomposition.

1.4 Proposed Framework

Diagnosis and clinical management of neurological disorders that affect brain structure, function and networks would benefit substantially from the development of new techniques that combine multi-modal and/or multi-task information. Here, we propose a joint Source-based Analysis (jSBA) framework to identify common information across different functional and structural features in data from fMRI and sMRI experiments, for classification of individuals with neurological and psychiatric disorders.
The framework consists of three components, as shown in Fig. 1.1: 1) individual’s feature extraction, 2) joint group analysis, and 3) classification of individuals based on the group’s extracted features. In each component, combinations of novel and state-of-the-art methods have been used. In the proposed framework, information from multi-modal and/or multi-task datasets is reduced to a feature that is a lower-dimensional representation of a selected brain structure or task-related activation pattern. For each individual, features are used within a source-based analysis method to generate basis brain sources and their corresponding modulation profiles, used to classify individuals into different categories.

1.4.1 Objective

The global objective of this thesis is to propose a framework that can distinguish patients and healthy controls where the number of available subjects is low, and the between-group differences are subtle. To this end, we propose to use multi-modal and/or multi-task brain imaging data, and take advantage of the complementary information that exists among the modalities or tasks for group classification. We investigate the joint analysis of multiple features, some of which have not previously been integrated or fused. We further develop techniques that are appropriate for joint analysis of brain imaging datasets. As a corollary to this objective, we postulate that the proposed framework can identify joint information across different structural and functional features.

1.4.2 Contributions

This study develops a framework for classification of individuals based on fusing brain imaging information.
In the course of achieving this objective, the following contributions were made:

• Investigating the use of multi-task fMRI for classification of individuals.
• Proposing a novel joint sparse representation technique for joint group analysis of multi-task fMRI data.
• Proposing a new technique for reliability analysis and visualization of sparse representation algorithms.
• Investigating the use of morphometric analysis of multiple brain structures for classification of adolescents with and without depression.
• Proposing a novel way to combine brain structural information such as shape, pose, and tissue composition within the framework.

1.5 Structure of Thesis

The rest of this thesis is subdivided into eight chapters as outlined below:

Chapter 2: Materials

In this chapter we describe the two datasets (functional and structural MRI data) which have been used to evaluate the proposed framework. The experimental fMRI data were acquired from sixteen young (age: 19-26) and sixteen older (age: 57-73) adults performing multiple speech comprehension tasks within subjects. We will use this dataset in chapters 3, 4, and 5. The structural MRI data were acquired from 16 females (aged 16 to 21) and 3 males (aged 18) with early-onset Major Depressive Disorder (MDD), and 25 female and 1 male healthy control participants, drawn from the same age range. This dataset will be used in chapters 6, 7, and 8.

Chapter 3: Fusion Analysis of Functional MRI Data for Classification of Individuals

Classification of individuals based on patterns of brain activity observed in functional MRI contrasts may be helpful for diagnosis of neurological disorders. Prior work on classification based on these patterns has primarily focused on using a single contrast, which does not take advantage of complementary information that may be available in multiple contrasts.
Where multiple contrasts are used, the objective has been only to identify the joint, distinct brain activity patterns that differ between groups of subjects; not to use the information to classify individuals. In this chapter, we use joint Independent Component Analysis (jICA) within a Support Vector Machine (SVM) classification method, and take advantage of the relative contribution of activation patterns generated from multiple fMRI contrasts to improve classification accuracy. Young (age: 19-26) and older (age: 57-73) adults (16 each) were scanned while listening to noise alone and to speech degraded with noise, half of which contained meaningful context that could be used to enhance intelligibility. Functional contrasts based on these conditions (and a silent baseline condition) were used within jICA to generate spatially independent joint activation sources and their corresponding modulation profiles. Modulation profiles were used within a non-linear SVM framework to classify individuals as young or older. Results demonstrate that a combination of activation maps across the multiple contrasts yielded an area under ROC curve of 0.86, superior to classification resulting from individual contrasts. Moreover, class separability, measured by a divergence criterion, was substantially higher when using the combination of activation maps.

Chapter 4: Joint Sparse Representation Analysis

Prior research using multi-task analysis in fMRI, such as the proposed approach in Chapter 3, has mainly assumed that brain activity patterns evoked by different tasks are independent. This may not be valid in practice. In this chapter, we use sparsity, which is a natural characteristic of fMRI data in the spatial domain, and propose a joint Sparse Representation Analysis (jSRA) method to identify common information across different functional contrasts in data from a multi-task fMRI experiment.
Sparse representation methods do not require independence, or that the brain activity patterns be non-overlapping. We use functional contrast images within the joint sparse representation analysis to generate joint activation sources and their corresponding sparse modulation profiles. We evaluate the use of sparse representation analysis to capture individual differences with simulated fMRI data and with experimental fMRI data. The same experimental fMRI data as in Chapter 3 were used, where an independent measure (namely, age in years) can be used to differentiate between groups. Simulation results show that this method yields greater sensitivity, precision and higher Jaccard indices (which measure similarity and diversity of the true and estimated brain activation sources) than does the jICA method. Moreover, superiority of the jSRA method in capturing individual differences was successfully demonstrated using experimental fMRI data.

Chapter 5: Reliability Analysis and Visualization

Sparse representation analysis of neuroimaging data has been shown to be effective for detection of functional activation, for identification of brain functional networks, for multivariate pattern analysis and, as shown in Chapter 4, for classification of individuals. However, results of a sparse analysis should be interpreted cautiously, because they may vary over multiple runs of the algorithm and depend on the initialization, parameter values and optimization algorithms employed. In this chapter, we propose a way to assess the reliability of such analyses, and to visualize the brain patterns obtained from a sparse representation analysis. We run the sparse analysis multiple times for each parameter value, and cluster the estimated components. The clusters are nonlinearly mapped into a low-dimensional space, which enables further interpretation of the components, and identification of the best parameter values.
We evaluate the proposed approach using both simulated and experimental fMRI data. Results show that we can successfully identify reliable components and select the best parameters using the proposed approach.

Chapter 6: Multi-object Statistical Analysis of Major Depressive Disorder

In the next three chapters we use the structural MRI dataset of individuals with and without depression. Major depressive disorder (MDD) has previously been linked to structural changes in several brain regions, particularly in the medial temporal lobes [14, 15]. This has been determined using voxel-based morphometry, segmentation algorithms, and analysis of shape deformations [13, 16, 147, 196, 204]: these are methods in which information related to the shape and the pose (the size, and anatomical position and orientation) of structures is lost. In this chapter, we incorporate information about shape and pose to measure structural deformation in adolescents and young adults with and without depression (as measured using the Beck Depression Inventory and Diagnostic and Statistical Manual of Mental Disorders criteria). We focus on changes in cortical and subcortical structures, and use a multi-object statistical pose and shape model to analyze imaging data from 16 females (aged 16 to 21) and 3 males (aged 18) with early-onset MDD, and 25 female and 1 male healthy control participants, drawn from the same age range. Hippocampus, parahippocampal gyrus, putamen, and the superior, inferior and middle temporal gyri in both hemispheres of the brain are automatically segmented using the LONI Probabilistic Brain Atlas [176] in MNI space. Points on the surface of each structure in the atlas are extracted and warped to each participant’s structural MRI. These surface points are analyzed to extract the pose and shape features.
Pose differences are detected between the two groups, particularly in the left and right putamen, right hippocampus, and the left and right inferior temporal gyri. Shape differences are detected between the two groups, particularly in the left hippocampus and in the left and right parahippocampal gyri. Furthermore, pose measures are significantly correlated with BDI score across the whole (clinical and control) sample. Since the clinical participants were experiencing their very first episodes of MDD, morphological alteration in the medial temporal lobe appears to be an early sign of MDD, and is unlikely to result from treatment with antidepressants. Pose and shape measures of morphology, which are not usually analyzed in neuromorphometric studies, appear to be sensitive to depressive symptomatology.

Chapter 7: Fusion Analysis of Brain Shape Deformations and Local Composition of Tissue

Computational neuroanatomical techniques, such as the one proposed in Chapter 6, that are used to evaluate the structural correlates of disorders in the brain typically measure regional differences in grey matter or white matter, or measure regional differences in the deformation fields required to warp individual datasets to a standard space. Our aim in this chapter is to combine measurements of regional tissue composition and deformations in order to characterize a particular brain disorder (here, major depressive disorder). We use structural magnetic resonance imaging (MRI) data from young adults in a first episode of depression, and from an age- and sex-matched group of non-depressed individuals.
After DARTEL groupwise registration, we obtained grey matter (GM) and white matter (WM) tissue maps in the template space, along with the deformation fields required to warp the DARTEL template to the GM and WM maps in the population. These three features, reflecting tissue composition and shape of the brain, are used within jICA to extract spatially independent joint sources and their corresponding modulation profiles. Coefficients of the modulation profiles are used to capture differences between depressed and non-depressed groups. The combination of hippocampal shape deformations and local composition of tissue (but neither shape nor local composition of tissue alone) is shown to discriminate reliably between individuals in a first episode of depression and healthy controls, suggesting that brain structural differences between depressed and non-depressed individuals do not simply reflect chronicity of the disorder but are there from the very outset.

Chapter 8: Simultaneous Analysis of Pose, Shape, and Tissue Composition

In this chapter we use the jSBA framework to combine the approaches of the two previous chapters, which were shown to be effective in identification of brain structural variations in patients with Major Depressive Disorder (MDD). In this framework, features representing position, orientation and size (i.e. pose), shape, and local tissue composition are extracted. Subsequently, simultaneous analysis of these features within a joint analysis method is performed to generate the basis sources that show significant differences between subjects with MDD and healthy controls. Moreover, in a leave-one-out cross-validation experiment, we use a random forest classifier to identify individuals within the MDD group.
Results show that we can classify the MDD subjects with an accuracy of 67% solely based on the information gathered from the joint analysis of pose, shape, and tissue composition in multiple brain structures.

Chapter 9: Conclusion and Future Work

This chapter includes a short summary followed by suggestions for future work.

[Figure 1.1: block diagram with four stages: Brain Imaging Data (sMRI, fMRI, DTI, EEG/MEG); Feature Extraction (intensity, geometry, functional patterns); Source-based Analysis (fusion, integration, multi-object analysis); Classification (Support Vector Machine, Fisher Linear Discriminant, Multi-Layer Perceptron).]
Figure 1.1: Proposed joint source-based analysis (jSBA) framework for classification of individuals.

Chapter 2
Datasets

The proposed framework has been used for analysis of brain activity patterns in a speech comprehension task, and for analysis of depression in adolescence and young adulthood, to demonstrate its potential to classify groups of subjects based on functional and structural datasets. In the next two sub-sections we briefly describe the datasets used for the two applications.

2.1 Multi-task fMRI data: Speech Comprehension Study

2.1.1 Listening Study

Subjects were asked to listen to sentences in the scanner and try to understand them as well as they could. Sentences with and without coherent sentence-level meaning (’coherent’ and ’anomalous’ sentences, respectively) were taken from those used by [55] and were mixed with noise that had the same long-term spectrum as the speech and the amplitude envelope of the signal to be masked (Signal-Correlated Noise: SCN; [170]) at six different signal to noise ratios (SNRs): -5 dB, -3.5 dB, -2.5 dB, -1 dB, 0 dB, and 2.5 dB. Clear speech was also tested, making 7 sentence conditions. Coherent and anomalous sentences were divided into 7 sets, which were pseudorandomly assigned to conditions such that each sentence set was tested in each of the seven SNR conditions (including clear speech) an equal number of times, across participants.
Over the scanning session, each participant heard 14 trials of each sentence type at each SNR, half of which were followed by ’repeat’ trials requiring the participant to repeat as much of the sentence as possible. The stimuli assigned to repeat trials were counterbalanced across participants, and repeat and non-repeat trials were randomly intermixed for each participant. Intelligibility, defined here as the proportion of words correctly reported, was obtained for each signal quality level and for each sentence type, for each participant, from the repeat trials. Moreover, data from 14 trials of SCN on its own and 16 trials of silence were obtained. There were 324 trials distributed across four 81-trial sessions of testing.

2.1.2 Participants

Sixteen young (mean age: 21.1, range: 19-26, 11 female) and sixteen older (mean age: 64.2, range: 57-73, 11 female) adults were scanned. All subjects were native speakers of English, without any history of neurological illness, head injury, or hearing impairment. This study was cleared by the Queen’s University Health Sciences Research Ethics Board, and written informed consent was obtained from all participants.

2.1.3 Data Acquisition

The fMRI data were acquired using a 3.0 Tesla Siemens Trio MRI system with a 12-channel head coil in the MRI facility at Queen’s University, Kingston, Canada. Each acquisition consisted of 32 contiguous slices with 4 mm thickness, field of view 211 × 211 mm, in-plane resolution of 3.3 × 3.3 mm, resulting in a grid of 64 × 64 × 32 voxels, each 3.3 × 3.3 × 4 mm in size. The Repetition Time (TR) was 9 sec and the acquisition time was 2 sec. This sparse GE-EPI imaging technique allowed for stimuli to be presented in the silent gaps between scans. Total functional imaging time was 48 minutes.
Auditory stimuli and the visual ’repeat’ instructions were presented to the participants using E-Prime v.1.2 and a NEC LT265 DLP projector. Participants viewed the screen via a mirror system mounted on the head coil [115].

2.1.4 Data Preprocessing

Before preprocessing, the Siemens motion correction algorithm (available online at: http://imaging.mrc-cbu.cam.ac.uk/imaging/DataDiagnostics) was applied to the DICOM MR images, and then the DICOM images were converted to NIFTI format. The fMRI data were preprocessed using Statistical Parametric Mapping software (SPM8, Wellcome Department of Cognitive Neurology, London, UK). Preprocessing steps included realignment [74], coregistration [44] and the segmentation-based spatial normalization [7] of SPM8. The data were spatially smoothed using an 8-mm Gaussian kernel. Spatial smoothing has been previously shown to be effective at increasing functional signal-to-noise in SPM-based analyses [187]. The first scan of each session was discarded, and the rest were coded according to the auditory condition of the preceding stimulus and entered into a single-subject general linear model. The Finite Impulse Response (FIR) set was selected as the hemodynamic response function. Three functional contrasts were calculated: SCN versus silence, identifying brain regions that process the acoustic properties of sound; anomalous sentences versus SCN to highlight speech-responsive areas; coherent versus anomalous sentences to reveal regions sensitive to coherent sentence-level semantic content.

2.2 Structural MRI data: Major Depressive Disorder

2.2.1 Participants

Nineteen depressed subjects (age: 18.1 ± 1.1, 3 male, all right-handed) who met DSM-IV-TR (Diagnostic and Statistical Manual of Mental Disorders; American Psychiatric Association, 2000) criteria for a current episode of MDD were recruited through referrals from community mental health clinics.
Twenty-six healthy participants (age: 17.96 ± 0.2, 1 male, all right-handed) with no psychiatric history were recruited through community advertisement. Participants were excluded if they met current or lifetime criteria for bipolar disorder, a psychotic disorder, attention-deficit/hyperactivity disorder, a developmental disability (e.g., autism spectrum disorder), or a medical disorder that could cause depression (e.g., hypothyroidism). All participants were medication-free prior to the study. This study was cleared by the Queen’s University Health Sciences Research Ethics Board, and written informed consent was obtained from all participants and from a parent or guardian for participants under 18 years. All participants were compensated $10 for their time and were given a picture of their brain to keep.

2.2.2 Clinical Examination

All participants in the depressed group were diagnosed based on a structured diagnostic interview administered by an advanced doctoral student in clinical psychology (the Child and Adolescent version of the Schedule for Affective Disorders and Schizophrenia; K-SADS; [94]). The K-SADS is the gold standard for deriving DSM-IV-TR diagnoses in children and adolescents and is the most widely used measure for this purpose in clinical research. This clinician interview was administered by graduate-level students in clinical psychology who were trained and supervised by a registered clinical psychologist with over 20 years of expertise in the assessment and diagnosis of depression in adolescence. Participants scored in the mild to severe depression range, as defined by a Beck Depression Inventory (BDI-II; [11]) score. The BDI is a 21-item measure designed to assess the presence and severity of depression symptoms, and is the most commonly used depression measure in adolescent and adult samples [50]. The BDI was administered by trained graduate or senior undergraduate-level students who went over the measure with each participant to ensure that they understood the questions. We chose not to include the Hamilton Depression Rating Scale and to focus exclusively on the BDI as an index of depression severity, primarily because there is evidence that the Hamilton scale has a poor psychometric profile [9].

[Figure 2.1: boxplot of BDI scores (vertical axis, 0 to 45) for the Control and Depressed groups.]
Figure 2.1: Distribution of BDI for the control and depressed groups. The central red mark is the median, the edges of the blue box are the 25th and 75th percentiles, and the whiskers show the extreme values of the BDI scores.

2.2.3 Behavioural Results

The groups were matched for age (p = 0.52). There were no socioeconomic status (SES) differences between the subjects in the two groups (p = 0.93). The BDI differed significantly between the two groups (p < .0001). Fig. 2.1 shows the boxplot of the BDI for the two groups.

2.2.4 Data Acquisition

The MRI data were acquired using a 3.0 Tesla Siemens Trio MRI scanner with a 12-channel head coil in the MRI facility at Queen’s University, Kingston, Canada. A whole-brain 3D MPRAGE T1-weighted anatomical image was acquired for each participant (voxel resolution of 1.0 × 1.0 × 1.0 mm, flip angle α = 9°, TR = 1760 ms, and TE = 2.6 ms). The subjects filled out the BDI immediately after being in the scanner.

Chapter 3
Fusion Analysis of Functional MRI Data for Classification of Individuals

3.1 Introduction

Functional Magnetic Resonance Imaging (fMRI) studies are typically analyzed to reveal regional specialization for cognitive functions or tasks, or to compare patterns of activity between two groups, such as patients and normal-control participants [43]. Usually, simple comparisons of conditions are performed to reveal regions that are reliably activated by the task of interest, and/or regions that differ reliably between groups [73, 75, 76].
Several studies have taken advantage of these identified regions for group classification based on fMRI data [3, 58, 70]. However, these approaches have only focused on classification based on a single comparison of conditions, and are not sensitive to shared information among different contrasts generated from those comparisons.

Many techniques have been proposed to combine multi-task fMRI information. Joint Independent Component Analysis (jICA) [30] is one multivariate technique for fusion analysis. It is an extension of ICA that combines information from multiple modalities or functional contrasts. The simplified noise-free ICA model seems to be sufficient for many applications [92], and has been successfully applied to fMRI data [28, 123, 183]. In this simplified model, the ICA components that contribute the least, and which may have a "speckled" spatial distribution in contrast images, are noise of unknown origin [123]. ICA is typically used as a first-level data-driven approach to find spatially or temporally independent brain sources of activity from fMRI scans of a person’s brain, while that person is performing a desired task [123]. Spatial ICA results in a set of independent components (spatial brain activation patterns), and a set of “mixing coefficients”.

1 This chapter is adapted from [148]: Mahdi Ramezani, Purang Abolmaesumi, Kristopher Marble, Heather Trang, and Ingrid Johnsrude, Fusion analysis of functional MRI data for classification of individuals based on patterns of activation, Brain Imaging and Behavior, 2013.

Joint ICA is a group-level analysis technique that uses extracted features from individual subjects’ data and tries to maximize the independence among joint components.
Assuming that the features, obtained from multiple modalities or multiple contrasts, share the same modulation profile (i.e. mixing-coefficient matrix), jICA uses more information to estimate the same number of mixing coefficients, and may therefore yield improved results compared to ICA. An additional advantage of jICA is the computation of modulation profiles along with the identified sources. Such profiles substantially reduce the dimensionality of the data, and can be used for group classification. The ability of jICA to reduce the number of dimensions is particularly important in the context of fMRI data analysis, where the dimensionality of the input feature space is high and the number of available subjects is usually low. Although the discrimination ability of joint components has been investigated [184], to the best of our knowledge, the application of modulation profiles for classification of individuals has not been tried. Recently, Fan et al. used the modulation profile, which resulted from applying ICA to resting-state fMRI data, to classify individuals with schizophrenia and healthy controls [69]. Their work did not investigate the possibility of improved classification that could be obtained by combining multiple fMRI contrasts.

In this chapter of the thesis, we use jICA for group classification. We first identify the modulation profile (mixing coefficients) that reflects group differences by fusion analysis of multiple contrasts, and then use the resulting profile for group classification. We test this classification approach using fMRI data collected from 16 young and 16 older neurologically normal individuals who were scanned in multiple stimulus conditions in a speech perception experiment [55, 116].
The jICA modulation profile, which reflects group differences in activation patterns observed in three different functional contrasts, is used to automatically classify individuals as young or older. One contrast compares responses to unintelligible noise bursts, amplitude modulated with the temporal envelope of spoken sentences, with silence. A second contrast compares responses to sentences without coherent meaning (“anomalous” sentences, e.g., “Her good slope was done in carrot”) with the unintelligible noise bursts. A third contrast compares responses to sentences with coherent meaning (“coherent” sentences, e.g., “Her new skirt was made of denim”) to anomalous sentences. The intelligibility of anomalous sentences is determined by the quality of the signal, whereas the intelligibility of coherent sentences is determined both by the quality of the signal, and by semantic knowledge. At any given signal quality, comprehension is greater for the latter than for the former (referred to hereafter as “context benefit”) [116]. Although older and younger adults do not differ in context benefit measured behaviorally [116], we examine whether patterns of activity in functional contrasts can be used to classify young and older people, without considering any information reflecting the distinct brain structural differences of the two groups [35, 81].

Joint ICA is used with data from the three contrasts to probe the unique and joint information among different contrasts and groups. First, joint independent components based on different combinations of these features, along with the mixing coefficients, are determined. Second, statistical differences among mixing coefficients (reflecting the network strengths) are examined using t-tests. Third, separability of the joint-source distributions is measured in order to assess the difference between the distributions from young and older participants [90].
Finally, the modulation profile extracted from the three functional contrasts is used to classify individuals as young or older, and the accuracy of this classification is assessed. We demonstrate that by fusing the three contrasts with jICA, the discrimination of subjects as young or older is substantially improved compared to using each individual contrast alone. Here we show the feasibility of this method by examining age-related differences in healthy subjects, where an independent measure (namely, age in years) can be used to differentiate between groups with 100% certainty. This "gold standard" allows us to validate the approach, which we expect to be applicable to real-world diagnostic problems without such a clear standard to differentiate groups.

3.2 Method

As mentioned in Section 2.1, the behavioural performance for each subject was measured as the proportion of words correctly reported at different SNRs. Pilot work revealed that in general at a given SNR older people reported fewer words than younger people. Accordingly we altered SNRs for the two groups to match behavioural performance. The average report score for anomalous sentences, which do not provide a contextual benefit, gives a good indication of low-level speech processing. A range of SNRs for each group was chosen in order to equate the behavioural performance while hearing the anomalous sentences. The ranges were -5 dB, -3.5 dB, -2.5 dB, -1 dB, and 0 dB for younger people and -3.5 dB, -2.5 dB, -1 dB, 0 dB, and +2.5 dB for older adults.

[Figure 3.1: schematic of the decomposition of the observation matrix X (one row x_i^T per young or old subject, formed from Contrast 1 to Contrast K) into the shared mixing matrix A (submatrices A_y and A_o) and the joint sources S^T (Map 1 to Map K); a_i denotes the coefficients associated with the ith joint source.]
Figure 3.1: Schematic of the joint Independent Component Analysis (jICA) method.

3.2.1 Joint Independent Component Analysis (jICA)

Joint ICA assumes a noise-free generative model $X = AS$, where a source matrix $S = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_M]^T$ combines with the mixing coefficients matrix $A$ (also called the ICA loading parameters matrix) to generate the observations $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^T$. The $j$th row, $\mathbf{s}_j$, of $S$ is the $j$th joint independent component, and $M$ is the number of independent components. $N$ is the number of participants and $\mathbf{x}_i$ is a vector formed by concatenating the brain features of the $i$th participant.

The jICA method, shown schematically in Fig. 3.1, involves finding $U = WX$, where $W = A^{-1}$ is called the unmixing matrix and $U$ is the estimate of the joint source matrix $S$. In this figure, $A_y$ and $A_o$ indicate the submatrices associated with the young and old subjects. A MATLAB implementation of jICA is provided by the FIT 2.0b software [30], available online at http://mialab.mrn.org/software/.

Joint independent components are found using the Infomax algorithm [12], which is based on minimization of mutual information of components. In this algorithm, the output entropy of a neural network is adaptively maximized with as many outputs as the number of Independent Components (ICs) to be estimated. The best way to estimate the most appropriate number of independent components is not clear. This number can affect the results of ICA, particularly if it is too small [112]. Information theoretic techniques, such as the Minimum Description Length (MDL) criterion [104], have been shown to be useful for selecting the number of brain basis patterns [104]. However, these techniques may not converge because of the heterogeneity in localization and size of individuals’ brain features.

The jICA procedure generates a set of joint independent components and associated mixing coefficients. These low-dimensional coefficients model the modulation of each subject’s feature by a joint source, and thus can be used as a criterion for capturing group differences.
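The decomposition described in this section can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions, not the FIT implementation: the function names `stack_contrasts` and `joint_infomax` are hypothetical, the PCA reduction and whitening step, the learning rate and the iteration count are assumptions, and the update is the basic natural-gradient form of the Infomax rule with a logistic nonlinearity.

```python
import numpy as np


def stack_contrasts(contrast_maps):
    """Concatenate K per-contrast feature matrices, each (N subjects, V voxels),
    into one observation matrix X of shape (N, K * V)."""
    return np.hstack(contrast_maps)


def joint_infomax(X, n_components, n_iter=500, lr=0.01):
    """Estimate X_centred ~= A @ U with a basic natural-gradient Infomax rule.

    Returns the mixing coefficients A (N, M) and the joint sources U (M, K * V).
    """
    # Centre each subject's concatenated feature vector.
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]

    # PCA reduction and whitening over the subject dimension (assumed step).
    E, s, Ft = np.linalg.svd(Xc, full_matrices=False)
    E, s, Ft = E[:, :n_components], s[:n_components], Ft[:n_components]
    Z = Ft * np.sqrt(n)              # whitened data: Z @ Z.T / n = I
    dewhiten = E * (s / np.sqrt(n))  # Xc ~= dewhiten @ Z

    W = np.eye(n_components)         # unmixing matrix in the whitened space
    for _ in range(n_iter):
        U = W @ Z
        # For a logistic output y = 1 / (1 + exp(-u)), 1 - 2y = -tanh(u / 2).
        G = -np.tanh(U / 2.0)
        W += lr * (np.eye(n_components) + G @ U.T / n) @ W

    U = W @ Z
    A = dewhiten @ np.linalg.inv(W)  # one row of coefficients per subject
    return A, U


# Synthetic stand-ins for three contrasts from 32 subjects (not real data).
rng = np.random.default_rng(0)
maps = [rng.laplace(size=(32, 400)) for _ in range(3)]
A, U = joint_infomax(stack_contrasts(maps), n_components=5)
print(A.shape, U.shape)  # (32, 5) (5, 1200)
```

Each row of `A` holds one subject's mixing coefficients, and each row of `U` can be split back into K maps of V voxels, one per contrast.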
To investigate whether the groups were separable by different weightings of the joint sources, unpaired two-sample t-tests with unequal variance (heteroscedastic) were performed on the mixing coefficients. The z-scaled results indicated the joint components of interest.

3.2.2 Selection of Optimal Joint Sources

If two groups differ, then the distributions of their joint components should be separable [183, 29]. Separability can be quantified by computing a divergence between joint histograms. Group joint sources are defined by $U_y = A_y^{-1} X_y = W_y X_y$, where $A_y$, $W_y$ and $X_y$ indicate the submatrices associated with the young subjects (and similarly for the older group). For each subject, the appropriate group joint source (e.g. $U_o$ for an older subject) was divided into three maps, which correspond to the three contrasts used in the jICA analysis. The map elements (each one representing a specific voxel) were sorted and thresholded, leaving a set of voxels statistically relevant to the joint source. Each voxel that survived thresholding in all three maps was included in a three-dimensional joint histogram, in a bin defined by the three contrast values (from the input observation matrix $X$) at that voxel's location.

The group-averaged joint histograms were then calculated by taking the mean of the joint histograms of each subject in the group. The difference between the two groups was then assessed using the Renyi divergence formula [90] with α = 0.5, which has been shown to be optimal for discrimination between pairs of close feature densities [90]:

$$D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}\frac{p_i^{\alpha}}{q_i^{\alpha-1}}\right) = \frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n} p_i^{\alpha}\, q_i^{1-\alpha}\right) \tag{3.1}$$

where $P$ and $Q$ are probability distributions, reflected in the group-averaged joint histograms.

The divergence is also computed for different combinations of contrasts. The higher the value of the Renyi divergence criterion, the better the discrimination between groups.
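
To make Eq. 3.1 concrete, here is a minimal sketch of the Renyi divergence between two normalized histograms. The function names and the toy four-bin histograms are assumptions for illustration; the histograms in the study are three-dimensional and group-averaged, and would be flattened before being passed to such a function.

```python
import math

def normalize(hist):
    """Scale non-negative bin counts so that they sum to one."""
    total = float(sum(hist))
    return [h / total for h in hist]

def renyi_divergence(p, q, alpha=0.5):
    """Renyi divergence D_alpha(P||Q) of Eq. 3.1 for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    if alpha == 1.0:
        raise ValueError("alpha = 1 is excluded (the formula divides by alpha - 1)")
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(total) / (alpha - 1.0)

# Toy flattened joint histograms for the two groups (hypothetical counts).
young = normalize([4, 3, 2, 1])
older = normalize([1, 2, 3, 4])

d_same = renyi_divergence(young, young)
d_diff = renyi_divergence(young, older)
```

With α = 0.5 the summand reduces to sqrt(p_i q_i), so the divergence is symmetric in P and Q and equals zero only when the two histograms coincide.
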
Therefore, the best combination of contrasts is the one that yields the highest divergence value.

3.2.3 Automatic Classification of Young and Older Subjects

In order to overcome the classification problems caused by the high dimensionality of fMRI data and the small number of available subjects (16 in each group), columns of the mixing coefficients matrix, which reflect the weighting of each joint source in a subject's contrast, were used as input features to a classification algorithm. A non-linear Support Vector Machine (SVM) was used to classify the subjects. SVM does not assume that data points conform to a specific model, but rather seeks the hyperplane that separates the two classes with maximum margin [189]. The hyperplane is defined by $f(x) = \omega K_s(x) + b$, where $K_s(x) = [k(x, s_1), \ldots, k(x, s_d)]$ is the vector of kernel functions centered at the support vectors, $\omega$ is the parameter vector and $b$ is a scalar. Radial Basis Functions (RBF) were used as the kernels: $k(x, s) = \exp(-\|x - s\|^2 / \sigma^2)$.

The data were split into a training set and a test set. In the training phase, the fusion analysis was repeated on only those subjects in the training set. This produced a new mixing coefficient matrix $A^{(train)}$ and joint source matrix $S^{(train)}$ that modeled the generation of the training observations $X^{(train)}$. The columns of the mixing matrix $A^{(train)}$ were used as input features to train the classifier, which divided the training subjects into two classes of young and older adults.

The input features for the test set, columns of $A^{(test)}$, were then found by least-squares solution of $X^{(test)} = A^{(test)} S^{(train)}$. The positions of these vectors in k-dimensional feature space, relative to the hyperplane found by running the classifier on the training set, determined the classification of each test subject. The number of columns of $A$, or mixing coefficients, used in the classification was k.
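
The projection of held-out subjects onto the training sources can be sketched as follows. This is a toy NumPy example under stated assumptions: random matrices stand in for the jICA output, the sizes (22 training and 10 test subjects, as in the split described in this section, with 8 sources and 300 voxels) and the noise level are illustrative, and `rbf_kernel` simply transcribes the kernel formula given above rather than any toolbox routine.

```python
import numpy as np

rng = np.random.default_rng(1)

n_test, n_sources, n_voxels = 10, 8, 300   # toy sizes (assumed)

# Stand-in for the joint sources estimated on the training subjects.
S_train = rng.laplace(size=(n_sources, n_voxels))

# Simulated test observations with known coefficients plus a little noise.
A_true = rng.normal(size=(n_test, n_sources))
X_test = A_true @ S_train + 0.01 * rng.normal(size=(n_test, n_voxels))

# Least-squares solution of X_test = A_test S_train.
# lstsq solves S_train.T @ Z = X_test.T for Z, and A_test = Z.T.
A_test = np.linalg.lstsq(S_train.T, X_test.T, rcond=None)[0].T

def rbf_kernel(x, s, sigma):
    """RBF kernel k(x, s) = exp(-||x - s||^2 / sigma^2), as written in the text."""
    return float(np.exp(-np.sum((x - s) ** 2) / sigma ** 2))

# Kernel value between two test subjects' feature vectors.
k01 = rbf_kernel(A_test[0], A_test[1], sigma=2.0)
```

Estimating the test coefficients against sources learned only from the training subjects keeps the test data out of the decomposition, so the reported accuracy is not inflated by information leaking from the test set.
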
A MATLAB implementation of the classifier provided by the Statistical Pattern Recognition Toolbox (STPRtool), available at http://cmp.felk.cvut.cz/cmp/cmp_software.html, was used for this step.

Performance of the classification procedure was measured by repeatedly splitting the data into training and test sets and averaging classification performance across iterations. The splitting was done 200 times (each time selecting a different set of 11 young and 11 older subjects as the training set and the remaining five subjects in each group as the test set). The False Positive (FP), False Negative (FN), True Positive (TP) and True Negative (TN) values were calculated, and the ratio of the sum of the TP and TN values to the total number of outcomes was taken as the performance metric.

Selecting the significant features (mixing coefficients) is an important factor in classification results. The mixing coefficients were sorted by the p-values resulting from a two-sample t-test checking for differences between groups, and the k coefficients with the most significant difference were chosen for use in classification.

Unlike some statistical classification methods, SVM does not provide posterior class probabilities ($P_i$). Without posterior probabilities, it is not possible to assess the performance of the classifier at other threshold values, or to measure the sensitivity and specificity of the classifier. Following Platt's approach [146], we first trained an SVM and then fitted the parameters of an additional sigmoid function that maps the SVM outputs to posterior probabilities. Using the posterior probabilities, the Receiver Operating Characteristic (ROC) curves were plotted and their Area Under Curve (AUC) calculated.
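
The two performance measures described above can be illustrated with a small pure-Python sketch. The posterior probabilities below are made-up numbers, treating "older" as the positive class and thresholding at 0.5 are assumptions for the example, and the AUC is computed with the rank-based (pairwise comparison) formulation rather than by integrating a plotted ROC curve.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct outcomes: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical posterior probabilities of the "older" class for one test split.
older_probs = [0.9, 0.8, 0.6, 0.4, 0.7]    # true older subjects
young_probs = [0.2, 0.45, 0.3, 0.1, 0.65]  # true young subjects

threshold = 0.5
tp = sum(p > threshold for p in older_probs)
fn = len(older_probs) - tp
fp = sum(p > threshold for p in young_probs)
tn = len(young_probs) - fp

split_accuracy = accuracy(tp, tn, fp, fn)
split_auc = auc_from_scores(older_probs, young_probs)
```

In the procedure described in this section, such per-split values would be averaged over the 200 random splits.
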
The AUC is a widely used metric for comparing the accuracy of classification methods in the machine learning community. Detection reliability, ρ, was defined based on AUC as ρ = 2 × AUC − 1 [152]. Under this definition, ρ = 1 for perfect detection and ρ = 0 for failure in detection.

The joint ICA classification result was compared to ICA for each of the three contrasts separately, to examine whether the fusion analysis has advantages over analysis of each contrast separately. As in the fusion analysis, the mixing coefficients were employed as input features for classification of young and older adults.

3.3 Results

Although older adults need higher SNRs to achieve the same performance scores as young adults (i.e. they do not perform as well in noise), behaviourally there is no difference in the amount of benefit they get from contextual information compared to anomalous information. The goal of our fusion analysis is to examine whether jICA components can be used to accurately distinguish young and older adults on the basis of fMRI data from a speech perception experiment, despite the similarity in contextual benefit between the groups. The success of this analysis is evaluated by examining the statistical difference among the mixing coefficients of joint sources, by applying the Renyi divergence criterion, and by an automatic classification method. These tests are described in the following three subsections.

3.3.1 Statistical Difference Among Joint Sources

Unpaired two-sample t-tests (assuming unequal variance) were performed on the mixing coefficients, and two components were found to differ significantly between the two groups (p-value = 0.00071 and p-value = 0.0301, number of subjects = 16 in each group). Fig. 3.2a, Fig. 3.2b and Fig. 3.2c show the statistical Z-maps generated for the joint source (shown as rows of Map 1, Map 2, and Map 3 in Fig. 3.1) with the largest group difference. Fig.
3.2d shows that the mixing coefficients for this joint source have higher values in older subjects compared to younger subjects. Table 3.1 shows the corresponding stereotaxic coordinates in MNI space for this source.

3.3.2 Selection of Optimal Joint Sources

The sorted maximum Renyi divergence values for different combinations of contrasts are shown in Fig. 3.3. Higher values indicate better discrimination between the groups. Combining all three contrast images yields the highest divergence. It can also be seen that the contrast comparing responses to anomalous sentences and to unintelligible noise is the most effective single contrast in separating the groups.

3.3.3 Classification

Fig. 3.4 shows the average classification accuracy for different numbers of features. Results show that the young and older subjects can be classified based on their patterns of activity across the three contrasts of interest. Considering that the number of subjects is low and the dimensionality of the input fMRI dataset to the fusion framework is quite high, the results are very promising. By selecting only the first three features, an average classification accuracy of around 75% is obtained. Fig. 3.5 shows the ROC curves and detection reliability for four different numbers of features. Since each additional feature is (by definition) less important than the previous one, adding features beyond three leads to diminishing improvements in discriminability.

Using the first three features, the joint ICA classification result is compared to ICA for each of the three contrasts separately in Fig. 3.6, which shows the ROC curves, the detection reliability and the area under the ROC curve obtained.
Classification of individuals based on ICAs of the contrast comparing anomalous sentences and noise, and of the contrast comparing SCN and silence, yields significantly (p-value < 0.05) better-than-chance performance, but substantially lower than that obtained by the jICA classification.

[Figure 3.2: four panels (a), (b), (c), (d).]
Figure 3.2: Joint Independent Component Analysis (jICA) of brain patterns. The joint source map of the most significant component for the contrast of high vs anomalous sentences (a), for the contrast of anomalous sentences vs SCN (b), and for the contrast of SCN vs silence (c), along with the mixing coefficients for the young and older subjects (d), is presented.

Table 3.1: Talairach coordinates for the most discriminative source map in three contrasts. MNI coordinates of voxels that are above a threshold of |Z| > 3.5 are converted to Talairach coordinates. L and R show the assigned anatomical left and right hemispheres; cc stands for cubic centimeters, showing the volume concentration of voxels; the coordinates and value of the maximum Z are also provided in the table.
Non-significant regions are indicated by ns.

| Area | Brodmann Area | R/L volume (cc) | R/L random effects: Max Value (x, y, z) |
|---|---|---|---|
| Higher-level patterns, Positive | | | |
| Cingulate Gyrus | 31 | 0.5/0.5 | 4.2(−4, −48, 35)/4.3(7, −48, 35) |
| Precuneus | 7, 31, 39 | 0.4/0.5 | 4.2(−4, −48, 38)/4.1(7, −48, 38) |
| Posterior Cingulate | 23, 31 | 0.2/0.2 | 3.9(0, −45, 31)/3.8(3, −45, 31) |
| Angular Gyrus | ∗ | 0.0/0.1 | ns/3.6(41, −63, 40) |
| Higher-level patterns, Negative | | | |
| Middle Frontal Gyrus | ∗ | 0.0/0.2 | ns/4.3(44, 15, 32) |
| - | ∗ | 0.0/0.2 | ns/4.0(48, 15, 32) |
| Inferior Frontal Gyrus | 9 | 0.0/0.3 | ns/4.0(44, 15, 32) |
| Precentral Gyrus | − | 0.0/0.1 | ns/3.7(48, 7, 39) |
| Mid-level patterns, Positive | | | |
| Middle Temporal Gyrus | 21, 22 | 0.7/1.6 | 4.8(−64, −5, −6)/5.1(67, −29, 8) |
| Middle Frontal Gyrus | 6 | 0.1/1.0 | 3.6(−57, 10, 55)/4.7(51, 10, 55) |
| Superior Temporal Gyrus | 21, 22, 42 | 0.3/1.7 | 3.8(−55, −32, 8)/4.9(57, −32, 6) |
| - | ∗ | 0.1/0.5 | ns/4.2(63, −42, 31) |
| Inferior Parietal Lobule | 40 | 0.0/0.4 | 3.9(−51, −32, 8)/4.6(57, −35, 6) |
| Precentral Gyrus | 6 | 0.0/0.1 | ns/3.9(57, 7, 51) |
| Postcentral Gyrus | ∗ | 0.0/0.1 | ns/3.6(63, −26, 45) |
| Inferior Frontal Gyrus | ∗ | 0.0/0.1 | ns/3.6(48, 12, 32) |
| Supramarginal Gyrus | 40 | 0.0/0.1 | ns/4.0(63, −43, 28) |
| Lower-level patterns, Positive | | | |
| Postcentral Gyrus | 40, 43 | 1.1/0.1 | 6.0(−74, −16, 20)/3.6(67, −24, 23) |
| Superior Temporal Gyrus | 22, 41, 42 | 1.9/0.7 | 5.6(−74, −22, 17)/4.4(67, −28, 24) |
| Transverse Temporal Gyrus | 41, 42 | 1.2/0.2 | 5.5(−67, −10, 12)/4.0(67, −13, 13) |
| Precentral Gyrus | 43 | 0.1/0.0 | 4.7(−64, −7, 12)/ns |
| Insula | 13 | 0.0/0.2 | ns/4.2(63, −31, 24) |
| - | ∗ | 0.1/0.0 | 5.9(−78, −22, 17)/ns |
| Lower-level patterns, Negative | | | |
| Inferior Frontal Gyrus | 47 | 0.4/0.0 | 3.9(−34, 28, −4)/ns |
| Insula | 13 | 0.1/0.0 | 3.6(−31, 22, −6)/ns |

[Figure 3.3 plot: x-axis "Combination of contrasts" (1 to 7), y-axis "Renyi divergence". Combinations: 1: Coherent−Anomalous + Anomalous−SCN + SCN−Rest; 2: Anomalous−SCN + SCN−Rest; 3: Coherent−Anomalous + Anomalous−SCN; 4: Coherent−Anomalous + SCN−Rest; 5: Anomalous−SCN; 6: SCN−Rest; 7: Coherent−Anomalous.]
Figure 3.3: The effect of combining different contrasts on differentiation between histograms. The higher the value of the Renyi divergence, the better the discrimination between groups.

Fig. 3.7 shows the ROC curves and the corresponding detection reliability when the analysis is run with different numbers of ICs. Adding more ICs has little effect on the performance: comparing the classification errors, there is no major difference between different numbers of ICs.

3.4 Discussion

In this study we used GLM to generate activation maps for multiple fMRI conditions related to speech perception and comprehension, and jICA to decompose the activation maps into independent maps that share modulation profiles. This is similar to assuming a fixed hemodynamic response for each subject and modeling the amplitude differences in modulation profiles [31]. The modulation profiles were used within a non-linear SVM framework to classify individuals.

[Figure 3.4 plot: x-axis "Number of significant components used" (1 to 8), y-axis "Average Accuracy (%)".]
Figure 3.4: Classification accuracy for different numbers of features used. Selecting only three significant features (p-value < 0.05) produced an average classification accuracy of around 75%.
Our major findings are: (1) brain functional patterns of activation permit classification of individuals as younger or older; (2) combining these patterns improves the separability of joint sources and the accuracy of classification of individuals.

The joint source that differed the most between the two groups appeared to reflect activation differences at multiple levels of processing, including in left inferior frontal cortex for the contrast between coherent and anomalous sentences, in both temporal lobes for the contrast between anomalous sentences and SCN, and in primary auditory cortex bilaterally for the contrast of SCN vs. silence. This reflects the well-known hierarchy of speech processing in which low-level acoustic features are analyzed in auditory cortex, superior and middle temporal gyri are sensitive to processing of auditorily presented sentences, and left inferior frontal gyrus activity reflects higher-level linguistic (possibly semantic) processing [56, 55, 161, 142, 137].

[Figure 3.5 plot: ROC curves; x-axis "False Positive Ratio (1 − specificity)", y-axis "True Positive Ratio (sensitivity)". Legend: 2 ICs (ρ=64.1%), 4 ICs (ρ=73.9%), 6 ICs (ρ=74.5%), 8 ICs (ρ=73.5%).]
Figure 3.5: ROC curves and detection reliability resulting from four different numbers of features used. Perfect detection: ρ = 1; detection failure: ρ = 0. Performance of the classification was measured by repeatedly splitting the data into training and test sets and averaging classification performance on the test data set. Splitting was done 200 times and the classification was repeated 5 times with randomized order of the subjects in the training dataset. ROC curves are computed using posterior probabilities of the SVM output.

The relation between the three maps of the joint sources (see Fig. 3.1) was investigated by looking back to the SPM contrast images and examining regions that contributed significantly to the joint source, i.e. computing the joint histogram.
The divergence criterion derived from the joint histograms was used to measure the separability of the two groups based on the joint sources. This criterion confirmed that the fusion of contrasts improved separability, compared to the consideration of each contrast separately.

[Figure 3.6 plot: ROC curves; x-axis "False Positive Ratio (1 − specificity)", y-axis "True Positive Ratio (sensitivity)". Legend: Coherent vs Anomalous + Anomalous vs SCN + SCN vs Rest (ρ=72.6%); Coherent vs Anomalous (ρ=18.7%); Anomalous vs SCN (ρ=61.2%); SCN vs Rest (ρ=53.1%).]
Figure 3.6: ROC curves and detection reliability for different contrasts. Perfect detection: ρ = 1; detection failure: ρ = 0. All of the contrasts (except that of coherent vs anomalous sentences) show high detection accuracy.

Results demonstrate that individuals can be classified relatively accurately into young and older age groups by combining functional contrasts sensitive to the processing of noise vs. silence, anomalous sentences vs. noise, and coherent vs. anomalous sentences. Note that although the brain imaging data permit this classification, behavioral data did not: the ability to report words from the anomalous and coherent sentences was matched between young and older listeners for the contrasts examined. Fig. 3.4 shows that using only three coefficients of the mixing matrix, a classification accuracy of around 75% can be obtained, albeit with a high standard deviation. The high standard deviation of the classification performance might be due to the small number of datasets, or because our method of controlling for hearing ability based on the behavioral performance did not work as well as we had hoped.

[Figure 3.7 plot: ROC curves; x-axis "False Positive Ratio (1 − specificity)", y-axis "True Positive Ratio (sensitivity)". Legend: 8 ICs (ρ=73.4%, AUC = 0.867); 12 ICs (ρ=68.7%, AUC = 0.843); 16 ICs (ρ=68.3%, AUC = 0.841).]
Figure 3.7: ROC curves and detection reliability for different numbers of ICs. Perfect detection: ρ = 1; detection failure: ρ = 0.

Although using all three contrasts resulted in the best detection reliability, i.e. the highest area under the ROC curve, the contrast of anomalous sentences vs. unintelligible noise had the most impact on separability of the groups, with a detection reliability of around 60% (AUC of 80%) by itself. This may be because, in order to match intelligibility, older adults heard sentence materials at more advantageous SNRs, and the acoustic differences between these more positive SNRs and those experienced by the younger listeners may be reflected in different patterns of activity in auditory regions in the two groups. The analysis did not appear sensitive to the number of independent components (8, 12, or 16) included. Also, the number of features, as long as it was three or more, had relatively little impact on classification accuracy.

This was simply a validation study to demonstrate that information across multiple functional contrasts can be usefully combined for classification. Although here we differentiate young and older people, using age as an observable "gold standard" way to discriminate groups, we anticipate that this method will be useful to aid in classification of individuals into clinical groups using objective, quantitative criteria. In Chapter 7, we apply this method in order to classify individuals with depression.

In summary, using the joint ICA method together with an SVM classification algorithm, we have demonstrated that cognitive patterns can be used to classify individuals in the absence of behavioral differences.
Feasibility of the proposed framework is shown by demonstrating that functional activity maps can be used to classify subjects accurately. The best combination of contrasts and optimal components are identified. We showed that by combining three different functional contrasts, revealing three different patterns of brain activity, the overall performance of the classification improves.

Chapter 4
Joint Sparse Representation Analysis

4.1 Introduction

A recent study by Daubechies et al. showed that the theoretical assumption of independence of the patterns extracted by ICA algorithms is not guaranteed in practice [51]. Furthermore, there is no physical reason for the spatial samples to correspond to different activity patterns with statistically independent distributions [125]. It has been shown that, by using ICA algorithms, patterns are separated on the basis of spatial sparsity rather than independence [51, 195, 114]; therefore, mathematical properties of brain data other than independence should be used.

Unlike Independent Component Analysis based methods, sparse representation methods do not require independence of the brain activation patterns. Although a recent study by Calhoun et al. claimed that ICA algorithms used for fMRI analysis select for independence rather than sparsity [32], several studies have shown superiority of sparse representation methods over the ICA-based methods [164, 103, 64]. Roussos et al. employed wavelets and sparsity-inducing adaptive priors to construct a structured generative latent-variable model, and showed their proposed algorithm outperforms ICA in benchmark datasets. Lee et al. used K-Singular Value Decomposition (K-SVD; [1]) to estimate design matrices within a sparse general linear model (GLM) framework [103]. Eavani et al. utilized K-SVD to identify distinct functional sub-networks of resting-state fMRI data [64]. Kim et al.
employed sparse prior regularization for temporally concatenated ICA to enhance the estimation of an individual's spatial patterns and the temporal components of neuronal activation [98]. Wang et al. showed better detection sensitivity of fMRI signal using a sparse approximation coefficient prior to ICA decomposition [202]. These studies demonstrated the superiority of sparse representation methods over ICA-based algorithms.

Footnote: This chapter is adapted from [154]: Mahdi Ramezani, Kristopher Marble, Heather Trang, Ingrid Johnsrude, and Purang Abolmaesumi, Joint Sparse Representation of Brain Activity Patterns in Multi-task fMRI Data, IEEE Transactions on Medical Imaging, 2014.

4.2 Method

In multi-task fMRI, multiple functional contrasts are obtained from subjects' fMRI responses to different stimuli and tasks. It is reasonable to expect that these functional contrasts contain some shared information, because even distant brain regions are often connected [127]. Our goal is to examine brain activations across multiple tasks, collected on the same participants, in a unified analytic framework by modeling potential coupling among the data from different fMRI tasks (the features). Here, the features are contrast images generated from multiple tasks. Each feature is represented as a set of sparse, linearly mixed, joint brain maps, which is formally analogous to conventional sparse representation of signals.

Sparse representation methods assume a generative model $y = Dx$, where a dictionary, $D$, combines with the sparse coefficients, $x$, to generate the observation $y$. The objective is to maximize the likelihood that the observation has an efficient, sparse representation in a redundant dictionary given by the matrix $D$ [191].
Formally, the goal of learning is to find the overcomplete dictionary $D^*$ such that

$$D^* = \arg\max_{D}\left[\log P(y \mid D)\right] = \arg\max_{D}\left[\log \int P(y \mid x, D)\, P(x)\, dx\right] \tag{4.1}$$

If we were to run the sparse representation on data from each fMRI task separately, we would maximize the likelihood functions in separate sparse representation analyses as follows:

$$\begin{aligned}
D_1^* &= \arg\max_{D_1}\left[\log \int P(y_1 \mid x_1, D_1)\, P(x_1)\, dx_1\right],\\
D_2^* &= \arg\max_{D_2}\left[\log \int P(y_2 \mid x_2, D_2)\, P(x_2)\, dx_2\right],\\
&\;\;\vdots\\
D_M^* &= \arg\max_{D_M}\left[\log \int P(y_M \mid x_M, D_M)\, P(x_M)\, dx_M\right].
\end{aligned} \tag{4.2}$$

This would result in M sets of dictionaries $[D_1^*, D_2^*, \ldots, D_M^*]$ and sparse coefficients $[x_1^*, x_2^*, \ldots, x_M^*]$. To obtain the joint relation of the results we would need to combine the sparse coefficients. However, if we utilize a fusion approach, the dictionaries can be learned such that they efficiently describe the content of the fMRI contrasts and simultaneously allow us to capture the correlation among multi-task contrasts. The learning objective is to maximize the joint likelihood that all contrasts are well represented by a dictionary $D$ [191]:

$$D^* = \arg\max_{D}\left[\log \int P(y_1, y_2, \ldots, y_M \mid x, D)\, P(x)\, dx\right] \tag{4.3}$$

where $y_j$ is the functional contrast obtained from the subject's fMRI responses to task j, and $D$ is the joint dictionary containing joint sources.

The flowchart of the analysis, including the preprocessing, feature selection, joint sparse representation analysis, and visualization of the output sources, is displayed in Fig. 4.1. Preprocessing steps include realignment of the functional images, coregistration of the functional images to the structural MRI data of each subject, normalization of each subject's structural MRI data to a template, and use of the resulting deformation parameters to normalize the functional images. To select features, single-subject general linear models are created by coding the condition to which each scan of the fMRI session belonged. As a result, functional contrast images (the features) are generated for each subject that represent the brain activation patterns related to the specific task.
The functional contrast images of each subject are normalized to have the same average sum of squares. Then, the subject's functional contrasts are stacked together and a joint activation pattern (joint feature) is created. The observation matrix, $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{V \times N}$, is created from the joint features of all subjects, where $y_i = [{y_i^1}^T, {y_i^2}^T, \ldots, {y_i^M}^T]^T$ is the vector containing the stacked functional contrasts of subject i.

In jSRA, we assume a generative model $Y = DX$. In this model, $Y$ is the observation matrix, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{M \times N}$ is the sparse modulation matrix, and $D = [d_1, d_2, \ldots, d_M] \in \mathbb{R}^{V \times M}$ is the dictionary containing M signal atoms, where $d_i = [{d_i^1}^T, {d_i^2}^T, \ldots, {d_i^M}^T]^T \in \mathbb{R}^{V}$ is the vector containing the functional maps of dictionary atom i. Each of these atoms represents a joint activation pattern of the brain extracted from the subjects' functional contrasts. V, N, and M are the number of total voxels, subjects and brain patterns, respectively.

[Figure 4.1 flowchart blocks: fMRI data; preprocessing (realignment, coregistration, normalization); feature selection; joint source based analysis (feature normalization, matrix composition, K-SVD decomposition); statistical analysis (Z-scale and thresholding); component display.]
Figure 4.1: Flowchart of the analysis, including preprocessing, feature selection, joint sparse representation analysis, and visualizing the output sources.

Fig. 4.2 shows the schematic of the joint sparse representation analysis. Assuming that maps of joint sources share the sparse modulation matrix, the method can represent a large set of functional contrasts as a sparse linear combination of a small set of joint basis patterns. This method does not provide information about inter-voxel relationships, but can be used to identify common networks that are differentially involved between groups.

[Figure 4.2 diagram: the observations Y (Feature 1 to Feature M, for Grp1 and Grp2) are the product of the dictionary D of K joint sources (Map 1 to Map M) and the sparse coefficient matrix X.]
Figure 4.2: Schematic of the joint Sparse Representation Analysis (jSRA) method.

Prior to the decomposition, the appropriate number of brain basis patterns (dictionary size) should be selected. The best way to estimate this number, which may affect the results of the sparse representation analysis, is not clear [64]. Information theoretic techniques have been shown to be useful for selecting the number of brain basis patterns [104]. However, these techniques may not converge because of the heterogeneity in localization and size of activations. Therefore, in this study we perform the analysis using different values for the number of brain patterns, and compare the results with each other.

In the jSRA method, we use the K-Singular Value Decomposition (K-SVD) method [1] to decompose the multiple fMRI contrast images into joint sources. K-SVD is a dictionary-learning algorithm for sparse signal representations, which decomposes the input matrix, $Y$, into a linear combination of dictionary elements, $D$, using the fewest non-zero coefficients. In other words, it solves the following minimization problem:

$$\min_{D, X}\left\{\|Y - DX\|_F^2\right\} \quad \text{subject to } \forall i,\ \|x_i\|_0 \le T_0 \tag{4.4}$$

In this equation, $\|\cdot\|_F$ is the Frobenius matrix norm and $T_0$ is the maximum number of non-zero elements in each linear combination. To find the sparsest representation of the input contrasts, K-SVD iteratively updates the vectors $x_i$ and each column of the dictionary in two steps [1]. In the first step, assuming a fixed dictionary, the following minimization problem is solved using the Orthogonal Matching Pursuit (OMP) algorithm [141]:

$$\min_{x_i}\left\{\|y_i - D x_i\|_F^2\right\} \quad \text{subject to } \|x_i\|_0 \le T_0, \quad i = 1, 2, \ldots, N. \tag{4.5}$$

In the second step, for each column of the dictionary ($d_k$), the representation error ($E_k$) is computed and the updated dictionary column ($\hat{d}_k$) is obtained using SVD decomposition:

$$E_k = Y - \sum_{j \ne k} d_j x^j, \qquad E_k = U \Delta V^T, \qquad k = 1, 2, \ldots, K, \tag{4.6}$$

where $x^j$ is the jth row of $X$. The first column of $U$ is chosen as $\hat{d}_k$, and the first column of $V$ multiplied by $\Delta(1, 1)$ is chosen as the updated row $x^k$. These steps are run for a finite number of iterations (see [1] for more details).

The K-SVD procedure generates a set of joint sources and associated sparse coefficients. These low-dimensional coefficients model the modulation of each subject's functional contrast by a joint source. Thus, in group analysis of fMRI data, these coefficients can be used as a criterion for capturing group differences. To investigate whether the groups are separable by different weightings of the joint sources, unpaired two-sample t-tests with unequal variance (heteroscedastic) are performed on the mixing coefficients. For visualization, the corresponding joint sources are converted to z-values.

4.3 Simulations

4.3.1 Data Generation

The simulated fMRI data were generated by taking experimental fMRI data acquired using a sparse imaging technique during a task paradigm, and (a) randomizing the time course of each voxel to remove the intrinsic task-related activations, (b) adding pre-defined spatial patterns of functional activation with varying degrees of spatial overlap across subjects, and (c) adding corresponding task-related hemodynamic response functions (HRFs) associated with the neuronal activation.

A total of 20 simulated datasets that represent two groups of subjects, each with 10 datasets, were generated. For each subject, multiple datasets representing multiple tasks were created. To make the simulations more realistic, we used the experimental fMRI data of a subject, which were acquired during a speech comprehension task (see Section 2.1).
The datawere acquired across four blocks of trials, each 81 trials long. Each block ofdata was used to generate a simulated dataset that represents fMRI data ob-tained during a single task. As each task can activate multiple regions, whichmay or may not be similar to those activated by other tasks, multiple re-gions of interest (ROI) were defined. Five regions of interest (ROI) includingBroca’s area, left hand motor function, medial frontal cortex, right dorso-lateral prefrontal cortex, and Wernicke’s area were selected from the non-overlapping functionally connected brain regions defined by the PittsburghBrain Connectivity competition (PBC 2009). The PBC functional maps arecortical areas that are associated with cognitive control networks [42], lan-guage function (Wernicke’s and Broca’s area [17]), motor function (hand,foot, and tongue [198]), retinal field maps [201], and auditory responsivecortex [71]. Fig. 4.3 shows the regions of interest.For each group, we defined four tasks that activate similar ROIs in sub-jects within each group, and similar and/or different ROIs between the twogroups of subjects. Table 4.1 shows the ROIs used for each group for eachtask. Task 1 is defined so that it includes two similar regions between thegroups; task 2 and task 3 have one similar and one dissimilar regions; andtask 4 has two dissimilar regions between the groups.Since each activated region in each subject can be slightly different fromother subjects within the same group, the ROIs were randomly dilated usinga disk with variable radius of one to five voxels for each subject. Therefore,all subjects within each group have a common known area of activation.For each ROI of each subject a synthetic activation signal was createdusing the SimTB MATLAB toolbox [68]. A block experimental design with81 time samples, a repetition time of 9 sec, a block length of 6 sec, and inter-stimulus interval of 6 sec was defined. 
The activation signal was generated by linear convolution of the block design signal and a canonical HRF.

Rician noise [84] was introduced by adding noise to the real and imaginary parts of the activation signals and taking the magnitude (the square root of the sum of squares), scaled to a specified contrast-to-noise ratio (CNR). The CNR, which is the ratio of the standard deviation of the signal to the standard deviation of the noise [68], was drawn for each signal from a uniform distribution between 0.5 and 1.5.

Figure 4.3: ROIs representing the activated regions within the brain, defined using the brain functional connectivity maps provided by the Pittsburgh Brain Connectivity competition (PBC 2009). Blue, green, red, cyan and magenta show Broca, left hand, medial frontal cortex, right dorsolateral prefrontal cortex, and Wernicke, respectively.

Table 4.1: ROIs used for each group for each task.
Task | group 1 | group 2
1 | broca; wernicke | broca; wernicke
2 | medial frontal cortex; right dorsolateral prefrontal cortex | left hand; right dorsolateral prefrontal cortex
3 | broca; left hand | left hand; wernicke
4 | medial frontal cortex; wernicke | broca; right dorsolateral prefrontal cortex

Figure 4.4: fMRI signal of a voxel before (a) and after randomization (b), the created synthetic activation signal (c), and simulated fMRI signal (d). A simulated fMRI signal is generated by adding the weighted activation signal to the randomized fMRI signal.

After realigning the data set using SPM8, the time course of each voxel was randomized to remove the intrinsic activations in the whole dataset. Then, for each subject, the activation signal of each ROI was added to the time courses of voxels in that ROI.
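The generation steps described so far (randomizing a voxel time course, convolving a block design with a canonical HRF, and adding the weighted result to the randomized signal) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SimTB implementation used in the thesis: the double-gamma HRF parameters, the 1-second design grid, the function names, and the `amplitude_fraction` argument are all hypothetical choices, and the Rician noise step is omitted for brevity.

```python
import math
import random


def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Double-gamma HRF at time t in seconds (parameter values are assumptions)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        return x ** (shape - 1.0) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - undershoot_ratio * gamma_pdf(t, undershoot_shape)


def simulate_voxel(original, tr=9.0, block_on=6.0, block_off=6.0,
                   amplitude_fraction=0.03, seed=0):
    """Return (randomized, activation, simulated) time courses for one voxel."""
    rng = random.Random(seed)
    n_scans = len(original)

    # (a) Shuffle the time course to remove intrinsic task-related structure.
    randomized = list(original)
    rng.shuffle(randomized)

    # Block design on a 1-second grid: block_on seconds of task, block_off of rest.
    n_fine = int(n_scans * tr)
    period = block_on + block_off
    boxcar = [1.0 if (t % period) < block_on else 0.0 for t in range(n_fine)]

    # (c) Linear convolution of the design with the HRF, truncated to the design length.
    hrf = [canonical_hrf(float(t)) for t in range(32)]
    convolved = [
        sum(boxcar[t - k] * hrf[k] for k in range(min(t + 1, len(hrf))))
        for t in range(n_fine)
    ]

    # Sample the predicted response once per scan and scale it to unit peak.
    activation = [convolved[int(round(i * tr))] for i in range(n_scans)]
    peak = max(abs(a) for a in activation)
    activation = [a / peak for a in activation]

    # Add the activation, weighted by a fraction of the voxel's mean intensity.
    weight = amplitude_fraction * sum(randomized) / n_scans
    simulated = [r + weight * a for r, a in zip(randomized, activation)]
    return randomized, activation, simulated
```

Calling `simulate_voxel(signal, amplitude_fraction=0.05)` on a list of 81 intensities would apply a stronger weighting to the same design; spatial steps such as ROI dilation are outside the scope of this sketch.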
The amplitude of the activation signal was chosen to be 3% of the mean amplitude of the time course of the selected voxel for the subjects in group 1, and 5% for the subjects in group 2. Fig. 4.4 shows the fMRI signal of a voxel before (a) and after randomization (b), the synthetic activation signal (c), and the simulated fMRI signal (d), which is created by adding the weighted activation signal to the randomized fMRI signal.

4.3.2 Experiments and Results

The simulated data were spatially smoothed using a Gaussian kernel of 8 mm, and single-subject general linear models were created to generate the contrast images related to the simulated conditions. Different combinations of contrasts generated from each task were used as input observations to the jSRA method. Besides the combination of all four tasks, two combinations of three tasks were generated: one by removing the task that activated two similar regions between the groups, i.e. task 1, and one by removing the task that activated two dissimilar regions between the two groups, i.e. task 4. See Table 4.1 for the regions in each task. Combinations of two tasks were generated as follows: tasks 1 and 2, which do not have any overlap among the synthetic activation regions, i.e. broca and wernicke regions for task 1, and medial frontal cortex, right dorsolateral prefrontal cortex and left hand regions for task 2; tasks 3 and 4, which have one activation map for each group being opposites of each other, i.e. the activation map in task 3 of group 1 is similar to that in task 4 of group 2 and vice versa; tasks 1 and 4, which have one similar activation map for both groups, i.e. wernicke region in group 1 and broca region in group 2; and tasks 2 and 3, which have one similar activation map for one of the groups, i.e. left hand region in group 2.

Using the jSRA method, the joint sources for each combination were obtained.
These joint sources are estimates of the true joint sources, i.e. the activation maps that were used to generate the observations. We compared the estimated and true joint sources by calculating the False Positive (FP), False Negative (FN), True Positive (TP) and True Negative (TN) values. TP is the number of correctly identified active voxels, TN the number of correctly identified non-active voxels, FP the number of non-active voxels identified as active, and FN the number of active voxels identified as non-active, in the estimated joint sources. Furthermore, similarity indices, namely Jaccard, precision, and sensitivity, were calculated as defined below:

\mathrm{Jaccard} = \frac{TP}{TP + FP + FN},    (4.7a)
\mathrm{Precision} = \frac{TP}{TP + FP},    (4.7b)
\mathrm{Sensitivity} = \frac{TP}{TP + FN}.    (4.7c)

As there are many more non-active voxels in the brain than active voxels, TN is much larger than TP, FP, and FN. The specificity and accuracy of the results will therefore be near 99%, and will not differ between jSRA and jICA. Accordingly, we do not report these indices. The jICA method [31] was applied to the same combinations of contrasts, and the same similarity indices were calculated.

Table 4.2 shows the Jaccard index, precision, and sensitivity calculated from the estimated maps of the joint sources for the jSRA and jICA methods. As can be seen, the jSRA method is superior to the jICA method, with a higher Jaccard index in all but one case (jS2 for tasks 3 and 4). The best jSRA performance occurs for the combination of tasks 3 and 4, where an average Jaccard index of 63.35% is obtained, and the worst performance is for the combination of all tasks, where an average Jaccard index of 47.96% is obtained.

Table 4.2: Similarity coefficients showing the result of jICA and jSRA methods for different combinations of tasks. jS1 and jS2 stand for joint source 1 and 2. All values are percentages.
Tasks | Method | Jaccard jS1 | Jaccard jS2 | Precision jS1 | Precision jS2 | Sensitivity jS1 | Sensitivity jS2
T1+T4 | jICA | 49.2 | 54.1 | 62.7 | 59.9 | 69.6 | 85.3
T1+T4 | jSRA | 52.9 | 63.2 | 68.5 | 76.3 | 69.9 | 78.6
T2+T3 | jICA | 53.8 | 39.3 | 82.5 | 76.3 | 69.9 | 78.6
T2+T3 | jSRA | 62.5 | 52.9 | 74.8 | 65.8 | 79.2 | 73.0
T1+T2 | jICA | 40.3 | 56.4 | 53.7 | 62.8 | 61.6 | 84.8
T1+T2 | jSRA | 47.1 | 59.2 | 59.9 | 71.8 | 68.8 | 77.1
T3+T4 | jICA | 46.4 | 67.4 | 60.6 | 75.7 | 66.6 | 86.1
T3+T4 | jSRA | 60.1 | 66.6 | 74.0 | 79.8 | 76.2 | 80.0
T1+T2+T3 | jICA | 37.5 | 58.2 | 50.1 | 72.3 | 59.8 | 74.9
T1+T2+T3 | jSRA | 51.9 | 62.2 | 64.9 | 74.1 | 72.1 | 79.4
T2+T3+T4 | jICA | 26.6 | 54.9 | 36.2 | 65.9 | 50.1 | 76.6
T2+T3+T4 | jSRA | 53.7 | 62.7 | 65.6 | 75.7 | 74.8 | 78.5
T1+T2+T3+T4 | jICA | 30.7 | 35.1 | 40 | 53.5 | 56.8 | 50.6
T1+T2+T3+T4 | jSRA | 52.6 | 43.3 | 66.3 | 62.1 | 71.8 | 58.8

As the number of combined tasks, which is related to the dimensionality of the input observations, increases, the values of the similarity indices between the estimated and true maps drop. However, the jSRA method is more robust to the increase in dimensionality. In other words, the improvement of jSRA over jICA becomes more marked as the number of combined tasks increases. Fig. 4.5 (a, b, c) shows the average Jaccard, precision and sensitivity for combinations of two, three and four tasks.

Figure 4.5: Average Jaccard (a), precision (b), and sensitivity (c) for combinations of two, three and four tasks. [Plots: x-axis, number of combined tasks (2, 3, 4); y-axis, index value in % (0 to 100); series, jICA and jSRA.]

4.4 Results

For visualization, the joint source maps were converted to empirical Z-values by subtracting the mean of the joint source from each value and then dividing each result by the standard deviation. Figs.
4.6, 4.7, and 4.8 show the statistical Z maps (|Z| > 2.25) generated for the most significant joint source (i.e., the one with the smallest p-value) obtained using the statistical t-test on the sparse/mixing coefficients for the jSRA/jICA methods. The three figures correspond to analyses with 4, 8 and 12 components, respectively. Figs. 4.6(a, e), 4.7(a, e), and 4.8(a, e) show the distribution of the sparse/mixing coefficients with the most significant difference between the two groups. As can be seen, the absolute value of the modulation coefficients is higher for older compared to young adults. These coefficients provide a measure of functional connectivity [31], implying greater functional connectivity among different levels of auditory processing in older adults. Fig. 4.6e shows that jICA failed to identify a joint source that is significantly different between the two groups when the number of estimated independent components was four.

We expected to see joint activation maps in the left inferior frontal cortex for the high-level contrast, in the temporal lobes, particularly in the left hemisphere, for the mid-level contrast, and in the primary auditory cortex in both hemispheres for the low-level contrast [186]. These regions are shown using white ovals in Figs. 4.6, 4.7, and 4.8. As can be seen in these figures, both the jICA and jSRA methods showed almost all of the regions using 4, 8, and 12 sources. However, jICA failed to identify the left temporal lobes in the mid-level contrast (compare Fig. 4.7g to Fig. 4.7c), and failed to show activations in the auditory cortex in the left hemisphere in the low-level auditory analysis contrast (compare Fig. 4.7h to Fig. 4.7d). Besides, there are unrelated activations (shown in blue) captured in Fig. 4.7h and Fig. 4.8h.

Figure 4.6: Significant joint source for jSRA (a-d) and jICA (e-h) methods, using 4 sources. From left to right, the panels show the distributions of the coefficients, and the corresponding maps related to high-, mid-, and low-level auditory processing. White ovals show the most expected activation regions: 1) left inferior frontal cortex for the high-level contrast; 2) temporal lobes, particularly in the left hemisphere, for the mid-level contrast; and 3) primary auditory cortex in both hemispheres for the low-level contrast. [Coefficient distributions, Young vs. Old: (a) component 4, p-value = 0.019; (e) component 3, p-value = 0.132.]

Figure 4.7: Significant joint source for jSRA (a-d) and jICA (e-h) methods, using 8 sources. From left to right, the panels show the distributions of the coefficients, and the corresponding maps related to high-, mid-, and low-level auditory processing. White ovals show the most expected activation regions: 1) left inferior frontal cortex for the high-level contrast; 2) temporal lobes, particularly in the left hemisphere, for the mid-level contrast; and 3) primary auditory cortex in both hemispheres for the low-level contrast. [Coefficient distributions, Young vs. Old: (a) component 4, p-value = 0.015; (e) component 7, p-value = 0.0312.]

Figure 4.8: Significant joint source for jSRA (a-d) and jICA (e-h) methods, using 12 sources. From left to right, the panels show the distributions of the coefficients, and the corresponding maps related to high-, mid-, and low-level auditory processing. White ovals show the most expected activation regions: 1) left inferior frontal cortex for the high-level contrast; 2) temporal lobes, particularly in the left hemisphere, for the mid-level contrast; and 3) primary auditory cortex in both hemispheres for the low-level contrast. [Coefficient distributions, Young vs. Old: (a) component 12, p-value = 0.009; (e) component 6, p-value = 0.004.]

4.5 Discussions

In this study, the jSRA method was used to identify patterns of brain activity (fMRI) evoked by multiple cognitive tasks within subjects. Our major findings are: (1) sparse representation analysis of multi-task fMRI data better captures the activation maps compared to the jICA approach; (2) brain functional-activation patterns can be represented as sparse coefficients that capture individual differences. We have shown these findings within two experiments using simulated fMRI data and experimental fMRI data from a speech comprehension task.

The simulation experiment showed the superiority of the jSRA method compared to the jICA method in terms of the similarity indices between the estimated and true activation maps. As the number of input tasks increases, the average value of the similarity indices for both the jICA and jSRA methods becomes smaller. This is reasonable considering that the number of observations, i.e. number of subjects, does not change, but the dimensionality of the input observations is multiplied by a factor of 1.5 or two. However, jICA was more sensitive to the increase in the dimensionality of the input observations. The change in the jSRA results related to the dimension increase was less than that for the jICA method. Therefore the relative improvement in sensitivity and specificity, for jSRA compared to jICA, also becomes more marked as the number of combined tasks increases.

Looking more closely at the results of the analysis of simulated data, we can see that, among the combinations of two tasks, the combination of tasks one and two had the worst performance. In this combination, there was no overlap among the simulated activated regions between the two tasks. Besides, three of these four regions were similar between the two groups, i.e. broca, wernicke and right dorsolateral prefrontal cortex regions. Therefore, it was difficult to estimate the true sources and capture the group differences regardless of method.

The combination of tasks three and four resulted in the greatest differentiation between the groups, with the activation patterns for the two groups being opposites of each other. Group differences were captured not only by the amplitude differences of the modulation profiles, but also through the complementarity of activated regions in the two groups: the activation map for task 3 of group 1 partially matched that for task 4 of group 2, and vice versa.

A concern about the simulations is the loss of temporally correlated noise structure (e.g., physiological noise) caused by randomizing the time-course of the fMRI signals. However, since we simulated sparse imaging data with 9 sec between scans, it is reasonable to assume that successive scans are largely uncorrelated.

Results on the real experimental data showed that the activation maps captured within the maps of the joint sources that were significantly different between the two groups reflected the well-known hierarchy of speech processing: auditory cortex analyzes the low-level acoustic features; superior and middle temporal gyri are sensitive to processing of auditorily presented sentences; and left inferior frontal gyrus activity reflects semantic processing [56, 55, 161, 142, 137]. Sensitivity of the analysis to the number of brain basis patterns was examined by choosing 4, 8, and 12 patterns.
We showed that, unlike jICA, the proposed jSRA method is not very sensitive to the chosen number of components. Although the difference between jSRA and jICA appears to be reduced with more components, using more components introduces two problems when dealing with group fMRI data acquired from multiple subjects: 1) selecting the components of interest, and 2) split activation patterns within a region of interest.

An additional advantage of the jSRA method is that the modulation profiles are substantially compact and sparse. These coefficients can be used for reducing the dimensionality of the fMRI data, and may allow for a more reliable classification.

In summary, we have shown that a joint sparse representation analysis can effectively identify the common and unique information among different levels of brain cognitive patterns in multi-task fMRI data within different groups. To demonstrate the potential of the proposed framework, analyses of simulated fMRI data were performed, followed by analyses of experimental fMRI data from normal subjects performing speech comprehension tasks. Simulations showed the superiority of the proposed method to the state-of-the-art method (jICA) for multi-task fMRI data analysis. Results on the experimental fMRI data also demonstrate that the jSRA method can better capture the brain functional activation patterns, and therefore the differences, between two groups.

Chapter 5

Reliability Analysis and Visualization

5.1 Introduction

Sparse representation methods have gained popularity in recent years for analysis of brain data sets including EEG [178, 10, 86], MRI [181, 78], and fMRI [107, 106, 164, 149]. In functional MRI studies, sparse representation methods have been used for multivariate pattern analysis (MVPA) [79, 106, 181, 166, 33], statistical parametric mapping for detection of task-related activation [103], classification of individuals [149, 150], and identification of brain resting-state networks [64].
Sparse representation analysis decomposes the input observations into a linear combination of dictionary atoms, using the fewest number of non-zero coefficients.

1 This chapter is adapted from the following submission [155]: Mahdi Ramezani, Saman Nouranian, Ingrid Johnsrude, and Purang Abolmaesumi, Reliability Analysis and Visualization of Sparse Representation Methods for Neuroimaging Data, submitted, 2014.

Despite the great advantage of sparse representation methods, there are significant drawbacks associated with them. First, the results of most sparse-representation algorithms may differ somewhat between multiple runs of the algorithm, depending on the initialization of the dictionary atoms. It has been shown that exact determination of the sparsest representation is an NP-hard problem [54]. Therefore, approximate solutions have been proposed instead. Most sparse representation algorithms solve a minimization problem iteratively and guarantee convergence to at least a local minimum. Depending on the initialization and parameter values, the algorithms may find different local minima. Therefore, an algorithmic reliability analysis of the estimated sparse dictionary atoms, hereafter named components, is needed. Algorithmic reliability refers to whether the algorithm finds the point that globally minimizes (maximizes) the objective function. Moreover, robustness of the sparse representation analysis with respect to the parameters, i.e. dictionary size and sparsity level, should be investigated prior to interpretation of the results. Algorithmic reliability should not be mistaken for the statistical reliability of sparse representation analysis, whose assessment is a necessary step for interpretation of the results of any statistical method. Such analysis can be accomplished by bootstrapping (resampling) or variable selection (subsampling) algorithms [65, 8, 126].
Within these resampling/subsampling methods, the data sample is randomly changed by simulating the sampling process, and the algorithm is run multiple times with the resampled/subsampled data. The spread of the obtained components is used to assess the statistical reliability of the original estimated component [91].

Second, the dictionary atoms (components) are not ordered and are randomly permuted. This introduces two problems when dealing with group fMRI data acquired from multiple subjects, where the output components may be split activation patterns within a region of interest (ROI) or merged activation patterns of multiple ROIs [98, 91, 124]: 1) selecting the components of interest, and 2) choosing the optimum number of components. The first problem is more important where no prior information on the spatial location of the activation patterns is available. The second problem arises from the fact that the fMRI signal consists of multiple components, such as hemodynamic changes due to neural activations (which can be task related or non-task related), motion and MRI scanner artifacts, and cardiac and respiratory pulsations. If the number of output components is lower than the actual number of components, the output components may appear as merged activation patterns; if it is far higher than the actual number, the output components may be split activation patterns.

A handful of studies have tried to overcome these issues. Eavani et al. used a Hausdorff distance metric to compare the dictionaries obtained by running the algorithm for different values of dictionary size and sparsity level, and ranked the estimated components [64]. Bilwaj et al. used the L2 norm of the coefficients of a corresponding component to rank the components [78]. In our previous work, we ranked the components based on their ability to classify individuals [149].
However, these studies did not address the effect of variation in components across multiple runs, nor did they investigate the robustness of the method with respect to the parameters (i.e. dictionary size and sparsity level).

To circumvent these issues of component splitting and merging as well as random permutation, and to facilitate use of the sparse representation method for group fMRI data, we present a method for assessing the reliability of the estimated components as well as for visual inspection of components in 2D space, using a non-linear mapping. The method is based on the work by Himberg et al. [91], ICASSO, which was proposed for stability estimation of independent components. We run the sparse-representation algorithm multiple times with different initializations and cluster the estimated components based on their similarity as represented by a correlation coefficient. We further visualize the similarities using a nonlinear 2D projection: t-SNE [59]. t-SNE visualizes high-dimensional data by assigning each data point a location in a two-dimensional map. Visualization allows further interpretation of the clustering results, including individual estimates within a cluster and the relations between clusters. Using t-SNE visualization, the similarity of the components is clearly presented. Unlike the ICASSO method [91], the t-SNE method maintains the local and global structure of the data in a single map, and keeps the quality of the visualization when the number of components becomes large [59].

In this chapter, we examine how reliably sparse analyses can identify activation foci in multi-subject functional imaging studies. We first examine this in simulated data, and then extend our analysis to real data. We use simulated fMRI data sets, comprising four groups of subjects with 10 subjects per group and two activation foci in all subjects of each group, to quantitatively evaluate the results.
Simulated fMRI data were generated by taking experimental fMRI data (described in the Material section), and (a) randomizing the time course of each voxel to remove the intrinsic condition-related activations, (b) adding pre-defined spatial patterns of functional activation with varying degrees of spatial overlap across subjects, and (c) adding the corresponding task-related hemodynamic response functions (HRFs) associated with the neuronal activation. We perform reliability analysis and visualization of the estimated fMRI components. We compare the results obtained using sparse analysis to those obtained using ICA, since this is another popular approach. We furthermore show that the t-SNE visualization of the similarities between activation foci across multiple runs of the algorithm makes it possible to detect the correct number of sources that were introduced in the simulated data. The assumption in the simulated data that all activations are found in all subjects may not be valid in real fMRI data; moreover, the intersubject variability in the simulated data may not be sufficient. We therefore also report an analysis of experimental fMRI data acquired from sixteen subjects while they performed tasks related to speech comprehension. We demonstrate how the proposed reliability assessment works if activation foci are not found in every subject, and if there is a realistic amount of intersubject variability. We show that the proposed approach helps in identifying the parameters, such as dictionary size and sparsity level, that result in reliable sparse sources.

[Figure 5.1, flowchart: fMRI data; Preprocessing (realignment, coregistration, normalization, GLM); Sparse representation analysis; Reliability analysis (running the algorithm multiple times, computing similarity of the components, hierarchical clustering); t-SNE visualization.]
Figure 5.1: Flowchart of the analysis, including preprocessing, sparse representation analysis, reliability analysis, and visualization.

5.2 Method

The proposed approach consists of four steps: a) preprocessing of the fMRI data; b) setting the parameters of the sparse representation analysis and running it multiple times, with different dictionaries, using the selected parameters; c) clustering the components based on their similarities using the hierarchical clustering algorithm; d) visualizing the clusters using the t-SNE approach. The flowchart of the analysis is displayed in Fig. 5.1.

5.2.1 Preprocessing

Preprocessing steps included realignment of the functional images, coregistration of the functional images to the structural MRI data of each subject, normalization of each subject's structural MRI data to a template, and use of the resulting deformation parameters to normalize the functional images. Single-subject General Linear Models are created by coding the condition to which each scan of the fMRI session belonged. As a result, functional contrast images (the features) are generated for each subject that represent the brain activation patterns related to the specific task comparison. The functional contrast images of the subjects are normalized to have the same average sum of squares. The observation matrix, $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N] \in \mathbb{R}^{V \times N}$, is formed from the features of all subjects, where $\mathbf{y}_i$ is the vector containing the functional contrast of subject $i$, and $V$ and $N$ are the number of variables and subjects, respectively.

5.2.2 Sparse Representation Analysis

The goal of sparse representation approaches, when they are used as a second-level (group-level) analysis of fMRI data, is to represent functional contrast images as a set of sparse and linearly mixed brain maps. Sparse representation methods assume a generative model $\mathbf{Y} = \mathbf{D}\mathbf{X}$ [191]. In this model, $\mathbf{Y}$ is the observation matrix, $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{K \times N}$ is the sparse modulation matrix, and $\mathbf{D} = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_K] \in \mathbb{R}^{V \times K}$ is the dictionary containing $K$ signal atoms (estimated components) representing the activation patterns of the brain extracted from the subjects' functional contrasts. The objective is to maximize the likelihood that the observations have efficient, sparse representations in a redundant dictionary given by the matrix $\mathbf{D}$ [191]. Here, we use the K-SVD method, a well-known sparse representation algorithm [1], to decompose the input matrix, $\mathbf{Y}$, into a linear combination of dictionary elements, $\mathbf{D}$, using the fewest non-zero coefficients. The number of iterations, the number of components, and the number of non-zero elements in each linear combination are set prior to running K-SVD. To initially estimate the number of components (dictionary elements) in the K-SVD analysis, we used the Minimum Description Length (MDL) criterion [104], which is an information-theoretic technique for model order selection.

5.2.3 Reliability Analysis

The K-SVD algorithm is run $M$ times on the data matrix $\mathbf{Y}$, with different initializations of the dictionary, $\mathbf{D}$, and the estimates of the demixing matrices, which are the pseudo-inverses of the sparse modulation matrices, are collected into a single matrix $\mathbf{W} = [\mathbf{W}_1^T, \mathbf{W}_2^T, \ldots, \mathbf{W}_M^T]$, where $\mathbf{W}_i^T = \mathbf{X}_i^{-1}$.

Similarity of the estimated components is measured by the absolute value of their mutual correlation coefficients $\rho_{ij}$, $i, j = 1, 2, \ldots, K$.
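The similarity measure just described can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the function name `component_similarity`, the synthetic "runs", and the use of the mean-centred Pearson correlation from `np.corrcoef` are assumptions made for illustration. Each run is mimicked by a noisy, permuted and sign-flipped copy of a ground-truth dictionary, which is why the absolute value of the correlation is taken.

```python
import numpy as np

def component_similarity(dictionaries):
    """Absolute correlation between all components from all runs.

    dictionaries: list of M arrays of shape (V, K), one dictionary per run.
    Returns an (M*K, M*K) matrix whose entries are |rho_ij|.
    """
    atoms = np.hstack(dictionaries)          # (V, M*K): one column per estimated component
    rho = np.corrcoef(atoms, rowvar=False)   # Pearson correlation between columns
    return np.abs(rho)                       # sign of a component is arbitrary

# Synthetic stand-in for M runs of a sparse decomposition (hypothetical data).
rng = np.random.default_rng(0)
V, K, M = 500, 4, 5
truth = rng.standard_normal((V, K))
runs = []
for _ in range(M):
    perm = rng.permutation(K)                # component order differs between runs
    signs = rng.choice([-1.0, 1.0], size=K)  # and so does the sign
    runs.append(truth[:, perm] * signs + 0.1 * rng.standard_normal((V, K)))

S = component_similarity(runs)
print(S.shape)
```

In this toy setting every component is strongly similar to exactly one component in each of the other runs, which is the pattern a reliable decomposition is expected to produce.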
Assuming the generative model $\mathbf{Y} = \mathbf{D}\mathbf{X}$, and $\mathbf{Y}\mathbf{W}^T = \mathbf{D}$, the correlation coefficients are computed by:

$$\mathbf{P} = \mathbf{D}^T\mathbf{D} = \mathbf{W}\mathbf{Y}^T\mathbf{Y}\mathbf{W}^T = \mathbf{W}(\mathbf{Y}^T\mathbf{Y})\mathbf{W}^T = \mathbf{W}\mathbf{C}\mathbf{W}^T, \qquad (5.1)$$

where $\mathbf{C}$ is the covariance matrix of the observation matrix, $\mathbf{Y}$ [91]. A hierarchical clustering algorithm [83] is used to cluster the estimated components using the dissimilarity measure defined as $1 - |\rho_{ij}|$. The hierarchical clustering algorithm generates a dendrogram, and a partition into $L$ clusters (where $L$ is set equal to the number of components) is obtained using the average-linkage criterion. The number of points in each cluster is computed. These numbers, which show the number of times each component is estimated by the sparse analysis, together with the similarities between estimated clusters, are used within the visualization step to select reliable components.

5.2.4 Visualization using t-SNE

The t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm [59] is used to visualize the clustering results in a two-dimensional map. t-SNE converts the Euclidean distances between the data points into conditional probabilities that represent similarities. The similarity in the high-dimensional input space between points $j$ and $i$ is the conditional probability $p_{i|j}$, which is the probability that $i$ would pick $j$ as its neighbour, in proportion to the probability density under a Student-t distribution centred at $i$. A similar conditional probability, denoted by $q_{i|j}$, can be computed in the low-dimensional (2D) space. In a correct non-linear projection of the high-dimensional space to the 2D space, the two conditional probabilities, $p_{i|j}$ and $q_{i|j}$, will be equal. Therefore, minimizing the Kullback-Leibler divergence between the conditional probabilities over all input points will yield a correct 2D representation of the input space:

$$\mathrm{Cost} = \sum_i \sum_j p_{i|j} \log\frac{p_{i|j}}{q_{i|j}} + \sum_i \sum_j q_{i|j} \log\frac{q_{i|j}}{p_{i|j}}. \qquad (5.2)$$

t-SNE uses a symmetric KL-divergence criterion and minimizes it using a gradient descent method.

Each component is shown as a point in 2D space, and the similarities between components are represented by connecting lines whose thickness denotes the degree of similarity. A convex hull bounds the points belonging to the same cluster. Compact and isolated clusters represent reliable estimation of the components.

In order to evaluate the quality of the visualization, the 2D projection of t-SNE is compared to other projection techniques that are typically used for 2D visualization, such as Curvilinear Component Analysis (CCA) [57] and Multidimensional Scaling (MDS) [190]. The quality of the projection is assessed based on the quality of the obtained clusters in two different ways. First, a hierarchical clustering algorithm is used to cluster the 2D projection of the components. The quality of the clustering is evaluated using the cophenetic correlation coefficient, which shows how well the dendrogram of the hierarchical clustering preserves the distances between the projected components [179]. The closer the value of this coefficient is to 1, the more accurately the clustering solution reflects the 2D projected components. Second, the trustworthiness index is used to compare the quality of the projection [197]. A projection onto a lower dimension is trustworthy if the $m$ closest neighbors of a point in the lower-dimensional space are also close by in the original space.

5.3 Simulations

5.3.1 Data Generation

A total of 40 simulated datasets that represent four groups of subjects, each with 10 datasets, were generated. The simulated data were generated in a manner similar to that described in Section 4.3.1. We used the experimental fMRI data of a subject, which was acquired during a speech comprehension task (see Section 2.1).
The data was acquired across 81 trials.

As each simulated subject activates multiple regions, which may or may not be similar to those activated by others, multiple regions of interest (ROIs) were defined. These ROIs were selected from the non-overlapping functionally connected brain regions shown in Fig. 4.3. For each group, two ROIs were selected that were the same for all subjects within the group, and that were similar and/or different to the ROIs of the other groups. Table 5.1 shows the ROIs used for each group. Since each activated region in each subject can be slightly different from other subjects within the same group, ROIs were randomly dilated using a disk with a variable radius of one to five voxels for each subject. Therefore, all subjects within each group have a common known area of activation.

Table 5.1: ROIs used for each group to create the simulated data. Each group has two ROIs which are similar and/or dissimilar to other groups.

Group   Activated Regions
1       Broca's area; Wernicke's area
2       medial frontal cortex; right dorsolateral prefrontal cortex
3       Broca's area; left-hand motor cortex
4       medial frontal cortex; Wernicke's area

For each ROI of each subject, a synthetic activation signal was created as described in Section 4.3.1. After realigning the data set using SPM8, the time course of each voxel was randomized to remove the intrinsic activations in the whole dataset. Then, for each subject, the activation signal of each ROI was added to the time courses of the voxels in that ROI. The amplitude of the activation signal was chosen to be 3% of the mean amplitude of the time course of the selected voxels.

5.3.2 Experiments and Results

Simulated data were spatially smoothed using a Gaussian kernel of 8 mm, and the contrast images related to the simulated conditions were created. In the simulated dataset, four groups of subjects were created, each with specific spatial patterns that were not the same between the groups.
Hence, the correct dictionary size and sparsity level are four and one, respectively. However, in real cases these numbers are not known. Therefore, here we tried two different values for the dictionary size and sparsity level and investigated the results of the analysis. For each parameter setting, the dendrogram created by the hierarchical clustering algorithm and the number of K-SVD components in the estimated clusters were plotted. The clusters of components were visualized in 2D space. Furthermore, the estimated components were compared to the true components, and the true positive rate (TPR) and false positive rate (FPR) of the estimated active voxels in the spatial patterns were calculated. Receiver operating characteristic (ROC) curves were computed, and the areas under the curves were measured to quantitatively evaluate the performance of the estimated spatial patterns. To have a better understanding of the quality of the estimated components, the results were compared to those of a conventional ICA analysis.

Fig. 5.2 shows the number of components in each cluster, the dendrogram of the hierarchical clustering algorithm, the similarities between the estimated clusters, and the 2D visualization of the clusters, for two different parameter settings. The first row shows the results for a dictionary size of 4 and sparsity level of 1, and the second row shows the results for a dictionary size of 8 and sparsity level of 4. The first column shows the number of times each component appeared in a cluster. The second column is the dendrogram of the hierarchical clustering algorithm obtained with the average-linkage criterion. The third column shows the similarities between estimated clusters, arranged according to the dendrogram. The third row shows the 2D projection of the clusters using the t-SNE approach for both parameter settings. The single-run estimates of the components, the centrotypes of the clusters, and the similarities between the components are shown in the third-row panels.
Convex hulls are generated around the estimated clusters. Compact and isolated clusters represent reliable estimation of the components. Fig. 5.2g shows that there are four compact and isolated clusters of components, which have appeared equally often in multiple runs of the algorithm (Fig. 5.2a). Therefore, if the dictionary size and sparsity level are correctly set, compact and isolated clusters will be generated. Fig. 5.2h shows that if these parameters are not correctly selected, the clusters are not compact and isolated. However, separate clusters of components (in our experiment, 3-4 clusters) are still visible within the data, which can help with correctly identifying the parameters prior to re-running the algorithm.

To better investigate the results, the activation maps obtained using the two parameter settings are shown in Fig. 5.5. The first row shows the results for a dictionary size of 4 and sparsity level of 1, and the second row shows the results for a dictionary size of 8 and sparsity level of 4. For the dictionary size of 8, only the four components that appeared more often than the others are shown. In this figure, columns 1 to 4 show activations related to groups 1 to 4, respectively. The component number written below each activation map is the component label in Fig. 5.2. These component numbers correspond to the large and isolated clusters for the dictionary size of 8 (Fig. 5.2h). Therefore, we could correctly identify the informative components by looking at the visualization and selecting the largest compact clusters. In other words, the meaningful components are the ones with isolated and compact clusters that have appeared relatively more often in multiple runs of the algorithm. This means that we can select meaningful components, even when the parameters are not initially correctly selected.

Fig. 5.3 shows the trustworthiness index as a function of neighborhood size for the CCA, MDS and t-SNE methods.
According to this figure, the t-SNE method produces more trustworthy projections than the CCA and MDS methods. The maximum trustworthiness occurs at a neighborhood size of around 20, which is equal to the number of times that the algorithm was run.

[Figure 5.2, panels (a)-(h). Axes: number of estimates versus component label (a, d); cluster merging similarity (b, e); estimate-cluster label (c, f), with red lines indicating estimate-clusters. Legend in (g, h): single-run estimate, "best estimate" (centrotype), and similarity ranges of $\rho_{ij}$.]
Figure 5.2: Reliability and visualization analysis of the simulated data for two different sets of parameters (first and second row). (a, d) Number of times each component appeared in a cluster. (b, e) Dendrogram of the hierarchical clustering algorithm. (c, f) Similarities between estimated clusters, arranged according to the dendrogram. (g, h) 2D projection of the clusters.

[Figure 5.3. Axes: trustworthiness versus neighborhood size; curves for MDS, t-SNE and CCA.]
Figure 5.3: Trustworthiness index as a function of neighborhood size.

[Figure 5.4. Box plots of the cophenetic correlation coefficient for CCA, MDS and t-SNE.]
Figure 5.4: Distribution of the cophenetic correlation coefficient in multiple runs of the visualization and clustering. The central red mark is the median, the edges of the blue box are the 25th and 75th percentiles, and the whiskers show the extreme values.

Fig. 5.4 shows the distribution of the cophenetic correlation coefficient in multiple runs of the visualization and clustering for the CCA, MDS and t-SNE methods.
This figure shows that the t-SNE projection preserves the pairwise distances between the projected components significantly better than CCA and MDS.

Table 5.2 shows the normalized AUC for the four estimated components using sparse analysis. As can be seen, the sparse analysis has successfully identified the activation maps related to each group. To have a better understanding of the quality of the estimation, we have compared the results to those obtained using conventional ICA². The results show that the sparse analysis is superior to conventional ICA. The ICA approach failed to estimate at least one component for each of the different numbers of independent components that were tried.

² Independent Components (ICs) were found using the Infomax algorithm [12].

[Figure 5.5, panels (a)-(h). First row: (a) component 2, (b) component 4, (c) component 3, (d) component 1. Second row: (e) component 4, (f) component 2, (g) component 6, (h) component 5.]
Figure 5.5: Activation maps obtained for each group using different parameter settings. Columns 1 to 4 show activations related to groups 1 to 4 (see Fig. 4.3), respectively. The component number represents the component label in Fig. 5.2.

Table 5.2: Normalized AUC for the four estimated components using the sparse analysis and ICA methods. C1 to C4 represent the true components within the simulated dataset.

Sparse analysis
dict. size   sparsity level   C1     C2     C3     C4
4            1                0.49   0.58   0.93   0.66
4            3                0.40   0.71   0.92   0.67
5            4                0.53   0.62   0.93   0.66
6            4                0.50   0.58   0.93   0.66
8            1                0.38   0.61   0.93   0.88
8            4                0.48   0.57   0.93   0.65

ICA
number of components   C1       C2     C3     C4
3                      Failed   0.72   0.90   Failed
4                      0.58     0.86   0.93   Failed
5                      0.65     0.84   0.93   Failed

5.4 Experiments and Results

For the real fMRI experiments, we use the experimental fMRI data described in Section 2.1. We use the condition of listening to "anomalous" sentences and the condition of silence. For each subject, a functional contrast image comparing the condition of listening to anomalous sentences versus silence is created using the general linear model approach.

Fig. 5.6 shows the results of the experimental fMRI data analysis. The number of times each component appeared in a cluster, the dendrogram of the hierarchical clustering algorithm, the similarities between estimated clusters, the 2D projection of the clusters using the t-SNE approach, and the two activation maps related to the largest clusters are shown in this figure. As can be seen in Fig. 5.6d, there are two major clusters of components, i.e. functional activation patterns, within the dataset. Convex hulls are generated around the estimated clusters. The components can be sorted based on the size of the clusters. As expected, the largest cluster (component) is associated with activations in the superior and middle temporal gyri, which analyze the low-level acoustic signals (Fig. 5.6e). The second largest cluster shows activations in the left inferior frontal gyrus, which reflect higher-level linguistic (possibly semantic) processing of speech in the brain (Fig. 5.6f) [137, 142, 55, 161].

[Figure 5.6, panels (a)-(f). Axes: number of estimates versus component label (a); cluster merging similarity (b); estimate-cluster label (c), with red lines indicating estimate-clusters. Legend in (d): single-run estimate, "best estimate", and similarity ranges of $\rho_{ij}$. Panel (e): component 3; panel (f): component 1.]
Figure 5.6: Reliability and visualization analysis of the experimental fMRI data. (a) Number of times each component appeared in a cluster. (b) Dendrogram of the hierarchical clustering algorithm. (c) Similarities between estimated clusters, arranged according to the dendrogram. (d) 2D projection of the clusters. (e, f) Activation maps corresponding to the two largest clusters.

5.5 Discussions

In this study, we demonstrate that we can assess the reliability of the components obtained by a sparse representation analysis. Although here we used K-SVD as the sparse representation analysis for an fMRI dataset, we anticipate that this method will be useful for analyzing other neuroimaging datasets (such as EEG and MEG), and that other sparse representation techniques (such as [139, 67, 100]) can be tried. Future work will apply this method to other datasets in order to investigate the usefulness of the proposed approach. Future work should also provide a quantitative measure for interpretation of clusters and tuning of the sparse representation parameters.

In summary, a method is proposed to investigate the reliability of the components estimated using sparse representation analysis. To achieve this goal, the K-SVD algorithm is run several times and the estimated components are clustered based on their similarity. Then, the clusters are visualized using a nonlinear 2D projection. The proposed approach provides a tool for further investigation of the obtained sources.
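The clustering step summarized here can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the code used for the reported results: `average_linkage` is a hypothetical helper that greedily merges the two clusters with the smallest mean pairwise dissimilarity $1 - |\rho_{ij}|$, and the "runs" are synthetic noisy copies of three ground-truth components.

```python
import numpy as np

def average_linkage(dissimilarity, n_clusters):
    """Agglomerative clustering with the average-linkage criterion.

    dissimilarity: symmetric (n, n) matrix, here 1 - |rho_ij|.
    Returns an integer cluster label for each of the n items.
    """
    clusters = [[i] for i in range(dissimilarity.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean pairwise dissimilarity between the two candidate clusters
                d = dissimilarity[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    labels = np.empty(dissimilarity.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic stand-in for components collected over M runs (hypothetical data).
rng = np.random.default_rng(1)
V, K, M = 300, 3, 6
truth = rng.standard_normal((V, K))
atoms = np.hstack([truth + 0.2 * rng.standard_normal((V, K)) for _ in range(M)])

similarity = np.abs(np.corrcoef(atoms, rowvar=False))
labels = average_linkage(1.0 - similarity, n_clusters=K)
sizes = np.bincount(labels)   # how many single-run estimates fall in each cluster
print(sizes)
```

A cluster whose size is close to the number of runs corresponds to a component that was recovered in almost every run.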
The approach highlights compact clusters containing a higher number of components, and points to less reliable clusters of components that could be discarded.

Chapter 6

Multi-object Statistical Analysis of Major Depressive Disorder

6.1 Introduction

Depression directly affects more than 10% of the population at some point in their lives (World Health Organization, 2004), and is a leading cause of disability, with significant social, health and economic impacts [138]. Major Depressive Disorder (MDD) has a typical onset in adolescence and young adulthood, and prevalence rates of MDD by late adolescence equal those in adulthood [96]. MDD that starts in adolescence is associated with a large number of negative outcomes, including lower educational and occupational attainment, poor physical health, and poor interpersonal functioning [93]. These outcomes persist into adulthood and predict significant risk for a lifelong pattern of illness [18]. Given the enormous personal and societal costs associated with MDD, studies aimed at uncovering the pathology of the disorder in its earliest stages are crucial to informing effective prevention and intervention efforts.

Our understanding of the changes in brain neuroanatomy that are associated with MDD has benefited greatly from important advances in Magnetic Resonance Imaging (MRI) technology in the past two decades.
Using structural MRI techniques in adult samples, differences in volume and shape have been found between depressed and non-depressed groups in temporal (e.g., superior temporal gyrus (STG), hippocampus, amygdala), frontal (e.g., anterior cingulate cortex (ACC)), and orbitofrontal regions (see [110] for a review of the structural MRI findings associated with MDD in adulthood). These studies, conducted in adults, are likely to reflect the pathophysiology of MDD, as well as secondary changes due to longstanding behavioural alteration, and iatrogenic changes (as a result of pharmacological and other therapies).

1 This chapter is adapted from [153]: Mahdi Ramezani, Ingrid Johnsrude, Abtin Rasoulian, Rachael Bosma, Ryan Tong, Tom Hollenstein, Kate Harkness, and Purang Abolmaesumi, Temporal-lobe morphology differs between healthy adolescents and those with early-onset of depression, Neuroimage: Clinical, 2014.

To date, a small handful of studies have also investigated pediatric and adolescent-onset MDD and have reported structural differences from healthy controls in similar regions, including hippocampus [118], amygdala [162], striatum and caudate nucleus [122, 173], superior and middle temporal gyri [173], and subgenual prefrontal cortex [22]. A compelling recent study by [37] even found volumetric differences in left hippocampus in clinically non-depressed young girls at high risk for depression (due to a maternal depression history), in comparison with girls who did not have a maternal depression history. However, other studies of early-onset depression have failed to find volumetric differences between depressed and healthy control groups in critical brain regions, including prefrontal cortex (e.g., [135]), hippocampus [162], and amygdala [118].

One potential reason for the failure to find consistent evidence of morphological differences in critical cortico-limbic circuits in early-onset MDD may be that such differences are subtle.
Since the extent of hippocampal volume loss has been found to correlate significantly with the number of depressive episodes (i.e., time spent depressed) in adults with depression [128, 177], differences between depressed and non-depressed groups are likely to be larger in older samples of adults with recurrent depression than in younger individuals in the earliest stages of the illness. Hippocampal volume loss has also been associated with traumatic life events, which can be expected to accumulate with age (e.g., [38, 200]). More sensitive methods than have been used to date may be required to detect subtle differences in brain morphology associated with depression in its earliest stages, and in its youngest sufferers.

Previous methods used for investigating the morphological differences between individuals with depression and healthy controls can be categorized into three main types: 1) volume analysis; 2) analysis of local composition of tissue; and 3) analysis of shape and volume. The most common approach is hippocampal volume analysis using manual or automated segmentation [13, 196, 16]. In such analyses the volume of the hippocampal region is measured after isolating it from the rest of the brain. Using this method, several groups have observed smaller hippocampal volumes in adults with MDD [24, 26, 77, 120, 134, 169], whereas other groups have reported no differences or even larger hippocampal volumes [87, 131, 165, 200].

Voxel-based morphometry (VBM) [6, 81], which examines voxelwise differences in grey- and white-matter volume and concentration throughout the brain, has demonstrated reduced grey matter intensity in the hippocampus of MDD subjects [196, 16, 175, 36]. A limitation of VBM is that each individual's brain data is normalized using nonlinear deformation fields to a reference template.
Through that process, crucial idiographic information such as the shape of brain structures and their position, orientation and size (pose), both relative to other structures and in absolute terms, is lost [6]. This information may be critical for capturing group differences, particularly when such differences are subtle.

Alternatives to VBM approaches include Deformation Based Morphometry (DBM) [19] and Tensor Based Morphometry (TBM) [41], which are widely used to study the brains of people with schizophrenia, autism, dyslexia and Turner's syndrome [72]. Unlike VBM, which analyzes images after the deformation fields have been applied in order to map any individual brain into a standard reference, these approaches take the deformation fields themselves as the dependent variable. Neither of these approaches has yet been used to study structural changes in depression. However, shape-analysis methods that are related to DBM/TBM have been employed in two separate studies to examine hippocampal differences in depression. These studies have focused on separate analysis of both shape and volume of the hippocampus using high-dimensional mapping [147] or spherical harmonic basis functions [204]. These studies with adult and elderly depressed participants reveal significant differences in hippocampal shape, but no volumetric differences. In these analyses, the contribution to morphology made by the shape and pose of the hippocampal region and the surrounding regions was ignored.

Multi-object analysis enables the simultaneous statistical analysis of multiple brain structures, possibly allowing for the identification of subtle morphological differences across multiple brain regions, between groups.
Multi-object methods were originally designed to characterize the shape of a population of geometric entities [63, 193, 111, 34], and have since been applied to analysis of brain MRI images to discriminate between healthy and clinical populations (e.g., pediatric autism; [82]), but have not yet been employed in the context of major depressive disorder.

In this chapter, we report the first use of a multi-object statistical pose and shape model to simultaneously analyze several temporal-lobe structures that have been implicated in MDD. Given that MDD is associated with morphological changes in several brain structures, pose and shape analysis of these brain structures simultaneously may be more sensitive to subtle group differences than is independent analysis of those structures, since simultaneous analysis includes information not just about the pose of brain structures, but about their pose relative to each other. In this chapter, we first present the method, and then use it to identify temporal-lobe structures of interest and to characterize the relationship between the pose and shape of these structures and the symptomatology of early-onset MDD, when morphological differences between healthy and clinical groups are expected to be mild and subtle. Use of a young sample at the earliest stage of their depressive illness has important implications for understanding the neurostructural correlates of the etiology of MDD.

6.2 Method

Pose and shape analysis of multiple brain structures, shown schematically in Fig. 6.1, involves three steps: a) preprocessing the MRI data to extract surface points on brain structures of interest; b) finding the pose and shape variations among these brain structures; c) Principal Component Analysis (PCA) on pose and shape variations in the subject population.

6.2.1 Preprocessing

The structural MRI data of the subjects are preprocessed using Statistical Parametric Mapping software (SPM8, Wellcome Department of Cognitive Neurology, London, UK).
Briefly, Grey Matter (GM), White Matter (WM) and Cerebrospinal Fluid (CSF) are segmented using the automated segmentation processes in SPM. This results in a set of three maps for GM, WM and CSF in native space for each subject, in which each voxel is assigned a probability of being one of the three tissue types. The LONI Probabilistic Brain Atlas (LPBA40/SPM5) [176] in MNI space was used to extract left and right hippocampus, parahippocampal gyrus, putamen, and superior, inferior and middle temporal gyri from the brain of each participant (see Fig. 6.2; these are structures that have been shown to be associated with MDD in adulthood [110]). The LONI atlas is constructed using MRI data of 40 healthy volunteers, in which 56 structures were labeled manually. We use the maximum-probability values at each voxel to segment the regions of interest in the atlas. To accomplish segmentations in each of the participants, we use the DARTEL algorithm to register the LONI atlas to each participant's structural MRI, and extract surface points, $V = \{v_{n,l}\}_{n=1\ldots N,\; l=1\ldots L}$, indexing the coordinates of the surface voxels on each of the selected brain structures [5]. Here, $v_{n,l}$ consists of all surface points of the $l$th structure of subject $n$, $L = 12$ is the number of structures, and $N = 45$ is the number of subjects in the training set. For each subject, the surface boundary of each brain structure was used to compute the volume of that structure. Structure volumes were compared between the MDD and control groups.

Figure 6.1: Schematic of the pose and shape statistical analysis of multiple brain structures. (a) Preprocessing the MRI data for extracting surface points on brain structures of interest; (b) Pose and shape multi-object analysis for finding the pose and shape variations between multiple brain structures; (c) PCA for generating pose and shape features.

Figure 6.2: Segmented structures in both hemispheres of the brain which are used for multi-object statistical analysis. Surface points of putamen (blue), hippocampus (green), parahippocampal gyrus (red), ITG (cyan), MTG (yellow), and STG (magenta) in both hemispheres of the brain are shown in (a) anterior to posterior view, and (b) posterior to anterior view. Structures in the left hemisphere of the brain are shown in (c).

6.2.2 Pose and Shape Analysis

Since all surface points are extracted using the atlas in MNI space, the correspondences among the surface points (between homologous points in different subjects) were known. We used those correspondences to compute the linear (rigid plus scaling) deformation required to warp each structure in each participant to the mean shape of that structure calculated across participants, using generalized Procrustes analysis [62]. Pose variations were calculated using the translation, rotation, and scaling values of these deformation fields. Each transformation for a voxel, $x$, is defined as $T(x) = sRx + d$, where $R$ is a rotation matrix, $d$ is a translation vector, and $s$ is a scale factor. These transformations form a Lie group, which is a Riemannian manifold, so conventional statistical analysis in Euclidean space is not applicable. However, a logarithmic transform was used to map the members of the Lie group into a linear tangent space, appropriate for conventional statistical analysis. Exponential and logarithm maps are performed using the standard matrix exponential and matrix logarithm; e.g., the matrix exponential is defined by the series:

$$\exp(T) = \sum_{k=0}^{\infty} \frac{1}{k!} T^k. \qquad (6.1)$$
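As a rough illustration of the exponential map in (6.1) for a similarity transform $T(x) = sRx + d$, the following NumPy sketch exponentiates a tangent-space matrix with a truncated power series and checks that the result is a scaled rotation plus a translation. The function names `expm_series` and `generator`, the numerical values, and the layout of the generator matrix (log-scale on the diagonal, a skew-symmetric rotation part, and a zero last row) are assumptions of this sketch rather than details taken from the analysis code.

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential by the truncated power series sum_k A^k / k!."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

def generator(r, t, l):
    """4x4 tangent-space matrix from rotation vector r, translation part t, log-scale l."""
    rx, ry, rz = r
    G = np.zeros((4, 4))
    G[:3, :3] = np.array([[l, -rz, ry],
                          [rz, l, -rx],
                          [-ry, rx, l]])
    G[:3, 3] = t
    return G                          # last row is left at zero (assumption of this sketch)

r = (0.1, -0.2, 0.3)                  # hypothetical rotation vector (axis times angle)
G = generator(r, t=(1.0, 2.0, 3.0), l=np.log(1.2))
T = expm_series(G)

A = T[:3, :3]                         # equals s * R for a similarity transform
s = np.cbrt(np.linalg.det(A))         # det(s R) = s^3
R = A / s
angle = np.arccos((np.trace(R) - 1.0) / 2.0)
print(round(float(s), 6), round(float(angle), 6))
```

Because the scale term is a multiple of the identity, it commutes with the rotation part, so the exponential factorizes into a scale $e^{l}$ times a rotation by the angle $\sqrt{r_x^2 + r_y^2 + r_z^2}$, which is what the checks on `s` and `angle` confirm numerically.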
The logarithm of the transformation is defined as:

$$\log(T) = \begin{bmatrix} l & -r_z & r_y & x \\ r_z & l & -r_x & y \\ -r_y & r_x & l & z \\ 0 & 0 & 0 & 0 \end{bmatrix}, \tag{6.2}$$

where $l = \log(s)$, and $(r_x, r_y, r_z)$ is the rotation axis with angle $\theta = \sqrt{r_x^2 + r_y^2 + r_z^2}$. The bottom row is zero, so that exponentiating with the series in (6.1) returns a homogeneous matrix whose last row is $(0, 0, 0, 1)$. Thus, each transformation, $T_{n,l}$, which represents the transformation from the $l$th structure in the mean shape to the corresponding structure in the $n$th instance, was expressed as a vector with seven variables: $(r_x, r_y, r_z, x, y, z, l)^T$. For the purpose of statistical analysis, each transformation was normalized using the mean transformation for each structure, $M_l$, and mapped to the tangent space: $u^p_{n,l} = \log(M_l^{-1} T_{n,l})$ [20, 143]. The transformation vectors were concatenated for each individual to form a $7L \times 1$ vector, $u^p_n = [u^{pT}_{n,1}, \ldots, u^{pT}_{n,L}]^T$, and the matrix of all transformations for all individuals was created: $U^p = [u^p_1, \ldots, u^p_N]^T$.

Shape variations are computed as the residual deformation required to map the mean shape of each structure to the corresponding structure for each subject, after the linear transformation for pose has been applied. Subsequently, similar to the pose variation extraction method described earlier, the distance vectors (deformations) were concatenated for each subject, $u^s_n = [u^{sT}_{n,1}, \ldots, u^{sT}_{n,L}]^T$, and the matrix of all transformations for all subjects was created: $U^s = [u^s_1, \ldots, u^s_N]^T$.

6.2.3 Statistical Analysis

A multi-object statistical pose and shape model [21] was generated for the selected brain structures. In order to extract the major directions of the pose and shape variations across all subjects, we constructed an orthonormal basis set that represented all pose and shape variations using Principal Component Analysis (PCA).

PCA on pose was performed using $U^p = A^p F^{pT}$. In this equation, $F^p = [f^p_1, \ldots, f^p_{N-1}]$, of size $7L \times (N-1)$, is the pose feature matrix, and the $f^p_i$ are principal components sorted in descending order of their variance. $A^p = [a^p_1, \ldots, a^p_{N-1}]$, of size $N \times (N-1)$, is the corresponding weight matrix, generated from the principal component weights. We focus our analysis on the pose principal components that capture two standard deviations of the variation in the data. Similarly, PCA was used to identify an orthogonal vector set for shape, $F^s = [f^s_1, \ldots, f^s_{N-1}]$, and the corresponding weight matrix, $A^s = [a^s_1, \ldots, a^s_{N-1}]$. We consider the shape principal components that capture one standard deviation of the variation in the data. The difference between the number of principal components we consider for pose and for shape stems primarily from the difference in the dimensions of the pose components (i.e., $7L$, where $L = 12$ is the number of structures in our study) and the shape components, whose dimension is the number of all surface points in each structure and is significantly larger than $L$.

Our objectives were to 1) identify pose and shape features that would differentiate the two groups; and 2) investigate the relation between these features and the clinical index of depression (i.e., BDI scores).

To achieve the first objective, we first use a random-forest classification [23] approach to sort the selected principal components. Random forests are a learning method for classification that uses multiple decision trees for training. The decision tree splits the weights related to the considered principal components to maximize diversity among the subjects [46]. As a result, a tree with nodes and leaves is constructed, whose top node shows the weights with maximum separability. We perform unpaired two-sample t-tests (assuming unequal variance in the two groups) only on the top-node weights for pose and shape, i.e., one component for pose and one component for shape.
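The PCA factorization and the unequal-variance t-test described above can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis pipeline: the helper names `pca_weights` and `welch_t`, the explicit mean-centering, the 95% variance cut-off and the use of the first component in place of the random-forest top node are all assumptions made for the example.

```python
import numpy as np

def pca_weights(U):
    """Factor the centered data as Uc = A @ F.T with orthonormal columns in F.

    U is N x D (one row per subject). Returns Uc, the weights A (N x k),
    the basis F (D x k) and the variance fraction of each component.
    """
    Uc = U - U.mean(axis=0)                 # centering is an assumption here
    left, sing, Vt = np.linalg.svd(Uc, full_matrices=False)
    k = min(U.shape[0] - 1, U.shape[1])     # centered data has rank <= N-1
    A = left[:, :k] * sing[:k]              # principal component weights
    F = Vt[:k].T                            # principal directions
    var = sing[:k] ** 2
    return Uc, A, F, var / var.sum()

def welch_t(a, b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof

rng = np.random.default_rng(0)
N, L = 10, 12                               # toy subject count, 12 structures
U = rng.normal(size=(N, 7 * L))             # synthetic stand-in for U^p
labels = np.array([0] * 5 + [1] * 5)        # hypothetical group labels

Uc, A, F, frac = pca_weights(U)
n_keep = int(np.searchsorted(np.cumsum(frac), 0.95) + 1)   # components for 95%

# Placeholder: component 0 stands in for the random-forest top-node component.
t, dof = welch_t(A[labels == 0, 0], A[labels == 1, 0])
```

A p-value would then follow from the Student t distribution with `dof` degrees of freedom, for example through a statistics library.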
As this study was designed to be hypothesis-generating and sensitive to morphological differences in brain structures between depressed adolescents and control participants, a significance level of p < 0.05, uncorrected for multiple comparisons, was used [163, 89]. In order to visualize the significant pose component, associated with the top-node weights, the norms of the three pose parameters (three translation, three rotation, and one scale variables) were computed. Subsequently, the mean of each parameter was removed and the result was divided by the standard deviation of the parameter. For the shape, the mean of the significant shape principal component associated with the top-node weights was removed and the result was normalized by the component's standard deviation. The higher the absolute value of an element of the normalized pose or shape component, the greater the contribution of that element to capturing the differences between the two groups.

To achieve the second objective, we calculated Spearman correlation coefficients between the Beck Depression Inventory score and the top-node pose and shape weights.

6.3 Results

6.3.1 Volume Analysis

We first assessed the volume differences between the MDD and control groups for each structure. Unpaired two-sample t-tests (assuming unequal variance in the two groups) with a significance level of p < 0.05, uncorrected for multiple comparisons, were used to detect volume differences between the two groups. Fig. 6.3 shows the distribution of the volume of each structure for the depressed and control groups.

[Figure 6.3: box plots, panels (a)-(l), control vs. depressed. Panel p-values: (a) 0.821, (b) 0.775, (c) 0.092, (d) 0.176, (e) 0.019, (f) 0.051, (g) 0.868, (h) 0.919, (i) 0.085, (j) 0.130, (k) 0.034, (l) 0.106.]

Figure 6.3: Distribution of the volume of each structure between the two groups: (a) left putamen, (b) right putamen, (c) left hippocampus, (d) right hippocampus, (e) left parahippocampal gyrus, (f) right parahippocampal gyrus, (g) left inferior temporal gyrus, (h) right inferior temporal gyrus, (i) left middle temporal gyrus, (j) right middle temporal gyrus, (k) left superior temporal gyrus, (l) right superior temporal gyrus. The central red mark is the median, the edges of the blue box are the 25th and 75th percentiles, and the whiskers show the extreme values of the volumes.

The volumes of both the left parahippocampal gyrus and the left superior temporal gyrus were significantly greater (p = 0.019 and p = 0.034, respectively) in the depressed group than in the control group.

6.3.2 Pose and Shape Analysis

The goal of our multi-object analysis was to investigate the pose and shape differences in brain structures between the participants with MDD and healthy controls.
The first four principal components of pose capture two standard deviations (95%) of the variation in pose, and the first eight components of shape capture one standard deviation (68%) of the variation in shape. The random-forest classification trees for pose and shape were built on these components. Statistical analyses using unpaired two-sample t-tests were performed on the top component for each tree. The two groups differed significantly (p = 0.031 with corresponding statistical power of 0.77 [66] for the pose component, and p = 0.042 with corresponding statistical power of 0.89 for the shape component). Table 6.1 shows the normalized pose parameters across different brain structures for the most significant pose component. The translation component differed significantly between the two groups in the left putamen, left and right hippocampus, and left ITG. Rotation also differed between the groups in the left putamen, right hippocampus, and left and right ITG, and scale differed between the groups in the left and right putamen.

Table 6.1: Normalized pose parameters of brain structures. L and R show the assigned anatomical left and right hemispheres.

Structure                  Hemisphere   Translation   Rotation   Scale
Putamen                    L             1.16          1.93       1.81
Putamen                    R             0.66          0.28       2.32
Hippocampus                L             1.13         -0.07      -0.14
Hippocampus                R             1.50         -1.11       0.03
Parahippocampal gyrus      L            -0.31         -0.75      -0.41
Parahippocampal gyrus      R             0.54         -0.12      -0.79
Inferior temporal gyrus    L            -1.82         -1.58      -0.52
Inferior temporal gyrus    R            -0.90          1.3       -0.4
Middle temporal gyrus      L            -0.73         -0.72      -0.28
Middle temporal gyrus      R            -0.70          0.81      -0.55
Superior temporal gyrus    L            -0.14          0.01      -0.27
Superior temporal gyrus    R            -0.36          0.03      -0.80

[Figure 6.4: panels (a)-(l), left and right hemisphere, inferior and superior views; color bar from -1 to 4.]

Figure 6.4: Shape principal component that was significantly different between the two groups. The component is normalized by removing the mean and dividing by its standard deviation. Inferior and superior views of the (a, b) left putamen, (c, d) right putamen, (e, f) left hippocampus, (g, h) right hippocampus, (i, j) left parahippocampal gyrus, and (k, l) right parahippocampal gyrus. The color varies smoothly from black through red, orange, yellow and white, to show the minimum through maximum difference values. The left side of each picture shows the left side of the brain and the right side shows the right; top is anterior and bottom is posterior.

[Figure 6.5: panels (m)-(x); color bar from -1 to 4.]

Figure 6.5: Shape principal component that was significantly different between the two groups. The component is normalized by removing the mean and dividing by its standard deviation. Inferior and superior views of the (m, n) left superior temporal gyrus, (o, p) right superior temporal gyrus, (q, r) left middle temporal gyrus, (s, t) right middle temporal gyrus, (u, v) left inferior temporal gyrus, and (w, x) right inferior temporal gyrus. The color varies smoothly from black through red, orange, yellow and white, to show the minimum through maximum difference values. The left side of each picture shows the left side of the brain and the right side shows the right; top is anterior and bottom is posterior.

Figs. 6.4 and 6.5 show the normalized shape component across different structures in the brain. As can be seen in the figures, many regions of all the examined structures show shape variations with normalized values above 1.96 (roughly two standard deviations away from the mean), in both hemispheres.

To investigate the relation between BDI scores and the pose and shape weights that differed significantly between the two groups, Spearman correlation coefficients were calculated between the pose and shape values and BDI. The significant pose component correlated significantly with BDI (Spearman correlation: 0.38, p-value = 0.0086, slope: -0.039, intercept: 0.39), but the significant shape component did not (Spearman correlation: 0.15, p-value = 0.298, slope: -0.89, intercept: 8.8). Fig. 6.6 shows the distributions of the pose scores (6.6a) and shape scores (6.6b) across BDI. The four male subjects are identified with a circle.

[Figure 6.6: panels (a) and (b).]

Figure 6.6: Pose (a) and shape (b) scores that generated the significant difference between the MDD subjects and controls, plotted against the Beck Depression Inventory (BDI) score. Pose scores are significantly correlated with the BDI (Spearman correlation: 0.38, p-value = 0.0086, slope: -0.039, intercept: 0.39). Shape scores are not significantly correlated with the BDI (Spearman correlation: 0.15, p-value = 0.298, slope: -0.89, intercept: 8.8). A circle has been drawn around the data of male subjects.

6.4 Discussion

We conducted a statistical analysis of pose and shape information from several brain regions in order to examine whether the brains of individuals with early-onset MDD differ from those of healthy controls.
Indeed, despite a rather small number of participants, we were able to observe statistically reliable differences in the medial temporal lobe regions, and we also determined that some features captured by the pose and shape analysis correlated with depressive symptomatology as measured by the Beck Depression Inventory. The sensitivity of this method may be related to its ability to capture differences in the spatial relationships among structures, not simply differences within an individual structure.

We observed volume differences in the left parahippocampal gyrus and the left superior temporal gyrus (STG) between the depressed group and the control group. STG volume and GM density differences between MDD and control subjects were previously shown by [200] and [174]. The individuals studied by these authors had been diagnosed with MDD at least two years earlier, and so were at a later stage of the illness than the clinical group in the current study. Our results indicate that differences in the STG are present from the earliest stages of the disease.

The most significant component of the pose, highlighted in Table 6.1, showed that the left and right putamen, the left and right hippocampus, and the left and right inferior temporal gyri were more affected by MDD. The scale parameter of the right putamen is the only parameter that showed at least two standard deviations of variation. The translation mostly affected the left inferior temporal gyrus, and the rotation mostly affected the left putamen.

Shape analysis revealed that all examined structures, including the putamen, hippocampus, parahippocampal gyrus, and superior, middle and inferior temporal gyri, differed between the two groups, suggesting that multi-object shape analysis is a sensitive tool for the examination of morphological differences in clinical samples.
Moreover, within the most significant component of the shape, we identified regions that were at least two standard deviations away from the mean of that component, highlighting regions that were more affected by MDD.

Importantly, depressive symptomatology, as indexed by BDI scores, correlated with the pose of the structures (Fig. 6.6(a)). While volume increases in the fusiform gyrus, cuneus and precuneus have previously been shown to be associated with BDI increases in MDD [101], we are the first to show that pose variations of multiple structures are also affected by MDD, and correlate significantly with BDI.

The significant brain structural abnormalities seen here in early-onset depression are consistent with those observed in previous work [118, 119, 117]. However, MacMaster et al. only investigated volumetric differences between brain structures, after isolating each structure from the rest of the brain. Here, we have investigated the morphometric differences using simultaneous pose and shape analysis of multiple structures. As a result, we can capture differences due to the relationships among structures, and also differentiate between pose and shape morphometric differences.

The neural mechanisms underlying the observed morphometric differences in MDD have received empirical attention. Depression is associated with chronic dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis, with resulting chronic release of cortisol and other neurotoxic stress hormones [25]. Glucocorticoid neurotoxicity has preferential effects on hippocampal neurogenesis (e.g., [199, 167]). Indeed, in both preclinical and human clinical studies, chronic stress and depression are associated with long-term changes in the hippocampus in the expression of genes involved in synaptic plasticity, such as brain-derived neurotrophic factor (BDNF; e.g., [102, 130]).
Our results extend the state of the literature by suggesting that, through the use of sensitive pose and shape analyses, the structural differences in MDD can be observed at the very initial stages of the illness, suggesting that they do not just emerge over the recurrent and chronic pathology of the disorder.

A concern about the method is the possible dependence on the quality of the segmentation. In this work, the segmentation comes from an atlas and the registration of the atlas to the brains of the individual participants. A potential alternative is to manually segment the structures in individual brains prior to a group-wise registration. In future studies, we can also use polyaffine transformations in a logarithmic domain [4, 45], instead of similarity transformations, for registration of multiple structures. An affine transform would further encompass anisotropic scaling and shearing.

Another concern is that we did not make any formal adjustments to correct for multiple comparisons, which potentially introduced a risk of false-positive results. Therefore, the p-values should be interpreted with caution [60]. The use of multiple comparisons corrections is often debated, because these corrections increase the chance of making type II errors that minimize truly important findings and require the use of large samples (which are often prohibitively expensive in neuroimaging research) to detect modest effect sizes [140]. As such, future studies with larger samples are needed to further validate these results.

The current study investigated morphological variation in the pose and the shape of the hippocampus and surrounding structures in early-onset MDD compared to control participants. Although a large number of previous studies have shown differences between MDD subjects and controls [196, 16, 175, 36, 24, 26, 77, 134, 169], ours is the first, to our knowledge, to simultaneously analyze multiple structures, and to separate pose and shape in morphological analysis.
The value of the presented method is that it identifies structures of interest and characterizes types of differences (i.e., pose and shape) that can then be fed back into models and theories of etiology. In other words, what is more relevant than finding group differences is pinpointing the effect of the underlying mechanisms that lead to MDD on brain structures and their interrelationships.

In summary, using multi-object statistical pose and shape analysis, we demonstrated brain morphological differences between adolescents and young adults with early-onset MDD and healthy control subjects. Relative pose and shape information of multiple structures in the brain, which is usually disregarded, was shown to be important in capturing the group differences. Within this framework, the shape deformations were analyzed separately from rigid transformations and scale (i.e., the pose information). Therefore, we could identify the type of morphological differences (pose and shape). Within the simultaneous analysis of multiple structures, the relative differences among structures were captured. The differences were more pronounced in the moderately and severely depressed participants. Moreover, morphological features (pose) significantly correlate with depressive symptoms across both normal and depressed participants.

Chapter 7

Fusion Analysis of Brain Shape Deformations and Local Composition of Tissue

7.1 Introduction

Studies of adults with primarily recurrent episodes of MDD have shown significant volumetric differences in temporal (e.g., Superior Temporal Gyrus [STG], hippocampus, amygdala) and frontal (e.g., Anterior Cingulate Cortex [ACC] and Orbitofrontal Cortex [OFC]) brain regions relative to healthy controls (see [14, 110, 15] for reviews of the neuroanatomy and structural MRI findings associated with MDD). The most consistent finding in these studies is reduced hippocampal volume in adult patients with MDD compared to healthy controls.
However, some studies have also failed to find group differences in hippocampal volumes [200, 87, 131, 165], and others have even reported larger hippocampal volumes in patients with MDD relative to healthy controls [77, 194, 200].

¹ This chapter is adapted from the following submission [151]: Mahdi Ramezani, Purang Abolmaesumi, Amir Tahmasebi, Rachael Bosma, Ryan Tong, Tom Hollenstein, Kate Harkness, and Ingrid Johnsrude, Fusion Analysis of First Episode Depression: Where Brain Shape Deformations Meet Local Composition of Tissue, Neuroimage: Clinical, 2014.

A large number of approaches have been developed to characterize differences, among individuals and groups, in the neuroanatomical configuration of the human brain. Generally, these approaches are classified into those that measure differences in brain shape, and those that measure differences in the local volume (and concentration) of brain tissue after macroscopic differences in shape have been discounted [6]. The former approaches analyze the deformation fields required to map individual brains onto some standard reference in order to characterize neuroanatomy. Deformation Based Morphometry (DBM) [19] and Tensor Based Morphometry (TBM) [41] are widely used approaches that use deformation fields. Shape-analysis methods that are related to DBM/TBM have been widely employed to examine morphometric differences in depression. For example, [147] used high-dimensional brain mapping on MRI data to quantitatively characterize the shape and volume of the hippocampus in adults with MDD and healthy controls (mean age = 33 ± 10). They found significant group differences in hippocampal shape, but no evidence for differences in volume. In a more recent study, [204] applied SPherical HARMonic (SPHARM) shape analysis to the left and right hippocampi of elderly patients with MDD (age > 60) and healthy controls.
Analysis revealed significant shape differences in the mid-body of the left hippocampus between the two groups. Further, in terms of volume, patients in a current episode of MDD had lower left hippocampal volumes in comparison to controls, whereas patients in remission from MDD showed no reduction in hippocampal volume. In the previous chapter, we used multi-object statistical pose and shape analysis, and demonstrated brain morphological differences between adolescents with early-onset MDD and healthy controls.

Approaches that focus on the local composition of brain tissue, such as VBM, compare tissue images on a voxel-by-voxel basis after the deformation fields have been used to spatially normalize the images. For example, [13] applied VBM using SPM99, and reported smaller grey-matter volume of the right hippocampus, and smaller white-matter volume in the left anterior cingulate and right middle frontal gyrus, in elderly patients with MDD compared to healthy controls. Using VBM in SPM5, [196] reported significantly lower left hippocampal volumes in middle-aged patients with MDD in comparison to healthy controls. Similarly, in the same group of middle-aged patients with MDD, [16] compared VBM using a manual segmentation method and the automated method, and found significant hippocampal volume reductions using both segmentation methods in comparison with healthy controls. Finally, studies focusing on younger age groups, and including relevant covariates (i.e., age, sex, and intracranial volume), have also reported significantly lower hippocampal volumes, particularly in the left hemisphere, both in adolescents with MDD [118] and in patients with early-onset MDD and a family history of depression [117].

In summary, computational neuroanatomical techniques either use the deformation fields themselves to characterize brain structural variation, or use these fields to normalize images that are then entered into an analysis of regionally specific differences in tissue composition.
Ideally, a procedure like VBM should be able to automatically identify any structural abnormalities in a single brain image. However, even with many hundreds of subjects in a database, the method may not be powerful enough to detect subtle abnormalities [6]. Recently, unified voxel- and tensor-based morphometry (UVTM) was proposed, which uses a locally adaptive combination of TBM and VBM to improve sensitivity [97]. UVTM is an extension of the Jacobian-modulated VBM [53], which gives weights to VBM or TBM analysis based on registration confidence. In modulated VBM, voxel concentration is scaled based on the amount of deformation that was applied in the registration procedure. Although the motivation for multiplying the Jacobian determinant of the transformations by the tissue segmentation probabilities is intuitive, it is not clear whether the statistically significant regions resulting from VBM and TBM will match, even though they are assumed to. In addition, there has been no quantitative study on determining the optimal weight parameters based on the registration confidence. A more powerful procedure would be to use a voxel-wise multivariate approach. Within a multivariate framework, in addition to images of grey matter concentration, other image features, such as white matter concentration and the deformation fields calculated during the spatial normalization procedure, can also be included [6]. Fusion of these multiple images may help in detecting subtle individual differences.

Joint Independent-Components Analysis (jICA) [30] is a multivariate technique for such "fusion analysis". It combines information from multiple features, which are lower-dimensional representations of selected brain structures. jICA, as a group-level analysis technique, uses features extracted from individual subjects' data and tries to maximize the independence among joint components.
For example, [39] combined resting-state functional connectivity and fractional anisotropy data within jICA in a dataset of four subjects with MDD and nine healthy control subjects to investigate links between functional connectivity changes and white-matter abnormalities. They reported differences in the strength of connectivity and in the coherence of white-matter tracts among subgenual anterior cingulate cortex (sACC) and perigenual ACC, anterior midcingulate cortex, caudate, thalamus, medial frontal cortex, amygdala, hippocampus, insula, and lateral temporal lobe.

The purpose of the current study was to combine, for the first time, brain shape and regional brain tissue composition using a multivariate jICA technique in order to investigate the brain structural correlates of first-episode MDD. We determined the joint variation of shape and tissue composition in the hippocampal region in a sample of young people suffering from a first episode of MDD in comparison to a sample of young healthy controls. The importance of a young first-episode group is that they have not been subject to the known neurotoxic effects of glucocorticoids resulting from aging and the pathology of chronic depression [167, 172]. We hypothesize that, whereas conventional univariate analysis may not be sensitive to subtle differences in brain structure in this group, a multivariate technique that jointly analyzes multiple brain characteristics (i.e., shape and tissue composition) may have the requisite sensitivity to capture group differences. Following a group-wise registration using DARTEL [20, 16] to create an average template, we obtained individual grey matter (GM) and white matter (WM) tissue maps in the template space, along with the deformation fields required to warp the template to the GM and WM maps².
Using the jICA technique, we combined these three features, reflecting the tissue composition and shape of the brain in each individual, in order to extract spatially independent joint sources and their corresponding modulation profiles. We hypothesize that the mixing coefficients of the modulation profiles will lead to better discrimination of MDD subjects from the control group compared to the results obtained when brain shape and tissue composition are analysed separately.

² The reason for using these inverse deformation fields from the template to each subject is that we can use the correspondences among the voxels (between homologous voxels in different subjects) to compute the jICA decomposition.

7.2 Method

In the following two subsections, the input features to the jICA method, representing tissue composition and deformation of selected brain structures, are described first. Then, the multivariate joint independent-components analysis technique used to fuse multiple features is briefly reviewed.

7.2.1 Features

The data type on which we focus in this chapter is structural MRI (sMRI). Outcome measures derived from structural images may include measures of shape (e.g., deformation) or tissue volume or concentration (e.g., grey or white matter). Below, we describe how we extracted three different features: (1) shape deformation information, and (2) grey- and (3) white-matter concentration used for voxel-based morphometric (VBM) analysis.

The sMRI data were preprocessed using Statistical Parametric Mapping software (SPM8, Wellcome Department of Cognitive Neurology, London, UK). Briefly, GM, WM, and cerebrospinal fluid (CSF) were segmented using the automated segmentation processes in SPM. This resulted in a set of three images in native space, in which each voxel is assigned a probability of being one of the three tissue types. The GM maps were registered using the DARTEL method, which achieves accurate inter-subject registration of images [5, 16, 187].
The DARTEL procedure uses the GM and WM maps to create new templates and warps the GM and WM maps of each subject to the DARTEL template. With DARTEL group-wise registration, the inter-subject registration is more accurate than with other SPM tools; therefore, less spatial smoothing is needed. We used a Gaussian convolution kernel with a Full Width at Half Maximum (FWHM) of 8 mm. To demonstrate the effect of smoothing, we report the results with and without spatial smoothing. The deformation fields (DF) required for warping the group-wise (DARTEL) template to the GM and WM maps of each subject were also created. These deformation fields show how much a participant's structure deviates from that of the other participants. The absolute value of the deformation field (displacement) for each voxel is used to represent shape morphometry. The warped GM and WM segments, along with the deformation fields, are the input features to the joint analysis method.

To reduce the number of voxels in the analysis, a segmented LPBA40/SPM5 atlas [176] in MNI space was used to extract the anatomical regions of interest. We selected the hippocampal region since abnormalities in this region have been associated with the pathology of MDD [110, 14, 15]. To account for small atlas-to-subject registration errors, the selected region was dilated with morphological operators, using a disk with a radius of 5 voxels, to include adjacent regions in addition to the selected brain structure. Voxels inside the created mask were selected for joint analysis.

7.2.2 Joint Independent Component Analysis

We assume that there is a relation between brain tissue type (GM or WM) differences and brain structural deformations.
This is not an unreasonable premise: if depression is associated with differences in both the size and shape of brain structures, then differences in the volume and/or concentration of gray and/or white matter might be related to differences in structural deformations in depressed individuals relative to controls. The three features described in the previous section were used as input observations ($X = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^{N \times K}$) to jICA in order to combine brain shape deformations and local composition of tissue. jICA can be used to identify any joint set of features ($S = [s_1, s_2, \ldots, s_N]^T \in \mathbb{R}^{N \times K}$) that is anatomically differentiable between depressed subjects and healthy controls, where $x_i$ ($i = 1, 2, \ldots, N$) is the vector of stacked features for subject $i$, and $s_i$ denotes the $i$th joint independent component (source). $N$ is the number of subjects and $K$ is the total dimensionality of the stacked vectors. Considering the generative model $X = AS$, the aim of the jICA method is to find the matrix $W = A^{-1}$ such that the estimate $U = WX$ is close to $S$. In this model, $A$ is the matrix of mixing coefficients (also called ICA loading parameters, or the modulation profile), and $W$ is the unmixing matrix. A schematic of the jICA approach is shown in Fig. 3.1.

Joint independent components were found using the Infomax algorithm [12], which is based on minimization of the mutual information of the components. In this algorithm, the output entropy of a neural network with as many outputs as the number of Independent Components (ICs) to be estimated is adaptively maximized. In order to use ICA, it is necessary to first specify the number of ICs expected.
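The bookkeeping around the generative model $X = AS$ can be made concrete with a small sketch. It only illustrates how three feature blocks might be stacked into one observation matrix and how the mixing and unmixing matrices relate; it does not implement the Infomax estimator, and in practice $W$ has to be estimated from $X$ alone. The toy sizes, the random data, the per-feature scaling in `zscore_feature` and the well-conditioned mixing matrix are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, V = 6, 50                          # toy sizes: subjects, voxels in the mask

def zscore_feature(block):
    """Scale one N x V feature block to zero mean and unit variance overall
    (an assumed normalization so that no feature dominates the stack)."""
    return (block - block.mean()) / block.std()

# Hypothetical per-subject feature maps, one row per subject.
gm, wm, dfield = (rng.random((N, V)) for _ in range(3))
X = np.hstack([zscore_feature(b) for b in (gm, wm, dfield)])   # N x K, K = 3V

# Synthetic instance of the generative model with as many sources as subjects.
K = X.shape[1]
S = rng.laplace(size=(N, K))                    # joint sources, one per row
A = np.eye(N) + 0.1 * rng.normal(size=(N, N))   # mixing coefficients (N x N)
X_model = A @ S

# With the true A known, the unmixing matrix recovers the sources exactly.
W = np.linalg.inv(A)
U = W @ X_model

# Each joint source splits back into its GM, WM and deformation-field maps.
gm_map, wm_map, df_map = np.split(U[0], 3)
```

Column $j$ of `A` holds the weight of joint source $j$ in every subject, which is the quantity compared between groups and passed to the classifier.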
We first attempted to estimate the number of ICs using the Minimum Description Length (MDL) criterion, which is an information-theoretic technique for model order selection [105]. Using the MDL criterion, the number of components in GM and WM was estimated to be 4 and 3, respectively, but because of the heterogeneity in the location and extent of deformations across both groups, this information-theoretic criterion did not converge for the deformation field dataset. Accordingly, we instead followed the precedent set by [180] and set the number of ICs equal to one third of the total number of subjects: for the 25 subjects here, we specified eight components.³

³ We performed subsequent follow-up analyses for 4, 10 and 12 components to further confirm the validity of our model and to test the stability of the joint independent components. Stability analysis of the results for different numbers of independent components showed replication of findings for 10 and 12 independent components; however, using eight components yielded stronger group differences and higher z-values. As expected [112], under-estimating the number of components (e.g., choosing four as the number of ICs in our case) yielded less reliable results. Results of analyses with 4, 10, and 12 ICs are available from the author by request.

Separability of the mixing coefficients was used as a criterion for capturing group differences. These low-dimensional coefficients reflect how much each subject's shape deformation and tissue composition is modulated by a joint source. To investigate whether the mixing coefficients truly differ between groups, we used two-sample (unpaired) t-tests. We report mixing coefficients that differ significantly between the two groups (p < 0.05), and for which the corresponding z-scaled component had more than 10 voxels with values above a threshold of |z| > 2.5 (99.4% cumulative probability). We followed the precedent set by [2] to select the minimum number of voxels within a cluster, and [192, 185] to select the threshold.

In order to determine whether the fusion analysis is superior (in terms of sensitivity) to analyses based on a single feature, we examined the mixing coefficients and component maps of datasets containing single features. We compared the result of the t-tests on the mixing coefficients from the jICA of GM, WM and DF features to the result of the t-tests on the mixing coefficients from the ICA of each of the three features separately.

To further investigate the group differences, columns of the mixing coefficients matrix, which reflect the weighting of each joint source in a subject's GM, WM and DF, were used as input features to a classification algorithm. A discriminant analysis with a quadratic discriminant function was used to classify the subjects as depressed or control. Performance of the classifier was measured using leave-one-subject-out cross-validation, averaging classification performance across iterations. The joint ICA classification result was compared to classification results obtained with one or two features.

Furthermore, separability of the joint source distributions was quantified by computing a divergence measure between joint histograms. Each of the joint sources was divided into three maps, which correspond to the GM, WM and deformation field features used in the jICA analysis. The map elements (each one representing a specific voxel) were thresholded and sorted in descending order by voxel value, resulting in a set of voxels representing the greatest differences between groups in each joint source. For each subject, voxels that survived thresholding in all three maps were counted on a three-dimensional joint histogram, in a bin defined by the three input feature values (from the input observation matrix X in Fig. 3.1) at those voxels' locations (see [31] for more details on computing joint histograms). The group-averaged joint histograms were then calculated by taking the mean of the joint histograms across all the subjects in the group. The difference between the two groups was then assessed using the Renyi divergence formula [90]. The divergence was also computed for other combinations of features (two or one). The higher the value of the Renyi divergence criterion, the better the discrimination between groups [29]. The best combination of features is the one that yields the highest divergence value.

7.3 Experiments and Results

Structural brain differences are generally more apparent in patients with more severe or persistent forms of the illness [110]. Therefore, the analyses of this chapter were performed on a subset of 11 subjects (age: 18 ± 0.89, range: 16-21, 2 male, all right-handed) with moderate to severe levels of depression symptoms, as defined by a score of 19 or greater on the Beck Depression Inventory (BDI-II). A similar-sized group of 14 healthy controls (all 18 years old, all female, all right-handed) with BDI scores of zero was selected to act as the comparison group. The groups were well matched in age (p-value = 1). There were no socioeconomic status (SES) differences between the subjects in the two groups (p-value = 0.50).

We performed VBM analysis on GM and WM images obtained by DARTEL group-wise registration of the maps using the SPM8 toolbox. We used the same explicit mask (described in Section 7.2.1) that we had used for the joint ICA analysis of multiple features. Results showed no significant WM or GM differences using a Family Wise Error (FWE) rate of 0.05 or a significance level of 0.001, and a cluster size of more than 10 voxels.

We report the statistical difference among joint sources to evaluate the performance of the proposed joint analysis.
Two-sample t-tests were performed on the mixing coefficients, i.e., the eight columns of matrix A, which correspond to the eight independent components, where each column consists of two groups of coefficients (one for each group of participants). One source differed significantly between the two groups (p = 0.004, which passed the Bonferroni correction for multiple comparisons (p < 0.00625)). Fig. 7.1 shows the mixing coefficients (i.e., weights) for this joint source (a), and its GM, WM and deformation-field components (b-d). The weights in the depressed group were significantly higher than in the control group. Figs. 7.1b, 7.1c, and 7.1d depict the statistical Z maps around the left and right hippocampus (the regions of interest) for this joint source, and Table 7.1 shows the corresponding stereotaxic coordinates in MNI space. As can be seen in Fig. 7.1d and Table 7.1, the shape variations appear mostly in the left hemisphere of the brain within the hippocampal region. On the other hand, the changes in GM and WM concentration appear in both hemispheres, as shown in Figs. 7.1b and 7.1c. We remind the reader that for each subject, within the jICA framework (see Fig. 3.1), the coefficients that modulate the three maps (shape deformations, and GM and WM concentrations within each joint source) are the same. In other words, the three maps, which represent the variation of shape deformations and GM and WM concentrations among subjects, are jointly related. Hence, our results indicate that the statistically significant shape deformations observed within the left hemisphere of the brain in the hippocampal region are related to the statistically significant GM and WM alterations in the hippocampal region in both hemispheres. It is reasonable to infer from these results that local changes in brain tissue composition may lead to alterations of shape in distant regions, because the brain is an interconnected organ.

To investigate the effect of using fusion analysis to determine the independent components, the results of the joint analysis of GM+WM+DF and of the separate analyses of GM, WM, and DF were compared. Eight independent-sample t-tests were conducted to compare the depressed and non-depressed groups on the columns of the mixing coefficients for joint or separate analysis of features. For the joint analyses, the modulation profiles differed significantly between the two groups (Table 7.2). Separate analyses of GM, WM and DF failed to identify significant group differences. Results confirm that the combination of shape deformations and local composition of tissue, but neither shape nor local composition of tissue alone, can discriminate between individuals in the two groups. As can be seen, smoothing did not affect the results much.

Table 7.3 shows the average classification error for jICA (first column) and ICA (last three columns) of GM, WM, and DF, each used as input features in data fusion analysis. Results show that the control and depressed subjects can be classified based on structural MRI data with an error of 32% using the combination of shape deformations and tissue composition (GM+WM+DF). The classification error using shape deformations or tissue composition alone was 36% or more. Considering that the number of subjects is low and the dimensionality of the input MRI dataset is quite high, the results are very promising.

Fig. 7.2 shows the group-average marginal histograms for GM, WM, and deformation field, respectively. As can be seen, the histograms of the normalized intensity values of GM and WM were almost the same for the two groups, whereas the histogram of the absolute deformations showed around 0.3 mm more deformation for subjects with MDD compared to healthy controls.

Figure 7.1: Joint Independent Component Analysis (jICA) of brain tissue composition and shape deformation. Fig. 7.1a shows the mixing coefficients (arbitrary units; p-value = 0.0042) for the depressed and control subjects, wherein the central red mark is the median, the edges of the blue box are the 25th and 75th percentiles, and the whiskers show the extreme values of the coefficients. Figs. 7.1b, 7.1c, and 7.1d show the joint source map of the most significant component for (b) GM, (c) WM, and (d) deformation field. The green dots indicate the boundaries of the region of interest, which was created by dilating a mask around the hippocampus.

Table 7.1: MNI coordinates for the most discriminative source map in three contrasts. Voxels which are above a threshold of |Z| > 2.5, and create a cluster volume of more than 10 voxels, are shown in the table. L and R show the assigned anatomical left and right hemispheres; the coordinates and value of the maximum Z are also provided in the table. Regions that are not significant are shown by ns.

Feature              Volume (voxels)    Random effects: Max Value (x, y, z)
                     L      R           L                        R
GM concentration
  Positive           69     55          5.5 (−33, −28, −12)      4.6 (33, −15, −18)
                     44     74          5.0 (−39, −4, −26)       5.8 (33, 3, −30)
                     26     30          4.3 (−30, 2, −27)        5.3 (35, −13, −27)
                     23     9           5.0 (−41, −6, −26)       3.9 (38, −18, −26)
                     4      19          3.9 (−29, −9, −18)       4.6 (35, −16, −17)
                     2      18          6.4 (−39, −4, −27)       2.6 (38, −6, −29)
  Negative           60     25          6.5 (−36, 3, −29)        4.0 (42, −13, −8)
                     28     51          6.4 (−39, −36, −14)      5.0 (26, −39, −12)
                     40     19          5.3 (−36, −37, −12)      4.5 (30, −33, −17)
                     33     11          7.1 (−38, 6, −26)        5.1 (30, 8, −26)
                     7      31          3.6 (−26, 8, −20)        4.1 (24, 0, −27)
WM concentration
  Positive           77     27          7.8 (−36, 3, −27)        5.1 (30, −3, −26)
                     42     73          8.0 (−39, −36, −14)      6.6 (26, −39, −12)
                     16     72          3.4 (−27, 2, −27)        5.0 (24, 0, −27)
                     55     31          6.5 (−36, −37, −12)      5.8 (27, −39, −12)
                     28     18          8.7 (−38, 6, −26)        6.2 (30, 8, −26)
  Negative           61     91          6.1 (−39, −4, −26)       6.3 (33, 3, −29)
                     80     86          6.6 (−33, −28, −12)      5.6 (33, −15, −18)
                     30     15          6.2 (−41, −6, −26)       4.3 (42, −13, −24)
                     18     1           7.8 (−39, −4, −27)       3.1 (38, −6, −27)
                     2      16          4.5 (−33, −33, −9)       5.7 (35, −16, −17)
                     13     11          5.2 (−30, 2, −27)        4.8 (33, −4, −27)
Deformation fields
  Positive           515    0           4.3 (−45, −5, −18)       ns
                     13     0           2.6 (−32, −21, −24)      ns

Table 7.2: P-values of the most significant joint source, obtained from two-sample t-tests performed on the columns of the mixing coefficients generated by jICA (first three columns) and ICA (last three columns). The first and second rows show the results without and with spatial smoothing of the features. GM: gray matter; WM: white matter; DF: deformation field, each used as input features in data fusion analysis. The p-values displayed in the first three columns passed a Bonferroni correction for multiple comparisons (p < 0.00625).

Smoothing       GM+WM+DF   GM+DF   WM+DF   DF      GM      WM
None            0.004      0.005   0.004   0.069   0.067   0.081
FWHM: 8 mm      0.005      0.006   0.005   0.069   0.087   0.036

Table 7.3: Classification error obtained from discriminant analysis of the mixing coefficients generated by jICA (first column) and ICA (last three columns). GM: gray matter; WM: white matter; DF: deformation field, each used as input features in data fusion analysis.

Combination     GM+WM+DF   DF      GM      WM
Error           32%        36%     36%     40%

The sorted maximum Renyi divergence values for different combinations of contrasts are shown in Fig. 7.3. Higher values indicate better discrimination between the groups. Therefore, as indicated in the figure, combining the deformation field and tissue composition yielded greater discrimination than utilizing either deformation field or tissue composition data alone. In particular, combining GM and DF yielded the highest level of discriminatory power in both samples.

Figure 7.2: Group-average histograms (histogram count for the control and depressed groups) for the whole dataset on GM intensity value (a), WM intensity value (b), and absolute deformation (c). The difference between the histograms of the two groups was larger for the deformation field than for GM and WM.

Figure 7.3: Renyi divergence criterion values for different combinations of features on differentiation between histograms. The six highest-value combinations are shown in the figure, in the order GM & DF, WM & DF, GM & WM & DF, DF, GM, GM & WM. The higher the value of the Renyi divergence, the better the discrimination between groups.

7.4 Discussion

The current study is the first to report that joint analysis of brain shape and tissue composition is sensitive enough to identify subtle significant differences between young people in a first episode of MDD and healthy controls, whereas separate analysis of shape and tissue composition fails to discriminate the groups. The identified corresponding sources demonstrate MDD-related links between WM, GM and shape deformation changes in the hippocampus, which were not detectable with univariate voxel-based methods. Assuming that the features share the same mixing coefficient matrix (modulation profile), jICA uses more information to estimate the same number of mixing coefficients and can therefore improve source estimations compared to ICA. The observed shape deformations in the left hippocampus are related to GM and WM alterations in the hippocampus in both hemispheres (see Fig. 7.1 and Table 7.1). These significant shape deformation differences in the left hippocampus are consistent with a previous study of shape [204] and volume differences [196] in late-life MDD, and with volume differences in adolescents with MDD [118].
Our results provide compelling evidence that shape-deformation differences in the hippocampus between depressed and healthy individuals are present to at least some extent even in the very initial stages of the illness; they do not simply emerge over the recurrent and chronic pathology of the disorder, and they are independent of any potential neurotoxic effects of chronic antidepressant usage.

Results demonstrate that individuals can be classified relatively accurately (with 68% accuracy) into control and depressed groups by using only structural MRI data. This is consistent with previous studies on diagnostic classification of MDD using brain structural neuroanatomy (67.6% diagnostic accuracy reported by [47] and 77.8% prognosis accuracy reported by [136] using adult subjects). However, classification results reported using functional Magnetic Resonance Imaging are higher (94.3% reported by [203], 90.6% reported by [113] and 95% reported by [49]), suggesting that functional analysis of MDD is more suitable for diagnostic classification.

The group-average histograms for individual features indicated that, among individual features, the deformation field was best able to discriminate between the two groups. However, the combination of GM and deformation field captured the group differences better than any individual feature alone, or any other combination of features, as indicated by the values of the Renyi divergence.
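For reference, the Renyi divergence of order $\alpha$ between two normalized histograms $P$ and $Q$ is $D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha} q_i^{1-\alpha}$. The sketch below evaluates it for two group-averaged histograms. The order `alpha=0.5`, the smoothing constant `eps`, and the function name are assumptions made for illustration; the settings used in the analysis are not restated here.

```python
import numpy as np

def renyi_divergence(hist_p, hist_q, alpha=0.5, eps=1e-12):
    """Renyi divergence of order alpha (alpha != 1) between two histograms."""
    # Flatten so that 1-D, 2-D or 3-D joint histograms are handled alike.
    p = np.asarray(hist_p, dtype=float).ravel() + eps
    q = np.asarray(hist_q, dtype=float).ravel() + eps
    # Normalize the counts to probability distributions.
    p /= p.sum()
    q /= q.sum()
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)
```

Identical histograms give a divergence of zero, and the value grows as the two group histograms separate, which is why a larger value is read as better discrimination.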
These results suggest that future studies should use both deformation fields, which are used to normalize individuals' brains to the reference space, and regionally specific analyses, such as tissue composition measures, to better understand the brain basis of MDD and capture structural differences between subjects with MDD and healthy controls.

The proposed method based on fusion of brain tissue composition and shape deformation successfully captured the differences in hippocampal shape and tissue composition between young people in a first episode of depression and healthy control subjects. Specifically, using the jICA method, significant shape deformation differences in the left hippocampus were observed between depressed and control groups. In contrast, no differences were detected between the two groups when a separate analysis of each feature was conducted. These results suggest that the jICA method may be a more sensitive technique for detecting morphological differences in brain tissue; such sensitivity may be particularly helpful when the sample size is relatively small, or when structural abnormalities are relatively subtle (such as in groups of young people who are very early in their disease course). The current results have important clinical implications.
Although prospective studies with individuals at risk for MDD are needed to determine the causal role of these structural differences in MDD, the current results suggest that hippocampal volume loss may be a vulnerability factor for a particularly severe manifestation of MDD in the first onset.

Chapter 8
Simultaneous Analysis of Pose, Shape, and Tissue Composition

8.1 Introduction

Major depression has been linked to brain structural changes using five different types of approaches: 1) volume analysis of a single brain structure (region of interest) using manual or automated segmentation [13, 16, 24, 26, 77, 87, 120, 131, 134, 165, 169, 196, 200]; 2) separate analysis of shape and volume of a region of interest using high-dimensional mapping [147] or spherical harmonic basis functions [204]; 3) analysis of local composition of tissue in a single brain structure or the whole brain using voxel-based morphometry (VBM) [16, 36, 175, 196]; 4) analysis of pose (i.e., position, orientation, and size) and shape of multiple brain structures [156, 157]; and 5) joint analysis of shape deformations and local composition of tissue [151].

A limitation of the region-of-interest based methods (the first and second approaches mentioned above) is that analysis is performed only on a single brain structure after isolating it from the rest of the brain, and the relative shape and pose information between that region and the surrounding regions is largely ignored for the purpose of group analysis.

A limitation of VBM is that each individual's brain data must be normalized to a reference template, and the proportion of Grey Matter (GM) concentration and absolute volume of corresponding voxels are compared across individuals. Through that process, crucial idiographic information from the spatial normalization procedure, such as shape, relative pose of brain structures with respect to each other, and their absolute pose, is lost [6].
This information may be critical for capturing group differences, particularly when such differences are likely to be subtle.

1 This chapter is adapted from [159]: Mahdi Ramezani, Abtin Rasoulian, Tom Hollenstein, Kate Harkness, Ingrid Johnsrude, and Purang Abolmaesumi, Joint source based analysis of multiple brain structures in studying major depressive disorder, SPIE Medical Imaging: Image Processing, San Diego, US, 2014.

As shown in Chapter 6, pose and shape analysis of multiple brain structures alleviates several issues associated with previous techniques, and allows for identifying subtle shape and volumetric differences across multiple brain regions between groups [156, 157]. However, simultaneous analysis of pose and shape variations within a multivariate framework has not been investigated yet.

In Chapter 7, within the jICA technique, we showed that local changes in brain tissue composition (i.e., GM and WM concentrations) result in modulation of shape deformations in distant regions (in both control and MDD groups, although to different degrees). However, this approach has not been applied to the analysis of multiple brain structures.

In this chapter, we use the joint Source-Based Analysis (jSBA) framework to identify common information across different brain structural MRI features, for classification of individuals with and without MDD. Here, a source includes regions of the brain that together exhibit intersubject covariance and group differences. The framework consists of three components, as shown in Fig. 8.1: 1) feature generation, 2) joint group analysis and 3) classification of individuals based on the joint analysis results. In the proposed framework, information from pose, shape, and tissue composition of a selected brain structure is represented as features. For each individual, features are used within the joint group analysis to generate joint sources and their corresponding modulation profiles.
Modulation profiles are used to classify individuals into different categories.

To the best of our knowledge, this is the first framework for quantitative classification of individuals with MDD based on simultaneous analysis of pose, shape, and tissue composition, obtained from multiple brain structures. Our key contribution is extracting multiple types of information from the structural MRI data of each subject, and creating features of pose, shape and tissue composition of multiple brain structures for the joint group analysis. An added value of our method is in identifying structures of interest, and characterizing types of differences that can then be injected back into models/theories on etiology.

The proposed framework is evaluated on data from a group of subjects diagnosed with severe or moderate MDD. In a cross-validation leave-one-subject-out experiment, we demonstrate that the framework enables the classification of these subjects with around 70% accuracy, solely based on their structural brain data.

Figure 8.1: Proposed joint source-based analysis (jSBA) framework for classification of individuals. [Diagram: brain structural MRI data; feature generation (tissue composition, pose, shape; multi-object analysis); group joint analysis (fusion analysis); classification (linear or non-linear classifier).]

8.2 Method

The jSBA framework consists of three components, as shown in Figure 8.1: 1) feature generation, 2) joint group analysis, and 3) classification of individuals based on the joint analysis results.

8.2.1 Feature Generation

The primary outcome measure derived from a structural image may include a measure of a particular structure (e.g., deformation) or a description of tissue type (e.g., Grey Matter (GM) and White Matter (WM) concentrations).
Below, we describe how we extract three different features: (1) local brain tissue compositions, (2) pose variations, and (3) shape deformations, represented by the superscripts "g", "p", and "s", respectively.

The structural MRI data were preprocessed using Statistical Parametric Mapping software (SPM8, Wellcome Department of Cognitive Neurology, London, UK). Each voxel of each individual structural (T1-weighted) MRI was assigned a probability of being Gray Matter (GM), White Matter (WM) or cerebrospinal fluid (CSF), using the automated segmentation processes in SPM. The GM and WM maps were used within the DARTEL groupwise registration method [5] to create a population template and the deformation fields required to map the GM maps from each participant to the template space. The mapped GM segments were then spatially normalized to stereotaxic MNI space, where a segmented LPBA40/SPM5 atlas [176] was used to segment multiple structures in the brain, including the hippocampus, parahippocampal gyrus, putamen, and superior, inferior and middle temporal gyri, from both hemispheres of the brain. For each individual, the normalized mapped GM concentrations for the voxels within the selected brain structures were used as the tissue composition feature, and the matrix of all GM concentrations was created: $Y^g = [y^g_1, \ldots, y^g_N]^T$.

Figure 8.2: Schematic of the joint sparse representation method [150]. [Diagram labels: $Y^T$, observations (GM, pose, and shape features, $Y^g$, $Y^p$, $Y^s$, for healthy and MDD subjects); $D^T$, $K$ joint sources, where the $i$th joint source $d_i$ comprises a GM map, a pose map, and a shape map; $X^T$, sparse coefficient matrix, where $x_i$ holds the coefficients associated with the $i$th joint source.]

Using the DARTEL algorithm, the atlas was registered to each individual's structural MRI, and deformation fields for such registrations were created. The deformation fields were used to warp the surface points of each of the selected brain structures to each individual's brain volume.
Pose features were calculated from the parameters of similarity transformations (translation, rotation, and scale) between surface points of the brain structures across subjects [158]. For each brain structure, the correspondences among the surface points across subjects were used to compute the similarity transformations ($T_{n,l}$) from the mean shape ($\mu_l$) of that structure across subjects to that structure in each subject. $T_{n,l}$ represents the transformation from the $l$th anatomical structure in the mean shape to the corresponding one in the $n$th subject. These transformations form a Lie group; therefore, a logarithmic mapping transforms them to a linear tangent space, where conventional statistical analysis can be applied. Each transformation was normalized using the mean transformation for each anatomical structure, $M_l$, and mapped to the tangent space: $y^p_{n,l} = \log(M_l^{-1} T_{n,l})$ [143]. The transformation vectors were concatenated for each instance to form a vector, $y^p_n = [{y^p_{n,1}}^T, \ldots, {y^p_{n,L}}^T]^T$, and the matrix of all transformations for subjects, representing pose variations, was created: $Y^p = [y^p_1, \ldots, y^p_N]^T$.

Shape features are computed as the residual deformation required to map the mean shape of each structure to the corresponding structure for each subject, after the similarity transformation is applied. The distance between the mean shape of each structure across all individuals in the data set, $\mu_l$, and the surface points of each structure in each individual was computed. The resulting vectors were concatenated for each instance, $y^s_n = [{y^s_{n,1}}^T, \ldots, {y^s_{n,L}}^T]^T$, and the matrix of all such vectors for instances, representing shape deformations, was created: $Y^s = [y^s_1, \ldots, y^s_N]^T$.

8.2.2 Group Joint Analysis

The three features described in the previous section were used as input observations to the proposed joint sparse representation method [150], which is a dictionary learning algorithm based on the K-SVD algorithm [1].
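To make the sparse coding idea concrete, the sketch below shows a greedy orthogonal matching pursuit (OMP) step, which K-SVD-style dictionary learning alternates with dictionary updates: given a dictionary whose columns are joint sources, it finds a few coefficients that reconstruct one subject's stacked feature vector. This is a minimal illustration, not the method of [150]; the function name `omp`, the fixed sparsity level, and the assumption of unit-norm dictionary columns are ours.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy OMP: sparse x with y ~= D @ x, using at most n_nonzero atoms.

    D is V x K with unit-norm columns (one joint source per column);
    y is one subject's stacked feature vector of length V.
    """
    y = np.asarray(y, dtype=float)
    x = np.zeros(D.shape[1])
    support = []
    residual = y.copy()
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom would improve the fit
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        x[:] = 0.0
        x[support] = coef
    return x
```

Because the GM, pose, and shape blocks are stacked along the rows of `D` and `y`, a single coefficient vector `x` weights all three maps of each joint source at once, which is what ties the features together.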
Figure 8.2 shows the schematic of this method. In this figure, $Y^T = [Y^g, Y^p, Y^s]$, with $Y \in \mathbb{R}^{V \times N}$, is the observation matrix, where $Y^g$, $Y^p$, and $Y^s$ are the matrices of GM, pose, and shape features; $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{K \times N}$ is the sparse modulation matrix; and $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{V \times K}$ is the dictionary containing $K$ signal atoms representing the joint maps of GM, pose and shape. $V$, $N$, and $K$ are the number of variables (GM voxels + pose features + shape features), subjects, and brain joint sources, respectively. Assuming that the maps of joint sources share the same sparse coefficients matrix, the method tries to represent a large set of brain features as a sparse linear combination of a small set of joint sources ("basis maps"). To assess the reliability of the estimated joint sources, as well as for visual inspection of sources in low-dimensional space, we use the reliability analysis and visualization method proposed in Chapter 5. To accomplish this, the parameters of the sparse representation analysis are set and the analysis is run multiple times using the selected parameters. The estimated components are clustered based on their similarities using the hierarchical clustering algorithm. Finally, the clusters are visualized using the t-SNE approach. The estimated joint sources and sparse coefficients corresponding to the centrotypes of the clusters are used in the third component of the jSBA framework, i.e., classification.

8.2.3 Classification

In order to overcome the classification problems caused by the high dimensionality of the data and the small set of available subjects, columns of the sparse coefficients matrix, $X$, which reflect the weighting of each joint source in a subject's features, were used as input features to a random-forest classifier [23]. Random forests are a learning method for classification that uses multiple decision trees for training. Each decision tree splits the weights related to the considered joint sources to maximize diversity among the subjects [46].
As a result, a tree with nodes and leaves is constructed, where its top node shows the weights with maximum separability.

Performance of the classification was measured by leave-one-subject-out cross-validation, averaging the classification performance over the left-out subjects. In other words, the data were split into training subjects and a test subject. In the training phase, the jSBA was performed on the training subjects. This produced a new sparse coefficient matrix $X(\mathrm{train})$ and joint source matrix $D(\mathrm{train})$ that modeled the generation of the training observations $Y(\mathrm{train})$. The columns of the sparse matrix $X(\mathrm{train})$, one per subject, were used as input features to train the classifier, which divided the training subjects into two classes: with and without MDD. The input features for the test subject, $X(\mathrm{test})$, were then found by the OMP solution of $Y(\mathrm{test}) = D(\mathrm{train}) X(\mathrm{test})$. The position of this vector in the $K$-dimensional feature space, relative to the decision boundary found by running the classifier on the training set, determined the classification of the test subject.

8.3 Results and Discussion

The goal of our jSBA framework is to examine whether joint sources can be used to accurately distinguish two groups of subjects on the basis of sMRI data. The success of this analysis is evaluated by examining the statistical difference among joint sources, and by a classification method. Fig. 8.3 shows the results of the reliability and visualization analysis on the 45 subjects with and without depression. The number of times each source appeared in a cluster, the dendrogram of the hierarchical clustering algorithm, the similarities between estimated clusters, and the 2D projection of the clusters using the t-SNE approach are shown in this figure. As can be seen, there are two major clusters of components within the dataset. Convex hulls are generated around the estimated clusters.
The components can be sorted based on the size of the clusters.

Two-sample t-tests were performed on the mixing coefficients, and two components were found to differ significantly (p < 0.05) between the two groups. These components showed different sparse coefficients (p = 0.033 and p = 0.036) in subjects with and without MDD. Figures 8.4a and 8.4b show the distribution of the sparse coefficients for these two significant sources. These sparse coefficients reflect how much each subject's structural MRI features (i.e., GM concentration, pose, and shape) are modulated by a joint source. Coefficients of the first significantly different source (lowest p-value) were almost zero for the control subjects. This demonstrates that this source modulates the features of subjects with MDD more than those of controls.

Classification results show that individuals with and without MDD can be classified with an accuracy of 67% based on information gathered from multiple brain structures. Considering that the number of subjects is low and the dimensionality of the dataset is high, the results are very promising.

Figure 8.3: Reliability and visualization analysis of the 45 subjects with and without depression. (a) Number of times each component appeared in a cluster (number of estimates per component label). (b) Dendrogram of the hierarchical clustering algorithm (cluster merging similarity). (c) Similarities between estimated clusters, arranged according to the dendrogram; red lines indicate estimate-clusters. (d) 2D projection of the clusters, showing single-run estimates and the "best estimate" (centrotype) of each cluster.

The current study is the first to report classification of subjects with MDD and healthy controls based on pose, shape, and tissue composition obtained from sMRI data.
All participants in the MDD group were in their very first episode of depression and were medication-free, which may provide important clues as to the disorder's initial etiology. Results confirmed that shape deformation differences in the selected brain structures between depressed and healthy individuals are present to at least some extent even in the very initial stages of MDD. In the current study, all of the depressed subjects were diagnosed as having severe or moderate depression; further studies on subjects with mild depression are required.

Figure 8.4: Sparse coefficients for the significant joint sources (p = 0.033, p = 0.036) for MDD subjects and healthy controls (a, b), respectively; the central red mark is the median, the edges of the blue box are the 25th and 75th percentiles, and the whiskers show the extreme values of the coefficients.

In summary, using the proposed jSBA framework, independent joint sources of GM concentrations, pose, and shape deformations were obtained and simultaneously analyzed to capture the group differences. The framework was evaluated on brain MRI of young adults with and without MDD, and identified significant differences between these two groups. In a cross-validation leave-one-subject-out experiment, we demonstrated that the framework enables the classification of these subjects with 67% accuracy, solely based on their structural brain data.

Chapter 9
Conclusion and Future Work

9.1 Conclusion

In this thesis we proposed a framework that can distinguish patients and healthy controls when the number of available subjects is low and the between-group differences are subtle. To this end, in the third chapter we proposed to use multi-task fMRI data, and to take advantage of the complementary information that exists among the tasks for group classification.
We used the joint ICA method together with an SVM classification algorithm, and demonstrated that cognitive patterns can be used to classify individuals in the absence of behavioral differences. We showed that combining multiple functional contrasts, which reveal different patterns of brain activity, improves the overall classification performance.

In the fourth chapter, we proposed joint sparse representation analysis (jSRA), which is more appropriate for joint analysis of brain functional imaging datasets. We showed that the proposed analysis can effectively identify the common and unique information among different levels of brain cognitive patterns in multi-task fMRI data within different groups. To demonstrate the potential of the proposed framework, analyses of simulated fMRI data were performed, followed by analyses of experimental fMRI data from normal subjects performing speech comprehension tasks. Simulations showed the superiority of the proposed method over jICA for multi-task fMRI data analysis. Results on the experimental fMRI data also demonstrated that the jSRA method can better capture the brain functional activation patterns, and therefore the differences, between two groups.

In Chapter Five, we proposed a method to investigate the reliability of the components estimated using sparse representation analysis. To achieve this goal, the KSVD algorithm was run several times and the estimated components were clustered based on their similarity. Then, the clusters were visualized using a nonlinear 2D projection. The proposed approach provides a tool for further investigation of the obtained sources. The approach highlights compact clusters with a higher number of components, and flags less reliable clusters of components that could be discarded. Future work will investigate the use of the modulation profiles, which are substantially compact and sparse, for reliable classification of individuals based on neuroimaging data.

In Chapter Six, we investigated the simultaneous analysis of multiple brain structures within the multi-object statistical pose and shape analysis. Relative pose and shape information of multiple structures in the brain, which is usually disregarded, was shown to be important in capturing the group differences. Within this framework, the shape deformations were analyzed separately from rigid transformations and scale (i.e., the pose information). Therefore, we could identify the type of morphological differences (pose and shape). Within the simultaneous analysis of multiple structures, the relative differences among structures were captured.

In Chapter Seven, we fused brain tissue composition and shape deformation, which had not previously been integrated. Using the jICA method, we successfully captured the differences in hippocampal shape and tissue composition between young people in a first episode of depression and healthy control subjects. Specifically, using the jICA method, significant shape deformation differences in the left hippocampus were observed between depressed and control groups. In contrast, no differences were detected between the two groups when a separate analysis of each feature was conducted. These results suggest that the jICA method may be a more sensitive technique for detecting morphological differences in brain tissue; such sensitivity may be particularly helpful when the sample size is relatively small, or when structural abnormalities are relatively subtle (such as in groups of young people who are very early in their disease course).

In Chapter Eight, we used the proposed jSBA framework to jointly analyze GM concentrations, pose, and shape deformations to capture the group differences between young adults with and without MDD.
In a leave-one-subject-out cross-validation experiment, we demonstrated that the framework enables the classification of these subjects with 67% accuracy, based solely on their structural brain data.

9.2 Future Work

The results reported above for sMRI analysis of subjects with and without depression should be interpreted in the context of the following limitations.

First, although current results (Fig. 6.6) did not reveal any differences in the structures we examined between male and female individuals with MDD, the study comprised young women with moderate and severe depression almost exclusively; therefore, generalization to young men in a first episode of MDD, and to young men and women with milder levels of depression severity, requires further study. Future studies should also investigate whether the present results generalize to adults with recurrent MDD, as well as younger children with MDD.

Second, we did not assess for the presence of comorbid anxiety disorders or specific subtypes of MDD. Future studies are required to examine variation in brain morphology with differing depression syndromes in order to identify biomarkers of more homogeneous endophenotypes.

Third, it will be important in future research to determine whether the present results generalize to children and early adolescents with MDD, as significant corticolimbic plasticity remains throughout childhood and early adolescence [80], which may obscure any potential toxic effects of depression vulnerability.

Fourth, as all of our depressed participants were medication-naive, the structural differences are not due to any potential neurotoxic effects of chronic antidepressant usage. As such, the structural differences may emerge as a result of premorbid epigenetic vulnerabilities.
For example, hippocampal volume differences have been shown in individuals with particular polymorphisms of genes known to impart risk for depression, but only in the context of environmental adversity, such as a history of childhood trauma (e.g., [188]) or maternal depression (e.g., [37]). Future prospective, longitudinal studies that follow children at risk of depression as a result of these vulnerabilities through to the onset of syndromal MDD are required to clarify the precise etiological and pathological mechanism underlying the relation of hippocampal volume loss to MDD.

Fifth, few recent studies in MDD have focused on white matter (WM) integrity using diffusion tensor imaging (DTI) to assess the structural connectivity of the network between healthy controls and MDD subjects. Korgaonkar et al. (2014) showed structural connectomic alterations between nodes of the default mode network and the frontal-thalamo-caudate regions in 95 MDD outpatients compared with 102 matched control subjects [99]. However, Choi et al. (2014) found no significant differences in WM integrity disruption between 134 medication-free MDD patients and 54 healthy controls [40]. Future studies may use multivariate approaches to include analysis of geometric changes (pose and shape), tissue concentrations (WM and GM), and the structural connectome (DTI). Future studies should also consider differences between MDD and control groups in other brain structures of relevance to MDD.

Finally, the participants in this study were volunteers and, thus, may not be entirely representative of the population of young people with depression. Nevertheless, as a community sample, they may be more representative than the subjects of most previous studies, which have relied on treatment-seeking patients in tertiary care centers.

In the future, we will apply the jSBA framework to improve the clinical diagnosis of brain disorders using multi-modal datasets.
In collaboration with the sexual health research laboratory of Queen's University, we have gathered structural and functional MRI images of women with Provoked Vestibulodynia (PVD). We will analyze structural and functional imaging data from patients with PVD within the jSBA framework by combining brain regional tissue composition and functional patterns of activation. In addition, using multi-object statistical analyses of brain pose and shape deformations [153], we will investigate shape deformations and relative anatomical pose information that is usually discarded in the alignment process. The combination of brain structure and function data should clarify the pathologic role of abnormalities and reveal the joint information between the features, such as the associations of GM/WM involvement with brain activity patterns. The analysis would likely improve our understanding of the relation between clinical scores and structural and functional brain changes.

Bibliography

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. Signal Processing, IEEE Transactions on, 54(11):4311–4322, 2006.
[2] Ellemarije Altena, Hugo Vrenken, Ysbrand D Van Der Werf, Odile A van den Heuvel, and Eus JW Van Someren. Reduced orbitofrontal and parietal gray matter in chronic insomnia: a voxel-based morphometric study. Biological psychiatry, 67(2):182–185, 2010.
[3] J. I. Arribas, V. D. Calhoun, and T. Adali. Automatic Bayesian classification of healthy controls, bipolar disorder, and schizophrenia using intrinsic connectivity maps from fMRI data. Biomedical Engineering, IEEE Transactions on, 57(12):2850–2860, 2010.
[4] Vincent Arsigny, Olivier Commowick, Nicholas Ayache, and Xavier Pennec. A fast and log-Euclidean polyaffine framework for locally linear registration. Journal of Mathematical Imaging and Vision, 33(2):222–238, 2009.
[5] J. Ashburner. A fast diffeomorphic image registration algorithm.
NeuroImage, 38(1):95–113, 2007.
[6] J. Ashburner and K. J. Friston. Voxel-based morphometry–the methods. NeuroImage, 11(6):805–821, 2000.
[7] John Ashburner and Karl J. Friston. Unified segmentation. NeuroImage, 26(3):839–851, 2005.
[8] Francis R. Bach. Bolasso: model consistent lasso estimation through the bootstrap. In Proceedings of the 25th international conference on Machine learning, pages 33–40. ACM, 2008.
[9] R. Michael Bagby, Andrew G. Ryder, Deborah R. Schuller, and Margarita B. Marshall. The Hamilton depression rating scale: has the gold standard become a lead weight? American Journal of Psychiatry, 161(12):2163–2177, 2004.
[10] Quentin Barthelemy, Cedric Gouy-Pailler, Yoann Isaac, Antoine Souloumiac, Anthony Larue, and Jerome I. Mars. Multivariate temporal dictionary learning for EEG. Journal of neuroscience methods, 2013.
[11] Aaron T. Beck, Robert A. Steer, Roberta Ball, and William F. Ranieri. Comparison of Beck depression inventories-IA and -II in psychiatric outpatients. Journal of personality assessment, 67(3):588–597, 1996.
[12] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural computation, 7(6):1129–1159, 1995.
[13] S. Bell-McGinty, M. A. Butters, C. C. Meltzer, P. J. Greer, C. F. Reynolds, and J. T. Becker. Brain morphometric abnormalities in geriatric depression: long-term neurobiological effects of illness duration. American Journal of Psychiatry, 159(8):1424–1427, 2002.
[14] M. Bellani, M. Baiano, and P. Brambilla. Brain anatomy of major depression I. Focus on hippocampus. Epidemiologia e psichiatria sociale, 19(04):298–301, 2010.
[15] M. Bellani, M. Baiano, and P. Brambilla. Brain anatomy of major depression II. Focus on amygdala. Epidemiology and Psychiatric Sciences, 20(01):33–36, 2011.
[16] L. Bergouignan, M. Chupin, Y. Czechowska, S. Kinkingnefhun, C. Lemogne, G. Le Bastard, M. Lepage, and L. Garnero. Can voxel based morphometry, manual segmentation and automated segmentation equally detect hippocampal volume differences in acute depression? NeuroImage, 45(1):29–37, 2009.
[17] Erik Berntsen, Inge Rasmussen, Petter Samuelsen, Jian Xu, Olav Haraldseth, Jim Lagopoulos, and Gin Malhi. Putting the brain in jeopardy: a novel comprehensive and expressive language task? Acta Neuropsychiatrica, 18(2):115–119, 2006.
[18] Boris Birmaher, David A. Brent, Laurel Chiappetta, Jeffrey Bridge, Suneeta Monga, and Marianne Baugher. Psychometric properties of the screen for child anxiety related emotional disorders (SCARED): a replication study. Journal of the American Academy of Child & Adolescent Psychiatry, 38(10):1230–1236, 1999.
[19] Fred L. Bookstein. Landmark methods for forms without landmarks: localizing group differences in outline shape. In Mathematical Methods in Biomedical Image Analysis, 1996., Proceedings of the Workshop on, pages 279–289. IEEE, 1996.
[20] M. N. Bossa and S. Olmos. Statistical model of similarity transformations: Building a multi-object pose model of brain structures. In IEEE Comput. Soc. Workshop Math. Methods Biomed. Image Anal, page 59, 2006.
[21] MN Bossa and S. Olmos. Multi-object statistical pose+shape models. In 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 1204–1207. IEEE, 2007.
[22] Kelly N. Botteron, Marcus E. Raichle, Wayne C. Drevets, Andrew C. Heath, and Richard D. Todd. Volumetric reduction in left subgenual prefrontal cortex in early onset depression. Biological psychiatry, 51(4):342–344, 2002.
[23] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[24] J. D. Bremner, M. Narayan, E. R. Anderson, L. H. Staib, H. L. Miller, and D. S. Charney. Hippocampal volume reduction in major depression. American Journal of Psychiatry, 157(1):115–118, 2000.
[25] Heather M. Burke, Mary C. Davis, Christian Otte, and David C. Mohr. Depression and cortisol responses to psychological stress: a meta-analysis. Psychoneuroendocrinology, 30(9):846–856, 2005.
[26] S. C. Caetano, J. P. Hatch, P. Brambilla, R. B. Sassi, M. Nicoletti, A. G. Mallinger, E. Frank, D. J. Kupfer, M. S. Keshavan, and J. C. Soares. Anatomical MRI study of hippocampus and amygdala in patients with current and remitted major depression. Psychiatry Research: Neuroimaging, 132(2):141–147, 2004.
[27] V. D. Calhoun, P. K. Maciejewski, G. D. Pearlson, and K. A. Kiehl. Temporal lobe and default hemodynamic brain modes discriminate between schizophrenia and bipolar disorder. Human brain mapping, 29(11):1265–1275, 2008.
[28] VD Calhoun, T. Adali, GD Pearlson, and JJ Pekar. A method for making group inferences from functional MRI data using independent component analysis. Human brain mapping, 14(3):140–151, 2001.
[29] Vince D. Calhoun and Tulay Adali. Feature-based fusion of medical imaging data. Information Technology in Biomedicine, IEEE Transactions on, 13(5):711–720, 2009.
[30] Vince D. Calhoun, Tulay Adali, NR Giuliani, JJ Pekar, KA Kiehl, and GD Pearlson. Method for multimodal analysis of independent source differences in schizophrenia: combining gray matter structural and auditory oddball functional data. Human brain mapping, 27(1):47–62, 2006.
[31] Vince D. Calhoun, Tulay Adali, Kent A. Kiehl, Robert Astur, James J. Pekar, and Godfrey D. Pearlson. A method for multitask fMRI data fusion applied to schizophrenia. Human brain mapping, 27(7):598–610, 2006.
[32] Vince D. Calhoun, Vamsi K. Potluru, Ronald Phlypo, Rogers F. Silva, Barak A. Pearlmutter, Arvind Caprihan, Sergey M. Plis, and Tulay Adali. Independent component analysis for brain fMRI does indeed select for maximal independence. PloS one, 8(8):e73309, 2013.
[33] Melissa K. Carroll, Guillermo A. Cecchi, Irina Rish, Rahul Garg, and A. Ravishankar Rao. Prediction and interpretation of distributed neural activity with sparse models. NeuroImage, 44(1):112–122, 2009.
[34] J. J. Cerrolaza, A. Villanueva, and R. Cabeza. Hierarchical statistical shape models of multiobject anatomical structures: Application to brain MRI. Medical Imaging, IEEE Transactions on, 31(3):713–724, 2012.
[35] M. W. L. Chee, H. Zheng, J. O. S. Goh, D. Park, and B. P. Sutton. Brain structure in young and old East Asians and Westerners: comparisons of structural volume and cortical thickness. Journal of cognitive neuroscience, 23(5):1065–1079, 2011.
[36] Chi-Hua Chen, Khanum Ridler, John Suckling, Steve Williams, Cynthia HY Fu, Emilio Merlo-Pich, and Ed Bullmore. Brain imaging correlates of depressive symptom severity and predictors of symptom improvement after antidepressant treatment. Biological psychiatry, 62(5):407–414, 2007.
[37] Michael C. Chen, J. Paul Hamilton, and Ian H. Gotlib. Decreased hippocampal volume in healthy girls at risk of depression. Archives of General Psychiatry, 67(3):270, 2010.
[38] Jason E. Childress, Emily J. McDowell, Venkata Vijaya K. Dalai, Saivivek R. Bogale, Chethan Ramamurthy, Ali Jawaid, Mark E. Kunik, Salah U. Qureshi, and Paul E. Schulz. Hippocampal volumes in patients with chronic combat-related posttraumatic stress disorder: A systematic review. The Journal of neuropsychiatry and clinical neurosciences, 25(1):12–25, 2013.
[39] K. Choi, Z. Yang, X. Hu, and H. Mayberg. A combined functional-structural connectivity analysis of major depression using joint independent components analysis. Psychiatric MRI/MRS, Toronto, Canada, May, page 3555, 2008.
[40] Ki Sueng Choi, Paul E. Holtzheimer, Alexandre R. Franco, Mary E. Kelley, Boadie W. Dunlop, Xiaoping P. Hu, and Helen S. Mayberg. Reconciling variable findings of white matter integrity in major depressive disorder. Neuropsychopharmacology, 39:1332–1339, 2014.
[41] MK Chung, KJ Worsley, T. Paus, C. Cherif, DL Collins, JN Giedd, JL Rapoport, and AC Evans. A unified statistical approach to deformation-based morphometry. NeuroImage, 14(3):595–606, 2001.
[42] Michael W. Cole and Walter Schneider.
The cognitive control network: Integrated cortical regions with dissociable functions. NeuroImage, 37(1):343–360, 2007.
[43] M. R. Coleman, J. M. Rodd, M. H. Davis, I. S. Johnsrude, D. K. Menon, J. D. Pickard, and A. M. Owen. Do vegetative patients retain aspects of language comprehension? Evidence from fMRI. Brain, 130(10):2494–2507, 2007.
[44] Andre Collignon, Frederik Maes, Dominique Delaere, Dirk Vandermeulen, Paul Suetens, and Guy Marchal. Automated multi-modality image registration based on information theory. In Information processing in medical imaging, volume 3, pages 263–274, 1995.
[45] Olivier Commowick, Vincent Arsigny, Aurelie Isambert, Jimena Costa, Fredric Dhermain, Francois Bidault, P-Y Bondiau, Nicholas Ayache, and Gregoire Malandain. An efficient locally affine framework for the smooth registration of anatomical structures. Medical image analysis, 12(4):427–441, 2008.
[46] Don Coppersmith, Se June Hong, and Jonathan RM Hosking. Partitioning nominal attributes in decision trees. Data Mining and Knowledge Discovery, 3(2):197–217, 1999.
[47] Sergi G. Costafreda, Carlton Chu, John Ashburner, and Cynthia HY Fu. Prognostic and diagnostic potential of the structural neuroanatomy of depression. PLoS One, 4(7):e6353, 2009.
[48] David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2):261–270, 2003.
[49] R. Cameron Craddock, Paul E. Holtzheimer, Xiaoping P. Hu, and Helen S. Mayberg. Disease state prediction from resting state functional connectivity. Magnetic resonance in Medicine, 62(6):1619–1628, 2009.
[50] Cristina Cusin, Huaiyu Yang, Albert Yeung, and Maurizio Fava. Rating scales for depression, pages 7–35. Handbook of clinical rating scales and assessment in psychiatry and mental health. Springer, 2010.
[51] I. Daubechies, E. Roussos, S. Takerkart, M. Benharrosh, C. Golden, K. D'Ardenne, W. Richter, J. D. Cohen, and J. Haxby. Independent component analysis for brain fMRI does not select for independence. Proceedings of the National Academy of Sciences, 106(26):10415, 2009.
[52] J. Daunizeau, H. Laufs, and K. J. Friston. EEG fMRI information fusion: Biophysics and data analysis. EEG - fMRI, pages 511–526, 2010.
[53] Christos Davatzikos, Ahmet Genc, Dongrong Xu, and Susan M. Resnick. Voxel-based morphometry using the RAVENS maps: methods and validation using simulated longitudinal atrophy. NeuroImage, 14(6):1361–1369, 2001.
[54] Geoff Davis, Stephane Mallat, and Marco Avellaneda. Adaptive greedy approximations. Constructive approximation, 13(1):57–98, 1997.
[55] M. H. Davis, M. A. Ford, F. Kherif, and I. S. Johnsrude. Does semantic context benefit speech understanding through top-down processes? Evidence from time-resolved sparse fMRI. Journal of cognitive neuroscience, 23 (12)(Early Access):1–3932, 2011.
[56] M. H. Davis and I. S. Johnsrude. Hierarchical processing in spoken language comprehension. The Journal of Neuroscience, 23(8):3423, 2003.
[57] Pierre Demartines and Jeanny Herault. Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. Neural Networks, IEEE Transactions on, 8(1):148–154, 1997.
[58] Oguz Demirci, Vincent P. Clark, and Vince D. Calhoun. A projection pursuit algorithm to classify individuals using fMRI data: Application to schizophrenia. NeuroImage, 39(4):1774–1782, 2008.
[59] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
[60] Virginia Devonshire, Eva Havrdova, Ernst Wilhelm Radue, Paul O'Connor, Lixin Zhang-Auberson, Catherine Agoropoulou, Dieter Adrian Haring, Gordon Francis, and Ludwig Kappos. Relapse and disability outcomes in patients with multiple sclerosis treated with fingolimod: subgroup analyses of the double-blind, randomised, placebo-controlled FREEDOMS study. The Lancet Neurology, 11(5):420–428, 2012.
[61] E. R. Dorsey, R. Constantinescu, J. P. Thompson, K. M. Biglan, R. G. Holloway, K. Kieburtz, F. J. Marshall, B. M. Ravina, G. Schifitto, A. Siderowf, and C. M. Tanner. Projected number of people with Parkinson disease in the most populous nations, 2005 through 2030. Neurology, 68(5):384–386, 2007.
[62] I. L. Dryden and K. V. Mardia. Statistical shape analysis, volume 4. John Wiley & Sons New York, 1998.
[63] N. Duta and M. Sonka. Segmentation and interpretation of MR brain images. An improved active shape model. Medical Imaging, IEEE Transactions on, 17(6):1049–1062, 1998.
[64] Harini Eavani, Roman Filipovych, Christos Davatzikos, Theodore D. Satterthwaite, Raquel E. Gur, and Ruben C. Gur. Sparse dictionary learning of resting state fMRI networks. In Pattern Recognition in NeuroImaging (PRNI), 2012 International Workshop on, pages 73–76. IEEE, 2012.
[65] Bradley Efron and Robert J. Tibshirani. An introduction to the bootstrap, volume 57. CRC press, 1994.
[66] Paul D. Ellis. The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press, 2010.
[67] Kjersti Engan, Sven Ole Aase, and John Hakon Husoy. Multi-frame compression: Theory and design. Signal Processing, 80(10):2121–2140, 2000.
[68] Erik B. Erhardt, Elena A. Allen, Yonghua Wei, Tom Eichele, and Vince D. Calhoun. SimTB, a simulation toolbox for fMRI data under a model of spatiotemporal separability. NeuroImage, 59(4):4160–4167, 2012.
[69] Yong Fan, Yong Liu, Hong Wu, Yihui Hao, Haihong Liu, Zhening Liu, and Tianzi Jiang. Discriminant analysis of functional connectivity patterns on Grassmann manifold. NeuroImage, 56(4):2058–2067, 2011.
[70] James Ford, Hany Farid, Fillia Makedon, Laura A. Flashman, Thomas W. McAllister, Vasilis Megalooikonomou, and Andrew J. Saykin. Patient classification of fMRI activation maps, pages 58–65. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2003. Springer, 2003.
[71] Elia Formisano, Dae-Shik Kim, Francesco Di Salle, Pierre-Francois van de Moortele, Kamil Ugurbil, and Rainer Goebel. Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron, 40(4):859–869, 2003.
[72] Richard SJ Frackowiak, Karl J. Friston, Christopher D. Frith, Raymond J. Dolan, Cathy J. Price, Semir Zeki, John T. Ashburner, and William D. Penny. Human brain function. Academic Press, 2004.
[73] K. J. Friston, A. P. Holmes, CJ Price, C. Buchel, and KJ Worsley. Multisubject fMRI studies and conjunction analyses. NeuroImage, 10(4):385–396, 1999.
[74] Karl Friston, John Ashburner, Christopher Frith, JB Poline, John Heather, and Richard SJ Frackowiak. Spatial registration and normalization of images. Human brain mapping, 3(3):165–189, 1995.
[75] KJ Friston, CD Frith, R. Turner, and RSJ Frackowiak. Characterizing evoked hemodynamics with fMRI. NeuroImage, 2(2PA):157–165, 1995.
[76] KJ Friston, AP Holmes, JB Poline, PJ Grasby, SCR Williams, RSJ Frackowiak, and R. Turner. Analysis of fMRI time-series revisited. NeuroImage, 2(1):45–53, 1995.
[77] T. Frodl, E. M. Meisenzahl, T. Zetzsche, C. Born, C. Groll, M. Jager, G. Leinsinger, R. Bottlender, K. Hahn, and H. J. Moller. Hippocampal changes in patients with a first episode of major depression. American Journal of Psychiatry, 159(7):1112–1118, 2002.
[78] Bilwaj Gaonkar, Kilian Pohl, and Christos Davatzikos. Pattern based morphometry, pages 459–466. Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 2011.
[79] P. Georgiev, F. Theis, A. Cichocki, and H. Bakardjian. Sparse component analysis: a new tool for data mining. Data Mining in Biomedicine, pages 91–116, 2007.
[80] J. N. Giedd, M. Stockman, C. Weddle, M. Liverpool, A. Alexander-Bloch, G. L. Wallace, N. R. Lee, F. Lalonde, and R. K. Lenroot. Anatomic magnetic resonance imaging of the developing child and adolescent brain and effects of genetic variation. Neuropsychology review, 20(4):349–361, 2010.
[81] C. D. Good, I. S. Johnsrude, J. Ashburner, R. N. A. Henson, KJ Friston, and R. S. J. Frackowiak. A voxel-based morphometric study of ageing in 465 normal adult human brains. NeuroImage, 14(1):21–36, 2001.
[82] K. Gorczowski, M. Styner, J. Y. Jeong, JS Marron, J. Piven, H. C. Hazlett, S. M. Pizer, and G. Gerig. Multi-object analysis of volume, pose, and shape using statistical discrimination. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(4):652–661, 2010.
[83] Allan D. Gordon. A review of hierarchical classification. Journal of the Royal Statistical Society, 150(2):119–137, 1987.
[84] Hakon Gudbjartsson and Samuel Patz. The Rician distribution of noisy MRI data. Magnetic Resonance in Medicine, 34(6):910–914, 1995.
[85] Y. O. Halchenko, S. J. Hanson, and B. A. Pearlmutter. Multimodal integration: fMRI, MRI, EEG, MEG. Advanced Image Processing in Magnetic Resonance Imaging, pages 223–265, 2005.
[86] Benjamin Hamner, Ricardo Chavarriaga, and Jose del R. Millan. Learning dictionaries of spatial and temporal EEG primitives for brain-computer interfaces. 2011.
[87] R. S. Hastings, R. V. Parsey, M. A. Oquendo, V. Arango, and J. J. Mann. Volumetric analysis of the prefrontal cortex, amygdala, and hippocampus in major depression. Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology, 29(5):952, 2004.
[88] James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425–2430, 2001.
[89] Barton F. Haynes, Peter B. Gilbert, M. Juliana McElrath, Susan Zolla-Pazner, Georgia D. Tomaras, S. Munir Alam, David T. Evans, David C. Montefiori, Chitraporn Karnasuta, and Ruengpueng Sutthent. Immune-correlates analysis of an HIV-1 vaccine efficacy trial. New England Journal of Medicine, 366(14):1275–1286, 2012.
[90] A. O. Hero, B. Ma, O. Michel, and J. Gorman. Alpha-divergence for classification, indexing and retrieval. Communication and Signal Processing Laboratory, Technical Report CSPL-328, U.Mich, 2001.
[91] Johan Himberg, Aapo Hyvarinen, and Fabrizio Esposito. Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage, 22(3):1214–1222, 2004.
[92] A. Hyvarinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4):411–430, 2000.
[93] D. B. Kandel and M. Davies. Adult sequelae of adolescent depressive symptoms. Archives of General Psychiatry, 43(3):255, 1986.
[94] Joan Kaufman and Amanda E. Schweder. The schedule for affective disorders and schizophrenia for school-age children: Present and lifetime version. Comprehensive Handbook of Psychological Assessment, Personality Assessment, 2:247, 2004.
[95] Kendrick N. Kay, Thomas Naselaris, Ryan J. Prenger, and Jack L. Gallant. Identifying natural images from human brain activity. Nature, 452(7185):352–355, 2008.
[96] R. C. Kessler and E. E. Walters. Epidemiology of DSM-III-R major depression and minor depression among adolescents and young adults in the national comorbidity survey. Depression and anxiety, 7(1):3–14, 1998.
[97] Ali R. Khan, Lei Wang, and Mirza Faisal Beg. Unified voxel and tensor-based morphometry (UVTBM) using registration confidence. Neurobiology of aging, 2014.
[98] Yong-Hwan Kim, Junghoe Kim, and Jong-Hwan Lee. Iterative approach of dual regression with a sparse prior enhances the performance of independent component analysis for group functional magnetic resonance imaging (fMRI) data. NeuroImage, 2012.
[99] Mayuresh S. Korgaonkar, Alex Fornito, Leanne M. Williams, and Stuart M. Grieve.
Abnormal structural networks characterize major depressive disorder: A connectome analysis. Biological psychiatry, 2014.
[100] Kenneth Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, et al. Dictionary learning algorithms for sparse representation. Neural computation, 15(2):349–396, 2003.
[101] M. C. Kroes, M. D. Rugg, M. G. Whalley, and C. R. Brewin. Structural brain abnormalities common to posttraumatic stress disorder and depression. Journal of psychiatry & neuroscience: JPN, 36(4):256–265, 2011.
[102] Amanda J. Law, Qi Pei, Mary Walker, Helen Gordon-Andrews, Cyndi Shannon Weickert, Joram Feldon, Christopher R. Pryce, and Paul J. Harrison. Early parental deprivation in the marmoset monkey produces long-term changes in hippocampal expression of genes involved in synaptic plasticity and implicated in mood disorder. Neuropsychopharmacology, 34(6):1381–1394, 2008.
[103] K. Lee, S. Tak, and J. C. Ye. A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. Medical Imaging, IEEE Transactions on, pages 1–1, 2011.
[104] Y. O. Li, T. Adali, and V. D. Calhoun. Estimating the number of independent components for functional magnetic resonance imaging data. Human brain mapping, 28(11):1251–1266, 2007.
[105] Y. O. Li, T. Adali, W. Wang, and V. D. Calhoun. Joint blind source separation by multiset canonical correlation analysis. Signal Processing, IEEE Transactions on, 57(10):3918–3929, 2009.
[106] Yuanqing Li, Jinyi Long, Lin He, Haidong Lu, Zhenghui Gu, and Pei Sun. A sparse representation-based algorithm for pattern localization in brain imaging data analysis. PloS one, 7(12):e50332, 2012.
[107] Yuanqing Li, Praneeth Namburi, Zhuliang Yu, Cuntai Guan, Jianfeng Feng, and Zhenghui Gu. Voxel selection in fMRI data analysis based on sparse representation. Biomedical Engineering, IEEE Transactions on, 56(10):2439–2451, 2009.
[108] J. Liu, G. Pearlson, A. Windemuth, G. Ruano, N. I. Perrone-Bizzozero, and V. Calhoun. Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA. Human brain mapping, 30(1):241–255, 2009.
[109] Z. Liu, L. Ding, and B. He. Integration of EEG/MEG with MRI and fMRI. Engineering in Medicine and Biology Magazine, IEEE, 25(4):46–53, 2006.
[110] V. Lorenzetti, N. B. Allen, A. Fornito, and M. Yucel. Structural brain abnormalities in major depressive disorder: a selective review of recent MRI studies. Journal of affective disorders, 117(1):1–17, 2009.
[111] C. Lu, S. M. Pizer, S. Joshi, and J. Y. Jeong. Statistical multi-object shape models. International Journal of Computer Vision, 75(3):387–404, 2007.
[112] L. Ma, B. Wang, X. Chen, and J. Xiong. Detecting functional connectivity in the resting brain: a comparison between ICA and CCA. Magnetic resonance imaging, 25(1):47–56, 2007.
[113] Qiongmin Ma, Ling-Li Zeng, Hui Shen, Li Liu, and Dewen Hu. Altered cerebellar-cerebral resting-state functional connectivity reliably identifies major depressive disorder. Brain research, 1495:86–94, 2013.
[114] S. Ma, X. L. Li, N. M. Correa, T. Adali, and V. D. Calhoun. Independent subspace analysis with prior information for fMRI data. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 1922–1925. IEEE, 2010.
[115] H. MacDonald. Behavioural and neuroimaging studies of the influence of semantic context on the perception of speech in noise, 2008.
[116] Heather MacDonald, Matthew H. Davis, Kathy Pichora-Fuller, and Ingrid S. Johnsrude. Contextual influences: Perception of sentences in noise is facilitated similarly in young and older listeners by meaningful semantic context; neural correlates explored via functional magnetic resonance imaging (fMRI). The Journal of the Acoustical Society of America, 123(5):3887–3887, 2008.
[117] F. P. MacMaster, Y. Mirza, P. R. Szeszko, L. E. Kmiecik, P. C. Easter, S. P. Taormina, M. Lynch, M. Rose, G. J. Moore, and D. R. Rosenberg. Amygdala and hippocampal volumes in familial early onset major depressive disorder. Biological psychiatry, 63(4):385–390, 2008.
[118] Frank MacMaster and Vivek Kusumakar. Hippocampal volume in early onset depression. BMC Medicine, 2, 2004.
[119] Frank P. MacMaster, Normand Carrey, Lisa Marie Langevin, Natalia Jaworska, and Susan Crawford. Disorder-specific volumetric brain difference in adolescent major depressive disorder and bipolar depression. Brain imaging and behavior, pages 1–9, 2013.
[120] G. M. MacQueen, S. Campbell, B. S. McEwen, K. Macdonald, S. Amano, R. T. Joffe, C. Nahmias, and L. T. Young. Course of illness, hippocampal function, and hippocampal volume in major depression. Proceedings of the National Academy of Sciences, 100(3):1387–1392, 2003.
[121] Federico De Martino, Giancarlo Valente, Noel Staeren, John Ashburner, Rainer Goebel, and Elia Formisano. Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns. NeuroImage, 43(1):44–58, 2008.
[122] Koji Matsuo, David R. Rosenberg, Philip C. Easter, Frank P. MacMaster, Hua-Hsuan Chen, Mark Nicoletti, Sheila C. Caetano, John P. Hatch, and Jair C. Soares. Striatal volume abnormalities in treatment-naive patients diagnosed with pediatric major depressive disorder. Journal of child and adolescent psychopharmacology, 18(2):121–131, 2008.
[123] M. J. McKeown, S. Makeig, G. G. Brown, T. P. Jung, S. S. Kindermann, A. J. Bell, and T. J. Sejnowski. Analysis of fMRI data by blind separation into independent spatial components. Human brain mapping, 6(3):160–188, 1998.
[124] Martin J. McKeown. Detection of consistently task-related activations in fMRI data with hybrid independent component analysis. NeuroImage, 11(1):24–35, 2000.
[125] Martin J. McKeown and Terrence J. Sejnowski. Independent component analysis of fMRI data: examining the assumptions. Human brain mapping, 6(5-6):368–372, 1998.
[126] Nicolai Meinshausen and Peter Buhlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
[127] M-Marsel Mesulam. From sensation to cognition. Brain, 121(6):1013–1052, 1998.
[128] A. M. Milne, G. M. MacQueen, and G. B. Hall. Abnormal hippocampal activation in patients with extensive history of major depression: an fMRI study. Journal of psychiatry & neuroscience: JPN, 37(1):28–36, 2012.
[129] Tom M. Mitchell, Rebecca Hutchinson, Radu S. Niculescu, Francisco Pereira, Xuerui Wang, Marcel Adam Just, and Sharlene D. Newman. Learning to decode cognitive states from brain images. Machine Learning, 57:145–175, 2004.
[130] Valeria Mondelli, Annamaria Cattaneo, Martino Murri, Marta Di Forti, Rowena Handley, Nilay Hepgul, Ana Miorelli, Serena Navari, Andrew Papadopoulos, Katherine J. Aitchison, Craig Morgan, Robin M. Murray, Paola Dazzan, and Carmine M. Pariante. Stress and inflammation reduce BDNF expression in first-episode psychosis: a pathway to smaller hippocampal volume. Journal of Clinical Psychiatry, 72(12):1677–1684, 2011.
[131] ES Monkul, JP Hatch, MA Nicoletti, S. Spence, P. Brambilla, ALT Lacerda, RB Sassi, AG Mallinger, MS Keshavan, and JC Soares. Fronto-limbic brain structures in suicidal and non-suicidal female patients with major depressive disorder. Molecular psychiatry, 12(4):360–366, 2006.
[132] Marieke Mur, Peter A. Bandettini, and Nikolaus Kriegeskorte. Revealing representational content with pattern-information fMRI–an introductory guide. Social cognitive and affective neuroscience, 4(1):101–109, 2009.
[133] Thomas Naselaris, Ryan J. Prenger, Kendrick N. Kay, Michael Oliver, and Jack L. Gallant. Bayesian reconstruction of natural images from human brain activity.
Neuron, 63(6):902–915, 2009.[134] A. Neumeister, S. Wood, O. Bonne, A. C. Nugent, D. A. Luckenbaugh,T. Young, E. E. Bain, D. S. Charney, and W. C. Drevets. Reducedhippocampal volume in unmedicated, remitted patients with majordepression versus control subjects. Biological psychiatry, 57(8):935–937, 2005.[135] Carla L. Nolan, Gregory J. Moore, Rachel Madden, Tiffany Farchione,Marla Bartoi, Elisa Lorch, Carol M. Stewart, and David R. Rosen-berg. Prefrontal cortical volume in childhood-onset major depression:preliminary findings. Archives of General Psychiatry, 59(2):173, 2002.[136] Ilia Nouretdinov, Sergi G. Costafreda, Alexander Gammerman, AlexeyChervonenkis, Vladimir Vovk, Vladimir Vapnik, and Cynthia HY Fu.Machine learning classification with confidence: application of trans-ductive conformal predictors to mri-based diagnostic and prognosticmarkers in depression. NeuroImage, 56(2):809–813, 2011.[137] Kayoko Okada, Feng Rong, Jon Venezia, William Matchin, I-HuiHsieh, Kourosh Saberi, John T. Serences, and Gregory Hickok. Hier-archical organization of human auditory cortex: evidence from acous-tic invariance in the response to intelligible speech. Cerebral Cortex,20(10):2486–2495, 2010.[138] J. Olesen, A. Gustavsson, M. Svensson, HU. Wittchen, and B. Jonsson.The economic cost of brain disorders in europe. European Journal ofNeurology, 19(1):155–162, 2012.[139] Bruno A. Olshausen and David J. Field. Sparse coding with an over-complete basis set: A strategy employed by v1? Vision research,37(23):3311–3325, 1997.125Bibliography[140] Dustin A. Pardini, Kirk Erickson, Rolf Loeber, and Adrian Raine.Lower amygdala volume in men is associated with childhood aggres-sion, early psychopathic traits, and future violence. Biological psychi-atry, 2013.[141] Y. C. Pati, R. Rezaiifar, and PS Krishnaprasad. Orthogonal match-ing pursuit: Recursive function approximation with applications towavelet decomposition. 
In Twenty-Seventh Asilomar Conference onSignals, Systems and Computers, pages 40–44 vol. 1. IEEE, 1993.[142] Jonathan E. Peelle, Ingrid S. Johnsrude, and Matthew H. Davis. Hier-archical processing for speech in human auditory cortex and beyond.Frontiers in human neuroscience, 4, 2010.[143] X. Pennec. Intrinsic statistics on riemannian manifolds: Basic toolsfor geometric measurements. Journal of Mathematical Imaging andVision, 25(1):127–154, 2006.[144] Francisco Pereira. Beyond brain blobs: machine learning classifiers asinstruments for analyzing functional magnetic resonance imaging data.ProQuest, 2007.[145] Francisco Pereira, Tom Mitchell, and Matthew Botvinick. Ma-chine learning classifiers and fmri: a tutorial overview. NeuroImage,45(1):S199–S209, 2009.[146] J. Platt. Probabilistic outputs for support vector machines and com-parisons to regularized likelihood methods. Advances in Large MarginClassifiers, pages 61–74, 1999.[147] J. A. Posener, L. Wang, J. L. Price, M. H. Gado, M. A. Province, M. I.Miller, C. M. Babb, and J. G. Csernansky. High-dimensional mappingof the hippocampus in depression. American Journal of Psychiatry,160(1):83–89, 2003.[148] M. Ramezani, P. Abolmaesumi, K. Marble, H. MacDonald, andI. Johnsrude. Fusion analysis of functional MRI data for classifica-tion of individuals based on patterns of activation. Brain imaging andBehavior, 2014.[149] Mahdi Ramezani, Purang Abolmaesumi, Kris Marble, Heather Mac-Donald, and Ingrid Johnsrude. Classification of individuals based onsparse representation of brain cognitive patterns: a functional mri126Bibliographystudy. In 34th Annual International Conference of the IEEE Engi-neering in Medicine and Biology Society (EMBC 2012), 2012.[150] Mahdi Ramezani, Purang Abolmaesumi, Kris Marble, Heather Mac-Donald, and Ingrid Johnsrude. Joint sparse representation of brainactivity patterns related to perceptual and cognitive components of aspeech comprehension task. 
In 2nd International Workshop on PatternRecognition in NeuroImaging (PRNI 2012), 2012.[151] Mahdi Ramezani, Purang Abolmaesumi, Amir Tahmasebi, RchaelBosma, Ryan Tong, Tom Hollenstein, Kate Harkness, and Ingrid John-srude. Fusion analysis of first episode depression: Where brain shapedeformations meet local composition of tissue. Neuroimage: Clinical,2014.[152] Mahdi Ramezani and Shahrokh Ghaemmaghami. Towards genetic fea-ture selection in image steganalysis. In Consumer Communications andNetworking Conference (CCNC), 2010 7th IEEE, pages 1–4. IEEE,2010.[153] Mahdi Ramezani, Ingrid Johnsrude, Abtin Rasoulian, Rchael Bosma,Ryan Tong, Tom Hollenstein, Kate Harkness, and Purang Abolmae-sumi. Temporal-lobe morphology differs between healthy adolescentsand those with early-onset of depression. Neuroimage: Clinical, 2014.[154] Mahdi Ramezani, Kris Marble, Heather MacDonald, Ingrid Johnsrude,and Purang Abolmaesumi. Joint sparse representation of brain activ-ity patterns in multi-task fmri data. IEEE Transactions on MedicalImaging, 2014.[155] Mahdi Ramezani, Saman Nouranian, Ingrid Johnsrude, and PurangAbolmaesumi. Reliability analysis and visualization of sparse repre-sentation methods for neuroimaging data. Submitted, 2014.[156] Mahdi Ramezani, Abtin Rasoulian, Purang Abolmaesumi, RchaelBosma, Ryan Tong, Tom Hollenstein, Ingrid Johnsrude, and KateHarkness. Multi-object statistical analysis of late adolescent depres-sion. In SPIE Medical Imaging 2010 - Image Processing, February9-14, 2013.[157] Mahdi Ramezani, Abtin Rasoulian, Purang Abolmaesumi, Tom Hol-lenstein, Kate Harkness, and Ingrid Johnsrude. Independent compo-nent analysis on lie groups for multi-object analysis of first episode127Bibliographydepression. International Conference on Acoustics, Speech, and SignalProcessing (ICASSP), 2013.[158] Mahdi Ramezani, Abtin Rasoulian, Purang Abolmaesumi, Tom Hol-lenstein, Ingrid Johnsrude, and Kate Harkness. Multi-object statisticalanalysis of late adolescent depression. 
In SPIE Medical Imaging - Im-age processing, 2013.[159] Mahdi Ramezani, Abtin Rasoulian, Tom Hollenstein, Kate Harkness,Ingrid Johnsrude, and Purang Abolmaesumi. Joint source based anal-ysis of multiple brain structures in studying major depressive disorder.In SPIE Medical Imaging, pages 90341P–90341P–6, 2014.[160] Michele L. Ries, Cynthia M. Carlsson, Howard A. Rowley, Mark A.Sager, Carey E. Gleason, Sanjay Asthana, and Sterling C. Johnson.Magnetic resonance imaging characterization of brain structure andfunction in mild cognitive impairment: a review. Journal of the Amer-ican Geriatrics Society, 56(5):920–934, 2008.[161] J. M. Rodd, M. H. Davis, and I. S. Johnsrude. The neural mechanismsof speech comprehension: fMRI studies of semantic ambiguity. CerebralCortex, 15(8):1261, 2005.[162] Isabella M. Rosso, Christina M. Cintron, Ronald J. Steingard, Perry F.Renshaw, Ashley D. Young, and Deborah A. Yurgelun-Todd. Amyg-dala and hippocampus volumes in pediatric major depression. Biolog-ical Psychiatry; Biological Psychiatry, 2005.[163] Kenneth J. Rothman. No adjustments are needed for multiple com-parisons. Epidemiology, 1(1):43–46, 1990.[164] Evangelos Roussos, Steven Roberts, and Ingrid Daubechies. Varia-tional Bayesian Learning of Sparse Representations and Its Applica-tion in Functional Neuroimaging, pages 218–225. Machine Learningand Interpretation in Neuroimaging. Springer, 2012.[165] B. D. Rusch, H. C. Abercrombie, T. R. Oakes, S. M. Schaefer, andR. J. Davidson. Hippocampal morphometry in depressed patients andcontrol subjects: relations to anxiety symptoms. Biological psychiatry,50(12):960–964, 2001.[166] Srikanth Ryali, Kaustubh Supekar, Daniel A. Abrams, and VinodMenon. Sparse logistic regression for whole-brain classification of fmridata. NeuroImage, 51(2):752–764, 2010.128Bibliography[167] R. M. Sapolsky. The possibility of neurotoxicity in the hippocampusin major depression: a primer on neuron death. Biological psychiatry,48(8):755–765, 2000.[168] F. 
Savopol and C. Armenakis. Merging of heterogeneous data foremergency mapping: data integration or data fusion? InternationalArchives of the Photogrammetry, Remote Sensing and Spatial Infor-mation Sciences, 34(4):668–674, 2002.[169] C. Saylam, H. Ucerler, O. Kitis, E. Ozand, and A. S. Gonul. Re-duced hippocampal volume in drug-free depressed patients. Surgicaland Radiologic Anatomy, 28(1):82–87, 2006.[170] M. R. Schroeder. Reference signal for signal quality studies. TheJournal of the Acoustical Society of America, 44(6):1735–1736, 1968.[171] Jessica Schrouff, Caroline Kusse, Louis Wehenkel, Pierre Maquet, andChristophe Phillips. Decoding semi-constrained brain activity fromfmri using support vector machines and gaussian processes. PLoS one,7(4):e35860, 2012.[172] N. Schuff, D. L. Amend, R. Knowlton, D. Norman, G. Fein, and M. W.Weiner. Age-related metabolite changes and volume loss in the hip-pocampus by magnetic resonance spectroscopy and imaging. Neurobi-ology of aging, 20(3):279–285, 1999.[173] Mujeeb U. Shad, Srirangam Muddasani, and Uma Rao. Gray matterdifferences between healthy and depressed adolescents: A voxel-basedmorphometry study. Journal of child and adolescent psychopharma-cology, 22(3):190–197, 2012.[174] PJ Shah, MF Glabus, GM Goodwin, and KP Ebmeier. Chronic,treatment-resistant depression and right fronto-striatal atrophy. TheBritish Journal of Psychiatry, 180(5):434–440, 2002.[175] Premal J. Shah, Klaus P. Ebmeier, Michael F. Glabus, and Guy M.Goodwin. Cortical grey matter reductions associated with treatment-resistant chronic unipolar depression. controlled magnetic resonanceimaging study. The British journal of psychiatry, 172(6):527–532, 1998.[176] D. W. Shattuck, M. Mirza, V. Adisetiyo, C. Hojatkashani, G. Sala-mon, K. L. Narr, R. A. Poldrack, R. M. Bilder, and A. W. Toga.Construction of a 3d probabilistic atlas of human cortical structures.NeuroImage, 39(3):1064–1080, 2008.129Bibliography[177] Y. I. Sheline, M. Sanghavi, M. A. Mintun, and M. 
H. Gado. Depressionduration but not age predicts hippocampal volume loss in medicallyhealthy women with recurrent major depression. The Journal of Neu-roscience, 19(12):5034–5043, 1999.[178] Younghak Shin, Seungchan Lee, Junho Lee, and Heung-No Lee. Sparserepresentation-based classification scheme for motor imagery-basedbrain computer interface systems. Journal of Neural Engineering,9(5):056002, 2012.[179] Robert R. Sokal and F. James Rohlf. The comparison of dendrogramsby objective methods. Taxon, 11(2):33–40, 1962.[180] K. Specht, R. Zahn, K. Willmes, S. Weis, C. Holtel, B. J. Krause,H. Herzog, and W. Huber. Joint independent component analysis ofstructural and functional images reveals complex patterns of functionalreorganisation in stroke aphasia. NeuroImage, 47(4):2057–2063, 2009.[181] Longfei Su, Lubin Wang, Fanglin Chen, Hui Shen, Baojuan Li, andDewen Hu. Sparse representation of brain aging: extracting covariancepatterns from structural mri. PloS one, 7(5):e36147, 2012.[182] J. Sui, T. Adali, G. Pearlson, H. Yang, S. R. Sponheim, T. White,and V. D. Calhoun. A CCA ICA based model for multi-task brainimaging data fusion and its application to schizophrenia. NeuroImage,51(1):123–134, 2010.[183] J. Sui, T. Adali, G. D. Pearlson, V. P. Clark, and V. D. Calhoun. Amethod for accurate group difference detection by constraining themixing coefficients in an ICA framework. Human brain mapping,30(9):2953–2970, 2009.[184] Jing Sui, Tulay Adali, Godfrey D. Pearlson, and Vince D. Calhoun.An ica-based method for the identification of optimal fmri featuresand components using combined group-discriminative techniques. Neu-roImage, 46(1):73–86, 2009.[185] Jing Sui, Hao He, Qingbao Yu, Jiayu Chen, Jack Rogers, Godfrey D.Pearlson, Andrew Mayer, Juan Bustillo, Jose Canive, and Vince D.Calhoun. Combination of resting state fmri, dti, and smri data todiscriminate schizophrenia by n-way mcca jica. Frontiers in humanneuroscience, 7, 2013.130Bibliography[186] A. M. Tahmasebi, M. H. 
Davis, C. J. Wild, J. M. Rodd, H. Hakyemez,P. Abolmaesumi, and I. S. Johnsrude. Is the link between anatomicalstructure and function equally strong at all cognitive levels of process-ing? Cerebral Cortex, 2011.[187] Amir M. Tahmasebi, Purang Abolmaesumi, Zane Z. Zheng, Kevin G.Munhall, and Ingrid S. Johnsrude. Reducing inter-subject anatomicalvariation: effect of normalization method on sensitivity of functionalmagnetic resonance imaging data analysis in auditory cortex and thesuperior temporal region. NeuroImage, 47(4):1522–1531, 2009.[188] M. H. Teicher, C. M. Anderson, and A. Polcari. Childhood mal-treatment is associated with reduced volume in the hippocampal sub-fields ca3, dentate gyrus, and subiculum. Proceedings of the NationalAcademy of Sciences of the United States of America, 109(9):E563–72,Feb 28 2012.[189] S. Theodoridis and K. Koutroumbas. Pattern recognition. AcademicPress, New York, 2003.[190] Warren S. Torgerson. Multidimensional scaling: I. theory and method.Psychometrika, 17(4):401–419, 1952.[191] I. Tosic and P. Frossard. Dictionary learning. Signal Processing Mag-azine, IEEE, 28(2):27–38, 2011. ID: 1.[192] Duygu Tosun, Howard Rosen, Bruce L. Miller, Michael W. Weiner,and Norbert Schuff. Mri patterns of atrophy and hypoperfusion asso-ciations across brain regions in frontotemporal dementia. NeuroImage,59(3):2098–2109, 2012.[193] A. Tsai, W. Wells, C. Tempany, E. Grimson, and A. Willsky. Coupledmulti-shape model and mutual information for medical image segmen-tation. In Information Processing in Medical Imaging, pages 185–197.Springer, 2003.[194] K. Vakili, S. S. Pillay, B. Lafer, M. Fava, P. F. Renshaw, C. M. Bonello-Cintron, and D. A. Yurgelun-Todd. Hippocampal volume in primaryunipolar major depression: a magnetic resonance imaging study. Bio-logical psychiatry, 47(12):1087–1090, 2000.[195] Gael Varoquaux, Merlin Keller, J-B Poline, Philippe Ciuciu, andBertrand Thirion. Ica-based sparse features recovery from fmri131Bibliographydatasets. 
In Biomedical Imaging: From Nano to Macro, 2010 IEEEInternational Symposium on, pages 1177–1180. IEEE, 2010.[196] N. Vasic, H. Walter, A. Hose, and R. C. Wolf. Gray matter reductionassociated with psychopathology and cognitive dysfunction in unipolardepression: a voxel-based morphometry study. Journal of affectivedisorders, 109(1):107–116, 2008.[197] Jarkko Venna and Samuel Kaski. Neighborhood preservation in non-linear projection methods: An experimental study, pages 485–491. Ar-tificial Neural Networks. Springer, 2001.[198] Justin L. Vincent, Abraham Z. Snyder, Michael D. Fox, Benjamin J.Shannon, Jessica R. Andrews, Marcus E. Raichle, and Randy L. Buck-ner. Coherent spontaneous activity identifies a hippocampal-parietalmemory network. Journal of neurophysiology, 96(6):3517–3531, 2006.[199] J. De Vry, J. Prickaerts, M. Jetten, M. Hulst, HWM Steinbusch,DLA Van den Hove, T. Schuurman, and FJ van der Staay. Recur-rent long-lasting tethering reduces bdnf protein levels in the dorsalhippocampus and frontal cortex in pigs. Hormones and behavior,62(1):10–17, 2012.[200] M. Vythilingam, E. Vermetten, G. M. Anderson, D. Luckenbaugh,E. R. Anderson, J. Snow, L. H. Staib, D. S. Charney, and J. D.Bremner. Hippocampal volume, memory, and cortisol status in ma-jor depressive disorder: effects of treatment. Biological psychiatry,56(2):101–112, 2004.[201] BA Wandell, SO Dumoulin, and AA Brewer. Visual areas in humans.[202] Nizhuan Wang, Weiming Zeng, and Lei Chen. Sacica: A sparse approx-imation coefficients based ica model for functional magnetic resonanceimaging data analysis. Journal of neuroscience methods, 2013.[203] L. L. Zeng, H. Shen, L. Liu, L. Wang, B. Li, P. Fang, Z. Zhou, Y. Li,and D. Hu. Identifying major depression using whole-brain functionalconnectivity: a multivariate pattern analysis. Brain : a journal ofneurology, 135(Pt 5):1498–1507, May 2012.[204] Z. Zhao, W. D. Taylor, M. Styner, D. C. Steffens, K. R. R. Krishnan,and J. R. MacFall. 
Hippocampus shape analysis and late-life depres-sion. PLoS One, 3(3):e1837, 2008.132
