UBC Theses and Dissertations

Prediction of graft-versus-host disease based on supervised temporal analysis on high-throughput flow… Lee, Shang-Jung 2007

PREDICTION OF GRAFT-VERSUS-HOST DISEASE BASED ON SUPERVISED TEMPORAL ANALYSIS ON HIGH-THROUGHPUT FLOW CYTOMETRY DATA

by Shang-Jung Lee

B.Sc., Queen's University, 2003

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Bioinformatics)

UNIVERSITY OF BRITISH COLUMBIA

APRIL 2007

© Shang-Jung Lee 2007

ABSTRACT

Despite recent advancements in human immunogenetics, graft-versus-host disease (GvHD) continues to be the major and potentially fatal complication of hematopoietic stem cell transplantation, affecting up to 80% of transplant patients [1]. Very little is known regarding the pathophysiologic mechanisms behind the manifestation of either acute or chronic GvHD. Diagnosis and treatment assessment are often hindered because they rely primarily on ambiguous clinical symptoms, such as tissue inflammation. It is likely that the outcome for patients diagnosed with GvHD could be improved if they were treated pre-emptively, before the development of full-scale clinical symptoms.

Using flow cytometry high content screening [2], 123 subsets of immune cells were identified from blood samples taken at multiple time points from 31 patients who underwent allogeneic bone marrow transplantation. I assembled a novel analysis pipeline specifically designed to process this high-throughput clinical flow cytometry dataset. The pipeline included a novel quality assurance test [3] and temporal classification via functional linear discriminant analysis [4]. Temporal patterns of multiple immune cell abundances, both after the transplantation and around the acute GvHD diagnosis, were screened for potential discriminative power for either acute or chronic GvHD. Among the many potentially discriminative patterns, higher proportions of immune cells with the CD3+CD4+CD8β+ phenotype were found in acute GvHD patients (21) than in patients unaffected by GvHD (3) between zero and 120 days post-transplant.
I also generated a list of recommendations for an extended study designed to validate the current findings. The global approach of the high-throughput flow cytometry technique and the novel temporal analysis pipeline, implemented according to the list of recommendations, would be beneficial in elucidating pathophysiologic mechanisms of complex immunologically based diseases including GvHD.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF SYMBOLS
LIST OF EQUATIONS
PREFACE
ACKNOWLEDGEMENT
DEDICATION
CHAPTER 1 INTRODUCTION
1.1 Flow Cytometry
1.2 Graft versus host disease
1.2.1 Acute graft versus host disease
1.2.2 Chronic graft versus host disease
1.3 Temporal analyses
1.3.1 Temporal analysis for flow cytometry data
1.3.2 Representing temporal data
1.3.3 Data pre-processing - Smoothing & Registration
1.3.4 Classification
1.4 Sample size calculations
1.5 Thesis goals
CHAPTER 2 PATIENTS AND METHODS
2.1 Overview
2.2 Study patients
2.3 Sample preparations and flow cytometry high content screening
2.4 Temporal analysis pipeline
2.4.1 Quality Assurance
2.4.2 B-spline parameters evaluation
2.4.3 Data transformation
2.4.4 Temporal classification
2.5 Static sample size calculation
2.5.1 Weight values in the functional linear discriminant analysis classification
CHAPTER 3 RESULTS - QUALITY ASSURANCE AND B-SPLINE PARAMETERS
3.1 Quality assurance on ungated data
3.2 Quality assurance on gated data
3.2.1 Singular outliers
3.2.2 Unusually large variations among aliquots
3.2.3 Repeated outlier conditions
3.2.4 Outlier distributions on the 96-well plate
3.3 B-spline parameters
CHAPTER 4 RESULTS - TOP RANKING CLASSIFIERS
4.1 Classifiers for the onset of acute graft versus host disease
4.1.1 Inconsistent classifier by missing values
4.1.2 CD3+CD4+CD8β+(CD8+)
4.1.3 CD3+CD4int
4.1.4 Static sample size analysis
4.2 Classifiers for the onset of chronic graft versus host disease
4.2.1 Inconsistent classifiers by pattern outlier
4.2.2 Opposite estimated signals between groups
4.2.3 Static sample size analysis
CHAPTER 5 DISCUSSION
5.1 Quality assurance
5.1.1 Quality assurance on ungated and gated data
5.1.2 Quality assurance via raw data time plots
5.1.3 Robustness of the flow cytometry high content screening technique
5.2 Data issues
5.2.1 Patients
5.2.2 Sampling time ranges
5.2.3 Proportion and concentration flow cytometry datasets
5.3 Temporal analysis
5.4 Predicting the onset of graft versus host disease
5.4.1 Acute graft versus host disease
5.4.2 Acute graft versus host disease prediction model using CD3+CD4+CD8β+
5.4.3 Chronic graft versus host disease
5.4.4 Chronic graft versus host disease prediction model using 45RO+CD3-CD4dim
5.5 Recommended improvements
5.5.1 Random plating
5.5.2 Patient recruitment
5.5.3 Sampling rate
5.5.4 Additional markers
5.5.5 Additional statistic tests
5.5.6 Graft versus host disease grades
5.5.7 External validation
5.5.8 Multiparametric approach
5.5.9 Long time series analysis
5.6 Conclusion
BIBLIOGRAPHY
APPENDICES
Appendix A. Patient information on maximum GvHD grade, GvHD diagnosis in days post-transplant and patient-donor relationship
Appendix B. List of the subsets of immune cells from each of the ten aliquots
Appendix C. PERL script fixFCS.pl for enforcing FCS file compatibility from Flowjo into rflowcyt
Appendix D. PERL script viz_days.pl for flow cytometry data transformation
Appendix E. PERL script FLDA_MATLAB.pl for creating MATLAB commands performing FLDA analysis
Appendix F. QA on gated data using CD3 as the common intensity
Appendix G. Other top ranking classifiers for the onset of aGvHD
Appendix H. Summaries of LOOCV results for the FLDA analyses between aGvHD and non-GvHD patients
Appendix I. Other top ranking classifiers for the onset of cGvHD
Appendix J. Summaries of LOOCV results for the FLDA analyses between aGvHD & cGvHD and aGvHD only patients
Appendix K. FLDA classification model for the onset of aGvHD
Appendix L. FLDA classification model for the onset of cGvHD

LIST OF TABLES

Table 2.1 Characteristics of the 31 patients recruited for the study
Table 2.2 Annotated functions and selected literature references on the 25 cell surface antigens used
Table 2.3 The combinations of antibody-fluorochromes used in each of the 10 aliquots available per sample
Table 3.1 Outliers identified in the QA test on gated data
Table 3.2 Cell populations and samples where the CD3+ or CD3- cell population exhibited unusual variations among the available aliquots
Table 3.3 Cell populations and samples where the two aliquots rest/act T helper and rest/act T suppressor exhibited a similar pattern to each other and a different pattern compared to all other available aliquots
Table 3.4 Plating order for patient #6 with samples taken at multiple time points on two plates. Aliquots identified as outliers and unusual variations are labelled with shaded areas
Table 4.1 Validation results for the top ranking subsets of immune cells and their related cell populations from the FLDA classification with different subsets of aGvHD vs. the non-GvHD patients using samples taken between 7 and 21 days post-transplant (nd = not done due to lack of data)
Table 4.2 Estimated power of study via the static sample size calculation using CD3+CD4+CD8β+ proportion values from samples taken closest to 21 days post-transplant
Table 4.3 Validation results for the top ranking subsets of immune cells from the FLDA classification between the aGvHD & cGvHD and GvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis
Table 4.4 Estimated power of study via the static sample size calculation using 45RO+CD3-CD4dim proportion values from samples taken closest to 7 days prior to aGvHD diagnosis
Table H.1 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken from 7 to 21 days post-transplant
Table H.2 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 21 and 0 days prior to aGvHD diagnosis
Table H.3 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 0 and 21 days from aGvHD diagnosis
Table H.4 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD and non-GvHD patients using samples taken from 7 to 21 days post-transplant
Table H.5 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 21 and 0 days prior to aGvHD diagnosis
Table H.6 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 0 and 21 days from aGvHD diagnosis
Table J.1 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken from 7 to 21 days post-transplant
Table J.2 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis
Table J.3 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 0 and 21 days from aGvHD diagnosis
Table J.4 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken from 7 to 21 days post-transplant
Table J.5 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis
Table J.6 Validation results for qualified subsets of immune cells in concentration (mm3) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 0 and 21 days from aGvHD diagnosis

LIST OF FIGURES

Figure 1.1 An example of sequential gating in FCM displayed in contour or histogram
Figure 1.2 Pathophysiologic mechanism of aGvHD (adapted from Couriel et al [17])
Figure 1.3 Pathophysiologic mechanism of cGvHD (adapted from Iwasaki et al [37])
Figure 1.4 An example of the FLDA signal plus noise training from the raw data (panel a) to the estimated signals (panel b), adapted from James and Hastie [4]
Figure 2.1 Temporal analysis pipeline designed for the high-throughput clinical FCM dataset
Figure 2.2 Static sample size calculation pipeline
Figure 3.1 Density plots of the FSC intensity of different aliquots of samples taken at 12 different time points (adopted from [3]). At day 46, the two red arrows show that the distributions corresponding to aliquots 'leukocyte' and '3Activation' are substantially different from other aliquots
Figure 3.2 Density plot of the FSC intensity using CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. Aliquot '3Activation' was identified as a visual outlier
Figure 3.3 Density plot of the SSC intensity using CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. Aliquot '3Activation' was identified as a visual outlier
Figure 3.4 Density plot of the FSC intensity using CD3- cell population from five aliquots of patient #4's 81 days post-transplant sample. Aliquot 'T cells' was identified as a visual outlier
Figure 3.5 ECDF plot of the FSC intensity using CD3- cell population from five aliquots of patient #4's 81 days post-transplant sample. Aliquot 'T cells' was identified as a visual outlier
Figure 3.6 Density plot of the FSC intensity using CD3- cell population from seven aliquots of patient #28's 14 days post-transplant sample. All aliquots exhibited great variations from each other. Similar observations also occur in 15 other samples
Figure 3.7 FCM contour graphs of FSC vs. SSC from patient #6, aliquots 'TCR' and '3Activation' from samples taken at 27 and 53 days post-transplant
Figure 3.8 Density plot of the SSC intensity using CD3- cell population from seven aliquots of patient #7's sample taken at the day of BMT. Aliquots 'rest/act T helper' and 'rest/act T suppressor' exhibited a different pattern than all other aliquots
Figure 3.9 B-splines with knots located at every available time point and orders two, three or four fitting into the raw data
Figure 3.10 B-spline with order two and different distribution of knots fitting into the raw data
Figure 4.1 Cumulative distribution of the aGvHD diagnosis days post-transplant with the selected time range between 7 and 21 days post-transplant labelled
Figure 4.2 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 7 and 21 days post-transplant for the immune cells CD2dimCD16+CD56+CD3- in proportion to PBMC
Figure 4.3 Raw data time plot for immune cells CD2dimCD16+CD56+CD3- in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA
Figure 4.4 FLDA estimated signals time plot based on samples taken between 7 and 21 days post-transplant for immune cells CD3+CD4+CD8β+ in proportion to PBMC
Figure 4.5 FCM contour graphs of transformed CD4 and CD8β marker measurements for a non-GvHD patient (#4) and an aGvHD patient (#27) between zero and three weeks post-transplant. The CD3+CD4+CD8β+ population is gated within the double positive gate
Figure 4.6 Raw data time plot for immune cells CD3+CD4+CD8β+ in proportion to PBMC, based on samples taken between 0 and 120 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA
Figure 4.7 An example of sequential gating of the existing cell population CD3+CD4+CD8β+ (red gates, panels a, b, and c) to identify a new immune cell population CD3+CD4+CD8β+CD8+ (panel d)
Figure 4.8 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 7 and 21 days post-transplant for the new immune cell population CD3+CD4+CD8β+CD8+ in proportion to PBMC
Figure 4.9 Time plot of the FLDA estimated signals (panel a) based on samples taken between 7 and 21 days post-transplant and time plot of the raw data (panel b) based on samples taken between 0 and 100 days post-transplant for the immune cells CD3+CD4int in proportion to PBMC (aliquot ^Activation'). The purple striped box indicates the time range where data was analyzed via FLDA
Figure 4.10 FCM data in scatter plot of FSC vs. SSC and histogram of CD3-PerCP intensity from patient #6, aliquot 'T cells' from samples taken at 45, 53, and 60 days post-transplant
Figure 4.11 Raw data time plot for immune cells CD3+ (aliquot '1Activation') in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA
Figure 4.12 Raw data time plot for immune cells CD3+CD4+ (aliquot 'rest/act T helper') in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA
Figure 4.13 Time plot of the FLDA estimated signals (panel a) and raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells 45RA+CD3+ in proportion to PBMC (%)
Figure 4.14 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD45+CD33CD15+CD14- in proportion to PBMC. The aGvHD diagnosis day is labelled at day 0
Figure 4.15 Time plot of the FLDA estimated signals (panel a) and raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells 45RO+CD3-CD4dim in proportion to PBMC (%)
Figure 5.1 A pictorial example of FSC vs. SSC dot plot from a normal peripheral blood sample (adapted from [122])
Figure 5.2 T cells development and maturation
Figure 5.3 An example of FLDA classification using immune cells CD3+CD4+CD8β+ in proportion to PBMC
Figure 5.4 An example of FLDA classification using immune cells 45RO+CD3-CD4dim in proportion to PBMC
Figure 5.5 Parallel coordinates plot of the normalized linear discriminant values from the 11 FLDA classifiers selected via the correlation-based feature selection method
Figure F.1 Density plot of the CD3-PerCP intensity using CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. There is no visible outlier
Figure F.2 Density plot of the CD3-PerCP intensity using CD3+ cell population from seven aliquots of patient #6's -6 days post-transplant sample shown as an example of gate quality control
Figure G.1 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD44-CD25- in proportion to PBMC. The aGvHD diagnosis day is labelled at day 0
Figure G.2 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3- (aliquot '1Activation') in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0
Figure G.3 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD2dimCD16+CD56CD3- in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0
Figure G.4 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD4int (aliquot '3Activation') in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0
Figure G.5 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the new subset of immune cells CD3+CD4+CD8β+CD8+ in proportion to CD3+ cell population. The aGvHD diagnosis day is labelled at day 0
Figure G.6 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells CD45+CD33- in concentration (mm3)
Figure I.1 Time plot of the FLDA estimated signals (panel a) based on samples taken between 7 and 21 days post-transplant and time plot of the raw data (panel b) based on samples taken between 0 and 100 days post-transplant for the immune cells 45RA+CD3+CD8low in proportion to PBMC (%). The purple striped box indicates the time range where data was analyzed via FLDA
Figure I.2 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD diagnosis and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells 45RA+CD3-CD4dim in concentration (mm3). The date of aGvHD diagnosis is labelled as day 0
Figure I.3 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD diagnosis and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD4int (aliquot '2Activation') in proportion to PBMC (%). The date of aGvHD diagnosis is labelled as day 0
LIST OF ABBREVIATIONS

aGvHD    Acute graft-versus-host disease
ALL    Acute lymphoblastic leukemia
AML    Acute myeloid leukemia
APC    Allophycocyanin
BMT    Bone marrow transplantation
br    Bright (in FCM gating)
CD    Cluster of differentiation
CE-MS    Capillary electrophoresis coupled mass spectrometry
cGvHD    Chronic graft-versus-host disease
CLL    Chronic lymphocytic leukemia
CML    Chronic myeloid leukemia
DP    Double positive
ECDF    Empirical cumulative distribution function
EM    Expectation maximization
FC-HCS    Flow cytometric high content screening
FCM    Flow cytometry
FCS    Flow cytometry standard
FITC    Fluorescein isothiocyanate
FSC    Forward scatter
GvHD    Graft-versus-host disease (refers to both acute and chronic GvHD)
HIV    Human immunodeficiency virus
HLA    Human leukocyte antigen
HSCT    Hematopoietic stem cell transplantation
int    Intermediate (in FCM gating)
LOOCV    Leave-one-out cross-validation
MDS    Myelodysplasia
MHC    Major histocompatibility complex
MNC    Mononuclear cell
MPD    Myeloproliferative disorder
MUD    Matched unrelated donor
NHL    Non-Hodgkin's lymphoma
NK    Natural killer (cells)
PE    Phycoerythrin
PerCP    Peridinin chlorophyll protein
QA    Quality assurance
rest/act    Resting or activated states (of T cells)
SELDI-TOF    Surface-enhanced laser desorption ionization time-of-flight
SIB    Sibling donor
SSC    Side scatter
SVMs    Support vector machines
TCR    T cell receptors

LIST OF SYMBOLS

Y_ij    A set of observed values from patient j and class i (j = 1...J; i = 1...I)
S_ij    B-spline matrix
S_x    B-spline matrix for test data x
λ_0    Global base value
Λα_i    Class signal
γ_ij    Individual signal variation
ε_ij    Random experiment error
α_x    Linear discriminant value
weight    Weights of the difference between the test data and the global base value, (Λ^T S_x^T Σ_x^-1 S_x Λ)^-1 Λ^T S_x^T Σ_x^-1
α    Significance level or tolerance of a type I error

LIST OF EQUATIONS

Equation 1.1 Signal plus noise model
Equation 1.2 Static linear discriminant classification
Equation 1.3 FLDA weight values at specified time points
Equation 1.4 Functional linear discriminant value
Equation 5.1 The aGvHD prediction formula for patient data sampled at 7, 14, and 21 days post-transplant
Equation 5.2 The cGvHD prediction formula for patient data sampled at 21, 15, 7 and 0 days prior to aGvHD diagnosis
Equation 5.3 Normalization function for the linear discriminant values

PREFACE

Graduate study in the CIHR/MSFHR Strategic Training Program in Bioinformatics at the University of British Columbia begins with three four-month rotation projects and concludes with a master's thesis study. Please note that only the master's project is included in this thesis in order to keep a connected thesis framework. My three rotation projects with Dr. Peter M. Lansdorp & Dr. Ryan Brinkman, Dr. Artem Cherkasov, and Dr. Robert Hancock are not directly related to my master's study and are therefore absent from this thesis. Nonetheless, I gained a significant part of my knowledge in genome analysis, drug target identification, microarray analysis, etc. through the rotation projects.

ACKNOWLEDGEMENT

I would like to thank everyone who helped or advised me with this project. In particular, I would like to extend my whole-hearted appreciation to:

My supervisor Dr. Ryan Brinkman, who gave me the opportunity to study this innovative project and the freedom to explore many different analysis methods. In particular, I would like to thank Ryan for his continuous support through my illnesses.

Dr. Clay Smith and Dr. Maura Gasparetto for imparting invaluable knowledge of graft-versus-host disease and flow cytometry. Dr. Marco Marra for his advice throughout the project. Dr. Colleen Nelson for her support and insight into the statistical validation of this study. Also, Ben Smith for his help on the static sample size calculation.

I would also like to thank Dr. Robert Gentleman and Dr. Nolwenn Le Meur for their work on the flow cytometry quality assurance test; Simon Dablemont for his MATLAB scripts for the functional linear discriminant analysis; and James Wagner for his help on SVMs.

My program committee members: Dr. Marco Marra (senior supervisor), Dr. David Baillie, and Dr. Fiona Brinkman for their guidance, especially at the beginning of my graduate studies. Also, my rotation project supervisors: Dr. Peter M. Lansdorp, Dr. Ryan Brinkman, Dr. Artem Cherkasov, and Dr. Robert Hancock.

Administrative staff including Ms. Sharon Ruschkowski from the Bioinformatics program and Ms. Monica Deutsch from the UBC genetics program for their assistance.

Fellow students in the UBC genetics program, the Bioinformatics training program, and the BC Cancer Research Centre. Special thanks to Debra Fulton, Evette Haddad, and Alison Meynert for their friendship and brainstorming sessions.

This study was funded by the CIHR/MSFHR Strategic Training Program in Bioinformatics and the British Columbia Transplant Foundation.

DEDICATION

This work is dedicated to my parents Shiaou-Cheng Lee and Mu-Tzu Tsou, for their support. Great opportunities like this master's project were only made possible because of their insight and hard work in bringing me to Canada.

CHAPTER 1 INTRODUCTION

Hundreds of bone marrow transplantations (BMT) are performed in Canada each year.
Despite numerous technical advances, graft versus host disease (GvHD) continues to be a major complication of hematopoietic stem cell transplantations (HSCT) [1, 5], with a fatality rate of up to 90% for severe GvHD [5-7]. Presently, there is no test to diagnose the disease definitively, nor a standardized assessment for monitoring response to treatment. Therefore, it is imperative to develop more reliable and precise tests for predicting and diagnosing GvHD. In the present study, large-scale immune cell population data obtained from a high-throughput flow cytometry (FCM) technique (section 1.1) were screened for their potential GvHD (section 1.2) predictive power by a novel temporal analysis pipeline (section 1.3). Finally, principles of sample size calculation are described in section 1.4.

1.1 Flow Cytometry

The first flow cytometer, an integration of the flow system and the static microscope, was developed by Wallace Coulter in 1954 to count red blood cells. Today, flow cytometers can separate and count almost any type of biological or non-biological particle by combining its light scattering properties, which provide an indication of particle size and shape, with the presence of specific fluorescence markers or fluorochromes. In FCM, cells are typically labelled with antibody-conjugated fluorochromes that are used to detect the presence of cell surface proteins. The labelled cells are then suspended in sheath fluid and flow through a narrow tube, one cell at a time, past the excitation light source, usually a laser. A detector measures the light emitted from the sample, and the intensity of the light can then be used as an indication of, for example, the presence or absence of a fluorochrome. In the late 1970s and early 1980s, clinical applications of FCM rapidly developed in response to the emergence of the human immunodeficiency virus (HIV) [8].
Since then, advances in antibodies, fluorochromes, and resonance fluorescence techniques have allowed researchers to count and sort precisely defined populations of particles via sequential gating based on their physical or chemical characteristics. Gating is a procedure for FCM data in which cells with common measurement intensities are grouped together. This is performed by either identifying a particular group of cells or separating the entire cell population based on a one- or two-parameter display. In sequential gating, multiple markers can be utilized to identify a particular subset of particles. An example of FCM sequential gating is shown in Figure 1.1. First, forward and side scatter (FSC and SSC) contour graphs (Figure 1.1a) were used to distinguish live cells (34%) from dead cells by their characteristic size and granularity. The population of live cells can be further divided using different cluster of differentiation (CD) markers. CDs generally represent cell-surface antigens. Different immune cell lineages and functions can be identified using different combinations of the CD markers. In this case, the live cells can be further divided using CD3-fluorochrome intensity (Figure 1.1b) and then CD44 and CD25 (Figure 1.1c & d). In the end, 68.8% and 31.2% of the live cells are with (CD3+) and without (CD3-) the CD3 surface marker, respectively. These two populations can be further divided into subpopulations of CD25+CD44+, CD25+CD44-, etc.

Figure 1.1 An example of sequential gating in FCM displayed in contour or histogram

Multiparametric FCM data analysis is an essential technique in immunophenotyping. Multiple antibodies and fluorochromes can be used to identify specific immune cell lineages.
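The sequential gating procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the gating software used in the study: the event matrix, the column layout, the helper names `gate` and `fraction`, and every threshold are hypothetical values chosen so that the arithmetic is easy to follow. Each gate is a boolean mask that is always nested inside its parent gate, which is what makes the gating sequential.

```python
import numpy as np

# Column indices of a toy event matrix (one row per cell). The layout and all
# intensity values below are made up for illustration.
FSC, SSC, CD3, CD25, CD44 = range(5)

events = np.array([
    [500.0, 200.0, 900.0, 800.0, 700.0],  # live, CD3+, CD25+CD44+
    [520.0, 210.0, 850.0, 100.0, 750.0],  # live, CD3+, CD25-CD44+
    [480.0, 190.0,  50.0, 900.0,  60.0],  # live, CD3-
    [510.0, 220.0,  40.0,  80.0,  90.0],  # live, CD3-
    [100.0, 900.0, 700.0, 700.0, 700.0],  # low FSC, high SSC: excluded as dead
    [ 90.0, 950.0,  30.0,  20.0,  10.0],  # low FSC, high SSC: excluded as dead
])


def gate(parent, column, low=-np.inf, high=np.inf):
    """Keep events of the parent gate whose intensity lies in [low, high)."""
    values = events[:, column]
    return parent & (values >= low) & (values < high)


def fraction(child, parent):
    """Share of the parent population that falls inside the child gate."""
    return child.sum() / parent.sum()


all_events = np.ones(len(events), dtype=bool)

# Step 1: a scatter gate (hypothetical thresholds) separates live from dead cells.
live = gate(gate(all_events, FSC, low=300.0), SSC, high=500.0)

# Step 2: split the live cells on CD3 intensity.
cd3_pos = gate(live, CD3, low=500.0)
cd3_neg = live & ~cd3_pos

# Step 3: subdivide the CD3+ cells on CD25 and CD44.
cd25_pos_cd44_pos = gate(gate(cd3_pos, CD25, low=500.0), CD44, low=500.0)

print(f"live cells: {fraction(live, all_events):.1%} of all events")
print(f"CD3+: {fraction(cd3_pos, live):.1%} of live cells")
print(f"CD3-: {fraction(cd3_neg, live):.1%} of live cells")
print(f"CD25+CD44+: {fraction(cd25_pos_cd44_pos, cd3_pos):.1%} of CD3+ cells")
```

Percentages such as the 68.8% and 31.2% quoted for Figure 1.1 are reported relative to the parent gate (here, live cells), which is why `fraction` takes the parent mask explicitly rather than dividing by the total number of events.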
Major clinical uses of FCM include the diagnosis and monitoring of leukemia and lymphoma [9, 10], the evaluation of peripheral blood hematopoietic stem cell grafts [11], and the quantitation of CD4+ versus CD8+ T cells in blood to monitor HIV infection and to assess treatment performance [8].

FCM high content screening (FC-HCS) [2], a high-throughput FCM method, was developed by automating the staining and sample analyses using robotic devices. The technique is robust and can process up to a thousand samples per day. Using this technique, large FCM datasets with complexities similar to those of genomic techniques such as microarrays can be obtained relatively simply. The FC-HCS technique has several advantages over conventional manual flow cytometric assays. First, only a few thousand cells are required for analysis. Consequently, replication and various experimental designs can be achieved from each sample collection. Second, as this technique is almost entirely automated, mistakes in handling and staining large numbers of cell samples are minimized. These advantages dramatically enhance both the efficiency and the reproducibility of high-throughput flow cytometric assays.

1.2 Graft versus host disease

GvHD occurs following allogeneic HSCT when immune cells in the graft attack the recipient's tissues. Very little is known about this potentially fatal disease [5], and for many who survive, the result is a significant decrease in quality of life [6, 7, 12, 13]. GvHD is the major limitation for broader application of HSCT, which is the only curative treatment for many hematopoietic disorders [1]. GvHD occurs in two distinct forms, acute (aGvHD) and chronic GvHD (cGvHD). Here the term GvHD refers to both forms.
GvHD requires the following three conditions to occur [14]: (1) the graft contains enough immunologically competent cells; (2) antigens present in the recipient are different from those present in the donor; and (3) the recipient is incapable of mounting an effective immune response to destroy the graft.

1.2.1 Acute graft versus host disease

Manifestations of aGvHD can be described in three phases [1, 15-17], summarized in Figure 1.2. In phase one, preparative treatments such as chemotherapy or radiotherapy damage host tissues that subsequently secrete inflammatory cytokines. During phase two, donor T cells are activated when they recognize foreign recipient antigens presented by host antigen-presenting cells. The donor T cells proliferate and differentiate into effector cells. Finally, in phase three, the differentiation of Th1 inflammatory T cells leads to the activation of cytotoxic T cells, which in turn release a variety of inflammatory cytokines. This cytokine dysregulation results in skin, liver and gastrointestinal tract tissue damage. aGvHD typically occurs within the first 100 days following the HSCT, usually between 14 and 42 days post-transplant [15]. The diagnosis and the subsequent grading of aGvHD usually involve skin and histopathologic examinations. However, a wide range of unrelated illnesses such as basal cell necrosis, viral infection, and epidermolysis often exhibit similar symptoms and complicate the early diagnosis of aGvHD [18]. When an aGvHD diagnosis is made, it can be graded into four different levels based on the extent of tissue damage [16, 17]. The most important risk factor for developing aGvHD after an HSCT procedure is the degree of histoincompatibility in the human leukocyte antigen (HLA) between patient and donor [1]. Other aGvHD risk factors include increased age of donor and mismatched gender [1, 19].
Figure 1.2 Pathophysiologic mechanism of aGvHD (adapted from Couriel et al. [17])

Many immune cell populations have been identified as aGvHD mediators, particularly through animal models and ex vivo graft treatment studies. They include the major (MHC) and minor histocompatibility complexes, dendritic cells, T cells, natural killer (NK) cells, macrophages, and cytokines [1]. The most prominent mediator is donor T cells [20]. T cell depleted BMT has been shown to reduce the occurrence of aGvHD significantly. However, T cell depletion is rarely applied due to its severe side effects, including increased rate of graft failure, prolonged immunosuppressive state resulting in increased likelihood of fatal infections, and higher relapse rate [21-26]. Previous attempts to build a predictive model using CD3+ T cells usually comprised small numbers of patients and exhibited conflicting results. Even though T cell depletion studies have demonstrated the importance of T cells in aGvHD development, many studies could not establish a significant correlation between the CD3+, CD3+CD4+ or CD3+CD8+ T cell patterns (in either proportion or absolute number) and the onset of aGvHD [27, 28]. However, one study comparing nine moderate or severe aGvHD and 15 non-GvHD patients demonstrated a significant correlation between the changes of three T cell subtypes (CD4+CD25+, CD4+CD69+, and CD4+CD134+) and the development of aGvHD [29]. Another study in humans demonstrated a significant correlation between the rapid increase (>50%) of donor T cell chimerism and the development of moderate or severe aGvHD [30]. NK cells are also one of the known aGvHD mediators [1]. However, the exact NK cell populations and their functions are not well defined. Some studies suggest NK cells contribute to tissue damage during aGvHD via secreting pro-inflammatory cytokines [31, 32] while others suggest that NK cells suppress GvHD effects [33, 34].
1.2.2 Chronic graft versus host disease

cGvHD affects 30-80% of patients surviving six months or longer after their HSCT procedure [35] and is the leading cause of non-relapse deaths. The pathophysiologic mechanism of cGvHD remains poorly defined despite numerous studies. Researchers have suggested the participation of both autoreactive and alloreactive T cells in the manifestation of cGvHD because the symptoms resemble autoimmune diseases. The development of cGvHD (Figure 1.3) might be the result of autoreactive T cells escaping negative selection in the thymus damaged by the preparative treatments or aGvHD [36]. The resulting Th2 CD4+ helper T cells facilitate synthesis of autoantibodies by host B cells [37].

Figure 1.3 Pathophysiologic mechanism of cGvHD (adapted from Iwasaki et al. [37]). The flowchart runs from preparative treatments through thymic injury and loss of negative selection of autoreactive T cells to autoreactive antibodies.

cGvHD usually occurs approximately four months after transplantation [38]. Similar to the diagnosis of aGvHD, cGvHD diagnostic methods are based on ambiguous clinical symptoms that involve skin and multiple internal organs. cGvHD is usually differentially diagnosed apart from aGvHD and bacterial infections by at least one unique cGvHD symptom rather than the timing of the onset [37, 39]. cGvHD is graded into either limited or extensive disease based on the extent of skin tissue and internal organ damage. An alternative classification system is based on the cGvHD diagnosis time relative to the aGvHD status. Progressive cGvHD evolves directly from aGvHD and is associated with the most severe prognosis. Quiescent-type cGvHD, with an intermediate prognosis, occurs after an aGvHD-free period. Finally, de novo cGvHD occurs without a prior history of aGvHD and has a better prognosis [37, 39]. The greatest risk factor associated with cGvHD is the prior incidence of aGvHD.
The risk of developing cGvHD is more than ten times higher in patients with prior aGvHD [35]. Other factors include those common to aGvHD, such as the age of the patient and the degree of transplant histoincompatibility [39]. The known mediators of cGvHD include interleukin-18, T cells, and B cells [37]. Researchers have speculated that T cells are also the main mediator and effector cell type for the development of cGvHD. However, a recent randomized-trial study of T cell depletion contradicted previous findings [25, 26, 35, 40] and concluded that T cell depletion did not significantly reduce the incidence or the severity of cGvHD [21]. Attempts to build a predictive model using T cells or T cell subsets have resulted in conflicting or incomparable results. One study [41] demonstrated an insignificant correlation between the changes in CD4+ and CD8+ T cells and the onset of cGvHD. Another study utilizing both FCM and intracellular staining demonstrated a potential correlation between IL-4 producing CD8+ T cells and cGvHD development. Other similar studies have focused on CD34+ cells and suggested the importance of graft composition. However, they did not observe any significant correlation between any cell subset and the onset of cGvHD [27, 42-45]. A pilot study of a limited number of patients (six cGvHD and nine controls) focused on regulatory T cells with a CD25high phenotype and observed a significant increase of CD4+CD25high T cells associated with the onset of cGvHD.

1.3 Temporal analyses

In comparison to the conventional static or multivariate analyses, temporal analysis is the most efficient analysis approach for the study of biological phenomena occurring over time [46]. In static analyses, values from a single fixed time point or the relationship between two fixed time points are examined. In multivariate analyses, values from multiple time points are examined as independent variables.
Only in temporal analyses are values from multiple time points examined as a single entity, thus conserving the continuity and dynamics of time. Other main advantages of temporal analyses are that they are generally more tolerant of missing values and non-uniform sampling rates, the two most prominent challenges in a clinical dataset. On the other hand, the major challenge in designing a time-course experiment is the sampling rate. If the experiment is under-sampled, temporal aggregation may occur [47]. Oversampling is not favourable because of the cost. There is no standard sampling rate as it is specific to the biological phenomenon under investigation and the instrumental error rate [47]. Other experimental and computational challenges in a temporal analysis were previously reviewed by Ramsay and Silverman [48].

1.3.1 Temporal analysis for flow cytometry data

The popularity of time-course studies has already prompted the development of temporal versions of many conventional statistical analysis methods. Examples of these include algorithms for analysis of variance [49, 50], functional principal component analysis [48, 51], clustering [52-59], and classification [4, 60-63]. Most temporal analysis algorithms were designed for or tested on microarray data. In some cases, the algorithms are not applicable to FCM data. The most noteworthy difference in the analyses of a microarray dataset versus an FCM dataset is the underlying assumption. Many microarray analyses are based on the assumptions that gene expression values follow a normal distribution and most do not change. These assumptions fit well with the whole genome approach of microarrays. The same assumptions have no standing in FCM data, where only known cell populations are measured from a limited and biased selection of antibody-fluorochromes and manual sequential gating. Furthermore, results from sequential gating overlap in their targeted immune cell subpopulations.
Thus, FCM data are potentially dependent and correlated. To the best of my knowledge, no study has been done on the distribution of individual or overall immune cell population changes. As a result, availability of temporal algorithms suitable for FCM data analyses is further limited. Below in sections 1.3.2 to 1.3.4, I have summarized the common temporal analysis procedures (time-series data representation, pre-processing, and classification) employed in the pipeline I developed in response to the shortcomings of existing analysis methodologies.

1.3.2 Representing temporal data

The first step in a temporal analysis is to transform the time-series data, consisting of a set of discrete values at multiple time points, into one or more functions. The purpose of this transformation step is to represent the data as coefficients in a formula. The most common way of representing non-periodic time-series data is the B-spline [48, 64]. A B-spline is a linear combination of basis functions. Two parameters determine the shape of a B-spline: the basis function order n (polynomial degree n-1) and the knot placement. Generally, these parameters are selected to ensure adaptability of a B-spline to the original data pattern. If n is two, a B-spline is built on combinations of linear basis functions between each knot. The spline dictates smoothness between the two basis functions on each side of a knot. The order of the basis function is also determined by the degree of the derivative function to be analyzed. For example, a cubic B-spline (n=4) will ensure smoothness and the availability of up to the second derivative for further analyses. In a B-spline, knots designate the beginning of a new basis function where a change in the pattern is possible. Consequently, knots are placed around regions where complex variation is expected. By specifying the location and the number of knots, one can accommodate regions with complex variation, ensure tolerance to non-uniform sampling time, and induce smoothing.
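The role of the basis order and the knots can be made concrete with a minimal sketch of the standard Cox-de Boor recursion for B-spline basis functions. The function names, the knot vectors, and the coefficient values are hypothetical and serve only as an illustration; this is not the implementation used in the thesis.

```python
# Minimal sketch: B-spline basis functions via the Cox-de Boor recursion.
# "order" follows the convention in the text: order n means degree n - 1.
# Knots and coefficients below are hypothetical illustrations.

def bspline_basis(i, order, knots, t):
    """Value at t of the i-th B-spline basis function of the given order."""
    if order == 1:
        # Piecewise constant on the half-open interval [knots[i], knots[i+1]).
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    denom_left = knots[i + order - 1] - knots[i]
    if denom_left > 0:  # skip terms over repeated knots (0/0 taken as 0)
        left = (t - knots[i]) / denom_left * bspline_basis(i, order - 1, knots, t)
    denom_right = knots[i + order] - knots[i + 1]
    if denom_right > 0:
        right = (knots[i + order] - t) / denom_right * bspline_basis(i + 1, order - 1, knots, t)
    return left + right

def evaluate_spline(coefs, order, knots, t):
    """Linear combination of basis functions; valid for knots[0] <= t < knots[-1]."""
    return sum(c * bspline_basis(i, order, knots, t) for i, c in enumerate(coefs))

# Linear B-spline (order 2) with one knot per sampled day; boundary knots repeated.
knots_linear = [0, 0, 7, 14, 21, 21]
coefs = [30.0, 10.0, 47.0, 25.0]  # hypothetical abundances at days 0, 7, 14, 21

# Cubic B-spline (order 4) on the same breakpoints needs longer boundary repeats.
knots_cubic = [0, 0, 0, 0, 7, 14, 21, 21, 21, 21]

print(evaluate_spline(coefs, 2, knots_linear, 3.5))   # halfway between days 0 and 7
print(evaluate_spline(coefs, 2, knots_linear, 10.5))  # halfway between days 7 and 14
```

With order two and a knot at every sampled time point, the coefficients are simply the observed values and the spline interpolates linearly between them. Higher orders and sparser knots trade that exact fit for smoothness.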
Presently, there is no standard for the basis order or the knot positions. The B-spline, represented as coefficient values in a matrix, is flexible enough to fit large numbers of data points and allows relatively easy implementation of various calculations. Other data representation techniques include the P-spline, polynomial function, exponential basis, power basis, step-function basis [65] and the Fourier basis for periodic data [48, 64]. In this study, I utilized B-splines, the most robust representation of time-series data, and investigated how to build a B-spline that best reflects the raw data pattern.

1.3.3 Data pre-processing - Smoothing & Registration

When time-series datasets are transformed into one or more combinations of functions via methods such as B-splines, the resulting pattern is automatically smoothed. The purpose of performing an additional smoothing procedure is to minimize fluctuations in the pattern that might be caused by random experimental errors instead of the underlying biological phenomena. The commonly used smoothing methods (least squares, roughness penalty, and positive smoothing) are briefly described below. The basic aim of these smoothing methods is to determine the balance between goodness of fit to the intrinsic or external pattern and the amount of information lost. For the least squares smoothing method, patterns are changed throughout the available time range in order to minimize the sum of squared errors in fitting a simulated model with normally distributed and independent residuals. For the roughness penalty method, variances among the patterns are decreased throughout. The amount of smoothing is unbiased and is controlled by the user-specified parameter λ. The positive smoothing method modifies every pattern by enforcing a logarithmic property, thus adding a positive constraint throughout.
Overall, there is no standard degree of smoothing by any method, and as a result, this increases the complexity of long time-series data analyses. Another form of smoothing where random experimental errors are estimated and removed is the signal-plus-noise model. Essentially, a set of observed values $Y_{ij}$ from sample $j$ in class $i$ is divided into a global base value $\lambda_0$, a class signal $\Lambda\alpha_i$, an individual signal variation $\gamma_{ij}$ and individual experimental errors $\varepsilon_{ij}$ (Equation 1.1). The parameters can be estimated via algorithms such as the Expectation Maximization (EM) algorithm.

$$Y_{ij} = \lambda_0 + \Lambda\alpha_i + \gamma_{ij} + \varepsilon_{ij}$$

Equation 1.1 Signal plus noise model

Registration in a temporal analysis refers to stretching and shrinking the time index of each observed dataset to fit an overall time-series data pattern [48]. This step is often necessary because the phenomena being measured may not follow the linear time scale of the data. Registration is particularly important for long time-series data and clinical data in order to synchronize different patient response times. Examples of registration methods are the landmark and continuous fitting criteria [48]. Landmark registration is biased as it depends on prior information. First, a minimum of two landmarks for the two ends of each time series are identified. More landmarks can be identified based on specified patterns, prior information such as disease diagnosis, or combinations of both. There must be an equal number of landmarks specified for each set of time-series data. The landmark registration algorithm then transforms the time axis so that corresponding landmarks in the time-series dataset are comparable [48, 66]. Continuous fitting or global registration is unbiased and aims to minimize the least squares value between the registered patterns and their means. At each iteration, amplitude differences between the patterns and their mean are minimized by modifying the time scale [48].
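Landmark registration as described above can be sketched with a piecewise-linear warp of the time axis. This is a simplified illustration, assuming strictly increasing landmarks and linear interpolation between them; the function name `register_time` and all landmark values are hypothetical, and published landmark registration methods typically use smooth warping functions instead.

```python
# Minimal sketch of landmark registration by piecewise-linear time warping.
# Assumptions: landmarks are strictly increasing, and the same number of
# landmarks is given for the patient and for the common (target) time scale.

def register_time(t, landmarks, targets):
    """Map a patient's time t onto the common scale so that each patient
    landmark lands exactly on the corresponding target landmark."""
    if len(landmarks) != len(targets) or len(landmarks) < 2:
        raise ValueError("need at least two landmarks, equal in number to the targets")
    segments = zip(zip(landmarks, landmarks[1:]), zip(targets, targets[1:]))
    for (a, b), (ta, tb) in segments:
        if a <= t <= b:
            return ta + (t - a) * (tb - ta) / (b - a)
    raise ValueError("t lies outside the landmark range")

# Hypothetical patient: transplant at day 0, diagnosis at day 20, last sample at day 80.
patient_landmarks = [0, 20, 80]
# Hypothetical common scale: transplant at day 0, diagnosis at day 36, end at day 100.
target_landmarks = [0, 36, 100]

registered = [register_time(t, patient_landmarks, target_landmarks) for t in (10, 20, 50, 80)]
print(registered)
```

The interval before the diagnosis landmark is stretched and the interval after it is rescaled independently, which is how a landmark such as a diagnosis date can synchronize patients with different response times.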
Other registration methods include shift registration, which applies a constant shift to the time index, and the warping function, which combines registration and smoothing.

1.3.4 Classification

Classification algorithms analyzing time-course data can be categorized into two approaches. The first approach utilizes conventional multivariate analyses such as principal component analysis [67, 68], singular value decomposition [69], correlation analysis [70], and support vector machines [71]. These algorithms omit the time dependency of the time-course data. The second approach includes the time dependency in the time-course data. Thus, the second approach is generally considered more efficient in time-course studies and applicable to studies with missing values and non-uniform sampling rates. Algorithms categorized in the second approach include nonparametric curves discrimination [72], functional linear discriminant analysis (FLDA) [4], mixture functional discriminant analysis [73], predictive modular neural networks [60], etc. Among all these classification algorithms, only FLDA was designed specifically for sparsely sampled datasets. Therefore, FLDA [4] is deemed the most suitable temporal classification algorithm for the analysis of the present clinical dataset. FLDA is a B-spline based method. Similar to the static linear discriminant analysis, it provides an easily interpretable classifier. In the static linear discriminant analysis [74], the classification of test data can be made via multiplying weight values ($b_1, b_2, \ldots, b_m$) with test data values ($x_1, x_2, \ldots, x_m$) from the corresponding parameters (Equation 1.2). These weight values are determined for each parameter using a training dataset with multiple and independent parameters. The absolute value of each weight also represents how strongly the corresponding test data value will be accounted for in the classifier.

$$\text{Group} = a + b_1 x_1 + b_2 x_2 + \ldots + b_m x_m$$

Equation 1.2 Static linear discriminant classification

For time-series datasets, FLDA builds a classifier by estimating the signal-plus-noise model (Equation 1.1 and Figure 1.4) using a training dataset where the first three parameters (global base value, class signal, and individual signal variation) are represented through the B-spline matrix $S_{ij}$. In the FLDA classifier, weight values (Equation 1.3) are determined for a set of sampled time points using variables estimated in the signal-plus-noise model (Equation 1.1). Classification is made by multiplying the difference between the test data $X$ and the corresponding global base values $S_X\lambda_0$ by the weight values at the sampled time points (Equation 1.4). The sign of the linear discriminant value $\alpha_X$ is used to classify a test dataset into one of the two groups. In an FLDA classifier, large absolute weight values are assigned to time points where there is large separation between the estimated class signals. As a result, small differences between test data and the global base values at those time points are weighted more heavily than differences at other time points in the overall classification (Equation 1.4).

Figure 1.4 An example of the FLDA signal plus noise training from the raw data (panel a) to the estimated signals (panel b), adapted from James and Hastie [4]

$$\text{weight} = (\Lambda^{T}\Sigma^{-1}\Lambda)^{-1}\Lambda^{T}\Sigma^{-1}$$

Equation 1.3 FLDA weight values at specified time points

$$\alpha_X = \text{weights} \cdot (X - S_X\lambda_0)$$

Equation 1.4 Functional linear discriminant value

Validation techniques for classifiers can be categorized into four groups: external test set, resubstitution, bootstrap, and cross-validation. The external test set is the best validation technique because it provides unbiased error estimation by validating the classifier using a new dataset with prior knowledge of class assignment.
Unfortunately, the external test set validation is usually impractical in studies with a small sample size. The other three groups of validation techniques utilize the same dataset for both training and validation of classifiers. Resubstitution is a method where the same training dataset is used as the test dataset, and it usually underestimates the classifier error considerably [75]. Similar to resubstitution, bootstrap repeatedly re-analyzes a subset of the training dataset by selecting profiles with replacement. K-fold cross-validation also repeatedly re-analyzes a subset of the training dataset but without replacement. Error is estimated over k training runs, each time holding out a different subset of the original dataset as the testing dataset. If k is set to the size of the dataset, then leave-one-out cross-validation (LOOCV) is performed and a single data point is used as the testing dataset each time. Studies have shown that the bootstrap technique generally results in biased error estimation with small variance while cross-validation results in less biased estimation with large variance [76].

1.4 Sample size calculations

Sample size calculation or power analysis estimates the certainty of detecting an effect, which is the complement of the probability of a false negative (type II error) result. The estimated power depends on the tolerance of type I errors (significance level, α) and the data variance. In the case of a pilot project, sample size calculation may be used to determine how many samples are needed for a future study in order to achieve a certain power level. Generally, sample size calculation consists of four steps [77]:

1. Specify α
2. Specify the hypothesis-testing procedure
3. Sample the original dataset to create simulated datasets of different sizes
4. Estimate the power of the analysis based on multiple simulated datasets

Most of the sample size calculations vary with their choice of hypothesis testing and sampling methods.
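The four steps listed above can be sketched as a small bootstrap simulation. This is a generic illustration under stated assumptions and not the specific method used in the thesis: the test is a two-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation and no tie correction, the pilot values are hypothetical, and the names `rank_sum_p_value` and `estimate_power` are invented for the example.

```python
import math
import random

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    no tie correction, so it is only a rough guide for small samples)."""
    n, m = len(x), len(y)
    # Mann-Whitney U: count pairs where x exceeds y, ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))

def estimate_power(group_a, group_b, size, alpha=0.05, n_sim=500, seed=0):
    """Steps 1-4: fix alpha, fix the test, bootstrap simulated datasets of
    the requested size, and report the fraction of significant results."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sim_a = [rng.choice(group_a) for _ in range(size)]  # resample with replacement
        sim_b = [rng.choice(group_b) for _ in range(size)]
        if rank_sum_p_value(sim_a, sim_b) < alpha:
            hits += 1
    return hits / n_sim

# Hypothetical pilot measurements for two patient groups.
pilot_a = [2.1, 2.5, 3.0, 3.2, 3.8, 4.1]
pilot_b = [5.0, 5.4, 6.1, 6.3, 6.9, 7.2]

for size in (2, 5, 10):
    print(size, estimate_power(pilot_a, pilot_b, size))
```

Running the estimate over a range of sizes gives a power curve from which the smallest sample size reaching a target power can be read. Because the simulated datasets are drawn from the pilot data, the estimate is only as reliable as the pilot sample is representative.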
Most include an assumption of normal or known distributions. Power analysis by location shift [78] is entirely nonparametric and incorporates the average X & Y method for a conservative power estimation. It is a bootstrap-based method where multiple simulated datasets from the empirical cumulative distribution function (ECDF) are compared with the Wilcoxon test. It considers variances from the two original datasets separately and determines the overall power as the average of the power estimated from the two original datasets.

1.5 Thesis goals

Previously, high-throughput methods have proven useful in probing unknown diseases [79, 80]. High-throughput FCM has never been applied to the study of GvHD because the technical difficulties of FCM were only resolved with the recent development of FC-HCS. As manifestations of GvHD are based on the immune system, it was thought that a high-throughput analysis of immune cell changes in the blood following allogeneic HSCT might prove to be successful in predicting the onset of GvHD and elucidating its mechanisms. The main hypothesis of the present study was:

Onset of aGvHD or cGvHD can be predicted by identifying patterns of cellular markers in peripheral blood mononuclear cells (PBMCs) via FC-HCS.

It is suspected that there are multiple immune cells and pathways involved in GvHD disease manifestation [81]. The global approach used in this study should be beneficial in the further elucidation of the disease. The main goal of the present study was to develop a bioinformatics analysis pipeline that can analyze high-throughput clinical FCM data and, if possible, identify immune cell populations that may be used in a diagnosis of either aGvHD or cGvHD. The specific aims were:

1. Assemble a suitable temporal analysis pipeline to process the high-throughput FCM dataset
2. Identify one or more immune cell populations with potential discriminative power for either aGvHD or cGvHD
3. Construct diagnostic models for aGvHD and cGvHD
4.
Recommend an analysis methodology for an extended study

CHAPTER 2 PATIENTS AND METHODS

2.1 Overview

One hundred and twenty-three subsets of PBMCs were obtained by FC-HCS using samples taken from 31 patients (Table 2.1) at multiple time points. The quality of the dataset was assessed and suspicious outliers removed. This dataset was then separated based on patients' GvHD diagnoses and analyzed by a temporal classification algorithm. In order to verify the hypothesis of the present study, temporal patterns of immune cell populations' abundances that appeared to correlate with the onset of either aGvHD or cGvHD were identified and visually inspected. Finally, sample size calculations were performed based on the top classifiers in order to estimate the statistical power of the current and future studies.

2.2 Study patients

Thirty-one patients who received HLA-matched BMT from either sibling (SIB) or matched-unrelated donors (MUD) were enrolled at the Moffitt Cancer Center with the approval of the institutional review board. On average, there were 14 (±3) samples per patient, collected approximately every ten days (±14). Samples were collected from 0 to 16 days (average 6 ± 4 days) before the transplantation and until 49 to 400 days (average 125 ± 81 days) after the transplantation. This was a heterogeneous dataset. Among the 31 patients, there were seven different underlying hematopoietic disorders (Table 2.1) and at least four different types of pre-transplant treatments (data not shown). Twenty-one patients were diagnosed with aGvHD on average 36 days (±18 days) post-transplant. Seven of these aGvHD patients were later diagnosed with cGvHD from 98 to 446 days post-transplant. The diagnosis and grading of GvHD were performed using previously published criteria [82]. Details of the stem cell source, GvHD diagnosis time, and maximum GvHD grades are summarized in Appendix A.

Table 2.1 Characteristics of the 31 patients recruited for the study.
| Characteristics | Subtypes | Incidence (% of total population) |
|---|---|---|
| GvHD | aGvHD | 21 (68%) |
| | aGvHD and survived | 9/21 (29%) |
| | aGvHD then died or withdrew from the study | 5/21 (16%) |
| | Progressive or quiescent-type cGvHD | 7/21 (23%) |
| | non-GvHD | 7 (23%) |
| | non-GvHD with records past 100 days post-transplant | 4/7 (13%) |
| | non-GvHD died or withdrew before 100 days post-transplant | 3/7 (10%) |
| | De novo cGvHD | 3 (10%) |
| Underlying disorders | AML | 11 (35%) |
| | MDS | 1 (3%) |
| | MDS-AML | 3 (10%) |
| | CML | 5 (16%) |
| | NHL | 7 (23%) |
| | MPD | 1 (3%) |
| | CLL | 2 (6%) |
| | ALL | 1 (3%) |
| Donor-recipient relationship | SIB | 17 (55%) |
| | MUD | 7 (23%) |
| | unknown | 7 (23%) |
| Total | | 31 |

2.3 Sample preparations and flow cytometry high content screening

Blood samples were obtained both pre- and post-transplantation on an approximately weekly basis. PBMCs were isolated using the Ficoll-Hypaque technique. The samples were divided into ten aliquots in 96-well plates. Each aliquot was stained with four different antibodies out of the total of 25 (Table 2.2) used in the present study. The four antibodies used per group were attached to different fluorochromes, and the combinations of antibodies and fluorochromes were designed to target different immune cells (Table 2.3). Six aliquots named '1Activation', '2Activation', '3Activation', 'resting/activated (rest/act) T helper', 'rest/act T suppressor', and 'T cells' targeted subsets of T cells and their functional states. The other four aliquots targeted myeloid cells, B cells, NK cells, and T cell receptor (TCR) via aliquots so named. Depending on the sample number and frequency, one or more 96-well plates were used for each patient. Samples were usually plated one row per aliquot and ordered in columns by their sampled time. These 96-well plates were stained with antibodies and then analyzed using multi-parameter FCM as part of the FC-HCS technique previously described [2].
Batch gating analysis of the FCM data was performed using Flowjo software (Tree Star, Inc, Oregon) on one- or two-dimensional plots to generate abundance values for a maximum of 123 subsets of immune cells for each sample (Appendix B). The sample preparations and the FCM gating were previously performed by the Moffitt Cancer Center and Dr. Maura Gasparetto (BC Cancer Research Centre).

Table 2.2 Annotated functions and selected literature references on the 25 cell surface antigens used.

| Gene Name(s) | Functions | Literature |
|---|---|---|
| CD2 | Activation of T and NK cells | [83] |
| CD3 | Known to be involved in phase II of acute GvHD | [84] |
| CD4 | Regulation of interleukin-2 biosynthesis; T-cell differentiation; known mediator in GvHD | [85-87] |
| CD5 | Cell proliferation and recognition | [88, 89] |
| CD8 | Known to be involved in phase II of acute GvHD | [84] |
| CD8β | T-cell activation, MHC class I binding | [90, 91] |
| CD10 | Also known as common acute lymphoblastic leukemia antigen; marks early lymphoid progenitor cells | [92] |
| CD14 | Cell surface receptor linked signal transduction; inflammatory response | [93] |
| CD15 | Neutrophil adhesion | [94] |
| CD16 | Immune response | [95] |
| CD19 | B cell marker | [96] |
| CD20 | B cell activation; immune responses; signal transduction | [97] |
| CD22 | Cell adhesion; antimicrobial humoral response | [98, 99] |
| CD25 | Marker for strong or prolonged antigen stimulation | [96] |
| CD33 | Cell adhesion | [100] |
| CD44 | Cell adhesion | [101] |
| CD45 | Lymphocyte activation | [102] |
| CD45RA | T cells in resting state | [103] |
| CD45RO | T cells in activating state | [103] |
| CD56 | NK cell marker | [96] |
| CD69 | Early T cell activation antigen, acute graft rejection | [104] |
| CD122 | Cytokine receptor | [96] |
| CD134 | Tumor necrosis factor receptor superfamily, marks activated CD4+ cells | [105] |
| TCRab | T cell activation | [96, 106] |
| TCRgd | T cell activation | [96] |

Table 2.3 The combinations of antibody-fluorochromes used in each of the 10 aliquots available per sample.
| Aliquot # | Aliquot name | FITC | PE | PerCP | APC |
|---|---|---|---|---|---|
| 1 | Myeloids | CD15 | CD45 | CD14 | CD33 |
| 2 | T cells | CD4 | CD8β | CD3 | CD8 |
| 3 | NK cells | CD16 | CD2 | CD3 | CD56 |
| 4 | B cells | CD10 | CD20 | CD19 | CD22 |
| 5 | TCR | TCRab | TCRgd | CD3 | CD5 |
| 6 | 1Activation | CD44 | CD25 | CD3 | CD69 |
| 7 | 2Activation | CD4 | CD134 | CD3 | CD8 |
| 8 | 3Activation | CD4 | CD122 | CD3 | CD8 |
| 9 | rest/act T helper | CD45RA | CD45RO | CD3 | CD4 |
| 10 | rest/act T suppressor | CD45RA | CD45RO | CD3 | CD8 |

2.4 Temporal analysis pipeline

A temporal analysis pipeline consisting of three steps was assembled specifically for the high-throughput clinical FCM dataset (Figure 2.1). Step one involved a quality assurance (QA) test in two parts. The purpose of this QA test was to identify values caused by experimental errors. Step two involved the data transformation via a PERL script. Finally, step three involved the temporal classification via FLDA. The resulting classifiers were ranked based on their potential discriminative power for the onset of either aGvHD or cGvHD.

[Figure 2.1 is a flowchart with three panels. Step 1, flow cytometry quality control: quality assurance test, manual gating and data extraction. Step 2, data transformation: cell lineage abundances as a proportion of the PBMCs (proportion dataset, %) and combined with mononuclear cell concentrations (concentration dataset). Step 3, temporal classification: values at discrete time points are represented continuously with linear B-splines over days post-transplant, then FLDA classifiers, LOOCV validation, weighted knots validation for static sample size calculation, and visual inspection of top-ranking measurements.]

Figure 2.1 Temporal analysis pipeline designed for the high-throughput clinical FCM dataset.

2.4.1 Quality Assurance

The basic assumption for the main QA test was that distributions from common light scatter intensities of cells in different aliquots of the same sample should be similar [3]. Outliers were identified through visual inspection of ECDF, density plots and box plots. Part one of the QA test was performed on ungated data by Dr. Le Meur (Fred Hutchinson Cancer Centre), where the QA assumption was tested on intensities of the FSC and SSC measurements for all cells. Raw flow cytometry standard (FCS) files from a FACSCalibur (Becton Dickinson (BD), San Jose, CA) were obtained and analyzed in R via the rflowcyt package [107]. In part two, I tested the QA assumption based on the intensities of the FSC, SSC and CD3-PerCP antibody-fluorochrome for CD3+ and CD3- populations separately. FCS files of the gated CD3+ and CD3- populations were exported from Flowjo. Excess keywords in the FCS files were removed via a PERL script (fixFCS.pl, Appendix C) to generate a file format compatible with the rflowcyt package. Unlike the QA test on ungated data, where up to ten aliquots were available per sample, there were only five or seven aliquots available for the QA test of gated data. Consequently, it was more difficult to identify outliers visually. In order to retain most of the limited data for the subsequent classification analysis, only obvious and singular outliers were identified. Criteria for outlier identification in the QA test on gated data were:

1. One outlier per sample
2. The outlier pattern must be visually different from all other aliquots
3. The outlier pattern cannot be visually explained by the observed general variations.
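The QA assumption that aliquots of the same sample share similar intensity distributions can also be expressed numerically. The sketch below is an illustration only, since the thesis relied on visual inspection: it compares each aliquot's ECDF with the pooled ECDF of the remaining aliquots using the maximum vertical distance (the Kolmogorov-Smirnov statistic) and reports the single most deviant aliquot. The helper names and all intensity values are hypothetical.

```python
import bisect

def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda v: bisect.bisect_right(xs, v) / n

def ks_distance(a, b):
    """Maximum vertical distance between the ECDFs of two samples."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(v) - fb(v)) for v in set(a) | set(b))

def most_deviant_aliquot(aliquots):
    """aliquots maps an aliquot name to its intensities. Each aliquot is
    compared with all other aliquots pooled; the farthest one is returned."""
    scores = {}
    for name, values in aliquots.items():
        pooled = [v for other, vals in aliquots.items() if other != name for v in vals]
        scores[name] = ks_distance(values, pooled)
    worst = max(scores, key=scores.get)
    return worst, scores[worst]

# Hypothetical FSC intensities for four aliquots of one sample.
sample_aliquots = {
    "T cells":  [100, 110, 120, 130],
    "B cells":  [105, 115, 125, 135],
    "NK cells": [102, 112, 122, 132],
    "TCR":      [300, 310, 320, 330],
}

name, distance = most_deviant_aliquot(sample_aliquots)
print(name, distance)
```

Returning only one candidate per sample mirrors the first criterion above. Whether that candidate is actually removed would still depend on a threshold or on visual confirmation, neither of which is specified here.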
Under these criteria, outliers were identified and all their associated sub-gates were removed from subsequent analyses. Data with putative outliers that did not fit the above criteria were retained. Finally, all outliers and unusual patterns were mapped back to the original plating chart in order to investigate the distribution of outliers on the 96-well plate.

2.4.2 B-spline parameters evaluation

The effects of two B-spline parameters, basis order and knot placement, were tested using time-series data from patient #2 between 0 and 13 weeks post-transplant. This patient was selected because of the uniform sampling rate and the single missing value at week one. The effects of these parameters on the overall fit between the resulting B-spline and the data were evaluated and used as a guide in determining the B-spline parameters for the dataset. However, because of the sampling rate disparities and the large number of values available, these data may not be representative of the entire dataset.

First, the effects of different basis orders were examined using three B-splines with basis orders of two, three and four, giving linear, quadratic and cubic basis functions respectively. One knot was placed at every sampled time point for all three B-splines.

Secondly, the effects of different knot placements were examined with four B-splines consisting of linear basis functions. The four knot placements, in order of decreasing knot frequency, were:

1. A weekly knot placement including one knot at week one post-transplant, when patient information was not available
2. Knots at every sampled time point (no knot at week 1)
3. A bi-weekly knot placement covering the entire time range (0, 2, 4, 6, 8, and 13 weeks post-transplant)
4. A tri-weekly knot placement covering the entire time range (0, 3, 6, 9, and 13 weeks post-transplant)

2.4.3 Data transformation

Step two in the temporal analysis pipeline (Figure 2.1) involved data transformations via a PERL script (viz_days.pl; Appendix D). The 123 gated immune cell abundances were exported to text files using Flowjo software. The FCM data were then combined with immune cell concentration data and transformed into a proportion dataset and a concentration dataset. The proportion dataset contained all 123 subsets of immune cells; each value was the proportion of cells in the gate, relative to either the total PBMCs or the total CD3+ cells. The mononuclear cell (MNC) concentration values (mm3) were obtained separately, using different samples taken from the same group of patients at multiple time points. The concentration dataset was obtained by multiplying each proportion value by the MNC concentration of the sample taken at the closest date. Both datasets were tested because they may contribute different insights into immune responses.

The PERL script viz_days.pl (Appendix D) also rearranged the file layout and the time scale. Originally, data were recorded as the number of days after the BMT. The script combined the known aGvHD diagnosis date, the BMT date, and the sampled time points to convert the time scale from days post-transplant into days from the aGvHD diagnosis. For patients unaffected by aGvHD, the average day of aGvHD diagnosis observed in the current dataset (36 days post-transplant) was used as the synchronization event, so that the non-GvHD patient data could be compared to the aGvHD patient data. Thus, patients' responses were synchronized by two events, resulting in two time scales: days post-transplant and days from aGvHD diagnosis. The PERL script also excerpted three parts of the data for time ranges representing patterns right after BMT, and before and after aGvHD manifestation.
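The change of time scale described above can be sketched in a few lines. This is an illustration only, not the viz_days.pl script from Appendix D: the function names and data layout are hypothetical, and the only value taken from the text is the synchronization day of 36 days post-transplant used for patients unaffected by aGvHD.

```python
# Average aGvHD diagnosis day in the dataset, as stated in the text.
MEAN_DIAGNOSIS_DAY = 36


def days_from_diagnosis(days_post_transplant, diagnosis_day=None):
    """Shift sampling days so that day 0 is the aGvHD diagnosis.

    diagnosis_day is the patient's diagnosis day (days post-transplant);
    None marks a non-GvHD patient, who is synchronized on the average day.
    """
    anchor = MEAN_DIAGNOSIS_DAY if diagnosis_day is None else diagnosis_day
    return [day - anchor for day in days_post_transplant]


def excerpt(samples, start, end):
    """Keep the (day, value) pairs whose day falls inside [start, end]."""
    return [(day, value) for day, value in samples if start <= day <= end]
```

For example, a non-GvHD patient sampled at 15, 22 and 36 days post-transplant would be placed at -21, -14 and 0 days from diagnosis.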
Consequently, results derived from these three time ranges should be useful in elucidating the onset, manifestation, and progression of GvHD. In the end, three separate datasets covering different time ranges were obtained:

1. 7 to 21 days post-transplant
2. 21 to 0 days prior to aGvHD diagnosis
3. 0 to 21 days from aGvHD diagnosis

2.4.4 Temporal classification

In step three of the temporal analysis pipeline (Figure 2.1), different combinations of GvHD and non-GvHD patient groups (Table 2.1) were analyzed using FLDA for both the proportion and concentration datasets. The first comparison was between the 21 aGvHD and the 4 non-GvHD patients. This comparison was intended to identify temporal patterns from one or more subsets of immune cells that could predict aGvHD reliably and precisely prior to the manifestation of clinical symptoms, or that could elucidate the pathophysiologic pathway of aGvHD during its clinical manifestation. Supplementary comparisons of 17 Grade II-IV aGvHD vs. 4 non-GvHD patients and 12 Grade III-IV aGvHD vs. 4 non-GvHD patients were also performed.

The second comparison was between seven patients diagnosed with both aGvHD and cGvHD and nine patients diagnosed with only aGvHD. This comparison was intended to identify temporal patterns from one or more subsets of immune cells that are predictive of progressive or quiescent-type cGvHD either after the BMT or during the manifestation of aGvHD.

A PERL script (FLDA_MATLAB.pl; Appendix E) read in the specified data and output the MATLAB (MathWorks, Inc. Boston) commands needed to build an FLDA classifier for each subset of immune cells and each patient group comparison. The PERL script also acted as a filter, omitting data with fewer than three available sampled time points per patient in each of the selected time ranges, or fewer than three available patients per group.
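The filtering rule just described can be illustrated as follows. This is a hedged sketch rather than the FLDA_MATLAB.pl script of Appendix E: the function name, the nested-dictionary layout and the use of None for a missing value are assumptions made here; only the two thresholds of three come from the text.

```python
def qualified_data(groups, start, end, min_points=3, min_patients=3):
    """Apply the two availability thresholds for one immune cell subset.

    groups: {group name: {patient id: [(day, value), ...]}}; hypothetical layout,
            with value set to None where a measurement is missing.
    Returns the retained data, or None if any group has too few patients left.
    """
    kept = {}
    for group, patients in groups.items():
        retained = {}
        for patient, samples in patients.items():
            usable = [(day, value) for day, value in samples
                      if start <= day <= end and value is not None]
            # A patient needs at least min_points usable time points in the range.
            if len(usable) >= min_points:
                retained[patient] = usable
        # A group needs at least min_patients qualifying patients.
        if len(retained) < min_patients:
            return None
        kept[group] = retained
    return kept
```

Returning None for the whole comparison when one group falls below the patient threshold is one possible reading of the rule; the original script may have handled this case differently.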
Because of missing values at the sampled time points and the limited number of available aliquots, not all of the identified immune cell populations and patients were included in each analysis. The qualified data were then analyzed via FLDA with a linear B-spline and a weekly knot placement. LOOCV was performed on the FLDA classifiers. The validation results were used to rank the FLDA classifiers and their corresponding subsets of immune cells, as the values were directly proportional to the potential discriminative power of the temporal patterns. Top ranking classifiers were then inspected visually via time plots of the FLDA estimated signals and the raw data in the analyzed and extended time ranges.

2.5 Static sample size calculation

A static sample size calculation pipeline (Figure 2.2) was implemented in the R package 'PALS' (Power Analysis by Location Shift), based on the location shift hypothesis [78]. The analysis was performed on values from the top FLDA ranking immune cell populations taken closest to the time point where the class signal separation was the greatest according to the adjusted weight values (section 2.5.1). The purpose of this analysis was to estimate the statistical power of the present and future studies.

Briefly, in the sample size calculation (Figure 2.2), simulated datasets were generated by random sampling from two ECDFs corresponding to the two groups of observed values. The simulated datasets were then analyzed using the Wilcoxon test for statistical significance (α < 0.1). This was repeated 10,000 times to estimate the power of the study. Each ECDF was used in turn to simulate data representing both groups, and the average of the power obtained from each ECDF was taken. In the interest of time, an upper and lower limit of 100 and 0 was set for the random sampling from the proportion dataset.
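The simulation summarized above can be sketched as follows. This is not the PALS implementation: it is a minimal illustration under stated assumptions, namely that the location shift is the difference between the two group medians, that sampling from an ECDF is done by resampling the observed values with replacement, and that the Wilcoxon rank-sum p-value is taken from a tie-corrected normal approximation (rough for very small groups). All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist, median


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a shift
        return 1.0
    z = (u - n1 * n2 / 2) / var_u ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(group1, group2, n1, n2, alpha=0.1, reps=10_000,
                   lower=0.0, upper=100.0, seed=0):
    """Average power over the two ECDFs, each used to simulate both groups."""
    rng = random.Random(seed)
    shift = median(group2) - median(group1)  # assumed location shift

    def draw(source, size, offset):
        # Resampling observed values with replacement samples from their ECDF;
        # the result is clamped to [lower, upper] as for the proportion dataset.
        return [min(upper, max(lower, rng.choice(source) + offset))
                for _ in range(size)]

    powers = []
    for source, off1, off2 in ((group1, 0.0, shift), (group2, -shift, 0.0)):
        hits = sum(
            rank_sum_p(draw(source, n1, off1), draw(source, n2, off2)) < alpha
            for _ in range(reps)
        )
        powers.append(hits / reps)
    return sum(powers) / len(powers)
```

Calling estimate_power with increasing n1 and n2 gives a power curve from which a sample size could be read off; the unequal group sizes used for the aGvHD comparison correspond to passing n1 different from n2.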
Figure 2.2 Static sample size calculation pipeline. Observations from group 1 and group 2 each define an ECDF; random sampling from each ECDF produces simulated datasets of size n for both groups (separated by +/- the median difference); each simulated dataset is analyzed with the Wilcoxon test (p < α); this is repeated j times, and the power of the study is estimated as the average of (number of times p < α) / j.

For the first comparison, between 21 aGvHD and three out of four non-GvHD patients, observed values from the immune cells CD3+CD4+CD8β+ taken closest to 21 days post-transplant were used. For the second comparison, between seven aGvHD & cGvHD and nine aGvHD-only patients, observed values from the immune cells CD3+TCRab+CD5+TCRgd+ taken closest to seven days prior to the aGvHD diagnosis were used. Various simulated dataset sizes were used for both comparisons. However, the aGvHD simulated datasets were twice as large as the non-GvHD simulated datasets in order to imitate the aGvHD manifestation rate in the BMT patients. On the other hand, equal sizes were assigned to the aGvHD & cGvHD and aGvHD-only simulated datasets.

2.5.1 Weight values in the functional linear discriminant analysis classification

In an FLDA classifier, large absolute weight values are assigned to time points where there is a large separation between the estimated class signals (Equations 1.3 and 1.4). For the static sample size calculation, weight values were determined at each of the weekly knots originally used in the FLDA analysis (section 2.4.4). The reliability of each weight value was accounted for by multiplying the weight value by the ratio of the total number of observed values to the total number of expected values at the corresponding knot.
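The adjustment just described amounts to scaling each knot weight by the fraction of expected values that were actually observed. A minimal sketch follows, using the fabricated numbers of the hypothetical example in this section (weights 2, 0.5 and 3; two, three and one observed values out of three expected); the function name is an assumption, and exact fractions are used only to make the arithmetic visible.

```python
from fractions import Fraction


def accounted_weights(weights, observed, expected):
    """Scale each knot weight by observed / expected values around that knot."""
    return [Fraction(w) * Fraction(o, e)
            for w, o, e in zip(weights, observed, expected)]


knots = [7, 14, 21]                                # days post-transplant
weights = [Fraction(2), Fraction(1, 2), Fraction(3)]
observed = [2, 3, 1]                               # values available in the smaller class
expected = [3, 3, 3]                               # one value per patient per week

adjusted = accounted_weights(weights, observed, expected)
# The knot with the largest absolute accounted weight marks the most
# reliable class separation.
best_knot = max(zip(knots, adjusted), key=lambda pair: abs(pair[1]))[0]
```

With these inputs the unadjusted weights would favour day 21, whereas the accounted weights favour day 7.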
Expected and observed values were counted within half a knot interval on either side of each knot, using the class with the fewest patients, in order to obtain the most conservative estimate.

A hypothetical example of accounted weight values is described using an FLDA classifier built from fabricated samples from 21 aGvHD and three non-GvHD patients taken between 7 and 21 days post-transplant. Weight values for the weekly knots at 7, 14, and 21 days post-transplant were assumed to be 2, 0.5 and 3. The three non-GvHD patients were assumed to have two values available at 7 days post-transplant, three at 14 days post-transplant and one at 21 days post-transplant. With weekly sampling, one value was expected per patient per week, so three values were expected at each knot. As a result, the accounted weight values were 2 x 2/3 = 4/3, 0.5 x 3/3 = 0.5 and 3 x 1/3 = 1 at the three knots. Due to the lack of available values for the smaller non-GvHD patient group (between 18 and 21 days post-transplant), the estimated class separation at 21 days post-transplant was not reliable. By taking the actual number of values available around each knot into account, the greatest and most reliable class separation was at 7 days post-transplant.

CHAPTER 3 RESULTS - QUALITY ASSURANCE AND B-SPLINE PARAMETERS

3.1 Quality assurance on ungated data

From the QA test on ungated data, two outliers corresponding to aliquots 'Myeloids' and '3Activation' were identified in the FSC intensity density plots for patient #6 (Figure 3.1). One of the two outliers (aliquot 'Myeloids') was also identified in the ECDF plots (data not shown). Box plots failed to depict details in the distributions, and most differences were observed in the FSC distributions rather than the SSC ones [3].
3.2 Quality assurance on gated data

3.2.1 Singular outliers

In the QA test on gated data, outliers such as aliquot '3Activation' from patient #6's samples taken at 76 days post-transplant (Figures 3.2 and 3.3) were selected using the criteria outlined in section 2.4.1. In total, 29 aliquots (< 0.4% of the dataset) were identified as visually significant outliers (Table 3.1) and removed from the dataset. While the outlier '3Activation' can be easily identified in the FSC and SSC intensity density plots (Figures 3.2 and 3.3), the same aliquot would not have been identified as an outlier in the density plot of the CD3-PerCP intensity because of the general variation observed there (Figure F.1). Consequently, CD3-PerCP was not used in the outlier identification. Results from the CD3-PerCP density plots and their potential role in gate quality control are described in Appendix F.

Figure 3.1 Density plots of the FSC intensity of different aliquots of patient #6's samples taken at 12 different time points (adopted from [3]). At day 46, the two red arrows show that the distributions corresponding to aliquots 'leukocyte' and '3Activation' are substantially different from those of the other aliquots.

Figure 3.2 Density plot of the FSC intensity using the CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. Aliquot '3Activation' was identified as a visual outlier.
Figure 3.3 Density plot of the SSC intensity using the CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. Aliquot '3Activation' was identified as a visual outlier.

Table 3.1 Outliers identified in the QA test on gated data.

Patient #   Cell population   Outlier aliquot   Time point (days post-transplant)
p3          CD3-              2Activation       14
            CD3+              2Activation       14
            CD3-              3Activation       0
p4          CD3-              T cells           81
            CD3+              T cells           81
            CD3+              TCR               32
p6          CD3+              3Activation       76 and 83
p7          CD3+              TCR               35
p9          CD3+              TCR               32
p13         CD3+              TCR               20
p14         CD3+              TCR               21
p17         CD3+              TCR               34, 41, and 55
p18         CD3+              1Activation       -6, 27, 34, and 41
            CD3-              T cells           0
p19         CD3-              T cells           28 and 38
p20         CD3+              TCR               28
p23         CD3-              T cells           28
p25         CD3+              TCR               7 and 21
p31         CD3+              TCR               21, 35, and 70

An example of an outlier and its representation in the density and ECDF plots is shown in Figures 3.4 and 3.5. Among the five available aliquots from patient #4's sample taken at 81 days post-transplant, aliquot 'T cells' exhibited a shift in intensity while maintaining a similar shape. In this case, evidence of the outlier was more prominent in the density plot (Figure 3.4) than in the corresponding ECDF plot (Figure 3.5).

Figure 3.4 Density plot of the FSC intensity using the CD3- cell population from five aliquots of patient #4's 81 days post-transplant sample. Aliquot 'T cells' was identified as a visual outlier.

Figure 3.5 ECDF plot of the FSC intensity using the CD3- cell population from five aliquots of patient #4's 81 days post-transplant sample. Aliquot 'T cells' was identified as a visual outlier.
3.2.2 Unusually large variations among aliquots

Among all the density plots of FSC and SSC intensities, there were 15 occurrences (Table 3.2) of unusually large variations among the available aliquots. These aliquots (1.4% of the dataset) were removed from the dataset. An example of this trend is shown using the density plot of the CD3- cell population from patient #28's sample at 14 days post-transplant (Figure 3.6). Although most density plots were mono- or bi-modal and relatively smooth, these 15 samples exhibited rapidly varying polymodal distributions in both the FSC and SSC intensity plots. The unusually large variations were also observed in the corresponding ECDF plots; however, the pattern was less apparent without the details of the polymodal shape (data not shown). Upon visualization of the FCM data, fewer live cells were present in some of the aliquots with these unusually large variations (aliquots taken at 53 days post-transplant) than in aliquots from a sample taken at a different time point (27 days post-transplant) (Figure 3.7).

Table 3.2 Cell populations and samples where the CD3+ or CD3- cell population exhibited unusual variations among the available aliquots.

Patient #   Cell population   Time point (days post-transplant)
p4          CD3+              0
p6          CD3+              46
            CD3-              53
p9          CD3-              6
p10         CD3-              6
p15         CD3-              7
p20         CD3-              7
            CD3+              49, 56, and 63
p26         CD3-              1, 7, and 14
p28         CD3-              14
p29         CD3-              0

Figure 3.6 Density plot of the FSC intensity using the CD3- cell population from seven aliquots of patient #28's 14 days post-transplant sample. All aliquots exhibited great variations from each other. Similar observations also occurred in other samples (Table 3.2).

Figure 3.7 FCM contour graphs of FSC vs. SSC from patient #6, aliquots 'TCR' and '3Activation', from samples taken at 27 and 53 days post-transplant.

3.2.3 Repeated outlier conditions

The last unusual pattern I found was repeated 'rest/act T helper' and 'rest/act T suppressor' outliers. There were 33 cell populations in which two distinct aliquot clusters were observed (Table 3.3). In all cases, the 'rest/act T helper' and the 'rest/act T suppressor' aliquots exhibited a similar pattern and formed one cluster, whereas all other available aliquots formed another. This trend was most frequent in samples from patients #6 and #7. An example is shown with patient #7's sample taken on the day of BMT. In the CD3- cell population density plot (Figure 3.8), both shape and intensity differed between the two clusters: i. 'rest/act T helper' and 'rest/act T suppressor'; and ii. '1Activation', '2Activation', '3Activation', 'TCR' and 'T cells'. Relatively small variations were observed within each cluster.

Table 3.3 Cell populations and samples where the two aliquots rest/act T helper and rest/act T suppressor exhibited a pattern similar to each other and different from all other available aliquots.

Patient #   Cell population   Time point (days post-transplant)
p6          CD3-              0, 5, 19, 27, 32, 39, 46, 60, and 67
            CD3+              60 and 67
p7          CD3-              -4, 0, 7, 21, 28, 35, 49, 56, 63, 70, and 77
p8          CD3-              19, 33, 42, 49, 54, and 61
p9          CD3-              -6, 55, and 62
p19         CD3-              0
            CD3+              77
p21         CD3+              21

Figure 3.8 Density plot of the SSC intensity using the CD3- cell population from seven aliquots of patient #7's sample taken on the day of BMT. Aliquots 'rest/act T helper' and 'rest/act T suppressor' exhibited a different pattern from all other aliquots.
3.2.4 Outlier distributions on the 96-well plate

The distributions of all outliers and unusual patterns on the 96-well plate were investigated. The plating for samples from patient #6 is shown as an example (Table 3.4). The two outliers, both from aliquot '3Activation', came from samples taken at 76 and 83 days post-transplant and were plated next to each other at the left-hand corner of the second plate (Table 3.4). Unusually large variations were observed among all ten aliquots from samples taken at 46 and 53 days post-transplant (Table 3.2). Most of these aliquots were plated in the ninth and tenth columns of the first plate and at the top of the seventh and ninth columns of the second plate (Table 3.4). Furthermore, for all but one sample taken between 0 and 67 days post-transplant, the two aliquots 'rest/act T helper' and 'rest/act T suppressor' exhibited patterns similar to each other but completely different from the other aliquots (Table 3.3). These aliquots were plated on different plates: the two rest/act aliquots were plated on the second plate while most of the other aliquots were plated on the first plate (Table 3.4).

Many outliers came from aliquots plated close or next to each other, because the same aliquot tended to be identified as an outlier at a cluster of consecutive time points (Table 3.1). Among the 29 outliers, 20 were aliquots 'TCR' or 'T cells', of which 13 were identified from samples taken between 20 and 40 days post-transplant (Table 3.1). In many cases, these outliers were mapped to aliquots plated in the middle of a plate (Table 3.4). As for patient #6, many of the differences in the rest/act aliquots were observed when these aliquots were plated on a separate plate from most of the other aliquots. These trends could generally be observed in other patients' samples (data not shown).

Table 3.4 Plating order for patient #6 with samples taken at multiple time points on two plates.
Aliquots identified as outliers or as having unusual variations are labelled with shaded areas.

Plate #1 (entries are sampling days post-transplant)
Row   Aliquot        1    2    3    4    5    6    7    8    9    10   11   12
A     Myeloids       -8   0    5    12   19   27   32   39   46   53   60   67
B     T cells        -8   0    5    12   19   27   32   39   46   53   60   67
C     NK cells       -8   0    5    12   19   27   32   39   46   53   60   67
D     B cells        -8   0    5    12   19   27   32   39   46   53   60   67
E     TCR            -8   0    5    12   19   27   32   39   46   53   60   67
F     1Act Marker    -8   0    5    12   19   27   32   39   46   53   60   67
G     2Act Marker    -8   0    5    12   19   27   32   39   46   53   60   67
H     3Act Marker    -8   0    5    12   19   27   32   39   46   53   60   67

Plate #2 (columns 1 to 4 as plated; the remaining columns hold the rest/act T helper and rest/act T suppressor aliquots, and [?] marks an entry that is not legible in the source)
Row   Aliquot        1    2    3    4     Helper      Suppressor
A     Myeloids       76   83   90   176   -8, 46      -8, 46
B     T cells        76   83   90   176   [?], 53     [?], 53
C     NK cells       76   83   90   176   [?], 60     [?], 60
D     B cells        76   83   90   176   12, 67      [?], 67
E     TCR            76   83   90   176   [?], 76     [?], 76
F     1Act Marker    76   83   90   176   27, 83      27, 83
G     2Act Marker    76   83   90   176   32, 90      [?], 90
H     3Act Marker    76   83   90   176   39, 176     [?], 176

3.3 B-spline parameters

The effects of the B-spline basis order and knot placement were evaluated using data from patient #2, which had a missing observation at week one. First, B-splines were built with one knot at every sampled time point and three different basis orders (Figure 3.9). Although all three B-splines followed the general pattern exhibited by the raw data (red dots) on visual inspection, the B-spline with basis order two best reflected the raw data. Even though no knot was placed at week one, fitting a B-spline with basis order three imposed a quadratic function between the two knots at week zero and week two. As a result, there was a discrepancy between the B-spline with basis order three and the raw data pattern. A similar discrepancy was also observed between the raw data and the B-spline fitted with basis order four, most evidently between five and six weeks post-transplant (Figure 3.9).

Secondly, B-splines were built with linear basis functions and four different knot placements with decreasing knot frequency.
The B-splines became smoother and moved further away from the actual raw data pattern as the knot frequency decreased (Figure 3.10). Another noticeable feature was the behaviour of each spline at week one, where no observed value was available. A knot at week one resulted in an imputed B-spline pattern on either side of the knot, based on the trend of the previous basis function. As a result, the imputation created a discrepancy from the raw data pattern (Figure 3.10).

Figure 3.9 B-splines with knots located at every available time point and orders two, three or four fitted to the raw data (x-axis: weeks post-transplant).

Figure 3.10 B-splines with order two and different distributions of knots fitted to the raw data (x-axis: weeks post-transplant). Legend: raw data; knots at available data points; knots from 0 to 13 at intervals of 1; knots from 0 to 13 at intervals of 2; knots from 0 to 13 at intervals of 3.

CHAPTER 4 RESULTS - TOP RANKING CLASSIFIERS

In order to identify patterns of immune cell abundances that correlate with the onset of aGvHD and cGvHD, the temporal analysis pipeline was applied to the qualified subsets of immune cells, comparing samples taken from the aGvHD and the non-GvHD patients, and samples taken from the seven aGvHD & cGvHD and the nine aGvHD-only patients. Top ranking classifiers with potential discriminative patterns predicting the onset of aGvHD and cGvHD are described in sections 4.1 and 4.2 respectively.

4.1 Classifiers for the onset of acute graft versus host disease

Patient #17, a non-GvHD patient, was omitted from the FLDA analysis due to a lack of available data within the selected time ranges. However, this patient's data, where available, were included in the raw data time plots. Only top ranking classifiers from the proportion dataset using samples taken between 7 and 21 days post-transplant are described below (Table 4.1). All others are described in Appendix G.
The complete validation results for all subsets of immune cells in each time range are listed in Tables H.1 - H.3 for the proportion dataset and Tables H.4 - H.6 for the concentration dataset. The time range after BMT (7 to 21 days post-transplant) was selected to exclude both the day of BMT and the period from 21 to 28 days post-transplant, when the aGvHD diagnosis rate increased rapidly (Figure 4.1), so that the top classifiers may be used for aGvHD prediction.

Table 4.1 Validation results for the top ranking subsets of immune cells and their related cell populations from the FLDA classification of different subsets of aGvHD patients vs. the non-GvHD patients, using samples taken between 7 and 21 days post-transplant (Sens. = sensitivity; Spec. = specificity; nd = not done due to lack of data).

                                            aGvHD           Grade II-IV     Grade III-IV
Immune cells           Aliquot              Sens.   Spec.   Sens.   Spec.   Sens.   Spec.
CD2dimCD16+CD56+CD3-   NK cells             90%     100%    82%     100%    92%     67%
CD3+CD4+CD8β+          T cells              86%     100%    82%     100%    92%     100%
CD3+CD4int             2Activation          81%     100%    82%     100%    83%     100%
CD3+CD4+CD8β+CD8+      T cells              71%     100%    76%     100%    83%     100%
CD3+                   1Activation          90%     33%     94%     33%     92%     33%
CD3+                   2Activation          86%     33%     94%     33%     92%     33%
CD3+CD4+               rest/act T helper    nd      nd      nd      nd      nd      nd
CD3+CD8βlowCD8-        T cells              90%     0%      82%     67%     83%     67%
CD3+CD8β+CD4-          T cells              81%     33%     76%     33%     75%     33%
CD3+CD8+CD8β-          T cells              81%     33%     76%     33%     83%     33%
CD3+CD4+CD8β-          T cells              90%     33%     100%    33%     100%    0%
CD3+CD8β+CD8+          T cells              81%     33%     76%     33%     75%     33%

Figure 4.1 Cumulative distribution of the aGvHD diagnosis days post-transplant, with the selected time range between 7 and 21 days post-transplant labelled.

4.1.1 Inconsistent classifier by missing values

The FLDA classifier built on the immune cells CD2dimCD16+CD56+CD3- was estimated to have the highest sensitivity and specificity (Table 4.1). The FLDA estimated signals exhibited a very clear separation between the aGvHD and the non-GvHD patients at seven days post-transplant (Figure 4.2a).
However, the separation around seven days post-transplant between the aGvHD and the non-GvHD patients was not observed in the raw data time plot of the same time range, because no data were available from the non-GvHD patients between seven and ten days post-transplant (Figure 4.2b). In the extended raw data time plot, the proportion values from two out of three non-GvHD patients were as high as the values from most aGvHD patients (Figure 4.3). Unlike all other top ranking classifiers described below, this subset of immune cells did not display a consistent pattern in its extended raw data time plot.

Figure 4.2 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 7 and 21 days post-transplant for the immune cells CD2dimCD16+CD56+CD3- in proportion to PBMC.

Figure 4.3 Raw data time plot for immune cells CD2dimCD16+CD56+CD3- in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data were analyzed via FLDA.

4.1.2 CD3+CD4+CD8β+(CD8+)

The FLDA classifier built from the immune cells CD3+CD4+CD8β+ was identified as one of the top ranking classifiers, with estimated sensitivity and specificity both higher than 70% in two time ranges: 7 to 21 days post-transplant (Table 4.1) and 21 to 0 days prior to aGvHD diagnosis (Table H.2). Estimated sensitivity and specificity increased in the supplementary comparisons between moderate or severe aGvHD and non-GvHD patients (Table 4.1). The FLDA estimated signals time plot (Figure 4.4) displayed a pattern of higher PBMC proportion values from the aGvHD patients compared to values from the non-GvHD patients. A similar pattern was also observed in the FCM data in contour graphs of CD4 against CD8β intensities (Figure 4.5).
In the extended raw data time plot (Figure 4.6), all but one aGvHD patient had higher values and greater fluctuation than the non-GvHD patients within the time range from 0 to 120 days post-transplant. Patient #25, who was diagnosed with grade I aGvHD at 44 days post-transplant, had CD3+CD4+CD8β+ proportion values lower than 0.5% from 0 to 50 days post-transplant. There were two sudden increases in the CD3+CD4+CD8β+ proportion for patient #6's samples taken at 53 and 90 days post-transplant (Figure 4.6). They were the result of minimal amounts of viable cells in the aliquots (data not shown). Similar incidences were observed in the immune cells CD3+CD4int described in section 4.1.3.

A new subpopulation was gated within the immune cells CD3+CD4+CD8β+ to obtain abundance readings for a new immune cell population, CD3+CD4+CD8β+CD8+ (Figure 4.7). The FLDA classifier from this new subset of immune cells had an estimated 71% sensitivity and 100% specificity (Table 4.1), and displayed a pattern similar to that of its parent population in both the raw data and FLDA signal time plots (Figure 4.8). All other related immune cell populations that were positive in only one of the CD4 or CD8/CD8β markers had a lower estimated sensitivity and specificity (Table 4.1) and did not exhibit a discriminative pattern between the two patient groups (Figure 4.12).

Figure 4.4 FLDA estimated signals time plot based on samples taken between 7 and 21 days post-transplant for immune cells CD3+CD4+CD8β+ in proportion to PBMC.

Figure 4.5 FCM contour graphs of transformed CD4 and CD8β marker measurements for a non-GvHD patient (#4) and an aGvHD patient (#27) between zero and three weeks post-transplant. The CD3+CD4+CD8β+ population is gated within the double positive gate.

Figure 4.6 Raw data time plot for immune cells CD3+CD4+CD8β+ in proportion to PBMC, based on samples taken between 0 and 120 days post-transplant. The purple striped box indicates the time range where data were analyzed via FLDA.

Figure 4.7 An example of sequential gating of the existing cell population CD3+CD4+CD8β+ (red gates, panels a, b, and c) to identify a new immune cell population CD3+CD4+CD8β+CD8+ (panel d).

Figure 4.8 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 7 and 21 days post-transplant for the new immune cell population CD3+CD4+CD8β+CD8+ in proportion to PBMC.

4.1.3 CD3+CD4int

The FLDA classifier built using the immune cells CD3+CD4int (aliquot '2Activation') had an estimated 71% sensitivity and 100% specificity (Table 4.1). The time plots of the FLDA estimated signals (Figure 4.9a) and the raw data (purple striped area, Figure 4.9b) exhibited patterns similar to those of the immune cells CD3+CD4+CD8β+ (Figure 4.4). In the FLDA estimated signals time plot, the aGvHD patients had higher proportion values of this subset of immune cells than the non-GvHD patients, and the main separations were found around 7 and 14 days post-transplant. This pattern persisted in the raw data time plot from 0 to 100 days post-transplant (Figure 4.9b). There were also two peaks from patient #6's samples at 53 and 90 days.
At 39 days post-transplant, the proportion value was 1%. It increased to 26% at 53 days post-transplant and returned to 2.6% at 60 days post-transplant. After a relatively flat pattern between 60 and 83 days post-transplant, the value increased again to 25% at 90 days post-transplant. Similar peaks from patient #6 were also observed in the immune cells CD3+CD4+CD8β+ (Figure 4.6). The corresponding FCM data from samples taken around the aforementioned time points were examined. Aliquot 'T cells' from the samples taken at 53 and 90 days post-transplant exhibited a very different pattern, with fewer live cells within the gate in both the FSC-SSC scatter plot and the CD3-PerCP histogram, compared to the samples taken before (46 days post-transplant) or after (60 days post-transplant) the sudden peaks (Figure 4.10).

Figure 4.9 Time plot of the FLDA estimated signals (panel a) based on samples taken between 7 and 21 days post-transplant and time plot of the raw data (panel b) based on samples taken between 0 and 100 days post-transplant for the immune cells CD3+CD4int in proportion to PBMC (aliquot '2Activation'). The purple striped box indicates the time range where data was analyzed via FLDA. [x-axis: days post-transplant.]

Figure 4.10 FCM data in scatter plots of FSC vs. SSC and histograms of CD3-PerCP intensity from patient #6, aliquot 'T cells', from samples taken at 46, 53, and 60 days post-transplant. [Panels: Day 46, Day 53, Day 60.]

It was also noted that two other subsets of immune cells, CD3+ and CD3+CD4+, which represent immune cell populations closely related to cells with the phenotype CD3+CD4int, did not exhibit discriminative patterns between the aGvHD and the non-GvHD patients. Multiple readings from the CD3+ immune cell population all had approximately 86% sensitivity but only 33% specificity (Table 4.1). In the time plot of the CD3+ immune cell population (Figure 4.11), the proportion values were high in both the aGvHD and non-GvHD patients. 
The subset of immune cells CD3+CD4+ was not analyzed via FLDA because of insufficient data. Regardless, its raw data time plot did not exhibit a discriminative pattern (Figure 4.12). All four subsets of immune cells, CD3+CD4+CD8β+ (Figure 4.6), CD3+CD4int (Figure 4.9), CD3+ (Figure 4.11), and CD3+CD4+ (Figure 4.12), exhibited a rapid decrease in their proportion values between 7 and 21 days post-transplant, followed by an increase. This common trend was more apparent in the latter two subsets. However, it should be noted that the trend was present in most immune cell populations identified in the present study (data not shown).

Figure 4.11 Raw data time plot for immune cells CD3+ (aliquot '1Activation') in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA. [x-axis: days post-transplant.]

Figure 4.12 Raw data time plot for immune cells CD3+CD4+ (aliquot 'rest/act T helper') in proportion to PBMC based on samples taken between 0 and 100 days post-transplant. The purple striped box indicates the time range where data was analyzed via FLDA. [x-axis: days post-transplant.]

4.1.4 Static sample size analysis

The FLDA classifier built using the immune cells CD3+CD4+CD8β+ was the best classifier, with a consistent pattern observed in both the FLDA estimated signals and the raw data time plots (Figures 4.4 and 4.6). Values obtained closest to 21 days post-transplant, when the accounted absolute weight value was largest, were used for the static sample size calculation. Even though the FLDA weight value was the largest at seven days post-transplant, the group separation observed around 21 days post-transplant was deemed more reliable because there was no available data from non-GvHD patients between seven and ten days post-transplant. 
Different sizes of the simulated aGvHD and non-GvHD datasets were tested. The present study compared data between 21 aGvHD and three non-GvHD patients and had an estimated 29% power at 90% confidence level. The unbalanced risk of aGvHD development among HSCT patients severely compromised the analytical power. In order to achieve a study with 82% power at 90% confidence level, approximately 38 aGvHD and 18 non-GvHD patients will be required (Table 4.2).

Table 4.2 Estimated power of study via the static sample size calculation using CD3+CD4+CD8β+ proportion values from samples taken closest to 21 days post-transplant.

aGvHD patients   Non-GvHD patients   Power estimated      Power estimated         Average power
required         required            from aGvHD (α<0.1)   from non-GvHD (α<0.1)   (α<0.1)
21               3                   29%                  48%                     39%
20               10                  49%                  77%                     63%
30               15                  62%                  92%                     77%
38               18                  69%                  95%                     82%
40               20                  73%                  96%                     85%
42               21                  73%                  97%                     85%
46               23                  77%                  98%                     87%
48               24                  77%                  99%                     88%
50               25                  79%                  99%                     89%
52               26                  81%                  99%                     90%

4.2 Classifiers for the onset of chronic graft versus host disease

Only top ranking classifiers from the proportion dataset using samples taken between 21 and 0 days prior to aGvHD diagnosis (Table 4.3) are described below. All others are described in Appendix I. The complete validation results for all subsets of immune cells in each time range are listed in Tables J.1 - J.3 for the proportion dataset and Tables J.4 - J.6 for the concentration dataset.

4.2.1 Inconsistent classifiers by pattern outlier

Even though there were more FLDA classifiers with high sensitivity and specificity for the onset of cGvHD compared to aGvHD (Chapter 4), only a fraction of the top ranking classifiers exhibited comparable patterns in both FLDA estimated signals and raw data time plots. From the time range of 21 to 0 days prior to aGvHD diagnosis, all the subsets of immune cells with putative discriminative patterns in their raw data time plots exhibited opposite FLDA signal patterns between groups. 
All other top classifiers were deemed inconsistent due to the presence of pattern outliers (Table 4.3). The classifier built using the immune cells 45RA+CD3+ had an estimated 71% sensitivity and 86% specificity. However, there was no clear separation between most of the individual FLDA estimated signals (Figure 4.13a). Only two patients (#6 and #12) had proportion values above 30% between 20 and 7 days prior to aGvHD diagnosis (Figure 4.13b). These values caused the overall FLDA global base values (cross dots, Figure 4.13a) to rise, thus separating the two groups.

Table 4.3 Validation results for the top ranking subsets of immune cells from the FLDA classification between the aGvHD & cGvHD and aGvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis.

Immune cells            Aliquot                  Sensitivity   Specificity   Accuracy   Pattern type
CD45+CD33-CD15+CD14-    Myeloids                 71%           100%          88%        opposite FLDA signals
45RO+CD3-CD4dim         rest/act T helper        86%           86%           86%        opposite FLDA signals
45RACD3-                rest/act T suppressor    86%           86%           86%        opposite FLDA signals
CD3-                    3Activation              71%           89%           81%        opposite FLDA signals
45RACD3-CD4dim          rest/act T helper        86%           71%           79%        opposite FLDA signals
45RACD3-                rest/act T helper        71%           86%           79%        opposite FLDA signals
CD3-                    rest/act T helper        86%           71%           79%        opposite FLDA signals
CD3-                    rest/act T suppressor    71%           86%           79%        opposite FLDA signals
CD3CD8-                 rest/act T suppressor    71%           86%           79%        opposite FLDA signals
CD3-                    2Activation              71%           78%           75%        opposite FLDA signals
CD3-                    T cells                  71%           78%           75%        opposite FLDA signals
CD3+                    rest/act T helper        71%           71%           71%        opposite FLDA signals
CD3+                    rest/act T suppressor    71%           71%           71%        opposite FLDA signals
45RA+CD3+               rest/act T helper        71%           86%           79%        pattern outlier
CD4dim                  rest/act T helper        86%           71%           79%        pattern outlier
45RA+CD3+               rest/act T suppressor    71%           86%           79%        pattern outlier
CD3-44+25-              1Activation              71%           78%           75%        pattern outlier
CD3-CD4dim              3Activation              71%           78%           75%        pattern outlier

Figure 4.13 Time plot of the FLDA estimated signals (panel a) and raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells 45RA+CD3+ in proportion to PBMC (%). [Legend: aGvHD & cGvHD, aGvHD only; x-axis: days from aGvHD diagnosis.]
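The sensitivity, specificity, and accuracy columns of Table 4.3 follow the usual confusion-matrix definitions. The sketch below is only an illustration: the function name is hypothetical, and the counts (5 of 7 aGvHD & cGvHD patients and 9 of 9 aGvHD only patients classified correctly) are assumed values chosen because they are consistent with the first row of the table and with the group sizes given in section 4.2.3; the per-patient classification results themselves are not reported here.

```python
def classification_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)   # correctly classified positives
    specificity = true_neg / (true_neg + false_pos)   # correctly classified negatives
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = (true_pos + true_neg) / total
    return sensitivity, specificity, accuracy


# Assumed counts (not taken from the thesis): 5 of 7 positives, 9 of 9 negatives.
sens, spec, acc = classification_metrics(true_pos=5, false_neg=2, true_neg=9, false_pos=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With only seven and nine patients per group, each misclassified patient moves sensitivity by about 14 percentage points and specificity by about 11, which is one reason the small cohort makes the estimates coarse.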
4.2.2 Opposite estimated signals between groups

All 13 subsets of immune cells exhibiting consistent patterns between their FLDA estimated signals and raw data time plots displayed exactly opposite FLDA signal patterns between the two patient groups. The top two classifiers exhibiting this pattern were CD45+CD33-CD15+CD14- and 45RO+CD3-CD4dim. The FLDA signals of the two patient groups were the exact opposite of each other (Figures 4.14a and 4.15a). However, this pattern could not be easily identified in the local or extended raw data time plots for either subset of immune cells (Figures 4.14b and 4.15b).

4.2.3 Static sample size analysis

The FLDA classifier built using the immune cells 45RO+CD3-CD4dim, based on samples taken between 21 and 0 days prior to aGvHD diagnosis, had the highest sensitivity (86%) and second highest specificity (86%) among the consistent top ranking classifiers (Table 4.3). In this case, the largest and most reliable group separation was determined to be around 7 days prior to aGvHD diagnosis. Consequently, values obtained closest to that time were used for the static sample size calculation, with equal sizes for the aGvHD & cGvHD and aGvHD only simulated datasets (Table 4.4). The present study, with seven aGvHD & cGvHD and nine aGvHD only patients, had an estimated 50% power at 90% confidence level. In order to achieve a study with 81% power at 90% confidence level, approximately 23 aGvHD & cGvHD and 23 aGvHD only patients will be required.

Figure 4.14 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD diagnosis and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD45+CD33-CD15+CD14- in proportion to PBMC. The aGvHD diagnosis day is labelled as day 0. [x-axis: days from aGvHD diagnosis.]
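The idea behind a simulation-based power estimate such as the one summarized in section 4.2.3 can be sketched as follows. This is not the calculation used in the thesis, whose details are not given in this section: the sketch assumes normally distributed proportion values, uses a rank-sum test with a normal approximation (rough for very small groups), and all means, standard deviations, and function names are hypothetical.

```python
import random
from statistics import NormalDist


def rank_sum_p_value(x, y):
    """Two-sided Mann-Whitney p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    # U statistic: number of (x, y) pairs in which x is larger; ties count as half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(n1, n2, mean1, mean2, sd, alpha=0.1, n_sim=1000, seed=0):
    """Fraction of simulated studies in which the group difference is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        group1 = [rng.gauss(mean1, sd) for _ in range(n1)]
        group2 = [rng.gauss(mean2, sd) for _ in range(n2)]
        if rank_sum_p_value(group1, group2) < alpha:
            hits += 1
    return hits / n_sim


# Hypothetical effect size: group means of 2.0% vs 1.0%, common SD of 1.0%.
small_study = estimate_power(21, 3, 2.0, 1.0, 1.0)
larger_study = estimate_power(38, 18, 2.0, 1.0, 1.0)
print(small_study, larger_study)
```

Under these assumptions the estimated power rises as both groups grow, which mirrors the qualitative behaviour of the power tables, although the numbers produced by this sketch are not comparable to the reported values.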
Figure 4.15 Time plot of the FLDA estimated signals (panel a) and raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells 45RO+CD3-CD4dim in proportion to PBMC (%). [Legend: aGvHD & cGvHD, aGvHD only; x-axis: days from aGvHD diagnosis.]

Table 4.4 Estimated power of study via the static sample size calculation using 45RO+CD3-CD4dim proportion values from samples taken closest to 7 days prior to aGvHD diagnosis.

aGvHD & cGvHD       aGvHD only          Power estimated from      Power estimated from   Average power
patients required   patients required   aGvHD & cGvHD (α<0.1)     aGvHD only (α<0.1)     (α<0.1)
7                   9                   67%                       34%                    50%
10                  10                  78%                       35%                    56%
15                  15                  91%                       49%                    70%
20                  20                  97%                       58%                    77%
23                  23                  98%                       63%                    81%
25                  25                  99%                       68%                    83%
30                  30                  100%                      74%                    87%
35                  35                  100%                      79%                    90%
40                  40                  100%                      84%                    92%
45                  45                  100%                      88%                    94%
50                  50                  100%                      91%                    95%
60                  60                  100%                      95%                    97%

CHAPTER 5 DISCUSSION

For many patients diagnosed with hematopoietic disorders, HSCT is the only curative treatment [1]. However, the risk of developing fatal GvHD is the major limiting factor for broader application of the HSCT procedure [1]. Currently, there is no definitive diagnosis method, no standard for treatment or treatment assessment, and very little understanding of the disease's pathophysiologic mechanism. High-throughput genomic experiments have been useful in elucidating many diseases or conditions [79, 80, 108, 109]. Previous microarray studies have suggested multiple gene expression patterns associated with the onset of GvHD; however, none were found to be statistically significant [110-114]. Proteomic methods such as surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF) [115] and capillary electrophoresis coupled mass spectrometry (CE-MS) [116] have been utilized in studying GvHD [117, 118]. 
Both were pilot studies, and further work is needed to link the peptides identified with known proteins in order to infer their role in the immune system and GvHD manifestation. Compared to genomic methods such as microarrays, proteomic methods and FCM have the advantage of visualizing physical characteristics of cells, such as protein functions, directly. The main hypothesis of the present study was that one or more immune cell populations with differential temporal patterns that correlate with the onset of either aGvHD or cGvHD could be identified and potentially be used to predict the disease. The present dataset had the complexity of microarray data, with a large number of immune cell population abundances that were screened for their potential discriminative power for either aGvHD or cGvHD. The present study was a pilot project with the main objective of assembling a temporal analysis pipeline for the high-throughput clinical FCM dataset. To the best of my knowledge, there is no existing temporal analysis pipeline purposely designed for large-scale FCM data. Consequently, the majority of the discussion is devoted to experimental and analytical difficulties of the present study and corresponding improvements for a future one. In sections 5.1 to 5.3, obstacles from each step of the analysis pipeline (Figure 2.1), namely QA, data transformation, and temporal classification, are discussed. Possible predictive models and pathophysiologic mechanisms for aGvHD and cGvHD are then examined in section 5.4. A list of specific recommendations to improve the efficiency of future studies, in which the current GvHD models will be validated, is discussed in section 5.5.

5.1 Quality assurance

QA is an essential step in the analysis of any high-throughput dataset [119-121], probably more so in the case of clinical data with limited samples. 
The assumption of the QA test used in this study was that distributions of common intensities from different aliquots of the same sample should be similar [3]. Two aliquots were identified as outliers in the QA test on ungated data (Figure 3.1), whereas 29 aliquots were identified as outliers in the QA test on the gated CD3+ and CD3- live MNC populations (Table 3.1). Among the outliers, I observed intensity shifts (Figures 3.4 and 3.5), differences in density or ECDF shape (Figures 3.2 and 3.3), or a combination of both (Figure 3.6). A simple intensity shift might indicate a different concentration of reagents during the staining procedure in the well corresponding to the outlier aliquot [3]. However, the sources of other distribution differences were less understood. While further study is required to investigate the precise causes of the outliers and unusual trends discussed below, they indicated potential complications with the FC-HCS technique [2]. In the end, based on the QA test results, approximately 1.8% of the dataset was removed from subsequent analyses because the differences observed might be experimental rather than biological in origin.

5.1.1 Quality assurance on ungated and gated data

Analyzing ungated and gated data each had its own advantages. Ungated data offered QA assessment without the interference of manual gating. On the other hand, the QA test on gated data provided an assessment of the gate quality. In addition, the QA test on the gated data provided an assessment of the population data that were used in the subsequent FCM analyses. There was no overlap between the outliers identified in the two parts of the QA test. For QA visualization, FSC was observed to have more informative patterns than SSC in the QA test of ungated data [3], while FSC and SSC displayed similar patterns and were both useful in the QA test of gated data (Figures 3.2 and 3.3). 
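The QA assumption stated in section 5.1, that aliquots of the same sample should have similar intensity distributions, can be illustrated with a simple ECDF comparison. The sketch below is not the QA test of [3], which relied on visual inspection of density and ECDF plots; it swaps in the maximum vertical distance between two ECDFs (the Kolmogorov-Smirnov statistic) as a stand-in, and the aliquot names, intensity values, and the 0.5 threshold are all hypothetical.

```python
from bisect import bisect_right
from statistics import median


def ecdf_distance(a, b):
    """Maximum vertical gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gaps = []
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gaps.append(abs(cdf_a - cdf_b))
    return max(gaps)


def flag_outlier_aliquots(aliquots, threshold=0.5):
    """Flag aliquots whose median ECDF distance to the other aliquots exceeds the threshold."""
    flagged = []
    for name, values in aliquots.items():
        distances = [ecdf_distance(values, other)
                     for other_name, other in aliquots.items() if other_name != name]
        if median(distances) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical FSC intensities for four aliquots of one sample; "D" is shifted.
aliquots = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.1, 3.1, 4.1, 5.1],
    "C": [0.9, 1.9, 2.9, 3.9, 4.9],
    "D": [11.0, 12.0, 13.0, 14.0, 15.0],
}
print(flag_outlier_aliquots(aliquots))  # ['D']
```

A numeric criterion of this kind would complement, not replace, the visual checks, and any threshold would have to be calibrated on data with known experimental variation.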
FSC and SSC, which are strongly influenced by cell size and granularity respectively, are often used together in FCM gating to identify and exclude dead cells from further analyses (Figure 1.1a). Dead cells and debris that were excluded in the gated data have a very broad SSC intensity range that overlaps with the relatively narrow SSC intensity range of MNC (Figure 5.1). Thus, the FSC intensity was more informative than the SSC intensity in the ungated data because more variation was observed between different cell types. Many unusual patterns might only be visible after removal of dead cells and debris via the gating procedure. Visualization using the CD3 intensity proved to be the least informative in outlier identification, as more variation was expected and observed because of the sensitivity of the antibody and the limited number of aliquots available (Figure F.1).

Figure 5.1 A pictorial example of an FSC vs. SSC dot plot from a normal peripheral blood sample (adapted from [122]). [Labelled regions: dead cells & debris, neutrophils, monocytes, lymphocytes.]

Outliers and other unusual patterns were frequently found in either the CD3+ or the CD3- population but rarely in both (Table 3.1). These observations could be related to the fact that the CD3+ and CD3- gates represented two different immune cell populations. The gated data only included live PBMCs, which were divided into CD3+ and CD3- populations. CD3 and TCR are exclusively expressed on 70% to 80% of peripheral blood T cells [123]. Thus, the CD3+ and CD3- populations represented T cells and non-T cells among the PBMC populations. Future studies are needed to determine why these two cell populations behave differently and whether a certain cell population is more prone to experimental errors. The trends observed among the outliers (Table 3.1) and unusual patterns (Tables 3.2 and 3.3) indicated possible non-random plating effects. 
Many outliers were mapped to aliquots plated close to each other or to clusters of aliquots in the middle of the plate (Table 3.4). These trends potentially indicate:

1. Improper washing leading to false readings from clusters of wells and wells in the middle;
2. Contamination affecting multiple wells next to each other;
3. Edge drying causing false readings from wells at the edges;
4. Different reagent or cell concentrations among wells; and
5. Different logarithmic compensations.

The unusual pattern from the two rest/act aliquots (Table 3.3 and Figure 3.8), which were often mapped to a separate plate, may also suggest noticeable differences in readings from different plates or effects of different sample storage time. Further examination of the FCM gates from the FSC-SSC contour graphs (Figure 3.7) indicates that the occurrences (Table 3.2) of unusually large variations (Figure 3.6) among all aliquots could result from interference of dead cells or a minimal amount of viable cells in patient samples. In a sample with a minimal amount of viable cells, the proportion of any subgate may be incorrect because there are not enough cells in the sample to represent the overall population properly.

5.1.2 Quality assurance via raw data time plots

Raw data time plots used in the visualization of FLDA classification may also be used as an additional QA test. Biologically, it is impossible to have an abrupt increase in either PBMC proportion or cell concentration such as the two peaks observed from patient #6 at 53 and 90 days post-transplant (Figures 4.6 and 4.9b). Upon visual inspection of the gated FCM data (Figure 4.10), I discovered that these abrupt increases were the result of an experimental error, likely from a minimal amount of viable cells in the FSC-SSC gate. While the QA test via raw data time plots could be very useful in identifying experimental errors, it would require long time-series data. 
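A time-plot QA check of this kind could be automated by flagging samples whose value rises implausibly fast relative to the previous sample. The sketch below assumes a fold-change criterion with a hypothetical threshold of five; no biologically justified threshold is established in this study. It uses only the four CD3+CD4int proportion values for patient #6 quoted in section 4.1.3, so the samples taken between 60 and 83 days, whose values are not reported, are omitted.

```python
def flag_abrupt_increases(days, values, max_fold_increase=5.0):
    """Return the sampling days whose value exceeds the previous sample by more
    than max_fold_increase (a hypothetical, not biologically validated, limit)."""
    flagged = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if previous > 0 and current / previous > max_fold_increase:
            flagged.append(days[i])
    return flagged


# Patient #6, CD3+CD4int proportion of PBMC (%), values quoted in section 4.1.3.
days = [39, 53, 60, 90]
proportions = [1.0, 26.0, 2.6, 25.0]
print(flag_abrupt_increases(days, proportions))  # [53, 90]
```

A fold-change rule ignores the time elapsed between samples; a rate per day would be a natural refinement once realistic rates of immune cell expansion are known.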
In addition, implementation of this QA test on large-scale data would require further studies on the rate of immune responses in order to establish a biologically justified threshold for the rate of increase.

5.1.3 Robustness of the flow cytometry high content screening technique

Unfortunately, the trends mentioned above were not always consistent in their distribution across the plates. There was only enough evidence to suggest possible plating effects, not to confirm them. Further studies are required to investigate the robustness of the FC-HCS technique [2], to elucidate the precise causes of these outliers, and to improve the present QA test procedure. Preferably, future studies would use a larger quantity of samples from healthy individuals, which would provide a larger number of aliquots for outlier identification and a lower likelihood of samples with minimal viable cells. Frequencies of outliers observed in different antibody-fluorochrome intensities, in different locations within a plate, and between plates could be used to validate the current results. Furthermore, a larger number of aliquots may be used to determine the overall experimental variation among aliquots. Statistical tests such as the analysis of variance and visualizations such as box plots [124], used in addition to the existing visualization methods for outlier identification, could potentially identify bias caused by the current manual visualization. Fortunately, some of the potential causes of these outliers, such as differences in reagent concentrations and different sample storage times between plates, can be easily avoided with an organized experiment design and a smooth instrumental pipeline. In addition, a simple procedure of random plating, as discussed in section 5.5.1, may be used to counter these potential plating effects.

5.2 Data issues

5.2.1 Patients

The present dataset consisted of a heterogeneous group of patients (Table 2.1). 
Two patient grouping comparisons using prior GvHD diagnosis knowledge were selected to train FLDA classifiers. The comparisons were also designed to conserve the study population where the main factor was the onset of aGvHD or cGvHD. The first patient group comparison, between aGvHD and non-GvHD patients, was devised to identify subsets of immune cells with patterns that correlate with the onset of aGvHD. All 21 patients who were diagnosed with aGvHD were included. However, only four out of seven patients not affected by aGvHD or cGvHD were included (Table 2.1). Three patients who were not diagnosed with aGvHD prior to their death before 100 days post-transplant were omitted from the analyses. This strict selection was chosen because I could only be certain that patients would not have developed aGvHD if there were information available past 100 days post-transplant, by which time most aGvHD diagnoses are made. The 100 days post-transplant mark is generally recognized as the cut-off for aGvHD diagnosis; however, it is possible to diagnose aGvHD after 100 days post-transplant [39]. Please note that one of the remaining four non-GvHD patients was often omitted from the FLDA analysis because of a lack of data. The second patient group comparison, between aGvHD & cGvHD and aGvHD only patients, was devised to identify subsets of immune cells with patterns that correlate with the onset of cGvHD, which occurred weeks or months after the diagnosis of aGvHD. Among the 21 patients who were diagnosed with aGvHD, seven patients were later diagnosed with cGvHD and were included in the aGvHD & cGvHD dataset. However, only nine out of the remaining 14 were considered as aGvHD only patients, because I could not be sure that the patients who died or withdrew from the study after their aGvHD diagnoses (n=5) would not have developed cGvHD. De novo cGvHD patients were not considered. While diagnoses for both aGvHD and cGvHD are not definitive, a retrospective study in cGvHD diagnosis was 
performed by Vogelsang and colleagues in 2001 and found 25% misdiagnoses on active cGvHD [125]. Errors from incorrect patient grouping, arising from either a false cut-off diagnosis time or misdiagnoses [35, 125], could be exaggerated in the present study due to the limited number of patients available and could cause inconsistent classifiers. These exaggerated inconsistent classifiers may be avoided with an external test dataset with an adequate number of patients. However, the sensitivity and specificity of a new diagnostic model created using a dataset with potential misdiagnoses will be limited by the accuracy of the present diagnostic methods. Tolerance to misdiagnoses is discussed further in section 5.5.6.

5.2.2 Sampling time ranges

The three time ranges were selected to present patterns before and during the full clinical manifestation of aGvHD. These patterns were in turn analyzed via FLDA in order to identify immune cell populations that can predict either aGvHD or cGvHD. I decided that the time range most suitable for predicting the onset of aGvHD was between 7 and 21 days post-transplant. Classifiers found in this time range should be useful in predicting aGvHD because only four out of the total 21 aGvHD patients were diagnosed prior to 21 days post-transplant (Figure 4.1). The aGvHD diagnosis rate for the present study was comparable with previous studies, where most aGvHD diagnoses were made within the first 100 days and most prominently between 14 and 42 days post-transplant [15]. The other two time ranges, 21 to 0 days prior to aGvHD diagnosis and 0 to 21 days post-aGvHD diagnosis, were selected to reflect patterns occurring immediately before and after the aGvHD diagnosis. Molecular changes leading to or resulting from aGvHD may contribute to cGvHD manifestation at a later date, as cGvHD may be a continuation of aGvHD [36]. 
For predicting the onset of cGvHD, the time range before the aGvHD diagnosis was selected because predictions would not be confounded by different aGvHD treatments. All three time ranges were purposely designed to be short in order to avoid loss of synchronization and smoothing requirements.

5.2.3 Proportion and concentration flow cytometry datasets

Both the proportion and concentration datasets were tested because they might contribute different insights into the immune responses. Previous GvHD studies have used both proportion (either to PBMC or chimerism) [30] and concentration values [27, 28]. However, more errors and thus more inconsistent classifiers were expected in the concentration datasets because different samples, sometimes taken at different times, were used to estimate the immune cell concentration. These errors could be avoided in future studies with a coordinated sample quantity standard.

5.3 Temporal analysis

Static analyses using rates of immune cell population changes from patients at multiple time points were performed. The rates of change were extensively screened by a combination of dimension reduction via between group analysis [126] and hierarchical clustering via hierarchical ordered partitioning and collapsing hybrid [127]. However, the static approach failed to analyze the current dataset properly because of missing values, lack of synchronization events, and diverse patient response times (Table 2.1, Appendix A). Because of these shortcomings, I undertook a temporal approach for the present study. While temporal analysis has been suggested to be more efficient in analyzing biological processes occurring across time [46], there are a limited number of available algorithms. During my temporal analysis investigation, I encountered three main challenges in adapting a suitable temporal analysis method for the current dataset:

1. Tolerance for missing values and non-uniform sampling time
2. Short vs. long time-series data
3. Limited number of samples

Using an excerpt of the current dataset and combinations of basis order and knot placements, I determined that a B-spline with a linear basis and weekly knot placements was most reflective of the raw data pattern (Figures 3.9 and 3.10). A B-spline fitted with basis order two best reflected the actual raw data, especially for a short time-series dataset such as the one used in the present study (Figure 3.9). Weekly knot placement was selected to fit the weekly sampled dataset because flexible knot placement was not compatible with the FLDA algorithm. While discrepancies between B-spline and raw data patterns were minimized, they could still exist and be exaggerated in a short time-series dataset with various sampling rates and missing values among the study patient population. Similar to most of the existing temporal algorithms, FLDA was intended for long time-series data with more than eight time points [128]. Yet it was difficult to analyze a time range longer than three weeks (assuming one time point per week) without the possible loss of synchronization. In addition, usage of long time-series data in the present clinical dataset, with diverse patient response times, would require potentially biased smoothing and registration procedures. As this was a pilot study, short time-series data were purposely selected. However, short time-series data highlighted the effects of missing values (Figure 4.2) and pattern outliers (Figure 4.13), resulting in inconsistent FLDA classifiers. While LOOCV might over-estimate the classification accuracy [76], it does reflect, to a certain degree, the overall stability of the classifiers. Unfortunately, the influence of pattern outliers on the FLDA global base values was still observed (Table 4.3 and Figure 4.13). There are many possible causes for these visually extreme pattern outliers in either the proportion or the concentration dataset, and they may be remedied by the improvements discussed in section 5.5. 
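A B-spline of basis order two is piecewise linear, so with weekly knots each basis function is a "hat" that equals one at its own knot and falls linearly to zero at the neighbouring knots. The sketch below only illustrates that basis; it is not the FLDA implementation. The knot positions (days 7, 14, and 21) follow the time range used for aGvHD prediction, while the function names and coefficient values are hypothetical.

```python
def linear_bspline_basis(t, knots):
    """Evaluate the order-two (piecewise linear) B-spline hat functions at time t.

    Returns one value per knot; every value is zero outside the knot range.
    """
    values = []
    for i, knot in enumerate(knots):
        left = knots[i - 1] if i > 0 else None
        right = knots[i + 1] if i < len(knots) - 1 else None
        if t == knot:
            values.append(1.0)
        elif left is not None and left < t < knot:
            values.append((t - left) / (knot - left))    # rising edge of the hat
        elif right is not None and knot < t < right:
            values.append((right - t) / (right - knot))  # falling edge of the hat
        else:
            values.append(0.0)
    return values


def evaluate_curve(t, knots, coefficients):
    """Fitted curve value: weighted sum of the basis functions at time t."""
    basis = linear_bspline_basis(t, knots)
    return sum(c * b for c, b in zip(coefficients, basis))


weekly_knots = [7, 14, 21]          # days post-transplant
coefficients = [2.0, 1.0, 3.0]      # hypothetical proportion values (%) at the knots
print(linear_bspline_basis(10.5, weekly_knots))          # [0.5, 0.5, 0.0]
print(evaluate_curve(10.5, weekly_knots, coefficients))  # 1.5
```

Because each curve is fully determined by one coefficient per weekly knot, a patient with a missing or off-schedule sample constrains the fit poorly, which is consistent with the sensitivity to missing values noted above.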
Slightly different LOOCV results and FLDA classifier patterns from redundant readings such as the CD3+ immune cell population (Table 4.1 and Appendices H & J) demonstrated the instability of the FLDA classification with the limited number of patients available to the present study. These errors may be remedied by an external validation with a large and separate testing dataset, as proposed for future studies (section 5.5). Ideally, continuous discriminative patterns between two groups of patients are preferred. However, visually clear discriminative patterns spanning a few days are sufficient to be identified by FLDA.

5.4 Predicting the onset of graft versus host disease

The biological motivation behind this study was to identify subsets of immune cells that may be used as molecular predictors of either aGvHD or cGvHD before the full clinical manifestation. All the top ranking classifiers and their corresponding subsets of immune cells might serve as potential GvHD diagnostic markers even if they do not correspond to known immune cell populations. One limitation is that, without knowledge of their function in the immune system, these subsets of immune cells could not be used to elucidate the GvHD pathophysiologic mechanism. It should also be noted that no correction for multiple testing was applied, so some classifiers may be incorrect; the findings discussed below must be validated in a future study (section 5.5).

5.4.1 Acute graft versus host disease

All the consistent top ranking classifiers for aGvHD were based on the proportion dataset (Tables 4.1 and H1-H3). The three top ranking classifiers from the concentration dataset were inconsistent due to missing values and pattern outliers (Appendix G). This was expected because there were more errors in the concentration dataset. Interestingly, all but two top ranking classifiers from the proportion dataset target CD3+ T cells or T cell subsets (Table 4.1). 
Apart from the inconsistent classifier CD2dimCD16+CD56+CD3- based on samples taken between 7 and 21 days post-transplant (Figure 4.2), the only top ranking classifiers targeting non-T cells (CD3-) were CD3- and CD2dimCD16+CD56-CD3- based on samples taken between 0 and 21 days post-aGvHD diagnosis. The CD3+ population and all of its subsets (Table 4.1) displayed similar patterns, with higher PBMC proportion values and greater fluctuation in the aGvHD patients than in the non-GvHD patients (Figures 4.4, 4.6, 4.8, 4.9, G.1, G.4, and G.5). Because all viable cells were divided into CD3+ and CD3- cell populations (Figure 4.7), in the proportion dataset the CD3- cell population displayed the exact opposite pattern, with higher PBMC proportion values in the non-GvHD patients (Figures G.2 and G.3). A higher proportion of CD3+ immune cells in the PBMC represents higher numbers of T cells in the peripheral blood, which could be the result of an inflammatory response toward the 'foreign' host tissues. Even though there is no precedent for a B-spline temporal pattern as a predictive model for GvHD, the observation of higher proportions of T cells after HSCT in the aGvHD patients is comparable with other studies [27-29]. The current findings combined with previous GvHD studies suggest that GvHD is a complex disease. While the critical involvement of T cells in aGvHD (Figure 1.2) is proven by significantly fewer aGvHD occurrences in T cell depleted BMT [20-26], the exact subset of T cells with a predictive pattern is yet to be identified. The most persistent correlation to the onset of aGvHD was observed from the immune cells CD3+CD4+CD8β+ and its subpopulation CD3+CD4+CD8β+CD8+ (Table 4.1). These two subsets of immune cells were higher and had greater fluctuation in the aGvHD patients than in the non-GvHD patients after BMT (Figures 4.4 and 4.8). This pattern was found to persist until 120 days post-transplant (Figure 4.6). 
FCM data in the contour graphs (Figure 4.5) confirmed the FLDA results. Interestingly, none of the related CD3+ immune cell populations with the presence of CD4 or CD8/CD8β, but not both, exhibited a similar pattern or had high estimated sensitivity and specificity (Table 4.1). A future study with sufficient power (Section 5.5.2) will need to determine the validity of the classifiers CD3+CD4+CD8β+(CD8+) as predictors of aGvHD.

The two subsets of immune cells CD3+CD4+CD8β+(CD8+) target cell populations that co-express CD4, CD8αβ heterodimers and CD8αα homodimers. These specific phenotypes might contain an unusual group of double positive (DP) T cells and putatively suggest that the key T cell subtype for the prediction and development of aGvHD could be this unusual T cell subset. This also explains why the CD3+ and CD3- immune cell populations were not identified as top classifiers based on samples taken between 7 and 21 days post-transplant (Table J.1). The large CD3+ proportion values observed in both patient groups right after BMT (Figure 4.11) could be the result of residual recipient T cells, which are known to survive the preparative treatments [129,130]. If so, there would only be a minimal impact on the DP T cells because of their low abundance, and because they may not exist in the recipient prior to the BMT procedure [131-133].

The most prominent theory on T cell maturation suggests that T cell maturation is limited to the thymus [133] (Figure 5.2). After the intense screening for MHC restriction and self-tolerance, more than 95% of immature DP T cells are killed via apoptosis. The remaining cells develop into mature single positive T cells (either CD3+CD4-CD8+ or CD3+CD4+CD8-) and are exported into peripheral blood. Consequently, DP T cells are not normally expected to occur in peripheral blood. However, this expectation is contradicted by many reports of peripheral DP T cells in humans [134-140].
The proportion values of DP T cells observed in the present dataset from the non-GvHD patients agreed with previous studies reporting that most healthy individuals had less than 3% peripheral DP T cells [134,137]. Increased DP T cells have been previously observed in older individuals [138,139] and individuals with viral infections [135,140].

[Figure 5.2 T cell development and maturation. Diagram labels: bone marrow, hematopoietic stem cells, lymphoid progenitor cells, thymus, double negative T cells, double positive T cells, mature T cells, peripheral blood.]

The origin and function of DP T cells are still not understood. Two DP T cell pathways have been proposed [131]: premature release from the thymus and extrathymic maturation [141-143]. Premature release of DP T cells from the thymus is more likely in an HSCT patient, where thymus damage from either the preparative treatments or aGvHD has been reported [37]. However, the DP T cell population observed in the present study (Figure 4.7) appears to express lower levels of CD4 than typical immature thymocytes [144]. Thus, it is more likely that the DP T cells observed are mature antigen specific cells of extrathymic origin [145] and may play a role in the aGvHD manifestation. DP T cells may consist of two or more functional subgroups [135, 146, 147]. Consequently, future studies are needed to define the activation and differentiation status of the DP T cell population using additional markers (Section 5.5.4).

The CD2dimCD16+CD56-CD3- classifier, though targeting non-T cells, exhibited a similar pattern, with higher PBMC proportion values in the aGvHD patients than in the non-GvHD patients between 0 and 21 days post-aGvHD diagnosis (Figure G.3). The combination of CD3- and CD16+ exclusively targets NK cells [148]. However, previous studies on NK cells only distinguished two major NK cell subsets, both usually associated with CD2+ or CD2br: CD56br and CD56dim [149,150].
The subset of immune cells CD2dimCD16+CD56-CD3- most likely targeted an NK cell subset similar to the highly dysfunctional NK subset CD56-CD16+ detected in HIV patients [151]. An in vitro functional study of the NK cell subset CD56-CD16+ [151] suggested that expansion of CD56- NK cells causes impaired NK cell function, with lower cytotoxic activity and cytokine production. Presently, there is no existing study on the CD56- NK cells and their possible role in GvHD development.

Another unknown cell type with the CD3+CD4int phenotype (Figures 4.9 and G.4) also exhibited a similar pattern to CD3+ cells based on samples taken between 0 and 21 days post-aGvHD diagnosis (Table H.3). The closest known T cell subtype with a similar phenotype is that of helper T cells (CD3+CD4+). Their main function in the immune response is to secrete cytokines responsible for proliferation and differentiation of T cells [133]. In the present study, CD3+CD4+ temporal patterns at any time range were not found to correlate with the onset of aGvHD (Figure 4.12). Further study is required to determine whether CD3+CD4int cells are a distinct immune cell population and what their functions are in the immune system.

5.4.2 Acute graft versus host disease prediction model using CD3+CD4+CD8β+

The FLDA classifier built using immune cells CD3+CD4+CD8β+ and samples taken between 7 and 21 days post-transplant had the highest sensitivity (86%) and specificity (100%) among the consistent classifiers. Classification of a new patient with sampled time points at 7, 14, and 21 days post-transplant can be made using the following model (Figure 5.3). Based on Equation 1.4, the linear discriminant value can be calculated with Equation 5.1.
[Figure 5.3 An example of FLDA classification using immune cells CD3+CD4+CD8β+ in proportion to PBMC. x-axis: days post-transplant.]

\[
\alpha_X = \begin{pmatrix} -1.0823 & 0.0123 & -0.1767 \end{pmatrix}
\left( X - \begin{pmatrix} 0.2718 \\ 2.2034 \\ 2.3000 \end{pmatrix} \right)
\]

Equation 5.1 The aGvHD prediction formula for patient data sampled at 7, 14, and 21 days post-transplant

In a resubstitution example, patient #1 with observed values X = (0.92, 2.77, 3.63)^T had an estimated linear discriminant value of -0.9. Based on the linear classification rule, patient #1, who was diagnosed with aGvHD at 26 days post-transplant and had \alpha_X smaller than zero, is classified into the aGvHD class, a true positive (Figure 5.3). The detailed calculation of the weight values is available in Appendix K.

5.4.3 Chronic graft versus host disease

None of the consistent top classifiers for cGvHD exhibited patient group separation as clearly as the top ranking classifiers for aGvHD (Section 5.4.1). Among the 13 (eight unique) FLDA classifiers that exhibited the opposite FLDA signal pattern (Table 4.3), none was comparable to prior cGvHD studies (Figure 1.3). None of these discriminative patterns was observed after aGvHD diagnosis, probably because of different patient responses to various treatments (Table J.3). During the FLDA analysis process, random experimental errors from each sample were estimated and removed in the final FLDA classification. This could be the reason why the opposite FLDA signal patterns (Figures 4.14a, 4.15a, and I.2a) could not be easily identified in the corresponding raw data time plots (Figures 4.14b, 4.15b, and I.2b). Another plausible explanation is an over-correction from FLDA, which could be amplified by the limited number of patients available. However, the frequent occurrence of this opposite FLDA signal pattern between the patient groups suggested potential cGvHD diagnosis markers that will require further investigation.
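The linear classification rule of Section 5.4.2 can be checked numerically. The sketch below is illustrative only: it assumes that Equation 5.1 reads as a weight vector applied to the difference between a patient's observed values and the global base values, and that the four-point cGvHD model given later in Section 5.4.4 (Equation 5.2) has the same form. The function names `linear_discriminant` and `classify` are hypothetical and are not part of the analysis pipeline.

```python
def linear_discriminant(weights, base, observed):
    """Return sum_i w_i * (x_i - b_i) for one patient."""
    if not (len(weights) == len(base) == len(observed)):
        raise ValueError("weights, base and observed must have equal length")
    return sum(w * (x - b) for w, b, x in zip(weights, base, observed))


def classify(alpha, negative_label, positive_label):
    """Apply the sign rule: a negative value maps to negative_label."""
    return negative_label if alpha < 0 else positive_label


# Assumed reading of Equation 5.1 (samples at 7, 14, 21 days post-transplant).
weights_acute = [-1.0823, 0.0123, -0.1767]
base_acute = [0.2718, 2.2034, 2.3000]
patient_1 = [0.92, 2.77, 3.63]

# Assumed reading of Equation 5.2 (21, 15, 7, 0 days before aGvHD diagnosis).
weights_chronic = [0.0762, -0.1436, 0.1191, 0.1091]
base_chronic = [5.8992, 15.0097, 14.2864, 20.4889]
patient_19 = [13.3, 23.4, 12.6, 13.6]

alpha_patient_1 = linear_discriminant(weights_acute, base_acute, patient_1)
alpha_patient_19 = linear_discriminant(weights_chronic, base_chronic, patient_19)

print(round(alpha_patient_1, 2))   # about -0.93
print(round(alpha_patient_19, 2))  # about -1.59
print(classify(alpha_patient_1, "aGvHD", "non-GvHD"))
print(classify(alpha_patient_19, "aGvHD & cGvHD", "aGvHD only"))
```

Under these assumptions the two values agree with the resubstitution results reported in the text (-0.9 for patient #1 and -1.59 for patient #19), and both negative values fall into the affected class.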
All the classifiers exhibiting the opposite FLDA signal pattern are built from subsets of immune cells representing heterogeneous T cell (CD3+) and non-T cell (CD3-) subsets. The one common CD marker among all these immune cell subsets, CD45+CD33CD15+CD14-, 45ROCD3-CD4dim, 45RACD3-, 45RACD3-CD4dim, and 45RACD3- (Table 4.1), is CD45 (RO/RA). CD45 is one of the major accessory molecules in the immune response and functions as a protein tyrosine phosphatase [152]. The relationship between these immune cell subsets and cGvHD manifestation is not known.

The classifier built from immune cells CD3+CD4int, based on samples taken between 0 and 21 days from aGvHD diagnosis, was identified as one of the top ranking classifiers for cGvHD (Table J.3). The same subset of immune cells was also identified as one of the top ranking classifiers for aGvHD (Table 4.1). Here the PBMC proportion values for CD3+CD4int were generally higher in the aGvHD only patients than in the aGvHD & cGvHD patients (Figure 1.3). As with the classifier using CD3+CD4int for aGvHD prediction, the relationship between this unknown cell population with the CD3+CD4int phenotype and the development of cGvHD is not yet defined.

5.4.4 Chronic graft versus host disease prediction model using 45RO+CD3-CD4dim

The FLDA classifier built using immune cells 45RO+CD3-CD4dim in proportion to PBMC and samples taken between 21 and 0 days prior to aGvHD diagnosis had the highest estimated sensitivity (86%) and specificity (86%) (Table 4.3), excluding the inconsistent classifiers. Classification of a new patient with sampled time points at 21, 15, 7, and 0 days prior to aGvHD diagnosis can be made using the following model (Figure 5.4). Based on Equation 1.4, the linear discriminant value can be calculated by multiplying the new values with the determined weight values at each time point (Equation 1.3):
[Figure 5.4 An example of FLDA classification using immune cells 45RO+CD3-CD4dim in proportion to PBMC. x-axis: days from aGvHD diagnosis; legend: global base values, test data (patient #19).]

\[
\alpha_X = \begin{pmatrix} 0.0762 & -0.1436 & 0.1191 & 0.1091 \end{pmatrix}
\left( X - \begin{pmatrix} 5.8992 \\ 15.0097 \\ 14.2864 \\ 20.4889 \end{pmatrix} \right)
\]

Equation 5.2 The cGvHD prediction formula for patient data sampled at 21, 15, 7, and 0 days prior to aGvHD diagnosis

In a resubstitution example, patient #19 with observed values X = (13.3, 23.4, 12.6, 13.6)^T has an estimated linear discriminant value of -1.59. Based on the linear classification rule, patient #19, who was diagnosed with both aGvHD and cGvHD and had a negative \alpha_X, is classified into the aGvHD & cGvHD class, a true positive (Figure 5.4). The detailed calculation of the weight values is available in Appendix L.

5.5 Recommended improvements

The main objectives of the present pilot study were to assemble a novel temporal analysis pipeline for the high-throughput clinical FCM data and to recommend improvements in preparation for future studies. While I have demonstrated the applicability of the analysis pipeline (Figure 2.1), there are seven practical and two tentative improvements needed to achieve better efficiency and power for future studies.

5.5.1 Random plating

The first recommendation for experimental procedures is random plating. The results of the QA test on the current dataset indicated possible plating effects (Table 3.4). While further analysis (Section 5.1.3) is required to elucidate the plating effects, random plating [153] will help to minimize the likelihood that the changes observed are due to plating arrangements. For example, if samples taken prior to BMT are always plated in the first two columns, then it will not be clear whether changes observed in these samples are from biological changes or the edge drying effect.

5.5.2 Patient recruitment

The second recommendation is to increase patient recruitment in order to achieve sufficient power.
The estimated power to detect any specific change for this pilot study was understandably low. In the comparison between aGvHD and non-GvHD patients using the immune cells CD3+CD4+CD8β+, the analysis was estimated to have 29% power at the 90% confidence level (Table 4.2). In the comparison between aGvHD & cGvHD and aGvHD only patients using the immune cells 45RO+CD3-CD4dim, the analysis was estimated to have 50% power at the 90% confidence level (Table 4.4). Based on the present data, a recruited patient had a 68% chance of developing aGvHD and a 13% chance of remaining unaffected by aGvHD, taking into account early withdrawals and fatalities before 100 days post-transplant. This unbalanced number of aGvHD and non-GvHD patients could partially be the result of biased patient recruitment. Generally, patients with higher risks for disease are more inclined to enrol in studies [154]. Among the recruited aGvHD patients, there was a 33% chance of developing cGvHD and a 43% chance of being free of cGvHD, again taking into account early withdrawals and fatalities. Overall, I estimate that 100 HSCT patients should result in 68 aGvHD and 13 non-GvHD cases, and in 22 aGvHD & cGvHD and 29 aGvHD only cases. This will support an analysis with approximately 80% power at the 90% confidence level for both patient group comparisons (Tables 4.2 and 4.4). This increased patient recruitment will also improve tolerance to the normality assumption in the FLDA. In addition, sample collection can be organized in a future study so that the MNC concentration and immune cell proportions are determined using the same sample, in order to minimize errors in the concentration dataset. Another set of 100 HSCT patients would allow external validation (Section 5.5.7).

5.5.3 Sampling rate

The third recommendation is to increase the sampling rate immediately before and after BMT. In the present study, patients were sampled weekly.
Multiple potential immune cell populations exhibiting discriminative patterns that may predict both aGvHD and cGvHD manifestations could be found following the BMT, between 7 and 21 days post-transplant (Tables H.1 and J.1). Ideally, sampling would be daily in order to capture immune cell population changes. Flow cytometry is capable of capturing changes as small as 0.1% in the sample population [155]. In animal models, average daily turnover rates of T cells, B cells and NK cells under viral infections are 2%, 3%, and 3%, respectively [156]. The T cell response to viral infection in mice can be detected one to two days post-infection, reaching its maximum by five to six days post-infection [157]. It may not be possible to establish a long-term rapid sampling rate for future studies. However, frequent sampling within the first two or three weeks of BMT, when patients might still be available in the hospital, may yield an informative dataset. The temporal analysis pipeline (Figure 2.1) requires a minimum of two samples per patient. The sampling rate can be non-uniform because of the robustness of the pipeline. Aside from the increased sampling rate around BMT, efforts should be made to obtain samples at the ends of the selected time range. Although the analysis pipeline was designed for clinical data with missing values and non-uniform sampling times, missing values still affected the eligibility of the dataset to be included in the temporal analysis.

5.5.4 Additional markers

The fourth recommendation is to include markers specifically for the identification of the DP T cells and for the separation of host and donor origin immune cells. From the pilot study, I have found that the immune cell populations CD3+CD4+CD8β+ and CD3+CD4+CD8β+CD8+ exhibited a pattern of higher PBMC proportion values and greater fluctuation in the aGvHD patients when compared to the non-GvHD patients (Figures 4.6 and 4.7).
Markers such as CD1a [134, 158] may be incorporated to distinguish thymocytes from mature T cells. Additional markers such as CD69, CD56, CD38, CD27, CD28, CD134, CXCR3, and CD62L will help to determine the exact origin and functional phenotype of the DP T cells and facilitate the efforts of validating current findings. Additional experimental methods to separate immune cells of donor or recipient origin may also be necessary. The apparent DP T cell population was identified as a potential aGvHD marker from its pattern between 7 and 21 days post-transplant. During this time, chimerism of the donor and the residual recipient immune cells has been documented in both human [129,130] and mouse [159] models. Separation of immune cells by origin will aid in elucidating the functions of T cells and T cell subsets and their roles in the GvHD manifestation. Furthermore, the separation may also be useful in validating patterns of possible immune reconstruction (Figures 4.11 and 4.12).

5.5.5 Additional statistical tests

There are also three recommendations to improve the current analytical procedure (Figure 2.1). The first recommendation is the addition of statistical tests to the manual QA test and the FCM gating procedure. In the present study, outliers were identified from the QA test solely based on visual inspection. Adding conventional statistical tests such as analysis of variance and box plots to the current QA test may help to eliminate some biases. However, these tests are more efficient in identifying differences in distribution shifts than in distribution shapes. Statistical tests such as the functional arbitrary covariance tests of shape [160] may be evaluated for their sensitivity in FCM QA testing using known samples (Section 5.1.3) or simulated data. In this study, the FCM gating was performed manually by one- or two-parameter visualization with prior biological knowledge. These manual visual analyses were subjective and time consuming.
Efforts have been made to improve gating efficiency and robustness. A recently developed feature-guided clustering algorithm [161] might be applicable in both QA and gating of high-throughput FCM datasets.

5.5.6 Graft versus host disease grades

The second recommendation for analytical improvement is the addition of GvHD grade to the analysis in order to accommodate GvHD misdiagnoses. At present, aGvHD and cGvHD diagnoses are ambiguous, especially for mild forms of aGvHD (grade I) and cGvHD (limited). There are many reports on GvHD grading schemes [162-172] and their uncertain reproducibility [173, 174]. While the reproducibility might be remedied by a clinical algorithm [175], it will not decrease misdiagnoses. Many previous aGvHD studies [29, 30] omitted patients diagnosed with grades I or II aGvHD from analyses so as to avoid interference from misdiagnoses. This option was attempted for the present study, resulting in similar or higher predictive powers from the top classifiers (Table 4.1). For future studies, I propose the addition of a fuzzy clustering algorithm [176, 177] or mixture model based classification [178] to the temporal analysis pipeline in order to accommodate GvHD grades and misdiagnoses. It is important to predict not only the development of GvHD but also its severity. Many studies have suggested that, due to the beneficial graft versus leukemia effects, only moderate or severe GvHD should be treated [154].

5.5.7 External validation

The third analytical recommendation is the implementation of external validation, which would only be possible if enough patients are recruited to separate into a training dataset and a testing dataset. Two sets of 100 HSCT patients as the training and testing datasets for FLDA are recommended. Another set of 100 HSCT patients may be required for the multiparametric approach described below.
Currently, LOOCV, which over-estimates classifier accuracy, is used to validate and rank FLDA classifiers without correction for multiple testing.

5.5.8 Multiparametric approach

The first tentative recommendation is an additional multiparametric analysis. Previous [81] and current studies have suggested a very complex GvHD manifestation. Presently, temporal classifiers from different subsets of immune cells were interpreted individually, as there is no multiparametric temporal analysis algorithm available. However, preliminary results from Support Vector Machine (SVM) analyses on the linear discriminant values from multiple FLDA classifiers indicated that the predictive powers of these classifiers could be combined to achieve better accuracy. An SVM defines the best linear separating hyperplane between different classes of the training dataset projected into a high dimensional space. In the preliminary analysis, linear discriminant values from FLDA classifiers predicting the onset of aGvHD were obtained through resubstitution. Linear discriminant values, representing weighted distances between the test data and the classifier, were then normalized to the range [-1, 1] using Equation 5.3 in preparation for the SVM analysis. A correlation-based feature selection method [179] was applied in Weka [180] to select a subset of the temporal classifiers by comparing the individual predictive power of the classifiers and the degree of redundancy between them. The three top ranking classifiers selected based on LOOCV sensitivity and specificity were among the 11 classifiers selected using this feature selection method. Normalized linear discriminant values from these 11 classifiers were visually different between the aGvHD and non-GvHD patients (Figure 5.5). Individually, the best LOOCV estimated accuracy among these 11 classifiers was 86% sensitivity and 100% specificity.
LOOCV estimated accuracy from the SVM [181] classifier of all 11 classifiers was 100% sensitivity and 100% specificity. Although combining resubstitution for the FLDA classifiers with LOOCV for the SVM classifier could result in a severe over-estimation of the final SVM's accuracy, the preliminary results suggest the applicability of SVM to combine the predictive powers of multiple FLDA classifiers.

\[
\alpha_X =
\begin{cases}
\alpha_X / \min(\alpha), & \alpha_X < 0 \\
\alpha_X / \max(\alpha), & \alpha_X > 0
\end{cases}
\]

Equation 5.3 Normalization function for the linear discriminant values

[Figure 5.5 Parallel coordinates plot of the normalized linear discriminant values from the 11 FLDA classifiers selected via the correlation-based feature selection method. x-axis: measurements; legend: aGvHD, nonGvHD.]

5.5.9 Long time series analysis

The second tentative recommendation is an evaluation of additional long time series analyses. A long time series analysis would utilize the maximum amount of data and could be useful in elucidating the GvHD pathophysiologic mechanism, which occurs over time at different rates among patients. Also, most spline-based methods, including FLDA, may perform more efficiently on long time series data [128]. Even though on average 15 weeks of data were available from each patient in the present dataset, I found that the risk of desynchronization and the need for biased smoothing and registration procedures outweighed the benefits of a long time series analysis for this pilot study. However, for future studies, it might be possible to perform long time series analysis if detailed patient information such as GvHD progression can be incorporated into the registration procedure [48].

5.6 Conclusion

This pilot project achieved its objectives. The temporal analysis pipeline (Figure 2.1) was designed and implemented on the high-throughput clinical FCM data. Results of the QA test identified potential experimental errors.
The screening of the current limited dataset by the temporal pipeline identified several potential aGvHD and cGvHD diagnosis markers, including rare forms of T cells. In the present study, the most promising pattern was from immune cells with the CD3+CD4+CD8β+(CD8+) phenotype, which had higher proportion values and greater fluctuation in the aGvHD patients than in the non-GvHD patients (Figures 4.4 and 4.6). Multiple unknown immune cell subsets, including 45RO+CD3-CD4dim (Table 4.3), exhibited opposite FLDA estimated signal patterns (Figures 4.14 and 4.15) between the aGvHD & cGvHD and the aGvHD only patients. While there was a high risk of false positives in the classification due to the limited number of available patients and errors from multiple testing, the current results demonstrated the applicability of the temporal analysis pipeline to the high-throughput clinical FCM data and the applicability of SVMs to combine the predictive powers of multiple temporal classifiers. They also demonstrated the benefits of large scale FCM studies and temporal analysis. A large scale FCM study, possibly combined with an automatic gating process [161], would eliminate biases from prior knowledge and could be very useful in elucidating GvHD. For instance, the DP T cells were never purposely included in other studies because they were not expected to exist based on the known T cell maturation mechanism (Figure 4.11).

Potential problems from the experimental and analytic procedures were identified and seven potential improvements recommended. They were:

1. Random plating
2. Increase patient recruitment, ideally two sets of 100 HSCT patients for training and testing purposes respectively
3. Increase sampling rate, especially after the BMT procedure
4. Addition of markers targeting differentiation and function status of T cells
5. Addition of statistical tests to the existing visualization methods in both the QA test and the FCM gating procedure
6. Including GvHD grades in the temporal analysis in order to accommodate GvHD diagnosis errors
7. External validation for classifiers

As expected, none of the classifiers yielded significant correlation to the onset of either aGvHD or cGvHD. A future study made more efficient by these recommendations will be required to validate the current findings.

BIBLIOGRAPHY

1. Gilliam AC: Update on Graft versus Host Disease. Progress in Dermatology 2004, 123:251-257.
2. Gasparetto M, Gentry T, Sebti S, O'Bryan E, Nimmanapalli R, Blaskovich MA, Bhalla K, Rizzieri D, Haaland P, Dunne J et al: Identification of compounds that enhance the anti-lymphoma activity of rituximab using flow cytometric high-content screening. Journal of Immunological Methods 2004, 292(1-2):59.
3. Le Meur N, Rossini AJ, Gasparetto M, Smith C, Brinkman RR, Gentleman RC: Quality Assessment of Ungated Flow Cytometry data in High Throughput experiments. Cytometry 2007, In press.
4. James GM, Hastie TJ: Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society Series B 2001, 63(3):533-550.
5. Reddy P: Pathophysiology of acute graft-versus-host disease. Hematological Oncology 2003, 21:149-161.
6. Syrjala KL, Chapko MK, Vitaliano PP, Cummings C, Sullivan KM: Recovery after allogeneic marrow transplantation: prospective study of predictors of long-term physical and psychosocial functioning. Bone Marrow Transplantation 1993, 11:319-327.
7. Duell T, van Lint MT, Ljungman P, Tichelli A, Socie G, Apperley J, Weiss M, Cohen A, Nekolla E, Kolb HJ: Health and functional status of long-term survivors of bone marrow transplantation. Annals of Internal Medicine 1997, 126(3):184-192.
8. Mandy FF: Twenty-five years of clinical flow cytometry: AIDS accelerated global instrument distribution. Cytometry 2004, 58A(1):55-56.
9. Orfao A, Ortuno F, de Santiago M, Lopez A, San Miguel J: Immunophenotyping of acute leukemias and myelodysplastic syndromes.
Cytometry 2004, 58A(1):62-71.
10. Braylan RC: Impact of flow cytometry on the diagnosis and characterization of lymphomas, chronic lymphoproliferative disorders and plasma cell neoplasias. Cytometry 2004, 58A(1):57-61.
11. Keeney M, Gratama JW, Sutherland DR: Critical role of flow cytometry in evaluating peripheral blood hematopoietic stem cell grafts. Cytometry 2004, 58A:72-75.
12. Sutherland HJ, Fyles GM, Adams G, Hao Y, Lipton JH, Minden MD, Meharchand JM, Atkins H, Tejpar I, Messner HA: Quality of life following bone marrow transplantation: a comparison of patient reports with population norm. Bone Marrow Transplantation 1997, 19:1129-1136.
13. Socie G, Stone JV, Wingard JR, Weisdorf D, Henslee-Downey PJ, Bredeson C, Cahn J-Y, Passweg JR, Rowlings PA, Schouten HC et al: Long-Term Survival and Late Deaths after Allogeneic Bone Marrow Transplantation. N Engl J Med 1999, 341(1):14-21.
14. Billingham RE: The biology of graft-versus-host reactions. Harvey Lectures 1966, 62:21-78.
15. Goker H, Haznedaroglu IC, Chao NJ: Acute graft-vs-host disease: pathobiology and management. Experimental Hematology 2001, 29:259-277.
16. Baron F, Storb R: Allogeneic hematopoietic cell transplantation as treatment for hematological malignancies: a review. Springer Seminars in Immunopathology 2004, 26(1-2):71-94.
17. Couriel D, Caldera H, Champlin R, Komanduri K: Acute graft-versus-host disease: pathophysiology, clinical manifestations, and management. Cancer 2004, 101(9):1936-1946.
18. Johnson ML, Farmer ER: Graft-versus-host reactions in dermatology. Journal of the American Academy of Dermatology 1998, 38(3):369-392.
19. Klingemann HG, Storb R, Fefer A, Deeg HJ, Appelbaum FR, Buckner CD, Cheever MA, Greenberg PD, Stewart PS, Sullivan KM: Bone marrow transplantation in patients aged 45 years and older. Blood 1986, 67:770-776.
20.
Barrett AJ, Rezvani K, Solomon S, Dickinson AM, Wang XN, Stark G, Cullup H, Jarvis M, Middleton PG, Chao NJ: New Developments in Allotransplant Immunology. Hematology 2003:350-371.
21. Pavletic SZ, Carter SL, Kernan NA, Henslee-Downey J, Mendizabal AM, Papadopoulos E, Gingrich R, Casper J, Yanovich S, Weisdorf D: Influence of T cell depletion on chronic Graft-Versus-Host Disease: results of a multi-center randomized trial in unrelated marrow donor transplantation. Blood 2005, 106(9):3308-3313.
22. Ichiki Y, Bowlus CL, Shimoda S, Ishibashi H, Vierling JM, Gershwin ME: T cell immunity and graft-versus-host disease (GVHD). Autoimmunity Reviews 2006, 5:1-9.
23. Kernan NA, Bartsch G, Ash RC, Beatty PG, Champlin R, Filipovich A, Gajewski J, Hansen JA, Henslee-Downey J, McCullough J et al: Analysis of 462 Transplantations from Unrelated Donors Facilitated by the National Marrow Donor Program. N Engl J Med 1993, 328(9):593-602.
24. Marmont AM, Horowitz MM, Gale RP, Sobocinski K, Ash RC, van Bekkum DW, Champlin RE, Dicke KA, Goldman JM, Good RA: T-cell depletion of HLA identical transplants in leukemia. Blood 1991, 78(8):2120-2130.
25. Bacigalupo A, Lamparelli T, Bruzzi P, Guidi S, Alessandrino PE, di Bartolomeo P, Oneto R, Bruno B, Barbanti M, Sacchi N et al: Antithymocyte globulin for graft-versus-host disease prophylaxis in transplants from unrelated donors: 2 randomized studies from Gruppo Italiano Trapianti Midollo Osseo (GITMO). Blood 2001, 98(10):2942-2947.
26.
Hale G, Zhang M-J, Bunjes D, Prentice HG, Spence D, Horowitz MM, Barrett AJ, Waldmann H: Improving the Outcome of Bone Marrow Transplantation by Using CD52 Monoclonal Antibodies to Prevent Graft-Versus-Host Disease and Graft Rejection. Blood 1998, 92(12):4581-4590.
27. Baron F, Maris MB, Storer BE, Sandmaier BM, Panse JP, Chauncey TR, Sorror M, Little M-T, Maloney DG, Storb R et al: High doses of transplanted CD34+ cells are associated with rapid T-cell engraftment and lessened risk of graft rejection, but not more graft-versus-host disease after nonmyeloablative conditioning and unrelated hematopoietic cell transplantation. Leukemia 2005.
28. Storb R, Prentice R, Buckner CD, Clift RA, Appelbaum FR, Deeg J, Doney K, Hansen JA, Mason M, Sanders J et al: Graft-versus-host disease and survival in patients with aplastic anemia treated by marrow grafts from HLA-identical siblings. Beneficial effect of a protective environment. N Engl J Med 1983, 308(6):302-307.
29. Paz Morante M, Briones J, Canto E, Sabzevari H, Martino R, Sierra J, Rodriguez-Sanchez JL, Vidal S: Activation-associated phenotype of CD3+ T cells in acute graft-versus-host disease. Clinical and Experimental Immunology 2006, 145(1):36-43.
30. Jaksch M, Uzunel M, Remberger M, Sundberg B, Mattsson J: Molecular monitoring of T-cell chimerism early after allogeneic stem cell transplantation may predict the occurrence of acute GVHD grades II-IV. Clinical Transplantation 2005, 19(3):346-349.
31. Ferrara J, Guillen FJ, van Dijken PJ, Marion A, Murphy GF, Burakoff SJ: Evidence that large granular lymphocytes of donor origin mediate acute graft-versus-host disease. Transplantation 1989, 47(1):50-54.
32. Filep JG, Baron C, Lachance S, Perreault C, Chan JS: Involvement of nitric oxide in target-cell lysis and DNA fragmentation induced by murine natural killer cells. Blood 1996, 87(12):5136-5143.
33.
Asai O, Longo DL, Tian ZG, Hornung RL, Taub DD, Ruscetti FW, Murphy WJ: Suppression of graft-versus-host disease and amplification of graft-versus-tumor effects by activated natural killer cells after allogeneic bone marrow transplantation. Journal of Clinical Investigation 1998, 101(9):1835-1842.
34. Klingemann HG: Relevance and potential of natural killer cells in stem cell transplantation. Biology of Blood and Marrow Transplantation 2000, 6(2):90-99.
35. Vargas-Diez E, Garcia-Diez A, Marin A, Fernandez-Herrera J: Life-threatening graft-vs-host disease. Clinics in Dermatology 2005, 23(3):285-300.
36. Kansu E: The Pathophysiology of Chronic Graft-versus-Host disease. International Journal of Hematology 2004, 79(3):209-215.
37. Iwasaki T: Recent Advances in the Treatment of Graft-Versus-Host Disease. Clinical medicine and research 2004, 2(4):243-252.
38. Lee SJ, Klein JP, Barrett AJ, Ringden O, Antin JH, Cahn J-Y, Carabasi MH, Gale RP, Giralt S, Hale GA et al: Severity of chronic graft-versus-host disease: association with treatment-related mortality and relapse. Blood 2002, 100:406-452.
39. Higman MA, Vogelsang GB: Chronic graft versus host disease. British Journal of Hematology 2004, 125(4):435-454.
40. Hale G, Jacobs P, Wood L, Fibbe WE, Barge R, Novitzky N, Toit C, Abrahams L, Thomas V, Bunjes D et al: CD52 antibodies for prevention of graft-versus-host disease and graft rejection following transplantation of allogeneic peripheral blood stem cells. Bone Marrow Transplantation 2000, 26(1):69-76.
41. Komatsuda M: Changes of lymphocyte subsets in leukemia patients who received allogenic bone marrow transplantation. Acta medica Okayama 1991, 45(4):257-265.
42.
Remberger M, Ringden O, Blau I-W, Ottinger H, Kremens B, Kiehl MG, Aschan J, Beelen DW, Basara N, Kumlien G et al: No difference in graft-versus-host disease, relapse, and survival comparing peripheral stem cells to bone marrow using unrelated donors. Blood 2001, 98(6):1739-1745.
43. Zaucha JM, Gooley T, Bensinger WI, Heimfeld S, Chauncey TR, Zaucha R, Martin PJ, Flowers MED, Storek J, Georges G et al: CD34 cell dose in granulocyte colony-stimulating factor-mobilized peripheral blood mononuclear cell grafts affects engraftment kinetics and development of extensive chronic graft-versus-host disease after human leukocyte antigen-identical sibling transplantation. Blood 2001, 98(12):3221-3227.
44. Mohty M, Bilger K, Jourdan E, Kuentz M, Michallet M, Bourhis JH, Milpied N, Sutton L, Jouet JP, Attal M et al: Higher doses of CD34+ peripheral blood stem cells are associated with increased mortality from chronic graft-versus-host disease after allogeneic HLA-identical sibling transplantation. Leukemia 2003, 17(5):869-875.
45. Perez-Simon JA, Diez-Campelo M, Martino R, Sureda A, Caballero D, Canizo C, Brunet S, Altes A, Vazquez L, Sierra J et al: Impact of CD34+ cell dose on the outcome of patients undergoing reduced-intensity-conditioning allogeneic peripheral blood stem cell transplantation. Blood 2003, 102(3):1108-1113.
46. Bar-Joseph Z: Analyzing time series gene expression data. Bioinformatics 2004, 20(16):2493-2503.
47. Bay SD, Chrisman L, Pohorille A, Shrager J: Temporal aggregation bias and inference of causal regulatory networks. Journal of Computational Biology 2004, 11(5):971-985.
48. Ramsay JO, Silverman BW: Functional data analysis, Second edn. New York: Springer; 2005.
49. Cuevas A, Febrero M, Fraiman R: An anova test for functional data.
Computational statistics and data analysis 2004, 47:111-122.
50. Park T, Yi S-G, Lee S, Lee SY, Yoo D-H, Ahn J-I, Lee Y-S: Statistical tests for identifying differentially expressed genes in time-course microarray experiments. Bioinformatics 2003, 19(6):694-703.
51. Yao F, Muller H-G, Wang J-L: Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association 2005, 100(470):577-590.
52. Liu X, Muller H-G: Modes and clustering for time-warped gene expression profile data. Bioinformatics 2003, 19(15):1937-1944.
53. Balasubramaniyan R, Hullermeier E, Weskamp N, Kamper J: Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics 2005, 21(7):1069-1077.
54. Ernst J, Nau GJ, Bar-Joseph Z: Clustering short time series gene expression data. Bioinformatics 2005, 21(Suppl 1):i159-i168.
55. Liu H, Tarima S, Borders AS, Getchell TV, Getchell ML, Stromberg AJ: Quadratic regression analysis for gene discovery and pattern recognition for non-cyclic short time-course microarray experiments. Bioinformatics 2005, 6(1):106-122.
56. Bar-Joseph Z, Gerber GK, Gifford DK, Jaakkola TS, Simon I: Continuous Representations of Time-Series Gene Expression Data. Journal of Computational Biology 2003, 10(3-4):341-356.
57. Ben-Dor A, Shamir R, Yakhini Z: Clustering gene expression patterns. Journal of Computational Biology 1999, 6(3/4):281-297.
58. Azuaje F: Clustering-based approaches to discovering and visualising microarray data patterns. Briefings in Bioinformatics 2003, 4(1):31-42.
59. Luan Y, Li H: Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 2003, 19(4):474-482.
60. Kehagias A, Petridis V: Predictive Modular Neural Networks for Time Series Classification. Neural networks 1997, 10(1):31-49.
61.
Mendez MA, Hodar C, Vulpe C, Gonzalez M, Cambiazo V: Discriminant analysis to evaluate clustering of gene expression data. FEBS letters 2002, 522:24-28.
62. Hall P, Poskitt DS, Presnell B: A Functional Data-Analytic Approach to Signal Discrimination. Technometrics 2001, 43(1):1-9.
63. Muller H-G: Functional Modelling and Classification of Longitudinal Data. Scandinavian Journal of Statistics 2005, 32(2):223-240.
64. deBoor C: A Practical Guide to Splines, Revised Edition edn. New York: Springer; 2001.
65. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. New York: Springer; 2001.
66. Ramsay JO, Li X: Curve registration. Journal of the Royal Statistical Society Series B 1998, 60(2):351-363.
67. Raychaudhuri S, Stuart JM, Altman RB: Principal components analysis to summarize microarray experiments: application to sporulation time series. In: Pacific Symposium on Biocomputing: 2000; Singapore: World Scientific; 2000: 455-466.
68. Li KC, Yan M, Yuan SS: A simple statistical model for depicting the cdc15-synchronized yeast cell-cycle regulated gene expression data. Statistica Sinica 2002, 12(1):141-158.
69. Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. PNAS 2000, 97(18):10101-10106.
70. Kruglyak S, Tang H: A New Estimator of Significance of Correlation in Time Series Data. Journal of Computational Biology 2001, 8(5):463-470.
71. Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares MJ, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. PNAS 2000, 97(1):262-267.
72. Ferraty F, Vieu P: Curves discrimination: a nonparametric functional approach. Computational statistics and data analysis 2003, 44:161-173.
73.
Gui J, Li H: Mixture Functional Discriminant Analysis for Gene Function Classification Based on Time Course Gene Expression Data. In: Joint Statistical Meetings: 2003; San Francisco, California; 2003.
74. Ripley BD: Pattern Recognition and Neural Networks: Cambridge University Press; 1996.
75. Braga-Neto UM, Hashimoto R, Dougherty ER, Nguyen DV, Carroll RJ: Is cross-validation better than resubstitution for ranking genes? Bioinformatics 2004, 20(2):253-258.
76. Braga-Neto UM, Dougherty ER: Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004, 20(3):374-380.
77. van Belle G, Fisher LD, Heagerty PJ, Lumley T: Biostatistics: A Methodology for the Health Sciences, 2 edn. New Jersey: Wiley; 2004.
78. Collings BJ, Hamilton MA: Estimating the power of the two-sample Wilcoxon test for location shift. Biometrics 1988, 44:847-860.
79. Johnston-Wilson NL, Bouton CM, Pevsner J, Breen JJ, Torrey EF, Yolken RH: Emerging technologies for large-scale screening of human tissues and fluids in the study of severe psychiatric disease. International Journal of Neuropsychopharmacology 2001, 4(1):83-92.
80. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403(6769):503-511.
81. Jaksch M, Mattsson J: The Pathophysiology of Acute Graft-Versus-Host Disease. Scandinavian Journal of Immunology 2005, 61(5):398-409.
82. Gratwohl A, Hermans J, Apperley J, Arcese W, Bacigalupo A, Bandini G, di Bartolomeo P, Boogaerts M, Bosi A, Carreras E: Acute graft-versus-host disease: grade and outcome in patients with chronic myelogenous leukemia.
Working Party Chronic Leukemia of the European Group for Blood and Marrow Transplantation. Blood 1995, 86(2):813-818.
83. Crawford K, Stark A, Kitchens B, Sternheim K, Pantazopoulos V, Triantafellow E, Wang Z, Vasir B, Larsen CE, Gabuzda D et al: CD2 engagement induces dendritic cell activation: implications for immune surveillance and T-cell activation. Blood 2003, 102(5):1745-1752.
84. Sykes M, Abraham VS: The mechanism of IL-2-mediated protection against GVHD in mice. II. Protection occurs independently of NK/LAK cells. Transplantation 1992, 53(5):1063-1070.
85. Chao NJ: Graft-versus-host disease: the viewpoint from the donor T cell. Biology of Blood and Marrow Transplantation 1997, 3(1):1-10.
86. Miceli MC, von Hoegen P, Parnes JR: Adhesion versus coreceptor function of CD4 and CD8: role of the cytoplasmic tail in coreceptor activity. PNAS 1991, 88(7):2623-2627.
87. Seong RH, Chamberlain JW, Parnes JR: Signal for T-cell differentiation to a CD4 cell lineage is delivered by CD4 transmembrane region and/or cytoplasmic tail. Nature 1992, 356(6371):718-720.
88. Jones NH, Clabby ML, Dialynas DP, Huang HJ, Herzenberg LA, Strominger JL: Isolation of complementary DNA clones encoding the human lymphocyte glycoprotein T1/Leu-1. Nature 1986, 323(6086):346-349.
89. van de Velde H, von Hoegen I, Luo W, Parnes JR, Thielemans K: The B-cell surface protein CD72/Lyb-2 is the ligand for CD5. Nature 1991, 351(6328):662-665.
90. Allam A, Illges H: Calyculin A inhibits expression of CD8alpha but not CD4 in human peripheral blood T cells. Immunobiology 2000, 202(4):353-362.
91. Lin RS, Rodriguez C, Veillette A, Lodish HF: Zinc is essential for binding of p56(lck) to CD4 and CD8alpha. Journal of Biological Chemistry 1998, 273(49):32878-32882.
92.
Correale P, Tagliaferri P, Camera A, Caraglia M, Del Vecchio L, De Laurentis M, Pinto A, Rotoli B, Bianco AR: CD10/common acute lymphoblastic leukemia-associated antigen and adhesion factor expression is predictive for lymphokine-activated killing sensitivity of adult B-lineage acute lymphoblastic leukemia. The Year in Immunology 1993, 7:90-95.
93. Gupta D, Kirkland TN, Viriyakosol S, Dziarski R: CD14 is a cell-activating receptor for bacterial peptidoglycan. Journal of Biological Chemistry 1996, 271(38):23310-23316.
94. Kerr MA, Stocks SC: The role of CD15-(Le(X))-related carbohydrates in neutrophil adhesion. Histochem J 1992, 24(11):811-826.
95. Anderson P, Caligiuri M, O'Brien C, Manley T, Ritz J, Schlossman SF: Fc gamma receptor type III (CD16) is included in the zeta NK receptor complex expressed by human natural killer cells. PNAS 1990, 87(6):2274-2278.
96. Cruse JM, Lewis RE: Atlas of Immunology. Boca Raton: CRC Press; 1999.
97. Shirakawa T, Li A, Dubowitz M, Dekker JW, Shaw AE, Faux JA, Ra C, Cookson WO, Hopkin JM: Association between atopy and variants of the beta subunit of the high-affinity immunoglobulin E receptor. Nature Genetics 1994, 7(2):125-129.
98. Wilson GL, Najfeld V, Kozlow E, Menniger J, Ward D, Kehrl JH: Genomic structure and chromosomal mapping of the human CD22 gene. Journal of Immunology 1993, 150(11):5013-5024.
99. Crocker PR, Mucklow S, Bouckson V, McWilliam A, Willis AC, Gordon S, Milon G, Kelm S, Bradfield P: Sialoadhesin, a macrophage sialic acid binding receptor for haemopoietic cells with 17 immunoglobulin-like domains. The EMBO Journal 1994, 13(19):4490-4503.
100.
Vitale C, Romagnani C, Falco M, Ponte M, Vitale M, Moretta A, Bacigalupo A, Moretta L, Mingari MC: Engagement of p75/AIRM1 or CD33 inhibits the proliferation of normal or leukemic myeloid cells. PNAS 1999, 96(26):15091-15096.
101. Shtivelman E, Bishop JM: Expression of CD44 is repressed in neuroblastoma cells. Molecular and Cellular Biology 1991, 11(11):5446-5453.
102. Juretic E, Gagro A, Vukelic V, Petrovecki M: Maternal and neonatal lymphocyte subpopulations at delivery and 3 days postpartum: increased coexpression of CD45 isoforms. American Journal of Reproductive Immunology 2004, 52(1):1-7.
103. Matto M, Nuutinen UM, Ropponen A, Myllykangas K, Pelkonen J: CD45RA and RO Isoforms Have Distinct Effects on Cytokine- and B-Cell-Receptor-Mediated Signalling in Human B cells. Scandinavian Journal of Immunology 2005, 61(6):520-528.
104. Posselt AM, Vincenti F, Bedolli M, Lantz M, Roberts JP, Hirose R: CD69 expression on peripheral CD8 T cells correlates with acute rejection in renal transplant recipients. Transplantation 2003, 76(1):190-195.
105. Stuber E, Neurath M, Calderhead D, Fell HP, Strober W: Cross-linking of OX40 ligand, a member of the TNF/NGF cytokine family, induces proliferation and differentiation in murine splenic B cells. Immunity 1995, 2(5):507-521.
106. Pieters RHH, Punt P, Bol M, van Dijken JM: The thymus atrophy inducing organotin compound DBTC stimulates TCRab-CD3 signaling in immature rat thymocytes. Biochem Biophys Res Commun 1995, 214:552-558.
107. Rossini AJ, Wan JY, Moodie Z: rflowcyt: Statistical tools and data structures for analytic flow cytometry. In., 1.0.1 edn; 2005: R package.
108. Huang E, West M, Nevins JR: Gene expression profiling for prediction of clinical characteristics of breast cancer. Recent progress in hormone research 2003, 58:55-73.
109.
Cheok MH, Yang W, Pui C-H, Downing JR, Cheng C, Naeve CW, Relling MV, Evans WE: Treatment specific changes in gene expression discriminate in vivo drug response in human leukemia cells. Nature Genetics 2003, 34(1):85-90.
110. Miura Y, Thoburn CJ, Bright EC, Arai S, Hess AD: Regulation of OX40 gene expression in graft-versus-host disease. Transplantation Proceedings 2005, 37(1):57-61.
111. Imamura M, Tsutsumi Y, Miura Y, Toubai T, Tanaka J: Immune Reconstitution and Tolerance after Allogeneic Hematopoietic Stem Cell Transplantation. Hematology 2003, 8(1):19-26.
112. Ju XP, Xu B, Xiao ZP, Li JY, Chen L, Lu SQ, Huang ZX: Cytokine expression during acute graft-versus-host disease after allogeneic peripheral stem cell transplantation. Bone Marrow Transplantation 2005.
113. Ichiba T, Teshima T, Kuick R, Misek DE, Liu C, Takada Y, Maeda Y, Reddy P, Williams DL, Hanash SM et al: Early changes in gene expression profiles of hepatic GVHD uncovered by oligonucleotide microarrays. Blood 2003, 102(2):763-771.
114. Bradley LM, Dalton DK, Croft M: A direct role of IFN-gamma in regulation of Th1 cell development. J Immunol 1996, 157(4):1350-1358.
115. Merchant M, Weinberger SR: Recent advancements in surface-enhanced laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis 2000, 21(6):1164-1177.
116. Kaiser T, Hermann A, Kielstein JT, Wittke S, Bartel S, Krebs R, Hausadel F, Hillmann M, Golovko I, Koester P: Capillary electrophoresis coupled to mass spectrometry to establish polypeptide patterns in dialysis fluids. Journal of Chromatography A 2003, 1013(1-2):157-171.
117.
Srinivasan R, Daniels J, Fusaro V, Lundqvist A, Killian JK, Geho D, Quezado M, Kleiner D, Rucker S, Espina V: Accurate diagnosis of acute graft-versus-host disease using serum proteomic pattern analysis. Experimental Hematology 2006, 34(6):796-801.
118. Kaiser T, Kamal H, Rank A, Kolb H-J, Holler E, Ganser A, Hertenstein B, Mischak H, Weissinger EM: Proteomics applied to the clinical follow-up of patients after allogeneic hematopoietic stem cell transplantation. Blood 2004, 104(2):340-349.
119. Brazma A: On the Importance of Standardisation in Life Sciences. Bioinformatics 2001, 17(2):113-114.
120. Chicurel M: Bioinformatics: Bringing it all together technology feature. Nature 2002, 419(6908):751-757.
121. Boguski MS, McIntosh MW: Biomedical informatics for proteomics. Nature 2003, 422(6928):233-237.
122. Introductory Core Operation Course [http://www.med.umich.edu/flowcytometry/InitialTraining/index.htm]
123. Lanier LL, Philips JP, Philips JH: Correlation of cell surface antigen expression on human thymocytes by multi-color flow cytometric analysis: implications for differentiation. Journal of Immunology 1986, 137(8):2501-2507.
124. Dudoit S, Yang YH: Bioconductor R packages for exploratory analysis and normalization of cDNA microarray data. In: The Analysis of Gene Expression Data: Methods and Software. Edited by Parmigiani G, Garrett ES, Irizarry R, Zeger SL. New York: Springer; 2002.
125. Jacobsohn DA, Montross S, Anders V, Vogelsang GB: Clinical importance of confirming or excluding the diagnosis of chronic graft-versus-host disease. Bone Marrow Transplantation 2001, 28(11):1047-1051.
126. Culhane AC, Perriere G, Considine EC, Cotter TG, Higgins DG: Between-group analysis of microarray data. Bioinformatics 2002, 18(12):1600-1608.
127.
Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH) [http://www.stat.berkeley.edu/~laan/]
128. Bar-Joseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS: Comparing the continuous representation of time-series expression profiles to identify differentially expressed genes. PNAS 2003, 100(18):10146-10151.
129. Butturini A, Seeger RC, Gale RP: Recipient immune-competent T lymphocytes can survive intensive conditioning for bone marrow transplantation. Blood 1986, 68(4):954-956.
130. Bertheas MF, Lafage M, Levy P, Blaise D, Stoppa AM, Viens P, Mannoni P, Maraninchi D: Influence of mixed chimerism on the results of allogeneic bone marrow transplantation for leukemia. Blood 1991, 78(11):3103-3106.
131. Zuckermann FA: Extrathymic CD4/CD8 double positive T cells. Veterinary Immunology and Immunopathology 1999, 72:55-66.
132. Kelly K, Pilarski L, Shortman K, Scollay R: CD4+ CD8+ cells are rare among in vitro activated mouse or human T lymphocytes. Cellular immunology 1988, 2:414-424.
133. Abbas AK, Lichtman AH: Cellular and Molecular Immunology, 5 edn. Philadelphia, PA: Saunders; 2003.
134. Blue ML, Daley JF, Levine H, Schlossman SF: Coexpression of T4 and T8 on peripheral blood T cells demonstrated by two-color fluorescence flow cytometry. Journal of Immunology 1985, 134:2281-2286.
135. Ortolani C, Forti E, Radin E, Cibin R, Cossarizza A: Cytofluorometric identification of two populations of double positive (CD4+, CD8+) T lymphocytes in human peripheral blood. Biochem Biophys Res Commun 1993, 191(2):601-609.
136. Kay NE, Bone N, Hupke M, Dalmasso AP: Expansion of a Lymphocyte Population Co-Expressing T4 (CD4) and T8 (CD8) Antigens in the Peripheral Blood of a Normal Adult Male. Blood 1990, 75(10):2024-2029.
137.
Patel SS, Wacholtz MC, Duby AD, Thiele DL, Lipsky PE: Analysis of the functional capabilities of CD3+CD4-CD8- and CD3+CD4+CD8+ human T cell clones. Journal of Immunology 1989, 143:1108-1117.
138. Pawelec G, Adibzadeh M, Pohla H, Schaudt K: Immunosenescence: Aging of the immune system. Immunology Today 1995, 16(9):420-422.
139. Colombatti A, Doliana R, Schiappacassi M, Argentini C, Tonutti E, Feruglio C, Sala P: Age related persistent clonal expansions of CD28(-) cells: phenotypic and molecular TCR analysis reveals both CD4(+) and CD4(+)CD8(+) cells with identical CDR3 sequences. Clinical Immunology and Immunopathology 1998, 89(1):61-70.
140. Weiss L, Roux A, Garcia S, Demouchy C, Haeffner-Cavaillon N, Kazatchkine MD, Gougeon ML: Persistent expansion, in a human immunodeficiency virus-infected person, of V beta-restricted CD4+CD8+ T lymphocytes that express cytotoxicity-associated molecules and are committed to produce interferon-gamma and tumor necrosis factor-alpha. The Journal of Infectious Diseases 1998, 178(4):1158-1162.
141. Blue ML, Daley JF, Levine H, Craig K, Schlossman SF: Biosynthesis and surface expression of T8 by peripheral blood T4+ cells in vitro. Journal of Immunology 1986, 137:1202-1207.
142. Paliard X, de Waal Malefijt R, de Vries JE, Spits H: Interleukin-4 mediates CD8 induction on human CD4+ T cell-clones. Nature 1988, 335:642-644.
143. Brod SA, Purvee M, Benjamin D, Hafler DA: Frequency analysis of CD4-CD8+ T cells cloned with IL-4. Cellular immunology 1990, 125:426-436.
144. Jimenez E, Sacedon R, Vicente A, Hernandez-Lopez C, Zapata AG, Varas A: Rat Peripheral CD4+CD8+ T Lymphocytes Are Partially Immunocompetent Thymus-Derived Cells That Undergo Post-Thymic Maturation to Become Functionally Mature CD4+ T Lymphocytes. J Immunol 2002, 168(10):5005-5013.
145.
Nascimbeni M, Shin E-C, Chiriboga L, Kleiner DE, Rehermann B: Peripheral CD4+CD8+ T cells are differentiated effector memory cells with antiviral functions. Blood 2004, 104(2):478-486.
146. Prince HE, Golding J, York J: Characterization of circulating CD4+ CD8+ Lymphocytes in Healthy Individuals Prompted by Identification of a Blood Donor with a Markedly Elevated Level of CD4+ CD8+ Lymphocytes. Clinical and Diagnostic Laboratory Immunology 1994, 1(5):597-605.
147. Tonutti E, Sala P, Feruglio C, Yin Z, Colombatti A: Phenotypic Heterogeneity of Persistent Expansions of CD4+CD8+ T Cells. Clinical Immunology and Immunopathology 1994, 73(3):312-320.
148. Barclay AN, Birkeland ML, Brown MH, Beyers AD, Davis SJ, Somoza C, Williams AF: The Leucocyte Antigen FactsBook. London: Academic Press Limited; 1993.
149. Farag SS, Caligiuri MA: Human natural killer cell development and biology. Blood Reviews 2006, 20(3):123-137.
150. Cooper MA, Fehniger TA, Caligiuri MA: The biology of human natural killer-cell subsets. Trends in Immunology 2001, 22(11):633-640.
151. Mavilio D, Lombardo G, Benjamin J, Kim D, Follman D, Marcenaro E, O'Shea MA, Kinter A, Kovacs C, Moretta A et al: Characterization of CD56-/CD16+ natural killer (NK) cells: A highly dysfunctional NK subset expanded in HIV-infected viremic individuals. PNAS 2005, 102(8):2886-2891.
152. Wood GS, Szwejbka P, Schwandt A: Human Langerhans Cells Express a Novel Form of the Leukocyte Common Antigen (CD45). 1998, 111(4):668-673.
153. Bittner ML, Butow R, DeRisi J, Diehn M, Eberwine J, Epstein CB, Glynne R, Grimmond S, Ideker T, Kacharmina JE et al: Expression Analysis of RNA. In: DNA Microarrays: A Molecular Cloning Manual. Edited by Bowtell D, Sambrook J, vol. 1st.
New York: Cold Spring Harbor Laboratory Press; 2003: 102-288.
154. Martin PJ, Nash RA: Pitfalls in the Design of Clinical Trials for Prevention or Treatment of Acute Graft-versus-Host Disease. Biology of Blood and Marrow Transplantation 2006, 12(1, Supplement 2):31-36.
155. Owen RG, Rawstron AC: Minimal residual disease monitoring in multiple myeloma: flow cytometry is the method of choice. British Journal of Haematology 2005, 128(5):732-733.
156. De Boer RJ, Mohri H, Ho DD, Perelson AS: Turnover Rates of B Cells, T Cells, and NK Cells in Simian Immunodeficiency Virus-Infected and Uninfected Rhesus Macaques. J Immunol 2003, 170(5):2479-2487.
157. De Boer RJ, Oprea M, Antia R, Murali-Krishna K, Ahmed R, Perelson AS: Recruitment Times, Proliferation, and Apoptosis Rates during the CD8+ T-Cell Response to Lymphocytic Choriomeningitis Virus. J Virol 2001, 75(22):10663-10669.
158. Schlossman SF, Boumsell L, Gilks W, Harlan JM, Kishimoto T: Leucocyte Typing V. New York, U.S.A.: Oxford University Press; 1995.
159. Choi EY, Christianson GJ, Yoshimura Y, Jung N, Sproule TJ, Malarkannan S, Joyce S, Roopenian DC: Real-time T-cell profiling identifies H60 as a major minor histocompatibility antigen in murine graft-versus-host disease. Blood 2002, 100(13):4259-4264.
160. James GM, Sood A: Performing Hypothesis Tests on the Shape of Functional Data. Computational statistics and data analysis 2006, 50(7):1774-1792.
161. Zeng QT, Pratt JP, Pak J, Ravnic D, Huss H, Mentzer SJ: Feature-guided clustering of multi-dimensional flow cytometry datasets. Journal of Biomedical Informatics, In Press, Corrected Proof.
162.
Glucksberg H, Storb R, Fefer A, Buckner CD, Neiman PE, Clift RA, Lerner KG, Thomas ED: Clinical manifestations of graft-versus-host disease in human recipients of marrow from HL-A-matched sibling donors. Transplantation 1974, 18(4):295-304.
163. Przepiorka D, Weisdorf D, Martin P, Klingemann HG, Beatty P, Hows J, Thomas ED: 1994 Consensus Conference on Acute GVHD Grading. Bone Marrow Transplantation 1995, 15(6):825-828.
164. Rowlings PA, Przepiorka D, Klein JP, Gale RP, Passweg JR, Henslee-Downey J, Cahn J-Y, Calderwood S, Gratwohl A, Socie G et al: IBMTR Severity Index for grading acute graft-versus-host disease: retrospective comparison with Glucksberg grade. British Journal of Haematology 1997, 97(4):855-864.
165. Lerner KG, Kao CM, Storb R, Buckner CD, Clift RA, Thomas ED: Histopathology of graft-vs.-host reaction (GvHR) in human recipients of marrow from HL-A-matched sibling donors. Transplantation Proceedings 1974, 6(4):367-371.
166. Sale GE: Pathology and recent pathogenetic studies in human graft-versus-host disease. Survey and synthesis of pathology research 1984, 3(3):235-253.
167. Sale GE, Lerner KG, Barker EA, Shulman HM, Thomas ED: The skin biopsy in the diagnosis of acute graft-versus-host disease in man. Am J Pathol 1977, 89(3):621-635.
168. Sviland L, Pearson AD, Green MA, Baker BD, Eastham EJ, Reid MM, Hamilton PJ, Proctor SJ, Malcolm AJ: Immunopathology of early graft-versus-host disease - a prospective study of skin, rectum, and peripheral blood in allogeneic and autologous bone marrow transplant recipients. Transplantation 1991, 52(6):1029-1036.
169. Atkinson K, Munro V, Vasak E, Biggs J: Mononuclear cell subpopulations in the skin defined by monoclonal antibodies after HLA-identical sibling marrow transplantation. British Journal of Dermatology 1986, 114(2):145-160.
170.
Snover DC, Weisdorf SA, Ramsay N, McGlave P, Kersey J: Hepatic graft versus host disease: a study of the predictive value of liver biopsy in diagnosis. Hepatology 1984, 4(1):123-130.
171. Shulman HM, Sharma P, Amos D, Fenster LF, McDonald GB: A coded histologic study of hepatic graft-versus-host disease after human bone marrow transplantation. Hepatology 1988, 8(3):462-470.
172. Epstein RJ, McDonald GB, Sale GE, Shulman HM, Thomas ED: The diagnostic accuracy of the rectal biopsy in acute graft-versus-host disease: a prospective study of thirteen patients. Gastroenterology 1980, 78(4):764-771.
173. Atkinson K, Horowitz MM, Biggs J, Gale RP, Rimm AA, Bortin MM: The clinical diagnosis of acute graft-versus-host disease: a diversity of views amongst marrow transplant centers. Bone Marrow Transplantation 1988, 3(1):5-10.
174. Martino R, Romero P, Subira M, Bellido M, Altes A, Sureda A, Brunet S, Badell I, Cubells J, Sierra J: Comparison of the classic Glucksberg criteria and the IBMTR Severity Index for grading acute graft-versus-host disease following HLA-identical sibling stem cell transplantation. Bone Marrow Transplantation 1999, 24(3):283-287.
175. Weisdorf DJ, Hurd D, Carter S, Howe C, Jensen LA, Wagner J, Stablein D, Thompson J, Kernan NA: Prospective grading of graft-versus-host disease after unrelated donor marrow transplantation: a grading algorithm versus blinded expert panel review. Biology of Blood and Marrow Transplantation 2003, 9(8):512-518.
176. Vinterbo SA, Kim E-Y, Ohno-Machado L: Small, fuzzy and interpretable gene expression based classifiers. Bioinformatics 2005, 21(9):1964-1970.
177. Dunn J: Well separated clusters and optimal fuzzy partitions. Journal Cybernet 1974, 4:95-104.
178. Hastie T, Tibshirani R: Discriminant Analysis by Gaussian Mixtures.
Journal of the Royal Statistical Society Series B 1996, 58:158-176.
179. Hall MA: Correlation-based Feature Subset Selection for Machine Learning. PhD dissertation. Hamilton, New Zealand: University of Waikato; 1999.
180. Witten IH, Frank E: Data Mining: Practical machine learning tools and techniques, Second edn. San Francisco: Elsevier; 2005.
181. Platt JC: Fast Training of Support Vector Machines using Sequential Minimal Optimization. In: Advances in Kernel Methods - Support Vector Learning. Edited by Schoelkopf B, Burges CJC, Smola AJ. Cambridge, Massachusetts: MIT Press; 1998:185-208.

APPENDICES

Appendix A. Patient information on maximum GvHD grade, GvHD diagnosis in days post-transplant and patient-donor relationship

Patient # | Max aGvHD grade | aGvHD diagnosis (days post-transplant) | cGvHD diagnosis (days post-transplant) | Donor-patient relationship | Comments
1 | 3 | 26 | - | MUD | Last contact 187 days post-transplant
2 | 0 | - | - | SIB | -
3 | 4 | 23 | - | MUD | Expired 61 days post-transplant
4 | 0 | - | - | SIB | Expired 278 days post-transplant
5 | 3 | 59 | - | SIB | -
6 | 3 | 19 | - | SIB | -
7 | 3 | 39 | - | SIB | Expired 89 days post-transplant
8 | 0 | - | 122 | SIB | -
9 | 3 | 43 | 211 | SIB | -
10 | 1 | 11 | - | MUD | -
11 | 1 | 68 | 273 | SIB | -
12 | 3 | 22 | - | SIB | -
13 | 3 | 48 | - | SIB | -
14 | 2 | 28 | - | MUD | Relapsed
15 | 2 | 19 | 98 | SIB | -
16 | 2 | 10 | - | MUD | Expired 74 days post-transplant
17 | 0 | - | - | SIB | Relapsed
18 | 0 | - | - | SIB | Expired 54 days post-transplant
19 | 2 | 77 | 446 | SIB | -
20 | 0 | - | - | SIB | Expired 55 days post-transplant
21 | 3 | 54 | 294 | MUD | -
22 | 3 | 32 | 223 | SIB | -
23 | 3 | 22 | - | SIB | Last contact < 100 days post-transplant
24 | 3 | 37 | - | SIB | -
25 | 1 | 44 | - | SIB | Expired 89 days post-transplant
26 | 0 | - | 117 | SIB | -
27 | 2 | 31 | - | SIB | -
28 | 1 | 51 | 177 | MUD | -
29 | 0 | - | - | SIB | Expired 97 days post-transplant
30 | 0 | - | 104 | SIB | -
31 | 0 | - | - | SIB | Last contact 109 days post-transplant; Relapsed

Appendix B.
List of the subsets of immune cells from each of the ten aliquots

Aliquots / Immune cells

1 Activation
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3 PerCP+/CD44-CD25-
    SSC, FSC/CD3 PerCP+/CD44-CD25+
    SSC, FSC/CD3 PerCP+/CD44+CD25+
    SSC, FSC/CD3 PerCP+/CD44+CD25+/CD69+
    SSC, FSC/CD3 PerCP+/CD44+CD25-
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD3 PerCP-/CD44+CD25+
    SSC, FSC/CD3 PerCP-/CD44+CD25+/CD69+
    SSC, FSC/CD3 PerCP-/CD44-CD25-
    SSC, FSC/CD3 PerCP-/CD44+CD25-

2 Activation
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3 PerCP+/CD4br
    SSC, FSC/CD3 PerCP+/CD4int
    SSC, FSC/CD3 PerCP+/CD8dim
    SSC, FSC/CD3 PerCP+/CD8br
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD3 PerCP-/CD4dim
    SSC, FSC/CD3 PerCP-/CD4-CD8-
    SSC, FSC/CD3 PerCP-/CD8low

3 Activation
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3 PerCP+/CD4br
    SSC, FSC/CD3 PerCP+/CD4int
    SSC, FSC/CD3 PerCP+/CD8dim
    SSC, FSC/CD3 PerCP+/CD8br
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD3 PerCP-/CD4dim
    SSC, FSC/CD3 PerCP-/CD4-CD8-
    SSC, FSC/CD3 PerCP-/CD8low
    SSC, FSC/CD3 PerCP-/CD8low/CD122hi

B cells
    SSC, FSC/CD20+
    SSC, FSC/CD22+
    SSC, FSC/CD22+CD20+
    SSC, FSC/CD20+CD19+

Myeloids
    SSC, FSC/CD33+CD45+
    SSC, FSC/CD33+CD45+/CD15+CD14+
    SSC, FSC/CD33+CD45dim
    SSC, FSC/CD33+CD45dim/CD15+CD14+
    SSC, FSC/CD33+CD45dim/CD15lowCD14low
    SSC, FSC/CD33+CD45dim/CD15+CD14-
    SSC, FSC/CD45+CD33-
    SSC, FSC/CD45+CD33-/CD15+CD14-

NK cells
    SSC, FSC/CD2-CD16+
    SSC, FSC/CD2-CD16+/CD56+CD3-
    SSC, FSC/CD2-CD16+/CD3+CD56-
    SSC, FSC/CD2-CD16+/CD56-CD3-
    SSC, FSC/CD2dimCD16+
    SSC, FSC/CD2dimCD16+/CD56+CD3-
    SSC, FSC/CD2dimCD16+/CD3+CD56-
    SSC, FSC/CD2dimCD16+/CD56-CD3-
    SSC, FSC/CD2+CD16+
    SSC, FSC/CD2+CD16+/CD56+CD3-
    SSC, FSC/CD2+CD16+/CD3+CD56-
    SSC, FSC/CD2+CD16+/CD56-CD3-
    SSC, FSC/CD2+CD16-
    SSC, FSC/CD2+CD16-/CD56+CD3-
    SSC, FSC/CD2+CD16-/CD3+CD56-
    SSC, FSC/CD2+CD16-/CD56-CD3-

T cells
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3 PerCP+/CD4+CD8β-
    SSC, FSC/CD3 PerCP+/CD4+CD8β+
    SSC, FSC/CD3 PerCP+/CD4+CD8β+/CD8+
    SSC, FSC/CD3 PerCP+/CD4+CD8β+/CD8+ (proportion of CD3+ cells)
    SSC, FSC/CD3 PerCP+/CD8β+CD4-
    SSC, FSC/CD3 PerCP+/CD8+CD8β-
    SSC, FSC/CD3 PerCP+/CD8β+CD8+
    SSC, FSC/CD3 PerCP+/CD8β+CD8low
    SSC, FSC/CD3 PerCP+/CD8β[illegible]CD8-
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD3 PerCP-/CD4lowCD8βlow
    SSC, FSC/CD3 PerCP-/CD8+CD8β-
    SSC, FSC/CD3 PerCP-/CD8βdimCD8-

rest/activate T helper
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3+CD4+
    SSC, FSC/CD3+CD4-
    SSC, FSC/45ROCD3+
    SSC, FSC/45ROCD3+/CD4+
    SSC, FSC/45ROCD3+/CD4-
    SSC, FSC/45ROCD3+/CD4low
    SSC, FSC/45RACD3+
    SSC, FSC/45RACD3+/CD4+
    SSC, FSC/45RACD3+/CD4-
    SSC, FSC/45RACD3+/CD4low
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD4dim
    SSC, FSC/CD3-CD4-
    SSC, FSC/45ROCD3-
    SSC, FSC/45ROCD3-/CD4dim
    SSC, FSC/45RACD3-
    SSC, FSC/45RACD3-/CD4dim

rest/activate T suppressor
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3+CD8+
    SSC, FSC/CD3+CD8-
    SSC, FSC/45ROCD3+
    SSC, FSC/45ROCD3+/CD8+
    SSC, FSC/45ROCD3+/CD8-
    SSC, FSC/45ROCD3+/CD8low
    SSC, FSC/45RACD3+
    SSC, FSC/45RACD3+/CD8+
    SSC, FSC/45RACD3+/CD8-
    SSC, FSC/45RACD3+/CD8low
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD8+CD3-
    SSC, FSC/CD3-CD8-
    SSC, FSC/45ROCD3-
    SSC, FSC/45RACD3-
    SSC, FSC/45RACD3-/CD8+

TCR
    SSC, FSC/CD3 PerCP+
    SSC, FSC/CD3 PerCP+/TCRab+CD5+
    SSC, FSC/CD3 PerCP+/TCRab+CD5+/TCRab+TCRgd+
    SSC, FSC/CD3 PerCP+/TCRgd+CD5+
    SSC, FSC/CD3 PerCP+/TCRab+CD5-
    SSC, FSC/CD3 PerCP+/TCRab+CD5-/TCRab+TCRgd+
    SSC, FSC/CD3 PerCP+/TCRab+CD5-/TCRgd-CD5-
    SSC, FSC/CD3 PerCP-
    SSC, FSC/CD3 PerCP-/CD5+
    SSC, FSC/CD3 PerCP-/TCRab+CD5+
    SSC, FSC/CD3 PerCP-/TCRab+CD5-
    SSC, FSC/CD3 PerCP-/TCRab+TCRgd-/CD5+
    SSC, FSC/CD3 PerCP-/TCRab+CD5-/TCRab+TCRgd+
    SSC, FSC/CD3 PerCP-/TCRab+CD5-/TCRgd-CD5-
    SSC, FSC/CD3 PerCP-/TCR+CD5+
    SSC, FSC/CD3 PerCP-/TCRab+TCRgd-/CD5-

The '/' indicates each level of the sequential gating.

Appendix C.
PERL script fixFCS.pl for enforcing FCS file compatibility from FlowJo into rflowcyt

#!/usr/bin/perl
#fixFCS_v0.7.pl
#Written by Shang-Jung (Jessica) Lee
#BC Cancer Research Centre
#Last updated: December 13, 2006
#Maintainer: Jessica Lee <jlee@bccrc.ca>
#Please note that immune cell populations and measurements are used interchangeably in the PERL code/documentation
#This PERL script reads in the FCS files from FlowJo (Tree Star, Inc, Oregon)
#It then creates new FCS files with the modifications necessary to be read successfully into R via rflowcyt
#NOTE: information on experiment details and sample labels may be lost!!
#Folders and files are selected based on their names. The user can modify this selection in the regular expressions located below the comment "####### USER MODIFY HERE"
#This script also updates the header with the new bytes information
#Tested on FCS version 2.0 exported from FlowJo version 6.3.4
#Tested with rflowcyt version 1.4.0 on R (windows 2.3.0)
#On Windows XP (Pentium 4 CPU, 1.00GB of RAM), it takes less than 1 minute to search through 500 files and modify/create 200 files.
#Please report all bugs and suggestions to <jlee@bccrc.ca>
use warnings;
use strict;
use File::Find;
use Storable;
use Getopt::Long;
use bytes ();

############
### MAIN ###
############
#opens log file to record status and errors
open (OUTFILElog, ">>fixFCS.log") or die ("Cannot open output file: $!");
print (OUTFILElog "\n\nSTART TIME: " . scalar localtime() . "\n");
#selects folder with the FCS files to be modified
&SELECT_FOLDER();
close (OUTFILElog) or die ("Cannot close output file: $!");

#############################################################
#sub SELECT_FOLDER
#selects one or all subfolders within the current location based on user specification
#calls subroutine SELECT_FILES internally
sub SELECT_FOLDER {
    print ("Subfolder name or \'all\' for all subfolders (based on the default selection criteria, P#): ");
    chomp(my $userFolder = <STDIN>);
    if ($userFolder =~ m/all$/i){ #select all folders
        my @folderNames;
        find sub {push @folderNames, $File::Find::name if -d}, '.';
        foreach my $folderPos (0..$#folderNames){
            ####### USER MODIFY HERE for selecting folder/file
            if ($folderNames[$folderPos] =~ m|\./(p[\d]+)|){
                &SELECT_FILES("$1");
            }
        }
    }
    else{ #select specific folder
        if (-d $userFolder){
            &SELECT_FILES("$userFolder");
        }
        else{
            print OUTFILElog "END PROGRAM: cannot find folder $userFolder\n";
            die ("Cannot find folder $userFolder: $!");
        }
    }
} #sub SELECT_FOLDER

#############################################################
#sub SELECT_FILES
#selects the correct FCS files based on their naming scheme
#calls subroutine FIX internally
#INPUT: name and location (optional) of the subfolder where FCS files are located
sub SELECT_FILES {
    my $patientFolder = shift(@_);
    my @FCSfiles;
    #find all files in folder..
    find sub {push @FCSfiles, $File::Find::name}, "./$patientFolder";
    my %fixed;
    my %toBeFix;
    #in order to save time, skip any FCS file that was already fixed (i.e. a corresponding FCS file with the modified data exists (+ "_fixed"))
    foreach my $currentFile (@FCSfiles){
        if (!($currentFile =~ m/\._/)){
            if ($currentFile =~ m/(.+)_fixed\.fcs/){
                ####### USER MODIFY HERE for selecting / excluding files
                $fixed{"$1"} = "$currentFile";
            }
            elsif ($currentFile =~ m/(.+)\.fcs/){
                $toBeFix{"$1"} = "$currentFile";
            }
        }
    }
    foreach my $key (keys %toBeFix){
        if (!(exists($fixed{$key}))){
            print (OUTFILElog "fixing: $toBeFix{$key}\n");
            &FIX("$toBeFix{$key}");
        }
        else{
            print (OUTFILElog "FIXED: $toBeFix{$key}\n");
        }
    } #foreach file
} #sub SELECT_FILES

#############################################################
#sub FIX
#removes the unwanted keywords in the FCS file
#updates bytes information in the header
#creates a new FCS file with the necessary modification
#INPUT: name and location (optional) of the FCS file
sub FIX {
    my $currentFile = shift(@_);
    my $newFileName = "$currentFile";
    $newFileName =~ s/\.fcs/_fixed.fcs/; ####### USER MODIFY HERE for naming scheme
    my $keywords;
    my $temp;
    #reading in the BINARY file
    open (INFILE, "<:raw", "$currentFile") or die ("Cannot open input file: $!");
    binmode (INFILE);
    until (eof INFILE){
        $temp .= <INFILE>;
    }
    #remove $FIL (not necessary)
    #if ($entireText =~ m|\$FIL.+\.fcs\\\$NEXTDATA|){
    #$entireText =~ s|\$FIL.+\.fcs\\\$NEXTDATA|\\\$NEXTDATA|;
    #print (OUTFILElog "remove \$FIL\n");
    #}
    #remove $BTIM...BD$NPAR....BD$P1N...
    if ($temp =~ m|(\\\$DATATYPE\\.{1,2})\\\$BTIM\\.+\\BD\$NPAR.+\\BD\$P1N.+(\\\$P1N)|){
        print (OUTFILElog "remove \$BTIM...BD\$NPAR\n");
        $temp =~ s|(\\\$DATATYPE\\.{1,2})\\\$BTIM\\.+\\BD\$NPAR.+\\BD\$P1N.+(\\\$P1N)|$1$2|;
    }
    #remove $BEGINDATA (not necessary)
    #determine the old byte information
    my %oldBytes;
    my $newHeader;
    if ($segments[0] =~ m/(FCS\d\.\d)(.+)$/){
        $newHeader = $1;
        $oldBytes{"original"} = "" . $2;
        $oldBytes{"start keyword"} = substr($oldBytes{"original"},0,12);
        $oldBytes{"end keyword"} = substr($oldBytes{"original"},12,8);
        $oldBytes{"start data"} = substr($oldBytes{"original"},20,8);
        $oldBytes{"end data"} = substr($oldBytes{"original"},28,8);
        $oldBytes{"sOsO"} = substr($oldBytes{"original"},36);
        #make sure that the number of digits in the old and the new byte information is the same (left-pad the new value with spaces)
        if ((length($oldBytes{"start keyword"})-length($bytes{"start keyword"})) >= 0){
            for (1..(length($oldBytes{"start keyword"})-length($bytes{"start keyword"}))){ $newHeader .= " ";}
            $newHeader .= $bytes{"start keyword"};
        }
        else{ print (OUTFILElog "Error: over size limit\n"); return();}
        if ((length($oldBytes{"end keyword"})-length($bytes{"end keyword"})) >= 0){
            for (1..(length($oldBytes{"end keyword"})-length($bytes{"end keyword"}))){ $newHeader .= " ";}
            $newHeader .= $bytes{"end keyword"};
        }
        else{ print (OUTFILElog "Error: over size limit\n"); return();}
        if ((length($oldBytes{"start data"})-length($bytes{"start data"})) >= 0){
            for (1..(length($oldBytes{"start data"})-length($bytes{"start data"}))){ $newHeader .= " ";}
            $newHeader .= $bytes{"start data"};
        }
        else{ print (OUTFILElog "Error: over size limit\n"); return();}
        if ((length($oldBytes{"end data"})-length($bytes{"end data"})) >= 0){
            for (1..(length($oldBytes{"end data"})-length($bytes{"end data"}))){ $newHeader .= " ";}
            $newHeader .= $bytes{"end data"};
        }
        else{ print (OUTFILElog "Error: over size limit\n"); return();}
        $newHeader .= $oldBytes{"sOsO"};
        #replace old header with the new one
        open (OUTFILE, ">$newFileName") or die ("Cannot open output file: $!");
        binmode(OUTFILE);
        print (OUTFILE "" . $newHeader . $spaces40 . $segments[1] . $spaces40 .
            $segments[2]);
        close (OUTFILE) or die ("Cannot close output file: $!");
    }
    }
    else {
        print (OUTFILElog "ERROR: Cannot locate header in the FCS file ($currentFile)\n");
        return();
    }
} #sub FIX

Appendix D. PERL script viz_days.pl for flow cytometry data transformation

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Storable;
use Getopt::Long;
#viz_days.pl
#Written by Shang-Jung (Jessica) Lee
#BC Cancer Research Centre
#Last updated: July 14, 2006
#Maintainer: Jessica Lee <jlee@bccrc.ca>
#Please note that immune cell populations and measurements are used interchangeably in the PERL code/documentation
#This PERL script reads in files containing flow cytometry data and clinical data
##acute GvHD diagnosis time in days post-transplant from file "GvHD_days_p31.txt" via subroutine GVHD_DAY
##flow cytometry data files (for each patient, each available aliquot) in the specified subfolder via subroutine FILES
##sampling time points for each patient in file "sampling_time_p31.txt" via subroutine SAMPLING_TIME
##MNC values estimated from different samples of the same patient population from file "JL_MNC.txt" via subroutine READ_MNC
#It combines these files and user-specified information such as the excerpt time range
#New files are created, grouping samples from patients taken in a specific time range into an individual file for each available measurement, in subfolder 'visualization'
#make a subfolder named 'visualization' if it does not exist
if (!-d ".\\\\visualization"){mkdir ".\\\\visualization" or die ("Cannot make subfolder visualization");}
my $log = ".\\visualization\\log_viz.txt"; #log file
#Pre-specified parameters
#average aGvHD diagnosis in days post-transplant, used in the data transformation of non-GvHD data from days post-transplant into days from aGvHD diagnosis
my $averageGVHD = 36;
#input files:
#GVHD diagnosis day
my $gvhd_diagnosis_inputFile = "GvHD_days_p31.txt";
if (!-e $gvhd_diagnosis_inputFile){die ("Cannot find file: $gvhd_diagnosis_inputFile");}
#sampling time points
my $sampling_inputFile = "sampling_time_p31.txt";
if (!-e $sampling_inputFile){die ("Cannot find file: $sampling_inputFile");}
#mnc values
my $mnc_inputFile = "JL_MNC.txt";
if (!-e $mnc_inputFile){die ("Cannot find file: $mnc_inputFile");}

################
### sub MAIN ###
################
my $reference; #user specified option: time in days from transplantation or aGvHD diagnosis
GetOptions('r|reference=s'=>\$reference);
if(!$reference || !($reference =~ m/transplant|gvhd/i)){
    die ("Usage: perl visualization.pl -r <post - \"transplant\" or \"gvhd\"");
}
#open log file to record status and errors
open (OUTFILElog, ">$log") or die ("Cannot open output file: $!");
#obtain data from files by calling the various subroutines
#read in raw flow cytometry data file as exported from FlowJo
my @temp = &FILES();
my %data = %{$temp[0]}; #data
my %ann = %{$temp[1]}; #measurement names
#read in sampling time points for each patient in days post-transplant
my %samplingTime = %{&SAMPLING_TIME()};
#read in GvHD diagnosis in days post-transplant
my %GVHDdays = %{&GVHD_DAY()};
#read in the mnc values
my %MNC = %{&READ_MNC()};

#############################
#change time scale:
#if the user chooses acute GvHD diagnosis as the point of reference (instead of the transplantation), change the days in sampling time, data, and mnc
#time is originally recorded in days post-transplant
if ($reference =~ m/gvhd/i){
    print "changing sampling time and data to reflect time post-aGVHD\n";
    my
        %tempData;
    my %tempMNC;
    foreach my $tempPatient (keys %GVHDdays){
        #get the GvHD diagnosed day (days post-transplant) for each patient
        #if the patient was never diagnosed with GvHD (GVHDday = 0), the average GvHD day, which is set at the beginning of the script, is used
        my $gvhd = 0 + $GVHDdays{$tempPatient};
        if ($gvhd == 0){
            $GVHDdays{$tempPatient} = 0 + $averageGVHD;
            $gvhd = 0 + $averageGVHD;
        }
        #change the day in samplingTime
        if(exists($samplingTime{$tempPatient})){
            foreach (0..$#{$samplingTime{$tempPatient}}){
                $samplingTime{$tempPatient}[$_] = $samplingTime{$tempPatient}[$_] - $gvhd;
            }
        }
        else{print (OUTFILElog "###Cannot find the following patient in samplingTime: $tempPatient\n");}
        #change the day in data
        if(exists($data{$tempPatient})){
            foreach my $tempGroup (keys %{$data{$tempPatient}}){
                foreach my $tempMeasurement (keys %{$data{$tempPatient}{$tempGroup}}){
                    foreach (keys %{$data{$tempPatient}{$tempGroup}{$tempMeasurement}}){
                        $tempData{$tempPatient}{$tempGroup}{$tempMeasurement}{0+($_ -$gvhd)} = 0 + $data{$tempPatient}{$tempGroup}{$tempMeasurement}{$_};
                    }
                }
            }
        }
        else{print (OUTFILElog "###Cannot find the following patient in data: $tempPatient\n");}
        #change the day in MNC
        if(exists($MNC{$tempPatient})){
            foreach my $MNCday (keys %{$MNC{$tempPatient}}){
                $tempMNC{$tempPatient}{0+($MNCday-$gvhd)} = 0+$MNC{$tempPatient}{$MNCday};
            }
        }
        else{print (OUTFILElog "Cannot find the following patient in MNC: $tempPatient\n");}
    } #foreach patient
    %data = %tempData;
    %MNC = %tempMNC;
}

########################################
#get MNC sampling day for each patient into the array %MNC{patient}{"array"}
foreach my $patientToArray (keys %MNC){
    foreach my $dayToArray (keys %{$MNC{$patientToArray}}){
        push (@{$MNC{$patientToArray}{"array"}}, 0 + $dayToArray);
    }
    @{$MNC{$patientToArray}{array}} = sort {$a<=>$b} @{$MNC{$patientToArray}{array}};
}

########################################
#user input time range (post-transplant or post-aGVHD)
my @rangeDays;
print ("Specify time range (integer in DAYS) separated by \',\'. Leave this empty for the maximum available time range: ");
chomp(my $input = <STDIN>);
my @userSpecifyRange;
if ($input){
    @userSpecifyRange = split (",", $input);
    $rangeDays[0] = 0 + $userSpecifyRange[0];
    $rangeDays[1] = 0 + $userSpecifyRange[1];
}
else{
    #determine the earliest and the latest day in sampling time
    my $earliest = 100;
    my $latest = -100;
    foreach (keys %samplingTime){
        foreach (@{$samplingTime{$_}}){
            if ($earliest > $_){$earliest = 0 + $_;}
            if ($latest < $_){$latest = 0 + $_;}
        }
    }
    $rangeDays[0] = $earliest;
    $rangeDays[1] = $latest;
}

################################
#MAIN PRINT OUT
foreach my $group (keys %ann){
    foreach my $measurement (keys %{$ann{"$group"}}){
        my $tempGroup = "" . $group;
        $tempGroup =~ s/\s|-//g;
        my $tempMeasurement = "" . $measurement;
        $tempMeasurement =~ s~/|\&~~g;
        $tempMeasurement =~ s/\s/_/g;
        $tempMeasurement =~ s/\+/plus/g;
        $tempMeasurement =~ s/-/minus/g;
        #subfolders of visualization; convert a possible minus sign (from the time range) to "minus"
        my $subfolder1 = ".\\visualization\\asis_" . $reference . "_d" . $rangeDays[0] . "_d" . $rangeDays[1];
        $subfolder1 =~ s/-/minus/g;
        my $subfolder2 = ".\\visualization\\mnc_" . $reference . "_d" . $rangeDays[0] . "_d" . $rangeDays[1];
        $subfolder2 =~ s/-/minus/g;
        #make the subfolders if they do not exist already
        if (!-d $subfolder1){mkdir $subfolder1 or die "Cannot make subfolder $subfolder1"};
        if (!-d $subfolder2){mkdir $subfolder2 or die "Cannot make subfolder $subfolder2"};
        ###HAVE NOT implemented deleting all existing files in the subfolder
        my $fileAsis = "$subfolder1\\" . $tempGroup . "_" . $tempMeasurement . ".txt";
        my $fileMNC = "$subfolder2\\" . $tempGroup . "_" . $tempMeasurement . ".txt";
        open (OUTFILEAsis, ">$fileAsis") or die ("Cannot open output file($fileAsis): $!");
        open (OUTFILEMNC, ">$fileMNC") or die ("Cannot open output file($fileMNC): $!");
        print (OUTFILEAsis "time");
        print (OUTFILEMNC "time");
        foreach (sort {$a<=>$b} keys %samplingTime){ #print header (patients)
            print (OUTFILEAsis "\tp$_");
            print (OUTFILEMNC "\tp$_");
        }
        print (OUTFILEAsis "\n");
        print (OUTFILEMNC "\n");
        foreach my $currentDay ($rangeDays[0]..$rangeDays[1]){
            print (OUTFILEAsis "$currentDay");
            print (OUTFILEMNC "$currentDay");
            foreach my $patient (sort {$a<=>$b} keys %samplingTime){
                my $currentProportion;
                if(exists($data{$patient}) && exists($data{$patient}{$group}) && exists($data{$patient}{$group}{$measurement}) && exists($data{$patient}{$group}{$measurement}{$currentDay})){
                    $currentProportion = 0 + $data{$patient}{$group}{$measurement}{$currentDay};
                }
                #matching MNC value at the closest sampling time to the current day
                my @closest = &CLOSEST("$currentDay", \@{$MNC{$patient}{array}});
                my $currentMNC;
                if (exists($MNC{$patient}) && exists($MNC{$patient}{$closest[1]})){
                    $currentMNC = 0 + $MNC{$patient}{$closest[1]};
                }
                if (defined($currentProportion)){
                    print (OUTFILEAsis "\t" . $currentProportion);
                    if (defined($currentMNC)){
                        print (OUTFILEMNC "\t" . ($currentProportion * $currentMNC));
                    }
                    else { print (OUTFILEMNC "\t"); }
                }
                else {
                    print (OUTFILEAsis "\t");
                    print (OUTFILEMNC "\t");
                }
            } #patient
            print (OUTFILEAsis "\n");
            print (OUTFILEMNC "\n");
        } #current day
        close (OUTFILEAsis) or die ("Cannot close output file: $!");
        close (OUTFILEMNC) or die ("Cannot close output file: $!");
    } #measurement
} #group
close (OUTFILElog) or die ("Cannot close output file: $!");

#############################################################
#sub GVHD_DAY
#read in GVHD diagnosis day (post-transplant) from the specified file $gvhd_diagnosis_inputFile
#return hash %aGVHD
##$aGVHD{patient number (number only)} => aGVHD diagnosis day (zero if the patient was not diagnosed with aGVHD)
sub GVHD_DAY{
    my $input = "$gvhd_diagnosis_inputFile";
    #storing parsed GvHD diagnosis day data into hash %aGVHD
    my %aGVHD;
    print "performing subroutine GVHD_DAY...\n";
    open (INFILE, "$input") or die ("Cannot open input file: $!");
    until (eof INFILE){
        chomp(my $newText = <INFILE>);
        #first row contains patient number and second row is the aGVHD diagnosis day (post-transplant)
        my @values = split("\t", $newText);
        if(@values){
            my $patient;
            if ($values[0] =~ m/([\d]+)/){
                my $temp = $1;
                $temp =~ s/^0+//g; #remove any zero at the beginning
                $patient = 0 + $temp;
            }
            else {
                print (OUTFILElog "###CANNOT find patient number in GVHD_DAY: @values\n");
                die("Cannot find patient number in GVHD_DAY\n");
            }
            if($values[1] =~ m/([\d]+)/){
                $aGVHD{"$patient"} = 0 + $1;
            }
            else{
                print (OUTFILElog "###Cannot find aGVHD diagnosis day in GVHD_DAY for patient $patient from @values\n");
                die ("Cannot find aGVHD diagnosis day in GVHD_DAY\n");
            }
        }
    }
    close (INFILE) or die ("Cannot close input file: $!");
    return(\%aGVHD);
}

#############################################################
#sub FILES
#read in the raw flow cytometry data files in the user-specified subfolder
#flow cytometry data files were exported from FlowJo
#flow cytometry data file naming scheme:
##the name of the file indicates patient number and aliquot name
##E#'patient number' 'patient initial' 'year'-'aliquot name'.txt
#flow cytometry data file format:
##first row includes sampling time in days
##post-transplant
##first column indicates the measurement name
#assumptions:
##same measurement (or comparable measurements) have the same name
##did NOT assume that the measurements were listed in any order
#return two hashes: %data and %ann
##%data
###$data{patient number (number only)}{measurement group name}{measurement name}{time in day post-transplant} => actual measurement in % from FlowJo
##%ann
###$ann{measurement group name}{measurement name}
sub FILES{
    my (%data, %ann);
    print ("performing subroutine FILES ....\n");
    #prompt for name of the subfolder
    print ("Specify folder name containing the raw data files: ");
    chomp (my $dataFolder = <STDIN>);
    if (!-d "$dataFolder"){
        die ("INVALID folder name entered!\n");
    }
    #find all files in the specified subfolder
    my @fileNames;
    find sub {push @fileNames, $File::Find::name if !-d}, ".\\$dataFolder";
    foreach my $file (@fileNames){
        #derive patient number and aliquot name from the file name
        my ($patient, $group);
        if ($file =~ m|E\#([\d]+)\s[\w]*\s[\d]*(.+)\.txt$|){
            $patient = 0 + $1;
            $group = "$2";
            $group =~ s/\.jo-l//;
            $group =~ s/^-//;
            open (INFILE, "$file") or die ("Cannot open input file: $!");
            print (OUTFILElog "Reading file: $file\n");
            #header with measurement names in the first column ([1,1] is always "sample")
            chomp(my $header = <INFILE>);
            my @titles = split("\t", $header);
            foreach (0..$#titles){
                my $measurement = "$titles[$_]";
                #clean up the measurement name
                $measurement =~ s|"\"*\SSC,, FSC/||;
                $measurement =~ s|,Freq. of Parent\"*$|ofParent|;
                $measurement =~ s|,Freq. of,SSC, FSC\"*$|ofLiveCells|;
                $measurement =~ s|,Freq\..of,CD3.+erCP\+*$|ofTcells|;
                $measurement =~ s|[\s]PerCP||;
                $measurement =~ s|\s||g;
                $titles[$_] = "$measurement";
            }
            until (eof INFILE){
                chomp(my $text = <INFILE>);
                my @values = split("\t", $text);
                if (@values){
                    #sampling day post-transplant in the first row (d# or #d)
                    my $day;
                    if ($values[0] =~ m/(-*[\d]+)d/ || $values[0] =~ m/d(-*[\d]+)/){
                        $day = 0 + $1;
                    }
                    else { print (OUTFILElog "CANNOT FIND: $values[0]\n"); }
                    foreach my $count (1..$#values){
                        if($values[$count] =~ m/[\d]/){ #could be empty
                            $data{"$patient"}{"$group"}{"$titles[$count]"}{"$day"} = 0 + $values[$count];
                            $ann{"$group"}{"$titles[$count]"}++;
                        }
                    }
                }
            } #until
            close (INFILE) or die ("Cannot close input file: $!");
        } #patient number and measurement group name from file name
        else{
            print (OUTFILElog "###CANNOT find patient number and/or measurement group name from the file name $file\n");
        }
    } #foreach file
    if (!%data || !%ann){die ("No data/annotation");}
    return (\%data, \%ann);
}

#############################################################
#sub SAMPLING_TIME
#read in the sampling time for each patient (raw data is not used because not all measurements from one patient are available on all the time points etc)
#input file: sampling_time_p31.txt
#fill in hash data complex: %samplingTime => saved as samplingTime.hash
#@samplingTime{patient number, 1-31} => sorted (from small to large) sampling time (day post transplant)
#return \%samplingTime
sub SAMPLING_TIME {
    my $samplingTimeFileName = "samplingTime.hash";
    my $input = "$sampling_inputFile";
    my %samplingTime;
    print "performing subroutine SAMPLING_TIME \n";
    open (INFILE, "$input") or die ("Cannot open input file: $!");
    until (eof INFILE){
        chomp (my $newText = <INFILE>);
        my @values = split ("\t", $newText);
        if (@values){
            #first row is the
            #patient numbers
            my $patient;
            if ($values[0] =~ m/([\d]+)/){
                my $temp = $1;
                $temp =~ s/^0+//g; #get rid of extra zero in front of the patient number (01->1)
                $patient = 0 + $temp;
            }
            else {
                print (OUTFILElog "###CANNOT find patient number in SAMPLING_TIME: @values\n");
                die ("Cannot find patient number!");
            }
            foreach my $count (1..$#values){
                if ($values[$count] =~ m/[\d]/){push @{$samplingTime{"$patient"}}, 0 + $values[$count];}
            }
            @{$samplingTime{"$patient"}} = sort {$a <=> $b} @{$samplingTime{"$patient"}};
        }
    }
    close (INFILE) or die ("Cannot close input file: $!");
    return(\%samplingTime);
}

#############################################################
#sub READ_MNC
#read in the mnc values from the specified file $mnc_inputFile
#return hash %MNC
##format: $MNC{patient number}{sampling time in days post-transplant} => mnc value (in mm3)
sub READ_MNC {
    my %MNC;
    open (INFILE, "$mnc_inputFile") or die ("Cannot open input file: $!");
    my $title = <INFILE>;
    until (eof INFILE){
        chomp(my $newText = <INFILE>);
        my @cols = split ("\t", $newText);
        #cols[0] => patient number
        #cols[1] => sample date
        #cols[2] => MNC value
        #cols[3] => BMT date
        #cols[4] => days post-transplant
        #$MNC{patient number}{days post-transplant} = MNC value
        if ($cols[2] && $cols[0]) { #if both patient number and MNC value exist
            $MNC{0 + $cols[0]}{0 + $cols[4]} = 0 + $cols[2];
        }
    }
    close (INFILE) or die ("Cannot close input file: $!");
    return (\%MNC);
}

#############################################################
#sub CLOSEST
#INPUT: a target value and an array
#finds the value in the array that is closest to the target value
#returns two values: position of the closest value inside the array and the actual closest value
###################
### sub CLOSEST ###
###################
sub CLOSEST {
    my $target = shift(@_);
    my @array = @{shift(@_)};
    if ($array[$#array] <= $target) { return("$#array", "$array[$#array]"); }
    elsif ($array[0] >= $target){return("0", "$array[0]");}
    else{
        foreach my $position (0..$#array){
            my $element = $array[$position];
            if ($element == $target){return("$position", "$element");}
            elsif($element>$target){
                my $fromLarge = abs($element-$target);
                my $fromSmall = abs($target-$array[$position-1]);
                if($fromLarge<$fromSmall){return("$position", "$element");}
                else{return(($position-1), $array[$position-1]);}
            }
        }
    }
}

Appendix E. PERL script FLDA_MATLAB.pl for creating MATLAB commands performing FLDA analysis

#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
use Storable;
use Getopt::Long;
use Tie::File;
use POSIX;
#FLDA_MATLAB.pl
#Written by Shang-Jung (Jessica) Lee
#BC Cancer Research Centre
#Last updated: August 21, 2006
#Maintainer: Jessica Lee <jlee@bccrc.ca>
#Please note that immune cell populations and measurements are used interchangeably in the PERL code/documentation
#this script reads in the text files (each file represents a different measurement/population) prepared by viz_days.pl
#it then outputs the MATLAB commands necessary to perform FLDA classification on each measurement (that qualifies, see filter below).
#FLDA or functional linear discriminant analysis:
## James, G.M. and Hastie, T.J. (2001) Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, Series B.
## 63(3): 533-550.
## FLDA was implemented in MATLAB by Simon Dablemont <Dablemont@dice.ucl.ac.be>
## for everything related to FLDA (i.e. setting parameters such as grid, B-spline order and knots), please refer to the published paper and the manuals available on Dr. Gareth James' website <http://www-rcf.usc.edu/~gareth/>
#user is able to select:
##1. where the data are located
##2. which population to analyze (by choosing the appropriate file) or all files in the specified folder
##3. FLDA parameter: grid range (a time range that covers all the data selected)
##4. FLDA parameter: grid interval
##5. FLDA parameter: B-spline order (norder)
##6. FLDA parameter: number of B-spline knots (which will be placed uniformly covering the grid)
##7. Different pre-set patient comparisons
###########################################
#user inputs for data and FLDA parameters
#then checks that inputs (mostly format) are correct

#specify subfolder name where the data are located
print ("Specify the subfolder name: ");
chomp(my $folder = <STDIN>);
print ("Specify the file name or \"all\" for all files in the specified subfolder: ");
chomp(my $fileName = <STDIN>);
my $userFile = ".\\$folder\\$fileName";
#check if the specified subfolder and file exist (skipped when "all" is requested)
if (!(-e $userFile) && $userFile !~ m/all$/i) {die "Cannot find input file: $userFile";}

#grid range (or time range)
print ("Specify grid range(#,#): ");
chomp(my $userInput_grid = <STDIN>);
my @grid = split(",", $userInput_grid);
#check if the correct grid format is used
if (!(scalar(@grid) == 2)){die ("Incorrect grid: $userInput_grid");}
print ("Specify grid interval: ");
chomp(my $by = <STDIN>);
#check if grid interval is given as a number
if ($by =~ m/\D/){die ("Incorrect grid interval: $by");}

#B-spline basis order and knot number
print ("Specify norder and nbreaks (#,#):");
chomp(my $userInput_orderBreaks = <STDIN>);
my @orderBreaks = split(",", $userInput_orderBreaks);
#check if correct order breaks format is used
if (!(scalar(@orderBreaks) == 2)){die ("Incorrect order and breaks: $userInput_orderBreaks");}

#select patient comparison
print ("Specify the group membership comparison to use\n");
print ("'1' for aGVHDcGVHD(7) vs. aGVHDlived(9) vs. healthy4(4)\n'2' for aGVHD(21) vs. healthy4(4)\n");
print ("'3' for aGVHDcGVHD(7) vs. aGVHDlived(9)\n'4' for aGVHD(21) vs. non aGVHD(7)\n'5' for aGvHDcGvHD (7) vs.
aGvHD (14)\n");
print ("group membership comparison: ");
chomp(my $comparison = <STDIN>);
#check if the correct comparison number is given
if (!($comparison == 1 || $comparison == 2 || $comparison == 3 || $comparison == 4 || $comparison == 5)){die ("Invalid comparison choice");}
#MAY 31, 2006
## leave-one-out cross-validation does not work for comparison between more than two classes
if ($comparison == 1){die ("Leave-one-out cross-validation does not work for comparison between more than two classes");}

#initialize @compareGroups based on the comparison chosen
#########YOU CAN MODIFY/CREATE NEW COMPAREGROUPS BY ADDING HERE
my @compareGroups;
if ($comparison == 1){
    #'1' for aGVHDcGVHD(7) vs aGVHDlived(9) vs healthy4(4)
    @compareGroups = (["aGVHDcGVHD"], ["aGVHDlived"], ["healthy4"],);
}
elsif ($comparison == 2){
    #'2' for aGVHD(21) vs healthy4(4)
    @compareGroups = (["aGVHDcGVHD", "aGVHDlived", "aGVHDdied"], ["healthy4"],);
}
elsif ($comparison == 3){
    #'3' for aGVHDcGVHD(7) vs aGVHDlived(9)
    @compareGroups = (["aGVHDcGVHD"], ["aGVHDlived"],);
}
elsif ($comparison == 4){
    #'4' for aGVHD(21) vs non aGVHD(7)
    @compareGroups = (["aGVHDcGVHD", "aGVHDlived", "aGVHDdied"], ["healthy4", "healthyDied"],);
}
elsif ($comparison == 5){
    #'5' for aGvHDcGvHD (7) vs. aGvHD all (14)
    @compareGroups = (["aGVHDcGVHD"], ["aGVHDlived", "aGVHDdied"],);
}
else {die ("###ERROR: incorrect comparison chosen");}

#open OUTFILEs
#outfiles are created within the specified subfolder
#file names include comparison number, order, and knots
#A MATLAB code file (.m) and a log file (.log) are created
open (OUTFILE, ">.\\\\$folder\\\\FLDA_comparison$comparison" . "_order$orderBreaks[0]" . "_breaks$orderBreaks[1]" . ".m") or die ("Cannot open output file: $!");
open (OUTFILElog, ">.\\\\$folder\\\\FLDA_comparison$comparison" . "_order$orderBreaks[0]" . "_breaks$orderBreaks[1]" . ".log") or die ("Cannot open output file: $!");

#create subfolders "data" and "images" if they do not already exist (these folders are required for FLDA)
if (!-d ".\\\\$folder\\\\data"){mkdir ".\\\\$folder\\\\data" or die ("Cannot make subfolder data");}
if (!-d ".\\\\$folder\\\\images"){mkdir ".\\\\$folder\\\\images" or die ("Cannot make subfolder images");}

#determine which files are to be processed
#the selected file's name must match the preset naming scheme
#'aliquot name'_'measurement name'.txt
#exclude files from the subfolders data and images (which have a very similar naming scheme)
if ($userFile =~ m/all$/i){
    my @fileNames;
    find sub {push @fileNames, $File::Find::name if !-d}, ".\\$folder";
    foreach my $f (@fileNames){
        if ($f =~ m|$folder[\\/]+(.+)\.txt$| && !($f =~ m~[\\/]data|images[\\/]~)) {
            &MAIN("$f");
        }
    }
}
else {
    if ($userFile =~ m|$folder[\\/]+(.+)\.txt$|){
        &MAIN("$userFile");
    } #if
    else {
        die ("File $userFile does not have the correct naming scheme");
    }
}
close (OUTFILE) or die ("Cannot close output file: $!");
close (OUTFILElog) or die ("Cannot close output file: $!");

################
### sub MAIN ###
################
sub MAIN {
    #class information
    ####################YOU CAN MODIFY/CREATE NEW PATIENT GROUPS HERE
    my %class;
    $class{"aGVHDcGVHD"}{"p9"} = "p9";
    $class{"aGVHDcGVHD"}{"p11"} = "p11";
    $class{"aGVHDcGVHD"}{"p15"} = "p15";
    $class{"aGVHDcGVHD"}{"p19"} = "p19";
    $class{"aGVHDcGVHD"}{"p21"} = "p21";
    $class{"aGVHDcGVHD"}{"p22"} = "p22";
    $class{"aGVHDcGVHD"}{"p28"} = "p28";
    $class{"aGVHDlived"}{"p1"} = "p1";
    $class{"aGVHDlived"}{"p5"} = "p5";
    $class{"aGVHDlived"}{"p6"} = "p6";
    $class{"aGVHDlived"}{"p10"} = "p10";
    $class{"aGVHDlived"}{"p12"} = "p12";
    $class{"aGVHDlived"}{"p13"} = "p13";
    $class{"aGVHDlived"}{"p14"} = "p14";
    $class{"aGVHDlived"}{"p24"} = "p24";
    $class{"aGVHDlived"}{"p27"} = "p27";
    $class{"aGVHDdied"}{"p3"} = "p3";
    $class{"aGVHDdied"}{"p7"} = "p7";
    $class{"aGVHDdied"}{"p16"} = "p16";
    $class{"aGVHDdied"}{"p23"} = "p23";
    $class{"aGVHDdied"}{"p25"} = "p25";
    $class{"healthy4"}{"p2"} = "p2";
    $class{"healthy4"}{"p4"} = "p4";
    $class{"healthy4"}{"p17"} = "p17";
    $class{"healthy4"}{"p31"} = "p31";
    $class{"denovocGVHD"}{"p8"} = "p8";
    $class{"denovocGVHD"}{"p26"} = "p26";
    $class{"denovocGVHD"}{"p30"} = "p30";
    $class{"healthyDied"}{"p18"} = "p18";
    $class{"healthyDied"}{"p20"} = "p20";
    $class{"healthyDied"}{"p29"} = "p29";

    my $inFile = shift(@_); #name of the file currently being processed
    print (OUTFILElog "\nprocessing: $inFile\n");
    print (OUTFILE "%%%MEASUREMENT: $inFile%%%\nclc\nclear all\nclose all\n");
    #$1 is the measurement name captured by the file-name match made just before MAIN was called
    my $measurement = "$1";

    #read data from file in the specific folder
    my %data = %{&READTABLE("$inFile")};

    #load individual patient's data into variables in matlab
    #$omitPatients{$patient} = "reason"; only includes patients with fewer than 2 values (but at least 1),
    #not patients without any value, because they are not included in %data
    my %omitPatients = %{&LOAD_DATA(\%data, \@grid, $by)};

    #check if any of the patients is not in the input file
    #if not, delete the patient from %class and include the patient in %omitPatients
    foreach my $tempGroup (keys %class){
        foreach (keys %{$class{$tempGroup}}){
            if (!(exists($data{$_}))){
                delete $class{"$tempGroup"}{"$_"};
                $omitPatients{"$_"} = "is not in the input file";
            }
            if (exists($omitPatients{$_})){
                delete $class{"$tempGroup"}{"$_"};
            }
        }
    }
    foreach (keys %omitPatients){
        print (OUTFILElog "OMIT: $_ is omitted $omitPatients{$_}\n");
    }
    print (OUTFILE "%omit patients: " . join (", ", keys(%omitPatients)) . "\n");

    #determine if there is enough data to perform FLDA; if not, skip to the next file
    my @chkNumDataResults = &CHK_NUM_DATA(\%class, \@compareGroups, \%data);
    if ($chkNumDataResults[1] =~ m/NO,(.+)/){
        print (OUTFILElog "SKIP: $inFile is ignored because$1\n");
        return ();
    }
    my %acceptedPatientsPerGroup = %{shift(@chkNumDataResults)};

    ###########################
    #PERFORMING FLDA WITH DATA
    #group data based on the specified groups in %class, then based on the comparison type chosen, further group the data to fit the FLDA format
    &GROUP_DATA(\%class, \@compareGroups, \%acceptedPatientsPerGroup);
    #initialize flda parameters
    &FLDA_PARAMETERS(\@grid, $by, \@orderBreaks);
    #run FLDA and write data, parameters and results into text files
    &FLDA("$measurement", "$comparison", \@orderBreaks);
    #group data for leave-one-out validation
    my $validationFileName = &LEAVEONEOUT(\@compareGroups, \%class, \%acceptedPatientsPerGroup, "$comparison", \@orderBreaks, "$measurement");
    ###########################END PERFORMING FLDA WITH DATA

    ###########################
    #read existing FLDA results for the current measurement from subfolder data
    #NOT IMPLEMENTED WHEN YOU ARE PERFORMING FLDA ANALYSIS WITH DATA via subroutines GROUP_DATA, FLDA_PARAMETERS, and FLDA
    # &READ_FLDA_RESULTS ("$measurement", "$comparison", @orderBreaks);
    #determine knots time index
    # my %knotRanges = %{&KNOTS_POSITIONS (\@grid, \@orderBreaks)};
    ############################END

    #determine how many values were observed for each compared group of patients. This is used to represent how reliable a FLDA analysis is.
    my %valuePerKnot = %{&VALUE_PER_KNOT(\%data, \%class, \@compareGroups, \@grid, \@orderBreaks)};

    #determine weights
    #table output with weights and their reliability
    &WEIGHT_ON_KNOTS (\@grid, \@orderBreaks, "$comparison", "$measurement", \%valuePerKnot, "$validationFileName");
} #sub MAIN

#####################
### SUB READTABLE ###
#####################
#INPUT:
##1) file name ($)
#OUTPUT:
##1) \%currentData
### $currentData{p#}{time in #} -> value
#FUNCTIONS:
## read in the specified tab-delimited data text file
## input data file naming scheme: "group name"_"measurement name".txt
## file format:
### columns -> patient (identified by patient number)
### row -> time in weeks
### values are actual (average) values of that measurement from the patient at that time range
sub READTABLE {
    my %currentData;
    my $inFileName = shift(@_);
    open (INFILE, "$inFileName") or die ("Cannot open input file: $!");
    chomp(my $titleText = <INFILE>);
    my @titles = split("\t", $titleText);
    until (eof INFILE){
        chomp(my $text = <INFILE>);
        my @values = split("\t", $text);
        foreach my $pos (1..$#titles){
            if (defined($values[$pos]) && $values[$pos] =~ m/[\d]/){
                #$currentData{patient}{time} = measured proportion value
                $currentData{$titles[$pos]}{$values[0]} = 0 + $values[$pos];
            }
        }
    }
    close (INFILE) or die ("Cannot close input file: $!");
    return (\%currentData);
}

#####################
### SUB LOAD_DATA ###
#####################
#INPUT:
##1) \%currentData (from sub READTABLE)
##2) @grid (grid begins at $grid[0] and ends at $grid[1])
##3) the interval of the grid
#OUTPUT:
##1) \%omitPatients
### $omitPatients{p#} -> number of values available
#FUNCTIONS:
## print to OUTFILE
## commands to load individual patient's data
### p#.y = a vector containing values for patient #
### p#.timeindex = a vector containing the time index of patient # relative to the specified grid
### p#.curve = a vector of ones with equal length to p#.y
### determine which patient (if number of available values is <2) is omitted
sub LOAD_DATA {
    #omit patient if there are fewer than 2 values available
    my %omitPatients;
    my %currentData = %{shift(@_)};
    my @currentGrid = @{shift(@_)};
    my $currentBy = shift(@_);
    print (OUTFILE "userGrid = [$currentGrid[0]" . ":$currentBy" . ":$currentGrid[1]]\';\n");
    foreach my $patient (keys %currentData){
        my (@y, @timeindex);
        foreach my $time (sort {$a<=>$b} keys %{$currentData{$patient}}){
            push (@y, 0 + $currentData{$patient}{$time});
            push (@timeindex, ((($time - $currentGrid[0])/$currentBy) + 1));
        } #foreach time
        print (OUTFILE "$patient" . ".y = [" . join (", ", @y) . "]\';\n");
        print (OUTFILE "$patient" . ".timeindex = [" . join (", ", @timeindex) . "]\';\n");
        print (OUTFILE "$patient" . ".curve = ones(length($patient" . ".y), 1);\n\n");
        if (scalar(@y) < 2){$omitPatients{"$patient"} = "for having fewer than two available values (" . scalar(@y) . ")";}
    } #foreach patient
    return(\%omitPatients);
} #sub

########################
### sub CHK_NUM_DATA ###
########################
#INPUT:
##%class (from MAIN)
##@compareGroups (global)
##%data (from READTABLE)
#OUTPUT:
##\%acceptedPatientsNum (number of accepted patients per group)
##"enough" or "NO" to indicate if there is enough data to run FLDA
#FUNCTIONS:
##determine if there are enough patients to run flda
###patients with fewer than 2 values available are omitted
###There must be at least 3 patients included in each class
##determine if there are enough time points to fit the nbreaks specified
##i.e. if nbreaks is 4, there must be at least one patient with 4 available data points
sub CHK_NUM_DATA {
    my %currentClass = %{shift(@_)};
    my @compareGroups = @{shift(@_)};
    my %currentData = %{shift(@_)};

    #determine how many qualified patients there are in each group
    #patient is omitted if there are fewer than 2 values available
    my %acceptedPatientsNum;
    foreach my $group (keys %currentClass){
        foreach my $okPatients (keys %{$currentClass{"$group"}}){
            $acceptedPatientsNum{"$group"} ++;
        }
    }

    #determine how many qualified patients there are in each class
    my $maxNumData = 0;
    foreach my $numGroup (0..$#compareGroups){ #each class
        my $groupNumCheck = 0;
        foreach (@{$compareGroups[$numGroup]}){ #groups within class
            #there are instances when the whole group of patients is missing (so it won't be in %acceptedPatientsNum)
            if (exists($acceptedPatientsNum{"$_"})){$groupNumCheck = $groupNumCheck + $acceptedPatientsNum{"$_"};}
            foreach my $patient (keys %{$currentClass{"$_"}}){
                my $numDataPerPatient = 0;
                foreach my $time
                (keys %{$currentData{"$patient"}}){
                    $numDataPerPatient ++;
                }
                if ($numDataPerPatient > $maxNumData){$maxNumData = 0 + $numDataPerPatient;}
            }
        }
        if ($groupNumCheck < 3){return (\%acceptedPatientsNum, "NO, less than 3 available patients in a class");}
    }
    if ($maxNumData < $orderBreaks[1]){return (\%acceptedPatientsNum, "NO, nbreaks $orderBreaks[1] > max time points $maxNumData");}
    return (\%acceptedPatientsNum, "enough");
} #sub

######################
### sub GROUP_DATA ###
######################
#INPUT:
##1) \%class, class information in a hash
### $class{group/class}{p#} => p#
##2) comparison chosen
##3) %omitPatients from sub LOAD_DATA
#OUTPUT:
## none
#FUNCTIONS:
## print to OUTFILE
## group each patient's data into the pre-specified group
### group.y = [p#.y', p#.y']';
### group.timeindex = [p#.timeindex', p#.timeindex']';
### group.curve = [p#.curve'+1, p#.curve'+2]';
### group.num -> number of available patients in that group
## further group the grouped data based on the comparison chosen, into a format suitable for FLDA
### class = [ones(group.num, 1) + increment];
### curve = [group.curve' + $increment];
### timeindex = [group.timeindex'..];
### y = [group.y',...];
### data.y = y;
### data.timeindex = timeindex
### data.curve = curve
### data.class = class
sub GROUP_DATA {
    my %currentClass = %{shift(@_)};
    my @compareGroups = @{shift(@_)};
    my %acceptedPatientsNum = %{shift(@_)};
    foreach my $group (keys %acceptedPatientsNum){
        print (OUTFILE "$group" . ".num = " . (0+$acceptedPatientsNum{"$group"}) . ";\n");
        my (@groupTimeindex, @groupY, @groupCurve);
        my $n = -1;
        foreach (keys %{$currentClass{"$group"}}){
            $n ++;
            push (@groupTimeindex, "$_" . ".timeindex\'");
            push (@groupY, "$_" . ".y\'");
            push (@groupCurve, "$_" . ".curve\'+ $n");
        }
        #group data into the pre-specified groups
        print (OUTFILE "$group" . ".timeindex = [" . join (", ", @groupTimeindex) . "]\';\n");
        print (OUTFILE "$group" . ".y = [" . join (", ", @groupY) . "]\';\n");
        print (OUTFILE "$group" . ".curve = [" . join (", ", @groupCurve) . "]\';\n");
    } #each group

    #further group the grouped data into a format suitable for FLDA
    #group the grouped data based on the comparison chosen
    #first print out matlab commands for each of the following variables: class, curve, timeindex, and y
    #then print out matlab commands to combine the above variables into variable data (i.e. data.y, data.class, etc.)
    my (@tempY, @tempClass, @tempCurve, @tempTimeindex);
    my $increment = 0;
    foreach my $numGroup (0..$#compareGroups){
        foreach (@{$compareGroups[$numGroup]}){
            push (@tempY, "$_" . ".y\'");
            push (@tempTimeindex, "$_" . ".timeindex\'");
            push (@tempClass, "ones($_" . ".num, 1)\' + $numGroup");
            push (@tempCurve, "$_" . ".curve\' + $increment");
            if (!(exists($acceptedPatientsNum{$_}))){
                print "###ERROR: cannot find group $_ in accepted patients number\n";
                die("cannot find group $_ in accepted patients number");
            }
            $increment = $increment + $acceptedPatientsNum{$_};
            #individual patient's class
            foreach (keys %{$currentClass{$_}}){
                print (OUTFILE "$_" . ".class = " . ($numGroup + 1) . ";\n");
            }
        }
    }
    print (OUTFILE "data.class = [" . join (", ", @tempClass) . "]\';\n");
    print (OUTFILE "data.curve = [" . join (", ", @tempCurve) . "]\';\n");
    print (OUTFILE "data.timeindex = [" . join (", ", @tempTimeindex) . "]\';\n");
    print (OUTFILE "data.y = [" . join (", ", @tempY) . "]\';\n");
} #sub

###########################
### sub FLDA_PARAMETERS ###
###########################
#INPUT:
##1) @currentGrid (grid begins at $grid[0] and ends at $grid[1])
##2) the interval of the grid
##3) @currentOrderBreaks -> $currentOrderBreaks[0] = order, $currentOrderBreaks[1] = number of breaks;
#OUTPUT:
## none
#FUNCTIONS:
## print to OUTFILE
## initialize all the necessary FLDA parameters such as:
### userGrid, nbreaks, norder, nbasis, q, G, pert, p, h, tol, maxit
## commands to check p, q and h values, making sure that they are within range
sub FLDA_PARAMETERS {
    my @currentGrid = @{shift(@_)};
    my $currentBy = shift(@_);
    my @currentOrderBreaks = @{shift(@_)};
    print (OUTFILE "\%nbreaks: number of breaks\nnbreaks=$currentOrderBreaks[1];\n");
    print (OUTFILE "\%norder: order of the spline (degree+1)\nnorder=$currentOrderBreaks[0];\n");
    print (OUTFILE "\%nbasis: number of basis functions\nnbasis=nbreaks+norder-2;\n");
    print (OUTFILE "\%q: dimension of the spline basis (q-2 equally spaced knots)\nq=nbasis;\n");
    print (OUTFILE "\%G: number of cluster\nG=length(unique(data.class));\n");
    print (OUTFILE "\%pert: small adjustment(ridge regression)\npert=0.1;\n");
    print (OUTFILE "\%p: rank constraint on the gammas !! p<=q\np=1;\n");
    print (OUTFILE "\%h: dimension of alpha !!
h <= min(p, G-1) G= number of clusters\nh=1;\n");
    print (OUTFILE "\%minimum relative change for loops (log likelihood or sum of squares)\ntol = 0.001;\n");
    print (OUTFILE "\%maximum number of iterations\nmaxit=50;\n");
    print (OUTFILE "if p>q\nfprintf(\'error on p >q (Nb of basis) q = %3i, p = %3i \\n\',q,p)\nreturn\nend\n");
    print (OUTFILE "max_h = min(p,G-1);\nif h > min(p,G-1)\nfprintf(\'error on h > min(p,K-1)\\th=%3i\\tmin(p,G-1)=%3i\\n\',h,max_h)\nreturn\nend\n\n");
} #sub

#################
### sub FLDA ####
#################
#INPUT:
##1) name of the current measurement
##2) number of the current comparison
##3) @currentOrderBreaks -> $currentOrderBreaks[0] = order, $currentOrderBreaks[1] = number of breaks;
#OUTPUT:
## none
#FUNCTIONS:
## print to OUTFILE
## matlab commands for running fldafit and fldapred using the previously initialized parameters and data
## matlab commands to print the data, fitting parameters, and prediction results into individual text files
sub FLDA {
    my $currentMeasurement = shift(@_);
    my $currentComparison = shift(@_);
    my @currentOrderBreaks = @{shift(@_)};

    #FLDA
    print (OUTFILE "[flda.parameters, flda.vars, flda.S, flda.FullS, flda.likenew] = ...\n");
    print (OUTFILE "fldafit(data, norder, nbreaks, h, p, pert, maxit, userGrid, tol);\n");
    print (OUTFILE "[flda.Calpha, flda.alphahat, flda.classpred, flda.distance] = ...\n");
    print (OUTFILE "fldapred(flda.parameters, flda.vars, flda.S, flda.FullS, flda.likenew, data);\n\n");

    #count the error rate (emitted as commented-out MATLAB lines)
    print (OUTFILE "\%class1 = data.class == 1;\n\%class2 = data.class == 2;\n");
    print (OUTFILE "\%error.TP = sum(flda.classpred(class1) == 1);\n");
    print (OUTFILE "\%error.FN = sum(flda.classpred(class1) == 2);\n");
    print (OUTFILE "\%error.FP = sum(flda.classpred(class2) == 1);\n");
    print (OUTFILE "\%error.TN = sum(flda.classpred(class2) == 2);\n");

    #print out error rate to file "error...txt"
    my $dlmwriteFile = "\%\'.\\error_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . ".txt\'";
    my $dlmwriteParameter = "\%\'-append\', \'newline\', \'pc\', \'delimiter\', \'\'";

    #print into text files
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_class.txt\', data.class, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_curve.txt\', data.curve, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_timeindex.txt\', data.timeindex, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_y.txt\', data.y, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_lambdazero.txt\', flda.parameters.lambdazero, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_Lambda.txt\', flda.parameters.Lambda, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_alpha.txt\', flda.parameters.alpha, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_Theta.txt\', flda.parameters.Theta, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_sigma.txt\', flda.parameters.sigma, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_D.txt\', flda.parameters.D, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_gamma.txt\', flda.vars.gamma, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_Calpha.txt\', flda.Calpha, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_alphahat.txt\', flda.alphahat, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_classpred.txt\', flda.classpred, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_distance.txt\', flda.distance, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_S.txt\', flda.S, \'\\t\')\n");
    print (OUTFILE "dlmwrite(\'.\\data\\$currentMeasurement" . "_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . "_FullS.txt\', flda.FullS, \'\\t\')\n\n");
} #sub

#######################
### sub LEAVEONEOUT ###
#######################
#INPUT:
##1) @compareGroups ($compareGroups[0..#1][0..
#2] => group name); #1 is the number of groups to compare and #2 indicates how many subgroups group #1 consists of
##2) %class information
##3) %acceptedPatientsPerGroup ($acceptedPatientsPerGroup{group name} => number of patients in the group)
#OUTPUT:
##1) name of the validation file
#FUNCTIONS:
## print to file
## FLDA commands to assemble leave-one-out data based on the previously specified class and comparison information
## Then run fldafit on the training dataset (dataset minus 1 patient) and fldapred on the determined parameters and the one patient's data
sub LEAVEONEOUT {
    my @compareGroups = @{shift(@_)};
    my %currentClass = %{shift(@_)};
    my %acceptedPatientsPerGroup = %{shift(@_)};
    my $currentComparison = shift(@_);
    my @currentOrderBreaks = @{shift(@_)};
    my $currentMeasurement = shift(@_);
    my %leavePatients;
    my $increament = 0;
    print (OUTFILE "validation.TP=0;\nvalidation.FN=0;\nvalidation.FP=0;\nvalidation.TN =0;\n");

    #create leave one out data
    LEAVECLASS: foreach my $leaveClass (keys %currentClass){
        #skip classes that are not part of the chosen comparison
        my $leaveClassInCompare = 0;
        foreach (0..$#compareGroups){
            foreach (@{$compareGroups[$_]}){
                if ($_ =~ m/^$leaveClass$/){$leaveClassInCompare ++;}
            }
        }
        if ($leaveClassInCompare == 0){next LEAVECLASS;}

        foreach my $leavePatient (keys %{$currentClass{$leaveClass}}){
            #assemble the class data minus the left-out patient
            my (@leaveClassY, @leaveClassTimeindex, @leaveClassCurve, @leaveClassClass);
            my $leaveN = -1;
            foreach my $notLeavePatient (keys %{$currentClass{$leaveClass}}) {
                if (!($notLeavePatient =~ m/^$leavePatient$/)){
                    $leaveN ++;
                    push (@leaveClassY, "$notLeavePatient" . ".y\'");
                    push (@leaveClassCurve, "$notLeavePatient" . ".curve\' + $leaveN");
                    push (@leaveClassTimeindex, "$notLeavePatient" . ".timeindex\'");
                }
            }
            print (OUTFILE "tempCurve = [" . join (", ", @leaveClassCurve) . "]';\n");

            my (@tempY, @tempTimeindex, @tempClass, @tempCurve);
            my $increment = 0;
            foreach my $numGroup (0..$#compareGroups){
                foreach my $group (@{$compareGroups[$numGroup]}){
                    if ($group =~ m/^$leaveClass$/){
                        push (@tempY, @leaveClassY);
                        push (@tempTimeindex, @leaveClassTimeindex);
                        push (@tempCurve, "(tempCurve + $increment)\'");
                        push (@tempClass, "ones($group" . ".num -1, 1)\' + $numGroup");
                        $increment = $increment + $acceptedPatientsPerGroup{$group} - 1;
                    }
                    else {
                        push (@tempY, "$group" . ".y\'");
                        push (@tempTimeindex, "$group" . ".timeindex\'");
                        push (@tempClass, "ones($group" . ".num, 1)\' + $numGroup");
                        push (@tempCurve, "($group" . ".curve + $increment)'");
                        $increment = $increment + $acceptedPatientsPerGroup{$group};
                    } #else
                } #foreach my $group
            } #foreach $numGroup

            #commands to build the leavep# data
            print (OUTFILE "leave$leavePatient" . ".class = [" . join (", ", @tempClass) . "]\';\n");
            print (OUTFILE "leave$leavePatient" . ".curve = [" . join (", ", @tempCurve) . "]\';\n");
            print (OUTFILE "leave$leavePatient" . ".timeindex = [" . join (", ", @tempTimeindex) . "]\';\n");
            print (OUTFILE "leave$leavePatient" . ".y = [" . join (", ", @tempY) . "]\';\n");

            #FLDA commands
            print (OUTFILE "[leave$leavePatient" . ".parameters, leave$leavePatient" . ".vars, leave$leavePatient" . ".S, leave$leavePatient" . ".FullS, leave$leavePatient" . ".likenew] = ...\n");
            print (OUTFILE "fldafit(leave$leavePatient, norder, nbreaks, h, p, pert, maxit, userGrid, tol);\n");
            print (OUTFILE "[leave$leavePatient" . ".Calpha, leave$leavePatient" . ".alphahat, leave$leavePatient" . ".classpred, leave$leavePatient" . ".distance] = ...\n");
            print (OUTFILE "fldapred(leave$leavePatient" . ".parameters, leave$leavePatient" . ".vars, leave$leavePatient" . ".S, leave$leavePatient" . ".FullS, leave$leavePatient" . ".likenew, $leavePatient);\n");

            #determine the correctness
            print (OUTFILE "if ($leavePatient" . ".class == 1) && ($leavePatient" . ".class == leave$leavePatient" . ".classpred)\n");
            print (OUTFILE "validation.TP=validation.TP+1;\nend\n");
            print (OUTFILE "if ($leavePatient" . ".class == 1) && ($leavePatient" . ".class ~= leave$leavePatient" . ".classpred)\n");
            print (OUTFILE "validation.FN=validation.FN+1;\nend\n");
            print (OUTFILE "if ($leavePatient" . ".class == 2) && ($leavePatient" . ".class ~= leave$leavePatient" . ".classpred)\n");
            print (OUTFILE "validation.FP=validation.FP+1;\nend\n");
            print (OUTFILE "if ($leavePatient" . ".class == 2) && ($leavePatient" . ".class == leave$leavePatient" . ".classpred)\n");
            print (OUTFILE "validation.TN=validation.TN+1;\nend\n");
        } #foreach $leavePatient
    } #foreach $leaveClass

    #print out leave-one-out cross-validation result
    my $dlmwriteFile = "\'.\\validation_comparison$currentComparison" . "_Order$currentOrderBreaks[0]" . "Breaks$currentOrderBreaks[1]" . ".txt\'";
    my $dlmwriteParameter = "\'-append\', \'newline\', \'pc\', \'delimiter\', \'\'";
    print (OUTFILE "dlmwrite($dlmwriteFile, \'$currentMeasurement\', $dlmwriteParameter)\n");
    print (OUTFILE "dlmwrite($dlmwriteFile, validation.TP, $dlmwriteParameter)\n");
    print (OUTFILE "dlmwrite($dlmwriteFile, validation.FN, $dlmwriteParameter)\n");
    print (OUTFILE "dlmwrite($dlmwriteFile, validation.FP, $dlmwriteParameter)\n");
    print (OUTFILE "dlmwrite($dlmwriteFile, validation.TN, $dlmwriteParameter)\n\n\n");
    return ("$dlmwriteFile");
}

#############################
### sub READ_FLDA_RESULTS ###
#############################
#INPUT:
#1. measurement name
#OUTPUT: none
#FUNCTIONS:
#read in the FLDA results written in subfolder data
#restore all the variables created during the FLDA process
sub READ_FLDA_RESULTS {
    print (OUTFILE "%read in all FLDA parameters back from subfolder 'data'\n");
    my $currentMeasurement = shift(@_);
    my $partialFileName = "$currentMeasurement" . "_comparison" . shift(@_) . "_Order" . shift(@_) . "Breaks" .
        shift(@_);

    #print MATLAB command specify the current measurement
    print (OUTFILE "measurement = \'$partialFileName\';\n");

    #data
    print (OUTFILE "[data.class] = dlmread([\'.\\data\\\', measurement, \'_class.txt\'],\'\\t\');\n");
    print (OUTFILE "[data.curve] = dlmread([\'.\\data\\\', measurement, \'_curve.txt\'],\'\\t\');\n");
    print (OUTFILE "[data.timeindex] = dlmread([\'.\\data\\\', measurement, \'_timeindex.txt\'],\'\\t\');\n");
    print (OUTFILE "[data.y] = dlmread([\'.\\data\\\', measurement, \'_y.txt\'],\'\\t\');\n");

    #flda.parameters
    print (OUTFILE "[flda.parameters.lambdazero] = dlmread([\'.\\data\\\', measurement, \'_lambdazero.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.parameters.Lambda] = dlmread([\'.\\data\\\', measurement, \'_Lambda.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.parameters.alpha] = dlmread([\'.\\data\\\', measurement, \'_alpha.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.parameters.Theta] = dlmread([\'.\\data\\\', measurement, \'_Theta.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.parameters.sigma] = dlmread([\'.\\data\\\', measurement, \'_sigma.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.parameters.D] = dlmread([\'.\\data\\\', measurement, \'_D.txt\'],\'\\t\');\n");

    #other FLDA variables
    print (OUTFILE "[flda.vars.gamma] = dlmread([\'.\\data\\\', measurement, \'_gamma.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.S] = dlmread([\'.\\data\\\', measurement, \'_S.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.FullS] = dlmread([\'.\\data\\\', measurement, \'_FullS.txt\'],\'\t\');\n");
    print (OUTFILE "[flda.Calpha] = dlmread([\'.\\data\\\', measurement, \'_Calpha.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.alphahat] = dlmread([\'.\\data\\\', measurement, \'_alphahat.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.classpred] = dlmread([\'.\\data\\\', measurement, \'_classpred.txt\'],\'\\t\');\n");
    print (OUTFILE "[flda.distance] = dlmread([\'.\\data\\\', measurement, \'_distance.txt\'],\'\\t\');\n");
    print (OUTFILE "\n");
} #sub read flda results

##########################
### sub VALUE_PER_KNOT ###
##########################
#INPUT:
#1. \%data
#2. \%class
#3. \@compareGroups
#4. \@currentGrid
#5. \@currentOrderBreaks
#OUTPUT:
#\%valuePerKnot
#$valuePerKnot{$knot#}{$class#}{expected} => expected number of values
#$valuePerKnot{$knot#}{$class#}{observed} => observed number of values
#only include class# that has the smallest observed number of value for that knot
#FUNCTIONS
sub VALUE_PER_KNOT {
    my %currentData = %{shift(@_)};
    my %currentClass = %{shift(@_)};
    my @currentCompareGroups = @{shift(@_)};
    my @currentGrid = @{shift(@_)};
    my @currentOrderBreaks = @{shift(@_)};
    my $halfInterval = floor((($currentGrid[1] - $currentGrid[0])/($currentOrderBreaks[1]-1))/2);
    my %tempValuePerKnot;
    my %valuePerKnot;
    for (my $pos = $currentGrid[0]; $pos <= $currentGrid[1]; $pos += (($currentGrid[1] - $currentGrid[0])/($currentOrderBreaks[1]-1))) {
        print OUTFILElog "GRID: $pos\n";
        foreach my $numClass (0..$#currentCompareGroups){
            foreach my $group (@{$currentCompareGroups[$numClass]}){
                foreach my $patient (keys %{$currentClass{$group}}){
                    foreach my $time (keys %{$currentData{$patient}}){
                        print OUTFILElog "TIME: $time\n";
                        if ($time <= $pos + $halfInterval && $time >= $pos - $halfInterval){
                            $tempValuePerKnot{$pos}{$numClass}{"observed"}++;
                            print OUTFILElog "adding; knot $pos from clas $numClass\n";
                        } else {
                            $tempValuePerKnot{$pos}{$numClass}{"observed"} += 0;
                        }
                    }
                    $tempValuePerKnot{$pos}{$numClass}{"expected"}++;
                } #foreach knot
            } #foreach patient in current class
        } #foreach group in current compare groups
    } #foreach compared class in current compare groups
    foreach my $printKnot (sort {$a<=>$b} keys %tempValuePerKnot){
        my $smallestObserved = 100;
        my $smallestObservedClass;
        foreach my $printClass (0..$#currentCompareGroups){
            if ($tempValuePerKnot{$printKnot}{$printClass}{"observed"} <= $smallestObserved){
                $smallestObserved = 0 + $tempValuePerKnot{$printKnot}{$printClass}{"observed"};
                print OUTFILElog "new small observed from knot: $printKnot is $tempValuePerKnot{$printKnot}{$printClass}{observed}\n";
                $smallestObservedClass = 0 + $printClass;
            } else{
                print OUTFILElog "wrong: knot $printKnot from class $printClass has " . $tempValuePerKnot{$printKnot}{$printClass}{"observed"} . "\n";
            }
        }
        $valuePerKnot{$printKnot}{"observed"} = $smallestObserved;
        print OUTFILElog "smallest observed at knot $printKnot is $smallestObserved from class $smallestObservedClass\n";
        $valuePerKnot{$printKnot}{"expected"} = $tempValuePerKnot{$printKnot}{$smallestObservedClass}{"expected"};
    }
    return (\%valuePerKnot);
} #sub

###########################
### sub WEIGHT_ON_KNOTS ###
###########################
#INPUT:
#1. \@grid
#2. \@orderBreaks
#3. $comparison
#4. $measurement
#5. \%valuePerKnot
#OUTPUT: none
#FUNCTIONS:
#print out MATLAB commands needed to determine weight using the knots distribution
#print
sub WEIGHT_ON_KNOTS {
    my @currentGrid = @{shift(@_)};
    my @currentOrderBreaks = @{shift(@_)};
    my $currentComparison = shift(@_);
    my $currentMeasurement = shift(@_);
    my %currentValuePerKnot = %{shift(@_)};
    my $dlmwriteFile = shift(@_);
    print (OUTFILE "currentTimeIndex = int32([1:(($currentGrid[1]-$currentGrid[0])/($currentOrderBreaks[1]-1)):($currentGrid[1]-$currentGrid[0]+1)]\');\n");
    print (OUTFILE "Sij = flda.FullS(currentTimeIndex, :);\n");
    print (OUTFILE "N = 1;\n");
    print (OUTFILE "h = size(flda.parameters.alpha, 2);\n");
    print (OUTFILE "K = size(flda.parameters.alpha, 1);\n");
    print (OUTFILE "Calpha = zeros(N,h,h);\n");
    print (OUTFILE "n = length(currentTimeIndex);\n");
    print (OUTFILE "Sigma = flda.parameters.sigma * eye(n) + Sij * flda.parameters.Theta * diag(flda.parameters.D) * flda.parameters.Theta\' * Sij\';\n");
    print (OUTFILE "InvCalpha = flda.parameters.Lambda\' * Sij\' * inv(Sigma) * Sij * flda.parameters.Lambda;\n");
    print (OUTFILE "Calpha(1, :, :) = inv(InvCalpha);\n");
    print (OUTFILE "[u,v,w] = size(Calpha);\n");
    print (OUTFILE "Cpart = reshape(Calpha(1,:,:),v,w);\n");
    print (OUTFILE "Weights = Cpart * flda.parameters.Lambda\' * Sij\' * inv(Sigma);\n");
    print (OUTFILE "\%dlmwrite ($dlmwriteFile, \'$currentMeasurement\', \'-append\', \'newline\', \'pc\', \'delimiter\', \'\')\n");
    print (OUTFILE "dlmwrite ($dlmwriteFile, Weights, \'-append\', \'newline\', \'pc\', \'delimiter\', \'\\t\')\n");
    my @printObserved;
    my @printExpected;
    foreach my $sortedKnot (sort {$a<=>$b} keys %currentValuePerKnot){
        push (@printObserved, $currentValuePerKnot{$sortedKnot}{"observed"});
        push (@printExpected, $currentValuePerKnot{$sortedKnot}{"expected"});
    }
    print (OUTFILE "dlmwrite ($dlmwriteFile, [" .
        join(", ", @printObserved) . "], \'-append\', \'newline\', \'pc\', \'delimiter\', \'\\t\')\n");
    print (OUTFILE "dlmwrite ($dlmwriteFile, [" . join(", ", @printExpected) . "], \'-append\', \'newline\', \'pc\', \'delimiter\', \'\\t\')\n");
    print (OUTFILE "\n\n");
} #sub

Appendix F. QA on gated data using CD3 as the common intensity

The general variations observed in many CD3-PerCP density plots (Figure F.1) prevent their use as a QA test for the dataset. However, density plots of CD3-PerCP intensity were screened for gate quality control. An example of a CD3- gate is shown in Figure F.2, where small peaks with a CD3-PerCP intensity higher than 200 may indicate inclusion of CD3+ cells in the CD3- gate.

[Figure F.1: density plot; x-axis: CD3-PerCP intensity; legend: 1Activation, 2Activation, 3Activation, TCR, T cells rest/act, T helper rest/act, T suppressor]

Figure F.1 Density plot of the CD3-PerCP intensity using CD3+ cell population from seven aliquots of patient #6's 76 days post-transplant sample. There is no visible outlier.

[Figure F.2: density plot; x-axis: CD3-PerCP intensity; legend: 1Activation, 2Activation, 3Activation, TCR, T cells rest/act, T helper rest/act, T suppressor]

Figure F.2 Density plot of the CD3-PerCP intensity using CD3+ cell population from seven aliquots of patient #6's -6 days post-transplant sample shown as an example of gate quality control.

Appendix G. Other top ranking classifiers for the onset of aGvHD

In the FLDA analysis of the proportion dataset using samples taken between 21 and 0 days prior to aGvHD diagnosis, there were six unique subsets of immune cells with an estimated sensitivity and specificity both higher than 70% (Table H.2). They included the immune cells CD3+CD4+CD8β+ and CD3+CD4+CD8β+CD8+, previously identified as the top ranking classifiers based on samples taken between 7 and 21 days post-transplant (Table 4.1). All the CD3+ and related subsets of immune cells exhibited the same pattern, whereas the CD3- immune cell population exhibited the opposite pattern.

The CD3+ population and its related subsets of immune cells, such as CD3+CD44-CD25-, exhibited a pattern similar to that observed between aGvHD and non-GvHD patients from immune cells CD3+CD4+CD8β+ between 7 and 21 days post-transplant. Time plots of the immune cells CD3+CD44-CD25- (Figure G.1) are shown as examples. In the FLDA estimated signals time plot for the immune cells CD3+CD44-CD25- (Figure G.1a), the aGvHD patients had higher signals than the non-GvHD patients did. From the raw data time plot from -21 to 21 days from aGvHD diagnosis (Figure G.1b), there was a consistent pattern in the raw data within the same time range. However, this pattern did not carry over after aGvHD was diagnosed.

The CD3- immune cell population (two readings from aliquots '1Activation' and '2Activation') exhibited a pattern opposite to the CD3+ immune cell population. In the FLDA estimated signals time plot, the aGvHD patients had lower signals than the non-GvHD patients did (Figure G.2a).
A consistent pattern was also observed in the raw data time plot within the same time range (Figure G.2b).

[Figure G.1: panels a and b; x-axis: Days from aGvHD diagnosis]

Figure G.1 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD44-CD25- in proportion to PBMC. The aGvHD diagnosis day is labelled at day 0.

[Figure G.2: panels a and b; x-axis: Days from aGvHD diagnosis; legend: aGvHD, non-GvHD]

Figure G.2 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3- (aliquot '1Activation') in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0.

In the FLDA analysis of the proportion dataset using samples taken between 0 and 21 days from aGvHD diagnosis, only three classifiers were found to have sensitivity and specificity both higher than 70% (Table H.3). They were CD2dimCD16+CD56-CD3-, CD3+CD4int (from aliquot '3Activation') and CD3+CD4+CD8β+CD8+ in proportion to the CD3+ cells (not PBMC). All three classifiers exhibited patterns similar to that of the CD3+ T cells described in the previous section.
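As a minimal sketch of the screen applied here, the Perl snippet below keeps only the rows whose estimated sensitivity and specificity both exceed 70%. The five rows are copied from Table H.3; the helper name `qualified` and the array layout are illustrative assumptions and are not part of the thesis pipeline.

```perl
use strict;
use warnings;

# A few rows copied from Table H.3 (proportion dataset, 0 to 21 days from
# aGvHD diagnosis): [subset, sensitivity %, specificity %].
my @rows = (
    ['CD2dimCD16+CD56-CD3-',                             78, 100],
    ['CD3+CD4int (3Activation)',                         72, 100],
    ['CD3+CD4+CD8beta+CD8+ (proportion of CD3+ cells)',  72, 100],
    ['CD22+CD20+',                                       94,  67],
    ['CD22+',                                            94,   0],
);

# Hypothetical helper: keep the rows whose sensitivity AND specificity
# are both strictly higher than the cut-off.
sub qualified {
    my ($cutoff, @table) = @_;
    return grep { $_->[1] > $cutoff && $_->[2] > $cutoff } @table;
}

# Print the name of every classifier that passes the 70% screen.
print "$_->[0]\n" for qualified(70, @rows);
```

With these rows, only the three classifiers named above pass the cut-off; the two B cell rows fail on specificity.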
The FLDA classifier built from immune cells CD2dimCD16+CD56-CD3- using samples taken between 0 and 21 days from aGvHD diagnosis had an estimated 78% sensitivity and 100% specificity. The FLDA estimated signals time plot (Figure G.3a) displayed a pattern of higher signals from the aGvHD patients compared to the non-GvHD patients, which was consistent with its corresponding raw data time plot (Figure G.3b). However, this pattern was not observed before aGvHD diagnosis (Figure G.3b).

The FLDA classifier built from immune cells CD3+CD4int (from aliquot '3Activation') using samples taken between 0 and 21 days from aGvHD diagnosis had an estimated 72% sensitivity and 100% specificity. The FLDA estimated signals time plot (Figure G.4a) displayed a pattern of higher signals from the aGvHD patients compared to the non-GvHD patients. The separation between the two groups of patients was smaller than the one observed in the FLDA estimated signals for the immune cells CD3+CD4+CD8β+ based on samples taken between 7 and 21 days post-transplant (Figure 4.4). Nevertheless, this pattern was consistent with its corresponding raw data time plot (Figure G.4b). A similar pattern was also observed in the raw data time plot before the aGvHD diagnosis, outside the analyzed time range. However, the FLDA classifier using the same subset of immune cells based on samples taken between 21 and 0 days prior to aGvHD diagnosis had only an estimated 57% sensitivity and 67% specificity (Table H.2).
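The percentages quoted in this appendix summarise leave-one-out counts. The sketch below converts such counts into rounded percentages, assuming that aGvHD is treated as the positive class (as the validation.TP, FN, FP and TN counters of the cross-validation code suggest). The sub name `loocv_summary` and the example counts are hypothetical; the counts were chosen only because they reproduce a 78% / 100% / 81% row.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the thesis pipeline): turn the four
# leave-one-out counters into percentages rounded to whole numbers.
sub loocv_summary {
    my ($tp, $fn, $fp, $tn) = @_;
    my $pct = sub {
        my ($num, $den) = @_;
        return undef if $den == 0;    # no patients in this group
        return sprintf("%.0f", 100 * $num / $den) + 0;
    };
    return {
        sensitivity => $pct->($tp, $tp + $fn),                    # positives predicted correctly
        specificity => $pct->($tn, $tn + $fp),                    # negatives predicted correctly
        accuracy    => $pct->($tp + $tn, $tp + $fn + $fp + $tn),  # all patients predicted correctly
    };
}

# Hypothetical counts: 14 of 18 positives and 3 of 3 negatives correct.
my $s = loocv_summary(14, 4, 0, 3);
printf "sensitivity=%d%% specificity=%d%% accuracy=%d%%\n",
    @{$s}{qw(sensitivity specificity accuracy)};
```

Because the negative group is small in this illustration, specificity can only take the values 0, 33, 67 or 100%, so a single misclassified patient moves it by a third.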
[Figure G.3: panels a and b; x-axis: Days from aGvHD diagnosis; legend: aGvHD, non-GvHD, FLDA classifier global base values]

Figure G.3 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD2dimCD16+CD56-CD3- in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0.

[Figure G.4: panels a and b; x-axis: Days from aGvHD diagnosis]

Figure G.4 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD4int (aliquot '3Activation') in proportion to PBMC. The date of aGvHD diagnosis is labelled as day 0.

The FLDA classifier built from the proportion of the immune cells CD3+CD4+CD8β+CD8+ relative to the total CD3+ cells (instead of the usual PBMCs) using samples between 0 and 21 days from aGvHD diagnosis had an estimated 72% sensitivity and 100% specificity. Like most of the classifiers previously described, it exhibited a pattern where both the FLDA signals and the raw CD3+ cell proportions were higher in the aGvHD patients than in the non-GvHD patients (Figure G.5). Even though the immune cell abundance was recorded in proportion to CD3+ cells, it exhibited a pattern similar to that of CD3+CD4+CD8β+CD8+ in proportion to PBMC (Figure 4.8).

In the FLDA analysis of the concentration dataset using samples taken from all three time ranges, there were only three classifiers with estimated sensitivity and specificity both higher than 70% (Tables H.4 - H.6).
Overall, there was very little correlation between the classifier accuracies from the proportion and concentration datasets (r = 0.02). The top ranking classifiers from the concentration dataset were:

1. CD2+CD16+, based on samples taken between 7 and 21 days post-transplant (data not shown)
2. CD3-CD44+CD25+, based on samples taken between 21 and 0 days prior to aGvHD diagnosis (data not shown)
3. CD45+CD33-, based on samples taken between 21 and 0 days prior to aGvHD diagnosis (Figure G.6)

These classifiers were all inconsistent due to pattern outliers, as described in detail in Chapter 4.

[Figure G.5: panels a and b; x-axis: Days from aGvHD diagnosis]

Figure G.5 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the new subset of immune cells CD3+CD4+CD8β+CD8+ in proportion to CD3+ cell population. The aGvHD diagnosis day is labelled at day 0.

[Figure G.6: panels a and b; x-axis: Days from aGvHD diagnosis]

Figure G.6 Time plots of the FLDA estimated signals (panel a) and the raw data (panel b) based on samples taken between 21 and 0 days prior to aGvHD diagnosis for the immune cells CD45+CD33- in concentration (mm³).

Appendix H. Summaries of LOOCV results for the FLDA analyses between aGvHD and non-GvHD patients

Table H.1 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken from 7 to 21 days post-transplant.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 86 | 33 | 79
CD3-CD44+CD25+CD69+ | 1Activation | 100 | 0 | 85
CD3-CD44+CD25+ | 1Activation | 94 | 0 | 80
CD3-CD44-CD25- | 1Activation | 67 | 67 | 67
CD3- | 1Activation | 81 | 0 | 71
CD3+CD44-CD25+ | 1Activation | 57 | 33 | 54
CD3+CD44+CD25+CD69+ | 1Activation | 24 | 67 | 29
CD3+CD44+CD25+ | 1Activation | 81 | 0 | 71
CD3+CD44-CD25- | 1Activation | 81 | 33 | 75
CD3+ | 1Activation | 90 | 33 | 83
CD3-CD4dim | 2Activation | 86 | 0 | 75
CD3-CD8low | 2Activation | 57 | 0 | 50
CD3-CD4-CD8- | 2Activation | 95 | 33 | 88
CD3- | 2Activation | 86 | 0 | 75
CD3+CD4br | 2Activation | 86 | 33 | 79
CD3+CD4int | 2Activation | 81 | 100 | 83
CD3+CD8br | 2Activation | 81 | 33 | 75
CD3+CD8dim | 2Activation | 67 | 0 | 58
CD3+ | 2Activation | 86 | 33 | 79
CD3-CD4dim | 3Activation | 81 | 0 | 71
CD3-CD8low | 3Activation | 62 | 0 | 54
CD3-CD4-CD8- | 3Activation | 95 | 33 | 88
CD3- | 3Activation | 86 | 0 | 75
CD3+CD4br | 3Activation | 86 | 33 | 79
CD3+CD4int | 3Activation | 76 | 67 | 75
CD3+CD8br | 3Activation | 81 | 33 | 75
CD3+CD8dim | 3Activation | 81 | 33 | 75
CD3+ | 3Activation | 90 | 33 | 83
CD22+CD20+ | B cells | 95 | 0 | 83
CD22+ | B cells | 100 | 0 | 88
CD33+CD45dimCD15lowCD14low | Myeloids | 100 | 67 | 96
CD33+CD45dimCD15+CD14- | Myeloids | 62 | 0 | 54
CD33+CD45dimCD15+CD14+ | Myeloids | 67 | 0 | 58
CD33+CD45dim | Myeloids | 81 | 0 | 71
CD33+CD45+CD15+CD14+ | Myeloids | 90 | 0 | 79
CD33+CD45+ | Myeloids | 95 | 0 | 83
CD45+CD33-CD15+CD14- | Myeloids | 67 | 0 | 58
CD45+CD33- | Myeloids | 86 | 0 | 75
CD2dimCD16+CD3+CD56- | NK cells | 81 | 33 | 75
CD2dimCD16+CD56+CD3- | NK cells | 90 | 100 | 92
CD2dimCD16+CD56-CD3- | NK cells | 90 | 0 | 79
CD2dimCD16+ | NK cells | 90 | 0 | 79
CD2-CD16+CD3+CD56- | NK cells | 86 | 33 | 79
CD2-CD16+CD56+CD3- | NK cells | 81 | 33 | 75
CD2-CD16+CD56-CD3- | NK cells | 67 | 33 | 62
CD2-CD16+ | NK cells | 52 | 67 | 54
CD2+CD16-CD3+CD56- | NK cells | 86 | 33 | 79
CD2+CD16-CD56+CD3- | NK cells | 71 | 0 | 62
CD2+CD16-CD56-CD3- | NK cells | 71 | 0 | 62
CD2+CD16- | NK cells | 90 | 33 | 83
CD2+CD16+CD3+CD56- | NK cells | 67 | 67 | 67
CD2+CD16+CD56+CD3- | NK cells | 71 | 0 | 62
CD2+CD16+CD56-CD3- | NK cells | 95 | 33 | 88
CD2+CD16+ | NK cells | 76 | 0 | 67
CD3-CD4lowCD8βlow | T cells | 90 | 0 | 79
CD3-CD8βdimCD8- | T cells | 90 | 0 | 79
CD3-CD8+CD8β- | T cells | 67 | 0 | 58
CD3- | T cells | 86 | 0 | 75
CD3+CD4+CD8β- | T cells | 90 | 33 | 83
CD3+CD4+CD8β+ | T cells | 86 | 100 | 88
CD3+CD8βdimCD8- | T cells | 90 | 0 | 79
CD3+CD8β+CD4- | T cells | 81 | 33 | 75
CD3+CD8β+CD8low | T cells | 57 | 0 | 50
CD3+CD8β+CD8+ | T cells | 81 | 33 | 75
CD3+CD8+CD8β- | T cells | 81 | 33 | 75
CD3+ | T cells | 90 | 33 | 83
CD3+CD4+CD8β+CD8+ | T cells | 71 | 100 | 75
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 48 | 33 | 46
CD3- | TCR | 86 | 0 | 75
CD3-CD5-TCRab+TCRgd- | TCR | 76 | 0 | 67
CD3CD5TCRab+ | TCR | 86 | 0 | 75
CD3-CD5TCRab+TCRgd+ | TCR | 50 | 0 | 43
CD3CD5TCRab+TCRgd- | TCR | 85 | 0 | 74

Table H.2 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 21 and 0 days prior to aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 90 | 33 | 83
CD3-CD44+CD25+CD69+ | 1Activation | 76 | 33 | 70
CD3-CD44+CD25+ | 1Activation | 88 | 33 | 80
CD3-CD44-CD25- | 1Activation | 57 | 33 | 54
CD3- | 1Activation | 71 | 100 | 75
CD3+CD44-CD25+ | 1Activation | 62 | 33 | 58
CD3+CD44+CD25+CD69+ | 1Activation | 57 | 100 | 62
CD3+CD44+CD25+ | 1Activation | 43 | 0 | 38
CD3+CD44-CD25- | 1Activation | 76 | 100 | 79
CD3+ | 1Activation | 71 | 100 | 75
CD3-CD4dim | 2Activation | 71 | 0 | 62
CD3-CD8low | 2Activation | 67 | 0 | 58
CD3-CD4-CD8- | 2Activation | 62 | 0 | 54
CD3- | 2Activation | 71 | 100 | 75
CD3+CD4br | 2Activation | 62 | 67 | 62
CD3+CD4int | 2Activation | 57 | 100 | 62
CD3+CD8br | 2Activation | 76 | 100 | 79
CD3+CD8dim | 2Activation | 43 | 33 | 42
CD3+ | 2Activation | 71 | 100 | 75
CD3-CD4dim | 3Activation | 71 | 0 | 62
CD3-CD8low | 3Activation | 71 | 0 | 62
CD3-CD4-CD8- | 3Activation | 67 | 0 | 58
CD3- | 3Activation | 67 | 100 | 71
CD3+CD4br | 3Activation | 62 | 67 | 62
CD3+CD4int | 3Activation | 57 | 67 | 58
CD3+CD8br | 3Activation | 67 | 100 | 71
CD3+CD8dim | 3Activation | 38 | 0 | 33
CD3+ | 3Activation | 67 | 100 | 71
CD22+CD20+ | B cells | 90 | 0 | 79
CD22+ | B cells | 81 | 0 | 71
CD33+CD45dimCD15lowCD14low | Myeloids | 90 | 33 | 83
CD33+CD45dimCD15+CD14- | Myeloids | 95 | 33 | 88
CD33+CD45dimCD15+CD14+ | Myeloids | 81 | 0 | 71
CD33+CD45dim | Myeloids | 90 | 0 | 79
CD33+CD45+CD15+CD14+ | Myeloids | 76 | 0 | 67
CD33+CD45+ | Myeloids | 71 | 0 | 62
CD45+CD33-CD15+CD14- | Myeloids | 81 | 0 | 71
CD45+CD33- | Myeloids | 71 | 0 | 62
CD2dimCD16+CD3+CD56- | NK cells | 43 | 67 | 46
CD2dimCD16+CD56+CD3- | NK cells | 52 | 33 | 50
CD2dimCD16+CD56-CD3- | NK cells | 76 | 0 | 67
CD2dimCD16+ | NK cells | 67 | 0 | 58
CD2-CD16+CD3+CD56- | NK cells | 90 | 33 | 83
CD2-CD16+CD56+CD3- | NK cells | 86 | 33 | 79
CD2-CD16+CD56-CD3- | NK cells | 38 | 33 | 38
CD2-CD16+ | NK cells | 38 | 0 | 33
CD2+CD16-CD3+CD56- | NK cells | 67 | 67 | 67
CD2+CD16-CD56+CD3- | NK cells | 95 | 0 | 83
CD2+CD16-CD56-CD3- | NK cells | 67 | 0 | 58
CD2+CD16- | NK cells | 86 | 33 | 79
CD2+CD16+CD3+CD56- | NK cells | 52 | 67 | 54
CD2+CD16+CD56+CD3- | NK cells | 90 | 33 | 83
CD2+CD16+CD56-CD3- | NK cells | 86 | 67 | 83
CD2+CD16+ | NK cells | 76 | 0 | 67
CD3+CD4+CD8β+CD8+ | T cells | 71 | 100 | 75
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 57 | 100 | 62
CD3-CD4lowCD8βlow | T cells | 76 | 0 | 67
CD3-CD8βdimCD8- | T cells | 90 | 67 | 88
CD3-CD8+CD8β- | T cells | 86 | 0 | 75
CD3- | T cells | 67 | 100 | 71
CD3+CD4+CD8β- | T cells | 57 | 67 | 58
CD3+CD4+CD8β+ | T cells | 67 | 100 | 71
CD3+CD8βdimCD8- | T cells | 48 | 100 | 54
CD3+CD8β+CD4- | T cells | 67 | 100 | 71
CD3+CD8β+CD8low | T cells | 81 | 67 | 79
CD3+CD8β+CD8+ | T cells | 67 | 100 | 71
CD3+CD8+CD8β- | T cells | 71 | 100 | 75
CD3+ | T cells | 67 | 100 | 71
CD3- | TCR | 76 | 100 | 79
CD3-CD5-TCRab+TCRgd- | TCR | 86 | 0 | 75
CD3CD5TCRab+ | TCR | 76 | 33 | 71
CD3-CD5TCRab+TCRgd+ | TCR | 80 | 0 | 70
CD3CD5TCRab+TCRgd- | TCR | 65 | 33 | 61

Table H.3 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 0 and 21 days from aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 72 | 33 | 67
CD3-CD44+CD25+CD69+ | 1Activation | 62 | 67 | 63
CD3-CD44+CD25+ | 1Activation | 81 | 0 | 68
CD3-CD44-CD25- | 1Activation | 67 | 0 | 57
CD3- | 1Activation | 94 | 0 | 81
CD3+CD44-CD25+ | 1Activation | 44 | 33 | 43
CD3+CD44+CD25+CD69+ | 1Activation | 56 | 100 | 62
CD3+CD44+CD25+ | 1Activation | 78 | 0 | 67
CD3+CD44-CD25- | 1Activation | 72 | 33 | 67
CD3+ | 1Activation | 94 | 33 | 86
CD3-CD4dim | 2Activation | 78 | 33 | 71
CD3-CD8low | 2Activation | 83 | 0 | 71
CD3-CD4-CD8- | 2Activation | 94 | 0 | 81
CD3- | 2Activation | 94 | 0 | 81
CD3+CD4br | 2Activation | 89 | 0 | 76
CD3+CD4int | 2Activation | 67 | 100 | 71
CD3+CD8br | 2Activation | 50 | 67 | 52
CD3+CD8dim | 2Activation | 94 | 33 | 86
CD3+ | 2Activation | 94 | 0 | 81
CD3-CD4dim | 3Activation | 72 | 67 | 71
CD3-CD8low | 3Activation | 61 | 0 | 52
CD3-CD4-CD8- | 3Activation | 100 | 0 | 86
CD3- | 3Activation | 94 | 0 | 81
CD3+CD4br | 3Activation | 83 | 0 | 71
CD3+CD4int | 3Activation | 72 | 100 | 76
CD3+CD8br | 3Activation | 61 | 67 | 62
CD3+CD8dim | 3Activation | 89 | 33 | 81
CD3+ | 3Activation | 94 | 0 | 81
CD22+CD20+ | B cells | 94 | 67 | 90
CD22+ | B cells | 94 | 0 | 81
CD33+CD45dimCD15lowCD14low | Myeloids | 72 | 33 | 67
CD33+CD45dimCD15+CD14- | Myeloids | 44 | 33 | 43
CD33+CD45dimCD15+CD14+ | Myeloids | 33 | 100 | 43
CD33+CD45dim | Myeloids | 33 | 100 | 43
CD33+CD45+CD15+CD14+ | Myeloids | 89 | 0 | 76
CD33+CD45+ | Myeloids | 56 | 33 | 52
CD45+CD33-CD15+CD14- | Myeloids | 100 | 0 | 86
CD45+CD33- | Myeloids | 72 | 0 | 62
CD2dimCD16+CD3+CD56- | NK cells | 83 | 0 | 71
CD2dimCD16+CD56+CD3- | NK cells | 50 | 0 | 43
CD2dimCD16+CD56-CD3- | NK cells | 78 | 100 | 81
CD2dimCD16+ | NK cells | 78 | 33 | 71
CD2-CD16+CD3+CD56- | NK cells | 94 | 33 | 86
CD2-CD16+CD56+CD3- | NK cells | 72 | 0 | 62
CD2-CD16+CD56-CD3- | NK cells | 89 | 33 | 81
CD2-CD16+ | NK cells | 89 | 33 | 81
CD2+CD16-CD3+CD56- | NK cells | 89 | 33 | 81
CD2+CD16-CD56+CD3- | NK cells | 89 | 33 | 81
CD2+CD16-CD56-CD3- | NK cells | 67 | 0 | 57
CD2+CD16- | NK cells | 89 | 0 | 76
CD2+CD16+CD3+CD56- | NK cells | 50 | 67 | 52
CD2+CD16+CD56+CD3- | NK cells | 83 | 33 | 76
CD2+CD16+CD56-CD3- | NK cells | 94 | 0 | 81
CD2+CD16+ | NK cells | 28 | 0 | 24
CD3+CD4+CD8β+CD8+ | T cells | 67 | 67 | 67
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 72 | 100 | 76
CD3-CD4lowCD8βlow | T cells | 78 | 67 | 76
CD3-CD8βdimCD8- | T cells | 78 | 0 | 67
CD3-CD8+CD8β- | T cells | 72 | 0 | 62
CD3- | T cells | 94 | 33 | 86
CD3+CD4+CD8β- | T cells | 89 | 0 | 76
CD3+CD4+CD8β+ | T cells | 72 | 67 | 71
CD3+CD8βdimCD8- | T cells | 56 | 100 | 62
CD3+CD8β+CD4- | T cells | 56 | 33 | 52
CD3+CD8β+CD8low | T cells | 83 | 0 | 71
CD3+CD8β+CD8+ | T cells | 56 | 67 | 57
CD3+CD8+CD8β- | T cells | 72 | 33 | 67
CD3+ | T cells | 100 | 0 | 86
CD3- | TCR | 94 | 0 | 81
CD3-CD5-TCRab+TCRgd- | TCR | 89 | 0 | 76
CD3CD5TCRab+ | TCR | 67 | 33 | 62
CD3-CD5TCRab+TCRgd+ | TCR | 39 | 33 | 38
CD3CD5TCRab+TCRgd- | TCR | 72 | 0 | 62

Table H.4 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD and non-GvHD patients using samples taken from 7 to 21 days post-transplant.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 76 | 33 | 71
CD3-CD44+CD25+CD69+ | 1Activation | 100 | 0 | 85
CD3-CD44+CD25+ | 1Activation | 100 | 0 | 85
CD3-CD44-CD25- | 1Activation | 43 | 67 | 46
CD3- | 1Activation | 76 | 67 | 75
CD3+CD44-CD25+ | 1Activation | 52 | 67 | 54
CD3+CD44+CD25+CD69+ | 1Activation | 81 | 67 | 79
CD3+CD44+CD25+ | 1Activation | 71 | 0 | 62
CD3+CD44-CD25- | 1Activation | 43 | 67 | 46
CD3+ | 1Activation | 43 | 67 | 46
CD3-CD4dim | 2Activation | 67 | 67 | 67
CD3-CD8low | 2Activation | 67 | 33 | 62
CD3-CD4-CD8- | 2Activation | 81 | 33 | 75
CD3- | 2Activation | 71 | 67 | 71
CD3+CD4br | 2Activation | 81 | 67 | 79
CD3+CD4int | 2Activation | 33 | 100 | 42
CD3+CD8br | 2Activation | 43 | 67 | 46
CD3+CD8dim | 2Activation | 33 | 67 | 38
CD3+ | 2Activation | 52 | 67 | 54
CD3-CD4dim | 3Activation | 67 | 67 | 67
CD3-CD8low | 3Activation | 71 | 33 | 67
CD3-CD4-CD8- | 3Activation | 86 | 33 | 79
CD3- | 3Activation | 71 | 67 | 71
CD3+CD4br | 3Activation | 86 | 67 | 83
CD3+CD4int | 3Activation | 43 | 67 | 46
CD3+CD8br | 3Activation | 52 | 67 | 54
CD3+CD8dim | 3Activation | 57 | 33 | 54
CD3+ | 3Activation | 57 | 67 | 58
CD22+CD20+ | B cells | 62 | 100 | 67
CD22+ | B cells | 95 | 33 | 88
CD33+CD45dimCD15lowCD14low | Myeloids | 100 | 33 | 92
CD33+CD45dimCD15+CD14- | Myeloids | 71 | 33 | 67
CD33+CD45dimCD15+CD14+ | Myeloids | 71 | 0 | 62
CD33+CD45dim | Myeloids | 71 | 0 | 62
CD33+CD45+CD15+CD14+ | Myeloids | 76 | 33 | 71
CD33+CD45+ | Myeloids | 76 | 67 | 75
CD45+CD33-CD15+CD14- | Myeloids | 81 | 67 | 79
CD45+CD33- | Myeloids | 81 | 67 | 79
CD2dimCD16+CD3+CD56- | NK cells | 38 | 100 | 46
CD2dimCD16+CD56+CD3- | NK cells | 43 | 67 | 46
CD2dimCD16+CD56-CD3- | NK cells | 71 | 67 | 71
CD2dimCD16+ | NK cells | 71 | 67 | 71
CD2-CD16+CD3+CD56- | NK cells | 90 | 0 | 79
CD2-CD16+CD56+CD3- | NK cells | 95 | 0 | 83
CD2-CD16+CD56-CD3- | NK cells | 76 | 33 | 71
CD2-CD16+ | NK cells | 76 | 33 | 71
CD2+CD16-CD3+CD56- | NK cells | 52 | 33 | 50
CD2+CD16-CD56+CD3- | NK cells | 100 | 33 | 92
CD2+CD16-CD56-CD3- | NK cells | 76 | 33 | 71
CD2+CD16- | NK cells | 90 | 0 | 79
CD2+CD16+CD3+CD56- | NK cells | 52 | 100 | 58
CD2+CD16+CD56+CD3- | NK cells | 90 | 0 | 79
CD2+CD16+CD56-CD3- | NK cells | 86 | 67 | 83
CD2+CD16+ | NK cells | 76 | 100 | 79
CD3+CD4+CD8β+CD8+ | T cells | 48 | 67 | 50
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 48 | 33 | 46
CD3-CD4lowCD8βlow | T cells | 76 | 67 | 75
CD3-CD8βdimCD8- | T cells | 76 | 67 | 75
CD3-CD8+CD8β- | T cells | 71 | 67 | 71
CD3- | T cells | 76 | 67 | 75
CD3+CD4+CD8β- | T cells | 81 | 67 | 79
CD3+CD4+CD8β+ | T cells | 43 | 0 | 38
CD3+CD8βdimCD8- | T cells | 38 | 0 | 33
CD3+CD8β+CD4- | T cells | 52 | 100 | 58
CD3+CD8β+CD8low | T cells | 90 | 33 | 83
CD3+CD8β+CD8+ | T cells | 52 | 100 | 58
CD3+CD8+CD8β- | T cells | 38 | 67 | 42
CD3+ | T cells | 57 | 67 | 58
CD3- | TCR | 76 | 67 | 75
CD3-CD5-TCRab+TCRgd- | TCR | 86 | 67 | 83
CD3CD5TCRab+ | TCR | 67 | 67 | 67
CD3-CD5TCRab+TCRgd+ | TCR | 75 | 0 | 65
CD3CD5TCRab+TCRgd- | TCR | 75 | 67 | 74

Table H.5 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 21 and 0 days prior to aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 90 | 0 | 79
CD3-CD44+CD25+CD69+ | 1Activation | 59 | 67 | 60
CD3-CD44+CD25+ | 1Activation | 82 | 100 | 85
CD3-CD44-CD25- | 1Activation | 62 | 67 | 62
CD3- | 1Activation | 81 | 33 | 75
CD3+CD44-CD25+ | 1Activation | 67 | 0 | 58
CD3+CD44+CD25+CD69+ | 1Activation | 95 | 67 | 92
CD3+CD44+CD25+ | 1Activation | 95 | 0 | 83
CD3+CD44-CD25- | 1Activation | 57 | 67 | 58
CD3+ | 1Activation | 62 | 33 | 58
CD3-CD4dim | 2Activation | 76 | 0 | 67
CD3-CD8low | 2Activation | 62 | 0 | 54
CD3-CD4-CD8- | 2Activation | 76 | 67 | 75
CD3- | 2Activation | 81 | 33 | 75
CD3+CD4br | 2Activation | 81 | 33 | 75
CD3+CD4int | 2Activation | 38 | 67 | 42
CD3+CD8br | 2Activation | 48 | 67 | 50
CD3+CD8dim | 2Activation | 81 | 33 | 75
CD3+ | 2Activation | 67 | 33 | 62
CD3-CD4dim | 3Activation | 71 | 0 | 62
CD3-CD8low | 3Activation | 62 | 0 | 54
CD3-CD4-CD8- | 3Activation | 76 | 67 | 75
CD3- | 3Activation | 81 | 33 | 75
CD3+CD4br | 3Activation | 86 | 0 | 75
CD3+CD4int | 3Activation | 57 | 67 | 58
CD3+CD8br | 3Activation | 43 | 67 | 46
CD3+CD8dim | 3Activation | 76 | 67 | 75
CD3+ | 3Activation | 71 | 67 | 71
CD22+CD20+ | B cells | 76 | 0 | 67
CD22+ | B cells | 81 | 0 | 71
CD33+CD45dimCD15lowCD14low | Myeloids | 100 | 33 | 92
CD33+CD45dimCD15+CD14- | Myeloids | 90 | 33 | 83
CD33+CD45dimCD15+CD14+ | Myeloids | 86 | 0 | 75
CD33+CD45dim | Myeloids | 86 | 33 | 79
CD33+CD45+CD15+CD14+ | Myeloids | 81 | 0 | 71
CD33+CD45+ | Myeloids | 76 | 0 | 67
CD45+CD33-CD15+CD14- | Myeloids | 95 | 67 | 92
CD45+CD33- | Myeloids | 71 | 100 | 75
CD2dimCD16+CD3+CD56- | NK cells | 24 | 67 | 29
CD2dimCD16+CD56+CD3- | NK cells | 76 | 0 | 67
CD2dimCD16+CD56-CD3- | NK cells | 86 | 0 | 75
CD2dimCD16+ | NK cells | 76 | 0 | 67
CD2-CD16+CD3+CD56- | NK cells | 95 | 33 | 88
CD2-CD16+CD56+CD3- | NK cells | 95 | 0 | 83
CD2-CD16+CD56-CD3- | NK cells | 81 | 67 | 79
CD2-CD16+ | NK cells | 81 | 67 | 79
CD2+CD16-CD3+CD56- | NK cells | 81 | 67 | 79
CD2+CD16-CD56+CD3- | NK cells | 100 | 33 | 92
CD2+CD16-CD56-CD3- | NK cells | 76 | 0 | 67
CD2+CD16- | NK cells | 86 | 67 | 83
CD2+CD16+CD3+CD56- | NK cells | 62 | 33 | 58
CD2+CD16+CD56+CD3- | NK cells | 81 | 33 | 75
CD2+CD16+CD56-CD3- | NK cells | 86 | 33 | 79
CD2+CD16+ | NK cells | 86 | 67 | 83
CD3+CD4+CD8β+CD8+ | T cells | 48 | 100 | 54
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 52 | 67 | 54
CD3-CD4lowCD8βlow | T cells | 90 | 0 | 79
CD3-CD8βdimCD8- | T cells | 95 | 33 | 88
CD3-CD8+CD8β- | T cells | 67 | 67 | 67
CD3- | T cells | 81 | 33 | 75
CD3+CD4+CD8β- | T cells | 86 | 33 | 79
CD3+CD4+CD8β+ | T cells | 43 | 67 | 46
CD3+CD8βdimCD8- | T cells | 43 | 33 | 42
CD3+CD8β+CD4- | T cells | 48 | 100 | 54
CD3+CD8β+CD8low | T cells | 100 | 33 | 92
CD3+CD8β+CD8+ | T cells | 48 | 100 | 54
CD3+CD8+CD8β- | T cells | 48 | 33 | 46
CD3+ | T cells | 67 | 67 | 67
CD3- | TCR | 81 | 33 | 75
CD3-CD5-TCRab+TCRgd- | TCR | 81 | 67 | 79
CD3CD5TCRab+ | TCR | 95 | 33 | 88
CD3-CD5TCRab+TCRgd+ | TCR | 90 | 0 | 78
CD3CD5TCRab+TCRgd- | TCR | 95 | 33 | 87

Table H.6 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD and non-GvHD patients using samples taken between 0 and 21 days from aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- | 1Activation | 72 | 33 | 67
CD3-CD44+CD25+CD69+ | 1Activation | 50 | 67 | 53
CD3-CD44+CD25+ | 1Activation | 62 | 0 | 53
CD3-CD44-CD25- | 1Activation | 67 | 33 | 62
CD3- | 1Activation | 61 | 100 | 67
CD3+CD44-CD25+ | 1Activation | 39 | 33 | 38
CD3+CD44+CD25+CD69+ | 1Activation | 33 | 100 | 43
CD3+CD44+CD25+ | 1Activation | 72 | 0 | 62
CD3+CD44-CD25- | 1Activation | 78 | 33 | 71
CD3+ | 1Activation | 89 | 33 | 81
CD3-CD4dim | 2Activation | 61 | 67 | 62
CD3-CD8low | 2Activation | 56 | 0 | 48
CD3-CD4-CD8- | 2Activation | 67 | 33 | 62
CD3- | 2Activation | 44 | 100 | 52
CD3+CD4br | 2Activation | 100 | 33 | 90
CD3+CD4int | 2Activation | 56 | 67 | 57
CD3+CD8br | 2Activation | 78 | 33 | 71
CD3+CD8dim | 2Activation | 94 | 33 | 86
CD3+ | 2Activation | 94 | 33 | 86
CD3-CD4dim | 3Activation | 67 | 67 | 67
CD3-CD8low | 3Activation | 72 | 0 | 62
CD3-CD4-CD8- | 3Activation | 67 | 33 | 62
CD3- | 3Activation | 44 | 100 | 52
CD3+CD4br | 3Activation | 100 | 33 | 90
CD3+CD4int | 3Activation | 83 | 67 | 81
CD3+CD8br | 3Activation | 78 | 33 | 71
CD3+CD8dim | 3Activation | 94 | 33 | 86
CD3+ | 3Activation | 94 | 33 | 86
CD22+CD20+ | B cells | 94 | 0 | 81
CD22+ | B cells | 100 | 0 | 86
CD33+CD45dimCD15lowCD14low | Myeloids | 22 | 67 | 29
CD33+CD45dimCD15+CD14- | Myeloids | 50 | 33 | 48
CD33+CD45dimCD15+CD14+ | Myeloids | 22 | 100 | 33
CD33+CD45dim | Myeloids | 28 | 100 | 38
CD33+CD45+CD15+CD14+ | Myeloids | 50 | 67 | 52
CD33+CD45+ | Myeloids | 56 | 67 | 57
CD45+CD33-CD15+CD14- | Myeloids | 100 | 0 | 86
CD45+CD33- | Myeloids | 94 | 33 | 86
CD2dimCD16+CD3+CD56- | NK cells | 83 | 67 | 81
CD2dimCD16+CD56+CD3- | NK cells | 50 | 0 | 43
CD2dimCD16+CD56-CD3- | NK cells | 44 | 100 | 52
CD2dimCD16+ | NK cells | 44 | 100 | 52
CD2-CD16+CD3+CD56- | NK cells | 89 | 0 | 76
CD2-CD16+CD56+CD3- | NK cells | 78 | 33 | 71
CD2-CD16+CD56-CD3- | NK cells | 83 | 67 | 81
CD2-CD16+ | NK cells | 83 | 67 | 81
CD2+CD16-CD3+CD56- | NK cells | 100 | 33 | 90
CD2+CD16-CD56+CD3- | NK cells | 83 | 0 | 71
CD2+CD16-CD56-CD3- | NK cells | 100 | 33 | 90
CD2+CD16- | NK cells | 100 | 33 | 90
CD2+CD16+CD3+CD56- | NK cells | 61 | 33 | 57
CD2+CD16+CD56+CD3- | NK cells | 72 | 0 | 62
CD2+CD16+CD56-CD3- | NK cells | 89 | 0 | 76
CD2+CD16+ | NK cells | 67 | 33 | 62
CD3+CD4+CD8β+CD8+ | T cells | 83 | 33 | 76
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) | T cells | 56 | 100 | 62
CD3-CD4lowCD8βlow | T cells | 39 | 100 | 48
CD3-CD8βdimCD8- | T cells | 78 | 33 | 71
CD3-CD8+CD8β- | T cells | 67 | 0 | 57
CD3- | T cells | 44 | 100 | 52
CD3+CD4+CD8β- | T cells | 100 | 33 | 90
CD3+CD4+CD8β+ | T cells | 72 | 33 | 67
CD3+CD8βdimCD8- | T cells | 61 | 67 | 62
CD3+CD8β+CD4- | T cells | 78 | 33 | 71
CD3+CD8β+CD8low | T cells | 83 | 0 | 71
CD3+CD8β+CD8+ | T cells | 72 | 67 | 71
CD3+CD8+CD8β- | T cells | 94 | 33 | 86
CD3+ | T cells | 94 | 33 | 86
CD3- | TCR | 50 | 100 | 57
CD3-CD5-TCRab+TCRgd- | TCR | 61 | 67 | 62
CD3CD5TCRab+ | TCR | 44 | 100 | 52
CD3-CD5TCRab+TCRgd+ | TCR | 44 | 100 | 52
CD3CD5TCRab+TCRgd- | TCR | 50 | 100 | 57

Appendix I. Other top ranking classifiers for the onset of cGvHD

Many top ranking classifiers designed to predict or elucidate the onset and progression of cGvHD exhibited patterns that were inconsistent with their raw data patterns. An example of the inconsistent classifiers was shown in Section 5.1.

In the FLDA analysis of the concentration dataset using samples taken between 7 and 21 days post-transplant, only one type of pattern was observed: a sudden increase in the aGvHD only patients. The FLDA classification built from the subset of immune cells 45RA+CD3+CD8low in cell concentration (Figure I.1) was used as an example of this pattern. The classifier had an estimated 86% sensitivity and 71% specificity (Table J.4). The FLDA estimated signals from the aGvHD only patients increased at 15 days post-transplant and became higher than those of the aGvHD & cGvHD patients around 21 days post-transplant (Figure I.1a).
This pattern was consistent with the raw data plotted from 0 to 100 days post-transplant (purple striped area, Figure I.1b). In the extended raw data time plot, four out of the seven available non-GvHD patient datasets suddenly increased around 15 to 55 days post-transplant (Figure I.1b). Similar patterns were also observed for other classifiers such as CD3-TCRab+CD5+ and CD2dimCD16+CD3+CD56- (data not shown), but with lower estimated sensitivity and specificity (Table J.4).

[Figure I.1, panels a and b. x-axis: days post-transplant; groups plotted: aGvHD & cGvHD, aGvHD only.]

Figure I.1 Time plot of the FLDA estimated signals (panel a) based on samples taken between 7 and 21 days post-transplant and time plot of the raw data (panel b) based on samples taken between 0 and 100 days post-transplant for the immune cells 45RA+CD3+CD8low in proportion to PBMC (%). The purple striped box indicates the time range where data was analyzed via FLDA.

In the FLDA analysis of the concentration dataset using samples taken between 21 and 0 days prior to aGvHD diagnosis, only one subset of immune cells yielded a consistent classifier with opposite FLDA signal patterns. The top classifier was 45RA+CD3-CD4dim (Figure I.2). The FLDA classifier had an estimated 86% sensitivity and 71% specificity (Table J.5). Its FLDA signals were opposite between the two patient groups (Figure I.2a). However, this pattern could not be easily identified in the local or extended raw data time plots for either subset of immune cells (Figure I.2b).
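The sensitivity, specificity and accuracy figures quoted for these classifiers are simple ratios of leave-one-out outcomes. The sketch below shows that arithmetic only. The function name `loocv_summary`, the choice of aGvHD & cGvHD as the positive class, and the patient counts (7 per group) are assumptions made for illustration; they are not stated in the text.

```python
def loocv_summary(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as whole percentages.

    tp/fn: positive-class patients classified correctly/incorrectly.
    tn/fp: negative-class patients classified correctly/incorrectly.
    Which group counts as "positive" is an assumption of this sketch.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return round(sensitivity), round(specificity), round(accuracy)


# Hypothetical counts: 6 of 7 positives and 5 of 7 negatives classified correctly.
print(loocv_summary(tp=6, fn=1, tn=5, fp=2))  # (86, 71, 79)
```

Under these assumed counts the result matches an 86% sensitivity and 71% specificity classifier with 79% accuracy, which suggests how the three columns of the Appendix J tables relate to one another.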
[Figure I.2, panels a and b. x-axis: days from aGvHD diagnosis; groups plotted: aGvHD & cGvHD, aGvHD only.]

Figure I.2 Time plot of the FLDA estimated signals (panel a) based on samples taken between -21 and 0 days from aGvHD diagnosis and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells 45RA+CD3-CD4dim in concentration (mm³). The date of aGvHD diagnosis is labelled as day 0.

In the FLDA analysis of the proportion dataset using samples taken between 0 and 21 days post-aGvHD diagnosis, the FLDA classifier built from the immune cells CD3+CD4int (aliquot '2Activation') had a pattern of higher values from the aGvHD only patients (Figure I.3). The classifier predicting the onset of cGvHD had an estimated 83% sensitivity and 89% specificity (Table J.3). The same subset of immune cells was also identified as a top ranking classifier in the comparison between aGvHD and non-GvHD patients (Section 4.1.3). In the FLDA estimated signals time plot (Figure I.3a), proportion values from the aGvHD only patients started with higher values at the beginning of the analyzed time range and steadily decreased, while the values from the aGvHD & cGvHD patients increased. In the raw data time plot from -21 to 21 days from aGvHD diagnosis (Figure I.3b), values from the aGvHD only patients were generally higher across time points when compared to the aGvHD & cGvHD patients.
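Appendix J reports these estimates as LOOCV results. As a rough illustration of the leave-one-out procedure only, the sketch below holds out one patient at a time and classifies that patient from the remaining ones. The nearest-mean rule is a deliberately simple stand-in and is not the FLDA classifier used in this thesis; the function names and the toy data are hypothetical.

```python
from statistics import mean


def nearest_mean_predict(train, value):
    """Stand-in rule (not FLDA): pick the class whose training mean is closest."""
    pos_mean = mean([x for x, label in train if label == 1])
    neg_mean = mean([x for x, label in train if label == 0])
    return 1 if abs(value - pos_mean) < abs(value - neg_mean) else 0


def loocv_predictions(samples):
    """Hold out each patient in turn, train on the rest, predict the held-out one."""
    predictions = []
    for i, (value, _label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        predictions.append(nearest_mean_predict(train, value))
    return predictions


# Hypothetical toy data: (summary value, label); 1 = aGvHD & cGvHD, 0 = aGvHD only.
toy_samples = [(1.0, 0), (1.2, 0), (0.9, 0), (3.0, 1), (3.2, 1), (1.1, 1)]
toy_predictions = loocv_predictions(toy_samples)
print(toy_predictions)
```

Each prediction is made without the held-out patient in the training set, so counting correct predictions per group gives the kind of sensitivity and specificity estimates tabulated in Appendix J.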
[Figure I.3, panels a and b. x-axis: days from aGvHD diagnosis.]

Figure I.3 Time plot of the FLDA estimated signals (panel a) based on samples taken between 0 and 21 days from aGvHD diagnosis and time plot of the raw data (panel b) based on samples taken between -21 and 21 days from aGvHD diagnosis for the immune cells CD3+CD4int (aliquot '2Activation') in proportion to PBMC (%). The date of aGvHD diagnosis is labelled as day 0.

Appendix J. Summaries of LOOCV results for the FLDA analyses between aGvHD & cGvHD and aGvHD only patients

Table J.1 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken from 7 to 21 days post-transplant.

Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 43 | 67 | 56
CD3-CD44+CD25+CD69+ |  | 100 | 43 | 71
CD3-CD44+CD25+ |  | 86 | 43 | 64
CD3-CD44-CD25- |  | 71 | 56 | 62
CD3- |  | 29 | 22 | 25
CD3+CD44-CD25+ | 1Activation | 57 | 67 | 62
CD3+CD44+CD25- |  | 57 | 22 | 38
CD3+CD44+CD25+CD69+ |  | 86 | 67 | 75
CD3+CD44+CD25+ |  | 86 | 44 | 62
CD3+CD44-CD25- |  | 14 | 56 | 38
CD3+ |  | 43 | 11 | 25
CD3-CD4dim |  | 57 | 67 | 62
CD3-CD8low |  | 57 | 22 | 38
CD3-CD4-CD8- |  | 86 | 44 | 62
CD3- | 2Activation | 43 | 33 | 38
CD3+CD4br |  | 57 | 56 | 56
CD3+CD4int |  | 86 | 67 | 75
CD3+CD8br |  | 43 | 56 | 50
CD3+CD8dim | 2Activation | 57 | 56 | 56
CD3+ |  | 29 | 0 | 12
CD3-CD4dim | 3Activation | 57 | 67 | 62
CD3-CD8low |  | 43 | 22 | 31
CD3-CD4-CD8- |  | 86 | 44 | 62
CD3- |  | 29 | 22 | 25
CD3+CD4br |  | 86 | 67 | 75
CD3+CD4int |  | 71 | 44 | 56
CD3+CD8br |  | 43 | 67 | 56
CD3+CD8dim |  | 71 | 44 | 56
CD3+ |  | 29 | 22 | 25
CD20+CD19+ | B cells | 57 | 56 | 56
CD22+CD20+ |  | 29 | 67 | 50
CD22+ |  | 71 | 33 | 50
CD33+CD45dimCD15lowCD14low | Myeloids | 29 | 44 | 38
CD33+CD45dimCD15+CD14- |  | 57 | 11 | 31
CD33+CD45dimCD15+CD14+ |  | 71 | 78 | 75
CD33+CD45dim |  | 14 | 11 | 12
CD33+CD45+CD15+CD14+ |  | 29 | 44 | 38
CD33+CD45+ |  | 14 | 44 | 31
CD45+CD33-CD15+CD14- |  | 29 | 78 | 56
CD45+CD33- |  | 57 | 11 | 31
CD2dimCD16+CD3+CD56- | NK cells | 57 | 78 | 69
CD2dimCD16+CD56+CD3- |  | 71 | 56 | 62
CD2dimCD16+CD56-CD3- |  | 29 | 67 | 50
CD2dimCD16+ |  | 29 | 44 | 38
CD2-CD16+CD3+CD56- |  | 86 | 89 | 88
CD2-CD16+CD56+CD3- |  | 71 | 33 | 50
CD2-CD16+CD56-CD3- |  | 57 | 44 | 50
CD2-CD16+ |  | 43 | 44 | 44
CD2+CD16-CD3+CD56- |  | 57 | 33 | 44
CD2+CD16-CD56+CD3- | NK cells | 14 | 0 | 6
CD2+CD16-CD56-CD3- |  | 29 | 44 | 38
CD2+CD16- |  | 71 | 44 | 56
CD2+CD16+CD3+CD56- |  | 71 | 44 | 56
CD2+CD16+CD56+CD3- |  | 29 | 44 | 38
CD2+CD16+CD56-CD3- |  | 57 | 33 | 44
CD2+CD16+ |  | 71 | 44 | 56
45RA+CD3-CD4dim |  | 0 | 0 | 0
45RA+CD3- |  | 0 | 14 | 7
45RA+CD3+CD4low |  | 57 | 43 | 50
45RA+CD3+CD4- |  | 71 | 71 | 71
45RA+CD3+CD4+ |  | 14 | 43 | 29
45RA+CD3+ |  | 57 | 57 | 57
45RO+CD3-CD4dim | rest/act T | 14 | 43 | 29
45RO+CD3- | helper | 29 | 57 | 43
45RO+CD3+CD4low |  | 43 | 71 | 57
45RO+CD3+CD4- |  | 86 | 71 | 79
45RO+CD3+CD4+ |  | 57 | 43 | 50
45RO+CD3+ |  | 29 | 43 | 36
CD3- |  | 43 | 29 | 36
CD3+CD4- |  | 57 | 57 | 57
CD3+CD4+ |  | 57 | 43 | 50
CD3+ | rest/act T | 29 | 57 | 43
CD4dim | helper | 14 | 14 | 14
CD3-CD4- |  | 29 | 43 | 36
45RA+CD3-CD8 |  | 0 | 14 | 7
45RA+CD3- |  | 0 | 0 | 0
45RA+CD3+CD8low |  | 57 | 71 | 64
45RA+CD3+CD8- |  | 29 | 57 | 43
45RA+CD3+CD8+ |  | 71 | 43 | 57
45RA+CD3+ |  | 57 | 57 | 57
45RO+CD3- |  | 29 | 57 | 43
45RO+CD3+CD8low | rest/act T suppressor | 71 | 57 | 64
45RO+CD3+CD8- |  | 29 | 57 | 43
45RO+CD3+CD8+ |  | 57 | 43 | 50
45RO+CD3+ |  | 14 | 43 | 29
CD3- |  | 29 | 29 | 29
CD3+CD8- |  | 43 | 57 | 50
CD3+CD8+ |  | 71 | 43 | 57
CD3+ |  | 14 | 43 | 29
CD8+CD3- |  | 57 | 0 | 29
CD3-CD8- |  | 29 | 43 | 36
CD3+CD4+CD8β+CD8+ |  | 57 | 56 | 56
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 43 | 44 | 44
CD3-CD4lowCD8βlow | T cells | 29 | 78 | 56
CD3-CD8βdimCD8- |  | 29 | 67 | 50
CD3-CD8+CD8β- |  | 43 | 22 | 31
CD3- |  | 43 | 33 | 38
CD3+CD4+CD8β- |  | 57 | 56 | 56
CD3+CD4+CD8β+ |  | 57 | 33 | 44
CD3+CD8βdimCD8- |  | 57 | 78 | 69
CD3+CD8β+CD4- | T cells | 43 | 67 | 56
CD3+CD8β+CD8low |  | 100 | 22 | 56
CD3+CD8β+CD8+ |  | 43 | 67 | 56
CD3+CD8+CD8β- |  | 29 | 11 | 19
CD3+ |  | 14 | 0 | 6
CD3CD5+ |  | 57 | 50 | 54
CD3- |  | 29 | 22 | 25
CD3CD5TCRab+TCRgd- |  | 57 | 44 | 50
CD3CD5TCRab+ |  | 0 | 56 | 31
CD3-CD5TCRab+TCRgd+ |  | 43 | 44 | 44
CD3CD5TCRab+TCRgd- |  | 14 | 44 | 31
CD3TCR+CD5+ |  | 71 | 67 | 69
CD3+ |  | 57 | 22 | 38
CD3+CD5TCRab+ |  | 100 | 11 | 50
CD3+CD5TCRab+TCRgd+ |  | 71 | 11 | 38
CD3+CD5TCRab+TCRgd- |  | 100 | 11 | 50
CD3+CD5+TCRab+ |  | 71 | 38 | 53
CD3+CD5+TCRab+TCRgd+ |  | 57 | 25 | 40
CD3+CD5+TCRgd+ |  | 71 | 38 | 53

Table J.2 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 71 | 78 | 75
CD3-CD44+CD25+CD69+ |  | 71 | 29 | 50
CD3-CD44+CD25+ |  | 86 | 29 | 57
CD3-CD44-CD25- |  | 14 | 11 | 12
CD3- |  | 71 | 67 | 69
CD3+CD44-CD25+ | 1Activation | 71 | 56 | 62
CD3+CD44+CD25- |  | 57 | 11 | 31
CD3+CD44+CD25+CD69+ |  | 86 | 33 | 56
CD3+CD44+CD25+ |  | 29 | 33 | 31
CD3+CD44-CD25- |  | 57 | 56 | 56
CD3+ |  | 71 | 56 | 62
CD3-CD4dim |  | 71 | 67 | 69
CD3-CD8low |  | 43 | 44 | 44
CD3-CD4-CD8- |  | 57 | 33 | 44
CD3- |  | 71 | 78 | 75
CD3+CD4br | 2Activation | 57 | 44 | 50
CD3+CD4int |  | 86 | 44 | 62
CD3+CD8br |  | 43 | 89 | 69
CD3+CD8dim |  | 71 | 44 | 56
CD3+ |  | 71 | 56 | 62
CD3-CD4dim | 3Activation | 71 | 78 | 75
CD3-CD8low |  | 43 | 33 | 38
CD3-CD4-CD8- |  | 57 | 33 | 44
CD3- |  | 71 | 89 | 81
CD3+CD4br |  | 57 | 56 | 56
CD3+CD4int |  | 86 | 56 | 69
CD3+CD8br |  | 57 | 89 | 75
CD3+CD8dim |  | 71 | 33 | 50
CD3+ |  | 71 | 56 | 62
CD20+CD19+ | B cells | 43 | 22 | 31
CD22+CD20+ |  | 43 | 100 | 75
CD22+ |  | 86 | 44 | 62
CD33+CD45dimCD15lowCD14low | Myeloids | 86 | 22 | 50
CD33+CD45dimCD15+CD14- |  | 57 | 56 | 56
CD33+CD45dimCD15+CD14+ |  | 71 | 11 | 38
CD33+CD45dim |  | 71 | 33 | 50
CD33+CD45+CD15+CD14+ |  | 71 | 56 | 62
CD33+CD45+ |  | 71 | 67 | 69
CD45+CD33-CD15+CD14- |  | 71 | 100 | 88
CD45+CD33- |  | 86 | 56 | 69
CD2dimCD16+CD3+CD56- | NK cells | 86 | 56 | 69
CD2dimCD16+CD56+CD3- |  | 57 | 56 | 56
CD2dimCD16+CD56-CD3- |  | 57 | 78 | 69
CD2dimCD16+ |  | 71 | 56 | 62
CD2-CD16+CD3+CD56- |  | 71 | 33 | 50
CD2-CD16+CD56+CD3- |  | 71 | 56 | 62
CD2-CD16+CD56-CD3- |  | 71 | 56 | 62
CD2-CD16+ |  | 57 | 56 | 56
CD2+CD16-CD3+CD56- |  | 57 | 56 | 56
CD2+CD16-CD56+CD3- |  | 14 | 44 | 31
CD2+CD16-CD56-CD3- | NK cells | 29 | 0 | 12
CD2+CD16- |  | 71 | 67 | 69
CD2+CD16+CD3+CD56- |  | 86 | 22 | 50
CD2+CD16+CD56+CD3- |  | 57 | 56 | 56
CD2+CD16+CD56-CD3- |  | 57 | 56 | 56
CD2+CD16+ |  | 57 | 11 | 31
45RA+CD3-CD4dim |  | 86 | 71 | 79
45RA+CD3- |  | 71 | 86 | 79
45RA+CD3+CD4low |  | 86 | 43 | 64
45RA+CD3+CD4- |  | 57 | 86 | 71
45RA+CD3+CD4+ |  | 57 | 57 | 57
45RA+CD3+ |  | 71 | 86 | 79
45RO+CD3-CD4dim |  | 86 | 86 | 86
45RO+CD3- | rest/act T | 86 | 57 | 71
45RO+CD3+CD4low | helper | 57 | 43 | 50
45RO+CD3+CD4- |  | 57 | 86 | 71
45RO+CD3+CD4+ |  | 57 | 57 | 57
45RO+CD3+ |  | 57 | 43 | 50
CD3- |  | 86 | 71 | 79
CD3+CD4- |  | 57 | 71 | 64
CD3+CD4+ |  | 57 | 57 | 57
CD3+ |  | 71 | 71 | 71
CD4dim |  | 86 | 71 | 79
CD3-CD4- | rest/act T suppressor | 43 | 57 | 50
45RA+CD3-CD8 |  | 43 | 29 | 36
45RA+CD3- |  | 86 | 86 | 86
45RA+CD3+CD8low |  | 43 | 29 | 36
45RA+CD3+CD8- |  | 71 | 57 | 64
45RA+CD3+CD8+ |  | 57 | 86 | 71
45RA+CD3+ |  | 71 | 86 | 79
45RO+CD3- |  | 71 | 43 | 57
45RO+CD3+CD8low |  | 29 | 43 | 36
45RO+CD3+CD8- |  | 71 | 57 | 64
45RO+CD3+CD8+ | rest/act T | 57 | 86 | 71
45RO+CD3+ | suppressor | 57 | 71 | 64
CD3- |  | 71 | 86 | 79
CD3+CD8- |  | 71 | 57 | 64
CD3+CD8+ |  | 57 | 100 | 79
CD3+ |  | 71 | 71 | 71
CD8+CD3- |  | 43 | 29 | 36
CD3-CD8- |  | 71 | 86 | 79
CD3+CD4+CD8β+CD8+ |  | 43 | 44 | 44
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 57 | 44 | 50
CD3-CD4lowCD8βlow |  | 57 | 56 | 56
CD3-CD8βdimCD8- | T cells | 57 | 67 | 62
CD3-CD8+CD8β- |  | 14 | 22 | 19
CD3- |  | 71 | 78 | 75
CD3+CD4+CD8β- |  | 71 | 56 | 62
CD3+CD4+CD8β+ |  | 14 | 44 | 31
CD3+CD8βdimCD8- |  | 57 | 67 | 62
CD3+CD8β+CD4- |  | 43 | 78 | 62
CD3+CD8β+CD8low | T cells | 100 | 22 | 56
CD3+CD8β+CD8+ |  | 43 | 89 | 69
CD3+CD8+CD8β- |  | 43 | 56 | 50
CD3+ |  | 57 | 56 | 56
CD3CD5+ |  | 43 | 33 | 38
CD3- |  | 71 | 67 | 69
CD3CD5TCRab+TCRgd- |  | 29 | 44 | 38
CD3CD5TCRab+ |  | 86 | 67 | 75
CD3CD5TCRab+TCRgd+ |  | 43 | 67 | 56
CD3-CD5TCRab+TCRgd- |  | 86 | 67 | 75
CD3-TCR+CD5+ | TCR | 71 | 67 | 69
CD3+ |  | 57 | 56 | 56
CD3+CD5TCRab+ |  | 86 | 0 | 38
CD3+CD5TCRab+TCRgd+ |  | 57 | 33 | 44
CD3+CD5TCRab+TCRgd- |  | 86 | 11 | 44
CD3+CD5+TCRab+ |  | 29 | 62 | 47
CD3+CD5+TCRab+TCRgd+ |  | 86 | 50 | 67
CD3+CD5+TCRgd+ |  | 57 | 75 | 67

Table J.3 Validation results for qualified subsets of immune cells in proportion to PBMC (%) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 0 and 21 days from aGvHD diagnosis.

Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 33 | 44 | 40
CD3-CD44+CD25+CD69+ |  | 83 | 43 | 62
CD3-CD44+CD25+ |  | 83 | 43 | 62
CD3-CD44-CD25- |  | 50 | 44 | 47
CD3- |  | 50 | 44 | 47
CD3+CD44-CD25+ | 1Activation | 67 | 22 | 40
CD3+CD44+CD25- |  | 83 | 44 | 60
CD3+CD44+CD25+CD69+ |  | 33 | 56 | 47
CD3+CD44+CD25+ |  | 0 | 44 | 27
CD3+CD44-CD25- |  | 67 | 56 | 60
CD3+ |  | 50 | 44 | 47
CD3-CD4dim |  | 17 | 33 | 27
CD3-CD8low |  | 17 | 33 | 27
CD3-CD4-CD8- |  | 33 | 56 | 47
CD3- |  | 17 | 44 | 33
CD3+CD4br | 2Activation | 17 | 22 | 20
CD3+CD4int |  | 83 | 89 | 87
CD3+CD8br |  | 17 | 67 | 47
CD3+CD8dim |  | 67 | 33 | 47
CD3+ |  | 0 | 44 | 27
CD3-CD4dim | 3Activation | 17 | 22 | 20
CD3-CD8low |  | 33 | 44 | 40
CD3-CD4-CD8- |  | 33 | 22 | 27
CD3- |  | 17 | 56 | 40
CD3+CD4br |  | 17 | 33 | 27
CD3+CD4int |  | 67 | 78 | 73
CD3+CD8br |  | 17 | 67 | 47
CD3+CD8dim |  | 50 | 22 | 33
CD3+ |  | 17 | 56 | 40
CD20+CD19+ | B cells | 33 | 78 | 60
CD22+CD20+ |  | 17 | 89 | 60
CD22+ |  | 33 | 56 | 47
CD33+CD45dimCD15lowCD14low | Myeloids | 50 | 78 | 67
CD33+CD45dimCD15+CD14- |  | 33 | 78 | 60
CD33+CD45dimCD15+CD14+ |  | 33 | 89 | 67
CD33+CD45dim |  | 33 | 89 | 67
CD33+CD45+CD15+CD14+ |  | 0 | 22 | 13
CD33+CD45+ |  | 0 | 22 | 13
CD45+CD33-CD15+CD14- |  | 33 | 44 | 40
CD45+CD33- |  | 17 | 44 | 33
CD2dimCD16+CD3+CD56- | NK cells | 83 | 44 | 60
CD2dimCD16+CD56+CD3- |  | 67 | 44 | 53
CD2dimCD16+CD56-CD3- |  | 33 | 44 | 40
CD2dimCD16+ |  | 33 | 56 | 47
CD2-CD16+CD3+CD56- |  | 83 | 22 | 47
CD2-CD16+CD56+CD3- |  | 33 | 33 | 33
CD2-CD16+CD56-CD3- |  | 33 | 33 | 33
CD2-CD16+ |  | 17 | 44 | 33
CD2+CD16-CD3+CD56- |  | 33 | 22 | 27
CD2+CD16-CD56+CD3- |  | 0 | 67 | 40
CD2+CD16-CD56-CD3- | NK cells | 67 | 56 | 60
CD2+CD16- |  | 50 | 67 | 60
CD2+CD16+CD3+CD56- |  | 50 | 67 | 60
CD2+CD16+CD56+CD3- |  | 50 | 22 | 33
CD2+CD16+CD56-CD3- |  | 50 | 56 | 53
CD2+CD16+ |  | 50 | 56 | 53
45RA+CD3-CD4dim |  | 33 | 14 | 23
45RA+CD3- |  | 50 | 43 | 46
45RA+CD3+CD4low |  | 17 | 43 | 31
45RA+CD3+CD4- |  | 33 | 57 | 46
45RA+CD3+CD4+ |  | 0 | 43 | 23
45RA+CD3+ |  | 33 | 57 | 46
45RO+CD3-CD4dim |  | 33 | 14 | 23
45RO+CD3- | rest/act T | 33 | 43 | 38
45RO+CD3+CD4low | helper | 0 | 43 | 23
45RO+CD3+CD4- |  | 0 | 57 | 31
45RO+CD3+CD4+ |  | 33 | 14 | 23
45RO+CD3+ |  | 50 | 43 | 46
CD3- |  | 50 | 43 | 46
CD3+CD4- |  | 50 | 71 | 62
CD3+CD4+ |  | 17 | 57 | 38
CD3+ |  | 50 | 43 | 46
CD4dim | rest/act T | 17 | 14 | 15
CD3-CD4- | helper | 17 | 43 | 31
45RA+CD3-CD8 |  | 33 | 57 | 46
45RA+CD3- |  | 33 | 43 | 38
45RA+CD3+CD8low |  | 83 | 86 | 85
45RA+CD3+CD8- |  | 17 | 14 | 15
45RA+CD3+CD8+ |  | 50 | 71 | 62
45RA+CD3+ |  | 50 | 86 | 69
45RO+CD3- |  | 17 | 43 | 31
45RO+CD3+CD8low | rest/act T suppressor | 83 | 57 | 69
45RO+CD3+CD8- |  | 50 | 29 | 38
45RO+CD3+CD8+ |  | 50 | 86 | 69
45RO+CD3+ |  | 67 | 57 | 62
CD3- |  | 67 | 43 | 54
CD3+CD8- |  | 33 | 43 | 38
CD3+CD8+ |  | 33 | 71 | 54
CD3+ |  | 67 | 43 | 54
CD8+CD3- |  | 33 | 43 | 38
CD3-CD8- |  | 67 | 43 | 54
CD3+CD4+CD8β+CD8+ |  | 0 | 33 | 20
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 0 | 11 | 7
CD3-CD4lowCD8βlow |  | 50 | 44 | 47
CD3-CD8βdimCD8- | T cells | 33 | 56 | 47
CD3-CD8+CD8β- |  | 17 | 33 | 27
CD3- |  | 33 | 44 | 40
CD3+CD4+CD8β- |  | 50 | 44 | 47
CD3+CD4+CD8β+ |  | 0 | 44 | 27
CD3+CD8βdimCD8- |  | 17 | 56 | 40
CD3+CD8β+CD4- |  | 17 | 56 | 40
CD3+CD8β+CD8low | T cells | 100 | 22 | 53
CD3+CD8β+CD8+ |  | 17 | 56 | 40
CD3+CD8+CD8β- |  | 67 | 11 | 33
CD3+ |  | 33 | 44 | 40
CD3CD5+ |  | 33 | 33 | 33
CD3- |  | 0 | 56 | 33
CD3CD5TCRab+TCRgd- |  | 50 | 67 | 60
CD3CD5TCRab+ |  | 33 | 56 | 47
CD3CD5TCRab+TCRgd+ |  | 100 | 33 | 60
CD3CD5TCRab+TCRgd- |  | 17 | 67 | 47
CD3TCR+CD5+ |  | 100 | 33 | 67
CD3+ |  | 0 | 56 | 33
CD3+CD5TCRab+ |  | 83 | 67 | 73
CD3+CD5TCRab+TCRgd+ |  | 67 | 56 | 60
CD3+CD5TCRab+TCRgd- |  | 83 | 44 | 60
CD3+CD5+TCRab+ |  | 50 | 75 | 64
CD3+CD5+TCRab+TCRgd+ |  | 50 | 62 | 57
CD3+CD5+TCRgd+ |  | 17 | 50 | 36

Table J.4 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken from 7 to 21 days post-transplant.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 29 | 67 | 50
CD3-CD44+CD25+CD69+ |  | 71 | 29 | 50
CD3-CD44+CD25+ |  | 71 | 14 | 43
CD3-CD44-CD25- |  | 86 | 33 | 56
CD3- |  | 57 | 22 | 38
CD3+CD44-CD25+ | 1Activation | 57 | 33 | 44
CD3+CD44+CD25- |  | 57 | 44 | 50
CD3+CD44+CD25+CD69+ |  | 86 | 33 | 56
CD3+CD44+CD25+ |  | 57 | 33 | 44
CD3+CD44-CD25- |  | 86 | 33 | 56
CD3+ |  | 71 | 22 | 44
CD3-CD4dim |  | 57 | 44 | 50
CD3-CD8low |  | 57 | 11 | 31
CD3-CD4-CD8- |  | 86 | 22 | 50
CD3- |  | 57 | 33 | 44
CD3+CD4br | 2Activation | 43 | 22 | 31
CD3+CD4int |  | 86 | 44 | 62
CD3+CD8br |  | 71 | 22 | 44
CD3+CD8dim |  | 71 | 33 | 50
CD3+ |  | 71 | 22 | 44
CD3-CD4dim | 3Activation | 43 | 44 | 44
CD3-CD8low |  | 57 | 33 | 44
CD3-CD4-CD8- |  | 86 | 22 | 50
CD3- |  | 57 | 33 | 44
CD3+CD4br |  | 43 | 22 | 31
CD3+CD4int |  | 86 | 44 | 62
CD3+CD8br |  | 71 | 33 | 50
CD3+CD8dim |  | 86 | 33 | 56
CD3+ |  | 71 | 33 | 50
CD20+CD19+ | B cells | 57 | 22 | 38
CD22+CD20+ |  | 86 | 33 | 56
CD22+ |  | 71 | 22 | 44
CD33+CD45dimCD15lowCD14low | Myeloids | 29 | 78 | 56
CD33+CD45dimCD15+CD14- |  | 29 | 0 | 12
CD33+CD45dimCD15+CD14+ |  | 57 | 33 | 44
CD33+CD45dim |  | 71 | 0 | 31
CD33+CD45+CD15+CD14+ |  | 29 | 56 | 44
CD33+CD45+ |  | 29 | 56 | 44
CD45+CD33-CD15+CD14- |  | 57 | 44 | 50
CD45+CD33- |  | 57 | 22 | 38
CD2dimCD16+CD3+CD56- | NK cells | 86 | 56 | 69
CD2dimCD16+CD56+CD3- |  | 57 | 33 | 44
CD2dimCD16+CD56-CD3- |  | 29 | 33 | 31
CD2dimCD16+ |  | 43 | 33 | 38
CD2-CD16+CD3+CD56- |  | 100 | 67 | 81
CD2-CD16+CD56+CD3- |  | 71 | 44 | 56
CD2-CD16+CD56-CD3- |  | 71 | 22 | 44
CD2-CD16+ |  | 71 | 22 | 44
CD2+CD16-CD3+CD56- |  | 43 | 0 | 19
CD2+CD16-CD56+CD3- |  | 57 | 22 | 38
CD2+CD16-CD56-CD3- | NK cells | 57 | 33 | 44
CD2+CD16- |  | 57 | 0 | 25
CD2+CD16+CD3+CD56- |  | 71 | 33 | 50
CD2+CD16+CD56+CD3- |  | 57 | 22 | 38
CD2+CD16+CD56-CD3- |  | 57 | 33 | 44
CD2+CD16+ |  | 71 | 44 | 56
45RA+CD3-CD4dim |  | 57 | 43 | 50
45RA+CD3- |  | 57 | 43 | 50
45RA+CD3+CD4low |  | 57 | 71 | 64
45RA+CD3+CD4- |  | 71 | 57 | 64
45RA+CD3+CD4+ |  | 43 | 71 | 57
45RA+CD3+ |  | 57 | 71 | 64
45RO+CD3-CD4dim |  | 71 | 29 | 50
45RO+CD3- | rest/act T | 86 | 29 | 57
45RO+CD3+CD4low | helper | 71 | 57 | 64
45RO+CD3+CD4- |  | 86 | 57 | 71
45RO+CD3+CD4+ |  | 71 | 43 | 57
45RO+CD3+ |  | 86 | 43 | 64
CD3- |  | 71 | 29 | 50
CD3+CD4- |  | 100 | 43 | 71
CD3+CD4+ |  | 71 | 43 | 57
CD3+ |  | 71 | 43 | 57
CD4dim | rest/act T | 71 | 29 | 50
CD3-CD4- | helper | 71 | 43 | 57
45RA+CD3-CD8 |  | 57 | 29 | 43
45RA+CD3- |  | 57 | 29 | 43
45RA+CD3+CD8low |  | 86 | 71 | 79
45RA+CD3+CD8- |  | 57 | 71 | 64
45RA+CD3+CD8+ |  | 57 | 43 | 50
45RA+CD3+ |  | 43 | 71 | 57
45RO+CD3- |  | 86 | 43 | 64
45RO+CD3+CD8low | rest/act T suppressor | 86 | 57 | 71
45RO+CD3+CD8- |  | 100 | 43 | 71
45RO+CD3+CD8+ |  | 86 | 43 | 64
45RO+CD3+ |  | 100 | 43 | 71
CD3- |  | 71 | 29 | 50
CD3+CD8- |  | 86 | 43 | 64
CD3+CD8+ |  | 57 | 14 | 36
CD3+ |  | 86 | 43 | 64
CD8+CD3- |  | 57 | 29 | 43
CD3-CD8- |  | 71 | 29 | 50
CD3+CD4+CD8β+CD8+ |  | 71 | 33 | 50
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 57 | 22 | 38
CD3-CD4lowCD8βlow |  | 86 | 33 | 56
CD3-CD8βdimCD8- | T cells | 43 | 56 | 50
CD3-CD8+CD8β- |  | 57 | 11 | 31
CD3- |  | 57 | 33 | 44
CD3+CD4+CD8β- |  | 71 | 33 | 50
CD3+CD4+CD8β+ |  | 71 | 44 | 56
CD3+CD8βdimCD8- |  | 71 | 44 | 56
CD3+CD8β+CD4- |  | 57 | 22 | 38
CD3+CD8β+CD8low | T cells | 86 | 44 | 62
CD3+CD8β+CD8+ |  | 71 | 33 | 50
CD3+CD8+CD8β- |  | 71 | 33 | 50
CD3+ |  | 71 | 33 | 50
CD3CD5+ |  | 71 | 33 | 54
CD3- |  | 57 | 33 | 44
CD3CD5TCRab+TCRgd- |  | 57 | 11 | 31
CD3CD5TCRab+ |  | 71 | 33 | 50
CD3CD5TCRab+TCRgd+ |  | 57 | 22 | 38
CD3CD5TCRab+TCRgd- |  | 71 | 33 | 50
CD3TCR+CD5+ |  | 100 | 67 | 85
CD3+ |  | 57 | 11 | 31
CD3+CD5TCRab+ |  | 100 | 33 | 62
CD3+CD5-TCRab+TCRgd+ |  | 71 | 44 | 56
CD3+CD5TCRab+TCRgd- |  | 100 | 11 | 50
CD3+CD5+TCRab+ |  | 57 | 12 | 33
CD3+CD5+TCRab+TCRgd+ |  | 71 | 38 | 53
CD3+CD5+TCRgd+ |  | 71 | 0 | 33

Table J.5 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 21 and 0 days prior to aGvHD diagnosis.

Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 71 | 78 | 75
CD3-CD44+CD25+CD69+ |  | 86 | 29 | 57
CD3-CD44+CD25+ |  | 100 | 29 | 64
CD3-CD44-CD25- |  | 57 | 22 | 38
CD3- |  | 57 | 33 | 44
CD3+CD44-CD25+ | 1Activation | 57 | 22 | 38
CD3+CD44+CD25- |  | 14 | 11 | 12
CD3+CD44+CD25+CD69+ |  | 14 | 56 | 38
CD3+CD44+CD25+ |  | 29 | 44 | 38
CD3+CD44-CD25- |  | 43 | 56 | 50
CD3+ |  | 0 | 33 | 19
CD3-CD4dim |  | 71 | 78 | 75
CD3-CD8low |  | 57 | 33 | 44
CD3-CD4-CD8- |  | 29 | 22 | 25
CD3- |  | 71 | 33 | 50
CD3+CD4br | 2Activation | 14 | 44 | 31
CD3+CD4int |  | 86 | 33 | 56
CD3+CD8br |  | 29 | 67 | 50
CD3+CD8dim |  | 57 | 0 | 25
CD3+ |  | 43 | 56 | 50
CD3-CD4dim | 3Activation | 71 | 78 | 75
CD3-CD8low |  | 57 | 33 | 44
CD3-CD4-CD8- |  | 29 | 22 | 25
CD3- |  | 71 | 33 | 50
CD3+CD4br |  | 14 | 56 | 38
CD3+CD4int |  | 71 | 56 | 62
CD3+CD8br |  | 29 | 67 | 50
CD3+CD8dim |  | 57 | 22 | 38
CD3+ |  | 43 | 67 | 56
CD20+CD19+ | B cells | 71 | 11 | 38
CD22+CD20+ |  | 57 | 100 | 81
CD22+ |  | 71 | 11 | 38
CD33+CD45dimCD15lowCD14low | Myeloids | 86 | 22 | 50
CD33+CD45dimCD15+CD14- |  | 57 | 33 | 44
CD33+CD45dimCD15+CD14+ |  | 86 | 11 | 44
CD33+CD45dim |  | 86 | 11 | 44
CD33+CD45+CD15+CD14+ |  | 43 | 56 | 50
CD33+CD45+ |  | 57 | 67 | 62
CD45+CD33-CD15+CD14- |  | 29 | 67 | 50
CD45+CD33- |  | 0 | 33 | 19
CD2dimCD16+CD3+CD56- | NK cells | 86 | 11 | 44
CD2dimCD16+CD56+CD3- |  | 43 | 44 | 44
CD2dimCD16+CD56-CD3- |  | 71 | 44 | 56
CD2dimCD16+ |  | 71 | 44 | 56
CD2-CD16+CD3+CD56- |  | 86 | 11 | 44
CD2-CD16+CD56+CD3- |  | 86 | 33 | 56
CD2-CD16+CD56-CD3- |  | 86 | 22 | 50
CD2-CD16+ |  | 86 | 22 | 50
CD2+CD16-CD3+CD56- |  | 29 | 78 | 56
CD2+CD16-CD56+CD3- |  | 71 | 67 | 69
CD2+CD16-CD56-CD3- | NK cells | 57 | 22 | 38
CD2+CD16- |  | 14 | 44 | 31
CD2+CD16+CD3+CD56- |  | 29 | 44 | 38
CD2+CD16+CD56+CD3- |  | 71 | 33 | 50
CD2+CD16+CD56-CD3- |  | 57 | 56 | 56
CD2+CD16+ |  | 43 | 11 | 25
45RA+CD3-CD4dim |  | 86 | 71 | 79
45RA+CD3- |  | 86 | 14 | 50
45RA+CD3+CD4low |  | 86 | 29 | 57
45RA+CD3+CD4- |  | 43 | 71 | 57
45RA+CD3+CD4+ |  | 14 | 0 | 7
45RA+CD3+ |  | 14 | 29 | 21
45RO+CD3-CD4dim |  | 71 | 57 | 64
45RO+CD3- | rest/act T | 71 | 57 | 64
45RO+CD3+CD4low | helper | 86 | 14 | 50
45RO+CD3+CD4- |  | 43 | 57 | 50
45RO+CD3+CD4+ |  | 43 | 43 | 43
45RO+CD3+ |  | 0 | 0 | 0
CD3- |  | 71 | 43 | 57
CD3+CD4- |  | 14 | 57 | 36
CD3+CD4+ |  | 43 | 57 | 50
CD3+ |  | 14 | 57 | 36
CD4dim | rest/act T | 71 | 29 | 50
CD3-CD4- | helper | 43 | 29 | 36
45RA+CD3-CD8 |  | 86 | 29 | 57
45RA+CD3- |  | 86 | 14 | 50
45RA+CD3+CD8low |  | 71 | 14 | 43
45RA+CD3+CD8- |  | 14 | 14 | 14
45RA+CD3+CD8+ |  | 57 | 43 | 50
45RA+CD3+ |  | 43 | 43 | 43
45RO+CD3- |  | 100 | 43 | 71
45RO+CD3+CD8low | rest/act T suppressor | 71 | 29 | 50
45RO+CD3+CD8- |  | 71 | 29 | 50
45RO+CD3+CD8+ |  | 29 | 57 | 43
45RO+CD3+ |  | 29 | 14 | 21
CD3- |  | 71 | 43 | 57
CD3+CD8- |  | 71 | 43 | 57
CD3+CD8+ |  | 29 | 57 | 43
CD3+ |  | 14 | 14 | 14
CD8+CD3- |  | 86 | 43 | 64
CD3-CD8- |  | 71 | 43 | 57
CD3+CD4+CD8β+CD8+ |  | 29 | 56 | 44
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 29 | 56 | 44
CD3-CD4lowCD8βlow |  | 100 | 44 | 69
CD3-CD8βdimCD8- | T cells | 86 | 56 | 69
CD3-CD8+CD8β- |  | 57 | 44 | 50
CD3- |  | 71 | 44 | 56
CD3+CD4+CD8β- |  | 29 | 44 | 38
CD3+CD4+CD8β+ |  | 14 | 44 | 31
CD3+CD8βdimCD8- |  | 29 | 44 | 38
CD3+CD8β+CD4- |  | 29 | 67 | 50
CD3+CD8β+CD8low | T cells | 71 | 44 | 56
CD3+CD8β+CD8+ |  | 29 | 67 | 50
CD3+CD8+CD8β- |  | 43 | 56 | 50
CD3+ |  | 29 | 67 | 50
CD3CD5+ |  | 14 | 50 | 31
CD3- |  | 71 | 44 | 56
CD3CD5TCRab+TCRgd- |  | 43 | 67 | 56
CD3CD5TCRab+ |  | 100 | 33 | 62
CD3CD5TCRab+TCRgd+ |  | 86 | 33 | 56
CD3CD5TCRab+TCRgd- |  | 100 | 33 | 62
CD3TCR+CD5+ | TCR | 100 | 17 | 62
CD3+ |  | 29 | 56 | 44
CD3+CD5TCRab+ |  | 86 | 22 | 50
CD3+CD5TCRab+TCRgd+ |  | 86 | 33 | 56
CD3+CD5TCRab+TCRgd- |  | 71 | 11 | 38
CD3+CD5+TCRab+ |  | 29 | 50 | 40
CD3+CD5+TCRab+TCRgd+ |  | 43 | 50 | 47
CD3+CD5+TCRgd+ |  | 43 | 0 | 20

Table J.6 Validation results for qualified subsets of immune cells in concentration (mm³) from the FLDA classification between aGvHD & cGvHD and aGvHD only patients using samples taken between 0 and 21 days from aGvHD diagnosis.
Immune cells | Aliquot | Sensitivity (%) | Specificity (%) | Accuracy (%)
CD3-CD44+CD25- |  | 50 | 11 | 27
CD3-CD44+CD25+CD69+ |  | 100 | 29 | 62
CD3-CD44+CD25+ |  | 100 | 29 | 62
CD3-CD44-CD25- |  | 50 | 78 | 67
CD3- |  | 83 | 33 | 53
CD3+CD44-CD25+ | 1Activation | 33 | 11 | 20
CD3+CD44+CD25- |  | 67 | 44 | 53
CD3+CD44+CD25+CD69+ |  | 100 | 33 | 60
CD3+CD44+CD25+ |  | 100 | 33 | 60
CD3+CD44-CD25- |  | 83 | 44 | 60
CD3+ |  | 67 | 44 | 53
CD3-CD4dim |  | 50 | 22 | 33
CD3-CD8low |  | 50 | 67 | 60
CD3-CD4-CD8- |  | 100 | 22 | 53
CD3- |  | 100 | 33 | 60
CD3+CD4br | 2Activation | 17 | 11 | 13
CD3+CD4int |  | 100 | 22 | 53
CD3+CD8br |  | 67 | 78 | 73
CD3+CD8dim |  | 83 | 33 | 53
CD3+ |  | 67 | 44 | 53
CD3-CD4dim | 3Activation | 33 | 22 | 27
CD3-CD8low |  | 67 | 56 | 60
CD3-CD4-CD8- |  | 83 | 22 | 47
CD3- |  | 100 | 44 | 67
CD3+CD4br |  | 17 | 22 | 20
CD3+CD4int |  | 100 | 22 | 53
CD3+CD8br |  | 67 | 78 | 73
CD3+CD8dim |  | 83 | 33 | 53
CD3+ |  | 67 | 56 | 60
CD20+CD19+ | B cells | 0 | 56 | 33
CD22+CD20+ |  | 17 | 67 | 47
CD22+ |  | 83 | 44 | 60
CD33+CD45dimCD15lowCD14low | Myeloids | 50 | 89 | 73
CD33+CD45dimCD15+CD14- |  | 50 | 67 | 60
CD33+CD45dimCD15+CD14+ |  | 50 | 89 | 73
CD33+CD45dim |  | 50 | 78 | 67
CD33+CD45+CD15+CD14+ |  | 67 | 44 | 53
CD33+CD45+ |  | 33 | 33 | 33
CD45+CD33-CD15+CD14- |  | 83 | 89 | 87
CD45+CD33- |  | 83 | 78 | 80
CD2dimCD16+CD3+CD56- | NK cells | 100 | 44 | 67
CD2dimCD16+CD56+CD3- |  | 67 | 33 | 47
CD2dimCD16+CD56-CD3- |  | 83 | 11 | 40
CD2dimCD16+ |  | 83 | 22 | 47
CD2-CD16+CD3+CD56- |  | 100 | 44 | 67
CD2-CD16+CD56+CD3- |  | 50 | 33 | 40
CD2-CD16+CD56-CD3- |  | 83 | 33 | 53
CD2-CD16+ |  | 83 | 44 | 60
CD2+CD16-CD3+CD56- |  | 50 | 56 | 53
CD2+CD16-CD56+CD3- |  | 17 | 78 | 53
CD2+CD16-CD56-CD3- | NK cells | 83 | 33 | 53
CD2+CD16- |  | 67 | 67 | 67
CD2+CD16+CD3+CD56- |  | 33 | 56 | 47
CD2+CD16+CD56+CD3- |  | 33 | 44 | 40
CD2+CD16+CD56-CD3- |  | 50 | 78 | 67
CD2+CD16+ |  | 67 | 44 | 53
45RA+CD3-CD4dim |  | 83 | 14 | 46
45RA+CD3- |  | 100 | 43 | 69
45RA+CD3+CD4low |  | 83 | 43 | 62
45RA+CD3+CD4- |  | 67 | 71 | 69
45RA+CD3+CD4+ |  | 0 | 43 | 23
45RA+CD3+ |  | 50 | 43 | 46
45RO+CD3-CD4dim |  | 67 | 29 | 46
45RO+CD3- | rest/act T | 50 | 14 | 31
45RO+CD3+CD4low | helper | 50 | 29 | 38
45RO+CD3+CD4- |  | 50 | 86 | 69
45RO+CD3+CD4+ |  | 50 | 43 | 46
45RO+CD3+ |  | 67 | 57 | 62
CD3- |  | 100 | 43 | 69
CD3+CD4- |  | 83 | 71 | 77
CD3+CD4+ |  | 17 | 29 | 23
CD3+ |  | 83 | 57 | 69
CD4dim | rest/act T | 67 | 14 | 38
CD3-CD4- | helper | 100 | 57 | 77
45RA+CD3-CD8 |  | 50 | 71 | 62
45RA+CD3- |  | 100 | 43 | 69
45RA+CD3+CD8low |  | 67 | 57 | 62
45RA+CD3+CD8- |  | 33 | 43 | 38
45RA+CD3+CD8+ |  | 83 | 71 | 77
45RA+CD3+ |  | 83 | 57 | 69
45RO+CD3- |  | 67 | 29 | 46
45RO+CD3+CD8low | rest/act T suppressor | 100 | 57 | 77
45RO+CD3+CD8- |  | 67 | 29 | 46
45RO+CD3+CD8+ |  | 67 | 100 | 85
45RO+CD3+ |  | 67 | 57 | 62
CD3- |  | 100 | 43 | 69
CD3+CD8- |  | 33 | 14 | 23
CD3+CD8+ |  | 83 | 86 | 85
CD3+ |  | 67 | 57 | 62
CD8+CD3- |  | 67 | 71 | 69
CD3-CD8- |  | 83 | 29 | 54
CD3+CD4+CD8β+CD8+ |  | 33 | 67 | 53
CD3+CD4+CD8β+CD8+ (proportion of CD3+ cells) |  | 50 | 44 | 47
CD3-CD4lowCD8βlow |  | 83 | 44 | 60
CD3-CD8βdimCD8- | T cells | 67 | 22 | 40
CD3-CD8+CD8β- |  | 50 | 56 | 53
CD3- |  | 100 | 44 | 67
CD3+CD4+CD8β- |  | 17 | 11 | 13
CD3+CD4+CD8β+ |  | 33 | 33 | 33
CD3+CD8βdimCD8- |  | 33 | 44 | 40
CD3+CD8β+CD4- |  | 50 | 78 | 67
CD3+CD8β+CD8low | T cells | 100 | 11 | 47
CD3+CD8β+CD8+ |  | 50 | 78 |
67 C D 3 + C D 8 + C D 8 p - 83 11 40 C D 3 + 33 67 53 C D 3 C D 5 + 50 67 58 C D 3 - 100 44 67 C D 3 C D 5 T C R a b + T C R g d - 83 67 73 C D 3 C D 5 T C R a b + 67 22 40 C D 3 C D 5 T C R a b + T C R g d + 83 44 60 C D 3 C D 5 T C R a b + T C R g d - 67 11 33 C D 3 T C R + C D 5 + T C R 100 50 75 C D 3 + 67 67 67 C D 3 + C D 5 T C R a b + 100 33 60 C D 3 + C D 5 T C R a b + T C R g d + 100 56 73 C D 3 + C D 5 - T C R a b + T C R g d - 83 11 40 C D 3 + C D 5 + T C R a b + 100 50 71 C D 3 + C D 5 + T C R a b + T C R g d + 50 75 64 C D 3 + C D 5 + T C R g d + 50 75 64 N3 A p p e n d i x K . F L D A c lass i f ica t ion m o d e l for the onset of a G v H D The F L D A classifier bu i l t u s ing i m m u n e cells C D 3 + C D 4 + C D 8 p + and samples taken between 7 and 21 days post-transplant, had the highest sensit ivi ty (86%) and specificity (100%) a m o n g the consistent classifiers. The u n k n o w n parameters i n the s ignal p lus noise m o d e l (Equat ion 1.1) were estimated u s ing the t ra in ing dataset v i a the E M algor i thm. The t ra in ing dataset is consists of observed values YJJ i nc luded 21 a G v H D and 3 n o n - G v H D patients w i t h samples taken between 7 and 21 days post-transplant. L inear B-splines w i t h week ly knot placement were used to m o d e l the observed data. A t the end, the observed values were d i v i d e d into different elements: -6 .6980 1. XQ=- 2.4241 for each knot; -0 .6519 1.7458 -2.5267 2. Class signals A a , , A = - 0.4414 for each knot and a, - for each class 5 ' 2.5267 0.9158 3. A B-spline matr ix denot ing these first three parameters for each j (columns) representing each knot (j = 7,14, 21) and each / (rows) representing each t ime uni t (j = 7, 8, 9 , . . . 21) (values were r o u n d e d to two dec imal place). 234 -0 .15 0.42 -0 .44 -0 .18 0.36 -0 .33 -0.21 0.3 -0 .21 -0 .24 0.24 -0 .1 -0 .27 0.18 0.02 - 0 . 
3 0.12 0.13 -0 .33 0.06 0.25 -0 .36 0 0.36 -0 .33 -0 .06 0.25 -0 .30 -0 .12 0.13 -0 .27 -0 .18 0.02 -0 .24 -0 .24 -0 .1 -0 .21 -0 .3 -0 .21 -0 .18 -0 .36 -0 .33 -0 .15 -0 .42 -0 .44 4.. For the test data p i w i t h samples taken at 7,14, and 21 days post-transplant: -0 .15 0.42 -0 .44 Sx=-036 0 0.36 -0 .15 - 0 . 4 2 -0 .44 We igh t values can be de termined us ing the estimated parameters f rom the F L D A classifier v i a Equa t ion 1.3. weigh = -1.0823 0.0123 - 0 . 1 7 6 7 , for each sampled t ime point . 0.2718 G l o b a l base values SXXQ = 2.2034 2.3000 5. Class i f icat ion of p i can be made us ing Equa t ion 1.4. If the l inear d iscr iminant value is negative, n e w data w i l l be classified into the a G v H D patient g roup and vice versa for n o n - G v H D patient group. 235 0.92 For example, for a n e w patient w i t h values X = 2.77 f rom samples taken at 7,14, 3.63 and 21 days post-transplant, the l inear d iscr iminant value ax = weight • (X - SXA,0) calculated to -0.9. The n e w patient is classified into the a G v H D class (a x ~< 0). 236 A p p e n d i x L . F L D A c lass i f ica t ion m o d e l for the onset of c G v H D The F L D A classifier bu i l t u s ing i m m u n e cells 4 5 R O + C D 3 " C D 4 d i m i n p ropor t ion to P B M C and samples taken between 21 and 0 days p r io r to a G v H D diagnosis, had the highest estimated 86% sensi t ivi ty and 86% specificity (Table 4.1), exc lud ing the inconsistent classifiers. The u n k n o w n parameters i n the s ignal p lus noise m o d e l (Equat ion 1.1) were estimated us ing the t ra in ing dataset v i a the E M algor i thm. The t ra in ing dataset is consists of observed values Y0 i nc luded 7 a G v H D & c G v H D and 7 a G v H D on ly patients w i t h samples taken between 21 and 0 p r io r to a G v H D diagnosis. L inear B -splines w i t h week ly knot placement were used to m o d e l the observed data. 
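To make "linear B-splines with weekly knot placement" concrete, the following Python sketch evaluates degree-1 (hat-shaped) B-spline basis functions on a daily grid with knots at -21, -14, -7, and 0 days. The names `hat_basis` and `spline` are hypothetical, and the coefficients are simply the values that the first column of this appendix's spline matrix takes at the four knots; this is an illustration under those assumptions, not the implementation used in the thesis.

```python
def hat_basis(t, knots):
    """Evaluate every linear B-spline (hat) basis function at time t."""
    values = []
    for k, knot in enumerate(knots):
        left = knots[k - 1] if k > 0 else None
        right = knots[k + 1] if k < len(knots) - 1 else None
        if t == knot:
            v = 1.0                                  # peak of the hat
        elif left is not None and left < t < knot:
            v = (t - left) / (knot - left)           # rising edge
        elif right is not None and knot < t < right:
            v = (right - t) / (right - knot)         # falling edge
        else:
            v = 0.0                                  # outside the support
        values.append(v)
    return values

knots = [-21, -14, -7, 0]          # weekly knots, days relative to diagnosis
days = list(range(-21, 1))         # daily grid: 22 time units
basis = [hat_basis(t, knots) for t in days]   # 22 rows x 4 columns

def spline(coefs):
    """Linear spline on the daily grid for the given knot coefficients."""
    return [sum(b * c for b, c in zip(row, coefs)) for row in basis]

# Assumed knot values: first column of the appendix matrix at days -21, -14, -7, 0.
first_column = spline([-0.09, -0.27, -0.27, -0.09])
print([round(v, 2) for v in first_column[:8]])
```

Interpolating linearly between the knot values reproduces the intermediate entries of that column to within rounding, which is what one expects if every column of the matrix is itself a linear spline with weekly knots.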
In the end, the observed values were decomposed into the following elements:

1. Global base parameters, one value for each knot:

\[
\lambda_0 = \begin{pmatrix} -66.4930 \\ -10.1525 \\ -16.8377 \\ -13.1379 \end{pmatrix}
\]

2. Class signals $\Lambda\alpha_i$, with $\Lambda$ holding one value for each knot and $\alpha_i$ one value for each class:

\[
\Lambda = \begin{pmatrix} 0.1042 \\ -7.3447 \\ -4.2339 \\ 2.8252 \end{pmatrix}, \qquad
\alpha = \begin{pmatrix} -3.0568 \\ 3.0568 \end{pmatrix}
\]

3. The B-spline matrix on which the parameters above are defined, with each column $j$ representing a knot ($j = -21, -14, -7, 0$) and each row $i$ representing a time unit ($i = -21, -20, -19, \ldots, 0$); values are rounded to two decimal places:

\[
S = \begin{pmatrix}
-0.09 & 0.20 & -0.43 & 0.40 \\
-0.12 & 0.21 & -0.34 & 0.30 \\
-0.14 & 0.22 & -0.26 & 0.20 \\
-0.17 & 0.23 & -0.18 & 0.10 \\
-0.19 & 0.24 & -0.10 & 0.00 \\
-0.22 & 0.25 & -0.02 & -0.10 \\
-0.24 & 0.26 & 0.06 & -0.20 \\
-0.27 & 0.27 & 0.14 & -0.29 \\
-0.27 & 0.19 & 0.14 & -0.21 \\
-0.27 & 0.12 & 0.14 & -0.13 \\
-0.27 & 0.04 & 0.14 & -0.04 \\
-0.27 & -0.04 & 0.14 & 0.04 \\
-0.27 & -0.12 & 0.14 & 0.13 \\
-0.27 & -0.19 & 0.14 & 0.21 \\
-0.27 & -0.27 & 0.14 & 0.29 \\
-0.24 & -0.26 & 0.06 & 0.20 \\
-0.22 & -0.25 & -0.02 & 0.10 \\
-0.19 & -0.24 & -0.10 & 0.00 \\
-0.17 & -0.23 & -0.18 & -0.10 \\
-0.14 & -0.22 & -0.26 & -0.20 \\
-0.12 & -0.21 & -0.34 & -0.30 \\
-0.09 & -0.20 & -0.43 & -0.40
\end{pmatrix}
\]

4. For the test data p19, with samples taken at 21, 15, 7, and 0 days prior to aGvHD diagnosis:

\[
S_X = \begin{pmatrix}
-0.09 & 0.20 & -0.43 & 0.40 \\
-0.24 & 0.26 & 0.06 & -0.20 \\
-0.27 & -0.27 & 0.14 & 0.29 \\
-0.09 & -0.20 & -0.43 & -0.40
\end{pmatrix}
\]

Weight values specific to these sampled time points can be determined from the estimated parameters of the FLDA classifier via Equation 1.3, giving one weight for each sampled time point:

\[
\text{weight} = \begin{pmatrix} 0.0762 & -0.1436 & 0.1191 & 0.1091 \end{pmatrix}
\]

Global base values:

\[
S_X\lambda_0 = \begin{pmatrix} 5.8992 \\ 15.0097 \\ 14.2864 \\ 20.4889 \end{pmatrix}
\]

5. Classification of p19 can be made using Equation 1.4. If the linear discriminant value is negative, the new data are classified into the aGvHD & cGvHD patient group; otherwise, they are classified into the aGvHD-only patient group.
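As a rough consistency check on step 4, the global base values can be recomputed from the printed rows of the spline matrix at the sampled days and the printed global base parameters. The Python sketch below uses only the numbers given in this appendix; the variable names are hypothetical. Because the matrix entries are rounded to two decimals while the base parameters are as large as 66 in magnitude, the recomputed values are expected to agree with the reported ones only approximately.

```python
# Rows of the spline matrix at 21, 15, 7, and 0 days before diagnosis (step 4),
# as printed, i.e. rounded to two decimals.
S_x = [
    [-0.09,  0.20, -0.43,  0.40],
    [-0.24,  0.26,  0.06, -0.20],
    [-0.27, -0.27,  0.14,  0.29],
    [-0.09, -0.20, -0.43, -0.40],
]
lambda0 = [-66.4930, -10.1525, -16.8377, -13.1379]      # step 1
reported_base = [5.8992, 15.0097, 14.2864, 20.4889]     # step 4

# Matrix-vector product S_x * lambda0, one value per sampled time point.
base = [sum(s * l for s, l in zip(row, lambda0)) for row in S_x]

# Worst-case error from rounding each matrix entry by at most 0.005.
tolerance = 0.005 * sum(abs(l) for l in lambda0)

for got, expected in zip(base, reported_base):
    print(f"recomputed {got:8.4f}  reported {expected:8.4f}")
```

The differences stay inside the worst-case rounding bound computed in `tolerance`, which suggests the reported values were obtained from the unrounded matrix.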
For example, for a new patient with values

\[
X = \begin{pmatrix} 13.3 \\ 23.4 \\ 12.6 \\ 13.6 \end{pmatrix}
\]

from samples taken at 21, 15, 7, and 0 days prior to aGvHD diagnosis, the linear discriminant value $\alpha_X = \text{weight} \cdot (X - S_X\lambda_0)$ is calculated to be -1.59. The new patient is classified into the aGvHD & cGvHD group ($\alpha_X < 0$).
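The two worked examples in Appendices K and L can be reproduced with a few lines of Python. The sketch below takes the weights, global base values, and patient values exactly as printed; the function names `linear_discriminant` and `classify` are hypothetical, and assigning a discriminant of exactly zero to the second group is an arbitrary choice, since the text only covers the negative case and its opposite.

```python
def linear_discriminant(weight, x, base):
    """alpha_X = weight . (X - S_X * lambda_0), with S_X * lambda_0 given as base."""
    return sum(w * (xi - b) for w, xi, b in zip(weight, x, base))

def classify(alpha, negative_label, other_label):
    """Negative discriminant -> first group; otherwise the second group.
    The zero case is not specified in the text (assumption: second group)."""
    return negative_label if alpha < 0 else other_label

# Appendix K: samples at 7, 14, and 21 days post-transplant.
alpha_k = linear_discriminant(
    weight=[-1.0823, 0.0123, -0.1767],
    x=[0.92, 2.77, 3.63],
    base=[0.2718, 2.2034, 2.3000],
)
label_k = classify(alpha_k, "aGvHD", "non-GvHD")

# Appendix L: samples at 21, 15, 7, and 0 days prior to aGvHD diagnosis.
alpha_l = linear_discriminant(
    weight=[0.0762, -0.1436, 0.1191, 0.1091],
    x=[13.3, 23.4, 12.6, 13.6],
    base=[5.8992, 15.0097, 14.2864, 20.4889],
)
label_l = classify(alpha_l, "aGvHD & cGvHD", "aGvHD only")

print(round(alpha_k, 1), label_k)
print(round(alpha_l, 2), label_l)
```

Both discriminant values come out negative and round to the figures quoted in the examples, so the two hypothetical patients fall into the aGvHD and the aGvHD & cGvHD groups respectively.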
