{"http:\/\/dx.doi.org\/10.14288\/1.0051174":{"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool":[{"value":"Science, Faculty of","type":"literal","lang":"en"},{"value":"Computer Science, Department of","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider":[{"value":"DSpace","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeCampus":[{"value":"UBCV","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"value":"Hollander, Zsuzsanna","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/issued":[{"value":"2010-01-06T00:34:39Z","type":"literal","lang":"en"},{"value":"2006","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#relatedDegree":[{"value":"Master of Science - MSc","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeGrantor":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/description":[{"value":"Solid organ transplantation is a common procedure for end-stage organ failure. After the\r\ntransplantation, the rejection of the new organ is possible due to the patient's immune\r\nsystem trying to eliminate the foreign object. To prevent rejection, monthly painful and\r\ncostly procedure is needed which involves taking a biopsy of the allograft. The purpose\r\nof our project is to find biomarkers based on blood samples, so the diagnosis\/prognosis of\r\nthe rejection can occur based on a simple blood or urine sample. Up to eight blood\r\nsamples are taken from rejection and non-rejection patients and for each sample a\r\nmicroarray is created. The microarray data is longitudinal and contains 54,613 genes.\r\nThe analysis consists of normalization, pre-filtering, filtering, testing the candidate\r\nbiomarkers for diagnosis\/prediction, pathway analysis, and biomarker validation. For our\r\ntype of data, the bottleneck and the most understudied step is the filtering.\r\nWe focused our research on finding possible filtering methods. We tested these methods\r\nagainst the questions our biologists wanted to get the answer for. We generated a data\r\nset, based on real data, to find the strengths and weaknesses of the filtering methods we\r\nproposed to use. We also tested which one of the filtering methods would provide the\r\nmost precise answer to each group of questions by creating synthetic data sets with a\r\nnumber of biomarkers planted in them.\r\nOur conclusion is that a statistical method, or group of methods, would not be able to\r\nprovide the perfect answer to all of our biological questions. That is why we created a\r\ntable where we matched our questions to methods that, based on our experiments, give\r\nthe best results. Also, we provided some advice on which methods perform better under\r\nspecific conditions.","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO":[{"value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/17587?expand=metadata","type":"literal","lang":"en"}],"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note":[{"value":"C O M P A R A T I V E STUDY O F STATISTICAL M E T H O D S F O R FINDING B I O M A R K E R S IN LONGITUDINAL D A T A by ZSUZSANNA HOLLANDER B.A., Academy of Economical Studies, 1998 B.Sc, The University of British Columbia, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA December 2005 \u00a9 Zsuzsanna Hollander, 2005 \fAbstract Solid organ transplantation is a common procedure for end-stage organ failure. After the transplantation, the rejection o f the new organ is possible due to the patient's immune system trying to eliminate the foreign object. T o prevent rejection, monthly painful and costly procedure is needed which involves taking a biopsy o f the allograft. The purpose o f our project is to find biomarkers based on blood samples, so the diagnosis\/prognosis o f the rejection can occur based on a simple blood or urine sample. U p to eight blood samples are taken from rejection and non-rejection patients and for each sample a microarray is created. The microarray data is longitudinal and contains 54,613 genes. The analysis consists o f normalization, pre-filtering, filtering, testing the candidate biomarkers for diagnosis\/prediction, pathway analysis, and biomarker validation. For our type o f data, the bottleneck and the most understudied step is the filtering. W e focused our research on finding possible filtering methods. W e tested these methods against the questions our biologists wanted to get the answer for. W e generated a data set, based on real data, to find the strengths and weaknesses o f the filtering methods we proposed to use. W e also tested which one o f the filtering methods would provide the most precise answer to each group o f questions by creating synthetic data sets with a number o f biomarkers planted i n them. Our conclusion is that a statistical method, or group o f methods, would not be able to provide the perfect answer to all o f our biological questions. That is why we created a table where we matched our questions to methods that, based on our experiments, give the best results. A l s o , we provided some advice on which methods perform better under specific conditions. ii \fContents Abstract ii Contents iii List o f Tables v List o f Figures vi . Acknowledgements viii 1. Introduction 1 1.1. Description o f Project 1 1.2. Description o f Data to be Analyzed 2 1.3. Challenges 3 1.4. Description o f the Steps o f Our Analysis 5 1.5. Description o f Biological Questions 7 2. Review o f Methods Currently Used for Finding Biomarkers 2.1. 9 Types o f Studies 9 2.1.1. Studies with Multiple Time Points and Small Number o f Patients 2.1.2. Studies with Small Number o f Genes 10 2.1.3. Studies with One Time Point 10 2.2. Summary and Critique 13 3. Filtering Methods for Finding Candidate Biomarkers 3.1. 9 Description o f the Filtering Methods 15 , 15 3.1.1. T i m e Not Included 16 3.1.2. Time Included 19 iii \f3.2. From Biological Questions to Statistical Questions 24 3.3. Question to Method Mapping 25 4. Validation o f Method - Question Correspondence 27 4.1. Description o f Test Data Sets 27 4.2. Experiments Performed and Results 29 4.2.1. Category I o f Questions 31 4.2.1.1. Differential Expression for A l l Patients at A l l Time Points 32 4.2.1.2. Differential Expression for Some Patients at A l l Time Points 35 4.2.1.3. Differential Expression for A l l Patients at Some Time Points 38 4.2.2. Category II o f Questions 40 4.2.2.1. Differential Expression for A l l Patients at Both Time Points 41 4.2.2.2. Differential Expression for Some Patients at Both Time Points 43 4.2.3. Category III o f Questions 46 4.2.3.1. Differential Expression for A l l Patients at A l l Time Points 47 4.2.3.2. Differential Expression for Some Patients at A l l Time Points 50 4.2.3.3. Differential Expression for A l l Patients at Some Time Points 53 4.2.4. Category I V o f Questions., 55 4.2.4.1. Differential Expression for A l l Patients at A l l Time Points 55 4.2.4.2. Differential Expression for Some Patients at A l l Time Points 58 4.2.4.3. Differential Expression for A l l Patients at Some Time Points 60 5. Conclusion 64 Bibliography 67 iv \fList of Tables 1.1 List o f biological question 8 3.1 Possible tests for the selection o f candidate biomarkers 15 3.2 Categories o f questions 25 3.3 Questions and statistical methods that can help answer them 26 4.1 Types o f data sets and their symbols 28 4.2. Positives and negatives 30 5.1 Revised question-method table 65 v \fList of Figures 1.1 Steps o f statistical analysis for finding biomarkers 5 2.1 Common steps o f statistical analysis for finding biomarkers 3.1 Strengths o f t-test, W i l c o x o n , and permutation test ....11 17 3.2 Weaknesses o f t-test, Wilcoxon, and permutation test .....18 3.3 Strengths o f linear regression and G E E 20 3.4 Weaknesses o f linear regression and G E E 21 3.5 Strengths o f M A N O V A 22 3.6 Weaknesses o f M A N O V A '. 24 4.1 Top k vs. number o f true positives i n A P - A T , cat. 1 34 4.2 Precision-Recall graph for biomarkers expressed i n all patients at all time points 35 4.3 Top k vs. number o f true positives i n S P - A T , cat. 1 36 4.4 Precision-Recall graph for markers expressed i n some patients at all time points 37 4.5 Top k vs. number o f true positives i n A P - S T , cat. I 39 4.6 Precision-Recall graph for markers expressed i n all patients at some time points 40 4.7 Top k vs. number o f true positives i n A P - A T , cat. II 42 4.8 Precision-Recall graph for biomarkers expressed i n all patients 43 4.9 Top k vs. number o f true positives i n S P - A T , cat. II 45 4.10 Precision-Recall graph for biomarkers expressed i n some patients 46 4.11 Top k vs. number o f true positives i n A P - A T , cat. I l l 49 4.12 Precision-Recall graph for biomarkers expressed i n all patients at all time points ...50 4.13 Top k vs. number o f true positives i n S P - A T , cat. I l l vi 51 \f4.14 Precision-Recall graph for markers expressed i n some patients at all time points....52 4.15 Top k vs. number o f true positives in A P - S T , cat. I l l 54 4.16 Precision-Recall graph for biomarkers expressed i n A P - S T 54 4.17 Top k vs. number o f true positives in A P - A T , cat. I V 57 4.18 Precision-Recall graph for markers expressed i n all patients at all time points 58 4.19 Top k vs. number o f true positives in S P - A T , cat. I V 59 : 4.20 Precision-Recall graph for markers expressed in some patients at all time points....60 4.21 Top k vs. number o f true positives in A P - S T , cat. I V 61 4.22 Precision-Recall graph for markers expressed i n all patients at some time points....62 Vll \fAcknowledgements First o f all, I would like to thank m y supervisor, Dr. Raymond N g , for all the invaluable advice and guidance he provided during m y graduate studies. introducing me into the rewarding field o f health informatics. A l s o , many thanks for Without his help I would not been able to be part o f the Biomarkers in Transplantation project. I would also like to thank m y second reader, Dr. Ruben Zamar, for reviewing m y thesis and for all the insightful advice. Furthermore, I would like to thank Dr. Lang W u for all the information he provided me regarding longitudinal data analysis. Moreover, many thanks go to Dr. Holger Hoos, Elaine Chang, Timothy Chan, and all o f m y lab members. M a n y thanks to my Biomarkers i n Transplantation team members, Dr. Bruce M c M a n u s , Dr. Paul K e o w n , Dr. Rob McMaster, Janet M c M a n u s , Rob Balshaw, Dr. Gabriela Cohen Freue, Stephane Vannier, Martha Casey-Knight, A x e l Bergman, Dr. A l i c e M u i , Dr. Chris Ong, Pooran Qasimi, Jon Carthy, B e n Good, and everybody else on the team. Last, but not least I would like to thank my father and m y husband for all the encouragement and help. Without m y husband's, Laszlo, support I would not have gone back to school and would not been able to find this great new profession. Thank you and eternal love. In memory o f m y mother, Puskas Maria, who taught me from an early age that with hard work everything is achievable. viii \fChapter 1 Introduction 1.1. Description of Project Transplantation is a common procedure for patients with end-stage heart, kidney, or liver failure. Once the organ transplantation is performed, the rejection o f the allograft (implanted organ) is possible. Rejection occurs i f and when the patient's immune cells realize the presence o f a foreign organ and try to destroy it. Periodic checking for signs o f chronic or acute rejection has to be performed, so patient's immunosuppressor drug dosage can be adjusted to save the allograft from being destroyed. Biopsy o f the allograft is the common method for the diagnosis o f rejection. This involves the advancing o f a thin tube to the transplanted organ, for example the heart, by insertion into a vein i n the neck or groin and threading through blood vessels into the heart o f transplant patients [15]. A t the end o f the tube there is a bioptome that takes a small sample o f the allograft tissue. This procedure is not only expensive but also painful and potentially harmful for the patient (sometimes causing even death) [2]. These are the main reasons why there is a need for a non-invasive and cost effective procedure for diagnosis\/prediction o f organ allograft rejection. Such a procedure could possibly be as simple as a blood test. Our Biomarkers in Transplantation project's focus is to find biomarkers o f organ transplant rejection based on blood samples from patients with and without rejection. B y biomarker we mean a gene (or protein), which, i f 1 \fexpressed at a certain level, indicates the occurrence o f allograft rejection. These biomarkers should be differentially expressed (DE) in the rejection and non-rejection patients. This way, by looking at a few genes' expression level o f a patient, the clinician could determine i f that patient is, or w i l l be, rejecting the transplanted organ. If we can find genes that are involved in the rejection process and which are good predictors\/diagnostic tools, then the periodic biopsies can be replaced with a relatively painless and inexpensive procedure. The biomarker genes could even serve as drug targets for treating patients with organ transplant rejection. If biomarkers can be found, then the diagnosis\/prediction could be performed i n the future by taking a small amount o f blood or urine sample from the patient with organ transplant and checking the biomarker genes. Depending on the expression level o f these genes, the clinician could tell i f rejection has occurred or w i l l occur in the next few weeks\/months. 1.2. Description of Data to be Analyzed To be able to find biomarkers i n blood, biopsy and blood samples are taken from about 100 patients with organ transplants. The biopsies are obtained as part o f the regular check-ups each transplant patient has to attend. Based on these tissue samples the clinician can tell i f the patient is experiencing rejection. B l o o d samples are taken from, each patient right before and periodically after the transplantation. U p to 8 samples are sent for microarray analysis for each rejection patient, where the last sample is the one when rejection is diagnosed for the first time. There is a control group o f patients whose blood samples are also collected. This group consists o f transplant patients who do not develop rejection within the time period o f sample collection. Since the data set consists o f samples collected from multiple patients at multiple time points, the data to be analyzed is longitudinal. The analysis o f this data is complex because each o f the samples consists o f 54,613 genes. Due to the size o f the data set, the traditional statistical methods used for longitudinal data analysis cannot be applied. 2 \fTherefore, some pre-processing is needed to eliminate the genes that we know cannot qualify as biomarkers. The question is what statistical methods should be used to filter out the non-biomarker genes. W e have to make sure that no gene with potential predictive\/diagnostic properties is eliminated, since once a gene is eliminated it w i l l no longer be considered i n the analysis. The purpose o f our research is to find appropriate methods for the analysis o f the longitudinal data with a focus on the filtering process. The blood samples are still being collected and, as a result, we have no real data to work with. This is why we created some test data sets that are based on the properties o f the preliminary data we have. W i t h the help o f these test data sets, we were able to investigate the filtering methods to ensure that only genes with non-desired properties are eliminated. 1.3. Challenges In order to see what the common practice is in the analysis o f longitudinal microarray data, we looked for papers describing the steps for finding biomarkers i n this type o f data. To the best o f our knowledge, there are no studies with the same scope and data types. There are only a few papers that deal with longitudinal data, either because a study with multiple patients and multiple time points is still too costly or because for some disease the tracking o f gene expression patterns over time is not essential. Out o f these few longitudinal studies, some look at the expression o f only those previously known biomarkers [8 and 18]. Others analyze one time point, and then verify the candidate biomarkers found at that one time point, b y clustering the candidate biomarkers one time point at a time and comparing the results [14]. In some other studies, the purpose is not o f finding biomarkers. If one is looking at only the previously known biomarkers then the analysis is different than the analysis o f the whole microarray. This is because i n the first case one deals with fewer than 10 genes and, as a result, all traditional statistical methods built for the analysis o f longitudinal data work. In the later case, one has to look 3- \fat tens o f thousands o f genes and, as we noticed, the traditional methods cannot be applied anymore. W e surveyed biomarker studies that analyze microarray data for multiple patients and only at one time point to see what the steps for finding biomarkers are. W e noticed a pattern among most papers (see details i n Chapter 2). The common steps applied are normalization, pre-filtering, finding differentially expressed genes, and clustering. The first step, normalization, is applied to have expression values for genes from different samples comparable. During the pre-filtering step, the genes with undesirable qualities are eliminated from further analysis. The common method used i n most studies, for finding the differentially expressed genes, is the t-test. Clustering, the last step o f the analysis, is based on the candidate biomarker genes found i n the previous step. W e tried to use these steps as the base for our analysis. W e changed the usual t-test applied i n the filtering step to G E E (Generalized Estimation Equations) [12], a method developed for the analysis o f longitudinal data. The model that we had to fit with this method is the following: Y =p +X p +X P +... + X P +e ' s o fil i ij2 2 ijk k ij where Y y is the outcome (the absence or presence o f rejection) for patient i at time point j , Pk is the regression parameter for predictor k, and Xyk is the predictor k (expression o f gene k) for patient i at time point j . B y fitting this model, G E E would show which genes are playing a role i n differentiating the expression pattern over time between the control and rejection groups. In other words, G E E would give us candidates for biomarkers. W e applied G E E to our test data and realized that G E E was not built for the analysis o f thousands o f predictors. In a usual statistical setting, it would be used for analyzing no more then a dozen predictors. Even when used for the analysis o f biological data, the usual setting would be to use the clinical data (e.g. age, gender, medication, etc.) as predictors, which again is limited i n number. W e realized that we cannot apply G E E , for fitting the above model, i n our analysis, or at least not before filtering out some more genes, because o f the degree o f freedom problem. In order for G E E to be applied, based 4 \fon the number o f patients and time points we have in our data set, the maximum number o f genes we can use as predictors is no more than 400. number, o f genes < number o f patients * number o f time points A s a result, we came to the conclusion that there has to be a filtering step before we can apply G E E or survival analysis to find the biomarker. 1.4. Description of the Steps of Our Analysis The first 3 steps o f analysis performed i n our study are similar with the ones commonly used in this field (see Fig. 1.1). W e have some extra steps that are needed for validation o f the prediction\/diagnostic power o f the candidate biomarkers obtained after the filtering process. Normalization Pre-filtering Filtering Biomarker validation Figure 1.1: Steps o f statistical analysis for finding biomarkers 5 \fNormalization is needed in a microarray study to make the expression values for genes from different samples comparable. The normalization is performed with Genetrix, a microarray analysis software developed by Dr. Jonathan Buckley at the University o f Southern California, Los Angeles [7]. The normalization is achieved by weighting the score for each gene for each sample b y using the P C A method. Genetrix incorporates the Probe Profiler package (developed by Corimbia Inc.), and P C A is a Probe Profiler algorithm. Following the normalization, genes with negative expression value across all samples (control and rejection) are eliminated because negative expression value means that the gene is not activated. The genes with similar expression value across all patients and all time points are also eliminated from the analysis. The reason for this is that a biomarker gene should have different expression levels for the rejection and non-rejection patients, and they should also have some between time point variations. According to our previous experience with microarray data, the pre-filtering w i l l reduce the number o f probe sets from over 50,000 to about 10,000. Various statistical methods are applied for the filtering (see section 3.2.) to further reduce the number o f genes to about 100 (which we call top k). The purpose o f this step is to find differentially expressed genes between the two groups o f patients. The remaining 100 genes are called candidate biomarkers. Following the filtering step, pathway analysis is performed, during which we find those genes that regulate or are regulated by our candidate biomarkers. It is important to identify all genes that might be involved i n the rejection so that the ones that have the best predictive\/diagnostic power can be pinpointed. The genes up- and downstream from the candidate genes are added to the list o f candidate genes, and are further analyzed with traditional statistical methods used for longitudinal data analysis. These statistical techniques are G E E and survival analysis methods, and are applied to check which one o f the candidate biomarkers can be used for prognosis and\/or diagnosis. A t the end o f this step, we are left with less than 10 genes which constitute the biomarker genes. 6 \fAnother pathway analysis step follows, where the pathways that the biomarkers are involved in are found. A t this stage more biological information can be found about the biomarkers, and i f there are a few o f them involved i n the same part o f a pathway then a possible drug target might be located. This information could be used by a drug company in prevention o f rejection, after the biomarkers and the drug target are biologically validated. Our research focuses on the filtering step because this is the bottleneck o f the analysis and is one o f the most important steps o f the whole analysis process. A l s o , this is the step that is not performed in the related studies. W e want to give a list o f methods that one could use in the filtering phase, and also show which method should be used i n what situation. The reason for the comparative study o f the filtering methods is to provide guidelines for researchers who want to analyze similar type o f data to find biomarkers for a certain medical condition. 1.5. Description of Biological Questions To be able to find the best analysis strategies for our data set we asked our biologists to provide us with questions they would like to get the answers to within our Biomarkers i n Transplantation project. W e collected these questions and created a list (see Table 1.1). The remaining o f the thesis is organized as follows: currently used for finding biomarkers. Chapter 2 describes the methods In chapter 3 we categorize the biological questions and map them to filtering methods. In chapter 4 we describe experiments performed on synthetic data and our findings. W e provide our conclusions i n chapter 5. 7 \fBiological Questions 1 A t which one o f the TPs does the D E start? 2 Are there genes with D E from the moment o f transplantation? 3 W h i c h genes have D E between control and rejection at the first time point after transplantation (Tj) compared to time point before transplantation ( T B ) ? 4 W h i c h genes have D E between last ( T ) and one before last time points (T -i)? L L 5 A r e there genes with D E between liver, kidney, & heart? 6 Is the concentration o f the biomarker detected correlated with the severity o f the rejection? 7 Does gender, race, age, medication play a role in rejection? 8 Are there genes with D E between control & rejection? 9 A r e there genes with D E between liver & kidney? 10 A r e there genes with D E between heart & kidney? 11 A r e there genes with D E between liver & heart? Table 1.1: List o f biological questions 8 \fChapter 2 Review of Methods Currently Used for Finding Biomarkers 2.1. Types of Studies We searched the literature for studies with similar purpose and data type as our Biomarkers in Transplantation project. We realized that, while there are a few studies that have some similarities with our project, none of them have exactly the same scope and data type. What this means is that either the purpose of the study is not to find biomarkers or the data under analysis does not have the same properties. For the papers we found to analyze longitudinal microarray data, the purpose of the research was not finding biomarkers, but getting a better understanding of a specific condition. 2.1.1. Studies with Multiple Time Points and Small Number of Patients In the studies by Vaes et al., Tracey et al., and Hirano et al. [ 9, 19, and 21], where the number of samples per time point and condition is very limited, between 1 and 10, the method of choice for finding genes of interest is fold change. Usually, genes with fold change smaller than 1.5 [9] or 2 [19 and 21] are excluded from the study. This means that if the gene expression level change between control and the studied condition is less than 1.5 (or 2) then that gene can not be used as biomarker of that specific condition. 9 \fDepending on the number o f genes remaining, some look at only the genes with the largest fold change [9] or use clustering,[19] and function o f genes [21] to narrow down the number o f genes that could be considered markers o f the disease under study. 2.1.2. Studies with Small Number of Genes M a n y studies hypothesize that genes known from the literature, as related to the condition under study, might be helpful i n the diagnosis o f that condition. In these cases, a microarray study with tens o f thousands o f genes is not necessary; the method o f choice is R T - P C R , or some other version o f P C R . P C R is a technique for rapidly creating copies o f a specific segment o f D N A . This technique enables the genetic profiling o f samples containing very small amounts o f D N A . The advantage o f P C R is that it can be performed i n the lab and is cost effective. A l s o , the researcher can measure only the expression level o f the genes o f interest. Shoels et al. [17] analyze 39 genes with R T - P C R , by using Mann-Whitney U-test, to find differentially expressed genes between the groups o f patients. The next step is discriminant analysis, to find genes that discriminate between the different conditions after which, b y applying receiver operating characteristic ( R O C ) analysis, the optimal cut-off is evaluated and applied i n the discriminant analysis until the optimal R O C cut-off is achieved. In another similar study [5], the number o f genes analyzed is 10, the number o f patients is 13 control and 8 diseased. The statistical analysis method o f choice is t-test. Expression levels at the time o f appearance, o f disease are compared with the baseline expression level and a comparison between the control and disease group is also performed. 2.1.3. Studies with One Time Point Based on papers published on biomarker studies with one time point [4, 13, 14, and 16], the trend o f statistical analysis can be summarized i n a few major steps: normalization, pre-filtering, identification and analysis o f differentially expressed genes using t-test, and clustering procedures (see F i g . 2.1). 10 \fNormalization Pre-filtering Finding D E genes (t-test) Clustering D E genes Figure 2.1: Common steps o f statistical analysis for finding biomarkers A s i n any microarray analysis, the first step o f the statistical analysis for finding biomarkers consists o f the normalization o f data. A global normalization is performed b y setting the mean value [13, 14, and 16] or the median and unit variance value [4] to a value fixed b y the researcher. In the case o f c D N A microarray analysis, a correction o f dye-specific effects and the intensity dependent bias is also performed [4]. The normalization is followed by the pre-filtering process, where genes are eliminated i f they have undesirable qualities, such as expression value close to that o f the background, small fold difference between expression value o f rejection and control samples [4, 14]. The following step is to identify genes with differential expression, out o f the ones remaining after pre-filtering. The most commonly used test for finding these genes is the t-test [4, 13, 14, and 16]. Some researchers prefer to use W i l c o x o n and quantile-based tests as well [16]. The genes, identified as differentially expressed, are clustered using hierarchical clustering. If the samples within the same group are i n the same cluster and separate from the other group then it confirms that the identified genes are correlated with the allograft rejection. T o provide some detail on how the steps described i n Figure 2.1 are performed the description o f the analysis o f choice for two papers follows. 11 \fScherer et al. [16] analyze a sample o f 17 transplant patients (9 o f the patients developed rejection within 6 months after the sample was taken and 8 did not) to find predictors o f renal allograft rejection. They normalize the data set, consisting o f 12,625 probes, by scaling the mean value o f the 96% quantile o f the probes on a chip to .200. Further analysis is performed in S-Plus and GeneSpring, and comprises o f setting the really small average difference values to 10 and eliminating the probe sets that do not satisfy some preset conditions. These conditions are the following: both groups o f patients' data contain at least 40% values above 50, and the whole data set contains at least 50% o f values above 10. To find the differentially expressed genes Student's t-test, W i l c o x o n test, and quantile-based tests are performed. Ten genes are identified by all o f the above tests and, as a result, they are considered to be the biomarkers. T w o kinds o f tests are performed to assess the prediction power o f the ten genes. In one o f the tests, by looking at the expression o f the ten genes in 16 samples, the authors aim to predict the 17 th patient's outcome. The other test is done by looking at a patient's data, which was not included i n the analysis, and predicting the outcome based on the expression o f the biomarkers. Both tests validate the prediction power o f the identified genes. Donauer et al. [4] analyze kidney transplant data to identify genes based on which one can tell i f a patient has chronic allograft rejection. They first perform a global normalization on the data set consisting o f 13 chronic allograft rejection, 16 normal, and 12 end-stage polycystic kidney ( P K D ) samples. Genes with expression levels not exceeding three-fold over the background i n over 80% samples within a group o f patients are eliminated. After this pre-filtering 2,190 genes remain for statistical analysis. To find differentially expressed genes, they use two-sample t-tests, with p-values adjusted as per Benjamini and Hochberg method [1] to control the false discovery rate. They also perform a hierarchical cluster analysis for the 517 genes with significant difference between transplants and normal tissue (p-value < 0.0003), or transplants and P K D tissue (p-value < 0.05). The purpose o f the clustering is to check if, based on the candidate biomarkers, the different groups o f samples would cluster i n different groups. They find two major branches in the dendrogram, one containing all normal samples and the other 12 \fcontaining kidneys with end-stage renal disease ( E S R D ) . This last branch is further divided into two major branches that can not be explained by clinical or histological data. To identify biomarker genes specific for chronic transplant rejection, they evaluate the set o f genes that were significantly different from normal and P K D (p-value < 0.05) and further eliminate those that were less than 1.5 fold up- or down-regulated. The remaining 27 genes are considered to be characterizing the kidney allograft rejection. To understand more about these genes they are annotated according to their biological functions. A similar analysis is performed to identify genes characteristic for E S R D . 2.2. Summary and Critique A s we stated before, to the best o f our knowledge, there are no studies with the same purpose and data type as our project. Because o f this, the type o f analysis performed is also different. One situation is when the data has multiple time points, but only one or two patients. In this case, only very simple methods can be used i n the analysis, like ratios. Another situation is when researchers have an idea about what the biomarkers would be for a certain condition and they only look at those genes' expression b y using some version o f P C R , not microarrays. This means the number o f genes to be analyzed is only about a dozen. Moreover, there are situations when only the clinical data is measured over time not the gene expression, which again means only about a dozen o f predictors. The analysis o f a handful or a few hundred genes is different from how one would analyze tens o f thousands o f genes. In both o f these last two cases the traditional statistical methods designed for longitudinal data analysis will work perfectly. Furthermore, most biomarker studies have only one time point, which requires different statistical analysis methods than our longitudinal data set. For the papers we found to analyze longitudinal microarray data, the purpose o f the research was not finding biomarkers, but getting a better understanding o f a specific condition. In this case, again, the steps o f the analysis are different from our project's because the aim o f the study is not the same. 13 \fB y reviewing the statistical analysis o f microarray data with one time point we found some common steps followed by all the papers. However, the statistical methods used i n each step are not always the same. In general, a cluster analysis is performed after the differentially expressed genes are identified [4, 13, 14, and 16]. Some studies compare clusters at different time points as well [14]. W e believe that, while clustering is useful i n microarray data analysis, it should be mostly used for hypothesis generating purposes. In general, the goal o f the papers reviewed was to find biomarkers for allograft rejection. A s pointed out by Lachenbruch et al. [11] it is not enough to discover the genes correlated to allograft rejection. One has to also make sure that these genes can predict the occurrence o f rejection, meaning that change in the expression level o f these genes w i l l result i n change in the outcome o f organ transplant. In our case, after the biomarkers are identified, according to Lachenbruch et al., they should be tested to see i f they can be applied i n prediction\/diagnosis. The method o f choice to find the genes that can predict the outcome can be multiple regression, and\/or survival analysis, and variations o f these methods [11]. Although the methods listed by Lachenbruch et al. are mentioned as techniques for prediction based on clinical data, they can be employed for microarray data as well. 14 \fChapter 3 Filtering Methods for Finding Candidate Biomarkers 3.1. Description of the Filtering Methods Based on previous experience, we know that most o f the genes are not expressed within a certain type o f tissue sample. W e estimate that we w i l l have about 10,000 genes at the end o f the pre-filtering step. Due to the still large number o f genes, we can not apply the traditional statistical methods to analyze our longitudinal data set. However, we can devise algorithms based on existing statistical methods that w i l l help us narrow the search for biomarkers. These methods can be categorized based on their properties o f being able to handle the analysis o f data with multiple time points or not. Time not included Time included 1. T-test 4. Linear regression 2. W i l c o x o n test 5. G E E 3. Permutation test 6. M A N O V A Table 3.1: Possible tests for the select ion o f candidate biomarkers 15 \fIn Table 3.1 we listed the methods that may help us achieve the objective o f this study. Next, we described each group o f filtering methods and listed their strengths and weaknesses, based on the statistical properties o f these methods. The filtering methods were tested with the help o f a semi-synthetic data set. The test data set was generated based on a rat spinal cord injury microarray data (http:\/\/www.ncbi.nlm.nih.gov\/projects\/geo\/gds\/gds_browse.cgi?gds=259) downloaded from the G E O data repository [6]. The original data set contained only 4 samples per time point, and we needed a higher number o f subjects i n order to test our methods. A 25 subject data set was generated with the mvrnorm function i n R [10], based on the mean and the covariance matrix o f the four rat samples. This new data set has multivariate distribution with the same mean and variance as the original rat microarray data. The moderate spinal cord injury was considered to be the control and the severe condition as rejection. A s a final result, our test data set contains 8791 genes, 4 time points, and 25 subjects i n each group (control\/rejection). 3.1.1. Time Not Included The three statistical tests i n the \"time not included\" group (see Table 3.1) can help determine i f there is a significant difference between the expression level o f a specific gene for two groups o f subjects. These tests can be tailored to find genes that have differential expression at all time points, at some o f the time points, or only at specific time points. The t-test assumes that the data to be analyzed is normally distributed. W i l c o x o n and permutation tests are both non-parametric methods. W h i l e the W i l c o x o n test is based on the ranking o f the individual difference scores, the permutation test determines the statistical significance o f the experimental outcome based on randomizing the two groups many times. Iterations are needed to assure every possible combination o f the drawn data is attained. Based on our tests with the semi-synthetic data we learnt that t-test, W i l c o x o n test, and permutation test are good at finding genes that are differentially expressed i n two 16 \fdifferent groups of subjects at all (see Fig. 3.1.a) or some of the time points (see Fig. 3.1.b). DEat All Time Points 2.5 \u2014 2 1.5 - * \u2014 Rejection a. x a> a> c a> D) -\u2022- - - Control 2 3 Time points a. DE at all time points DE at One Time Point 2.5 -r- 2 - \u2022 \u2014 Rejection CL X 0) c - o- \u2022 - Control 1 0! 2 3 4 Time points b. DE at only one time point Figure 3.1: Strengths of t-test, Wilcoxon, and permutation test 17 \fThese three tests are not able to differentiate between different trends for the different groups of subjects (see Fig. 3.2.a) and are weak at differentiating between similar trends of gene expression over time (with different intercepts) for the two groups of subjects (see Fig. 3.2.b). Different Slopes 2.5 c .2 2 to g) 1.5 \u2022 Rejection CL x 0) \u2022 Control -I 3 0.5 CO 0 5 1 2 3 Time points a. different slope of gene expression over time for the two groups Similar Slopes & Different Expr. Levels 2.5 .2 2 in in \u2022 Rejection| \u00a3 1.5 \u2022 Control Q. X a | 1 0.5 c? 0 c. differential expression between the groups at the last time point Figure 3.4: Weaknesses o f linear regression and G E E 21 \fThe typical question M A N O V A can answer is: \"Does the gene expression change over time?\" The power of this method comes from the fact that besides the time points, the groups or other variables (e.g. age, gender) can also be considered in the model. M A N O V A is good at finding genes with differential expression pattern over time between control and acute rejection subjects (like GEE). Unlike GEE, M A N O V A is not restricted to finding genes with a certain expression pattern (like linear, exponential, etc.). M A N O V A can find genes with any expression pattern over time (see Fig. 3.5). Non-linear M o d e l Rejection Control 1 2 3 4 Time points Figure 3.5: Strengths of M A N O V A M A N O V A is weak at finding only those genes with differential expression at specific time point(s), like GEE, (see Fig. 3.6.a). M A N O V A is also weak at finding only genes with a specific slope or intercept, unlike G E E (see Fig. 3.6.b\u20143.6.d). For example, M A N O V A can find genes that have upward and downward slopes, but i f somebody is only interested in genes with upward slope M A N O V A can not differentiate these from the genes with downward slope. 22 \fDE at Last Time Point 3 -,- 0 \"1 2.5 2- 1 15a> o> I Rejection Control 1 - 0.5 1 2 3 4 Time points a. differential expression between the groups at the last time point Similar Slopes & Different Expr. Levels b. similar, upward, slope for the two groups o f subjects Different Slopes & Same Intercept Time points c. different slopes and same intercept for the two groups o f subjects 23 \fDifferent Slopes & Intercept 2.5 c .2 n 2- CO Rejection \u00a3 1.5 Q. CD o Control 0 1\u2022 2 3 4 Time points d. different slopes and intercepts for the two groups o f subjects Figure 3.6: Weaknesses o f M A N O V A 3.2. From Biological Questions to Statistical Questions After reviewing the strengths and weaknesses o f all o f the statistical methods listed i n Table 3.1, we realized that one test, or one group o f tests, is not able to answer all o f the biological questions from Table 1.1. B y mapping the biological questions to statistical questions, we recognized that the 11 questions can be placed i n four categories (see Table 3.2). Category one includes questions which inquire about two groups o f patients being differentially expressed at specific time points. The questions i n the second category are comparing expression levels between two time points. In the next category o f questions three or more groups are compared. The last category comprises o f questions where differential expression between two groups o f patients is o f interest. In this last case, the differential expression can occur at any or all time points. 24 \fCategory Questions I 1. A t which one o f the time points does the D E start? 2. A r e there genes with D E from the moment o f transplantation? II 3. W h i c h genes have D E between control and rejection at T i compared to T B ? 4. Which genes have D E between Ti_ and T n ? III 5. A r e there genes with D E between liver, kidney, & heart? 6. Is the concentration o f the biomarker detected correlated with the severity o f the rejection? 7. Does gender, race, age, medication play a role i n rejection? 8. A r e there genes with D E between control & rejection? 9. A r e there genes with D E between liver & kidney? 10. A r e there genes with D E between heart & kidney? 11. A r e there genes with D E between liver & heart? Table 3.2: Categories o f questions 3.3. Question to Method Mapping Once we categorized the questions we were able to map questions with statistical methods that can help answer them (see Table 3.3). The matching between questions and filtering methods is based on the strengths and weaknesses o f the filtering methods listed i n this chapter. W e believe t-test, Wilcoxon, and permutation test would provide the best results for the first two questions i n table 3.3 because these three methods are able to find differential expression at specific time points. These three tests along with M A N O V A would provide the most precise answer to the next two questions. Since linear regression and G E E work better with more time points we believe they would not be able to perform as well as the other methods, due to the fact that we are comparing only two time points. For questions 5, 6, and 7 linear regression, G E E , and M A N O V A would be the best fit, for the reason that we have more than two predictors i n the model. 25 \fFor t-test, Wilcoxon, and permutation test we have to run each combination o f two predictors separately and then combine the answers. This can become cumbersome as the number o f predictors increases. A s far as the last four questions go, we believe that all methods would be able to provide a good answer. Cat. Methods Questions I 1. A t which one o f the time points does the D E start? 2. A r e there genes with D E from the moment o f transplantation? T-test, Wilcoxon, Permutation test II 3. W h i c h genes have D E between control and rejection at T i compared to T B ? 4. W h i c h genes have D E between T L and TL_I? T-test, Wilcoxon, Permutation test, MANOVA III 5. A r e there genes with D E between liver, kidney, & heart? 6. Is the concentration o f the biomarker detected correlated with the severity o f the rejection? 7. Does gender, race, age, medication play a role in rejection? Linear regression, GEE, M A N O V A IV 8. A r e there genes with D E between control & rejection? All 9. A r e there genes with D E between liver & kidney? 10. A r e there genes with D E between heart & kidney? 11. A r e there genes with D E between liver & heart? Table 3.3: Questions and statistical methods that can help answer them 26 \fChapter 4 Validation of Method - Question Correspondence 4.1. Description of Test Data Sets A s we described i n the previous chapter, we found the strengths and weaknesses o f the filtering methods using a semi-synthetic data set. However, we could not test the number o f true positives, false positives, and false negatives returned b y these methods using the same data set. This is due to the fact that it is almost impossible to know the exact number o f differentially expressed genes i n a real data set. A s we stated before, our semi-synthetic data was generated based on a real data set. T o test i f our method-to- question mapping is correct, we created synthetic data sets. In these data we planted differentially expressed genes, which allows us to see how each method performs in terms o f finding the biomarkers. A l l o f the test data sets have 10,000 genes and 100 patients per medical condition. Patients have variable number o f time points with minimum o f three and maximum o f 8 time points. Gene expression values vary between 0 and 15. T o make our data closer to a real data set, we had to add some variation within patients at a specific time point and variation between time points for all genes. This was necessary since, as we saw i n real data sets, there is always some within and between variation for some o f the genes. Given that we were testing our filtering methods, we assumed that genes with no variation across all time points and samples are eliminated; hence all our genes have within and\/or between variations. 27 W e achieved this by \frandomly generating the within variation (between zero and one) and between variation (zero to two) for all patients at all time points. A l l these numbers are values derived through l o g base two transformations, since the common practice is to take the log two o f the gene expression values before performing analysis on the data. The variables i n our data sets are the number o f biomarkers, the number o f patients with differential expression for a biomarker, and the number o f time points with differential expression. It is important to have these dimensions o f the data variable to test how the different filtering methods perform under the different conditions. W e created data sets with 60, 100, 200, and 500 biomarker genes respectively, which means that variable number o f genes have differential expression. Based on the literature, most people are only interested i n above two fold differential expression. In our generated data sets we have between 2 fold and 64 fold difference, which corresponds to a value between 1 and 6 after log transformation. This, again, is based on real microarray data. W e have seen this kind o f variation i n the limited number o f samples we obtained so far from transplant patients. Number of Biomarkers 60 100 200 500 DE for all patients at all time points 60-AP-AT 100-AP-AT 200-AP-AT 500-AP-AT DE for some patients at all time points 60-SP-AT 100-SP-AT 200-SP-AT 500-SP-AT DE for all patients at some time points 60-AP-ST 100-AP-ST 200-AP-ST 500-AP-ST Table 4.1: Types o f data sets and their symbols W i t h i n each o f the data sets with a specific number o f biomarkers we created three types o f data sets (see Table 4.1). For example, i n the above table, the symbol 6 0 - A P - A T corresponds to a data set with 60 biomarkers, expressed i n all patients and at all time 28 \fpoints. X - S P - A T is a data set in which only some o f the patients have differential expression, and X - A P - S T contains X number o f biomarkers differentially expressed i n all patients, but only at some o f the time points. T o be more specific, for instance i n 60-SPA T , we divided the biomarkers into 20 groups. Each group has an equal number o f genes (number o f biomarkers divided by the number o f groups). The groups consist o f 100, 95, 90, 10, and 5 patients, with differential expression at all time points. This way, we can test how the filtering methods find biomarkers with not all patients showing signs o f differential expression. W e also wanted to know i f the filtering methods find the biomarkers i f the differential expression is not present at all time points. T o be able to do this, within each o f the data sets with a specific number o f biomarkers we created a data set ( X - A P - S T ) where the biomarker genes are divided into 4 groups. Each group consists o f equal number o f genes (number o f biomarkers divided b y number o f groups). First group o f genes has no differential expression at one o f the first four time points. Second, third, and fourth groups have no differential expression at two, three, and four time points respectively. In the next sections we w i l l specify which test data is used to study the filtering methods under specific circumstances. 4.2. Experiments Performed and Results A s described i n the previous chapter, the questions posed by our biologists are grouped into four categories, based on the types o f statistical methods that would help answer them (see Table 3.2). For each o f these categories we chose a representative question, and proved that the other questions are similar and can be deducted from the representative o f that category. W e applied all six filtering methods described i n chapter 3 and compared the number o f biomarkers found by each with the number o f biomarkers planted i n the data set. Except for the permutation test, all other methods were run i n R , a free software environment for statistical computing [10]. The permutation test, an in-house Perl script, 29 \fwas executed on a Sun Microsystems Sun-Fire-280R server with Solaris SunOS 5.9 operating system, 2 UltraSPARC-Ill CPUs, and 4 G B of R A M . The permutation test had the longest running time out of all the filtering methods, 32 hours and 42 minutes. This was the running time for a data set with 10,000 genes and 200 patients, and with 5000 iterations. The rest of the filtering methods were run on a server with Microsoft Windows Server 2003 operating system, a 3 G H z Intel Pentium 4 C P U , and 2 G B of R A M . The running time for t-test was 1 min 27 sec, for Wilcoxon 3 min 9 sec, for linear regression 7 min 48 sec, for M A N O V A 25 min 28 sec, and for G E E 53 min 28 sec. To find the best statistical methods for a category of questions, we tried multiple data sets for each category and investigated the top k (60 to 100) genes returned by each method. The basis for finding the best filtering method is that the top k genes returned by the method have to have the most overlap with the biomarker genes planted in the data sets. To assess the methods not only based on number of true positives returned within the top k genes, but also based on false positives and false negatives, we created precision-recall graphs for each test data set. The precision and recall are defined as follows: precision = P(A | B) = recall = P(B | A ) P(A u B) P(B) n, n + ! 1 P(A u B) \u2022xl00% n 2 1 -xl00% P(A) i l n + n ! 2 where A is the set of biomarkers and B is the set of biomarkers found by a statistical method, nn represents the number of true positives, ni2 is number of false negatives, and n i is the number of false positives (see Table 4.2). 2 Found by Method Yes Biomarker No Yes No n, n \\ v n2 ,; 2 ;\u2022 A 2 2 B Table 4.2: Positives and negatives 30 \fIn other words, precision is the proportion o f biomarkers found by a method. Recall is the number o f biomarkers found as fraction o f all the biomarkers planted in the data set. Based on a precision-recall graph, the best method is the one which is the closest to 100% precision and 100% recall. This means few or no false positives and false negatives. We were also interested i n seeing what the minimum number o f patients or time points is for which a specific method returns a biomarker gene i n the top k genes. This might be interesting since, for example, i f only a subset o f the patients has differential expression for a specific biomarker, based on the clinical data (age, gender, race, etc.), we might find out that patients within a certain age range or race have different biomarker(s) than the other patients. Although the same six filtering methods are applied to all four categories o f questions, there are some differences in the way we apply these methods for the different types o f questions. For example, the tests are similar for category I and I V , while they are different for category II and III. Next, we w i l l show how the filtering methods are applied for each o f the categories. 4.2.1. Category I of Questions In this category we explore the following questions: 1. A t which one o f the time points does the differential expression start? 2. Are there genes with differential expression from the moment of transplantation? In these questions we are asking i f the differential expression between control and rejection occurs at all (question 2) or at some time points (question 1). M o r e specifically, i f there are genes for which the differential expression starts at a specific time point (Ti for question 2, any time point for question 1) and has differential expression at all consecutive time points. From this we can conclude that question 2 is a subset o f question 1. If we apply question 2, while eliminating the T i , T , 2 TL-I time points one by one i n the consecutive runs, we w i l l get the answer for question 1. Based on this, i f 31 \fwe test the filtering methods for question 2, we can be sure that the methods that perform the best for this question w i l l also be the best for answering question 1. For question 2 we applied the filtering methods i n the following way: we ran t-test, W i l c o x o n , and permutation test for each time point separately, while comparing the gene expression levels for control versus rejection patients. A t the end we picked the top k out of the genes that have p-value smaller than 0.01 at all time points. For linear regression, G E E , and M A N O V A we looked for genes that have a different combined group and time effect for the control and rejection patients. For these three methods the cut-off for genes o f interest was a p-value less than 0.01. To test the method-question correspondence, we applied the six filtering methods to 12 different data sets: X - A P - A T , X - S P - A T , and X - A P - S T , where X is 60, 100, 200, and 500 respectively. 4.2.1.1. Differential Expression for All Patients at All Time Points When we examined the top k versus number o f true positives returned b y all methods, we realized that t-test, W i l c o x o n , permutation test, and M A N O V A returned all the biomarker genes planted i n the data sets (see Fig.4.1). If the number o f biomarkers was increasing compared to the number o f top k we were interested in, then both linear regression (starting from about 150 biomarkers) and G E E returned good results (starting from about 400 biomarkers) as can be seen on Fig.4.1 .b and 4.1 .c. 32 \f60 Biomarkers for All Patients at All Time Points \u2022\u2014 T-test, Wilcoxon. Permutation; Mandva Linear regression \u2014 - GEE \u2014 \u2022 60 ' 70 \u2022 . I. so \u2022 ; ~H\u2014 90 ~~r100 Number of top genes a. 60 biomarkers expressed in all patients at all time points 200 Biomarkers for All Patients at All Time Points \u2022: Number of .'top genes :; b. 200 biomarkers expressed i n all patients at all time points 33 \f500 Biomarkers for All Patients at All Time Points ' N u m b e r of t o p g e n e s ' c. 500 biomarkers expressed in all patients at all time points Figure 4.1: Top k vs. number o f true positives in A P - A T , cat. I Based on F i g . 4.2 we were able to spot that we had a few clear winners within the filtering methods for the data sets with biomarkers expressed differentially for all patients at all time points. These statistical methods were - the same as the ones found to be the best based on the previous 3 graphs - t-test, Wilcoxon, permutation test, and M A N O V A . A l l four methods had 100% precision and recall which means they returned only and all the biomarker genes. Linear regression provided around 80% true positives and no false negatives while G E E gave the worst results with less than 40% recall and only 10% precision (see Fig. 4.2). 34 \fPrecision - Recall .\u00bb T-test\/Wile'oxon, Permutation; Manoya: x : Linear regression '. O GEE 7\\T~ ;0% 20% 80% '40% \". 100% Rrecisibn.: Figure 4.2: Precision-Recall graph for biomarkers expressed i n all patients at all time points 4.2.1.2. Differential Expression for Some Patients at All Time Points If the biomarker is expressed at all time points, but only for some o f the patients, then we might still be interested i n these genes, since this can be a good indicator o f the existence of subgroups o f patients (with different age, race, gender, severity o f rejection, etc.). Although the graphs with top k versus number o f true positives (see Fig. 4.3.a and 4.3.b) might have suggested that linear regression provided similar quality answers as t-test and\/or permutation and W i l c o x o n test, upon closer inspection significant differences could be detected i n the genes returned by these methods as candidate biomarkers. A l l biomarkers with differential expression for over 85% o f rejection patients had a small p-value and were within the top k based on t-test, W i l c o x o n , and permutation test. A s the percentage o f patients showing evidence o f differential expression for a gene decreases, the number o f such genes showing up in the top k decreases. higher differential expression had p-value smaller then 0.01. 35 Only those genes with \f100 Biomarkers for Some Patients at All Time Points Permutation '-.-'- Linear regression \u2014 \u2022'T-test' - - Wilcoxon : -'-\u2022.'GEE '. - \u2014 \u2022 . Mahova - '' : \u2014I\u2014 60 '80 70 ;ioo: 90^'. Number of top genes: '. : ..>' a. 100 biomarkers expressed at all time points in some patients 500 Biomarkers for Some Patients at All Time Points \u2014\u2014':\u2022 T-test; Wilcoxon, Permutatiori;Linear regression ; -f^ :;GEE.-?::';:'.;\u2022 :\u2022\u2022>;'.\u2022\u2022 : \u2022 \" -\u2014oManova' ' ^ \u2022' ; : ;- \"'^ \u2022 : : : : : : : : 'i:p.:;:;yir-PPi \\-:-i' ' P i p p p p : .\u2022\u2022 : : : : : -'. : ; .' : :\u2022\u2022: S E E 'i^-i-. \" \u2022 : .illlllS^^^^ W^MiM^^iU 3;I;IP%^ ' : ;;P?6% | p ; * p 4 p % fM Mi'M II : : ; pH:'gp;:SS:j:; ::;:PreGisibn''vp;'', \u2022 : 1 : :P-V'::S.:: : : : Figure 4.4: Precision-Recall graph for markers expressed in some patients at all time points 37 \f4.2.1.3. Differential Expression for All Patients at Some Time Points If the biomarker is differentially expressed for all patients, but only at some of the time points, then, based on our question, we should have no genes returned by the filtering methods. While this was the case when applying t-test, Wilcoxon, and permutation test the other three methods returned more than 50% false positives (see Fig. 4.5. and 4.6). 60 Biomarkers for All Patients at Some Time Points : - - - L i n e a r regression Manova - '\"'i :.\u2014 -:'GE'E;-; \u2014 : - :- ;4 : : : '\u2022' \u2022 T-testWilcoxoh.Permutation' -.60: a. 60 biomarkers expressed in all patients at only some of the time points 38 \f200 Biomarkers for All Patients at Some Time Points -\u2022\u2022 L i n e a r r e g r e s s i o n . M a n o v a \u2022- GEE' . . . \u2014 T-test. W i l c o x o n . P e r m u t a t i o n E Z3 . 2 T7T~ 60. 90 70 ' ioo : N u m b e r of t o p g e n e s b. 200 biomarkers expressed in all patients at some of the time points Figure 4.5: Top k vs. number of true positives in AP-ST, cat. I Based on this last experiment we should discard linear regression, GEE, and MANOVA from the list of statistical methods that can be used for filtering for the type of questions from this category because, clearly, they can not distinguish between differential expression at some or all time points of interest. Also, GEE provided less than 10% precision, which is unacceptable. 39 \fPrecision \u2022 Recall T-test,-Wilcoxon, Permutation Linear regression' o G! I.. v ::Manova : XV O. \"I ,;o% .20%:' : : :0% ; : 40% : -I 100% Precision Figure 4.6: Precision-Recall graph for markers expressed i n all patients at some time points 4.2.2. Category II of Questions This category includes the following questions: 1. W h i c h genes have differential expression between control and rejection at T) compared to T ? B 2. W h i c h genes have differential expression between TL-I and T L ? The questions i n this category are very similar. One is asking i f there are genes with differential expression between T B and T i and the other is asking the same, except the comparison is made between the last and one before last time points. The first question is asked with the scope o f finding the genes whose expression level changes during and shortly after the transplant is performed. The last question is posed to find out what exactly happens shortly before the rejection can be detected versus at the point when it is 40 \fdetected for the first time. According to the experimental design o f the Biomarkers in Transplantation project, T L i n the rejection patient group is the time point when the rejection is detected. In both questions we are comparing two time points; the only difference between the questions is the time points which are compared. A s ' a result, we can test the filtering methods for this category by applying them to answer either o f the questions. The results we get should be the same in terms o f which statistical methods are better to answer them. We chose to test the first question i n this category and we applied the filtering methods in the following manner: t-test, Wilcoxon, and permutation test were applied to the ratio o f T i divided by T B . A l l patients have to have both o f these measurements according to the experimental design and, as a result, the ratio can be calculated for all patients. Since the expression values were already log transformed, the ratio is actually the absolute difference between the log value o f measurement taken at the first time point after transplantation and the log value o f the baseline. When applying linear regression, G E E , and M A N O V A , we looked for genes with different combined group and time effect for the control and rejection patients. For all six methods the cut-off for genes o f interest was a combination o f p-value (had to be smaller than 0.01) and top k. W e picked eight test data sets for this category; each set has 10,000 genes, two time points, and 100 patients for each o f the control and rejection groups. The data sets were X - A P - A T and X - S P - A T , where X is 60, 100, 200, and 500 respectively. 4.2.2.1. Differential Expression for All Patients at Both Time Points Based on F i g . 4.7 we were able to. see that, except M A N O V A , all other test performed really well. The top k versus true positives graph looked exactly the same when the number o f biomarkers planted i n the test data sets is 100, 200, or 500. 41 \f60 Biomarkers Expressed by All Patients -Trtest, Wilcoxon,' Permutation, Linear regression, GEE Manova'-'.\u2022 v; v;;'' 60.'. - ' . 80 70- .'\u2022''.;' . 90 '100. \u2022 Number of top genes : ' : ; : : a. 60 biomarkers expressed in all patients 100 Biomarkers Expressed by All Patients '-^\u2014.; T-test, Wilcoxon; PermutaSon',:Lihear- regression,GEE -^--Manova' \u2022:.::\";-;s>:^'';\\'-^^ ;:': ;::: : : : : \u2022 a>':-x::'o v ; S <9i: 60.: - 70 V 80 i90;; i:100: : : : Number',oftbp':genes:5: b. 100 biomarkers expressed i n all patients Figure 4.7: Top k vs. number o f true positives i n A P - A T , cat. II 42 \fU p o n closer inspection, we found that W i l c o x o n and permutation tests gave a smaller than 0.01 p-value to 2 and 15 non-biomarker genes respectively. These did not show up in the top k because their p-value was smaller than those o f the true biomarkers. These false positives can be seen i n Fig. 4.8. This figure also shows that M A N O V A had a very poor performance, with 0% precision and recall. This means that M A N O V A returned no gene with p-value smaller than 0.01. Precision - Recall \u00b0 4 '\u2022 T teat. Linear regress*on. GEE Vvllt.oxon Cnrmutatiori :; : -t-: V;::Manova ,. 0% \u2022 20% '10% G0% 80% : i':0 1C0% i Precision? Figure 4.8: Precision-Recall graph for biomarkers expressed in all patients 4.2.2.2. Differential Expression for Some Patients at Both Time Points In the case o f biomarkers being expressed in only some o f the patients, all filtering methods, except M A N O V A , performed similarly when the number o f rejection patients with differential expression for a specific gene was more than 70% o f all patients with allograft rejection. In this case, all biomarkers were found by all five methods. If the percentage o f patients with differential expression was between 15% and 70% linear 43 \fregression and G E E found all biomarkers, but t-test, Wilcoxon, and permutation test found less and less o f the planted biomarkers as the percentage decreased. A high level illustration o f how all filtering methods performed is shown i n F i g . 4.9. Based on these graphs, one can see that M A N O V A had the worst performance. The other methods' performance was good, and similar to each other, i f the number o f biomarkers existent i n the data sets exceeded 200 (2% o f all genes i n a data set). For fewer biomarker genes present i n a data set, permutation test had the best performance, followed by t-test, linear regression, and G E E . A l l three methods had above 80% recall and no false positives (see F i g . 4.10). W i l c o x o n test returned the least number o f biomarkers out the five methods, but still had over 95% precision and almost 80% recall. 60 Biomarkers Expressed by Some Patients Permutation T-test; Linear regression. GEE Wilcoxon Manova': '''' \"\u2022?\u2022\u2022'\" 'iv:.!:.'\".\" \u2022;'.::,;;;.':\u2022 \u2022\u2022:-'\u2022\u2022\u2022\u2022\u2022\u2022\u2022:: ^ ' *>\u2022'' .'> '-_ Pi<\u2022:\u2022 W^m? \u2022;''.'.;...\"': ;Nurnberpf top genes-'M :Z : : '' \u2022 :'y^ ^ P W l p \u2022\u2022 a. 60 biomarkers expressed i n some patients 44 \f100 Biomarkers Expressed by Some Patients rrr 'T-test, Permutation \u2022 \u2022 \u2022 Linear regression,:GEE -Wilcoxon ' - - Manova v. \u2022,, . : \" 60 70 r \u2014r '90 80 - \u2022\u2022too- Number of top genes b. 100 biomarkers expressed in some patients 200 Biomarkers Expressed by Some Patients '.rrr?-.:T test, Wilcoxon; Permutation. Linear regression. G E E ; \u2022^Manova .:\u2022:.:.'\u2022\u2022:\u2022:\" \u2022 : v. -->.: a--------- \u2014r~ 80 00 \u2022 K Number of top gene's; .'; : : Pibb; : c. 200 biomarkers expressed in some patients Figure 4.9: Top k vs. number o f true positives in S P - A T , cat. II 45 \fPrecision - Recall .'.'\".v::0: T-test ... Linear regression, G E E ' Wilcoxon '+\u2022 Permutation '\u2022' V Manova . : A ; '!-V \" \u2022I 1 . : . \u2022.\u2022 v :-':6%. :p : : ; \/'.\"\u2022. \u2022\u2022 I, \". :. : :\u2022: I.... *..';V'::. - .; . ' .... . . ' : l. .'..;.:.:, : ... - I ' V:' :, 20% :i P;r':p40.% ::t ;'\u2022' *: :60%\u2022.\u2022 p.'\u2022'.','.' BO-s,\u2022;\": :\u2022;\u2022 , 1 0 0 % : : : : : ' '\u2022.: : ; : : \":pl: y>iPTecision' \u2022 : : Figure 4.10: Precision-Recall graph for biomarkers expressed in some patients W e performed two types o f experiments for category II type o f questions. In the first type the biomarker was expressed i n all patients. In the second type the biomarker was expressed only in some o f the patients. In both cases M A N O V A performed poorly, returning around 0.9% false positives and zero or one true positives. 4.2.3. Category III of Questions This category includes the following questions: 1. A r e there genes with differential expression between liver, kidney, and heart transplant patients? 2. Is the concentration (expression level) o f the biomarker detected correlated with the severity o f the rejection? 3. Does gender, race, age, medication play a role i n rejection? 46 \fIn this category the questions look more diverse at first glance, but upon further investigation they have a lot o f common features. For example, question one and two are actually identical in terms o f analysis, because they both ask about the differences between three different conditions. The conditions i n question one are the three different kinds o f organ transplants, in question two the conditions are non-rejection, slight rejection, and rejection. Question three is a bit more complex, since it asks about multiple clinical variables. Thus, the number o f pair wise comparisons is higher, but the analysis strategies are the same nonetheless. A s a result, we could test the filtering methods by applying them to either o f these methods, and we chose question one for this task. A p p l y i n g t-test, Wilcoxon, and permutation test are not the most natural choice, since these methods can only compare two groups at a time. Therefore, the analysis had to be performed i n groups o f two different kinds o f organ transplant patients for each time point separately. W e had to apply these methods separately to liver versus kidney, to liver versus heart, and to kidney versus heart transplant patients, one time point at a time, and then combine the results. When applying linear regression, G E E , and M A N O V A we looked for genes with different combined time and organ effect. A l l o f these three methods support the analysis of multiple predictors with multiple categories within each predictor variable. Same as i n the category I and II questions, for all six methods the cut-off for genes o f interest was a combination o f p-value (had to be smaller than 0.01) and the number o f genes we wanted to examine after this step (top k). The test data sets were similar to the ones used in testing the filtering methods for category I o f questions, except we had 100 patients for each o f the three categories (liver, kidney, and heart transplant). 4.2.3.1. Differential Expression for All Patients at All Time Points A s it can be observed on F i g . 4.11, there were four methods that returned the same results, t-test, Wilcoxon, permutation test, and M A N O V A . The other two tests' performance was not as good; especially G E E found less than 40% o f biomarkers. A s the 47 \fnumber o f biomarkers planted in a data set increased, linear regression and G E E were able to find the top 60 to 100 genes. 100 B i o m a r k e r s f o r All Patients at All Time P o i n t s \u2014 ^T-test.-.Wilcoxori,Permutation, Manova - \u2022 ' l i n e a r degression \u2022 :GEE,;: :., : B0 J-90 : 100 Number of top genes a. 100 biomarkers expressed in all patients at all time points 48 \f200 Biomarkers for All Patients at All Time Points b. 200 biomarkers expressed i n all patients at all time points 500 Biomarkers for All Patients at All Time Points MNumbersbfit^ ';s' : c. 500 biomarkers expressed in all patients at all time points Figure 4.11: Top k vs. number o f true positives i n A P - A T , cat. I l l 49 \fAlthough the top k versus number o f true positives graphs showed that t-test, Wilcoxon, permutation, and linear regression (see Fig. 4.12) performed similarly, the picture shown by the precision-recall graph was different. W h i l e these tests had 100% recall, only M A N O V A had no false positives. T-test, W i l c o x o n , and permutation test had all low precision, but the false positives had smaller p-value than the true positives. Precision - Recall 40%; y :y:; 60%';. ;i00%:? : \u2022 '' Precision;:^?: Figure 4.12: Precision-Recall graph for biomarkers expressed i n all patients at all time points 4.2.3.2. Differential Expression for Some Patients at All Time Points For the data sets with biomarkers expressed at all time points i n only some patients, t-test, Wilcoxon, and permutation test found the most markers (see F i g . 4.13). Linear regression found about 70% o f the biomarkers, followed b y M A N O V A and G E E , with less than 40% differentially expressed genes found. W h e n the number o f biomarkers was 500 and we were interested i n the top 100 genes, all methods, except G E E , returned all true positives within the top 100 (see Fig. 4.13.b). 50 \f100 Biomarkers for Some Patients at All time; Points S00 Biomarkers for Some Patients at All Time Points - r i - i T-test; Wilcoxon, Perrh'utation.Li near regression; Manova v!\"?.''\" ';:' S S 1 : ;; \u2022 V? \u2022 ii-- ;i;:Nurnber of to? genes : '\u2022. .j I b. 500 biomarkers expressed i n some patients at all time points Figure 4.13: Top k vs. number o f true positives in S P - A T , cat. I l l 51 \fB y looking at the precision-recall graph (see F i g . 4.14), we were able to see that although t-test, Wilcoxon, and permutation test had similar recall, the precision for the first two methods was better than the precision o f permutation test. Nonetheless, all three filtering methods had low precision. The low precision however seems less important i n this case, because the top k genes included over 95% o f the planted biomarkers. Linear regression and M A N O V A had perfect precision, but 77% and 37% recall respectively. A s we noticed from the top k versus true positives graphs above, G E E had the worst performance out o f the filtering methods tested. This method had the lowest precision and recall as the precision-recall graphs shows. Precision - Recall \u00b0 -T-test & \u2022 Wilcoxon? +\\ Permutation x v Linear regression o . G E E '\u2022 \u2022 v. Manova ; : V o \u2022 \u2022 -M: :C: $S1P^ ?i???i;; ?:;:;:?> : : : ? ; ? \" : ; \u2022 ;| 4 . 8 0 % \u2022 \u2022 j ; ' .p:P;:pRrecisibnp..?:5p '. : V. dp*?!, ..; Figure 4.14: Precision-Recall graph for markers expressed i n some patients at all time points 52 \f4.2.3.3. Differential Expression for All Patients at Some Time Points When the biomarkers planted in the test data sets are differentially expressed at only some o f the time points, all o f our filtering methods found only a few o f the planted biomarkers (see F i g . 4.15). Although permutation test found the most genes, even this method's performance was poor. The precision-recall graph, shown i n F i g . 4.16, confirmed our findings based on F i g . 4.15. If the differential expression was only present at some o f the time points, then none o f the filtering methods proposed by us performed well. Although linear regression and M A N O V A had 100% precision, these methods still had a weak performance, because the number o f true positives was very small, thus causing a very low recall. 60 Biomarkers for All Patients at Some Time Points Permutation\\ - \u2014 T-test - - Wilcoxon.tinear regression; GEE \u2014-.- Manova .: : ; : a. 60 biomarkers expressed in all patients at some time points 53 \f200 Biomarkers for All Patients at Some Time Points -:-'\u2022- P e r m u t a t i o n : ' ; . i - T - \\ T - t e s t ':'\u2022;,;\u2022\u2022 - -\"-Wilcoxon'-' ; \u2022-:'->-- L i n e a t r e g r e s s i o n \u2014 Manovai. ; \u2022:\u2022=) CO' \u2022\u2022MfflS-i'lp TTT; \u2022 60:' T-P - TT-\u2014~~ 1 r \u2014 90 \" 70;; ,100;' N u m b e r of t o p g e n e s \u2022 b. 200 biomarkers expressed in all patients at some time points Figure 4.15: Top k vs. number of true positives in A P - S T , cat. I l l Precision - Recall : '4p% ; ; ; ; :;60%; ; Precision: : : - Figure 4.16: Precision-Recall graph for biomarkers expressed i n A P - S T 54 \f4.2.4. Category IV of Questions This last category includes four questions: 1. A r e there genes with differential expression between control and rejection patients? 2. A r e there genes with differential expression between liver and kidney? 3. A r e there genes with differential expression between heart and kidney? 4. A r e there genes with differential expression between liver and heart? These four questions are really similar, they are asking for genes with differential expression between two groups o f patients. W e chose question one as the representative of this category and we applied the same 12 test data sets we used for the questions i n category I. T-test, W i l c o x o n , and permutation test were applied at each time point as i n category I, but the difference is that we did not care at how many and which time points the differential expression occurred. Rather, we were interested i n all genes with different expression levels between control and rejection patients. The way linear regression, G E E , and M A N O V A were applied is the same as for the questions i n category I. The cut-off for the genes o f interest was the same as for all o f the previous categories. 4.2.4.1. Differential Expression for All Patients at All Time Points When we examined the data sets with biomarkers expressed differentially for all patient at all time points, t-test, Wilcoxon, permutation test, and M A N O V A had all the perfect answers within the top k genes. I f the number o f biomarker genes was much bigger compared to the top k genes, then linear regression and G E E would eventually provide 100% results for number o f true positives versus top k (see Fig. 4.17). 55 \fi 00 Biomarkers for All Patients at All Time Points - \u2014 ' T-test, W i l c o x o n , P e r m i i t a t i o n . M a n o v a \u2022 Linear regression . : - ' - : G E E ; ' \u2014 60 70 n ~ ~ ~ so- HO :'.100: \u2022 N u m b e r of t o p g e n e s ' a. 100 biomarkers expressed i n all patients at all time points 200 Biomarkers for All Patients at All Time Points T-test, Wilcoxon; Permutation, Linear regressiori' Mariova : \u2022\"GEE .:\u2022 J:- . )\u2022 \u2022\u2022 ^JM:';\u2022.\u2022';:\u2022.'': .;I;?'o: \"TTT \u2022 \u2022:: \u2022 : 7 0 ; :: 60 : 1 70 : ;:;;:j; ; : '\". ;':'\u2022' 80 80\u2022; ' \u2022\u2022\u2022\u2022:\u2022!;;';;;: ;:';'9p':';*X;m;ml991-: M : : : ;; : .yNuiTibetoftdp.ge^ b. 200 biomarkers expressed i n all patients at all time points 56 \f500 Biomarkers for All Patients at All Time Points vT-t'est. Wilcoxon. Permutation. Linear regression, GEE Manova ; ec \" \u2022 so-\u2022\"\u2022 \u2022' 7o:,- : 90 ' 100 Number of top genes c. 500 biomarkers expressed i n all patients at all time points Figure 4.17: Top k vs. number o f true positives i n A P - A T , cat. I V W e found some interesting results from testing the filtering methods for this category o f questions. For example, t-test, W i l c o x o n , and permutation test provided many false positives. That is why, although they had 100% recall, their precision was very low, below 10% (see F i g . 4.18). This is partly due to the fact that we were interested i n differential expression at any, one or more time points, so genes with small differential expression at a single time point were also included i n the results. O n the other hand, M A N O V A had both perfect recall and precision. Linear regression had perfect precision, but only around 80% recall. G E E returned both false positives and false negatives, therefore had l o w precision and l o w recall. 57 \fPrecision - Recall .-T-test',.Wilcoxon Po.-ri.fatioi. Linear'regression: . GEE .-.'\u2022:\u2022:':\u2022'\u2022 fylahova'iv : ?o+ O ?.0% ? : \u2022'\u2022\u2022.\u2022 \" 2 0 % , : v - ' ^ 4 0 % '.\u2022 : ' , ' v \u2022\u2022\u2022'i :'.\u2022 1 ' 60% ,Z:80%V;:\/' \u2022\u2022 : . ; i o o % :'- '' Precision:?.: ?\u2022?\u00a5\u2022':\u2022 :?s;:?;:: '' : :: Figure 4.18: Precision-Recall graph for markers expressed i n all patients at all time points 4.2.4.2. Differential Expression for Some Patients at All Time Points When the biomarkers planted i n the test data sets were differentially expressed at all time points, but only for some o f the patients, M A N O V A performed differently from the other tests (see F i g . 4.19). W h i l e all o f the other tests found some o f the biomarker genes from all or most o f the 20 groups, M A N O V A only found the biomarkers expressed i n all patients. 58 \f100 Biomarkers for Some Patients at All Time Points Permutation< '!. ... \u2014 fT-test'iv : \":!;->' Wilcoxon;\\\/r ' Linear regression -T'GEEiv\u2014 \u2022 \" Manova.- v. \u2022' : : ;: : ~~1\u2014 TO- GO '.'\u2022\u2022\u2022 <' '100\": 90 .' : Number of top genes [ a. 100 biomarkers differentially expressed at all time points in some patients 500 Biomarkers for Some Patients at All Time Points : T-te'st. Wlcox6n';Permutati'o'n,Linearregression \u2022GEE::::^^ W Manova \u2022'; ; j :\u2022.''] : : SO 70 \u202280:: : 90 r : ; 100 Nuvbor oi top genes b. 500 biomarkers differentially expressed at all time points in some patients Figure 4.19: Top k vs. number of true positives in SP-AT, cat. IV 59 \fAlthough it seems from the top k versus true positive graphs (see Fig. 4.19) that GEE did the same thing as MANOVA, the poor performance of GEE was not due to the same issue. GEE assigned small p-values to many of the non-biomarker genes and, as a result, the true positives got a lower ranking and only a few made it to the top k. In contrast, while t-test, permutation, and Wilcoxon test had a lot of false positives (see Fig. 4.20), the top k for these methods included over 95% of all biomarkers for the first two methods, and over 80% for the third method. Precision - Recall :\u00b0 A +' :< O. y . crr- T-test Wilcoxon . ' . Permutation.. Linear regression GEE \/Manoya '. x tmtmM.'MAAWMMMM^MAW\u00a3$Qfflw&i'$f^ wiiA AAMyZMZAW: W-lil.PrecisionpispJP;; . ioo% P' i P Figure 4.20: Precision-Recall graph for markers expressed in some patients at all time points 4.2.4.3. Differential Expression for All Patients at Some Time Points While the performance of most of the filtering methods was not affected by having differential expression at not all, but only some of the time points, for MANOVA the performance changed quite a bit compared to the experiments with the biomarker being expressed at all time points. 60 \f60 Biomarkers for All Patients at Some Time Points: \u2014 T-test, Wilcoxon, Permutation - \u2022 .Linear regression:'\",.'\u2022:: - \u2022 Manova'' \u2022-\u2022'GEE 60 : :,<-, 70, 90\/ ybo: \"Number of top.genes\"; a. 60 biomarkers expressed in all patients at some o f the time points 200 Biomarkers for All Patients at Some Time Points :::'KS: Number of topVg'e'nes i : : b. 200 biomarkers expressed i n all patients at some o f the time points Figure 4.21: Top k vs. number o f true positives i n A P - S T , cat. I V 61 \fMANOVA found only about 60% of the true positives when the biomarker had differential expression in all patients but only at a few time points. T-test, Wilcoxon, and permutation test were the best methods to find all true positives (see Fig. 4.21). Even though t-test, Wilcoxon, and permutation test seemed to provide the perfect answers based on Fig. 21, when we studied the precision-recall graph we were able to notice these methods' fault. Namely, these threefilteringmethods had very low precision (see Fig. 4.22), due to the large number of false positives they returned. This problem was not detected by examining Fig 4.21, because the false positives had smaller p-values than the false negatives and, as such, the top k genes only included the true positives. Precision - Recall \u00b0 T-test, Wilcoxon + Permutation '\u2022 x Linear regression o GEE ' v Manova'' 1 of V ...O TTT 20% 40%:;-'-;J,;.;:.' v.p%; : : : 80% 100% Proc.sio ! - Figure 4.22: Precision-Recall graph for markers expressed in all patients at some time points Based on the experiments described in this chapter, we realized that some of the statistical methods' performance was different than expected. Also, even within a single category of questions, depending on how many patients have differential expression for a 62 \fgene and at how many time points, the methods performed differently. A l l these observations, and some other interesting findings, are described i n the next chapter. 63 \fChapter 5 Conclusion In the previous chapters we presented a few methods we believe are good for the filtering step o f analyzing a longitudinal microarray data set. W e described these statistical methods' strengths and weaknesses and, based on these properties, we mapped them to the biological questions they can answer. After we performed a few experiments using synthetic data sets, we found that some o f the filtering methods perform differently then expected. Based on our findings, we re-mapped the questions with filtering methods (see Table 5.1). W e expected some o f the tests to perform better. For example, G E E , a method designed for longitudinal data analysis, did not perform as well as we thought it would. We discovered that G E E found many false positives and false negatives, which decreases the value o f this statistical method for the filtering step. W e also found that t-test, Wilcoxon, and permutation test had a better performance than expected due to the high recall they provide. Although these methods are fast and easy to use, i n some cases, they might not be the best solution. Such is the case for questions i n category III, where there are many predictors to be considered i n the model. In these cases the methods have to be run for each combination o f pairs o f predictors and the results have to be combined. This process can become tedious and combining the results returned b y each pair might get complex. If there are many predictors for category III like questions, using M A N O V A for filtering might be a better solution. 64 \fCat. I II III IV Questions Methods 1. A t which one o f the time points does the D E start? 2. A r e there genes with D E from the moment o f transplantation? 3. W h i c h genes have D E between control and rejection at T i compared to T B ? 4 . W h i c h genes have D E between T and T n ? L 5. Are there genes with D E between liver, kidney, & heart? 6. Is the concentration o f the biomarker detected correlated with the severity o f the rejection? 7. Does gender, race, age, medication play a role in rejection? T-test, Wilcoxon, Permutation Type of Data AP-AT T-test, Wilcoxon, Permutation, MANOVA SP-AT Permutation, T-test, Linear regression AP-ST T-test, Wilcoxon, Permutation AP-AT T-test, Wilcoxon, Permutation, MANOVA SP-AT T-test, Wilcoxon, Permutation, Linear regression, GEE AP-AT T-test, Wilcoxon, Permutation, MANOVA SP-AT T-test, Wilcoxon, Permutation AP-ST Permutation AP-AT T-test, Wilcoxon, Permutation, MANOVA SP-AT T-test, Permutation, Wilcoxon AP-ST T-test, Wilcoxon, Permutation Linear regression, GEE, MANOVA 8. A r e there genes with D E between control & rejection? 9. Are there genes with D E between liver & kidney? All 1 0 . Are there genes with D E between heart & kidney? 1 1 . A r e there genes with D E between liver & heart? Table 5 . 1 : Revised question-method table 65 Revised Methods \fThere was another issue that came to our attention during the experiments. If the data set has genes differentially expressed at all time points, but in only some o f the patients, M A N O V A w i l l not find these genes. Therefore, M A N O V A is only recommended for the analysis o f these types o f data sets i f the purpose is finding biomarkers differentially expressed in all patients within a specific group. Furthermore, linear regression and G E E can find the genes with differential expression in the least number o f patients. For example, for category II type o f questions these tests can find genes with differential expression i n only 15% o f the rejection patients while t-test, W i l c o x o n and permutation test need this number to be above 70%. If the purpose o f the study is to find differentially expressed genes expressed i n a few o f the patients, then linear regression and G E E are the methods to be applied. O n the other hand, i f one is interested i n biomarkers expressed in all or most patients then the other statistical methods should be employed. We think that future research is needed to find statistical methods that can provide a high precision and recall for category III A P - S T type o f data. Even though i n Table 5.1 we listed permutation as the best method out o f the ones we investigated, this method did not perform well enough to find most biomarkers planted i n the data. For this type o f data all of the six methods had low recall and precision. Although the question-method mapping is specific to our project, Table 5.1 can provide a guideline for other studies with longitudinal microarray data. 66 \fBibliography 1. Benjamini Y , Hochberg Y . Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B Met. 1995; 57 (1): 289-300. 2. Clarke W , Silverman B C , Zhang Z , Chan D W , K l e i n A S , Molmenti E P . Characterization o f renal allograft rejection by urinary proteomic analysis. A n n Surg 2003; 237 (5): 660-4. 3. Diggle P J , Heagerty P, Liang K - Y , Zeger S L . Analysis o f longitudinal data. 2 n d ed. Oxford; N e w Y o r k : Oxford University Press, 2002. 4. Donauer J, Rumberger B , K l e i n M , Faller D , Wilpert J, Sparna T, Schieren G , Rohrbach R, Dern P, Timmer J , Pisarski P, Kirste G , W a l z G . Expression profiling on chronically rejected transplant kidneys. Transplantation. 2003; 76 (3): 539-47. 5. Dugre F J , Gaudreau S, Belles-Isles M , Houde I, R o y R. Cytokine and cytotoxic molecule gene expression determined i n peripheral blood mononuclear cells i n the diagnosis o f acute renal rejection. Transplantation. 2000; 70 (7): 1074-80. 6. Edgar R, Domrachev M , Lash A E . Gene Expression Omnibus: N C B I gene expression and hybridization array data repository. Nucleic A c i d s Res. 2002; 30(1): 207-10. 7. Epicenter Software. Genetrix 2004. Online. Available: 15 July 2005. 8. Fahmy N M , Y a m a n i M H , Starling R C , Ratliff N B , Y o u n g J B , McCarthy P M , Feng J, N o v i c k A C , Fairchild R L . Chemokine and receptor-gene expression during early and 67 \flate acute rejection episodes i n human cardiac allografts. Transplantation. 2003; 75 (12): 2044-7. 9. Hirano H , Tabuchi Y , Kondo T, Zhao Q L , Ogawa R, C u i Z G , Feril L B Jr, Kanayama S. Analysis o f gene expression in apoptosis o f human lymphoma U937 cells induced by heat shock and the effects o f alpha-phenyl N-tert-butylnitrone ( P B N ) and its derivatives. Apoptosis. 2005; 10 (2): 331-40. 10. Ihaka R, Gentleman R . R : a language for data analysis and graphics. J. o f Comp. and Graphical Stats. 1996; 5: 299-314. 11. Lachenbruch P A , Rosenberg A S , Bonvini E , Cavaille-Coll M W , C o l v i n R B . Biomarkers and surrogate endpoints i n renal transplantation: present status and considerations for clinical trial design. A m J Transplant. 2004; 4 (4): 451-7. 12. Liang K - Y , Zeger S L . Longitudinal Data Analysis U s i n g Generalized Linear Models. Biometrica. 1986; 73: 13-22. 13. L u c k o w B , Joergensen J, Chilla S, L i JP, Henger A , K i s s E , Wieczorek G , Roth L , Hartmann N , Hoffmann R, Kretzler M , Nelson P J , Perez de Lema G , Maier H , Wurst W , Balling R, Pfeffer K , Grone H J , Schlondorff D , Zerwes H G . Reduced intragraft m R N A expression o f matrix metalloproteinases M m p 3 , M m p l 2 , M m p l 3 and Adam8, and diminished transplant arteriosclerosis in Ccr5-deficient mice. Eur J Immunol. 2004; 34 (9): 2568-78. 14. Matsui Y , Saiura A , Sugawara Y , Sata M , Naruse K , Yagita H , Kohro T, Mataki C , Izumi A , Yamaguchi T, M i n a m i T, Sakihama T, Ihara S, Aburatani H , Hamakubo T, Kodama T, Makuuchi M . Identification o f gene expression profile i n tolerizing murine cardiac allograft by costimulatory blockade. Physiol Genomics. 2003; 15 (3): 199-208. 15. Shands HealthCare. Myocardial biopsy 02 Oct. 2003. Online. Available: 15 July 2005. 16. Scherer A , Krause A , Walker J R , K o r n A , Niese D , R a u l f F . Early prognosis o f the development o f renal chronic allograft rejection b y gene expression profiling o f human protocol biopsies. Transplantation. 2003; 75 (8): 1323-30. 68 \f17. Schoels M , Dengler T J , Richter R, Meuer S C , Giese T. Detection o f cardiac allograft rejection by real-time P C R analysis o f circulating mononuclear cells. C l i n Transplant. 2004; 18 (5): 513-7. 18. Schorge JO, Drake R D , Lee H , Skates SJ, Rajanbabu R, M i l l e r D S , K i m J H , Cramer D W , Berkowitz R S , M o k S C . Osteopontin as an adjunct to C A 1 2 5 i n detecting recurrent ovarian cancer. C l i n Cancer Res. 2004 M a y 15; 10 (10): 3474-8. 19. Tracey L , Villuendas R , Ortiz P, Dopazo A , Spiteri I, Lombardia L , RodriguezPeralto J L , Fernandez-Herrera J, Hernandez A , Fraga J, Dominguez O, Herrero J, Alonso M A , Dopazo J, Piris M A . Identification o f genes involved i n resistance to interferon-alpha i n cutaneous T-cell lymphoma. A m J Pathol. 2002; 161 (5): 1825-37. 20. T w i s k J W R . Applied longitudinal data analysis for epidemiology: a practical guide. Cambridge, U K ; N e w Y o r k : Cambridge University Press, 2003. 21. Vaes B L , Dechering K J , Feijen A , Hendriks J M , Lefevre C , M u m m e r y C L , Olijve W , van Zoelen E J , Steegenga W T . Comprehensive morphogenetic protein 2-induced osteoblast microarray differentiation analysis resulting o f bone in the identification o f novel markers for bone development. J Bone Miner Res. 2002; 17 (12): 2106-18. 69 ","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/hasType":[{"value":"Thesis\/Dissertation","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#dateIssued":[{"value":"2006-05","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt":[{"value":"10.14288\/1.0051174","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/language":[{"value":"eng","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline":[{"value":"Computer Science","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/provider":[{"value":"Vancouver : University of British Columbia Library","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/rights":[{"value":"For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https:\/\/open.library.ubc.ca\/terms_of_use.","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#scholarLevel":[{"value":"Graduate","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/title":[{"value":"Comparative study of statistical methods for finding biomarkers in longitudinal data","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/type":[{"value":"Text","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#identifierURI":[{"value":"http:\/\/hdl.handle.net\/2429\/17587","type":"literal","lang":"en"}]}}