"Science, Faculty of"@en .
"Computer Science, Department of"@en .
"DSpace"@en .
"UBCV"@en .
"Hollander, Zsuzsanna"@en .
"2010-01-06T00:34:39Z"@en .
"2006"@en .
"Master of Science - MSc"@en .
"University of British Columbia"@en .
"Solid organ transplantation is a common procedure for end-stage organ failure. After the\r\ntransplantation, the rejection of the new organ is possible due to the patient's immune\r\nsystem trying to eliminate the foreign object. To prevent rejection, monthly painful and\r\ncostly procedure is needed which involves taking a biopsy of the allograft. The purpose\r\nof our project is to find biomarkers based on blood samples, so the diagnosis/prognosis of\r\nthe rejection can occur based on a simple blood or urine sample. Up to eight blood\r\nsamples are taken from rejection and non-rejection patients and for each sample a\r\nmicroarray is created. The microarray data is longitudinal and contains 54,613 genes.\r\nThe analysis consists of normalization, pre-filtering, filtering, testing the candidate\r\nbiomarkers for diagnosis/prediction, pathway analysis, and biomarker validation. For our\r\ntype of data, the bottleneck and the most understudied step is the filtering.\r\nWe focused our research on finding possible filtering methods. We tested these methods\r\nagainst the questions our biologists wanted to get the answer for. We generated a data\r\nset, based on real data, to find the strengths and weaknesses of the filtering methods we\r\nproposed to use. We also tested which one of the filtering methods would provide the\r\nmost precise answer to each group of questions by creating synthetic data sets with a\r\nnumber of biomarkers planted in them.\r\nOur conclusion is that a statistical method, or group of methods, would not be able to\r\nprovide the perfect answer to all of our biological questions. That is why we created a\r\ntable where we matched our questions to methods that, based on our experiments, give\r\nthe best results. Also, we provided some advice on which methods perform better under\r\nspecific conditions."@en .
"https://circle.library.ubc.ca/rest/handle/2429/17587?expand=metadata"@en .
"COMPARATIVE STUDY OF STATISTICAL METHODS FOR FINDING BIOMARKERS IN LONGITUDINAL DATA by ZSUZSANNA HOLLANDER B.A., Academy of Economical Studies, 1998 B.Sc, The University of British Columbia, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA December 2005 \u00C2\u00A9 Zsuzsanna Hollander, 2005 Abstract Solid organ transplantation is a common procedure for end-stage organ failure. After the transplantation, the rejection of the new organ is possible due to the patient's immune system trying to eliminate the foreign object. To prevent rejection, monthly painful and costly procedure is needed which involves taking a biopsy of the allograft. The purpose of our project is to find biomarkers based on blood samples, so the diagnosis/prognosis of the rejection can occur based on a simple blood or urine sample. Up to eight blood samples are taken from rejection and non-rejection patients and for each sample a microarray is created. The microarray data is longitudinal and contains 54,613 genes. The analysis consists of normalization, pre-filtering, filtering, testing the candidate biomarkers for diagnosis/prediction, pathway analysis, and biomarker validation. For our type of data, the bottleneck and the most understudied step is the filtering. We focused our research on finding possible filtering methods. We tested these methods against the questions our biologists wanted to get the answer for. We generated a data set, based on real data, to find the strengths and weaknesses of the filtering methods we proposed to use. W e also tested which one of the filtering methods would provide the most precise answer to each group of questions by creating synthetic data sets with a number o f biomarkers planted in them. Our conclusion is that a statistical method, or group o f methods, would not be able to provide the perfect answer to all o f our biological questions. That is why we created a table where we matched our questions to methods that, based on our experiments, give the best results. Also , we provided some advice on which methods perform better under specific conditions. i i Contents Abstract i i Contents i i i List of Tables v List o f Figures v i . Acknowledgements v i i i 1. Introduction 1 1.1. Description of Project 1 1.2. Description of Data to be Analyzed 2 1.3. Challenges 3 1.4. Description of the Steps of Our Analysis 5 1.5. Description of Biological Questions 7 2. Review of Methods Currently Used for Finding Biomarkers 9 2.1. Types of Studies 9 2.1.1. Studies with Multiple Time Points and Small Number of Patients 9 2.1.2. Studies with Small Number of Genes 10 2.1.3. Studies with One Time Point 10 2.2. Summary and Critique 13 3. Filtering Methods for Finding Candidate Biomarkers 15 3.1. Description of the Filtering Methods , 15 3.1.1. Time Not Included 16 3.1.2. Time Included 19 i i i 3.2. From Biological Questions to Statistical Questions 24 3.3. Question to Method Mapping 25 4. Validation of Method - Question Correspondence 27 4.1. Description of Test Data Sets 27 4.2. Experiments Performed and Results 29 4.2.1. Category I of Questions 31 4.2.1.1. Differential Expression for A l l Patients at A l l Time Points 32 4.2.1.2. Differential Expression for Some Patients at A l l Time Points 35 4.2.1.3. Differential Expression for A l l Patients at Some Time Points 38 4.2.2. Category II of Questions 40 4.2.2.1. Differential Expression for A l l Patients at Both Time Points 41 4.2.2.2. Differential Expression for Some Patients at Both Time Points 43 4.2.3. Category III of Questions 46 4.2.3.1. Differential Expression for A l l Patients at A l l Time Points 47 4.2.3.2. Differential Expression for Some Patients at A l l Time Points 50 4.2.3.3. Differential Expression for A l l Patients at Some Time Points 53 4.2.4. Category IV of Questions., 55 4.2.4.1. Differential Expression for A l l Patients at A l l Time Points 55 4.2.4.2. Differential Expression for Some Patients at A l l Time Points 58 4.2.4.3. Differential Expression for A l l Patients at Some Time Points 60 5. Conclusion 64 Bibliography 67 i v List of Tables 1.1 List of biological question 8 3.1 Possible tests for the selection of candidate biomarkers 15 3.2 Categories of questions 25 3.3 Questions and statistical methods that can help answer them 26 4.1 Types o f data sets and their symbols 28 4.2. Positives and negatives 30 5.1 Revised question-method table 65 v List of Figures 1.1 Steps of statistical analysis for finding biomarkers 5 2.1 Common steps of statistical analysis for finding biomarkers ....11 3.1 Strengths of t-test, Wilcoxon, and permutation test 17 3.2 Weaknesses of t-test, Wilcoxon, and permutation test .....18 3.3 Strengths of linear regression and G E E 20 3.4 Weaknesses of linear regression and G E E 21 3.5 Strengths of M A N O V A 22 3.6 Weaknesses of M A N O V A '. 24 4.1 Top k vs. number of true positives in A P - A T , cat. 1 34 4.2 Precision-Recall graph for biomarkers expressed in all patients at all time points 35 4.3 Top k vs. number of true positives in S P - A T , cat. 1 36 4.4 Precision-Recall graph for markers expressed in some patients at all time points 37 4.5 Top k vs. number of true positives in A P - S T , cat. I 39 4.6 Precision-Recall graph for markers expressed in all patients at some time points 40 4.7 Top k vs. number of true positives in A P - A T , cat. II 42 4.8 Precision-Recall graph for biomarkers expressed in all patients 43 4.9 Top k vs. number of true positives in S P - A T , cat. II 45 4.10 Precision-Recall graph for biomarkers expressed in some patients 46 4.11 Top k vs. number o f true positives in A P - A T , cat. I l l 49 4.12 Precision-Recall graph for biomarkers expressed in all patients at all time points ...50 4.13 Top k vs. number of true positives in S P - A T , cat. I l l 51 v i 4.14 Precision-Recall graph for markers expressed in some patients at all time points....52 4.15 Top k vs. number of true positives in A P - S T , cat. I l l 54 4.16 Precision-Recall graph for biomarkers expressed in A P - S T 54 4.17 Top k vs. number of true positives in A P - A T , cat. IV 57 4.18 Precision-Recall graph for markers expressed in all patients at all time points 58 4.19 Top k vs. number of true positives in S P - A T , cat. IV : 59 4.20 Precision-Recall graph for markers expressed in some patients at all time points....60 4.21 Top k vs. number of true positives in A P - S T , cat. IV 61 4.22 Precision-Recall graph for markers expressed in all patients at some time points....62 Vll Acknowledgements First of all , I would like to thank my supervisor, Dr. Raymond N g , for all the invaluable advice and guidance he provided during my graduate studies. Also , many thanks for introducing me into the rewarding field of health informatics. Without his help I would not been able to be part of the Biomarkers in Transplantation project. I would also like to thank my second reader, Dr. Ruben Zamar, for reviewing my thesis and for all the insightful advice. Furthermore, I would like to thank Dr. Lang W u for all the information he provided me regarding longitudinal data analysis. Moreover, many thanks go to Dr. Holger Hoos, Elaine Chang, Timothy Chan, and all o f my lab members. Many thanks to my Biomarkers in Transplantation team members, Dr. Bruce McManus, Dr. Paul Keown, Dr. Rob McMaster, Janet McManus, Rob Balshaw, Dr. Gabriela Cohen Freue, Stephane Vannier, Martha Casey-Knight, A x e l Bergman, Dr. A l i ce M u i , Dr. Chris Ong, Pooran Qasimi, Jon Carthy, Ben Good, and everybody else on the team. Last, but not least I would like to thank my father and my husband for all the encouragement and help. Without my husband's, Laszlo, support I would not have gone back to school and would not been able to find this great new profession. Thank you and eternal love. In memory o f my mother, Puskas Maria , who taught me from an early age that with hard work everything is achievable. v i i i Chapter 1 Introduction 1.1. Description of Project Transplantation is a common procedure for patients with end-stage heart, kidney, or liver failure. Once the organ transplantation is performed, the rejection o f the allograft (implanted organ) is possible. Rejection occurs i f and when the patient's immune cells realize the presence of a foreign organ and try to destroy it. Periodic checking for signs of chronic or acute rejection has to be performed, so patient's immunosuppressor drug dosage can be adjusted to save the allograft from being destroyed. Biopsy of the allograft is the common method for the diagnosis of rejection. This involves the advancing of a thin tube to the transplanted organ, for example the heart, by insertion into a vein in the neck or groin and threading through blood vessels into the heart of transplant patients [15]. A t the end of the tube there is a bioptome that takes a small sample of the allograft tissue. This procedure is not only expensive but also painful and potentially harmful for the patient (sometimes causing even death) [2]. These are the main reasons why there is a need for a non-invasive and cost effective procedure for diagnosis/prediction of organ allograft rejection. Such a procedure could possibly be as simple as a blood test. Our Biomarkers in Transplantation project's focus is to find biomarkers of organ transplant rejection based on blood samples from patients with and without rejection. B y biomarker we mean a gene (or protein), which, i f 1 expressed at a certain level, indicates the occurrence of allograft rejection. These biomarkers should be differentially expressed (DE) in the rejection and non-rejection patients. This way, by looking at a few genes' expression level of a patient, the clinician could determine i f that patient is, or w i l l be, rejecting the transplanted organ. If we can find genes that are involved in the rejection process and which are good predictors/diagnostic tools, then the periodic biopsies can be replaced with a relatively painless and inexpensive procedure. The biomarker genes could even serve as drug targets for treating patients with organ transplant rejection. If biomarkers can be found, then the diagnosis/prediction could be performed in the future by taking a small amount of blood or urine sample from the patient with organ transplant and checking the biomarker genes. Depending on the expression level of these genes, the clinician could tell i f rejection has occurred or w i l l occur in the next few weeks/months. 1.2. Description of Data to be Analyzed To be able to find biomarkers in blood, biopsy and blood samples are taken from about 100 patients with organ transplants. The biopsies are obtained as part of the regular check-ups each transplant patient has to attend. Based on these tissue samples the clinician can tell i f the patient is experiencing rejection. Blood samples are taken from, each patient right before and periodically after the transplantation. U p to 8 samples are sent for microarray analysis for each rejection patient, where the last sample is the one when rejection is diagnosed for the first time. There is a control group of patients whose blood samples are also collected. This group consists of transplant patients who do not develop rejection within the time period of sample collection. Since the data set consists of samples collected from multiple patients at multiple time points, the data to be analyzed is longitudinal. The analysis of this data is complex because each o f the samples consists of 54,613 genes. Due to the size of the data set, the traditional statistical methods used for longitudinal data analysis cannot be applied. 2 Therefore, some pre-processing is needed to eliminate the genes that we know cannot qualify as biomarkers. The question is what statistical methods should be used to filter out the non-biomarker genes. We have to make sure that no gene with potential predictive/diagnostic properties is eliminated, since once a gene is eliminated it w i l l no longer be considered in the analysis. The purpose of our research is to find appropriate methods for the analysis of the longitudinal data with a focus on the filtering process. The blood samples are still being collected and, as a result, we have no real data to work with. This is why we created some test data sets that are based on the properties of the preliminary data we have. With the help of these test data sets, we were able to investigate the filtering methods to ensure that only genes with non-desired properties are eliminated. 1.3. Challenges In order to see what the common practice is in the analysis of longitudinal microarray data, we looked for papers describing the steps for finding biomarkers in this type of data. To the best of our knowledge, there are no studies with the same scope and data types. There are only a few papers that deal with longitudinal data, either because a study with multiple patients and multiple time points is still too costly or because for some disease the tracking of gene expression patterns over time is not essential. Out of these few longitudinal studies, some look at the expression of only those previously known biomarkers [8 and 18]. Others analyze one time point, and then verify the candidate biomarkers found at that one time point, by clustering the candidate biomarkers one time point at a time and comparing the results [14]. In some other studies, the purpose is not of finding biomarkers. If one is looking at only the previously known biomarkers then the analysis is different than the analysis o f the whole microarray. This is because in the first case one deals with fewer than 10 genes and, as a result, all traditional statistical methods built for the analysis of longitudinal data work. In the later case, one has to look 3-at tens o f thousands o f genes and, as we noticed, the traditional methods cannot be applied anymore. We surveyed biomarker studies that analyze microarray data for multiple patients and only at one time point to see what the steps for finding biomarkers are. W e noticed a pattern among most papers (see details in Chapter 2). The common steps applied are normalization, pre-filtering, finding differentially expressed genes, and clustering. The first step, normalization, is applied to have expression values for genes from different samples comparable. During the pre-filtering step, the genes with undesirable qualities are eliminated from further analysis. The common method used in most studies, for finding the differentially expressed genes, is the t-test. Clustering, the last step of the analysis, is based on the candidate biomarker genes found in the previous step. We tried to use these steps as the base for our analysis. We changed the usual t-test applied in the filtering step to G E E (Generalized Estimation Equations) [12], a method developed for the analysis of longitudinal data. The model that we had to fit with this method is the following: Y s =p o +X f i l p i +X i j 2 P 2 +. . . + X i j k P k +e i j ' where Yy is the outcome (the absence or presence of rejection) for patient i at time point j , Pk is the regression parameter for predictor k, and Xyk is the predictor k (expression of gene k) for patient i at time point j . B y fitting this model, G E E would show which genes are playing a role in differentiating the expression pattern over time between the control and rejection groups. In other words, G E E would give us candidates for biomarkers. We applied G E E to our test data and realized that G E E was not built for the analysis of thousands of predictors. In a usual statistical setting, it would be used for analyzing no more then a dozen predictors. Even when used for the analysis of biological data, the usual setting would be to use the clinical data (e.g. age, gender, medication, etc.) as predictors, which again is limited in number. We realized that we cannot apply G E E , for fitting the above model, in our analysis, or at least not before filtering out some more genes, because of the degree of freedom problem. In order for G E E to be applied, based 4 on the number of patients and time points we have in our data set, the maximum number of genes we can use as predictors is no more than 400. number, of genes < number of patients * number o f time points A s a result, we came to the conclusion that there has to be a filtering step before we can apply G E E or survival analysis to find the biomarker. 1.4. Description of the Steps of Our Analysis The first 3 steps of analysis performed in our study are similar with the ones commonly used in this field (see Fig . 1.1). We have some extra steps that are needed for validation of the prediction/diagnostic power o f the candidate biomarkers obtained after the filtering process. Normalization Pre-filtering Filtering Biomarker validation Figure 1.1: Steps of statistical analysis for finding biomarkers 5 Normalization is needed in a microarray study to make the expression values for genes from different samples comparable. The normalization is performed with Genetrix, a microarray analysis software developed by Dr. Jonathan Buckley at the University of Southern California, Los Angeles [7]. The normalization is achieved by weighting the score for each gene for each sample by using the P C A method. Genetrix incorporates the Probe Profiler package (developed by Corimbia Inc.), and P C A is a Probe Profiler algorithm. Following the normalization, genes with negative expression value across all samples (control and rejection) are eliminated because negative expression value means that the gene is not activated. The genes with similar expression value across all patients and all time points are also eliminated from the analysis. The reason for this is that a biomarker gene should have different expression levels for the rejection and non-rejection patients, and they should also have some between time point variations. According to our previous experience with microarray data, the pre-filtering w i l l reduce the number of probe sets from over 50,000 to about 10,000. Various statistical methods are applied for the filtering (see section 3.2.) to further reduce the number of genes to about 100 (which we call top k). The purpose o f this step is to find differentially expressed genes between the two groups o f patients. The remaining 100 genes are called candidate biomarkers. Following the filtering step, pathway analysis is performed, during which we find those genes that regulate or are regulated by our candidate biomarkers. It is important to identify all genes that might be involved in the rejection so that the ones that have the best predictive/diagnostic power can be pinpointed. The genes up- and downstream from the candidate genes are added to the list of candidate genes, and are further analyzed with traditional statistical methods used for longitudinal data analysis. These statistical techniques are G E E and survival analysis methods, and are applied to check which one of the candidate biomarkers can be used for prognosis and/or diagnosis. A t the end of this step, we are left with less than 10 genes which constitute the biomarker genes. 6 Another pathway analysis step follows, where the pathways that the biomarkers are involved in are found. A t this stage more biological information can be found about the biomarkers, and i f there are a few of them involved in the same part of a pathway then a possible drug target might be located. This information could be used by a drug company in prevention of rejection, after the biomarkers and the drug target are biologically validated. Our research focuses on the filtering step because this is the bottleneck of the analysis and is one of the most important steps of the whole analysis process. Also , this is the step that is not performed in the related studies. We want to give a list o f methods that one could use in the filtering phase, and also show which method should be used in what situation. The reason for the comparative study of the filtering methods is to provide guidelines for researchers who want to analyze similar type of data to find biomarkers for a certain medical condition. 1.5. Description of Biological Questions To be able to find the best analysis strategies for our data set we asked our biologists to provide us with questions they would like to get the answers to within our Biomarkers in Transplantation project. We collected these questions and created a list (see Table 1.1). The remaining of the thesis is organized as follows: Chapter 2 describes the methods currently used for finding biomarkers. In chapter 3 we categorize the biological questions and map them to filtering methods. In chapter 4 we describe experiments performed on synthetic data and our findings. We provide our conclusions in chapter 5. 7 Biological Questions 1 A t which one of the TPs does the D E start? 2 Are there genes with D E from the moment of transplantation? 3 Which genes have D E between control and rejection at the first time point after transplantation (Tj) compared to time point before transplantation ( T B ) ? 4 Which genes have D E between last (T L ) and one before last time points (T L-i)? 5 Are there genes with D E between liver, kidney, & heart? 6 Is the concentration of the biomarker detected correlated with the severity of the rejection? 7 Does gender, race, age, medication play a role in rejection? 8 Are there genes with D E between control & rejection? 9 Are there genes with D E between liver & kidney? 10 Are there genes with D E between heart & kidney? 11 Are there genes with D E between liver & heart? Table 1.1: List of biological questions 8 Chapter 2 Review of Methods Currently Used for Finding Biomarkers 2.1. Types of Studies We searched the literature for studies with similar purpose and data type as our Biomarkers in Transplantation project. We realized that, while there are a few studies that have some similarities with our project, none of them have exactly the same scope and data type. What this means is that either the purpose of the study is not to find biomarkers or the data under analysis does not have the same properties. For the papers we found to analyze longitudinal microarray data, the purpose of the research was not finding biomarkers, but getting a better understanding of a specific condition. 2.1.1. Studies with Multiple Time Points and Small Number of Patients In the studies by Vaes et al., Tracey et al., and Hirano et al. [ 9, 19, and 21], where the number of samples per time point and condition is very limited, between 1 and 10, the method of choice for finding genes of interest is fold change. Usually, genes with fold change smaller than 1.5 [9] or 2 [19 and 21] are excluded from the study. This means that i f the gene expression level change between control and the studied condition is less than 1.5 (or 2) then that gene can not be used as biomarker of that specific condition. 9 Depending on the number of genes remaining, some look at only the genes with the largest fold change [9] or use clustering,[19] and function of genes [21] to narrow down the number of genes that could be considered markers of the disease under study. 2.1.2. Studies with Small Number of Genes Many studies hypothesize that genes known from the literature, as related to the condition under study, might be helpful in the diagnosis of that condition. In these cases, a microarray study with tens of thousands of genes is not necessary; the method of choice is R T - P C R , or some other version of P C R . P C R is a technique for rapidly creating copies of a specific segment of D N A . This technique enables the genetic profiling of samples containing very small amounts o f D N A . The advantage of P C R is that it can be performed in the lab and is cost effective. Also , the researcher can measure only the expression level of the genes of interest. Shoels et al. [17] analyze 39 genes with R T -P C R , by using Mann-Whitney U-test, to find differentially expressed genes between the groups of patients. The next step is discriminant analysis, to find genes that discriminate between the different conditions after which, by applying receiver operating characteristic (ROC) analysis, the optimal cut-off is evaluated and applied in the discriminant analysis until the optimal R O C cut-off is achieved. In another similar study [5], the number of genes analyzed is 10, the number of patients is 13 control and 8 diseased. The statistical analysis method of choice is t-test. Expression levels at the time of appearance, o f disease are compared with the baseline expression level and a comparison between the control and disease group is also performed. 2.1.3. Studies with One Time Point Based on papers published on biomarker studies with one time point [4, 13, 14, and 16], the trend of statistical analysis can be summarized in a few major steps: normalization, pre-filtering, identification and analysis of differentially expressed genes using t-test, and clustering procedures (see Fig . 2.1). 10 Normalization Pre-filtering Finding D E genes (t-test) Clustering D E genes Figure 2.1: Common steps of statistical analysis for finding biomarkers A s in any microarray analysis, the first step o f the statistical analysis for finding biomarkers consists of the normalization of data. A global normalization is performed by setting the mean value [13, 14, and 16] or the median and unit variance value [4] to a value fixed by the researcher. In the case of c D N A microarray analysis, a correction of dye-specific effects and the intensity dependent bias is also performed [4]. The normalization is followed by the pre-filtering process, where genes are eliminated i f they have undesirable qualities, such as expression value close to that of the background, small fold difference between expression value of rejection and control samples [4, 14]. The following step is to identify genes with differential expression, out of the ones remaining after pre-filtering. The most commonly used test for finding these genes is the t-test [4, 13, 14, and 16]. Some researchers prefer to use Wilcoxon and quantile-based tests as well [16]. The genes, identified as differentially expressed, are clustered using hierarchical clustering. If the samples within the same group are in the same cluster and separate from the other group then it confirms that the identified genes are correlated with the allograft rejection. To provide some detail on how the steps described in Figure 2.1 are performed the description of the analysis of choice for two papers follows. 11 Scherer et al. [16] analyze a sample of 17 transplant patients (9 of the patients developed rejection within 6 months after the sample was taken and 8 did not) to find predictors o f renal allograft rejection. They normalize the data set, consisting of 12,625 probes, by scaling the mean value of the 96% quantile of the probes on a chip to .200. Further analysis is performed in S-Plus and GeneSpring, and comprises of setting the really small average difference values to 10 and eliminating the probe sets that do not satisfy some preset conditions. These conditions are the following: both groups of patients' data contain at least 40% values above 50, and the whole data set contains at least 50% of values above 10. To find the differentially expressed genes Student's t-test, Wilcoxon test, and quantile-based tests are performed. Ten genes are identified by all o f the above tests and, as a result, they are considered to be the biomarkers. Two kinds of tests are performed to assess the prediction power of the ten genes. In one o f the tests, by looking at the expression of the ten genes in 16 samples, the authors aim to predict the 17 t h patient's outcome. The other test is done by looking at a patient's data, which was not included in the analysis, and predicting the outcome based on the expression o f the biomarkers. Both tests validate the prediction power of the identified genes. Donauer et al. [4] analyze kidney transplant data to identify genes based on which one can tell i f a patient has chronic allograft rejection. They first perform a global normalization on the data set consisting of 13 chronic allograft rejection, 16 normal, and 12 end-stage polycystic kidney ( P K D ) samples. Genes with expression levels not exceeding three-fold over the background in over 80% samples within a group of patients are eliminated. After this pre-filtering 2,190 genes remain for statistical analysis. To find differentially expressed genes, they use two-sample t-tests, with p-values adjusted as per Benjamini and Hochberg method [1] to control the false discovery rate. They also perform a hierarchical cluster analysis for the 517 genes with significant difference between transplants and normal tissue (p-value < 0.0003), or transplants and P K D tissue (p-value < 0.05). The purpose o f the clustering is to check if, based on the candidate biomarkers, the different groups of samples would cluster in different groups. They find two major branches in the dendrogram, one containing all normal samples and the other 12 containing kidneys with end-stage renal disease (ESRD). This last branch is further divided into two major branches that can not be explained by clinical or histological data. To identify biomarker genes specific for chronic transplant rejection, they evaluate the set of genes that were significantly different from normal and P K D (p-value < 0.05) and further eliminate those that were less than 1.5 fold up- or down-regulated. The remaining 27 genes are considered to be characterizing the kidney allograft rejection. To understand more about these genes they are annotated according to their biological functions. A similar analysis is performed to identify genes characteristic for E S R D . 2.2. Summary and Critique A s we stated before, to the best of our knowledge, there are no studies with the same purpose and data type as our project. Because of this, the type of analysis performed is also different. One situation is when the data has multiple time points, but only one or two patients. In this case, only very simple methods can be used in the analysis, like ratios. Another situation is when researchers have an idea about what the biomarkers would be for a certain condition and they only look at those genes' expression by using some version of P C R , not microarrays. This means the number o f genes to be analyzed is only about a dozen. Moreover, there are situations when only the clinical data is measured over time not the gene expression, which again means only about a dozen of predictors. The analysis of a handful or a few hundred genes is different from how one would analyze tens of thousands of genes. In both of these last two cases the traditional statistical methods designed for longitudinal data analysis w i l l work perfectly. Furthermore, most biomarker studies have only one time point, which requires different statistical analysis methods than our longitudinal data set. For the papers we found to analyze longitudinal microarray data, the purpose of the research was not finding biomarkers, but getting a better understanding of a specific condition. In this case, again, the steps of the analysis are different from our project's because the aim of the study is not the same. 13 B y reviewing the statistical analysis of microarray data with one time point we found some common steps followed by all the papers. However, the statistical methods used in each step are not always the same. In general, a cluster analysis is performed after the differentially expressed genes are identified [4, 13, 14, and 16]. Some studies compare clusters at different time points as well [14]. We believe that, while clustering is useful in microarray data analysis, it should be mostly used for hypothesis generating purposes. In general, the goal of the papers reviewed was to find biomarkers for allograft rejection. A s pointed out by Lachenbruch et al. [11] it is not enough to discover the genes correlated to allograft rejection. One has to also make sure that these genes can predict the occurrence of rejection, meaning that change in the expression level of these genes w i l l result in change in the outcome of organ transplant. In our case, after the biomarkers are identified, according to Lachenbruch et al., they should be tested to see i f they can be applied in prediction/diagnosis. The method of choice to find the genes that can predict the outcome can be multiple regression, and/or survival analysis, and variations of these methods [11]. Although the methods listed by Lachenbruch et al. are mentioned as techniques for prediction based on clinical data, they can be employed for microarray data as well . 14 Chapter 3 Filtering Methods for Finding Candidate Biomarkers 3.1. Description of the Filtering Methods Based on previous experience, we know that most of the genes are not expressed within a certain type of tissue sample. We estimate that we w i l l have about 10,000 genes at the end o f the pre-filtering step. Due to the still large number of genes, we can not apply the traditional statistical methods to analyze our longitudinal data set. However, we can devise algorithms based on existing statistical methods that w i l l help us narrow the search for biomarkers. These methods can be categorized based on their properties of being able to handle the analysis of data with multiple time points or not. Time not included Time included 1. T-test 2. Wilcoxon test 3. Permutation test 4. Linear regression 5. G E E 6. M A N O V A Table 3.1: Possible tests for the select ion of candidate biomarkers 15 In Table 3.1 we listed the methods that may help us achieve the objective of this study. Next, we described each group of filtering methods and listed their strengths and weaknesses, based on the statistical properties of these methods. The filtering methods were tested with the help of a semi-synthetic data set. The test data set was generated based on a rat spinal cord injury microarray data (http://www.ncbi.nlm.nih.gov/projects/geo/gds/gds_browse.cgi?gds=259) downloaded from the G E O data repository [6]. The original data set contained only 4 samples per time point, and we needed a higher number of subjects in order to test our methods. A 25 subject data set was generated with the mvrnorm function in R [10], based on the mean and the covariance matrix of the four rat samples. This new data set has multivariate distribution with the same mean and variance as the original rat microarray data. The moderate spinal cord injury was considered to be the control and the severe condition as rejection. A s a final result, our test data set contains 8791 genes, 4 time points, and 25 subjects in each group (control/rejection). 3.1.1. Time Not Included The three statistical tests in the \"time not included\" group (see Table 3.1) can help determine i f there is a significant difference between the expression level of a specific gene for two groups of subjects. These tests can be tailored to find genes that have differential expression at all time points, at some of the time points, or only at specific time points. The t-test assumes that the data to be analyzed is normally distributed. Wilcoxon and permutation tests are both non-parametric methods. Whi le the Wilcoxon test is based on the ranking o f the individual difference scores, the permutation test determines the statistical significance of the experimental outcome based on randomizing the two groups many times. Iterations are needed to assure every possible combination of the drawn data is attained. Based on our tests with the semi-synthetic data we learnt that t-test, Wilcoxon test, and permutation test are good at finding genes that are differentially expressed in two 16 different groups of subjects at all (see Fig. 3.1.a) or some of the time points (see Fig. 3.1.b). DEat All Time Points 2.5 \u00E2\u0080\u0094 2 1.5 a. x a> a> c a> D) 2 3 Time points a. DE at all time points DE at One Time Point 2.5 -r- 2 C L X 0) c 0! 1 2 3 4 Time points - * \u00E2\u0080\u0094 Rejection -\u00E2\u0080\u00A2- - - Control - \u00E2\u0080\u00A2 \u00E2\u0080\u0094 Rejection - o- \u00E2\u0080\u00A2 - Control b. DE at only one time point Figure 3.1: Strengths of t-test, Wilcoxon, and permutation test 17 These three tests are not able to differentiate between different trends for the different groups of subjects (see Fig. 3.2.a) and are weak at differentiating between similar trends of gene expression over time (with different intercepts) for the two groups of subjects (see Fig. 3.2.b). Different Slopes 2.5 c 2 .2 to g) 1.5 C L x 0) -I 3 0.5 CO 5 0 1 2 3 Time points \u00E2\u0080\u00A2 Rejection \u00E2\u0080\u00A2 Control a. different slope of gene expression over time for the two groups Similar Slopes & Different Expr. Levels 2.5 .2 2 in in \u00C2\u00A3 1.5 Q . X a 1 | 0.5 c? 0 - Rejection \u00E2\u0080\u00A2 Control c. differential expression between the groups at the last time point Figure 3.4: Weaknesses of linear regression and G E E 21 The typical question M A N O V A can answer is: \"Does the gene expression change over time?\" The power of this method comes from the fact that besides the time points, the groups or other variables (e.g. age, gender) can also be considered in the model. M A N O V A is good at finding genes with differential expression pattern over time between control and acute rejection subjects (like GEE). Unlike GEE, M A N O V A is not restricted to finding genes with a certain expression pattern (like linear, exponential, etc.). M A N O V A can find genes with any expression pattern over time (see Fig. 3.5). Non-linear Model Rejection Control 1 2 3 4 Time points Figure 3.5: Strengths of M A N O V A M A N O V A is weak at finding only those genes with differential expression at specific time point(s), like GEE, (see Fig. 3.6.a). M A N O V A is also weak at finding only genes with a specific slope or intercept, unlike GEE (see Fig. 3.6.b\u00E2\u0080\u00943.6.d). For example, M A N O V A can find genes that have upward and downward slopes, but i f somebody is only interested in genes with upward slope M A N O V A can not differentiate these from the genes with downward slope. 22 DE at Last Time Point 3 -,-0 2.5 -\"1 2 -1 1 5 -a> o> 1 -I 0.5 -1 2 3 4 Time points a. differential expression between the groups at the last time point Rejection Control Similar Slopes & Different Expr. Levels b. similar, upward, slope for the two groups of subjects Different Slopes & Same Intercept Time points c. different slopes and same intercept for the two groups o f subjects 23 Different Slopes & Intercept 2.5 n c .2 2 -CO \u00C2\u00A3 1.5 - Rejection Q. Control CD o 0 1 \u00E2\u0080\u00A2 2 3 4 Time points d. different slopes and intercepts for the two groups of subjects Figure 3.6: Weaknesses o f M A N O V A 3.2. From Biological Questions to Statistical Questions After reviewing the strengths and weaknesses of all o f the statistical methods listed in Table 3.1, we realized that one test, or one group of tests, is not able to answer all o f the biological questions from Table 1.1. B y mapping the biological questions to statistical questions, we recognized that the 11 questions can be placed in four categories (see Table 3.2). Category one includes questions which inquire about two groups of patients being differentially expressed at specific time points. The questions in the second category are comparing expression levels between two time points. In the next category of questions three or more groups are compared. The last category comprises of questions where differential expression between two groups o f patients is of interest. In this last case, the differential expression can occur at any or all time points. 24 Category Questions I 1. A t which one of the time points does the D E start? 2. Are there genes with D E from the moment of transplantation? II 3. Which genes have D E between control and rejection at T i compared to T B ? 4. Which genes have D E between Ti_ and T n ? III 5. Are there genes with D E between liver, kidney, & heart? 6. Is the concentration of the biomarker detected correlated with the severity o f the rejection? 7. Does gender, race, age, medication play a role in rejection? 8. Are there genes with D E between control & rejection? 9. Are there genes with D E between liver & kidney? 10. Are there genes with D E between heart & kidney? 11. Are there genes with D E between liver & heart? Table 3.2: Categories of questions 3.3. Question to Method Mapping Once we categorized the questions we were able to map questions with statistical methods that can help answer them (see Table 3.3). The matching between questions and filtering methods is based on the strengths and weaknesses of the filtering methods listed in this chapter. We believe t-test, Wilcoxon, and permutation test would provide the best results for the first two questions in table 3.3 because these three methods are able to find differential expression at specific time points. These three tests along with M A N O V A would provide the most precise answer to the next two questions. Since linear regression and G E E work better with more time points we believe they would not be able to perform as well as the other methods, due to the fact that we are comparing only two time points. For questions 5, 6, and 7 linear regression, G E E , and M A N O V A would be the best fit, for the reason that we have more than two predictors in the model. 25 For t-test, Wilcoxon, and permutation test we have to run each combination of two predictors separately and then combine the answers. This can become cumbersome as the number of predictors increases. A s far as the last four questions go, we believe that all methods would be able to provide a good answer. Cat. Questions Methods I 1. A t which one of the time points does the D E start? 2. Are there genes with D E from the moment of transplantation? T-test, Wilcoxon, Permutation test II 3. Which genes have D E between control and rejection at T i compared to T B ? 4. Which genes have D E between T L and TL_I? T-test, Wilcoxon, Permutation test, M A N O V A III 5. Are there genes with D E between liver, kidney, & heart? 6. Is the concentration of the biomarker detected correlated with the severity of the rejection? 7. Does gender, race, age, medication play a role in rejection? Linear regression, G E E , M A N O V A IV 8. Are there genes with D E between control & rejection? 9. Are there genes with D E between liver & kidney? 10. Are there genes with D E between heart & kidney? 11. Are there genes with D E between liver & heart? A l l Table 3.3: Questions and statistical methods that can help answer them 26 Chapter 4 Validation of Method - Question Correspondence 4.1. Description of Test Data Sets A s we described in the previous chapter, we found the strengths and weaknesses of the filtering methods using a semi-synthetic data set. However, we could not test the number of true positives, false positives, and false negatives returned by these methods using the same data set. This is due to the fact that it is almost impossible to know the exact number of differentially expressed genes in a real data set. A s we stated before, our semi-synthetic data was generated based on a real data set. To test i f our method-to-question mapping is correct, we created synthetic data sets. In these data we planted differentially expressed genes, which allows us to see how each method performs in terms of finding the biomarkers. A l l o f the test data sets have 10,000 genes and 100 patients per medical condition. Patients have variable number of time points with minimum of three and maximum of 8 time points. Gene expression values vary between 0 and 15. To make our data closer to a real data set, we had to add some variation within patients at a specific time point and variation between time points for all genes. This was necessary since, as we saw in real data sets, there is always some within and between variation for some of the genes. Given that we were testing our filtering methods, we assumed that genes with no variation across all time points and samples are eliminated; hence all our genes have within and/or between variations. We achieved this by 27 randomly generating the within variation (between zero and one) and between variation (zero to two) for all patients at all time points. A l l these numbers are values derived through log base two transformations, since the common practice is to take the log two of the gene expression values before performing analysis on the data. The variables in our data sets are the number o f biomarkers, the number o f patients with differential expression for a biomarker, and the number of time points with differential expression. It is important to have these dimensions o f the data variable to test how the different filtering methods perform under the different conditions. We created data sets with 60, 100, 200, and 500 biomarker genes respectively, which means that variable number o f genes have differential expression. Based on the literature, most people are only interested in above two fold differential expression. In our generated data sets we have between 2 fold and 64 fold difference, which corresponds to a value between 1 and 6 after log transformation. This, again, is based on real microarray data. We have seen this kind of variation in the limited number o f samples we obtained so far from transplant patients. Number of Biomarkers 60 100 200 500 DE for all patients at all time points 6 0 - A P - A T 100-AP-AT 2 0 0 - A P - A T 500-AP-AT DE for some patients at all time points 60-SP-AT 100-SP-AT 200-SP-AT 500-SP-AT DE for all patients at some time points 60-AP-ST 100-AP-ST 200-AP-ST 500-AP-ST Table 4.1: Types o f data sets and their symbols Within each of the data sets with a specific number of biomarkers we created three types o f data sets (see Table 4.1). For example, in the above table, the symbol 6 0 - A P - A T corresponds to a data set with 60 biomarkers, expressed in all patients and at all time 28 points. X - S P - A T is a data set in which only some of the patients have differential expression, and X - A P - S T contains X number of biomarkers differentially expressed in all patients, but only at some of the time points. To be more specific, for instance in 60-SP-A T , we divided the biomarkers into 20 groups. Each group has an equal number of genes (number of biomarkers divided by the number of groups). The groups consist o f 100, 95, 90, 10, and 5 patients, with differential expression at all time points. This way, we can test how the filtering methods find biomarkers with not all patients showing signs of differential expression. We also wanted to know i f the filtering methods find the biomarkers i f the differential expression is not present at all time points. To be able to do this, within each of the data sets with a specific number of biomarkers we created a data set ( X - A P - S T ) where the biomarker genes are divided into 4 groups. Each group consists of equal number of genes (number of biomarkers divided by number of groups). First group o f genes has no differential expression at one o f the first four time points. Second, third, and fourth groups have no differential expression at two, three, and four time points respectively. In the next sections we wi l l specify which test data is used to study the filtering methods under specific circumstances. 4.2. Experiments Performed and Results A s described in the previous chapter, the questions posed by our biologists are grouped into four categories, based on the types o f statistical methods that would help answer them (see Table 3.2). For each of these categories we chose a representative question, and proved that the other questions are similar and can be deducted from the representative of that category. W e applied all six filtering methods described in chapter 3 and compared the number of biomarkers found by each with the number of biomarkers planted in the data set. Except for the permutation test, all other methods were run in R, a free software environment for statistical computing [10]. The permutation test, an in-house Perl script, 29 was executed on a Sun Microsystems Sun-Fire-280R server with Solaris SunOS 5.9 operating system, 2 UltraSPARC-Ill CPUs, and 4 GB of R A M . The permutation test had the longest running time out of all the filtering methods, 32 hours and 42 minutes. This was the running time for a data set with 10,000 genes and 200 patients, and with 5000 iterations. The rest of the filtering methods were run on a server with Microsoft Windows Server 2003 operating system, a 3 GHz Intel Pentium 4 CPU, and 2 GB of R A M . The running time for t-test was 1 min 27 sec, for Wilcoxon 3 min 9 sec, for linear regression 7 min 48 sec, for M A N O V A 25 min 28 sec, and for GEE 53 min 28 sec. To find the best statistical methods for a category of questions, we tried multiple data sets for each category and investigated the top k (60 to 100) genes returned by each method. The basis for finding the best filtering method is that the top k genes returned by the method have to have the most overlap with the biomarker genes planted in the data sets. To assess the methods not only based on number of true positives returned within the top k genes, but also based on false positives and false negatives, we created precision-recall graphs for each test data set. The precision and recall are defined as follows: precision = P(A | B) = recall = P(B | A) P(A u B) P(B) P(A u B) n, n ! 1 + n 2 1 \u00E2\u0080\u00A2 x l 0 0 % P(A) n i l + n ! 2 - x l 0 0 % where A is the set of biomarkers and B is the set of biomarkers found by a statistical method, nn represents the number of true positives, ni2 is number of false negatives, and n 2 i is the number of false positives (see Table 4.2). Biomarker Found by Method Yes No Yes n, 2 No \ n2v,; n 2 2 B ;\u00E2\u0080\u00A2 A Table 4.2: Positives and negatives 30 In other words, precision is the proportion of biomarkers found by a method. Recall is the number of biomarkers found as fraction of all the biomarkers planted in the data set. Based on a precision-recall graph, the best method is the one which is the closest to 100% precision and 100% recall. This means few or no false positives and false negatives. We were also interested in seeing what the minimum number of patients or time points is for which a specific method returns a biomarker gene in the top k genes. This might be interesting since, for example, i f only a subset of the patients has differential expression for a specific biomarker, based on the clinical data (age, gender, race, etc.), we might find out that patients within a certain age range or race have different biomarker(s) than the other patients. Although the same six filtering methods are applied to all four categories of questions, there are some differences in the way we apply these methods for the different types of questions. For example, the tests are similar for category I and IV, while they are different for category II and III. Next, we w i l l show how the filtering methods are applied for each of the categories. 4.2.1. Category I of Questions In this category we explore the following questions: 1. A t which one of the time points does the differential expression start? 2. Are there genes with differential expression from the moment of transplantation? In these questions we are asking i f the differential expression between control and rejection occurs at all (question 2) or at some time points (question 1). More specifically, i f there are genes for which the differential expression starts at a specific time point (Ti for question 2, any time point for question 1) and has differential expression at all consecutive time points. From this we can conclude that question 2 is a subset of question 1. If we apply question 2, while eliminating the T i , T 2 , TL-I time points one by one in the consecutive runs, we wi l l get the answer for question 1. Based on this, i f 31 we test the filtering methods for question 2, we can be sure that the methods that perform the best for this question w i l l also be the best for answering question 1. For question 2 we applied the filtering methods in the following way: we ran t-test, Wilcoxon, and permutation test for each time point separately, while comparing the gene expression levels for control versus rejection patients. A t the end we picked the top k out of the genes that have p-value smaller than 0.01 at all time points. For linear regression, G E E , and M A N O V A we looked for genes that have a different combined group and time effect for the control and rejection patients. For these three methods the cut-off for genes o f interest was a p-value less than 0.01. To test the method-question correspondence, we applied the six filtering methods to 12 different data sets: X - A P - A T , X - S P - A T , and X - A P - S T , where X is 60, 100, 200, and 500 respectively. 4.2.1.1. Differential Expression for All Patients at All Time Points When we examined the top k versus number o f true positives returned by all methods, we realized that t-test, Wilcoxon, permutation test, and M A N O V A returned all the biomarker genes planted in the data sets (see Fig.4.1). If the number of biomarkers was increasing compared to the number of top k we were interested in, then both linear regression (starting from about 150 biomarkers) and G E E returned good results (starting from about 400 biomarkers) as can be seen on Fig.4.1 .b and 4.1 .c. 32 60 Biomarkers for All Patients at All Time Points \u00E2\u0080\u00A2 \u00E2\u0080\u0094 T-test, Wilcoxon. Permutation; Mandva Linear regression \u00E2\u0080\u0094 - GEE 60 ' 70 \u00E2\u0080\u0094 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 . I. \u00E2\u0080\u00A2 so ; Number of top genes ~ H \u00E2\u0080\u0094 90 ~~r-100 a. 60 biomarkers expressed in all patients at all time points 200 Biomarkers for All Patients at All Time Points \u00E2\u0080\u00A2: Number of .'top genes :; b. 200 biomarkers expressed in all patients at all time points 33 500 Biomarkers for All Patients at All Time Points ' Number of top g e n e s ' c. 500 biomarkers expressed in all patients at all time points Figure 4.1: Top k vs. number of true positives in A P - A T , cat. I Based on Fig . 4.2 we were able to spot that we had a few clear winners within the filtering methods for the data sets with biomarkers expressed differentially for all patients at all time points. These statistical methods were - the same as the ones found to be the best based on the previous 3 graphs - t-test, Wilcoxon, permutation test, and M A N O V A . A l l four methods had 100% precision and recall which means they returned only and all the biomarker genes. Linear regression provided around 80% true positives and no false negatives while G E E gave the worst results with less than 40% recall and only 10% precision (see F ig . 4.2). 34 Precision - Recall .\u00C2\u00BB T-test/Wile'oxon, Permutation; Manoya: x : Linear regression '. O GEE 7 \T~ ;0% 20% '40% Rrecisibn.: 80% \". 100% Figure 4.2: Precision-Recall graph for biomarkers expressed in all patients at all time points 4.2.1.2. Differential Expression for Some Patients at All Time Points If the biomarker is expressed at all time points, but only for some of the patients, then we might still be interested in these genes, since this can be a good indicator of the existence of subgroups of patients (with different age, race, gender, severity of rejection, etc.). Although the graphs with top k versus number of true positives (see Fig. 4.3.a and 4.3.b) might have suggested that linear regression provided similar quality answers as t-test and/or permutation and Wilcoxon test, upon closer inspection significant differences could be detected in the genes returned by these methods as candidate biomarkers. A l l biomarkers with differential expression for over 85% of rejection patients had a small p-value and were within the top k based on t-test, Wilcoxon, and permutation test. A s the percentage of patients showing evidence of differential expression for a gene decreases, the number of such genes showing up in the top k decreases. Only those genes with higher differential expression had p-value smaller then 0.01. 35 100 Biomarkers for Some Patients at All Time Points Permutation '-.-'- Linear regression \u00E2\u0080\u0094 \u00E2\u0080\u00A2'T-test' : - - Wilcoxon - ' - \u00E2\u0080\u00A2 . ' G E E ' . - \u00E2\u0080\u0094 \u00E2\u0080\u00A2 . Mahova -:'' 60 7 0 \u00E2\u0080\u0094I\u00E2\u0080\u0094 '80 90^'. ;ioo:: Number of top genes: ' . . . > ' a. 100 biomarkers expressed at all time points in some patients 500 Biomarkers for Some Patients at All Time Points : < D o ' : ' \u00E2\u0080\u00A2 :;:;'.\u00E2\u0080\u00A2\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \" ^ - \u00E2\u0080\u0094 o M a n o v a ' ; ' \u00E2\u0080\u00A2' ~ T \" 60 80 : : ; :96; : ; 100 i;!;:. . -x,;. Numberof top;genesj: ;i;:^ '.,:?;J;|i;S' b. 500 biomarkers expressed at all time points in some patients Figure 4.3: Top k vs. number of true positives in S P - A T , cat. I 36 According to Fig . 4.3 and 4.4, M A N O V A returned the poorest results. The reason for this is that M A N O V A only finds the biomarkers with differential expression for all patients. Even i f the gene has differential expression for 90% of patients, M A N O V A w i l l assign a big p-value to that gene and, as a result, it w i l l not be considered of interest based on this statistical method. For linear regression and G E E to provide the right genes within the top k, the issue was not as much the number of patients with differential expression pattern, but rather the level o f differential expression. Based on the precision-recall graph in figure 4.4, we were able to notice that permutation test provided the best results, followed by linear regression and t-test. Furthermore, based on the graph bellow, it is clear that G E E was the only filtering method returning false positives. Precision - Recall ' :*?:^si: :;::j;' :' :< :;.-?'yM^ : ; - : : \" '^ \u00E2\u0080\u00A2 :\u00E2\u0080\u00A2\u00E2\u0080\u00A2: S E E 'i^-i-.:\" \u00E2\u0080\u00A2 .illlllS^ ^^ ^ ' :fM:Mi'M II1 W^MiM^^iU : ;;P?6% | p ; * p 4 p % : ; :P:-V'::S.:: Figure 4.4: Precision-Recall graph for markers expressed in some patients at all time points 37 4.2.1.3. Differential Expression for All Patients at Some Time Points If the biomarker is differentially expressed for all patients, but only at some of the time points, then, based on our question, we should have no genes returned by the filtering methods. While this was the case when applying t-test, Wilcoxon, and permutation test the other three methods returned more than 50% false positives (see Fig. 4.5. and 4.6). 60 Biomarkers for All Patients at Some Time Points : - - -L inear regression Manova - '\"'i '\u00E2\u0080\u00A2' \u00E2\u0080\u00A2 : . \u00E2\u0080\u0094 - : 'GE 'E ; - ; ::-::-:;4 \u00E2\u0080\u0094 T-testWilcoxoh.Permutation' -.60: a. 60 biomarkers expressed in all patients at only some of the time points 38 200 Biomarkers for All Patients at Some Time Points E Z3 . 2 -\u00E2\u0080\u00A2\u00E2\u0080\u00A2 L inear regress ion . M a n o v a \u00E2\u0080\u00A2 - G E E ' . . . \u00E2\u0080\u0094 T-test. W i l coxon . Permutat ion 60. 70 90 T7T~ ' ioo: Number of top genes b. 200 biomarkers expressed in all patients at some of the time points Figure 4.5: Top k vs. number of true positives in AP-ST, cat. I Based on this last experiment we should discard linear regression, GEE, and MANOVA from the list of statistical methods that can be used for filtering for the type of questions from this category because, clearly, they can not distinguish between differential expression at some or all time points of interest. Also, GEE provided less than 10% precision, which is unacceptable. 39 Precision \u00E2\u0080\u00A2 Recall T-test,-Wilcoxon, Permutation Linear regression' o G! I.. v ::Manova: O. ,;o% .20%:' 40% Precision X-V \" I :: - I :0%::; 100% Figure 4.6: Precision-Recall graph for markers expressed in all patients at some time points 4.2.2. Category II of Questions This category includes the following questions: 1. Which genes have differential expression between control and rejection at T) compared to T B ? 2. Which genes have differential expression between TL-I and T L ? The questions in this category are very similar. One is asking i f there are genes with differential expression between T B and T i and the other is asking the same, except the comparison is made between the last and one before last time points. The first question is asked with the scope of finding the genes whose expression level changes during and shortly after the transplant is performed. The last question is posed to find out what exactly happens shortly before the rejection can be detected versus at the point when it is 40 detected for the first time. According to the experimental design of the Biomarkers in Transplantation project, T L in the rejection patient group is the time point when the rejection is detected. In both questions we are comparing two time points; the only difference between the questions is the time points which are compared. A s ' a result, we can test the filtering methods for this category by applying them to answer either of the questions. The results we get should be the same in terms o f which statistical methods are better to answer them. We chose to test the first question in this category and we applied the filtering methods in the following manner: t-test, Wilcoxon, and permutation test were applied to the ratio of T i divided by T B . A l l patients have to have both of these measurements according to the experimental design and, as a result, the ratio can be calculated for all patients. Since the expression values were already log transformed, the ratio is actually the absolute difference between the log value of measurement taken at the first time point after transplantation and the log value of the baseline. When applying linear regression, G E E , and M A N O V A , we looked for genes with different combined group and time effect for the control and rejection patients. For all six methods the cut-off for genes of interest was a combination of p-value (had to be smaller than 0.01) and top k. We picked eight test data sets for this category; each set has 10,000 genes, two time points, and 100 patients for each o f the control and rejection groups. The data sets were X - A P - A T and X - S P - A T , where X is 60, 100, 200, and 500 respectively. 4.2.2.1. Differential Expression for All Patients at Both Time Points Based on Fig . 4.7 we were able to. see that, except M A N O V A , all other test performed really well . The top k versus true positives graph looked exactly the same when the number of biomarkers planted in the test data sets is 100, 200, or 500. 41 60 Biomarkers Expressed by All Patients -Trtest, Wilcoxon,' Permutation, Linear regression, GEE Manova'-'.- \u00E2\u0080\u00A2 v; v;;'' 6 0 . ' . - ' . 7 0 - 80 . 90 '100. . ' \u00E2\u0080\u00A2 ' ' . ; ' \u00E2\u0080\u00A2 Number of top genes :: : ; ' : -a. 60 biomarkers expressed in all patients 100 Biomarkers Expressed by All Patients \u00E2\u0080\u00A2; a>':-x::'o v S <9i: '-^\u00E2\u0080\u0094.; T-test, Wilcoxon; PermutaSon',:Lihear- regression,GEE - ^ - - M a n o v a ' \u00E2\u0080\u00A2:.:::\";-;:s>::^ '';\'-^ ^ :;:'::;::-60 . : -7 0 V 80 : Number',oftbp':genes:5: i 9 0 ; ; i:100: :: b. 100 biomarkers expressed in all patients Figure 4.7: Top k vs. number of true positives in A P - A T , cat. II 42 Upon closer inspection, we found that Wilcoxon and permutation tests gave a smaller than 0.01 p-value to 2 and 15 non-biomarker genes respectively. These did not show up in the top k because their p-value was smaller than those of the true biomarkers. These false positives can be seen in Fig . 4.8. This figure also shows that M A N O V A had a very poor performance, with 0% precision and recall. This means that M A N O V A returned no gene with p-value smaller than 0.01. Precision - Recall \u00C2\u00B0 T teat. Linear regress*on. GEE 4 Vvllt.oxon '\u00E2\u0080\u00A2 Cnrmutatiori :;: V;::Manova ,. -t-:: i':0 0 % \u00E2\u0080\u00A2 2 0 % '10% G 0 % 8 0 % 1 C 0 % i Precision? Figure 4.8: Precision-Recall graph for biomarkers expressed in all patients 4.2.2.2. Differential Expression for Some Patients at Both Time Points In the case o f biomarkers being expressed in only some of the patients, all filtering methods, except M A N O V A , performed similarly when the number of rejection patients with differential expression for a specific gene was more than 70% of all patients with allograft rejection. In this case, all biomarkers were found by all five methods. If the percentage of patients with differential expression was between 15% and 70% linear 43 regression and G E E found all biomarkers, but t-test, Wilcoxon, and permutation test found less and less of the planted biomarkers as the percentage decreased. A high level illustration of how all filtering methods performed is shown in Fig . 4.9. Based on these graphs, one can see that M A N O V A had the worst performance. The other methods' performance was good, and similar to each other, i f the number of biomarkers existent in the data sets exceeded 200 (2% of all genes in a data set). For fewer biomarker genes present in a data set, permutation test had the best performance, followed by t-test, linear regression, and G E E . A l l three methods had above 80% recall and no false positives (see Fig . 4.10). Wilcoxon test returned the least number of biomarkers out the five methods, but still had over 95% precision and almost 80% recall. 60 Biomarkers Expressed by Some Patients Permutation T-test; Linear regression. GEE Wilcoxon Manova': ' ' ' ' \"\u00E2\u0080\u00A2?\u00E2\u0080\u00A2\u00E2\u0080\u00A2'\" 'iv:.!:.'\".\" \u00E2\u0080\u00A2\u00E2\u0080\u00A2:-'\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2:: ^ ' *>\u00E2\u0080\u00A2'' .'> '-_ Pi<\u00E2\u0080\u00A2:\u00E2\u0080\u00A2 W^m? '' \u00E2\u0080\u00A2 :'y^ ^ P W l p \u00E2\u0080\u00A2;'.::,;;;.':\u00E2\u0080\u00A2 \u00E2\u0080\u00A2;''.'.;...\"': :;Nurnberpf top genes-'M::Z \u00E2\u0080\u00A2\u00E2\u0080\u00A2 a. 60 biomarkers expressed in some patients 44 100 Biomarkers Expressed by Some Patients rrr 'T-test, Permutation \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 Linear regression,:GEE - W i l c o x o n ' : - - Manova v. \u00E2\u0080\u00A2,, . 60 7 0 \" r 8 0 Number of top genes \u00E2\u0080\u0094 r -'90 \u00E2\u0080\u00A2\u00E2\u0080\u00A2too-b. 100 biomarkers expressed in some patients 200 Biomarkers Expressed by Some Patients v. -->.: '.rrr?-.:T:test, Wilcoxon; Permutation. Linear regression. G E E ; \u00E2\u0080\u00A2^Manova .:\u00E2\u0080\u00A2:.:.'\u00E2\u0080\u00A2\u00E2\u0080\u00A2:\u00E2\u0080\u00A2:\" \u00E2\u0080\u00A2 a---------8 0 0 0 \u00E2\u0080\u0094 r ~ Pibb; \u00E2\u0080\u00A2 K Number of top gene's;:.';: : c. 200 biomarkers expressed in some patients Figure 4.9: Top k vs. number of true positives in S P - A T , cat. II 45 Precision - Recall T-test ... .'.'\".v::0: Linear regression, GEE' : A Wilcoxon1-' + \u00E2\u0080\u00A2 Permutation '\u00E2\u0080\u00A2' V Manova . ' ! -V ; \u00E2\u0080\u00A2 I . : . \u00E2\u0080\u00A2\u00E2\u0080\u00A2 I, \". :. : :\u00E2\u0080\u00A2: I.... * . . ' ;V' : : . -: .; . ' .... . . ' l : . . ' . . ; . : . : , ... - I ' V : ' \" \u00E2\u0080\u00A2.\u00E2\u0080\u00A2: v::-':6%. :p :, :20% :i:P;r':p40.%:::t:;'\u00E2\u0080\u00A2'::*:;:60%\u00E2\u0080\u00A2.\u00E2\u0080\u00A2 p.'\u00E2\u0080\u00A2'.','.' BO-s,\u00E2\u0080\u00A2;\": :\u00E2\u0080\u00A2;\u00E2\u0080\u00A2 , 100% : : ;/'.\"\u00E2\u0080\u00A2. ' '\u00E2\u0080\u00A2.: \":pl: : :y>iPTecision' \u00E2\u0080\u00A2 Figure 4.10: Precision-Recall graph for biomarkers expressed in some patients We performed two types of experiments for category II type of questions. In the first type the biomarker was expressed in all patients. In the second type the biomarker was expressed only in some of the patients. In both cases M A N O V A performed poorly, returning around 0.9% false positives and zero or one true positives. 4.2.3. Category III of Questions This category includes the following questions: 1. Are there genes with differential expression between liver, kidney, and heart transplant patients? 2. Is the concentration (expression level) of the biomarker detected correlated with the severity of the rejection? 3. Does gender, race, age, medication play a role in rejection? 46 In this category the questions look more diverse at first glance, but upon further investigation they have a lot of common features. For example, question one and two are actually identical in terms of analysis, because they both ask about the differences between three different conditions. The conditions in question one are the three different kinds of organ transplants, in question two the conditions are non-rejection, slight rejection, and rejection. Question three is a bit more complex, since it asks about multiple clinical variables. Thus, the number of pair wise comparisons is higher, but the analysis strategies are the same nonetheless. A s a result, we could test the filtering methods by applying them to either of these methods, and we chose question one for this task. Applying t-test, Wilcoxon, and permutation test are not the most natural choice, since these methods can only compare two groups at a time. Therefore, the analysis had to be performed in groups o f two different kinds of organ transplant patients for each time point separately. We had to apply these methods separately to liver versus kidney, to liver versus heart, and to kidney versus heart transplant patients, one time point at a time, and then combine the results. When applying linear regression, G E E , and M A N O V A we looked for genes with different combined time and organ effect. A l l o f these three methods support the analysis of multiple predictors with multiple categories within each predictor variable. Same as in the category I and II questions, for all six methods the cut-off for genes of interest was a combination of p-value (had to be smaller than 0.01) and the number of genes we wanted to examine after this step (top k). The test data sets were similar to the ones used in testing the filtering methods for category I of questions, except we had 100 patients for each of the three categories (liver, kidney, and heart transplant). 4.2.3.1. Differential Expression for All Patients at All Time Points A s it can be observed on Fig . 4.11, there were four methods that returned the same results, t-test, Wilcoxon, permutation test, and M A N O V A . The other two tests' performance was not as good; especially G E E found less than 40% of biomarkers. A s the 47 number of biomarkers planted in a data set increased, linear regression and G E E were able to find the top 60 to 100 genes. 100 Biomarkers for All Patients at All Time Points \u00E2\u0080\u0094 ^T-test.-.Wilcoxori,Permutation, Manova - \u00E2\u0080\u00A2 ' l inear degression \u00E2\u0080\u00A2 :GEE,;:::., B0 Number of top genes J-90 : 100 a. 100 biomarkers expressed in all patients at all time points 48 200 Biomarkers for All Patients at All Time Points b. 200 biomarkers expressed in all patients at all time points 500 Biomarkers for All Patients at All Time Points MNumbersbfit^ :';s' c. 500 biomarkers expressed in all patients at all time points Figure 4.11: Top k vs. number of true positives in A P - A T , cat. I l l 49 Although the top k versus number of true positives graphs showed that t-test, Wilcoxon, permutation, and linear regression (see Fig . 4.12) performed similarly, the picture shown by the precision-recall graph was different. While these tests had 100% recall, only M A N O V A had no false positives. T-test, Wilcoxon, and permutation test had all low precision, but the false positives had smaller p-value than the true positives. Precision - Recall 40%; y :y:; 60%';. \u00E2\u0080\u00A2 '' Precision;:^?: ;i00%:?: Figure 4.12: Precision-Recall graph for biomarkers expressed in all patients at all time points 4.2.3.2. Differential Expression for Some Patients at All Time Points For the data sets with biomarkers expressed at all time points in only some patients, t-test, Wilcoxon, and permutation test found the most markers (see Fig . 4.13). Linear regression found about 70% o f the biomarkers, followed by M A N O V A and G E E , with less than 40% differentially expressed genes found. When the number o f biomarkers was 500 and we were interested in the top 100 genes, all methods, except G E E , returned all true positives within the top 100 (see Fig . 4.13.b). 50 100 Biomarkers for Some Patients at All time; Points S00 Biomarkers for Some Patients at All Time Points - r i - i T-test; Wilcoxon, Perrh'utation.Li near regression; Manova v!\"?.''\"1';:': S S ; ; \u00E2\u0080\u00A2 V? \u00E2\u0080\u00A2 ii-- ;i;::Nurnber of to? genes '\u00E2\u0080\u00A2. . j I b. 500 biomarkers expressed in some patients at all time points Figure 4.13: Top k vs. number of true positives in S P - A T , cat. I l l 51 B y looking at the precision-recall graph (see F ig . 4.14), we were able to see that although t-test, Wilcoxon, and permutation test had similar recall, the precision for the first two methods was better than the precision o f permutation test. Nonetheless, all three filtering methods had low precision. The low precision however seems less important in this case, because the top k genes included over 95% of the planted biomarkers. Linear regression and M A N O V A had perfect precision, but 77% and 37% recall respectively. A s we noticed from the top k versus true positives graphs above, G E E had the worst performance out o f the filtering methods tested. This method had the lowest precision and recall as the precision-recall graphs shows. Precision - Recall \u00C2\u00B0 -T-test & \u00E2\u0080\u00A2 Wilcoxon? +\ : Permutat ion x v Linear regression o . G E E '\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 v. Manova ; V \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -M: :C: $S1P^ ; \u00E2\u0080\u00A2 ;| 4 . 8 0 % \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 j d p * ? ! , ?i???i;;: :?:;:;::?> ? ; ? \" : ; ' .p:P;:pRrecisibnp..?:5p :'. V. ..; Figure 4.14: Precision-Recall graph for markers expressed in some patients at all time points o 52 4.2.3.3. Differential Expression for All Patients at Some Time Points When the biomarkers planted in the test data sets are differentially expressed at only some of the time points, all of our filtering methods found only a few of the planted biomarkers (see Fig. 4.15). Although permutation test found the most genes, even this method's performance was poor. The precision-recall graph, shown in Fig . 4.16, confirmed our findings based on Fig . 4.15. If the differential expression was only present at some of the time points, then none of the filtering methods proposed by us performed well . Although linear regression and M A N O V A had 100% precision, these methods still had a weak performance, because the number of true positives was very small, thus causing a very low recall. 60 Biomarkers for All Patients at Some Time Points Permutation\ : - \u00E2\u0080\u0094 T-test - - Wilcoxon.tinear regression; ;GEE : \u00E2\u0080\u0094-.- Manova .: a. 60 biomarkers expressed in all patients at some time points 53 200 Biomarkers for All Patients at Some Time Points \u00E2\u0080\u00A2:\u00E2\u0080\u00A2=) CO' \u00E2\u0080\u00A2\u00E2\u0080\u00A2MfflS-i'lp -:-'\u00E2\u0080\u00A2- Permutat ion: ' ; . i -T - \ T - t e s t ':'\u00E2\u0080\u00A2;,;\u00E2\u0080\u00A2\u00E2\u0080\u00A2 - ; - \" - W i l c o x o n ' - ' \u00E2\u0080\u00A2-:'->-- L i nea t reg ress ion \u00E2\u0080\u0094 Manovai . ; TTT; \u00E2\u0080\u00A2 60:' T - P -\" 70 ; ; TT-\u00E2\u0080\u0094~~1 r \u00E2\u0080\u0094 Number of top genes \u00E2\u0080\u00A2 90 , 100 ; ' b. 200 biomarkers expressed in all patients at some time points Figure 4.15: Top k vs. number of true positives in A P - S T , cat. I l l Precision - Recall :;'4p%;; ; :;60%;; P r e c i s i o n : : : -Figure 4.16: Precision-Recall graph for biomarkers expressed in A P - S T 54 4.2.4. Category IV of Questions This last category includes four questions: 1. Are there genes with differential expression between control and rejection patients? 2. Are there genes with differential expression between liver and kidney? 3. Are there genes with differential expression between heart and kidney? 4. Are there genes with differential expression between liver and heart? These four questions are really similar, they are asking for genes with differential expression between two groups of patients. We chose question one as the representative of this category and we applied the same 12 test data sets we used for the questions in category I. T-test, Wilcoxon, and permutation test were applied at each time point as in category I, but the difference is that we did not care at how many and which time points the differential expression occurred. Rather, we were interested in all genes with different expression levels between control and rejection patients. The way linear regression, G E E , and M A N O V A were applied is the same as for the questions in category I. The cut-off for the genes o f interest was the same as for all o f the previous categories. 4.2.4.1. Differential Expression for All Patients at All Time Points When we examined the data sets with biomarkers expressed differentially for all patient at all time points, t-test, Wilcoxon, permutation test, and M A N O V A had all the perfect answers within the top k genes. If the number of biomarker genes was much bigger compared to the top k genes, then linear regression and G E E would eventually provide 100% results for number of true positives versus top k (see Fig. 4.17). 55 i 00 Biomarkers for All Patients at All Time Points - \u00E2\u0080\u0094 ' T-test, : W i l coxon , P e r m i i t a t i o n . M a n o v a \u00E2\u0080\u00A2 L inear regress ion . - ' - : G E E ; 60 70 ' \u00E2\u0080\u0094 n ~ ~ ~ HO \u00E2\u0080\u00A2 Number of top g e n e s ' so- :'.100: a. 100 biomarkers expressed in all patients at all time points .;I;?'o: 200 Biomarkers for All Patients at All Time Points T-test , W i l c o x o n ; Permuta t ion , L inear regress io r i ' : Mar iova \u00E2\u0080\u00A2 \" G E E .:\u00E2\u0080\u00A2 J:- . )\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ^JM:';\u00E2\u0080\u00A2.\u00E2\u0080\u00A2';:\u00E2\u0080\u00A2.'': : : 7 0 ; : \"TTT 80 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2:: \u00E2\u0080\u00A2 601 70 :: :;;:;;:j; ;':'\u00E2\u0080\u00A2' \u00E2\u0080\u00A2; :' \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2:\u00E2\u0080\u00A2!;:;';;;;;: :;:';'9p':';*X;m;ml991-::M '\". . y N u i T i b e t o f t d p . g e ^ b. 200 biomarkers expressed in all patients at all time points 56 500 Biomarkers for All Patients at All Time Points vT-t'est. Wilcoxon. Permutation. Linear regression, GEE ; Manova ec \u00E2\u0080\u00A2' 7o:,-: \" \u00E2\u0080\u00A2 so-\u00E2\u0080\u00A2\"\u00E2\u0080\u00A2 90 ' 100 Number of top genes c. 500 biomarkers expressed in all patients at all time points Figure 4.17: Top k vs. number of true positives in A P - A T , cat. I V W e found some interesting results from testing the filtering methods for this category of questions. For example, t-test, Wilcoxon, and permutation test provided many false positives. That is why, although they had 100% recall, their precision was very low, below 10% (see F ig . 4.18). This is partly due to the fact that we were interested in differential expression at any, one or more time points, so genes with small differential expression at a single time point were also included in the results. On the other hand, M A N O V A had both perfect recall and precision. Linear regression had perfect precision, but only around 80% recall. G E E returned both false positives and false negatives, therefore had low precision and low recall. 57 Precision - Recall ?o+ .-T-test',.Wilcoxon : Po.-ri.fatioi. Linear'regression: . GEE .-.'\u00E2\u0080\u00A2:\u00E2\u0080\u00A2:':\u00E2\u0080\u00A2'\u00E2\u0080\u00A2 fylahova'iv O ? . 0 % ? : \u00E2\u0080\u00A2'\u00E2\u0080\u00A2\u00E2\u0080\u00A2.\u00E2\u0080\u00A2 \" 2 0 % , : : v v - ' ^ 4 0 % ' .\u00E2\u0080\u00A2 ' 6 0 % ,Z:80%V;:/' \u00E2\u0080\u00A2\u00E2\u0080\u00A2 : . ; i o o % ' , ' \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ' i 1 : ' . \u00E2\u0080\u00A2 :'-:'' Precision:?.: ?\u00E2\u0080\u00A2?\u00C2\u00A5\u00E2\u0080\u00A2':\u00E2\u0080\u00A2 :?s;:?;::::'' Figure 4.18: Precision-Recall graph for markers expressed in all patients at all time points 4.2.4.2. Differential Expression for Some Patients at All Time Points When the biomarkers planted in the test data sets were differentially expressed at all time points, but only for some of the patients, M A N O V A performed differently from the other tests (see Fig . 4.19). While all o f the other tests found some of the biomarker genes from all or most of the 20 groups, M A N O V A only found the biomarkers expressed in all patients. 58 100 Biomarkers for Some Patients at All Time Points Permutation<::'!. ... \u00E2\u0080\u0094 fT-test 'iv : ; :\":!;->' : Wilcoxon;\/r ' Linear regression -T'GEEiv-\u00E2\u0080\u0094 \u00E2\u0080\u00A2 \" Manova.- v. \u00E2\u0080\u00A2' GO ~~1\u00E2\u0080\u0094 TO- 9 0 ' 1 0 0 \" : '. '\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 <:' .' Number of top genes [ a. 100 biomarkers differentially expressed at all time points in some patients 500 Biomarkers for Some Patients at All Time Points : T-te'st.:Wlcox6n';Permutati'o'n,Linearregression \u00E2\u0080\u00A2 G E E : : : : ^ ^ W: Manova \u00E2\u0080\u00A2'; ; j :\u00E2\u0080\u00A2.''] SO 70 \u00E2\u0080\u00A280: : Nuvbor oi top genes : :90 ;r 100 b. 500 biomarkers differentially expressed at all time points in some patients Figure 4.19: Top k vs. number of true positives in SP-AT, cat. IV 59 Although it seems from the top k versus true positive graphs (see Fig. 4.19) that GEE did the same thing as MANOVA, the poor performance of GEE was not due to the same issue. GEE assigned small p-values to many of the non-biomarker genes and, as a result, the true positives got a lower ranking and only a few made it to the top k. In contrast, while t-test, permutation, and Wilcoxon test had a lot of false positives (see Fig. 4.20), the top k for these methods included over 95% of all biomarkers for the first two methods, and over 80% for the third method. Precision - Recall :\u00C2\u00B0 T-test A Wilcoxon . ' +' . Permutation.. :< Linear regression O. G E E y /Manoya '. x tmtmM.'MAAWMMMM^MAW\u00C2\u00A3$Qfflw&i'$f^ . ioo% wiiA AAMyZMZAW: W-lil.PrecisionpispJP;; P' i P Figure 4.20: Precision-Recall graph for markers expressed in some patients at all time points 4.2.4.3. Differential Expression for All Patients at Some Time Points While the performance of most of the filtering methods was not affected by having differential expression at not all, but only some of the time points, for MANOVA the performance changed quite a bit compared to the experiments with the biomarker being expressed at all time points. . crr-60 60 Biomarkers for All Patients at Some Time Points: \u00E2\u0080\u0094 T-test, Wilcoxon, Permutation - \u00E2\u0080\u00A2 .Linear regression:'\"- ,. '\u00E2\u0080\u00A2:: - \u00E2\u0080\u00A2 Manova'' \u00E2\u0080\u00A2 - \u00E2\u0080\u00A2 ' G E E : : , < - , 60 70, 90/ ybo: \"Number of top.genes\"; a. 60 biomarkers expressed in all patients at some of the time points 200 Biomarkers for All Patients at Some Time Points :::::'KS: Number of topVg'e'nes i b. 200 biomarkers expressed in all patients at some of the time points Figure 4.21: Top k vs. number o f true positives in A P - S T , cat. IV 61 MANOVA found only about 60% of the true positives when the biomarker had differential expression in all patients but only at a few time points. T-test, Wilcoxon, and permutation test were the best methods to find all true positives (see Fig. 4.21). Even though t-test, Wilcoxon, and permutation test seemed to provide the perfect answers based on Fig. 21, when we studied the precision-recall graph we were able to notice these methods' fault. Namely, these three filtering methods had very low precision (see Fig. 4.22), due to the large number of false positives they returned. This problem was not detected by examining Fig 4.21, because the false positives had smaller p-values than the false negatives and, as such, the top k genes only included the true positives. Precision - Recall of \u00C2\u00B0 T-test, Wilcoxon + Permutation '\u00E2\u0080\u00A2 x 1 Linear regression o GEE ' v Manova'' V ...O TTT 20% :40%:;-'-;J:,;.;::.' v.p%; Proc.sio-! 80% 100% Figure 4.22: Precision-Recall graph for markers expressed in all patients at some time points Based on the experiments described in this chapter, we realized that some of the statistical methods' performance was different than expected. Also, even within a single category of questions, depending on how many patients have differential expression for a 62 gene and at how many time points, the methods performed differently. A l l these observations, and some other interesting findings, are described in the next chapter. 63 Chapter 5 Conclusion In the previous chapters we presented a few methods we believe are good for the filtering step of analyzing a longitudinal microarray data set. We described these statistical methods' strengths and weaknesses and, based on these properties, we mapped them to the biological questions they can answer. After we performed a few experiments using synthetic data sets, we found that some of the filtering methods perform differently then expected. Based on our findings, we re-mapped the questions with filtering methods (see Table 5.1). We expected some o f the tests to perform better. For example, G E E , a method designed for longitudinal data analysis, did not perform as well as we thought it would. We discovered that G E E found many false positives and false negatives, which decreases the value of this statistical method for the filtering step. We also found that t-test, Wilcoxon, and permutation test had a better performance than expected due to the high recall they provide. Although these methods are fast and easy to use, in some cases, they might not be the best solution. Such is the case for questions in category III, where there are many predictors to be considered in the model. In these cases the methods have to be run for each combination of pairs o f predictors and the results have to be combined. This process can become tedious and combining the results returned by each pair might get complex. If there are many predictors for category III like questions, using M A N O V A for filtering might be a better solution. 64 Cat. Questions Methods Type of Data Revised Methods I 1. A t which one of the time points does the D E start? 2. Are there genes with D E from the moment of transplantation? T-test, Wilcoxon, Permutation A P - A T T-test, Wilcoxon, Permutation, M A N O V A S P - A T Permutation, T-test, Linear regression A P - S T T-test, Wilcoxon, Permutation II 3. Which genes have D E between control and rejection at T i compared to T B ? 4 . Which genes have D E between T L and T n ? T-test, Wilcoxon, Permutation, M A N O V A A P - A T T-test, Wilcoxon, Permutation, Linear regression, G E E S P - A T III 5. Are there genes with D E between liver, kidney, & heart? 6. Is the concentration of the biomarker detected correlated with the severity of the rejection? 7. Does gender, race, age, medication play a role in rejection? Linear regression, G E E , M A N O V A A P - A T T-test, Wilcoxon, Permutation, M A N O V A S P - A T T-test, Wilcoxon, Permutation A P - S T Permutation IV 8. Are there genes with D E between control & rejection? 9. Are there genes with D E between liver & kidney? 10 . Are there genes with D E between heart & kidney? 1 1 . Are there genes with D E between liver & heart? A l l A P - A T T-test, Wilcoxon, Permutation, M A N O V A S P - A T T-test, Permutation, Wilcoxon A P - S T T-test, Wilcoxon, Permutation Table 5 . 1 : Revised question-method table 6 5 There was another issue that came to our attention during the experiments. If the data set has genes differentially expressed at all time points, but in only some of the patients, M A N O V A wi l l not find these genes. Therefore, M A N O V A is only recommended for the analysis of these types o f data sets i f the purpose is finding biomarkers differentially expressed in all patients within a specific group. Furthermore, linear regression and G E E can find the genes with differential expression in the least number of patients. For example, for category II type of questions these tests can find genes with differential expression in only 15% of the rejection patients while t-test, Wilcoxon and permutation test need this number to be above 70%. If the purpose of the study is to find differentially expressed genes expressed in a few of the patients, then linear regression and G E E are the methods to be applied. On the other hand, i f one is interested in biomarkers expressed in all or most patients then the other statistical methods should be employed. We think that future research is needed to find statistical methods that can provide a high precision and recall for category III A P - S T type of data. Even though in Table 5.1 we listed permutation as the best method out of the ones we investigated, this method did not perform well enough to find most biomarkers planted in the data. For this type of data all of the six methods had low recall and precision. Although the question-method mapping is specific to our project, Table 5.1 can provide a guideline for other studies with longitudinal microarray data. 66 Bibliography 1. Benjamini Y , Hochberg Y . Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B Met. 1995; 57 (1): 289-300. 2. Clarke W , Silverman B C , Zhang Z , Chan D W , Kle in A S , Molmenti EP . Characterization of renal allograft rejection by urinary proteomic analysis. A n n Surg 2003; 237 (5): 660-4. 3. Diggle PJ, Heagerty P, Liang K - Y , Zeger S L . Analysis of longitudinal data. 2 n d ed. Oxford; New York: Oxford University Press, 2002. 4. Donauer J, Rumberger B , K l e i n M , Faller D , Wilpert J, Sparna T, Schieren G , Rohrbach R, Dern P, Timmer J , Pisarski P, Kirste G , Walz G . Expression profiling on chronically rejected transplant kidneys. Transplantation. 2003; 76 (3): 539-47. 5. Dugre FJ , Gaudreau S, Belles-Isles M , Houde I, Roy R. Cytokine and cytotoxic molecule gene expression determined in peripheral blood mononuclear cells in the diagnosis of acute renal rejection. Transplantation. 2000; 70 (7): 1074-80. 6. Edgar R, Domrachev M , Lash A E . Gene Expression Omnibus: N C B I gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30(1): 207-10. 7. Epicenter Software. Genetrix 2004. Online. Available: 15 July 2005. 8. Fahmy N M , Yamani M H , Starling R C , Ratliff N B , Young JB , McCarthy P M , Feng J, Novick A C , Fairchild R L . Chemokine and receptor-gene expression during early and 67 late acute rejection episodes in human cardiac allografts. Transplantation. 2003; 75 (12): 2044-7. 9. Hirano H , Tabuchi Y , Kondo T, Zhao Q L , Ogawa R, Cu i Z G , Feril L B Jr, Kanayama S. Analysis of gene expression in apoptosis of human lymphoma U937 cells induced by heat shock and the effects of alpha-phenyl N-tert-butylnitrone (PBN) and its derivatives. Apoptosis. 2005; 10 (2): 331-40. 10. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J. o f Comp. and Graphical Stats. 1996; 5: 299-314. 11. Lachenbruch P A , Rosenberg A S , Bonvini E , Cavail le-Coll M W , Colv in R B . Biomarkers and surrogate endpoints in renal transplantation: present status and considerations for clinical trial design. A m J Transplant. 2004; 4 (4): 451-7. 12. Liang K - Y , Zeger S L . Longitudinal Data Analysis Using Generalized Linear Models. Biometrica. 1986; 73: 13-22. 13. Luckow B , Joergensen J, Chi l la S, L i JP, Henger A , Kiss E , Wieczorek G , Roth L , Hartmann N , Hoffmann R, Kretzler M , Nelson PJ, Perez de Lema G , Maier H , Wurst W , Ball ing R, Pfeffer K , Grone H J , Schlondorff D , Zerwes H G . Reduced intragraft m R N A expression of matrix metalloproteinases Mmp3, M m p l 2 , M m p l 3 and Adam8, and diminished transplant arteriosclerosis in Ccr5-deficient mice. Eur J Immunol. 2004; 34 (9): 2568-78. 14. Matsui Y , Saiura A , Sugawara Y , Sata M , Naruse K , Yagita H , Kohro T, Mataki C , Izumi A , Yamaguchi T, Minami T, Sakihama T, Ihara S, Aburatani H , Hamakubo T, Kodama T, Makuuchi M . Identification of gene expression profile in tolerizing murine cardiac allograft by costimulatory blockade. Physiol Genomics. 2003; 15 (3): 199-208. 15. Shands HealthCare. Myocardial biopsy 02 Oct. 2003. Online. Available: 15 July 2005. 16. Scherer A , Krause A , Walker JR, K o r n A , Niese D , Raulf F . Early prognosis of the development of renal chronic allograft rejection by gene expression profiling o f human protocol biopsies. Transplantation. 2003; 75 (8): 1323-30. 68 17. Schoels M , Dengler TJ , Richter R, Meuer S C , Giese T. Detection of cardiac allograft rejection by real-time P C R analysis of circulating mononuclear cells. C l i n Transplant. 2004; 18 (5): 513-7. 18. Schorge JO, Drake R D , Lee H , Skates SJ, Rajanbabu R, Mi l l e r D S , K i m J H , Cramer D W , Berkowitz R S , M o k SC . Osteopontin as an adjunct to CA125 in detecting recurrent ovarian cancer. C l i n Cancer Res. 2004 M a y 15; 10 (10): 3474-8. 19. Tracey L , Villuendas R, Ortiz P, Dopazo A , Spiteri I, Lombardia L , Rodriguez-Peralto J L , Fernandez-Herrera J, Hernandez A , Fraga J, Dominguez O, Herrero J, Alonso M A , Dopazo J, Piris M A . Identification of genes involved in resistance to interferon-alpha in cutaneous T-cell lymphoma. A m J Pathol. 2002; 161 (5): 1825-37. 20. Twisk J W R . Applied longitudinal data analysis for epidemiology: a practical guide. Cambridge, U K ; New York: Cambridge University Press, 2003. 21. Vaes B L , Dechering K J , Feijen A , Hendriks J M , Lefevre C, Mummery C L , Olijve W , van Zoelen E J , Steegenga W T . Comprehensive microarray analysis of bone morphogenetic protein 2-induced osteoblast differentiation resulting in the identification of novel markers for bone development. J Bone Miner Res. 2002; 17 (12): 2106-18. 69 "@en .
"Thesis/Dissertation"@en .
"2006-05"@en .
"10.14288/1.0051174"@en .
"eng"@en .
"Computer Science"@en .
"Vancouver : University of British Columbia Library"@en .
"For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en .
"Graduate"@en .
"Comparative study of statistical methods for finding biomarkers in longitudinal data"@en .
"Text"@en .
"http://hdl.handle.net/2429/17587"@en .