ANALYSIS OF ORDERED CATEGORICAL DATA By Janis Chang Sc., (Biochemistry) University of British Columbia, 1985 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF STATISTICS We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA October 1988 © Janis Chang, 1988 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Statistics The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Date: Abstract Methods of testing for a location shift between two populations in a longitudinal study are investigated when the data of interest are ordered, categorical and non-linear. A non-standard analysis involving modelling of data over time with transition probability matrices is discussed. Next, the relative efficiencies of statistics more frequently used for the analysis of such categorical data at a single time point are examined. The Wilcoxon rank sum, McCullagh, and 2 sample t statistic are compared for the analysis of such cross sectional data using simulation and efficacy calculations. Simulation techniques are then utilized in comparing the stratified Wilcoxon, McCullagh and chi squared-type statistic in their efficiencies at detecting a location shift when the data are examined over two time points. The distribution of a chi squared-type statistic based on the simple contingency table constructed by merely noting whether a subject improved, stayed the same or deteriorated is derived. Applications of these methods and results to a data set of Multiple Sclerosis patients, some of whom were treated with interferon and some of whom received a placebo are provided throughout the thesis and our findings are summarized in the last Chapter. 11 Table of Contents Abstract ii List of Tables v List of Figures v Acknowledgement vi 1 Introduction 1 2 Markov Analysis 3 2.1 Tests of Order and Stationarity 4 2.1.1 Tests of Order of Markov Chain 6 2.1.2 Tests of Stationarity 8 2.2 Modelling of the Tridiagonals 10 2.2.1 Modelling Incorporating V2 1 2.2.2 Modelling Without V2 12 3 Analysis At One Time Point 9 3.1 Simulation 20 3.2 Efficacy Calculations 22 3.2.1 Efficacy of the Wilcoxon 23 3.2.2 Efficacy of the T test 31 3.3 Comparison of the Efficacies 4 iii k 4 Analysis at Two Time Points 36 4.1 Description of the Simulation4.1.1 Stratified Wilcoxon 37 4.1.2 Chi-squared Statistic 8 4.2 Simulation Results 46 4.3 Application of Tests to the MS Data 51 5 Conclusions 53 A Sample Transition Matrices 55 Bibliography 60 iv List of Tables 2.1 Comparison of Treatment and Control Groups 9 2.2 Tests of 2nd Order vs. 1st Order Markov Process2.3 Tests of 1st Order vs. Independence 16 2.4 Results of Test for Stationarity2.5 Treatment vs. Control (general tridiagonal modelling) 16 2.6 Fit of General Tridiagonal Models 17 2.7 Treatment vs. Control (general tridiagonal - no P2) 17 2.8 Fit of General Model - no P2 12.9 Comparison of Specific and General Models (treatment) 17 2.10 Comparison of Specific and General Models 18 4.1 Results of McCullagh Analysis 51 v List of Figures 3.1 Simulation of Data at One Time Point 21 4.1 Simulation Run 1 (3 = 0.1, p = 0.8) 47 4.2 Simulation Run 2(8 = 0.1, p = 0.6) 8 4.3 Simulation Run 3(3 = 0.3, p = 0.8) 49 4.4 Simulation Run 4(8 = 0.3, p = 0.6) 50 vi Acknowledgement I would like to thank my supervisor, Dr. N.E. Heckman for her continual guidance, support and many invaluable suggestions. Also, I am grateful to Dr. A.J. Petkau advice and careful reading of this work. In addition, the encouragement of Dr. J. Berkowitz and expert computing advice of Peter Schumacher are gratefully acknowledged. I also with to thank the MS Clinic at the UBC Hospital for their support. This research was funded by the Multiple Sclerosis Society of Canada through the UBC MS Clinic. vii Chapter 1 Introduction One hundred Multiple Sclerosis patients from the patient population of the MS Clinic at the UBC Hospital participated in a randomized double-blind clinical trial to deter mine the effectiveness of treatment with interferon. Multiple Sclerosis is a progressive disease which attacks the nervous system and often results in loss of vision, motor coordination and/or sensory perception. The severity of the symptoms varies among patients. The subjects in this study are chronic progressive, that is, their condition deteriorates progressively over time. In this trial, fifty patients were assigned randomly to control and treatment groups. Subjects in the control group were given injections of a placebo and those in the treatment group were given injections of interferon for six months. The patients were monitored during the six months of treatment and for eighteen months of follow-up to the subsequent termination of the treatment. No standard quantitative method of measuring the level of the disease exists. In this study, measurements of symptoms such as mobility or numbness were used to assess the severity of the subject's condition and this information was used to produce the Kurtzke extended disability status scale (EDSS). The Kurtzke EDSS, referred to here as Kurtzke score, was chosen as the means of tracing the subjects' conditions over time. The Kurtzke score is ordered and categorical, taking on values of 0 (normal) to 10 (dead) in increments of 0.5. It is also nonlinear, so that, for example, a change in score of 1 to 2 is not as severe as a change of 5 to 6. The nonlinearity and categorical nature of the scores makes it inappropriate to treat 1 Chapter 1. Introduction 2 them as continuous variables in the assessment of the extent of the disease. In partic ular, any statistic which requires the assumption of normally distributed observations (such as the 2 sample t statistic) is not suitable for analyzing this type of data. On the other hand, standard categorical data analysis could be done ignoring the ordinal nature of the data. This type of analysis is not appropriate here as information on the degree of improvement or deterioration of the patient's condition would be lost. One method of analysis, described in Chapter 2, is to consider the categories as states and the movements of the subjects from category to category over time as tran sitions from state to state. The treatment and control groups were modelled by dif ferent transition probability matrices which were compared to determine the effect of the interferon injections. Models assuming stationarity, the Markov property and other restrictions on the transition probabilities were fit to the data to determine whether a simple matrix could be used to describe the transitions of the patients. In Chapter 3, the Wilcoxon statistic, McCullagh model and 2 sample t test were compared to assess their relative efficiency in detecting a location shift between the two groups when the scores are assumed to have an underlying continuous distribution. The McCullagh model, a modification of the logistic regression model, incorporates the ordinal nature of the scores. In this analysis the data at only one time point was used. The comparison of the tests was based on asymptotic efficacy calculations and simulations. In Chapter 4, the stratified Wilcoxon, McCullagh and chi squared-type statistic were compared using simulation when analyzing the data between two time points. The chi squared- type statistic calculated was the usual analysis involving contingency tables where the cells were the number of patients in treatment and control who improved, stayed the same or worsened. The distribution of this statistic under the hypothesis of no difference between the groups is discussed. Chapter 2 Markov Analysis Since patients in this study moved from state to state over time, the movements from one state to another were modelled for each group using transition probability ma trices. Comparison of the two groups then involved analyzing the matrices estimated for each group. This approach does not require the assumption of a parametric form for the transition probabilities, however some models assuming a specific form for the probabilities were fit to the data to determine if the number of parameters required to describe the data could be reduced. Initially, the data was analyzed in its original form. Transition probability matrices were produced for each group (treatment and control) from 0 months to each of the other time points (1,3,6,9,12,18,24 months). Most observations were on or near the diagonals, indicating that the patients' scores did not change much from one time period to the next. As the dimensions of the matrices were 21 x 21 and there were only fifty patients in each group, many cells were empty. For this reason, the Kurtzke scores were grouped into five categories, chosen to ensure at least four people in each category at 0 months. The categories were scores of 0-4, 4.5-5.5, 6.0, 6.5, and 7-10, with 0-4 becoming state 1, 4.5-5.5 becoming state 2, etc. Transition matrices of the patients from 0 months to 24 months are displayed in Appendix A. All further calculations in this chapter are based upon this collapsed data. In Section 2.1, likelihood ratio statistics were used to determine if a second order Markov structure fit appreciably better than a first order model. A test for stationarity 3 Chapter 2. Markov Analysis 4 was also applied to the data. Some modelling of the data in the tridiagonal positions is discussed in Section 2.2. 2.1 Tests of Order and Stationarity The following tests were based on those described by Bhat [1]. Some notation used in the remainder of this chapter is: z — number of states t = time points studied, t = 0,1,. .., 7 corresponding to the times 0, 1, 3, 6, 9, 12, 18, 24 months Xt = Kurtzke score at time t. Using the collapsed data, transition probability matrices were calculated for tran sitions from 0 months to each of the other time points studied. Maximum likelihood estimates for PT and Pc, the transition matrices of the treatment and control group were computed, as well as for P, the transition matrix under Ho : PT = P°. A likeli hood ratio statistic was computed separately for each time point t = 0,1,..., 7 to test whether the two groups' transition probabilities were similar. Under the null hypothe sis that the two groups can be modelled using the same transition probability matrix, it can be shown [6] that the log likelihood function lnL(P) is: InL(P) = B + J2 I>S+ < ln^)- (2-1) i=l j=l where Chapter 2. Markov Analysis B = a term independent of the p4j's nf- = number of observations where Xt = j and Xo = i in the treat-ment group ng = number of observations in the control group where Xt = j and XQ = i Pij = Pv[Xt = j\ X0 = i]. Under the alternate hypothesis, Hi : PT ^ Pc, the log likelihood function is: lnL(Pc,PT) = B + ±± (nj,lnpg + ng Inpg) «'=1 j=l where pfj — Pr[X( = j | Xo = i in treatment group] pfj = Pr[Xt = j \ Xo = i in. control group]. The log likelihood statistic, G2, is then G2 = -2 [lnL(P)-lnL(Pc,Pr)] = 2 ± ± [(nl lnpg + ng Inpg) - (nj + ng)(lnp,,)] . t'=i j=i where all the probabilities were estimated using maximum likelihood techniques: lnL(P) = the log likelihood at P Pij = nIj + n% nf.+nf. Pij = "S Pij = nj. <. = Chapter 2. Markov Analysis 6 Under the null hypothesis, G2 has an asymptotic chi-squared distribution with degrees of freedom equal to z[z — 1) — yp where yp is the number of zero entries in P. Pearson's chi squared statistic, X2, was also calculated: *2 = ££ + i=l j=l L mIj mij (2.3) where mJ- = nj.pij, and mfj = nf.pij are the expected number of observations in the treatment and control group who moved from state i to state j between time 0 and time t. Results from the seven tests are given in Table 2.1 and the transition matrices between time 0 and 24 months are displayed in Appendix A. The p-values reported were based on the x2 approximation. The p-values were smallest (although not significant) at six and nine months, which is when the treatment was discontinued. This suggests that the treatment may have had an effect at this time, which wore off at eighteen months. Since the statistics calculated only measure absolute differences it was not possible to determine from them whether the treatment group did better or worse than the control group. No obvious trend could be seen from examination of the estimated transition matrices, but previous work [7] indicated that in fact the treatment group regressed (relative to the controls) from 0 to 6 months. 2.1.1 Tests of Order of Markov Chain Second order and first order Markov Chain models were fit to the data and the resultant estimates were compared. Each group was tested separately. A likelihood ratio statistic Chapter 2. Markov Analysis 7 was used to test H0 : chain is ls< order Markov vs. Hi : chain is 2nd order Markov. This test was carried out separately for every three consecutive time points in the study, i.e. at 0-1-3 months, 1-3-6 months, etc. If the data appeared to act as a 2nd order Markov chain then that would imply that a patient's score would depend on his score both at the previous time point and the time point before that. In this problem, the G2 statistic becomes: i=i j=i k=i nijk x n). inh- x n)k where n\-k — number of observations in which Xt-2 = h Xt-i = j, Xt = k njk = number of observations in which Xt_i = j, Xt = k n). — Yli=i ntjk nij. — J2t=i n\jk t = 2,...,z. G\ has an approximate chi squared distribution with J2j=i(z ~ rj — 1) X — Cj — 1) degrees of freedom where z is the number of categories, Cj and rj are the number of zero rows and columns respectively in the two dimensional transition matrix consisting of the transition probabilities p\jk (i = 1,..., z, j is fixed and k = 1,..., z). The results from these tests are in Table 2.2. Only the statistic for the treatment group in the time interval six to twelve months was significant at an a level of 0.05, which indicated that modelling with a Is* order Markov chain was reasonable for most of the time periods. The counts in these matrices were very low, so the statistics calculated may be misleading. Similar tests were applied to the 1st order Markov transition matrices to determine whether the final response depended on the previous score (1st order Markov) or not Chapter 2. Markov Analysis 8 (independence). The appropriate G2 statistic becomes: G! = 2£5X.ln z z (2.4) where The statistics calculated using (2.4) are in Table 2.3. All of the statistics were very large in comparison with the degrees of freedom at all time points which meant that the reduction from a 1ST order Markov chain model to an independence model was not reasonable. 2.1.2 Tests of Stationarity Assuming the data had the Markov Property, another likelihood ratio statistic was computed, testing for stationarity. We assume that the same transition matrix could be used to describe the patients' movements between all of the time intervals measured, independent of the actual real-time length of that time interval. A transition matrix was estimated using data from all time points. This was compared to the seven matrices estimated with the counts for every two consecutive time points in the study (0-1 month, 1-3 months, etc.). The G2 statistic here takes on the form: z z 7 n\. x riij t=:i j=i t=i where Chapter 2. Markov Analysis 9 Table 2.1: Comparison of Treatment and Control Groups Time Period G2 p value X2 p value degrees of (months) statistic (G2) statistic (X2) freedom 0-1 12.386 0.19 10.718 0.30 9 0-3 15.086 0.13 12.745 0.24 10 0-6 22.050 0.08 17.387 0.24 14 0-9 17.985 0.08 14.931 0.19 11 0-12 17.502 0.13 14.239 0.29 12 0-18 12.621 0.32 10.683 0.47 11 0-24 14.072 0.44 11.825 0.62 14 Table 2.2: Tests of 2nd Order vs. 1st Order Markov rrocess Time Period degrees of p-value (months) statistic freedom 0 -» 1 -> 3 25.400 19 0.15 1 -» 3 -• 6 22.141 16 0.14 Treatment 3 -> 6 -4 9 20.006 15 0.17 Group 6 -> 9 -> 12 21.161 11 0.03 9 -»• 12 -> 18 3.589 12 0.99 12 18 -»• 24 29.795 24 0.19 0 1 ^ 3 18.429 12 0.10 1^3-^6 5.997 12 0.92 Control 3 -> 6 -> 9 10.104 9 0.34 Group 6 9 12 19.101 13 0.12 9 '-»• 12 -» 18 11.920 11 0.37 12 18 -> 24 11.251 17 0.84 Chapter 2. Markov Analysis 10 The results are in Table 2.4. The statistics for both groups were nonsignificant (at a = 0.05) which implied that this model was not unreasonable for the time period studied. 2.2 Modelling of the Tridiagonals Specific modelling of the tridiagonals was done as most transitions were made to neigh bouring Kurtzke scores. In this section, all transitions made to non-neighboring states were ignored and transition probabilities not in a tridiagonal position were assumed to be zero; that is, p^ = 0 if | i — j |> 2. In the following calculations, the transition matrices were assumed to be stationary and first order Markov. The observations on the tridiagonals were modelled first using a general form where each row had different entries, then using a more specific form which was suggested by the data. In this section, the uneven spacing of the time points will be taken into account. The data at the one month time point were omitted so that the remaining time points (0,3,6,9,12,18,24) were separated by intervals which were multiples of three months. This allowed modelling of transitions over a three month period using the matrix, V. Thus, transitions over a six month period could be modelled using V2. To adjust for this omission, redefine t = 0,1,..., 6 corresponding to 0, 3, 6, 9, 12, 18, and 24 months. Subsection 2.2.1 discusses modelling the tridiagonals using the matrices V and V2 to describe transitions occurring over three and six month periods respectively. In Section 2.2.2, models are fitted assuming that the same transition matrix, P, can be used to model both three and six month periods. The entire time interval and that during treatment (0, 3, 6 months) and after treatment (6, 9, 12, 18, 24 months) were modelled separately so that the estimates could be compared. Some other observations were omitted from the analysis because the transitions were Chapter 2. Markov Analysis 11 larger than those allowed by the model. In this case, data from the subject in question was used until the violation occurred. A subject with a missing observation was dealt with similarly. Approximately 25% of the subjects in each group had a missing value at some time point after six months. Only six patients in the treatment group and three in the control group had transitions which were larger than those allowed by the model. 2.2.1 Modelling Incorporating V2 Transition matrices for treatment (VT) and control groups (Vc) were estimated and compared to a matrix estimated under the null hypothesis V° = VT. The data were first modelled using the general matrix, V: Pll 1 - pn 0 0 0 P21 P22 1 — P21 — P22 0 0 0 P32 P33 1 - P32 - P33 0 0 0 P43 P44 1 - P43 ~ 0 0 0 1 -P55 P55 The likelihood function L(V) was as follows: L(V) = Kx n,^" . (P2 )*,• where pij was redefined as: Pij = Pr [Xt = j | Xt-i = i] for t = 1, 2, 3,4 and Nij = Ei=5 n\j Pi = Pr[Xt = j | Xt_x = i] for t = 5,6 K = a constant independent of Pi/s. Chapter 2. Markov Analysis 12 To calculate the m.l.e.'s, it was necessary to solve eight nonlinear equations. This was accomplished using Powell's method, described in [5]. The G2 statistic calculated to compare the two groups was: <3 = £E t=i j=i (2.5) where njj, nfj, Nf-, Nf- were defined to correspond to counts in the treatment and control groups in the obvious way. Results of the test are in Table 2.5. In the time period during the treatment, G\ was large relative to the degrees of freedom. The low p values (calculated using the \2 approximation) indicated that, during this time period, the two groups' transition probability matrices were different. In the follow up period after treatment, and the entire two year period, there was no evidence that the two groups behaved differently. G2 statistics were calculated to determine how well the models fit the data (see Table 2.6). These statistics indicated that the fit of the model was reasonable at all of the time intervals modelled. 2.2.2 Modelling Without V2 Since the "stationarity" test seemed to indicate that all consecutive intervals in the study could be modelled using the same matrix, the above analysis was repeated under this assumption with the data gathered at one month again omitted. The period during treatment did not include any six month intervals so estimates are the same as in the previous section. G|, the log likelihood statistic comparing the two groups, was similar to that calculated in (2.3). The results, displayed in Table 2.7, show that the groups could be modelled reasonably according to the same transition probability matrix after treatment. Over the two year period, the statistics indicated that the transition probability matrices for the two groups were different, due probably to the differences which were detected in the first six months of the study (see Table 2.5). Chapter 2. Markov Analysis 13 G2 statistics for the period after treatment and the whole time studied are displayed in Table 2.8. These show that the predictions from the models agreed reasonably well with the actual data. After further examination of the data it was noted that most of the diagonal ele ments were similar. A pattern among the off diagonal elements was noticed also. To determine whether the data could be modelled with four parameters instead of the eight used above, the previous calculations were repeated, this time using a transition probability matrix suggested by the data, namely: Pll 1 - Pn 0 0 0 P21 P22 1 - P21-P22 0 0 0 P32 Pll 1 ~ P32 ~ Pll 0 0 0 1 ~P32 ~ Pll Pll P32 0 0 0 1 - Pll Pll The estimates computed for the specific and general tridiagonal models were compared for each group separately using G2: G27 = 2££<> i=l j=l 'A ,Pfj where pfj = m.l.e. of pij under the hypothesis of a general tridiagonal matrix pfj = m.l.e. of pij under the hypothesis of the specific tridiagonal matrix above. Chapter 2. Markov Analysis 14 The results are shown in Table 2.9 to 2.10 and estimates for the specific and general matrices are provided in the Appendix. The statistics for the control group indicate that the general model does not fit significantly better (at an a level of 0.05) than the specific model over the two years studied. The specific matrix estimated for the treatment group however, was only reasonable for the follow up period. As it was of interest whether the treatment and control group could be modelled using the same matrix after treatment, this matrix was compared to that for the control group (over the two year period) and they were found to be significantly different. This difference may be due to the fact that the specific matrix for the control group does not fit the data particularly well. A chi-squared statistic for the goodness of fit was 6.056 (p=0.195). From an examination of the matrices, it appears that in the control group, those patients in states other than two were more likely to stay in that state. If they were in state two, however, they were more likely to move to state one. This was also true for patients in the treatment group after six months. During administration of the drug, treatment subjects were more likely to move to a higher state (i.e. to become sicker) than the control group. All of these analyses indicate that the progress of the disease in the treatment group was not the same as in the control group in the first six months of the study. In the follow up period, there was no evidence to indicate that any of the groups was worse off than the other and over the entire time period studied there seemed to be little difference between the two. A model assuming the transition probability matrix was first order Markov and stationary seemed to fit the data reasonably well. The modelling of the tridiagonal elements revealed that a specific model in which all diagonal elements were the same except for the second, appeared to fit the data. From the matrices estimated using this model, it is seems that patients with scores of 4.5 Chapter 2. Markov Analysis 15 to 5.5 are more likely to move to an adjacent score than patients in any of the other categories used. However, it should be noted that the collapsed categories used in this analysis were somewhat arbitrary and may produce misleading results. As well, many of the cells in the transition matrices tested were zero, which could produce statistics that do not reflect the data accurately. Chapter 2. Markov Analysis 16 Table 2.3: Tests of 1st Order vs. Independence Time Period Gl _ degrees of (months) statistic freedom 0 -> 1 52.025 16 1 -»• 3 42.851 16 Treatment 3^6 53.937 16 Group 6 -> 9 46.017 16 9 -» 12 59.955 16 12 -»• 18 51.898 16 18 -* 24 49.019 16 0 -> 1 58.278 16 1 -> 3 44.748 16 Control 3 -> 6 55.247 16 Group 6^9 45.811 16 9 ^ 12 52.629 16 12 -»• 18 51.747 16 18 ->• 24 63.797 16 Table 2.4: Results of Test for Stationarity Group G\ df p value treatment control 88.921 65.198 84 78 0.34 0.85 Table 2.5: Treatment vs. Control (general tridiagonal modelling) Time (months) 0-3-6 6-9-12-18-24 0-3-6-9-12-18-24 G\ 19.250 4.283 11.415 p-value 0.01 0.83 0.18 Chapter 2. Markov Analysis 17 Table 2.6: Fit of General Tridiagonal Models Time Treatment degrees P Control degrees P (in months) Group of value Group of value freedom freedom 0-3-6 1.22 8 0.99 3.73 8 0.88 6-9-12-18-24 14.75 24 0.93 25.91 24 0.36 0-3-6-9-12-18-24 43.76 40 0.32 42.80 40 0.35 Table 2.7: Treatment vs. Control (general tridiagonal - no P2) Time (in months) 6-9-12-18-24 0-3-6-9-12-18-24 3.456 9.642 p-value 0.484 0.047 Table 2.8: Fit of General Model - no P2 Time Treatment degrees P Control degrees P (in months) Group of freedom value Group of freedom value 6-9-12-18-24 16.74 8 0.86 27.66 24 0.27 0-3-6-9-12-18-24 44.08 40 0.30 43.55 ) 40 0.32 Table 2.9: Comparison of Specific and General Models (treatment) Time Treatment (in months) G2? degrees p-value of freedom 0-3-6 11.815 4 0.018 6-9-12-18-24 3.919 4 0.417 0-3-6-9-12-18-24 14.7211 4 0.005 Chapter 2. Markov Analysis 18 Table 2.10: Comparison of Specific and General Models Time Control (in months) G? degrees p-value of freedom 0-3-6 6.531 3 0.088 6-9-12-18-24 6.017 4 0.197 0-3-6-9-12-18-24 5.565 4 0.234 Chapter 3 Analysis At One Time Point A typical analysis involves comparing the scores of the patients in the treatment and control group at one time point. Commonly used for this are the 2 sample t statistic (although inappropriate) and the Wilcoxon rank sum statistic. Here, these statistics were compared to a McCullagh model statistic described in [4]. The McCullagh model can be used to analyze data with ordered, categorical responses. The form used in this chapter is the proportional odds model: log 7j(g) 1 -7j(£). 9j -f3Tx . . — (3.6) where j = 1,2, ...,z z = total number of categories x = vector of covariates 7j(af) = probability of being in category j or lower given covariate x 9j = cutpoint j, the (unknown) point separating the categories j and j + 1 f3 = a vector of unknown parameters K = a scale parameter. The relative efficiencies of the three tests were compared using simulation and ef ficacy calculations. Data for the treatment and control group were simulated by first generating normal data with different location parameters. This continuous data was 19 Chapter 3. Analysis At One Time Point 20 used to create ordered categorical data according to a chosen set of cutpoints. Tests for differences in the two groups were then carried out using the 2 sample t, Wilcoxon rank sum and McCullagh statistic. The 2 sample t test was also calculated on the uncategorized data so that a test which did not lose information due to the categoriza tion could be compared to the others. These results are given in Section 3.1. Efficacy calculations were also made to determine how different the tests were asymptotically. The efficacies of the Wilcoxon rank sum test are calculated in Section 3.2.1, and for the t test calculated on the categorical data in Section 3.2.2. These efficacies are compared to the t test calculated on the underlying continuous data in Section 3.3. 3.1 Simulation A simulation was run to compare the powers of the Wilcoxon rank sum statistic, both 2 sample t tests and McCullagh model estimates. The control and treatment group were simulated using iV(0,l) and iV(A,l) distributions respectively to produce underlying continuous responses. These responses were then categorized with cut points chosen so that the probability of being in any category was 0.2 if the data came from a iV(0,1) distribution. Fifty observations (the same number as were in the Multiple Sclerosis data) were generated for each group. The simulations involved one thousand replications at each value of A. The A values used were those from 0 to 1 in increments of 0.1. The data were generated without conditioning on the number of observations in each category as this was the form of the Multiple Sclerosis data. Power curves were calculated for a one sided 0.05 level test of the null hypothesis H0 : A = 0 vs. the alternate hypothesis Hi : A > 0. The results of the simulation appear in Figure 3.1. As expected, the 2 sample t test on the underlying continuous data was the most powerful because no information was lost through collapsing the Chapter 3. Analysis At One Time Point Figure 3.1: Simulation of Data at One Time Point Chapter 3. Analysis At One Time Point 22 data. The Wilcoxon rank sum statistic and the 2 sample t test on the categorized data were very similar at all points. For the results summarized in Figure 3.1, K, the scale parameter for the McCullagh analysis was set to one. The covariate x was an indicator variable distinguishing the treatment from the control observations. In this case the test based on the McCullagh statistic was as powerful as the other tests on the categorized data. If K was included as a parameter depending on covariate x, it was not significantly different from 1 ~ 92% of the time. However, including this parameter decreased the power of the McCullagh statistic for values of A larger than 1.5. Differences in the simulated power between any of the models were not very large. The maximum difference was ~ 0.05 between the 2 sample t test calculated using the continuous data and the McCullagh model at A = 0.7, where the power of the t test is about 0.95. Given that the standard error of this difference was ~ 0.02, the t test was significantly more powerful than the McCullagh model at this time point. 3.2 Efficacy Calculations Another way of comparing the tests is to calculate their asymptotic relative efficiencies. The method of calculation used is that in Lehmann [3]. Let Vjv, V}f be two sequences of statistics based on N observations. Assume the distributions of both Vpi and depend on a real valued parameter 9. Let the null hypothesis be 9 = 90 and the alternate hypothesis be 9 > 8Q as in the simulation. Define BN to be the power of the test which rejects HQ if Vff ~ /*(flo) C/v(#0) > CAT (3.7) and 8'N to be the power of the test based on which rejects HQ if cr'N(90) (3.8) Chapter 3. Analysis At One Time Point 23 where pi(90), CJV(#0) and fi'(90), cr'N(60) are normalizing constants which may be the expectation and standard deviation of VN and respectively, and c/v and c'N are sequences of critical values. Assume 9^ is a sequence of alternatives, converging to 90 in such a way that 9^ = 90 + For most commonly used tests, this condition is sufficient to ensure that PN(9N) —• B0, 0 < 0O < 1. Define N' to be the sample size required for V^, to achieve the same limiting power as Vjv against the same sequence of alternatives 9^. If c/y, c^i —> za (the (1 — a)th quantile of the standard normal distribution) as N —> oo, then the Pitman efficiency of VN relative to V^, is limjv-foo (N'/N). Suppose that whenever 9^ = 90 + 0N(9N) - $(cA - za) and 3'N(9N) $(c'A - *„). Then c and c' are called the efficacies of the tests based on Vpj and V^, and the Pitman efficiency of Vjv relative to is: (c/c'f 3.2.1 Efficacy of the Wilcoxon Suppose the ordered categorical variable takes on values gi < g2 < • • • < gz, where z is the number of categories. In this analysis, observations tied in a category were assigned midranks, so the Wilcoxon rank sum statistic (with ties) was: W = ^(iVa + l)/2 + T2(NX + (N2 + l)/2) + • • • + T,(£ AT,- + (Nz + l)/2) (3.9) Chapter 3. Analysis At One Time Point 24 where Tj = number of observations in the treatment group in category j Cj = number of observations in the control group in category j and Nj = Tj + Cj. Let qj = Pr[in category j | in control group] Pj = Prfin category j | in treatment group] iV=total number of observations. The test based on the Wilcoxon involved normalizing W as follows: W-E0(W) yjvar*0{W) ' where E0(W) is the expectation of the Wilcoxon under H0 : qj = pj Vj, and varl(W) is the variance of the Wilcoxon under HQ conditional on the number of tied observations in each category. The power of the test at a specific alternative could have been calculated either unconditionally or conditionally on the tied observations. In this case, unconditional power was used as it simplified the efficacy calculations of the t test and eliminated the need to generate a fixed number of observations in each category for the simulations. The unconditional power of the test using the Wilcoxon statistic was: f3N(p,6)=?r where W - E0(W) > z, Jvar*(W) a Chapter 3. Analysis At One Time Point 25 E*(W) = E0(W) = ^^-var:(W) = _ ^^^Nf ^ Nj) 8 = y/N{q-p) m = total number of observations in the control group n = total number of observations in the treatment group JV = total number of observations in the study = n + m. Theorem 3.2.1 Suppose (Ti, T2,..., Tz) ~ multinomial(n, pi, p2, • • •, pz) independent of (Ci, C2, • • •, Cz) ~ multinomial(m,qi,q2,. .. ,qz) and N —> oo in such a way that n/N -> a, m/N -> b, 0 < a < 1. Then, as N —> oo, MP) - $(/(££) - za) (3.io) where * V^E;:; *J+E£ E£ P*-E£ E^+1 W*) V^;=i r>p>-Ei=i E,= i '•i'-iPiPj rj = 1 + Ef=j+i Pi — Ei=i defining E/_1 Pi ^° &e 0 /or aw?/ integer I. In the simulation run of Section 3.1 px = p2 = • • • = pz and the underlying continu ous distribution was known. The following corollaries state the efficacies under those conditions and the proof of Theorem 3.2.1 follows. —» —* Corollary 3.2.1 Assume that the multinomial vectors T and C are generated from XT± , XT2, •. •, Xxn ~ Hd N(A, 1) and XGl, Xc2,..., Xcm ~ iid iV(0,1) in the usual way. That is, an observation is in category i if and only if the X value is between cutpoints di and 9^ where 90 = —oo and 9Z = oo. Then under the assumptions of Theorem 3.2.1, and with A = 3N(A) -> $(c • £ - za) Chapter 3. Analysis At One Time Point 26 where \/a6 vEi=i ryPi-E,'=i Ej=i - /or j = 1,..., z - 1 probability density function of the standard normal distribution at 6j. Corollary 3.2.2 In i/ie case where —A, -Xr2—A,..., XT„ —A anc? > -^c*2> • • • > -^"cm are zzc? observations with a continuous density, h, and p1 = p2 = • • • = pz then, under the assumptions of Theorem 3.2.1., the efficacy of the Wilcoxon becomes: Va~b^-\(z - j)e3 where A = (5/x/iV ej =h(0j)-h(6j-1)j = l,2,...,z-l Oj = outpoint j. The above results do not depend upon the scores associated with the categories. Proof of Theorem 3.2.1: PN(p,6) = Pr = Pr W - Ea(W) y/var*(W) W - E(W) > zD var*(W) E(W) - Ea(W) y/mnN V mnN y/mnN It will be shown that if N —» oo in such a way that jj —> a, ^ —» 6, then 1- W^^0'^) ^ = i [££} r?Pi - E*=i EJ=1 rtTiPiPi] r,- = 1 + Ej=J+i Pj ~ Ej^i Pj Chapter 3. Analysis At One Time Point 27 „ vart(W) -, 3. wn-^w-* ffat). 1. To simplify the calculations, substitute m — £;=i Q for C2 and n — Ti for T2 in the expression for the Wilcoxon (refeq:wilc) to obtain: W - E(W) _ 1 \J mnN 2VmnN mn 2-1 2-1 2-1 2-1 nYJCi-mYJTj + Y^Tjd - £ E TJCi !=1 j=l j=l t'=l j=l «=j+l 2-1 2-1 j-l Z-l 2-1 E^ + EE^-E E j=i«=j+i Let c; Cj - mqj Ti — npi Then, W - E(W) _ 1 yj mnN 2 2-1 »=i 2-1 -li-i £ E rtc: - ^ E w + ^ E E (27c; - ir q) Since T*,Cj" converge in distribution to normal random variables with mean 0 and variance Pj(l — Pj), as iV —> 00, I 2-1j-i Thus as N —> 00, W - E(W) y/mnN where 2-1 J-I ri = 1 + E Pi - E Chapter 3. Analysis At One Time Point 28 2-1 t-1 »t = i + Yl 9j - £ 9; j=i+l j=l defining E^fc = 0 and EfJmPi = 0- Since cot;(C;,C;) = -qiqj and cot;(T*,T*) = —piPj, wjJ^^ converges to a normally distributed random variable with mean 0 and variance a w-^ / z—1 z—1z—1 uw = j\YL r)pj - Y Yl ririPiPj * \j=i t=i j=i 2. Rewriting the ratio as var*0(W) _ var*(W) var0(W) mnNa2v var0(W) mnNo~w' it will be shown that both ratios on the right hand side converge to 1. For the first ratio, var0(W) = E0[var*0(W)] + var0[E*0(W)] = E0[var*0(W)] since E*(W) is a constant. So var*0(W) _ var*0(W) var0(W) mnNo-w E0[var*(W)] mnNa^' But varn W Jf^arliW) 1 N~* ab 12 ab 12 mn(N + 1) ™ Y(N3-N) 12N(N - 1) jr?"3 j) as N —> oo, and 12 m. i-E N Chapter 3. Analysis At One Time Point 29 ab 12 ab 12 1 - £ (aPi + Hjf 3=1 under HQ. Thus, varl (^f) converges to a constant, say y, and is bounded and positive. Hence, E var* ^-^==)j also converges to y, and so var*0(W) Now we need to prove that E[var*(W)] var0(W) mnNaw 1. Recalling that under H0, pj = qj Vj, and so rj = Sj V j it can be shown that z-l z-l (W-E(W)\ 1 var0 . —• -V VmnN J 4 z-l £ r]pj - £ £ rtrjp;pj t=i j=i So var, „2 crw. o(W) mnNa^ as N —> oo. 3. Finally, using parts 1 and 2, the efficacy term can be shown to converge to a constant: E(W) - E0(W) r2w yj mnNa\ KM. So now we have that 3N(pJ) - Pr [JV(0,1) < /(p,<?) - za\ . Chapter 3. Analysis At One Time Point 30 Proof of Corollary 3.2.1. Note that, Pj = - A) - - A). Hence, as N —> oo, Substitution of this expression into (3.10) yields the result of Corollary 3.2.1. Proof of Corollary 3.2.2. From Theorem 3.2.1, the efficacy term is: For the case pj = 1/zV j, y/XJZl fjPi - E&1 Eg r,TiPiPi (2—1 2—1j—1 2—1 2—1 £^ + ££M*'-£ £ Vab = — Vao" z = —y/ab -sz + - £ -; -1) - - E(J -2-2 j=2 1 z 2 — 2 + - E - 2i^) + (*i - ^-0 Z j=2 Z 2-1 2-1 *£*; + £(*-2j)$> 3=1 3=1 2*£(z-j)6e since ^ Si = 0 »=i Since it can easily be seen from the above argument that Si ~ VNA[h(8j) - h(0j..i)] = Se, Chapter 3. Analysis At One Time Point 31 for any continuous density h. Note that when pj = 1/z, z-l 12 ri = z-l 3=1 gr? = 2(> - 1)(2* - 1) 3=1 3 3z Substitution of these expressions into the denominator of the efficacy term produces: z-l z-lz-l \ 12 rjPj ~1212 nrjPiPj = \ j=i i=i j=i and it can easily be seen that the efficacy term is: vV-i)/i2 3.2.2 Efficacy of the T test ?2 - 1 Let the response variable be as in Section 3.2.1. Then the 2 sample t statistic can be written as: T = Z]Zl(9i ~ 9z)(mT3 - nCj) nmS\Jl/m + 1/n where S = pooled standard deviation of the 2 groups (treatment and control). As N -> oo, (with n/N -> a, m/N —• b), a Z 2—12—1 12(dj ~ 9z)2Pj ~ 12 12(9j - 9z)(9i ~ 9z)PiPj j=l 3=1 *=1 Z 2—1 2—1 - 9z)2Pj - 12 12(9j - 9z)(gi - 9z)qi9j j=l 3=1 i=l + and, under HQ, z-l z-lz-l S2 -* a\ = Y(9j ~ 9z)2Pj - 12 12(9j ~ 9z)(9i ~ 9z)p, Pj-Chapter 3. Analysis At One Time Point The power of the 2 sample t test, II^(p) is nN(p) = Pr^[T>ta}. Theorem 3.2.2 Under the assumptions of Theorem 3.2.1 r —* IIJV(P) $ [w(p, 8) - za where ™IP, ~ -Pj Corollary 3.2.3 If XT and Xc are as in Section 3.2.1 then, as N oo, njv(A) = $(c • 8 - za) where dj and 8 are as in the previous section and vya&££}(fl,i - 9z)dj c = \JHU\(9j - 9z)2PJ - £^=1 ELiO - 9z)(g, - 9z)PiPj where A = 6/y/W and n/N —• a, m/N —* b. Corollary 3.2.4 When Pl = p2 = • • • = pz = \, (XTl - A,XTi - A,..., XTz and (Xcx, Xc2, • • •, Xcz) are as in Corollary 3.2.2 and gi oc i, the efficacy term 2 sample t test becomes where ej is defined as previously. Proof of Theorem 2.2.2: HN(p) = Prp^[T>ta} Chapter 3. Analysis At One Time Point 33 Let 7 = £jZ\(9j ~ 9z)(mTj - nCj). Then ILN(p) = Prft? 7 mnSyJl/m + 1/n Pr p,5 7 £(7)' mntaSvyJl/m + 1/n - E(j) > \/var(y) In order to reduce this expression to the form in (3.2.2) the following will be shown: 1. ^±^N(0,1) y/var(-y) 2. var( i 7 „) —> <7 yJl~F" yf var(y) mntaSWl/m+l/n 3. / . . • zc Eh) w(p,6) yj var{i) Part 1 is true by the Central Limit Theorem. To prove part 2, note that: uar(7) = var J^(9j ~ 9z)(mTj - nC3) v=1 mn z z—1z—X 12(9 i - 9zf [mpj + nq3] - ^ 12(9 j ~ 9z)(9i ~ 9z) [mpiPj + nqiq3} j=l j=l i=l As N —> oo, p3 —> qj and hence, (z-l z-lz-l Y(9j - 9z)2Pj - 12 12(9j - 9z)(9i ~ 9z)piPj j=l j=l i=l . var(- 7 Now ' y/mnN' mntaS^l/m + 1/n _ taSy/mnN sfmr^y) taS ,/uar(-7=2==) V v VmnN' Chapter 3. Analysis At One Time Point 34 The efficacy term can then be found to converge to a constant: Vob'ZjZiigz - 9 y/Z&9j ~ 9z)2Pi - Ei=i£f=i(0.- - g,)(gj - g,)PiPj 3.3 Comparison of the Efficacies The 2 sample t test is often used in analyzing ordered categorical data although it is not appropriate. For this reason, the relative efficiency of the Wilcoxon and the t test is of some interest. Under conditions listed below, as the number of observations becomes large, the efficacy terms for the Wilcoxon and the t tests applied to such data converge to the same expression: Sufficient conditions for (3.11) to hold are: 1. the probabilities of the responses are equal i.e. p1 = p2 = • • • = pz and 2. the scores for the categories are proportional to the category number. 3. the scores are generated by a continuous distribution. The efficacy for the t test applied to the continuous data is calculated in Lehmann [3]. It is c = where Chapter 3. Analysis At One Time Point 35 c is the standard deviation of the distribution of the data a = n/N, the proportion of the sample in the treatment group b = m/N, the proportion of the sample in the control group. The efficiency of the Wilcoxon applied to the categorical data relative to the the t test applied to the underlying continuous data was of interest as this indicated the decrease in power due to the categorization of the continuous data. Under the condition of equal p/s and data generated from a continuous distribution, the Pitman efficiency of the Wilcoxon to the t test was: ,2 1 ewilc,t(cut) - [12(72(E-l^ _I)E.)2 So, for example if the control group's scores were uniformly distributed on [0,1] and the treatment group's scores were uniformly distributed on [A, 1 + A] then the efficiency of the t (uncut) test to the Wilcoxon (on scores categorized as in the simulation) is: c-(z-iy (Although the uniform density is not continuous at 0 and 1, Corollary 3.2.2 can be modified for this case.) It is easily seen from this that as z, the number of categories becomes large, c —> 1, which is what would be expected. The simulation in Section 3.1 was run under conditions 1 and 2 with the normal as its underlying continuous distribution. Although there were only 100 observations in each run, the curves in Figure 3.1 are very close. Using the efficacy terms calculated, the asymptotic Pitman efficiency of the Wilcoxon with respect to the t test on the continuous data was calculated and found to be 0.89. This means that the t test requires ~ 89% of the observations needed by the Wilcoxon in order to attain the same limiting power under the same alternate hypotheses. These calculations seem to agree with the simulation run in Section 3.1. Chapter 4 Analysis at Two Time Points Most patients in the Multiple Sclerosis trials were examined at all of the time points 0, 1, 3, 6, 9, 12, 18, 24 months during the study. A better indication of the differences between the treatment and control group could be gained by analyzing how the scores of the patients change between two time points. The power of some statistics frequently used to analyze ordered data over time were compared using simulation. These statis tics were the stratified Wilcoxon, the McCullagh, and a chi squared statistic. Section 4.1 contains a description of the simulation runs. The Wilcoxon and the chi-squared statistic are described more fully in Sections 4.1.1 and 4.1.2. In the latter, the dis tribution of the chi-squared statistic is derived under the hypothesis of no difference between the treatment and control groups. In Section 4.2, the results of the simulation are discussed. 4.1 Description of the Simulation An initial and final score were generated for each subject simulating his condition at two time points. The values of the underlying continuous random variables giving rise to these scores were denoted by (Rf, Y?) or (Rf,Yf) for those in the treatment or control group. These data were categorized using the same cut points as in the previous chapter to produce scores. The distribution of these pairs was: /^\ /r n 1 r, .IN Y? J I 0 1 P BVN \ . £ + A . i . P 1 . 36 Chapter 4. Analysis at Two Time Points 37 and \ / The parameters 3 and A represented the changes in the scores due to the progression of the disease and the treatment respectively. Power curves were computed for a level a=0.05 test of the null hypothesis A = 0 vs. the alternate hypothesis A ^ 0. A simulation run generated two groups (fifty observations/group) at A values of 0 to 1.2 in increments of 0.1. Each curve was calculated using one thousand simulation runs. The stratified Wilcoxon, chi-squared and McCullagh model statistics were calcu lated for each set of data generated. The McCullagh model statistic was computed using the plum software in a way similar to that in Chapter 2, except that the co variate x had multiple components. One component was a variable indicating whether the observation was from the treatment or control group. The other components were indicator variables for each possible initial score. The scale parameter was set to one. Since, in the simulation, the continuous observations were known, the continuous response variable was regressed using an indicator for the treatment/control groups and the initial values for covariates. The null hypothesis H0 : A = 0 was rejected if the estimate for the treatment/control variable was significantly different from zero. This statistic was included so that the decrease in power due to the loss of information from categorization could be determined. 4.1.1 Stratified Wilcoxon The stratified Wilcoxon rank sum statistic, described in [3] (p. 132), is an extension of that employed in the previous chapter. The subjects in the two groups were first stratified according to their initial score. Within each strata, s, the Wilcoxon statistic / Rc \ I - BVN 0 0 1 P P 1 Chapter 4. Analysis at Two Time Points 38 Ws was computed, as in Chapter 3, using the final scores as the basis for the ranking procedure and summing the treatment ranks. Each of the statistics, Ws, was normalized by subtraction of its mean to produce W*: w; = wa- E(w8) where E(W.) = 2^±H Ts — number of observations in treatment group with initial score s Ns = total number of observations with initial score s. The overall Wilcoxon, W*, was then calculated as: WT The variance of W* was: ~se(W*) [se(W*)f = £ where Ni + 1 z = number of strata se(Wj) ~ standard deviation of W* calculated in the usual way. The test was then rejected or accepted using the normal approximation: W* se(W*) 4.1.2 Chi-squared Statistic JV(0,1). A natural test of the null hypothesis that a subject's progress is independent of treat ment received is based on a simple contingency table analysis. The resulting statistic, referred to as xL/o ^s Dased on the number of subjects whose condition improved, de teriorated or remained the same over the two time points. To calculate Xca/ci ^ne data were collapsed into a 3 x 2 contingency table: Chapter 4. Analysis at Two Time Points Treatment Control Totals c-n~ rpo C° n° T+ C+ n+ n m N where rp— 1 X^z T1 c = Ef=i EjLt+i Cij C° = E?=1Cy rp-\- V-"*2 VM—1 rp 1 — 2^i=2 l^j=l •Lij c+ = EUE&Ci,-T{j = the number of observations in the treatment group with initial score i and final score j Cij = the number of observations in the control group with initial score i and final score j n = number of observations in the treatment group m = number of observations in the control group N = total number of observations in the study. The statistic was calculated as: y2 = (r- - ^)2 (c- - ^)2 X-calc nn- ~r mn-N N (rpo nn"^ (Vio mn° \ 1 ~ir) Ac ~ —) nn" ' mn" N N Chapter 4. Analysis at Two Time Points 40 (T+ - (c+ TV The statistic, Xcaici ls usually assumed to have a limiting x2 distribution with two degrees of freedom. It was not clear if this approximation was valid so the limiting null distribution of xL/c was derived. Two methods of sampling which produce data in a fashion similar to that in the Multiple Sclerosis study were considered, and led to distinct limiting null distributions for Xcalc- ^ description of each of these sampling schemes and the distribution of xlaic *s preceded by some notation used in this analysis: Pij — Pr[ final score = j | initial score = i, in treatment group ] qij = Pr[final score == j | initial score = i, in control group ] p^j = Pr[ final score = j, initial score = i, in treatment group ] qfj = Pr[ final score = j, initial score = i, in control group ] rii = number of treatment observations in strata i rrii = number of control observations in strata i Unconditional Sampling In unconditional sampling, subjects are randomly assigned to groups and then are stratified according to the chosen covariate. In this case, the numbers of observations in each strata is random. This type of sampling is appropriate when it is not important if strata are empty, or there are so many subjects that the experimenter is assured that it is unlikely that any of the strata will be empty. Suppose that (Tn,Ti2,... ,TZZ) ~ multinornial(n,p"2,...,p"_J, independent of (Cn, C12,..., Czz) ~ multinomial(m, gn, q™2,..., q™z) under unconditional sampling. If we define T to be a vector with the components (T+, T~,T°) and C to be (C+, C~, C°) Chapter 4. Analysis at Two Time Points 41 then T and C are independent with distributions: T ~ multinomial(n,p+ ,p ,p°) C ~ multinomial (m, q+, q , q° ) where P+ — Ei=2 Ej=i Pij P ~ Ei=l ^2j=i+l Pij P° = ZUPIJ o+ — V* y^-1 a?-y — i^i=2 Hi] 1 = £i=i £j=;+i Qtj In this case, the usual asymptotic theory for independent multinomials holds and under the hypothesis p+ = q+, p° — q°, p~ = q~, (and thus also under the more restrictive null hypothesis p^- = q^ V i,j), xlaic 1S asymptotically distributed as a x2 with two degrees of freedom. Conditional Sampling In conditional sampling, the number of subjects in each strata is fixed at the begin ning of the experiment. This type of sampling could be used when it is expensive to allow many subjects to take part. It ensures that there will be observations in each strata, although the total number of subjects may be relatively small. This type of sampling is common so it is natural to calculate the distribution of xlaic conditional on n{ and mt-. In this case, the distribution of the counts becomes (Tn,Ti2, • • • ,TJ2) ~ multinomial(ni,pn,pi2, ... ,Piz) and (Cn, C,-2,..., Ciz) ~ multinomial (mi, qn, qi2,..., qiz) where i = 1,2,...,z and all multinomials are independent. Some assumptions made to Chapter 4. Analysis at Two Time Points 42 facilitate computations were that n,- = mt- V i and that N rii/N —¥ ai. Note that xlaic can ^e rewritten as: xLc = Kl + K\ + Kj oo in such a way that where K — J m \ 1 (T0 nn°\]2 •ZX2 V mnn" [VW V AT /J js- I f 1 / rp+ nn+ \ Theorem 4.1.1 If tii — TTij V i, then, under H0 : p,j = ^jj, V i,j as N —• oo ane? ^2,7^) ^>MVN(0,X) with E12 S13 and E23 = 1 Et=2 ^j=l Efc=l aiPijPik Ef=2 Ej=i fli'Pij ELi «jP 33 1 -Ej=i ajPjj>' Ei=l Ej=i+1 Efe=t'+1 aiPijPik Ej=:l Ej=i+1 at'Pij vj=l ^iPiiPij Ez v-»t—1 i=2 Z^j 1/2 [(E*=2 E}=\ a,-p„) (Ej=i ajP«)] — Et'=2 Ej=i Efc=i+i AiPijPik (E;=2 Ej=l aiPij) (Ef=i Ej=;+l Q-iPijj 1/2 Ef=l Ej=j-|-l aiPiiPij [(zui zUi+i aiPij) (Ej=i ajPii) 1/2 Chapter 4. Analysis at Two Time Points 43 Thus, where Zt, Z2, Z3 are iid N(0,1) and Ai,A2,A3 are the eigenvalues o/S. Proof of Theorem 4.1.1. Let X2 ~ VW\ ~N~ x3 = _L(T+-^ VN \ N To determine the limiting distribution of Xcaio nrs^ nnd the limiting distributions of Xi,X2,X3. If and then rp* Tjj TljPij 13 V^i c. * Cij rriiqij 'mi Xx = nn ~N z i—1 EE2* nn vwyuu N Chapter 4. Analysis at Two Time Points 44 1 )z i—1 ^ z i—1 £ £ Tij - JJ £ £ °ij i=2 j=l iv i=2 j=\ JV 7 t=2 i=l iV i=2 i=l Therefore, under iJ, 05 z i—1 i=2 j=l since ra; = mt- and p,j = under HQ. Under H0, T*j and C*j converge to normal random variables with mean zero, and thus Xi is asymptotically normal with mean zero and variance: var(Xi) ~ var Under H0, n Z \t=2i=l j (z i—1 z i—1 j—1 ^ £ E a^iiC1 - Pij) -2 E E E aiPijPik i=2 j=l i=2 i=l fc=l , z i—1 z i—1 i—1 EE EEE diPijPik i=2j=l i=2j=lk=l iV z i—1 ~ fefeU iV + m, X iV z i—1 EE(P<'ifli + ft'ia') i=2 j=l z i—1 = 2££<w, i=2 j=l and thus K~i is asymptotically normal with mean zero and variance, Et=2 Ej-l QiPij ~ £j-2 Ej=l Efc=l AiPijPik 12i=2 Ej=l aiPij Chapter 4. Analysis at Two Time Points 45 Similar calculations show that K2 and K3 are asymptotically normal. The variance/covariance calculations are straightforward. Now (KUK2,K3) ~ MVN(0,V). If S is a positive semidefinite matrix, 3 a diagonal matrix D, with the eigenvalues (Ai, A2, A3) of E as its elements, and a matrix P, such that PTP = I, and PHPT — D. Let Y = PK, then Y ~ N(0, D) and KTK = YTY ~ AiZx + A2Z2 + A3Z3 In most cases, the eigenvalues of S are difficult to find explicitly. However, one simple case is considered in the following: Corollary 4.1.1 Suppose that n\ = n2 = ... = nz and p^ = 1/z V i,j. Under the null hypothesis, where ZUZ2 ~ JV(0,1). Thus, xlaic does not have a limiting chi-squared distribution with two degrees of freedom in this case. Proof of Corollary: Using Theorem 4.1.1, the asymptotic covariance matrix of (Kx, K2, K3) is: The eigenvalues for this covariance matrix are Ai = 0, X2 = 1, A3 = | — ^. Chapter 4. Analysis at Two Time Points 46 4.2 Simulation Results The power curves computed by the simulations appear in Figures 4.1 to 4.4. In all of the figures, the most powerful test was the one based on the regression parameter. This result was expected as no information due to categorization was lost in the calculation of this statistic. The next most powerful was the McCullagh statistic which had a similar power curve to the Wilcoxon. In all cases both of these statistics were much more powerful than the chi-squared statistic. Because the chi-squared statistic did not take into account the initial score or the size of the difference, this was was not surprising. The powers of the tests increased with increasing p. This probably occurred because increasing p would decrease the variation in differences between the initial and final scores at fixed values of 0 and A. Changes in /? did not affect the power curves of the Wilcoxon, McCullagh or regression statistics, but increasing it did decrease the power of the chi-squared statistic. As /? became large, most patients' scores increased, and the difference between the treatment and control groups were in the sizes of these increases. Since the Wilcoxon and the McCullagh statistics compared the treatment and control groups based on the differences of the final scores given the inital scores, an increase in 0 would not be expected to change the power of these statistics. However, very large changes in /3 would result in all patients moving to the largest score, in which case, none of the tests would detect any difference between the groups. The chi-square statistic computed here was unconditional as the initial number of observations in the strata were not fixed. It only recorded whether or not the patients' scores increased, decreased or remained the same, so its power was expected to decrease as /3 increased. Chapter 4. Analysis at Two Time Points delta Figure 4.1: Simulation Run 1 (8 = 0.1, p = 0.8) Figure 4.2: Simulation Run 2 (/? = 0.1, p = 0.6) Chapter 4. Analysis at Two Time Points Chapter 4. Analysis at Two Time Points delta Figure 4.4: Simulation Run 4 (0 = 0.3, p = 0.6) Chapter 4. Analysis at Two Time Points 51 Table 4.1: Results of McCullagh Analysis Time estimate standard z-score (in months) of treatment parameter error 0-1 -0.82 0.418 -1.95 1-3 -0.40 0.403 -1.00 3-6 -0.17 0.424 -0.40 6-9 0.98 0.466 2.10 9-12 0.07 0.51 0.14 12-18 0.31 0.452 0.69 18-24 -0.16 0.48 -0.33 4.3 Application of Tests to the MS Data The stratified Wilcoxon and x2 tests have been calculated on the MS data set previously [2]. Both statistics were calculated on the data between 0 months and each of the other times the patients were observed. The results of the test based on the Wilcoxon statistic showed that the treatment group regressed (p ~ 0.05) relative to the control gorup in the time periods 0-1 months and 0-3 months. The remaining statistics calculated were nonsignificant. However, the change in sign in those time periods larger than six months indicates that patients in the control group may have fared less well than those in the treatment group in the follow up period. Since the x2 is less powerful than the Wilcoxon, it is not surprising that none of the tests based on it were significant. The data were fit to a McCullagh model. Unlike the previous analysis, the collapsed scores were used here since the data were sparse. The covariates in this analysis were indicator variables for each of the possible initial scores and one for treatment/ control group. The data were modelled between consecutive time points rather than from baseline to the other scores. The results are shown in Table 4.1. The variable for the treatment group parameter was set up so that a negative value Chapter 4. Analysis at Two Time Points 52 implied that the control group regressed with respect to the treatment group. Collaps ing the data has produced results which are contradictory to the previous Wilcoxon analysis in the 0-1 month time period. The Wilcoxon and x2 analysis agree with the analysis using the Markov techniques in that all three show that patients in the treatment group may not have progressed at the same rate as those in the control group during the first six months of the study. In the case of the Markov analysis, it is not possible to determine which group improved relative to the other. Chapter 5 Conclusions As the data set was small, the number of observations which had any particular Kurtzke score was low. For this reason, the data were collapsed into five categories chosen only to ensure at least four observations in each category at zero months. Results from the Markov analysis indicate that, during the administration of interferon, the treatment group regressed relative to the controls. After this period, subjects in the treatment group appeared to return to a state in which their transition probabilities were not significantly different from the controls. Models which did not incorporate information about the length of time intervals between observations fit the data just as well as those that did. That is, a model which assumed that transitions between any two consecutive time points could be modelled using the same transition matrix fit as well as a model which allowed a different matrix for each time interval. Modelling with a specific tridiagonal matrix with all diagonal elements equal, except for the second, proved to be reasonable for the control group over the entire time period. The treatment group could be modelled reasonably by this form only in the eighteen month follow up period. The major differences between the two groups were in those patients that started at Kurtzke scores in the range 4.5 to 5.5. Control patients with initial scores in this range fared better than treatment patients, during the period when the interferon was administered. Statistics commonly used to compare the groups at one time point were examined 53 Chapter 5. Conclusions 54 using simulation techniques. If data for the control group were generated using under lying iV(0,1) distribution and those for the treatment group by a N(A, 1) distribution, and the probability of being in each category was equal for A = 0, then the Wilcoxon, McCullagh and 2 sample t test were found to have similar relative efficiencies. Theoreti cal asymptotic calculations yielded expressions for Pitman efficiencies of these statistics for general shift models. Under the conditions of the simulations, the efficacies of the Wilcoxon and 2 sample t test calculated on the categorized data were found to be equal. The Pitman efficiency of these two statistics with respect to the t test calculated on the underlying continuous data was 0.89. A comparison of statistics used to compare the two groups' progression of disease between two time points was then carried out. Simulation results showed that when the categorical data were generated using a bivariate normal distribution, with equal initial probabilities of being in any category, the Wilcoxon and McCullagh statistics were much more efficient than the chi squared statistic. The asymptotic distribution of the chi-squared statistic was derived under the hypothesis of no treatment effect. It was determined that the chi-squared statistic did not necessarily have a limiting x2 distribution with two degrees of freedom if, in the sample of patients, the initial number of subjects in each category was fixed. However, if the number was not fixed, the chi squared statistic did have an asymptotic x2 distribution with two degrees of freedom. Appendix A Sample Transition Matrices Transition Matrix for Control Group from 0 to 24 months 21 7 1 0 0 8 7 3 0 0 0 2 67 29 1 0 0 23 49 7 0 0 15 8 Transition Matrix for Treatment Group from 0 to 24 months 23 6 1 0 0 4 2 7 0 0 1 3 59 20 1 0 0 20 40 14 0 0 0 12 16 55 Appendix A. Sample Transition Matrices 56 Control Group - General Tridiagonal Modelling 0 months to 6 months 0.67 0.33 0.00 0.00 0.00 0.57 0.29 0.14 0.00 0.00 0.00 0.03 0.65 0.32 0.00 0.00 0.00 0.35 0.59 0.06 0.00 0.00 0.00 1.00 0.00 6 months to 24 months 0.81 0.19 0.00 0.00 0.00 0.36 0.45 0.18 0.00 0.00 0.00 0.02 0.71 0.28 0.00 0.00 0.00 0.24 0.64 0.11 0.00 0.00 0.00 0.20 0.80 0 months to 24 months 0.75 0.25 0.00 0.00 0.00 0.44 0.39 0.17 0.00 0.00 0.00 0.02 0.68 0.30 0.00 0.00 0.00 0.29 0.62 0.09 0.00 0.00 0.00 0.38 0.62 Appendix A. Sample Transition Matrices Treatment group - General Modelling 0 months to 6 months 0.75 0.25 0.00 0.00 0.00 0.13 0.13 0.75 0.00 0.00 0.00 0.07 0.57 0.37 0.00 0.00 0.00 0.35 0.32 0.32 0.00 0.00 0.00 0.58 0.42 6 months to 24 months 0.82 0.18 0.00 0.00 0.00 0.60 0.20 0.20 0.00 0.00 0.00 0.02 0.81 0.17 0.00 0.00 0.00 0.21 0.70 0.09 0.00 0.00 0.00 0.31 0.69 0 months to 24 months 0.79 0.21 0.00 0.00 0.00 0.31 0.15 0.54 0.00 0.00 0.00 0.04 0.72 0.24 0.00 0.00 0.00 0.27 0.54 0.19 0.00 0.00 0.00 0.43 0.57 Appendix A. Sample Transition Matrices 58 Control Group - Specific Tridiagonal Modelling 0 months to 6 months 0.61 0.39 0.00 0.00 0.00 0.57 0.29 0.14 0.00 0.00 0.00 0.04 0.61 0.35 0.00 0.00 0.00 0.35 0.61 0.04 0.00 0.00 0.00 0.39 0.61 6 months to 24 months 0.71 0.29 0.00 0.00 0.00 0.36 0.45 0.18 0.00 0.00 0.00 0.05 0.71 0.24 0.00 0.00 0.00 0.24 0.71 0.05 0.00 0.00 0.00 0.29 0.71 0 months to 24 months 0.67 0.33 0.00 0.00 0.00 0.44 0.39 0.17 0.00 0.00 0.00 0.05 0.67 0.29 0.00 0.00 0.00 0.29 0.67 0.05 0.00 0.00 0.00 0.33 0.67 Appendix A. Sample Transition Matrices Treatment Group - Tridiagonal Modelling 0 months to 6 months 0.48 0.52 0.00 0.00 0.00 0.13 0.13 0.75 0.00 0.00 0.00 0.18 0.48 0.33 0.00 0.00 0.00 0.33 0.48 0.18 0.00 0.00 0.00 0.52 0.48 6 months to 24 months 0.76 0.24 0.00 0.00 0.00 0.60 0.20 0.20 0.00 0.00 0.00 0.05 0.76 0.19 0.00 0.00 0.00 0.19 0.76 0.05 0.00 0.00 0.00 0.24 0.76 0 months to 24 months 0.65 0.35 0.00 0.00 0.00 0.31 0.15 0.54 0.00 0.00 0.00 0.11 0.65 0.25 0.00 0.00 0.00 0.25 0.65 0.11 0.00 0.00 0.00 0.35 0.65 Bibliography [1] Bhat, Narayan U., (1984). Elements of Applied Stochastic Processes 2nd edition. John Wiley & Sons. [2] Kastrukoff, L., et. al., (1988). unpublished manuscript. UBC Hospital. [3] Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day Inc., San Francisco, California. [4] McCullagh, P. (1980). Regression Models For Ordinal Data. J.R. Statist. Soc. B, 42, No. 2, pp. 109-142. [5] Walsh, G.R. (1975). Methods of Optimization. John Wiley & Sons. [6] Whittle, P. (1955). Some Distribution and Moment Formulae For the Markov Chain. J.R. Statist. Soc. B, 17, pp. 235-242. [7] The Statistical Advisory Service (1988). Multiple Sclerosis Dataset. unpublished manuscript, University of Manitoba. 60
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Analysis of ordered categorical data
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Analysis of ordered categorical data Chang, Janis 1988-08-28
pdf
Page Metadata
Item Metadata
Title | Analysis of ordered categorical data |
Creator |
Chang, Janis |
Publisher | University of British Columbia |
Date Issued | 1988 |
Description | Methods of testing for a location shift between two populations in a longitudinal study are investigated when the data of interest are ordered, categorical and non-linear. A non-standard analysis involving modelling of data over time with transition probability matrices is discussed. Next, the relative efficiencies of statistics more frequently used for the analysis of such categorical data at a single time point are examined. The Wilcoxon rank sum, McCullagh, and 2 sample t statistic are compared for the analysis of such cross sectional data using simulation and efficacy calculations. Simulation techniques are then utilized in comparing the stratified Wilcoxon, McCullagh and chi squared-type statistic in their efficiencies at detecting a location shift when the data are examined over two time points. The distribution of a chi squared-type statistic based on the simple contingency table constructed by merely noting whether a subject improved, stayed the same or deteriorated is derived. Applications of these methods and results to a data set of Multiple Sclerosis patients, some of whom were treated with interferon and some of whom received a placebo are provided throughout the thesis and our findings are summarized in the last Chapter. |
Subject |
Longitudinal method Multivariate analysis Order statistics Matrices |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-28 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097667 |
URI | http://hdl.handle.net/2429/27857 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1988_A6_7 C42_6.pdf [ 3.16MB ]
- Metadata
- JSON: 831-1.0097667.json
- JSON-LD: 831-1.0097667-ld.json
- RDF/XML (Pretty): 831-1.0097667-rdf.xml
- RDF/JSON: 831-1.0097667-rdf.json
- Turtle: 831-1.0097667-turtle.txt
- N-Triples: 831-1.0097667-rdf-ntriples.txt
- Original Record: 831-1.0097667-source.json
- Full Text
- 831-1.0097667-fulltext.txt
- Citation
- 831-1.0097667.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0097667/manifest