For x = 1 (level 1), the perfect-fit requirement yields four equations, one for each observed pair (i, j):

    w_{1ij} = m_{1ij0} φ_{1j0} + m_{1ij1} φ_{1j1},   i, j = 0, 1.   (6.8)-(6.11)

We can solve the two linear equations (6.8) and (6.10) for the two unknowns φ_J and φ_K. Substituting these solutions into (6.9) and (6.11) yields two further equations in φ_J and φ_K, and thus these parameters are overdetermined. The equation

    w_{211} = m_{2111} φ_J φ_K φ_X + m_{2110} φ_J φ_X

then yields a value for φ_X. Indeed, each of the w_{2ij} equations yields an equation for φ_X. Writing φ_X as φ_1 for the first level of x, as φ_2 for the second level, and so on, there is one such parameter for each level of x, and these parameters can all be identified. Thus, this specification for h_3({p,p}, y_3 | x; η_3) is identifiable.

• LOR * LUR

This model has φ_{xjk} = φ_{jk}, as there is no dependence on the covariate x. Thus, there are only four parameters for all the levels of x. As before, a perfect fit requires

    w_{xij} = Σ_{k=0}^{1} m_{xijk} φ_{jk},

which represents four linear equations in the same four unknowns for each level of x. Hence, h_3({p,p}, y_3 | x; η_3) is also identifiable under this parameterization.

In summary, h_3({p,p}, y_3 | x; η_3) can be identified if its form is one of the three types considered above.

6.4.2 Identifiability of h_2({p}, y_2 | x; η_2)

The verification of the identifiability of h_2({p}, y_2 | x; η_2) is similar to that for h_3({p,p}, y_3 | x; η_3). In addition to the notation from the previous subsection, denote v_{xi} = n_{y_1*,a,a,x} and

    γ_{xij} = h_2({p}, y_2* | x; η_2) / [1 − h_2({p}, y_2* | x; η_2)].

The contribution of h_2({p}, y_2*
| x; η_2) to L_x(θ, η) in (6.6) can be expressed as:

    Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} log ρ_{xijk} + Σ_{i=0}^{1} Σ_{j=0}^{1} w_{xij} log{ Σ_{k=0}^{1} ρ_{xijk} φ_{xjk} }
        + Σ_{i=0}^{1} v_{xi} log{ Σ_{j=0}^{1} Σ_{k=0}^{1} ρ_{xijk} (1 + φ_{xjk}) γ_{xij} }.   (6.12)

This is identical to the log-likelihood function for a contingency table {m_{xijk}} with two supplementary margins, namely {w_{xij}} (where k was not observed) and {v_{xi}} (where neither j nor k was observed). Therefore, the expected cell counts corresponding to m_{xijk}, w_{xij} and v_{xi} are

    n_{xijk} = ρ_{xijk} N_x,   Σ_{k=0}^{1} ρ_{xijk} φ_{xjk} N_x   and   Σ_{j=0}^{1} Σ_{k=0}^{1} ρ_{xijk} (1 + φ_{xjk}) γ_{xij} N_x,

respectively, where N_x = m_{x+++} + w_{x++} + v_{x+}.

• COV * LUR

This model has φ_{xjk} = φ_{xk} and γ_{xij} = γ_{xj}. Hence, we require

    w_{xij} = Σ_{k=0}^{1} m_{xijk} φ_{xk},   (6.13)

and

    v_{xi} = Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} (1 + φ_{xk}) γ_{xj}.   (6.14)

For a fixed level of x, (6.13) represents four linear equations in the two unknowns φ_{x0} and φ_{x1}, indicating these are overdetermined. With solutions for φ_{x0} and φ_{x1}, (6.14) represents two linear equations in the two unknowns γ_{x0} and γ_{x1}. Thus, h_2({p}, y_2 | x; η_2) is identifiable under this model.

• COV + LOR + LUR

In this model, we can represent φ_{xjk} = φ_x φ_j φ_k and γ_{xij} = γ_x γ_i γ_j. The equations for φ_{xjk} are identical to the earlier case for this model, and so the φ's are identifiable provided the covariate takes on at least two levels. It remains to show that the parameters γ_{xij} can also be identified. The equations for v_{xi} are

    v_{xi} = Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} (1 + φ_{xjk}) γ_x γ_i γ_j = Σ_{j=0}^{1} γ_x γ_i γ_j M_{xij},   (6.15)

where M_{xij} = Σ_{k=0}^{1} m_{xijk} (1 + φ_{xjk}) is treated as known since solutions for the φ's exist. Suppose x has 2 levels.
Using the same representation for γ_{xij} as was used for φ_{xjk} earlier, these equations become

    v_{10} = M_{100} γ γ_I γ_J + M_{101} γ γ_I,   (6.16)
    v_{11} = M_{110} γ γ_J + M_{111} γ,   (6.17)
    v_{20} = M_{200} γ γ_X γ_I γ_J + M_{201} γ γ_X γ_I,   (6.18)
    v_{21} = M_{210} γ γ_X γ_J + M_{211} γ γ_X.   (6.19)

Taking the ratio of (6.18) to (6.16) to eliminate γ γ_I, and of (6.19) to (6.17) to eliminate γ, leads to two equations in γ_X and γ_J from which γ_X is easily eliminated. This leads to a quadratic equation in γ_J; that is, A γ_J² + B γ_J + C = 0, where

    A = M_{100} M_{210} − VOR · M_{110} M_{200},
    B = (M_{101} M_{210} + M_{211} M_{100}) − VOR (M_{111} M_{200} + M_{201} M_{110}),
    C = M_{101} M_{211} − VOR · M_{201} M_{111},
    VOR = (v_{10}/v_{11}) / (v_{20}/v_{21}).

A perfect fit requires real roots, or B² − 4AC ≥ 0. Thus, h_2({p}, y_2 | x; η_2) is identifiable under this model provided the covariate takes on at least two levels and the condition B² − 4AC ≥ 0 is satisfied.

• LOR * LUR

This model has φ_{xjk} = φ_{jk} and γ_{xij} = γ_{ij}. Thus, there are 4 distinct parameters of each type for all the levels of x. These 8 parameters can be identified from the equations for a perfect fit:

    w_{xij} = Σ_{k=0}^{1} m_{xijk} φ_{jk},   (6.20)

and

    v_{xi} = Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} (1 + φ_{jk}) γ_{ij}.   (6.21)

For each x, (6.20) corresponds to 4 linear equations in the same 4 unknowns as in the verification for h_3({p,p}, y_3 | x; η_3). Substituting these solutions for the φ's into (6.21) leads to 2 linear equations in the same 4 unknowns γ_{ij} for each x. The 4 γ_{ij} parameters are determined as long as x has 2 or more levels. Hence, h_2({p}, y_2 | x; η_2) is identifiable provided the covariate x has 2 or more levels.
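The quadratic condition can be checked numerically. In the sketch below, all numeric values (γ, γ_X, γ_I, γ_J and the M_{xij}) are hypothetical illustrations, not quantities from the data; the margins v_{xi} are constructed from (6.16)-(6.19), so the chosen γ_J must reappear as a root of A γ_J² + B γ_J + C = 0 and the discriminant must be non-negative:

```python
import math

# Hypothetical parameter values and "known" quantities M_{xij}
# (illustrative numbers only, not from the thesis data).
g, gX, gI, gJ = 0.8, 1.5, 0.7, 1.2
M = {(1, 0, 0): 10.0, (1, 0, 1): 12.0, (1, 1, 0): 8.0, (1, 1, 1): 9.0,
     (2, 0, 0): 11.0, (2, 0, 1): 7.0, (2, 1, 0): 13.0, (2, 1, 1): 6.0}

# Supplementary margins implied by a perfect fit, equations (6.16)-(6.19)
v10 = M[(1, 0, 0)] * g * gI * gJ + M[(1, 0, 1)] * g * gI
v11 = M[(1, 1, 0)] * g * gJ + M[(1, 1, 1)] * g
v20 = M[(2, 0, 0)] * g * gX * gI * gJ + M[(2, 0, 1)] * g * gX * gI
v21 = M[(2, 1, 0)] * g * gX * gJ + M[(2, 1, 1)] * g * gX

# Coefficients of the quadratic in gamma_J
VOR = (v10 / v11) / (v20 / v21)
A = M[(1, 0, 0)] * M[(2, 1, 0)] - VOR * M[(1, 1, 0)] * M[(2, 0, 0)]
B = (M[(1, 0, 1)] * M[(2, 1, 0)] + M[(2, 1, 1)] * M[(1, 0, 0)]) \
    - VOR * (M[(1, 1, 1)] * M[(2, 0, 0)] + M[(2, 0, 1)] * M[(1, 1, 0)])
C = M[(1, 0, 1)] * M[(2, 1, 1)] - VOR * M[(2, 0, 1)] * M[(1, 1, 1)]

disc = B * B - 4.0 * A * C       # real roots require disc >= 0
roots = [(-B + s * math.sqrt(disc)) / (2.0 * A) for s in (1.0, -1.0)]
```

Because the v_{xi} here are exactly consistent with the model, the discriminant is automatically non-negative; with real data the condition B² − 4AC ≥ 0 is a genuine restriction.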
6.4.3 Identifiability of h_1({ }, y_1 | x; η_1)

In addition to the notation from the previous subsection, denote z_x = n_{a,a,a,x} and

    δ_x = h_1({ }, y_1 | x; η_1) / [1 − h_1({ }, y_1 | x; η_1)].

The contribution of h_1({ }, y_1 | x; η_1) to L_x(θ, η) can then be expressed as:

    Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} log ρ_{xijk} + Σ_{i=0}^{1} Σ_{j=0}^{1} w_{xij} log{ Σ_{k=0}^{1} ρ_{xijk} φ_{xjk} }
        + Σ_{i=0}^{1} v_{xi} log{ Σ_{j=0}^{1} Σ_{k=0}^{1} ρ_{xijk} (1 + φ_{xjk}) γ_{xij} }
        + z_x log{ Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=0}^{1} ρ_{xijk} (1 + φ_{xjk}) (1 + γ_{xij}) δ_x }.   (6.23)

A perfect fit requires

    w_{xij} = Σ_{k=0}^{1} m_{xijk} φ_{xjk},   (6.24)

    v_{xi} = Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} (1 + φ_{xjk}) γ_{xij},   (6.25)

and

    z_x = Σ_{i=0}^{1} Σ_{j=0}^{1} Σ_{k=0}^{1} m_{xijk} (1 + φ_{xjk}) (1 + γ_{xij}) δ_x.   (6.26)

• COV * LUR

This model implies φ_{xjk} = φ_{xk} and γ_{xij} = γ_{xj}, while the equations (6.24) for w_{xij} and (6.25) for v_{xi} are the same as before. The argument in the previous subsections shows that the φ_{xk} and γ_{xj} are identified. With these solutions, (6.26) becomes a single equation in one unknown, namely δ_x. Thus, δ_x is also identified. In other words, h_1({ }, y_1 | x; η_1) is identifiable.

• COV + LOR + LUR

In this model, we can represent φ_{xjk} = φ_x φ_j φ_k and γ_{xij} = γ_x γ_i γ_j. The argument for the identifiability of the φ and γ parameters is identical to that in the previous subsection. Additionally, we have a δ_x parameter for each level of x in (6.26). In other words, there exists a solution for δ_x provided the solutions for the φ and γ parameters exist. Hence, h_1({ }, y_1 | x; η_1) is identifiable.

• LOR * LUR

This model implies φ_{xjk} = φ_{jk} (4 parameters for all levels of x), γ_{xij} = γ_{ij} (4 parameters for all levels of x) and δ_x = δ (1 parameter for all levels of x). The argument for the identifiability of the φ and γ parameters is again identical to that in the previous subsection. The additional parameter, δ, can be determined from (6.26) provided solutions exist for the φ and γ parameters. Hence, h_1({ }, y_1 | x; η_1) is identifiable.
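The chain of arguments for the COV * LUR case can likewise be verified numerically: with hypothetical cell counts m_{xijk} for one level of x and hypothetical parameter values (all numbers below are invented for illustration), the margins w, v and z built from (6.24)-(6.26) return the parameters via least squares for the overdetermined φ equations, a 2 × 2 linear solve (Cramer's rule) for the γ's, and a single division for δ_x:

```python
# Numerical check of the COV * LUR argument.  All counts and parameter
# values are hypothetical; w, v, z are constructed to satisfy (6.24)-(6.26).
keys = [(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1)]
m = dict(zip(keys, [20.0, 15.0, 12.0, 18.0, 9.0, 14.0, 11.0, 16.0]))
phi = {0: 0.5, 1: 0.9}     # phi_{xk}, x fixed
gam = {0: 0.3, 1: 0.6}     # gamma_{xj}
delta = 0.25               # delta_x

w = {(i, j): m[(i, j, 0)] * phi[0] + m[(i, j, 1)] * phi[1]
     for i in (0, 1) for j in (0, 1)}
v = {i: sum(m[(i, j, k)] * (1 + phi[k]) * gam[j] for j in (0, 1) for k in (0, 1))
     for i in (0, 1)}
z = delta * sum(m[key] * (1 + phi[key[2]]) * (1 + gam[key[1]]) for key in keys)

# (6.24): four equations in phi_{x0}, phi_{x1} -> least squares (2x2 normal equations)
a11 = sum(m[(i, j, 0)] ** 2 for i in (0, 1) for j in (0, 1))
a12 = sum(m[(i, j, 0)] * m[(i, j, 1)] for i in (0, 1) for j in (0, 1))
a22 = sum(m[(i, j, 1)] ** 2 for i in (0, 1) for j in (0, 1))
b1 = sum(m[(i, j, 0)] * w[(i, j)] for i in (0, 1) for j in (0, 1))
b2 = sum(m[(i, j, 1)] * w[(i, j)] for i in (0, 1) for j in (0, 1))
det = a11 * a22 - a12 * a12
phi_hat = {0: (b1 * a22 - b2 * a12) / det, 1: (a11 * b2 - a12 * b1) / det}

# (6.25): two equations in gamma_{x0}, gamma_{x1} (Cramer's rule)
M = {(i, j): m[(i, j, 0)] * (1 + phi_hat[0]) + m[(i, j, 1)] * (1 + phi_hat[1])
     for i in (0, 1) for j in (0, 1)}
detM = M[(0, 0)] * M[(1, 1)] - M[(0, 1)] * M[(1, 0)]
gam_hat = {0: (v[0] * M[(1, 1)] - v[1] * M[(0, 1)]) / detM,
           1: (M[(0, 0)] * v[1] - M[(1, 0)] * v[0]) / detM}

# (6.26): the single remaining equation determines delta_x
delta_hat = z / sum(m[key] * (1 + phi_hat[key[2]]) * (1 + gam_hat[key[1]])
                    for key in keys)
```

With perfectly consistent margins, the recovered phi_hat, gam_hat and delta_hat equal the assumed values up to rounding, mirroring the identifiability argument above.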
Thus, we have shown that, when coupled with a saturated outcome model, the parameters in the drop-out models of the three forms suggested by Baker (1995) are identifiable. Notice that we only consider the case where the covariates are categorical. In the next chapter, we analyze our annual data set with the models mentioned in the previous chapters.

Chapter 7

Application to the Data

7.1 Introduction

In this chapter, we implement the selection model approach for our annual MS data as described in Chapter 2. Recall our study questions of interest are:

• to investigate the most appropriate form of drop-out model for our annual data (in particular, to explore whether the data provide evidence of informative drop-out);

• to assess the sensitivity of inferences concerning the treatment effects (and other covariate effects) to the form of drop-out model employed;

• to explore the influence of baseline covariates.

Recall that the basic idea of a selection model is to factor the joint distribution for the response variables (Y) and the indicator variables corresponding to whether or not the response variables are observed (R) as follows:

    f(Y, R) = f(R | Y) f(Y).   (7.1)

Thus, the selection model approach involves the specification of a model for the outcomes, f(Y), and a model for the drop-out pattern conditional on the outcomes, f(R | Y).

The outline of this chapter is as follows: Section 7.2 considers a simple structure for Baker's selection model where only treatment group and time are included as covariates in the outcome model. This outcome model is coupled with a LOR + LUR type of drop-out model. In Section 7.3, we consider three more general model specifications for the drop-out process in conjunction with the same outcome model: COV * LUR, COV + LOR + LUR, and LOR * LUR. We extend this simple model by incorporating other baseline covariates described in Section 2.2.3 into the outcome model in Section 7.4.
The latter two sections can be viewed as further explorations of Baker's selection model. We conclude the chapter with a brief discussion of the use of the Liu et al. transition model for the outcome model.

7.2 Baker's Selection Model: With Only Treatment Groups and Time as Covariates

As described in Chapter 4, Baker (1995) suggested specifying the outcome model in terms of marginal and association models. The drop-out process is modelled using a time-dependent causal model which assumes that non-response does not depend on future events.

• Repeated Binary Outcomes with Informative Drop-out •

o Outcome Model

The outcome model f*(y_1, y_2, y_3 | x; θ) is expressed in terms of marginal and association models. As is apparent from Figure 2.3, the proportion of patients with exacerbations seems to vary across the treatment groups and with time, so the marginal model employed is

    logit{g_t(x; β)} = β_0 + β_1 LD + β_2 HD + β_3 t,   (7.2)

where t = 1, 2, 3, and LD and HD are indicator variables representing the treatment groups. For patients in the LD group, LD = 1 and HD = 0. Similarly, LD = 0 and HD = 1 if patients belong to the HD group. For patients in the PL group, both LD and HD take on the value 0.

We propose modelling the 2-way and 3-way associations with different intercept parameters to describe different degrees of association. We further assume the association among the responses is related to the treatment arms. For simplicity, these treatment effects are taken to be the same for all associations.

• Models for 2-way Association:

    logit{g_{st}(x; α_{st})} = α_{st} + α_1 LD + α_2 HD,   (7.3)

where st ∈ {12, 13, 23}.

• Model for 3-way Association:

    logit{g_{123}(x; α_{123})} = α_{123} + α_1 LD + α_2 HD.   (7.4)

Both the marginal and association models remain the same throughout the analyses in this section regardless of the assumption on the drop-out mechanism. The adequacy of this non-saturated outcome model for our data has been confirmed by comparing it to various more general models.
This information is presented in the next subsection.

o Drop-out Model

We model the drop-out process using time-dependent causal models which assume that non-response does not depend on future events. We allow different regression parameters for the logistic regressions specifying the different conditional probabilities of absence, h_t(r_{t−1}, y_t* | x, η_t); see (4.12). To simplify the notation, we introduce two subscripts for these regression parameters:

    logit{h_3(r_2 = {p,p}, y_3 | x, η_3)} = η_{03} + η_{13} y_2 + η_{23} y_3,
    logit{h_2(r_1 = {p}, y_2 | x, η_2)} = η_{02} + η_{12} y_1 + η_{22} y_2,   (7.5)

where the first subscript indexes the specific parameter in the model, while the second subscript indexes the year the drop-out occurred. According to Baker (1995), if the conditional non-response probability in the first year, Pr(R_1 = a | y*, x) = h_1(r_1 = { }, y_1* | x, η_1), depends only on the covariates, then the non-ignorable non-response models under consideration will be identifiable. In our case, the model for h_1(r_1 = { }, y_1* | x, η_1) becomes:

    logit{h_1(r_1 = { }, y_1* | x, η_1)} = η_{01}.   (7.6)

These drop-out models belong to Baker's LOR + LUR class of models. For simplicity, we have taken the drop-out mechanism to be independent of the available covariates. We relax this assumption in Section 7.3. To explore the adequacy of simpler models, we consider five other model specifications, obtained by restricting certain parameters to be equal to each other or to zero. The ID models to be considered are summarized in the first six rows of Table 7.1.

Table 7.1: Drop-out Models under Different Drop-out Mechanisms: √ denotes inclusion of a parameter and η_i denotes parameters which are restricted to be equal

  Mechanism  Model  η_{03}  η_{13}  η_{23}  η_{02}  η_{12}  η_{22}  η_{01}
  ID         1      √       √       √       √       √       √       √
             2      √       η_1     η_2     √       η_1     η_2     √
             3      η_0     η_1     η_2     η_0     η_1     η_2     η_0
             4      √       -       √       √       -       √       √
             5      √       -       η_2     √       -       η_2     √
             6      η_0     -       η_2     η_0     -       η_2     η_0
  RD         1      √       √       -       √       √       -       √
             2      √       η_1     -       √       η_1     -       √
             3      η_0     η_1     -       η_0     η_1     -       η_0
  CRD        1      √       -       -       √       -       -       √
             2      η_0     -       -       η_0     -       -       η_0
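The nesting of mechanisms in Table 7.1 can be made concrete by writing the conditional absence probabilities of (7.5) explicitly. In the sketch below, the η values are arbitrary illustrations, not estimates from our data:

```python
import math

def h(eta0, eta_obs, eta_unobs, y_prev, y_cur):
    """Conditional probability of absence; logit = eta0 + eta_obs*y_prev + eta_unobs*y_cur."""
    lin = eta0 + eta_obs * y_prev + eta_unobs * y_cur
    return 1.0 / (1.0 + math.exp(-lin))

eta0, eta1, eta2 = -1.5, 0.4, 0.8   # hypothetical values, not estimates

# ID: depends on the last observed outcome and on the unobserved current outcome
p_id = [h(eta0, eta1, eta2, yp, yc) for yp in (0, 1) for yc in (0, 1)]

# RD: the coefficient of the unobserved outcome is set to zero
p_rd = [h(eta0, eta1, 0.0, yp, yc) for yp in (0, 1) for yc in (0, 1)]

# CRD: intercept only -- drop-out is independent of the measurement process
p_crd = [h(eta0, 0.0, 0.0, yp, yc) for yp in (0, 1) for yc in (0, 1)]
```

Under RD the probability is constant in the unobserved outcome, and under CRD it is constant in both arguments, which is exactly how the RD and CRD rows of Table 7.1 arise from the ID rows.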
• Repeated Binary Outcomes with Ignorable Drop-out •

To investigate the types of drop-out in our annual data, we also fit the data to models under ignorable drop-out assumptions, i.e. with RD and CRD models.

o Random Drop-out

We consider the three RD models summarized in Table 7.1. Modifying an ID model by setting the parameters associated with the unobserved response to zero leads to an RD model. For instance, RD1 (Model 1 under RD) is obtained by setting η_{23} = η_{22} = 0 in ID1 (Model 1 under ID). RD2 and RD3 are similarly obtained from ID2 and ID3.

o Completely Random Drop-out

The two CRD models considered in Table 7.1 are obtained by simplifying the RD models. Under CRD, the drop-out mechanism is independent of the measurement process. Thus, CRD1 (obtained by setting η_{13} = η_{12} = 0 in RD1, or η_1 = 0 in RD2) and CRD2 (obtained by setting η_1 = 0 in RD3) each consist of only intercept parameters. For both the RD and CRD cases, we also have the opportunity to examine the sensitivity of the covariate effects (treatment and time) under different forms of the RD or CRD models.

These outcome and drop-out models can be assembled into explicit expressions for the logarithm of the likelihood (see (4.16)). The maximum likelihood estimates of the parameters in these models are obtained by minimizing the negative log-likelihood function using a quasi-Newton (QN) minimization procedure. This procedure is briefly described in the following subsection, and the corresponding results are summarized in Section 7.2.2.

7.2.1 The Quasi-Newton (QN) Algorithm

The QN algorithm used to maximize the log-likelihood is a variable metric algorithm. All variable metric methods seek to minimize a certain function S(θ) (in our case, S(θ) is the negative log-likelihood function) of p parameters by means of a sequence of basic iterative steps

    θ' = θ − k B g,   (7.7)

where g is the gradient of the function S, B is a matrix defining a transformation of the gradient, and k is a step length.
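A minimal illustration of the basic step (7.7): with B held at the unit matrix, the step direction is simply −g, and the step length k can be found by repeated halving until the function value decreases. The two-parameter objective below is a hypothetical stand-in for a negative log-likelihood:

```python
def S(x):
    # Hypothetical strictly convex objective; its minimum is at (1.0, -0.5).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

x = [4.0, 3.0]
for _ in range(100):
    g = grad(x)
    # B = identity here, so step (7.7) is x' = x - k*g; k found by halving
    # until the function value is reduced.
    k = 1.0
    while S([x[0] - k * g[0], x[1] - k * g[1]]) >= S(x) and k > 1e-12:
        k *= 0.5
    x = [x[0] - k * g[0], x[1] - k * g[1]]
```

Even this crude scheme converges here; the quasi-Newton refinement described next replaces the identity with an evolving approximation B to the inverse Hessian.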
Consider the set of nonlinear equations formed by the gradient at a minimum:

    g(θ') = 0.   (7.8)

As in the one-dimensional root-finding problem, one can use a linear approximation from the current θ, that is,

    g(θ') ≈ g(θ) + H(θ)(θ' − θ),   (7.9)

where H(θ) is the Hessian matrix (the matrix of second derivatives of the function S). For convex functions, H will be positive definite. From (7.8), (7.9) becomes

    θ' ≈ θ − H⁻¹(θ) g(θ),   (7.10)

which is Newton's method for a function of p parameters. This is equivalent to (7.7) with B = H⁻¹ and k = 1.

Newton's method is generally preferable if second derivatives can be analytically computed. But the implementation of Newton's method may induce errors when closed-form expressions for the second derivatives do not exist, as it involves composing subroutines for evaluating p first derivatives, p² second derivatives and a matrix inversion. For these reasons, Newton's method does not recommend itself for some problems.

If H⁻¹ could be approximated directly from the first derivative information available at each step of the iteration, this would save a great deal of work in computing both the matrix H and its inverse. This is precisely the role of the matrix B in the iteration defined by (7.7). The transformed gradients in the matrix B are used to generate linearly independent search directions; equivalently, these search directions are conjugate to each other with respect to H. Further, the step parameter k is rarely fixed; its value is usually determined by some form of a linear search. In particular, the role of k is to allow a search for values of θ' at which the function value is reduced, i.e. S(θ') < S(θ). Since the second derivatives required in Newton's method are approximated in the iteration (7.7), this algorithm is known as a quasi-Newton method. We employ the QN algorithm suggested in Nash (1979).
It involves specific choices of the formula for updating the matrix B and of the linear search procedure for obtaining the updated values of θ'. An 'acceptable point' search procedure suggested by Fletcher (1970) and a matrix-updating formula for B due to Broyden (1970a, 1970b), Fletcher (1970) and Shanno (1970) are employed. Generally speaking, the algorithm first goes through a linear search to find one value for θ which gives a smaller function value than that at the previous value for θ. The approximation to the Hessian matrix is then updated accordingly. The algorithm stops when all the parameter values on consecutive iterations are sufficiently close. For our purposes, the absolute difference between the parameter values of consecutive iterations must be smaller than 10⁻⁷. A detailed outline of this algorithm can be found in Chapter 15 of Nash (1979).

Note that in this version of the QN algorithm, the matrix B is initialized as a unit matrix. This simple choice nevertheless has the advantage of generating the steepest descent direction (Nash, 1979). To ensure that rounding errors, which occur in updating the matrix B and forming the search directions t through the equation

    t = θ' − θ = −k B g,

have not accidentally given a direction in which the function S cannot be reduced, a reset of B to a unit matrix is suggested in any of the following cases:

(i) tᵀg > 0; that is, the direction of the search is 'uphill';

(ii) θ' = θ; that is, no change is made in the parameters by the linear search along t;

(iii) tᵀ{g(θ') − g(θ)} < 0; that is, an update contrary to the objective of the method to reduce S along t (tᵀg(θ') is expected to be greater (less negative) than tᵀg(θ)), indicating a danger that the matrix B may no longer be positive definite.

If either (i) or (ii) occurs during the first step after B has been set to the unit matrix, the algorithm is taken to have converged.
All results described in this thesis are obtained using this QN algorithm implemented in C. The results for the models described in the beginning of this section are discussed in the next subsection.

7.2.2 Results

• Adequacy of the Outcome Model

To verify the adequacy of our reduced (non-saturated) outcome model, we consider four more general outcome model specifications. These outcome models are:

1. Saturated: a saturated marginal model (9 distinct parameters) and a saturated association model of the same form as (7.3) and (7.4) but with regression parameters that differ for each of the 2-way and 3-way association models (12 distinct parameters);

2. Semi-saturated I: a saturated marginal model and a reduced association model with common treatment effects in the 2-way associations (8 distinct parameters);

3. Semi-saturated II: a saturated marginal model and a reduced association model with common treatment effects for all associations (6 distinct parameters). Note that this reduced association model is exactly (7.3) and (7.4);

4. Semi-saturated III: a reduced marginal model assuming linearity in time (4 parameters) and a saturated association model (12 distinct parameters). Note that this reduced marginal model is exactly (7.2).

Table 7.2: Negative Log-likelihood Values for Five Outcome Model Specifications

  Outcome Model       Negative Log-likelihood   Number of Parameters
  Saturated           928.923                   28
  Semi-saturated I    930.450                   24
  Semi-saturated II   931.680                   22
  Semi-saturated III  930.304                   23
  Reduced             933.407                   17

The negative log-likelihood values presented in Table 7.2 correspond to these outcome models coupled with the drop-out model (7.5). The likelihood ratio test (LRT) indicates the reduction from the fully saturated outcome model to semi-saturated I is reasonable (LR statistic = 3.05 on degrees of freedom (df) = 4; p-value = 0.55).
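The LR statistics and p-values quoted in this discussion follow directly from the negative log-likelihoods in Table 7.2. A quick check for the even-df comparisons, using the closed-form chi-square tail probability for even degrees of freedom (the df = 5 comparison would need the incomplete gamma function):

```python
import math

def chi2_sf_even(x, df):
    # P(X > x) for X ~ chi-square with even df = 2m (Erlang tail formula)
    m = df // 2
    return math.exp(-x / 2.0) * sum((x / 2.0) ** i / math.factorial(i)
                                    for i in range(m))

nll = {"sat": 928.923, "semi1": 930.450, "semi2": 931.680}   # from Table 7.2

lr_sat_vs_semi1 = 2.0 * (nll["semi1"] - nll["sat"])      # 3.05 on df = 4
lr_semi1_vs_semi2 = 2.0 * (nll["semi2"] - nll["semi1"])  # 2.46 on df = 2
lr_sat_vs_semi2 = 2.0 * (nll["semi2"] - nll["sat"])      # 5.51 on df = 6

p1 = chi2_sf_even(lr_sat_vs_semi1, 4)    # ~0.55
p2 = chi2_sf_even(lr_semi1_vs_semi2, 2)  # ~0.29
p3 = chi2_sf_even(lr_sat_vs_semi2, 6)    # ~0.48
```

The computed tail probabilities match the p-values reported for these comparisons to two decimal places.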
To examine whether the treatment effects in the association model can be taken to be common across all associations, we compare semi-saturated I to semi-saturated II. The LR statistic of 2.46 (df = 2; p-value = 0.29) indicates the reduction is permissible. The result based on a direct comparison between the saturated and semi-saturated II models also agrees (LR statistic = 5.51 on df = 6; p-value = 0.48). This indicates that an association model with common treatment effects for all associations is reasonable for our data set. The further reduction to our reduced outcome model is also allowed (LR statistic = 3.45 on df = 5; p-value = 0.63).

As our primary focus is on the marginal model, a more interesting comparison is between the semi-saturated III and saturated outcome models. In the context of a saturated association model, this provides an assessment of whether the reduced marginal model (7.2), which incorporates additive treatment effects and a linear pattern over time for the log odds of having exacerbations, is reasonable. The LRT allows this reduction (p-value = 0.74). As should be expected from the earlier comparisons, the semi-saturated III model can be further reduced to our non-saturated model (p-value = 0.40).

Both sequences of model reductions lead to the same conclusion: the reduction to the model presented in the beginning of this section is permitted. This reduced model also provides an adequate fit to our data. The usual goodness-of-fit statistics based on the 15 different possible patterns of binary responses for each treatment arm lead to G² = 24.65 and X² = 22.80 on 25 degrees of freedom (p-values = 0.48 and 0.59, respectively). Thus, we can proceed confidently with further work using this reduced model as a starting point in the investigations.

• Informative Drop-out (ID)

The detailed results corresponding to the six ID models described in Table 7.1 can be found in Appendix B: Tables B.1 to B.6. These tables include the sets of starting values (SV) used, the maximum likelihood estimates for the parameters (Est), the corresponding standard errors (SE), and the negative log-likelihood computed at the MLE, all of which are provided as part of the output from the QN minimization procedure. The number of iterations needed to achieve convergence is also cited in the tables.

In each of these tables, regardless of the starting values in the QN procedure, the corresponding negative log-likelihoods computed at the parameter estimates (at convergence) are the same (at least up to the 4 significant decimal digits displayed). However, in Tables B.1, B.2, B.4 and B.5, not all the reported MLEs are the same (see especially parameters η_{03} and η_{23} in Tables B.1 and B.4, and parameters η_{03}, η_2 and η_{02} in Tables B.2 and B.5). Also, in these four tables, the SEs for the estimates vary quite a bit across different sets of starting values.

This phenomenon might be due to how the Hessian matrix is approximated in the minimization procedure. As mentioned earlier, the Hessian matrix is approximated based on the search directions for the parameter estimates obtained in each successive iteration. To illustrate, consider starting value Sets #1 and #4 in Table B.1. For Set #1, the estimated Hessian matrix was reset to a unit matrix at the 56th iteration because it was not positive definite. The final SEs as displayed thus depend on both the parameter estimates at convergence and the corresponding search directions at the subsequent iterations, i.e. from the 57th iteration until convergence was achieved (at the 71st iteration). The estimated Hessian matrix for Set #4, however, was reset to a unit matrix three times during the process of minimization (at the 4th, 9th, and 75th iterations), with convergence established at the 91st iteration.
Since the process of minimization for the two sets was quite different, this might be the reason why the estimated SEs differ considerably from one set of starting values to another.

The substantially different values of the estimates obtained with different sets of starting values for some of the parameters in models ID1, ID2, ID4 and ID5 indicate a more fundamental difficulty. Consider the results for model ID1, for example. Table B.1 shows the parameter η_{03} is always estimated as being large negative, while η_{23} is always estimated as large positive. Furthermore, for all four sets of starting values, the sum of these two parameter estimates equals a constant value, −1.548. This suggests the maximum likelihood estimates for this data set satisfy the constraint η_{03} + η_{23} = −1.548, with the MLE occurring on the boundary of the parameter space (η_{03} = −∞ or η_{23} = ∞). Recall that the non-response probability for the third observation is modelled as a logistic regression on the last observed outcome (y_2*) and the last unobserved outcome (y_3*); see (7.5). When η_{03} = −∞, η_{03} + η_{23} = −1.548 and η_{13} is finite, the probability that the third observation
Thus under models ID2 and ID5, the probabilities for the second and third observations to be missing are estimated to be zero when the past observations are either {y* = 0, y 2 = 0} or {y{ = 1, y 2 =0}, and when the history is either {y2 = 0, y | = 0} or {y2 = 1, y 3 =0}, respectively. In the next few paragraphs, we discuss the issue of boundary solutions for model ID1 in greater detail. The corresponding discussion for models ID2, ID4, and ID5 is omitted as the details are essentially identical to model ID1. But the results for these three models evaluated at the boundary solutions are also presented. • Discussion of Boundary Solutions Consider model ID1. The estimates obtained for 7703 and 7723 displayed in Table B . l vary across different starting values, but in each case 7703 + 7723 = —1.548. Further the negative log-likelihood remains the same up to the four decimal digits displayed. We believe that the M L E is located on the boundary of the parameter space. To confirm this conjecture, we first use a graphical visualization of the negative log-likelihood function incorporating the special feature (e.g. 7703 is estimated with large negative value, while 7723 is estimated with large positive values, and the sum of the two is always the same) observed in Table B . l . 72 Figure 7.1: A Two-Dimensional Profile Log-likelihood Surface for Model ID1 Figure 7.1 is a graphical representation of the profile log-likelihood surface for the parameters 7703 and 7723 in model ID1. This three-dimensional plot is produced by maximizing the log-likelihood over all parameters except 7703 and 7723. For fixed values of 7703 and 7723, we apply the QN minimization procedure to the negative log-likelihood function. This log-likelihood value is then plotted against these values for 7703 and 7723 using the S-PLUS function "persp". We chose the values for 7703 and 7723 to be a sequence of numbers between —.20 and 20 with increment size of 0.5. 
This yields an 81 by 81 grid of log-likelihood values. Notice that there seems to be a steady, but very shallow, decrease in this surface along a line (where η_{03} + η_{23} = −1.548) in the grid where η_{03} and η_{23} take on values ranging from −20.0 to −0.5, and from 0.0 to 20.0, respectively. This seems to agree with the results presented in Table B.1.

We also computed the log-likelihood on the boundary of the parameter space to check that the log-likelihood values obtained in Table B.1 are what one would obtain at the suggested point on the boundary. Because the parameter estimates appear to satisfy the constraint η_{03} + η_{23} = −1.548, it is useful to re-parameterize in terms of η_{03} and η_{23} = −η_{03} + Δ, where Δ is a finite-valued parameter. As η_{03} approaches −∞, the log-likelihood is a function of the remaining parameters and Δ. For the probability of non-response, Pr(R_3 = a | {p,p}, y_3, x), we substitute the limiting values presented in Table 7.3 to obtain the reduced log-likelihood function.

Applying the QN minimization routine to this reduced negative log-likelihood function yields the results summarized in Table 7.4. The estimates for the model parameters are essentially the same as those presented in Table B.1, and the log-likelihood value also agrees. Thus, both Figure 7.1 and this computation of the log-likelihood at the indicated boundary point seem to support our conjecture that the parameter estimates for model ID1 occur on the boundary of the parameter space. The same values of the estimates reported in Table 7.4 were obtained with different choices of starting values, and these minimizations required many fewer iterations than those presented in Table B.1. Further, the estimated Hessian matrix was never reset to a unit matrix during these minimizations.

Notice that the standard errors for η_{02} and η_{22} in Table 7.4 are relatively large.
One might suspect this reflects a potential boundary solution phenomenon for the reduced log-likelihood, even though these estimates did not vary with the sets of starting values chosen (see also Table B.1). Perhaps these large standard errors are simply indicating that our data set does not contain sufficient information to obtain precise estimates for these parameters. We explored this further graphically. Figure 7.2 shows the profile log-likelihood surface for the parameters η_{02} and η_{22} of the reduced model ID1. The values for η_{02} and η_{22} were chosen to be a sequence of numbers between −20 and 20 with increment size of 0.5. The plot is not very informative in terms of revealing the existence of optimal solutions. The rotating option in "persp" allowed us to view Figure 7.2 from different directions and convinced us of the existence of optimal solutions in the interior of the parameter space for this reduced log-likelihood function. For further assurance, we also calculated the negative log-likelihood values at various points in the neighbourhood of the suggested estimates for η_{02} and η_{22}; these values are all larger than 933.407. Thus we are certain that this situation does not indicate a boundary solution, but simply reflects a lack of information in the data to precisely estimate these parameters.

Figure 7.2: A Two-Dimensional Profile Log-likelihood Surface for Model ID1 with Boundary Constraint η_{03} → −∞ and η_{03} + η_{23} = Δ

One can easily show, in a similar fashion, that the parameter estimates for models ID2, ID4 and ID5 also occur on the boundary of the parameter space. The corresponding results for these three models computed at the suggested boundary points are presented in Tables 7.5, 7.6, and 7.7. Note that the parameter estimates in the outcome models for ID2 and ID5 are the same. With the imposed boundary constraints, the log-likelihood functions can be expressed as the sum of a function of the parameters in the outcome model and a function of the parameters in the drop-out model. Hence, the likelihood can be maximized separately over the parameters in the outcome and drop-out models. Compared to the minimizations summarized in Tables B.2, B.4 and B.5, convergence for these three cases is achieved with many fewer iterations. Further, the estimated Hessian matrices were never reset to a unit matrix during the course of minimization.

Table 7.3: Non-response Probability for the Third Response Using Model ID1 with η_{03} → −∞ and η_{03} + η_{23} = Δ

  y_2*  y_3*  logit{Pr(R_3 = a | {p,p}, y_3, x)}
  0     0     −∞
  0     1     Δ
  1     0     −∞
  1     1     η_{13} + Δ

As expected, the standard errors for η_{02} and η_{22} in Table 7.6 behave similarly to those in Table 7.4. This is again verified (by the same approach) not to reflect a boundary solution. On the other hand, the standard errors for all the estimates in Tables 7.5 and 7.7 look quite reasonable.

This feature of boundary solutions does not appear in models ID3 and ID6. For both models, the solutions obtained by the QN minimization are located in the interior of the parameter space. Different sets of starting values lead to the same parameter estimates and similar standard errors for the estimates, as shown in Tables B.3 and B.6. Even though the Hessian matrix was never reset to unity during the minimization process, the small discrepancy in the estimated SEs is expected, given the way the Hessian matrix is approximated. For these two models, convergence is achieved in between 17 and 21 iterations, which is much faster than for the models where the solutions are located on the boundary of the parameter space. This concludes the discussion concerning the existence of boundary solutions.

• Results for the ID Models

Now we examine whether the treatment effects are sensitive to the form of the informative drop-out model, based on the results presented in Tables 7.4, 7.5, B.3, 7.6, 7.7 and B.6.
Our primary focus is on the treatment effects in the marginal model for the exacerbation rates, even though treatment effects are also incorporated in the association model.

Table 7.4: Results for Model ID1 Evaluated on the Boundary: η03 → −∞ and η03 + η23 = Λ

Parameter    Estimate     SE
β0             0.876    0.206
β1 (LD)       -0.028    0.200
β2 (HD)       -0.489    0.195
β3 (time)     -0.122    0.074
α12           -0.020    0.170
α13           -0.031    0.168
α23           -0.136    0.183
α123          -0.534    0.187
α1            -0.113    0.213
α2            -0.657    0.221
η13            0.558    0.409
Λ             -1.548    0.347
η02           -3.360    2.218
η12            0.140    0.417
η22            1.860    2.615
η01           -2.089    0.167
Neg. Loglik  933.407   (# Iter = 25)

Table 7.5: Results for Model ID2 Evaluated on the Boundary: η03 → −∞, η02 → −∞, η03 + η2 = Λ1 and η02 + η2 = Λ2

Parameter    Estimate     SE
β0             0.886    0.204
β1 (LD)       -0.017    0.195
β2 (HD)       -0.484    0.194
β3 (time)     -0.118    0.074
α12           -0.004    0.163
α13           -0.010    0.161
α23           -0.111    0.173
α123          -0.511    0.177
α1            -0.103    0.208
α2            -0.649    0.217
η1             0.286    0.275
Λ1            -1.356    0.264
Λ2            -1.499    0.258
η01           -2.089    0.164
Neg. Loglik  933.922   (# Iter = 20)

Table 7.6: Results for Model ID4 Evaluated on the Boundary: η03 → −∞ and η03 + η23 = Λ

Parameter    Estimate     SE
β0             0.880    0.189
β1 (LD)       -0.024    0.190
β2 (HD)       -0.487    0.187
β3 (time)     -0.120    0.071
α12           -0.013    0.145
α13           -0.022    0.137
α23           -0.126    0.151
α123          -0.524    0.152
α1            -0.109    0.202
α2            -0.654    0.214
Λ             -1.165    0.181
η02           -3.819    3.002
η22            2.464    3.217
η01           -2.089    0.165
Neg. Loglik  934.432   (# Iter = 27)

Table 7.7: Results for Model ID5 Evaluated on the Boundary: η03 → −∞, η02 → −∞, η03 + η2 = Λ1 and η02 + η2 = Λ2

Parameter    Estimate     SE
β0             0.886    0.202
β1 (LD)       -0.017    0.198
β2 (HD)       -0.484    0.192
β3 (time)     -0.118    0.073
α12           -0.004    0.162
α13           -0.010    0.160
α23           -0.111    0.172
α123          -0.511    0.176
α1            -0.103    0.211
α2            -0.649    0.217
Λ1            -1.165    0.182
Λ2            -1.293    0.168
η01           -2.089    0.165
Neg. Loglik  934.473   (# Iter = 21)

The structure of the ID drop-out model does not change the conclusions about the treatment effects in the marginal model.
All six models conclude that the exacerbation rates in the LD and PL groups at any given time are not significantly different (approximate two-sided p-value > 0.62 based on β1 in each case). On the other hand, the exacerbation rate in the HD group is estimated to be significantly lower than in the PL group at all time points (two-sided p-value < 0.02 based on β2 in each case). The odds of experiencing exacerbations in the PL group are roughly 1.6 times higher than in the HD group. There is a weak suggestion of a linear decrease with time in the log odds of experiencing exacerbations under models ID1, ID2, ID4 and ID5 (two-sided p-value ≈ 0.10 in each model), but the estimates of β3 in both ID3 and ID6 provide a strong indication of a linear decrease over time (two-sided p-values < 0.008). The conclusions regarding the treatment effects in the association model are similar. All six models indicate that the odds of having exacerbations at two occasions or at all three occasions in the study are not significantly different between the LD and PL groups (two-sided p-values > 0.38 based on α1). But the models suggest that the odds in the HD group are significantly smaller than in the PL group (two-sided p-values < 0.004). Under models ID1, ID2, ID4 and ID5, the estimates of the intercept parameters α12 and α13 are fairly similar, while α23 is slightly more negative. As would be expected, the estimate for the intercept in the 3-way association model is the most negative. The situation is similar for models ID3 and ID6, although the estimates are slightly more negative. Note that the estimates for α12, α13 and α23 are not very different, suggesting the possibility of a common intercept parameter for all the 2-way association models. However, a reduction to a model with the same intercept parameter for all 2- and 3-way association models may not be reasonable since the estimate for α123 is always quite different from the others.
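The quoted odds ratios follow directly from the β2 row of Table 7.7; a quick arithmetic check (a sketch, not part of the original fitting code):

```python
import math

# Estimated HD effect on the log-odds scale (Table 7.7, model ID5).
beta2, se2 = -0.484, 0.192

# Odds ratio of exacerbations, HD relative to PL, with a Wald 95% CI.
or_hd = math.exp(beta2)              # approx. 0.62
lo = math.exp(beta2 - 1.96 * se2)    # approx. 0.42
hi = math.exp(beta2 + 1.96 * se2)    # approx. 0.90

# Inverting gives the odds of PL relative to HD: the "roughly 1.6 times".
or_pl = math.exp(-beta2)             # approx. 1.62
```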
Further, we could explore explicitly whether the responses are positively or negatively associated by comparing the joint probabilities of the responses with those obtained under the independence assumption. If the joint probabilities are larger than the product of the marginal probabilities, then there is some positive dependence among the responses; otherwise, the responses are negatively correlated. See Chapter 8 for more details. We now consider selecting a parsimonious ID model to describe our data. Table 7.8 summarizes the negative log-likelihood and available degrees of freedom for all models listed in Table 7.1. Based on the LRT, the reduction from model ID1 to ID2 is permissible (p-value = 0.60), indicating the dependence on the previous and current observations is similar at time points 2 and 3. Using model ID2 as the base model and comparing to model ID3 examines whether the odds of dropping out (for the same history) change over time; that is, the hypothesis is η03 = η02 = η01 = η0. But the LRT statistic indicates this reduction is not reasonable (p-value = 0.03). Note that one can also assess the reduction from model ID1 directly to ID3, although this assessment is not as sensitive as the comparison between models ID2 and ID3.

Table 7.8: Negative Log-likelihood Values for Models in Table 7.1

Drop-out              Negative        Degrees of
Mechanism   Model   Log-likelihood   Freedom (df)
ID            1        933.407           25
              2        933.922           27
              3        937.349           29
              4        934.432           27
              5        934.473           28
              6        938.464           30
RD            1        936.833           27
              2        937.250           28
              3        937.457           30
CRD           1        940.422           29
              2        941.040           31

The associated p-value is 0.096, indicating only fairly weak evidence against reducing from model ID1 to ID3. Thus, based on the more sensitive assessment, we conclude that model ID2 is the simplest permissible ID model among these three. To consider further model reductions, we next compare model ID2 to ID5. The LRT statistic suggests this reduction is reasonable (p-value = 0.29).
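Each likelihood ratio test quoted here is simple arithmetic on the entries of Table 7.8: twice the gap in negative log-likelihoods, referred to a chi-squared distribution on the difference in df. A sketch using scipy's chi-squared tail probability:

```python
from scipy.stats import chi2

def lrt(negloglik_full, negloglik_reduced, df_diff):
    """Likelihood ratio test: twice the gap in negative log-likelihoods,
    referred to a chi-squared distribution with df_diff degrees of freedom."""
    stat = 2.0 * (negloglik_reduced - negloglik_full)
    return stat, chi2.sf(stat, df_diff)

# Negative log-likelihoods and df taken from Table 7.8.
stat, p = lrt(933.407, 933.922, df_diff=27 - 25)   # ID1 -> ID2
# stat ~ 1.03, p ~ 0.60: reduction permissible

stat, p = lrt(933.922, 937.349, df_diff=29 - 27)   # ID2 -> ID3
# stat ~ 6.85, p ~ 0.03: reduction rejected

stat, p = lrt(933.407, 937.349, df_diff=29 - 25)   # ID1 -> ID3 directly
# p ~ 0.096: the less sensitive comparison
```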
The overall reduction from model ID1 to ID5 also agrees (p-value = 0.54). In model ID5, the drop-out probabilities do not depend on the last observed response, only on the last unobserved response. The further reduction from model ID5 to ID6 is not allowed (p-value = 0.02). We conclude that model ID5 is the simplest of these six informative drop-out models that can be used to describe our annual data set. The two reduced models, ID2 and ID5, both fit the data adequately. For model ID2, G² = 25.94 and X² = 23.81 on 27 degrees of freedom (p-values = 0.52 and 0.64, respectively). For model ID5, G² = 26.53 and X² = 24.09 on 28 degrees of freedom (p-values = 0.54 and 0.68, respectively). Note that all parameter estimates in the outcome model are the same for drop-out models ID2 and ID5. This phenomenon is induced by the imposed boundary constraints mentioned earlier, which allow separate maximizations for the parameters in the outcome and drop-out models.

• Ignorable Drop-out

Under the assumption of ignorable drop-out (either RD or CRD), the maximum likelihood estimates obtained by the QN minimization are in the interior of the parameter space. The results are summarized in Tables B.7 to B.11. As expected, the parameter estimates in the measurement process are the same in all the RD and CRD models. Hence, the conclusions about the treatment effects in the marginal model for the exacerbation rates do not differ across the different specifications of these drop-out models. Only the HD group has a different effect on the exacerbation rates compared to the PL group (two-sided p-value ≈ 0.01 based on β2); the odds of having exacerbations in the PL group are about 1.6 times the odds in the HD group. There is a strong indication of a linear decrease over time in the log odds of having exacerbations (two-sided p-value ≈ 0.001 based on β3). The treatment effects express themselves similarly in the association model.
There are no apparent differences between the LD and PL groups in the odds of having exacerbations at two and three occasions (two-sided p-value ≈ 0.40 based on α1), but the HD and PL groups differ (two-sided p-value ≈ 0.004 based on α2). The intercept parameter estimates are quite similar, although slightly more negative, to those obtained under models ID3 and ID6. Again, the estimated values for α12, α13 and α23 are reasonably similar, and the estimate for α123 is somewhat more negative. This indicates a model which assumes a common intercept parameter for all the 2-way association models and a separate intercept parameter for the 3-way association may be reasonable for our data. We next consider selecting a simpler model among the three RD models. Based on the LRT, the model reduction from RD1 to RD2 is permissible (p-value = 0.36). One can also reduce model RD2 to RD3 (p-value = 0.81). The LRT statistic comparing model RD1 to RD3 also indicates the reduction to model RD3 is reasonable (p-value = 0.74). Thus, model RD3 is the simplest permissible model under the RD assumption. Similarly, if a CRD mechanism is assumed, model CRD2 can be used instead of CRD1 to describe our annual data (p-value = 0.54).

• Types of Drop-out in the Data

In the earlier part of this section, we determined that reductions from model ID1 to models ID2 and ID5 are permissible, with model ID5 being the simplest possible model among the six ID models considered. These three models can be used to examine whether the drop-out mechanism in our data is ID, RD or CRD according to the classification by Little and Rubin (1987). To assess whether the drop-out occurred at random (RD), we can compare model ID1 to RD1. This comparison examines η23 = η22 = 0. The LR statistic of 6.85 (df = 2; p-value = 0.03) provides evidence against this reduction.
As already established, it is reasonable to have common regression parameters describing drop-out at the different time points (the reduction from ID1 to ID2). Hence, the comparison between models ID2 and RD2 should provide a more sensitive assessment of our question. In this case, we investigate whether η2 = 0, and the result agrees with the previous assessment (LR statistic = 6.66, df = 1; p-value = 0.01). The less sensitive comparison of model ID1 to RD2 also suggests one should not reduce to the simpler model (LR statistic = 7.69, df = 3; p-value = 0.05). Thus, the data indicate that the drop-out did not occur at random. As reduction to an RD model is not allowed, presumably reduction to a CRD model will also not be allowed. For the sake of completeness, we perform various assessments to examine this. Model CRD1 can be compared to models ID1, ID2 and ID5 to examine the dependence between the drop-out and the outcome processes. The LR test comparing models ID1 and CRD1 clearly indicates the reduction is not permissible (LR statistic = 14.03, df = 4; p-value = 0.007). The LR statistics for examining the reduction from models ID2 and ID5 to CRD1 are 13.00 (df = 2; p-value = 0.002) and 11.90 (df = 1; p-value < 0.001), respectively. As expected, the comparison to ID5 provides the strongest evidence. Thus, the data provide strong evidence against the hypothesis that the drop-out process is independent of the outcome process. According to these comparisons, one cannot reduce from the ID models to any of these RD and CRD models. We can thus confidently conclude that the drop-out process in our data is informative.

7.2.3 Summary

We fitted six ID models, and the maximum likelihood solutions for four of these models lie on the boundary of the parameter space. This phenomenon does not occur in the case where the drop-out mechanism is assumed to be ignorable.
Based on LR tests, we conclude that the drop-out mechanism in our data is informative, and model ID5 is determined to be the simplest possible model for our data. The treatment effects appear in both the marginal and association models. However, we focus primarily on the treatment effects in the marginal model. Under model ID5, the HD group has a lower rate of exacerbations compared to the PL group. The odds ratio of having exacerbations in the HD group relative to the PL group is estimated to be 0.62, and the corresponding approximate 95% confidence interval (CI) is (0.42, 0.90). The indication of a linear decrease in the odds of having exacerbations over time is quite weak; the approximate 95% CI for β3 is (−0.26, 0.03). The treatment effects in the association model convey a similar story: the odds of experiencing exacerbations at two occasions and at all three occasions in the LD group are not significantly different from the PL group, but these odds are clearly lower in the HD group. Interestingly, these conclusions are not very sensitive to the underlying drop-out mechanisms for this data set. In particular, the parameter estimates (and standard errors) in the outcome model obtained under the ID assumption are fairly similar to those obtained under the ignorable drop-out assumptions.

7.3 Baker's Selection Model: Extensions of the Drop-out Model

In this section, we are interested in investigating the impact of different specifications of the drop-out model on inferences concerning the treatment effects. The outcome model remains the same as in the previous section, and is coupled with the drop-out models considered in Baker (1995); that is, COV + LOR + LUR, COV * LUR and LOR * LUR. Since the only covariates to be used are the treatment group indicators, we replace COV with TRT throughout this section. We have established that models ID1, ID2 and ID5 can be used to describe our annual data, but no reduction to the RD and CRD models is allowed.
Model ID1 is of the form LOR + LUR, with different parameters associated with each time of occurrence of the drop-outs. Model ID2 is obtained from model ID1 by assuming the regression parameters to be common at each time of occurrence of the drop-outs, while model ID5 corresponds to the further assumption that the drop-out probabilities do not depend on LOR. In this section, we retain the feature of common regression parameters in all drop-out models considered. The three non-nested ID models considered for h_t(r_{t-1}, y*_t | x; η_t) are:

1. TRT * LUR: For t = 2, 3 (r_{t-1} equal to {p,p} or {p}), we have

logit[h_t(r_{t-1}, y*_t | x; η_t)] = η_{0t} + η1 LD + η2 HD + η3 y*_t + η4 LD y*_t + η5 HD y*_t,   (7.11)

and for t = 1 (r_{t-1} equal to { }), the model is

logit[h_1({ }, y*_1 | x; η_1)] = η_{01} + η1 LD + η2 HD;   (7.12)

2. TRT + LOR + LUR: For t = 2, 3, the model is

logit[h_t(r_{t-1}, y*_t | x; η_t)] = η_{0t} + η1 LD + η2 HD + η3 y*_{t-1} + η4 y*_t,   (7.13)

and for t = 1, we have

logit[h_1({ }, y*_1 | x; η_1)] = η_{01} + η1 LD + η2 HD;   (7.14)

3. LOR * LUR: For t = 2, 3, the model is

logit[h_t(r_{t-1}, y*_t | x; η_t)] = η_{0t} + η1 y*_{t-1} + η2 y*_t + η3 y*_{t-1} y*_t,   (7.15)

and for t = 1, we have

logit[h_1({ }, y*_1 | x; η_1)] = η_{01}.   (7.16)

One can view these models as expansions of models ID2 and ID5. More specifically, all three drop-out models are expansions of model ID5. Further, models TRT * LUR and TRT + LOR + LUR can also be considered as expansions of model ID2. Hence we can compare these models to models ID2 or ID5 to examine the improvement in fit with these more general models. The results are presented in the next subsection.

7.3.1 Results

Tables C.1 to C.3 in Appendix C display detailed summaries of the results corresponding to the three extended drop-out models. For each drop-out model, we report the starting values used to obtain the parameter estimates, the estimated standard errors, negative log-likelihood values, and the number of iterations required to achieve convergence.
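As an illustration of how the linear predictors (7.11) and (7.15) are assembled, a minimal sketch; the coefficient values below are invented for illustration and are not fitted estimates:

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit_trt_lur(eta, LD, HD, y_t):
    """Linear predictor of the TRT * LUR drop-out model (7.11), t = 2, 3.
    eta = (eta_0t, eta_1, ..., eta_5); LD, HD and y*_t are 0/1 indicators."""
    e0, e1, e2, e3, e4, e5 = eta
    return e0 + e1 * LD + e2 * HD + e3 * y_t + e4 * LD * y_t + e5 * HD * y_t

def logit_lor_lur(eta, y_prev, y_t):
    """Linear predictor of the LOR * LUR drop-out model (7.15), t = 2, 3."""
    e0, e1, e2, e3 = eta
    return e0 + e1 * y_prev + e2 * y_t + e3 * y_prev * y_t

# A PL-arm subject (LD = HD = 0) whose current unobserved response is y*_t = 1.
eta = (-2.0, 0.1, -0.2, 0.5, 0.3, -0.4)   # illustrative values only
p_drop = expit(logit_trt_lur(eta, LD=0, HD=0, y_t=1))
```

For a PL-arm subject only the intercept and the LUR term survive, which is why the treatment terms drop out of the reparameterized Table 7.9 rows when LD = HD = 0.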
The phenomenon observed in models ID2 and ID5 can also be seen in these drop-out models. For the drop-out model TRT * LUR (see Table C.1), the same parameter estimates are obtained regardless of the starting values used, except for the intercept parameters, η03 and η02, and the parameter associated with LUR (η3). Parameters η03 and η02 are estimated as large negative values, and η3 is estimated as large positive. Further, the estimates of η03 and η3 always sum to −1.226, and η02 + η3 = −1.350. Similarly, for model TRT + LOR + LUR (see Table C.2), the intercept parameters, η03 and η02, are estimated as being large negative, and the estimated value for η4 (the regression parameter corresponding to LUR) is large positive, but η03 + η4 = −1.430 and η02 + η4 = −1.573. The situation for model LOR * LUR is more complicated. Here we have the same phenomenon described for both models TRT * LUR and TRT + LOR + LUR, but the estimates of η1 (the parameter corresponding to LOR) and η3 (the parameter associated with the interaction term, LOR × LUR) also appear to satisfy the constraint η1 + η3 = 0.286. The parameter estimates obtained from the fourth set of starting values, in particular, indicate that the maximum likelihood solution corresponds to η1 → −∞ with η1 + η3 = 0.286. To make comparisons to models ID2 or ID5, we need to verify that the maximum likelihood solutions for these extended models occur at the suggested points on the boundary of the parameter space. Re-parameterizing in a similar fashion as previously, the conditional drop-out probabilities at years 2 and 3 can be expressed as in Table 7.9. We then substitute these expressions into the log-likelihood functions for the three models. To obtain the MLEs, we minimize the negative log-likelihood functions using the QN procedure. The results are reported in Tables 7.10 to 7.12. The minimizations reported in Tables C.1 to C.3 required a large number of iterations for convergence and, in each case, the estimated Hessian matrix was reset to a unit matrix in the course of the computations.

Table 7.9: Non-response Probability for the Second and Third Responses

Model: TRT * LUR, with η02 + η3 = Λ1 and η03 + η3 = Λ2

LOR  LUR  logit{Pr(R2 = a | {p}, y*2, x)}       logit{Pr(R3 = a | {p,p}, y*3, x)}
0/1   0   −∞                                     −∞
0/1   1   Λ1 + (η1 + η4)LD + (η2 + η5)HD         Λ2 + (η1 + η4)LD + (η2 + η5)HD

Model: TRT + LOR + LUR, with η02 + η4 = Λ1 and η03 + η4 = Λ2

LOR  LUR  logit{Pr(R2 = a | {p}, y*2, x)}       logit{Pr(R3 = a | {p,p}, y*3, x)}
0/1   0   −∞                                     −∞
 0    1   Λ1 + η1 LD + η2 HD                     Λ2 + η1 LD + η2 HD
 1    1   Λ1 + η1 LD + η2 HD + η3                Λ2 + η1 LD + η2 HD + η3

Model: LOR * LUR, with η02 + η2 = Λ1, η03 + η2 = Λ2 and η1 + η3 = Λ3

LOR  LUR  logit{Pr(R2 = a | {p}, y*2, x)}       logit{Pr(R3 = a | {p,p}, y*3, x)}
0/1   0   −∞                                     −∞
 0    1   Λ1                                     Λ2
 1    1   Λ1 + Λ3                                Λ2 + Λ3

These features were not found for the minimizations reported in Tables 7.10 to 7.12. In particular, the number of iterations needed in Tables 7.10 to 7.12 is, on average, only one-third the number required in Tables C.1 to C.3. Furthermore, the estimated Hessian matrix in Tables 7.10, 7.11 and 7.12 was never reset to unity throughout the minimization process. The parameter estimates and the log-likelihood values in the corresponding tables in these two sets are identical to the number of digits displayed, but the log-likelihood is always slightly larger at the boundary point than at the interior points located by the original minimizations. Hence, we have shown that the maximum likelihood solutions for these extended models are indeed located at the suggested points on the boundary.
Table 7.10: Results for Model TRT * LUR Evaluated on the Boundary: η03 → −∞, η02 → −∞, η02 + η3 = Λ1 and η03 + η3 = Λ2

Parameter        Estimate     SE
β0                 0.886    0.204
β1 (LD)           -0.017    0.201
β2 (HD)           -0.484    0.198
β3 (time)         -0.118    0.075
α12               -0.004    0.165
α13               -0.010    0.165
α23               -0.111    0.180
α123              -0.511    0.182
α1                -0.103    0.213
α2                -0.649    0.221
η01               -2.136    0.293
η1 (LD)           -0.203    0.433
η2 (HD)            0.296    0.394
η4 (LD × LUR)      0.571    0.521
η5 (HD × LUR)     -0.620    0.518
Λ1                -1.350    0.244
Λ2                -1.226    0.251
Neg. Loglik      931.223   (# Iter = 23)

Table 7.11: Results for Model TRT + LOR + LUR Evaluated on the Boundary: η03 → −∞, η02 → −∞, η02 + η4 = Λ1 and η03 + η4 = Λ2

Parameter        Estimate     SE
β0                 0.886    0.198
β1 (LD)           -0.017    0.195
β2 (HD)           -0.484    0.190
β3 (time)         -0.118    0.074
α12               -0.004    0.157
α13               -0.010    0.155
α23               -0.111    0.169
α123              -0.511    0.173
α1                -0.103    0.207
α2                -0.649    0.215
η01               -2.156    0.223
η1 (LD)           -0.209    0.238
η2 (HD)           -0.023    0.249
η3 (LOR)           0.290    0.202
Λ1                -1.573    0.283
Λ2                -1.430    0.242
Neg. Loglik      933.350   (# Iter = 25)

Table 7.12: Results for Model LOR * LUR Evaluated on the Boundary: η03 → −∞, η02 → −∞, η1 → −∞, η02 + η2 = Λ1, η03 + η2 = Λ2 and η1 + η3 = Λ3

Parameter        Estimate     SE
β0                 0.886    0.206
β1 (LD)           -0.017    0.196
β2 (HD)           -0.484    0.194
β3 (time)         -0.118    0.074
α12               -0.004    0.163
α13               -0.010    0.160
α23               -0.111    0.172
α123              -0.511    0.177
α1                -0.103    0.208
α2                -0.649    0.217
η01               -2.089    0.167
Λ1                -1.499    0.265
Λ2                -1.356    0.265
Λ3                 0.286    0.277
Neg. Loglik      933.922   (# Iter = 21)

There is an interesting point to note before moving on to the comparisons between these models and the models described in the previous section. Tables 7.10 to 7.12 (see also Tables C.1 to C.3) display identical estimates for all the parameters in the outcome model. In fact, these parameter estimates are identical to those reported in Tables 7.5 and 7.7 (see also Tables B.2 and B.5) for models ID2 and ID5, respectively.
The explanation for this is simple: for these drop-out models, the conditional probabilities that the second and third observations are missing are estimated to be zero when LUR = 0 (for both values of LOR). This simplifies the log-likelihood functions and allows the parameters in the outcome model and in the drop-out model to be maximized separately. As the five models share the same specification for the outcome process, it is then no surprise that the estimates of the parameters in the outcome model are identical even though the model specifications for the drop-out process differ.

Table 7.13: Results for Model TRT + LUR Evaluated on the Boundary: η03 → −∞, η02 → −∞, η02 + η3 = Λ1 and η03 + η3 = Λ2

Parameter    Estimate     SE
β0             0.886    0.207
β1 (LD)       -0.017    0.193
β2 (HD)       -0.484    0.194
β3 (time)     -0.118    0.076
α12           -0.004    0.163
α13           -0.010    0.159
α23           -0.111    0.171
α123          -0.511    0.175
α1            -0.103    0.206
α2            -0.649    0.218
η01           -2.136    0.214
η1 (LD)        0.191    0.232
η2 (HD)       -0.051    0.248
Λ1            -1.349    0.211
Λ2            -1.222    0.226
Neg. Loglik  933.910   (# Iter = 27)

To examine whether one of these more complicated models should be employed for the drop-out process, we compare models TRT * LUR, TRT + LOR + LUR and LOR * LUR to models ID2 and ID5. By comparing model TRT * LUR to model ID5, we are examining whether the additional treatment effects (η1, η2) and the interaction between the treatment effects and the last unobserved response (η4, η5) provide a significant improvement on the fit of model ID5. The LR statistic (6.50 on df = 4; p-value = 0.16) indicates that there is not strong evidence that we should employ model TRT * LUR instead of model ID5. Table 7.10 suggests the two interaction terms contribute the major improvement in expanding the model from ID5 to TRT * LUR. Further, the comparisons of model ID5 with models TRT + LOR + LUR and LOR * LUR seem to agree with this observation (LR statistics = 2.25 and 1.10, df = 3 and 2; p-values = 0.52 and 0.58, respectively).
That is, neither the terms LOR and TRT nor the terms LOR and LOR × LUR contribute significant improvement to the fit of model ID5. Thus model TRT + LUR (obtained by setting η4 = η5 = 0 in model TRT * LUR) is an interesting intermediate model between models ID5 and TRT * LUR. The detailed results for model TRT + LUR are provided in Table C.4, while the maximum likelihood estimates evaluated at the suggested point on the boundary of the parameter space are presented in Table 7.13. Comparing model TRT * LUR to TRT + LUR examines the contribution of the interaction terms, TRT × LUR. The corresponding LRT statistic is 5.37 on 2 degrees of freedom (p-value = 0.07), indicating fairly weak evidence against the hypothesis that the interaction terms are negligible. The cautious approach in this situation might be to retain the more general model, i.e. TRT * LUR, rather than reducing to the simpler TRT + LUR. But the evidence is not compelling, so we choose to reduce to the simpler TRT + LUR as the drop-out model. We then further examine whether the reduction from model TRT + LUR to ID5 is reasonable. Not surprisingly, in view of the earlier comparison of model TRT * LUR to ID5, the LRT shows that the data provide no evidence to conclude that the additional TRT covariates improve the fit of model ID5 (p-value = 0.57). We have already identified that models TRT + LOR + LUR and LOR * LUR do not improve the fit of model ID5. We can also examine whether these extended drop-out models provide improvements over model ID2 (LOR + LUR). The LR statistics are 1.15 and 0.00 (due to possible round-off error) on 2 and 1 degrees of freedom, respectively, indicating insufficient evidence to conclude that these extended drop-out models improve upon the fit of ID2 to our data set. Thus, neither the addition of TRT nor of LOR × LUR provides a meaningful improvement in fit over ID2 (LOR + LUR).
Hence, the simpler models ID2 or ID5 can be used to describe the drop-out process in our annual data set.

7.3.2 Summary

We explored various ways of modelling the drop-out process in our data. More specifically, the three models considered can be viewed as extensions of ID2 and ID5, two of the permissible drop-out models described in the previous section. We introduced treatment effects and interaction terms into the drop-out model with a view to examining whether there is any impact on the conclusions about the treatment effects. Because some of the conditional drop-out probabilities at years 2 and 3 are estimated to be zero for each of these three drop-out models (see Table 7.9), the estimates of the parameters in the outcome model from these three drop-out model specifications are identical to those obtained under models ID2 and ID5. It is also of interest to investigate whether a more general model specification for the drop-out process improves the fit. The results indicate that the simpler drop-out models ID2 or ID5 are adequate for our annual MS data. Thus, models ID2 and ID5 will be used throughout the next section.

7.4 Baker's Selection Model: Extension of the Outcome Model

In this section, we explore extensions of the outcome model considered in the two previous sections, based on including other baseline covariates such as gender, age, duration of MS, EDSS and BOD, in addition to the treatment arms and time. The main purpose of this section is to investigate whether or not inclusion of other baseline covariates in the model has any impact on the conclusions about the treatment effects identified in Section 7.2. For simplicity, we only consider the five baseline covariates described in Section 2.2.3, and these are introduced only into the marginal model for the exacerbation rates. The structure of the associations among the measurements is assumed to remain as previously described.
This is thought reasonable as our primary interest focuses on the impact of additional covariates on the conclusions about the treatment effects in the marginal model for the exacerbation rates. The baseline covariates are included one at a time in the marginal component of the outcome model. The forward stepwise procedure for inclusion of the baseline covariates in addition to the treatment and time effects is carried out in the following fashion:

(1). Consider each covariate for inclusion in the marginal model and examine if it has a significant effect;

(2). If any covariates have significant effects, include the most significant covariate in the marginal model and repeat (1). Stop when no remaining covariates are found to be significant;

(3). If no covariates have significant effects, terminate the procedure.

Even though the EDSS score is an ordinal variable, for simplicity, we treat it as a continuous variable in our analysis. The BOD at baseline is skewed to the right, as is evident in Figure 2.5. Further, this covariate has a much larger scale than the other covariates. To avoid potential difficulties these features could induce in the estimating procedure, we use a logarithm transformation of the baseline BOD. Baseline BOD and its logarithm are highly associated (the correlation between them is roughly 0.7 based on the 362 patients who had baseline BOD greater than zero). Among the ID models considered with the original form of the outcome model, we found that the reduced models ID2 and ID5 were adequate. The extensions considered in Section 7.3 did not improve the fit significantly, so these same drop-out models will be considered here. The inclusion of additional covariates in the outcome model contemplated here could improve the overall fit, in which case it would again be of interest to examine whether the drop-out process is ID, RD or CRD. As noted in Section 7.2, model ID2 is more suitable for this purpose.
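Steps (1)-(3) above amount to a standard forward-selection loop; a sketch, where `fit_negloglik` is a hypothetical routine standing in for maximizing the selection-model likelihood with a given set of marginal covariates:

```python
from scipy.stats import chi2

def forward_stepwise(base_covs, candidates, fit_negloglik, alpha=0.05):
    """Forward stepwise selection driven by likelihood ratio tests.
    `fit_negloglik` is hypothetical: it returns the minimized negative
    log-likelihood for a model with the given marginal covariates."""
    selected = list(base_covs)            # e.g. ['LD', 'HD', 'time']
    remaining = list(candidates)          # e.g. ['Gender', 'EDSS', ...]
    nll_current = fit_negloglik(selected)
    while remaining:
        # Step (1): LRT for each candidate added on its own (df = 1 each).
        trials = []
        for cov in remaining:
            nll = fit_negloglik(selected + [cov])
            stat = 2.0 * (nll_current - nll)
            trials.append((chi2.sf(stat, 1), cov, nll))
        p, best, nll_best = min(trials)
        if p >= alpha:                    # Step (3): nothing significant
            break
        # Step (2): keep the most significant covariate and repeat.
        selected.append(best)
        remaining.remove(best)
        nll_current = nll_best
    return selected
```

In the analysis below the loop stops immediately, since none of the five candidate covariates reaches significance.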
Hence, model ID2 is used to describe the drop-out process throughout this section.

7.4.1 Results

The results of the forward stepwise procedure to examine the role of each baseline covariate are summarized in Table 7.14. These log-likelihood values correspond to maximum likelihood estimates on the boundary of the parameter space, as in the earlier fitting with models ID2 and ID5. Detailed summaries for the several cases reported in Table 7.14 appear in Tables D.1 to D.5 of Appendix D. The minimization processes for obtaining the estimates reported in Tables D.1 to D.5 are similar to those described earlier. These maximum likelihood estimates were, on average, obtained at the 24th iteration, and the Hessian matrix was never reset to a unit matrix in any of the minimizations. The first baseline covariate in addition to the treatment group included in the model is the gender of the patients (Gender). The LRT indicates gender is not an important covariate when estimating the exacerbation rate. This agrees with the Wald test (see Table D.1: z-score = 1.17, p-value = 0.24). The effects of baseline EDSS (EDSS), duration of MS at baseline (Dur), and age at baseline (Age) are similarly not significant; see Table 7.14.

Table 7.14: The LRT Statistics in the Forward Stepwise Procedure

Neg. Loglik for Model with LD + HD + time: 933.922

Case  Additional COV  Neg. Loglik   LRT    p-value  Comment
1     Gender           933.244     1.357    0.24
2     EDSS             933.901     0.043    0.84
3     Dur              933.768     0.154    0.69
4     Age              933.354     1.137    0.29
5     log(BOD)         933.088     1.668    0.20     Based on Imputed Set 1
      log(BOD)         933.215     1.414    0.23     Based on Imputed Set 2
      log(BOD)         933.083     1.677    0.20     Based on Imputed Set 3
      log(BOD)         933.211     1.421    0.23     Based on Imputed Set 4

As mentioned before, there are 8 patients with missing BOD at baseline. In addition, 2 patients did not have any lesions at baseline, i.e. their baseline BOD value is zero. This creates a minor difficulty for converting baseline BOD to the log scale.
Since the smallest non-zero baseline BOD value is 9, we impute a value between 0 and 9 for these 2 patients and perform a sensitivity analysis to determine whether the specific value chosen has any impact on the conclusion of our analysis. The arbitrary values chosen are 1.0 and 4.5. For the 8 patients who did not have any reading on BOD at baseline, one way to impute values for them is with the expectation-maximization (EM) algorithm, utilizing the other baseline covariates. For our purposes, it is sufficient to use the following values to fill in the 8 missing values and perform a sensitivity assessment:

• the average of the log of the baseline BOD from the 362 patients (excluding the 10 patients mentioned earlier), i.e. 7.085 (BOD = 1194.516);

• the average of the log of the BOD from 364 patients (the 2 patients with zero baseline BOD imputed to have the value 1.0), i.e. 7.047 (BOD = 1148.905);

• the average of the log of the BOD from 364 patients (the 2 patients with zero baseline BOD imputed to have the value 4.5), i.e. 7.055 (BOD = 1158.439).

The four different combinations of values for imputing the 8 missing values and the 2 zero baseline BOD values are listed in Table 7.15.

Table 7.15: Data sets used for assessing the sensitivity of the results when considering log(BOD) in addition to treatment group and gender as a covariate

               (In terms of BOD)
Data Set       The 8 Patients   The 2 Patients
Imputed Set 1    1194.516           1.0
Imputed Set 2    1194.516           4.5
Imputed Set 3    1148.905           1.0
Imputed Set 4    1158.439           4.5

All four imputed data sets lead to a similar conclusion: log(BOD) is not a statistically important factor; see Table D.5 for the detailed results. Since the other baseline covariates are demonstrated to be unimportant for estimating the rate of exacerbations, we can also perform an alternative assessment of the significance of log(BOD).
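The three fill-in values are means on the log(BOD) scale, i.e. geometric means on the BOD scale; a small consistency check of the quoted (mean log, BOD) pairs (a sketch; the individual BOD readings are not reproduced here):

```python
import math

# Reported pairs from the list above: (mean of log(BOD), quoted BOD value).
reported = [
    (7.085, 1194.516),   # 362 patients, the 10 problem cases excluded
    (7.047, 1148.905),   # 364 patients, zero-BOD pair imputed as 1.0
    (7.055, 1158.439),   # 364 patients, zero-BOD pair imputed as 4.5
]

for mean_log, bod in reported:
    # exp(mean log BOD) should recover the quoted BOD value, up to the
    # rounding of the mean log to three decimal places.
    assert abs(math.exp(mean_log) - bod) < 1.0
```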
In particular, the 8 patients with missing baseline BOD are withheld from the analysis and the 2 patients with zero baseline BOD are imputed to have values of 1.0 and 4.5. The results evaluated on the boundary of the parameter space are displayed in Table D.6. To perform a LRT, we re-fit model ID2 with this reduced data set; see Table D.7. The conclusion from this assessment remains the same as in the previous analyses. The LR statistics corresponding to the data sets with zero baseline BOD imputed as 1.0 and 4.5 are 1.83 and 1.56 on 1 degree of freedom (p-values = 0.17 and 0.21, respectively). The Wald test for β4 also leads to the same conclusion (z-scores = 1.30 and 1.20, with p-values = 0.19 and 0.23, respectively).

As expected, the parameter estimates associated with the drop-out process are identical in Tables D.1 to D.5. The reason is exactly as in the previous section. Because the conditional drop-out probabilities at years 2 and 3 are estimated to be zero, the log-likelihood functions in all five cases can be expressed as the sum of a function of the parameters for the outcome model and a function of the parameters for the drop-out model. Hence, the MLEs for the parameters in the two processes can be obtained separately. Since we employ the ID2 drop-out model in all five cases, the parameter estimates are expected to be identical.

7.4.2 Summary

In the previous sections, the outcome model includes only the treatment groups and time as covariates. Here we considered also including the five baseline covariates (gender of the patients, EDSS, duration of disease, age and BOD) in the marginal model for the exacerbation rates. Model ID2 is used to describe the drop-out process throughout the section. We found that none of these five baseline covariates contributes significantly to the fit in estimating the exacerbation rates.
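The two-sided Wald p-values quoted above follow directly from the standard normal tail of the z-score; a minimal sketch:

```python
import math

def wald_p(z):
    """Two-sided p-value for a Wald z-score: 2 * Pr(Z > |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-scores for beta_4 reported above for the two imputed zero-BOD values
p1 = wald_p(1.30)  # about 0.19
p2 = wald_p(1.20)  # about 0.23
```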
7.5 Overall Summary for Baker's Selection Model

We have used Baker's selection modelling approach to address various questions, and we provided a brief summary of our findings at the end of Sections 7.2, 7.3 and 7.4. In this section, we briefly describe what we have learned about the data according to the results obtained with the simplest acceptable model.

In Section 7.2, we first determined that the non-saturated outcome model described in (7.2)-(7.4) is sufficient for our data by comparing it to various more general outcome models. This outcome model was then used throughout the section, coupled with drop-out models of the type LOR + LUR, to address questions of interest. We discovered that the maximum likelihood solutions for four (models ID1, ID2, ID4 and ID5) of the six informative drop-out models are located on the boundary of the parameter space. This results in identical parameter estimates for the outcome model associated with drop-out models ID2 and ID5 for our data set. This boundary phenomenon does not arise in any of the ignorable drop-out models, i.e. the RD and CRD models. Based on likelihood ratio tests, we concluded the drop-out mechanism in our data set is informative. Models ID1, ID2 and ID5 are permissible and adequate models for modelling the drop-out process in our data. Model ID5, the simplest permissible informative drop-out model, indicates that the drop-out process in our data depends on the outcome process only through the last unobserved measurement (LUR).

In Section 7.3, we explored several drop-out models that can be viewed as generalizations of models ID2 and ID5. In particular, we allowed the drop-out process to depend on the treatment groups. We found that these general drop-out models do not provide significant improvement to the fit of models ID2 or ID5. Thus, our drop-out process can be described by the simpler models ID2 and ID5.
In Section 7.4, we addressed the question of the significance of other baseline covariates, such as gender, EDSS, duration of MS, age and BOD, in estimating the rate of exacerbations. These covariates were considered for inclusion only in the marginal component of the outcome model. Based on the forward stepwise procedure, none of these covariates were found to contribute significantly to the fit.

Consequently, the simplest Baker's selection model consists of an outcome model composed of (7.2)-(7.4) and the drop-out process described by model ID5; see Table 7.7. This model fits the data quite adequately (p-value > 0.54). The observed and expected counts for the 15 (observation patterns) by 3 (treatment groups) contingency table are presented in Table 7.16. None of the expected cell counts are zero even though this model estimates some of the conditional probabilities of drop-out to be zero. The discrepancies between the observed and the expected counts are generally small, indicating the data are well-described by the model. Thus, we make inferences based on our data using this model in Chapter 8.

Table 7.16: The Observed and Expected Cell Counts with Drop-Out Model ID5 ("*" denotes missing) for Baker's Selection Model

Pattern   PL          LD          HD
(0,0,0)   14 (13.5)   9 (9.1)     15 (15.3)
(0,0,1)   3 (3.0)     5 (5.0)     11 (7.7)
(0,1,0)   6 (5.2)     7 (7.4)     12 (10.3)
(0,1,1)   5 (6.4)     7 (6.3)     7 (5.3)
(1,0,0)   9 (6.7)     9 (9.5)     9 (13.9)
(1,0,1)   12 (10.2)   10 (10.2)   6 (8.6)
(1,1,0)   8 (10.7)    11 (10.7)   13 (9.0)
(1,1,1)   25 (24.5)   18 (23.4)   16 (15.8)
(0,0,*)   0 (0.9)     1 (1.6)     1 (2.4)
(0,1,*)   2 (2.0)     3 (2.0)     0 (1.6)
(1,0,*)   1 (3.2)     5 (3.2)     2 (2.7)
(1,1,*)   11 (7.7)    10 (7.3)    3 (4.9)
(0,*,*)   2 (3.7)     7 (4.3)     4 (4.7)
(1,*,*)   12 (11.8)   12 (11.3)   8 (8.1)
(*,*,*)   13 (13.6)   11 (13.8)   17 (13.7)

Goodness-of-fit Tests
G2 = 26.53 on 28 degrees of freedom; p-value = 0.54
X2 = 24.09 on 28 degrees of freedom; p-value = 0.68

7.6 The Liu et al. Transition Model

In this section, we apply the Liu transition model to our annual data. Recall that Liu et al. (1999) employ a first-order transition model to model the outcome process. Further, they assume that each of the conditional probabilities, Pr(Y*t = y*t | Y*t-1 = y*t-1, x), does not depend on the covariates measured at time t, which seems somewhat unusual (see Chapter 5 for details). In our case, we can proceed with their idea without making such an assumption about the dependence on the covariates measured at time t, as we consider only covariates measured at baseline.

For the drop-out process, we consider three models: ID1, ID2 and ID3 as described in Table 7.1. Based on the LRT, we can select the simplest permissible model among the three. The basic idea of these models is similar to those considered in Liu et al. (1999) in the sense that the drop-out probabilities are assumed to depend only on the response observed prior to the drop-out (LOR) and the response which would be observed if drop-out had not occurred (LUR). But in their data set, the first observation is always observed. Thus their models are slightly different from ours, as they do not need a model for the case where the response pattern r_{t-1} is equal to { }.

• Repeated Binary Outcomes with Informative Drop-out •

o Outcome Model

A first-order transition model is assumed for the binary longitudinal data. This means that the current measurement, y_t, is related only to the previous measurement, y_{t-1}, for t = 2, 3, as well as to the baseline covariates of interest. Here only the treatment assignment and time are considered in the analysis since the results in the previous section indicate that gender of the patients, baseline EDSS, age at baseline, duration of MS, and baseline BOD were not important covariates in estimating the rates of exacerbation. Thus, the outcome model employed can be expressed as:

logit{Pr(Y*t = 1 | Y*t-1 = y*t-1, x_t)} = β0 + β1 LD + β2 HD + β3 t + β4 y*t-1   (7.17)

o Drop-out Model

Models similar to ID3 and ID6 from Table 7.1 were considered in Liu et al. (1999). Here we propose to model the drop-out process using models ID1, ID2, ID3 and ID5. We choose to focus on these ID models out of the six listed in Table 7.1 because they allow straightforward investigation of the form of the drop-out mechanisms according to the terminology of Little and Rubin (1987). Furthermore, it will be interesting to determine if this leads to the same choice of the ID models for the drop-out process, namely ID2 and ID5, as the Baker selection model approach.

• Repeated Binary Outcomes with Ignorable Drop-out •

To investigate the impact of different drop-out mechanisms on the treatment effects, we also consider drop-out models assuming the drop-out occurred at random (RD) and completely at random (CRD). The RD and CRD models are the same as in Table 7.1. Likelihood ratio tests can be performed to examine the type of drop-out in our annual data based on these models. The results for the parameter estimates under different drop-out mechanisms are presented in the subsequent subsection. We conclude the section with a brief summary.

7.6.1 Results

• Informative Drop-out

The maximum likelihood solutions for models ID1 and ID2 lie on the boundary of the parameter space, while those for model ID3 exist in the interior. The detailed results for these models are summarized in Tables E.1 to E.6 of Appendix E. The boundary solutions for models ID1 and ID2 occur in a similar fashion as in Baker's selection model (see Tables E.1 and E.2). We present the MLEs computed on the boundary for drop-out models ID1 and ID2 in Tables 7.17 and 7.18, respectively. These reported estimates are obtained with many fewer iterations than those in Tables E.1 and E.2. Moreover, the estimated Hessian matrix in both cases was never reset to unity throughout the minimization process.
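The G2 and X2 statistics used for model assessment in this chapter are the deviance and Pearson chi-square computed over the cells of the observed/expected contingency table; a minimal sketch with toy counts (not the thesis data):

```python
import math

def goodness_of_fit(observed, expected):
    """Deviance G2 and Pearson X2 for parallel lists of observed and expected counts.
    Cells with zero observed count contribute nothing to G2 (0 * log 0 = 0)."""
    g2 = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return g2, x2

# toy three-cell illustration
g2, x2 = goodness_of_fit([10, 20, 30], [12, 18, 30])
# g2 is about 0.568 and x2 is about 0.556
```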
Notice that the MLEs for the parameters in the outcome model are identical for drop-out models ID1 and ID2. This is again because some of the conditional probabilities of drop-out at years 2 and 3 are estimated to be zero, and hence the parameters in the outcome and drop-out models can be estimated separately. The G2 and X2 goodness-of-fit statistics shown in Table 7.20 provide some evidence of lack-of-fit in each case. Although the evidence is not compelling, the fit of these models for our data is somewhat questionable; perhaps a more complicated association structure or a more general drop-out model should be employed. However, our objective is not to perform a definitive analysis on our annual data, but rather to explore different approaches for modelling incomplete longitudinal binary data with informative drop-outs. Hence, despite their somewhat questionable fit, we do not elaborate on these models but rather go on to consider the best choices within this collection of models.

All three ID models lead to similar conclusions about the treatment effects. In particular, the chance that an exacerbation would be experienced, given the past history (whether or not an exacerbation occurred at the previous time point), is not significantly different between the LD and PL groups (all p-values > 0.47). Nevertheless, the LD effect is estimated to be much stronger in model ID3 than in models ID1 and ID2. All three models conclude that the HD group has a lower chance than the PL group to experience an exacerbation, given the past history (p-values < 0.01).
Further, there is a strong suggestion of a linear decrease over time in the log odds of having exacerbations given the past history (p-value < 0.001 based on β3 in each model). The association parameter β4 is highly significant (all p-values < 0.001). Under models ID1 and ID2, the odds of having an exacerbation given there was an exacerbation at the previous visit are 2.0 times the odds of having an exacerbation given there was no exacerbation at the previous visit; the corresponding approximate 95% CI for the odds ratio is (1.46, 2.74).

Table 7.17: Results for Liu Transition Model with Drop-out Model ID1 Evaluated on the Boundary: η03 → −∞, η02 → −∞, η03 + η23 = Δ1 and η02 + η22 = Δ2

Parameter    Estimate   SE
β0           1.007      0.206
β1 (LD)      -0.040     0.168
β2 (HD)      -0.462     0.167
β3 (time)    -0.324     0.095
β4           0.692      0.161
η01          -2.089     0.165
η13          0.558      0.413
η12          0.048      0.374
Δ1           -1.548     0.348
Δ2           -1.327     0.314
Neg. Loglik  942.259 (# Iter = 16)

Table 7.18: Results for Liu Transition Model with Drop-out Model ID2 Evaluated on the Boundary: η03, η02 → −∞, Δ1 = η03 + η2 and Δ2 = η02 + η2

Parameter    Estimate   SE
β0           1.007      0.199
β1 (LD)      -0.040     0.167
β2 (HD)      -0.462     0.167
β3 (time)    -0.324     0.094
β4           0.692      0.161
η01          -2.089     0.169
η1           0.286      0.296
Δ1           -1.356     0.279
Δ2           -1.499     0.273
Neg. Loglik  942.687 (# Iter = 15)

Table 7.19: Results for Liu Transition Model with Drop-out Model ID5 Evaluated on the Boundary: η03, η02 → −∞, Δ1 = η03 + η2 and Δ2 = η02 + η2

Parameter    Estimate   SE
β0           1.007      0.206
β1 (LD)      -0.040     0.169
β2 (HD)      -0.462     0.168
β3 (time)    -0.324     0.095
β4           0.692      0.161
η01          -2.089     0.166
Δ1           -1.356     0.185
Δ2           -1.499     0.168
Neg. Loglik  943.239 (# Iter = 14)

Table 7.20: Goodness-of-fit Statistics for Liu Transition Model with Drop-out Models ID1, ID2, ID3 and ID5

Model  Degrees of Freedom  G2     p-value  X2     p-value
ID1    30                  42.56  0.06     40.91  0.09
ID2    32                  42.77  0.10     41.15  0.13
ID3    34                  48.09  0.06     46.06  0.08
ID5    33                  46.10  0.06     45.76  0.07
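The odds ratio for β4 and its interval are obtained by exponentiating the estimate and its Wald limits from Tables 7.17 and 7.18; a sketch:

```python
import math

beta4, se = 0.692, 0.161  # estimate and SE of the association parameter

odds_ratio = math.exp(beta4)                      # about 2.0
ci = (math.exp(beta4 - 1.96 * se),                # about 1.46
      math.exp(beta4 + 1.96 * se))                # about 2.74
```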
Under model ID3, the odds ratio is estimated as 1.8 and the approximate 95% CI is (1.27, 2.48). The LR statistic for the reduction from model ID1 to model ID2 is 0.86 on 2 degrees of freedom (p-value = 0.65), and hence the reduction is permissible. However, we cannot further reduce model ID2 to model ID3 (LR statistic = 6.43, df = 2; p-value = 0.04). Thus, the simplest ID model among these three is ID2, which is the same conclusion obtained with Baker's selection model.

Recall that with Baker's selection model, drop-out model ID5 is a reasonable reduction of model ID2. Thus, it is of interest to perform this assessment with the Liu transition model. The parameter estimates obtained from the QN minimization with drop-out model ID5 are summarized in Table E.4. The results indicate a similar feature of boundary solutions as in model ID2. Table 7.19 presents the maximum likelihood estimates obtained at the suggested boundary points for model ID5. The LR statistic indicates that the term corresponding to the last observed response included in ID2 does not provide an important improvement to the fit (LR statistic = 1.10, df = 1; p-value = 0.29). Further, while the goodness-of-fit of model ID5 is slightly less satisfactory than for ID2 (see Table 7.20), the evidence against the adequacy of model ID5 is not overly compelling. These conclusions are qualitatively similar to those obtained with Baker's selection model.

• Ignorable Drop-out

The results for the RD and CRD models are displayed in Tables E.5 and E.6, respectively. As expected, the parameter estimates for the outcome model are identical under both drop-out mechanisms. All parameter estimates are located in the interior of the parameter space. Under the assumption that the drop-out process is ignorable, the Wald tests suggest that the chance a patient would have an exacerbation given the past history is similar in the LD and PL groups (p-value ≈ 0.50).
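Both reductions can be checked directly from the negative log-likelihoods reported in Tables 7.17 to 7.19; for 2 degrees of freedom the chi-square survival function has the closed form exp(-stat/2). A sketch, with a helper name of our choosing:

```python
import math

def lrt_p(nll_reduced, nll_full, df):
    """LRT statistic and chi-square p-value for df = 1 or df = 2."""
    stat = 2.0 * (nll_reduced - nll_full)
    p = math.erfc(math.sqrt(stat / 2.0)) if df == 1 else math.exp(-stat / 2.0)
    return round(stat, 2), round(p, 2)

id1_to_id2 = lrt_p(942.687, 942.259, df=2)  # stat about 0.86, p about 0.65
id2_to_id5 = lrt_p(943.239, 942.687, df=1)  # stat about 1.10, p about 0.29
```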
But the risk is significantly lower in the HD group than in the PL group (p-value ≈ 0.01). As in the ID case, the suggestion of a linear decrease over time in the log odds of having exacerbations given the past history is quite strong (z-score ≈ −4.4 based on β3; p-value < 0.001). The odds of having an exacerbation given there was an exacerbation in the previous period are about 1.8 times the odds of having an exacerbation given there was no exacerbation in the previous period; the approximate 95% CI for the odds ratio is (1.32, 2.50).

We perform LRTs for selecting the simplest RD and CRD models. The reduction from model RD1 to RD2 is permissible (p-value = 0.36), but the further reduction from model RD2 to RD3 is not allowed (p-value = 0.007). Under the CRD assumption, CRD1 is identified as the simplest possible model, as the reduction from CRD1 to CRD2 is not permissible (p-value = 0.004). These choices differ from those for Baker's selection model; see Section 7.2.

• Types of Drop-out

We established that models ID1 and ID2 are reasonable for describing our data. To investigate the types of drop-out in our annual data set, we compare these models with some RD and CRD models. For assessing if the drop-out mechanism is of type RD, model ID1 can be compared to model RD1 and, similarly, model ID2 can be compared to model RD2. The reduction from ID1 to RD1 is permissible (LR statistic = 3.68, df = 2; p-value = 0.16). However, the more sensitive assessment comparing model RD2 to ID2 (since the reduction from ID1 to ID2 is reasonable) provides a less definite conclusion; the LR statistic equals 3.66 on 1 degree of freedom (p-value = 0.06). With a 5% level of significance, we would not reject the hypothesis that η2 = 0, but with only a slightly larger acceptable type I error, we would reject the hypothesis. Thus further investigation is required. The LR test indicates one cannot reduce from model RD2 to CRD1 (LR statistic = 6.14, df = 1; p-value = 0.01).
The reduction from model ID2 to CRD1 is also not permitted (LR statistic = 9.80, df = 2; p-value = 0.007). Thus we need to make a decision based on the comparison between models ID2 and RD2. In such an ambiguous situation, one would usually prefer not to reduce from ID2 to RD2, because the simpler model may be more susceptible to potential bias in the results. As mentioned earlier, model ID2 can be further reduced to ID5. The comparison of model ID5 to CRD1 confirms that model CRD1 is not appropriate for our data (LR statistic = 8.70, df = 1; p-value = 0.003). Thus we conclude that the drop-out process in our data appears to be informative.

7.6.2 Summary

We considered a first-order transition model for modelling the outcome process, coupled with the same drop-out models considered in Section 7.2. Based on the likelihood ratio tests, it appears that the drop-out process in our model cannot be ignored. Model ID5 is identified as the simplest drop-out model that is acceptable for our data.

Based on this model, we computed the expected cell counts for the 15 (observation patterns) by 3 (treatment arms) contingency table; see Table 7.21. Despite some of the conditional drop-out probabilities being estimated as zero, the expected counts are all nonzero. Notice that the differences between the observed and expected counts in some cells are quite large. For instance, the differences in cells (0,0,0) and (1,1,0) for the PL group and in cell (1,1,1) for the LD group are larger than 5.0 in magnitude.
This is also reflected in the values of G2 and X2, both indicating a potential lack-of-fit of this model. One could explore more complicated drop-out models or association structures to improve the fit of the model, but such extensions are not our main interest. Thus, we go on to make inferences based on our data using this model in the concluding chapter.

Table 7.21: The Observed and Expected Cell Counts for the Liu Transition Model with Drop-Out Model ID5 ("*" denotes missing)

Pattern   PL          LD          HD
(0,0,0)   14 (7.4)    9 (8.1)     15 (15.6)
(0,0,1)   3 (6.1)     5 (6.4)     11 (8.1)
(0,1,0)   6 (5.8)     7 (6.1)     12 (8.3)
(0,1,1)   5 (9.5)     7 (9.6)     7 (8.6)
(1,0,0)   9 (9.3)     9 (9.7)     9 (13.2)
(1,0,1)   12 (7.6)    10 (7.7)    6 (6.9)
(1,1,0)   8 (14.4)    11 (14.6)   13 (14.0)
(1,1,1)   25 (23.6)   18 (23.1)   16 (14.5)
(0,0,*)   0 (1.6)     1 (1.6)     1 (2.1)
(0,1,*)   2 (2.4)     3 (2.5)     0 (2.2)
(1,0,*)   1 (2.0)     5 (2.0)     2 (1.8)
(1,1,*)   11 (6.1)    10 (6.0)    3 (3.7)
(0,*,*)   2 (3.9)     7 (4.1)     4 (4.3)
(1,*,*)   12 (9.8)    12 (9.8)    8 (7.2)
(*,*,*)   13 (13.6)   11 (13.8)   17 (13.7)

Goodness-of-fit Tests
G2 = 46.10 on 33 degrees of freedom; p-value = 0.06
X2 = 45.76 on 33 degrees of freedom; p-value = 0.07

Chapter 8

Conclusions

8.1 Conclusions

The main focus of this thesis has been on exploring likelihood-based methods for analyzing longitudinal binary responses under informative (or non-ignorable) drop-out. The two modelling approaches considered were Baker's selection model and the Liu et al. transition model. Both models belong to a general class of models known as selection models. A selection model factors the joint distribution for the response variables (Y) and the indicator variables denoting whether the response variables were observed (R) as

f(Y, R) = f(R | Y) f(Y),   (8.1)

where f(R | Y) is the model for the drop-out process and f(Y) corresponds to the model for the measurement (or outcome) process.
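The factorization (8.1) is what makes the likelihood tractable under drop-out: an observed response contributes f(y) Pr(R = 1 | y), while a missing one contributes a sum over the unobserved response. A toy sketch for a single binary response (the probabilities are invented for illustration, not estimates from the thesis):

```python
def cell_likelihood(y_observed, f_y, pr_obs_given_y):
    """Likelihood contribution under the selection factorization
    f(Y, R) = f(R | Y) f(Y) for one binary response.
    y_observed: 0/1 if seen, None if missing.
    f_y[y]: outcome probability; pr_obs_given_y[y]: Pr(R = 1 | Y = y)."""
    if y_observed is not None:
        return f_y[y_observed] * pr_obs_given_y[y_observed]
    # missing response: marginalize over the unobserved value
    return sum(f_y[y] * (1.0 - pr_obs_given_y[y]) for y in (0, 1))

# invented numbers for illustration
f_y = {0: 0.4, 1: 0.6}
pr_obs = {0: 0.9, 1: 0.7}
seen = cell_likelihood(1, f_y, pr_obs)        # 0.6 * 0.7 = 0.42
missing = cell_likelihood(None, f_y, pr_obs)  # 0.4*0.1 + 0.6*0.3 = 0.22
```

When Pr(R | Y) depends on the unobserved Y, as in the informative (LUR) models, the marginalized term no longer factors out of the likelihood, which is precisely why the drop-out model cannot be ignored.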
The main difference between Baker's selection model and the Liu transition model resides in the model specification for the measurement process. Baker's selection model uses a parameterization proposed by Ekholm (1991, 1992) to accommodate longitudinal binary measurements. That is, the outcome model is expressed in terms of a model for the (univariate marginal) probabilities of the responses and an association model for the temporal associations among the responses. The Liu transition model, however, employs a first-order Markov chain transition model for the measurement process. The conditional distribution of the response at time t (y_t) given the history of the responses up to time t-1 is assumed to depend only on the response at the previous time point (y_{t-1}). These outcome models are coupled with a drop-out model specified as a time-ordered causal model incorporating the assumption that the drop-out does not depend on future events.

Given that the two approaches model the outcome process differently, this raises the question of the advantages and disadvantages of each. If the objective of the study is to examine the effects of covariates on the marginal probabilities of the responses, marginal models provide a direct answer to this question. However, transition models should be used when the interest is in prediction (Diggle et al., 1994). Baker's selection model incorporates a more general structure for the strength of association among the responses than the Liu transition model. The structure for the associations among the responses in the Liu transition model is completely specified in terms of a single lagged effect. (Additional lagged effects could be added to the model, but the nature of the association structure is limited by this parameterization.) For Baker's selection model, the expression for the outcome model for a sequence with more than three responses becomes more complicated, and the number of parameters increases rapidly.
This is particularly so for the association model if no assumptions are made regarding the nature of the association structure among the responses. Unlike Baker's selection model, the number of parameters in the Liu transition outcome model need not change with the length of the response sequence.

Both models were applied to our annual version of the Berlex exacerbation data described in Chapter 2 to examine the sensitivity of the estimated effects of Interferon β-1b on the exacerbation rates in relapsing-remitting MS patients to various assumed forms for the drop-out mechanisms. More fundamentally, we were interested in studying the nature of the drop-out process in this clinical trial.

Using Baker's selection modelling approach, we verified that the relationships expressed in (7.2)-(7.4) are sufficient for describing the outcome process in our data. This outcome model coupled with drop-out model ID5 is determined to be the most parsimonious yet adequate model among the more general models considered. In other words, the drop-out process in our data is informative and depends on the last unobserved response, but not on the last observed response. Based on this model, we conclude that the low dose effect is not significant. The odds of having exacerbations in the LD group are reduced only by 1.7% relative to the odds of having exacerbations in the PL group. The corresponding approximate 95% CI for the percent reduction in the odds is (−44.9%, 33.3%). The high dose effect, however, is evidently different from the placebo effect. The odds of having exacerbations in the HD group are roughly 38.4% lower than the odds in the PL group (95% CI: 10.1%, 57.7%).

Table 8.1: Estimated Chance of Exacerbations Based on Baker's Selection Model

Treatment Group  Year 1  Year 2  Year 3
PL               0.68    0.66    0.63
LD               0.68    0.65    0.63
HD               0.57    0.54    0.51
Under the model assumption that the log odds of having exacerbations changes linearly over time, the odds are estimated to decrease by 11.1% per year in each group. The approximate 95% CI for the relative reduction in odds over time is (−2.6%, 23.0%), indicating the reduction is not statistically significant. The estimated chances of having exacerbations at each occasion presented in Table 8.1 also reflect these conclusions. The chances of experiencing exacerbations are almost the same in the LD and PL groups, but are much smaller in the HD group. In each group, these chances decrease only slightly over time.

As for the association models, the LD and PL groups seem to have similar chances of having exacerbations at exactly two or all three time points during the study, but these chances are lower in the HD group. The odds ratios for the LD and PL groups are estimated as 0.90, reflecting a 9.8% reduction in the odds in the LD group. The corresponding approximate 95% CI is (−36.4%, 40.3%), implying the LD effect is not statistically significant. On the other hand, the odds in the HD group are only about half the odds in the PL group. The approximate 95% CI for the decrease in the odds in the HD group is (20.1%, 65.8%). The estimates for the intercept parameters α12, α13 and α23 are all quite small. This suggests a possible reduction to a model with all the 2-way associations in each treatment group being the same, i.e. α12 = α13 = α23. On the other hand, a separate intercept parameter for the 3-way association appears to be useful, as α123 is considerably larger in magnitude. Notice that the estimated joint probabilities of the responses obtained from our model are slightly larger than those obtained under the independence assumption, indicating that there is some positive dependence among the responses; see Table 8.3.

Table 8.2: Estimated Chances of Exacerbations Based on the Liu et al. Transition Model

Exacerbation Experienced in Previous Period
Treatment Group  Year 1  Year 2  Year 3
PL               0.80    0.74    0.67
LD               0.79    0.73    0.67
HD               0.71    0.64    0.57

No Exacerbation Experienced in Previous Period
Treatment Group  Year 1  Year 2  Year 3
PL               0.66    0.59    0.51
LD               0.66    0.58    0.50
HD               0.56    0.47    0.39

With the Liu et al. transition approach, the simplest acceptable drop-out model is also identified to be ID5, again indicating the drop-out mechanism in our data is informative. Even though the outcome model, and hence the parameters being estimated, are different than in Baker's selection model, the conclusions regarding the treatment effects remain quite similar. For fixed t and previous response y*t-1, the odds of having exacerbations are reduced by 3.9% (95% CI: −33.8%, 31.0%) in the LD group and by 37.0% (95% CI: 12.5%, 54.6%) in the HD group relative to the PL group. This indicates that only the high dosage of Interferon β-1b effectively reduces the odds of experiencing exacerbations in MS patients. Similarly, the parameters β3 and β4 can also be interpreted as log odds ratios. In particular, exp(β3) represents the ratio of the odds of having exacerbations at time t+1 relative to time t for a patient with the same history at times t-1 and t (y*t-1 = y*t). This odds ratio is estimated as 0.72 with approximate 95% CI (0.60, 0.87). The odds of having exacerbations given exacerbations in the previous period are 2.00 (= exp(β4)) times the odds given no exacerbations in the previous period; the corresponding 95% CI for the odds ratio is (1.46, 2.74).

The estimated chances of experiencing exacerbations given the previous history presented in Table 8.2 also indicate similar conclusions regarding the treatment effects: the risks are much smaller in the HD group than in the LD and PL groups. Given that exacerbations were observed in the previous period (i.e. y*t-1 = 1), the relative differences in the chances between the HD and PL groups are 11%, 14% and 15% at years 1, 2, and 3, respectively.
For the case where no exacerbations were detected in the previous period (i.e. y*t-1 = 0), the relative differences are slightly larger: 15%, 20% and 24% at years 1, 2, and 3, correspondingly.

Table 8.3 displays the values of Pr(Y*s = 1, Y*t = 1) for {s,t} = {1,2}, {1,3}, {2,3} and Pr(Y*1 = 1, Y*2 = 1, Y*3 = 1) obtained from Baker's selection model and the Liu et al. transition model. The estimates are generally similar for the two approaches except for the estimated probability of exacerbations at visits 1 and 3 and at all three visits. The differences are more substantial for the former estimated probabilities; the magnitudes of the (absolute) differences are 0.08, 0.06 and 0.12 in the PL, LD and HD groups, respectively.

Table 8.3: Estimated Pr(Y*s = 1, Y*t = 1) and Pr(Y*1 = 1, Y*2 = 1, Y*3 = 1) by Treatment Groups

Baker's Selection Model
                                 PL    LD    HD
Pr(Y*1 = 1, Y*2 = 1)             0.50  0.47  0.34
Pr(Y*1 = 1, Y*3 = 1)             0.50  0.47  0.34
Pr(Y*2 = 1, Y*3 = 1)             0.47  0.45  0.32
Pr(Y*1 = 1, Y*2 = 1, Y*3 = 1)    0.38  0.35  0.24

Assuming Independent Responses
                                 PL    LD    HD
Pr(Y*1 = 1, Y*2 = 1)             0.45  0.44  0.31
Pr(Y*1 = 1, Y*3 = 1)             0.43  0.43  0.29
Pr(Y*2 = 1, Y*3 = 1)             0.42  0.41  0.28
Pr(Y*1 = 1, Y*2 = 1, Y*3 = 1)    0.28  0.28  0.15

Liu et al. Transition Model
                                 PL    LD    HD
Pr(Y*1 = 1, Y*2 = 1)             0.49  0.48  0.36
Pr(Y*1 = 1, Y*3 = 1)             0.42  0.41  0.28
Pr(Y*2 = 1, Y*3 = 1)             0.47  0.45  0.32
Pr(Y*1 = 1, Y*2 = 1, Y*3 = 1)    0.33  0.32  0.20

In the intent-to-treat analyses reported in [35] (which assumed the drop-out occurred completely at random), the exacerbation rate was defined as the number of exacerbations experienced in one year. This is different from the exacerbation rate referred to throughout this thesis (the chance of having one or more exacerbations in a year). Nevertheless, it is of interest to compare the two sets of estimated treatment effects in terms of the relative change in the exacerbation rates.
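The conditional chances in Table 8.2 can be reproduced from the transition model (7.17) with the ID5 estimates of Table 7.19; a sketch (the function name is ours):

```python
import math

def exacerbation_chance(group, t, prev):
    """Pr(Y*_t = 1 | Y*_{t-1} = prev) under the first-order transition
    model (7.17), using the ID5 estimates from Table 7.19."""
    b0, b_ld, b_hd, b_time, b_prev = 1.007, -0.040, -0.462, -0.324, 0.692
    eta = b0 + b_time * t + b_prev * prev  # PL is the reference group
    if group == "LD":
        eta += b_ld
    elif group == "HD":
        eta += b_hd
    return 1.0 / (1.0 + math.exp(-eta))   # inverse logit

# reproduces Table 8.2 to two decimals, e.g. PL/year 1/prev=1 -> 0.80
# and HD/year 3/prev=0 -> 0.39
```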
From the intent-to-treat analyses, the exacerbation rates in the PL, LD and HD groups were 1.21, 1.05 and 0.84, respectively. Thus, the rates were 13% and 31% lower for the LD and HD groups relative to the PL group. Based on Baker's selection model, the odds of having exacerbations are reduced by 1.7% and 38.4% in the LD and HD groups, respectively. Similarly, they are reduced by 3.9% and 37.0% under the Liu et al. transition model. The relative changes for the low dose effect are quite different between our approaches and the intent-to-treat analyses, but the variation is not as large for the high dose effect. Even though the magnitudes of the relative changes are quite different, the results convey a similar conclusion; that is, the effect of the high dosage of Interferon β-1b is much more evident than that of the low dosage. We also found that there is a weak positive association over time in the presence/absence of exacerbations, and that the influence of the association is present over more than one time period.

In the previous chapter, we provided the results from goodness-of-fit tests for both Baker's selection model and the Liu et al. transition model. The tests provided no evidence to suggest any lack-of-fit of Baker's selection model for our data. However, the adequacy of the Liu et al. transition model (p-values = 0.06 and 0.07 for G2 and X2, respectively) is questionable. The discrepancy between some of the observed and expected counts obtained from the Liu et al. model is quite large (see Table 7.21). This seems to suggest the restrictive assumption on the form of the associations among the responses in the Liu transition model may not be adequate for our data; that is, a higher-order transition model could possibly be used instead. Alternatively, this may suggest a more general model for the drop-out process should be employed.
Between Baker's selection model and the Liu transition model, Baker's selection model seems much more satisfactory as it fits the data quite well (see Table 7.20). In summary, analyses based on an assumption of ignorable non-response when the non-response mechanism is informative could lead to misleading results. By incorporating a non-response model in a likelihood-based approach, valid inferences can be obtained when the non-response mechanism is non-ignorable, provided the non-response model correctly describes the non-response mechanism (Little and Rubin, 1987). However, this approach is not without analytical difficulties. The parameters of the non-ignorable models may not be identifiable, or the solutions to the likelihood equations (which may not be the maximum) may lie on the boundary of the parameter space. In Chapter 6, we showed that, with a saturated outcome model, the informative models of types COV * LUR, COV + LOR + LUR and LOR * LUR, where COV represents categorical covariates, are identifiable. In the course of our analyses in Chapter 7, we demonstrated that the maximum likelihood solutions for some of our non-ignorable models were located on the boundary of the parameter space. This boundary phenomenon did not occur in any of the ignorable non-response models considered.

8.2 Further Work

• Other Approaches of Interest

There are approaches other than selection models that can be used for analyzing incomplete data. In particular, the pattern-mixture modelling framework proposed by Little (1993) has become an area of active research. The pattern-mixture approach specifies the joint distribution of the measurement and response processes in terms of the marginal distribution of the responses multiplied by the distribution of measurements, conditional on the response patterns.
Pattern-mixture models are natural when the interest is in population strata defined by missing-data patterns, but these models are typically underidentified (Little, 1993). Thus the models require restrictions or prior information to identify the parameters. Unlike selection models, with the pattern-mixture approach one can avoid specifying the form of the missing-data mechanism, as it is incorporated indirectly via parameter restrictions (Little, 1993). This is a potentially attractive feature over the selection model approach, as the latter is vulnerable to misspecification of the form of the missing-data mechanism. Further, pattern-mixture models are closer to the form of the data and sometimes simpler to fit. Thus, it would be of interest to re-analyze our annual data with this approach and compare the results to those reported here.

• Generalizations of the Data

We chose to express the exacerbation data in terms of annual binary outcome variables. One could perform similar analyses on the binary data with more refined time intervals; for instance, semi-annual intervals. The semi-annual data may contain more information and may provide more precise estimates of the parameters. As mentioned at the outset, there is a loss of information associated with dichotomizing the data. To retain all the information, one could analyze the count data presented in Table 2.4, treating these as realizations of Poisson random variables [18, 19]. One could also use this approach with finer time intervals, say semi-annual intervals. The conclusions obtained from these annual and semi-annual count data might be more informative than those based on the dichotomized data.

Bibliography

[1] Baker, S.G. and Laird, N.M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 83, 62-69.

[2] Baker, S.G. (1995).
Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 51, 1042-1052.

[3] Broyden, C.G. (1970a). The convergence of a class of double-rank minimization algorithms, Part 1. Journal of the Institute of Mathematics and Its Applications 6, 76-90.

[4] Broyden, C.G. (1970b). The convergence of a class of double-rank minimization algorithms, Part 2. Journal of the Institute of Mathematics and Its Applications 6, 222-231.

[5] Dale, J. (1986). Global cross-ratio models for bivariate discrete ordered responses. Biometrics 42, 909-917.

[6] Diggle, P.J. and Kenward, M.G. (1994). Informative drop-out in longitudinal data analysis. Applied Statistics 43, 49-93.

[7] Ekholm, A. (1991). Fitting regression models to a multivariate binary response. In: A Spectrum of Statistical Thought: Essays in Statistical Theory, Economics, and Population Genetics in Honour of Johan Fellman, G. Rosenqvist, K. Juselius, K. Nordstrom, J. Palmgren (eds), 19-32. Helsinki: Swedish School of Economics and Business Administration.

[8] Ekholm, A. (1992). Discussion of: Multivariate regression analysis for categorical data by K. Liang, S.L. Zeger, and B. Qaqish. Journal of the American Statistical Association 81, 354-365.

[9] Ekholm, A. (1998). The Muscatine children's obesity data reanalysed using pattern mixture models. Applied Statistics 47, 251-263.

[10] Fitzmaurice, G.M. and Laird, N.M. (1993). A likelihood-based method for analysing longitudinal binary responses. Biometrika 80, 141-151.

[11] Fitzmaurice, G.M., Laird, N.M. and Zahner, E.P. (1996). Multivariate logistic models for incomplete binary responses. Journal of the American Statistical Association 91, 99-108.

[12] Fletcher, R. (1970). A new approach to variable metric algorithms. The Computer Journal 13, 317-322.

[13] Glonek, G.F.V. (1999). On identifiability in models for incomplete binary data. Statistics & Probability Letters 41, 191-197.
[14] Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215-231.

[15] Kenward, M.G., Lesaffre, E. and Molenberghs, G. (1994). An application of maximum likelihood and generalized estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics 50, 945-953.

[16] Laird, N.M. (1988). Missing data in longitudinal studies. Statistics in Medicine 7, 305-315.

[17] Liang, K.Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.

[18] Lindsey, J.K. (1997). Applying Generalized Linear Models. Springer-Verlag, New York.

[19] Lindsey, J.K. (1999). Models for Repeated Measurements. Oxford University Press, New York.

[20] Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. John Wiley, New York.

[21] Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88, 125-134.

[22] Liu, X., Waternaux, C. and Petkova, E. (1999). Influence of human immunodeficiency virus infection on neurological impairment: an analysis of longitudinal binary data with informative drop-out. Applied Statistics 48, 103-115.

[23] Michiels, B., Molenberghs, G. and Lipsitz, S.R. (1999). Selection models and pattern-mixture models for incomplete data with covariates. Biometrics 55, 978-983.

[24] Molenberghs, G., Kenward, M.G. and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with non-random dropout. Biometrika 84, 33-44.

[25] Molenberghs, G., Goetghebeur, E.J.T., Lipsitz, S.R. and Kenward, M.G. (1999). Nonrandom missingness in categorical data: strengths and limitations. The American Statistician 53, 110-118.

[26] Nash, J.C. (1979). Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation. Adam Hilger Ltd, Bristol.

[27] Paty, D.W., Li, D.K.
B., The UBC MS/MRI Study Group, and The IFNB Multiple Sclerosis Study Group (1993). Interferon β-1b is effective in relapsing-remitting multiple sclerosis: II. MRI analysis results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology 43, 662-668.

[28] Robins, J.M. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association 90, 122-129.

[29] Rothenberg, T.J. (1971). Identification in parametric models. Econometrica 39, 577-591.

[30] Rubin, D.B. (1976). Inference and missing data. Biometrika 63, 581-592.

[31] Schluchter, M.D. (1992). Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine 11, 1861-1870.

[32] Shanno, D.F. (1970). Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation 24, 647-656.

[33] Sun, W. and Song, P. (2000). Statistical analysis of repeated measurements with informative censoring times. Statistics in Medicine. To appear.

[34] Ten Have, T.R., Kunselman, A.R., Pulkstenis, E.P. and Landis, J.R. (1998). Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics 54, 367-383.

[35] The IFNB Multiple Sclerosis Study Group (1993). Interferon β-1b is effective in relapsing-remitting multiple sclerosis: I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology 43, 655-661.

[36] The IFNB Multiple Sclerosis Study Group (1995). Interferon β-1b in the treatment of multiple sclerosis: final outcome of the randomized controlled trial. Neurology 45, 1277-1285.

[37] Wu, M.C. and Carroll, R.J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modelling the censoring process. Biometrics 44, 175-188.
Appendix A

Proof for Condition (6.4)

As in Section 6.2.2, there are two binary responses, $Y_1$ and $Y_2$, with only $Y_2$ subject to non-response. The outcome model is $\Pr(Y_1 = j, Y_2 = k \mid X = i) = \pi_{ijk}$, for $j, k = 0, 1$. The non-response model, $\Pr(R_2 = 1 \mid Y_1 = j, Y_2 = k, X = i) = p_{ijk}$, is assumed to be homogeneous in $Y_1$; that is, $p_{ijk} = p_{ik}$. Thus, the joint probabilities for the observed data are
$$\Pr(Y_1 = j, Y_2 = k, \text{both observed} \mid X = i) = \theta_{ijk} = \pi_{ijk}\, p_{ik},$$
$$\Pr(Y_1 = j, Y_2 \text{ unobserved} \mid X = i) = \theta_{ij\cdot} = \pi_{ij0}(1 - p_{i0}) + \pi_{ij1}(1 - p_{i1}),$$
and the marginal probabilities for $Y_1$ are
$$\pi_{ij\cdot} = \pi_{ij0} + \pi_{ij1} = \theta_{ij\cdot} + \theta_{ij0} + \theta_{ij1}.$$
Let $\phi_{ik} = 1/p_{ik}$ and assume $I = 2$. The $\phi_{ik}$ must satisfy the following system of equations:
$$\begin{pmatrix} \theta_{100} & \theta_{101} & 0 & 0 \\ \theta_{110} & \theta_{111} & 0 & 0 \\ 0 & 0 & \theta_{200} & \theta_{201} \\ 0 & 0 & \theta_{210} & \theta_{211} \end{pmatrix} \begin{pmatrix} \phi_{10} \\ \phi_{11} \\ \phi_{20} \\ \phi_{21} \end{pmatrix} = \begin{pmatrix} \pi_{10\cdot} \\ \pi_{11\cdot} \\ \pi_{20\cdot} \\ \pi_{21\cdot} \end{pmatrix} \qquad (A.1)$$
Given the multinomial probabilities $\theta$, there is a unique solution for the $\phi_{ik}$ provided the coefficient matrix is non-singular; that is, provided the determinant of the coefficient matrix does not equal 0. The determinant of the coefficient matrix, $(\theta_{111}\theta_{100} - \theta_{101}\theta_{110})(\theta_{211}\theta_{200} - \theta_{201}\theta_{210})$, will be non-zero provided
$$\theta_{111}\theta_{100} - \theta_{101}\theta_{110} \neq 0 \qquad (A.2)$$
and
$$\theta_{211}\theta_{200} - \theta_{201}\theta_{210} \neq 0. \qquad (A.3)$$
To satisfy (A.2), we require
$$\theta_{111}\theta_{100} \neq \theta_{101}\theta_{110}$$
$$\pi_{111}\, p_{11}\, \pi_{100}\, p_{10} \neq \pi_{101}\, p_{11}\, \pi_{110}\, p_{10}$$
$$\pi_{111}(\pi_{10\cdot} - \pi_{101}) \neq \pi_{101}(\pi_{11\cdot} - \pi_{111})$$
$$\pi_{111}/\pi_{11\cdot} \neq \pi_{101}/\pi_{10\cdot}$$
$$\Pr(Y_2 = 1 \mid Y_1 = 1, X = 1) \neq \Pr(Y_2 = 1 \mid Y_1 = 0, X = 1).$$
Similarly, to satisfy (A.3) requires
$$\Pr(Y_2 = 1 \mid Y_1 = 1, X = 2) \neq \Pr(Y_2 = 1 \mid Y_1 = 0, X = 2).$$
Thus the necessary and sufficient condition for the coefficient matrix to be non-singular is
$$\Pr(Y_2 = 1 \mid Y_1 = 1, X = i) \neq \Pr(Y_2 = 1 \mid Y_1 = 0, X = i) \quad \text{for } i = 1, 2.$$
Thus, the $\phi_{ik}$ are identifiable unless this condition fails to hold. Note that, in contrast to the argument leading to condition (6.3), the argument leading to this condition remains the same if the number of levels of the categorical covariate $X$ is greater than 2 ($I > 2$).
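The singularity condition can be illustrated numerically. The following sketch is not from the thesis and uses hypothetical probabilities: it builds the 2x2 block with entries theta_jk = pi_jk * p_k for one level of X and evaluates its determinant theta_11*theta_00 - theta_01*theta_10, which vanishes exactly when Pr(Y2 = 1 | Y1 = 1) = Pr(Y2 = 1 | Y1 = 0).

```python
# Numerical check of the non-singularity condition behind (A.1)-(A.3).
# pi[j][k] are hypothetical outcome probabilities Pr(Y1=j, Y2=k) for one
# level of X; p[k] are response probabilities Pr(R2 = 1 | Y2 = k).

def det_block(pi, p):
    # theta_jk = pi_jk * p_k; the block's determinant is
    # theta_11 * theta_00 - theta_01 * theta_10.
    theta = [[pi[j][k] * p[k] for k in (0, 1)] for j in (0, 1)]
    return theta[1][1] * theta[0][0] - theta[0][1] * theta[1][0]

p = [0.8, 0.6]

# Y1 and Y2 dependent: Pr(Y2=1|Y1=1) = 0.8 differs from
# Pr(Y2=1|Y1=0) = 0.2, so the block is non-singular.
pi_dep = [[0.4, 0.1], [0.1, 0.4]]

# Y1 and Y2 independent: Pr(Y2=1|Y1=j) = 0.5 for both j,
# so the block is singular.
pi_ind = [[0.25, 0.25], [0.25, 0.25]]

print(det_block(pi_dep, p))  # non-zero
print(det_block(pi_ind, p))  # zero
```

Note that the response probabilities p cancel out of the condition, matching the derivation above: only the conditional distribution of Y2 given Y1 determines singularity.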
128 Appendix B Detailed Results for the Selection Models Described Section 7.2 129 Table B . l : Results for Model ID1 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) Po 0.82 0.876 (0.919) 0.90 0.876 (0.816) Pi 0.00 -0.028 (1.036) -0.02 -0.028 (0.584) P2 0.00 -0.489 (0.896) -0.50 -0.489 (0.357) 03 -0.26 -0.122 (0.568) -0.12 -0.122 (0.388) " 1 2 -0.60 -0.020 (0.579) -0.02 -0.020 (0.404) " 1 3 -0.63 -0.031 (0.840) -0.03 -0.031 (0.378) " 2 3 -0.77 -0.136 (0.959) -0.14 -0.136 (0.486) " 1 2 3 -1.15 -0.534 (0.702) -0.50 -0.534 (0.446) " 1 0.00 -0.113 (1.188) -0.11 -0.113 (0.656) " 2 0.00 -0.657 (0.938) -0.66 -0.657 (0.391) »703 -1.95 -14.421 (1.025) -1.00 -15.848 (0.784) Vl3 0.00 0.558 (1.003) 0.50 0.558 (0.769) V23 0.00 12.874 (1.015) 1.00 14.301 (0.775) V02 -1.95 -3.360 (1.042) -2.00 -3.360 (0.690) Vl2 0.00 0.140 (1.001) 0.14 0.140 (0.540) V22 0.00 1.860 (1.030) 2.00 1.860 (0.787) Voi -1.95 -2.089 (1.057) -2.00 -2.089 (0.563) Neg. Loglik 933.407 ( # Iter = 71) 933.407 ( # Iter = 70) Set 3 Set 4 Parameter SV Estimate (SE) SV Estimate (SE) 00 0.90 0.876 (0.799) 0.876 0.876 (0.330) 0i -0.03 -0.028 (0.266) -0.028 -0.028 (0.338) 02 0.00 -0.489 (0.286) -0.489 -0.489 (0.341) 03 -0.12 -0.122 (0.394) -0.122 -0.122 (0.079) " 1 2 -0.02 -0.020 (0.287) -0.020 -0.020 (0.302) " 1 3 -0.04 -0.031 (0.25.1) -0.031 -0.031 (0.311) " 2 3 -0.15 -0.136 (0.434) -0.136 -0.136 (0.321) " 1 2 3 -0.50 -0.534 (0.292) -0.534 -0.534 (0.334) " 1 0.00 -0.113 (0.287) -0.113 -0.113 (0.355) " 2 0.00 -0.657 (0.310) -0.657 -0.657 (0.371) V03 -2.00 -15.400 (0.799) -20.000 -15.171 (0.737) Vl3 0.56 0.558 (0.865) 0.558 0.558 (0.503) V23 0.00 13.853 (0.794) 1.000 13.624 (0.739) V02 -2.40 -3.360 (0.806) -3.360 -3.360 (0.800) Vl2 0.15 0.140 (0.783) 0.140 0.140 (0.828) V22 2.00 1.860 (0.813) 1.860 1.860 (0.813) V01 0.00 -2.089 (0.266) -2.089 -2.089 (0.171) Neg. Loglik 933.407 ( # Iter = 59) ' 933.407 ( # Iter = 91). 
130 Table B.2: Results for Model ID2 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.82 0.886 (0.553) 0.88 0.886 (0.738) 0i 0.00 - 0,017 (0.538) -0.02 - 0.017 (0.603) 02 0.00 - 0.484 (0.538) -0.48 - 0.484 (0.700) ft -0.26 - 0.118 (0.248) -0.12 - 0.118 (0.379) C*12 -0.60 - 0.004 (0.341) 0.00 - 0.004 (0.487) ai3 -0.63 - 0.010 (0.325) -0.01 - 0.010 (0.683) " 2 3 -0.77 - 0.111 (0.373) -0.11 - 0.111 (0.848) a m -1.15 - 0.511 (0.368) -0.51 - 0.511 (0.607) a i 0.00 - 0.103 (0.592) -0.10 - 0.103 (0.713) 0 2 0.00 - 0.649 (0.647) -0.65 - 0.649 (0.818) »703 -1.95 -14.421 (0.888) -1.95 -14.760 (1.075) »?1 0.00 0.286 (0.815) 0.00 0.286 (0.844) 0.00 13.065 (0.708) 0.00 13.404 (0.875) V02 -1.95 -14.564 (0.889) -1.95 -14.903 (0.982) Vol -1.95 - 2.089 (0.990) -2.00 - 2.089 (0.995) Neg. Loglik 933.922 (# Iter = 67) 933.922 (# Iter = 64) Set 3 Set 4 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.90 0.886 (0.788) 0.886 0.886 (0.530) ft -0.02 - 0.017 (0.734) -0.017 -0.017 (0.389) ft -0.50 - 0.484 (0.751) -0.484 -0.484 (0.377) . ft -0.12 - 0.118 (0.324) -0.118 -0.118 (0.286) " 1 2 0.00 - 0.004 (0.562) -0.004 -0.004 (0.370) " 1 3 -0.01 - 0.010 (0.694) -0.010 -0.010 (0.420) " 2 3 -0.11 - 0.111 (0.593) -0.111 -0.111 (0.546) « 1 2 3 -0.50 - 0.511 (0.599) -0.511 -0.511 (0.489) Oil -0.10 - 0.103 (0.792) -0.103 -0.103 (0.441) Q-2 -0.60 - 0.649 (0.847) -0.649 -0.649 (0.426) V03 -6.00 -14.384 (1.033) -14.384 -13.732 (0.616) Vl -0.30 0.286 (0.957) 0.000 0.286 (0.318) m 6.00 13.028 (0.990) 0.000 12.376 (0.584) V02 -4.00 -14.527 (0.904) 0.000 -13.875 (0.642) V01 -2.00 - 2.089 (0.958) 0.000 -2.089 (0.838) Neg. 
Loglik 933.922 (# Iter = 56) 933.922 (# Iter = 72) 131 Table B.3: Results for Model ID3 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) 00 0.88 0.986 (0.210) 1.00 0.986 (0.210) 0i -0.02 -0.097 (0.199) -0.10 -0.097 (0.200) 02 -0.48 -0.475 (0.196) -0.50 -0.475 (0.195) 03 -0.12 -0.230 (0.083) -0.20 -0.230 (0.083) « 1 2 0.00 -0.082 (0.169) -0.08 -0.082 (0.171) « 1 3 -0.01 -0.189 (0.177) -0.20 -0.189 (0.178) « 2 3 -0.11 -0.345 (0.195) -0.30 -0.345 (0.198) "123 -0.51 -0.706 (0.200) -0.70 -0.706 (0.202) a.\ -0.10 -0.191 (0.219) -0.20 -0.191 (0.219) -0.60 -0.648 (0.226) -0.60 -0.648 (0.224) vo -1.95 -2.195 (0.159) -2.00 -2.195 (0.161) Vi 0.00 0.416 (0.286) 0.40 0.416 (0.290) V2 0.00 0.222 (0.449) 0.20 0.222 (0.459) Neg. Loglik 937.349 (# Iter = 20) 937.349 (# Iter = 21) 132 Table B.4: Results for Model ID4 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.82 0.880 (0.731) 0.88 0.880 (0.217) ft 0.00 -0.024 (0.252) 0.00 -0.024 (0.197) ft 0.00 -0.487 (0.241) -0.50 -0.487 (0.202) ft -0.26 -0.120 (0.386) -0.12 -0.120 (0.075) "12 -0.60 -0.013 (0.216) 0.00 -0.013 (0.174) "13 -0.63 -0.022 .(0.225)" -0.02 -0.022 (0.168) «23 -0.77 -0.126 (0.422) -0.13 -0.126 (0.179) ttl23 -1.15 -0.524 (0.257) -0.52 -0.524 (0.189) ai 0.00 -0.109 (0.298) -0.10 -0.109 (0.208) «2 0.00 -0.654 (0.274) -0.65 -0.654 (0.230) V03 -1.95 -14.818 (0.730) -4.00 -16.284 (1.015) V23 0.00 13.654 (0.751) 2.00 15.119 (1.008) V02 -1.95 -3.819 (0.728) -3.80 -3.819 (2.606) V22 0.00 2.464 (0.761) 0.00 2.464 (2.754) VOl -1.95 -2.089 (0.209) -2.08 -2.089 (0.142) Neg. 
Loglik 934.432 (# Iter = 67) 934.432 (# Iter = 63) Set 3 Set 4 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.880 0.880 (0.880) 0.880 0.880 (0.170) ft -0.024 -0.024 (0.720) -0.024 -0.024 (0.165) ft -0.486 -0.487 (0.703) -0.487 -0.487 (0.171) ft -0.120 -0.120 (0.434) -0.120 -0.120 (0.074) "12 -0.010 -0.013 (0.438) -0.013 -0.013 (0.118) «13 -0.022 -0.022 (0.771) -0.022 -0.022 (0.121) "23 -0.125 -0.126 (0.845) -0.126 -0.126 (0.139) «123 -0.524 -0.524 (0.641) -0.524 -0.524 (0.131) « i -0.109 -0.109 (0,776) -0.109 -0.109 (0.174) «2 -0.650 -0.654 (0.820) -0.654 -0.654 (0.195) »703 0.000 -15.432. (0.986) -15.432 -15.432 (1.363) ?723 0.000 14.268 (0.988) 14.268 14.268 (1.368) %2 -3.820 -3:819 (0.725) -3.819 -3.819 (1.973) %2 2.460 2.464 (0.766) 2.464 2.464 (2.093) »7oi -2.090 -2.089 (1.002) -2.089 -2.089 (0.156) Neg. Loglik 934.432 (# Iter = 61) 934.432 (# Iter = 24) 133 Table B.5: Results for Model ID5 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) So 0.82 0.886 (0.926) 0.890 0.886 (0.169) Si 0.00 -0.017 (0.898) -0.017 -0.017 (0.178) S2 0.00 -0.484 (0.874) -0.480 -0.484 (0.187) S3 -0.26 -0.118 (0.543) ^0.120 -0.118 (0.070) "12 -0.60 -0.004 (0.971) 0.000 -0.004 (0.134) "13 -0.63 -0.010 (0.965) -0.010 -0.010 (0.140) "23 -0.77 -0.111 (0.990) -0.110 -0.111 (0.153) "123 -1.15 -0.511 (0.891) -0.510 -0.511 (0.151) "1 0.00 -0.103 (0.901) -0.100 -0.103 (0.195) " 2 0.00 -0.649 (0.940) -0.650 -0.649 (0.204) V03 -1.95 -13.737 (1.000) -4.000 -15.608 (301.091) m 0.00 12.573 (1.002) 0.000 14.443 (301.092) V02 -1.95 -13.866 (1.000) -2.000 -15.737 (301.091) V01 -1.95 -2.089 (1.000) -2.080 -2.089 (0.166) Neg. 
Loglik 934.473 (# Iter = 55) • 934.473 (# Iter = 60) Set 3 Set 4 Parameter SV Estimate (SE) SV Estimate (SE) •Bo 0.886 0.886 (0.213) 0.900 0.886 (0.400) Si -0.017 -0.017 (0.206) -0.017 -0.017 (0.573) S2 -0.484 -0.484 (0.192) -0.480 -0.484 (0.482) S3 -0.118 -0.118 (0.074) -0.120 -0.118 (0.214) "12 -0.004 -0.004 (0.171) 0.000 -0.004 (0.305) "13 -0.010 -0.010 (0.167) -0.010 -0.010 (0.306) "23 -0.111 -0.111 (0.178) -0.110 -0.111 (0.456) "123 -0.511 -0.511 (0.185) -0.510 -0.511 (0.460) "1 -0.103 -0.103 (0.220) -0.100 -0.103 (0.619) " 2 -0.649 -0.649 (0.214) -0.650 -0.649 (0.534) Voz -15.608 -15.608 (0.582) 0.000 -13.737 (0.942) m 14.443 14.443 (0.568) 0.000 12.573 (0.587). V02 -15.737 -15.737 (0.569) 0.000 -13.866 (0.896) V01 -2.089 -2.089 (0.166) -2.000 -2.089 (0.577) Neg. Loglik 934.473 (# Iter = 20) 934.473 (# Iter = 71) 134 Table B.6: Results for Model ID6 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) So 0.90 0.962 (0.209) 1.00 0.962 (0.208) Si ' -0.10 -0.080 (0.197) -0.08 -0.080 (0.198) Si -0.50 -0.483 (0.198) -0.50 -0.483 (0.195) s3 -0.20 -0.201 (0.077) -0.20 -0.201 (0.078) • "12 -0.05 -0.057 (0.171) -0.06 -0.057 (0.168) "13 -0.10 -0.137 (0.172) -0.10 -0.137 (0.170) "23 -0.20 -0.279 (0.187) -0.30 -0.279 (0.186) "123 -0.50 -0.646 (0.193) -0.60 -0.646 (0.190) Oil -0.10 -0.172 (0.213) -0.20 -0.172 (0.216) " 2 -0.60 -0.655 (0.223) -0.70 -0.655 (0.224) no. -1.95 -2.206 (0.151) -2.00 -2.206 (0.161) m 0.00 0.661 (0.264) 0.70 0.661 (0.269) Neg. 
Loglik 938.464 (# Iter = 19) 938.464 (# Iter = 17) Table B.7: Results for Model RD1 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) So 0.82 0.999 (0.211) 1.00 0.999 (0.216) Si 0.00 -0.106 (0.196) -0.12 -0.106 (0.194) s2 0.00 -0.470 (0.196) -0.47 -0.470 (0.194) S3 -0.26 -0.246 (0.080) -0.20 -0.246 (0.082) "12 -0.60 -0.097 (0.166) -0.10 -0.097 (0.173) "13 -0.63 -0.219 (0.169) -0.22 -0.219 (0.172) "23 -0.77 -0.384 (0.182) -0.38 -0.384 (0.188) "123 -1.15 -0.742 (0.188) -0.74 -0.742 (0.195) "1 0.00 -0.201 (0.216) -0.20 -0.201 (0.217) " 2 0.00 -0.643 (0.229) -0.64 -0.643 (0.227) V03 -1.95 -2.416 (0.335) -2.41 -2.416 (0.336) ni3 0.00 0.878 (0.396) 0.90 0.878 (0.396) no2 -1.95 -2.117 (0.300) -2.11 -2.117 (0.293) nn 0.00 0.401 (0.360) 0.40 0.401 (0.350) noi -1.95 -2.089 (0.165) -2.08 -2.089 (0.164) Neg. Loglik 936.833 (# Iter = 25) 936.833 (# Iter = 23) 135 Table B.8: Results for Model RD2 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.82 0.999 (0.204) 0.999 0.999 (0.208) ft 0.00 -0.106 (0.193) -0.106 -0.106 (0.197) 02 0.00 -0.470 (0.187) -0.470 -0.470 (0.196) ft -0.26 -0.246 (0.080) -0.246 -0.246 (0.077) "12 -0.60 -0.097 (0.166) -0.097 -0.097 (0.168) «13 -0.63 -0.219 (0.162) -0.219 -0.219 (0.170) «23 -0.77 -0.384 (0.174) -0.384 -0.384 (0.186) "123 -1.15 -0.742 (0.183) -0.742 -0.742 (0.193) ai 0.00 -0.201 (0.217) -0.201 -0.201 (.0.217) OL2 0.00 -0.643 (0.229) -0.643 -0.643 (0.227) V03 -1.95 -2.239 (0.258) -2.239 -2.239 (0.247) m 0.00 0.625 (0.261) 0.625 0.625 (0.264) V02 -1.95 -2.278 (0.250) -2.278 -2.278 (0.253) Vol -2.08 -2.089 (0.169) -2.089 -2.089 (0.167) Neg. 
Loglik 937.250 (# Iter = 26) 937.250 (# Iter = 17) Table B.9: Results for Model RD3 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.82 0.999 (0.206) 0.999 0.999 (0.208) ft 0.00 -0.106 (0.197) -0.106 -0.106 (0.195) ft 0.00 -0.470 (0.179) -0.470 -0.470 (0.196) ft -0.26 -0.246 (0.078) -0.246 -0.246 (0.073) ai2 -0.60 -0.097 (0.167) -0.097 -0.097 (0.168) "13 -0.63 -0.219 (0.168) -0.219 -0.219 (0.163) "23 -0.77 -0.384 (0.180) -0.384 -0.384 (0.181) "123 -1.15 -0.742 (0.187) -0.742 -0.742 (0.191) ai 0.00 -0.201 (0.227) -0.201 -0.201 (0.217) 0.00 -0.643 (0.219) -0.643 -0.643 (0.225) Vo -1.95 -2.153 (0.133) 0.000 -2.153 (0.121) Vi 0.00 0.518 (0.196) 0.000 0.518 (0.188) Neg. Loglik 937.457 (# Iter = 22) 937.457 (# Iter = 24) 136 Table B.10: Results for Model CRD1 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) Bo 0.82 0.999 (0.203) 1.00 0.999 (0.212) Bi 0.00 -0.106 (0.173) -0.10 -0.106 (0.199) B2 0.00 -0.470 (0.189) -0.40 -0.470 (0.197) Bs -0.26 -0.246 (0.078) -0.20 -0.246 (0.080) " 1 2 -0.60 -0.097 (0.172) 0.00 -0.097 (0.169) " 1 3 -0.63 -0.219 (0.169) -0.20 -0.219 (0.171) " 2 3 -0.77 -0.384 (0.186) -0.40 -0.384 (0.186) " 1 2 3 -1.15 -0.742 (0.192) -0.70 -0.742 (0.193) a i 0.00 -0.201 (0.196) -0.20 -0.201 (0.221) a2 0.00 -0.643 (0.231) -0.60 -0.643 (0.229) V03 0.00 -1.846 (0.161) -2.00 -1.846 (0.170) V02 0.00 -1.849 (0.152) -2.00 -1.849 (0.159) Vol 0.00 -2.089 (0.154) -2.10 -2.089 (0.165) Neg. Loglik 940.322 (# Iter = 29) 940.322 (# Iter = 21) Table B . l l : Results for Model CRD2 Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) Bo 0.82 0.999 (0.281) 1.00 0.999 (0.210) Bi 0.00 -0.106 (0.211) -0.10 -0.106 (0.192) B2 0.00 -0.470 (0.228) -0.40 -0.470 (0.193) B3 -0.26 -0.246 (0.078) -0.20 -0.246 (0.079) « 1 2 . -0.60 -0.097 (0.232) 0.00 -0.097 (0.169) « 1 3 -0.63 -0.219 (0.221). 
-0.20 -0.219 (0.172) « 2 3 -0.77 -0.384 (0.227) -0.40 -0.384 (0.185) " 1 2 3 -1.15 -0.742 (0.250) -0.70 -0.742 (0.190) ai • 0.00 -0.201 (0.228) -0.20 -0.201 (0.217) a2 0.00 -0.643 (0.239) -0.60 -0.643 (0.224) Vo -1.95 -1.933 (0.106) -2.00 -1.933 (0.095) Neg. Loglik 941.040 (# Iter = 21) 941.040 (# Iter = 19) 137 Appendix C Detailed Results for the Selection Models Described Section 7.3 138 Table C . l : Results for Drop-out Model: TRT * LUR Set 1 Set 2 Parameter SV Estimate (SE) . sv Estimate (SE) ft 0.80 .0.886 (1.178) 0.90 0.886 (0.926) Pi(LD) -0.10 -0.017 (0.779) -0.02 -0.017 (0.988) fo{HD) -0.50 -0.484 (0.876) -0.50 -0.484 (1.095) 03 (time) -0.20 -0.118 (0.422) -0.10 -0.118 (0.911) "12 -0.08 -0.004 (1.143) 0.00 -0.004 (1.365) "13 -0.20 -0.010 (1.025) -0.01 -0.010 (1.198) "23 -0.30 -0.111 (0.877) -0.10 -0.111 (1.265) "123 -0.70 -0.511 (0.617) -0.50 -0.511 (1.193) " i -0.20 -0.103 (1.096) -0.10 -0.103 (0.987) " 2 -0.70 -0.649 (0.885) -0.60 -0.649 (1.143) V03 -1.95 -13.608 (1.053) -1.00 -15.118 (1.047) V02 -1.95 -13.732 (1.069) -1.00 -15.242. (1.044) Voi -1.95 -2.136 (1.235) -2.10 -2.136 (1.026) Vi(LD) 0.00 -0.203 (1.232) -0.20 -0.203 (1.020) V2(HD) 0.00 0.296 (1.044) 0.30 0.296 (1.133) Vs(LUR) 0.00 12.382 (1.122) 1.00 13.892 (1.348) r]4(LD x LUR) 0.00 0.571 (1.001) 0.60 0.571 (1.025) m(HD x LUR) 0.00 -0.620 (1.101) -0.60 -0.620 (1.100) Neg. 
Loglik 931.223 (# Iter = 60) 931.223 (# Iter = 65) 139 Table C.2: Results for Drop-out Model: T R T + L O R + L U R Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) Bo 0.80 0.886 (0.602) 0.90 0.886 (1.280) Bi(LD) -0.10 -0.017 (0.687) -0.02 -0.017 (1.112) B2{HD) -0.50 -0.484 (0.650) -0.50 -0.484 (1.012) 8z(time) -0.20 -0.118 (0.267) -0.10 -0.118 (0.683) « 1 2 -0.08 -0.004 (0.561) 0.00 -0.004 (1.067) "13 -0.20 -0.010 (0.899) -0.01 -0.010 (1.112) « 2 3 -0.30 -0.111 (0.715) -0.10 -0.111 (1.089) t*123 -0.70 -0.511 (0.572) -0.50 -0.511 (1.104) Oil -0.20 -0.103 (0.732) -0.10 -0.103 (0.978) Oi2 -0.70 -0.649 (0.774) -0.60 -0.649 (1.188) V03 -1.95 -13.920 (1.010) -2.00 -14.006 (1.019) V02 -1.95 -14.063 (0.998) -1.00 -14.149 (1.368) Vol -1.95 -2.156 (0.923) -2.10 -2.156 (1.102) Vi(LD) 0.00 0.209 (0.934) 0.20 -0.209 (1.287) m(HD) 0.00 -0.023 (0.974) -0.20 -0.023 (1.381)' V3(LOR) 0.00 0.290 (0.873) 0.30 0.290 (1.145) rj^(LUR) 0.00 12.490 (0.777) 1.00 12.576 (2.942) Neg. Loglik 933.350 (# Iter = 65) 933.350 (# Iter = 60) 140 Table C.3: Results for Drop-out Model: LOR * LUR Set 1 Set 2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.80 0.886 (0.664) 0.90 0.886 (0.650) Pi(LD) -0.10 ' -0.017 (0.411) -0.02 -0.017 (0.733) -0.50 -0.484 (0.639) -0.50 -0.484 (0.800) f33(time) -0.20 -0.118 (0.361) -0.10 -0.118 (0.188) "12 -0.08 -0.004 (0.448) 0.00 -0.004 (0.487) "13 -0.20 -0.010 (0.539) -0.01 -0.010 (0.766) "23 -0.30 -0.111 (0.711) -0.10 -0.111 (0.586) "123 -0.70 -0.511 (0.643) -0.50 -0.511 (0.610) "1 -0.20 -0.103 (0.456) -0.10 -0.103 (0.788) "2 -0.70 -0.649 (0.751) -0.60 -0.649 (0.847) V03 -1.95 -12.674 (0.953) -1.00 -14.340 (0.992) V02 -1.95 -12.817 (0.936) -1.00 -14.483 (1.001) VOI -1.95 -2.089 (0.979) -2.10 -2.089 (0.998) Vi(LOR) -0.10 -0.711 (0.850) 0.10 0.152 (1.002) V2(LUR) 0.00 11.318 (0.785) 1.00 12.984 (0.995) m{LOR x LUR). 0.20 0.996 (0.852) 0.20 0.134 (0.998) Neg. 
Loglik 933.922 (# Iter = 61) 933.922 (# Iter = 66) Set 3 Set 4 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.80 0.886 (0.908) 0.90 0.886 (0.704) Pi(LD) -0.10 -0.017 (0.882) -0.02 -0.017 (0.722) (32(HD) -0.50 -0.484 (0.963) -0.50 -0.484 (0.628) 03 (time) -0.20 -0.118 (0.593) -0.10 -0.118 (0.345) "12 . -0.08 -0.004 (0.956) 0.00 -0.004 (0.710) "13 -0.20 -0.010 (0.945) -0.01 -0.010 (0.806) "23 -0.30 -0.111 (0.887) -0.10 -0.111 (0.812) "123 -0.70 -0.511 (0.884) -0.50 -0.511 (0.705) "1 -0.20 -0.103 (0.918) -0.10 -0.103 (0.787) "2 -0.70 -0.649 (0.985) -0.60 -0.649 (0.793) V03 -1.95 -14.338 (1.002) -4.00 -13.092 (0.922) V02 -1.95 -14.481 (1.004) -3.00 -13.235 (1.002) VOI -1.95 -2.089 (1.002) -2.10 -2.089 (0.992) Vi(LOR) -0.50 -0.496 (1.001) 0.10 -3.387 (0.916) V2(LUR) -0.10 12.982 (1.009) 2.00 11.736 (0.861) m(LOR x LUR) 0.10 0.782 (1.001) -0.20 3.672 (0.929) Neg. Loglik 933.922 (# Iter = 60), 933.922 (# Iter = 59) 141 Table C.4: Results for Drop-out Model: T R T + L U R Parameter Set 1 Set 2 SV Estimate (SE) SV Estimate (SE) Bo Bi(LD) 82(HD) Bz(time\ 0.80 -0.10 -0.50 -0.20 0.886 (0.202) -0.017 (0.195) -0.484 (0.192) -0.118 (0.074) 0.80 -0.02 -0.50 -0.10 0.886 (0.927) -0.017 (0.995) -0.484 (0.995) -0.118 (0.600) Ct\2 « 1 3 C*23 "123 oti a.2 -0.08 -0.20 -0.30 -0.70 -0.20 -0.70 -0.004 (0.157) -0.010 (0.156) -0.111 (0.168) -0.511 (0.171) -0.103 (0.207) -0.649 (0.215) 0.00 -0.01 -0.10 -0.50 -0.10 -0.60 -0.004 (0.988) -0.010 (0.981) -0.111 (0.981) -0.511 (0.976) -0.103 (1.000) -0.649 (0.997) Vol V02 V03 m(LD) m{HD) r)i(LUR) -1.95 -1.95 -1.95 0.00 0.00 0.00 -2.140 (0.151) -15.364 (0.862) -15.236 (0.807) 0.191 (0.205) -0.051 (0.219) 14.014 (0.876) -2.10 -2.00 -3.00 0.20 -0.05 1.00 -2.140 (1.001) -13.697 (1.001) -13.569 (1.001) 0.191 (1.002) -0.051 (1.002) 12.347 (1.021) Neg. Loglik 933.910 (# Iter = 67) 933.910 (# Iter = 54) 142 Appendix D Detailed Results for the Selection Models Described Section 7.4 143 Table D . 
1: Results for Case 1 in Table 7.14 Evaluated at the Boundary:
η03 → −∞, η02 → −∞, η03 + η2 = Δ1, η02 + η2 = Δ2

Parameter        Estimate     SE
β0                0.861      0.206
β1 (LD)          -0.007      0.196
β2 (HD)          -0.495      0.193
β3 (time)        -0.122      0.073
β4 (gender)       0.052      0.045
α12              -0.002      0.164
α13              -0.011      0.159
α23              -0.111      0.172
α123             -0.511      0.176
α1               -0.094      0.208
α2               -0.661      0.218
η1 (LOR)          0.286      0.275
Δ1               -1.356      0.264
Δ2               -1.499      0.262
η01              -2.089      0.167
Neg. Loglik 933.244 (# Iter = 24)

Table D.2: Results for Case 2 in Table 7.14 Evaluated at the Boundary:
η03 → −∞, η02 → −∞, η03 + η2 = Δ1, η02 + η2 = Δ2

Parameter        Estimate     SE
β0                0.894      0.205
β1 (LD)          -0.016      0.193
β2 (HD)          -0.483      0.190
β3 (time)        -0.117      0.073
β4 (EDSS)        -0.004      0.017
α12              -0.004      0.157
α13              -0.011      0.155
α23              -0.111      0.168
α123             -0.511      0.170
α1               -0.101      0.207
α2               -0.649      0.215
η1 (LOR)          0.286      0.251
Δ1               -1.356      0.209
Δ2               -1.499      0.237
η01              -2.089      0.164
Neg. Loglik 933.901 (# Iter = 24)

Table D.3: Results for Case 3 in Table 7.14 Evaluated at the Boundary:
η03 → −∞, η02 → −∞, η03 + η2 = Δ1, η02 + η2 = Δ2

Parameter        Estimate     SE
β0                0.877      0.205
β1 (LD)          -0.025      0.196
β2 (HD)          -0.484      0.194
β3 (time)        -0.119      0.073
β4 (duration)     0.002      0.003
α12               0.000      0.163
α13              -0.008      0.161
α23              -0.108      0.173
α123             -0.508      0.177
α1               -0.110      0.208
α2               -0.651      0.216
η1 (LOR)          0.286      0.279
Δ1               -1.356      0.263
Δ2               -1.499      0.265
η01              -2.089      0.165
Neg. Loglik 933.768 (# Iter = 24)

Table D.4: Results for Case 4 in Table 7.14 Evaluated at the Boundary:
η03 → −∞, η02 → −∞, η03 + η2 = Δ1, η02 + η2 = Δ2

Parameter        Estimate     SE
β0                0.745      0.241
β1 (LD)          -0.025      0.195
β2 (HD)          -0.486      0.185
β3 (time)        -0.118      0.073
β4 (age)          0.004      0.004
α12              -0.001      0.162
α13              -0.005      0.162
α23              -0.110      0.174
α123             -0.507      0.179
α1               -0.108      0.208
α2               -0.654      0.209
η1 (LOR)          0.286      0.241
Δ1               -1.356      0.247
Δ2               -1.499      0.232
η01              -2.089      0.138
Neg.
Loglik 933.354 (# Iter = 26) 147 Table D.5: Results for Case 5 in Table 7.14 Evaluated at the Boundary: 7703 —> —00, 7702 -> -co, 7703 + 772 = A i , 7702 + m = A 2 Imputed Set 1 Imputed Set 2 Parameter Estimate SE Estimate SE Po 0.718 0.253 0.726 0.251 Pi(LD) 0.008 0.193 0.006 0.195 P2(HD) -0.474 0.190 -0.475 0.192 Pz(time) -0.108 0.075 -0.109 0.073 p4(log(BOD)) 0.020 0.016 0.019 0.016 "12 -0.022 0.164 -0.021 0.162 "13 -0.031 0.161 -0.029 0.159 "23 -0.131 0.172 -0.130 0.171 "123 -0.530 0.178 -0.529 0.175 "1 -0.071 0.207 -0.074 0.210 " 2 -0.624 0.216 -0.626 0.218 Vi(LOR) 0.286 0.274 0.286 0.277 A i -1.356 0.264 -1.356 0.266 A 2 -1.499 0.263 -1.499 0.261 »7oi -2.089 0.164 -2.089 0.166 Neg. Loglik 933.088 (# Iter = 23) 933.215 (# Iter = 23) Imputed Set 3 Imputed Set 4 Parameter Estimate SE Estimate SE Po 0.717 0.251 0.725 0.249 Pi(LD) 0.009 0.190 0.006 0.195 P2(HD) -0.474 0.189 -0.475 0.189 Pz(time) -0.108 0.075 -0.109 0.073 P4(log(BOD)) 0.020 0.016 0.019 0.016 " 1 2 -0.022 0.160 -0.021 0.160 "13 -0.031 0.158 -0.029 0.155 "23 -0.131 • 0.172 -0.129 0.167 "123 -0.530 0.173 -0.529 0.171 " 1 -0.071 0.204 -0.073 0.210 " 2 -0^624 0.215 -0.625 0.214 Vi(LOR) 0.286 0.278 0.286 0.283 A i -1.356 0.266 -1.356 0.268 A 2 -1.499 0.264 -1.499 0.267 ' V01 -2.089 0.166 -2.089 0.166 Neg. Loglik 933.083 (# Iter = 23) 933.211 (# Iter = 23) 148 Table D.6: Results for Case 5 in Table 7.14 Evaluated at the Boundary (364 pa-tients): 7/03 -> " O O , 7/02 -> - O O , 7/ 0 3 + 7/2 = A X , 7 / 0 2 + 7/2 = A 2 Imputed with 1.0 Imputed with 4.5 Parameter Estimate SE Estimate SE 00 0.686 0.248 0.694 0.249 0i{LD) 0.063 0.193 0.061 0.193 B2(HD) -0.446 0.196 -0.447 0.192 8% (time) -0.108 0.074 -0.108 0.074 fa(log(BOD)) 0.020 0.016 0.020 0.016 "12. 
-0.061 0.166 -0.060 0.160 "13 -0.059 0.162 -0.058 0.159 "23 - -0.161 0.175 -0.160 0.172 "123 -0.567 0.180 -0.565 0.176 "1 -0.022 0.208 -0.025 0.206 "2 -0.587 0.220 -0.589 0.212 ni(LOii) 0.301 0.274 0.301 0.276 A i -1.342 0.264 -1.342 0.264 A 2 -1.491 0.261 -1.491 0.263 V01 -2.149 0.168 -2.149 0.171 Neg. Loglik 914.912 (# Iter = 24) 915.047 (# Iter = 26) 149 Table D.7: Results for Model ID2 Evaluated at the Boundary (364 patients): 7703 ->• - 0 0 , 7702 -> - 0 0 , 7703 + 772 = A i , 7702 + 772 = A 2 Parameter Estimate SE 00 0.859 0.206 Pi(LD) 0.035 . 0.193 -0.457 0.201 03 (time) -0.118 0.074 "12 -0.041 0.171 "13 -0.037 0.161 "23 -0.140 0.174 "123 -0.546 0.179 Oil -0.056 0.207 "2 -0.614 0.224 m(LOR) 0.301 0.276 A i -1.342 0.250 A 2 -1.491 0.266 »7oi -2.149 0.170 Neg. Loglik 915.825 (# Iter = 21) 150 Appendix E Detailed Results for the Liu e t a l . Transition Models Described in Section 7.6 151 Table E . l : Results for Liu Transition Model with Drop-out Model ID1 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.89 1.007 (0.558) 1.00 1.007 (0.447) 0i -0.12 -0.040 (0.876) -0.04 -0.040 (0.475) ft -0.50 -0.462 (1.336) -0.50 -0.462 (0.372) ft -0.42 -0.324 (0.369) -0.30 -0.324 (0.404) ft 0.90 0.692 (0.707) 0.70 0.692 (0.361) Voz -1.95 -12.083 (2.727) -2.00 -10.451 (1.197) Vl3 0.00 0.558 (1.013) 0.60 0.558 (0.767) V23 0.00 10.535 (0.888) 2.00 8.903 (0.970) V02 -1.95 -18.645 (1.144) -1.00 -22.807 (0.746) Vl2 ' 0.00 0.048 (2.246) 0.05 0.048 (0.646) V22 0.00 17.318 (0.913) 2.00 21.480 (0.746) Vol -1.95 -2.089 (0.828) -2.00 -2.089 (0.363) Neg. 
Loglik 942.259 (# Iter = 137) 942.259 (# Iter = 97) Table E.2: Results for Liu Transition Model with Drop-out Model ID2 Parameter SV Estimate (SE) SV Estimate (SE) ft 0.89 1.007 (0.817) 1.00 1.007 (0.873) ft -0.12 -0.040 (0.748) -0.04 -0.040 (0.710) ft -0.50 -0.462 (0.604) -0.50 -0.462 (0.625) ft -0.42 -0.324 (0.483) -0.30 -0.324 (0.481) ft 0.90 0.692 (0.631) 0.70 0.692 (0.697) V03 -1.95 -14.182 (0.991) -2.00 -14.175 (0.908) V02 -1.95 -14.324 (1.388) -1.00 -14.318 (0.909) Vol -1.95 -2.089 (0.924) -2.00 -2.089 (0.886). Vi 0.00 0.286 (0.710) 0.30 0.286 (0.415) V2 0.00 12.826 (0.812) 3.00 12.819 (0.594) Neg. Loglik 942.687 (# Iter = 58) 942.687-(# Iter = 61) 152 Table E.3: Results for Liu Transition Model with Drop-out Model ID3 Parameter SV Estimate (SE) SV Estimate (SE) A> 0.89 1.123 (0.211) 1.10 1.123 (0.212) 01 -0.12 -0.128 (0.178) -0.13 -0.128 (0.176) 02 -0.50 -0.437 (0.174) -0.40 -0.437 (0.173) 03 -0.42 -0.443 (0.102) -0.50 -0.443 (0.103) 04 0.90 0.573 (0.172) 0.60 0.573 (0.177) Vo -1.95 -2.023 (0.163) -2.00 -2.023 (0.161) Vi 0.00 0.542 (0.311) 0.50 0.542 (0.313) V2 0.00 -0.262 (0.610) -0.30 -0.262 (0.609) Neg. Loglik 939.471 (# Iter = 17) 939.471 (# Iter = 15) Table E.4: Results for Liu Transition Model with Drop-out Model ID5 Parameter SV Estimate (SE) SV Estimate (SE) 00 0.89 1.007 (0.931) 1.10 1.007 (0.929) 01 -0.12 -0.040 (1.026) -0.03 -0.040 (0.994) 02 -0.50 -0.462 (0.984) -0.40 -0.462 (0.992) 03 -0.42 -0.324 (0.463) -0.30 -0.324 (0.485) 04 0.90 0.692 (0.975) 0.60 0.692 (0.971) V03 -1.95 -14.704 (0.863) -1.00 -13.967 (0.970) • V02 -1.95 -14.832 (0.972) -0.00 -14.096 (0.894) Voi -1.95 -2.089 (1.028) -1.00 -2.089 (0.992) V2 0.00 13.539 (0.742) 0.00 12.803 (0.570) Neg. 
Loglik 943.239 (# Iter = 55) 943.239 (# Iter = 57) 153 Table E.5: Results for Liu Transition Model with Random Drop-out (RD) RD1 RD2 RD3 Parameter Estimate SE Estimate SE Estimate SE Po 1.113 0.206 1.113 0.209 1.113 0.212 '0i -0.118 0.170 -0.118 0.172 -0.118 0.175 02 -0.445 0.165 -0.445 0.172 -0.445 0.173 03 -0.431 0.096 -0.431 0.099 -0.431 0.099 04 0.596 0.160 0.596 0.164 0.596 0.169 V03 -2.416 0.327 -2.239 0.248 - -??02 -2.117 0.297 -2.278 0.261 - -Voi -2.089 0.108 -2.089 0.163 -Vo - - - - -2.068 0.132 Vl3 0.878 0.367 - - - -V12 0.401 0.337 - - • -Vi - - 0.625 0.261 0.432 0.195 Neg. Loglik 944.101 (# Iter = 21) 944.518 (# Iter = 17) 939.578 (# Iter = 14) Table E.6: Results for Liu Transition Model with Drop-out Completely At Random (CRD) CRD1 CRD2 Parameter Estimate SE Estimate SE 00 1.113 0.209 1.113 0.209 0i -0.118 0.174 -0.118 0.172 02 -0.445 0.172 -0.445 0.172 03 • -0.431 0.099 -0.431 0.099 04 0.596 0.169 0.596 0.164 V03 -1.846 0.172 - -V02 -1.849 0.160 - -Vol -2.089 0.166 - -Vo - - -1.880 0.096 Neg. Loglik 947.589 (# Iter = 13) 942.074 (# Iter = 11) 154 *
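The reported negative log-likelihoods allow informal comparison of the drop-out models. As a minimal sketch (not part of the original analysis), the snippet below computes a likelihood-ratio statistic between CRD2 and RD3 using the Neg. Loglik values in Tables E.6 and E.5, under the assumption that CRD2 (constant drop-out logit η0 only) is nested within RD3 (η0 and η1), as their parameterizations suggest but the tables do not state.

```python
import math

# Negative log-likelihoods as reported in the tables above.
negloglik_crd2 = 942.074  # Table E.6, model CRD2
negloglik_rd3 = 939.578   # Table E.5, model RD3

# Likelihood-ratio statistic: twice the drop in negative log-likelihood.
# Valid as an asymptotic chi-square(1) test only if CRD2 is nested in RD3
# with one fewer parameter (an assumption here, not stated in the tables).
lr_stat = 2 * (negloglik_crd2 - negloglik_rd3)

# Survival function of a chi-square(1) variate: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR statistic = {lr_stat:.3f}, p = {p_value:.3f}")
```

On these numbers the statistic is about 4.99 on one degree of freedom, suggesting the extra drop-out parameter η1 improves the fit at conventional significance levels.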