HIERARCHICAL MODELLING OF MULTIVARIATE SURVIVAL DATA By Paul Antony Gustafson B. Sc. (Mathematics) University of British Columbia A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF STATISTICS We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA 1991 © Paul Antony Gustafson, 1991 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature Department of ^Statistics The University of British Columbia Vancouver, Canada Date^Dec. DE-6 (2/88) 9, 1991. Abstract Hierarchical models based on conditional independence are investigated as a means of modelling multivariate survival times. The model structure follows Clayton (1978), Hougaard (1986b), and Oakes (1986,1989). Both approximate Bayesian and maximum likelihood estimation in these models is investigated via simulation. Predicting a component of a response vector on the basis of other components of the response and other vectors is also studied, using Bayes and empirical Bayes methods. Application to real data is detailed, using the techniques of estimation and prediction discussed. ii Table of Contents ii Abstract^ List of Tables^ vi List of Figures^ vii Acknowledgement^ viii 1 Introduction 1 2 Models and Techniques 3 3 ^ 2.1 Conditionally Independent Hierarchical Models 2.2 Hazard Rate and Frailty ^ 4 2.3 Multivariate Frailty Model as a CIHM ^ 6 2.4 The Weibull Model ^ 9 2.5 Properties as a One-Stage Model ^ 10 2.6 Bayesian Inference ^ 11 2.7 Using Laplace's Method to Approximate Posterior Quantities ^ 12 Estimation and Simulation 3 14 3.1 Estimating the Common Parameter ^ 14 3.2 Prior Information ^ 15 3.3 Reparameterizat ions ^ 15 3.4 Weibull/Gamma Simulation Study ^ 17 iii 3.5 4 5 6 Weibull/Positive Stable Simulation Study ^ Prediction 20 24 4.1 Parametric Empirical Bayes Approach to Prediction ^ 25 4.2 Full Bayesian Approach to Prediction ^ 26 4.3 The PEB Approach in the Weibull Model ^ 27 4.4 The Full Bayesian Approach in the Weibull Model ^ 28 4.5 Full versus PEB Prediction ^ 29 4.5.1^Exponential Prediction Example ^ 30 4.6 Prediction in the Weibull/Gamma Model ^ 32 4.7 Prediction in the Weibull/Positive Stable Model ^ 33 4.8 Effect of Frailty Distribution on Prediction ^ 36 Application 39 5.1 Description of Rat-Litter Data ^ 39 5.2 Single Risk versus Competing Risks ^ 39 5.3 Model Adequacy Checking ^ 40 5.4 Single Risk : Tumour Development ^ 42 5.5 Single Risk : Death due to Other Causes ^ 44 5.6 Competing Risks : Single Frailty ^ 44 5.7 Competing Risks : Two Frailties ^ 45 5.8 Including Covariates ^ 47 5.9 Assessing Model Choice ^ 48 Discussion 53 6.1 Approaches to Frailty Models ^ 53 6.2 Specification of Distributions ^ 54 iv 6.3 Techniques ^ 56 6.4 Assessing Models ^ 57 Appendices^ 60 A Quadrant Dependence of Frailty Models B Female Rat Litter Data C Male Rat Litter Data ^ ^ 61 ^ 62 D Performance of Weibull/Gamma Model on Diagnostic Plot E Derivatives of Survivor Function in Two-Frailty Model Bibliography 60 ^ ^ ^ 63 64 66 v List of Tables 3.1 Comparison of estimators for 99 simulated samples of 50 observations in the Weibull/gamma model ^ 18 3.2 Comparison of estimators for 100 simulated samples of 200 observations in the Weibull/gamma model ^ 19 3.3 Comparison of estimators for 100 simulated samples of 50 observations in the Weibull/positive stable model ^ 21 3.4 Comparison of estimators for 100 simulated samples of 200 observations in the Weibull/positive stable model ^22 4.5 Probabilities of intervals for log L in exponential example ^32 5.6 Estimation with rat—litter data : single risk (tumour) ^42 5.7 Estimation with rat—litter data : single risk (other causes) ^44 5.8 Estimation with rat-litter data : competing risks, single frailty ^46 5.9 Estimation with rat-litter data : competing risks, two frailties ^48 5.10 Estimation with rat-litter data : single risk (other cause), proportional hazards model ^ 5.11 Predictive scores 49 ^51 vi List of Figures 4.1 Predictive density ratio as a function of Y (exponential example) ^. . . . 31 4.2 Approximate full and empirical predictive densities (gamma frailty) ^. . . 34 4.3 Histograms of observed predictive density ratios: approximate full versus empirical Hayes ^ 35 4.4 Approximate full and empirical predictive densities (stable frailty) . . . 37 5.5 Diagnostic plots based on Kaplan-Meier estimates of the marginals^.^. 43 5.6 Histograms of predictive density ratios :^positive stable frailty versus gamma frailty ^ vii 50 Acknowledgement I would like to thank Mohan Delampady for his guidance and supervision throughout the development of this thesis. Additionally, I am grateful to Harry Joe and James Zidek for their careful reading of the manuscript, and their subsequent suggestions. Finally, I would like to thank my wife, Reka Vasarhelyi, for her continued patience and support. viii Chapter 1 Introduction Often in experimental sciences and engineering, the time taken for objects or entities to fail or die is studied. The statistical methods and models used to interpret the resultant data are commonly referred to under the broad heading of survival analysis. Kalbfleish & Prentice (1980), Lawless (1982), and Cox Si Oakes (1984) cover the main ideas of survival analysis and act as guides to the large body of relevant literature. A small fraction of this literature deals with survival analysis within the Bayesian paradigm. Some such works are Cornfield & Detre (1977), Martz & Waller (1982), Kalbfleish (1985), and Gammerman & West (1987). Another small fraction is concerned with multivariate survival analysis. In this context, by multivariate we mean situations where several survival times somehow belong to a single cohesive unit or family. Holt & Prentice (1974) and Wild (1983) are two works in this category. Also related are works on the reliability of systems of several components, such as Lindley & Singpurwalla (1986). The literature is much sparser in the area of intersection of these two categories (which we can call Bayesian multivariate survival analysis). In fact, there is apparently no literature dealing with multivariate survival from an explicitly Bayesian perspective. However, the type of multivariate survival models developed and used by Clayton (1978), Hougaard (1986b), and Oakes(1986,1989), though developed from a frequentist perspective, have an implicit hierarchical Bayes structure. We aim to bring this Bayesianity to the fore in our investigation of these models. For completeness, it should be noted that there are hierarchical structures for multivariate survival other than the one we will investigate. 1 Chapter 1. Introduction^ 2 For an example, see Lindley & Singpurwalla (1986). To meet our aim, in Chapter 2 we introduce the model from the point of view of a hierarchical model as well as from a frailty model perspective. We also introduce relevant aspects of Bayesian inference. Chapter 3 is concerned with inference, using simulation to examine techniques. Then, Chapter 4 focusses on the problem of predicting a survival time given the survival times of other components of the same unit. The ideas of Chapters 3 and 4 are put to use in the analysis of a real data set, contained in Chapter 5. The data is lifetimes of rats—for each of a number of litters, a multivariate response (lifetimes of litter members) is observed. The hierarchical frailty model is employed as an attempt to account for factors common to a litter. The situation is complicated by three possible mechanisms for terminating individual observations (tumour appearance, sacrifice, natural death), as well as a treatment suspected to promote tumour appearance. Finally, Chapter 6 attempts to address some of the issues raised in the first five chapters. Chapter 2 Models and Techniques 2.1 Conditionally Independent Hierarchical Models A Bayesian model is hierarchical if there are several stages of information. A two-stage model would involve a response Y coming from a density f (y 10). In turn 0 would come from a density g(0 I A). In general, 0 would be referred to as the parameter, while A is the hyperparameter. The distribution g on 0 could represent prior information or structural information. Additional prior information could be specified in the form of a completely specified distribution on A. Alternately, A could be estimated from the data. Usually, the investigator would wish to make inferences about the parameter 0, although there is nothing to prevent making inferences about the hyperparameter A. Kass & Steffey (1989) formalize the class of conditionally independent hierarchical models (CIHM's) as follows. Say we have observation vectors Y of length n i , i =1,... , k, parameterized by 0, E 0, i = 1, . . . ,k and by A E A, where 0 and A have dimension q and r respectively. Then we have a CIHM if: Stage 1. conditionally on (0 1 ,^, 0 k ) and A, the Y are independent with densities f (y i 0 i , A) (i = 1,... , k) belonging to a family { f(y 10, A) : 0 E 0, A E A}; Stage 2. conditionally on A, 0 1 ,^, 01, are independent and identically distributed with density from {g(0 A) : A E A}. We can think of observing k units or families, with the i th unit consisting of n, components or individuals. Then the 0,'s are unit-specific parameters, while A is a parameter 3 Chapter 2. Models and Techniques ^ 4 common to all units. A fully Bayesian approach to this model would incorporate a prior distribution on A. Then given data, one could in principle determine the joint posterior distribution of any subset of (0, A) = (0 1 , , Ok, A 1 , , A r )• Of course such a model can be collapsed to a one-stage model parameterized by A. Then Y =^, Yk ) has density given by: k P(Y A)^E f (y 2 10i, A)^ t=i (2.1) where the expectation is with respect to the distribution on O i , and thus depends on A. If we were only interested in the 0 2 's, we could take a parametric empirical Bayes (PEB) approach. This would involve finding a maximum likelihood estimate ,\ by maximizing (2.1), and then proceeding as if A were the true value of A. This approach is discussed in Kass & Steffey (1989). Of course if the n i 's are small, we cannot hope to get useful information about individual O i 's. In this case we can only hope for information about A (providing k is suitably large). So in some cases we can only expect to get information about the distribution of the unit-specific parameters, rather than information about the parameters themselves. Before we employ CIHM's, we wish to introduce the notion of frailty. Hopefully, this will lend a physical interpretation to the model structure. 2.2 Hazard Rate and Frailty Often when working with distributions on the positive reals, and especially when studying survival times, distributions are characterized by their hazard rate or hazard function. If a continuous non-negative random variable has distribution function F, then its hazard rate h is given by d h(t) = -- log[l — F(t)1 dt Chapter 2. Models and Techniques^ 5 A common method of including covariates in a hazard rate model was introduced by Cox (1972). If an individual has observed covariate vector z then his hazard rate is assumed to be h(t I z) = g(z t ,3)h o (t) where 0 is an unknown parameter vector, g is a fixed function (usually the exponential function), and h 0 is an unknown function, the baseline hazard. Modelling concomitant variables, as above, is a way to account for heterogeneity in a population. However, in many situations it is reasonable to expect that a population has sources of heterogeneity not accounted for by covariates. For this reason, Vaupel, Manton & Stallard (1979) introduced an unobservable positive scalar quantity, frailty, in a hazard rate model. If an individual has frailty 0, then his hazard is modelled as h(t 10) = Oh o (t) where again h o is a baseline hazard common to a population. As in the Cox model, the effect on the hazard is multiplicative. So one can think of frailty as an unobserved covariate. (To be consistent with the Cox model, frailty would be a function of an unobserved covariate vector.) A model with covariates and frailty is also reasonable, both would have a multiplicative effect on the baseline hazard. If one makes no assumptions about the frailty of individuals in a population, it is unreasonable to expect to make inferences about the population. Hougaard (1984, 1986a) suggests assuming that within a population, frailty is distributed as some parametric family. A gamma distribution is mathematically convenient, and has been used in several studies (e.g. Clayton, 1978). Hougaard presents arguments in favour of a stable distribution of frailties. Other distributions could be used; however, as we shall see, only distributions with closed form Laplace and inverse Laplace transforms (such as the gamma and positive stable distributions) will be computationally feasible. 6 Chapter 2. Models and Techniques^ It is also clear that if the frailties in a population are all independent realizations from a distribution then unless very strong assumptions are made, it will be difficult to identify the frailty distribution in the presence of the baseline hazard, and vice-versa. There are, however, situations where one expects that certain individuals will share a common frailty. For instance it would be reasonable to assume that members of a family have the same frailty, and that conditionally on this frailty the family member lifetimes are independent. In such a context the frailty might be a function of genetic and environmental factors (Hougaard, 1986a). With two or more individuals per frailty value, it becomes reasonable to estimate both the frailty distribution, and the (parameterized) baseline hazard. 2.3 Multivariate Frailty Model as a CIHM. A model with observations on families, as mentioned above, fits into the framework of a CIHM. If Y (Yii , , Yin .) is the vector of lifetimes from the i t h family, then we take O to be the family frailty. We partition A = (A(a), AM) so that A(a) parameterizes the frailty distribution, while A( b ) parameterizes the baseline hazard. In general we might expect to have several baseline hazards; that is, we might expect our observations to be classified as coming from various groups. For instance in a clinical trial we might have a treatment group and a control group, or with data on fathers and sons we would treat fathers as one group and sons as another group. If we have p groups, then we can specify the first stage of the model by specifying functions h 1 , , h p such that Yi3 I O has hazard Oihk(yij I A (1)) )^ (2.2) when^belongs to the k th group. Note that since 0, is positive, (2.2) will be a valid hazard function precisely when h k is a valid hazard function. Chapter 2. Models and Techniques^ 7 Because we will be dealing with survival times and failure times, we wish to allow for right-censored observations. In the usual way, we introduce an indicator d ij taking value 0 if Yij is censored at y ij , and 1 if Yij = yij is observed. We assume that given a family's frailty, censoring arises in a way that makes the likelihood of independent observations a product of densities and survivor functions. This is true under fairly weak conditions— for a discussion, see Lawless (1982, pp. 31-44). Then, suppressing the dependence of the h k 's on Alb), the first-stage likelihood is k L(0, A I y,^=^L(0 i ,^y i , di )^ i=i (2.3) where L(0 i , A I yi, di ) = [eihk,,(yij)exp(-0iHkti (yij))] d. [exp(-0 i Hk ,, (NM] }^(2.4) in which kij denotes the group in which Yij belongs, while Hk (y) = foY h k (t)dt. Simplifying and integrating with respect to the O i 's gives the unconditional (collapsed) likelihood H L(A I y, d) =^ h (y id E),9 1(a) [O i D s exp(—O i lli(yi))]^(2.5) i=1^{i:d,,=1} [ where Di = E; t i dij , and Hi(yi) = E .7 1` 14,(Yij)• One possibility for the second-stage is that the O i 's come from a gamma distribution. Note that in this case, (2.5) is relatively easy to evaluate. However, if the gamma distribution has unconstrained shape and scale parameters, one might expect an identifiability problem with the likelihood (2.5) (this is certainly the case with the Weibull model of the next section). A natural restriction is to fix the mean frailty at 1. This is the approach taken by Clayton (1978). Hence an individual with average frailty has baseline hazard, and scale is accounted for in the first stage of the model. We choose to parameterize the gamma frailty distribution so that A(a) = A o , and the frailty density is " Tr g(0) =^ 0"' exp(—a0)^ •^ (2.6) Chapter 2. Models and Techniques^ 8 where a = —(log A 0 ) -1 , and A o E (0,1]. This parameterization has the advantage that , Xo lies in a finite interval, with the important special case of unconditional independence within families occuring at an endpoint (A 0 = 1). It can also be shown that the limiting case as A o decreases to 0 is the Frechet upper bound (perfect positive dependence) (Oakes, 1989). Assuming (2.6), (2.5) can be evaluated using [0 D exp(—H0)] — F(D + a) ^ as F(a) (H cr)(D+a) (2.7) where again a = —(log Hougaard's suggestion is a positive stable distribution of frailties. These distributions are characterized by their Laplace transforms (moment generating functions). Specifically, E°0 [exp(—sO)] = exp(-3 )0 ), A o E (0, 1]^(2.8) The stable distributions have long right-tails. In fact, when A o < 1, the k th moment of the variate defined in (2.8) exists if and only if k < A o , and so in particular, the mean is infinite. The density is complicated, given by 1 ^ruc k, + 1) g(0 A o ) = r k=1^k! ( 0 -A0 ) k sin(A 0 kr) A good discussion of stable variates is found in Feller (1971, pp. 169-76). As in the gamma case, A o = 1 means the distribution is degenerate at 0 = 1, and we have independence within families, while as A o decreases to 0, we obtain the Frechet upper bound (Oakes, 1989). We can evaluate (2.5) by differentiating (2.8) D times with respect to s. The differentiation gives a recursive form for (2.5) (Hougaard, 1986b). In particular, Et [O D exp( - 1-16)1 = (AoH) D exp(—H A0 )R(A 0 , D, H)^(2.9) where R= { 1^if D = 0 ED - 01 cD7m H-mAo m= otherwise 9 Chapter 2. Models and Techniques^ and rn = C k,m = k -1,m + C k-lon-10 — 1A0 1 — 0; (k — m)] m =1,...,k — 2 ; x lr(k-Ao) -k ^ c)^r(1-A0) ,, m = k —1 . A potentially important difference between gamma frailty and positive stable frailty has been noted by Hougaard (1984,1986a). If a population has frailty distribution given by density g(0), then one can consider the distribution of frailties amongst those individuals living at time t > 0. We denote this distribution g t (0). For any second-stage distribution, with gamma frailty the coefficient of variation of gt is constant in t, while with positive stable frailty, the coefficient of variation is decreasing in t. (Hougaard (1984) suggests that a normalized measure of variability, like the coefficient of variation of the frailty distribution, is an appropriate measure of heterogeneity.) So, in this sense, the population becomes more homogeneous with time under stable frailty, but not under gamma frailty. Since it seems sensible that the sub-population of long-lived individuals be more homogeneous than the whole population, this supports the use of stable frailty rather than gamma frailty. So, gamma and stable distributions are candidates for the second stage of the model. A possible first stage is suggested in the next section. 2.4 The Weibull Model A useful parametric family for survival times is the Weibull distributions. Following Hougaard (1986b), we introduce a two-group model, with groups 1 and 2 being treatment and control groups respectively. We take A (b) = (A i , A2, A 3 ), and specify the conditional hazards by hl (y I Al, A3) = A1(A3YA3-1) Chapter 2. Models and Techniques^ h2(y I A2) A3) = A2(A3Y A3-1 ) 10 (2.10) So the groups have Weibull conditional hazards, with common shape parameter, but different scale parameters. Under both gamma and stable frailty, this leads to an identifiable unconditional likelihood. 2.5 Properties as a One-Stage Model The properties of a frailty model treated as a multivariate distribution have been studied by several authors. Often they arise in the context of bivariate distributions with given marginals (these can be classified by bivariate distributions with uniform marginals, often referred to as copulas—see Genest and MacKay, 1986a,b). If the marginal survivor functions of T1 and T2 are specified as S1 and S2 respectively, then, as noted in Oakes (1989), the joint survivor function S(t) = P(T1 > t i , T2 > t 2 ) is given by S(t) = p[q(S i (i i )) + q(S2(t2))] (2.11) where p is the Laplace transform of the frailty density, and q is the inverse function of p. Oakes (1989) points out that this is a subclass of the Archimedian bivariate distributions considered by Genest & MacKay (1986a,b). For another generalization, see Marshall & Olkin (1988). There are of course a number of possible measures of the association between T 1 and T2. These measures tend to classify the association as positive, in the sense that T 1 and T2 have a tendency to be simultaneously large or simultaneously small. A useful characterization is the notion of quadrant dependence given in Lehmann (1966). (T 1 , T2 ) are positively quadrant dependent if S(t) ? S1(t1)S2(t2) Vtl) t2 ^ (2.12) Chapter 2. Models and Techniques^ 11 It is possible to show that if the log of the Laplace transform of the frailty density is convex then the bivariate frailty model is positively quadrant dependent for any choice of second-stage distributions (Appendix A). From (2.7) and (2.8) we see this convexity condition holds for both gamma and positive stable frailty. By results in Lehmann (1966), this implies that the following measures of dependence/association are non-negative 1 : Cov(Ti , T2 ) (provided it exists), Kendall's T (Kendall, 1938), Spearman's p s (Spearman, 1904), and Blomqvist's quadrant measure (Blomqvist, 1950). Oakes (1989) suggests using Kendall's 7 in frailty models as it depends only on the second-stage (frailty) distribution. The definition is T= E {sign [ e) - T(2))(e) - Ta2))11 ( where T( i )(i = 1, 2) are independent copies of T. Genest & MacKay (1986b) show that a model of the form (2.11) satisfies oo 7 = 4 f sp(s)p"(s)ds —1 o (2.13) With our parameterizations, under gamma frailty T = (2a + 1) -1 , where a = —(log A 0 ) -1 , while under positive stable frailty, T = 1 — A o (Oakes, 1989). Estimating T from data and subsequently estimating the frailty distribution parameter is discussed in Oakes (1986). The same author suggests a method of assessing the frailty distribution choice without specifying conditional hazards (Oakes, 1989). 2.6 Bayesian Inference Of course from a one-stage model perspective, we can only make inferences about A. Any Bayesian inference about A will involve finding the posterior distribution of A. If the prior 1 This is actually a special case of a result in Genest & MacKay (1986a) concerning a stochastic ordering (concordance ordering) on copulas. Our case is obtained by comparing our copula to the independence copula. The condition on the log of the Laplace transform can be weakened to superadditivity. Chapter 2. Models and Techniques^ 12 information about A is given by distribution 11(A), then the posterior is given by [11 ;/ i L(A 11(A) II(A I y) = ^ I yl)] f [Tric=i-L(A I yi)] 11(A)dA — The mean and variance of a component Ai with respect to the posterior are known as the posterior mean and posterior variance of Ai, and are good summaries of the investigator's post-experimental knowledge about A i . Alternatively, the posterior distribution of A i can be computed according to li(ai I Y) =^/11(a I y)dAi This distribution can be plotted for a visual display of the knowledge of A i . Of course if A were 1 or 2-dimensional, then the complete posterior could be plotted. Unfortunately, this is seldom the case. Bayesian analysis commonly requires the use of approximation techniques. 2.7 Using Laplace's Method to Approximate Posterior Quantities Tierney & Kadane (1986) discuss the use of Laplace's method to approximate posterior quantities. For example if a model with k observations is parameterized by vector A, with log-likelihood £(X) and prior r(\), then the posterior mean of a function g(\) can be written E(g(A) I data) = f exp(kL*(A))dA f exp(kL(A))dA where L(A) = k -1 (log7r(A) L(A)) and L*()) = k -1 (logg()) + log r(A) £(A)). The Laplace approximation amounts to replacing both numerator and denominator integrands with second-order Taylor approximations about their respective global maximas. This Chapter 2. Models and Techniques^ 13 results in integrals of unnormalized multivariate normal density functions. The approximation becomes E(g(\) I data) = where ( det E * ) 1/2 exp{k(L*(A*) — L(A))1 det E A and A. maximize L and L* respectively, and E and E* are the respective negative inverse Hessians evaluated at these maxima. The thrust of Tierney and Kadane's argument is that when g is a positive function, the numerator and denominator integrands are of similar shape. Thus the two Laplace approximations make similar errors, and so in taking the ratio the error is reduced. In fact while a single integral approximation has 0(k -1 ) error, the ratio approximation has 0(k') error. This method could be used to estimate the posterior means and variances of components of ) in our model, using the likelihood in (2.5). Note that this method requires only maximization of slightly modified likelihood functions, and so should be feasible whenever maximum likelihood estimation is feasible. In general, the method can be adapted to estimate marginal posterior densities. Laplace's method will be used to approximate posterior means in the simulated data sets of the next chapter. Chapter 3 Estimation and Simulation 3.1 Estimating the Common Parameter As mentioned previously, when family size is small, as in a bivariate model, we seek information about the common parameter A. Using the unconditional likelihood (2.5), there are several possible methods of point estimation of A. The most common would be maximum likelihood estimation; however, with a prior distribution on A, Bayesian estimation is more general and appropriate. Neither method will produce closed-form estimators, given the form of the likelihood (the univariate Weibull model with censoring has no closed-form MLE). However, using an iterative procedure, the MLE should not be hard to obtain. Possibly, computation of posterior means would be more difficult, but approximating posterior means using Laplace's method should not be much harder than finding the MLE, given the similarity in method. We propose to simulate data from the Weibull model of section 2.4, with one member of each group per family. Hougaard (1986b) found maximum likelihood estimates in the positive stable frailty case, using a heavily censored trivariate sample with 50 observations (the sample is lifetimes of female rat litters, with one treated sibling per litter, found in Mantel, Bohidar & Ciminera, 1977). Using a quasi-Newton minimization routine, we were able to reproduce Hougaard's results. The routine was also sucessfully applied to the same data through the gamma frailty model. These estimates were used to choose hopefully realistic parameter values for simulation. Analytic derivatives of the likelihood 14 Chapter 3. Estimation and Simulation ^ 15 are complicated in both models; for simplicity, we use the derivative-free routine in our simulations. We found in general that we could maximize the likelihood (and the slightly modified likelihoods needed for Laplace's method) fairly routinely. Of course, to do Bayesian estimation, we must also specify prior distributions. 3.2 Prior Information Recognizing our relative paucity of prior information, we use vague priors on A. One such prior is 111(A) = ^1 Ao E (0,1], Aj > 0 (j = 1,2,3) 0 otherwise We note from (2.10) that A i-1 / A3 is a scale parameter for Y 1 and similarly A2 11A3 is a scale parameter for Y2 (a conditional scale factor is also an unconditional scale factor). Computing the necessary Jacobian, we see that the scale invariant prior will be 11 2 (A) = 1 A 1: 1 A 2 1 .16 2 Ao E (0,1], Ai > 0 (j = 1,2,3) otherwise We try both priors in the simulation study. 3.3 Reparameterizations There are of course an infinite number of possible reparameterizations of our model. The point maximizing the likelihood is invariant under transformation of the parameter space. The posterior mean of a parameter in a given parameterization can be computed using an arbitrary parameterization for computation. For instance suppose we are interested in the posterior mean of A 0 , but we parameterize using i = g(A). The likelihood becomes L'(K) = L(g -1 (!c)). Our prior becomes 11'(K) = 11(g -1 (K)) 1'i/7,g -1 001 with respect to which we want to find the posterior mean of (g -1 (K)) 0 . So reparameterization just induces Chapter 3. Estimation and Simulation^ 16 a change of variables in integration; the true posterior mean is unaffected. However, when using Laplace's method, the Taylor approximations to the integrands could be better with one parameterization rather than another. So, a prudent choice of transformation g could potentially improve the estimation, when using approximate posterior means. In the Weibull model, the parameter space of the parameterization of interest, is truncated. (A 0 E (0, 1], A > 0, j = 1, 2, 3). However, the approximating integrals in Laplace's method are over all of 7?.". Perhaps choosing a parameterization that similarly has support on 7?" might improve the approximation. To this end we introduce reparameterization 1: /c o = — log(— log AO IC1 = log A i (3.14) tz2 = log A2 K3 = log A3 which has Jacobian cj g -1 (10 = exp ('Cl + K2 + K3 — 'CO — C") A second reparameterization was used by Hougaard (1986b) in performing maximum likelihood estimation in the positive stable model: 'CO = A0 AI X° (3.15) K2 = A2 A° K3 AO A3 for which the Jacobian is 3 — ) (K1K2) 11"-1 ( KO We try using the original parameterization and the two reparameterizations for computation in the simulation study. Chapter 3. Estimation and Simulation ^ 17 3.4 Weibull/Gamma Simulation Study The obvious way to simulate data from the Weibull Model is to generate each family's frailty independently from the frailty distribution, and then generate the lifetimes within a family as independent Weibull variates with parameters depending on the frailty. So in the gamma frailty case, it is a straightforward exercise in simulating gamma and Weibull variates. We fix the parameter value at A = (.6, 1.0, 1.0, 3.0), and generate 100 samples of size 50 each 1 and 100 samples of size 200 each (the samples of size 200 are extensions of the samples of size 50). The parameter value was chosen on the basis of being close to preliminary parameter estimates for the data analyzed in Chapter 5. We treat the original parameterization as the parameterization of interest. Using each parameterization, we compute the posterior mode of A, and the approximate posterior means of the components of A. Note that the MLE will be posterior mode of A using the original parameterization and prior 11 1 . For each of the six estimators (and four components of A), we report the mean of the 100 estimates. Also, as a realized loss in estimation, we report the sums of squares of the 100 deviations between parameter and estimate. The results are displayed in Tables 3.1 and 3.2. The simulations with sample size 50 indicate that the approximate posterior means with respect to 11 2 outperform those computed with respect to 11 1 , in terms of realized loss. The only exceptional cases occur with component A 0 , using the original parameterization and reparameterization II. The difference in loss is most noticeable in components A l and A2 using the original parameterization and reparameterization I. Using reparameterization II seems to slighty improve the performance of the approximate posterior means; however, the estimates do not seem to be particularly sensitive to the selection between the original parameterization and reparameterization I. The performance of the lOne sample of size 50 produced a likelihood which our routine could not maximize; therefore, results are provided based on the other 99 samples. Chapter 3. Estimation and Simulation ^ 18 Table 3.1: Comparison of estimators for 99 simulated samples of 50 observations in the Weibull/gamma bivariate model with parameter vector A = (0.6,1.0,1.0,3.0) Estimator ORIG. PARAM. Prior I Approx. Post. Mean Posterior Mode (MLE) Prior II Approx. Post. Mean Posterior Mode REPARAM. I Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode REPARAM. II Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode A o (0.6) Mean Loss Al (1.0) Mean Loss A2 (1.0) A3 (3.0) Mean Loss Mean Loss .564 .623 1.56 1.74 1.14 1.01 10.84 5.67 1.14 1.01 7.67 3.60 3.16 3.05 12.78 9.65 .608 .663 1.61 2.08 1.04 .932 6.10 4.65 1.04 .934 3.97 3.10 3.05 2.95 9.30 8.72 .561 .560 1.49 1.42 1.16 1.12 11.55 9.28 1.16 1.12 8.28 6.37 3.17 3.16 12.92 12.31 .599 .599 1.26 1.20 1.05 1.02 6.32 5.58 1.05 1.03 4.12 3.54 3.06 3.06 9.25 8.92 .592 .495 .809 3.74 1.09 1.18 6.92 17.84 1.09 1.18 4.65 12.71 3.11 3.26 9.41 22.07 .617 .556 1.02 2.65 1.02 1.04 5.04 7.85 1.02 1.04 3.26 5.03 3.02 3.12 7.76 13.57 Chapter 3. Estimation and Simulation ^ 19 Table 3.2: Comparison of estimators for 100 simulated samples of 200 observations in the Weibull/gamma bivariate model with parameter vector A = (0.6,1.0,1.0, 3.0) Estimator ORIG. PARAM. Prior I Approx. Post. Mean Posterior Mode (MLE) Prior II Approx. Post. Mean Posterior Mode REPARAM. I Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode REPARAM. II Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode Ao (0.6) Mean Loss Al (1.0) Mean Loss A2 (1.0) Mean Loss A3 (3.0) Mean Loss .590 .606 .318 .328 1.04 1.01 1.56 1.29 1.04 1.01 1.39 1.14 3.05 3.02 2.37 2.16 .600 .616 .308 .350 1.02 .990 1.34 1.20 1.02 .989 1.18 1.06 3.02 2.99 2.15 2.09 .590 .591 .317 .309 1.04 1.03 1.57 1.48 1.04 1.03 1.40 1.32 3.05 3.04 2.37 2.34 .600 .601 .307 .300 1.02 1.01 1.34 1.29 1.02 1.01 1.18 1.14 3.02 3.02 2.15 2.13 .591 .581 .314 .387 1.04 1.03 1.56 1.55 1.04 1.03 1.39 1.38 3.04 3.06 2.35 2.57 .601 .592 .305 .357 1.02 1.01 1.34 1.34 1.02 1.01 1.18 1.18 3.02 3.03 2.14 2.29 Chapter 3. Estimation and Simulation ^ 20 MLE is relatively good, except in component A o . Empirically then, one might prefer to use approximate posterior means when the frailty distribution parameter is of primary interest. Not surprisingly, with sample size 200, there is little difference in the estimators. 3.5 Weibull/Positive Stable Simulation Study The positive stable frailty simulations were carried out in exactly the same fashion as in the gamma case. The true parameter value was fixed at A = (.9, 1.0, 1.0, 3.0) (again, this is based on estimated values from the data in Chapter 5). To simulate positive stable random variates, a formula due to Kanter (1975) was used. The stable variate 0 is realized according to 0 where X 1 and X2 = [a(X01 A-1-1 ° X2 are independent; X 1 is uniform on [0,11], while X2 has a standard exponential distribution. The function a is given by A0 sin[(1 — ) 0 )x] {sin(A0x)} 1-A0 {sin x}i A0 - This is a special case of the method for general stable variates presented in Chambers, Mallows & Stuck (1976). The simulation results are presented in Tables 3.3 and 3.4. As previously, the simulations with sample size 50 indicate that using 11 2 rather than H i improves the performance of the approximate posterior means, in terms of realized loss. However, it seems there is less sensitivity to reparameterization than in the gamma frailty case. The MLE again performs well in components A i , A2, A3, but has higher realized loss than approximate posterior means in component A o . The results with sample size 200 indicate very little difference between the various estimators. The simulated samples and the resultant estimates are put to use in the study of a prediction problem in the next chapter. Also, both maximum likelihood and approximate Chapter 3. Estimation and Simulation ^ 21 Table 3.3: Comparison of estimators for 100 simulated samples of 50 observations in the Weibull/positive stable bivariate model with parameter vector A .-,- (0.9,1.0, 1.0, 3.0) Estimator ORIG. PARAM. Prior I Approx. Post. Mean Posterior Mode (MLE) Prior II Approx. Post. Mean Posterior Mode REPARAM. I Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode REPARAM. II Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode A o (0.9) Mean Loss Al (1.0) Mean Loss A2 (1.0) A3 (3.0) Mean Loss Mean Loss .865 .896 .617 .561 1.04 1.00 3.32 2.70 1.06 1.02 4.24 3.52 3.17 3.10 16.32 13.92 .879 .909 .500 .519 1.01 .973 2.76 2.48 1.01 .991 3.60 3.27 3.11 3.05 14.06 12.47 .850 .839 .621 .737 1.05 1.05 3.50 3.51 1.07 1.07 4.55 4.60 3.22 3.25 17.16 18.47 .862 .851 .469 .562 1.01 1.01 2.88 2.93 1.03 1.03 3.81 3.88 3.16 3.19 14.54 15.61 .864 .871 .755 .735 1.04 1.01 3.39 3.00 1.06 1.03 4.36 3.84 3.17 3.16 16.44 17.34 .881 .887 .516 .618 1.01 .979 2.77 2.67 1.03 .997 3.70 4.34 3.13 3.10 17.44 15.14 Chapter 3. Estimation and Simulation ^ 22 Table 3.4: Comparison of estimators for 100 simulated samples of 200 observations in the Weibull/positive stable bivariate model with parameter vector A = (0.9,1.0,1.0,3.0) Estimator ORIG. PARAM. Prior I Approx. Post. Mean Posterior Mode (MLE) Prior II Approx. Post. Mean Posterior Mode REPARAM. I Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode REPARAM. II Prior I Approx. Post. Mean Posterior Mode Prior II Approx. Post. Mean Posterior Mode A o (0.9) Mean Loss Al (1.0) Mean Loss A2 (1.0) A3 (3.0) Mean Loss Mean Loss .892 .900 .179 .186 1.02 1.01 .606 .568 1.01 1.00 .523 .489 3.05 3.03 3.70 3.57 .894 .903 .167 .178 1.01 .997 .599 .560 1.00 .995 .516 .498 3.04 3.02 3.55 3.47 .887 .880 .165 .177 1.01 1.01 .610 .618 1.01 1.01 .544 .551 3.07 3.09 3.69 3.911 .891 .884 .153 .161 1.01 1.01 .582 .590 1.00 1.00 .519 .526 3.05 3.07 3.46 3.64 .890 .892 .176 .191 1.01 1.01 .607 .583 1.01 1.00 .540 .517 3.06 3.05 3.75 3.85 .894 .896 .168 .184 1.01 .999 .579 .567 1.00 .977 .515 .504 3.04 3.04 3.57 3.68 Chapter 3. Estimation and Simulation ^ Bayesian estimation are utilized in the data analysis of Chapter 5. 23 Chapter 4 Prediction In many instances it may be desirable to use information about a family's frailty in order to predict individual lifetimes. Of course, to do so would also require information about the individual's baseline hazard. For simplicity, we will assume that if we wish to predict potentially observable survival time Yi then we have observed all the other components of the ith family (unobserved family members need not be included in the model). We denote these components by Yi ,( 3 ) = (Yo , , Yij-11 Yi,j+1) • • • Yi,ni )• So, if we have observed m families, and the individual of interest belongs to the ( in + l)st family, then we wish to study Y„,,+1,i I Yi , • • • , Ym, Ym + 1,(i). In such a situation, observing , Y,n Yni+1 ,( i) gives information about the baseline effects and the frailty distribu, tion; however, only n i, +1 ,( i) can give direct information about ° m+1 , the frailty of Ym+1, 3 • The question of how much improvement in our knowledge about 0 m+1 is realized by observing 17,n+1 ,0 ) is crucial to the prediction. Before delving into detail, we look at why this scenario might arise naturally. Often we might have a situation where within a family, survival times of one group are generally measured before those of another group. In such a setting it may be of interest to predict the second group times given the first group times. For example, having observed the lifetimes of a number of father-son pairs, one might want to predict a son's lifetime given his father's lifetime. Similarly, if an individual has two responses, with one often larger than the other, such prediction might be relevant. For instance, Holt & Prentice (1974) report survival times of several skin-grafts per patient, with a mixture 24 Chapter 4. Prediction^ 25 of poor-match and good-match grafts. In a clinical situation, if a patient has grafts of both types, it might be desirable to predict the failure times of the good grafts, having observed the poor-match failure times. Additionally, in a clinical counselling situation, the onset times of a disease in family members could be used to predict the onset time in an unaffected individual. Given a data set, one approach to the prediction problem is to assume the value of the underlying parameter A is known and given by an estimator. This is known as an empirical Bayes procedure. Alternatively, one can incorporate the uncertainty about the value of A by considering the posterior distribution of A. Here, we consider both approaches. 4.1 Parametric Empirical Bayes Approach to Prediction To take a Parametric Empirical Bayes (PEB) approach to prediction, we assume that based on the first in families and the partial observation of the (m 1)' family, we have an estimator A of A. This could be the maximum likelihood estimator, a vector of approximate posterior means, or another estimator. The PEB approximation amounts to assuming A = A. So given , Ym , Ym+1 ,( 3) , we assume 0, +1 is distributed with density g(O m+i I ;I■ ). If we treat this as a prior distribution for O m+i , and approximate the density of Ym+i ,(,) by f (y m+1 ,( i )161 m+i , A), an application of Bayes theorem gives the density of Om+1 I Y1, • • • , Ym, ICH-1,(i) as proportional to f (Ym4-1,(j) I Om+1, 5)9(0,.+1 I A). It follows that the predictive density of Ym +i,j is I f (Y m+1, 3 I e m+1,A)f (Yrni-1 ,()) I 19 m+1) A)g(Om+1 I A) dOm+1 I f (Ym+1,(x)1 0 .+1,A)g(em+lIA) de.+1 (4.16) We note that the denominator of (4.16) is simply the marginal unconditional density of Ym+1,( 3 ) given A = A. Similarly, given the conditional independence structure, the numerator of (4.16) is the unconditional joint density of Ym+1 = (Ym+1,( 3 ), Ym+1, 3 ). So the PEB 26 Chapter 4. Prediction^ predictive density is simply the conditional density of Ym, +1 , 3 I Yin+1 ,( 3) with parameter A replaced by its estimator (the estimator contains the information from I'm). We denote this density by pi(ym+1, 3 I Ym+1,(i),A)• 4.2 Full Bayesian Approach to Prediction To take a full Bayesian approach to prediction, we need the posterior distribution of O m+i , A I li, • • • , Ym, Ym + 1,(j). By the definition of conditional probability, this will be the product of the densities of 0,2+1 I A, li, • • • '177.'1'922+1,w and A I Y1 , • • • , 17m, Ym-1-1,(A• But 0m+1 I A, li, Ym , Ym+l,(j) —= 0.+1 I A, Ym+1,(i), which has posterior density f(Ym+1,(i) I 0 .+i, I A) f f(Ym+i,(3) I em+i, A)g(0,72 + 1 I A)d0.+1 A)g(0,72+1 while the distribution of A I Y1 , .. • , Yrn, Y,22+ 1,( 3 ) is just the posterior distribtion of A given all the observed data. We denote the corresponding density by T(o). Hence the predictive density will be If f(Ym+1,(i) Om+i, A)g(0m+1A) ^H*(A)d0,7,41dA f (Ym+1,3 I 0,2+1,A)^ f f (^m+i,(A) I 0.„ +1 , A)g(0',„ +1 A)dOini+1 which simplifies to [f f(ym+1,3 I em+i, A)f (Ym+i,(3) Om+i, )0y(Orn-FiA)clemd-ii f f(Yrrt+1,(3) I em+i, A)g( 0 m-1-1A)dOm+1 1-1*(A)dA or equivalently Eir fpi(ym+10 I Ym-1-1,(j), A)} ^ (4.17) We denote this density by p 2 (y m4. 1 , 3 I data). Note that the PEB predictive density can be obtained from (4.17) by assuming a posterior distribution on A which is degenerate at A. As with other posterior quantities, we anticipate that it will not be possible to evaluate (4.17) exactly. ^f Chapter 4. Prediction ^ 27 4.3 The PEB Approach in the Weibull Model To evaluate PEB predictive densities, we can easily obtain conditional densities as ratios of appropriate joint densities. For example, in a bivariate model, suppose we want the conditional density of Yi,2 I Yo . Suppressing the family index, we know an uncensored family has density f (yi, y2 I A) = h i (y i )h 2 (y 2 )40 2 exp(--OH(y))] Similarly, by considering a family of size 1, we know .i(Yi I A) = hi(yi )nOexp( -01/1(M.))] Taking the ratio yields Pi(Y2 i yl,a)^ h 2 (y 2 )Eg [02 exp(-01/(Y))] Eve exp(-0 (Y1))] (4.18) Similar arguments give conditionals for censored cases: (Y2 I Y1 >^A) = h2(y2)40 exp(-01/(Y))] Eg[exp(-1911 1(y0)] - EVO exp(-011(y))] P(Y2 > Y2 I Y1 = Y 1 ' A) — Eve exp( - 0Hi(Y1))] Eg [exp(— 0 H (0)] P(Y2 > Y2 I Y1 > Y 1 ' A) - EVexp( -61 Hi(Y1))] Note that all these expressions can be easily evaluated in either the Weibull/gamma model or the Weibull/positive stable model. Assuming Y1 is in group 1 and Y2 is in group 2, (4.18) becomes (a + 1)A2A3(0/ ^AiM, ) (.-1-1) y 2 A3-1 Pi (y2^A)^ A3 (a +^+ A2Y2 3 ) (a+2) (4.19) where a = —(log A 0 ) -1 . When A3 > 1, this is a unimodal density; otherwise, it is strictly decreasing. 28 Chapter 4. Prediction^ In the positive stable case, as mentioned in Hougaard(1986b), (4.18) is Pi(Y2 I M., A) = (A2A 3Y,,3-1)A0 {(Aiyi3-1-A23)21 A i y iA3 „1,2y3)-,\01 x exP[(A2Y2 3 ) A°^(Aiy1A3^A2Y2 3 ) )1 [1 + (A 0 1 —^ (4.20) Note that in both cases, Ar ) is a scale parameter. 4.4 The Full Bayesian Approach in the Weibull Model We note that with the complicated functional forms for the predictive densities (4.19) and (4.20), analytic evaluation of (4.17) would not be possible even if we had a nice form for the posterior density. We need to resort to some kind of approximation. Since a predictive density is necessarily a positive function, we could use the Laplace approximation method. However, this would have to be applied pointwise: a numerical maximization would be required for each point at which the predictive density is approximated. Furthermore, the approximate density may need to be normalized, requiring numerical integration. A second possibility would be to replace the posterior distribution with a multivariate normal distribution. That is, if log(likelihood x prior) has mode it and Hessian —E at the mode, then replace II* with a N(1t, E -1 ) distribution. This is correct asymptotically (Berger, 1985, p. 224). However, due to the complicated integrand, Monte Carlo methods would still be required. For little additional cost (and potentially considerable gain) one could use this Normal distribution as an importance sampling distribution. So, using the usual importance sampling scheme, if II ° is the N(a, E -1 ) density function, and if A( 1 ), , A(') are simulated vectors from this distribution, then our approximation is n p 2 (z data) ti where ci E ci pi (z I y i=1 II * A (i) floo(0) and ^ m+1 , 2 , A () ) > ci = 1 i=1 (4.21) 29 Chapter 4. Prediction^ Note that we need only evaluate the posterior II* up to a constant, so it is sufficient just to evaluate the likelihood and prior. Hence, our estimate of p 2 is easily normalized. It is of course of interest, within the context of the Weibull model as well as more generally, to discuss the relative merits of the PEB and full Bayesian approaches to prediction. 4.5 Full versus PEB Prediction Generally, the difference between any parametric empirical Bayes approach and a full hierarchical approach is that the PEB approach neglects the experimenter's uncertainty about the value of A. For this reason, we expect the PEB predictive density to be overconfident: to be more peaked and have thinner tails than the data can justify. However, the PEB approach does yield the correct functional form of the true conditional density, and if our estimate A of A is particularly good, we might expect the PEB predictive density to outperform the full predictive density. Since we are comparing two specified probability density functions intended to describe observable Ym+i ,i , Ym , y m+i ,u), we choose to compare their performance by the density ratio L(Yi , P2 (Ym+1,j Yll = • • • Ym Ym+1,(j)) (4.22) 7••• Ym, )) Often we prefer to work with the log-ratio due to its symmetry. (A positive value corresponds to p 2 outperforming p i , while the corresponding negative value indicates P1 outperforming p 2 by the same amount.) Of course in our model, p i and p 2 are complicated, so it is unlikely we can make any analytic inferences about the distribution of L. For this reason we mention a simple, though somewhat unrelated example of prediction. We speculate there is similar behaviour of likelihood ratios in the example and in our model. Chapter 4. Prediction^ 30 4.5.1 Exponential Prediction Example Let X1 ,^, Xn+i be iid exponential with scale 1/A. Take a non-informative prior II(A) 1/A. It is straightforward to see that A , Xn has a gamma distribution with shape n, and inverse scale E:L i Xi . So based on the first n observations, A has posterior mean n/ 1 Xi , which is also the MLE. A "plug-in" method neglecting uncertainty about A would predict X n+1 as exponential with scale ETia_ i Xi /n. The full predictive density would be the expectation of the density of X n+1 with respect to the gamma posterior distribtion on A. The ratio of the second density to the first is ( L(Xi ,^, Xn +i) Er 1 n , )n f [0 exp( -0 Xn+i)]e n -1 exp(- (E:L. 1 Xj )0)d0 =^ L-2=1 exp(-X„+i(n/ E i . 1 Xi)) This simplifies to n+1 7n X.^nXn+i L(Xi ,^Xn-Fi ) - ^ exp ^ EL /I Xn+1^Erii=1 which can be written as L(Xi,^Xn+i) = Y n+1 exp[n(Y -1 - 1)] where Y 2=1 1--4=1 Xi But by a well known property of independent gamma distribtions, Y has a beta(n, 1) distribution. Note that as a function of Y, L attains a minimum of e( — 1-2-- n+1 at Y = n+ 1) , but that L(Y) increases without bound as Y decreases to the left of the minima. nn 1 Furthermore L(0) 1. However, the minimum occurs at the mean of Y's distribution, whereas Y has little mass where L is large. To clarify, we superimpose log L(Y) and the density of Y for various n in Figure 4.1. So the region where L < 1 has high probability; however, L is almost surely larger than kn = n-Fi)n+1. Conversely, large values of L are possible, but are unlikely. Based on the beta distribution function, we tabulate Chapter 4. Prediction ^ 31 Figure 4.1: Predictive density ratio as a function of Y (exponential example) n=5 n=10 0.4^0.6^0.8^1.0 0.4^0.6^0.8^1.0 Y Y n=15 n=20 57 ..:5 o) O : 113 . 0 L° . 9^I^. 0.4^0.6^0.8^1.0 0.4^0.6^0.8^1.0 Y Y Dotted Line proportional to density of Y Horizontal Line equals zero Chapter 4. Prediction^ 32 Table 4.5: Probabilities of intervals for log L in exponential example n 5 10 15 20 k r, .91 .95 .97 .98 (log k„,0) .83 .85 .85 .86 (0, — log k n ) .05 .04 .05 .04 (— log k„, oo) .12 .11 .10 .10 probabilities of log L lying in the intervals (log k n , 0),(0, — log kn ), and (log kn , co) for various n in Table (4.5). So, the "plug-in" prediction has a high probability of doing slightly better than the proper predictive density, but it cannot do a lot better. With small probability, it will do much worse. Thus the full predictive density is more robust, at the cost of usually being slightly outperformed. This is consistent with the conjectured sharper peaks and thinner tails of general PEB predictive densities mentioned earlier. 4.6 Prediction in the Weibull/Gamma Model To try out the prediction in our model, we take the 99 samples of size 50 simulated from the Weibull/gamma model, and extend them with a 51s t pair of observations. For each sample, we compute a PEB predictive density and an approximate full predictive density for Y51,21 Y5 1 , 1 , Y1, • • • , Y5o. For the PEB density we estimate A by the approximate posterior means with respect to the flat prior H i , computed using reparameterization II. (The simulations indicated approximate posterior means give better estimates of A o , the parameter governing the frailty distribution, and hence the association within families. Since this association will be critical to our prediction, we use these estimators, even though they are slightly outperformed by the MLE in terms of the baseline parameters.) For the approximate full predictive density, we use importance sampling, generating Chapter 4. Prediction^ 33 1000 multinormal vectors to approximate the integral. By estimating the error in the importance sampling, and by visually inspecting the effect of the importance sample size, it seems that this sample size is sufficient to control stochasticity introduced by the importance sample. We also use reparameterization II for this because the previous simulation results suggest that this parameterization leads to a 'more normal' posterior distribution than the others. If this is true, the importance sampling should be most efficient with this parameterization, given the choice of a multinormal importance density. Of course this makes our predictive density a sum of 1000 terms, which makes it somewhat hard to handle. However, we cannot really expect to do much better. Since the integral being approximated is 4-dimensional, using this sample size is analogous to approximating a 1-dimensional integral using AY1000 --:.,' 5.6 simulated variates. The two predictive densities are plotted for the first 8 samples in Figure 4.2. In each case the PEB predictive density is the more peaked and thinner—tailed of the pair. The potential overconfidence of the PEB densities is clear. It becomes a reality in light of Figure 4.3, a histogram of the log-predictive density ratios. While half the samples produce negative values (PEB beats full density) there are no large negative values, but there is a thick tail in the large positive values. The relative robustness of the full predictive density observed in the exponential example is empirically obvious in our model. 4.7 Prediction in the Weibull/Positive Stable Model The same procedures mentioned above were carried out for the 100 samples of size 50 simulated from the Weibull/positive stable model. In this case, the maximum likelihood estimates of A were used for the PEB prediction, given their performance in the simulation. Because of the proximity of the true value of A o (0.9) to the parameter space Chapter 4. Prediction ^ 34 Figure 4.2: Approximate full and empirical predictive densities (gamma frailty) A l^-.41 0 LC1 O 0 O O O ^ 0.0 O N O ^ 0.5^1.0^1.5^2.0^2.5^3.0 A +.68 0.0 0.5^1.0^1.5^2.0^2.5^3.0 I A^ O +.16 O O 0 O _ O O 0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 -.35 O Lc?^_ O O O 0.0^0.5^1.0^1.5^2.0^2.5^3.0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 A +.27 9 Lt1 O O O 0.0^0.5^1.0^1.5^2.0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 marks predicting value Y(51,1) A marks observed predicted value Y(51,2) number is log-liklihood ratio evaluated at observed Y(51,2) Chapter 4. Prediction^ 35 Figure 4.3: Histograms of observed predictive density ratios: approximate full versus empirical Bayes 99 samples of size 50+1 with gamma frailty 0 0 CO 0 CV 0 0 -4 -2 0 2 4 log predictive density ratio 100 samples of size 50+1 with stable frailty 0 -4- 0 CO 0 CV 0 .,— 0 -0.4^-0.2^0.0^0.2^0.4 log predictive density ratio Chapter 4. Prediction^ 36 boundary () o = 1), many samples produced a substantial fraction of importance sample points outside the parameter space. However, this did not seem to hamper the accuracy of the approximation. The histogram of observed log-likelihood ratios is displayed in Figure 4.3. The right tail is again slightly heavier, but this trend is less pronounced than in the gamma frailty case. The important difference however is that the variability is roughly an order of magnitude smaller. Observing the plots of the two predictive densities in Figure 4.4 shows that in all but one of the first eight samples, the two densities are visually indistinguishable. It seems that with positive stable frailty and a sample of size 50, PEB prediction is virtually identical to the full Bayesian approach to prediction. 4.8 Effect of Frailty Distribution on Prediction We speculate that the full Bayesian and Empirical Bayes predictive densities are so close in the stable frailty case for the following reason. Implicit in the prediction is an updating of knowledge regarding 8 m+1 . Prior to observing Y,,,, +1 ( i i, we know 0 ni+1 comes from a positive stable distribution, and we have information about the parameter, A o . We use Y7,2+1 4 3 ) to update (via Bayes theorem) the distribution on O ni+1 . Because the stable distribution has such a thick right-tail, we expect the information from Y,7, 14 ,( 3) dominates the prior information. In other words, our knowledge about ° m+1 is not sensitive to our knowledge of A o . This would explain why replacing a distribution on A o with a point mass might not greatly affect the prediction. Further, because the information on the remaining components of A is so good compared to the knowledge about 6 1 ,7,41 , interchanging degenerate and non-degenerate distributions on these components will not have a large effect. Conversely, with gamma frailty, the prior knowledge about A o would be important in the updating process; consequently, there would be a noticable difference between the full Chapter 4. Prediction ^ 37 Figure 4.4: Approximate full and empirical predictive densities (positive stable frailty) A^ • - • - +.319 -.020 N - O 0.0^0.5^1.0^1.5^2.0^2.5^3.0 A • I^ +.003 0.0^0.5^1.0^1.5^2.0^2.5^3.0 A^ - • - - N - 0.0^0.5^1.0^1.5^2.0^2.5^3.0 • - 0.0^0.5^1.0^1.5^2.0^2.5^3.0 A +.075 • -.031 - N - - O O 0.0^0.5^1.0^1.5^2.0^2.5^3.0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 A^ I^ A -.027 - • +.013 -.025 - N - - 0 - 0.0^0.5^1.0^1.5^2.0^2.5^3.0 0.0^0.5^1.0^1.5^2.0^2.5^3.0 marks predicting value Y(51,1) A marks observed predicted value Y(51,2) number is log-liklihood ratio evaluated at observed Y(51,2) Chapter 4. Prediction^ 38 and PEB predictors under gamma frailty. As we will see in the next chapter, predictive denisities, while useful in their own right, can also be used as a tool in model selection criteria. Chapter 5 Application 5.1 Description of Rat-Litter Data We consider the data from a litter-matched experiment on rats, given in Mantel, Bohidar & Ciminera (1977). This data was analyzed in Hougaard (1986b), using likelihood methods. One-hundred litters of 3 animals each were observed. Fifty litters contained all female rats, the other 50 were exclusively male. One rat per litter was exposed to a possibly carcinogenic treatment. For each rat, a time (in weeks) and an indicator (T or D) was recorded. The data in its original form is given in Appendices B and C. The times recorded are times from birth to a response. The response is appearance of a tumour (T), or death (D). The data description indicates that all animals were sacrificed after 104 weeks; however, it is clear that some animals (the surviving male rats listed as third in their litter) were sacrificed after 102 weeks. We will refer to deaths not due to sacrifice as deaths due to other causes. For ease of analysis, all times were divided by 100. Initially, we concentrate only on the female litters. 5.2 Single Risk versus Competing Risks If the phenomenon of interest is tumour appearance, then we can treat those animals observed to die as censored observations. This was the approach taken in Hougaard (1986b). Note that this implies a mixture of censoring mechanisms. We treat each animal as having a time to tumour appearance T, and a censoring time L. Then either 39 Chapter 5. Application^ 40 we observe a value of T smaller than L (uncensored observation) or we know only that T exceeds L (censored observation). Following Lawless (1982, pp. 31-34), the sacrificed animals correspond to type I censoring (L is fixed before the experiment begins), while the animals observed to die correspond to random censoring (L is a random variable). For the conditional likelihood (2.3) to remain valid under random censoring, it is necessary to assume T and L are independently distributed given the frailty. Of course it is entirely possible that death due to other causes (not tumours) is the response of interest, and we could treat tumour appearances as random censoring. The sacrificed animals would still correspond to type I censoring. As before the validity of (2.3) would require conditional independence of the tumour appearance time and the death due to other cause time. It may be the case that both responses (tumour and death due to other causes) are of interest. Or, with data on families, it may be advantageous to model both responses, even if only one is of interest. We then require a competing risks model. Here, associated with each animal is a pair of random variables, Ti. (time to tumour appearance) and T2 (time to death due to other causes). We observe either (T 1 = t, T2 > t) or (T1 > 1, T2 -= t) in addition to type I censored observations (T 1 > t, T2 > t). Note that assuming independence of (T1 , T2 ) given an animal's frailty is equivalent to the assumption required to justify the conditional likelihood in the single risk models. We explore several competing risk models based on this assumption. 5.3 Model Adequacy Checking We would like to do some simple preliminary model checking with the data. One means by which to do this is examining marginal distributions. First, given a conditional Weibull distribution, we examine the unconditional marginal distribution induced by the frailty Chapter 5. Application^ 41 distribution. Specifically, if the conditional hazard is 0A 1 () 3 0 3-1 ) it is easy to show that a gamma frailty distribution yields marginal hazard crA 3 03 -1 tA3^ (5.23) while the marginal hazard under positive stable frailty is Flo (A 0 A 3 t A ° A3-1 ) (5.24) So the positive stable frailty model has Weibull marginals as well as conditionals, but the shape parameters differ. Hence the marginal hazard is monotone (increasing, constant, or decreasing). Conversely with gamma frailty, the marginal hazard either increases to a maxima then decreases, or is monotone decreasing. To do diagnostic marginal checks, we treat the female treatment rats as one sample, the first-listed female control rats as a second sample, and the second-listed as a third sample. Treating a single response as the response of interest, we proceed to check the adequacy of a Weibull distribution for each sample. The random component of the censoring means we don't know the ranks of even the uncensored observations, making a Weibull probability plot impossible. So, as suggested in Lawless (1982, p. 82) for each sample we plot log(— log(S(t)) as a function of log t, where S is the non-parametric Kaplan-Meier estimate of the survivor function. A true underlying Weibull marginal will manifest itself as a roughly linear plot with slope depending solely on the shape parameter. So, obvious non-linearity would contradict the Weibull/positive stable model, while linearity, but different slopes in treatment and control plots would contradict the assumption of a common shape parameter (conditional common shape A3 implies marginal common shape \0A3 under stable frailty). We would like to know how the Weibull/gamma model manifests itself on the above plot. It is possible to show that with this model, log(— log S(t)) will be increasing and convex for large log t (Appendix D). 42 Chapter 5. Application^ Table 5.6: Estimation with rat-litter data : single risk (tumour) gamma frailty frailty parameter baseline treatment control hazard parameters common shape positive stable frailty frailty parameter baseline treatment control hazard parameters common shape Ao Al A2 A3 posterior mode (MLE) .61 (.29) .64 (.17) .260 (.068) 3.93 (.57) approx. posterior mean .58 (.25) .72 (.20) .288 (.080) 4.05 (.57) .906 (.095) .55 (.14) .214 (.058) 4.10 (.63) .871 (.083) .57 (.14) .223 (.059) 4.23 (.64) Ao Al A2 A3 The above-mentioned plots for both responses are given in Figure 5.5. The tumour plots appear to be roughly linear with common slope, supporting our Weibull/positive stable model. The other cause plots seem to show a slight concavity for large log t, indicating perhaps the suitability of gamma frailty rather than stable frailty for this response. Of course this is appropriate only if the assumption of Weibull conditional hazards is appropriate. 5.4 Single Risk : Tumour Development Using the Weibull model with conditional hazards 0A 1 (A 3 t A 3 -1 ) for treatment animals and 0A 2 () 3 0 3-1 ) for controls, we compute the MLE and approximate standard errors in the usual fashion. We also compute approximate posterior means, and the square roots of approximate posterior variances with respect to H i , the flat prior. Results for gamma frailty and for positive stable frailty are displayed in Table 5.6. The approximate posterior quantities were computed using the original parameterization. Given the success of using reparameterization II in the simulation study, we wished to try that technique here too. Chapter 5. Application 43 Figure 5.5: Diagnostic plots based on Kaplan-Meier estimates of the marginals tumour-treatment other cause-treatment 0 - CJ - - • - - -1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0 -1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0 tumour-control 1 r - - • other cause-control 1 - -1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0 -1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0 tumour-control 2 - other cause-control 2 - - • - • - - -1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0 -0.8 -0.6 -0.4 -0.2 0.0 44 Chapter 5. Application^ Table 5.7: Estimation with rat—litter data : single risk (other causes) gamma frailty frailty parameter treatment baseline control hazard parameters common shape positive stable frailty frailty parameter baseline treatment hazard control parameters common shape Ao Al A2 A3 Ao Al A2 A3 posterior mode (MLE) .60 (.21) .54 (.16) .51 (.11) 5.39 (.69) approx. posterior mean .56 (.19) .62 (.19) .55 (.13) 5.55 (.70) .876 (.085) .44 (.12) .392 (.084) 5.57 (.75) .854 (.081) .47 (.13) .408 (.086) 5.70 (.75) Unfortunately, we couldn't maximize the necessary modified likelihoods. In particular, the induced prior 11/(K) increases without bound as /c o decreases to 0. It would seem that the likelihood does not decrease commensurably quickly. 5.5 Single Risk : Death due to Other Causes The same analysis applied to the tumour appearance times above was applied to the other cause times. The results are displayed in Table 5.7. Note that unlike the tumour risk, there is no discernible difference between treatment and control groups with other cause risk. 5.6 Competing Risks : Single Frailty A simple approach to the competing risks problem is to assume that frailty is not riskspecific. Consequently, an animal's frailty acts multiplicatively on both its tumour hazard and its other cause hazard. Since treating the risks seperately has produced similar 45 Chapter 5. Application^ frailty parameter estimates for both risks, we have no immediate grounds to reject this assumption. The model structure remains the same; however, we now effectively have families of six members each (two lifetimes per animal), and three groups. The first group consists of tumour appearance times for treatment animals, the second group is tumour times for control animals, while the third group is other cause times. The conditional hazards are Oh(t), where h is group-dependent, according to: h(t) = A 1 0 3 0 3- ') for group 1; A 2 (A 3 03 -1 ) for group 2; )1 405 05 -1 ) for group 3. (5.25) So each family has one observation on group 1, two observations on group 2, and three observations on group 3. However, at most three observations from one family can be uncensored, since at most one lifetime per animal can be uncensored. Note that we assume the treatment has no effect on the other cause hazard (consistent with the estimation results from the last section). On the other hand, we do allow for different shape parameters for the different risks. The results of estimation in this model are given in Table 5.8. 5.7 Competing Risks : Two Frailties There is no reason in general to expect that non-risk-specific frailty is realistic. On the other hand, assuming an individual has independent risk-specific frailties does not allow for an overall general health effect. A middle ground is suggested by Hougaard (1986b), in the context of distinguishing between sibling and twin relationships. Following his idea, we associate with an individual a general frailty 0, and risk-specific frailties 0 1 and 0 2 , corresponding to tumour appearance and other cause risk respectively. Generalizing 46 Chapter 5. Application^ Table 5.8: Estimation with rat-litter data : competing risks, single frailty gamma frailty frailty parameter tumour treatment appearance control baseline common shape other cause baseline shape positive stable frailty frailty parameter tumour treatment appearance control baseline common shape other cause baseline shape Ao Al A2 A3 A4 A5 Ao Al A2 A3 A4 A5 posterior mode (MLE) .78 (.15) .65 (.16) .262 (.065) 3.98 (.57) .508 (.088) 5.30 (.67) approx. posterior mean .72 (.14) .72 (.18) .290 (.074) 4.13 (.59) .55 (.10) 5.45 (.67) .911 (.063) .59 (.14) .228 (.055) 4.29 (.61) .449 (.069) 5.52 (.68) .880 (.060) .63 (.15) .241 (.058) 4.43 (.61) .464 (.074) 5.67 (.68) from the previous section, the conditional hazards become {00 1 A 1 (A 3 t A3 ') for group 1; h(t I 0,01, 02) =^0cb 1 A 2 (A 3 t A3-1 ) for group 2; (5.26) 00 2 A 4 (A 5 t A 5 -1 ) for group 3. It is assumed that (0 1 , 02) are independently distributed according to a positive stable distribution parameterized by A6, while 0 A 6 independently follows a positive stable distribution parameterized by A o . Unfortunately, it appears this structure is chosen to make integration to the unconditional likelihood tractable, rather than for physical reasons. It is easy to show that (suppressing the family index), the unconditional survivor function is P(Ii > yi, ... , Y6 > y6) = exp (— [HaA6 + N6] A°)^(5.27) Chapter 5. Application^ where Ha + 47 Hb = H, such that Ha contains the terms corresponding to tumour times, while Hb 's terms correspond to other cause times. For our application, we can get the likelihood by taking partial derivatives (up to third order) of (5.27) with respect to (y i , , ye). Unfortunately, there does not seem to be an obvious recursive form for the likelihood as in the single frailty case. The derivatives required are given in Appendix E. The results from fitting the model are given in Table 5.9. Note that the two frailty model generalizes the single frailty with stable distribution model in the following sense. The net frailty associated with each individual, Ock i , follows a positive stable distribution (with parameter X0)6). So for example, considering groups 1 and 2 only in (5.26) reduces to the two group model with a single frailty following a stable distribution. So an analogous generalization of the gamma frailty model could be achieved by taking any joint distribution on (0,0 i ) such that the product 00 i has a gamma distribution. However, for physical reasons, one may wish to impose other conditions such as independence between 0 and 0 i , or perhaps certain marginal distributions. A convenient specification yielding gamma distributed net frailty was not found. For a general discussion of these types of mixture models, from the point of view of specified marginals, see Joe (1991). 5.8 Including Covariates We propose to include the data on male litters by means of a regression function acting multiplicatively on an animal's conditional hazard, in the manner of Cox (1972). Since there are only two observed tumours in the male animals, we choose to model the other cause hazard only. Because of the binary nature of the covariate, we can generalize the model of section 5.5 by introducing a multiplicative factor A4 in the conditional hazards Chapter 5. Application^ 48 Table 5.9: Estimation with rat-litter data : competing risks, two frailties positive stable frailty frailty parameters baseline hazard parameters other cause baseline Ao A6 treatment control common shape Al A2 A3 A4 A5 posterior mode (MLE) .938 (.062) .945 (.071) .56 (.14) .216 (.056) 4.29 (.61) .424 (.076) 5.66 (.71) approx. posterior mean .897 (.064) .942 (.071) .60 (.15) .231 (.059) 4.46 (.63) .445 (.082) 5.79 (.73) of male rats only. The estimation results from this model are given in Table 5.10. 5.9 Assessing Model Choice To compare choice of model and frailty distribution, we propose to use predictive methods. Following the cross-validation idea of Geisser (1980a), for each uncensored observation, we construct a predictive density conditional on all the other observations. For example, to do this for the 40 tumour appearances observed would require 40 different posteriors, the i th posterior being based on all the data except the i th tumour appearance time. In practice we use the posterior based on all the data to compute each predictor, as deleting one of 150 observations should not greatly influence the posterior. To compute the predictive densities, we use importance sampling with multinormal samples of size 1000, as described in the previous chapter. Using these densities to assess model choice is an application of the idea of predictive sample reuse, introduced independently by Geisser (1975) and Stone (1974). As a measure of the quality of prediction, we take the log of the product of the predictive densities evaluated at the observed values. This choice of measure of prediction is inspired by the idea of comparisons based on likelihood ratios. It is also suggested in Chapter 5. Application^ 49 Table 5.10: Estimation with rat-litter data : single risk (other cause), proportional hazards model gamma frailty frailty parameter baseline treatment hazard control parameters common shape regression parameter positive stable frailty frailty parameter baseline treatment hazard control parameters common shape regression parameter Ao Ai A2 A3 A4 Ao Al A2 A3 A4 posterior mode (MLE) .75 (.13) .48 (.10) .489 (.087) 4.87 (.40) 1.47 (.31) approx. posterior mean .71 (.13) .51 (.11) .511 (.085) 4.94 (.42) 1.50 (.34) .895 (.050) .401 (.089) .415 (.077) 5.14 (.43) 1.48 (.33) .878 (.053) .413 (.093) .422 (.079) 5.20 (.44) 1.53 (.36) Geisser Sz Eddy (1979). Their interpretation is that of the models considered, the one maximizing this predictive score best explains the data. These predictive scores for all the models considered are given in Table 5.11. Note that for the purposes of comparison, the score for the model including males is based on prediction of the female survival times only. So, the predictive scores indicate superior performance by the stable frailty distribution in all but the proportional hazards model. To investigate this phenomenon, we give histograms of the individual ratios of stable predictive density to gamma predictive density evaluated at the observed failure times. These are given for the single risk models and the competing risk single frailty model in Figure 5.6. Note that in three of the four models, the superior performance of stable frailty can be attributed to one or two relatively large predictive density ratios, while the remaining observations show relatively even performance between the two frailty distributions, with any edge in fact favouring Chapter 5. Application^ 50 Figure 5.6: Histograms of predictive density ratios : positive stable frailty versus gamma frailty tumour risk only competing risks: tumour Lf) 0 1- u) LC) 0 - 1M 0 -2^-1^0^1^2^-2^-1^0^1^2 Iog(predictive density ratio) ^log(predictive density ratio) other cause risk only 0 CO 0 CO C\I 0 0 CV 0 0 -r- competing risks: other cause ,— Ln 0 r 1^ 0 ^ n -2^-1^0^1^2^-2^-1^0^1^2 log(predictive density ratio)^Iog(predictive density ratio) Chapter 5. Application^ 51 Table 5.11: Predictive scores model tumour risk only tumour appearance gamma stable frailty frailty -24.71 -24.18 other cause gamma stable frailty frailty other cause risk only -8.93 -8.71 other cause risk only including males Competing risks single frailty Competing risks two frailties -10.16 -10.26 -9.44 -7.51 -25.49 -22.80 -22.92 -8.04 the gamma distribution. Again it seems the stable frailty choice is robust, relative to the gamma frailty specification. Also it is clear that on the basis of these predictive scores, the single frailty, competing risks model with stable frailty is the most suitable, as it gives the best prediction of both tumour appearance and other cause failure times. Note that while this model with stable frailty improves upon the single risk models, the same model with gamma frailty is worse than the single risk models. This underscores the importance of the frailty distribution selection—different choices can lead to different conclusions about the manner in which frailty acts upon individuals hazards. The double frailty model exhibits slightly poorer performance than the best model, even though it is a generalization. Perhaps the ad-hoc assumptions concerning frailty distributions in this model are inappropriate at least in this situation. The assumption of exchangeable risk-specific frailties is perhaps too restrictive. A model with more flexible inclusion of general and risk-specific frailties might exhibit better performance; however, Chapter 5. Application^ 52 such a model would be calculationally difficult. Including the male rats in the other risk only model via proportional hazards does not improve the prediction of the female other cause times. This certainly calls into question the assumption of conditional proportional hazards between male and female animals. Computing a predictive score for each model provides a very easy way to choose between several models for given data (the model with the highest score is most suitable). Of course there are other model—ch000sing criteria. Some of these are mentioned in the next chapter. Chapter 6 Discussion 6.1 Approaches to Frailty Models As noted in (2.1), the hierarchical model can be collapsed to a one-stage model. When the n i are small (the case with which we have been concerned), there seems to be little point in trying to estimate the O i 's. In a sense the two-stage model is only motivation for the one-stage likelihood. However, it is important to keep in mind the notion of underlying frailties. As we saw in Chapter 4, the knowledge about an individual's frailty is fundamental to predicting that individual's failure time. Moreover, in Chapter 5, we saw that the concept of frailty led to more sophisticated models, with competing risks, and several frailties. Of course, if the Laplace transform of the frailty density does not have a nice form, we could not straightforwardly collapse to a one-stage model. In this instance the hierarchical nature of the model would be more visible (though no more important). We also saw that these models could be approached from the perspective of say bivariate models, with given marginals. Here obviously model specification would be in terms of marginal rather than conditional (baseline) distributions. This has some disadvantages. Specifying marginals is specifying an effect averaged over a population. In analogy to the Cox (1972) model, it would be akin to specifying a form for the baseline hazard averaged with respect to the covariate vectors of the individuals. Conversely, specifying conditionals corresponds to choosing a form for the baseline hazards. The 53 Chapter 6. Discussion^ 54 latter approach seems more sensible. Additionally, specifying arbitrary marginals implies that the conditional (baseline) hazards depend on the frailty parameter(s). This can also be construed as undesirable. Lindley & Singpurwalla (1986) motivate a model with time-varying frailties. In particular they suggest treating an individual frailty as a stochastic process. They point out that assuming frailty is a purely random process with a stationary distribution results in a model analogous to ours, but with different interpretation. Apparently, this approach has not been pursued in the literature. 6.2 Specification of Distributions In practice, we have restricted ourselves to the use of Weibull distributions for the first stage of our model. Although this family of distributions is flexible, and commonly used for failure times, this is a somewhat arbitrary specification. For more generality, one could use the generalized gamma distribution of Stacy (1962). This three parameter family includes the Weibull and gamma families, and has a log-normal distribution as a limiting case. Unfortunately, an additional parameter will have ramifications in terms of computational practicality. In light of the existing literature, we have considered gamma and positive stable distributions as second-stage distributions. These families have the advantage of possessing nice Laplace transforms of their densities. In our data analysis, the stable frailty outperforms the gamma frailty, at least in terms of our predictive sample reuse criteria. As mentioned previously, this may be attributable to the robustness of the stable specification rather than a higher degree of "truthfulness." This robustness in turn comes from the long right tail of positive stable distributions. In any case, at least according to our criteria, the stable frailty specification offers a better explanation of the data than the Chapter 6. Discussion^ 55 gamma frailty. This is consistent with Hougaard's theoretical preference for the positive stable frailty. There is however, a drawback to using stable frailty in terms of implementing the Bayesian paradigm. With gamma frailty, a specification of a prior on A and particularly on A o , describes belief in a simple function of the variance of the frailty distribution. With stable frailty, A o is the index of the variate, and no such straightforward interpretations exist (the mean and variance of the frailty distribution do not exist). In short, with stable frailty, it is harder to encapsulate knowledge about the heterogeneity of a population. Again, one wonders about other frailty distribution specifications. Hougaard (1986a) introduces a three-parameter family generalizing both the positive stable and gamma distributions. However, using this entire family would present computational difficulties, and possibly problems with identifiability. An alternate family with the property of closed form Laplace and inverse Laplace transforms is the inverse Gaussian densities. However, inverse Gaussian frailty models cannot attain the Frechet upper bound. In fact, the limiting distribution corresponds to positive stable frailty with parameter A o = 1/2 (Oakes, 1989). Hence this frailty distribution specification would not encompass as wide a range of dependence/association as gamma or positive stable frailty. There are some nonparameteric and semiparametric procedures which can be used for multivariate frailty models. Oakes (1986,1989) discusses such procedures. However, they are presently limited to bivariate situations with identical groupings within each family. Also, they do not provide for the specification of prior information. Chapter 6. Discussion^ 56 6.3 Techniques To estimate parameters, we used both maximum likelihood estimation and Laplace's method with diffuse priors. In the data analysis, there were no cases of these two methods providing greatly differing results. In the simulation studies, the overall performance of the MLE was perhaps superior; however, at least in one instance Laplace's method provided the better results. Of course, if one has non-diffuse prior information then Laplace's method would provide reliable estimates of functions of parameters. The additional advantage of Laplace's method is its simplicity—it rivals maximum likelihood estimation in terms of ease of implementation. We also witnessed the dependence of Laplace's method on parameterization. A fixed posterior function can be approximated via any parameterization of the model. Due to the nature of Laplace's method, the approximation will be better with a parameterization in which the posterior distribution is in some sense closer to a normal distribution. In fact if the parameter space were 1 or 2-dimensional, one could visually inspect the (nonnormalized) posterior under a variety of parameterizations. The parameterization which induces the most "normal looking" posterior would be a good choice to use for Laplace approximations. Of course in our situation (and most other realistic situations) the parameter space is of higher dimension. Achcar & Smith (1990) give examples of the sensitivity of Laplace approximations to reparamerization in a number of situations. The Laplace approximation was not so attractive for approximating predictive densities. For this purpose, importance sampling gave us normalization essentially for free. This was due to the nice form of the conditional densities (PEB predictive densities), which in turn was due to the nice form of the Laplace transform of the frailty density. Another Monte Carlo technique which is potentially useful for approximating posterior quantities in hierarchical models is Gibbs sampling. For a general discussion of Chapter 6. Discussion^ 57 Gibbs sampling, see Geman Geman (1984), or Gelfand Si Smith (1988). In CIHM's, Gibbs sampling would require simulating variates from the distributions of (0, I A, data), i = 1, , k. and (Aj Au), 0, data), j = 1, , r. In the Weibull model with gamma frailty, (0 i I A, data) will also have a gamma distribution (gamma frailty is effectively a conjugate prior specification). However, () j A u) , 0, data) would have a complicated distribution function. Thus in our particular hierarchical model, Gibbs sampling would be very difficult to implement. 6.4 Assessing Models Chapter 5 illustrated several ways of modelling a data set. Of particular interest was the difference between the single risk models and the competing risk models. According to our criteria, (predictive sample reuse), the competing risk models fared better, in terms of predicting both tumour appearance times and other cause times. Put another way, with data on families, even if say tumour appearance is the response of interest, it is advantageous to model the censoring times (in this case the other cause times). The criteria used to assess models is also of interest. The predictive sample reuse idea is conceptually appealing. In a sense it favours the model which offers the best explanation of the data. There is not such a strong emphasis on the truth of various models, which is an approximate notion at best. There is also not the reliance on asymptotics which is used for likelihood assessment with nested models. There are many other criteria available to assess the different models. Even within the framework of predictive sample reuse, there are many ways to implement a model— selection scheme. Geisser (1980a) has suggested making point predictions and using some discrepancy function to measure the distance between predicted and realized values. An entirely different approach would be to use Bayes factors. When comparing two models Chapter 6. Discussion^ 58 (each consisting of a likelihood L and a prior II), the Bayes factor is given by f L i (A)II I (A)dA f L 2 (A)II 2 (A)dA with a large Bayes factor favouring model 1 (Berger, 1985, p. 146). In our models, computing Bayes factors would be difficult, requiring the integration of the unconditional likelihood (2.5) with respect to the prior measure H on A. Also, with diffuse priors, we expect Bayes factors to be very similar to likelihood ratios. Because of the behaviour with nested models, we wish to avoid likelihood ratio criteria. When likelihood ratio is applied to nested models, the more compex model will necessarily have a larger likelihood, or score. We wish to avoid the appeal to asymptotics usually made to resolve this problem. Predictive sample reuse and Bayes factors will not necessarily favour a more complex model. Other approaches involve modifying the likelihood ratio. Akaike's information criterion (Akaike, 1973) uses log L(5) — [# estimable parameters] as a score with which to compare models. (A is a maximum likelihood estimate.) The approach of Schwarz (1978) modifies the criterion to log L(A) — [(1/2) log n x (# estimable parameters)] where n is the number of independent observations. Of course, these latter two criteria do not admit prior information. These criteria are not entirely dissimilar. Smith & Spiegelhalter (1980) show a connection between Bayes factors and the Schwarz criterion in certain contexts. Also, Stone (1977) has shown an asymptotic equivalence between a variant of the predictive sample reuse criterion and the Akaike information criterion. The variant in question is similar to the one employed by us, but replaces the full predictive density with the PEB predictive density. Chapter 6. Discussion^ 59 We note that in our data analysis, the single frailty competing risks model is a special case (A 6 = 1) of the double frailty model. Thus we have an example of a predictive sample reuse criteria directly favouring the simpler of two nested models. Our treatment of censoring in terms of predictive sample reuse bears discussion. We have based our criteria on prediction of the uncensored observations only. This allows for direct comparison between single risk and competing risk models. For methods of incorporating the censored values, see Geisser (1980b). Appendix A Quadrant Dependence of Frailty Models Following (2.11), let p denote the Laplace transform of the frailty distribution. We know p is convex. We will show that if log p is convex, then for arbitrary first stage distributions, the bivariate distribution is positively quadrant dependent. We will need the facts p(0) = 1, p > 0, and p' < 0; these are elementary properties of Laplace transforms. Note that to show (2.12) holds, it would suffice to show p(a + b) > p(a)p(b) Va, b > 0, or equivalently log p(a + b) > log p(a) + log p(b). But log p(a + b) = log p(a) + la a+b pf ( s ) t 19 s) ds. Now we know that the integrand is negative, and by assumption it is increasing, hence Ar ., ,) log p(a + b)^ ds ; log P(a) +^11(3) i that is log p(a + b) > log p(a) + log p(b) But log p(0) = 0, giving the desired result. 60 — log p(0). Appendix B Female Rat Litter Data Female Rat Litters Drugtreated Control-1 Control-2 101-D 49-T 104-D 104-D 102-D 104-D 104-D 104-D 104-D 77-D 97-D 79-D 89-D 104-D 104-D 88-T 96-T 104-D 104-D 94-D 77-T 96-T 104-D 104-D 82-D 77-D 104-D 104-D 70-T 7-D 89-T 91-D 90-D 91-D 70-D 92-D 39-T 45-D 50-T 103-T 69-D 91-D 93-D 104-D 103-D 85-D 72-D 104-D 104-D 63-D 104-D 104-D 104-D 74-D 81-D 104-D 69-D 67-T 104-D 68-T 104-D 104-D 104-D 104-D 104-D 104-D 104-D 83-D 40-T 87-D 104-D 104-D 104-D 104-D 104-D Female Rat Litters Drugtreated Control-1 Control-2 89-D 104-D 104-D 78-D 104-D 104-D 104-D 81-T 64-T 86-T 55-T 94-D 34-T 104-D 54-T 76-D 87-D 74-D 103-T 73-T 84-T 102-T 104-D 80-D 80-T 104-D 73-D 45-T 79-D 104-D 94-T 104-D 104-D 104-D 104-D 104-D 104-D 101-T 94-D 76-D 84-T 78-T 80-T 81-T 76-D 72-T 95-D 104-D 104-D 73-T 66-T 92-T 104-D 102-T 104-D 98-D 73-D 55-D 104-D 104-D 49-D 83-D 77-D 89-T 104-D 104-D 88-D 79-D 99-D 103-T 91-D 104-D 104-D 104-D 79-T 61 Appendix C Male Rat Litter Data Male rat litters Drugtreated 91-D 91-D 98-D 91-D 104-D 96-D 79-D 61-D 63-D 104-D 104-D 104-D 104-D 65-D 86-D 104-D 95-D 104-D 104-D 92-D 104-D 63-D 104-D 104-D 87-D Control-1 104-D 104-D 62-D 98-D 104-D 71-D 104-D 88-D 104-D 104-D 80-D 104-D 53-D 104-D 104-D 100-D 104-D 104-D 93-D 98-D 89-D 32-D 98-D 104-D 104-D Male rat litters Drugtreated 104-D 104-D 90-D 91-D 104-D 23-D 104-D 87-D 104-D 104-D 67-D 104-D 104-D 104-D 51-D 102-D 88-D 67-D 104-D 81-D 94-D 104-D 104-D 104-D 92-D Control-2 102-D 102-D 77-D 76-D 98-D 91-D 99-D 85-D 102-D 102-D 92-D 101-D 102-D 91-D 75-D 102-D 95-D 102-D 80-D 83-D 89-D 51-D 78-D 102-D 94-D 62 Control-1 104-D 91-D 104-D 104-D 104-D 104-D 71-T 51-D 83-D 104-D 84-D 104-D 94-D 103-D 104-D 98-D 54-D 84-D 104-D 82-D 104-D 89-D 69-D 75-T 104-D Control-2 102-D 102-D 55-D 102-D 102-D 102-D 90-D 102-D 102-D 96-D 94-D 99-D 102-D 102-D 91-D 102-D 39-D 54-D 87-D 102-D 102-D 77-D 102-D 64-D 102-D Appendix D Performance of Weibull/Gamma Model on Diagnostic Plot Letting y = log[— log S(t)] and x = log t, from the Weibull/gamma marginal hazard (5.23), we see y = log a + log(log g(x)) where g(x) = k + exp(.\3x), k = a1 t 1 We wish to show that for large x,cWdx is positive, while d 2y/dr 2 is negative. Noting that g' (x) = A 3 [g(x) — k], we can write dy A3 1 dx^3 (k I g) log g — which will be positive when g > 1. Since g is increasing this will occur when x is sufficiently large. Differentiating again, we can write d2 y^A2 {k[g log g] — k2[logg] dx 2^[g log .9, 1 2 — [(g — k) 2 ]} — which we can see will be negative when g (and hence x) is sufficiently large. 63 ^ Appendix E Derivatives of Survivor Function in Two-Frailty Model We want the first, second, and third-order derivatives of (5.27) with respect to (yi, .. • , ye). Initially, we will express these in terms of the derivatives of H* = HaA6 . We denote (5.27) by A = exp(—(H*) A 9 and write B = —A o (H*)'°'. Then the first-order derivative is Aa = AB The second-order derivatives have the form A ii = A(H: H;)[B 2 B (1) H*^ * B1 l H2 where B( d ) is the dth derivative of B with resepct to H*. The third-order derivatives can be expressed as a sum of three terms: ^Aijk ^ =^AB[B2 B( 1 ) + H13] A(HAH; H;k H:)[B 2 B( 1 ) + H B] A(HiHi )[2B B( 1 ) + B( 2 ).111 tc ik 1;- & B] where d B^HZiHA B 112*.i ii;k B 11:i HZ B(1) ^B = ^ dyk HH^H:^(Hr)2 H;^(H;)2 H: The derivatives of H* are given by H = A6 64 -l h i Appendix E. Derivatives of Survivor Function in Two-Frailty Model ^65 1-1: . = 3 A6(A6 — 1)H (Ar 2 hillj (i) = (j) 0 otherwise and H:jk = A 6 (A 6 — 1)(A 6 — 2)111r 3 h i h i h k (i) (j) (k) 0^ where (i) indicates a or b depending on whether cause response respectively. otherwise Yi is a tumour response or an other Bibliography [1] ACHCAR, J.A. & SMITH, A.F.M.(1990). Aspects of reparameterization in approximate Bayesian inference. Bayesian and Likelihood Methods in Statistics and Econometrics 439-52. eds. S. Geisser, J.S. Hodges, S.J. Press, A. Zellner. Amsterdam: North-Holland. [2] AKAIKE, H.(1973). Information theory and an extension of the maximum likelihood principle. 2nd. International Symposium on Information Theory 267-81. Budapest: Akademia Kiado. [3] BERGER, J.0.(1985). Statistical Decision Theory and Bayesian Analysis New York: Springer-Verlag. [4] BLOMQVIST, N.(1950). On a measure of dependence between two random variables. Ann. Math. Statist. 21, 593-600. [5] CHAMBERS, J.M., MALLOWS, C.L. & STUCK, B.W.(1966). A method for simulating stable random variables. J. Am. Statist. Assoc. 71, 340-44. [6] CLAYTON, D.(1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141-51. [7] CORNFIELD, J. & DETRE, K.(1977). Bayesian life table analysis. J. R. Statist. Soc. B 39, 86-94. [8] COX, D.R.(1972). Regression models and life tables (with discussion). J. R. Statist. Soc. B 34, 187-202. [9] COX, D.R. & OAKES, D.(1984). Analysis of Survival Data Chapman & Hall. [10] FELLER, W.(1971). An Introduction to Probability Theory and Its Applications, 2, 2nd ed. New York: Wiley. [11] GAMMERMAN, D. & WEST, M. (1987). An application of dynamic survival models in unemployment. Statistician 36, 269-74. [12] GEISSER, S.(1975). The predictive sample reuse method with applications. J. Am. Statist. Assoc. 70, 320-38,350. 66 Bibliography^ 67 [13] GEISSER, S.(1980a). A predictivist primer. Bayesian Analysis in Econometrics and Statistics: Essays in Honor of Harold Jeffreys. 363-81. ed. A. Zellner. Amsterdam: North-Holland. [14] GEISSER, S.(1980b). Predictive sample reuse techniques for censored data. Bayesian Statistics. 433-68. eds. J.M. Bernardo et. al., Valencia: University Press [15] GEISSER, S. & EDDY, W.F.(1979). A predictive approach to model selection. J. Am. Statist. Assoc. 74, 153-160. [16] GELFAND, A.E. Si SMITH, A.F.M.(1988). Sampling based approaches to calculating marginal densities. Technical report: Nottingham Statistics Group. University of Nottingham. [17] GEMAN, S. & GEMAN, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-41. [18] GENEST, C. & MACKAY, R.J.(1986a). Copules Archimediennes et families de lois marges sont donnees. Can. J. Stat. 14, 145-59. [19] GENEST, C. & MACKAY, R.J.(1986b). The joy of copulas: bivariate distributions with given marginals. The American Statistician 40 280-83. [20] HOLT, J.D. & PRENTICE, R.L.(1974). Survival analysis in twin studies and matched-pair experiments. Biometrika 61 17-30. [21] HOUGAARD, P.(1984). Life table methods for heterogeneous populations: Distributions describing the heterogeneity. Biometrika 71, 75-83. [22] HOUGAARD, P.(1986a). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73, 387-96. [23] HOUGAARD, P.(1986b). A class of multivariate failure time distributions. Biometrika 73, 671-78. [24] JOE, H.(1991). Parametric families of multivariate distributions with given margins. Technical report: Department of Statistics. University of British Columbia. [25] KALBFLEISH, J.(1978). Non-parametric Bayesian analysis of survival time data. J. R. Statist. Soc. B 40, 214-21. [26] KALBFLEISH, J. & PRENTICE, R.L.(1980). The Statistical Analysis of Failure Time Data New York: Wiley. Bibliography^ 68 [27] KANTER, M.(1975). Stable densities under change of scale and total variation inequalities. Annals of Prob. 3, 697-707. [28] KASS, R.E. & STEFFEY, D.(1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Am. Statist. Assoc. 84, 717-26. [29] KENDALL, M.G.(1938). A new measure of rank correlation. Biometrika 30, 81-93. [30] LAWLESS, J.F.(1982). Statistical Models and Methods for Lifetime Data New York: Wiley. [31] LEHMANN, E.L.(1966). Some concepts on dependence. Ann. Math. Statist. 37, 1137-53. [32] LINDLEY, D.V. & SINGPURWALLA, N.D.(1986). Multivariate distributions for the life lengths of components of a system sharing a common environment. J. Applied Prob. 23, 418-31. [33] MANTEL, N., BOHIDAR, N.R. & CIMINERA, J.L.(1977). Mantel-Haenszel analyses of litter matched time-to-response data, with modifications for recovery of interlitter information. Cancer Res. 37, 3863-68. [34] MARSHALL, A.W. & OLKIN, I.(1988). Families of multivariate distributions. J. Am. Statist. Assoc. 83, 834-41. [35] MARTZ, H.F. & WALLER, R.A.(1982). Bayesian Reliability Analysis New York: Wiley. [36] OAKES, D.(1986). A model for bivariate survival data. Modern Statistical Methods in Chronic Disease Epidemiology eds. R.L. Prentice and S.H. Moolgavkar. New York, Wiley. [37] OAKES, D.(1989). Bivariate survival models induced by frailties. J. Am. Statist. Assoc. 84, 487-93. [38] SCHWARZ, G.(1978). Estimating the dimension of a model. Ann. Statist. 6, 461-64. [39] SMITH, A.F.M. & SPIEGELHALTER, D.J.(1980). Bayes factors and choice criteria for linear models. J. R. Statist. Soc. B 42, 213-20. [40] SPEARMAN, C.(1904). The proof and measurement of association between two things. Am. J. of Psychology 15, 72-101. [41] STACY, E.W.(1962). A generalization of the gamma distribution. Ann. Math. Statist. 33, 1187-92. Bibliography^ 69 [42] STONE, M.(1974). Cross-validatory choice and assessment of statistical predictions (with discussion). J. R. Statist. Soc. B 36, 111-47. [43] STONE, M.(1977). An asymptotic equivalence of choice of model by cross-validation and Aikake's criterion. J. R. Statist. Soc. B 39, 44-47. [44] TIERNEY, L. & KADANE, J.B.(1986). Accurate approximations for posterior moments and marginal densities. J. Am. Statist. Assoc. 81, 82-86. [45] VAUPEL, J.W., MANTON, K.G. & STALLARD, E.(1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16, 439-54. [46] WILD, C.J.(1983). Failure time models with matched data. Biometrika 70, 633-41.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Hierarchical modelling of multivariate survival data
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Hierarchical modelling of multivariate survival data Gustafson, Paul Antony 1991
pdf
Page Metadata
Item Metadata
Title | Hierarchical modelling of multivariate survival data |
Creator |
Gustafson, Paul Antony |
Date Issued | 1991 |
Description | Hierarchical models based on conditional independence are investigated as a means of modelling multivariate survival times. The model structure follows Clayton (1978), Hougaard (1986b), and Oakes (1986, 1989). Both approximate Bayesian and maximum likelihood estimation in these models is investigated via simulation. Predicting a component of a response vector on the basis of other components of the response and other vectors is also studied, using Bayes and empirical Bayes methods. Application to real data is detailed, using the techniques of estimation and prediction discussed. |
Extent | 3398991 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2008-09-05 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0086471 |
URI | http://hdl.handle.net/2429/1649 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1992-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1992_spring_gustafson_paul.pdf [ 3.24MB ]
- Metadata
- JSON: 831-1.0086471.json
- JSON-LD: 831-1.0086471-ld.json
- RDF/XML (Pretty): 831-1.0086471-rdf.xml
- RDF/JSON: 831-1.0086471-rdf.json
- Turtle: 831-1.0086471-turtle.txt
- N-Triples: 831-1.0086471-rdf-ntriples.txt
- Original Record: 831-1.0086471-source.json
- Full Text
- 831-1.0086471-fulltext.txt
- Citation
- 831-1.0086471.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0086471/manifest