HIERARCHICAL MODELLING OF MULTIVARIATE SURVIVAL DATAByPaul Antony GustafsonB. Sc. (Mathematics) University of British ColumbiaA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF STATISTICSWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIA1991© Paul Antony Gustafson, 1991In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.(SignatureDepartment of ^StatisticsThe University of British ColumbiaVancouver, CanadaDate^Dec. 9, 1991.DE-6 (2/88)AbstractHierarchical models based on conditional independence are investigated as a means ofmodelling multivariate survival times. The model structure follows Clayton (1978),Hougaard (1986b), and Oakes (1986,1989). Both approximate Bayesian and maximumlikelihood estimation in these models is investigated via simulation. Predicting a com-ponent of a response vector on the basis of other components of the response and othervectors is also studied, using Bayes and empirical Bayes methods. Application to realdata is detailed, using the techniques of estimation and prediction discussed.iiTable of ContentsAbstract^ iiList of Tables^ viList of Figures^ viiAcknowledgement viii1 Introduction 12 Models and Techniques 32.1 Conditionally Independent Hierarchical Models ^ 32.2 Hazard Rate and Frailty ^ 42.3 Multivariate Frailty Model as a CIHM^ 62.4 The Weibull Model ^ 92.5 Properties as a One-Stage Model ^ 102.6 Bayesian Inference ^ 112.7 Using Laplace's Method to Approximate Posterior Quantities ^ 123 Estimation and Simulation 143.1 Estimating the Common Parameter ^ 143.2 Prior Information ^ 153.3 Reparameterizat ions 153.4 Weibull/Gamma Simulation Study ^ 17iii3.5 Weibull/Positive Stable Simulation Study^ 204 Prediction 244.1 Parametric Empirical Bayes Approach to Prediction ^ 254.2 Full Bayesian Approach to Prediction ^ 264.3 The PEB Approach in the Weibull Model 274.4 The Full Bayesian Approach in the Weibull Model ^ 284.5 Full versus PEB Prediction ^ 294.5.1^Exponential Prediction Example^ 304.6 Prediction in the Weibull/Gamma Model 324.7 Prediction in the Weibull/Positive Stable Model ^ 334.8 Effect of Frailty Distribution on Prediction 365 Application 395.1 Description of Rat-Litter Data ^ 395.2 Single Risk versus Competing Risks 395.3 Model Adequacy Checking ^ 405.4 Single Risk : Tumour Development ^ 425.5 Single Risk : Death due to Other Causes 445.6 Competing Risks : Single Frailty ^ 445.7 Competing Risks : Two Frailties 455.8 Including Covariates ^ 475.9 Assessing Model Choice 486 Discussion 536.1 Approaches to Frailty Models ^ 536.2 Specification of Distributions 54iv6.3 Techniques ^ 566.4 Assessing Models ^ 57Appendices^ 60A Quadrant Dependence of Frailty Models^ 60B Female Rat Litter Data^ 61C Male Rat Litter Data^ 62D Performance of Weibull/Gamma Model on Diagnostic Plot^63E Derivatives of Survivor Function in Two-Frailty Model^64Bibliography^ 66vList of Tables3.1 Comparison of estimators for 99 simulated samples of 50 observations inthe Weibull/gamma model ^ 183.2 Comparison of estimators for 100 simulated samples of 200 observationsin the Weibull/gamma model ^ 193.3 Comparison of estimators for 100 simulated samples of 50 observations inthe Weibull/positive stable model ^ 213.4 Comparison of estimators for 100 simulated samples of 200 observationsin the Weibull/positive stable model ^224.5 Probabilities of intervals for log L in exponential example ^325.6 Estimation with rat—litter data : single risk (tumour) ^425.7 Estimation with rat—litter data : single risk (other causes) ^445.8 Estimation with rat-litter data : competing risks, single frailty ^465.9 Estimation with rat-litter data : competing risks, two frailties ^485.10 Estimation with rat-litter data : single risk (other cause), proportionalhazards model ^ 495.11 Predictive scores ^51viList of Figures4.1 Predictive density ratio as a function of Y (exponential example)^. . . . 314.2 Approximate full and empirical predictive densities (gamma frailty)^. . . 344.3 Histograms of observed predictive density ratios: approximate full versusempirical Hayes ^ 354.4 Approximate full and empirical predictive densities (stable frailty) . . . 375.5 Diagnostic plots based on Kaplan-Meier estimates of the marginals^.^. 435.6 Histograms of predictive density ratios :^positive stable frailty versusgamma frailty ^ 50viiAcknowledgementI would like to thank Mohan Delampady for his guidance and supervision throughoutthe development of this thesis. Additionally, I am grateful to Harry Joe and James Zidekfor their careful reading of the manuscript, and their subsequent suggestions. Finally, Iwould like to thank my wife, Reka Vasarhelyi, for her continued patience and support.viiiChapter 1IntroductionOften in experimental sciences and engineering, the time taken for objects or entities tofail or die is studied. The statistical methods and models used to interpret the resultantdata are commonly referred to under the broad heading of survival analysis. Kalbfleish& Prentice (1980), Lawless (1982), and Cox Si Oakes (1984) cover the main ideas ofsurvival analysis and act as guides to the large body of relevant literature.A small fraction of this literature deals with survival analysis within the Bayesianparadigm. Some such works are Cornfield & Detre (1977), Martz & Waller (1982),Kalbfleish (1985), and Gammerman & West (1987). Another small fraction is concernedwith multivariate survival analysis. In this context, by multivariate we mean situationswhere several survival times somehow belong to a single cohesive unit or family. Holt &Prentice (1974) and Wild (1983) are two works in this category. Also related are works onthe reliability of systems of several components, such as Lindley & Singpurwalla (1986).The literature is much sparser in the area of intersection of these two categories (whichwe can call Bayesian multivariate survival analysis). In fact, there is apparently no litera-ture dealing with multivariate survival from an explicitly Bayesian perspective. However,the type of multivariate survival models developed and used by Clayton (1978), Hougaard(1986b), and Oakes(1986,1989), though developed from a frequentist perspective, havean implicit hierarchical Bayes structure. We aim to bring this Bayesianity to the fore inour investigation of these models. For completeness, it should be noted that there arehierarchical structures for multivariate survival other than the one we will investigate.1Chapter 1. Introduction^ 2For an example, see Lindley & Singpurwalla (1986).To meet our aim, in Chapter 2 we introduce the model from the point of view ofa hierarchical model as well as from a frailty model perspective. We also introducerelevant aspects of Bayesian inference. Chapter 3 is concerned with inference, usingsimulation to examine techniques. Then, Chapter 4 focusses on the problem of predictinga survival time given the survival times of other components of the same unit. The ideasof Chapters 3 and 4 are put to use in the analysis of a real data set, contained in Chapter5. The data is lifetimes of rats—for each of a number of litters, a multivariate response(lifetimes of litter members) is observed. The hierarchical frailty model is employed asan attempt to account for factors common to a litter. The situation is complicated bythree possible mechanisms for terminating individual observations (tumour appearance,sacrifice, natural death), as well as a treatment suspected to promote tumour appearance.Finally, Chapter 6 attempts to address some of the issues raised in the first five chapters.Chapter 2Models and Techniques2.1 Conditionally Independent Hierarchical ModelsA Bayesian model is hierarchical if there are several stages of information. A two-stagemodel would involve a response Y coming from a density f (y 10). In turn 0 would comefrom a density g(0 I A). In general, 0 would be referred to as the parameter, while A is thehyperparameter. The distribution g on 0 could represent prior information or structuralinformation. Additional prior information could be specified in the form of a completelyspecified distribution on A. Alternately, A could be estimated from the data. Usually,the investigator would wish to make inferences about the parameter 0, although there isnothing to prevent making inferences about the hyperparameter A.Kass & Steffey (1989) formalize the class of conditionally independent hierarchicalmodels (CIHM's) as follows. Say we have observation vectors Y of length n i , i =1,... , k,parameterized by 0, E 0, i = 1, . . . ,k and by A E A, where 0 and A have dimension qand r respectively. Then we have a CIHM if:Stage 1. conditionally on (01 ,^, 0k) and A, the Y are independent with densities f (yi0i , A) (i = 1,... , k) belonging to a family { f(y 10, A) : 0 E 0, A E A};Stage 2. conditionally on A, 0 1 ,^, 01, are independent and identically distributed with den-sity from {g(0 A) : A E A}.We can think of observing k units or families, with the i th unit consisting of n, compo-nents or individuals. Then the 0,'s are unit-specific parameters, while A is a parameter3Chapter 2. Models and Techniques^ 4common to all units. A fully Bayesian approach to this model would incorporate a priordistribution on A. Then given data, one could in principle determine the joint posteriordistribution of any subset of (0, A) = (0 1, , Ok, A 1 , , Ar )•Of course such a model can be collapsed to a one-stage model parameterized by A.Then Y =^, Yk ) has density given by:kP(Y A)^Et=if (y2 10i, A)^ (2.1)where the expectation is with respect to the distribution on O i , and thus depends on A.If we were only interested in the 02 's, we could take a parametric empirical Bayes (PEB)approach. This would involve finding a maximum likelihood estimate ,\ by maximizing(2.1), and then proceeding as if A were the true value of A. This approach is discussedin Kass & Steffey (1989). Of course if the n i 's are small, we cannot hope to get usefulinformation about individual Oi 's. In this case we can only hope for information about A(providing k is suitably large). So in some cases we can only expect to get informationabout the distribution of the unit-specific parameters, rather than information about theparameters themselves.Before we employ CIHM's, we wish to introduce the notion of frailty. Hopefully, thiswill lend a physical interpretation to the model structure.2.2 Hazard Rate and FrailtyOften when working with distributions on the positive reals, and especially when studyingsurvival times, distributions are characterized by their hazard rate or hazard function. Ifa continuous non-negative random variable has distribution function F, then its hazardrate h is given byh(t) = --ddt log[l — F(t)1Chapter 2. Models and Techniques^ 5A common method of including covariates in a hazard rate model was introduced by Cox(1972). If an individual has observed covariate vector z then his hazard rate is assumedto beh(t I z) = g(z t ,3)ho (t)where 0 is an unknown parameter vector, g is a fixed function (usually the exponentialfunction), and h0 is an unknown function, the baseline hazard.Modelling concomitant variables, as above, is a way to account for heterogeneity ina population. However, in many situations it is reasonable to expect that a populationhas sources of heterogeneity not accounted for by covariates. For this reason, Vaupel,Manton & Stallard (1979) introduced an unobservable positive scalar quantity, frailty, ina hazard rate model. If an individual has frailty 0, then his hazard is modelled ash(t 10) = Oh o (t)where again ho is a baseline hazard common to a population. As in the Cox model,the effect on the hazard is multiplicative. So one can think of frailty as an unobservedcovariate. (To be consistent with the Cox model, frailty would be a function of anunobserved covariate vector.) A model with covariates and frailty is also reasonable,both would have a multiplicative effect on the baseline hazard.If one makes no assumptions about the frailty of individuals in a population, it is un-reasonable to expect to make inferences about the population. Hougaard (1984, 1986a)suggests assuming that within a population, frailty is distributed as some parametricfamily. A gamma distribution is mathematically convenient, and has been used in sev-eral studies (e.g. Clayton, 1978). Hougaard presents arguments in favour of a stabledistribution of frailties. Other distributions could be used; however, as we shall see,only distributions with closed form Laplace and inverse Laplace transforms (such as thegamma and positive stable distributions) will be computationally feasible.Chapter 2. Models and Techniques^ 6It is also clear that if the frailties in a population are all independent realizationsfrom a distribution then unless very strong assumptions are made, it will be difficult toidentify the frailty distribution in the presence of the baseline hazard, and vice-versa.There are, however, situations where one expects that certain individuals will sharea common frailty. For instance it would be reasonable to assume that members of afamily have the same frailty, and that conditionally on this frailty the family memberlifetimes are independent. In such a context the frailty might be a function of genetic andenvironmental factors (Hougaard, 1986a). With two or more individuals per frailty value,it becomes reasonable to estimate both the frailty distribution, and the (parameterized)baseline hazard.2.3 Multivariate Frailty Model as a CIHM.A model with observations on families, as mentioned above, fits into the framework of aCIHM. If Y (Yii , , Yin.) is the vector of lifetimes from the i t h family, then we takeO to be the family frailty. We partition A = (A(a), AM) so that A(a) parameterizes thefrailty distribution, while A( b) parameterizes the baseline hazard.In general we might expect to have several baseline hazards; that is, we might expectour observations to be classified as coming from various groups. For instance in a clinicaltrial we might have a treatment group and a control group, or with data on fathers andsons we would treat fathers as one group and sons as another group. If we have p groups,then we can specify the first stage of the model by specifying functions h 1 , , hp suchthat Yi3 I O has hazardOihk(yij I A (1)) )^ (2.2)when^belongs to the kth group. Note that since 0, is positive, (2.2) will be a validhazard function precisely when hk is a valid hazard function.Chapter 2. Models and Techniques^ 7Because we will be dealing with survival times and failure times, we wish to allow forright-censored observations. In the usual way, we introduce an indicator d ij taking value0 if Yij is censored at yij , and 1 if Yij = yij is observed. We assume that given a family'sfrailty, censoring arises in a way that makes the likelihood of independent observations aproduct of densities and survivor functions. This is true under fairly weak conditions—for a discussion, see Lawless (1982, pp. 31-44). Then, suppressing the dependence of thehk 's on Alb), the first-stage likelihood iskL(0, A I y,^=^L(0i ,^yi , di )^ (2.3)i=iwhereL(0i , A I yi, di ) = [eihk,,(yij)exp(-0iHkti (yij))]d.[exp(-0iHk ,, (NM] }^(2.4)in which kij denotes the group in which Yij belongs, while Hk (y) = foY h k (t)dt. Simplifyingand integrating with respect to the Oi 's gives the unconditional (collapsed) likelihoodL(A I y, d) =^[ H h (y id E),91(a) [O iDs exp(—O illi(yi))]^(2.5)i=1^{i:d,,=1}where Di = E; t i dij , and Hi(yi) = E 7.1` 14,(Yij)•One possibility for the second-stage is that the O i 's come from a gamma distribution.Note that in this case, (2.5) is relatively easy to evaluate. However, if the gamma distri-bution has unconstrained shape and scale parameters, one might expect an identifiabilityproblem with the likelihood (2.5) (this is certainly the case with the Weibull model of thenext section). A natural restriction is to fix the mean frailty at 1. This is the approachtaken by Clayton (1978). Hence an individual with average frailty has baseline hazard,and scale is accounted for in the first stage of the model. We choose to parameterize thegamma frailty distribution so that A(a) = A o , and the frailty density is"g(0) =^•T^r 0"' exp(—a0)^ (2.6)Chapter 2. Models and Techniques^ 8where a = —(log A 0 ) -1 , and Ao E (0,1]. This parameterization has the advantage that,Xo lies in a finite interval, with the important special case of unconditional independencewithin families occuring at an endpoint (A 0 = 1). It can also be shown that the limitingcase as A o decreases to 0 is the Frechet upper bound (perfect positive dependence) (Oakes,1989). Assuming (2.6), (2.5) can be evaluated using[0D exp(—H0)] — F(D + a) ^as(2.7)F(a) (H cr)(D+a)where again a = —(logHougaard's suggestion is a positive stable distribution of frailties. These distributionsare characterized by their Laplace transforms (moment generating functions). Specifi-cally,E°0 [exp(—sO)] = exp(-3 )0 ), Ao E (0, 1]^(2.8)The stable distributions have long right-tails. In fact, when A o < 1, the kth moment ofthe variate defined in (2.8) exists if and only if k < Ao , and so in particular, the mean isinfinite. The density is complicated, given byg(0 1Ao) = ^ruc k, + 1) ( 0 -A0 ) k sin(A0 kr)r k=1^k!A good discussion of stable variates is found in Feller (1971, pp. 169-76).As in the gamma case, A o = 1 means the distribution is degenerate at 0 = 1, andwe have independence within families, while as A o decreases to 0, we obtain the Frechetupper bound (Oakes, 1989). We can evaluate (2.5) by differentiating (2.8) D times withrespect to s. The differentiation gives a recursive form for (2.5) (Hougaard, 1986b). Inparticular,Et [OD exp(-1-16)1 = (AoH)D exp(—HA0 )R(A0 , D, H)^(2.9)where{ 1^if D = 0R =EDm=-01 cD7m H-mAo otherwiseChapter 2. Models and Techniques^ 9andrn = 0 ;k -1,m + C k-lon-10 — 1A0 1 — (k — m)] m =1,...,k — 2 ;-k^x lr(k-Ao) m = k —1 .,, c)^r(1-A0)A potentially important difference between gamma frailty and positive stable frailtyhas been noted by Hougaard (1984,1986a). If a population has frailty distribution givenby density g(0), then one can consider the distribution of frailties amongst those indi-viduals living at time t > 0. We denote this distribution gt (0). For any second-stagedistribution, with gamma frailty the coefficient of variation of gt is constant in t, whilewith positive stable frailty, the coefficient of variation is decreasing in t. (Hougaard(1984) suggests that a normalized measure of variability, like the coefficient of variationof the frailty distribution, is an appropriate measure of heterogeneity.) So, in this sense,the population becomes more homogeneous with time under stable frailty, but not undergamma frailty. Since it seems sensible that the sub-population of long-lived individualsbe more homogeneous than the whole population, this supports the use of stable frailtyrather than gamma frailty.So, gamma and stable distributions are candidates for the second stage of the model.A possible first stage is suggested in the next section.2.4 The Weibull ModelA useful parametric family for survival times is the Weibull distributions. FollowingHougaard (1986b), we introduce a two-group model, with groups 1 and 2 being treatmentand control groups respectively. We take A (b) = (A i , A2, A3 ), and specify the conditionalhazards byhl (y I Al, A3) = A1(A3YA3-1)C k,m =Chapter 2. Models and Techniques^ 10h2(y I A2) A3) = A2(A3Y A3-1) (2.10)So the groups have Weibull conditional hazards, with common shape parameter, but dif-ferent scale parameters. Under both gamma and stable frailty, this leads to an identifiableunconditional likelihood.2.5 Properties as a One-Stage ModelThe properties of a frailty model treated as a multivariate distribution have been studiedby several authors. Often they arise in the context of bivariate distributions with givenmarginals (these can be classified by bivariate distributions with uniform marginals, oftenreferred to as copulas—see Genest and MacKay, 1986a,b).If the marginal survivor functions of T1 and T2 are specified as S1 and S2 respectively,then, as noted in Oakes (1989), the joint survivor function S(t) = P(T1 > t i , T2 > t 2 ) isgiven byS(t) = p[q(Si (i i)) + q(S2(t2))] (2.11)where p is the Laplace transform of the frailty density, and q is the inverse function of p.Oakes (1989) points out that this is a subclass of the Archimedian bivariate distributionsconsidered by Genest & MacKay (1986a,b). For another generalization, see Marshall &Olkin (1988).There are of course a number of possible measures of the association between T1 andT2. These measures tend to classify the association as positive, in the sense that T 1and T2 have a tendency to be simultaneously large or simultaneously small. A usefulcharacterization is the notion of quadrant dependence given in Lehmann (1966). (T1 , T2 )are positively quadrant dependent ifS(t) ? S1(t1)S2(t2) Vtl) t2^ (2.12)Chapter 2. Models and Techniques^ 11It is possible to show that if the log of the Laplace transform of the frailty density isconvex then the bivariate frailty model is positively quadrant dependent for any choiceof second-stage distributions (Appendix A). From (2.7) and (2.8) we see this convexitycondition holds for both gamma and positive stable frailty. By results in Lehmann (1966),this implies that the following measures of dependence/association are non-negative 1 :Cov(Ti , T2 ) (provided it exists), Kendall's T (Kendall, 1938), Spearman's ps (Spearman,1904), and Blomqvist's quadrant measure (Blomqvist, 1950).Oakes (1989) suggests using Kendall's 7 in frailty models as it depends only on thesecond-stage (frailty) distribution. The definition isT= E {sign [ (e) - T(2))(e) - Ta2))11where T( i)(i = 1, 2) are independent copies of T. Genest & MacKay (1986b) show thata model of the form (2.11) satisfieso o7 = 4 f sp(s)p"(s)ds —1o(2.13)With our parameterizations, under gamma frailty T = (2a + 1) -1 , where a = —(log A0 ) -1 ,while under positive stable frailty, T = 1 — Ao (Oakes, 1989). Estimating T from data andsubsequently estimating the frailty distribution parameter is discussed in Oakes (1986).The same author suggests a method of assessing the frailty distribution choice withoutspecifying conditional hazards (Oakes, 1989).2.6 Bayesian InferenceOf course from a one-stage model perspective, we can only make inferences about A. AnyBayesian inference about A will involve finding the posterior distribution of A. If the prior1 This is actually a special case of a result in Genest & MacKay (1986a) concerning a stochasticordering (concordance ordering) on copulas. Our case is obtained by comparing our copula to the inde-pendence copula. The condition on the log of the Laplace transform can be weakened to superadditivity.Chapter 2. Models and Techniques^ 12information about A is given by distribution 11(A), then the posterior is given byII(A I y) = ^[11 /; i L(A I yl)] 11(A)—f [Tric=i-L(A I yi)] 11(A)dAThe mean and variance of a component Ai with respect to the posterior are known as theposterior mean and posterior variance of Ai, and are good summaries of the investigator'spost-experimental knowledge about A i . Alternatively, the posterior distribution of A i canbe computed according toli(ai I Y) =^/11(a I y)dAiThis distribution can be plotted for a visual display of the knowledge of A i . Of course ifA were 1 or 2-dimensional, then the complete posterior could be plotted. Unfortunately,this is seldom the case. Bayesian analysis commonly requires the use of approximationtechniques.2.7 Using Laplace's Method to Approximate Posterior QuantitiesTierney & Kadane (1986) discuss the use of Laplace's method to approximate posteriorquantities. For example if a model with k observations is parameterized by vector A,with log-likelihood £(X) and prior r(\), then the posterior mean of a function g(\) canbe writtenf exp(kL*(A))dA E(g(A) I data) =f exp(kL(A))dAwhere L(A) = k -1 (log7r(A) L(A)) and L*()) = k -1 (logg()) + log r(A) £(A)). TheLaplace approximation amounts to replacing both numerator and denominator integrandswith second-order Taylor approximations about their respective global maximas. ThisChapter 2. Models and Techniques^ 13results in integrals of unnormalized multivariate normal density functions. The approxi-mation becomesE(g(\) I data) = (det Edet* 1/2) exp{k(L*(A*) — L(A))1Ewhere A and A. maximize L and L* respectively, and E and E* are the respective negativeinverse Hessians evaluated at these maxima. The thrust of Tierney and Kadane's argu-ment is that when g is a positive function, the numerator and denominator integrandsare of similar shape. Thus the two Laplace approximations make similar errors, and soin taking the ratio the error is reduced. In fact while a single integral approximation has0(k -1 ) error, the ratio approximation has 0(k') error.This method could be used to estimate the posterior means and variances of compo-nents of ) in our model, using the likelihood in (2.5). Note that this method requires onlymaximization of slightly modified likelihood functions, and so should be feasible when-ever maximum likelihood estimation is feasible. In general, the method can be adaptedto estimate marginal posterior densities.Laplace's method will be used to approximate posterior means in the simulated datasets of the next chapter.Chapter 3Estimation and Simulation3.1 Estimating the Common ParameterAs mentioned previously, when family size is small, as in a bivariate model, we seekinformation about the common parameter A. Using the unconditional likelihood (2.5),there are several possible methods of point estimation of A. The most common wouldbe maximum likelihood estimation; however, with a prior distribution on A, Bayesianestimation is more general and appropriate. Neither method will produce closed-formestimators, given the form of the likelihood (the univariate Weibull model with censoringhas no closed-form MLE). However, using an iterative procedure, the MLE should notbe hard to obtain. Possibly, computation of posterior means would be more difficult, butapproximating posterior means using Laplace's method should not be much harder thanfinding the MLE, given the similarity in method.We propose to simulate data from the Weibull model of section 2.4, with one memberof each group per family. Hougaard (1986b) found maximum likelihood estimates in thepositive stable frailty case, using a heavily censored trivariate sample with 50 observations(the sample is lifetimes of female rat litters, with one treated sibling per litter, found inMantel, Bohidar & Ciminera, 1977). Using a quasi-Newton minimization routine, wewere able to reproduce Hougaard's results. The routine was also sucessfully applied tothe same data through the gamma frailty model. These estimates were used to choosehopefully realistic parameter values for simulation. Analytic derivatives of the likelihood141 A 1: 1 A2 1 .16 2 Ao E (0,1], Ai > 0 (j = 1,2,3)11 2 (A) = otherwiseChapter 3. Estimation and Simulation^ 15are complicated in both models; for simplicity, we use the derivative-free routine in oursimulations. We found in general that we could maximize the likelihood (and the slightlymodified likelihoods needed for Laplace's method) fairly routinely. Of course, to doBayesian estimation, we must also specify prior distributions.3.2 Prior InformationRecognizing our relative paucity of prior information, we use vague priors on A. Onesuch prior is111(A) =^1 Ao E (0,1], Aj > 0 (j = 1,2,3)0 otherwiseWe note from (2.10) that A i-1 / A3 is a scale parameter for Y 1 and similarly A2 11A3 is ascale parameter for Y2 (a conditional scale factor is also an unconditional scale factor).Computing the necessary Jacobian, we see that the scale invariant prior will beWe try both priors in the simulation study.3.3 ReparameterizationsThere are of course an infinite number of possible reparameterizations of our model. Thepoint maximizing the likelihood is invariant under transformation of the parameter space.The posterior mean of a parameter in a given parameterization can be computed usingan arbitrary parameterization for computation. For instance suppose we are interested inthe posterior mean of A0 , but we parameterize using i = g(A). The likelihood becomesL'(K) = L(g -1 (!c)). Our prior becomes 11'(K) = 11(g -1 (K)) 1'i/7,g -1 001 with respect towhich we want to find the posterior mean of (g -1 (K)) 0 . So reparameterization just inducesChapter 3. Estimation and Simulation^ 16a change of variables in integration; the true posterior mean is unaffected. However, whenusing Laplace's method, the Taylor approximations to the integrands could be better withone parameterization rather than another. So, a prudent choice of transformation g couldpotentially improve the estimation, when using approximate posterior means.In the Weibull model, the parameter space of the parameterization of interest, istruncated. (A 0 E (0, 1], A > 0, j = 1, 2, 3). However, the approximating integrals inLaplace's method are over all of 7?.". Perhaps choosing a parameterization that simi-larly has support on 7?" might improve the approximation. To this end we introducereparameterization 1:which has Jacobian/co = — log(— log AOIC1 = log A itz2 = log A2K3 = log A3(3.14)cjg -1 (10 = exp ('Cl + K2 + K3 — 'CO — C") A second reparameterization was used by Hougaard (1986b) in performing maximumlikelihood estimation in the positive stable model:'CO = A0AI X°K2 = A2 A°(3.15)for which the Jacobian isK3 AO A3(—)3 (K1K2) 11"-1KOWe try using the original parameterization and the two reparameterizations for compu-tation in the simulation study.Chapter 3. Estimation and Simulation^ 173.4 Weibull/Gamma Simulation StudyThe obvious way to simulate data from the Weibull Model is to generate each family'sfrailty independently from the frailty distribution, and then generate the lifetimes withina family as independent Weibull variates with parameters depending on the frailty. So inthe gamma frailty case, it is a straightforward exercise in simulating gamma and Weibullvariates. We fix the parameter value at A = (.6, 1.0, 1.0, 3.0), and generate 100 samplesof size 50 each 1 and 100 samples of size 200 each (the samples of size 200 are extensionsof the samples of size 50). The parameter value was chosen on the basis of being closeto preliminary parameter estimates for the data analyzed in Chapter 5.We treat the original parameterization as the parameterization of interest. Using eachparameterization, we compute the posterior mode of A, and the approximate posteriormeans of the components of A. Note that the MLE will be posterior mode of A usingthe original parameterization and prior 11 1 . For each of the six estimators (and fourcomponents of A), we report the mean of the 100 estimates. Also, as a realized loss inestimation, we report the sums of squares of the 100 deviations between parameter andestimate. The results are displayed in Tables 3.1 and 3.2.The simulations with sample size 50 indicate that the approximate posterior meanswith respect to 11 2 outperform those computed with respect to 11 1 , in terms of realizedloss. The only exceptional cases occur with component A 0 , using the original parameteri-zation and reparameterization II. The difference in loss is most noticeable in componentsA l and A2 using the original parameterization and reparameterization I. Using reparam-eterization II seems to slighty improve the performance of the approximate posteriormeans; however, the estimates do not seem to be particularly sensitive to the selectionbetween the original parameterization and reparameterization I. The performance of thelOne sample of size 50 produced a likelihood which our routine could not maximize; therefore, resultsare provided based on the other 99 samples.Chapter 3. Estimation and Simulation^ 18Table 3.1: Comparison of estimators for 99 simulated samples of 50 observations in theWeibull/gamma bivariate model with parameter vector A = (0.6,1.0,1.0,3.0)EstimatorAo (0.6)Mean LossAl (1.0)Mean LossA2 (1.0)Mean LossA3 (3.0)Mean LossORIG. PARAM.Prior IApprox. Post. Mean .564 1.56 1.14 10.84 1.14 7.67 3.16 12.78Posterior Mode (MLE) .623 1.74 1.01 5.67 1.01 3.60 3.05 9.65Prior IIApprox. Post. Mean .608 1.61 1.04 6.10 1.04 3.97 3.05 9.30Posterior Mode .663 2.08 .932 4.65 .934 3.10 2.95 8.72REPARAM. IPrior IApprox. Post. Mean .561 1.49 1.16 11.55 1.16 8.28 3.17 12.92Posterior Mode .560 1.42 1.12 9.28 1.12 6.37 3.16 12.31Prior IIApprox. Post. Mean .599 1.26 1.05 6.32 1.05 4.12 3.06 9.25Posterior Mode .599 1.20 1.02 5.58 1.03 3.54 3.06 8.92REPARAM. IIPrior IApprox. Post. Mean .592 .809 1.09 6.92 1.09 4.65 3.11 9.41Posterior Mode .495 3.74 1.18 17.84 1.18 12.71 3.26 22.07Prior IIApprox. Post. Mean .617 1.02 1.02 5.04 1.02 3.26 3.02 7.76Posterior Mode .556 2.65 1.04 7.85 1.04 5.03 3.12 13.57Chapter 3. Estimation and Simulation^ 19Table 3.2: Comparison of estimators for 100 simulated samples of 200 observations inthe Weibull/gamma bivariate model with parameter vector A = (0.6,1.0,1.0, 3.0)EstimatorAo (0.6)Mean LossAl (1.0)Mean LossA2 (1.0)Mean LossA3 (3.0)Mean LossORIG. PARAM.Prior IApprox. Post. Mean .590 .318 1.04 1.56 1.04 1.39 3.05 2.37Posterior Mode (MLE) .606 .328 1.01 1.29 1.01 1.14 3.02 2.16Prior IIApprox. Post. Mean .600 .308 1.02 1.34 1.02 1.18 3.02 2.15Posterior Mode .616 .350 .990 1.20 .989 1.06 2.99 2.09REPARAM. IPrior IApprox. Post. Mean .590 .317 1.04 1.57 1.04 1.40 3.05 2.37Posterior Mode .591 .309 1.03 1.48 1.03 1.32 3.04 2.34Prior IIApprox. Post. Mean .600 .307 1.02 1.34 1.02 1.18 3.02 2.15Posterior Mode .601 .300 1.01 1.29 1.01 1.14 3.02 2.13REPARAM. IIPrior IApprox. Post. Mean .591 .314 1.04 1.56 1.04 1.39 3.04 2.35Posterior Mode .581 .387 1.03 1.55 1.03 1.38 3.06 2.57Prior IIApprox. Post. Mean .601 .305 1.02 1.34 1.02 1.18 3.02 2.14Posterior Mode .592 .357 1.01 1.34 1.01 1.18 3.03 2.29Chapter 3. Estimation and Simulation^ 20MLE is relatively good, except in component A o . Empirically then, one might prefer touse approximate posterior means when the frailty distribution parameter is of primaryinterest. Not surprisingly, with sample size 200, there is little difference in the estimators.3.5 Weibull/Positive Stable Simulation StudyThe positive stable frailty simulations were carried out in exactly the same fashion asin the gamma case. The true parameter value was fixed at A = (.9, 1.0, 1.0, 3.0) (again,this is based on estimated values from the data in Chapter 5). To simulate positivestable random variates, a formula due to Kanter (1975) was used. The stable variate 0is realized according toA-1-1[a(X01 °0 = X2where X1 and X2 are independent; X1 is uniform on [0,11], while X2 has a standardexponential distribution. The function a is given byA0 sin[(1 — ) 0 )x] {sin(A0x)} 1-A0{sin x}i - A0This is a special case of the method for general stable variates presented in Chambers,Mallows & Stuck (1976). The simulation results are presented in Tables 3.3 and 3.4.As previously, the simulations with sample size 50 indicate that using 11 2 rather thanH i improves the performance of the approximate posterior means, in terms of realizedloss. However, it seems there is less sensitivity to reparameterization than in the gammafrailty case. The MLE again performs well in components A i , A2, A3, but has higherrealized loss than approximate posterior means in component A o . The results with samplesize 200 indicate very little difference between the various estimators.The simulated samples and the resultant estimates are put to use in the study of aprediction problem in the next chapter. Also, both maximum likelihood and approximateChapter 3. Estimation and Simulation^ 21Table 3.3: Comparison of estimators for 100 simulated samples of 50 observations in theWeibull/positive stable bivariate model with parameter vector A .-,- (0.9,1.0, 1.0, 3.0)EstimatorAo (0.9)Mean LossAl (1.0)Mean LossA2 (1.0)Mean LossA3 (3.0)Mean LossORIG. PARAM.Prior IApprox. Post. Mean .865 .617 1.04 3.32 1.06 4.24 3.17 16.32Posterior Mode (MLE) .896 .561 1.00 2.70 1.02 3.52 3.10 13.92Prior IIApprox. Post. Mean .879 .500 1.01 2.76 1.01 3.60 3.11 14.06Posterior Mode .909 .519 .973 2.48 .991 3.27 3.05 12.47REPARAM. IPrior IApprox. Post. Mean .850 .621 1.05 3.50 1.07 4.55 3.22 17.16Posterior Mode .839 .737 1.05 3.51 1.07 4.60 3.25 18.47Prior IIApprox. Post. Mean .862 .469 1.01 2.88 1.03 3.81 3.16 14.54Posterior Mode .851 .562 1.01 2.93 1.03 3.88 3.19 15.61REPARAM. IIPrior IApprox. Post. Mean .864 .755 1.04 3.39 1.06 4.36 3.17 16.44Posterior Mode .871 .735 1.01 3.00 1.03 3.84 3.16 17.34Prior IIApprox. Post. Mean .881 .516 1.01 2.77 1.03 3.70 3.13 17.44Posterior Mode .887 .618 .979 2.67 .997 4.34 3.10 15.14Chapter 3. Estimation and Simulation^ 22Table 3.4: Comparison of estimators for 100 simulated samples of 200 observations inthe Weibull/positive stable bivariate model with parameter vector A = (0.9,1.0,1.0,3.0)EstimatorAo (0.9)Mean LossAl (1.0)Mean LossA2 (1.0)Mean LossA3 (3.0)Mean LossORIG. PARAM.Prior IApprox. Post. Mean .892 .179 1.02 .606 1.01 .523 3.05 3.70Posterior Mode (MLE) .900 .186 1.01 .568 1.00 .489 3.03 3.57Prior IIApprox. Post. Mean .894 .167 1.01 .599 1.00 .516 3.04 3.55Posterior Mode .903 .178 .997 .560 .995 .498 3.02 3.47REPARAM. IPrior IApprox. Post. Mean .887 .165 1.01 .610 1.01 .544 3.07 3.69Posterior Mode .880 .177 1.01 .618 1.01 .551 3.09 3.911Prior IIApprox. Post. Mean .891 .153 1.01 .582 1.00 .519 3.05 3.46Posterior Mode .884 .161 1.01 .590 1.00 .526 3.07 3.64REPARAM. IIPrior IApprox. Post. Mean .890 .176 1.01 .607 1.01 .540 3.06 3.75Posterior Mode .892 .191 1.01 .583 1.00 .517 3.05 3.85Prior IIApprox. Post. Mean .894 .168 1.01 .579 1.00 .515 3.04 3.57Posterior Mode .896 .184 .999 .567 .977 .504 3.04 3.68Chapter 3. Estimation and Simulation^ 23Bayesian estimation are utilized in the data analysis of Chapter 5.Chapter 4PredictionIn many instances it may be desirable to use information about a family's frailty in or-der to predict individual lifetimes. Of course, to do so would also require informationabout the individual's baseline hazard. For simplicity, we will assume that if we wishto predict potentially observable survival time Yi then we have observed all the othercomponents of the ith family (unobserved family members need not be included in themodel). We denote these components by Yi ,(3) = (Yo , , Yij-11 Yi,j+1) • • • Yi,ni )• So,if we have observed m families, and the individual of interest belongs to the ( in + l)stfamily, then we wish to study Y„,,+1,i I Yi , • • • , Ym, Ym +1,(i). In such a situation, observing, Y,n , Yni+1 ,(i) gives information about the baseline effects and the frailty distribu-tion; however, only n i, +1 ,(i) can give direct information about ° m+1 , the frailty of Ym+1,3 •The question of how much improvement in our knowledge about 0 m+1 is realized by ob-serving 17,n+1 ,0 ) is crucial to the prediction. Before delving into detail, we look at whythis scenario might arise naturally.Often we might have a situation where within a family, survival times of one groupare generally measured before those of another group. In such a setting it may be ofinterest to predict the second group times given the first group times. For example,having observed the lifetimes of a number of father-son pairs, one might want to predicta son's lifetime given his father's lifetime. Similarly, if an individual has two responses,with one often larger than the other, such prediction might be relevant. For instance, Holt& Prentice (1974) report survival times of several skin-grafts per patient, with a mixture24Chapter 4. Prediction^ 25of poor-match and good-match grafts. In a clinical situation, if a patient has grafts ofboth types, it might be desirable to predict the failure times of the good grafts, havingobserved the poor-match failure times. Additionally, in a clinical counselling situation,the onset times of a disease in family members could be used to predict the onset timein an unaffected individual.Given a data set, one approach to the prediction problem is to assume the valueof the underlying parameter A is known and given by an estimator. This is known asan empirical Bayes procedure. Alternatively, one can incorporate the uncertainty aboutthe value of A by considering the posterior distribution of A. Here, we consider bothapproaches.4.1 Parametric Empirical Bayes Approach to PredictionTo take a Parametric Empirical Bayes (PEB) approach to prediction, we assume thatbased on the first in families and the partial observation of the (m 1)' family, wehave an estimator A of A. This could be the maximum likelihood estimator, a vector ofapproximate posterior means, or another estimator. The PEB approximation amountsto assuming A = A. So given , Ym , Ym+1 ,(3) , we assume 0,+1 is distributed withdensity g(Om+i I ;I■ ). If we treat this as a prior distribution for Om+i , and approximatethe density of Ym+i ,(,) by f (ym+1 ,(i)161m+i , A), an application of Bayes theorem gives thedensity of Om+1 I Y1, • • • , Ym, ICH-1,(i) as proportional to f (Ym4-1,(j) I Om+1, 5)9(0,.+1follows that the predictive density of Ym+i,j isI f (Y m+1,3 I e m+1,A)f (Yrni-1 ,()) I 19m+1) A)g(Om+1 I A) dOm+1 (4.16)I f (Ym+1,(x)1 0.+1,A)g(em+lIA) de.+1We note that the denominator of (4.16) is simply the marginal unconditional density ofYm+1,(3 ) given A = A. Similarly, given the conditional independence structure, the numer-ator of (4.16) is the unconditional joint density of Ym+1 = (Ym+1,( 3 ), Ym+1,3 ). So the PEBI A). ItChapter 4. Prediction^ 26predictive density is simply the conditional density of Ym, +1 ,3 I Yin+1 ,(3) with parameter Areplaced by its estimator (the estimator contains the information from I'm). Wedenote this density by pi(ym+1, 3 I Ym+1,(i),A)•4.2 Full Bayesian Approach to PredictionTo take a full Bayesian approach to prediction, we need the posterior distribution ofOm+i , A I li, • • • , Ym, Ym +1,(j). By the definition of conditional probability, this will be theproduct of the densities of 0,2+1 I A, li, • • • '177.'1'922+1,w and A I Y1 , • • • , 17m, Ym-1-1,(A• But0m+1 I A, li, Ym , Ym+l,(j) —= 0.+1 I A, Ym+1,(i), which has posterior densityf(Ym+1,(i) I 0.+i, A)g(0,72+1 I A)f f(Ym+i,(3) I em+i, A)g(0,72 +1 I A)d0.+1while the distribution of A I Y1 , .. • , Yrn, Y,22+ 1,(3) is just the posterior distribtion of Agiven all the observed data. We denote the corresponding density by T(o). Hence thepredictive density will beI f f (Ym+1,3 I 0,2+1,A)^f(Ym+1,(i) Om+i, A)g(0m+1A)^H*(A)d0,7,41dAf f (^m+i,(A) I 0.„ +1 , A)g(0',„ +1 A)dOini+1which simplifies to[f f(ym+1,3 I em+i, A)f (Ym+i,(3) Om+i, )0y(Orn-FiA)clemd-ii 1-1*(A)dAf f(Yrrt+1,(3) I em+i, A)g(0m-1-1A)dOm+1or equivalentlyEir fpi(ym+10 I Ym-1-1,(j), A)}^(4.17)We denote this density by p 2 (ym4. 1 ,3 I data). Note that the PEB predictive density canbe obtained from (4.17) by assuming a posterior distribution on A which is degenerateat A. As with other posterior quantities, we anticipate that it will not be possible toevaluate (4.17) exactly.Chapter 4. Prediction^ 274.3 The PEB Approach in the Weibull ModelTo evaluate PEB predictive densities, we can easily obtain conditional densities as ratiosof appropriate joint densities. For example, in a bivariate model, suppose we want theconditional density of Yi,2 I Yo . Suppressing the family index, we know an uncensoredfamily has densityf (yi, y2 I A) = h i (yi )h 2 (y2)40 2 exp(--OH(y))]Similarly, by considering a family of size 1, we know.i(Yi I A) = hi(yi )nOexp( -01/1(M.))]Taking the ratio yieldsPi(Y2 i yl,a)^h2 (y2 )Eg [02 exp(-01/(Y))] Eve exp(-0 (Y1))]Similar arguments give conditionals for censored cases:(4.18)h2(y2)40 exp(-01/(Y))] ^f (Y2 I Y1 >^A) = Eg[exp(-1911 -1(y0)]EVO exp(-011(y))] P(Y2 > Y2 I Y1 = Y1 ' A) — Eve exp(-0Hi(Y1))]P(Y2 > Y2 I Y1 > Y 1 ' A) - Eg [exp(— 0 H (0)] EVexp( -61Hi(Y1))]Note that all these expressions can be easily evaluated in either the Weibull/gammamodel or the Weibull/positive stable model. Assuming Y1 is in group 1 and Y2 is ingroup 2, (4.18) becomes(a + 1)A2A3(0/^AiM,A3 ) (.-1-1) y2 A3-1(a +^+ A2Y2 3 ) (a+2)Pi (y2^A)^ (4.19)where a = —(log A 0 ) -1 . When A3 > 1, this is a unimodal density; otherwise, it is strictlydecreasing.Chapter 4. Prediction^ 28In the positive stable case, as mentioned in Hougaard(1986b), (4.18) isPi(Y2 I M., A) = (A2A3Y,,3-1)A0 {(Aiyi3-1-A23)21A i yiA3x exP[(A2Y2 3 ) A°^(Aiy1A3^A2Y2 3 ) )1 [1 + (A0 1 —^„1,2y3)-,\01Note that in both cases, Ar ) is a scale parameter.(4.20)4.4 The Full Bayesian Approach in the Weibull ModelWe note that with the complicated functional forms for the predictive densities (4.19) and(4.20), analytic evaluation of (4.17) would not be possible even if we had a nice form forthe posterior density. We need to resort to some kind of approximation. Since a predictivedensity is necessarily a positive function, we could use the Laplace approximation method.However, this would have to be applied pointwise: a numerical maximization would berequired for each point at which the predictive density is approximated. Furthermore,the approximate density may need to be normalized, requiring numerical integration.A second possibility would be to replace the posterior distribution with a multivariatenormal distribution. That is, if log(likelihood x prior) has mode it and Hessian —E atthe mode, then replace II* with a N(1t, E -1 ) distribution. This is correct asymptotically(Berger, 1985, p. 224). However, due to the complicated integrand, Monte Carlo methodswould still be required. For little additional cost (and potentially considerable gain) onecould use this Normal distribution as an importance sampling distribution.So, using the usual importance sampling scheme, if II ° is the N(a, E -1 ) density func-tion, and if A( 1 ), , A(') are simulated vectors from this distribution, then our approxi-mation isnp2 (z data) ti E ci pi (z I ym+1 ,2 , A () )i=1II * A (i)where ci f^loo(0) and > ci = 1i=1(4.21)Chapter 4. Prediction^ 29Note that we need only evaluate the posterior II* up to a constant, so it is sufficient justto evaluate the likelihood and prior. Hence, our estimate of p2 is easily normalized.It is of course of interest, within the context of the Weibull model as well as moregenerally, to discuss the relative merits of the PEB and full Bayesian approaches toprediction.4.5 Full versus PEB PredictionGenerally, the difference between any parametric empirical Bayes approach and a fullhierarchical approach is that the PEB approach neglects the experimenter's uncertaintyabout the value of A. For this reason, we expect the PEB predictive density to be over-confident: to be more peaked and have thinner tails than the data can justify. However,the PEB approach does yield the correct functional form of the true conditional density,and if our estimate A of A is particularly good, we might expect the PEB predictivedensity to outperform the full predictive density.Since we are comparing two specified probability density functions intended to de-scribe observable Ym+i,i , Ym , ym+i ,u), we choose to compare their performanceby the density ratioP2 (Ym+1,j Yll • • • Ym Ym+1,(j)) L(Yi, = (4.22)7 • • • Ym, ))Often we prefer to work with the log-ratio due to its symmetry. (A positive value cor-responds to p2 outperforming p i , while the corresponding negative value indicates P1outperforming p2 by the same amount.) Of course in our model, p i and p2 are compli-cated, so it is unlikely we can make any analytic inferences about the distribution of L.For this reason we mention a simple, though somewhat unrelated example of prediction.We speculate there is similar behaviour of likelihood ratios in the example and in ourmodel.Chapter 4. Prediction^ 304.5.1 Exponential Prediction ExampleLet X1 ,^, Xn+i be iid exponential with scale 1/A. Take a non-informative prior II(A)1/A. It is straightforward to see that A , Xn has a gamma distribution with shapen, and inverse scale E:L i Xi . So based on the first n observations, A has posterior meann/ 1 Xi , which is also the MLE. A "plug-in" method neglecting uncertainty aboutA would predict Xn+1 as exponential with scale ETia_ i Xi/n. The full predictive densitywould be the expectation of the density of Xn+1 with respect to the gamma posteriordistribtion on A. The ratio of the second density to the first is(Enr,1 )n f [0 exp( -0Xn+i)]en-1 exp(- (E:L. 1 Xj )0)d0L(Xi ,^, Xn+i) =^exp(-X„+i(n/ E i . 1 Xi))L-2=1This simplifies ton+17n X.^nXn+iL(Xi ,^Xn-Fi ) - ^ exp ^EL /I Xn+1^Erii=1which can be written asL(Xi,^Xn+i) = Yn+1 exp[n(Y -1 - 1)]where Y 2=1Xi1--4=1But by a well known property of independent gamma distribtions, Y has a beta(n, 1)distribution. Note that as a function of Y, L attains a minimum of e( —n+1-2--1 )n+1 at Y =nn , but that L(Y) increases without bound as Y decreases to the left of the minima.1 Furthermore L(0) 1. However, the minimum occurs at the mean of Y's distribution,whereas Y has little mass where L is large. To clarify, we superimpose log L(Y) and thedensity of Y for various n in Figure 4.1. So the region where L < 1 has high probability;however, L is almost surely larger than kn = n-Fi)n+1. Conversely, large values ofL are possible, but are unlikely. Based on the beta distribution function, we tabulaten=100.4^0.6^0.8^1.0Yn=20n=5 0.4^0.6^0.8^1.0Yn=1557:..:5o)O 113.0L°.9^I^.Chapter 4. Prediction^ 31Figure 4.1: Predictive density ratio as a function of Y (exponential example)0.4^0.6^0.8^1.0YDotted Line proportional to density of YHorizontal Line equals zero0.4^0.6^0.8^1.0YChapter 4. Prediction^ 32Table 4.5: Probabilities of intervals for log L in exponential examplen kr, (log k„,0) (0, — log kn ) (— log k„, oo)5 .91 .83 .05 .1210 .95 .85 .04 .1115 .97 .85 .05 .1020 .98 .86 .04 .10probabilities of log L lying in the intervals (log k n , 0),(0, — log kn ), and (log kn , co) forvarious n in Table (4.5).So, the "plug-in" prediction has a high probability of doing slightly better than theproper predictive density, but it cannot do a lot better. With small probability, it willdo much worse. Thus the full predictive density is more robust, at the cost of usuallybeing slightly outperformed. This is consistent with the conjectured sharper peaks andthinner tails of general PEB predictive densities mentioned earlier.4.6 Prediction in the Weibull/Gamma ModelTo try out the prediction in our model, we take the 99 samples of size 50 simulatedfrom the Weibull/gamma model, and extend them with a 51s t pair of observations. Foreach sample, we compute a PEB predictive density and an approximate full predictivedensity for Y51,21 Y5 1 , 1 , Y1, • • • , Y5o. For the PEB density we estimate A by the approximateposterior means with respect to the flat prior H i , computed using reparameterization II.(The simulations indicated approximate posterior means give better estimates of A o , theparameter governing the frailty distribution, and hence the association within families.Since this association will be critical to our prediction, we use these estimators, eventhough they are slightly outperformed by the MLE in terms of the baseline parameters.)For the approximate full predictive density, we use importance sampling, generatingChapter 4. Prediction^ 331000 multinormal vectors to approximate the integral. By estimating the error in theimportance sampling, and by visually inspecting the effect of the importance samplesize, it seems that this sample size is sufficient to control stochasticity introduced bythe importance sample. We also use reparameterization II for this because the previoussimulation results suggest that this parameterization leads to a 'more normal' posteriordistribution than the others. If this is true, the importance sampling should be mostefficient with this parameterization, given the choice of a multinormal importance density.Of course this makes our predictive density a sum of 1000 terms, which makes it somewhathard to handle. However, we cannot really expect to do much better. Since the integralbeing approximated is 4-dimensional, using this sample size is analogous to approximatinga 1-dimensional integral using AY1000 --:.,' 5.6 simulated variates.The two predictive densities are plotted for the first 8 samples in Figure 4.2. Ineach case the PEB predictive density is the more peaked and thinner—tailed of the pair.The potential overconfidence of the PEB densities is clear. It becomes a reality in lightof Figure 4.3, a histogram of the log-predictive density ratios. While half the samplesproduce negative values (PEB beats full density) there are no large negative values,but there is a thick tail in the large positive values. The relative robustness of thefull predictive density observed in the exponential example is empirically obvious in ourmodel.4.7 Prediction in the Weibull/Positive Stable ModelThe same procedures mentioned above were carried out for the 100 samples of size 50simulated from the Weibull/positive stable model. In this case, the maximum likelihoodestimates of A were used for the PEB prediction, given their performance in the simu-lation. Because of the proximity of the true value of A o (0.9) to the parameter spaceA l^-.410.5^1.0^1.5^2.0^2.5^3.0OO ^0.0ONA +.68OO _O0.0^0.5^1.0^1.5^2.0^2.5^3.0-.35OLc?^_OOO0.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0+.16A I^A +.270LC1O00.5^1.0^1.5^2.0^2.5^3.0OO ^0.00.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.0OO0O09Lt1OOOChapter 4. Prediction^ 34Figure 4.2: Approximate full and empirical predictive densities (gamma frailty)marks predicting value Y(51,1) A marks observed predicted value Y(51,2)number is log-liklihood ratio evaluated at observed Y(51,2)Chapter 4. Prediction^ 35Figure 4.3: Histograms of observed predictive density ratios: approximate full versusempirical Bayes99 samples of size 50+1 with gamma frailty00CO0CV00-4 -2 0 2 4log predictive density ratio100 samples of size 50+1 with stable frailty0-4-0CO0CV0.,—0-0.4^-0.2^0.0^0.2^0.4log predictive density ratioChapter 4. Prediction^ 36boundary () o = 1), many samples produced a substantial fraction of importance samplepoints outside the parameter space. However, this did not seem to hamper the accuracyof the approximation. The histogram of observed log-likelihood ratios is displayed inFigure 4.3. The right tail is again slightly heavier, but this trend is less pronounced thanin the gamma frailty case. The important difference however is that the variability isroughly an order of magnitude smaller. Observing the plots of the two predictive densi-ties in Figure 4.4 shows that in all but one of the first eight samples, the two densitiesare visually indistinguishable. It seems that with positive stable frailty and a sample ofsize 50, PEB prediction is virtually identical to the full Bayesian approach to prediction.4.8 Effect of Frailty Distribution on PredictionWe speculate that the full Bayesian and Empirical Bayes predictive densities are soclose in the stable frailty case for the following reason. Implicit in the prediction is anupdating of knowledge regarding 8m+1 . Prior to observing Y,,,,+1 (i i, we know 0ni+1 comesfrom a positive stable distribution, and we have information about the parameter, A o . Weuse Y7,2+1 43 ) to update (via Bayes theorem) the distribution on O ni+1 . Because the stabledistribution has such a thick right-tail, we expect the information from Y,7,14 ,(3) dominatesthe prior information. In other words, our knowledge about ° m+1 is not sensitive to ourknowledge of A o .This would explain why replacing a distribution on A o with a point mass might notgreatly affect the prediction. Further, because the information on the remaining compo-nents of A is so good compared to the knowledge about 6 1 ,7,41 , interchanging degenerateand non-degenerate distributions on these components will not have a large effect. Con-versely, with gamma frailty, the prior knowledge about A o would be important in theupdating process; consequently, there would be a noticable difference between the full+.319N -OA^+.013• -N -A -.031N -• -OA -.025N --0 -Chapter 4. Prediction^ 37Figure 4.4: Approximate full and empirical predictive densities (positive stable frailty)A^-.020• -• -0.0^0.5^1.0^1.5^2.0^2.5^3.0AI^+.003-• -0.0^0.5^1.0^1.5^2.0^2.5^3.0+.075• --O0.0^0.5^1.0^1.5^2.0^2.5^3.0AI^-.027-• -0.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.00.0^0.5^1.0^1.5^2.0^2.5^3.0marks predicting value Y(51,1) A marks observed predicted value Y(51,2)number is log-liklihood ratio evaluated at observed Y(51,2)Chapter 4. Prediction^ 38and PEB predictors under gamma frailty.As we will see in the next chapter, predictive denisities, while useful in their ownright, can also be used as a tool in model selection criteria.Chapter 5Application5.1 Description of Rat-Litter DataWe consider the data from a litter-matched experiment on rats, given in Mantel, Bohi-dar & Ciminera (1977). This data was analyzed in Hougaard (1986b), using likelihoodmethods. One-hundred litters of 3 animals each were observed. Fifty litters containedall female rats, the other 50 were exclusively male. One rat per litter was exposed to apossibly carcinogenic treatment. For each rat, a time (in weeks) and an indicator (T orD) was recorded. The data in its original form is given in Appendices B and C.The times recorded are times from birth to a response. The response is appearanceof a tumour (T), or death (D). The data description indicates that all animals weresacrificed after 104 weeks; however, it is clear that some animals (the surviving male ratslisted as third in their litter) were sacrificed after 102 weeks. We will refer to deaths notdue to sacrifice as deaths due to other causes. For ease of analysis, all times were dividedby 100. Initially, we concentrate only on the female litters.5.2 Single Risk versus Competing RisksIf the phenomenon of interest is tumour appearance, then we can treat those animalsobserved to die as censored observations. This was the approach taken in Hougaard(1986b). Note that this implies a mixture of censoring mechanisms. We treat eachanimal as having a time to tumour appearance T, and a censoring time L. Then either39Chapter 5. Application^ 40we observe a value of T smaller than L (uncensored observation) or we know only thatT exceeds L (censored observation). Following Lawless (1982, pp. 31-34), the sacrificedanimals correspond to type I censoring (L is fixed before the experiment begins), whilethe animals observed to die correspond to random censoring (L is a random variable). Forthe conditional likelihood (2.3) to remain valid under random censoring, it is necessaryto assume T and L are independently distributed given the frailty.Of course it is entirely possible that death due to other causes (not tumours) is theresponse of interest, and we could treat tumour appearances as random censoring. Thesacrificed animals would still correspond to type I censoring. As before the validity of(2.3) would require conditional independence of the tumour appearance time and thedeath due to other cause time.It may be the case that both responses (tumour and death due to other causes) areof interest. Or, with data on families, it may be advantageous to model both responses,even if only one is of interest. We then require a competing risks model. Here, associatedwith each animal is a pair of random variables, Ti. (time to tumour appearance) and T2(time to death due to other causes). We observe either (T 1 = t, T2 > t) or (T1 > 1, T2 -= t)in addition to type I censored observations (T 1 > t, T2 > t). Note that assuming inde-pendence of (T1 , T2 ) given an animal's frailty is equivalent to the assumption required tojustify the conditional likelihood in the single risk models. We explore several competingrisk models based on this assumption.5.3 Model Adequacy CheckingWe would like to do some simple preliminary model checking with the data. One meansby which to do this is examining marginal distributions. First, given a conditional Weibulldistribution, we examine the unconditional marginal distribution induced by the frailtyChapter 5. Application^ 41tA3^(5.23)while the marginal hazard under positive stable frailty isFlo (A0 A3tA°A3-1) (5.24)So the positive stable frailty model has Weibull marginals as well as conditionals, but theshape parameters differ. Hence the marginal hazard is monotone (increasing, constant,or decreasing). Conversely with gamma frailty, the marginal hazard either increases to amaxima then decreases, or is monotone decreasing.To do diagnostic marginal checks, we treat the female treatment rats as one sample,the first-listed female control rats as a second sample, and the second-listed as a thirdsample. Treating a single response as the response of interest, we proceed to check theadequacy of a Weibull distribution for each sample. The random component of the cen-soring means we don't know the ranks of even the uncensored observations, making aWeibull probability plot impossible. So, as suggested in Lawless (1982, p. 82) for eachsample we plot log(— log(S(t)) as a function of log t, where S is the non-parametricKaplan-Meier estimate of the survivor function. A true underlying Weibull marginal willmanifest itself as a roughly linear plot with slope depending solely on the shape parame-ter. So, obvious non-linearity would contradict the Weibull/positive stable model, whilelinearity, but different slopes in treatment and control plots would contradict the as-sumption of a common shape parameter (conditional common shape A3 implies marginalcommon shape \0A3 under stable frailty).We would like to know how the Weibull/gamma model manifests itself on the aboveplot. It is possible to show that with this model, log(— log S(t)) will be increasing andconvex for large log t (Appendix D).distribution. Specifically, if the conditional hazard is 0A 1 () 303-1 ) it is easy to show thata gamma frailty distribution yields marginal hazardcrA303 -1Chapter 5. Application^ 42Table 5.6: Estimation with rat-litter data : single risk (tumour)gamma frailty posterior mode(MLE)approx.posterior meanfrailty parameter Ao .61 (.29) .58 (.25)baseline treatment Al .64 (.17) .72 (.20)hazard control A2 .260 (.068) .288 (.080)parameters common shape A3 3.93 (.57) 4.05 (.57)positive stable frailtyfrailty parameter Ao .906 (.095) .871 (.083)baseline treatment Al .55 (.14) .57 (.14)hazard control A2 .214 (.058) .223 (.059)parameters common shape A3 4.10 (.63) 4.23 (.64)The above-mentioned plots for both responses are given in Figure 5.5. The tumourplots appear to be roughly linear with common slope, supporting our Weibull/positivestable model. The other cause plots seem to show a slight concavity for large log t,indicating perhaps the suitability of gamma frailty rather than stable frailty for thisresponse. Of course this is appropriate only if the assumption of Weibull conditionalhazards is appropriate.5.4 Single Risk : Tumour DevelopmentUsing the Weibull model with conditional hazards 0A 1 (A3tA3 -1 ) for treatment animalsand 0A 2 () 303-1 ) for controls, we compute the MLE and approximate standard errors inthe usual fashion. We also compute approximate posterior means, and the square rootsof approximate posterior variances with respect to H i , the flat prior. Results for gammafrailty and for positive stable frailty are displayed in Table 5.6. The approximate posteriorquantities were computed using the original parameterization. Given the success of usingreparameterization II in the simulation study, we wished to try that technique here too.Chapter 5. ApplicationFigure 5.5: Diagnostic plots based on Kaplan-Meier estimates of the marginals43tumour-treatment other cause-treatment--0.8 -0.6 -0.4 -0.2 0.0-1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0tumour-control 1• --1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0tumour-control 2-• -• --1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0-1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0other cause-control 1-1.2^-1.0^-0.8^-0.6^-0.4^-0.2^0.0other cause-control 2-- r-CJ -• -0 ----Chapter 5. Application^ 44Table 5.7: Estimation with rat—litter data : single risk (other causes)gamma frailty posterior mode(MLE)approx.posterior meanfrailty parameter Ao .60 (.21) .56 (.19)baseline treatment A l .54 (.16) .62 (.19)hazard control A2 .51 (.11) .55 (.13)parameters common shape A 3 5.39 (.69) 5.55 (.70)positive stable frailtyfrailty parameter Ao .876 (.085) .854 (.081)baseline treatment A l .44 (.12) .47 (.13)hazard control A2 .392 (.084) .408 (.086)parameters common shape A3 5.57 (.75) 5.70 (.75)Unfortunately, we couldn't maximize the necessary modified likelihoods. In particular,the induced prior 11/(K) increases without bound as /c o decreases to 0. It would seem thatthe likelihood does not decrease commensurably quickly.5.5 Single Risk : Death due to Other CausesThe same analysis applied to the tumour appearance times above was applied to theother cause times. The results are displayed in Table 5.7. Note that unlike the tumourrisk, there is no discernible difference between treatment and control groups with othercause risk.5.6 Competing Risks : Single FrailtyA simple approach to the competing risks problem is to assume that frailty is not risk-specific. Consequently, an animal's frailty acts multiplicatively on both its tumour hazardand its other cause hazard. Since treating the risks seperately has produced similarChapter 5. Application^ 45frailty parameter estimates for both risks, we have no immediate grounds to reject thisassumption. The model structure remains the same; however, we now effectively havefamilies of six members each (two lifetimes per animal), and three groups. The first groupconsists of tumour appearance times for treatment animals, the second group is tumourtimes for control animals, while the third group is other cause times. The conditionalhazards are Oh(t), where h is group-dependent, according to:h(t) =A 1 0303- ')A2 (A303 -1 ))140505 -1 )for group 1;for group 2;for group 3.(5.25)So each family has one observation on group 1, two observations on group 2, and threeobservations on group 3. However, at most three observations from one family canbe uncensored, since at most one lifetime per animal can be uncensored. Note thatwe assume the treatment has no effect on the other cause hazard (consistent with theestimation results from the last section). On the other hand, we do allow for differentshape parameters for the different risks. The results of estimation in this model are givenin Table 5.8.5.7 Competing Risks : Two FrailtiesThere is no reason in general to expect that non-risk-specific frailty is realistic. On theother hand, assuming an individual has independent risk-specific frailties does not allowfor an overall general health effect. A middle ground is suggested by Hougaard (1986b),in the context of distinguishing between sibling and twin relationships. Following hisidea, we associate with an individual a general frailty 0, and risk-specific frailties 0 1 and02 , corresponding to tumour appearance and other cause risk respectively. GeneralizingChapter 5. Application^ 46Table 5.8: Estimation with rat-litter data : competing risks, single frailtygamma frailty posterior mode(MLE)approx.posterior meanfrailty parameter Ao .78 (.15) .72 (.14)tumour treatment Al .65 (.16) .72 (.18)appearance control A2 .262 (.065) .290 (.074)baseline common shape A3 3.98 (.57) 4.13 (.59)other cause A4 .508 (.088) .55 (.10)baseline shape A5 5.30 (.67) 5.45 (.67)positive stable frailtyfrailty parameter Ao .911 (.063) .880 (.060)tumour treatment Al .59 (.14) .63 (.15)appearance control A2 .228 (.055) .241 (.058)baseline common shape A3 4.29 (.61) 4.43 (.61)other cause A4 .449 (.069) .464 (.074)baseline shape A5 5.52 (.68) 5.67 (.68)from the previous section, the conditional hazards become{00 1 A 1 (A3 tA3 ') for group 1;h(t I 0,01, 02) =^0cb 1 A2 (A3tA3-1 ) for group 2;002 A4 (A 5 tA 5 -1 ) for group 3.(5.26)It is assumed that (0 1 , 02) are independently distributed according to a positive stabledistribution parameterized by A6, while 0 A6 independently follows a positive stable distri-bution parameterized by A o . Unfortunately, it appears this structure is chosen to makeintegration to the unconditional likelihood tractable, rather than for physical reasons. Itis easy to show that (suppressing the family index), the unconditional survivor functionisP(Ii > yi, ... , Y6 > y6) = exp (— [HaA6 + N6] A°)^(5.27)Chapter 5. Application^ 47where Ha + Hb = H, such that Ha contains the terms corresponding to tumour times,while Hb 's terms correspond to other cause times.For our application, we can get the likelihood by taking partial derivatives (up tothird order) of (5.27) with respect to (y i , , ye). Unfortunately, there does not seemto be an obvious recursive form for the likelihood as in the single frailty case. Thederivatives required are given in Appendix E. The results from fitting the model aregiven in Table 5.9.Note that the two frailty model generalizes the single frailty with stable distributionmodel in the following sense. The net frailty associated with each individual, Ocki , followsa positive stable distribution (with parameter X0)6). So for example, considering groups1 and 2 only in (5.26) reduces to the two group model with a single frailty following astable distribution. So an analogous generalization of the gamma frailty model could beachieved by taking any joint distribution on (0,0 i ) such that the product 00 i has a gammadistribution. However, for physical reasons, one may wish to impose other conditionssuch as independence between 0 and 0i , or perhaps certain marginal distributions. Aconvenient specification yielding gamma distributed net frailty was not found. For ageneral discussion of these types of mixture models, from the point of view of specifiedmarginals, see Joe (1991).5.8 Including CovariatesWe propose to include the data on male litters by means of a regression function actingmultiplicatively on an animal's conditional hazard, in the manner of Cox (1972). Sincethere are only two observed tumours in the male animals, we choose to model the othercause hazard only. Because of the binary nature of the covariate, we can generalize themodel of section 5.5 by introducing a multiplicative factor A4 in the conditional hazardsChapter 5. Application^ 48Table 5.9: Estimation with rat-litter data : competing risks, two frailtiespositive stable frailty posterior mode(MLE)approx.posterior meanfrailty Ao .938 (.062) .897 (.064)parameters A6 .945 (.071) .942 (.071)baseline treatment Al .56 (.14) .60 (.15)hazard control A2 .216 (.056) .231 (.059)parameters common shape A3 4.29 (.61) 4.46 (.63)other cause A4 .424 (.076) .445 (.082)baseline A5 5.66 (.71) 5.79 (.73)of male rats only. The estimation results from this model are given in Table 5.10.5.9 Assessing Model ChoiceTo compare choice of model and frailty distribution, we propose to use predictive meth-ods. Following the cross-validation idea of Geisser (1980a), for each uncensored obser-vation, we construct a predictive density conditional on all the other observations. Forexample, to do this for the 40 tumour appearances observed would require 40 differentposteriors, the i th posterior being based on all the data except the i th tumour appearancetime. In practice we use the posterior based on all the data to compute each predictor, asdeleting one of 150 observations should not greatly influence the posterior. To computethe predictive densities, we use importance sampling with multinormal samples of size1000, as described in the previous chapter. Using these densities to assess model choiceis an application of the idea of predictive sample reuse, introduced independently byGeisser (1975) and Stone (1974).As a measure of the quality of prediction, we take the log of the product of thepredictive densities evaluated at the observed values. This choice of measure of predictionis inspired by the idea of comparisons based on likelihood ratios. It is also suggested inChapter 5. Application^ 49Table 5.10: Estimation with rat-litter data : single risk (other cause), proportionalhazards modelgamma frailty posterior mode(MLE)approx.posterior meanfrailty parameter Ao .75 (.13) .71 (.13)baselinehazardparameterstreatmentcontrolcommon shapeA iA2A3.48 (.10).489 (.087)4.87 (.40).51 (.11).511 (.085)4.94 (.42)regression parameter A4 1.47 (.31) 1.50 (.34)positive stable frailtyfrailty parameter Ao .895 (.050) .878 (.053)baselinehazardparameterstreatmentcontrolcommon shapeA lA2A3.401 (.089).415 (.077)5.14 (.43).413 (.093).422 (.079)5.20 (.44)regression parameter A4 1.48 (.33) 1.53 (.36)Geisser Sz Eddy (1979). Their interpretation is that of the models considered, the onemaximizing this predictive score best explains the data. These predictive scores for allthe models considered are given in Table 5.11. Note that for the purposes of comparison,the score for the model including males is based on prediction of the female survival timesonly.So, the predictive scores indicate superior performance by the stable frailty distri-bution in all but the proportional hazards model. To investigate this phenomenon, wegive histograms of the individual ratios of stable predictive density to gamma predictivedensity evaluated at the observed failure times. These are given for the single risk modelsand the competing risk single frailty model in Figure 5.6. Note that in three of the fourmodels, the superior performance of stable frailty can be attributed to one or two rel-atively large predictive density ratios, while the remaining observations show relativelyeven performance between the two frailty distributions, with any edge in fact favouringLf)01-LC)0 0u-)1 MChapter 5. Application^ 50Figure 5.6: Histograms of predictive density ratios : positive stable frailty versus gammafrailtytumour risk only competing risks:tumour-2^-1^0^1^2^-2^-1^0^1^2Iog(predictive density ratio)^log(predictive density ratio)other cause risk only0CO0C\Icompeting risks:other cause0-r-00CO0CV0,—Lnr 1^0^ n-2^-1^0^1^2^-2^-1^0^1^2log(predictive density ratio)^Iog(predictive density ratio)Chapter 5. Application^ 51Table 5.11: Predictive scoresmodeltumourgammafrailtyappearancestablefrailtyothergammafrailtycausestablefrailtytumour risk only -24.71 -24.18other cause risk only -8.93 -8.71other cause risk onlyincluding males-10.16 -10.26Competing riskssingle frailty-25.49 -22.80 -9.44 -7.51Competing riskstwo frailties-22.92 -8.04the gamma distribution. Again it seems the stable frailty choice is robust, relative to thegamma frailty specification.Also it is clear that on the basis of these predictive scores, the single frailty, competingrisks model with stable frailty is the most suitable, as it gives the best prediction of bothtumour appearance and other cause failure times. Note that while this model with stablefrailty improves upon the single risk models, the same model with gamma frailty is worsethan the single risk models. This underscores the importance of the frailty distributionselection—different choices can lead to different conclusions about the manner in whichfrailty acts upon individuals hazards.The double frailty model exhibits slightly poorer performance than the best model,even though it is a generalization. Perhaps the ad-hoc assumptions concerning frailtydistributions in this model are inappropriate at least in this situation. The assumption ofexchangeable risk-specific frailties is perhaps too restrictive. A model with more flexibleinclusion of general and risk-specific frailties might exhibit better performance; however,Chapter 5. Application^ 52such a model would be calculationally difficult.Including the male rats in the other risk only model via proportional hazards does notimprove the prediction of the female other cause times. This certainly calls into questionthe assumption of conditional proportional hazards between male and female animals.Computing a predictive score for each model provides a very easy way to choosebetween several models for given data (the model with the highest score is most suitable).Of course there are other model—ch000sing criteria. Some of these are mentioned in thenext chapter.Chapter 6Discussion6.1 Approaches to Frailty ModelsAs noted in (2.1), the hierarchical model can be collapsed to a one-stage model. Whenthe n i are small (the case with which we have been concerned), there seems to be littlepoint in trying to estimate the Oi 's. In a sense the two-stage model is only motivationfor the one-stage likelihood. However, it is important to keep in mind the notion ofunderlying frailties. As we saw in Chapter 4, the knowledge about an individual's frailtyis fundamental to predicting that individual's failure time. Moreover, in Chapter 5, wesaw that the concept of frailty led to more sophisticated models, with competing risks,and several frailties. Of course, if the Laplace transform of the frailty density does nothave a nice form, we could not straightforwardly collapse to a one-stage model. In thisinstance the hierarchical nature of the model would be more visible (though no moreimportant).We also saw that these models could be approached from the perspective of saybivariate models, with given marginals. Here obviously model specification would bein terms of marginal rather than conditional (baseline) distributions. This has somedisadvantages. Specifying marginals is specifying an effect averaged over a population.In analogy to the Cox (1972) model, it would be akin to specifying a form for the baselinehazard averaged with respect to the covariate vectors of the individuals. Conversely,specifying conditionals corresponds to choosing a form for the baseline hazards. The53Chapter 6. Discussion^ 54latter approach seems more sensible. Additionally, specifying arbitrary marginals impliesthat the conditional (baseline) hazards depend on the frailty parameter(s). This can alsobe construed as undesirable.Lindley & Singpurwalla (1986) motivate a model with time-varying frailties. In par-ticular they suggest treating an individual frailty as a stochastic process. They point outthat assuming frailty is a purely random process with a stationary distribution results ina model analogous to ours, but with different interpretation. Apparently, this approachhas not been pursued in the literature.6.2 Specification of DistributionsIn practice, we have restricted ourselves to the use of Weibull distributions for the firststage of our model. Although this family of distributions is flexible, and commonly usedfor failure times, this is a somewhat arbitrary specification. For more generality, onecould use the generalized gamma distribution of Stacy (1962). This three parameterfamily includes the Weibull and gamma families, and has a log-normal distribution as alimiting case. Unfortunately, an additional parameter will have ramifications in terms ofcomputational practicality.In light of the existing literature, we have considered gamma and positive stable dis-tributions as second-stage distributions. These families have the advantage of possessingnice Laplace transforms of their densities. In our data analysis, the stable frailty out-performs the gamma frailty, at least in terms of our predictive sample reuse criteria. Asmentioned previously, this may be attributable to the robustness of the stable specifica-tion rather than a higher degree of "truthfulness." This robustness in turn comes fromthe long right tail of positive stable distributions. In any case, at least according to ourcriteria, the stable frailty specification offers a better explanation of the data than theChapter 6. Discussion^ 55gamma frailty. This is consistent with Hougaard's theoretical preference for the positivestable frailty.There is however, a drawback to using stable frailty in terms of implementing theBayesian paradigm. With gamma frailty, a specification of a prior on A and particularlyon Ao , describes belief in a simple function of the variance of the frailty distribution. Withstable frailty, Ao is the index of the variate, and no such straightforward interpretationsexist (the mean and variance of the frailty distribution do not exist). In short, with stablefrailty, it is harder to encapsulate knowledge about the heterogeneity of a population.Again, one wonders about other frailty distribution specifications. Hougaard (1986a)introduces a three-parameter family generalizing both the positive stable and gammadistributions. However, using this entire family would present computational difficulties,and possibly problems with identifiability. An alternate family with the property ofclosed form Laplace and inverse Laplace transforms is the inverse Gaussian densities.However, inverse Gaussian frailty models cannot attain the Frechet upper bound. In fact,the limiting distribution corresponds to positive stable frailty with parameter A o = 1/2(Oakes, 1989). Hence this frailty distribution specification would not encompass as widea range of dependence/association as gamma or positive stable frailty.There are some nonparameteric and semiparametric procedures which can be usedfor multivariate frailty models. Oakes (1986,1989) discusses such procedures. However,they are presently limited to bivariate situations with identical groupings within eachfamily. Also, they do not provide for the specification of prior information.Chapter 6. Discussion^ 566.3 TechniquesTo estimate parameters, we used both maximum likelihood estimation and Laplace'smethod with diffuse priors. In the data analysis, there were no cases of these two meth-ods providing greatly differing results. In the simulation studies, the overall performanceof the MLE was perhaps superior; however, at least in one instance Laplace's methodprovided the better results. Of course, if one has non-diffuse prior information thenLaplace's method would provide reliable estimates of functions of parameters. The ad-ditional advantage of Laplace's method is its simplicity—it rivals maximum likelihoodestimation in terms of ease of implementation.We also witnessed the dependence of Laplace's method on parameterization. A fixedposterior function can be approximated via any parameterization of the model. Due tothe nature of Laplace's method, the approximation will be better with a parameterizationin which the posterior distribution is in some sense closer to a normal distribution. Infact if the parameter space were 1 or 2-dimensional, one could visually inspect the (non-normalized) posterior under a variety of parameterizations. The parameterization whichinduces the most "normal looking" posterior would be a good choice to use for Laplaceapproximations. Of course in our situation (and most other realistic situations) theparameter space is of higher dimension. Achcar & Smith (1990) give examples of thesensitivity of Laplace approximations to reparamerization in a number of situations.The Laplace approximation was not so attractive for approximating predictive den-sities. For this purpose, importance sampling gave us normalization essentially for free.This was due to the nice form of the conditional densities (PEB predictive densities),which in turn was due to the nice form of the Laplace transform of the frailty density.Another Monte Carlo technique which is potentially useful for approximating pos-terior quantities in hierarchical models is Gibbs sampling. For a general discussion ofChapter 6. Discussion^ 57Gibbs sampling, see Geman Geman (1984), or Gelfand Si Smith (1988). In CIHM's,Gibbs sampling would require simulating variates from the distributions of (0, I A, data),i = 1, , k. and (Aj Au), 0, data), j = 1, , r. In the Weibull model with gammafrailty, (0i I A, data) will also have a gamma distribution (gamma frailty is effectivelya conjugate prior specification). However, () j Au) , 0, data) would have a complicateddistribution function. Thus in our particular hierarchical model, Gibbs sampling wouldbe very difficult to implement.6.4 Assessing ModelsChapter 5 illustrated several ways of modelling a data set. Of particular interest was thedifference between the single risk models and the competing risk models. According toour criteria, (predictive sample reuse), the competing risk models fared better, in termsof predicting both tumour appearance times and other cause times. Put another way,with data on families, even if say tumour appearance is the response of interest, it isadvantageous to model the censoring times (in this case the other cause times).The criteria used to assess models is also of interest. The predictive sample reuse ideais conceptually appealing. In a sense it favours the model which offers the best expla-nation of the data. There is not such a strong emphasis on the truth of various models,which is an approximate notion at best. There is also not the reliance on asymptoticswhich is used for likelihood assessment with nested models.There are many other criteria available to assess the different models. Even withinthe framework of predictive sample reuse, there are many ways to implement a model—selection scheme. Geisser (1980a) has suggested making point predictions and using somediscrepancy function to measure the distance between predicted and realized values. Anentirely different approach would be to use Bayes factors. When comparing two modelsChapter 6. Discussion^ 58(each consisting of a likelihood L and a prior II), the Bayes factor is given byf L i (A)II I (A)dA f L 2 (A)II2 (A)dAwith a large Bayes factor favouring model 1 (Berger, 1985, p. 146). In our models,computing Bayes factors would be difficult, requiring the integration of the unconditionallikelihood (2.5) with respect to the prior measure H on A. Also, with diffuse priors, weexpect Bayes factors to be very similar to likelihood ratios. Because of the behaviourwith nested models, we wish to avoid likelihood ratio criteria.When likelihood ratio is applied to nested models, the more compex model will nec-essarily have a larger likelihood, or score. We wish to avoid the appeal to asymptoticsusually made to resolve this problem. Predictive sample reuse and Bayes factors willnot necessarily favour a more complex model. Other approaches involve modifying thelikelihood ratio. Akaike's information criterion (Akaike, 1973) useslog L(5) — [# estimable parameters]as a score with which to compare models. (A is a maximum likelihood estimate.) Theapproach of Schwarz (1978) modifies the criterion tolog L(A) — [(1/2) log n x (# estimable parameters)]where n is the number of independent observations. Of course, these latter two criteriado not admit prior information.These criteria are not entirely dissimilar. Smith & Spiegelhalter (1980) show a con-nection between Bayes factors and the Schwarz criterion in certain contexts. Also, Stone(1977) has shown an asymptotic equivalence between a variant of the predictive samplereuse criterion and the Akaike information criterion. The variant in question is similar tothe one employed by us, but replaces the full predictive density with the PEB predictivedensity.Chapter 6. Discussion^ 59We note that in our data analysis, the single frailty competing risks model is a specialcase (A6 = 1) of the double frailty model. Thus we have an example of a predictivesample reuse criteria directly favouring the simpler of two nested models.Our treatment of censoring in terms of predictive sample reuse bears discussion. Wehave based our criteria on prediction of the uncensored observations only. This allowsfor direct comparison between single risk and competing risk models. For methods ofincorporating the censored values, see Geisser (1980b).Appendix AQuadrant Dependence of Frailty ModelsFollowing (2.11), let p denote the Laplace transform of the frailty distribution. Weknow p is convex. We will show that if log p is convex, then for arbitrary first stagedistributions, the bivariate distribution is positively quadrant dependent. We will needthe facts p(0) = 1, p > 0, and p' < 0; these are elementary properties of Laplacetransforms.Note that to show (2.12) holds, it would suffice to showp(a + b) > p(a)p(b) Va, b > 0,or equivalentlylog p(a + b) > log p(a) + log p(b).Buta+b pf ( s )log p(a + b) = log p(a) + la ts)19 ds.Now we know that the integrand is negative, and by assumption it is increasing, hence11(3) ds ;log p(a + b)^Arlog P(a) +^.,i, )that islog p(a + b) > log p(a) + log p(b) — log p(0).But log p(0) = 0, giving the desired result.60Appendix BFemale Rat Litter DataFemale Rat LittersDrug-treated Control-1 Control-2101-D 49-T 104-D104-D 102-D 104-D104-D 104-D 104-D77-D 97-D 79-D89-D 104-D 104-D88-T 96-T 104-D104-D 94-D 77-T96-T 104-D 104-D82-D 77-D 104-D70-T 104-D 7-D89-T 91-D 90-D91-D 70-D 92-D39-T 45-D 50-T103-T 69-D 91-D93-D 104-D 103-D85-D 72-D 104-D104-D 63-D 104-D104-D 104-D 74-D81-D 104-D 69-D67-T 104-D 68-T104-D 104-D 104-D104-D 104-D 104-D104-D 83-D 40-T87-D 104-D 104-D104-D 104-D 104-DFemale Rat LittersDrug-treated Control-1 Control-289-D 104-D 104-D78-D 104-D 104-D104-D 81-T 64-T86-T 55-T 94-D34-T 104-D 54-T76-D 87-D 74-D103-T 73-T 84-T102-T 104-D 80-D80-T 104-D 73-D45-T 79-D 104-D94-T 104-D 104-D104-D 104-D 104-D104-D 101-T 94-D76-D 84-T 78-T80-T 81-T 76-D72-T 95-D 104-D73-T 104-D 66-T92-T 104-D 102-T104-D 98-D 73-D55-D 104-D 104-D49-D 83-D 77-D89-T 104-D 104-D88-D 79-D 99-D103-T 91-D 104-D104-D 104-D 79-T61Appendix CMale Rat Litter DataMale rat littersDrug-treated Control-1 Control-291-D 104-D 102-D91-D 104-D 102-D98-D 62-D 77-D91-D 98-D 76-D104-D 104-D 98-D96-D 71-D 91-D79-D 104-D 99-D61-D 88-D 85-D63-D 104-D 102-D104-D 104-D 102-D104-D 80-D 92-D104-D 104-D 101-D104-D 53-D 102-D65-D 104-D 91-D86-D 104-D 75-D104-D 100-D 102-D95-D 104-D 95-D104-D 104-D 102-D104-D 93-D 80-D92-D 98-D 83-D104-D 89-D 89-D63-D 32-D 51-D104-D 98-D 78-D104-D 104-D 102-D87-D 104-D 94-DMale rat littersDrug-treated Control-1 Control-2104-D 104-D 102-D104-D 91-D 102-D90-D 104-D 55-D91-D 104-D 102-D104-D 104-D 102-D23-D 104-D 102-D104-D 71-T 90-D87-D 51-D 102-D104-D 83-D 102-D104-D 104-D 96-D67-D 84-D 94-D104-D 104-D 99-D104-D 94-D 102-D104-D 103-D 102-D51-D 104-D 91-D102-D 98-D 102-D88-D 54-D 39-D67-D 84-D 54-D104-D 104-D 87-D81-D 82-D 102-D94-D 104-D 102-D104-D 89-D 77-D104-D 69-D 102-D104-D 75-T 64-D92-D 104-D 102-D62Appendix DPerformance of Weibull/Gamma Model on Diagnostic PlotLetting y = log[— log S(t)] and x = log t, from the Weibull/gamma marginal hazard(5.23), we seey = log a + log(log g(x))whereg(x) = k + exp(.\3x), k = a1 t 1We wish to show that for large x,cWdx is positive, while d2y/dr 2 is negative.Noting that g' (x) = A3 [g(x) — k], we can writedy A3 1 — (k I g) dx^3 log gwhich will be positive when g > 1. Since g is increasing this will occur when x issufficiently large.Differentiating again, we can writed2 y^A2 dx 2^[g log .9, 1 2{k[g log g] — k2[logg] — [(g — k) 2 ]}— which we can see will be negative when g (and hence x) is sufficiently large.63Appendix EDerivatives of Survivor Function in Two-Frailty ModelWe want the first, second, and third-order derivatives of (5.27) with respect to (yi, .. • , ye).Initially, we will express these in terms of the derivatives of H* = HaA6 . We denote(5.27) by A = exp(—(H*) A9 and write B = —Ao (H*)'°'. Then the first-order derivativeisAa = ABThe second-order derivatives have the formAii = A(H: H;)[B 2 B (1) H* *^ B1l H2where B(d) is the dth derivative of B with resepct to H*.The third-order derivatives can be expressed as a sum of three terms:^Aijk^=^AB[B2 B( 1 ) + H13]A(HAH; H;k H:)[B 2 B( 1 ) + H B]A(HiHi )[2B B( 1) + B( 2).111 ctik 1;-- -&B]whered ^B = ^B^HZiHA B 112*.i ii;k B 11:i HZ B(1)^dyk HH^H: (Hr)2 H;^(H;)2 H:The derivatives of H* are given byH = A 6 -l h i64Appendix E. Derivatives of Survivor Function in Two-Frailty Model^65=0.A6(A6 — 1)H(Ar 2 hillj (i) = (j)1-1:3 otherwiseandA6 (A6 — 1)(A6 — 2)111r3 h i hi hk (i) (j) (k)H:jk =0^ otherwisewhere (i) indicates a or b depending on whether Yi is a tumour response or an othercause response respectively.Bibliography[1] ACHCAR, J.A. & SMITH, A.F.M.(1990). Aspects of reparameterization in approx-imate Bayesian inference. Bayesian and Likelihood Methods in Statistics and Econo-metrics 439-52. eds. S. Geisser, J.S. Hodges, S.J. Press, A. Zellner. Amsterdam:North-Holland.[2] AKAIKE, H.(1973). Information theory and an extension of the maximum likelihoodprinciple. 2nd. International Symposium on Information Theory 267-81. Budapest:Akademia Kiado.[3] BERGER, J.0.(1985). Statistical Decision Theory and Bayesian Analysis New York:Springer-Verlag.[4] BLOMQVIST, N.(1950). On a measure of dependence between two random vari-ables. Ann. Math. Statist. 21, 593-600.[5] CHAMBERS, J.M., MALLOWS, C.L. & STUCK, B.W.(1966). A method for sim-ulating stable random variables. J. Am. Statist. Assoc. 71, 340-44.[6] CLAYTON, D.(1978). A model for association in bivariate life tables and its appli-cation in epidemiological studies of familial tendency in chronic disease incidence.Biometrika 65, 141-51.[7] CORNFIELD, J. & DETRE, K.(1977). Bayesian life table analysis. J. R. Statist.Soc. B 39, 86-94.[8] COX, D.R.(1972). Regression models and life tables (with discussion). J. R. Statist.Soc. B 34, 187-202.[9] COX, D.R. & OAKES, D.(1984). Analysis of Survival Data Chapman & Hall.[10] FELLER, W.(1971). An Introduction to Probability Theory and Its Applications, 2,2nd ed. New York: Wiley.[11] GAMMERMAN, D. & WEST, M. (1987). An application of dynamic survival modelsin unemployment. Statistician 36, 269-74.[12] GEISSER, S.(1975). The predictive sample reuse method with applications. J. Am.Statist. Assoc. 70, 320-38,350.66Bibliography^ 67[13] GEISSER, S.(1980a). A predictivist primer. Bayesian Analysis in Econometrics andStatistics: Essays in Honor of Harold Jeffreys. 363-81. ed. A. Zellner. Amsterdam:North-Holland.[14] GEISSER, S.(1980b). Predictive sample reuse techniques for censored data. BayesianStatistics. 433-68. eds. J.M. Bernardo et. al., Valencia: University Press[15] GEISSER, S. & EDDY, W.F.(1979). A predictive approach to model selection. J.Am. Statist. Assoc. 74, 153-160.[16] GELFAND, A.E. Si SMITH, A.F.M.(1988). Sampling based approaches to calculat-ing marginal densities. Technical report: Nottingham Statistics Group. University ofNottingham.[17] GEMAN, S. & GEMAN, D. (1984). Stochastic relaxation, Gibbs distributions andthe Bayesian restoration of images. IEEE Transactions on Pattern Analysis andMachine Intelligence 6, 721-41.[18] GENEST, C. & MACKAY, R.J.(1986a). Copules Archimediennes et families de loismarges sont donnees. Can. J. Stat. 14, 145-59.[19] GENEST, C. & MACKAY, R.J.(1986b). The joy of copulas: bivariate distributionswith given marginals. The American Statistician 40 280-83.[20] HOLT, J.D. & PRENTICE, R.L.(1974). Survival analysis in twin studies andmatched-pair experiments. Biometrika 61 17-30.[21] HOUGAARD, P.(1984). Life table methods for heterogeneous populations: Distri-butions describing the heterogeneity. Biometrika 71, 75-83.[22] HOUGAARD, P.(1986a). Survival models for heterogeneous populations derivedfrom stable distributions. Biometrika 73, 387-96.[23] HOUGAARD, P.(1986b). A class of multivariate failure time distributions.Biometrika 73, 671-78.[24] JOE, H.(1991). Parametric families of multivariate distributions with given margins.Technical report: Department of Statistics. University of British Columbia.[25] KALBFLEISH, J.(1978). Non-parametric Bayesian analysis of survival time data.J. R. Statist. Soc. B 40, 214-21.[26] KALBFLEISH, J. & PRENTICE, R.L.(1980). The Statistical Analysis of FailureTime Data New York: Wiley.Bibliography^ 68[27] KANTER, M.(1975). Stable densities under change of scale and total variation in-equalities. Annals of Prob. 3, 697-707.[28] KASS, R.E. & STEFFEY, D.(1989). Approximate Bayesian inference in condition-ally independent hierarchical models (parametric empirical Bayes models). J. Am.Statist. Assoc. 84, 717-26.[29] KENDALL, M.G.(1938). A new measure of rank correlation. Biometrika 30, 81-93.[30] LAWLESS, J.F.(1982). Statistical Models and Methods for Lifetime Data New York:Wiley.[31] LEHMANN, E.L.(1966). Some concepts on dependence. Ann. Math. Statist. 37,1137-53.[32] LINDLEY, D.V. & SINGPURWALLA, N.D.(1986). Multivariate distributions forthe life lengths of components of a system sharing a common environment. J. AppliedProb. 23, 418-31.[33] MANTEL, N., BOHIDAR, N.R. & CIMINERA, J.L.(1977). Mantel-Haenszel anal-yses of litter matched time-to-response data, with modifications for recovery of in-terlitter information. Cancer Res. 37, 3863-68.[34] MARSHALL, A.W. & OLKIN, I.(1988). Families of multivariate distributions. J.Am. Statist. Assoc. 83, 834-41.[35] MARTZ, H.F. & WALLER, R.A.(1982). Bayesian Reliability Analysis New York:Wiley.[36] OAKES, D.(1986). A model for bivariate survival data. Modern Statistical Methodsin Chronic Disease Epidemiology eds. R.L. Prentice and S.H. Moolgavkar. New York,Wiley.[37] OAKES, D.(1989). Bivariate survival models induced by frailties. J. Am. Statist.Assoc. 84, 487-93.[38] SCHWARZ, G.(1978). Estimating the dimension of a model. Ann. Statist. 6, 461-64.[39] SMITH, A.F.M. & SPIEGELHALTER, D.J.(1980). Bayes factors and choice criteriafor linear models. J. R. Statist. Soc. B 42, 213-20.[40] SPEARMAN, C.(1904). The proof and measurement of association between twothings. Am. J. of Psychology 15, 72-101.[41] STACY, E.W.(1962). A generalization of the gamma distribution. Ann. Math.Statist. 33, 1187-92.Bibliography^ 69[42] STONE, M.(1974). Cross-validatory choice and assessment of statistical predictions(with discussion). J. R. Statist. Soc. B 36, 111-47.[43] STONE, M.(1977). An asymptotic equivalence of choice of model by cross-validationand Aikake's criterion. J. R. Statist. Soc. B 39, 44-47.[44] TIERNEY, L. & KADANE, J.B.(1986). Accurate approximations for posterior mo-ments and marginal densities. J. Am. Statist. Assoc. 81, 82-86.[45] VAUPEL, J.W., MANTON, K.G. & STALLARD, E.(1979). The impact of hetero-geneity in individual frailty on the dynamics of mortality. Demography 16, 439-54.[46] WILD, C.J.(1983). Failure time models with matched data. Biometrika 70, 633-41.