UBC Theses and Dissertations

Identification of worsening subjects and treatment responders in comparative longitudinal studies Kondo, Yumi 2016

Identification of Worsening Subjects and Treatment Responders in Comparative Longitudinal Studies

by

Yumi Kondo

B.A., Ritsumeikan University, 2009
B.A., American University, 2009
M.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Statistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

March 2016

© Yumi Kondo 2016

Abstract

This thesis discusses the problems of identifying worsening individuals in on-going clinical trials and treatment responders in completed trials.

We develop a new modelling approach to enhance a recently proposed method to detect increases of contrast enhancing lesions (CELs) on repeated magnetic resonance imaging, which have been used as an indicator for potential adverse events. The method signals patients with unusual increases in CEL activity by estimating the probability of observing CEL counts as large as those observed on a patient’s recent scans conditional on the patient’s CEL counts on previous scans. This index, computed based on a mixed effect negative binomial regression model, can vary substantially depending on the choice of distribution for the patient-specific random effects. Therefore, we relax this parametric assumption to model the random effects with an infinite mixture of beta distributions, using the Dirichlet process, which allows any form of distribution. As our inference is in the Bayesian framework, we adopt a meta-analytic approach to develop an informative prior based on previous trials. This is particularly helpful at the early stages of a trial. We illustrate our method with 10 multiple sclerosis (MS) trial datasets, and assess it by simulation studies.

Identification of treatment responders is a challenge in comparative studies where a treatment efficacy is measured by various longitudinally-collected continuous and count outcomes.
Existing procedures often identify responders based on only a single outcome. We propose to classify patients according to their posterior probability of being a responder estimated based on a multiple outcome mixture model. Our novel model assumes that, conditioning on a cluster label, each longitudinal outcome is from the generalized linear mixed effect model (GLMM), arguably the most popular longitudinal model. As GLMM is a rich class of models, our general procedure enables finding responders comprehensively defined by multiple outcomes from various distributions. We utilize the Monte Carlo expectation-maximization algorithm to obtain the maximum likelihood estimates of our high-dimensional model. We demonstrate the generality of our procedure on two MS trial datasets. Our simulation study shows that incorporating multiple outcomes improves the responder identification performance.

Preface

This thesis was completed under the supervision of Dr. Yinshan Zhao and Professor John Petkau.

Chapters 2 and 3 of this thesis are based on the paper “A flexible mixed effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients,” in Statistics in Medicine by Kondo Y., Zhao Y. and Petkau, J. [35].

This research problem was identified by my thesis supervisors Dr. Yinshan Zhao and Professor John Petkau. As the first author, I conducted literature reviews, proposed the models, developed the algorithms, implemented the algorithms in C and R programming languages, and performed the empirical analysis and simulation studies. I drafted the manuscript; my supervisors helped to revise it. The UBC MS/MRI Research Group and the Department of Statistics Statistical Consulting and Research Laboratory (SCARL) provided the multiple sclerosis clinical trial datasets.

Additional manuscripts based on the research in this thesis are in preparation for submission to peer-reviewed journals.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
2 A flexible mixed effect negative binomial regression model for detecting unusual increases in MRI lesion counts
    2.1 Introduction
    2.2 Model for repeated CEL counts
        2.2.1 The distribution of the CEL counts conditionally on $G_i$
        2.2.2 The infinite mixture distribution of $G_i$ and priors
    2.3 Posterior computation
        2.3.1 The truncated Dirichlet process
        2.3.2 The posterior conditional probability index
    2.4 Synthesizing previous datasets for prior specification
        2.4.1 Structure of $\beta$ for “current” and “previous” trials
        2.4.2 Random effect meta-analysis
    2.5 Conclusions
3 The applications and simulation studies of the flexible mixed effect negative binomial regression model
    3.1 Application to MS clinical trial datasets
        3.1.1 Ten MS clinical datasets
        3.1.2 Analysis of “previous” trials
        3.1.3 Meta-analysis: Synthesizing the nine previous trials
        3.1.4 Analysis of the “current” trial – Model fit
        3.1.5 Analysis of the “current” trial – Evaluation of CPIs
    3.2 Simulation studies
        3.2.1 Investigation of the impacts of prior choices and RE distributional assumptions on CPI estimates
        3.2.2 Revisiting YZ’s simulation study
    3.3 Conclusions and discussion
4 Identification of treatment responders
    4.1 Introduction
    4.2 Review of composite outcome measures
    4.3 Simulation study
        4.3.1 The ROC curve
        4.3.2 Simulation settings
        4.3.3 Results
    4.4 Discussion
5 Multiple longitudinal outcome mixture model to identify responders
    5.1 Introduction
    5.2 Notation
    5.3 Model
        5.3.1 Review of GLM and GLMM
        5.3.2 Multiple longitudinal outcome mixture model
    5.4 The estimation scheme
        5.4.1 Generating samples from $\tilde{B}_i \mid \tilde{y}_i; \hat{\Psi}^{\{s\}}$
        5.4.2 Choice of the MC sample size $M$
        5.4.3 Stopping rule
        5.4.4 Estimating the variance of $\hat{\Psi}$
        5.4.5 Initial values of the MCEM algorithm
        5.4.6 Evaluation of the posterior probabilities
    5.5 Extending to the negative binomial model
    5.6 A simple illustrative example
        5.6.1 Scenario 1: All outcomes are effective
        5.6.2 Scenario 2: One outcome is ineffective
    5.7 Application to identify relative responders
        5.7.1 The illustrative example revisited
    5.8 Conclusions
6 MS clinical trial data analysis to identify responders
    6.1 Lenercept MS trial
        6.1.1 Newly active lesions and persistently active lesions
        6.1.2 Data and model
        6.1.3 Results
    6.2 MBP8298 MS trial
        6.2.1 The Multiple Sclerosis Functional Composite
        6.2.2 Data and model
        6.2.3 Results
    6.3 Simulation studies
    6.4 Conclusions
7 Discussion
Bibliography
Appendices
A Appendix for Chapter 3
    A.1 Technical modification in YZ’s semiparametric procedure
    A.2 The details of the DIC calculation for our semiparametric model
B Appendix for Chapter 5
    B.1 The asymptotic distribution of $\hat{\Psi}^{\{s+1\}}$ and $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$
    B.2 The first derivative of the density of $\tilde{Y}_i \mid \tilde{b}_i$
        B.2.1 A modified notation for $\eta^{(r)[k]\langle l \rangle}_{i,j}$
        B.2.2 Derivations
    B.3 The first derivatives of $\mathrm{logit}(P_{i,k}(\Psi'))$ with respect to $\Psi'$
C Appendix for Chapter 6
    C.1 Lenercept trial: MCEM parameter assessments
    C.2 MBP8298 trial: MCEM parameter assessments

List of Tables

2.1 The CPIs for 4 example patients based on the YZ NB mixed effect regression model with gamma or log-normal RE distributional assumptions. The fixed effect parameters of the NB model are selected to be the same for the two models. The expectation and variance of the RE distribution are also set the same.

3.1 The key features and analysis summaries of the clinical trials. The abbreviations are RRMS: relapsing remitting multiple sclerosis, SPMS: secondary progressive, $T^{(s)}$: the number of four-month intervals during the follow-up, SP: semiparametric, P: parametric, IFN: Interferon, IVIG: IV immunoglobulin, MP: Micellar Paclitaxel, “Efficacy”: whether or not the referenced clinical paper found efficacy based on the CEL counts. Although our application treats SPMS-5 as a “current” trial, we also report its fits with this prior. The lower panel of the table shows the estimated fixed effect parameters and the deviance information criterion (DIC) introduced in Section 3.1.2. The DICs are computed for both the semiparametric Bayesian procedure (SP) and the parametric Bayesian procedure (P). No published reference is available for SPMS-5.

3.2 Summary of the “current” trial analysis. The reported values are the estimated posterior mean (or MLE) of the fixed effect coefficients and the SD. The abbreviations are; Freq: Frequentist (i.e., YZ’s procedures), Bayes: Bayesian, P: parametric, SP: semiparametric, U: uninformative prior, I (full): full informative prior, I (SPMS): SPMS informative prior.
The table also shows estimated mixture components and the estimated precision parameter $D$. For YZ’s semiparametric procedure, the SD is computed based on 500 bootstrap samples. To account for the varying follow-up times of the patients, the bootstrap sampling is stratified according to the follow-up time.

3.3 CPI estimates (and 95% credible intervals) and CEL counts of patients whose CPI estimates based on the Bayesian semiparametric (SP) and parametric (P) procedures with the same full informative prior (I (full)) differ by more than 0.03. Only cases with CPI < 0.25 are considered as our focus is the patients who have unexpected increases in CEL counts.

3.4 CPI estimates (and 95% credible intervals) and CEL counts of patients whose CPI estimates based on the Bayesian semiparametric (SP) and parametric (P) procedures with the same full informative prior (I (full)) differ by more than 50%. Only cases with CPI < 0.25 are considered as our focus is the patients who have unexpected increases in CEL counts.

3.5 The average RMSEs (and SD) computed based on 300 simulations. The RMSEs of the CPIs are computed only based on the patients with one or two new scans. The abbreviations are Freq: frequentist (i.e., YZ’s procedures), Bayes: Bayesian, P: parametric, SP: semiparametric, I (full): full informative prior, I (SPMS): SPMS informative prior, U: uninformative prior.

3.6 Comparisons of Bayes SP U and Bayes SP I in terms of $\Pr(Y_{i,\mathrm{new}+} \ge y_{i,\mathrm{new}+} \mid \boldsymbol{Y}_{i,\mathrm{pre}} = \boldsymbol{y}_{i,\mathrm{pre}})$ for selected values of $y_{i,\mathrm{new}+}$ and $y_{i,\mathrm{pre}+}$ based on 500 simulations.

5.1 Simulation results of the illustrative example. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and that based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

5.2 The estimated posterior probability (and 95% CI) of being in cluster 2 (decreasing trend) for patients misclassified by at least one of TRIPLE, DOUBLE-12, SINGLE-1, 2 or 3. “x” indicates a patient misclassified by the corresponding procedure. N miss indicates the number of misclassified patients.

5.3 Simulation results of the illustrative example when outcomes 1, 2 are effective and outcome 3 is ineffective. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and that based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

5.4 Simulation results of the illustrative example in the context of identifying relative responders. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and that based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

6.1 The model estimates of DOUBLE and SINGLE using NAL. For both PAL and NAL outcomes, Linear and Step mean models are considered. Labels next to $\beta^{(r)[1]}_l$ indicate the p-value of the asymptotic Z-test for hypothesis $\beta^{(r)[1]}_l = \beta^{(r)[2]}_l$; †† indicates that p-value is less than 0.01; 4 indicates that the p-value is greater than 0.05. The abbreviations are: llk: log-likelihood values at the final estimates; AIC: Akaike Information Criterion; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

6.2 The estimated posterior probabilities based on DOUBLE with Linear mean models for both NAL and PAL, and based on SINGLE with the Linear mean model for NAL for various patients.

6.3 The model estimates of TRIPLE and SINGLE. Labels next to $\beta^{(r)[1]}_l$ indicate the p-value of the asymptotic Z-test for hypothesis $\beta^{(r)[1]}_l = \beta^{(r)[2]}_l$; †† indicates that the p-value is less than 0.01. The abbreviations are; llk: log-likelihood values at the final estimates; AIC: Akaike Information Criterion; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

6.4 The mean point estimates and RMSE across 100 simulated datasets for three scenarios.

List of Figures

2.1 CEL counts of the control patients in one clinical trial. Left panel: trajectories of counts for individual patients; right panel: histogram of average counts.

2.2 The days of MRI scans for each patient in one clinical trial since the date of the screening scan of the first patient. The vertical lines are drawn at every 150 days. #pat and #S in each bin represent the number of patients and the number of MRI scans within the interval. The screening and the baseline scans are coloured in blue.

3.1 The posterior RE density estimates (i.e., $G_i$, top) and the density estimates of the patient’s mean CEL counts at baseline (i.e., $e^{\alpha^{(s)}}(1 - G_i)/G_i$, bottom) from both the parametric (blue) and semiparametric (black) Bayesian procedures for all ten clinical trials. The dotted curves are pointwise 95% CIs based on the semiparametric Bayesian procedure.

3.2 Top: The estimates of $D^{(s)}$, $\alpha^{(s)}$ and $\beta^{(s)}_{a,t}$, $t = 1, 2$, $a = 0, 1$, $s = 1, 2, \dots, 9$. The areas of the rectangles are proportional to the number of scans in each study. The horizontal line for each estimate corresponds to the 95% CI. Middle: Marginal estimates of the full and SPMS informative priors. The horizontal length represents the 95% CI. Bottom: The estimates from the analysis of the “current” study based on the Bayesian semiparametric procedure with the full informative prior (Bayes SP: I (full)), the SPMS informative prior (Bayes SP: I (SPMS)), and an uninformative prior (Bayes SP: U), the parametric Bayesian procedure with the full informative prior (Bayes P: I (full)), YZ’s parametric procedure (Freq P), and YZ’s semiparametric procedure (Freq SP). Review 1 does not have estimates of $\beta^{(0)}_{0,2}$ as no patient was followed more than 4 months by this review. “T” indicates that the limit of the 95% CI is truncated; see Web Table 3.1 for the actual values.

3.3 Comparisons of CPI estimates (defined in (2.6)) from semiparametric Bayesian procedures among the full informative prior, the SPMS informative prior and an uninformative prior for the “current” trial at reviews 2 and 4. The digits are patient IDs. The solid line is the 45-degree reference line.

3.4 The conditional probability mass function of $Y_{i,n+} \mid \boldsymbol{Y}_{i,\mathrm{pre}}$ for two selected patients with large discrepancies in their CPI estimates from the semiparametric Bayesian procedure with the full and SPMS priors at review 4.

3.5 Comparisons of CPI estimates from the parametric and semiparametric Bayesian procedures for the “current” trial. The digits are patient IDs. The solid line is the 45-degree reference line.

3.6 Boxplots of the lengths of 95% CIs for the CPIs of patients in SPMS-5 at each review based on the YZ parametric procedure (Freq P), the YZ semiparametric procedure (Freq SP), the semiparametric Bayesian procedure with the uninformative prior (Bayes SP U), the SPMS informative prior (Bayes SP I (SPMS)) and the full informative prior (Bayes SP I (full)), and the parametric Bayesian procedure with the full informative prior (Bayes P I (full)). For the YZ semiparametric procedure, 95% empirical CIs based on 500 bootstrap samples are reported. The boxplots only show the lengths of CPIs for patients with $Y_{i,\mathrm{pre}+} > 0$, and the number of CPIs used for each boxplot is represented as #CPI.

3.7 The estimated RE density from the Bayesian semiparametric (black), Bayesian parametric (blue) (both with full informative prior) and YZ’s parametric procedures (green) from a single simulation in Scenario A. The dotted curves represent the upper and lower limits of 95% credible intervals from the semiparametric procedure. The true density curve (red) and the sampled REs (histogram) are superimposed.

4.1 The AUC based on six procedures under the scenarios when $Y^{(2)}_i$ is ineffective ($\mu_2 = 0$) and effective ($\mu_2 = 0.5, 1, 2$). Three posterior probability procedures are considered: $\mathrm{PP}_{1,i}$, $\mathrm{PP}_{2,i}$ and $\mathrm{PP}_{12,i}$. Three linear composite score procedures are considered: OLS, $\mathrm{LCS}_i(-1)$ and $\mathrm{LCS}_i(10)$.

4.2 A hundred random samples of $(Y^{(1)}_i, Y^{(2)}_i)^T$ from each of the responder (blue) and non-responder (red) groups for various combinations of the parameters: $\mu_2 = 1, 0.5, 0$ and $\rho_1 = \rho_2 = -0.95, 0, 0.5, 0.95$ when $\mu_1 = 1$ and $\pi = 0.5$. The contours of the posterior probability based on the two outcomes are superimposed. In each plot, the adjoining side panel shows the corresponding posterior probability based only on $Y^{(2)}_i$ and the adjoining top panel shows the corresponding posterior probability based only on $Y^{(1)}_i$. The adjoining number in the top right corner is $\rho_1$ ($= \rho_2$).

4.3 The contours of OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) as functions of $Y^{(1)}_i$ and $Y^{(2)}_i$. The superimposed points are 100 samples of $(Y^{(1)}_i, Y^{(2)}_i)$ from each of the responder (circle, blue) and non-responder (triangle, red) groups when $\rho_1 = \rho_2 = 0.8$ and $\mu_2 = 2$.

4.4 The density of OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) for responders (solid curve) and non-responders (dotted curve) when $\rho_1 = \rho_2 = 0.8$ and $\mu_2 = 2$.

5.1 Trace plots of the parameters $\Psi^{\{s\}}_{Y^{(1)}}$ over the MCEM iterations. Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1. The dotted lines indicate the MLEs.

5.2 Trace plots of the parameters $\Psi^{\{s\}}_{Y^{(2)}}$ over the MCEM iterations. Black: TRIPLE; Magenta: DOUBLE-12; Green: SINGLE-2. The dotted lines indicate the MLEs.

5.3 Trace plots of the parameters $\Psi^{\{s\}}_{Y^{(3)}}$ over the MCEM iterations. Black: TRIPLE; Red: SINGLE-3. The dotted lines indicate the MLEs.

5.4 Trace plots of the parameters $\Sigma^{\{s\}}_B$ and $\pi^{\{s\}}$ over the MCEM iterations from the Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1; Green: SINGLE-2; Blue: SINGLE-3. The dotted lines indicate the MLEs.

5.5 The observed log-likelihood values at each iteration of the MCEM algorithm. Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1; Green: SINGLE-2; Blue: SINGLE-3.

5.6 The top panel shows the asymptotic lower $L_\alpha(\hat{\sigma})$ and upper $U_\alpha(\hat{\sigma})$ bounds for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ (dashed curves) and the point estimates (solid curve) for each procedure. The y-axis is on the log-scale. The bottom panel shows the MC sample size at each iteration for each procedure.

5.7 The trajectories of the simulated subjects over the 5 time points coloured by the clusters assigned by TRIPLE (top panels) and trajectory of subject 76 misclassified by SINGLE-2 and 3 (bottom panels).

5.8 The joint distribution of the posterior probabilities based on SINGLE-1, 2, 3 or DOUBLE, and TRIPLE. The percentage indicates the proportion of the TRIPLE posterior probabilities that are larger than SINGLE-1, 2, 3 or DOUBLE.

5.9 The trajectories of the 50 additional control patients

6.1 The proportions of NALs or PALs from the previous time point that become PALs at the current time point plotted against the ratio of the NAL and PAL counts at the previous time point. The intensity of the red indicates the number of observations lying at the same location. Note that the x-axis is presented on the log-scale.

6.2 The Spearman correlation matrices across 14 repeated measurements (7 from NAL and 7 from PAL) for treated patients obtained from: (a) DOUBLE with the Linear mean models for both NAL and PAL, and (b) the raw data. N:j and P:j indicate the NAL and PAL counts on the jth scan, respectively.

6.3 The estimated posterior probability of being a relative responder for all patients on the treatment arm based on both DOUBLE and SINGLE labeled by patient ID.

6.4 The Spearman correlation matrices across 39 repeated measurements (13 from PASAT, 13 from T25FW and 13 from 9HPT) obtained from: (a) TRIPLE, and (b) the raw data. P:j, T:j and H:j indicate the jth PASAT, T25FW and 9HPT outcome, respectively.

6.5 The trajectories of the observed values of PASAT and T25FW on the log-log scale and of 9HPT on the log-log scale for each patient, grouped by non-worsening and worsening clusters. The classifications are based on the results from TRIPLE (first row), SINGLE-PASAT (second row), SINGLE-T25FW (third row) and SINGLE-9HPT (fourth row). The average at each time point for each cluster is represented by the bold dotted curve. The dotted horizontal lines for T25FW and 9HPT show their maximum values.

6.6 The Venn diagram of worsening patients found by the four procedures. The number in parenthesis indicates the total number of identified worsening patients by each procedure.

6.7 The trajectories of T25FW and 9HPT on the log-log scale, and of PASAT and MSFC over $j = -1, 0, \dots, 12$ for example patients. The dotted horizontal lines for T25FW and 9HPT show their maximum values. The legends show patient IDs.

6.8 The trajectories of the MSFC scores grouped by clusters from TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT.

6.9 The estimated posterior probabilities of being in the worsening group from the four procedures versus the change in MSFC between the final follow-up measure and the baseline. “cor” shows the Spearman correlation between the change in MSFC scores and the estimated posterior probabilities.

C.1 The top panels show the traceplots of the approximated observed log-likelihood value based on SINGLE with the Linear mean model for NAL and DOUBLE with the Linear mean models for both NAL and PAL counts. The middle panels show the asymptotic lower $L_\alpha(\hat{\sigma})$ and upper $U_\alpha(\hat{\sigma})$ bounds for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ (dashed curves) and the point estimates (solid curve). The y-axis is on the log-scale. The bottom panels show the MC sample size at each iteration.

C.2 The traceplots of the estimated parameters over MCEM iterations. The black curves indicate the trace of DOUBLE with the Linear mean models for both NAL and PAL counts. The red curves indicate the trace of SINGLE with the Linear mean model for NAL counts.

C.3 The top panels show the traceplots of the approximated observed log-likelihood value based on procedures (TRIPLE and SINGLE). The middle panels show the asymptotic lower $L_\alpha(\hat{\sigma})$ and upper $U_\alpha(\hat{\sigma})$ bounds for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ (dashed curves) and the point estimates (solid curve). The y-axis is on the log-scale. The bottom panels show the MC sample size at each iteration.

C.4 The traceplots of the estimated parameters over MCEM iterations. The black, magenta, red and blue curves indicate the trace of TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT, respectively.

List of Abbreviations

9HPT      9 hole peg test
AIC       Akaike information criterion
AUC       area under the ROC curve
BLUE      best linear unbiased estimator
CDF       cumulative distribution function
CEL       contrast-enhancing lesion
CI        credible interval or confidence interval (depending on context)
CPI       conditional probability index
CPMF      conditional probability mass function
DIC       deviance information criterion
DOUBLE    double longitudinal outcome mixture model
DP        Dirichlet process
DSMB      data and safety monitoring board
EM        expectation-maximization
GLM       generalized linear model
GLMM      generalized linear mixed effect model
GLS       generalized least squares
LCS       linear composite score
MC        Monte Carlo
MCMC      Markov chain Monte Carlo
MH        Metropolis–Hastings
MLE       maximum likelihood estimate
MLOMM     multiple longitudinal outcome mixture model
MRI       magnetic resonance imaging
MS        multiple sclerosis
MSFC      multiple sclerosis functional composite
MVN       multivariate normal
NAL       newly active lesions
NB        negative binomial
OLS       ordinary least squares
PAL       persistently active lesions
PASAT     paced auditory serial addition test
RE        random effect
RMSE      root mean square error
ROC       receiver operating characteristic
RRMS      relapsing remitting multiple sclerosis
SD        standard deviation
SE        standard error
SINGLE    single longitudinal outcome mixture model
SPMS      secondary progressive multiple sclerosis
T1-Gd     T1-weighted gadolinium-enhanced
T25FW     timed 25-foot walk
TRIPLE    triple longitudinal outcome mixture model
YZ        Zhao et al. [83]

Acknowledgements

Firstly, I would like to express my sincere gratitude to my supervisors Dr. Yinshan Zhao and Professor John Petkau for their continuous support of my Ph.D. studies, for their career advice, and for their patience. I was blessed to have weekly meetings with them for the past 3 years. I hope they know how excited I was for every meeting to share my new findings with them. Their keen intellectual insights toward research always amazed me. I had a clear idea of my goal throughout my Ph.D. studies: I want to be a professional statistician just like them. I could not have imagined having better supervisors.

Besides my supervisors, I would like to thank the rest of my thesis committee, Professor Harry Joe and Professor James V. Zidek, for their insightful comments and encouragement.

I thank my fellow graduate student and best friend, Dr. Yanling (Tara) Cai. I enjoyed every single discussion with her regarding research, writing, case study competition, exams, marriage, or simply what to eat for lunch. She was a ray of sunshine in my office. Also I would like to thank my fellow graduate student and now my husband, Dr. Kyle Hambrook, for believing in me in times of difficulty. He never doubted that I would complete the Ph.D. program, and his confidence gave me true strength. I also thank him for enlightening me about playing card and board games.

Last but not least, I would like to thank the mountains in North Vancouver, Burnaby, Squamish, Whistler and Pemberton. The adventures in those mountains always made me realize that there is so much more to explore in the world.

Dedication

To Kyle

Chapter 1

Introduction

Randomized controlled clinical trials are conducted to assess the efficacy of drugs and their potential adverse effects on the population of interest. Their effects on specific individuals in trials have been considered of secondary interest. However, clinicians have long observed that patients with similar symptoms may have different causes, and similarly that medical interventions may work well in some patients with a disease but not in others. Advances in a wide range of fields from genomics to magnetic resonance imaging (MRI), along with increased computational power and other technologies are allowing individual patients to be monitored more carefully and treated more efficiently to better meet their individual needs [18].

Multiple sclerosis (MS) involves immune system attacks against the central nervous system (brain, spinal cord, and optic nerves).
MS is thought to affect more than 2.3 million people worldwide. It is estimated that approximately 55,000-75,000 men and women in Canada have the disease, and every day about three more Canadians are diagnosed with MS. MS patients have heterogeneous disease courses: MS can cause blurred vision, loss of balance, poor coordination, slurred speech, tremors, numbness, extreme fatigue, problems with memory and concentration, paralysis, blindness, and other symptoms. The U.S. Food and Drug Administration has approved several medications for the treatment of MS since 1993; Betaseron was the first medication approved for MS [19]. As yet, there is no cure for MS [52].

Conventionally, various longitudinally collected clinical and MRI endpoints are examined separately to assess progression of MS. Contrast-enhancing lesion (CEL) counts on T1-weighted gadolinium-enhanced (T1-Gd) MRI scans detect acute inflammatory plaques and are widely used as a surrogate endpoint in short-term phase I/II MS clinical trials to monitor treatment efficacy as well as patient safety [48]. Another popular MRI outcome is the count of lesions observed on T2-weighted scans. Unlike T1-Gd scans, which show only the currently enhancing lesions, T2-weighted scans also show inactive old lesions. T2-weighted scans are often used for tracking long-term disease progression. Although the relationship between clinical symptoms and lesion activity on MRI scans is not completely clear, the use of MRI to monitor disease activity in phase I/II MS trials has gained acceptance as it is sensitive enough to measure subtle disease progression [63].

However, regulatory agencies require clinical outcomes, which directly measure what patients feel, as primary outcomes for phase III trials of therapeutic medications for MS, and MRI outcomes are only allowed as secondary outcomes. The MS functional composite (MSFC) [16] is a widely used clinical outcome in phase II/III MS trials. The MSFC was designed to reflect the varied clinical expression of MS across patients and over time. It is a composite score based on three test scores: the first test is the 9 hole peg test (9HPT), which assesses arm functionality by measuring how fast a patient can place and remove pegs in holes; the second test is the timed 25-foot walk (T25FW), which assesses leg functionality by measuring how fast a patient can walk 25 feet; and the third test is the paced auditory serial addition test (PASAT), which assesses cognitive functionality by counting how many correct answers a patient can make to 60 addition quizzes.

During MS trials, these MRI and clinical outcome measures are typically collected at screening, baseline and several follow-up visits. The screening measures are taken to assist in determining eligibility and the baseline measures are taken immediately before treatment initiation. The follow-up measures are taken according to a prespecified schedule. There is an urgent need to develop rational procedures for monitoring the safety of individual patients in on-going MS trials, and for identifying individuals who responded particularly well to treatments in completed MS trials based on these longitudinal outcomes.

We first discuss the problem of how to identify worsening patients using CEL counts during MS clinical trials in Chapters 2 and 3. Such a procedure is essential as new drugs tested in phase II/III trials could have potential adverse effects on some patients. Data and Safety Monitoring Boards (DSMBs) are responsible for monitoring the safety of trial patients and the ethical execution of trials [51] and have been using an increase of CELs on MRI scans as an indicator for potential adverse events in MS trials. However, identifying excessive activity levels for an individual patient is challenging as CEL activity levels vary substantially across patients.
To quantify how excessive CEL activity levels are for individual patients, Zhao et al. [83] (hereafter YZ) proposed to first summarize the CEL counts on the recent MRI scans into a scalar statistic, such as the total count, and then compute the probability of observing summarized lesion activity as large as that observed, conditioning on the patient's previous CEL activity levels. This conditional probability index (CPI) was evaluated based on a mixed effect negative binomial (NB) model for the longitudinal CEL counts with a patient-specific random effect (RE).

This novel CPI approach was useful to identify patients with unexpected recent CEL activity, but it was observed that the CPI is sensitive to the choice of RE distribution. This motivated us to develop a flexible alternative model. Our approach also employs a mixed effect NB model but parametrizes the RE to model the failure probability, so that the RE ranges within (0,1). We then assume that the RE is from an infinite mixture of beta distributions. To keep our model tractable, we take a Bayesian approach and assign a "stick-breaking prior" [70] for the probability components of the infinite mixture. This prior allows most of the probability mass to stay in the first "few" probability components.

Another shortcoming of the YZ procedure is that it is difficult to reliably identify worsening patients in the early stages of a trial when only a small number of patients are enrolled. As our inference is in the Bayesian framework, we incorporate information from previously completed similar MS trials by adopting a meta-analytic approach [72] to develop an informative prior. Our novel procedure is introduced in Chapter 2. Chapter 3 demonstrates the procedure with ten previously conducted MS clinical trials, and assesses it
with simulation studies.

The second problem this thesis discusses is how to identify treatment responders based on multiple longitudinal outcomes in completed clinical trials. When a cure is absent and a number of therapeutic options are available, the identification of responders and non-responders is crucial to optimize selection of treatments. Existing procedures to identify treatment responders are often based on a single longitudinal outcome. However, due to the complex nature of diseases, it is often difficult to determine a single outcome that would indicate a treatment effect for responders. Treatment responders should be defined comprehensively based on all the outcomes of interest. Currently, there is no general approach to determine treatment responders based on multiple longitudinal outcomes generated from various continuous and count distributions.

Chapter 4 discusses our initial attempt to identify treatment responders using the CPI idea from Chapter 2. We first summarize all the follow-up longitudinal measurements from multiple outcomes into a single scalar statistic to indicate the disease condition during follow-up, and then compute the probability for a treated patient to experience as large an improvement as observed on this summary statistic conditioning on the pre-treatment disease activity level. A joint model for the multiple longitudinal outcomes allows the evaluation of this probability. However, we found that it is often very difficult to pre-determine a suitable single scalar statistic to summarize all the follow-up measurements as it is not clear in advance how treatment will affect the multiple longitudinal outcomes of the responders. The simulation studies in Chapter 4 show that the performance of this procedure in identifying responders depends heavily on the choice of the scalar statistic.

This motivated us to take a more general alternative approach that should work well with any choice of longitudinal outcomes regardless of how treatment affects the outcomes of responders. We propose to view the identification of responders as a clustering problem. The clusters of responders and non-responders are based on the multiple longitudinal outcomes. For each patient, we compute the posterior probability that the patient belongs to the responder cluster. The simulation studies in Chapter 4 show that this posterior probability approach performs better than the CPI approach, provided that the mixture model can be estimated.

This clustering approach requires a mixture model for multiple longitudinal outcomes. Chapter 5 proposes our novel multiple longitudinal outcome mixture model. As generalized linear mixed effect models (GLMMs) are arguably the most popular models for a longitudinal outcome in biostatistical applications, this model assumes that conditioning on a cluster label, each outcome is from a GLMM. As the GLMMs represent a rich family of distributions, our proposed model can be applied to a great variety of continuous and discrete longitudinal outcomes. As our high-dimensional model potentially could have a large number of parameters, direct maximization of the likelihood may be a challenge. To obtain maximum likelihood estimates, we develop an optimization algorithm that utilizes the Monte Carlo expectation-maximization algorithm [6]. Chapter 6 illustrates our procedure using two MS clinical trial datasets and assesses how well the procedure is able to identify responders using simulations. Although our applications focus on MS clinical trial datasets, our general procedure can be applied to any clinical trial datasets where, conditioning on a cluster label, each of multiple longitudinal outcomes can be modelled with a GLMM. The thesis concludes with a discussion in Chapter 7.

Chapter 2
A flexible mixed effect negative binomial regression model for detecting unusual increases in MRI lesion counts

2.1 Introduction

CEL counts on MRI detect acute inflammatory plaques in MS patients and are widely used as a surrogate endpoint in short-term phase I/II MS clinical trials to monitor treatment efficacy as well as patients' safety [48]. Newly enhancing lesions typically continue to enhance for at most several months, so the CEL counts on repeated MRI scans in individual patients will increase and decrease over time. The trajectories of CEL counts from one of our clinical trial datasets in Figure 2.1 illustrate the typical pattern of a large degree of heterogeneity both across patients at fixed time points and within patients over time; see also YZ [83, Figure 1]. DSMBs have been using an increase of CELs on MRI as an indicator for potential adverse events in MS trials. However, identifying excessive activity levels for an individual patient is challenging. DSMBs often rely on ad-hoc criteria such as five or more new CELs above the patient's baseline level on two consecutive monthly MRIs to identify an unexpected increase of CEL activity. Such a guideline could be problematic as the same increase over baseline may have different implications for different patients [65].

Figure 2.1: CEL counts of the control patients in one clinical trial. Left panel: trajectories of counts for individual patients; right panel: histogram of average counts.
To fully utilize the CEL information, YZ proposed to use the probability of observing CEL counts as large as those observed on a patient's recent scans conditional on the patient's previous scans.

Phase II MS trials typically involve a screening MRI scan to assist in determining eligibility, a baseline scan taken immediately before treatment initiation, followed by monthly scans for 6 to 12 months. Figure 2.2 shows the scan days of patients in one clinical trial dataset. CELs are counted on each scan and reviewed by the DSMB on a regular basis, perhaps once every 4 months (red vertical lines in Figure 2.2).

Figure 2.2: The days of MRI scans for each patient in one clinical trial since the date of the screening scan of the first patient. The vertical lines are drawn at every 150 days. #pat and #S in each bin represent the number of patients and the number of MRI scans within the interval. The screening and the baseline scans are coloured in blue.

Let $N$ be the total number of patients undergoing follow-up at the time of a review and let $y_{i,j}$ be the CEL count of the $i$th patient at the $j$th time point, where $j = -1, 0, \ldots, n_i$ correspond to screening, baseline, and follow-up scans. Hence the data for the $i$th patient is $\mathbf{y}_i = (y_{i,-1}, \ldots, y_{i,n_i})$. These scans are intended to be taken on a fixed schedule of time points that is the same for all patients. These scans can be divided into two sets: the "pre-scans" and the "new scans". The pre-scans include the pre-treatment scans (screening and baseline) and any follow-up scans reviewed by the DSMB previously, while the new scans include all the available follow-up scans not reviewed previously. For a probability model, we consider $\mathbf{y}_i$ as a realization of a random vector $\mathbf{Y}_i = (Y_{i,-1}, \ldots, Y_{i,n_i})$. Let $m_i \ge 0$ be the time index of the last scan in the pre-scans set.
YZ proposed to quantify the degree of abnormality of $\mathbf{y}_{i,new} = (y_{i,m_i+1}, \ldots, y_{i,n_i})$ relative to $\mathbf{y}_{i,pre} = (y_{i,-1}, \ldots, y_{i,m_i})$ with the CPI:

$$\Pr(Y_{i,n+} \ge y_{i,n+} \mid \mathbf{Y}_{i,pre} = \mathbf{y}_{i,pre}),$$

where $Y_{i,n+} = \sum_{j=m_i+1}^{n_i} Y_{i,j}$ and $y_{i,n+}$ is the sample counterpart. The CPI for each patient is estimated based on a mixed effect NB regression model with a patient-specific random intercept taking positive values. The model is fitted to the data using maximum likelihood, treating control and treated patients as a single group.

The YZ procedure provides a rational means to signal patients with unexpected increases in CEL counts, which takes into account the variability of CEL counts within a patient. However, this approach has at least three shortcomings. First and most importantly, the patient-specific CPI can be rather sensitive to the choice of the RE distribution. The CPIs for 4 example cases based on the mixed effect NB model with log-normal and gamma RE distributions reported in Table 2.1 clearly illustrate this point. YZ also considered a simple semi-parametric approach to estimate the RE distribution but this approach could show considerable variation in resulting CPI estimates.

         CEL counts           RE distribution
  pre-scans   new scans     gamma    log-normal
  0/0         0/0/1         0.83     0.59
  0/0         1/1/2         0.30     0.15
  0/1         1/1/2         0.42     0.33
  15/12       20/27/5       0.10     0.07

Table 2.1: The CPIs for 4 example patients based on the YZ NB mixed effect regression model with gamma or log-normal RE distributional assumptions. The fixed effect parameters of the NB model are selected to be the same for the two models. The expectation and variance of the RE distribution are also set the same.

Furthermore, the REs may not be described by a simple parametric model. The histogram in Figure 2.1 shows the average CEL count of each control patient, which is a crude estimate of the scaled RE of YZ's model, from one of our clinical trial datasets. The histogram is clearly multimodal, suggesting that modelling the REs with a mixture distribution might be better.

This shortcoming leads us to consider a flexible nonparametric approach to model the RE distribution.

Our procedure also assumes a NB for the sampling distribution of CEL counts conditional on a patient-specific RE. We model the failure probability of the NB as a RE, ranging within (0,1), and assume it is from an infinite mixture of beta distributions:

$$F_G(g) = \sum_{h=1}^{\infty} \pi_h \,\mathrm{Beta}(g;\, a_{G_h}, b_{G_h}), \tag{2.1}$$

where $0 \le \pi_h \le 1$, $\sum_{h=1}^{\infty} \pi_h = 1$ and $\mathrm{Beta}(\cdot\,; a, b)$, $a > 0$ and $b > 0$, is the beta cumulative distribution function (CDF) with mean $a/(a+b)$. We selected beta distributions for the components of the infinite mixture because the conjugacy between the beta and the NB leads to computational advantages. We adopt a Bayesian approach where the parameters $a_{G_h}$, $b_{G_h}$ and $\pi_h$ are taken to be random. The shape parameters $(a_{G_h}, b_{G_h})$ are assumed to be i.i.d. samples from some joint distribution $K_0$. To prevent over-fitting and keep our model tractable, the prior of $\{\pi_h\}_{h=1}^{\infty}$ should favor shrinkage towards a simpler finite mixture model. This is achieved by assigning the "stick-breaking prior" [70] for $\{\pi_h\}_{h=1}^{\infty}$ so that $\pi_h$ is stochastically decreasing as the index $h$ increases. With this prior, we are able to accurately represent the infinite mixture with a truncated finite mixture yet enjoy a great deal of flexibility.

The second shortcoming of the YZ procedure is that it is difficult to reliably identify worsening patients at the early stages of a trial when only a small number of patients are enrolled. For example, the clinical trial shown in Figure 2.2 contained 50 patients and merely 149 MRI scans at the end of the first five months of the study. This issue motivates us to generalize the YZ procedure to allow the incorporation of information from previously completed studies.
This is straightforward within the Bayesian framework. Available "previous" trials are synthesized in two steps: (1) our flexible Bayesian model with uninformative priors is fitted to each "previous" trial separately; (2) the estimates of relevant unknown parameters from each trial are combined via meta-analysis and the estimated distribution is used as an informative prior for the on-going "current" trial.

The third shortcoming arises in YZ's evaluation of the CPI. As the DSMB review of MRI safety in MS trials is often initially carried out without knowledge of the treatment assignments, YZ evaluated the CPI based on a model which treats control and treated patients in the "current" trial as a single group. If the treatment reduces patients' CEL counts, then this approach underestimates the CEL activity mean function among control patients. The resulting CPIs then treat increases in CEL counts in the new scans of control patients as more surprising than they should be. The CPI should detect CEL activity levels that are unusual for an individual patient if they had been on the control arm. We assume that the treatment assignments are available to the unblinded statisticians responsible for preparing the data review for the DSMB. Hence in our analysis of the "current" trial, we model the mean levels of the control and treated patients separately and evaluate the CPIs based on the CEL activity model with the mean function as estimated for the control patients. A similar approach is also adopted in [82].

The rest of this chapter is organized as follows: Section 2.2 introduces our regression model and Section 2.3 discusses the posterior sampling scheme. We discuss how to develop an informative prior based on previous clinical trials in Section 2.4. Section 2.5 briefly concludes this chapter. Chapter 3 demonstrates the use of our proposed method, and reports simulation studies that compare the performance of our procedure with YZ's procedures and parametric Bayesian alternatives.
2.2 Model for repeated CEL counts

2.2.1 The distribution of the CEL counts conditionally on $G_i$

The Poisson distribution, commonly used to model count data, restricts the mean and the variance of the count variable to be the same. The NB distribution allows the variance to be larger than the mean and has been the main candidate to model the distribution of CEL counts [50, 71, 84].

As CEL activity levels differ substantially across patients [3], we assume there exists an unobserved latent random variable $G_i$ for the $i$th patient, which reflects the patient-specific CEL activity level relative to the overall cohort. Conditionally on $G_i$, the patient's CEL counts are assumed to be mutually independent. Since the CEL counts of individual patients tend to be considerably over-dispersed, we model the conditional distribution as NB:

$$Y_{i,j} \mid G_i = g_i \overset{\text{ind.}}{\sim} \mathrm{NB}(r_{i,j}, g_i), \tag{2.2}$$

where $\mathrm{NB}(r, g)$ denotes the NB distribution with probability mass function evaluated at $y$:

$$\frac{\Gamma(r + y)}{\Gamma(r)\, y!}\, g^{r} (1 - g)^{y}, \qquad r > 0,\ g \in (0, 1),$$

and mean $r(1-g)/g$. The size parameter $r_{i,j}$ is modelled as $r_{i,j} = \exp(X_{i,j}^T \beta)$, and the failure probability is given by the RE $g_i$. This parametrization restricts the range of $g_i$ to be within (0,1). If we denote $\mu_{1/G} = E(1/G_i) > 1$, then on the log scale, the mean CEL count is linear in the covariates as $\ln E(Y_{i,j}) = \ln(\mu_{1/G} - 1) + X_{i,j}^T \beta$. The conditional variance of the CEL count is $\mathrm{Var}(Y_{i,j} \mid G_i = g_i) = r_{i,j}(1 - g_i)/g_i^2 = E(Y_{i,j} \mid G_i = g_i)/g_i$, which is strictly greater than its conditional expectation, hence allowing over-dispersion of the CEL counts. If we denote $\sigma^2_{1/G} = \mathrm{Var}(1/G_i)$, then the marginal variance is $\mathrm{Var}(Y_{i,j}) = r_{i,j}\,\sigma^2_{1/G}\,(r_{i,j} + 1) + r_{i,j}\,\mu_{1/G}(\mu_{1/G} - 1)$ and $\mathrm{Cov}(Y_{i,j}, Y_{i,j'}) = r_{i,j}\, r_{i,j'}\, \sigma^2_{1/G}$ where $j \ne j'$.

We model the failure probability of the NB as the RE so that the distribution of $\sum_j Y_{i,j} \mid G_i = g_i$ has a closed form expression $\mathrm{NB}(\sum_j r_{i,j}, g_i)$.
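As a numerical check of this parametrization, the following Python sketch evaluates the probability mass function above, recovers the stated conditional mean and variance by direct summation, and verifies the closure under summation for one pair of size parameters. It is only an illustration under assumed values: the function names and the numbers `r = 1.7`, `g = 0.3`, `0.8` and `1.3` are hypothetical and are not taken from the thesis or its software.

```python
import math


def nb_logpmf(y, r, g):
    """Log of Gamma(r + y) / (Gamma(r) y!) * g**r * (1 - g)**y."""
    return (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(g) + y * math.log1p(-g))


def nb_pmf(y, r, g):
    return math.exp(nb_logpmf(y, r, g))


def nb_mean_var(r, g, y_max=2000):
    """Mean and variance obtained by summing the pmf over 0, ..., y_max - 1."""
    probs = [nb_pmf(y, r, g) for y in range(y_max)]
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean ** 2


def sum_pmf(t, r1, r2, g):
    """Pr(Y1 + Y2 = t) for independent Y1 ~ NB(r1, g) and Y2 ~ NB(r2, g)."""
    return sum(nb_pmf(s, r1, g) * nb_pmf(t - s, r2, g) for s in range(t + 1))


r, g = 1.7, 0.3  # illustrative values only
mean, var = nb_mean_var(r, g)
print(mean, r * (1 - g) / g)        # numerical mean vs. r(1 - g)/g
print(var, r * (1 - g) / g ** 2)    # numerical variance vs. r(1 - g)/g^2
print(sum_pmf(4, 0.8, 1.3, g), nb_pmf(4, 0.8 + 1.3, g))  # closure under sums
```

The last line compares the convolution of two NB mass functions sharing the same $g$ with the NB mass function whose size parameter is the sum of the two sizes; the two printed numbers should agree up to rounding.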
This simplifies the computation of the CPI as it is based on $Y_{i,n+}$; see Section 2.3.2 for the details. The RE is related to the ratio of the patient-specific mean and the overall mean at a given time point as:

$$\frac{E(Y_{i,j} \mid G_i = g_i)}{E(Y_{i,j})} = \frac{1/g_i - 1}{\mu_{1/G} - 1}.$$

Therefore, a small value of the RE ($g_i \approx 0$) indicates that this patient has a large CEL activity level within the cohort. Similarly, a large value of the RE ($g_i \approx 1$) indicates that this patient has a small CEL activity relative to the cohort.

To relate the parametrization of our NB model to YZ's model, we note that YZ assumes the conditional distribution of $Y_{i,j}$ as:

$$Y_{i,j} \mid G^{YZ}_i = g^{YZ}_i \sim \mathrm{NB}\!\left(\frac{r^{YZ}_{i,j}}{\alpha^{YZ}},\ \frac{1}{\alpha^{YZ} g^{YZ}_i + 1}\right),$$

where $\alpha^{YZ} > 0$, $g^{YZ}_i \in (0, \infty)$ and $r^{YZ}_{i,j} = \exp(X_{i,j}^T \beta^{YZ})$. Therefore, our model (2.2) and YZ's model are the same if:

$$\beta_0 = \beta^{YZ}_0 - \ln(\alpha^{YZ}), \qquad G_i = \frac{1}{G^*_i + 1}, \quad \text{where } G^*_i = \alpha^{YZ} G^{YZ}_i. \tag{2.3}$$

Here, $\beta_0$ and $\beta^{YZ}_0$ indicate the regression intercepts of our model and YZ's model, respectively. In YZ's model, the patient-specific RE is assumed to have mean 1, and $E(Y_{i,j} \mid G^{YZ}_i = g^{YZ}_i)/E(Y_{i,j}) = g^{YZ}_i$. Hence $G^{YZ}_i$ directly reflects the patient-specific CEL activity level relative to the cohort.

2.2.2 The infinite mixture distribution of $G_i$ and priors

A popular choice of the RE distribution for the NB when parametrized as above is a beta distribution due to its conjugacy [25]. However, any simple parametric distributional assumption could be too restrictive. To allow a flexible RE distribution, we characterize the distribution of $G_i$ as an infinite mixture of beta distributions (2.1). Let latent variables $H_i$ indicate the mixture component from which $G_i$ is "drawn". Then (2.1) can be rewritten as: $H_i \overset{\text{i.i.d.}}{\sim} \mathrm{Categorical}(\pi_1, \pi_2, \ldots)$ and $G_i \mid H_i = k \sim \mathrm{Beta}(a_{G_k}, b_{G_k})$.

Our prior is the so-called Dirichlet process (DP) mixture, which has been a popular model in the nonparametric Bayesian literature since the seminal paper [2] and its first application [14]. Many have utilized mixed effect regression models with DP mixture distributed REs: [34] applied the normal-normal model; [37, 57] and many others applied the Bernoulli-normal model. However, to our knowledge, this is the first time that a RE associated with the distribution of count variables is modelled with a DP mixture.

The key to tractability of our model is the stick-breaking prior for the probability components $\pi_h$. Introducing latent variables $V_h$, $h = 1, 2, \ldots$, this prior is defined as:

$$\pi_1 = V_1 \quad \text{and} \quad \pi_h = V_h \prod_{l<h} (1 - V_l) \ \text{ for } h > 1, \qquad V_h \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}(1, D). \tag{2.4}$$

The stick-breaking terminology arises due to the popular analogy: given a unit length stick, $V_1$ is the proportion allocated to $\pi_1$ and $V_2$ is the proportion of the remaining stick $1 - V_1$ ($= 1 - \pi_1$) allocated to $\pi_2$ and so on. This process assures that $\pi_h \overset{\text{a.s.}}{\to} 0$. Hence most of the REs are "drawn" from the first few distributional components of the infinite mixture. In other words, the prior favours shrinkage towards a simple finite mixture.

Since the sample size is finite, many distributional components are "empty" in the sense that none of the $H_i$ corresponds to them. The flexibility of the model or the amount of shrinkage can be described by the number of nonempty components $K$ ($1 \le K \le N$). The precision parameter $D$ in (2.4) controls $K$. For small $D$, $\pi_1 = V_1 \approx 1$ and essentially all the probability weight will be assigned to a single component ($K \approx 1$). On the other hand, for large $D$, small values of $V_h$ are typically returned so many $\pi_h$ receive small but nonzero values. As a result, $K$ could be large.

To implement the Bayesian approach, all the unknown parameters require a prior specification. As a relatively uninformative prior, $(a_{G_h}, b_{G_h})$ are assumed to be i.i.d. samples from a joint uniform distribution with wide support, $K_0(a_G, b_G) = \mathrm{Unif}(a_G; 0.5, 30)\, \mathrm{Unif}(b_G; 0.5, 30)$. The lower bounds are set to 0.5 to avoid extreme values, which could cause computational issues. We selected wide supports for $a_G$ and $b_G$ as we found that the choice of hyperparameters, i.e., the upper bounds (and lower bounds) of the independent uniform priors of $a_{G_h}$ and $b_{G_h}$, has little impact on the CPI estimates as long as they are reasonably large (small). We let the data decide the value of $D$ by setting a prior $\ln D \sim N_1(\mu_{\ln D}, \sigma^2_{\ln D})$. For the regression coefficients $\beta$, we allow dependence and assume a multivariate normal (MVN) prior, $\beta \sim N(\mu, \Sigma)$. For an uninformative version, "large" values are specified for the variances of $\beta$ and $\ln D$. For an informative version, we use the hyperparameters suggested by the meta-analysis procedure described in Section 2.4.

2.3 Posterior computation

2.3.1 The truncated Dirichlet process

As the posterior distribution is not analytically tractable, we use a Markov chain Monte Carlo (MCMC) algorithm to update the prior distributions. The collapsed Gibbs sampler [44] and the blocked Gibbs sampler [27] are the most popular MCMC algorithms for obtaining approximate posterior samples from DP mixture distributions. The former avoids updating the infinitely many parameters characterizing (2.1) by marginalizing out $F_G$ from the posterior distribution and relies on the Pólya urn scheme of [4]. A useful alternative is the blocked Gibbs sampler, which approximates $F_G$ through truncation of the stick-breaking representation of (2.1) with $M$ components by setting $V_M = 1$ so that $\pi_M = 1 - \sum_{h=1}^{M-1} \pi_h$. We adopt the blocked Gibbs sampler as it returns posterior samples of the approximated $F_G$ [28], which will later be used to estimate the posterior CPI. To specify the parameter $M$ that determines the number of mass points used in the approximation of the DP, we select the smallest $M$ such that the amount of probability assigned to the final mass point $\pi_M$ is expected to be smaller than $\epsilon$, i.e., $E(\pi_M) = E\{E(\pi_M \mid D)\} = E[\{1 - 1/(D + 1)\}^{M-1}] < \epsilon$.
Therefore, $M$ depends on the prior of $D$.

Assuming the independence of each set of parameters whose dependence relationships were not described above, we design the MCMC algorithm to have the following posterior target distribution:

$$p\big(\{V_h\}_1^M, \{a_{G_h}\}_1^M, \{b_{G_h}\}_1^M, \{h_i\}_1^N, \beta, D \,\big|\, \{\{y_{i,j}\}_{j=-1}^{n_i}\}_{i=1}^N\big) \propto p(\beta)\, p(D) \prod_{h=1}^{M-1} p(V_h \mid D) \left\{\prod_{h=1}^{M} p(a_{G_h})\, p(b_{G_h})\right\} \times \left\{\prod_{i=1}^{N} p(h_i \mid \{V_h\}_1^M)\; p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_{h_i}}, b_{G_{h_i}})\right\}.$$

In this chapter, we use the usual Bayesian notation where $p()$ denotes a generic density. The reader can infer the random variable from the argument of the function $p()$. This posterior distribution marginalizes out the REs $G_i$. Our primary interest is to compute the posterior estimate of the CPI. As its computation does not depend on posterior samples of $G_i$, but on its approximated distribution defined by $\{V_h\}_{h=1}^M$, $\{a_{G_h}\}_{h=1}^M$, $\{b_{G_h}\}_{h=1}^M$ (see Section 2.3.2), our need is satisfied with this marginalized target distribution. Furthermore, due to the conjugacy of the beta distributions, a closed form expression for the marginal density $p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_{h_i}}, b_{G_{h_i}})$ is available. Our empirical studies indicate that this MCMC sampler tends to require much less computational time and to return less auto-correlated samples than another MCMC algorithm that targets the full posterior distribution (not marginalizing out $G_i$). The next section discusses the details of our MCMC algorithm.

Details of the MCMC sampler

For the parameters with non-conjugate priors (i.e. $\beta$, $D$, $\{a_{G_h}\}_{h=1}^M$ and $\{b_{G_h}\}_{h=1}^M$), the Metropolis–Hastings (MH) algorithm is employed to sample from their full conditional distributions. For $D$, $\{a_{G_h}\}_{h=1}^M$ and $\{b_{G_h}\}_{h=1}^M$, the MH algorithm is performed separately with a normal proposal distribution, where the proposal variance is tuned during the burn-in so that the acceptance rates range between 0.2 and 0.6.
For $\beta$, we found that updating each regression coefficient with separate MH algorithms resulted in poor mixing in the Markov chain when high correlation is assumed among some components of $\beta$ in our prior. Therefore, the MH algorithm is performed simultaneously for $\beta$ and a MVN proposal distribution is employed with $a\Sigma$ as its proposal covariance matrix, where $\Sigma$ is the covariance of the prior for $\beta$ and $a$ is a tuning scalar adjusted during the burn-in period so that the acceptance rates range between 0.2 and 0.6.

The parameters with conjugate priors (i.e. $\{h_i\}_{i=1}^N$ and $\{V_h\}_{h=1}^M$) are updated as:

• Update $h_i$ ($i = 1, 2, \ldots, N$) by sampling from the posterior categorical distribution:

$$\mathrm{Categorical}\!\left(\frac{p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_k}, b_{G_k})\, V_k \prod_{l<k}(1 - V_l)}{\sum_{h=1}^{M} p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_h}, b_{G_h})\, V_h \prod_{l<h}(1 - V_l)};\ k = 1, 2, \ldots, M\right),$$

where $\{Y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_{h_i}}, b_{G_{h_i}}$ has a beta negative binomial distribution:

$$p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \beta, a_{G_{h_i}}, b_{G_{h_i}}) = \int_0^1 p(\{y_{i,j}\}_{j=-1}^{n_i} \mid g_i, \beta)\, p(g_i \mid a_{G_{h_i}}, b_{G_{h_i}})\, dg_i = \left\{\prod_{j=-1}^{n_i} \binom{\exp(X_{i,j}^T\beta) + y_{i,j} - 1}{y_{i,j}}\right\} \frac{B\!\left(\sum_{j=-1}^{n_i} \exp(X_{i,j}^T\beta) + a_{G_{h_i}},\ \sum_{j=-1}^{n_i} y_{i,j} + b_{G_{h_i}}\right)}{B(a_{G_{h_i}}, b_{G_{h_i}})}, \tag{2.5}$$

and $B(a, b)$ represents the beta function ($a > 0$, $b > 0$).

• Update $V_h$ ($h = 1, 2, \ldots, M$) from $\mathrm{Beta}\big(1 + \sum_{i=1}^{N} I(h_i = h),\ D + \sum_{i=1}^{N} I(h_i > h)\big)$.

We emphasize that this MCMC algorithm approximates the posterior distribution of the REs at each draw $b$ based on $\{a^{[b]}_{G_h}\}_{h=1}^M$, $\{b^{[b]}_{G_h}\}_{h=1}^M$ and $\{V^{[b]}_h\}_{h=1}^M$ ($b = 1, 2, \ldots, B$), where $B$ denotes the number of Gibbs iterations. Given the $b$th approximated RE distribution and $\beta^{[b]}$, we can obtain a posterior sample of the CPI for each patient.

2.3.2 The posterior conditional probability index

Given the parameters $\beta$, $\{V_h\}_{h=1}^M$, $\{a_{G_h}\}_{h=1}^M$, and $\{b_{G_h}\}_{h=1}^M$, our CPI can be expressed in a closed form with beta functions $B(a, b)$. As discussed in Section 2.1, our CPI is evaluated based on the CEL activity mean function of control patients.
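The beta-function expressions used in the update of $h_i$ in Section 2.3.1 are easiest to evaluate on the log scale. The sketch below, which is an illustration and not the lmeNBBayes code, computes the logarithm of the marginal density (2.5) for one patient and the resulting categorical probabilities over the $M$ truncated components. The function names and the numerical inputs are hypothetical, and the size parameters $r_{i,j} = \exp(X_{i,j}^T\beta)$ are assumed to be supplied already evaluated.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(y, r, a, b):
    """Log of (2.5) for counts y and sizes r (same length), beta shapes a, b."""
    log_binom = sum(
        math.lgamma(r_j + y_j) - math.lgamma(r_j) - math.lgamma(y_j + 1)
        for r_j, y_j in zip(r, y))
    return log_binom + log_beta(sum(r) + a, sum(y) + b) - log_beta(a, b)


def component_probs(y, r, v, a, b):
    """Categorical probabilities for h_i given sticks v, with v[-1] == 1."""
    log_terms, log_remaining = [], 0.0
    for v_k, a_k, b_k in zip(v, a, b):
        # log of p(y | a_k, b_k) * V_k * prod_{l<k} (1 - V_l)
        log_terms.append(math.log(v_k) + log_remaining
                         + log_marginal(y, r, a_k, b_k))
        if v_k < 1.0:
            log_remaining += math.log1p(-v_k)
    top = max(log_terms)  # subtract the maximum before exponentiating
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical patient with three scans and a three-component truncation.
y = [0, 1, 3]
r = [1.2, 1.2, 0.9]
probs = component_probs(y, r, v=[0.5, 0.4, 1.0],
                        a=[2.0, 8.0, 4.0], b=[5.0, 2.0, 4.0])
print(probs, sum(probs))
```

Working with `math.lgamma` avoids overflow in the gamma and beta functions when the counts or size parameters are large, and normalizing after subtracting the largest log term keeps the categorical probabilities stable.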
Our model allows the mean level of the CEL counts of the treated and control patients to vary (details are discussed in Section 2.4). Let $r^0_{i,j}$ be the size parameter that would obtain for patient $i$ at time $j$ if this patient had been on the control arm. Denote $Y_{i,p+} = \sum_{j=-1}^{m_i} Y_{i,j}$, $Y_{i,n+} = \sum_{j=m_i+1}^{n_i} Y_{i,j}$, $r^0_{i,p+} = \sum_{j=-1}^{m_i} r^0_{i,j}$ and $r^0_{i,n+} = \sum_{j=m_i+1}^{n_i} r^0_{i,j}$. Then $Y_{i,p+} \mid G_i = g \sim \mathrm{NB}(r^0_{i,p+}, g)$ and $Y_{i,n+} \mid G_i = g \sim \mathrm{NB}(r^0_{i,n+}, g)$ and the CPI of the $i$th patient is:

$$\Pr(Y_{i,n+} \ge y_{i,n+} \mid \mathbf{Y}_{i,pre} = \mathbf{y}_{i,pre}) = \Pr(Y_{i,n+} \ge y_{i,n+} \mid Y_{i,p+} = y_{i,p+}) = 1 - \sum_{t=0}^{y_{i,n+}-1} \binom{r^0_{i,n+} + t - 1}{t} \frac{\sum_{h=1}^{\infty} \pi_h \dfrac{B(r^0_{i,n+} + r^0_{i,p+} + a_{G_h},\ t + y_{i,p+} + b_{G_h})}{B(a_{G_h}, b_{G_h})}}{\sum_{h=1}^{\infty} \pi_h \dfrac{B(r^0_{i,p+} + a_{G_h},\ y_{i,p+} + b_{G_h})}{B(a_{G_h}, b_{G_h})}}. \tag{2.6}$$

Therefore, given $B$ posterior samples of $\beta$, $\{V_h\}_{h=1}^M$, $\{a_{G_h}\}_{h=1}^M$, and $\{b_{G_h}\}_{h=1}^M$, we can obtain $B$ approximate samples of the posterior CPI by truncating the infinite sums of (2.6) at $M$ and then plugging in the set of posterior samples. We use the mean of the $B$ posterior samples as the estimate of the CPI for a patient, and the empirical 2.5th and 97.5th quantiles of the $B$ samples as lower and upper limits of its 95% credible interval (CI). The conjugacy of the beta and NB distributions enables the evaluation of (2.6) without numerical integration, which greatly reduces the computational cost.

2.4 Synthesizing previous datasets for prior specification

The priors for $a_{G_h}$, $b_{G_h}$ and $\pi_h$ are fully specified in Section 2.2.2. Our synthesis procedure is used only for the hyperparameters of the precision parameter $D$ and the fixed effects $\beta$.

2.4.1 Structure of $\beta$ for "current" and "previous" trials

This section introduces the superscript $(s)$ to indicate the corresponding trial, where $s = 0$ indicates the current trial to be monitored and $s = 1, 2, \ldots, S$ indicate the previous trials which yield the informative prior for the current trial. We assume that the CEL counts of the MS trials are from our semiparametric NB mixed effect model.
We consider the simple scenario when only twoassignments are present; an active treatment arm and a control arm. Thenthe mean CEL count for both “previous” and “current” trials is modelled onthe log scale as a constant over every four-month follow-up period, where theconstants are allowed to depend on the treatment assignments A(s)i (A(s)i = 1for treatment, else 0):ln E (Y(s)i,j |A(s)i = ai) (2.7)= ln(µ 1G− 1) + α(s) +1∑a=0T∑t=1β(s)a,t I(j ∈ Tt, ai = a),where Tt contains the indices corresponding to scans taken within the tth four-month interval during the follow-up, and TT ends with the index of the lastscheduled follow-up scan. For simplicity, this section assumes that the numberof four-month intervals of follow-up scans T are the same for all studies. Ourprocedure can be readily applied to the scenario where T differs among studiesas long as the T for at least one of the previous trials is at least as large asthe current trial’s T . In (2.7), α(s) (intercept) reflects the mean of the pre-treatment CEL counts in study s and β(s)a,t measures the change in log-scaledmean from the baseline to the tth four-month interval among the placebo192.4. Synthesizing previous datasets for prior specificationpatients (a = 0) or treated patients (a = 1) in study s.2.4.2 Random effect meta-analysisLet β(s) = (α(s), β(s)0,1, β(s)1,1, · · · , β(s)0,T , β(s)1,T ). To develop an informative priorfor β(0) and lnD(0) we synthesize the information from previous datasets viameta-analysis. The development of an informative prior in the MS safetymonitoring context was previously discussed in [23], where the CEL countsare modelled with YZ’s NB model with gamma REs, and an informative priorfor the unknown parameters of this model is developed based on a MVN REmeta-analysis [1].We also synthesize βˆ(s)with a MVN RE meta-analysis and ̂lnD(s) withthe corresponding univariate version. We adopt a frequentist meta-analysisbecause our purpose is to objectively synthesize the previous trials. 
Our discussion focuses on the procedure for $\boldsymbol{\beta}^{(s)}$. Our MVN RE meta-analysis assumes that each study $s$ produces a normally distributed estimate of its own mean CEL activity levels $\boldsymbol{\beta}^{(s)}$, and the $\boldsymbol{\beta}^{(s)}$ are assumed to be i.i.d. samples:

\[
\begin{aligned}
\hat{\boldsymbol{\beta}}^{(s)} \mid \boldsymbol{\beta}^{(s)} &\overset{ind.}{\sim} N_{2T+1}(\boldsymbol{\beta}^{(s)}, \Psi^{(s)}), \quad s = 1, 2, \cdots, S, \\
\boldsymbol{\beta}^{(s)} &\overset{i.i.d.}{\sim} N_{2T+1}(\boldsymbol{\mu}, \Sigma), \quad s = 0, 1, \cdots, S,
\end{aligned}
\tag{2.8}
\]

where $\boldsymbol{\mu} = (\mu_{\alpha}, \mu_{\beta_{0,1}}, \mu_{\beta_{1,1}}, \cdots, \mu_{\beta_{0,T}}, \mu_{\beta_{1,T}})$. While [23] used (2.8) with the maximum likelihood estimate (MLE) of $\boldsymbol{\mu}$ and $\Sigma$ as an informative prior for her fixed effects, we follow [72] and employ the predictive distribution of $\boldsymbol{\beta}^{(0)}$. Assuming that $\Sigma$ and $\Psi^{(s)}$ are known and $\boldsymbol{\mu}$ has an improper uniform prior, the predictive distribution is:

\[
\boldsymbol{\beta}^{(0)} \mid \hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)} \sim N_{2T+1}(\hat{\boldsymbol{\mu}}, \Sigma + V),
\tag{2.9}
\]

where $V = \{\sum_{s=1}^{S} (\Sigma + \Psi^{(s)})^{-1}\}^{-1}$ and $\hat{\boldsymbol{\mu}} = V \{\sum_{s=1}^{S} (\Sigma + \Psi^{(s)})^{-1} \hat{\boldsymbol{\beta}}^{(s)}\}$. Note that $\hat{\boldsymbol{\mu}}$ is the MLE for $\boldsymbol{\mu}$ with respect to the likelihood $p(\hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)} \mid \boldsymbol{\mu}, \Sigma)$ and $V = \widehat{Var}(\hat{\boldsymbol{\mu}})$. While the prior in [23] only accounts for the between-study variability $\Sigma$, our prior also accounts for the uncertainty in $\hat{\boldsymbol{\mu}}$, and therefore avoids an overly confident informative prior, especially when few previous studies are available or when each available previous trial has a small number of patients.

In practice, we need estimates of $\Psi^{(s)}$ and $\Sigma$ to use (2.9) as a prior. The within-study covariance matrices $\Psi^{(s)}$ are replaced by the estimates obtained in the analysis of the previous studies. This is reasonable in our context as the number of patients in MS trials is usually quite large. The between-study covariance matrix $\Sigma$ is replaced by its restricted MLE, which is obtained using the R package mvmeta. The implemented function maximizes the profiled version of the restricted likelihood only with respect to the components of $\Sigma$, which are expressed in terms of the Cholesky decomposition. Therefore, the procedure assures the positive-definiteness of $\hat{\Sigma}$. The function also returns $\hat{\boldsymbol{\mu}}$ and its variance, $V$.
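For a single coefficient, (2.9) reduces to scalar arithmetic. The following is a minimal sketch of that univariate case, assuming the between-study variance has already been estimated (for example by restricted maximum likelihood). The function name `predictive_prior` and the numbers in the usage lines are illustrative assumptions and are not taken from the thesis or from mvmeta.

```python
def predictive_prior(estimates, within_vars, between_var):
    """Univariate analogue of (2.9).

    estimates   : study-level estimates (one per previous trial)
    within_vars : their within-study variances (scalar Psi^(s))
    between_var : between-study variance (scalar Sigma), assumed known
    Returns the mean and variance of the predictive distribution
    for the coefficient in the current trial.
    """
    if len(estimates) != len(within_vars) or not estimates:
        raise ValueError("need one within-study variance per estimate")
    # Each study is weighted by the inverse of its total variance.
    weights = [1.0 / (between_var + psi) for psi in within_vars]
    v = 1.0 / sum(weights)                      # V = Var(mu_hat)
    mu_hat = v * sum(w * b for w, b in zip(weights, estimates))
    # The predictive variance adds the uncertainty in mu_hat to Sigma.
    return mu_hat, between_var + v


# Illustrative (made-up) inputs: two previous studies.
mean, var = predictive_prior([1.0, 2.0], [0.5, 0.5], 0.5)
print(mean, var)
```

With few previous studies the term `v` is not negligible, which is why the predictive variance exceeds the between-study variance alone.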
The univariate version of this restricted maximum likelihood method is applied to estimate $\mu_{\ln D}$ and $\sigma^2_{\ln D}$.

2.5 Conclusions

To signal patients with unexpected increases in CEL activity, the YZ procedure uses the CPI based on a mixed effect NB model for longitudinally collected CEL counts. As the CPI is sensitive to the distributional assumption for the patient-specific RE, this chapter proposed a more flexible mixed effect regression model, which specifies the RE distribution as an infinite mixture of betas. To our knowledge, no previous literature considers a mixed effect regression for longitudinal count variables where the RE is modelled with a DP mixture. This model should be of general interest and utility, well beyond our application to the CEL counts of MS patients. For example, our semiparametric DP mixture model can be used to identify clusters of subjects with similar patterns of count responses without specifying the number of clusters. As our model is developed within the Bayesian framework, it provides a natural means for incorporating the DSMB's expert knowledge. This chapter also presented a procedure to develop informative priors for the regression coefficients and the precision parameter of our model based on data from previously completed trials. To facilitate use of this new methodology, we have implemented our MCMC sampler and the function to evaluate the CPI in the R package lmeNBBayes, which is publicly available at http://cran.r-project.org/. The next chapter demonstrates our procedure with ten previously conducted MS clinical trials and assesses it with simulation studies.

Chapter 3

The applications and simulation studies of the flexible mixed effect negative binomial regression model

This chapter demonstrates the safety monitoring procedure discussed in Chapter 2. Section 3.1 demonstrates the procedure with ten previously conducted phase II/III MS clinical trials. Section 3.2 assesses the procedure with simulation studies.
Our conclusions are summarized in Section 3.3.

3.1 Application to MS clinical trial datasets

3.1.1 Ten MS clinical datasets

We illustrate our semiparametric Bayesian procedure using ten completed placebo-controlled MS clinical studies with repeated CEL counts. Patients in these studies are either relapsing-remitting (RRMS) or secondary progressive (SPMS). Table 3.1 describes the ten studies (5 trials with RRMS patients and 5 trials with SPMS patients). The numbers of patients recruited range from 103 to 293. MRI scans were taken monthly or six-weekly in all studies and the total number of scheduled MRI scans for each patient ranges from 7 to 11. (SPMS-2 originally had 41 monthly scans, but to make all studies roughly comparable in terms of the scheduled MRI scans for each patient, only the first 11 scans from this study are used in our analysis.) An active control group in the RRMS-5 study was excluded from our analysis.

| | RRMS-1 | RRMS-2 | RRMS-3 | RRMS-4 | RRMS-5 | SPMS-1 | SPMS-2 | SPMS-3 | SPMS-4 | SPMS-5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Key features: | | | | | | | | | | |
| N | 293 | 167 | 127 | 198 | 163 | 276 | 163 | 125 | 103 | 190 |
| $T^{(s)}$ | 2 | 2 | 3 | 3 | 2 | 3 | 3 | 2 | 3 | 2 |
| # scans/pat | 8 | 8 | 10 | 11 | 7 | 11 | 11 | 7 | 9 | 9 |
| # scans | 2294 | 1305 | 1160 | 2124 | 1040 | 2840 | 1662 | 857 | 823 | 1592 |
| Efficacy | + | no | no | + | + | + | + | + | no | no |
| Drug | IFN-β1a | lenercept | IVIG | IFN-β1a | Ocrelizumab | IFN-β1a | IFN-β1b | IFN-β1b | MBP8298 | MP |
| Reference | [75] | [40] | [15] | [63] | [32] | [41] | [55] | [49] | [20] | - |
| Analysis summaries: goodness-of-fit from the Bayesian SP and Bayesian P procedures | | | | | | | | | | |
| DIC (SP) | 6433.9 | 3927.4 | 3461.3 | 4634.9 | 1854.9 | 5312.1 | 2895.3 | 2080.5 | 1537.9 | 3643.5 |
| DIC (P) | 6437.9 | 3929.3 | 3462.5 | 4636.4 | 1856.9 | 5329.2 | 2932.6 | 2082.5 | 1555.9 | 3656.9 |
| Posterior estimates (SE) of K, D and regression coefficients from the Bayesian SP procedure | | | | | | | | | | |
| $K^{(s)}$ | 2.4 (1.8) | 3.1 (2.6) | 2.5 (2.2) | 5.2 (3.5) | 4.4 (3.1) | 7.4 (2.9) | 5.2 (1.9) | 4.5 (3.2) | 4.6 (1.7) | 6.8 (3.0) |
| $D^{(s)}$ | 0.33 (0.30) | 0.47 (0.51) | 0.39 (0.42) | 0.77 (0.73) | 0.67 (0.63) | 1.04 (0.68) | 0.72 (0.48) | 0.72 (0.72) | 0.68 (0.47) | 1.01 (0.73) |
| $\alpha^{(s)}$ | 1.79 (0.11) | 1.37 (0.14) | 0.90 (0.13) | 1.32 (0.12) | 1.34 (0.21) | 1.15 (0.12) | 1.26 (0.15) | 1.46 (0.21) | 1.24 (0.24) | 1.02 (0.14) |
| $\beta^{(s)}_{0,1}$ | -0.20 (0.08) | 0.09 (0.12) | 0.09 (0.17) | -0.10 (0.10) | -0.14 (0.15) | 0.04 (0.09) | 0.30 (0.12) | 0.08 (0.13) | -0.03 (0.19) | 0.07 (0.12) |
| $\beta^{(s)}_{1,1}$ | -0.75 (0.06) | 0.02 (0.08) | -0.02 (0.12) | -1.03 (0.10) | -1.67 (0.15) | -1.02 (0.09) | -1.23 (0.13) | -1.48 (0.16) | 0.10 (0.16) | -0.03 (0.10) |
| $\beta^{(s)}_{0,2}$ | -0.38 (0.09) | -0.05 (0.14) | -0.23 (0.16) | -0.42 (0.11) | -0.31 (0.18) | -0.01 (0.10) | 0.01 (0.12) | -0.20 (0.15) | 0.10 (0.20) | -0.18 (0.14) |
| $\beta^{(s)}_{1,2}$ | -0.88 (0.08) | -0.04 (0.09) | 0.16 (0.11) | -1.56 (0.12) | -3.31 (0.35) | -1.41 (0.11) | -1.62 (0.15) | -1.50 (0.20) | 0.37 (0.17) | -0.06 (0.11) |
| $\beta^{(s)}_{0,3}$ | - | - | -0.21 (0.16) | -0.39 (0.17) | - | -0.04 (0.14) | -0.34 (0.23) | - | 0.03 (0.23) | - |
| $\beta^{(s)}_{1,3}$ | - | - | 0.27 (0.11) | -1.59 (0.21) | - | -1.29 (0.19) | -1.24 (0.23) | - | 0.31 (0.17) | - |

Table 3.1: The key features and analysis summaries of the clinical trials. The abbreviations are RRMS: relapsing remitting multiple sclerosis, SPMS: secondary progressive, $T^{(s)}$: the number of four-month intervals during the follow-up, SP: semiparametric, P: parametric, IFN: Interferon, IVIG: IV immunoglobulin, MP: Micellar Paclitaxel, “Efficacy”: whether or not the referenced clinical paper found efficacy based on the CEL counts. Although our application treats SPMS-5 as a “current” trial, we also report its fits with this prior. The lower panel of the table shows the estimated fixed effect parameters and the deviance information criterion (DIC) introduced in Section 3.1.2. The DICs are computed for both the semiparametric Bayesian procedure (SP) and the parametric Bayesian procedure (P). No published reference is available for SPMS-5.

We treat all datasets except SPMS-5 as previous clinical trials, from which an informative prior is developed for $\boldsymbol{\beta}^{(0)}$, the vector of the fixed effects, and $\ln D^{(0)}$, the log-scaled precision parameter in (2.4), in SPMS-5.
SPMS-5 is selected as a current trial to be monitored because it failed to demonstrate any treatment effect and it was previously analyzed by YZ in their application.

3.1.2 Analysis of “previous” trials

For each trial, we applied the semiparametric Bayesian procedure using fairly uninformative priors to all the scans. For simplicity, treatment arms with different dosages are combined as one arm and the expected CEL counts are modelled as in (2.7). The hyperparameters of the priors for $\boldsymbol{\beta}^{(s)}$ are $\boldsymbol{\mu} = (0, \cdots, 0)$, $\Sigma = \mathrm{diag}(10, \cdots, 10)$. The dimension of $\boldsymbol{\mu}$ used for each trial depends on the length of the follow-up, i.e., $2T^{(s)} + 1$ (see Table 3.1 for $T^{(s)}$). The hyperparameters of the prior for $\ln D^{(s)}$ are set to have $E(D^{(s)}) = SD(D^{(s)}) = 0.5$; that is, $\mu_{\ln D} = -1.04$ and $\sigma^2_{\ln D} = 0.69$.

We truncate the infinite mixture at $M = 21$, which satisfies $E(\pi_M) < 10^{-4}$ with this prior for $D^{(s)}$. The MCMC sample size is set to 20,000 after discarding the first 5,000 as burn-in and thinning at every 10th iteration. We determined the convergence of the MCMC by visual inspection of trace plots and auto-correlation plots.

To assess the goodness-of-fit of competing models, Table 3.1 reports the deviance information criterion (DIC) with $p(\{y_{i,j}\}_{j=-1}^{n_i} \mid \boldsymbol{\beta}^{(s)}, F_G)$ as the focused likelihood [73] from both the semiparametric ($F_G$ defined as (2.1)) and parametric Bayesian models. The latter has the same hierarchical structure as the former but the RE is assumed to be from a single beta, i.e., $F_G(g) = \mathrm{Beta}(g; a_G, b_G)$. We selected $\boldsymbol{\beta}^{(s)}$ and $F_G$ as the focused parameters because they are used in the computation of the CPI in (2.6). See Appendix A.2 for details of the calculation procedure for our semiparametric model.

Interestingly, the DICs indicate that the semiparametric procedure fits better than the parametric alternative for each of the ten trials and, in particular, considerably better for four SPMS trials (SPMS-1, 2, 4 and 5).
Figure 3.1 provides one possible explanation. The top panel shows the RE density estimates from both the parametric (blue) and semiparametric (black) Bayesian procedures for all ten trials. The semiparametric estimated densities show clear multi-modality for the trials where the DIC considerably favours the semiparametric procedure. The bottom panel shows the estimated density of the transformed RE, $e^{\alpha^{(s)}}(1 - G_i)/G_i$, which can be interpreted as the mean baseline CEL count. For the calculation, $\alpha^{(s)}$ is substituted by $\hat{\alpha}^{(s)}$. The estimated densities of the transformed RE from SPMS-1, 2 and 4 remain multi-modal. For example, SPMS-2 shows three modes at around 0, 2 and 6, indicating that patients' baseline CEL activities cluster into three groups: no, low (mean 2) and high (mean 6) activity. The better fit of the semiparametric procedure could be because the CEL activity levels of MS patients, especially SPMS patients, tend to be heterogeneous.

3.1.3 Meta-analysis: Synthesizing the nine previous trials

Based on the nine sets of $\hat{\boldsymbol{\beta}}^{(s)}$ from RRMS 1-5 and SPMS 1-4 (see Table 3.1) and their estimated covariance matrices $\hat{\Psi}^{(s)}$ (results not shown), we develop two informative priors to monitor SPMS-5 as described in Section 2.4. If clinicians believe that the CEL activity patterns of RRMS and SPMS cohorts could differ, they might want to use only the SPMS trials for synthesis. Therefore, we develop one informative prior based on all nine previous trials, namely the “full informative prior”, and one based only on the previous SPMS trials, namely the “SPMS informative prior”.

The top panel of Figure 3.2 shows the precision parameter and regression coefficient estimates for the first two 4-month periods of follow-up from the nine studies. In our model, $\hat{\beta}^{(s)}_{1,t} - \hat{\beta}^{(s)}_{0,t}$ indicates the estimated treatment effect relative to controls in time interval $t$.
Figure 3.1: The posterior RE density estimates (i.e., $G_i$, top) and the density estimates of the patient's mean CEL counts at baseline (i.e., $e^{\alpha^{(s)}}(1 - G_i)/G_i$, bottom) from both the parametric (blue) and semiparametric (black) Bayesian procedures for all ten clinical trials. The dotted curves are pointwise 95% CIs based on the semiparametric Bayesian procedure.

Figure 3.2: Top: The estimates of $D^{(s)}$, $\alpha^{(s)}$ and $\beta^{(s)}_{a,t}$, $t = 1, 2$, $a = 0, 1$, $s = 1, 2, \cdots, 9$. The areas of the rectangles are proportional to the number of scans in each study. The horizontal line for each estimate corresponds to the 95% CI. Middle: Marginal estimates of the full and SPMS informative priors. The horizontal length represents the 95% CI. Bottom: The estimates from the analysis of the “current” study based on the Bayesian semiparametric procedure with the full informative prior (Bayes SP: I (full)), the SPMS informative prior (Bayes SP: I (SPMS)), and an uninformative prior (Bayes SP: U), the parametric Bayesian procedure with the full informative prior (Bayes P: I (full)), YZ's parametric procedure (Freq P), and YZ's semiparametric procedure (Freq SP). Review 1 does not have estimates of $\beta^{(0)}_{0,2}$ as no patient was followed more than 4 months by this review. “T” indicates that the limit of the 95% CI is truncated; see Web Table 3.1 for the actual values.

RRMS-2, 3 and SPMS-4 return nearly identical time effects in the treatment and placebo arms, $\hat{\beta}^{(s)}_{1,t} \approx \hat{\beta}^{(s)}_{0,t}$ ($t = 1, 2$), indicating that the treatments are ineffective for decreasing the mean CEL count during the first eight months of follow-up, while the remaining six trials demonstrate positive treatment effects.
On the other hand, the placebo effects seem to be relatively stable across studies.

The middle panel of Figure 3.2 shows the key features of the univariate marginal distributions of the estimated informative priors for $D^{(0)}$ and $\boldsymbol{\beta}^{(0)}$. Due to the high variation among the treatment effects of our “previous” trials, both the full and SPMS priors for the treatment effects $\beta^{(0)}_{1,t}$ ($t = 1, 2$) are highly variable. However, this is not a big concern as the calculation of the CPI does not rely on the estimated $\beta^{(0)}_{1,t}$ (see Section 2.1). Interestingly, the SPMS priors for $\alpha^{(0)}$ and $\beta^{(0)}_{0,2}$ are less variable than those for the full informative priors. The center of the SPMS informative prior for $\beta^{(0)}_{0,2}$ lies around 0, slightly higher than the full informative prior, indicating that there is nearly no change between the baseline and the second follow-up period. The center of the full informative prior for $D^{(0)}$ is smaller than that for the SPMS informative prior, which is reasonable as the SPMS trials exhibited more heterogeneity in CEL counts across patients within a trial than the RRMS trials (Figure 3.1).

The estimated mean, standard deviation (SD) and correlation matrix of the predictive distribution (informative priors) based on all nine “previous” trials are:

\[
\hat{\boldsymbol{\mu}} =
\begin{pmatrix} 1.31 \\ 0.02 \\ -0.78 \\ -0.17 \\ -1.04 \end{pmatrix},
\qquad
\widehat{Cor}(\boldsymbol{\beta}^{(0)} \mid \hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)}) =
\begin{pmatrix}
1.00 & & & & \\
-0.73 & 1.00 & & & \\
-0.31 & 0.15 & 1.00 & & \\
-0.63 & 0.86 & 0.41 & 1.00 & \\
-0.28 & 0.20 & 0.98 & 0.43 & 1.00
\end{pmatrix},
\]

with $\widehat{SD}(\boldsymbol{\beta}^{(0)} \mid \hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)}) =$ [entries missing]. The estimated informative prior for $\ln D^{(0)}$ is normal with $\hat{\mu}_{\ln D} = -0.77$ and $\hat{\sigma}_{\ln D} = 0.26$.

Similarly, the estimated parameters of the informative priors based only on SPMS 1-4 are:

\[
\hat{\boldsymbol{\mu}} =
\begin{pmatrix} 1.26 \\ 0.09 \\ -0.90 \\ -0.02 \\ -1.05 \end{pmatrix},
\qquad
\widehat{Cor}(\boldsymbol{\beta}^{(0)} \mid \hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)}) =
\begin{pmatrix}
1.00 & & & & \\
0.36 & 1.00 & & & \\
-0.35 & -0.86 & 1.00 & & \\
-0.55 & -0.60 & 0.80 & 1.00 & \\
-0.24 & -0.83 & 0.98 & 0.75 & 1.00
\end{pmatrix},
\]

with $\widehat{SD}(\boldsymbol{\beta}^{(0)} \mid \hat{\boldsymbol{\beta}}^{(1)}, \cdots, \hat{\boldsymbol{\beta}}^{(S)}) =$ [entries missing]. The estimated informative prior for $\ln D^{(0)}$ is normal with $\hat{\mu}_{\ln D} = -0.48$ and $\hat{\sigma}_{\ln D} = 0.36$.

Notably, both the full and SPMS informative priors show that a large mean baseline CEL count is associated with large declines in the mean CEL count during the two follow-up intervals among treated patients, and the treatment effects for the two follow-up intervals are nearly perfectly linearly related ($Cor(\beta_{1,1}, \beta_{1,2}) \ge 0.98$ for both priors).

3.1.4 Analysis of the “current” trial – Model fit

The dataset SPMS-5 consists of CEL counts from 190 patients who were randomized over a period of about 22 months to three treatment arms: placebo, low dose and high dose of Micellar Paclitaxel. Patients entered the study in a staggered manner as shown in Figure 2.2. Nine MRI scans were scheduled during the trial: eight 4-weekly MRI scans (including 2 pre-treatment scans) and a final scan at week 32. Hence the study has 2 four-month intervals during the follow-up ($T^{(0)} = 2$). We assume that DSMB reviews occur every 150 days and a total of 5 reviews are conducted over the study period. All the missing scans (5.6%) are treated as missing at random in our analysis.

At each review, we fitted the YZ parametric (with gamma RE) and semiparametric procedures, and the semiparametric Bayesian procedure with the uninformative prior defined in Section 3.1.2, and with full and SPMS informative priors. (To keep $E(\pi_M) < 10^{-4}$, we set $M = 10$ and 14 for the full and SPMS informative priors, respectively.)
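The hyperparameter values and truncation levels quoted in this section can be explored numerically. The sketch below rests on two assumptions that are not spelled out in this section: that $\ln D$ has a normal prior, and that the mixture weights follow the standard stick-breaking construction with $V_h \sim \mathrm{Beta}(1, D)$, so that the last weight of a mixture truncated at $M$ has $E(\pi_M \mid D) = \{D/(1+D)\}^{M-1}$. All function names are hypothetical.

```python
import math


def lognormal_hyperparameters(mean, sd):
    """Match ln D ~ N(mu, sigma2) to a target mean and SD of D."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, sigma2


def expected_last_weight(mu, sigma, M, n_grid=4001, z_max=8.0):
    """E(pi_M) for a mixture truncated at M, averaging over
    ln D ~ N(mu, sigma^2) with the trapezoid rule.

    Assumes E(pi_M | D) = (D / (1 + D)) ** (M - 1), which holds for
    stick-breaking weights with V_h ~ Beta(1, D) (an assumption here).
    """
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        z = -z_max + k * step
        d = math.exp(mu + sigma * z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += end_weight * density * (d / (1.0 + d)) ** (M - 1) * step
    return total


def smallest_truncation(mu, sigma, tol=1e-4, M_max=500):
    """Smallest M with E(pi_M) < tol under the assumptions above."""
    for M in range(2, M_max + 1):
        if expected_last_weight(mu, sigma, M) < tol:
            return M
    raise ValueError("no truncation level below M_max meets the tolerance")


# E(D) = SD(D) = 0.5 gives the hyperparameters quoted in Section 3.1.2.
mu, sigma2 = lognormal_hyperparameters(0.5, 0.5)
print(round(mu, 2), round(sigma2, 2))  # -1.04 0.69
print(smallest_truncation(mu, math.sqrt(sigma2)))
```

The same function can be called with the informative prior parameters for $\ln D^{(0)}$ to compare against the truncation levels used in the text; whether the thesis chose the smallest admissible $M$ is not stated, so agreement is not guaranteed.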
For YZ's semiparametric procedure, we observed that the algorithm could be unstable when there are too few patients on either arm with repeated measures in one of the time intervals. To avoid breakdown of the algorithm we made a technical modification; see Appendix A.1 for details. A parametric Bayesian procedure with the same full informative prior for $\boldsymbol{\beta}$ is also considered. For all the procedures, the expectation of the CEL counts is modelled as in (2.7).

The bottom panel of Figure 3.2 shows the estimates of the precision parameter $D^{(0)}$, intercept $\alpha^{(0)}$ and placebo effects $\beta^{(0)}_{0,t}$ ($t = 1, 2$) from the “current” analysis at each review, and Table 3.2 shows their actual values. At review 1, YZ's procedures and the Bayesian procedures with an uninformative prior return exceptionally large estimates of $\alpha^{(0)}$ (YZ's parametric procedure: 4.94; YZ's semiparametric procedure: 5.25; semiparametric Bayesian procedure: 2.27), with relatively large standard errors (SE), indicating that these estimates should not be trusted. The large estimated $\alpha^{(0)}$ from YZ's parametric procedure at review 1 is accompanied by a highly left-skewed estimated RE distribution which keeps $\ln E(Y_{i,j})$ within a reasonable range (results not shown). This means that the NB mixed effect model fitted by these procedures seems to reduce to the Poisson model. In contrast, the Bayesian procedures with informative priors return fairly consistent estimates of $\alpha^{(0)}$, although slightly decreasing with smaller SE over successive reviews.
As the number of reviews increases, the estimated $\alpha^{(0)}$ and its confidence or credible intervals become quite similar among all the procedures except for YZ's semiparametric procedure: its 95% empirical bootstrap CI is substantially larger than the other seven at the final review.

| Review | Model | RE | Prior | $\alpha^{(0)}$ | $\beta^{(0)}_{0,1}$ | $\beta^{(0)}_{0,2}$ | K | D |
|---|---|---|---|---|---|---|---|---|
| 1 | Freq | P | - | 4.94 (3.34) | -0.29 (0.39) | - | - | - |
| | | SP | - | 5.25 (0.99) | -0.20 (0.85) | - | - | - |
| | Bayes | P | I (full) | 1.41 (0.22) | -0.04 (0.12) | - | - | - |
| | | SP | U | 2.27 (0.93) | -0.19 (0.41) | - | 2.5 (1.4) | 0.43 (0.36) |
| | | SP | I (full) | 1.32 (0.21) | -0.01 (0.12) | - | 3.4 (1.1) | 0.49 (0.13) |
| | | SP | I (SPMS) | 1.21 (0.12) | -0.08 (0.07) | - | 3.9 (1.3) | 0.68 (0.24) |
| 2 | Freq | P | - | 1.66 (0.29) | 0.23 (0.18) | 0.00 (0.37) | - | - |
| | | SP | - | 1.90 (0.97) | -0.04 (0.49) | -0.60 (0.96) | - | - |
| | Bayes | P | I (full) | 1.41 (0.17) | -0.03 (0.09) | -0.15 (0.12) | - | - |
| | | SP | U | 1.73 (0.28) | 0.04 (0.18) | -0.37 (0.40) | 4.5 (1.9) | 0.66 (0.45) |
| | | SP | I (full) | 1.43 (0.16) | -0.04 (0.09) | -0.16 (0.12) | 4.4 (1.5) | 0.51 (0.13) |
| | | SP | I (SPMS) | 1.28 (0.10) | -0.05 (0.06) | 0.08 (0.07) | 5.2 (1.8) | 0.73 (0.26) |
| 3 | Freq | P | - | 1.40 (0.18) | 0.14 (0.13) | -0.24 (0.19) | - | - |
| | | SP | - | 1.57 (0.45) | 0.17 (0.16) | -0.20 (0.28) | - | - |
| | Bayes | P | I (full) | 1.30 (0.13) | 0.00 (0.08) | -0.15 (0.10) | - | - |
| | | SP | U | 1.51 (0.18) | 0.15 (0.13) | -0.27 (0.20) | 5.5 (2.3) | 0.77 (0.53) |
| | | SP | I (full) | 1.36 (0.13) | -0.01 (0.08) | -0.15 (0.10) | 4.8 (1.5) | 0.51 (0.13) |
| | | SP | I (SPMS) | 1.29 (0.09) | 0.01 (0.06) | 0.04 (0.07) | 5.7 (1.9) | 0.75 (0.26) |
| 4 | Freq | P | - | 1.02 (0.14) | 0.04 (0.12) | -0.17 (0.15) | - | - |
| | | SP | - | 1.08 (0.30) | 0.11 (0.17) | -0.05 (0.27) | - | - |
| | Bayes | P | I (full) | 1.09 (0.12) | 0.05 (0.07) | -0.11 (0.10) | - | - |
| | | SP | U | 1.05 (0.15) | 0.07 (0.13) | -0.18 (0.15) | 6.6 (2.9) | 0.97 (0.70) |
| | | SP | I (full) | 1.12 (0.12) | 0.05 (0.07) | -0.11 (0.10) | 4.9 (1.6) | 0.52 (0.13) |
| | | SP | I (SPMS) | 1.16 (0.08) | -0.00 (0.06) | 0.04 (0.06) | 5.9 (2.0) | 0.76 (0.27) |
| 5 | Freq | P | - | 1.02 (0.13) | -0.04 (0.13) | -0.13 (0.14) | - | - |
| | | SP | - | 1.06 (0.30) | 0.10 (0.17) | -0.08 (0.26) | - | - |
| | Bayes | P | I (full) | 1.07 (0.12) | 0.05 (0.07) | -0.11 (0.10) | - | - |
| | | SP | U | 1.02 (0.14) | 0.07 (0.13) | -0.18 (0.14) | 6.6 (3.0) | 0.97 (0.72) |
| | | SP | I (full) | 1.10 (0.12) | 0.05 (0.07) | -0.12 (0.10) | 5.0 (1.7) | 0.52 (0.14) |
| | | SP | I (SPMS) | 1.15 (0.08) | -0.01 (0.06) | 0.04 (0.06) | 5.8 (2.0) | 0.76 (0.27) |

Table 3.2: Summary of the “current” trial analysis. The reported values are the estimated posterior mean (or MLE) of the fixed effect coefficients and the SD. The abbreviations are: Freq: Frequentist (i.e., YZ's procedures), Bayes: Bayesian, P: parametric, SP: semiparametric, U: uninformative prior, I (full): full informative prior, I (SPMS): SPMS informative prior. The table also shows the estimated number of mixture components K and the estimated precision parameter D. For YZ's semiparametric procedure, the SD is computed based on 500 bootstrap samples. To account for the varying follow-up times of the patients, the bootstrap sampling is stratified according to the follow-up time.

The estimates of the placebo effects in the second follow-up interval, $\beta^{(0)}_{0,2}$, with the SPMS informative prior are larger than those produced by the remaining procedures at all reviews, and their 95% CIs are substantially shorter, especially at earlier reviews. In fact, the estimates based on the SPMS prior are slightly positive at all reviews, indicating that the CEL activities among placebo patients increased from the pre-study period to the second follow-up period. This is because the center of the SPMS prior for $\beta^{(0)}_{0,2}$ ($-0.02$) is larger than for the full prior ($-0.17$) and its variability is relatively small ($SD(\beta^{(0)}_{0,2})$ is 0.12 and 0.16 from the SPMS and full informative priors, respectively).

While the prior choice affects the regression coefficient estimates, $\hat{\alpha}^{(0)}$ and $\hat{\beta}^{(0)}_{0,t}$ ($t = 1, 2$), the same prior tends to return similar estimates under different RE distributional assumptions.
For example, with the same full informative prior, the estimates and their 95% CIs from the semiparametric and parametric Bayesian procedures are very similar at each review.

3.1.5 Analysis of the “current” trial – Evaluation of CPIs

This section discusses how the CPI estimates vary with the choice of priors and the RE distributional assumption. First, we compare the CPI estimates from the semiparametric Bayesian procedures with different priors. Figure 3.3 plots the estimated CPIs smaller than 0.25 based on the semiparametric Bayesian procedures with all three priors considered at reviews 2 and 4. The uninformative and full informative priors return nearly identical CPI estimates (top panels) by review 4, which makes sense as their regression coefficient estimates are similar at later reviews. The bottom panel shows that the SPMS informative prior tends to return larger CPIs than the full informative prior at review 4. Why are there such discrepancies in the CPI estimates? Figure 3.4 shows the estimated conditional probability mass function (CPMF) of $Y_{i,n+} \mid \boldsymbol{Y}_{i,pre} = \boldsymbol{y}_{i,pre}$ for two selected patients with relatively large discrepancies among the CPI estimates based on the two informative priors. For both patients, the CPMF based on the full informative prior is shifted more toward the left than the CPMF based on the SPMS informative prior. This is because the latter prior for $\beta^{(0)}_{0,2}$ is centered at a larger value (prior mean $-0.02$ and SD 0.12) than the former prior (prior mean $-0.17$ and SD 0.12), leading to a larger $\hat{\beta}^{(0)}_{0,2}$ (0.04 versus $-0.12$) and a larger estimate of the mean CEL count at the second follow-up interval. As a result, the CPMF from the former prior is shifted towards the left, which in turn yields smaller estimated CPIs.

Figure 3.3: Comparisons of CPI estimates (defined in (2.6)) from semiparametric Bayesian procedures among the full informative prior, the SPMS informative prior and an uninformative prior for the “current” trial at reviews 2 and 4. The digits are patient IDs. The solid line is the 45-degree reference line.

Figure 3.4: The conditional probability mass function of $Y_{i,n+} \mid \boldsymbol{Y}_{i,pre}$ for two selected patients with large discrepancies in their CPI estimates from the semiparametric Bayesian procedure with the full and SPMS priors at review 4. (a) ID 51, $Y_{i,n+} \mid \boldsymbol{Y}_{i,pre} = (4, 5, 3, 1, 0, 2)$ with 3 new scans. This patient has $Y_{i,n+} = 13$. (b) ID 65, $Y_{i,n+} \mid \boldsymbol{Y}_{i,pre} = (1, 6, 24)$ with 5 new scans. This patient has $Y_{i,n+} = 86$.

To examine how the RE distributional assumption affects the CPI estimates, we compare CPI estimates from the semiparametric and parametric Bayesian procedures with the full informative prior. As the corresponding effect estimates are very similar at each review, any discrepancies in their CPI estimates can be attributed to their only difference: the RE distributional assumption. Figure 3.5 compares the CPI estimates smaller than 0.25 from the two procedures at each review. We observe some discrepancies in the CPI estimates from the two procedures, although none is overly dramatic. Tables 3.3 and 3.4 show the CPI estimates and CEL counts of all patients whose CPI estimates based on the two procedures differ by more than 0.03 and 50%, respectively. Patients with smaller CPI estimates from the semiparametric procedure tend to have many small or zero CEL counts (e.g., CEL counts < 2), whereas patients with smaller CPI estimates from the parametric procedure tend to have many large CEL counts. For example, Table 3.4 shows that at review 3 the parametric Bayesian method considers Patient 65 (pre 1/6, new 24, CPI=0.0011) as having more extreme recent lesion activity than Patient 101 (pre 0/0/0/0, new 0/2/4/1, CPI=0.0023), while the semiparametric Bayesian method indicates the opposite ordering (Patient 65: CPI=0.0021, Patient 101: CPI=0.0012).
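The dependence of the CPI on the RE distribution can be made concrete by evaluating (2.6) for a finite beta mixture. The sketch below is illustrative only. It assumes the NB parameterization whose probability mass is proportional to $g^{r}(1-g)^{y}$, treats the mixture weights and beta parameters as given (for instance, one posterior draw after truncation at $M$), and uses hypothetical names throughout; it is not the lmeNBBayes implementation.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def cpi(y_new, y_pre, r_new, r_pre, weights, shapes_a, shapes_b):
    """Pr(Y_new >= y_new | Y_pre = y_pre) as in (2.6), for a finite
    mixture of Beta(a_h, b_h) random effects with the given weights.

    y_new, y_pre : sums of CEL counts on new and previous scans
    r_new, r_pre : sums of the control-arm size parameters
    """
    comps = list(zip(weights, shapes_a, shapes_b))
    log_den = log_sum_exp([
        math.log(w) + log_beta(r_pre + a, y_pre + b) - log_beta(a, b)
        for w, a, b in comps
    ])
    cdf = 0.0
    for t in range(y_new):
        # log of the generalized binomial coefficient C(r_new + t - 1, t)
        log_binom = (math.lgamma(r_new + t) - math.lgamma(t + 1)
                     - math.lgamma(r_new))
        log_num = log_sum_exp([
            math.log(w) + log_beta(r_new + r_pre + a, t + y_pre + b)
            - log_beta(a, b)
            for w, a, b in comps
        ])
        cdf += math.exp(log_binom + log_num - log_den)
    return 1.0 - cdf


# Made-up inputs: the same data under a single beta and a two-component mixture.
single = cpi(7, 0, 4.5, 3.0, [1.0], [3.0], [0.8])
mixture = cpi(7, 0, 4.5, 3.0, [0.3, 0.7], [10.0, 20.0], [10.0, 1.0])
print(single, mixture)
```

Applying the function to each posterior draw and summarizing the results by their mean and empirical quantiles would follow the estimation scheme described for (2.6).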
The correct ordering of worsening patients is important to allow the DSMB to make appropriate decisions for each patient. Tables 3.3 and 3.4 show that the RE distributional assumption can affect the ordering. As the true RE distribution is unknown, using a flexible model with less restrictive assumptions should be preferred.

Figure 3.5: Comparisons of CPI estimates from the parametric and semiparametric Bayesian procedures for the “current” trial. The digits are patient IDs. The solid line is the 45-degree reference line.

| Review | ID | Pre-scans | New scans | Bayes P I (full) | Bayes SP I (full) |
|---|---|---|---|---|---|
| Bayes P I (full) + 0.03 < Bayes SP I (full) | | | | | |
| 1 | 67 | 3/1 | 4 | 0.15 (0.09, 0.22) | 0.24 (0.12, 0.38) |
| 2 | 8 | 2/1 | 1/4 | 0.21 (0.15, 0.27) | 0.25 (0.15, 0.37) |
| | 13 | 3/2 | 5/5 | 0.09 (0.05, 0.14) | 0.12 (0.06, 0.21) |
| 3 | 49 | 11/9 | 8/14/16/14 | 0.17 (0.10, 0.25) | 0.21 (0.10, 0.35) |
| | 98 | 6/10/6/7/16 | 14/8/11 | 0.15 (0.09, 0.23) | 0.19 (0.10, 0.30) |
| 4 | 54 | 1/3/4/4/0/3 | 5/3/2 | 0.19 (0.14, 0.25) | 0.23 (0.15, 0.32) |
| | 82 | 0/0/0/9 | 12/2/0/0 | 0.15 (0.11, 0.19) | 0.19 (0.12, 0.27) |
| Bayes P I (full) > Bayes SP I (full) + 0.03 | | | | | |
| 2 | 39 | 1/0/0 | 1/1/1/1/0 | 0.20 (0.16, 0.25) | 0.14 (0.08, 0.21) |
| | 68 | 0/0 | 1/1/1/0 | 0.11 (0.08, 0.14) | 0.07 (0.04, 0.12) |
| | 71 | 1/0 | 2 | 0.14 (0.11, 0.18) | 0.11 (0.07, 0.16) |
| | 86 | 2/1/0/0/0/0 | 2/1/0 | 0.20 (0.16, 0.25) | 0.16 (0.10, 0.22) |
| | 121 | 0/0/1/0/0 | 1/2/0/0 | 0.12 (0.10, 0.15) | 0.09 (0.05, 0.13) |
| | 155 | 0/1 | 2/1/1/1/1 | 0.17 (0.12, 0.22) | 0.13 (0.07, 0.20) |
| | 157 | 0/0 | 0/0/1/1 | 0.20 (0.16, 0.24) | 0.15 (0.10, 0.22) |
| 3 | 11 | 0/0/0/1/0 | 1/0/0/2 | 0.12 (0.10, 0.15) | 0.09 (0.06, 0.12) |
| | 14 | 0/0 | 1/1/1/0 | 0.11 (0.09, 0.14) | 0.07 (0.05, 0.11) |
| | 22 | 0/0/0 | 0/0/0/1/1 | 0.17 (0.15, 0.19) | 0.13 (0.09, 0.17) |
| | 32 | 0/0 | 0/1/0/1/1 | 0.14 (0.12, 0.17) | 0.09 (0.06, 0.13) |
| | 48 | 0/0 | 0/0/1/1 | 0.20 (0.17, 0.23) | 0.14 (0.10, 0.19) |
| | 174 | 0/0 | 1/1 | 0.10 (0.08, 0.12) | 0.07 (0.05, 0.09) |
| 4 | 177 | 0/0/0 | 0/0/0/1/1 | 0.17 (0.15, 0.20) | 0.14 (0.10, 0.18) |

Table 3.3: CPI estimates (and 95% credible intervals) and CEL counts of patients whose CPI estimates based on the Bayesian semiparametric (SP) and parametric (P) procedures with the same full informative prior (I (full)) differ by more than 0.03. Only cases with CPI < 0.25 are considered as our focus is the patients who have unexpected increases in CEL counts.

| Review | ID | Pre-scans | New scans | Bayes P I (full) | Bayes SP I (full) |
|---|---|---|---|---|---|
| 1.5 × Bayes P I (full) < Bayes SP I (full) | | | | | |
| 1 | 67 | 3/1 | 4 | 0.1466 (0.0881, 0.2177) | 0.2373 (0.1197, 0.3752) |
| 2 | 89 | 1/0/2/6 | 11/8/7/6 | 0.0015 (0.0004, 0.0038) | 0.0022 (0.0004, 0.0067) |
| 3 | 13 | 3/2/5/5 | 14/1/20/2 | 0.0105 (0.0049, 0.0190) | 0.0166 (0.0056, 0.0361) |
| | 65 | 1/6 | 24 | 0.0011 (0.0003, 0.0028) | 0.0021 (0.0005, 0.0056) |
| 4 | 185 | 0/1/0/2/1/8 | 6/12/13 | 0.0001 (0.0000, 0.0003) | 0.0002 (0.0000, 0.0006) |
| Bayes P I (full) > Bayes SP I (full) × 1.5 | | | | | |
| 2 | 10 | 0/0 | 2/2/1 | 0.0222 (0.0138, 0.0335) | 0.0135 (0.0063, 0.0251) |
| 3 | 14 | 0/0 | 1/1/1/0 | 0.1133 (0.0929, 0.1363) | 0.0736 (0.0484, 0.1060) |
| | 32 | 0/0 | 0/1/0/1/1 | 0.1423 (0.1182, 0.1693) | 0.0944 (0.0629, 0.1343) |
| | 82 | 0/0 | 0/9 | 0.0011 (0.0006, 0.0020) | 0.0007 (0.0003, 0.0014) |
| | 101 | 0/0/0/0 | 0/2/4/1 | 0.0023 (0.0016, 0.0032) | 0.0012 (0.0006, 0.0022) |
| | 160 | 0/0/0/0/0 | 2/1/1/0 | 0.0109 (0.0082, 0.0141) | 0.0068 (0.0039, 0.0110) |
| 4 | 79 | 0/0/0/0/0 | 0/2/3/1 | 0.0023 (0.0016, 0.0032) | 0.0015 (0.0008, 0.0025) |
| | 175 | 0/0/0 | 0/2/1/1/2 | 0.0188 (0.0141, 0.0243) | 0.0124 (0.0073, 0.0195) |

Table 3.4: CPI estimates (and 95% credible intervals) and CEL counts of patients whose CPI estimates based on the Bayesian semiparametric (SP) and parametric (P) procedures with the same full informative prior (I (full)) differ by more than 50%. Only cases with CPI < 0.25 are considered as our focus is the patients who have unexpected increases in CEL counts.

Finally, Figure 3.6 shows boxplots of the lengths of 95% CIs for the CPIs of patients in SPMS-5 at each review based on various procedures. The shortest CIs should be attained by the parametric Bayesian procedure with the full informative prior as it utilizes a strong RE distributional assumption. It is evident that, regardless of the choice of informative priors, our semiparametric Bayesian procedure substantially reduces the lengths of the CIs for CPIs relative to the YZ procedures or the Bayesian semiparametric procedure with an uninformative prior, especially at earlier reviews. In fact, at the first DSMB review, some of the 95% CIs from the YZ procedures covered most of the unit interval; this indicates that an informative prior is essential for meaningful inferences. It is also worth noting that the YZ semiparametric procedure returns relatively large CIs even at later reviews, indicating the inefficiency of the method even with larger datasets.

In summary, this application shows that the choice of RE distribution is important as it could result in a biased selection of worsening patients, as seen in Tables 3.3 and 3.4. In particular, our semiparametric Bayesian model could be preferred to the parametric alternative as a model for CEL activities of SPMS patients (Table 3.1). Further, the CPI estimates based on the semiparametric Bayesian procedure with informative priors are more reliable than those obtained with the YZ parametric model, especially at earlier reviews (Figure 3.6). Our Bayesian safety monitoring procedure requires care in selecting “previous” trials with patients who have similar characteristics to those in the “current” trials, as the informative prior can affect the CPI estimates (Figure 3.3 and Figure 3.4).

3.2 Simulation studies

3.2.1 Investigation of the impacts of prior choices and RE distributional assumptions on CPI estimates

The goal of this simulation study is to investigate how different choices of priors and RE distributional assumptions impact the CPI estimates. Given the RE, monthly CEL counts are generated from the NB model (2.2).
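Under an NB model with a beta RE, the marginal baseline mean and SD of a count follow from the laws of total expectation and total variance. The sketch below is a rough check under stated assumptions: $Y \mid G = g$ is taken to have mean $r(1-g)/g$ and variance $r(1-g)/g^2$ with $r = e^{\alpha}$, and $G \sim \mathrm{Beta}(a, b)$ with $a > 2$. The inputs Beta(3, 0.8) and $\alpha = 1.31$ correspond to RE Setting 1 of this section and the intercept mean of the full informative prior; the function name is hypothetical.

```python
import math


def baseline_moments(alpha, a, b):
    """Marginal mean and SD of a baseline count when
    Y | G ~ NB(r, G) with r = exp(alpha) and G ~ Beta(a, b).

    Assumes E(Y | G) = r (1 - G) / G and Var(Y | G) = r (1 - G) / G**2.
    Requires a > 2 so that the second moments exist.
    """
    if a <= 2:
        raise ValueError("a must exceed 2 for a finite variance")
    r = math.exp(alpha)
    odds_mean = b / (a - 1)                                # E[(1-G)/G]
    odds_sq_mean = b * (b + 1) / ((a - 1) * (a - 2))       # E[((1-G)/G)^2]
    cond_var_mean = b * (a + b - 1) / ((a - 1) * (a - 2))  # E[(1-G)/G^2]
    mean = r * odds_mean
    # Total variance = E[Var(Y | G)] + Var(E[Y | G]).
    var = r * cond_var_mean + r ** 2 * (odds_sq_mean - odds_mean ** 2)
    return mean, math.sqrt(var)


mean, sd = baseline_moments(1.31, 3.0, 0.8)
print(round(mean, 2), round(sd, 2))  # 1.48 3.44
```

These values are close to the baseline moments quoted for Setting 1 under Scenario A; the small gap in the SD is consistent with the intercept being rounded to two decimals.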
The regression coefficients are assumed to differ among studies and are generated from a MVN with $\boldsymbol{\mu}$ and $\Sigma$ replaced by the estimates from the full informative prior (Scenario A) or the SPMS informative prior (Scenario B) developed in Section 3.1.3. The proportion of treated patients is assumed to be 0.67, the actual proportion in SPMS-5.

Figure 3.6: Boxplots of the lengths of 95% CIs for the CPIs of patients in SPMS-5 at each review based on the YZ parametric procedure (Freq P), the YZ semiparametric procedure (Freq SP), the semiparametric Bayesian procedure with the uninformative prior (Bayes SP U), the SPMS informative prior (Bayes SP I (SPMS)) and the full informative prior (Bayes SP I (full)), and the parametric Bayesian procedure with the full informative prior (Bayes P I (full)). For the YZ semiparametric procedure, 95% empirical CIs based on 500 bootstrap samples are reported. The boxplots only show the lengths of the CIs for patients with $Y_{i,pre+} > 0$, and the number of CPIs used for each boxplot is represented as #CPI.

Three RE distributions are considered:

• Setting 1: $G_i \sim \mathrm{Beta}(3, 0.8)$, which returns $(E(Y_{i,j}), SD(Y_{i,j})) = (1.48, 3.45)$ and $(1.40, 3.29)$ at baseline under Scenarios A and B, respectively.

• Setting 2: $G_i \sim 0.3\,\mathrm{Beta}(10, 10) + 0.7\,\mathrm{Beta}(20, 1)$, i.e., the mixture distribution with 30% from a Beta with both shape parameters 10 and 70% from a Beta with shape parameters 20 and 1, which returns $(E(Y_{i,j}), SD(Y_{i,j})) = (4.12, 3.73)$ and $(0.20, 0.51)$ at baseline for the patients whose REs are generated from the first and second component of the mixture under Scenario A, and $(3.90, 3.59)$ and $(0.18, 0.49)$ under Scenario B.

• Setting 3: $G_i = 1/(G^*_i + 1)$, where $G^*_i \sim 0.85\,\mathrm{Gamma}(0.176, 2.226) + 0.15\,N_1(1.820, 0.303)$, and $\mathrm{Gamma}(a, b)$ represents the gamma CDF with variance $ab^2$ ($a > 0$ and $b > 0$).
See (2.3) for the definition of $G^*_i$. This mixture returns $(E(Y_{i,j}), SD(Y_{i,j})) = (1.46, 4.16)$ and $(6.75, 4.56)$ at baseline for patients whose REs are drawn from the first and second components of the mixture, respectively, under Scenario A, and $(1.38, 3.97)$ and $(6.39, 4.41)$ under Scenario B. This RE distribution is the same as the simulation model considered by YZ.

We assume MRI scans are taken monthly, with a total of 10 MRI scans for each patient: a screening scan, a baseline scan and eight follow-up scans. We assume that 15 patients are recruited every month so that 180 patients are recruited in 12 months, leading to a study duration of 21 months. DSMB reviews are assumed to occur every 4 months. This means that by DSMB review 2, 540 scans have been collected from 120 patients. We limit attention to the first two DSMB reviews as the accuracy of the CPI estimates with a relatively small number of scans is our primary interest. For each of the 6 combinations of RE distribution and Scenario, 300 datasets are generated.

For the fitted model, we consider our semiparametric and parametric Bayesian procedures, and YZ’s semiparametric and parametric (with gamma RE) procedures. The mean CEL counts of the fitted model are allowed to differ between the placebo and treated patients as in (2.7). For the semiparametric Bayesian method, we consider the uninformative prior used in the data analysis in Section 3.1 as well as the two informative priors developed in Section 3.1, but with modified values of $\mu_{\ln D}$ and $\sigma_{\ln D}$. To reflect our simulation setting, the hyperparameters $\mu_{\ln D}$ and $\sigma_{\ln D}$ are estimated by synthesizing 100 “previous” trials (each containing 180 patients and 10 MRI scans per patient) simulated from Scenario A (B) for each RE setting. That is, the semiparametric Bayesian model with uninformative priors is fit to estimate $\ln D^{(s)}$ for each “previous” trial, and the estimated predictive distribution of $\ln D^{(0)} \mid \widehat{\ln D}^{(1)}, \ldots, \widehat{\ln D}^{(S)}$ is then used as an informative prior.
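The recruitment and review schedule of this simulation design can be tallied with a short sketch. It assumes, which the text does not state explicitly, that a patient recruited in month m has the screening scan in month m and one further scan in each following month; the function names are illustrative only.

```python
def scans_at_review(review_month, patients_per_month=15,
                    recruitment_months=12, scans_per_patient=10):
    """Count patients and scans available at a DSMB review.

    Assumption (not stated in the text): a patient recruited in month m
    has the first scan in month m and one scan per month afterwards,
    up to `scans_per_patient` scans in total.
    """
    patients = 0
    scans = 0
    last_cohort = min(review_month, recruitment_months)
    for m in range(1, last_cohort + 1):
        months_in_study = review_month - m + 1
        patients += patients_per_month
        scans += patients_per_month * min(months_in_study, scans_per_patient)
    return patients, scans


def study_duration(recruitment_months=12, scans_per_patient=10):
    """Month in which the last recruited cohort has its final scan."""
    return recruitment_months + scans_per_patient - 1


# DSMB reviews every 4 months: review 1 at month 4, review 2 at month 8.
for review, month in enumerate((4, 8), start=1):
    n_patients, n_scans = scans_at_review(month)
    print(f"review {review}: {n_scans} scans from {n_patients} patients")
print("study duration (months):", study_duration())
```

Under this assumption the tally agrees with the figures quoted in the text for review 2 (540 scans from 120 patients) and with the 21-month study duration.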
The modified full (SPMS) informative prior can be thought of as a “correct” informative prior under Scenario A (B). The truncation parameters $M$ are selected to have $E(\pi_M) < 10^{-4}$. For the parametric Bayesian model, we only consider the “correct” informative prior for β corresponding to each of Scenarios A and B.

The MCMC sampler was run for 55,000 iterations, with the first 5,000 samples discarded as burn-in. The remaining 50,000 samples were thinned to every 5th iteration to reduce the autocorrelation among samples, which returns 10,000 posterior samples in total.

Results

The estimated modified full and SPMS priors for $\ln D$ are very similar in each RE setting: the means of the modified full prior are −0.97, −0.81 and −0.43 in RE settings 1-3, respectively, and the corresponding means of the modified SPMS prior are −0.97, −0.80 and −0.42. The SD of the modified full prior for $\ln D$ is estimated as 0.08, 0.07 and 0.07 in RE settings 1-3, and the corresponding SDs of the modified SPMS prior are 0.08, 0.07 and 0.06. It is reasonable that the estimated mean is smallest under RE setting 1 because the larger $D$ is, the more mixture components the RE distribution tends to have, and RE setting 1 assumes a single beta distribution while the others assume mixture distributions with two distinct modes.

We assess the accuracy of CPI estimates by their average RMSE; results are shown in Table 3.5. The CPIs are computed treating all the patients as placebo patients. The calculation of RMSE only includes patients with a nonzero sum of CEL counts on their new scans, as $\Pr(Y_{i,\mathrm{new}+} \ge 0 \mid \mathbf{Y}_{i,\mathrm{pre}} = \mathbf{y}_{i,\mathrm{pre}}) = 1$ for any $\mathbf{y}_{i,\mathrm{pre}}$. As the variances of the CPI estimators vary depending on the number of new scans, we compute the RMSE of the CPI based only on patients with 1 or 2 new scans.

In both scenarios considered, the YZ semiparametric procedure performs poorly at review 1 relative to all the other procedures. This procedure also performs poorly at review 2, although to a considerably lesser extent.
Indeed, in RE setting 2, the YZ parametric and parametric Bayesian procedures perform equally poorly. This pattern of behaviour of the YZ semiparametric procedure was not apparent from the YZ simulations, which were based on a relatively large number of scans (1,000 scans = 200 patients × 5 scans) and considered only an intercept predictor variable. In contrast, our simulation involves only 150 scans from 60 patients at review 1 and 540 scans from 120 patients at review 2. Further, more predictor variables are involved in our simulations. The poor performance of the YZ semiparametric procedure clearly indicates the need to develop better semiparametric procedures, such as those developed in this thesis, that are effective in the early stages of trials.

In all RE settings, Scenarios and reviews, the “correct” informative prior returns the smallest average RMSE among all the priors compared for the semiparametric Bayesian procedure. In all RE settings and Scenarios, in comparison to the uninformative prior, the “correct” informative prior reduces the average RMSE by a greater percentage at review 1 (between 49.0% and 60.6%) than at review 2 (between 24.0% and 41.5%), which is expected as review 1 has fewer scans and patients. Although the modified SPMS and full priors are “incorrect” under Scenarios A and B, respectively, the semiparametric Bayesian procedure with this “incorrect” prior still outperforms the YZ parametric and semiparametric procedures and the semiparametric Bayesian procedure with an uninformative prior in all these RE settings. In RE settings 2 and 3, it also outperforms the parametric Bayesian procedure.

When the RE follows a single beta distribution, the parametric Bayesian procedure performs the best. This is not surprising, as its strong distributional assumption holds in this setting.
The semiparametric Bayesian procedure tends to overestimate the number of components in the RE distribution (the posterior mean of K is 3.1 at review 1 for both Scenarios), allowing too much flexibility for such a small sample. In contrast, when the RE is from the mixture distributions, the semiparametric Bayesian method with the “correct” prior returns the smallest average RMSE among all the procedures at both reviews in both Scenarios.

Figure 3.7 shows the RE density estimates based on a single simulation at reviews 1 and 2. As each draw from the MCMC sampler approximates the infinite mixture of beta distributions by $\sum_{h=1}^{M} \pi_h \mathrm{Beta}(\cdot\,; a_{G_h}, b_{G_h})$, we can obtain estimates of the density by evaluating these finite mixtures on a grid of values of the RE. For RE setting 3, where the RE is re-parametrized to range over [0, ∞), the RE density is appropriately transformed. Our semiparametric Bayesian procedure captures the bimodality of the true RE densities in RE settings 2 and 3. Although the parametric Bayesian procedure outperforms the semiparametric Bayesian procedure with the “correct” prior in RE setting 1, the percentage reduction in RMSE remains modest (between 21.6% and 31.1%) in all Scenarios and reviews. On the other hand, the semiparametric Bayesian procedure with the “correct” prior in RE setting 2 reduces the RMSE considerably (between 45.2% and 59.3%) compared to the parametric Bayesian procedure.

The YZ parametric procedure and the semiparametric Bayesian procedure with an uninformative prior perform poorly at review 1 in all RE settings and Scenarios (RMSE > 0.1) as neither procedure utilizes prior knowledge. However, by review 2, the semiparametric Bayesian procedure with the uninformative prior outperforms the YZ parametric procedure in all RE settings and Scenarios.
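The grid evaluation of a finite beta mixture mentioned above can be sketched as follows. This is only an illustration for a single hypothetical draw: the weights and shape parameters are those of the true RE distribution in Setting 2, not output from the thesis software, and the function names are made up for this example.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def mixture_density(x, weights, shapes):
    """Evaluate sum_h pi_h * Beta(x; a_h, b_h) for one set of mixture parameters."""
    return sum(w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, shapes))


# Hypothetical single draw: the Setting 2 mixture 0.3 Beta(10, 10) + 0.7 Beta(20, 1).
weights = [0.3, 0.7]
shapes = [(10.0, 10.0), (20.0, 1.0)]

# Midpoint grid on (0, 1); midpoints avoid evaluating log(0) at the boundaries.
n_grid = 2000
grid = [(k + 0.5) / n_grid for k in range(n_grid)]
density = [mixture_density(x, weights, shapes) for x in grid]

# Sanity checks by the midpoint rule: total mass and mean of the mixture.
area = sum(density) / n_grid
mean = sum(x * d for x, d in zip(grid, density)) / n_grid
print(f"area = {area:.4f}, mean = {mean:.4f}")
```

With posterior samples in hand, one plausible way to obtain a density estimate and pointwise credible limits would be to repeat this evaluation for every retained draw and summarize the values at each grid point; that step is an assumption about the workflow, not a description of the author's code.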
Whereas the average RMSE of the semiparametric Bayesian procedure with the uninformative prior decreases by 45.3% to 58.8% from review 1 to review 2, the increased number of available scans at review 2 does not lead to as much reduction under the YZ parametric procedure (32.3% to 46.9%) due to its inflexibility.

Review  Model  RE  Prior     RE Setting 1   RE Setting 2   RE Setting 3

Scenario A (full informative prior)
1       Freq   P   -         0.114 (0.088)  0.141 (0.079)  0.121 (0.068)
1       Freq   SP  -         0.148 (0.116)  0.178 (0.122)  0.191 (0.137)
1       Bayes  P   I (full)  0.043 (0.027)  0.099 (0.029)  0.091 (0.029)
1       Bayes  SP  U         0.124 (0.084)  0.137 (0.098)  0.132 (0.080)
1       Bayes  SP  I (full)  0.056 (0.024)  0.054 (0.026)  0.067 (0.027)
1       Bayes  SP  I (SPMS)  0.066 (0.030)  0.064 (0.033)  0.076 (0.032)
2       Freq   P   -         0.061 (0.035)  0.089 (0.034)  0.082 (0.037)
2       Freq   SP  -         0.065 (0.041)  0.087 (0.039)  0.096 (0.060)
2       Bayes  P   I (full)  0.029 (0.017)  0.088 (0.023)  0.085 (0.027)
2       Bayes  SP  U         0.055 (0.033)  0.056 (0.033)  0.072 (0.035)
2       Bayes  SP  I (full)  0.037 (0.019)  0.037 (0.017)  0.055 (0.021)
2       Bayes  SP  I (SPMS)  0.048 (0.027)  0.050 (0.028)  0.063 (0.028)

Scenario B (SPMS informative prior)
1       Freq   P   -         0.108 (0.077)  0.135 (0.083)  0.121 (0.080)
1       Freq   SP  -         0.141 (0.107)  0.179 (0.125)  0.185 (0.134)
1       Bayes  P   I (SPMS)  0.038 (0.025)  0.096 (0.025)  0.087 (0.025)
1       Bayes  SP  U         0.117 (0.076)  0.129 (0.085)  0.127 (0.080)
1       Bayes  SP  I (full)  0.060 (0.026)  0.059 (0.028)  0.076 (0.032)
1       Bayes  SP  I (SPMS)  0.052 (0.023)  0.051 (0.023)  0.064 (0.024)
2       Freq   P   -         0.061 (0.035)  0.088 (0.033)  0.077 (0.035)
2       Freq   SP  -         0.069 (0.039)  0.089 (0.042)  0.095 (0.046)
2       Bayes  P   I (SPMS)  0.024 (0.014)  0.085 (0.022)  0.084 (0.025)
2       Bayes  SP  U         0.057 (0.031)  0.059 (0.037)  0.068 (0.035)
2       Bayes  SP  I (full)  0.045 (0.023)  0.044 (0.025)  0.061 (0.029)
2       Bayes  SP  I (SPMS)  0.035 (0.017)  0.035 (0.020)  0.050 (0.021)

Table 3.5: The average RMSEs (and SDs) computed based on 300 simulations. The RMSEs of the CPIs are computed based only on the patients with one or two new scans.
The abbreviations are Freq: frequentist (i.e., YZ’s procedures), Bayes: Bayesian, P: parametric, SP: semiparametric, I (full): full informative prior, I (SPMS): SPMS informative prior, U: uninformative prior.

Figure 3.7: The estimated RE density from the Bayesian semiparametric (black), Bayesian parametric (blue) (both with the full informative prior) and YZ’s parametric (green) procedures from a single simulation in Scenario A. The dotted curves represent the upper and lower limits of the 95% credible intervals from the semiparametric procedure. The true density curve (red) and the sampled REs (histogram) are superimposed. Panels: (a) Single Beta, Review 1; (b) Single Beta, Review 2; (c) Beta mixture, Review 1; (d) Beta mixture, Review 2; (e) Normal + Gamma, Review 1; (f) Normal + Gamma, Review 2.

3.2.2 Revisiting YZ’s simulation study

YZ reported that their semiparametric approach has considerable variation in the estimation of the CPI when the counts on the previous scans are large, as well as potential bias when the previous counts are zero. We carried out additional simulations to assess whether our semiparametric Bayesian approach overcomes these shortcomings. For this purpose, we consider the same simulation model as YZ.

Following YZ’s simulation settings, we simulated data for 200 patients with five scans each (2 pre-scans and 3 new scans). CEL counts are generated from YZ’s simulation model with no covariate. That is,

$$Y_{i,j} \mid G^{YZ}_i = g^{YZ}_i \overset{\text{i.i.d.}}{\sim} \mathrm{NB}\left(\exp(\beta^{YZ}_0)/\alpha^{YZ},\; 1/(g^{YZ}_i \alpha^{YZ} + 1)\right),$$

where $\beta^{YZ}_0 = 0.405$, $\ln(\alpha^{YZ}) = -0.5$ and the RE is parametrized to lie in the range [0, ∞). The RE $G^{YZ}_i$ is assumed to be from a bimodal distribution with 85% of $G^{YZ}_i$ from a gamma distribution with mean 0.647 and variance 2.374 and 15% from a normal distribution with mean 3 and variance 0.25.

For each dataset, we fit our semiparametric procedure with the uninformative prior used in the data analysis and with an informative prior.
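To make the simulation model concrete, the following sketch approximates a true CPI under it by Monte Carlo integration over the RE distribution. It is not YZ's or the thesis's code. It assumes that the second NB argument is the success probability p, so that the pmf is Γ(y + r) / (Γ(r) y!) p^r (1 − p)^y, that scans are conditionally independent given the RE (so that sums of counts are again NB with the size parameter multiplied by the number of scans), and that the normal component is truncated at zero. All function names are hypothetical.

```python
import math
import random

BETA0 = 0.405                    # beta_0^YZ from the text
ALPHA = math.exp(-0.5)           # alpha^YZ, since ln(alpha^YZ) = -0.5
SIZE = math.exp(BETA0) / ALPHA   # NB size parameter for a single scan


def draw_re(rng):
    """Draw G^YZ: 85% gamma (mean 0.647, variance 2.374), 15% normal (mean 3, variance 0.25)."""
    if rng.random() < 0.85:
        shape = 0.647 ** 2 / 2.374
        scale = 2.374 / 0.647
        return max(rng.gammavariate(shape, scale), 1e-300)  # guard against exact zero
    g = rng.gauss(3.0, 0.5)
    while g <= 0.0:  # assumed truncation so the RE stays positive
        g = rng.gauss(3.0, 0.5)
    return g


def nb_log_pmf(y, size, g):
    """log Pr(Y = y) for NB(size, p) with p = 1 / (g * ALPHA + 1)."""
    log_p = -math.log1p(g * ALPHA)
    log_q = math.log(g * ALPHA) + log_p  # log(1 - p)
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * log_p + y * log_q)


def cpi(y_pre_sum, y_new_sum, n_pre=2, n_new=3, n_draws=100_000, seed=1):
    """Monte Carlo estimate of Pr(Y_new+ >= y_new_sum | Y_pre+ = y_pre_sum)."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_draws):
        g = draw_re(rng)
        # Likelihood of the observed pre-scan total given this RE value.
        weight = math.exp(nb_log_pmf(y_pre_sum, n_pre * SIZE, g))
        # Pr(Y_new+ < y_new_sum | g), summed term by term.
        below = sum(math.exp(nb_log_pmf(k, n_new * SIZE, g)) for k in range(y_new_sum))
        num += weight * max(0.0, 1.0 - below)
        den += weight
    return num / den


estimate = cpi(0, 1)
print(f"Monte Carlo CPI for y_pre+ = 0, y_new+ = 1: {estimate:.3f}")
```

If these assumptions match the intended parametrization, the estimate for a patient with no pre-scan lesions and one new lesion should land close to the true value of 0.152 listed in Table 3.6. Larger new-scan totals are handled by the same function, at a cost proportional to the total.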
The informative priors for β and $\ln D$ are developed via the RE meta-analysis on 100 simulated datasets from the same simulation model (i.e., 200 patients with 5 scans each).

YZ’s simulation study examined the estimated CPIs for selected examples of $y_{i,\mathrm{new}+}$ and $y_{i,\mathrm{pre}+}$. Table 3.6 reports the estimated CPIs from the semiparametric Bayesian procedures for these same examples. The summary statistics $y_{i,\mathrm{pre}+}$ and $y_{i,\mathrm{new}+}$ are sufficient to define the CPI, as $\Pr(Y_{i,\mathrm{new}+} \ge y_{i,\mathrm{new}+} \mid \mathbf{Y}_{i,\mathrm{pre}} = \mathbf{y}_{i,\mathrm{pre}}) = \Pr(Y_{i,\mathrm{new}+} \ge y_{i,\mathrm{new}+} \mid Y_{i,\mathrm{pre}+} = y_{i,\mathrm{pre}+})$ holds for both our NB model and YZ’s independent model. Our semiparametric Bayesian procedure yields very similar CPI estimates with the uninformative and informative priors. This is reasonable as each generated dataset contains 1,000 scans, which represents a large amount of available data. Both procedures yield smaller or equal mean square errors than YZ’s semiparametric procedure for all of these examples (see Appendix Table 1 of YZ); for some of the examples the mean square errors are substantially smaller.

YZ reported that the semiparametric procedure had noticeable bias for estimating the CPI of a patient with $y_{i,\mathrm{pre}+} = 0$: the bias was 0.206 when $y_{i,\mathrm{new}+} = 1$ and 0.018 when $y_{i,\mathrm{new}+} = 3$. Table 3.6 shows that both semiparametric Bayesian procedures yield much smaller bias for such patients.
For example, the semiparametric Bayesian procedure with the uninformative prior reduces the absolute bias by 83% ($y_{i,\mathrm{new}+} = 1$) and 90% ($y_{i,\mathrm{new}+} = 3$) relative to YZ’s semiparametric procedure for patients with $y_{i,\mathrm{pre}+} = 0$.

y_pre+  y_new+  True value  Bayes SP U                          Bayes SP I
                            Average (SD)     Bias (MSE)          Average (SD)     Bias (MSE)
0       1       0.152       0.1860 (0.0209)   0.0344 (0.0016)    0.1849 (0.0205)   0.0333 (0.0015)
0       3       0.034       0.0326 (0.0064)  -0.0018 (<0.0001)   0.0324 (0.0061)  -0.0021 (<0.0001)
3       11      0.215       0.2193 (0.0345)   0.0039 (0.0012)    0.2195 (0.0341)   0.0041 (0.0012)
3       20      0.046       0.0438 (0.0101)  -0.0019 (0.0001)    0.0439 (0.0090)  -0.0018 (0.0001)
10      19      0.224       0.2293 (0.0311)   0.0054 (0.0010)    0.2307 (0.0315)   0.0068 (0.0010)
10      29      0.049       0.0620 (0.0149)   0.0126 (0.0004)    0.0625 (0.0144)   0.0131 (0.0004)
20      26      0.233       0.2595 (0.0466)   0.0269 (0.0029)    0.2577 (0.0453)   0.0251 (0.0027)
20      46      0.047       0.0434 (0.0167)  -0.0038 (0.0003)    0.0428 (0.0164)  -0.0044 (0.0003)
40      53      0.248       0.1686 (0.0590)  -0.0799 (0.0099)    0.1626 (0.0557)  -0.0859 (0.0105)
40      86      0.049       0.0388 (0.0207)  -0.0100 (0.0005)    0.0373 (0.0204)  -0.0116 (0.0005)

Table 3.6: Comparisons of Bayes SP U and Bayes SP I in terms of $\Pr(Y_{i,\mathrm{new}+} \ge y_{i,\mathrm{new}+} \mid \mathbf{Y}_{i,\mathrm{pre}} = \mathbf{y}_{i,\mathrm{pre}})$ for selected values of $y_{i,\mathrm{new}+}$ and $y_{i,\mathrm{pre}+}$ based on 500 simulations.

YZ also reported that the SE of the semiparametric estimate of the CPI becomes larger as $y_{i,\mathrm{pre}+}$ increases: the SE was 0.143 when $y_{i,\mathrm{pre}+} = 40$ and $y_{i,\mathrm{new}+} = 53$, and 0.061 when $y_{i,\mathrm{pre}+} = 40$ and $y_{i,\mathrm{new}+} = 86$. Table 3.6 shows that both semiparametric Bayesian procedures more than halve these SEs.

Based on these results, we conclude that our proposed semiparametric Bayesian procedure appears to overcome the main problems of YZ’s semiparametric procedure.

3.3 Conclusions and discussion

This chapter illustrated the use of our flexible NB mixed effect model using ten MS trials, and assessed the procedure via simulation studies.
Our data analysis of ten MS trials indicated that the semiparametric model with an uninformative prior was always preferred to the parametric model. In particular, the DIC indicates that the semiparametric model provides considerably better fits to the CEL activity of SPMS patients. Our simulation study showed that our model returns the smallest average RMSE of the CPI estimates among the competing methods when the true RE distribution is a mixture of betas or a mixture of a normal and a gamma, indicating the robustness of our flexible model against model misspecification. Our simulation study also showed that our semiparametric Bayesian procedure overcomes the main problems of the YZ semiparametric procedure: even when the counts on the previous scans are large, the variability of the CPI estimates from our procedure is moderate, and our proposed method substantially reduces the bias of CPI estimates when the previous counts are zero.

As the index values vary under models with different RE distributional assumptions, DSMBs wishing to utilize the CPI approach to monitoring MRI safety should employ our flexible mixed effect model, which does not depend upon any restrictive distributional assumptions for the RE. Both the parametric and semiparametric Bayesian procedures are implemented in a freely available R package. From the user’s point of view, no more effort is required to use our semiparametric Bayesian procedure than the parametric Bayesian procedure; the only extra burden lies in the additional computational time required for the semiparametric Bayesian procedure. The semiparametric Bayesian procedure with the full informative prior required 37 seconds (107 seconds) to compute the CPIs for all the patients in SPMS-5 at review 1 (review 2), while the parametric Bayesian procedure required 18 seconds (55 seconds), on a machine with a 3 GHz Intel Core i7 processor and 16 GB of 1600 MHz DDR3 memory.
Although our semiparametric procedure takes longer to fit, this is negligible relative to the costs and time involved in carrying out such a clinical trial.

The advantage of an informative prior has been demonstrated in both our data analysis and simulation study. The data analysis demonstrated that an informative prior can greatly reduce the length of the 95% CI for the CPI at the early stages of a trial. In fact, at the first DSMB review, some of the 95% CIs from the YZ procedures were uninformative; this indicates that an informative prior is essential for an insightful analysis. Our simulation study showed that with an informative prior, CPI estimates become more accurate. Although DSMBs should be careful in choosing “previous” trials to develop an informative prior as it could affect the CPI estimates, our simulation study showed that our procedure with an “incorrect” informative prior was able to outperform the uninformative prior. This is because the CEL activity of the control patients used to develop the “incorrect” prior was fairly similar to the CEL activity of the control patients used to develop our “correct” prior. We recommend that DSMBs include all available trials based on patients with similar characteristics to those in the current trial to develop an appropriate informative prior. The inclusion and exclusion criteria of the trials should be considered carefully in this process.

Whether to compute the CPI based on the control or treatment size parameter is open to debate; if the treatment is very effective in most patients but actually harmful in a subset, the CPI based on the treatment size parameter should be able to identify this subset, while the CPI based on the control size parameter might not. However, the CPI based on the treatment size parameter has a different interpretation, as it would measure the probability of observing an increase in CEL counts as large as those observed in the new scans if the patient has been treated. This interpretation may not be very meaningful as the treatment effect is unknown a priori; for example, if the CPI is computed based on the estimated treatment size parameter and treatment is harmful, this CPI might not identify any patients having unexpected increases in CEL activity. On balance, we think that DSMBs are interested in identifying CEL activities that are unexpectedly large relative to those experienced by the patients on the comparator arm. Therefore, the CPI is computed using the size parameter from the control patients to provide the interpretation of the CPI as the probability of observing an increase in CEL counts as large as those observed in the new scans if the patient was on the control arm. Nevertheless, using our software, our safety monitoring procedure based on the flexible model with an informative prior can be readily utilized to compute CPIs under either definition. Therefore, DSMBs could monitor both CPIs, if desired.

In practice, it could be difficult to obtain enough MS trial data to develop a reliable informative prior. However, it might be easier to obtain the trial data of control patients. An informative prior could then be developed for the mean function of the control patients in the current trial and an uninformative prior could be used for the mean function of the treated patients. The resulting CPIs should still be reliable, as our CPI evaluation is based only on the estimated mean function of control patients.

Chapter 4

Identification of treatment responders

4.1 Introduction

In the era of personalized medicine, there is a focus on learning which treatment better meets each individual patient’s needs [18]. The identification of responders and non-responders could optimize selection of therapy, particularly when a cure for the disease is absent and a number of therapeutic options are available. At the same time, the question of how to measure the efficacy of a therapy is a fundamental issue.
In planning clinical trials in many subject areas, researchers often find it difficult to designate one single outcome as the primary endpoint to describe treatment efficacy [24]. Well designed and carefully executed clinical trials are expensive and time-consuming, and require a substantial commitment by investigators. There is a natural tendency to wish to study as many different aspects of treatment efficacy as possible during the course of any proposed clinical trial. As a consequence, most clinical trials involve multiple outcome measures, each of which is repeatedly recorded over the follow-up period [61]. This chapter, Chapter 5 and Chapter 6 consider the problem of identifying treatment responders in clinical trials utilizing multiple longitudinal outcomes.

Hereafter, we refer to any group of patients receiving a test treatment as a treatment arm, and the patients in a treatment arm as “treated patients”. In contrast, we refer to the patients in the comparator arm as “control patients”. Depending on the trial, the control patients may receive a placebo or a standard treatment.

Loosely speaking, responders are the subset of patients who are responsive to treatment. The literature has considered two definitions of responders. One is the patients who respond positively (or negatively, depending on the direction of interest) regardless of the treatment assignment, namely “absolute responders”. Under this definition, control patients can be absolute responders because of, for example, placebo effects or natural improvements. The analysis often examines the proportions of identified responders in the treatment and control arms, and then assesses whether the proportions are significantly different.
For example, some have suggested that patients without CELs after one year of interferon β-1a should be considered as responders (e.g., [67]). [38] used a mixture of multivariate longitudinal models, a so-called Bayesian growth curve latent class model, for two longitudinal continuous outcomes in order to identify such absolute responders. The mixture model was fit to both treated and control patients with treatment assignment as a covariate, and then the patients were assigned to the cluster that yielded the largest posterior probability given the observed data. In this application, the cluster with relatively large declines in the outcome measures over time was labeled as the responder cluster.

Another definition of responders would be the patients in the treatment arm with a change in the outcomes that would not be expected to be observed in the absence of an active test treatment, namely “relative responders”. Under this definition, control patients cannot be relative responders, and treatment response is measured relative to the control patients. Therefore, relative responders cannot be a consequence of placebo effects. For example, [74] suggested that, for each patient in the treatment arm, the percent reduction in the average follow-up CEL counts from the baseline level be computed. Then, the treated patients with individual percent reduction below a pre-determined threshold were considered as responders. The threshold was determined based on the sampling distribution of the percent reduction among control patients. [59] modelled the changes in bone density of treated patients with a mixture of two univariate normals, one for the responder cluster and another for the non-responder cluster. The changes of control patients were assumed to be from the univariate normal corresponding to the non-responders.

The identification of both relative and absolute responders is often based on a single longitudinal outcome [36], as seen in the examples above.
A common approach to the use of multiple longitudinal outcomes is to combine them at each time point, thereby creating a single longitudinal composite outcome for assessing the treatment efficacy. For example, the MSFC, introduced in Chapter 1, is a widely used composite score in phase II/III MS trials [16]. It is based on the concept that scores for three functions (arm, leg and cognitive) are combined to create a single score [12, 16]. This is done by creating a standardized score for each component of the MSFC and averaging these standardized scores to create an overall composite score known as the MSFC score.

In general, it is not obvious how to construct a composite score. O’Brien proposed procedures to develop composite scores as test statistics in the context of assessing relative treatment efficacy [56]. His composite scores account for the correlation structure among the original components. Although his composite scores are group-based summaries, one can show that they are proportional to the difference between the average of the standardized individual scores from treated patients and the average of the standardized individual scores from control patients. Therefore, a naive approach to extending [56]’s procedure to identify responders based on multiple outcomes could be to use the standardized individual score as an indicator of how responsive a patient is. Then a patient can be defined as an absolute responder if the score is above/below a pre-specified threshold determined based on the distribution of the scores from all patients. Similarly, a treated patient can be defined as a relative responder if the score is above/below a threshold defined based on the distribution of the scores from control patients.

There is an issue with this naive composite score approach: the construction of the composite score suggested by [56] does not consider how effective each outcome is in describing the treatment effect.
Therefore, its performance as an indicator for responsive patients could be particularly poor when the outcomes have different effect sizes. This is problematic since clinicians are often not aware which outcomes are more closely tied to the treatment effect, or which outcomes discriminate those who are treatment responders from those who are not. Hereafter, we refer to discriminating outcomes as effective outcomes.

Let $\mathbf{y}^{(r)}_i = (y^{(r)}_{i,1}, \ldots, y^{(r)}_{i,n^{(r)}_i})^T$ be the vector containing the $r$th outcome’s $n^{(r)}_i$ longitudinal measurements from the $i$th patient ($i = 1, 2, \ldots, N$ and $r = 1, 2, \ldots, R$). Let $F$ be the set of indices $j$ corresponding to the follow-up after the treatment initiation. We originally attempted to utilize the CPI idea developed in Chapter 2 to identify relative responders. That is, we consider $\mathbf{y}^{(r)}_i$ as a realization of a random vector $\mathbf{Y}^{(r)}_i$, and first we develop a multivariate model for the multiple longitudinal outcomes $\mathbf{y}^{(1)}_i, \ldots, \mathbf{y}^{(R)}_i$ for control patients. Then, we choose some scalar function $q$ that summarizes all the follow-up outcomes into a single scalar statistic. A natural choice of $q$ would be some linear combination of the follow-up outcome measurements. Without loss of generality, assume that large values of $q$ correspond to disease worsening. Using the multiple longitudinal outcome model for control patients as a reference distribution, we may compute the conditional probability for a treated patient to have as small a scalar statistic as that observed during the follow-up given their pre-treatment values on all outcomes:

$$\Pr\left( q\left(\{\{Y^{(r)}_{i,j}\}_{j \in F}\}_{r=1}^{R}\right) \le q\left(\{\{y^{(r)}_{i,j}\}_{j \in F}\}_{r=1}^{R}\right) \,\middle|\, \{\{Y^{(r)}_{i,j} = y^{(r)}_{i,j}\}_{j \notin F}\}_{r=1}^{R} \right). \qquad (4.1)$$

A small value of this probability means that it would be surprising to see such a decrease in disease activity in the absence of the treatment.
Hence we can identify a treated patient with a small conditional probability as a relative responder.

However, this CPI procedure encounters the same issue as the composite outcome procedure: the choice of $q$ is difficult. In practice, it would be ideal to choose $q$ such that it gives more weight to the measurements that show clearer separation between responders and non-responders. However, without prior knowledge of these treatment effects, the choice of $q$ is challenging. Furthermore, some choices of $q$ may cause the CPI to be very difficult to compute in practice.

To circumvent the difficulty of selecting the weights for a composite score or specifying the choice of $q$ in the CPI procedure, we consider a latent-class clustering approach: that is, we consider the identification of responders as a clustering problem. The number of clusters may be $K = 2$, for responder and non-responder clusters, but it can be more than 2 if one is interested in subgroups of responders or non-responders. For simplicity, this chapter only considers the scenario where $K = 2$. Here, responders can be relative or absolute. Let $C_i$ be a random variable equal to 2 if the $i$th subject is from the responder cluster and 1 otherwise. Given a multiple longitudinal outcome mixture model, we compute the model-based classification probability, i.e., the probability of being a responder given the observed data $\tilde{\mathbf{y}}_i = (\mathbf{y}^{(1)T}_i, \ldots, \mathbf{y}^{(R)T}_i)^T$, for each patient. That is, we compute:

$$\Pr(C_i = 2 \mid \tilde{\mathbf{Y}}_i = \tilde{\mathbf{y}}_i). \qquad (4.2)$$

Following the literature (e.g., [7, 53]), we refer to this probability as a posterior probability of class membership. If this probability is larger than 0.5, one may classify the patient as a responder.

This short chapter discusses the performance of the posterior probability procedure in comparison to the CPI and the composite outcome procedures in a simple setting where none of the outcomes have repeated measures; i.e., $n^{(1)} = \cdots = n^{(R)} = 1$.
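As a minimal illustration of the posterior probability of class membership in (4.2), the sketch below applies Bayes' rule to a hypothetical two-component univariate normal mixture with unit variances. The mixing proportion and component means are invented for this example and are not taken from the thesis; the multivariate model actually used is developed later.

```python
import math


def normal_pdf(y, mean, sd=1.0):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_responder_prob(y, pi_nonresp, mean_nonresp, mean_resp, sd=1.0):
    """Pr(C = 2 | Y = y) by Bayes' rule, where pi_nonresp = Pr(C = 1)."""
    f1 = pi_nonresp * normal_pdf(y, mean_nonresp, sd)        # non-responder term
    f2 = (1.0 - pi_nonresp) * normal_pdf(y, mean_resp, sd)   # responder term
    return f2 / (f1 + f2)


# Hypothetical setting: non-responders centred at 0, responders at -2
# (a lower outcome is taken to mean improvement), equal mixing proportions.
for y in (-3.0, -1.0, 1.0):
    p = posterior_responder_prob(y, pi_nonresp=0.5, mean_nonresp=0.0, mean_resp=-2.0)
    label = "responder" if p > 0.5 else "non-responder"
    print(f"y = {y:+.1f}: Pr(C = 2 | y) = {p:.3f} -> {label}")
```

The 0.5 cut-off follows the classification rule stated in the text; an observation exactly halfway between the two hypothetical component means receives probability 0.5 when the mixing proportions are equal.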
As the following sections do not consider repeated measures, we omit the subscript $j$. Section 4.2 reviews the composite outcome approach proposed by [56]. Section 4.3 demonstrates the desirable properties of our posterior probability method using simulation studies. We will leave the development of the multivariate model for $\mathbf{Y}^{(1)}_i, \ldots, \mathbf{Y}^{(R)}_i$ to Chapter 5.

4.2 Review of composite outcome measures

This section reviews the two composite scores proposed by [56] in the context of assessing relative treatment efficacy. The composite scores are developed to form test statistics for assessing the null hypothesis of no group treatment effect. Therefore, these composite scores are not defined for individual patients. However, we will show that these composite scores are proportional to the difference of average standardized individual scores between treated and control groups. Then we argue that the standardized individual composite score could be a natural extension of these composite scores for measuring how responsive an individual patient is to a treatment.

For simplicity, our discussion focuses on the scenario with only two arms, control and treatment, where $\tilde{\mathbf{y}}_{a,i} = (y^{(1)}_{a,i}, \ldots, y^{(R)}_{a,i})^T$ denotes the vector of $R$ outcomes from the $i$th patient in the $a$th arm ($a = 0$: control arm; $a = 1$: treatment arm). Let $n_0$ and $n_1$ be the number of control and treated patients, respectively. [56] assumed that the $\tilde{\mathbf{Y}}_{a,i}$’s are independently distributed across patients, and that the treatment and control groups share a common variance-covariance matrix $\Sigma$. As the purpose of [56]’s composite score is to develop a test to assess the treatment effect by comparing the means of the treated and control groups, the mean of each group is allowed to be different:

$$\tilde{\mathbf{Y}}_{a,i} \overset{\text{i.i.d.}}{\sim} (\boldsymbol{\mu}_a, \Sigma). \qquad (4.3)$$

Note that no specific distributional assumption is made in (4.3).
We can express $\Sigma = V^{1/2} M V^{1/2}$, where $M$ is the correlation matrix and $V^{1/2} = \mathrm{diag}(\sigma_1, \ldots, \sigma_R)$ is a diagonal matrix containing the SDs. Let $\bar{\mathbf{y}}_a = \sum_{i=1}^{n_a} \tilde{\mathbf{y}}_{a,i} / n_a$ be the group average. To develop composite scores which can be used as test statistics, the difference of the group averages is considered: the difference $\bar{\mathbf{Y}}_1 - \bar{\mathbf{Y}}_0$ has mean $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0$ and variance matrix $(1/n_1 + 1/n_0)\Sigma$. Therefore the difference can be transformed to have variance $(1/n_1 + 1/n_0)M$ by scaling it as:

$$Z := V^{-1/2}(\bar{\mathbf{Y}}_1 - \bar{\mathbf{Y}}_0) \sim \left( V^{-1/2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0),\; \left(\frac{1}{n_1} + \frac{1}{n_0}\right) M \right).$$

To develop composite scores with simple forms, [56] assumed the standardized effect size of each outcome is the same value, denoted by $\beta$:

$$\beta \mathbf{1}_R := V^{-1/2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0),$$

where $\mathbf{1}_R$ denotes the vector of $R$ ones. Then the expectation of $Z$ is $\beta \mathbf{1}_R$.

The first composite score is simply the average of all the scaled outcomes, which is an unbiased estimator for $\beta$ for any $M$, namely the ordinary least squares (OLS) estimator:

$$\hat{\beta}_{OLS} = \frac{\mathbf{1}_R^T Z}{R}.$$

By the Gauss-Markov theorem, $\hat{\beta}_{OLS}$ is the best linear unbiased estimator (BLUE) of $\beta$ when the components of $Z$ are uncorrelated. Here “best” means giving the lowest variance of the estimate among all linear unbiased estimates. However, when the components of $Z$ are correlated (i.e., the off-diagonal entries of $M$ are nonzero), the OLS estimate is no longer BLUE, although it is still an unbiased estimator. Therefore, [56] proposed to use the generalized least squares (GLS) estimator in this situation:

$$\hat{\beta}_{GLS} = \frac{\mathbf{1}_R^T M^{-1} Z}{\mathbf{1}_R^T M^{-1} \mathbf{1}_R},$$

which is BLUE for any structure of $M$. Finally, the unknown covariance matrix $\Sigma$ is replaced by its sample estimate, and [56] used $\hat{\beta}_{OLS}$ and $\hat{\beta}_{GLS}$ as test statistics to assess the null hypothesis of $\boldsymbol{\mu}_0 = \boldsymbol{\mu}_1$, i.e., $\beta = 0$.

The development of these composite scores is based on the assumption of a common standardized effect size across the $R$ outcomes. In practice, each outcome may have a different standardized effect size.
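The two composite scores above can be computed directly from a scaled difference vector Z and a correlation matrix M. The sketch below is an illustration only: the numbers for Z and M are invented, and the small linear solver is a stand-in for whatever matrix routine one would normally use.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def beta_ols(z):
    """OLS composite score: the plain average 1'Z / R."""
    return sum(z) / len(z)


def beta_gls(z, m):
    """GLS composite score: 1'M^{-1}Z / 1'M^{-1}1."""
    ones = [1.0] * len(z)
    m_inv_z = solve(m, z)     # M^{-1} Z
    m_inv_1 = solve(m, ones)  # M^{-1} 1
    return sum(m_inv_z) / sum(m_inv_1)


# Hypothetical example with R = 3 outcomes: the first two are correlated
# (0.5), the third is uncorrelated with both.
z = [1.0, 1.0, 4.0]
m = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print("OLS:", beta_ols(z))
print("GLS:", beta_gls(z, m))
```

In this made-up example the GLS score gives relatively more weight to the third outcome than the OLS average does, because the first two outcomes carry partly redundant information. When M is the identity matrix the two scores coincide.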
In this case, these composite scores might not be as suitable as a summary of the multiple outcomes.

To use a composite score as a measure of how responsive an individual patient is, the composite score must be defined for individual patients. Let $c$ be any vector of length $R$ and $K_{a,i}(c) = V^{-1/2}(\tilde{y}_{a,i} - c)$. Then $K_{a,i}(c) \sim (V^{-1/2}(\mu_a - c), M)$. We define:

$$\hat{\beta}_{\mathrm{OLS},a,i}(c) := \frac{1_R^T K_{a,i}(c)}{R}, \qquad \hat{\beta}_{\mathrm{GLS},a,i}(c) := \frac{1_R^T M^{-1} K_{a,i}(c)}{1_R^T M^{-1} 1_R}.$$

The composite scores $\hat{\beta}_{\mathrm{OLS}}$ and $\hat{\beta}_{\mathrm{GLS}}$ can be represented as functions of $\hat{\beta}_{\mathrm{OLS},a,i}$ and $\hat{\beta}_{\mathrm{GLS},a,i}$, respectively, as:

$$\hat{\beta}_{\mathrm{OLS}} = \frac{1}{n_1}\sum_{i=1}^{n_1} \hat{\beta}_{\mathrm{OLS},1,i}(c) - \frac{1}{n_0}\sum_{i=1}^{n_0} \hat{\beta}_{\mathrm{OLS},0,i}(c),$$

$$\hat{\beta}_{\mathrm{GLS}} = \frac{1}{n_1}\sum_{i=1}^{n_1} \hat{\beta}_{\mathrm{GLS},1,i}(c) - \frac{1}{n_0}\sum_{i=1}^{n_0} \hat{\beta}_{\mathrm{GLS},0,i}(c).$$

Therefore, the natural extensions of the two composite scores of [56] for an individual patient are $\hat{\beta}_{\mathrm{OLS},a,i}(c)$ and $\hat{\beta}_{\mathrm{GLS},a,i}(c)$ for some vector $c$.

The choice of $c$ requires some attention. The form of $K_{a,i}(c)$ suggests that the natural choice of $c$ may be the center of some reference distribution against which $\tilde{y}_{a,i}$ is standardized. Throughout the rest of this chapter, we use $c = \mu_0$, because then $\hat{\beta}_{\mathrm{OLS},a,i}(\mu_0)$ and $\hat{\beta}_{\mathrm{GLS},a,i}(\mu_0)$ measure how much this patient's outcomes deviate from those of a typical control patient.

4.3 Simulation study

The purpose of this simulation study is to compare the responder identification performance of the CPI, the OLS and GLS methods discussed in Section 4.2, and the posterior probability method. We also consider composite scores other than OLS and GLS. We will use the area under the receiver operating characteristic (ROC) curve to assess the classification performance of the various procedures. Before detailing the simulation setting, we briefly introduce the ROC curve.

4.3.1 The ROC curve

The ROC curve is a graphical tool that illustrates the performance of a continuous classifier as its threshold is varied. In general, the determination of an 'ideal' threshold value involves a trade-off between sensitivity (the true positive rate) and specificity (the true negative rate).
As both change with each threshold value, it is difficult to determine where the cut-off should be. The ROC curve offers a graphical illustration of these trade-offs at each threshold for any binary classifier that uses a continuous variable [22]. The curve is created by plotting the true positive rate (ranging between 0 and 1) against the false positive rate (ranging between 0 and 1) as the threshold varies. The area under the ROC curve (hereafter AUC) summarizes the ROC curve into a scalar value that ranges between 0 and 1: a value of 1 indicates perfect classification, with a true positive rate of 1 and a false positive rate of 0, while a value below 0.5 indicates that the classification performance is worse than chance. The AUC is a convenient measure to assess the classification performance of a continuous classifier without pre-determining a threshold value. The units of the original continuous classifier do not affect the AUC; the AUC is affected only by the ordering of the cases according to the values of the continuous variable.

4.3.2 Simulation settings

For notational simplicity, this section ignores the group subscript $a$ corresponding to the arm and uses $\tilde{y}_i$ to denote the measurement from the $i$th patient. If one is interested in relative responders, then $\tilde{y}_i$ may be interpreted as the $i$th patient in the treatment arm, while if one is interested in absolute responders, $\tilde{y}_i$ may be interpreted as the $i$th patient in any arm. In our simulation study, conditioning on $C_i$ (1 or 2), $\tilde{Y}_i = (Y_i^{(1)}, Y_i^{(2)})^T$ is a two-dimensional vector from the bivariate normal with unit variances:

$$\begin{bmatrix} Y_i^{(1)} \\ Y_i^{(2)} \end{bmatrix} \Bigg|\, C_i = c \ \overset{\text{i.i.d.}}{\sim}\ N_2\left(-(c-1)\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix},\ \begin{bmatrix} 1 & \rho_c \\ \rho_c & 1 \end{bmatrix}\right), \qquad C_i \overset{\text{i.i.d.}}{\sim} \mathrm{Categorical}(\pi, 1-\pi), \tag{4.4}$$

where $|\rho_c| < 1$ and $\pi = \Pr(C_i = 1)$. The correlation between the two outcomes is $\rho_1$ for non-responders ($C_i = 1$) and $\rho_2$ for responders ($C_i = 2$). When $\rho_1 = \rho_2$, the responder and non-responder groups share a common variance matrix. We let the non-responders have mean $(0, 0)^T$.
If $\mu_1$ or $\mu_2$ is zero then the corresponding outcome is ineffective for distinguishing between responders and non-responders. Positive (negative) values of $\mu_r$ ($r = 1, 2$) indicate that the treatment reduces (increases) the mean of $Y_i^{(r)}$ for responders by $|\mu_r|$ in comparison to non-responders. Without loss of generality, our simulation study considers the scenario $\mu_1, \mu_2 \ge 0$.

Under this model, the posterior probability in (4.2) based only on the outcome $Y_i^{(r)}$ ($r = 1, 2$) has a closed form expression:

$$\mathrm{PP}_{r,i} := \Pr(C_i = 2 \mid Y_i^{(r)} = y_i^{(r)}) = g_\pi(G_{r,i}),$$

where $g_\pi(k) = 1/\left(1 + \frac{\pi}{1-\pi}k\right)$ and, since responders have mean $-\mu_r$ under (4.4):

$$G_{r,i} = \frac{\exp\left(-\frac{1}{2} y_i^{(r)2}\right)}{\exp\left(-\frac{1}{2}\left(y_i^{(r)} + \mu_r\right)^2\right)} = \exp\left(\frac{1}{2}\mu_r^2 + \mu_r y_i^{(r)}\right), \tag{4.5}$$

for $r = 1, 2$. Similarly, the posterior probability based on both $y_i^{(r)}$, $r = 1, 2$, is:

$$\mathrm{PP}_{12,i} := \Pr(C_i = 2 \mid Y_i^{(1)} = y_i^{(1)}, Y_i^{(2)} = y_i^{(2)}) = g_\pi(G_{12,i}), \tag{4.6}$$

where:

$$G_{12,i} = \left(\frac{1-\rho_2^2}{1-\rho_1^2}\right)^{1/2} \frac{\exp\left(-\frac{1}{2}\left\{\frac{1}{1-\rho_1^2}\left[\left(\sum_{r=1}^{2} y_i^{(r)2}\right) - 2\rho_1 y_i^{(1)} y_i^{(2)}\right]\right\}\right)}{\exp\left(-\frac{1}{2}\left\{\frac{1}{1-\rho_2^2}\left[\left(\sum_{r=1}^{2} \left(y_i^{(r)} + \mu_r\right)^2\right) - 2\rho_2 \left(y_i^{(1)} + \mu_1\right)\left(y_i^{(2)} + \mu_2\right)\right]\right\}\right)}.$$

Regardless of their correlation, the OLS and GLS composite scores are the same when only two outcomes are involved, and they are a simple average of the two outcomes:

$$\hat{\beta}_{\mathrm{OLS},1,i} = \hat{\beta}_{\mathrm{GLS},1,i} = \frac{\sum_{r=1}^{2} y_i^{(r)}}{2}.$$

As the purpose of this simulation is to study the properties of various methods, we use the true parameter values for the computation of the OLS and the posterior probability. We investigate not only the OLS but also alternative linear composite scores (LCS), because OLS is merely one example of an LCS, and one can create infinitely many linear combinations of the standardized outcomes. Letting $\mathrm{LCS}_i(b) = Y_i^{(1)} + bY_i^{(2)}$, we consider $b = -1, 0, 10$. For composite score procedures, smaller values of the composite score indicate a responder, as the treatment reduces the mean of effective outcomes for responders.

If one is only interested in the classification performance of the composite scores in terms of AUC, $\mathrm{LCS}_i(1)$ has the same AUC as OLS (or GLS) because both $\mathrm{LCS}_i(1)$ and OLS weight the two outcomes equally.
Also $\mathrm{PP}_{1,i}$ has the same AUC as $\mathrm{LCS}_i(0)$ because, as seen in (4.5), $\mathrm{PP}_{1,i}$ is a strictly monotone function of $Y_i^{(1)} = \mathrm{LCS}_i(0)$.

For the CPI procedure to identify relative responders, it may be reasonable to take the scalar summary function $q$ to be a linear composite score: $q(Y_i^{(1)}, Y_i^{(2)}) = \mathrm{LCS}_i(b)$. Since our simulation setting does not consider any pre-treatment measurement, and $\mathrm{LCS}_i(b)$ has a normal distribution with mean 0 and variance $1 + b^2 + 2b\rho_1$ for non-responders, the CPI defined in (4.1) is based on the CDF of this normal distribution. As the CDF of a normal distribution is strictly increasing, for pairs $(y^{(1)}, y^{(2)})^T$ and $(y^{(1)\prime}, y^{(2)\prime})^T$ such that $y^{(1)} + by^{(2)} < y^{(1)\prime} + by^{(2)\prime}$, we always have:

$$\Pr(\mathrm{LCS}_i(b) \le y^{(1)} + by^{(2)}) < \Pr(\mathrm{LCS}_i(b) \le y^{(1)\prime} + by^{(2)\prime}).$$

Therefore, for given $b$, the CPI and LCS procedures yield the same ordering of cases. As the AUC is only affected by the order of the cases, their AUCs are the same. Therefore we only consider the LCS procedure.

We fix $\mu_1 = 1$ and consider all combinations of $\rho_1 \in (-1, 1)$; $\rho_2 = -0.25, 0.25, 0.75$ or $\rho_2 = \rho_1$; and $\mu_2 = 0, 0.5, 1, 2$. As larger $\mu_2$ indicates that the outcome $Y_i^{(2)}$ is more effective for classifying the two groups, the choices considered for $\mu_2$ correspond to the outcome $Y_i^{(2)}$ being ineffective ($\mu_2 = 0$), half as effective as $Y_i^{(1)}$ ($\mu_2 = 0.5$), as effective as $Y_i^{(1)}$ ($\mu_2 = 1$) and twice as effective as $Y_i^{(1)}$ ($\mu_2 = 2$). The proportions of non-responders $\pi$ considered are 25%, 50% and 75%. The purpose of this simulation study is to demonstrate the expected classification performance of the various procedures in the best possible setting. Therefore, we generate a large number (100,000) of patients from the simulation model (4.4) for each parameter set.
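The following is a minimal Python sketch of one cell of such a simulation, assuming NumPy is available. The helper names `simulate`, `pp12` and `auc` are illustrative and do not come from the thesis, and a smaller sample than 100,000 is used to keep it quick. It draws patients from (4.4) with responder mean $-(\mu_1, \mu_2)^T$, scores them with the two-outcome posterior probability and with two linear composite scores, and computes the AUC from ranks.

```python
import numpy as np

def simulate(n, pi, mu, rho1, rho2, rng):
    """Draw n patients from model (4.4); label 1 = non-responder, 2 = responder."""
    c = np.where(rng.random(n) < pi, 1, 2)
    y = np.empty((n, 2))
    for label, rho in ((1, rho1), (2, rho2)):
        idx = c == label
        cov = np.array([[1.0, rho], [rho, 1.0]])
        mean = -(label - 1) * np.asarray(mu, dtype=float)
        y[idx] = rng.multivariate_normal(mean, cov, size=int(idx.sum()))
    return y, c

def log_bvn(y, mean, rho):
    """Log density of a unit-variance bivariate normal, up to an additive constant."""
    d = y - mean
    quad = (d[:, 0] ** 2 + d[:, 1] ** 2 - 2 * rho * d[:, 0] * d[:, 1]) / (1 - rho ** 2)
    return -0.5 * np.log(1 - rho ** 2) - 0.5 * quad

def pp12(y, pi, mu, rho1, rho2):
    """Posterior probability of being a responder given both outcomes."""
    log_g = log_bvn(y, np.zeros(2), rho1) - log_bvn(y, -np.asarray(mu, dtype=float), rho2)
    z = np.log(pi / (1 - pi)) + log_g
    return np.exp(-np.logaddexp(0.0, z))  # stable form of 1 / (1 + exp(z))

def auc(score, is_positive):
    """Rank-based AUC; larger scores should indicate the positive class (no ties assumed)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = is_positive.sum()
    n_neg = len(score) - n_pos
    return (ranks[is_positive].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
pi, mu, rho = 0.5, (1.0, 0.0), 0.8          # second outcome ineffective, correlated
y, c = simulate(20000, pi, mu, rho, rho, rng)
responder = c == 2

auc_pp12 = auc(pp12(y, pi, mu, rho, rho), responder)
# Smaller composite scores indicate responders, so the sign is flipped.
auc_ols = auc(-(y[:, 0] + y[:, 1]) / 2, responder)
auc_lcs0 = auc(-y[:, 0], responder)
print(auc_pp12, auc_lcs0, auc_ols)
```

Because the AUC depends only on the ordering of cases, any strictly monotone transformation of a score (after fixing its direction) leaves these values unchanged.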
The OLS composite score, the posterior probability and the $\mathrm{LCS}_i(b)$ composite scores are computed at the true parameter values $\mu_1$, $\mu_2$, $\rho_1$ and $\rho_2$ for each of the 100,000 patients. The AUC is then computed based on 100,000 samples for each combination of parameters.

4.3.3 Results

Figure 4.1 shows the AUC based on the six distinct procedures considered in the simulations. We first discuss how the model parameters ($\pi$, $\rho_1$, $\rho_2$, $\mu_1$ and $\mu_2$) influence the performance of all the methods in general, and then discuss the performance of the posterior probability methods in more detail.

Overall performance

First, the proportion of non-responders $\pi$ does not affect the prediction performance. Therefore, we only show the simulation results with $\pi = 0.5$.

For fixed $\rho_1$ and $\rho_2$, the classification performances of the posterior probability method based on two outcome measures, the OLS and the LCS methods with positive weights on $Y_i^{(2)}$ improve as $\mu_2$ increases. However, $\mathrm{LCS}_i(-1)$ performs worse as $\mu_2$ increases. In fact, $\mathrm{LCS}_i(-1)$ has an AUC below 0.5 when the outcome $Y_i^{(2)}$ has a large mean among responders ($\mu_2 = 2$), indicating that classification performance would be better if the sign of this classifier was switched.

Figure 4.1: The AUC based on six procedures under the scenarios when $Y_i^{(2)}$ is ineffective ($\mu_2 = 0$) and effective ($\mu_2 = 0.5, 1, 2$). Three posterior probability procedures are considered: $\mathrm{PP}_{1,i}$, $\mathrm{PP}_{2,i}$ and $\mathrm{PP}_{12,i}$. Three linear composite score procedures are considered: OLS, $\mathrm{LCS}_i(-1)$ and $\mathrm{LCS}_i(10)$.

It is reasonable that the prediction performances of the methods based only (or heavily) on a single outcome (i.e., the posterior probability method based only on $Y_i^{(r)}$ and the LCS procedures with $b = 0$ or 10) are not much affected by the magnitudes of $\rho_1$ and $\rho_2$. In particular, $\mathrm{LCS}_i(10)$ performs the worst among all considered LCSs for all values of $\rho_1$ and $\rho_2$ when $\mu_2 = 0$.
This is reasonable as $Y_i^{(2)}$ is ineffective and this procedure classifies the patients based primarily on the ineffective outcome.

However, the rest of the procedures, i.e., $\mathrm{PP}_{12,i}$, $\mathrm{LCS}_i(-1)$ and OLS, are affected by the values of $\rho_1$ and $\rho_2$. In particular, when $\mu_1 = \mu_2 = 1$ and $\rho_1 = \rho_2$, the classification performances of the posterior probability method based on both outcomes and of the OLS deteriorate as both $\rho_1$ and $\rho_2$ become larger. This is an artifact of our simulation model: for certain choices of $\mu_1$, $\mu_2$, $\rho_1$ and $\rho_2$, the two groups are not well-separated. The top row of Figure 4.2 shows 100 samples from each of the responder and non-responder groups when $\mu_1 = \mu_2 = 1$ and $\rho_1 = \rho_2 = -0.95, 0, 0.95$, the contours of the posterior probability method based on both outcomes and the corresponding posterior probability based on only $Y_i^{(1)}$ (adjoining top panel) and only $Y_i^{(2)}$ (adjoining side panel). When $\rho_1 = \rho_2 = -0.95$, the responder and non-responder groups are well-separated and the contours show a steep increase in posterior probability in the area between the two groups. However, when $\rho_1 = \rho_2 = 0.95$, the two clusters have large overlap, which makes classification difficult for all the procedures.

While the performance of OLS deteriorates when $\rho_1$ becomes larger for any values of $\rho_2$, $\mathrm{LCS}_i(-1)$ performs better as $\rho_1$ becomes larger when $\mu_2 \le 0.5$, and, in particular, the AUCs of OLS and $\mathrm{LCS}_i(-1)$ are symmetric around $\rho_1 = 0$ when $Y_i^{(2)}$ is ineffective ($\mu_2 = 0$). This is because the responder and non-responder groups are well-separated by the 45 degree line in the $Y_i^{(2)}$ versus $Y_i^{(1)}$ plots when strong positive correlation is present. Figure 4.3 shows the contours of OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) when $\rho_1 = \rho_2 = 0.8$ and $\mu_2 = 2$. The superimposed points are 100 samples of $(Y_i^{(1)}, Y_i^{(2)})$ from each of the responder (circle, blue) and non-responder (triangle, red) groups. The contours of $\mathrm{LCS}_i(-1)$ separate the two groups well, i.e., larger values of $\mathrm{LCS}_i(-1)$ are associated with larger chances of being responders. On the contrary, each contour line of OLS ($= \mathrm{LCS}_i(1)$) crosses the two groups, indicating that classification into the two groups based on this composite score is a challenge. This can also be seen in Figure 4.4, which shows the densities of the OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) scores for the responder (solid curve) and the non-responder (dotted curve) groups in the same settings as Figure 4.3; the densities of responders and non-responders are best separated by $\mathrm{LCS}_i(-1)$.

Figure 4.2: A hundred random samples of $(Y_i^{(1)}, Y_i^{(2)})^T$ from each of the responder (blue) and non-responder (red) groups for various combinations of the parameters: $\mu_2 = 1, 0.5, 0$ and $\rho_1 = \rho_2 = -0.95, 0, 0.5, 0.95$ when $\mu_1 = 1$ and $\pi = 0.5$. The contours of the posterior probability based on the two outcomes are superimposed. In each plot, the adjoining side panel shows the corresponding posterior probability based only on $Y_i^{(2)}$ and the adjoining top panel shows the corresponding posterior probability based only on $Y_i^{(1)}$. The adjoining number in the top right corner is $\rho_1$ ($= \rho_2$).

Figure 4.3: The contours of OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) as functions of $Y_i^{(1)}$ and $Y_i^{(2)}$. The superimposed points are 100 samples of $(Y_i^{(1)}, Y_i^{(2)})$ from each of the responder (circle, blue) and non-responder (triangle, red) groups when $\rho_1 = \rho_2 = 0.8$ and $\mu_2 = 2$.

The performance of the posterior probability method

The performance of the posterior probability method based on two correlated outcomes is always at least as good as that of the other procedures for all combinations of $\pi$, $\rho_1$, $\rho_2$, $\mu_1$ and $\mu_2$ considered.
It is notable that even when $Y_i^{(2)}$ is ineffective ($\mu_2 = 0$), the performance of the posterior probability method based on two outcomes is better than that of the posterior probability method based only on the effective outcome if the two outcomes are dependent, indicating that information on $Y_i^{(2)}$ does help to improve the prediction accuracy. To better understand the reasons for the desirable performance of the posterior probability method, particularly when one outcome is ineffective and $\rho_1 = \rho_2$, the bottom row of Figure 4.2 shows 100 samples of $(Y_i^{(1)}, Y_i^{(2)})^T$ from each of the responder and non-responder groups when $\mu_2 = 0$, as well as the contours of the posterior probability method based on both outcomes and the corresponding posterior probabilities based on each of the outcomes. As $Y_i^{(2)}$ is ineffective, $\mathrm{PP}_{2,i} = 0.5$ for all values of $Y_i^{(2)}$, indicating that given any value of $Y_i^{(2)}$, the patient is equally likely to be a responder or a non-responder. On the other hand, $\mathrm{PP}_{1,i}$ decreases as $Y_i^{(1)}$ increases because larger values of $Y_i^{(1)}$ indicate an increased chance that the patient is a non-responder. Although $\mathrm{PP}_{2,i}$ does not depend on $Y_i^{(2)}$, $\mathrm{PP}_{12,i}$ depends not only on the effective outcome $Y_i^{(1)}$ but also on the ineffective outcome $Y_i^{(2)}$. As a result, $\mathrm{PP}_{12,i}$ and $\mathrm{PP}_{1,i}$ show considerable discrepancies when the two outcomes are dependent. For example, when $(Y_i^{(1)}, Y_i^{(2)}) = (0, 1)$ and $\rho_1 = \rho_2 = 0.95$, $\mathrm{PP}_{1,i} = 0.38$, indicating the patient is more likely to be a non-responder, whereas $\mathrm{PP}_{12,i} = 0.99$, strongly indicating the patient is a responder. On the other hand, when the two outcomes are not dependent ($\rho_1 = \rho_2 = 0$) and $Y_i^{(2)}$ is ineffective, then $\mathrm{PP}_{12,i} = \mathrm{PP}_{1,i}$ for a given value of $Y_i^{(1)}$ regardless of the value of $Y_i^{(2)}$. This can be readily seen by substituting $\rho_1 = \rho_2 = 0$ and $\mu_2 = 0$ into $\mathrm{PP}_{12,i}$ defined in (4.6), and it can also be seen from the middle panel of the bottom row of Figure 4.2.

Figure 4.4: The density of OLS (left), $\mathrm{LCS}_i(0)$ (middle) and $\mathrm{LCS}_i(-1)$ (right) for responders (solid curve) and non-responders (dotted curve) when $\rho_1 = \rho_2 = 0.8$ and $\mu_2 = 2$.

In summary, the classification performance of the composite score procedures with two outcomes depends heavily on the choice of the weights on the two outcomes. The selection of the weights is not an easy task because the optimum weights should account for the effect sizes $\mu_1$ and $\mu_2$, and the correlations ($\rho_1$ and $\rho_2$) between the two outcomes. Depending on the choice of weights, the classification performance of the LCS based on two outcomes could be worse than that of the LCS based only on one outcome (i.e., $\mathrm{LCS}_i(0)$) even when both outcomes are effective. As OLS is also a linear combination of the two outcomes, its performance varies depending on the parameters of the true model. On the contrary, the posterior probability method based on the two outcomes yields at least as good performance as the other methods considered for all $\rho_1$, $\rho_2$, $\mu_2$ and $\pi$. It is notable that inclusion of an ineffective outcome in addition to an effective outcome may improve the classification performance of the posterior probability method. As there is no obvious choice of the optimum weights in the composite score procedure, the posterior probability method has a practical advantage.

4.4 Discussion

This chapter motivated the use of the posterior probability to identify treatment responders. Although this chapter only considered two outcomes, many studies will want to consider more than two outcomes; in a typical clinical trial, this might be one primary endpoint and a collection of two or more secondary outcomes. Furthermore, although this chapter only considered normally distributed outcomes, the measurements could be from other distributions such as the binomial or negative binomial.
Similar investigations of the posterior probability approach in such contexts would be of considerable interest.

To evaluate a posterior probability, we need a parametric mixture model that describes the joint distribution of the multiple outcomes. Although this chapter only considered the scenario with no repeated measures, in many clinical trials each outcome is measured longitudinally and the treatment efficacy is assessed based on multiple longitudinal outcomes. To implement a latent-class clustering approach, we need a general mixture model for the multiple longitudinal outcomes. The next chapter will discuss our novel multivariate model, will develop an estimation procedure based on a Monte Carlo expectation maximization (MCEM) algorithm, and will demonstrate its performance in identifying responders using simple illustrative examples.

Chapter 5
Multiple longitudinal outcome mixture model to identify responders

5.1 Introduction

Due to the complex nature of diseases, it is often difficult to determine a single endpoint that would comprehensively indicate improvement with treatment over time. Instead, longitudinal studies often involve multiple outcomes measured repeatedly on each participant. These longitudinal outcomes may involve both continuous and discrete outcomes. In MS clinical trials, examples of continuous outcomes are the T25FW and the 9HPT, which are two components of the MSFC (introduced in Chapter 1); examples of discrete outcomes are the PASAT, which is another component of the MSFC, and MRI lesion counts of various types.

An additional challenge for assessing treatment efficacy is the extreme heterogeneity of the population diagnosed as having MS. Heterogeneity of experienced symptoms suggests the possibility of distinct disease subgroups, or clusters. Consequently, not all subjects diagnosed with MS would be expected to respond to a given treatment [66].
A model that accounts for multiple outcomes reported over time while grouping subjects based on their multivariate symptom profiles is needed for identifying clinically important clusters for further study. As discussed in Section 4.1, we call such clusters of patients treatment responders, and they could be relative responders or absolute responders. This chapter introduces a multiple longitudinal outcome mixture model (MLOMM) that can be used to identify the clusters of populations characterized by multiple longitudinal outcomes.

The research on joint mixture models for multiple longitudinal outcomes is very limited, and is mostly focused on the setting where all the outcomes are normally distributed. [38] developed a mixture model for multiple continuous longitudinal outcomes by relating them to a single latent variable at each time point in a Bayesian framework. The same authors recently extended this approach to non-continuous outcomes [39]. Some attempts have been made to develop joint (non-mixture) models for longitudinal continuous and discrete outcomes. [81] developed a joint model for longitudinally collected discrete and continuous outcomes where the discrete outcomes are modelled as a multivariate Poisson [33], and the joint conditional distribution of the continuous outcomes given the discrete outcomes is modelled as a MVN. In this model, the dependencies between the discrete and continuous outcomes are modelled by assuming that the conditional means of the continuous outcomes depend on the discrete outcomes. [31] provides an overview of work on joint modelling of discrete and continuous longitudinal variables. [69] proposed a joint (non-mixture) model for correlated ordinal and discrete longitudinal outcomes.
More recently, [64] proposed a hidden Markov model for multiple longitudinal responses of different types (e.g., binomial and normal), where the hidden latent classes may be interpreted as "clusters" with time-varying memberships.

Arguably, the most popular longitudinal model for a single outcome is a GLMM, which is an extension of the generalized linear model (GLM) [45]. Hence there is ample motivation to develop a multivariate model with each longitudinal outcome margin assumed to be from a GLMM. In many circumstances, it may be reasonable to assume that the cluster differences arise from the mean structure of the longitudinal outcomes, conditional on the RE and the cluster label. For example, in a longitudinal setting, observations in different clusters might have different patterns over time in their means. Therefore, our MLOMM assumes that each longitudinal outcome follows a GLMM with its mean structure depending on a cluster label. We will estimate the parameters using an MCEM algorithm. Unlike [39], our procedure does not require the multiple outcomes to be taken at the same time point.

The rest of this chapter is organized as follows. Before going into the details of our procedure, Section 5.2 briefly discusses our mathematical notation. Section 5.3 introduces our MLOMM, which can be used to identify absolute responders. Section 5.4 introduces the estimation procedure based on an MCEM algorithm. Section 5.5 introduces the extension of the MLOMM in which a longitudinal outcome is modelled with an NB mixed effect model conditioning on the cluster label. Section 5.6 demonstrates the MLOMM using a simulated dataset. Section 5.7 discusses how the MLOMM can be utilized to identify relative responders. Section 5.8 provides a brief conclusion for this chapter.
Analyses of two actual MS clinical trial datasets with our MLOMM and simulation studies are left to Chapter 6.

5.2 Notation

In this chapter, unless otherwise stated, given a random variable denoted by an upper case letter, for example $X$, we use the corresponding lower case letter, $x$, to indicate a realization of that random variable.

In this chapter, $f(a|b; \Psi)$ denotes the density of the conditional distribution of a random variable $A$ given a random variable $B = b$, evaluated at $a$, where the density is parametrized by the parameter $\Psi$. When the density is evaluated at a point other than $a$, say $c$, we explicitly state it as $f_{A|B}(c|b; \Psi)$. When the conditioning random variable $B$ takes a value different from $b$, say $c$, we explicitly state it as $f(a|B = c; \Psi)$. When the argument of the density is capitalized as $f(A|b; \Psi)$, it indicates that the density is a random quantity.

The expectation of a random variable $X$ with respect to a density $f(x|y; \Psi)$ is expressed as $E(X|y; \Psi)$. When it is obvious, the parameter $\Psi$ is omitted from the expression. When the expectation of a function of random variables is evaluated, we explicitly state the corresponding density using subscripts. For example, $E_X(h(X, y)|y; \Psi)$ indicates the expectation of $h(X, y)$ with respect to the conditional density $f(x|y; \Psi)$. The same rule applies for variance, correlation and covariance operators.

5.3 Model

We first briefly review GLMs and GLMMs in Section 5.3.1. We introduce our MLOMM in Section 5.3.2.

5.3.1 Review of GLM and GLMM

GLMs are natural extensions of ordinary linear regression models to allow the outcome variables to have a distribution other than a normal.
The outcome $Y_i$ ($i = 1, 2, \cdots, N$) from the $i$th patient has a distribution in the exponential family, with the density function taking the form:

$$f_{\mathrm{GLM}}(y_i; \Psi_Y) = \exp\left(\frac{m_i}{\kappa}\left[\theta_i y_i - b(\theta_i)\right] + c\left(y_i; \frac{\kappa}{m_i}\right)\right), \tag{5.1}$$

for some specific functions $b(\cdot)$ and $c(\cdot)$, where $\theta_i = \theta_i(\beta)$ is the canonical parameter parametrized by a vector of $p$ regression coefficients $\beta$, $\kappa$ is the dispersion parameter, $m_i$ is a known prior weight, and the set of parameters is $\Psi_Y = \{\beta, \kappa\}$ [45]. Some common exponential family distributions, such as the Poisson and binomial, have dispersion parameter $\kappa = 1$. The mean and canonical parameters are related through the equation $\mu_i = E(Y_i) = db(\theta_i)/d\theta_i$. Denoting $Z_i$ as a covariate vector of length $p$, the link function $h$ relates the linear predictor $\eta_i = Z_i^T\beta$ to the expected value $\mu_i$ as $h(\mu_i) = \eta_i$. Many GLMs other than the classical linear models use non-identity link functions as the values of $\mu_i$ are restricted. The "canonical" link function is the link function for which $\theta_i = \eta_i$.

GLMs are suited to model $Y_i$ ($i = 1, 2, \cdots, N$) when these outcomes are independently distributed and their means have systematic patterns modelled with covariates. GLMMs are extensions of GLMs that allow dependencies among the different measurements on the $i$th patient, $Y_{i,j}$ ($j = 1, 2, \cdots, n$). In our clinical trial application, $Y_{i,j}$ may denote a longitudinally collected measurement from the $i$th patient at the $j$th time point. For simplicity of presentation, we assume that $n$, the number of repeated measures over time, does not depend on the patient index $i$, and that the repeated measures of patients $i$ and $i'$ are taken at the same time points; however, our procedure can readily incorporate different numbers of repeated measures and different measurement schedules for different patients. GLMMs incorporate a vector of REs $B_i = (B_{i,1}, \cdots, B_{i,q})^T$ in the linear predictor $\eta_i$.

We let $y_i = (y_{i,1}, \cdots, y_{i,n})^T$ be the observed data vector from the $i$th patient.
For any vector $a$, denote $a_{b:c} = (a_b, a_{b+1}, \cdots, a_{c-1})^T$ when $c > b$ and $a_{b:c} = \emptyset$ when $b = c$. Conditionally on $b_i$ and the previous measures $y_{i,1:j}$, a GLMM assumes that the observation $y_{i,j}$ arises from a GLM with a linear predictor:

$$\eta_{i,j} = Z_{i,j}^T\beta + K_{i,j}^T b_i, \tag{5.2}$$

where $K_{i,j}$ is a vector of $q$ explanatory variables associated with the REs. In our clinical trial application, the RE may be simply a patient-specific random intercept (i.e., $K_{i,j} = 1$). Then each patient is assumed to have a different baseline level in $Y_{i,j}$, $j = 1, \cdots, n$. Consequently, we have:

$$f(y_i|b_i; \Psi_Y) = \prod_{j=1}^{n} f_{\mathrm{GLM}}(y_{i,j}|y_{i,1:j}, b_i; \Psi_Y).$$

The conditional density of $Y_{i,j}$ at a value $y_{i,j}$ given $y_{i,1:j}$ and $b_i$ has the form of (5.1), and the conditional density may depend on $y_{i,1:j}$ through the fixed effect covariates $Z_{i,j} = Z_{i,j}(y_{i,1:j})$. This is particularly useful when longitudinal data is modelled with a GLMM because then $E(Y_{i,j}|y_{i,1:j}, b_i)$ can have an autoregressive structure. Finally, the RE density is often assumed to be a normal with mean 0 and unstructured covariance.

5.3.2 Multiple longitudinal outcome mixture model

To distinguish $R$ longitudinal outcomes, hereafter we introduce the superscript $(r)$ as $y_i^{(r)} = (y_{i,1}^{(r)}, \cdots, y_{i,n^{(r)}}^{(r)})^T$ ($r = 1, 2, \cdots, R$). The numbers of repeated measures $n^{(r)}$ are allowed to be different across outcomes, and the times when the repeated measures are taken can also differ across outcomes. Similarly, the superscript $(r)$ is introduced for $Z_{i,j}$, $K_{i,j}$, $\eta_{i,j}$, $\mu_{i,j}$, $\kappa$, $\beta$ and $b_i$ to denote their corresponding outcomes. Our MLOMM models $\tilde{Y}_i = (Y_i^{(1)T}, \cdots, Y_i^{(R)T})^T$ by assuming each outcome is from a GLMM conditioning on the cluster label $c_i$, and the dependencies across outcomes arise through the joint RE distribution for $\tilde{B}_i = (B_i^{(1)T}, \cdots, B_i^{(R)T})^T$ and the conditional mean structure of each outcome.

Without loss of generality, we assume that the conditional mean of $Y_{i,j}^{(r)}$ depends on $Y_{i,j}^{(r')}$ where $r > r'$.
We let $y_i^{(1:r)} = (y_i^{(1)T}, \cdots, y_i^{(r-1)T})^T$ when $r > 1$ and $y_i^{(1:r)} = \emptyset$ if $r = 1$. Then, conditioning on $y_{i,1:j}^{(r)}$, $y_i^{(1:r)}$, $\tilde{b}_i$, and $c_i$, our MLOMM assumes that $y_{i,j}^{(r)}$ arises from a GLM with a linear predictor:

$$\eta_{i,j}^{(r)[c_i]} = Z_{i,j}^{(r)T}\beta^{(r)[c_i]} + K_{i,j}^{(r)T} b_i^{(r)} \tag{5.3}$$

that models:

$$\mu_{i,j}^{(r)[c_i]} := E(Y_{i,j}^{(r)}|y_{i,1:j}^{(r)}, y_i^{(1:r)}, b_i^{(r)}, c_i), \tag{5.4}$$

with a link function $h^{(r)}(\mu_{i,j}^{(r)[c_i]}) = \eta_{i,j}^{(r)[c_i]}$. The fixed effect covariates $Z_{i,j}^{(r)}$ may depend on other outcomes and previous measurements of this outcome as $Z_{i,j}^{(r)} = Z_{i,j}^{(r)}(y_{i,1:j}^{(r)}, y_i^{(1:r)})$. In this expression, the mean cluster difference arises from $\beta^{(r)[c_i]}$, $c_i = 1, 2, \cdots, K$.

Let $\Psi_{Y^{(r)}}^{[c_i]} = \{\beta^{(r)[c_i]}, \kappa^{(r)}\}$ be the parameters that define the distribution of $Y_i^{(r)}|y_i^{(1:r)}, b_i^{(r)}, c_i$. Also let $\Psi_{Y^{(r)}}$ be the parameters other than $\pi$ that define the distribution of $Y_i^{(r)}|y_i^{(1:r)}, \tilde{b}_i$. Then $\Psi_{Y^{(r)}} = \{\Psi_{Y^{(r)}}^{[k]}\}_{k=1}^{K} = \{\{\beta^{(r)[k]}\}_{k=1}^{K}, \kappa^{(r)}\}$. We assume that $\Psi_{Y^{(r)}}$ and $\Psi_{Y^{(r')}}$ do not share any parameters (i.e., $\Psi_{Y^{(r)}}$ and $\Psi_{Y^{(r')}}$ are disjoint) for all $r, r'$ ($r < r'$). This assumption greatly simplifies the estimation; see Section 5.4 for details. Let $\Psi = \{\{\Psi_{Y^{(r)}}\}_{r=1}^{R}, \Psi_B, \pi\}$ be the complete set of parameters for the distribution of $\tilde{Y}_i$. Then the joint conditional density of $\tilde{Y}_i$ given $\tilde{b}_i$ and $c_i$ is:

$$f(\tilde{y}_i|\tilde{b}_i, c_i; \Psi) = \prod_{r=1}^{R} f(y_i^{(r)}|y_i^{(1:r)}, \tilde{b}_i, c_i; \Psi_{Y^{(r)}}^{[c_i]}) = \prod_{r=1}^{R}\prod_{j=1}^{n^{(r)}} f_{\mathrm{GLM}}(y_{i,j}^{(r)}|y_{i,1:j}^{(r)}, y_i^{(1:r)}, b_i^{(r)}, c_i; \Psi_{Y^{(r)}}^{[c_i]}).$$

Note that for notational simplicity we parametrized $\tilde{Y}_i|\tilde{b}_i, c_i$ by $\Psi$ although the distribution only depends on $\{\Psi_{Y^{(r)}}^{[c_i]}\}_{r=1}^{R}$.

We allow dependencies among $\tilde{Y}_i$ through $\tilde{B}_i$ by modelling its density $f(\tilde{b}_i; \Psi_B)$, where $\Psi_B$ contains all the parameters that define the distribution of $\tilde{B}_i$. If the RE density for $B_i^{(r)}$ is a normal with 0 mean and unstructured variance, a natural choice of the density of the complete RE $\tilde{B}_i$ may also be the MVN with 0 mean and an unstructured variance matrix. This way, REs corresponding to different outcomes can be correlated.
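In some special cases, where a non-normal RE distribution is assumed for each $B_i^{(r)}$, one might use a copula to model the joint distribution of $\tilde{B}_i$ [29]. Unless otherwise specified, from now on we assume that $f(\tilde{b}_i; \Psi_B)$ is the MVN with mean 0 and an unstructured variance matrix $\Sigma_B$. Therefore, $\Psi_B$ is a set containing the variances and correlations that specify $\Sigma_B$.

The cluster label $C_i$ is a latent variable from a categorical distribution with probabilities $\pi = (\pi_1, \cdots, \pi_K)^T$, where $K$ represents the pre-specified number of clusters. In the problem of identifying absolute responders, one may set $K = 2$ for responder and non-responder clusters, but $K$ may be greater than 2 if one suspects that there may be more than one distinct responder subgroup. We assume that the RE distribution is independent of the cluster label $C_i$. This means that the dependence structure across outcomes within a patient and the dependence among repeated measures within an outcome of a patient do not differ across clusters.

We call $C_i$ and $\tilde{B}_i$ unobserved variables, as we do not observe them in practice, while we call $\{Y_i^{(r)}\}_{r=1}^{R}$ the observed variables. Since $\tilde{B}_i$ is independent of $C_i$, under this model, the joint density of the observed and unobserved variables from the $i$th patient on the log-scale is:

$$\ln f^C(\tilde{y}_i, \tilde{b}_i, c_i; \Psi) = \ln f(\tilde{b}_i; \Psi_B) + \ln \pi_{c_i} + \ln f(\tilde{y}_i|\tilde{b}_i, c_i; \Psi). \tag{5.5}$$

We note that (5.5) can also be written as linear in the unknown cluster labels $I(c_i = k)$:

$$\ln f^C(\tilde{y}_i, \tilde{b}_i, c_i; \Psi) = \ln f(\tilde{b}_i; \Psi_B) + \sum_{k=1}^{K} I(c_i = k)\left[\ln \pi_k + \ln f(\tilde{y}_i|\tilde{b}_i, C_i = k; \Psi)\right]. \tag{5.6}$$

This expression will be helpful in Section 5.4. Then the likelihood contribution from the $i$th patient in our MLOMM is:

$$f(\tilde{y}_i; \Psi) = \sum_{k=1}^{K}\int f^C_{\tilde{Y}_i, \tilde{B}_i, C_i}(\tilde{y}_i, \tilde{b}_i, k; \Psi)\, d\tilde{b}_i. \tag{5.7}$$

5.4 The estimation scheme

Due to the cluster structure and the REs, the log-likelihood corresponding to (5.7) involves multiple integrals and summations within the logarithm; hence its evaluation is computationally intensive. For example, triple integrals are required in our application of Section 6.2 as the dimension of $\tilde{B}_i$ is three. The EM algorithm [13] is an attractive procedure to obtain the MLE when the density of the observed variables is difficult to evaluate but the joint density of both the observed and unobserved variables is easy to evaluate. In the EM algorithm, the likelihood of the observed and unobserved data is called a complete likelihood. We will employ the EM algorithm since our complete log-likelihood as defined in (5.5) is relatively easy to evaluate, as it does not require integrations or summations within a logarithm.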
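The contrast between the complete log-likelihood (5.5) and the observed likelihood (5.7) can be illustrated with a minimal Python sketch, assuming NumPy and a deliberately simple toy model that is not the model fitted in the thesis: a single normal outcome ($R = 1$) with a scalar random intercept, cluster-specific intercepts, and cluster labels coded $0, \cdots, K-1$ rather than $1, \cdots, K$. The function names are hypothetical. The complete log-likelihood is a plain sum of three terms, whereas the observed likelihood requires integrating over the random effect, done here by crude Monte Carlo.

```python
import numpy as np

def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def complete_loglik(y, b, c, beta, sigma, sigma_b, pi):
    """ln f^C(y, b, c) as in (5.5) for one patient of the toy model.

    y: repeated measures; b: random intercept; c: cluster label in {0, ..., K-1}.
    """
    return (log_normal(b, 0.0, sigma_b)                 # ln f(b; Psi_B)
            + np.log(pi[c])                             # ln pi_c
            + log_normal(y, beta[c] + b, sigma).sum())  # ln f(y | b, c)

def observed_lik_mc(y, beta, sigma, sigma_b, pi, n_draws, rng):
    """Crude Monte Carlo version of (5.7): sum over clusters, average over b ~ f(b)."""
    b = rng.normal(0.0, sigma_b, size=n_draws)
    total = 0.0
    for k in range(len(pi)):
        cond = log_normal(y[None, :], beta[k] + b[:, None], sigma).sum(axis=1)
        total += pi[k] * np.exp(cond).mean()
    return total

rng = np.random.default_rng(1)
y = np.array([0.2, -0.1, 0.5])       # three repeated measures (made-up values)
beta = np.array([0.0, 1.0])          # cluster-specific intercepts (made-up values)
pi = np.array([0.4, 0.6])
ll_complete = complete_loglik(y, b=0.1, c=0, beta=beta, sigma=1.0, sigma_b=1.0, pi=pi)
lik_observed = observed_lik_mc(y, beta, 1.0, 1.0, pi, n_draws=100_000, rng=rng)
print(ll_complete, lik_observed)
```

In this toy case the integral in (5.7) is one-dimensional and has a closed form; with several non-normal outcomes and a multivariate RE, as in the MLOMM, no such shortcut is generally available.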
Before discussing the details of our EM algorithm, we introduce a slight simplification of our observed likelihood representation. When the outcomes of interest are all discrete, the observed outcome values could be identical for two subjects. In addition, when the choice of covariates for both fixed effects and REs are the same, it could happen that the evaluated likelihoods of two subjects, $i$ and $i'$, are identical: $f(\tilde{y}_i; \Psi) = f(\tilde{y}_{i'}; \Psi)$. In this case, it is computationally more efficient to only evaluate the likelihood of subjects that have unique likelihood values. Let $\mathcal{U}$ be a specific choice of set containing the indices of subjects with unique likelihood values. Let $u_i$ be the number of patients that have the same likelihood value as patient $i$, so that $\sum_{i\in\mathcal{U}} u_i = N$. Then the log-likelihood can be written as:

$$\sum_{i=1}^{N} \ln f(\tilde{y}_i; \Psi) = \sum_{i\in\mathcal{U}} u_i \ln f(\tilde{y}_i; \Psi). \tag{5.8}$$

Now we introduce our EM algorithm to seek the maximizer of (5.8). Each iteration of the EM algorithm consists of an E-step and an M-step. Our $\{s+1\}$th E-step entails the calculation of:

$$Q(\Psi; \Psi^{\{s\}}) = \sum_{i\in\mathcal{U}} u_i E_{\tilde{B}_i, C_i}\left(\ln f^C(\tilde{y}_i, \tilde{B}_i, C_i; \Psi)\,\middle|\,\tilde{y}_i; \Psi^{\{s\}}\right),$$

where $\Psi^{\{s\}}$ denotes the value of $\Psi$ from the $s$th iteration of the EM algorithm. The following $\{s+1\}$th M-step seeks a value of $\Psi^{\{s+1\}}$ that satisfies:

$$Q(\Psi^{\{s+1\}}; \Psi^{\{s\}}) \ge Q(\Psi; \Psi^{\{s\}}), \tag{5.9}$$

for all values of $\Psi$ in the parameter space.
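It can be shown that (5.9) guarantees:

$$\sum_{i=1}^{N} \ln f(\tilde{y}_i; \Psi^{\{s+1\}}) \geq \sum_{i=1}^{N} \ln f(\tilde{y}_i; \Psi^{\{s\}}). \tag{5.10}$$

Thus each iteration of an EM algorithm yields an estimate of $\Psi$ that is at least as good as the previous one, in the sense that the likelihood never decreases; see [26], for example, for a proof. Furthermore, under regularity conditions [13, 42, 80], the EM algorithm converges to the MLE.

Since the complete log-likelihood (5.6) is linear in $I(C_i = k)$, our $Q(\Psi; \Psi^{\{s\}})$ can be simplified as:

$$\begin{aligned}
Q(\Psi; \Psi^{\{s\}}) &= \sum_{i \in \mathcal{U}} u_i\, E_{\tilde{B}_i}\left(E_{C_i}\left[\ln f^{C}(\tilde{y}_i, \tilde{B}_i, C_i; \Psi) \,\Big|\, \tilde{y}_i, \tilde{B}_i; \Psi^{\{s\}}\right] \,\Big|\, \tilde{y}_i; \Psi^{\{s\}}\right) \\
&= \sum_{i \in \mathcal{U}} u_i\, E_{\tilde{B}_i}\left(\ln f(\tilde{B}_i; \Psi_B) + \sum_{k=1}^{K} \omega_i^{[k]\{s\}}\left[\ln \pi_k + \ln f(\tilde{y}_i | \tilde{B}_i, C_i = k; \Psi)\right] \,\Big|\, \tilde{y}_i; \Psi^{\{s\}}\right),
\end{aligned} \tag{5.11}$$

where:

$$\omega_i^{[k]\{s\}} := \Pr\left(C_i = k \,\Big|\, \tilde{y}_i, \tilde{B}_i; \Psi^{\{s\}}\right) = \frac{\pi_k^{\{s\}} f(\tilde{y}_i | \tilde{B}_i, C_i = k; \Psi^{\{s\}})}{\sum_{l=1}^{K} \pi_l^{\{s\}} f(\tilde{y}_i | \tilde{B}_i, C_i = l; \Psi^{\{s\}})}.$$

In general, the expectation with respect to $\tilde{B}_i | \tilde{y}_i; \Psi^{\{s\}}$ cannot be further simplified. (One exception occurs when all outcomes and REs are modelled as MVN; this special case is discussed in Sections 5.6 and 5.7.1.) Therefore we employ the MCEM algorithm [6] to approximate (5.11). The MCEM algorithm is a modification of the EM algorithm in which the expectation in the E-step is computed numerically through either classical MC or MCMC methods. Let $\hat{\Psi}^{\{s\}}$ be the estimate from the $s$th step of the MCEM algorithm, and $\hat{\Psi}$ be the final parameter estimate from the MCEM algorithm. Let $\tilde{b}_i^{\{s,m\}}$, $m = 1, 2, \cdots, M$, be i.i.d. samples generated from $\tilde{B}_i | \tilde{y}_i; \hat{\Psi}^{\{s\}}$. Then, appealing to the strong law of large numbers, we can approximate $Q(\Psi; \Psi^{\{s\}})$ in (5.11) as:

$$Q_M(\Psi; \hat{\Psi}^{\{s\}}) = \frac{1}{M} \sum_{m=1}^{M} q_m(\Psi; \hat{\Psi}^{\{s\}}), \tag{5.12}$$

where:

$$q_m(\Psi; \hat{\Psi}^{\{s\}}) = \sum_{i \in \mathcal{U}} u_i \left(\ln f_{\tilde{B}}(\tilde{b}_i^{\{s,m\}}; \Psi_B) + \sum_{k=1}^{K} \omega_i^{[k]\{s,m\}}\left[\ln \pi_k + \ln f(\tilde{y}_i | \tilde{B}_i = \tilde{b}_i^{\{s,m\}}, C_i = k; \Psi)\right]\right), \tag{5.13}$$

and $\omega_i^{[k]\{s,m\}} = \Pr(C_i = k | \tilde{y}_i, \tilde{B}_i = \tilde{b}_i^{\{s,m\}}; \hat{\Psi}^{\{s\}})$.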
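We use rejection sampling to generate $\tilde{b}_i^{\{s,m\}}$ as a first choice, but if the acceptance rate of the rejection sampling is prohibitively small, we employ MCMC sampling; see Section 5.4.1 for the details.

In the $\{s+1\}$th iteration of the M-step, we seek $\hat{\Psi}^{\{s+1\}}$ that maximizes the approximated expected complete log-likelihood $Q_M(\Psi; \hat{\Psi}^{\{s\}})$ with respect to $\Psi$. The function $q_m(\Psi; \hat{\Psi}^{\{s\}})$ can be decomposed as:

$$q_m(\Psi; \hat{\Psi}^{\{s\}}) = q_m^{C}(\pi; \hat{\Psi}^{\{s\}}) + q_m^{B}(\Psi_B; \hat{\Psi}^{\{s\}}) + \sum_{r=1}^{R} q_m^{Y^{(r)}}(\Psi_{Y^{(r)}}; \hat{\Psi}^{\{s\}}), \tag{5.14}$$

where:

$$q_m^{C}(\pi; \hat{\Psi}^{\{s\}}) = \sum_{k=1}^{K} \ln \pi_k \sum_{i \in \mathcal{U}} u_i\, \omega_i^{[k]\{s,m\}},$$

$$q_m^{B}(\Psi_B; \hat{\Psi}^{\{s\}}) = \sum_{i \in \mathcal{U}} u_i \ln f_{\tilde{B}_i}(\tilde{b}_i^{\{s,m\}}; \Psi_B), \tag{5.15}$$

$$q_m^{Y^{(r)}}(\Psi_{Y^{(r)}}; \hat{\Psi}^{\{s\}}) = \sum_{i \in \mathcal{U}} u_i \sum_{k=1}^{K} \omega_i^{[k]\{s,m\}} \sum_{j=1}^{n^{(r)}} \ln f^{GLM}\left(y_{i,j}^{(r)} \,\Big|\, y_{i,1:j}^{(r)}, y_i^{(1:r)}, B_i^{(r)} = b_i^{(r)\{s,m\}}, C_i = k; \Psi_{Y^{(r)}}^{[k]}\right). \tag{5.16}$$

Letting $Q_M^{C} = \sum_{m=1}^{M} q_m^{C}/M$, $Q_M^{B} = \sum_{m=1}^{M} q_m^{B}/M$ and $Q_M^{Y^{(r)}} = \sum_{m=1}^{M} q_m^{Y^{(r)}}/M$, the objective function can be decomposed as $Q_M = Q_M^{C} + Q_M^{B} + \sum_{r=1}^{R} Q_M^{Y^{(r)}}$. As $Q_M^{C}$, $Q_M^{B}$ and $Q_M^{Y^{(r)}}$ involve only the parameters $\pi$, $\Psi_B$ and $\Psi_{Y^{(r)}}$, respectively, maximization with respect to these parameters can be done separately.

Maximization with respect to $\pi$ is straightforward, since $Q_M^{C}$ is proportional to the log-likelihood associated with a random sample of size $NM$ $(= \sum_{k=1}^{K} \sum_{i \in \mathcal{U}} u_i \sum_{m=1}^{M} \omega_i^{[k]\{s,m\}})$ from the multinomial distribution with probability parameter $\pi$. Therefore, for $k = 1, 2, \cdots, K$, $\hat{\pi}_k^{\{s+1\}}$ is the average of $\omega_i^{[k]\{s,m\}}$ across subjects and MC samples:

$$\hat{\pi}_k^{\{s+1\}} = \frac{1}{NM} \sum_{i \in \mathcal{U}} u_i \sum_{m=1}^{M} \omega_i^{[k]\{s,m\}}.$$

The maximization with respect to $\Psi_B$ can be easily done for many choices of RE density. In particular, when $\tilde{B}_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_B)$:

$$\hat{\Sigma}_B^{\{s+1\}} = \frac{1}{NM} \sum_{i \in \mathcal{U}} u_i \sum_{m=1}^{M} \tilde{b}_i^{\{s,m\}} \tilde{b}_i^{\{s,m\}T}.$$

The maximization with respect to $\Psi_{Y^{(r)}}$ is also straightforward. Notice that $Q_M^{Y^{(r)}}$ is a sum of $|\mathcal{U}| M K n^{(r)}$ weighted log-likelihoods of a GLM where the mean is modelled with a linear predictor:

$$\eta_{i,j}^{(r)[k]\{s,m\}} = Z_{i,j}^{(r)T} \beta^{(r)[k]} + K_{i,j}^{(r)T} b_i^{(r)\{s,m\}}, \tag{5.17}$$

with the dispersion parameter $\kappa^{(r)}$ and a weight $u_i \omega_i^{[k]\{s,m\}}$ ($i \in \mathcal{U}$, $m = 1, 2, \cdots, M$, $k = 1, 2, \cdots, K$, $j = 1, 2, \cdots, n^{(r)}$).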
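The closed-form updates for $\pi$ and $\Sigma_B$ given above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the thesis's own implementation; the function name and the nested-list layout of `u`, `omega` and `b` are assumptions made for the example.

```python
def update_pi_and_sigma(u, omega, b):
    """Closed-form M-step updates for the mixing proportions and the RE covariance.

    Hypothetical input layout (not from the thesis):
      u[i]           multiplicity u_i of the i-th unique subject
      omega[i][m][k] responsibility of cluster k for subject i and MC draw m
      b[i][m]        sampled random-effect vector for subject i and MC draw m
    """
    n_subj = len(u)
    n_draws = len(omega[0])
    n_clusters = len(omega[0][0])
    dim = len(b[0][0])
    # N * M: total weight, because the responsibilities sum to 1 over k.
    total = sum(u) * n_draws

    pi_new = [
        sum(u[i] * omega[i][m][k] for i in range(n_subj) for m in range(n_draws)) / total
        for k in range(n_clusters)
    ]
    sigma_new = [
        [
            sum(u[i] * b[i][m][p] * b[i][m][q]
                for i in range(n_subj) for m in range(n_draws)) / total
            for q in range(dim)
        ]
        for p in range(dim)
    ]
    return pi_new, sigma_new
```

Because each subject's responsibilities sum to one over the clusters, the updated proportions automatically sum to one, and the covariance update is symmetric by construction.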
Note that the weights do not depend on the index $j$. The MLE of the GLM can be obtained by the usual iterative weighted least squares algorithm [45]. Therefore, we can utilize existing well-developed software, such as the glm function in the R programming language, to search for $\hat{\Psi}_{Y^{(r)}}^{\{s+1\}}$.

In general, when using the glm function to find the maximizer of the sum of weighted log-likelihoods $\sum_i \varpi_i \ln f_i^{GLM}$, where $f_i^{GLM}$ is specified in (5.1), one can specify $\varpi_i$ by utilizing the prior weight $m_i$, which is supplied via the weights argument. Let $m_i$ be the original prior weight and $m_i^{*} = m_i \varpi_i$ be the rescaled prior weight. Then the log-likelihood of the GLM with the rescaled prior weight differs from the weighted log-likelihood with the weight $\varpi_i$ and the original prior weight because, in general, $c(y_i; \kappa/m_i)$ in (5.1) is not a multiple of $m_i$. However, the MLE of the GLM regression coefficient vector, $\beta$, is the same for the log-likelihood with the rescaled prior weight and for the weighted log-likelihood with the weight $\varpi_i$ and the original prior weight. This is because $\beta$ enters the GLM density only through $\theta_i$, which does not appear in $c(y_i; \kappa/m_i)$, and the remaining part of the log-likelihood of the GLM, $m_i[\theta_i y_i - b(\theta_i)]/\kappa$, is a multiple of $m_i$.

Therefore, when $Y_{i,j}^{(r)} | y_{i,1:j}^{(r)}, y_i^{(1:r)}, B_i^{(r)} = b_i^{(r)\{s,m\}}, C_i = k$ is modelled by a GLM with a known dispersion parameter $\kappa^{(r)}$, as for Poisson or binomial models, the M-step can be carried out using the glm function with the weights input set to a vector of length $|\mathcal{U}| M K n^{(r)}$ containing the rescaled prior weights.
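To make the role of the weights concrete, the following sketch considers the simplest possible case: a Poisson log-linear model with an intercept only, where a known term (playing the role of the RE component, which has a fixed coefficient of 1) is added to the linear predictor. In this special case the weighted MLE has a closed form, so no iterative fitting is needed. The function name and the toy inputs are assumptions for illustration; this is plain Python, not a call to R's glm.

```python
import math


def weighted_poisson_intercept(y, offset, w):
    """Weighted MLE of beta in log(mu_i) = beta + offset_i for Poisson counts.

    Setting the weighted score sum_i w_i * (y_i - exp(beta + offset_i)) to zero
    gives exp(beta) = sum_i w_i y_i / sum_i w_i exp(offset_i).
    """
    numerator = sum(wi * yi for wi, yi in zip(w, y))
    denominator = sum(wi * math.exp(oi) for wi, oi in zip(w, offset))
    return math.log(numerator / denominator)
```

Multiplying every weight by the same positive constant leaves the estimate unchanged, which mirrors the remark that only the relative weights matter for the regression coefficients.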
In addition, although the dispersion parameter of the normal linear model is unknown, simply scaling all weights as $K \omega_i^{[k]\{s,m\}} = \omega_i^{[k]\{s,m\}*}$ (so that $\sum_{i,j,k,m} u_i \omega_i^{[k]\{s,m\}*} = |\mathcal{U}| M K n^{(r)}$) allows maximization of the weighted log-likelihood component $Q_M^{Y^{(r)}}$ using the glm function with weights set to $u_i \omega_i^{[k]\{s,m\}*}$ (the original prior weight is 1 for the normal density).

Notice that the RE component $K_{i,j}^{(r)T} b_i^{(r)\{s,m\}}$ in the linear predictor (5.17) has a known coefficient of 1. In the formula input of glm, such terms can be handled with the offset option.

The distributional difference of $Y_{i,j}^{(r)}$ across clusters arises only through the conditional mean structure of $Y_{i,j}^{(r)}$ shown in (5.4). The GLM optimization procedure can readily account for this cluster difference by introducing a factor vector C of length $|\mathcal{U}| M K n^{(r)}$ that indicates the cluster labels. Therefore, when Nd covariates are allowed to have different mean effects across clusters and the remaining Ns covariates have the same mean effects across clusters, the formula input in glm could be defined as:

    formula = Y ~ shared_1 + ... + shared_Ns
                + (diff_1 + ... + diff_Nd):C + offset(KB)

where:

• Y: a vector of length $|\mathcal{U}| M K n^{(r)}$ that stacks the outcome measures, i.e., $Y_{i,j}^{(r)}$, $KM$ times in an appropriate order.

• shared_i (i=1,...,Ns): a vector of length $|\mathcal{U}| M K n^{(r)}$ that stacks the $|\mathcal{U}| n^{(r)}$ covariate vector $KM$ times in an appropriate order. The effects of these covariates on the mean of the outcome Y are the same across clusters.

• diff_i (i=1,...,Nd): a vector of length $|\mathcal{U}| M K n^{(r)}$ that stacks the $|\mathcal{U}| n^{(r)}$ covariate vector $KM$ times in an appropriate order. These covariates are assumed to affect the mean of the outcome Y differently across clusters.

• C: a factor vector of length $|\mathcal{U}| M K n^{(r)}$ that records the value of $C_i$ of (5.16) corresponding to the weight $\omega_i^{[c_i]\{s,m\}}$.

• KB: a vector of length $|\mathcal{U}| M K n^{(r)}$ that stacks the values of $K_{i,j}^{(r)T} b_i^{(r)\{s,m\}}$ $K$ times in an appropriate order.

5.4.1 Generating samples from $\tilde{B}_i | \tilde{y}_i; \hat{\Psi}^{\{s\}}$

The MCEM algorithm takes a numerical approach at the E-step, and approximates the expectation of the complete log-likelihood with respect to $\tilde{B}_i | \tilde{y}_i; \hat{\Psi}^{\{s\}}$ using a random sample. The conditional density:

$$f(\tilde{b}_i | \tilde{y}_i; \hat{\Psi}^{\{s\}}) = \frac{f(\tilde{y}_i, \tilde{b}_i; \hat{\Psi}^{\{s\}})}{f(\tilde{y}_i; \hat{\Psi}^{\{s\}})}, \tag{5.18}$$

where $f(\tilde{y}_i, \tilde{b}_i; \hat{\Psi}^{\{s\}}) = f(\tilde{b}_i; \hat{\Psi}_B^{\{s\}}) f(\tilde{y}_i | \tilde{b}_i; \hat{\Psi}^{\{s\}})$ and

$$f(\tilde{y}_i | \tilde{b}_i; \hat{\Psi}^{\{s\}}) = \sum_{k=1}^{K} \hat{\pi}_k^{\{s\}} f(\tilde{y}_i | \tilde{b}_i, C_i = k; \hat{\Psi}^{\{s\}}), \tag{5.19}$$

is typically a non-standard multivariate density, so direct sampling is difficult. Instead of directly sampling from the target distribution, we consider two sampling schemes: rejection sampling and MCMC sampling. We employ the rejection sampler as our primary sampler, and use MCMC sampling when the acceptance rate of the rejection sampler is prohibitively small.

Rejection Sampling

Rejection sampling [54] utilizes an alternative distribution, i.e., a proposal distribution, which is "close" to the target and for which we already have an efficient algorithm for generating samples. Let $g(\tilde{b}_i)$ be a proposal distribution. Then the rejection sampler repeats the following Steps 1 and 2 until a single sample of $\tilde{b}_i$ is collected.

• Step 1: sample $\tilde{b}_i^{*}$ from the density $g(\tilde{b}_i)$, and sample $u$ from the uniform (0,1) distribution.

• Step 2: if $\ln u < \ln f(\tilde{y}_i, \tilde{b}_i^{*}; \hat{\Psi}^{\{s\}}) - \ln g(\tilde{b}_i^{*}) - \tau_i^{\{s\}}$, where:

$$\tau_i^{\{s\}} = \sup_{\tilde{b}_i}\left[\ln f(\tilde{y}_i, \tilde{b}_i; \hat{\Psi}^{\{s\}}) - \ln g(\tilde{b}_i)\right],$$

then accept $\tilde{b}_i^{*}$; if not, return to Step 1.

A poor choice of $g(\tilde{b}_i)$ results in a small acceptance rate.
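Steps 1 and 2 can be illustrated on a univariate toy problem. The sketch below assumes a single Poisson count with mean $\exp(b)$ and a standard normal random effect $b$, and uses the random-effect density itself as the proposal, so that $\ln f(y, b) - \ln g(b)$ reduces to the Poisson log-likelihood and its supremum is attained at $b = \ln y$. The model, the function names and the closed-form $\tau$ are assumptions chosen for the example; in the thesis $\tau$ is found numerically.

```python
import math
import random


def log_lik(y, b):
    """log Poisson(y | mean = exp(b)); equals ln f(y, b) - ln g(b) when g is the RE density."""
    return y * b - math.exp(b) - math.lgamma(y + 1)


def rejection_sample(y, rng, max_tries=100000):
    """Draw one value of b from f(b | y) for the toy model (requires y > 0)."""
    tau = log_lik(y, math.log(y))  # supremum of log_lik over b, attained at b = ln(y)
    for _ in range(max_tries):
        b_star = rng.gauss(0.0, 1.0)      # Step 1: propose from g
        u = 1.0 - rng.random()            # uniform on (0, 1], so log(u) is finite
        if math.log(u) < log_lik(y, b_star) - tau:  # Step 2: accept or retry
            return b_star
    raise RuntimeError("no proposal accepted; the proposal may be too far from the target")
```

Calling `rejection_sample(3, random.Random(1))` repeatedly yields i.i.d. draws; the further the proposal sits from the target, the more proposals are rejected before one is accepted.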
For the first iteration of the MCEM sampling ($s = 1$), we set the proposal distribution to be the MVN density of the REs based on the initial values of the parameters: $g(\tilde{b}_i) = f(\tilde{b}_i; \Psi^{\{0\}})$. For any later MCEM iteration $s$ ($s > 1$), we utilize the samples from $f(\tilde{b}_i | \tilde{y}_i; \hat{\Psi}^{\{s-1\}})$ obtained in the previous MCEM iteration. Denote their sample mean by $\bar{\mu}_i^{\{s-1\}}$. Then the proposal distribution at the $s$th iteration for the $i$th subject is a MVN with mean $\bar{\mu}_i^{\{s-1\}}$ and variance $\hat{\Sigma}_B^{\{s-1\}}$. Finally, to determine $\tau_i^{\{s\}}$, we employ a numerical procedure; the R optimization subroutine optim is used.

MCMC sampling

When rejection sampling yields a prohibitively small acceptance rate for some subjects, we employ MCMC to generate samples from the target distribution $f(\tilde{b}_i | \tilde{y}_i; \hat{\Psi}^{\{s\}})$ with a multivariate proposal distribution. To decrease autocorrelation across the MCMC samples, we use only every thin-th MCMC sample. Let burn be the number of burn-in samples to discard. Also let $c$ be a scalar constant, to be adjusted to yield a MCMC acceptance rate within an acceptable range during the burn-in period. To generate $\tilde{b}_i$ from the target distribution at the $s$th step of the MCEM iteration, first let $\tilde{b}_i^{c} = 0$ be an initial value, and repeat the following process $M^{*} := \text{burn} + \text{thin} \times M$ times.

• Step 1: sample $\tilde{b}_i^{p}$ from $\mathcal{N}(\tilde{b}_i^{c}, c\hat{\Sigma}_B^{\{s\}})$.

• Step 2: compute the Metropolis-Hastings (MH) rate:

$$\text{MHrate} = \frac{f_{\tilde{Y}_i, \tilde{B}_i}(\tilde{y}_i, \tilde{b}_i^{p}; \hat{\Psi}^{\{s\}})}{f_{\tilde{Y}_i, \tilde{B}_i}(\tilde{y}_i, \tilde{b}_i^{c}; \hat{\Psi}^{\{s\}})}.$$

• Step 3: sample $u$ from Unif(0, 1). If MHrate $> u$, then accept $\tilde{b}_i^{p}$ as a sample and let $\tilde{b}_i^{c} = \tilde{b}_i^{p}$; otherwise retain $\tilde{b}_i^{c}$ as the sample.

Once $M^{*}$ samples are collected, discard the burn-in and thinning samples to obtain the $M$ samples to be utilized.

5.4.2 Choice of the MC sample size M

It is inefficient to start the MCEM algorithm with a large MC sample size ($M$) when $\hat{\Psi}^{\{s\}}$ is far from the true value. Rather, one may increase $M$ as the current approximation $\hat{\Psi}^{\{s\}}$ moves closer to the true value $\Psi$. [46] suggested increasing $M$ linearly with the number of iterations.
[76] suggested monitoring the convergence of the algorithm by plotting $\hat{\Psi}^{\{s\}}$ against the iteration number $s$. [6] made the first serious attempt to automate the MCEM algorithm and suggested selecting $M$ at each iteration based on an approximate conditional $100(1-\alpha)\%$ confidence ellipsoid for $\Psi^{\{s\}}$. If the previous value $\hat{\Psi}^{\{s-1\}}$ lies in that region, then the EM step was swamped by MC error, and $M$ should be increased.

The procedure described in [6] requires an inversion of the estimated covariance matrix of $\hat{\Psi}^{\{s\}}$ at every iteration. However, this matrix could be singular, or nearly singular, especially when the number of parameters is large. Therefore we employ a more recently proposed alternative procedure which requires simpler computations [9]. This procedure also automatically increases the MC sample size and is based on the ascent property of the EM algorithm. To ensure the ascent property (5.10), it is not necessary to satisfy the inequality (5.9); it is sufficient if $Q(\Psi^{\{s+1\}}; \Psi^{\{s\}}) \geq Q(\Psi^{\{s\}}; \Psi^{\{s\}})$ holds. Therefore, [9] suggested that within each MCEM iteration an asymptotic lower bound be calculated for:

$$\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - Q(\hat{\Psi}^{\{s\}}; \hat{\Psi}^{\{s\}}). \tag{5.20}$$

Let $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - Q_M(\hat{\Psi}^{\{s\}}; \hat{\Psi}^{\{s\}})$. When the samples generated at the E-step are i.i.d., they showed that:

$$\sqrt{M}\left(\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) \xrightarrow{d} \mathcal{N}(0, \sigma^2), \tag{5.21}$$

and suggested the use of $L_\alpha(\hat{\sigma}) = \Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - z_\alpha \hat{\sigma}/\sqrt{M}$ as an asymptotic lower bound for (5.20), where $z_\alpha$ is the $(1-\alpha)$th quantile of the standard normal distribution (the estimation of $\sigma$ is discussed in Appendix B.1).

The computation of this asymptotic lower bound is simpler than the procedure of [6] because $\hat{\sigma}$ is a univariate quantity, while the confidence ellipsoid requires the computation of the variance matrix for multiple parameter estimates. If the asymptotic lower bound is positive, there is sufficient evidence to conclude that $\hat{\Psi}^{\{s+1\}}$ increases the likelihood.
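The lower bound $L_\alpha(\hat{\sigma})$ can be sketched as follows. The example assumes i.i.d. MC draws and estimates $\sigma$ by the plain sample standard deviation of the per-draw differences $q_m(\hat{\Psi}^{\{s+1\}}) - q_m(\hat{\Psi}^{\{s\}})$; this simple estimator is an assumption made for illustration, and the estimator actually used is the one described in Appendix B.1. The function name and argument layout are likewise hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ascent_lower_bound(q_new, q_old, alpha=0.1):
    """Asymptotic lower bound for Delta Q from per-draw values of q_m.

    q_new[m] and q_old[m] are q_m evaluated at the candidate and the current
    parameter values, using the same MC draw m.
    """
    diffs = [a - b for a, b in zip(q_new, q_old)]
    n_draws = len(diffs)
    delta_q = mean(diffs)                      # Delta Q_M
    sigma_hat = stdev(diffs)                   # assumed estimator of sigma (i.i.d. draws)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # (1 - alpha)th standard normal quantile
    lower = delta_q - z_alpha * sigma_hat / math.sqrt(n_draws)
    return lower, delta_q, sigma_hat
```

A positive first return value corresponds to the case in which the candidate update is judged to increase the likelihood.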
Thus $\hat{\Psi}^{\{s+1\}}$ is accepted as the $\{s+1\}$th parameter update, and the algorithm moves on to the next iteration. If the asymptotic lower bound is negative, this estimate is rejected, and the M-step is performed with additional samples until the asymptotic lower bound is positive. A geometric rate of increase is employed, with the next sample size taken to be $M \leftarrow M + M/C$ for some $C > 1$. This is achieved by appending $M/C$ additional samples to the current sample. In our data analysis in Sections 5.6, 5.7.1 and Chapter 6 we employ $C = 3$.

In the interest of computational efficiency, a large enough starting sample size should be chosen at each iteration so that the appending process is required infrequently. For the $s$th MCEM iteration, let the starting MC sample size be $M_{start}^{\{s\}}$. To force an increase in the MC sample sizes across MCEM iterations, we take $M_{start}^{\{s+1\}} \geq M_{start}^{\{s\}}$. Using the standard sample size calculation, we set the next sample size $M_{start}^{\{s+1\}}$ to be large enough to detect an increase as small as the previous one with probability $1 - \beta$:

$$M_{start}^{\{s+1\}} = \max\left\{M_{start}^{\{s\}},\ \frac{\hat{\sigma}^2 (z_\alpha + z_\beta)^2}{\Delta Q_M(\hat{\Psi}^{\{s\}}; \hat{\Psi}^{\{s-1\}})^2}\right\}.$$

In Appendix B.1, we review the derivation of the asymptotic distribution for $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ in the context of our MLOMM, and discuss how to compute $\hat{\sigma}$. The next section discusses the stopping rule of our MCEM algorithm that relies on (5.21).

5.4.3 Stopping rule

[9] suggested that the algorithm be terminated when the observed likelihood "stabilizes" and the change in $Q_M$ is too small to be easily detected. More specifically, (5.21) implies that asymptotically we have:

$$\Pr\left(\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) \leq U_\gamma(\sigma)\right) = 1 - \gamma,$$

where:

$$U_\gamma(\sigma) := \Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) + z_\gamma \frac{\sigma}{\sqrt{M}}.$$

Therefore, if the interest lies in stopping the MCEM algorithm when the marginal likelihood stabilizes, then waiting until $U_\gamma(\sigma)$ becomes less than some pre-specified constant is a convenient stopping rule:

$$U_\gamma(\hat{\sigma}) < C_{stop}.$$

We employ the stopping criterion of [9] in Sections 5.6 and 5.7.1, and Chapter 6.

5.4.4 Estimating the variance of $\hat{\Psi}$

[43] proposed that the observed Fisher information may be approximated by the difference of two positive definite matrices: the estimated expected Hessian of the negative of the complete log-likelihood, and the estimated expected outer product of the gradient of the complete log-likelihood (where the expectation is taken with respect to the conditional distribution of the unobserved variables given the observed data). Many MCEM algorithms, including [6], use this result to estimate the observed Fisher information. However, because of sampling error, we have found that this difference may not always be positive definite in practice. We use an alternative procedure of [11] which guarantees that the approximated observed Fisher information is a positive definite matrix.

To implement this approach, we first express the distribution of the RE $\tilde{B}_i$ in terms of the multivariate standard normal. Let $A$ be an upper triangular matrix such that $\Sigma_B = A^T A$. Then $\tilde{B}_i$ can be expressed as $\tilde{B}_i = A^T \tilde{D}_i$, where $\tilde{D}_i \sim \mathcal{N}(0, I)$. Let $\Psi' = \{\{\pi_k\}_{k=1}^{K-1}, \{A_{i,j}\}_{i \leq j}, \{\Psi_{Y^{(r)}}\}_{r=1}^{R}\}$, where $A_{i,j}$ is the $(i,j)$th entry of the matrix $A$. Then the observed information evaluated at $\hat{\Psi}'$ may be written as:

$$\sum_{i \in \mathcal{U}} u_i\, \frac{\partial}{\partial \Psi'} \ln f(\tilde{y}_i; \Psi')\, \frac{\partial}{\partial \Psi'^{T}} \ln f(\tilde{y}_i; \Psi') \Bigg|_{\Psi' = \hat{\Psi}'},$$

where:

$$\frac{\partial}{\partial \Psi'} \ln f(\tilde{y}_i; \Psi') = \frac{\partial}{\partial \Psi'} \ln \int f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i; \Psi') f(\tilde{d}_i)\, d\tilde{d}_i = \frac{\int \left[\frac{\partial}{\partial \Psi'} f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i; \Psi')\right] f(\tilde{d}_i)\, d\tilde{d}_i}{\int f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i; \Psi') f(\tilde{d}_i)\, d\tilde{d}_i}. \tag{5.22}$$

The first equality holds because the change of variable technique yields:

$$f_{\tilde{Y}_i | \tilde{D}_i}(\tilde{y}_i | \tilde{d}_i) = \frac{f_{\tilde{Y}_i, \tilde{D}_i}(\tilde{y}_i, \tilde{d}_i)}{f_{\tilde{D}_i}(\tilde{d}_i)} = \frac{f_{\tilde{Y}_i, \tilde{B}_i}(\tilde{y}_i, A^T \tilde{d}_i)\,|A|}{f_{\tilde{D}_i}(\tilde{d}_i)} = \frac{f_{\tilde{Y}_i | \tilde{B}_i}(\tilde{y}_i | A^T \tilde{d}_i)\, f_{\tilde{B}_i}(A^T \tilde{d}_i)\,|A|}{f_{\tilde{B}_i}(A^T \tilde{d}_i)\,|A|} = f_{\tilde{Y}_i | \tilde{B}_i}(\tilde{y}_i | A^T \tilde{d}_i).$$

The parameters are omitted in the density expression above for the sake of simplicity.
The differentiation and the integration are interchangeable under mild regularity conditions. Therefore, the MC approximation of (5.22) for the $i$th subject is given by:

$$\frac{\sum_{l=1}^{L} \left[\frac{\partial}{\partial \Psi'} f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i^{\langle l \rangle}; \Psi')\right]}{\sum_{l=1}^{L} f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i^{\langle l \rangle}; \Psi')},$$

where $\tilde{d}_i^{\langle l \rangle}$, $l = 1, 2, \cdots, L$, are samples from the multivariate standard normal distribution $f(\tilde{d}_i)$. See Appendix B.2 for the details of the evaluation of $\partial f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i^{\langle l \rangle}; \Psi')/\partial \Psi'$. Finally, the delta method is applied to the estimate of the inverted Fisher information matrix of $\hat{\Psi}'$ to obtain an estimate of the variance matrix of $\hat{\Psi}$.

5.4.5 Initial values of the MCEM algorithm

The EM algorithm only guarantees reaching a local maximum. Therefore, when there are multiple maxima, whether we actually reach the global maximum depends on where we start; if we start at "good" initial values, we will be able to find the global maximum. Good initial values should be close to the parameters corresponding to the global maximum.

To choose the initial values, our algorithm first identifies the MLEs of each single-outcome longitudinal model, assuming there is only a single cluster. Such MLEs can be readily obtained as each of these models is a GLMM. We observed that the EM algorithm performs poorly when the same initial values are assigned to the fixed effect coefficients that are expected to differ across the $K$ clusters. Therefore, a small amount of noise is added to the estimates of the fixed effect coefficients that are expected to differ across clusters to obtain the initial values. The noise is generated from a normal distribution with standard deviation set to the absolute value of the GLMM estimated fixed effect multiplied by a small constant $a$ ($0.001 \leq a \leq 0.1$). We observe that when the initial cluster-specific fixed effects are not ordered as expected between clusters, the algorithm can recover the order if the clusters are well-separated.
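The perturbation of the single-cluster estimates can be sketched as below. The function name and the list-of-lists return layout are assumptions for illustration; only the noise rule (normal noise with standard deviation $a$ times the absolute value of the estimate) comes from the text.

```python
import random


def jitter_initial_values(estimates, K, a, rng):
    """Return K perturbed copies of the single-cluster fixed-effect estimates.

    Each coefficient e receives normal noise with standard deviation |e| * a,
    so the K clusters start from slightly different values.
    """
    return [[e + rng.gauss(0.0, abs(e) * a) for e in estimates] for _ in range(K)]
```

For example, `jitter_initial_values([0.5, -2.0], K=2, a=0.01, rng=random.Random(0))` gives two nearby but distinct starting vectors. Note that a coefficient estimated as exactly zero receives no noise under this rule.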
As this is an ad-hoc approach to select initial values, one should try multiple initial values to ensure that the algorithm converges to the maximum. Finally, the initial values of the RE correlations across outcomes are taken to be [value missing from the source text].

5.4.6 Evaluation of the posterior probabilities

We classify the $i$th subject into the $k$th cluster if $\Pr(C_i = k | \tilde{y}_i) > \Pr(C_i = k' | \tilde{y}_i)$ for all $k' \neq k$. Using the multivariate standard normal expression for the RE distribution as in Section 5.4.4, the posterior probability $P_{i,k}(\Psi') := \Pr(C_i = k | \tilde{y}_i; \Psi')$ can be written as:

$$P_{i,k}(\Psi') = \frac{\int \pi_k f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i, C_i = k; \Psi') f(\tilde{d}_i)\, d\tilde{d}_i}{\sum_{h=1}^{K} \int \pi_h f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i, C_i = h; \Psi') f(\tilde{d}_i)\, d\tilde{d}_i}. \tag{5.23}$$

Direct evaluation of $P_{i,k}(\hat{\Psi}')$ is difficult as it requires multidimensional integrations over the REs. Therefore, we approximate (5.23) via the MC method. That is, define:

$$\tilde{P}_{i,k}(\Psi') := \frac{\sum_{l=1}^{L} \pi_k f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i^{\langle l \rangle}, C_i = k; \Psi')}{\sum_{l=1}^{L} \sum_{h=1}^{K} \pi_h f(\tilde{y}_i | \tilde{B}_i = A^T \tilde{d}_i^{\langle l \rangle}, C_i = h; \Psi')}, \tag{5.24}$$

where $\tilde{d}_i^{\langle l \rangle}$, $l = 1, 2, \cdots, L$, are samples from the multivariate standard normal distribution $f(\tilde{d}_i)$. We evaluate $\tilde{P}_{i,k}(\hat{\Psi}')$ as an estimate of $P_{i,k}(\Psi')$.

We construct the confidence interval (CI) for $P_{i,k}(\Psi')$ by first obtaining its SE on the logit scale using the delta method. As $\text{logit}(\tilde{P}_{i,k}(\hat{\Psi}'))$ is a function of the MLEs, we construct the CI for $\text{logit}(\tilde{P}_{i,k})$ based on the usual normal approximation, and then transform back to the original scale, thus guaranteeing that the CI for $P_{i,k}$ lies within $[0, 1]$. See Appendix B.3 for the details.

5.5 Extending to the negative binomial model

Our MLOMM models the conditional distribution $Y_{i,j}^{(r)} | y_{i,1:j}^{(r)}, y_i^{(1:r)}, b_i^{(r)}, c_i$ with a GLM. In fact, our MLOMM can be readily extended to non-GLMs for this conditional distribution so long as the model can be extended to incorporate REs and it has a well-developed algorithm to find the MLE using a weighted log-likelihood.

The NB distribution has often been used to describe count data where the variance is greater than the mean.
An observation $y_i$ is from a NB distribution with mean $\mu_i$ and variance $\mu_i + \mu_i^2/\zeta$ if it has the density:

$$f(y_i) = \exp\left[y_i \ln\left(\frac{\mu_i}{\mu_i + \zeta}\right) + \zeta \ln\left(\frac{\zeta}{\mu_i + \zeta}\right) + \ln\left(\frac{\Gamma(\zeta + y_i)}{\Gamma(\zeta)\, y_i!}\right)\right].$$

This is the so-called NB2 model [21]. If the dispersion parameter $\zeta$ is known, this distribution belongs to an exponential family. The exponential family parameters defined in (5.1) are $m_i = 1$, $\kappa = 1$, $\theta_i = \ln(\mu_i/(\mu_i + \zeta))$, $b(\theta_i) = -\zeta \ln(1 - e^{\theta_i})$ and $c(y_i; \kappa) = \ln\left[\Gamma(\zeta + y_i)/(\Gamma(\zeta)\, y_i!)\right]$. NB regression models often link the linear predictor to the mean $\mu_i$ using a log-link: $\ln(\mu_i) = \eta_i$. This is not a canonical link. In practice, $\zeta$ is estimated rather than fixed, and the MLE can be obtained via an alternating iteration process [77]: for a given $\zeta$, the NB model is fitted using the regular GLM algorithm; for fixed $\theta_i$'s, the profile likelihood of $\zeta$ is maximized with respect to $\zeta$. These two steps are alternated until both converge. This approach is implemented in the R function glm.nb, for example.

A natural extension of NB regression to model correlated outcomes $y_{i,j}$ is suggested by [5], where, conditionally on the REs, $\eta_{i,j}$ is modelled as a linear function of fixed effects and REs on the log-scale as in (5.2). Therefore, we can further extend this NB mixed effect model to allow a mixture distribution and to allow dependence with other outcomes, following the procedure described in Section 5.3.2.

We can perform the MCEM algorithm as described in Section 5.4 when $Y_{i,j}^{(r)} | y_{i,1:j}^{(r)}, y_i^{(1:r)}, b_i^{(r)}, c_i$ is modelled with the NB regression. Conveniently, glm.nb includes a weights argument that facilitates maximizing the weighted log-likelihood function. Therefore, the M-step to search for the maximizer of the approximated expected complete log-likelihood $Q_M^{Y^{(r)}}$ can be readily done using glm.nb together with the techniques described in Section 5.4.

5.6 A simple illustrative example

This section demonstrates our procedure using simulated data.
Our simulation model considers three outcomes ($R = 3$) and 5 repeated measures of each outcome that are assumed to be taken at the same times ($n^{(r)} = n = 5$ for all $r$). We simulate data from the triple longitudinal outcome mixture model proposed in Section 5.3.2. It assumes that, conditionally on a RE $\tilde{b}_i$, a cluster label $c_i$ and the other outcomes $y_i^{(1:r)}$, the $Y_{i,j}^{(r)}$, $j = 1, 2, \cdots, n^{(r)}$, are independent and normally distributed. We further assume that the conditional expectation of $Y_i^{(r)}$ ($r > 1$) depends linearly on $Y_i^{(1:r)}$. In addition, the means of all $Y_i^{(r)}$, $r = 1, 2, 3$, depend on time, and the cluster differences arise only from the differences in the mean time effect on the outcomes. Two clusters are considered ($K = 2$), which may be interpreted as an absolute responder cluster and an absolute non-responder cluster. Specifically, we assume:

$$\tilde{B}_i = \begin{pmatrix} B_i^{(1)} \\ B_i^{(2)} \\ B_i^{(3)} \end{pmatrix} \sim \mathcal{N}_3(0, \Sigma_B), \qquad \Sigma_B = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 \\ & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 \\ & & \sigma_3^2 \end{pmatrix},$$

$$C_i \sim \text{Categorical}(\pi, 1 - \pi),$$

$$Y_i^{(1)} | \tilde{b}_i, c_i \sim \mathcal{N}_n\left((\beta_0^{(1)} + b_i^{(1)})\mathbf{1} + \beta_1^{(1)[c_i]} t^{(1)},\ \kappa^{(1)} I\right),$$

$$Y_i^{(2)} | y_i^{(1)}, \tilde{b}_i, c_i \sim \mathcal{N}_n\left((\beta_0^{(2)} + b_i^{(2)})\mathbf{1} + \beta_1^{(2)[c_i]} t^{(2)} + \beta_2^{(2)} y_i^{(1)},\ \kappa^{(2)} I\right),$$

$$Y_i^{(3)} | y_i^{(2)}, y_i^{(1)}, \tilde{b}_i, c_i \sim \mathcal{N}_n\left((\beta_0^{(3)} + b_i^{(3)})\mathbf{1} + \beta_1^{(3)[c_i]} t^{(3)} + \beta_2^{(3)} y_i^{(1)} + \beta_3^{(3)} y_i^{(2)},\ \kappa^{(3)} I\right),$$

where $t^{(1)} = t^{(2)} = (0, 1, \cdots, n-1)^T$ and $t^{(3)} = (0, 1, \cdots, 1)^T$. This means that the 1st and the 2nd outcomes have different linear time effects on their means across clusters, and the 3rd outcome has different constant time effects across clusters. Our simulation model consists of 22 parameters, which can be partitioned as: $\Psi_{Y^{(1)}} = \{\beta_0^{(1)}, \beta_1^{(1)[1]}, \beta_1^{(1)[2]}, \kappa^{(1)}\}$, $\Psi_{Y^{(2)}} = \{\beta_0^{(2)}, \beta_1^{(2)[1]}, \beta_1^{(2)[2]}, \beta_2^{(2)}, \kappa^{(2)}\}$, $\Psi_{Y^{(3)}} = \{\beta_0^{(3)}, \beta_1^{(3)[1]}, \beta_1^{(3)[2]}, \beta_2^{(3)}, \beta_3^{(3)}, \kappa^{(3)}\}$, $\Psi_B = \{\sigma_1, \sigma_2, \sigma_3, \rho_{12}, \rho_{13}, \rho_{23}\}$ and $\pi$.

Because all $Y_i^{(r)} | y_i^{(1:r)}, \tilde{b}_i, c_i$, $r = 1, 2, 3$, and $\tilde{B}_i$ are from MVNs, the observed likelihood of $\tilde{y}_i = (y_i^{(1)T}, y_i^{(2)T}, y_i^{(3)T})^T$ is available in closed form as a mixture of MVNs.
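The data-generating process above can be sketched directly. The code below is a plain-Python illustration, not the code used for the thesis; the parameter values are the "True" values listed in Table 5.1, and the dictionary layout and function names are assumptions made for the example. Since $\kappa^{(r)} I$ is a variance matrix, the residual standard deviation is $\sqrt{\kappa^{(r)}}$.

```python
import math
import random

# "True" parameter values as listed in Table 5.1 (dictionary layout is hypothetical).
PARAMS = {
    "beta0": [2.0, 0.0, 3.0],
    "beta1": [[0.6, -0.6], [0.4, -0.4], [1.5, -1.5]],  # beta1[r][k]: outcome r, cluster k
    "kappa": [2.0, 1.0, 1.0],
    "beta2_2": 0.1, "beta2_3": 0.0, "beta3_3": 0.1,
    "sigma": [2.0, 1.0, 1.0],
    "rho": {(0, 1): 0.2, (0, 2): 0.7, (1, 2): 0.2},
    "pi": 0.5,
}


def re_covariance(p):
    """Build the 3 x 3 random-effect covariance matrix Sigma_B."""
    s, rho = p["sigma"], p["rho"]
    return [[s[a] * s[b] * (1.0 if a == b else rho[(min(a, b), max(a, b))])
             for b in range(3)] for a in range(3)]


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - acc) if i == j else (a[i][j] - acc) / low[j][j]
    return low


def simulate_subject(rng, p, low, n=5):
    """Simulate (cluster label, random effects, three outcome vectors) for one subject."""
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    b = [sum(low[r][k] * z[k] for k in range(r + 1)) for r in range(3)]
    k = 0 if rng.random() < p["pi"] else 1          # cluster 1 with probability pi
    t_lin = list(range(n))                          # t(1) = t(2) = (0, 1, ..., n-1)
    t_step = [0] + [1] * (n - 1)                    # t(3) = (0, 1, ..., 1)
    sd = [math.sqrt(v) for v in p["kappa"]]
    y1 = [p["beta0"][0] + b[0] + p["beta1"][0][k] * t_lin[j]
          + rng.gauss(0.0, sd[0]) for j in range(n)]
    y2 = [p["beta0"][1] + b[1] + p["beta1"][1][k] * t_lin[j] + p["beta2_2"] * y1[j]
          + rng.gauss(0.0, sd[1]) for j in range(n)]
    y3 = [p["beta0"][2] + b[2] + p["beta1"][2][k] * t_step[j] + p["beta2_3"] * y1[j]
          + p["beta3_3"] * y2[j] + rng.gauss(0.0, sd[2]) for j in range(n)]
    return k + 1, b, (y1, y2, y3)
```

A dataset of 100 subjects could then be drawn with `low = cholesky(re_covariance(PARAMS))` followed by `[simulate_subject(rng, PARAMS, low) for _ in range(100)]` for some `rng = random.Random(seed)`.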
Therefore, it is possible to find the MLE by directly maximizing the observed log-likelihood. In this simulation study, we obtain both the MLE and the EMMLE (the estimates obtained from the MCEM procedure described in Section 5.4), and compare their values and their SEs. Under this definition, $\tilde{Y}_i$ has the distribution:

$$\sum_{k=1}^{2} \pi_k\, \mathcal{N}_{3n}\left(\mu^{[k]},\ V\right),$$

with mean vector

$$\mu^{[k]} = \begin{pmatrix} \beta_0^{(1)} \mathbf{1}_n + \beta_1^{(1)[k]} t^{(1)} \\ \gamma_0^{(2)} \mathbf{1}_n + \beta_2^{(2)} \beta_1^{(1)[k]} t^{(1)} + \beta_1^{(2)[k]} t^{(2)} \\ (\beta_3^{(3)} \gamma_0^{(2)} + \gamma_0^{(3)}) \mathbf{1}_n + \gamma_2^{(3)} \beta_1^{(1)[k]} t^{(1)} + \beta_3^{(3)} \beta_1^{(2)[k]} t^{(2)} + \beta_1^{(3)[k]} t^{(3)} \end{pmatrix}$$

and symmetric variance matrix (lower triangle shown)

$$V = \begin{pmatrix} \kappa^{(1)} I + \sigma_1^2 J & & \\ \beta_2^{(2)} \kappa^{(1)} I + \gamma_{12} J & \kappa_{22} I + \gamma_{22} J & \\ \gamma_2^{(3)} \kappa^{(1)} I + v^T \Sigma_B (1\ 0\ 0)^T J & \kappa_{23} I + v^T \Sigma_B (\beta_2^{(2)}\ 1\ 0)^T J & \kappa_{33} I + v^T \Sigma_B v\, J \end{pmatrix},$$

where $\pi_1 = \pi$, $\pi_2 = 1 - \pi$, $J = \mathbf{1}_n \mathbf{1}_n^T$ and:

$$\begin{aligned}
\gamma_0^{(2)} &= \beta_0^{(2)} + \beta_2^{(2)} \beta_0^{(1)}, \\
\gamma_0^{(3)} &= \beta_0^{(3)} + \beta_2^{(3)} \beta_0^{(1)}, \\
\gamma_2^{(3)} &= \beta_2^{(3)} + \beta_3^{(3)} \beta_2^{(2)}, \\
\kappa_{22} &= \kappa^{(2)} + \beta_2^{(2)2} \kappa^{(1)}, \\
\kappa_{33} &= \kappa^{(3)} + \beta_3^{(3)2} \kappa^{(2)} + \gamma_2^{(3)2} \kappa^{(1)}, \\
\kappa_{23} &= \beta_2^{(3)} \beta_2^{(2)} \kappa^{(1)} + \beta_3^{(3)} \kappa_{22}, \\
\gamma_{12} &= \beta_2^{(2)} \sigma_1^2 + \rho_{12} \sigma_1 \sigma_2, \\
\gamma_{22} &= \beta_2^{(2)2} \sigma_1^2 + 2 \beta_2^{(2)} \rho_{12} \sigma_1 \sigma_2 + \sigma_2^2, \\
v &= (\gamma_2^{(3)}, \beta_3^{(3)}, 1)^T.
\end{aligned}$$

Since the within-cluster covariance between $Y_{i,j}^{(1)}$ and $Y_{i,j}^{(2)}$ and that between $Y_{i,j}^{(1)}$ and $Y_{i,j'}^{(2)}$ for $j \neq j'$ differ by $\beta_2^{(2)} \kappa^{(1)}$, the sign of $\beta_2^{(2)}$ explains the direction of the additional dependence between the two outcomes measured at the same time $j$. If $\beta_2^{(2)} = \beta_2^{(3)} = \beta_3^{(3)} = 0$, then $Y_i^{(1)}$, $Y_i^{(2)}$ and $Y_i^{(3)}$ are conditionally independent given $c_i$ and $\tilde{b}_i$, and:

$$\text{Var}(Y_i^{(r)} | c_i) = \sigma_r^2 J + \kappa^{(r)} I,$$

$$\text{Cov}(Y_i^{(r')}, Y_i^{(r)} | c_i) = \rho_{r'r} \sigma_{r'} \sigma_r J,$$

where $r = 2, 3$, $r' = 1, 2$ and $r \neq r'$.

We generate a sample of size $N = 100$ under two scenarios of parameter specifications. Section 5.6.1 discusses Scenario 1, where the parameters are specified so that each outcome is by itself effective for identifying clusters. Section 5.6.2 discusses Scenario 2, where only two outcomes are effective and the remaining outcome is ineffective.

In both scenarios, in order to assess the benefit of incorporating multiple outcomes, we fit the triple longitudinal outcome mixture model (hereafter, TRIPLE), the double longitudinal outcome mixture model based on outcomes 1 and 2 (DOUBLE-12), and the single longitudinal outcome mixture models based on each outcome separately (SINGLE-r for r = 1, 2, 3). Note that SINGLE-2 (SINGLE-3) treats $f(y_i^{(2)} | y_i^{(1)})$ ($f(y_i^{(3)} | y_i^{(1)}, y_i^{(2)})$) as an observed likelihood, and the RE $B_i^{(2)}$ ($B_i^{(3)}$) is assumed to be independent of $Y_i^{(1)}$ ($Y_i^{(1)}$ and $Y_i^{(2)}$). The parameters of our ascent-based procedure are set to $\alpha = 0.1$, $\beta = 0.3$, $\gamma = 0.1$ and $C_{stop} = 0.005$. The initial MC sample size is set to $M = 200$. To assess the performance of the MCEM algorithm when the initial values are far from the true values, we do not employ the choice of initial values discussed in Section 5.4.5. See Figures 5.1 - 5.4 for the initial values.

5.6.1 Scenario 1: All outcomes are effective

Table 5.1 shows the parameter specifications for this scenario, where all outcomes have different time effects across clusters. Table 5.1 also shows the MLEs and EMMLEs from TRIPLE, DOUBLE-12, and SINGLE-1, 2 and 3. The asymptotic SEs of the MLEs of the parameters that have constrained parameter spaces, that is $\kappa^{(r)}$, $\Sigma_B$ and $\pi$, are obtained in three steps. First, the Fisher information matrix is estimated at the MLE on the unconstrained scale: $\kappa^{(r)}$ is transformed into the log-scale, $\pi$ is transformed via the Fisher transformation [17], and $\Sigma_B$ is transformed via the Cholesky decomposition. Second, the estimated Fisher information matrix is inverted.
Third, the inverse of the estimated Fisher information matrix is transformed via the delta method to estimate the variance matrix of the MLE on the original (constrained) scale.

| Parameter | True | TRIPLE MLE | TRIPLE EMMLE | DOUBLE-12 MLE | DOUBLE-12 EMMLE | SINGLE-1 MLE | SINGLE-1 EMMLE | SINGLE-2 MLE | SINGLE-2 EMMLE | SINGLE-3 MLE | SINGLE-3 EMMLE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\beta_0^{(1)}$ | 2.0 | 1.92 (0.26/0.28) | 1.91 (0.27) | 1.92 (0.25/0.26) | 1.89 (0.26) | 1.92 (0.25/0.26) | 1.93 (0.26) | | | | |
| $\beta_1^{(1)[1]}$ | 0.6 | 0.59 (0.06/0.07) | 0.58 (0.07) | 0.59 (0.06/0.07) | 0.59 (0.07) | 0.53 (0.07/0.08) | 0.53 (0.08) | | | | |
| $\beta_1^{(1)[2]}$ | -0.6 | -0.59 (0.06/0.06) | -0.59 (0.06) | -0.58 (0.07/0.07) | -0.58 (0.07) | -0.67 (0.09/0.09) | -0.66 (0.09) | | | | |
| $\kappa^{(1)}$ | 2.0 | 2.03 (0.14/0.16) | 2.03 (0.16) | 2.03 (0.15/0.16) | 2.03 (0.16) | 2.02 (0.15/0.15) | 2.02 (0.16) | | | | |
| $\beta_0^{(2)}$ | 0.0 | -0.02 (0.14/0.16) | -0.02 (0.16) | -0.04 (0.14/0.14) | -0.03 (0.14) | | | -0.12 (0.14/0.14) | -0.10 (0.14) | | |
| $\beta_1^{(2)[1]}$ | 0.4 | 0.44 (0.03/0.04) | 0.44 (0.04) | 0.44 (0.03/0.03) | 0.44 (0.03) | | | 0.42 (0.03/0.03) | 0.42 (0.03) | | |
| $\beta_1^{(2)[2]}$ | -0.4 | -0.38 (0.04/0.05) | -0.38 (0.05) | -0.36 (0.05/0.04) | -0.36 (0.04) | | | -0.32 (0.05/0.05) | -0.32 (0.05) | | |
| $\beta_2^{(2)}$ | 0.1 | 0.14 (0.04/0.05) | 0.14 (0.05) | 0.15 (0.05/0.05) | 0.15 (0.05) | | | 0.19 (0.05/0.05) | 0.19 (0.05) | | |
| $\kappa^{(2)}$ | 1.0 | 0.84 (0.06/0.09) | 0.84 (0.09) | 0.86 (0.06/0.08) | 0.86 (0.08) | | | 0.86 (0.07/0.08) | 0.86 (0.08) | | |
| $\beta_0^{(3)}$ | 3.0 | 3.08 (0.16/0.17) | 3.07 (0.17) | | | | | | | 2.78 (0.14/0.14) | 2.78 (0.15) |
| $\beta_1^{(3)[1]}$ | 1.5 | 1.49 (0.04/0.04) | 1.50 (0.04) | | | | | | | 0.99 (0.03/0.03) | 0.99 (0.03) |
| $\beta_1^{(3)[2]}$ | -1.5 | -1.49 (0.05/0.05) | -1.49 (0.05) | | | | | | | -1.22 (0.05/0.05) | -1.22 (0.05) |
| $\beta_2^{(3)}$ | 0.0 | 0.05 (0.15/0.21) | 0.05 (0.21) | | | | | | | 0.21 (0.20/0.21) | 0.21 (0.21) |
| $\beta_3^{(3)}$ | 0.1 | 0.06 (0.15/0.17) | 0.06 (0.17) | | | | | | | 0.17 (0.19/0.21) | 0.17 (0.21) |
| $\kappa^{(3)}$ | 1.0 | 1.04 (0.07/0.09) | 1.04 (0.09) | | | | | | | 1.16 (0.09/0.09) | 1.16 (0.09) |
| $\sigma_1$ | 2.0 | 2.31 (0.18/0.23) | 2.31 (0.23) | 2.28 (0.18/0.24) | 2.28 (0.23) | 2.29 (0.18/0.24) | 2.29 (0.23) | | | | |
| $\sigma_2$ | 1.0 | 1.03 (0.09/0.10) | 1.03 (0.10) | 1.00 (0.09/0.10) | 0.99 (0.10) | | | 1.03 (0.09/0.11) | 1.03 (0.11) | | |
| $\sigma_3$ | 1.0 | 1.01 (0.11/0.13) | 1.01 (0.13) | | | | | | | 0.70 (0.09/0.13) | 0.70 (0.13) |
| $\rho_{12}$ | 0.2 | 0.06 (0.14/0.16) | 0.06 (0.16) | 0.00 (0.15/0.16) | 0.00 (0.16) | | | | | | |
| $\rho_{13}$ | 0.7 | 0.79 (0.07/0.08) | 0.79 (0.08) | | | | | | | | |
| $\rho_{23}$ | 0.2 | 0.30 (0.14/0.16) | 0.29 (0.16) | | | | | | | | |
| $\pi$ | 0.5 | 0.55 (0.05/0.05) | 0.55 (0.05) | 0.54 (0.05/0.05) | 0.54 (0.05) | 0.60 (0.06/0.06) | 0.60 (0.06) | 0.53 (0.06/0.06) | 0.53 (0.06) | 0.58 (0.07/0.07) | 0.58 (0.07) |
| llk | -2639.29 | -2623.67 | -2623.67 | -1848.56 | -1848.57 | -1066.26 | -1066.27 | -821.41 | -821.43 | -850.99 | -851.00 |
| M | | | 1329 | | 1673 | | 7775 | | 1776 | | 1723 |
| Itr | | | 119 | | 61 | | 32 | | 64 | | 66 |

Table 5.1: Simulation results of the illustrative example. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and those based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

The Fisher information matrix at the MLE on the unconstrained scale is estimated in two ways. The first estimate is the Hessian matrix of the negative log-likelihood, numerically evaluated using finite differences at the MLE on the unconstrained scales. The second estimate is based on the gradient vector of the log-likelihood evaluated for each patient at the MLE on the unconstrained scales; the outer products of the gradient vectors are then summed across patients to obtain the estimate. Both methods yield unbiased estimators of the Fisher information matrix. The first method is commonly used as the Hessian is a by-product of many optimization algorithms. The second method is more compatible with our procedure to estimate the variance in the MCEM algorithm in Section 5.4.4. Therefore, it is interesting to compare how the variance estimates for the MLE differ between the two procedures, as well as how they compare with the variance estimates for the EMMLE.
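The three-step SE calculation and the two information estimates can be illustrated on a deliberately simple one-parameter model that is not part of the thesis: i.i.d. Poisson counts with mean $\lambda$, parametrized on the unconstrained scale $\theta = \ln \lambda$. Here the Hessian-based information is $n\hat{\lambda}$, the gradient-based information is the sum of squared per-subject scores, and the delta method maps an SE for $\hat{\theta}$ to an SE for $\hat{\lambda}$ through the factor $\hat{\lambda}$. The function name is hypothetical.

```python
import math


def poisson_se_two_ways(y):
    """SE of the Poisson mean from Hessian-based and gradient-based information.

    Toy model (an assumption for illustration): y_i ~ Poisson(lambda), theta = ln(lambda).
    """
    n = len(y)
    lam_hat = sum(y) / n                                  # MLE of lambda
    info_hessian = n * lam_hat                            # minus second derivative at theta_hat
    info_gradient = sum((yi - lam_hat) ** 2 for yi in y)  # sum of squared scores
    se_theta_hessian = 1.0 / math.sqrt(info_hessian)      # invert on the log scale
    se_theta_gradient = 1.0 / math.sqrt(info_gradient)
    # Delta method: d lambda / d theta = lambda.
    return lam_hat, lam_hat * se_theta_hessian, lam_hat * se_theta_gradient
```

The two SEs agree only approximately in finite samples, which is the same kind of discrepancy reported as (a/b) in Table 5.1.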
The SEs of the EMMLEs are calculated by the procedures discussed in Section 5.4.4.

Table 5.1 shows that both the MLE and EMMLE differ somewhat from the true parameter values due to sampling variation of the generated data. However, the MLE and EMMLE are very similar for all the parameters. In all models, the observed likelihood values are essentially identical for the EMMLE and MLE procedures. The SEs of the MLE based on the Hessian representation of the Fisher information matrix are somewhat different from the SEs of the MLE based on the gradient representation of the Fisher information matrix, mainly for the variances of the REs. However, the SEs of the MLE based on the gradient representation of the Fisher information matrix and the SEs of the EMMLE are very similar for all the parameters. This is reasonable as the SEs of the EMMLE are also based on the gradient representation of the Fisher information matrix.

Figure 5.1: Trace plots of the parameters $\Psi_{Y^{(1)}}^{\{s\}}$ over the MCEM iterations. Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1. The dotted lines indicate the MLEs.

Figures 5.1-5.4 show the trace plots for $\Psi^{\{s\}}$ based on all the procedures. For all the parameters of all the models, the estimates move rapidly for about the first 5 iterations. However, the trace plots of some parameters, such as $\beta_2^{(2)}$, $\beta_3^{(3)}$, $\rho_{12}$ and $\rho_{23}$, change direction in the next few iterations. Nevertheless, as seen in Figure 5.5, the trace plots of the observed log-likelihoods of all the procedures show overall increasing trends. (However, 6% of the MCEM iterations yield a decrease in the observed log-likelihood for TRIPLE; these decreases occur only after the 80th iteration, when the observed log-likelihood has essentially already converged.
The observed likelihood may decrease in the MCEM algorithm due to the MC error that arises in the M-step of the MCEM algorithm, although the EM algorithm guarantees that the observed likelihood increases at every iteration.)

The top panel of Figure 5.6 shows the point estimates (solid line) and the asymptotic lower $L_\alpha(\hat\sigma)$ and upper $U_\gamma(\hat\sigma)$ bounds for $\Delta Q(\hat\Psi^{\{s+1\}}; \hat\Psi^{\{s\}})$ for each MCEM iteration for all the procedures. The panels show that $\Delta Q(\hat\Psi^{\{s+1\}}; \hat\Psi^{\{s\}})$ drops rapidly in the first few iterations in all the procedures. (Note that the y-axis in the top panel of Figure 5.6 is on the log-scale.) The algorithm had to append additional samples and repeat the M-step only a few times (9, 6, 6, 5 and 4 times in TRIPLE, DOUBLE-12, SINGLE-1, 2 and 3, respectively). The bottom panels of Figure 5.6 show that the MC sample size remains at 200 for a large portion of the iterations and then rapidly increases in the last few iterations: TRIPLE, DOUBLE-12, SINGLE-1, 2 and 3 stay at an MC sample size of 200 for the first 79, 43, 24, 57 and 58 iterations, respectively.

Figure 5.2: Trace plots of the parameters $\Psi^{\{s\}}_{Y^{(2)}}$ over the MCEM iterations. Black: TRIPLE; Magenta: DOUBLE-12; Green: SINGLE-2. The dotted lines indicate the MLEs.

Figure 5.3: Trace plots of the parameters $\Psi^{\{s\}}_{Y^{(3)}}$ over the MCEM iterations. Black: TRIPLE; Red: SINGLE-3. The dotted lines indicate the MLEs.

Figure 5.4: Trace plots of the parameters $\Sigma^{\{s\}}_B$ and $\pi^{\{s\}}$ over the MCEM iterations. Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1; Green: SINGLE-2; Blue: SINGLE-3. The dotted lines indicate the MLEs.

Figure 5.5: The observed log-likelihood values at each iteration of the MCEM algorithm. Black: TRIPLE; Magenta: DOUBLE-12; Red: SINGLE-1; Green: SINGLE-2; Blue: SINGLE-3.

Figure 5.6: (a) $\Delta Q(\hat\Psi^{\{s+1\}}; \hat\Psi^{\{s\}})$; (b) $M$. The top panel shows the asymptotic lower $L_\alpha(\hat\sigma)$ and upper $U_\gamma(\hat\sigma)$ bounds for $\Delta Q(\hat\Psi^{\{s+1\}}; \hat\Psi^{\{s\}})$ (dashed curves) and the point estimates (solid curve) for each procedure. The y-axis is on the log-scale. The bottom panel shows the MC sample size at each iteration for each procedure.

We also assess the classification performance of these procedures. In the simulated dataset, 55 subjects are from $C_i = 1$ and the other 45 subjects are from $C_i = 2$. The top panel of Figure 5.7 shows the trajectories of the simulated subjects over the 5 time points coloured by the assigned clusters ($C_i = 1$: black, $C_i = 2$: red) based on the results from TRIPLE. TRIPLE classifies all 100 subjects into their correct cluster. DOUBLE-12 performs second best: it classifies 97 subjects correctly. As expected, each of SINGLE-1, 2 and 3 performs less well: only 89, 92 and 94 subjects are correctly classified, respectively. The classification performance of SINGLE-3 is better than that of SINGLE-1 or 2 although the estimated parameters $\beta^{(3)[1]}_1$ and $\beta^{(3)[2]}_1$ are closer than $\beta^{(r)[1]}_1$ and $\beta^{(r)[2]}_1$ for r = 1, 2. This may be because the time-dependent covariate for the third outcome $t^{(3)}$ is a step function while the time-dependent covariate for the first and the second outcomes $t^{(r)}$ (r = 1, 2) is linear for j = 2, 3, ..., 5.

Figure 5.7: The trajectories of the simulated subjects over the 5 time points coloured by the clusters assigned by TRIPLE (top panels) and trajectory of subject 76 misclassified by SINGLE-2 and 3 (bottom panels).

Table 5.2 shows the estimated posterior probabilities (and 95% asymptotic CIs) of belonging to cluster 2 for all the subjects misclassified by at least one of the procedures. The bottom panel of Figure 5.7 shows the trajectories of a misclassified subject. Subject 76 is misclassified into the increasing trend cluster by SINGLE-2 and 3.
While the 2nd and the 3rd outcomes of subject 76 exhibit mild increasing trends, outcome 1 shows a clear decreasing trend. As a result, subject 76 is correctly classified into the decreasing trend cluster by DOUBLE-12, SINGLE-1 and TRIPLE.

Table 5.2 shows that estimated posterior probabilities from TRIPLE are more often nearly 1 or nearly 0 than for any SINGLE or for DOUBLE-12. We compare the distributions of posterior probabilities based on TRIPLE and the other procedures (i.e., SINGLE-1, 2, 3 or DOUBLE-12) when $\tilde{Y}_i$ is generated from the decreasing cluster ($C_i = 2$). That is, we compute the posterior probability that a patient is in the decreasing cluster based on 10,000 MC samples generated from $\tilde{Y}_i \mid C_i = 2$ using the estimated parameters of TRIPLE (see Table 5.1). The posterior probabilities are computed based on TRIPLE, DOUBLE-12 or SINGLE-1, 2, 3.

Figure 5.8 shows the joint distribution of these posterior probabilities as well as their marginal distributions. Clearly, the posterior probabilities based on TRIPLE are more skewed to the left with a large mass around 1 than the posterior probabilities based on any of SINGLE-1, 2 or 3. In fact, Figure 5.8 shows that the pairs of posterior probabilities based on TRIPLE and any of SINGLE-1, 2, 3 or DOUBLE-12 tend to lie above the 45 degree line. We found that the posterior probability based on TRIPLE is higher than that based on any of SINGLE-1, 2 or 3 about 98% of the time, and the posterior probability based on TRIPLE is higher than that based on DOUBLE-12 about 94% of the time. This indicates that when the MLOMM incorporates more effective outcomes, the procedure can more confidently identify clusters.

ID | TRIPLE | DOUBLE-12 | SINGLE-1 | SINGLE-2 | SINGLE-3
9 | 0.00 (0.00, 0.00) | x 0.56 (0.50, 0.61) | 0.27 (0.25, 0.30) | x 0.63 (0.58, 0.67) | 0.01 (0.00, 0.01)
36 | 0.96 (0.95, 0.97) | x 0.44 (0.38, 0.50) | x 0.03 (0.03, 0.03) | 0.93 (0.93, 0.94) | 0.64 (0.60, 0.68)
100 | 0.03 (0.02, 0.04) | x 0.53 (0.48, 0.58) | x 0.73 (0.71, 0.76) | 0.15 (0.13, 0.17) | 0.18 (0.16, 0.21)
5 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | x 0.21 (0.20, 0.23) | 1.00 (1.00, 1.00) | 0.92 (0.90, 0.94)
15 | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | x 0.71 (0.69, 0.73) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00)
17 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | x 0.36 (0.34, 0.39) | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00)
33 | 1.00 (1.00, 1.00) | 0.58 (0.54, 0.62) | x 0.04 (0.03, 0.05) | 0.94 (0.93, 0.95) | 1.00 (1.00, 1.00)
39 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | x 0.34 (0.32, 0.37) | 1.00 (1.00, 1.00) | 0.67 (0.63, 0.70)
45 | 1.00 (1.00, 1.00) | 0.96 (0.95, 0.97) | x 0.14 (0.13, 0.15) | 0.98 (0.98, 0.99) | 0.96 (0.95, 0.97)
62 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | x 0.19 (0.18, 0.21) | 1.00 (1.00, 1.00) | 0.93 (0.91, 0.94)
75 | 0.00 (0.00, 0.00) | 0.10 (0.08, 0.12) | x 0.68 (0.65, 0.70) | 0.04 (0.03, 0.05) | 0.01 (0.01, 0.01)
82 | 1.00 (1.00, 1.00) | 0.67 (0.65, 0.69) | x 0.49 (0.47, 0.51) | 0.51 (0.49, 0.53) | 1.00 (0.99, 1.00)
4 | 0.00 (0.00, 0.00) | 0.07 (0.06, 0.08) | 0.03 (0.03, 0.04) | x 0.58 (0.56, 0.61) | 0.00 (0.00, 0.01)
10 | 1.00 (1.00, 1.00) | 1.00 (0.99, 1.00) | 0.99 (0.99, 0.99) | x 0.44 (0.42, 0.47) | 1.00 (1.00, 1.00)
18 | 0.00 (0.00, 0.00) | 0.02 (0.02, 0.03) | 0.01 (0.00, 0.01) | x 0.77 (0.75, 0.79) | 0.00 (0.00, 0.00)
23 | 0.94 (0.92, 0.96) | 0.92 (0.90, 0.93) | 0.83 (0.81, 0.85) | x 0.43 (0.40, 0.46) | 0.84 (0.81, 0.87)
55 | 0.01 (0.01, 0.01) | 0.35 (0.32, 0.38) | 0.06 (0.05, 0.06) | x 0.83 (0.81, 0.84) | 0.03 (0.03, 0.04)
64 | 0.01 (0.01, 0.01) | 0.36 (0.33, 0.39) | 0.10 (0.09, 0.11) | x 0.72 (0.70, 0.74) | 0.01 (0.01, 0.01)
76 | 0.98 (0.97, 0.98) | 0.99 (0.99, 0.99) | 0.99 (0.99, 0.99) | x 0.25 (0.23, 0.28) | x 0.20 (0.18, 0.23)
13 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.98 (0.97, 0.98) | x 0.21 (0.17, 0.26)
40 | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.06 (0.05, 0.07) | x 0.62 (0.58, 0.66)
41 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.95 (0.94, 0.96) | 1.00 (1.00, 1.00) | x 0.02 (0.02, 0.03)
63 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.99 (0.99, 0.99) | 0.99 (0.98, 0.99) | x 0.39 (0.35, 0.43)
87 | 1.00 (1.00, 1.00) | 1.00 (1.00, 1.00) | 0.67 (0.64, 0.69) | 0.98 (0.98, 0.98) | x 0.39 (0.35, 0.43)
N miss | 0 | 3 | 11 | 8 | 6

Table 5.2: The estimated posterior probability (and 95% CI) of being in cluster 2 (decreasing trend) for patients misclassified by at least one of TRIPLE, DOUBLE-12, SINGLE-1, 2 or 3. "x" indicates a patient misclassified by the corresponding procedure. N miss indicates the number of misclassified patients.

Figure 5.8: The joint distribution of the posterior probabilities based on SINGLE-1, 2, 3 or DOUBLE, and TRIPLE. The percentage indicates the proportion of the TRIPLE posterior probabilities that are larger than SINGLE-1, 2, 3 or DOUBLE.

5.6.2 Scenario 2: One outcome is ineffective

In Scenario 1, where all outcomes are effective, the classification performance of TRIPLE was best. This section assesses the classification performance of TRIPLE, DOUBLE-12 and SINGLE-1, 2, 3 when one of three outcomes is ineffective. The simulation model of the 3rd outcome in Section 5.6.1 is modified to represent an ineffective outcome by setting $\beta^{(3)[k]}_1 = 0$ for both k = 1, 2.

Table 5.3 shows the estimated parameter values of TRIPLE and SINGLE-3 from both the MLE and EMMLE procedures.
The estimates from DOUBLE-12, SINGLE-1 and 2 are not shown as they are identical to those in Table 5.1 (as the same data were used for the 1st and the 2nd outcomes). The SEs of the EMMLE and MLE of SINGLE-3, based on either the gradient vector representation or the Hessian representation of the Fisher information matrix, cannot be computed due to the singularity of the estimated Fisher information matrix. This is not surprising as $\pi$ is not identifiable in this simulation setting.

SINGLE-3 has no classification power as $\hat\beta^{(3)[k]}_1$ for k = 1, 2 are almost identical. Notice that while DOUBLE-12 classifies 97 out of 100 patients correctly (shown in Table 5.2), TRIPLE identifies 99 out of 100 subjects correctly. The misclassified patient, ID 33, has an estimated posterior probability of being in cluster 2 (decreasing trend) of 0.20 (95% CI: 0.17-0.23) by TRIPLE. In this dataset, the inclusion of the ineffective outcome helps identification of responders. This could be because of the correlation between the ineffective outcome and the 1st outcome, which is effective: $\mathrm{Cor}(Y^{(1)}_{i,j}, Y^{(3)}_{i,j} \mid c_i) = 0.424$ and $\mathrm{Cor}(Y^{(1)}_{i,j}, Y^{(3)}_{i,j'} \mid c_i) = 0.418$, $j \neq j'$.

5.7 Application to identify relative responders

The model introduced in Section 5.3 can be used to identify K clusters created by multiple longitudinal outcomes. One application of this model may be to identify absolute responders. In practice, one may want to identify as responders the treated patients that have changes in their disease conditions that would not have been observed in the absence of treatment. Section 4.1 referred to such treated patients as relative responders. This section demonstrates how we can apply our MLOMM to identify relative responders.
Parameter | True | TRIPLE MLE | TRIPLE EMMLE | SINGLE-3 MLE | SINGLE-3 EMMLE
$\beta^{(1)}_0$ | 2.0 | 1.92 (0.25/0.27) | 1.90 (0.27) | |
$\beta^{(1)[1]}_1$ | 0.6 | 0.59 (0.06/0.07) | 0.58 (0.07) | |
$\beta^{(1)[2]}_1$ | -0.6 | -0.60 (0.07/0.07) | -0.60 (0.07) | |
$\kappa^{(1)}$ | 2.0 | 2.03 (0.14/0.16) | 2.03 (0.16) | |
$\beta^{(2)}_0$ | 0.0 | -0.03 (0.14/0.15) | -0.03 (0.15) | |
$\beta^{(2)[1]}_1$ | 0.4 | 0.43 (0.03/0.04) | 0.43 (0.04) | |
$\beta^{(2)[2]}_1$ | -0.4 | -0.37 (0.04/0.04) | -0.37 (0.04) | |
$\beta^{(2)}_2$ | 0.1 | 0.15 (0.05/0.05) | 0.15 (0.05) | |
$\kappa^{(2)}$ | 1.0 | 0.86 (0.06/0.09) | 0.86 (0.09) | |
$\beta^{(3)}_0$ | 3.0 | 3.08 (0.16/0.17) | 3.07 (0.17) | 2.89 (NA/NA) | 2.87 (NA)
$\beta^{(3)[1]}_1$ | 0.0 | -0.00 (0.04/0.04) | -0.00 (0.04) | -0.03 (NA/NA) | -0.03 (NA)
$\beta^{(3)[2]}_1$ | -0.0 | 0.01 (0.05/0.05) | 0.01 (0.05) | -0.03 (NA/NA) | -0.03 (NA)
$\beta^{(3)}_2$ | 0.0 | 0.05 (0.15/0.21) | 0.05 (0.20) | 0.16 (NA/NA) | 0.16 (NA)
$\beta^{(3)}_3$ | 0.1 | 0.06 (0.15/0.17) | 0.06 (0.17) | 0.05 (NA/NA) | 0.05 (NA)
$\kappa^{(3)}$ | 1.0 | 1.04 (0.07/0.09) | 1.04 (0.09) | 1.07 (NA/NA) | 1.07 (NA)
$\sigma_1$ | 2.0 | 2.28 (0.18/0.24) | 2.28 (0.24) | |
$\sigma_2$ | 1.0 | 1.02 (0.09/0.10) | 1.02 (0.10) | |
$\sigma_3$ | 1.0 | 1.01 (0.10/0.13) | 1.01 (0.13) | 0.85 (NA/NA) | 0.85 (NA)
$\rho_{12}$ | 0.2 | 0.03 (0.14/0.16) | 0.03 (0.16) | |
$\rho_{13}$ | 0.7 | 0.79 (0.07/0.08) | 0.79 (0.08) | |
$\rho_{23}$ | 0.2 | 0.28 (0.14/0.17) | 0.28 (0.16) | |
$\pi$ | 0.5 | 0.55 (0.05/0.06) | 0.55 (0.06) | 0.50 (NA/NA) | 0.21 (NA)
llk | -2635.72 | -2620.54 | -2620.55 | -799.86 | -799.86
M | | | 1407 | | 2320
Itr | | | 113 | | 40

Table 5.3: Simulation results of the illustrative example when outcomes 1, 2 are effective and outcome 3 is ineffective. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and that based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

In this cluster structure, relative responders exist only among treated patients, and non-responders among treated patients belong to the same cluster as the control patients. One may consider K = 2 simply for relative responder and non-responder clusters, but K may be greater than 2 as the clusters may have distinct subgroups. This section considers the scenario when a single non-responder cluster ($C_i = 1$) and multiple relative responder clusters ($C_i = 2, 3, \ldots, K$) are of interest. We can easily incorporate this cluster structure into our mixture model by introducing a new variable $A_i$ that equals 0 if the ith patient is in the control arm and 1 if not. Our model now assumes that control patients always belong to the non-responder cluster: $\Pr(C_i = 1 \mid A_i = 0) = 1$. Let $\delta_k := \Pr(C_i = k \mid A_i = 1)$. Then it follows that for k > 1:

$$\delta_k = \frac{\Pr(A_i = 1 \mid C_i = k)\,\Pr(C_i = k)}{\Pr(A_i = 1)} = \frac{\pi_k}{\Pr(A_i = 1)},$$

because all the patients in the responder clusters are from the treatment arm, i.e., $\Pr(A_i = 1 \mid C_i = k) = 1$ for k > 1, and:

$$\delta_1 = 1 - \sum_{k=2}^{K} \frac{\pi_k}{\Pr(A_i = 1)}.$$

Furthermore, we assume that $\tilde{B}_i$ and $\tilde{Y}_i$ are conditionally independent of $A_i$ given $c_i$; that is, knowledge of treatment assignment does not contain extra information about the REs or the outcome values once the cluster label is given.

We assume that the treatment assignments $A_i$ are known. Then the complete log-likelihood can be written as:

$$\begin{aligned}
\ln f_C(\tilde{y}_i, \tilde{b}_i, c_i \mid a_i; \Psi)
&= \ln f(\tilde{b}_i; \Psi_B) + \sum_{k=1}^{K} I(c_i = k)\left[\ln \Pr(C_i = k \mid a_i) + \ln f(\tilde{y}_i \mid \tilde{b}_i, C_i = k, a_i; \Psi)\right] \\
&= \begin{cases}
\ln f(\tilde{b}_i; \Psi_B) + \ln f(\tilde{y}_i \mid \tilde{b}_i, C_i = 1; \Psi) & \text{if } a_i = 0 \text{ and } c_i = 1, \\
\ln f(\tilde{b}_i; \Psi_B) + \sum_{k=1}^{K} I(c_i = k)\left[\ln \delta_k + \ln f(\tilde{y}_i \mid \tilde{b}_i, C_i = k; \Psi)\right] & \text{if } a_i = 1.
\end{cases}
\end{aligned} \tag{5.25}$$
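The relation between the marginal cluster probabilities $\pi_k$ and the treated-arm probabilities $\delta_k$ can be checked numerically. The sketch below is illustrative only: the function name `treated_arm_probs` is hypothetical, and the inputs are sample proportions taken from the cluster counts of the illustrative example revisited later in this section (100 treated patients, 55 of them in cluster 1 and 45 in cluster 2, plus 50 controls in cluster 1), not model estimates.

```python
def treated_arm_probs(pi, p_treated):
    """Convert marginal cluster probabilities into delta_k = Pr(C = k | A = 1).

    pi[0] is the non-responder cluster (k = 1); pi[1:] are responder clusters.
    p_treated is Pr(A = 1).
    """
    # For k > 1: delta_k = pi_k / Pr(A = 1), since responders are all treated.
    responders = [p / p_treated for p in pi[1:]]
    # delta_1 is whatever probability remains in the treated arm.
    delta_1 = 1.0 - sum(responders)
    if delta_1 < 0:
        raise ValueError("responder clusters cannot exceed the treated fraction")
    return [delta_1] + responders


# Sample proportions: 150 patients, 100 treated, 45 responders.
pi = [105 / 150, 45 / 150]
delta = treated_arm_probs(pi, p_treated=100 / 150)
print([round(d, 2) for d in delta])
```

With these proportions the treated arm splits into 55% non-responders and 45% responders, which matches the 55/45 composition of the 100 treated subjects.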
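Note that the complete log-likelihood for treated patients is the same as the complete log-likelihood introduced in (5.6) except that $\pi_k = \Pr(C_i = k)$ is now replaced by $\delta_k$.

The expected complete log-likelihood is obtained by taking the expectation of (5.25) with respect to $\tilde{B}_i, C_i \mid \tilde{y}_i, a_i$:

$$Q^*(\Psi; \Psi^{\{s\}}) = \sum_{i \in U} u_i \, E_{\tilde{B}_i}\!\left( \ln f(\tilde{B}_i; \Psi_B) + k^{\{s\}}_i(\tilde{B}_i) \,\middle|\, \tilde{y}_i, a_i; \Psi^{\{s\}} \right) \tag{5.26}$$

and:

$$k^{\{s\}}_i(\tilde{B}_i) = \begin{cases}
\ln f(\tilde{y}_i \mid \tilde{B}_i, C_i = 1; \Psi) & \text{if } a_i = 0 \text{ and } c_i = 1, \\
\sum_{k=1}^{K} \omega^{*[k]\{s\}}_i \left[\ln \delta_k + \ln f(\tilde{y}_i \mid \tilde{B}_i, C_i = k; \Psi)\right] & \text{if } a_i = 1,
\end{cases}$$

where:

$$\omega^{*[k]\{s\}}_i := \Pr\!\left(C_i = k \,\middle|\, \tilde{y}_i, \tilde{B}_i, A_i = 1; \hat\Psi^{\{s\}}\right) = \frac{\delta^{\{s\}}_k f(\tilde{y}_i \mid \tilde{B}_i, C_i = k; \hat\Psi^{\{s\}})}{\sum_{l=1}^{K} \delta^{\{s\}}_l f(\tilde{y}_i \mid \tilde{B}_i, C_i = l; \hat\Psi^{\{s\}})},$$

and $\delta^{\{s\}}_k = \Pr(C_i = k \mid A_i = 1; \Psi^{\{s\}})$. Notice that the expected complete log-likelihood for treated patients (5.26) is the same as (5.11) except that $\pi_k$ is replaced by $\delta_k$ and $\omega^{[k]\{s\}}_i$ is replaced by $\omega^{*[k]\{s\}}_i$.

The MCEM algorithm may be used to obtain the MLE as in Section 5.4. The {s+1}th iteration of the E-step approximates (5.26) by generating samples $\tilde{b}^{\{s,m\}}_i$, $m = 1, 2, \ldots, M$, from $\tilde{B}_i \mid \tilde{y}_i, a_i; \hat\Psi^{\{s\}}$. As in (5.12), let $Q^*_M(\Psi; \Psi^{\{s\}}) = \sum_{m=1}^{M} q^*_m(\Psi; \Psi^{\{s\}})/M$. Then $q^*_m$ can be decomposed into R+2 functions, $q^{*C}_m$, $q^{*B}_m$ and $q^{*Y^{(r)}}_m$, $r = 1, 2, \ldots, R$, where each depends on a set of parameters that is disjoint from the other sets, as in (5.14). Under this model, $q^{*B}_m = q^B_m$, where $q^B_m$ is defined in (5.15), and

$$q^{*C}_m(\pi; \hat\Psi^{\{s\}}) = \sum_{k=1}^{K} \ln \delta_k \sum_{i; A_i = 1} \omega^{*[k]\{s,m\}}_i,$$

$$\begin{aligned}
q^{*Y^{(r)}}_m(\Psi_{Y^{(r)}}; \hat\Psi^{\{s\}})
&= \sum_{i; A_i = 1} \sum_{k=1}^{K} \omega^{*[k]\{s,m\}}_i \sum_{j=1}^{n^{(r)}} \ln f_{GLM}\!\left(y^{(r)}_{i,j} \,\middle|\, \boldsymbol{y}^{(r)}_{i,1:j}, \boldsymbol{y}^{(1:r)}_i, B^{(r)}_i = b^{(r)\{s,m\}}_i, C_i = k; \Psi^{[k]}_{Y^{(r)}}\right) \\
&\quad + \sum_{i; A_i = 0} \sum_{j=1}^{n^{(r)}} \ln f_{GLM}\!\left(y^{(r)}_{i,j} \,\middle|\, \boldsymbol{y}^{(r)}_{i,1:j}, \boldsymbol{y}^{(1:r)}_i, B^{(r)}_i = b^{(r)\{s,m\}}_i, C_i = 1; \Psi^{[1]}_{Y^{(r)}}\right),
\end{aligned}$$

where $\omega^{*[k]\{s,m\}}_i = \Pr(C_i = k \mid \tilde{y}_i, \tilde{B}_i = \tilde{b}^{\{s,m\}}_i, A_i = 1; \hat\Psi^{\{s\}})$. Therefore, we can proceed with a minor modification to the MCEM algorithm detailed in Section

The illustrative example revisited

We revisit the illustrative example of Scenario 1 introduced in Section 5.6.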
We treat the 100 subjects in this illustrative dataset as treated patients and add 50 new control patients, who belong to cluster 1 ($C_i = 1$, coloured in black in Figure 5.7). See Figure 5.9 for the generated control data. In total, this modified illustrative example contains 150 patients.

Table 5.4 contains both the MLE and EMMLE for this modified illustrative example. Due to the additional 50 control patients, the MLE and EMMLE changed from the analysis reported in Section 5.6. Again, the MLE and EMMLE are very similar for all the parameters, and their log-likelihood values are almost identical. The SE of $\hat\beta^{(r)[2]}_1$ is slightly larger than the SE of $\hat\beta^{(r)[1]}_1$ for all r = 1, 2, 3. This is reasonable as the additional control patient data improve the estimation of $\beta^{(r)[1]}_1$. Notice that all the SEs of the MLE and EMMLE stay the same or become smaller than those shown in Section 5.6, which is reasonable as additional data are included here.

Figure 5.9: The trajectories of the 50 additional control patients.

The trace plots of all the parameters and the observed log-likelihood are very similar to those in Section 5.6 (results not shown). As in Section 5.6, TRIPLE correctly classifies all 100 treated patients. By definition, "relative responders" do not exist among control patients. However, we can compute the posterior probabilities for the control patients and assign them to clusters accordingly. Ideally, all control patients should be classified into the non-responder cluster. Only one of the 50 control patients is classified into the responder cluster. Even in the scenario where the responder and non-responder clusters are clearly separated, some control patients could be classified into the responder group due to sampling variation.

5.8 Conclusions

Assessments of treatment effects for specific individuals in clinical trials are essential as a treatment may work well in some patients but could be harmful to others.
Identifying the subsets of responsive patients is challenging when treatment efficacy is assessed by multiple continuous and discrete longitudinal outcomes. This chapter discussed our MLOMM and explained its estimation procedures.

Parameter | True | TRIPLE MLE | TRIPLE EMMLE | DOUBLE-12 MLE | DOUBLE-12 EMMLE | SINGLE-1 MLE | SINGLE-1 EMMLE | SINGLE-2 MLE | SINGLE-2 EMMLE | SINGLE-3 MLE | SINGLE-3 EMMLE
$\beta^{(1)}_0$ | 2.0 | 2.07 (0.20/0.21) | 2.07 (0.21) | 2.07 (0.19/0.20) | 2.04 (0.20) | 2.07 (0.19/0.19) | 2.02 (0.20) | | | |
$\beta^{(1)[1]}_1$ | 0.6 | 0.58 (0.04/0.05) | 0.58 (0.05) | 0.58 (0.04/0.05) | 0.58 (0.05) | 0.56 (0.05/0.05) | 0.56 (0.05) | | | |
$\beta^{(1)[2]}_1$ | -0.6 | -0.60 (0.06/0.06) | -0.60 (0.06) | -0.61 (0.07/0.07) | -0.61 (0.07) | -0.65 (0.09/0.09) | -0.66 (0.09) | | | |
$\kappa^{(1)}$ | 2.0 | 2.12 (0.12/0.13) | 2.12 (0.13) | 2.12 (0.12/0.12) | 2.12 (0.12) | 2.12 (0.13/0.12) | 2.12 (0.12) | | | |
$\beta^{(2)}_0$ | 0.0 | 0.03 (0.12/0.13) | 0.03 (0.13) | 0.02 (0.12/0.12) | 0.02 (0.12) | | | -0.06 (0.12/0.12) | -0.05 (0.12) | |
$\beta^{(2)[1]}_1$ | 0.4 | 0.41 (0.03/0.03) | 0.41 (0.03) | 0.40 (0.03/0.03) | 0.40 (0.03) | | | 0.37 (0.02/0.02) | 0.37 (0.02) | |
$\beta^{(2)[2]}_1$ | -0.4 | -0.38 (0.03/0.03) | -0.38 (0.03) | -0.37 (0.03/0.03) | -0.37 (0.03) | | | -0.34 (0.04/0.03) | -0.34 (0.03) | |
$\beta^{(2)}_2$ | 0.1 | 0.14 (0.04/0.05) | 0.14 (0.05) | 0.14 (0.05/0.05) | 0.14 (0.05) | | | 0.18 (0.05/0.05) | 0.18 (0.05) | |
$\kappa^{(2)}$ | 1.0 | 0.90 (0.05/0.07) | 0.90 (0.07) | 0.92 (0.05/0.07) | 0.92 (0.07) | | | 0.92 (0.06/0.07) | 0.92 (0.07) | |
$\beta^{(3)}_0$ | 3.0 | 3.17 (0.13/0.13) | 3.17 (0.13) | | | | | | | 2.92 (0.12/0.12) | 2.91 (0.12)
$\beta^{(3)[1]}_1$ | 1.5 | 1.40 (0.03/0.03) | 1.40 (0.03) | | | | | | | 1.11 (0.03/0.03) | 1.11 (0.03)
$\beta^{(3)[2]}_1$ | -1.5 | -1.50 (0.04/0.04) | -1.50 (0.04) | | | | | | | -1.32 (0.04/0.04) | -1.33 (0.04)
$\beta^{(3)}_2$ | 0.0 | 0.04 (0.11/0.12) | 0.04 (0.13) | | | | | | | 0.16 (0.13/0.12) | 0.16 (0.12)
$\beta^{(3)}_3$ | 0.1 | 0.08 (0.14/0.15) | 0.08 (0.15) | | | | | | | 0.14 (0.17/0.17) | 0.14 (0.17)
$\kappa^{(3)}$ | 1.0 | 0.99 (0.06/0.06) | 0.99 (0.06) | | | | | | | 1.05 (0.06/0.06) | 1.05 (0.06)
$\sigma_1$ | 2.0 | 2.13 (0.14/0.17) | 2.13 (0.17) | 2.10 (0.14/0.17) | 2.10 (0.17) | 2.10 (0.14/0.17) | 2.10 (0.17) | | | |
$\sigma_2$ | 1.0 | 1.02 (0.07/0.08) | 1.02 (0.08) | 1.00 (0.07/0.08) | 1.01 (0.08) | | | 1.02 (0.07/0.09) | 1.02 (0.08) | |
$\sigma_3$ | 1.0 | 0.99 (0.08/0.10) | 0.99 (0.10) | | | | | | | 0.79 (0.07/0.09) | 0.79 (0.09)
$\rho_{12}$ | 0.2 | 0.11 (0.11/0.12) | 0.11 (0.12) | 0.07 (0.11/0.12) | 0.07 (0.12) | | | | | |
$\rho_{13}$ | 0.7 | 0.74 (0.06/0.07) | 0.74 (0.07) | | | | | | | |
$\rho_{23}$ | 0.2 | 0.24 (0.11/0.13) | 0.24 (0.13) | | | | | | | |
$\delta_1$ | 0.5 | 0.55 (0.05/0.05) | 0.55 (0.05) | 0.55 (0.05/0.05) | 0.55 (0.05) | 0.59 (0.06/0.06) | 0.59 (0.06) | 0.56 (0.06/0.06) | 0.55 (0.06) | 0.57 (0.06/0.06) | 0.57 (0.06)
llk | -3895.58 | -3883.20 | -3883.20 | -2728.42 | -2728.44 | -1543.84 | -1543.89 | -1189.91 | -1189.93 | -1202.88 | -1202.89
M | | | 1430 | | 1535 | | 2154 | | 2052 | | 3277
Itr | | | 130 | | 57 | | 41 | | 49 | | 47

Table 5.4: Simulation results of the illustrative example in the context of identifying relative responders. The values denoted as (a/b) within parentheses represent the SEs computed based on the Hessian representation of the Fisher information matrix (a) and that based on the outer products of the gradient representation of the Fisher information matrix (b). True: simulation parameter values; MLE: the MLE obtained by maximizing the observed likelihood directly; EMMLE: the MLE obtained via the MCEM algorithm; llk: log-likelihood values at the final estimates; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

Conditionally on the cluster label, we modelled each longitudinal outcome to be from either a GLMM or the NB mixed effect model, as they are arguably the most popular longitudinal models in the clinical trial applications of interest to us. Therefore, the use of our model is appropriate for any collection of discrete or continuous longitudinal outcomes, when each can be modelled with a GLMM conditioning on a cluster label. Our MLOMM can be used to identify both absolute and relative responders. Our estimation procedure is based on the MCEM algorithm. Our algorithm is capable of estimating the large number of parameters in our MLOMM by exploiting existing well-developed algorithms that maximize the GLM likelihoods. We also introduced a procedure to obtain the SEs. As the procedure assumes that the variance of our estimator can be estimated by the inverse of the Fisher information matrix, the procedure relies on the large sample theory of the MLE.
Therefore, our SEs are reliable only when the sample size is large and the algorithm converges to the MLE.

Our procedures were demonstrated with illustrative examples to identify both absolute and relative responders. The examples confirmed that our estimation procedure yielded the MLE when three longitudinal outcomes were modelled with trivariate longitudinal outcome mixture models. In our examples, we compared the classification performances of absolute responders among MLOMMs based on one, two or three outcomes. These comparisons showed that when the MLOMM incorporated more effective outcomes, the procedure could more confidently identify clusters with higher accuracy. Chapter 6 illustrates our procedure using datasets from two previously conducted clinical trials.

Chapter 6

MS clinical trial data analysis to identify responders

This chapter applies the MLOMM procedures to two MS clinical trial datasets to identify absolute and relative responders. Section 6.1 illustrates our procedure to search for relative responders among lenercept treated MS patients. This is the trial labeled as RRMS-2 in Table 3.1. This application models longitudinally collected newly active lesion (NAL) counts and persistently active lesion (PAL) counts. Section 6.2 illustrates our procedure to search for absolute responders among MS patients in the MBP8298 trial. This is the trial labeled as SPMS-4 in Table 3.1. This application models each component of longitudinally collected MSFC: two components are continuous outcomes, and one is a count outcome.

6.1 Lenercept MS trial

This section demonstrates our procedures using a previously conducted phase II MS clinical trial. The trial was conducted with 167 patients to evaluate whether lenercept would reduce new lesions on MRI. Patients received 10 mg (44 patients), 50 mg (40 patients) or 100 mg (40 patients) of lenercept or placebo (43 patients) every 4 weeks for up to 48 weeks.
MRI scans were performed at screening, at baseline and then every 4 weeks through study week 24 for a total of 8 MRI scans. The primary efficacy measure was the cumulative number of NALs identified on the six MRI scans during treatment. PALs on MRI scans were identified separately as a secondary measure of efficacy. No significant differences were seen across groups on any MRI measures, including the primary and secondary efficacy measures. The study was terminated because the number of relapses in lenercept-treated patients increased significantly compared with placebo patients. See [40] for more details. We revisit this dataset and identify relative responders among the lenercept-treated patients using NAL and PAL as two longitudinal outcomes. Before detailing our model specification, we introduce the terminology related to MS lesions and the NAL and PAL data collection process.

6.1.1 Newly active lesions and persistently active lesions

In MS clinical trials, the most common types of MRI scans are the T2-weighted scan and the T1-Gd scan. The NALs and PALs are defined based on active lesions on the T2-weighted scan and CELs on T1-Gd scans: T2 active lesions are defined as new, recurrent, newly enlarging or persistently enlarging lesions on the T2-weighted scans, and CELs are defined as new, recurrent or persistently enhancing lesions on the T1-Gd scans.

A lesion on the T2-weighted or T1-Gd scans is considered as "new" if it has not been seen before, and is considered as "recurrent" if it reappears at a site where an earlier lesion disappeared. A lesion on the T2-weighted scan is considered as "newly enlarging" if enlargement was not observed on the previous scan, and is considered as "persistently enlarging" if enlargement was also observed on the previous scan (i.e., a persistently enlarging lesion must be newly or persistently enlarging on the previous scan).
A lesion on the T1-Gd scan is considered as "persistently enhancing" if it enhanced on the previous scan and continues to enhance on the current scan. (See [58, 63] for more details of T2 active lesions and CELs.)

In the lenercept clinical trial, the number of NALs is ascertained by summing the new, recurrent or newly enlarging T2 active lesions, and the new or recurrent CELs. A NAL identified on both the T1-Gd and T2-weighted scans was counted only once. Persistently enhancing lesions or persistently enlarging T2 lesions were separately identified as PALs. Therefore, the PALs are always a subset of the PALs or NALs at the previous scan. As each T2-weighted scan shows both old and active lesions, to identify T2 active lesions radiologists have to compare the current T2-weighted MRI scan to the previous ones. Therefore, T2 active lesions cannot be identified on the T2-weighted scan at screening. In order to assess whether T2 active lesions or CELs are new, recurrent, enlarging or enhancing, radiologists must check the lesion's status on the previous scans. As a result, although the total number of CELs is available at screening, their exact categories are unknown.

The total number of T2 active lesions and CELs (i.e., the sum of PAL and NAL) is unknown at screening, due to the lack of information about T2 active lesions. However, following the convention used in the MS literature (based on the fact that the number of T2 active lesions on a scan is usually much smaller than the number of CELs), we will substitute the unknown number with the total number of CELs.

6.1.2 Data and model

The goal of our analysis is to demonstrate how the relative responders selected based only on the primary outcome (NALs) change when the secondary outcome (PALs) is incorporated. We will treat all the lenercept-treated arms as a single treated group.
Hereafter, let $Y^{(1)}_{i,j}$ and $Y^{(2)}_{i,j}$ be the NAL and PAL count, respectively, for the ith patient at the jth 4-week interval since the baseline. Notice that the repeated measure index j starts from −1, which indicates screening; j = 0 indicates the baseline, and j = 1, 2, ..., 6 indicate the follow-up scans. For each patient i we have 7 scheduled NAL and PAL counts, i.e., j = 0, 1, ..., 6 for $Y^{(r)}_{i,j}$, r = 1, 2.

As discussed in Section 6.1.1, the individual values of $Y^{(1)}_{i,-1}$ and $Y^{(2)}_{i,-1}$ are not available. Hereafter, our analysis treats $Y^{(r)}_{i,j}$, j = 0, 1, ..., 6, as response variables, and $y^{(1)}_{i,-1} + y^{(2)}_{i,-1}$ is considered only as a covariate value to model the linear predictors $\eta^{(r)[k]}_{i,j}$ (r = 1, 2). This should not severely affect the identification of relative responders since patients are not divided into treatment and placebo groups at screening, so $y^{(1)}_{i,-1} + y^{(2)}_{i,-1}$ contains little information about the relative responder clusters.

We employ the NB model for the conditional distribution of NAL, $Y^{(1)}_{i,j}$, given $\boldsymbol{y}^{(1)}_{i,0:j}$, $b^{(1)}_i$ and $c_i$ (j = 0, 1, ..., 6). The NB distribution has been used to model lesions in various applications [50, 71, 84]. For the conditional mean structure $\mu^{(1)[c_i]}_{i,j}$ given in (5.4), we use a log link function, $\ln(\mu^{(1)[c_i]}_{i,j}) = \eta^{(1)[c_i]}_{i,j}$, and assume that $\mu^{(1)[c_i]}_{i,j}$ is proportional to the total count of NALs and PALs at screening. Then the component of the linear predictor $\eta^{(1)[c_i]}_{i,j}$ that does not depend on the cluster label $c_i$ is expressed as:

$$\Upsilon_i := \beta^{(1)}_0 + b^{(1)}_i + \beta^{(1)}_1 \ln\!\left(y^{(1)}_{i,-1} + y^{(2)}_{i,-1} + 1\right).$$

One of the advantages of a model-based approach to identify relative responders is that we can take into account our expectation of "how the cluster centres will change over time" by specifying the functional form of $\mu^{(1)[c_i]}_{i,j}$. We consider two scenarios of cluster effect on the conditional mean.
The first assumes that a linear time trend for $\mu^{(1)[c_i]}_{i,j}$ differs across clusters on the log-scale and (5.3) is modelled as:

NAL:Linear $\eta^{(1)[c_i]}_{i,j} = \Upsilon_i + \beta^{(1)[c_i]}_2 \, j.$

The second scenario assumes that a step-wise time trend for $\mu^{(1)[c_i]}_{i,j}$ differs across clusters on the log-scale and (5.3) is modelled as:

NAL:Step $\eta^{(1)[c_i]}_{i,j} = \Upsilon_i + \beta^{(1)[c_i]}_3 I_{[1,4),j} + \beta^{(1)[c_i]}_4 I_{[4,7),j}$

where $I_{S,j} = I(j \in S)$ is an indicator function. That is, $\beta^{(1)[c_i]}_3$ indicates the change in the conditional mean from the baseline to the first 12 weeks of follow-up, and $\beta^{(1)[c_i]}_4$ indicates the change in the conditional mean from the baseline to the second 12 weeks of follow-up.

Technically, PAL at the current time point $Y^{(2)}_{i,j}$ comes from two sources: from PAL at the previous time $Y^{(2)}_{i,j-1}$, and from NAL at the previous time $Y^{(1)}_{i,j-1}$. One might wonder whether the probability of a PAL to persist in the next time point is the same as the probability of a NAL to become a PAL in the next time point. If the probabilities are the same, then one might model the conditional distribution of the PAL count, $Y^{(2)}_{i,j} \mid \boldsymbol{y}^{(2)}_{i,0:j}, \boldsymbol{y}^{(1)}_i, b^{(2)}_i, c_i$, using the binomial distribution with $y^{(1)}_{i,j-1} + y^{(2)}_{i,j-1}$ as the size parameter. Figure 6.1 plots the proportion of NALs or PALs from a time point that become PALs at the next time point for each patient against the ratio of the NAL and PAL counts at the time point. Figure 6.1 shows these proportions for all scans that have nonzero $y^{(1)}_{i,j} + y^{(2)}_{i,j}$. If a NAL is more likely to become a PAL at the next time point than a PAL is to stay as a PAL, then the proportion becoming PALs would increase with the ratio of the NAL and PAL counts. However, we see that for all given ratios, the proportions are randomly spread between 0 and 1, indicating that there is no evidence that the probability that a NAL becomes a PAL is different from the probability a PAL stays as a PAL.
Therefore, although a NB is popular to model lesion counts, it is more natural to model the conditional distribution of the PAL count, $Y^{(2)}_{i,j} \mid \boldsymbol{y}^{(2)}_{i,0:j}, \boldsymbol{y}^{(1)}_i, b^{(2)}_i, c_i$, using the binomial distribution with $y^{(1)}_{i,j-1} + y^{(2)}_{i,j-1}$ as the size parameter. Under this assumption, the distribution of the proportion of NALs and PALs from the previous scan that become PALs at the current scan, $Y^{(2)}_{i,j} / (y^{(1)}_{i,j-1} + y^{(2)}_{i,j-1})$, belongs to the exponential family [45]. So, our MLOMM can handle this model.

For the conditional mean structure of $Y^{(2)}_{i,j} / (y^{(1)}_{i,j-1} + y^{(2)}_{i,j-1})$ under the binomial model, we use a logit link function, and again consider two scenarios for the cluster effect on $\mu^{(2)[c_i]}_{i,j}$. The first assumes that the cluster difference arises from a linear time effect and the second one assumes a stepwise time effect:

PAL:Linear $\eta^{(2)[c_i]}_{i,j} = \beta^{(2)}_0 + b^{(2)}_i + \beta^{(2)[c_i]}_1 \, j,$

PAL:Step $\eta^{(2)[c_i]}_{i,j} = \beta^{(2)}_0 + b^{(2)}_i + \beta^{(2)[c_i]}_2 I_{[1,4),j} + \beta^{(2)[c_i]}_3 I_{[4,7),j}.$

Our analysis considers all 4 combinations of conditional mean structures for PAL (Linear or Step) and NAL (Linear or Step).

Figure 6.1: The proportions of NALs or PALs from the previous time point that become PALs at the current time point plotted against the ratio of the NAL and PAL counts at the previous time point. The intensity of the red indicates the number of observations lying at the same location. Note that the x-axis is presented on the log-scale.

6.1.3 Results

Model parameters

The ascent-based algorithm parameters are selected as: $\alpha = 0.1$, $\beta = 0.3$, $\gamma = 0.1$ and $C_{stop} = 0.005$. The MCEM algorithm performs well, and its details are discussed in Appendix C.1. Table 6.1 shows the fitted results. With both Linear and Step mean models, the SINGLE fit indicates that relative responders ($C_i = 2$) based solely on the NAL outcome have increasing NAL activity level while the non-responders have a decreasing trend.
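The direction of these trends can be read off the fitted slopes. The sketch below is a worked illustration, not part of the original analysis: it copies the cluster-specific NAL estimates from the SINGLE columns of Table 6.1 and uses the fact that, under the log link, the patient-specific term $\Upsilon_i$ cancels in the ratio of the conditional mean at scan j to the conditional mean at baseline. The function names are hypothetical.

```python
import math

# Estimates copied from the SINGLE columns of Table 6.1
# (cluster 1 = non-responders, cluster 2 = relative responders).
linear_slope = {1: -0.102, 2: 0.193}  # beta^{(1)[k]}_2 in NAL:Linear
step_early = {1: -0.102, 2: 0.705}    # beta^{(1)[k]}_3 in NAL:Step, scans j = 1, 2, 3
step_late = {1: -0.381, 2: 1.123}     # beta^{(1)[k]}_4 in NAL:Step, scans j = 4, 5, 6


def linear_ratio(k, j):
    """Conditional mean at scan j relative to baseline under NAL:Linear."""
    return math.exp(linear_slope[k] * j)


def step_ratio(k, j):
    """Same ratio under NAL:Step, using the indicators I_[1,4) and I_[4,7)."""
    return math.exp(step_early[k] * (1 <= j < 4) + step_late[k] * (4 <= j < 7))


for k in (1, 2):
    print("cluster", k,
          [round(linear_ratio(k, j), 2) for j in range(7)],
          [round(step_ratio(k, j), 2) for j in range(7)])
```

Ratios below 1 correspond to a decreasing conditional mean and ratios above 1 to an increasing one, so both mean models place cluster 1 below baseline and cluster 2 above it at every follow-up scan.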
Although the estimates of parameters that are shared by both Linear and Step mean models (i.e., $\beta^{(1)}_0$, $\beta^{(1)}_1$, $\zeta^{(1)}$, $\sigma^{(1)}$ and $\pi$) are similar, the Akaike Information Criterion (AIC) indicates that the Linear mean model is a better fit than the Step mean model.

The Linear mean model for the NAL outcome is also a better fit than the Step mean model (in terms of AIC) when the NAL and the PAL outcomes are modelled together in DOUBLE. All four DOUBLE fits also indicate that relative responders have an increasing NAL activity level while the non-responders have a decreasing NAL activity level. When the same mean model is used for NAL, the estimated parameters $\Psi_{Y^{(1)}}$ for the NAL outcome are quite similar between SINGLE and DOUBLE. For the PAL outcome of DOUBLE, both the Linear and Step mean models indicate that the mean rate of NALs and PALs becoming PALs is decreasing over time for both the relative responders and non-responders. The DOUBLE fits indicate that the PAL rates for the relative responders are higher than for the non-responders, except in the case of the Step mean model for the NAL counts and the Linear mean model for the PAL counts, where the PAL rates in the two clusters are very similar. In fact, none of the four DOUBLE fits provides clear evidence of clusters for the PAL response (as can be measured by differences in the mean models between the relative responders and the non-responders).
AIC indicates that DOUBLE with the Linear mean models for both NAL and PAL is the best fit among all the four joint models considered.

All estimated parameters that are shared by the four DOUBLE models (i.e., $\beta^{(1)}_1$, $\zeta^{(1)}$, $\sigma_1$, $\sigma_2$, $\rho_{12}$ and $\pi$) are somewhat similar. Although the point estimate of $\rho_{12}$ is negative, this cannot be directly interpreted in terms of the correlations between the NAL and PAL counts. Figure 6.2 (a) shows the estimated Spearman correlation matrices across 14 repeated measurements (7 from NAL and 7 from PAL) obtained from estimates of DOUBLE with Linear mean models for both NAL and PAL for patients in the treatment arm. (The Spearman correlations are computed by first generating a large number of samples from DOUBLE with estimated parameters as in Table 6.1 given values of $y^{(1)}_{i,-1} + y^{(2)}_{i,-1}$. The distribution of the values of $y^{(1)}_{i,-1} + y^{(2)}_{i,-1}$ among the generated samples is set to be the same as in our data. Then the Spearman correlations are computed for the generated samples.) The estimated correlations between $Y^{(1)}_{i,j}$ and $Y^{(2)}_{i,j'}$ for $j, j' = 0, 1, \dots, 6$ are reasonably high, ranging from 0.45 to 0.71. This range is somewhat smaller than the range of the sample Spearman correlations (0.36 to 0.77; Figure 6.2 (b)). The estimated correlations from DOUBLE show that PAL at the time point $j$ is highly correlated with NAL at the time point $j - 1$ and PAL at the time point $j - 1$. PAL becomes less correlated with both NAL and PAL as the time separation increases. These characteristics of the correlation estimates from DOUBLE agree with the sample correlations.

| Outcome | Parameter | DOUBLE (NAL Linear, PAL Linear) | DOUBLE (NAL Linear, PAL Step) | DOUBLE (NAL Step, PAL Linear) | DOUBLE (NAL Step, PAL Step) | SINGLE (NAL Linear) | SINGLE (NAL Step) |
|---|---|---|---|---|---|---|---|
| NAL | $\beta^{(1)}_0$ | -1.243 (0.136) | -1.226 (0.138) | -1.343 (0.165) | -1.319 (0.167) | -1.236 (0.135) | -1.336 (0.163) |
| | $\beta^{(1)}_1$ | 1.293 (0.123) | 1.287 (0.131) | 1.288 (0.121) | 1.272 (0.129) | 1.282 (0.123) | 1.275 (0.122) |
| | $\beta^{(1)[1]}_2$ | †† -0.102 (0.027) | †† -0.109 (0.026) | | | †† -0.102 (0.025) | |
| | $\beta^{(1)[2]}_2$ | 0.197 (0.061) | 0.180 (0.054) | | | 0.193 (0.054) | |
| | $\beta^{(1)[1]}_3$ | | | † -0.101 (0.144) | † -0.121 (0.149) | | †† -0.102 (0.141) |
| | $\beta^{(1)[2]}_3$ | | | 0.770 (0.369) | 0.696 (0.325) | | 0.705 (0.292) |
| | $\beta^{(1)[1]}_4$ | | | †† -0.369 (0.165) | †† -0.412 (0.164) | | †† -0.381 (0.158) |
| | $\beta^{(1)[2]}_4$ | | | 1.196 (0.474) | 1.061 (0.399) | | 1.123 (0.410) |
| | $\zeta^{(1)}$ | 3.703 (0.004) | 3.752 (0.004) | 3.601 (0.004) | 3.667 (0.004) | 3.700 (0.004) | 3.609 (0.004) |
| PAL | $\beta^{(2)}_0$ | -0.738 (0.126) | -0.638 (0.129) | -0.740 (0.124) | -0.632 (0.128) | | |
| | $\beta^{(2)[1]}_1$ | △ -0.092 (0.032) | | △ -0.086 (0.033) | | | |
| | $\beta^{(2)[2]}_1$ | -0.082 (0.052) | | -0.096 (0.076) | | | |
| | $\beta^{(2)[1]}_2$ | | △ -0.378 (0.131) | | △ -0.402 (0.134) | | |
| | $\beta^{(2)[2]}_2$ | | -0.195 (0.275) | | -0.128 (0.224) | | |
| | $\beta^{(2)[1]}_3$ | | △ -0.582 (0.167) | | △ -0.601 (0.173) | | |
| | $\beta^{(2)[2]}_3$ | | -0.269 (0.261) | | -0.236 (0.251) | | |
| RE | $\sigma_1$ | 0.933 (0.096) | 0.934 (0.095) | 0.926 (0.110) | 0.926 (0.116) | 0.931 (0.091) | 0.928 (0.105) |
| | $\sigma_2$ | 0.588 (0.096) | 0.584 (0.099) | 0.581 (0.095) | 0.579 (0.101) | | |
| | $\rho_{12}$ | -0.071 (0.247) | -0.092 (0.229) | -0.069 (0.245) | -0.107 (0.249) | | |
| | $\delta_1$ | 0.797 (0.095) | 0.775 (0.093) | 0.810 (0.105) | 0.778 (0.105) | 0.792 (0.091) | 0.793 (0.107) |
| Fit | llk | -2266.29 | -2267.82 | -2271.23 | -2272.69 | -1549.57 | -1554.87 |
| | AIC | 4556.57 | 4563.64 | 4570.45 | 4577.38 | 3113.13 | 3127.75 |
| | M | 9384 | 9551 | 12620 | 6097 | 1562 | 3030 |
| | Itr | 97 | 95 | 141 | 104 | 64 | 86 |

Table 6.1: The model estimates of DOUBLE and SINGLE using NAL. For both PAL and NAL outcomes, Linear and Step mean models are considered. Labels next to $\beta^{(r)[1]}_l$ indicate the p-value of the asymptotic Z-test for the hypothesis $\beta^{(r)[1]}_l = \beta^{(r)[2]}_l$; †† indicates that the p-value is less than 0.01; △ indicates that the p-value is greater than 0.05. The abbreviations are: llk: log-likelihood values at the final estimates; AIC: Akaike Information Criterion; M: the final MCEM sample size; Itr: the number of iterations required before convergence.

Identification of relative responders

Hereafter, we focus on SINGLE with the Linear mean model and DOUBLE with the Linear mean models for both NAL and PAL, as AIC indicated these provide the best fits.
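The AIC comparison behind this choice can be checked with a short sketch. The log-likelihoods and AIC values are those reported in Table 6.1; the parameter counts are an assumption made here, obtained by counting the estimated parameters listed for each model in the table, and the dictionary keys are illustrative labels.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# (log-likelihood, assumed number of parameters, AIC reported in Table 6.1)
fits = {
    "DOUBLE NAL:Linear PAL:Linear": (-2266.29, 12, 4556.57),
    "DOUBLE NAL:Linear PAL:Step": (-2267.82, 14, 4563.64),
    "DOUBLE NAL:Step PAL:Linear": (-2271.23, 14, 4570.45),
    "DOUBLE NAL:Step PAL:Step": (-2272.69, 16, 4577.38),
    "SINGLE NAL:Linear": (-1549.57, 7, 3113.13),
    "SINGLE NAL:Step": (-1554.87, 9, 3127.75),
}

# Recompute AIC for each fit; the reported values agree up to rounding of llk.
recomputed = {name: aic(llk, k) for name, (llk, k, _) in fits.items()}

# Smallest AIC among the joint models and among the single-outcome models.
best_double = min((n for n in fits if n.startswith("DOUBLE")), key=recomputed.get)
best_single = min((n for n in fits if n.startswith("SINGLE")), key=recomputed.get)
```

Because the DOUBLE and SINGLE likelihoods are based on different sets of outcomes, the AIC values are only compared within each family, as in the text.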
Figure 6.3 shows the estimated posterior probability for all patients on the treatment arm based on both SINGLE and DOUBLE. The estimated probabilities for DOUBLE and SINGLE are very similar for all patients. The same patients are identified as relative responders by SINGLE and DOUBLE except for patient 904: this patient is identified as a relative responder by SINGLE but not by DOUBLE (Table 6.2). However, the 95% CI for this patient's posterior probability for both classifiers contains 0.5, indicating that there is no clear evidence that this patient is a relative responder or non-responder. The similar results between SINGLE and DOUBLE could be because the cluster structure with respect to the PAL counts is weak (recall that the cluster-specific coefficients for the PAL fit in Table 6.1 are not significantly different). Table 6.2 shows all the patients that have posterior probability estimates from DOUBLE and SINGLE that differ by more than 10%. DOUBLE yields higher posterior probability than SINGLE for patients with high PAL activities while SINGLE yields higher posterior probability for patients with low PAL activities.

Figure 6.2: The Spearman correlation matrices across 14 repeated measurements (7 from NAL and 7 from PAL) for treated patients obtained from: (a) DOUBLE with the Linear mean models for both NAL and PAL (Model), and (b) the raw data (Sample). N:j and P:j indicate the NAL and PAL counts on the jth scan, respectively.

Figure 6.3: The estimated posterior probability of being a relative responder for all patients on the treatment arm based on both DOUBLE and SINGLE, labeled by patient ID.

Although, by definition, relative responders only exist among treated patients, we can compute the posterior probabilities for the placebo patients. Ideally, the posterior probability of a placebo patient being a relative responder should always be less than 0.5.
Among the 43 placebo patients, only one is classified into the relative responder cluster: Patient 211 is classified as a relative responder by both SINGLE and DOUBLE in Table 6.2. This suggests that the relative responder and non-responder clusters are well-separated.

In summary, this data analysis demonstrates the use of our MLOMM to identify relative responders. Our model can account for various cluster structures by specifying the conditional mean structures. The estimated model yields reasonable correlation estimates among the NAL and PAL counts. Although SINGLE and DOUBLE provide different posterior probabilities, with the exception of patient 904, both classify the same patients as relative responders. This is reasonable as there is no clear cluster structure for the PAL counts.

| ID | NAL | PAL | SINGLE-NAL | DOUBLE |
|---|---|---|---|---|
| Patients identified as relative responders by SINGLE but not by DOUBLE | | | | |
| 904 | 0/1/0/1/1/4/1 | 0/0/1/0/0/0/1 | 0.503 (0.469, 0.536) | 0.493 (0.453, 0.534) |
| Posterior probability by SINGLE × 1.1 < posterior probability by DOUBLE | | | | |
| 404 | 6/15/8/12/11/5/25 | 10/9/13/12/15/14/11 | 0.526 (0.430, 0.620) | 0.600 (0.420, 0.756) |
| 416 | 3/10/10/4/5/9/5 | 1/2/8/11/12/9/7 | 0.099 (0.070, 0.138) | 0.133 (0.078, 0.216) |
| Posterior probability by SINGLE > posterior probability by DOUBLE × 1.1 | | | | |
| 103 | 2/4/0/0/0/1/1 | 2/3/3/0/0/0/0 | 0.010 (0.006, 0.014) | 0.009 (0.005, 0.015) |
| 106 | 4/4/8/1/0/1/2 | 0/1/3/5/2/0/0 | 0.006 (0.003, 0.009) | 0.005 (0.003, 0.009) |
| 200 | 7/0/1/3/6/4/1 | 2/2/0/0/0/0/1 | 0.068 (0.049, 0.092) | 0.056 (0.031, 0.100) |
| 203 | 6/1/2/4/1/3/4 | 2/0/0/1/1/0/0 | 0.069 (0.049, 0.096) | 0.057 (0.033, 0.098) |
| 207 | 1/0/3/3/0/0/0 | 0/1/0/0/0/0/0 | 0.036 (0.028, 0.046) | 0.031 (0.021, 0.046) |
| 212 | 1/0/2/2/1/0/0 | 0/0/0/1/0/0/0 | 0.140 (0.122, 0.159) | 0.127 (0.103, 0.154) |
| 305 | 1/3/1/0/0/0/0 | 1/0/0/0/0/0/0 | 0.008 (0.005, 0.011) | 0.007 (0.004, 0.011) |
| 306 | 5/3/2/3/5/1/2 | 3/3/3/1/2/2/0 | 0.009 (0.005, 0.015) | 0.008 (0.004, 0.014) |
| 403 | 7/11/4/13/9/10/8 | 6/2/0/0/3/0/2 | 0.141 (0.099, 0.196) | 0.111 (0.049, 0.229) |
| 422 | 0/2/4/1/4/1/1 | 2/0/1/1/1/0/0 | 0.219 (0.190, 0.251) | 0.196 (0.150, 0.253) |
| 500 | 1/3/0/0/0/0/1 | 0/0/0/0/0/0/0 | 0.038 (0.030, 0.048) | 0.034 (0.024, 0.047) |
| 511 | 2/0/1/1/1/0/0 | 1/0/0/0/0/0/0 | 0.039 (0.031, 0.049) | 0.034 (0.024, 0.048) |
| 512 | 2/3/0/2/2/2/0 | 2/0/0/0/0/0/1 | 0.021 (0.015, 0.030) | 0.019 (0.012, 0.030) |
| 514 | 0/0/1/2/1/1/0 | 0/0/0/0/0/0/0 | 0.160 (0.140, 0.183) | 0.142 (0.113, 0.177) |
| 600 | 12/20/7/4/7/4/5 | 4/6/3/2/2/0/1 | 0.002 (0.001, 0.004) | 0.001 (0.000, 0.006) |
| 606 | 1/1/0/0/0/0/0 | 0/0/1/0/0/0/0 | 0.019 (0.014, 0.024) | 0.017 (0.012, 0.024) |
| 610 | 0/4/NA/0/0/0/0 | 0/0/NA/0/0/0/0 | 0.019 (0.015, 0.025) | 0.017 (0.011, 0.025) |
| 617 | 3/3/0/1/2/4/4 | 0/0/0/0/0/0/1 | 0.377 (0.328, 0.429) | 0.342 (0.274, 0.417) |
| 621 | 8/NA/0/3/2/1/1 | 0/NA/1/1/3/0/1 | 0.002 (0.001, 0.003) | 0.001 (0.001, 0.003) |
| 717 | 2/8/0/4/1/1/2 | 0/0/4/0/0/0/0 | 0.008 (0.005, 0.013) | 0.007 (0.003, 0.013) |
| 801 | 2/0/0/1/2/1/0 | 0/0/0/0/0/0/0 | 0.110 (0.094, 0.129) | 0.097 (0.074, 0.126) |
| 806 | 2/3/4/1/2/0/2 | 0/1/2/1/0/0/0 | 0.027 (0.019, 0.037) | 0.023 (0.014, 0.037) |
| 807 | 7/5/0/3/2/4/3 | 4/4/1/0/0/0/1 | 0.015 (0.009, 0.025) | 0.012 (0.005, 0.029) |
| 814 | 6/15/6/NA/NA/3/6 | 1/2/5/NA/NA/0/1 | 0.063 (0.043, 0.090) | 0.056 (0.035, 0.088) |
| 917 | 1/0/2/0/0/0/0 | 0/0/0/0/0/0/0 | 0.034 (0.027, 0.042) | 0.030 (0.022, 0.041) |
| 950 | 6/2/8/7/7/2/3 | 7/7/3/2/4/1/2 | 0.016 (0.009, 0.025) | 0.013 (0.007, 0.026) |
| Placebo patients that would be identified as relative responders | | | | |
| 211 | 0/0/0/0/3/2/0 | 0/0/0/0/0/1/1 | 0.684 (0.663, 0.704) | 0.694 (0.673, 0.714) |

Table 6.2: The estimated posterior probabilities based on DOUBLE with Linear mean models for both NAL and PAL, and based on SINGLE with the Linear mean model for NAL for various patients.

6.2 MBP8298 MS trial

The objective of this study was to evaluate the efficacy and safety of MBP8298 in SPMS patients. Patients were randomly assigned to either 500 mg MBP8298 or placebo, given by IV injection once every 6 months for 2 years. One of the efficacy outcomes is based on the MSFC, the composite score that combines three body functional measurements.
In our MBP8298 MS trial subcohort, 101 patients (47: placebo, 54: MBP8298) were scheduled for MSFCs at one month prior to the initiation of treatment (screening), baseline, and 12 follow-up visits at months 1, 2, 3, 6, 7, 8, 9, 12, 15, 18, 21 and 24 after the baseline. No significant benefit of MBP8298 in any of the primary or secondary endpoints was observed, including the mean change in MSFC between month 24 and baseline [20]. Before going into the details of our data analysis, we review the MSFC.

6.2.1 The Multiple Sclerosis Functional Composite

The MSFC was designed to reflect the varied clinical expression of MS across patients and over time. It was developed by a special Task Force on Clinical Outcomes Assessment appointed by the National MS Society's Advisory Committee on Clinical Trials of New Agents in MS [68]. It is based on the concept that scores from three body functions, arm, leg and cognitive function, are combined to create a single score [12, 16]. This is done by creating a standardized score for each component of the MSFC and averaging these to create an overall composite score known as the MSFC score.

The leg function is measured by the Timed 25-Foot Walk (T25FW). The patient is instructed to walk 25 feet safely and as quickly as possible, twice, and the average time (sec) is recorded. Let $T_{\mathrm{T25FW},j}$ be the recorded time for the $j$th trial ($j = 1, 2$). Then the standardized score for the leg function component of the MSFC is:

$$Z_{\mathrm{T25FW}} = -\frac{S_{\mathrm{T25FW}} - \mu_{\mathrm{base}}(S_{\mathrm{T25FW}})}{\sigma_{\mathrm{base}}(S_{\mathrm{T25FW}})}, \qquad \text{where } S_{\mathrm{T25FW}} = \frac{\sum_j T_{\mathrm{T25FW},j}}{2}.$$

The minus sign is included so that a negative (positive) $Z_{\mathrm{T25FW}}$ is associated with disease worsening (improvement). The standardization involves comparing each outcome with that of a reference population, which requires a decision about which population to use as the reference to derive the baseline mean $\mu_{\mathrm{base}}(S_{\mathrm{T25FW}})$ and baseline SD $\sigma_{\mathrm{base}}(S_{\mathrm{T25FW}})$.
Reference [16] recommends using the summary statistics either from all patients in a current study cohort at baseline, or from a representative Task Force database. For the latter, $\mu_{\mathrm{base}}(S_{\mathrm{T25FW}}) = 9.5353$ and $\sigma_{\mathrm{base}}(S_{\mathrm{T25FW}}) = 11.4058$. If both trials fail to be completed within 180 seconds, the standardized score is replaced by a large negative value $Z_{\mathrm{T25FW}} = -13.7$ [16]. This value represents the largest negative $Z_{\mathrm{T25FW}}$ in the Task Force dataset, which corresponds to $S_{\mathrm{T25FW}} = 165.7948$. When only one of the trials is successful, [16] does not provide a clear guideline for computing the standardized score; [20], for example, uses the non-missing record as the average of the trials to compute $Z_{\mathrm{T25FW}}$.

The arm function is measured by the 9-Hole Peg Test (9HPT). The patient is seated at a table with a small, shallow container holding nine identical pegs and a wood or plastic block containing nine corresponding holes. The patient picks up the nine pegs one at a time using only the dominant (or non-dominant) hand and puts them in the nine holes as quickly as possible. Once they are in the holes, the patient removes them again as quickly as possible one at a time, replacing them into the container. The total time to complete the task is recorded four times: dominant hand twice and non-dominant hand twice. Let $T_{\mathrm{9HPT},L,j}$ and $T_{\mathrm{9HPT},R,j}$ be the time to complete the task using the left and right hand, respectively, in the $j$th trial ($j = 1, 2$). Then the standardized score for the arm component of the MSFC is defined as:

$$Z_{\mathrm{9HPT}} = \frac{S_{\mathrm{9HPT}} - \mu_{\mathrm{base}}(S_{\mathrm{9HPT}})}{\sigma_{\mathrm{base}}(S_{\mathrm{9HPT}})},$$

where:

$$S_{\mathrm{9HPT}} = \frac{1}{2}\left(\frac{1}{\sum_j T_{\mathrm{9HPT},L,j}/2} + \frac{1}{\sum_j T_{\mathrm{9HPT},R,j}/2}\right).$$

The mean of the reciprocated average left time and reciprocated average right time is used so that a large score corresponds to improvements. If a patient is unable to finish any of the four trials within 300 seconds, 777 is imputed for the corresponding $T_{\mathrm{9HPT},h,j}$, where $h = L, R$ and $j = 1, 2$. The baseline mean and SD are selected as in T25FW.
In the representative Task Force database, $\mu_{\mathrm{base}}(S_{\mathrm{9HPT}}) = 0.0439$ and $\sigma_{\mathrm{base}}(S_{\mathrm{9HPT}}) = 0.0101$.

Finally, cognitive function is measured by the Paced Auditory Serial Addition Test (PASAT). Single digits are presented every 3 seconds and the patient must add each new digit to the one immediately prior. The test score is the number of correct sums given (out of 60) in each trial. Let $S_{\mathrm{PASAT}}$ be the number of correct sums. Then the standardized score for the cognitive component of the MSFC is defined as:

$$Z_{\mathrm{PASAT}} = \frac{S_{\mathrm{PASAT}} - \mu_{\mathrm{base}}(S_{\mathrm{PASAT}})}{\sigma_{\mathrm{base}}(S_{\mathrm{PASAT}})},$$

where the baseline mean and SD are selected as in T25FW. In the representative Task Force database, $\mu_{\mathrm{base}}(S_{\mathrm{PASAT}}) = 45.0311$ and $\sigma_{\mathrm{base}}(S_{\mathrm{PASAT}}) = 12.0771$.

The MSFC score is then defined to be the average of the standardized subscores:

$$\text{MSFC score} = \frac{1}{3}\left(Z_{\mathrm{T25FW}} + Z_{\mathrm{9HPT}} + Z_{\mathrm{PASAT}}\right).$$

6.2.2 Data and model

We revisit the MBP8298 trial, and seek to identify clusters of absolute responders and non-responders ($K = 2$) created by the three longitudinal components of the MSFC: PASAT, T25FW and 9HPT. That is, we will fit our MLOMM, ignoring the treatment assignments. We examine how the MSFC values differ between the absolute responders and non-responders identified based on MLOMM. In the calculation of MSFC, we use the baseline mean and baseline SD from the representative Task Force database. For the calculation of MSFC, we make one modification for censored T25FW data: instead of substituting $S_{\mathrm{T25FW}} = 165.7948$ for the censored T25FW, we substitute the censored value 180. This is because the maximum $S_{\mathrm{T25FW}}$ observed in our dataset is greater than 165.7948.

Let $Y^{(1)}_{i,j}$ be the number of incorrect answers among the 60 mathematical tests in PASAT at the $j$th follow-up measurement from the $i$th patient. That is, $Y^{(1)}_{i,j}$ is defined as $60 - S_{\mathrm{PASAT}}$ at each time point for each patient. The number of incorrect answers is used so that larger values indicate disease worsening in all of the outcomes of MLOMM.
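For reference, the MSFC standardization reviewed in Section 6.2.1 can be sketched as follows, using the Task Force baseline means and SDs quoted in the text. The function and dictionary names are illustrative assumptions, and the handling of censored trials is left out of this sketch.

```python
# Task Force baseline (mean, SD) for each component, as quoted in Section 6.2.1.
TASK_FORCE = {
    "t25fw": (9.5353, 11.4058),
    "9hpt": (0.0439, 0.0101),
    "pasat": (45.0311, 12.0771),
}


def z_t25fw(trial_times):
    """Leg component: negated z-score of the average of the two walk times (sec)."""
    s = sum(trial_times) / 2.0
    mu, sd = TASK_FORCE["t25fw"]
    return -(s - mu) / sd


def z_9hpt(left_times, right_times):
    """Arm component: z-score of the mean reciprocal of the average hand times."""
    s = 0.5 * (1.0 / (sum(left_times) / 2.0) + 1.0 / (sum(right_times) / 2.0))
    mu, sd = TASK_FORCE["9hpt"]
    return (s - mu) / sd


def z_pasat(n_correct):
    """Cognitive component: z-score of the number of correct sums (out of 60)."""
    mu, sd = TASK_FORCE["pasat"]
    return (n_correct - mu) / sd


def msfc_score(trial_times, left_times, right_times, n_correct):
    """Average of the three standardized subscores."""
    return (z_t25fw(trial_times) + z_9hpt(left_times, right_times)
            + z_pasat(n_correct)) / 3.0


# An average walk time of 165.7948 sec gives the floor value of about -13.7.
z_floor = z_t25fw([165.7948, 165.7948])
```

With these definitions, a smaller MSFC score corresponds to worse function on every component, which is the opposite orientation to the MLOMM outcomes defined in this section.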
As in Section 6.1, $j = -1$ indicates the screening measure and $j = 0$ indicates the baseline. The positive values of $j = 1, 2, \dots, 12$ indicate the follow-up measurements. As in Section 6.1, we do not model the screening measure $j = -1$ and use it only as a covariate. Let $Y^{(1)}_i = (Y^{(1)}_{i,0}, \dots, Y^{(1)}_{i,12})^T$. The conditional distribution of $Y^{(1)}_{i,j}$ is assumed to be binomial with size parameter 60, given the previous measurements $y^{(1)}_{i,0:j}$, the RE $b^{(1)}_i$ and the cluster label $c_i$. Then, the conditional mean of the proportion of incorrect answers $Y^{(1)}_{i,j}/60$ is modelled as $\mathrm{logit}(\mu^{(1)[c_i]}_{i,j}) = \eta^{(1)[c_i]}_{i,j}$, and:

$$\eta^{(1)[c_i]}_{i,j} = \beta^{(1)}_0 + b^{(1)}_i + \beta^{(1)}_1 \ln(y^{(1)}_{i,j-1} + 1) + \beta^{(1)[c_i]}_2 w_j,$$

where $j = 0, 1, \dots, 12$ and $w_j$ is 0 at baseline and the number of years since the baseline for follow-up measures. The cluster difference arises through the linear time effect on the conditional expectation of the outcome values, and we assume that the mean of the current outcome depends on the previous outcome value. There are no missing values at screening. Among the 1414 scheduled measures after screening, 1324 (93.6%) of $Y^{(1)}_{i,j}$ are non-missing. For simplicity, we will treat the missing values after screening as missing at random and exclude these from our analysis.

Let $Y^{(2)}_{i,j}$ be the average time to complete the 25-foot walk over the two trials for the $i$th patient at the $j$th follow-up on the log-log scale. That is, $Y^{(2)}_{i,j} = \ln \ln S_{\mathrm{T25FW}}$ at each time point for each patient. Similarly, let $Y^{(3)}_{i,j}$ be the average time to complete the four 9HPT trials for the $i$th patient at the $j$th follow-up on the log-log scale. That is, $Y^{(3)}_{i,j} = \ln \ln\left(\sum_j (T_{\mathrm{9HPT},L,j} + T_{\mathrm{9HPT},R,j})/4\right)$ at each time point for each patient. (The log is taken twice for both T25FW and 9HPT to improve the fit. Taking the log twice is possible as all these original measures are well above 1 second.) To compute the trial averages for both T25FW and 9HPT, each of the censored trials is replaced with the censored time.
We take this ad-hoc approach to the administratively censored data, as the proportions of censored measurements are very small: for T25FW, among 1414 scheduled measurements, 1332 (94.2%) $Y^{(2)}_{i,j}$ are non-missing, and among the non-missing measurements 52 (3.9%) correspond to all trials having been censored; for 9HPT, among 1414 scheduled measurements, 1336 (94.5%) $Y^{(3)}_{i,j}$ are non-missing, and among the non-missing measurements 2 (0.1%) correspond to all trials having been censored. Extending our MLOMM for censored data is an interesting question for future work.

As for $y^{(1)}_{i,-1}$, we do not model the screening readings $y^{(r)}_{i,-1}$ ($r = 2, 3$), and let $Y^{(r)}_i = (Y^{(r)}_{i,0}, \dots, Y^{(r)}_{i,12})^T$. Then $Y^{(r)}_{i,j} \mid y^{(r)}_{i,0:j}, y^{(1:r)}_i, b^{(r)}_i, c_i$ is modelled to be from a normal distribution with mean $\mu^{(r)[c_i]}_{i,j} = \eta^{(r)[c_i]}_{i,j}$ and variance $\kappa^{(r)}$, where:

$$\eta^{(r)[c_i]}_{i,j} = \beta^{(r)}_0 + b^{(r)}_i + \beta^{(r)}_1 y^{(r)}_{i,j-1} + \beta^{(r)[c_i]}_2 w_j,$$

for $j = 0, 1, \dots, 12$ and $r = 2, 3$. As in the model for $Y^{(1)}_{i,j}$, the cluster difference arises through the linear time effect on the conditional mean. The coefficient $\beta^{(r)}_1$ measures how much the previous measure affects the current conditional mean.

6.2.3 Results

We fit our MLOMM using all three outcomes (namely TRIPLE), and using each outcome separately (namely SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT). The MCEM parameters are selected as $C_{\mathrm{stop}} = 0.005$, $\alpha = 0.1$, $\beta = 0.3$ and $\gamma = 0.1$. The MCEM algorithm performed well, and its details are discussed in Appendix C.2.

Table 6.3 shows the point estimates of SINGLE-PASAT, SINGLE-T25FW, SINGLE-9HPT and TRIPLE. TRIPLE and all SINGLEs yield similar parameter estimates for $\beta^{(1)}_0$, $\beta^{(1)}_1$, $\beta^{(1)[1]}_2$, $\beta^{(2)[1]}_2$, $\beta^{(3)[1]}_2$, $\kappa^{(2)}$, $\kappa^{(3)}$, $\sigma_1$ and $\sigma_3$. However, the estimates of the probability of being a non-responder, $\pi$, from the four procedures differ somewhat, ranging between 0.868 and 0.951. Similarly, the estimates of the cluster-specific coefficients seem somewhat different between each SINGLE and TRIPLE, especially $\beta^{(r)[2]}_2$, $r = 1, 2, 3$. Nevertheless, results of all the SINGLE and TRIPLE procedures indicate that cluster-specific coefficients are significantly different.

| Outcome | Parameter | TRIPLE | SINGLE-PASAT | SINGLE-T25FW | SINGLE-9HPT |
|---|---|---|---|---|---|
| PASAT | $\beta^{(1)}_0$ | -2.763 (0.132) | -2.753 (0.183) | | |
| | $\beta^{(1)}_1$ | 0.209 (0.019) | 0.196 (0.018) | | |
| | $\beta^{(1)[1]}_2$ | †† -0.234 (0.021) | †† -0.243 (0.019) | | |
| | $\beta^{(1)[2]}_2$ | 0.372 (0.056) | 0.405 (0.042) | | |
| T25FW | $\beta^{(2)}_0$ | 0.590 (0.034) | | 0.682 (0.036) | |
| | $\beta^{(2)}_1$ | 0.320 (0.025) | | 0.200 (0.025) | |
| | $\beta^{(2)[1]}_2$ | †† 0.026 (0.007) | | †† 0.022 (0.006) | |
| | $\beta^{(2)[2]}_2$ | 0.138 (0.016) | | 0.239 (0.018) | |
| | $\kappa^{(2)}$ | 0.013 (3e-04) | | 0.012 (2e-04) | |
| 9HPT | $\beta^{(3)}_0$ | 0.989 (0.042) | | | 1.059 (0.044) |
| | $\beta^{(3)}_1$ | 0.186 (0.035) | | | 0.126 (0.035) |
| | $\beta^{(3)[1]}_2$ | †† 0.003 (0.004) | | | †† 0.004 (0.003) |
| | $\beta^{(3)[2]}_2$ | 0.076 (0.007) | | | 0.166 (0.013) |
| | $\kappa^{(3)}$ | 0.003 (6e-05) | | | 0.002 (4e-05) |
| RE | $\sigma_1$ | 1.571 (0.105) | 1.594 (0.162) | | |
| | $\sigma_2$ | 0.195 (0.021) | | 0.224 (0.020) | |
| | $\sigma_3$ | 0.091 (0.008) | | | 0.097 (0.009) |
| | $\rho_{12}$ | 0.179 (0.104) | | | |
| | $\rho_{13}$ | 0.231 (0.103) | | | |
| | $\rho_{23}$ | 0.444 (0.086) | | | |
| Cluster | $\pi$ | 0.878 (0.052) | 0.885 (0.040) | 0.868 (0.043) | 0.951 (0.028) |
| Fit | llk | -578.85 | -3026.17 | 743.03 | 1745.21 |
| | AIC | 1199.70 | 6064.33 | -1472.06 | -3476.41 |
| | M | 1883 | 2089 | 1717 | 3099 |
| | Itr | 67 | 61 | 46 | 54 |

Table 6.3: The model estimates of TRIPLE and SINGLE. Labels next to $\beta^{(r)[1]}_l$ indicate the p-value of the asymptotic Z-test for the hypothesis $\beta^{(r)[1]}_l = \beta^{(r)[2]}_l$; †† indicates that the p-value is less than 0.01. The abbreviations are: llk: log-likelihood values at the final estimates; AIC: Akaike Information Criterion; M: the final MCEM sample size; Itr: the number of iterations required before convergence.
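A rough version of the asymptotic Z-test for equal cluster-specific coefficients can be sketched from the reported estimates and standard errors. This sketch assumes zero covariance between the two estimates, which is a simplification made here because the covariances are not reported; the test used for Table 6.3 may therefore give somewhat different values. The function name is illustrative.

```python
import math


def z_test_equal(est1, se1, est2, se2):
    """Two-sided Z-test of est1 == est2, assuming the estimates are uncorrelated."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# TRIPLE estimates (standard errors) of the cluster-specific time effects
# for cluster 1 and cluster 2, taken from Table 6.3.
triple_time_effects = {
    "PASAT": (-0.234, 0.021, 0.372, 0.056),
    "T25FW": (0.026, 0.007, 0.138, 0.016),
    "9HPT": (0.003, 0.004, 0.076, 0.007),
}

z_tests = {name: z_test_equal(*vals) for name, vals in triple_time_effects.items()}
```

Under this simplification, all three outcomes give p-values well below 0.01, in line with the †† labels in the table.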
TRIPLE indicates that all outcomes are decreasing or nearly not changing over time for $C_i = 1$ (i.e., $\beta^{(1)[1]}_2 \ll 0$, $\beta^{(2)[1]}_2 \approx 0$, and $\beta^{(3)[1]}_2 \approx 0$), and increasing over time for $C_i = 2$ (i.e., $\beta^{(1)[2]}_2 \gg 0$, $\beta^{(2)[2]}_2 \gg 0$, and $\beta^{(3)[2]}_2 \gg 0$).

Figure 6.4 shows the sample Spearman correlation matrix and the Spearman correlation matrix estimated from the TRIPLE model fit. The estimation procedure is as in Section 6.1.3. Both the sample estimate and the model estimate show strong correlations within measurements of each outcome. Overall, both correlation estimates are comparable, although our model yields lower estimates for the correlations among the PASAT and 9HPT measurements than those in the data.

Figure 6.4: The Spearman correlation matrices across 39 repeated measurements (13 from PASAT, 13 from T25FW and 13 from 9HPT) obtained from: (a) TRIPLE (Model), and (b) the raw data (Sample). P:j, T:j and H:j indicate the jth PASAT, T25FW and 9HPT outcome, respectively.

In all SINGLE and TRIPLE procedures, the cluster-specific time effect of cluster 1 is smaller than that of cluster 2, and the latter indicates a highly significant increase over time. Hence, the large point estimate of $\pi$ (> 0.8) indicates that cluster 2 is a minority cluster containing worsening patients (Table 6.3). Hereafter, we refer to these worsening patients in the second cluster as absolute responders.

Absolute responders: worsening patients

Figure 6.5 shows the trajectories of the observed values of PASAT, of average T25FW on the log-log scale, and of average 9HPT on the log-log scale for each patient, grouped by the non-worsening and worsening clusters. The classifications are based on the results from TRIPLE (first row), SINGLE-PASAT (second row), SINGLE-T25FW (third row) and SINGLE-9HPT (fourth row). On average, patients classified as worsening by the TRIPLE procedure have increasing trends for all the outcomes. Similarly, the worsening patients from the SINGLE-PASAT and the SINGLE-9HPT procedures show increasing trends for all the outcomes. On the other hand, the worsening patients from the SINGLE-T25FW procedure have a clear increasing trend with respect to the T25FW outcome, but these worsening patients do not have as clear increasing trends for the other two outcomes.

Figure 6.5: The trajectories of the observed values of (a) PASAT, (b) T25FW on the log-log scale and (c) 9HPT on the log-log scale for each patient, grouped by non-worsening and worsening clusters. The classifications are based on the results from TRIPLE (first row), SINGLE-PASAT (second row), SINGLE-T25FW (third row) and SINGLE-9HPT (fourth row). The average at each time point for each cluster is represented by the bold dotted curve. The dotted horizontal lines for T25FW and 9HPT show their maximum values.

Figure 6.6 shows the Venn diagram of worsening patients found by the four procedures. The numbers of worsening patients identified by TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT are 11, 9, 12 and 5, respectively. All the worsening patients identified by TRIPLE are also identified as worsening by at least one of the SINGLE procedures. All 5 worsening patients identified by SINGLE-9HPT are also identified as worsening by TRIPLE. Similarly, all but 2 of the 9 worsening patients identified by SINGLE-PASAT are also identified as worsening by TRIPLE. As most of the worsening patients identified by SINGLE-PASAT or SINGLE-9HPT are identified as worsening patients by TRIPLE, it makes sense that they have increasing trends in all the outcomes, as seen in Figure 6.5.

Figure 6.6: The Venn diagram of worsening patients found by the four procedures. The number in parentheses indicates the total number of worsening patients identified by each procedure.

Seven of the 12 patients identified as worsening by SINGLE-T25FW are not identified as worsening by any of the other procedures.
Figure 6.7 (a) shows the trajectories of each outcome and of MSFC for these patients. These 7 patients have an increasing linear trend in T25FW but not in PASAT or 9HPT. Two patients are identified as worsening by all the procedures except SINGLE-PASAT. These patients have increasing linear trends from the baseline in T25FW and 9HPT, but not in the PASAT (Figure 6.7 (b)). Two patients (Patients 73 and 89) are identified as worsening by all procedures except SINGLE-T25FW. The estimated posterior probabilities of being a responder from SINGLE-T25FW for Patients 73 and 89 are 0.040 (95% CI: 0.033 to 0.048) and 0.464 (95% CI: 0.431 to 0.497), respectively. Although the T25FW measures of patient 73 have a mild increasing trend over time from the baseline (Figure 6.7 (c)), the trend was not strong enough for that patient to be classified as a responder. Although the last few T25FW measures of patient 89 are increasing, the first few ($0 \le j < 7$) are decreasing; as a result, this patient is not identified as worsening by SINGLE-T25FW.

Figure 6.8 shows the trajectories of the observed values of MSFC for each patient grouped by the worsening and non-worsening clusters from TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT. As smaller values of MSFC correspond to worsening, the worsening patients identified by all four procedures tend to have decreasing MSFC trends. One might argue that we could have treated MSFC as a single outcome and then fit our single-longitudinal outcome mixture model to identify responders. Although such an analysis is possible in principle, the distribution of this longitudinal outcome may be difficult to model, as each MSFC is the average of standardized versions of continuous and discrete measurements with arbitrary adjustments for censored measures.

The previous clinical trial analysis used the change in MSFC between the 24th month follow-up and the baseline as one of the secondary endpoints.
Figure 6.9 plots the estimated posterior probabilities of being in the worsening group versus the change in MSFC between the final follow-up measure and the baseline. It also shows their Spearman correlation. The posterior probabilities of being in the worsening group estimated from SINGLE-T25FW are the most strongly correlated with the change in MSFC, followed by the posterior probabilities estimated from TRIPLE, while those from SINGLE-PASAT and SINGLE-9HPT are not as strongly correlated with the change in MSFC. Patients 8, 23, 32, 84 and 86 are in the bottom-left corner of Figure 6.9 (a), and they are not identified as responders by TRIPLE but have a relatively large negative change in MSFC scores (< −3). As seen in Figure 6.7 (a), these 5 patients are identified as responders only by SINGLE-T25FW. The figure also reveals that the final measure of T25FW is censored for all of these patients (hence, the value of 180 is imputed for these patients in Figure 6.7). When the final T25FW measure is censored, the change in MSFC tends to be more affected by the T25FW outcome than are the results of TRIPLE, even if there is little change with respect to 9HPT and PASAT over time.

Figure 6.7: The trajectories of T25FW and 9HPT on the log-log scale, and of PASAT and MSFC over $j = -1, 0, \dots, 12$ for example patients: (a) patients identified as worsening by SINGLE-T25FW but not by any of the other procedures; (b) patients identified as worsening by all the procedures except SINGLE-PASAT; (c) patients identified as worsening by all the procedures except SINGLE-T25FW. The dotted horizontal lines for T25FW and 9HPT show their maximum values. The legends show patient IDs.

Figure 6.8: The trajectories of the MSFC scores grouped by clusters from TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT.
This may be one reason why SINGLE-T25FW is more correlated with the change in MSFC than is TRIPLE.

Figure 6.9: The estimated posterior probabilities of being in the worsening group from the four procedures, (a) TRIPLE, (b) SINGLE-PASAT, (c) SINGLE-T25FW and (d) SINGLE-9HPT, versus the change in MSFC between the final follow-up measure and the baseline. "cor" shows the Spearman correlation between the change in MSFC scores and the estimated posterior probabilities.

Finally, among the 11, 9, 12 and 5 worsening patients identified by TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT, 5, 4, 7 and 3 are from the MBP8298 treatment arm. Fisher exact tests for the equality of responder proportions between treatment and placebo arms indicate that no procedure yields a significant difference in the proportions. The p-values are: 0.751, 0.730, 0.767 and 1.000 for the results from TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT, respectively. These results indicate that the treatment may have no effect on preventing worsening. This result agrees with the previous clinical paper that found no significant benefit of MBP8298 [20].

In summary, this analysis demonstrated how our MLOMM can be used to identify the subset of worsening patients using two continuous outcomes and one discrete outcome. The worsening patients identified by TRIPLE exhibited deterioration in all three outcomes, while the SINGLE procedures identify worsening patients based on only a single longitudinal outcome. In this example, the worsening patients identified by TRIPLE were always identified by one of the SINGLE procedures. However, not all the patients identified as worsening by one of the SINGLE procedures were identified as worsening by TRIPLE.
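The Fisher exact tests reported above for the treatment and placebo arms can be sketched in plain Python from the counts given in the text (54 treated and 47 placebo patients, and the number of worsening patients per arm). The implementation below uses the common two-sided definition that sums the probabilities of all tables no more likely than the observed one; that this matches the exact variant used in the analysis is an assumption, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k first-column counts in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed table.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))


N_TREATED, N_PLACEBO = 54, 47
# procedure: (total worsening patients, worsening patients on the treatment arm)
worsening = {
    "TRIPLE": (11, 5),
    "SINGLE-PASAT": (9, 4),
    "SINGLE-T25FW": (12, 7),
    "SINGLE-9HPT": (5, 3),
}

p_values = {}
for name, (total, treated) in worsening.items():
    placebo = total - treated
    p_values[name] = fisher_exact_two_sided(
        treated, N_TREATED - treated, placebo, N_PLACEBO - placebo
    )
```

Each table has the arms as rows and the worsening status as columns, so the test compares the proportion of worsening patients between the two arms.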
Although both MSFC and our MLOMM data analysis handle the censored T25FW values in the same way (i.e., imputing the censored time), the two analyses show quite different results: the estimated posterior probability of being a responder by TRIPLE and the change in the MSFC between the final and baseline measurements do not necessarily have high correlation, because the change in MSFC tends to be affected more by the censored T25FW than our MLOMM procedure.

6.3 Simulation studies

The purpose of this simulation study is to compare the performance of our procedures based on multiple longitudinal outcomes with that based on a single longitudinal outcome. The performance is assessed in terms of the parameter estimation as well as the classification performance of relative responders in three scenarios. In Scenario 1, all three outcomes are effective, and we generate datasets from TRIPLE used in Section 6.2 with parameters set to the MLEs reported in Table 6.3. In Scenario 2, one outcome (T25FW) is ineffective, and we generate datasets as for Scenario 1 except that the simulation model has $\beta^{(2)[1]}_2 = \beta^{(2)[2]}_2$. In Scenario 3, two outcomes (T25FW and PASAT) are ineffective, and the simulation model of Scenario 1 is modified by setting $\beta^{(r)[1]}_2 = \beta^{(r)[2]}_2$, $r = 1, 2$.

In all scenarios, the numbers of placebo and treated patients in the simulated data are set to be the same as in the MBP8298 MS trial: 47 and 54, respectively. The outcome values at screening of the 101 patients in our simulated data, $y^{(r)}_{i,-1}$ for $r = 1, 2, 3$, are set to be the same as those observed in the MBP8298 MS trial. We generate data so that the treatment arm is expected to contain 12.2% of relative responders (i.e., worsening patients). For each generated dataset, we fit TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT in Scenario 1.
We only fit TRIPLE in Scenarios 2 and 3, as the SINGLE procedure based on an ineffective outcome does not have power to identify responders under these scenarios, and the performance of the SINGLE procedure based on each effective outcome would be the same as in Scenario 1. The dataset is generated 100 times for each scenario.

Table 6.4 shows the mean point estimates and root mean square errors (RMSEs) over the 100 simulated datasets in all scenarios. For all parameters, the RMSEs of TRIPLE are smaller than, or at most negligibly larger than, the RMSEs of any SINGLE procedure in all three scenarios. For the responder-specific parameters, i.e., $\beta^{(r)[2]}_2$, $r = 1, 2, 3$, and $\pi$, the RMSEs of TRIPLE are substantially smaller than the RMSEs of any of the SINGLE procedures, and this trend is more evident in Scenario 1 than in Scenario 2. In Scenario 3, the RMSEs of these parameters from TRIPLE and SINGLE-9HPT are very similar. For the rest of the parameters, RMSE differences between TRIPLE and SINGLE are small in all scenarios.

Table 6.4 also reports the classification performance of each procedure. The performance is measured by three criteria: 1) the average proportion of identified responders among true responders (i.e., sensitivity), 2) the average proportion of identified non-responders among true non-responders in the treatment arm (i.e., specificity), and 3) the average proportion of placebo patients who would be assigned to the non-responder group if they were classified, namely placebo-specificity.

In Scenario 1, we find that the best sensitivity, specificity and placebo-specificity are achieved by TRIPLE. This indicates that incorporating multiple outcomes does improve the identification of responders in this simulation setting. This result corresponds to TRIPLE yielding the smallest RMSEs for the responder-specific parameters.
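The three classification criteria and the RMSE described above can be sketched for a single simulated dataset as follows. This is a minimal illustration rather than the thesis code: the function names `classification_criteria` and `rmse` are hypothetical, and classifying a patient as a responder when the estimated posterior probability exceeds 0.5 is an assumption made here for concreteness. The averages reported in Table 6.4 would then correspond to means of these quantities over the 100 simulated datasets.

```python
def classification_criteria(post_treated, is_responder, post_placebo, cutoff=0.5):
    """Return (sensitivity, specificity, placebo_specificity) for one dataset.

    post_treated : estimated posterior responder probabilities, treated patients
    is_responder : true responder indicators for the same treated patients
    post_placebo : posterior probabilities placebo patients would receive
    cutoff       : classification threshold (an assumption, not from the text)
    """
    flagged = [p > cutoff for p in post_treated]
    n_resp = sum(is_responder)
    n_non = len(is_responder) - n_resp
    true_pos = sum(f and r for f, r in zip(flagged, is_responder))
    true_neg = sum((not f) and (not r) for f, r in zip(flagged, is_responder))
    sensitivity = true_pos / n_resp if n_resp else float("nan")
    specificity = true_neg / n_non if n_non else float("nan")
    placebo_specificity = sum(p <= cutoff for p in post_placebo) / len(post_placebo)
    return sensitivity, specificity, placebo_specificity


def rmse(estimates, true_value):
    """Root mean square error of point estimates across simulated datasets."""
    return (sum((e - true_value) ** 2 for e in estimates) / len(estimates)) ** 0.5


# Toy inputs, made up for illustration only.
sens, spec, pspec = classification_criteria(
    post_treated=[0.9, 0.2, 0.7, 0.1, 0.4],
    is_responder=[True, True, False, False, False],
    post_placebo=[0.1, 0.6, 0.3, 0.2],
)
print(sens, spec, pspec, rmse([1.0, 3.0], 2.0))
```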
In Scenario 2, we observe that even though one of the outcomes is now ineffective, TRIPLE still performs better than SINGLE-PASAT and SINGLE-9HPT in terms of all three criteria. In Scenario 3, where only the 9HPT outcome is effective, the classification performance of TRIPLE is very similar to that of SINGLE-9HPT in terms of all three criteria, indicating that incorporating multiple ineffective outcomes does not hurt the classification performance.

6.4 Conclusions

In this chapter, we demonstrated our novel procedure introduced in Chapter 5 on two MS clinical trial datasets. Since the lenercept clinical trial employed NAL counts as the primary outcome and the PAL counts as a secondary outcome, we used these outcomes to identify the relative responders. Our bivariate longitudinal mixture model used the NB and the binomial GLM for the conditional distribution of the NAL counts and the PAL counts, respectively. The responders identified based only on NAL counts and based on both NAL and PAL counts are almost identical, as the PAL counts did not exhibit clear cluster differences. Yet the posterior probability estimates based on both outcomes differ somewhat from those based on only the NAL counts. This data analysis also demonstrated that one can let the data decide how treatments affect the outcomes of responders over time by evaluating the goodness of fit of the mixture model under various scenarios for the conditional mean structures of the outcomes. This would be useful as it is often not clear in advance how the treatment affects the mean outcomes of responders over time. We used AIC to assess the fit. However, if the sample size is not large, one might want to consider other goodness of fit measures such as AICc [8].

Since the MBP8298 clinical trial employed the MSFC as a clinical outcome, we used the three original longitudinal components of the MSFC to identify absolute responders.
Our trivariate longitudinal mixture model had normal models for the conditional distributions of the two continuous components of the MSFC and the binomial GLM for the conditional distribution of the remaining component. The absolute responders (worsening patients) identified based on all three components were always identified by at least one of our procedures based on the single longitudinal components of the MSFC.

Table 6.4: The mean point estimates and RMSEs (in parentheses) across 100 simulated datasets for three scenarios. A "-" marks a cell that does not apply.

Scenario 1
Parameter               True     TRIPLE           SINGLE-PASAT     SINGLE-T25FW     SINGLE-9HPT
PASAT
$\beta^{(1)}_0$         -2.763   -2.792 (0.170)   -2.793 (0.169)   -                -
$\beta^{(1)}_1$         0.209    0.209 (0.023)    0.210 (0.024)    -                -
$\beta^{(1)[1]}_2$      -0.234   -0.231 (0.024)   -0.232 (0.024)   -                -
$\beta^{(1)[2]}_2$      0.372    0.347 (0.134)    0.276 (0.246)    -                -
T25FW
$\beta^{(2)}_0$         0.590    0.589 (0.026)    -                0.588 (0.027)    -
$\beta^{(2)}_1$         0.320    0.318 (0.021)    -                0.321 (0.023)    -
$\beta^{(2)[1]}_2$      0.026    0.025 (0.005)    -                0.025 (0.008)    -
$\beta^{(2)[2]}_2$      0.138    0.134 (0.028)    -                0.090 (0.077)    -
$\kappa^{(2)}$          0.013    0.013 (0.001)    -                0.014 (0.001)    -
9HPT
$\beta^{(3)}_0$         0.989    0.983 (0.031)    -                -                0.982 (0.033)
$\beta^{(3)}_1$         0.186    0.188 (0.024)    -                -                0.189 (0.025)
$\beta^{(3)[1]}_2$      0.003    0.003 (0.003)    -                -                0.002 (0.003)
$\beta^{(3)[2]}_2$      0.076    0.073 (0.016)    -                -                0.061 (0.031)
$\kappa^{(3)}$          0.003    0.003 (1e-04)    -                -                0.003 (1e-04)
RE
$\sigma_1$              1.571    1.561 (0.110)    1.559 (0.111)    -                -
$\sigma_2$              0.195    0.193 (0.014)    -                0.193 (0.015)    -
$\sigma_3$              0.091    0.091 (0.008)    -                -                0.091 (0.008)
$\rho_{12}$             0.179    0.175 (0.093)    -                -                -
$\rho_{13}$             0.231    0.258 (0.109)    -                -                -
$\rho_{23}$             0.444    0.456 (0.095)    -                -                -
$\delta$                0.878    0.872 (0.083)    0.808 (0.156)    0.664 (0.277)    0.785 (0.177)
Sensitivity             -        94.1             67.2             72.0             82.9
Specificity             -        97.9             90.1             72.3             87.7
Placebo-Specificity     -        98.6             91.7             76.7             90.0

Scenarios 2 and 3 (TRIPLE only)
Parameter               S2 true   S2 TRIPLE        S3 true   S3 TRIPLE
PASAT
$\beta^{(1)}_0$         -2.763    -2.792 (0.171)   -2.763    -2.770 (0.168)
$\beta^{(1)}_1$         0.209     0.209 (0.023)    0.209     0.209 (0.024)
$\beta^{(1)[1]}_2$      -0.234    -0.231 (0.026)   0.069     0.070 (0.019)
$\beta^{(1)[2]}_2$      0.372     0.335 (0.160)    0.069     0.056 (0.121)
T25FW
$\beta^{(2)}_0$         0.590     0.592 (0.026)    0.590     0.593 (0.026)
$\beta^{(2)}_1$         0.320     0.318 (0.021)    0.320     0.317 (0.021)
$\beta^{(2)[1]}_2$      0.082     0.082 (0.005)    0.082     0.082 (0.005)
$\beta^{(2)[2]}_2$      0.082     0.079 (0.022)    0.082     0.082 (0.022)
$\kappa^{(2)}$          0.013     0.013 (0.001)    0.013     0.013 (0.001)
9HPT
$\beta^{(3)}_0$         0.989     0.984 (0.030)    0.989     0.983 (0.032)
$\beta^{(3)}_1$         0.186     0.188 (0.024)    0.186     0.189 (0.025)
$\beta^{(3)[1]}_2$      0.003     0.003 (0.003)    0.003     0.002 (0.003)
$\beta^{(3)[2]}_2$      0.076     0.072 (0.018)    0.076     0.061 (0.032)
$\kappa^{(3)}$          0.003     0.003 (1e-04)    0.003     0.003 (1e-04)
RE
$\sigma_1$              1.571     1.560 (0.110)    1.571     1.557 (0.111)
$\sigma_2$              0.195     0.193 (0.015)    0.195     0.193 (0.015)
$\sigma_3$              0.091     0.091 (0.008)    0.091     0.091 (0.008)
$\rho_{12}$             0.179     0.174 (0.092)    0.179     0.174 (0.092)
$\rho_{13}$             0.231     0.257 (0.109)    0.231     0.256 (0.108)
$\rho_{23}$             0.444     0.455 (0.096)    0.444     0.455 (0.096)
$\delta$                0.878     0.864 (0.099)    0.878     0.781 (0.198)
Sensitivity             -         92.0             -         84.1
Specificity             -         96.8             -         86.3
Placebo-Specificity     -         97.9             -         89.3

In both MS data applications, the model estimate of the Spearman correlation matrix was reasonably similar to the sample Spearman correlation matrix, indicating that our models specify the dependence structures of the two datasets reasonably well. Although the longitudinal data structures from the lenercept trial and the MBP8298 trial are very different, our general mixture model can be applied for identifying responders in both of these contexts.

The simulation study, designed based on the fitted results of our MLOMM to the MBP8298 trial, showed that our MLOMM based on all three longitudinal outcomes (i.e., TRIPLE) is better than any of the MLOMMs based on only one of the longitudinal outcomes (i.e., SINGLE) in terms of correct identification rates of the relative responders and non-responders. Indeed, the responder identification performance of TRIPLE was better than that of any SINGLE procedure even when one of the outcomes is ineffective, and as good as the best of the SINGLE procedures when only one outcome is effective, indicating that incorporation of ineffective outcomes does not hurt the classification performance.

Chapter 7

Discussion

Chapters 2 and 3 of this thesis introduced a procedure to assist DSMBs in identifying patients with unexpected increases in CEL activity during MS clinical trials. We extended the YZ procedure that used the CPI based on a mixed effect NB model for longitudinally collected CEL counts, and developed a more flexible mixed effect regression model that allowed incorporating prior knowledge via meta analysis.
Our novel procedures were demonstrated with 10 MS clinical trial datasets and assessed via simulation studies. In Chapters 4, 5 and 6, we discussed the issue of identifying treatment responders based on multiple longitudinal outcomes in completed clinical trials. We introduced our novel multiple longitudinal outcome mixture model (MLOMM) that, conditionally on the cluster label, models each longitudinal outcome to be from either a GLMM or the NB mixed effect model, as they are arguably the most popular longitudinal models in the clinical trial applications of interest to us. Our novel procedures were demonstrated with two MS clinical trial datasets, and assessed via a simulation study. This chapter briefly discusses potential directions of future work on our MLOMM procedure for identifying responders.

First, the MLOMM developed in Chapter 5 is a rich multivariate model that can be applied to multiple longitudinal outcomes, each of which has a GLMM structure conditional on a cluster label, and it would be of interest to apply our model to more datasets. Our data analyses in Chapter 6 demonstrated its usage in two application contexts: when, conditionally on the cluster label, the joint distribution of two longitudinal outcomes is modelled with a binomial GLMM and a NB GLMM (Section 6.1), and when the joint distribution of three longitudinal outcomes is modelled with a binomial GLMM and two normal GLMMs (Section 6.2). As future work, we may apply our model to application contexts that require different GLMM structures. For example, in MS clinical trials, we might use our model to identify MRI responders, comprehensively defined by all the important MRI outcomes such as various types of lesion counts and burden of disease (BOD). Such an analysis would be of interest as many have attempted to define MRI responders (e.g., [67, 74]).
Although BOD would, in most cases, be measured on a different schedule than the lesion counts, such a data structure can be handled by our MLOMM with no additional complications. As BOD is a continuous outcome, we might use a gamma or normal GLMM for the conditional distribution of the BOD.

Second, it would be useful to improve the estimation scheme for our MLOMM. Section 5.4 described our estimation scheme based on the MCEM algorithm. One limitation of this scheme is computation time. To guarantee the increase in observed likelihood at every iteration of the MCEM algorithm, the MC sample size must be reasonably large, especially at the later MCEM iterations. However, a large MC sample size results in a larger generated dataset which, in turn, is used in the M-step. As our current implementation of the M-step uses the R function glm on the data generated in the E-step, the size of the generated data cannot be too large. We found that a large MC sample size is required particularly when the cluster separations are unclear, as the likelihood is relatively flat. Although this issue may diminish in importance as computer power continues to develop, improvement of computational time is a fruitful direction of future work.

As our MLOMM may be considered as a hierarchical model, estimation may be faster if one employs a Bayesian approach and develops an MCMC algorithm. Although such an approach requires careful development of each MCMC step, sensitivity analyses for the prior specifications, and MCMC convergence analysis, this would be interesting future work. The Bayesian framework has other advantages: one can easily obtain approximate posterior distributions for the unknown parameters, as well as for the classification probability of each patient, while our current frequentist approach only provides the point estimate and a CI.

Alternatively, we may be able to obtain the MLE using the Laplace approximation together with the EM algorithm.
The Laplace approximation uses the Taylor series expansion technique to approximate integrals of certain forms. Many authors have successfully applied the Laplace approximation for evaluating GLMM likelihoods that involve integrations with respect to normally-distributed REs (e.g., [47]), and the accuracy of this approximation has been studied (e.g., [30]).

With the Laplace approximation, we can evaluate the conditional density $f(\tilde{y}_i \mid c_i; \Psi) = \int f(\tilde{y}_i \mid \tilde{b}_i, c_i; \Psi) f(\tilde{b}_i; \Psi_B) \, d\tilde{b}_i$ of our MLOMM, as $f(\tilde{y}_i \mid c_i; \Psi)$ is essentially a GLMM likelihood. Therefore, to implement the EM algorithm, we can redefine the complete likelihood of each patient without the REs as $f_C(\tilde{y}_i, c_i; \Psi)$. At the $s$th iteration of the E-step, the expectation of the complete log-likelihood for each patient is taken only with respect to $C_i \mid \tilde{y}_i; \Psi^{\{s\}}$ as $E_{C_i}(\ln f_C(\tilde{y}_i, C_i; \Psi) \mid \tilde{y}_i; \Psi^{\{s\}})$. The sum of these expectations across patients can be directly evaluated as:

$$\sum_{i=1}^{N} \sum_{k=1}^{K} \Pr(C_i = k \mid \tilde{y}_i; \Psi^{\{s\}}) \left[ \ln f(\tilde{y}_i \mid C_i = k; \Psi) + \ln \pi_k \right], \qquad (7.1)$$

where

$$\Pr(C_i = k \mid \tilde{y}_i; \Psi^{\{s\}}) = \frac{f(\tilde{y}_i \mid C_i = k; \Psi^{\{s\}}) \, \pi_k^{\{s\}}}{\sum_{l=1}^{K} f(\tilde{y}_i \mid C_i = l; \Psi^{\{s\}}) \, \pi_l^{\{s\}}}.$$

The subsequent M-step would numerically maximize (7.1). Developing such an EM algorithm for our MLOMM would be interesting future work.

Lastly, although our model is very general, one may extend it to account for other data structures: extending our model so that it can handle censoring mechanisms would be interesting future work. T25FW and 9HPT, two of the three components of MSFC, are time-to-event outcomes. Our data analysis in Section 6.2 treated the censored values by simply imputing the administrative censoring time since the censoring proportions are low for both outcomes. However, we could model the time-to-event outcome, say $Y^{(r)}_{i,j}$, to be from a popular survival regression model such as the log-normal or Weibull [79] conditionally on $y^{(r)}_{i,1:j}$, $y^{(1:r)}_i$, $b^{(r)}_i$ and $c_i$.
Then in the M-step of our MCEM algorithm, the above conditional likelihood might be maximized using a well-developed algorithm for maximizing the survival regression likelihood. Such an algorithm is implemented in, for example, the survreg function in the R programming language. Extending our model so that it can handle an ordinal longitudinal outcome might also be interesting as there are important ordinal outcomes in MS clinical trials such as the Expanded Disability Status Scale. Extending our model to incorporate RE distributions other than MVN with a zero mean vector and an unstructured variance is another direction of future work. As briefly mentioned in Section 5.3.2, copulas could be used to combine non-normal marginal distributions of REs to develop their joint distribution [29]. Another extension could be incorporating an additional level of hierarchy for the REs as would be required, for example, for cluster-randomized trials. Finally, in quite a different direction, the cluster structures utilized could be generalized so that cluster differences may arise not only from the conditional mean of the outcomes but also from the variance structures of the REs or the dispersion parameters.

Bibliography

[1] K. Abrams and B. Sanso. Approximate Bayesian inference for random effects meta-analysis. Statistics in Medicine, 17:201–218, 1998.
[2] C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2:1152–1174, 1974.
[3] F. Barkhof, U. Held, J. H. Simon, M. Daumer, F. Fazekas, M. Filippi, J. A. Frank, L. Kappos, D. K. B. Li, S. Menzler, D. H. Miller, A. J. Petkau, J. Wolinsky, and Sylvia Lawry Centre for MS Research. Predicting gadolinium enhancement status in MS patients eligible for randomized clinical trials. Neurology, 65:1447–1454, 2005.
[4] D. Blackwell and J. B. MacQueen. Ferguson distributions via Polya urn schemes. Annals of Statistics, 1:353–355, 1973.
[5] J. G. Booth, G. Casella, H. Friedl, and J. P. Hobert.
Negative binomial loglinear mixed models. Statistical Modelling, 3:179–191, 2003.
[6] J. G. Booth and J. P. Hobert. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 61:265–285, 1999.
[7] B. C. Bray, S. T. Lanza, and X. Tan. Eliminating bias in classify-analyze approaches for latent class analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:1–11, 2015.
[8] K. P. Burnham and D. R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York, 2002.
[9] B. S. Caffo, W. Jank, and G. L. Jones. Ascent-based Monte Carlo expectation-maximization. Journal of the Royal Statistical Society, Series B (Methodological), 67:235–251, 2005.
[10] G. Casella and R. L. Berger. Statistical Inference. Duxbury Resource Center, CA, 2002.
[11] J. Chen, D. Zhang, and M. Davidian. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics, 3:347–360, 2002.
[12] G. R. Cutter, M. L. Baier, R. A. Rudick, D. L. Cookfair, J. S. Fischer, A. J. Petkau, K. Syndulko, B. G. Weinshenker, J. P. Antel, C. Confavreux, G. W. Ellison, F. Lublin, A. E. Miller, S. M. Rao, S. Reingold, A. Thompson, and E. Willoughby. Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain, 122:871–882, 1999.
[13] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B (Methodological), 39:1–38, 1977.
[14] M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.
[15] F. Fazekas, F. D. Lublin, D. K. B. Li, M. S. Freedman, H. P. Hartung, and P. Rieckmann. Intravenous immunoglobulin in relapsing-remitting multiple sclerosis: a dose-finding trial.
Neurology, 71:265–271, 2008.
[16] J. S. Fischer, A. J. Jak, J. E. Kniker, R. A. Rudick, and G. Cutter. Multiple Sclerosis Functional Composite: Administration and Scoring Manual. National Multiple Sclerosis Society, 2001.
[17] R. A. Fisher. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10:507–521, 1915.
[18] U.S. Food and Drug Administration. Paving the way for personalized medicine: FDA’s role in a new era of medical product development. http://www.fda.gov/downloads/ScienceResearch/SpecialTopics/PersonalizedMedicine/UCM372421.pdf, 2013. Accessed: 2016-03-30.
[19] Multiple Sclerosis Foundation. Learn about multiple sclerosis. http://www.msfocus.org/Treatments-for-multiple-sclerosis.aspx. Accessed: 2016-03-30.
[20] M. S. Freedman, A. Bar-Or, J. Oger, A. Traboulsee, D. Patry, C. Young, T. Olsson, D. Li, H. P. Hartung, M. Krantz, L. Ferenczi, and T. Verco. A phase III study evaluating the efficacy and safety of MBP8298 in secondary progressive MS. Neurology, 77:1551–1560, 2011.
[21] W. Greene. Functional forms for the negative binomial model for count data. Economics Letters, 99:585–590, 2008.
[22] M. Grzybowski and J. G. Younger. Statistical methodology: III. Receiver operating characteristic (ROC) curves. Academic Emergency Medicine, 4:818–826, 1997.
[23] M. Guan. Incorporating prior information into an approach for detecting unusually large increases in MRI activity in multiple sclerosis patients. Master’s thesis, Department of Statistics, University of British Columbia, Canada, 2012.
[24] P. D. Guh. Procedures for multiple outcome measures with applications to multiple sclerosis clinical trial. Master’s thesis, Department of Statistics, University of British Columbia, Canada, 1997.
[25] J. M. Hilbe. Negative Binomial Regression. Cambridge University Press, Cambridge, 2011.
[26] R. V. Hogg, J. W. McKean, and A. T. Craig. Introduction to Mathematical Statistics.
Pearson, New Jersey, 2005.
[27] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161–173, 2001.
[28] H. Ishwaran and M. Zarepour. Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika, 87:371–390, 2000.
[29] H. Joe. Multivariate Models and Dependence Concepts. Chapman and Hall, London, 1997.
[30] H. Joe. Accuracy of Laplace approximation for discrete response mixed models. Computational Statistics and Data Analysis, 52:5066–5074, 2008.
[31] J. Kang and Y. Yang. Joint modelling of mixed count and continuous longitudinal data. In A. R. de Leon and K. C. Carriere, editors, Analysis of Mixed Data: Method & Applications. Chapman and Hall/CRC, London, 2013. Chapter 4.
[32] L. Kappos, D. K. B. Li, P. A. Calabresi, P. O’Connor, A. Bar-Or, F. Barkhof, M. Yin, D. Leppert, R. Glanzman, J. Tinbergen, and S. L. Hauser. Ocrelizumab in relapsing-remitting multiple sclerosis: a phase 2, randomized, placebo-controlled, multicentre trial. Lancet, 378:1779–1787, 2011.
[33] D. Karlis and L. Meligkotsidou. Multivariate Poisson regression with covariance structure. Statistics and Computing, 15:255–265, 2005.
[34] K. P. Kleinman and J. G. Ibrahim. A semiparametric Bayesian approach to the random effects model. Biometrics, 54:921–938, 1998.
[35] Y. Kondo, Y. Zhao, and A. J. Petkau. A flexible mixed-effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients. Statistics in Medicine, 34:2165–2180, 2015.
[36] M. Kunz. On responder analyses in the framework of within subject comparisons - considerations and two case studies. Statistics in Medicine, 33:2939–2952, 2014.
[37] M. Kyung, J. Gill, and G. Casella. New findings from terrorism data: Dirichlet process random-effects models for latent groups. Applied Statistics, 60:701–721, 2011.
[38] B. E. Leiby, M. D. Sammel, T. R. Ten Have, and K. G.
Lynch. Identification of multivariate responders/non-responders using Bayesian growth curve latent class models. Journal of the Royal Statistical Society, Series C, 58:505–524, 2009.
[39] B. E. Leiby, T. R. Ten Have, K. G. Lynch, and M. D. Sammel. Bayesian multivariate growth curve latent class models for mixed outcomes. Statistics in Medicine, 33:3434–3452, 2014.
[40] Lenercept MS Study Group and the UBC MS/MRI Analysis Group. TNF neutralization in MS: Results of a randomized, placebo-controlled multicenter study. Neurology, 53:457–465, 1999.
[41] D. K. B. Li, G. J. Zhao, D. W. Paty, the University of British Columbia MS/MRI Analysis Research Group, and the SPECTRIMS Study Group. Randomized controlled trial of interferon-beta-1a in secondary progressive MS: MRI results. Neurology, 56:1505–1513, 2001.
[42] R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley, New York, 1987.
[43] T. A. Louis. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 44:226–233, 1982.
[44] S. N. MacEachern. Estimating normal means with a conjugate style Dirichlet process prior. Communications in Statistics: Simulation and Computation, 23:727–741, 1994.
[45] P. McCullagh and J. A. Nelder. Generalized Linear Models, second edition. Chapman and Hall, London, 1989.
[46] C. E. McCulloch. Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association, 89:330–335, 1994.
[47] C. E. McCulloch and S. R. Searle. Generalized, Linear, and Mixed Models. Wiley, New York, 2001.
[48] H. F. McFarland, F. Barkhof, J. Antel, and D. H. Miller. The role of MRI as a surrogate outcome measure in multiple sclerosis. Multiple Sclerosis, 8:40–51, 2002.
[49] D. H. Miller, P. D. Molyneux, G. J. Barker, D. G. MacManus, I. F. Moseley, K. Wagner, and the European Study Group on Interferon-b1b in Secondary Progressive Multiple Sclerosis.
Effect of interferon-β1b on magnetic resonance imaging outcomes in secondary progressive multiple sclerosis: Results of a European multicenter, randomized, double-blind, placebo-controlled trial. Annals of Neurology, 46:850–859, 1999.
[50] C. J. Morgan, I. B. Aban, C. R. Katholi, and G. R. Cutter. Modeling lesion counts in multiple sclerosis when patients have been selected for baseline activity. Multiple Sclerosis, 16:926–934, 2010.
[51] L. A. Moye. Elementary Bayesian Biostatistics. Chapman & Hall, 2008.
[52] Multiple Sclerosis Society of Canada. Multiple Sclerosis: Its Effects on You and Those You Love. Greenwood Tamad Inc, 2012.
[53] Daniel Nagin. Group-based Modeling of Development. Harvard University Press, Cambridge, 2005.
[54] R. M. Neal. Slice sampling. Annals of Statistics, 31:705–741, 2003.
[55] The North American Study Group on Interferon beta-1b in Secondary Progressive MS. Interferon beta-1b in secondary progressive MS, results from a 3-year controlled study. Neurology, 63:1788–1795, 2004.
[56] P. C. O’Brien. Procedures for comparing samples with multiple endpoints. Biometrics, 40:1079–1087, 1984.
[57] D. Ohlssen, L. Sharples, and D. Spiegelhalter. Flexible random-effects models using Bayesian semi-parametric models: Applications to institutional comparisons. Statistics in Medicine, 26:2088–2112, 2007.
[58] D. W. Paty and D. K. B. Li. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis II. MRI analysis results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology, 43:662–667, 1993.
[59] M. Pavlic, R. Brand, and S. Cummings. Estimating probability of non-response to treatment using mixture distributions. Statistics in Medicine, 20:1739–1753, 2001.
[60] K. B. Petersen and M. S. Pedersen. The matrix cookbook. Technical report, Technical University of Denmark, 2008.
[61] A. J. Petkau. Statistical and design considerations for multiple sclerosis clinical trials. In D. E. Goodkin and R. A.
Rudick, editors, Multiple Sclerosis: Advances in Clinical Trial Design, Treatment and Future Perspectives. Springer-Verlag, London, 1996. Chapter 4.
[62] M. Plummer. Penalized loss functions for Bayesian model comparison. Biostatistics, 9:523–539, 2008.
[63] PRISMS Study Group. Randomised double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. Lancet, 352:1498–1504, 1998.
[64] J. D. Raffa and J. A. Dubin. Multivariate longitudinal data analysis with mixed effects hidden Markov models. Biometrics, 71:821–831, 2015.
[65] C. A. Riddell, Y. Zhao, D. K. B. Li, A. J. Petkau, A. Riddehough, G. R. Cutter, and A. Traboulsee. Evaluation of safety monitoring guidelines based on MRI lesion activity in multiple sclerosis. Neurology, 77:2089–2096, 2011.
[66] J. Río, M. Comabella, and X. Montalban. Predicting responders to therapies for multiple sclerosis. Nature Reviews Neurology, 5:553–560, 2009.
[67] M. Rovaris. The definition of non-responder to multiple sclerosis treatment: neuroimaging markers. Neurological Sciences, 29:222–224, 2008.
[68] R. Rudick, J. Antel, C. Confavreux, G. Cutter, G. Ellison, J. Fischer, F. Lublin, A. Miller, A. J. Petkau, S. Rao, S. Reingold, K. Syndulko, A. Thompson, J. Wallenberg, B. Weinshenker, and E. Willoughby. Recommendations from the National Multiple Sclerosis Society clinical outcomes assessment task force. Annals of Neurology, 42:379–382, 1997.
[69] E. B. Samani and M. Ganjali. Mixed correlated bivariate ordinal and negative binomial longitudinal responses with nonignorable missing values. Communications in Statistics - Theory and Methods, 43:2659–2673, 2014.
[70] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
[71] M. P. Sormani, P. Bruzzi, D. H. Miller, C. Gasperini, F. Barkhof, and L. Filippi. Modelling MRI enhancing lesion counts in multiple sclerosis using a negative binomial model: implications for clinical trials.
Journal of the Neurological Sciences, 163:74–80, 1999.
[72] D. J. Spiegelhalter, K. R. Abrams, and J. P. Myles. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons Ltd., Chichester, 2004.
[73] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. V. D. Linde. Bayesian measures of complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Methodological), 64:1–34, 2002.
[74] L. A. Stone, J. A. Frank, P. S. Albert, C. N. Bash, P. A. Calabresi, H. Maloni, and H. F. McFarland. Characterization of MRI response to treatment with interferon beta-1b: Contrast-enhancing MRI lesion frequency as a primary outcome measure. Neurology, 49:862–869, 1997.
[75] OWIMS Study Group. Evidence of interferon beta-1a dose response in relapsing-remitting MS. Neurology, 53:1–16, 1999.
[76] M. A. Tanner. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York, 1996.
[77] W. N. Venables and B. D. Ripley. Modern Applied Statistics with S, fourth edition. Springer, New York, 2002. ISBN 0-387-95457-0.
[78] S. Watanabe. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. The Journal of Machine Learning Research, 11:3571–3594, 2010.
[79] A. Wienke. Frailty Models in Survival Analysis. CRC Press, 2010.
[80] C. F. J. Wu. On the convergence properties of the EM algorithm. Annals of Statistics, 11:95–103, 1983.
[81] Y. Yang and J. Kang. Joint analysis of mixed Poisson and continuous longitudinal data with nonignorable missing values. Computational Statistics and Data Analysis, 54:193–207.
[82] Y. Zhao, Y. Kondo, A. Traboulsee, D. K. B. Li, A. Riddehough, and A. J. Petkau. Personalized activity index, a new safety monitoring tool for multiple sclerosis clinical trials. Multiple Sclerosis Journal–Experimental, Translational and Clinical, 1:2055217315577829, 2015.
[83] Y. Zhao, A. J. Petkau, D. K.
B. Li, A. Riddehough, and A. Traboulsee. Detection of unusual increases in MRI lesion counts in multiple sclerosis patients. Journal of the American Statistical Association, 109:119–132, 2013.
[84] Y. Zhao, A. J. Petkau, A. Traboulsee, A. Riddehough, and D. K. B. Li. Does MRI lesion activity regress in secondary progressive multiple sclerosis? Multiple Sclerosis, 16:434–442, 2010.

Appendix A

Appendix for Chapter 3

A.1 Technical modification in YZ’s semiparametric procedure

For YZ’s semiparametric procedure, we observed that the algorithm could be unstable when there are only a few patients with repeated measures on either arm in one of the time intervals.

In this scenario, the algorithm could yield unacceptably small estimates of $\mu_{i,j}$ in some iterations. As $\mathrm{Var}(Y_{i,j})$ and $\mathrm{Cov}(Y_{i,j}, Y_{i,j'})$, $j \neq j'$, are both multiples of $\mu_{i,j}$ (see Equations (1) and (2) in YZ), small values of $\mu_{i,j}$ may lead to a singular or nearly-singular $\mathrm{Var}(Y_i)$ matrix. As a result, the algorithm may break down when computing the weight matrix $W_i = \mathrm{Var}(Y_i)^{-1}$. To prevent this, the current algorithm replaces any $\mu_{i,j} < U$ with $U$ when calculating the weight matrix $W_i$. $U = 0$ means that there is no modification, and the smaller $U$, the closer this modified algorithm is to that proposed in YZ. Our preliminary study indicates that $U > 10^{-4}$ prevents the breakdown problem and that the performance of the algorithm is similar in terms of the root mean square error of the CPI when $10^{-3} < U < 10^{-1}$. The analyses in this thesis employ $U = 10^{-2}$.

Furthermore, even with this ad-hoc adjustment, the algorithm might not satisfy its convergence criterion (absolute change in all parameter estimates $< 10^{-3}$) within an acceptable time. Therefore, we set the maximum number of iterations to 100 and, if convergence is not achieved, we treat the parameter estimates at the 100th iteration as a solution.

These modifications are ad-hoc and more fundamental improvement of the algorithm is necessary. Every iteration of YZ’s current procedure updates
The details of the DIC calculation for our semiparametric modelthe regression coefficients and dispersion parameter separately: the algorithmidentifies regression coefficients by minimizing the weighted least squares giventhe estimated dispersion parameter and predicted REs, and then estimates thedispersion parameter by maximizing a profile likelihood given the estimatedregression coefficients and the predicted REs. Therefore, for example, im-proved convergence performance might be achieved if the algorithm estimatesthe dispersion parameter and the regression coefficients simultaneously at ev-ery iteration. Such an investigation is an interesting direction for future work.A.2 The details of the DIC calculation for oursemiparametric modelIn this appendix, we follow the DIC literature [73] and let D be a deviancecorresponding to a focused likelihood, parametrized by parameters of interestθ. Then the DIC is defined as:DIC = D¯ + pD,where D¯ = Eθ(D|data) is the expected deviance with expectation taken withrespect to θ|data. In our context, the data is {yi,j}, and pD is the effectivenumber of parameters defined as:pD = D¯ −D(θ¯(data)),where θ¯(data) = E (θ|data) is the expectation of θ given the data.Under our semiparametric model, our focused parameter is θ = {β, FG} ={β, {aGh , bGh , pih}Mh=1}, and our focused likelihood can be written as:N∏i=1p({yi,j}nij=−1|β, FG) =N∏i=1p({yi,j}nij=−1|β, {aGh , bGh , pih}Mh=1)=N∏i=1( M∑h=1pihp({yi,j}nij=−1|β(s), aGh , bGh)).163A.2. The details of the DIC calculation for our semiparametric modelNote that the density p({yi,j}nij=−1|β(s), aGh , bGh) is the same as the densityof p({yi,j}nij=−1|β(s), aGhi , bGhi ) defined in (2.5) with aGhi and bGhi replaced byaGh and bGh . Therefore, its corresponding deviance isD = D(β, {aGh , bGh , pih}Mh=1)= −2 ln∏Ni=1 p({yi,j}nij=−1|β, {aGh , bGh , pih}Mh=1). 
Based on MCMC samples of $\beta^{[b]}, \{a^{[b]}_{G_h}, b^{[b]}_{G_h}, \pi^{[b]}_h\}_{h=1}^{M}$, $b = 1, \cdots, B$, $\bar{D}$ can be readily estimated as:

$$\frac{1}{B} \sum_{b=1}^{B} D(\beta^{[b]}, \{a^{[b]}_{G_h}, b^{[b]}_{G_h}, \pi^{[b]}_h\}_{h=1}^{M}).$$

The average of the MCMC samples of $\beta$ is used to estimate $\bar{\theta}(\text{data})$ for $\beta$. The estimation of $\bar{\theta}(\text{data})$ for $F_G$ requires some attention as $F_G$ is a mixture distribution. For each MCMC sample of $\{a^{[b]}_{G_h}, b^{[b]}_{G_h}, \pi^{[b]}_h\}_{h=1}^{M}$, we evaluate the RE density $f_G(g \mid \{a^{[b]}_{G_h}, b^{[b]}_{G_h}, \pi^{[b]}_h\}_{h=1}^{M})$ at a grid of points $g = g_1, \cdots, g_L$ between 0 and 1. Then we obtain the average density value across MCMC samples at each grid point $g = g_1, \cdots, g_L$ as:

$$\frac{1}{B} \sum_{b=1}^{B} f_G(g \mid \{a^{[b]}_{G_h}, b^{[b]}_{G_h}, \pi^{[b]}_h\}_{h=1}^{M}).$$

Using these approximations, we evaluate $D(\bar{\theta}(\text{data}))$.

There is some literature (e.g., [62]) discussing the limitations of the DIC. Incorporating other tools for assessing goodness of fit, such as the Watanabe-Akaike information criterion [78], could be interesting future work.

Appendix B

Appendix for Chapter 5

B.1 The asymptotic distribution of $\hat{\Psi}^{\{s+1\}}$ and $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$

This Appendix reviews the derivation of the asymptotic distribution of $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ [9]. As the theory of [9] relies on the results of [6], we briefly sketch the proof of [6] for the asymptotic normality of $\hat{\Psi}^{\{s+1\}}$ when $M$ increases in our context, and then discuss the ascent-based procedure of [9].

In this section, expectations and variances are with respect to the distribution $\tilde{B}_i \mid \tilde{y}_i; \hat{\Psi}^{\{s\}}$ shown in (5.18) in order to evaluate the errors from the random samples. This section assumes that the random samples are i.i.d., although the theory can be generalized for MCMC samples [6, 9].

The gradient and Hessian of the expected complete log-likelihood (5.11), the approximated expected complete log-likelihood (5.12) and each component of the approximated expected complete log-likelihood (5.13) with respect to $\Psi$ are written as:

$$Q^{(j)}(\Psi; \Psi') = \frac{\partial^j}{\partial \Psi^j} Q(\Psi; \Psi'), \qquad Q^{(j)}_M(\Psi; \Psi') = \frac{\partial^j}{\partial \Psi^j} Q_M(\Psi; \Psi'), \qquad q^{(j)}_m(\Psi; \Psi') = \frac{\partial^j}{\partial \Psi^j} q_m(\Psi; \Psi'),$$

where $j = 1, 2$.
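In this Appendix we define $\Psi^{\{s+1\}} = \arg\max_{\Psi} Q(\Psi; \hat{\Psi}^{\{s\}})$. (Note that this differs from the definition in (5.9), where $\Psi^{\{s+1\}}$ is defined as a maximizer of $Q(\Psi; \Psi^{\{s\}})$ with respect to $\Psi$.) Then, we have $Q^{(1)}(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = 0$. Notice that $Q(\Psi; \hat{\Psi}^{\{s\}}) = E(q_m(\Psi; \hat{\Psi}^{\{s\}}))$. Therefore, assuming that the integration and the differentiation are interchangeable,

$$0 = Q^{(1)}(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = E\left(q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right). \tag{B.1}$$

Since $\tilde{b}^{\{s,m\}}_i$, $m = 1, 2, \cdots, M$, are i.i.d. samples from $\tilde{B}_i \mid \tilde{y}_i; \hat{\Psi}^{\{s\}}$, the elements of each of $\{q_m(\Psi; \hat{\Psi}^{\{s\}})\}_{m=1}^{M}$, $\{q^{(1)}_m(\Psi; \hat{\Psi}^{\{s\}})\}_{m=1}^{M}$ and $\{q^{(2)}_m(\Psi; \hat{\Psi}^{\{s\}})\}_{m=1}^{M}$ are also i.i.d. Therefore we can write the common mean of $q^{(2)}_m$ and the common variance of $q^{(1)}_m$ at $\Psi^{\{s+1\}}$ as:

$$J(\Psi^{\{s+1\}}) = E\left(q^{(2)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right),$$

$$V(\Psi^{\{s+1\}}) = \mathrm{Var}\left(q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) = E\left(q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) \, q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})^{T}\right). \tag{B.2}$$

The second equality of (B.2) is true because of (B.1).

The first-order Taylor series expansion of $Q^{(1)}_M(\Psi; \hat{\Psi}^{\{s\}})$ around $\Psi^{\{s+1\}}$ evaluated at $\hat{\Psi}^{\{s+1\}}$ is:

$$0 = Q^{(1)}_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) \approx Q^{(1)}_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) + Q^{(2)}_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})(\hat{\Psi}^{\{s+1\}} - \Psi^{\{s+1\}}),$$

where the first equality holds because $\hat{\Psi}^{\{s+1\}}$ is the maximizer of $Q_M(\Psi; \hat{\Psi}^{\{s\}})$ with respect to $\Psi$. The approximation holds for large $M$ because conditionally on $\hat{\Psi}^{\{s\}}$, by the weak law of large numbers, $Q^{(1)}_M(\Psi; \hat{\Psi}^{\{s\}}) \xrightarrow{p} Q^{(1)}(\Psi; \hat{\Psi}^{\{s\}})$ as $M \to \infty$, which leads to $\hat{\Psi}^{\{s+1\}} \xrightarrow{p} \Psi^{\{s+1\}}$. Rearranging this equation, we have:

$$\sqrt{M}(\hat{\Psi}^{\{s+1\}} - \Psi^{\{s+1\}}) \approx -\left[\frac{1}{M} \sum_{m=1}^{M} q^{(2)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right]^{-1} \left[\frac{1}{\sqrt{M}} \sum_{m=1}^{M} q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right].$$

Since $q^{(2)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ are i.i.d., by the weak law of large numbers, the term inside the inverse converges in probability to $J(\Psi^{\{s+1\}})$. Since $q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ are i.i.d.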
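with mean zero and variance $V(\Psi^{\{s+1\}})$, by the central limit theorem, the second term converges in distribution to $N(0, V(\Psi^{\{s+1\}}))$.

Finally, using Slutsky's theorem, we have the asymptotic distribution of $\hat{\Psi}^{\{s+1\}}$:

$$\sqrt{M}(\hat{\Psi}^{\{s+1\}} - \Psi^{\{s+1\}}) \xrightarrow{d} N(0, \Sigma_{\Psi}), \qquad \Sigma_{\Psi} = J(\Psi^{\{s+1\}})^{-1} V(\Psi^{\{s+1\}}) J(\Psi^{\{s+1\}})^{-1}. \tag{B.3}$$

The sandwich estimate $\hat{\Sigma}_{\Psi}$ of this asymptotic variance matrix is obtained by estimating $J(\Psi^{\{s+1\}})$ and $V(\Psi^{\{s+1\}})$ as:

$$\hat{J}(\Psi^{\{s+1\}}) = \frac{1}{M} \sum_{m=1}^{M} q^{(2)}_m(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}), \qquad \hat{V}(\Psi^{\{s+1\}}) = \frac{1}{M} \sum_{m=1}^{M} q^{(1)}_m(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) \, q^{(1)}_m(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})^{T}.$$

The evaluation of a confidence ellipsoid based on (B.3) as in [6] requires an inversion of $\hat{\Sigma}_{\Psi}$ at every iteration, which may not be feasible in practice because $\hat{\Sigma}_{\Psi}$ can be numerically singular, or nearly singular, especially when the number of parameters is large. This motivates the alternative approach of [9], which increases the MC sample size if the asymptotic lower bound for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ is less than 0.

The asymptotic distribution of $\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ is obtained as follows. First, consider:

$$\begin{aligned}
\sqrt{M}\left(\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right)
&= \sqrt{M}\left(\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) \\
&\quad + \sqrt{M}\left(\Delta Q(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) \\
&\quad + \sqrt{M}\left(\Delta Q_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right).
\end{aligned} \tag{B.4}$$

We first show that the first term in (B.4) converges in probability to zero. The first-order Taylor series expansion of $Q_M(\Psi; \hat{\Psi}^{\{s\}})$ around $\Psi^{\{s+1\}}$ evaluated at $\hat{\Psi}^{\{s+1\}}$ yields:

$$\begin{aligned}
\sqrt{M}\left(\Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right)
&= \sqrt{M}\left(Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - Q_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) \\
&\approx Q^{(1)}_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})^{T} \sqrt{M}(\hat{\Psi}^{\{s+1\}} - \Psi^{\{s+1\}}).
\end{aligned}$$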
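(B.5)

But:

$$Q^{(1)}_M(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = \frac{1}{M} \sum_{m=1}^{M} q^{(1)}_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})$$

converges in probability to zero by the strong law of large numbers and (B.1). The convergence in distribution in (B.3) combined with Slutsky's theorem then yields convergence in probability to zero for the expression in (B.5).

Similarly, the first-order Taylor series expansion of $Q(\Psi; \hat{\Psi}^{\{s\}})$ around $\Psi^{\{s+1\}}$ evaluated at $\hat{\Psi}^{\{s+1\}}$ allows us to approximate the second term in (B.4) as:

$$\begin{aligned}
\sqrt{M}\left(\Delta Q(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - \Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right)
&= \sqrt{M}\left(Q(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) \\
&\approx -Q^{(1)}(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})^{T} \sqrt{M}(\hat{\Psi}^{\{s+1\}} - \Psi^{\{s+1\}}) \\
&= 0,
\end{aligned}$$

where the last equality holds by the definition of $\Psi^{\{s+1\}}$ as in (B.1).

Finally, the third term in (B.4) converges in distribution to a normal with mean 0 and variance:

$$\sigma^2 = \mathrm{Var}\left(\Delta q_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right),$$

where:

$$\Delta q_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) = q_m(\Psi^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - q_m(\hat{\Psi}^{\{s\}}; \hat{\Psi}^{\{s\}}).$$

Therefore, the asymptotic normality (5.21) follows. If we define:

$$L_{\alpha}(\sigma) := \Delta Q_M(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}}) - z_{\alpha} \frac{\sigma}{\sqrt{M}},$$

where $z_{\alpha}$ is the $1 - \alpha$ quantile of the standard normal distribution as defined previously, then asymptotically we have:

$$\Pr\left(L_{\alpha}(\sigma) \leq \Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})\right) = 1 - \alpha. \tag{B.6}$$

Equation (B.6) means that $L_{\alpha}(\sigma)$ is smaller than $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ with probability $1 - \alpha$ as $M \to \infty$. The SD $\sigma$ can be estimated by the SD of the $M$ i.i.d. samples of $\Delta q_m(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$, and $L_{\alpha}(\hat{\sigma})$ is the natural estimator of the asymptotic lower bound for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$.

B.2 The first derivative of the density of $\tilde{Y}_i \mid \tilde{b}_i$

This section discusses how to evaluate the first derivatives of $f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi)$ with respect to $\Psi'$. For simplicity of notation, we let:

$$l^{(r)[k]\langle l \rangle}_{i,j} := \ln f\left(y^{(r)}_{i,j} \mid y^{(r)}_{i,1:j}, y^{(1:r)}_i, \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = k; \Psi^{[k]}_{Y^{(r)}}\right).$$

This Appendix shows the first derivatives of $f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi')$ with respect to $\Psi' = \{\{\pi_k\}_{k=1}^{K-1}, \{A_{i,j}\}_{i \leq j}, \{\Psi_{Y^{(r)}}\}_{r=1}^{R}\}$ in two cases: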

Case 1

$l^{(r)[k]\langle l \rangle}_{i,j}$ is modelled by the GLM with a canonical link (i.e., $\theta^{(r)[k]\langle l \rangle}_{i,j} = \eta^{(r)[k]\langle l \rangle}_{i,j}$) as:

$$l^{(r)[k]\langle l \rangle}_{i,j} = \frac{m^{(r)}_{i,j}}{\kappa^{(r)}} \left[\theta^{(r)[k]\langle l \rangle}_{i,j} y^{(r)}_{i,j} - b\left(\theta^{(r)[k]\langle l \rangle}_{i,j}\right)\right] + c\left(y^{(r)}_{i,j}; \frac{\kappa^{(r)}}{m^{(r)}_{i,j}}\right).$$

Case 2

$l^{(r)[k]\langle l \rangle}_{i,j}$ is the NB density on the log-scale with mean $\mu^{(r)[k]\langle l \rangle}_{i,j}$ and variance $\mu^{(r)[k]\langle l \rangle}_{i,j} + \left(\mu^{(r)[k]\langle l \rangle}_{i,j}\right)^2 / \zeta^{(r)}$ as:

$$l^{(r)[k]\langle l \rangle}_{i,j} = y^{(r)}_{i,j} \ln\left(\frac{\mu^{(r)[k]\langle l \rangle}_{i,j}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}\right) + \zeta^{(r)} \ln\left(\frac{\zeta^{(r)}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}\right) + \ln\left(\frac{\Gamma(\zeta^{(r)} + y^{(r)}_{i,j})}{\Gamma(\zeta^{(r)}) \, y^{(r)}_{i,j}!}\right),$$

where the mean is parametrized with a log link as $\mu^{(r)[k]\langle l \rangle}_{i,j} = \exp\left(\eta^{(r)[k]\langle l \rangle}_{i,j}\right)$.

Before deriving the derivatives of interest in Section B.2.2, we introduce a new representation of the linear predictor $\eta^{(r)[k]\langle l \rangle}_{i,j}$ in Section B.2.1.

B.2.1 A modified notation for $\eta^{(r)[k]\langle l \rangle}_{i,j}$

This Appendix slightly changes the notation for the fixed effect and RE covariates. In the main text, the linear predictor of $Y^{(r)}_{i,j}$ for cluster $k$ given the RE is expressed as in (5.3), where the mean cluster difference arises from $\beta^{(r)[k]}$, and the RE appears through $B^{(r)}_i$. We will reexpress (5.3) so that the new expression suppresses the index $k$ in the regression coefficient. For this purpose, let $\beta^{(r)}$ be an augmented vector containing all the unique parts of $\beta^{(r)[1]}, \cdots, \beta^{(r)[K]}$ from (5.3). Clearly, $\beta^{(r)}$ does not depend on index $k$. Then let $Z^{(r)[k]}_{i,j}$ be an augmented version of the vector $Z^{(r)}_{i,j}$ from (5.3) such that the augmented entries of $Z^{(r)[k]}_{i,j}$ are all zero, and $Z^{(r)[k]T}_{i,j} \beta^{(r)} = Z^{(r)T}_{i,j} \beta^{(r)[k]}$. Similarly, we will reexpress (5.3) in terms of $\tilde{B}_i = (B^{(1)T}_i, \cdots, B^{(R)T}_i)^{T}$. Let $\tilde{K}^{(r)}_{i,j}$ be an augmented version of the vector $K^{(r)}_{i,j}$ from (5.3) such that the augmented entries of $\tilde{K}^{(r)}_{i,j}$ are all zero, and $\tilde{K}^{(r)T}_{i,j} \tilde{B}_i = K^{(r)T}_{i,j} B^{(r)}_i$. Then one
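can write (5.3) as:

$$\eta^{(r)[k]}_{i,j} = Z^{(r)[k]T}_{i,j} \beta^{(r)} + \tilde{K}^{(r)T}_{i,j} \tilde{b}_i.$$

With this new notation, when $\tilde{b}^{\langle l \rangle}_i = A^{T} \tilde{d}^{\langle l \rangle}_i$, $\eta^{(r)[k]\langle l \rangle}_{i,j}$ can be written as:

$$\eta^{(r)[k]\langle l \rangle}_{i,j} = Z^{(r)[k]T}_{i,j} \beta^{(r)} + \left(A \tilde{K}^{(r)}_{i,j}\right)^{T} \tilde{d}^{\langle l \rangle}_i.$$

We can also write $\Psi_{Y^{(r)}} = \{\beta^{(r)}, \kappa^{(r)}\}$ for Case 1, and $\Psi_{Y^{(r)}} = \{\beta^{(r)}, \zeta^{(r)}\}$ for Case 2. With this notation, we now derive the first derivatives of $f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi')$ with respect to $\Psi'$. The derivation below requires matrix calculus and we refer to [60] for the details.

B.2.2 Derivations

The derivative with respect to $\pi_k$, $k = 1, 2, \cdots, K - 1$

Notice that $\pi_K = 1 - \sum_{k=1}^{K-1} \pi_k$. Therefore, for $h = 1, 2, \cdots, K - 1$:

$$\begin{aligned}
\frac{\partial}{\partial \pi_h} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi')
&= \frac{\partial}{\partial \pi_h} \sum_{k=1}^{K} \pi_k f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = k; \Psi') \\
&= f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = h; \Psi') - f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = K; \Psi').
\end{aligned}$$

The first equality is true by (5.19). This derivative applies to all cases.

The derivative with respect to $\Psi_{Y^{(r)}}$, $r = 1, 2, \cdots, R$

$$\begin{aligned}
\frac{\partial}{\partial \Psi_{Y^{(r)}}} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi')
&= \sum_{k=1}^{K} \pi_k \frac{\partial}{\partial \Psi_{Y^{(r)}}} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = k; \Psi') \\
&= \sum_{k=1}^{K} W^{[k]\langle l \rangle}_i \sum_{j=1}^{n^{(r)}} \frac{\partial}{\partial \Psi_{Y^{(r)}}} l^{(r)[k]\langle l \rangle}_{i,j},
\end{aligned} \tag{B.7}$$

where:

$$W^{[k]\langle l \rangle}_i = \pi_k f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = k; \Psi'). \tag{B.8}$$

The second equality is true because $\partial f(a) / \partial a = f(a) \, \partial \ln f(a) / \partial a$.

In Case 1, $\Psi_{Y^{(r)}} = \{\beta^{(r)}, \kappa^{(r)}\}$ and the derivative of (B.7) reduces to:

$$\frac{\partial}{\partial \beta^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j} = \frac{m^{(r)}_{i,j}}{\kappa^{(r)}} Z^{(r)[k]}_{i,j} \left(y^{(r)}_{i,j} - b'\left(\theta^{(r)[k]\langle l \rangle}_{i,j}\right)\right), \tag{B.9}$$

and:

$$\frac{\partial}{\partial \kappa^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j} = -\frac{m^{(r)}_{i,j}}{\kappa^{(r)2}} \left(\theta^{(r)[k]\langle l \rangle}_{i,j} y^{(r)}_{i,j} - b\left(\theta^{(r)[k]\langle l \rangle}_{i,j}\right)\right) + c'\left(y^{(r)}_{i,j}, \kappa^{(r)}\right). \tag{B.10}$$

In Case 2, $\Psi_{Y^{(r)}} = \{\beta^{(r)}, \zeta^{(r)}\}$ and the derivative of (B.7) reduces to:

$$\frac{\partial}{\partial \beta^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j} = \mu^{(r)[k]\langle l \rangle}_{i,j} Z^{(r)[k]}_{i,j} \left(\frac{y^{(r)}_{i,j}}{\mu^{(r)[k]\langle l \rangle}_{i,j}} - \frac{y^{(r)}_{i,j} + \zeta^{(r)}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}\right) = Z^{(r)[k]}_{i,j} \frac{\zeta^{(r)} \left(y^{(r)}_{i,j} - \mu^{(r)[k]\langle l \rangle}_{i,j}\right)}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}, \tag{B.11}$$

and, writing $\psi$ for the digamma function, which arises from differentiating the log-gamma terms of the NB log-density:

$$\frac{\partial}{\partial \zeta^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j} = \frac{\mu^{(r)[k]\langle l \rangle}_{i,j} - y^{(r)}_{i,j}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}} + \ln\left(\frac{\zeta^{(r)}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}\right) + \psi\left(\zeta^{(r)} + y^{(r)}_{i,j}\right) - \psi\left(\zeta^{(r)}\right).$$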
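(B.12)

The derivative with respect to $A$

Instead of evaluating the derivatives with respect to $\Psi_B$, we evaluate the derivatives with respect to $A$:

$$\frac{\partial}{\partial A} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i; \Psi) = \sum_{k=1}^{K} W^{[k]\langle l \rangle}_i \sum_{r=1}^{R} \sum_{j=1}^{n^{(r)}} \frac{\partial}{\partial A} l^{(r)[k]\langle l \rangle}_{i,j}. \tag{B.13}$$

In Case 1, the derivative of (B.13) becomes:

$$\frac{\partial}{\partial A} l^{(r)[k]\langle l \rangle}_{i,j} = \frac{m^{(r)}_{i,j}}{\kappa^{(r)}} \left(y^{(r)}_{i,j} - b'\left(\theta^{(r)[k]\langle l \rangle}_{i,j}\right)\right) \frac{\partial \theta^{(r)[k]\langle l \rangle}_{i,j}}{\partial A}, \tag{B.14}$$

where:

$$\frac{\partial \theta^{(r)[k]\langle l \rangle}_{i,j}}{\partial A} = \tilde{d}^{\langle l \rangle}_i \tilde{K}^{(r)T}_{i,j}$$

is an $N_{RE}$ by $N_{RE}$ matrix, and $N_{RE}$ is the length of the RE vector $\tilde{B}_i$.

In Case 2, the derivative of (B.13) becomes:

$$\frac{\partial}{\partial A} l^{(r)[k]\langle l \rangle}_{i,j} = \left(\frac{y^{(r)}_{i,j}}{\mu^{(r)[k]\langle l \rangle}_{i,j}} - \frac{y^{(r)}_{i,j} + \zeta^{(r)}}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}}\right) \frac{\partial \mu^{(r)[k]\langle l \rangle}_{i,j}}{\partial A},$$

where:

$$\frac{\partial \mu^{(r)[k]\langle l \rangle}_{i,j}}{\partial A} = \mu^{(r)[k]\langle l \rangle}_{i,j} \, \tilde{d}^{\langle l \rangle}_i \tilde{K}^{(r)T}_{i,j}$$

is an $N_{RE}$ by $N_{RE}$ matrix. Therefore,

$$\frac{\partial}{\partial A} l^{(r)[k]\langle l \rangle}_{i,j} = \frac{\zeta^{(r)} \left(y^{(r)}_{i,j} - \mu^{(r)[k]\langle l \rangle}_{i,j}\right)}{\mu^{(r)[k]\langle l \rangle}_{i,j} + \zeta^{(r)}} \, \tilde{d}^{\langle l \rangle}_i \tilde{K}^{(r)T}_{i,j}. \tag{B.15}$$

B.3 The first derivatives of $\mathrm{logit}(P_{i,k}(\Psi'))$ with respect to $\Psi'$

We used the delta method [10] to obtain a CI for the posterior probability $P_{i,k}(\Psi')$ in (5.23). This Appendix evaluates the expression of $\partial \, \mathrm{logit}(P_{i,k}(\Psi')) / \partial \Psi'$, which is required in the delta method. Notice that

$$\frac{\partial \, \mathrm{logit}(P_{i,k}(\Psi'))}{\partial \Psi'} = \frac{\partial}{\partial \Psi'} \ln\left[\int \pi_k f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}_i, C_i = k; \Psi') f(\tilde{d}_i) \, d\tilde{d}_i\right] - \frac{\partial}{\partial \Psi'} \ln\left[\sum_{h \neq k} \int \pi_h f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}_i, C_i = h; \Psi') f(\tilde{d}_i) \, d\tilde{d}_i\right].$$

Therefore, one can approximate the derivative using MC integration as:

$$\frac{\partial}{\partial \Psi'} \ln\left[\sum_{l=1}^{L} W^{[k]\langle l \rangle}_i\right] - \frac{\partial}{\partial \Psi'} \ln\left[\sum_{h \neq k} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i\right], \tag{B.16}$$

where $W^{[k]\langle l \rangle}_i$ is defined in (B.8). Notice that (B.16) is exactly $\partial \, \mathrm{logit}(\tilde{P}_{i,k}(\Psi')) / \partial \Psi'$, where $\tilde{P}_{i,k}(\Psi')$ is defined in (5.24), and (B.16) can be expressed as:

$$\frac{\partial}{\partial \Psi'} \mathrm{logit}(\tilde{P}_{i,k}(\Psi')) = \frac{1}{\sum_{l=1}^{L} W^{[k]\langle l \rangle}_i} \left[\sum_{l=1}^{L} \frac{\partial}{\partial \Psi'} W^{[k]\langle l \rangle}_i\right] - \frac{1}{\sum_{h \neq k} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i} \left[\sum_{h \neq k} \sum_{l=1}^{L} \frac{\partial}{\partial \Psi'} W^{[h]\langle l \rangle}_i\right]. \tag{B.17}$$

Since the collection of parameters is $\Psi' = \{\{\pi_k\}_{k=1}^{K-1}, \{A_{i,j}\}_{i \leq j}, \{\Psi_{Y^{(r)}}\}_{r=1}^{R}\}$, each component of $\partial \, \mathrm{logit}(P_{i,k}(\Psi')) / \partial \Psi'$ can be obtained by substituting the appropriate derivative of $W^{[k]\langle l \rangle}_i$ in (B.17).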
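The form of (B.17) can be checked numerically on a toy problem. The following Python sketch is illustrative only: the function names, the array layout (clusters by MC draws by parameters), and the toy weights are assumptions made for this example and are not part of the thesis software. It evaluates the right-hand side of (B.17) from arrays of weights and their derivatives, and compares it with a central finite difference of the logit.

```python
import numpy as np


def logit_gradient(W, dW, k):
    """Gradient of logit(P_k) in the form of (B.17).

    W:  array of shape (K, L); W[h, l] is the weight for cluster h, MC draw l.
    dW: array of shape (K, L, P); derivative of each weight w.r.t. P parameters.
    k:  zero-based index of the cluster of interest.
    """
    W = np.asarray(W, dtype=float)
    dW = np.asarray(dW, dtype=float)
    others = np.arange(W.shape[0]) != k          # mask selecting clusters h != k
    own = dW[k].sum(axis=0) / W[k].sum()         # first term of (B.17)
    rest = dW[others].sum(axis=(0, 1)) / W[others].sum()  # second term of (B.17)
    return own - rest


def logit_p(W, k):
    """logit of the MC estimate of P_k: ln(sum_l W_k) - ln(sum_{h != k} sum_l W_h)."""
    W = np.asarray(W, dtype=float)
    others = np.arange(W.shape[0]) != k
    return np.log(W[k].sum()) - np.log(W[others].sum())


# Toy weights depending on one scalar parameter t: W[h, l] = exp(a[h] * t + c[h, l]).
rng = np.random.default_rng(0)
a = np.array([0.5, -1.0, 2.0])
c = rng.normal(size=(3, 4))


def weights(t):
    return np.exp(a[:, None] * t + c)


t0, eps = 0.3, 1e-6
W0 = weights(t0)
dW0 = (a[:, None] * W0)[:, :, None]              # analytic dW/dt, shape (3, 4, 1)
analytic = logit_gradient(W0, dW0, k=1)[0]
numeric = (logit_p(weights(t0 + eps), 1) - logit_p(weights(t0 - eps), 1)) / (2 * eps)
```

In this toy setting the analytic and finite-difference values agree to within numerical error, which is the kind of check that can help catch sign or indexing mistakes when the derivatives of $W^{[k]\langle l \rangle}_i$ are coded by hand.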
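In particular, the derivatives with respect to $\pi_m$ ($m = 1, 2, \cdots, K - 1$) are given by:

$$\frac{\partial}{\partial \pi_m} W^{[k]\langle l \rangle}_i = \begin{cases}
f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = m) & \text{if } k = m \\
-f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = K) & \text{if } k = K \\
0 & \text{else.}
\end{cases}$$

Therefore for $m = 1, 2, \cdots, K - 1$:

$$\frac{\partial}{\partial \pi_m} \mathrm{logit}(\tilde{P}_{i,k}(\Psi')) = \begin{cases}
-\dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = m; \Psi')}{\sum_{h \neq K} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i} - \dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = K; \Psi')}{\sum_{l=1}^{L} W^{[K]\langle l \rangle}_i} & \text{if } k = K \\[2ex]
-\dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = m; \Psi')}{\sum_{h \neq k} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i} + \dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = K; \Psi')}{\sum_{h \neq k} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i} & \text{if } k \neq K \text{ and } m \neq k \\[2ex]
\dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = m; \Psi')}{\sum_{l=1}^{L} W^{[m]\langle l \rangle}_i} + \dfrac{\sum_{l=1}^{L} f(\tilde{y}_i \mid \tilde{B}_i = A^{T} \tilde{d}^{\langle l \rangle}_i, C_i = K; \Psi')}{\sum_{h \neq m} \sum_{l=1}^{L} W^{[h]\langle l \rangle}_i} & \text{if } k \neq K \text{ and } m = k.
\end{cases}$$

Similarly, the derivative of $\mathrm{logit}(P_{i,k}(\Psi'))$ with respect to the other parameters of $\Psi'$ can be obtained by substituting the following derivatives in (B.17):

$$\frac{\partial}{\partial \beta^{(r)}} W^{[k]\langle l \rangle}_i = W^{[k]\langle l \rangle}_i \sum_{j=1}^{n^{(r)}} \frac{\partial}{\partial \beta^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j},$$

$$\frac{\partial}{\partial \kappa^{(r)}} W^{[k]\langle l \rangle}_i = W^{[k]\langle l \rangle}_i \sum_{j=1}^{n^{(r)}} \frac{\partial}{\partial \kappa^{(r)}} l^{(r)[k]\langle l \rangle}_{i,j},$$

$$\frac{\partial}{\partial A} W^{[k]\langle l \rangle}_i = W^{[k]\langle l \rangle}_i \sum_{r=1}^{R} \sum_{j=1}^{n^{(r)}} \frac{\partial}{\partial A} l^{(r)[k]\langle l \rangle}_{i,j},$$

where $k = 1, 2, \cdots, K - 1$ and $r = 1, 2, \cdots, R$. The derivations above used the formula $\partial f(\Psi) / \partial \Psi = f(\Psi) \, \partial \ln f(\Psi) / \partial \Psi$. Expressions for the derivatives that appear in the expression immediately above have already been provided in Appendix B.2: $\partial l^{(r)[k]\langle l \rangle}_{i,j} / \partial \beta^{(r)}$ is given in (B.9) for Case 1 and in (B.11) for Case 2; $\partial l^{(r)[k]\langle l \rangle}_{i,j} / \partial \kappa^{(r)}$ is given in (B.10) for Case 1, and its Case 2 counterpart $\partial l^{(r)[k]\langle l \rangle}_{i,j} / \partial \zeta^{(r)}$ in (B.12); and $\partial l^{(r)[k]\langle l \rangle}_{i,j} / \partial A$ is given in (B.14) for Case 1 and in (B.15) for Case 2.

Appendix C

Appendix for Chapter 6

C.1 Lenercept trial: MCEM parameter assessments

This Appendix assesses the convergence of the MCEM algorithm for the lenercept trial analysis. The top row of Figure C.1 shows the approximated log-likelihood values at each MCEM iteration for SINGLE with the Linear mean model and DOUBLE with the Linear mean models for both the NAL and PAL counts.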
The log-likelihood values sometimes decrease slightly, due to the MC approximation employed to evaluate the observed log-likelihood, or MC error in the MCEM algorithm. The traceplots show that in both procedures, the approximated observed log-likelihoods increase more rapidly in the first few iterations.

The middle row of Figure C.1 shows the traceplots of $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ with its lower $L_{\alpha}(\hat{\sigma})$ and upper $U_{\alpha}(\hat{\sigma})$ bounds. Unlike the illustrative example, this data analysis employs the initial values discussed in Section 5.4.5. Therefore, the value of $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ at the first iteration is substantially smaller here than in the illustrative example.

The bottom row of Figure C.1 shows the traceplots of the MC sample size $M$. Both the SINGLE and DOUBLE procedures stay at the initial MCEM sample size $M = 200$ for a long time: for the first 42 and 40 iterations respectively.

Figure C.1: (a) Approximated observed log-likelihood value; (b) $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$; (c) $M$. The top panels show the traceplots of the approximated observed log-likelihood value based on SINGLE with the Linear mean model for NAL and DOUBLE with the Linear mean models for both NAL and PAL counts. The middle panels show the asymptotic lower $L_{\alpha}(\hat{\sigma})$ and upper $U_{\alpha}(\hat{\sigma})$ bounds for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ (dashed curves) and the point estimates (solid curve). The y-axis is on the log-scale. The bottom panels show the MC sample size at each iteration.

Figure C.2 shows the traceplots of the parameters estimated by these two procedures. SINGLE and DOUBLE require 64 and 97 iterations before the algorithms terminate respectively (reported in Table 6.1). Although some of the parameters that are shared by the two clusters (i.e., $\beta^{(1)}_0$, $\beta^{(2)}_0$, $\beta^{(2)}_0$, $\sigma_1$,
$\sigma_2$, $\rho_{12}$) appear to fluctuate a lot even at the later iterations, this could be because the initial values are relatively close to the final estimate and the y-axes of the traceplots have quite a limited range. Considering the magnitudes of the SEs of these parameters in Table 6.1, even with these fluctuations, we may consider that our MCEM algorithm has converged to a local maximum.

Figure C.2: (a) Traceplots of parameters $\Psi^{\{s\}}_{Y^{(1)}}$ over the MCEM iterations; (b) traceplots of parameters $\Psi^{\{s\}}_{Y^{(2)}}$ over the MCEM iterations; (c) traceplots of parameters $\Sigma^{\{s\}}$ and $\pi^{\{s\}}$ over the MCEM iterations. The traceplots of the estimated parameters over MCEM iterations. The black curves indicate the trace of DOUBLE with the Linear mean models for both NAL and PAL counts. The red curves indicate the trace of SINGLE with the Linear mean model for NAL counts.

C.2 MBP8298 trial: MCEM parameter assessments

This Appendix assesses the convergence of the MCEM algorithm for the MBP8298 trial analysis. Figure C.3 (a) shows the traceplots for the approximated observed log-likelihood of TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT. All SINGLEs show clear increasing trends largely because the initial values are far from the final results. The approximated observed log-likelihood of TRIPLE seems to decrease more often than other procedures due to both the MC error to approximate the observed log-likelihood and the MC error at E-step of the MCEM algorithm.

Figure C.3 (b) shows the traceplots of $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$. All the procedures show overall decreasing trends of $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$. However, around the 40th to 45th iterations, $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ of SINGLE-PASAT increases rapidly, and the approximated observed likelihood also increases rapidly.
Around these iterations, the traceplots of the regression coefficients of the PASAT, i.e., $\beta^{(1)}_0$, $\beta^{(1)}_1$, $\beta^{(1)[1]}_2$ and $\beta^{(1)[2]}_2$ in Figure C.4, change dramatically. This could be because the MCEM algorithm came close to a local maximum of the observed log-likelihood around the 40th iteration.

Figure C.3 (c) shows the traceplot of MC sample sizes. We see that the MC sample size grows gradually in all the procedures.

Figure C.4 shows the traceplots of each estimated parameter. Most parameters seem to stabilize before the algorithm terminates, except $\rho_{23}$ from TRIPLE and $\beta^{(3)}_0$ from SINGLE-T25FW. However, considering the magnitude of the SEs of these parameters, reported in Table 6.3, their fluctuations across iterations may be reasonable. Overall we conclude that the MCEM algorithm has converged to a local maximum.

Figure C.3: (a) Approximated observed log-likelihood value; (b) $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$; (c) $M$. The top panels show the traceplots of the approximated observed log-likelihood value based on procedures (TRIPLE and SINGLE). The middle panels show the asymptotic lower $L_{\alpha}(\hat{\sigma})$ and upper $U_{\alpha}(\hat{\sigma})$ bounds for $\Delta Q(\hat{\Psi}^{\{s+1\}}; \hat{\Psi}^{\{s\}})$ (dashed curves) and the point estimates (solid curve). The y-axis is on the log-scale. The bottom panels show the MC sample size at each iteration.

Figure C.4: (a) Traceplots of parameters $\Psi^{\{s\}}_{Y^{(1)}}$ over the MCEM iterations; (b) traceplots of parameters $\Psi^{\{s\}}_{Y^{(2)}}$ over the MCEM iterations; (c) traceplots of parameters $\Psi^{\{s\}}_{Y^{(3)}}$ over the MCEM iterations; (d) traceplots of parameters $\Sigma^{\{s\}}$ and $\pi^{\{s\}}$ over the MCEM iterations. The traceplots of the estimated parameters over MCEM iterations. The black, magenta, red and blue curves indicate the trace of TRIPLE, SINGLE-PASAT, SINGLE-T25FW and SINGLE-9HPT, respectively.

