@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Science, Faculty of"@en, "Statistics, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Zhou, Guohai"@en ; dcterms:issued "2017-12-20T19:46:35Z"@en, "2017"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description "In many applications, statistical models for real data often have natural constraints or restrictions on some model parameters. For example, the growth rate of a child is expected to be positive, and patients receiving anti-HIV treatments are expected to exhibit a decline in their viral loads. Hypothesis testing for certain model parameters incorporating the natural constraints is expected to be more powerful than testing ignoring the constraints. Although constrained statistical inference, especially multi-parameter order-restricted hypothesis testing, has been studied in the literature for several decades, methods for models for complex longitudinal data are still very limited. In this thesis, we develop innovative multi-parameter orderrestricted (or one-sided) hypothesis testing methods for modelling the following complex data: (1) multivariate normal data with non-ignorable missing values; (2) semi-continuous longitudinal data; and (3) left censored or truncated longitudinal data due to detection limits. We focus on testing mean parameters in the models, and the approaches are based on the likelihood methods. Some asymptotic results are obtained, and some computational challenges are discussed. Simulation studies are conducted to evaluate the proposed methods. Several real datasets are analyzed to illustrate the power advantages of proposed new tests."@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/64122?expand=metadata"@en ; skos:note "Multivariate One-sided Tests For Multivariate Normaland Mixed Effects Regression Models With Missing Data,Semi-continuous Data and Censored DatabyGuohai ZhouB.Sc., Sun Yat-sen University, 2010M.Sc., University of Illinois at Urbana-Champaign, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Statistics)The University Of British Columbia(Vancouver)December 2017c© Guohai Zhou, 2017AbstractIn many applications, statistical models for real data often have natural constraintsor restrictions on some model parameters. For example, the growth rate of a childis expected to be positive, and patients receiving anti-HIV treatments are expectedto exhibit a decline in their viral loads. Hypothesis testing for certain model pa-rameters incorporating the natural constraints is expected to be more powerful thantesting ignoring the constraints. Although constrained statistical inference, espe-cially multi-parameter order-restricted hypothesis testing, has been studied in theliterature for several decades, methods for models for complex longitudinal dataare still very limited. In this thesis, we develop innovative multi-parameter order-restricted (or one-sided) hypothesis testing methods for modelling the followingcomplex data: (1) multivariate normal data with non-ignorable missing values; (2)semi-continuous longitudinal data; and (3) left censored or truncated longitudinaldata due to detection limits. We focus on testing mean parameters in the models,and the approaches are based on the likelihood methods. Some asymptotic resultsare obtained, and some computational challenges are discussed. Simulation studiesare conducted to evaluate the proposed methods. Several real datasets are analyzedto illustrate the power advantages of proposed new tests.iiLay SummaryMany real world studies exhibit order constraints between variables, for example,patients receiving anti-HIV treatments are expected to show deceased HIV viruslevels. This thesis aims to develop innovative statistical methods to formally in-corporate order constraints into the analysis of complex data such as data not fullyobserved from every study subject, data containing a mixture of excessive zeroand non-zero values, and data produced by devices unable to measure small val-ues below certain thresholds. This thesis has demonstrated that the simultaneousincorporation of order constraints and data complications can lead to more reliableas well as more efficient statistical inferences and therefore can also yield scientif-ically more sound conclusions.iiiPrefaceChapter 3, Chapter 4 and Chapter 5 in this thesis are collaborated work with Profes-sor Lang Wu, Department of Statistics, University of British Columbia. A versionof Chapter 3 has been published in the Journal of Applied Statistics. Versions ofChapter 4 and Chapter 5 will be submitted for publication. In these three chapters, Iidentified the research topics, conducted the research, analyzed real data examples,implemented simulation studies and wrote multiple versions of the manuscript.Professor Lang Wu guided the research design, real data analysis, simulation stud-ies, and edited multiple versions of the manuscript.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Mathematical Notations . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 A Review of Multivariate One-sided Tests for Multivariate NormalMeans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5v1.3 Multivariate One-sided Tests With Complex Data: Motivating Ex-amples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141.4 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . 231.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241.6 Appendix: Some Miscellaneous Results . . . . . . . . . . . . . . 261.6.1 Some Results for Section 1.1 . . . . . . . . . . . . . . . . 261.6.2 Some Results for Section 1.2 . . . . . . . . . . . . . . . . 291.6.3 Simulation Based Calculation of Chi-bar-weights . . . . . 302 Complications in Longitudinal Data . . . . . . . . . . . . . . . . . . 332.1 Longitudinal Data and Mixed Effects Models . . . . . . . . . . . 332.1.1 Linear Mixed Effects (LME) Models . . . . . . . . . . . 372.1.2 Generalized Linear Mixed Effects Models (GLMMs) . . . 392.1.3 Nonlinear Mixed Effects (NLME) Models . . . . . . . . . 412.2 Missing Data Problems . . . . . . . . . . . . . . . . . . . . . . . 432.3 Semi-Continuous Data . . . . . . . . . . . . . . . . . . . . . . . 472.4 Censored or Truncated Data . . . . . . . . . . . . . . . . . . . . 503 ALikelihood-based Approach forMultivariate One-Sided TestsWithMissing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553.2 Multivariate One-sided Tests with Missing Data . . . . . . . . . . 573.2.1 Multivariate One-sided Tests With Complete Data Basedon Multivariate Normality . . . . . . . . . . . . . . . . . 573.2.2 Multivariate One-sided Tests With Missing Data . . . . . 603.3 Computational Issues . . . . . . . . . . . . . . . . . . . . . . . . 63vi3.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703.4.1 The Data and Hypotheses . . . . . . . . . . . . . . . . . 703.4.2 Inequality-restricted Tests With Missing Data . . . . . . . 713.4.3 Analysis Results . . . . . . . . . . . . . . . . . . . . . . 743.5 A Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . 783.5.1 Data Generation . . . . . . . . . . . . . . . . . . . . . . 783.5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . 813.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863.7 Proof of Asymptotic Results . . . . . . . . . . . . . . . . . . . . 883.7.1 Notation and Regularity Conditions . . . . . . . . . . . . 883.7.2 Consistency and√n-consistency . . . . . . . . . . . . . . 923.7.3 Asymptotic Distribution of LRT statistic . . . . . . . . . . 943.7.4 Asymptotic Distribution of Wald Test Statistic . . . . . . 963.8 An Illustrative Example of BOBYQA Algorithm . . . . . . . . . 984 Constrained Inference for Models of Semi-continuous LongitudinalData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014.2 Models and Likelihood Inference . . . . . . . . . . . . . . . . . . 1044.2.1 Models for Semi-continuous Longitudinal Data . . . . . . 1044.2.2 The Likelihood . . . . . . . . . . . . . . . . . . . . . . . 1064.2.3 Approximate Likelihood Computation . . . . . . . . . . . 1084.2.4 Likelihood Inference Under Order Restrictions . . . . . . 1114.3 Analysis of a Lung Cancer Data . . . . . . . . . . . . . . . . . . 1174.4 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . 124vii4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1305 Multivariate One-sided Tests for Nonlinear Mixed Effects ModelsWith Censored Response . . . . . . . . . . . . . . . . . . . . . . . . 1325.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1325.2 Multivariate Order-Restricted Tests for NLME Models with Cen-sored Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355.2.1 General NLME Models with Censored Responses . . . . 1355.2.2 Order-Restricted Tests for Mean Parameters . . . . . . . . 1365.2.3 Computational Strategies . . . . . . . . . . . . . . . . . . 1415.2.4 An EM Algorithm for Order-Restricted Inference for NLMEModels with Censored Responses . . . . . . . . . . . . . 1425.3 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1575.4 A Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . 1625.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1676 Conclusions, Discussion, and Future Work . . . . . . . . . . . . . . 1696.1 Conclusions and Discussion . . . . . . . . . . . . . . . . . . . . 1696.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174viiiList of TablesTable 3.1 Parameter estimates and standard errors (in brackets) for meanparameters under different assumptions of missing data mecha-nism for the Surkan et al. (2015) trial data . . . . . . . . . . . 75Table 3.2 Parameter estimates for variance component parameters underdifferent assumptions of missing data mechanism for the Surkanet al. (2015) trial data . . . . . . . . . . . . . . . . . . . . . . 75Table 3.3 Missing data model parameters and standard errors (in brackets)under different assumptions of missing data mechanism for theSurkan et al. (2015) trial data . . . . . . . . . . . . . . . . . . 76Table 3.4 P-values of various tests applied to the Surkan et al. (2015) . . 76Table 3.5 Simulation results under unequal missing rates (10%,35%) fortesting H0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0 . 82Table 3.6 Simulation results under equal missing rates (15%,15%) for test-ing H0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0 . . . 83Table 3.7 Empirical type I error rates and power (in %) under MCAR(p=5, n=50 and equal missing rates 20%) . . . . . . . . . . . . 85ixTable 4.1 Summary of the lung cancer data in Ishizumi et al. (2010) . . . 119Table 4.2 Estimates of the fixed parameters in models for Y ∗i j and Ui j . . . 123Table 4.3 P-values from testing H01 (or H02) versus the unrestricted alter-native Hu1 and restricted alternative Hr1 (the bound method andthe substitution method are both used for Hr1). . . . . . . . . . 124Table 4.4 Comparison of type I error rates and powers for restricted andunrestricted tests. . . . . . . . . . . . . . . . . . . . . . . . . 127Table 4.5 Comparison of type I error rates and powers in the second sim-ulation study. . . . . . . . . . . . . . . . . . . . . . . . . . . . 129Table 5.1 Fixed-effects parameter estimates . . . . . . . . . . . . . . . . 159Table 5.2 Estimates of variance component parameters D and σ2 . . . . 160Table 5.3 Comparison of type I error rates and power between constrainedtests and unconstrained tests for an HIV viral load detectionlimit of 50, at 5% significance level . . . . . . . . . . . . . . . 165Table 5.4 Comparison of type I error rates and power between constrainedtests and unconstrained tests for an HIV viral load detectionlimit of 100, at 5% significance level . . . . . . . . . . . . . . 166xList of FiguresFigure 1.1 Null and alternative parameter spaces in hypotheses (1.6) forp = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Figure 1.2 Null and alternative parameter spaces in hypotheses (1.7) forp = 2. The region to the left and bottom of the dashed lines isO−− µ = {x− µ : x ∈ O−}, for µ ∈ Q3. The dashed arrowsmark the distance from Z to O−− µ and the distance from Zto O−. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Figure 1.3 The temperament score trajectories of 10 randomly subjectsfrom the zinc-folic group in the Surkan et al. (2015) trial. Thedot subject has a missing temperament score at the 2nd follow-up visit and the triangle subject has a missing temperamentscore at the 1st follow-up visit. . . . . . . . . . . . . . . . . . 16Figure 1.4 The trajectories of log(volumn+ 1) for a randomly selectedsubject from each group taken from Albert and Shen (2005):a) active acupuncture plus medication; b) sham acupunctureplus medication; and c) medication alone . . . . . . . . . . . 19xiFigure 1.5 HIV viral load trajectories of 4 randomly selected subjectsfrom ACTG 315 study and from ACTG 398 study respectively.In the figures, dotted horizontal lines indicate censoring thresh-olds, and square points below dashed lines are half the detec-tion limits. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22Figure 2.1 An example of features of longitudinal data: trajectories ofHIV viral load and CD4 from the ACTG 315 data, the dashedline in the left panel indicates censoring threshold and the pointsmarked by crosses indicate censored HIV viral load observations 35Figure 2.2 Example of semi-continuous data taken from Olsen and Schafer(2001): the histogram of alcohol use for grade 7 teenagers inLos Angeles area, where there is a large proportion of zeros(i.e., a lack of alcohol usage). . . . . . . . . . . . . . . . . . 48Figure 3.1 Null and alternative parameter regions when p = 2. . . . . . . 58Figure 3.2 Trajectories of temperament scores for 8 randomly selectedsubjects in each group . . . . . . . . . . . . . . . . . . . . . 72Figure 3.3 A hypothetical data example to illustrate the BOBYQA algo-rithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99Figure 3.4 Values of the negative log-likelihood function and parameterestimates at every 5 iterations of the BOBYQA algorithm. . . 100Figure 4.1 The two shaded regions denote the alternative hypothesis H1. 103xiiFigure 4.2 Trajectory of 4 randomly selected subjects from each group.The dashed line corresponds to 3, the cutoff defining abnor-mality in lung-average biopsy grade. . . . . . . . . . . . . . . 120Figure 4.3 Median and interquartile range of p-value reductions benefit-ing from order-restrictions . . . . . . . . . . . . . . . . . . . 130Figure 5.1 Observed viral load (dots) versus fitted viral load (solid line)for 4 randomly selected subjects in the ACTG 398 study; thedotted horizontal line indicates censoring threshold; the smallred dot below the dotted line from subject 22 indicates a cen-sored viral load observation . . . . . . . . . . . . . . . . . . 161xiiiList of Mathematical Notationsθ unknown scalar parameterθ vector of unknown parametersfY (y;θ) probability density (or mass) function of a continuous (or dis-crete) random variable Y , parametrized by θfY (y;θ) probability density function of a continuous random vector Y ,parametrized by θfX ,Y (x,y;θ) joint probability density function of two continuous random vari-ables X and Y , parametrized by θfY |X(y|x;θ) conditional probability density function of a continuous randomvariable Y given a random variable X = x, parametrized by θE(Y ;θ) marginal expectation of a random variable Y whose distributionis parameterized by θE(Y |X ;θ) conditional expectation of a random variable Y given a randomvariable X , calculated as E(Y |X ;θ) = ∫ y fY |X(y|X ;θ)dy when Yis continuous or E(Y |X ;θ) = ∑y y fY |X(y|X ;θ) when Y is discretexivAcknowledgmentsForemost, I would like to express my tremendous gratitude to my supervisor, Prof.Lang Wu, for his incredibly wonderful encouragement, patience, and insightfulguidance. I would have been unable to finish the PhD study without his remarkablementorship.I am also grateful to my other committee members including Prof. RollinBrant, Prof. Jiahua Chen, Prof. Harry Joe, Prof. Paul Gustafson, Prof. Qian Yi,and Prof. Sanjoy Sinha for their perspicacious comments and perceptive questions.I also thank all faculty, staff and graduate students in the Department of Statis-tics at UBC for creating an amazing environment for learning and growth, as wellas Prof. Mark Ansermino, Prof. Matthias Gorges and their team in BC Children’shospital for providing me excellent research collaboration opportunities.Finally, I would like to dedicate this dissertation to my father Debing Zhou andmy mother Mingzhen Tang, for their unconditional support and love.xvChapter 1Introduction1.1 BackgroundIn many practical problems, some order constraints on the parameters of a statis-tical model arise naturally. For example, the growth of a child is expected to bepositive, or a drug/treatment is expected to be at least as effective as a control. Inhypothesis testing, it is desirable to incorporate such constraints since they maylead to more powerful tests than unconstrained tests, e.g., a one-sided test may bemore powerful than a two-sided test if the one-sided test incorporating the “one-sided” constraint can be justified by the scientific problem under consideration.More specifically, consider the simple case of testing one parameter, say µ .Suppose that µ represents the effectiveness of a new drug or a new treatment, withµ > 0, µ < 0, and µ = 0 respectively indicating that the drug is effective, harmful,and neutral. To test the effectiveness of the drug, we may consider the following1one-sided testH0 : µ ≤ 0 versus H1 : µ > 0. (1.1)The above one-sided test may be more appropriate than the following two-sidedtestH0 : µ = 0 versus H1 : µ 6= 0. (1.2)Furthermore, if subject-area knowledge suggests that the drug is unlikely to beharmful, then we may even consider the following one-sided testH0 : µ = 0 versus H1 : µ > 0. (1.3)In the literature, the likelihood ratio test (LRT) and Wald test are perhapsthe most commonly used hypothesis testing methods. In preparation for multi-parameter testing problems to be discussed later, here we briefly illustrate theideas for testing one-parameter problems (1.1), (1.2) and (1.3). Then, we willextend the methods to multi-parameter problems and discuss the challenges in-volved. Suppose that there are n independent and identically distributed observa-tions, X1, . . . ,Xn, drawn from a univariate normal population N(µ,σ2).Given (x1, . . . ,xn), which is a set of realized values of (X1, . . . ,Xn), the log-likelihood function is as follows.ln(µ,σ2) =−n2 logσ2− 12σ2n∑i=1(xi−µ)2. (1.4)2After some algebra, we can derive the maximum likelihood estimate (MLE) of themean parameter µ under different constraints:• Without any constraint on µ , the MLE of µ is µˆ = x = ∑ni=1 xi/n.• Under the one-sided constraint µ ≤ 0, the MLE of µ is µˆ−= x, if x≤ 0,0, if x > 0.• Under the one-sided constraint µ ≥ 0, the MLE of µ is µˆ+= x, if x≥ 0,0, if x < 0.• Under the constraint µ = 0, the MLE of µ is µˆ0 = 0.For the two-sided hypotheses (1.2),H0 : µ = 0 versus H1 : µ 6= 0,the Wald test statistic is given byTW =x2Vˆar(x)=n2x2∑ni=1(xi− x)2.The LRT statistic is given byTLR = 2[− n2log∑ni=1(xi− µˆ)2n− n2+n2log∑ni=1(xi− µˆ0)2n+n2]= n log[1+nx2∑ni=1(xi− x)2]= n log(1+1nTW). (1.5)It can be shown in Section 1.6.1 that both TW and TLR have a χ21 distribution as theirlarge sample approximate null distribution.3Since the one-parameter hypotheses (1.2) only involves a scalar parameter, theabove Wald test statistic TW may also be written asT ′W =x√∑ni=1(xi−x)2n /√n= sign(x)√TW ,where sign(x) = 1 if x is positive and sign(x) = −1 if x is negative. Note that thewell-known Student’s t-test statistic is given byT =x√∑ni=1(xi−x)2n−1 /√n=√n−1√nT ′W ,which has a Student’s t-distribution with degrees of freedom n−1 under H0. Sim-ilarly, the LRT statistic TLR in (1.5) can also be written asTLR = n log(1+1n−1T2).Therefore, the Wald test and the LRT are equivalent to the Student’s t-test in theone-parameter testing problems.Similarly, we can show that the Wald test and the LRT are equivalent to theStudent’s t-test for one-sided hypotheses to test the mean parameter µ . In otherwords, the Student’s t-test, LRT, and Wald tests are equivalent for testing eitherone-sided or two-sided hypotheses for a single mean parameter under the univariatenormal model. However, for multivariate normal models, when testing the meanparameter vector, the situation is different. In this case, the Student’s t-test may begeneralized to the Hotelling’s T-statistic for two-sided hypotheses, but there is nostraightforward generalization of Student’s t-test statistic for multi-parameter one-4sided hypotheses. In this case, both the Wald and LRT test statistics can still bedeveloped, but their null distributions are no longer the familiar χ2 distributions,due to the violation of standard regularity conditions for likelihood theory whenthe parameters are constrained. In the next section, we briefly review multivariateone-sided or order-restricted testing problems.1.2 A Review of Multivariate One-sided Tests forMultivariate Normal MeansThere has been an extensive literature on multivariate one-sided or order-restrictedtests in the last several decades. In this section, we briefly review the basic re-sults under the multivariate normal populations. Suppose that X 1,X 2, . . . ,X n areindependent and identically distributed random vectors from the p-dimension mul-tivariate normal distribution N(µ ,S), where X i = (xi1,xi2, · · · ,xip)T andµ = (µ1,µ2, · · · ,µp)T , p ≥ 2. For simplicity, we first assume that the covariancematrix S is known. We will discuss the case with unknown S later, where we maysubstitute an unknown S by its consistent estimates such as the sample covariancematrix. We focus on the following two multivariate one-sided hypotheses:H0 : µ = 0 versus H1 : µ ≥ 0, µ 6= 0, (1.6)H0 : µ ∈O− versus H1 : µ ∈ Rp\\O−, (1.7)where O− = {(µ1, . . . ,µp)T | µi ≤ 0, for i = 1, . . . , p}, µ ≥ 0 indicates µi ≥ 0for all i, and µ 6= 0 indicates µi 6= 0 for some i. Figure 1.1 shows an example ofthe parameter spaces for hypotheses (1.6) when p = 2, and Figure 1.2 shows anexample of the parameter spaces for hypotheses (1.7).5µ1µ2H1H0Q1Q2Q3 Q4Figure 1.1: Null and alternative parameter spaces in hypotheses (1.6) for p=2.6µ1µ2H1H1H1H0Q1Q2Q3 Q4ZFigure 1.2: Null and alternative parameter spaces in hypotheses (1.7) for p=2. The region to the left and bottom of the dashed lines is O−− µ ={x− µ : x ∈ O−}, for µ ∈ Q3. The dashed arrows mark the distancefrom Z to O−−µ and the distance from Z to O−.The log-likelihood function, up to an additive constant, is given byln(µ ) =−n2(X −µ )T S−1(X −µ ),where X = ∑ni=1 X i/n. When the covariance matrix S is known, the LRT and Waldtest statistics for the multivariate one-sided test (1.7) are the same and both aregiven byT1 = minv∈O−n(X − v)T S−1(X − v)7=‖ X −piSn(X ;O−) ‖2Sn ,where Sn = S/n, the norm ‖ y ‖2Sn is defined by yT S−1n y, and piSn(X ;O−) is theorthogonal projection of X ontoO− with respect to Sn. Intuitively, T1 measures the“distance” between X (the sufficient statistic for µ ) and the null parameter spaceO− with respect to Sn. Therefore, larger values of T1 indicate stronger evidenceagainst H0.Similarly, the LRT and Wald test statistic for the multivariate one-sided test (1.6)are also the same and both are given byT2 = nXTS−1X −minv∈O+n(X − v)T S−1(X − v)=‖ piSn(X ;O+) ‖2Sn ,where O+ = {(µ1, . . . ,µp)T | µi ≥ 0, for i = 1, . . . , p}, piSn(X ;O+) is the orthogo-nal projection of X onto O+ with respect to Sn. Again, larger values of T2 indicatestronger evidence against H0.To determine critical values or p-values of the above tests, we need to find thenull distributions of the test statistics, i.e., the distributions of T1 and T2 if the nullhypothsis H0 holds. Unlike the univariate case discussed in the previous section,the null distributions in the multivariate case are different from the “two-sided”Hotelling’s T -test, due to the shape of the null and alternative parameter spaces.We illustrate the basic ideas based on simple examples when p = 2.An example for the null distribution of T2 for testing (1.6)As an illustrating example, to intuitively understand the sampling distributionof T2 (say), we consider a simple case of p = 2 and S = ( 1 00 1). In this case,8piSn(X ;O+) and T2 take values in four possible regions, depending on which quad-rant the value of X = (X1,X2)T falls (see Figure 1.1):piSn(X ;O+) =(X1,X2)T if X ∈ Q1(0,X2)T if X ∈ Q2(0,0)T if X ∈ Q3(X1,0)T if X ∈ Q4,T2 =‖ piSn(X ;O+) ‖2Sn =n(X21+X22) if X ∈ Q1nX22 if X ∈ Q20 if X ∈ Q3nX21 if X ∈ Q4, (1.8)where Qi is quadrant i for i = 1,2,3,4.In Section 1.6.2, we show that, under H0: µ = 0, a) the quadrant that X may falland the “length” of X are statistically independent, and b) X are equally probableto fall in any of the four quadrants, i.e., Pr(X ∈ Qi) = 1/4 for i = 1,2,3,4. Thus,under H0, for any positive constant c, we havePr(T2 ≤ c) =4∑i=1Pr(T2 ≤ c|X ∈ Qi)Pr(X ∈ Qi) = 144∑i=1Pr(T2 ≤ c|X ∈ Qi).Note that, based on (1.8), we havePr(T2 ≤ c|X ∈ Q1) = Pr[n(X21+X22)≤ c]= Fχ2(c;2),where Fχ2(c;2) is the probability that a random variable following a χ2-distributionwith 2 degrees of freedom is less than or equal to c.9This is because under H0, (X1X2 ) follows a bivariate normal distribution withmean ( 00) and covariance matrix (1/n 00 1/n), so√nX1 and√nX2 are independentstandard normal random variables respectively and consequently the random vari-able n(X21 +X22) has a χ22 distribution, where χ2k denotes a χ2 distribution with kdegrees of freedom. Similarly, we havePr(T2 ≤ c|X ∈ Q2) = Pr(nX22 ≤ c) = Fχ2(c;1),Pr(T2 ≤ c|X ∈ Q4) = Pr(nX21 ≤ c) = Fχ2(c;1),Pr(T2 ≤ c|X ∈ Q3) = Pr(0≤ c) = 1 = Fχ2(c;0),where χ20 denotes a distribution that places probability 1 at the point mass 0, andFχ2(c; i) is the probability that a χ2i random variable is less than or equal to c.In summary, the above example shows that the null distribution of the LRT orWald test statistic T2 for testing (1.6) is given by the following so-called chi-bar-square distribution (denoted by χ¯2)Pr(T2 ≤ c) =2∑i=0ωi(2,Sn,O+)Fχ2(c; i),where chi-bar-weight ωi(2,Sn,O+) is a probability weight and has the followinginterpretationωi(2,Sn,O+) = Pr[piSn(X ;O+) is on the i-th dimensional face of O+].That is, in the above two-dimensional example, the 0, 1 and 2 dimensional faceof O+ are respectively the origin (0,0), the union of the positive part of x-axis and10y-axis, and the interior of the first quadrant Q1 (see Figure 1.1).In general, the null distribution of the LRT or the Wald test statistic T2 fortesting hypothesis (1.6) is given by (Silvapulle and Sen, 2004)Pr(T2 ≤ c) =p∑i=0ωi(p,Sn,O+)Fχ2(c; i),where the weightsωi(p,Sn,O+) = Pr[piSn(X ;O+) is on the i-th dimensional face of O+],with ∑pi=0ωSn(p,X ;O+) = 1.An example for the null distribution of T1 for testing (1.7)To intuitively understand the sampling distribution of the LRT and Wald teststatistic T1 =‖ X −piSn(X ;O−) ‖2Sn for testing hypothesis (1.7), we again considerthe simple case of p = 2 and S = ( 1 00 1). In this case, T1 takes values in four pos-sible regions, depending on which quadrant the value of X = (X1,X2)T falls, asillustrated in Figure (1.2):piSn(X ;O−) =(0,0)T if X ∈ Q1(X1,0)T if X ∈ Q2(X1,X2)T if X ∈ Q3(0,X2)T if X ∈ Q4,11T1 =‖ X −piSn(X ;O−) ‖2Sn =n(X21+X22) if X ∈ Q1nX22 if X ∈ Q20 if X ∈ Q3nX21 if X ∈ Q4. (1.9)Since the null parameter space O− contains more than one value of µ , thep-value will vary as a function of µ when H0 holds. In the following we willfirst show that the largest p-value is attained at the boundary point µ = 0 (seeFigure 1.2), which is a special case of Theorem 3.8.1 in Silvapulle and Sen (2004).Let tobs denote the observed value of the test statistic T1 based on a given sam-ple. Let Z = X −µ and‖ X −O− ‖2Sn=‖ X −piSn(X ;O−) ‖2Sn= T1.Let O−− µ = {x− µ : x ∈ O−} denote the region on the bottom-left side of thedashed lines in Figure 1.2. Note that O− = {(z1,z2)T : z1 ≤ 0,z2 ≤ 0} is containedin O−−µ = {(z1,z2)T : z1 ≤−µ1,z2 ≤−µ2} whenever µ = (µ1,µ2)T ∈O− (i.e.under H0), since zi ≤ 0 and µi ≤ 0 together imply zi ≤−µi for i= 1,2. Thus, underH0, we havePr( ‖ X −O− ‖2Sn≥ tobs) = Pr( ‖ (X −µ )− (O−−µ ) ‖2Sn≥ tobs)= Pr( ‖ Z− (O−−µ ) ‖2Sn≥ tobs)≤ Pr( ‖ Z−O− ‖2Sn≥ tobs). (1.10)The last step holds because the distance from Z to O− is at least as large as the12distance from Z to the bigger space O−− µ containing O−. As an example, anZ in quadrant two plotted in Figure 1.2 has the distance to O− strictly larger thanthe distance to O−− µ . Since Z has the same distribution as X when µ = 0, theinequality (1.10) suggests that the largest p-value based on T1 for testing µ ∈ O−against µ /∈O− is attained at µ = 0.Since the p-value is maximized at µ = 0, only the sampling distribution of T1 atµ = 0 is of interest, which is called the least favorable null distribution (Silvapulleand Sen, 2004). Following a similar argument used for the sampling distributionof T2, the cumulative distribution function of T1 at µ = 0 is given byPr(T1 ≤ c) =2∑i=0ω2−i(2,Sn,O−)Fχ2(c; i),where the chi-bar-weightω2−i(2,Sn,O−) = Pr[piSn(X ;O−) is on the (2− i)-th dimensional face of O−].In general, the least favorable null distribution of T1 for testing (1.7) is attainedat µ = 0 and has the following cumulative distribution function (Silvapulle andSen, 2004)Pr(T1 ≤ c) =p∑i=0ωp−i(p,Sn,O−)Fχ2(c; i),whereωp−i(p,Sn,O−) = Pr[piSn(X ;O−) is on the (p− i)-th dimensional face of O−],13with ∑pi=0ωp−i(p,Sn,O−) = 1.The chi-bar-weights ωp(p,Sn;O+) and ωp−i(p,Sn;O−) can be computed us-ing Monte Carlo simulations. A brief discussion of this approach is provided inSection 1.6.3 or Section 3.5 in Silvapulle and Sen (2004).Multivariate one-sided tests can also be conducted in the context of regressionmodels, as outlined in Section 3.9 of Silvapulle and Sen (2004). LRT and Waldtests can be constructed, and their asymptotic null distributions can be shown to besome chi-bar-square distributions under mild regularity conditions, as discussed inSection 4.3 of Silvapulle and Sen (2004).1.3 Multivariate One-sided Tests With Complex Data:Motivating ExamplesIn practice, multivariate data are often complicated in the sense that i) multivariatedata often contain missing values; ii) the variables may be of different types suchas one being a continuous variable and the other being a binary (discrete) variable;and iii) some data may be censored or truncated due to a lower or upper detec-tion limit. In particular, these problems may be even more common for longitudi-nal data or repeated measurement data, which may also be viewed as multivariatedata. Although there has been extensive literature on multivariate one-sided ororder-restricted hypothesis testing, as reviewed in Silvapulle and Sen (2004), thecorresponding tests incorporating the foregoing data complications are very lim-ited. It is important to fill in this gap so that such tests can be used in real worldproblems. In this section, we present several real world examples which motivatedour research and the current thesis.Example 1. A temperament dataset with missing values.14Surkan et al. (2015) describe a randomized trial investigating the use of severalinterventions (e.g. zinc-folic and iron-folic acid) on temperament improvement inchildren in rural Nepal. At baseline and at each follow-up visit (at roughly threemonths interval), subjects receive questionnaires. Subjects’ answers to 12 relatedquestions are used to derive an aggregated temperament score. Lower scores indi-cate easier temperament. Missing data arise whenever more than one third of thesequestions are unanswered. The rates of missing temperament scores are not small(roughly 15%), and the missing data mechanisms may be informative in the sensethat missing temperament scores may represent systematically better or worse tem-perament status compared with observed temperament scores. Figure 1.3 displaysthe temperament score trajectories of 10 randomly subjects from the zinc-folicgroup. Missing data are present for 2 of the 10 subjects.15024681012Pure ZincVisit timeTemperament scoreBaseline 1st follow−up 2nd follow−upllFigure 1.3: The temperament score trajectories of 10 randomly subjects fromthe zinc-folic group in the Surkan et al. (2015) trial. The dot subject hasa missing temperament score at the 2nd follow-up visit and the trianglesubject has a missing temperament score at the 1st follow-up visit.Let yi j and ti j respectively denote the temperament score and measurementtime (in weeks) of subject i at visit j ( j = 0 corresponds to baseline) in each group.Let x1i = (yi0− yi1)/(ti1− ti0), x2i = (yi0− yi2)/(ti2− ti0), µ1 = E(X1i), and µ2 =E(X2i), where X1i and X2i are underlying random variables whose correspondingrealized values are x1i and x2i. To find evidence for short-term (first six months)amelioration of temperament in this population, we may perform the following16hypothesis testingH0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0. (1.11)This is an example of the testing problem (1.7) in Section 1.2.Wang and Wu (2011) studied this type of problems under the assumption ofmultivariate normality. They use a multiple imputation approach to address miss-ing data. However, their method requires the assumption of equal ratios of missing-to-observed information (i.e. missing rates are roughly similar across variables),as well as the assumption of missing at random in the sense of Rubin (1976) (moredetailed explanation in Section 2.3). In Chapter 3 of this thesis, we will propose amore general likelihood-based method to handle missing data in the above testingproblem without the assumptions of equal missing rates and missing at random.Example 2. A semi-continuous acupuncture data.Albert and Shen (2005) describe a study in which three intervention groups arecompared: a control group receiving medication alone, a treatment group receivingmedication plus sham acupuncture (minimal needling that are believed not to be bi-ologically effective), and another treatment group receiving medication plus activeacupuncture. Since acupuncture is a non-chemical supplement to medication, it isnaturally expected that there should be non-decreasingly stronger treatment effectsacross the three groups.Albert and Shen (2005) considers a longitudinal outcome: the volume of vom-ited substance measured at baseline and several follow-up visits. Lower volumevalues indicate improved health conditions and zero values indicate full recovery(i.e. a qualitatively distinct state). Roughly 50% response values are zeros and17the remaining 50% response values are positively skewed. Figure 1.4, taken fromAlbert and Shen (2005), displays the trajectories of log(volumn+1) for a randomselected subject from each group. These types of data with excessive zeros arereferred to as semi-continuous data or zero-inflated data in the literature (Duanet al., 1983; Olsen and Schafer, 2001; Neelon et al., 2016). Due to excessive zeros,semi-continuous data cannot be modelled by a single continuous distribution suchas the normal distribution. In the literature, a two-part model or a mixture modelare often considered for such data: one model is used to model the binary responseof “zero-or-not” and another model is used to model the non-zero continuous data.We will describe the details in later chapters.18Figure 1.4: The trajectories of log(volumn+1) for a randomly selected sub-ject from each group taken from Albert and Shen (2005): a) activeacupuncture plus medication; b) sham acupuncture plus medication; andc) medication aloneLet β1 and β2 respectively denote the mean difference (in log-odds scale as ina logistic regression) of vomiting between the “medication plus active acupunc-ture” group and the medication only group, and that between the “medication plussham acupuncture” group and the medication only group. Then, the following or-19der constraint β1 ≤ β2 ≤ 0 suggests that more active use of acupuncture will notexacerbate the risk of vomiting. Similarly, let β3 and β4 respectively denote themean difference in the amount of vomiting between the “medication plus activeacupuncture” group and the medication only group, and that between the “medica-tion plus sham acupuncture” group and the medication only group. Then, the con-straint β3 ≤ β4 ≤ 0 suggests that more active use of acupuncture will not increasethe amount of vomiting, conditional on the occurrence of vomiting. Therefore, thefollowing order-constrained hypothesis testing is of interest:H0 : β1 = β2 = β3 = β4 = 0, versusHc1 : β1 ≤ β2 ≤ 0, β3 ≤ β4 ≤ 0, with at least one inequality being strict.The above order-restricted hypothesis testing may be more appropriate and morepowerful than the standard analysis of variance (ANOVA) hypotheses. We willreturn to this example with more details later.Example 3. HIV viral dynamics.Studies of HIV viral dynamics during an anti-HIV treatment have receivedgreat attention in recent years. HIV viral dynamic models are used to model theviral load trajectories during an anti-HIV treatment, which allows us to evaluatethe treatment effects. Based on biological arguments, Wu and Ding (1999) suggestthe use of the following viral dynamic model (which is a nonlinear mixed-effects(NLME) model)Yi j = log10(eP1i−λ1i jti j + eP2i−λ2iti j)+Ei j20P1i = P1+B1i, λ1i j = λ1+α1CD4i j +B2i,P2i = P2+B3i, λ2i = λ2+B4i, j = 1, . . . ,ni, i = 1, . . . ,n,where Yi j is the log10-transformed viral load of individual i at time ti j, Ei j is thecorresponding random error, Bki’s are random effects for k = 1, . . . ,4, eP1 and eP2are respectively the population baseline viral loads in the productively infectedcells and in the long-lived or latently infected cells, λ1 and λ2 are respectivelythe viral decay rates in the forgoing two cell compartments, and λ1i j and λ2i areindividual-specific viral decay rates in those two cell compartments.The order constraint P2 > P1 has biological interpretation and is also assumedto ensure model identifiability. The constraints λ1 ≥ 0 and λ2 ≥ 0 are reasonablesince an anti-HIV treatment usually does not increase virus levels during the initialperiod of treatments. Moreover, we typically have λ1 ≥ λ2, due to the decreasingtreatment effects over time. When λ1 = λ2 = 0, there will be no viral decay (i.e.no treatment effect). The viral load may rebound in later periods, but we focus onthe initial period (say, first three months) which best reflects the treatment efficacy.A main challenge in the analysis of HIV data is that viral loads are often trun-cated or censored. That is, due to detection limits of medical devices, HIV viralload data are frequently subject to left-censoring (Hughes, 1999; Vaida et al., 2007;Wu, 2009). For example, Figure 1.5 shows the trajectories of 4 randomly selectedsubjects from two HIV viral load studies (ACTG 315 and ACTG 398). Both studiesare complicated by the presence of censoring in HIV viral load data.21llll l l l0 20 40 60 801234567ACTG 315 sbujectDaylog10 Viral loadllll0 20 40 60 801234567ACTG 315 sbujectDaylog10 Viral loadlllll l l0 20 40 60 801234567ACTG 315 sbujectDaylog10 Viral load llll0 20 40 60 801234567ACTG 315 sbujectDaylog10 Viral loadllll0 50 100 150 2001234567ACTG 398 sbujectDaylog10 Viral loadlll l l l0 50 100 150 2001234567ACTG 398 sbujectDaylog10 Viral loadlllll l0 50 100 150 2001234567ACTG 398 sbujectDaylog10 Viral loadllll l l0 50 100 150 2001234567ACTG 398 sbujectDaylog10 Viral loadFigure 1.5: HIV viral load trajectories of 4 randomly selected subjects fromACTG 315 study and from ACTG 398 study respectively. In the figures,dotted horizontal lines indicate censoring thresholds, and square pointsbelow dashed lines are half the detection limits.22In the above HIV viral dynamic model, a basic research question is if the treat-ment is actually effective since there are large between-individual variations. Thisquestion can be answered by testing the following one-sided hypothesisH0 :λ1 = λ2 = 0 versusH1 :λ1 ≥ 0, λ2 ≥ 0, with at least one inequality being strict.In this thesis, we will consider multivariate one-sided or order-restricted tests forHIV viral dynamic models or more generally nonlinear mixed effects models inthe presence of censoring in the responses.1.4 Literature ReviewThere has been an extensive literature on multivariate one-sided tests or order-restricted tests. Silvapulle and Sen (2004) provides a nice summary of early liter-ature. In the early literature, the focus is on multivariate normal models withoutany data complications such as missing data or censored data. In this section, weprovide a brief review of the more recent literature.A common approach for multivariate one-sided tests is based on the likelihoodmethod. Davidov and Rosen (2011) and Farnan et al. (2014) considered multi-variate one-sided tests for continuous longitudinal data using linear mixed effectsmodels. Davis et al. (2012) consider multivariate one-sided tests for categoricaland count longitudinal data using generalized linear mixed effects models. Wangand Wu (2013) consider multivariate one-sided tests for nonlinear mixed effectsmodels. Pedersen (2017) considers one-sided tests for time series data using gen-eralized autoregressive conditional heteroskedasticity (GARCH) models. Davidov23et al. (2017) consider one-sided tests for continuous or discrete multivariate datawhere the models may be singular.Incorporating constraints into statistical inference has also been considered inBayesian methods. Kline (2011) provides a comparison between Bayesian andfrequentist methods that both address a one-sided hypothesis about a multivari-ate mean. Oh (2014) considers Bayesian variable selection for linear regressionmodels in which some regression coefficients are subject to inequality constraints.Mulder (2016) presents a recent review of Bayesian testing methods for constrainedhypotheses.Since longitudinal data are often complicated by missing values, censored val-ues, and measurement errors, as illustrated by examples in the previous section,there is also a growing interest in methods for multivariate one-sided tests that ad-dress data complexities. Wang and Wu (2011) propose multivariate one-sided testsfor multivariate normal models in the presence of ignorable missing data usingmultiple imputation methods. Davis et al. (2013) considers multivariate one-sidedtests for generalized linear models with missing covariates.1.5 Thesis OutlineIn the literature, existing multivariate one-sided or order-restricted tests are typi-cally based on “regular data”, i.e., data without missing values, censoring, or othercomplications. In practice, however, multivariate data or longitudinal data are usu-ally not “regular” in the sense that missing values, censoring, and other complica-tions are very common. Therefore, there is a need to develop statistical methods inthe presence of data complications.This thesis focuses on multivariate one-sided or order-restricted tests for com-24plex data: 1) multivariate data with missing values and the missingness may benon-ignorable; 2) semi-continuous or zero-inflated longitudinal data; and 3) trun-cated or censored longitudinal data. For multivariate data, we assume multivariatenormal models. For longitudinal data, we consider mixed effects models. The gen-eral approach is based on the likelihood method. We mainly focus on the likelihoodratio tests (LRT) and Wald-type tests since these tests are widely used in one-sidedor order-restricted inference. In the presence of “unobservables” such as missing orcensored values, the likelihood functions involve evaluations of high-dimensionaland intractable integrals. Thus, a main issue in our work is computation.The rest of the thesis is organized as follows. Chapters 2 provides an overviewof complications of multivariate data and longitudinal data in practice, includingmissing values, truncated values, and semi-continuous data, as well as existingapproaches to address these problems. Chapter 3 proposes two new approachesfor multivariate one-sided tests for multivariate normal data with possibly nonig-norable missing values based on a likelihood approach. In Chapter 4, we studymultivariate order-restricted tests for semi-continuous longitudinal data based onmixed effects models, which has not been studied in the literature. Chapter 5 stud-ies multivariate one-sided tests for truncated/censored longitudinal data, which hasnot been studied in the literature either. Chapter 6 presents conclusions, discus-sions, as well as plans for future work.251.6 Appendix: Some Miscellaneous Results1.6.1 Some Results for Section 1.1Let X1, . . . ,Xn denote n independent and identically distributed observations drawnfrom a normal population N(µ,σ2), and let X = 1n ∑ni=1 Xi be the sample mean. Letd→ and p→ respectively denote convergence in distribution and in probability asn→ ∞.Recall that, in Section 1.1, the LRT and Wald test statistic for testing the (uni-variate) two-sided hypothesis H0 : µ = 0 versus H1 : µ 6= 0 are given respectivelybyTW =n2X2∑ni=1(Xi−X)2,TLR = n log(1+1nTW).Under H0 : µ = 0, we show that the following results holdTWd→ χ21 ,TLR−TW p→ 0,TLRd→ χ21 .Proof. When µ = 0, by the central limit theorem, we have√nXσd→ N(0,1),26which suggests thatnX2σ2d→ χ21 .Now, by the law of large numbers, we have∑ni=1(X−µ)2np→ σ2,Xp→ µ.Thus, based on the Slutsky’s theorem, we haven∑i=1(Xi−X)2/n =n∑i=1(X−µ)2/n+(µ−X)2 p→ σ2,σ2∑ni=1(Xi−X)2/np→ 1,TW =n2X2∑ni=1(Xi−X)2=nX2σ2× σ2∑ni=1(Xi−X)2/nd→ χ21 ×1 = χ21 .Further, we haveTLR−TW = n log(1+1nTW)−TW= log(1+1nTW)n− logeTW= log[(1+ 1n TW )n/TW ]TWeTW= TW log(1+ 1n TW )nTWep→ TW log(ee) = 0.The above last step holds because i) TW is bounded in probability in the sense27that the probability that TW is bounded goes to 1 as n goes to infinity, which isjustified from TWd→ χ21 (see the results in Section 1.2.5 in Serfling (1980)), ii)(1+ 1n TW )nTWp→ e due to nTWp→ ∞, and iii) log(·) is a continuous function.Finally, by the Slutsky’s theorem, TLRd→ χ21 follows from the above results thatTWd→ χ21 and TLR−TWp→ 0.Also recall that in Section 1.1, the LRT and Wald test for testing the (univariate)one-sided hypothesis H0 : µ = 0 versus H1 : µ > 0 are given respectively byTW = 0 if X ≤ 0,n2X2∑ni=1(Xi−X)2, if x > 0.TLR = n log(1+1nTW).Under the null hypothesis H0 : µ = 0, we show that the following results holdPr(TW ≤ c)→ 12 +12Fχ2(c;1), for any c > 0, as n→ ∞, (1.12)TLR−TW p→ 0, (1.13)Pr(TLR ≤ c)→ 12 +12Fχ2(c;1), for any c > 0, as n→ ∞. (1.14)Proof. First, we have that under H0 : µ = 0,Pr(TW ≤ c) = Pr(TW ≤ c|X ≤ 0)Pr(X ≤ 0)+Pr(TW ≤ c|X > 0)Pr(X > 0)=12+12Pr[ n2X2∑ni=1(Xi−X)2≤ c].28It is shown in the previous proof thatn2X2∑ni=1(Xi−X)2d→ χ21 ,then (1.12) follows. Since the relationship between TLR and TW remains the sameas in the previous results, the arguments in the previous proof can be used toshow (1.13) and (1.14).1.6.2 Some Results for Section 1.2Let the random vectorX = (X1,X2)T ∼ N[0,( 1 00 1)].Consider the transformation X1 = R× sinΘ and X2 = R× cosΘ, where R repre-sents the length of X and Θ represents the direction of X with a range of [−pi,pi).We show that R and Θ are independent and Θ follows a uniform distribution on[−pi,pi).Proof. The Jacobin determinant equals∣∣∣∣∣∣∣cosΘ −RsinΘsinΘ RcosΘ∣∣∣∣∣∣∣ = R. Let I(A) denotethe indicator function for a set A. The joint probability density function (pdf) of(R,Θ)T is given by:f (r,θ) =( 1√2pi)2exp(− x21+ x222)r× I(0≤ r <+∞ and −pi ≤ θ < pi)=12piexp(− r22)rI(0≤ r <+∞)I(−pi ≤ θ < pi)= f1(r) f2(θ),29where f1(r) = exp(− r22 )rI(0 ≤ r < +∞) is the marginal pdf of R and f2(θ) =12pi I(−pi ≤ θ < pi)is the marginal pdf of Θ.Since the joint pdf R and Θ can be factorized as the product of the marginalpdf of R and the marginal pdf of Θ, it follows from Lemma 4.2.1 in Casella andBerger (2002) that R and Θ are independent. In addition, the pdf of Θ indicatesthat it follows a uniform distribution on [−pi,pi).1.6.3 Simulation Based Calculation of Chi-bar-weightsAs we noted in Section 1.2, the null distributions of multivariate one-sided testsare typically chi-bar-square (χ¯2) distributions, which are weighted averages of χ2random variables. A main step to obtain these chi-bar-square (χ¯2) distributions isto compute the weights. In this section, we describe a simulation-based method tocompute these weights.First, we note that the solution b to the following standard quadratic program-ming problem is well-coded, for example, in the R package quadprog:minimize −dT b+ 12bT Db subject to AT b ≥ b0, (1.15)for a known matrix A, vector b0, d , and a positive definite matrix D. Here eachmatrix/vector is of appropriate size to ensure meaningful matrix multiplications.Next, we describe how to compute the chi-bar-weightωi(p,Sn;O+) using MonteCarlo simulations, following Silvapulle and Sen (2004). Specifically, the followingMonte Carlo method can be used to compute the chi-bar-weights:ωi(p,Sn;O+) = Pr[piSn(X ;O+) is on the i-th dimensional face of O+],30piSn(X ;O+) = arg minv∈O+(X − v)T S−1n (X − v),where X =∑ni=1 X i/n and Sn = S/n, with X 1,X 2, . . ., and X n being independent andidentically distributed random vectors from the p-dimension multivariate normaldistribution N(0,S).The specific steps are as follows:(a) Generate X ∼ N(0,Sn).(b) Compute piSn(X ;O+) = X − bˆ, where bˆ is the solution to (1.15) when D is setto 2S−1n , d is set to 0, b0 is set to −X and A is set to −ID, the identity matrixof the same size as D.(c) Repeat steps (a) and (b) N times, say N = 10000.Then, for i = 0, . . . , p, we haveωi(p,Sn;O+)≈ niN ,where ni is the number of times in which piSn(X ;O+) in (b) has exactly i positivecomponents.Similarly, the following Monte Carlo method can be used to compute the chi-bar-weightωp−i(p,Sn;O−) = Pr[piSn(X ;O−) is on the (p− i)-th dimensional face of O−],piSn(X ;O−) = arg minv∈O−(X − v)T S−1n (X − v).The specific steps are as follows31(a) Generate X ∼ N(0,Sn).(b) Compute piSn(X ;O−) = X − bˆ, where bˆ is the solution to (1.15) when D is setto 2S−1n , d is set to 0, b0 is set to X and A is set to ID, the identity matrix ofthe same size as D.(c) Repeat (a) and (b) N times, say N = 10000.For i = 0, . . . ,r, we haveωp−i(p,Sn;O−)≈ miN ,where mi is the number of times in which piSn(X ;O−) in (b) has exactly (p− i)negative components.32Chapter 2Complications in LongitudinalData2.1 Longitudinal Data and Mixed Effects ModelsLongitudinal studies are characterized by repeated observations or measurementscollected at different time points for each subject. For example, Figures 2.1 showsthe trajectories of repeated CD4 cell counts and HIV viral load measured at dif-ferent days of all subjects from the ACTG 315 study which will be describedlater. Longitudinal studies are widely used in practice since they allow us to studychanges over time, in addition to variations between subjects. Note that some lon-gitudinal data may be analyzed by viewing them as multivariate data if the dataare balanced (i.e., data are available at each time point), or if data at different timepoints are viewed as data from different variables. In practice, longitudinal data areoften complicated. For example, the following issues are common in longitudinaldata:33• missing data or dropouts,• data truncations or censoring,• semi-continuous data or zero-inflated data,• measurement errors,• outliers (heavy tails),• · · · .Thus, analysis of longitudinal data can be challenging. In this thesis, we focus onthe first three issues listed above.The defining feature of longitudinal data is that observations within the samesubject are usually correlated, whereas observations from different subjects arereasonably assumed as independent. Three methods are commonly used to modellongitudinal data: mixed-effects models, marginal or generalized estimating equa-tion (GEE) based models, and transitional models. These methods address within-subject correlations in different ways.34234560 25 50 75Number of days after baselinelog10(viral load)Viral load01002003004005000 25 50 75Number of days after baselineCD4 countsCD4Figure 2.1: An example of features of longitudinal data: trajectories of HIVviral load and CD4 from the ACTG 315 data, the dashed line in the leftpanel indicates censoring threshold and the points marked by crossesindicate censored HIV viral load observationsIn this thesis, we focus on (parametric) mixed-effects models for longitudinaldata. Mixed effects models use random effects to address between-subjects vari-ation and within-subjects correlation in the longitudinal response variable. Mixedeffects models are popular for longitudinal data since they have the following ad-vantages:• They naturally extend common regression models for cross-sectional data byintroducing random effects into the models.• They allow for individual-specific inference, in addition to the usual population-35average inference.• They automatically incorporate missing data if the missingness is missing atrandom in likelihood inference.Disadvantages of mixed effects models include distributional assumptions of therandom effects and within-individual errors and computational challenges for like-lihood inference, compared with (say) GEE based models.As noted earlier, mixed effects models can be obtained by adding random ef-fects to the corresponding regression models for cross-sectional data. Commonregression models for cross-sectional data include linear, generalized linear, andnonlinear regression models. By adding appropriate random effects in these mod-els, we can obtain three common types of mixed effects models• linear mixed effects (LME) models,• generalized linear mixed effects models (GLMM),• nonlinear mixed effects (NLME) models.The choices of random effects in these models can be decided based on between-individual variations in the data and AIC or BIC criteria.When some parameters in a mixed effects model have certain constraints, suchas all being positive, statistical inference should incorporate such constraints. Al-though there is a large separate literature on both constrained inference and mixedeffects models, order-restricted inference for mixed effects models is still very lim-ited. Therefore, in this thesis we focus on order-restricted inference for parametricmixed effects models and simultaneously address some data complications such as36missing data, semi-continuous data, and censored data. We will begin with mul-tivariate normal models since existing literature on constrained inference mostlyfocuses on such models. In the following sections, we provide briefly review ofthe basics of mixed effects models, methods for missing data, methods for semi-continuous data, and methods for censored longitudinal data.Before we discuss constrained inference and data complications, we first brieflyreview the basics of the above three types of mixed effects models in the followingsubsections, as preparation for later sections and chapters. Throughout this chapter,we use ni to denote the number of time points for subject i, and n to denote thenumber of subjects.2.1.1 Linear Mixed Effects (LME) ModelsLME models can be constructed by adding appropriate random effects to linearregression models. For example, the following LME may be considered for theCD4 cell counts trajectories in the ACTG 315 study (Wu, 2002):Yi j = β0+β1ti j +Bi0+Bi1ti j +Ei j,(Bi0,Bi1)i.i.d.∼ N(0,D), Ei j i.i.d.∼ N(0,σ2),(Bi0,Bi1) independent of Ei j, i = 1, . . . ,n, j = 1, . . . ,ni,where “i.i.d.” is an abbreviation for independent and identically distributed andYi j denotes the CD4 measurement collected from subject i at time ti j. In this LMEmodel, the fixed effects β0 and β1 respectively describe the population-averagebaseline CD4 and CD4 change rate over time (i.e. CD4 slope). The random effectBi0 describes the individual deviation in baseline CD4 from the population average37and the random effect Bi1 describes the individual deviation in CD4 slope from thepopulation average for subject i. These random effects also incorporate correlationsamong the repeated CD4 measurements within each subject.A general form of LME models are given by Laird and Ware (1982)Y i = Xiβ +ZiBi+E i,Bii.i.d.∼ N(0,D), E i i.i.d.∼ N(0,σ2Ini×ni),Bi is independent of E i, i = 1, . . . ,n,where the vector β denotes population parameters (also called fixed effects), thevector Bi denotes random effects, Y i and E i are respectively the response vectorand the within-subjects error vector, Xi and Zi are respectively the design matricesfor fixed effects and random effects, and Ini×ni is an ni×ni identity matrix.Statistical inference for mixed effects model is typically based on the like-lihood method. Based on the observed data {y1, . . . ,yn}, that is, realizations of{Y 1, . . . ,Y n}, the likelihood function for the above general LME models can bewritten asL(β ,D,σ2) =n∏i=1∫fY i|Bi(yi|bi;β ,σ2) fBi(bi;D)dbi=n∏i=11(√2pi)q+ni√|D|σ2ni×∫exp{− 12σ2(yi−Xiβ −Zibi)T (yi−Xiβ −Zibi)−12bTi D−1bi}dbi,where q is the dimension of bi, and the unobservable random effects are inte-grated out. Maximum likelihood estimates for the mean parameters and restricted38maximum likelihood estimates for the variance parameters can be obtained basedon the Expectation-Maximization (EM) algorithm, as described in Laird and Ware(1982). Section 2.2 provides a brief review of EM algorithms.2.1.2 Generalized Linear Mixed Effects Models (GLMMs)Generalized linear models (GLMs) extends linear regression models by allowingnon-normal responses. Let Y1, . . . ,Yn be n independent observations following adistribution from the exponential family, which includes normal, binomial, Pois-son, and some other distributions. Let µi = E(Yi), and let g(·) be a link function. Ageneral GLM can be written as (McCullagh and Nelder, 1989).g(µi) = β0+β1xi1+ . . .βpxip,where β0, . . . ,βp are unknown parameters and xi1, . . . ,xip are predictors (covari-ates). For example, when Y is a binary variable taking values 0 or 1, we mayassume Ti follows a Bernoulli distribution, so Pr(Yi = 1) = E(Yi) = µi. Let g(µi) =logit(µi) = log[µi/(1− µi)]). The resulting GLM is a logistic regression model.For longitudinal data, a GLMM can be constructed by adding random effects inthe corresponding GLM. For example, sometimes CD4 values are dichotomized,defined as whether the raw CD4 value is below or above a reference value (say)350 (Aina et al., 2005). Then, the redefined binary CD4 data may be modeled bythe following logistic regression with random effects (a special case of GLMM)logit Pr(Yi j = 1|Bi0 = bi0,Bi1 = bi1) = β0+β1ti j +bi0+bi1ti j, (2.1)(Bi0,Bi1)Ti.i.d.∼ N(0,D), i = 1, . . . ,n, j = 1, . . . ,ni,39where Yi j denotes the dichotomized CD4 from subject i at time ti j.A general form of GLMM models is as follows (Diggle, 2002)g(µ i|Bi = bi) = Xiβ +Zibi,Bii.i.d.∼ N(0,D), i = 1, . . . ,n,where β contains fixed effects, Bi’s are random effects, g(µ i)= [g(µi1), . . . ,g(µini)]Twith µi j = E(Yi j), g(·) is a link function, and Xi and Zi are respectively the designmatrices for fixed effects and random effects. Statistical inference for a GLMM istypically based on the likelihood method. Based on the observed data {y1, . . . ,yn},that is, realizations of {Y 1, . . . ,Y n} with Y i = {Yi1, . . . ,Yini} for i = 1, . . . ,n, thelikelihood function for the above GLMM can be written asL(β ,D) =n∏i=1∫fY i|Bi(yi|bi;β ) fBi(bi;D)dbi=n∏i=11(√2pi)q√|D|×∫ ni∏j=1fYi j|Bi(yi j|bi;β )exp{− 12bTi D−1bi}dbi,where q is the dimension of Bi and fYi j|Bi(yi j|bi;β ) is the conditional density ofYi j in the exponential family. In example (2.1), fYi j|Bi(yi j|bi;β ) =[11+exp(−ηi j)]yi j×[11+exp(ηi j)]1−yi jwith ηi j = β0+β1ti j+bi0+bi1ti j, β =(β0,β1)T and bi =(bi0,bi1)T .The integral in the above likelihood function is generally intractable since thelink function is often nonlinear. Thus, Monte Carlo methods, numerical methods,and approximate methods are often used for parameter estimation, such as MonteCarlo EM algorithms (Booth and Hobert, 1999), Laplace approximations (Wolfin-40ger, 1993), and Gauss−Hermite quadrature or adaptive Gauss−Hermite quadrature(Joe, 2008; Pinheiro and Chao, 2006). A recent comprehensive review for GLMMsis given by Stroup (2012).2.1.3 Nonlinear Mixed Effects (NLME) ModelsUnlike linear and generalized linear models which are usually empirical mod-els, nonlinear regression models are usually mechanistic or scientific models inthe sense that they are usually based on underlying data-generating mechanisms.For example, the viral dynamic models proposed in Wu and Ding (1999) are so-lutions of a set of differential equations which roughly describe viral productionand elimination processes during an anti-HIV treatment. For longitudinal data,NLME models are constructed by introducing appropriate random effects to thecorresponding nonlinear regression models. Consequently, NLME models oftenreflect data-generating mechanisms. The viral dynamic models proposed in Wuand Ding (1999) can be written asYi j = log10(eP1i−λ1i jti j + eP2i−λ2iti j)+Ei j,P1i = P1+B1i, λ1i j = λ1+α1CD4i j +B2i,P2i = P2+B3i, λ2i = λ2+B4i, j = 1, . . . ,ni, i = 1, . . . ,n,where Yi j, CD4i j and Ei j respectively denote the log10-transformed viral load, thecovariate CD4 count, and the within-subject error of individual i at time ti j, Bki’s arerandom effects for k = 1, . . . ,4, the fixed effect parameters eP1 and λ1 respectivelydenote baseline viral loads and viral decay rate in the productively infected cells,and the fixed effect parameters eP2 and λ2 respectively denote baseline viral loads41and viral decay rate in long-lived or latently infected cells.A general NLME model can be written as follows (Davidian and Giltinan,1995)Yi j = h(xi j,β ,Bi)+Ei j,Bii.i.d.∼ N(0,σ2D), Ei j i.i.d.∼ N(0,σ2),Bi independent of Ei j, j = 1, . . . ,ni, i = 1, . . . ,n,where h(·) is a known nonlinear functions, vector β contains fixed effect parame-ters, vector Bi denotes random effects for subject i, and Yi j, xi j and Ei j are respec-tively the response, the covariate vector and the within-subject error for the j-threpeated measurement from the i-th subject. Note that, unlike generalized linear(mixed) models, the above nonlinear function h(·) can be any smooth function, notnecessarily monotone.Statistical inference for NLME models is typically based on the likelihoodmethod. Based on the observed data {y1, . . . ,yn}, that is, realizations of {Y 1, . . . ,Y n}with Y i = {Yi1, . . . ,Yini} for i = 1, . . . ,n, the likelihood function for the above gen-eral NLME can be written asL(β ,D,σ2) =n∏i=1∫ ni∏j=1fY i|Bi(yi|bi;β ,σ2) fBi(bi;D)dbi=n∏i=11(√2piσ)ni+q√|D|×∫ ni∏j=1exp{− bTi D−1bi+∑nij=1[yi j−h(xi j,β ,bi)]22σ2}dbi,where q is the dimension of bi. Similar to GLMMs, the above integral in the42likelihood of a NLME model generally does not have closed-form expression dueto the nonlinear function. Commonly used computational approximation methodsinclude iterative linearization methods (Lindstrom and Bates, 1990), Laplace ap-proximations (Wolfinger, 1993), Monte Carlo EM algorithms (Kuhn and Lavielle,2005), and Gauss−Hermite quadratures or adaptive Gauss−Hermite quadratures(Joe, 2008; Pinheiro and Bates, 1995). Note that, although the computational meth-ods are similar for GLMM and NLME models, there are differences in specificdetails since a NLME model assumes a normal distribution for the response whilea GLMM assumes a distribution in the exponential family which includes binaryand count responses.2.2 Missing Data ProblemsMissing data are very common in many longitudinal studies. For example, Fig-ure 1.3 displays examples of missing longitudinal temperament measurements. Inthe presence of missing data, standard statistical methods may not be directly ap-plicable, and ignoring missing data may lead to biased analysis results. The firststep to handle missing data is to consider possible missing data mechanisms, i.e.,how the data are missing, since choices of statistical methods for missing data maydepend on the types of missing data mechanisms. Rubin (1976) defines three typesof missing data mechanisms:• Missing completely at random (MCAR): the probabilities of data being miss-ing depend on neither the observable data nor the unobservable data. Forexample, missing data due to a broken measurement device are likely to beMCAR.43• Missing at random (MAR): the probabilities of data being missing may de-pend on the observable data but not on the unobservable data. For example,a patient may miss a clinic visit because he is too old, so the missingnessonly relates to his age (observed).• Missing not at random (MNAR): the probabilities of data being missing maydepend on both the observed data and the missing data. For example, whenevaluating a new drug, missing data from patients with drug side-effects arelikely to be MNAR.MCAR and MAR are called ignorable missing, while MNAR is called non-ignorablemissing, since likelihood-based methods can ignore the missing data mechanismsif the missing data are MCAR or MAR but not for MNAR (Little and Rubin, 2002).As reviewed in Little and Rubin (2002), perhaps the two most commonly usedformal methods to address missing data are• likelihood methods based on the Expectation-Maximization (EM) algorithms;• multiple imputation methods;Wang and Wu (2011) considered multiple imputation methods, assuming MCARor MAR and equal missing rates. In this thesis, we consider likelihood-based meth-ods since they have the following advantages: (i) missing data mechanisms can beignored when the missing data are MCAR or MAR, (ii) the missing data mecha-nisms can be explicitly incorporated in a relatively straightforward way when themissing data are MNAR, and (iii) the missing data patterns can be arbitrary (i.e.,there is no need to assume equal missing rates).44In the following, we briefly review the basic ideas of the likelihood methodfor missing data in longitudinal studies. Let Y i = (Yi1, . . . ,Yini)T denote the re-peated measurements of the response from the i-th subject. Let Ri =(Ri1, . . . ,Rini)Tbe the missing indicator for Y i with Ri j = 1 if Yi j is observed and Ri j = 0 ifYi j is missing. Let Y i,obs and Y i,mis respectively denote the observed and miss-ing components in Y i. Let θ denote parameters in the response model and let φdenote parameters in the missing data model. Under MAR or MCAR, we havefRi|Y i,mis,Y i,obs(r i|yi,mis,yi,obs;φ ) = fRi|Y i,obs(r i|yi,obs;φ ), and thus we can factor theobserved-data likelihood as follows∏ifY i,obs,Ri(yi,obs,r i;θ ,φ )=∏i∫fY i,obs,Y i,mis,Ri(yi,obs,yi,mis,r i;θ ,φ )dyi,mis=∏i∫fRi|Y i,mis,Y i,obs(r i|yi,mis,yi,obs;φ ) fY i,mis,Y i,obs(yi,mis,yi,obs;θ )dyi,mis=∏ifRi|Y i,obs(r i|yi,obs;φ )∫fY i,mis,Y i,obs(yi,mis,yi,obs;θ )dyi,mis=∏ifRi|Y i,obs(r i|yi,obs;φ )×∏ifY i,obs(yi,obs;θ ).The distinct parameters assumption (Schafer, 1997) states that the parameter spaceof θ is the same for each value of φ and vice versa. This can ensure that, if (θˆ , φˆ ) isthe joint maximizer of ∏i fY i,obs,Ri(yi,obs,r i;θ ,φ ), then θˆ and φˆ must be the respec-tive maximizer of ∏i fY i,obs(yi,obs;θ ) and ∏i fRi|Y i,obs(r i|yi,obs;φ ), and vice versa.Intuitively, the distinct parameters assumption says that the parameters in the re-sponse model and the parameters in the missing data model provide no informationabout each other, which is unlikely to be violated in most practical problems. When45both the distinct parameters assumption and MAR or MCAR hold, the missing dataare said to be ignorable. In this case, inference about θ can be completely basedon the observed data likelihood ∏i fY i,obs(yi,obs;θ ). The missing data mechanismmodels fRi|Y i,obs(r i|yi,obs;φ ) can be left unspecified.Likelihood-based methods for missing data are typically implemented by theEM algorithms, as described in Dempster et al. (1977). An EM algorithm iteratesbetween the following two steps until convergence:• Expectation step (E-step): computing the conditional expectation of the “com-plete data” log-likelihood given the observed data, assuming the missing datawere observed.• Maximization step (M-step): maximizing the conditional expectation in theE-step with respect to unknown parameters to yield updated parameter esti-mates.When the E-step is difficult to compute, Monte Carlo methods may be used to ap-proximate the conditional expectation, leading to Monte Carlo EM algorithms. EMalgorithms are known to be numerically stable and generally applicable to a largeclass of models where the missing data may include actual missing observationsor unobservables such as random effects in mixed effects models (Laird and Ware,1982). A main drawback of EM algorithms is their slow or even non-convergences(McLachlan and Krishnan, 2007). We illustrate a specific use of EM algorithms inthe Section 2.4.462.3 Semi-Continuous DataSemi-continuous data here refer to continuous data with an unusually large pro-portion of some particular value(s), such as continuous data with many zeros (orzero-inflated data). Such data make distributional assumptions difficult, since theprobability of a response variable taking any particular value should be zero if theresponse variable follows a continuous distribution such as the normal distribu-tion. Thus, special statistical methods are needed to model such data. In regressionsettings, a semi-continuous response is often characterized by a large proportionof a single value (with zero being the most typical), where the single value usu-ally represents a qualitatively distinct state. For examples, Hansen and Graham(1991) describes an adolescent alcohol prevention trial in which alcohol use wasmeasured longitudinally for teenagers in Los Angeles area. Figure 2.2 displays thehistogram of alcohol use for grade 7 teenagers. It is evident that the observationscorresponding to a lack of alcohol usage form a distinct cluster.47Figure 2.2: Example of semi-continuous data taken from Olsen and Schafer(2001): the histogram of alcohol use for grade 7 teenagers in Los Ange-les area, where there is a large proportion of zeros (i.e., a lack of alcoholusage).To model semi-continuous data, Manning et al. (1981) first used a two-part re-gression model for semi-continuous health care demand in a cross-sectional study.The basic idea of their model is to use a mixture distribution for the semi-continuousresponse. For example, let {Yi ≥ 0, i = 1,2, · · · ,n} denote continuous and positiveresponse whose realized values {yi ≥ 0, i= 1,2, · · · ,n}may contain a large propor-tion of zeros. Then, the following two-part model may be used, which is a mixture48of a logistic regression model and a log-linear modelfYi(yi) = (1−pii)× I(yi = 0)+pii× I(yi > 0)×g(yi;µ,σ2), (2.2)pii = Pr(Yi > 0) =exp(xTi β 1)1+ exp(xTi β 1),Ui = I(Yi > 0),µ = E(logYi|Ui = 1;β 2) = xTi β 2,where I(A) is the indicator function for a set A, g(x,µ,σ2) is the probability den-sity function for a log-normal distribution, xi is a vector of covariates from obser-vation i, and β 1 and β 2 are regression coefficients parameters. For count datawith excessive zeros, zero-inflated Poisson or negative binomial models (Lam-bert, 1992) may be considered. Inference for two-part models such as (2.2) andzero-inflated Poisson or negative binomial models can be based on the likelihoodmethod. For example, the likelihood function for model (2.2) is given byL(β 1,β 2,σ2) =n∏i=1pi I(yi>0)i (1−pii)I(yi=0) ∏i∈{i:yi>0}1√2piσexp{− 12(logyi− xTi β 2)2},where a log-normal distribution is assumed for the positive (non-zero) data.Olsen and Schafer (2001) and Tooze et al. (2002) extend the two-part regres-sion models for cross-sectional semi-continuous data to accommodate longitudinalsemi-continuous data by adding appropriate random effects in the models. The ba-sic idea is to replace the logistic model in (2.2) by a logistic mixed effects model (aGLMM) and replace the log-normal model by a LME model with the response be-ing possibly (log) transformed. Likelihood function can then be written in a similarway, except that the random effects are integrated out.49In Chapter 4, we will study two-part models for semi-continuous longitudinaldata, similar to Olsen and Schafer (2001). Our focus will be on order-restrictedhypothesis tests for such models, motivated from an interesting application.2.4 Censored or Truncated DataCensored or truncated data may arise when the devices used to measure the data aresubject to lower and/or upper detection limits. For example, in HIV/AIDS studies,HIV viral loads typically have a lower detection limit such as 100 (i.e., viral loadvalues smaller than 100 cannot be measured; see Figure 1.5 for two examples), soviral loads are often left-censored. Censored data may be considered as a specialcase of missing data with missing not at random mechanisms. In the presence ofcensoring or truncations, statistical analyses ignoring the censoring or imputing thecensored values by a known number such as half the detection limit may lead tobiased results (e.g. Hughes, 1999).In the literature, formal methods for handling censored data include likelihood-based methods (e.g. Hughes, 1999; Vaida and Liu, 2009) typically implementedby EM algorithms, multiple imputation methods (e.g. Pan, 2001), and Bayesianmethods (e.g. Gelfand et al., 1992). The likelihood-based methods are perhaps themost popular. More recent literature includes Bandyopadhyay et al. (2015) andGaray et al. (2017). Fu et al. (2016) provides an overview. In this thesis, we focuson likelihood-based methods. We first provide a brief review of the basic ideas andalso provide an example of an EM algorithm.We consider a LME model with censored responses. Specifically, consider the50following LME modelY i = Xiβ +ZiBi+E i,Bii.i.d.∼ N(0,D), E i i.i.d.∼ N(0,σ2Ini), i = 1, . . . ,n,where Y i = (Yi1, . . . ,Yini)T , Xi = (xTi1, . . . ,xTini)T , and Zi = (zTi1, . . . ,zTini)T . When theresponse yi j (realizations of Yi j) is subject to left-censoring with lower detectionlimits di j, the observed response data become {(qi,ci) : i = 1, . . . ,n}, with qi =(qi1, . . . ,qini)T , ci = (ci1, . . . ,cini)T , and for j = 1, . . . ,ni,(qi j,ci j) = (yi j,0), for yi j ≥ di j,(di j,1), for yi j < di j,where ci j is a censoring indicator for yi j such that ci j = 0 if yi j is not left censoredand ci j = 1 if yi j is left censored.The likelihood function based on the observed data {(qi,ci) : i = 1, . . . ,n} isgiven byL(θ ) =n∏i=1∫ { ni∏j=1fYi j|Bi(qi j|bi;D,σ2)1−ci j FYi j|Bi(di j|bi;β ,D,σ2)ci j}fBi(bi)dbi,(2.3)where θ = (β ,D,σ2) denotes the collection of all parameters andfYi j|Bi(qi j|bi;β ,D,σ2) = exp{− (qi j− xTi jβ − zTi jbi)2/(2σ2)}/(√2piσ),FYi j|Bi(di j|bi;β ,D,σ2) =∫ di j−∞exp{− (z− xTi jβ − zTi jbi)2/(2σ2)}/(√2piσ)dz,51fBi(bi) =1(√2pi)q√|D| exp(−bTi D−1bi/2),with q being the dimension of bi.The above likelihood is intractable. To find the MLEs of the parameters, wecan use the EM algorithm by viewing the censored data and random effects as“missing data”. Note that, if the “missing data” were observed, the log-likelihoodof the “complete data” is given bylc(β ,D,σ2) =n∑i=1lci(β ,D,σ2)=n∑i=1{− ni+q2log(2pi)− ni2logσ2− 12log |D|− (yi−Xiβ −Zibi)T (yi−Xiβ −Zibi)2σ2− 12bTi D−1bi}Starting with some initial values θ (0), the E-step and M-step of the EM algorithmfor the t-th iteration are described as below, where {(Qi,C i) : i = 1, . . . ,n} arerandom vectors whose realized values are {(qi,ci) : i = 1, . . . ,n}.The E-stepThe E-step of the EM algorithm calculates the conditional expectation of the“complete data” log-likelihood given the observed data, evaluated at the currentparameter estimates θ (t) at iteration t,Q(θ ;θ (t)) =n∑i=1E[lci(β ,D,σ2)|Qi,C i;θ (t)]=n∑i=1{− ni+q2log(2pi)− ni2logσ2− 12log |D|− E[(Y i−Xiβ −ZiBi)T (Y i−Xiβ −ZiBi)|Qi,C i;θ (t)]2σ252− 12E(BTi D−1Bi|Qi,C i;θ (t))}.Hughes (1999) shows that Q(θ ;θ (t)) are functions of E(Bi|Qi,C i;θ (t)),E(BiBTi |Qi,C i;θ (t)), E(BiY Ti |Qi,C i;θ (t)), E(Y i|Qi,C i;θ (t)) and E(Y iY Ti |Qi,C i;θ (t)).He proposed to approximate E(Y i|Qi,C i;θ (t)) and E(Y iY Ti |Qi,C i;θ (t)) using MonteCarlo methods, implemented by the Gibbs sampling. Vaida and Liu (2009) pro-posed a computationally more efficient method in which E(Y i|Qi,C i;θ (t)) andE(Y iY Ti |Qi,C i;θ (t)) are expressed in closed forms based on multivariate normalprobabilities.M-stepThe M-step of the EM algorithm produces updated parameter estimates θ (t+1)by maximizing Q(θ ;θ (t)). Specifically, (Hughes, 1999) shows thatβ (t+1) = (XTWX)−1XTW ×E(Y |C,Q,θ (t)),D(t+1) =n∑i=1E(BiBTi |Qi,C i;θ (t))/n,σ2(t+1) =1∑ni=1 nin∑i=1E[(Y i−Xiβ (t+1)−ZiBi)T (Y i−Xiβ (t+1)−ZiBi)|Qi,C i;θ (t)],whereX =X1...Xn , E(Y |C,Q,θ (t)) =E(Y 1|Q1,C1;θ (t))...E(Y n|Qn,Cn;θ (t)) ,53W =W1 . . . 0.... . ....0 . . . Wn ,Wi = Cov(Y i)−1 =[ZiD(t)ZTi +σ2(t)Ini]−1, i = 1, . . . ,n.Iterating between the above E-step and M-step until convergence, we obtainapproximate MLEs of the model parameters. At convergence, an approximate co-variance matrix for the MLE of the fixed effects βˆ may be obtained using the Louis(1982) methodCov(βˆ )≈[ n∑i=1(XTi WiXi−XTi WiSiWiXi)]−1,where Si = Var(Y i−Xiβ |Qi,C i; βˆ ).In Chapter 5, we will extend the above EM algorithm in the following ways:a) accommodating NLME models based on a linearization method similar to Lind-strom and Bates (1990), b) introducing analytical calculations in the E-step forthe NLME model, following Vaida and Liu (2009), and c) incorporating order-restricted constraints in the M-step.54Chapter 3A Likelihood-based Approach forMultivariate One-Sided TestsWith Missing Data3.1 IntroductionIn the analysis of multivariate data and longitudinal data, sometimes certain param-eters in the assumed models have natural restrictions or constraints, such as someparameters being strictly positive or the treatment being at least as effective as thecontrol on multiple endpoints. There has been extensive research for inequality-restricted hypothesis testing in the past few decades. Silvapulle and Sen (2004)provide a comprehensive review of the early literature.In practice, a typical feature of multivariate data or longitudinal data is that thedata often have missing values, since it is difficult to observe all values of several55variables or to observe all values of a variable at every time point. In the presenceof missing data, the commonly used complete-case (CC) method, which deletesall units containing missing data, is well known to be inefficient or even biased ifthe missing data are not missing completely at random (MCAR). Although miss-ing data are very common in practice, literature on inequality-restricted hypothesistesting with missing data is very limited. Wang and Wu (2011) proposed sev-eral multiple imputation methods for inequality-restricted hypothesis testing withmissing data. However, their methods require that the missing rates are equal forthe variables and the missing data mechanism is missing at random (MAR). Theequal-missing-rates assumption and the MAR assumption may be too restrictive inpractice. In this chapter, we propose two likelihood-based methods for multivari-ate one-sided tests with missing data. Our proposed methods do not require theequal-missing-rates assumption and the MAR assumption, therefore they are moregenerally applicable in practice. Moreover, our proposed methods incorporate alltypes of missing data mechanisms, including non-ignorable missing data.It is known that non-ignorable missing data are not testable based on observed-data. However, when the missing data are non-ignorable, i.e., when the missing-ness depends on the values being missing, statistical analysis ignoring this infor-mation can lead to severely biased results (Wu, 2009). By assuming possible non-ignorable missing data mechanisms, we may conduct sensitivity analysis to seeif analysis results are sensitive to the assumed missing data models. Such an ap-proach may help us to better understand the missing data problems in hand andmay help us to draw more reliable conclusions from statistical analysis.Our research is motivated by a recent longitudinal randomized trial (Surkanet al., 2015). The aim of this trial is to evaluate the effects of zinc-folic and iron-56folic acid on temperament and eating behaviours of children in rural Nepal. Atbaseline and at each follow-up visit, subjects receive questionnaires, and temper-ament behaviours are measured by summing the scores obtained from 12 relatedquestions. Missing data arise whenever more than one third of these questions arenot answered. The rates of missing temperament scores are not small (roughlyaround 15%), and the missing data mechanisms are likely to be non-ignorable. Toanalyze this dataset, we consider multivariate one-sided tests, which may offer in-sights unavailable from univariate analysis or two-sided test. A major challenge indata analysis is to address the missing data problem since ignoring missing data forthis study may lead to biased results.The rest of this Chapter is organized as follows. Section 2 proposes two meth-ods for multivariate one-sided tests with missing data, after a brief review of themethods applicable in complete-data cases. Section 3.3 discusses some computa-tional issues. The data analysis for the Surkan et al. (2015) trial is presented inSection 3.4. A simulation study is conducted in Section 3.5 to evaluate the newtests. Section 3.6 concludes this article with some discussion.3.2 Multivariate One-sided Tests with Missing Data3.2.1 Multivariate One-sided Tests With Complete Data Based onMultivariate NormalityWe first briefly review existing results for multivariate one-sided tests without miss-ing data (Silvapulle and Sen, 2004).Consider a p-dimension multivariate normal population Np(µ ,Σ) with meanvector µ = (µ1, . . . ,µp)T and covariance matrix Σ. Let (X 1,X 2, . . ., X n) be ani.i.d. sample from Np(µ ,Σ), where X i = (X1i, · · · ,Xpi)T , i = 1,2, · · · ,n. Let O− =57{(µ1, . . . ,µp)T |µi ≤ 0, for i = 1, . . . , p} be the non-positive orthant in the meanparameter space. We consider the following inequality-restricted hypothesesH0 : µ ∈O− versus H1 : µ ∈ Rp\\O−. (3.1)The null and alternative parameter regions in the case of p = 2 are shown in Fig-ure 3.1.µ1µ2H1H1H1H0Figure 3.1: Null and alternative parameter regions when p = 2.When the covariance matrix Σ is assumed to be known, the likelihood ratio test(LRT) statistic and the Wald-type statistic for testing (3.1) are both given byT = minv∈O−n(x− v)TΣ−1(x− v) =‖ x−piΣn(x;O−) ‖2Σn , (3.2)58where x = ∑ni=1 xi/n with xi being realized values of X i, Σn = Cov(X)= Cov(∑ni=1 X i/n) = Σ/n, and piΣn(x;O−) is the orthogonal projection of the sam-ple mean x onto the null parameter space O− with respect to Σn. Intuitively, thetest statistic T measures the (Σn-scaled) distance between x and the null parameterspace O−. A large distance indicates evidence against H0. It is known that theleast favourable null distribution is attained at µ = 0 (Silvapulle and Sen, 2004):supµ∈O−Pr(T ≥ t) = Pr(T ≥ t|µ = 0). The p-value can be calculated based on thefollowing χ2-distribution:Pr(T ≥ t|µ = 0) =p∑i=0ωp−i(p,Σn,O−)[1−Fχ2(t; i)],where t is the observed value of T , ωp−i(p,Σn,O−) are chi-bar weights given byωp−i(p,Σn,O−)=Pr{piΣn(x;O−) is on the (p− i)-dimensional face of O−}, χ2i rep-resents the χ2-distribution with degrees of freedom i, i = 1, . . . , p, χ20 represents adistribution that places probability 1 at the point mass 0, and F2χ (t; i) is the proba-bility that a χ2i random variable is less than or equal to t.In practice, the covariance matrix Σ is unknown. In this case, we may substi-tute Σ by the sample covariance matrix for approximate large sample inference, orwe can construct the likelihood ratio test by maximizing the log-likelihood withrespect to all unknown mean and covariance parameters under the restrictions onthe mean parameters specified by the null hypothesis, and separately maximizingthe log-likelihood under the union of the null and alternative hypotheses (see Perl-man (1969) for more discussion on inequality-restricted likelihood ratio tests withunknown Σ). The probability weights ωp−i(p,Σn,O−) may be computed via sim-ulations, as described in Section 1.6 in Chapter 1.593.2.2 Multivariate One-sided Tests With Missing DataIn the presence of missing data, especially non-ignorable missing data, multi-variate one-sided tests or inequality-restricted tests can be challenging, since wemust address both missing data and inequality-restrictions. In principle, likelihoodmethods incorporating missing data and inequality-restrictions are conceptuallystraightforward, but computation can be challenging as will be shown later. There-fore, in this section, we propose two approaches: in the first approach we handlemissing data and inequality-restrictions separately and construct a Wald-type testwhich is computationally efficient, while in the second approach we consider alikelihood approach which addresses missing data and one-sided inequality restric-tions simultaneously but is computationally more intensive.We first introduce some notation. Let X i = (X Tmis,i,XTobs,i)T , where X obs,i andX mis,i respectively denote the observed and missing components in X i. Let Ri =(Ri1, . . . ,Rip)T be the missingness indicator such that Ri j = 1 if Xi j is missing andRi j = 0 if Xi j is observed. When the missing data mechanism is non-ignorable, themissingness may depend on the missing values and observed values. So we mayconsider a possible model for Ri, say fRi|X i(r i|xi;α ), where α denote unknownparameters. For example, we may consider the familiar factorizationfRi|X i(r i|xi;α ) = fRi1|X i(ri1|xi;α )p∏j=2fRi j|Ri1,...,Ri( j−1),X i(ri j|ri1, . . . ,ri( j−1),xi;α ),withfRi j|Ri1,...,Ri( j−1),X i(ri j|ri1, . . . ,ri( j−1),xi;α ) = piri ji j (1−pii j)1−ri jfor j≥ 2 and fRi1|X i(ri1|xi,α )= piri1i1 (1−pii1)1−ri1 , where pii j(ri1, ...,ri( j−1))=Pr(Ri j =601|Rik = rik,k = 1, ..., j−1) for j ≥ 2 and pii1 = Pr(Ri1 = 1). Then, logistic regres-sion models may be considered to model pii j. To avoid too many nuisance param-eters, parsimonious models for pii j are generally preferred. In addition, differentmodels may be considered for sensitivity analysis (see Section 4).For the covariance matrix Σ(η ), to ensure its positive-definiteness, we mayadopt the log-Choleskey parameterization as in Pinheiro and Bates (1996). Thisparameterization first applies the Choleskey decomposition Σ = LTL, where Lis an upper triangular matrix with positive diagonal entries, and then stacks thecolumns of L into an unconstrained vector η with the diagonal entries of L beinglog-transformed. In some applications, special correlation structures such as AR(1)or compound symmetry structure may be reasonable, which can lead to fewer pa-rameters in η .Let θ = (µ T ,η T ,α T )T ⊆ Rk denote all unknown parameters. The observed-data log-likelihood can be written as followslobs(θ ) =n∑i=1li(θ ) =n∑i=1log∫fRi|X i(r i|xi;α ) fX i(xi;µ ,η )dxmis,i. (3.3)A two-step Wald-type testFirst, we propose a two-step Wald-type inequality-restricted test, which sepa-rately addresses missing data and one-sided inequality restrictions. Note that thetest statistic T in (3.2) is a Wald-type test statistic for complete data for testing (3.1).This motivates the following two-step method when there are missing data.Step 1: obtain estimates of the mean vector µ and its associated sampling covari-ance matrix S based on the observed-data likelihood lobs(θ ), ignoring the61inequality-restrictions in the hypotheses, and denote the resulting estimatesby µˆ and Sˆ respectively.Step 2: construct the inequality-restricted Wald-type statistic as follows, followingthe form for the complete data in (3.2).TW = minv∈O−(µˆ − v)T Sˆ−1(µˆ − v) =‖ µˆ −piSˆ(µˆ ;O−) ‖2Sˆ , (3.4)where piSˆ(µˆ ;O−) is the orthogonal projection of µˆ onto the H0-constrainedparameter space with respect to Sˆ.The likelihood-ratio test (LRT)We can also directly construct the LRT statistic asTLR = 2[lobs(θˆ )− lobs(θ )],where θˆ and θ respectively denote the unrestricted MLE and the MLE under therestrictions specified by the null hypothesis H0. Thus, the LRT addresses missingdata and one-sided inequality restrictions simultaneously.For the above two proposed tests, the Wald-type test TW separately handles themissing data and the inequality constraint under H0 in two steps, while the LRTTLR addresses these two issues simultaneously. Intuitively, the LRT may performbetter. However, the LRT is computationally more challenging than the Wald-typetest since the Wald statistic TW only requires the calculation of the unconstrainedMLE. We will discuss the computational issues in greater details in the next section.We will also conduct simulations to compare the finite-sample performances of thetwo tests. The two tests are asymptotically equivalent, as shown below.62When lobs(θ ) satisfies the regularity conditions listed in Section 3.7, both TWand TLR have their least favourable asymptotic null distribution attained at µ = 0.Let R be a matrix of 0’s and 1’s such that µ = Rθ , θ 0 be the true parameter vector,and Iθ 0 be the probability limit of −n−1∂ 2lobs(θ )/∂θ ∂θ T evaluated at θ = θ 0. InSection 3.7, we will show thatPr(T ≤ c|µ = 0)→p∑i=0ωp−i(p,RI−1θ 0 RT ,O−)Fχ2(c; i), as n→ ∞. (3.5)where T is either TLR or TW .In practice, the true information matrix Iθ 0 contains unknown parameters. Wecan use a so-called substitution method which substitutes a consistent estimator Iˆfor the unknown Iθ 0 in (3.5). For example, Iˆ can be the observed information ma-trix −n−1∂ 2lobs(θ )/∂θ ∂θ T , evaluated at the unconstrained MLE. Alternatively,we can also use a so-called bound method which uses the inequality (Perlman,1969).p∑i=0ωp−i(p,RI−1θ 0 RT ,O−)[1−Fχ2(c; i)]≤[1−Fχ2(c; p−1)]+ [1−Fχ2(c; p)]2,to obtain a conservative p-value. The substitution and the bound methods are bothevaluated in the simulation studies in Section 3.4.3.3 Computational IssuesWhen the missing data are ignorable (either MCAR or MAR), the missing datamodel fRi|X i(r i|xi;α ) can be dropped in the observed-data likelihood lobs(θ ), sincein this case likelihood inference does not depend on the missing data mechanism,which is what “ignorable” means (Little and Rubin, 2002). (Strictly speaking,63the Wald-type test is not a likelihood method, but the Wald test statistic is thesame as the LRT statistic in the current multivariate normal setting, as shown ear-lier.) When the missing data are non-ignorable, however, computations of the teststatistics TW and TLR both involve the evaluation of the multi-dimensional inte-gral∫fRi|X i(r i|xi;α ) fX i(xi;µ ,η )dxmis,i, which generally has no closed-form sincefRi|X i(r i|xi;α ) is nonlinear in xi. Monte Carlo EM algorithms, Laplacian approx-imations, and Gaussian quadrature approximations are common options to handlesuch intractable integrals in likelihood inference (Wu, 2009). Iterative algorithmssuch as EM algorithms sometimes exhibit convergence problems, especially whenthe dimensions of the integrals are high and the parameters are constrained. More-over, the results of an iterative algorithm may depend on the choices of startingvalues. Thus, in this paper we consider approximating the integrals in (3.3) usingGaussian quadrature, as in Stroud and Secrest (1966), Pinheiro and Chao (2006) orJoe (2008).Specifically, let pi be the number of missing components in xi. The propertiesof multivariate normal distributions imply that fX mis,i|X obs,i(xmis,i|xobs,i)∼N(µ (i),Σ(i))with µ (i) and Σ(i) determined from µ and η according to the missing pattern in xi,which then yields∫fRi|X i(r i|xi;α ) fX i(xi;µ ,η )dxmis,i = fX obs,i(xobs,i;µ ,η )∫fRi|X i(r i|xi;α )× fX mis,i|X obs,i(xmis,i|xobs,i;µ (i),Σ(i))dxmis,i.Let z∗j , ω j, j = 1, . . . ,NGQ respectively denote the abscissas and the weights for theone-dimensional Gaussian quadrature rule based on the N(0,1) kernel. We have64the following approximation∫fRi|X i(r i|xi;α ) fX mis,i|X obs,i(xmis,i|xobs,i;µ (i),Σ(i))dxmis,i=∫(2pi)−pi/2|Σ(i)|−1/2 exp{−(xmis,i−µ (i))TΣ−1(i) (xmis,i−µ (i))/2}× fRi|X obs,i,X mis,i(r i|xobs,i,xmis,i;α )dxmis,i=∫(2pi)−pi/2 exp(− ‖ z∗ ‖2 /2) fRi|X obs,i,X mis,i(r i|xobs,i;µ (i)+ΣT/2(i) z∗,α )dz∗≈NGQ∑j1=1· · ·NGQ∑jpi=1fRi|X obs,i,X mis,i(r i|xobs,i,µ (i)+ΣT/2(i) z∗j1,..., jpi )pi∏k=1ω jk , (3.6)where z∗j1,..., jpi =(z∗j1 , . . . ,z∗jpi)T and ΣT/2(i) denotes the Choleskey square root matrixof Σ(i) (Higham, 1997).As an illustration of (3.6), consider an example of p = 2, and suppose in theobservation (xi1,xi2), xi1 is observed and xi2 is missing, that is, ri1 = 0 and ri2 = 1.The parameterization is as follows.µ = (µ1,µ2)TΣ= ( eη1 η20 eη3 )(eη1 0η2 eη3 ) = (σ21 σ12σ12 σ22),with σ21 = e2η1 +η22 , σ12 = η2eη3 , σ22 = e2η2 .The conditional distribution of Xi2 given Xi1 = xi1 is N(µ(i),Σ(i)), withµ(i) = µ2+σ12σ21(xi1−µ1),Σ(i) = σ22 −σ212σ21.65For simplicity, suppose the following model for missing data mechanism is used.Pr(Ri j = 1|Xi j = xi j) = α0+α1xi j, j = 1,2,Ri1 and Ri2 are independent.Using 3 quadrature points (i.e. NGQ=3), the approximated likelihood based onobservation (xi1,xi2) according to (3.6) is given byLi(µ1,µ2,η1,η2,η3,α0,α1) =∫fRi1,Ri2|Xi1,Xi2(ri1,ri2|xi1,xi2;α0,α1)× fXi1,Xi2(xi1,xi2;µ1,µ2,η1,η2,η3)dxi2= fRi1|Xi1(ri1|xi1;α0,α1) fXi1(xi1;µ1,σ21 )×∫fXi2|Xi1(xi2|xi1;µ(i),Σ(i)) fRi2|Xi2(ri2|xi2;α0,α1)dxi2=11+ exp(α0+α1xi1)× 1√2piσ1exp{− (xi1−µ1)22σ21}×∫ exp(α0+α1xi2)1+ exp(α0+α1xi2)× 1√2piΣ(i)exp{− (xi2−µ(i))22Σ(i)}dxi2≈ 11+ exp(α0+α1xi1)1√2piσ1exp{− (xi1−µ1)22σ21}×3∑j1=1exp[α0+α1(√Σ(i)z∗j1 +µ(i))]1+ exp[α0+α1(√Σ(i)z∗j1 +µ(i))]ω j1 ,where z∗j1 and ω j1 can be obtained as follows, using the R command gaussHer-miteData(3) provided by the package fastGHQuad:z∗1 = 0, ω1 =23;66z∗2 =−1.73, ω2 =13;z∗3 = 1.73, ω3 =13.Based on the approximation (3.6), we first consider the computation for the Wald-type test statistic TW for the two-step method. The approximate unconstrainedMLE θˆ can be computed via a standard Newton or Quasi-Newton routine, appliedto the approximated log-likelihood in each iteration. That is, at each iteration theupdated parameter estimates can be obtained based onθ (t+1) = θ (t)+ I−1θ (t)Sθ (t) ,where Sθ (t) =∂ lobs(θ )∂θ |θ=θ (t) and Iθ (t) =−∂ 2lobs(θ )∂θ ∂θ T|θ=θ (t) . At convergence, a consis-tent estimator of the covariance matrix Cov(µˆ ) can be obtained usingSˆ = RI−1θˆRT/n.This completes Step 1 in the implementation of TW . A quadratic programmingroutine, as outline in Section 1.6.3, can be applied to complete Step 2 for con-strained optimization. Note that Iˆu is also used to substitute for Iθ 0 in the substitu-tion method to obtain conservative p-values associated with TW .The implementation of the LRT and computation of TLR are more challengingsince it requires the calculation of the H0-constrained MLE θ with missing data. Itis more difficult than the quadratic programming problem for the calculation of theWald statistic TW because a) the objective function lobs(θ ) takes a more compli-cated form than quadratic functions, and b) the optimization to obtain θ involves67the dimension of θ , which is much higher than the dimension of µ associated withthe quadratic programming problem.We propose to apply the BOBYQA (Bound Optimization By Quadratic Ap-proximation) algorithm (Powell, 2009) to the approximated log-likelihood basedon (3.6) to calculate θ . The idea behind the BOBYQA algorithm is to use aquadratic function Q(θ ) to approximate lobs(θ ), with Q(θ ) being equal to lobs(θ )at m interpolation points, where m is a tuning parameter and is typically chosen asm= 2×dim(θ )+1. Section 2 in Powell (2009) described a method to calculate theinitial interpolations points (x(0)1 , . . . ,x(0)m ) and the initial function Q(θ )(0). At the(t + 1)-th iteration, one of the interpolation points (x(t)1 , . . . ,x(t)m ) is replaced by anewly constructed point to form (x(t+1)1 , . . . ,x(t+1)m ), with the inequality constraintsfor parameters in θ being incorporated in this updating scheme (details in Section3 of Powell, 2009). Then, an updated Q(θ )(t+1) was constructed from Q(θ )(t)to form a more accurate approximation of lobs(θ ) and was used to calculate theupdated θ (t+1) (details in Section 4 of Powell, 2009). The algorithm stops whenthe difference in each component between θ (t+1) and θ (t) is smaller than a pre-specified tolerance ε . Section 3.8 provides an illustrative example of the detailedBOBYQA iterations.Without the need to understand the underlying details, the BOBYQA algo-rithm can be conveniently used in C++ through the function find max bobyqa inthe dlib library, which was converted from Powell’s BOBYQA Fortran code, or inR through the function bobyqa in the minqa package. These two functions takethe definition of lobs(θ ), and the lower and upper bound for each variable as inputarguments and calculates θ .When the substitution method is used to obtain p-values associated with TLR,68we can use either Iθˆ , as in TW , or we can use Iθ , the observed information matrix−n−1∂ 2lobs(θ )/∂θ ∂θ T evaluated at the constrained MLE θ . Intuitively, when H0holds, θ is a better estimate than θˆ as it incorporates the constraints under H0. Wetherefore suggest the use of Iθ over Iθˆ in the substitution method to obtain p-valuesassociated with TLR.As discussed in Pinheiro and Bates (1995),Pinheiro and Chao (2006) and Joe(2008), the approximation (3.6) may be computationally intensive if the dimensionof xmis,i is high. In this case, an adaptive Guassian quadrature approximation maybe used as follows.∫fRi|X i(r i|xi;α ) fX mis,i|X obs,i(xmis,i|xobs,i;µ (i),Σ(i))dxmis,i=∫(2pi)−pi/2|Σ(i)|−1/2 exp{− (xmis,i−µ (i))TΣ−1(i) (xmis,i−µ (i))/2}× fRi|X obs,i,X mis,i(r i|xobs,i,xmis,i;α )dxmis,i=(2pi)−pi/2|Σ(i)|−1/2∫exp{−g(xmis,i)}dxmis,i≈|Σ(i)|−1/2|Hi|−1/2NGQ∑j1=1· · ·NGQ∑jpi=1exp{−g(xˆmis,i+H−1/2i z∗j1,..., jpi )} pi∏k=1ω jk , (3.7)where−g(xmis,i) = log fRi|X obs,i,X mis,i(r i|xobs,i,xmis,i;α )− (xmis,i−µ (i))TΣ−1(i) (xmis,i−µ (i))/2,xˆmis,i = arg min g(xmis,i),Hi =∂ 2g(xmis,i)∂xmis,i∂xTmis,ievaluated at xmis,i = xˆmis,i,and z∗j1,..., jpi and ω jk are the same as in (3.6).Intuitively, the above approximation improves the ordinary Guassian quadra-69ture approximation in (3.6) by placing the abscissas around the mode of the inte-grand−g(xmis,i), instead of around zero, and by scaling distance between abscissaswith the curvature of −g(xmis,i) instead of using a universally fixed scaling. Suchimprovement may greatly reduce the required number of quadrature points NGQ toachieve the same approximation accuracy. For example, the approximation(3.7) re-duces to Laplace approximation when NGQ = 1, which is already accurate in manycases. Since the total number of quadrature points required for a d-dimensionalintegral is NdGQ, the ability to reduce NGQ makes the approximation(3.7) computa-tionally more appropriate than the approximation (3.6) when the dimension of theintegral d, that is, the dimension of xmis,i, is high.On the other hand, the approximation (3.7) is computationally less stable thanthe approximation (3.6), since the calculation of xˆmis,i usually involves numericalmaximization and therefore may encounter non-convergence problems. For thisconsideration, approximation (3.6) is used in the real data analysis and simulationstudies.3.4 Data Analysis3.4.1 The Data and HypothesesWe apply our new tests TW and TLR to the data from Surkan et al. (2015). Recallthat this trial aims to evaluate the effects of zinc-folic and iron-folic acid on thetemperament and eating behaviours of children. A total of n= 569 eligible subjectswere randomized into 4 groups: pure zinc (n= 126), pure iron (n= 129), iron pluszinc (n = 162) and placebo (n = 152). For each subject, the temperament scores(ranging from 0 to 12, with lower scores indicating easier temperament) and eating70behaviour scores were measured at baseline and at four follow-up visits (at roughlyevery three months).One question of interest is to test whether there is evidence to support short-term (first six months) amelioration in child temperament in each of the four groups,that is, whether there is evidence that the mean temperament score strictly de-creases relative to baseline at (at least) one of the first two follow-up visits. Morespecifically, let yi j and ti j respectively denote the temperament score and measure-ment time (in weeks) of subject i at visit j ( j = 0 corresponds to baseline) in eachgroup. Let x1i = (yi0− yi1)/(ti1− ti0), x2i = (yi0− yi2)/(ti2− ti0), µ1 = E(X1i), andµ2 = E(X2i), where X1i and X2i are random variables whose respective realized val-ues are x1i and x2i. This question of interest is formulated as a hypothesis testingofH0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0,which is a multivariate one-sided hypothesis testing problem with p= 2. Figure 3.2shows the temperament score trajectories of 8 randomly selected subjects fromeach of the four groups. In all the four groups, it is not immediately clear whetheror not the mean temperament score decreases relative to baseline at (at least) oneof the first two follow-up visits due to the large between-subjects variability.3.4.2 Inequality-restricted Tests With Missing DataThis hypothesis testing problem is complicated by the issue of missing data. Therates of missing data in (x1i,x2i) in the four groups are respectively: (12.7%,11.9%)(pure zinc), (14.0%,14.0%) (pure iron), (14.8%,15.4%) (zinc plus iron), and(18.4%,17.8%) (placebo). To check the sensitivity of the results of the one-sided710246810Pure ZincVisit timeTemperament scoreBaseline 1st follow−up 2nd follow−uplll0246810Pure IronVisit timeTemperament scoreBaseline 1st follow−up 2nd follow−uplll0246810Zinc + IronVisit timeTemperament scoreBaseline 1st follow−up 2nd follow−upl0246810PlaceboVisit timeTemperament scoreBaseline 1st follow−up 2nd follow−upl llFigure 3.2: Trajectories of temperament scores for 8 randomly selected sub-jects in each group.testing to various possible missing mechanisms, we implement the two new testsTW and TLR respectively under the assumption of ignorable missingness and that ofnonignorable missingness, and compare with the CC method. We apply both thebound method and the substitution method. For the latter we use Iθˆ for TW and weuse Iθ for TLR. To simplify the presentation of results and to improve numericalstability, we rescale the raw data by a multiplicative factor of 50, which is roughly72the number of weeks in one year.Under the assumption of ignorable missingness, both TW and TLR obviate theneed to explicitly model the missing mechanism, as is discussed in Section 3.3. Forthe more general cases of nonignorable missingness, we consider the following twosimple models for the missing mechanism, where σ1 and σ2 respectively denote thestandard deviation of X1i and X2i,• MNAR model 1:fRi1,Ri2|Xi1,Xi2(ri1,ri2|xi1,xi2) = fRi1|Xi1(ri1|xi1) fRi2|Ri1,Xi2(ri2|ri1,xi2),fRi1|Xi1(ri1|xi1) =Pr(Ri1 = 1|Xi1 = xi1)}ri1×{1−Pr(Ri1 = 1|Xi1 = xi1)}1−ri1 ,logit{Pr(Ri1 = 1|Xi1 = xi1)}=α01+α11(xi1−µ1)/σ1,fRi2|Ri1,Xi2(ri2|ri1,xi2) =Pr(Ri2 = 1|Ri1 = ri1,Xi2 = xi2)}ri2×{1−Pr(Ri2 = 1|Ri1 = ri1,Xi2 = xi2)}1−ri2 ,logit{Pr(Ri2 = 1|Ri1 = ri1,Xi2 = xi2)}=α02+α12(xi2−µ2)/σ2+α22ri1.• MNAR model 2: a simplified version of MNAR model 1 with α22 = 0 (i.e.,the missingness of xi2 is unrelated to the missingness of xi1).In the absence of scientific background regarding the missing data generating mech-anisms, these two empirical models can avoid too many nuisance parameters. Inboth MNAR models, α11 and α12 can be interpreted as the parameters governingthe degree of nonignorable missingness. In addition, a test of α11 = α12 = α22 = 0under MNAR model 1 and a test of α11 = α12 = 0 under MNAR model 2 can serveas an assessment of the plausibility of MCAR. The true missing data mechanism73may be more complicated, but the above two models may be used as sensitivityanalysis, i.e., if and how the analysis results may be related to the assumed missingdata models.We also apply the modified Hawkins test (Jamshidian and Jalal, 2010), imple-mented in the R package MissMech, which provides an assessment of multivariatenormality and MCAR. This test is based on first grouping observations with iden-tical missing data patterns and then checking the homogeneity of means and co-variances between groups in a nonparametric manner. However, a caution shouldbe kept in mind that the power of this test is not good when the sample size is nomore than 200 (Jamshidian and Jalal, 2010).For all the four groups, the modified Hawkins test neither rejects multivariatenormality nor rejects MCAR, probably due to insufficient power with the currentsample sizes. However, with structural assumptions, both MNAR models suggestevidence against MCAR for the three intervention groups, as is shown in Table 3.3.For the placebo group, MNAR model 2 just narrowly fails to reject MCAR at asignificance level of 0.05 and MNAR model 1 indicates strong evidence againstMCAR.3.4.3 Analysis ResultsTable 3.4 summarizes the results from the multivariate one-sided tests. For theplacebo group and the zinc plus iron group, our new tests confirm that there isno adequate evidence to reject H0 (i.e. no evidence to support improvement intemperament) at a significance level of 0.05, both under ignorable missingnessand under possible nonignorable missingness. These results confirm the findingsin Surkan et al. (2015) who use a complete-case generalized estimating equation74(GEE) based analysis.Table 3.1: Parameter estimates and standard errors (in brackets) for mean pa-rameters under different assumptions of missing data mechanism for theSurkan et al. (2015) trial dataGroupComplete case Ignorable missingness MNAR model 1 MNAR model 2µ1 µ2 µ1 µ2 µ1 µ2 µ1 µ2Pure Zinc 0.5 (1.0) 0.8 (0.5) 0.2 (0.9) 0.9 (0.5) 2.2 (1.1) 1.2 (0.6) 1.5 (1.0) 2.1 (0.6)Pure Iron −0.4 (1.0) −0.1 (0.5) −0.4 (0.9) −0.1 (0.5) 1.7 (1.2) 0.7 (0.5) 0.5 (1.2) 1.4 (0.6)Zinc + Iron −2.4 (0.8) −0.8 (0.4) −2.3 (0.8) −0.7 (0.4) −1.6 (1.2) −0.5 (0.6) −1.0 (0.9) 0.6 (0.5)Placebo 0.2 (0.9) 0.7 (0.4) 0.1 (0.9) 0.8 (0.4) 0.4 (1.4) 0.7 (0.5) 0.4 (1.5) −0.4 (0.6)µ1 and µ2 respectively denote the mean temperament score per week decrease from baseline to first and second follow-up visit.Table 3.2: Parameter estimates for variance component parameters under dif-ferent assumptions of missing data mechanism for the Surkan et al.(2015) trial dataGroupComplete case Ignorable missingness MNAR model 1 MNAR model 2σ21 σ22 σ12 σ21 σ22 σ12 σ21 σ22 σ12 σ21 σ22 σ12Pure Zinc 95 26 23 95 27 24 120 28 28 125 34 29Pure Iron 99 26 23 100 26 23 125 28 29 151 40 47Zinc + Iron 79 26 28 80 26 28 83 26 29 116 37 48Placebo 89 21 21 96 22 23 97 22 24 104 29 28The variance of Xi1, the variance of Xi2, and the covariance between Xi1 and Xi2 are respectively denoted by σ21 , σ22 and σ12.75Table 3.3: Missing data model parameters and standard errors (in brackets)under different assumptions of missing data mechanism for the Surkanet al. (2015) trial dataGroupMNAR model 1 MNAR model 2α11 α12 α22 Reject MCARa α11 α12 Reject MCARbPure Zinc 2.8 (1.7) 0.2 (0.9) 2.2 (0.7) Yes 3.3 (1.6) 3.2 (1.6) YesPure Iron 1.8 (0.9) 3.1 (3.1) 8.1 (5.4) Yes 2.8 (0.9) 5.0 (2.3) YesZinc + Iron 0.5 (0.7) 0.1 (0.6) 4.9 (0.7) Yes 2.1 (0.6) 2.5 (0.6) YesPlacebo 0.2 (0.7) −0.4 (0.4) 2.8 (0.5) Yes −0.5 (0.8) −2.0 (0.8) Noca MCAR is rejected if the Wald test p-value for testing α11 = α12 = α22 = 0 vs α211+α212+α222 6= 0 is less than 0.05.b MCAR is rejected if the Wald test p-value for testing α11 = α12 = 0 vs α211+α212 6= 0 is less than 0.05.c The Wald test p-value is 0.059.Table 3.4: P-values of various tests applied to the Surkan et al. (2015)Ignorable missingness MNAR model 1 MNAR model 2Group CC TW TLR TW TLR TW TLRSubstitution methodPure Zinc 0.09 0.06 0.06 0.01 0.00 0.00 0.00Pure Iron 0.67 0.68 0.68 0.05 0.04 0.00 0.02Zinc + Iron 0.64 0.65 0.65 0.23 0.23 0.11 0.17Placebo 0.09 0.05 0.05 0.08 0.08 0.46 0.54Bound methodPure Zinc 0.17 0.11 0.12 0.02 0.01 0.00 0.01Pure Iron 1.00 1.00 1.00 0.11 0.08 0.01 0.04Zinc + Iron 1.00 1.00 1.00 1.00 1.00 0.24 0.34Placebo 0.17 0.09 0.09 0.17 0.17 0.76 0.8676For the pure zinc group and the pure iron group, in contrast to the results basedon the CC method, our new tests based on the substitution method interestinglysuggest that zinc-folic acid alone or iron-folic acid alone may facilitate short-termtemperament amelioration. The evidence against H0 is particularly strong for thepure zinc group under both MNAR models, even with the conservative boundmethod.The estimates of parameters in the models for missing mechanisms in Table 3.3can shed light on why our new tests produce qualitatively different p-values. Forthe three intervention groups, large values of xi1 and large values of xi2 may bemore likely missing, since positive estimates of α11 and α12 are obtained underboth MNAR models. In particular, α11 in MNAR model 1 for the pure iron groupand α11 and α12 in MNAR model 2 for all the three intervention groups are found tobe significantly positive. Such informative missingness may explain the decreasedestimates for mean parameters (µ1 and µ2) in Table 3.1 and the decreased estimatesfor variance parameters (σ21 and σ22 ) in Table 3.2 based on the CC method for thethree intervention groups. In addition, if the 14% missing data in xi1 and/or the14% missing data in xi2 for the pure iron group are systematically much largerthan the observed data, then inference based only on the observed data may bebiased in favour of H0 and therefore may suffer from decreased power and yieldmuch larger p-values. The smaller p-values obtained under both MNAR modelscompared with those obtained under the CC method for the pure zinc group andthe iron plus zinc group repeat the same story. In the next section, we will conducta simulation study to verify that ignoring missing mechanisms will indeed lead toreduced power when missing data are systematically larger than observed data.Another interesting finding revealed by the estimates of missing model param-77eters is that the missing data for the placebo group may be systematically smallerthan the observed data. In such a situation, inference based only on the observeddata (i.e. the CC test as well as TW and TLR under ignorable missingness) mayoverestimate µ1 and µ2, may be biased in favour of H1, and may yield erroneouslysmaller p-values and suffer from inflated type I error rates. We will confirm thesephenomena via a simulation study in the next section.In conclusion, new findings are obtained based on our proposed new tests. Re-sults based on the new tests suggest that the pure iron group and the pure zinc groupmay have significantly improved short-term temperament and that some missingtemperament score reductions may be systematically larger than the observed onesin the three intervention groups for the present trial in Surkan et al. (2015). Fur-ther investigation may be warranted to confirm the impact of iron-folic alone andzinc-folic alone on temperament amelioration. Our two new tests also confirmsome findings in Surkan et al. (2015) that there is no evidence for temperamentimprovement for the placebo group and for the zinc plus iron group.3.5 A Simulation Study3.5.1 Data GenerationIn this section, we compare the proposed tests TW and TLR with the CC test interms of type I error rates and power. We consider settings similar to the mod-els fitted from the real data example, that is, we draw n = 130 i.i.d observations{(xi1,xi2) : i = 1, . . . ,n} from N(( µ1µ2 ),(σ21 σ12σ12 σ22))with σ21 = 150, σ22 = 40 andσ12 = 50. We consider the case of equal missing rates (15% in both xi1 and xi2),which is similar to the real data example, as well as the case of moderately unequal78missing rates (10% in xi1 and 35% in xi2). We consider five representative scenarioswith different missing mechanisms:• Scenario MNAR-mlo: missing data are systematically larger than observeddata.• Scenario MNAR-mso: missing data are systematically smaller than observeddata.• Scenario MAR-so: missing data in xi2 are associated with small observeddata in xi1 and missing data in xi1 are MCAR.• Scenario MAR-lo: missing data in xi2 are associated with large observed datain xi1 and missing data in xi1 are MCAR.• Scenario MCAR: missing data are MCAR.Scenario MNAR-mlo and MNAR-mso may respectively mimic the case of the threeintervention groups and the case of the placebo group in the real data example.Scenario MAR-so and MAR-lo represent cases where missing data in one variable(e.g. efficacy measurements) may be accounted for by observable data in anothervariable (e.g. toxicity measurements). Scenario MCAR has been extensively stud-ied in Wang and Wu (2011) and we include it for the sake of completeness.In the three scenarios MNAR-mlo, MNAR-mso, and MCAR, we create missingvalues in xi1 and xi2 independently according to the missing data models:logit{Pr(Xi1 is missing|Xi1 = xi1)}= α01+α11(xi1−µ1)/σ179andlogit{Pr(Xi2 is missing|Xi2 = xi2)}= α02+α12(xi2−µ2)/σ2.We respectively set (α11,α12) to be (3,3), (0,−2), and (0,0) so that the degree ofnonignorable missingness in the simulated datasets can roughly match that in thereal data example. In Scenario MAR-so and MAR-lo, we create missing values inxi1 according tologit{Pr(Xi1 is missing)}= α01and independently create missing values in xi2 according tologit{Pr(Xi2 is missing|Xi1 = xi1,Ri1 = ri1)}= α02+αMAR(1− ri1)(xi1−µ1)/σ1,where ri1 = 1 if xi1 is missing and ri1 = 0 if xi1 is observed. We choose αMAR =−2for MAR-so and αMAR = 2 for MAR-lo as illustrative examples. For all scenarios,we determine the values for α01 and α02 to yield the required expected missingrates in (xi1,xi2), namely (15%,15%) or (10%,35%). Finally, we consider variouschoices of (µ1,µ2) that are similar to the estimates from the real data example.The nominal significance level is α = 0.05. The integrals in the log-likelihoodare approximated by Gaussian quadrature with 100 quadrature points. Both TWand TLR are implemented under the assumption of ignorable missingness as wellas under the MNAR model logit{Pr(Xi j is missing)} = α0 j + α1 j(xi j − µ j)/σ j( j = 1,2). In addition, both the substitution method and the bound method areimplemented, and the substitution method uses Iθˆ for TW and Iθ for TLR. The sim-ulation is repeated 2000 times so that the Monte Carlo error associated with anyempirical power or type I error rate is at most 1.1%.803.5.2 Simulation ResultsIn Scenarios MNAR-mso and MAR-so, Table 3.5 and Table 3.6 show that the CCmethod exhibits severely inflated type I error rates, under both equal missing ratesand unequal missing rates, and even with the conservative bound method. In con-trast, the proposed new tests TW and TLR, which incorporate missing mechanisms,both produce type I errors consistent with the nominal significance level for boththe substitution method and the bound method, under both equal missing rates andunequal missing rates. For the MNAR-mso case, since the CC method produced in-flated type I error probabilities which are higher than the nominal level, the powercomparisons between the new tests and the CC method for the MNAR-mso case arefor information only and should not be interpreted as advantages of the CC method.Finally, in Scenario MCAR both TW and TLR perform better than the CC method interms of type I error probability control and power under both equal missing ratesand unequal missing rates, especially when TW and TLR are implemented assumingignorable missingness (correct assumption for Scenario MCAR).81Table 3.5: Simulation results under unequal missing rates (10%,35%) fortesting H0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0Ignorable missingness Nonignorable missingnessScenarios (µ1,µ2) CC TW TLR TW TLRSubstitution method, type I error rates (in %)MNAR-mlo (0,0) 0.0 0.0 0.0 3.0 4.1MNAR-mso (0,0) 99.7 93 93 5.2 4.0MNAR-mso (−1,−1) 85 37 36 0.5 0.3MAR-so (0,0) 99.8 5.6 5.5 0.4 0.4MCAR (0,0) 8.7 5.1 5.0 7.2 7.1Bound method, type I error rates (in %)MNAR-mlo (0,0) 0.0 0.0 0.0 1.2 2.1MNAR-mso (0,0) 99 88 86 3.0 1.8MNAR-mso (−1,−1) 76 28 26 0.4 0.1MAR-so (0,0) 99 2.8 2.4 0.2 0.2MCAR (0,0) 2.1 2.5 2.2 4.2 4.1Substitution method, power (in %)MNAR-mlo (0.5,1) 0.0 0.0 0.0 28 31MNAR-mlo (1,2) 0.0 0.2 0.2 80 83MNAR-mso (0.5,0.5) 100 99 99 15 11MAR-lo (0.5,1) 2.4 35 36 99 99MCAR (1,2) 79 89 89 76 71Bound method, power (in %)MNAR-mlo (0.5,1) 0.0 0.0 0.0 17 21MNAR-mlo (1,2) 0.0 0.2 0.2 70 72MNAR-mso (0.5,0.5) 100 98 98 9.3 7.1MAR-lo (0.5,1) 0.0 25 26 98 98MCAR (1,2) 69 81 80 67 6582Table 3.6: Simulation results under equal missing rates (15%,15%) for testingH0 : µ1 ≤ 0 and µ2 ≤ 0 versus H1 : µ1 > 0 or µ2 > 0Ignorable missingness Nonignorable missingnessScenarios (µ1,µ2) CC TW TLR TW TLRSubstitution method, type I error rates (in %)MNAR-mlo (0,0) 0.0 0.0 0.0 2.0 2.3MNAR-mso (0,0) 64 42 41 6.4 6.2MNAR-mso (−1,−1) 19 2.1 2.1 0.4 0.5MAR-so (0,0) 71 5.0 4.8 2.1 1.9MCAR (0,0) 8.3 5.1 5.0 5.7 6.0Bound method, type I error rates (in %)MNAR-mlo (0,0) 0.0 0.0 0.0 1.0 1.1MNAR-mso (0,0) 51 30 29 3.5 3.3MNAR-mso (−1,−1) 8.0 0.8 0.8 0.2 0.5MAR-so (0,0) 58 2.6 2.5 0.9 1.0MCAR (0,0) 2.6 2.8 2.7 2.7 2.9Substitution method, power (in %)MNAR-mlo (0.5,1) 0.0 1.0 1.0 36 38MNAR-mlo (1,2) 4.8 35 35 92 92MNAR-mso (0.5,0.5) 88 74 74 21 19MAR-lo (0.5,1) 7.6 43 44 86 87MCAR (1,2) 87 94 94 87 86Bound method, power (in %)MNAR-mlo (0.5,1) 0.0 0.4 0.4 24 26MNAR-mlo (1,2) 1.0 24 24 85 86MNAR-mso (0.5,0.5) 79 63 61 13 12MAR-lo (0.5,1) 1.8 31 32 78 79MCAR (1,2) 77 88 88 82 8083The good performances of the proposed new tests TW and TLR are even moreevident for higher dimensional data when the number of variables is p = 5, asshown in Table 3.7. Interestingly, TW and TLR lost some power advantages overthe CC method when missing data mechanisms are incorporated, as shown in Ta-ble 3.5. This is probably because that the TW and TLR methods must estimate theextra nuisance parameters in missing data models.84Table 3.7: Empirical type I error rates and power (in %) under MCAR (p=5,n=50 and equal missing rates 20%)Substitution method Bound methodµ Σ CC TW TLR CC TW TLRType I Error Rates(05) Σ1(ρ=0.2) 7.0 6.8 5.7 3.9 2.0 1.1(05) Σ2(ρ=0.6) 6.3 5.6 5.0 1.5 0.8 0.4(−12,03) Σ1(ρ=0.2) 3.2 3.1 2.4 1.8 0.8 0.4(−12,03) Σ2(ρ=0.6) 3.8 3.2 2.8 1.0 0.4 0.2Power(0.15) Σ1(ρ=0.2) 19 29 27 13 12 9(0.15) Σ1(ρ=0.6) 15 20 19 5.2 4.8 3.2(−12,0.33) Σ1(ρ=0.2) 36 75 73 29 52 45(−12,0.33) Σ1(ρ=0.6) 34 66 66 15 33 28Note: a) TW and TLR both assume ignorable missingness.b) Σ has 1’s on the diagonal and ρ’s on the off-diagonal.c) (−12,0.33) denotes (−1,−1,0.3,0.3,0.3) etc.In the terms of computational time, we find that TW is roughly 50% to 90%faster than TLR in all scenarios. This confirms the computational advantage of TWover TLR as noted in previous section. The computational advantage of TW is moreevident for larger samples and higher data dimensions. On the other hand, TLR85exhibits slightly lower type I error rates as well as slightly higher power comparedwith TW in most MNAR cases. This confirms our previous speculation that TLRmay have a better finite sample performance than TW since TLR simultaneouslyincorporates missing data and one-sided inequality restrictions.In conclusion, the proposed new tests TW and TLR for multivariate inequality-restricted tests with missing data perform well in simulations for all three missingdata mechanisms considered in the simulation. They are clearly better than theCC method. Between the two, the computational time of TW is faster than that ofTLR. On the other hand, TLR is slightly superior to TW regarding type I error rateand the power. The actual computation of the statistic TLR is accompanied withcomplications. Therefore, it seems that TW is more preferable for practical use.3.6 DiscussionIn this chapter we propose two new tests for multivariate one-sided hypothesistesting problems with missing data when the missing data mechanisms may benon-ignorable and the missing data pattern is arbitrary. One test is based on aWald-type statistic which is computationally efficient. The other test is the likeli-hood ratio test which has better finite-sample performance but is computationallymore intensive. Simulation shows that both tests perform well and are clearly betterthe naive complete-case method. Asymptotic results for the two new tests are alsodeveloped. Since missing data are common in practice, the proposed tests shouldbe useful for many practical problems. Note that the assumed missing data mech-anisms are not testable based on observed data, but methods based on differentmissing data mechanisms are useful for sensitivity analysis, i.e., they can be usedto check how sensitive the data analysis results are to various assumed missing data86mechanisms.The proposed tests sometimes exhibit type I error rates below the nominal sig-nificance level in simulation studies. This is probably because the substitutionmethod relies on a large sample size to accurately estimate the information matrixand the bound method is conservative by nature. One possible improvement ofthe proposed test is to adaptively choose larger critical values to relax type I errorcontrol and gain power, as in Zhu and Chen (2017).Inequality-restricted inference has received increasing attention recently for re-gression models, such as Davidov and Rosen (2011) for linear mixed-effects mod-els, Davis et al. (2012) for generalized linear models, and Wang and Wu (2013)for non-linear mixed-effects models. Missing data frequently arise in longitudinalstudies. We are currently extending the proposed tests in this chapter to mixed-effects regression models for longitudinal data. Such extensions may be conceptu-ally straightforward, but the computation problem is much more challenging in re-gression models, especially mixed-effects models, due to more complicated meanand covariance structures and unobservable random effects.Although there has been extensive research on inequality-restricted inferencein the past few decades, the use of such methods in practice seems not proportionalto the amount of available literature. One main reason is probably the lack of user-friendly software. We plan to develop an R package for our proposed methods.The proposed methods in this chapter are currently implemented in C++ and R.The code is available online at github.com/GuohaiZhou/mvot. With the availabilityof user-friendly software, more applied statisticians or other researchers may real-ize the advantages of using inequality-restricted tests over other tests such as theunrestricted Hotelling’s t-test, especially the power gain by incorporating natural87restrictions on the parameter space.3.7 Proof of Asymptotic ResultsThis section derives the asymptotic chi-bar-square least favorable distribution forboth Wald test and likelihood ratio test. Section 3.7.1 lists regularity conditions.Section 3.7.2 establishes consistency and√n- consistency of the maximum like-lihood estimates (MLE) under H0 and consistency and√n- consistency of MLEunder H0UH1. Section 3.7.3 derives the asymptotic distribution for likelihood ra-tio test. Section 3.7.4 derives the asymptotic distribution for Wald test.3.7.1 Notation and Regularity ConditionsThe following notation will be used. Let Ω = {θ : θ ∈ Rk,Rθ ≥ 0} denotethe overall parameter space, which is closed. Let θ (0) = (θ (0)1 , . . . ,θ(0)k )T denotethe true parameter vector, which can be on the boundary of Ω. Recall lobs(θ )denotes the observed-data log-likelihood based on n observations as defined inequation (3.3). Let θ˜ = arg maxθ∈Ωlobs(θ ) denote the MLE under H0⋃H1 and θ =arg maxθ∈{θ :Rθ=0}lobs(θ ) denote the MLE under H0. Let ‖ x ‖ denote the Euclidean normof x, Eθ (·) denote an expectation evaluated under distributions corresponding tothe parameter values θ . When n→ ∞, let a.s.→ denote almost sure convergence, p→denote convergence in probability, and d→ denote convergence in distribution. Wefurther define ∇= ∂/∂θ and ∇2 = ∂ 2/∂θ ∂θ T . We also use Op(1) or op(1) for arandom vector to indicate that its components are all Op(1) or op(1).All partial derivatives mentioned will be taken from the appropriate side when-ever θ falls on the boundary of Ω. For i = 1, . . . ,n, the following regularity condi-tions are assumed.88C0: Different values of θ correspond to different distributions of yi (identifiabil-ity).C1: For each θ ∈ Ω, li(θ ) is differentiable up to order three with respect to θ atalmost all yi (the exceptional set is common for all θ and is a 0-probabilityevent).C2: There exist real-valued functions hi(yi) with∫hi(yi)dyi < ∞ such that, foreach θ ∈Ω, all first and second order partial derivatives of li(θ ) are boundedby hi(yi) in absolute value.C3: There exist real-valued functions Hi(yi) and a closed-ball neighborhood ofθ (0) denoted as B(θ (0)), such that for each θ ∈ B(θ (0))⋂Ω, all third or-der partial derivatives of li(θ ) are bounded by Hi(yi) in absolute value. Inaddition, limn→∞∑ni=1 Hi(Y i)n = Op(1).C4: For each θ ∈Ω, I(i)θ = Eθ{∇li(θ )[∇li(θ )]T}has finite entries and is positivedefinite. Let I(i)θ ( j1, j2) denote the ( j1, j2) entry of I(i)θ .C5: For each θ ∈Ω, limn→∞1n ∑ni=1 I(i)θ = Iθ exists and is positive definite. Iθ can beinterpreted as the “average” information matrix.C7: There exists a δ1 > 0 and a finite M such that for the hi(yi) in C2,Eθ[(hi(Y i)+ I(i)θ ( j1, j2))1+δ1 ]≤Mfor all i, h and l and for each θ ∈Ω.C8: There exists a δ2 > 0 such that Lyapunov’s condition is satisfied for any linear89combinations aT∇li(θ ). That is, for any a ∈ Rk and a 6= 0,1[∑ni=1 aT I(i)θ a]1+δ2/2n∑i=1Eθ[|aT∇li(θ )|2+δ2]→ 0, as n→ ∞.C9: lobs(θ ) has a unique local maximizer. For θ values on the boundary of Ω,“local” is in a one-sided sense.C9′: There exists a function l(θ ) such that supθ∈Ω|1n lobs(θ )− l(θ )|p→ 0 as n→ ∞,and sup{θ :‖θ−θ 0‖≥ε}l(θ )< l(θ 0) for any ε > 0.Note that C2 and the dominated convergence theorem can justify differentia-tion (up to order 2) under the integral sign, which leads to Eθ {∇li(θ )} = 0 andEθ {∇2li(θ )}=−Eθ{∇li(θ )[∇li(θ )]T}, for each θ ∈Ω.The Markov’s Strong Law of Large Numbers states that if Xi is a sequenceof independent random variables with E(Xi) = µi < ∞ and if for some δ3 > 0,∑∞i=1 E[|Xi− µi|1+δ3 ]/i1+δ3 < ∞, then Xn− µn = 1n ∑ni=1(Xi− µi)a.s.→ 0. Considerthe case when Xi is equal to the ( j1, j2) entry of −∇2li(θ ), for the δ1 in C7, |Xi−I(i)θ ( j1, j2)|1+δ1 ≤ |hi(yi) + I(i)θ ( j1, j2)|1+δ1 by C2, so Eθ [|Xi− I(i)θ ( j1, j2)|1+δ1 ] isuniformly bounded by a finite M byC7. Therefore,∑∞i=1 Eθ [|Xi−I(i)θ ( j1, j2)|1+δ1 ]/i1+δ1≤M∑∞i=1 1/i1+δ1 < ∞ and Markov’s Strong Law of Large Numbers yields that the(k, l) entry of −1n∇2lobs(θ ) converges almost surely to the (k, l) entry of Iθ . ByC5,−1n∇2lobs(θ )a.s.→ Iθ , (3.8)where the almost sure convergence of a sequence of random matrix Zn to a ran-90dom matrix Z is defined by Pr(limn→∞ ‖ Zn−Z ‖)=1 with ‖ Z ‖=√∑k∑l Z2kl (i.e.Euclidean matrix norm).By C8, one-dimensional Lyapunov’s Central Limit Theorem and the Cramer-Wold Theorem (Billingsley, 2008) which states that a random vector converges indistribution to a multivariate normal distribution if all linear combinations of itscomponents converge in distribution to the corresponding linear combinations ofthat multivariate normal distribution, we have(n∑i=1I(i)θ)−1/2∇lobs(θ )d→ N(0, Ik×k),where A−1/2 denotes the square root matrix of A−1 for a positive definite matrix Abased on Cholesky factorization and Ik×k denotes the k× k identity matrix.By C5, (3.8), Slutsky’s theorem, and the continuity of the matrix functionf (A) = A−1/2 at Iθ , we haven−12∇lobs(θ )d→ N(0, Iθ ). (3.9)One of C9 and C9′ ensure the consistency of both θ˜ and θ . C9 will be satisfiedif lobs(θ ) is strictly concave. This may be achieved after some re-parameterizationη = η (θ ). C9′ can be roughly interpreted as that θ 0 is a well-separated point ofmaximum of the uniform limit (in probability) of the average likelihood function.The strong condition of uniform convergence may be justified by the uniform lawof large numbers in Rubin et al. (1956).913.7.2 Consistency and√n-consistencyUnder conditions C0 to C8, and one of C9 or C9′, we have√n(θ˜ −θ 0) = Op(1),√n(θ −θ 0) = Op(1) under H0 (i.e. if Rθ 0 = 0 ) .Proof. Consistency under C0 to C8 and C9. The following proof is similar tothe derivation of Theorem 1 in Self and Liang (1987).Let A denote Ω when H0 does not hold and {θ : Rθ = 0} when H0 holds.Let θˆAn = arg maxθ∈Alobs(θ ). For any sufficiently small ε > 0, the intersection of theclosure of a ε-neighborhood of θ 0 and the closure of A is closed. The continuousfunction lobs(θ ) must attain its local maximum in this intersection set. By C9, thislocal maximum occurs at θˆAn . We will show that for any θ such that ‖ θ −θ 0 ‖= ε ,lobs(θ )< lobs(θ 0). (3.10)holds with probability greater than 1−δ whenever n is greater than some Nδ de-pending on δ , for each δ >0. Then consistency of θˆAn then follows by definition.To show (3.10), consider the following Taylor expansion, which is validated byC2 and C3,1nlobs(θ )− 1nlobs(θ 0) =(θ −θ 0)T 1n∇lobs(θ 0)+(θ −θ 0)T 12n∇2lobs(θ 0)(θ −θ 0)+Op(1) ‖ θ −θ 0 ‖3= S1+S2+S3. (3.11)Here ∇lobs(θ 0) denotes ∇lobs(θ ) evaluated at θ = θ 0 and ∇2lobs(θ 0) denotes92∇2lobs(θ ) evaluated at θ = θ 0.We note that S3 is smaller in absolute value than S2 for all sufficiently small ε .Since 1n∇lobs(θ 0) =1√n× 1√n∇lobs(θ 0)d→ 0 by (3.9) (hence p→ 0 as well), S1 can bearbitrarily small for all large n. Therefore, the sign of the right side of (3.11) willbe the determined by S2 for all large n. Since S2 has a negative limit (in probability)−(θ −θ 0)T Iθ (θ −θ 0) by (3.8), (3.10) is justified.Alternatively, consistency under C0 to C8 and C9′. The following proof issimilar to the derivation of Theorem 5.7 in Van der Vaart (2000).By the first part of C9′, l(θ 0) = 1n lobs(θ 0)+op(1),l(θ 0)− l(θˆ An ) =1nlobs(θ 0)− l(θˆ An )+op(1)≤1nlobs(θˆAn )− l(θˆAn )+op(1)≤ supθ∈Ω|1nlobs(θ )− l(θˆ )|+op(1) p→ 0. (3.12)By the second part of C9′, for any ε > 0, there exists a ηε between 0 and thepositive quantity l(θ 0)− sup{θ :‖θ−θ 0‖≥ε}l(θ ), then l(θ )< l(θ 0)−ηε whenever ‖ θ −θ 0 ‖≥ ε . In other words, the event {‖ θˆ An − θ 0 ‖≥ ε} is contained in the event{l(θˆ An )< l(θ 0)−ηε}. The probability of the latter event converges to 0 in view of(3.12). Then by definition θˆAnp→ θ 0.√n-consistency after consistency. The following proof is similar to the deriva-tion of lemma 1 in Chernoff (1954).For any ε > 0, there is a positive sequence cnε → 0 and a Kε such that withprobability greater than 1− ε ,‖ θˆ An −θ 0 ‖< cnε (due to consistency),93‖ 1√n∇lobs(θ 0) ‖< Kε (due to (3.9)),‖ 1n∇2lobs(θ 0)+ I(θ 0) ‖< cnε (due to (3.8)),and Op(1) ‖ θ −θ 0 ‖3 is less than Kε ‖ θ −θ 0 ‖3. With these inequalities and thatlobs(θˆAn )≥ lobs(θ 0), there is a K∗ε such that0≤ (θˆ An −θ 0)T1n∇lobs(θ 0)+(θˆAn −θ 0)T12n∇2lobs(θ 0)(θˆAn −θ 0)+Op(1) ‖ θˆAn −θ 0 ‖3<‖ θˆ An −θ 0 ‖Kε√n− 12(θˆAn −θ 0)T I(θ 0)(θˆAn −θ 0)+K∗ε × cnε× ‖ θˆAn −θ 0 ‖2 .(3.13)As n→ ∞, the third term on the right of (3.13) vanishes since cnε → 0, then theremust be a K∗∗ε such that ‖ θˆAn − θ 0 ‖< K∗∗ε√n occurs with probability 1−ε for allsufficiently large n to allow (3.13) to hold. By definition this implies√n(θˆAn −θ 0) = Op(1).3.7.3 Asymptotic Distribution of LRT statisticUnder conditions C0 to C8, and one of C9 or C9′, we havePr(TLR ≤ c|µ = 0)→p∑i=0ωp−i(p,RI−1θ 0 RT ,O−)Fχ2(c; i), as n→ ∞Proof. The following proof is similar to the derivation of Results (14) in Wang andWu (2013) and to the derivation of Proposition 4.3.1 in Silvapulle and Sen (2004).A quadratic approximation of lobs(θ ) can be further derived based on (3.11) as94follows.lobs(θ ) =lobs(θ 0)+(θ −θ 0)T∇lobs(θ 0)+ 12(θ −θ 0)T∇2lobs(θ 0)(θ −θ 0)+nOp(1) ‖ θ −θ 0 ‖3=lobs(θ 0)+(θ −θ 0)T∇lobs(θ 0)− 12n(θ −θ 0)T Iθ 0(θ −θ 0)+nOp(1) ‖ θ −θ 0 ‖3 +op(1)=lobs(θ 0)+µ T Iθ 0Zn−12µ T Iθ 0µ +n−1/2 ‖ µ ‖3 Op(1)+op(1)( Define Zn = n−1/2I−1θ 0 ∇lobs(θ 0) and µ =√n(θ −θ 0). )=lobs(θ 0)− 12(Zn−µ )T Iθ 0(Zn−µ )+12ZTn Iθ 0Zn+R(θ ),where the remainder R(θ ) = n−1/2 ‖ √n(θ −θ 0) ‖3 Op(1)+op(1). Define ΩK ={θ :√n ‖ θ −θ 0 ‖< K} for any given K > 0 not depending on n. For any θ ∈ΩK ,R(θ ) = op(1).Under H0, Section 3.7.2 shows that both θ˜ and θ˜ are√n-consistent, thereforeTLR =2[lobs(θ˜ )− lobs(θ )] = 2[supRθ≥0,θ∈ΩKlobs(θ )− supRθ=0,θ∈ΩKlobs(θ )]=2{[− infRθ≥0,θ∈ΩK12(Zn−µ )T Iθ 0(Zn−µ )]−[− infRθ=0,θ∈ΩK12(Zn−µ )T Iθ 0(Zn−µ )]}+op(1)= infRµ=0,‖µ‖ 0 and a finite M such that for the hi(yi) in C2,Eθ 1[(hi(Y i)+ I(i)θ 1 (h, l))1+δ1 ]≤Mfor all i, h and l and for each θ 1 ∈Ω.C8: There exists a δ2 > 0 such that Lyapunov’s condition is satisfied for any linearcombinations aT ∂ lapi (θ 1)∂θ 1 . That is, for any a ∈ Rk and a 6= 0,1[∑ni=1 aT I(i)θ 1 a]1+δ2/2n∑i=1Eθ 1[|aT ∂ lapi(θ 1)∂θ 1|2+δ2]→ 0, as n→ ∞.C9: There exists a function l(θ 1) such that supθ 1∈Ω|1n ∑ni=1 lapi(θ 1)− l(θ 1)|p→ 0, andsup{θ 1:‖θ 1−θ 0‖≥ε}l(θ 1)< l(θ 0) for any ε > 0.Under the above assumptions and regularity conditions, the test statistics TWand TLR share the following asymptotic chi-bar-square distribution (Silvapulle andSen, 2004),Pr(T ≥ tobs|H0)→p∑i=0ωi(p,RI−1θ 0 RT ;O+)[1−Fχ2(tobs; i)], as n→ ∞, (4.7)where the test statistic T is either TW or TLR, tobs is the observed value of T given adataset, θ 0 is the true parameter vector, and Iθ 0 is the limit of−n−1∂ 2l(θ )/∂θ ∂θ Tevaluated at θ = θ 0, ωi(p,RI−1θ 0 RT ;O+) are chi-bar probability weights whose cal-culations are described in Section 1.6.3 on page 30 or in Section 3.5 of Silvapulleand Sen (2004), χ20 denotes a distribution that places probability 1 at the pointmass 0, and Fχ2(tobs; i) denotes the probability that a random variable following a116χ2-distribution with i degrees of freedom is less than or equal to tobs.Since the null hypothesis H0 typically only specifies values of a subset of themean parameters in β , the null distribution in (4.7) involves unknown parameters inθ , especially the variance-covariance parameters. For this reason, following Wangand Wu (2013), we propose two methods to approximate p-values in practice:• the substitution method: approximate p-values through substituting unknownparameters by their consistent estimators in Iθ 0 in (4.7), such as I˜n/n whereI˜n =− ∂2lap(θ )∂θ ∂θ T|θ=θ .• the bound method: approximate (conservative) p-values by using the upperbound given by Perlman (1969):p∑i=0ωi(p,RI−1θ 0 RT ,O+)[1−Fχ2(tobs; i)]≤1−Fχ2(tobs; p−1)2+1−Fχ2(tobs; p)2. (4.8)The substitution method may perform well when the sample size is large, but it mayperform poorly for small samples. The bound method may be too conservative,but it is computationally much simpler. We will evaluate these two methods viasimulations later.4.3 Analysis of a Lung Cancer DataWe consider a longitudinal retrospective case-control study on lung cancer pro-gression described in Ishizumi et al. (2010). In this study, 101 cancer patients wereidentified and were retrospectively matched with 101 control patients who exhibitcomparable distributions of demographic characteristics (such as age and gender)117and smoking status. The response of interest is subject’s biopsy grade based ontumor size. The biopsy grade include the following levels: normal, hyperplasia,metaplasia, mild dysplasia, moderate dysplasia, severe dysplasia, carcinoma in situ(CIS) and cancer. For each subject at each time point, the biopsy grade was mea-sured at multiple locations within the lung. For simplicity, we consider the average(numerical) biopsy grade over multiple lung locations for each subject at each timepoint and use this lung-average biopsy grade as the response variable.Table 4.1 shows a descriptive summary of the data. Figure 4.2 shows the tra-jectories of the lung-average biopsy grade of four randomly selected patients fromthe cancer group and four randomly selected patients from the control group. Wesee that the lung-average biopsy grade exhibits both inter- and intra-subjects vari-abilities. We focus on the 65 cancer subjects and the 64 control patients who are atleast 45 years old and who have smoking duration for at least 30 years, since sucha sub-group seems of special interest.118Table 4.1: Summary of the lung cancer data in Ishizumi et al. (2010)Variable Cancer Group Control GroupAge (years; mean±SD∗) 61±7 61±8Men:women (number of subjects) 59:42 59:42Current:former smoker (number of subjects) 57:44 57:44Smoking duration (years; mean±SD) 41±9 40±8Years quit (former smokers; mean ±SD) 11±7 10±7Follow-up duration (years; mean±SD) 9.8±3.9 9.2±3.9Lung-average biopsy grade (mean±SD)baseline 3.94 ± 0.87 3.87±0.850 to 5-year follow-up 4.06±1.36 3.69±1.015 to 10-year follow-up 4.21±1.32 3.32±1.2810 to 15-year follow-up 3.84±0.69 3.45±0.10∗ SD = Standard DeviationLet ti j denote the time (in years) at which measurement j for subject i wastaken. We use j = 0 for baseline and j ≥ 1 for follow-up times. Let Yi j denotethe lung-average biopsy grade from subject i at time ti j. From a clinical point ofview, Yi j values smaller than 3 may be considered as normal or very mild cells,while values over 3 indicate possible non-normal cells which may possible lead tocancer so Yi j values over 3 are of primary interest. We thus introduce the indicatorvariable Ui j = I(Yi j < 3) as an indicator of abnormality of the cells. In other words,we can view Yi j values smaller than 3 as “zeros” (or a cluster of normal cases),119and we view Yi j values larger than 3 as continuous data with larger values beingmore severe “abnormality”. We also consider a transformed response variable Y ∗i j =log10(Yi j− 2.5) for Yi j ≥ 3 to make the response data more normally distributed.The proportion of abnormal lung-average biopsy grade is 86% in the cancer groupand is 82% in the control group.ll0 2 4 6 8 102468Cancer groupYears after baselineLung−average biopsy gradell0 2 4 6 8 102468Control groupYears after baselineLung−average biopsy gradeFigure 4.2: Trajectory of 4 randomly selected subjects from each group. Thedashed line corresponds to 3, the cutoff defining abnormality in lung-average biopsy grade.Since the cancer group and the control group were matched with respect todemographic covariates, we consider the following simple and empirical modelwhich includes group indicator, baseline lung-average biopsy grade Yi0, and timeas predictors. Specifically, let ICi denote the group indicator of subject i, with 1 forcancer group and 0 for control group. We consider the following modelslogit Pr(Ui j=1 |Bi1=bi1) = β0+β1ICi+β2Yi0+β3ti j +bi1, (4.9)120Y ∗i j = β4+β5ICi+β6Yi0+β7ti j +Bi2+Ei j, (4.10)where Ei j is i.i.d. N(0,σ2), and (Bi1,Bi2) are i.i.d. bivariate Guassian with mean0 and covariance matrix Ψ.For this study, a natural research question of interest is if the disease pro-gression, measured by the values of Ui j and Y ∗i j over time, is (i) different for thecancer and control groups, and (ii) related to the basedline values Yi0. Statis-tically, the questions can be answered by testing the following two hypotheses:H01 : β1 = β5 = 0 and H02 : β2 = β6 = 0. The alternative hypotheses may be theunrestricted alternatives Hu11 : not Hu01 (i.e., at least one of β1 and β5 is not zero,or β 21 +β 25 6= 0), and Hu12 : not H02 (i.e., at least one of β2 and β6 is not zero, orβ 22 + β 26 6= 0). However, in this application, it is natually to expect that patientsin the cancer group should have worse disease progression than those in the con-trol group. Similarly, patients have worse baseline disease statuses should be morelikely to progress poorly. In other words, the values of the parameters, β1,β5,β2,β6should all be non-negative. Therefore, the order-restricted alternatives,Hr11 : β1 ≥ 0, β5 ≥ 0, Hr12 : β2 ≥ 0, β6 ≥ 0(with at least one inequality strictly hold) should lead to more powerful tests sincethey incorporate order restrictions on the parameter values (i.e., they use moreinformation). Such order restricted alternatives are particularly valuable if the dif-ferences to be detected are small, i.e., signficant results may be obtained based onorder-restricted alternatives but not based on unrestricted alternatives. So they haveimportant clinical implications.121For comparison, we conducted both unrestricted and order-restricted hypothe-sis testing for the effects of cancer/group group and baseline values on the responsevariable to see if any new findings can be obtained. Table 4.2 displays estimatesfor the fixed effects parameters for the two models. The estimated variance com-ponent are σˆ = 0.24 and Ψˆ =( 1.39 −0.01−0.01 0.005). Table 4.3 shows the p-values asso-ciated with unrestricted test and order-restricted tests. For order-restricted tests,we consider both the bound method and the substitution method. We see that theorder-restricted tests yield smaller p-values than the unrestricted tests, suggestingthat incorporating the restrictions on the parameters does lead to more powerfultest. For example, there is stronger evidience on the difference between the cancerand control groups on the response over time, and the substitution method is evenmore powerful than the bound method, suggesting that the bound method may betoo conservative. Although the significances of the unrestricted and restricted testsappear to be the same based on (say) 5% level, the unrestricted tests produce p-values very close to the significance level for testing the group effects, while theorder-restricted tests produce p-values much smaller than the significance level,making the conclusions more convincing.122Table 4.2: Estimates of the fixed parameters in models for Y ∗i j and Ui jModel Parameter Estimate Standard errorModel for Ui j β0 1.74 0.93β1 0.55 0.44β2 0.02 0.23β3 0.01 0.07Model for Y ∗i j β4 -0.01 0.08β5 0.07 0.03β6 0.02 0.02β7 0.01 0.01123Table 4.3: P-values from testing H01 (or H02) versus the unrestricted alterna-tive Hu1 and restricted alternative Hr1 (the bound method and the substitu-tion method are both used for Hr1).Testing for cancer/control group effects (i.e., β1 and β5)Hu11 Hr11 (bound) Hr11 (substitution)Wald test 0.042 0.027 0.016LRT 0.048 0.031 0.018Hu11 : β21 +β 25 6= 0, Hr11 : β1 ≥ 0, β5 ≥ 0 with at least one strict inequalityTesting for baseline sickness effects (i.e., β2 and β6)Hu12 Hr12 (bound) Hr12 (substitution)Wald test 0.566 0.426 0.282LRT 0.565 0.425 0.281Hu12 : β22 +β 26 6= 0, Hr12 : β2 ≥ 0, β6 ≥ 0 with at least one strict inequalityIn summary, by using the order-restricted tests proposed in this chapter, wehave found more convincing evidence for the differences between the cancer andcontrol groups than the usual unrestricted tests.4.4 Simulation StudiesWe conduct some simulation studies to demonstrate the advantages of order-restricted hypothesis testing over unrestricted testing under models for semi-continuousdata, in situations when the order-restricts are reasonable. A main advantage oforder-restricted inference is its power gain by incorporating reasonable restrictions124in parameters. The objective of the simulation studies in this section is to confirmthis advantage in the context of semi-continuous data models. We consider twosimulation settings, one setting mimics models in Albert and Shen (2005), as de-scribed in Section 1, and the other setting mimics the models in the data analysisin previous section.The first simulation study is based on a simplified version of the models forsemi-continuous data considered in Albert and Shen (2005). Let ILi and IHi respec-tively be the dummy variable for a “light” treatment (the medication plus shamacupuncture group) and a “heavy” treatment (the medication plus active acupunc-ture group). Let Ui j denote the indicator of volume from subject i at time ti j beingpositive and let Y ∗i j =log(volume+1) of the transformed volume of subject i at timeti j. Similar to Albert and Shen (2005), we consider the following models in thesimulationlogit Pr(Ui j=1|Bi1=bi1) = 2.3− ti j +β1IHi+β2ILi+bi1, (4.11)Y ∗i j = 5.5− ti j +β3IHi+β4ILi+Bi2+ εi j, (4.12)Ei ji.i.d.∼ N(0,0.3), Bi1Bi2 i.i.d.∼ 00 , 0.26 0.00460.0046 0.002 .For each simulated dataset, we consider both unrestricted hypotheses H0 : β1 =β2 = β3 = β4 = 0 versus Hu1 : not H0 (i.e., at least one β j 6= 0) and the order-restricted alternative Hr1 : β1 ≤ β2 ≤ 0 and β3 ≤ β4 ≤ 0, with at least one inequalitystrictly holds. In the simulation, we first evaluate type I error rates under the null125hypothesis H0. Then, we compare the powers at the alternatives β1 = −0.8, β2 =−0.4, β3 = −0.2 and β4 = −0.1, similar to those in Albert and Shen (2005). Weconsider various sample size choices.For order-restricted tests, the substitution method and the bound method areapplied in both Wald tests and LRT tests, and are compared with their unrestrictedcounterparts. The simulation was repeated 2000 times so that the Monte Carloerrors associated with empirical power or type I error probability will be limited toat most 1.1% (since√p(1−p)2000 ≤√0.5×0.52000 = 1.1%).Table 4.4 shows the results of the first simulation study. The null hypothesisH0 is the same for all tests, while the alternative hypotheses are either unrestrictedor restricted. The objective is to check if the restricted alternatives offer substantialpower advantages over unrestricted alternatives. All tests achieve nominal type Ierror probability 0.05, so we can compare the powers. Compared with unrestrictedalternative Hu1 , both the substitution method and bound method for restricted alter-native yield substantially higher powers than unrestricted alternative for both Waldand LRT tests. The substitution method shows substantial power advantage overthe bound method, which is expected since the bound method is conservative.126Table 4.4: Comparison of type I error rates and powers for restricted and un-restricted tests.Wald∗ LRT∗Sample size Hu1 Hr1 (bd) Hr1 (sub) Hu1 Hr1 (bd) Hr1 (sub)Type I error rates (in %)30 4.35 1.05 3.20 4.95 1.30 3.9045 5.00 0.85 2.45 5.50 1.35 2.8560 4.70 0.90 2.60 4.80 1.10 3.0575 4.90 0.95 2.65 5.00 1.00 2.9590∗∗ 4.15 0.95 2.60 4.05 1.10 3.15Power (in %)30 31 32 50 31 33 5145 48 49 67 49 52 6860 61 64 78 64 67 8075 71 74 86 75 77 8890∗∗ 79 82 92 83 86 93∗ Hu1 , Hr1 (bd) and Hr1 (sub) respectively denote unrestricted alternativehypothesis, restricted alternative hypothesis with the bound method, andrestricted alternative hypothesis with the substitution method.∗∗ Similar to sample size in Albert and Shen (2005) .The second simulation study is based on the real data analysis in the previoussection. We simulate data from the models (4.9) and (4.10) and set the true param-127eters values to the corresponding estimates based on real data, except β1 and β5.We consider various values of (β1,β5) to mimic scenarios in which the true differ-ences between the treatment group and the control group (i.e. the effect sizes) donot exist, and are weaker than, equal to or stronger than the estimates from the realdata example respectively.Table 4.5 displays the results of the second simulation study. We again havesimilar findings as those in the first simulation study: the restricted tests outperformthe unrestricted test for both Wald and LRT tests. Figure 4.3 shows that the medianand inter-quartile range of percent reductions in p-values resulting from the use oforder-restricted tests. The substitution method shows over 50% median reductionin p-values, compared with unrestricted tests, regardless of the effect size and thetype of test statistics (LRT or Wald).128Table 4.5: Comparison of type I error rates and powers in the second simula-tion study.Effect size Wald∗ LRT∗(β1,β5) Hu1 Hr1 (bd) Hr1 (sub) Hu1 Hr1 (bd) Hr1 (sub)Type I error rates(0,0) 5.35 2.95 5.00 5.70 3.20 5.15Power(0.50,0.06) 28 34 42 29 34 42(0.55,0.07)∗∗ 33 41 49 33 41 49(0.60,0.08) 38 45 55 38 46 55∗ Hu1 , Hr1 (bd) and Hr1 (sub) respectively denote unrestricted alternativehypothesis, restricted alternative hypothesis with the bound method, andrestricted alternative hypothesis with the substitution method.∗∗ Equal to estimates from real data example in Table 4.2.129l l l l ll l l20406080100Median and interquartile range of p−value reductions compared with unrestricted tests(Effect size) Type−of−testP−value reductions (in %)l l l ll l(0.50,0.06) Wald (0.50,0.06) LRT (0.55,0.07) Wald (0.55,0.07) LRT (0.60,0.08) Wald (0.60,0.08) LRTl Substitution methodBound methodFigure 4.3: Median and interquartile range of p-value reductions benefitingfrom order-restrictions4.5 DiscussionThe use of order-restricted inference provides a formal framework to incorporatescientific considerations about order constraints in model parameters into statisticalinference. We have demonstrated that, for modelling semi-continuous longitudi-nal data, order-restricted tests can increase powers substantially while maintaintype I error probabilities, compared with tests which ignore order-restrictions inthe parameters. Such a power gain may help researchers to detect significant ef-fects which are not evident based on usual tests which ignore the order constraints.Therefore, the proposed new order-restricted tests are scientifically important andvaluable.The proposed tests rely on large samples, since p-value formulas are based on130asymptotic results. When sample sizes are small, the tests may perform less well.In this case, bootstrap tests incorporating order-restrictions may be considered andwill be our future research. In addition, a gamma generalized linear mixed effectssub-model may be used to model the positive part of the response.In addition to semi-continuous responses, practical longitudinal data often in-volve other complications such as missing data or censoring. In particular, informa-tive censoring is common for HIV viral load data, due to the detection limits of viralload quantifying devices. Since anti-HIV therapies usually do not increase the av-erage virus level among HIV infected populations, incorporating order-restrictionsprovides an interesting way to improve powers of statistical tests to detect the ef-ficacies of anti-HIV therapies (Wang and Wu, 2013). The next Chapter presentssuch an extension.Although there is an extensive literature on order-restricted inference since1960’s (e.g. Perlman, 1969), much of the literature focuses on theoretical resultsfor multivariate normal models. Literature on order-restricted inference for manycommonly used models in practice, such as models for complex longitudinal data,is still quite limited. Moreover, there is little software for these methods, whichgreatly limits the use of these tests. We plan to develop R packages to facilitate theapplication of order-restricted inference for various types of mixed-effects modelsfor longitudinal and cluster data.131Chapter 5Multivariate One-sided Tests forNonlinear Mixed Effects ModelsWith Censored Response5.1 IntroductionNonlinear mixed effects (NLME) models are frequently used in many longitudinalstudies such as growth curve studies, HIV viral dynamic studies, and pharmacoki-netics and pharmacodynamics studies. A main advantage of a NLME model isthat it is typically based on scientific justifications or data-generating mechanisms,so it is different from empirical models such as linear or generalized linear mixedeffects models. Thus, in a NLME model, model parameters often have meaningfulscientific interpretations such as viral decay rates. These scientific interpretationsmay suggest certain reasonable constraints such as non-negative virus decay rates132during an anti-HIV treatment.As an example, we focus on HIV viral dynamic studies, which are useful forevaluating the efficacy of an anti-HIV treatment. For example, the estimates ofviral decay rates may be used to evaluate and compare anti-HIV treatments. Inthe literature, the following NLME model is widely used to model HIV viral loadtrajectories during an anti-HIV treatment (Wu and Ding, 1999; Wang and Wu,2013):Yi j = log10(eP1i−λ1i jti j + eP2i−λ2iti j)+Ei j, (5.1)P1i = P1+Bi1, λ1i j = λ1+αxi j +Bi2,P2i = P2+Bi3, λ2i = λ2+Bi4,where Yi j and xi j respectively denote the response (log-10 transformed viral load)and the covariate (CD4) measured at time ti j, β = (P1,λ1,α,P2,λ2)T are fixedeffects, and Bi = (Bi1,Bi2,Bi3,Bi4)T are random effects. Parameters λ1 and λ2 arerespectively first-phase and second phase (population) viral decay rates during ananti-HIV treatment. It is typical to assume that λ1 ≥ 0 and λ2 ≥ 0 since an anti-HIV treatment should at least lead to some decline or non-increase in viral loads(Wu and Ding, 1999; Vaida et al., 2007; Wang and Wu, 2013). Thus, the followingmultivariate one-sided hypothesesH0 : λ1 = λ2 = 0 versus Hc1 : λ1 ≥ 0,λ2 ≥ 0 (with at least one strict inequality)can be used to test if the treatment works or not (H0 indicates that the treatmentdoes not work, while Hc1 indicates otherwise). Figure 1.5 on page 22 in Chapter 1133shows a typical example of viral load trajectories during an anti-HIV treatment.Wang and Wu (2013) consider the above one-sided test in the absence of miss-ing data or censoring. For HIV/AIDS data, a main challenge in statistical analysisis that viral loads are often left censored due to a lower detection limit, e.g., viralloads below 100 (or 50) cannot be detected or measured, leading to (left) censoredviral load data. It is well known that ad-hoc methods such as imputing censoredvalues by the detection limit or half the detection limit may lead to misleading re-sults (Hughes, 1999; Wu, 2009). Formal methods to handle censored responsesin mixed effects models include multiple imputation methods and expectation-maximization (EM) algorithms. Hughes (1999) first proposes a Monte Carlo EMalgorithm for linear mixed effects (LME) models with censored responses. Wu(2002) extends his method to fit NLME models with censored responses and co-variate measurement errors. Vaida et al. (2007) and Vaida and Liu (2009) proposecomputationally more efficient EM algorithms for LME and NLME models withcensored responses. Recently, some researchers propose to model the error termsin mixed effects models with multivariate-t distributions instead of multivariatenormal distributions to incorporate outliers (heavy tails) in the responses (Matoset al., 2013, 2015; Lachos et al., 2017).In this chapter, we consider multivariate one-sided or order-restricted hypoth-esis testing for mean parameters (fixed effects) in NLME models with censoredresponses, extending the work of Wang and Wu (2013) and Vaida and Liu (2009).Note that, here the word “multivariate” should be “multi-parameters”, i.e., multi-parameters order-restricted hypothesis testing. We continue to use “multivariate”so that the terminology is consistent with those in the literature.This chapter is organized as follows. Section 5.2.1 describes general NLME134models with censored responses. In Section 5.2.2 we propose Wald and likelihoodratio tests for NLME models with censored responses. Section 5.2.3 extends thework of Vaida and Liu (2009) for efficient computation. In Section 5.3, we analyzetwo HIV datasets based on the proposed method. Simulation studies are presentedin Section 5.4 to evaluate the proposed method. Some concluding remarks arediscussed in Section 5.5.5.2 Multivariate Order-Restricted Tests for NLMEModels with Censored Responses5.2.1 General NLME Models with Censored ResponsesConsider a longitudinal study. Let Yi j and xi j respective denote the response andthe covariate vector from subject i measured at time ti j, j = 1 . . . ,ni, i = 1 . . . ,n.A general nonlinear mixed effect (NLME) model can be written as (Davidian andGiltinan, 1995)Yi j = g(xi j;β ,Bi)+Ei j, (5.2)where β and Bi are respectively fixed effects (mean parameters) and random ef-fects, g(·) is a known nonlinear function of β and Bi, the random errors Ei j’s areassumed to be i.i.d. and follow N(0,σ2), the random effects Bi are assumed tofollow N(0,Ψ), and Ei j’s are assumed to be independent of Bi, for j = 1 . . . ,ni,i = 1 . . . ,n.When the response data yi j, which are realized values of Yi j model (5.2), aresubject to left censoring with a lower detection limit d (with d known), the observedresponse data are (qi j,ci j), where qi j = yi j and ci j = 0 if yi j is not censored, and135qi j = d and ci j = 1 if yi j is censored.Let θ = (β ,Ψ,σ)T denote the collection of all unknown parameters arrangedas a column vector. The likelihood function corresponding to the observed data isgiven byL(θ ) =n∏i=1∫ { ni∏j=1fYi j|Bi(yi j|bi;β ,Ψ,σ2)1−ci j FYi j|Bi(d|bi;β ,Ψ,σ2)ci j}fBi(bi) dbi,(5.3)where f (·) and F(·) respectively denote the (generic) density function and cumu-lative distribution function and are given respectively byfYi j|Bi(yi j|bi;β ,Ψ,σ2) = exp{− [yi j−g(xi j;β ,bi)]2/(2σ2)}/(√2piσ),FYi j|Bi(d|bi;β ,Ψ,σ2) =∫ d−∞exp{− [z−g(xi j;β ,bi)]2/(2σ2)}/(√2piσ)dz.Evaluation of the above likelihood is typically based on the EM algorithm sincethe likelihood involves intractable integrals with unobservable random effects Bi.5.2.2 Order-Restricted Tests for Mean ParametersIn the NLME model (5.2), when certain mean parameters (fixed effects) are subjectto order-restricted constraints, we may consider (say) the following multivariateorder-restricted hypotheses for statistical inferenceH0 : Aβ = 0 versus Hc1 : Aβ ≥ 0 (with at least one strict inequality) (5.4)where A is a matrix of 0’s or 1’s. We focus on the above hypotheses since theyare motivated from HIV viral dynamic studies. Specifically, consider the HIV viral136dynamic model (5.1) given in the previous section. The fixed effects parameters inmodel (5.1) are β = (P1,λ1,α1,P2,λ2)T . With the following choice of matrix A:A = 0 1 0 0 00 0 0 0 1 ,the hypothesis testing problem (5.4) becomes the previously discussed hypothesistesting problem of H0 : λ1 = λ2 = 0 against H1 : λ1 ≥ 0, λ2 ≥ 0 with at least onestrict inequality.The likelihood ratio test statistic for testing (5.4) is given byTLR = 2[l(θ˜ )− l(θ )],where l(θ ) = logL(θ ) is the log-likelihood function, θ˜ maximizes l(θ ) under theconstraint Aβ ≥ 0 in Hc1 and θ maximizes l(θ ) under the constraint Aβ = 0 in H0.A Wald test statistic for testing (5.4) is given byTW = (Rθ˜ )T (RI˜−1n RT )−1(Rθ˜ ),where R is a matrix of 0’s and 1’s such that Aβ = Rθ , and I˜n =−n−1∂ 2l(θ )/∂θ ∂θ Tevaluated at θ = θ˜ .Following Wang and Wu (2013), the following two assumptions may be madeto investigate the asymptotic distributions of TW and TLR:Assumption 1: − 1n∂ 2l(θ )∂θ ∂θ Tp→ IθAssumption 2:1√n∂ l(θ )∂θd→ N(0, Iθ ),137wherep→ and d→ respectively denote converge in probability and convergence indistribution when the number of subjects n→∞, and Iθ is defined in the followingregularity condition C5.Let Ω = {θ : Rθ ≥ 0} denote the parameter space, θ (0) denote the true pa-rameter vector and Eθ (·) denote an expectation evaluated under distributions cor-responding to the parameter value θ . As in Wang and Wu (2013), the above twoassumptions may be validated by applications of Markov Strong Law of LargeNumbers and Lyapunov’s Central Limit Theorem with the following regularityconditions about the log-likelihood based on subject ili(θ ) = log∫ { ni∏j=1fYi j|Bi(yi j|bi;β ,Ψ,σ2)1−ci j FYi j|Bi(d|bi;β ,Ψ,σ2)ci j}fBi(bi) dbi,for i = 1, . . . ,n.C0: Different values of θ correspond to different distributions of yi (identifiabil-ity).C1: For each θ ∈ Ω, li(θ ) is differentiable up to order three with respect to θ atalmost all yi (the exceptional set is common for all θ and is a 0-probabilityevent).C2: There exist real-valued functions hi(yi) with∫hi(yi)dyi < ∞ such that, foreach θ ∈Ω, all first and second order partial derivatives of li(θ ) are boundedby hi(yi) in absolute value.C3: There exist real-valued functions Hi(yi) and a closed-ball neighborhood ofθ (0) denoted as B(θ (0)), such that for each θ ∈ B(θ (0))⋂Ω, all third or-138der partial derivatives of li(θ ) are bounded by Hi(yi) in absolute value. Inaddition, limn→∞∑ni=1 Hi(Y i)n = Op(1).C4: For each θ ∈Ω, I(i)θ = Eθ{∆li(θ )[∆li(θ )]T}has finite entries and is positivedefinite.C5: For each θ ∈Ω, limn→∞1n ∑ni=1 I(i)θ = Iθ exists and is positive definite. Iθ can beinterpreted as the “average” information matrix.C7: There exists a δ1 > 0 and a finite M such that for the hi(yi) in C2,Eθ[(hi(Y i)+ I(i)θ (h, l))1+δ1]≤Mfor all i, h and l and for each θ ∈Ω.C8: There exists a δ2 > 0 such that Lyapunov’s condition is satisfied for any linearcombinations aT∆li(θ ). That is, for any a ∈ Rk and a 6= 0,1[∑ni=1 aT I(i)θ a]1+δ2/2n∑i=1Eθ[|aT∆li(θ )|2+δ2]→ 0, as n→ ∞.C9: There exists a function l(θ ) such that supθ∈Ω|1n ∑ni=1 li(θ )− l(θ )|p→ 0 as n→∞,and sup{θ :‖θ−θ 0‖≥ε}l(θ )< l(θ 0) for any ε > 0.Under the above assumptions and regularity conditions, the test statistics TWand TLR share the following asymptotic distribution (Wang and Wu, 2013)Pr(T ≥ tobs|H0)→r∑i=0ωi(r,RI−1θ 0 RT ;O+)[1−Fχ2(tobs; i)], as n→ ∞, (5.5)139where the random variable T is either TW or TLR, tobs is the observed value ofT given a dataset, r is the rank of R, ωi(r,RI−1θ 0 RT ;O+)’s are chi-bar probabilityweights whose calculations are described in Section 1.6.3 on page 30 or Section3.5 of Silvapulle and Sen (2004), χ20 denotes a distribution that places probability1 at the point mass 0, and Fχ2(tobs; i) denotes the probability that a random variablefollowing a χ2-distribution with i degrees of freedom is less than or equal to tobs.Note that the above test statistics contain unknown variance-covariance param-eters under the null hypothesis. As in previous chapters, we can either substitutethese unknown parameters by their consistent estimates given a dataset or constructconservative tests based on upper bounds of the null probability. Specifically, fol-lowing Wang and Wu (2013), we propose two methods to compute approximate orconservative critical values or p-values based on the above asymptotic null distri-bution:The substitution method: the unknown Iθ 0 in (5.5) can be substituted by its consis-tent estimate. For example, we may choose −n−1∂ 2l(θ )/∂θ ∂θ T evaluatedat θ = θ .The bound method: conservative p-values can be obtained based on the upperbound given by Perlman (1969)r∑i=0ωi(r,RI−1θ 0 RT ;O+)[1−Fχ2(tobs; i)]≤1−Fχ2(tobs;r−1)2+1−Fχ2(tobs;r)2. (5.6)As discussed in Section 4.2.4, the substitution method may perform well when thesample size is large enough to reasonably estimate I−1θ 0 . The bound method may be140conservative but offers computational simplicity. Both the substitution method andthe bound method will be evaluated later in simulation studies.5.2.3 Computational StrategiesFor a general NLME model, MLEs of the model parameters are typically obtainedbased on an iterative algorithm Lindstrom and Bates (1990). In such an iterativealgorithm, at each iteration, the NLME model is linearized based on a first-orderTaylor approximation around current estimates of the parameters and random ef-fects, leading to a “working” LME model. Parameters in the working LME modelare estimated using an EM algorithm. The procedure is iterated until convergence.At convergence, approximate MLEs of the model parameters in the original NLMEmodel are obtained. Such a linearization procedure has been shown to work well,is widely used, and is implemented in standard software such as the R package“nlme” (Wu, 2009).For a LME model with censored responses, Hughes (1999) proposed a MonteCarlo EM algorithm for likelihood inference, treating the censored values and ran-dom effects as “missing data”. However, their method is typically computationallyintensive due to the Monte Carlo E-step in the EM algorithm. More recently, Vaidaand Liu (2009) proposed a faster EM type algorithm for LME models and NLMEmodels with censored responses. Their method combines Lindstrom and Bates(1990) and Hughes (1999), but with a computationally more efficient E-step for theworking LME model. Specifically, for a NLME model with censored responses,the method of Vaida and Liu (2009) iterates between a linearization step (L-step),an expectation step (E-step), and a maximization step (M-step) until convergence:• L-step: taking a first order Taylor expansion of the mean response E(yi j) in141the NLME model about the current estimates of the parameters and randomeffects, leading to a “working” LME model.• E-step: computing the conditional expectation of the “complete data” log-likelihood given the observed responses of the working LME model using acomputationally efficient method.• M-step: updating estimates for parameters and random effects by maximiz-ing the conditional expectation in the E-step.The details of this procedure will be discussed further in the following section.5.2.4 An EM Algorithm for Order-Restricted Inference for NLMEModels with Censored ResponsesIn this section, we consider order-restricted inference for NLME models with cen-sored responses. Note that, for the likelihood ratio test and Wald test for order-restricted hypotheses, a key step is to compute the constrained MLEs of the modelparameters under the null and alternative hypotheses. Thus, in this section we focuson these computations.We propose to extend the method of Vaida and Liu (2009) to calculate the con-strained MLE θ˜ = (β˜ , σ˜2,Ψ˜) of the parameters in the NLME model with censoredresponses. That is, we try to maximize the following observed-data log-likelihoodl(θ ) =n∑i=1log∫ { ni∏j=1fYi j|Bi(yi j|bi;β ,Ψ,σ2)1−ci j FYi j|Bi(d|bi;β ,Ψ,σ2)ci j}fBi(bi)dbi,subject to the order-restricted constraints Aβ ≥ 0 in the alternative hypothesis Hc1in (5.4). Note that the estimates under the null hypothesis H0 in (5.4): Aβ = 0 canbe easily obtained since it is an equality.142Our proposed method modifies the method of Vaida and Liu (2009) in twoways:• A modified LME step: Instead of running a single E-step and M-step withineach L-step as in Vaida and Liu (2009), our method runs multiple E-stepsand M-steps until convergence for the “working” LME model within eachL-step.• A constrained M-step: Unlike the unconstrained M-step in Vaida and Liu(2009), we incorporate the constraint Aβ ≥ 0 in the M-step for the “working”LME model.That is, we begin with some initial estimates (starting values) of the parametersand random effects, and then we iterate between an L-step, a modified LME step,and a constrained M-step until convergence.For starting values or initial estimates, we may consider the following choices:• The initial estimates for the random effects {Bi : i = 1 . . . ,n} can be chosento be all zeros.• The initial estimate for the covariance matrix Ψ can be an identity matrix.• The initial estimates for the mean paarameters β and variance parameterσ2 can be obtained by fitting a standard nonlinear regression model (e.g.,using R function nls) to the uncensored observations, assuming that they areindependent.Note that, if any components of the initial estimates for β obtained in the above stepviolate the constraint Aβ ≥ 0, they can be manually adjusted to be on the boundary143of the constraint. For example, negative initial estimates of the non-negativelyconstrained components of β are adjusted to be zeros. This ad-hoc approach isonly used for obtaining initial values.In the following, we describe the detail of the L-step and the LME step in thet-th iteration, t = 1,2,3, · · · . At iteration t, the current estimates are denoted byβ˜(t), B˜(t)i , σ˜2(t), and Ψ˜(t).The L-step: constructing a “working” LME modelFollowing (Lindstrom and Bates, 1990), at iteration t, we take a first-orderTaylor expansion of the NLME model (5.2) about β˜(t)and B˜(t)i to obtain an ap-proximate LME model. LetX∗i j =∂g(xi j;β ,Bi)∂β∣∣∣(β ,Bi)=(β˜(t),B˜(t)i ),Z∗i j =∂g(xi j;β ,Bi)∂Bi∣∣∣(β ,Bi)=(β˜(t),B˜(t)i ),where the individual component partial derivatives in X∗i j and Z∗i j are arranged asrow vectors. The “new response” variable Wi j has realized value wi j given bywi j = yi j− (g∗i j−X∗i jβ˜(t)−Z∗i jB˜(t)i ) for yi j ≥ d,< di j = d− (g∗i j−X∗i jβ˜(t)−Z∗i jB˜(t)i ) for yi j < d,where g∗i j = g(xi j; β˜(t), B˜(t)i ) and d is the censoring threshold for yi j. Note that thenew response wi j also contains censored values with the new censoring thresholddi j. Then, the L-step leads to the following “working” LME modelWi j = X∗i jβ +Z∗i jBi+Ei j, i = 1,2, · · · ,n; j = 1,2, · · · ,ni. (5.7)144To address censoring in the new response wi j, we consider the partition wi =(wci ,woi ) as in Vaida et al. (2007) and Vaida and Liu (2009), where wci and woirespectively denote the collection of censored and un-censored response data fromsubject i. Thus, the observed data from subject i are {(woi ,ci), i = 1,2, · · · ,n},where ci = (ci1, . . . ,cni) is the vector of censoring indicators. The log-likelihoodfor the working LME model (5.7) based on “new data” (woi ,ci) is given byl∗obs,i(θ ) = log∫ { ni∏j=1fWi j|Bi(wi j|bi;β ,Ψ,σ2)1−Ci j FWi j|Bi(di j|bi;β ,Ψ,σ2)Ci j}× fBi(bi)dbi,where fWi j|Bi(bi;β ,Ψ,σ2) = exp{− (wi j−X∗i jβ −Z∗i jbi)2/(2σ2)}/(√2piσ), andFWi j|Bi(di j|bi;β ,Ψ,σ2) =∫ di j−∞ exp{− (z−X∗i jβ −Z∗i jbi)2/(2σ2)}/(√2piσ)dz.We first define some notation. Let p and q respectively denote the dimensionof vectors β and Bi. Let tr(A) and |A| respectively denote the trace and the de-terminant of a matrix A. Let ‖x‖ denote the Euclidean norm of a vector x. LetIni×ni denote an ni×ni identity matrix. Let X∗i be the ni× p matrix whose rows areX∗i j, let Z∗i be the ni×q design matrix whose rows are Z∗i j, and let wi be the ni×1vector whose components are wi j. Let ∆ be the Choleskey factor of matrix σ2Ψ−1,i.e., ∆T∆= σ2Ψ−1. Let W ci , W oi , and Bi denote the random vector whose realizedvalues are respectively wci , woi , and bi.To obtain updated estimates of the parameters and random effects at iterationt with the L-step, we can maximize the approximate observed-data log-likelihoodl∗obs,i(θ ) using an EM algorithm, treating (wci ,bi) as “missing data”. If these “miss-145ing data” were observed, the “complete-data” log-likelihood for the “working”LME model (5.7) is given byl∗c (θ ;wi,bi) =−12σ2‖wi−X∗i β −Z∗i bi‖2−ni2log(2piσ2)−12bTi Ψ−1bi− q2 log(2pi)−12log |Ψ|.The details of the E-step and M-step of the EM algorithm for obtaining MLEsbased on l∗obs,i(θ ) are described as follows.The E-stepAt the t-th iteration of the L-step, we do another iteration for the EM algorithm.That is, we iterate between an E-step and an M-step to maximize l∗obs,i(θ ) for the“working” LME model (5.7) within the t-th iteration of the L-step. The E-stepcomputes the conditional expectation of∑i l∗c (θ ;W i,Bi) given (W oi ,C i). Let θ˜(s)=(β˜(s), σ˜2(s),Ψ˜(s)) and B˜(s)i denote the current estimates in the s-th EM iteration. Inthe E-step we computeQ(θ |θ˜ (s)) =E(∑il∗c (θ ;W i,Bi) |W oi ,C i; θ˜ (s))=−n∑i=1ni2log(2piσ2)− nq2log(2pi)− n2log |Ψ|− 12tr{Ψ−1n∑i=1E(BiBTi |W oi ,C i; θ˜ (s))}− 12σ2n∑i=1E(‖W i−X∗i β −Z∗i Bi‖2 |W oi ,C i; θ˜ (s)). (5.8)Note that, in principle the E-step to compute Q(θ |θ˜ (s)) can be implementedusing Markov Chain Monte Carlo (MCMC) methods, as in Hughes (1999), but146such methods can be computationally very intensive and may offer convergenceproblems. Thus, in the following we propose a computationally more efficientmethod, although the mathematics may seem a bit tedious.In (5.8), the last term can be computed asE(‖W i−X∗i β −Z∗i Bi‖2 |W oi ,C i; θ˜ (s))=tr{E[(W i−X∗i β )(W i−X∗i β )T |W oi ,C i; θ˜ (s)]}+ tr[(Z∗Ti Z∗i )E(BiBTi |W oi ,C i; θ˜ (s))]−2tr{E(Z∗i BiWTi |W oi ,C i; θ˜ (s))}+2(X∗i β )T Z∗i E(Bi|W oi ,C i; θ˜ (s)), (5.9)where the first term can be further computed astr{E[(W i−X∗i β )(W i−X∗i β )T |W oi ,C i; θ˜ (s)]}=tr{E(W iW Ti |W oi ,C i; θ˜ (s))}−2β T X∗Ti E(W i|W oi ,C i; θ˜ (s))+β T X∗Ti X∗i β . (5.10)Based on (5.8), (5.9) and (5.10), the calculation of Q(θ |θ˜ (s)) reduces to thecalculation of E(Bi|W oi ,C i; θ˜ (s)), E(BiBTi |W oi ,C i; θ˜ (s)), E(Z∗i BiWTi |W oi ,C i; θ˜ (s)),E(W i|W oi ,C i; θ˜ (s)), and E(W iW Ti |W oi ,C i; θ˜ (s)). We next show thatE(Bi|W oi ,C i; θ˜ (s)), E(BiBTi |W oi ,C i; θ˜ (s))and E(Z∗i BiWTi |W oi ,C i; θ˜ (s))are all func-tions of E(W i|W oi ,C i; θ˜ (s))and E(W iW Ti |W oi ,C i; θ˜ (s)). Let Si = Z∗i Ψ˜(s)Z∗Ti +147σ˜2(s)Ini×ni and Vi = S−1i . Under the “working” LME model, W iBi∼ N X∗i β˜ (s)0 , V−1i Z∗i Ψ˜(s)Ψ˜(s)Z∗Ti Ψ˜(s) ,the well-known results of conditional distributions of a multivariate normal distri-bution yieldE(Bi|W i; θ˜ (s))= Ψ˜(s)Z∗Ti Vi(W i−X∗i β˜(s)), (5.11)Var(Bi|W i; θ˜ (s))= Ψ˜(s)− Ψ˜(s)Z∗Ti ViZ∗i Ψ˜(s). (5.12)Moreover,E(BiBTi |W i; θ˜ (s))=Var(Bi|W i; θ˜ (s))+E(Bi|W i; θ˜ (s))E(Bi|W i; θ˜ (s))T=Ψ˜(s)− Ψ˜(s)Z∗Ti ViZ∗i Ψ˜(s)+ Ψ˜(s)Z∗Ti Vi(W i−X∗i β˜(s))(W i−X∗i β˜(s))TViZiΨ˜(s). (5.13)Based on the law of iterated expectation (Billingsley, 2008) and (5.11)E(Bi|W oi ,C i; θ˜ (s)) =E{E(Bi|W i,W oi ,C i; θ˜ (s))|W oi ,C i; θ˜ (s)}=E{Ψ˜(s)Z∗Ti Vi(W i−X∗i β˜(s))|W oi ,C i; θ˜ (s)}=Ψ˜(s)Z∗Ti Vi[E(W i|W oi ,C i; θ˜ (s))−X∗i β˜(s)], (5.14)E(Z∗i BiWTi |W oi ,C i; θ˜ (s)) =E{E(Z∗i BiWTi |W i,W oi ,C i; θ˜ (s))|W oi ,C i; θ˜ (s)}=E{Z∗i E(Bi|W i,C i; θ˜ (s))W Ti |W oi ,C i; θ˜ (s)}148=Z∗i Ψ˜(s)Z∗Ti Vi[E(W iW Ti |W oi ,C i; θ˜ (s))−X∗i β˜(s)E(W i|W oi ,C i; θ˜ (s))T], (5.15)Similarly, based on the law of iterated expectation and (5.13)E(BiBTi |W oi ,C i; θ˜ (s)) =E{E(BiBTi |W i,W oi ,C i; θ˜ (s))|W oi ,C i; θ˜ (s)}=Ψ˜(s)− Ψ˜(s)Z∗Ti ViZ∗i Ψ˜(s)+Ψ˜(s)Z∗Ti ViE[(W i−X∗i β˜(s))(W i−X∗i β˜(s))T |W oi ,C i; θ˜ (s)]ViZiΨ˜(s). (5.16)Thus, based on equations from (5.8) to (5.16), the calculation of Q(θ |θ˜ (s)) inthe E-step reduces to the calculations of E(wi|woi ,C i; θ˜(s)) and E(wiwTi |woi ,C i; θ˜(s)).Note that, if subject i does not have censored responses, then E(W i|W oi ,C i; θ˜ (s)) =W oi and E(W iWTi |W oi ,C i; θ˜ (s)) = W oi W oTi . If subject i has censored responses,Hughes (1999) and Vaida et al. (2007) propose some approximations based onthe Gibbs sampler method for these quantities. When the censoring rate and/or thedimension of the random effects are not small, the Gibbs sampler may be computa-tionally inefficient, since a large number of Monte Carlo samples may be needed toaccurately approximate Q(θ |θ˜ (s)). Thus, in the following we propose a computa-tionally more efficient approximate method, following Vaida and Liu (2009). Thisapproach approximates E(W i|W oi ,C i; θ˜ (s)) and E(W iW Ti |W oi ,C i; θ˜ (s)) by closed-form functions based on multivariate normal probabilities, which is described asfollows.149First, under the “working” LME model (5.7), we have W i ∼ N(X∗i β˜(s),Si),which can be written as follows W ciW oi∼ N X∗ci β˜ (s)X∗oi β˜(s) , Scci ScoiSoci Sooi .Based on properties of a multivariate normal distribution, we haveW ci |W oi = woi ∼ N(µ c|oi ,Sc|oi )µ c|oi = Scoi (Sooi )−1[woi −X∗oi β˜(s)]+X∗ci β˜(s)Sc|oi = Scci −Scoi (Sooi )−1Soci .Let Ji = { j : wi j is censored} denote the collection of indices correspondingto censored observations from subject i, let d i = {di j : j ∈ Ji} be a column vec-tor of censoring thresholds, and let U i denote a random vector obtained by anN(µ c|oi ,Sc|oi ) random vector left censored at d i. Then, we haveE(W i|W oi ,C i; θ˜ (s)) =(E(U i),W oi)T, (5.17)Var(W i|W oi ,C i; θ˜ (s)) =(Var(U i) 00 0),E(W iW Ti |W oi ,C i; θ˜ (s)) =(Var(U i) 00 0)+(E(U i)W oi)(E(U i),W oi). (5.18)Second, let S(i) be a diagonal matrix whose diagonal elements are the squareroots of the corresponding diagonal elements in matrix Si. Then, T i = S−1(i) (U i−µ c|oi ) ∼ N(0,Ri) whose values are left censored at threshold ai = S−1(i) (d i− µc|oi ),150with Ri = S−1(i) Sc|oi S−1(i) being a correlation matrix. Moreover, we haveE(U i) = S(i)E(T i)+µc|oi , (5.19)Var(U i) = S(i)Var(T i)ST(i). (5.20)Furthermore, E(T i) and Var(T i) can be computed based on the moment gen-erating function of T i, following Tallis (1961) and Vaida and Liu (2009). Specif-ically, let φ(x;µ ,Σ) denote the density function of a multivariate normal randomvector with mean µ and covariance matrix Σ. Let K =∫[x>ai]φ(x;0,Ri)dx, where[x > ai] = {x j > ai j for all j ∈ Ji}. We also denote [x < ai] = {x j < ai j for all j ∈Ji}. Let n∗i denote the number of censored observations from subject i. The mo-ment generating function of T i is given bymi(t) = E(etT T i) = K−1∫[x>ai]etT xφ(x;0,Ri)dx= K−1∫[x>ai](2pi)−n∗i2 |Ri|− 12 e− 12 xT R−1i x+t T xdx= K−1e12 tT Rit∫[x>ai](2pi)−n∗i2 |Ri|− 12 e− 12 (x−Rit)T R−1i (x−Rit)dx= K−1e12 tT Rit∫[x1>ai−Rit ](2pi)−n∗i2 |Ri|− 12 e− 12 xT1 R−1i x1dx1= K−1e12 tT Rit∫[x1