"Science, Faculty of"@en . "Statistics, Department of"@en . "DSpace"@en . "UBCV"@en . "Xu, James Jianmeng"@en . "2009-03-17T22:54:30Z"@en . "1996"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "This thesis presents research on modelling, statistical inference and computation for multivariate\r\ndiscrete data. I address the problem of how to systematically model multivariate discrete response\r\ndata including binary, ordinal categorical and count data, and how to carry out statistical inference\r\nand computations. To this end, I relate the multivariate models to similar univariate models already\r\nwidely used in applications and to some multivariate models that hitherto were available but\r\nscattered in the literature, and I introduce new classes of models.\r\nThe main contributions in this thesis to multivariate discrete data analysis are in several distinct\r\ndirections. In modelling of multivariate discrete data , we propose two new classification of multivariate\r\nparametric discrete models: multivariate copula discrete (MCD) models and multivariate\r\nmixture discrete (MMD) models. Numerous new multivariate discrete models are introduced through\r\nthese two classes and several multivariate discrete models which have appeared in the literature are\r\nunified by these two classes. With appropriate choices of copulas, these two classes of models allow\r\nthe marginal parameters and dependence parameters to vary with covariates in a natural way. By\r\nusing special dependence structures, the models can be used for longitudinal data with short time\r\nseries or repeated measures data. As a result, the scope of multivariate discrete data analysis is substantially\r\nbroadened. In statistical inference and computation for multivariate models, we propose\r\nthe inference function of margins (IFM) approach in which each inference function is a likelihood\r\nequation for some marginal distribution of a multivariate distribution. Examples where the approach\r\napplies are the multivariate logit model with the copulas having certain closure properties and the\r\nmultivariate probit model for binary data. This general approach makes the estimation of parameters\r\nfor the multivariate models computationally feasible. The corresponding asymptotic theory, the\r\nestimation of standard errors by the Godambe information matrix as well as the jackknife method,\r\nand the efficiency of the IFM approach relative to full multivariate likelihood function approach\r\nare studied. Particular attention has been given to the models with special dependence structure (e.g. the copula dependence structure is exchangeable or AR(1) type if applicable), and efficient\r\nparameter estimation schemes based on IFM (weighting approach and pool-marginal-likelihood approach)\r\nare developed. We also give detailed assessments of the efficiency of the GEE approach for\r\nestimating regression parameters in multivariate models; this is lacking in the literature. Detailed\r\ndata analyses of existing data sets are provided to give concrete application of multivariate models\r\nand the statistical inference procedures in this thesis."@en . "https://circle.library.ubc.ca/rest/handle/2429/6188?expand=metadata"@en . "12783655 bytes"@en . "application/pdf"@en . "STATISTICAL M O D E L L I N G A N D I N F E R E N C E FOR MULTIVARIATE A N D LONGITUDINAL DISCRETE RESPONSE DATA by James Jianmeng X u B.Sc. Sichuan University M.Sc. 
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1996
© James Jianmeng Xu, 1996

Abstract

This thesis presents research on modelling, statistical inference and computation for multivariate discrete data. I address the problem of how to systematically model multivariate discrete response data including binary, ordinal categorical and count data, and how to carry out statistical inference and computations. To this end, I relate the multivariate models to similar univariate models already widely used in applications and to some multivariate models that hitherto were available but scattered in the literature, and I introduce new classes of models.

The main contributions in this thesis to multivariate discrete data analysis are in several distinct directions. In modelling of multivariate discrete data, we propose two new classes of multivariate parametric discrete models: multivariate copula discrete (MCD) models and multivariate mixture discrete (MMD) models. Numerous new multivariate discrete models are introduced through these two classes, and several multivariate discrete models which have appeared in the literature are unified by them. With appropriate choices of copulas, these two classes of models allow the marginal parameters and dependence parameters to vary with covariates in a natural way. By using special dependence structures, the models can be used for longitudinal data with short time series or repeated measures data. As a result, the scope of multivariate discrete data analysis is substantially broadened. In statistical inference and computation for multivariate models, we propose the inference functions of margins (IFM) approach, in which each inference function is a likelihood equation for some marginal distribution of a multivariate distribution. Examples where the approach applies are the multivariate logit model with copulas having certain closure properties and the multivariate probit model for binary data. This general approach makes the estimation of parameters for the multivariate models computationally feasible. The corresponding asymptotic theory, the estimation of standard errors by the Godambe information matrix as well as the jackknife method, and the efficiency of the IFM approach relative to the full multivariate likelihood approach are studied. Particular attention has been given to models with special dependence structure
(e.g. the copula dependence structure is exchangeable or AR(1) type if applicable), and efficient parameter estimation schemes based on IFM (weighting approach and pool-marginal-likelihoods approach) are developed. We also give detailed assessments of the efficiency of the GEE approach for estimating regression parameters in multivariate models; this is lacking in the literature. Detailed data analyses of existing data sets are provided to give concrete applications of the multivariate models and the statistical inference procedures in this thesis.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Basic Notation and Definitions
Acknowledgements

1 Introduction
  1.1 Multivariate discrete response data
  1.2 Review of literature and research motivation
  1.3 Statistical modelling
  1.4 Overview of thesis

2 Foundation: models, statistical inference and computation
  2.1 Multivariate copulas and dependence measures
    2.1.1 Multivariate distribution functions
    2.1.2 Multivariate copulas and Frechet bounds
    2.1.3 Dependence measures
    2.1.4 Examples of multivariate copulas
    2.1.5 CUOM, CUOM(k), MUBE, PUBE and MPME concepts
  2.2 Multivariate discrete models
    2.2.1 Multivariate copula discrete models
    2.2.2 Multivariate mixture discrete models
    2.2.3 Examples of MCD and MMD models
    2.2.4 Some properties of MCD and MMD models
  2.3 Inference functions of margins
    2.3.1 Approaches for fitting multivariate models
    2.3.2 Inference functions for multiple parameters
    2.3.3 Inference function of margins
  2.4 Parameter estimation with IFM and asymptotic results
    2.4.1 Models with no covariates
    2.4.2 Models with covariates
    2.4.3 Asymptotic results for the models assuming a joint distribution for response vector and covariates
  2.5 The jackknife approach for the variance of IFME
    2.5.1 Jackknife approach for models with no covariates
    2.5.2 Jackknife for a function of θ
    2.5.3 Jackknife approach for models with covariates
  2.6 Estimation for models with parameters common to more than one margin
    2.6.1 Weighting approach
    2.6.2 The pool-marginal-likelihoods approach
    2.6.3 Examples
  2.7 Numerical methods for the model fitting
  2.8 Summary

3 Modelling of multivariate discrete data
  3.1 Multivariate copula discrete models for binary data
    3.1.1 Multivariate logit model
    3.1.2 Multivariate probit model
  3.2 Comparison of models
  3.3 Multivariate copula discrete models for count data
    3.3.1 Multivariate Poisson model
    3.3.2 Multivariate generalized Poisson model
    3.3.3 Multivariate negative binomial model
    3.3.4 Multivariate logarithmic series model
  3.4 Multivariate copula discrete models for ordinal data
    3.4.1 Multivariate logit model
    3.4.2 Multivariate probit model
    3.4.3 Multivariate binomial model
  3.5 Multivariate mixture discrete models for binary data
    3.5.1 Multivariate probit-normal model
    3.5.2 Multivariate Bernoulli-Beta model
    3.5.3 Multivariate logit-normal model
  3.6 Multivariate mixture discrete models for count data
    3.6.1 Multivariate Poisson-lognormal model
    3.6.2 Multivariate Poisson-gamma model
    3.6.3 Multivariate negative-binomial mixture model
    3.6.4 Multivariate Poisson-inverse Gaussian model
  3.7 Application to longitudinal and repeated measures data
  3.8 Summary

4 The efficiency of IFM approach and the efficiency of jackknife variance estimate
  4.1 The assessment of the efficiency of IFM approach
  4.2 Analytical assessment of the efficiency
  4.3 Efficiency assessment through simulation
  4.4 IFM efficiency for models with special dependence structure
  4.5 Jackknife variance estimate compared with Godambe information matrix
  4.6 Summary

5 Modelling, data analysis and examples
  5.1 Some issues on modelling
    5.1.1 Data analysis cycle
    5.1.2 Model selection
    5.1.3 Diagnostic checking
    5.1.4 Testing the dependence structure
  5.2 Data analysis examples
    5.2.1 Example with multivariate/longitudinal binary response data
    5.2.2 Example with multivariate/longitudinal ordinal response data
    5.2.3 Example with multivariate count response data
  5.3 Summary

6 GEE methodology and its comparison with ML and IFM approaches
  6.1 Generalized estimating equations
  6.2 GEE in multivariate analysis
  6.3 GEE compared with the ML and IFM approaches
  6.4 A combination of GEE and IFM estimation approach
  6.5 Summary

7 Some further research topics

Bibliography

A Maple programs

List of Tables

1.1 The structure of general multivariate discrete data
4.1 Efficiency assessment with MCD model for binary data: d = 3, z = (0, 0, 0)', N = 1000
4.2 Efficiency assessment with MCD model for binary data: d = 3, β_0 = (0.7, 0.5, 0.3)', β_1 = (0.5, 0.5, 0.5)', x_ij discrete, N = 1000
4.3 Efficiency assessment with MCD model for binary data: d = 3, β_0 = (0.7, 0.5, 0.3)', β_1 = (0.5, 0.5, 0.5)', x_ij = x_i continuous, N = 100
4.4 Efficiency assessment with MCD model for binary data: d = 4, α_12 = α_13 = α_14 = α_23 = α_24 = α_34 = 1.3863, N = 1000
4.5 Efficiency assessment with MCD model for binary data: d = 4, α_12 = α_23 = α_34 = 2.1972, α_13 = α_24 = 1.5163, α_14 = 1.1309, N = 1000
4.6 Efficiency assessment with MCD model for ordinal data: d = 3, z(1) = (−0.5, −0.5, −0.5)', z(2) = (0.5, 0.5, 0.5)', N = 1000
4.7 Efficiency assessment with MCD model for ordinal data: d = 3, z(1) = (−0.5, 0, −0.5)', z(2) = (0.5, 1, 0.5)', N = 1000
4.8 Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (−0.5, −0.5, −0.5, −0.5)', z(2) = (0.5, 0.5, 0.5, 0.5)', α_12 = α_13 = α_14 = α_23 = α_24 = α_34 = 1.3863, N = 100
4.9 Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (−0.5, −0.5, −0.5, −0.5)', z(2) = (0.5, 0.5, 0.5, 0.5)', α_12 = α_23 = α_34 = 2.1972, α_13 = α_24 = 1.5163, α_14 = 1.1309, N = 100
4.10 Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (−0.5, 0, −0.5, 0)', z(2) = (0.5, 1, 0.5, 1)', α_12 = α_13 = α_14 = α_23 = α_24 = α_34 = 1.3863, N = 100
4.11 Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (−0.5, 0, −0.5, 0)', z(2) = (0.5, 1, 0.5, 1)', α_12 = α_23 = α_34 = 2.1972, α_13 = α_24 = 1.5163, α_14 = 1.1309, N = 100
4.12 Efficiency assessment with MCD model for count data: d = 3, β_0 = (1, 1, 1)' and β_1 = (0.5, 0.5, 0.5)', x_ij discrete, N = 1000
4.13 Efficiency assessment with MCD model for count data: d = 3, (β_1, β_2, β_3) = (1.6094, 1.0986, 1.6094), N = 1000
4.14 Efficiency assessment with MCD model for count data: d = 4, (β_1, β_2, β_3, β_4) = (1.6094, 1.0986, 1.6094, 1.6094), N = 1000
4.15 Efficiency assessment with MCD model for count data: d = 4, (β_1, β_2, β_3, β_4) = (1.3863, 0.6931, 1.6094, 2.0794), N = 1000
4.16 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 3
4.17 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 4
4.18 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 5
4.19 Efficiency assessment with special dependence structure: d = 3, z = (0.5, 0.5, 0.5)'
4.20 Efficiency assessment with special dependence structure: d = 3, z = (0.5, 1.0, 1.5)'
4.21 Efficiency assessment with special dependence structure: d = 3, α_0 = (0.5, 0.5, 0.5)', α_1 = (1, 1, 1)'
4.22 Efficiency assessment with special dependence structure: d = 3, α_0 = (0.5, 0.5, 0.5)', α_1 = (1, 0.5, 1.5)'
4.23 Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'
4.24 Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'
4.25 Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'
4.26 Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'
4.27 Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 1000
4.28 Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 500
5.1 Six Cities Study: Percentages for binary variables
5.2 Six Cities Study: Frequencies of the response vector (Age 9, 10, 11, 12)
5.3 Six Cities Study: Pairwise log odds ratios for Age 9, 10, 11, 12
5.4 Six Cities Study: Estimates of marginal regression parameters for multivariate logit model
5.5 Six Cities Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula
5.6 Six Cities Study: Comparisons of AIC values and X² values from various submodels of models (1a) and (2)
5.7 Six Cities Study: Comparisons of AIC values from various models
5.8 Six Cities Study: Comparisons of X² values from various models
5.9 Six Cities Study: Estimates (SE) of dependence regression parameters from the submodel l.md.g.wn of various models
5.10 Six Cities Study: Estimates of Pr(Y = y) from various submodels of model (1a)
5.11 Six Cities Study: Estimates of Pr(Y = y) from the submodel l.md.g.wn of various models
5.12 Six Cities Study: Observed frequencies in comparison with estimates of Pr(Y = y|x) from various models, x = (City, Smoking9, Smoking10, Smoking11, Smoking12)
5.13 Six Cities Study: Estimates of Pr(Y = y|x) from the submodel l.md.g.wn of various models, x = (City, Smoking9, Smoking10, Smoking11, Smoking12)
5.14 TMI Accident Study: Stress levels for 4 years following accident at TMI; responses with non-zero frequencies
5.15 TMI Accident Study: Univariate marginal (and relative) frequencies
5.16 TMI Accident Study: Pairwise gamma measures for Year 1979, 1980, 1981, 1982
5.17 TMI Accident Study: Estimates of univariate marginal regression parameters for multivariate logit models
5.18 TMI Accident Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula
5.19 TMI Accident Study: Comparisons of AIC values and X² values from various submodels of models (1a) and (2)
5.20 TMI Accident Study: Comparisons of AIC values and X² values from the submodel l.md.g.wn of various models
5.21 TMI Accident Study: Estimates (SE) of dependence regression parameters from the submodel l.md.g.wn of various models
5.22 TMI Accident Study: Comparisons of X² values from the submodels l.md.g.wn and l.md.a.wc of model (1a)
5.23 TMI Accident Study: Estimates of Pr(Y = y) and frequencies from the submodels l.md.g.wn and l.md.a.wc of model (1a)
5.24 TMI Accident Study: Estimates of Pr(Y = y|x) and frequencies from the submodels l.md.g.wn and l.md.a.wc of model (1a)
5.25 Bacteria Counts: Bacterial counts by 3 samplers in 50 sterile locations
5.26 Bacteria Counts: Univariate marginal frequencies
5.27 Bacteria Counts: Pairwise gamma measures for samplers 1, 2, 3
5.28 Bacteria Counts: Moment estimates of means, variances, correlations and other summary statistics of responses
5.29 Bacteria Counts: Estimates of marginal parameters for multivariate Poisson model
5.30 Bacteria Counts: Estimates of dependence regression parameters for multivariate Poisson model with multinormal copula
5.31 Bacteria Counts: Comparisons of AIC values from various submodels of multivariate Poisson model with multinormal copula
5.32 Bacteria Counts: Estimates of marginal parameters from multivariate Poisson-lognormal model
5.33 Bacteria Counts: Estimates of dependence parameters from multivariate Poisson-lognormal model
5.34 Bacteria Counts: Comparisons of AIC values from various submodels of multivariate Poisson-lognormal model
5.35 Bacteria Counts: Estimates of means, variances and correlations of responses from the submodel md.g of multivariate Poisson-lognormal model
6.1 GEE assessment: d = 2, β_0 = 0.5, β_1 = 1, x_ij discrete, ρ = 0.9, N = 1000
6.2 GEE assessment: d = 2, β_0 = 0.5, β_1 = 1, x_ij continuous, ρ = 0.9, N = 1000
6.3 GEE assessment: d = 2, β_0 = −0.5, β_1 = 0.5, β_2 = 1, w_i, x_ij discrete, ρ = 0.9, N = 1000
6.4 GEE assessment: d = 2, β_0 = 0.5, β_1 = 1, x_ij discrete, ρ = 0.5, N = 1000
6.5 GEE assessment: d = 3, z = 0.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000
6.6 GEE assessment: d = 3, z = 1.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000
6.7 GEE assessment: d = 3, z = 0.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000
6.8 GEE assessment: d = 3, z = 1.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000
6.9 GEE assessment: d = 3, β_0 = 0.5, β_1 = 1, x_ij discrete, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000
6.10 GEE assessment: d = 3, β_0 = 0.5, β_1 = 1, x_ij discrete, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000
6.11 GEE assessment: d = 4, z = 0.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000
6.12 GEE assessment: d = 4, z = 0.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000
6.13 Estimates of ν and η under different variance specifications
6.14 GEE assessment: (ν, η) = (0.99995, 0.01), E(Y) = 2.718282, Var(Y) = 2.719, n = 1000, N = 500
6.15 GEE assessment: (ν, η) = (−0.1, 1.48324), E(Y) = 2.718282, Var(Y) = 62.02, n = 1000, N = 100
6.16 GEE assessment: α = 0.5, β = 0.5, η = 0.01, n = 1000, N = 500
6.17 A comparison of IFM to GEE with R_i(α) given

List of Figures

4.1 Trivariate probit, exchangeable: the efficiency of ρ from the margin (1,2) (or (1,3), (2,3))
4.2 Trivariate probit, AR(1): the weight u_1 (or u_3) versus ρ (solid line) and the weight u_2 versus ρ (dashed line)
4.3 Trivariate probit, AR(1): the efficiency of ρ_p
4.4 Trivariate probit, AR(1): (a) the efficiency of ρ from the margins (1,2) or (2,3); (b) the efficiency of ρ from the margin (1,3)
4.5 Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach
4.6 Trivariate Morgenstern-binary model: (a) ordered relative efficiency values of IFM approach versus IFS approach; (b) a histogram of the efficiency values r_g
4.7 Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach when u_1 = u_2 = u_3 and θ_12 = θ_13 = θ_23
4.8 Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach when u_1 = u_2 = u_3, θ_12 = θ_23 = θ and θ_13 = θ²
4.9 Trivariate normal-binary model: relative efficiency of IFM approach versus IFS approach
4.10 Trivariate normal-binary model: (a) ordered relative efficiency of IFM approach versus IFS approach; (b) a histogram of the efficiency r_g
4.11 Trivariate normal-binary model: relative efficiency of IFM approach versus IFS approach when u_1 = u_2 = u_3 and ρ_12 = ρ_13 = ρ_23
5.1 Bacteria Counts: residuals from the submodel md.g of model (1)
5.2 Bacteria Counts: residuals from the submodel md.g of the trivariate Poisson-lognormal model

Basic Notation and Definitions

The following notation and definitions are used throughout the thesis.

1. cdf stands for cumulative distribution function; pdf stands for probability density function; pmf stands for probability mass function; Pr stands for probability of.
2. rv stands for random variable or random vector depending on the context; iid stands for independent and identically distributed.
3. BVN and MVN are the abbreviations for bivariate normal and multivariate normal respectively.
4. CUOM is the abbreviation for closure under taking of margins. MUBE is the abbreviation for model univariate and bivariate expressible. PUBE is the abbreviation for parameter univariate and bivariate expressible. MPME is the abbreviation for model parameters marginally expressible. (Definitions given in Section 2.1.)
5. ML and MLE are the abbreviations for maximum likelihood and maximum likelihood estimates or estimation. An MLE of θ will usually be denoted by θ̂.
6. IFM and IFME are the abbreviations for inference functions of margins and inference functions of margins estimates or estimation. An IFME of θ will usually be denoted by θ̃. IFS is the abbreviation for inference functions of scores.
7. MCD and MMD are the abbreviations for multivariate copula discrete and multivariate mixture discrete.
8. The symbol "•" indicates the end of a definition, the statement of assumptions, a proof, a result, or an example.
9. For a vector or matrix, the transpose is indicated with a superscript of T or ', depending on convenience in the context.
10. All vectors are column vectors; hence transposed vectors such as X', x' (or X^T, x^T) are row vectors.
11. R^k = {x : x = (x_1, ..., x_k)', −∞ < x_j < ∞ for j = 1, ..., k} denotes the k-dimensional Euclidean space.
12. d is used for the dimension of the multivariate response vector of a multivariate distribution.
13. The boldfaced Roman upper case letter Y = (Y_1, ..., Y_d)', usually with subscripts, is used to denote a response random vector, and y is used for the observed value of this response vector. A vector of explanatory variables or covariates is usually denoted by x or w.
14. Boldfaced Roman upper case letters X, Y, Z and so on, usually with subscripts, are used for (random) vectors; boldfaced Roman lower case letters x, y, z and so on are used for the observed vector values.
15. Roman upper case letters X, Y, Z and so on, usually with subscripts, are used for random variables; Roman lower case letters x, y, z and so on are used for the observed values.
16. Greek boldfaced lower case letters, often with subscripts, are used for a collection of parameters of families of distributions, e.g. α, β, θ, δ; they are in vector format. Greek lower case letters, often with subscripts, are used for parameters of families of distributions, e.g. α, β, θ, δ.
17. Greek upper case letters Θ, Σ are used for a set of parameters (often dependence parameters) in a multivariate family; they are mostly in matrix format.
18. Ω is the symbol for a parameter space, usually Ω ⊂ R^k for some k.
19. Script Roman upper case letters F and G are used for classes of functions or distribution families.
20. F, G, H are the symbols for a (multivariate) cdf.
21. For a d-variate cdf F, the set of its marginal distributions is denoted as {F_S : S ∈ S_d}, where S_d is the set of non-empty subsets of {1, ..., d}. For a specific S, the subscript is written without braces, e.g., F_1, ..., F_d, F_12, F_123, etc.
22. We define the pdf or pmf of Y at y = (y_1, ..., y_d) as P_{12···d}(y_1 ··· y_d) or simply P(y_1 ··· y_d), with the corresponding jth marginal P_j(y_j), the bivariate (j, k) marginal P_{jk}(y_j y_k), and so on. We also write P(y_1 ··· y_d; θ) to denote that the pdf or pmf of Y depends on a parameter (or parameter vector) θ.
23. The frequency of observing a particular outcome (y_1, ..., y_d)' in a data set is denoted by n_{12···d}(y_1 ··· y_d) or simply n(y_1 ··· y_d). The frequency corresponding to the jth marginal outcome y_j is n_j(y_j), that corresponding to the (j, k) bivariate marginal outcome y_j and y_k is n_{jk}(y_j y_k), and so on.
24. Σ_{y_j} means the summation over all possible different values of y_j.
25. Φ_d(x; μ, Σ) (or Φ_d(x)) denotes the MVN cdf, and φ_d(x; μ, Σ) (or φ_d(x)) the pdf.
26. The partial derivative of a scalar function ψ(θ), ∂ψ(θ)/∂θ, is the q × 1 vector (∂ψ/∂θ_1, ..., ∂ψ/∂θ_q)', where θ_1, ..., θ_q are the components of the vector θ.
27. The partial derivative of a vector function Ψ = (ψ_1(θ), ..., ψ_r(θ))', ∂Ψ/∂θ', is the r × q matrix whose (i, j) element is ∂ψ_i(θ)/∂θ_j, where θ_1, ..., θ_q are the components of the vector θ.

Acknowledgements

I wish to record my warmest thanks to my thesis supervisor, Professor Harry Joe, for his continuous encouragement and for numerous suggestions and discussions during the development of this thesis.
I acknowledge, with gratitude, the work of Harry Joe on multivariate copulas and dependence, and especially his invaluable book draft on "Multivariate Models and Dependence Concepts"; this has had a significant impact on the development of this thesis, and served as a foundation for it. There are also many computer programming techniques and computations involved in this work where Harry Joe's tremendous experience was a crucial help. I would like to thank Professors John Petkau and Michael Schulzer for serving on my supervisory committee. In addition, I thank Professor John Petkau for helpful discussions, his valuable comments on the thesis, as well as his encouragement and support. Also, many thanks go to Professors Nancy Heckman, James Zidek and Ruben Zamar for their encouragement and support. Special thanks to Rinaldo Artes, Billy Ching and John Smith for their encouragement and support. I would also like to thank my fellow students, friends, professors and staff members in the Department of Statistics at UBC for providing a pleasant and stimulating research environment. Finally, I would like to thank Harry Joe for his financial support through an NSERC grant. The financial support from the Department of Statistics is acknowledged with great appreciation.

Chapter 1

Introduction

This chapter starts by discussing the structure of the multivariate data for which we are going to build appropriate multivariate models. We motivate our thesis research through reviewing and criticizing the relevant literature on the modelling of multivariate discrete data. This chapter is organized in the following way. In section 1.1, we introduce the multivariate data structure for which we are going to develop multivariate models. In this section, we discuss in detail multivariate binary, multivariate ordinal categorical and multivariate count data. The models developed in this thesis are general in nature, but the illustrative examples will be based mainly on the aforementioned three types of multivariate discrete data. In section 1.2, we briefly summarize and criticize the relevant statistical literature on the modelling of data of the types described in section 1.1, point out the inadequacies thereof, and thus motivate our thesis research. In section 1.3, we outline some desirable features of multivariate models and briefly discuss some of my understandings about statistical modelling. Section 1.4 provides an overview of the thesis.

1.1 Multivariate discrete response data

The data structure

Many data sets consist of discrete variables. Familiar examples of such variables are religion, nationality, level of education, degree of disability, attitude to a social issue, and the number of job changes for an individual during a certain period of time. These variables are categorical or count; they may be unordered (religion, nationality) or ordered (degree of disability, attitude to a social issue). In real life, what is more complicated is that often the discrete data are multivariate and
the multiple measurements may be interdependent in some way. The dependence may be general or special. The multivariate data structure can be further complicated by having missing data, random covariates and so on. In this thesis, we shall concentrate mainly on multivariate discrete response data, with or without covariates. The general multivariate discrete data set of interest is given in Table 1.1.

Table 1.1: The structure of general multivariate discrete data

  d-variate response    margin-indep. covariates    margin-dep. covariates
  y_11 ... y_1d         x_11 ... x_1p               z_111 ... z_11p_1, ..., z_1d1 ... z_1dp_d
  ...                   ...                         ...
  y_i1 ... y_id         x_i1 ... x_ip               z_i11 ... z_i1p_1, ..., z_id1 ... z_idp_d
  ...                   ...                         ...
  y_n1 ... y_nd         x_n1 ... x_np               z_n11 ... z_n1p_1, ..., z_nd1 ... z_ndp_d

The data structure in Table 1.1 consists of basically three parts: d-dimensional discrete response observations y_i = (y_i1, ..., y_id)'; a margin-independent covariate vector of p components x_i = (x_i1, ..., x_ip)', that is, a covariate vector which is constant across margins; and d margin-dependent (or margin-specific) covariate vectors z_i1, ..., z_id, where z_ij = (z_ij1, ..., z_ijp_j)' is a vector of p_j components for the jth margin, j = 1, ..., d, i = 1, ..., n. In the longitudinal or repeated measures settings, the margins might be defined by successive points in time. In these situations, we can call the margin-independent covariates time-independent, that is, constant across times, and the margin-dependent covariates time-dependent. The response vector y_i can be measures on d variates with general or special dependence structure, such as multiple measures from a human, a litter of animals, a piece of equipment, a geographical location, or any other unit for which the observations are a collection of related measures. The measures can be spatial or temporal. One way to make inferences from such a data structure is through a multivariate parametric model. (Nonparametric multivariate inference requires much more data than parametric multivariate inference.) The development and analysis of suitable models for the multivariate data in Table 1.1 are the main objectives of this thesis.
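For concreteness, the three blocks of Table 1.1 can be held as simple arrays. The following sketch is our illustration only; the dimensions and names are made up and are not notation from the thesis:

    import numpy as np

    # Illustrative layout of Table 1.1, not code from the thesis.
    n, d, p = 100, 4, 2             # units, response dimension, margin-independent covariates
    p_j = [3, 3, 3, 3]              # margin-dependent covariate counts p_1, ..., p_d

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=(n, d))           # n x d discrete responses y_{ij}
    x = rng.normal(size=(n, p))                   # n x p margin-independent covariates
    z = [rng.normal(size=(n, pj)) for pj in p_j]  # z[j-1] is n x p_j, specific to margin j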
A n ordinal variable is one that has a natural ordering of its possible values, but for which the distances between the values are undefined, such as a four-category Likert scale. Ordinal categorical (or ordered categorical) response data, often accompanied with a set of covari-ates, arise frequently in a wide variety of experimental studies, such as in bioassay, epidemiology, econometrics and medicine. For example, in medicine it may be possible to classify a patient as, say, severely, moderately or mildly ill, when a more exact measurement of the severity of the disease is not possible; the covariates may be age, gender and so on. With ordinal variables, the categories are known to have an order but knowledge of the scale is insufficient to consider them as forming a metric. We may treat the ordinal categories simply as nominal categories - which is unordered categorical measures, but by doing so the valuable information of order is lost. So the consideration of the order is important for optimal information extraction. For an ordinal variable, it is often reasonable to assume that the ordered categories correspond to non-overlapping and exhaustive in-tervals of the real line. Multivariate ordinal data are frequent in applications. Whenever multivariate response variables are each ordinal categorical, the data are multivariate ordinal categorical. More complicated situations include covariates for each of the response variables. A case of multivariate ordinal data from the Three Mile Island (TMI) nuclear power plant accident study can be found in subsection 5.2.2. Count data. Data in the form of counts appear regularly in life. In the simplest case, the number of occurrences of some phenomena on each unit are counted. Because no explanatory variable (e.g. time, treatment) distinguishes among these observed events, they can be aggregated as single numbers, the counts. Examples of count data are the counts of pest eggs on plant leaves, the counts of bacteria in different kinds of bacteria colonies, the number of organic cells with fixed number of chromosome interchanges produced by X-ray irradiation, etc.. Consul (1989) discussed many count data examples in a variety of situations, including home injuries, and strikes in industries. Other examples include the number of units of different commodities purchased by consumers over a Chapter 1. Introduction 4 period of time, the number of times authors cited over a number of years, spatial patterns of plants, the number of television commercials, or the number of speakers in a meeting. Multivariate count data are also frequent in applications. Whenever multivariate response variables are each count in nature, the data are multivariate count. The more complicated situations also include covariates to the response variables. A n example of multivariate count data can be found in subsection 5.2.3. 1.2 Review of literature and research motivation For the data types we have seen in section 1.1, one of the questions is how to build a model or a probability distribution as an approximation to the stochastic phenomenon of multivariate nature, and based on the available data, to estimate the distribution, and make some inference or predictions. For this purpose, the construction of an appropriate probability distribution or statistical model in accordance with the available data generated by the stochastic phenomenon is essential. Models for univariate discrete data have been studied extensively. 
The well-known generalized linear models for a univariate variable are such examples (McCullagh and Nelder 1989, Nelder and Wedderburn 1972). However, general studies on multivariate models for the type of data outlined in Table 1.1 are lacking in the statistical literature. One difficulty with the analysis of nonnormal multivariate data (including continuous and discrete data) has been the lack of rich classes of models such as the multivariate Gaussian. Some isolated studies on the modelling of a particular data set or under a particular multivariate setting of the type of data in Table 1.1 have appeared in the literature. These studies can be classified in general as being based either on a completely specified probability model or on a-partially specified probability model. We overview some of them here, and point out their drawbacks or weaknesses. Completely specified probability models Exponential family: Following Cox (1972), the probability distribution for a binary random vector Y can be represented as a saturated log-linear model d P(y) = exp(w0 + ^Ujyj + 'YJUjkyiyk + h uu...dyi \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2yd) (1-1) where uo is a normalizing constant. The 2d \u00E2\u0080\u0094 1 parameters \u00C2\u00AB i , . . . , ud, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2., \u00C2\u00AB i 2 , \u00C2\u00AB 1 3 , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, U(d-i)d, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, ui2\u00E2\u0080\u0094d v a r y independently from \u00E2\u0080\u0094oo to co. Expressions similar to (1.1) can also be found in Zhao and Prentice (1990), Liang et al. (1992) and Fitzmaurice and Laird (1993). The representation Chapter 1. Introduction 5 (1.1) is not closed under taking of margins (see Section 2.1 for a definition). In fact, if we write P(yiV2) = e x p ( u o + ]Cj=i UjVj + u*22/i2/2), then UQ, U*- and u * 2 must depend on all the parameters uo, Uj, Ujk, ..., u\2-d- This fact makes the interpretation of the parameters Uj, Ujk, . \u00E2\u0080\u00A2., ui 2--d very difficult, and it is not clear how covariates could be included. For the general form, there are too many parameters. Bahadur representation: Bahadur (1961) gave a representation of the distribution for a binary random vector Y, in terms of the moments: d P(y) = nPj(l)yjPj(0)1~yj[l + J 2 P i k e i e k + 2 PjkiejekeI + --- + P l 2 . . . d e 1 e 2 - - - e d ] (1.2) j=l j - ( d - l ) _ 1 min{p, 1-p}. The model (1.4) is an extension of a beta-Bernoulli model derived from the mixture model P(y) \u00E2\u0080\u0094 JQ p y + (1 \u00E2\u0080\u0094 p)d~y+ g(p) dp, where y+ = Ylj=i %' a n d 9(P) i s t r i e density of a Beta(a, /?) distribution. This model implies equicorrelation Chapter 1. Introduction 6 of Y with correlation parameter of (1 + 7 - 1 ) - 1 . The representation (1.4) has the closure property under taking of margins, but it is limited to equicorrelation of response variables. Joe (1996) has discussions on the range of negative dependence on this family. Discussions of extensions to incorporate covariates appeared in Prentice (1986) and Connolly and Liang (1988). Multivariate probit model: A d-variate probit model for binary data is where /(\u00E2\u0080\u00A2) is indicator function, Z = (Z\,.. .,Zd)' ~ N(0,Q), 0 = (Ojk) is a correlation matrix. The Zj's are often referred to as cut-off points. 
Ashford and Sowden (1970) used the bivariate probit model for binary data to describe a coal miner's status of development of breathlessness (present or absent) and wheeze (present or absent) as a function of the miner's age. Anderson and Pemberton (1985) used a trivariate probit model for the analysis of an ornithological data set on the three aspects of colouring of blackbirds. A general introduction to the multivariate probit model is Lesaffre and Molenberghs (1991). The multivariate probit model has many nice properties, such as closure under taking of margins, model univariate-bivariate expressibility, and a wide range of dependence. M L E is considered but is computationally more difficult as d increases. New approaches to estimation and inference are explored in this thesis. Multivariate Poisson-lognormal model: Aitchison and Ho (1989) studied a model for count random vector Y , with where fj{yj\^j) is a Poisson pmf with parameter \j and g(X) is the density of a multivariate lognormal distribution. This model also has many nice properties, such as closure under taking of margins, model univariate-bivariate expressibility, and a wide range of dependence. Again the M L E is computationally difficult. Molenberghs-Lesaffre model: A model that may be suitable for binary and ordinal data is studied in Molenberghs and Lesaffre (1994). This model can accommodate general dependence structure from the Molenberghs-Lesaffre construction (Joe 1996) with bivariate copulas, such as in Joe (1993). The multivariate objects in the Molenberghs-Lesaffre construction have not been proved to be proper multivariate copulas, but they can be used for the parameters that lead to positive orthant proba-bilities for the resulting probabilities for the multivariate binary vector. Other miscellaneous models (some for time series or longitudinal data): Yj = I(Zj < ZJ), j = l,...,d, (1.5) (1.6) Chapter 1. Introduction 7 - Kocherlakota and Kocherlakota (1992) provide a good summary of bivariate discrete distribu-tions (including bivariate Poisson, bivariate negative binomial, etc.). - Markov chain of first order for binary data with Pr(Yj+i = l|Yj = 0) = PJJ+I(01) and Pr(yj+i = l\Yj = 1) = i-jj+i(ll). It can be generalized to higher order Markov chains. Some combinations of P,j+i(01), PJJ+I(11) and Pr(Yj+i = 1) could be replaced by logistic functions (but not all three) to incorporate covariates. Examples are in Darlington and Farewell (1992), Muenz and Rubinstein (1985), Zeger, Liang and Self (1985) and Gardner (1990). - Poisson AR(1) time series, as in Al-Osh and Alzaad (1987) and McKenzie (1988). The bivariate Poisson margin (for consecutive Y<'s) from this Poisson AR(1) time series is the same as a bivariate margin of (1.3). - Negative binomial AR(1) time series, as in McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996b). The model of Al-Osh and Aly has range of serial correlation depending on the parameters of the negative binomial distribution (and hence is not very flexible). - When the binary or count variables are observed sequentially in time, one could use a model consisting of a product of a sequence of logit models for binary data (logit of Yt given Yi,..., Y(_i, x) and of Poisson models for counts (Poisson of Yt given YL, ..., Y (_i, x). This is proposed in Bonney (1987) and Fahrmeir and Kaufmann (1987). The advantage of such models is that one can use widely available software for univariate logit and Poisson models. 
One disadvantage of such models is that it would be difficult to predict Yt based on x alone. - Meester and MacKay (1994) studied a class of multivariate exchangeable models with the multivariate Frank copula. The models have limited application since only exchangeable de-pendence structures are considered. - Glonek and McCullagh (1995) have a similar bivariate model to the Molenberghs-Lesaffre model in that the dependence parameter is linear in covariates and the related bivariate copula is the Plackett copula. Their multivariate extension appears to overlap with that of Molen-berghs and Lesaffre (1994), but with a different model construction approach. P a r t i a l l y speci f ied p r o b a b i l i t y m o d e l s \u00E2\u0080\u0094 genera l ized e s t i m a t i n g equat ions a p p r o a c h General application of many of the preceding models was impeded, however, by their mathemati-cal complications and by the computational difficulty usually encountered in multivariate analysis. Chapter 1. Introduction 8 A different body of methodology, called the generalized estimating equations (GEE) approach, has been developed based on moment-type methods which do not require explicit distributional assump-tions. References for this methodology are Liang and Zeger (1986) and Zeger and Liang (1986), Zhao and Prentice (1990), Fitzmaurice and Laird (1993), among others. However the G E E approach has several disadvantages mainly related to the modelling, inference, diagnostics checking and interpre-tations. Furthermore, the G E E approach does not apply directly to multivariate ordinal data. A detailed study of the G E E approach, including a discussion of some of its shortcomings, can be found in Chapter 6. R e s e a r c h m o t i v a t i o n In summary, although some approaches have appeared in the literature to model specific instances/examples for the data in Table 1.1, there are at least two major features lacking in the statistical literature in terms of modelling multivariate discrete data: 1. A unified, systematic approach to multivariate discrete modelling, with classes of models for multivariate discrete data where some models in the class have nice properties (see section 1.3 for some desirable features of multivariate models). 2. A model-fitting strategy with computationally feasible parameter estimation and inference procedures, with good asymptotic properties and efficiency. This thesis makes contributions to these two lackings in multivariate discrete (more generally, non-normal) data modelling. We study systematic approaches to the modelling of multivariate discrete response data with covariates. The response types include binary, ordinal categorical and count. Statistical inference and computational aspects of the multivariate nonnormal models are studied. 1.3 Statistical modelling We discuss here two issues in statistical modelling. One is what we mean by statistical modelling in general. The other is the construction of multivariate models with desirable properties. Other aspects of statistical modelling as part of data analysis will be discussed in Chapter 5. In practice, with a finite sample of data, to capture exactly the possibly complex multivariate system which generated the data is impossible. The problem can even be more complicated than modelling a system; it might be that the system itself does not exist and it is forever a hypothetical Chapter 1. Introduction 9 one. 
In statistical modelling, the specification of a particular model for the data is always somehow arbitrary; what we hope is that the stochastic models we use may reflect relatively well the random-ness or uncertainty in the system, as well as the significant features of the systematic relationships. The statistical models should be considered as a means of providing statistical inference; they should be viewed as tentative approximations to the truth. The most important consideration in using any statistical method (or model) is whether the method (or model) can give insight into important practical problems. A l l models are subjective in some degree. Often the modeller chooses those elements of the system under investigation that should be included in the model as well as the mode of representation. Modelling should not be a substitute for thinking and will only be effective if combined with an interest in and knowledge of the system being modelled. The construction of multivariate nonnormal models is not easy. For modelling purposes, we would like to have parametric families of models that (i) cover the different types of dependence, (ii) have interpretable parameters, and (iii) apply to multivariate discrete data. Some desirable properties of a multivariate model are the following: 1. The model is natural. That is, the model is interpretable in terms of mixture, stochastic or latent variable representations, etc.. 2. Parameters in the model are interpretable. A parametric family has extra interpretability if some of the parameters can be identified as dependence or multivariate parameters, such that some range of the parameters corresponds to positive dependence and some corresponds to negative dependence, and it is desirable to have the amount of dependence to be increasing as parameters increase. 3. The model allows wide and flexible range of dependence, with interpretable dependence pa-rameters which are flexible to the needs for different applications. 4. The model extends naturally to include covariates for the univariate marginal parameters as well as dependence parameters, in the sense that after the extension, we still have probabilistic model and proper interpretations. 5. The model has marginally expressible properties, such as model parameters expressible by pa-rameters in univariate and bivariate distributions property and closure property with extension of univariate to bivariate and to higher order margins. 6. The model has a simple form, preferably with closed form representations of the cdf and density, or at least is easy to use for computation, estimation and inference. Chapter 1. Introduction 10 Generally, it is not possible to achieve all of these desirable properties simultaneously, in which case one must decide the relative importance of the properties and sacrifice one or more of them. There is no known multivariate family having all of these properties but the family of multivariate normal distributions may be the closest. Multinormal distributions satisfy (1), (2), (3), (4) and (5) but not (6) since the multinormal has no closed form cdf. The mixture of max-id copulas (Joe and Hu 1996) satisfy'(1), (2), (3), (4) and (6) but only partially (5). In Chapter 3, these desirable properties of a multivariate model will be used as criteria to compare different models. 1.4 O v e r v i e w o f t h e s i s This thesis consists of seven chapters. 
In Chapter 2, we develop the theoretical background for the multivariate discrete models, statistical inference and computation procedures. Two general classes of multivariate discrete models are introduced; their common feature is that both rely on the copula concept. Several new concepts related to multivariate models are proposed. The asymptotic theory for parameter estimation based on the inference functions of margins (IFM) is also given in this chapter. In Chapter 3, we study and compare many specific models in the two general classes of multivariate models proposed in Chapter 2. Mathematical details for parameter estimation for some of the models are provided. In Chapter 4, the efficiency of I F M approach relative to the classical maximum likelihood approach is investigated. The major advantage of I F M is its computational feasibility and its good asymptotic properties. We demonstrate that I F M is an efficient parameter estimation approach when it is applicable. We also study the efficiency of the jackknife method of variance estimation proposed in Chapter 2. In Chapter 5, some important issues such as a proper data analysis cycle, model selection and diagnostic checking are discussed. Data analysis examples illustrating modelling and inference procedures developed in this thesis are also carried out. In Chapter 6, we study the usefulness and efficiency of the G E E approach which has been the focus of many recent statistical applications dealing with multivariate and longitudinal data with univariate margins covered by the theory of generalized linear models. In Chapter 7, the final chapter, we discuss some further important research topics closely related to this thesis work. Finally, the Appendix contains a Maple symbolic manipulation program example used in Chapter 4. C h a p t e r 2 Foundation: models, statistical inference and computation In this chapter, we propose two classes of multivariate discrete models: multivariate copula dis-crete (MCD) models and multivariate mixture discrete (MMD) models. These two classes of models provide a new classification of multivariate discrete models, and allow a general approach to mod-elling multivariate discrete data. The two classes unify a number of multivariate discrete models appearing in the literature, such as the multivariate probit model, multivariate Poisson-lognormal model, etc. At the same time, numerous new models are proposed under these two classes. We also propose an inference functions of margins (IFM) approach to parameter estimation for M C D and M M D models. This estimation approach is built on the general theory of inference functions (or estimating equations). Asymptotic theory for I F M is developed and applied to the specific models in Chapter 3. While similar ideas about the same kind of estimating functions for a specific model have appeared in the literature, the general development of the procedure as an approach for the parameter estimation for a class of multivariate discrete models, and the related asymptotic results, are new. We also show that a jackknife estimate of the covariance matrix of the estimates from the I F M approach is asymptotically equivalent to the asymptotic covariance matrix from the Godambe information matrix. The jackknife procedure has the advantage of general computational feasibility. These results are used extensively in the applications in Chapter 5. 
The efficiency of I F M versus the optimal estimation procedure based on maximum likelihood estimation and the numerical assess-ment of the efficiency of jackknife covariance matrix estimates compared with Godambe information 11 Chapter 2. Foundation: models, statistical inference and computation 12 matrix are studied in detail in Chapter 4. The present chapter is organized as follows. Section 2.1 introduces the multivariate copula mod-els, some multivariate dependence concepts and a number of new concepts regarding the properties of a multivariate model. In section 2.2, we introduce two classes of multivariate discrete models: the multivariate copula discrete models and the multivariate mixture discrete models. These two classes of models are the focus of this thesis, and specific models in these two classes will be extensively studied in Chapter 3. In section 2.3, we propose an inference functions of margins (IFM) approach for the parameter estimation of M C D and M M D models; the theoretical foundation is built on the theory of inference functions for the multi-parameter situation. Section 2.4 is devoted to the study of the asymptotic properties of parameter estimates based on the I F M approach. Under regularity conditions, the I F M estimators (IFME) for parameters are shown to be consistent and asymptoti-cally normal with a Godambe information matrix as the variance-covariance matrix. These are done for the models with no covariates as well as models with covariates. The extension of models with no covariates to models with covariates will be made clear in this section. In section 2.5, we propose a jackknife approach to the asymptotic variance estimation of I F M E , and show theoretically that the jackknife estimate of variance is asymptotically equivalent to the Godambe information matrix. The importance of the jackknife estimate of variance will be demonstrated in Chapter 5 for real data analysis. 2.1 Multivariate copulas and dependence measures 2.1.1 M u l t i v a r i a t e d i s t r i b u t i o n f u n c t i o n s We begin by recalling the definition of a distribution on Md. D e f i n i t i o n 2.1 A d-dimensional distribution function is a function F : Md \u00E2\u0080\u0094> IR, which is right continuous, with (i) lim F(y1,...,yi) = 0, j = l , . . . , d , (ii) lim F ( y i , . . . , y d ) = l and which satisfies the following rectangle inequality: For all (ai,..., af), (b\,..., bf) with a,j < bj, j = !,\u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2, d, 2 2 Jfei=l kd=l Chapter 2. Foundation: models, statistical inference and computation 13 The following are several remarks related to Definition 2.1: i . If F has dth order derivatives, then (2.1) is equivalent to ddF/dyi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -dyd > 0. ii . Letting a2, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2., ad \u00E2\u0080\u0094+ \u00E2\u0080\u0094oo, then (2.1) reduces to F(bi, b2,.. .,bd) \u00E2\u0080\u0094 F(ai,b2,..., bd) > 0, so F is increasing in the first variable. Similarly, by symmetry, F is increasing in the remaining variables. iii. Let 5 be a subset of {1,..., d}. The margins Fs of F(yi,..., yd) are obtained by letting y,- \u00E2\u0080\u0094\u00C2\u00BB oo for i \u00C2\u00A3 S. There are two important types of cdf generated from a random vector Y : discrete and continuous. In the case of an absolutely continuous random vector Y, there is a corresponding density function f(yi ,---,yd) which satisfies / ( y ! , . . . 
There are two important types of cdf generated from a random vector $\mathbf{Y}$: discrete and continuous. In the case of an absolutely continuous random vector $\mathbf{Y}$, there is a corresponding density function $f(y_1,\ldots,y_d)$ which satisfies $f(y_1,\ldots,y_d) \ge 0$ and $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(y_1,\ldots,y_d)\,dy_1\cdots dy_d = 1$. The cdf can be written as

$$F(y_1,\ldots,y_d) = \int_{-\infty}^{y_d}\cdots\int_{-\infty}^{y_1} f(x_1,\ldots,x_d)\,dx_1\cdots dx_d.$$

In the case of a discrete random vector $\mathbf{Y}$, the probability that $\mathbf{Y}$ takes on a value $\mathbf{y} = (y_1,\ldots,y_d)'$ is given by the pmf $P(y_1\cdots y_d) = \Pr(Y_1 = y_1,\ldots,Y_d = y_d)$, which satisfies $P(y_1\cdots y_d) \ge 0$ and $\sum_{\{y_1\}}\cdots\sum_{\{y_d\}} P(y_1\cdots y_d) = 1$. The cdf can be written as

$$F(y_1,\ldots,y_d) = \sum_{x_j \le y_j,\ j=1,\ldots,d} P(x_1\cdots x_d).$$

The copula, $C$, of $\mathbf{Y} = (Y_1,\ldots,Y_d)'$ is constructed by making marginal probability integral transforms on $Y_1,\ldots,Y_d$ to $U_1,\ldots,U_d$. That is, the copula is the joint distribution function of $U_1,\ldots,U_d$:

$$C(u_1,\ldots,u_d) = G(G_1^{-1}(u_1),\ldots,G_d^{-1}(u_d)). \qquad (2.2)$$

$C$ is non-unique if the $G_j$'s are not all continuous. This point will be made clear in section 2.2. Suppose $\mathbf{Y}$ is a continuous random vector with distribution function $G(y_1,\ldots,y_d)$ whose corresponding copula is $C(u_1,\ldots,u_d)$ with density function $c(u_1,\ldots,u_d)$. The density function of $G(y_1,\ldots,y_d)$ in terms of the copula density function is $g(y_1,\ldots,y_d) = c(G_1(y_1),\ldots,G_d(y_d))\prod_{j=1}^d g_j(y_j)$. The copula captures the dependence among the components of the random vector $\mathbf{Y}$; it contains all of the information that couples the $d$ marginal distributions together to yield the joint distribution of $\mathbf{Y}$. This understanding is essential for the subsequent development of the multivariate discrete models. The copula was first introduced by Sklar (1959). For parametric families of copulas with good properties, see Joe (1993, 1996). Through the copula, a distribution function is decomposed into two parts: a set of marginal distribution functions, and the dependence structure, which is specified in terms of the copula. This suggests that one natural way to model multivariate data is to model the dependence structure, in terms of the copula, and the univariate marginals separately. This important feature will be exploited to construct multivariate discrete models using the copula concept in section 2.2 and in Chapter 3. Next we define the Fréchet bounds.

Definition 2.3 (Fréchet bounds) Let $F(\mathbf{x})$ be a $d$-variate cdf with univariate margins $F_1,\ldots,F_d$. Then for all $x_1,\ldots,x_d$,

$$\max\{0,\, F_1(x_1)+\cdots+F_d(x_d)-(d-1)\} \le F(x_1,\ldots,x_d) \le \min\{F_1(x_1),\ldots,F_d(x_d)\}, \qquad (2.3)$$

where $\min\{F_1(x_1),\ldots,F_d(x_d)\}$ is the Fréchet upper bound, and $\max\{0,\, F_1(x_1)+\cdots+F_d(x_d)-(d-1)\}$ is the Fréchet lower bound. •

We state here some important properties of the Fréchet bounds.

Properties
1. The Fréchet upper bound is a cdf.
2. The Fréchet lower bound is a cdf for $d = 2$.
3. The Fréchet upper bound copula is $C_U(\mathbf{u}) = \min\{u_1,\ldots,u_d\}$. For $d = 2$, the Fréchet lower bound copula is $C_L(\mathbf{u}) = \max\{0,\, u_1+u_2-1\}$.

For a proof of properties 1, 2 and 3 and other properties of the Fréchet bounds, see Joe (1996). Under independence, the copula is $C_I(u_1,\ldots,u_d) = \prod_{j=1}^d u_j$, and any copula must fall pointwise between $\max\{0,\, u_1+\cdots+u_d-(d-1)\}$ and $\min\{u_1,\ldots,u_d\}$.
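A quick numerical check of the bounds in (2.3), a minimal sketch with the independence copula standing in for $C$ (any copula could be substituted):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
C = lambda u: float(np.prod(u))               # independence copula as a stand-in
for _ in range(1000):
    u = rng.uniform(size=d)
    lower = max(0.0, u.sum() - (d - 1))       # Frechet lower bound at u
    upper = float(u.min())                    # Frechet upper bound at u
    assert lower - 1e-12 <= C(u) <= upper + 1e-12
print("the independence copula lies within the Frechet bounds")
```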
2.1.3 Dependence measures

It is desirable for a parametric family of multivariate distributions to have a flexible and wide range of dependence. For non-normal random variables, correlation is not the best measure of dependence, and concepts based on linearity are not necessarily the best to work with. More general concepts of positive and negative dependence, and measures of monotone dependence, are needed. These are necessary for analyzing the type and range of dependence in a parametric family of multivariate models. For a thorough treatment of dependence concepts and dependence orderings, see Joe (1996, Chapter 2).

In multivariate analysis, one of the most important activities is to model the dependence structure among the random variables. The complexity of the dependence structure and its range often determine the practical usefulness of the model. The dependence structure of a multivariate model can be considered roughly equivalent to the copula; for example, Schweizer and Wolff (1981) used copulas to define several natural nonparametric measures of dependence for pairs of random variables. The parameters in a multivariate copula reflect the degree of dependence among the variables; for example, the multivariate normal copula can be adequately expressed in terms of a correlation matrix whose elements are the pairwise correlation coefficients of a multinormal random vector, with a large correlation coefficient indicating strong dependence between variables. However, it is not always possible to express a copula in terms of the correlation coefficients of a set of random variables. There may also be mathematical reasons, such as simplicity, for not expressing a copula in terms of correlation coefficients.

A measure of dependence for two random variables indicates how closely the two random variables are related, the extreme situations being mutual independence and complete mutual dependence. Some very useful dependence concepts, such as the positive and negative dependence concepts, are based on refinements of an intuitive understanding of dependence among random variables. For example, for two random variables $X$ and $Y$, positive dependence roughly means that large (small) values of $X$ tend to accompany large (small) values of $Y$. Often in practice, this knowledge of the amount of dependence is good enough for some modelling purposes.

Some well-known measures of dependence for two random variables are Pearson's correlation coefficient $r$, Spearman's rho and Kendall's tau. These measures are defined as follows. Let $X$, $Y$ be random variables with continuous distribution functions $F(x)$ and $G(y)$ and copula $C$. We further assume that $(X_1,Y_1)$, $(X_2,Y_2)$ and $(X,Y)$ are independent with the same joint distribution. Then Pearson's correlation coefficient is

$$r = \mathrm{Cov}(X,Y)/(\sigma_X\sigma_Y),$$

Kendall's tau is

$$\tau = \mathrm{Corr}(\mathrm{sgn}(X_1-X),\,\mathrm{sgn}(Y_1-Y)) = 2\Pr((X_1-X)(Y_1-Y) > 0) - 1, \quad \text{or} \quad \tau = 4\int\!\!\int C(u,v)\,dC(u,v) - 1,$$

and Spearman's rho is

$$\rho = 3\,\mathrm{Cov}(\mathrm{sgn}(X_1-X),\,\mathrm{sgn}(Y_2-Y)), \quad \text{or} \quad \rho = 12\int\!\!\int C(u,v)\,du\,dv - 3,$$

where $\sigma_X$ and $\sigma_Y$ stand for the standard deviations of the random variables $X$ and $Y$, and $\mathrm{sgn}(\cdot)$ denotes the sign function.
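The three measures are easy to compare empirically (a sketch using scipy; the bivariate normal sample, the correlation value $0.7$ and the exponential transform of one margin are arbitrary illustrative choices, previewing the invariance property noted next):

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau, spearmanr

rng = np.random.default_rng(1)
n, r = 5000, 0.7
z = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=n)
x, y = z[:, 0], np.exp(z[:, 1])          # strictly increasing transform of one margin
print(pearsonr(x, y)[0])                 # changes under the transform
print(kendalltau(x, y)[0])               # ~ (2/pi) arcsin(r) for the normal copula
print(spearmanr(x, y)[0])                # ~ (6/pi) arcsin(r/2) for the normal copula
```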
Both Kendall's tau and Spearman's rho are invariant to strictly increasing transformations. They are equal to 1 for the Fréchet upper bound and $-1$ for the Fréchet lower bound. These properties do not hold for Pearson's correlation. Essentially, Pearson's $r$ measures the strength of the linear relationship between two random variables $X$ and $Y$, whereas Kendall's tau and Spearman's rho are measures of monotone correlation (the strength of a monotone relationship). For bivariate quantitative data, Spearman's rho corresponds to the rank correlation (Pearson's correlation applied to the ranks of the two variables). That the copula captures the basic dependence structure among the components of $\mathbf{Y}$ can be seen from the fact that nonparametric measures of association, such as Kendall's tau and Spearman's rho, are normed distances of the copula from the independence copula.

In general, it is difficult to judge the intensity of dependence for a given multivariate model solely on the basis of one dependence measure; the three common dependence measures can be used as a reference for the attainable intensity of dependence of a given multivariate model. For ease of interpretation of the dependence structure, we would like to see the dependence structure expressed in easily interpretable parameters. For example, for arbitrary marginals, a question is how to express a copula in terms of the most common measures of association, such as Pearson's $r$ (from some specific marginals), Spearman's rho, or Kendall's tau, in a natural way. For some well-defined classes of distributions, such as the multivariate normal, Pearson's correlation coefficient is the measure of choice. In other classes, other measures may be more appropriate. (For example, the Morgenstern copula in subsection 2.1.4 is expressed in terms of Kendall's tau in a natural way.) A parametric family has extra interpretability if some of the parameters can be identified as dependence parameters. More specifically, one would like to be able to say that some range of the parameters corresponds to positive dependence and some corresponds to negative dependence. Furthermore, it would be desirable for the amount of dependence to be increasing (decreasing) as the parameters increase (decrease).

2.1.4 Examples of multivariate copulas

Some well-known examples of copula families are the multivariate normal copula, the Morgenstern copula, the Plackett copula and the Frank copula. Joe (1993, 1996) provides a detailed list of families of copulas with good properties. In Genest and MacKay (1986), a class of copulas, called Archimedean copulas, is studied extensively. Most existing parametric families of copulas represent monotone dependence structures where the intensity of the dependence is determined by the value of the dependence parameter. Some families, such as the normal family, possess a complete range of dependence intensities, whereas others, such as the Morgenstern family, possess only a limited range. In fact, the Morgenstern copula never attains the Fréchet bounds; its Spearman's rho lies between $-1/3$ and $1/3$. For general modelling purposes, we would naturally seek families with a wide range of dependence intensities. Here we give some examples of multivariate copulas.
More examples of multivariate copulas will be given in Chapter 3 for constructing multivariate discrete models.

Example 2.1 (Multivariate normal copula) Let $\Phi$ be the standard normal distribution function and let $\Phi_d(\cdot\,;\Theta)$ be the $d$-variate standard normal distribution function with correlation matrix $\Theta = (\theta_{jk})$. The multivariate normal copula is

$$C(u_1,\ldots,u_d;\Theta) = \Phi_d(\Phi^{-1}(u_1),\ldots,\Phi^{-1}(u_d);\Theta). \qquad (2.4)$$ •

Example 2.2 (Morgenstern copula) The $d$-variate Morgenstern copula is

$$C(u_1,u_2,\ldots,u_d) = \Big[\prod_{j=1}^d u_j\Big]\Big[1 + \sum_{j<k}\theta_{jk}(1-u_j)(1-u_k)\Big]. \qquad (2.5)$$

It has density function

$$c(u_1,u_2,\ldots,u_d) = 1 + \sum_{j<k}\theta_{jk}(1-2u_j)(1-2u_k).$$

For $d = 3$, the requirement $c(u_1,u_2,u_3) \ge 0$ leads to the conditions $1+\theta_{12} \ge \theta_{13}+\theta_{23}$, $1+\theta_{13} \ge \theta_{12}+\theta_{23}$, $1+\theta_{23} \ge \theta_{12}+\theta_{13}$, or more succinctly $-1+|\theta_{12}+\theta_{23}| \le \theta_{13} \le 1-|\theta_{12}-\theta_{23}|$, $-1 \le \theta_{12},\theta_{13},\theta_{23} \le 1$. Similar conditions for higher dimensions $d = 4,5,\ldots$ can be obtained by considering the $2^d$ cases $u_j = 0$ or $1$, $j = 1,\ldots,d$, and verifying that $c(u_1,\ldots,u_d) \ge 0$. It is easy to see that for any $j,k = 1,\ldots,d$, $j \ne k$,

$$C_{jk}(u_j,u_k;\theta_{jk}) = [1+\theta_{jk}(1-u_j)(1-u_k)]u_ju_k, \quad -1 \le \theta_{jk} \le 1,$$

with density function $c_{jk}(u_j,u_k) = 1+\theta_{jk}(1-2u_j)(1-2u_k)$. The dependence structure between $U_j$ and $U_k$ is controlled by the parameter $\theta_{jk}$. Spearman's rho is $\rho = \theta_{jk}/3$. The maximum Pearson correlation coefficient over all choices of $G_j$ and $G_k$ is $1/3$ (when $\theta_{jk} = 1$), which occurs for uniform marginals. For normal marginals, the maximum Pearson correlation coefficient is $1/\pi$; for exponential marginals it is $1/4$; for double exponential marginals, the limit is $0.281$. Kendall's tau is $2\theta_{jk}/9$, with the maximum range $-2/9$ to $2/9$. Because of this limited dependence range, the Morgenstern copula is not very useful for general modelling. Nevertheless, because the Morgenstern copula has such a simple form, it can be used as an investigative tool in, for example, simulation studies to check properties of some general modelling procedures. An example of its use is provided in section 4.3. If a new procedure breaks down with a distribution based on the Morgenstern copula, then it will probably have difficulties with other models that admit a wider range of dependence.

A version of the $d$-dimensional Morgenstern copula with higher-order terms has the following density function:

$$c(u_1,u_2,\ldots,u_d) = 1 + \sum_{j_1<j_2}\beta_{j_1j_2}(1-2u_{j_1})(1-2u_{j_2}) + \cdots + \beta_{12\cdots d}\prod_{j=1}^d(1-2u_j). \qquad (2.6)$$ •

That is, there exists a parameterization of the lower-dimensional margins so that the lower-order closure property holds.

iii. If a model has the MUBE property, then all the parameters in this model are PUBE. Furthermore, this model is also MPME.

iv. If every parameter is PUBE, then the model is MUBE.

No other implications hold in general. In the following, a few examples are used to illustrate the above concepts and some of their relationships.

Example 2.3 (Models with CUOM and MUBE properties) A familiar example of a model with the CUOM and PUBE properties is the multivariate normal model. The closure under taking of margins for the multinormal distribution is somewhat stronger than the CUOM property defined here, since it is also closed under taking of univariate margins, which is not required in our definition. •

Example 2.4 (Models with MUBE property) For some copulas, such as (2.4) and (2.5), the dependence structure can be expressed by a $d\times d$ matrix parameter $\Theta = (\theta_{jk})$ with $\theta_{jj} = 1$. For such a $d$-dimensional copula $C(\cdot\,;\Theta)$, the 2-dimensional margins can be expressed by a bivariate copula $C_{jk}(\cdot\,;\theta_{jk})$ with one dependence parameter $\theta_{jk}$, for $j,k = 1,\ldots,d$, $j \ne k$. Thus each element of the dependence structure described by the parametric matrix $\Theta = (\theta_{jk})$ can be equivalently expressed by a set of bivariate copulas $C_{jk}(\cdot\,;\theta_{jk})$. A distribution with such a copula is thus MUBE. Some copulas, such as (2.4), have a wide range of dependence; some, such as (2.5), do not. •
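For instance, the bivariate case of (2.4) can be evaluated directly with scipy (a minimal sketch; the argument values are arbitrary):

```python
from scipy.stats import norm, multivariate_normal

def normal_copula_cdf(u1, u2, theta):
    """Bivariate case of (2.4): C(u1,u2;theta) = Phi_2(Phi^{-1}(u1), Phi^{-1}(u2); theta)."""
    z = [norm.ppf(u1), norm.ppf(u2)]
    return multivariate_normal([0, 0], [[1, theta], [theta, 1]]).cdf(z)

print(normal_copula_cdf(0.3, 0.6, 0.5))
print(normal_copula_cdf(0.3, 0.6, 0.0), 0.3 * 0.6)   # theta = 0 recovers independence
```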
Example 2.5 (Models with CUOM but not MUBE property) We give two examples here:

a. Consider the generalized Morgenstern copula (2.6). This copula has the CUOM property, since for any $\{j_1,\ldots,j_m\} \subseteq \{1,\ldots,d\}$ with $m < d$, it is straightforward to verify that $C(u_{j_1},u_{j_2},\ldots,u_{j_m})$ has the form (2.6). But this generalized Morgenstern copula is not MUBE.

b. Another example is the multivariate Poisson distribution. Let us examine the trivariate Poisson distribution. Let the random variables $X_1$, $X_2$, $X_3$, $X_{12}$, $X_{13}$, $X_{23}$, $X_{123}$ have independent Poisson distributions with mean parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_{12}$, $\lambda_{13}$, $\lambda_{23}$, $\lambda_{123}$ respectively. We now construct new random variables as follows: $Y_1 = X_1+X_{12}+X_{13}+X_{123}$, $Y_2 = X_2+X_{12}+X_{23}+X_{123}$, $Y_3 = X_3+X_{13}+X_{23}+X_{123}$. Using the convolution property of the Poisson, we derive that $Y_1 \sim \mathrm{Po}(\lambda_1+\lambda_{12}+\lambda_{13}+\lambda_{123})$, $Y_2 \sim \mathrm{Po}(\lambda_2+\lambda_{12}+\lambda_{23}+\lambda_{123})$, $Y_3 \sim \mathrm{Po}(\lambda_3+\lambda_{13}+\lambda_{23}+\lambda_{123})$, that $(Y_1,Y_2)$, $(Y_1,Y_3)$, $(Y_2,Y_3)$ have bivariate Poisson distributions, and that $(Y_1,Y_2,Y_3)$ has a trivariate Poisson distribution. This 3-dimensional Poisson model has the CUOM property because the bivariate margins have a similar stochastic representation. But it is neither MUBE nor PUBE. In fact, from the univariate and bivariate margins, we can only estimate $\lambda_1+\lambda_{13}$, $\lambda_2+\lambda_{23}$ and $\lambda_{12}+\lambda_{123}$ from the (1,2) margin, $\lambda_1+\lambda_{12}$, $\lambda_3+\lambda_{23}$ and $\lambda_{13}+\lambda_{123}$ from the (1,3) margin, and $\lambda_2+\lambda_{12}$, $\lambda_3+\lambda_{13}$ and $\lambda_{23}+\lambda_{123}$ from the (2,3) margin. These nine linear expressions form only six independent linear expressions. Since there are seven parameters in the model, the model is not MUBE. Furthermore, it can easily be verified that no single parameter can be univariate-bivariate expressed. •

Example 2.6 (Models with MUBE but not CUOM(2) property) Consider a trivariate copula constructed in the following way:

$$C_{123}(u_1,u_2,u_3) = \int_0^{u_2} C_{1|2}(u_1|x;\delta_{12})\,C_{3|2}(u_3|x;\delta_{23})\,dx, \qquad (2.7)$$

where $C_{1|2}$ and $C_{3|2}$ are conditional cdfs obtained from two arbitrary bivariate copula families $C_{12}(u_1,u_2;\delta_{12})$ and $C_{23}(u_2,u_3;\delta_{23})$. This trivariate copula has (1,2) bivariate margin $C_{12}(u_1,u_2;\delta_{12})$, (1,3) bivariate margin $C_{13}(u_1,u_3) = \int_0^1 C_{1|2}(u_1|x;\delta_{12})\,C_{3|2}(u_3|x;\delta_{23})\,dx$, and (2,3) bivariate margin $C_{23}(u_2,u_3;\delta_{23})$. Suppose we let $C_{12}$ be the Plackett copula

$$C(u,v;\delta) = 0.5\eta^{-1}\{1+\eta(u+v) - [(1+\eta(u+v))^2 - 4\delta\eta uv]^{1/2}\}, \quad 0 < \delta < \infty, \qquad (2.8)$$

where $\eta = \delta-1$, and we let $C_{23}$ be the Frank copula

$$C(u,v;\delta) = -\delta^{-1}\log([\zeta - (1-e^{-\delta u})(1-e^{-\delta v})]/\zeta), \quad 0 < \delta < \infty, \qquad (2.9)$$

where $\zeta = 1-e^{-\delta}$.
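The two bivariate copulas just introduced are easy to code from (2.8) and (2.9) (a minimal sketch; the parameter value and sanity checks are arbitrary choices):

```python
import numpy as np

def plackett_cdf(u, v, delta):
    """Plackett copula (2.8); delta = 1 (eta = 0) is the independence limit."""
    eta = delta - 1.0
    if abs(eta) < 1e-10:
        return u * v
    s = 1.0 + eta * (u + v)
    return 0.5 / eta * (s - np.sqrt(s * s - 4.0 * delta * eta * u * v))

def frank_cdf(u, v, delta):
    """Frank copula (2.9) with zeta = 1 - exp(-delta)."""
    zeta = 1.0 - np.exp(-delta)
    return -np.log((zeta - (1 - np.exp(-delta * u)) * (1 - np.exp(-delta * v))) / zeta) / delta

for C in (plackett_cdf, frank_cdf):
    print(C(0.4, 1.0, 2.5), C(1.0, 0.7, 2.5))   # uniform margins: C(u,1)=u, C(1,v)=v
```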
Then the model (2.7) is well-defined, and is obviously MUBE with two bivariate dependence parameters $\delta_{12}$ and $\delta_{23}$. But the model (2.7) is not CUOM(2), since the Plackett copula and the Frank copula are not in the same parametric family. Generally speaking, given bivariate distributions $F_{12}$, $F_{23}$ with univariate margins $F_1$, $F_2$, $F_3$, it can be shown that

$$F_{123}(y_1,y_2,y_3) = \int_{-\infty}^{y_2} C_{13}\big(F_{1|2}(y_1|z_2;\theta_{12}),\,F_{3|2}(y_3|z_2;\theta_{23})\big)\,dF_2(z_2) \qquad (2.10)$$

is a proper trivariate distribution with univariate margins $F_1$, $F_2$, $F_3$, (1,2) bivariate margin $F_{12}$, and (2,3) bivariate margin $F_{23}$. In (2.10), $F_{1|2}$, $F_{3|2}$ are conditional cdfs obtained from $F_{12}$, $F_{23}$, and $C_{13}$ is a bivariate copula associated with the (1,3) margin (it can be interpreted as a copula representing the amount of conditional dependence between the first and third univariate margins, given the second). Specifically, $C_{13}(u_1,u_3) = u_1u_3$ corresponds to conditional independence and $C_{13}(u_1,u_3) = \min\{u_1,u_3\}$ corresponds to perfect conditional dependence. The model (2.10) is MUBE, but it may not be CUOM(2); to see this, it is enough to choose $F_{12}$ and $F_{23}$ from different parametric families. The model (2.7) is the special case of (2.10) obtained by letting $F_{12}$, $F_{23}$ be the Plackett and Frank copulas respectively, and $C_{13}(u_1,u_3) = u_1u_3$. The construction (2.10) is a special case of Joe (1996a). •

Example 2.7 (Models with CUOM(2) but not CUOM property) Let $F(u,v;\theta) = uv(1+\theta(1-u)(1-v))$, $-1 \le \theta \le 1$, be the bivariate Morgenstern family (2.5). Let $F_{12}$ and $F_{23}$ be in this family with parameters $\theta_{12}$ and $\theta_{23}$ respectively, and let $C_{13}(u_1,u_3) = u_1u_3$. The conditional distributions are $F_{j|2}(u_j|u_2) = u_j + \theta_{j2}u_j(1-u_j)(1-2u_2)$, $j = 1,3$. Hence by (2.10), we have

$$F_{13}(u_1,u_3) = \int_0^1 F_{1|2}(u_1|z_2)F_{3|2}(u_3|z_2)\,dz_2 = u_1u_3[1 + 3^{-1}\theta_{12}\theta_{23}(1-u_1)(1-u_3)],$$

which is in the bivariate Morgenstern family (2.5) with parameter $\theta_{12}\theta_{23}/3$. Hence the model

$$F_{123}(u_1,u_2,u_3) = \int_0^{u_2} F_{1|2}(u_1|z_2)F_{3|2}(u_3|z_2)\,dz_2 \qquad (2.11)$$

is CUOM(2). But (2.11) is not CUOM. In fact, we find

$$F_{123}(u_1,u_2,u_3) = u_1u_2u_3[1 + \theta_{12}(1-u_1)(1-u_2) + 3^{-1}\theta_{12}\theta_{23}(1-u_1)(1-u_3) + \theta_{23}(1-u_2)(1-u_3) + 2\theta_{12}\theta_{23}(1-u_1)(1-u_2)(1-u_3)(1-2u_2)/3],$$

which is not in the trivariate Morgenstern family (2.5). •

Example 2.8 (Models with CUOM($r_0$) but not CUOM($r_1$) property, when $r_0 < r_1$) Consider a 4-variate copula model:

$$F_{1234}(u_1,u_2,u_3,u_4) = u_1u_2u_3u_4[1 + \theta_{12}(1-u_1)(1-u_2) + 3^{-1}\theta_{12}\theta_{23}(1-u_1)(1-u_3) + \theta_{14}(1-u_1)(1-u_4) + \theta_{23}(1-u_2)(1-u_3) + \theta_{24}(1-u_2)(1-u_4) + \theta_{34}(1-u_3)(1-u_4) + 2\theta_{12}\theta_{23}(1-u_1)(1-u_2)(1-u_3)(1-2u_2)/3],$$

where $|\theta_{14}+\theta_{24}+\theta_{34}| - \theta_{12} - 1 \le \theta_{23}(1+\theta_{12}) \le 1+\theta_{12} - |\theta_{14}+\theta_{24}-\theta_{34}|$, $|\theta_{14}-\theta_{24}-\theta_{34}| + \theta_{12} - 1 \le \theta_{23}(1-\theta_{12}) \le 1-\theta_{12} - |\theta_{14}-\theta_{24}+\theta_{34}|$, and $|\theta_{jk}| \le 1$, $1 \le j < k \le 4$. It can be shown that $F_{12}$, $F_{13}$, $F_{14}$, $F_{23}$, $F_{24}$ and $F_{34}$ are in the bivariate Morgenstern family (2.5), but $F_{123}$, $F_{124}$, $F_{134}$ and $F_{234}$ are not all in the same parametric family. In fact, $F_{124}$, $F_{134}$ and $F_{234}$ are in the trivariate Morgenstern family (2.5), but $F_{123}$ is not. •
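The (1,3) margin computed in Example 2.7 can be checked symbolically (a sketch using sympy, in the spirit of the Maple symbolic computations mentioned in the Appendix):

```python
import sympy as sp

u1, u3, z, t12, t23 = sp.symbols('u1 u3 z theta12 theta23')
F1_2 = u1 + t12 * u1 * (1 - u1) * (1 - 2 * z)    # F_{1|2}(u1 | z)
F3_2 = u3 + t23 * u3 * (1 - u3) * (1 - 2 * z)    # F_{3|2}(u3 | z)
F13 = sp.integrate(F1_2 * F3_2, (z, 0, 1))       # the (1,3) margin of (2.11)
claimed = u1 * u3 * (1 + t12 * t23 / 3 * (1 - u1) * (1 - u3))
print(sp.simplify(F13 - claimed))                # prints 0
```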
Example 2.9 (Models with PUBE but not MPME property) We give two examples here:

a. In the generalized Morgenstern copula (2.6), the parameters $\beta_{j_1j_2}$ ($1 \le j_1 < j_2 \le d$) are PUBE, but the model is not MPME, as the parameter $\beta_{12\cdots d}$ cannot be expressed by any marginal copula.

b. Another example is the Molenberghs-Lesaffre model in Example 2.17. The parameters $\eta_j$ ($1 \le j \le d$) and $\eta_{jk}$ ($1 \le j < k \le d$) are PUBE, but the model is not MPME, as the parameter $\eta_{12\cdots d}$ cannot be expressed by any marginal pmf. •

2.2 Multivariate discrete models

Assume $\mathcal{F}$ is a parametric family defined on a common measurable space $(\mathcal{Y},\mathcal{A})$, where $\mathcal{Y}$ is a discrete sample space and $\mathcal{A}$ the corresponding $\sigma$-field. We further assume

$$\mathcal{F} = \{P(\mathbf{y};\theta) : \theta \in \Re\}, \quad \Re \subseteq \mathbb{R}^q, \qquad (2.12)$$

where $\theta = (\theta_1,\ldots,\theta_q)'$ is a $q$-component vector and $\Re$ is the parameter space, usually a subset of $q$-dimensional Euclidean space. We presume the existence of a measure $\mu$ on $\mathcal{Y}$ such that for each fixed value of the parameter $\theta$, the function $P(\mathbf{y};\theta)$ is the density with respect to $\mu$ of a probability measure $\mathcal{P}$ on $\mathcal{Y}$. For a $d$-dimensional discrete random vector $\mathbf{Y} = (Y_1,\ldots,Y_d)'$, its pmf $P(y_1\cdots y_d;\theta)$ (or simply $P(y_1\cdots y_d)$) is assumed to be in $\mathcal{F}$.

2.2.1 Multivariate copula discrete models

We define a cdf of a discrete random vector $\mathbf{Y} = (Y_1,\ldots,Y_d)'$ as

$$G(y_1,\ldots,y_d) = C(G_1(y_1),\ldots,G_d(y_d)), \qquad (2.13)$$

where $C$ is a $d$-dimensional copula and $G_j$ ($j = 1,\ldots,d$) is the cdf of the discrete rv $Y_j$. Thus $G(y_1,\ldots,y_d)$ is a well-defined cdf for a discrete random vector $\mathbf{Y}$. The pmf of $\mathbf{Y} = \mathbf{y} = (y_1,\ldots,y_d)'$ is

$$P(y_1\cdots y_d) = \sum_{k_1=1}^{2}\cdots\sum_{k_d=1}^{2}(-1)^{k_1+\cdots+k_d}C(x_{1k_1},\ldots,x_{dk_d}), \qquad (2.14)$$

where $x_{j2} = G_j(y_j)$ and $x_{j1} = G_j(y_j^*)$, with $y_j^*$ such that $G_j(y_j^*) < G_j(y_j)$ and $\Pr(Y_j = x) = 0$ for any $x$ with $y_j^* < x < y_j$. We call the model (2.13) for a discrete random vector $\mathbf{Y}$ a multivariate copula discrete (MCD) model.

The family of MCD models is a large one. With MCD models, we have flexible choices of marginal cdfs, including standard distributions such as the Bernoulli, binomial, negative binomial, Poisson and generalized Poisson; these allow the models to accommodate a wide range of data. We also have flexible choices of copulas; examples are the multinormal copula, the Hüsler-Reiss copula and the Morgenstern copula. For a summary of properties of MCD models, see subsection 2.2.4.

For a given $d$-variate discrete distribution $F$, we can often find multiple copulas which match $F$ into an MCD model. For example, suppose we have a bivariate binary random vector $\mathbf{Y} = (Y_1,Y_2)'$, where $Y_j$ ($j = 1,2$) takes the values 0 and 1. The probabilities of observing $(1,1)$, $(1,0)$, $(0,1)$ and $(0,0)$ are $P(11)$, $P(10)$, $P(01)$ and $P(00)$ respectively. Then for any given one-parameter family of bivariate copulas $C(u_1,u_2;\theta)$ that ranges from the Fréchet lower bound to the Fréchet upper bound, we can find a $\theta$ to express the four probability masses in the following way:

$$\begin{cases} C(u_1,u_2;\theta) = P(11), \\ u_1 = P(11)+P(10), \\ u_2 = P(11)+P(01). \end{cases} \qquad (2.15)$$

(2.15) may not hold if $C(\cdot\,;\theta)$ cannot attain the Fréchet bounds. The above observation suggests that, to model multivariate discrete data, different copulas could do the modelling job equally well.
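Solving (2.15) for the dependence parameter is a one-dimensional root-finding problem (a sketch; the Plackett copula (2.8) is used as the one-parameter family and the probabilities are hypothetical):

```python
from scipy.optimize import brentq

P11, P10, P01 = 0.30, 0.20, 0.15                 # hypothetical probabilities
u1, u2 = P11 + P10, P11 + P01                    # marginal Pr(Y_j = 1), as in (2.15)

def plackett_cdf(u, v, delta):                   # Plackett copula (2.8)
    eta = delta - 1.0
    if abs(eta) < 1e-10:
        return u * v
    s = 1.0 + eta * (u + v)
    return 0.5 / eta * (s - (s * s - 4.0 * delta * eta * u * v) ** 0.5)

delta = brentq(lambda d: plackett_cdf(u1, u2, d) - P11, 1e-6, 1e6)
print(delta)                                     # dependence parameter matching P(11)
```

The Plackett family works here because it ranges from the Fréchet lower bound ($\delta \to 0$) to the upper bound ($\delta \to \infty$), so a root is guaranteed for any compatible table.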
To make the modelling successful in a general sense, it is important that the copula have a wide dependence range. Evidently, with different copulas we will not be estimating the same dependence parameters, but the fitted models should nevertheless lead to similar inferences or interpretations.

2.2.2 Multivariate mixture discrete models

Multivariate discrete models can be constructed in ways other than the derivation of MCD models. We can envisage circumstances in which the multivariate discrete random vector $\mathbf{Y}$ at $\mathbf{y} = (y_1,\ldots,y_d)'$ has pmf $f(y_1\cdots y_d;\boldsymbol{\lambda})$ for a given $\boldsymbol{\lambda}$. Suppose further that $\boldsymbol{\lambda}$ is a random outcome, which we assume to be a $p$-component vector ($p$ may be different from $d$) subject to chance variation described by a certain (continuous) multivariate distribution $G(\lambda_1,\ldots,\lambda_p)$, which in turn can be expressed in terms of a copula function $C(u_1,\ldots,u_p)$ with (continuous) univariate marginal distributions $G_j$, $j = 1,\ldots,p$. This is similar to imagining a group of outcomes, with random traits or effects for the individuals in the group, having a common constant trait or element through the distribution of the random effects. Then the probability of $\mathbf{Y} = \mathbf{y}$, or the pmf of $\mathbf{Y}$ at $\mathbf{y}$, is

$$P(y_1\cdots y_d) = \int\cdots\int f(y_1\cdots y_d;\boldsymbol{\lambda})\,c(G_1(\lambda_1),\ldots,G_p(\lambda_p))\prod_{j=1}^p g_j(\lambda_j)\,d\lambda_1\cdots d\lambda_p. \qquad (2.16)$$

We call (2.16) a multivariate mixture discrete (MMD) model. We use the word mixture since the distribution is constructed as a mixture of $\{f(y_1\cdots y_d;\boldsymbol{\lambda})\}$ over $\boldsymbol{\lambda}$. A special case of (2.16) obtains by assuming that the univariate marginal probability mass corresponding to the outcome of $Y_j$, which is $P_j(y_j)$, depends on a parameter $\gamma_j$, $j = 1,\ldots,d$ (or a vector of parameters), and that, given the $\gamma_j$, the variables $Y_j$ are independent. If $\boldsymbol{\lambda} = (\lambda_1,\ldots,\lambda_p)'$ is the $p$-component vector formed by the non-singular components of the $\gamma_j$, $j = 1,\ldots,d$, then the model (2.16) becomes

$$P(y_1\cdots y_d) = \int\cdots\int \prod_{j=1}^d f_j(y_j;\gamma_j)\,c(G_1(\lambda_1),\ldots,G_p(\lambda_p))\prod_{j=1}^p g_j(\lambda_j)\,d\lambda_1\cdots d\lambda_p, \qquad (2.17)$$

where $f_j(y_j;\gamma_j) = \Pr(Y_j = y_j \mid \Gamma_j = \gamma_j)$. The dependence among the response variables is induced through the mixing distribution of $\boldsymbol{\lambda}$. Usually $\lambda_j = \gamma_j$, $j = 1,\ldots,d$. A special case is $\gamma_j = \lambda_j = \lambda$ for all $j$.

2.2.3 Examples of MCD and MMD models

From their definitions, we see that the above two classes are rather general. We can choose any appropriate multivariate copula as the copula in the construction of the distribution. The sets of MCD and MMD models are not disjoint, as we can see from Example 2.13. From a practical viewpoint, we need to find specific multivariate copulas $C$ which offer good modelling properties and have a simple analytic form. One such choice is the multivariate normal copula (2.4). With this copula, we have $C(G_1(z_1),\ldots,G_d(z_d)) = \Phi_d(\Phi^{-1}(G_1(z_1)),\ldots,\Phi^{-1}(G_d(z_d));\Theta)$, where the $G_j$'s are arbitrary cdfs. The multivariate normal copula allows us to fully or almost fully exploit the dependence structure among the response variables. Its primary disadvantage may be computational difficulty when $d$ is large (e.g. $d > 7$; see Schervish 1984). This subsection consists of examples of MCD and MMD models. Discussion concerning the inclusion of covariates is given in some cases. More extensive studies of specific MCD and MMD models are given in Chapter 3.
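A simple way to appreciate (2.17) numerically is Monte Carlo integration over the mixing distribution (a minimal sketch with $p = 1$ and a single shared gamma-distributed Poisson rate, an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import gamma, poisson

rng = np.random.default_rng(2)
lam = gamma(a=2.0, scale=1.5).rvs(size=200_000, random_state=rng)  # shared mixing rate

def mmd_pmf(y1, y2):
    """Monte Carlo estimate of P(y1 y2) = E[ f(y1; lambda) f(y2; lambda) ], as in (2.17)."""
    return np.mean(poisson.pmf(y1, lam) * poisson.pmf(y2, lam))

total = sum(mmd_pmf(y1, y2) for y1 in range(40) for y2 in range(40))
print(mmd_pmf(2, 3), total)        # total is ~1, so the pmf is (numerically) proper
```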
Example 2.10 (MCD binary model)

1. General models. Let $Y_j$ ($j = 1,\ldots,d$) be a binary random variable taking values 0 or 1, and suppose the probability of outcome 1 is $p_j$. The cdf for $Y_j$ is

$$G_j(y_j) = \begin{cases} 0, & y_j < 0, \\ 1-p_j, & 0 \le y_j < 1, \\ 1, & y_j \ge 1. \end{cases} \qquad (2.18)$$

For a given $d$-dimensional copula $C(u_1,\ldots,u_d;\theta)$, $C(G_1(y_1),\ldots,G_d(y_d);\theta)$ is a well-defined distribution for the binary random vector $\mathbf{Y} = (Y_1,\ldots,Y_d)'$. When $d = 2$, with a one-parameter copula $C(u_1,u_2;\theta_{12})$, we can write down the pmf of $\mathbf{Y}$ as

$$P(y_1y_2) = C(b_1,b_2;\theta_{12}) - C(b_1,a_2;\theta_{12}) - C(a_1,b_2;\theta_{12}) + C(a_1,a_2;\theta_{12}),$$

where $a_1 = G_1(y_1-1)$, $b_1 = G_1(y_1)$, $a_2 = G_2(y_2-1)$ and $b_2 = G_2(y_2)$. The pmf of $\mathbf{Y} = \mathbf{y}$ for general $d$ is expressed by (2.14). One simple way to reparameterize $p_j$ in (2.18), so that the new parameter associated with the univariate margin has range $(-\infty,\infty)$, is to let $p_j = F_j(z_j)$, where $F_j$ is a proper cdf. This is equivalent to writing $Y_j = I(Z_j \le z_j)$, where $Z_j$ is a rv with cdf $F_j$, and the random vector $\mathbf{Z} = (Z_1,\ldots,Z_d)'$ has a multivariate cdf $F_{12\cdots d}$. In the literature, this approach is referred to as a latent variable model or a multivariate latent model, since $\mathbf{Z}$ is an unobserved (latent) vector. There is also the option of including covariates for the parameter $z_j$, as well as for the dependence parameters $\theta$ in the copula $C(u_1,\ldots,u_d;\theta)$. We will show these by examples.

2. Multivariate probit model with no covariates. The classical multivariate probit model for the multivariate binary response vector $\mathbf{Y}$ is (2.14) with the multinormal copula (2.4), where $p_j$ is reparameterized as $p_j = \Phi(z_j)$ and $G_j$ has the form (2.18). This model has the CUOM and MUBE properties. Through its latent variable representation, the model can also be written as $Y_j = I(Z_j \le z_j)$, $j = 1,\ldots,d$, where $\mathbf{Z} = (Z_1,\ldots,Z_d)' \sim N(\mathbf{0},\Theta)$, $\Theta = (\theta_{jk})$; $z_j$ is often referred to as the cut-off point. $\Theta$ is a correlation matrix, which (a) has elements bounded by 1 in absolute value and (b) is nonnegative definite. To avoid the constraint of the bounds, we can reparameterize $\theta_{jk}$ through the hyperbolic tangent transform as

$$\theta_{jk} = \frac{\exp(\gamma_{jk})-1}{\exp(\gamma_{jk})+1}, \qquad (2.19)$$

so that the new parameter $\gamma_{jk}$ has the range $(-\infty,\infty)$. The right hand side of (2.19) is an increasing function of $\gamma_{jk}$. Condition (a) is not sufficient to guarantee that $\Theta$ is nonnegative definite except when $d = 2$. For $d = 2$, $\Theta$ is always nonnegative definite since the determinant of $\Theta$, $1-\theta_{12}^2$, is always nonnegative. For $d = 3$, $\Theta$ is a nonnegative definite matrix provided

$$\det(\Theta) = 1 + 2\theta_{12}\theta_{13}\theta_{23} - \theta_{12}^2 - \theta_{13}^2 - \theta_{23}^2 \ge 0; \qquad (2.20)$$

this constraint is satisfied for about 61.7% of the cube $[-1,1]^3$ for $(\theta_{12},\theta_{13},\theta_{23})$. For $d = 4$, only about 18.3% of the hypercube $[-1,1]^6$ leads to a nonnegative definite matrix $\Theta$; see Rousseeuw and Molenberghs (1994).
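The two percentages just quoted are easy to reproduce by Monte Carlo (a minimal sketch; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
t12, t13, t23 = rng.uniform(-1, 1, size=(3, n))
det3 = 1 + 2 * t12 * t13 * t23 - t12**2 - t13**2 - t23**2   # det(Theta), as in (2.20)
print((det3 >= 0).mean())          # ~0.617, matching the quoted 61.7%

def posdef_frac_d4(m=100_000):
    cnt = 0
    for _ in range(m):
        T = np.eye(4)
        T[np.triu_indices(4, 1)] = rng.uniform(-1, 1, size=6)
        T = T + T.T - np.eye(4)                             # symmetric, unit diagonal
        cnt += np.all(np.linalg.eigvalsh(T) >= 0)
    return cnt / m

print(posdef_frac_d4())            # ~0.183, matching the quoted 18.3%
```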
Theoretically, the constraint (b) causes no trouble for the usefulness of the model. But numerically, this constraint may be a problem, since the space in which the numerical computation can be carried out is quite limited. For the numerical computation to be successful, we have to guarantee that the current values do not leave the constrained space, which, in some situations (e.g. when the true parameters are close to the space boundaries), may render the computation time-consuming or even infeasible. In some situations, these problems with the constraint (b) can be avoided by limiting consideration to a simple correlation structure, so that the nonnegative definite condition is always satisfied. Examples include an exchangeable correlation matrix with all correlations equal to the same $\theta$, and an AR(1) correlation matrix with the $(j,k)$ component equal to $\theta^{|j-k|}$ for some $\theta$.

3. Multivariate probit model with covariates. The classical multivariate probit model for a binary response vector $\mathbf{Y}_i$, $i = 1,\ldots,n$, with covariate vector $\mathbf{x}_{ij}$ for the $j$th univariate marginal parameter, in the latent variable representation, is $Y_{ij} = I(Z_{ij} \le \alpha_j + \boldsymbol{\beta}_j'\mathbf{x}_{ij})$, $j = 1,\ldots,d$, $i = 1,\ldots,n$, where $\mathbf{Z}_i \sim N(\mathbf{0},\Theta_i)$, $\Theta_i = (\theta_{ijk})$. A modelling question is whether the dependence parameters should also be functions of covariates. If so, what are natural functions to choose so that the $\Theta_i$ are all correlation matrices? If $\Theta_i$ does not depend on any covariates, then the $\mathbf{Z}_i$ are iid $N(\mathbf{0},\Theta)$, with $\Theta_i = \Theta = (\theta_{jk})$. If $\Theta_i$ depends on some covariate vectors, say $\theta_{ijk}$ depends on $\mathbf{w}_{ijk}$, then to satisfy $|\theta_{ijk}| \le 1$ we can let

$$\theta_{ijk} = \frac{\exp(\gamma_{jk,0} + \boldsymbol{\gamma}_{jk}'\mathbf{w}_{ijk})-1}{\exp(\gamma_{jk,0} + \boldsymbol{\gamma}_{jk}'\mathbf{w}_{ijk})+1}. \qquad (2.21)$$

Since all $\Theta_i$, $i = 1,\ldots,n$, must be nonnegative definite, this may be a very strong restriction on the regression parameters $(\gamma_{jk,0},\boldsymbol{\gamma}_{jk})$. In some situations, choices of the parameters $(\gamma_{jk,0},\boldsymbol{\gamma}_{jk})$ in (2.21) making all $\Theta_i$ nonnegative definite may not exist. The inclusion of covariates in the dependence parameters $\theta_{ijk}$ as expressed in (2.21) is a mathematical construction. In Example 2.13, we will give a more "natural" way to include covariates in the dependence parameters. •

Example 2.11 (MCD count model)

1. General models. Consider a $d$-variate random count vector $\mathbf{Y} = (Y_1,\ldots,Y_d)'$. Let $Y_j$ be a random variable taking the integer values $0,1,2,\ldots$, $j = 1,2,\ldots,d$. Let $\Pr(Y_j = m) = p_j^{(m)}$. Then $\sum_{m=0}^\infty p_j^{(m)} = 1$, and the cdf of $Y_j$ is

$$G_j(y_j) = \sum_{m=0}^{[y_j]} p_j^{(m)}, \qquad (2.22)$$

where $[y_j]$ means the largest integer less than or equal to $y_j$. Thus for a given $d$-dimensional copula $C(u_1,\ldots,u_d;\theta)$, $C(G_1(y_1),\ldots,G_d(y_d);\theta)$ is a well-defined distribution for the count random vector $\mathbf{Y}$. The pmf of $\mathbf{Y} = \mathbf{y}$ for general $d$ is expressed by (2.14). If we further assume that $Y_j$ has a Poisson distribution with parameter $\lambda_j$, that is,

$$p_j^{(m)} = \frac{\lambda_j^m\exp(-\lambda_j)}{m!}, \qquad (2.23)$$

then we say we have an MCD Poisson model. For the MCD Poisson model, the univariate parameter $\lambda_j$ can be reparameterized by $\eta_j = \log(\lambda_j)$, so that the new parameter $\eta_j$ has the range $(-\infty,\infty)$. Covariates can be included in $\eta_j$ in an appropriate way. The comments on modelling the dependence structure in the copula $C$ for the MCD binary model are also relevant here. To represent the MCD Poisson model by latent variables, let $Y_j = m$ if $z_{m-1} < Z_j \le z_m$, $-\infty = z_{-1} < z_0 < \cdots < z_\infty = \infty$, where $Z_j$ is a rv with cdf $F_j$, and the random vector $\mathbf{Z} = (Z_1,\ldots,Z_d)'$ has a multivariate cdf $F_{12\cdots d}$. The form of $F_j$ does not have much importance, since for count data we are seldom interested in the cut-off points $z_0,z_1,\ldots$.
But the copula related to $F_{12\cdots d}$ is of essential importance for the modelling of count data, since it determines the multivariate structure of the random count vector $\mathbf{Y}$. Thus we may say that for count data the MCD representation (2.14) is more relevant than the latent variable representation.

2. Multivariate Poisson model with multinormal copula. The multivariate Poisson model with multinormal copula for a count response vector $\mathbf{Y}$ is (2.14), where the copula is the multinormal copula (2.4) and the margins have the form (2.23). This model has the CUOM and MUBE properties. The univariate marginal parameters $\lambda_j$ can be transformed to $\eta_j = \log(\lambda_j)$ so that $\eta_j$ has range $(-\infty,\infty)$. For a random vector $\mathbf{Y}_i$, $i = 1,\ldots,n$, with a covariate vector $\mathbf{x}_{ij}$ for $\lambda_{ij}$, a possible way to include $\mathbf{x}_{ij}$ is to let $\eta_{ij} = \alpha_j + \boldsymbol{\beta}_j'\mathbf{x}_{ij}$, where $\eta_{ij} = \log(\lambda_{ij})$. Similarly, if $\Theta_i = (\theta_{ijk})$ with $\theta_{ijk}$ depending on a covariate vector $\mathbf{w}_{ijk}$, a possible way to include $\mathbf{w}_{ijk}$ is to let $\theta_{ijk}$ have the form (2.21). The difficulties with adding covariates to $\Theta_i$ remain, as in the previous example. •

Example 2.12 (MMD Poisson model)

1. General models. Let $\mathbf{Y} = (Y_1,\ldots,Y_d)'$ be a random vector of count data, where $Y_j$, $j = 1,\ldots,d$, has a Poisson distribution. The MMD Poisson model for the random vector $\mathbf{Y}$ is

$$P(y_1\cdots y_d) = \int_0^\infty\cdots\int_0^\infty \prod_{j=1}^d f_j(y_j;\lambda_j)\,c(G_1(\eta_1),\ldots,G_p(\eta_p))\prod_{j=1}^p g_j(\eta_j)\,d\eta_1\cdots d\eta_p, \qquad (2.24)$$

where

$$f_j(y_j;\lambda_j) = \frac{\exp(-\lambda_j)\lambda_j^{y_j}}{y_j!} \qquad (2.25)$$

is the probability mass function of a Poisson distribution for $Y_j$ given the parameter $\lambda_j$. In (2.24), $\boldsymbol{\eta} = (\eta_1,\ldots,\eta_p)'$ is a $p\times 1$ vector formed from functions of $\lambda_1,\ldots,\lambda_d$; it is assumed to be random with density function $c(G_1(\eta_1),\ldots,G_p(\eta_p))\prod_{j=1}^p g_j(\eta_j)$, where $c(\cdot)$ is the density function of a copula $C$ and $g_j(\cdot)$ the marginal density of $\eta_j$. The model can cover a wide range of dependence through appropriate parametric families of the copula $C$. Through conditional expectations, one can study the covariances and correlations of $\mathbf{Y}$. If $\lambda_j = \eta_j$, $j = 1,\ldots,d$, we have

$$\begin{cases} \mathrm{E}(Y_j) = \mathrm{E}(\mathrm{E}(Y_j|\lambda_j)) = \mathrm{E}(\lambda_j), \\ \mathrm{Var}(Y_j) = \mathrm{E}(\mathrm{Var}(Y_j|\lambda_j)) + \mathrm{Var}(\mathrm{E}(Y_j|\lambda_j)) = \mathrm{Var}(\lambda_j) + \mathrm{E}(\lambda_j), \\ \mathrm{Cov}(Y_j,Y_k) = \mathrm{E}(\mathrm{Cov}(Y_j,Y_k|\lambda_j,\lambda_k)) + \mathrm{Cov}(\mathrm{E}(Y_j|\lambda_j),\mathrm{E}(Y_k|\lambda_k)) = \mathrm{Cov}(\lambda_j,\lambda_k). \end{cases} \qquad (2.26)$$

Therefore the correlation of $Y_j$ and $Y_k$ is

$$\mathrm{Corr}(Y_j,Y_k) = \frac{\mathrm{Cov}(\lambda_j,\lambda_k)}{\{[\mathrm{Var}(\lambda_j)+\mathrm{E}(\lambda_j)][\mathrm{Var}(\lambda_k)+\mathrm{E}(\lambda_k)]\}^{1/2}}, \qquad (2.27)$$

which has the same sign as the correlation of $\lambda_j$ and $\lambda_k$. $\mathrm{Corr}(Y_j,Y_k)$ is smaller in absolute value than $\mathrm{Corr}(\lambda_j,\lambda_k)$ and tends to $\mathrm{Corr}(\lambda_j,\lambda_k)$ when $\mathrm{E}(\lambda_j)/\mathrm{Var}(\lambda_j)$ and $\mathrm{E}(\lambda_k)/\mathrm{Var}(\lambda_k)$ tend to zero. When $\lambda_j = \eta$, $j = 1,\ldots,d$, $\mathbf{Y}$ is equicorrelated with $\mathrm{Corr}(Y_j,Y_k) = \mathrm{Var}(\eta)/[\mathrm{Var}(\eta)+\mathrm{E}(\eta)]$. The range of dependence in this special situation is quite restricted. For the general model (2.24), the parameters are introduced through the marginal distributions of the $\eta_j$ and the copula $C$. Letting the parameters depend on covariates is possible, as we can see from the next example with a specific copula.

2. Multivariate Poisson-lognormal model. The multivariate Poisson-lognormal model for a random Poisson vector $\mathbf{Y}$ is (2.24) with $p = d$ and $\lambda_j = \eta_j$, where the copula is the multinormal copula (2.4) and $\eta_j$ has a lognormal distribution with parameters $\mu_j$ and $\sigma_j^2$, $j = 1,\ldots,d$; that is, the mixing density $c(G_1(\eta_1),\ldots,G_d(\eta_d))\prod_{j=1}^d g_j(\eta_j)$ in (2.24) is a multivariate lognormal density function. We refer to the resulting model as (2.28).
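Simulation from this model is straightforward and can be used to check the moment formulas displayed below (a minimal sketch with $d = 2$ and arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, theta = np.array([0.5, 1.0]), np.array([0.8, 0.6]), 0.4
cov = np.diag(sigma) @ np.array([[1, theta], [theta, 1]]) @ np.diag(sigma)

eta = np.exp(rng.multivariate_normal(mu, cov, size=200_000))   # lognormal rates
Y = rng.poisson(eta)                                           # Y_j | eta ~ Poisson(eta_j)

a = np.exp(mu + sigma ** 2 / 2)
print(Y.mean(axis=0), a)                                     # E(Y_j) = a_j
print(Y.var(axis=0), a + a ** 2 * (np.exp(sigma ** 2) - 1))  # overdispersed margins
print(np.cov(Y.T)[0, 1], a[0] * a[1] * (np.exp(theta * sigma[0] * sigma[1]) - 1))
```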
The model (2.28) has the CUOM and MUBE properties. The parameters in the model are $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_d)'$, $\boldsymbol{\sigma} = (\sigma_1,\ldots,\sigma_d)'$ and $\Theta$. By (2.26) and (2.27), we have

$$\begin{cases} \mathrm{E}(Y_j) = \exp\{\mu_j + \tfrac{1}{2}\sigma_j^2\} \equiv a_j, \\ \mathrm{Var}(Y_j) = a_j + a_j^2[\exp(\sigma_j^2)-1], \\ \mathrm{Cov}(Y_j,Y_k) = a_ja_k[\exp(\theta_{jk}\sigma_j\sigma_k)-1], \quad j \ne k. \end{cases} \qquad (2.30)$$

The margins are overdispersed Poisson, since $\mathrm{Var}(Y_j)/\mathrm{E}(Y_j) > 1$. $|\mathrm{Corr}(Y_j,Y_k)|$ is less than $|\mathrm{Corr}(\eta_j,\eta_k)|$, and $\mathrm{Corr}(Y_j,Y_k)$ approaches $\mathrm{Corr}(\eta_j,\eta_k)$ when $\sigma_j,\sigma_k \to \infty$. A covariate vector $\mathbf{x}$ can be included in the model, say by letting the components of $\boldsymbol{\mu}$ be linear functions of $\mathbf{x}$. $\boldsymbol{\sigma}$ can be assumed to have some special pattern, for example $\sigma_1 = \cdots = \sigma_d = \sigma$. It is harder to naturally let the correlation matrix $\Theta$ depend on covariates, as already discussed for the multivariate probit model for binary data. •

Example 2.13 (MMD model for binary data)

1. General models. Let $\mathbf{Y} = (Y_1,\ldots,Y_d)'$ be a binary random vector. Assume that $\mathbf{Y}$ has the MCD binary model of Example 2.10 for a given cut-off point vector $\boldsymbol{\alpha} = (\alpha_1,\ldots,\alpha_d)'$; $\boldsymbol{\alpha}$ in turn is assumed to be a random vector. Let $\boldsymbol{\eta} = (\eta_1,\ldots,\eta_p)$ be the collection of functions of $\boldsymbol{\alpha}$. With the latent variable representation, we have that, for given $\boldsymbol{\eta}$,

$$\mathbf{Y} = (Y_1,\ldots,Y_d)' = (I(A_1 \le \alpha_1),\ldots,I(A_d \le \alpha_d))', \qquad (2.31)$$

where $\mathbf{A} = (A_1,\ldots,A_d)'$ has a multivariate cdf $F$, and $\boldsymbol{\eta}$ has a multivariate cdf $G$. Thus

$$P(y_1\cdots y_d) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} P(y_1\cdots y_d|\boldsymbol{\eta})\,c(G_1(\eta_1),\ldots,G_p(\eta_p))\prod_{j=1}^p g_j(\eta_j)\,d\eta_1\cdots d\eta_p,$$

where $c(G_1(\eta_1),\ldots,G_p(\eta_p))\prod_{j=1}^p g_j(\eta_j)$ is the density function of $\boldsymbol{\eta}$, with $c(\cdot)$ the density function of a copula $C$ and $g_j(\cdot)$ the marginal density of $\eta_j$. A more general case arises when there is a covariate vector $\mathbf{x}$. In this situation, we may let $\alpha_j = B_{j,0} + \mathbf{B}_j'\mathbf{x}$, $j = 1,\ldots,d$, where the $B_{j,0}$'s and $\mathbf{B}_j$'s are random, and $\boldsymbol{\eta}$ is now the collection of functions of the random components $B_{j,0}$ and $\mathbf{B}_j$.

2. Multivariate probit-normal model. The MMD probit model is obtained by assuming that in (2.31), $\mathbf{A} = (A_1,\ldots,A_d)' \sim N_d(\mathbf{0},\Theta)$ and $\boldsymbol{\eta} \sim N_p(\boldsymbol{\mu},\Sigma)$, where $\Theta = (\theta_{jk})$ is a correlation matrix and $\Sigma = (\sigma_{jk})$ is a variance-covariance matrix. Without loss of generality, let us assume $\boldsymbol{\eta} = \boldsymbol{\alpha}$. Then the MMD probit model of the form (2.31) becomes

$$\mathbf{Y} = (Y_1,\ldots,Y_d)' = (I(Z_1 \le z_1^*),\ldots,I(Z_d \le z_d^*))', \qquad (2.32)$$

where $Z_j = (A_j - \alpha_j + \mu_j)/\sqrt{1+\sigma_{jj}}$ and $z_j^* = \mu_j/\sqrt{1+\sigma_{jj}}$. In the version with covariates, one may take $\mathbf{A} = (A_1,\ldots,A_d)' \sim N_d(\boldsymbol{\mu}_0,\Sigma_0)$ independent of $\boldsymbol{\beta} = (B_1,\ldots,B_d)' \sim N_d(\boldsymbol{\mu},\Sigma)$, where $\boldsymbol{\mu}_0 = (\mu_{1,0},\ldots,\mu_{d,0})'$ and $\Sigma_0$ is a covariance matrix.

Definition 2.9 (Inference functions) $\Psi(\mathbf{y};\theta) = (\psi_1(\mathbf{y};\theta),\ldots,\psi_q(\mathbf{y};\theta))^T : \mathcal{Y}\times\Re \to \mathbb{R}^q$ is called a vector of inference functions if the component functions of $\Psi(\mathbf{y};\theta)$ are measurable for each fixed $\theta = (\theta_1,\ldots,\theta_q)' \in \Re$. •

Definition 2.10 (Unbiased inference functions) $\Psi$ is said to be unbiased if for each $\theta \in \Re$ and $j = 1,\ldots,q$, $\mathrm{E}_\theta\{\psi_j\} = 0$, where $\mathrm{E}_\theta$ means expectation relative to $P(\cdot\,;\theta)$. •

Unbiasedness is a natural requirement which ensures that the roots of the equations are close to the true values when little random variation is present. Whereas $\theta$ may not have an unbiased estimator, unbiased inference functions exist under fairly general circumstances. For any given inference function vector $\Psi$ and any $\mathbf{y} \in \mathcal{Y}$, an estimator of $\theta$, say $\hat{\theta} = \hat{\theta}(\mathbf{y})$, can be obtained as the solution to $\Psi = 0$.
In order for the estimate $\hat{\theta}$ to be well-defined and well-behaved, the inference function vector $\Psi$ must satisfy some regularity conditions; that is, $\Psi$ must consist of regular inference functions.

Definition 2.11 (Regular inference functions) The vector of inference functions $\Psi$ is said to be a vector of regular inference functions if, for all $\theta \in \Re$, the following assumptions are satisfied:

1. The support of $\mathcal{Y}$ does not depend on any $\theta \in \Re$.

2. $\mathrm{E}\{\psi_j\} = 0$, $j = 1,\ldots,q$.

3. The partial derivative $\partial\psi_j/\partial\theta_k$ exists for almost every $\mathbf{y} \in \mathcal{Y}$, $j,k = 1,\ldots,q$.

4. The order of integration and differentiation may be interchanged as follows:

$$\frac{\partial}{\partial\theta_k}\int \psi_j P(\mathbf{y};\theta)\,d\mu(\mathbf{y}) = \int \frac{\partial}{\partial\theta_k}\big[\psi_j P(\mathbf{y};\theta)\big]\,d\mu(\mathbf{y}), \quad j,k = 1,\ldots,q.$$

5. $\mathrm{E}\{\psi_j\psi_k\}$ exists, $j,k = 1,\ldots,q$, and the $q\times q$ matrix $M_\Psi(\theta) = \mathrm{E}\{\Psi\Psi^T\}$ is positive-definite.

6. The $q\times q$ matrix $D_\Psi(\theta) = \mathrm{E}\{\partial\Psi/\partial\theta^T\}$ is non-singular. •

A model $P(\mathbf{y};\theta)$ in (2.35) is said to be regular if the score functions are regular inference functions and $\Re$ is an open region of $\mathbb{R}^q$. We are only interested in regular models, so that the asymptotic theory concerning MLEs is readily available for use. This is not a strong assumption for applications. (The main limitation may be the exclusion of models violating condition 1 of Definition 2.11.)

Definition 2.12 (Fisher information matrix) The Fisher information matrix is the matrix-valued function $I : \Re \to \mathbb{R}^{q\times q}$ defined by $I(\theta) = \mathrm{E}\{U(\theta)U^T(\theta)\}$, where $U(\theta)$ is the vector of score functions, $U(\theta) = \partial\log P(\mathbf{y};\theta)/\partial\theta$. •

Definition 2.13 (Godambe information matrix) For a regular inference function vector $\Psi$, the Godambe information matrix is the matrix-valued function $J_\Psi : \Re \to \mathbb{R}^{q\times q}$ defined by

$$J_\Psi(\theta) = D_\Psi^T(\theta)M_\Psi^{-1}(\theta)D_\Psi(\theta),$$

where $D_\Psi(\theta) = \mathrm{E}\{\partial\Psi/\partial\theta^T\}$ and $M_\Psi(\theta) = \mathrm{E}\{\Psi\Psi^T\}$. •

Consider $n$ iid observations $\mathbf{y}_1,\ldots,\mathbf{y}_n$ from a model $P(\mathbf{y};\theta)$ in (2.35). Let $\Psi(\mathbf{y}_i;\theta) = (\psi_{i1},\ldots,\psi_{iq})'$. The inference function vector based on the $n$ observations is $\Psi_n : \mathcal{Y}^n\times\Re \to \mathbb{R}^q$ given by

$$\Psi_n(\theta) = \sum_{i=1}^n \Psi(\mathbf{y}_i;\theta).$$

We define the estimator $\hat{\theta} = \hat{\theta}(\mathbf{y}_1,\ldots,\mathbf{y}_n)$ as the solution of $\Psi_n = 0$.
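For any regular inference function vector, the Godambe information can be estimated by its empirical sandwich version, with $M_\Psi$ and $D_\Psi$ replaced by sample averages (a minimal numerical sketch; godambe_information is a hypothetical helper, and the two-parameter normal inference functions at the end are an arbitrary illustration):

```python
import numpy as np

def godambe_information(psi, theta_hat, y, eps=1e-5):
    """Empirical J_Psi = D^T M^{-1} D, with M, D estimated by sample averages."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    q = theta_hat.size
    scores = np.array([psi(yi, theta_hat) for yi in y])         # n x q
    M = scores.T @ scores / len(y)                              # E{Psi Psi^T}
    D = np.zeros((q, q))
    for k in range(q):                                          # E{d Psi / d theta^T}
        tp, tm = theta_hat.copy(), theta_hat.copy()
        tp[k] += eps
        tm[k] -= eps
        D[:, k] = np.mean([(np.asarray(psi(yi, tp)) - np.asarray(psi(yi, tm))) / (2 * eps)
                           for yi in y], axis=0)
    return D.T @ np.linalg.solve(M, D)

# Unbiased inference functions for (mu, sigma^2) of a univariate normal sample
psi = lambda yi, th: np.array([yi - th[0], (yi - th[0]) ** 2 - th[1]])
y = np.random.default_rng(5).normal(2.0, 1.5, size=2000)
J = godambe_information(psi, [y.mean(), y.var()], y)
print(np.linalg.inv(J) / len(y))   # approximate covariance matrix of the estimates
```

Dividing $J_\Psi^{-1}$ by $n$ gives the approximate covariance of $\hat{\theta}$, matching the asymptotic result in the theorem that follows.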
The following theorem establishes the asymptotic normality of the solution $\hat{\theta}$ based on regular inference functions and gives an asymptotic interpretation of the Godambe information matrix.

Theorem 2.1 Assume that the estimator $\hat{\theta} = \hat{\theta}(\mathbf{y}_1,\ldots,\mathbf{y}_n)$ associated with the regular inference function vector $\Psi_n : \mathcal{Y}^n\times\Re \to \mathbb{R}^q$ is a $\sqrt{n}$-consistent estimator of $\theta$; that is, $\sqrt{n}(\hat{\theta}_j-\theta_j)$, $j = 1,\ldots,q$, is bounded in probability, so that $\hat{\theta}_j$ tends to $\theta_j$ at least at the rate of $1/\sqrt{n}$. We further assume that there exist functions $M_{jkl}(\mathbf{y})$ such that $|\partial^2\psi_j/\partial\theta_k\partial\theta_l| \le M_{jkl}(\mathbf{y})$ for all $\theta \in \Re$, where $\mathrm{E}\{M_{jkl}(\mathbf{y})\} < \infty$ for all $j,k,l$. Then as $n \to \infty$, we have asymptotically

$$\sqrt{n}(\hat{\theta}-\theta) \overset{D}{\to} N_q(\mathbf{0},\,J_\Psi^{-1}(\theta))$$

under $P(\cdot\,;\theta)$.

Proof. The proof is similar to that of the corresponding theorem for the asymptotic normality of the MLE, so we only sketch it. $\Psi_n$ has the following expansion around $\theta$:

$$0 = \Psi_n(\hat{\theta}) = \Psi_n(\theta) + H_n(\theta)(\hat{\theta}-\theta) + \mathbf{R}_n,$$

where $H_n$ is the $q\times q$ matrix $\partial\Psi_n/\partial\theta^T$ and $\mathbf{R}_n = O_p(n\|\hat{\theta}-\theta\|^2) = O_p(1) = o_p(\sqrt{n})$ by the assumptions. Thus

$$\sqrt{n}(\hat{\theta}-\theta) = \Big[-\frac{1}{n}H_n(\theta)\Big]^{-1}\frac{1}{\sqrt{n}}\big[\Psi_n(\theta)+\mathbf{R}_n\big]. \qquad (2.36)$$

By the Law of Large Numbers,

$$-\frac{1}{n}H_n(\theta) \overset{P}{\to} -D_\Psi(\theta).$$

Now for any fixed vector $\mathbf{u} = (u_1,\ldots,u_q)'$, consider the sequence of one-dimensional rv's $\mathbf{u}'\Psi(\mathbf{y}_i;\theta) = u_1\psi_{i1}+\cdots+u_q\psi_{iq}$, $i = 1,\ldots,n$. By the central limit theorem (Lindeberg-Lévy), $\mathbf{u}'\Psi_n/\sqrt{n}$ is asymptotically $N_1(0,\mathbf{u}'M_\Psi\mathbf{u})$. This result leads to $\Psi_n/\sqrt{n} \overset{D}{\to} N_q(\mathbf{0},M_\Psi)$. Applying Slutsky's Theorem to (2.36), we obtain $\sqrt{n}(\hat{\theta}-\theta) \overset{D}{\to} N_q(\mathbf{0},\,D_\Psi^{-1}M_\Psi(D_\Psi^T)^{-1})$, or

$$\sqrt{n}(\hat{\theta}-\theta) \overset{D}{\to} N_q(\mathbf{0},\,J_\Psi^{-1}(\theta)). \quad\text{•}$$

Optimality criteria for inference functions

In this subsection, we summarize optimality results for inference functions in the multi-parameter situation. These results will be referred to later for comparing two sets of regular inference functions.

Consider a scalar inference function $\Psi$. It is natural to seek an unbiased estimating function $\Psi$ for which the variance $\mathrm{E}\{\Psi^2\}$ is as small as possible. This is analogous to the theory of minimum variance unbiased (MVU) estimation. Since the variance may be changed by multiplying $\Psi$ by an arbitrary constant, some further standardization is necessary for the purpose of comparing variances. Godambe (1960) suggested considering the variance of the standardized estimating function $\Psi_s = \Psi/\mathrm{E}\{\partial\Psi/\partial\theta\}$, and defined an optimal estimating function to be one which minimizes $\mathrm{Var}(\Psi_s) = \mathrm{E}\{\Psi^2\}/\{\mathrm{E}(\partial\Psi/\partial\theta)\}^2$, or maximizes $\mathrm{Var}^{-1}(\Psi_s)$, the Godambe information for $\Psi$. Godambe showed that in the one-parameter case the usual maximum likelihood estimating equation has this optimal property within a wide class of regular unbiased inference functions. Thus the Godambe information can be used to compare two regular inference functions, and the function with the larger Godambe information is generally preferred. Given two vectors of inference functions, $\Psi$ and $\Omega$, several different optimality criteria can be used to say that $\Omega$ is preferred (or optimal) relative to $\Psi$.

Definition 2.14 (M-optimality) A vector of inference functions $\Omega$ is said to have matrix optimality or M-optimality versus a vector of inference functions $\Psi$ if the difference of the inverses of the Godambe information matrices,

$$J_\Psi^{-1}(\theta) - J_\Omega^{-1}(\theta),$$

is non-negative definite. •

Definition 2.15 (T-optimality) A vector of inference functions $\Omega$ is said to have trace optimality or T-optimality versus a vector of inference functions $\Psi$ if the difference of the traces of the inverses of the Godambe information matrices,

$$\mathrm{Tr}(J_\Psi^{-1}(\theta)) - \mathrm{Tr}(J_\Omega^{-1}(\theta)),$$

is positive. •

Definition 2.16 (D-optimality) A vector of inference functions $\Omega$ is said to have determinant optimality or D-optimality versus a vector of inference functions $\Psi$ if the difference of the determinants of the inverses of the Godambe information matrices,

$$|J_\Psi^{-1}(\theta)| - |J_\Omega^{-1}(\theta)|,$$

is positive. •

Chandrasekar and Kale (1984) proved that M-optimality implies T-optimality and D-optimality. Joseph and Durairajan (1991) further proved that the above three criteria are equivalent, in the sense that if $\Omega$ is optimal with respect to any one of the three criteria then it is also optimal with respect to the remaining two. When comparing two sets of regular inference functions, we could examine a slightly different version of T-optimality and D-optimality.
For example, for T-optimality we may examine the ratio $\mathrm{Tr}(J_\Omega^{-1}(\theta))/\mathrm{Tr}(J_\Psi^{-1}(\theta))$, and for D-optimality the ratio $|J_\Omega^{-1}(\theta)|/|J_\Psi^{-1}(\theta)|$. In practice, and often in simulation studies, only estimated values of $J_\Omega^{-1}(\theta)$ and $J_\Psi^{-1}(\theta)$ are available, so M-optimality, T-optimality or D-optimality may appear to be slightly violated numerically when based on only one set of observations.

We end this subsection by stating an extended Cramér-Rao inequality for inference functions:

Theorem 2.2 For any given vector of regular inference functions $\Psi$, and for all $\theta \in \Re$, $J_\Psi^{-1}(\theta) - I^{-1}(\theta)$ is non-negative definite. For a proof of this result, see Jørgensen and Labouriau (1995). Related references include Ferreira (1982) and Chandrasekar (1988), among others. •

This theorem states that, for a regular model $P(\mathbf{y};\theta)$, the vector of score functions

$$U(\theta) = \frac{\partial\log P(\mathbf{y};\theta)}{\partial\theta} = \Big(\frac{\partial\log P(\mathbf{y};\theta)}{\partial\theta_1},\ldots,\frac{\partial\log P(\mathbf{y};\theta)}{\partial\theta_q}\Big)'$$

is M-optimal within the class of all regular unbiased estimating functions.

2.3.3 Inference function of margins

We have seen from the previous subsection that, under fairly general regularity conditions, the score functions are asymptotically optimal among regular inference functions. However, with multivariate models, except in a few special cases (e.g. the multivariate normal), the estimating equations based on the score functions are computationally very cumbersome or intractable. It would be an invaluable alternative to have inference functions which are computationally feasible in general and also efficient compared with the score functions. In the ensuing subsection, we introduce a set of inference functions which we call the inference functions of margins (IFM). In Chapter 4, we show that IFM shares the asymptotic optimality properties of the score functions, and this is particularly true for multivariate models with the MUBE and PUBE properties. One major advantage of IFM is that it is computationally feasible in general and more flexible for handling different types of data. This leads us to develop a new inference theory, and computationally feasible procedures, for many MCD and MMD models.

Inference function of scores

We consider the family (2.35) and assume it is a regular parametric family. The likelihood function of $\theta$, given $\mathbf{y}$, is $L(\theta;\mathbf{y}) = P(\mathbf{y};\theta)$; the corresponding loglikelihood function is $\ell(\theta;\mathbf{y}) = \log P(\mathbf{y};\theta)$. Let

$$L_n(\theta) = \prod_{i=1}^n L(\theta;\mathbf{y}_i)$$

denote the likelihood of $\theta$ based on $\mathbf{y}_1,\ldots,\mathbf{y}_n$, a sample from $\mathcal{Y}$. The loglikelihood function of $\theta$ based on $\mathbf{y}_1,\ldots,\mathbf{y}_n$ is

$$\ell_n(\theta) = \log L_n(\theta) = \sum_{i=1}^n \ell(\theta;\mathbf{y}_i).$$

Definition 2.17 (Inference functions of scores, or IFS) The vector of score functions

$$\frac{\partial\ell_n(\theta)}{\partial\theta} = \Big(\frac{\partial\ell_n(\theta)}{\partial\theta_1},\ldots,\frac{\partial\ell_n(\theta)}{\partial\theta_q}\Big)'$$

is called the inference function vector of scores, or IFS. •
The maximum likelihood estimate (MLE) is generally determined as the solution to the likelihood equations $\partial\ell_n(\theta)/\partial\theta = 0$. The Hessian matrix of the function $-\ell_n(\theta)/n$ is $J(\theta)$, where $(J(\theta))_{jk} = -(1/n)(\partial^2\ell_n(\theta)/\partial\theta_j\partial\theta_k)$. The expected value of $J(\theta)$, $I(\theta) = \mathrm{E}\{J(\theta)\}$, is the Fisher information matrix. The value $J(\hat{\theta})$ at the maximum likelihood estimate $\hat{\theta} = \hat{\theta}(\mathbf{y}_1,\ldots,\mathbf{y}_n)$ is referred to as the observed information. $J(\hat{\theta})$ will generally be positive definite, since $\hat{\theta}$ is the point of maximum likelihood. A consistent estimate of $I(\theta)$ is $\hat{I}(\theta) = J(\hat{\theta})$. Under very general regularity conditions, it is known that the MLEs are asymptotically normal, in the sense that as $n \to \infty$,

$$\sqrt{n}(\hat{\theta}-\theta) \overset{D}{\to} N_q(\mathbf{0},\,I(\theta)^{-1}).$$

See Sen and Singer (1993, p. 209) for a proof.

Inference function of margins

We now introduce the loglikelihood function of a margin and the inference function of a margin for one parameter, and then define the inference functions of margins (IFM) for a parameter vector $\theta$. The asymptotic results for the estimates from IFM will be established in the next section.

Consider the parametric family (2.35) and assume $P(\mathbf{y};\theta)$ is a $d$-dimensional density function with respect to a probability measure $\mu$ on $\mathcal{Y}$. Let $S_d$ denote the set of non-empty subsets of $\{1,\ldots,d\}$. For any $S \in S_d$, we use $|S|$ to denote the cardinality of $S$. Let $P_S(\mathbf{y}_S)$ be the $S$-margin of $P(\mathbf{y};\theta)$, where $\mathbf{y}_S = \{y_j : j \in S\}$. Assume $P_S(\mathbf{y}_S)$ depends on $\theta_S$, where $\theta_S$ is a subvector of $\theta$.

Definition 2.18 Let $\theta = (\theta_1,\ldots,\theta_q)'$. Suppose the parameter $\theta_k$ appears in the $S$-margin $P_S(\mathbf{y}_S)$. The loglikelihood function of the $S$-margin is $\ell_S(\theta_S) = \log P_S(\mathbf{y}_S)$. An inference function for $\theta_k$ is

$$\frac{\partial\ell_S(\theta_S)}{\partial\theta_k}. \quad\text{•}$$

The inference function of a margin for a parameter $\theta_k$ is not necessarily uniquely defined by the above definition. In this thesis, unless specified otherwise, we always work with the inference function from a margin with the smallest $|S|$. For a specific model, it is often evident when $|S|$ is smallest for a parameter, so we will not be concerned with proving this feature in most applied situations. If there are two or more inference functions for $\theta_k$ with the same smallest $|S|$, then there is the question of how to combine these inference functions so as to extract information optimally. We will discuss this issue in section 2.6. Note that with the assumption of MPME (or MUBE), one can use $S$ with $|S| \le q$ ($|S| \le 2$ for MUBE) for every parameter $\theta_k$. In the case where MPME does not hold, one has $S = \{1,\ldots,d\}$ for some $\theta_k$ in the model. For the new theory below, we assume MPME or MUBE or PUBE in the remainder of this chapter. Assume that for the parameter vector $\theta = (\theta_1,\ldots,\theta_q)'$, the corresponding smallest cardinality subsets associated with the parameters are $S_1,\ldots,S_q$ (as $q$ is usually greater than $d$, there are duplicates among the $S_k$'s).

Definition 2.19 (Inference functions of margins, or IFM) The vector of inference functions

$$\Psi_n = \Big(\frac{\partial\ell_{n,S_1}(\theta_{S_1})}{\partial\theta_1},\ldots,\frac{\partial\ell_{n,S_q}(\theta_{S_q})}{\partial\theta_q}\Big)',$$

where $\ell_{n,S}(\theta_S) = \sum_{i=1}^n \log P_S(\mathbf{y}_{i,S})$, is called the inference functions of margins, or IFM, for $\theta$. •

For a regular model, the inference functions derived from the likelihood functions of margins also satisfy the regularity conditions. Thus the asymptotic properties of regular inference functions apply to IFM. A detailed development of this aspect will be given in section 2.4.

Definition 2.20 (Inference functions of margins estimates, or IFME) Any $\tilde{\theta} \in \Re$ which is the solution of

$$\Psi_n = \Big(\frac{\partial\ell_{n,S_1}(\theta_{S_1})}{\partial\theta_1},\ldots,\frac{\partial\ell_{n,S_q}(\theta_{S_q})}{\partial\theta_q}\Big)' = 0$$

is called the inference functions of margins estimate, or IFME, of the unknown true parameter vector $\theta$. •

In a few cases, $\tilde{\theta}$ has an analytic expression (e.g. Example 4.3). In general, $\tilde{\theta}$ has to be obtained by means of numerical methods.
Examples of inference functions of margins

Example 2.15 Let $\mathbf{X}_1,\ldots,\mathbf{X}_n$ be $n$ iid rv's from the distribution $C(G_1(x_1),\ldots,G_d(x_d);\Theta)$ with $G_j(x_j) = \Phi(x_j;\mu_j,\sigma_j)$, where $C$ is the multinormal copula (2.4). Let $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_d)$ and $\boldsymbol{\sigma} = (\sigma_1,\ldots,\sigma_d)$. The loglikelihood functions of the univariate margins are

$$\ell_{nj}(\mu_j,\sigma_j) = \sum_{i=1}^n \log\phi(x_{ij};\mu_j,\sigma_j), \quad j = 1,\ldots,d,$$

where $\phi(\cdot\,;\mu_j,\sigma_j)$ is the $N(\mu_j,\sigma_j^2)$ density, and $\ell_{njk}(\theta_{jk},\mu_j,\mu_k,\sigma_j,\sigma_k)$, $1 \le j < k \le d$, denotes the loglikelihood of the $(j,k)$ bivariate margin. Thus the IFS is

$$\Psi_{\mathrm{IFS}} = \Big(\frac{\partial\ell_n}{\partial\mu_1},\ldots,\frac{\partial\ell_n}{\partial\mu_d},\frac{\partial\ell_n}{\partial\sigma_1},\ldots,\frac{\partial\ell_n}{\partial\sigma_d},\frac{\partial\ell_n}{\partial\theta_{12}},\ldots,\frac{\partial\ell_n}{\partial\theta_{d-1,d}}\Big)',$$

where $\ell_n = \ell_n(\boldsymbol{\mu},\boldsymbol{\sigma},\Theta)$ is the full loglikelihood, and the IFM is

$$\Psi_{\mathrm{IFM}} = \Big(\frac{\partial\ell_{n1}(\mu_1,\sigma_1)}{\partial\mu_1},\ldots,\frac{\partial\ell_{nd}(\mu_d,\sigma_d)}{\partial\mu_d},\frac{\partial\ell_{n1}(\mu_1,\sigma_1)}{\partial\sigma_1},\ldots,\frac{\partial\ell_{nd}(\mu_d,\sigma_d)}{\partial\sigma_d},\frac{\partial\ell_{n12}(\theta_{12},\mu_1,\mu_2,\sigma_1,\sigma_2)}{\partial\theta_{12}},\ldots,\frac{\partial\ell_{n,d-1,d}(\theta_{d-1,d},\mu_{d-1},\mu_d,\sigma_{d-1},\sigma_d)}{\partial\theta_{d-1,d}}\Big)'.$$

It is known that $\Psi_{\mathrm{IFS}}$ and $\Psi_{\mathrm{IFM}}$ lead to the same estimates; see for example Seber (1984). •

Example 2.16 Let $\mathbf{Y}_1,\ldots,\mathbf{Y}_n$ be $n$ iid rv's from the multivariate Poisson model with multinormal copula in Example 2.11. The loglikelihood functions of margins for the parameters $\boldsymbol{\lambda}$ and $\Theta$ are

$$\ell_{nj}(\lambda_j) = \sum_{i=1}^n \log P_j(y_{ij}), \quad j = 1,\ldots,d,$$

$$\ell_{njk}(\theta_{jk},\lambda_j,\lambda_k) = \sum_{i=1}^n \log P_{jk}(y_{ij}y_{ik}), \quad 1 \le j < k \le d,$$

where $P_j(y_{ij}) = \lambda_j^{y_{ij}}\exp(-\lambda_j)/y_{ij}!$ and

$$P_{jk}(y_{ij}y_{ik}) = \Phi_2(\Phi^{-1}(b_{ij}),\Phi^{-1}(b_{ik});\theta_{jk}) - \Phi_2(\Phi^{-1}(a_{ij}),\Phi^{-1}(b_{ik});\theta_{jk}) - \Phi_2(\Phi^{-1}(b_{ij}),\Phi^{-1}(a_{ik});\theta_{jk}) + \Phi_2(\Phi^{-1}(a_{ij}),\Phi^{-1}(a_{ik});\theta_{jk}),$$

where $a_{ij} = G_j(y_{ij}-1)$, $b_{ij} = G_j(y_{ij})$, $a_{ik} = G_k(y_{ik}-1)$ and $b_{ik} = G_k(y_{ik})$, with $G_j(y_{ij}) = \sum_{x=0}^{y_{ij}}P_j(x)$ and $G_k(y_{ik}) = \sum_{x=0}^{y_{ik}}P_k(x)$. Let $\eta_j = \log(\lambda_j)$. The IFM for $\eta_j$, $j = 1,\ldots,d$, and $\theta_{jk}$, $1 \le j < k \le d$, is

$$\Psi_{\mathrm{IFM}} = \Big(\sum_{i=1}^n\frac{1}{P_1(y_{i1})}\frac{\partial P_1(y_{i1})}{\partial\eta_1},\ldots,\sum_{i=1}^n\frac{1}{P_d(y_{id})}\frac{\partial P_d(y_{id})}{\partial\eta_d},\ \sum_{i=1}^n\frac{1}{P_{12}(y_{i1}y_{i2})}\frac{\partial P_{12}(y_{i1}y_{i2})}{\partial\theta_{12}},\ldots,\sum_{i=1}^n\frac{1}{P_{d-1,d}(y_{i,d-1}y_{id})}\frac{\partial P_{d-1,d}(y_{i,d-1}y_{id})}{\partial\theta_{d-1,d}}\Big)'.$$

For a similar random vector $\mathbf{Y}_i$, $i = 1,\ldots,n$, with a covariate vector $\mathbf{x}_{ij}$ for $\lambda_{ij}$, a possible way to include $\mathbf{x}_{ij}$ is to let $\eta_{ij} = \alpha_j + \boldsymbol{\beta}_j'\mathbf{x}_{ij}$, where $\eta_{ij} = \log(\lambda_{ij})$. We can similarly write down the IFM for the parameters $\alpha_j$, $\boldsymbol{\beta}_j$ and $\theta_{jk}$. •
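For the bivariate case of Example 2.16, the IFM estimates can be computed in two stages: $\hat{\lambda}_j$ from the univariate Poisson likelihoods (the sample means), then $\hat{\theta}_{12}$ by maximizing the bivariate-margin loglikelihood with $\hat{\lambda}_1,\hat{\lambda}_2$ held fixed. A minimal sketch follows; the data are simulated for illustration, and minimize_scalar is just one of several workable optimizers:

```python
import numpy as np
from scipy.stats import norm, poisson, multivariate_normal
from scipy.optimize import minimize_scalar

def pair_pmf(y1, y2, lam1, lam2, theta):
    """P_{12}(y1 y2) as in (2.14): normal-copula rectangle with Poisson margins."""
    mvn = multivariate_normal([0, 0], [[1, theta], [theta, 1]])
    z = lambda lam, y: norm.ppf(np.clip(poisson.cdf(y, lam), 1e-12, 1 - 1e-12))
    b1, a1 = z(lam1, y1), z(lam1, y1 - 1)
    b2, a2 = z(lam2, y2), z(lam2, y2 - 1)
    return (mvn.cdf([b1, b2]) - mvn.cdf([a1, b2])
            - mvn.cdf([b1, a2]) + mvn.cdf([a1, a2]))

rng = np.random.default_rng(6)
y = rng.poisson([2.0, 3.0], size=(300, 2))     # simulated counts for illustration

lam_hat = y.mean(axis=0)                       # stage 1: univariate margins
negll = lambda th: -sum(np.log(pair_pmf(y1, y2, lam_hat[0], lam_hat[1], th))
                        for y1, y2 in y)
theta_hat = minimize_scalar(negll, bounds=(-0.95, 0.95), method='bounded').x
print(lam_hat, theta_hat)                      # IFME of (lambda_1, lambda_2, theta_12)
```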
Molenberghs and Lesaffre (1994) show that the 2d \u00E2\u0080\u0094 1 equations in (2.37) together with ^2 P(yi '\"' yd) = 1 leads to unique nonnegative solutions for P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -yd), (yi, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - ,2/d) G {l,0}d, under some compatibility conditions on the d \u00E2\u0080\u0094 1 and lower-dimensional probabilities. If all these conditions in the Molenberghs-Lesaffre construction are satisfied, we have a well-defined multivariate Bernoulli model. We call this model multivariate M - L binary model. The multivariate M - L binary model is not M U B E , but the parameters rjj and ijjk are P U B E . The special case where rjs = 1 for |5| > 3 is M U B E . Related to the M C D model, it is not clear if there exists a M C D model such that (2.37) is true and under what conditions a M C D model is equivalent to (2.37). The difficulty is to prove there exists a copula, such that (2.37) is a equivalent expression to a M C D model. The existence of a copula is needed in order to properly define this model with covariates (e.g. logistic regression univariate margins). For a discussion of whether the Molenberghs-Lesaffre construction leads to a copula model, see Joe (1996). Nevertheless, (2.37) in terms of Pjl-jq(yj1 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - yjq) certainly defines a multivariate model for binary data for some ranges of the parameters. Chapter 2. Foundation: models, statistical inference and computation 48 Let Y i , . . . , Y n be n iid binary rv's from-a proper multivariate M - L binary model. Assume the parameters of interest are r) = ( 7 7 1 , . . . ,r)d, 7/12,.. . , nd-i,d)' and let 77s be arbitrary for \S\ > 3. The loglikelihood functions of margins for r) are \u00C2\u00AB=i n enjk(Vjk,rij, Vk) = E l o S PjkiyijVik), 1 < j < k < d. Thus the I F M is lT, fd\u00C2\u00A3\u00E2\u0080\u009Ei(r]i) d\u00C2\u00A3nd(rid) d\u00C2\u00A3ni2(m2,m>^2) d\u00C2\u00A3nd-i,d(rid-i,d,Vd-i, * IFM = \u00E2\u0080\u0094 ~ , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, \u00E2\u0080\u0094 * ' a > \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > x For an interpretation of the parameters (2.37), see Joe (1996). \u00E2\u0080\u00A2 Some advantages of I F M approach The I F M approach has many advantages for parameter estimation and statistical inference: 1. The I F M approach for parameter estimation is computationally simpler than estimating all the parameters from the IFS approach. A numerical optimization with many parameters is much more time-consuming (sometimes beyond the capacity of current computers) compared with several numerical optimizations, each with fewer parameters. In some cases, optimization is done with parameters from lower-dimensional margins already estimated (that is, there is some order to the sequence of numerical optimizations). I F M leads to estimates of the parameters of many multivariate nonnormal models efficiently and quickly. 2. A potential problem with the IFS approach is the lack of stability of the solution when there are outliers or perturbations of the data in one or few dimensions. With the I F M approach, we suggest that only the contaminated margins will have such nonrobustness problems. In other words, I F M has some robustness properties in multivariate analysis. It would be interesting to study theoretically and numerically how outliers perturb the IFS and I F M estimates. 3. A large sample size is often needed for a large dimension of the responses. 
This may not be easily satisfied in most applied problems. Rather, sparse data are commonplace when there are multiple responses; these often create problems for M L estimation. By working with the lower dimensional likelihoods, the I F M approach avoids the sparseness problem in multivariate situations to a certain degree; this could be a major advantage in small sample situations. Chapter 2. Foundation: models, statistical inference and computation 49 4. The I F M approach should be robust against some misspecification in the multivariate model. Also some assessment of the goodness-of-fit of the copula can be made after solving part of the estimation equations from I F M , corresponding to parameters of univariate margins. 5. Finally, I F M leads to separate modelling of the relationship of the response with marginal covariates, and the association among the response variables in some situations. This feature can be exploited to shorten the modelling cycle when some quick answer on the marginal behaviour of the covariates is the scientific focus. In the above, we listed some advantages of I F M approach. In the next section, we study the asymptotic properties of I F M approach. The remaining question of efficiency of I F M will be studied in Chapter 4. 2.4 Parameter estimation with I F M and asymptotic results In this section we will be concerned with the asymptotic properties of the parameter estimates from the I F M approach. We will develop in detail the parameter estimation procedure with the I F M approach for a M C D or M M D model with M U B E or with some parameters of the models having P U B E properties. The situations we consider include models with covariates. Sufficient conditions for the consistency and asymptotic normality of I F M E are given. Some theory concerning the asymptotic variance matrix (Godambe information matrix) for the estimates from the I F M approach is also developed. Detailed direct calculations of the Godambe information matrix for the estimates based on the data and fitted models are given. A n alternative computational approach, namely the jackknife method, for the estimation of the Godambe information matrix is given in section 2.5. This technique has considerable importance because of its practical usefulness (See Chapter 5). Later in section 2.6, we will propose computational algorithms, which are based on I F M , for the parameter estimation where common parameters appear in different inference functions of margins. 2.4.1 Models with no covariates In this subsection, we confine our discussion to the case of samples of n independent observations from the same distributions. The case of samples of n independent observations from different distributions will be studied in the next subsection. We consider a regular M C D or M M D model in (2.12) P(yi---yd;6), 0 6% (2.39) Chapter 2. Foundation: models, statistical inference and computation 50 where 6 = (di,..., dd, 612, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 8d-\,d)'\u00E2\u0080\u00A2 The model (2.39) is assumed to have M U B E or to have some of its parameters having P U B E properties. 
In general, we assume that 8j (j = l,...,d) is a parameter vector for the j th univariate margin of (2.39) such that Pj(yj) = Pj(yj',8j), and Ojk (1 < j < k < d) is & parameter vector for the (j, k) bivariate margin of (2.39) such that Pjk(yjVk) = Pjk(yj, Vk\ 8j, 8k, 9jk)- The situation for models with higher order (> 2) parameters are similar; the extension of the results here should be straightforward. For the purpose of illustration, and without loss of generality, we assume in the following that 8j and 8jk are scalar parameters. Let Y , Y i , . . . , Y \u00E2\u0080\u009E be iid rv with model (2.39). The loglikelihood functions of margins of 0 are 4 j (9j) = E lo\u00C2\u00A7 Pi(yij) \u00C2\u00BB\u00E2\u0080\u00A2 3 = 1 > \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2) d , 1=1 n tnjk(0j , Ok, Ojk) = E l o S PjkiVij Vik), 1 < j < k < d. (2.40) \u00C2\u00AB=i These expressions can also be rewritten as 4>j (di) = ]C ni (y>)log Pi )' 3 = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, d, {yj} tnjk(0j, 8k,8jk) - njk{yjyk)\ogPjk(yjyk), 1 < j < k < d, {yjVk} (2.41) based on the summary data rij(yj) and rijk(yjyk). In the following we continue to use the expression (2.40) for technical development, for consistency with the case where covariates are present. But (2.41) is a more economic form for computation, and should be used for model fitting whenever possible. Let 1 d. d - - T H < n = f 1 9 P j i y j ) i P j { y j ) . m . . 3 V l \" *' } k ) Pjk(yjyk) d8jk ' . . d e f 1 d j(yij) . 1 < j < k < d, 1pi;jk = 1pi,jk(8j,8k,8jk) d e f 1 dPjkjyijyik) li,..., V d , ^12, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 . , V d - i . d ) ' , and = = ( * \u00E2\u0080\u009E i * B i , * n i 2 , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , 9nd-i,d)', where \u00C2\u00A5\u00E2\u0080\u009E,\u00E2\u0080\u00A2 = \u00C2\u00A3 \" = 1 (j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, d) and 9njk = \u00C2\u00A3 ? = 1 ^.jk (1 < j < k < d). Chapter 2. Foundation: models, statistical inference and computation 51 From (2.40), we derive the I F M for 0 n \u00C2\u00BB = 1 (2.42) i=l Since (2.39) is a regular model, the regularity conditions of Definition 2.11 are also true for the inference functions (2.42). With the I F M approach, an estimate of 0 that we denote by 0 = 0(vi> \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 J Yn) \u00E2\u0080\u0094 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > 0d> 012, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 8d-i,d)' is obtained by solving the following system of nonlinear equations '*\u00E2\u0080\u009E,\u00E2\u0080\u00A2=(), j = l , . . . , d , ^njk =0, 1 < j < k < d. Proper t ies of estimators We start by examining the simple case (particularly, for notation) when d = 2 to illustrate how the consistency and asymptotic normality of 0 can be established. The model (2.39) is now ^(yi.ifc; ft,02,0i2) (2.43) with 0 = (0i, 02,0i2)' G 3J- Without loss of generality, 0i, 02,0i2 are assumed to be scalar parameters. Let the random vector Y = (Yi, Yjj)' and Yj = (Y,i, YJ2)' (i \u00E2\u0080\u0094 1,...,n) be iid with (2.43), and y, y,-be the observed values of Y and Y,- respectively. 
Thus the I F M for 0i,0 2,0i2 are i = l n *r>12 = y~]ll>i;12-(2.44) i=l We denote the true value of 0 = (0i,02,0i2)' by 0O = (0i,o, 02,o, 0i2,o)'- Using Taylor's theorem on (2.44) at 0o to the first order, we have 0 = * \u00E2\u0080\u009E i(0i) = *\u00E2\u0080\u009E i(0i , o ) + (Oi ~ 0i,o) 0 = * n 2 ( 0 2 ) = *\u00E2\u0080\u009E2(^2,0) + (02 - 02,0) 301 502 0 = \u00C2\u00A5n12(6~12,h,6~2) = *nl2(012,o) + (#12 ~ \u00C2\u00BB12,o) dVnl2 60 12 (2.45) + (0i - 0i,o) 5^\u00E2\u0080\u009E12 30i r , + ( 0 2 - 0 2 , 0 ) ^ Chapter 2. Foundation: models, statistical inference and computation 52 where 0* is some value between 9\ and 01,0, Q\ is some value between 02 and 02,o, and 6** is some vector value between 6 and OQ. Note that \ P ni2 also depends on 0i and 02. Let Hn=Hn{6) = 0 0 0 0 3 \u00C2\u00BB i 9 * 1 . 1 2 a s 3 a * \u00E2\u0080\u009E , Q a \u00C2\u00AB i 2 and D $ = -D*(0) = E{n 1 i 7 \u00E2\u0080\u009E } . Since (2.43) is assumed to be a regular model, we have that E(\P\u00E2\u0080\u009E) = 0 and non-singular. On the right-hand side of (2.45), * \u00E2\u0080\u009E i , * n 2 , * \u00E2\u0080\u009E i 2 , d^m/d9u <9#r,2/<902, dVnl2/d912, <9tf\u00E2\u0080\u009Ei 2/<90i and d^ni2/d92 are all sums of independent identical variates, and as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo each therefore converges to its expectation by the strong law of large numbers. The expectations of \?\u00E2\u0080\u009Ei (0 i ,o ) , ^n 2(0 2 ,o), and *ni 2(0i2,o) a r e zero and the expectations of d^ni/dOx, d^n2/d62, d^ni2/d9i2, are non-zero by regularity assumptions. Since all terms on the right-hand side must converge to zero to remain equal to the left-hand sides, we see that we must have (0i \u00E2\u0080\u0094 0i,o), (02 \u00E2\u0080\u0094 02,o) and (0i2 \u00E2\u0080\u0094 0i2,o) converging to zero a s n - > oo, so that 9\, 92 and 0i2 are consistent estimators under our assumptions (for a more rigorous proof along these lines, see Cramer 1946, page 500-504). Now let HI I \ a*\u00E2\u0080\u009E, 0 0 80! \u00E2\u0080\u00A2\ 0 \u00C2\u00AB; 0 8*n,2 0 * n , 12 a\u00C2\u00AB, B\" 682 6\" a \u00C2\u00AB i 2 \ 3** / It follows from the convergence in probability of 6 to 6Q that Hn{0) - Hn(60) >0. Since each element of n~1Hn(6) is the mean of n independent identically-distributed variates, by the strong law of large numbers, it converges with probability 1 to its expectation. This implies that n~1Hn(6o) converges with probability 1 to Dy(0o)- Thus -Hn(6)^D9(60). n Now we rewrite (2.45) in the following form y/n(0 - 00) = n - 7 = [ - * \u00C2\u00BB ( \u00C2\u00AB o ) ] -(2.46) Since 6\ lies between $i and 0i ] O , 02 lies between 02 and 02,o, and 0** lies between 6 and 0O, thus we also have -H^D9(e0). n Chapter 2. Foundation: models, statistical inference and computation 53 Along the same lines as the proof in Theorem 2.1, we see that 4=M0o)-^iV3(O,M*(0o)), where M*(0 O ) = E(tftf'). Applying Slutsky's Theorem to (2.46), we obtain v ^ ( * - 0O)^N3(O, D*\do)M*(6o)(Dy\0o))T), or equivalently Vt(8-0o)\u00C2\u00B0N3(O,Jyl(6o)). Thus we can extend to the following theorem for the I F M E of model (2.43): Theorem 2.3 Consider the model (2.43) and let the dimension ofOj (j = 1, 2) be pj and that of 612 be P12. Let 6 denote the IFME of 6 under the IFM corresponding to (2-44)- Then 0 is a consistent estimator of 0. 
Furthermore, as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 00, y/H(9-e)^NPl+Pa+Pia(0,J^), where J * = J*(0) = D^M^1 , with M * = E{W} and Dj/90j]~2 and the asymptotic covariance of \u00C2\u00A71,92 is ra-^EVi^tE^Vi/^i]-1^^/^]\"1. The asymptotic variance of 0 1 2 is 12 30 12 2 r EV22 + \u00C2\u00A3 E EV? - 2 E E <9Vj_ -1 E 12 [EVi2 -^]+2n 3 = 1 and the asymptotic covariance of 0i2, Oj is E dip 12 E av_i (90J Eyj12ipj \u00E2\u0080\u00A2n E 50* E <9V 12 30* E a_Vj_ 90j [ 99j \ 90~. -1 'M12 90j \ 'diPj d9~ -1 'M\2 [ 90j \ -1 EV1V2 [EV1V2 Furthermore, from the calculation steps leading to the asymptotic variance expression, we can see that 0i, 02 and ,0i2 are ^/n-consistent. Chapter 2. Foundation: models, statistical inference and computation 54 Now we turn to the general model (2.39) where d is arbitrary. As we can see from the detailed development for d \u00E2\u0080\u0094 2, it is straightforward to generalize Theorem 2.3 to the case where d > 3, since in the general situation of the model (2.39), the corresponding I F M are n *nj=X]^\u00C2\u00BB:J' J = 1> \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2><*, t=l \u00C2\u00AEnjk = ^2 1 < J < k < d-t=l In (2.39), 0j (j \u00E2\u0080\u0094 1 , . . . , d) and 9jk (1 < j < k < d) can be scalars or vectors, and in the latter case, ipj(0j) and ipjk(8jk) are function vectors. The asymptotic properties of 0 for a general d is given by the following theorem: Theorem 2.4 Consider the model (2.39) and let the dimension of Oj (j = 1, . . .,d) be pj and that of 9jk (1 < j < k < d) be pjk. Let 0 denote the IFME of 6 under the IFM corresponding to (2.42). Then 0 is a consistent estimator of 6. Furthermore, as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo, we have asymptotically M6-6)^Nq(0,J^), where q = \u00C2\u00A3 ? = 1 pj + J2i\u00C2\u00B0> J * = J*(d) = D^M^D*, with M * = M*(0) = Eg{VW'} and = E{8V/80}. \u00E2\u0080\u00A2 For many models, we have pjk = 1; that is 9jk is a scalar (see Chapter 3). The asymptotic variance-covariance of 6 is expressed in terms of a Godambe information matrix J$(0). Assume 9j (j = 1, . . .,d) and 9jk (1 < j < k < d) are scalars. Let us see how we can calculate the Godambe information matrix J$. We first calculate the matrix M $ and then Since M $ = E(\P\I>'), we only need to calculate its typical components E(ipjtpk) (1 < j, k < d), E(i/>>ipkm) {k < m) where j may be equal to k or m, and E(ipjkiptm) (j < k,l < m), where j may be equal to / and k may be equal to m. For E(ipjtpk), we have \pj(yj) pj(yk) 99j 09k j pik(yjVk) OPj[yj)0Pk{yk) It can be estimated consistently by { s ^ } pi(yj)pk(yk) d9j 89k dPj(ya) 8Pk(yik) l ^ U 1 _ n h i pj(yij)pk(Vik) de{j 89i]k (2.47) Chapter 2. Foundation: models, statistical inference and computation 55 or equivalently by 1 njk(VjVk) 9Pj(yj) dPk(yk) based on the summary data. For the case j = k, we need to replace Pjk(yjUk) by Pj(yj), {yjyk} by {yj} and rijk(yjyk) by nj(yj) in the above expressions. For E(ipjtpkm) (k < m), we have E ( ^ k m ) = E ( 1 dPj(yj)dPkm(ykymy Pj(yj) Pkm(ykym) 96) d9km _ y - Pjkm(yjykym) 9Pj(yj) dPkm(ykym) . ^ , Pj(yj)Pkm(ykym) 96'j d6km \y jykymj It can be estimated consistently by }_ A 1 dPj(yij) 9Pkm(yikyim) (2.48) n Pj(yij)Pkm(yikyim) 96j d6km For the case j = k or j = m, a slight modification on the above expression is necessary. For example for j = k, we need to replace Pjkm(yjykym) by Pjm(yjym), {yjykym} by {%ym} and njkm(yjykym) by rijm(yjym) in the above expressions. For E(tpjkipim) (j < k, I < TO), we have 1 1 dPjk(yjyk) 9Pim(yiymY E ( ^ m ) - E K p . k { y . 
y k ) Plm{yiym) OOjk 99lm y . Pj * i ~ (V< Vk. VIVm) 9Pik(Vi Vk ) dPlm (Mm) ~ ^ , Pjk(yjyk)Pim(yiym) 96jk 99lm {yjykyiymi It can be estimated consistently by I 9Pjk(yijyik) dPim (mmm) i f \" i^i PJk(yijyik)Plm(yiiyim) 99jk 99,\u00E2\u0080\u009E (2.49) For the particular case j \u00E2\u0080\u0094 l,k ^ m (or similarly j = m or k = / or j ^ /, k = m cases), we need to replace Pjkim(yjykyiym) by Pjkm(yjykym), {yjykyiym} by {yjykym} and njklm(yjyky,ym) by njkm(yjykym) in the above expressions. For the particular case j = I, k = m, we need to replace Pjkim(yjykyiym) by Pjk(yjyk), {yjykyiym} by {yjyk} and njkim(yjykyiym) by rijk(yjyk) in the above expressions. Now let us calculate \u00C2\u00A3 ) $ . Since Z)$ = D^(6) is a matrix with (p, q) element E(dipp/89 q) (1 < j, k < q), where xjjp is the pth components of \\u00C2\u00A3 and 9q is the gth component of 6, we only need to calculate its typical component E(8ijjj / 89m) (1 < j,m < d), E(8t}>j/86im) (1 < j < d; 1 < / < ra < Chapter 2. Foundation: models, statistical inference and computation 56 d), F,(dipjk/d6m) (1 < j < k < d; 1 < m < d), and E(dyjjk/dOim) (1 < j < k < d; 1 < / < m < d). Since dipj d9\u00E2\u0080\u009E 1 dPjjyj) dPjjyj) 1 d^Pjjyj) Pj{yj) d9j dOm + Pj(yj) ddjdem ' we have E 1 dPMdPjjyj) 1 d2Pj(yj)\ = i_dPMdPj(yj) It can be estimated consistently.by n PfiVii) d6j 80m (2.50) Because Pj depends only on univariate parameter Oj, thus yjj does not depend on the parameter Oim- So dvjj(0j)/d0im = 0; this also leads to Since we have dipjk _ 1 dPjk(yjyk) dPjk(yjyk) 1 d2Pjk(yjyk) ddm ~ PjMvjVk) dOjk 80m + Pjk(yjVk) d0jkd0m ' E fdjpj\u00C2\u00A3\_ 1 urjkyyj \dOmJ Pjk(yjyk) dOjk 9Pj ( yk) 9Pjk(yjyk) {yjyk} dOr, It can be estimated consistently by _I f 1 dPjk(yjjyik) dPjk(yjjyik) n fr[ P-k(yijVik) dOjk dom (2.51) Similarly, we find Er^ ,=< 80, lm sr 1 fdPjk(yjyk)\ . _ , , 0, otherwise, where E(dyjjk/dOjk) can be estimated consistently by 9Pjk(yijyik) n tt PUVHV\") V dOjk 71 (2.52) 8j8k9jk Chapter 2. Foundation: models, statistical inference and computation 57 2.4.2 Models with covariates In this subsection, we consider extensions of models to include covariates. Under regularity con-ditions, the I F M E for parameters are shown to be consistent and asymptotically normal and the form of the asymptotic variance-covariance matrix is derived. One approach for asymptotic results is given in this subsection, a second approach is given in the next subsection. Our objective is to set forth results as simply as possible and in useful forms; more general theorems for multivariate models could be obtained. Let Y i , . . . , Y \u00E2\u0080\u009E be a sequence of independent random vectors of dimension d, which are denned on the probability measure space (y,A,P(Y;0)), 6 G Sr. C Mq. The marginal probability measure spaces are defined as (yj,Aj,P(Yj\6)) (j = l , . . . , d ) for j th margin, and (yjk,Ajk,P{Yj,Yk\0)) (1 < j < k < d) for the (j, k) bivariate margin and so on. Particularly, the random vectors Y j (i = 1 , . . . 
, n) are assumed to have the distribution P(yn---yid\0i), 0i e% (2.53) where P(yn \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -yid',9i) is a M C D or M M D model with univariate and bivariate expressible (PUBE) parameters 0,- = \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 6%,d, 8i-,i2, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 0\u00C2\u00BB;d-i,d)- We also assume that Oij (j = l,...,d) is the parameter for the j th univariate margin of Y,- such that Pj(yij) depends only on the parameter Oij, and Oi-jk (1 < j < k < d) is the parameter for the (j, k) bivariate margin of Y,- such that Oijk is the only bivariate parameter appearing in Pjk(yijyik)- Oij and Oi-jk can both be vectors, but for the purpose of illustration, and without loss of generality, we assume they are scalar parameters. Furthermore, we assume for i = 1 , . . . , ra, = 9j(<*'jXij), j = 1 ,2, . . . , d, (2.54) Gijk - hjk(Bjkyfijk), 1 < j < k < d, where the functions gj(-) and hjk(-) are usually monotonic increasing (or decreasing) and differen-tiable functions (for examples of the choice of gj(-) and hjk(-) in a specific model, see Chapter 3), called link functions. They link the model parameters to a set of covariates through a functional relationship. In (2.54), otj = (otji, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, ajpj)' is a pj x 1 parameter vector, X j j = (a ; , j i , . . . , Xijpj)1 is a pj x 1 vector of observable covariates corresponding to the response y ; . 3jk = (djki, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Pjkqjk)' is a qjk x 1 parameter vector, and Wijk = (wijki, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, wijkqjk)' is a qjk x 1 vector of observable co-variates corresponding to the response y,-. Usually ^ . pj + J2ji-ju \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2,il>i-jPj)', Vijk = VijkiPjk) - (V't-jtl, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2. 1pi\jkqjk)', *\u00E2\u0080\u009E,\u00E2\u0080\u00A2 = *\u00E2\u0080\u009E,-(\u00C2\u00ABi) = (*\u00E2\u0080\u009E,-!, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , *\u00C2\u00BBj, P i)', Vnjk = VnjkiPjk) - ( * n i t l , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , ^njk.qjj, Chapter 2. Foundation: models, statistical inference and computation 59 where * n j s = Yl\"=i^i-J> (s = and ^ njkt = H)\"=i V'iyJfct (t = l , . . . , ? j jb) . Let tf<;(7) = ( ^ j i C o i i ) ' , . . . , * * \u00C2\u00BB ( 7 ) = E?=i *.-;(7), and we define M \u00E2\u0080\u009E ( 7 ) = n - 1 ^ E ( * i ; ( 7 ) ^ ; ( 7 ) / ) and \u00C2\u00A3 > \u00E2\u0080\u009E ( 7 ) = n ^ E [ . ( 2 . 5 6 ) From (2.55), the I F M for a ; and /? J j t are t=i datj (2.57) and the estimating equations based on I F M are jvnj = o, j = l,...,d, \ * \u00E2\u0080\u009E i t = 0 , 1 < j < k < d. (2.58) With the I F M approach, estimates of a j and 0jk, denoted by otj \u00E2\u0080\u0094 ( S j i , . . .,djiPj)' and /? j jb = (^jibi, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,Pjk,qjk)'', are obtained by solving the nonlinear system of equations (2.58). We next give several necessary assumptions for establishing asymptotic properties of otj and (3jk. A s s u m p t i o n s 2.1 For 1 < j < k < d, 1 < / < m < d and i = 1 , . . . , n, we make the following assumptions: 1. 
(a) For almost all y,- G y and for all 0; G ft, tPi-j, fi-jk, n} n E E PjkivijVik) = o p ( i ) , l 8 = 1 {ya,yik \u00E2\u0080\u00A2 \n} E E l<\"} n E E i;jk\,\jyik) = op(l), \u00C2\u00BB = 1 {yij ; l^ - i i , i l>\" } \u00C2\u00BB = 1 {ya,yik \u00E2\u0080\u00A2 \vi-,jk,j\>n} E E Pjkiytjyik) = op(l), E E pik{yayik) I \u00C2\u00BB = 1 {yn,yik \u00E2\u0080\u00A2 Iviijfc.fc|>n> !=i {ytj.yik \u00E2\u0080\u00A2 \n} Chapter 2. Foundation: models, statistical inference and computation 61 and X) YI 'Pi-Jj PJ ( ) = oP{n2), i = \ {vu \u00E2\u0080\u00A2 \v>i;jj\i;jk,j\i;j,j\,\ i l i f e , f e | ) < n } n E E i ; i k, i|,|v. i. I m, ! m|) no, where n0 is some positive integer value. (bj In the neighborhood of the true 7 defined by B(8) = {y*eg : \"||7* - Til < *}, \u00C2\u00AB5 i 0, 9j(')i 9j('):9j(')i hjk(-), frjfcO and hjk(-) are bounded away from zero. Chapter 2. Foundation: models, statistical inference and computation 62 \u00E2\u0080\u00A2 A s s u m p t i o n 2.3 For all e > 0 and for any fixed vector ||u|| ^ 0, the following condition is satisfied i i j S S ^ i : \u00C2\u00A3 K , ( r . ) fP(Y;r . ) = o. v nwoy ; \u00E2\u0080\u00A2=i{|u'f j(70)|>\u00C2\u00A3(u'Af.(70)u)i/\u00C2\u00BB} \u00E2\u0080\u00A2 Assumptions 2.1 and 2.2 are needed so that we may apply some weak law of large numbers theorem to prove the consistency of the estimates. Assumption 2.3 is needed for applying the central limit theorem for deriving the asymptotic normality of the estimates. These conditions appear to be complicated, but for special cases they are often not difficult to verify. For instance, for the models we will use in Chapter 5, if the covariates are bounded and have full rank in the sense of Assumptions 2.2, with appropriate choice of the link functions, the conditions are almost empty. Related to statistical literature, Bradley and Gart (1962) studied the asymptotic properties of maximum likelihood estimates (MLE's) where the observations are independent but not identically distributed (i.n.i.d.). Hoadley (1971) established general conditions (for cases where the observations are i.n.i.d. and there is a lack of densities with respect to Lebesque or counting measure) under which maximum likelihood estimators are consistent and asymptotically normal. Assumptions 2.1, 2.2 and 2.3 are closely related to the conditions in Bradley and Gart (1962) and Hoadley (1971), but adapted to multivariate discrete models with M P M E property. In particular, the Assumptions 2.1 reflect the uniform integrability concept (for discrete models) employed in Hoadley (1971). Proper t ies of estimators As with the model (2.39) with no covariates, we first develop the asymptotic results for the simplest situation when d \u00E2\u0080\u0094 2 such that Y j = (Yu, Y,-2)' (i = 1 , . . . , n) has the distribution P{ynyi2\0i), (2.59) where 6{ = (0j ; 1, 0j ; 2 , 0j ;i 2). Without loss of generality, 0j ;i, 0j ; 2 , 0 j ; i 2 are assumed to be scalar parameters. We further assume = 9j(ti2 4 l 3)' is a ? 1 2 x 1 vector of covariates corresponding to the response vector y,-, where yj = (yn, yi2) is the observed value of Y j . Theorem 2.5 Consider the model (2.59) together with (2.60) and let y = (ot[, d' 2 , J312)' denote the IFME ofy corresponding to IFM (2.57). Assume y is a q x 1 vector. Under Assumptions 2.1 and 2.2, 7 is a consistent estimator ofy0. 
Furthermore, under Assumptions 2.1, 2.2 and 2.3, as n \u00E2\u0080\u0094> 00, we have ^ - 1 ( 7 - 7 o ) ^ ( 0 , / ) , where An = Dn1/2(y0)M^/2(y0)(Dn 1 / 2 (7o)) T , with Dn(y0) and Mn(y0) defined in (2.56). Proof: Using Taylor's theorem to the first order, we have * \u00E2\u0080\u009E i ( a i ) = * \u00E2\u0080\u009E i ( a i , o ) + ( 0 1 - ari.o) *n2(Q!2) = *n2 (\u00C2\u00AB2 , o ) + ( \u00C2\u00AB 2 . - \u00C2\u00AB2,o) 0 * m dati a-3 * \u00E2\u0080\u009E 2 da2 * n l 2 ( / ? 1 2 ) = *nl 2(/Vo) + (Pl2 ~ Plifi) 3 * \u00E2\u0080\u009E 1 2 12 + (di - a i , 0 ) dati dfi12 + ( d 2 - a2,o) (2.61) V 3 * \u00E2\u0080\u009E 1 2 da2 V where ot\ is some vector value between d i and Qfi,o, &2 is some vector value between d2 and a 2 ] o, and 7** is some vector value between 7 and 70. Note that in (2.61), \Pf,i 2(/?i 2) also depends on d i and d 2 , and ^ ni2(0\2,o) depends on o^o and at2to- Furthermore, in (2.61), we have f = E ^ ; l ( ^ ; l ) ( ^ K x . - l ) ) 2 + ^ ; l (^ ; l )^Kxj 1 ) ]x j 1 xf 1 , i = l ^ f ^ 1 = iyi.4\u00C2\u00B0iMi(\u00C2\u00ABS*>))2 + ^ ; 2 ( ^ ; 2 ) < ( ^ X i 2 ) ] x j 2 X ? T 2> \u00E2\u0080\u00A2=1 = E [ ^ ; i 2 ( ^ ; i 2 ) ( ^ f c ( ^ 2 W i i 2 ) ) 2 + W i i a ^ - i a J ^ O T a W a s M w a a w f ^ , (2.62) 0 7 , 1 2 ,-=i ^ n l 2 Q 9 i 2 ) _ ^ 9a 1 1=1 0 t t n l 2 ( \u00C2\u00A3 l 2 ) = f da2 \u00C2\u00AB=i cVt;l2(#i;l2) / , ; x , / / j 0 T \ 'gg ' ' 9j(. - ^ ( a 2 x \u00C2\u00AB ) ^ i ( i 8 i 2 W i i 2 ) 90j;: Wj^Xj^ , Wj l2x f 2 , Chapter 2. Foundation: models, statistical inference and computation 64 and 3 * \u00E2\u0080\u009E i ( o i ) / a\u00C2\u00AB2 = 0, 0 * \u00E2\u0080\u009E i ( a i ) / d 0 1 2 = 0, d^n2{a2)/da1 = 0 and d*n2(a2)/d/312 = 0. To establish the consistency and asymptotic normality of y, note that with Assumptions 2.1, we have that n- 2 E( p O and n~11'ni2(/?12|0)-'-P0. Since ^ni(Q!i) = 0, \Pn2(<*2) = 0 and ^n\2(S\2) = 0, by following similar arguments as for the consistency of 6 in the model (2.39), we establish the consistency of y. Now let *n(7) / *\u00C2\u00BB l (\u00C2\u00AB l ) \ * n 2 ( \u00C2\u00AB 2 ) V*nl2(^12) / Hn{y) = I 9 Q t i x aa. 0 0 0 dOtQ and / 9 t t i 0 aa 3 (2.61) can be rewritten in the following matrix form T* 0 0 \ Vn(y - y0) = -HI h*n(7o)]. (2.63) It follows from the convergence in probability of 7 to y0 that \u00C2\u00B1[Hn(y) - Hn(y0)]^0. Since each element of n~1Hn(y0) is the mean of n independent variates, under some conditions for the law of large numbers, we have 1 -Hn(y0)-Dn(y0)^0, (2.64) where Dn(y0) = n- 1 E{i?\u00E2\u0080\u009E(7 0 )}. Assumptions 2.1 and 2.2 imply that lim ra-2Var(#\u00E2\u0080\u009E(70)) = 0. (2.65) Chapter 2. Foundation: models, statistical inference and computation 65 Thus by Markov's weak law of large numbers, we derive that ^H*n - Dn(y0)^0. (2.66) Next, we note that ^ n (7o) involves independent centered summands. Therefore, we may directly appeal to the Lindberg-Feller central limit theorem to establish its asymptotic normality. From Assumption 2.3, by the Lindberg-Feller central limit theorem, we have \" ' * n ( 7 o ) D (u>Mn(y0)uy/2 Applying Slutsky's Theorem to (2.63), we obtain v ^ ^ 1 ( 7 - 7 o ) ^ \u00C2\u00AB ( 0 , / ) , where An = \u00C2\u00A3 > - 1 / 2 ( 7 0 ) M \u00E2\u0080\u009E 1 / 2 ( 7 o ) ( \u00C2\u00A3 > - 1 / 2 (7o) ) T , and J is a q x q identity matrix. \u00E2\u0080\u00A2 Next we turn to the general model (2.53) where d is arbitrary. With the Assumptions 2.1, 2.2 and 2.3, Theorem 2.5 can be generalized to the case d > 2. 
Compared with the d = 2 situation, the I F M for the general model (2.53) is a system of equations (2.58), which do not introduce any complication in terms of the asymptotic properties for I F M E . Thus we have the following: T h e o r e m 2.6 Consider the general model (2.53) with arbitrary d. Lety denote the IFME ofy under the IFM (2.58). Under Assumptions 2.1 and 2.2, y is a consistent estimator of y. Furthermore, under Assumptions 2.1, 2.2 and 2.3, as n \u00E2\u0080\u0094+ oo, we have v ^ A - 1 ^ - 7 0 ) ^ ( 0 , 7 ) , where An \u00E2\u0080\u0094 \u00C2\u00A3 \u00C2\u00BB \u00E2\u0080\u009E 1 / 2 ( 7 0 ) M \u00E2\u0080\u009E 1 / 2 ( 7 0 ) ( J D \u00E2\u0080\u009E 1 / 2 ( 7 0 ) ) T , with Dn(y0) and Mn(y0) are defined in (2.56) \u00E2\u0080\u00A2 We now calculate the matrix Mn(y) and Dn(y) (or just part of the matrices, depending on the need) corresponding to the I F M for aj and 0jk. For example, suppose a,-; is a parameter appearing in Pj(yij) and a k m is a parameter appearing in Pk(y(k). Then the element of the matrix Mn(y) corresponding to the parameters ctji and a k m can be estimated by 1 \" 1 dPjjyjj) dPk(yik) (2.67) OtjOt* n JT[ Pj(Vij)Pk(yik) daji dak where j may equal to k. If ctji is a parameter appearing in Pj{yij), and Bkms is a parameter appearing in Pkm{yikyim), then the element of the matrix Mn(y) corresponding to the parameters ctji and Bkms can be estimated by l . f 1 dPjjyij) dPkm(yikyim) n ~[ Pj(yij)Pkm(yikyim) da^ dBkms . , (2.68) ajakamBkm Chapter 2. Foundation: models, statistical inference and computation 66 where k < m and j may equal to k or m. Furthermore, if Pjks is a parameter appearing in PjkiyijVik) and fiimt a parameter appearing in Pim(Vi\Vim), then the element of the matrix Mn(y) corresponding to the parameters Pjks and Pimt can be estimated by 1 \" dPjkiyijVik) dPlmjyuVim) . . , (2.69) dr idr l l a Idrm 0 i l k 0 I m n ^ Pjk(Vijyik)Plm(yilVim) df3jk, dPlmt where j < k, I < m and (j, k) = (/, m) is allowed. For the elements of the matrix Dn(y), suppose ctji and ctjm are parameters appearing in Pj(yij)-Then the element of the matrix Dniy) corresponding to the expression d^fnji{aji)/dajm can be estimated by 1 ^ 1 dPjjyjj) dPj(yij) (2.70) \" i^l Pj(y>i) dai< d0Cjm If ctji is a parameter appearing in Pj(vij) and /?jjts is a parameter appearing in PjkiyijVik), then the element of the matrix Dn(y) corresponding to the expression d^nj(ctji)/dPjks is 0. If Pjks is a parameter appearing in PjkiyijVik) and ai is a parameter corresponding to a univariate margin, then the element of the matrix Dn(y) corresponding to the expression diinjksiPjks)/dai can be estimated by 1 \" n t\u00E2\u0080\u0094' dPjkivijyik) dPjkiyijVik) JT[ PjkiVijyik) dfijk* dai (2.71) If Pjks is a-parameter appearing in PjkiyijVik) and /3t is a parameter corresponding to a bivariate margin, then the element of the matrix Dniy) corresponding to the expression d^njksiPjks)/'dPt can be estimated by dPjkiyijVik) dPjkjyijVik) l y ^ 1_ n fri Pjkiyayik) dpjks dpt (2.72) However, as in section 2.5 and the data analysis examples in Chapter 5, it is easier to use the jackknife technique to estimate M\u00E2\u0080\u009Eiy) and Dniy). 2.4.3 Asymptotic results for the models assuming a joint distribution for response vector and covariates The asymptotic developments in subsection 2.4.2 treat x , i , . . . , x*d and w , i 2 , . . . , W j d - i , d as known constant vectors. 
A n alternative would be to consider the covariates as marginal stochastic outcomes of a vector V* , and to consider the distribution of the random vector formed by the response vector Chapter 2. Foundation: models, statistical inference and computation 67 together with the covariate vector. Then the-model (2.53) can be interpreted as a conditional model, i.e., (2.53) is the conditional mass function given the covariate vectors, where the Y t- are conditionally independent. Specifically, let Zs- = I I , i = 1, . . . , n , be iid with distribution F$ belonging to a family T = {F^,6 G A} . Suppose that the distributions F$ possess densities or mass functions (or the mixture of density functions and mass functions) with sup-port Z; denote these functions by /(z; 6). Let 6 = (y' ,r)')', where y = (o^ , . . . , a'd, /3' 1 2 , . . . ,0'd_1 d)' \u00E2\u0080\u00A2 Assume that the conditional distribution of Y t- given V,- = vt-, P(yi;6i\vi = vi), (2.73) is as in (2.53), that is P(yi;0i\'Vi = v,) is a M C D or M M D model with univariate and bivariate expressible parameters (PUBE) Oi = (;r}), Pj k,\i = Pjk (ytj yik )9j (v,-; >?). Thus we obtain a set of loglikelihood functions of margins for 7 t=i >'=i \u00C2\u00BB'=i n n n Injk = ^ l o g P _ , j f c , V i = E l o g P^J / . - j J / j * ) + E f f j ( V ' ; , ? ) ' 1 < J < ^ < rf-(2.75) 8 = 1 i=l s = l Let , .def 1 dPjy Uj,s = Wj,,(Q!j-,) = ^ L L - , J = l,...,d; s = l,...,pj, Fjy OOLjs u>jk,t = Ujk,t(0jk,t)d= 9 ^ k ' V , 1 < j < k < d; t = 1 and \"j.Vi OCtj, Ui-jk,t - ^i;jk,t(0jkt)'=-^:\u00E2\u0080\u0094d^ik'Vi, 1<3 - 2^ P . v 5 a . = 2^ p.(v..\ d a . =2^Vi-jB, J = l,...,d; s = l,...,Pj 8 = 1 8 = 1 3 , X i 3\u00C2\u00B0 8 = 1 r]{y*j' 0 0 ! 1 S i = l 1 3P J(;/, J) njk,t - V * 1 dpjk,vi 1 dPjk(yijyik) L 8 = 1 fr[Pjk,Yi dPjkt fr( Pjk(yijyik) dpjkt jkt, 8 = 1 1 < 3 < k < d; t = l , . . . , q j k , and the estimating equations for aj and 3jk based on I F M are ttnj,s = 0, j - l , . . . , d ; s = l , . . . , p j , (2.76) (2.77) ftnjM = 0, 1 < j < < d ; t = 1 , . . . , qjk. With the I F M approach, estimates oiaj,j = and 8jk, 1 < j < k < d, denoted by aj = (ctji, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2,&j,pj)' a n d Pjk ~ (Pjki, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2>Pjk,qjk)', are obtained by solving the nonlinear system Chapter 2. Foundation: models, statistical inference and computation 69 of equations (2.77). Note that (2.77) is computationally equivalent to (2.57); they both lead to the same numerical solutions (assuming the link functions given in (2.54)). Let fi\u00E2\u0080\u009Ej = (firy.i, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2>tonj,pj)' and \u00C2\u00A3l\u00E2\u0080\u009Ejk = (\u00C2\u00A3lnjk,i, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,^njk,qjk)'\u00E2\u0080\u00A2 Then (2.76) can be rewritten in function vector form { &nj=0, j = l , . . . , d , Vnjk = 0, l < j < k < d . Let Qj = (wj . i , . . . , U j , P j ) ' and Qjk = (ujk,i, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -^jk^J- Let 0 = ( f i i , . . . , Q'd, fi'12>ft'd_M)'. Let M n = E(Qfl ') , Dn = dCl/dy. Under some regularity conditions similar to subsection 2.4.1, consistency and asymptotic normality for y can be established. Basically, the assumptions that we need are those making 0 a regular inference function vector. A s s u m p t i o n s 2.3 1. The support ofZ, Z does not depend on any 6 \u00C2\u00A3 A. 2. Es{Sl) = 0; 3. 
The partial derivative dCl/dy exists for almost every z G Z; 4- Assumed andy areqxl vectors respectively, and their components have subindex j = l , . . . , q . The order of integration and differentiation may be interchanged as follows: Wi JZU^^'^DTI^ = ^ ^ [ W J A Z ; * ) ] D M Z ) ; 5. E(j{QQ'} G M q x q and the q x q matrix Mn = Es{\u00C2\u00A3in'} is positive-definite. 6. The q x q matrix = dd(6)/dy is non-singular. \u00E2\u0080\u00A2 Assumptions 2.3 are equivalent to assuming that Mn(y) in (2.56) is positive-definite and Dn(y) in (2.56) is non-singular for certain n > no, where no is a fixed integer. We have the following theorem Chapter 2. Foundation: models, statistical inference and computation 70 Theorem 2.7 Consider the model (2.73) and let y denote the IFME of y under the IFM (2.77). Under Assumptions 2.3, y is a consistent estimator ofy. Furthermore, as n \u00E2\u0080\u0094\u00C2\u00BB\u00E2\u0080\u00A2 oo, we have asymp-totically where Jn = D^M^1 Da-Proof. Under the model (2.74) and the Assumptions 2.3, the proof is similar to that of Theorem 2.3 We believe this approach for deriving asymptotic properties of an estimate has not appeared in the statistical literature. The assumptions are suitable for an observational study but not an experimental study in which one has control of the v's. Theorem 2.7 is different from Theorem 2.6 in that M n and \u00C2\u00A3>n both depend on the distribution function of V . Nevertheless, because Uij^ = ipij,s and Wijk,t = ipi;jk,t, the numerical evaluation of M n and Mn(y) in (2.59) based on data are the same because only the empirical distribution for ^ is needed. For example, suppose or; is a parameter of Pj(yij) and ctm a parameter of Pk{yik), then the element of the matrix M n corresponding to the parameters a; and ctm can be estimated by which is the same as (2.67). We can similarly obtain (2.68) and (2.69). The same result is true for D n versus Dn(y) in (2.56); they both lead to the same numerical results based on the data. We thus derive the same formulas (2.70)-(2.72) for numerical evaluation of D n -2.5 The Jackknife approach for the variance of I F M E The calculation of the Godambe information matrix based on regular I F M for the models (2.39) and (2.53) is straightforward in terms of symbolic representation. However, the actual computation of the Godambe information matrix requires many derivatives of first and second order, and in terms of computer implementation, considerable programming effort would be required. With this consideration, an alternative jackknife procedure for the calculation of the I F M E asymptotic variance is developed. The jackknife idea is simple, but very useful, especially in cases where the analytical answer is very difficult to obtain or computationally very complex. This procedure has the advantage of general computational feasibility. V ^ ( 7 - 7 0 ) ^ ( 0 , J n 1 ) . and 2.4. \u00E2\u0080\u00A2 Chapter 2. Foundation: models, statistical inference and computation 71 In this section, we show that our jackknife method for calculating the corresponding asymptotic variance matrix of 0 is asymptotically equivalent to the Godambe information matrix. We examine the situation for models with covariates and with no covariates. Our main interest in using the jackknife is to obtain the SE of an estimate and not for bias correction (because for multivariate statistical inference the data set cannot be small), though several results about jackknife parameter estimation are also given. 
The jackknife estimate of variance may be preferred when the appropriate computer code is not available to compute the Godambe information matrix or there are other complications such as the need to calculate the asymptotic variance of a function of an estimator. Some numerical comparisons of the,Godambe information and the jackknife variance estimate based on simulations are reported in Chapter 4. The jackknife procedure is demonstrated to be satisfactory. Some general references about jackknife methods are Quenouille (1956), Miller (1974), Efron and Stein (1981), and Efron (1982), among others. A recent reference on the use of jackknife estimators of variance for parameter estimates from estimating equations is Lipsitz et al. (1994), though their one-step jackknife estimator is not as general as what we are studying here, and their main application is to clustered survival data. In the following, we assume that we can partition the n observations into g groups with m observations each so that n = ra x g, m is an integer. We discuss two situations for applying jackknife idea: leaving out one observation at a time and leaving out more than one observation at a time. 2.5.1 Jackknife approach for models with no covariates Let Y , Y i , . . . , Y \u00E2\u0080\u009E be iid rv's from a regular discrete model P(yi---yd;6), 6 e \u00C2\u00BB in (2.12), and y, y i , . . . , y \u00E2\u0080\u009E be their observed values respectively. Let 9(6) = (tpi(6),...,ipq(6)) be the I F M based on y, = (i>i-i(6),..., ipi-q(6)) be the I F M based on yit and 9n(6) = (\u00C2\u00A5\u00E2\u0080\u009E i (0 ) , . . . , \u00C2\u00A5 \u00E2\u0080\u009E , ( * ) ) be the I F M based on y l t . . . , y \u00E2\u0080\u009E , where 9nj(6) - \u00C2\u00A3 ? = 1 ^;j(0) (j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, \u00C2\u00AB)\u00E2\u0080\u00A2 Let 6 - 0i,..., eq) be the estimate of 6 from 9n(6) = 0. Leave out one observation at a t ime Let be an estimate of 0 based on the same set of inference functions \P\u00E2\u0080\u009E but with the ith observation yt from the data set y1,..., yn deleted, i = 1 , . . . , n. In this situation, we have m = 1 Chapter 2. Foundation: models, statistical inference and computation 72 and g \u00E2\u0080\u0094 n. That is, we delete one group of size 1 each time and calculate the same estimate of 0 based on the remaining n \u00E2\u0080\u0094 1 observations. Let 0,- = nO \u00E2\u0080\u0094 (n \u00E2\u0080\u0094 1)0(\u00C2\u00AB), and 0(.) = X2\"=1 0({)/n. 0,- are called \"pseudo-values\" in the literature. The jackknife estimate of 0 is defined as -!)*(\u00E2\u0080\u00A2)\u00E2\u0080\u00A2 (2-78) i = l The jackknife estimator has the property that it eliminates the order l / n term from a bias of the form E{6) = 0 + u\/n + U2/n2 + \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, where the functions u i,U2, . . . do not depend upon n (see Miller 1974). In fact, the jackknife statistic Oj often has less bias than the original statistic 0. The early version of the jackknife estimate of variance for the jackknife estimator Oj was sug-gested by Tukey (1958). It is defined as Vj = ^ nhf) ~ ~6j)ih ~ h ) T = ^ B * \u00C2\u00AB - *<\u00E2\u0080\u00A2>)(*\u00C2\u00AB> - * 0 ) T - ( 2 J 9 ) In (2.79), the pseudo-values 0,- are treated as if they are independent and identically distributed, disregarding the fact that the pseudo-values 0t- actually depend on the common observations. 
To justify (2.79), Thorburn (1976) proved that under rather natural conditions on the original statistic 0, the pseudo-values are asymptotically uncorrelated. In practice, the pseudo-values have been used to estimate the variance not only of the jackknife estimate 0j, but also 0. But if the bias correction in jackknife estimate is not needed, we propose to use a simpler estimate of the asymptotic variance (matrix) of 0: ^ = D * ( o - W o - * ) T - (2-8\u00C2\u00B0) \u00C2\u00AB = 1 In our context, unless stated otherwise, we always call Vj defined by (2.80) the jackknife estimate of variance. In the following, we first prove that the asymptotic distribution of a jackknife statistic 6j is the same as the original statistic 0 under the same model assumptions; subsequently, we prove that Vj defined by (2.80) is a consistent estimator of inverse of the Godambe information matrix MO). Theorem 2.8 Under the same assumptions as in Theorem 2.4, the jackknife estimate 0j in (2.78) has the same asymptotic distribution as 0. That is, as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo, Mh-0)\u00C2\u00B0Nq(0,M(0)), where J*(0) = D%{0)M^(0)D9(0), with M*(0) = \u00C2\u00A3{tf(0)tf T(0)} and \u00C2\u00A3>*(0) = 89(0)/dO. Chapter 2. Foundation: models, statistical inference and computation 73 Proof. We sketch the proof. For * n (0) = ( t f \u00E2\u0080\u009E i ( 0 ) , * \u00E2\u0080\u009E , ( 0 ) ) : y n x $ ^ M9 has the following expansion around 0 o = = + - 0) + R \u00E2\u0080\u009E , where Hn(0) \u00E2\u0080\u0094 3 * \u00E2\u0080\u009E ( 0 ) / 3 0 is a g x g matrix and R N = 0P(||0 \u00E2\u0080\u0094 0\\2) = O p ( n _ 1 ) by assumptions. Thus Vn~(0 -0)= [-Hn(0)) ^ ( - * \u00E2\u0080\u009E ( \u00C2\u00AB ) - R \u00E2\u0080\u009E ) . Let 4,(i)(0) be ^ n (0) calculated without the ith observation, and H^(0) be the g x q matrix d'9n(0)/d0 calculated without the ith observation. Similarly, we have where Rn_i,< = Op{\\~0(i) - 0\\2) = o ^ n \" 1 ) . Since Vn~(0i -0) = n (yn~(0 - 0 ) - x/^(0(i) - 0)) + y/nffi^ - 0), we have yfrih -ey=n (vn-(0 - 0) -1>/\u00C2\u00BB(*(.\u00E2\u0080\u00A2) -')) + ^ E v^ (*(o - ') \ i=l / i=l Thus Vn(0 j -0) = n + By the Law of Large Numbers, we have and From the central limit theorem, -Hn(0)^D^(0), n \u00E2\u0080\u0094 f f ( O ( 0 ) A \u00C2\u00A3 > , ( * ) . -\u00C2\u00B1=yn(0)ZNq(o,M*). Chapter 2. Foundation: models, statistical inference and computation 74 We further have This, together with the -yn-consistency of 0 (Theorem 2.4), lead to -Hn(0)) 1 - i= ( - * \u00E2\u0080\u009E ( \u00C2\u00AB ) - R \u00E2\u0080\u009E ) ^ T T ; E { ( ^ T F F ( ' ) C ) ) ( \" * ( ' > ( ' l ) \" \" M J \u00C2\u00B0 \" . ( \u00C2\u00B0 . D ; ' M . ( C ; ' ) T ) . Thus by applying Slutsky's Theorem, we obtain ^(ej-0)^Nq(o,j^(0)). \u00E2\u0080\u00A2 Theorem 2.9 Under the same assumptions as in Theorem 2-4, the sample size n times the jackknife estimate of variance Vj defined by (2.80) is a consistent estimator of J^1(0). Proof. We have - *)(*(0 - Of = - *)(*(.\u00E2\u0080\u00A2) - of i=l Recall that i=l L\u00C2\u00BB = l (fl _ fl)T -{0-0) n(O-0)(0-Of. 0-0 = H-1(6)(-9n(O)-Kl), and Chapter 2. Foundation: models, statistical inference and computation 75 where R n = Op{\\9 - 6\\2) = Opin'1) and R \u00E2\u0080\u009E _ l i 8 - = Op(\\6(i) - 6\\2) = Op{n-1). 
Thus J2(h) ~ * ) ( ' \u00C2\u00AB ) -~0)T=J2 H^(6) ( - * ( 0 ( * ) - R n - i , i ) ( -*( , \u00E2\u0080\u00A2)(\u00C2\u00AB) - R n - i , 0 T i = l t'=l n Li=l -1 H~\6) ( - * \u00E2\u0080\u009E ( * ) - R \u00E2\u0080\u009E ) E ^ j M (-*(,)(*)-Rn-i,,) \u00C2\u00AB=i + n / / - 1 ^ ) ( - * \u00E2\u0080\u009E ( * ) - R n) ( - * \u00E2\u0080\u009E ( \u00C2\u00AB ) - R n ) T {H-\e)f . (2.81) As * ( 0 ( t f ) = * \u00E2\u0080\u009E ( * ) - *,,(*), thus E : = i * ( . - ) (\u00C2\u00BB) = ( \u00C2\u00AB \" ! ) * \u00C2\u00BB ( * ) and \u00C2\u00A3 ? = 1 ^oC*)*^*) = (n -2 ) * \u00E2\u0080\u009E ( 0 ) ^ ( 0 ) + \u00C2\u00A3 \" = 1 (*)\u00E2\u0080\u00A2 % t h e L a w of Large Numbers, we have -Hn(B)^*D9(9), n 1 and \u00C2\u00BB=1 From (2.81), we have i=i and this implies that t = l i=l In other words, we proved that nVj is a consistent estimator of J^1(0). \u00E2\u0080\u00A2 Leave out m o r e t h a n one observation at a t ime Now for general g > 1, we assume a random subdivision of y l t . . . , yn into g groups (n = gm). Let 0(\u00E2\u0080\u009E) = 1,...,(/) be an estimate of 0 based on the same set of inference functions \P from the data set y : , . . . , y n but deleting the u-th group of size m (m is fixed), thus is calculated based on a subsample of size m(g \u00E2\u0080\u0094 1). The jackknife estimate of 6 in this general setting is the mean of Bv, which is 1 3 OJ =-52** = g*- (g- i)h)> (2-82) 9 v=\ Chapter 2. Foundation: models, statistical inference and computation 76 where = g 1 YH=i ^0)> a n d ^ v = gO \u00E2\u0080\u0094 (g \u00E2\u0080\u0094 1)6^ {y = 1 , . . . , g) are the pseudo-values. In this situation, the jackknife estimate of variance for 6, Vj, is defined as (2.83) i/=i Theorem 2.8 and 2.9 can be easily generalized to the situation with fixed m > 1. Theorem 2.10 Under the same assumptions as in Theorem 2.4, the jackknife estimate 6j defined by (2.82) with m fixed has the same asymptotic distribution as 6. That is, as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo (thus g \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo^, Vn-(0j-6)\u00C2\u00B0Nq(O,J*\0)), where Jy{6) = D%{6)M^l(6)Diil{6), with M*(0) = Ee{9{6)9T (6)} and D*(6) = 89(6)/86. Proof. We sketch the proof. Let \P(\u00E2\u0080\u009E)(0) be 9n(6) calculated without the l/th group, and H^(6) be the q x q matrix 89n(6)/d6 calculated without the uth group. We have - l \/n \u00E2\u0080\u0094 m(0(v) - 6) = -H{v){6) 1 ( - * ( \u00E2\u0080\u009E ) ( * ) - R \u00E2\u0080\u009E - m , \u00E2\u0080\u009E ) where R n _ m , \u00E2\u0080\u009E = Op{\\6{v) - 6\\2) - Op(n-1). Recall that - 6) = g (yg(6 -6)- ^{6{v) - 6)) + ^{6{l/) - 6), we thus have y/g&j -6) = g (jg-{6 -6)-l-J2 - 9)) + - E v W w - 6) \ 9 v = l ) 9 v = l this implies that 9 1 yft(h -6) = g [^{6 - 6 ) - ]f^1~ E Vn~^(6(v) - 6)^ + yj^~ E v^=^(0 ( l O - *). Thus Vn~{0j -6) = g ^ ( ^ ' ^ ( - \u00E2\u0080\u00A2 n W - R n ) -+ (2.84) Chapter 2. Foundation: models, statistical inference and computation 77 By the Law of Large Numbers, we have and From the central limit theorem, We also have \u00C2\u00B1Hn(6)^D*(6), n 1 H(l>)($)**D9(0). n \u00E2\u0080\u0094 m 4=*(0)-^ W,(O,M\u00C2\u00BB). This, together with the \/n-consistency of 6 (Theorem 2.4), lead to -#\u00E2\u0080\u009E(*)) - L ( _ * \u00E2\u0080\u009E ( \u00C2\u00AB ) _ R \u00E2\u0080\u009E ) ' i j E { ( s r ^ ' w w ) \" ^ (-t(.,(\u00C2\u00BB) - a.-.,)} \ ^ \u00C2\u00A3 { ( ^ \" \u00C2\u00AB w ) T T = (-*\u00C2\u00AB(') - )} -\".0. V \u00C2\u00AB * W)T). Applying Slutsky's Theorem to (2.84), we obtain ^(8j-6)^Nq(0,J^(6)). 
\u00E2\u0080\u00A2 Theorem 2.11 Under the same assumptions as in Theorem 2.4, the sample size n times the jack-knife estimate of variance Vj defined by (2.83) is a consistent estimator of J^1(6), when m is fixed and g \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo. Proof: We use the same notation as in the proof of Theorem 2.10. We have 9 9 - 6)(6{v) - 6f = - 6)(6{v) - 6? vzzl I / = l (6 - 6)T - ( 6 - 6 ) + g(6-6)(6-6f Chapter 2. Foundation: models, statistical inference and computation 78 and Thus *(\u00E2\u0080\u009E) - 0 = H(V)(0) ( - * ( \u00E2\u0080\u009E > ( * ) - R \u00E2\u0080\u009E _ m , \u00E2\u0080\u009E ) . y y r - ~0)(O{V) -6)T = J2 H^){0) ( - * ( \u00E2\u0080\u009E ) ( \u00C2\u00AB ) - R \u00E2\u0080\u009E _ m , \u00E2\u0080\u009E ) ( - * ( \u00E2\u0080\u009E ) ( ' ) - R n - m , , ) T i / = i 9 ( - * \u00E2\u0080\u009E ( * ) - R n f ( / / n \" 1 ^ ) ) 5 \u00C2\u00A3/frU^(*(*)*TW)-ra v From (2.85), we have D*oo - - *)T - E w\u00C2\u00AB: ( ^ i ^ i / = i which implies that \u00C2\u00ABE(*(\") \" W \u00C2\u00AB 0 - *)TI*D*(B)-1M9(B)(D9(B)-1)T. 1=1 In other words, we proved that n ^f_ 1( f l( 1/) - 0)(9(v) ~ ^ ) T is a consistent estimator of 1 (0), when m is fixed and # \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo. \u00E2\u0080\u00A2 Chapter 2. Foundation: models, statistical inference and computation 79 The main motive for the leave-out-more-than-one-observation-at-a-time approach is to reduce the amount of computation required for the jackknife method. For large samples, it may be helpful to make an initial random grouping of the data by randomly deleting a few observations, if necessary, into g groups of size ra. The choice of the number of groups g may be based on the computation costs and the precision or accuracy of the resulting estimators. As regards of computation costs, the choice (m,g) = ( l ,n) is most expensive. For large samples, g = n may not be computationally feasible, thus some values of g less than n may be preferred. The grouping, however, introduces a degree of arbitrariness, a problem not encountered when g \u00E2\u0080\u0094 n. This results in an analysis that is not uniquely defined. This is generally not a problem for SE estimation for application purposes, as usually then a rough assessment is sufficient. As regards the precision of the estimators, when the sample size n is small to moderate, the choice (m,g) = ( l ,n) is preferred. See Chapter 5 for examples. 2.5.2 J a c k k n i f e f o r a f u n c t i o n o f 0 The jackknife method can also be used for estimates of functions of parameters, such as the asymp-totic variance of P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - yd', 0) for a M C D or M M D model. The usual delta method requires partial derivatives of the function with respect to the parameters, and these may be very difficult to obtain. The jackknife method eliminates the need for these partial derivatives. In the following, we present some results on the jackknife method for functions of 0. Suppose h(0) = (bi(0),..., bs(6))' is a vector-valued function defined on 5ft and taking values in s-dimensional space. We assume that each component function of b, bj(-) (j = 1,... .,s), is real valued and has a differential at 0Q, thus b has the following expansion as 0 \u00E2\u0080\u0094\u00E2\u0080\u00A2 0o'. b(0) = b(6o) + (0-Oo)(j^y+ o(\\0-0o\^ (2.86) where db/d0'o \u00E2\u0080\u0094 (db/d0')\Q_Qo is of rank t = min(s, q). 
By Theorem 2.4, 0 has an asymptotic normal distribution Similarly by Theorem 2.8 and 2.10, 0j has an asymptotic normal distribution in the sense that Vn-(0j- 0)\u00C2\u00B0Nq(0, J * 1 ) . We have the following results for b(0) and b(0j): Chapter 2. Foundation: models, statistical inference and computation 80 T h e o r e m 2.12 Let b be as described above and suppose (2.86) holds. Under the same assumptions as in Theorem 2.4, b(0) has the asymptotic distribution given by Proof. See Serfling (1980, Ch.3). \u00E2\u0080\u00A2 T h e o r e m 2.13 Let b be as described above and suppose (2.86) hold. Under the same assumptions as in Theorem 2.4, b(6j) has the asymptotic distribution given by Proof. See Serfling (1980, Ch.3). \u00E2\u0080\u00A2 As in the previous subsection, let t9(\u00E2\u0080\u009E) be the estimator of 0 with the i/-ih group of size m deleted, v = 1 , . . . , g. We define the jackknife estimate of variance of h(0), which we denote by Vjb, as VJb = J2 (*>(*(,)) - b (*)) ( b ( ^ ) ) - b ( * ) f \u00E2\u0080\u00A2 (2-87) i/=i We have the following theorem. T h e o r e m 2.14 Let b be as described above and suppose (2.86) holds. Under the same assumptions as in Theorem 2.4, the sample size n times the jackknife estimate of variance Vj\> defined by (2.87) is a consistent estimator of dB'JJ* \d6'J \u00E2\u0080\u00A2 Proof. The proof is similar to that of Theorem 2.11, and thus omitted here. \u00E2\u0080\u00A2 To carry out the above computational results related to the estimates of functions of parameters, it would be desirable to maintain a table of the parameter estimates for the full sample and each jackknife subsample. Then one can use this table for computing estimates of one or more functions of the parameters, and their corresponding SEs. The results in Theorems 2.12, 2.13 and 2.14 have immediate applications. One example is given next. E x a m p l e 2.18 For a M C D or M M D model in (2.12), say P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -yd]6), we could apply the above results to say something about the asymptotic behaviour of P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ya',6) and P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - yd',6j). From Theorems 2.12 and 2.13, we derive that as n \u00E2\u0080\u0094\u00E2\u0080\u00A2 oo Chapter 2. Foundation: models, statistical inference and computation 81 and yfrWvi \u00E2\u0080\u00A2 y d ; h ) - P(V1 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2\u00E2\u0080\u00A2yd;0))$N (o, J*1 {^f^j . Furthermore, by Theorem 2.14, we obtain a consistent estimator of (dP/d0')J^1(dP/d0')T, i.e. g 2 n \u00C2\u00A3 {p(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - \u00E2\u0080\u00A2 '\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ')) \u00E2\u0080\u00A2 i/=i Also see Chapter 5 for direct application in data analysis. O 2 .5 .3 J a c k k n i f e a p p r o a c h f o r m o d e l s w i t h c o v a r i a t e s Suppose we have the model defined by (2.53) and (2.54). Let (v = 1,..., g) be an estimate of 7 based on the same set of inference functions ^ n (7) from the data set y l t . . . , y n but deleting the f - th group of size m (m is fixed). The jackknife estimate off is 1 9 T ; = - D \" = ^ - ( \u00C2\u00AB - % ' ( 2- 8 8) 9 v = l where 7 ( . } = 1/g \u00C2\u00A3 * = 1 7 ( l / ) , and yv = gy - (g - l ) 7 ( l / ) (v=l,...,g). We define the jackknife estimate of variance Vj:y for y as follows: ^ 7 = E ( 7 W - 7 ) ( 7 W - 7 ) T . . 
Under the assumptions used to establish Theorem 2.6, and in parallel with Theorems 2.10 and 2.11, we have the following theorems for models with covariates. The proofs are generalizations of the proofs of Theorems 2.5, 2.6, 2.10 and 2.11; we omit the proofs and only state the results.

Theorem 2.15 Consider the general model (2.53) with d arbitrary. Let γ̂ denote the IFME of γ under the IFM corresponding to (2.57). Under Assumptions 2.1, 2.2 and 2.3, the jackknife estimator γ̂_J defined by (2.88) is asymptotically normal in the sense that, as n → ∞,

√n A_n^{-1} (γ̂_J − γ_0) → N(0, I) in distribution,

where A_n = [ D_n^{-1}(γ_0) M_n(γ_0) (D_n^{-1}(γ_0))' ]^{1/2}, and D_n(γ_0) and M_n(γ_0) are defined by (2.56). □

Theorem 2.16 Under the same assumptions as in Theorem 2.7, we have

n V_{J,γ} − D_n^{-1}(γ_0) M_n(γ_0) (D_n^{-1}(γ_0))' → 0 in probability,

where V_{J,γ} is the jackknife estimate of variance defined by (2.89), and D_n(γ_0) and M_n(γ_0) are defined by (2.56). □

Theorem 2.17 Let b be as described in subsection 2.5.2 and suppose (2.86) holds. Under the same assumptions as in Theorem 2.7, b(γ̂), a function of the IFME γ̂, has the asymptotic distribution given by

√n B_n^{-1} ( b(γ̂) − b(γ) ) → N_t(0, I) in distribution,

where B_n = [ (∂b/∂γ_0') D_n^{-1}(γ_0) M_n(γ_0) (D_n^{-1}(γ_0))' (∂b/∂γ_0')' ]^{1/2}, and D_n(γ_0) and M_n(γ_0) are defined by (2.56). □

Theorem 2.18 Let b be as described in subsection 2.5.2 and suppose (2.86) holds. Under the same assumptions as in Theorem 2.7, b(γ̂_J), a function of the jackknife estimate γ̂_J derived from γ̂, has the asymptotic distribution given by

√n B_n^{-1} ( b(γ̂_J) − b(γ) ) → N_t(0, I) in distribution,

where B_n = [ (∂b/∂γ_0') D_n^{-1}(γ_0) M_n(γ_0) (D_n^{-1}(γ_0))' (∂b/∂γ_0')' ]^{1/2}, and D_n(γ_0) and M_n(γ_0) are defined by (2.56). □

We define the jackknife estimate of variance of b(γ̂), V_{J,b}, as follows:

V_{J,b} = Σ_{ν=1}^g ( b(γ̂_(ν)) − b(γ̂) ) ( b(γ̂_(ν)) − b(γ̂) )'.   (2.90)

Theorem 2.19 Let b be as described in subsection 2.5.2 and suppose (2.86) holds. Under the same assumptions as in Theorem 2.6, we have

n V_{J,b} − (∂b/∂γ_0') D_n^{-1}(γ_0) M_n(γ_0) (D_n^{-1}(γ_0))' (∂b/∂γ_0')' → 0 in probability,

where V_{J,b} is the jackknife estimate of variance defined by (2.90), and D_n(γ_0) and M_n(γ_0) are defined by (2.56). □

2.6 Estimation for models with parameters common to more than one margin

One potential application of the MCD and MMD models is to longitudinal or repeated measures studies with short time series, in which the interest may be in how the distribution of the response changes over time. Some common characteristics, which carry over time, may appear in the form of common regression parameters or common dependence parameters. There are also general situations in which the same parameters appear in more than one margin. This happens with the MCD and MMD models, for example, when there is a special dependence structure in the copula C, such as in the multinormal copula (2.4), where Θ = (θ_jk) is an exchangeable correlation matrix with all correlations equal to θ, or Θ is an AR(1) correlation matrix with the (j, k) component equal to θ^{|j−k|} for some θ.

Example 2.19 Suppose the d-variate binary vector Y_i, with a covariate x_ij for the jth univariate margin, can be represented as Y_ij = I(Z_ij ≤ α_j + β_j x_ij), i = 1, ..., n, where Z_i = (Z_i1, ..., Z_id)' ~ N(0, Θ).
This is a multivariate probit model. Assume β_j ≡ β; then the common regression coefficient appears in more than one margin. We could estimate β from the d univariate margins based on the IFM approach, but then we have d estimates of β. Taking any one of them as the estimate of β evidently results in some loss of information. Can we pool the information together to get a better estimate of β? The same question arises for the correlation matrix Θ. Assume there is no covariate for Θ. When Θ has certain special forms, for example exchangeable or AR(1), the same parameter appears in d(d−1)/2 bivariate margins. Can we get a more efficient estimate from the IFM approach? There are also situations where a parameter is common to only some margins, such as θ_12 = θ_23 = ··· = θ_{d−1,d} in Θ. The same question about obtaining a more efficient estimate arises. □

A direct approach for common parameter estimation is to use the likelihood of a higher-order margin, if this is computationally feasible. Otherwise, the IFM approach for model fitting can be applied. With the IFM approach, appropriately taking into account the information about common parameters can improve the efficiency of the parameter estimates. Analytical and numerical evidence supporting this claim is given in Chapter 4 for the two approaches to information pooling for IFM that we propose here. The first approach, called the weighting approach (WA), forms a new estimate based on some weighting of the estimates of the same parameter from different margins. A special case is the simple average. The second approach, called the pool-marginal-likelihoods approach (PMLA), rewrites the inference functions of margins under the assumption that the same parameter appears in several margins. In the following, we outline the two estimating approaches in general terms.

2.6.1 Weighting approach

WA is a method for obtaining an efficient estimate based on a weighted average of different estimates of the same parameter. We state this approach in general terms. Assume γ̂_1, ..., γ̂_q are estimates of the same parameter γ, but from different inference functions. Let γ̂ = (γ̂_1, ..., γ̂_q)', and let Σ_γ be the asymptotic variance-covariance matrix of γ̂ based on the Godambe information matrix. One relatively efficient estimate of the parameter γ is based on the following result, which obtains easily from the method of Lagrange multipliers.

Result 2.1 Suppose X is a q-variate random vector with mean vector μ_X = (μ, ..., μ)' = μ1 and Var(X) = Σ_X, where μ is a scalar and 1 = (1, ..., 1)'. A linear unbiased estimate u'X of μ has the smallest variance when u = Σ_X^{-1}1 / (1'Σ_X^{-1}1). □

Applying the above result to our problem, the resulting estimate of γ is

γ̃ = 1'Σ_γ^{-1}γ̂ / (1'Σ_γ^{-1}1).   (2.91)

If the γ̂_j are consistent estimates of γ, then γ̃ is also a consistent estimate of γ, and it has smaller asymptotic variance than any of the individual estimates of γ from one particular inference function. The asymptotic variance of γ̃ is

σ_γ̃² = 1 / (1'Σ_γ^{-1}1).   (2.92)

A computationally simpler but slightly less efficient estimate of γ is

γ̃ = 1'[diag(Σ_γ)]^{-1}γ̂ / (1'[diag(Σ_γ)]^{-1}1),   (2.93)

and an approximation of the asymptotic variance of γ̃ from (2.93) is [1'(diag(Σ_γ))^{-1}1]^{-1}.
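As a small illustration of Result 2.1 and the estimates (2.91)-(2.92), the sketch below computes the optimally weighted estimate; the inputs `gammas` (the q margin-wise estimates) and `sigma` (their Godambe-based asymptotic covariance matrix) are assumed to come from the IFM fits.

```python
# A minimal sketch of the weighting approach (WA): combine q margin-wise
# estimates of a common parameter with the optimal weights
# u = Sigma^{-1} 1 / (1' Sigma^{-1} 1) of Result 2.1.
import numpy as np

def weighted_estimate(gammas, sigma):
    one = np.ones(len(gammas))
    w = np.linalg.solve(sigma, one)          # Sigma^{-1} 1
    gamma_tilde = w @ gammas / (w @ one)     # optimal combination, cf. (2.91)
    avar = 1.0 / (w @ one)                   # asymptotic variance, cf. (2.92)
    return gamma_tilde, avar
```

Replacing `sigma` by diag(Σ_γ) gives the computationally simpler estimate (2.93); replacing it by the identity matrix gives the simple average.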
2.6.2 Pool-marginal-likelihoods approach

Suppose a parameter λ is common to all the univariate margins and a parameter ρ is common to all the bivariate margins. Then the pooled loglikelihood of margins

ℓ(λ, ρ) = Σ_{i=1}^n Σ_{j=1}^d log P_j(y_ij; λ) + Σ_{i=1}^n Σ_{j<k} log P_jk(y_ij y_ik; ρ)   (2.94)

is an example of PMLA. The inference functions of margins from (2.94) corresponding to λ and ρ are

Ψ_1 = Σ_{i=1}^n Σ_{j=1}^d (1/P_j(y_ij)) ∂P_j(y_ij)/∂λ,   Ψ_2 = Σ_{i=1}^n Σ_{j<k} (1/P_jk(y_ij y_ik)) ∂P_jk(y_ij y_ik)/∂ρ,

and the estimating equations based on IFM are Ψ_1 = 0 and Ψ_2 = 0. For the Godambe information, write ψ_1 = Σ_{j=1}^d (1/P_j(y_j)) ∂P_j(y_j)/∂λ and ψ_2 = Σ_{j<k} (1/P_jk(y_j y_k)) ∂P_jk(y_j y_k)/∂ρ for a single observation (y_1, ..., y_d). We have

E(ψ_1²) = Σ_{y_1···y_d} P(y_1 ··· y_d) ( Σ_{j=1}^d (1/P_j(y_j)) ∂P_j(y_j)/∂λ )².

It can be estimated consistently by

(1/n) Σ_{i=1}^n ( Σ_{j=1}^d (1/P_j(y_ij)) ∂P_j(y_ij)/∂λ )² evaluated at λ̂.   (2.97)

Similarly, we have

E(ψ_2²) = Σ_{y_1···y_d} P(y_1 ··· y_d) ( Σ_{j<k} (1/P_jk(y_j y_k)) ∂P_jk(y_j y_k)/∂ρ )²,

E(ψ_1 ψ_2) = Σ_{y_1···y_d} P(y_1 ··· y_d) ( Σ_{j=1}^d (1/P_j(y_j)) ∂P_j(y_j)/∂λ ) ( Σ_{j<k} (1/P_jk(y_j y_k)) ∂P_jk(y_j y_k)/∂ρ ).

For E(∂ψ_1/∂λ), since

∂ψ_1/∂λ = Σ_{j=1}^d [ −(1/P_j²(y_j)) ( ∂P_j(y_j)/∂λ )² + (1/P_j(y_j)) ∂²P_j(y_j)/∂λ² ],

we have

E(∂ψ_1/∂λ) = Σ_{y_1···y_d} P(y_1 ··· y_d) Σ_{j=1}^d [ −(1/P_j²(y_j)) ( ∂P_j(y_j)/∂λ )² + (1/P_j(y_j)) ∂²P_j(y_j)/∂λ² ].

Similarly, we find

E(∂ψ_2/∂ρ) = Σ_{y_1···y_d} P(y_1 ··· y_d) Σ_{j<k} [ −(1/P_jk²(y_j y_k)) ( ∂P_jk(y_j y_k)/∂ρ )² + (1/P_jk(y_j y_k)) ∂²P_jk(y_j y_k)/∂ρ² ].

Consistent estimates of E(ψ_2²), E(ψ_1 ψ_2), E(∂ψ_1/∂λ) and E(∂ψ_2/∂ρ) can be written similarly to (2.97).

2.6.3 Examples

We give two examples of WA and PMLA.

Example 2.20 1. A trivariate probit model with exchangeable dependence structure: Suppose we have a trivariate probit model with known cut-off points and P(111) = Φ_3(0, 0, 0; ρ, ρ, ρ). It can be shown (see Example 4.4) that the asymptotic variance of ρ̂ from one bivariate margin is [(π² − 4(sin^{-1}ρ)²)(1 − ρ²)]/4n, and the asymptotic variance of ρ̂ from WA or PMLA is [(1 − ρ²)(π + 6 sin^{-1}ρ)(π − 2 sin^{-1}ρ)]/12n. The ratio of the former to the latter is [3(π + 2 sin^{-1}ρ)]/[π + 6 sin^{-1}ρ], which decreases from ∞ to 1.5 as ρ increases from −0.5 to 1. In this example, the optimal weighting is equivalent to a simple average (see Example 4.4 for details).

2. A trivariate probit model with AR(1) dependence structure: Suppose we have a trivariate probit model with known cut-off points, such that P(111) = Φ_3(0, 0, 0; ρ, ρ², ρ). Let σ_w² be the asymptotic variance of ρ̂ from WA, σ_p² the asymptotic variance of ρ̂ from PMLA, and σ_12² the asymptotic variance of ρ̂ from the (1, 2) margin. The ratio σ_12²/σ_w² is greater than or equal to 1, with a maximum value of 1.0391 attained at ρ = 0.3842 and ρ = −0.3842. □
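The efficiency ratio quoted in part 1 of Example 2.20 is easy to examine numerically; the following quick check of the formula uses illustrative values of ρ only.

```python
# A quick numerical check of the efficiency ratio of one bivariate margin
# versus WA/PMLA in the exchangeable trivariate probit model of Example 2.20:
# ratio(rho) = 3 (pi + 2 asin(rho)) / (pi + 6 asin(rho)).
import numpy as np

def efficiency_ratio(rho):
    return 3 * (np.pi + 2 * np.arcsin(rho)) / (np.pi + 6 * np.arcsin(rho))

for rho in (-0.4, 0.0, 0.5, 0.99):
    print(rho, efficiency_ratio(rho))   # decreases towards 1.5 as rho -> 1
```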
2.7 Numerical methods

With the IFM approach, estimation of the model parameters reduces to maximizing loglikelihoods of margins such as ℓ_nj(λ_j) in (2.98) and ℓ_njk(θ_jk) in (2.100), or equivalently to solving the corresponding estimating equations, such as

Ψ_njk(θ_jk) = Σ_{i=1}^n (1/P_jk(y_ij y_ik)) ∂P_jk(y_ij y_ik)/∂θ_jk = 0, 1 ≤ j < k ≤ d.   (2.101)

Newton-Raphson method

The traditional approach for numerical optimization or root-finding is the Newton-Raphson method. This method requires the evaluation of both the first and the second derivatives of the objective functions in (2.98) and (2.100). With its good rate of convergence, it is the preferred method if the derivatives can be obtained analytically and coded in a program without difficulty. But in many cases, for example with ℓ_njk(θ_jk) or Ψ_njk(θ_jk), where the bivariate objects involve two-dimensional integrals without closed form, application of the Newton-Raphson method is difficult, since analytical derivatives are very hard to obtain in such situations. The Newton-Raphson method can only be easily applied in a few cases with ℓ_nj(λ_j) or ℓ_nj(α_j), where only univariate objects are involved.

For example, the Newton-Raphson method may be used to solve Ψ_nj(λ_j) = 0 to find λ̂_j, j = 1, ..., d. In this case, based on the Newton-Raphson method, for a given initial value λ_{j,0}, an updated value of λ_j is

λ_{j,new} = λ_{j,0} − [ ∂Ψ_nj(λ_j)/∂λ_j ]^{-1} Ψ_nj(λ_j), evaluated at λ_j = λ_{j,0}.   (2.102)

This is repeated until successive λ_{j,new} agree to a specified precision. In (2.102), we need to be able to code

Ψ_nj(λ_j) = Σ_{i=1}^n [1/P_j(y_ij)][∂P_j(y_ij)/∂λ_j]   (2.103)

and

∂Ψ_nj(λ_j)/∂λ_j = Σ_{i=1}^n [ −(1/P_j²(y_ij)) ( ∂P_j(y_ij)/∂λ_j )² + (1/P_j(y_ij)) ∂²P_j(y_ij)/∂λ_j² ].   (2.104)

This is for the case with no covariates. For the case with covariates, iteration equations similar to (2.102) can be written down. We need to calculate

Ψ_nj(α_j) = Σ_{i=1}^n [1/P_j(y_ij)][∂P_j(y_ij)/∂α_j],   (2.105)

which is a p_j × 1 vector, and

∂Ψ_nj(α_j)/∂α_j' = Σ_{i=1}^n [ (1/P_j(y_ij)) ∂²P_j(y_ij)/∂α_j∂α_j' − (1/P_j²(y_ij)) ( ∂P_j(y_ij)/∂α_j ) ( ∂P_j(y_ij)/∂α_j )' ],   (2.106)

which is a p_j × p_j matrix. It is equivalent to calculating the gradient of P_j(y_ij) at the point α_j, which is the p_j-vector of (first-order) partial derivatives (∂P_j(y_ij)/∂α_{j1}, ..., ∂P_j(y_ij)/∂α_{jp_j})', and the Hessian of P_j(y_ij) at the point α_j, which is a p_j × p_j matrix of second-order partial derivatives with (s, t) component (∂²/∂α_{js}∂α_{jt}) P_j(y_ij), s, t = 1, ..., p_j.

To avoid the often tedious algebraic derivations in (2.103)-(2.106), modern symbolic computation software, such as Maple (Char et al., 1992), may be used. This software is also convenient in that it outputs the results in the form of C or Fortran code.

Quasi-Newton method

For many multivariate models, it is inconvenient to supply both first and second partial derivatives of the objective functions, as required by the Newton-Raphson method. For example, obtaining partial derivatives of the forms (2.104)-(2.106) may be tedious, particularly with objective functions such as ℓ_njk(θ_jk) or ℓ_njk(ρ_jk), where 2-dimensional integrations are often involved. A numerical method for optimization that is useful for many multivariate models in this thesis is the quasi-Newton method (or variable-metric method). This method uses numerical approximations to the derivatives (gradients and Hessian matrix) in the Newton-Raphson iteration; thus it can be considered a derivative-free method. In many situations, a crude approximation to the derivatives can still lead to convergence of the Newton-Raphson iteration. Application of this method requires only the objective functions, such as those in (2.98) and (2.100), to be coded. The gradients are computed numerically, and the inverse Hessian matrix of second-order derivatives is updated after each iteration. This method has the advantage of not requiring the analytic derivatives of the objective functions with respect to the parameters. Its disadvantage is that convergence may be slow compared with the Newton-Raphson approach. An example of a quasi-Newton routine, used in the programs written for this thesis work, is the quasi-Newton minimization routine in Nash (1990) (Algorithm 21, p. 192). This is a modified Fletcher variable-metric method; the original method is due to Fletcher (1970). With the quasi-Newton methods, all we need to do is to write down the optimization (minimization or maximization) objective function (such as ℓ_njk(θ_jk)), and then let a quasi-Newton routine take care of the rest.
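As a concrete instance of (2.102)-(2.104), the sketch below fits a one-parameter Poisson margin by Newton-Raphson, for which the derivatives are available in closed form; a quasi-Newton fit of the same margin would instead pass the negative loglikelihood to a routine such as scipy.optimize.minimize with method='BFGS'. The data vector y is an assumed input.

```python
# A minimal Newton-Raphson iteration (2.102) for a Poisson margin:
# Psi_nj(lam) = sum_i [1/P_j][dP_j/dlam] = sum_i (y_i/lam - 1), cf. (2.103);
# dPsi_nj/dlam = sum_i (-y_i/lam^2), cf. (2.104).
import numpy as np

def newton_poisson(y, lam=1.0, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        score = np.sum(y / lam - 1.0)
        dscore = np.sum(-y / lam ** 2)
        step = score / dscore
        lam = lam - step            # lam_new = lam_0 - Psi/Psi', cf. (2.102)
        if abs(step) < tol:
            break
    return lam                      # converges to the sample mean here
```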
A quasi-Newton routine works fine if the objective function can be computed to arbitrary precision, say ε_0. The numerical gradients are then based on a step size (or step length) ε < ε_0. The calculation of the optimization objective function for a multivariate model often involves the evaluation of multiple integrals at arbitrary points. One-dimensional and two-dimensional numerical integrals can usually be computed quite quickly to around six digits of precision, but there is a problem of computational time in trying to achieve many digits of precision for numerical integrals of dimension three or more. When the objective function is not computed sufficiently accurately, the numerical gradients are poor approximations to the true gradients, and this leads to poor performance of the quasi-Newton method. On the other hand, for statistical problems great accuracy is seldom required; it often suffices to obtain two or three significant digits, and we expect that in most situations we are not dealing with the worst cases.

Starting points for numerical optimization

In general, an objective function may have many local optima in addition to possibly a single global optimum. There is no numerical method which will always locate an existing global optimum, and the computational complexity in general increases either linearly or quadratically in the number of parameters. The best scenario is to have a dependable method which converges to a local optimum based on initial guesses of the values which optimize the objective function. Thus good starting points for the numerical optimization methods are important. It is desirable to locate a good starting point based on a simple method, rather than trying many random starting points. An example of using method of moments estimation to choose the starting points is the multivariate Poisson-lognormal model (see Example 2.12), where the initial values for estimates of μ_j and σ_j can be based on the sample moments of the jth margin.
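For the Poisson-lognormal margin, the moment matching is explicit, since E(Y_j) = exp(μ_j + σ_j²/2) and Var(Y_j) = E(Y_j) + [E(Y_j)]²(exp(σ_j²) − 1). A minimal sketch follows; it assumes the sample variance exceeds the sample mean, as the overdispersed model requires.

```python
# A sketch of method-of-moments starting values for one Poisson-lognormal
# margin: solve m = exp(mu + s2/2) and v = m + m^2 (exp(s2) - 1) for (mu, s2).
import numpy as np

def plognormal_start(y):
    m, v = np.mean(y), np.var(y)
    s2 = np.log(1.0 + (v - m) / m ** 2)   # from v - m = m^2 (e^{s2} - 1)
    mu = np.log(m) - s2 / 2.0             # from m = e^{mu + s2/2}
    return mu, np.sqrt(s2)
```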
• Mixture of max-id copula

C(u_1, ..., u_d) = ψ ( −Σ_{j<k} log K_jk ( e^{−p_j ψ^{-1}(u_j)}, e^{−p_k ψ^{-1}(u_k)} ) + Σ_{j=1}^d ν_j p_j ψ^{-1}(u_j) ),   (3.2)

where the K_jk are max-id copulas and p_j = (ν_j + d − 1)^{-1}, ν_j ≥ 0. For interpretation, we may say that ψ can be considered as providing the minimal level of (pairwise) dependence, the copula K_jk adds some pairwise dependence to the global dependence, and the ν_j can be used for bivariate and multivariate asymmetry (the asymmetries are represented through ν_j/(ν_j + ν_k), j ≠ k). (3.2) has the MUBE and partially CUOM properties. With ψ(s) = −θ^{-1} log(1 − [1 − e^{-θ}]e^{-s}), θ > 0, (3.2) becomes

C(u_1, ..., u_d) = −θ^{-1} log ( 1 − (1 − e^{-θ}) Π_{j<k} K_jk(x_j, x_k) Π_{j=1}^d x_j^{ν_j} ),   (3.3)

where x_j = [(1 − e^{-θ u_j})/(1 − e^{-θ})]^{p_j}.

• Molenberghs-Lesaffre construction. In this construction, given by (3.4) and (3.5) for d = 4, C_12, C_13, C_14, C_23, C_24, C_34 are arbitrary compatible bivariate copulas. Examples are the bivariate normal, Plackett (2.8) or Frank copula (2.9); see Joe (1993, 1996) for a list of bivariate copula families with good properties. The parameters in C_1234 are the parameters from the bivariate copulas, plus η_jkl, 1 ≤ j < k < l ≤ 4, and η_1234. The generalization of this construction to arbitrary dimension can be found in Joe (1996). Notice that we have quotation marks on the word "copula", because the multivariate object obtained from (3.4) and (3.5), or the corresponding form for general dimension, has not been proven to be a proper multivariate copula. But it can be used for the parameter range that leads to positive orthant probabilities for the resulting probabilities for the multivariate binary vector.

• Morgenstern copula

C(u_1, ..., u_d) = [ 1 + Σ_{j<k} θ_jk (1 − u_j)(1 − u_k) ] Π_{j=1}^d u_j.   (3.6)

• Permutation-symmetric copula

C(u_1, ..., u_d) = ψ ( Σ_{j=1}^d ψ^{-1}(u_j) ),   (3.7)

where ψ: [0, ∞) → [0, 1] is strictly decreasing and continuously differentiable (of all orders), with ψ(0) = 1, ψ(∞) = 0, and (−1)^j ψ^{(j)} ≥ 0. With ψ(s) = −θ^{-1} log(1 − [1 − e^{-θ}]e^{-s}), θ > 0, (3.7) is

C(u_1, ..., u_d) = −θ^{-1} log ( 1 − Π_{j=1}^d (1 − e^{-θ u_j}) / (1 − e^{-θ})^{d−1} ).   (3.8)

This choice of ψ(s) leads to (3.8) with bivariate marginal copulas that are reflection symmetric.
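The copula (3.8) is immediate to evaluate numerically; a minimal sketch (the arguments are illustrative only):

```python
# A direct transcription of the permutation-symmetric copula (3.8)
# (multivariate Frank copula).
import numpy as np

def frank_copula(u, theta):
    u = np.asarray(u, dtype=float)
    num = np.prod(1.0 - np.exp(-theta * u))
    den = (1.0 - np.exp(-theta)) ** (len(u) - 1)
    return -np.log(1.0 - num / den) / theta

print(frank_copula([0.3, 0.5, 0.7], theta=2.0))  # bounded above by min(u) = 0.3
```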
It is straightforward to extend the univariate marginal parameters to include covariates. For example, for z_ij corresponding to the random vector Y_i, we can let z_ij = α_j'x_ij, where α_j is a parameter vector and x_ij is a covariate vector corresponding to the jth margin. But what about the dependence parameters in the copula? Should the dependence parameters be functions of covariates? If so, which functions? These questions have no obvious answers. It is evident that if the dependence parameters are functions of covariates, the form of the functions will depend on the particular copula associated with the model. A simple way to deal with the dependence parameters is to let them be independent of covariates; sometimes this may be good enough for the modelling purposes. If we need to include covariates for the dependence parameters, careful consideration should be given. In the following, in reference to specific copulas, we discuss different ways of letting the dependence parameters depend on covariates:

- With the multinormal copula, the dependence structure in the copula for the ith response vector Y_i is Θ_i = (θ_{i,jk}). It is well known that (i) Θ_i has to be nonnegative definite, and (ii) each component θ_{i,jk} of Θ_i has to be bounded by 1 in absolute value. Under these constraints, different ways of letting Θ_i depend on covariates are possible: (a) let θ_{i,jk} = [exp(β_jk'w_{i,jk}) − 1]/[exp(β_jk'w_{i,jk}) + 1]; (b) let Θ_i have a simple correlation structure such as exchangeable or AR(1); (c) use a representation such as z_ij = α_j'x_ij/[1 + x_ij'x_ij]^{1/2}, θ_{i,jk} = r_jk/[(1 + x_ij'x_ij)(1 + x_ik'x_ik)]^{1/2}; (d) use a more general representation of the same form in which θ_{i,jk} depends on its own covariate vector w_{i,jk}; or (e) reparameterize Θ_i into the form of d − 1 correlations and (d − 1)(d − 2)/2 partial correlations. The extension (a) satisfies condition (ii), but not necessarily (i). The extension (b) satisfies conditions (i) and (ii), but is only suitable for data with a special dependence structure. The extension (c) is more natural, as it is derived from a mixture representation (see section 3.5 for a more general form); it satisfies condition (ii), and also condition (i) as long as the correlation matrix (r_jk) is nonnegative definite. This is an advantage in comparison with (a), since in (a), for (i) to be satisfied, all n correlation matrices must be nonnegative definite. The disadvantage of (c) is that the dependence range may be limited once a particular functional representation is chosen. The extension (d) is similar to the extension (c), except that now the θ_{i,jk} are not required to depend on the same covariate vectors as z_ij. The extension (e) completely avoids the matrix constraint (i), thus relieving a big burden from the constraints on the appropriate inclusion of covariates in the dependence parameters. But this extension implicitly implies an indexing of the variables, which may not be rational in general, although this is not a problem in many applications where the indexing of the variables is evident, such as with time series; see Joe (1996).

- With the mixture of max-id copula (3.3), extensions of the parameters θ, ν_j, δ_jk to functions of the covariates are straightforward. For example, for θ_i, δ_{i,jk} corresponding to the random vector Y_i, we may have θ_i, ν_ij constant, and δ_{i,jk} = exp(β_jk'w_{i,jk}).

- With the Molenberghs-Lesaffre construction, the extension to include covariates is possible. In applications, it is often good enough to let the bivariate parameters be functions of covariates, such as δ_{i,jk} = exp(β_jk'w_{i,jk}) for the bivariate Plackett or Frank copula, and to let the higher-order parameters be constant values, such as 1. This is a simple and useful approach, but there is no guarantee that it leads to compatible parameters. (See Joe 1996 for a maximum entropy interpretation in this case.)

- With the Morgenstern copula, the extension letting the parameters θ_{i,jk} be functions of covariates is not easy, since the θ_{i,jk} must satisfy some constraints. This is rather complicated and difficult to manage when the dimension is high. The situation is very similar to the multinormal copula, where Θ_i should be nonnegative definite.

- With the permutation-symmetric copula (3.8), the extension to include covariates is to let θ_i be a function of covariates, for example θ_i = exp(β'w_i).

We see that for different copulas there are many ways to extend the model to include covariates. Some are obvious and thus appear natural; others are not easy or obvious to specify. Note also that an exchangeable structure within the copula does not imply an exchangeable structure for the response variables. For MCD models for binary data, an exchangeable structure within the copula plus constant cut-off points across all the margins implies an exchangeable structure for the response variables. The AR dependence structure for the discrete response variables should be understood as a latent Markov dependence structure (see section 3.7). When we mention an AR(1) (or AR) dependence structure, we are referring to the latent dependence structure within the multinormal copula.

In summary, under the name "multivariate logit models", many slightly different models are available. For example, we have the multivariate logit model with
i. multinormal copula (3.1),
ii. multivariate Molenberghs-Lesaffre construction
a. with bivariate normal copula,
b. with Plackett copula (2.8),
c. with Frank copula (2.9),
iii. mixture of max-id copula (3.3),
iv. Morgenstern copula (3.6),
v. the permutation-symmetric copula (3.8).

Indeed, such multiple choices of models are available for any kind of MCD model. For a discrete random vector Y, different copulas in (2.13) lead to different probabilistic models. The question is: when is one model preferred to another? We discuss this question in section 3.2. In the following, as an example, we examine estimation aspects of the multivariate logit model with multinormal copula. The multivariate logit model with multinormal copula can also be called the multivariate normal-copula logit model, to highlight the fact that the multinormal copula is used.
For the case with no covariates, the estimating equations for the multivariate normal-copula logit model based on IFM can be written as

Ψ_nj(z_j) = [ n_j(1)(1 + exp(−z_j)) − n_j(0)(1 + exp(−z_j)) exp(z_j) ] exp(−z_j)/(1 + exp(−z_j))² = 0, j = 1, ..., d,

Ψ_njk(θ_jk) = [ n_jk(11)/P_jk(11) − n_jk(10)/P_jk(10) − n_jk(01)/P_jk(01) + n_jk(00)/P_jk(00) ] φ_2(Φ^{-1}(u_j), Φ^{-1}(u_k); θ_jk) = 0, 1 ≤ j < k ≤ d,

where P_jk(11) = C_jk(u_j, u_k; θ_jk), P_jk(10) = u_j − C_jk(u_j, u_k; θ_jk), P_jk(01) = u_k − C_jk(u_j, u_k; θ_jk), P_jk(00) = 1 − u_j − u_k + C_jk(u_j, u_k; θ_jk), with C_jk(u_j, u_k; θ_jk) = Φ_2(Φ^{-1}(u_j), Φ^{-1}(u_k); θ_jk), u_j = 1/[1 + exp(−z_j)], u_k = 1/[1 + exp(−z_k)], and φ_2 the BVN density. We obtain the estimates ẑ_j = log(n_j(1)/n_j(0)), j = 1, ..., d, and θ̂_jk as the root of the equation Φ_2(Φ^{-1}(û_j), Φ^{-1}(û_k); θ_jk) = n_jk(11)/n, 1 ≤ j < k ≤ d.

For the situation with covariates, we may let

z_ij = α_j'x_ij, j = 1, ..., d,   (3.9)

and

θ_{i,jk} = [exp(β_jk'w_{i,jk}) − 1]/[exp(β_jk'w_{i,jk}) + 1], 1 ≤ j < k ≤ d,   (3.10)

where β_jk = (b_{jk,0}, ..., b_{jk,p_jk})'. We recognize (3.10) as one of the forms of dependence of the copula parameters on covariates that we discussed previously; we use this form for the purpose of illustration, and other functional forms can be used instead. Because of the linearity of (3.9) and (3.10), the regression parameter vectors α_j, β_jk have marginal interpretations. The loglikelihood functions of margins are

ℓ_nj(α_j) = Σ_{i=1}^n log P_j(y_ij), j = 1, ..., d,

ℓ_njk(α_j, α_k, β_jk) = Σ_{i=1}^n log P_jk(y_ij y_ik), 1 ≤ j < k ≤ d,

where

P_ij(y_ij) = [ exp(z_ij)/(1 + exp(z_ij)) ]^{y_ij} [ 1/(1 + exp(z_ij)) ]^{1−y_ij}

and

P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_{i,jk}) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_{i,jk}),

where a_ij = G_ij(y_ij − 1), b_ij = G_ij(y_ij), a_ik = G_ik(y_ik − 1) and b_ik = G_ik(y_ik), with G_ij(1) = 1 and G_ij(0) = 1/[1 + exp(z_ij)].

We can apply quasi-Newton minimization to the loglikelihood functions of margins to obtain the estimates of the parameters α_j and β_jk. The Newton-Raphson method can also be used for the estimates of α_j (this is what we used in our computer programs). In this case, we have to solve the estimating equations Ψ_nj(α_j) = 0. For applying the Newton-Raphson method, we need to calculate ∂P(y_ij)/∂a_js and ∂²P(y_ij)/∂a_js∂a_jt. We have ∂P(y_ij)/∂a_js = (2y_ij − 1){exp(z_ij)/(1 + exp(z_ij))²} x_ijs, s = 0, 1, ..., p_j, and ∂²P(y_ij)/∂a_js∂a_jt = (2y_ij − 1){exp(z_ij)(1 − exp(z_ij))/(1 + exp(z_ij))³} x_ijs x_ijt, s, t = 0, 1, ..., p_j. For details about the Newton-Raphson and quasi-Newton methods, see section 2.7.

M_Ψ and D_Ψ can be calculated and estimated following the results in section 2.4. In applications, to avoid the tedious coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variances of ẑ_j and θ̂_jk when there are no covariates, or those of α̂_j and β̂_jk when there are covariates.
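The no-covariate estimation just described is a two-step procedure: closed-form univariate logit estimates, then one root-finding problem per bivariate margin. A minimal sketch follows; it assumes an n × d binary data matrix y and that the root of the bivariate equation is bracketed in (−0.99, 0.99).

```python
# A sketch of the two IFM steps for the multivariate normal-copula logit
# model without covariates, for one pair (j, k).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def ifm_normal_copula_logit(y, j, k):
    n = y.shape[0]
    z_j = np.log(y[:, j].sum() / (n - y[:, j].sum()))  # z_j = log(n_j(1)/n_j(0))
    z_k = np.log(y[:, k].sum() / (n - y[:, k].sum()))
    u_j, u_k = 1 / (1 + np.exp(-z_j)), 1 / (1 + np.exp(-z_k))
    p11 = np.mean((y[:, j] == 1) & (y[:, k] == 1))     # n_jk(11)/n

    def f(theta):  # Phi_2(Phi^{-1}(u_j), Phi^{-1}(u_k); theta) - n_jk(11)/n
        return multivariate_normal.cdf(
            [norm.ppf(u_j), norm.ppf(u_k)],
            mean=[0.0, 0.0], cov=[[1.0, theta], [theta, 1.0]]) - p11

    return z_j, z_k, brentq(f, -0.99, 0.99)  # assumes a sign change on the bracket
```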
3.1.2 Multivariate probit model

The general multivariate probit model, similar to the multivariate logit model, is obtained by letting G_j(0) = 1 − Φ(z_j) and G_j(1) = 1 in the model (2.13). The multivariate probit model in Example 2.10 is the classical multivariate probit model discussed in the literature, where the copula in (2.13) is the multinormal copula. All the discussion of the multivariate logit model is relevant and can be directly applied to the multivariate probit model. For completeness, we give in the following some detailed discussion of the multivariate probit model when the copula is the multinormal copula, as a continuation of Example 2.10.

For the multivariate probit model in Example 2.10, it is easy to see that E(Y_j) = Φ(z_j), Var(Y_j) = Φ(z_j)(1 − Φ(z_j)), and Cov(Y_j, Y_k) = Φ_2(z_j, z_k, θ_jk) − Φ(z_j)Φ(z_k), j ≠ k. The correlation of the response variables Y_j and Y_k is

Corr(Y_j, Y_k) = Cov(Y_j, Y_k) / {Var(Y_j) Var(Y_k)}^{1/2} = [Φ_2(z_j, z_k, θ_jk) − Φ(z_j)Φ(z_k)] / {Φ(z_j)(1 − Φ(z_j)) Φ(z_k)(1 − Φ(z_k))}^{1/2}.   (3.11)

The variance of Y_j achieves its maximum when z_j = 0; in this case E(Y_j) = 1/2 and Var(Y_j) = 1/4. If z_j = 0 and z_k = 0, we have Cov(Y_j, Y_k) = (sin^{-1} θ_jk)/(2π) and Corr(Y_j, Y_k) = (2 sin^{-1} θ_jk)/π. Without loss of generality, assume z_j ≤ z_k; then when θ_jk is at its boundary values,

Corr(Y_j, Y_k) = {Φ(z_j)(1 − Φ(z_k))/[(1 − Φ(z_j))Φ(z_k)]}^{1/2}, θ_jk = 1;
Corr(Y_j, Y_k) = −{(1 − Φ(z_j))(1 − Φ(z_k))/[Φ(z_j)Φ(z_k)]}^{1/2}, θ_jk = −1, −z_k ≤ z_j;
Corr(Y_j, Y_k) = −{Φ(z_j)Φ(z_k)/[(1 − Φ(z_j))(1 − Φ(z_k))]}^{1/2}, θ_jk = −1, z_j ≤ −z_k;
Corr(Y_j, Y_k) = 0, θ_jk = 0.

From the Frechet bound inequalities, −min{a^{1/2}, a^{−1/2}} ≤ Corr(Y_j, Y_k) ≤ min{b^{1/2}, b^{−1/2}}, where a = [P_j(1)P_k(1)]/[P_j(0)P_k(0)] and b = [P_j(1)P_k(0)]/[P_j(0)P_k(1)]; we see that Corr(Y_j, Y_k) attains its upper and lower bounds when θ_jk = 1 and θ_jk = −1, respectively. Corr(Y_j, Y_k) is an increasing function of θ_jk; it varies over its full range as θ_jk varies over its full range. Thus, in a general situation, a multivariate probit model of dimension d consists of d univariate probit models describing some marginal characteristics and d(d−1)/2 latent correlations θ_jk, 1 ≤ j < k ≤ d, expressing the strength of the association among the response variables. θ_jk = 0 corresponds to independence among the response variables. The response variables are exchangeable if Θ has an exchangeable structure and the cut-off points are constant across all the margins. Note that when Θ has an exchangeable structure, we must have θ_jk = θ ≥ −1/(d − 1).

The estimating equations for the multivariate probit model with multinormal copula, based on n response vectors y_i, i = 1, ..., n, are

Ψ_nj(z_j) = [ n_j(1)/Φ(z_j) − n_j(0)/(1 − Φ(z_j)) ] φ(z_j) = 0, j = 1, ..., d,

Ψ_njk(θ_jk) = [ n_jk(11)/Φ_2(z_j, z_k, θ_jk) − n_jk(10)/(Φ(z_j) − Φ_2(z_j, z_k, θ_jk)) − n_jk(01)/(Φ(z_k) − Φ_2(z_j, z_k, θ_jk)) + n_jk(00)/(1 − Φ(z_j) − Φ(z_k) + Φ_2(z_j, z_k, θ_jk)) ] φ_2(z_j, z_k, θ_jk) = 0, 1 ≤ j < k ≤ d.

These lead to the solutions ẑ_j = Φ^{-1}(n_j(1)/n), j = 1, ..., d, and θ̂_jk is the root of the equation Φ_2(ẑ_j, ẑ_k, θ_jk) = n_jk(11)/n, 1 ≤ j < k ≤ d. For the situation with covariates, the details are similar to the multivariate logit model with multinormal copula in the preceding subsection, except that now we have

P_ij(y_ij) = y_ij Φ(z_ij) + (1 − y_ij)(1 − Φ(z_ij)),

P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_jk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_jk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_jk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_jk),

with a_ij = G_ij(y_ij − 1), b_ij = G_ij(y_ij), a_ik = G_ik(y_ik − 1) and b_ik = G_ik(y_ik), where G_ij(1) = 1 and G_ij(0) = 1 − Φ(z_ij). We also have ∂P(y_ij)/∂a_js = (2y_ij − 1)φ(z_ij)x_ijs, s = 0, 1, ..., p_j, and ∂²P(y_ij)/∂a_js∂a_jt = (1 − 2y_ij)φ(z_ij)z_ij x_ijs x_ijt, s, t = 0, 1, ..., p_j; these expressions are needed for applying the Newton-Raphson method to get estimates of α_j. M_Ψ and D_Ψ can be calculated and estimated following the results in section 2.4. For example, for the case with no covariates, we have E(ψ²(z_j)) = φ²(z_j)/{Φ(z_j)(1 − Φ(z_j))}. A useful fact is ∂Φ_2(z_j, z_k, θ_jk)/∂θ_jk = φ_2(z_j, z_k, θ_jk), a result due to Plackett (1954).
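These moment formulas are easy to verify numerically; at z_j = z_k = 0 the binary correlation reduces to (2/π) sin^{-1} θ_jk, as the following sketch confirms (the value of θ_jk is illustrative).

```python
# A numerical check of the moment formulas for the multivariate probit model.
import numpy as np
from scipy.stats import multivariate_normal, norm

def binary_corr(z_j, z_k, theta):
    p11 = multivariate_normal.cdf([z_j, z_k], mean=[0.0, 0.0],
                                  cov=[[1.0, theta], [theta, 1.0]])
    pj, pk = norm.cdf(z_j), norm.cdf(z_k)
    return (p11 - pj * pk) / np.sqrt(pj * (1 - pj) * pk * (1 - pk))

theta = 0.6
print(binary_corr(0.0, 0.0, theta), 2 * np.arcsin(theta) / np.pi)  # should agree
```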
In applications, to avoid the tedious computer coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variances of ẑ_j and θ̂_jk when there are no covariates, or those of α̂_j and β̂_jk when there are covariates.

3.2 Comparison of models

We have obtained many models under the name of multivariate logit model (and likewise multivariate probit model) for binary data. An immediate question is: when is one model preferred to another? In section 1.3, we outlined some desirable features of multivariate models; among them, (2) and (3) may be the most important. But in applications, the importance of a particular desirable feature of a multivariate model may well depend on the practical needs and constraints. As an example, we briefly compare the multivariate logit models and the multivariate probit models with the different copulas studied in section 3.1.

The multivariate logit model with multinormal copula satisfies the desirable properties (1), (2), (3) and (4) of a multivariate model outlined in section 1.3, but not (5). The multivariate probit model with multinormal copula is similar, except that one has logit univariate margins and the other has probit univariate margins. In applications, the multivariate logit model with multinormal copula may be preferred to the multivariate probit model with multinormal copula, as the former has the advantage of a closed-form univariate marginal cdf. This consideration also leads to a general preference for the multivariate logit model over the multivariate probit model when both have the same associated multivariate copula. For this reason, we concentrate in the following on the discussion of multivariate logit models.

The multivariate logit model with the mixture of max-id copula (3.3) satisfies the desirable properties (1), (3) and (5), but only partially (2) and (4). The model admits only positive dependence (otherwise, it is flexible and wide in its dependence range), and it is CUOM(k) (k ≥ 2) but not CUOM. The closed-form cdf of this model is a very attractive feature. If the data exhibit only positive dependence (or prior knowledge tells us so), then the multivariate logit model with the mixture of max-id copula (3.3) may be preferred to the multivariate logit model with multinormal copula.

The multivariate logit model with the M-L construction satisfies the desirable properties (1), (2), (3) and (4), but not (5). The computation of the cdf may be numerically easier than for the multivariate logit model with multinormal copula, since the former only requires solving a set of polynomial equations, while the latter requires multiple integration. The disadvantage of this model, as stated earlier, is that the object resulting from the construction has not been proven to be a proper multivariate copula. What has been verified numerically (see Joe 1996) is that (3.4) and its extensions do not yield proper distributions if η_1234 and the η_jkl (1 ≤ j < k < l ≤ 4) are either too small or too large. In any case, the useful thing about this model is that it leads to multivariate objects with given proper univariate and bivariate margins.

The multivariate logit model with the Morgenstern copula satisfies the desirable properties (1), (4) and (5) of a multivariate model outlined in section 1.3, but not (2) and (3).
This is a major drawback; thus this model is not very useful.

The multivariate logit models with the permutation-symmetric copulas (3.7) are only suitable for modelling data with special exchangeable dependence patterns. They cannot be considered widely applicable models, because the desirable property (2) of multivariate models is not satisfied. Nevertheless, this model may be an interesting consideration in some applications, such as when the data to be modelled are repeated measures over different treatments, or familial data.

In summary, for general applications, the multivariate logit model with the multinormal copula or with the mixture of max-id copula (3.3) may be preferred. If the condition of positive dependence holds in a study, then the multivariate logit model with the mixture of max-id copula (3.3) may be preferred to the multivariate logit model with multinormal copula, because the former has a closed-form multivariate cdf; this is particularly attractive for a moderate to large dimension of response, d. The multivariate logit model with the Molenberghs-Lesaffre construction may be another interesting option. When several models fit the data about equally well, a preference for one should be based on which desirable features are considered important to the successful application of the models. In many situations, several equally good models may be possible; see Chapter 5 for discussion and data analysis examples.

In the statistical literature, the multivariate probit model with multinormal copula has been studied and applied. An early reference on an application to binary data is Ashford and Sowden (1970). An explanation of the popularity of the multivariate probit model with multinormal copula is that the model is related to the multivariate normal distribution, which allows the multivariate probit model to accommodate the dependence among the response variables in its full range. Furthermore, the marginal models are the simple univariate probit models.

3.3 Multivariate copula discrete models for count data

Univariate count data may be modelled by binomial, negative binomial, logarithmic, Poisson, or generalized Poisson distributions, depending on the amount of dispersion. In this section, we study some MCD models for multivariate count data.

3.3.1 Multivariate Poisson model

The multivariate Poisson model for Poisson count data is obtained by letting G_j(y_j) = Σ_{m=0}^{[y_j]} p_j^{(m)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where p_j^{(m)} = [λ_j^m exp(−λ_j)]/m!, λ_j > 0. The copula C in (2.13) is arbitrary; the copulas (3.1)-(3.8) are some interesting choices here. The multivariate Poisson model has univariate Poisson margins. We have E(Y_j) = Var(Y_j) = λ_j, a characterizing feature of the Poisson distribution called equidispersion. There are situations where the variance of count data is greater than the mean, or where the variance is smaller than the mean; the former case is called overdispersion and the latter case underdispersion. We will see models dealing with overdispersion and underdispersion in the subsequent sections. Although the multivariate Poisson model has Poisson univariate marginal distributions, the conditional distributions are not Poisson. The univariate parameter λ_j in the multivariate Poisson model can be reparameterized by taking η_j = log(λ_j), so that the new parameter η_j has the range (−∞, ∞).
It is also straightforward to extend the univariate marginal parameters to include covariates. For example, for λ_ij corresponding to the random vector Y_i, we can let λ_ij = exp(α_j'x_ij), where α_j is a parameter vector and x_ij is a covariate vector corresponding to the jth margin. The discussion in section 3.1 on modelling the dependence parameters in the copulas is also relevant here. Most of the discussion in section 3.2 about the comparison of models is also relevant, as the comparison is essentially a comparison of the associated multivariate copulas. In summary, under the name "multivariate Poisson models", we may have the multivariate Poisson model with
i. multinormal copula (3.1),
ii. multivariate Molenberghs-Lesaffre construction
a. with bivariate normal copula,
b. with Plackett copula (2.8),
c. with Frank copula (2.9),
iii. mixture of max-id copula (3.3),
iv. Morgenstern copula (3.6),
v. the permutation-symmetric copula (3.8).
These parallel the multivariate logit models for binary data. For illustration purposes, we provide in the following some details on the multivariate Poisson model with the multinormal copula. The multivariate Poisson model with the multinormal copula can also be called the multivariate normal-copula Poisson model, to highlight the fact that the multinormal copula is used. This model was already introduced in Example 2.11.

For the multivariate normal-copula Poisson model, the Frechet upper bound is reached in the limit if Θ = J, where J is the matrix of 1's: when θ_jk = 1 and λ_j = λ_k, the correlation of the response variables Y_j and Y_k is 1. For the situation with no covariates, the IFM estimating equations are

Ψ_nj(λ_j) = Σ_{i=1}^n (1/P_j(y_ij)) ∂P_j(y_ij)/∂λ_j = 0, j = 1, ..., d,   (3.12)

Ψ_njk(θ_jk) = Σ_{i=1}^n (1/P_jk(y_ij y_ik)) ∂P_jk(y_ij y_ik)/∂θ_jk = 0, 1 ≤ j < k ≤ d,

which lead to λ̂_j = Σ_{i=1}^n y_ij/n, while θ̂_jk can be found through numerical computation. An extension of the multivariate normal-copula Poisson model with covariate x_ij for the response observation y_ij is to let λ_ij = h_j(γ_j, x_ij) for some function h_j with range in [0, ∞). An example of the function h_j is λ_ij = exp(γ_j'x_ij) (or log(λ_ij) = γ_j'x_ij). The ways to let the dependence parameters θ_jk be functions of covariates follow the discussion in section 3.1 for the multivariate logit model with multinormal copula.

We can apply quasi-Newton minimization to the loglikelihood functions of margins (3.11) to obtain the estimates of the parameters γ_j and the dependence parameters θ_jk (or the regression parameters for the dependence parameters, if applicable). The Newton-Raphson method can also be used to obtain the estimates of γ_j from Ψ_nj(γ_j) = 0. Let log(λ_ij) = γ_j0 + γ_j1 x_ij1 + ··· + γ_jp_j x_ijp_j. For applying the Newton-Raphson method, we need to calculate ∂P(y_ij)/∂γ_js and ∂²P(y_ij)/∂γ_js∂γ_jt. If we let x_ij0 = 1, we have ∂P(y_ij)/∂γ_js = {λ_ij^{y_ij} exp(−λ_ij)/y_ij!}[y_ij − λ_ij] x_ijs, s = 0, 1, ..., p_j, and ∂²P(y_ij)/∂γ_js∂γ_jt = {λ_ij^{y_ij} exp(−λ_ij)/y_ij!}[(y_ij − λ_ij)² − λ_ij] x_ijs x_ijt, s, t = 0, 1, ..., p_j. For details about numerical methods, see section 2.7.
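Dividing the derivatives above by P_j(y_ij) gives the familiar Poisson regression score (y_ij − λ_ij)x_ij and Hessian −λ_ij x_ij x_ij', so the univariate Newton-Raphson step takes a compact matrix form. A minimal sketch follows; the design matrix X (with a leading column of ones) and the count vector y are assumed inputs.

```python
# A sketch of the Newton-Raphson fit of one Poisson margin with log link,
# using the closed-form score and Hessian of the univariate loglikelihood.
import numpy as np

def poisson_margin_fit(X, y, n_iter=25):
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(X @ gamma)
        score = X.T @ (y - lam)                # Psi_nj(gamma_j)
        hess = -(X * lam[:, None]).T @ X       # sum_i -lam_i x_i x_i'
        gamma -= np.linalg.solve(hess, score)  # Newton step
    return gamma
```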
3.3.2 Multivariate generalized Poisson model

The multivariate generalized Poisson model for count data is obtained by letting G_j(y_j) = Σ_{s=0}^{[y_j]} p_j^{(s)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

p_j^{(s)} = λ_j(λ_j + s α_j)^{s−1} exp(−λ_j − s α_j)/s!, s = 0, 1, 2, ...,
p_j^{(s)} = 0 for s > m, when α_j < 0,   (3.14)

where λ_j > 0, max(−1, −λ_j/m) < α_j < 1, and m (≥ 4) is the largest positive integer for which λ_j + m α_j > 0 when α_j is negative. The copula C in (2.13) is arbitrary; the copulas (3.1)-(3.8) are some choices here. The multivariate generalized Poisson model has as its jth (j = 1, ..., d) margin the generalized Poisson distribution with pmf (3.14). This generalized Poisson distribution is extensively studied in the monograph by Consul (1989). Its main characteristic is that it allows for both overdispersion and underdispersion by introducing one additional parameter, α_j. The generalized Poisson distribution has the Poisson distribution as a special case when α_j = 0. The mean and variance of Y_j are E(Y_j) = λ_j(1 − α_j)^{-1} and Var(Y_j) = λ_j(1 − α_j)^{-3}, respectively. Thus the generalized Poisson distribution displays overdispersion for 0 < α_j < 1, equidispersion for α_j = 0, and underdispersion for max(−1, −λ_j/m) < α_j < 0. The restrictions leading to underdispersion are rather complicated, as the parameters α_j are restricted by the sample space. It is easier to work with the overdispersion situation, where the restrictions are simply λ_j > 0, 0 < α_j < 1.

The details of applying the IFM procedure to the generalized Poisson model are similar to those for the multivariate Poisson model. For the situation with no covariates, the univariate estimating equations for the parameters λ_j and α_j, j = 1, ..., d, lead to

Σ_{i=1}^n n y_ij(y_ij − 1) / [ y_{+j} + (n y_ij − y_{+j}) α_j ] − y_{+j} = 0,
y_{+j}(1 − α_j) − n λ_j = 0,

where y_{+j} = Σ_{i=1}^n y_ij. When there is a covariate vector x_ij for the response observation y_ij, we may let λ_ij = a_j(γ_j, x_ij) for some function a_j with range in [0, ∞), and α_ij = b_j(η_j, x_ij) for some function b_j with range in [0, 1]. An example is λ_ij = exp(γ_j'x_ij) (or log(λ_ij) = γ_j'x_ij) and α_ij = 1/[1 + exp(−η_j'x_ij)]. The discussion on modelling the dependence parameters in the copulas in section 3.1 is also appropriate here. Furthermore, most of the discussion in section 3.2 about the comparison of models is also relevant, since the comparison is essentially a comparison of the associated multivariate copulas.
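The two equations above reduce the univariate IFM step to a one-dimensional root search in α_j followed by a substitution for λ_j. A sketch for the overdispersed case follows; it assumes 0 < α_j < 1 so that the bracket below contains the root.

```python
# A sketch of the univariate IFM step for a generalized Poisson margin
# (overdispersed case): root-find in alpha, then lambda = y_+ (1 - alpha)/n.
import numpy as np
from scipy.optimize import brentq

def gp_margin_fit(y):
    n, yp = len(y), y.sum()

    def score(alpha):
        return np.sum(n * y * (y - 1) / (yp + (n * y - yp) * alpha)) - yp

    alpha = brentq(score, 1e-6, 1 - 1e-6)   # assumes the root lies in (0, 1)
    lam = yp * (1 - alpha) / n
    return lam, alpha
```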
3.3.3 Multivariate negative binomial model

The multivariate negative binomial model for count data is obtained by letting G_j(y_j) = Σ_{s=0}^{[y_j]} p_j^{(s)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

p_j^{(s)} = [Γ(α_j + s)/(Γ(α_j) s!)] p_j^{α_j}(1 − p_j)^s, s = 0, 1, 2, ...,   (3.16)

with α_j > 0 and 0 < p_j < 1. The mean and variance of Y_j are E(Y_j) = α_j(1 − p_j)/p_j and Var(Y_j) = α_j(1 − p_j)/p_j², respectively. Since α_j > 0, we see that this model allows for overdispersion. When there is a covariate vector x_ij for the response observation y_ij, we may let α_ij = a_j(γ_j, x_ij) for some function a_j with range in [0, ∞), and p_ij = b_j(η_j, x_ij) for some function b_j with range in [0, 1]. See Lawless (1987) for another way to deal with covariates. Other details are similar to those for the multivariate generalized Poisson model.

3.3.4 Multivariate logarithmic series model

The multivariate logarithmic series model for count data is obtained by letting G_j(y_j) = Σ_{s=1}^{[y_j]} p_j^{(s)}, y_j = 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

p_j^{(s)} = a_j p_j^s / s, s = 1, 2, ...,   (3.17)

with a_j = −[log(1 − p_j)]^{-1} and 0 < p_j < 1. The mean and variance of Y_j are E(Y_j) = a_j p_j/(1 − p_j) and Var(Y_j) = a_j p_j(1 − a_j p_j)/(1 − p_j)², respectively. This model allows for overdispersion when p_j > 1 − e^{-1} and underdispersion when p_j < 1 − e^{-1}. Note that for this model to allow a zero count, we need a shift of one, such that p_j^{(t)} = a_j p_j^{t+1}/(t + 1) for t = 0, 1, 2, .... For the situation where there is a covariate vector x_ij for the response observation y_ij, we may let p_ij = F_j(γ_j'x_ij), where F_j is a univariate cdf. An unattractive feature of this model is that p_j^{(s)} is a decreasing function of s, which may not be suitable in many applications.

3.4 Multivariate copula discrete models for ordinal data

In this section, we discuss the modelling of multivariate ordinal categorical data with multivariate copula discrete (MCD) models. We first briefly discuss some special features of ordinal categorical data before introducing the general MCD model for ordinal data and some specific models.

When a polytomous variable has an ordered structure, we may assume the existence of a latent continuous random variable that measures the level of the ordered polytomous variable. For a binary variable, models for ordered data and unordered data are equivalent, but for categorical variables with more than 2 categories, ordered data and unordered data are quite different. The modelling of unordered data is not as straightforward as the modelling of ordered data. This is especially so in the multivariate situation, where it is not obvious how to model the dependence structure of unordered data. We briefly discuss the modelling of multivariate polytomous unordered data in Chapter 7. One aspect of ordinal data worth noticing is that it is possible to combine one category with an adjacent category for data analysis. This practice is not as meaningful for unordered categorical data, since the notion of adjacent category is not meaningful there, and arbitrary clumping of categories may be unsatisfactory.

We next introduce the MCD model for ordinal data. Consider a d-dimensional ordinal categorical random vector Y with m_j categories for the jth margin (j = 1, 2, ..., d), with the categories coded as 1, 2, ..., m_j. For the jth margin, the outcome y_j can take values 1, 2, ..., m_j, where m_j can differ with the index j. For Y_j, suppose the probability of outcome s, s = 1, 2, ..., m_j, is p_j^{(s)}. We define

G_j(y_j) = 0 for y_j < 1; G_j(y_j) = Σ_{s=1}^{[y_j]} p_j^{(s)} for 1 ≤ y_j < m_j; G_j(y_j) = 1 for y_j ≥ m_j,   (3.18)

where [y_j] denotes the largest integer less than or equal to y_j. For a given d-dimensional copula C(u_1, ..., u_d; θ), C(G_1(y_1), ..., G_d(y_d); θ) is a well-defined distribution for the ordinal random vector Y. The pmf of Y is

P(y_1 ··· y_d) = Σ_{i_1=1}^2 ··· Σ_{i_d=1}^2 (−1)^{i_1+···+i_d} C(a_{1,i_1}, ..., a_{d,i_d}; θ),   (3.19)

where a_{j1} = G_j(y_j − 1) and a_{j2} = G_j(y_j). (3.19) is called the multivariate copula discrete model for ordinal data.
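A direct transcription of (3.19) as a signed sum over the 2^d corners is straightforward. The sketch below assumes user-supplied callables: G(j, y) for the univariate margins and copula(u) for C(·; θ); both are hypothetical placeholders.

```python
# A direct implementation of the rectangle-probability formula (3.19).
from itertools import product

def mcd_pmf(y, G, copula):
    d = len(y)
    prob = 0.0
    for idx in product((1, 2), repeat=d):
        corner = [G(j, y[j] - 1) if i == 1 else G(j, y[j])
                  for j, i in enumerate(idx)]          # a_{j1} or a_{j2}
        prob += (-1) ** sum(idx) * copula(corner)      # signed corner term
    return prob
```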
Since Y_j is an ordered categorical variable, one simple way to reparameterize the p_j^{(s)}, so that the new parameters associated with the univariate margin have range in the entire space, is to let G_j(y_j) = F_j(z_j(y_j)), where F_j is the cdf of a continuous random variable Z_j. Thus p_j^{(s)} = F_j(z_j^{(s)}) − F_j(z_j^{(s−1)}). This is equivalent to

Y_j = 1 iff z_j(0) < Z_j ≤ z_j(1), Y_j = 2 iff z_j(1) < Z_j ≤ z_j(2), ..., Y_j = m_j iff z_j(m_j − 1) < Z_j ≤ z_j(m_j),

where −∞ = z_j(0) < z_j(1) < ··· < z_j(m_j − 1) < z_j(m_j) = ∞ are the cut-off points.

3.4.1 Multivariate logit model

The multivariate logit model for ordinal data is obtained by choosing F_j in the above reparameterization to be the logistic cdf, F(z) = [1 + exp(−z)]^{-1}. For the situation with a covariate vector x_ij for the marginal parameters of Y_ij and a covariate vector w_{i,jk} for the dependence parameters, we may let z_ij(y_ij) = γ_j(y_ij) + α_j'x_ij and, for the multinormal copula,

θ_{i,jk} = [exp(β_jk'w_{i,jk}) − 1]/[exp(β_jk'w_{i,jk}) + 1], 1 ≤ j < k ≤ d,

with loglikelihood functions of margins

ℓ_nj(γ_j, α_j) = Σ_{i=1}^n log P_j(y_ij), j = 1, ..., d; ℓ_njk(β_jk) = Σ_{i=1}^n log P_jk(y_ij y_ik), 1 ≤ j < k ≤ d,   (3.22)

where P_ij(y_ij) = F(z_ij(y_ij)) − F(z_ij(y_ij − 1)) and

P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_{i,jk}) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_{i,jk}),

with a_ij = F(z_ij(y_ij − 1)), b_ij = F(z_ij(y_ij)), a_ik = F(z_ik(y_ik − 1)) and b_ik = F(z_ik(y_ik)). We can apply quasi-Newton minimization to the loglikelihood functions of margins (3.22) to obtain the estimates of the parameters γ_j, α_j and the dependence parameters θ_jk (or the regression parameters β_jk for the dependence parameters, if applicable). The Newton-Raphson method can also be used to obtain the estimates of γ_j and α_j from Ψ_nj(γ_j, α_j) = 0. For applying the Newton-Raphson method, we need to calculate ∂P_j(y_ij)/∂γ_j, ∂P_j(y_ij)/∂α_j, ∂²P_j(y_ij)/∂γ_j∂γ_j', ∂²P_j(y_ij)/∂α_j∂α_j' and ∂²P_j(y_ij)/∂γ_j∂α_j'. The mathematical details for applying the Newton-Raphson method are the following. Let z_ij(y_ij) = γ_j(y_ij) + α_j1 x_ij1 + ··· + α_jp_j x_ijp_j. For y_ij ≠ 1, m_j, we have P_j(y_ij) = exp(z_ij(y_ij))/[1 + exp(z_ij(y_ij))] − exp(z_ij(y_ij − 1))/[1 + exp(z_ij(y_ij − 1))]; thus

∂P_j(y_ij)/∂α_js = { exp(z_ij(y_ij))/(1 + exp(z_ij(y_ij)))² − exp(z_ij(y_ij − 1))/(1 + exp(z_ij(y_ij − 1)))² } x_ijs, s = 1, ..., p_j,

∂²P_j(y_ij)/∂α_js∂α_jt = { exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))³ − exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))³ } x_ijs x_ijt, s, t = 1, ..., p_j.

For r = 1, 2, ..., m_j − 1, we have

∂P_j(y_ij)/∂γ_j(r) = exp(z_ij(y_ij))/(1 + exp(z_ij(y_ij)))² if r = y_ij; −exp(z_ij(y_ij − 1))/(1 + exp(z_ij(y_ij − 1)))² if r = y_ij − 1; 0 otherwise.

For r_1, r_2 = 1, 2, ..., m_j − 1, we have

∂²P_j(y_ij)/∂γ_j(r_1)∂γ_j(r_2) = exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))³ if r_1 = r_2 = y_ij; −exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))³ if r_1 = r_2 = y_ij − 1; 0 otherwise;

and

∂²P_j(y_ij)/∂γ_j(r)∂α_js = { exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))³ } x_ijs if r = y_ij; −{ exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))³ } x_ijs if r = y_ij − 1; 0 otherwise.

For y_ij = 1, P_j(y_ij) = exp(z_ij(y_ij))/[1 + exp(z_ij(y_ij))], and for y_ij = m_j, P_j(y_ij) = 1 − exp(z_ij(y_ij − 1))/[1 + exp(z_ij(y_ij − 1))]; thus the corresponding slight modifications of the above formulas should be made. For details about numerical methods, see section 2.7.

M_Ψ and D_Ψ can be calculated and estimated following the results in section 2.4. In applications, to avoid the tedious coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variances of ẑ_j(y_j) and θ̂_jk when there are no covariates, or those of γ̂_j, α̂_j and β̂_jk when there are covariates.
3.4.2 Multivariate probit model

Similar to the multivariate probit model for binary data, the general multivariate probit model for ordinal data is obtained by letting G_j(y_j) = Φ(z_j(y_j)) in (3.19), where −∞ = z_j(0) < z_j(1) < ··· < z_j(m_j − 1) < z_j(m_j) = ∞ are constants, j = 1, 2, ..., d. This is equivalent to letting F_j(z) = Φ(z), that is, choosing F_j to be the standard normal cdf. The copula C in (3.19) is arbitrary; the copulas (3.1)-(3.8) are some choices here. The multivariate probit model with multinormal copula for ordinal data is discussed in the literature (see for example Anderson and Pemberton 1985). The discussion of the multivariate logit model for ordinal data in the previous subsection is relevant and can be directly applied to the multivariate probit model for ordinal data. For completeness, we provide some detailed discussion of the multivariate probit model for ordinal data when the copula is the multinormal copula.

Let the data be y_i = (y_{i1}, ..., y_{id}), i = 1, ..., n. For the situation with no covariates, there are Σ_{j=1}^d (m_j − 1) univariate parameters and d(d−1)/2 dependence parameters. As for the multivariate logit model, with the IFM approach we find that ẑ_j(y_j) = Φ^{-1}(Σ_{r=1}^{y_j} n_j(r)/n), while θ̂_jk must be obtained numerically.

For the situation with a covariate vector x_ij for the marginal parameters z_j(y_ij) of Y_ij, and a covariate vector w_{i,jk} for the dependence parameter θ_{i,jk}, i = 1, ..., n, the details of IFM parameter estimation are similar to those for the multivariate logit model for ordinal data in the preceding subsection. We here provide some mathematical details for this model. We have P_ij(y_ij) = Φ(z_ij(y_ij)) − Φ(z_ij(y_ij − 1)) and

P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_{i,jk}) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_{i,jk}) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_{i,jk}),

where a_ij = Φ(z_ij(y_ij − 1)), b_ij = Φ(z_ij(y_ij)), a_ik = Φ(z_ik(y_ik − 1)) and b_ik = Φ(z_ik(y_ik)). With z_ij(y_ij) = γ_j(y_ij) + α_j'x_ij, we have ∂P_j(y_ij)/∂α_js = [φ(z_ij(y_ij)) − φ(z_ij(y_ij − 1))] x_ijs, s = 1, ..., p_j, and ∂²P_j(y_ij)/∂α_js∂α_jt = [−φ(z_ij(y_ij)) z_ij(y_ij) + φ(z_ij(y_ij − 1)) z_ij(y_ij − 1)] x_ijs x_ijt, s, t = 1, ..., p_j. For r = 1, 2, ..., m_j − 1, we have

∂P_j(y_ij)/∂γ_j(r) = φ(z_ij(y_ij)) if r = y_ij; −φ(z_ij(y_ij − 1)) if r = y_ij − 1; 0 otherwise.

For r_1, r_2 = 1, 2, ..., m_j − 1, we have

∂²P_j(y_ij)/∂γ_j(r_1)∂γ_j(r_2) = −φ(z_ij(y_ij)) z_ij(y_ij) if r_1 = r_2 = y_ij; φ(z_ij(y_ij − 1)) z_ij(y_ij − 1) if r_1 = r_2 = y_ij − 1; 0 otherwise;

and

∂²P_j(y_ij)/∂γ_j(r)∂α_js = −φ(z_ij(y_ij)) z_ij(y_ij) x_ijs if r = y_ij; φ(z_ij(y_ij − 1)) z_ij(y_ij − 1) x_ijs if r = y_ij − 1; 0 otherwise.

For y_ij = 1, P_j(y_ij) = Φ(z_ij(y_ij)), and for y_ij = m_j, P_j(y_ij) = 1 − Φ(z_ij(y_ij − 1)); thus the corresponding slight modifications of the above formulas should be made. For details on numerical methods, see section 2.7.

M_Ψ and D_Ψ can be calculated and estimated following the results in section 2.4. For example, for the case with no covariates, we have E(ψ²(z_j(y_j))) = {[P_j(y_j + 1) + P_j(y_j)]² φ²(z_j(y_j))}/{P_j(y_j + 1) P_j(y_j)}, where P_j(y_j) = Φ(z_j(y_j)) − Φ(z_j(y_j − 1)), and so on. In applications, to avoid the tedious coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variances of ẑ_j(y_j) and θ̂_jk when there are no covariates, or those of γ̂_j, α̂_j and β̂_jk when there are covariates.
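The closed-form univariate IFM step ẑ_j(y_j) = Φ^{-1}(Σ_{r≤y_j} n_j(r)/n) amounts to reading cut-points off the cumulative category frequencies; a minimal sketch for one margin (categories coded 1, ..., m_j):

```python
# A sketch of the no-covariate IFM univariate step for an ordinal probit
# margin: cut-points from cumulative category frequencies.
import numpy as np
from scipy.stats import norm

def probit_cutpoints(y, m):
    n = len(y)
    counts = np.array([(y == r).sum() for r in range(1, m + 1)])
    cum = np.cumsum(counts)[:-1] / n   # drop the last entry, G_j(m_j) = 1
    return norm.ppf(cum)               # z_j(1), ..., z_j(m_j - 1)
```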
The multivariate probit model with multinormal copula for ordinal data has been studied and applied in the literature. For example, Anderson and Pemberton (1985) used a trivariate probit model for the analysis of an ornithological data set on three aspects of the colouring of blackbirds.

3.4.3 Multivariate binomial model

In the previous subsections, we supposed that for Y_j the probability of outcome s is p_j^{(s)}, s = 1, 2, ..., m_j, j = 1, ..., d, and we linked the m_j probabilities p_j^{(s)} to the m_j − 1 cut-off points z_j(1), z_j(2), ..., z_j(m_j − 1). We thus keep as many independent parameters within the margins and between the margins as possible. In some situations, it is worthwhile to reduce the number of free parameters and obtain a more parsimonious model which may still capture the major features of the data and serve the inference purpose. One way to reduce the number of free parameters for the ordinal variable is to reparameterize the marginal distribution. Because Σ_{s=1}^{m_j} p_j^{(s)} = 1 and p_j^{(s)} ≥ 0, we may let

p_j^{(s)} = C(m_j − 1, s − 1) p_j^{s−1}(1 − p_j)^{m_j−s}, s = 1, ..., m_j,   (3.23)

for some 0 < p_j < 1. In other words, we assume that Y_j − 1 follows a binomial distribution, Bi(m_j − 1, p_j). This reparameterization of the distribution of Y_j reduces the number of free parameters to one, namely p_j. The model constructed in this way is called the multivariate binomial model for ordinal data. By treating the P_j(y_j) as binomial probabilities, we need only deal with one parameter, p_j, for the jth margin. (3.23) is artificial for ordinal data, since s in (3.23) is based on letting Y_j take integer values as its category indicator. But s is a qualitative quantity; it should reflect the ordinal nature of the variable Y_j, and need not take on integer values. In applications, if one feels justified in assuming binomial behaviour in the sense of (3.23) for the univariate margin, then this model may be considered. (3.23) is a more natural assumption if the categorical outcome of each univariate response can be considered as the number of realizations of an event in a fixed number of random trials; in this situation, it is an MCD model for binomial count data. When there is a covariate vector x_ij for the response observation y_ij, we may let p_ij = b_j(η_j, x_ij) for some function b_j with range in [0, 1]. Other details are similar to the multivariate logit model for binary data.

3.5 Multivariate mixture discrete models for binary data

The multivariate mixture discrete (MMD) models (2.16) or (2.17) are flexible as to the type of discrete data and the multivariate structure, by allowing different choices of copulas. However, they generally do not have closed-form pmf or cdf. The choice of models should be based on the desirable features of multivariate models outlined in section 1.3; among them, (2) and (3) are considered essential. In this and the next section, we study some specific MMD models. The mathematical development for other MMD models with different choices of copulas is similar.

3.5.1 Multivariate probit-normal model

The multivariate probit-normal model for binary data was introduced in (2.32) in Example 2.13. Following the notation in Example 2.13, the corresponding cut-off point a_ij is a_ij = β_j'x_ij in the more general situation, where x_ij is a covariate vector, j = 1, ..., d and i = 1, ..., n. Assume β_j ~ N_{p_j}(μ_j, Σ_j), j = 1, ..., d. Let γ = (β_1', ..., β_d')' ~ N_q(μ, Σ), where q = Σ_{j=1}^d p_j and Cov(β_j, β_k) = Σ_{jk}. From the stochastic representation in Example 2.13, we have
/ 2 ' _ Ojk + X-jSjjfcXjfc i ' j k ~ {(l + x^SJ\u00E2\u0080\u00A2xl\u00E2\u0080\u00A2j)(l + x ^ X i , ) } 1 / 2 , J t \u00E2\u0080\u00A2 Chapter 3. Modelling of multivariate discrete data 116 The jth and (j, k) marginal pmf are HAW) = i/y + (! - -Pi,jk(yijyik) = *2 ($ - 1 (6 , - j ) , * - 1 (6$_1(ai*0; r\u00C2\u00BB\u00E2\u0080\u00A2J\u00E2\u0080\u00A2*)-$2($\"~ 1 ( a \u00E2\u0080\u00A2 .;\u00E2\u0080\u00A2)> nj*) + $ 2 ( $ - 1 ( a , j ) , $~1(a,-Jb); r , j*) , where ay = G,-j(y,j - 1), btj = Gy(j/y) , a,-* = Gik(yik - 1) and 6\u00C2\u00BBfc = Gik{yik), with G y ( l ) = 1 and Gjj(O) = 1 \u00E2\u0080\u0094 $(z*j). We can thus apply quasi-Newton minimization to the log-likelihood functions of margins inj(Pj, \u00C2\u00A3,\u00E2\u0080\u00A2) = \u00C2\u00A3 l o g P , - ( \u00C2\u00BB y ) > J = 1. \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, d. \u00C2\u00BB=i n 4jfc(Pj ,S j , pk,T,k,ejk,Ejk) = \u00C2\u00A3 l o g Pjk(yijyik), l < j < k < d , \u00C2\u00AB=i to obtain the estimates of the parameters / i ^ , E j , /ij., E j , fyi and Eyj,. From appropriate assumptions, many simplifications of (3.24) are possible. For example, if E j = I and E j * = 0, j ^ fc, then (3.24) simplifies to 1.1/2 ' ^ i . , . . . , u, (3.25) * {1 + x ^ x y } ^ ' 11/2 ' , J * {(l + x<.x t i)(l+x< f cx i ,)} 1 which is a simple example of having the dependence parameters be functions of covariates in a natural way, as they are derived. The numerical advantage is that as long as 0 = (9jk) is positive-definite, then all R{ = (ujk), i = 1 , . . . , n, are positive-definite. A n extension of (3.24) is to let z*j=lt'iXi}> J = l> \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2'd> rijk 11/2 ' J ' ^ ' (3.26) {(l + w{ , .E i w i i ) ( l + w;. i E t w,- t ) } 1 where xy and wy may differ. However this does not obtain from a mixture model. 3.5.2 Multivariate Bernoulli-Beta model For a d-variate binary random vector Y taking value 0 or 1 for each component, assume we have the M M D model (2.17), such that P(yi---Vd)= t \u00E2\u0080\u00A2\u00E2\u0080\u00A2 f ^f{yi-,Pj)~l(Gk(pk)), and Pj)]yii[l ~ h j i x i j ^ j t f - ^ c i G M , G q ( P g ) ) n gj(Pj)dPl \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 dPq, Jo Jo j = 1 j = 1 (3.28) for some function hj with range in [0,1]. A large family of such functions is hj (xtj ,pj) = Fj (Ff1 (Pj)+ P'j'X-ij) where Fj is a univariate cdf. Pj(yij) and Pjk(yijVik) can be written accordingly. For ex-ample, if Fj(z) = exp(\u00E2\u0080\u0094e~z), then hj(xij,Pj) = p^XP^ ^ j X , : ' ^ and we have that when y,j = 1, Pj(yij) = B(ctj + exp(\u00E2\u0080\u00943jXij),3j)/B(ctj,8j). If covariates are not subject dependent, but only margin-dependent, an alternative extension is to let a,,- and depend on the covariates for some functions aj and bj with range in [0,oo], such that a,-,- = a,j(fj,Xj) and = bj(t)j,Xj). In this sit-uation, we have, for example, Pj(yij) = B(aj(y'jXj) + yij, bj(rr,jXj) + l-yij)/B(aj(y'jXj), bj(tfjXj)). A n example of the functions aj and 6, is a,j \u00E2\u0080\u0094 exp(y'jXj) and /?,-,- = exp(t]'jXj). When apply-ing the I F M approach to parameter estimation, the numerical computation involves 2-dimensional integration which would be feasible in most cases. A special case of the model (3.27), where pj = p, j = 1 , . . . , d, is the model (1.1), studied in Prentice (1986). The pmf of the model is P(vi---Vd)= I' py+(l-P)d-y+g(p)dp, : (3.29) Jo where y+ = J2j=iVj a n d 9(P) * s the density of a Beta(a,/?) distribution. 
The model (3.29) has exchangeable dependence and admits only positive dependence. A discussion of this special model, with extensions to include covariates and to admit negative dependence, can be found in Joe (1996). Chapter 3. Modelling of multivariate discrete data 118 3.5.3 Multivariate logit-normal model For a d-variate binary random vector Y taking value 0 or 1 for each component, suppose we have the M M D model (2.17), such that rl rl d P(yi---yd)= \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2/ J\f{yj\Pj)g{pi,-\u00E2\u0080\u00A2-,Pd)dpi-\u00E2\u0080\u00A2-dPd, (3.30) Jo Jo j = 1 where f(yj ;pj) = py' (1 \u00E2\u0080\u0094pj)l~yi, and g(-) is the density function of a normal copula, with univariate marginal cdf Gj Jo V \u00C2\u00B0~j In other words, if pj is the outcome of a rv Pj, and Zj = logit(Pj) = l o g ( P , / l \u00E2\u0080\u0094 Pj), j = 1,...', d, then (Z\,..., Zd)' has a joint d-dimensional normal distribution with mean vector ji, variance vector J2T1d T. r exp { - J(z - p)'{ is the standard univariate normal density function and 2 the standard bivariate normal density function. Given data = (yn,..., yid) with no covariates, we may obtain pj, aj and Ojk by the I F M approach. For the case of different covariates for different margins, similar to the multivariate Bernoulli-Beta model, an interpretable extension of (3.30) is obtained by letting rl rl d P(yn---Vid)= / \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2/ X\[hj{*ij,Pi)]yii[l-hj(xij,pj)]1-yi>g{p1,...,pd)dpl---dpd, (3.31) Jo Jo j = l for some function hj with range in [0,1]. Pj{yij) and Pjk{yijy%k) can be written accordingly. If covariates are not different, an interpretable extension to include covariates to the parameters in c(l - x) dx, 0 < pj < 1, j = 1,. Chapter 3. Modelling of multivariate discrete data 119 (3.30) obtains by letting ptj = aj{fj,Xj) and Cij = bj(r)j,Xij) for some functions aj and bj. The loglikelihood functions of margins for the parameters are now \u00C2\u00A3nj(Pj,CTj) = ^logPjjyij), j = l , . . . , d , n enjk(8jk) = E \\u00C2\u00B0&pjk{yiiyik), i < j < k < d , \u00C2\u00BB=i where J-oo l + exp(/iy+<7,ja;) D / \ r\u00C2\u00B0\u00C2\u00B0 r\u00C2\u00B0\u00C2\u00B0 exp{yij(pij + aijx)} exp{yik(pik + o-iky)} , , a , , , Pjk(yijVik)= / / . , , , \ 7 7 '\u00E2\u0080\u00942{x,y\6jk)dxdy. J-oo J-oo 1 + exp(^ij + cr^z) 1 -)- exp(pik + a n d (3.33) with ?7j > 0, j = l,...,p, is the multivariate lognormal density. For simple situation with no covariates, fa = / i , fl-,- = c and 0,- = 0 . This model is studied in Aitchison and Ho (1989). The model (3.32) can accommodate a wide range of dependence, as we have seen in Example 2.12. Corr(Y}, Yk) is an increasing function of 9jk, and varies over its full range when 9jk varies over its full range. Thus in a general situation a multivariate Poisson-lognormal model of dimension d, consists of d univariate Poisson-lognormal models describing some marginal characteristics and d(d \u00E2\u0080\u0094 l ) /2 Chapter 3. Modelling of multivariate discrete data 120 dependence parameters 9jk, 1 < j < k < d, expressing the strength of the associations among the response variables. Ojk = 0 for all j ^ k correspond to independence among the response variables. The response variables are exchangeable if 0 has an exchangeable structure and pj and 2{Zj,Zk\9jk)dZjdzk, J-oo J-oo y\u00C2\u00BBj!2/,A;!exp(e^+^^ + e' i fc+CTfcZk) where 2 is the standard binormal density. To get the I F M E of p., a and 0 , quasi-Newton mini-mization method can be used. 
Good starting points can be obtained from the method of moments estimates. Let yj, s2 and rjk be the sample mean, sample variance and sample correlations re-spectively. The method of moments estimates based on the expected values given in (2.30) are = {log[(\u00C2\u00AB? - yj)/y] + l]} 1 ' 2 , p] = logy,- - 0.5(<7?)2 and 9% = logfokS jS k / ( y j y k ) + l}/(a]a0k). When there is a covariate vector xy for the response observation y i ;-, we may let pij = a, (7,, xy) for some function aj in the range (\u00E2\u0080\u009400,00), and let 0 is considered as a scale factor (known or unknown) and the common parameter A has the lognormal distribution LN(p, a2). In this situation we have V 2 7 T [ \ j = 1 Vj! J-00 exp(e\"+^ Pj) and the parameters p and a are common across all the margins. To calculate P(yi \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - yd), we need only calculate a one-dimensional integral; thus full maximum likelihood estimation can be used to get the estimates of p, a and /3j (if it is unknown). By the formulas in (2.26), it can be shown that there Chapter 3. Modelling of multivariate discrete data 121 is an exchangeable correlation structure in the response vector Y , with the pairwise correlations tending to 1 when p or a tend to infinity. Independence is achieved when a \u00E2\u0080\u0094\u00E2\u0080\u00A2 0. The model (3.34) does not admit negative dependence. 3 . 6 . 2 M u l t i v a r i a t e P o i s s o n - g a m m a m o d e l The multivariate Poisson-gamma model is obtained by letting Gj(r)j) in (2.24) be the cdf of a univariate gamma distribution with shape parameter aj and scale parameter Bj, with the density function gj(x;aj,8j) = B~a'xAJ~1e~x/Pi/T(CXJ), x > 0, aj > 0 and Bj > 0. The Gamma family is closed under convolution for fixed B. The copula C in (2.24) is arbitrary; (3.1)\u00E2\u0080\u0094(3.8) are some choices here. For example, with the multinormal copula, the multivariate Poisson-gamma model is M U B E . Thus the I F M approach can be applied to fit the model. The j th marginal distribution of a multivariate Poisson-gamma distribution is f\u00C2\u00B0\u00C2\u00B0 Je~zizVizaj~1e-z^Pi dzj pj(Vj)= / fiVj; ZJ)9j(zj) dzj = Jo yj\pj3T(aj) _ T ( y j + a j ) f 1 Y * / Bj \ \u00C2\u00BB y,-!r(tti) \l + 8j) \l + 8j) ' which implies that Yj has a negative binomial distribution (in the generalized sense). We have E(Yj) \u00E2\u0080\u0094 ajBj and Var(Yj) = otj8j(l + Bj). The margins are overdispersed since Var(Yj)/E(Yj) > 1. Based on (3.35), if a , is an integer, yj can be interpreted as the number of observed failures in yj +aj trials, with aj a previously fixed number of successes. The parameter estimation procedure based on I F M is similar to that for the multivariate Poisson-lognormal model. Some simplifications are possible. One simplification for the Poisson-gamma model is to hold the shape parameter aj constant across j. In this situation, we have E(Yj) \u00E2\u0080\u0094 pj = aBj and Var{Yj) = pj(l + Pj/a). Similarly, we can also require Bj be constant across j and obtain the same functional relationship between the mean and the variance across j. By doing so, we reduce the total number of parameters. With this simplification in the number of parameters, the same parameter appears in different margins. The I F M approach for estimating parameters common to more than one margin discussed in section 2.6 can be applied. 
Another special case is to let Aj = \Bj, where Bj > 0 is considered to be a scale factor (known or unknown) and the common parameter A has a Gamma distribution. This is similar to the multivariate Poisson-lognormal model (3.34). Negative dependence cannot be admitted into this special situation, which is similar to the multivariate Poisson-lognormal model (3.34). Chapter 3. Modelling of multivariate discrete data 122 3.6.3 Multivariate negative-binomial mixture model Consider d-dimensional count data with yj = rj, rj + 1 , . . . , rj > 1, j = 1,2 . . . , d. For example, with given integer value rj, yj might be the total number of Bernoulli trials until the r ,th success, where the probability of success in each trial is pj; that is If Pj is itself the outcome of a random variate Xj, j = 1 , . . . , d, which have the joint distribution G(p\,.. -,Pd), then the distribution for Y = ( Y i , . . . , Y \u00C2\u00B0 . W = \u00C2\u00B0. L 2> \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 Using the recursive relation T(x) = (x \u00E2\u0080\u0094 l)T(x \u00E2\u0080\u0094 1), Pj(yj\pj) can be written as yj I pj(yj\pj) = Pi (3j + k - - Pj) k=l 1, yj = 0. The multivariate negative-binomial mixture model can be denned with this general negative binomial distribution as the discrete mixing part. Bj, j \u00E2\u0080\u0094 1, . . .,d, can be considered as parameters in the mo del. 3.6.4 Multivariate Poisson-inverse Gaussian model The multivariate Poisson-inverse Gaussian model is obtained by letting Gj(rjj) in (2.24) be the cdf of a three-parameter univariate inverse Gaussian distribution with density function 9j(*j) = -JTi\u00E2\u0080\u0094\Xrlexp[(^/2)(CJ/AJ- + \j/ij)), Xj > 0, (3.36) where uij = f? + a? \u00E2\u0080\u0094 aj > 0, \u00C2\u00A3j > 0 and \u00E2\u0080\u0094oo < jj < oo. In the density expression, Kv(z) denotes the modified Bessel function of the second kind of order v and argument z. It satisfies the Chapter 3. Modelling of multivariate discrete data 123 relationship 2v Ku+1(z) = \u00E2\u0080\u0094Kv(z) + Kv.1(z), z with K-i/2(z) = K 1/2(2) = yJ!Tf2zexp(\u00E2\u0080\u0094z). The copula C in (2.24) is arbitrary; interesting choices are copulas (3.1)\u00E2\u0080\u0094(3.8). With the multinormal copula, the multivariate Poisson-inverse Gaussian model is M U B E ; thus the I F M approach can be applied to fit the model. A special case of the multivariate Poisson-inverse Gaussian model results when f(yj\Zj) = e~ZiZj1 /yj\, where Zj = Xtj, with tj > 0 considered as a scale factor (j = l , . . . , d ) . Then the pmf for Y is K k + 1 (y/w(w + 2tZt:) ( u> V*+7)/2 TT (&i)yi Ky(w) V w + 2 ^ E ^ y / = i yj-where k = J^j=i extensive study of this special model can be found in Stein et al. (1987). 3.7 Application to longitudinal and repeated measures data Multivariate copula discrete ( M C D ) and multivariate mixture discrete ( M M D ) models can be used for longitudinal and repeated measures (over time) data when the response variables are discrete (binary, ordinal and count), and the number of measures is small and constant over subjects. The multivariate dependence structure has the form of time series dependence or of dependence decreas-ing with lag. Examples include M C D and M M D models with special copula dependence structure and special patterns of marginal parameters. These models include stationary time series models that allow arbitrary univariate margins and non-stationary cases, in which there are time-dependent or time-independent covariates or time trends. 
In classical time series analysis, the standard models are autoregressive (AR) and moving average (MA) models. The generalization of these concepts to M C D and M M D models for discrete time series is that \"autoregressive\" is replaced by Markov and \"moving average\" is replaced by fc-dependent (only rv's that are separated by a lag of k or less are dependent). A particularly interesting model is the Markov model of order one, which can be considered as a replacement for AR(1) model in classical time series analysis; and these types of Markov models can be constructed from families of bivariate copulas. For a more detailed discussion of related topics, such as the extension of models to include covariates and models for different individuals observed at different times, see Joe (1996, Chapter 8). Chapter 3. Modelling of multivariate discrete data 124 If the copula is the multinormal copula (3.1), the correlation matrix in the multinormal copula may have patterns of correlations depending on lags, such as exchangeable or A R type. For example, for exchangeable pattern, Ojk = 0 for all 1 < j < k < d. For AR(1), djk = 6^3~h\ for some \0\ < 1. For AR(2), 6jk = ps, with s = \j \u00E2\u0080\u0094 k\. ps is the autocorrelations of lag s; the autocorrelation satisfy Ps = iP,-i+2P,-2, s >%,<}>!= px(l- p2)/(l- pj), 2 = (P2 - pl)/{l- p\), and are determined from pi and p2-Some examples of models suitable for modelling longitudinal data and repeated measures (over time) are the multivariate Poisson-lognormal model, multivariate logit-normal model, multivariate logit model with multinormal copula or with M - L construction, multivariate probit model with multinormal copula, and so on. In fact, the multivariate probit model with multinormal copula is equivalent to the discretization of A R M A normal time series for binary and ordinal response. For the discrete time series and d > 4, approximations can be used for the probabilities Pr(Y} = yj, j \u00E2\u0080\u0094 1 , . . . , d) which in general are multidimensional integrals. 3.8 S u m m a r y In this chapter, we studied specific M C D models for binary, ordinal and count data, and M M D models for binary and count data. ( M M D models for ordinal data are not presented, since there is no natural simple way to represent such models, however M M D models for binary data can be extended to M M D models for nominal categorical data.) Extension to let the marginal parameters as well as the dependence parameter be functions of covariates are discussed. We also outlined the potential application of M C D and M M D models for longitudinal data, repeated measures and time series data. However, this chapter does not contain an exhaustive list of models in the family of M C D and M M D classes. Many additional interesting models in M C D and M M D classes could be introduced and studied. Our purpose in this chapter is to demonstrate the richness of the classes of M C D and M M D models, and to make several specific models available for applications. Some examples of the application of models introduced in this chapter can be found in Chapter 5. Chapter 4 The efficiency of I F M approach and the efficiency of jackknife variance estimate It is well known that under regularity conditions, the (full-dimensional) maximum likelihood esti-mator (MLE) is asymptotically efficient and optimal. But in multivariate situations, except the multinormal model, the computation of the M L E is often complicated or impossible. 
The I F M approach is proposed in Chapter 2 as an alternative estimation approach. We have shown that the I F M approach provides consistent estimators with some good asymptotic properties (such as asymptotic normality of the estimators). This approach has many advantages; computational fea-sibility is main one. It can be applied to many M C D and M M D models (models with M U B E , P U B E properties) with appropriate choices of the copulas; examples of such copulas are multinor-mal copula, M - L construction, copulas from mixture of max-id distributions, copulas from mixture of conditional distributions, and.so on. The I F M theory is a new statistical inference theory for the analysis of multivariate non-normal models. However, the efficiency of estimators obtained from I F M in comparison with M L estimators is not clear. In this chapter, we investigate the efficiency of the I F M approach relative to maximum likelihood. Our studies suggest that the I F M approach is a viable alternative to M L for models with M U B E , P U B E or M P M E properties. This chapter is organized as follows. In section 4.1, we discuss how to assess the efficiency of the I F M approach. In section 4.2, we carry out some analytical comparisons 125 Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 126 of the I F M approach to M L for some models. These studies show that the I F M approach is quite efficient. A general analytical investigation is not possible, as closed form expressions for estimators and the corresponding asymptotic variance-covariance matrices from M L and I F M are not possible for the majority of multivariate non-normal models. Most often numerical assessment of their performance must be used. In section 4.3, we carry out extensive numerical studies of the efficiency of I F M approach relative to M L approach. These studies are done mainly for M C D and M M D models with M U B E or P U B E properties. The situations include models without and with covariates. In section 4.4, we numerically study the efficiency of I F M approach relative to M L approach for models with special dependence structure. The I F M approach extends easily to the models with parameters common to more than one margin. Section 4.5 is devoted to the numerical assessment of the efficiency of the jackknife approach for variance estimation of I F M E . The numerical results show that the jackknife variance estimates are quite satisfactory. 4.1 The assessment of the efficiency of I F M approach In section 2.3, we have given some optimality criteria for inference functions. We concluded that in the class of all regular unbiased estimating functions, the inference functions of scores (IFS) is M -optimal (so T-optimal or D-optimal as well). For the regular model (2.12), the inference function in the I F M approach are in the class of regular unbiased inference functions; thus all the (asymptotic) properties of regular inference functions apply to I F M . To assess the efficiency of I F M relative to IFS, at least three approaches are possible: A l . Examine the M-optimality (or T-optimality or D-optimality) of I F M relative to IFS. A2. Compare the M S E of the estimates from I F M and IFS based on simulation. A3 . Examine the asymptotic behaviour of 2\u00C2\u00A3($) - 2\u00C2\u00A3{6) based on the knowledge that 2\u00C2\u00A3(6) - 21(0) has an asymptotic \ 2 q distribution when 6 is the true parameter vector (of length q). A l is along the lines of inference function theory. 
As an estimator may be regarded as a solution to an equation of the form \\u00C2\u00A3(y; 6) = 0, we study the inference functions instead of the estimators. This approach can be carried out analytically in a few cases when both the Godambe information matrix of I F M and the Fisher information matrix for IFS are available in closed form, or otherwise numerically by computing (or estimating) the Godambe information matrix and the Fisher information matrix (based on simulation). With this approach, we do not need to actually find the parameter estimates Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 127 for the purposes of comparison. The disadvantage is that the Godambe information matrix or Fisher information matrix may be difficult to calculate, because partial derivatives are needed for the computation and they are difficult to calculate for most multivariate non-normal models. Also this is an asymptotic comparison. A2 is a conventional approach, it provides a way to investigate the small sample properties of the estimates. This possibility is especially interesting in comparison with A l , since although M L E s are asymptotically optimal, this may not generally be the case for finite samples. The disadvantage with A2 is that it may computationally demanding with multivariate non-normal models, because for each simulation, parameters estimation based on I F M and IFS are carried out. A3 is based on the understanding that if the estimates from I F M are efficient, we would envisage that the full-dimensional likelihood function evaluated at these estimates should have the similar asymptotic behaviour as when the full-dimensional likelihood function is evaluated at the M L E . More specifically, suppose the loglikelihood function is \u00C2\u00A3(0) = Yl7=i l\u00C2\u00B0S/(yi|0)> where 6 is a vector of length q. Under regularity conditions, 2(\u00C2\u00A3(8) \u00E2\u0080\u0094\u00C2\u00A3(8)) has an asymptotic x\ distribution (see for example, Sen and Singer 1993, p236). Thus a rough method of assessing the efficiency of 0 is to see if 2(\u00C2\u00A3(0) \u00E2\u0080\u0094 \u00C2\u00A3(0)) is in the likelihood-based confidence interval for 2(1(0)\u00E2\u0080\u0094 \u00C2\u00A3(0))\ this interval of 1 \u00E2\u0080\u0094a confidence is (x2.a/2> Xji -a/2) ' where Xqtp is the lower 0 quantile of a chi-square distribution with q degrees of freedom. The assessment can be carried out by comparing the frequency of (empirical confidence level of) 2(1(0) \u00E2\u0080\u00941,(0)) in the ( x 2 a / 2 ' ^ j i _ a / 2 ) with 1 \u00E2\u0080\u0094 a. In other words, we check the frequency of and 0 is considered to be efficient if the empirical frequency is close to 1 \u00E2\u0080\u0094 a. The advantage of this efficiency of 0 in comparison with 0 in relatively small sample situations. In our studies, A3 will not be used. We mention this approach merely for further potential investigations. To compare I F M with IFS by A l , we need to calculate the Fisher information (matrix) and the Godambe information (matrix). Suppose P(y\ \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -yd',0), 0 6 5ft, is a regular M C D or M M D model in (2.12), where 0 = ..., 6q)' is g-component vector, and 5ft is the parameter space. 
The Fisher information matrix from one observation for the parameter vector 0, I, has the following expression 8\u00C2\u00A3{6--xl,a/2<2(t(0)-l(0))q)- The Godambe information matrix Jy based on I F M for one observation is = D<$M^1 Dl%, where / M u \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 M l q \ /Dn ,9,J Dlq\ D q q / # *), Djj = EWj/dBj) (j = with Mj, = E(rl>]) (j = 1 , . . . , \u00C2\u00AB ) , Mjk = E ( ^ k ) (j, k = 1, 1 , . . . , q), and Djk = E(dipj/dOk) (j,k = 1 , . . . , q, j ^ k). The detailed calculation of the elements of M $ and D $ can be found in section 2.4 for the models without covariates. The M-optimality assessment examines the positive-definiteness of J^1 \u00E2\u0080\u0094 I~l. It is equivalent to T-optimality which examines the ratio of the trace of the two information matrices, Tr ( J^ ' 1 ) /Tr (7 _ 1 ) , and D-optimality which examines the ratio of the determinant of the two information matrices, det(J^' 1 )/det(7 _ 1 ) . T -optimality is a suitable index for the efficiency investigation as it is easier to compute. A n equivalent index to D-optimality is ^ d e t ( J * )/det(I l ) . In our efficiency assessment studies, we will use M -optimality, T-optimality or D-optimality interchangeably depending on which is most convenient. In most multivariate settings, A l is not feasible analytically and extremely difficult computa-tionally (involving tedious programming of partial derivatives). A2 is an approach which eliminates the above problems as long as M L E s are available. As M L E s and IFMEs are both unbiased only asymptotically, the actual bias related to the sample size is an important issue. For an investigation related to sample size, it is more sensible to examine the measures of closeness of an estimator to its true value. Such a measure is the mean squared error (MSE) of an estimator. For an estimator 6 = 9(X\,..., Xn), where X\,..., Xn is a random sample of size n from a distribution indexed by 9, the M S E of 9 about the true value 9 is MSE( o r A* = n _ 1 E i = i x \u00C2\u00AB - simple calculation leads to J ^ 1 ^ ) = n _ 1 E . Thus if we incorporate the knowledge that pi,...,pd are the same value, with W A and P M L A , we have Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 130 i . W A : the final I F M E of p is P-w = l ' E - i l ' which is exactly the same as p since ft = \" - 1 E r = i x \u00C2\u00BB - So m this situation, the I F M E is equivalent to the M L E . ii . P M L A : the final I F M E of p is . l ' jdiagtS) } - 1 / ! E o - r - V ^ l ' { d i a g ( \u00C2\u00A3 ) } - i l y > r / ' With this approach, I F M E is not equivalent to M L E . The ratio of Var(/ip) to Var(/i) is l / { d i a g ( S ) } - 1 S { d i a g ( S ) } - 1 l l / S - 1 l (l ' {diag(E)}-il)2 There is some loss of efficiency with simple P M L A . \u00E2\u0080\u00A2 E x a m p l e 4.3 (Trivariate p r o b i t , general) Suppose we have a trivariate probit model with known cut-off points, such that P ( l l l ) = $3(0, 0, 0, p\2, P13, p23)- We have the following (Tong 1990): P i ( l ) = P 2 ( l ) = P3(l) = * ( 0 ) = (4.2) ^ + ^ ( s i n 1,912+sin 1 p13 + sin 1 p23). (4.3) P ( l l l ) = $3(0,0, 0,pi2,/\u00C2\u00BB13,A>23) The full loglikelihood function is In = n ( l l l ) l o g P ( l l l ) + n(110) log P(110) + n(101) log P(101) + n(100) logP(100) n(011) logP(Oll) + n(010) log P(010) + n(001) log P(001) + n(000) log P(000). Even in this simple situation, the M L E of pjk is not available in closed form. 
The information matrix for P12, P13 and p23 from one observation is /In I12 I13 \ I = I12 I22 I23 , \Ii3 I23 hs I where, for example, '11 d P ( l l l ) + P(ll l ) V dpi2 1 fdP{011)\2 + 1 /<9P(110)\ /ap(ioi)V fdP(100)\ P(110) V d P l 2 J + P(101) ^ d P l 2 ) + P(100) d P l 2 ) /<9P(010)\ / d P ( 0 0 1 ) V fdP(000)\ p(oii) I, d P l 2 J + P(OIO) V d P l 2 ) + p(ooi) ^ aPi2 ) + P(OOO) ^ d P l 2 ) Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 131 Simple calculation gives us 5 P ( l l l ) / c V i 2 = 1/(4^-^/1 \u00E2\u0080\u0094 p f 2 ) ; a n d other terms also have similar expressions. After simplification, we get 7T3 + 64a - 16TT6 \"\"C 1 - P2i2)cdef ' where a = s i n 1 p i 2 s i n 1 / 9 i 3 s i n 1 p23, 6 = (sin V i 2 ) 2 + (sin 1 pi3f + (sin 1 p23)2, c = 7T + 2 s i n - 1 pi2 + 2 s i n - 1 piz + 2 s i n - 1 p23, d= 7r + 2 s i n - 1 pi2 \u00E2\u0080\u0094 2 s i n - 1 P13 \u00E2\u0080\u0094 2 s i n _ 1 p23, e = 7r \u00E2\u0080\u0094 2 s i n - 1 p\2 + 2 s i n - 1 p\3 \u00E2\u0080\u0094 2 s i n - 1 p23, / = 7T \u00E2\u0080\u0094 2 s i n - 1 pi2 \u00E2\u0080\u0094 2 s i n - 1 p13 + 2 s i n - 1 p23. Other components in the matrix I can be computed similarly. The inverse of / , after simplification, is found to be / \u00C2\u00AB n ai2 a 1 3 \ r1 = a\2 a22 a23 \ a i 3 a 2 3 a 3 3 where a n = <222 = a33 = a i 2 = a i 3 = 023 = ( T 2 - 4 ( s i n - V i 2 ) 2 ) ( l -Ph) ( T 2 -4 4 ( s i n - 1 p i 3 ) 2 ) ( l -Piz) 4 4 ( s i n - 1 / > 2 3 ) 2 ) ( l -pis) (2 s i n 4 _ 1 p i 2 s i n _ 1 p13 - 7 r s i n _ 1 P23)(l - P 2 2 ) 1 / 2 ( l - P213)1/2 (2 s i n - 1 P i 2 s i n _ 1 p23 -2 7 r s i n - 1 Pis)(l - P 2 2 ) 1 / 2 ( l ~ Ph)1/2 (2 s i n _ 1 / ? i 3 s i n _ 1 / ? 2 3 -2 w s i n - 1 P12 ) ( l - p ? 3 ) 1 / 2 ( l ~ Ph)1'2 For the I F M approach, we have 'n , - t ( l l ) + n > t(00) n,-*(10) + n,-t(01)\ dPjk(U) njk 9njk = 0 leads to Pjk = s i n 1 / 2 - ^ ( 1 1 ) 7T n , - t ( l l ) + njt(OO) ~ njfc(lO) - \"jfc(Ol) 3p , 1 < j < k < 3. 2 n If the I F M for one observation is 9 = (^12, ^13, ^23), then from section 2.4, we have E(tl>- VJ ) = V p:kim{yjykyiym) dPjk{yjyk) dPim(yiym) j\" ' m {y^v-} P i * ( W \u00C2\u00BB ) f l m ( \u00C2\u00BB y \u00C2\u00BB ) dpjk 8p,m Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 132 where 1 < j < fc < 3, 1 < / < m < 3, and 'dj>jk\ i E We thus find that dpjk E {yjyk} 1 < j < fc < 3. My = / on bi2 bi3' bi2 b22 623 \ ^13 &23 &33' and Dy = / - 6 1 0 V 0 0 -622 0 \u00C2\u00B0 \ 0 -633/ where bn &22 633 bl2 &13 b23 = (TT2 \u00E2\u0080\u0094 4(sir ( T T 2 \u00E2\u0080\u0094 4(sin\" (7r 2 \u00E2\u0080\u0094 4(sin\" (w2 \u00E2\u0080\u0094 4(sin\" (7r 2 \u00E2\u0080\u0094 4(sin\" pi2)2)(i-pi2y 4 P 1 3 ) 2 ) ( 1 - P 2 3 ) ' 4 P23) 2)(wy 1 6 s i n - 1 p i 2 s i n - 1/3i 3 \u00E2\u0080\u0094 87rsin - 1 P23 Pi2?){*2 - 4(sin - 1 p 1 3 ) ' ) ( l - p ? 2 ) 1 / 2 ( l - p\3fl2 ' 16 s i n - 1 pi2 s i n - 1 p 2 3 \u00E2\u0080\u0094 87r s i n - 1 p\3 P12)2)(*2 ~ 4(sin\" 1 P23) 2)(l - Pl2)1/2(1 - P223)1/2 ' 16 s i n - 1 P13 s i n - 1 P23 \u00E2\u0080\u0094 87rsin - 1 P12 (n2 - 4 (s in - 1 p 1 3) 2)(7r 2 - 4(sin - 1 p23)2)(l - ? 2 3) 1 / 2(1 - ph)1'2 ' After simplification, J^1 = D^1 My(D~^l)T turns out to be equal to I - 1 . Therefore by M-optimality, the I F M approach is as efficient as the IFS approach. The algebraic computation in this example was carried out with the help of the symbolic manip-ulation software Maple (Char et al. 1992). Maple is also used for other analytical examples in this section. 
For completeness, the Maple program for this example is listed in Appendix A . The Maple programs for other examples in this section are similar. \u00E2\u0080\u00A2 Example 4.4 (Trivariate probit, exchangeable) Suppose now we have a trivariate probit model with known cut-off points, such that P(lll) = $3(0,0,0, p, p, p). That is, the latent variables are permutation-symmetric or exchangeable. With (4.2), we obtain P1(1) = P2(1) = P3(1) = *(0)=|, Pi 2 (H) = Pi 3 (H) = P 2 3 (H) = $ 2(0,0 , p ) = ] + \u00C2\u00B1- s i n - 1 p, P(lll) = $3(0,0,0, p, p, p) = I + A s i n \" 1 p. o 47T The M L E of p is 7r(4niii +4n0 0o -n) Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 133 Based on the full loglikelihood function (4.3), we calculate the Fisher information for p (using Maple). The asymptotic variance of p is found to be (1 - P2)(TT + 6 s i n - 1 p)(7r - 2 s i n - 1 p) Var(p) = 12n Let the I F M for one observation be \P = (ip\2,1P13, ^23) \u00E2\u0080\u00A2 We use W A and P M L A to estimate the common parameter p: i . W A : We have / a b / - a 0 0 \ b a b and \u00C2\u00A3>$ = 0 \u00E2\u0080\u0094a 0 \b b a) I 0 0 - a ) where a = ( 7 r 2 - 4 ( s i n _ 1 p ) 2 ) ( l - p 2 ) 6 = ! sin 1 p (TT - 2 s i n _ 1 p)(w + 2 s i n - 1 p)2{\ - p2) Thus / a - 1 a~2b a~2b\ J * 1 = D^MviDy1) 1\T a 2b a 1 a 2b \a~2b a~2b a'1 J Assume the I F M E of pn, P13, P23 are p\2, P13, P23 respectively. With W A , we find the weighting vector u = (1/3,1/3,1/3)'. So the I F M E of p, pw> is Pw = ^ (Pl2 + Pl3 + P23), and the asymptotic variance of pw is Var(p) = \u00E2\u0080\u0094 u ' J ^ 1 u _ l ( l - / ? 2 ) ( 7 r 2 - 4 ( s i n - 1 p ) 2 ) 2 ( l - p 2 ) ( 7 r - 2 s i n - 1 / 9 ) s i n - 1 p + 3 9 4n _ (1 - p2)(ir + 6 s i n \" 1 p)(ir - 2 s i n \" 1 p) 12n ii . P M L A : The I F M is * = V12 + ^13 + fe- Thus = \u00C2\u00A3(^ 12 + ^ 13 + V\"23) 2n \u00C2\u00A3 P ( y i * \u00C2\u00AB , ) ( \u00C2\u00A3 * / P ( . w w ) ) (4.4) Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 134 and D9 = E(d(ilil2 + V>13 + i>23)/dp) 3 {l/iyaya} j,k=VJi3, ^ 23)- We use W A and P M L A to estimate the common parameter p: i . W A : We have My = I a c d^ c b c I and Dy = \ d c a. / - a 0 0 \ 0 - 6 0 V 0 0 - a I Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 136 where a ( 7 r 2 - 4 ( s i n - V ) 2 ) ( i - / > 2 ) ' ( 7 r 2 - 4 ( s i n-V ) 2 ) ( l - / > 4 ) ' 16/jsin - 1 p d = ( T T 2 - 4(sin- 1 P) 2 )(TT + 2 s i n \" 1 p2)(l - p2)(l + p2)1/2 ' 87rsin _ 1 /9 2 - 16(sin - 1/?) 2 - 1 n\2\2( ( 7 r 2 - 4 ( s i n - 1 / 9 ) 2 ) 2 ( l - / ' 2 ) Thus c(afc)- 1 da~2 \ J * 1 = Dy1 My (Dy1 )T = c(ab)-1 ciab)-1 \ da~2 c(ab)-1 a - 1 / Assume the I F M E of p12, P\3, P23 are p12, P13, f>23 respectively, and let p = (p\2,p\3, P23)'\u00E2\u0080\u00A2 With W A , the I F M E of p, pw, is Pw = u'p, where the weighting vector u = \u00C2\u00AB 2 , \"3)' = Jyl/(l' Jyl). We find that 01020307 Ui = U3 \u00C2\u00AB2 2o 8 [p2a4 + (1 + p 2 )o 5 + p{l + p2y/2a6] (2/>2o4 + p{\ + /9 2 ) 1 / 2 a 6 ) a 2 a 3 2a 8 [p 2 a 4 + (1 + p2)a5 + p(l + p2y'2a6} ' where ai , 02, 03, 04, 05, and ae as above, and a7 = TT + irp2 + 2 s i n - 1 p2 + 2p2 s i n - 1 p2 - 4p(p2 + 1 ) 1 / 2 s i n - 1 p, a8 = T T 2 + 47rs in _ 1 p2 + 4 ( s in - 1 p2)2 - 16(sin _V) 2-Figure 4.2 is a plot of the weights versus p 6 [\u00E2\u0080\u00941,1]. The asymptotic variance of p is Var(p) = V j - y which turns out to be the same as (4.6). ii . 
P M L A : The I F M is \P = tp12 + rp13 + tp23. Following (4.4) and (4.5), we calculate (using Maple) the corresponding My and Dy, and then Var(/?p) = J^1. The algebraic expression for Var(pp) is complicated, so we do not display it. Instead, we plot the ratio Var(/>p)/Var(/5) versus p 6 [\u00E2\u0080\u00941,1] in the Figure 4.3. The maximum of the ratio is 1.0391, which is attained at p = 0.3842 and p = -0.3842. Chapter 4, The efficiency of IFM approach and the efficiency of jackknife variance estimate 137 Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 138 1 . 0 - 0 . 5 O . O 0 . 5 1 . 0 - 1 . 0 - 0 . 5 O . O 0 . 5 1 . O Figure 4.4: Trivariate probit, AR(1): (a) The efficiency of p from the margins (1,2) or (2,3). (b) The efficiency of p from the margin (1,3). The above results show that I F M with W A leads to an estimate as efficient as the IFS approach in the AR(1) situation, and I F M with simple P M L A leads to a slightly less efficient estimator (ratio< 1.04). The p from the estimating equations based on margin (1,2) (or (2,3)) is different from the p based on margin (1,3). For p from I F M with the (1,2) (or (2,3)) margin, the ratio of the asymptotic variance of the I F M E of p to the asymptotic variance p is 2(,r2 - 4(sin- 1 p)2) [p2aA + (1 + p2)a5 + p(l + p2)^2a6] 7r(l + p2)aia2a3 For p from I F M with (1,3) margin, the corresponding ratio is ( T T 2 - 4(sin- 1 p2)2) [p2aA + (1 + p2)a5 + p(l + p2)1'2^} 7>2 \u00E2\u0080\u0094 2itp2aia.2a3 We plot r\ and r2 versus p 6 [\u00E2\u0080\u00941,1] in Figure 4.4. We see that when p goes from \u00E2\u0080\u00941 to 0, r i increases from 1.707 to values around 2. When p goes from 0 to 1, r*i decreases from values around 2 to 1.707. Similarly, r2 increases from 1.207 to oo as p goes from \u00E2\u0080\u00941 to 0, and decreases from oo to 1.207 as p goes from 0 to 1. We conclude that the (1,3) margin by itself leads to an inefficient estimator in a wide range of the values of p. We notice that r2 > ri when p < 0.6357, r2 < r\ when p > 0.6357, and rx = r2 = 1.97 when p = 0.6357. \u00E2\u0080\u00A2 Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 139 Example 4.6 (Trivariate M C D model for binary data with Morgenstern copula) Suppose we have a trivariate M C D model for binary data with Morgenstern copula, such that P ( l l l ) = ulU2u3[l + e12(l - m ) ( l - u2) + 013(1 - ui)(l - u3) + 023(1 - u2)(l - u3)}, \0{j\< 1, where the dependence parameters 0i2, #13 and 023 obey several constraints: 1 + 012 + $13 + 023 > 0, 1 + #13 > 023 + #23, (4-7) 1 + #12 > #13 + #23, 1 + #23 > #12 + #13-We have Pj(l) = Uj, j = 1,2,3, and P,fc(ll) = [1 + 0jk(l - Uj)(l - uk)]ujUk, 1 < j < k < 3. Assume Uj are given, and the parameters of interest are #12, #13 and #23. The full loglikelihood function is (4.3). The Fisher information matrix for the parameters #i2, #13 and #23 is I. Assume we have I F M for one observation \P = (ipi2,ipi3,ip23). The Godambe information for \P is Jy = DyM^1(Dy)T. We proceed to calculate My and Dy. The algebraic expression of I and Jy are extremely complicated. We used Maple to output algebraic results in the form of C code and then numerically evaluated the ratio r - T r ( ^ \" 1 ) 9 ~ Tr(I-i)' where rg means the general efficiency ratio. 
For this purpose, we first generate n\ uniform points (#12, #13, #23) from the cube [\u00E2\u0080\u00941, l ] 3 in three dimensional space under the constraints (4.7), and then order these \u00C2\u00ABi points based on the value of |#i2| + |#i3| + |^ 231 from the smallest to the largest. For each one of the ni points (#12,#13,#23), we generate n2 points of (ui,u2,u3) with (#12,#13,#23) as given dependence parameters in a trivariate Morgenstern copula in (2.5) (see section 4.3 for how to generate multivariate Morgenstern variate), and then order these n2 points based on the value of u\+u2+u3 from the smallest to the largest. Each generated set of (u\, u2, u3, #12, #13, #23) determines a trivariate M C D model with Morgenstern copula for binary data. We calculate rg corresponding to each particular model. Figure 4.5 presents the values of rg at n\X n2 = 300 x 300 \"grid\" points. We can see from Figure 4.5 that the I F M approach is reasonably efficient in most situations. It is also clear that the magnitude of |#i2| + |#i3| + |#23| has an effect on the efficiency of the I F M approach, with generally speaking higher efficiency (rg's value close to 1) when |#i2| + |#i3| + |#23| is relatively smaller. The magnitude of u\ + u2 + u3 has some effect such that the efficiency of the I F M approach is lower at the area close to the boundary of u\ + u2 + u3 (that is close to 0 or 3). The following facts show that the general efficiency of I F M approach is quite good: in these 90,000 efficiency (rg) evaluations, 50% of the rg values are less than 1.0196, 90% of the rg values are less than 1.0722, 99% of the rg values are less than 1.1803 and 99.99% of the rg values are less Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 140 \u00E2\u0080\u00A2JOO Figure 4.5: Trivariate Morgenstern-binary model: Relative efficiency of I F M approach versus IFS approach. than 1.4654. The maximum is 1.7084. The minimum is 1. The two plots in Figure 4.6 are used to clarify the above observations. Plot (a) consists of the 90,000 ordered rg values versus their ordered positions (from 1 to 90,000) in the data set. Plot (b) is a histogram of the rg values. Overall, we consider the I F M approach to be efficient. It is also possible to examine the efficiency ratio in some special situations. We study two of them here. The first one is the situation where ui = u2 \u00E2\u0080\u0094 u3 \u00E2\u0080\u0094 u and 0 i 2 = #13 = #23 \u00E2\u0080\u0094 9 (\u00E2\u0080\u00941/3 < 0 < 1). The ratio of the asymptotic variance of 9 (based on WA) versus the asymptotic variance of 9 is found to be 0,10,20,3 n(u,9) = 6i6 2 where ai = 270 2u 4 - 540 2 u 3 + 330 2 u 2 - 1O0U2 - 60 2u + 1O0U - 3 9 - 1 , a 2 = 36>V - 9 0 V + 903u4 - 110V - 3 0 3 u 3 + 220 2 u 3 - 1202M2 + 9u2 + 9 2 u - 9 u - 9 - l , a 3 = 9u2 -0u + l, 61 = (S9u2 - 69u + 30 + l ) (30u 2 - A9u + 9 + 1), 62 = (30u 2 - 20w + l ) (30u 2 + l ) (0u 2 - 9u - l ) 2 . Figure 4.7 is a plot of r i (\u00C2\u00AB , 0) versus u (0 < u < 1) and 0 (\u00E2\u0080\u00941/3 < 0 < 1). We observe that at the boundaries, when 0 = 1, r\{u, 9) is in the interval (1,1.232), and the maximum 1.232 is attained at Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 141 (a) (b) oo 0 1 o 2 0 0 0 0 4 0 0 0 0 eoooo 8 0 0 0 0 1 . 0 1 . 2 1 . 4 1 . 6 Figure 4.6: Trivariate Morgenstern-binary model: (a). Ordered relative efficiency values of I F M approach versus IFS approach; (b) A histogram of the efficiency value rg. 
u = 0.2175 o r u = 0.7825. When 9 = \u00E2\u0080\u00941/3, ri(u, 9) is in the interval (1,2), and the maximum 2 is attained at u = 0 or u = 1. Since the maximum ratio is 2 at some extreme points in the parameter space and for the most part the ratio is less than 1.1, we consider the I F M approach to be efficient. The second special situation is where u\ = u2 = 1*3 = u, 9\2 = #23 = 9 and #13 = 92. The algebraic expression of the ratio r2(u, 9) of the asymptotic variance of 9 (based on WA) versus the asymptotic variance of 9 extends to several pages. We thus only present a plot of r2(u, 9) versus u (0 < u < 1) and 9 (\u00E2\u0080\u00941 < 9 < 1) in Figure 4.8. We observe that at the boundaries when 9 = 1 , the ratio r2(u,9) is in the interval (1,1.200097), and the maximum is attained at u = 0.2139 or u = 0.7861. When 9 = \u00E2\u0080\u00941, the ratio r2(u,9) is in the interval (1,1.148333), and the maximum is attained at u = 0.154 or 0.846. Overall, the I F M approach is demonstrated again to be efficient. \u00E2\u0080\u00A2 Example 4.7 (Trivariate normal-copula model for binary data) In Examples 4.3, 4.4 and 4.5, we studied the efficiency of the I F M approach versus the IFS approach in the special situations of P ( l l l ) = $3 (0 ,0 ,0 ,p 1 2 , /> i3 , />2 3 ) , P(U1) = $3(0,0,0,p,p,p) and P ( l l l ) = $ 3 (0,0,0,p,p 2 ,p) . We found that the I F M approach was fully efficient in these situations. For a general trivariate normal-binary model P ( l l l ) = $ 3 ( $ 1(ui),$ 1(w 2),$ 1{U3),P12,P13,P23), (4.8) Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 142 Figure 4.7: Trivariate Morgenstern-binary model: Relative efficiency of I F M approach versus IFS approach when u\ = u2 = \u00C2\u00AB3 and 9\2 = #13 = #23-Figure 4.8: Trivariate Morgenstern-binary' model: Relative efficiency of I F M approach versus IFS approach when m = u2 \u00E2\u0080\u0094 113, 9\2 \u00E2\u0080\u0094 $23 = 9 and #13 = 92. Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 143 the closed form efficiency evaluation, as provided for the trivariate Morgenstern-binary model in Ex-ample 4.6, is not possible because $ 3 ( $ _ 1 ( u i ) , $ - 1 ( u 2 ) , $ - 1 ( u 3 ) , P 1 2 , P 1 3 , P 2 3 ) does not have closed form. Nevertheless, since a high precision multinormal probability calculation subroutine (Schervish 1984) is available, we can evaluate the efficiency numerically. With the model (4.8), we have Pj(l) = ujt j - 1,2,3 and Pjk(ll) = $2(3>_ 1(u;)> $ - 1(ujfe); Pjk), 1 < j < k < 3. Assume Uj are given, and the parameters of interest are p i 2 , P13 and P23- Let 61 \u00E2\u0080\u0094 P i 2, #2 = P13 and 63 \u00E2\u0080\u0094 p23- The Fisher information matrix from one observation for the parameters 61, 62 and O3, I, has the following expression /hi I12 Ii3\ I = I12 I22 I23 V ^13 ^23 ^331 where 2 T - ST 1 (^123(2/12/22/3)'\ . _ 1 9 \u00E2\u0080\u009E \" \" , ^ , PMvittoVs) { d9j J {yiyaya} T 1 5Pl23(2/l2/22/3) 5Pl23(2/l2/22/3) , . I i k - \"5\u00E2\u0080\u00947 \ ?SZ l < j < k < 6 . r z - ' , ^123(2/12/22/3) ddj 36k We can similarly calculate the Godambe information matrix J$ based on the I F M approach for one observation. We then numerically evaluate the ratio (T-optimality) T r ( J ^ ) 9 T r ^ - 1 ) in the joint trinormal copula sample space and its parameter space. 
Similar to Example 4.6 for the trivariate Morgenstern-binary model, we first generate rti uniform points of (pi2, Pi3> P23) from the cube [\u00E2\u0080\u00941, l ] 3 in three dimensional space under the constraints 1 + 2pi2Pi3/>23 \u00E2\u0080\u0094 P12 \u00E2\u0080\u0094 P13 \u00E2\u0080\u0094 P23 > 0 (which guarantees that the determinant of a trinormal correlation matrix is positive) and order these ni points based on the value of \pn\ + \pi3\ + |/>231 from the smallest to the largest. Then for each one of the n\ points (P12, P13, P23), we generate n 2 points ( u i , \u00C2\u00AB 2 , U 3 ) with (P12,P13,P23) as given dependence parameters in a trinormal copula, and order these ri2 points based on the value of U i + U2 + U3 from the smallest to the largest. Each generated set of (ui, 112, 113, P12, P13, P23) determines a trivariate normal-binary model. We evaluate rg corresponding to each particular model. The plot in Figure 4.9 presents the values of rg at ni x 712 = 300 x 300 \"grid\" points for the trivariate normal-copula model for binary data. We observe from the plot that the I F M approach is reasonably efficient in most situations. It is also clear that the magnitude of |pi2| + \pi3\ + \P23\ has an effect on the efficiency of the I F M approach, with generally higher efficiency (rg's value close to 1) when Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 144 Figure 4.9: Trivariate normal-binary model: Relative efficiency of I F M approach versus IFS ap-proach. \pn\ + \pi3\ + \P23\ is smaller. The magnitude of ui -f u2 + u3 has some effect such that the efficiency of I F M approach is lower at the area close to the boundaries of u\ + u2 + u3 (that is close to 0 or 3). In general the I F M approach is quite efficient: in these 90,000 efficiency (rg) evaluation, 50% of the rg values are less than 1.0128, 90% of the rg values are less than 1.0589, 99% of the rg values are less than 1.1479 and 99.99% of the rg values are less than 1.3672. The maximum is 1.8097. The minimum is 1. The two plots in Figure 4.10 are used to clarify the above observations. Plot (a) consists of the 90,000 ordered rg values versus ordered positions (from 1 to 90,000) in the data set. Plot (b) is a histogram of the rg values. Overall, we draw the conclusion that the I F M approach is efficient. In the situation where ui = u2 = u3 = u and p\2 = pi3 = p23 = p (\u00E2\u0080\u00941/2 < 9 < 1), let us denote r\(u,p) the ratio of the asymptotic variance of p (based on WA) versus the asymptotic variance of 9. ri(u, p) has to be evaluated numerically. Figure 4.11 shows a plot of ri(u, p) versus u (0 < u < 1) and p (\u00E2\u0080\u00941/2 < p < 1). It is difficult to evaluate ri(u,p) numerically when the values of u and p are near the boundaries of the sample space and the parameter space, but generally speaking, the efficiency is lower when the values of u and p are close to the boundaries. In the situation where ui = u2 = u3 = u, p\2 = p23 = p and p\3 \u00E2\u0080\u0094 p2 (p \u00C2\u00A3 [\u00E2\u0080\u00941,1]), we observed similar efficiency behaviour. These results are not presented here. \u00E2\u0080\u00A2 Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 145 Figure 4.11: Trivariate normal-binary model: Relative efficiency of I F M approach versus IFS ap-proach when ui = u2 = U3 and p\2 = P13 = p2z-Chapter 4. 
The efficiency of IFM approach and the efficiency of jackknife variance estimate 146 We have seen from the trivariate normal-copula model for binary data and the trivariate Morgenstern-copula model for binary data that, in some situations, I F M is as efficient as IFS (e.g. when u = 0.5 for normal-binary model and u = 0, 0.5 or 1 for Morgenstern-binary model). In other situations, the efficiency of I F M relative to IFS varies from 1 to a value very close to 1. It is hoped the above results may help to develop intuition for the efficiency of I F M . We would guess that the relative efficiency of I F M to IFS for a model with the M U B E property should be good, as we have seen with the trivariate normal-copula model for binary data and the trivariate Morgenstern-copula model for binary data. However, a general exhaustive analytical investigation such as above is not possible; we have to rely on numerical investigation based on simulation for most of the complicated (higher dimensions or models with covariates) situations. 4.3 Efficiency assessment through simulation In this section, we give efficiency assessment results through simulation studies with various models. The following are the steps in the simulation and computation: (1) a M C D or M M D model (with M U B E property) is chosen; (2) different sets of model parameters are specified; (3) with a given set of parameters, a sample of size n is generated from the model, and I F M and IFS approaches are used on the same generated data set to estimate the model parameters; (4) with the same set of parameters, step (3) is repeated m times; (5) for any single parameter in the model, say 9, if the estimates of 9 with the I F M approach from step (3) and (4) are 0\,...,6m, and the estimates of 9 with IFS approach from step (3) and (4) are 9\,..., 9m, then we compute \u00C2\u00A3 = 2 X i A M S E ( i ) = \u00C2\u00A3 \u00C2\u00A3 i ( g < - g ) 2 (4.9) m m and m m The relative efficiency of I F M E to M L E is defined as the ratio r where r 2 = MSE(#)/MSE(#). The values of 9, ^ M S E ( 0 ) , 6, ^ M S E ( 0 ) and r are tabulated, with yJlMSE(6) and ^ M S E ( 0 ) presented in parentheses. For a fixed sample size, a parameter estimation approach is said to be good if 9 (or 9) is close to 9, and if ^ M S E ( f l ) (or y /MSE(#)) is small. There is no \"good\" in the strict sense, it should be under-stood in terms of inference, interpretation (i.e. no misleading interpretation or false inference would be derived, assuming the model is correct) and in comparison with conventional, well-established Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 147 approach. The main objective of this section is to show that with fairly complex models, the I F M approach still has high efficiency. M u l t i v a r i a t e copula discrete models for b i n a r y data In this subsection, we study the M C D models for binary data. The parameters are assumed to be margin-dependent. In our simulation, we use the M V N copula, and simulate (/-dimensional binary observations y,- (i = 1 , . . . , n) from a multivariate probit model Yij = I(Zij < z^), j = 1 , . . . , d, i = l , . . . , n , where Zj = (Zn,.. .,Zn)' ~ MVNd(0,Qi) with z,, = \u00C2\u00A3 j x y , and 0* = (Oijk) assumed to be free of covariates, that is 0,- = 0 or Oijk = #jA, V i. We transform the dependence parameter 6jk with $jk = (exp(ajfc) \u00E2\u0080\u0094 l)/(exp(ajfc) + l) , and estimate ajk instead of Ojk- We use the following simulation scheme: 1. 
The sample size is n, the number of simulations is N; both are reported in the tables. 2. For d = 3, we study the two situations: Yij = I(Zij < Zj) and Y,;- = I(Zij < fyo + fij\Xij). For each situation, two general dependence structures are chosen: #12 = #13 = #23 = 0.6 (or a12 = a13 = a 2 3 = 1.3863) and 012 = 023 = 0.8 (or a 1 2 = a 2 3 = 2.1972), 013 = 0.64 (or a i 3 = 1.5163). Other parameters are: (a) With no covariates, with z = (0,0,0)'. (b) With covariates, with 0O = (/?1 0,/3 2 0,/?3 0)' = (0.7,0.5,0.3)' and & = (/3n,/?21,/?31)' = (0.5,0.5,0.5)'. Situations where Xij is discrete and continuous are considered. For the discrete situation, X{j = I(U < 0) where U ~ U(\u00E2\u0080\u00941,1); for the continuous situation, XijS are margin-independent, that with Xi ~ N(0,1/4). 3. For d = 4, we only study Y,j = I(Z{j < Zj). Two dependence structures in the study are #12 = #13 = #14 = #23 = #24 - #34 = 0.6 (or C*12 = c*i3 = a i 4 = a 2 3 = a 2 4 = <*34 - 1.38 63) and #12 = #23 = #34 = 0.8 (or a i 2 = a 2 3 = a 3 4 = 2.1972), # i3 = #24 = 0.64 (or a i 3 = a 2 4 = 1.5163) and #i4 = 0.512 (or a i 4 = 1.1309). The cut-off points are (a) z = (0,0,0,0)', (b) z = (0.7,0.7,0.7,0.7)', (c) z = (0.7,0,0.7,0)'. The numerical results from M C D models for binary data are presented in Table 4.1 to Table 4.5. These tables lead to two clear conclusions: i) The I F M approach is efficient relative to the Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 148 Table 4.1: Efficiency assessment with M C D model for binary data: d = 3, z = (0,0,0)', N = 1000 n m a r g i n parameters 1 2 3 (1,2) (1,3) (2,3) Zl Zi Z3 \u00C2\u00AB 1 2 \u00C2\u00AB 1 3 C*23 a i 2 = \u00C2\u00AB 1 3 = \u00C2\u00AB 2 3 = 1.3863 100 I F M M L E r 0.003 -0.002 0.005 1.442 1.426 1.420 (0.131) (0.121) (0.128) (0.376) (0.380) (0.378) 0.002 -0.003 0.004 1.441 1.426 1.420 (0.131) (0.121) (0.128) (0.376) (0.380) (0.378) 0.998 0.999 0.999 0.999 0.999 0.999 1000 I F M M L E r -0.0006 -0.0016 -0.0008 1.3924 1.3897 1.3906 (0.040) (0.038) (0.039) (0.114) (0.114) (0.113) -0.0018 -0.0028 -0.0019 1.3919 1.3893 1.3902 (0.040) (0.038) (0.039) (0.114) (0.114) (0.113) 0.997 0.997 0.997 1.000 1.001 1.000 a 1 2 = a 2 3 = 2.1972, c*i3 = 1.5163 100 I F M M L E r 0.0027 -0.0006 0.0003 2.2664 1.5571 2.2586 (0.131) (0.123) (0.130) (0.454) (0,377) (0.453) 0.0015 -0.0020 -0.0012 2.2646 1.5552 2.2579 (0.131) (0.123) (0.131) (0.453) (0.377) (0.452) 0.999 1.000 0.999 1.001 1.001 1.002 1000 I F M M L E r -0.0006 -0.0001 -0.0005 2.2009 1.5174 2.2043 (0.040) (0.038) (0.039) (0.135) (0.118) (0.136) -0.0023 -0.0020 -0.0022 2.2003 1.5166 2.2036 (0.040) (0.038) (0.039) (0.135) (0.118) (0.137) 0.996 1.000 0.996 0.999 1.000 1.000 M L approach, for small to large sample sizes. The ratio values r are very close to 1 in almost all the situations studied. These results are consistent with the results from the analytical studies reported in the previous section, ii) The M L E may be slightly more efficient than the I F M E , but this observation is not conclusive. We would say that I F M E and M L E are comparable. Multivariate copula discrete models for ordinal data In this subsection, we study the M C D models for ordinal data. The parameters are assumed to be margin-dependent. In our simulation, we use the M V N copula. We simulate d-dimensional ordinal observations (i = 1 , . . . 
, n) from a multivariate probit model for ordinal data, such that ' Yj = 1 i f f Z j(0) < Zj < Zj(l), Yj =2iftzj(l) 0), g(X) a Gamma density function, having the form fl'(Aj) = {l/[/?\" , r(aj)] }Aj J - 1 e - A 3 ' / ' 3 j ' , Aj > 0, with /?j being a scale parameter, and G(Aj) is a Gamma cdf. We have -E'(Aj) = \u00C2\u00ABj/?j and Var(Aj) = ctjPj. (4.11) is a multivariate Poisson-Morgenstern-gamma model. The multiple integral in (4.11) over the joint space of A i , . . . , A^ can be decomposed into a product of integrals of a single variable. The calculation of P(y\ \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - yd) can thus be accomplished by calculating 2d univariate integrals. In fact, we have d d p(2/i---^)=np(%-)+x>*{ j=l j \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > u \" 0 = , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > \u00C2\u00ABm-i) + J2f=~i ejm(l ~ 2UJ)(1 - 1um), it follows that / ( \u00C2\u00AB ! , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - l i m ) = 1 + H7=ldirn(l-2Uj) _ 2 ( ^ 0 ^ ( 1 - 2 ^ ) ) ^ / ( U l , . . . , \u00C2\u00AB m - l ) / ( \u00C2\u00AB l , - - - , \u00C2\u00AB m - l ) \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - , \u00C2\u00AB m - l ) Hence \u00E2\u0080\u00A2\m \u00E2\u0080\u0094 1 / E r = ~ i l ^ m ( i - 2 \u00C2\u00AB j ) \ C ( W m | \u00C2\u00A3 / l = C / m _ l = U m _ i ) = 1 + \u00E2\u0080\u0094 7 7 7 \u00C2\u00AB \u00E2\u0080\u009E \ / ( U l , . . . , U m _ i ) J Er=\"l gjm(l-2ti,-) 2 ~ 1/ / ( u i , . . . , u m _ i ) m Let A = / ( \u00C2\u00AB ! , . . . , u m _i ) , 5 = E J L / 9jm(l-2uj), and D = B / A From Du2n \u00E2\u0080\u0094 (D+l)um+Vm = 0, we get \u00E2\u0080\u00A2_\u00E2\u0080\u00A2(\u00C2\u00A3> + 1) \u00C2\u00B1 V ( D + l ) 2 - 4 \u00C2\u00A3 > V m 2\u00C2\u00A3> Thus the algorithm for generating U\,.. .,Ud from C ( \u00C2\u00AB i , . . . , is as the following: 1. Generate V\,..., Vd from Uniform(0,1). Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 160 2. Let Ui = Vi. 3. Let A = 1 if m = 2 and A = 1 + \u00C2\u00A3 \" 7 / M 1 ~ 2 u i X 1 ~ 2 u O i f m > 2 - L e t B = T%=7i eim{l -IUJ), and D = B/A. 4. For m > 2, if \u00C2\u00A3) = 0, Um = Vm. If D ^ 0, (7m takes one of the values of [(\u00C2\u00A3> + 1) \u00C2\u00B1 \/{D + l ) 2 \u00E2\u0080\u0094 4DVm]/[2D] for which it is positive and less than 1. The efficiency studies with the multivariate Poisson-Morgenstern-gamma model are carried out only for the dependence parameters Ojk, in that univariate parameters are fixed. We use the following simulation scheme: 1. The sample size is n = 3000, the number of simulations is N = 200. 2. The dimension d is chosen to be 3, 4 and 5. 3. The marginal parameters aj and f3j are fixed. They are aj = f3j = 1 for j = 1 , . . . , d. 4. For each dimension, two dependence structures are considered: (a) For d = 3, we have (0 1 2 ,0 1 3 ,0 2 3 ) = (0.5,0.5,0.5) and (0 1 2 ,0 1 3 , 0 2 3 ) = (0.6,0.7,0.8). (b) For d = 4, we have (612,6>i3,014.623,024,034) = (0.5,0.5,0.5,0.5,0.5,0.5) and (0 i2 ,0 i3 ,0 i4 ,023 ,024 ,0 3 4 ) = (0.6,0.7,0.8,0.6,0.7,0.6). (c) For d = 5, we have (012,013,014,015,023,024,025,0 3 4 ,0 3 5 ,0 4 5 ) = (0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5) and (012 ,013 ,014 ,015 ,023 ,024 ,0 2 5 , 0 3 4 , 0 3 5 , 0 4 5 ) = (0.6,0.7,0.8,0.8,0.6,0.7,0.8,0.6,0.7,0.8). The numerical results from the M M D models for count data with the Morgenstern copula are presented in Table 4.16 to Table 4.18. We obtain similar conclusions to those for the M C D models for binary, ordinal and count data. 
Basically, they are: i) the IFM approach is efficient relative to the ML approach; the ratio values r are very close to 1 in almost all the situations studied; ii) the MLE may be slightly more efficient than the IFME, but this observation is not conclusive. IFME and MLE are comparable.

Table 4.16: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 3

  (θ12, θ13, θ23) = (0.5, 0.5, 0.5)
  approach   θ12       θ13       θ23
  IFM        0.495     0.500     0.501
             (0.125)   (0.125)   (0.124)
  MLE        0.494     0.499     0.500
             (0.122)   (0.124)   (0.123)
  r          1.022     1.008     1.003

  (θ12, θ13, θ23) = (0.6, 0.7, 0.8)
  IFM        0.603     0.699     0.792
             (0.127)   (0.118)   (0.119)
  MLE        0.600     0.697     0.790
             (0.127)   (0.120)   (0.119)
  r          1.000     0.985     0.995

Table 4.17: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 4

  (θ12, θ13, θ14, θ23, θ24, θ34) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
  approach   θ12       θ13       θ14       θ23       θ24       θ34
  IFM        0.500     0.495     0.513     0.498     0.494     0.488
             (0.131)   (0.128)   (0.124)   (0.133)   (0.134)   (0.138)
  MLE        0.501     0.493     0.512     0.497     0.495     0.485
             (0.130)   (0.124)   (0.122)   (0.131)   (0.132)   (0.135)
  r          1.008     1.026     1.014     1.021     1.018     1.019

  (θ12, θ13, θ14, θ23, θ24, θ34) = (0.6, 0.7, 0.8, 0.6, 0.7, 0.6)
  IFM        0.593     0.680     0.794     0.589     0.692     0.599
             (0.130)   (0.127)   (0.120)   (0.121)   (0.133)   (0.124)
  MLE        0.589     0.678     0.792     0.585     0.689     0.598
             (0.127)   (0.124)   (0.117)   (0.118)   (0.129)   (0.124)
  r          1.017     1.026     1.024     1.025     1.037     1.006

Table 4.18: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 5

  (θ12, θ13, θ14, θ15, θ23, θ24, θ25, θ34, θ35, θ45) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
  approach   θ12      θ13      θ14      θ15      θ23      θ24      θ25      θ34      θ35      θ45
  IFM        0.501    0.496    0.477    0.486    0.511    0.467    0.478    0.504    0.493    0.495
             (0.122)  (0.131)  (0.137)  (0.132)  (0.123)  (0.130)  (0.123)  (0.128)  (0.131)  (0.116)
  MLE        0.495    0.493    0.473    0.482    0.508    0.466    0.475    0.503    0.489    0.494
             (0.121)  (0.125)  (0.134)  (0.128)  (0.122)  (0.125)  (0.119)  (0.127)  (0.128)  (0.113)
  r          1.012    1.046    1.023    1.026    1.002    1.043    1.037    1.011    1.026    1.018

  (θ12, θ13, θ14, θ15, θ23, θ24, θ25, θ34, θ35, θ45) = (0.6, 0.7, 0.8, 0.8, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8)
  IFM        0.595    0.667    0.775    0.767    0.597    0.693    0.778    0.590    0.693    0.602
             (0.140)  (0.137)  (0.132)  (0.130)  (0.127)  (0.139)  (0.118)  (0.136)  (0.126)  (0.125)
  MLE        0.590    0.666    0.772    0.766    0.593    0.690    0.778    0.588    0.687    0.604
             (0.137)  (0.132)  (0.128)  (0.126)  (0.119)  (0.135)  (0.115)  (0.132)  (0.124)  (0.113)
  r          1.023    1.036    1.029    1.032    1.067    1.029    1.028    1.028    1.018    1.103

4.4 IFM efficiency for models with special dependence structure

The IFM approach may have important applications for models with special dependence structure. Data with special dependence structure arise often in practice: longitudinal studies, repeated measures, Markov-type dependence data, k-dependent data, and so on. The analytical assessment of the efficiency of the IFM approach for several models with special dependence structure was given in section 4.2. In the following, we give some numerical results on IFM efficiency for some more complex models with special dependence structure. The estimation approach used here is PMLA.
We present only representative results, from the MCD model for binary data with the MVN copula under exchangeable and AR(1) dependence structures. Results with other models are quite similar, as we also observed in section 4.3 for various situations with a general model. We use the following simulation scheme:

1. The sample size is n = 1000, the number of simulations is N = 200.
2. The dimension d is chosen to be 3 and 4.
3. For d = 3, we considered two marginal models, Yij = I(Zij ≤ zj) and Yij = I(Zij ≤ αj0 + αj1 Xij), with Xij = I(U < 0) where U ~ uniform(-1, 1), and with the regression parameters
   (a) with no covariates: z = (0.5, 0.5, 0.5)' and z = (0.5, 1.0, 1.5)';
   (b) with covariates: α0 = (α10, α20, α30)' = (0.5, 0.5, 0.5)', α1 = (α11, α21, α31)' = (1, 1, 1)'; and α0 = (α10, α20, α30)' = (0.5, 0.5, 0.5)', α1 = (α11, α21, α31)' = (1, 0.5, 1.5)'.
   For each marginal model, exchangeable and AR(1) dependence structures in the MVN copula are considered, with the single dependence parameter in both cases being θi = [exp(β0 + β1 wi) - 1]/[exp(β0 + β1 wi) + 1], where wi = I(U < 0) with U ~ uniform(-1, 1), and parameters β0 = 1 and β1 = 1.5.
4. For d = 4, we study only Yij = I(Zij ≤ zj), with the marginal parameters z = (0.5, 0.5, 0.5, 0.5)' and z = (0.5, 0.8, 1.2, 1.5)'. For each marginal model, exchangeable and AR(1) dependence structures in the MVN copula are considered. The single dependence parameter in both cases is θi = [exp(β0) - 1]/[exp(β0) + 1], with β0 = 1.386 and β0 = 2.197 as the two situations.

The numerical results for these models with special dependence structure are presented in Table 4.19 to Table 4.26. We basically have the same conclusions as for all the general cases studied previously: i) the IFM approach (PMLA) is efficient relative to the ML approach; the ratio values r are very close to 1 in almost all the studied situations; ii) the MLE may be slightly more efficient than the IFME, but this observation is not conclusive. IFME and MLE are comparable.

Table 4.19: Efficiency assessment with special dependence structure: d = 3, z = (0.5, 0.5, 0.5)'

  exchangeable, β0 = 1, β1 = 1.5
  approach   z1        z2        z3        β0        β1
  IFM        0.496     0.497     0.497     0.996     1.511
             (0.043)   (0.041)   (0.042)   (0.118)   (0.194)
  MLE        0.494     0.496     0.496     0.996     1.520
             (0.041)   (0.040)   (0.041)   (0.118)   (0.195)
  r          1.047     1.021     1.015     1.003     0.998

  AR(1), β0 = 1, β1 = 1.5
  IFM        0.496     0.497     0.496     0.992     1.509
             (0.043)   (0.041)   (0.041)   (0.123)   (0.185)
  MLE        0.494     0.497     0.495     0.994     1.512
             (0.042)   (0.040)   (0.041)   (0.119)   (0.183)
  r          1.034     1.027     1.012     1.031     1.015

Table 4.20: Efficiency assessment with special dependence structure: d = 3, z = (0.5, 1.0, 1.5)'

  exchangeable, β0 = 1, β1 = 1.5
  approach   z1        z2        z3        β0        β1
  IFM        0.496     0.997     1.499     0.998     1.531
             (0.043)   (0.047)   (0.064)   (0.154)   (0.249)
  MLE        0.496     0.996     1.499     0.997     1.534
             (0.043)   (0.047)   (0.063)   (0.156)   (0.247)
  r          1.009     0.999     1.010     0.986     1.008

  AR(1), β0 = 1, β1 = 1.5
  IFM        0.496     0.997     1.500     0.991     1.509
             (0.043)   (0.047)   (0.063)   (0.158)   (0.250)
  MLE        0.496     0.996     1.500     0.993     1.518
             (0.043)   (0.046)   (0.062)   (0.156)   (0.249)
  r          1.017     1.013     1.018     1.011     1.003
Table 4.21: Efficiency assessment with special dependence structure: d = 3, α0 = (0.5, 0.5, 0.5)', α1 = (1, 1, 1)'

  exchangeable, β0 = 1, β1 = 1.5
  approach   α10       α11       α20       α21       α30       α31       β0        β1
  IFM        0.500     1.020     0.499     1.010     0.500     1.002     0.980     1.536
             (0.055)   (0.109)   (0.060)   (0.108)   (0.059)   (0.104)   (0.153)   (0.242)
  MLE        0.500     1.018     0.498     1.010     0.500     0.999     0.978     1.556
             (0.052)   (0.104)   (0.059)   (0.107)   (0.058)   (0.102)   (0.152)   (0.250)
  r          1.052     1.048     1.011     1.007     1.018     1.028     1.002     0.968

  AR(1), β0 = 1, β1 = 1.5
  IFM        0.500     1.020     0.499     1.010     0.497     1.002     0.988     1.529
             (0.055)   (0.109)   (0.060)   (0.108)   (0.058)   (0.101)   (0.158)   (0.233)
  MLE        0.501     1.017     0.499     1.009     0.497     0.999     0.985     1.545
             (0.052)   (0.104)   (0.059)   (0.105)   (0.058)   (0.100)   (0.157)   (0.235)
  r          1.043     1.047     1.023     1.022     1.004     1.004     1.008     0.991

Table 4.22: Efficiency assessment with special dependence structure: d = 3, α0 = (0.5, 0.5, 0.5)', α1 = (1, 0.5, 1.5)'

  exchangeable, β0 = 1, β1 = 1.5
  approach   α10       α11       α20       α21       α30       α31       β0        β1
  IFM        0.500     1.020     0.499     0.510     0.500     1.512     0.986     1.528
             (0.055)   (0.109)   (0.060)   (0.089)   (0.059)   (0.141)   (0.160)   (0.238)
  MLE        0.500     1.017     0.498     0.510     0.500     1.506     0.983     1.539
             (0.052)   (0.103)   (0.059)   (0.089)   (0.058)   (0.132)   (0.159)   (0.239)
  r          1.047     1.050     1.011     1.002     1.017     1.070     1.004     0.996

  AR(1), β0 = 1, β1 = 1.5
  IFM        0.500     1.020     0.499     0.510     0.497     1.514     0.993     1.518
             (0.055)   (0.109)   (0.060)   (0.089)   (0.058)   (0.140)   (0.159)   (0.225)
  MLE        0.500     1.017     0.499     0.510     0.497     1.510     0.994     1.530
             (0.053)   (0.104)   (0.059)   (0.089)   (0.057)   (0.133)   (0.158)   (0.223)
  r          1.041     1.045     1.021     1.003     1.006     1.049     1.007     1.010

Table 4.23: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'

  exchangeable, β0 = 1.386
  approach   z1        z2        z3        z4        β0
  IFM        0.502     0.499     0.501     0.501     1.387
             (0.041)   (0.043)   (0.042)   (0.042)   (0.071)
  MLE        0.501     0.499     0.500     0.500     1.389
             (0.041)   (0.043)   (0.042)   (0.042)   (0.070)
  r          1.000     1.002     1.005     1.003     1.013

  AR(1), β0 = 1.386
  IFM        0.502     0.499     0.501     0.497     1.385
             (0.041)   (0.043)   (0.041)   (0.042)   (0.072)
  MLE        0.502     0.499     0.500     0.496     1.387
             (0.041)   (0.043)   (0.041)   (0.042)   (0.069)
  r          0.996     1.000     0.998     0.998     1.047
Table 4.24: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'

  exchangeable, β0 = 1.386
  approach   z1        z2        z3        z4        β0
  IFM        0.502     0.803     1.199     1.494     1.389
             (0.041)   (0.045)   (0.052)   (0.061)   (0.087)
  MLE        0.502     0.802     1.198     1.492     1.391
             (0.041)   (0.045)   (0.052)   (0.061)   (0.087)
  r          0.998     1.004     1.002     1.007     1.004

  AR(1), β0 = 1.386
  IFM        0.502     0.803     1.20      1.495     1.388
             (0.041)   (0.045)   (0.05)    (0.067)   (0.085)
  MLE        0.502     0.802     1.20      1.494     1.389
             (0.041)   (0.045)   (0.05)    (0.065)   (0.083)
  r          0.999     1.006     1.00      1.017     1.025

Table 4.25: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'

  exchangeable, β0 = 2.197
  approach   z1        z2        z3        z4        β0
  IFM        0.502     0.501     0.501     0.501     2.200
             (0.041)   (0.042)   (0.042)   (0.042)   (0.093)
  MLE        0.500     0.499     0.499     0.499     2.202
             (0.041)   (0.042)   (0.042)   (0.042)   (0.092)
  r          0.999     1.000     0.999     1.000     1.015

  AR(1), β0 = 2.197
  IFM        0.502     0.501     0.501     0.499     2.194
             (0.041)   (0.042)   (0.042)   (0.042)   (0.086)
  MLE        0.501     0.499     0.499     0.498     2.199
             (0.041)   (0.043)   (0.042)   (0.042)   (0.084)
  r          0.995     0.993     1.000     0.999     1.025

Table 4.26: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'

  exchangeable, β0 = 2.197
  approach   z1        z2        z3        z4        β0
  IFM        0.502     0.802     1.201     1.499     2.203
             (0.041)   (0.046)   (0.056)   (0.060)   (0.114)
  MLE        0.501     0.801     1.199     1.496     2.204
             (0.041)   (0.046)   (0.055)   (0.059)   (0.111)
  r          0.996     1.002     1.005     1.003     1.031

  AR(1), β0 = 2.197
  IFM        0.502     0.802     1.198     1.500     2.200
             (0.041)   (0.046)   (0.052)   (0.060)   (0.110)
  MLE        0.501     0.801     1.196     1.500     2.200
             (0.041)   (0.046)   (0.052)   (0.060)   (0.100)
  r          0.997     1.005     0.993     1.000     1.040

4.5 Jackknife variance estimate compared with Godambe information matrix

Now we turn to a numerical evaluation of the performance of jackknife variance estimates of the IFME. We have shown, in Chapter 2, that the jackknife estimate of variance is asymptotically equivalent to the estimate of variance from the corresponding Godambe information matrix. The jackknife approach may be preferred when the appropriate computer packages are not available to compute the Godambe information matrix, or when the asymptotic variance in terms of the Godambe information matrix is difficult to compute analytically or computationally. For example, to compute the asymptotic variance of P(y1 · · · yd; θ̂) by means of the Godambe information is not an easy task. To complement the theoretical results in Chapter 2, in this subsection we give some analytical and numerical comparisons of the variance estimates from the Godambe information and the jackknife method. The application of jackknife methods to modelling and inference for real data sets is demonstrated in Chapter 5.

Analytical comparison of the two approaches

Example 4.8 (Multinormal, general) Let X ~ Nd(μ, Σ), and suppose we are interested in estimating μ. Given n independent observations x1, . . ., xn from X, the IFME of μ is μ̂ = n⁻¹ Σ_{i=1}^n xi, and the corresponding inverse of the Godambe information matrix is J_Ψ⁻¹ = Σ. A consistent estimate of J_Ψ⁻¹ is

    \hat{J}_\Psi^{-1} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T.

The jackknife estimate of the Godambe information matrix is

    n V_J = n \sum_{i=1}^{n} (\hat{\mu}_{(i)} - \hat{\mu})(\hat{\mu}_{(i)} - \hat{\mu})^T,

where μ̂(i) = (n - 1)⁻¹(n μ̂ - xi). Some algebraic manipulation leads to

    n V_J = \frac{n^2}{(n-1)^2} \cdot \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T,

which is a consistent estimate of Σ. Furthermore, we see that

    n V_J = \frac{n^2}{(n-1)^2} \hat{J}_\Psi^{-1},

which shows that the jackknife estimate of the Godambe information matrix is also good when the sample size is moderate to small. •

Example 4.9 (Multinormal, common marginal mean) Let X ~ Nd(μ, Σ), where μ = (μ1, . . ., μd)' = μ1 and Σ is known. We are interested in estimating the common parameter μ. Given n independent observations x1,
. . ., xn with the same distribution as X, the IFME of μ by the weighting approach is (see Example 4.2)

    \hat{\mu}_w = \frac{\mathbf{1}' \Sigma^{-1} \bar{x}}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}.

The inverse of the Godambe information of μ̂_w is

    J_\Psi^{-1} = \frac{1}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}.

The jackknife estimate of the Godambe information is

    n V_J = n \sum_{i=1}^{n} (\hat{\mu}_{w(i)} - \hat{\mu}_w)^2,

where μ̂_{w(i)} = 1'Σ⁻¹x̄(i)/(1'Σ⁻¹1) and x̄(i) is the sample mean with the ith observation deleted. Some algebraic manipulation leads to

    n V_J = \frac{n}{(n-1)^2} \cdot \frac{\mathbf{1}' \Sigma^{-1} \left[ \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' \right] \Sigma^{-1} \mathbf{1}}{(\mathbf{1}' \Sigma^{-1} \mathbf{1})^2}.

Replacing n⁻¹ Σ_{i=1}^n (xi - x̄)(xi - x̄)' by its limit Σ, we obtain

    n V_J \approx \frac{n^2}{(n-1)^2} \cdot \frac{1}{\mathbf{1}' \Sigma^{-1} \mathbf{1}} = \frac{n^2}{(n-1)^2} J_\Psi^{-1},

which shows that the jackknife estimate of the Godambe information is also good when the sample size is moderate to small. •

Numerical comparison of the two approaches

In this subsection, we numerically compare the variance estimates of the IFME from the jackknife method and from the Godambe information. For this purpose, we use a 3-dimensional probit model with normal copula. The comparison studies are carried out only for the dependence parameters θjk. For the chosen model parameters, we carry out N simulations for each sample size n. For each simulation s (s = 1, . . ., N) of sample size n, we estimate the model parameters θ12, θ13, θ23 with the IFM approach; let us denote these estimates θ̂12^(s), θ̂13^(s), θ̂23^(s). We then compute the jackknife estimates of variance (with g groups of size m such that g × m = n) for θ̂12^(s), θ̂13^(s), θ̂23^(s); we denote these variance estimates by v12^(s), v13^(s), v23^(s). Let the asymptotic variance estimates of θ̂12, θ̂13, θ̂23 based on the Godambe information matrix for a sample of size n be v12, v13, v23. We compare the following three variance estimates:

(i) MSE: (1/N) Σ_{s=1}^N (θ̂12^(s) - θ12)², (1/N) Σ_{s=1}^N (θ̂13^(s) - θ13)², (1/N) Σ_{s=1}^N (θ̂23^(s) - θ23)²;
(ii) Godambe: v12, v13, v23;
(iii) Jackknife: (1/N) Σ_{s=1}^N v12^(s), (1/N) Σ_{s=1}^N v13^(s), (1/N) Σ_{s=1}^N v23^(s).

The MSE in (i) should be considered as the true variance of the parameter estimate, assuming unbiasedness; (ii) and (iii) should be compared with each other and also with (i). Table 4.27 and Table 4.28 summarize the numerical computation of the variance estimates of θ12, θ13, θ23 based on approaches (i), (ii) and (iii). For the jackknife method, the results for different combinations of (g, m) are reported in the two tables. In total, four models with different marginal parameters z = (z1, z2, z3) and different dependence parameters θ = (θ12, θ13, θ23) are studied; the parameter values are reported in the tables. We studied two sample sizes, n = 500 and n = 1000; for both sample sizes, the number of simulations is N = 500. From examining the two tables, we see that the three measures are very close to each other. We conclude that the jackknife method is indeed consistent with the Godambe information computation approach; both approaches yield variance estimates which are comparable to the MSE. In conclusion, we have shown theoretically and demonstrated numerically in several cases that the jackknife method for variance estimation compares very favorably with the Godambe information computation. We are willing to extrapolate to general situations.
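As a quick numerical illustration of Example 4.9 (a sketch under our own settings, not code from the thesis), the following compares the delete-one jackknife variance estimate V_J of μ̂_w with the analytic Godambe value 1/(n 1'Σ⁻¹1):

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, mu = 200, 3, 0.5
    Sigma = 0.5 * np.eye(d) + 0.5                 # known exchangeable covariance
    x = rng.multivariate_normal(mu * np.ones(d), Sigma, size=n)

    w = np.linalg.solve(Sigma, np.ones(d))
    w = w / (np.ones(d) @ w)                      # weights Sigma^{-1}1 / (1'Sigma^{-1}1)
    mu_hat = x.mean(axis=0) @ w                   # IFME by the weighting approach

    # delete-one jackknife: V_J = sum_i (mu_hat_(i) - mu_hat)^2, as in Example 4.9
    mu_jack = np.array([np.delete(x, i, axis=0).mean(axis=0) @ w for i in range(n)])
    V_J = np.sum((mu_jack - mu_hat) ** 2)

    # Godambe (analytic) variance of mu_hat: 1 / (n 1'Sigma^{-1}1)
    V_G = 1.0 / (n * np.ones(d) @ np.linalg.solve(Sigma, np.ones(d)))
    print(V_J, V_G)                               # close already for moderate n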
The jackknife approach is simple and computationally straightforward (computationally, it only requires the code for obtaining the parameter estimates); it also has the advantage of easily handling more complex situations where the Godambe information computation is not possible. One major concern with the jackknife approach is the computational time needed to carry out the whole process. If the computing-time problem is due to an extremely large sample size, appropriate grouping of the sample for the purpose of applying the jackknife approach may improve the situation; a discussion is given in Section 2.5. Overall, we recommend the general use of the jackknife approach in applications.

4.6 Summary

In this chapter, we demonstrated analytically and numerically that the IFM approach is an efficient parameter estimation procedure for MCD and MMD models with MUBE or PUBE properties. We have chosen a wide variety of cases so that we can extrapolate this conclusion to the general situation. Theoretically, we expect IFM to be quite efficient because it is closely tied to MLE, in that each inference function is a likelihood score function of a margin. For comparison purposes, we carried out ML estimation for several multivariate models. Our experience was that finding the MLE is a difficult and very time-consuming task for multivariate models, while the IFME is computationally simple and results in a significant saving of computing time.

Table 4.27: Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups of size m; N = 500, n = 1000

  z = (0.0, 0.7, 0.0)', θ = (-0.5, 0.5, -0.5)
  approach          θ12        θ13        θ23
  (i)               0.002079   0.001704   0.002085
  (ii)              0.002012   0.001645   0.002012
  (iii) (1000, 1)   0.002030   0.001646   0.002038
        (500, 2)    0.002028   0.001653   0.002043
        (250, 4)    0.002025   0.001658   0.002047
        (125, 8)    0.002058   0.001653   0.002046
        (100, 10)   0.002046   0.001663   0.002046
        (50, 20)    0.002089   0.001685   0.002089

  z = (0.7, 0.0, 0.7)', θ = (0.5, 0.9, 0.5)
  (i)               0.002090   0.000281   0.002200
  (ii)              0.002012   0.000295   0.002012
  (iii) (1000, 1)   0.002026   0.000299   0.002023
        (500, 2)    0.002027   0.000300   0.002021
        (250, 4)    0.002036   0.000300   0.002035
        (125, 8)    0.002056   0.000302   0.002049
        (100, 10)   0.002063   0.000301   0.002054
        (50, 20)    0.002088   0.000301   0.002067

  z = (0.7, 0.7, 0.7)', θ = (0.9, 0.7, 0.5)
  (i)               0.000333   0.001218   0.002319
  (ii)              0.000295   0.001239   0.002187
  (iii) (1000, 1)   0.000302   0.001254   0.002208
        (500, 2)    0.000303   0.001257   0.002210
        (250, 4)    0.000302   0.001260   0.002212
        (125, 8)    0.000303   0.001267   0.002216
        (100, 10)   0.000305   0.001261   0.002214
        (50, 20)    0.000310   0.001252   0.002220

  z = (1.0, 0.5, 0.0)', θ = (0.8, 0.6, 0.8)
  (i)               0.000821   0.002147   0.000766
  (ii)              0.000869   0.002089   0.000666
  (iii) (1000, 1)   0.000873   0.002129   0.000683
        (500, 2)    0.000874   0.002118   0.000683
        (250, 4)    0.000877   0.002108   0.000681
        (125, 8)    0.000884   0.002119   0.000688
        (100, 10)   0.000887   0.002138   0.000687
        (50, 20)    0.000899   0.002151   0.000690
Table 4.28: Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups of size m; N = 500, n = 500

  z = (0.0, 0.7, 0.0)', θ = (-0.5, 0.5, -0.5)
  approach          θ12        θ13        θ23
  (i)               0.004158   0.003135   0.004262
  (ii)              0.004024   0.003290   0.004024
  (iii) (500, 1)    0.004085   0.003315   0.004104
        (250, 2)    0.004071   0.003333   0.004122
        (125, 4)    0.004053   0.003331   0.004119
        (50, 10)    0.004115   0.003396   0.004176

  z = (0.7, 0.0, 0.7)', θ = (0.5, 0.9, 0.5)
  (i)               0.003998   0.000602   0.003768
  (ii)              0.004024   0.000591   0.004024
  (iii) (500, 1)    0.004062   0.000604   0.004049
        (250, 2)    0.004062   0.000601   0.004054
        (125, 4)    0.004091   0.000607   0.004103
        (50, 10)    0.004123   0.000617   0.004171

  z = (0.7, 0.7, 0.7)', θ = (0.9, 0.7, 0.5)
  (i)               0.000632   0.002688   0.004521
  (ii)              0.000591   0.002479   0.004374
  (iii) (500, 1)    0.000607   0.002501   0.004410
        (250, 2)    0.000611   0.002510   0.004425
        (125, 4)    0.000616   0.002533   0.004467
        (50, 10)    0.000622   0.002539   0.004501

  z = (1.0, 0.5, 0.0)', θ = (0.8, 0.6, 0.8)
  (i)               0.001634   0.003846   0.001413
  (ii)              0.001738   0.004179   0.001332
  (iii) (500, 1)    0.001821   0.004397   0.001365
        (250, 2)    0.001837   0.004407   0.001368
        (125, 4)    0.001846   0.004433   0.001360
        (50, 10)    0.001876   0.004476   0.001388

We further demonstrated numerically that the jackknife method yields SEs for the IFME which are comparable to the SEs obtained from the Godambe information matrix. The jackknife method for variance estimation has significant practical importance, as it eliminates the need to calculate the partial derivatives which are required for calculating the Godambe information matrix. The jackknife method can also be used for estimates of functions of parameters (such as probabilities of being in some category or probabilities of exceedances). The IFM approach, together with the jackknife estimation of SEs, makes many more multivariate models computationally feasible for working with real data. The IFM theory, as part of statistical inference theory for multivariate non-normal models, is highly recommended because of its good asymptotic properties and its computational feasibility. This approach should have significant practical usefulness; we demonstrate its application in Chapter 5.

Chapter 5

Modelling, data analysis and examples

Possessing a tool is one thing, but using it effectively is quite another. In this chapter, we explore the possibility of effectively using the tools developed in this thesis for multivariate statistical modelling (including IFM theory, jackknife variance estimation, etc.) and provide data analysis examples. In section 5.1, we first discuss our view of the proper data analysis cycle. This is an important issue, since the interpretation of the results, and possibly the indication of further studies, are directly related to the way the data analysis is carried out. We next discuss several other important issues in multivariate discrete modelling, such as how to make the choice of models and how to check the adequacy of models. We also provide some discussion on the testing of dependence structure hypotheses, which is useful for identifying some specific multivariate models. In section 5.2, we carry out several data analysis examples with the models and inference procedures developed in the previous chapters.
We show some applications of the models and inference procedures developed in this thesis and point out difficulties related to multivariate nonnormal analysis.

5.1 Some issues on modelling

5.1.1 Data analysis cycle

A proper data analysis cycle usually consists of initial data analysis, statistical modelling, diagnostic model assessment and inferences. The initial data analysis may consist of computing various data summaries and examining various graphical representations of the data. The types of summary statistics and graphical representations depend on the basic features of the data set. For example, for binary, ordinal and count data, we can compute the empirical frequencies (and percentages) of the response variables as well as the covariates, separately and jointly. If some covariates are continuous, then standard summaries such as the mean, median, standard deviation, quartiles, maximum and minimum, as well as graphical displays such as boxplots and histograms, could be examined. To get a rough idea of the dependence among the response variables, for binary data a check of the pairwise log odds ratios of the responses could be helpful. Another convenient empirical pairwise dependence measure for multivariate discrete data, which is particularly useful for ordinal and count data, is a measure called gamma. This measure, for ordinal or count data yi = (yi1, yi2), i = 1, . . ., n, is defined as

    \gamma = \frac{C - D}{C + D},   (5.1)

where C = Σ_{i=1}^n Σ_{i'=1}^n I(yi1 > yi'1) I(yi2 > yi'2) and D = Σ_{i=1}^n Σ_{i'=1}^n I(yi1 < yi'1) I(yi2 > yi'2), and I is the indicator function. In (5.1), C can be interpreted as the number of concordant pairs and D as the number of discordant pairs. (A small computational sketch of (5.1) is given later in this subsection.) The gamma measure is studied in Goodman and Kruskal (1954), and is considered a discrete generalization of Kendall's tau for continuous variables. The properties of the gamma measure follow directly from its definition. Like the correlation coefficient, its range is -1 ≤ γ ≤ 1: γ = 1 when the number of discordant pairs D = 0, γ = -1 when the number of concordant pairs C = 0, and γ = 0 when the number of concordant pairs equals the number of discordant pairs. Other dependence measures, such as discrete generalizations of Kendall's tau or Spearman's ρ, can also be used for ordinal and count responses as well as binary responses. Furthermore, summaries such as means, variances and correlations could also be meaningful and useful for count data. Initial data analysis is particularly important in multivariate analysis, since the structure of multivariate data is much more complicated than that of univariate data, and the results of the initial data analysis shed light on identifying suitable statistical models.

Statistical modelling usually consists of specification, estimation and evaluation steps. The specification formulates a probabilistic model which is assumed to have generated the observed data. At this stage, to choose appropriate models, relevant questions are: "What is the nature of the data?" and "How have the data been generated?" The chosen models should make sense for the data. The decision concerning which model to fit to a set of data should, if possible, be the result of a prior consideration of what might be a suitable model for the process under investigation, as well as the result of computation.
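As a small computational sketch of the gamma measure (5.1) (our own illustration; the data in the example call are hypothetical):

    import numpy as np

    def gamma_measure(y1, y2):
        # Goodman-Kruskal gamma of (5.1): (C - D)/(C + D), where C and D count
        # concordant and discordant ordered pairs (i, i') as in the definitions above
        y1 = np.asarray(y1); y2 = np.asarray(y2)
        d1 = np.sign(y1[:, None] - y1[None, :])
        d2 = np.sign(y2[:, None] - y2[None, :])
        c = np.sum((d1 > 0) & (d2 > 0))   # C: y_i1 > y_i'1 and y_i2 > y_i'2
        d = np.sum((d1 < 0) & (d2 > 0))   # D: y_i1 < y_i'1 and y_i2 > y_i'2
        return (c - d) / (c + d)

    print(gamma_measure([1, 1, 2, 3, 3], [1, 2, 2, 2, 3]))  # hypothetical ordinal data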
In some situations, a data set may have several suitable alternative models; after obtaining estimation and computation results, model selection can be made based on certain criteria. Diagnostics consist of assessments of the reliability of the estimates, the fit of the model and the overall performance of the model. Both the fitting error of the model and possibly the prediction error should be studied. We should also bear in mind that often a small fitting error does not imply a small prediction error; sometimes it is necessary to seek a balance between the two. Appropriate diagnostic checking is an important but not easy step in the whole modelling process. At the inference stage, relevant statements about the population from which the sample was taken can be made, based on the statistical modelling (mainly probabilistic model) results from the previous stages. These inferences may concern the explanation of changes in responses over margins or time, the effects of covariates on the probabilities of occurrence, the marginal and conditional behaviour of the response variables, and the probability of exceedance, as well as hypothesis testing suggested by the theory in the application domain, and so on. Some relevant questions are: "How can valid inferences be drawn?", "What interpretation can be given to the estimates?", "Is there a structural interpretation, relating to the underlying theory in the application?", and "Are the results pointing to further studies?"

5.1.2 Model selection

When modelling a data set, usually it is required only that the model provide accurate predictions or capture other relevant aspects of the data, without necessarily duplicating every detail of the real system. A valid model is any model that gives an adequate representation of the system that is of interest to the model user. Often a large number of equally good models exist for a particular data set in terms of the specific inference aspect of interest to the practitioner. Model selection is carried out by comparing alternative models; if a model fits the data approximately as well as other, more complex models, we usually prefer the simpler one. There are many criteria for distinguishing between models. One suitable criterion for choosing a model is the associated maximum loglikelihood value. However, within the same family, the maximum loglikelihood value usually depends on the number of parameters estimated in the model, with more parameters yielding a bigger value. Thus maximizing this statistic cannot be the sole criterion, since we would then inevitably choose models with more parameters and more complex structure. In applications, parsimonious models which identify the essential relations between the variables and capture the major characteristic features of the problem under study are more useful; such models often lead to clear and simple interpretations. The ideal situation is to arrive at a simple model which is consistent with the observed data. In this vein, a balance between the size of the maximum loglikelihood value and the number of parameters is important, but it is often difficult to judge the appropriateness of the balance. One widely used criterion is the Akaike Information Criterion (AIC), which is defined as

    AIC = -2 ℓ(θ̂; y) + 2s,

where ℓ(θ̂; y) is the maximum loglikelihood of the model, and s is the number of estimated parameters in the model. (With IFM estimation, the AIC is modified to AIC = -2 ℓ(θ̃; y) + 2s, where θ̃ is the IFME.)
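As a small illustration of how AIC values are used for model comparison (a sketch; the loglikelihood values below are hypothetical, not from any fit in this thesis):

    def aic(loglik, s):
        # AIC = -2 * (maximized loglikelihood) + 2 * (number of estimated parameters s)
        return -2.0 * loglik + 2.0 * s

    # three hypothetical fitted models for the same data
    fits = {"exchangeable": (-1980.4, 5), "AR(1)": (-1979.8, 5), "general": (-1976.9, 10)}
    for name, (loglik, s) in fits.items():
        print(name, round(aic(loglik, s), 1))
    # exchangeable 3970.8, AR(1) 3969.6, general 3973.8: the AR(1) fit has the
    # smallest AIC; its difference from the exchangeable fit is about 1, so the
    # two are nearly equivalent, while the extra parameters of the general
    # model are not supported.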
By definition, a model with a smaller AIC is preferable. The AIC considers the principle of maximum likelihood and the model dimension (the number of parameters) simultaneously, and thus aims for a balance between the maximum likelihood value and model complexity. The negative of AIC/2 is asymptotically an unbiased estimator of the mean expected loglikelihood (see Sakamoto et al. 1986); thus the AIC can be interpreted as an estimator of -2 times the mean expected loglikelihood of the model evaluated at the maximum likelihood estimate. The model having minimum AIC should have minimum prediction error, at least asymptotically. In the use of AIC, it is the difference of AIC values that matters, not the actual values themselves; this is because the AIC is an estimate of the mean expected loglikelihood of a model. If the difference is less than 1, the goodness-of-fit of the models is almost the same. For a detailed account of AIC, see Sakamoto et al. (1986). The AIC was introduced by Akaike (1973) for the purpose of selecting an optimal model from within a set of proposed models (hypotheses). The AIC procedure has been used successfully to identify models; see, for example, Akaike (1977).

The selection of models should also be based on the understanding that it is an essential part of modelling to direct the analysis to aspects which are relevant to the context and to omit other aspects of the real-world situation which often lead to spurious results. This is also the reason that we have to be careful not to overparameterize the model: although overparameterization might improve the goodness-of-fit, it is likely to result in the model portraying spurious features of the sampled data, which may detract from the usefulness of the achieved fit and may lead to poor prediction. The selection of models should also be based on consideration of the practical importance of the models, which in turn rests on the nature and extent of the models and their contribution to our understanding of the problem.

Statistical modelling is often an iterative process: after a promising member from a family of models is tentatively chosen, parameters in the model are next efficiently estimated, and finally the success of the resulting fit is assessed. The now precisely defined model is either accepted by this verification stage, or the diagnostic checks carried out will find it lacking in certain respects and should then suggest a sensible modified identification. Further estimation and checking may take place, and the cycle of identification, estimation and verification is repeated until a satisfactory fit is obtained.

5.1.3 Diagnostic checking

A model should be judged by its predictive power as well as by its goodness-of-fit. Diagnostic checking is a procedure for evaluating the extent to which the data support the model. The AIC only compares models through their relative predictive power; it does not assess the goodness-of-fit of the model to the data. In multivariate nonnormal analysis, it is not obvious how goodness-of-fit checking should be carried out; we discuss this issue in the following.

There are many conventional ways to check the goodness-of-fit of a model. One direct way to check the model is by means of residuals (mainly for continuous data); a diagnostic check based on residuals consists of making a residual plot of the (standardized) residuals.
Another frequently applied approach is to calculate goodness-of-fit statistics. When the checking of residuals is feasible, the goodness-of-fit statistics are often used as a supplement. In multivariate analysis, direct comparison of estimated probabilities with the corresponding empirical probabilities may also be considered a good and efficient diagnostic checking method. For multivariate binary or ordinal categorical data, a diagnostic check based on residuals of the observed data is not meaningful; however, goodness-of-fit statistics are available in these situations. We illustrate the situation here by means of multivariate binary data. For a d-dimensional random binary vector Y with a model P, the sample space contains 2^d elements. We denote these by k = 1, . . ., 2^d, with k representing the kth particular outcome pattern and Pk the corresponding probability, with Σ_{k=1}^{2^d} Pk = 1. Assume n is the number of observations and n1, . . ., n_{2^d} are the empirical frequencies corresponding to k = 1, . . ., 2^d. Let P̂k be the estimate of Pk for a specified model. Under the hypothesis that the specified model is the true model, and with the assumption of some regularity conditions (e.g. efficient estimates; see Read and Cressie 1988, §4.1), Fisher (1924) shows, in the case where P̂k depends on one estimated parameter, that the Pearson χ²-type statistic

    X^2 = \sum_{k=1}^{2^d} \frac{(n_k - n \hat{P}_k)^2}{n \hat{P}_k}   (5.2)

is asymptotically chi-squared with 2^d - 2 degrees of freedom. If P̂k depends on s (s > 1) estimated parameters, then the generalization is that (5.2) is asymptotically chi-squared with 2^d - s - 1 degrees of freedom. A more general situation is that Y depends on a covariate with g categories. For each category of the covariate, we have the situation of (5.2). If we assume independence between the categories of the covariate, we can form an overall Pearson χ²-type test statistic for the goodness-of-fit of the model as

    X^2 = \sum_{v=1}^{g} \sum_{k=1}^{2^d} \frac{(n_k^{(v)} - n^{(v)} \hat{P}_k^{(v)})^2}{n^{(v)} \hat{P}_k^{(v)}},   (5.3)

where v is the index of the categories of the covariate. Suppose we estimated s parameters in the model, so that the P̂k^{(v)} depend on s parameters. Under the hypothesis that the specified model is the true model, the test statistic X² in (5.3), under some regularity conditions (e.g. efficient estimates), is asymptotically χ²_{g(2^d - 1) - s}, where g is the number of categories of the covariate and s is the total number of parameters estimated in the model. Similarly, an overall loglikelihood-ratio-type statistic

    G^2 = 2 \sum_{v=1}^{g} \sum_{k=1}^{2^d} n_k^{(v)} \log\!\left[ n_k^{(v)} / (n^{(v)} \hat{P}_k^{(v)}) \right]   (5.4)

is also asymptotically χ²_{g(2^d - 1) - s}. X² and G² are asymptotically equivalent, but they are not the same in the finite-sample case, so sometimes there is a question of which statistic to choose; Read and Cressie (1988) may shed some light on this matter. The computation of the test statistic X² or G² requires the calculation of P̂k^{(v)}, which may not be easily obtained, depending on the copula associated with the model (for example, it is generally feasible with the mixture of max-id copula, but only feasible in relatively low dimension for the multinormal copula, unless approximations are used).

One frequently encountered problem in applications with multivariate binary or ordinal categorical data (also count data) is that, when the dimension of the response is relatively high, the empirical frequency for some particular outcomes of the response vector is relatively small or even zero.
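A minimal sketch of the computation of (5.2) and the corresponding G² (our own illustration; the observed counts and model probabilities in the example call are hypothetical):

    import numpy as np
    from scipy.stats import chi2

    def pearson_g2(counts, probs, s):
        # X^2 of (5.2) and the corresponding G^2 for one covariate category;
        # counts: observed frequencies n_k over the 2^d response patterns,
        # probs: model probabilities P-hat_k (summing to 1), s: number of
        # estimated parameters.  Summing the statistics over the g covariate
        # categories gives (5.3) and (5.4), with df = g(2^d - 1) - s.
        counts = np.asarray(counts, dtype=float)
        expected = counts.sum() * np.asarray(probs, dtype=float)
        x2 = np.sum((counts - expected) ** 2 / expected)
        nz = counts > 0                       # zero cells contribute 0 to G^2
        g2 = 2.0 * np.sum(counts[nz] * np.log(counts[nz] / expected[nz]))
        df = counts.size - s - 1              # 2^d - s - 1 in the g = 1 case
        return x2, g2, df, chi2.sf(x2, df)

    # hypothetical 2^2-pattern example
    print(pearson_g2([30, 14, 18, 38], [0.28, 0.16, 0.20, 0.36], s=1))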
Thus P̂k or P̂k^{(v)} would usually be very small for any particular model, and (5.2) or (5.3) with the related statistical inferences are not suitable in these situations. What we may still do in terms of goodness-of-fit checking in these situations is to limit the comparison of P̂k(xi) with nk(xi)/n(xi), by tables and graphics, to outcomes of non-zero frequency (where xi is the covariate vector), or to calculate

    X^2_{(a)} = \sum_{\{k:\, n_k > a\}} \frac{(n_k - \tilde{n}_k)^2}{\tilde{n}_k}  \quad or \quad  G^2_{(a)} = 2 \sum_{\{k:\, n_k > a\}} n_k \log(n_k / \tilde{n}_k),   (5.5)

where ñk = Σ_{i=1}^n P̂k(xi), with k representing the kth pattern of the response variables, and plot X²_(a) (or G²_(a)) versus a = {1, 2, 3, 4, 5} to get a rough idea of how the model fits the non-zero-frequency observations. The data obviously support the model if the observed values of X²_(a) (or G²_(a)) go down quickly to zero, while large values indicate potential model departures. Obviously, in any case, some partial assessments using (5.2) or (5.3) may be done for some lower-dimensional margins where frequencies are sufficiently large. Sometimes this kind of goodness-of-fit checking may be used to retain a model when (5.5) is not helpful. The statistics in (5.5) and the related analysis can be applied to multivariate count data as well. Furthermore, a diagnostic check based on the residuals of the observed counts is also meaningful. If there are no covariates, a quick and overall residual check can be based on examining

    r_{ij} = y_{ij} - E[Y_{ij} \mid \mathbf{Y}_{i,-j}; \hat{\theta}]   (5.6)

for a particular fixed j, where Y_{i,-j} means the response vector Y_i with the jth margin omitted. The model is considered adequate in terms of goodness-of-fit, based on the residual plot, if the residuals are small and do not exhibit systematic patterns. Note that the computation of E[Y_{ij} | Y_{i,-j}; θ̂] may not be a simple task when the dimension d is large (e.g. d > 3). Another rough check of the goodness-of-fit of a model for multivariate count data is to compare the empirical marginal means, variances and pairwise correlation coefficients with the corresponding means, variances and pairwise correlation coefficients calculated from the fitted model.

In principle, a model can be forced to fit the data increasingly well by increasing its number of parameters. However, the fact that the fitting errors are small is no guarantee that the prediction errors will be: many of the terms in a complex model may simply be accounting for noise in the data, and overfitted models may predict future values quite poorly. Thus, to arrive at a model which represents only the main features of the data, selection and diagnostic criteria which balance model complexity and goodness-of-fit must be used simultaneously. As we have discussed, often there are many relevant models that provide an acceptable approximation to reality or to the data. The purpose of statistical modelling is not to find the "true" model, but rather to obtain one or several models which extract the most information and best serve the inference purposes.
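A corresponding sketch for the truncated statistic X²_(a) of (5.5), again with hypothetical inputs:

    import numpy as np

    def x2_truncated(counts, fitted, a_values=(1, 2, 3, 4, 5)):
        # X^2_(a) of (5.5): Pearson-type sum over patterns with n_k > a, where
        # fitted[k] is n-tilde_k = sum_i P-hat_k(x_i) under the fitted model
        counts = np.asarray(counts, dtype=float)
        fitted = np.asarray(fitted, dtype=float)
        out = {}
        for a in a_values:
            keep = counts > a
            out[a] = float(np.sum((counts[keep] - fitted[keep]) ** 2 / fitted[keep]))
        return out

    # hypothetical sparse 8-pattern table (d = 3)
    print(x2_truncated([52, 9, 4, 0, 6, 1, 0, 2],
                       [50.1, 10.2, 3.6, 0.8, 5.5, 1.9, 0.4, 1.5]))

Values of X²_(a) that fall quickly as a grows would support the model.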
Valid inferences can be made by borrowing strength across subjects. That is, the consis-tency of a pattern across subjects is the basis for substantive conclusions. For this reason, inferences from longitudinal or repeated measures studies can be made more robust to model assumptions than those from time series data, particularly to assumptions about the nature of the dependence. There are many possible structures for longitudinal or repeated measures type dependence. The exchangeable or AR(l)-like dependence structures are the simplest. But in a particular situation, how to test to see if a particular dependence structure is more plausible? The A I C for model comparison may be a useful index. In the following, we provide an alternative approach for testing special dependence structures. For this purpose, we first give a definition and state two results that we are going to use in the later development. A reference for these materials is Rao (1973). D e f i n i t i o n 5.1 (Genera l ized inverse of a matr ix ) A generalized inverse of an n x m matrix A of any rank is an m x n matrix denoted by A~ which satisfies the following equality: AA~ A = A. \u00E2\u0080\u00A2 Resul t 5.1 (Spectral decomposi t ion theorem) Let A be a real n x n symmetric matrix. Then there exists an orthogonal matrix Q such that Q'AQ is a diagonal matrix whose diagonal elements Ai > A 2 > \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > A\u00E2\u0080\u009E are the characteristic roots of A, that is / Ai 0 \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 0\ 0 A 2 \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 0 Q'AQ V 0 0 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2\u00E2\u0080\u00A2 A\u00E2\u0080\u009E / \u00E2\u0080\u00A2 Resul t 5.2 / / X ~ Np(p, Ex) , and E x is positive semidefinite, then a set of necessary and sufficient conditions for X ' A X ~ xl{&2) is (i)tr(AL-x) = r and p.'Ap = S2, (ii) E X A E X ^ E X = E X A E X , (iii) p!AH-ynAp = p'Ap, (iv) p'(AT,x)2 = p'AH. X2{&2) denotes the non-central chi-square distribution with noncentality parameters2. \u00E2\u0080\u00A2 Chapter 5. Modelling, data analysis and examples 180 In the following, we are going to build up a general statistical test, which in turn can be used to test exchangeable or AR(l)-type dependence assumptions. Suppose X ~ Np(p, \u00C2\u00A3 x ) where E x is known. We want to test if / i = pi, where p is a constant. Let a = E ^ l / l ' E x 1 ! , then X - a'Xl = {Xi - a'X,..., Xp - a'X)' = B X , where B = I - H'E^1/l\"E^l, and / is the identity matrix. Thus BX ~ Np(Bp, BZXB'). It is easy to see that Rank(B) = p \u00E2\u0080\u0094 1, it implies that Rank(BExB') = p \u00E2\u0080\u0094 1. By Result 5.1, there is an orthogonal matrix Q, such that ^ A j . . . 0 0^ J 3 E X B ' = Q 0 \ 0 A p - i 0 0 0/ Q', where Ai > A 2 > \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 > A p _ i > 0. Let A = Q 0 V o 0 0 \ 0 \u00E2\u0080\u00A2 o 1 / then A is a full rank matrix. It is also easy to show that A is a generalized inverse of 5Ex-B ' , and all the conditions in Result 5.2 are satisfied, we thus have X'B'ABX-xl-iV2), where S2 = p'B'ABp, S2 > 0. b2 = 0 is true if and only if Bp = 0, and this in turn is true if and only if p = pi, that is p should be an equal constant vector. Thus under the null hypothesis p = pi, we should have X'B'ABX-xl-i, where xP-i means central chi-square distribution with p \u00E2\u0080\u0094 1 degrees of freedom. Now we use an example to illustrate the use of above results. 
Example 5.1 Suppose we choose the multivariate logit model with multinormal copula (3.1), with correlation matrix Θ = (θjk), to model the d-dimensional binary observations y1, . . ., yn. We want to know whether an exchangeable (that is, θjk = θ for all 1 ≤ j < k ≤ d and for some |θ| < 1) or an AR(1) (that is, θjk = θ^{k-j} for all 1 ≤ j < k ≤ d and for some |θ| < 1) correlation matrix in the multinormal copula is a suitable assumption. The above results can be used to test these assumptions. Let θ̃^{(jk)} be the IFME of θ from the (j, k) bivariate margin, and let θ̃ = (θ̃^{(12)}, θ̃^{(13)}, . . ., θ̃^{(d-1,d)})'. By Theorem 2.4, we have asymptotically

    θ̃ ~ N_{d(d-1)/2}(θ1, Σ_{θ̃}),

where Σ_{θ̃} is the inverse of the Godambe information matrix of θ̃. Thus, under the exchangeable or AR(1) assumptions on Θ, we have asymptotically

    θ̃'B'ABθ̃ ~ χ²_{d(d-1)/2-1},

where

    B = I - \frac{\mathbf{1}\mathbf{1}'\Sigma_{\tilde{\theta}}^{-1}}{\mathbf{1}'\Sigma_{\tilde{\theta}}^{-1}\mathbf{1}},  \qquad  A = Q \, \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_{d(d-1)/2-1}^{-1}, 1) \, Q',

and Q is an orthogonal matrix from the spectral decomposition

    B \Sigma_{\tilde{\theta}} B' = Q \, \mathrm{diag}(\lambda_1, \ldots, \lambda_{d(d-1)/2-1}, 0) \, Q',

where λ1 ≥ λ2 ≥ · · · ≥ λ_{d(d-1)/2-1} > 0. The above results are valid for large samples and can be used in applications for a rough judgement about special dependence structure assumptions, though Σ_{θ̃} would typically have to be estimated from the data.

5.2 Data analysis examples

In this section, we apply and compare some of the models developed in Chapter 3 on real data sets, and illustrate the estimation procedures of Chapter 2. Following the discussion in section 5.1, the examples show the stages of the data analysis cycle and the special features related to the specific types of data.

5.2.1 Example with multivariate/longitudinal binary response data

In this subsection, several models for multivariate binary response data with covariates are applied to a subset of a data set from the "Six Cities Study" discussed and analyzed by Ware et al. (1984) and Stram et al. (1988). The Six Cities Study is a longitudinal investigation of the effects of indoor and outdoor air pollution on respiratory health. As in most longitudinal studies, there were missing data for some subjects. In this analysis, we consider a subset of data with no missing values, gathered in the study on the occurrence of persistent wheeze (graded as wheeze 1 and none 0) in children (1020 in total) followed yearly from ages 9 to 12 in two different cities: Kingston-Harriman, Tennessee (KHT), and Portage, Wisconsin (PW), in the US. The outdoor air pollution is measured by the children's residence location, that is, the two cities. These two cities have very different ambient air quality: KHT (coded as 1 in the data set) is influenced by air pollution from several metropolitan and industrial areas, and thus has relatively high average concentrations of fine particulate matter and acid aerosols; PW (coded as 0 in the data set) is located in a region that has relatively low concentrations of these pollutants. Indoor pollution is measured by the level of maternal smoking, graded as 1 (> 10 cigarettes) or 0 (< 10 cigarettes). Let us call the outdoor air pollution variable "City" and the indoor pollution variable "Smoking".
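A sketch of the test in Example 5.1 (ours, not code from the thesis), assuming the vector of bivariate-margin estimates and an estimate of Σ_θ̃ (e.g. from the jackknife) are available; the numbers in the example call are hypothetical:

    import numpy as np
    from scipy.stats import chi2

    def test_exchangeable(theta, cov):
        # Chi-square test of H0: all components of E[theta] equal a common value,
        # following Example 5.1; theta has length p = d(d-1)/2, and cov is its
        # (estimated) asymptotic covariance matrix.
        theta = np.asarray(theta, dtype=float)
        p = theta.size
        one = np.ones(p)
        cinv1 = np.linalg.solve(cov, one)                    # cov^{-1} 1
        B = np.eye(p) - np.outer(one, cinv1) / (one @ cinv1)
        lam, Q = np.linalg.eigh(B @ cov @ B.T)               # rank p - 1
        tol = 1e-10 * lam.max()
        inv = np.array([1.0 / l if l > tol else 1.0 for l in lam])
        A = Q @ np.diag(inv) @ Q.T                           # full-rank generalized inverse
        stat = float(theta @ B.T @ A @ B @ theta)
        return stat, p - 1, float(chi2.sf(stat, p - 1))

    # hypothetical d = 4 bivariate-margin estimates and a crude covariance
    theta = np.array([0.52, 0.48, 0.50, 0.55, 0.47, 0.51])
    cov = 0.002 * np.eye(6)
    print(test_exchangeable(theta, cov))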
Smoking is a time-dependent covariate, since the level of maternal smoking may vary from year to year; City is considered a time-independent covariate (for the four-year period), since no one in the study moved over the four years. More documentation of the study can be found in Ware et al. (1984) and Stram et al. (1988). Some of the potential scientific questions are: (1) Does the prevalence of wheeze differ between cities or smoking groups? If so, does the difference change over time? If the effects are constant over time, how should they be estimated? (2) How should the rate of respiratory disease for children whose mothers smoke be compared to the rate for children whose mothers do not smoke?

Tables 5.1 - 5.3 summarize the initial data analysis. Table 5.1 provides the univariate summaries of the data, with the percentages of 1's for the binary response and predictor variables (City, and Smoking at the 4 time points, which we denote by Smoking9, Smoking10, Smoking11 and Smoking12). We see, from the response variables Age 9 to Age 12, that the incidence of persistent wheeze decreases slightly across the ages; the same is true for the maternal smoking levels. Table 5.2 contains the frequencies of the response vector at the 4 time points when ignoring the effects of the covariates. Table 5.3 has the pairwise log odds ratios for the response variables, ignoring the covariates; it gives some indication of the amount of dependence in the response variables, in addition to Table 5.2. Table 5.3 indicates that the dependence for consecutive years is larger.

Multivariate binary response models used to model the data include:

1. The multivariate logit model from section 3.1, with
   a. multinormal copula (3.1),
   b. multivariate Molenberghs-Lesaffre construction
      i. with bivariate normal copula,
      ii. with Plackett copula (2.8),
      iii. with Frank copula (2.9),
   c. mixture of max-id copula (3.3),
   d. the permutation-symmetric copula (3.8).
2. The multivariate probit model with multinormal copula.

The multivariate logit-normal model (an MMD model) was also used to model this data set, but since in this model fitting the variance parameter estimates (σj, j = 1, 2, 3, 4) all go to 0, the model in fact reduces to an MCD model; thus we do not pursue the MMD model fitting with this data set further, and only the results of the MCD model fitting are reported here. Since we have the covariates City and Smoking, there is a question of how to include these variables in the models. For subject i (i = 1, . . ., 1020), the cut-off points are zij (j = 1, 2, 3, 4) for a univariate probit or logit model. A suitable approach for making the cut-off points functions of the covariates is to let zij = αj0 + αj1 · City_i + αj2 · Smoking_ij. Letting the dependence parameters be functions of covariates is more complicated, and many possibilities are open. A simple approach is to let the dependence parameters be independent of the covariates; this may serve the general modelling purpose in many situations while keeping the model simple. Besides this simple approach, and partly for illustrative purposes, we also examine the situation where the dependence parameters depend on the covariate City. For model (1a), the dependence parameters are θ_{i,jk} for subject i, 1 ≤ j < k ≤ 4. There are many ways to include covariates in the dependence parameters θ_{i,jk}, as we have discussed in section 3.1 for model (1a).
For a general dependence structure, we may simply let θ_{i,jk} = [exp(β_{jk,0} + β_{jk,1} · City_i) - 1]/[exp(β_{jk,0} + β_{jk,1} · City_i) + 1]. Another two dependence structures appropriate for this data set (suggested by the nature of the study and the initial data analysis) are the exchangeable and AR(1)-type structures, with Θ_i = (θ_{i,jk}) for the ith subject. The exchangeable situation is θ_{i,jk} = θ_i for some |θ_i| < 1. The AR(1) situation is θ_{i,jk} = θ_i^{|j-k|} for some |θ_i| < 1. In both situations, we let θ_i = [exp(β0 + β1 · City_i) - 1]/[exp(β0 + β1 · City_i) + 1]. For models (1bi), (1bii) and (1biii), we first let the higher-order (≥ 3) parameters η_{i,jkl} and η_{i,1234} be constant, say 1 (this is usually good enough for practical purposes; refer to section 3.1). We next let the parameters appearing in the bivariate copulas be functions of covariates. For model (1bi), the dependence parameters in the bivariate copulas are θ_{i,jk}; since the θ_{i,jk} are correlation coefficients in bivariate normal copulas, we let θ_{i,jk} = [exp(β_{jk,0} + β_{jk,1} · City_i) - 1]/[exp(β_{jk,0} + β_{jk,1} · City_i) + 1]. For model (1bii), let the δ_{i,jk} be the parameters in the Plackett copulas; we let δ_{i,jk} = exp(β_{jk,0} + β_{jk,1} · City_i). For model (1biii), let the δ_{i,jk} be the dependence parameters in the bivariate Frank copulas; we let δ_{i,jk} = exp(β_{jk,0} + β_{jk,1} · City_i). For model (1c), the dependence parameters are θ_i and δ_{i,jk} (1 ≤ j < k ≤ 4); we let the parameters of asymmetry ν_{ij} = 0 for all i and j. θ_i represents a general minimum level of dependence, and the δ_{i,jk} represent bivariate dependence exceeding the minimum dependence. For these dependence parameters, we let δ_{i,jk} = exp(β_{jk,0} + β_{jk,1} · City_i), and let θ_i = exp(β0) be independent of the covariates. For model (1d), the dependence parameter is θ_i; we let θ_i = exp(β0 + β1 · City_i). For model (2), the dependence structure is the same as in model (1a).

We use "l" to denote the logit model and "p" to denote the probit model. For the univariate marginal regressions, at least two situations could be considered: regression coefficients differing across margins (or times), denoted "md", and regression coefficients common across margins (or times), denoted "mc". For the regression of the dependence parameters, for models (1a) and (2), we consider the general (denoted "g"), exchangeable (denoted "e") and AR(1) (denoted "a") dependence structures. We also consider the situations with covariate (denoted "wc") and with no covariate (denoted "wn") for the dependence parameters. Thus a total of 12 submodels of model (1a) are considered: l.md.g.wc, l.md.g.wn, l.md.e.wc, l.md.e.wn, l.md.a.wc, l.md.a.wn, l.mc.g.wc, l.mc.g.wn, l.mc.e.wc, l.mc.e.wn, l.mc.a.wc and l.mc.a.wn, where, for example, "l.md.g.wc" stands for the multivariate logit model with marginal regression coefficients differing across margins and a general dependence structure with covariates. There are also 12 submodels for model (2): p.md.g.wc, p.md.g.wn, p.md.e.wc, p.md.e.wn, p.md.a.wc, p.md.a.wn, p.mc.g.wc, p.mc.g.wn, p.mc.e.wc, p.mc.e.wn, p.mc.a.wc and p.mc.a.wn. For models (1bi), (1bii), (1biii), (1c) and (1d), the AR(1)-type latent dependence structure may not be well-defined. In any case, to avoid repeating similar analyses, we consider within models (1bi), (1bii), (1biii), (1c) and (1d) only models with structure similar to that of the models retained by the analysis with models (1a) and (2).
For all the models except (1d), the IFM estimation theory is applied. That is, the univariate (regression) parameters are estimated from separate univariate likelihoods (using the Newton-Raphson method), and the bivariate and multivariate (regression) parameters are estimated from bivariate likelihoods, using a quasi-Newton optimization routine, with the univariate parameters fixed at their estimates from the separate univariate likelihoods. Furthermore, for the situation "mc" of common marginal regression coefficients and an exchangeable (or AR(1), if applicable) dependence structure, the WA of (2.93) in section 2.6 for parameter estimation based on IFM is used. It is also used for estimating the parameter β0 in δ = exp(β0) in model (1c), since δ is an overall parameter common across all margins. Notice that only one choice of parametric families for ψ and the K_{jk} was used, but it is expected that other choices could lead to a better fit according to AIC. Model (1d) has a copula with closed-form cdf and only one dependence parameter (or regression parameters related to it), so MLEs are computed in this situation. Model (1d) is used here to compare a simple permutation-symmetric MCD model with the other models, which all allow a general dependence structure. Models (1c) and (1d) have the advantage of a copula with closed-form cdf; this is particularly convenient for dealing with multivariate discrete data of high dimension, as it leads to faster computation of probabilities of the form Pr(Y = y) or Pr(Y = y|x). For the standard errors (SEs) of parameter estimates and prediction probabilities, the jackknife method of Chapter 2 is used, with 255 random groups of size 4. Furthermore, the weights used in the WA for common parameter estimation are based on the jackknife SEs, and these weights in turn are applied, based on (2.93), at each step of the jackknife parameter estimation.

Summaries of the fits of the models are given in several tables. Table 5.4 contains the estimates and SEs of the regression parameters for the marginal parameters with the logit model, when the regression parameters are considered to differ across margins and when they are common across margins. Table 5.5 contains the estimates and SEs of the regression parameters for the dependence parameters under various settings for the multivariate logit model with multinormal copula (model (1a)). Table 5.6 contains AIC values and X² values (calculated based on (5.5) with a = 0) for all the submodels of the multivariate logit and probit models with multinormal copula (models (1a) and (2)). Care must be taken in the comparison, since the AICs here are not calculated from the ML of all parameters simultaneously; the parameter estimates are IFME. The AIC values and X² values for corresponding submodels of models (1a) and (2) are comparable; this echoes the well-known fact that the univariate probit and logit models are comparable. We thus only compare the submodels within the multivariate logit model. From examining the AIC and X² values for the 12 models, the submodels l.md.g.wn, l.md.e.wc, l.md.e.wn and l.mc.g.wn seem to stand out as interesting choices. Since l.md.e.wc and l.md.e.wn are about the same in terms of AIC and X² values, and l.md.e.wn is simpler than l.md.e.wc, we only consider l.md.e.wn. At this stage, three models are retained for further inspection: l.md.g.wn, l.md.e.wn and l.mc.g.wn.
Table 5.7 contains AIC values and Table 5.8 contains X2 values of the submodels l.md.g.wn, l.md.e.wn and l.mc.g.wn of models (1a), (1bi), (1bii), (1biii), (1c) and (1d). These two tables suggest that the models are comparable in general, with models (1c) and (1d) performing relatively poorly; possibly other parametric families of mixtures of max-id copulas would do better. Model (1bi) seems to be the best for this data set. Note that since there is only one dependence structure with model (1d), the submodels l.md.g.wn and l.md.e.wn are equivalent in this case. Table 5.9 contains estimates and SEs of the bivariate dependence parameters of the submodel l.md.g.wn of models (1bi), (1bii), (1biii) and (1c). This and Table 5.5 also suggest that the models are comparable; the conclusions about which bivariate margins are more or less dependent are the same across the models. They show that the dependence for consecutive years is slightly stronger; this is also observed in Table 5.3 from the initial data analysis. (Note also the closeness of the dependence parameter estimates from model (1bii) to the empirical pairwise log odds ratios in Table 5.3.) For comparison, the estimate of the dependence parameter for model (1d), with a permutation symmetric copula, is 1.719 with SE equal to 0.067. As we have pointed out, model (1d) does not perform as well as the other models with this data set. This may indicate that, even though a model with exchangeable dependence structure may be acceptable, a better model has a general dependence structure. All these models are quite similar in terms of computer time for the parameter estimation (because of the IFM approach), but models (1a) and (2) used much more computer time than the other models to compute the AIC, X2 and the prediction probabilities, since 4-dimensional integrations were involved. For the predicted probabilities and inference, and also as a supplement to the X2 values for an assessment of goodness-of-fit, Table 5.10 contains estimates of probabilities of the form Pr(Y = y) for all possible y from the submodels l.md.g.wn, l.md.e.wn and l.mc.g.wn of model (1a), and Table 5.11 contains estimates of probabilities of the form Pr(Y = y) for all possible y from the submodel l.md.g.wn of models (1a), (1bi), (1bii), (1biii), (1c) and (1d); these Pr(Y = y) are estimated with Σ_{i=1}^{1020} Pr(Y = y|x_i)/1020. Table 5.12 contains estimates of probabilities of the form Pr(Y = y|x) for various y and x from the submodels l.md.g.wn, l.md.e.wn and l.mc.g.wn of model (1a), and Table 5.13 contains estimates of probabilities of the form Pr(Y = y|x) for various y and x from the submodel l.md.g.wn of models (1a), (1bi), (1bii), (1biii), (1c) and (1d). In Table 5.12, n* is the subset size for the specific value of x and "rel. freq" is the observed relative frequency of the given y under that value of x. In Table 5.13, to save space, only the maximum estimated SE over the different models is given for each line; the SEs are actually quite close to each other. The selected x and y values in Tables 5.12 and 5.13 are common values in the data set. Tables 5.10 - 5.13 suggest that the submodels l.md.g.wn, l.md.e.wn and l.mc.g.wn are all adequate for predictive purposes, since the prediction probabilities are comparable when the SEs are taken into account. These three submodels can be used to complement each other for slightly different inference purposes.
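The Pr(Y = y) entries in Tables 5.10 and 5.11 are covariate-averaged probabilities. A small sketch of that averaging step, where `pr_y_given_x` is a hypothetical stand-in for the fitted model's joint pmf at a covariate vector:

import numpy as np

def average_pattern_prob(pr_y_given_x, X, y):
    """Estimate Pr(Y = y) by averaging the model-based conditional
    probabilities Pr(Y = y | x_i) over the n observed covariate vectors
    (n = 1020 for the Six Cities data)."""
    return np.mean([pr_y_given_x(y, x) for x in X])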
The large divergences in estimated probabilities occur with model (1d), and only in cases where the vector x is at the extreme of the covariate space, for example x = (1,0,0,0,0) and x = (0,0,0,0,0) for y = (1,1,1,1). There is a simple exchangeable dependence model, l.md.e.wn, among the three submodels l.md.g.wn, l.md.e.wn and l.mc.g.wn. An explanation for this may be that the dependence in the bivariate margins is different, but not different enough to make a difference in the prediction probabilities. Another possibility may be the dominance of the response vector (0,0,0,0). The analysis (e.g. from submodel l.md.g.wn) indicates a slight decline in the rate of wheeze over time (the intercepts in Table 5.4 for regression parameters differing across margins decrease gradually over time from −1.090 to −1.564) and a moderate increase in wheeze for children of mothers who smoke (the corresponding regression parameters increase over time from 0.144 to 0.444) and for the city with pollution (the corresponding regression parameters increase over time from 0.003 to 0.209). There is an indication that maternal smoking and the city with pollution both significantly increase the probability of the occurrence of wheeze (e.g. from submodel l.mc.g.wn). This is also consistent with the observation in Ware et al. (1984), where it is believed that maternal smoking is predictive of respiratory illness. If we study the model with covariate (city) for the dependence (e.g. the submodel l.md.e.wc), we see that a high city pollution level has a negative effect on the correlation; it possibly means that a low city pollution level leads to a slightly higher correlation in the occurrence of persistent wheeze. We can interpret this as the wheeze occurrence situation not caused by pollution being more stable over time. The analysis indicates that the dependence for consecutive years is stronger and that the (pairwise) dependence parameters are all significant. The rate of respiratory disease for children whose mothers smoke heavily is higher than the rate for children whose mothers do not smoke or only smoke slightly; this can be seen from Table 5.12 (with the l.md.g.wn submodel), where for example for y = (1,1,1,1), P(y|x^(a)) = 0.099 > 0.071 = P(y|x^(b)) where x^(a) = (0,1,1,1,1) and x^(b) = (0,0,0,0,0), and P(y|x^(c)) = 0.118 > 0.085 = P(y|x^(d)) where x^(c) = (1,1,1,1,1) and x^(d) = (1,0,0,0,0). Similarly, we also observe that the rate of persistent absence of wheeze for children whose mothers smoke is lower than the rate for children whose mothers do not smoke (e.g. for y = (0,0,0,0), P(y|x^(e)) = 0.606 > 0.541 = P(y|x^(f)) where x^(e) = (0,0,0,0,0) and x^(f) = (0,1,1,1,1)).

Table 5.1: Six Cities Study: Percentages for binary variables

Variable    # 1's  Percentage
Age 9       266    26.07%
Age 10      256    25.09%
Age 11      241    23.62%
Age 12      217    21.27%
City        512    50.19%
Smoking9    325    31.86%
Smoking10   313    30.68%
Smoking11   311    30.49%
Smoking12   309    30.29%

Table 5.2: Six Cities Study: Frequencies of the response vector (Age 9, 10, 11, 12)

Response pattern  Observed numbers  Relative frequency
1 1 1 1            95               0.093
1 1 1 0            30               0.029
1 1 0 1            15               0.015
1 1 0 0            28               0.027
1 0 1 1            14               0.014
1 0 1 0             9               0.009
1 0 0 1            12               0.012
1 0 0 0            63               0.062
0 1 1 1            19               0.019
0 1 1 0            15               0.015
0 1 0 1            10               0.010
0 1 0 0            44               0.043
0 0 1 1            17               0.017
0 0 1 0            42               0.041
0 0 0 1            35               0.034
0 0 0 0           572               0.561
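The pairwise log odds ratios of Table 5.3 can be reproduced directly from the pattern frequencies of Table 5.2 by collapsing the 4-way table to a 2 × 2 margin; a short sketch:

import numpy as np

# response patterns (Age 9, 10, 11, 12) and observed counts from Table 5.2
patterns = np.array([[1,1,1,1],[1,1,1,0],[1,1,0,1],[1,1,0,0],
                     [1,0,1,1],[1,0,1,0],[1,0,0,1],[1,0,0,0],
                     [0,1,1,1],[0,1,1,0],[0,1,0,1],[0,1,0,0],
                     [0,0,1,1],[0,0,1,0],[0,0,0,1],[0,0,0,0]])
counts = np.array([95,30,15,28,14,9,12,63,19,15,10,44,17,42,35,572])

def pairwise_log_odds(j, k):
    """Collapse to the (j, k) bivariate margin and return the sample
    odds ratio and its log (cf. Table 5.3)."""
    n11 = counts[(patterns[:, j] == 1) & (patterns[:, k] == 1)].sum()
    n10 = counts[(patterns[:, j] == 1) & (patterns[:, k] == 0)].sum()
    n01 = counts[(patterns[:, j] == 0) & (patterns[:, k] == 1)].sum()
    n00 = counts[(patterns[:, j] == 0) & (patterns[:, k] == 0)].sum()
    odds = (n11 * n00) / (n10 * n01)
    return odds, np.log(odds)

print(pairwise_log_odds(0, 1))  # ages 9 and 10: about (12.97, 2.56)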
Also similarly, the rate of persistent wheeze for children who reside in the city with more pollution is higher than the rate for children who reside in the city with less pollution; e.g., for y = (1,1,1,1), P(y|x^(g)) = 0.118 > 0.099 = P(y|x^(h)) where x^(g) = (1,1,1,1,1) and x^(h) = (0,1,1,1,1), or P(y|x^(i)) = 0.085 > 0.071 = P(y|x^(j)) where x^(i) = (1,0,0,0,0) and x^(j) = (0,0,0,0,0). More detailed comparisons for different situations can be made. Similar results to the partial interpretations given above have also been obtained in the literature on the analysis of a similar data set from the same study; see for example Fitzmaurice and Laird (1993), Zeger et al. (1988) and Stram et al. (1988).

Table 5.3: Six Cities Study: Pairwise log odds ratios for Age 9, 10, 11, 12

Pair        odds   log odds
Age 9,10    12.97  2.56
Age 9,11     8.91  2.19
Age 9,12     8.69  2.16
Age 10,11   13.63  2.61
Age 10,12   10.45  2.35
Age 11,12   14.83  2.69

Table 5.4: Six Cities Study: Estimates of marginal regression parameters for multivariate logit model

margin  intercept (SE)   city (SE)       smoking (SE)
differ across the margins
1       -1.090 (0.113)   0.003 (0.150)   0.144 (0.144)
2       -1.229 (0.120)   0.080 (0.148)   0.293 (0.155)
3       -1.412 (0.136)   0.311 (0.161)   0.237 (0.166)
4       -1.564 (0.123)   0.209 (0.155)   0.444 (0.166)
common across the margins
        -1.308 (0.061)   0.144 (0.077)   0.270 (0.078)

Table 5.5: Six Cities Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula

margin  intercept (SE)  city (SE)
general dependence, with covariate
12      2.156 (0.217)   -0.373 (0.288)
13      1.740 (0.183)   -0.178 (0.249)
14      1.583 (0.209)    0.041 (0.288)
23      2.334 (0.210)   -0.628 (0.287)
24      1.891 (0.214)   -0.294 (0.281)
34      2.079 (0.224)   -0.109 (0.287)
general dependence, without covariate
12      1.960 (0.143)
13      1.645 (0.124)
14      1.604 (0.143)
23      1.987 (0.143)
24      1.733 (0.139)
34      2.020 (0.142)
exchangeable dependence, with covariate
        1.948 (0.085)   -0.254 (0.114)
exchangeable dependence, without covariate
        1.815 (0.057)
AR(1) dependence, with covariate
        2.380 (0.086)   -0.258 (0.115)
AR(1) dependence, without covariate
        2.236 (0.057)

Table 5.6: Six Cities Study: Comparisons of AIC values and X2 values from various submodels of models (1a) and (2). (Columns: Logit Models, AIC, X2; Probit Models, AIC, X2; the entries are not legible in the source.)

5.2.2 Example with multivariate ordinal response data

[...] For model (1a), we let θ_{i,jk} = [exp(β_{jk,0} + β_{jk,1}·Distance_i) − 1]/[exp(β_{jk,0} + β_{jk,1}·Distance_i) + 1] for a general dependence structure, and θ_i = [exp(β_0 + β_1·Distance_i) − 1]/[exp(β_0 + β_1·Distance_i) + 1] for the exchangeable and AR(1) dependence structures. For models (1bi), (1bii), (1biii), we first let the higher order (≥ 3) parameters η_{i,jkl} and η_{i,1234} be 1 (see the explanation in the example in subsection 5.2.1). We then let θ_{i,jk} = [exp(β_{jk,0} + β_{jk,1}·Distance_i) − 1]/[exp(β_{jk,0} + β_{jk,1}·Distance_i) + 1] for model (1bi), δ_{i,jk} = exp(β_{jk,0} + β_{jk,1}·Distance_i) for model (1bii), and likewise δ_{i,jk} = exp(β_{jk,0} + β_{jk,1}·Distance_i) for model (1biii). For model (1c), we let δ_{i,jk} = exp(β_{jk,0} + β_{jk,1}·Distance_i) and let θ_i = exp(β_0) be independent of the covariate. (The parameter of asymmetry ν_{ij} is set to 0 for all i and j.) Again, notice that only one choice of parametric families for ψ and the K_jk's was used, but it is expected that other choices could lead to a better fit according to AIC. For model (1d), the dependence parameters are the θ_i, and we let θ_i = exp(β_0 + β_1·Distance_i). For model (2), the dependence structure is the same as for model (1a). As in the example in subsection 5.2.1, we study 12 submodels of model (1a).
They are l.md.g.wc, l.md.g.wn, l.md.e.wc, l.md.e.wn, l.md.a.wc, l.md.a.wn, l.mc.g.wc, l.mc.g.wn, l.mc.e.wc, l.mc.e.wn, l.mc.a.wc and l.mc.a.wn. The 12 submodels for model (2) are p.md.g.wc, p.md.g.wn, p.md.e.wc, p.md.e.wn, p.md.a.wc, p.md.a.wn, p.mc.g.wc, p.mc.g.wn, p.mc.e.wc, p.mc.e.wn, p.mc.a.wc and p.mc.a.wn. For models (1bi), (1bii), (1biii), (1c) and (1d), the AR(1) type latent dependence structure may not be well-defined. In any case, to avoid repeating similar analysis, we will only consider, within models (1bi), (1bii), (1biii), (1c) and (1d), models with a structure similar to that of the models retained by the analysis with models (1a) and (2). For all the models except (1d), the IFM estimation theory is applied. That is, the univariate (regression) parameters are estimated from separate univariate likelihoods, and the bivariate and multivariate (regression) parameters are estimated from bivariate likelihoods, with the univariate parameters fixed at their estimates from the separate univariate likelihoods. For "mc" models involving common marginal regression coefficients and an exchangeable (or AR(1), if applicable) dependence structure, the WA of (2.93) in section 2.6 for parameter estimation based on IFM is used; it is also used for estimating the parameter β_0 in θ = exp(β_0) in model (1c). MLEs are computed for model (1d). For standard errors (SEs) of parameter estimates and prediction probabilities, the (delete-one) jackknife method from Chapter 2 is used. These are all similar to the use of these models in subsection 5.2.1. Summaries of the model fits are given in several tables. Table 5.17 contains the estimates and SEs of the regression parameters for the univariate parameters with the logit model, when the regression parameters are taken to differ across the univariate margins and when they are taken to be common across the margins. Table 5.18 contains the estimates and SEs of the regression parameters for the dependence parameters under various settings for the multivariate logit model with multinormal copula (model (1a)). Table 5.19 contains AIC values and X2_(2) values (calculated based on (5.5) with a = 2) for all the submodels of the multivariate logit and probit models with multinormal copula (that is, models (1a) and (2)). The AIC values and X2_(a) values (not only for a = 2, but for all a) for the corresponding submodels of models (1a) and (2) are comparable, similar to what we observed for the examples in subsection 5.2.1. We thus only compare the submodels within the multivariate logit model. From examining the AIC values and X2_(2) values for the 12 models, the models l.md.g.wn and l.md.a.wc seem to stand out as interesting choices, with l.md.a.wc appearing to be the better one. Since there is no equivalent way to express the AR(1) structure with models (1bii), (1biii), (1c) and (1d), for the comparison study we focus on the submodel l.md.g.wn. Table 5.20 contains AIC values and X2_(2) values of the submodel l.md.g.wn of models (1a), (1bii), (1biii), (1c) and (1d). The AIC value and X2_(2) value are not available for model (1bi), since the dependence parameter estimates obtained from IFM deviate slightly from forming a compatible set of dependence parameters for a proper Molenberghs-Lesaffre construction, so the multivariate object could not be evaluated.
Based on the available AIC values and X2_(2) values, Table 5.20 suggests that models (1a), (1bii) and (1biii) are comparable in general, with models (1c) and (1d) fitting relatively poorly. Table 5.21 contains estimates and SEs of the bivariate dependence parameters of the submodel l.md.g.wn of models (1bi), (1bii), (1biii) and (1c). This and Table 5.20 also suggest that models (1a), (1bi), (1bii) and (1biii) are comparable. The conclusions about which bivariate margins are more or less dependent are the same from models (1a), (1bi), (1bii) and (1biii). They show that the dependence for consecutive years is slightly stronger; this is consistent with the gamma measures in Table 5.16 from the initial data analysis. The dependence parameter estimates for model (1c) reveal that this model leads to a domination of dependence by the overall dependence parameter (log θ = 1.808 with SE = 0.073), which is close to assuming a permutation symmetric copula. For comparison, for model (1d) with a permutation symmetric copula, the dependence parameter estimate is log θ = 1.700 with SE = 0.111. From the above comparisons, it seems that model (1a) is an adequate and better model for this data set. Thus, in the following, we concentrate on comparing the two submodels l.md.g.wn and l.md.a.wc of model (1a). Table 5.19 suggests that the submodel l.md.a.wc is a better model than the submodel l.md.g.wn; it also indicates that there is a justifiable AR(1) latent dependence structure, which describes the data set better than a general or exchangeable dependence structure. The exchangeable dependence structure assumption would be the least acceptable hypothesis. To compare the submodels l.md.a.wc and l.md.g.wn, Table 5.22 lists the values of X2_(a) for different values of a (a = 1, 2, ..., 10). This table reveals that the submodel l.md.a.wc fits the response vectors with higher frequency (≥ 5) better, while the submodel l.md.g.wn fits the response vectors with lower frequency (≤ 4) better. In other words, neither submodel is clearly better. Different models capture the data set equally well in certain ways, and together they may be useful to reveal the features of the data set and lead to some useful interpretations. As a complement to the X2 values for the assessment of goodness-of-fit, Table 5.23 contains estimates of probabilities of the form Pr(Y = y) for all possible y, and the corresponding frequencies, from the submodels l.md.g.wn and l.md.a.wc of model (1a); these Pr(Y = y) are estimated with Σ_{i=1}^{268} Pr(Y = y|x_i)/268. Table 5.24 contains estimates of frequencies and probabilities of the form Pr(Y = y|x) for various y at x = 1 (distance greater than 5 miles) and at x = 0 (distance less than 5 miles) from the submodels l.md.g.wn and l.md.a.wc of model (1a). In Table 5.24, to save space, only the maximum estimated SE over the models is given for each line; the SEs are actually quite close to each other. Tables 5.23 and 5.24 suggest that the submodels l.md.g.wn and l.md.a.wc are both adequate for predictive purposes, since the prediction probabilities are comparable when the SEs are considered. These two submodels can be used to complement each other for slightly different inference purposes. The largest divergence between estimated probability and observed frequency occurs at x = 1 for y = (1,1,1,1).
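Formula (5.5) for X2_(a) is not restated in this part of the chapter; from the way it is used (a = 0 includes all response patterns, while X2_(a) for a ≥ 2 is unavailable when every pattern has frequency 1), we read it as a Pearson-type statistic restricted to patterns with observed frequency at least a. A sketch under that assumption:

import numpy as np

def grouped_chi_square(obs, exp, a):
    """X2_(a)-style statistic under our reading of (5.5): sum
    (obs - exp)^2 / exp over response patterns whose observed frequency
    is at least a, so that low-count cells (whose fitted probabilities
    may be near zero) can be excluded."""
    obs, exp = np.asarray(obs, float), np.asarray(exp, float)
    keep = obs >= a
    return np.sum((obs[keep] - exp[keep]) ** 2 / exp[keep])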
The submodel l.md.a.wc, with an AR(1) latent dependence structure whose dependence parameters depend significantly on the covariate, indicates not only that the dependence for consecutive years is significantly larger, but also that the strength of the dependence differs between those who live within 5 miles of the plant and those who live between 5 and 10 miles of the plant. The analysis (e.g. from the submodels l.md.g.wn as well as l.md.a.wc) indicates that, compared with the stress levels of the mothers living less than 5 miles from the plant, there is a slight trend over time towards lower stress levels for mothers living between 5 and 10 miles from the plant. There are no large changes in stress levels over time, but the stress levels of mothers living between 5 and 10 miles from the plant are a bit higher in the first year following the accident; they decrease in the second year and remain stable over the subsequent years. If we study the model with covariate (distance) for the dependence (e.g. the submodel l.md.a.wc), we see that living far from the plant has a negative effect on the dependence; it indicates that the dependence parameters are larger for those who live within 5 miles of the plant. This means that the mothers living within 5 miles of the plant are, in probability, more consistent over time on the original 90-item checklist; there could be a number of reasons for this. We can interpret this as the stress symptoms caused by the accident being more persistent over time for the group living closer to the plant. The analysis indicates that the dependence for consecutive years is larger and that the (pairwise) dependence parameters are all significant. The rate of a persistent high stress level is higher for mothers living closer to the plant. This can be seen from Table 5.24 (e.g. with the l.md.a.wc submodel), where for example for y = (3,3,3,3), P(y|x = 0) = 0.076 > 0.037 = P(y|x = 1). The rate of a persistent medium stress level (y = (2,2,2,2)) is slightly higher for the group living closer to the plant, while the rates of a persistent low stress level (y = (1,1,1,1)) are comparable for the two groups. Similar results to the partial interpretations given above were also obtained in Fienberg et al. (1985) and Conaway (1989). They conclude that mothers within the five mile radius were in fact experiencing greater stress symptoms than mothers living between 5 and 10 miles away; this is consistent with our observations.

Table 5.14: TMI Accident Study: Stress levels for 4 years following the accident at TMI; responses with non-zero frequencies

ID  Response pattern  < 5 mi.  > 5 mi.
1   3 3 3 3           12        7
2   3 3 3 2            5        2
3   3 3 2 3            0        2
4   3 3 2 2            2        7
5   3 3 2 1            1        1
6   3 2 3 3            4        0
7   3 2 3 2            1        0
8   3 2 2 3            3        0
9   3 2 2 2            4       13
10  3 2 2 1            0        1
11  3 1 1 3            0        1
12  2 3 3 3            1        1
13  2 3 3 2            1        3
14  2 3 2 3            0        1
15  2 3 2 2            2        1
16  2 2 3 3            3        1
17  2 2 3 2            2        5
18  2 2 2 3            4        6
19  2 2 2 2           38       53
20  2 2 2 1            2        6
21  2 2 1 2            2        2
22  2 2 1 1            3        2
23  2 1 2 3            0        1
24  2 1 2 2            4       15
25  2 1 2 1            1        5
26  2 1 1 2            1        4
27  2 1 1 1            5        4
28  1 2 2 2            4        3
29  1 2 2 1            2        0
30  1 2 1 2            1        0
31  1 2 1 1            0        1
32  1 1 2 2            3        0
33  1 1 2 1            2        2
34  1 1 1 2            0        2
35  1 1 1 1            2        1
Total                115      153

Table 5.15: TMI Accident Study: Univariate marginal frequencies (and relative frequencies)

margin    Outcome  1979         1980         1981         1982
< 5 mi.   3         32 (0.278)   24 (0.209)   29 (0.252)   27 (0.235)
          2         69 (0.600)   73 (0.635)   72 (0.626)   70 (0.609)
          1         14 (0.122)   18 (0.157)   14 (0.122)   18 (0.157)
> 5 mi.   3         34 (0.222)   25 (0.163)   19 (0.124)   20 (0.131)
          2        110 (0.719)   93 (0.608)  117 (0.765)  110 (0.719)
          1          9 (0.059)   35 (0.229)   17 (0.111)   23 (0.150)
all       3         66 (0.246)   49 (0.183)   48 (0.179)   47 (0.175)
          2        179 (0.668)  166 (0.619)  189 (0.705)  180 (0.672)
          1         23 (0.086)   53 (0.198)   31 (0.116)   41 (0.153)

Table 5.16: TMI Accident Study: Pairwise gamma measures for Years 1979, 1980, 1981, 1982

Pair          < 5 mi.  > 5 mi.  all
(1979, 1980)  0.894    0.829    0.852
(1979, 1981)  0.831    0.635    0.758
(1979, 1982)  0.782    0.595    0.702
(1980, 1981)  0.907    0.882    0.887
(1980, 1982)  0.756    0.638    0.700
(1981, 1982)  0.924    0.738    0.851

Table 5.17: TMI Accident Study: Estimates of univariate marginal regression parameters for multivariate logit models

margin  γ_j(1) (SE)      γ_j(2) (SE)     distance (SE)
differ across the margins
1       -2.376 (0.304)   1.109 (0.227)   0.017 (0.272)
2       -1.629 (0.215)   1.291 (0.203)   0.384 (0.250)
3       -2.349 (0.299)   1.250 (0.234)   0.497 (0.287)
4       -1.938 (0.263)   1.343 (0.232)   0.368 (0.273)
common across the margins
        -1.984 (0.131)   1.250 (0.112)   0.315 (0.135)

Table 5.18: TMI Accident Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula

margin  intercept (SE)  distance (SE)
general dependence, with covariate
12      1.960 (0.289)   -0.240 (0.422)
13      1.594 (0.250)   -0.470 (0.391)
14      1.430 (0.304)   -0.386 (0.414)
23      2.079 (0.310)   -0.092 (0.424)
24      1.428 (0.321)   -0.271 (0.425)
34      2.358 (0.363)   -0.960 (0.505)
general dependence, without covariate
12      1.824 (0.212)
13      1.356 (0.192)
14      1.243 (0.195)
23      2.032 (0.219)
24      1.277 (0.205)
34      1.779 (0.273)
exchangeable dependence, with covariate
        1.772 (0.123)   -0.377 (0.174)
exchangeable dependence, without covariate
        1.546 (0.086)
AR(1) dependence, with covariate
        2.208 (0.124)   -0.385 (0.178)
AR(1) dependence, without covariate
        2.008 (0.088)

Table 5.19: TMI Accident Study: Comparisons of AIC values and X2_(2) values from various submodels of models (1a) and (2)

Logit Models  AIC       X2_(2)    Probit Models  AIC       X2_(2)
l.md.g.wc     1542.443   28.018   p.md.g.wc      1542.788   27.975
l.md.g.wn     1537.235   29.786   p.md.g.wn      1537.499   29.600
l.md.e.wc     1549.740   98.795   p.md.e.wc      1549.977   97.138
l.md.e.wn     1550.219  117.144   p.md.e.wn      1550.403  115.501
l.md.a.wc     1534.116   26.547   p.md.a.wc      1534.388   26.594
l.md.a.wn     1535.942   28.551   p.md.a.wn      1536.168   28.479
l.mc.g.wc     1568.305   83.795   p.mc.g.wc      1568.351   85.011
l.mc.g.wn     1563.332   86.916   p.mc.g.wn      1563.416   88.284
l.mc.e.wc     1568.103  197.711   p.mc.e.wc      1568.324  204.103
l.mc.e.wn     1570.950  217.573   p.mc.e.wn      1571.111  226.992
l.mc.a.wc     1557.114   77.743   p.mc.a.wc      1557.288   78.831
l.mc.a.wn     1562.229   80.869   p.mc.a.wn      1562.378   81.990
Table 5.20: TMI Accident Study: Comparisons of AIC values and X2_(2) values from the submodel l.md.g.wn of various models

Models   AIC       X2_(2)
(1a)     1537.235   29.786
(1bi)    -          -
(1bii)   1540.846   23.485
(1biii)  1542.312   33.303
(1c)     1566.475  970.756
(1d)     1553.640  422.000

Table 5.21: TMI Accident Study: Estimates (SE) of dependence regression parameters from the submodel l.md.g.wn of various models

margin  (1bi)           (1bii)          (1biii)         (1c)
12      1.824 (0.212)   2.697 (0.289)   1.960 (0.158)   -8.250 (0.172)
13      1.356 (0.192)   2.035 (0.262)   1.628 (0.171)   -8.996 (0.010)
14      1.243 (0.195)   1.928 (0.273)   1.485 (0.184)   -8.397 (0.001)
23      2.032 (0.219)   2.857 (0.289)   2.122 (0.150)   -7.495 (0.202)
24      1.277 (0.205)   2.014 (0.271)   1.495 (0.185)    0.927 (1.910)
34      1.779 (0.273)   2.710 (0.290)   1.978 (0.190)    1.084 (1.827)
log(θ)                                                   1.808 (0.073)

Table 5.22: TMI Accident Study: Comparisons of X2_(a) values from the submodels l.md.g.wn and l.md.a.wc of model (1a)

a    l.md.g.wn  l.md.a.wc
1     623.507   4390.084
2      29.786     26.547
3      15.809     16.243
4      10.339     11.766
5       9.086      7.453
6       8.217      7.240
7       7.583      6.798
8       4.625      3.575
9       2.983      2.481
10      2.467      2.310

Table 5.23: TMI Accident Study: Estimates of Pr(Y = y) and frequencies from the submodels l.md.g.wn and l.md.a.wc of model (1a)

Response  Observed  l.md.g.wn  l.md.a.wc  Observed  l.md.g.wn  l.md.a.wc
pattern   numbers   expected   expected   prob.     exp. prob. exp. prob.
3 3 3 3   19        14.1       14.4       0.071     0.053      0.054
3 3 3 2    7         7.3        7.5       0.026     0.027      0.028
3 3 2 3    2         3.0        2.5       0.007     0.011      0.009
3 3 2 2    9         8.3        9.8       0.034     0.031      0.037
3 3 2 1    2         0.2        0.4       0.007     0.001      0.002
3 2 3 3    4         3.7        2.9       0.015     0.014      0.011
3 2 3 2    1         2.6        2.7       0.004     0.010      0.010
3 2 2 3    3         5.3        3.0       0.011     0.020      0.011
3 2 2 2   17        19.2       19.5       0.063     0.072      0.073
3 2 2 1    1         1.0        2.0       0.004     0.004      0.007
3 1 1 3    1         0.0        0.0       0.004     0.000      0.000
2 3 3 3    2         3.7        4.5       0.007     0.014      0.017
2 3 3 2    4         4.3        2.9       0.015     0.016      0.011
2 3 2 3    1         1.1        1.3       0.004     0.004      0.005
2 3 2 2    3         6.3        5.4       0.011     0.023      0.020
2 2 3 3    4         5.1        6.7       0.015     0.019      0.025
2 2 3 2    7         7.0        6.3       0.026     0.026      0.024
2 2 2 3   10         9.9       10.4       0.037     0.037      0.039
2 2 2 2   91        86.0       87.4       0.340     0.321      0.326
2 2 2 1    8        12.5       11.6       0.030     0.047      0.043
2 2 1 2    4         3.4        3.2       0.015     0.013      0.012
2 2 1 1    5         3.4        4.1       0.019     0.013      0.015
2 1 2 3    1         0.9        0.8       0.004     0.003      0.003
2 1 2 2   19        17.1       16.6       0.071     0.064      0.062
2 1 2 1    6         4.3        4.6       0.022     0.016      0.017
2 1 1 2    5         6.0        4.7       0.019     0.022      0.018
2 1 1 1    9         7.2        8.1       0.034     0.027      0.030
1 2 2 2    7         3.7        3.6       0.026     0.014      0.014
1 2 2 1    2         1.4        0.6       0.007     0.005      0.002
1 2 1 2    1         0.2        0.2       0.004     0.001      0.001
1 2 1 1    1         0.5        0.3       0.004     0.002      0.001
1 1 2 2    3         4.8        6.3       0.011     0.018      0.024
1 1 2 1    4         2.5        1.9       0.015     0.009      0.007
1 1 1 2    2         2.5        2.9       0.007     0.009      0.011
1 1 1 1    3         6.7        6.3       0.011     0.025      0.023
others     0         2.8        2.6       0.000     0.009      0.008

Table 5.24: TMI Accident Study: Estimates of Pr(Y = y|x) and frequencies from the submodels l.md.g.wn and l.md.a.wc of model (1a)

Response  Observed  l.md.g.wn  l.md.a.wc  maxSE  Observed  l.md.g.wn  l.md.a.wc  maxSE
pattern   numbers   expected   expected          prob.     exp. prob. exp. prob.
< 5 miles
3 3 3 3   12   7.5   8.7  2.4  0.104  0.065  0.076  0.021
3 3 3 2    5   3.6   3.8  1.2  0.043  0.031  0.033  0.010
3 3 2 2    2   3.4   4.0  1.0  0.017  0.030  0.035  0.009
3 3 2 1    1   0.1   0.1  0.1  0.009  0.001  0.001  0.001
3 2 3 3    4   1.7   1.4  0.7  0.035  0.015  0.012  0.006
3 2 3 2    1   1.2   1.0  0.5  0.009  0.010  0.009  0.004
3 2 2 3    3   2.1   1.0  0.8  0.026  0.018  0.009  0.007
3 2 2 2    4   6.9   6.8  1.5  0.035  0.060  0.059  0.013
2 3 3 3    1   2.3   2.6  0.7  0.009  0.020  0.023  0.006
2 3 3 2    1   2.5   1.5  0.9  0.009  0.022  0.013  0.008
2 3 2 2    2   3.2   2.4  0.9  0.017  0.028  0.021  0.008
2 2 3 3    3   2.9   3.4  0.9  0.026  0.025  0.030  0.008
2 2 3 2    2   3.8   3.0  1.2  0.017  0.033  0.026  0.010
2 2 2 3    4   4.7   4.5  1.3  0.035  0.041  0.039  0.011
2 2 2 2   38  37.3  40.6  3.8  0.330  0.324  0.353  0.033
2 2 2 1    2   4.8   4.4  1.3  0.017  0.042  0.038  0.011
2 2 1 2    2   1.2   0.8  0.6  0.017  0.010  0.007  0.005
2 2 1 1    3   1.2   1.3  0.6  0.026  0.010  0.011  0.005
2 1 2 2    4   6.3   5.8  1.5  0.035  0.055  0.050  0.013
2 1 2 1    1   1.5   1.6  0.6  0.009  0.013  0.014  0.005
2 1 1 2    1   1.8   1.3  0.7  0.009  0.016  0.011  0.006
2 1 1 1    5   2.1   2.5  0.8  0.043  0.018  0.022  0.007
1 2 2 2    4   2.0   1.6  0.8  0.035  0.017  0.014  0.007
1 2 2 1    2   0.7   0.2  0.3  0.017  0.006  0.002  0.003
1 2 1 2    1   0.1   0.1  0.1  0.009  0.001  0.001  0.001
1 1 2 2    3   2.2   2.8  0.8  0.026  0.019  0.024  0.007
1 1 2 1    2   1.0   0.8  0.5  0.017  0.009  0.007  0.004
1 1 1 1    2   2.3   2.8  0.9  0.017  0.020  0.024  0.008
others     0   4.5   3.9  -    0.000  0.039  0.034  -
> 5 miles
3 3 3 3    7   6.6   5.7  1.7  0.046  0.043  0.037  0.011
3 3 3 2    2   3.7   3.7  0.9  0.013  0.024  0.024  0.006
3 3 2 3    2   1.7   1.4  0.6  0.013  0.011  0.009  0.004
3 3 2 2    7   4.7   5.8  1.4  0.046  0.031  0.038  0.009
3 3 2 1    1   0.2   0.3  0.2  0.007  0.001  0.002  0.001
3 2 2 2   13  12.4  12.7  2.1  0.085  0.081  0.083  0.014
3 2 2 1    1   0.8   1.5  0.5  0.007  0.005  0.010  0.003
3 1 1 3    1   0.0   0.0  0.0  0.007  0.000  0.000  0.000
2 3 3 3    1   1.4   1.8  0.5  0.007  0.009  0.012  0.003
2 3 3 2    3   1.8   1.4  0.8  0.020  0.012  0.009  0.005
2 3 2 3    1   0.5   0.8  0.3  0.007  0.003  0.005  0.002
2 3 2 2    1   3.1   3.1  0.9  0.007  0.020  0.020  0.006
2 2 3 3    1   2.3   3.2  0.8  0.007  0.015  0.021  0.005
2 2 3 2    5   3.4   3.4  1.1  0.033  0.022  0.022  0.007
2 2 2 3    6   5.0   5.8  1.4  0.039  0.033  0.038  0.009
2 2 2 2   53  48.7  46.8  4.9  0.346  0.318  0.306  0.032
2 2 2 1    6   7.7   7.2  1.7  0.039  0.050  0.047  0.011
2 2 1 2    2   2.1   2.3  0.9  0.013  0.014  0.015  0.006
2 2 1 1    2   2.3   2.9  0.9  0.013  0.015  0.019  0.006
2 1 2 3    1   0.6   0.6  0.3  0.007  0.004  0.004  0.002
2 1 2 2   15  10.9  10.9  2.3  0.098  0.071  0.071  0.015
2 1 2 1    5   2.9   3.1  0.9  0.033  0.019  0.020  0.006
2 1 1 2    4   4.1   3.5  1.4  0.026  0.027  0.023  0.009
2 1 1 1    4   5.2   5.5  1.2  0.026  0.034  0.036  0.008
1 2 2 2    3   1.7   2.0  0.8  0.020  0.011  0.013  0.005
1 2 1 1    1   0.3   0.2  0.2  0.007  0.002  0.001  0.001
1 1 2 1    2   1.5   1.1  0.6  0.013  0.010  0.007  0.004
1 1 1 2    2   1.5   1.8  0.6  0.013  0.010  0.012  0.004
1 1 1 1    1   4.4   3.5  1.1  0.007  0.029  0.023  0.007
others     0  11.6  11.2  -    0.000  0.076  0.073  -

5.2.3 Example with multivariate count response data

In this subsection, several models are applied to a data set of trivariate counts of pathogenic bacteria at 50 different sterile locations, measured by three different air samplers. Aitchison and Ho (1989) studied this data set. One of the objectives of the study is to investigate the relative effectiveness of the three air samplers in detecting pathogenic bacteria. The response vectors are 3-dimensional count measures. Table 5.25 lists the count measures of the three samplers at the 50 locations. The table shows that there are no duplicate trivariate response observations. The frequencies by univariate margin (or by sampler) given in Table 5.26 indicate that sampler 3 is more variable than samplers 1 and 2.
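The pairwise dependence measure used in the initial data analyses of this chapter (Tables 5.16 and 5.27) is the Goodman-Kruskal gamma, computed from concordant and discordant pairs. A brief sketch:

import numpy as np
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D) over concordant (C) and
    discordant (D) pairs; tied pairs contribute to neither.  The O(n^2)
    loop is fine for n = 50 locations or n = 268 subjects."""
    conc = disc = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = np.sign(xi - xj) * np.sign(yi - yj)
        conc += s > 0
        disc += s < 0
    return (conc - disc) / (conc + disc)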
The pairwise gamma measures in Table 5.27 indicate that samplers 1 and 3, and samplers 2 and 3, are negatively associated. Summary statistics (means, variances, quartiles, maximum, minimum and pairwise correlations) are given in Table 5.28. This initial data analysis indicates that there is some extra-Poisson variation, as the variance to mean ratios for the margins (or samplers) range from 2 to 5, with sampler 3 more variable than the other two samplers. This initial data analysis suggests that MCD models with Poisson variation may not be suitable, but for illustrative purposes we applied a multivariate count model with Poisson variation as well as a multivariate count model with extra-Poisson variation. The multivariate count response models used to model this trivariate count data are: 1. the multivariate Poisson model with multinormal copula (3.1); 2. the multivariate Poisson-lognormal model of section 3.6. Since this data set has no covariates, we directly estimate the parameters in the multivariate Poisson model with multinormal copula (see section 3.3) and the multivariate Poisson-lognormal model (see section 3.6). The multivariate Poisson model has Poisson marginals. The univariate parameter λ_j (j = 1, 2, 3) in the multivariate Poisson model is reparameterized by taking a log-transformation, η_j = log(λ_j), so that the new parameter η_j has the range (−∞, ∞). For the dependence parameters θ_jk in the multinormal copula, we let θ_jk = [exp(β_jk) − 1]/[exp(β_jk) + 1], so that β_jk has the range (−∞, ∞). We proceed to estimate the η_j and β_jk. For the multivariate Poisson-lognormal model, the marginal parameters are μ = (μ_1, μ_2, μ_3) and σ = (σ_1, σ_2, σ_3). [...] The X2_(a) values for a ≥ 2 are not available, since all the 3-tuples in the data set have frequency 1. X2_(1) is very large, since many estimated probabilities for the 3-tuples of frequency 1 are very close to zero. In this situation, because of the frequency 1 occurrence of all 3-tuples, the X2 measure as well as the estimated probabilities of the form Pr(Y = y) may not be suitable measures for a rough assessment of the goodness-of-fit. Instead, residual measures such as (5.6) should be considered if feasible. Other rough goodness-of-fit checks may consist of comparing some empirical statistics (means, variances, correlations, etc.) with the counterparts estimated from the fitted model. The latter approach would rule out all submodels of model (1) for this data set, since the extra-Poisson variation demonstrated by the empirical statistics is not matched by model (1). For the residual checking based on (5.6), we give an illustration here with the submodel md.g. We first compute e_i3 = y_i3 − E[Y_i3 | Y_i1 = y_i1, Y_i2 = y_i2], where E[Y_i3 | Y_i1 = y_i1, Y_i2 = y_i2] = Σ_y y·P(y_i1, y_i2, y)/P(y_i1, y_i2). We then plot e_i3 versus y_i1 and e_i3 versus y_i2 for all i = 1, ..., 50. The model would be considered adequate based on the residual plots if the residuals are small and do not exhibit systematic patterns. (Figure 5.1: Bacteria Counts: Residuals from the submodel md.g of model (1); two panels, e_i3 versus the sampler 1 count and e_i3 versus the sampler 2 count.) The two plots in Figure 5.1 do not show evident systematic patterns (except for a few outliers), but almost all the residuals are quite large judging from the observed values of y_i3; this indicates that the models do not fit the data well enough.
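A sketch of the residual computation behind Figure 5.1; `joint_pmf` is a hypothetical stand-in for the fitted trivariate pmf of the submodel md.g, and the infinite sum over the third count is truncated at `y_max`:

import numpy as np

def conditional_residual_3(y, joint_pmf, y_max=200):
    """Residual e_i3 = y_i3 - E[Y_i3 | Y_i1 = y_i1, Y_i2 = y_i2].
    The conditional mean is sum_t t * P(y1, y2, t) / P(y1, y2), with the
    sum truncated at y_max (which must be large relative to the counts)."""
    resid = np.empty(len(y))
    for i, (y1, y2, y3) in enumerate(y):
        probs = np.array([joint_pmf(y1, y2, t) for t in range(y_max + 1)])
        cond_mean = np.sum(np.arange(y_max + 1) * probs) / probs.sum()
        resid[i] = y3 - cond_mean
    return resid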
This is expected, since the multivariate Poisson models only fit data with Poisson variation. Next we consider fitting the multivariate Poisson-lognormal model to the data. From the IFM estimation theory, we first estimate the univariate marginal parameters μ = (μ_1, μ_2, μ_3) and [...]

Chapter 6. GEE methodology and its comparison with ML and IFM approaches

[...] Var(Y) = V(μ), where V(μ) is a specified function of μ = E(Y) (see McCullagh and Nelder 1989). The estimating equations

Σ_{i=1}^n (∂μ_i/∂β)' V^{-1}(μ_i)(y_i − μ_i) = 0    (6.1)

are applicable to different types of variables (continuous and discrete), with no assumptions about the distribution of the response. (Actually, the form Var(Y) = V(μ) is quite restrictive. We will discuss this in section 6.2.) For the d-dimensional multivariate response, if the responses are naively assumed to be independent and β is assumed to be common for the different Y_ij or μ_ij, j = 1, ..., d, then the quasi-likelihood estimating equations become

Σ_{i=1}^n (∂μ_i/∂β)' V^{-1}(Y_i)(y_i − μ_i) = 0,

where μ_i = (μ_i1, ..., μ_id)' = E(Y_i), V(Y_i) = diag[Var(Y_i1), ..., Var(Y_id)], and ∂μ_i/∂β' is the matrix of derivatives of μ_i with respect to β. To gain more efficiency in estimating these regression parameters of the univariate margins, Liang and Zeger (1986) and Zeger and Liang (1986) propose to estimate β from

U(β) = Σ_{i=1}^n D_i' V_i^{-1}(y_i − μ_i) = 0,    (6.2)

where D_i = ∂μ_i/∂β'. Here V_i is a "working" or approximate covariance matrix for Y_i, chosen by the investigator. The "working" covariance can be expressed in the form V_i = A_i^{1/2} R_i(α) A_i^{1/2}, where A_i = diag[Var(Y_i1), ..., Var(Y_id)] and R_i(α) = Corr(Y_i). R_i(α) is termed a "working" correlation matrix, with α representing a vector of parameters associated with a specified model for Corr(Y_i). Note that the correlation matrix can differ from subject to subject, but R_i(α) is fully specified by the vector of unknown parameters α, which is usually the same for all subjects. R_i(α) is referred to as a "working" correlation matrix because, as Liang and Zeger argued, it need not be correctly specified. The equations (6.2) are thus called generalized estimating equations, or GEE. The extension in (6.2) made by Liang and Zeger is basically about the specification of the "working" correlation matrix. Liang and Zeger (1986) showed that, as the sample size tends to infinity, the estimates of the regression coefficients obtained from the GEE approach are consistent and asymptotically normal, with an asymptotic variance-covariance matrix which can be consistently estimated even under misspecification of the dependence structure. If V_i = Cov(Y_i) is correctly specified, then a consistent estimate of the asymptotic variance of β̂ is given by Σ_1^{-1}(β̂), where

Σ_1(β̂) = Σ_{i=1}^n D̂_i' V̂_i^{-1} D̂_i,

V̂_i is V_i evaluated at (β̂, α̂), and D̂_i is D_i evaluated at β̂. However, if the "working" correlation R_i(α) is misspecified, Σ_1^{-1}(β̂) can give inconsistent estimates. Liang and Zeger (1986) suggest using the following "robust" (sandwich) estimate, Σ_1^{-1}(β̂) Σ_0(β̂) Σ_1^{-1}(β̂), where

Σ_0(β̂) = Σ_{i=1}^n D̂_i' V̂_i^{-1}(y_i − μ̂_i)(y_i − μ̂_i)' V̂_i^{-1} D̂_i.

This estimate is "robust" since it is consistent even if the "working" covariance V_i is not equal to Cov(Y_i). An alternative approach, which we recommend, is to apply the jackknife method to (6.2) to obtain an estimate of the variance of β̂. There are several choices for the "working" correlation matrix R_i. The simplest choice is R_i = J, where J is an identity matrix. This is equivalent to assuming that the response variables are not linearly correlated.
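A self-contained sketch of the GEE iteration (6.2) for binary data with probit margins and a fixed exchangeable "working" correlation follows; the moment estimation of α and the sandwich variance are omitted for brevity, and the Fisher-scoring update is one standard way (our choice here) of solving U(β) = 0:

import numpy as np
from scipy.stats import norm

def gee_probit(X, Y, alpha, n_iter=25):
    """GEE for binary responses with mu_ij = Phi(x_ij' beta) and an
    exchangeable working correlation R with off-diagonal `alpha`.
    X has shape (n, d, q); Y has shape (n, d)."""
    n, d, q = X.shape
    R = np.full((d, d), alpha) + (1.0 - alpha) * np.eye(d)
    beta = np.zeros(q)
    for _ in range(n_iter):
        U = np.zeros(q)
        H = np.zeros((q, q))                    # scoring denominator
        for i in range(n):
            eta = X[i] @ beta
            mu = norm.cdf(eta)
            A = np.sqrt(np.clip(mu * (1 - mu), 1e-10, None))
            Vinv = np.linalg.inv(np.diag(A) @ R @ np.diag(A))
            D = norm.pdf(eta)[:, None] * X[i]   # d mu_i / d beta'
            U += D.T @ Vinv @ (Y[i] - mu)
            H += D.T @ Vinv @ D
        beta = beta + np.linalg.solve(H, U)
    return beta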
One can also assume R_i(α) = R(α) for all i, and let R(α) be fully unspecified. Two simple special cases of R(α) are an exchangeable correlation matrix, where Corr(Y_ij, Y_ik) = α, and an autoregressive correlation matrix, where Corr(Y_ij, Y_ik) = α^{|j−k|}. If α is known, (6.2) alone is sufficient for the estimation of β. Otherwise, α must be estimated; we discuss this next. The "working" correlation matrix may be obtained through additional modelling. Prentice (1988) considered extensions of the GEE in Zeger and Liang (1986) to explicitly estimate the covariances of the responses. He proposed to work with

Σ_{i=1}^n (∂μ_i/∂β)' Cov^{-1}(Y_i)(y_i − μ_i) = 0,
Σ_{i=1}^n (∂ν_i/∂α)' Cov^{-1}(W_i)(w_i − ν_i) = 0,    (6.3)

for finding β and α, where W_i = (Y_i1 Y_i2, Y_i1 Y_i3, ..., Y_i,d−1 Y_id)', μ_i = E(Y_i) and ν_i = E(W_i). β characterizes the marginal means μ_i = (μ_i1, μ_i2, ..., μ_id)', and α = (α_1, α_2, ..., α_r)' characterizes the marginal pairwise associations ν_i. Cov^{-1}(Y_i) and Cov^{-1}(W_i) could be replaced by some specially chosen matrices. Let θ = (β', α')'. Zhao and Prentice (1990) proposed to work with the joint estimating equations

Σ_{i=1}^n (∂(μ_i', ν_i')/∂θ)' Cov^{-1}((Y_i', W_i')') ((y_i', w_i')' − (μ_i', ν_i')') = 0

for finding β and α. Cov((Y_i', W_i')') could be replaced by a "working" covariance matrix for (Y_i', W_i')'. The main idea here was to add extra estimating equations for the dependence parameters, to improve the parameter estimation over (6.2).

6.2 GEE in multivariate analysis

In this section, the GEE method is illustrated with some examples, and drawbacks of the method are discussed.

Example 6.1 Suppose Y_i ~ N_d(X_i β, Σ_i), i = 1, ..., n, where X_i is a known d × q matrix, Σ_i is a d × d covariance matrix, and β is a q × 1 parameter vector. The maximum likelihood method is to minimize Σ_{i=1}^n (y_i − μ_i)' Σ_i^{-1}(y_i − μ_i) with μ_i = (μ_i1, μ_i2, ..., μ_id)' = X_i β. It leads to

Σ_{i=1}^n X_i' Σ_i^{-1}(y_i − X_i β) = 0,

where ∂μ_i/∂β' = X_i. In this particular situation, the ML estimating equations are exactly the same as the GEE. •

Example 6.2 Suppose Y_i follows the multivariate probit model with multinormal copula. The mean vector is μ_i = (μ_i1, ..., μ_id)', where μ_ij = Φ(β x_ij), i = 1, ..., n, j = 1, ..., d. The GEE for β, with the correct specification of the response variance matrix, is

U(β) = Σ_{i=1}^n D_i' V(Y_i)^{-1}(y_i − μ_i) = 0,

where D_i = ∂μ_i/∂β', and V(Y_i) has diagonal elements Φ(β x_ij)(1 − Φ(β x_ij)) and off-diagonal (j, k) elements Φ_2(β x_ij, β x_ik; ρ_jk) − Φ(β x_ij)Φ(β x_ik), 1 ≤ j < k ≤ d. The GEE for β is different from the ML estimating equations for β in this case. We also notice that in this example the actual correlations among the responses depend on the mean values, and hence on β x_ij, j = 1, ..., d. This is not considered in the GEE assumptions for the "working" correlation matrix: in GEE for multivariate binary data, the "working" correlation is usually assumed to be independent of the mean parameters. In the next section, more studies comparing GEE with MLE under the multivariate probit model assumption will be given. •

Example 6.3 We examine the multivariate Poisson-lognormal model of Example 2.12. Suppose in (2.28) - (2.30), μ_j = ν and σ_j = η, j = 1, ..., d. Let β = (ν, η)'. We can obtain the estimates of ν and η by ML or IFM. Now we apply the GEE approach.
For i = 1, ..., n, we have E(Y_ij) = exp{ν + η²/2} = a, Var(Y_ij) = a + a²[exp(η²) − 1] = b and Cov(Y_ij, Y_ik) = a²[exp(θ_jk η²) − 1], j ≠ k. The matrix (∂E(Y_i)/∂β)' thus has the two rows ∂E(Y_ij)/∂ν = a and ∂E(Y_ij)/∂η = aη, j = 1, ..., d, which are proportional. With the correct specification of the response mean function and the variance-covariance matrix, the GEE for the parameters ν and η are

Σ_{i=1}^n (∂E(Y_i)/∂β)' V^{-1}(Y_i)(y_i − E(Y_i)) = 0,    (6.6)

where V(Y_i) has diagonal components a + a²[exp(η²) − 1] and off-diagonal (j, k) components a²[exp(θ_jk η²) − 1]. Since the two rows in (∂E(Y_i)/∂β)' are proportional, (6.6) reduces to a single equation; thus estimates of ν and η cannot be obtained with the quasi-likelihood approach. We can see this more clearly by examining the special situation where θ_jk = 0. In this case (6.6) becomes

Σ_{i=1}^n Σ_{j=1}^d (y_ij − a) = 0,

which leads only to an estimate of a = exp{ν + η²/2}. In this situation, we obtain a consistent estimator of a, but not of ν and η separately. This is a situation where consistent estimators of the parameters of interest are not obtainable with the GEE approach, even with the correct specification of the mean functions, the variance functions and the correlation structure. Following GEE, if the interpretation of the model or parameters is carried out based on a, we will reach the same interpretation whenever ν + η²/2 is constant, irrespective of whether ν is in fact a relatively big or small value. If ν = h(βx), where β is a parameter vector and x is a covariate vector, then the correct interpretation of the effect of the covariates is not possible from the GEE approach. The problem in this example is that ν and η are confounded in E(Y_j), and GEE fails to use the information on ν and η in the second moment of Y_j. This is an example showing that the form Var(Y) = V(μ) is restrictive. •

The above examples provide some flavour of the GEE approach. It is clear that, to use GEE for a meaningful purpose, the method requires the correct specification of the marginal means (and possibly the correct specification of the variances). It is expected to yield useful marginal regression parameter estimates with the dependence structure of the multivariate data treated as a nuisance. An attractive feature of GEE is the partial requirement on the model, through the specification of only lower order moments. But the GEE approach has a number of drawbacks when considered within the multivariate data analysis framework; some of the drawbacks are direct consequences of its attractive features:

i. The GEE approach is incomplete for the data analysis cycle of initial data analysis, statistical modelling, estimation, diagnostics and inference. In published work, GEE focuses mainly on the estimation stage, with emphasis on the estimation of some marginal regression parameters, which can only be considered a small part of the whole multivariate analysis process. In multivariate data analysis, a proper analysis cycle is important for the interpretation of the findings to be statistically meaningful.

ii. With the GEE approach, the types of inference that can be made from the estimation results are limited. GEE is mainly useful for marginal regression parameter estimation, regardless of the possible multivariate models for the data.
If the objective of the scientific investigation is to find the probabilities of occurrence of some phenomena, such as in multivariate discriminant analysis, GEE is not helpful. GEE also treats the dependence as a nuisance, and then uses a "working" correlation matrix in estimation. This may deviate from the purpose of multivariate analysis, which is often motivated by the need to analyze complex dependence among variables and objects. GEE does not deal with this question seriously. Furthermore, the correlation is often not the "best" notion of dependence for some multivariate non-normal variables.

iii. With the GEE approach, there is no clear way to assess the assumptions, such as a common β for different univariate margins. The effective use of GEE relies on the correct specification of the marginal mean functions. If the specifications are not correct, it would not be adequate to use the estimation results for inference purposes. With GEE, when the inference is wrong, it is not easy to tell where it is wrong, and to what extent the results are useful. The GEE has a direct representation within the exponential family, but this may not be true for models outside the exponential family; notice that many "interpretable" multivariate models are not in the exponential family.

iv. With the GEE approach, it may be difficult to obtain sound interpretations in some situations. One such situation is Example 6.3, where it is not possible to get an estimate (or a consistent estimate) of the parameters of interest through simple GEE, even under all conditions favourable to GEE, such as the correct specification of the mean functions, variance functions and correlation structure.

Then why GEE? GEE is a simple estimating approach in multivariate situations where only some knowledge of lower order moment characteristics is used. It may be considered an appropriate approach when the relationships between covariates and marginal probabilities are of prime interest, and when a proper multivariate model is not available or is mathematically difficult to deal with. GEE may lead to some gain in estimation efficiency for the marginal regression parameters from a sound specification of the dependence among the response variables. In some practical situations, we may have some rough knowledge of the dependence structure among the response variables; this knowledge can be appropriately incorporated into GEE. The GEE approach provides a way to avoid the difficulty of dealing with the complex relationship between some model parameters of interest and the joint probabilities that define the likelihood in a multivariate (longitudinal) situation, while still estimating some parameters of interest. However, in multivariate analysis, the marginal behaviour is only one of the possible features of interest. Others include the dependence structure among the response variables, the prediction of the probability of an outcome, the changes within subjects, etc. Within the general multivariate analysis framework, GEE should be considered only as a set of estimating equations for some parameters in a multivariate situation. Its usefulness is limited without incorporating it properly into the data analysis cycle, which mainly consists of initial data analysis, statistical modelling, diagnostics and inference. Some technical problems related to the GEE approach are:

1. How efficient is the GEE approach?
How does it compare with the ML approach (when a full model can be specified)?

2. How important is the correct specification of the response correlation matrix? If the correlations of the response variables do depend on the marginal regression parameters (see Example 6.2), how does GEE work?

3. What is the effect of the (correct) specification of the variance functions?

4. What is the practical meaning of "large sample size" for GEE to achieve estimation consistency?

Item 1 is a natural question, since GEE is an approach which uses only partial information about a likelihood model. Item 2 is related to the fact that the true correlation structure of the response variables is rarely known in practice. If different specifications of the correlation matrix make a difference to the marginal regression parameter estimates, what can we really say about the regression parameters? If different specifications of the correlation matrix do not make a difference, what else can we say (about the regression parameter estimates and correlations)? Item 4 is also a natural question for many statistical methodologies whose good properties are only established in an asymptotic sense. For item 3, the point at issue is best introduced via a simple example. Suppose we have a Poisson-lognormal model. In Example 6.3, we demonstrated that the GEE in (6.6) cannot lead to estimates of ν and η. We now limit our discussion to the univariate Poisson-lognormal model to illustrate our points. The GEE for the univariate case (with no covariates) is

Σ_{i=1}^n (∂a/∂β) (y_i − a)/b = 0,    (6.7)

where β = (ν, η)', a = E(Y_i) = exp{ν + η²/2} and b = Var(Y_i) = a + a²[exp(η²) − 1] = a + τa², τ = exp(η²) − 1. (6.7) is equivalent to

Σ_{i=1}^n (y_i − a) = 0.    (6.8)

To estimate ν as well as η, an equation additional to (6.8) is needed. One such equation (see McCullagh and Nelder 1989) is

Σ_{i=1}^n (y_i − a)²/b − (n − 1) = 0.    (6.9)

We see that in this simple univariate situation, (6.8) and (6.9) lead to the use of the sample mean and sample variance of the response to estimate the response mean and response variance. In the quasi-likelihood literature, it is usually assumed that the variance of Y_i has the form Var(Y_i) = φV(μ_i), where φ is an unknown dispersion parameter and V(μ_i) is a function of μ_i = E(Y_i). This is certainly not the case for the Poisson-lognormal model: if we let μ_i = E(Y_i), then Var(Y_i) = μ_i + μ_i²τ, which cannot be identified with Var(Y_i) = φV(μ_i). If we always assume Var(Y_i) = φV(μ_i), then in some situations it is not possible to have a correct specification of the variance function. This raises the question of the effect of the variance function specification on the consistency of the marginal parameter estimates. It would be interesting to see how (6.8) and (6.9) estimate ν and η under different specifications of the variance functions. In the next section, we address these issues. For points 1, 2 and 4, we use the multivariate probit model for the investigation. For point 3, we study the univariate Poisson-lognormal model.

6.3 GEE compared with the ML and IFM approaches

In this section, we study some of the questions concerning GEE raised at the end of section 6.2. We compare the GEE approach with the ML approach and the IFM approach by simulation, with knowledge of the true models.
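Before turning to the simulations, note that under the correct variance specification b = a + τa², the pair (6.8)-(6.9) can be solved in closed form: (6.8) sets a equal to the sample mean, (6.9) sets b equal to the sample variance, and (ν, η) are then recovered by inverting a = exp(ν + η²/2). A sketch:

import numpy as np

def poisson_lognormal_moments(y):
    """Closed-form solution of (6.8)-(6.9) with b = a + tau * a^2,
    tau = exp(eta^2) - 1.  Requires sample variance > sample mean
    (extra-Poisson variation)."""
    y = np.asarray(y, float)
    a = y.mean()                 # from (6.8)
    b = y.var(ddof=1)            # from (6.9): sum((y - a)^2 / b) = n - 1
    tau = (b - a) / a**2
    eta2 = np.log1p(tau)         # eta^2 = log(1 + tau)
    nu = np.log(a) - eta2 / 2    # invert a = exp(nu + eta^2 / 2)
    return nu, np.sqrt(eta2)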
Comparison and simulation schemes

We study the cases where the regression parameters are common across all margins. We compare the GEE estimates with MLEs and IFMEs (from the pool-marginal-likelihood approach). Except for the Poisson-lognormal model, where we investigate the effect of the specification of the variance function, in the GEE estimation we always assume that we have the correct specification of the marginal variance functions. With the Poisson-lognormal model, we specify different variance functions to investigate the importance of correctly specifying the variance functions. We use the mean-square error (MSE) of the estimate of a parameter from the different approaches as the basis of the comparison. For an estimator θ̂ = θ̂(X_1, ..., X_n), where X_1, ..., X_n is a random sample of size n from a distribution indexed by θ, the MSE of θ̂ about the true value θ is

MSE(θ̂) = E(θ̂ − θ)² = Var(θ̂) + [E(θ̂) − θ]².

Suppose that θ̂ has a sampling distribution F, and suppose θ̂_1, ..., θ̂_m are iid from F; then one obvious estimator of MSE(θ̂) is

MSE^(θ̂) = Σ_{k=1}^m (θ̂_k − θ)²/m.    (6.10)

The average of the parameter estimates is mean(θ̂) = Σ_{k=1}^m θ̂_k/m. Let θ̂_gee be the estimate from the GEE approach, θ̂_pmla the estimate from the IFM approach (with the pool-marginal-likelihood approach) and θ̂_mle the MLE. We examine r_1 = {MSE(θ̂_mle)/MSE(θ̂_gee)}^{1/2} and r_2 = {MSE(θ̂_pmla)/MSE(θ̂_gee)}^{1/2} (in all tables, r_1 and r_2 are reported). For a fixed sample size, θ̂ need not be the optimal estimate of θ in terms of MSE, since the bias of the estimate is now also taken into consideration. The above two ratios may indicate how GEE performs in comparison with the other approaches, and particularly how it compares with the MLE. The approach is mainly computational, based on the computer implementation of specific models and the subsequent intensive simulation and parameter estimation. We first use the multivariate probit model for investigating the relative efficiency of GEE estimates versus MLEs and IFMEs. We describe our simulation and comparison schemes here in general terms. We simulate multivariate binary data from the multivariate probit model, depending on the dependence structure of the latent variables. The following simulation scheme is used:

1. The sample size is n; the number of simulations is N. Both are reported in the tables.

2. Situations with covariates for d = 2 and d = 3, and with no covariates for d = 3 and d = 4, are considered:
(a) With no covariates: z_ij = z for i = 1, ..., n, j = 1, ..., d. The two chosen values of z are 0.5 and 1.5.
(b) With covariates:
i. There are two situations for d = 2: z_ij = β_0 + β_1 x_ij with β_0 = 0.5, β_1 = 1, and z_ij = β_0 + β_1 w_i + β_2 x_ij with β_0 = −0.5, β_1 = 0.5, β_2 = 1, where x_ij is a margin-dependent and w_i a margin-independent covariate.
ii. For d = 3, only z_ij = β_0 + β_1 x_ij with β_0 = 0.5, β_1 = 1 is considered.
Situations with x_ij discrete and continuous and with w_i discrete are studied. For w_i discrete, we choose w_i = I(U < 0) where U ~ uniform(−1, 1). For x_ij discrete, we choose x_ij = I(U < j/(2d)) where U ~ uniform(−1, 1); for x_ij continuous, we choose x_ij ~ N(j/(2d), 1/4). The continuous covariate case is only studied for d = 2.

3. We assume the latent correlation matrix Θ is free of covariates. For d = 2, ρ is chosen to be 0.9 and 0.5.
For d ≥ 3, Θ is chosen to be exchangeable with all correlations equal to ρ, or AR(1) with (j, k) component equal to ρ^{|j−k|}. In both the exchangeable and AR(1) cases, ρ is chosen to be 0.9. In GEE, the "working" correlation matrix is chosen by the investigator, and there is arbitrariness in this choice. We want to see how the choice of the "working" correlation matrix affects the estimation of the regression parameters in situations where the mean functions (and variance functions) are correctly specified. For the GEE estimation, we study two types of "working" correlation matrix specification:

1. The correct specification of the correlation matrix of the response variables; that is, r_{i,jk} is calculated from (6.11) with the true parameter values. In the tables, we use η_g for the GEE specification of r_{i,jk}, and η_g = c for the correct specification of the correlation matrix.

2. The wrong specification of the correlation matrix of the response variables:
(a) For d = 2, let r_{i,12} = η_g. We select η_g = 0.9, 0.5, 0, −0.5, −0.9.
(b) When the latent correlation matrix is exchangeable: (i) the "working" correlation matrix has exchangeable structure with r_{i,jk} = η_g, where for d = 3, η_g = 0 and η_g = −0.4, and for d = 4, η_g = 0 and η_g = −0.3; and (ii) the "working" correlation matrix has AR(1) structure for d = 3, with r_{i,jk} = η_g^{|j−k|} where η_g = 0.9 and η_g = −0.9.
(c) When the latent correlation matrix is AR(1): (i) the "working" correlation matrix has AR(1) structure with r_{i,jk} = η_g^{|j−k|}, where η_g = 0 and η_g = −0.9, for both d = 3 and d = 4; and (ii) the "working" correlation matrix has exchangeable structure for d = 3, with r_{i,jk} = η_g where η_g = 0.0 and η_g = −0.4.

In the computer implementation, we first simulate d-dimensional binary data from a given d-dimensional probit model with or without covariates. We then use the GEE, ML and IFM approaches to estimate the parameters from each simulation, and compute the MSE in (6.10) of the estimates from each parameter estimation approach. We also compute the mean of the parameter estimates. Next we discuss the simulation and computation scheme for the univariate Poisson-lognormal model, used to investigate the effects of the specification of the variance functions on the consistency of the estimation of the marginal regression parameters. The GEE that we use are (6.8) and (6.9). The true variance of Y_i is Var(Y_i) = a + a²τ, where a = E(Y_i) = exp(ν + η²/2) and τ = exp(η²) − 1, for the situation with no covariates, and Var(Y_i) = a_i + a_i²τ_i, where a_i = E(Y_i) = exp(ν_i + η_i²/2) and τ_i = exp(η_i²) − 1, for the situation with covariates. In the comparison study, we compare the MLE to GEE with 1) the correct specification of Var(Y_i), 2) Var(Y_i) = τa, 3) Var(Y_i) = τa², 4) Var(Y_i) = τa³. Let a be the mean and b the variance from the GEE specifications. The simulation scheme is as follows:

1. The sample size is n; the number of simulations is N. Both numbers are given in the tables.

2. We consider situations with the parameter ν independent of covariates and depending on a covariate x. The parameters are:
Next we discuss the simulation and computation scheme for the univariate Poisson-lognormal model, used to investigate the effect of the specification of the variance function on the estimation consistency of the marginal regression parameters. The GEE that we use are (6.8) and (6.9). The true variance of Yᵢ is Var(Yᵢ) = a + a²τ, where a = E(Yᵢ) = exp(ν + η²/2) and τ = exp(η²) − 1 for the situation with no covariates, and Var(Yᵢ) = aᵢ + aᵢ²τ, where aᵢ = E(Yᵢ) = exp(νᵢ + η²/2) and τ = exp(η²) − 1 for the situation with covariates. In the comparison study, we compare the MLE to GEE with 1) the correct specification of Var(Yᵢ), 2) Var(Yᵢ) = aᵢτ, 3) Var(Yᵢ) = aᵢ²τ, 4) Var(Yᵢ) = aᵢ³τ. Let a be the mean and b be the variance under a GEE specification. The simulation scheme is as follows:

1. The sample size is n; the number of simulations is N. Both numbers are given within the tables.

2. We consider situations where the parameter ν is independent of covariates and where it depends on a covariate x. The parameters are:
  i. With no covariates: (ν, η) = (0.99995, 0.01). In this case, a = 2.718282, and for the 4 different variance function specifications above, we have b = 2.719, 0.00027, 0.0007, 0.002 respectively, where b = 2.719 corresponds to the correct specification of the variance function.
  ii. With no covariates: (ν, η) = (−0.1, 1.48324). In this case, a = 2.718282, and for the 4 different variance function specifications above, we have b = 62.02, 21.81, 59.30, 161.19 respectively; b = 62.02 corresponds to the correct specification of the variance function.
  iii. With a covariate: ν = α + βx, where α = 0.5, β = 0.5 and x = I(U < 0) with U ~ uniform(−1, 1). The parameter η = 0.01.

Next we provide the numerical results for the situations outlined above.

Bivariate probit model

For the bivariate probit model with one covariate, the marginal linear regressions are z_ij = β₀ + β₁x_ij, where β₀, β₁ are the common marginal regression parameters. In GEE, we have μᵢ = (P_i1(1), P_i2(1))′ = (Φ(β₀ + β₁x_i1), Φ(β₀ + β₁x_i2))′. The numerical results for the bivariate probit model with the covariate x_ij discrete and continuous are presented in Table 6.1 and Table 6.2. The numerical results for the bivariate probit model with the marginal linear regressions z_ij = β₀ + β₁wᵢ + β₂x_ij for wᵢ and x_ij discrete are presented in Table 6.3. The results for wᵢ and x_ij continuous are quite similar, so they are not presented. In Table 6.4, we also present a case with the marginal linear regressions z_ij = β₀ + β₁x_ij for the situation where the true parameter is ρ = 0.5. From these tables, three clear conclusions emerge: i) the specification of the "working" correlation has an effect on the estimation efficiency of GEE, with a major loss of efficiency when the specified "working" correlation is far from the true correlation; in fact, when the working correlation parameter is far away from the true correlation parameter (particularly with the wrong sign of the correlation), the GEE estimator performs poorly, and in some cases the efficiency can be as low as 50%; ii) MLEs are always more efficient than GEE estimates, but GEE is slightly more efficient than the estimate from IFM when the "working" correlation is correctly specified; iii) the observations in i) and ii) are consistent for sample sizes from large to moderate.
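The probit marginal means above involve only the univariate standard normal cdf Φ, which in C can be obtained from the C99 erf function. The following minimal helper, with illustrative names, computes μᵢ for a d-variate probit margin with common β₀, β₁; it is a sketch, not the thesis's production code.

#include <math.h>

/* Standard normal cdf via the C99 error function. */
double Phi(double z)
{
    return 0.5 * (1.0 + erf(z / sqrt(2.0)));
}

/* Marginal mean vector mu_i of a d-variate probit model with common
   regression parameters: mu_ij = Phi(beta0 + beta1 * x_ij). */
void probit_means(const double *x, int d, double beta0, double beta1,
                  double *mu)
{
    for (int j = 0; j < d; j++)
        mu[j] = Phi(beta0 + beta1 * x[j]);
}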
Table 6.1: GEE assessment: d = 2, β₀ = 0.5, β₁ = 1, x_ij discrete, ρ = 0.9, N = 1000

                          n = 1000                         n = 200
                  mean (√MSE)     r₁     r₂        mean (√MSE)     r₁     r₂
MLE         β₀    0.501 (0.0525)                   0.504 (0.1191)
            β₁    1.003 (0.0699)                   1.011 (0.1553)
IFME        β₀    0.502 (0.0570)                   0.507 (0.1293)
            β₁    1.001 (0.0788)                   1.007 (0.1792)
η_g = c     β₀    0.501 (0.0531)  0.989  1.074     0.505 (0.1198)  0.994  1.079
            β₁    1.003 (0.0707)  0.989  1.116     1.010 (0.1563)  0.993  1.146
η_g = 0.9   β₀    0.500 (0.0566)  0.929  1.009     0.501 (0.1280)  0.930  1.011
            β₁    1.005 (0.0776)  0.901  1.016     1.014 (0.1713)  0.907  1.046
η_g = 0.5   β₀    0.501 (0.0532)  0.987  1.072     0.504 (0.1199)  0.993  1.079
            β₁    1.003 (0.0707)  0.988  1.115     1.010 (0.1561)  0.995  1.148
η_g = 0     β₀    0.502 (0.0570)  0.921  1.0       0.507 (0.1293)  0.921  1.0
            β₁    1.001 (0.0788)  0.887  1.0       1.007 (0.1791)  0.867  1.0
η_g = −0.5  β₀    0.503 (0.0687)  0.765  0.831     0.508 (0.1575)  0.756  0.821
            β₁    1.001 (0.1017)  0.687  0.775     1.008 (0.2381)  0.652  0.752
η_g = −0.9  β₀    0.503 (0.0818)  0.642  0.698     0.506 (0.1906)  0.625  0.678
            β₁    1.000 (0.1262)  0.554  0.625     1.011 (0.3029)  0.513  0.591

Table 6.2: GEE assessment: d = 2, β₀ = 0.5, β₁ = 1, x_ij continuous, ρ = 0.9, N = 1000

                          n = 1000
                  mean (√MSE)     r₁     r₂
MLE         β₀    0.501 (0.0416)
            β₁    1.000 (0.0651)
IFME        β₀    0.501 (0.0427)
            β₁    1.001 (0.0730)
η_g = c     β₀    0.501 (0.0416)  1.0    1.027
            β₁    1.000 (0.0653)  0.997  1.117
η_g = 0.9   β₀    0.502 (0.0496)  0.839  0.861
            β₁    0.998 (0.0766)  0.854  0.957
η_g = 0.5   β₀    0.501 (0.0421)  0.990  1.016
            β₁    0.999 (0.0675)  0.964  1.081
η_g = 0     β₀    0.501 (0.0427)  0.974  1.0
            β₁    1.001 (0.0730)  0.892  1.0
η_g = −0.5  β₀    0.500 (0.0463)  0.899  0.923
            β₁    1.004 (0.0952)  0.685  0.767
η_g = −0.9  β₀    0.499 (0.0512)  0.813  0.835
            β₁    1.007 (0.1209)  0.539  0.604

Table 6.3: GEE assessment: d = 2, β₀ = −0.5, β₁ = 0.5, β₂ = 1, wᵢ, x_ij discrete, ρ = 0.9, N = 1000

                          n = 1000                          n = 200
                  mean (√MSE)      r₁     r₂        mean (√MSE)      r₁     r₂
MLE         β₀   −0.496 (0.0647)                   −0.498 (0.1445)
            β₁    0.497 (0.0782)                    0.499 (0.1793)
            β₂    0.996 (0.0601)                    1.005 (0.1289)
IFME        β₀   −0.496 (0.0676)                   −0.497 (0.1527)
            β₁    0.497 (0.0788)                    0.498 (0.1806)
            β₂    0.997 (0.0664)                    1.004 (0.1488)
η_g = c     β₀   −0.495 (0.0652)  0.993  1.038     −0.495 (0.1447)  0.999  1.056
            β₁    0.498 (0.0783)  0.998  1.005      0.498 (0.1786)  1.003  1.012
            β₂    0.995 (0.0605)  0.994  1.098      1.001 (0.1294)  0.996  1.150
η_g = 0.9   β₀   −0.493 (0.0693)  0.934  0.977     −0.493 (0.1522)  0.950  1.003
            β₁    0.498 (0.0827)  0.946  0.953      0.503 (0.1905)  0.941  0.949
            β₂    0.992 (0.0676)  0.889  0.982      0.999 (0.1429)  0.903  1.042
η_g = 0.5   β₀   −0.495 (0.0655)  0.988  1.033     −0.495 (0.1451)  0.996  1.052
            β₁    0.497 (0.0786)  0.994  1.001      0.497 (0.1797)  0.998  1.005
            β₂    0.994 (0.0605)  0.994  1.097      1.001 (0.1294)  0.996  1.150
η_g = 0     β₀   −0.497 (0.0677)  0.957  1.0       −0.496 (0.1527)  0.946  1.0
            β₁    0.497 (0.0788)  0.993  1.0        0.498 (0.1806)  0.992  1.0
            β₂    0.997 (0.0664)  0.905  1.0        1.004 (0.1488)  0.866  1.0
η_g = −0.5  β₀   −0.499 (0.0768)  0.843  0.881     −0.499 (0.1762)  0.820  0.867
            β₁    0.498 (0.0789)  0.991  0.998      0.498 (0.1816)  0.987  0.995
            β₂    1.000 (0.0857)  0.701  0.774      1.008 (0.1977)  0.652  0.753
η_g = −0.9  β₀   −0.500 (0.0880)  0.736  0.769     −0.502 (0.2039)  0.709  0.749
            β₁    0.498 (0.0790)  0.989  0.997      0.498 (0.1823)  0.983  0.991
            β₂    1.003 (0.1067)  0.563  0.622      1.012 (0.2486)  0.519  0.599

Table 6.4: GEE assessment: d = 2, β₀ = 0.5, β₁ = 1, x_ij discrete, ρ = 0.5, N = 1000

                          n = 1000                         n = 200
                  mean (√MSE)     r₁     r₂        mean (√MSE)    r₁     r₂
MLE         β₀    0.502 (0.053)                    0.504 (0.119)
            β₁    1.002 (0.075)                    1.011 (0.155)
IFME        β₀    0.502 (0.055)                    0.507 (0.129)
            β₁    1.001 (0.077)                    1.007 (0.179)
η_g = c     β₀    0.502 (0.053)   0.999  1.022     0.505 (0.120)  0.994  1.079
            β₁    1.002 (0.075)   0.994  1.028     1.010 (0.156)  0.993  1.146
η_g = 0.9   β₀    0.500 (0.060)   0.893  0.914     0.500 (0.135)  0.885  0.845
            β₁    1.005 (0.088)   0.849  0.878     1.017 (0.194)  0.898  0.875
η_g = 0.5   β₀    0.501 (0.054)   0.983  1.005     0.503 (0.122)  0.982  0.973
            β₁    1.003 (0.077)   0.967  1.001     1.011 (0.168)  0.997  1.008
η_g = 0     β₀    0.502 (0.055)   0.977  1.0       0.506 (0.121)  0.986  1.0
            β₁    1.001 (0.077)   0.966  1.0       1.007 (0.169)  0.965  1.0
η_g = −0.5  β₀    0.504 (0.063)   0.850  0.870     0.508 (0.139)  0.860  0.872
            β₁    0.999 (0.092)   0.807  0.835     1.005 (0.207)  0.788  0.817
η_g = −0.9  β₀    0.504 (0.074)   0.725  0.742     0.508 (0.164)  0.728  0.738
            β₁    0.998 (0.112)   0.664  0.688     1.006 (0.257)  0.635  0.658
Trivariate probit model

We first study the trivariate probit model with no covariates. We have P(111) = Φ₃(z, z, z; ρ₁₂, ρ₁₃, ρ₂₃), where z is the common cut-off point for all three margins. In GEE, we have μᵢ = (P₁(1), P₂(1), P₃(1))′ = (Φ(z), Φ(z), Φ(z))′. The numerical results are presented in Table 6.5 to Table 6.8. The numerical results show that the specification of the correlation of the response variables in these simple situations has little effect on the parameter estimates from GEE. GEE is efficient in all cases.

Table 6.5: GEE assessment: d = 3, z = 0.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000

             n = 1000                    n = 200                     n = 100
             mean (√MSE)    r₁     r₂    mean (√MSE)    r₁     r₂    mean (√MSE)    r₁     r₂
MLE          0.499 (0.0363)              0.503 (0.0839)              0.506 (0.1228)
IFME         0.498 (0.0366)              0.503 (0.0841)              0.506 (0.1233)
η_g = c      0.498 (0.0366) 0.992  1.0   0.503 (0.0841) 0.997  1.0   0.506 (0.1233) 0.996  1.0
η_g = 0      0.498 (0.0366) 0.992  1.0   0.503 (0.0841) 0.997  1.0   0.506 (0.1233) 0.996  1.0
η_g = −0.4   0.498 (0.0366) 0.992  1.0   0.503 (0.0841) 0.997  1.0   0.506 (0.1233) 0.996  1.0

Table 6.6: GEE assessment: d = 3, z = 1.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000

             n = 1000                    n = 200
             mean (√MSE)    r₁     r₂    mean (√MSE)    r₁     r₂
MLE          1.500 (0.0525)              1.510 (0.1254)
IFME         1.500 (0.0525)              1.510 (0.1256)
η_g = c      1.500 (0.0525) 0.999  1.0   1.510 (0.1256) 0.998  1.0
η_g = 0      1.500 (0.0525) 0.999  1.0   1.510 (0.1256) 0.998  1.0
η_g = −0.4   1.500 (0.0525) 0.999  1.0   1.510 (0.1256) 0.998  1.0

For the trivariate probit model with one covariate, we have P(111) = Φ₃(β₀ + β₁x₁, β₀ + β₁x₂, β₀ + β₁x₃; ρ₁₂, ρ₁₃, ρ₂₃), where β₀, β₁ are the common marginal regression parameters. In GEE, we have μᵢ = (P_i1(1), P_i2(1), P_i3(1))′ = (Φ(β₀ + β₁x_i1), Φ(β₀ + β₁x_i2), Φ(β₀ + β₁x_i3))′. The numerical results for the trivariate probit model with a covariate are presented in Table 6.9 and Table 6.10; we studied models with a discrete covariate. Now the specification of the "working" correlation matrix has some effect on the estimation efficiency of GEE, with a major loss of efficiency when the specified "working" correlation matrix is far from the true correlation matrix. We also notice that GEE is slightly more efficient than the estimate from IFM when the "working" correlation matrix is correctly specified.

Table 6.7: GEE assessment: d = 3, z = 0.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000

             n = 1000                      n = 200                       n = 100
             mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂
MLE          0.499 (0.0355)                0.503 (0.0822)                0.506 (0.1192)
IFME         0.498 (0.0357)                0.503 (0.0822)                0.505 (0.1198)
η_g = c      0.498 (0.0357) 0.996  1.0     0.503 (0.0821) 1.0    1.0     0.506 (0.1192) 1.0    1.0
η_g = 0      0.498 (0.0357) 0.994  1.0     0.503 (0.0822) 1.0    1.0     0.505 (0.1198) 0.995  1.0
η_g = −0.9   0.498 (0.0362) 0.982  0.988   0.503 (0.0832) 0.989  0.988   0.505 (0.1219) 0.978  0.983
Table 6.8: GEE assessment: d = 3, z = 1.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000

             n = 1000                      n = 200
             mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂
MLE          1.499 (0.0515)                1.508 (0.1213)
IFME         1.500 (0.0516)                1.509 (0.1220)
η_g = c      1.500 (0.0515) 1.0    1.0     1.509 (0.1213) 1.0    1.0
η_g = 0      1.500 (0.0517) 0.997  1.0     1.509 (0.1220) 0.994  1.0
η_g = −0.9   1.500 (0.0527) 0.978  0.981   1.510 (0.1247) 0.973  0.979

Table 6.9: GEE assessment: d = 3, β₀ = 0.5, β₁ = 1, x_ij discrete, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000

                     n = 1000                      n = 200                       n = 100
              mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂
MLE       β₀  0.501 (0.0462)                0.499 (0.1057)                0.504 (0.1515)
          β₁  0.998 (0.0582)                1.005 (0.1338)                1.015 (0.1915)
IFME      β₀  0.502 (0.0502)                0.502 (0.1148)                0.509 (0.1680)
          β₁  0.996 (0.0677)                1.001 (0.1588)                1.008 (0.2260)
η_g = c   β₀  0.501 (0.0463) 0.997  1.084   0.500 (0.1066) 0.991  1.077   0.505 (0.1528) 0.992  1.100
          β₁  0.998 (0.0588) 0.991  1.153   1.004 (0.1369) 0.977  1.160   1.012 (0.1939) 0.987  1.165
η_g = 0   β₀  0.502 (0.0502) 0.920  1.0     0.502 (0.1148) 0.920  1.0     0.509 (0.1680) 0.902  1.0
          β₁  0.996 (0.0677) 0.860  1.0     1.001 (0.1588) 0.843  1.0     1.008 (0.2261) 0.847  1.0
η_g = −0.4 β₀ 0.504 (0.0765) 0.604  0.657   0.503 (0.1764) 0.599  0.651   0.503 (0.2673) 0.567  0.628
          β₁  0.994 (0.1219) 0.478  0.556   1.001 (0.2886) 0.464  0.551   1.022 (0.4419) 0.433  0.512

From the tables, we also notice that GEE and IFME give the same parameter estimates when η_g = 0 and in the exchangeable situations; in the following subsection, we prove that this is always true. We particularly notice that GEE behaves very similarly to MLE and IFM, both in terms of the marginal regression parameter estimates and their MSEs, when the working correlation matrix is chosen to be I.

4-variate probit model

The 4-variate probit model is considered for the situation with no covariates. We have P(1111) = Φ₄(z, z, z, z; ρ₁₂, ρ₁₃, ρ₁₄, ρ₂₃, ρ₂₄, ρ₃₄), where z is the common cut-off point for all four margins. In GEE, we have μᵢ = (P₁(1), P₂(1), P₃(1), P₄(1))′ = (Φ(z), Φ(z), Φ(z), Φ(z))′. The numerical results are presented in Table 6.11 and Table 6.12. They show that the specification of the correlation of the response variables in these simple situations has little effect on the parameter estimates from GEE. The GEE approach is efficient in all cases. Our simulation results also indicate that, for estimation purposes, the estimating equations based on an independence working correlation structure behave quite well.
Table 6.10: GEE assessment: d = 3, β₀ = 0.5, β₁ = 1, x_ij discrete, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000

                     n = 1000                      n = 200                       n = 100
              mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂
MLE       β₀  0.501 (0.0455)                0.500 (0.1042)                0.505 (0.1517)
          β₁  0.999 (0.0575)                1.004 (0.1330)                1.011 (0.1879)
IFME      β₀  0.502 (0.0494)                0.502 (0.1123)                0.509 (0.1665)
          β₁  0.997 (0.0666)                1.001 (0.1560)                1.006 (0.2195)
η_g = c   β₀  0.501 (0.0455) 1.0    1.087   0.500 (0.1048) 0.994  1.071   0.507 (0.1530) 0.992  1.088
          β₁  0.999 (0.0580) 0.992  1.149   1.003 (0.1355) 0.981  1.151   1.007 (0.1899) 0.990  1.156
η_g = 0   β₀  0.502 (0.0494) 0.921  1.0     0.502 (0.1123) 0.928  1.0     0.509 (0.1665) 0.911  1.0
          β₁  0.997 (0.0666) 0.864  1.0     1.001 (0.1560) 0.852  1.0     1.006 (0.2195) 0.856  1.0
η_g = −0.4 β₀ 0.504 (0.0688) 0.661  0.718   0.502 (0.1584) 0.658  0.709   0.505 (0.2367) 0.641  0.703
          β₁  0.995 (0.1076) 0.535  0.619   1.004 (0.2573) 0.517  0.606   1.020 (0.3789) 0.496  0.580

Table 6.11: GEE assessment: d = 4, z = 0.5, latent exchangeable, ρ = 0.9, "working" exchangeable, N = 1000

             n = 1000                    n = 200
             mean (√MSE)    r₁     r₂    mean (√MSE)    r₁     r₂
MLE          0.500 (0.0365)              0.503 (0.0820)
IFME         0.500 (0.0366)              0.503 (0.0829)
η_g = c      0.500 (0.0366) 0.997  1.0   0.503 (0.0829) 0.989  1.0
η_g = 0      0.500 (0.0366) 0.997  1.0   0.503 (0.0829) 0.989  1.0
η_g = −0.3   0.500 (0.0366) 0.997  1.0   0.503 (0.0829) 0.989  1.0

Table 6.12: GEE assessment: d = 4, z = 0.5, latent AR(1), ρ = 0.9, "working" AR(1), N = 1000

             n = 1000                      n = 200
             mean (√MSE)    r₁     r₂      mean (√MSE)    r₁     r₂
MLE          0.500 (0.0349)                0.503 (0.0781)
IFME         0.500 (0.0350)                0.503 (0.0791)
η_g = c      0.500 (0.0350) 0.997  1.001   0.503 (0.0787) 0.992  1.005
η_g = 0      0.500 (0.0350) 0.997  1.0     0.502 (0.0791) 0.987  1.0
η_g = −0.9   0.500 (0.0354) 0.985  0.989   0.502 (0.0803) 0.972  0.985

IFME or GEE

We have seen in the preceding subsection that GEE performs better than IFME when the response correlation matrix is correctly specified, but that IFME performs better than GEE in general. We now look at some situations where GEE and IFME are equivalent.

Result 6.1 For a multivariate probit model with common cut-off points across margins, GEE with Rᵢ(α) = R, where R has exchangeable structure, is equivalent to IFM.

Proof. For a multivariate probit model with a common cut-off point z across margins, IFM leads to the estimating equation Σᵢ₌₁ⁿ Σⱼ₌₁ᵈ (y_ij − Φ(z)) = 0. This is equivalent to the GEE with Rᵢ(α) = R, where R has an exchangeable structure. □

Result 6.2 For a multivariate probit model with covariates, GEE with Rᵢ(α) = I, where I is the identity matrix, is equivalent to IFM.

Proof. Assume μ = (p₁, ..., p_d)′ with p_ij = Φ(β′x_ij). For a multivariate probit model with common regression parameters across margins, IFM leads to the estimating equations

    Σᵢ₌₁ⁿ Σⱼ₌₁ᵈ (∂p_ij/∂β) (y_ij − p_ij) / {p_ij(1 − p_ij)} = 0.

This is equivalent to the GEE with Rᵢ(α) = I, where I is the identity matrix. □
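As an illustration of Result 6.2, the following C sketch computes the score contribution of one d-variate binary observation to the GEE with identity working correlation, for probit margins with common (β₀, β₁); with Rᵢ = I the weight matrix is diagonal, so the contribution is exactly the sum of the d univariate probit quasi-score terms. The function names are illustrative, erf and M_PI are assumed available from math.h, and the caller is assumed to initialize score[] to zero.

#include <math.h>

static double dnorm(double z) { return exp(-0.5*z*z) / sqrt(2.0*M_PI); }
static double pnorm(double z) { return 0.5 * (1.0 + erf(z / sqrt(2.0))); }

/* Adds one subject's contribution to the GEE score with R_i = I. */
void indep_gee_score(const int *y, const double *x, int d,
                     double beta0, double beta1, double score[2])
{
    for (int j = 0; j < d; j++) {
        double z = beta0 + beta1 * x[j];
        double p = pnorm(z);
        double w = dnorm(z) / (p * (1.0 - p)); /* (dp/deta) / variance */
        score[0] += w * (y[j] - p);
        score[1] += w * x[j] * (y[j] - p);
    }
}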
In this chapter, we limit the study to GEE with regression parameters common across all margins, but GEE can also be extended to situations where the parameters differ from margin to margin. We introduce here a result on the equivalence of GEE and IFM in some special situations with parameters differing from margin to margin.

Result 6.3 For the multivariate probit model with parameters differing from margin to margin and with one margin-independent binary covariate, GEE with Rᵢ(α) = R is equivalent to IFM for the marginal regression parameters.

Proof. Assume x is the margin-independent binary covariate taking two values a and b. The marginal mean vector μᵢ = (p₁(α₁ + β₁xᵢ), ..., p_d(α_d + β_d xᵢ))′, i = 1, ..., n, takes two distinct vector values:

    μ_a = (p₁(α₁ + β₁a), ..., p_d(α_d + β_d a))′,
    μ_b = (p₁(α₁ + β₁b), ..., p_d(α_d + β_d b))′.

Assume there are n_a observations with x = a and n_b observations with x = b, and let I_a = {i : xᵢ = a} and I_b = {i : xᵢ = b}. Let α = (α₁, ..., α_d)′ and β = (β₁, ..., β_d)′. For i ∈ I_a, D_{i,α} = ∂μ_a/∂α′ does not depend on i, so D′_{i,α}V⁻¹_{i,a} is a d × d invertible matrix which does not depend on i; denote this matrix by A. For i ∈ I_a, we also have D′_{i,β}V⁻¹_{i,a} = aD′_{i,α}V⁻¹_{i,a} = aA. Similarly, for i ∈ I_b, D′_{i,α}V⁻¹_{i,b} does not depend on i; if we denote it by B, we also have D′_{i,β}V⁻¹_{i,b} = bB. The GEE for α and β are

    A Σ_{i∈I_a} (yᵢ − μ_a) + B Σ_{i∈I_b} (yᵢ − μ_b) = 0,
    aA Σ_{i∈I_a} (yᵢ − μ_a) + bB Σ_{i∈I_b} (yᵢ − μ_b) = 0,

which simplify to

    Σ_{i∈I_a} (yᵢ − μ_a) = 0,   Σ_{i∈I_b} (yᵢ − μ_b) = 0.    (6.12)

It is straightforward to see that (6.12) are also the estimating equations for α and β from the IFM approach. □

Poisson-lognormal model

Let ȳ = Σᵢ yᵢ/n be the sample mean, and s² = Σᵢ(yᵢ − ȳ)²/(n − 1) be the sample variance. Using the estimating equations (6.8) and (6.9), the estimates of ν and η based on the different specifications of the variance function are listed in Table 6.13 (with a = exp(ν + η²/2) and τ = exp(η²) − 1).

Table 6.13: Estimates of ν and η under different variance specifications

Spec. of Var(Yᵢ)    η̂                                     ν̂
a + a²τ             η̂₁ = {log[(s² − ȳ)/ȳ² + 1]}^(1/2)     log ȳ − 0.5 η̂₁²
aτ                  η̂₂ = {log[s²/ȳ + 1]}^(1/2)            log ȳ − 0.5 η̂₂²
a²τ                 η̂₃ = {log[s²/ȳ² + 1]}^(1/2)           log ȳ − 0.5 η̂₃²
a³τ                 η̂₄ = {log[s²/ȳ³ + 1]}^(1/2)           log ȳ − 0.5 η̂₄²

Tables 6.14 - 6.16 contain numerical results based on the simulation scheme outlined previously for the Poisson-lognormal model. From these results, we see that quasi-likelihood estimates may be fine when the variance function is correctly specified, but may be asymptotically inconsistent if the variance function specification is not correct. A similar problem occurs in the multivariate case. It is thus critical to assess the form of Var(Y) as a function of E(Y) before choosing GEE as the estimation method.
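The estimates in Table 6.13 are available in closed form, so their computation is immediate; the following C function is a small sketch (names are illustrative) returning (ν̂, η̂) from the sample mean and variance under each of the four variance specifications.

#include <math.h>

/* Moment-type estimates of (nu, eta) in the Poisson-lognormal model
   under the variance specifications of Table 6.13; spec = 1, 2, 3, 4
   corresponds to a + a^2*tau, a*tau, a^2*tau, a^3*tau. */
void pln_moment_est(double ybar, double s2, int spec,
                    double *nu_hat, double *eta_hat)
{
    double tau;  /* implied value of tau = exp(eta^2) - 1 */
    switch (spec) {
    case 1:  tau = (s2 - ybar) / (ybar * ybar); break;
    case 2:  tau = s2 / ybar;                   break;
    case 3:  tau = s2 / (ybar * ybar);          break;
    default: tau = s2 / (ybar * ybar * ybar);   break;
    }
    *eta_hat = sqrt(log(tau + 1.0));
    *nu_hat  = log(ybar) - 0.5 * (*eta_hat) * (*eta_hat);
}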
Table 6.14: GEE assessment: (ν, η) = (0.99995, 0.01), E(Y) = 2.718282, Var(Y) = 2.719, n = 1000, N = 500

            η̂ (√MSE)        r₁      ν̂ (√MSE)        r₁
MLE         0.038 (0.0595)          0.997 (0.0190)
a + a²τ     0.047 (0.0705)  0.844   0.996 (0.0190)  1.0
aτ          0.831 (0.8214)  0.072   0.653 (0.3472)  0.055
a²τ         0.559 (0.5491)  0.108   0.843 (0.1587)  0.120
a³τ         0.356 (0.3461)  0.172   0.935 (0.0675)  0.282

Table 6.15: GEE assessment: (ν, η) = (−0.1, 1.48324), E(Y) = 2.718282, Var(Y) = 62.02, n = 1000, N = 100

            η̂ (√MSE)        r₁      ν̂ (√MSE)         r₁
MLE         1.481 (0.0591)          −0.094 (0.0559)
a + a²τ     1.418 (0.1488)  0.397   −0.016 (0.1790)  0.313
aτ          1.724 (0.2743)  0.215   −0.497 (0.4361)  0.128
a²τ         1.436 (0.1347)  0.439   −0.042 (0.1621)  0.345
a³τ         1.126 (0.3767)  0.157    0.357 (0.4741)  0.118

Table 6.16: GEE assessment: α = 0.5, β = 0.5, η = 0.01, n = 1000, N = 500

            α̂ (√MSE)       r₁      β̂ (√MSE)       r₁      η̂ (√MSE)       r₁
MLE         0.492 (0.037)          0.502 (0.045)          0.063 (0.078)
a + a²τ     0.492 (0.038)  0.985   0.502 (0.045)  0.994   0.071 (0.084)  0.921
aτ          0.153 (0.349)  0.107   0.499 (0.049)  0.916   0.832 (0.822)  0.095
a²τ         0.262 (0.242)  0.154   0.580 (0.096)  0.469   0.624 (0.614)  0.127
a³τ         0.342 (0.164)  0.227   0.593 (0.107)  0.419   0.458 (0.448)  0.173

Appendix: Newton-Raphson method for GEE

We perform the model simulations and all MLE, IFM and GEE computations using programs in C written by the author. The code for the probit model incorporates the cases with and without covariates. For completeness, we provide here some mathematical details about the Newton-Raphson method that we used in the GEE estimation. To apply the Newton-Raphson method, we need to evaluate both the estimating functions and the derivatives of the estimating functions at arbitrary points of the parameter vector. When the same regression parameter vector is common to all margins, the marginal mean function vector is μᵢ = (p_i1(β), ..., p_id(β))′, where β = (β₀, β₁, ..., β_p)′. Assume the "working" correlation matrix in GEE for the ith subject is Rᵢ = (a_ijk), a d × d correlation matrix with unit diagonal. Then the estimating functions in GEE are

    g(β) = Σᵢ₌₁ⁿ Dᵢ′Vᵢ⁻¹(yᵢ − μᵢ),

where Dᵢ = ∂μᵢ/∂β′ and Vᵢ = Aᵢ^(1/2)RᵢAᵢ^(1/2), with Aᵢ = diag(σ²_i1, ..., σ²_id) and σ²_ij = p_ij(1 − p_ij). Writing the (j, k) entry of Rᵢ⁻¹ as a_i^jk, the estimating function corresponding to the mth regression parameter for the ith subject (i = 1, ..., n, m = 0, 1, ..., p) is

    g_im(β) = Σⱼ₌₁ᵈ Σₖ₌₁ᵈ (∂p_ij/∂β_m) a_i^jk (y_ik − p_ik)/(σ_ij σ_ik).

After a few lines of calculation, we then have

    ∂g_im/∂β_q = Σⱼ₌₁ᵈ Σₖ₌₁ᵈ (a_i^jk/(σ_ij σ_ik)) [ (∂²p_ij/∂β_m∂β_q)(y_ik − p_ik) − (∂p_ij/∂β_m)(∂p_ik/∂β_q)
        + (∂p_ij/∂β_m)(y_ik − p_ik){(2p_ij − 1)(∂p_ij/∂β_q)/(2σ²_ij) + (2p_ik − 1)(∂p_ik/∂β_q)/(2σ²_ik)} ],

where the terms involving 2p_ij − 1 arise because the variances σ²_ij = p_ij(1 − p_ij) depend on β.
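The following C sketch shows one Newton-Raphson step for solving g(β) = 0. The thesis code evaluates the analytic derivatives given above; purely for illustration, the Jacobian here is approximated by forward differences, and the linear system is solved by Gaussian elimination without pivoting. The macro P and all names are illustrative assumptions.

#define P 2  /* illustrative number of regression parameters */

/* One Newton-Raphson step: beta <- beta - J^{-1} g(beta), where g_fun
   fills g[] with the GEE estimating functions evaluated at beta. */
void newton_step(void (*g_fun)(const double *beta, double *g), double *beta)
{
    double g0[P], g1[P], J[P][P], b[P], d[P], h = 1e-6;

    g_fun(beta, g0);
    for (int q = 0; q < P; q++) {      /* forward-difference Jacobian */
        for (int m = 0; m < P; m++) b[m] = beta[m];
        b[q] += h;
        g_fun(b, g1);
        for (int m = 0; m < P; m++) J[m][q] = (g1[m] - g0[m]) / h;
    }
    for (int m = 0; m < P; m++) d[m] = g0[m];
    for (int k = 0; k < P; k++)        /* forward elimination */
        for (int m = k + 1; m < P; m++) {
            double f = J[m][k] / J[k][k];
            for (int c = k; c < P; c++) J[m][c] -= f * J[k][c];
            d[m] -= f * d[k];
        }
    for (int m = P - 1; m >= 0; m--) { /* back substitution */
        for (int c = m + 1; c < P; c++) d[m] -= J[m][c] * d[c];
        d[m] /= J[m][m];
    }
    for (int m = 0; m < P; m++) beta[m] -= d[m];
}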
6.4 A combination of the GEE and IFM estimation approaches

In Section 6.3, we observed that in some situations GEE provides slightly more efficient marginal regression parameter estimates than IFM when the correlations of the responses are correctly specified. Under the model assumptions, a natural specification of Rᵢ(α) is possible. If Rᵢ(α) can also be reasonably estimated, then GEE can be applied to obtain the marginal regression parameter estimates. This leads to a new approach for estimating the marginal regression parameters (for some models): i) use the IFM approach to estimate the model parameters, and thus obtain Rᵢ(α); ii) use GEE to re-estimate the marginal regression parameters. In the following, we provide a few numerical results to illustrate this new approach. To be more general, we study the situation where the regression parameters differ from margin to margin; GEE is extended to this situation. We compare the GEE marginal estimates (with Rᵢ(α) from the IFM estimation) to the IFM estimates. The comparison is carried out by simulation. We assume a multivariate probit model, Y_ij = I(Z_ij < β_j0 + β_j1 x_ij), as in Section 6.3. The simulation parameters are d = 3, 4, 5, β₀ = (0.7, 0, −0.7, 0, 0.5)′ and β₁ = (1, 1.5, 2, 0.5, −0.5)′, with the first 3 components of β₀, β₁ used for d = 3, the first 4 components for d = 4, and so on. The two covariate situations are: i) discrete, where x_ij = I(U < 0), U ~ uniform(−1, 1); ii) continuous, where x_ij ~ N(0, 1/4). The latent correlation matrix is an exchangeable correlation matrix with all correlations equal to ρ = 0.5. The number of observations is 1000 and the number of simulations for each scenario is 1000. Table 6.17 contains the ratio r = MSE(θ̂_ifm)/MSE(θ̂_gee−ifm) for a parameter θ, where θ̂_gee−ifm denotes the estimate of θ from the combined GEE and IFM approach. The calculation of MSE is defined in section 6.2. Table 6.17 shows that there is some gain of efficiency with the new approach, since r ≥ 1 for almost all parameters.

Table 6.17: A comparison of IFM to GEE with Rᵢ(α) given

                          margin 1        margin 2        margin 3        margin 4        margin 5
                         β̂_j0   β̂_j1    β̂_j0   β̂_j1    β̂_j0   β̂_j1    β̂_j0   β̂_j1    β̂_j0   β̂_j1
x_ij discrete    d = 3   1.036  1.044   1.018  1.028   1.019  1.038
                 d = 4   1.046  1.047   1.057  1.060   1.042  1.064   1.040  1.082
                 d = 5   1.063  1.061   1.045  1.046   1.032  1.063   1.058  1.094   1.062  1.111
x_ij continuous  d = 3   1.000  1.055   1.006  1.053   1.003  1.027
                 d = 4   0.999  1.087   1.005  1.078   1.011  1.064   0.999  1.104
                 d = 5   1.004  1.101   1.009  1.063   1.011  1.082   1.002  1.101   1.002  1.110

6.5 Summary

In this chapter, we discussed the drawbacks of GEE in a multivariate analysis framework and examined the efficiency of the GEE approach relative to a model-based likelihood approach. The purpose of such a study is to partially fill in what is lacking in the statistical literature. Our conclusion is that GEE is sensitive to the specification of the dependence (or correlation) structure; when the specification of the dependence is far from the correct one, there is a substantial loss of efficiency in GEE parameter estimation. The application of GEE to multivariate analysis (longitudinal studies and repeated measures) seems to have grown in relative importance in recent years, but the GEE method does have drawbacks, possible inefficiency, and some assumptions that may be too strong. One should be cautious in the use of GEE, particularly for count data, unless one has a way to assess the assumptions.

Chapter 7

Some further research topics

Many new ideas associated with the construction of multivariate non-normal models, and with estimation and data analysis in multivariate models, are advanced in this thesis. The IFM theory for dealing with multivariate models makes parameter estimation and inference in multivariate non-normal models possible in many situations.
More importantly, the research in this thesis may lead to further potentially fruitful avenues of research; there is much room for extension of the ideas in this thesis to general multivariate analysis. In this final chapter, we mention a variety of research topics of special interest directly related to this thesis work. These topics include:

1. Comparison of different models and inferences for short and long discrete time series. Long discrete time series situations may include: (a) n independent long time series Yᵢ (i = 1, 2, ..., n), where Yᵢ = (Y_i1, Y_i2, ..., Y_itᵢ) has length tᵢ; (b) m correlated time series from a single subject; (c) n independent subjects (i = 1, ..., n), with mᵢ repeated measures observed on each subject i over a long time period. General MCD and MMD models with the IFM inference approach may not be efficient for investigating either the marginal behaviour or the dependence structure of such long time series. Adaptation of MCD and MMD models to general random effects models, together with a relative of the IFM approach, can be used in these cases of long time series for each subject. Some applications would be modelling in environmental and health studies of a longitudinal or time series nature.

2. Models and inference for mixed multivariate responses (some continuous and some discrete variables). To analyze jointly multivariate discrete and continuous response data, appropriate multivariate models with desirable properties (see Chapter 1) are required as the foundation for inferences. The analysis of the dependence (or associations) between the discrete and continuous response variables would be an interesting and important part of the modelling and inference process. There are some obvious extensions of MCD and MMD models for mixed multivariate response variables. Other classes of models based on specified conditional distributions for mixed multivariate response variables may also be promising. The extension of the inference procedures based on IFM to mixed multivariate response variables is also possible. There is interesting potential to develop applications for real-life situations. Some recent references on this topic are Catalano and Ryan (1992) and Fitzmaurice and Laird (1995).

3. Models for multinomial categorical responses with covariates. When the polytomous response variables do not have an ordered marginal structure, the existence of an MCD model becomes hard to justify, since we are not able to justify the existence of latent continuous variables associated with the response variables. In the univariate situation, Cox (1970) proposed a model for an unordered polytomous response variable. When the response variable Y takes m distinct values y₁, y₂, ..., y_m, and there are p regressor variables x = (x₁, ..., x_p)′, a model for Y is

    P(Y = y_j | x) = exp(α_j + β_j′x) / Σ_{l=1}^m exp(α_l + β_l′x),   j = 1, ..., m,

where α₁ + β₁′x is assigned the value 0 for all x to make the parameters identifiable. Now suppose we have d correlated polytomous response variables (assume the dependence is well defined). Is there any suitable multivariate model for appropriately modelling the marginal behaviour as well as the multivariate dependence structure? What about the extension of MCD models? A small numerical sketch of the univariate model is given below.
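As a sketch of Cox's univariate model above, the following C function (with illustrative names) computes the m response probabilities from the linear predictors, with the first category as baseline so that η₁ = α₁ + β₁′x = 0.

#include <math.h>

/* P(Y = y_j | x) in the unordered polytomous model of Cox (1970):
   eta[j] = alpha_j + beta_j' x for j = 0, ..., m-1, with eta[0] = 0
   imposed for identifiability; prob[] receives the m probabilities. */
void multinomial_logit(const double *eta, double *prob, int m)
{
    double denom = 0.0;
    for (int j = 0; j < m; j++)
        denom += exp(eta[j]);
    for (int j = 0; j < m; j++)
        prob[j] = exp(eta[j]) / denom;
}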
4. Extension to multivariate compositional data. Sometimes, the analytical problems of interest to scientists produce data sets that consist essentially of relative proportions and are thus subject to nonnegativity and constant-sum constraints. These situations lead to compositional data. The Dirichlet distribution provides the parametric model of choice when analyzing such data, but the covariance structure associated with Dirichlet random vectors is well known to be limited to nonpositive correlations. Hence compositional data that exhibit positive correlations cannot be modelled with the Dirichlet. Aitchison (1986) developed classes of logistic normal models partly in response to this shortcoming. Unfortunately, Aitchison's logistic normal classes do not contain the Dirichlet distribution as a special case; as a result, they exhibit interesting dependence structures but are unable to model extreme independence. It is possible to relate compositional data modelling to the large family of multivariate copula models. The questions are: Can we have models which accommodate the complicated dependence or complicated independence structures (see Aitchison, 1986)? And what about appropriate estimation and inference procedures?

Other research topics include: (i) modelling of unequally spaced longitudinal data; (ii) modelling of multivariate data with spatial patterns; (iii) modelling of multivariate directional data; (iv) adaptation of MCD and MMD models and the IFM approach to missing data; and (v) further studies of families of copulas with given bivariate margins (such as the Molenberghs-Lesaffre construction).

The above topics may admit some natural extensions of this thesis work. The research approaches to be taken for the above topics may make use of copula models, latent variables, mixtures, stochastic processes, and point process modelling. Inference can be based on extensions of the IFM approach.

References

Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall, New York.
Aitchison, J. and Ho, C. H. (1989). The multivariate Poisson-log normal distribution. Biometrika, 76, 643-653.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd Inter. Symp. on Information Theory, Petrov, B. N. and Csaki, F. (eds.), Akademiai Kiado, Budapest, 267-281.
Akaike, H. (1977). On entropy maximization principle. In Application of Statistics, Krishnaiah (ed.), North-Holland, 27-41.
Al-Osh, M. A. and Aly, E. A. A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Commun. Statist. A, 21, 2483-2492.
Al-Osh, M. A. and Alzaid, A. A. (1987). First-order integer-valued autoregressive (INAR(1)) process. J. Time Series Anal., 8, 261-275.
Anderson, J. A. and Pemberton, J. D. (1985). The grouped continuous model for multivariate ordered categorical variables and covariate adjustment. Biometrics, 41, 875-885.
Ashford, J. R. and Sowden, R. R. (1970). Multivariate probit analysis. Biometrics, 26, 535-546.
Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies in Item Analysis and Prediction, H. Solomon (ed.). Stanford Mathematical Studies in the Social Sciences VI. Stanford, California: Stanford University Press.
Bonney, G. E. (1987). Logistic regression for dependent binary observations. Biometrics, 43, 951-973.
Bradley, R. A. and Gart, J. J. (1962). The asymptotic properties of ML estimators when sampling from associated populations.
Biometrika, 49, 205-213.
Catalano, P. J. and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and continuous outcomes. J. Amer. Statist. Assoc., 87, 651-658.
Chandrasekar, B. (1988). An optimality criterion for vector unbiased statistical estimation functions. J. Statist. Plann. Inference, 18, 115-117.
Chandrasekar, B. and Kale, B. K. (1984). Unbiased statistical estimation functions for the parameters in the presence of nuisance parameters. J. Statist. Plann. Inference, 9, 45-54.
Char, B. W., Geddes, K. O., Gonnet, G. H., Monagan, M. B. and Watt, S. M. (1992). Maple Reference Manual. Watcom, Waterloo, Canada.
Conaway, M. R. (1989). Analysis of repeated categorical measurements with conditional likelihood methods. J. Amer. Statist. Assoc., 84, 53-62.
Connolly, M. A. and Liang, K.-Y. (1988). Conditional logistic regression models for correlated binary data. Biometrika, 75, 501-506.
Consul, P. C. (1989). Generalized Poisson Distributions. Marcel Dekker, New York.
Cox, D. R. (1970). The Analysis of Binary Data. Methuen, London.
Cox, D. R. (1972). The analysis of multivariate binary data. Appl. Statist., 21, 113-120.
Cramer, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ.
Darlington, G. A. and Farewell, V. T. (1992). Binary longitudinal data analysis with correlation a function of explanatory variables. Biometrical J., 34, 899-910.
Davis, P. J. and Rabinowitz, P. (1984). Methods of Numerical Integration, second edition. Academic Press, Orlando.
Efron, B. and Stein, C. (1981). The jackknife estimate of variance. Ann. Statist., 9, 586-596.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, Philadelphia.
Fahrmeir, L. and Kaufmann, H. (1987). Regression models for non-stationary categorical time series. J. Time Series Anal., 8, 147-160.
Ferreira, P. E. (1982). Sequential estimation through estimating equations in the nuisance parameter case. Ann. Statist., 10, 167-173.
Fienberg, S. E., Bromet, E. J., Follman, D., Lambert, D. and May, S. M. (1985). Longitudinal analysis of categorical epidemiological data: a study of Three Mile Island. Environ. Health Perspectives, 63, 241-248.
Fisher, R. A. (1924). The conditions under which χ² measures the discrepancy between observation and hypothesis. J. Roy. Statist. Soc., 87, 442-450.
Fitzmaurice, G. M. and Laird, N. M. (1993). A likelihood-based method for analyzing longitudinal binary responses. Biometrika, 80, 141-151.
Fitzmaurice, G. M. and Laird, N. M. (1995). Regression models for a bivariate discrete and continuous outcome with clustering. J. Amer. Statist. Assoc., 90, 845-852.
Fletcher, R. (1970). A new approach to variable metric algorithms. Computer Journal, 13, 317-322.
Gardner, W. (1990). Analyzing sequential categorical data: individual variation in Markov chains. Psychometrika, 55, 263-275.
Genest, C. and MacKay, R. J. (1986). Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données. Canad. J. Statist., 14, 145-159.
Glonek, G. F. V. and McCullagh, P. (1995). Multivariate logistic models. J. R. Statist. Soc. B, 57, 533-546.
Godambe, V. P. (1960). An optimal property of regular maximum likelihood estimation. Ann. Math. Statist., 31, 1208-1211.
Godambe, V. P. (1976). Conditional likelihood and unconditional optimum estimating equations. Biometrika, 63, 277-284.
Godambe, V. P. (1991). Estimating Functions.
Oxford University Press, New York.
Goodman, L. A. and Kruskal, W. H. (1954). Measures of association for cross classifications. J. Amer. Statist. Assoc., 49, 732-764.
Hoadley, B. (1971). Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. Ann. Math. Statist., 42, 1977-1991.
Joe, H. (1993). Parametric family of multivariate distributions with given margins. J. Multivariate Anal., 46, 262-282.
Joe, H. (1994). Lecture notes, course given at Department of Mathematics and Statistics, University of Pittsburgh, Pittsburgh, USA.
Joe, H. (1994). Multivariate extreme-value distributions with applications to environmental data. Canad. J. Statist., 22, 47-64.
Joe, H. (1995). Approximations to multivariate normal rectangle probabilities based on conditional expectations. J. Amer. Statist. Assoc., 90, 957-964.
Joe, H. (1996). Multivariate Models and Dependence Concepts, draft book and Stat 521 course notes. Department of Statistics, University of British Columbia, Vancouver, Canada.
Joe, H. (1996a). Families of m-variate distributions with given margins and m(m − 1)/2 bivariate dependence parameters. In Distributions with Fixed Marginals, Doubly Stochastic Measures and Markov Operators, Sherwood, H. and Taylor, M. (eds.), IMS Lecture Notes - Monograph Series, Hayward, CA.
Joe, H. (1996b). Time series models with univariate margins in the convolution-closed infinitely divisible class. J. Appl. Probab., to appear.
Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible distributions. J. Multivariate Anal., 57, 240-265.
Jorgensen, B. and Labouriau, R. S. (1995). Exponential Families and Theoretical Inference, lecture notes. Department of Statistics, University of British Columbia, Vancouver, Canada.
Johnson, N. L. and Kotz, S. (1975). On some generalized Farlie-Gumbel-Morgenstern distributions. Communications in Statistics, 4, 415-427.
Johnson, N. L. and Kotz, S. (1977). On some generalized Farlie-Gumbel-Morgenstern distributions - II. Regression, correlation and further generalizations. Communications in Statistics, A, 6, 485-496.
Joseph, B. and Durairajan, T. M. (1991). Equivalence of various optimality criteria for estimating functions. J. Statist. Plann. Inference, 27, 355-360.
Kimeldorf, G. and Sampson, A. R. (1975). Uniform representations of bivariate distributions. Comm. Stat.-Theor. Meth., 4, 617-628.
Lesaffre, E. and Molenberghs, G. M. (1991). Multivariate probit analysis: a neglected procedure in medical statistics. Stat. in Medicine, 10, 1391-1403.
Kocherlakota, S. and Kocherlakota, K. (1992). Bivariate Discrete Distributions. Dekker, New York.
Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. Canad. J. Statist., 15, 209-225.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
Liang, K. Y., Zeger, S. L. and Qaqish, B. (1992). Multivariate regression analysis for categorical data. J. Roy. Statist. Soc. B, 54, 3-40.
Lipsitz, S. R., Dear, K. B. G. and Zhao, L. (1994). Jackknife estimators of variance for parameter estimates from estimating equations with applications to clustered survival data. Biometrics, 50, 842-846.
Mahamunulu, D. M. (1967). A note on regression in the multivariate Poisson distribution. J. Amer. Statist. Assoc., 62, 251-258.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, second edition. Chapman and Hall, London.
McKenzie, E. (1986). Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Adv. Appl. Probab., 18, 679-705.
McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab., 20, 822-835.
McLeish, D. L. and Small, C. G. (1988). The Theory and Applications of Statistical Inference Functions. Lecture Notes in Statistics 44, Springer-Verlag, New York.
Meester, S. G. and MacKay, J. (1994). A parametric model for cluster correlated categorical data. Biometrics, 50, 954-963.
Miller, R. G. (1974). The jackknife - a review. Biometrika, 61, 1-15.
Molenberghs, G. M. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. J. Amer. Statist. Assoc., 89, 633-644.
Morgenstern, D. (1956). Einfache Beispiele zweidimensionaler Verteilungen. Mitteilungsblatt für Mathematische Statistik, 8, 234-235.
Muenz, L. R. and Rubinstein, L. V. (1985). Markov models for covariate dependence of binary sequences. Biometrics, 41, 91-101.
Nash, J. C. (1990). Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, second edition. Hilger, New York.
Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. A, 135, 370-384.
Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J. Amer. Statist. Assoc., 81, 321-327.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-1048.
Petrov, V. V. (1995). Limit Theorems of Probability Theory. Clarendon Press, Oxford.
Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43, 353-360.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New York.
Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York.
Rousseeuw, P. J. and Molenberghs, G. (1994). The shape of correlation matrices. The American Statistician, 48, 276-279.
Sakamoto, Y., Ishiguro, M. and Kitagawa, G. (1986). Akaike Information Criterion Statistics. KTK Scientific Publishers, Tokyo.
Schervish, M. J. (1984). Multivariate normal probabilities with error bound. Appl. Statist., 33, 81-87.
Schweizer, B. and Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. Ann. Statist., 9, 879-885.
Seber, G. A. F. (1984). Multivariate Observations. Wiley, New York.
Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics. Chapman & Hall, New York.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8, 229-231.
Stein, G. Z., Zucchini, W. and Juritz, J. M. (1987). Parameter estimation for the Sichel distribution and its multivariate extension. J. Amer. Statist. Assoc., 82, 938-944.
Stram, D. O., Wei, L. J. and Ware, J. H. (1988). Analysis of repeated ordered categorical outcomes with possibly missing observations and time-dependent covariates. J. Amer. Statist. Assoc., 83, 631-637.
Teicher, H. (1954). On the multivariate Poisson distribution. Skandinavisk Aktuarietidskrift, 37, 1-9.
Thorburn, D. (1976). Some asymptotic properties of jackknife statistics. Biometrika, 63, 305-313.
Tong, Y. L.
(1990). The Multivariate Normal Distribution. Springer-Verlag, New York.
Tukey, J. W. (1958). Bias and confidence in not quite large samples. Abstract in Ann. Math. Statist., 29, 614.
Ware, J. H., Dockery, D. W., Spiro, A., Speizer, F. E. and Ferris, B. G., Jr. (1984). Passive smoking, gas cooking, and respiratory health of children living in six cities. American Review of Respiratory Disease, 129, 366-374.
Zeger, S. L. and Liang, K. Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121-130.
Zeger, S. L., Liang, K. Y. and Albert, P. S. (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics, 44, 1049-1060.
Zeger, S. L., Liang, K.-Y. and Self, S. G. (1985). The analysis of binary longitudinal data with time-independent covariates. Biometrika, 72, 31-38.
Zhao, L. P. and Prentice, R. L. (1990). Correlated binary regression using a quadratic exponential model. Biometrika, 77, 642-648.

Appendix A

Maple programs

This appendix contains a program written in Maple for Example 4.3 in Chapter 4.

# Bivariate margin probabilities Pr(Y_j = y_j, Y_k = y_k) of the
# trivariate probit model with zero cut-off points, as functions of
# the latent correlations r12, r13, r23, and their derivatives.
g12p11 := 1/4 + 1/(2*Pi)*arcsin(r12); g12p00 := g12p11;
g12p10 := 1/2 - g12p11; g12p01 := g12p10;
dg12p11 := diff(g12p11, r12); dg12p00 := dg12p11;
dg12p10 := diff(g12p10, r12); dg12p01 := dg12p10;
g13p11 := 1/4 + 1/(2*Pi)*arcsin(r13); g13p00 := g13p11;
g13p10 := 1/2 - g13p11; g13p01 := g13p10;
dg13p11 := diff(g13p11, r13); dg13p00 := dg13p11;
dg13p10 := diff(g13p10, r13); dg13p01 := dg13p10;
g23p11 := 1/4 + 1/(2*Pi)*arcsin(r23); g23p00 := g23p11;
g23p10 := 1/2 - g23p11; g23p01 := g23p10;
dg23p11 := diff(g23p11, r23); dg23p00 := dg23p11;
dg23p10 := diff(g23p10, r23); dg23p01 := dg23p10;
# Trivariate orthant probabilities.
gp111 := 1/8 + 1/(4*Pi)*(arcsin(r12) + arcsin(r13) + arcsin(r23));
gp110 := g12p11 - gp111; gp011 := g23p11 - gp111; gp101 := g13p11 - gp111;
gp001 := g23p01 - gp101; gp100 := g12p10 - gp101; gp010 := g12p01 - gp011;
gp000 := 1 - gp111 - gp110 - gp011 - gp101 - gp001 - gp100 - gp010;
# Fisher information entries for (r12, r13, r23) from the full
# trivariate likelihood.
I11 := 1/gp111*diff(gp111,r12)^2 + 1/gp110*diff(gp110,r12)^2
     + 1/gp101*diff(gp101,r12)^2 + 1/gp011*diff(gp011,r12)^2
     + 1/gp100*diff(gp100,r12)^2 + 1/gp001*diff(gp001,r12)^2
     + 1/gp010*diff(gp010,r12)^2 + 1/gp000*diff(gp000,r12)^2;
I22 := 1/gp111*diff(gp111,r13)^2 + 1/gp110*diff(gp110,r13)^2
     + 1/gp101*diff(gp101,r13)^2 + 1/gp011*diff(gp011,r13)^2
     + 1/gp100*diff(gp100,r13)^2 + 1/gp001*diff(gp001,r13)^2
     + 1/gp010*diff(gp010,r13)^2 + 1/gp000*diff(gp000,r13)^2;
I33 := 1/gp111*diff(gp111,r23)^2 + 1/gp110*diff(gp110,r23)^2
     + 1/gp101*diff(gp101,r23)^2 + 1/gp011*diff(gp011,r23)^2
     + 1/gp100*diff(gp100,r23)^2 + 1/gp001*diff(gp001,r23)^2
     + 1/gp010*diff(gp010,r23)^2 + 1/gp000*diff(gp000,r23)^2;
I12 := 1/gp111*diff(gp111,r12)*diff(gp111,r13) + 1/gp110*diff(gp110,r12)*diff(gp110,r13)
     + 1/gp101*diff(gp101,r12)*diff(gp101,r13) + 1/gp011*diff(gp011,r12)*diff(gp011,r13)
     + 1/gp100*diff(gp100,r12)*diff(gp100,r13) + 1/gp001*diff(gp001,r12)*diff(gp001,r13)
     + 1/gp010*diff(gp010,r12)*diff(gp010,r13) + 1/gp000*diff(gp000,r12)*diff(gp000,r13);
I13 := 1/gp111*diff(gp111,r12)*diff(gp111,r23) + 1/gp110*diff(gp110,r12)*diff(gp110,r23)
     + 1/gp101*diff(gp101,r12)*diff(gp101,r23) + 1/gp011*diff(gp011,r12)*diff(gp011,r23)
     + 1/gp100*diff(gp100,r12)*diff(gp100,r23) + 1/gp001*diff(gp001,r12)*diff(gp001,r23)
     + 1/gp010*diff(gp010,r12)*diff(gp010,r23) + 1/gp000*diff(gp000,r12)*diff(gp000,r23);
I23 := 1/gp111*diff(gp111,r13)*diff(gp111,r23) + 1/gp110*diff(gp110,r13)*diff(gp110,r23)
     + 1/gp101*diff(gp101,r13)*diff(gp101,r23) + 1/gp011*diff(gp011,r13)*diff(gp011,r23)
     + 1/gp100*diff(gp100,r13)*diff(gp100,r23) + 1/gp001*diff(gp001,r13)*diff(gp001,r23)
     + 1/gp010*diff(gp010,r13)*diff(gp010,r23) + 1/gp000*diff(gp000,r13)*diff(gp000,r23);
I11 := simplify(I11); I22 := simplify(I22); I33 := simplify(I33);
I12 := simplify(I12); I13 := simplify(I13); I23 := simplify(I23);
# Pieces of the Godambe information for the IFM estimating functions
# based on the bivariate margins.
E11 := 1/g12p11*dg12p11^2 + 1/g12p10*dg12p10^2 + 1/g12p01*dg12p01^2 + 1/g12p00*dg12p00^2;
E22 := 1/g13p11*dg13p11^2 + 1/g13p10*dg13p10^2 + 1/g13p01*dg13p01^2 + 1/g13p00*dg13p00^2;
E33 := 1/g23p11*dg23p11^2 + 1/g23p10*dg23p10^2 + 1/g23p01*dg23p01^2 + 1/g23p00*dg23p00^2;
E12 := gp111/(g12p11*g13p11)*dg12p11*dg13p11 + gp110/(g12p11*g13p10)*dg12p11*dg13p10
     + gp101/(g12p10*g13p11)*dg12p10*dg13p11 + gp100/(g12p10*g13p10)*dg12p10*dg13p10
     + gp011/(g12p01*g13p01)*dg12p01*dg13p01 + gp010/(g12p01*g13p00)*dg12p01*dg13p00
     + gp001/(g12p00*g13p01)*dg12p00*dg13p01 + gp000/(g12p00*g13p00)*dg12p00*dg13p00;
E13 := gp111/(g12p11*g23p11)*dg12p11*dg23p11 + gp110/(g12p11*g23p10)*dg12p11*dg23p10
     + gp101/(g12p10*g23p01)*dg12p10*dg23p01 + gp100/(g12p10*g23p00)*dg12p10*dg23p00
     + gp011/(g12p01*g23p11)*dg12p01*dg23p11 + gp010/(g12p01*g23p10)*dg12p01*dg23p10
     + gp001/(g12p00*g23p01)*dg12p00*dg23p01 + gp000/(g12p00*g23p00)*dg12p00*dg23p00;
E23 := gp111/(g13p11*g23p11)*dg13p11*dg23p11 + gp110/(g13p10*g23p10)*dg13p10*dg23p10
     + gp101/(g13p11*g23p01)*dg13p11*dg23p01 + gp100/(g13p10*g23p00)*dg13p10*dg23p00
     + gp011/(g13p01*g23p11)*dg13p01*dg23p11 + gp010/(g13p00*g23p10)*dg13p00*dg23p10
     + gp001/(g13p01*g23p01)*dg13p01*dg23p01 + gp000/(g13p00*g23p00)*dg13p00*dg23p00;
E11 := simplify(E11); E22 := simplify(E22); E33 := simplify(E33);
E12 := simplify(E12); E13 := simplify(E13); E23 := simplify(E23);
with(linalg);
# 'I' and 'D' are protected names in Maple, so the Fisher information
# matrix and the diagonal matrix of the E's are named FI and DD here.
FI := matrix(3,3, [I11,I12,I13, I12,I22,I23, I13,I23,I33]);
FIinv := evalm(FI^(-1));
M := matrix(3,3, [E11,E12,E13, E12,E22,E23, E13,E23,E33]);
Minv := evalm(M^(-1));
DD := matrix(3,3, [E11,0,0, 0,E22,0, 0,0,E33]);
DDinv := evalm(DD^(-1));
# Inverse Godambe information of the IFM estimating functions,
# compared with the inverse Fisher information.
Jinv := evalm(DDinv &* M &* DDinv);
map(simplify, evalm(Jinv - FIinv));
det(evalm(Jinv - FIinv));