Statistical modelling and inference for multivariate and longitudinal discrete response data (1996)

STATISTICAL MODELLING AND INFERENCE FOR MULTIVARIATE AND LONGITUDINAL DISCRETE RESPONSE DATA

by

James Jianmeng Xu

B.Sc., Sichuan University
M.Sc., Université Laval

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1996
© James Jianmeng Xu, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada

Abstract

This thesis presents research on modelling, statistical inference and computation for multivariate discrete data. I address the problem of how to systematically model multivariate discrete response data, including binary, ordinal categorical and count data, and how to carry out statistical inference and computation. To this end, I relate the multivariate models to similar univariate models already widely used in applications and to some multivariate models that hitherto were available but scattered in the literature, and I introduce new classes of models. The main contributions of this thesis to multivariate discrete data analysis are in several distinct directions. In the modelling of multivariate discrete data, we propose two new classifications of multivariate parametric discrete models: multivariate copula discrete (MCD) models and multivariate mixture discrete (MMD) models. Numerous new multivariate discrete models are introduced through these two classes, and several multivariate discrete models which have appeared in the literature are unified by them. With appropriate choices of copulas, these two classes of models allow the marginal parameters and dependence parameters to vary with covariates in a natural way. By using special dependence structures, the models can be used for longitudinal data with short time series or for repeated measures data. As a result, the scope of multivariate discrete data analysis is substantially broadened. In statistical inference and computation for multivariate models, we propose the inference functions of margins (IFM) approach, in which each inference function is a likelihood equation for some marginal distribution of a multivariate distribution. Examples where the approach applies are the multivariate logit model with copulas having certain closure properties, and the multivariate probit model for binary data. This general approach makes the estimation of parameters for the multivariate models computationally feasible. The corresponding asymptotic theory, the estimation of standard errors by the Godambe information matrix as well as the jackknife method, and the efficiency of the IFM approach relative to the full multivariate likelihood approach are studied.
Particular attention has been given to models with special dependence structure (e.g. the copula dependence structure is exchangeable or of AR(1) type, if applicable), and efficient parameter estimation schemes based on IFM (a weighting approach and a pool-marginal-likelihoods approach) are developed. We also give detailed assessments of the efficiency of the GEE approach for estimating regression parameters in multivariate models; this is lacking in the literature. Detailed analyses of existing data sets are provided to give concrete applications of the multivariate models and the statistical inference procedures in this thesis.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Basic Notation and Definitions
Acknowledgements

1 Introduction
1.1 Multivariate discrete response data
1.2 Review of literature and research motivation
1.3 Statistical modelling
1.4 Overview of thesis

2 Foundation: models, statistical inference and computation
2.1 Multivariate copulas and dependence measures
2.1.1 Multivariate distribution functions
2.1.2 Multivariate copulas and Fréchet bounds
2.1.3 Dependence measures
2.1.4 Examples of multivariate copulas
2.1.5 CUOM, CUOM(k), MUBE, PUBE and MPME concepts
2.2 Multivariate discrete models
2.2.1 Multivariate copula discrete models
2.2.2 Multivariate mixture discrete models
2.2.3 Examples of MCD and MMD models
2.2.4 Some properties of MCD and MMD models
2.3 Inference functions of margins
2.3.1 Approaches for fitting multivariate models
2.3.2 Inference functions for multiple parameters
2.3.3 Inference functions of margins
2.4 Parameter estimation with IFM and asymptotic results
2.4.1 Models with no covariates
2.4.2 Models with covariates
2.4.3 Asymptotic results for the models assuming a joint distribution for response vector and covariates
2.5 The jackknife approach for the variance of IFME
2.5.1 Jackknife approach for models with no covariates
2.5.2 Jackknife for a function of $\tilde\theta$
2.5.3 Jackknife approach for models with covariates
2.6 Estimation for models with parameters common to more than one margin
2.6.1 Weighting approach
2.6.2 The pool-marginal-likelihoods approach
2.6.3 Examples
2.7 Numerical methods for the model fitting
2.8 Summary

3 Modelling of multivariate discrete data
3.1 Multivariate copula discrete models for binary data
3.1.1 Multivariate logit model
3.1.2 Multivariate probit model
3.2 Comparison of models
3.3 Multivariate copula discrete models for count data
3.3.1 Multivariate Poisson model
3.3.2 Multivariate generalized Poisson model
3.3.3 Multivariate negative binomial model
3.3.4 Multivariate logarithmic series model
3.4 Multivariate copula discrete models for ordinal data
3.4.1 Multivariate logit model
3.4.2 Multivariate probit model
3.4.3 Multivariate binomial model
3.5 Multivariate mixture discrete models for binary data
3.5.1 Multivariate probit-normal model
3.5.2 Multivariate Bernoulli-Beta model
3.5.3 Multivariate logit-normal model
3.6 Multivariate mixture discrete models for count data
3.6.1 Multivariate Poisson-lognormal model
3.6.2 Multivariate Poisson-gamma model
3.6.3 Multivariate negative-binomial mixture model
3.6.4 Multivariate Poisson-inverse Gaussian model
3.7 Application to longitudinal and repeated measures data
3.8 Summary

4 The efficiency of IFM approach and the efficiency of jackknife variance estimate
4.1 The assessment of the efficiency of IFM approach
4.2 Analytical assessment of the efficiency
4.3 Efficiency assessment through simulation
4.4 IFM efficiency for models with special dependence structure
4.5 Jackknife variance estimate compared with Godambe information matrix
4.6 Summary

5 Modelling, data analysis and examples
5.1 Some issues on modelling
5.1.1 Data analysis cycle
5.1.2 Model selection
5.1.3 Diagnostic checking
5.1.4 Testing the dependence structure
5.2 Data analysis examples
5.2.1 Example with multivariate/longitudinal binary response data
5.2.2 Example with multivariate/longitudinal ordinal response data
5.2.3 Example with multivariate count response data
5.3 Summary

6 GEE methodology and its comparison with ML and IFM approaches
6.1 Generalized estimating equations
6.2 GEE in multivariate analysis
6.3 GEE compared with the ML and IFM approaches
6.4 A combination of GEE and IFM estimation approach
6.5 Summary

7 Some further research topics

Bibliography

A Maple programs

List of Tables

1.1 The structure of general multivariate discrete data
4.1 Efficiency assessment with MCD model for binary data: d = 3, z = (0, 0, 0)', N = 1000
4.2 Efficiency assessment with MCD model for binary data: d = 3, $\beta_0 = (0.7, 0.5, 0.3)'$, $\beta_1 = (0.5, 0.5, 0.5)'$, $x_{ij}$ discrete, N = 1000
4.3 Efficiency assessment with MCD model for binary data: d = 3, $\beta_0 = (0.7, 0.5, 0.3)'$, $\beta_1 = (0.5, 0.5, 0.5)'$, $x_{ij} = x_i$ continuous, N = 100
4.4 Efficiency assessment with MCD model for binary data: d = 4, $\alpha_{12} = \alpha_{13} = \alpha_{14} = \alpha_{23} = \alpha_{24} = \alpha_{34} = 1.3863$, N = 1000
4.5 Efficiency assessment with MCD model for binary data: d = 4, $\alpha_{12} = \alpha_{23} = \alpha_{34} = 2.1972$, $\alpha_{13} = \alpha_{24} = 1.5163$, $\alpha_{14} = 1.1309$, N = 1000
4.6 Efficiency assessment with MCD model for ordinal data: d = 3, $z(1) = (-0.5, -0.5, -0.5)'$, $z(2) = (0.5, 0.5, 0.5)'$, N = 1000
4.7 Efficiency assessment with MCD model for ordinal data: d = 3, $z(1) = (-0.5, 0, -0.5)'$, $z(2) = (0.5, 1, 0.5)'$, N = 1000
4.8 Efficiency assessment with MCD model for ordinal data: d = 4, $z(1) = (-0.5, -0.5, -0.5, -0.5)'$, $z(2) = (0.5, 0.5, 0.5, 0.5)'$, $\alpha_{12} = \alpha_{13} = \alpha_{14} = \alpha_{23} = \alpha_{24} = \alpha_{34} = 1.3863$, N = 100
4.9 Efficiency assessment with MCD model for ordinal data: d = 4, $z(1) = (-0.5, -0.5, -0.5, -0.5)'$, $z(2) = (0.5, 0.5, 0.5, 0.5)'$, $\alpha_{12} = \alpha_{23} = \alpha_{34} = 2.1972$, $\alpha_{13} = \alpha_{24} = 1.5163$, $\alpha_{14} = 1.1309$, N = 100
4.10 Efficiency assessment with MCD model for ordinal data: d = 4, $z(1) = (-0.5, 0, -0.5, 0)'$, $z(2) = (0.5, 1, 0.5, 1)'$, $\alpha_{12} = \alpha_{13} = \alpha_{14} = \alpha_{23} = \alpha_{24} = \alpha_{34} = 1.3863$, N = 100
4.11 Efficiency assessment with MCD model for ordinal data: d = 4, $z(1) = (-0.5, 0, -0.5, 0)'$, $z(2) = (0.5, 1, 0.5, 1)'$, $\alpha_{12} = \alpha_{23} = \alpha_{34} = 2.1972$, $\alpha_{13} = \alpha_{24} = 1.5163$, $\alpha_{14} = 1.1309$, N = 100
4.12 Efficiency assessment with MCD model for count data: d = 3, $\beta_0 = (1, 1, 1)'$ and $\beta_1 = (0.5, 0.5, 0.5)'$, $x_{ij}$ discrete, N = 1000
4.13 Efficiency assessment with MCD model for count data: d = 3, $(\beta_1, \beta_2, \beta_3) = (1.6094, 1.0986, 1.6094)$, N = 1000
4.14 Efficiency assessment with MCD model for count data: d = 4, $(\beta_1, \beta_2, \beta_3, \beta_4) = (1.6094, 1.0986, 1.6094, 1.6094)$, N = 1000
4.15 Efficiency assessment with MCD model for count data: d = 4, $(\beta_1, \beta_2, \beta_3, \beta_4) = (1.3863, 0.6931, 1.6094, 2.0794)$, N = 1000
4.16 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 3
4.17 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 4
4.18 Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 5
4.19 Efficiency assessment with special dependence structure: d = 3, $z = (0.5, 0.5, 0.5)'$
4.20 Efficiency assessment with special dependence structure: d = 3, $z = (0.5, 1.0, 1.5)'$
4.21 Efficiency assessment with special dependence structure: d = 3, $\alpha_0 = (0.5, 0.5, 0.5)'$, $\alpha_1 = (1, 1, 1)'$
4.22 Efficiency assessment with special dependence structure: d = 3, $\alpha_0 = (0.5, 0.5, 0.5)'$, $\alpha_1 = (1, 0.5, 1.5)'$
4.23 Efficiency assessment with special dependence structure: d = 4, $z = (0.5, 0.5, 0.5, 0.5)'$
4.24 Efficiency assessment with special dependence structure: d = 4, $z = (0.5, 0.8, 1.2, 1.5)'$
4.25 Efficiency assessment with special dependence structure: d = 4, $z = (0.5, 0.5, 0.5, 0.5)'$
4.26 Efficiency assessment with special dependence structure: d = 4, $z = (0.5, 0.8, 1.2, 1.5)'$
4.27 Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 1000
4.28 Comparison of estimates of standard error: (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 500
5.1 Six Cities Study: Percentages for binary variables
5.2 Six Cities Study: Frequencies of the response vector (Age 9, 10, 11, 12)
5.3 Six Cities Study: Pairwise log odds ratios for Age 9, 10, 11, 12
5.4 Six Cities Study: Estimates of marginal regression parameters for multivariate logit model
5.5 Six Cities Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula
5.6 Six Cities Study: Comparisons of AIC values and $X^2$ values from various submodels of models (1a) and (2)
5.7 Six Cities Study: Comparisons of AIC values from various models
5.8 Six Cities Study: Comparisons of $X^2$ values from various models
5.9 Six Cities Study: Estimates (SE) of dependence regression parameters from the submodel 1.md.g.wn of various models
5.10 Six Cities Study: Estimates of Pr(Y = y) from various submodels of model (1a)
5.11 Six Cities Study: Estimates of Pr(Y = y) from the submodel 1.md.g.wn of various models
5.12 Six Cities Study: Observed frequencies in comparison with estimates of Pr(Y = y|x) from various models, x = (City, Smoking9, Smoking10, Smoking11, Smoking12)
5.13 Six Cities Study: Estimates of Pr(Y = y|x) from the submodel 1.md.g.wn of various models, x = (City, Smoking9, Smoking10, Smoking11, Smoking12)
5.14 TMI Accident Study: Stress levels for 4 years following accident at TMI. Responses with non-zero frequencies
5.15 TMI Accident Study: Univariate marginal (and relative) frequencies
5.16 TMI Accident Study: Pairwise gamma measures for Year 1979, 1980, 1981, 1982
5.17 TMI Accident Study: Estimates of univariate marginal regression parameters for multivariate logit models
5.18 TMI Accident Study: Estimates of dependence regression parameters for multivariate logit model with multinormal copula
5.19 TMI Accident Study: Comparisons of AIC values and $X^2$ values from various submodels of models (1a) and (2)
5.20 TMI Accident Study: Comparisons of AIC values and $X^2$ values from the submodel 1.md.g.wn of various models
5.21 TMI Accident Study: Estimates (SE) of dependence regression parameters from the submodel 1.md.g.wn of various models
5.22 TMI Accident Study: Comparisons of $X^2$ values from the submodels 1.md.g.wn and 1.md.a.wc of model (1a)
5.23 TMI Accident Study: Estimates of Pr(Y = y) and frequencies from the submodels 1.md.g.wn and 1.md.a.wc of model (1a)
5.24 TMI Accident Study: Estimates of Pr(Y = y|x) and frequencies from the submodels 1.md.g.wn and 1.md.a.wc of model (1a)
5.25 Bacteria Counts: Bacterial counts by 3 samplers in 50 sterile locations
5.26 Bacteria Counts: Univariate marginal frequencies
5.27 Bacteria Counts: Pairwise gamma measures for samplers 1, 2, 3
5.28 Bacteria Counts: Moment estimates of means, variances, correlations and other summary statistics of responses
5.29 Bacteria Counts: Estimates of marginal parameters for multivariate Poisson model
5.30 Bacteria Counts: Estimates of dependence regression parameters for multivariate Poisson model with multinormal copula
5.31 Bacteria Counts: Comparisons of AIC values from various submodels of multivariate Poisson model with multinormal copula
5.32 Bacteria Counts: Estimates of marginal parameters from multivariate Poisson-lognormal model
5.33 Bacteria Counts: Estimates of dependence parameters from multivariate Poisson-lognormal model
5.34 Bacteria Counts: Comparisons of AIC values from various submodels of multivariate Poisson-lognormal model
5.35 Bacteria Counts: Estimates of means, variances and correlations of responses from the submodel md.g of multivariate Poisson-lognormal model
6.1 GEE assessment: d = 2, $\beta_0 = 0.5$, $\beta_1 = 1$, $x_{ij}$ discrete, $\rho = 0.9$, N = 1000
6.2 GEE assessment: d = 2, $\beta_0 = 0.5$, $\beta_1 = 1$, $x_{ij}$ continuous, $\rho = 0.9$, N = 1000
6.3 GEE assessment: d = 2, $\beta_0 = -0.5$, $\beta_1 = 0.5$, $\beta_2 = 1$, $w_i$, $x_{ij}$ discrete, $\rho = 0.9$, N = 1000
6.4 GEE assessment: d = 2, $\beta_0 = 0.5$, $\beta_1 = 1$, $x_{ij}$ discrete, $\rho = 0.5$, N = 1000
6.5 GEE assessment: d = 3, z = 0.5, latent exchangeable, $\rho = 0.9$, "working" exchangeable, N = 1000
6.6 GEE assessment: d = 3, z = 1.5, latent exchangeable, $\rho = 0.9$, "working" exchangeable, N = 1000
6.7 GEE assessment: d = 3, z = 0.5, latent AR(1), $\rho = 0.9$, "working" AR(1), N = 1000
6.8 GEE assessment: d = 3, z = 1.5, latent AR(1), $\rho = 0.9$, "working" AR(1), N = 1000
6.9 GEE assessment: d = 3, $\beta_0 = 0.5$, $\beta_1 = 1$, $x_{ij}$ discrete, latent exchangeable, $\rho = 0.9$, "working" exchangeable, N = 1000
6.10 GEE assessment: d = 3, $\beta_0 = 0.5$, $\beta_1 = 1$, $x_{ij}$ discrete, latent AR(1), $\rho = 0.9$, "working" AR(1), N = 1000
6.11 GEE assessment: d = 4, z = 0.5, latent exchangeable, $\rho = 0.9$, "working" exchangeable, N = 1000
6.12 GEE assessment: d = 4, z = 0.5, latent AR(1), $\rho = 0.9$, "working" AR(1), N = 1000
6.13 Estimates of $\nu$ and $\eta$ under different variance specifications
6.14 GEE assessment: $(\nu, \eta) = (0.99995, 0.01)$, E(Y) = 2.718282, Var(Y) = 2.719, n = 1000, N = 500
6.15 GEE assessment: $(\nu, \eta) = (-0.1, 1.48324)$, E(Y) = 2.718282, Var(Y) = 62.02, n = 1000, N = 100
6.16 GEE assessment: $\alpha = 0.5$, $\beta = 0.5$, $\eta = 0.01$, n = 1000, N = 500
6.17 A comparison of IFM to GEE with $R_i(\alpha)$ given

List of Figures

4.1 Trivariate probit, exchangeable: The efficiency of $\hat\rho$ from the margin (1,2) (or (1,3), (2,3))
4.2 Trivariate probit, AR(1): The weight $u_1$ (or $u_3$) versus $\rho$ (solid line) and the weight $u_2$ versus $\rho$ (dashed line)
4.3 Trivariate probit, AR(1): The efficiency of $\hat\rho_p$
4.4 Trivariate probit, AR(1): (a) The efficiency of $\hat\rho$ from the margins (1,2) or (2,3). (b) The efficiency of $\hat\rho$ from the margin (1,3)
4.5 Trivariate Morgenstern-binary model: Relative efficiency of IFM approach versus IFS approach
4.6 Trivariate Morgenstern-binary model: (a) Ordered relative efficiency values of IFM approach versus IFS approach; (b) A histogram of the efficiency values $r_g$
4.7 Trivariate Morgenstern-binary model: Relative efficiency of IFM approach versus IFS approach when $u_1 = u_2 = u_3$ and $\theta_{12} = \theta_{13} = \theta_{23}$
4.8 Trivariate Morgenstern-binary model: Relative efficiency of IFM approach versus IFS approach when $u_1 = u_2 = u_3$, $\theta_{12} = \theta_{23} = \theta$ and $\theta_{13} = \theta^2$
4.9 Trivariate normal-binary model: Relative efficiency of IFM approach versus IFS approach
4.10 Trivariate normal-binary model: (a) Ordered relative efficiency of IFM approach versus IFS approach; (b) A histogram of the efficiency values $r_g$
4.11 Trivariate normal-binary model: Relative efficiency of IFM approach versus IFS approach when $u_1 = u_2 = u_3$ and $\rho_{12} = \rho_{13} = \rho_{23}$
5.1 Bacteria Counts: Residuals from the submodel md.g of model (1)
5.2 Bacteria Counts: Residuals from the submodel md.g of trivariate Poisson-lognormal model

Basic Notation and Definitions

The following notation and definitions are used throughout the thesis.

1. cdf stands for cumulative distribution function; pdf stands for probability density function; pmf stands for probability mass function; Pr stands for probability of.

2. rv stands for random variable or random vector, depending on the context; iid stands for independent and identically distributed.

3. BVN and MVN are the abbreviations for bivariate normal and multivariate normal respectively.

4. CUOM is the abbreviation for closure under taking of margins. MUBE is the abbreviation for model univariate and bivariate expressible. PUBE is the abbreviation for parameter univariate and bivariate expressible. MPME is the abbreviation for model parameters marginally expressible. (Definitions given in Section 2.1.)

5. ML and MLE are the abbreviations for maximum likelihood and maximum likelihood estimates or estimation. An MLE of $\theta$ will usually be denoted by $\hat\theta$.

6. IFM and IFME are the abbreviations for inference functions of margins and inference functions of margins estimates or estimation. An IFME of $\theta$ will usually be denoted by $\tilde\theta$. IFS is the abbreviation for inference functions of scores.

7. MCD and MMD are the abbreviations for multivariate copula discrete and multivariate mixture discrete.

8. The symbol "•" indicates the end of a definition, the statement of assumptions, a proof, a result, or an example.

9. For a vector or matrix, the transpose is indicated with a superscript of T or ', depending on convenience in the context.

10. All vectors are column vectors; hence transposed vectors such as X', x' (or $X^T$, $x^T$) are row vectors.
11. $R^k = \{x : x = (x_1, \ldots, x_k)', -\infty < x_j < \infty \text{ for } j = 1, \ldots, k\}$ denotes the k-dimensional Euclidean space.

12. d is used for the dimension of the multivariate response vector or multivariate distribution.

13. The boldfaced Roman upper case letter $\mathbf{Y} = (Y_1, \ldots, Y_d)'$, usually with subscripts, is used to denote a response random vector, and $\mathbf{y}$ is used for the observed value of this response vector. A vector of explanatory variables or covariates is usually denoted by $\mathbf{x}$ or $\mathbf{w}$.

14. Boldfaced Roman upper case letters $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ and so on, usually with subscripts, are used for (random) vectors; boldfaced Roman lower case letters $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ and so on are used for the observed vector values.

15. Roman upper case letters X, Y, Z and so on, usually with subscripts, are used for random variables; Roman lower case letters x, y, z and so on are used for the observed values.

16. Greek boldfaced lower case letters, often with subscripts, are used for a collection of parameters of families of distributions, e.g. $\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\theta}, \boldsymbol{\delta}$. They are in vector format. Greek lower case letters, often with subscripts, are used for parameters of families of distributions, e.g. $\alpha, \beta, \theta, \delta$.

17. Greek upper case letters $\Theta$, $\Sigma$ are used for a set of parameters (often dependence parameters) in a multivariate family; they are mostly in matrix format.

18. $\Omega$ is the symbol for the parameter space, usually $\Omega \subseteq R^k$ for some k.

19. Script upper case letters $\mathcal{F}$ and $\mathcal{G}$ are used for classes of functions or distribution families.

20. F, G, H are the symbols for a (multivariate) cdf.

21. For a d-variate cdf F, the set of its marginal distributions is denoted as $\{F_S : S \in \mathcal{S}_d\}$, where $\mathcal{S}_d$ is the set of non-empty subsets of $\{1, \ldots, d\}$. For a specific S, the subscript is written without braces, e.g., $F_1, \ldots, F_d, F_{12}, F_{123}$, etc.

22. We define the pdf or pmf of $\mathbf{Y}$ at $\mathbf{y} = (y_1, \ldots, y_d)$ as $P_{12\cdots d}(y_1 \cdots y_d)$ or simply $P(y_1 \cdots y_d)$, with the corresponding jth marginal $P_j(y_j)$, the bivariate (j, k) marginal $P_{jk}(y_j y_k)$, and so on. We also write $P(y_1 \cdots y_d; \theta)$ to denote that the pdf or pmf of $\mathbf{Y}$ depends on a parameter (or parameter vector) $\theta$.

23. The frequency of observing a particular outcome $(y_1, \ldots, y_d)'$ in a data set is denoted by $n_{12\cdots d}(y_1 \cdots y_d)$ or simply $n(y_1 \cdots y_d)$. The frequency corresponding to the jth marginal outcome $y_j$ is $n_j(y_j)$, and that corresponding to the (j, k) bivariate marginal outcome $y_j$ and $y_k$ is $n_{jk}(y_j y_k)$, and so on.

24. $\sum_{\{y_j\}}$ means the summation over all possible different values of $y_j$. $\sum_{\{x_1 \le y_1, \ldots, x_d \le y_d\}}$ means the summation over all possible different vector values of $\mathbf{x} = (x_1, \ldots, x_d)'$ which satisfy $\{x_1 \le y_1, \ldots, x_d \le y_d\}$. $\sum_{\{y_1 \cdots y_d\} \setminus \{y_j\}}$ means the summation over all possible different vector values $\mathbf{y} = (y_1, \ldots, y_d)'$ with the jth component absent, and so on.

25. U(a, b) denotes the uniform distribution on the interval [a, b]. $N(\mu, \sigma^2)$ denotes the univariate normal with mean $\mu$ and variance $\sigma^2$. $N_d(\boldsymbol{\mu}, \Sigma)$ denotes the d-variate normal with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$; $\Phi_d(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$ (or $\Phi_d(\mathbf{x})$) denotes the corresponding cdf, and $\phi_d(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$ (or $\phi_d(\mathbf{x})$) the pdf.

26. The partial derivative of a scalar function $\psi(\theta)$, $\partial\psi(\theta)/\partial\theta$, is the $q \times 1$ vector $(\partial\psi(\theta)/\partial\theta_1, \ldots, \partial\psi(\theta)/\partial\theta_q)'$, where $\theta_1, \ldots, \theta_q$ are the components of the vector $\theta$.

27. The partial derivative of a vector function $\Psi = (\psi_1(\theta), \ldots, \psi_r(\theta))'$, $\partial\Psi/\partial\theta'$, is the $r \times q$ matrix
$$\begin{pmatrix} \partial\psi_1/\partial\theta_1 & \cdots & \partial\psi_1/\partial\theta_q \\ \vdots & & \vdots \\ \partial\psi_r/\partial\theta_1 & \cdots & \partial\psi_r/\partial\theta_q \end{pmatrix},$$
where $\theta_1, \ldots, \theta_q$ are the components of the vector $\theta$.
Acknowledgements

I wish to record my warmest thanks to my thesis supervisor, Professor Harry Joe, for his continuous encouragement and for numerous suggestions and discussions during the development of this thesis. I acknowledge, with gratitude, the work of Harry Joe on multivariate copulas and dependence, and especially his invaluable book draft on "Multivariate Models and Dependence Concepts"; this has had a significant impact on the development of this thesis, and served as a foundation for this thesis. There are also many computer programming techniques and computations involved in this work where Harry Joe's tremendous experience was a crucial help. I would like to thank Professors John Petkau and Michael Schulzer for serving on my supervisory committee. In addition, I thank Professor John Petkau for helpful discussions, his valuable comments on the thesis, as well as his encouragement and support. Also, many thanks go to Professors Nancy Heckman, James Zidek and Ruben Zamar for their encouragement and support. Special thanks to Rinaldo Artes, Billy Ching and John Smith for their encouragement and support. I would also like to thank my fellow students, friends, professors and staff members in the Department of Statistics at UBC for providing a pleasant and stimulating research environment. Finally, I would like to thank Harry Joe for his financial support through an NSERC grant. The financial support from the Department of Statistics is acknowledged with great appreciation.

Chapter 1

Introduction

This chapter starts by discussing the structure of the multivariate data for which we are going to build appropriate multivariate models. We motivate our thesis research through reviewing and criticizing the relevant literature on the modelling of multivariate discrete data. This chapter is organized in the following way. In section 1.1, we introduce the multivariate data structure for which we are going to develop multivariate models. In this section, we discuss in detail multivariate binary, multivariate ordinal categorical and multivariate count data. The models developed in this thesis are general in nature, but the illustrative examples will be mainly based on the aforementioned three types of multivariate discrete data. In section 1.2, we briefly summarize and criticize the relevant statistical literature on the modelling of data of the types described in section 1.1, point out the inadequacies thereof, and thus motivate our thesis research. In section 1.3, we outline some desirable features of multivariate models and briefly discuss some of my understanding of statistical modelling. Section 1.4 provides an overview of the thesis.

1.1 Multivariate discrete response data

The data structure

Many data sets consist of discrete variables. Familiar examples of such variables are religion, nationality, level of education, degree of disability, attitude to a social issue, and the number of job changes for an individual during a certain period of time. These variables are categorical or count; they may be unordered (religion, nationality) or ordered (degree of disability, attitude to a social issue). In real life, what is more complicated is that often the discrete data are multivariate, and the multiple measurements may be interdependent in some way. The dependence may be general or special. The multivariate data structure can be further complicated by having missing data, random covariates and so on. In this thesis, we shall concentrate mainly on multivariate discrete response data, with or without covariates. The general multivariate discrete data set of interest is given in Table 1.1.

Table 1.1: The structure of general multivariate discrete data

    d-variate response              margin-indep. cova.             margin-dep. cova.
    $y_{11} \cdots y_{1d}$          $x_{11} \cdots x_{1p}$          $z_{111} \cdots z_{11p_1}, \ldots, z_{1d1} \cdots z_{1dp_d}$
    $\vdots$                        $\vdots$                        $\vdots$
    $y_{i1} \cdots y_{id}$          $x_{i1} \cdots x_{ip}$          $z_{i11} \cdots z_{i1p_1}, \ldots, z_{id1} \cdots z_{idp_d}$
    $\vdots$                        $\vdots$                        $\vdots$
    $y_{n1} \cdots y_{nd}$          $x_{n1} \cdots x_{np}$          $z_{n11} \cdots z_{n1p_1}, \ldots, z_{nd1} \cdots z_{ndp_d}$
The data structure in Table 1.1 consists of basically three parts: d-dimensional discrete response observations $\mathbf{y}_i = (y_{i1}, \ldots, y_{id})'$; a margin-independent covariate vector of p components $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})'$, that is, a covariate vector which is constant across margins; and d margin-dependent (or margin-specific) covariate vectors $\mathbf{z}_{i1}, \ldots, \mathbf{z}_{id}$, where $\mathbf{z}_{ij} = (z_{ij1}, \ldots, z_{ijp_j})'$ is a vector of $p_j$ components for the jth margin, $j = 1, \ldots, d$, $i = 1, \ldots, n$. In longitudinal or repeated measures settings, the margins might be defined by successive points in time. In these situations, we can call the margin-independent covariates time-independent, that is, constant across times, and the margin-dependent covariates time-dependent. The response vector $\mathbf{y}_i$ can be measures on d variates with general or special dependence structure, such as multiple measures from a human, a litter of animals, a piece of equipment, a geographical location, or any other unit for which the observations are a collection of related measures. The measures can be spatial or temporal. One way to make inferences from such a data structure is through a multivariate parametric model. (Nonparametric multivariate inference requires much more data than parametric multivariate inference.) The development and analysis of suitable models for the multivariate data in Table 1.1 are the main objectives of this thesis.
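To make the layout of Table 1.1 concrete, here is a minimal sketch, in Python, of how such a data set might be held in code. The dimensions, array names and simulated values are purely illustrative assumptions, not values from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, p = 100, 4, 2      # units, margins (e.g. time points), margin-indep. covariates
    p_j = [1, 2, 1, 1]       # length of the margin-dependent covariate vector per margin

    # d-variate discrete responses: row i holds (y_i1, ..., y_id)
    y = rng.integers(0, 2, size=(n, d))

    # margin-independent covariates x_i = (x_i1, ..., x_ip)', constant across margins
    x = rng.normal(size=(n, p))

    # margin-dependent covariates z_ij = (z_ij1, ..., z_ijp_j)', one block per margin j
    z = [rng.normal(size=(n, p_j[j])) for j in range(d)]

    print(y.shape, x.shape, [zj.shape for zj in z])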
Some typical multivariate discrete data

Binary data. Binary data arise when measurements can have only one of two values. Conventionally these are represented by 0 and 1, with 1 usually representing the occurrence of an event and 0 representing non-occurrence. For example, the reaction of a living organism to some material, often observed as presence or absence of the reaction (usually called quantal response), is binary. Alive throughout a specified period or died during the period, won or lost, success or failure in a specified task, gender, agree or disagree, are all examples of sources of binary data. Multivariate binary data are frequent in statistical applications. Whenever multivariate data are coded, for each dimension, as one of two mutually exclusive categories, the data are multivariate binary. In more complicated situations, covariates can be included when one is considering binary response data. An example of multivariate binary data is the Six Cities Study case analyzed in subsection 5.2.1.

Ordinal categorical data. An ordinal variable is one that has a natural ordering of its possible values, but for which the distances between the values are undefined, such as a four-category Likert scale. Ordinal categorical (or ordered categorical) response data, often accompanied by a set of covariates, arise frequently in a wide variety of experimental studies, such as in bioassay, epidemiology, econometrics and medicine. For example, in medicine it may be possible to classify a patient as, say, severely, moderately or mildly ill, when a more exact measurement of the severity of the disease is not possible; the covariates may be age, gender and so on. With ordinal variables, the categories are known to have an order, but knowledge of the scale is insufficient to consider them as forming a metric. We may treat the ordinal categories simply as nominal categories, which are unordered categorical measures, but by doing so the valuable information of order is lost. So consideration of the order is important for optimal information extraction. For an ordinal variable, it is often reasonable to assume that the ordered categories correspond to non-overlapping and exhaustive intervals of the real line. Multivariate ordinal data are frequent in applications. Whenever the multivariate response variables are each ordinal categorical, the data are multivariate ordinal categorical. More complicated situations include covariates for each of the response variables. A case of multivariate ordinal data from the Three Mile Island (TMI) nuclear power plant accident study can be found in subsection 5.2.2.

Count data. Data in the form of counts appear regularly in life. In the simplest case, the number of occurrences of some phenomenon on each unit is counted. Because no explanatory variable (e.g. time, treatment) distinguishes among these observed events, they can be aggregated as single numbers, the counts. Examples of count data are the counts of pest eggs on plant leaves, the counts of bacteria in different kinds of bacteria colonies, the number of organic cells with a fixed number of chromosome interchanges produced by X-ray irradiation, etc. Consul (1989) discussed many count data examples in a variety of situations, including home injuries and strikes in industries. Other examples include the number of units of different commodities purchased by consumers over a period of time, the number of times authors are cited over a number of years, spatial patterns of plants, the number of television commercials, or the number of speakers in a meeting. Multivariate count data are also frequent in applications. Whenever the multivariate response variables are each counts in nature, the data are multivariate count. More complicated situations also include covariates for the response variables. An example of multivariate count data can be found in subsection 5.2.3.

1.2 Review of literature and research motivation

For the data types we have seen in section 1.1, one of the questions is how to build a model or a probability distribution as an approximation to the stochastic phenomenon of multivariate nature and, based on the available data, to estimate the distribution and make inferences or predictions. For this purpose, the construction of an appropriate probability distribution or statistical model in accordance with the available data generated by the stochastic phenomenon is essential. Models for univariate discrete data have been studied extensively. The well-known generalized linear models for a univariate variable are such examples (McCullagh and Nelder 1989; Nelder and Wedderburn 1972). However, general studies on multivariate models for the type of data outlined in Table 1.1 are lacking in the statistical literature. One difficulty with the analysis of nonnormal multivariate data (including continuous and discrete data) has been the lack of rich classes of models such as the multivariate Gaussian.
Some isolated studies on the modelling of a particular data set, or under a particular multivariate setting, for the type of data in Table 1.1 have appeared in the literature. These studies can be classified in general as being based either on a completely specified probability model or on a partially specified probability model. We overview some of them here, and point out their drawbacks or weaknesses.

Completely specified probability models

Exponential family: Following Cox (1972), the probability distribution for a binary random vector Y can be represented as a saturated log-linear model
$$P(\mathbf{y}) = \exp\Big(u_0 + \sum_{j=1}^d u_j y_j + \sum_{j<k} u_{jk} y_j y_k + \cdots + u_{12\cdots d}\, y_1 y_2 \cdots y_d\Big) \qquad (1.1)$$
where $u_0$ is a normalizing constant. The $2^d - 1$ parameters $u_1, \ldots, u_d, \ldots, u_{12}, u_{13}, \ldots, u_{(d-1)d}, \ldots, u_{12\cdots d}$ vary independently from $-\infty$ to $\infty$. Expressions similar to (1.1) can also be found in Zhao and Prentice (1990), Liang et al. (1992) and Fitzmaurice and Laird (1993). The representation (1.1) is not closed under taking of margins (see Section 2.1 for a definition). In fact, if we write $P(y_1 y_2) = \exp(u_0^* + \sum_{j=1}^2 u_j^* y_j + u_{12}^* y_1 y_2)$, then $u_0^*$, $u_j^*$ and $u_{12}^*$ must depend on all the parameters $u_0, u_j, u_{jk}, \ldots, u_{12\cdots d}$. This fact makes the interpretation of the parameters $u_j, u_{jk}, \ldots, u_{12\cdots d}$ very difficult, and it is not clear how covariates could be included. For the general form, there are too many parameters.

Bahadur representation: Bahadur (1961) gave a representation of the distribution for a binary random vector Y, in terms of the moments:
$$P(\mathbf{y}) = \prod_{j=1}^d P_j(1)^{y_j} P_j(0)^{1-y_j} \Big[1 + \sum_{j<k} \rho_{jk} e_j e_k + \sum_{j<k<l} \rho_{jkl} e_j e_k e_l + \cdots + \rho_{12\cdots d}\, e_1 e_2 \cdots e_d\Big] \qquad (1.2)$$
where $e_j = (y_j - P_j(1))/\sqrt{P_j(1) P_j(0)}$, and $\rho_{jk} = E(e_j e_k)$, ..., $\rho_{12\cdots d} = E(e_1 e_2 \cdots e_d)$. This representation has the closure property under taking of margins, but the parameterization may not be desirable since the $\rho_{jk}$'s and the parameters of higher order are constrained by the marginal probabilities (see Prentice 1988), and the extension to include covariates may be difficult. For the general form, there are also too many parameters for the model to be useful.

Multivariate Poisson convolution-closed model: Teicher (1954) and Mahamunulu (1967) discussed a class of multivariate Poisson convolution-closed models. For example, a trivariate Poisson convolution-closed model has the stochastic representation
$$(Y_1, Y_2, Y_3) \stackrel{d}{=} (X_1 + X_{12} + X_{13} + X_{123},\; X_2 + X_{12} + X_{23} + X_{123},\; X_3 + X_{13} + X_{23} + X_{123}) \qquad (1.3)$$
where $X_1, X_2, X_3, X_{12}, X_{13}, X_{23}, X_{123}$ are independent Poisson rv's with parameters $\lambda_1, \lambda_2, \lambda_3, \lambda_{12}, \lambda_{13}, \lambda_{23}, \lambda_{123}$ respectively. These models may be suitable for counts in overlapping regions or time periods if the Poisson process is a reasonable model of the underlying count process. The model has a closure property under taking of margins, but it is not "model univariate-bivariate expressible" (see Chapter 2 and Example 2.5 for further explanation of this expression), and it can only accommodate multivariate count data with a limited type of dependence range (positive dependence).
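The stochastic representation (1.3) is easy to simulate, and simulation also makes the restriction to positive dependence visible: the shared components give $\mathrm{Cov}(Y_1, Y_2) = \lambda_{12} + \lambda_{123} \ge 0$. The following is a minimal simulation sketch; the rate values are illustrative assumptions, not values from the thesis.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    # illustrative rates for the independent Poisson components
    lam = {"1": 1.0, "2": 0.5, "3": 0.8, "12": 0.6, "13": 0.4, "23": 0.3, "123": 0.2}
    X = {k: rng.poisson(v, size=n) for k, v in lam.items()}

    # stochastic representation (1.3): shared components induce positive dependence
    Y1 = X["1"] + X["12"] + X["13"] + X["123"]
    Y2 = X["2"] + X["12"] + X["23"] + X["123"]
    Y3 = X["3"] + X["13"] + X["23"] + X["123"]

    # Cov(Y1, Y2) should be close to lam12 + lam123 = 0.8
    print(np.cov(Y1, Y2)[0, 1], lam["12"] + lam["123"])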
Exchangeable mixture models: Prentice (1986) gives an expression, (1.4), for the joint distribution of a binary random vector Y in terms of a marginal probability p, a dependence parameter $\gamma$, and $m = y_1 + \cdots + y_d$, where $0 < p < 1$ and $\gamma > -(d-1)^{-1}\min\{p, 1-p\}$. The model (1.4) is an extension of a beta-Bernoulli model derived from the mixture model $P(\mathbf{y}) = \int_0^1 p^{y_+} (1-p)^{d-y_+} g(p)\, dp$, where $y_+ = \sum_{j=1}^d y_j$ and $g(p)$ is the density of a Beta($\alpha$, $\beta$) distribution. This model implies equicorrelation of Y with correlation parameter $(1 + \gamma^{-1})^{-1}$. The representation (1.4) has the closure property under taking of margins, but it is limited to equicorrelation of the response variables. Joe (1996) has discussions of the range of negative dependence in this family. Discussions of extensions to incorporate covariates appeared in Prentice (1986) and Connolly and Liang (1988).

Multivariate probit model: A d-variate probit model for binary data is
$$Y_j = I(Z_j \le z_j), \quad j = 1, \ldots, d, \qquad (1.5)$$
where $I(\cdot)$ is the indicator function, $\mathbf{Z} = (Z_1, \ldots, Z_d)' \sim N(\mathbf{0}, \Theta)$, and $\Theta = (\theta_{jk})$ is a correlation matrix. The $z_j$'s are often referred to as cut-off points. Ashford and Sowden (1970) used the bivariate probit model for binary data to describe a coal miner's status of development of breathlessness (present or absent) and wheeze (present or absent) as a function of the miner's age. Anderson and Pemberton (1985) used a trivariate probit model for the analysis of an ornithological data set on three aspects of the colouring of blackbirds. A general introduction to the multivariate probit model is Lesaffre and Molenberghs (1991). The multivariate probit model has many nice properties, such as closure under taking of margins, model univariate-bivariate expressibility, and a wide range of dependence. MLE is considered, but it is computationally more difficult as d increases. New approaches to estimation and inference are explored in this thesis.
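Because the multivariate probit model has the latent variable representation (1.5), data from it can be simulated by thresholding a multivariate normal vector. The sketch below uses assumed cut-off points and an assumed latent correlation matrix, chosen only for illustration; it also checks that each univariate margin satisfies $\Pr(Y_j = 1) = \Phi(z_j)$.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    n, d = 100_000, 3
    z_cut = np.array([0.0, 0.5, -0.5])           # assumed cut-off points z_j
    Theta = np.array([[1.0, 0.6, 0.3],           # assumed latent correlation matrix
                      [0.6, 1.0, 0.6],
                      [0.3, 0.6, 1.0]])

    # latent Z ~ N(0, Theta); observed Y_j = I(Z_j <= z_j) as in (1.5)
    Z = rng.multivariate_normal(np.zeros(d), Theta, size=n)
    Y = (Z <= z_cut).astype(int)

    # each univariate margin is a probit: Pr(Y_j = 1) = Phi(z_j)
    print(Y.mean(axis=0))
    print(norm.cdf(z_cut))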
Multivariate Poisson-lognormal model: Aitchison and Ho (1989) studied a model for a count random vector Y, with
$$P(\mathbf{y}) = \int_{\mathbb{R}_+^d} \prod_{j=1}^d f_j(y_j \mid \lambda_j)\, g(\boldsymbol{\lambda})\, d\boldsymbol{\lambda}, \qquad (1.6)$$
where $f_j(y_j \mid \lambda_j)$ is a Poisson pmf with parameter $\lambda_j$ and $g(\boldsymbol{\lambda})$ is the density of a multivariate lognormal distribution. This model also has many nice properties, such as closure under taking of margins, model univariate-bivariate expressibility, and a wide range of dependence. Again, the MLE is computationally difficult.

Molenberghs-Lesaffre model: A model that may be suitable for binary and ordinal data is studied in Molenberghs and Lesaffre (1994). This model can accommodate general dependence structure from the Molenberghs-Lesaffre construction (Joe 1996) with bivariate copulas, such as in Joe (1993). The multivariate objects in the Molenberghs-Lesaffre construction have not been proved to be proper multivariate copulas, but they can be used for the parameters that lead to positive orthant probabilities for the resulting probabilities for the multivariate binary vector.

Other miscellaneous models (some for time series or longitudinal data):

- Kocherlakota and Kocherlakota (1992) provide a good summary of bivariate discrete distributions (including bivariate Poisson, bivariate negative binomial, etc.).

- Markov chains of first order for binary data, with $\Pr(Y_{j+1} = 1 \mid Y_j = 0) = p_{j,j+1}(01)$ and $\Pr(Y_{j+1} = 1 \mid Y_j = 1) = p_{j,j+1}(11)$. These can be generalized to higher order Markov chains. Some combinations of $p_{j,j+1}(01)$, $p_{j,j+1}(11)$ and $\Pr(Y_{j+1} = 1)$ could be replaced by logistic functions (but not all three) to incorporate covariates. Examples are in Darlington and Farewell (1992), Muenz and Rubinstein (1985), Zeger, Liang and Self (1985) and Gardner (1990).

- Poisson AR(1) time series, as in Al-Osh and Alzaid (1987) and McKenzie (1988). The bivariate Poisson margin (for consecutive $Y_t$'s) from this Poisson AR(1) time series is the same as a bivariate margin of (1.3).

- Negative binomial AR(1) time series, as in McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996b). The model of Al-Osh and Aly has a range of serial correlation depending on the parameters of the negative binomial distribution (and hence is not very flexible).

- When the binary or count variables are observed sequentially in time, one could use a model consisting of a product of a sequence of logit models for binary data (logit of $Y_t$ given $Y_1, \ldots, Y_{t-1}$, x) or of Poisson models for counts (Poisson of $Y_t$ given $Y_1, \ldots, Y_{t-1}$, x). This is proposed in Bonney (1987) and Fahrmeir and Kaufmann (1987). The advantage of such models is that one can use widely available software for univariate logit and Poisson models. One disadvantage of such models is that it would be difficult to predict $Y_t$ based on x alone.

- Meester and MacKay (1994) studied a class of multivariate exchangeable models with the multivariate Frank copula. The models have limited application since only exchangeable dependence structures are considered.

- Glonek and McCullagh (1995) have a bivariate model similar to the Molenberghs-Lesaffre model in that the dependence parameter is linear in covariates and the related bivariate copula is the Plackett copula. Their multivariate extension appears to overlap with that of Molenberghs and Lesaffre (1994), but with a different model construction approach.

Partially specified probability models: the generalized estimating equations approach

General application of many of the preceding models was impeded, however, by their mathematical complications and by the computational difficulty usually encountered in multivariate analysis. A different body of methodology, called the generalized estimating equations (GEE) approach, has been developed based on moment-type methods which do not require explicit distributional assumptions. References for this methodology are Liang and Zeger (1986), Zeger and Liang (1986), Zhao and Prentice (1990) and Fitzmaurice and Laird (1993), among others. However, the GEE approach has several disadvantages, mainly related to modelling, inference, diagnostic checking and interpretation. Furthermore, the GEE approach does not apply directly to multivariate ordinal data. A detailed study of the GEE approach, including a discussion of some of its shortcomings, can be found in Chapter 6.

Research motivation

In summary, although some approaches have appeared in the literature to model specific instances/examples of the data in Table 1.1, there are at least two major features lacking in the statistical literature in terms of modelling multivariate discrete data:

1. A unified, systematic approach to multivariate discrete modelling, with classes of models for multivariate discrete data where some models in the class have nice properties (see section 1.3 for some desirable features of multivariate models).

2. A model-fitting strategy with computationally feasible parameter estimation and inference procedures, with good asymptotic properties and efficiency.

This thesis makes contributions to these two gaps in multivariate discrete (more generally, nonnormal) data modelling. We study systematic approaches to the modelling of multivariate discrete response data with covariates. The response types include binary, ordinal categorical and count. Statistical inference and computational aspects of the multivariate nonnormal models are studied.

1.3 Statistical modelling
We discuss here two issues in statistical modelling. One is what we mean by statistical modelling in general. The other is the construction of multivariate models with desirable properties. Other aspects of statistical modelling, as part of data analysis, will be discussed in Chapter 5.

In practice, with a finite sample of data, to capture exactly the possibly complex multivariate system which generated the data is impossible. The problem can be even more complicated than modelling a system; it might be that the system itself does not exist and is forever a hypothetical one. In statistical modelling, the specification of a particular model for the data is always somewhat arbitrary; what we hope is that the stochastic models we use may reflect relatively well the randomness or uncertainty in the system, as well as the significant features of the systematic relationships. Statistical models should be considered as a means of providing statistical inference; they should be viewed as tentative approximations to the truth. The most important consideration in using any statistical method (or model) is whether the method (or model) can give insight into important practical problems. All models are subjective to some degree. Often the modeller chooses those elements of the system under investigation that should be included in the model, as well as the mode of representation. Modelling should not be a substitute for thinking, and will only be effective if combined with an interest in and knowledge of the system being modelled.

The construction of multivariate nonnormal models is not easy. For modelling purposes, we would like to have parametric families of models that (i) cover the different types of dependence, (ii) have interpretable parameters, and (iii) apply to multivariate discrete data. Some desirable properties of a multivariate model are the following:

1. The model is natural. That is, the model is interpretable in terms of mixture, stochastic or latent variable representations, etc.

2. Parameters in the model are interpretable. A parametric family has extra interpretability if some of the parameters can be identified as dependence or multivariate parameters, such that some range of the parameters corresponds to positive dependence and some corresponds to negative dependence; it is also desirable for the amount of dependence to be increasing as the parameters increase.

3. The model allows a wide and flexible range of dependence, with interpretable dependence parameters which are flexible to the needs of different applications.

4. The model extends naturally to include covariates for the univariate marginal parameters as well as the dependence parameters, in the sense that after the extension we still have a probabilistic model and proper interpretations.

5. The model has marginally expressible properties, such as the property that model parameters are expressible by parameters in univariate and bivariate distributions, and the closure property with extension of univariate to bivariate and to higher order margins.

6. The model has a simple form, preferably with closed form representations of the cdf and density, or at least is easy to use for computation, estimation and inference.

Generally, it is not possible to achieve all of these desirable properties simultaneously, in which case one must decide the relative importance of the properties and sacrifice one or more of them.
There is no known multivariate family having all of these properties, but the family of multivariate normal distributions may be the closest. Multinormal distributions satisfy (1), (2), (3), (4) and (5) but not (6), since the multinormal has no closed form cdf. The mixtures of max-id copulas (Joe and Hu 1996) satisfy (1), (2), (3), (4) and (6), but only partially (5). In Chapter 3, these desirable properties of a multivariate model will be used as criteria to compare different models.

1.4 Overview of thesis

This thesis consists of seven chapters. In Chapter 2, we develop the theoretical background for the multivariate discrete models, statistical inference and computation procedures. Two general classes of multivariate discrete models are introduced; their common feature is that both rely on the copula concept. Several new concepts related to multivariate models are proposed. The asymptotic theory for parameter estimation based on the inference functions of margins (IFM) is also given in this chapter. In Chapter 3, we study and compare many specific models in the two general classes of multivariate models proposed in Chapter 2. Mathematical details of parameter estimation for some of the models are provided. In Chapter 4, the efficiency of the IFM approach relative to the classical maximum likelihood approach is investigated. The major advantage of IFM is its computational feasibility and its good asymptotic properties. We demonstrate that IFM is an efficient parameter estimation approach when it is applicable. We also study the efficiency of the jackknife method of variance estimation proposed in Chapter 2. In Chapter 5, some important issues such as a proper data analysis cycle, model selection and diagnostic checking are discussed. Data analysis examples illustrating the modelling and inference procedures developed in this thesis are also carried out. In Chapter 6, we study the usefulness and efficiency of the GEE approach, which has been the focus of many recent statistical applications dealing with multivariate and longitudinal data with univariate margins covered by the theory of generalized linear models. In Chapter 7, the final chapter, we discuss some further important research topics closely related to this thesis work. Finally, the Appendix contains a Maple symbolic manipulation program example used in Chapter 4.

Chapter 2

Foundation: models, statistical inference and computation

In this chapter, we propose two classes of multivariate discrete models: multivariate copula discrete (MCD) models and multivariate mixture discrete (MMD) models. These two classes of models provide a new classification of multivariate discrete models, and allow a general approach to modelling multivariate discrete data. The two classes unify a number of multivariate discrete models appearing in the literature, such as the multivariate probit model, the multivariate Poisson-lognormal model, etc. At the same time, numerous new models are proposed under these two classes. We also propose an inference functions of margins (IFM) approach to parameter estimation for MCD and MMD models. This estimation approach is built on the general theory of inference functions (or estimating equations). Asymptotic theory for IFM is developed and applied to the specific models in Chapter 3.
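To make the idea behind the IFM approach concrete before the formal development, here is a minimal two-stage sketch for a bivariate probit model: the cut-off points are first estimated from the univariate margins, and the latent correlation is then estimated from the bivariate likelihood with the margin estimates held fixed. This is an illustrative reconstruction on simulated data under assumed parameter values, not the thesis's own code.

    import numpy as np
    from scipy.stats import norm, multivariate_normal
    from scipy.optimize import minimize_scalar

    def ifm_bivariate_probit(Y):
        # Stage 1: univariate likelihoods give cut-off estimates z_j = Phi^{-1}(mean of Y_j)
        z = norm.ppf(Y.mean(axis=0))

        counts = np.histogram2d(Y[:, 0], Y[:, 1], bins=[2, 2])[0]  # n(00) n(01); n(10) n(11)

        def negloglik(rho):
            # Stage 2: bivariate likelihood in rho only, with z held fixed
            p11 = multivariate_normal.cdf([z[0], z[1]], cov=[[1.0, rho], [rho, 1.0]])
            p1, p2 = norm.cdf(z)
            cell = np.array([[1.0 - p1 - p2 + p11, p2 - p11],
                             [p1 - p11, p11]])
            return -np.sum(counts * np.log(cell))

        rho = minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded").x
        return z, rho

    # usage on simulated data with true cut-offs (0, 0) and true latent correlation 0.5
    rng = np.random.default_rng(3)
    Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=5000)
    print(ifm_bivariate_probit((Z <= 0.0).astype(int)))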
While similar ideas about the same kind of estimating functions for a specific model have appeared in the literature, the general development of the procedure as an approach to parameter estimation for a class of multivariate discrete models, and the related asymptotic results, are new. We also show that a jackknife estimate of the covariance matrix of the estimates from the IFM approach is asymptotically equivalent to the asymptotic covariance matrix from the Godambe information matrix. The jackknife procedure has the advantage of general computational feasibility. These results are used extensively in the applications in Chapter 5. The efficiency of IFM versus the optimal estimation procedure based on maximum likelihood estimation, and the numerical assessment of the efficiency of jackknife covariance matrix estimates compared with the Godambe information matrix, are studied in detail in Chapter 4.

The present chapter is organized as follows. Section 2.1 introduces the multivariate copula models, some multivariate dependence concepts and a number of new concepts regarding the properties of a multivariate model. In section 2.2, we introduce two classes of multivariate discrete models: the multivariate copula discrete models and the multivariate mixture discrete models. These two classes of models are the focus of this thesis, and specific models in these two classes will be extensively studied in Chapter 3. In section 2.3, we propose an inference functions of margins (IFM) approach for the parameter estimation of MCD and MMD models; the theoretical foundation is built on the theory of inference functions for the multi-parameter situation. Section 2.4 is devoted to the study of the asymptotic properties of parameter estimates based on the IFM approach. Under regularity conditions, the IFM estimators (IFME) of parameters are shown to be consistent and asymptotically normal with a Godambe information matrix as the variance-covariance matrix. This is done for models with no covariates as well as models with covariates. The extension of models with no covariates to models with covariates will be made clear in this section. In section 2.5, we propose a jackknife approach to the asymptotic variance estimation of IFME, and show theoretically that the jackknife estimate of variance is asymptotically equivalent to the Godambe information matrix. The importance of the jackknife estimate of variance will be demonstrated in Chapter 5 for real data analysis.

2.1 Multivariate copulas and dependence measures

2.1.1 Multivariate distribution functions

We begin by recalling the definition of a distribution function on $\mathbb{R}^d$.

Definition 2.1 A d-dimensional distribution function is a function $F : \mathbb{R}^d \to \mathbb{R}$, which is right continuous, with

(i) $\lim_{y_j \to -\infty} F(y_1, \ldots, y_d) = 0$, $j = 1, \ldots, d$,

(ii) $\lim_{y_1, \ldots, y_d \to \infty} F(y_1, \ldots, y_d) = 1$,

and which satisfies the following rectangle inequality: for all $(a_1, \ldots, a_d)$, $(b_1, \ldots, b_d)$ with $a_j \le b_j$, $j = 1, \ldots, d$,
$$\sum_{k_1=1}^{2} \cdots \sum_{k_d=1}^{2} (-1)^{k_1 + \cdots + k_d}\, F(y_{1 k_1}, \ldots, y_{d k_d}) \ge 0, \qquad (2.1)$$
where $y_{j1} = a_j$ and $y_{j2} = b_j$.
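As a quick numerical illustration of the rectangle inequality (2.1), the following sketch checks the d = 2 case for a bivariate normal cdf at randomly chosen rectangles; the correlation value is an arbitrary assumption.

    import numpy as np
    from scipy.stats import multivariate_normal

    # d = 2 case of (2.1): F(b1,b2) - F(a1,b2) - F(b1,a2) + F(a1,a2) >= 0
    F = lambda y1, y2: multivariate_normal.cdf([y1, y2], cov=[[1.0, 0.7], [0.7, 1.0]])

    rng = np.random.default_rng(4)
    for _ in range(200):
        a = rng.normal(size=2)
        b = a + rng.exponential(size=2)          # guarantees a_j < b_j
        mass = F(b[0], b[1]) - F(a[0], b[1]) - F(b[0], a[1]) + F(a[0], a[1])
        assert mass >= -1e-9                     # a rectangle carries nonnegative probability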
Let 5 be a subset of {1,..., d}. The margins Fs of F(yi,..., yd) are obtained by letting y,- —» oo for i £ S. There are two important types of cdf generated from a random vector Y : discrete and continuous. In the case of an absolutely continuous random vector Y, there is a corresponding density function f(yi ,---,yd) which satisfies / ( y ! , . . . , yd) > 0 and f f ^ • • • f(yx • • • yd)dyx •••dyd = 1. The cdf can be written by / yd ryi •• f(xi • --x^dxi • --dxd. - o o J — oo In the case of a discrete random vector Y, the probability that Y takes on a value y = ( j / i , . . . , yd)' is defined by the pmf P(yi---yd) = Pv(Y1 = y1,...,Yd = yd), which satisfies P(yi • • -yd) > 0 and Y^,{yi} ' ' '^{yd} ̂ (v^ •••%) = !• The cdf can be written as F{yi,...,yd)- P(xi---xd). {xi<y1,...,xd<yd} For a discrete random vector, the jth marginal distribution is defined by Pj(v:)= E p(y±---yd)- {yi-yd}\{yj} The (j, k) marginal distribution is defined by Pjk(yjyk) = P(yi • • • yd)- {yi-yd}\{yjyk} In general, the marginal distributions can be obtained from the previous remark (iii). 2 .1 .2 Multivariate copulas and Frechet bounds The multivariate normal distribution is used extensively in multivariate analysis because of its many nice properties (see for example, Seber 1984). The wide range of successful application of the Chapter 2. Foundation: models, statistical inference and computation 14 multivariate normal distribution is because of its flexibility in representing different dependence structures rather than for physical reasons or as an approximation from the Central Limit Theorem. The dependence structure plays a crucial role in multivariate modelling. But the multivariate normal model is not sufficient for every multivariate modelling situation. To be able to model multivariate data in general, a good understanding of the general parametric families of multivariate distribution functions - the constructs which describe the characteristic of the random phenomena, is necessary. One useful and well-known approach to understanding a multivariate distribution function F is to express F in terms of its marginals and its associated dependence function C ( ) . This C(-) (or simply C) is commonly called the copula. Definition 2.2 (Copula) C is a copula if G(y1,...,yd) = C(G1(y1),...,Gd(yd)). is a distribution function, whenever Gi, Gd are all arbitrary univariate distribution functions. • Let Y — (Yi,..., Yd)' be a d-variate continuous random vector with cdf G(y\,...,yd) and with continuous univariate marginal distribution functions Gi(yi), ..., Gd(yd) respectively. Then Ui = Gi(Yi), . . . , Ud — Gd(Yd) are uniformly distributed on [0,1]. Let Gj"1, • • •,G d x be the univariate quantile functions, where GJ1 is defined by Gjx(t) = inf{j/: Gj(y) >t},j = l,...,d. The copula, C, of Y = (Yi,..., Yd)' is constructed by making marginal probability integral transforms on Y\,..., Yd to Ui,..., Ud. That is, the copula is the joint distribution function of U\,..., Ud: C { u l , . . . , u i ) = G{G-ll{ul),...,Gd\ud)). (2.2) C is non-unique if the Gj's are not all continuous. This point will be made clear in section 2.2. Sup- pose Y is a continuous random vector with distribution function G(yi,.. - ,yd) and the corresponding copula is C ( u i , . . . , ud) with density function c ( u i , . . . , ud). The density function of G(y\,..., yd) in terms of copula density function is g(yx ,...,yd) = c(Gi(t/i),..., Gd(yd)) ]T? 
The copula captures the dependence among the components of the random vector Y; it contains all of the information that couples the d marginal distributions together to yield the joint distribution of Y. This understanding is essential for the subsequent development of the multivariate discrete models. The copula was first introduced by Sklar (1959). For parametric families of copulas with good properties, see Joe (1993, 1996). Through the copula, a distribution function is decomposed into two parts: a set of marginal distribution functions, and the dependence structure specified in terms of the copula. This suggests that one natural way to model multivariate data is to find the dependence structure in terms of a copula and the univariate marginals separately. This important feature will be exploited to form multivariate discrete models by using the copula concept in section 2.2 and in Chapter 3.

Next we define the Frechet bounds.

Definition 2.3 (Frechet bounds) Let F(x) be a d-variate cdf with univariate margins $F_1, \ldots, F_d$. Then for all $x_1, \ldots, x_d$,

\max\{0, F_1(x_1) + \cdots + F_d(x_d) - (d - 1)\} \le F(x_1, \ldots, x_d) \le \min\{F_1(x_1), \ldots, F_d(x_d)\},   (2.3)

where $\min\{F_1(x_1), \ldots, F_d(x_d)\}$ is the Frechet upper bound, and $\max\{0, F_1(x_1) + \cdots + F_d(x_d) - (d-1)\}$ is the Frechet lower bound. •

We state here some important properties of the Frechet bounds.

Properties

1. The Frechet upper bound is a cdf.

2. The Frechet lower bound is a cdf for d = 2.

3. The Frechet upper bound copula is $C_U(\mathbf{u}) = \min\{u_1, \ldots, u_d\}$. For d = 2, the Frechet lower bound copula is $C_L(\mathbf{u}) = \max\{0, u_1 + u_2 - 1\}$.

For a proof of properties 1, 2, 3 and other properties of Frechet bounds, see Joe (1996). Under independence, the copula is

C_I(u_1, \ldots, u_d) = \prod_{j=1}^{d} u_j,

and any copula must pointwise fall between $\max\{0, u_1 + \cdots + u_d - (d-1)\}$ and $\min\{u_1, \ldots, u_d\}$.
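A quick numerical check (my own illustration, not from the thesis) that a parametric copula respects the bounds (2.3): the Frank copula used here is the one given later in (2.9), written in an equivalent algebraic form.

    import numpy as np

    def frank(u, v, delta):
        """Bivariate Frank copula, delta != 0."""
        num = (np.exp(-delta * u) - 1.0) * (np.exp(-delta * v) - 1.0)
        return -np.log(1.0 + num / (np.exp(-delta) - 1.0)) / delta

    grid = np.linspace(0.01, 0.99, 50)
    u, v = np.meshgrid(grid, grid)
    for delta in (-8.0, -1.0, 2.0, 10.0):
        c = frank(u, v, delta)
        lower = np.maximum(0.0, u + v - 1.0)  # Frechet lower bound, d = 2
        upper = np.minimum(u, v)              # Frechet upper bound
        assert np.all(c >= lower - 1e-12) and np.all(c <= upper + 1e-12)
    print("Frank copula respects the Frechet bounds on the grid")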
2.1.3 Dependence measures

It is desirable for a parametric family of multivariate distributions to have a flexible and wide range of dependence. For non-normal random variables, correlation is not the best measure of dependence, and concepts based on linearity are not necessarily the best to work with. More general concepts of positive and negative dependence, and measures of monotone dependence, are needed. These are necessary for analyzing the type and range of dependence in a parametric family of multivariate models. For a thorough treatment of dependence concepts and dependence orderings, see Joe (1996, Chapter 2).

In multivariate analysis, one of the most important activities is to model the dependence structure among the random variables. The complexity of the dependence structure and its range often determine the practical usefulness of the model. The dependence structure of a multivariate model can be considered roughly equivalent to the copula; for example, Schweizer and Wolff (1981) used copulas to define several natural nonparametric measures of dependence for pairs of random variables. The parameters in a multivariate copula reflect the degree of dependence among variables; for example, the multivariate normal copula can be adequately expressed in terms of a correlation matrix whose elements are the pairwise correlation coefficients of a multinormal random vector, with a large correlation coefficient indicating strong dependence between variables. However, it is not always possible to express a copula in terms of correlation coefficients of a set of random variables. There may also be mathematical reasons, such as mathematical simplicity, for not expressing a copula in terms of correlation coefficients.

A measure of dependence for two random variables indicates how closely the two random variables are related, the extreme situations being mutual independence and complete mutual dependence. Some very useful dependence concepts, such as the concepts of positive and negative dependence, are based on the refinement of an intuitive understanding of dependence among random variables. For example, for two random variables X and Y, positive dependence means roughly that large (small) values of X tend to accompany large (small) values of Y. Often in practice, this knowledge of the amount of dependence is good enough for some modelling purposes.

Some well-known measures of dependence for two random variables are Pearson's correlation coefficient r, Spearman's rho and Kendall's tau. These measures are defined as follows. Let X, Y be random variables with continuous distribution functions F(x) and G(y) and copula C. We further assume that $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X, Y)$ are independent with the same joint distribution. Then Pearson's correlation coefficient is

r = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y),

Kendall's tau is

\tau = \mathrm{Corr}(\mathrm{sgn}(X_1 - X), \mathrm{sgn}(Y_1 - Y)) = 2\Pr((X_1 - X)(Y_1 - Y) > 0) - 1, \quad \text{or} \quad \tau = 4\int\!\!\int C(u, v)\, dC(u, v) - 1,

and Spearman's rho is

\rho = 3\,\mathrm{Corr}(\mathrm{sgn}(X_1 - X), \mathrm{sgn}(Y_2 - Y)), \quad \text{or} \quad \rho = 12\int\!\!\int C(u, v)\, du\, dv - 3,

where $\sigma_X$ and $\sigma_Y$ denote the standard deviations of X and Y, and sgn(·) denotes the sign function. Both Kendall's tau and Spearman's rho are invariant to strictly increasing transformations; they are equal to 1 for the Frechet upper bound and -1 for the Frechet lower bound. These properties do not hold for Pearson's correlation. Essentially, Pearson's r measures the strength of the linear relationship between two random variables X and Y, whereas Kendall's tau and Spearman's rho are measures of monotone correlation (the strength of a monotone relationship). For bivariate quantitative data, Spearman's rho corresponds to the rank correlation (Pearson's correlation applied to the ranks of the two variables). That the copula captures the basic dependence structure among the components of Y can be seen from the fact that nonparametric measures of association, such as Kendall's tau and Spearman's rho, are normed distances of the copula from the independence copula. In general, it is difficult to judge the intensity of dependence for a given multivariate model solely on the basis of one dependence measure; the three common dependence measures can be used as a reference for the attainable intensity of dependence of a given multivariate model.

For ease of interpretation of the dependence structure, we would like to see the dependence structure expressed in easily interpretable parameters. For example, for arbitrary marginals, a question is how to express a copula in terms of the most common measures of association, such as Pearson's r (for some specific marginals), Spearman's rho, or Kendall's tau, in a natural way. For some well-defined classes of distributions, such as the multivariate normal, Pearson's correlation coefficient is the measure of choice; in other classes, other measures may be more appropriate. (For example, the Morgenstern copula in subsection 2.1.4 is expressed in terms of Kendall's tau in a natural way.) A parametric family has extra interpretability if some of the parameters can be identified as dependence parameters. More specifically, one would like to be able to say that some range of the parameters corresponds to positive dependence and some corresponds to negative dependence. Furthermore, it would be desirable for the amount of dependence to be increasing (decreasing) as the parameters increase (decrease).
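The invariance of the monotone measures, as opposed to Pearson's r, is easily demonstrated numerically (an illustration of mine using scipy's estimators, not from the thesis):

    import numpy as np
    from scipy.stats import kendalltau, spearmanr, pearsonr

    rng = np.random.default_rng(7)
    x = rng.normal(size=5000)
    y = x + 0.5 * rng.normal(size=5000)   # roughly linear dependence
    y_mono = np.exp(y)                    # strictly increasing transform of y

    for label, yy in [("original", y), ("exp-transformed", y_mono)]:
        print(label,
              "pearson=%.3f" % pearsonr(x, yy)[0],
              "spearman=%.3f" % spearmanr(x, yy)[0],
              "kendall=%.3f" % kendalltau(x, yy)[0])
    # Spearman and Kendall are unchanged by the transform; Pearson is not.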
2.1.4 Examples of multivariate copulas

Some well-known examples of copula families are the multivariate normal copula, the Morgenstern copula, the Plackett copula, the Frank copula, etc. Joe (1993, 1996) provides a detailed list of families of copulas with good properties. In Genest and MacKay (1986), a class of copulas called Archimedean copulas is studied extensively. Most existing parametric families of copulas represent monotone dependence structures in which the intensity of the dependence is determined by the value of the dependence parameter. Some families, such as the normal family, possess a complete range of dependence intensities, whereas others, such as the Morgenstern family, possess only a limited range. In fact, the Morgenstern copula never attains the Frechet bounds; its Spearman's rho lies between -1/3 and 1/3. For general modelling purposes, we would naturally seek families with a wide range of dependence intensities. Here we give some examples of multivariate copulas. More examples of multivariate copulas will be given in Chapter 3 for constructing multivariate discrete models.

Example 2.1 (Multivariate normal copula) Let $\Phi$ be the standard normal distribution function and let $\Phi_d$ be the d-variate normal distribution function with mean vector 0, variance vector 1 and correlation matrix $\Theta$. Then the corresponding d-variate copula is

C(u_1, \ldots, u_d; \Theta) = \Phi_d(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d); \Theta),   (2.4)

where every bivariate copula $C_{jk}(u_j, u_k; \theta_{jk})$, $1 \le j < k \le d$, attains the lower Frechet bound, the independence case, or the upper Frechet bound according to $\theta_{jk} = -1$, 0, or 1. Pearson's correlation coefficient for the corresponding bivariate normal distribution is $r = \theta_{jk}$. For Spearman's rho and Kendall's tau, we can also establish the relationships $\tau = (2/\pi)\sin^{-1} r$ and $\rho = (6/\pi)\sin^{-1}(r/2)$. With this copula, we have

G(z_1, \ldots, z_d) = C(G_1(z_1), \ldots, G_d(z_d); \Theta) = \Phi_d(\Phi^{-1}(G_1(z_1)), \ldots, \Phi^{-1}(G_d(z_d)); \Theta),

where the $G_j$'s are arbitrary cdfs. For example, we can have $G_j(z_j) = \exp(z_j)/(1 + \exp(z_j))$, $G_j(z_j) = \Phi(z_j)$, $G_j(z_j) = 1 - \exp(-\exp(z_j))$ or $G_j(z_j) = \exp(-\exp(-z_j))$. $G_j(z_j)$ can even be taken to be a mixture distribution function, for example $G_j(z_j) = \pi_j \Phi(z_j) + (1 - \pi_j)\int_{-\infty}^{z_j} \exp(-|x|)/2\, dx$, where $0 \le \pi_j \le 1$. These flexible choices of univariate marginal distributions, combined with the complete range of the dependence parameter matrix $\Theta$, make the multivariate normal copula a powerful copula for general modelling purposes. In Chapter 3, we will use this copula extensively. •

Example 2.2 (Morgenstern copula) In the literature, the names of several people are sometimes put together in naming this copula; Farlie-Gumbel-Morgenstern is one such name. In this thesis, we simply call it the Morgenstern copula (Morgenstern 1956). One simple version of a d-dimensional Morgenstern copula, which does not include higher order terms, is

C(u_1, \ldots, u_d; \Theta) = \prod_{j=1}^{d} u_j \Big[ 1 + \sum_{1 \le j < k \le d} \theta_{jk}(1 - u_j)(1 - u_k) \Big].   (2.5)

It has density function

c(u_1, u_2, \ldots, u_d) = 1 + \sum_{j<k} \theta_{jk}(1 - 2u_j)(1 - 2u_k).

The conditions on the parameters $\theta_{jk}$ for (2.5) to indeed be a copula are straightforward. For d = 3, the conditions can be conveniently summarized as follows: $1 + \theta_{12} + \theta_{13} + \theta_{23} \ge 0$, $1 + \theta_{13} \ge \theta_{12} + \theta_{23}$, $1 + \theta_{12} \ge \theta_{13} + \theta_{23}$, $1 + \theta_{23} \ge \theta_{12} + \theta_{13}$; or, more succinctly, $-1 + |\theta_{12} + \theta_{23}| \le \theta_{13} \le 1 - |\theta_{12} - \theta_{23}|$, with $-1 \le \theta_{12}, \theta_{13}, \theta_{23} \le 1$. Similar conditions for higher dimensions $d = 4, 5, \ldots$ can be obtained by considering the $2^d$ cases $u_j = 0$ or 1, $j = 1, \ldots, d$, and verifying that $c(u_1, \ldots, u_d) \ge 0$. It is easy to see that for any $j, k = 1, \ldots, d$, $j \ne k$,

C_{jk}(u_j, u_k; \theta_{jk}) = [1 + \theta_{jk}(1 - u_j)(1 - u_k)]u_j u_k, \quad -1 \le \theta_{jk} \le 1,

with density function $c_{jk}(u_j, u_k) = 1 + \theta_{jk}(1 - 2u_j)(1 - 2u_k)$. The dependence structure between $U_j$ and $U_k$ is controlled by the parameter $\theta_{jk}$. Spearman's rho is $\rho = \theta_{jk}/3$. The maximum Pearson correlation coefficient over all choices of $G_j$ and $G_k$ is 1/3 (when $\theta_{jk} = 1$), which occurs for uniform marginals. For normal marginals, the maximum Pearson correlation coefficient is $1/\pi$; for exponential marginals it is 1/4; for double exponential marginals, the limit is 0.281. Kendall's tau is $2\theta_{jk}/9$, with the maximum range $-2/9$ to $2/9$. Because of this limited dependence range, the Morgenstern copula is not very useful for general modelling. Nevertheless, because the Morgenstern copula has such a simple form, it can be used as an investigative tool in, for example, simulation studies to check properties of some general modelling procedures; an example of its use is provided in section 4.3. If a new procedure breaks down with a distribution based on the Morgenstern copula, then it will probably have difficulties with other models that admit a wider range of dependence. A version of the d-dimensional Morgenstern copula with higher order terms has the following density function:

c(u_1, u_2, \ldots, u_d) = 1 + \sum_{j_1 < j_2} \beta_{j_1 j_2}(1 - 2u_{j_1})(1 - 2u_{j_2}) + \sum_{j_1 < j_2 < j_3} \beta_{j_1 j_2 j_3} \prod_{i=1}^{3}(1 - 2u_{j_i}) + \cdots + \beta_{12 \cdots d} \prod_{j=1}^{d}(1 - 2u_j).   (2.6)

This form expands the correlation structure of the Morgenstern distribution (2.5). For more details, see Johnson and Kotz (1975, 1977). •
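The bivariate Morgenstern properties quoted above are easy to check by simulation. In the sketch below (my own construction, not from the thesis), the conditional cdf $C_{2|1}(v|u) = v[1 + \theta(1 - 2u)(1 - v)]$ is inverted in closed form, and the empirical rank correlation is compared with Spearman's rho $= \theta/3$:

    import numpy as np

    def rmorgenstern(n, theta, rng):
        """Sample from the bivariate Morgenstern copula by conditional inversion."""
        u = rng.uniform(size=n)
        p = rng.uniform(size=n)
        a = theta * (1.0 - 2.0 * u)
        # Solve p = (1 + a) v - a v^2 for v; the minus root tends to p as a -> 0
        safe = np.where(np.abs(a) < 1e-10, 1.0, a)
        v = np.where(np.abs(a) < 1e-10, p,
                     ((1.0 + a) - np.sqrt((1.0 + a) ** 2 - 4.0 * a * p)) / (2.0 * safe))
        return u, v

    rng = np.random.default_rng(42)
    theta = 1.0
    u, v = rmorgenstern(200000, theta, rng)
    ru = np.argsort(np.argsort(u))  # ranks of u
    rv = np.argsort(np.argsort(v))  # ranks of v
    print(np.corrcoef(ru, rv)[0, 1])  # close to theta / 3, i.e. 1/3 at theta = 1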
2.1.5 CUOM, CUOM(k), MUBE, PUBE and MPME concepts

In this thesis, we are mainly interested in (a) parametric models (or copulas) with a wide range of dependence intensities, and (b) parametric models (or copulas) with certain types of marginal distribution closure properties. In this subsection, we introduce several concepts concerning the marginal behaviour of a distribution.

Definition 2.4 (Closure under taking of margins, or CUOM) A parametric model (copula) is said to have the property of closure under taking of margins if the bivariate margins and higher-order margins belong to the same parametric family.
•

Definition 2.5 (Closure under taking of margins of order k, or CUOM(k)) A parametric model (copula) is said to have the property of closure under taking of margins of order k if the k-variate margins belong to the same parametric family. •

Definition 2.6 (Model univariate-bivariate expressible, or MUBE) A parametric model (copula) is called model univariate-bivariate expressible, or MUBE, if all the parameters in the model can be expressed in terms of parameters of the model's univariate and bivariate marginal distributions. •

Definition 2.7 (Parameter univariate-bivariate expressible, or PUBE) If a parameter in a model can be expressed in terms of the model's univariate and bivariate marginal distributions, then this parameter is called parameter univariate-bivariate expressible, or PUBE, under the model. •

Definition 2.8 (Model parameters marginally expressible, or MPME) If all the parameters in a model can be expressed in terms of the model's lower-dimensional (lower than full) marginal distributions, then the model is said to have the property of model parameters being marginally expressible, or the MPME property. •

If we are thinking about parameter estimation, then expressions such as "expressible" and "be expressed" in the above definitions should be understood as "estimable" and "be estimated", respectively, from lower-dimensional margins.

A model with CUOM is also said to have reproducibility or upward compatibility under taking of margins; basically, the marginal distributions "reproduce" themselves under taking of margins. This property is desirable in many multivariate applications because initial data analysis often starts with lower-dimensional margins.

A model with the MUBE property is one in which all the parameters appearing in the multivariate distribution appear in univariate and bivariate marginal distributions. A model with the PUBE property may have multivariate parameters of order higher than 3, but some of its parameters of interest can be univariate-bivariate expressed without the involvement of other multivariate parameters (e.g. trivariate parameters). A model with the MPME property is one in which all of the parameters can be expressed marginally. These are important properties of a multivariate model that allow a simplification of the parameter estimation through the IFM approach (defined in section 2.3).

Based on the above definitions, the following implications hold:

i. If a model has the CUOM property, then it also has the CUOM(k) property. If a model is not CUOM(k), then it is not CUOM.

ii. CUOM(r_1) implies CUOM(r_0) if r_1 > r_0. That is, there exists a parameterization of the lower-dimensional margins such that the lower-order closure property holds.

iii. If a model has the MUBE property, then all the parameters in this model are PUBE. Furthermore, such a model is also MPME.

iv. If every parameter is PUBE, then the model is MUBE.

No other implications hold in general. In the following, a few examples are used to illustrate the above concepts and some of their relationships.

Example 2.3 (Models with CUOM and MUBE properties) A familiar example of a model with the CUOM and MUBE properties is the multivariate normal model.
The closure under taking of margins for the multinormal distribution is somewhat stronger than the CUOM property defined here, since the multinormal is also closed under taking of univariate margins, which is not required in our definition. •

Example 2.4 (Models with MUBE property) For some copulas, such as (2.4) and (2.5), the dependence structure can be expressed by a d × d matrix parameter $\Theta = (\theta_{jk})$ with $\theta_{jj} = 1$. For such a d-dimensional copula $C(\cdot; \Theta)$, the 2-dimensional margins can be expressed by bivariate copulas $C_{jk}(\cdot; \theta_{jk})$ with one dependence parameter $\theta_{jk}$, for $j, k = 1, \ldots, d$, $j \ne k$. Thus each element of the dependence structure described by the parametric matrix $\Theta = (\theta_{jk})$ can be equivalently expressed by a set of bivariate copulas $C_{jk}(\cdot; \theta_{jk})$. A distribution with such a copula is thus MUBE. Some copulas, such as (2.4), have a wide range of dependence; some, such as (2.5), do not. •

Example 2.5 (Models with CUOM but not MUBE property) We give two examples here:

a. Consider the generalized Morgenstern copula (2.6). This copula has the CUOM property, since for any $\{j_1, \ldots, j_m\} \subset \{1, \ldots, d\}$ with $m < d$, it is straightforward to verify that $C(u_{j_1}, u_{j_2}, \ldots, u_{j_m})$ has the form (2.6). But this generalized Morgenstern copula is not MUBE.

b. Another example is the multivariate Poisson distribution. Let us examine the trivariate Poisson distribution. Let the random variables $X_1, X_2, X_3, X_{12}, X_{13}, X_{23}, X_{123}$ have independent Poisson distributions with mean parameters $\lambda_1, \lambda_2, \lambda_3, \lambda_{12}, \lambda_{13}, \lambda_{23}, \lambda_{123}$ respectively. We now construct new random variables as follows: $Y_1 = X_1 + X_{12} + X_{13} + X_{123}$, $Y_2 = X_2 + X_{12} + X_{23} + X_{123}$, $Y_3 = X_3 + X_{13} + X_{23} + X_{123}$. Using the convolution property of the Poisson, we derive that $Y_1 \sim \mathrm{Po}(\lambda_1 + \lambda_{12} + \lambda_{13} + \lambda_{123})$, $Y_2 \sim \mathrm{Po}(\lambda_2 + \lambda_{12} + \lambda_{23} + \lambda_{123})$, $Y_3 \sim \mathrm{Po}(\lambda_3 + \lambda_{13} + \lambda_{23} + \lambda_{123})$, that $(Y_1, Y_2)$, $(Y_1, Y_3)$, $(Y_2, Y_3)$ have bivariate Poisson distributions, and that $(Y_1, Y_2, Y_3)$ has a trivariate Poisson distribution. This 3-dimensional Poisson model has the CUOM property because the bivariate margins have a similar stochastic representation. But it is neither MUBE nor PUBE. In fact, from the univariate and bivariate margins we can only estimate $\lambda_1 + \lambda_{13}$, $\lambda_2 + \lambda_{23}$ and $\lambda_{12} + \lambda_{123}$ from the (1,2) margin; $\lambda_1 + \lambda_{12}$, $\lambda_3 + \lambda_{23}$ and $\lambda_{13} + \lambda_{123}$ from the (1,3) margin; and $\lambda_2 + \lambda_{12}$, $\lambda_3 + \lambda_{13}$ and $\lambda_{23} + \lambda_{123}$ from the (2,3) margin. These nine linear expressions contain only six linearly independent ones. Since there are seven parameters in the model, the model is not MUBE. Furthermore, it can easily be verified that no single parameter can be univariate-bivariate expressed. •
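The identifiability argument in Example 2.5(b) can be verified mechanically (a small check of my own; the parameter ordering $(\lambda_1, \lambda_2, \lambda_3, \lambda_{12}, \lambda_{13}, \lambda_{23}, \lambda_{123})$ is my choice): the nine estimable sums are linear in the seven parameters, and the coefficient matrix has rank six.

    import numpy as np

    # Columns ordered as (l1, l2, l3, l12, l13, l23, l123)
    rows = [
        [1, 0, 0, 0, 1, 0, 0],  # (1,2) margin: l1 + l13
        [0, 1, 0, 0, 0, 1, 0],  # (1,2) margin: l2 + l23
        [0, 0, 0, 1, 0, 0, 1],  # (1,2) margin: l12 + l123
        [1, 0, 0, 1, 0, 0, 0],  # (1,3) margin: l1 + l12
        [0, 0, 1, 0, 0, 1, 0],  # (1,3) margin: l3 + l23
        [0, 0, 0, 0, 1, 0, 1],  # (1,3) margin: l13 + l123
        [0, 1, 0, 1, 0, 0, 0],  # (2,3) margin: l2 + l12
        [0, 0, 1, 0, 1, 0, 0],  # (2,3) margin: l3 + l13
        [0, 0, 0, 0, 0, 1, 1],  # (2,3) margin: l23 + l123
    ]
    print(np.linalg.matrix_rank(np.array(rows)))  # 6, but there are 7 parameters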
Example 2.6 (Models with MUBE but not CUOM(2) property) Consider a trivariate copula constructed in the following way:

C_{123}(u_1, u_2, u_3) = \int_0^{u_2} C_{1|2}(u_1|x; \delta_{12})\, C_{3|2}(u_3|x; \delta_{23})\, dx,   (2.7)

where $C_{1|2}$ and $C_{3|2}$ are conditional cdfs obtained from two arbitrary bivariate copula families $C_{12}(u_1, u_2; \delta_{12})$ and $C_{23}(u_2, u_3; \delta_{23})$. This trivariate copula has (1,2) bivariate margin $C_{12}(u_1, u_2; \delta_{12})$, (1,3) bivariate margin $C_{13}(u_1, u_3) = \int_0^1 C_{1|2}(u_1|x; \delta_{12}) C_{3|2}(u_3|x; \delta_{23})\, dx$, and (2,3) bivariate margin $C_{23}(u_2, u_3; \delta_{23})$. Suppose we let $C_{12}$ be the Plackett copula

C(u, v; \delta) = 0.5\,\eta^{-1}\{1 + \eta(u + v) - [(1 + \eta(u + v))^2 - 4\delta\eta uv]^{1/2}\}, \quad 0 \le \delta < \infty,   (2.8)

where $\eta = \delta - 1$, and we let $C_{23}$ be the Frank copula

C(u, v; \delta) = -\delta^{-1}\log([\zeta - (1 - e^{-\delta u})(1 - e^{-\delta v})]/\zeta), \quad 0 < \delta < \infty,   (2.9)

where $\zeta = 1 - e^{-\delta}$. Then the model (2.7) is well-defined, and is obviously MUBE with two bivariate dependence parameters $\delta_{12}$ and $\delta_{23}$. But the model (2.7) is not CUOM(2), since the Plackett copula and the Frank copula are not in the same parametric family. Generally speaking, given bivariate distributions $F_{12}, F_{23}$ with univariate margins $F_1, F_2, F_3$, it can be shown that

F_{123}(y_1, y_2, y_3) = \int_{-\infty}^{y_2} C_{13}(F_{1|2}(y_1|z_2; \theta_{12}), F_{3|2}(y_3|z_2; \theta_{23}))\, dF_2(z_2)   (2.10)

is a proper trivariate distribution with univariate margins $F_1, F_2, F_3$, (1,2) bivariate margin $F_{12}$, and (2,3) bivariate margin $F_{23}$. In (2.10), $F_{1|2}, F_{3|2}$ are conditional cdfs obtained from $F_{12}, F_{23}$, and $C_{13}$ is a bivariate copula associated with the (1,3) margin (it can be interpreted as a copula representing the amount of conditional dependence of the first and third univariate margins given the second). Specifically, $C_{13}(u_1, u_3) = u_1 u_3$ corresponds to conditional independence and $C_{13}(u_1, u_3) = \min\{u_1, u_3\}$ corresponds to perfect conditional dependence. The model (2.10) is MUBE, but it need not be CUOM(2); it is enough to see this by choosing $F_{12}$ and $F_{23}$ from different parametric families. The model (2.7) is a special case of (2.10) obtained by letting $F_{12}, F_{23}$ be the Plackett and Frank copulas respectively, with $C_{13}(u_1, u_3) = u_1 u_3$. The construction (2.10) is a special case of Joe (1996a). •

Example 2.7 (Models with CUOM(2) but not CUOM property) Let $F(u, v; \theta) = uv(1 + \theta(1 - u)(1 - v))$, $-1 \le \theta \le 1$, be the bivariate Morgenstern family (2.5), and let $F_{12}$ and $F_{23}$ be in this family with parameters $\theta_{12}$ and $\theta_{23}$ respectively. Let $C_{13}(u_1, u_3) = u_1 u_3$. The conditional distributions are $F_{j|2}(u_j|u_2) = u_j + \theta_{j2} u_j(1 - u_j)(1 - 2u_2)$, $j = 1, 3$. Hence by (2.10), we have

F_{13}(u_1, u_3) = \int_0^1 F_{1|2}(u_1|z_2) F_{3|2}(u_3|z_2)\, dz_2 = u_1 u_3[1 + 3^{-1}\theta_{12}\theta_{23}(1 - u_1)(1 - u_3)],

which is in the bivariate Morgenstern family (2.5) with parameter $\theta_{12}\theta_{23}/3$. Hence the model

F_{123}(u_1, u_2, u_3) = \int_0^{u_2} F_{1|2}(u_1|z_2) F_{3|2}(u_3|z_2)\, dz_2   (2.11)

is CUOM(2). But (2.11) is not CUOM. In fact, we find

F_{123}(u_1, u_2, u_3) = u_1 u_2 u_3[1 + \theta_{12}(1 - u_1)(1 - u_2) + 3^{-1}\theta_{12}\theta_{23}(1 - u_1)(1 - u_3) + \theta_{23}(1 - u_2)(1 - u_3) + 2\theta_{12}\theta_{23}(1 - u_1)(1 - u_2)(1 - u_3)(1 - 2u_2)/3],

which is not in the trivariate Morgenstern family (2.5). •

Example 2.8 (Models with CUOM(r_0) but not CUOM(r_1) property, when r_0 < r_1) Consider a 4-variate copula model:

F_{1234}(u_1, u_2, u_3, u_4) = u_1 u_2 u_3 u_4[1 + \theta_{12}(1 - u_1)(1 - u_2) + 3^{-1}\theta_{12}\theta_{23}(1 - u_1)(1 - u_3) + \theta_{14}(1 - u_1)(1 - u_4) + \theta_{23}(1 - u_2)(1 - u_3) + \theta_{24}(1 - u_2)(1 - u_4) + \theta_{34}(1 - u_3)(1 - u_4) + 2\theta_{12}\theta_{23}(1 - u_1)(1 - u_2)(1 - u_3)(1 - 2u_2)/3],

where $|\theta_{14} + \theta_{24} + \theta_{34}| - \theta_{12} - 1 \le \theta_{23}(1 + \theta_{12}) \le 1 + \theta_{12} - |\theta_{14} + \theta_{24} - \theta_{34}|$, $|\theta_{14} - \theta_{24} - \theta_{34}| + \theta_{12} - 1 \le \theta_{23}(1 - \theta_{12}) \le 1 - \theta_{12} - |\theta_{14} - \theta_{24} + \theta_{34}|$, and $|\theta_{jk}| \le 1$, $1 \le j < k \le 4$. It can be shown that $F_{12}, F_{13}, F_{14}, F_{23}, F_{24}$ and $F_{34}$ are in the bivariate Morgenstern family (2.5), but $F_{123}, F_{124}, F_{134}$ and $F_{234}$ are not all in the same parametric family. In fact, $F_{124}, F_{134}$ and $F_{234}$ are in the trivariate Morgenstern family (2.5), but $F_{123}$ is not. •
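The (1,3) margin derived in Example 2.7 is easy to confirm numerically (my own check using scipy, not part of the thesis): integrating the product of the Morgenstern conditionals reproduces a bivariate Morgenstern copula with parameter $\theta_{12}\theta_{23}/3$.

    import numpy as np
    from scipy.integrate import quad

    t12, t23 = 0.8, -0.6

    def cond(u, z, theta):
        """F_{j|2}(u | z) for the bivariate Morgenstern copula (2.5)."""
        return u + theta * u * (1.0 - u) * (1.0 - 2.0 * z)

    def F13(u1, u3):
        return quad(lambda z: cond(u1, z, t12) * cond(u3, z, t23), 0.0, 1.0)[0]

    def morgenstern(u1, u3, theta):
        return u1 * u3 * (1.0 + theta * (1.0 - u1) * (1.0 - u3))

    for (u1, u3) in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.3)]:
        print(F13(u1, u3), morgenstern(u1, u3, t12 * t23 / 3.0))  # the pairs agree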
Example 2.9 (Models with PUBE but not MPME property) We give two examples here:

a. In the generalized Morgenstern copula (2.6), the parameters $\beta_{j_1 j_2}$ ($1 \le j_1 < j_2 \le d$) are PUBE, but the model is not MPME, as the parameter $\beta_{12\cdots d}$ cannot be expressed by any marginal copula.

b. Another example is the Molenberghs-Lesaffre model in Example 2.17. The parameters $\eta_j$ ($1 \le j \le d$) and $\eta_{jk}$ ($1 \le j < k \le d$) are PUBE, but the model is not MPME, as the parameter $\eta_{12\cdots d}$ cannot be expressed by any marginal pmf. •

2.2 Multivariate discrete models

Assume $\mathcal{F}$ is a parametric family defined on a common measurable space $(\mathcal{Y}, \mathcal{A})$, where $\mathcal{Y}$ is a discrete sample space and $\mathcal{A}$ the corresponding $\sigma$-field. We further assume

\mathcal{F} = \{P(y; \theta) : \theta \in \Re\}, \quad \Re \subseteq \mathbb{R}^q,   (2.12)

where $\theta = (\theta_1, \ldots, \theta_q)'$ is a q-component vector and $\Re$ is the parameter space, usually a subset of q-dimensional Euclidean space. We presume the existence of a measure $\mu$ on $\mathcal{Y}$ such that for each fixed value of the parameter $\theta$, the function $P(y; \theta)$ is the density with respect to $\mu$ of a probability measure $\mathcal{P}$ on $\mathcal{Y}$. For a d-dimensional discrete random vector $Y = (Y_1, \ldots, Y_d)'$, its pmf $P(y_1 \cdots y_d; \theta)$ (or simply $P(y_1 \cdots y_d)$) is assumed to be in $\mathcal{F}$.

2.2.1 Multivariate copula discrete models

We define a cdf for a discrete random vector $Y = (Y_1, \ldots, Y_d)'$ as

G(y_1, \ldots, y_d) = C(G_1(y_1), \ldots, G_d(y_d)),   (2.13)

where C is a d-dimensional copula and $G_j$ ($j = 1, \ldots, d$) is the cdf of the discrete rv $Y_j$. Thus $G(y_1, \ldots, y_d)$ is a well-defined cdf for a discrete random vector Y. The pmf of $Y = y = (y_1, \ldots, y_d)'$ is

P(y_1 \cdots y_d) = \sum_{k_1=1}^{2} \cdots \sum_{k_d=1}^{2} (-1)^{k_1 + \cdots + k_d} C(x_{1k_1}, \ldots, x_{dk_d}),   (2.14)

where $x_{j1} = G_j(y_j^*)$ and $x_{j2} = G_j(y_j)$, with $G_j(y_j^*) \le G_j(y_j)$ and $\Pr(Y_j = x) = 0$ for any x such that $y_j^* < x < y_j$; that is, $G_j(y_j^*)$ is the value of $G_j$ at the support point immediately below $y_j$. We call the model (2.13) for a discrete random vector Y a multivariate copula discrete (MCD) model. The family of MCD models is large. With MCD models, we have flexible choices of the marginal cdfs, including standard distributions such as the Bernoulli, binomial, negative binomial, Poisson and generalized Poisson, and these allow the models to accommodate a wide range of data. We also have flexible choices of copulas; examples are the multinormal copula, the Husler-Reiss copula, the Morgenstern copula, etc. For a summary of properties of MCD models, see subsection 2.2.4.

For a given d-variate discrete distribution F, we can often find multiple copulas which match F into an MCD model. For example, suppose we have a bivariate binary random vector $Y = (Y_1, Y_2)'$, where $Y_j$ ($j = 1, 2$) takes values 0 and 1. Let the probabilities of observing (1,1), (1,0), (0,1) and (0,0) be P(11), P(10), P(01) and P(00) respectively. Then for any given one-parameter family of bivariate copulas $C(u_1, u_2; \theta)$ that ranges from the Frechet lower bound to the Frechet upper bound, we can find a $\theta$ to express the four probability masses in the following way:

C(u_1, u_2; \theta) = P(11), \quad u_1 = P(11) + P(10), \quad u_2 = P(11) + P(01).   (2.15)

(2.15) may not hold if $C(\cdot; \theta)$ cannot attain the Frechet bounds. The above observation suggests that in modelling multivariate discrete data, different copulas may do the modelling job equally well. For the modelling to be successful in a general sense, it is important that the copula have a wide dependence range. Evidently, with different copulas we will not be estimating the same dependence parameters, but the fitted models should nevertheless lead to similar inferences or interpretations.
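To illustrate the matching argument of (2.15), here is a sketch of mine (not from the thesis), with the Frank copula (2.9) playing the role of the one-parameter family: given a bivariate binary pmf, the dependence parameter solving $C(u_1, u_2; \delta) = P(11)$ is found by univariate root-finding.

    import numpy as np
    from scipy.optimize import brentq

    def frank(u, v, delta):
        if abs(delta) < 1e-9:   # delta -> 0 gives the independence copula
            return u * v
        num = np.expm1(-delta * u) * np.expm1(-delta * v)
        return -np.log1p(num / np.expm1(-delta)) / delta

    # A bivariate binary pmf with positive dependence
    p11, p10, p01 = 0.40, 0.15, 0.20   # p00 = 0.25
    u1 = p11 + p10                     # P(Y1 = 1)
    u2 = p11 + p01                     # P(Y2 = 1)

    delta = brentq(lambda d: frank(u1, u2, d) - p11, -30.0, 30.0)
    print(delta)  # positive here, reflecting the positive dependence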
2.2.2 Multivariate mixture discrete models

Multivariate discrete models can be constructed in ways other than the derivation of MCD models. We can envisage circumstances in which the multivariate discrete random vector Y at $y = (y_1, \ldots, y_d)'$ has pmf $f(y_1 \cdots y_d; \lambda)$ for a given $\lambda$. Suppose further that $\lambda$ is a random outcome, which we assume to be a p-component vector (p may be different from d) subject to chance variation described by a certain (continuous) multivariate distribution $G(\lambda_1, \ldots, \lambda_p)$, which in turn can be expressed in terms of a copula function $C(u_1, \ldots, u_p)$ with (continuous) univariate marginal distributions $G_j$, $j = 1, \ldots, p$. This is similar to imagining a group of outcomes, with random traits or effects for the individuals in the group, having a common constant element through the distribution of the random effects. Then the probability of $Y = y$, or the pmf of Y at y, is

P(y_1 \cdots y_d) = \int \cdots \int f(y_1 \cdots y_d; \lambda)\, c(G_1(\lambda_1), \ldots, G_p(\lambda_p)) \prod_{j=1}^{p} g_j(\lambda_j)\, d\lambda_1 \cdots d\lambda_p.   (2.16)

We call (2.16) a multivariate mixture discrete (MMD) model. We use the word mixture since the distribution is constructed as a mixture of $\{f(y_1 \cdots y_d; \lambda)\}$ over $\lambda$. A special case of (2.16) obtains by assuming that each univariate marginal probability mass corresponding to the outcome of $Y_j$, namely $P_j(y_j)$, depends on a parameter $\gamma_j$, $j = 1, \ldots, d$ (or a vector of parameters), and that given the $\gamma_j$, the variables $Y_j$ are independent. If $\lambda = (\lambda_1, \ldots, \lambda_p)'$ is the p-component vector formed by the non-singular components of the $\gamma_j$, $j = 1, \ldots, d$, then the model (2.16) becomes

P(y_1 \cdots y_d) = \int \cdots \int \prod_{j=1}^{d} f_j(y_j; \gamma_j)\, c(G_1(\lambda_1), \ldots, G_p(\lambda_p)) \prod_{j=1}^{p} g_j(\lambda_j)\, d\lambda_1 \cdots d\lambda_p,   (2.17)

where $f_j(y_j; \gamma_j) = \Pr(Y_j = y_j | \Gamma_j = \gamma_j)$. The dependence among the response variables is induced through the mixing distribution of $\lambda$. Usually $\lambda_j = \gamma_j$, $j = 1, \ldots, d$. A special case is $\gamma_j = \lambda_j = \lambda$ for all j.

2.2.3 Examples of MCD and MMD models

From their definitions, we see that the above two classes are rather general; we can choose any appropriate multivariate copula in the construction of the distribution. The sets of MCD and MMD models are not disjoint, as we can see from Example 2.13. From a practical viewpoint, we need to find specific multivariate copulas C which offer good modelling properties and have a simple analytic form. One such choice is the multivariate normal copula (2.4). With this copula, we have $C(G_1(z_1), \ldots, G_d(z_d)) = \Phi_d(\Phi^{-1}(G_1(z_1)), \ldots, \Phi^{-1}(G_d(z_d)); \Theta)$, where the $G_j$'s are arbitrary cdfs. The multivariate normal copula allows us to fully or almost fully exploit the dependence structure among the response variables. Its primary disadvantage may be computational difficulty when d is large (e.g. d > 7; see Schervish 1984).

This subsection consists of examples of MCD and MMD models. Discussion concerning the inclusion of covariates is given in some cases. More extensive studies of specific MCD and MMD models are given in Chapter 3.

Example 2.10 (MCD binary model) 1. General models. Let $Y_j$ ($j = 1, \ldots, d$) be a binary random variable taking values 0 or 1, and suppose the probability of outcome 1 is $p_j$.
The cdf for $Y_j$ is

G_j(y_j) = \begin{cases} 0, & y_j < 0, \\ 1 - p_j, & 0 \le y_j < 1, \\ 1, & y_j \ge 1. \end{cases}   (2.18)

For a given d-dimensional copula $C(u_1, \ldots, u_d; \theta)$, $C(G_1(y_1), \ldots, G_d(y_d); \theta)$ is a well-defined distribution for the binary random vector $Y = (Y_1, \ldots, Y_d)'$. When d = 2, with a one-parameter copula $C(u_1, u_2; \theta_{12})$, we can write down the pmf of Y as

P(y_1 y_2) = C(b_1, b_2; \theta_{12}) - C(b_1, a_2; \theta_{12}) - C(a_1, b_2; \theta_{12}) + C(a_1, a_2; \theta_{12}),

where $a_1 = G_1(y_1 - 1)$, $b_1 = G_1(y_1)$, $a_2 = G_2(y_2 - 1)$ and $b_2 = G_2(y_2)$. The pmf of $Y = y$ for general d is given by (2.14). One simple way to reparameterize $p_j$ in (2.18), so that the new parameter associated with the univariate margin has range $(-\infty, \infty)$, is to let $p_j = F_j(z_j)$, where $F_j$ is a proper cdf. This is equivalent to writing $Y_j = I(Z_j \le z_j)$, where $Z_j$ is a rv with cdf $F_j$, and the random vector $Z = (Z_1, \ldots, Z_d)'$ has a multivariate cdf $F_{12\cdots d}$. In the literature, this approach is referred to as a latent variable model or a multivariate latent model, since Z is an unobserved (latent) vector. There is also the option of including covariates in the parameter $z_j$, as well as in the dependence parameters $\theta$ of the copula $C(u_1, \ldots, u_d; \theta)$. We show this through the following examples.

2. Multivariate probit model with no covariates. The classical multivariate probit model for the multivariate binary response vector Y is (2.14) with the multinormal copula (2.4), where $p_j$ is reparameterized as $p_j = \Phi(z_j)$ and $G_j$ has the form (2.18). This model has the CUOM and MUBE properties. Through its latent variable representation, the model can also be written as $Y_j = I(Z_j \le z_j)$, $j = 1, \ldots, d$, where $Z = (Z_1, \ldots, Z_d)' \sim N(0, \Theta)$, $\Theta = (\theta_{jk})$; $z_j$ is often referred to as the cut-off point. $\Theta$ is a correlation matrix, which (a) has elements bounded by 1 in absolute value and (b) is nonnegative definite. To avoid the constraint of the bounds, we can reparameterize $\theta_{jk}$ through the hyperbolic tangent transform as

\theta_{jk} = \frac{\exp(\gamma_{jk}) - 1}{\exp(\gamma_{jk}) + 1},   (2.19)

so that the new parameter $\gamma_{jk}$ has range $(-\infty, \infty)$; the right-hand side of (2.19) is an increasing function of $\gamma_{jk}$. Condition (a) is not sufficient to guarantee that $\Theta$ is nonnegative definite except when d = 2. For d = 2, $\Theta$ is always nonnegative definite, since the determinant of $\Theta$, $1 - \theta_{12}^2$, is always nonnegative. For d = 3, $\Theta$ is a nonnegative definite matrix provided

\det(\Theta) = 1 + 2\theta_{12}\theta_{13}\theta_{23} - \theta_{12}^2 - \theta_{13}^2 - \theta_{23}^2 \ge 0;   (2.20)

this constraint is satisfied for about 61.7% of the cube $[-1, 1]^3$ for $(\theta_{12}, \theta_{13}, \theta_{23})$. For d = 4, only about 18.3% of the hypercube $[-1, 1]^6$ leads to a nonnegative definite matrix $\Theta$; see Rousseeuw and Molenberghs (1994). Theoretically, the constraint (b) causes no trouble for the usefulness of the model. But numerically, this constraint may be a problem, since the space in which the numerical computation can be carried out is quite limited. For the numerical computation to succeed, we have to guarantee that the current values stay within the constrained space, which in some situations (e.g. when the true parameters are close to the boundary of the space) may make the computation time-consuming or even infeasible. In some situations, these problems with constraint (b) can be avoided by restricting attention to a simple correlation structure for which the nonnegative definite condition is always satisfied; examples include an exchangeable correlation matrix with all correlations equal to the same $\theta$, and an AR(1) correlation matrix with (j, k) component equal to $\theta^{|j-k|}$ for some $\theta$.
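The percentages quoted above for d = 3 and d = 4 (cited from Rousseeuw and Molenberghs 1994) are easy to reproduce by Monte Carlo; the following is my own sketch, checking nonnegative definiteness through the smallest eigenvalue:

    import numpy as np

    rng = np.random.default_rng(0)

    def psd_fraction(d, n_trials=50000):
        """Fraction of uniformly drawn off-diagonal entries in [-1, 1] giving a
        nonnegative definite d x d correlation matrix."""
        m = d * (d - 1) // 2
        iu = np.triu_indices(d, k=1)
        count = 0
        for _ in range(n_trials):
            r = np.eye(d)
            vals = rng.uniform(-1.0, 1.0, size=m)
            r[iu] = vals
            r[(iu[1], iu[0])] = vals
            if np.linalg.eigvalsh(r)[0] >= 0.0:
                count += 1
        return count / n_trials

    print(psd_fraction(3))  # about 0.617
    print(psd_fraction(4))  # about 0.183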
3. Multivariate probit model with covariates. The classical multivariate probit model for binary response vectors $Y_i$, $i = 1, \ldots, n$, with covariate vector $x_{ij}$ for the jth univariate marginal parameter, in its latent variable representation, is $Y_{ij} = I(Z_{ij} \le \alpha_j + \beta_j' x_{ij})$, $j = 1, \ldots, d$, $i = 1, \ldots, n$, where $Z_i \sim N(0, \Theta_i)$, $\Theta_i = (\theta_{ijk})$. A modelling question is whether the dependence parameters should also be functions of covariates; if so, what are natural functions to choose, so that the $\Theta_i$ are all correlation matrices? If $\Theta_i$ does not depend on any covariates, then the $Z_i$ are iid $N(0, \Theta)$, with $\Theta_i = \Theta = (\theta_{jk})$. If $\Theta_i$ depends on some covariate vectors, say $\theta_{ijk}$ depends on $w_{ijk}$, then to satisfy $|\theta_{ijk}| \le 1$ we can let

\theta_{ijk} = \frac{\exp(\gamma_{jk,0} + \gamma_{jk}' w_{ijk}) - 1}{\exp(\gamma_{jk,0} + \gamma_{jk}' w_{ijk}) + 1}.   (2.21)

Since all $\Theta_i$, $i = 1, \ldots, n$, must be nonnegative definite, this may be a very strong restriction on the regression parameters $(\gamma_{jk,0}, \gamma_{jk})$; in some situations, choices of the parameters $(\gamma_{jk,0}, \gamma_{jk})$ in (2.21) making all $\Theta_i$ nonnegative definite may not exist.
The multivariate Poisson model with multi- normal copula for a count response vector Y is that in (2.14), where the copula is the multinormal copula (2.4) and has the form (2.23). This model has the C U O M and M U B E properties. The univariate marginal parameters Xj can be transformed to rjj — log(Aj) so that rjj has range (—00,00). For a random vector Y j , i = 1, . . .,n, if there is a covariate vector x̂ - for Xij, a possible way to Chapter 2. Foundation: models, statistical inference and computation 30 include x,j is by letting rjij = ctj + Pj^ij, where rjij — log(Ajj). Similarly, if 0j = (Oijk) with 9ijk depending on a covariate vector Wjjfc, a possible way to include vfijk is by letting 9ijk have the form (2.21). The difficulties with adding covariates to 0, remain, as in the previous example. • E x a m p l e 2.12 ( M M D Poisson model) 1. General models. Let Y = ( Y i , . . . , Yd) be a random vector of count data, where Yj, j = 1 , . . . , d, has a Poisson distribution. The MMD Poisson model for the random vector Y is /•oo /.oo d p P(vi---Vd)= •• / ]Jfj(yj^j)c(G1(m),...,Gd(rjp))'[[gj(r}j)dm---drlp, (2.24) Jo Jo j = 1 j = 1 where fj(yj;\j) = exv-xi\y /yj\ (2.25) is the probability mass function of a Poisson distribution for Yj given the parameter Xj. In (2.24), T) = (rji,..., r)p)' is a p x 1 vector of the collection of functions of A i , . . . , Xd; it is assumed to be random with a density function c(Gi(n{),...,Gp(r)d))\Yj=i 9i(^i)^ where c(-) is the density function of a copula C and gj (•) the marginal density of rjj. The model can cover a wide range of dependence through appropriate parametric families of the copula C. Through conditional expectations one can study the covariances and correlations of Y . If Xj = rjj, j = 1 , . . . , cf, we have E ( y i ) = E(E(Y >|A i)) = E(A i ) , Var(Y i) = E(Var(Y i | Xj)) + Var(E(Y j [A,-)) = Var(A;) + E(A,-), (2.26) [ Cov(Yj, Yk) = E(Cov(Y j , Y^A,-, A*)) + Cov(E(Y j |A,-), E(Y 2|A f c)) = Cov(Xj, Xk). Therefore the correlation of Yj and Yk is Con(Yj,Yk) = { [ V a r ( ^ } + ^ [ J ^ j + E ( A , ) ] } i / 2 ) (2-27) which has the same sign as the correlation of Xj and Xk. Corr(Yj,Yi:) is smaller than Corr(Aj, A*,) and tends to Corr(Aj, Afc) when E(Aj)/Var(Aj) and E(Afc)/Var(Afc) tend to zero. When Xj = n, j = 1 , . . . , d, Y is equicorrelated with Corr(Yj, Yk) = Var(?7)/[Var(^)+E(77)]. The range of dependence for this special situation is quite restricted. For the general model (2.24), the parameters are introduced by the marginal distribution of r\j and the copula C. Letting the parameters depend on covariates is possible, as we can see from the next example with a specific copula. 2. Multivariate Poisson-lognormal model. The Multivariate Poisson-lognormal model for a random Poisson vector Y is that in (2.24), where the copula is the multinormal copula (2.4), and T]J has a Chapter 2. Foundation: models, statistical inference and computation 31 lognormal distribution with parameters fij and <Xj. The pmf for Y = y is P(yi--yd)= •••/ Y[fj(yj;*j)g(v,i*,<r,Q)dm---dr)i Jo Jo (2.28) where fj(yj; \j) is of the form (2.25), and gd(ri;ti,<T, 0) = 1 -^(log»?-/i)'(<T /e(T)- 1(logi7-p)} , (2.29) (27r)^(7?1...77p)|<T'0(T|l/2 exp with rjj > 0, j = 1 , . . . ,p, is a multivariate lognormal density function. The model (2.28) has the C U O M and M U B E properties. The parameters in the model are p. = (m,..., Ud)', o = (ax,..., ad)' and 0 . By (2.26) and (2.27), we have The margins are overdispersed Poisson since Var(Yj)/E(Yj) > 1. 
|Corr(Y), Yk)\ is less than |Corr(r?j, r]k)\ and Corr(Yj, Yk) approaches Corr(?7j, rjk) when a,j, ak —* oo. A covariate vector x can be included in the model, say by letting the components of (i be linear functions of x. a can be assumed to have some special pattern, for example a\ = • • • — ap — a. It is harder to naturally let the correlation matrix 0 depend on covariates, as already discussed for the multivariate probit model for binary data. • E x a m p l e 2.13 ( M M D m o d e l for b i n a r y data) 1. General models. Let Y = ( Y i , . . . , Y d ) ' be a binary random vector. Assume that Y has the M C D binary model in Example 2.10 for a given cut-off point vector a = (cxi,..., ad)' • a in turn is assumed to be a random vector. Let T) = (771,..., t]p) be the collection of functions of a. With the latent variable representation, we have that for given t] E(Yj) = exp{fij + ^a?}d=aj, Var(Y i ) = aj + a][exp(a]) - 1], Cov(Yj, Yj,) = ajak[exp(9jkajak) - 1], j ^ k, (2.30) Y = ( Y i , . . . , Yd)' = ( / (Ai < a i ) , . . . , I(Ad < ad))' (2.31) where A = (Ai,..., Ad)' has a multivariate cdf F, and T) has a multivariate cdf G. Thus P{yi---yd)= / •••/ P(yi---yd\r,)c(G1(r]l),--.,Gp(r]p))]Jgj(T]j)dri1---dr)1 J — 00 J— 00 Chapter 2. Foundation: models, statistical inference and computation 32 where c(G\(rji),..., Gp(rjp)) ]TJ=i 9j(Vj) is the density function of t), with c(-) the density function of a copula C and gj(-) the marginal density of r)j. A more general case is when there is a covariate vector x. In this situation, we may let ctj = Bjt0 + BjX, j = 1 , . . . , d, where the Bj^s and Bj's are random, and Tf is now assumed to be the collection of functions of the random components BJQ'S and Bj's. 2. Multivariate probit-normal model. The M M D probit model is obtained by assuming that in (2.31), A = ( A i , . . .,Ad)' ~ Nd(0,Q) and 17 ~ Np(p,T,), where 6 = (Ojk) is a correlation matrix and £ = (o-jk) is a variance-covariance matrix. Without loss of generality, let us assume 17 = a. Then the M M D probit model of the form (2.31) becomes Y = ( Y i , . . . , Yd)' = < zt),... ,I(Zd < z*d))', (2.32) where Zj = (Aj - a, - Uj)/^/l + <TJJ, zj = m/y/l + a,,, j = l,...,d, and Z = (Zx,...,Zd)' ~ Nd(0,R), where R — (rjk) is a correlation matrix with rjk = (Ojk + fjjfe)/{(l + C j j ) ( l + Ckk)}1^2, j ^ k. This is a special class of multivariate probit model in Example 2.10. When Cjj — 0, it is the multivariate probit model discussed in Example 2.10. This example demonstrates that the intersection of the sets of M C D and M M D models is not empty. It is straightforward to extend such a construction to the more general situation with a covariate vector x, such that OLJ = Bj0 + Bj-x. with the BjfiS and Bj's random. With one covariate Xj, for example, one might take aj = BJQ + BJXJ with B0 = ( B 1 > 0 , 3 d < 0 ) ' ~ Nd(ji0, E 0 ) independent of 8 = (Pi,..., BD)' ~ Nd(p, S) , where Mo = (pi,o,---,Ud,o)', S 0 = (<Tjk,o), Ii = (ui,---,ud)' and E = (<Tjk). Now in (2.32), we have Zj = (Aj - BIFI - BJXJ - p,ji0 - pjXj)/yJl + (Tjjfi + o-jjxj, with Z = (Zi,...,Zd)' ~ Nd(0,R), R = (rjk), such that j ~ { l + % , o + W 2 } 1 / 2 ' 3 ~ " " ' ' /» V (2-33) _ Ojk + tTjJb.O + O'jkXjXk . . . r j k ~ {(1 + <Tjj,o + <Tjjx])(l + o-kkfi + <rnx\)Yl* ' 3* • The function of rjk in (2.33) can be considered as a "natural" form for the correlation parameters as functions of the covariates, since this function representation is derived directly from the linear regression for marginal parameters. 
As long as the conditions for linear regression for marginal parameter hold, rjk will always satisfy the constraints for forming a correlation matrix. For R to be nonnegative definite, it suffice that 0 , So and E be nonnegative definite. These three matrices do not depend on covariates, which is very attractive numerically compared with the nonnegative definite requirement on G,- in (2.21). A special case is Ojk = 0 and <Tjkto = 0 (j ^ k), in which Chapter 2. Foundation: models, statistical inference and computation 33 case the only constraint is that £ be nonnegative definite. Finally, we notice that in contrast to the conventional univariate probit analysis, the regression function in (2.33) for the cut-off points are not linear functions of covariates. Nevertheless, (2.33) can be used in lieu of the multivariate probit model with covariates in Example 2.10, since the parameters in (2.33) are also interpretable. To use the model (2.33), it is necessary to reparameterize the parameters Cjj.o, Cfcit.o, <Tjk,Q, o-jj, <rkk, o~jk and 9jk such that the new parameters have (—oo, oo) as their domain. • 2.2.4 Some properties of M C D and M M D models We summarize some of the properties of M C D and M M D models: 1. M C D and M M D models, constructed through stochastic or latent variable representation, provide a clear probabilistic description of multivariate discrete random phenomenon. In some situations, the pmf and cdf have closed forms; in other situations, the pmf or cdf can be numerically computed in a reasonably short time. Likelihood inference can be used, with the help of the theory in section 2.3 and section 2.4. 2. M C D and M M D models allow flexible choices of multivariate copulas (Multinormal copula, Hiisler-Reiss copula, Morgenstern copula, etc.) as well as flexible choices of all the univari- ate marginal distributions (any discrete distributions: Bernoulli, binomial, negative binomial, Poisson and generalized Poisson, etc.), and they allow relevant covariates to be included in the appropriate parameters in the models. In this way, these two classes of models are able to capture the nature of discrete data in an individual or grouped observation basis, thus they allow the drawing of appropriate inferences from the data. 3. With appropriate copulas, many M C D and M M D models have the C U O M and M U B E prop- erties. The C U O M property, sometimes referred to as "reproducibility" or "upward compati- bility" in the literature, is also sought for modelling longitudinal and repeated measures. With appropriate families of parametric copulas, a wide range of dependence, including negative dependence, is possible. 4. With appropriate copulas, the parameters related to the univariate margins structure and the parameters related to dependence structure can be allowed to vary independently in separate parameter spaces. This is a good property that the multivariate Gaussian model also enjoys. Chapter 2. Foundation: models, statistical inference and computation 34 5. By choosing appropriate marginal distributions, the M C D and M M D models can naturally account for a variety of situations occuring with discrete data, such as over-dispersion which is independent of covariates, skewed distributions, multimodality, etc. 6. For a given d-variate discrete distribution F, there may be many copulas which match F into M C D model class; M C D models are robust in terms of data modelling with copulas of similar structure. 
Some of the points above will be made clear in Chapter 3 as well as in Chapter 5. 2.3 Inference functions of margins For a general multivariate model, parameter estimation is often a difficult computational issue. Without readily available parameter estimation methods, any model, even though interpretable, will not have practical usefulness. For situations involving univariate models, many methods have been devised for parameter estimation, ranging from the method of moments through formal maximum likelihood to informal graphical techniques. The maximum likelihood approach is used in general because it has a number of desirable statistical properties. For example, under general regularity conditions, M L estimators are consistent, and asymptotically normal. With some weak additional assumptions, the M L E is also asymptotically efficient. However, the method has not been successfully applied for estimating the parameters of multivariate models, except for the multivariate normal and a few cases with low dimension (e.g. d = 2). A primary cause of this unsatisfactory situation is the computational difficulty involved with multivariate models, even with modern powerful computers. The M L approach for parameter estimation in multivariate situations is still not routine. The •question is: can we have a general effective estimation procedure to estimate parameters for a model in the M C D and M M D classes? In this section, we first discuss model fitting strategies for multivariate models in subsection 2.3.1. One strategy leads to the inference functions of margins approach, that we propose as the parameter estimation approach for M C D and M M D models with the M U B E , P U B E or M P M E properties. In subsection 2.3.2, we introduce some important results in inference function theory for multiple parameters needed for developing the inference basis for M C D and M M D models. In subsection 2.3.3, we introduce the inference functions of margins (IFM) approach and give some examples. Chapter 2. Foundation: models, statistical inference and computation 35 2.3.1 Approaches for fitting multivariate models There are at least three possible likelihood-based approaches to estimate parameters in a multivariate model: Approach 1. A l l univariate and multivariate parameters are estimated simultaneously by maximizing the full-dimensional likelihood function. This is the M L E approach. Approach 2. For a model where all multivariate parameters are in a copula, univariate parameters are estimated from the separate univariate likelihoods. The multivariate parameters are then estimated from multivariate likelihoods with the univariate parameters fixed as estimated from separate univariate likelihoods. Approach 3. For a model with the M U B E , P U B E or M P M E property, univariate parameters are estimated from separate univariate likelihoods. Bivariate, trivariate and multivariate parame- ters are then estimated from bivariate, trivariate and multivariate likelihoods, with lower order parameters fixed as estimated from lower order likelihoods. The first approach is general and direct. While this strategy sounds most natural from the likelihood point of view, it could be computationally very difficult for most of the multivariate models, even in relatively low dimensional situations. The multivariate normal distribution, which can be easily han- dled by this approach, is an exception. 
The second approach makes the computational task easier, but it still has the difficulties of dealing with a multivariate object in general. These difficulties are mainly two: the high-dimensional maximization problem and the multivariate probability calcula- tion. The third approach reduces these difficulties by working with lower dimensional maximizations or lower dimensional probability calculations. This is a valuable approach if the parametric family of interest has the M U B E , P U B E or M P M E properties. It is important because it makes statistical inference for multivariate data easier. Computational tractability is an important factor for the popularity of certain statistical tools, as we observe in many areas of statistics. The third approach to stochastic modeling is often convenient, since many tractable models are readily available for the marginal distributions. It is also invaluable as a general strategy for data analysis in that it allows one to investigate the dependence structure independently of marginals effects (through copula) and computationally only dealing with lower dimensional (often two-dimensional) models. Chapter 2. Foundation: models, statistical inference and computation 36 E x a m p l e 2.14 Consider the multivariate probit model for a d-dimensional binary vector Y with pmf 2 2 P(Vi • • • Vd) = J2 • • • E• '• + * ' ' <M*~ 1 K- , ) . • • •. J ; © ) , (2-34) »'i=i »'<i=i where 0 = (Ojk), o,ji = Gj(yj — 1) and a,j2 = Gj(yj), with C7j(l) = 1 and Gj(0) = 1 — $(ZJ). This model has the C U O M and M U B E properties. For estimation from a random sample of iid Y i , . . . , Y „ , the three approaches for fitting multivariate models could be used here: Approach 1. Estimate the parameters z — (z\,..., zd)' and 0 by maximizing the multivariate like- lihood L = f]r=i P(yn ''' Vid)- Let the resulting estimates be z and 0 . Approach 2. (a) Obtain the estimates z = (z\,..., zd)' by maximizing separately d univariate marginal likelihoods, (b) Estimate the parameters 0 from the multivariate likelihood L = nr=i P(yn '' ~yid) with the parameters z fixed at the estimated values i from (a). Approach 3. (a) Obtain the estimates z = (z\,... ,zd)' by maximizing separately d univariate marginal likelihoods, (b) Estimate the parameters Ojk, 1 < j < k < d, by maximizing separately d(d— l ) /2 bivariate likelihoods Ljk — Y\7=i Pjk(yijyik) with the parameters Zj,Zk fixed at the estimated values Zj,Zk from (a). Let the resulting estimate be 0 . Approach 1 is computationally demanding, since it requires the calculation of high-dimensional multinormal probabilities and a numerical optimization on many parameters. Approach 2 reduces the numerical optimization problem to fewer parameters, but the high-dimensional multinormal probability calculation is still required. Approach 3 reduces the numerical optimization in Ap- proach 2 into several numerical optimizations, each involving fewer parameters. Further, the high- dimensional multinormal probability calculation is no longer required; all that is needed are the binor- mal probability calculations, which are readily feasible with modern computers. Multi-dimensional calculation are needed for predicted or expected frequencies, but this is much less effort compared with multi-dimensional numerical integrations within a numerical optimization procedure. Since it is computationally easier to obtain z and 0, a natural question is what is the asymptotic efficiency of z and 0 compared with z and 0 . 
In Chapter 4, we will deal with this problem in a general context. • Chapter 2. Foundation: models, statistical inference and computation 37 2.3.2 Inference functions for multiple parameters Introduction The common approach to the problem of estimation is to propose an estimator T(x) and then study its properties. For estimators with specific properties such as unbiasedness, minimum variance or minimum mean squared error, or asymptotic normality, theories for ordering these estimators are developed. Standard methods for obtaining the estimator T(x) include least squares (LS), maximum likelihood ( M L ) , best linear unbiased, method of moments, uniform minimum variance ( U M V ) , and so on. However, many point estimation procedures may be viewed as the solution of an (or some) appropriate estimating equation(s). Indeed, any estimator may be regarded as a solution to an equation or a set of equations of the form \P(x,0) = 0, where $ is a vector of functions (or a single function in the one-parameter case) of the data x and the parameter 6. ty(x,6) is commonly called a vector of inference functions or estimating functions. In this thesis, we use mainly the term "inference functions". But when we focus more on the use of the inference functions for estimation, we also employ the term "estimating functions". The theory of inference functions is studied in , for example, Godambe (1960, 1976, 1991), McLeish and Small (1988) and Jorgensen and Labouriau (1995). The theory of inference func- tions imposes optimality criteria on the function \S' rather than the estimators obtained from it. The approach of considering a class of inference functions and finding the optimal inference function has the advantage of retaining the strengths of the estimation method (e.g LS, M L , U M V ) and at the same time eliminates some of their weaknesses. For example, in point estimation, the Cramer-Rao lower bound is attained only in rare occasions whereas the optimality of the score function among inference functions holds merely under regularity conditions (see below). Inference functions may be used either as estimating equations to determine a point estimate or as the basis for constructing tests or confidence intervals for the parameters. A n example is the maximum likelihood estimators, which are obtained as the solutions of estimating equations from the score functions. Thus the inference functions for M L E are the score functions. Other examples of the application of inference functions are the theory of M-estimators for obtaining robust estimators and the quasi-likelihood methods used in generalized linear models. Inference functions have also found application in a wide variety of applied fields; examples in biostatistics, stochastic processes, and survey sampling can be found in Godambe (1991). In the following, we introduce the notion of regular inference functions and study the asymptotic properties of resulting estimates in the iid situation. Chapter 2. Foundation: models, statistical inference and computation 38 Inference functions for a vector parameter In the following, we wil l give a series of definitions for the inference functions for a vector of pa- rameters and a general asymptotic result for the parameter estimates from the defined inference functions. Let us consider a parametric family T defined on a common measurable space (y ,A), where A is the cr-field associated with y. We further assume T= {P(y,6) :$£ ft}, ftC (2.35) where 6 = (6\,.. 
.,0q)' is g-component vector, and ft the parameter space. The parameter space is usually a subset of (/-dimensional Euclidean space. We presume the existence of a measure p. on y such that for each fixed value of the parameter 0 the function P(y; 6) is the density with respect to p of a probability measure V on y. Definition 2.9 (Inference functions) A Rq-valued vector of functions *(y;0) = (V>i(y;0),.-.,^(y;0))T : ^xft^TR* is called a vector of inference functions, if the component functions of \&(y; 0) are measurable for each fixed 8= (6i,...,0q) ef t . • Definition 2.10 (Unbiased inference functions) \P is said to be unbiased if for each 6 E ft and j = 1,.. .,q, Efjlijj} = 0, where Eg means expectation relative to P(-;6). • Unbiasedness is a natural requirement which ensures that the roots of the equations are close to the true values when little random variation is present. Whereas 6 may not have an unbiased estimator, unbiased inference functions exist under fairly general circumstances. For any given inference function vector ^ and any y E y, an estimator of 0, say 6 — 6(y), can be obtained as the solution to \t = 0. In order for the estimate 6 to be well-defined and well-behaved, the inference function vector $ must satisfy some regularity conditions, that is, * must consist of regular inference functions. Definition 2.11 (Regular inference functions) The vector of inference functions $ is said to be a vector of regular inference functions if, for all 6 E ft, the following assumptions are satisfied: 1. The support ofy does not depend on any 6 E ft. 2. E{i>j} = 0, j = l,...,q. Chapter 2. Foundation: models, statistical inference and computation 39 3. The partial derivative dyjj/dOk exists for almost every y £ y, j,k = 1 , . . . , q. 4- The order of integration and differentiation may be interchanged as follows: ^- J ^P{y; 0)dp{y) = J [ ^ P ( y ; 6)] dp(y), j, k = 1 , . . .,q. 5. E{ipjij)k} exists, j,k = 1 , . . . , q, and the q x q matrix M$(0) = E{WT} is positive-definite. 6. The q x q matrix is non-singular. • A model P(y; 6) in (2.35) is said to be regular, if the score functions are regular inference functions and 5ft is an open region of Mq. We are only interested in regular models, such that the asymptotic theory concerning M L E s is readily available for use. This is not a strong assumption for applications. (The main limitation may be the exclusion of models in 1 of Definition 2 .11.) D e f i n i t i o n 2.12 (Fisher i n f o r m a t i o n matr ix ) The Fisher information matrix is the matrix- valued function I : 5ft —• Ftqxq defined by I(6) = E{U(6)UT(6)}, where U(6) is the vector of score functions, U{B) = d/dB\ogP(y;6). • D e f i n i t i o n 2.13 ( G o d a m b e i n f o r m a t i o n matr ix ) For a regular inference function vector^!, the Godambe information matrix is the matrix-valued function J$ : 5ft —• R q x q defined by M$) = Dl{6)M^{6)D<i{6), where My(6) = E{WT} and £>$(0) = E{dV/dO'}. • Chapter 2. Foundation: models, statistical inference and computation 40 Consider n iid observations y x , . . . , y„ from a model P(y; 0) in (2.35). Let ^(y,-; 8) = (ipn,..., ipiq)' • The inference function vector based on the n observations is \£„ : yn x ft —• IR9 given by n 8 = 1 We define the estimator 0 = 0(yi,..., y n) as the solution of \P„ = 0. The following theorem establishes the asymptotic normality of the solution 0 based on regular inference functions and gives an asymptotic interpretation of the Godambe information matrix. 
Theorem 2.1 Assume that the estimator 0 = 0(y1,... ,yn) associated with the regular inference function vector \Pn : yn x ft —+ IR9 is a y/n-consistent estimator of 0, that is, y/n(0j — Oj), j = l,...,q, is bounded in probability so that 9j tends to 9j at least at the rate ofl/y/n. We further assume that there exist functions Mjki(y) such that \d2ipj/d9kd9i\ < Mjki(y) for all 0 G ft, where E{Mjki(y)} < oo for all j,k,l. Then as n —• oo, we have asymptotically V^(0-0)°Nq(O,J^(0)) under P(-;0). Proof. The proof is similar to the corresponding theorem for the asymptotic normality of the M L E . We therefore only sketch it. * n has the following expansion around 0 O = Vn(0) = yn(0) + Hn(0)(0-0) + Rn, where Hn is a q x q matrix d^n/d0 and R n = Op(||0 — 0\\2) = o p ( n _ 1 ) by assumptions. Thus ^(6 -0) = l-Hn{0) n 1 1 - = [ - * „ ( * ) - R „ ] . (2.36) By the Law of Large Numbers -Hn{0)^{0). n Now for any fixed vector u = ( u i , . . . , uq)'', consider the sequence of one-dimensional rv's 4 , U l f « + . . . + u f ^ ; « ) Chapter 2. Foundation: models, statistical inference and computation 41 By the central limit theorem (Lindberg-Levy), u'9n/y/n is Ni(0, u ' M * u ) . This result leads to Applying Slutsky's Theorem to (2.36), we obtain y/H(8 - 0)^NQ(O, D^M^D^f) or V^(e-e)^Nq(o,j^(e)). • O p t i m a l i t y cr i teria for inference functions In this subsection, we will summarize optimality results for inference functions in the multi-parameter situation. These results will be referred to later for comparing two sets of regular inference functions. Consider a scalar inference function \P. It is natural to seek an unbiased estimating function \t for which the variance E{\&2} is as small as possible. This is analogous to the theory of minimum variance unbiased ( M V U ) estimation. Since the variance may be changed by multiplying \? with an arbitrary constant, some further standardization is necessary for the purpose of comparing variances. Godambe (1960) suggested considering the variance of the standardized estimating function \t s = ^/E{d^/d9], and defined an optimal estimating function to be one which minimizes Var(^ r J ) = E{\t2}/{E(d\E ,/3#)}2, or maximizes Var - 1 ( \T/j) , the Godambe information for <£. Godambe showed that in the one-parameter case the usual maximum likelihood estimating equation has this optimal property within a wide class of regular unbiased inference functions. Thus Godambe information can be used to compare two regular inference functions, and the function with the larger Godambe information is generally preferred. Given two vectors of inference functions, \P and Q, several different optimality criteria can be used to say that fi is preferred (or optimal) to D e f i n i t i o n 2.14 ( M - o p t i m a l i t y ) A vector of inference functions is said to have matrix opti- mality or M-optimality versus a vector of inference functions ^ if the difference of the inverses of the Godambe information matrices is non-negative definite. • Chapter 2. Foundation: models, statistical inference and computation 42 Definition 2.15 (T-optimality) A vector of inference functions is said to have trace optimality or T-optimality versus a vector of inference functions ^ if the difference of the trace of the inverse of Godambe information matrices TriJ^(e))-Tr{J^(d)) is positive. 
• Definition 2.16 (D-optimality) A vector of inference functions is said to have determinant optimality or D-optimality versus a vector of inference functions \P if the difference of determinant of the inverse of Godambe information matrices \J^(6)\-\J^(6)\ is positive. Chandrasekar and Kale (1984) proved that M-optimality implies T-optimality and D-optimality. Joseph and Durairajan (1991) further proved that the above three criteria are equivalent in the sense that if $ is optimal with respect to any one of the three criteria then it is also optimal with respect to the remaining two. When comparing two sets of regular inference functions, we could examine a slightly different version of T-optimality and D-optimality. For example, for T-optimality, we may examine Tr(J^(e))' and for the D-optimality j\Jn\')\ In practice and often in simulation studies, only the estimated values of J^1(6) and J^iO) are available, M-optimality or T-optimality or D-optimality may be violated slightly numerically based on only one set of observations. We end this subsection by stating an extended Cramer-Rao inequality for inference functions: Theorem 2.2 For any given vector of regular inference functions \P, and for all 6 6 5ft, J$l(8) — I - 1 (6) is non-negative definite. For a proof of this result, see Jorgensen and Labouriau (1995). Related references include Ferreira (1982) and Chandrasekar (1988), among others. • Chapter 2. Foundation: models, statistical inference and computation 43 This theorem states that, for a regular model P(y;6), the vector score functions nta\ dlogP(y;fl) fd log P(y,$) dlogP(y;6)\ u { 6 ) - dB - 1, del Wk ) are M-optimal within the class of all regular unbiased estimating functions. 2 .3 .3 I n f e r e n c e f u n c t i o n o f m a r g i n s We have seen from previous subsection that, under fairly general regularity conditions, the score functions are asymptotically optimal among regular inference functions. However, with multivariate models, except in a few special cases (e.g multivariate normal), the estimating equations based on the score functions are computationally very cumbersome or intractable. It would be an invaluable alternative to have inference functions which are computationally feasible in general and also efficient compared to the score functions. In the ensuing subsection, we introduce a set of inference functions, we call the inference functions of margins (IFM). In Chapter 4, we show that I F M shares the asymptotic optimality properties of the score functions, and this is particularly true for the multivariate models with M U B E and P U B E properties. One major advantage of I F M is that it is computationally feasible in general and more flexible for handlinge different types of data. This leads us to develop a new inference theory and computationally feasible procedures for many M C D and M M D models. Inference f u n c t i o n of scores We consider the family (2.35) and assume it is a regular parametric family. The likelihood function of 6, given y, is L(6;y) — P(y;6), the corresponding loglikelihood function is £(6;y) — \ogP(y;6). Let Ln(6) = f[L(6;yi) ! = 1 denote the likelihood of 6 based on y 1 ; . . . ,y„ , a sample from y. The loglikelihood function of 6 based on y l t . . . , yn is 4(*) = l o g M * ) = £ * ( 0 ; y , . ) . »=i D e f i n i t i o n 2.17 (Inference functions of scores, or I F S ) The vector of score functions dtn{6) = fd£n(6) dtn{6) 06 V 30i d9q is called inference function vector of scores, or IFS. • Chapter 2. 
Foundation: models, statistical inference and computation 44 The maximum likelihood estimate (MLE) is generally determined as the solution to the likelihood equations dtn(0)/d0 = 0. The Hessian matrix of the function —£n(6)/n is J(0), where (J(8))jk = -(l/n)(d2£n(0)/d0jd9k). The expected value of J(6), 1(6) = E{J(0)}, is the Fisher information matrix. The value J(0) of /(•) at the maximum likelihood estimate 0 = 6(y1,...,yn) is referred to as the observed information. J(0) will generally be positive definite since 6 is the point of maximum likelihood. A consistent estimate of1(0) is 1(0) = J(0). Under very general regularity conditions, it is known that the M L E s are asymptotically normal, in the sense that as n —• oo, V^(0-0)°Nq(O,I(0)-'). See Sen and Singer (1993, p.209) for a proof. Inference f u n c t i o n of margins We now introduce the loglikelihood function of a margin, the inference function of a margin for one parameter, and then define the inference functions of margins (IFM) for a parameter vector 6. The asymptotic results for the estimates from I F M will be established in the next section. Consider the parametric family (2.35) and assume P(y,0) is a d-dimensional density function with respect to a probability measure p on y . Let Sd denote the set of non-empty subsets of {1,..., d}. For any S € Sd, we use \S\ to denote the cardinality of S. Let Ps(ys) be the 5-margin of P(y;0), where y 5 = {yj :. j G S}. Assume Ps(ys) depends on 0s, where 0s is a subvector of 0. D e f i n i t i o n 2.18 Let 0 = (9\,..., 9q)'. Suppose the parameter 9k appears in S-margin Ps(ys)- The loglikelihood function of the S-margin is ts(8s) = logPs(ys). An inference function for 9k is d£s(0s) 08k ' • The inference function of a margin for a parameter 9 is not necessarily uniquely defined by the above definition. In this thesis, unless specified otherwise, we always work with the inference function from a margin with the smallest \S\. For a specific model, it is often evident when \S\ is the smallest for a parameter, so we will not be concerned with the proof of this feature in most applied situations. If there are two or more inference functions for 9 with the same smallest \S\, than there Chapter 2. Foundation: models, statistical inference and computation 45 is a question of how to combine these inference functions to optimally extract information. We will discuss this issue in section 2.6. Note that with the assumption of M P M E (or M U B E ) , one can use S with \S\ < q (\S\ < 2 for M U B E ) for every parameter Bk. In the case where M P M E does not hold, then one has S = {1,..., d} for some 6k in the model. For the new theory below, we assume M P M E or M U B E or P U B E in the remainder of this chapter. Assume for the parameter vector 8 — {6\,..., 9q)', the corresponding smallest cardinality subsets associated with the parameters are Si,.. .,Sq (as q is usually greater than d, there are duplicates among the Sk's). D e f i n i t i o n 2.19 (Inference funct ions of margins , or I F M ) The vector of inference functions 'd£n,Sl(8Sl) den,sq(6s,)\' is called the inference functions of margins, or IFM, for 6. • For a regular model, the inference functions derived from the likelihood functions of margins also satisfy the regularity conditions. Thus asymptotic properties related to the regular inference functions should apply to I F M . Detailed development of this aspect will be given in section 2.4. 
D e f i n i t i o n 2.20 (Inference funct ions of margins estimates, or I F M E ) Any fl £ $ which is the solution of - = ( ^ ^ y = ° is called the inference functions of margins estimate, or IFME, of the unknown true parameter vector 8. • In a few cases, 6 has an analytic expression (e.g. Example 4.3). In general, 8 has to be obtained by means of numerical methods. E x a m p l e s of inference functions of margins E x a m p l e 2.15 Let X i , . . . , X n be n iid rv's from the distribution C(Gi(xi),...,Gd(xd)',@) with GJ(XJ) = $(XJ; Uj,aj), where C is the multinormal copula (2.4). Let p = (pi,..., fid) and a = (<TI, . . . , ad). The loglikelihood function is n / d \ £n(p,(T,0) = J ^ l o g I C ( * ( X J I ) , • • • , * ( * « ) ; © ) X\j>{.Xij;p.j,aj) \ . Chapter 2. Foundation: models, statistical inference and computation 46 Thus the IFS is vp I F S = (dtn(M,',&) d£n(p,<r,e) d£n{p,<r,Q) V dpi ' " ' ' dpd ' 9<TI d£n{p,<r,Q) d£n(p,er,0) d£n{p,<r,QY do-d ' #012 ded-i,d The loglikelihood functions of 1 and 2-dimensional margins for the parameters p., <r, © are £nj(pj,o-j) - ^ l o g ^ X i j - ^ j , ^ ) , j = 1, . . .,d, n tnjk(8jk,Pj,Pk,0-j,0-k) = ^ log (c($(lij), $(liO; S j O ^ 1 ' ! I Pj,^i)<t>{xik\ Pk,<^k)) , 1 < j < k < d. Thus the I F M is »=i * I F M = (d^"1^1'0"1) dtnd(Pd,o-d) d£ni(pi,o-d) d£nd(pd,o-d) \ dpi ' ' ' ' ' Q^d ' Q(Tl i • • • > > d£n!2(9l2, Pi, P2, Q~l, ̂ 2) d£nd-l,d(9d-l,d, Pd-1, Pd, O'd-lyO'd) d9d-l,d 86 12 It is known that \PIFS and \PIFM lead to the same estimates; see for example Seber (1984). • E x a m p l e 2.16 Let Y i , . . . , Y n be n iid rv from the multivariate Poisson model with multinormal copula in Example 2.11. The loglikelihood functions of margins for the parameters A and 0 are 4 j ( A j ) = ^logPjiyij), j = l,...,d, 8 = 1 n £njk(9jk, Aj, A*) = E l o g P j k i y i i V i k ) , 1 < j < k < d, where Pj(yij) = A f J e x p ^ A , - ) / ^ - ! and P j k ( y i j y i k ) = $ 2 ( $ - 1 ( 6 o ) , $ _ 1 ( M ; tyt) - M * - 1 ^ ) . ^ _ 1 ( a » i f c ) ; î/fc) — *2(^ _ 1(a a-j), ^«~ 1(6ifc); <9jfc) -I- ^ 2 ( ^ _ 1 ( a»i)> ^ _ 1 ( a « J k ) ; ĵJfe). w h e r e aij = Gij(yij - 1), bu - Gij(yij), aik - G i k ( y i k - 1) and bik = Gik(yik), with Gij(yij) = YH=opj(x) a n d G i k ( y i k ) = Zl=0Pk(x)- Let r)j = log(Aj). The I F M for rjj, j = 1,..., d, and 6jk, 1 < j < k < d are * I F M = f E i a P i ( W i ) 1 dPd(yid) E f^Pi(yn) d m ""'friPdim) dnd ' 1 dPi2(yayi2) \^ 1 dPd-\,d{yid-iyid) E fr( Pd-lAVid-lVid) d9d-l,d fri Puivnvn) dd12 For a similar random vector Y,-, i = 1 , . . . , n with a covariate vector x,j for A ^ , a possible way to include X ; J is by letting rjij = aj + BjXij, where rjij = log(A,j). We can similarly write down the I F M for parameters aj, Bj and 9jk. • Chapter 2. Foundation: models, statistical inference and computation 47 E x a m p l e 2.17 ( M u l t i v a r i a t e b i n a r y Molenberghs-Lesaffre model ) We first define the mul- tivariate binary Molenberghs-Lesaffre model (Molenberghs and Lesaffre, 1994), or M - L model. Let Y = ( Y i , . . . , Yd) be a d-variate binary random vector taking values 0 or 1 for each component. A model for Y is defined in the following way. Consider a set of 2d — 1 generalized cross-ratios with values in (0, oo): rjj, 1 < j' < d, rjjk, 1 < j < k < d, .. ., and r\\i-d such that: )?. . _ n( g i l , . . . ,y i ( ,) 6A+ pix-u(Vh • ••«/«) ^ where A+ = {(% 1 , . . ., yjq) G {1,0}* | (q - E ? = i W « ) = ° ( m o d 2)> a n d A , = {1.0}«W. a n d {ii i • • •! jq} is a subset of {1, 2 , . . . , d} with cardinality q. 
We can verify for example when q = 1,2,3,4 that * = ^j^d> P(11)P(00) , ^ . , ^ , ^ = P ( 1 0 ) P ( 0 1 ) ' 1 ^ < f c ^ > P(111)P(100)P(010)P(001) - p(H0)P(101)P(011)P(000)' - 3 < < - ' _ P(1111)P(1100)P(1010)P(1001)P(0110)P(0101)P(0011)P(000Q) tytzm - p( 1 1 1 0)p(iioi)p(ioil)P(1000)P(0111)P(0100)P(0010)P(0001)' 1 - ^ < * < / < m ^ c ( ' (2.38) where subscripts on P are suppressed to simplify the notation. Molenberghs and Lesaffre (1994) show that the 2d — 1 equations in (2.37) together with ^2 P(yi '"' yd) = 1 leads to unique nonnegative solutions for P(yi • • -yd), (yi, • • - ,2/d) G {l,0}d, under some compatibility conditions on the d — 1 and lower-dimensional probabilities. If all these conditions in the Molenberghs-Lesaffre construction are satisfied, we have a well-defined multivariate Bernoulli model. We call this model multivariate M - L binary model. The multivariate M - L binary model is not M U B E , but the parameters rjj and ijjk are P U B E . The special case where rjs = 1 for |5| > 3 is M U B E . Related to the M C D model, it is not clear if there exists a M C D model such that (2.37) is true and under what conditions a M C D model is equivalent to (2.37). The difficulty is to prove there exists a copula, such that (2.37) is a equivalent expression to a M C D model. The existence of a copula is needed in order to properly define this model with covariates (e.g. logistic regression univariate margins). For a discussion of whether the Molenberghs-Lesaffre construction leads to a copula model, see Joe (1996). Nevertheless, (2.37) in terms of Pjl-jq(yj1 • • - yjq) certainly defines a multivariate model for binary data for some ranges of the parameters. Chapter 2. Foundation: models, statistical inference and computation 48 Let Y i , . . . , Y n be n iid binary rv's from-a proper multivariate M - L binary model. Assume the parameters of interest are r) = ( 7 7 1 , . . . ,r)d, 7/12,.. . , nd-i,d)' and let 77s be arbitrary for \S\ > 3. The loglikelihood functions of margins for r) are «=i n enjk(Vjk,rij, Vk) = E l o S PjkiyijVik), 1 < j < k < d. Thus the I F M is lT, fd£„i(r]i) d£nd(rid) d£ni2(m2,m>^2) d£nd-i,d(rid-i,d,Vd-i, * IFM = — ~ , • • •, — * ' a > • • • > x For an interpretation of the parameters (2.37), see Joe (1996). • Some advantages of I F M approach The I F M approach has many advantages for parameter estimation and statistical inference: 1. The I F M approach for parameter estimation is computationally simpler than estimating all the parameters from the IFS approach. A numerical optimization with many parameters is much more time-consuming (sometimes beyond the capacity of current computers) compared with several numerical optimizations, each with fewer parameters. In some cases, optimization is done with parameters from lower-dimensional margins already estimated (that is, there is some order to the sequence of numerical optimizations). I F M leads to estimates of the parameters of many multivariate nonnormal models efficiently and quickly. 2. A potential problem with the IFS approach is the lack of stability of the solution when there are outliers or perturbations of the data in one or few dimensions. With the I F M approach, we suggest that only the contaminated margins will have such nonrobustness problems. In other words, I F M has some robustness properties in multivariate analysis. It would be interesting to study theoretically and numerically how outliers perturb the IFS and I F M estimates. 3. 
A large sample size is often needed for a large dimension of the responses. This may not be easily satisfied in most applied problems. Rather, sparse data are commonplace when there are multiple responses; these often create problems for M L estimation. By working with the lower dimensional likelihoods, the I F M approach avoids the sparseness problem in multivariate situations to a certain degree; this could be a major advantage in small sample situations. Chapter 2. Foundation: models, statistical inference and computation 49 4. The I F M approach should be robust against some misspecification in the multivariate model. Also some assessment of the goodness-of-fit of the copula can be made after solving part of the estimation equations from I F M , corresponding to parameters of univariate margins. 5. Finally, I F M leads to separate modelling of the relationship of the response with marginal covariates, and the association among the response variables in some situations. This feature can be exploited to shorten the modelling cycle when some quick answer on the marginal behaviour of the covariates is the scientific focus. In the above, we listed some advantages of I F M approach. In the next section, we study the asymptotic properties of I F M approach. The remaining question of efficiency of I F M will be studied in Chapter 4. 2.4 Parameter estimation with I F M and asymptotic results In this section we will be concerned with the asymptotic properties of the parameter estimates from the I F M approach. We will develop in detail the parameter estimation procedure with the I F M approach for a M C D or M M D model with M U B E or with some parameters of the models having P U B E properties. The situations we consider include models with covariates. Sufficient conditions for the consistency and asymptotic normality of I F M E are given. Some theory concerning the asymptotic variance matrix (Godambe information matrix) for the estimates from the I F M approach is also developed. Detailed direct calculations of the Godambe information matrix for the estimates based on the data and fitted models are given. A n alternative computational approach, namely the jackknife method, for the estimation of the Godambe information matrix is given in section 2.5. This technique has considerable importance because of its practical usefulness (See Chapter 5). Later in section 2.6, we will propose computational algorithms, which are based on I F M , for the parameter estimation where common parameters appear in different inference functions of margins. 2.4.1 Models with no covariates In this subsection, we confine our discussion to the case of samples of n independent observations from the same distributions. The case of samples of n independent observations from different distributions will be studied in the next subsection. We consider a regular M C D or M M D model in (2.12) P(yi---yd;6), 0 6% (2.39) Chapter 2. Foundation: models, statistical inference and computation 50 where 6 = (di,..., dd, 612, • • •, 8d-\,d)'• The model (2.39) is assumed to have M U B E or to have some of its parameters having P U B E properties. 
In general, we assume that 8j (j = l,...,d) is a parameter vector for the j th univariate margin of (2.39) such that Pj(yj) = Pj(yj',8j), and Ojk (1 < j < k < d) is & parameter vector for the (j, k) bivariate margin of (2.39) such that Pjk(yjVk) = Pjk(yj, Vk\ 8j, 8k, 9jk)- The situation for models with higher order (> 2) parameters are similar; the extension of the results here should be straightforward. For the purpose of illustration, and without loss of generality, we assume in the following that 8j and 8jk are scalar parameters. Let Y , Y i , . . . , Y „ be iid rv with model (2.39). The loglikelihood functions of margins of 0 are 4 j (9j) = E lo§ Pi(yij) »• 3 = 1 > • • •) d , 1=1 n tnjk(0j , Ok, Ojk) = E l o S PjkiVij Vik), 1 < j < k < d. (2.40) «=i These expressions can also be rewritten as 4>j (di) = ]C ni (y>)log Pi )' 3 = 1, • • •, d, {yj} tnjk(0j, 8k,8jk) - njk{yjyk)\ogPjk(yjyk), 1 < j < k < d, {yjVk} (2.41) based on the summary data rij(yj) and rijk(yjyk). In the following we continue to use the expression (2.40) for technical development, for consistency with the case where covariates are present. But (2.41) is a more economic form for computation, and should be used for model fitting whenever possible. Let 1 d. d - - T H < n = f 1 9 P j i y j ) i P j { y j ) . m . . 3 V l " *' } k ) Pjk(yjyk) d8jk ' . . d e f 1 d j(yij) . 1 < j < k < d, 1pi;jk = 1pi,jk(8j,8k,8jk) d e f 1 dPjkjyijyik) l<j<k<d, PjkiVijVik) 08jk for i = l , . . . , n . Let * = tf(0) = (V>i,..., V d , ^12, • • . , V d - i . d ) ' , and = = ( * „ i * B i , * n i 2 , • • • , 9nd-i,d)', where ¥„,• = £ " = 1 (j = 1, • • •, d) and 9njk = £ ? = 1 ^.jk (1 < j < k < d). Chapter 2. Foundation: models, statistical inference and computation 51 From (2.40), we derive the I F M for 0 n » = 1 (2.42) i=l Since (2.39) is a regular model, the regularity conditions of Definition 2.11 are also true for the inference functions (2.42). With the I F M approach, an estimate of 0 that we denote by 0 = 0(vi> • • • J Yn) — • • • > 0d> 012, • • •, 8d-i,d)' is obtained by solving the following system of nonlinear equations '*„,•=(), j = l , . . . , d , ^njk =0, 1 < j < k < d. Proper t ies of estimators We start by examining the simple case (particularly, for notation) when d = 2 to illustrate how the consistency and asymptotic normality of 0 can be established. The model (2.39) is now ^(yi.ifc; ft,02,0i2) (2.43) with 0 = (0i, 02,0i2)' G 3J- Without loss of generality, 0i, 02,0i2 are assumed to be scalar parameters. Let the random vector Y = (Yi, Yjj)' and Yj = (Y,i, YJ2)' (i — 1,...,n) be iid with (2.43), and y, y,- be the observed values of Y and Y,- respectively. Thus the I F M for 0i,0 2,0i2 are i = l n *r>12 = y~]ll>i;12- (2.44) i=l We denote the true value of 0 = (0i,02,0i2)' by 0O = (0i,o, 02,o, 0i2,o)'- Using Taylor's theorem on (2.44) at 0o to the first order, we have 0 = * „ i(0i) = *„ i(0i , o ) + (Oi ~ 0i,o) 0 = * n 2 ( 0 2 ) = *„2(^2,0) + (02 - 02,0) 301 502 0 = ¥n12(6~12,h,6~2) = *nl2(012,o) + (#12 ~ »12,o) dVnl2 60 12 (2.45) + (0i - 0i,o) 5^„12 30i r , + ( 0 2 - 0 2 , 0 ) ^ Chapter 2. Foundation: models, statistical inference and computation 52 where 0* is some value between 9\ and 01,0, Q\ is some value between 02 and 02,o, and 6** is some vector value between 6 and OQ. Note that \ P ni2 also depends on 0i and 02. Let Hn=Hn{6) = 0 0 0 0 3 » i 9 * 1 . 1 2 a s 3 a * „ , Q a « i 2 and D $ = -D*(0) = E{n 1 i 7 „ } . Since (2.43) is assumed to be a regular model, we have that E(\P„) = 0 and non-singular. 
On the right-hand side of (2.45), * „ i , * n 2 , * „ i 2 , d^m/d9u <9#r,2/<902, dVnl2/d912, <9tf„i 2/<90i and d^ni2/d92 are all sums of independent identical variates, and as n —• oo each therefore converges to its expectation by the strong law of large numbers. The expectations of \?„i (0 i ,o ) , ^n 2(0 2 ,o), and *ni 2(0i2,o) a r e zero and the expectations of d^ni/dOx, d^n2/d62, d^ni2/d9i2, are non-zero by regularity assumptions. Since all terms on the right-hand side must converge to zero to remain equal to the left-hand sides, we see that we must have (0i — 0i,o), (02 — 02,o) and (0i2 — 0i2,o) converging to zero a s n - > oo, so that 9\, 92 and 0i2 are consistent estimators under our assumptions (for a more rigorous proof along these lines, see Cramer 1946, page 500-504). Now let HI I \ a*„, 0 0 80! •\ 0 «; 0 8*n,2 0 * n , 12 a«, B" 682 6" a « i 2 \ 3** / It follows from the convergence in probability of 6 to 6Q that Hn{0) - Hn(60) >0. Since each element of n~1Hn(6) is the mean of n independent identically-distributed variates, by the strong law of large numbers, it converges with probability 1 to its expectation. This implies that n~1Hn(6o) converges with probability 1 to Dy(0o)- Thus -Hn(6)^D9(60). n Now we rewrite (2.45) in the following form y/n(0 - 00) = n - 7 = [ - * » ( « o ) ] - (2.46) Since 6\ lies between $i and 0i ] O , 02 lies between 02 and 02,o, and 0** lies between 6 and 0O, thus we also have -H^D9(e0). n Chapter 2. Foundation: models, statistical inference and computation 53 Along the same lines as the proof in Theorem 2.1, we see that 4=M0o)-^iV3(O,M*(0o)), where M*(0 O ) = E(tftf'). Applying Slutsky's Theorem to (2.46), we obtain v ^ ( * - 0O)^N3(O, D*\do)M*(6o)(Dy\0o))T), or equivalently Vt(8-0o)°N3(O,Jyl(6o)). Thus we can extend to the following theorem for the I F M E of model (2.43): Theorem 2.3 Consider the model (2.43) and let the dimension ofOj (j = 1, 2) be pj and that of 612 be P12. Let 6 denote the IFME of 6 under the IFM corresponding to (2-44)- Then 0 is a consistent estimator of 0. Furthermore, as n —• 00, y/H(9-e)^NPl+Pa+Pia(0,J^), where J * = J*(0) = D^M^1 , with M * = E{W} and D<z = E{9^/90'}. • Inverting the Godambe information matrix J$ yields the asymptotic variances and covariances of 0 = ( f li,02, f li2)- We provide the calculations for the situation where 0i,02,0i2 are scalars. The asymptotic variance of Oj, j = 1,2, is n~1\Evb'j]\E9il>j/90j]~2 and the asymptotic covariance of §1,92 is ra-^EVi^tE^Vi/^i]-1^^/^]"1. The asymptotic variance of 0 1 2 is 12 30 12 2 r EV22 + £ E EV? - 2 E E <9Vj_ -1 E 12 [EVi2 -̂]+2n 3 = 1 and the asymptotic covariance of 0i2, Oj is E dip 12 E av_i (90J Eyj12ipj •n E 50* E <9V 12 30* E a_Vj_ 90j [ 99j \ 90~. -1 'M12 90j \ 'diPj d9~ -1 'M\2 [ 90j \ -1 EV1V2 [EV1V2 Furthermore, from the calculation steps leading to the asymptotic variance expression, we can see that 0i, 02 and ,0i2 are ^/n-consistent. Chapter 2. Foundation: models, statistical inference and computation 54 Now we turn to the general model (2.39) where d is arbitrary. As we can see from the detailed development for d — 2, it is straightforward to generalize Theorem 2.3 to the case where d > 3, since in the general situation of the model (2.39), the corresponding I F M are n *nj=X]^»:J' J = 1> •••><*, t=l ®njk = ^2 1 < J < k < d- t=l In (2.39), 0j (j — 1 , . . . , d) and 9jk (1 < j < k < d) can be scalars or vectors, and in the latter case, ipj(0j) and ipjk(8jk) are function vectors. 
The asymptotic properties of 0 for a general d is given by the following theorem: Theorem 2.4 Consider the model (2.39) and let the dimension of Oj (j = 1, . . .,d) be pj and that of 9jk (1 < j < k < d) be pjk. Let 0 denote the IFME of 6 under the IFM corresponding to (2.42). Then 0 is a consistent estimator of 6. Furthermore, as n —• oo, we have asymptotically M6-6)^Nq(0,J^), where q = £ ? = 1 pj + J2i<j<k<dPi>°> J * = J*(d) = D^M^D*, with M * = M*(0) = Eg{VW'} and = E{8V/80}. • For many models, we have pjk = 1; that is 9jk is a scalar (see Chapter 3). The asymptotic variance-covariance of 6 is expressed in terms of a Godambe information matrix J$(0). Assume 9j (j = 1, . . .,d) and 9jk (1 < j < k < d) are scalars. Let us see how we can calculate the Godambe information matrix J$. We first calculate the matrix M $ and then Since M $ = E(\P\I>'), we only need to calculate its typical components E(ipjtpk) (1 < j, k < d), E(i/>>ipkm) {k < m) where j may be equal to k or m, and E(ipjkiptm) (j < k,l < m), where j may be equal to / and k may be equal to m. For E(ipjtpk), we have \pj(yj) pj(yk) 99j 09k j pik(yjVk) OPj[yj)0Pk{yk) It can be estimated consistently by { s ^ } pi(yj)pk(yk) d9j 89k dPj(ya) 8Pk(yik) l ^ U 1 _ n h i pj(yij)pk(Vik) de{j 89i]k (2.47) Chapter 2. Foundation: models, statistical inference and computation 55 or equivalently by 1 njk(VjVk) 9Pj(yj) dPk(yk) based on the summary data. For the case j = k, we need to replace Pjk(yjUk) by Pj(yj), {yjyk} by {yj} and rijk(yjyk) by nj(yj) in the above expressions. For E(ipjtpkm) (k < m), we have E ( ^ k m ) = E ( 1 dPj(yj)dPkm(ykymy Pj(yj) Pkm(ykym) 96) d9km _ y - Pjkm(yjykym) 9Pj(yj) dPkm(ykym) . ^ , Pj(yj)Pkm(ykym) 96'j d6km \y jykymj It can be estimated consistently by }_ A 1 dPj(yij) 9Pkm(yikyim) (2.48) n Pj(yij)Pkm(yikyim) 96j d6km For the case j = k or j = m, a slight modification on the above expression is necessary. For example for j = k, we need to replace Pjkm(yjykym) by Pjm(yjym), {yjykym} by {%ym} and njkm(yjykym) by rijm(yjym) in the above expressions. For E(tpjkipim) (j < k, I < TO), we have 1 1 dPjk(yjyk) 9Pim(yiymY E ( ^ m ) - E K p . k { y . y k ) Plm{yiym) OOjk 99lm y . Pj * i ~ (V< Vk. VIVm) 9Pik(Vi Vk ) dPlm (Mm) ~ ^ , Pjk(yjyk)Pim(yiym) 96jk 99lm {yjykyiymi It can be estimated consistently by I 9Pjk(yijyik) dPim (mmm) i f " i^i PJk(yijyik)Plm(yiiyim) 99jk 99,„ (2.49) For the particular case j — l,k ^ m (or similarly j = m or k = / or j ^ /, k = m cases), we need to replace Pjkim(yjykyiym) by Pjkm(yjykym), {yjykyiym} by {yjykym} and njklm(yjyky,ym) by njkm(yjykym) in the above expressions. For the particular case j = I, k = m, we need to replace Pjkim(yjykyiym) by Pjk(yjyk), {yjykyiym} by {yjyk} and njkim(yjykyiym) by rijk(yjyk) in the above expressions. Now let us calculate £ ) $ . Since Z)$ = D^(6) is a matrix with (p, q) element E(dipp/89 q) (1 < j, k < q), where xjjp is the pth components of \£ and 9q is the gth component of 6, we only need to calculate its typical component E(8ijjj / 89m) (1 < j,m < d), E(8t}>j/86im) (1 < j < d; 1 < / < ra < Chapter 2. Foundation: models, statistical inference and computation 56 d), F,(dipjk/d6m) (1 < j < k < d; 1 < m < d), and E(dyjjk/dOim) (1 < j < k < d; 1 < / < m < d). 
Since dipj d9„ 1 dPjjyj) dPjjyj) 1 d^Pjjyj) Pj{yj) d9j dOm + Pj(yj) ddjdem ' we have E 1 dPMdPjjyj) 1 d2Pj(yj)\ = i_dPMdPj(yj) It can be estimated consistently.by n PfiVii) d6j 80m (2.50) Because Pj depends only on univariate parameter Oj, thus yjj does not depend on the parameter Oim- So dvjj(0j)/d0im = 0; this also leads to Since we have dipjk _ 1 dPjk(yjyk) dPjk(yjyk) 1 d2Pjk(yjyk) ddm ~ PjMvjVk) dOjk 80m + Pjk(yjVk) d0jkd0m ' E fdjpj£\_ 1 urjkyyj \dOmJ Pjk(yjyk) dOjk 9Pj ( yk) 9Pjk(yjyk) {yjyk} dOr, It can be estimated consistently by _I f 1 dPjk(yjjyik) dPjk(yjjyik) n fr[ P-k(yijVik) dOjk dom (2.51) Similarly, we find Er̂ ,=< 80, lm sr 1 fdPjk(yjyk)\ . _ , , 0, otherwise, where E(dyjjk/dOjk) can be estimated consistently by 9Pjk(yijyik) n tt PUVHV") V dOjk 71 (2.52) 8j8k9jk Chapter 2. Foundation: models, statistical inference and computation 57 2.4.2 Models with covariates In this subsection, we consider extensions of models to include covariates. Under regularity con- ditions, the I F M E for parameters are shown to be consistent and asymptotically normal and the form of the asymptotic variance-covariance matrix is derived. One approach for asymptotic results is given in this subsection, a second approach is given in the next subsection. Our objective is to set forth results as simply as possible and in useful forms; more general theorems for multivariate models could be obtained. Let Y i , . . . , Y „ be a sequence of independent random vectors of dimension d, which are denned on the probability measure space (y,A,P(Y;0)), 6 G Sr. C Mq. The marginal probability measure spaces are defined as (yj,Aj,P(Yj\6)) (j = l , . . . , d ) for j th margin, and (yjk,Ajk,P{Yj,Yk\0)) (1 < j < k < d) for the (j, k) bivariate margin and so on. Particularly, the random vectors Y j (i = 1 , . . . , n) are assumed to have the distribution P(yn---yid\0i), 0i e% (2.53) where P(yn • • -yid',9i) is a M C D or M M D model with univariate and bivariate expressible (PUBE) parameters 0,- = • • •, 6%,d, 8i-,i2, • • •, 0»;d-i,d)- We also assume that Oij (j = l,...,d) is the parameter for the j th univariate margin of Y,- such that Pj(yij) depends only on the parameter Oij, and Oi-jk (1 < j < k < d) is the parameter for the (j, k) bivariate margin of Y,- such that Oijk is the only bivariate parameter appearing in Pjk(yijyik)- Oij and Oi-jk can both be vectors, but for the purpose of illustration, and without loss of generality, we assume they are scalar parameters. Furthermore, we assume for i = 1 , . . . , ra, = 9j(<*'jXij), j = 1 ,2, . . . , d, (2.54) Gijk - hjk(Bjkyfijk), 1 < j < k < d, where the functions gj(-) and hjk(-) are usually monotonic increasing (or decreasing) and differen- tiable functions (for examples of the choice of gj(-) and hjk(-) in a specific model, see Chapter 3), called link functions. They link the model parameters to a set of covariates through a functional relationship. In (2.54), otj = (otji, • • •, ajpj)' is a pj x 1 parameter vector, X j j = (a ; , j i , . . . , Xijpj)1 is a pj x 1 vector of observable covariates corresponding to the response y ; . 3jk = (djki, • • •, Pjkqjk)' is a qjk x 1 parameter vector, and Wijk = (wijki, • • •, wijkqjk)' is a qjk x 1 vector of observable co- variates corresponding to the response y,-. Usually ^ . pj + J2j<k 9J* i s m u c n smaller than the total number of observations n. x,j and Wijk may or may not include the same covariate components. In terms of margins, the covariates X{j and vfijk may be margin-dependent, or margin-independent, Chapter 2. 
Foundation: models, statistical inference and computation 58 or partly margin-dependent and partly margin-independent. The marginal parameters may also be margin-dependent or margin-independent. We consider the problem of how to estimate the regression parameters in the statistical model defined by (2.53) and (2.54). We assume we have a well-structured data set as in Table 1.1. Problems such as the misspecification of the link function, the omission of one or several important covariates, the random nature of some covariates, missing values, and so on will not be dealt with here. From (2.54) and the P U B E assumption, Pj can be considered as a function of atj and Pjk can be considered as a function of ctj, otk, and /3jk. Let y x , . . . , y„ be the observed values of Y i , . . . , Y „ . The loglikelihood functions of margins of the parameter vectors Qfj and f}jk based o n y l l . . . , y „ are Let and lni(ai) = E log-P,(j/«j), j = l,...,d, tnjk(otj,ak,0jk) = J^logPjkiyijVik), 1 < j < k < d. , . d e f 1 9Pj(yij) . = ^jk(eijkpp , l °p^yik), i<j<k<d, PjkiyijVik) oVijk tediPi.j(pi.j) ^ j k J dl~j—• < P i ^ k 5 ^ — 1 * w de—,—' l<3<k<d, (2.55) and . . d e f 1 dPjiyij) . , , n ,n N d e f 1 9Pjk(yijyik) . . . ^ , . , , . = **.JM») = p.k(yijyik) dpikt • l<J<k<d;t = l,...,qjk. Let 7 = (.aflt...,e/d,p12,...,pd-lidy, and y0 = (a'h0,a'2fi,pl2fi,... ,#,_ M i 0 ) ' . where y0 is the true vector value of 7. Let 7 = (a[,... ,ad,p'l2,... ,/?'d_i,d)' be the I F M E . Assume 7, 7 0 and 7 are all q x 1 vectors. Let = *,a(ai) - (il>i-ju • • •,il>i-jPj)', Vijk = VijkiPjk) - (V't-jtl, • • •. 1pi\jkqjk)', *„,• = *„,-(«i) = (*„,-!, • • • , *»j, P i)', Vnjk = VnjkiPjk) - ( * n i t l , • • • , ^njk.qjj, Chapter 2. Foundation: models, statistical inference and computation 59 where * n j s = Yl"=i^i-J> (s = and ̂ njkt = H)"=i V'iyJfct (t = l , . . . , ? j jb) . Let tf<;(7) = ( ^ j i C o i i ) ' , . . . , *<jd(otd)', *i;i2(/?i2)', • • •, *,- id-i,d(Ai-i 1d) /) ,> * » ( 7 ) = E?=i *.-;(7), and we define M „ ( 7 ) = n - 1 ^ E ( * i ; ( 7 ) ^ ; ( 7 ) / ) and £ > „ ( 7 ) = n ^ E [ . ( 2 . 5 6 ) From (2.55), the I F M for a ; and /? J j t are t=i datj (2.57) and the estimating equations based on I F M are jvnj = o, j = l,...,d, \ * „ i t = 0 , 1 < j < k < d. (2.58) With the I F M approach, estimates of a j and 0jk, denoted by otj — ( S j i , . . .,djiPj)' and /? j jb = (^jibi, • • • ,Pjk,qjk)'', are obtained by solving the nonlinear system of equations (2.58). We next give several necessary assumptions for establishing asymptotic properties of otj and (3jk. A s s u m p t i o n s 2.1 For 1 < j < k < d, 1 < / < m < d and i = 1 , . . . , n, we make the following assumptions: 1. (a) For almost all y,- G y and for all 0; G ft, tPi-j, fi-jk, <Pijj, <Pi;jk,j, <Pi-jk,k, <Pi;jk,jk exist. (b) For almost all yt- G y and for every 0,- G ft, dei;j < I<i;j; < Sij; dPjkjyijyik) Mi d2Pjk(yijyik) d PjkiyijVik) dei]k < Li;k', dPjkjyijVik) dOijk d2 Pjk(ynyik) d2 PjkjyijVik) ddhk ^ -̂ i;j k j d2 PjkiyijVik) dOijkdOij ^i'jkj j d2Pjk(vijyik) dOijkdO^k < Tijk,k: where Kij, Lij, Li-k, Lijk, Sij, T{j, Ti-k, T{jk, Ti-jkj andTijk,k are integrable (summable) over y , and are all bounded by some positive constants. Chapter 2. Foundation: models, statistical inference and computation 2. 
(a) and E E Pi(v<j) = opM, i = 1 {ya • \<Pi-,j\>n} n E E PjkivijVik) = o p ( i ) , l 8 = 1 {ya,yik • \<Pi-,jk\>n} E E <Pi-jP}(Vij) = Op(.n2), »' = 1 {Vij l̂ ii>l<"} n E E <Pi;j<Pi;kPjk(yijyik) = Op(n2), i = l {ya.yik •• sup(|ipi ; i|,|V J ; f c|)<n} n E E <PiJ<Pi;lmPjlm(yijyuyim) = Op(n2), i = 1 {y*i ,y ; i ,yim : sup(|v3 i ij|,|^ i ; l m|)<n} n E E <pljkpjk(yijyik) = op{n2), * = 1 {j/ii.yifc : \<Pi;jk\<n} n E E <Pi;jk<Pi;imPjkim(yijyikyuyim) = op(n2): I i = 1 {yij,yik,yil,yim • BUp(.\<f>i;jk\,\<Pi;lm\)<n} (b) E E Pj(yn) = opi1)' E E Pjk(y>jyik) = op(l), » = 1 {yij ; l^ - i i , i l>" } » = 1 {ya,yik • \vi-,jk,j\>n} E E Pjkiytjyik) = op(l), E E pik{yayik) I » = 1 {yn,yik • Iviijfc.fc|>n> !=i {ytj.yik • \<Pi;jkjk\>n} Chapter 2. Foundation: models, statistical inference and computation 61 and X) YI 'Pi-Jj PJ ( ) = oP{n2), i = \ {vu • \v>i;jj\<n} n E J2 <pl,jk,jpjk(yijyik) = op(n2), , = 1 {ya.yik •• \<i>i;jk,j\<n] n E (pl,jk,kpjk(yijyik) = op(n2), i = 1 {yi]<yik • \<Pi;jk,k\<n} n J2 '?hkjkpjk(yijyik) = op(n2), , = 1 { y i i . y . fc : \<fii-,jk,jk\<n} n E E <Pi;j,j<Pi;k,kPjk{yijyik) = op(n2), « = 1 {yijVik • ™p(\'f>i;j,j\,\<Pi;k,k\)<n} n E E Pij J<Pijm,l Pjlm(yij yiiyim) — Op(n2), 1 = 1 {yij,yu,yim • sup(|0 i ; ;, i J|,|0 j ; i r n,,|)<n} E E <Pi;j,j(Pi;!m,'mPjlm(yijyuyim) = Op(n2), i = 1 { y u , y i ! , y i m : sup(|vi ;j, J|,|^ i ; l m,,m|)<n} n E E <Pijk,j<Pi;jk,kPjk(yijyik) = Op(n2), ! = 1 {yij.Vik • s u p ( | ( / i i ! j f c , J | , | ¥ > i l i f e , f e | ) < n } n E E <PiJk,j'Pi;lm,lmPjklm(yijyikyiiyim) = Op(n2), * = 1 { y i i , y i k , y i i , y i m : sup(|v> i ; i k, i|,|v. i. I m, ! m|)<n} n E E *Pijk,jk<PiIm,lmPjklm(yijyikyiiyim) = 0p(n2). I » = 1 {yij,yik,yu,yim •. sup(|^i i f c | J k|,|0 j I m i ! m|)< n} • A s s u m p t i o n s 2.2 Let Q G J ? * is the parameter space of y, where 7 is assumed to be vector of length q. Suppose Q is an open convex set. 1. Each gj(-) and hjk{-) is twice continuously differentiable. 2. (a) The covariates are uniformly bounded, that is, there exist an Mo, such that ||x,j|| < Mo, W^ijkW < Mo- Furthermore, ^ , X , J X ? J - , J2iyfijkwTjk have full rank for n > no, where n0 is some positive integer value. (bj In the neighborhood of the true 7 defined by B(8) = {y*eg : "||7* - Til < *}, «5 i 0, 9j(')i 9j('):9j(')i hjk(-), frjfcO and hjk(-) are bounded away from zero. Chapter 2. Foundation: models, statistical inference and computation 62 • A s s u m p t i o n 2.3 For all e > 0 and for any fixed vector ||u|| ^ 0, the following condition is satisfied i i j S S ^ i : £ K , ( r . ) fP(Y;r . ) = o. v nwoy ; •=i{|u'f j(70)|>£(u'Af.(70)u)i/»} • Assumptions 2.1 and 2.2 are needed so that we may apply some weak law of large numbers theorem to prove the consistency of the estimates. Assumption 2.3 is needed for applying the central limit theorem for deriving the asymptotic normality of the estimates. These conditions appear to be complicated, but for special cases they are often not difficult to verify. For instance, for the models we will use in Chapter 5, if the covariates are bounded and have full rank in the sense of Assumptions 2.2, with appropriate choice of the link functions, the conditions are almost empty. Related to statistical literature, Bradley and Gart (1962) studied the asymptotic properties of maximum likelihood estimates (MLE's) where the observations are independent but not identically distributed (i.n.i.d.). Hoadley (1971) established general conditions (for cases where the observations are i.n.i.d. 
and there is a lack of densities with respect to Lebesque or counting measure) under which maximum likelihood estimators are consistent and asymptotically normal. Assumptions 2.1, 2.2 and 2.3 are closely related to the conditions in Bradley and Gart (1962) and Hoadley (1971), but adapted to multivariate discrete models with M P M E property. In particular, the Assumptions 2.1 reflect the uniform integrability concept (for discrete models) employed in Hoadley (1971). Proper t ies of estimators As with the model (2.39) with no covariates, we first develop the asymptotic results for the simplest situation when d — 2 such that Y j = (Yu, Y,-2)' (i = 1 , . . . , n) has the distribution P{ynyi2\0i), (2.59) where 6{ = (0j ; 1, 0j ; 2 , 0j ;i 2). Without loss of generality, 0j ;i, 0j ; 2 , 0 j ; i 2 are assumed to be scalar parameters. We further assume = 9j(<Xj*-ij), J = 1,2, (2.60) A;12 = /ll2O0i2Wj12), where the functions gj(-) and fti2(-) are monotonic increasing (or decreasing) and differentiable functions. In (2.60), otj = (ctji,...,ctjPj)' is a pj x 1 parameter vector and XJJ = ( z j j i , . . . , X i j P j ) ' Chapter 2. Foundation: models, statistical inference and computation 63 is a pj x 1 vector of covariates corresponding to the response variables Yij (j — 1,2). Similarly, /?12 = (/?i2i, • • • , / ? i 2 g i 2 ) ' is a gi2 x 1 parameter vector and w , i2 = ( w i m , • • •, u>ti2 4 l 3)' is a ? 1 2 x 1 vector of covariates corresponding to the response vector y,-, where yj = (yn, yi2) is the observed value of Y j . Theorem 2.5 Consider the model (2.59) together with (2.60) and let y = (ot[, d' 2 , J312)' denote the IFME ofy corresponding to IFM (2.57). Assume y is a q x 1 vector. Under Assumptions 2.1 and 2.2, 7 is a consistent estimator ofy0. Furthermore, under Assumptions 2.1, 2.2 and 2.3, as n —> 00, we have ^ - 1 ( 7 - 7 o ) ^ ( 0 , / ) , where An = Dn1/2(y0)M^/2(y0)(Dn 1 / 2 (7o)) T , with Dn(y0) and Mn(y0) defined in (2.56). Proof: Using Taylor's theorem to the first order, we have * „ i ( a i ) = * „ i ( a i , o ) + ( 0 1 - ari.o) *n2(Q!2) = *n2 («2 , o ) + ( « 2 . - «2,o) 0 * m dati a- 3 * „ 2 da2 * n l 2 ( / ? 1 2 ) = *nl 2(/Vo) + (Pl2 ~ Plifi) 3 * „ 1 2 12 + (di - a i , 0 ) dati dfi12 + ( d 2 - a2,o) (2.61) V 3 * „ 1 2 da2 V where ot\ is some vector value between d i and Qfi,o, &2 is some vector value between d2 and a 2 ] o, and 7** is some vector value between 7 and 70. Note that in (2.61), \Pf,i 2(/?i 2) also depends on d i and d 2 , and ̂ ni2(0\2,o) depends on o^o and at2to- Furthermore, in (2.61), we have f = E ^ ; l ( ^ ; l ) ( ^ K x . - l ) ) 2 + ^ ; l (^ ; l )^Kxj 1 ) ]x j 1 xf 1 , i = l ^ f ^ 1 = iyi.4°iMi(«S*>))2 + ^ ; 2 ( ^ ; 2 ) < ( ^ X i 2 ) ] x j 2 X ? T 2> •=1 = E [ ^ ; i 2 ( ^ ; i 2 ) ( ^ f c ( ^ 2 W i i 2 ) ) 2 + W i i a ^ - i a J ^ O T a W a s M w a a w f ^ , (2.62) 0 7 , 1 2 ,-=i ^ n l 2 Q 9 i 2 ) _ ^ 9a 1 1=1 0 t t n l 2 ( £ l 2 ) = f da2 «=i cVt;l2(#i;l2) / , ; x , / / j 0 T \ 'gg ' ' 9j(<XlXil)h'jk(012Wil2) dfi-i2(9i\2) , . , . , r >. - ^ ( a 2 x « ) ^ i ( i 8 i 2 W i i 2 ) 90j;: Wj^Xj^ , Wj l2x f 2 , Chapter 2. Foundation: models, statistical inference and computation 64 and 3 * „ i ( o i ) / a«2 = 0, 0 * „ i ( a i ) / d 0 1 2 = 0, d^n2{a2)/da1 = 0 and d*n2(a2)/d/312 = 0. To establish the consistency and asymptotic normality of y, note that with Assumptions 2.1, we have that n- 2 E(<M«i,o)*»j(«i ,o) T ) 0 (j = 1,2), and n- 2 E(tf„ 1 2 (/?i2 ,o)*ni 2 (/?i2 ,o) T ) 0. By the assumed monotonicity (e.g. 
/*) and differentiability of gj(-) and hi2(-), g'j(-) is a non- negative function of the unknown orj and the given X j j , and /i'12(-) is a non-negative function of the unknown B12 and the given w , i 2 . From Assumptions 2.1 and 2.2, d^nj(otj)/daj (j = 1,2) and d^ni2(0i2)/d0i2 have full rank. By Markov's weak law of large numbers (see for instance Petrov 1995, p.134), we obtain that n - 1 * „ i ( a i ] 0 ) - ^ p 0 , n " 1 * „ 2 ( a 2 , o ) - > p O and n~11'ni2(/?12|0)-'-P0. Since ^ni(Q!i) = 0, \Pn2(<*2) = 0 and ^n\2(S\2) = 0, by following similar arguments as for the consistency of 6 in the model (2.39), we establish the consistency of y. Now let *n(7) / *» l (« l ) \ * n 2 ( « 2 ) V*nl2(^12) / Hn{y) = I 9 Q t i x aa. 0 0 0 dOtQ and / 9 t t i 0 aa 3 (2.61) can be rewritten in the following matrix form T* 0 0 \ Vn(y - y0) = -HI h*n(7o)]. (2.63) It follows from the convergence in probability of 7 to y0 that ±[Hn(y) - Hn(y0)]^0. Since each element of n~1Hn(y0) is the mean of n independent variates, under some conditions for the law of large numbers, we have 1 -Hn(y0)-Dn(y0)^0, (2.64) where Dn(y0) = n- 1 E{i?„(7 0 )}. Assumptions 2.1 and 2.2 imply that lim ra-2Var(#„(70)) = 0. (2.65) Chapter 2. Foundation: models, statistical inference and computation 65 Thus by Markov's weak law of large numbers, we derive that ^H*n - Dn(y0)^0. (2.66) Next, we note that ^ n (7o) involves independent centered summands. Therefore, we may directly appeal to the Lindberg-Feller central limit theorem to establish its asymptotic normality. From Assumption 2.3, by the Lindberg-Feller central limit theorem, we have " ' * n ( 7 o ) D (u>Mn(y0)uy/2 Applying Slutsky's Theorem to (2.63), we obtain v ^ ^ 1 ( 7 - 7 o ) ^ « ( 0 , / ) , where An = £ > - 1 / 2 ( 7 0 ) M „ 1 / 2 ( 7 o ) ( £ > - 1 / 2 (7o) ) T , and J is a q x q identity matrix. • Next we turn to the general model (2.53) where d is arbitrary. With the Assumptions 2.1, 2.2 and 2.3, Theorem 2.5 can be generalized to the case d > 2. Compared with the d = 2 situation, the I F M for the general model (2.53) is a system of equations (2.58), which do not introduce any complication in terms of the asymptotic properties for I F M E . Thus we have the following: T h e o r e m 2.6 Consider the general model (2.53) with arbitrary d. Lety denote the IFME ofy under the IFM (2.58). Under Assumptions 2.1 and 2.2, y is a consistent estimator of y. Furthermore, under Assumptions 2.1, 2.2 and 2.3, as n —+ oo, we have v ^ A - 1 ^ - 7 0 ) ^ ( 0 , 7 ) , where An — £ » „ 1 / 2 ( 7 0 ) M „ 1 / 2 ( 7 0 ) ( J D „ 1 / 2 ( 7 0 ) ) T , with Dn(y0) and Mn(y0) are defined in (2.56) • We now calculate the matrix Mn(y) and Dn(y) (or just part of the matrices, depending on the need) corresponding to the I F M for aj and 0jk. For example, suppose a,-; is a parameter appearing in Pj(yij) and a k m is a parameter appearing in Pk(y(k). Then the element of the matrix Mn(y) corresponding to the parameters ctji and a k m can be estimated by 1 " 1 dPjjyjj) dPk(yik) (2.67) OtjOt* n JT[ Pj(Vij)Pk(yik) daji dak where j may equal to k. If ctji is a parameter appearing in Pj{yij), and Bkms is a parameter appearing in Pkm{yikyim), then the element of the matrix Mn(y) corresponding to the parameters ctji and Bkms can be estimated by l . f 1 dPjjyij) dPkm(yikyim) n ~[ Pj(yij)Pkm(yikyim) da^ dBkms . , (2.68) ajakamBkm Chapter 2. Foundation: models, statistical inference and computation 66 where k < m and j may equal to k or m. 
Furthermore, if Pjks is a parameter appearing in PjkiyijVik) and fiimt a parameter appearing in Pim(Vi\Vim), then the element of the matrix Mn(y) corresponding to the parameters Pjks and Pimt can be estimated by 1 " dPjkiyijVik) dPlmjyuVim) . . , (2.69) dr idr l l a Idrm 0 i l k 0 I m n ^ Pjk(Vijyik)Plm(yilVim) df3jk, dPlmt where j < k, I < m and (j, k) = (/, m) is allowed. For the elements of the matrix Dn(y), suppose ctji and ctjm are parameters appearing in Pj(yij)- Then the element of the matrix Dniy) corresponding to the expression d^fnji{aji)/dajm can be estimated by 1 ^ 1 dPjjyjj) dPj(yij) (2.70) " i^l Pj(y>i) dai< d0Cjm If ctji is a parameter appearing in Pj(vij) and /?jjts is a parameter appearing in PjkiyijVik), then the element of the matrix Dn(y) corresponding to the expression d^nj(ctji)/dPjks is 0. If Pjks is a parameter appearing in PjkiyijVik) and ai is a parameter corresponding to a univariate margin, then the element of the matrix Dn(y) corresponding to the expression diinjksiPjks)/dai can be estimated by 1 " n t—' dPjkivijyik) dPjkiyijVik) JT[ PjkiVijyik) dfijk* dai (2.71) If Pjks is a-parameter appearing in PjkiyijVik) and /3t is a parameter corresponding to a bivariate margin, then the element of the matrix Dniy) corresponding to the expression d^njksiPjks)/'dPt can be estimated by dPjkiyijVik) dPjkjyijVik) l y ^ 1_ n fri Pjkiyayik) dpjks dpt (2.72) However, as in section 2.5 and the data analysis examples in Chapter 5, it is easier to use the jackknife technique to estimate M„iy) and Dniy). 2.4.3 Asymptotic results for the models assuming a joint distribution for response vector and covariates The asymptotic developments in subsection 2.4.2 treat x , i , . . . , x*d and w , i 2 , . . . , W j d - i , d as known constant vectors. A n alternative would be to consider the covariates as marginal stochastic outcomes of a vector V* , and to consider the distribution of the random vector formed by the response vector Chapter 2. Foundation: models, statistical inference and computation 67 together with the covariate vector. Then the-model (2.53) can be interpreted as a conditional model, i.e., (2.53) is the conditional mass function given the covariate vectors, where the Y t- are conditionally independent. Specifically, let Zs- = I I , i = 1, . . . , n , be iid with distribution F$ belonging to a family T = {F^,6 G A} . Suppose that the distributions F$ possess densities or mass functions (or the mixture of density functions and mass functions) with sup- port Z; denote these functions by /(z; 6). Let 6 = (y' ,r)')', where y = (o^ , . . . , a'd, /3' 1 2 , . . . ,0'd_1 d)' • Assume that the conditional distribution of Y t- given V,- = vt-, P(yi;6i\vi = vi), (2.73) is as in (2.53), that is P(yi;0i\'Vi = v,) is a M C D or M M D model with univariate and bivariate expressible parameters (PUBE) Oi = (<?!(<*;, v , ) , . . . , ed{a'd, v,-), M / ? i 2 , v,) , • • •, Od-iAP'd-i.d, v,-)), where Oi is assumed to be completely independent of 17 (defined below), and is a function of y. We also assume that 9j (j = l,...,d) is the parameter for the jth univariate margin Pj(ytj) of Y,- from the conditional distribution (2.73), such that Pj(yij) depends only on the parameter Qj, and 9jk (1 < j < k < d) is the parameter for the (j,k) bivariate margin PjkiyijVik) of Y,- from the conditional distribution (2.73) such that 6jk is the only bivariate parameter. In contrast to (2.54), here we only need to assume that Bj is a function of otj and v, j = 1, 2 , . . . 
, d, and 9jk is a function oi Pjk and v, 1 < j < k < d, without explicitly imposing the type of link functions in (2.54). The marginal distribution of V ; is assumed to depend only on the parameter vector JJ, which is treated as a nuisance parameter vector. Its density or mass function (or the mixture of the two) is denoted by gj (v,-; rj). Thus /(z,-; 6) = P(yi;0i\Vi = v,-)#(v i ; t,). (2.74) Under this framework, weak assumptions on the support of V,-, in particular not requiring any special feature of the link function, are sufficient for consistency and asymptotic normality of the I F M E in the conditional sense. Let the density of Z = (Y' , V ' ) ' be f(z;6), and that of V be gj(v;rj). Based on our assumptions on P(yi',0i\Vi = V J ) , we see that the joint marginal distribution of Yj and V is ^•,v = P , (%|V = v)^(v ;r ? ) Chapter 2. Foundation: models, statistical inference and computation 68 and the joint marginal distribution of Yj, Yk and V is Pjky = Pjk(yjVk\V = V)3J(V;JJ). For notational simplicity, in the following, we simply write Pj(yj) and Pjk(yjyk) in lieu of Pj(yj |V = v) and Pjk(yjyk|V = v). The corresponding marginal distributions for Z; are P j , V i = Pj(yij)9j(v>;r}), Pj k,\i = Pjk (ytj yik )9j (v,-; >?). Thus we obtain a set of loglikelihood functions of margins for 7 t=i >'=i »'=i n n n Injk = ^ l o g P _ , j f c , V i = E l o g P^J / . - j J / j * ) + E f f j ( V ' ; , ? ) ' 1 < J < ^ < rf- (2.75) 8 = 1 i=l s = l Let , .def 1 dPjy Uj,s = Wj,,(Q!j-,) = ^ L L - , J = l,...,d; s = l,...,pj, Fjy OOLjs u>jk,t = Ujk,t(0jk,t)d= 9 ^ k ' V , 1 < j < k < d; t = 1 and "j.Vi OCtj, Ui-jk,t - ^i;jk,t(0jkt)'=-^:—d^ik'Vi, 1<3 <k <d; t = l , . . . , q j k , ^jk.Xi OPjkt for i = 1 , . . . , n. From (2.75), we derive the I F M for aj and 8jk - 2^ W ' ;J> - 2^ P . v 5 a . = 2^ p.(v..\ d a . =2^Vi-jB, J = l,...,d; s = l,...,Pj 8 = 1 8 = 1 3 , X i 3° 8 = 1 r]{y*j' 0 0 ! 1 S i = l 1 3P J(;/, J) njk,t - V * 1 dpjk,vi 1 dPjk(yijyik) L 8 = 1 fr[Pjk,Yi dPjkt fr( Pjk(yijyik) dpjkt jkt, 8 = 1 1 < 3 < k < d; t = l , . . . , q j k , and the estimating equations for aj and 3jk based on I F M are ttnj,s = 0, j - l , . . . , d ; s = l , . . . , p j , (2.76) (2.77) ftnjM = 0, 1 < j < < d ; t = 1 , . . . , qjk. With the I F M approach, estimates oiaj,j = and 8jk, 1 < j < k < d, denoted by aj = (ctji, • • •,&j,pj)' a n d Pjk ~ (Pjki, • • •>Pjk,qjk)', are obtained by solving the nonlinear system Chapter 2. Foundation: models, statistical inference and computation 69 of equations (2.77). Note that (2.77) is computationally equivalent to (2.57); they both lead to the same numerical solutions (assuming the link functions given in (2.54)). Let fi„j = (firy.i, • • •>tonj,pj)' and £l„jk = (£lnjk,i, • • • ,^njk,qjk)'• Then (2.76) can be rewritten in function vector form { &nj=0, j = l , . . . , d , Vnjk = 0, l < j < k < d . Let Qj = (wj . i , . . . , U j , P j ) ' and Qjk = (ujk,i, • • -^jk^J- Let 0 = ( f i i , . . . , Q'd, fi'12>ft'd_M)'. Let M n = E(Qfl ') , Dn = dCl/dy. Under some regularity conditions similar to subsection 2.4.1, consistency and asymptotic normality for y can be established. Basically, the assumptions that we need are those making 0 a regular inference function vector. A s s u m p t i o n s 2.3 1. The support ofZ, Z does not depend on any 6 £ A. 2. Es{Sl) = 0; 3. The partial derivative dCl/dy exists for almost every z G Z; 4- Assumed andy areqxl vectors respectively, and their components have subindex j = l , . . . , q . 
The order of integration and differentiation may be interchanged as follows: Wi JZU^^'^DTI^ = ^ ^ [ W J A Z ; * ) ] D M Z ) ; 5. E(j{QQ'} G M q x q and the q x q matrix Mn = Es{£in'} is positive-definite. 6. The q x q matrix = dd(6)/dy is non-singular. • Assumptions 2.3 are equivalent to assuming that Mn(y) in (2.56) is positive-definite and Dn(y) in (2.56) is non-singular for certain n > no, where no is a fixed integer. We have the following theorem Chapter 2. Foundation: models, statistical inference and computation 70 Theorem 2.7 Consider the model (2.73) and let y denote the IFME of y under the IFM (2.77). Under Assumptions 2.3, y is a consistent estimator ofy. Furthermore, as n —»• oo, we have asymp- totically where Jn = D^M^1 Da- Proof. Under the model (2.74) and the Assumptions 2.3, the proof is similar to that of Theorem 2.3 We believe this approach for deriving asymptotic properties of an estimate has not appeared in the statistical literature. The assumptions are suitable for an observational study but not an experimental study in which one has control of the v's. Theorem 2.7 is different from Theorem 2.6 in that M n and £>n both depend on the distribution function of V . Nevertheless, because Uij^ = ipij,s and Wijk,t = ipi;jk,t, the numerical evaluation of M n and Mn(y) in (2.59) based on data are the same because only the empirical distribution for ^ is needed. For example, suppose or; is a parameter of Pj(yij) and ctm a parameter of Pk{yik), then the element of the matrix M n corresponding to the parameters a; and ctm can be estimated by which is the same as (2.67). We can similarly obtain (2.68) and (2.69). The same result is true for D n versus Dn(y) in (2.56); they both lead to the same numerical results based on the data. We thus derive the same formulas (2.70)-(2.72) for numerical evaluation of D n - 2.5 The Jackknife approach for the variance of I F M E The calculation of the Godambe information matrix based on regular I F M for the models (2.39) and (2.53) is straightforward in terms of symbolic representation. However, the actual computation of the Godambe information matrix requires many derivatives of first and second order, and in terms of computer implementation, considerable programming effort would be required. With this consideration, an alternative jackknife procedure for the calculation of the I F M E asymptotic variance is developed. The jackknife idea is simple, but very useful, especially in cases where the analytical answer is very difficult to obtain or computationally very complex. This procedure has the advantage of general computational feasibility. V ^ ( 7 - 7 0 ) ^ ( 0 , J n 1 ) . and 2.4. • Chapter 2. Foundation: models, statistical inference and computation 71 In this section, we show that our jackknife method for calculating the corresponding asymptotic variance matrix of 0 is asymptotically equivalent to the Godambe information matrix. We examine the situation for models with covariates and with no covariates. Our main interest in using the jackknife is to obtain the SE of an estimate and not for bias correction (because for multivariate statistical inference the data set cannot be small), though several results about jackknife parameter estimation are also given. The jackknife estimate of variance may be preferred when the appropriate computer code is not available to compute the Godambe information matrix or there are other complications such as the need to calculate the asymptotic variance of a function of an estimator. 
Some numerical comparisons of the,Godambe information and the jackknife variance estimate based on simulations are reported in Chapter 4. The jackknife procedure is demonstrated to be satisfactory. Some general references about jackknife methods are Quenouille (1956), Miller (1974), Efron and Stein (1981), and Efron (1982), among others. A recent reference on the use of jackknife estimators of variance for parameter estimates from estimating equations is Lipsitz et al. (1994), though their one-step jackknife estimator is not as general as what we are studying here, and their main application is to clustered survival data. In the following, we assume that we can partition the n observations into g groups with m observations each so that n = ra x g, m is an integer. We discuss two situations for applying jackknife idea: leaving out one observation at a time and leaving out more than one observation at a time. 2.5.1 Jackknife approach for models with no covariates Let Y , Y i , . . . , Y „ be iid rv's from a regular discrete model P(yi---yd;6), 6 e » in (2.12), and y, y i , . . . , y „ be their observed values respectively. Let 9(6) = (tpi(6),...,ipq(6)) be the I F M based on y, = (i>i-i(6),..., ipi-q(6)) be the I F M based on yit and 9n(6) = (¥„ i (0 ) , . . . , ¥ „ , ( * ) ) be the I F M based on y l t . . . , y „ , where 9nj(6) - £ ? = 1 ^;j(0) (j = 1, • • •, «)• Let 6 - 0i,..., eq) be the estimate of 6 from 9n(6) = 0. Leave out one observation at a t ime Let be an estimate of 0 based on the same set of inference functions \P„ but with the ith observation yt from the data set y1,..., yn deleted, i = 1 , . . . , n. In this situation, we have m = 1 Chapter 2. Foundation: models, statistical inference and computation 72 and g — n. That is, we delete one group of size 1 each time and calculate the same estimate of 0 based on the remaining n — 1 observations. Let 0,- = nO — (n — 1)0(«), and 0(.) = X2"=1 0({)/n. 0,- are called "pseudo-values" in the literature. The jackknife estimate of 0 is defined as -!)*(•)• (2-78) i = l The jackknife estimator has the property that it eliminates the order l / n term from a bias of the form E{6) = 0 + u\/n + U2/n2 + • • •, where the functions u i,U2, . . . do not depend upon n (see Miller 1974). In fact, the jackknife statistic Oj often has less bias than the original statistic 0. The early version of the jackknife estimate of variance for the jackknife estimator Oj was sug- gested by Tukey (1958). It is defined as Vj = ̂ nhf) ~ ~6j)ih ~ h ) T = ̂  B * « - *<•>)(*«> - * 0 ) T - ( 2 J 9 ) In (2.79), the pseudo-values 0,- are treated as if they are independent and identically distributed, disregarding the fact that the pseudo-values 0t- actually depend on the common observations. To justify (2.79), Thorburn (1976) proved that under rather natural conditions on the original statistic 0, the pseudo-values are asymptotically uncorrelated. In practice, the pseudo-values have been used to estimate the variance not only of the jackknife estimate 0j, but also 0. But if the bias correction in jackknife estimate is not needed, we propose to use a simpler estimate of the asymptotic variance (matrix) of 0: ^ = D * ( o - W o - * ) T - (2-8°) « = 1 In our context, unless stated otherwise, we always call Vj defined by (2.80) the jackknife estimate of variance. 
In the following, we first prove that the asymptotic distribution of a jackknife statistic 6j is the same as the original statistic 0 under the same model assumptions; subsequently, we prove that Vj defined by (2.80) is a consistent estimator of inverse of the Godambe information matrix MO). Theorem 2.8 Under the same assumptions as in Theorem 2.4, the jackknife estimate 0j in (2.78) has the same asymptotic distribution as 0. That is, as n —• oo, Mh-0)°Nq(0,M(0)), where J*(0) = D%{0)M^(0)D9(0), with M*(0) = £{tf(0)tf T(0)} and £>*(0) = 89(0)/dO. Chapter 2. Foundation: models, statistical inference and computation 73 Proof. We sketch the proof. For * n (0) = ( t f „ i ( 0 ) , * „ , ( 0 ) ) : y n x $ ^ M9 has the following expansion around 0 o = = + - 0) + R „ , where Hn(0) — 3 * „ ( 0 ) / 3 0 is a g x g matrix and R N = 0P(||0 — 0\\2) = O p ( n _ 1 ) by assumptions. Thus Vn~(0 -0)= [-Hn(0)) ^ ( - * „ ( « ) - R „ ) . Let 4,(i)(0) be ^ n (0) calculated without the ith observation, and H^(0) be the g x q matrix d'9n(0)/d0 calculated without the ith observation. Similarly, we have where Rn_i,< = Op{\\~0(i) - 0\\2) = o ^ n " 1 ) . Since Vn~(0i -0) = n (yn~(0 - 0 ) - x/^(0(i) - 0)) + y/nffi^ - 0), we have yfrih -ey=n (vn-(0 - 0) -1>/»(*(.•) -')) + ̂  E v̂ (*(o - ') \ i=l / i=l Thus Vn(0 j -0) = n + By the Law of Large Numbers, we have and From the central limit theorem, -Hn(0)^D^(0), n — f f ( O ( 0 ) A £ > , ( * ) . -±=yn(0)ZNq(o,M*). Chapter 2. Foundation: models, statistical inference and computation 74 We further have This, together with the -yn-consistency of 0 (Theorem 2.4), lead to -Hn(0)) 1 - i= ( - * „ ( « ) - R „ ) ^ T T ; E { ( ^ T F F ( ' ) C ) )  ( " * ( ' > ( ' l ) " " M J ° " . ( ° . D ; ' M . ( C ; ' ) T ) . Thus by applying Slutsky's Theorem, we obtain ^(ej-0)^Nq(o,j^(0)). • Theorem 2.9 Under the same assumptions as in Theorem 2-4, the sample size n times the jackknife estimate of variance Vj defined by (2.80) is a consistent estimator of J^1(0). Proof. We have - *)(*(0 - Of = - *)(*(.•) - of i=l Recall that i=l L» = l (fl _ fl)T -{0-0) n(O-0)(0-Of. 0-0 = H-1(6)(-9n(O)-Kl), and Chapter 2. Foundation: models, statistical inference and computation 75 where R n = Op{\\9 - 6\\2) = Opin'1) and R „ _ l i 8 - = Op(\\6(i) - 6\\2) = Op{n-1). Thus J2(h) ~ * ) ( ' « ) -~0)T=J2 H^(6) ( - * ( 0 ( * ) - R n - i , i ) ( -*( , •)(«) - R n - i , 0 T i = l t'=l n Li=l -1 H~\6) ( - * „ ( * ) - R „ ) E ^ j M (-*(,)(*)-Rn-i,,) «=i + n / / - 1 ^ ) ( - * „ ( * ) - R n) ( - * „ ( « ) - R n ) T {H-\e)f . (2.81) As * ( 0 ( t f ) = * „ ( * ) - *,,(*), thus E : = i * ( . - ) (») = ( « " ! ) * » ( * ) and £ ? = 1 ^oC*)*^*) = (n - 2 ) * „ ( 0 ) ^ ( 0 ) + £ " = 1 (*)• % t h e L a w of Large Numbers, we have -Hn(B)^*D9(9), n 1 and »=1 From (2.81), we have i=i and this implies that t = l i=l In other words, we proved that nVj is a consistent estimator of J^1(0). • Leave out m o r e t h a n one observation at a t ime Now for general g > 1, we assume a random subdivision of y l t . . . , yn into g groups (n = gm). Let 0(„) = 1,...,(/) be an estimate of 0 based on the same set of inference functions \P from the data set y : , . . . , y n but deleting the u-th group of size m (m is fixed), thus is calculated based on a subsample of size m(g — 1). The jackknife estimate of 6 in this general setting is the mean of Bv, which is 1 3 OJ =-52** = g*- (g- i)h)> (2-82) 9 v=\ Chapter 2. Foundation: models, statistical inference and computation 76 where = g 1 YH=i ^0)> a n d ̂ v = gO — (g — 1)6^ {y = 1 , . . . 
, g) are the pseudo-values. In this situation, the jackknife estimate of variance for 6, Vj, is defined as (2.83) i/=i Theorem 2.8 and 2.9 can be easily generalized to the situation with fixed m > 1. Theorem 2.10 Under the same assumptions as in Theorem 2.4, the jackknife estimate 6j defined by (2.82) with m fixed has the same asymptotic distribution as 6. That is, as n —• oo (thus g —• oo^, Vn-(0j-6)°Nq(O,J*\0)), where Jy{6) = D%{6)M^l(6)Diil{6), with M*(0) = Ee{9{6)9T (6)} and D*(6) = 89(6)/86. Proof. We sketch the proof. Let \P(„)(0) be 9n(6) calculated without the l/th group, and H^(6) be the q x q matrix 89n(6)/d6 calculated without the uth group. We have - l \/n — m(0(v) - 6) = -H{v){6) 1 ( - * ( „ ) ( * ) - R „ - m , „ ) where R n _ m , „ = Op{\\6{v) - 6\\2) - Op(n-1). Recall that - 6) = g (yg(6 -6)- ^{6{v) - 6)) + ^{6{l/) - 6), we thus have y/g&j -6) = g (jg-{6 -6)-l-J2 - 9)) + - E v W w - 6) \ 9 v = l ) 9 v = l this implies that 9 1 yft(h -6) = g [^{6 - 6 ) - ]f^1~ E Vn~^(6(v) - 6)^ + yj^~ E v^=^(0 ( l O - *). Thus Vn~{0j -6) = g ^ ( ^ ' ^ ( - • n W - R n ) - + (2.84) Chapter 2. Foundation: models, statistical inference and computation 77 By the Law of Large Numbers, we have and From the central limit theorem, We also have ±Hn(6)^D*(6), n 1 H(l>)($)**D9(0). n — m 4=*(0)-̂ W,(O,M»). This, together with the \/n-consistency of 6 (Theorem 2.4), lead to -#„(*)) - L ( _ * „ ( « ) _ R „ ) ' i j E { ( s r ^ ' w w ) " ^ (-t(.,(») - a.-.,)} \ ^ £ { ( ^ " « w ) T T = (-*«(') - )} -".0. V « * W)T). Applying Slutsky's Theorem to (2.84), we obtain ^(8j-6)^Nq(0,J^(6)). • Theorem 2.11 Under the same assumptions as in Theorem 2.4, the sample size n times the jack- knife estimate of variance Vj defined by (2.83) is a consistent estimator of J^1(6), when m is fixed and g —• oo. Proof: We use the same notation as in the proof of Theorem 2.10. We have 9 9 - 6)(6{v) - 6f = - 6)(6{v) - 6? vzzl I / = l (6 - 6)T - ( 6 - 6 ) + g(6-6)(6-6f Chapter 2. Foundation: models, statistical inference and computation 78 and Thus *(„) - 0 = H(V)(0) ( - * ( „ > ( * ) - R „ _ m , „ ) . y y r - ~0)(O{V) -6)T = J2 H^){0) ( - * ( „ ) ( « ) - R „ _ m , „ ) ( - * ( „ ) ( ' ) - R n - m , , ) T i / = i 9 ( - * „ ( * ) - R n f ( / / n " 1 ^ ) ) 5 £/frU<?) ( - * < „ ) ( * ) - R n - m , „ ) E#w(*) ( - * W ( 0 ) - R „ _ m , , ) | - T / " 1 ^ ) ( - * „ ( » ) - R n ) + gH-\d) ( - * „ ( * ) - R „ ) ( - * „ ( ' ) - R . ) T ( / f " 1 ( * ) ) T Let V}(0) = * „ ( * ) - * ( „ ) ( « ) . Then i«MW=(fl-i)*»(f) (2.85) v = l and £ *wW*fv)(tf) = (9 - 2)*„(«)C(«) + E *;(*)(*; W)3 i / = i By the Law of Large Numbers, we have -^—H{v){O)^D*{0). n — m y ' We also have £{<^(0)(tf * (0))T/m} = E{9(0)#T(0)}, thus by the Law of Large Numbers, ±£*;(*)(*;(«))T = iZ{*JW(*^W)T/m>^(*(*)*TW)- ra v From (2.85), we have D*oo - - *)T - E w«: ( ^ i ^ i / = i which implies that «E(*(") " W « 0 - *)TI*D*(B)-1M9(B)(D9(B)-1)T. 1=1 In other words, we proved that n ^f_ 1( f l( 1/) - 0)(9(v) ~ ^ ) T is a consistent estimator of 1 (0), when m is fixed and # —• oo. • Chapter 2. Foundation: models, statistical inference and computation 79 The main motive for the leave-out-more-than-one-observation-at-a-time approach is to reduce the amount of computation required for the jackknife method. For large samples, it may be helpful to make an initial random grouping of the data by randomly deleting a few observations, if necessary, into g groups of size ra. 
The choice of the number of groups g may be based on the computation costs and the precision or accuracy of the resulting estimators. As regards of computation costs, the choice (m,g) = ( l ,n) is most expensive. For large samples, g = n may not be computationally feasible, thus some values of g less than n may be preferred. The grouping, however, introduces a degree of arbitrariness, a problem not encountered when g — n. This results in an analysis that is not uniquely defined. This is generally not a problem for SE estimation for application purposes, as usually then a rough assessment is sufficient. As regards the precision of the estimators, when the sample size n is small to moderate, the choice (m,g) = ( l ,n) is preferred. See Chapter 5 for examples. 2.5.2 J a c k k n i f e f o r a f u n c t i o n o f 0 The jackknife method can also be used for estimates of functions of parameters, such as the asymp- totic variance of P(yi • • - yd', 0) for a M C D or M M D model. The usual delta method requires partial derivatives of the function with respect to the parameters, and these may be very difficult to obtain. The jackknife method eliminates the need for these partial derivatives. In the following, we present some results on the jackknife method for functions of 0. Suppose h(0) = (bi(0),..., bs(6))' is a vector-valued function defined on 5ft and taking values in s-dimensional space. We assume that each component function of b, bj(-) (j = 1,... .,s), is real valued and has a differential at 0Q, thus b has the following expansion as 0 —• 0o'. b(0) = b(6o) + (0-Oo)(j^y+ o(\\0-0o\^ (2.86) where db/d0'o — (db/d0')\Q_Qo is of rank t = min(s, q). By Theorem 2.4, 0 has an asymptotic normal distribution Similarly by Theorem 2.8 and 2.10, 0j has an asymptotic normal distribution in the sense that Vn-(0j- 0)°Nq(0, J * 1 ) . We have the following results for b(0) and b(0j): Chapter 2. Foundation: models, statistical inference and computation 80 T h e o r e m 2.12 Let b be as described above and suppose (2.86) holds. Under the same assumptions as in Theorem 2.4, b(0) has the asymptotic distribution given by Proof. See Serfling (1980, Ch.3). • T h e o r e m 2.13 Let b be as described above and suppose (2.86) hold. Under the same assumptions as in Theorem 2.4, b(6j) has the asymptotic distribution given by Proof. See Serfling (1980, Ch.3). • As in the previous subsection, let t9(„) be the estimator of 0 with the i/-ih group of size m deleted, v = 1 , . . . , g. We define the jackknife estimate of variance of h(0), which we denote by Vjb, as VJb = J2 (*>(*(,)) - b (*)) ( b ( ^ ) ) - b ( * ) f • (2-87) i/=i We have the following theorem. T h e o r e m 2.14 Let b be as described above and suppose (2.86) holds. Under the same assumptions as in Theorem 2.4, the sample size n times the jackknife estimate of variance Vj\> defined by (2.87) is a consistent estimator of dB'JJ* \d6'J • Proof. The proof is similar to that of Theorem 2.11, and thus omitted here. • To carry out the above computational results related to the estimates of functions of parameters, it would be desirable to maintain a table of the parameter estimates for the full sample and each jackknife subsample. Then one can use this table for computing estimates of one or more functions of the parameters, and their corresponding SEs. The results in Theorems 2.12, 2.13 and 2.14 have immediate applications. One example is given next. 
E x a m p l e 2.18 For a M C D or M M D model in (2.12), say P(yi • • -yd]6), we could apply the above results to say something about the asymptotic behaviour of P(yi • • • ya',6) and P(yi • • - yd',6j). From Theorems 2.12 and 2.13, we derive that as n —• oo Chapter 2. Foundation: models, statistical inference and computation 81 and yfrWvi • y d ; h ) - P(V1 • ••yd;0))$N (o, J*1 {^f^j . Furthermore, by Theorem 2.14, we obtain a consistent estimator of (dP/d0')J^1(dP/d0')T, i.e. g 2 n £ {p(yi • • • - • '• • ')) • i/=i Also see Chapter 5 for direct application in data analysis. O 2 .5 .3 J a c k k n i f e a p p r o a c h f o r m o d e l s w i t h c o v a r i a t e s Suppose we have the model defined by (2.53) and (2.54). Let (v = 1,..., g) be an estimate of 7 based on the same set of inference functions ^ n (7) from the data set y l t . . . , y n but deleting the f - th group of size m (m is fixed). The jackknife estimate off is 1 9 T ; = - D " = ^ - ( « - % ' ( 2- 8 8) 9 v = l where 7 ( . } = 1/g £ * = 1 7 ( l / ) , and yv = gy - (g - l ) 7 ( l / ) (v=l,...,g). We define the jackknife estimate of variance Vj:y for y as follows: ^ 7 = E ( 7 W - 7 ) ( 7 W - 7 ) T . . (2.89) i/=i Under the assumptions for the establishment of Theorem 2.6, in parallel to Theorems 2.10 and 2.11, we have the following theorems for the models with covariates. The proofs are generalizations of the proofs for Theorem 2.5, 2.6, 2.10 and 2.11. We omit the proofs and only state the results here. T h e o r e m 2.15 Consider the general model (2.53) with d arbitrary. Let y denote the IFME of y under the IFM corresponding to (2.57). Under Assumptions 2.1, 2.2 and 2.3, the jackknife estimator 7 J defined by (2.88) is asymptotically normal in the sense that, as n —• oo, V ^ ^ 1 ( 7 J - 7 O ) ) ^ ( 0 , / ) , where An = Dn1/2(y0)M^2(yQ)(D- 1 / 2 ( y 0 ) ) T , Dn(y0) and Mn(y0) are defined by (2.56). • T h e o r e m 2.16 Under the same assumptions as in Theorem 2.7, we have nVj,y - J D - 1 ( 7 o ) M „ ( 7 o ) p - 1 ( 7 o ) ) T - 0 ! where Vj:y is the jackknife estimate of variance defined by (2.89), and Ai(7o) a n d Mn(y0) are defined by (2.56). • Chapter 2. Foundation: models, statistical inference and computation 82 T h e o r e m 2.17 Let b be as described in subsection 2.5.2 and suppose (2.86) hold. Under the same assumptions as in Theorem 2.7, b(y), a function of IFME y, has the asymptotic distribution given by VnB'1 (b(y) - b(y))^Nt (0,1), where Bn = [(db/8y'0) D;l(y0)Mn(y0)(D;l(y0))T (db/dy'0f] ^ , and Dn{y0) and Mn(y0) are defined by (2.56). • T h e o r e m 2.18 Let b be as described in subsection 2.5.2 and suppose (2.86) hold. Under the same assumptions as in Theorem 2.7, b(yj), of the jackknife estimate yj derived fromy, has the asymp- totic distribution given by v^5- 1 (b(7 J )-b( 7 ))^Ar ( (o,7) , where Bn = T | l / 2 (db/dy'0)D-\y0)Mn(y0)(D-1(y0))T(db/dy'0Y , and Dn(y0) and Mn(y0) are defined by (2.56). • We define the jackknife estimate of variance of b(7), Vjb, as follows: VJb = £ (b (7 W ) - b(7)) (b(7(„)) - b( 7)) T . (2.90) v = \ T h e o r e m 2.19 Let b be as described in subsection 2.5.2 and suppose (2.86) hold. Under the same assumptions as in Theorem 2.6, we have nVJh - D - 1 ( 7 o ) M „ ( 7 o ) P „ - 1 ( 7 o ) f (jfif where the jackknife estimate of variance Vjt, defined by (2.90) and Dn(y0) and Mn(y0) are defined by (2.56). 
• 2.6 Estimation for models with parameters common to more than one margin One potential application of the M C D and M M D models is for longitudinal or repeated measures studies with short time series, in which the interest may be on how the distribution of the response changes over time. Some common characteristics, which carry over time, may appear in the form of common regression parameters or common dependence parameters. There are also general situations in which the same parameters appear in more than one margin. This happens with the M C D and Chapter 2. Foundation: models, statistical inference and computation 83 M M D models, for example, when there is a special dependence structure in the copula C, such as in the multinormal copula (2.4), where 0 = (Ojk) is an exchangeable correlation matrix with all correlations equal to 0, or 0 is an AR(1) correlation matrix with the (j, k) component equal to 0|J"*I for some 0. Example 2.19 Suppose for the d-variate binary vector Y j with a covariate vector for the jth univariate margin can be represented as Yjj = I(Zij < ctj+/3jXij), i = 1 , . . . , n, where Zj ~ N(0,0). This is a multivariate probit model. Assume /3j — /?, then the common regression coefficients appear in more than one margin. We could estimate /? from the d univariate margins based on the I F M approach, but then we have d estimates of /?. Taking any one of them as the estimate of /? evidently results in some loss of information. Can we pool the information together to get a better estimate of /3? The same question arises for the correlation matrix 0 . Assume there is no covariate for 0 . When 0 has certain special forms, for example exchangeable or AR(1), the same parameter appears in d(d—1)/2 bivariate margins. Can we get a more efficient estimate from the I F M approach? There are also situations where a parameter is common some margins, such as 0i2 = 023 = • • • = 0d,d-i in 0 . The same question about getting a more efficient estimate arises. • A direct approach for common parameter estimation is to use the likelihood of a higher-order margin, if this is computationally feasible. Otherwise, the I F M approach for model fitting can be ap- plied. With the I F M approach, appropriately taking the information about common parameters into account can improve the efficiency of the parameter estimates. Analytical and numerical evidence supporting this claim are given in Chapter 4 for these two approaches of information pooling for I F M that we propose here. The first approach, called the weighting approach (WA), is to form a new estimate based on some weighting of the estimates for the same parameter from different margins. A special case is the simple average. The second approach, called the pool-marginal-likelihoods ap- proach ( P M L A ) , is to rewrite the inference function of margins under the assumption that the same parameter appears in several margins. In the following, we outline the two estimating approaches in general terms. 2.6.1 W e i g h t i n g a p p r o a c h W A is a method to get an efficient estimate based on a weighted average of different estimates for the same parameter. We state this approach in general terms. Assume yi,. • - ,yq are estimates of the same parameter 7, but from different inference functions. Let 7 = (71,... ,yq)', and let Y,y be Chapter 2. Foundation: models, statistical inference and computation 84 the asymptotic variance-covariance matrix based on Godambe information matrix. 
One relatively efficient estimate of the parameter 7 is based on the following result, which easily obtains from the method of Lagrange multipliers. Result 2.1 Suppose X is a q-variate random vector with mean vector px = (p, • • - tP)' = / ^ l and Var(X) = E x , where p is a scalar and 1 = (1,..., 1)'. A linear unbiased estimate of p, u'X, has the smallest variance when E ^ l u = • Applying the above result to our problem, the resulting estimate of 7 is 7 = , 7 , . (2.91) If 7jS are consistent estimates of 7 , then 7 is also a consistent estimate of 7 and it has smaller asymptotic variance than any of the individual estimates of 7 from one particular inference function. The asymptotic variance of 7 is 4 = 1 1 ^ 1 1 = ^ ^ - . (2.92) A computationally simpler but slightly less efficient estimate of 7 is l'diagfCI 1}? and an approximation of the asymptotic variance of 7 from (2.93) is <r| = l / ( l 'diag{S^ 1 } l ) . A naive estimate of 7 is 7 = l ' 7 / l ' l , which is just the simple average. In some special situations, such as in the Example 4.4, the estimate of (2.91) reduces to a simple average. In practice, E^, may not be available. We may have to estimate it based on the data. The following algorithm may be useful for getting a final estimate of 7 . Assume we already have 7 and Computation algorithm: 1. Let u = E ^ l / l ' E r 1 ! . 7 ' 7 2. Find 7 = u '7 . 3. Set 7 = ( 7 , . . . , 7 ) , and update E-y with this new 7 . Chapter 2. Foundation: models, statistical inference and computation 85 4. Go back to step 1 until the final estimate of 7 satisfies a prescribed convergence criterion. The asymptotic variance of 7 at the final iteration can be calculated by (2.92). • 2 . 6 . 2 T h e p o o l - m a r g i n a l - l i k e l i h o o d s a p p r o a c h In this approach, different inference functions for the same parameter are pooled together as a sum, where each inference function is the loglikelihood of some margin. We use the model (2.54) to illustrate the approach. Suppose that in (2.54), otj = Ot, j = 1 , . . . , d and bjk = P , l < j < k < d , then more efficient estimators of a and /? may be obtained. For example, we can sum up the the loglikelihood functions of margins corresponding to the parameter vectors a and /? in the following way: n d 5 ( « ) = £ 5 > g ^ ( « y ) . C ( M ) = £ £ l ° g ^ * ( w ; W * ) - (2.94) is an example of P M L A . The inference functions of margins from (2.94) corresponding to ot and /? are « - i f f 1 dpj(yjj) 1 dPj(yij) IFM \hihp^ dai ' " " k k p ^ da> ' n d E E ^ J dPjkjyijyik) and the estimating equations based on I F M is n d EE i=lj<k dPjkjyijyik) (2.95) PjkiyijVik) dfiq EE—-—dFi,{KV"-) PjkiyijVik) d/3t = 0, s = l , . . . , p , = 0, t = l , . . . , q . i=lj<k If we consider (2.95) as inference functions, we see that asymptotic theory on the estimates from the P M L A can be established by applying the general inference function theory. For completeness, we give here algebraic expressions on I F M with the P M L A for the Godambe information for the model (2.39) with 6 = (#i, . . . , Bd, 9i2,. • •, 0d-i,d)', where we assume 9\,..., 0d Chapter 2. Foundation: models, statistical inference and computation 86 are each a function of one parameter A, and #12,• • •, are each a function of one parameter p. 
The loglikelihood functions of margin corresponding to the parameter vectors A and p are n d C ( A ) = £ E l o g p ; t o »=i j=i n d C(x, p) - E E l o s PjkiyijVik), i-l j,k = l;j<k and the estimating equations based on I F M is n d (2.96) 1 dPjjyij) 0, y^y^ 1 dPjk(yijyik) \ i=lj<k The I F M corresponding to one observation is Pjk(yijVik) dp = 0. 1 dPjkjyjyk) iVk) dp The Godambe information matrix is a 2 x 2 matrix. To calculate the Godambe information matrix we need the matrices My For E (V 2 ) , we have ' E(V?) E O i l f e ) ' and Dy / E ( 3 V i / 3 A ) 0 V 0 E(dy>2/dp) It can be estimated consistently by 1 A / A • 1 fliMw,-) »SL? j^ (Wi) ^A (2.97) Similarly, we have E(V22) = E Z 1 dPjkjyjyk) Pjk(yjVk) dp Chapter 2. Foundation: models, statistical inference and computation 87 f=iPj(yj) d\ 1 dPjjyj) \ 1 dPjk(yjyk) Pjk(yjyk) dp = £ P(yi--w) £ 1 dPj(yj) £ 1 dPjk(yjyk) friPiivj) dx / yf^PMyjyk) dp For E (5Vi /3A) , chp_ ex r = E 1 f d P M \ \ 1 d2Pj(yj) Pf(yj) \ dX J + Pj(yj) dX2 E(d^/dX)= ^(w-w)E { y i - y d } J = 1 Similarly, we find E(di>2/dP)= p(yi•••w)E { y i - y d } i < * 1 f-dPjMY } 1 a2p,-(y,-) P,-(W) 3A 2 ^ • f c C y j ^ j ^ " , 1 d2Pjk(yjyk) PjkiVjVk) dp2 Consistent estimates for E ^ ^ ) , E(ipiip2), E(dipi/dX) and E(drp2/dp) can be similarly written as in (2.97). 2.6.3 E x a m p l e s We give two examples of W A and P M L A . Example 2.20 1. A trivariate probit model with exchangeable dependence structure: Suppose we have a trivariate probit model with known cut-off points and P ( l l l ) = $3(0,0,0,p,p,p) . It can be shown (see Example 4.4) that the asymptotic variance of p from one bivariate margin is [(7r 2 — 4 (s in _ 1 p)2)(l — p 2)]/4n, and the asymptotic variance of p from W A or P M L A is [(1 — p2)(ir + 6 s i n - 1 p)(n — 2 s i n - 1 p)]/12n. The ratio of the former to the latter is [3(7r + 2 s i n - 1 /5)]/[7r + 6 s i n _ 1 p], which decreases from 0 0 to 1.5 as p increases from —0.5 to 1. In this example, the optimal weighting is equivalent to a simple average (see Example 4.4 for details). 2. A trivariate probit model with AR(1) dependence structure: Suppose we have a trivariate probit model with cut-off points known, such that -P( l l l ) = $3(0, 0, 0, p, p2, p). Let u2w be the asymptotic variance of p from W A , a2 be the asymptotic variance of p from P M L A , a 2 2 be the asymptotic variance of p from the (1, 2) margin, and <r23 be the asymptotic variance of p from the (1, 3) margin. In Example 4.5, we show that C p / c 2 > 1, with a maximum value of 1.0391, which is attained at Chapter 2. Foundation: models, statistical inference and computation 88 p = 0.3842 and p — —0.3842; cr22/<r2 increases from 1.707 to 2 as p goes from —1 to 0 and from 2 to 1.707 as p goes from 0 to 1; and <X23/<T2 increases from 1.207 to oo as p goes from —1 to 0, and decreases from oo to 1.207 as p goes from 0 to 1. • P M L A in the form presented in this section can also be considered as a simple weighted likelihood approach. More complicated weighting schemes for the P M L A can be sought. In general, as long as a reasonable efficiency is preserved, we prefer the simple weighting schemes. 2.7 Numerical methods for the model fitting From previous sections, we see that the I F M approach for parameter estimation leads to the problem of optimization of a set of loglikelihood functions of margins. The typical system of functions for the models with no covariates in the form of loglikelihood functions of margins are tnj(Xj) = Y^lo&pi(Vij)' j = l , . . . 
, d , i-l n tnjk(Qjk) - Elog PJk(yijyik), 1 < j < k < d, •=1 and the estimating equations (derived from the loglikelihood functions of margins) are ,d. (2.98) (2.99) For the models with covariates, the typical system of functions in the form of loglikelihood functions of margins are Lj(otj) = ^2logPj(yij), j = 1, . . .,d, i = \ (2.100) tnjk(Pjk) = ^2^ogPjk(yijyik), 1 < j < k < d i=i and the estimating equations are f * (a) - V 1 - 0 i - l 1 n A l ) ~ h p ^ da> 1 dPjkjyijyik) (2.101) fr[ pjk(yijyik) dBjk 0, 1 < j < k < d. Chapter 2. Foundation: models, statistical inference and computation 89 N e w t o n - R a p h s o n m e t h o d The traditional approach for numerical optimization or root-finding is the Newton-Raphson method. This method requires the evaluation of both the first and the second derivations of the objective functions in (2.98) and (2.100). This method, with its good rate of convergence, is the preferred method if the derivatives can be easily obtained analytically and coded in a program. But in many cases, for example with £njk(6jk) or £njk(0jk)i where bivariate objects involve non-closed form two-dimensional integrals, application of the Newton-Raphson method is difficult since analytical derivatives in such situations are very hard to obtain. The Newton-Raphson method can only be easily applied for a few cases with £nj(Xj) or £nj(otj), where only univariate objects are involved. For example, the Newton-Raphson method may be used to solve 9nj(Xj) = 0 to find Xj, j — 1 , . . . , d. In this case, based qn Newton-Raphson method, for a given initial value Xj,o, an updated value of Xj is ' 'dVnjjXjY dXj •V.new Aj,o — *nj(Xj) (2.102) This is repeated until successive A^new agree to a specified precision. In (2.102), we need to be able to code n *nj(Xj) = YtlVPjiynWPjiyn)/9^] (2.io3) and 0*n,-(A;) dXj £ 1 (dPjiVij) PfiVij) V dXj (2.104) tiPjiVii) dX] This is for the case with no covariates. For the case with covariates, similar iteration equations to (2.102) can be written down. We need to calculate *«;(<*;) = T,[VPi(vij)][dPj{vij)/d"i], 8 = 1 which is apj x 1 vector and dotj E i=i L 1 d2Pj(yij) 1 dPj(yij) (dPjjyjj) Pj(yij) dctjdet'j Pf{mj) dotj \ dctj (2.105) (2.106) which is a pj x pj matrix. It is equivalent to calculate the gradient of Pj(j)ij) at the point otj, which is the pj-vector of (first order) partial derivatives: d d $ M T Chapter 2. Foundation: models, statistical inference and computation 90 and the Hessian of Pj(ytj) at the point aj, which is &pj xpj matrix of second order partial derivatives with (s,t) (s,t = 1 , . . . ,pj) component (d2/da„dat)Pj(yij). To avoid the often tedious algebraic derivatives in (2.103) - (2.106), modern symbolic computa- tion software, such as Maple (Char et al., 1992), may be used. This software is also convenient in that it outputs the results in the form of C or Fortran code. Q u a s i - N e w t o n m e t h o d For many multivariate models, it is inconvenient to supply both first and second partial derivatives of the objective functions as required by the Newton-Raphson method. For example, to get the partial derivatives of the forms (2.104) - (2.106) may be tedious, particularly with function objects such as (•njk{9jk) or (.njki,Pjk), where 2-dimensional integrations are often involved. A numerical method for optimization that is useful for many multivariate models in this thesis is the quasi-Newton method (or variable-metric method). 
This method uses the numerical approximation to the derivatives (gradients and Hessian matrix) in the Newton-Raphson iteration; thus it can be considered as a derivative-free method. In many situations, a crude approximation to the derivatives can lead to convergence in the Newton-Raphson iteration as well. Application of this method requires only the objective functions, such as those in (2.98) and (2.100), to be coded. The gradients are computed numerically and the inverse Hessian matrix of second order derivatives is updated after each iteration. This method has the advantage of not requiring the analytic derivatives of the objective functions with respect to the parameters. Its disadvantage is that convergence could be slow compared with the Newton-Raphson approach. A n example of a quasi-Newton routine, which is used in the programs written for this thesis work, is a quasi-Newton minimization routine in Nash (1990) (Algorithm 21, pl92). This is a modified Fletcher variable-metric method; the original method is due to Fletcher (1970). With the quasi-Newton methods, all we need to do is to write down the optimization (min- imization or maximization) objective function (such as £njk{Pjk)), a n d then let a quasi-Newton routine take care of the rest. A quasi-Newton routine works fine if the objective function can be computed to arbitrary precision, say eo- The numerical gradients are then based on a step size (or step length) e < eo- The calculation of the optimization objective function with multivariate model often involves the evaluation of multiple integration at some arbitrary points. One-dimensional and two-dimensional numerical integrals can usually be computed quite quickly to around six digits of precision, but there is a problem of computational time in trying to achieve many digits of precision for numerical integrals of dimension three or more. When the objective function is not computed Chapter 2. Foundation: models, statistical inference and computation 91 sufficiently accurately, the numerical gradients are poor approximations to the true gradients and this will lead to poor performance of the quasi-Newton method. On the other hand, for statistical problems, great accuracy is seldom required; it is often suffice to obtain two or three significant digits, and we expect that in most of situations, we are not dealing with the worst cases. S tar t in g points for n u m e r i c a l o p t i m i z a t i o n In general, an objective function may have many local optima in addition to possibly a single global optimum. There is no numerical method which will always locate an existing global optimum, and the computational complexity in general increases either linearly or quadratically in the number of parameters. The best scenario is that we have a dependable method which converges to a local optimum based on initial guesses of the values which optimize the objective function. Thus good starting points for the numerical optimization methods are important. It is desirable to locate a good starting point based on a simple method, rather than trying many random starting points. A n example based on method of moments estimation for deciding the starting points is for the multivariate Poisson-lognormal model (see Example 2.12), where the initial values for an estimate of pj arid <Tj based on the the sample mean (yj), sample variance (s2) and sample correlations (rjk) are = {log[(«? - yj)/y] + I]}1'2, $ = logy,- - 0.5(«T?) 2 and 0]k = log[rjkSjSk/(yjyk) + respectively. 
If the problem involves covariates, one can solve some linear equation systems with appropriately chosen covariate values to obtain initial values for the regression parameters. Initial values may also be obtained from prior knowledge of the study or by trial and error. Generally speaking, it is easier to have a good starting point for a model with interpretable parameters or parameters which are easily related to interpretable characteristics of the model. In the situations where closed-form moment characteristics of the model are not available, we may numerically com- pute the model moments. N u m e r i c a l integrat ion There are several methods for obtaining numerical integrals, among them are Romberg integration, adaptive integration and Monte-Carlo integration. The latter isg especially useful for high dimen- sional integration provided the accuracy requirements are modest. With the I F M approach, as often only one or two-dimensional integrations are needed, the necessary numerical integrations are not a problem in most cases. (Thus I F M can be considered as a tractable computational method, which alleviates the often extremely heavy computational burdens in fitting multivariate models.) For Chapter 2. Foundation: models, statistical inference and computation 92 this thesis work, an integration routine based on the Romberg integration method in Davis and Rabinowitz (1984) is used; this routine is good for two to about four dimensional integrations. A routine in Fortran code for computing the multinormal probability (or multinormal copula) can be found in Schervish (1984). Joe (1995) provides some good approximation methods for computing the multinormal cdf and rectangle probabilities. 2.8 Summary In this chapter, two classes of models, M C D and M M D , are proposed and studied. The I F M is proposed as a parameter estimation and inference procedure, and its asymptotic properties are studied. Most of the results are for the the models with M U B E or P U B E properties. But the results of this chapter should apply to a very wide range of inference problems for numerous popular models in M C D and M M D classes. The I F M E has the advantage of computational feasibility; this makes numerous models in the M C D and M M D classes practically useful. We also proposed a jackknife procedure for computing the asymptotic covariance matrix of the I F M E , and demonstrated the asymptotic equivalence of this estimate to the Godambe information matrix. Based on the I F M approach, we also proposed estimation procedures for models with parameters common to more than one margin. One problem of great interest is that of determining the efficiency of the I F M E relative to the conventional M L E . Clearly, a general comparison would be very difficult and next to impossible. Analytic and simulation studies on I F M efficiency will be given in Chapter 4. Another problem of interest is to see how the jackknife procedure compares with the Godambe information calculation numerically. A study of this is also given in Chapter 4. Our results may have several interesting extensions; some of these will be discussed in Chapter 7 as possible future research topics. Chapter 3 Modelling of multivariate discrete data In this chapter, we study some specific models in the class of M C D and M M D models and develop the corresponding I F M approach for fitting the model based on data. The models are first discussed for the case with no covariates and then for the case with covariates. 
For the dependence structure in the models, we distinguish the models with general dependence structure and the models with special dependence structure. Different ways to extend the models, especially to include covariates for the dependence parameters, are discussed. This chapter is organized as the following. In section 3.1, we study M C D models for binary data, with emphasis on multivariate logit models and probit models for binary data. In section 3.2, we make some comparison of the models discussed in section 3.1. The general ideas in this section should also extend to other M C D and M M D models. In section 3.3, we study M C D models for count data, and in section 3.4, we study M C D models for ordinal data. M M D models for binary data are studied in section 3.5, and M M D models for count data are studied in section 3.6. Finally in section 3.7, we discuss the use of M C D and M M D models for longitudinal and repeated measures data. In each section, only a few parametric models are given, but many others can be derived. For data analysis examples with different models presented in this chapter, see Chapter 5. 93 Chapter 3. Modelling of multivariate discrete data 94 3.1 Multivariate copula discrete models for binary data 3.1.1 M u l t i v a r i a t e l o g i t m o d e l A multivariate logit model should be based on a multivariate logistic distribution. As there is no nat- ural multivariate logistic distribution, we construct multivariate logit models by putting univariate logistic margins into a family of copulas with a wide range of dependence, and simple computa- tional form if possible. As a result, multivariate logit models for binary data are obtained by letting Gj(0) = 1/[1 + exp(zj)] and Gj(l) — 1 in the model (2.13), with arbitrary copula C. Some choices of the copula C are: • Multinormal copula C ( « i , . . . , U d ; e ) = $ d ( $ - 1 ( U l ) , . . . , $ - 1 ( U d ) ; 0 ) , (3.1) where © = (6jk) is a correlation matrix (of normal margins). The bivariate margins of (3.1) are Cjk(uj,uk) = $ 2 ( * _ 1 ( u j ) . * - 1 ( u * ) ; O j k ) • Mixture of max-id copula (id for infinitely divisible, see Joe and Hu 1996) C(u) = V> ( - ^ l o g ^ , ( e - ^ " 1 ( " i ) > e - P ^ " 1 ( ^ ) ) + ^ ^ f t V ' - 1 K ) ) , (3-2) where Kjk are max-id copulas, pj = (VJ +d— l ) - 1 , Vj > 0. For interpretation, we may say that ip can be considered as providing the minimal level of (pairwise) dependence, the copula Kjk adds some pairwise dependence to the global dependence, and i/j's can be used for bivariate and multivariate asymmetry (the asymmetries are represented through Vj/(i/j + uk), j ^ k). (3.2) has the M U B E and partially C U O M properties. With ip(s) = -9~l log(l - [1 - e-e)e-'), 0 > O , d ^-(^-e-e)l[Kjk(xj,xk)l[xy , (3.3) j<k j=l where Xj = [(1 - e-9u')/(l - e~B)]p', pj = (VJ +d - The bivariate margins of (3.3) are Cjk(uj, uk) - -Q~l log[l - (1 - e~e)xl/ji+d~2xukk+d~2Kjk(xj, xk)]. One good choice of Kik would be Kjk(xj,xk) = (xj 6 j k + x k b 3 k — \)~ll6'k, 0 < 6jk < oo; because the resulting copula is simple in form. (See Kimeldorf and Sampson (1975) and Joe (1993) for a discussion of this bivariate copula.) C(u) = - r M o g Chapter 3. Modelling of multivariate discrete data 95 • Molenberghs-Lesaffre (M-L) construction (Joe 1996). The construction generates a multivari- ate "copula" with general dependence structure. 
A n example for a 4-dimensional copula is the following: =  x(x-ao)Ui<j<k<4(x-aJk) ^ ^ (Cl23 - ar)(Cl24 - z)(Cl34 - X)(C234 - x) Ilj = i ( a i - x ) ' where x = C1234 is the copula, a0 = C123 + C124 + C134 + C234 — C\i — C13 — C14 — C23 — C24 — C34 + wi + u2 + «3 + U4 — 1, a i = u i — C12 — C13 — C14 + C123 + C124 + C134, a 2 = U2 — C\2 — C23 — C24 + C123 + Cl24 + C234, «3 = U3 — C13 — C23 — C34 + C123 + C134 + C234, a 4 = u 4 — C14 — C24 — C34 + C i 2 4 + Ci34-|-C234, and ajk = Cjki + Cjkm — Cjk, for 1 < j < k < 4 and 1 < /, m ^ j, k < 4. In (3.4), C i 2 3 , C i 2 4 , C134 and C 234 are compatible trivariate copulas such that vikl = h h h h ' (3-5^ O4O5O6&7 where z = CJU, &i = Uj - Cjk - Cji + z, b2 = uk - Cjk - CM + z, 63 = u; - Cji - CM + z, 64 = Cjk — z, 65 = Cji — z, b6 = Cki — z and 67 = 1 - Uj — Uk — u\ + Cjk + Cji + Cki — z, for l < j < k < l < 4 . The bivariate copulas C12, Ci3, Ci4> C23, C24, C34 are arbitrary compatible bivariate copulas. Examples are the bivariate normal, Plackett (2.8) or Frank copula (2.9); see Joe (1993, 1996) for a list of bivariate copula families with good properties. The parameters in C1234 are the parameters from the bivariate copulas, plus rjjki, l < j < k < l < 4 , and 771234. The generalization of this construction to arbitrary dimension can be found in Joe (1996). Notice that we have quotation marks on the word copula, because the multivariate object obtained from (3.4) and (3.5) or the corresponding form for general dimension has not been proven to be a proper multivariate copula. But they can be used for the parameter range that leads to positive orthant probabilities for the resulting probabilities for the multivariate binary vector. • Morgenstern copula C ( u i , ...,ud) d I+EM1-^1-"*) j<k d Y[uh, (3.6) ft=i where 6jk must satisfy some constraints such that (3.6) is a proper copula. The bivariate margins of (3.6) are Cjk{uj,uk) — [I + Ojk(l - Uj)(l - uk)]ujUk, \0jk \ < 1. The permutation symmetric copulas of the form C{uu...,ud) = <f> ^X>-1(u'-)) > (3J) Chapter 3. Modelling of multivariate discrete data 96 where 4> : [0,oo) —>• [0,1] is strictly decreasing and continuously differentiable (of all orders), and <j>(0) = 1, ^(oo) = 0, (-l)J>Ci) > 0. With ^(s) = -6~l log(l - [1 - e-e]e~s), 6 > 0, (3.7) is C(uu ..., ud) = ^ £ = - \ log ( l - y ^ J - ! ) ^ ) • (3-8) This choice of ip(s) leads to 3.8 with bivariate marginal copulas that are reflection symmetric. It is straightforward to extend the univariate marginal parameters to include covariates. For example, for z,j corresponding to the random vector Y,-, we can let Z{j — otjXij, where otj is a parameter vector and Xjj is a covariate vector corresponding to the jth margin. But what about the dependence parameters in the copula? Should the dependence parameters be functions of covariates? If so, what are the functions? These questions have no obvious answers. It is evident that if the dependence parameters are functions of covariates, the form of the functions will depend on the particular copula associated with the model. A simple way to deal with the dependence parameters is to let the dependence parameters in the copula be independent of covariates, and sometimes this may be good enough for the modelling purposes. If we need to include covariates for the dependence parameters, careful consideration should be given. 
In the following, in referring to specific copulas, we give some discussion on different ways of letting the dependence parameters depend on covariates: - With the multinormal copula, the dependence structure in the copula for the ith response vector Y, is Q{ = (Oijk)- It is well-known that (i) 0,- has to be nonnegative definite, and (ii) the component 6ijk of has to be bounded by 1 in absolute value. Under these constraints, different ways of letting 0 ; depend on covariates are possible: (a) let Oijk = [exp(/?Jj.WjJfc) — l]/[exp()9jjfcw,Jjfe)"+ 1]; (b) let 0j have a simple correlation structure such as exchangeable and AR(1); (c) use a representation such as z,j - [otjXij]/[l + x^-Xjj] 1 / 2 , 6ijk — fjk/[(l + x- J x ! J ) ( l + x ^ X j i ) ] 1 / 2 ; (d) use a more general representation such as 6ijk = r^w^ jkvfi,jk; or (e) reparameterize 0j into the form of d— 1 correlations and (d— l)(d—2)/2 partial correlations. The extension (a) satisfies condition (ii), but not necessarily (i). The extension (b) satisfies conditions (i) and (ii), but is only suitable for data with a special dependence structure. The extension (c) is more natural, as it is derived from a mixture representation (see section 3.5 for a more general form) and it satisfies condition (ii) and also condition (i) as long as the correlation matrix (rjk) is nonnegative definite. This is an advantage in comparison with (a), as in (a), for (i) to be satisfied, all n correlation matrices must be nonnegative definite. The disadvantage of (c) is that the dependence range may be limited once a particular formal Chapter 3. Modelling of multivariate discrete data 97 representation is chosen. The extension (d) is similar to the the extension (c), except that now the 0,-jjbS are not required to depend on the same covariate vectors as z 8 j . The extension (e) completely avoids the matrix constraint (i), thus relieving a big burden on the constraint for the appropriate inclusion of covariate to the dependence parameters. But this extension implicitly implies an indexing of variables, which may not be rational in general, although this is not a problem in many applications as the indexing of variables is often evident, such as with time series; see Joe (1996). - With the mixture of max-id copula (3.3), extensions to parameters 6, Vj, 6jk as functions of the covariates are straightforward. For example, for 0,-, Sijk corresponding to the random vector Y,-, we may have 9i, Vij constant, and 6ijk = exp(3jkVfi,jk)- - With the Molenberghs-Lesaffre construction, the extension to include covariates is possible. In applications, it is often good enough to let the bivariate parameters be function of covariates, such as Sijk = exP(/^jfcws,jfc) for bivariate Plackett copula, or Frank copula, and to let the higher order parameters be constant values, such as 1. This is a simple and useful approach, but there is no guarantee that this leads to compatible parameters. (See Joe 1996 for a maximum entropy interpretation in this case.) - With the Morgenstern copula, the extension to let the parameters Oijk be functions of some covariates is not easy, since the Oijk must satisfy some constraints. This is rather complicated and difficult to manage when the dimension is high. The situation is very similar to the multinormal copula where 0,- should be nonnegative definite. - With the permutation symmetric copula (3.8), the extension to include covariates is to let 0,- be function of covariates, such as to let 0,- = exp(/3'wj). 
We see that for different copulas, there are many ways to extend the model to include covariates. Some are obvious and thus appear to be natural, others are not easy or obvious to specify. Note also that an exchangeable structure within the copula does not imply an exchangeable structure for the response variables. For M C D models for binary data, an exchangeable structure within the copula plus constant cut-off points across all the margins implies an exchangeable structure with the response variables. The A R dependence structure for the discrete response variables should be understood as latent Markov dependence structure (see section 3.7). When we mention an AR(1) (or A R ) dependence structure, we are referring to the latent dependence structure within the multinormal copula. Chapter 3. Modelling of multivariate discrete data 98 In summary, under "multivariate logit models", many slightly different models are available. For example, we have multivariate logit model with i . multinormal copula (3.1), ii . multivariate Molenberghs-Lesaffre construction a. with bivariate normal copula, b. with Plackett copula (2.8), c. with Frank copula (2.9), iii . mixture of max-id copula (3.3), iv. Morgenstern copula (3.6), v. the permutation symmetric copula (3.8). Indeed, such multiple choices of models are available for any kind of M C D model. For a discrete random vector Y , different copulas in (2.13) lead to different probabilistic models. The question is when is one model preferred to another? We will discuss this question in section 3.2. In the following, as an example, we examine estimation aspects of the multivariate logit model with multinormal copula. The multivariate logit model with multinormal copula can also be called multivariate normal- copula logit model to highlight the fact that multinormal copula is used. For the case with no covariates, the estimating equations for multivariate normal-copula logit model based on I F M can be written as the following f * n ; ( * j ) = K ( l ) ( l + e x p ( - Z i ) ) - n,-(0)(l + exp(-z j ))exp(z j )] ^}~(Zj\.2 = 0, j = 1,..., d , i {i -r exp^—Zj)) ^ 2 ( $ - 1 ( U j ) , $ - 1 ( U f c ) , ^ , ) = 0, 1 < j < k < d, where Pjk(ll) = Cjk(uj, uk; 9jk), Pjk(10) = Uj - Cjk(uj ,uk;9jk), Pjk{\l) - uk - Cjk(uj ,uk;9jk), Pjk{U) = 1 - Uj - uk + Cjk{uj,uk;9jk) with Cjk(uj,uk;9jk) = ^2(^~1{uj),^~1(uk),9jk), Uj = 1/[1 + exp(—Zj)], uk = 1/[1 + exp(—zk)], and (j>2 is the B V N density. We obtain the estimates Zj = log(nj(l)/n ;(0)), j = 1 , . . . , d, and 9jk is the root of the equation $2(<b~1(uj), $ - 1 (uj ; ) , 0jk) = rijk(ll)/n, 1 < j < k < d, where Uj = 1/(1 + exp(—Zj)) and uk = 1/(1+ exp(—zk)). 9„jk(9jk) njk(ll) njk(lQ) njk(01) n j t (00) + Pjk(U) J W 1 0 ) Pjk(01) P,*(00) Chapter 3. Modelling of multivariate discrete data 99 For the situation with covariates, we may have the cut-off points depending on some covariate vectors, such that = ctjoXijo + ctjiXiji -\ 1- ctjPjXijPj = ot'jXij, (3.9) where Xjjo = 1, and possibly Vijk - 73̂  \~T~7> (3.10) where 0]k = (bjk,o, • • •, bjklPjk). We recognize that (3.10) is one of the form of functions of dependence parameters (in multinormal copula) on covariates that we discussed previously. We use this form of functions for the purpose of illustration. Other function forms can also be used instead. Because of the linearity of (3.9) and (3.10), the regression parameter vectors otj, /3jk have marginal interpretations. 
The loglikelihood functions of margins are Znj(otj) = £ log Pj(yij), j - 1,..., d, n £njk(otj,otk,l3jk) = ^logPjk(yijyik), 1 < j < k < d, where exp(zij) 1 . 1 + exp(ztj) 1 + exp(zij) ^ ( ^ ( a y ) , * _ 1 ( ^ ) i + * 2 ( * _ 1 f o ) . ^-\aik);ejk), where =-Gi,(yij - 1), 6,j = Gij(yij), a i k = Gik(yik - 1) and bik = Gik(yik), with G , j ( l ) = 1 and Gij(0) — 1/1 + exp(zjj). We can apply quasi-Newton minimization to the loglikelihood functions of margins for getting the estimates of the parameters otj and (i]k. The Newton-Raphson method can also be used for getting the estimates of otj (what we used in our computer programs). In this case, we have to solve the estimating equations For applying Newton-Raphson method, we need to calculate dP(yij)/dajt and d2P(yij)/dctjsdajt. WehavedP(yij)/dajs - (2yij-l){exp(zij)/(l+exp(zij))2}xijs, s - 0 ,1 , . ..,pj, a n d d 2 P ( y i j ) / d a j s d a j t = (2ytj — l ){exp(z, j)( l - exp(z, j)) /( l + exp(zij))3}xijsXijt, s,t = 0 , 1 , . . .,pj. For details about Newton-Raphson and quasi-Newton methods, see section 2.7. M $ and Dy can be calculated and estimated by following the results in section 2.4. In applica- tions, to avoid the tedious coding of M $ and , we may use the jackknife technique to obtain the Chapter 3. Modelling of multivariate discrete data 100 asymptotic variance of Zj and Ojk in case there are no covariates, or that of otj and $jk in case there are covariates. 3.1.2 Multivariate probit model The general multivariate probit model, similar to that of multivariate logit model, is obtained by letting Gj(0) = 1 — $ ( Z J ) and G j ( l ) = 1 in the model (2.13). The multivariate probit model in Example 2.10 is the classical multivariate probit model discussed in the literature, where the copula in (2.13) is the multinormal copula. A l l the discussion of the multivariate logit model is relevant and can be directly applied to the multivariate probit model. For completeness, we give in the following some detailed discussion about the multivariate probit model when the copula is the multinormal copula, as a continuation of Example 2.10. For the multivariate probit model in Example 2.10, it is easy to see that E(Y}) = <&(ZJ), Var(Y}) = <3>(z,)(l - $ ( Z J ) ) , Cov(Yj, Yfc) = $2(2/, Zk, Ojk) - $(zj)$(zk), j 7 ^ k. The correlation of the response variable Yj and Yj is C o t t ( y Y x = cov(y;-,n) ^ . ^ . M ( 3 ' k> {VM ^ O V a r f Y * ) } 1 / 2 {$(ZJ)(1 - $(zj))$(zk)(l - The variance of Yj achieves its maximum when Zj = 0. In this case E(Yj) = 1/2, Var(Y}) = 1/4. If Zj =0,zk= 0, we have Cov(Y}, Yk) = ( s i n - 1 0jk)/(2ir), and Corr(Yj, Yk) = (2 s i n - 1 0jk)/TT. Without loss of generality, assume Zj < zk, then when 0jk is at its boundary values, f {*(z,-)(l - $(z*))/(l - *(z,.))4(z*)} 1 / 2, Ojk = 1 , -{(1 - $(z,))(l - *(zt))/*(*;)*(*0}1/2. 0jk = - 1 , -Zk < ZJ , -{*(Zj)*(zk)/(l - *{Zj))(l - ^Zk))}1'2, Ojk = - 1 , Zj < -Zk , { 0, Ojk = 0 . From Frechet bound inequalities, - m i n i y / 2 , * ! - 1 / 2 } < Cou(Yj,Yk) < minjfc 1 / 2 , 6"1/2}, where a = [Pj(l)Pk{l)}/[Pj(0)Pk(0)], b = [Pj(l)Pk(0)]/[Pj(0)Pk{l)], we see that Con(Yj}Yk) attains its upper and lower bound when Ojk = 1 and 0jk = —1 respectively. Corr(Yj,Yfc) is an increasing function in Ojk, it varies over its full range as Ojk varies over its full range. 
Thus in a general situation a multivariate probit model of dimension d consists of d univariate probit models describing the marginal characteristics and d(d−1)/2 latent correlations θ_jk, 1 ≤ j < k ≤ d, expressing the strength of the association among the response variables. θ_jk = 0 corresponds to independence among the response variables. The response variables are exchangeable if Θ has an exchangeable structure and the cut-off points are constant across all the margins. Note that when Θ has an exchangeable structure, we must have θ_jk = θ > −1/(d − 1).

The estimating equations for the multivariate probit model with multinormal copula, based on n response vectors y_i, i = 1, ..., n, are

  Ψ_nj(z_j) = [n_j(1)/Φ(z_j) − n_j(0)/(1 − Φ(z_j))] φ(z_j) = 0,  j = 1, ..., d,
  Ψ_njk(θ_jk) = [n_jk(11)/P_jk(11) − n_jk(10)/P_jk(10) − n_jk(01)/P_jk(01) + n_jk(00)/P_jk(00)] φ_2(z_j, z_k, θ_jk) = 0,  1 ≤ j < k ≤ d,

where P_jk(11) = Φ_2(z_j, z_k, θ_jk), P_jk(10) = Φ(z_j) − Φ_2(z_j, z_k, θ_jk), P_jk(01) = Φ(z_k) − Φ_2(z_j, z_k, θ_jk) and P_jk(00) = 1 − Φ(z_j) − Φ(z_k) + Φ_2(z_j, z_k, θ_jk). These lead to the solutions ẑ_j = Φ^{-1}(n_j(1)/n), j = 1, ..., d, and θ̂_jk is the root of the equation Φ_2(ẑ_j, ẑ_k, θ_jk) = n_jk(11)/n, 1 ≤ j < k ≤ d.

For the situation with covariates, the details are similar to the multivariate logit model with multinormal copula in the preceding subsection, except now we have

  P_ij(y_ij) = y_ij Φ(z_ij) + (1 − y_ij)(1 − Φ(z_ij)),
  P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_jk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_jk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_jk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_jk),

with a_ij = G_ij(y_ij − 1), b_ij = G_ij(y_ij), a_ik = G_ik(y_ik − 1) and b_ik = G_ik(y_ik), where G_ij(1) = 1 and G_ij(0) = 1 − Φ(z_ij). We also have ∂P(y_ij)/∂α_js = (2y_ij − 1) φ(z_ij) x_ijs, s = 0, 1, ..., p_j, and ∂^2 P(y_ij)/∂α_js ∂α_jt = (1 − 2y_ij) φ(z_ij) z_ij x_ijs x_ijt, s, t = 0, 1, ..., p_j; these expressions are needed for applying the Newton-Raphson method to get estimates of α_j.

M_Ψ and D_Ψ can be calculated and estimated by following the results in section 2.4. For example, for the case with no covariates, we have E(ψ^2(z_j)) = φ^2(z_j)/{Φ(z_j)(1 − Φ(z_j))} and E(ψ^2(θ_jk)) = [1/P_jk(11) + 1/P_jk(10) + 1/P_jk(01) + 1/P_jk(00)][∂P_jk(11)/∂θ_jk]^2, where ∂P_jk(11)/∂θ_jk = ∂Φ_2(z_j, z_k, θ_jk)/∂θ_jk = φ_2(z_j, z_k, θ_jk), a result due to Plackett (1954). In applications, to avoid the tedious computer coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variance of ẑ_j and θ̂_jk in case there are no covariates, or that of α̂_j and β̂_jk in case there are covariates.

3.2 Comparison of models

We obtain many models under the name of multivariate logit model (also multivariate probit model) for binary data. An immediate question is: when is one model preferred to another? In section 1.3, we outlined some desirable features of multivariate models; among them, (2) and (3) may be the most important. But in applications, the importance of a particular desirable feature of a multivariate model may well depend on the practical needs and constraints. As an example, we briefly compare the multivariate logit models and the multivariate probit models with the different copulas studied in section 3.1. The multivariate logit model with multinormal copula satisfies the desirable properties (1), (2), (3) and (4) of a multivariate model outlined in section 1.3, but not (5). The multivariate probit model with multinormal copula is similar, except that one has logit univariate margins and the other has probit univariate margins.
In applications, the multivariate logit model with multinormal copula may be preferred to the multivariate probit model with multinormal copula, as the multivariate logit model has the advantage of a closed form univariate marginal cdf. This consideration also leads to a general preference for the multivariate logit model over the multivariate probit model when both have the same associated multivariate copula. For this reason, in the following, we concentrate on the multivariate logit models.

The multivariate logit model with the mixture of max-id copula (3.3) satisfies the desirable properties (1), (3) and (5) of a multivariate model outlined in section 1.3, but only partially (2) and (4). The model only admits positive dependence (otherwise, it is flexible and wide in terms of dependence range) and it is CUOM(k) (k ≥ 2) but not CUOM. The closed form cdf of this model is a very attractive feature. If the data exhibit only positive dependence (or prior knowledge tells us so), then the multivariate logit model with mixture of max-id copula (3.3) may be preferred to the multivariate logit model with multinormal copula.

The multivariate logit model with the M-L construction satisfies the desirable properties (1), (2), (3) and (4) of a multivariate model outlined in section 1.3, but not (5). The computation of the cdf may be numerically easier than for the multivariate logit model with multinormal copula, since the former only requires solving a set of polynomial equations, while the latter requires multiple integration. The disadvantage of this model, as stated earlier, is that the object from the construction has not been proven to be a proper multivariate copula. What has been verified numerically (see Joe 1996) is that (3.4) and its extensions do not yield proper distributions if η_1234 and η_jkl (1 ≤ j < k < l ≤ 4) are either too small or too large. In any case, the useful thing about this model is that it leads to multivariate objects with given proper univariate and bivariate margins.

The multivariate logit model with the Morgenstern copula satisfies the desirable properties (1), (4) and (5) of a multivariate model outlined in section 1.3, but not (2) and (3). This is a major drawback; thus this model is not very useful.

The multivariate logit models with the permutation symmetric copulas (3.7) are only suitable for modelling data with special exchangeable dependence patterns. They cannot be considered widely applicable models, because the desirable property (2) of multivariate models is not satisfied. Nevertheless, this model may be an interesting consideration in some applications, such as when the data to be modelled are repeated measures over different treatments, or familial data.

In summary, for general applications, the multivariate logit model with the multinormal copula or the mixture of max-id copula (3.3) may be preferred. If the condition of positive dependence holds in a study, then the multivariate logit model with the mixture of max-id copula (3.3) may be preferred to the multivariate logit model with multinormal copula because the former has a closed form multivariate cdf; this is particularly attractive for a moderate to large dimension of response, d. The multivariate logit model with Molenberghs-Lesaffre construction may be another interesting option.
When several models fit the data about equally well, a preference for one should be based on which desirable feature is considered important to the successful application of the models. In many situations, several equally good models may be possible; see Chapter 5 for discussion and data analysis examples.

In the statistical literature, the multivariate probit model with multinormal copula has been studied and applied. An early reference on an application to binary data is Ashford and Sowden (1970). An explanation of the popularity of the multivariate probit model with multinormal copula is that the model is related to the multivariate normal distribution, which allows the multivariate probit model to accommodate the dependence of the response variables in its full range. Furthermore, the marginal models follow the simple univariate probit models.

3.3 Multivariate copula discrete models for count data

Univariate count data may be modelled by binomial, negative binomial, logarithmic, Poisson, or generalized Poisson distributions, depending on the amount of dispersion. In this section, we study some MCD models for multivariate count data.

3.3.1 Multivariate Poisson model

The multivariate Poisson model for Poisson count data is obtained by letting G_j(y_j) = Σ_{m=0}^{[y_j]} p_j^{(m)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where p_j^{(m)} = [λ_j^m exp(−λ_j)]/m!, λ_j > 0. The copula C in (2.13) is arbitrary; copulas (3.1)-(3.8) are some interesting choices here. The multivariate Poisson model has univariate Poisson margins. We have E(Y_j) = Var(Y_j) = λ_j, which is a characterizing feature of the Poisson distribution called equidispersion. There are situations where the variance of count data is greater than the mean, or the variance is smaller than the mean; the former case is called overdispersion and the latter case is called underdispersion. We will see models dealing with overdispersion and underdispersion in the subsequent sections. Although the multivariate Poisson model has Poisson univariate marginal distributions, the conditional distributions are not Poisson.

The univariate parameter λ_j in the multivariate Poisson model can be reparameterized by taking η_j = log(λ_j), so that the new parameter η_j has the range (−∞, ∞). It is also straightforward to extend the univariate marginal parameters to include covariates. For example, for λ_ij corresponding to the random vector Y_i, we can let λ_ij = exp(α_j′ x_ij), where α_j is a parameter vector and x_ij is a covariate vector corresponding to the jth margin. The discussion on modelling the dependence parameters in the copulas in section 3.1 is also relevant here. Most of the discussion in section 3.2 about the comparison of models is also relevant here, as the comparison is essentially a comparison of the associated multivariate copulas. In summary, under the name "multivariate Poisson models", we may have the multivariate Poisson model with

  i. multinormal copula (3.1),
  ii. multivariate Molenberghs-Lesaffre construction
      a. with bivariate normal copula,
      b. with Plackett copula (2.8),
      c. with Frank copula (2.9),
  iii. mixture of max-id copula (3.3),
  iv. Morgenstern copula (3.6),
  v. the permutation symmetric copula (3.8).

These are similar to the multivariate logit models for binary data. For illustration purposes, in the following, we provide some details on the multivariate Poisson model with the multinormal copula.
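As a computational companion to the details that follow, note that every bivariate probability of the normal-copula Poisson model is a rectangle probability of the bivariate normal evaluated at transformed Poisson cdf values. A minimal sketch (hypothetical λ's and θ; Python with scipy assumed):

    import numpy as np
    from scipy.stats import norm, poisson, multivariate_normal

    def bvn_cdf(u, v, theta):
        # C_jk(u,v) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); theta); zero if either argument is 0
        if u <= 0.0 or v <= 0.0:
            return 0.0
        return multivariate_normal.cdf([norm.ppf(u), norm.ppf(v)],
                                       mean=[0, 0], cov=[[1, theta], [theta, 1]])

    def pmf_jk(yj, yk, lam_j, lam_k, theta):
        # P_jk(yj,yk) as a rectangle probability with a = G(y-1), b = G(y), G = Poisson cdf
        aj, bj = poisson.cdf(yj - 1, lam_j), poisson.cdf(yj, lam_j)
        ak, bk = poisson.cdf(yk - 1, lam_k), poisson.cdf(yk, lam_k)
        return (bvn_cdf(bj, bk, theta) - bvn_cdf(bj, ak, theta)
                - bvn_cdf(aj, bk, theta) + bvn_cdf(aj, ak, theta))

    print(pmf_jk(2, 3, lam_j=1.5, lam_k=2.0, theta=0.6))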
The multivariate Poisson model with the multinormal copula can also be called the multivariate normal-copula Poisson model. This model was already introduced in Example 2.11. For the multivariate normal-copula Poisson model, the Fréchet upper bound is reached in the limit if Θ = J, where J is the matrix of 1's; in fact, when θ_jk = 1 and λ_j = λ_k, the correlation of the response variables Y_j and Y_k is 1. When Θ has an exchangeable structure and λ_j does not depend on j, then there is also an exchangeable correlation structure in the response vector Y. The loglikelihood functions of margins for the parameters λ and Θ, based on n observed response vectors y_i, i = 1, ..., n, are

  ℓ_nj(λ_j) = Σ_{i=1}^n log P_j(y_ij),  j = 1, ..., d,
  ℓ_njk(θ_jk) = Σ_{i=1}^n log P_jk(y_ij y_ik),  1 ≤ j < k ≤ d,   (3.11)

where P_j(y_ij) = λ_j^{y_ij} exp(−λ_j)/y_ij! and

  P_jk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_jk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_jk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_jk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_jk),

where a_ij = G_ij(y_ij − 1), b_ij = G_ij(y_ij), a_ik = G_ik(y_ik − 1) and b_ik = G_ik(y_ik), with G_ij(y_ij) = Σ_{x=0}^{y_ij} p_j(x) and G_ik(y_ik) = Σ_{x=0}^{y_ik} p_k(x). The estimating equations based on IFM are

  Ψ_nj(λ_j) = Σ_{i=1}^n [1/P_j(y_ij)] ∂P_j(y_ij)/∂λ_j = 0,  j = 1, ..., d,
  Ψ_njk(θ_jk) = Σ_{i=1}^n [1/P_jk(y_ij y_ik)] ∂P_jk(y_ij y_ik)/∂θ_jk = 0,  1 ≤ j < k ≤ d,   (3.12)

which lead to λ̂_j = Σ_{i=1}^n y_ij/n, and θ̂_jk can be found through numerical computation.

An extension of the multivariate normal-copula Poisson model with covariate x_ij for the response observation y_ij is to let λ_ij = h_j(γ_j, x_ij) for some function h_j with range [0, ∞). An example of the function h_j is λ_ij = exp(γ_j′ x_ij) (or log(λ_ij) = γ_j′ x_ij). The ways to let the dependence parameters θ_jk be functions of covariates follow the discussion in section 3.1 for the multivariate logit model with multinormal copula. We can apply quasi-Newton minimization to the loglikelihood functions of margins (3.11) to obtain the estimates of the parameters γ_j and the dependence parameters θ_jk (or the regression parameters for the dependence parameters, if applicable). The Newton-Raphson method can also be used to obtain the estimates of γ_j from Ψ_nj(λ_j) = 0. Let log(λ_ij) = γ_j0 + γ_j1 x_ij1 + ··· + γ_jp_j x_ijp_j. For applying the Newton-Raphson method, we need to calculate ∂P(y_ij)/∂γ_js and ∂^2 P(y_ij)/∂γ_js ∂γ_jt. If we let x_ij0 = 1, we have

  ∂P(y_ij)/∂γ_js = {λ_ij^{y_ij} exp(−λ_ij)/y_ij!}[y_ij − λ_ij] x_ijs,  s = 0, 1, ..., p_j,

and

  ∂^2 P(y_ij)/∂γ_js ∂γ_jt = {λ_ij^{y_ij} exp(−λ_ij)/y_ij!}[(y_ij − λ_ij)^2 − λ_ij] x_ijs x_ijt,  s, t = 0, 1, ..., p_j.

For details about numerical methods, see section 2.7.

3.3.2 Multivariate generalized Poisson model

The multivariate generalized Poisson model for count data is obtained by letting G_j(y_j) = Σ_{s=0}^{[y_j]} p_j^{(s)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

  p_j^{(s)} = λ_j(λ_j + sα_j)^{s−1} exp(−λ_j − sα_j)/s!,  s = 0, 1, 2, ...,
  p_j^{(s)} = 0 for s > m, when α_j < 0,   (3.14)

where λ_j > 0, max(−1, −λ_j/m) < α_j < 1 and m (≥ 4) is the largest positive integer for which λ_j + mα_j > 0 when α_j is negative. The copula C in (2.13) is arbitrary; the copulas (3.1)-(3.8) are some choices here. The multivariate generalized Poisson model has as its jth (j = 1, ..., d) margin the generalized Poisson distribution with pmf (3.14). This generalized Poisson distribution is extensively studied in a monograph by Consul (1989). Its main characteristic is that it allows for both overdispersion and underdispersion by introducing one additional parameter α_j. The generalized Poisson distribution has the Poisson distribution as a special case when α_j = 0.
The mean and variance of Y_j are E(Y_j) = λ_j(1 − α_j)^{-1} and Var(Y_j) = λ_j(1 − α_j)^{-3}, respectively. Thus, the generalized Poisson distribution displays overdispersion for 0 < α_j < 1, equidispersion for α_j = 0 and underdispersion for max(−1, −λ_j/m) < α_j < 0. The restrictions leading to underdispersion are rather complicated, as the parameters α_j are restricted by the sample space. It is easier to work with the overdispersion situation, where the restrictions are simply λ_j > 0, 0 < α_j < 1.

The details of applying the IFM procedure to the generalized Poisson model are similar to those for the multivariate Poisson model. For the situation with no covariates, the univariate estimating equations for the parameters λ_j and α_j, j = 1, ..., d, are

  Σ_{i=1}^n [1/λ_j + (y_ij − 1)/(λ_j + α_j y_ij) − 1] = 0,
  Σ_{i=1}^n [y_ij(y_ij − 1)/(λ_j + α_j y_ij) − y_ij] = 0.   (3.15)

They lead to

  Σ_{i=1}^n y_ij(y_ij − 1)/[y_{+j} + (n y_ij − y_{+j})α_j] − y_{+j}/n = 0,
  y_{+j}(1 − α_j) − n λ_j = 0,

where y_{+j} = Σ_{i=1}^n y_ij. When there is a covariate vector x_ij for the response observation y_ij, we may let λ_ij = a_j(γ_j, x_ij) for some function a_j with range [0, ∞), and let α_ij = b_j(η_j, x_ij) for some function b_j with range [0, 1]. An example is λ_ij = exp(γ_j′ x_ij) (or log(λ_ij) = γ_j′ x_ij) and α_ij = 1/[1 + exp(−η_j′ x_ij)]. The discussion on modelling the dependence parameters in the copulas in section 3.1 is also appropriate here. Furthermore, most of the discussion in section 3.2 about the comparison of models is also relevant here, since the comparison is essentially a comparison of the associated multivariate copulas.

3.3.3 Multivariate negative binomial model

The multivariate negative binomial model for count data is obtained by letting G_j(y_j) = Σ_{s=0}^{[y_j]} p_j^{(s)}, y_j = 0, 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

  p_j^{(s)} = [Γ(α_j + s)/(Γ(α_j) s!)] p_j^{α_j} (1 − p_j)^s,  s = 0, 1, 2, ...,   (3.16)

with α_j > 0 and 0 < p_j < 1. The mean and variance of Y_j are E(Y_j) = α_j(1 − p_j)/p_j and Var(Y_j) = α_j(1 − p_j)/p_j^2, respectively. Since α_j > 0, we see that this model allows for overdispersion. When there is a covariate vector x_ij for the response observation y_ij, we may let α_ij = a_j(γ_j, x_ij) for some function a_j with range [0, ∞), and let p_ij = b_j(η_j, x_ij) for some function b_j with range [0, 1]. See Lawless (1987) for another way to deal with covariates. Other details are similar to those of the multivariate generalized Poisson model.

3.3.4 Multivariate logarithmic series model

The multivariate logarithmic series model for count data is obtained by letting G_j(y_j) = Σ_{s=1}^{[y_j]} p_j^{(s)}, y_j = 1, 2, ..., j = 1, 2, ..., d, in the model (2.13), where

  p_j^{(s)} = a_j p_j^s / s,  s = 1, 2, ...,   (3.17)

with a_j = −[log(1 − p_j)]^{-1} and 0 < p_j < 1. The mean and variance of Y_j are E(Y_j) = a_j p_j/(1 − p_j) and Var(Y_j) = a_j p_j(1 − a_j p_j)/(1 − p_j)^2, respectively. This model allows for overdispersion when p_j > 1 − e^{-1} and underdispersion when p_j < 1 − e^{-1}. Note that for this model to allow a zero count, we need a shift of one, such that p_j^{(t)} = a_j p_j^{t+1}/(t + 1) for t = 0, 1, 2, ....

For the situation where there is a covariate vector x_ij for the response observation y_ij, we may let p_ij = F_j(γ_j′ x_ij), where F_j is a univariate cdf. An unattractive feature of this model is that p_j^{(s)} is a decreasing function of s, which may not be suitable in many applications.
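For the generalized Poisson margins of section 3.3.2, the simplified pair of equations above reduces to a one-dimensional root search in α_j followed by λ̂_j = ȳ_j(1 − α̂_j). A sketch with hypothetical count data (the bracketing interval is a guess that may need adjusting for other data):

    import numpy as np
    from scipy.optimize import brentq

    y = np.array([0, 2, 1, 4, 3, 0, 1, 2, 5, 1])    # hypothetical counts for margin j
    n, yplus = len(y), y.sum()

    def g(alpha):
        # first equation with lambda_j = y_{+j}(1 - alpha)/n substituted in
        return np.sum(y * (y - 1) / (yplus + (n * y - yplus) * alpha)) - yplus / n

    alpha_hat = brentq(g, -0.2, 0.99)                # bracket spans mild under- to strong overdispersion
    lam_hat = (yplus / n) * (1 - alpha_hat)
    print(alpha_hat, lam_hat)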
3.4 Multivariate copula discrete models for ordinal data

In this section, we discuss the modelling of multivariate ordinal categorical data with multivariate copula discrete (MCD) models. We first briefly discuss some special features of ordinal categorical data before we introduce the general MCD model for ordinal data and some specific models.

When a polytomous variable has an ordered structure, we may assume the existence of a latent continuous random variable that measures the level of the ordered polytomous variable. For a binary variable, models for ordered data and unordered data are equivalent, but for categorical variables with more than 2 categories, ordered data and unordered data are quite different. The modelling of unordered data is not as straightforward as the modelling of ordered data. This is especially so in the multivariate situation, where it is not obvious how to model the dependence structure of unordered data. We will briefly discuss the modelling of multivariate polytomous unordered data in Chapter 7. One aspect of ordinal data worth noticing is that it is possible to combine one category with an adjacent category for data analysis. This practice may not be as meaningful for unordered categorical data, since the notion of adjacent category is then not meaningful, and arbitrary clumping of categories may be unsatisfactory.

We next introduce the MCD model for ordinal data. Consider a d-dimensional ordinal categorical random vector Y with m_j categories for the jth margin (j = 1, 2, ..., d) and with the categories coded as 1, 2, ..., m_j. For the jth margin, the outcome y_j can take values 1, 2, ..., m_j, where m_j can differ with the index j. For Y_j, suppose the probability of outcome s, s = 1, 2, ..., m_j, is p_j^{(s)}. We define

  G_j(y_j) = 0,  y_j < 1,
  G_j(y_j) = Σ_{s=1}^{[y_j]} p_j^{(s)},  1 ≤ y_j < m_j,   (3.18)
  G_j(y_j) = 1,  y_j ≥ m_j,

where [y_j] means the largest integer less than or equal to y_j. For a given d-dimensional copula C(u_1, ..., u_d; θ), C(G_1(y_1), ..., G_d(y_d); θ) is a well-defined distribution for the ordinal random vector Y. The pmf of Y is

  P(y_1 ··· y_d) = Σ_{i_1=1}^2 ··· Σ_{i_d=1}^2 (−1)^{i_1+···+i_d} C(a_{1 i_1}, ..., a_{d i_d}; θ),   (3.19)

where a_{j1} = G_j(y_j − 1) and a_{j2} = G_j(y_j). (3.19) is called the multivariate copula discrete model for ordinal data.

Since Y_j is an ordered categorical variable, one simple way to reparameterize p_j^{(s)}, so that the new parameters associated with the univariate margin have ranges over the entire space, is to let G_j(y_j) = F_j(z_j(y_j)), where F_j is the cdf of a continuous random variable Z_j. Thus p_j^{(y_j)} = F_j(z_j(y_j)) − F_j(z_j(y_j − 1)). This is equivalent to

  Y_j = 1 iff z_j(0) < Z_j ≤ z_j(1),
  Y_j = 2 iff z_j(1) < Z_j ≤ z_j(2),
  ···   (3.20)
  Y_j = m_j iff z_j(m_j − 1) < Z_j ≤ z_j(m_j),

where −∞ = z_j(0) < z_j(1) < ··· < z_j(m_j − 1) < z_j(m_j) = ∞ are constants, j = 1, 2, ..., d, and the random vector Z = (Z_1, ..., Z_d)′ has a multivariate cdf F_{12···d}. In the literature, the representation in (3.20) is referred to as modelling Y through the latent random vector Z, and the parameter z_j(y_j) is called the y_jth cut-off point for the random variable Z_j. As for the MCD model for binary data, the choices of F_j are abundant: F_j can be the standard logistic, normal, extreme value, gamma or lognormal cdf, and so on. Furthermore, F_j(z_j) need not be in the same distribution family for different j.
Similarly, the copula C(u_1, ..., u_d; θ) can take many forms: it can be the multivariate normal copula, a mixture of max-id copula, the Molenberghs-Lesaffre construction, the Morgenstern copula or a permutation symmetric copula. It is also possible to express the parameters z_j(y_j) as functions of covariates, as we will see through examples. For the dependence parameters θ in the copula C(u_1, ..., u_d; θ), there is also the option of including covariates. The discussion in section 3.1 on letting the dependence parameters in the copulas be functions of covariates is also relevant here, since this only depends on the associated copulas. In the following, in parallel with the multivariate models for binary data, we present some examples of multivariate models for ordinal data.

3.4.1 Multivariate logit model

The multivariate logit model for ordinal data is obtained by letting G_j(y_j) = exp(z_j(y_j))/[1 + exp(z_j(y_j))] in (3.19), where −∞ = z_j(0) < z_j(1) < ··· < z_j(m_j − 1) < z_j(m_j) = ∞ are constants, j = 1, 2, ..., d. This is equivalent to letting F_j(z) = exp(z)/[1 + exp(z)], that is, choosing F_j to be the standard logistic cdf. The copula C in (3.19) is arbitrary; the copulas (3.1)-(3.8) are some choices here. It is relatively straightforward to extend the univariate marginal parameters to include covariates. For example, for z_ij corresponding to the random vector Y_i, we can let z_ij(y_ij) = γ_j(y_ij) + g_j(α_j, x_ij), for some constants −∞ = γ_j(0) < γ_j(1) < ··· < γ_j(m_j − 1) < γ_j(m_j) = ∞ and some function g_j with range (−∞, ∞). An example of the function g_j is g_j(α_j, x_ij) = α_j′ x_ij. As we have discussed for the multivariate copula discrete models for binary data, a simple way to deal with the dependence parameters is to let the dependence parameters in the copula be independent of covariates. To extend the model to let the dependence parameters be functions of covariates requires specific knowledge of the associated copula C; the discussion in section 3.1 for the multivariate logit models for binary data is also relevant here. As with the multivariate logit models for binary data, we may also have the multivariate logit model for ordinal data with

  i. multinormal copula (3.1),
  ii. multivariate Molenberghs-Lesaffre construction
      a. with bivariate normal copula,
      b. with Plackett copula (2.8),
      c. with Frank copula (2.9),
  iii. mixture of max-id copula (3.3),
  iv. Morgenstern copula (3.6),
  v. the permutation symmetric copula (3.8).

For illustrative purposes, we give some details on the multivariate logit model with multinormal copula for ordinal data, which is also called the multivariate normal-copula logit model for ordinal data.

Let the data be y_i = (y_i1, ..., y_id), i = 1, ..., n. For the situation with no covariates, there are Σ_{j=1}^d (m_j − 1) univariate parameters and d(d−1)/2 dependence parameters. The estimating equations from (2.42) are

  Ψ_nj(z_j(y_j)) = ( n_j(y_j)/[F(z_j(y_j)) − F(z_j(y_j − 1))] − n_j(y_j + 1)/[F(z_j(y_j + 1)) − F(z_j(y_j))] ) exp(−z_j(y_j))/(1 + exp(−z_j(y_j)))^2 = 0,
  Ψ_njk(θ_jk) = Σ_{y_j} Σ_{y_k} [n(y_j y_k)/P_jk(y_j y_k)] ∂P_jk(y_j y_k)/∂θ_jk = 0,
where F(z) = 1/[1 + exp(−z)], and

  P_jk(y_j y_k) = Φ_2(Φ^{-1}(u_j), Φ^{-1}(u_k); θ_jk) − Φ_2(Φ^{-1}(u_j), Φ^{-1}(u_k′); θ_jk) − Φ_2(Φ^{-1}(u_j′), Φ^{-1}(u_k); θ_jk) + Φ_2(Φ^{-1}(u_j′), Φ^{-1}(u_k′); θ_jk),

with u_j = 1/[1 + exp(−z_j(y_j))], u_k = 1/[1 + exp(−z_k(y_k))], u_j′ = 1/[1 + exp(−z_j(y_j − 1))] and u_k′ = 1/[1 + exp(−z_k(y_k − 1))]. From Ψ_nj(z_j(y_j)) = 0, we obtain

  F(z_j(y_j + 1)) − F(z_j(y_j)) = [n_j(y_j + 1)/n_j(y_j)] (F(z_j(y_j)) − F(z_j(y_j − 1))).

This implies that F(z_j(y_j)) − F(z_j(y_j − 1)) = [n_j(y_j)/n_j(1)] F(z_j(1)), and summing over y_j = 1, ..., m_j leads to

  1 = [Σ_{r=1}^{m_j} n_j(r)/n_j(1)] F(z_j(1)),  so that  F(z_j(1)) = n_j(1)/n,

where n = Σ_{y_j=1}^{m_j} n_j(y_j). It is thus easy to see that F(ẑ_j(y_j)) = Σ_{r=1}^{y_j} n_j(r)/n, which means that the estimate of z_j(y_j) from IFM is

  ẑ_j(y_j) = log[ Σ_{r=1}^{y_j} n_j(r) / (n − Σ_{r=1}^{y_j} n_j(r)) ].

The closed form of θ̂_jk is not available; we need to solve Ψ_njk(θ_jk) = 0 numerically to find θ̂_jk.

For the situation with a covariate vector x_ij for the marginal parameters z_ij(y_ij) of Y_ij, and a covariate vector w_ijk for the dependence parameter θ_ijk, i = 1, ..., n, one way to extend the model to include the covariates is as follows:

  z_ij(y_ij) = γ_j(y_ij) + α_j′ x_ij,  j = 1, 2, ..., d,
  θ_ijk = [exp(β_jk′ w_ijk) − 1]/[exp(β_jk′ w_ijk) + 1],  1 ≤ j < k ≤ d.   (3.21)

The loglikelihood functions of margins for the parameter vectors γ_j = (γ_j(1), ..., γ_j(m_j − 1))′, α_j (j = 1, ..., d) and β_jk (1 ≤ j < k ≤ d) are

  ℓ_nj(γ_j, α_j) = Σ_{i=1}^n log P_j(y_ij),
  ℓ_njk(γ_j, γ_k, α_j, α_k, β_jk) = Σ_{i=1}^n log P_ijk(y_ij y_ik),   (3.22)

where P_ij(y_ij) = F(z_ij(y_ij)) − F(z_ij(y_ij − 1)) and

  P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_ijk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_ijk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_ijk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_ijk),

with a_ij = F(z_ij(y_ij − 1)), b_ij = F(z_ij(y_ij)), a_ik = F(z_ik(y_ik − 1)) and b_ik = F(z_ik(y_ik)). We can apply quasi-Newton minimization to the loglikelihood functions of margins (3.22) to obtain the estimates of the parameters γ_j, α_j and the dependence parameters θ_jk (or the regression parameters β_jk for the dependence parameters, if applicable). The Newton-Raphson method can also be used to obtain the estimates of γ_j from Ψ_nj(γ_j) = 0, and the estimates of α_j from Ψ_nj(α_j) = 0. For applying the Newton-Raphson method, we need to calculate ∂P_j(y_ij)/∂γ_j, ∂P_j(y_ij)/∂α_j, ∂^2 P_j(y_ij)/∂γ_j ∂γ_j′, ∂^2 P_j(y_ij)/∂α_j ∂α_j′ and ∂^2 P_j(y_ij)/∂γ_j ∂α_j′. The mathematical details are the following. Let z_ij(y_ij) = γ_j(y_ij) + α_j1 x_ij1 + ··· + α_jp_j x_ijp_j. For y_ij ≠ 1, m_j, we have P_j(y_ij) = exp(z_ij(y_ij))/[1 + exp(z_ij(y_ij))] − exp(z_ij(y_ij − 1))/[1 + exp(z_ij(y_ij − 1))]; thus

  ∂P_j(y_ij)/∂α_js = {exp(z_ij(y_ij))/(1 + exp(z_ij(y_ij)))^2 − exp(z_ij(y_ij − 1))/(1 + exp(z_ij(y_ij − 1)))^2} x_ijs,  s = 1, ..., p_j,

and

  ∂^2 P_j(y_ij)/∂α_js ∂α_jt = {exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))^3 − exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))^3} x_ijs x_ijt,  s, t = 1, ..., p_j.

For r = 1, 2, ..., m_j − 1, we have

  ∂P_j(y_ij)/∂γ_j(r) = exp(z_ij(y_ij))/(1 + exp(z_ij(y_ij)))^2 if r = y_ij;
                     = −exp(z_ij(y_ij − 1))/(1 + exp(z_ij(y_ij − 1)))^2 if r = y_ij − 1;
                     = 0 otherwise.

For r_1, r_2 = 1, 2, ..., m_j − 1, we have

  ∂^2 P_j(y_ij)/∂γ_j(r_1) ∂γ_j(r_2) = exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))^3 if r_1 = r_2 = y_ij;
                                    = −exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))^3 if r_1 = r_2 = y_ij − 1;
                                    = 0 otherwise,

and

  ∂^2 P_j(y_ij)/∂γ_j(r) ∂α_js = exp(z_ij(y_ij))(1 − exp(z_ij(y_ij)))/(1 + exp(z_ij(y_ij)))^3 x_ijs if r = y_ij;
                               = −exp(z_ij(y_ij − 1))(1 − exp(z_ij(y_ij − 1)))/(1 + exp(z_ij(y_ij − 1)))^3 x_ijs if r = y_ij − 1;
                               = 0 otherwise.
For y_ij = 1, P_j(y_ij) = exp(z_ij(y_ij))/[1 + exp(z_ij(y_ij))], and for y_ij = m_j, P_j(y_ij) = 1 − exp(z_ij(y_ij − 1))/[1 + exp(z_ij(y_ij − 1))]; thus the corresponding slight modifications of the above formulas should be made. For details about numerical methods, see section 2.7.

M_Ψ and D_Ψ can be calculated and estimated by following the results in section 2.4. In applications, to avoid the tedious coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variance of ẑ_j(y_j) and θ̂_jk when there are no covariates, or that of γ̂_j, α̂_j and β̂_jk when there are covariates.

3.4.2 Multivariate probit model

Similar to the multivariate probit model for binary data, the general multivariate probit model is obtained by letting G_j(y_j) = Φ(z_j(y_j)) in (3.19), where −∞ = z_j(0) < z_j(1) < ··· < z_j(m_j − 1) < z_j(m_j) = ∞ are constants, j = 1, 2, ..., d. This is equivalent to letting F_j(z) = Φ(z), that is, choosing F_j to be the standard normal cdf. The copula C in (3.19) is arbitrary; the copulas (3.1)-(3.8) are some choices here. The multivariate probit model with multinormal copula for ordinal data is discussed in the literature (see for example Anderson and Pemberton 1985). The discussion of the multivariate logit model for ordinal data in the previous subsection is relevant and can be directly applied to the multivariate probit model for ordinal data. For completeness, we provide some detailed discussion of the multivariate probit model for ordinal data when the copula is the multinormal copula.

Let the data be y_i = (y_i1, ..., y_id), i = 1, ..., n. For the situation with no covariates, there are Σ_{j=1}^d (m_j − 1) univariate parameters and d(d−1)/2 dependence parameters. As for the multivariate logit model, with the IFM approach, we find that ẑ_j(y_j) = Φ^{-1}(Σ_{r=1}^{y_j} n_j(r)/n), and θ̂_jk must be obtained numerically.

For the situation with a covariate vector x_ij for the marginal parameters z_j(y_ij) of Y_ij, and a covariate vector w_ijk for the dependence parameter θ_ijk, i = 1, ..., n, the details on IFM for parameter estimation are similar to those for the multivariate logit model for ordinal data in the preceding subsection. We here provide some mathematical details for this model. We have P_ij(y_ij) = Φ(z_ij(y_ij)) − Φ(z_ij(y_ij − 1)) and

  P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); θ_ijk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); θ_ijk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); θ_ijk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); θ_ijk),

where a_ij = Φ(z_ij(y_ij − 1)), b_ij = Φ(z_ij(y_ij)), a_ik = Φ(z_ik(y_ik − 1)) and b_ik = Φ(z_ik(y_ik)). The mathematical details for applying the Newton-Raphson method are the following. For y_ij ≠ 1, m_j, we have P_j(y_ij) = Φ(z_ij(y_ij)) − Φ(z_ij(y_ij − 1)); thus ∂P_j(y_ij)/∂α_js = [φ(z_ij(y_ij)) − φ(z_ij(y_ij − 1))] x_ijs, s = 1, ..., p_j, and ∂^2 P_j(y_ij)/∂α_js ∂α_jt = [−φ(z_ij(y_ij)) z_ij(y_ij) + φ(z_ij(y_ij − 1)) z_ij(y_ij − 1)] x_ijs x_ijt, s, t = 1, ..., p_j. For r = 1, 2, ..., m_j − 1, we have

  ∂P_j(y_ij)/∂γ_j(r) = φ(z_ij(y_ij)) if r = y_ij;  = −φ(z_ij(y_ij − 1)) if r = y_ij − 1;  = 0 otherwise.

For r_1, r_2 = 1, 2, ..., m_j − 1, we have

  ∂^2 P_j(y_ij)/∂γ_j(r_1) ∂γ_j(r_2) = −φ(z_ij(y_ij)) z_ij(y_ij) if r_1 = r_2 = y_ij;  = φ(z_ij(y_ij − 1)) z_ij(y_ij − 1) if r_1 = r_2 = y_ij − 1;  = 0 otherwise,

and

  ∂^2 P_j(y_ij)/∂γ_j(r) ∂α_js = −φ(z_ij(y_ij)) z_ij(y_ij) x_ijs if r = y_ij;  = φ(z_ij(y_ij − 1)) z_ij(y_ij − 1) x_ijs if r = y_ij − 1;  = 0 otherwise.
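Setting covariates aside for a moment: for both the logit and the probit versions, the closed-form IFM cut-off estimates ẑ_j(y_j) follow directly from the cumulative category proportions, as this brief sketch shows (hypothetical category counts; Python assumed):

    import numpy as np
    from scipy.stats import norm

    nj = np.array([23, 41, 27, 9])                # hypothetical counts for categories 1..m_j
    cumprop = np.cumsum(nj)[:-1] / nj.sum()       # F(z_j(y_j)) = sum_{r<=y_j} n_j(r)/n, y_j < m_j

    z_logit = np.log(cumprop / (1 - cumprop))     # logit model cut-offs
    z_probit = norm.ppf(cumprop)                  # probit model cut-offs
    print(z_logit, z_probit)

The boundary categories require no estimated cut-off, since z_j(0) = −∞ and z_j(m_j) = ∞; this is consistent with the modifications for y_ij = 1 and y_ij = m_j noted next.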
For y_ij = 1, P_j(y_ij) = Φ(z_ij(y_ij)), and for y_ij = m_j, P_j(y_ij) = 1 − Φ(z_ij(y_ij − 1)); thus the corresponding slight modifications of the above formulas should be made. For details on numerical methods, see section 2.7.

M_Ψ and D_Ψ can be calculated and estimated by following the results in section 2.4. For example, for the case with no covariates, we have E(ψ^2(z_j(y_j))) = {[P_j(y_j + 1) + P_j(y_j)] φ^2(z_j(y_j))}/{P_j(y_j + 1) P_j(y_j)}, where P_j(y_j) = Φ(z_j(y_j)) − Φ(z_j(y_j − 1)), and so on. In applications, to avoid the tedious coding of M_Ψ and D_Ψ, we may use the jackknife technique to obtain the asymptotic variances of ẑ_j(y_j) and θ̂_jk in case there are no covariates, or those of γ̂_j, α̂_j and β̂_jk in case there are covariates.

The multivariate probit model with multinormal copula for ordinal data has been studied and applied in the literature. For example, Anderson and Pemberton (1985) used a trivariate probit model for the analysis of an ornithological data set on three aspects of the colouring of blackbirds.

3.4.3 Multivariate binomial model

In the previous subsections, we supposed that for Y_j the probability of outcome s is p_j^{(s)}, s = 1, 2, ..., m_j, j = 1, ..., d, and we linked the m_j probabilities p_j^{(s)} to the m_j − 1 cut-off points z_j(1), z_j(2), ..., z_j(m_j − 1). We keep as many independent parameters within the margins and between the margins as possible. In some situations, it is worthwhile to reduce the number of free parameters and obtain a more parsimonious model which may still capture the major features of the data and serve the inference purpose. One way to reduce the number of free parameters for the ordinal variable is to reparameterize the marginal distribution. Coding the categories as s = 0, 1, ..., m_j − 1, and because Σ_s p_j^{(s)} = 1 and p_j^{(s)} ≥ 0, we may let

  p_j^{(s)} = C(m_j − 1, s) p_j^s (1 − p_j)^{m_j − 1 − s},  s = 0, 1, ..., m_j − 1,   (3.23)

for some 0 < p_j < 1. In other words, we assume that Y_j follows a binomial distribution Bi(m_j − 1, p_j). This reparameterization of the distribution of Y_j reduces the number of free parameters to one, namely p_j. The model constructed in this way is called the multivariate binomial model for ordinal data.

By treating P_j(y_j) as binomial probabilities, we need only deal with one parameter p_j for the jth margin. (3.23) is artificial for ordinal data, as s in (3.23) is based on letting Y_j take the integer values in {0, 1, ..., m_j − 1} as its category indicator. But s is a qualitative quantity; it should reflect the ordinal nature of the variable Y_j, and need not take on the integer values in {0, 1, ..., m_j − 1}. In applications, if one feels justified in assuming binomial behaviour in the sense of (3.23) for the univariate margin, then this model may be considered. (3.23) is a more natural assumption if the categorical outcome of each univariate response can be considered as the number of realizations of an event in a fixed number of random trials; in this situation, it is an MCD model for binomial count data. When there is a covariate vector x_ij for the response observation y_ij, we may let p_ij = b_j(η_j, x_ij) for some function b_j with range [0, 1]. Other details are similar to the multivariate logit model for binary data.

3.5 Multivariate mixture discrete models for binary data

The multivariate mixture discrete models (2.16) or (2.17) are flexible as to the type of discrete data and the multivariate structure, by allowing different choices of copulas. However, they generally do not have closed form pmf or cdf.
The choice of models should be based on the desirable features for multivariate models outlined in section 1.3; among them, (2) and (3) are considered to be essential. In this and the next section, we study some specific MMD models. The mathematical development for other MMD models with different choices of copulas should be similar.

3.5.1 Multivariate probit-normal model

The multivariate probit-normal model for binary data is introduced in (2.32) in Example 2.13. Following the notation in Example 2.13, the corresponding cut-off point is a_ij = B_j′ x_ij in the more general situation, where x_ij is a covariate vector, j = 1, ..., d and i = 1, ..., n. Assume B_j ~ N_{p_j}(μ_j, Σ_j), j = 1, ..., d. Let γ = (B_1, ..., B_d)′ ~ N_q(μ, Σ), where q = Σ_{j=1}^d p_j and Cov(B_j, B_k) = Σ_jk. From the stochastic representation in Example 2.13, we have

  z_ij* = μ_j′ x_ij / {1 + x_ij′ Σ_j x_ij}^{1/2},
  r_ijk = (θ_jk + x_ij′ Σ_jk x_ik) / {(1 + x_ij′ Σ_j x_ij)(1 + x_ik′ Σ_k x_ik)}^{1/2}.   (3.24)

The jth and (j, k) marginal pmfs are

  P_ij(y_ij) = y_ij Φ(z_ij*) + (1 − y_ij)(1 − Φ(z_ij*)),
  P_ijk(y_ij y_ik) = Φ_2(Φ^{-1}(b_ij), Φ^{-1}(b_ik); r_ijk) − Φ_2(Φ^{-1}(b_ij), Φ^{-1}(a_ik); r_ijk) − Φ_2(Φ^{-1}(a_ij), Φ^{-1}(b_ik); r_ijk) + Φ_2(Φ^{-1}(a_ij), Φ^{-1}(a_ik); r_ijk),

where a_ij = G_ij(y_ij − 1), b_ij = G_ij(y_ij), a_ik = G_ik(y_ik − 1) and b_ik = G_ik(y_ik), with G_ij(1) = 1 and G_ij(0) = 1 − Φ(z_ij*). We can thus apply quasi-Newton minimization to the loglikelihood functions of margins

  ℓ_nj(μ_j, Σ_j) = Σ_{i=1}^n log P_j(y_ij),  j = 1, ..., d,
  ℓ_njk(μ_j, Σ_j, μ_k, Σ_k, θ_jk, Σ_jk) = Σ_{i=1}^n log P_jk(y_ij y_ik),  1 ≤ j < k ≤ d,

to obtain the estimates of the parameters μ_j, Σ_j, μ_k, Σ_k, θ_jk and Σ_jk. Under appropriate assumptions, many simplifications of (3.24) are possible. For example, if Σ_j = I and Σ_jk = 0, j ≠ k, then (3.24) simplifies to

  z_ij* = μ_j′ x_ij / {1 + x_ij′ x_ij}^{1/2},  j = 1, ..., d,
  r_ijk = θ_jk / {(1 + x_ij′ x_ij)(1 + x_ik′ x_ik)}^{1/2},   (3.25)

which is a simple example of having the dependence parameters be functions of covariates in a natural way, as they are derived. The numerical advantage is that as long as Θ = (θ_jk) is positive-definite, then all R_i = (r_ijk), i = 1, ..., n, are positive-definite. An extension of (3.24) is to let

  z_ij* = μ_j′ x_ij,  j = 1, ..., d,
  r_ijk = (θ_jk + w_ij′ Σ_jk w_ik) / {(1 + w_ij′ Σ_j w_ij)(1 + w_ik′ Σ_k w_ik)}^{1/2},   (3.26)

where x_ij and w_ij may differ. However, this does not obtain from a mixture model.

3.5.2 Multivariate Bernoulli-Beta model

For a d-variate binary random vector Y taking value 0 or 1 for each component, assume we have the MMD model (2.17), such that

  P(y_1 ··· y_d) = ∫_0^1 ··· ∫_0^1 Π_{j=1}^d f(y_j; p_j) c(G_1(p_1), ..., G_d(p_d)) Π_{j=1}^d g_j(p_j) dp_1 ··· dp_d,   (3.27)

where f(y_j; p_j) = p_j^{y_j}(1 − p_j)^{1−y_j}. If in (3.27) G_j is a Beta(α_j, β_j) distribution, with density g_j(p_j) = [B(α_j, β_j)]^{-1} p_j^{α_j−1}(1 − p_j)^{β_j−1}, 0 < p_j < 1, then (3.27) is called a multivariate Bernoulli-Beta model. The copula C in (3.27) is arbitrary; a good choice may be the normal copula (2.4). With the normal copula, the model (3.27) is MUBE; thus the IFM approach can be applied to fit the model. We can then write the jth and (j, k) marginal pmfs as

  P_j(y_j) = ∫_0^1 p_j^{y_j}(1 − p_j)^{1−y_j} g_j(p_j) dp_j = B(α_j + y_j, β_j + 1 − y_j)/B(α_j, β_j),
  P_jk(y_j y_k) = ∫_0^1 ∫_0^1 p_j^{y_j}(1 − p_j)^{1−y_j} p_k^{y_k}(1 − p_k)^{1−y_k} [φ_2(x, y; θ_jk)/(φ(x)φ(y))] g_j(p_j) g_k(p_k) dp_j dp_k,

where x = Φ^{-1}(G_j(p_j)), y = Φ^{-1}(G_k(p_k)), φ_2(x, y; θ) is the density of the bivariate normal, φ is the standard normal density, and g_j is the density of Beta(α_j, β_j).
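The (j, k) probability just written can be evaluated by direct two-dimensional quadrature. A sketch under hypothetical parameter values (Python with scipy assumed; the normal-copula density is written out explicitly):

    import numpy as np
    from scipy import stats
    from scipy.integrate import dblquad

    aj, bj, ak, bk, th = 2.0, 3.0, 1.5, 2.5, 0.5   # hypothetical alpha_j, beta_j, alpha_k, beta_k, theta_jk

    def integrand(pk, pj, yj, yk):
        x = stats.norm.ppf(stats.beta.cdf(pj, aj, bj))
        y = stats.norm.ppf(stats.beta.cdf(pk, ak, bk))
        # log of the normal-copula density phi_2(x,y;th)/(phi(x)phi(y))
        det = 1.0 - th * th
        log_c = -0.5 * np.log(det) - (th * th * (x * x + y * y) - 2.0 * th * x * y) / (2.0 * det)
        dens = np.exp(log_c) * stats.beta.pdf(pj, aj, bj) * stats.beta.pdf(pk, ak, bk)
        return pj**yj * (1 - pj)**(1 - yj) * pk**yk * (1 - pk)**(1 - yk) * dens

    P11, _ = dblquad(integrand, 0, 1, lambda p: 0, lambda p: 1, args=(1, 1))
    print(P11)   # P_jk(1,1)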
Given data y_i = (y_i1, ..., y_id) with no covariates, we may obtain α̂_j, β̂_j and θ̂_jk with the IFM approach. For the case of an individual covariate x_ij for Y_ij, an interpretable extension of (3.27) is

  P(y_i) = ∫_0^1 ··· ∫_0^1 Π_{j=1}^d [h_j(x_ij, p_j)]^{y_ij} [1 − h_j(x_ij, p_j)]^{1−y_ij} c(G_1(p_1), ..., G_d(p_d)) Π_{j=1}^d g_j(p_j) dp_1 ··· dp_d,   (3.28)

for some function h_j with range in [0, 1]. A large family of such functions is h_j(x_ij, p_j) = F_j(F_j^{-1}(p_j) + β_j′ x_ij), where F_j is a univariate cdf. P_j(y_ij) and P_jk(y_ij y_ik) can be written accordingly. For example, if F_j(z) = exp(−e^{−z}), then h_j(x_ij, p_j) = p_j^{exp(−β_j′ x_ij)}, and we have that when y_ij = 1, P_j(y_ij) = B(α_j + exp(−β_j′ x_ij), β_j)/B(α_j, β_j). If covariates are not subject-dependent, but only margin-dependent, an alternative extension is to let α_ij and β_ij depend on the covariates through some functions a_j and b_j with range in [0, ∞], such that α_ij = a_j(γ_j, x_j) and β_ij = b_j(η_j, x_j). In this situation, we have, for example, P_j(y_ij) = B(a_j(γ_j′ x_j) + y_ij, b_j(η_j′ x_j) + 1 − y_ij)/B(a_j(γ_j′ x_j), b_j(η_j′ x_j)). An example of the functions a_j and b_j is α_ij = exp(γ_j′ x_j) and β_ij = exp(η_j′ x_j). When applying the IFM approach to parameter estimation, the numerical computation involves 2-dimensional integration, which should be feasible in most cases.

A special case of the model (3.27), where p_j = p, j = 1, ..., d, is the model (1.1), studied in Prentice (1986). The pmf of the model is

  P(y_1 ··· y_d) = ∫_0^1 p^{y_+}(1 − p)^{d − y_+} g(p) dp,   (3.29)

where y_+ = Σ_{j=1}^d y_j and g(p) is the density of a Beta(α, β) distribution. The model (3.29) has exchangeable dependence and admits only positive dependence. A discussion of this special model, with extensions to include covariates and to admit negative dependence, can be found in Joe (1996).

3.5.3 Multivariate logit-normal model

For a d-variate binary random vector Y taking value 0 or 1 for each component, suppose we have the MMD model (2.17), such that

  P(y_1 ··· y_d) = ∫_0^1 ··· ∫_0^1 Π_{j=1}^d f(y_j; p_j) g(p_1, ..., p_d) dp_1 ··· dp_d,   (3.30)

where f(y_j; p_j) = p_j^{y_j}(1 − p_j)^{1−y_j}, and g(·) is the density function of a normal copula, with univariate marginal cdfs

  G_j(p_j) = ∫_0^{p_j} [σ_j x(1 − x)]^{-1} φ([log(x/(1 − x)) − μ_j]/σ_j) dx,  0 < p_j < 1,  j = 1, ..., d.

In other words, if p_j is the outcome of a rv P_j, and Z_j = logit(P_j) = log(P_j/(1 − P_j)), j = 1, ..., d, then (Z_1, ..., Z_d)′ has a joint d-dimensional normal distribution with mean vector μ, variance vector σ^2 and correlation matrix Θ = (θ_jk). We have

  g(p_1, ..., p_d) = [(2π)^{d/2} |σ′Θσ|^{1/2} Π_{j=1}^d p_j(1 − p_j)]^{-1} exp{−(1/2)(z − μ)′(σ′Θσ)^{-1}(z − μ)},

where z = (z_1, ..., z_d)′, with z_j = log(p_j/(1 − p_j)). We call this model the multivariate logit-normal model. The Fréchet upper bound is reached in the limit if Θ = J, where J is the matrix of 1's, and σ^2 → ∞. The multivariate probit model obtains in the limit as σ goes to ∞, by assuming Θ to be a fixed correlation matrix and the mean parameters μ_j = σ_j z_j, where z_j is constant. The jth and the (j, k) marginal pmfs are

  P_j(y_j) = ∫_0^1 σ_j^{-1} p_j^{y_j − 1}(1 − p_j)^{−y_j} φ(x_j) dp_j,  j = 1, ..., d,
  P_jk(y_j y_k) = ∫_0^1 ∫_0^1 (σ_j σ_k)^{-1} p_j^{y_j − 1}(1 − p_j)^{−y_j} p_k^{y_k − 1}(1 − p_k)^{−y_k} φ_2(x_j, x_k; θ_jk) dp_j dp_k,  1 ≤ j < k ≤ d,

where x_j = [log(p_j/(1 − p_j)) − μ_j]/σ_j, j = 1, ..., d, φ is the standard univariate normal density function and φ_2 the standard bivariate normal density function. Given data y_i = (y_i1, ..., y_id) with no covariates, we may obtain μ̂_j, σ̂_j and θ̂_jk by the IFM approach.
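For the logit-normal margins, the one-dimensional integrals are conveniently computed by Gauss-Hermite quadrature rather than naive integration. A minimal sketch (hypothetical μ_j, σ_j; Python assumed):

    import numpy as np

    def logit_normal_pj(y, mu, sigma, m=40):
        # P_j(y) = E[p^y (1-p)^(1-y)] with logit(p) = mu + sigma*Z, Z ~ N(0,1);
        # Gauss-Hermite: int f(z) phi(z) dz = pi^{-1/2} sum_i w_i f(sqrt(2) x_i)
        x, w = np.polynomial.hermite.hermgauss(m)
        p = 1.0 / (1.0 + np.exp(-(mu + sigma * np.sqrt(2.0) * x)))
        vals = p if y == 1 else 1.0 - p
        return np.sum(w * vals) / np.sqrt(np.pi)

    print(logit_normal_pj(1, mu=-0.4, sigma=1.2))   # hypothetical parameter values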
For the case of different covariates for different margins, similar to the multivariate Bernoulli-Beta model, an interpretable extension of (3.30) is obtained by letting

  P(y_i1 ··· y_id) = ∫_0^1 ··· ∫_0^1 Π_{j=1}^d [h_j(x_ij, p_j)]^{y_ij} [1 − h_j(x_ij, p_j)]^{1−y_ij} g(p_1, ..., p_d) dp_1 ··· dp_d,   (3.31)

for some function h_j with range in [0, 1]. P_j(y_ij) and P_jk(y_ij y_ik) can be written accordingly. If covariates are not subject-dependent, an interpretable extension to include covariates in the parameters of (3.30) obtains by letting μ_ij = a_j(γ_j, x_j) and σ_ij = b_j(η_j, x_ij) for some functions a_j and b_j. The loglikelihood functions of margins for the parameters are now

  ℓ_nj(μ_j, σ_j) = Σ_{i=1}^n log P_j(y_ij),  j = 1, ..., d,
  ℓ_njk(θ_jk) = Σ_{i=1}^n log P_jk(y_ij y_ik),  1 ≤ j < k ≤ d,

where

  P_j(y_ij) = ∫_{−∞}^{∞} [exp{y_ij(μ_ij + σ_ij x)}/(1 + exp(μ_ij + σ_ij x))] φ(x) dx,
  P_jk(y_ij y_ik) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [exp{y_ij(μ_ij + σ_ij x)}/(1 + exp(μ_ij + σ_ij x))] [exp{y_ik(μ_ik + σ_ik y)}/(1 + exp(μ_ik + σ_ik y))] φ_2(x, y; θ_jk) dx dy.

An example of the functions a_j and b_j is μ_ij = γ_j′ x_ij and σ_ij = exp(η_j′ x_ij). It is also possible to include covariates in the dependence parameters θ_jk; a discussion of this can be found in section 3.1. Again, when applying the IFM approach to parameter estimation, the numerical computation involves 2-dimensional integration, which should be feasible in most cases.

3.6 Multivariate mixture discrete models for count data

3.6.1 Multivariate Poisson-lognormal model

The multivariate Poisson-lognormal model for count data is introduced in Example 2.12. The pmf of y_i = (y_i1, ..., y_id), i = 1, ..., n, is

  P(y_i1 ··· y_id) = ∫_0^∞ ··· ∫_0^∞ Π_{j=1}^d f(y_ij; λ_ij) g(η; μ_i, σ_i, Θ_i) dη_1 ··· dη_d,   (3.32)

where f(y_ij; λ_ij) = exp(−λ_ij) λ_ij^{y_ij}/y_ij!, and

  g(η; μ, σ, Θ) = [(2π)^{d/2} |σ′Θσ|^{1/2} Π_{j=1}^d η_j]^{-1} exp{−(1/2)(log η − μ)′(σ′Θσ)^{-1}(log η − μ)},   (3.33)

with η_j > 0, j = 1, ..., d, is the multivariate lognormal density. For the simple situation with no covariates, μ_i = μ, σ_i = σ and Θ_i = Θ. This model is studied in Aitchison and Ho (1989). The model (3.32) can accommodate a wide range of dependence, as we have seen in Example 2.12. Corr(Y_j, Y_k) is an increasing function of θ_jk, and varies over its full range when θ_jk varies over its full range. Thus in a general situation a multivariate Poisson-lognormal model of dimension d consists of d univariate Poisson-lognormal models describing the marginal characteristics and d(d−1)/2 dependence parameters θ_jk, 1 ≤ j < k ≤ d, expressing the strength of the associations among the response variables. θ_jk = 0 for all j ≠ k corresponds to independence among the response variables. The response variables are exchangeable if Θ has an exchangeable structure and μ_j and σ_j are constant across the margins. We will see another special case later which also leads to exchangeable response variables. The loglikelihood functions of margins for the parameters are

  ℓ_nj(μ_j, σ_j) = Σ_{i=1}^n log P_j(y_ij),  j = 1, ..., d,
  ℓ_njk(θ_jk, μ_j, μ_k, σ_j, σ_k) = Σ_{i=1}^n log P_jk(y_ij y_ik),  1 ≤ j < k ≤ d,

where

  P_j(y_ij) = [√(2π) y_ij!]^{-1} ∫_{−∞}^{∞} exp{y_ij(μ_j + σ_j z_j) − e^{μ_j + σ_j z_j}} exp(−z_j^2/2) dz_j,
  P_jk(y_ij y_ik) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [exp{y_ij(μ_j + σ_j z_j) + y_ik(μ_k + σ_k z_k)} / (y_ij! y_ik! exp(e^{μ_j + σ_j z_j} + e^{μ_k + σ_k z_k}))] φ_2(z_j, z_k; θ_jk) dz_j dz_k,

where φ_2 is the standard binormal density. To get the IFME of μ, σ and Θ, quasi-Newton minimization can be used. Good starting points can be obtained from the method of moments estimates.
Let ȳ_j, s_j^2 and r_jk be the sample means, sample variances and sample correlations, respectively. The method of moments estimates based on the expected values given in (2.30) are

  σ̂_j^0 = {log[(s_j^2 − ȳ_j)/ȳ_j^2 + 1]}^{1/2},  μ̂_j^0 = log ȳ_j − 0.5 (σ̂_j^0)^2,  θ̂_jk^0 = log{r_jk s_j s_k/(ȳ_j ȳ_k) + 1}/(σ̂_j^0 σ̂_k^0).

When there is a covariate vector x_ij for the response observation y_ij, we may let μ_ij = a_j(γ_j, x_ij) for some function a_j with range (−∞, ∞), and let σ_ij = b_j(η_j, x_ij) for some function b_j with range [0, ∞). An example of the functions a_j and b_j is μ_ij = γ_j′ x_ij and σ_ij = exp(η_j′ x_ij). It is also possible to let the dependence parameters θ_jk be functions of covariates; a discussion of this can be found in section 3.1. For details on numerical methods for obtaining the parameter estimates, see section 2.7.

A special situation of the multivariate Poisson-lognormal model is to assume that f(y_j; λ_j) = e^{−λ_j} λ_j^{y_j}/y_j!, where λ_j = λ β_j; β_j > 0 is considered as a scale factor (known or unknown), and the common parameter λ has the lognormal distribution LN(μ, σ^2). In this situation we have

  P(y_1 ··· y_d) = [√(2π) Π_{j=1}^d y_j!]^{-1} ∫_{−∞}^{∞} exp{Σ_{j=1}^d y_j(μ + σz + log β_j) − e^{μ+σz} Σ_{j=1}^d β_j} exp(−z^2/2) dz,   (3.34)

and the parameters μ and σ are common across all the margins. To calculate P(y_1 ··· y_d), we need only calculate a one-dimensional integral; thus full maximum likelihood estimation can be used to get the estimates of μ, σ and β_j (if unknown). By the formulas in (2.26), it can be shown that there is an exchangeable correlation structure in the response vector Y, with the pairwise correlations tending to 1 when μ or σ tends to infinity. Independence is achieved when σ → 0. The model (3.34) does not admit negative dependence.

3.6.2 Multivariate Poisson-gamma model

The multivariate Poisson-gamma model is obtained by letting G_j(η_j) in (2.24) be the cdf of a univariate gamma distribution with shape parameter α_j and scale parameter β_j, with density function g_j(x; α_j, β_j) = β_j^{−α_j} x^{α_j − 1} e^{−x/β_j}/Γ(α_j), x > 0, α_j > 0 and β_j > 0. The Gamma family is closed under convolution for fixed β. The copula C in (2.24) is arbitrary; (3.1)-(3.8) are some choices here. For example, with the multinormal copula, the multivariate Poisson-gamma model is MUBE; thus the IFM approach can be applied to fit the model. The jth marginal distribution of a multivariate Poisson-gamma distribution is

  P_j(y_j) = ∫_0^∞ f(y_j; z_j) g_j(z_j) dz_j = ∫_0^∞ [e^{−z_j} z_j^{y_j} z_j^{α_j − 1} e^{−z_j/β_j} / (y_j! β_j^{α_j} Γ(α_j))] dz_j
           = [Γ(y_j + α_j)/(y_j! Γ(α_j))] (1/(1 + β_j))^{α_j} (β_j/(1 + β_j))^{y_j},   (3.35)

which implies that Y_j has a negative binomial distribution (in the generalized sense). We have E(Y_j) = α_j β_j and Var(Y_j) = α_j β_j(1 + β_j). The margins are overdispersed, since Var(Y_j)/E(Y_j) > 1. Based on (3.35), if α_j is an integer, y_j can be interpreted as the number of observed failures in y_j + α_j trials, with α_j a previously fixed number of successes.

The parameter estimation procedure based on IFM is similar to that for the multivariate Poisson-lognormal model. Some simplifications are possible. One simplification for the Poisson-gamma model is to hold the shape parameter α_j constant across j; in this situation, we have E(Y_j) = μ_j = α β_j and Var(Y_j) = μ_j(1 + μ_j/α). Similarly, we can also require β_j to be constant across j and obtain the same functional relationship between the mean and the variance across j. By doing so, we reduce the total number of parameters. With this simplification in the number of parameters, the same parameter appears in different margins.
The IFM approach for estimating parameters common to more than one margin, discussed in section 2.6, can be applied. Another special case is to let λ_j = λ β_j, where β_j > 0 is considered to be a scale factor (known or unknown) and the common parameter λ has a Gamma distribution. This is similar to the multivariate Poisson-lognormal model (3.34); negative dependence cannot be admitted in this special situation, again as for the model (3.34).

3.6.3 Multivariate negative-binomial mixture model

Consider d-dimensional count data with y_j = r_j, r_j + 1, ..., where r_j ≥ 1, j = 1, 2, ..., d. For example, with a given integer value r_j, y_j might be the total number of Bernoulli trials until the r_jth success, where the probability of success in each trial is p_j; that is,

  P_j(y_j | p_j) = C(y_j − 1, r_j − 1) p_j^{r_j} (1 − p_j)^{y_j − r_j},  y_j = r_j, r_j + 1, ....

If p_j is itself the outcome of a random variate X_j, j = 1, ..., d, where the X_j have the joint distribution G(p_1, ..., p_d), then the distribution of Y = (Y_1, ..., Y_d) is called the multivariate negative-binomial mixture model. If 1/X_j has a distribution with mean μ_j and variance σ_j^2, then simple calculations lead to E(Y_j) = r_j μ_j and Var(Y_j) = r_j μ_j(μ_j − 1) + r_j(r_j + 1)σ_j^2. This multivariate negative-binomial mixture model for count data is similar to the multivariate Bernoulli-Beta model for binary data in section 3.5; thus the comments on the extensions to include covariates apply here as well.

A more general form of the negative binomial distribution is (3.16), such that

  P_j(y_j | p_j) = [Γ(β_j + y_j)/(Γ(β_j) y_j!)] p_j^{β_j} (1 − p_j)^{y_j},  β_j > 0,  y_j = 0, 1, 2, ....

Using the recursive relation Γ(x) = (x − 1)Γ(x − 1), P_j(y_j | p_j) can be written as

  P_j(y_j | p_j) = p_j^{β_j} Π_{k=1}^{y_j} [(β_j + k − 1)(1 − p_j)/k],

with the product understood to be 1 when y_j = 0. The multivariate negative-binomial mixture model can be defined with this general negative binomial distribution as the discrete part of the mixture; β_j, j = 1, ..., d, can be considered as parameters in the model.

3.6.4 Multivariate Poisson-inverse Gaussian model

The multivariate Poisson-inverse Gaussian model is obtained by letting G_j(η_j) in (2.24) be the cdf of a three-parameter univariate inverse Gaussian distribution with density function

  g_j(λ_j) = [2 ξ_j K_{γ_j}(ω_j)]^{-1} (λ_j/ξ_j)^{γ_j − 1} exp[−(ω_j/2)(ξ_j/λ_j + λ_j/ξ_j)],  λ_j > 0,   (3.36)

where ω_j > 0, ξ_j > 0 and −∞ < γ_j < ∞. In the density expression, K_ν(z) denotes the modified Bessel function of the second kind of order ν and argument z. It satisfies the relationship

  K_{ν+1}(z) = (2ν/z) K_ν(z) + K_{ν−1}(z),

with K_{−1/2}(z) = K_{1/2}(z) = √(π/(2z)) exp(−z).
The copula C in (2.24) is arbitrary; interesting choices are the copulas (3.1)-(3.8). With the multinormal copula, the multivariate Poisson-inverse Gaussian model is MUBE; thus the IFM approach can be applied to fit the model. A special case of the multivariate Poisson-inverse Gaussian model results when f(y_j | z_j) = e^{−z_j} z_j^{y_j}/y_j!, where z_j = λ t_j, with t_j > 0 considered as a scale factor (j = 1, ..., d). Then the pmf for Y is

  P(y_1 ··· y_d) = [K_{k+γ}(√(ω(ω + 2ξ t_+)))/K_γ(ω)] (ω/(ω + 2ξ t_+))^{(k+γ)/2} Π_{j=1}^d (ξ t_j)^{y_j}/y_j!,

where k = Σ_{j=1}^d y_j and t_+ = Σ_{j=1}^d t_j. An extensive study of this special model can be found in Stein et al. (1987).

3.7 Application to longitudinal and repeated measures data

Multivariate copula discrete (MCD) and multivariate mixture discrete (MMD) models can be used for longitudinal and repeated measures (over time) data when the response variables are discrete (binary, ordinal and count), and the number of measures is small and constant over subjects. The multivariate dependence structure has the form of time series dependence or of dependence decreasing with lag. Examples include MCD and MMD models with special copula dependence structures and special patterns of marginal parameters. These models include stationary time series models that allow arbitrary univariate margins, and non-stationary cases in which there are time-dependent or time-independent covariates or time trends. In classical time series analysis, the standard models are autoregressive (AR) and moving average (MA) models. The generalization of these concepts to MCD and MMD models for discrete time series is that "autoregressive" is replaced by Markov and "moving average" is replaced by k-dependent (only rv's that are separated by a lag of k or less are dependent). A particularly interesting model is the Markov model of order one, which can be considered as a replacement for the AR(1) model in classical time series analysis; these types of Markov models can be constructed from families of bivariate copulas. For a more detailed discussion of related topics, such as the extension of models to include covariates and models for different individuals observed at different times, see Joe (1996, Chapter 8).

If the copula is the multinormal copula (3.1), the correlation matrix in the multinormal copula may have patterns of correlations depending on lags, such as exchangeable or AR type. For example, for the exchangeable pattern, θ_jk = θ for all 1 ≤ j < k ≤ d. For AR(1), θ_jk = θ^{|j−k|} for some |θ| < 1. For AR(2), θ_jk = ρ_s, with s = |j − k|, where ρ_s is the autocorrelation of lag s; the autocorrelations satisfy ρ_s = φ_1 ρ_{s−1} + φ_2 ρ_{s−2}, s > 2, with φ_1 = ρ_1(1 − ρ_2)/(1 − ρ_1^2) and φ_2 = (ρ_2 − ρ_1^2)/(1 − ρ_1^2), and are determined from ρ_1 and ρ_2. Some examples of models suitable for modelling longitudinal data and repeated measures (over time) are the multivariate Poisson-lognormal model, the multivariate logit-normal model, the multivariate logit model with multinormal copula or with M-L construction, the multivariate probit model with multinormal copula, and so on. In fact, the multivariate probit model with multinormal copula is equivalent to the discretization of an ARMA normal time series for binary and ordinal responses. For discrete time series and d > 4, approximations can be used for the probabilities Pr(Y_j = y_j, j = 1, ..., d), which in general are multidimensional integrals.

3.8 Summary

In this chapter, we studied specific MCD models for binary, ordinal and count data, and MMD models for binary and count data. (MMD models for ordinal data are not presented, since there is no natural simple way to represent such models; however, MMD models for binary data can be extended to MMD models for nominal categorical data.) Extensions to let the marginal parameters as well as the dependence parameters be functions of covariates were discussed. We also outlined the potential application of MCD and MMD models to longitudinal data, repeated measures and time series data. This chapter does not contain an exhaustive list of models in the MCD and MMD classes; many additional interesting models in these classes could be introduced and studied. Our purpose in this chapter is to demonstrate the richness of the classes of MCD and MMD models, and to make several specific models available for applications. Some examples of the application of the models introduced in this chapter can be found in Chapter 5.
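For the latent correlation patterns of section 3.7, the matrix Θ supplied to the multinormal copula can be constructed directly. A small sketch (hypothetical d and parameter values; Python assumed):

    import numpy as np

    def exchangeable(d, theta):
        # theta_jk = theta for all j != k; requires theta > -1/(d-1) for positive definiteness
        return (1 - theta) * np.eye(d) + theta * np.ones((d, d))

    def ar1(d, theta):
        # theta_jk = theta^{|j-k|}, |theta| < 1: latent Markov (AR(1)-type) dependence
        idx = np.arange(d)
        return theta ** np.abs(idx[:, None] - idx[None, :])

    print(exchangeable(4, 0.3))
    print(ar1(4, 0.6))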
Chapter 4

The efficiency of IFM approach and the efficiency of jackknife variance estimate

It is well known that under regularity conditions, the (full-dimensional) maximum likelihood estimator (MLE) is asymptotically efficient and optimal. But in multivariate situations, except for the multinormal model, the computation of the MLE is often complicated or impossible. The IFM approach is proposed in Chapter 2 as an alternative estimation approach. We have shown that the IFM approach provides consistent estimators with good asymptotic properties (such as asymptotic normality of the estimators). This approach has many advantages; computational feasibility is the main one. It can be applied to many MCD and MMD models (models with MUBE or PUBE properties) with appropriate choices of the copulas; examples of such copulas are the multinormal copula, the M-L construction, copulas from mixtures of max-id distributions, copulas from mixtures of conditional distributions, and so on. The IFM theory is a new statistical inference theory for the analysis of multivariate non-normal models. However, the efficiency of estimators obtained from IFM in comparison with ML estimators is not clear. In this chapter, we investigate the efficiency of the IFM approach relative to maximum likelihood. Our studies suggest that the IFM approach is a viable alternative to ML for models with MUBE, PUBE or MPME properties.

This chapter is organized as follows. In section 4.1, we discuss how to assess the efficiency of the IFM approach. In section 4.2, we carry out some analytical comparisons of the IFM approach to ML for some models; these studies show that the IFM approach is quite efficient. A general analytical investigation is not possible, as closed form expressions for the estimators and the corresponding asymptotic variance-covariance matrices from ML and IFM are not available for the majority of multivariate non-normal models; most often, numerical assessment of their performance must be used. In section 4.3, we carry out extensive numerical studies of the efficiency of the IFM approach relative to the ML approach; these studies are done mainly for MCD and MMD models with MUBE or PUBE properties, in situations without and with covariates. In section 4.4, we numerically study the efficiency of the IFM approach relative to the ML approach for models with special dependence structure; the IFM approach extends easily to models with parameters common to more than one margin. Section 4.5 is devoted to the numerical assessment of the efficiency of the jackknife approach for variance estimation of the IFME; the numerical results show that the jackknife variance estimates are quite satisfactory.

4.1 The assessment of the efficiency of IFM approach

In section 2.3, we gave some optimality criteria for inference functions. We concluded that in the class of all regular unbiased estimating functions, the inference function of scores (IFS) is M-optimal (so T-optimal and D-optimal as well). For the regular model (2.12), the inference functions in the IFM approach are in the class of regular unbiased inference functions; thus all the (asymptotic) properties of regular inference functions apply to IFM. To assess the efficiency of IFM relative to IFS, at least three approaches are possible:
A1. Examine the M-optimality (or T-optimality or D-optimality) of IFM relative to IFS.

A2. Compare the MSE of the estimates from IFM and IFS based on simulation.

A3. Examine the asymptotic behaviour of $2\ell(\tilde\theta) - 2\ell(\theta)$, based on the knowledge that $2\ell(\hat\theta) - 2\ell(\theta)$ has an asymptotic $\chi^2_q$ distribution when $\theta$ is the true parameter vector (of length $q$).

A1 is along the lines of inference function theory. As an estimator may be regarded as a solution to an equation of the form $\Psi(\mathbf{y};\theta)=0$, we study the inference functions instead of the estimators. This approach can be carried out analytically in the few cases where both the Godambe information matrix of IFM and the Fisher information matrix of IFS are available in closed form, or otherwise numerically, by computing (or estimating) the Godambe information matrix and the Fisher information matrix (based on simulation). With this approach, we do not need to actually find the parameter estimates for the purposes of comparison. The disadvantage is that the Godambe information matrix or the Fisher information matrix may be difficult to calculate, because partial derivatives are needed for the computation and they are difficult to obtain for most multivariate non-normal models. Also, this is an asymptotic comparison.

A2 is a conventional approach; it provides a way to investigate the small-sample properties of the estimates. This possibility is especially interesting in comparison with A1 since, although MLEs are asymptotically optimal, this may not generally be the case for finite samples. The disadvantage of A2 is that it may be computationally demanding with multivariate non-normal models, because for each simulation, parameter estimation based on both IFM and IFS must be carried out.

A3 is based on the understanding that if the estimates from IFM are efficient, then the full-dimensional likelihood function evaluated at these estimates should have similar asymptotic behaviour to the full-dimensional likelihood function evaluated at the MLE. More specifically, suppose the loglikelihood function is $\ell(\theta)=\sum_{i=1}^{n}\log f(\mathbf{y}_i|\theta)$, where $\theta$ is a vector of length $q$. Under regularity conditions, $2(\ell(\hat\theta)-\ell(\theta))$ has an asymptotic $\chi^2_q$ distribution (see for example Sen and Singer 1993, p. 236). Thus a rough method of assessing the efficiency of $\tilde\theta$ is to see whether $2(\ell(\tilde\theta)-\ell(\theta))$ is in the likelihood-based confidence interval for $2(\ell(\hat\theta)-\ell(\theta))$; this interval of confidence $1-\alpha$ is $(\chi^2_{q,\alpha/2},\ \chi^2_{q,1-\alpha/2})$, where $\chi^2_{q,\beta}$ is the lower $\beta$ quantile of a chi-square distribution with $q$ degrees of freedom. The assessment can be carried out by comparing the empirical frequency (empirical confidence level) of
$$\chi^2_{q,\alpha/2} < 2(\ell(\tilde\theta)-\ell(\theta)) < \chi^2_{q,1-\alpha/2}$$
with $1-\alpha$; $\tilde\theta$ is considered to be efficient if the empirical frequency is close to $1-\alpha$. The advantage of this approach is that only $\tilde\theta$, $\ell(\tilde\theta)$ and $\ell(\theta)$ need to be calculated, which leads to much less computation compared with finding $\hat\theta$. The disadvantage is that this approach may not be very informative about the efficiency of $\tilde\theta$ in comparison with $\hat\theta$ in relatively small sample situations. In our studies, A3 will not be used; we mention this approach merely as a possibility for further investigation.
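Although A3 is not used in our studies, the interval check is straightforward to set up; the following is a minimal sketch (Python with scipy assumed), where the two loglikelihood values would come from the model at hand and $\theta$ is the known true parameter of the simulation.

```python
import numpy as np
from scipy.stats import chi2

def a3_in_interval(loglik_at_ifme, loglik_at_true, q, alpha=0.05):
    """A3 check: is 2*(l(theta_tilde) - l(theta)) inside the central
    1 - alpha interval (chi2_{q,alpha/2}, chi2_{q,1-alpha/2}) of the
    asymptotic chi-square reference distribution with q df?"""
    stat = 2.0 * (loglik_at_ifme - loglik_at_true)
    lo, hi = chi2.ppf([alpha / 2.0, 1.0 - alpha / 2.0], df=q)
    return lo < stat < hi
```

Over $N$ simulated samples, the empirical frequency of the check being satisfied is compared with $1-\alpha$.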
To compare IFM with IFS by A1, we need to calculate the Fisher information matrix and the Godambe information matrix. Suppose $P(y_1\cdots y_d;\theta)$, $\theta\in\Re$, is a regular MCD or MMD model in (2.12), where $\theta=(\theta_1,\ldots,\theta_q)'$ is a $q$-component vector and $\Re$ is the parameter space. The Fisher information matrix from one observation for the parameter vector $\theta$,
$$I=\begin{pmatrix} I_{11} & \cdots & I_{1q}\\ \vdots & & \vdots\\ I_{q1} & \cdots & I_{qq}\end{pmatrix},$$
has elements
$$I_{jk}=\sum_{\{y_1\cdots y_d\}}\frac{1}{P(y_1\cdots y_d)}\,\frac{\partial P(y_1\cdots y_d)}{\partial\theta_j}\,\frac{\partial P(y_1\cdots y_d)}{\partial\theta_k},\qquad 1\le j\le k\le q.$$
Assume the IFM for one observation is $\Psi=(\psi_1,\ldots,\psi_q)$. The Godambe information matrix based on the IFM for one observation is $J_\Psi=D_\Psi M_\Psi^{-1}D_\Psi^{T}$, where
$$M_\Psi=\begin{pmatrix} M_{11} & \cdots & M_{1q}\\ \vdots & & \vdots\\ M_{q1} & \cdots & M_{qq}\end{pmatrix},\qquad D_\Psi=\begin{pmatrix} D_{11} & \cdots & D_{1q}\\ \vdots & & \vdots\\ D_{q1} & \cdots & D_{qq}\end{pmatrix},$$
with $M_{jj}=E(\psi_j^2)$ ($j=1,\ldots,q$), $M_{jk}=E(\psi_j\psi_k)$ ($j\ne k$), $D_{jj}=E(\partial\psi_j/\partial\theta_j)$ ($j=1,\ldots,q$), and $D_{jk}=E(\partial\psi_j/\partial\theta_k)$ ($j\ne k$). The detailed calculation of the elements of $M_\Psi$ and $D_\Psi$ can be found in section 2.4 for the models without covariates. The M-optimality assessment examines the positive-definiteness of $J_\Psi^{-1}-I^{-1}$. It is related to T-optimality, which examines the ratio of the traces of the two inverse information matrices, $\mathrm{Tr}(J_\Psi^{-1})/\mathrm{Tr}(I^{-1})$, and to D-optimality, which examines the ratio of their determinants, $\det(J_\Psi^{-1})/\det(I^{-1})$. T-optimality is a suitable index for the efficiency investigation as it is easier to compute. An equivalent index to D-optimality is $[\det(J_\Psi^{-1})/\det(I^{-1})]^{1/q}$. In our efficiency assessment studies, we will use M-optimality, T-optimality or D-optimality interchangeably, depending on which is most convenient.

In most multivariate settings, A1 is not feasible analytically and is extremely difficult computationally (involving tedious programming of partial derivatives). A2 is an approach which avoids these problems as long as MLEs are available. As MLEs and IFMEs are both unbiased only asymptotically, the actual bias at a given sample size is an important issue. For an investigation related to sample size, it is more sensible to examine a measure of closeness of an estimator to its true value. Such a measure is the mean squared error (MSE) of an estimator. For an estimator $\hat\theta=\hat\theta(X_1,\ldots,X_n)$, where $X_1,\ldots,X_n$ is a random sample of size $n$ from a distribution indexed by $\theta$, the MSE of $\hat\theta$ about the true value $\theta$ is
$$\mathrm{MSE}(\hat\theta)=E(\hat\theta-\theta)^2=\mathrm{Var}(\hat\theta)+[E(\hat\theta)-\theta]^2.$$
Suppose that $\hat\theta$ has a sampling distribution $F$, and suppose $\hat\theta_1,\ldots,\hat\theta_m$ are iid from $F$; then an obvious estimator of $\mathrm{MSE}(\hat\theta)$ is
$$\widehat{\mathrm{MSE}}(\hat\theta)=\frac{\sum_{i=1}^m(\hat\theta_i-\theta)^2}{m}. \eqno(4.1)$$
If $\tilde\theta$ is from the IFM approach and $\hat\theta$ from the IFS approach, A2 suggests that we compare $\widehat{\mathrm{MSE}}(\tilde\theta)$ and $\widehat{\mathrm{MSE}}(\hat\theta)$. For a fixed sample size, $\hat\theta$ need not be the optimal estimate of $\theta$ in terms of MSE, since now the bias of the estimate is also taken into consideration. The measure $\widehat{\mathrm{MSE}}(\tilde\theta)/\widehat{\mathrm{MSE}}(\hat\theta)$ thus gives us an idea of how the IFME performs relative to the MLE. The approach is mainly computational, based on computer implementation of specific models and subsequent intensive simulation and parameter estimation. A2 can easily be used to study models without covariates as well as with covariates.

4.2 Analytical assessment of the efficiency

In this section, we study the efficiency of the IFM approach analytically for some special models where the Godambe information matrix and the Fisher information matrix are available in closed form, or computable.
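When closed forms are unavailable, the optimality indices defined above can be evaluated numerically once $J_\Psi^{-1}$ and $I^{-1}$ have been computed or estimated for the model at hand; a minimal sketch (Python with numpy assumed):

```python
import numpy as np

def efficiency_ratios(J_inv, I_inv):
    """T-optimality index Tr(J_inv)/Tr(I_inv) and the q-th-root
    D-optimality index (det(J_inv)/det(I_inv))**(1/q), where J_inv and
    I_inv are the inverse Godambe and Fisher information matrices."""
    q = I_inv.shape[0]
    t_ratio = np.trace(J_inv) / np.trace(I_inv)
    d_ratio = (np.linalg.det(J_inv) / np.linalg.det(I_inv)) ** (1.0 / q)
    return t_ratio, d_ratio
```

Both indices equal 1 when IFM is fully efficient, and exceed 1 otherwise; the T-optimality index is the one used in Examples 4.6 and 4.7 below.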
Example 4.1 (Multinormal, general) Let $\mathbf X\sim N_d(\mu,\Sigma)$. Given $n$ independent observations $\mathbf x_1,\ldots,\mathbf x_n$ from $\mathbf X$, the MLEs are
$$\hat\mu=\frac{1}{n}\sum_{i=1}^n\mathbf x_i,\qquad \hat\Sigma=\frac{1}{n}\sum_{i=1}^n(\mathbf x_i-\hat\mu)(\mathbf x_i-\hat\mu)'.$$
It can easily be shown that the IFME $\tilde\mu$ and $\tilde\Sigma$ are equal to $\hat\mu$ and $\hat\Sigma$ respectively, so the MLE and the IFME are equivalent for the general multinormal distribution.

Example 4.2 (Multinormal, common marginal mean) Let $\mathbf X\sim N_d(\mu,\Sigma)$, where $\mu=(\mu_1,\ldots,\mu_d)'=\mu\mathbf 1$ for a scalar parameter $\mu$, and $\Sigma$ is known. Given $n$ independent observations $\mathbf x_1,\ldots,\mathbf x_n$ with the same distribution as $\mathbf X$, the MLE of $\mu$ is
$$\hat\mu=\frac{\mathbf 1'\Sigma^{-1}\bar{\mathbf x}}{\mathbf 1'\Sigma^{-1}\mathbf 1},\qquad \bar{\mathbf x}=\frac{1}{n}\sum_{i=1}^n\mathbf x_i.$$
The IFM for $\mu=(\mu_1,\ldots,\mu_d)'$ is equivalent to
$$\psi_j=\sum_{i=1}^n\frac{x_{ij}-\mu_j}{\sigma_{jj}},\qquad j=1,\ldots,d,$$
which leads to $\tilde\mu_j=n^{-1}\sum_{i=1}^n x_{ij}$, or $\tilde\mu=n^{-1}\sum_{i=1}^n\mathbf x_i$. A simple calculation leads to $J_\Psi^{-1}(\tilde\mu)=n^{-1}\Sigma$. Thus, if we incorporate the knowledge that $\mu_1,\ldots,\mu_d$ have the same value, with WA and PMLA we have:

i. WA: the final IFME of $\mu$ is
$$\tilde\mu_w=\frac{\mathbf 1'\Sigma^{-1}\tilde\mu}{\mathbf 1'\Sigma^{-1}\mathbf 1},$$
which is exactly the same as $\hat\mu$, since $\tilde\mu=n^{-1}\sum_{i=1}^n\mathbf x_i=\bar{\mathbf x}$. So in this situation, the IFME is equivalent to the MLE.

ii. PMLA: the final IFME of $\mu$ is
$$\tilde\mu_p=\frac{\mathbf 1'\{\mathrm{diag}(\Sigma)\}^{-1}\tilde\mu}{\mathbf 1'\{\mathrm{diag}(\Sigma)\}^{-1}\mathbf 1}.$$
With this approach, the IFME is not equivalent to the MLE. The ratio of $\mathrm{Var}(\tilde\mu_p)$ to $\mathrm{Var}(\hat\mu)$ is
$$\frac{\mathbf 1'\{\mathrm{diag}(\Sigma)\}^{-1}\Sigma\{\mathrm{diag}(\Sigma)\}^{-1}\mathbf 1\cdot\mathbf 1'\Sigma^{-1}\mathbf 1}{\big(\mathbf 1'\{\mathrm{diag}(\Sigma)\}^{-1}\mathbf 1\big)^2}.$$
There is some loss of efficiency with the simple PMLA.

Example 4.3 (Trivariate probit, general) Suppose we have a trivariate probit model with known cut-off points, such that $P(111)=\Phi_3(0,0,0,\rho_{12},\rho_{13},\rho_{23})$. We have the following (Tong 1990):
$$P_1(1)=P_2(1)=P_3(1)=\Phi(0)=\tfrac12,\qquad P_{jk}(11)=\Phi_2(0,0,\rho_{jk})=\tfrac14+\tfrac{1}{2\pi}\sin^{-1}\rho_{jk},\quad 1\le j<k\le 3, \eqno(4.2)$$
$$P(111)=\Phi_3(0,0,0,\rho_{12},\rho_{13},\rho_{23})=\tfrac18+\tfrac{1}{4\pi}\big(\sin^{-1}\rho_{12}+\sin^{-1}\rho_{13}+\sin^{-1}\rho_{23}\big). \eqno(4.3)$$
The full loglikelihood function is
$$\ell_n=n(111)\log P(111)+n(110)\log P(110)+n(101)\log P(101)+n(100)\log P(100)+n(011)\log P(011)+n(010)\log P(010)+n(001)\log P(001)+n(000)\log P(000).$$
Even in this simple situation, the MLE of $\rho_{jk}$ is not available in closed form. The information matrix for $\rho_{12}$, $\rho_{13}$ and $\rho_{23}$ from one observation is
$$I=\begin{pmatrix} I_{11}&I_{12}&I_{13}\\ I_{12}&I_{22}&I_{23}\\ I_{13}&I_{23}&I_{33}\end{pmatrix},$$
where, for example,
$$I_{11}=\sum_{\{y_1y_2y_3\}}\frac{1}{P(y_1y_2y_3)}\Big(\frac{\partial P(y_1y_2y_3)}{\partial\rho_{12}}\Big)^2,$$
the sum being over all eight response patterns $(y_1y_2y_3)$.
A simple calculation gives $\partial P(111)/\partial\rho_{12}=1/(4\pi\sqrt{1-\rho_{12}^2}\,)$, and the other terms have similar expressions. After simplification, we get
$$I_{11}=\frac{\pi^3+64a-16\pi b}{(1-\rho_{12}^2)\,cdef},$$
where
$$a=\sin^{-1}\rho_{12}\sin^{-1}\rho_{13}\sin^{-1}\rho_{23},\qquad b=(\sin^{-1}\rho_{12})^2+(\sin^{-1}\rho_{13})^2+(\sin^{-1}\rho_{23})^2,$$
$$c=\pi+2\sin^{-1}\rho_{12}+2\sin^{-1}\rho_{13}+2\sin^{-1}\rho_{23},\qquad d=\pi+2\sin^{-1}\rho_{12}-2\sin^{-1}\rho_{13}-2\sin^{-1}\rho_{23},$$
$$e=\pi-2\sin^{-1}\rho_{12}+2\sin^{-1}\rho_{13}-2\sin^{-1}\rho_{23},\qquad f=\pi-2\sin^{-1}\rho_{12}-2\sin^{-1}\rho_{13}+2\sin^{-1}\rho_{23}.$$
The other components of the matrix $I$ can be computed similarly. The inverse of $I$, after simplification, is found to be
$$I^{-1}=\begin{pmatrix} a_{11}&a_{12}&a_{13}\\ a_{12}&a_{22}&a_{23}\\ a_{13}&a_{23}&a_{33}\end{pmatrix},$$
where
$$a_{11}=\tfrac14(\pi^2-4(\sin^{-1}\rho_{12})^2)(1-\rho_{12}^2),\quad a_{22}=\tfrac14(\pi^2-4(\sin^{-1}\rho_{13})^2)(1-\rho_{13}^2),\quad a_{33}=\tfrac14(\pi^2-4(\sin^{-1}\rho_{23})^2)(1-\rho_{23}^2),$$
$$a_{12}=\tfrac12(\pi\sin^{-1}\rho_{23}-2\sin^{-1}\rho_{12}\sin^{-1}\rho_{13})(1-\rho_{12}^2)^{1/2}(1-\rho_{13}^2)^{1/2},$$
$$a_{13}=\tfrac12(\pi\sin^{-1}\rho_{13}-2\sin^{-1}\rho_{12}\sin^{-1}\rho_{23})(1-\rho_{12}^2)^{1/2}(1-\rho_{23}^2)^{1/2},$$
$$a_{23}=\tfrac12(\pi\sin^{-1}\rho_{12}-2\sin^{-1}\rho_{13}\sin^{-1}\rho_{23})(1-\rho_{13}^2)^{1/2}(1-\rho_{23}^2)^{1/2}.$$
For the IFM approach, the inference function for the $(j,k)$ margin is
$$\psi_{jk}=\Big(\frac{n_{jk}(11)+n_{jk}(00)}{P_{jk}(11)}-\frac{n_{jk}(10)+n_{jk}(01)}{P_{jk}(10)}\Big)\frac{\partial P_{jk}(11)}{\partial\rho_{jk}},$$
and $\psi_{jk}=0$ leads to
$$\tilde\rho_{jk}=\sin\Big(\pi\,\frac{n_{jk}(11)+n_{jk}(00)-n_{jk}(10)-n_{jk}(01)}{2n}\Big),\qquad 1\le j<k\le 3.$$
If the IFM for one observation is $\Psi=(\psi_{12},\psi_{13},\psi_{23})$, then from section 2.4 we have
$$E(\psi_{jk}\psi_{lm})=\sum_{\{y_jy_ky_ly_m\}}\frac{P_{jklm}(y_jy_ky_ly_m)}{P_{jk}(y_jy_k)P_{lm}(y_ly_m)}\,\frac{\partial P_{jk}(y_jy_k)}{\partial\rho_{jk}}\,\frac{\partial P_{lm}(y_ly_m)}{\partial\rho_{lm}},$$
where $1\le j<k\le 3$, $1\le l<m\le 3$, and
$$E\Big(\frac{\partial\psi_{jk}}{\partial\rho_{jk}}\Big)=-\sum_{\{y_jy_k\}}\frac{1}{P_{jk}(y_jy_k)}\Big(\frac{\partial P_{jk}(y_jy_k)}{\partial\rho_{jk}}\Big)^2,\qquad 1\le j<k\le 3.$$
We thus find that
$$M_\Psi=\begin{pmatrix} b_{11}&b_{12}&b_{13}\\ b_{12}&b_{22}&b_{23}\\ b_{13}&b_{23}&b_{33}\end{pmatrix}\qquad\text{and}\qquad D_\Psi=\begin{pmatrix} -b_{11}&0&0\\ 0&-b_{22}&0\\ 0&0&-b_{33}\end{pmatrix},$$
where
$$b_{11}=\frac{4}{(\pi^2-4(\sin^{-1}\rho_{12})^2)(1-\rho_{12}^2)},\quad b_{22}=\frac{4}{(\pi^2-4(\sin^{-1}\rho_{13})^2)(1-\rho_{13}^2)},\quad b_{33}=\frac{4}{(\pi^2-4(\sin^{-1}\rho_{23})^2)(1-\rho_{23}^2)},$$
$$b_{12}=\frac{8\pi\sin^{-1}\rho_{23}-16\sin^{-1}\rho_{12}\sin^{-1}\rho_{13}}{(\pi^2-4(\sin^{-1}\rho_{12})^2)(\pi^2-4(\sin^{-1}\rho_{13})^2)(1-\rho_{12}^2)^{1/2}(1-\rho_{13}^2)^{1/2}},$$
$$b_{13}=\frac{8\pi\sin^{-1}\rho_{13}-16\sin^{-1}\rho_{12}\sin^{-1}\rho_{23}}{(\pi^2-4(\sin^{-1}\rho_{12})^2)(\pi^2-4(\sin^{-1}\rho_{23})^2)(1-\rho_{12}^2)^{1/2}(1-\rho_{23}^2)^{1/2}},$$
$$b_{23}=\frac{8\pi\sin^{-1}\rho_{12}-16\sin^{-1}\rho_{13}\sin^{-1}\rho_{23}}{(\pi^2-4(\sin^{-1}\rho_{13})^2)(\pi^2-4(\sin^{-1}\rho_{23})^2)(1-\rho_{13}^2)^{1/2}(1-\rho_{23}^2)^{1/2}}.$$
After simplification, $J_\Psi^{-1}=D_\Psi^{-1}M_\Psi(D_\Psi^{-1})^T$ turns out to be equal to $I^{-1}$. Therefore, by M-optimality, the IFM approach is as efficient as the IFS approach.

The algebraic computation in this example was carried out with the help of the symbolic manipulation software Maple (Char et al. 1992); Maple is also used for the other analytical examples in this section. For completeness, the Maple program for this example is listed in Appendix A; the Maple programs for the other examples in this section are similar.

Example 4.4 (Trivariate probit, exchangeable) Suppose now we have a trivariate probit model with known cut-off points, such that $P(111)=\Phi_3(0,0,0,\rho,\rho,\rho)$; that is, the latent variables are permutation-symmetric or exchangeable. With (4.2), we obtain
$$P_1(1)=P_2(1)=P_3(1)=\Phi(0)=\tfrac12,\qquad P_{12}(11)=P_{13}(11)=P_{23}(11)=\Phi_2(0,0,\rho)=\tfrac14+\tfrac{1}{2\pi}\sin^{-1}\rho,$$
$$P(111)=\Phi_3(0,0,0,\rho,\rho,\rho)=\tfrac18+\tfrac{3}{4\pi}\sin^{-1}\rho.$$
The MLE of $\rho$ is
$$\hat\rho=\sin\Big(\frac{\pi(4n_{111}+4n_{000}-n)}{6n}\Big).$$
Based on the full loglikelihood function (4.3), we calculate the Fisher information for $\rho$ (using Maple). The asymptotic variance of $\hat\rho$ is found to be
$$\mathrm{Var}(\hat\rho)=\frac{(1-\rho^2)(\pi+6\sin^{-1}\rho)(\pi-2\sin^{-1}\rho)}{12n}.$$
Let the IFM for one observation be $\Psi=(\psi_{12},\psi_{13},\psi_{23})$. We use WA and PMLA to estimate the common parameter $\rho$:

i. WA: We have
$$M_\Psi=\begin{pmatrix} a&b&b\\ b&a&b\\ b&b&a\end{pmatrix}\qquad\text{and}\qquad D_\Psi=\begin{pmatrix} -a&0&0\\ 0&-a&0\\ 0&0&-a\end{pmatrix},$$
where
$$a=\frac{4}{(\pi^2-4(\sin^{-1}\rho)^2)(1-\rho^2)},\qquad b=\frac{8\sin^{-1}\rho}{(\pi-2\sin^{-1}\rho)(\pi+2\sin^{-1}\rho)^2(1-\rho^2)}.$$
Thus
$$J_\Psi^{-1}=D_\Psi^{-1}M_\Psi(D_\Psi^{-1})^T=\begin{pmatrix} a^{-1}&a^{-2}b&a^{-2}b\\ a^{-2}b&a^{-1}&a^{-2}b\\ a^{-2}b&a^{-2}b&a^{-1}\end{pmatrix}.$$
Assume the IFMEs of $\rho_{12}$, $\rho_{13}$, $\rho_{23}$ are $\tilde\rho_{12}$, $\tilde\rho_{13}$, $\tilde\rho_{23}$ respectively.
With WA, we find the weighting vector $\mathbf u=(1/3,1/3,1/3)'$, so the IFME of $\rho$ is
$$\tilde\rho_w=\tfrac13(\tilde\rho_{12}+\tilde\rho_{13}+\tilde\rho_{23}),$$
and the asymptotic variance of $\tilde\rho_w$ is
$$\mathrm{Var}(\tilde\rho_w)=\frac{1}{n}\,\mathbf u'J_\Psi^{-1}\mathbf u=\frac{1}{n}\Big[\frac{(1-\rho^2)(\pi^2-4(\sin^{-1}\rho)^2)}{12}+\frac{(1-\rho^2)(\pi-2\sin^{-1}\rho)\sin^{-1}\rho}{3}\Big]=\frac{(1-\rho^2)(\pi+6\sin^{-1}\rho)(\pi-2\sin^{-1}\rho)}{12n}.$$

ii. PMLA: The IFM is $\Psi=\psi_{12}+\psi_{13}+\psi_{23}$. Thus
$$M_\Psi=E(\psi_{12}+\psi_{13}+\psi_{23})^2=\sum_{1\le j<k\le 3}\ \sum_{1\le l<m\le 3}E(\psi_{jk}\psi_{lm}) \eqno(4.4)$$
and
$$D_\Psi=E\Big(\frac{\partial(\psi_{12}+\psi_{13}+\psi_{23})}{\partial\rho}\Big)=\sum_{1\le j<k\le 3}\ \sum_{\{y_jy_k\}}\Big[-\frac{1}{P_{jk}(y_jy_k)}\Big(\frac{\partial P_{jk}(y_jy_k)}{\partial\rho}\Big)^2+\frac{\partial^2P_{jk}(y_jy_k)}{\partial\rho^2}\Big]. \eqno(4.5)$$
We find (using Maple)
$$M_\Psi=\frac{12(\pi+6\sin^{-1}\rho)}{(1-\rho^2)(\pi-2\sin^{-1}\rho)(\pi+2\sin^{-1}\rho)^2},\qquad D_\Psi=-\frac{12}{(\pi^2-4(\sin^{-1}\rho)^2)(1-\rho^2)}.$$
The evaluation of $J_\Psi^{-1}=D_\Psi^{-1}M_\Psi(D_\Psi^{-1})^T$ leads to the asymptotic variance of $\tilde\rho_p$:
$$\mathrm{Var}(\tilde\rho_p)=\frac{(1-\rho^2)(\pi+6\sin^{-1}\rho)(\pi-2\sin^{-1}\rho)}{12n}.$$
We have so far shown that $\mathrm{Var}(\hat\rho)=\mathrm{Var}(\tilde\rho_w)=\mathrm{Var}(\tilde\rho_p)$, which means that IFM with WA and with PMLA leads to an estimate as efficient as that from the IFS approach. Any single estimating equation from IFM also gives an asymptotically unbiased estimate of $\rho$, and the $\tilde\rho$ from each of these estimating equations has the same asymptotic properties because of the exchangeability of the latent variables. The ratio of the asymptotic variance of the IFME of $\rho$ from one estimating equation to the asymptotic variance of the MLE $\hat\rho$ is found to be
$$\frac{3(\pi+2\sin^{-1}\rho)}{\pi+6\sin^{-1}\rho}.$$
Figure 4.1 is a plot of this ratio versus $\rho\in[-0.5,1]$. The ratio decreases from $\infty$ to 1.5 as $\rho$ increases from $-0.5$ to 1; when $\rho=0$, the ratio is 3. This implies that the estimate from a single estimating equation has relatively high efficiency relative to the MLE when there is high positive correlation, but performs poorly when there is high negative correlation.

Figure 4.1: Trivariate probit, exchangeable: the efficiency of $\tilde\rho$ from the margin (1,2) (or (1,3), (2,3)).

Example 4.5 (Trivariate probit, AR(1)) Suppose we have a trivariate probit model with known cut-off points, such that $P(111)=\Phi_3(0,0,0,\rho,\rho^2,\rho)$; that is, the latent variables have an AR(1) correlation structure. With (4.2), we obtain
$$P_1(1)=P_2(1)=P_3(1)=\Phi(0)=\tfrac12,\qquad P_{12}(11)=P_{23}(11)=\Phi_2(0,0,\rho)=\tfrac14+\tfrac{1}{2\pi}\sin^{-1}\rho,$$
$$P_{13}(11)=\Phi_2(0,0,\rho^2)=\tfrac14+\tfrac{1}{2\pi}\sin^{-1}\rho^2,\qquad P(111)=\Phi_3(0,0,0,\rho,\rho^2,\rho)=\tfrac18+\tfrac{1}{2\pi}\sin^{-1}\rho+\tfrac{1}{4\pi}\sin^{-1}\rho^2.$$
Based on the full loglikelihood function (4.3), the asymptotic variance of $\hat\rho$ is found (using Maple) to be
$$\mathrm{Var}(\hat\rho)=\frac{\pi(1-\rho^4)\,a_1a_2a_3}{8n\,[\rho^2a_4+(1+\rho^2)a_5+\rho(1+\rho^2)^{1/2}a_6]}, \eqno(4.6)$$
where $a_1=\pi-2\sin^{-1}\rho^2$, $a_2=\pi-4\sin^{-1}\rho+2\sin^{-1}\rho^2$, $a_3=\pi+4\sin^{-1}\rho+2\sin^{-1}\rho^2$, $a_4=2\pi^2-16(\sin^{-1}\rho)^2+4\pi\sin^{-1}\rho^2$, $a_5=\pi^2-4(\sin^{-1}\rho^2)^2$, and $a_6=16\sin^{-1}\rho\sin^{-1}\rho^2-8\pi\sin^{-1}\rho$.

Let the IFM for one observation be $\Psi=(\psi_{12},\psi_{13},\psi_{23})$. We use WA and PMLA to estimate the common parameter $\rho$:

i. WA: We have
$$M_\Psi=\begin{pmatrix} a&c&d\\ c&b&c\\ d&c&a\end{pmatrix}\qquad\text{and}\qquad D_\Psi=\begin{pmatrix} -a&0&0\\ 0&-b&0\\ 0&0&-a\end{pmatrix},$$
where
$$a=\frac{4}{(\pi^2-4(\sin^{-1}\rho)^2)(1-\rho^2)},\qquad b=\frac{4}{(\pi^2-4(\sin^{-1}\rho^2)^2)(1-\rho^4)},$$
$$c=\frac{8\sin^{-1}\rho}{(\pi^2-4(\sin^{-1}\rho)^2)(\pi+2\sin^{-1}\rho^2)(1-\rho^2)(1+\rho^2)^{1/2}},\qquad d=\frac{8\pi\sin^{-1}\rho^2-16(\sin^{-1}\rho)^2}{(\pi^2-4(\sin^{-1}\rho)^2)^2(1-\rho^2)}.$$
Thus
$$J_\Psi^{-1}=D_\Psi^{-1}M_\Psi(D_\Psi^{-1})^T=\begin{pmatrix} a^{-1}&c(ab)^{-1}&da^{-2}\\ c(ab)^{-1}&b^{-1}&c(ab)^{-1}\\ da^{-2}&c(ab)^{-1}&a^{-1}\end{pmatrix}.$$
Assume the IFMEs of $\rho_{12}$, $\rho_{13}$, $\rho_{23}$ are $\tilde\rho_{12}$, $\tilde\rho_{13}$, $\tilde\rho_{23}$ respectively, and let $\tilde{\boldsymbol\rho}=(\tilde\rho_{12},\tilde\rho_{13},\tilde\rho_{23})'$. With WA, the IFME of $\rho$ is $\tilde\rho_w=\mathbf u'\tilde{\boldsymbol\rho}$, where the weighting vector is $\mathbf u=(u_1,u_2,u_3)'=J_\Psi\mathbf 1/(\mathbf 1'J_\Psi\mathbf 1)$. We find that
$$u_1=u_3=\frac{a_1a_2a_3a_7}{2a_8[\rho^2a_4+(1+\rho^2)a_5+\rho(1+\rho^2)^{1/2}a_6]},\qquad u_2=\frac{(2\rho^2a_4+\rho(1+\rho^2)^{1/2}a_6)a_2a_3}{2a_8[\rho^2a_4+(1+\rho^2)a_5+\rho(1+\rho^2)^{1/2}a_6]},$$
where $a_1,\ldots,a_6$ are as above, and
$$a_7=\pi+\pi\rho^2+2\sin^{-1}\rho^2+2\rho^2\sin^{-1}\rho^2-4\rho(\rho^2+1)^{1/2}\sin^{-1}\rho,$$
$$a_8=\pi^2+4\pi\sin^{-1}\rho^2+4(\sin^{-1}\rho^2)^2-16(\sin^{-1}\rho)^2.$$
Figure 4.2 is a plot of the weights versus $\rho\in[-1,1]$. The asymptotic variance of $\tilde\rho_w$ is $\mathrm{Var}(\tilde\rho_w)=\mathbf u'J_\Psi^{-1}\mathbf u/n$, which turns out to be the same as (4.6).

ii. PMLA: The IFM is $\Psi=\psi_{12}+\psi_{13}+\psi_{23}$. Following (4.4) and (4.5), we calculate (using Maple) the corresponding $M_\Psi$ and $D_\Psi$, and then $\mathrm{Var}(\tilde\rho_p)=J_\Psi^{-1}/n$. The algebraic expression for $\mathrm{Var}(\tilde\rho_p)$ is complicated, so we do not display it. Instead, we plot the ratio $\mathrm{Var}(\tilde\rho_p)/\mathrm{Var}(\hat\rho)$ versus $\rho\in[-1,1]$ in Figure 4.3. The maximum of the ratio is 1.0391, which is attained at $\rho=0.3842$ and $\rho=-0.3842$.

The above results show that IFM with WA leads to an estimate as efficient as the IFS approach in the AR(1) situation, and IFM with the simple PMLA leads to a slightly less efficient estimator (ratio $<1.04$). The $\tilde\rho$ from the estimating equation based on margin (1,2) (or (2,3)) is different from the $\tilde\rho$ based on margin (1,3). For $\tilde\rho$ from IFM with the (1,2) (or (2,3)) margin, the ratio of the asymptotic variance of the IFME of $\rho$ to the asymptotic variance of $\hat\rho$ is
$$r_1=\frac{2(\pi^2-4(\sin^{-1}\rho)^2)[\rho^2a_4+(1+\rho^2)a_5+\rho(1+\rho^2)^{1/2}a_6]}{\pi(1+\rho^2)a_1a_2a_3}.$$
For $\tilde\rho$ from IFM with the (1,3) margin, the corresponding ratio is
$$r_2=\frac{(\pi^2-4(\sin^{-1}\rho^2)^2)[\rho^2a_4+(1+\rho^2)a_5+\rho(1+\rho^2)^{1/2}a_6]}{2\pi\rho^2a_1a_2a_3}.$$
We plot $r_1$ and $r_2$ versus $\rho\in[-1,1]$ in Figure 4.4. We see that as $\rho$ goes from $-1$ to 0, $r_1$ increases from 1.707 to values around 2, and as $\rho$ goes from 0 to 1, $r_1$ decreases from values around 2 to 1.707. Similarly, $r_2$ increases from 1.207 to $\infty$ as $\rho$ goes from $-1$ to 0, and decreases from $\infty$ to 1.207 as $\rho$ goes from 0 to 1. We conclude that the (1,3) margin by itself leads to an inefficient estimator over a wide range of values of $\rho$. We notice that $r_2>r_1$ when $\rho<0.6357$, $r_2<r_1$ when $\rho>0.6357$, and $r_1=r_2=1.97$ when $\rho=0.6357$.

Figure 4.4: Trivariate probit, AR(1): (a) the efficiency of $\tilde\rho$ from the margins (1,2) or (2,3); (b) the efficiency of $\tilde\rho$ from the margin (1,3).
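The single-margin ratios $r_1$ and $r_2$ as reconstructed above are easy to evaluate numerically; a minimal sketch (Python with numpy assumed, for illustration):

```python
import numpy as np

def ar1_margin_ratios(rho):
    """Efficiency ratios r1 (margin (1,2) or (2,3)) and r2 (margin (1,3))
    for the trivariate probit AR(1) model of Example 4.5."""
    s, s2 = np.arcsin(rho), np.arcsin(rho ** 2)
    a1 = np.pi - 2 * s2
    a2 = np.pi - 4 * s + 2 * s2
    a3 = np.pi + 4 * s + 2 * s2
    a4 = 2 * np.pi ** 2 - 16 * s ** 2 + 4 * np.pi * s2
    a5 = np.pi ** 2 - 4 * s2 ** 2
    a6 = 16 * s * s2 - 8 * np.pi * s
    D = rho ** 2 * a4 + (1 + rho ** 2) * a5 + rho * np.sqrt(1 + rho ** 2) * a6
    r1 = 2 * (np.pi ** 2 - 4 * s ** 2) * D / (np.pi * (1 + rho ** 2) * a1 * a2 * a3)
    r2 = (np.pi ** 2 - 4 * s2 ** 2) * D / (2 * np.pi * rho ** 2 * a1 * a2 * a3)
    return r1, r2

print(ar1_margin_ratios(0.6357))  # per the text, both ratios are about 1.97 here
```

As simple checks, $r_1\to 2$ and $r_2\to\infty$ as $\rho\to 0$, in agreement with the behaviour described above.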
Example 4.6 (Trivariate MCD model for binary data with Morgenstern copula) Suppose we have a trivariate MCD model for binary data with Morgenstern copula, such that
$$P(111)=u_1u_2u_3\big[1+\theta_{12}(1-u_1)(1-u_2)+\theta_{13}(1-u_1)(1-u_3)+\theta_{23}(1-u_2)(1-u_3)\big],\qquad |\theta_{jk}|\le 1,$$
where the dependence parameters $\theta_{12}$, $\theta_{13}$ and $\theta_{23}$ obey the constraints
$$1+\theta_{12}+\theta_{13}+\theta_{23}\ge 0,\quad 1+\theta_{12}\ge\theta_{13}+\theta_{23},\quad 1+\theta_{13}\ge\theta_{12}+\theta_{23},\quad 1+\theta_{23}\ge\theta_{12}+\theta_{13}. \eqno(4.7)$$
We have $P_j(1)=u_j$, $j=1,2,3$, and $P_{jk}(11)=[1+\theta_{jk}(1-u_j)(1-u_k)]u_ju_k$, $1\le j<k\le 3$. Assume the $u_j$ are given, and the parameters of interest are $\theta_{12}$, $\theta_{13}$ and $\theta_{23}$. The full loglikelihood function is (4.3), and the Fisher information matrix for $\theta_{12}$, $\theta_{13}$ and $\theta_{23}$ is $I$. Assume the IFM for one observation is $\Psi=(\psi_{12},\psi_{13},\psi_{23})$; the Godambe information for $\Psi$ is $J_\Psi=D_\Psi M_\Psi^{-1}D_\Psi^T$, and we proceed to calculate $M_\Psi$ and $D_\Psi$. The algebraic expressions for $I$ and $J_\Psi$ are extremely complicated, so we used Maple to output the algebraic results in the form of C code and then numerically evaluated the ratio
$$r_g=\frac{\mathrm{Tr}(J_\Psi^{-1})}{\mathrm{Tr}(I^{-1})},$$
where $r_g$ denotes the general efficiency ratio. For this purpose, we first generate $n_1$ uniform points $(\theta_{12},\theta_{13},\theta_{23})$ from the cube $[-1,1]^3$ in three-dimensional space under the constraints (4.7), and order these $n_1$ points by the value of $|\theta_{12}|+|\theta_{13}|+|\theta_{23}|$ from smallest to largest. For each of the $n_1$ points $(\theta_{12},\theta_{13},\theta_{23})$, we generate $n_2$ points $(u_1,u_2,u_3)$ with $(\theta_{12},\theta_{13},\theta_{23})$ as the given dependence parameters in a trivariate Morgenstern copula in (2.5) (see section 4.3 for how to generate multivariate Morgenstern variates), and order these $n_2$ points by the value of $u_1+u_2+u_3$ from smallest to largest. Each generated set $(u_1,u_2,u_3,\theta_{12},\theta_{13},\theta_{23})$ determines a trivariate MCD model with Morgenstern copula for binary data, and we calculate the $r_g$ corresponding to each particular model.

Figure 4.5 presents the values of $r_g$ at $n_1\times n_2=300\times 300$ "grid" points. We can see from Figure 4.5 that the IFM approach is reasonably efficient in most situations. It is also clear that the magnitude of $|\theta_{12}|+|\theta_{13}|+|\theta_{23}|$ has an effect on the efficiency of the IFM approach, with, generally speaking, higher efficiency ($r_g$ values close to 1) when $|\theta_{12}|+|\theta_{13}|+|\theta_{23}|$ is relatively small. The magnitude of $u_1+u_2+u_3$ has some effect, in that the efficiency of the IFM approach is lower in the region where $u_1+u_2+u_3$ is close to its boundary (that is, close to 0 or 3). The following facts show that the general efficiency of the IFM approach is quite good: in these 90,000 efficiency ($r_g$) evaluations, 50% of the $r_g$ values are less than 1.0196, 90% are less than 1.0722, 99% are less than 1.1803 and 99.99% are less than 1.4654. The maximum is 1.7084 and the minimum is 1. The two plots in Figure 4.6 clarify these observations: plot (a) shows the 90,000 ordered $r_g$ values versus their ordered positions (from 1 to 90,000) in the data set, and plot (b) is a histogram of the $r_g$ values. Overall, we consider the IFM approach to be efficient.

Figure 4.5: Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach.

Figure 4.6: Trivariate Morgenstern-binary model: (a) ordered relative efficiency values of IFM approach versus IFS approach; (b) a histogram of the efficiency values $r_g$.
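For reference, admissible dependence parameters under the constraints (4.7) can be generated by simple rejection sampling; a minimal sketch (Python with numpy assumed; an illustration, not necessarily the generation method used for the grid):

```python
import numpy as np

rng = np.random.default_rng(1996)

def sample_morgenstern_params(n1):
    """Rejection sampling of (theta12, theta13, theta23) in [-1,1]^3
    subject to the compatibility constraints (4.7)."""
    out = []
    while len(out) < n1:
        t12, t13, t23 = rng.uniform(-1.0, 1.0, 3)
        if (1 + t12 + t13 + t23 >= 0 and 1 + t12 >= t13 + t23
                and 1 + t13 >= t12 + t23 and 1 + t23 >= t12 + t13):
            out.append((t12, t13, t23))
    # order by |t12| + |t13| + |t23|, as in the grid construction above
    return np.array(sorted(out, key=lambda t: sum(map(abs, t))))

print(sample_morgenstern_params(5))
```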
It is also possible to examine the efficiency ratio in some special situations; we study two of them here. The first is the situation where $u_1=u_2=u_3=u$ and $\theta_{12}=\theta_{13}=\theta_{23}=\theta$ ($-1/3\le\theta\le 1$). The ratio of the asymptotic variance of $\tilde\theta$ (based on WA) to the asymptotic variance of $\hat\theta$ is found to be
$$r_1(u,\theta)=\frac{a_1a_2a_3}{b_1b_2},$$
where
$$a_1=27\theta^2u^4-54\theta^2u^3+33\theta^2u^2-10\theta u^2-6\theta^2u+10\theta u-3\theta-1,$$
$$a_2=3\theta^3u^6-9\theta^3u^5+9\theta^3u^4-11\theta^2u^4-3\theta^3u^3+22\theta^2u^3-12\theta^2u^2+\theta u^2+\theta^2u-\theta u-\theta-1,$$
$$a_3=\theta u^2-\theta u+1,$$
$$b_1=(3\theta u^2-6\theta u+3\theta+1)(3\theta u^2-4\theta u+\theta+1),$$
$$b_2=(3\theta u^2-2\theta u+1)(3\theta u^2+1)(\theta u^2-\theta u-1)^2.$$
Figure 4.7 is a plot of $r_1(u,\theta)$ versus $u$ ($0\le u\le 1$) and $\theta$ ($-1/3\le\theta\le 1$). We observe that at the boundaries, when $\theta=1$, $r_1(u,\theta)$ is in the interval (1, 1.232), and the maximum 1.232 is attained at $u=0.2175$ or $u=0.7825$; when $\theta=-1/3$, $r_1(u,\theta)$ is in the interval (1, 2), and the maximum 2 is attained at $u=0$ or $u=1$. Since the maximum ratio is 2 only at some extreme points of the parameter space, and for the most part the ratio is less than 1.1, we consider the IFM approach to be efficient.

The second special situation is where $u_1=u_2=u_3=u$, $\theta_{12}=\theta_{23}=\theta$ and $\theta_{13}=\theta^2$. The algebraic expression for the ratio $r_2(u,\theta)$ of the asymptotic variance of $\tilde\theta$ (based on WA) to the asymptotic variance of $\hat\theta$ extends over several pages, so we only present a plot of $r_2(u,\theta)$ versus $u$ ($0\le u\le 1$) and $\theta$ ($-1\le\theta\le 1$) in Figure 4.8. We observe that at the boundaries, when $\theta=1$, the ratio $r_2(u,\theta)$ is in the interval (1, 1.200097), and the maximum is attained at $u=0.2139$ or $u=0.7861$; when $\theta=-1$, the ratio $r_2(u,\theta)$ is in the interval (1, 1.148333), and the maximum is attained at $u=0.154$ or $u=0.846$. Overall, the IFM approach is again demonstrated to be efficient.

Figure 4.7: Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach when $u_1=u_2=u_3$ and $\theta_{12}=\theta_{13}=\theta_{23}$.

Figure 4.8: Trivariate Morgenstern-binary model: relative efficiency of IFM approach versus IFS approach when $u_1=u_2=u_3$, $\theta_{12}=\theta_{23}=\theta$ and $\theta_{13}=\theta^2$.

Example 4.7 (Trivariate normal-copula model for binary data) In Examples 4.3, 4.4 and 4.5, we studied the efficiency of the IFM approach versus the IFS approach in the special situations $P(111)=\Phi_3(0,0,0,\rho_{12},\rho_{13},\rho_{23})$, $P(111)=\Phi_3(0,0,0,\rho,\rho,\rho)$ and $P(111)=\Phi_3(0,0,0,\rho,\rho^2,\rho)$, and found that the IFM approach was fully efficient in these situations. For a general trivariate normal-binary model
$$P(111)=\Phi_3(\Phi^{-1}(u_1),\Phi^{-1}(u_2),\Phi^{-1}(u_3),\rho_{12},\rho_{13},\rho_{23}), \eqno(4.8)$$
the closed-form efficiency evaluation provided for the trivariate Morgenstern-binary model in Example 4.6 is not possible, because $\Phi_3(\Phi^{-1}(u_1),\Phi^{-1}(u_2),\Phi^{-1}(u_3),\rho_{12},\rho_{13},\rho_{23})$ does not have closed form. Nevertheless, since a high-precision multinormal probability calculation subroutine (Schervish 1984) is available, we can evaluate the efficiency numerically. With the model (4.8), we have $P_j(1)=u_j$, $j=1,2,3$, and $P_{jk}(11)=\Phi_2(\Phi^{-1}(u_j),\Phi^{-1}(u_k);\rho_{jk})$, $1\le j<k\le 3$.
Assume the $u_j$ are given, and the parameters of interest are $\rho_{12}$, $\rho_{13}$ and $\rho_{23}$. Let $\theta_1=\rho_{12}$, $\theta_2=\rho_{13}$ and $\theta_3=\rho_{23}$. The Fisher information matrix from one observation for the parameters $\theta_1$, $\theta_2$ and $\theta_3$ is
$$I=\begin{pmatrix} I_{11}&I_{12}&I_{13}\\ I_{12}&I_{22}&I_{23}\\ I_{13}&I_{23}&I_{33}\end{pmatrix},$$
where
$$I_{jj}=\sum_{\{y_1y_2y_3\}}\frac{1}{P_{123}(y_1y_2y_3)}\Big(\frac{\partial P_{123}(y_1y_2y_3)}{\partial\theta_j}\Big)^2,\qquad I_{jk}=\sum_{\{y_1y_2y_3\}}\frac{1}{P_{123}(y_1y_2y_3)}\frac{\partial P_{123}(y_1y_2y_3)}{\partial\theta_j}\frac{\partial P_{123}(y_1y_2y_3)}{\partial\theta_k},\quad 1\le j<k\le 3.$$
We can similarly calculate the Godambe information matrix $J_\Psi$ based on the IFM approach for one observation. We then numerically evaluate the ratio (T-optimality)
$$r_g=\frac{\mathrm{Tr}(J_\Psi^{-1})}{\mathrm{Tr}(I^{-1})}$$
over the joint trinormal copula sample space and its parameter space. As in Example 4.6 for the trivariate Morgenstern-binary model, we first generate $n_1$ uniform points $(\rho_{12},\rho_{13},\rho_{23})$ from the cube $[-1,1]^3$ under the constraint $1+2\rho_{12}\rho_{13}\rho_{23}-\rho_{12}^2-\rho_{13}^2-\rho_{23}^2>0$ (which guarantees that the determinant of a trinormal correlation matrix is positive), and order these $n_1$ points by the value of $|\rho_{12}|+|\rho_{13}|+|\rho_{23}|$ from smallest to largest. Then, for each of the $n_1$ points $(\rho_{12},\rho_{13},\rho_{23})$, we generate $n_2$ points $(u_1,u_2,u_3)$ with $(\rho_{12},\rho_{13},\rho_{23})$ as the given dependence parameters in a trinormal copula, and order these $n_2$ points by the value of $u_1+u_2+u_3$ from smallest to largest. Each generated set $(u_1,u_2,u_3,\rho_{12},\rho_{13},\rho_{23})$ determines a trivariate normal-binary model, and we evaluate the $r_g$ corresponding to each particular model.
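The building block of this numerical evaluation is the bivariate margin $P_{jk}(11)=\Phi_2(\Phi^{-1}(u_j),\Phi^{-1}(u_k);\rho_{jk})$. The original computations used the Schervish (1984) subroutine; as a stand-in, a minimal sketch with scipy's multivariate normal cdf (an assumption about the available library, not the routine used in the thesis):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def p11(uj, uk, rho):
    """P_jk(11) = Phi_2(Phi^{-1}(u_j), Phi^{-1}(u_k); rho) for the
    trivariate normal-copula binary model (4.8)."""
    x = np.array([norm.ppf(uj), norm.ppf(uk)])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal.cdf(x, mean=np.zeros(2), cov=cov)

# Check against the closed form at u = 1/2:
# Phi_2(0, 0; rho) = 1/4 + arcsin(rho)/(2*pi), as in (4.2).
rho = 0.6
print(p11(0.5, 0.5, rho), 0.25 + np.arcsin(rho) / (2 * np.pi))
```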
The plot in Figure 4.9 presents the values of $r_g$ at $n_1\times n_2=300\times 300$ "grid" points for the trivariate normal-copula model for binary data. We observe from the plot that the IFM approach is reasonably efficient in most situations. It is also clear that the magnitude of $|\rho_{12}|+|\rho_{13}|+|\rho_{23}|$ has an effect on the efficiency of the IFM approach, with generally higher efficiency ($r_g$ values close to 1) when $|\rho_{12}|+|\rho_{13}|+|\rho_{23}|$ is smaller. The magnitude of $u_1+u_2+u_3$ has some effect, in that the efficiency of the IFM approach is lower in the region where $u_1+u_2+u_3$ is close to its boundaries (that is, close to 0 or 3). In general, the IFM approach is quite efficient: in these 90,000 efficiency ($r_g$) evaluations, 50% of the $r_g$ values are less than 1.0128, 90% are less than 1.0589, 99% are less than 1.1479 and 99.99% are less than 1.3672. The maximum is 1.8097 and the minimum is 1. The two plots in Figure 4.10 clarify these observations: plot (a) shows the 90,000 ordered $r_g$ values versus their ordered positions (from 1 to 90,000) in the data set, and plot (b) is a histogram of the $r_g$ values. Overall, we draw the conclusion that the IFM approach is efficient.

Figure 4.9: Trivariate normal-binary model: relative efficiency of IFM approach versus IFS approach.

In the situation where $u_1=u_2=u_3=u$ and $\rho_{12}=\rho_{13}=\rho_{23}=\rho$ ($-1/2\le\rho\le 1$), let $r_1(u,\rho)$ denote the ratio of the asymptotic variance of $\tilde\rho$ (based on WA) to the asymptotic variance of $\hat\rho$; $r_1(u,\rho)$ has to be evaluated numerically. Figure 4.11 shows a plot of $r_1(u,\rho)$ versus $u$ ($0\le u\le 1$) and $\rho$ ($-1/2\le\rho\le 1$). It is difficult to evaluate $r_1(u,\rho)$ numerically when the values of $u$ and $\rho$ are near the boundaries of the sample space and the parameter space but, generally speaking, the efficiency is lower when the values of $u$ and $\rho$ are close to the boundaries. In the situation where $u_1=u_2=u_3=u$, $\rho_{12}=\rho_{23}=\rho$ and $\rho_{13}=\rho^2$ ($\rho\in[-1,1]$), we observed similar efficiency behaviour; these results are not presented here.

Figure 4.11: Trivariate normal-binary model: relative efficiency of IFM approach versus IFS approach when $u_1=u_2=u_3$ and $\rho_{12}=\rho_{13}=\rho_{23}$.
M u l t i v a r i a t e copula discrete models for b i n a r y data In this subsection, we study the M C D models for binary data. The parameters are assumed to be margin-dependent. In our simulation, we use the M V N copula, and simulate (/-dimensional binary observations y,- (i = 1 , . . . , n) from a multivariate probit model Yij = I(Zij < z^), j = 1 , . . . , d, i = l , . . . , n , where Zj = (Zn,.. .,Zn)' ~ MVNd(0,Qi) with z,, = £ j x y , and 0* = (Oijk) assumed to be free of covariates, that is 0,- = 0 or Oijk = #jA, V i. We transform the dependence parameter 6jk with $jk = (exp(ajfc) — l)/(exp(ajfc) + l) , and estimate ajk instead of Ojk- We use the following simulation scheme: 1. The sample size is n, the number of simulations is N; both are reported in the tables. 2. For d = 3, we study the two situations: Yij = I(Zij < Zj) and Y,;- = I(Zij < fyo + fij\Xij). For each situation, two general dependence structures are chosen: #12 = #13 = #23 = 0.6 (or a12 = a13 = a 2 3 = 1.3863) and 012 = 023 = 0.8 (or a 1 2 = a 2 3 = 2.1972), 013 = 0.64 (or a i 3 = 1.5163). Other parameters are: (a) With no covariates, with z = (0,0,0)'. (b) With covariates, with 0O = (/?1 0,/3 2 0,/?3 0)' = (0.7,0.5,0.3)' and & = (/3n,/?21,/?31)' = (0.5,0.5,0.5)'. Situations where Xij is discrete and continuous are considered. For the discrete situation, X{j = I(U < 0) where U ~ U(—1,1); for the continuous situation, XijS are margin-independent, that with Xi ~ N(0,1/4). 3. For d = 4, we only study Y,j = I(Z{j < Zj). Two dependence structures in the study are #12 = #13 = #14 = #23 = #24 - #34 = 0.6 (or C*12 = c*i3 = a i 4 = a 2 3 = a 2 4 = <*34 - 1.38 63) and #12 = #23 = #34 = 0.8 (or a i 2 = a 2 3 = a 3 4 = 2.1972), # i3 = #24 = 0.64 (or a i 3 = a 2 4 = 1.5163) and #i4 = 0.512 (or a i 4 = 1.1309). The cut-off points are (a) z = (0,0,0,0)', (b) z = (0.7,0.7,0.7,0.7)', (c) z = (0.7,0,0.7,0)'. The numerical results from M C D models for binary data are presented in Table 4.1 to Table 4.5. These tables lead to two clear conclusions: i) The I F M approach is efficient relative to the Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 148 Table 4.1: Efficiency assessment with M C D model for binary data: d = 3, z = (0,0,0)', N = 1000 n m a r g i n parameters 1 2 3 (1,2) (1,3) (2,3) Zl Zi Z3 « 1 2 « 1 3 C*23 a i 2 = « 1 3 = « 2 3 = 1.3863 100 I F M M L E r 0.003 -0.002 0.005 1.442 1.426 1.420 (0.131) (0.121) (0.128) (0.376) (0.380) (0.378) 0.002 -0.003 0.004 1.441 1.426 1.420 (0.131) (0.121) (0.128) (0.376) (0.380) (0.378) 0.998 0.999 0.999 0.999 0.999 0.999 1000 I F M M L E r -0.0006 -0.0016 -0.0008 1.3924 1.3897 1.3906 (0.040) (0.038) (0.039) (0.114) (0.114) (0.113) -0.0018 -0.0028 -0.0019 1.3919 1.3893 1.3902 (0.040) (0.038) (0.039) (0.114) (0.114) (0.113) 0.997 0.997 0.997 1.000 1.001 1.000 a 1 2 = a 2 3 = 2.1972, c*i3 = 1.5163 100 I F M M L E r 0.0027 -0.0006 0.0003 2.2664 1.5571 2.2586 (0.131) (0.123) (0.130) (0.454) (0,377) (0.453) 0.0015 -0.0020 -0.0012 2.2646 1.5552 2.2579 (0.131) (0.123) (0.131) (0.453) (0.377) (0.452) 0.999 1.000 0.999 1.001 1.001 1.002 1000 I F M M L E r -0.0006 -0.0001 -0.0005 2.2009 1.5174 2.2043 (0.040) (0.038) (0.039) (0.135) (0.118) (0.136) -0.0023 -0.0020 -0.0022 2.2003 1.5166 2.2036 (0.040) (0.038) (0.039) (0.135) (0.118) (0.137) 0.996 1.000 0.996 0.999 1.000 1.000 M L approach, for small to large sample sizes. The ratio values r are very close to 1 in almost all the situations studied. 
These results are consistent with the results from the analytical studies reported in the previous section, ii) The M L E may be slightly more efficient than the I F M E , but this observation is not conclusive. We would say that I F M E and M L E are comparable. Multivariate copula discrete models for ordinal data In this subsection, we study the M C D models for ordinal data. The parameters are assumed to be margin-dependent. In our simulation, we use the M V N copula. We simulate d-dimensional ordinal observations (i = 1 , . . . , n) from a multivariate probit model for ordinal data, such that ' Yj = 1 i f f Z j(0) < Zj < Zj(l), Yj =2iftzj(l)<Zj <Zj(2), < Yj — rrij iff Zj(m,j — 1) < Zj < Zj(rrij), where —co — 2/(0) < z ; ( l ) < • • • < Zj(mj — 1) < Zj(rn,j) = oo are constants, j = 1,2,. . . , d , and Zj = (Zn,.. .,Zid)' ~ MVNd(0,®i) with Zijfaj) = fj(yij) + 3'jXij. Qi - (Oijk) is assumed to be free of covariates, that is, 0,- = 6 or 9ijk = Ojk, V i. We transform the dependence parameters Chapter 4. The efficiency of IFM approach and the efficiency of jackknife variance estimate 149 Table 4.2: Efficiency assessment with M C D model for binary data: d = 3, 0o = (0.7,0.5,0.3)', 0i = (0.5,0.5,0.5)', Xij discrete, N = 1000 n margin parameters 1 2 3 (1,2) (1,3) (2,3) 010 020 021 030 031 " 1 2 " 1 3 " 2 3 "12 = " 1 3 = 0 2 3 = 1-3863 100 I F M M L E r 0.694 0.559 0.496 0.547 0.294 0.526 1.446 1.447 1.435 (0.199) (0.385) (0.192) (0.335) (0.186) (0.282) (0.530) (0.492) (0.455) 0.692 0.557 0.495 0.545 0.293 0.526 1.447 1.450 1.438 (0.198) (0.347) (0.192) (0.319) (0.185) (0.282) (0.498) (0.490) (0.458) 1.005 1.108 1.001 1.050 1.001 1.001 1.064 1.004 0.993 1000 I F M M L E r 0.700 0.501 0.498 0.509 0.298 0.503 1.395 1.386 . 1.385 (0.063) (0.100) (0.058) (0.089) (0.058) (0.085) (0.145) (0.136) (0.131) 0.699 0.500 0.497 0.508 0.298 0.503 1.395 1.387 1.387 (0.063) (0.099) (0.058) (0.089) (0.058) (0.085) (0.145) (0.136) (0.131) 1.002 1.013 1.000 1.006 0.998 1.001. 
1.001 1.001 1.001 a 1 2 = aas = 2.1972, a 1 3 = 1.5163 100 I F M M L E r 0.694 0.559 0.496 0.540 0.293 0.534 2.392 1.599 2.314 (0.199) (0.385) (0.198) (0.312) (0.186) (0.289) (0.790) (0.592) (0.669) 0.692 0.558 0.494 0.542 0.292 0.534 2.352 1.591 2.314 (0.199) (0.344) (0.198) (0.309) (0.187) (0.290) (0.675) (0.548) (0.597) 1.001 1.117 0.999 1.007 0.995 0.996 1.171 1.081 1.119 1000 I F M M L E r 0.700 0.501 0.499 0.503 0.299 0.502 2.205 1.521 2.201 (0.064) (0.100) (0.058) (0.092) (0.057) (0.086) (0.167) (0.141) (0.155) 0.699 0.501 0.498 0.503 0.297 0.502 2.207 1.523 2.204 (0.063) (0.098) (0.058) (0.092) (0.057) (0.086) (0.167) (0.141) (0.155) 1.005 1.019 1.000 1.008 0.998 1.001 0.999 1.001 0.999 Table 4.3: Efficiency assessment with M C D model for binary data: d = 3, 0O = (0.7,0.5,0.3)', 0i = (0.5,0.5,0.5)', Xij = Xi continuous, N = 100 n margin parameters 1 2 3 (1,2) (1,3) (2,3) 010 011 020 021 030 031 " 1 2 0 1 3 " 2 3 ai2 = "13 = " 2 3 = 1.3863 100 I F M M L E r 0.722 0.529 0.488 0.520 0.312 0.524 1.453 1.403 1.473 (0.136) (0.326) (0.144) (0.278) (0.137) (0.310) (0.398) (0.402) (0.401) 0.722 0.532 0.486 0.519 0.311 0.522 1.458 1.407 1.476 (0.137) (0.320) (0.144) (0.278) (0.138) (0.308) (0.402) (0.412) (0.406) 0.999 1.019 0.999 1.002 0.993 1.005 0.990 0.976 0.989 1000 I F M M L E r 0.704 0.495 0.501 0.504 0.306 0.504 1.413 1.380 1.391 (0.042) (0.089) (0.046) (0.084) (0.041) (0.093) (0.140) (0.109) (0.124) 0.703 0.494 0.500 0.503 0.305 0.503 1.415 1.381 1.393 (0.042) (0.090) (0.045) (0.084) (0.040) (0.093) (0.139) (0.109) (0.124) 1.004 0.988 1.004 0.993 1.007 1.001 1.000 1.006 0.999 a i , = a 2 3 = 2.1972, a , 3 = 1.5163 100 I F M M L E V 0722 0.529 0.490 0.543 0.300 0.544 2.303 1.556 2.303 (0.136) (0.326) (0.133) (0.272) (0.131) (0.309) (0.494) (0.362) (0.525) 0.721 0.532 0.488 0.542 0.298 0.538 2.318 1.550 2.310 (0.136) (0.317) (0.134) (0.279) (0.131) (0.306) (0.504) (0.371) (0.533) 1.000 1.028 0.993 0.976 1.002 1.010 0.981 0.978 0.985 1000 I F M M L E r 0.704 0.495 0.502 0.500 0.303 0.506 2.220 1.541 2.213 (0.042) (0.089) (0.045) (0.076) (0.041) (0.091) (0.155) (0.123) (0.142) 0.703 0.494 0.499 0.498 0.301 0.505 2.222 1.541 2.215 (0.042) (0.089) (0.045) (0.075) (0.041) (0.091) (0.156) (0.124) (0.142) l . O l l 1.002 1.008 1.015 1.010 1.010 0.991 0.997 0.997 Chapter 4. 
The efficiency of IFM approach and the efficiency of jackknife variance estimate 150 Table 4.4: Efficiency assessment with M C D model for binary data: d — 4, c*i2 = « i 3 = c*i4 = «23 = a 2 4 = a 3 4 = 1.3863, N = 1000 n margin parameters 1 2 3 4 (1,2) (1,3) (1,4) (2,3) (2,4) (3,4) Z\ Zi Z3 Z 4 <*12 "13 «14 <*23 «24 «34 z = (0,0,0,0)' 100 I F M M L E r 0.002 -0.008 0.002 -0.002 1.417 1.421 1.419 1.420 1.406 1.397 (0.121) (0.122) (0.128) (0.125) (0.369) (0.374) (0.374) (0.370) (0.374) (0.3 66) 0.001 -0.009 0.000 -0.003 1.417 1.419 1.418 1.419 1.405 1.395 (0.121) (0.123) (0.128) (0.125) (0.369) (0.374) (0.374) (0.370) (0.373) (0.3 65) 0.997 0.996 0.999 0.999 1.001 0.999 1.000 1.000 1.002 1.003 1000 I F M M L E r -0.001 -0.001 -0.001 -0.003 1.388 1.386 1.392 1.385 1.391 1.387 (0.040) (0.037) (0.039) (0.039) (0.108) (0.112) (0.112) (0.115) (0.114) (0.1 18) -0.002 -0.003 -0.002 -0.004 1.388 1.386 1.392 1.385 1.390 1.387 (0.040) (0.037) (0.039) (0.039) (0.108) (0.112) (0.112) (0.115) (0.114) (0.1 17) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 z = (0.7,0.7,0.7,0.7)' 100 I F M M L E r 0.709 0.703 0.710 0.708 1.447 1.403 1.444 1.405 1.401 1.404 (0.139) (0.139) (0.140) (0.139) (0.444) (0.443) (0.466) (0.430) (0.431) (0.4 44) 0.707 0.702 0.709 0.706 1.447 1.402 1.442 1.403 1.401 1.403 (0.138) (0.139) (0.140) (0.139) (0.442) (0.441) (0.465) (0.430) (0.429) (0.4 45) 1.002 1.000 0.998 1.000 1.005 1.004 1.001 1.000 1.005 0.997 1000 I F M M L E r 0.700 0.703 0.699 0.700 1.388 1.384 1.390 1.380 1.383 1.389 (0.043) (0.044) (0.043) (0.045) (0.134) (0.130) (0.131) (0.131) (0.136) (0.1 32) 0.699 0.701 0.698 0.699 1.389 1.385 1.390 1.382 1.384 1.390 (0.043) (0.044) (0.043) (0.044) (0.133) (0.130) (0.131) (0.130) (0.136) (0.1 32) 1.000 1.001 1.000 1.004 1.001 0.999 1.001 1.004 1.000 1.000 z = (0.7,0,0.7,0)' 100 I F M M L E r 0.709 -0.007 0.710 -0.002 1.464 1.403 1.480 1.463 1.406 1.454 (0.139) (0.122) (0.140) (0.124) (0.567) (0.443) (0.596) (0.533) (0.374) (0.5 74) 0.708 -0.009 0.709 -0.003 1.458 1.400 1.472 1.459 1.405 1.447 (0.138) (0.122) (0.140) (0.124) (0.512) (0.445) (0.541) (0.493) (0.373) (0.5 18) 1.002 0.999 0.998 1.000 1.107 0.996 1.102 1.081 1.003 1.108 1000 I F M M L E r 0.700 -0.001 0.699 -0.002 1.392 1.384 1.398 1.386 1.391 1.392 (0.043) (0.037) (0.043) (0.039) (0.128) (0.130) (0.132) (0.133) (0.114) (0.1 31) 0.699 -0.002 0.698 -0.004 1.394 1.385 1.399 1.387 1.390 1.393 (0.043) (0.038) (0.043) (0.039) (0.128) (0.129) (0.132) (0.133) (0.114) (0.1 31) 1.002 0.999 1.001 0.996 1.000 1.008 1.001 0.999 0.999 1.001 Chapter 4. 
The efficiency of IFM approach and the efficiency of jackknife variance estimate 151 Table 4.5: Efficiency assessment with M C D model for binary data: d = 4, a i 2 = « 2 3 = <*34 = 2.1972, a i 3 = a 2 4 = 1.5 1 63, a 1 4 = 1.1309, N = 1000 n margin parameters 1 2 3 4 (1,2) (1,3) (1,4) (2,3) (2,4) (3,4) Z\ Z2 Z3 Z4 «12 «13 «14 Q!23 a 2 4 <*34 z = (0,0,0,0)' 1000 I F M M L E r 0.002 0.001 0.001 0.001 2.210 1.518 1.137 2.198 1.516 2.204 (0.039) (0.038) (0.039) (0.041) (0.135) (0.116) (0.106) (0.131) (0.115) (0.128) -0.000 -0.002 -0.001 -0.001 2.209 1.517 1.136 2.197 1.515 2.204 (0.039) (0.039) (0.039) (0.041) (0.135) (0.115) (0.106) (0.131) (0.115) (0.127) 1.000 0.996 1.004 0.997 1.000 1.002 1.001 1.000 1.000 1.000 z = (0.7,0.7,0.7,0.7)' 1000 I F M M L E r 0.700 0.701 0.700 0.701 2.206 1.514 1.136 2.205 1.521 2.206 (0.042) (0.043) (0.043) (0.044) (0.154) (0.132) (0.125) (0.153) (0.134) (0.159) 0.699 0.699 0.699 0.700 2.207 1.514 1.135 2.206 1.521 2.208 (0.042) (0.044) (0.043) (0.044) (0.154) (0.130) (0.124) (0.153) (0.132) (0.159) 1.000 0.999 1.003 0.999 1.000 1.010 1.008 1.000 1.011 1.001 z = (0.7,0,0.7,0)' 1000 I F M M L E r 0.700 0.001 0.700 0.001 2.212 1.514 1.139 2.209 1.516 2.214 (0.042) (0.038) (0.043) (0.041) (0.159) (0.132) (0.122) (0.162) (0.115) (0.162) 0.699 -0.001 0.699 -0.001 2.215 1.513 1.140 2.214 1.515 2.218 (0.042) (0.039) (0.043) (0.041) (0.159) (0.130) (0.121) (0.163) (0.115) (0.162) 0.996 0.994 0.998 0.997 0.999 1.014 1.009 0.996 1.008 1.001 9jk with 6jk = (exp(ajjt) — l)/(exp(ajfc) + 1), and estimate ctjk instead of 0jk- In our simulation study, we only examine the situation where no covariates are involved in the marginal parameters, and further assume that rrij — 3. In these situations, for each margin, we need to estimate two parameters: Zj(l) and Zj(2). We use the following simulation scheme: 1. The sample size is n, the number of simulations is N; both are reported in the tables. 2. For d = 3, we study two situations of marginal parameters: (a) z(l) = (-0.5,-0.5,-0.5)' , z(2) = (0.5,0.5,0.5)' (b) z(l) = (-0.5,0, -0.5)', z(2) = (0.5,1,0.5)' and for each situation, two dependence structures are used in the simulation study: 0 i2 = 913 - 023 = 0.6 (or a 1 2 = a 1 3 - a 2 3 - 1.3863) and 612 = 023 = 0.8 (or a 1 2 = a 2 3 = 2.1972), 013 = 0.64 (or or is = 1.5163). 3. For d — 4, we similarly study two situations of marginal parameters: (a) z(l) = (-0.5, -0.5, -0.5, -0.5)', z(2) = (0.5,0.5,0.5,0.5)' ' (b) z(l) = (-0.5,0, -0.5,0)', z(2) = (0.5,1,0.5,1)' Chapter 4. 
The efficiency of IFM approach and the efficiency of jackknife variance estimate 152 Table 4.6: Efficiency assessment with M C D model for ordinal data: d = 3, z(l) = (—0.5, —0.5, —0.5)', z(2) = (0.5,0.5,0.5)', N = 1000 n m a r g i n p a r a m e t e r s 1 2 3 (1,2) (1,3) (2,3) Zi(2) Z 2 ( l ) Z2(2) 2 3 (D 2fc(2) Oi2 «13 "23 «12 = <*13 = «23 = 1.3863 100 I F M M L E r -0.500 0.508 -0.508 0.500 -0.507 0.508 1.413 1.407 1.414 (0.135) (0.135) (0.130) (0.133) (0.134) (0.137) (0.275) (0.284) (0.287) -0.503 0.507 -0.511 0.498 -0.510 0.507 1.413 1.408 1.415 (0.134) (0.135) (0.130) (0.133) (0.134) (0.136) (0.275) (0.284) (0.287) 1.004 1.003 1.000 1.003 1.006 1.003 1.000 0.999 0.998 1000 I F M M L E r -0.501 0.498 -0.501 0.499 -0.502 0.500 1.390 1.386 1.387 (0.043) (0.041) (0.041) (0.041) (0.042) (0.042) (0.086) (0.089) (0.088) -0.504 0.497 -0.504 0.498 -0.504 0.498 1.390 1.386 1.387 (0.043) (0.041) (0.041) (0.041) (0.042) (0.042) (0.085) (0.089) (0.088) 0.998 1.002 0.998 1.002 0.997 1.006 1.005 1.004 1.005 a „ = ttM = 2.1972, a 1 3 = 1.5163 100 I F M M L E r -0.500 0.508 -0.506 0.502 -0.509 0.508 2.251 1.542 2.242 (0.135) (0.135) (0.132) (0.136) (0.136) (0.134) (0.323) (0.282) (0.324) -0.504 0.506 -0.512 0.500 -0.513 0.506 2.252 1.539 2.243 (0.136) (0.136) (0.132) (0.136) (0.138) (0.135) (0.321) (0.285) (0.322) 0.990 0.999 0.997 1.005 0.991 0.999 1.005 0.992 1.005 1000 I F M M L E r -0.501 0.498 -0.500 0.498 -0.502 0.500 2.202 1.516 2.199 (0.043) (0.041) (0.041) (0.040) (0.042) (0.041) (0.093) (0.088) (0.097) -0.505 0.496 -0.504 0.496 -0.506 0.498 2.203 1.516 2.200 (0.043) (0.041) (0.041) (0.040) (0.042) (0.041) (0.093) (0.088) (0.097) 0.997 0.999 1.005 1.008 0.998 1.001 0.999 1.004 1.000 and for each situation, two dependence structures are used in the simulation study: # i 2 = #13 — flu — 023 = 024 = 034 = 0.6 (or a i 2 - c*i3 — « i 4 = a 2 3 — a 2 4 — a 3 4 -• 1.3863) and 0 1 2 = 0 2 3 = 0 3 4 = 0.8 (or « 1 2 = a 2 3 = a 3 4 = 2.1972), 0 1 3 = 0 2 4 = 0.64 (or a j 3 = a 2 4 = 1.5163) and 0 i4 =0.512 (or a i 4 = 1.1309). The numerical results from M C D models for ordinal data are presented in Table 4.6 to Table 4.11. Again, from these tables, we have the following two clear conclusions: i) The I F M approach is efficient relative to the M L approach, for small to large sample sizes. The ratio values r are very close to 1 in almost all the studied situations, ii) M L E may be slightly more efficient than I F M E , but this observation is not conclusive. We would say that I F M E and M L E are comparable. Multivariate copula discrete models for count data In this subsection, we study the M C D models for count data. The parameters are assumed to be margin-dependent. In our simulation, we use the M V N copula. We simulate d-dimensional Poisson observations y,- (i = 1 , . . . , n) from a multivariate normal-copula Poisson model 2 2 P(yi • • • %) = E • • • E ( - l ) * ' 1 + - + < < C ( a U l a d i d ; 6 ) , « ' i = i » < i = i Chapter 4. 
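A minimal sketch of the corresponding data generation (Python with numpy assumed), with each latent $Z_{ij}$ classified by its cut-off points:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_probit_ordinal(n, cuts, Theta):
    """Y_ij = y iff z_j(y-1) < Z_ij <= z_j(y), with Z_i ~ MVN_d(0, Theta).
    cuts[j] holds the finite cut-off points (z_j(1), ..., z_j(m_j - 1))."""
    d = len(cuts)
    Z = rng.multivariate_normal(np.zeros(d), Theta, size=n)
    Y = np.empty((n, d), dtype=int)
    for j in range(d):
        Y[:, j] = np.searchsorted(np.asarray(cuts[j]), Z[:, j]) + 1
    return Y

# d = 3, m_j = 3 categories, z(1) = (-0.5,-0.5,-0.5)', z(2) = (0.5,0.5,0.5)'
Theta = 0.6 * np.ones((3, 3)) + 0.4 * np.eye(3)
Y = simulate_probit_ordinal(1000, [(-0.5, 0.5)] * 3, Theta)
print([np.bincount(Y[:, j], minlength=4)[1:] / 1000 for j in range(3)])
```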
The numerical results from the MCD models for ordinal data are presented in Tables 4.6 to 4.11. Again, these tables support two clear conclusions: i) the IFM approach is efficient relative to the ML approach, for small to large sample sizes, with ratio values $r$ very close to 1 in almost all the situations studied; ii) the MLE may be slightly more efficient than the IFME, but this observation is not conclusive. We would say that the IFME and the MLE are comparable.

Table 4.6: Efficiency assessment with MCD model for ordinal data: d = 3, z(1) = (-0.5,-0.5,-0.5)', z(2) = (0.5,0.5,0.5)', N = 1000

              z_1(1)   z_1(2)   z_2(1)   z_2(2)   z_3(1)   z_3(2)   α_12     α_13     α_23
α_12 = α_13 = α_23 = 1.3863:
n=100   IFM  -0.500    0.508   -0.508    0.500   -0.507    0.508    1.413    1.407    1.414
             (0.135)  (0.135)  (0.130)  (0.133)  (0.134)  (0.137)  (0.275)  (0.284)  (0.287)
        MLE  -0.503    0.507   -0.511    0.498   -0.510    0.507    1.413    1.408    1.415
             (0.134)  (0.135)  (0.130)  (0.133)  (0.134)  (0.136)  (0.275)  (0.284)  (0.287)
        r     1.004    1.003    1.000    1.003    1.006    1.003    1.000    0.999    0.998
n=1000  IFM  -0.501    0.498   -0.501    0.499   -0.502    0.500    1.390    1.386    1.387
             (0.043)  (0.041)  (0.041)  (0.041)  (0.042)  (0.042)  (0.086)  (0.089)  (0.088)
        MLE  -0.504    0.497   -0.504    0.498   -0.504    0.498    1.390    1.386    1.387
             (0.043)  (0.041)  (0.041)  (0.041)  (0.042)  (0.042)  (0.085)  (0.089)  (0.088)
        r     0.998    1.002    0.998    1.002    0.997    1.006    1.005    1.004    1.005
α_12 = α_23 = 2.1972, α_13 = 1.5163:
n=100   IFM  -0.500    0.508   -0.506    0.502   -0.509    0.508    2.251    1.542    2.242
             (0.135)  (0.135)  (0.132)  (0.136)  (0.136)  (0.134)  (0.323)  (0.282)  (0.324)
        MLE  -0.504    0.506   -0.512    0.500   -0.513    0.506    2.252    1.539    2.243
             (0.136)  (0.136)  (0.132)  (0.136)  (0.138)  (0.135)  (0.321)  (0.285)  (0.322)
        r     0.990    0.999    0.997    1.005    0.991    0.999    1.005    0.992    1.005
n=1000  IFM  -0.501    0.498   -0.500    0.498   -0.502    0.500    2.202    1.516    2.199
             (0.043)  (0.041)  (0.041)  (0.040)  (0.042)  (0.041)  (0.093)  (0.088)  (0.097)
        MLE  -0.505    0.496   -0.504    0.496   -0.506    0.498    2.203    1.516    2.200
             (0.043)  (0.041)  (0.041)  (0.040)  (0.042)  (0.041)  (0.093)  (0.088)  (0.097)
        r     0.997    0.999    1.005    1.008    0.998    1.001    0.999    1.004    1.000
Table 4.7: Efficiency assessment with MCD model for ordinal data: d = 3, z(1) = (-0.5,0,-0.5)', z(2) = (0.5,1,0.5)', N = 1000

              z_1(1)   z_1(2)   z_2(1)   z_2(2)   z_3(1)   z_3(2)   α_12     α_13     α_23
α_12 = α_13 = α_23 = 1.3863:
n=100   IFM  -0.500    0.508   -0.002    1.01    -0.507    0.508    1.429    1.407    1.416
             (0.135)  (0.135)  (0.121)  (0.16)   (0.134)  (0.137)  (0.294)  (0.284)  (0.298)
        MLE  -0.503    0.507   -0.004    1.00    -0.510    0.507    1.430    1.408    1.417
             (0.134)  (0.135)  (0.120)  (0.16)   (0.134)  (0.136)  (0.293)  (0.284)  (0.297)
        r     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
n=1000  IFM  -0.501    0.498   -0.002    0.998   -0.502    0.500    1.393    1.386    1.389
             (0.043)  (0.041)  (0.038)  (0.049)  (0.042)  (0.042)  (0.089)  (0.089)  (0.090)
        MLE  -0.503    0.497   -0.003    0.996   -0.504    0.498    1.393    1.385    1.389
             (0.043)  (0.041)  (0.038)  (0.049)  (0.042)  (0.042)  (0.089)  (0.089)  (0.090)
        r     0.998    1.000    0.996    1.000    0.997    1.005    1.006    1.004    1.004
α_12 = α_23 = 2.1972, α_13 = 1.5163:
n=100   IFM  -0.500    0.508   -0.001    1.012   -0.509    0.508    2.261    1.542    2.247
             (0.135)  (0.135)  (0.123)  (0.164)  (0.136)  (0.134)  (0.343)  (0.282)  (0.344)
        MLE  -0.504    0.505   -0.004    1.011   -0.513    0.505    2.268    1.538    2.254
             (0.136)  (0.136)  (0.121)  (0.166)  (0.139)  (0.135)  (0.345)  (0.290)  (0.347)
        r     0.994    0.995    1.016    0.989    0.984    0.997    0.993    0.975    0.991
n=1000  IFM  -0.501    0.498   -0.000    0.999   -0.502    0.500    2.204    1.516    2.199
             (0.043)  (0.041)  (0.038)  (0.049)  (0.042)  (0.041)  (0.099)  (0.088)  (0.101)
        MLE  -0.504    0.496   -0.003    0.996   -0.505    0.498    2.204    1.516    2.199
             (0.043)  (0.041)  (0.038)  (0.049)  (0.042)  (0.041)  (0.099)  (0.087)  (0.100)
        r     0.998    0.999    1.018    1.002    0.998    1.002    1.002    1.006    1.011

Table 4.8: Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (-0.5,-0.5,-0.5,-0.5)', z(2) = (0.5,0.5,0.5,0.5)', α_12 = α_13 = α_14 = α_23 = α_24 = α_34 = 1.3863, N = 100

Marginal parameters:
              z_1(1)    z_1(2)    z_2(1)    z_2(2)    z_3(1)    z_3(2)    z_4(1)    z_4(2)
n=100   IFM  -0.4928    0.5025   -0.5037    0.4820   -0.4997    0.4986   -0.5088    0.4890
             (0.1238)  (0.1345)  (0.1139)  (0.1377)  (0.1145)  (0.1293)  (0.1365)  (0.1349)
        MLE  -0.4930    0.5023   -0.5043    0.4819   -0.5007    0.4983   -0.5101    0.4877
             (0.1225)  (0.1325)  (0.1145)  (0.1368)  (0.1149)  (0.1292)  (0.1373)  (0.1361)
        r     1.011     1.015     0.995     1.006     0.996     1.001     0.994     0.991
n=1000  IFM  -0.4954    0.5044   -0.5068    0.4981   -0.4987    0.5015   -0.4966    0.5016
             (0.0433)  (0.0415)  (0.0457)  (0.0436)  (0.0413)  (0.0413)  (0.0459)  (0.0414)
        MLE  -0.4973    0.5026   -0.5094    0.4963   -0.5008    0.4998   -0.4988    0.4998
             (0.0431)  (0.0416)  (0.0463)  (0.0439)  (0.0413)  (0.0413)  (0.0460)  (0.0419)
        r     1.003     0.999     0.987     0.993     1.000     1.001     0.997     0.988
Dependence parameters:
              α_12      α_13      α_14      α_23      α_24      α_34
n=100   IFM   1.4489    1.4332    1.4174    1.4539    1.4050    1.4149
             (0.2741)  (0.2858)  (0.2906)  (0.2903)  (0.3005)  (0.3088)
        MLE   1.4498    1.4351    1.4203    1.4562    1.4070    1.4175
             (0.2732)  (0.2842)  (0.2903)  (0.2928)  (0.3019)  (0.3042)
        r     1.003     1.006     1.001     0.991     0.995     1.015
n=1000  IFM   1.3939    1.3937    1.3942    1.3828    1.3739    1.3785
             (0.0786)  (0.0815)  (0.0869)  (0.0833)  (0.0775)  (0.0822)
        MLE   1.3949    1.3950    1.3958    1.3827    1.3745    1.3800
             (0.0795)  (0.0817)  (0.0886)  (0.0834)  (0.0790)  (0.0823)
        r     0.988     0.997     0.980     0.999     0.981     0.998

Table 4.9: Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (-0.5,-0.5,-0.5,-0.5)', z(2) = (0.5,0.5,0.5,0.5)', α_12 = α_23 = α_34 = 2.1972, α_13 = α_24 = 1.5163, α_14 = 1.1309, N = 100

Marginal parameters:
              z_1(1)    z_1(2)    z_2(1)    z_2(2)    z_3(1)    z_3(2)    z_4(1)    z_4(2)
n=100   IFM  -0.4928    0.5025   -0.5032    0.4924   -0.5038    0.4917   -0.5142    0.4909
             (0.1238)  (0.1345)  (0.1156)  (0.1357)  (0.1155)  (0.1342)  (0.1349)  (0.1424)
        MLE  -0.4965    0.5050   -0.5049    0.4945   -0.5122    0.4936   -0.5203    0.4936
             (0.1265)  (0.1339)  (0.1162)  (0.1322)  (0.1238)  (0.1328)  (0.1388)  (0.1457)
        r     0.979     1.005     0.994     1.026     0.933     1.011     0.972     0.978
n=1000  IFM  -0.4954    0.5044   -0.5047    0.5018   -0.5011    0.5019   -0.5035    0.5012
             (0.0433)  (0.0415)  (0.0454)  (0.0435)  (0.0417)  (0.0401)  (0.0433)  (0.0391)
        MLE  -0.4986    0.5017   -0.5079    0.4988   -0.5047    0.4989   -0.5069    0.4984
             (0.0433)  (0.0416)  (0.0460)  (0.0434)  (0.0412)  (0.0408)  (0.0436)  (0.0394)
        r     0.999     0.999     0.988     1.004     1.014     0.984     0.992     0.992
Dependence parameters:
              α_12      α_13      α_14      α_23      α_24      α_34
n=100   IFM   2.2807    1.5610    1.1782    2.2764    1.5603    2.2516
             (0.3091)  (0.2732)  (0.2897)  (0.3453)  (0.3270)  (0.3331)
        MLE   2.2754    1.5503    1.1795    2.2750    1.5491    2.2477
             (0.2933)  (0.2621)  (0.2802)  (0.3354)  (0.3263)  (0.3291)
        r     1.050     1.040     1.030     1.030     1.000     1.010
n=1000  IFM   2.2190    1.5263    1.1396    2.1868    1.5055    2.1851
             (0.0915)  (0.0803)  (0.0790)  (0.0884)  (0.0874)  (0.0957)
        MLE   2.2217    1.5267    1.1394    2.1887    1.5055    2.1865
             (0.0916)  (0.0791)  (0.0789)  (0.0871)  (0.0865)  (0.0951)
        r     0.999     1.015     1.002     1.014     1.011     1.007

Table 4.10: Efficiency assessment with MCD model for ordinal data: d = 4, z(1) = (-0.5,0,-0.5,0)', z(2) = (0.5,1,0.5,1)', α_12 = α_13 = α_14 = α_23 = α_24 = α_34 = 1.3863, N = 100

Marginal parameters:
              z_1(1)    z_1(2)    z_2(1)     z_2(2)    z_3(1)    z_3(2)    z_4(1)    z_4(2)
n=100   IFM  -0.4928    0.5025   -0.01817    0.9924   -0.4997    0.4986   -0.0108    0.9994
             (0.1238)  (0.1345)  (0.12289)  (0.1507)  (0.1145)  (0.1293)  (0.1159)  (0.1528)
        MLE  -0.4939    0.5020   -0.01733    0.9886   -0.5013    0.4979   -0.0115    0.9964
             (0.1217)  (0.1318)  (0.12165)  (0.1505)  (0.1141)  (0.1296)  (0.1146)  (0.1532)
        r     1.018     1.020     1.010      1.002     1.003     0.998     1.011     0.997
n=1000  IFM  -0.4954    0.5044   -0.0076     1.0021   -0.4987    0.5015   -0.0017    1.0044
             (0.0433)  (0.0415)  (0.0437)   (0.0450)  (0.0413)  (0.0413)  (0.0405)  (0.0474)
        MLE  -0.4975    0.5026   -0.0095     0.9996   -0.5009    0.5000   -0.0037    1.0018
             (0.0431)  (0.0413)  (0.0436)   (0.0448)  (0.0414)  (0.0413)  (0.0404)  (0.0473)
        r     1.003     1.005     1.001      1.005     0.998     1.000     1.003     1.002
Dependence parameters:
              α_12      α_13      α_14      α_23      α_24      α_34
n=100   IFM   1.4398    1.4332    1.4428    1.4452    1.4228    1.4345
             (0.2874)  (0.2858)  (0.2801)  (0.2866)  (0.2966)  (0.3429)
        MLE   1.4468    1.4363    1.4467    1.4509    1.4313    1.4341
             (0.2888)  (0.2838)  (0.2781)  (0.2882)  (0.2971)  (0.3409)
        r     0.995     1.007     1.007     0.995     0.998     1.006
n=1000  IFM   1.4060    1.3937    1.3907    1.3866    1.3798    1.3756
             (0.0822)  (0.0815)  (0.0891)  (0.0806)  (0.0940)  (0.0919)
        MLE   1.4067    1.3947    1.3924    1.3877    1.3813    1.3769
             (0.0820)  (0.0811)  (0.0911)  (0.0808)  (0.0941)  (0.0908)
        r     1.003     1.005     0.978     0.997     0.999     1.012
(b) (λ_i1, λ_i2, λ_i3) = (5, 3, 5) (or (β1, β2, β3) = (1.6094, 1.0986, 1.6094)).

3. For d = 4, we only study log(λ_ij) = β_j. Two dependence structures are considered: θ12 = θ13 = θ14 = θ23 = θ24 = θ34 = 0.6 (or α12 = α13 = α14 = α23 = α24 = α34 = 1.3863), and θ12 = θ23 = θ34 = 0.8 (or α12 = α23 = α34 = 2.1972), θ13 = θ24 = 0.64 (or α13 = α24 = 1.5163) and θ14 = 0.512 (or α14 = 1.1309). Other parameters are

(a) (λ_i1, λ_i2, λ_i3, λ_i4) = (5,5,5,5) (or equivalently (β1, β2, β3, β4) = (1.6094, 1.6094, 1.6094, 1.6094)).

(b) (λ_i1, λ_i2, λ_i3, λ_i4) = (4,2,5,8) (or equivalently (β1, β2, β3, β4) = (1.3863, 0.6931, 1.6094, 2.0794)).

The numerical results from MCD models for count data are presented in Table 4.12 to Table 4.15. We obtain conclusions similar to those for the MCD models for binary data and ordinal data.

Table 4.12: Efficiency assessment with MCD model for count data: d = 3, β0 = (1,1,1)' and β1 = (0.5,0.5,0.5)', x_ij discrete, N = 1000

α12 = α13 = α23 = 1.3863
n      method   β10      β11      β20      β21      β30      β31      α12      α13      α23
100    IFM      1.0054   0.490    1.0017   0.493    1.0029   0.493    1.423    1.412    1.420
       SD      (0.0867) (0.109)  (0.0883) (0.110)  (0.0870) (0.106)  (0.198)  (0.191)  (0.188)
       MLE      1.0018   0.488    0.9985   0.491    0.9998   0.491    1.422    1.409    1.417
       SD      (0.0876) (0.111)  (0.0892) (0.112)  (0.0872) (0.107)  (0.194)  (0.183)  (0.187)
       r        0.990    0.985    0.990    0.979    0.997    0.993    1.019    1.041    1.009
1000   IFM      1.0007   0.4991   1.0013   0.4975   1.0014   0.4984   1.3953   1.3864   1.3896
       SD      (0.0287) (0.0358) (0.0271) (0.0343) (0.0278) (0.0352) (0.0625) (0.0564) (0.0595)
       MLE      0.9976   0.4993   0.9984   0.4977   0.9983   0.4987   1.3918   1.3849   1.3875
       SD      (0.0292) (0.0362) (0.0274) (0.0343) (0.0283) (0.0354) (0.0578) (0.0563) (0.0574)
       r        0.983    0.988    0.989    0.998    0.983    0.995    1.081    1.003    1.035

α12 = α23 = 2.1972, α13 = 1.5163
100    IFM      1.0054   0.490    1.0044   0.490    1.0046   0.491    2.242    1.551    2.236
       SD      (0.0867) (0.109)  (0.0884) (0.110)  (0.0870) (0.107)  (0.190)  (0.196)  (0.186)
       MLE      1.0006   0.489    0.9993   0.488    0.9998   0.489    2.239    1.545    2.233
       SD      (0.0878) (0.111)  (0.0895) (0.112)  (0.0881) (0.109)  (0.187)  (0.180)  (0.185)
       r        0.987    0.986    0.987    0.981    0.987    0.980    1.015    1.087    1.007
1000   IFM      1.0007   0.4991   1.0013   0.4979   1.0014   0.4982   2.2055   1.5185   2.1975
       SD      (0.0287) (0.0358) (0.0272) (0.0350) (0.0277) (0.0351) (0.0555) (0.0620) (0.0572)
       MLE      0.9962   0.4992   0.9963   0.4981   0.9971   0.4984   2.2037   1.5156   2.1952
       SD      (0.0291) (0.0364) (0.0279) (0.0355) (0.0279) (0.0354) (0.0553) (0.0553) (0.0569)
       r        0.985    0.983    0.977    0.986    0.993    0.990    1.003    1.121    1.006
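For illustration, the following is a minimal Python sketch (not part of the original computations; the function names are ours, and numpy/scipy routines are assumed) of a bivariate margin of the MCD model for count data just described: Poisson cdfs G_j coupled by the bivariate normal copula, with the α to θ reparameterization above. The rectangle form P(Y_j = y_j, Y_k = y_k) = C(a_j2, a_k2) - C(a_j1, a_k2) - C(a_j2, a_k1) + C(a_j1, a_k1), with a_j1 = G_j(y_j - 1) and a_j2 = G_j(y_j), follows from the model definition.

    import numpy as np
    from scipy.stats import poisson, norm, multivariate_normal

    def theta_from_alpha(alpha):
        # theta = (exp(alpha) - 1)/(exp(alpha) + 1), mapping R onto (-1, 1)
        return (np.exp(alpha) - 1.0) / (np.exp(alpha) + 1.0)

    def normal_copula_cdf(u, v, theta):
        # C(u, v; theta): bivariate normal copula; C is 0 on the boundary
        if u <= 0.0 or v <= 0.0:
            return 0.0
        z = norm.ppf([u, v])
        return multivariate_normal.cdf(z, mean=[0.0, 0.0],
                                       cov=[[1.0, theta], [theta, 1.0]])

    def mcd_pair_pmf(yj, yk, lamj, lamk, theta):
        # P(Y_j = y_j, Y_k = y_k) as a copula rectangle probability
        aj = poisson.cdf([yj - 1, yj], lamj)   # (G_j(y_j - 1), G_j(y_j))
        ak = poisson.cdf([yk - 1, yk], lamk)
        return (normal_copula_cdf(aj[1], ak[1], theta)
                - normal_copula_cdf(aj[0], ak[1], theta)
                - normal_copula_cdf(aj[1], ak[0], theta)
                + normal_copula_cdf(aj[0], ak[0], theta))

    # e.g. alpha = 1.3863 corresponds to theta = 0.6, as in the schemes above
    p = mcd_pair_pmf(2, 3, lamj=5.0, lamk=3.0, theta=theta_from_alpha(1.3863))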
Table 4.13: Efficiency assessment with MCD model for count data: d = 3, (β1, β2, β3) = (1.6094, 1.0986, 1.6094), N = 1000

α12 = α13 = α23 = 1.3863
n      method   β1       β2       β3       α12      α13      α23
100    IFM      1.6075   1.1000   1.6076   1.415    1.403    1.408
       SD      (0.0465) (0.0593) (0.0456) (0.194)  (0.190)  (0.195)
       MLE      1.6024   1.0943   1.6032   1.413    1.398    1.403
       SD      (0.0519) (0.0648) (0.0490) (0.191)  (0.189)  (0.192)
       r        0.896    0.915    0.929    1.018    1.003    1.018
1000   IFM      1.6098   1.0988   1.6095   1.3885   1.3885   1.3880
       SD      (0.0141) (0.0185) (0.0140) (0.0597) (0.0575) (0.0586)
       MLE      1.6077   1.0963   1.6076   1.3877   1.3855   1.3869
       SD      (0.0146) (0.0191) (0.0143) (0.0588) (0.0577) (0.0574)
       r        0.966    0.967    0.975    1.015    0.996    1.021

α12 = α23 = 2.1972, α13 = 1.5163
100    IFM      1.6075   1.0991   1.6089   2.234    1.539    2.219
       SD      (0.0465) (0.0599) (0.0455) (0.187)  (0.187)  (0.188)
       MLE      1.6017   1.0912   1.6032   2.231    1.533    2.217
       SD      (0.0509) (0.0667) (0.0490) (0.187)  (0.181)  (0.188)
       r        0.913    0.897    0.929    0.999    1.033    0.995
1000   IFM      1.6098   1.0992   1.6097   2.2027   1.5176   2.2002
       SD      (0.0141) (0.0182) (0.0140) (0.0565) (0.0579) (0.0547)
       MLE      1.6063   1.0944   1.6063   2.1992   1.5149   2.1968
       SD      (0.0155) (0.0197) (0.0152) (0.0563) (0.0567) (0.0551)
       r        0.915    0.924    0.923    1.003    1.021    0.993

Table 4.14: Efficiency assessment with MCD model for count data: d = 4, (β1, β2, β3, β4) = (1.6094, 1.6094, 1.6094, 1.6094), N = 1000

α12 = α13 = α14 = α23 = α24 = α34 = 1.3863
n      method   β1       β2       β3       β4       α12     α13     α14     α23     α24     α34
100    IFM      1.6072   1.6099   1.6076   1.6085   1.417   1.410   1.413   1.402   1.407   1.398
       SD      (0.0434) (0.0443) (0.0459) (0.0451) (0.179) (0.188) (0.190) (0.185) (0.183) (0.186)
       MLE      1.6016   1.6044   1.6023   1.6031   1.415   1.410   1.413   1.401   1.404   1.401
       SD      (0.0477) (0.0494) (0.0495) (0.0490) (0.179) (0.189) (0.193) (0.184) (0.183) (0.184)
       r        0.910    0.895    0.927    0.920    1.001   0.991   0.986   1.009   1.003   1.008

α12 = α23 = α34 = 2.1972, α13 = α24 = 1.5163, α14 = 1.1309
100    IFM      1.6072   1.6090   1.6084   1.6084   2.228   1.546   1.159   2.218   1.536   2.215
       SD      (0.0434) (0.0445) (0.0454) (0.0456) (0.176) (0.184) (0.195) (0.183) (0.179) (0.177)
       MLE      1.5996   1.6003   1.5993   1.5999   2.228   1.538   1.153   2.215   1.527   2.217
       SD      (0.0499) (0.0513) (0.0544) (0.0525) (0.176) (0.187) (0.193) (0.184) (0.177) (0.177)
       r        0.871    0.867    0.835    0.867    0.997   0.982   1.015   0.995   1.009   1.001

Table 4.15: Efficiency assessment with MCD model for count data: d = 4, (β1, β2, β3, β4) = (1.3863, 0.6931, 1.6094, 2.0794), N = 1000

α12 = α13 = α14 = α23 = α24 = α34 = 1.3863
n      method   β1       β2       β3       β4       α12     α13     α14     α23     α24     α34
100    IFM      1.3835   0.6918   1.607    2.0787   1.414   1.405   1.406   1.398   1.418   1.411
       SD      (0.0489) (0.0705) (0.0459) (0.0356) (0.185) (0.194) (0.189) (0.186) (0.185) (0.189)
       MLE      1.3772   0.6840   1.6024   2.0744   1.411   1.405   1.407   1.402   1.415   1.411
       SD      (0.0549) (0.0756) (0.0496) (0.0384) (0.185) (0.193) (0.187) (0.183) (0.185) (0.188)
       r        0.891    0.932    0.927    0.927    0.998   1.002   1.011   1.017   1.003   1.005

α12 = α23 = α34 = 2.1972, α13 = α24 = 1.5163, α14 = 1.1309
100    IFM      ...      0.6906   1.6084   2.0790   2.239   1.546   1.155   2.219   ...     2.216
       SD      (0.0489) (0.0693) (0.0454) (0.0361) (0.191) (0.184) (0.194) (0.198) (0.191) (0.173)
       MLE      1.3758   0.6792   1.6006   2.0735   2.236   1.537   1.152   2.221   1.529   2.216
       SD      (0.0550) (0.0764) (0.0524) (0.0412) (0.191) (0.184) (0.188) (0.202) (0.186) (0.174)
       r        0.890    0.907    0.866    0.876    1.001   0.997   1.032   0.981   1.023   0.991
Multivariate mixture discrete models for count data

We now consider an MMD model for count data with the Morgenstern copula:

    P(y_1 ... y_d) = ∫ ... ∫ { prod_{j=1}^d f(y_j; λ_j) } [ 1 + sum_{j<k} θ_jk (1 - 2G(λ_j))(1 - 2G(λ_k)) ] prod_{j=1}^d g(λ_j) dλ_1 ... dλ_d,   (4.11)

where f(y_j; λ_j) = e^{-λ_j} λ_j^{y_j} / y_j! is the Poisson frequency function with parameter λ_j (λ_j > 0), g(λ) is a Gamma density function, having the form g(λ_j) = {1/[β_j^{α_j} Γ(α_j)]} λ_j^{α_j - 1} e^{-λ_j/β_j}, λ_j > 0, with β_j being a scale parameter, and G(λ_j) is a Gamma cdf. We have E(λ_j) = α_j β_j and Var(λ_j) = α_j β_j². (4.11) is a multivariate Poisson-Morgenstern-gamma model. The multiple integral in (4.11) over the joint space of λ_1, ..., λ_d can be decomposed into a product of integrals of a single variable; the calculation of P(y_1 ... y_d) can thus be accomplished by calculating 2d univariate integrals. In fact, we have

    P(y_1 ... y_d) = prod_{j=1}^d P(y_j) + sum_{j<k} θ_jk [P(y_j) - 2P_w(y_j)][P(y_k) - 2P_w(y_k)] prod_{m ≠ j,k} P(y_m),

and

    P_jk(y_j y_k) = P(y_j) P(y_k) + θ_jk [P(y_j) - 2P_w(y_j)][P(y_k) - 2P_w(y_k)],

where P(y_j) = ∫ f(y_j; λ_j) g(λ_j) dλ_j and P_w(y_j) = ∫ f(y_j; λ_j) g(λ_j) G(λ_j) dλ_j. Now

    P(y_j) = ∫_0^∞ [e^{-λ_j} λ_j^{y_j} / y_j!] {1/[β_j^{α_j} Γ(α_j)]} λ_j^{α_j - 1} e^{-λ_j/β_j} dλ_j = Γ(α_j + y_j) β_j^{y_j} / [(β_j + 1)^{α_j + y_j} Γ(α_j) y_j!].   (4.12)

Further, as f(y_j; λ_j) g(λ_j) is proportional to the density of a Gamma(y_j + α_j, β_j/(β_j + 1)) random variable, if we let μ_j = β_j (y_j + α_j)/(β_j + 1) and σ_j² = (y_j + α_j) β_j² / (β_j + 1)², then the lower and upper integration ranges of P_w(y_j) can be set to L_j = μ_j - 5σ_j and U_j = μ_j + 5σ_j for numerical evaluation of the integral.

To carry out the efficiency assessment through simulation, we need to simulate from the multivariate Poisson-Morgenstern-gamma distribution. Let C be the Morgenstern copula, and G(x) be the cdf of a univariate Gamma distribution. The following simulation algorithm is used:

1. Generate U_1, ..., U_d from C(U_1, ..., U_d).
2. Let λ_j = G^{-1}(U_j), j = 1, ..., d.
3. Generate Y_j from Poisson(λ_j), j = 1, ..., d.

In the above algorithm, the difficult part is the generation of U_1, ..., U_d from C(U_1, ..., U_d). The conditional distribution approach to generating multivariate random variates can be used here. The conditional distribution approach obtains x = (x_1, ..., x_d)' with F(x_1) = V_1, F(x_2 | x_1) = V_2, ..., F(x_d | x_1, ..., x_{d-1}) = V_d, where V_1, ..., V_d are independent uniform(0,1). With the Morgenstern copula, for m ≤ d,

    C(u_m | U_1 = u_1, ..., U_{m-1} = u_{m-1}) = ∫_0^{u_m} f(u_1, ..., u_{m-1}, u) du / f(u_1, ..., u_{m-1}),

where f(u_1, ..., u_m) = 1 + sum_{j<k≤m} θ_jk (1 - 2u_j)(1 - 2u_k). Since f(u_1, ..., u_m) = f(u_1, ..., u_{m-1}) + sum_{j=1}^{m-1} θ_jm (1 - 2u_j)(1 - 2u_m), it follows that

    f(u_1, ..., u_m) / f(u_1, ..., u_{m-1}) = 1 + [sum_{j=1}^{m-1} θ_jm (1 - 2u_j)] / f(u_1, ..., u_{m-1}) - 2u_m [sum_{j=1}^{m-1} θ_jm (1 - 2u_j)] / f(u_1, ..., u_{m-1}).

Hence

    C(u_m | U_1 = u_1, ..., U_{m-1} = u_{m-1}) = [1 + sum_{j=1}^{m-1} θ_jm (1 - 2u_j) / f(u_1, ..., u_{m-1})] u_m - [sum_{j=1}^{m-1} θ_jm (1 - 2u_j) / f(u_1, ..., u_{m-1})] u_m².

Let A = f(u_1, ..., u_{m-1}), B = sum_{j=1}^{m-1} θ_jm (1 - 2u_j), and D = B/A. From D u_m² - (D + 1) u_m + V_m = 0, we get

    u_m = [(D + 1) ± sqrt((D + 1)² - 4D V_m)] / (2D).

Thus the algorithm for generating U_1, ..., U_d from C(u_1, ..., u_d) is as follows (a code sketch of this sampler is given below, after the simulation scheme):
1. Generate V_1, ..., V_d from uniform(0,1).
2. Let U_1 = V_1.
3. For each m ≥ 2, let A = 1 if m = 2, and A = 1 + sum_{j<k≤m-1} θ_jk (1 - 2U_j)(1 - 2U_k) if m > 2. Let B = sum_{j=1}^{m-1} θ_jm (1 - 2U_j), and D = B/A.
4. For m = 2, ..., d: if D = 0, U_m = V_m; if D ≠ 0, U_m takes the one of the two values [(D + 1) ± sqrt((D + 1)² - 4D V_m)]/(2D) which is positive and less than 1.

The efficiency studies with the multivariate Poisson-Morgenstern-gamma model are carried out only for the dependence parameters θ_jk, with the univariate parameters held fixed. We use the following simulation scheme:

1. The sample size is n = 3000; the number of simulations is N = 200.
2. The dimension d is chosen to be 3, 4 and 5.
3. The marginal parameters α_j and β_j are fixed at α_j = β_j = 1 for j = 1, ..., d.
4. For each dimension, two dependence structures are considered:
(a) For d = 3: (θ12, θ13, θ23) = (0.5, 0.5, 0.5) and (θ12, θ13, θ23) = (0.6, 0.7, 0.8).
(b) For d = 4: (θ12, θ13, θ14, θ23, θ24, θ34) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5) and (θ12, θ13, θ14, θ23, θ24, θ34) = (0.6, 0.7, 0.8, 0.6, 0.7, 0.6).
(c) For d = 5: (θ12, θ13, θ14, θ15, θ23, θ24, θ25, θ34, θ35, θ45) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5) and (θ12, θ13, θ14, θ15, θ23, θ24, θ25, θ34, θ35, θ45) = (0.6, 0.7, 0.8, 0.8, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8).
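The following is a minimal Python sketch of the sampler in steps 1-4 above and of the full Poisson-Morgenstern-gamma draw (our own function names; theta is assumed to be a symmetric d x d array holding the θ_jk, and numpy/scipy are assumed):

    import numpy as np
    from scipy.stats import gamma, poisson

    def morgenstern_sample(theta, rng):
        # One draw (U_1, ..., U_d) from the Morgenstern copula by the
        # conditional distribution method derived above.
        d = theta.shape[0]
        v = rng.uniform(size=d)
        u = np.empty(d)
        u[0] = v[0]
        for m in range(1, d):                  # 0-based; margin m+1
            w = 1.0 - 2.0 * u[:m]
            a = 1.0                            # A = f(u_1, ..., u_{m-1})
            for j in range(m - 1):
                for k in range(j + 1, m):
                    a += theta[j, k] * w[j] * w[k]
            b = float(np.dot(theta[:m, m], w)) # B = sum_j theta_jm (1 - 2u_j)
            dd = b / a
            if dd == 0.0:
                u[m] = v[m]
            else:
                # root of D u^2 - (D + 1) u + V = 0 lying in (0, 1)
                disc = np.sqrt((dd + 1.0) ** 2 - 4.0 * dd * v[m])
                root = (dd + 1.0 - disc) / (2.0 * dd)
                if not 0.0 < root < 1.0:
                    root = (dd + 1.0 + disc) / (2.0 * dd)
                u[m] = root
        return u

    def rpoisson_morgenstern_gamma(theta, alpha, beta, rng):
        # Steps 1-3: U from C, lambda_j = G^{-1}(U_j), Y_j ~ Poisson(lambda_j)
        u = morgenstern_sample(theta, rng)
        lam = gamma.ppf(u, a=alpha, scale=beta)
        return poisson.rvs(lam, random_state=rng)

    rng = np.random.default_rng(1996)
    th = np.zeros((3, 3)); th[0, 1] = th[0, 2] = th[1, 2] = 0.5
    th = th + th.T
    y = rpoisson_morgenstern_gamma(th, alpha=np.ones(3), beta=np.ones(3), rng=rng)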
The numerical results from the MMD models for count data with the Morgenstern copula are presented in Table 4.16 to Table 4.18. We obtain conclusions similar to those for the MCD models for binary, ordinal and count data. Basically, they are: i) the IFM approach is efficient relative to the ML approach; the ratio values r are very close to 1 in almost all the situations studied; ii) MLE may be slightly more efficient than IFME, but this observation is not conclusive: IFME and MLE are comparable.

Table 4.16: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 3

(θ12, θ13, θ23) = (0.5, 0.5, 0.5)
method   θ12      θ13      θ23
IFM      0.495    0.500    0.501
SD      (0.125)  (0.125)  (0.124)
MLE      0.494    0.499    0.500
SD      (0.122)  (0.124)  (0.123)
r        1.022    1.008    1.003

(θ12, θ13, θ23) = (0.6, 0.7, 0.8)
IFM      0.603    0.699    0.792
SD      (0.127)  (0.118)  (0.119)
MLE      0.600    0.697    0.790
SD      (0.127)  (0.120)  (0.119)
r        1.000    0.985    0.995

Table 4.17: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 4

(θ12, θ13, θ14, θ23, θ24, θ34) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
method   θ12      θ13      θ14      θ23      θ24      θ34
IFM      0.500    ...      0.513    ...      0.494    0.488
SD      (0.131)  (0.128)  (0.124)  (0.133)  (0.134)  (0.138)
MLE      0.501    0.493    0.512    0.497    0.495    0.485
SD      (0.130)  (0.124)  (0.122)  (0.131)  (0.132)  (0.135)
r        1.008    1.026    1.014    1.021    1.018    1.019

(θ12, θ13, θ14, θ23, θ24, θ34) = (0.6, 0.7, 0.8, 0.6, 0.7, 0.6)
IFM      0.593    ...      0.794    0.589    0.692    0.599
SD      (0.130)  (0.127)  (0.120)  (0.121)  (0.133)  (0.124)
MLE      0.589    0.678    0.792    0.585    0.689    0.598
SD      (0.127)  (0.124)  (0.117)  (0.118)  (0.129)  (0.124)
r        1.017    1.026    1.024    1.025    1.037    1.006

Table 4.18: Efficiency assessment with multivariate Poisson-Morgenstern-gamma model, d = 5

(θ12, ..., θ45) = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
method   θ12     θ13     θ14     θ15     θ23     θ24     θ25     θ34     θ35     θ45
IFM      0.501   0.496   0.477   0.486   0.511   0.467   0.478   0.504   0.493   0.495
SD      (0.122) (0.131) (0.137) (0.132) (0.123) (0.130) (0.123) (0.128) (0.131) (0.116)
MLE      0.495   0.493   0.473   0.482   0.508   0.466   0.475   0.503   0.489   0.494
SD      (0.121) (0.125) (0.134) (0.128) (0.122) (0.125) (0.119) (0.127) (0.128) (0.113)
r        1.012   1.046   1.023   1.026   1.002   1.043   1.037   1.011   1.026   1.018

(θ12, ..., θ45) = (0.6, 0.7, 0.8, 0.8, 0.6, 0.7, 0.8, 0.6, 0.7, 0.8)
IFM      0.595   0.667   0.775   0.767   0.597   0.693   0.778   0.590   0.693   0.602
SD      (0.140) (0.137) (0.132) (0.130) (0.127) (0.139) (0.118) (0.136) (0.126) (0.125)
MLE      0.590   0.666   0.772   0.766   0.593   0.690   0.778   0.588   0.687   0.604
SD      (0.137) (0.132) (0.128) (0.126) (0.119) (0.135) (0.115) (0.132) (0.124) (0.113)
r        1.023   1.036   1.029   1.032   1.067   1.029   1.028   1.028   1.018   1.103

4.4 IFM efficiency for models with special dependence structure

The IFM approach may have important applications for models with special dependence structure. Data with special dependence structure arise often in practice: longitudinal studies, repeated measures, Markov-type dependence, k-dependent data, and so on. The analytical assessment of the efficiency of the IFM approach for several models with special dependence structure was given in section 4.2. In the following, we give some numerical results on IFM efficiency for some more complex models with special dependence structure. The estimation approach used here is PMLA. We present only representative results from the MCD model for binary data, with the MVN copula under exchangeable and AR(1) dependence structures; results for other models are quite similar, as we also observed in section 4.3 for various situations with a general model. We use the following simulation scheme:

1. The sample size is n = 1000; the number of simulations is N = 200.
2. The dimension d is chosen to be 3 and 4.
3. For d = 3, we considered two marginal models, Y_ij = I(Z_ij < z_j) and Y_ij = I(Z_ij < α_j0 + α_j1 x_ij), with x_ij = I(U < 0) where U ~ uniform(-1,1), and with the regression parameters

(a) with no covariates: z = (0.5, 0.5, 0.5)' and z = (0.5, 1.0, 1.5)';

(b) with covariates: α0 = (α10, α20, α30)' = (0.5, 0.5, 0.5)', α1 = (α11, α21, α31)' = (1, 1, 1)', and α0 = (α10, α20, α30)' = (0.5, 0.5, 0.5)', α1 = (α11, α21, α31)' = (1, 0.5, 1.5)'.

For each marginal model, exchangeable and AR(1) dependence structures in the MVN copula are considered, with the single dependence parameter in both cases being θ_i = [exp(β0 + β1 w_i) - 1]/[exp(β0 + β1 w_i) + 1], with w_i = I(U < 0) where U ~ uniform(-1,1), and parameters β0 = 1 and β1 = 1.5.

4. For d = 4, we only study Y_ij = I(Z_ij < z_j), with the marginal parameters z = (0.5, 0.5, 0.5, 0.5)' and z = (0.5, 0.8, 1.2, 1.5)'. For each marginal model, exchangeable and AR(1) dependence structures in the MVN copula are considered. The single dependence parameter in both cases is θ_i = [exp(β0) - 1]/[exp(β0) + 1], with β0 = 1.386 and β0 = 2.197 for the two situations. (A small sketch of the two correlation structures and the θ reparameterization is given after this scheme.)

The numerical results from these models with special dependence structure are presented in Table 4.19 to Table 4.26. We basically have the same conclusions as in all the other cases studied previously. These conclusions are: i) the IFM approach (PMLA) is efficient relative to the ML approach; the ratio values r are very close to 1 in almost all the studied situations; ii) MLE may be slightly more efficient than IFME, but this observation is not conclusive: IFME and MLE are comparable.
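A minimal sketch of the two patterned correlation matrices used above for the MVN copula, and of the reparameterization θ = (exp(η) - 1)/(exp(η) + 1) that keeps each latent correlation in (-1, 1), assuming only numpy (function names are ours):

    import numpy as np

    def theta_from_beta(b0, b1=0.0, w=0.0):
        # theta_i = [exp(b0 + b1 w_i) - 1]/[exp(b0 + b1 w_i) + 1]
        eta = b0 + b1 * w
        return (np.exp(eta) - 1.0) / (np.exp(eta) + 1.0)

    def exchangeable_corr(theta, d):
        # all off-diagonal correlations equal to theta
        return np.full((d, d), theta) + (1.0 - theta) * np.eye(d)

    def ar1_corr(theta, d):
        # correlation theta^{|j - k|} between margins j and k
        j = np.arange(d)
        return theta ** np.abs(np.subtract.outer(j, j))

    # e.g. beta0 = 1, beta1 = 1.5, w_i = 1 gives theta_i about 0.848
    R_ex = exchangeable_corr(theta_from_beta(1.0, 1.5, 1.0), d=3)
    R_ar = ar1_corr(theta_from_beta(1.0, 1.5, 1.0), d=3)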
Table 4.19: Efficiency assessment with special dependence structure: d = 3, z = (0.5, 0.5, 0.5)'

exchangeable, β0 = 1, β1 = 1.5
method   z1       z2       z3       β0       β1
IFM      0.496    0.497    0.497    ...      1.511
SD      (0.043)  (0.041)  (0.042)  (0.118)  (0.194)
MLE      0.494    0.496    0.496    0.996    1.520
SD      (0.041)  (0.040)  (0.041)  (0.118)  (0.195)
r        1.047    1.021    1.015    1.003    0.998

AR(1), β0 = 1, β1 = 1.5
IFM      0.496    0.497    0.496    ...      ...
SD      (0.043)  (0.041)  (0.041)  (0.123)  (0.185)
MLE      0.494    0.497    0.495    0.994    1.512
SD      (0.042)  (0.040)  (0.041)  (0.119)  (0.183)
r        1.034    1.027    1.012    1.031    1.015

Table 4.20: Efficiency assessment with special dependence structure: d = 3, z = (0.5, 1.0, 1.5)'

exchangeable, β0 = 1, β1 = 1.5
method   z1       z2       z3       β0       β1
IFM      0.496    0.997    ...      ...      1.531
SD      (0.043)  (0.047)  (0.064)  (0.154)  (0.249)
MLE      0.496    0.996    1.499    0.997    1.534
SD      (0.043)  (0.047)  (0.063)  (0.156)  (0.247)
r        1.009    0.999    1.010    0.986    1.008

AR(1), β0 = 1, β1 = 1.5
IFM      0.496    0.997    1.500    0.991    1.509
SD      (0.043)  (0.047)  (0.063)  (0.158)  (0.250)
MLE      0.496    0.996    1.500    0.993    1.518
SD      (0.043)  (0.046)  (0.062)  (0.156)  (0.249)
r        1.017    1.013    1.018    1.011    1.003

Table 4.21: Efficiency assessment with special dependence structure: d = 3, α0 = (0.5, 0.5, 0.5)', α1 = (1, 1, 1)'

exchangeable, β0 = 1, β1 = 1.5
method   α10      α11      α20      α21      α30      α31      β0       β1
IFM      0.500    1.020    0.499    1.010    0.500    1.002    ...      1.536
SD      (0.055)  (0.109)  (0.060)  (0.108)  (0.059)  (0.104)  (0.153)  (0.242)
MLE      0.500    1.018    0.498    1.010    0.500    0.999    0.978    1.556
SD      (0.052)  (0.104)  (0.059)  (0.107)  (0.058)  (0.102)  (0.152)  (0.250)
r        1.052    1.048    1.011    1.007    1.018    1.028    1.002    0.968

AR(1), β0 = 1, β1 = 1.5
IFM      0.500    1.020    0.499    1.010    0.497    1.002    0.988    1.529
SD      (0.055)  (0.109)  (0.060)  (0.108)  (0.058)  (0.101)  (0.158)  (0.233)
MLE      0.501    1.017    0.499    1.009    0.497    0.999    0.985    1.545
SD      (0.052)  (0.104)  (0.059)  (0.105)  (0.058)  (0.100)  (0.157)  (0.235)
r        1.043    1.047    1.023    1.022    1.004    1.004    1.008    0.991

Table 4.22: Efficiency assessment with special dependence structure: d = 3, α0 = (0.5, 0.5, 0.5)', α1 = (1, 0.5, 1.5)'

exchangeable, β0 = 1, β1 = 1.5
method   α10      α11      α20      α21      α30      α31      β0       β1
IFM      0.500    1.020    0.499    0.510    0.500    1.512    ...      1.528
SD      (0.055)  (0.109)  (0.060)  (0.089)  (0.059)  (0.141)  (0.160)  (0.238)
MLE      0.500    1.017    0.498    0.510    0.500    1.506    0.983    1.539
SD      (0.052)  (0.103)  (0.059)  (0.089)  (0.058)  (0.132)  (0.159)  (0.239)
r        1.047    1.050    1.011    1.002    1.017    1.070    1.004    0.996

AR(1), β0 = 1, β1 = 1.5
IFM      0.500    1.020    0.499    0.510    0.497    1.514    0.993    1.518
SD      (0.055)  (0.109)  (0.060)  (0.089)  (0.058)  (0.140)  (0.159)  (0.225)
MLE      0.500    1.017    0.499    0.510    0.497    1.510    0.994    1.530
SD      (0.053)  (0.104)  (0.059)  (0.089)  (0.057)  (0.133)  (0.158)  (0.223)
r        1.041    1.045    1.021    1.003    1.006    1.049    1.007    1.010

Table 4.23: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'

exchangeable, β0 = 1.386
method   z1       z2       z3       z4       β0
IFM      0.502    0.499    0.501    0.501    1.387
SD      (0.041)  (0.043)  (0.042)  (0.042)  (0.071)
MLE      0.501    0.499    0.500    0.500    1.389
SD      (0.041)  (0.043)  (0.042)  (0.042)  (0.070)
r        1.000    1.002    1.005    1.003    1.013

AR(1), β0 = 1.386
IFM      0.502    0.499    0.501    0.497    1.385
SD      (0.041)  (0.043)  (0.041)  (0.042)  (0.072)
MLE      0.502    0.499    0.500    0.496    1.387
SD      (0.041)  (0.043)  (0.041)  (0.042)  (0.069)
r        0.996    1.000    0.998    0.998    1.047
Table 4.24: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'

exchangeable, β0 = 1.386
method   z1       z2       z3       z4       β0
IFM      0.502    0.803    1.199    1.494    1.389
SD      (0.041)  (0.045)  (0.052)  (0.061)  (0.087)
MLE      0.502    0.802    1.198    1.492    1.391
SD      (0.041)  (0.045)  (0.052)  (0.061)  (0.087)
r        0.998    1.004    1.002    1.007    1.004

AR(1), β0 = 1.386
IFM      0.502    0.803    1.20     1.495    1.388
SD      (0.041)  (0.045)  (0.05)   (0.067)  (0.085)
MLE      0.502    0.802    1.20     1.494    1.389
SD      (0.041)  (0.045)  (0.05)   (0.065)  (0.083)
r        0.999    1.006    1.00     1.017    1.025

Table 4.25: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.5, 0.5, 0.5)'

exchangeable, β0 = 2.197
method   z1       z2       z3       z4       β0
IFM      0.502    0.501    0.501    0.501    2.200
SD      (0.041)  (0.042)  (0.042)  (0.042)  (0.093)
MLE      0.500    0.499    0.499    0.499    2.202
SD      (0.041)  (0.042)  (0.042)  (0.042)  (0.092)
r        0.999    1.000    0.999    1.000    1.015

AR(1), β0 = 2.197
IFM      0.502    0.501    0.501    0.499    2.194
SD      (0.041)  (0.042)  (0.042)  (0.042)  (0.086)
MLE      0.501    0.499    0.499    0.498    2.199
SD      (0.041)  (0.043)  (0.042)  (0.042)  (0.084)
r        0.995    0.993    1.000    0.999    1.025

Table 4.26: Efficiency assessment with special dependence structure: d = 4, z = (0.5, 0.8, 1.2, 1.5)'

exchangeable, β0 = 2.197
method   z1       z2       z3       z4       β0
IFM      0.502    0.802    1.201    1.499    2.203
SD      (0.041)  (0.046)  (0.056)  (0.060)  (0.114)
MLE      0.501    0.801    1.199    1.496    2.204
SD      (0.041)  (0.046)  (0.055)  (0.059)  (0.111)
r        0.996    1.002    1.005    1.003    1.031

AR(1), β0 = 2.197
IFM      0.502    0.802    1.198    1.500    2.200
SD      (0.041)  (0.046)  (0.052)  (0.060)  (0.110)
MLE      0.501    0.801    1.196    1.500    2.200
SD      (0.041)  (0.046)  (0.052)  (0.060)  (0.100)
r        0.997    1.005    0.993    1.000    1.040

4.5 Jackknife variance estimate compared with Godambe information matrix

Now we turn to numerical evaluation of the performance of jackknife variance estimates of IFME. We showed in Chapter 2 that the jackknife estimate of variance is asymptotically equivalent to the estimate of variance from the corresponding Godambe information matrix. The jackknife approach may be preferred when appropriate computer packages are not available to compute the Godambe information matrix, or when the asymptotic variance in terms of the Godambe information matrix is difficult to compute analytically or computationally. For example, computing the asymptotic variance of P(y_1 ... y_d; θ) by means of the Godambe information is not an easy task. To complement the theoretical results in Chapter 2, in this subsection we give some analytical and numerical comparisons of the variance estimates from the Godambe information and the jackknife method. The application of jackknife methods to modelling and inference for real data sets is demonstrated in Chapter 5.

Analytical comparison of the two approaches

Example 4.8 (Multinormal, general) Let X ~ N_d(μ, Σ), and suppose we are interested in estimating μ. Given n independent observations x_1, ..., x_n from X, the IFME of μ is μ̂ = n^{-1} sum_{i=1}^n x_i, and the corresponding inverse of the Godambe information matrix is J_Ψ^{-1} = Σ. A consistent estimate of J_Ψ^{-1} is

    Σ̂ = (1/n) sum_{i=1}^n (x_i - μ̂)(x_i - μ̂)'.

The jackknife estimate of the Godambe information matrix is

    n V_J = n sum_{i=1}^n (μ̂_(i) - μ̂)(μ̂_(i) - μ̂)',

where μ̂_(i) = (n - 1)^{-1} (n μ̂ - x_i). Some algebraic manipulation leads to

    n V_J = [n² / (n - 1)²] (1/n) sum_{i=1}^n (x_i - μ̂)(x_i - μ̂)',

which is a consistent estimate of Σ. Furthermore, we see that n V_J = [n²/(n - 1)²] Σ̂ ≈ Σ̂, which shows that the jackknife estimate of the Godambe information matrix is also good when the sample size is moderate to small. □

Example 4.9 (Multinormal, common marginal mean) Let X ~ N_d(μ, Σ), where μ = (μ_1, ..., μ_d)' = μ1 and Σ is known. We are interested in estimating the common parameter μ. Given n independent observations x_1, ..., x_n with the same distribution as X, the IFME of μ by the weighting approach is (see Example 4.2)

    μ̂_w = 1'Σ^{-1} μ̂ / (1'Σ^{-1} 1).
The inverse of the Godambe information of μ̂_w is

    J_Ψ^{-1} = 1 / (1'Σ^{-1} 1).

The jackknife estimate of the Godambe information is

    n V_J = n sum_{i=1}^n (μ̂_w(i) - μ̂_w)²,

where μ̂_w(i) = 1'Σ^{-1} μ̂_(i) / (1'Σ^{-1} 1). Some algebraic manipulation leads to

    n V_J = 1'Σ^{-1} [ n sum_{i=1}^n (μ̂_(i) - μ̂)(μ̂_(i) - μ̂)' ] Σ^{-1} 1 / (1'Σ^{-1} 1)².

Replacing n sum_{i=1}^n (μ̂_(i) - μ̂)(μ̂_(i) - μ̂)' with [n²/(n - 1)²] Σ, we obtain

    n V_J ≈ [n²/(n - 1)²] 1'Σ^{-1} Σ Σ^{-1} 1 / (1'Σ^{-1} 1)² = [n²/(n - 1)²] J_Ψ^{-1},

which shows that the jackknife estimate of the Godambe information is also good when the sample size is moderate to small. □

Numerical comparison of the two approaches

In this subsection, we numerically compare the variance estimates of IFME from the jackknife method and from the Godambe information. For this purpose, we use a 3-dimensional probit model with normal copula. The comparison studies are carried out only for the dependence parameters θ_jk. For the chosen model parameters, we carry out N simulations for each sample size n. For each simulation s (s = 1, ..., N) of sample size n, we estimate the model parameters θ12, θ13, θ23 with the IFM approach; denote these estimates θ̂12^(s), θ̂13^(s), θ̂23^(s). We then compute the jackknife estimates of variance (with g groups of size m such that g x m = n) for θ̂12^(s), θ̂13^(s), θ̂23^(s); denote these variance estimates by v12^(s), v13^(s), v23^(s). Let the asymptotic variance estimates of θ̂12, θ̂13, θ̂23 based on the Godambe information matrix from a sample of size n be v12, v13, v23. We compare the following three variance estimates:

(i) MSE: (1/N) sum_{s=1}^N (θ̂12^(s) - θ12)², (1/N) sum_{s=1}^N (θ̂13^(s) - θ13)², (1/N) sum_{s=1}^N (θ̂23^(s) - θ23)²;
(ii) Godambe: v12, v13, v23;
(iii) Jackknife: (1/N) sum_{s=1}^N v12^(s), (1/N) sum_{s=1}^N v13^(s), (1/N) sum_{s=1}^N v23^(s).

The MSE in (i) should be considered as the true variance of the parameter estimate, assuming unbiasedness; (ii) and (iii) should be compared with each other and also with (i). Table 4.27 and Table 4.28 summarize the numerical computation of the variance estimates of θ̂12, θ̂13, θ̂23 based on approaches (i), (ii) and (iii). For the jackknife method, the results for different combinations of (g, m) are reported in the two tables. In total, four models with different marginal parameters z = (z1, z2, z3) and different dependence parameters θ = (θ12, θ13, θ23) are studied; the parameter values are reported in the tables. We have studied two sample sizes, n = 500 and n = 1000; for both sample sizes, the number of simulations is N = 500. From examining the two tables, we see that the three measures are very close to each other. We conclude that the jackknife method is indeed consistent with the Godambe information computation approach, and both approaches yield variance estimates comparable to the MSE.
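The grouped jackknife in (iii) is simple to implement. A minimal Python sketch (the fitting routine `estimate` is a placeholder for, e.g., IFM estimation of one dependence parameter; the variance convention follows Examples 4.8 and 4.9):

    import numpy as np

    def jackknife_variance(data, estimate, g):
        # Delete-one-group jackknife with g groups of size m = n/g.
        data = np.asarray(data)
        n = len(data)
        m = n // g
        full = estimate(data)
        dev2 = 0.0
        for k in range(g):
            mask = np.ones(n, dtype=bool)
            mask[k * m:(k + 1) * m] = False   # drop the k-th block
            dev2 += (estimate(data[mask]) - full) ** 2
        # V_J = sum of squared deviations from the full-sample estimate,
        # as in Examples 4.8-4.9 (g = n, m = 1 is the delete-one jackknife)
        return dev2

    # toy usage: variance of a sample mean with n = 1000, g = 125, m = 8
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    v = jackknife_variance(x, np.mean, g=125)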
In conclusion, we have shown theoretically and demonstrated numerically in several cases that the jackknife method for variance estimation compares very favorably with the Godambe information computation, and we are willing to extrapolate to general situations. The jackknife approach is simple and computationally straightforward (it only requires the code for obtaining the parameter estimates); it also has the advantage of easily handling more complex situations where the Godambe information computation is not possible. One major concern with the jackknife approach is the computational time needed to carry out the whole process. If the computing time problem is due to an extremely large sample size, appropriate grouping of the sample for the jackknife may improve the situation; a discussion is given in Section 2.5. Overall, we recommend the general use of the jackknife approach in applications.

4.6 Summary

In this chapter, we demonstrated analytically and numerically that the IFM approach is an efficient parameter estimation procedure for MCD and MMD models with MUBE or PUBE properties. We have chosen a wide variety of cases so that we can extrapolate this conclusion to the general situation. Theoretically, we expect IFM to be quite efficient because it is closely tied to the MLE in that each inference function is a likelihood score function of some margin. For comparison purposes, we carried out ML estimation for several multivariate models. Our experience was that finding the MLE is a difficult and very time consuming task for multivariate models, while the IFME is computationally simple and results in a significant saving of computing time.

Table 4.27: Comparison of estimates of standard error, (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 1000

z = (0.0, 0.7, 0.0)', θ = (-0.5, 0.5, -0.5)
approach        θ12        θ13        θ23
(i)             0.002079   0.001704   0.002085
(ii)            0.002012   0.001645   0.002012
(iii) (1000,1)  0.002030   0.001646   0.002038
      (500,2)   0.002028   0.001653   0.002043
      (250,4)   0.002025   0.001658   0.002047
      (125,8)   0.002058   0.001653   0.002046
      (100,10)  0.002046   0.001663   0.002046
      (50,20)   0.002089   0.001685   0.002089

z = (0.7, 0.0, 0.7)', θ = (0.5, 0.9, 0.5)
(i)             0.002090   0.000281   0.002200
(ii)            0.002012   0.000295   0.002012
(iii) (1000,1)  0.002026   0.000299   0.002023
      (500,2)   0.002027   0.000300   0.002021
      (250,4)   0.002036   0.000300   0.002035
      (125,8)   0.002056   0.000302   0.002049
      (100,10)  0.002063   0.000301   0.002054
      (50,20)   0.002088   0.000301   0.002067

z = (0.7, 0.7, 0.7)', θ = (0.9, 0.7, 0.5)
(i)             0.000333   0.001218   0.002319
(ii)            0.000295   0.001239   0.002187
(iii) (1000,1)  0.000302   0.001254   0.002208
      (500,2)   0.000303   0.001257   0.002210
      (250,4)   0.000302   0.001260   0.002212
      (125,8)   0.000303   0.001267   0.002216
      (100,10)  0.000305   0.001261   0.002214
      (50,20)   0.000310   0.001252   0.002220

z = (1.0, 0.5, 0.0)', θ = (0.8, 0.6, 0.8)
(i)             0.000821   0.002147   0.000766
(ii)            0.000869   0.002089   0.000666
(iii) (1000,1)  0.000873   0.002129   0.000683
      (500,2)   0.000874   0.002118   0.000683
      (250,4)   0.000877   0.002108   0.000681
      (125,8)   0.000884   0.002119   0.000688
      (100,10)  0.000887   0.002138   0.000687
      (50,20)   0.000899   0.002151   0.000690
Table 4.28: Comparison of estimates of standard error, (i) true, (ii) Godambe, (iii) jackknife with g groups; N = 500, n = 500

z = (0.0, 0.7, 0.0)', θ = (-0.5, 0.5, -0.5)
approach        θ12        θ13        θ23
(i)             0.004158   0.003135   0.004262
(ii)            0.004024   0.003290   0.004024
(iii) (500,1)   0.004085   0.003315   0.004104
      (250,2)   0.004071   0.003333   0.004122
      (125,4)   0.004053   0.003331   0.004119
      (50,10)   0.004115   0.003396   0.004176

z = (0.7, 0.0, 0.7)', θ = (0.5, 0.9, 0.5)
(i)             0.003998   0.000602   0.003768
(ii)            0.004024   0.000591   0.004024
(iii) (500,1)   0.004062   0.000604   0.004049
      (250,2)   0.004062   0.000601   0.004054
      (125,4)   0.004091   0.000607   0.004103
      (50,10)   0.004123   0.000617   0.004171

z = (0.7, 0.7, 0.7)', θ = (0.9, 0.7, 0.5)
(i)             0.000632   0.002688   0.004521
(ii)            0.000591   0.002479   0.004374
(iii) (500,1)   0.000607   0.002501   0.004410
      (250,2)   0.000611   0.002510   0.004425
      (125,4)   0.000616   0.002533   0.004467
      (50,10)   0.000622   0.002539   0.004501

z = (1.0, 0.5, 0.0)', θ = (0.8, 0.6, 0.8)
(i)             0.001634   0.003846   0.001413
(ii)            0.001738   0.004179   0.001332
(iii) (500,1)   0.001821   0.004397   0.001365
      (250,2)   0.001837   0.004407   0.001368
      (125,4)   0.001846   0.004433   0.001360
      (50,10)   0.001876   0.004476   0.001388

We further demonstrated numerically that the jackknife method yields SEs for the IFME which are comparable to the SEs obtained from the Godambe information matrix. The jackknife method for variance estimation has significant practical importance, as it eliminates the need to calculate the partial derivatives required for the Godambe information matrix. The jackknife method can also be used for estimates of functions of parameters (such as probabilities of being in some category, or probabilities of exceedances). The IFM approach, together with jackknife estimation of SEs, makes many more multivariate models computationally feasible for working with real data. The IFM theory, as part of statistical inference theory for multivariate non-normal models, is highly recommended because of its good asymptotic properties and its computational feasibility. This approach should have significant practical usefulness. We demonstrate its application in Chapter 5.

Chapter 5

Modelling, data analysis and examples

Possessing a tool is one thing, but using it effectively is quite another. In this chapter, we explore the possibility of effectively using the tools developed in this thesis for multivariate statistical modelling (including IFM theory, jackknife variance estimation, etc.) and provide data analysis examples. In section 5.1, we first discuss our view of the proper data analysis cycle. This is an important issue, since the interpretation of the results, and possibly the indication of further studies, is directly related to the way the data analysis is carried out. We next discuss several other important issues in multivariate discrete modelling, such as how to make the choice of models and how to check the adequacy of models. We also provide some discussion of the testing of dependence structure hypotheses, which is useful for identifying some specific multivariate models. In section 5.2, we carry out several data analysis examples with the models and inference procedures developed in the previous chapters.
We show some applications of the models and inference procedures developed in this thesis, and point out difficulties related to multivariate nonnormal analysis.

5.1 Some issues on modelling

5.1.1 Data analysis cycle

A proper data analysis cycle usually consists of initial data analysis, statistical modelling, diagnostic model assessment and inference. The initial data analysis may consist of computing various data summaries and examining various graphical representations of the data. The type of summary statistics and graphical representations depends on the basic features of the data set. For example, for binary, ordinal and count data, we can compute the empirical frequencies (and percentages) of response variables as well as covariates, separately and jointly. If some covariates are continuous, then standard summaries such as the mean, median, standard deviation, quartiles, maximum and minimum, as well as graphical displays such as boxplots and histograms, can be examined. To get a rough idea of the dependence among the response variables for binary data, a check of the pairwise log odds ratios of the responses can be helpful. Another convenient empirical pairwise dependence measure for multivariate discrete data, which is particularly useful for ordinal and count data, is a measure called gamma. This measure, for ordinal or count data y_i = (y_i1, y_i2), i = 1, ..., n, is defined as

    γ = (C - D) / (C + D),   (5.1)

where C = sum_{i=1}^n sum_{i'=1}^n I(y_i1 > y_i'1) I(y_i2 > y_i'2) and D = sum_{i=1}^n sum_{i'=1}^n I(y_i1 < y_i'1) I(y_i2 > y_i'2), and I is the indicator function. In (5.1), C can be interpreted as the number of concordant pairs and D as the number of discordant pairs. The gamma measure is studied in Goodman and Kruskal (1954), and is considered a discrete generalization of Kendall's tau for continuous variables. The properties of the gamma measure follow directly from its definition. Like the correlation coefficient, its range is -1 ≤ γ ≤ 1: γ = 1 when the number of discordant pairs D = 0, γ = -1 when the number of concordant pairs C = 0, and γ = 0 when the number of concordant pairs equals the number of discordant pairs. Other dependence measures, as discrete generalizations of Kendall's tau or Spearman's rho, can also be used for ordinal and count responses as well as binary responses. Furthermore, summaries such as means, variances and correlations can also be meaningful and useful for count data.
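A direct O(n²) computation of (5.1) is straightforward; a minimal Python sketch (our own function name, assuming only numpy) is:

    import numpy as np

    def gamma_measure(y1, y2):
        # Goodman-Kruskal gamma for a bivariate sample, per (5.1)
        y1 = np.asarray(y1)[:, None]
        y2 = np.asarray(y2)[:, None]
        c = np.sum((y1 > y1.T) & (y2 > y2.T))   # concordant pairs C
        d = np.sum((y1 < y1.T) & (y2 > y2.T))   # discordant pairs D
        return (c - d) / (c + d)

    # toy usage on two ordinal variables
    g = gamma_measure([1, 2, 2, 3, 1], [1, 3, 2, 3, 2])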
Initial data analysis is particularly important in multivariate analysis, since the structure of multivariate data is much more complicated than that of univariate data, and the initial data analysis results shed light on identifying suitable statistical models.

Statistical modelling usually consists of specification, estimation and evaluation steps. The specification formulates a probabilistic model which is assumed to have generated the observed data. At this stage, to choose appropriate models, relevant questions are: "What is the nature of the data?" and "How have the data been generated?" The chosen models should make sense for the data. The decision concerning which model to fit to a set of data should, if possible, be the result of a prior consideration of what might be a suitable model for the process under investigation, as well as the result of computation. In some situations, a data set may have several suitable alternative models. After obtaining estimation and computation results, model selection can be made based on certain criteria.

Diagnostics consist of assessments of the reliability of the estimates, the fit of the model, and the overall performance of the model. Both the fitting error of the model and, possibly, the prediction error should be studied. We should also bear in mind that a small fitting error often does not imply a small prediction error; sometimes it is necessary to seek a balance between the two. Appropriate diagnostic checking is an important but not easy step in the whole modelling process.

At the inference stage, relevant statements about the population from which the sample was taken can be made based on the statistical modelling (mainly probabilistic models) results from the previous stages. These inferences may concern the explanation of changes in responses over margins or time, the effects of covariates on the probabilities of occurrence, the marginal and conditional behaviour of response variables, and the probability of exceedance, as well as hypothesis testing as suggested by the theory in the application domain, and so on. Some relevant questions are: "How can valid inferences be drawn?", "What interpretation can be given to the estimates?", "Is there a structural interpretation, relating to the underlying theory in the application?", and "Are the results pointing to further studies?"

5.1.2 Model selection

When modelling a data set, usually it is required only that the model provide accurate predictions or capture other relevant aspects of the data, without necessarily duplicating every detail of the real system. A valid model is any model that gives an adequate representation of the system that is of interest to the model user. Often a large number of equally good models exist for a particular data set in terms of the specific inference aspect of interest to the practitioner. Model selection is carried out by comparing alternative models. If a model fits the data approximately as well as other, more complex models, we usually prefer the simple one. There are many criteria for distinguishing between models. One suitable criterion for choosing a model is the associated maximum loglikelihood value.
The A I C considers the principles of maximum likelihood and the model dimensions (of number of parameters) simultaneously, and thus aims for a balance of maximum likelihood value and model complexity. The negative of A I C / 2 is asymp- totically an unbiased estimator of the mean expected loglikelihood (see Sakamoto et al. 1986); thus A I C can be interpreted as an unbiased estimator of the -2 times the expected loglikelihood of the maximum likelihood. The model having minimum A I C should have minimum prediction error, at least asymptotically. In the use of A I C , it is the difference of A I C values that matters and not the actual values themselves. This is because of the fact that A I C is an estimate of the mean expected loglikelihood of a model. If the difference is less than 1, the goodness-of-fit of these models are almost the same. For a detailed account of A I C , see Sakamoto et al. (1986). The A I C was introduced by Akaike (1973) for the purpose of selecting an optimal model from within a set of proposed models (hypotheses). The A I C procedure has been used successfully to identify models; see, for example, Akaike (1977). The selection of models should also be based on the understanding that it is an essential part of modelling to direct the analysis to aspects which are relevant to the context and to omit other aspects of the real world situation which often lead to spurious results. This is also the reason that we have to be careful not to overparameterize the model, since, although this might improve the goodness-of-fit, it is likely to result in the model portraying spurious features of the sampled data, which may detract from the usefulness of the achieved fit and may lead to poor prediction. The Chapter 5. Modelling, data analysis and examples 176 selection of models should also be based on the consideration of the practical importance of the models, which in turn is based on the nature and extent of the models and their contribution to our understanding to the problem. Statistical modelling is often an iterative process. The general process is such that after, a promising member from a family of models is tentatively chosen, parameters in the model are next efficiently estimated; and finally, the success of the resulting fit is assessed. The now precisely defined model is either accepted by this verification stage or the diagnostic checks carried out will find it lacking in certain respects and should then suggest a sensible modified identification. Further estimation and checking may take place, and the cycle of identification, estimation, and verification is repeated until some satisfactory fits obtain. 5.1.3 D i a g n o s t i c c h e c k i n g A model should be judged by its predictive power as well as its goodness-of-fit. Diagnostic checking is a procedure for evaluating to what extent the data support the model. The A I C only compares models through their relative predictive power; it doesn't assess the goodness-of-fit of the model to the data. In multivariate nonnormal analysis, it is not obvious how the goodness-of-fit checking could be carried out. We discuss this issue in the following. There are many conventional ways to check the goodness-of-fit of a model. One direct way to check the model is by means of residuals (mainly for continuous data). A diagnostic check based on residuals consists of making a residual plot of the (standardized) residuals. Another frequently applied approach is to calculate some goodness-of-fit statistics. 
When the checking of residuals is feasible, the goodness-of-fit statistics are often used as a supplement. In multivariate analysis, direct comparison of estimated probabilities with the corresponding empirical probabilities may also be considered as a good and efficient diagnostic checking method. For multivariate binary or ordinal categorical data, a diagnostic check based on residuals of observed data is not meaningful. However statistics of goodness-of-fit are available in these situations. We illustrate the situation here by means of multivariate binary data. For a d-dimensional random binary vector Y with a model P, its sample space contains 2d elements. We denote these by k = 1,..., 2d with k representing the kt\x particular outcome pattern and Pk the corresponding probability, with Efc=i — 1- Assume n is the number of observations and n i , . . . , n2i are the empirical frequencies corresponding to k = 1,.. .,2d. Let Pk be the estimate of Pk for a specified model. Under the hypothesis that the specified model is the true model and with the assumption of Chapter 5. Modelling, data analysis and examples 111 some regularity conditions (e.g. efficient estimates, see Read and Cressie 1988, §4 .1) , Fisher (1924) shows, in the case if Pk depends on one estimated parameter, that the Pearson \ 2 type statistic x 2 = ^ ( n ^ n h l ( 5 . 2 ) is asymptotically chi-squared with 2d — 2 degrees of freedom. If Pk depends on s (s > 1) estimated parameters, then the generalization is that (5.2) is asymptotically chi-squared with 2d — s — 1 degrees of freedom. A more general situation is that Y depends on a covariate of g categories. For each category of the covariate, it has the situation of (5.2). If we assume independence between the categories of the covariate, we can form an overall Pearson x2 type test statistic for the goodness- of-fit of the model as t i t ! nWpW where v is the index of the categories in the covariate. Suppose we estimated s parameters in the model; thus P^ depends on s parameters. Under the hypothesis that the specified model is the true model, the test statistic X2 in (5.3), with some regularity conditions (e.g. efficient estimates), is asymptotically X^2d-i)-3> w n e r e 9 is the number of categories of the covariate, and s is the total number of parameters estimated in the model. Similarly, an overall loglikelihood ratio type statistic G2 = 2 £ E »<"> -og[n^/(n^P^)} (5.4) v=lk=l is also asymptotically X^2<«-i)-»- ^ 2 and G2 are asymptotically equivalent, but there are not the same in finite sample case, so sometimes there is a question of which statistic to choose. Read and Cressie (1988) may shed some light on this matter. The computation of the test statistic X2 or G2 requires the calculation of Pk"\ which may not be easily obtained, depending on the copula associated with the model (for example, it is generally feasible with mixture of max-id copula but only feasible with relatively low dimension for multinormal copula, unless approximations are used). One frequently encountered problem in applications with multivariate binary or ordinal cate- gorical data (also count data) is that when the dimension of the response is relatively high, the empirical frequency for some particular outcomes of the response vector is relatively small or even zero. Thus Pk or P^ would usually be very small for any particular model, and (5.2) or (5.3) with its related statistical inferences are not suitable in these situations. 
What we may still do in terms of goodness-of-fit checking in these situations is to limit the comparison of Pfc(x,) with nfc(x;)/n(x,-) by tables and graphics to outcomes of non-zero frequency (where x,- is the covariate vector), or to Chapter 5. Modelling, data analysis and examples 178 calculate 4o= E (nk lkhk? o r G « = 2 E »*i°g(»*/aO, (5-5) {nfc>a} { " k > « } where hk = J2l=i -Pfc(xt')> where k represent the kth patterns of the response variables, and plot X^ (or G 2 a ^) versus a = {1, 2, 3,4,5} to get a rough idea of how the model fits the non-zero frequency observations. The data obviously support the model if the observed values of X2a^ (or G2a^) go down quickly to zero, while large values indicate potential model departures. Obviously, in any case, some partial assessments using (5.2) or (5.3) may be done for some lower- dimensional margins where frequencies are sufficiently large. Sometimes, these kinds of goodness- of-fit checking may be used to retain a model while (5.5) is not helpful. The statistics in (5.5) and related analysis can be applied to multivariate count data as well. Furthermore, a diagnostic check based on the residuals of the observed counts is also meaningful. If there are no covariates, quick and overall residual checking cab be based on examining iij =yij-Y,[Yij\Yii-i~9] (5.6) for a particular fixed j, where Y t ] _ j means the response vector Y , with the j th margin omitted. The model is considered as adequate based on residual plot in terms of goodness-of-fit if the residuals are small and do not exhibit systematic patterns. Note that the computation of E f Y i j l Y ^ - j , ^ ] may not be a simple task when the dimension d is large (e.g d > 3). Another rough check of the goodness-of-fit of a model for multivariate count data is to compare the empirical marginal means, variances and pairwise correlation coefficients with the corresponding means, variances and pairwise correlation coefficients calculated from the fitted model. In principle, a model can be forced to fit the data increasingly well by increasing its number of parameters. However, the fact that the fitting errors are small is no guarantee that the prediction errors will be. Many of the terms in a complex model may simply be accounting for noise in the data. The overfitted models may predict future values quite poorly. Thus to arrive at a model which represents only the main features of the data, selection and diagnostic criteria which balance model complexity and goodness-of-fit must be used simultaneously. As we have discussed, often there are many relevant models that provide an acceptable approximation to reality or data. The purpose of statistical modelling is not to get the "true" model, but rather to obtain one or several models which extract the most information and better serve the inference purposes. Chapter 5. Modelling, data analysis and examples 179 5.1.4 T e s t i n g t h e d e p e n d e n c e s t r u c t u r e We next discuss a topic related to model identification. Short series of longitudinal data or repeated measures with many subjects often exhibit highly structured pattern of dependence structure, with the dependence usually becoming weaker as the time separation (if the observation point is time) increases. Valid inferences can be made by borrowing strength across subjects. That is, the consis- tency of a pattern across subjects is the basis for substantive conclusions. 
For this reason, inferences from longitudinal or repeated measures studies can be made more robust to model assumptions than those from time series data, particularly to assumptions about the nature of the dependence. There are many possible structures for longitudinal or repeated measures type dependence. The exchangeable or AR(l)-like dependence structures are the simplest. But in a particular situation, how to test to see if a particular dependence structure is more plausible? The A I C for model comparison may be a useful index. In the following, we provide an alternative approach for testing special dependence structures. For this purpose, we first give a definition and state two results that we are going to use in the later development. A reference for these materials is Rao (1973). D e f i n i t i o n 5.1 (Genera l ized inverse of a matr ix ) A generalized inverse of an n x m matrix A of any rank is an m x n matrix denoted by A~ which satisfies the following equality: AA~ A = A. • Resul t 5.1 (Spectral decomposi t ion theorem) Let A be a real n x n symmetric matrix. Then there exists an orthogonal matrix Q such that Q'AQ is a diagonal matrix whose diagonal elements Ai > A 2 > • • • > A„ are the characteristic roots of A, that is / Ai 0 ••• 0\ 0 A 2 ••• 0 Q'AQ V 0 0 • •• A„ / • Resul t 5.2 / / X ~ Np(p, Ex) , and E x is positive semidefinite, then a set of necessary and sufficient conditions for X ' A X ~ xl{&2) is (i)tr(AL-x) = r and p.'Ap = S2, (ii) E X A E X ^ E X = E X A E X , (iii) p!AH-ynAp = p'Ap, (iv) p'(AT,x)2 = p'AH. X2{&2) denotes the non-central chi-square distribution with noncentality parameters2. • Chapter 5. Modelling, data analysis and examples 180 In the following, we are going to build up a general statistical test, which in turn can be used to test exchangeable or AR(l)-type dependence assumptions. Suppose X ~ Np(p, £ x ) where E x is known. We want to test if / i = pi, where p is a constant. Let a = E ^ l / l ' E x 1 ! , then X - a'Xl = {Xi - a'X,..., Xp - a'X)' = B X , where B = I - H'E^1/l"E^l, and / is the identity matrix. Thus BX ~ Np(Bp, BZXB'). It is easy to see that Rank(B) = p — 1, it implies that Rank(BExB') = p — 1. By Result 5.1, there is an orthogonal matrix Q, such that ^ A j . . . 0 0^ J 3 E X B ' = Q 0 \ 0 A p - i 0 0 0/ Q', where Ai > A 2 > • • • > A p _ i > 0. Let A = Q 0 V o 0 0 \ 0 • o 1 / then A is a full rank matrix. It is also easy to show that A is a generalized inverse of 5Ex-B ' , and all the conditions in Result 5.2 are satisfied, we thus have X'B'ABX-xl-iV2), where S2 = p'B'ABp, S2 > 0. b2 = 0 is true if and only if Bp = 0, and this in turn is true if and only if p = pi, that is p should be an equal constant vector. Thus under the null hypothesis p = pi, we should have X'B'ABX-xl-i, where xP-i means central chi-square distribution with p — 1 degrees of freedom. Now we use an example to illustrate the use of above results. Example 5.1 Suppose we choose the multivariate logit model with multinormal copula (3.1) with correlation matrix 0 = (Ojk) to model the d-dimensional binary observations y 1 ; . . . , y „ . We want to know if an exchangeable (that is Ojk = 0 for all 1 < < k < d and for some \0\ < 1) or an AR(1) Chapter 5. Modelling, data analysis and examples 181 (that is 9jk = 0^ for all 1 < j < k < d and for some |t9| < 1) correlation matrix in the multinormal copula is the suitable assumptions. The above results can be used to test these assumptions. 
Let $W be the I F M E of 0 from the (j,k) bivariate margin, and ~6 = (0~(12\ 0~(13\ ..., fifa-1.*)). By Theorem 2.4, we have asymptotically ~6~ Nd{d_1)/2(01,X0), where is the inverse of Godambe information matrix of 0. Thus under the exchangeable or