UBC Faculty Research and Publications

The Estimation Method of Inference Functions for Margins for Multivariate Models Joe, Harry; Xu, James Jianmeng Oct 31, 1996

The Estimation Method of Inference Functions for Margins for Multivariate Models

Harry Joe and James J. Xu
Department of Statistics, University of British Columbia

ABSTRACT

An estimation approach is proposed for models for a multivariate (non-normal) response with covariates, when each of the parameters (either a univariate or a dependence parameter) of the model can be associated with a marginal distribution. The approach consists of estimating univariate parameters by separately maximizing univariate likelihoods, and then estimating dependence parameters from separate bivariate likelihoods or from a multivariate likelihood. The analysis of this method is done through the theory of inference or estimating functions, and the jackknife method is proposed for obtaining standard errors of the parameters and of functions of the parameters. The approach proposed here makes a large contribution to the computational feasibility of carrying out inference with multivariate models. Examples illustrate the approach, and simulation results are used to indicate the efficiency.

Key words: estimating equations, inference functions, copula, jackknife for variance estimation, likelihood, multivariate non-normal response, marginal distributions.

1. Introduction.

This report is concerned with an estimation approach that can be applied to parametric multivariate models in which each parameter of the model can be associated with a lower-dimensional margin. By a multivariate model, we mean a model for a multivariate response vector Y = (Y1, . . . , Yd) with covariate vector x. The parameters are either univariate parameters or dependence parameters, and the approach is aimed primarily at a multivariate non-normal response. For a sample of size n, the multivariate data have the form (Yi, xi) for the ith subject, with Yi = (Yi1, . . . , Yid), where d is the dimension of the response vector.
If the Yij's are repeated measures or observed sequentially in time, then more generally we could have a time-varying or margin-dependent covariate vector.

The set of parameters of the model is estimated through a (nonlinear) system of estimating equations, with each estimating equation being a score function (partial derivative of a log-likelihood) from some marginal distribution of the multivariate model. This method is called the inference functions for margins (IFM) method because the theory of inference functions (Godambe 1960, 1976, 1991; McLeish and Small, 1988) imposes optimality criteria on the functions in the estimating equations rather than on the estimators obtained from them.

We combine this method with the use of the jackknife for estimation of the standard errors of the parameters and of functions of the parameters. This eliminates the need for analytic derivatives to obtain the inverse Godambe information matrix, or asymptotic covariance matrix, associated with the vector of parameter estimates. In addition, because each inference function derives from some log-likelihood of a marginal distribution, the inference or score functions do not have to be obtained explicitly (that is, derivatives are not needed) if the numerical optimizations of the log-likelihoods of the marginal distributions are done with a quasi-Newton routine (e.g., Nash, 1990). The idea of using the jackknife with estimating equations is quite new; previously, Lipsitz, Dear and Zhao (1994) used a one-step jackknife with estimating equations in a context of clustered survival data.

The estimation method applies to models in which the univariate margins are separated from the dependence structure, for example, when the dependence is summarized through a copula. If F is a d-variate distribution or model, with univariate margins F1, . . . , Fd, then there is a copula or multivariate U(0, 1) distribution C such that

F(y1, . . . , yd) = C(F1(y1), . . . , Fd(yd)).
(1.1)

(See Joe, 1993, and Joe and Hu, 1996, for parametric families of copulas with nice properties.) This includes the case of multivariate extreme value distributions, in which the univariate margins are each in the generalized extreme value family and the copula is in the multivariate extreme value class (C satisfies C(u1^t, . . . , ud^t) = C^t(u1, . . . , ud) for all t > 0). Other examples are models in which the multivariate normal distribution is used as a latent vector distribution, for example, the multivariate probit model for multivariate binary or ordinal data (Ashford and Sowden, 1970; Anderson and Pemberton, 1985) and the multivariate Poisson-lognormal distribution for count data (Aitchison and Ho, 1989).

The use of the IFM method with the jackknife for estimation of standard errors makes many more multivariate models computationally feasible to work with for data analysis and inference. The previous lack of this method may explain why models such as the multivariate probit and multivariate Poisson-lognormal models have not been much used (see Lesaffre and Molenberghs, 1991, on the former model).

The details of the IFM method and some models for which the method applies are given in Section 2. To concentrate on the ideas of the method and to avoid cumbersome notation, we do not present it in its most general possible form. Section 3 summarizes some advantages of the IFM method. Section 4 contains asymptotic results associated with the IFM method, including two ways of handling covariates. Section 5 has some simulation results showing the efficiency of the estimates from the IFM method compared with the classical maximum likelihood estimation method. Section 6 has a couple of brief examples involving data sets to illustrate the use of the IFM method. The paper concludes with a discussion in Section 7, which includes extensions of the concepts in Section 2.

For notation, vectors are usually assumed to be row vectors unless otherwise stated.
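The copula construction in (1.1) is easy to illustrate numerically. The following is a minimal sketch (assuming numpy and scipy are available; the choice of margins, parameter values, and function names are illustrative, not from the paper) of a bivariate cdf built by plugging two arbitrary univariate cdfs into the MVN (Gaussian) copula:

```python
import numpy as np
from scipy.stats import norm, expon, multivariate_normal

# Sketch of (1.1): a bivariate cdf F(y1, y2) = C(F1(y1), F2(y2)), with C
# the Gaussian copula with correlation rho, and two illustrative margins
# (one normal, one exponential).

def gaussian_copula_cdf(u, rho):
    """C(u1, u2; rho) = Phi_rho(Phi^{-1}(u1), Phi^{-1}(u2))."""
    z = norm.ppf(u)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)

def joint_cdf(y1, y2, rho):
    u = np.array([norm.cdf(y1, loc=1.0, scale=2.0),   # F1: N(1, 4) margin
                  expon.cdf(y2, scale=3.0)])          # F2: exponential margin
    return gaussian_copula_cdf(u, rho)

# With rho = 0, C is the independence copula, so the joint cdf factors
# into the product of the two univariate cdfs.
print(joint_cdf(1.0, 3.0, 0.0))
```

The same pattern extends to d dimensions and to other copula families; only the function C and the list of margins change.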
2. Inference method for margins and jackknife.

The inference functions for margins (IFM) method is useful for models with the closure property that parameters are associated with, or can be expressed in, lower-dimensional margins (Xu, 1996). Here, to concentrate on the ideas, we demonstrate the main details for a parametric model of the form (1.1), first without covariates and then with covariates. This is followed by its application to a few models that have appeared in the statistical literature. From this, we expect that the reader can see the general applicability of the method.

Consider a copula-based parametric model for the random vector Y, with cumulative distribution function (cdf)

F(y; α1, . . . , αd, θ) = C(F1(y1; α1), . . . , Fd(yd; αd); θ),    (2.1)

where F1, . . . , Fd are univariate cdfs with respective (vector) parameters α1, . . . , αd, and C is a family of copulas parametrized by a (vector) parameter θ. We assume that C has a density c (mixed derivative of order d). The vector Y could be discrete or continuous. In the former case, the joint probability mass function f(·; α1, . . . , αd, θ) for Y can be derived from the cdf in (2.1), and we let the univariate marginal probability mass functions be denoted by f1, . . . , fd; in the latter case, we assume that Fj has density fj for j = 1, . . . , d, and that Y has density

f(y; α1, . . . , αd, θ) = c(F1(y1; α1), . . . , Fd(yd; αd); θ) ∏_{j=1}^d fj(yj; αj).

For a sample of size n, with observed random vectors y1, . . . , yn, we can consider the d log-likelihood functions for the univariate margins,

Lj(αj) = Σ_{i=1}^n log fj(yij; αj),  j = 1, . . . , d,

and the log-likelihood function for the joint distribution,

L(θ, α1, . . . , αd) = Σ_{i=1}^n log f(yi; α1, . . . , αd, θ).    (2.2)

A simple case of the IFM method consists of doing d separate optimizations of the univariate likelihoods, followed by an optimization of the multivariate likelihood as a function of the dependence parameter vector. More specifically:

a.
the log-likelihoods Lj of the d univariate margins are separately maximized to get estimates α̃1, . . . , α̃d;

b. the function L(θ, α̃1, . . . , α̃d) is maximized over θ to get θ̃.

That is, under regularity conditions, (α̃1, . . . , α̃d, θ̃) is the solution of

(∂L1/∂α1, . . . , ∂Ld/∂αd, ∂L/∂θ) = 0.    (2.3)

This procedure is computationally simpler than estimating all of the parameters α1, . . . , αd, θ simultaneously from L in (2.2); a single numerical optimization with many parameters is much more time-consuming than several numerical optimizations, each with fewer parameters.

If the copula model (2.1) has further structure, such as a parameter associated with each bivariate margin, simplifications of the second step (b) can be made, so that no numerical optimization with a large number of parameters is needed. For example, with a multivariate normal latent distribution, there is a simplification as shown in the example given later in this section; the correlation parameters can be estimated from separate likelihoods of the bivariate margins.

If it is possible to maximize L to get estimates α̂1, . . . , α̂d, θ̂, then one could compare these with the estimates α̃1, . . . , α̃d, θ̃ as an estimation consistency check to evaluate the adequacy of the copula. For comparison with the IFM method, the maximum likelihood estimate (MLE) refers to (α̂1, . . . , α̂d, θ̂). Under regularity conditions, this comes from solving

(∂L/∂α1, . . . , ∂L/∂αd, ∂L/∂θ) = 0;

contrast this with (2.3). Note that for multivariate normal (MVN) distributions, consisting of the MVN copula with correlation matrix θ = R and N(µj, σj²) univariate margins [αj = (µj, σj²)], we have α̂j = α̃j, j = 1, . . . , d, and θ̂ = θ̃. This equivalence of the estimators generally does not hold. Possibly because the MVN distribution is dominant in multivariate statistics, attention has not been given to variations of maximum likelihood estimation for multivariate models.

Since it is computationally easier to obtain (α̃1, . . .
, α̃d, θ̃), a natural question is its asymptotic efficiency compared with (α̂1, . . . , α̂d, θ̂). Outside of efficiency considerations, the former set of estimates provides a good starting point for the latter if one needs and can get (α̂1, . . . , α̂d, θ̂). Approximations leading to the asymptotic distribution of (α̃1, . . . , α̃d, θ̃) are given in Section 4. From this, one can (numerically) compare the asymptotic covariance matrices of (α̃1, . . . , α̃d, θ̃) and (α̂1, . . . , α̂d, θ̂). Also, an estimate of the asymptotic covariance matrix of (α̃1, . . . , α̃d, θ̃) can be obtained. The theory is a special case of using a set of estimating equations to estimate a vector of parameters. Some results on the efficiency for specific models are given in Section 5.

The extension to covariates is straightforward if one allows some or all of the parameters to be functions of the covariates. There are standard ways to handle the inclusion of covariates for univariate parameters, but not so for the inclusion of covariates for dependence parameters. As before, one has the likelihoods L1, . . . , Ld and L, but they are functions of more parameters, such as regression parameters for the covariates. Results for the asymptotics of the parameter estimates with covariates are given in Section 4. This is less straightforward than the independent and identically distributed (iid) case.

We next give some examples to make the above discussion more concrete, and then conclude the section with the use of the jackknife method for estimation of standard errors.

Example 2.1. (Multivariate probit model with no covariates.) The multivariate probit model for the multivariate binary response vector Y has stochastic representation Yj = I(Zj ≤ αj), j = 1, . . . , d, where Z ∼ Nd(0, R) and θ = R = (ρjk) is a correlation matrix. Let the data be yi = (yi1, . . . , yid), i = 1, . . . , n. For j = 1, . . . , d, let Nj(0) and Nj(1) be the number of 0's and 1's among the yij's.
For 1 ≤ j < k ≤ d, let Njk(i1, i2) be the frequency of (i1, i2) among the pairs (yij, yik), for (i1, i2) equal to (0,0), (0,1), (1,0) and (1,1). From the jth univariate likelihood

L∗j(αj) = [Φ(αj)]^{Nj(1)} [1 − Φ(αj)]^{Nj(0)},

one gets α̃j = Φ^{−1}(Nj(1)/n), j = 1, . . . , d. For this example, one can estimate the dependence parameters from separate bivariate likelihoods L∗jk(ρjk) = L∗jk(ρjk, αj, αk) rather than using the d-variate log-likelihood L in (2.2). Let Φρ be the bivariate standard normal cdf with correlation ρ. Then

L∗jk(ρjk, αj, αk) = [Φρjk(αj, αk)]^{Njk(11)} [Φ(αj) − Φρjk(αj, αk)]^{Njk(10)} [Φ(αk) − Φρjk(αj, αk)]^{Njk(01)} · [1 − Φ(αj) − Φ(αk) + Φρjk(αj, αk)]^{Njk(00)},

and ρ̃jk is the root ρ of Njk(11)/n = Φρ(α̃j, α̃k).

Example 2.2. (Multivariate probit model with covariates.) Let the data be (Yi, xi), i = 1, . . . , n, where Yi is a d-variate response vector and xi is a covariate vector. The multivariate probit model has stochastic representation Yij = I(Zij ≤ βj0 + βj xi^T), j = 1, . . . , d, i = 1, . . . , n, where the Zi = (Zi1, . . . , Zid) are iid with distribution Nd(0, R), and R = (ρjk). In the usual multivariate probit model, the dependence parameters ρjk are not functions of covariates; however, the IFM method works if they are (compatible) functions of the covariates. With γj = (βj0, βj), the jth univariate likelihood is

L∗j(γj) = ∏_{i=1}^n [Φ(βj0 + βj xi^T)]^{yij} [1 − Φ(βj0 + βj xi^T)]^{1−yij}.

For 1 ≤ j < k ≤ d, the (j, k) bivariate likelihood is

L∗jk(ρjk, γj, γk) = ∏_{i=1}^n { [Φρjk(βj0 + βj xi^T, βk0 + βk xi^T)]^{I(yij=yik=1)} · [Φ(βj0 + βj xi^T) − Φρjk(βj0 + βj xi^T, βk0 + βk xi^T)]^{I(yij=1, yik=0)} · [Φ(βk0 + βk xi^T) − Φρjk(βj0 + βj xi^T, βk0 + βk xi^T)]^{I(yij=0, yik=1)} · [1 − Φ(βj0 + βj xi^T) − Φ(βk0 + βk xi^T) + Φρjk(βj0 + βj xi^T, βk0 + βk xi^T)]^{I(yij=yik=0)} }.

For j = 1, . . . , d, let γ̃j be the IFM estimate from maximizing L∗j. Then for 1 ≤ j < k ≤ d, ρ̃jk is the IFM estimate from maximizing L∗jk(ρjk, γ̃j, γ̃k) as a function of ρjk.
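The two IFM steps of Example 2.1 can be sketched in a few lines of code. This is a minimal illustration (assuming numpy and scipy; the function name and the simulated design are ours, not the paper's): step (a) uses the closed form α̃j = Φ^{−1}(Nj(1)/n), and step (b) solves the bivariate equation Njk(11)/n = Φρ(α̃j, α̃k) in ρ for each pair (j, k).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

# IFM for the multivariate probit model of Example 2.1 (binary data, no
# covariates).  y is an n x d binary matrix.

def ifm_probit(y):
    n, d = y.shape
    # Step (a): alpha_j = Phi^{-1}(N_j(1)/n), closed-form univariate MLEs.
    alpha = norm.ppf(y.mean(axis=0))
    # Step (b): for each pair (j, k), solve N_jk(11)/n = Phi_rho(a_j, a_k).
    rho = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            p11 = np.mean((y[:, j] == 1) & (y[:, k] == 1))
            def eqn(r):
                cov = np.array([[1.0, r], [r, 1.0]])
                return multivariate_normal([0.0, 0.0], cov).cdf(
                    [alpha[j], alpha[k]]) - p11
            rho[j, k] = rho[k, j] = brentq(eqn, -0.99, 0.99)
    return alpha, rho

# Usage on data simulated from the model itself (true cutoffs 0, rho 0.6):
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]])
z = rng.multivariate_normal(np.zeros(3), R, size=2000)
y = (z <= 0.0).astype(int)
alpha_hat, rho_hat = ifm_probit(y)
```

For the covariate version in Example 2.2, step (a) becomes a probit regression for each margin and step (b) a one-parameter numerical maximization of L∗jk, but the overall structure is the same.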
Examples 2.1 and 2.2 extend to multivariate probit models for ordinal data, in which there are more univariate cutoff parameters. With (Z1, . . . , Zd) having the Nd(0, R) distribution, the stochastic representation for (Y1, . . . , Yd) is: Yj = m if βj,m−1 < Zj ≤ βj,m, m = 1, . . . , rj, where rj is the number of categories of the jth variable, j = 1, . . . , d. (Without loss of generality, assume βj,0 = −∞ and βj,rj = ∞ for all j.) There are Σ_{j=1}^d (rj − 1) univariate parameters or cutoff points. If there is a covariate vector x, then the parameters βj,m and ρjk can depend on x, with constraints such as βj,m−1(x) < βj,m(x).

Example 2.3. (Multivariate Poisson-lognormal distribution.) Suppose we have a random sample of iid random vectors yi, i = 1, . . . , n, from the probability mass function

f(y; µ, Σ) = ∫_{[0,∞)^d} ∏_{j=1}^d p(yj; λj) g(λ; µ, Σ) dλ1 · · · dλd,  yj = 0, 1, 2, . . . ,

where p(y; λ) = e^{−λ} λ^y / y!, λ = (λ1, . . . , λd), log λ = (log λ1, . . . , log λd), and

g(λ; µ, Σ) = (2π)^{−d/2} (λ1 · · · λd)^{−1} |Σ|^{−1/2} exp{−0.5 (log λ − µ) Σ^{−1} (log λ − µ)^T},  λj > 0, j = 1, . . . , d,

is a d-variate lognormal density, with mean vector µ = (µ1, . . . , µd) and covariance matrix Σ = (σij) for the underlying normal distribution. This model uses the multivariate normal distribution for a latent vector, so that parameters can be estimated from log-likelihoods of univariate and bivariate margins, after reparametrizing Σ into the vector of variances (σ11, . . . , σdd) and a correlation matrix R = (ρjk). Let αj = (µj, σjj), j = 1, . . . , d. The jth univariate marginal density is

fj(yj; αj) = ∫_0^∞ [e^{−λj} λj^{yj} / yj!] · [λj (2π σjj)^{1/2}]^{−1} exp{−0.5 (log λj − µj)² / σjj} dλj.

The (j, k) bivariate marginal density is

fjk(yj, yk; αj, αk, ρjk) = ∫_0^∞ ∫_0^∞ [e^{−λj} λj^{yj} e^{−λk} λk^{yk} / (yj! yk!)] · [2π λj λk (σjj σkk − σjk²)^{1/2}]^{−1} · exp{−0.5 (1 − ρjk²)^{−1} [(log λj − µj)² / σjj + (log λk − µk)² / σkk − 2 ρjk (log λj − µj)(log λk − µk) / (σjj σkk)^{1/2}]} dλj dλk,

where ρjk = σjk / (σjj σkk)^{1/2}.
Hence α̃j is obtained by maximizing Lj(αj) = Σ_{i=1}^n log fj(yij; αj), and ρ̃jk is obtained by maximizing Ljk(ρjk, α̃j, α̃k) = Σ_{i=1}^n log fjk(yij, yik; α̃j, α̃k, ρjk) as a function of ρjk.

A simple extension to include covariates is to make the parameters µj linear in the covariates for j = 1, . . . , d.

Example 2.4. (Multivariate extreme value models.) In this example, we indicate how the IFM method applies to some parametric multivariate extreme value models. Other than one simple exchangeable dependence copula model, we do not state multivariate extreme value models here, in order to save space (see Joe, 1994, and references therein for details). The multivariate extreme value models have generalized extreme value distributions as univariate margins, that is,

Fj(yj; αj) = exp{−(1 + γj [yj − µj] / σj)_+^{−1/γj}},  −∞ < yj < ∞,  −∞ < γj, µj < ∞,  σj > 0,

where αj = (γj, µj, σj) and x_+ = max{0, x}. In Tawn (1990) and Joe (1994), the initial data analysis consisted of fitting the univariate margins separately to get α̃j, j = 1, . . . , d, followed by comparison of the models for the dependence structure with the univariate margins fixed at the estimated parameter values α̃j. One way to compare various models is from the AIC (Akaike information criterion) values associated with the log-likelihoods L(θ̃, α̃1, . . . , α̃d). However, if d is large, one can estimate the dependence parameters in θ more easily with the IFM method.

The sequence of log-likelihoods of marginal distributions is indicated for several families. For a multivariate extreme value copula C with one (exchangeable) dependence parameter θ, such as

C(u1, . . . , ud; θ) = exp{−[Σj (− log uj)^θ]^{1/θ}},  θ ≥ 1,

one can estimate θ by maximizing L(θ, α̃1, . . . , α̃d) as a function of θ. For other copulas with more dependence parameters, one can use log-likelihoods of marginal distributions. Examples are:

1.
The Hüsler and Reiss (1989) model, which derives from a non-standard extreme value limit of the multivariate normal distribution, has a dependence parameter associated with each bivariate margin, and it is closed under the taking of margins; that is, θ = (θjk : 1 ≤ j < k ≤ d). For 1 ≤ j < k ≤ d, the parameter θjk can be estimated by maximizing the (j, k) bivariate log-likelihood Ljk(θjk, α̃j, α̃k) as a function of θjk.

2. Joe (1994) has a class of models which have a dependence parameter associated with each bivariate margin, but some parameters are interpreted as conditional dependence parameters. In the d-dimensional case, the estimation of θ = (θjk : 1 ≤ j < k ≤ d) is most easily obtained by maximizing the following sequence of bivariate log-likelihoods, each in one parameter:

Lj,j+1(θj,j+1, α̃j, α̃j+1),  j = 1, . . . , d − 1,
Lj,j+2(θj,j+2, α̃j, α̃j+2, θ̃j,j+1, θ̃j+1,j+2),  j = 1, . . . , d − 2,
· · ·
L1d(θ1d, α̃1, α̃d, θ̃jk, (j, k) ≠ (1, d)).

There are many other models for which the IFM method can be used (Xu, 1996). However, the above examples should illustrate the general usefulness of the method for parametric multivariate models that have some closure properties of margins and nice dependence interpretations for the multivariate parameters. Two examples with data sets that have appeared in the literature are given in Section 6.

Under some regularity conditions, the parameter estimate η̃ = (α̃1, . . . , α̃d, θ̃) can be shown to be asymptotically multivariate normal, using the theory of inference functions or estimating equations (Godambe 1960, 1976, 1991). The asymptotic covariance matrix n^{−1}V involves derivatives of the inference functions (which for us are score functions from some log-likelihood functions). To avoid the tedious taking and coding of derivatives (especially if symbolic manipulation software cannot be used), we propose the use of the jackknife method to estimate the asymptotic covariance matrix and to obtain standard errors for functions of the parameters.
(These functions include probabilities of being in some category and probabilities of exceedances.) Hence, with the jackknife and the use of a quasi-Newton method for numerical optimization, only the log-likelihood functions of some marginal distributions must be coded.

We present the jackknife in the case of no covariates; however, it extends easily to the case of covariates. For the delete-one jackknife, let η̃(i) be the estimator of η = (α1, . . . , αd, θ) with the ith observation Yi deleted, i = 1, . . . , n. Assuming η̃ and the η̃(i)'s are row vectors, the jackknife estimator of n^{−1}V is

Σ_{i=1}^n (η̃(i) − η̃)^T (η̃(i) − η̃).

For large samples, in which the delete-one jackknife would be computationally too time-consuming, the jackknife can be modified into estimates from deletions of more than one observation at a time. Suppose n = gm, with g groups or blocks of size m; g estimators can be obtained, with the kth estimate based on the n − m observations remaining after deleting the m observations in the kth block. (It is probably best to randomize the n observations into the g blocks. The disadvantage is that the jackknife estimates then depend on the randomization; however, reasonable estimates of standard errors should still be obtained, and there are approximations in estimating standard errors for any method.) Now let η̃(k) be the estimator of η with the kth block deleted, k = 1, . . . , g. We think of m as fixed with g → ∞. The jackknife estimator of n^{−1}V, which is asymptotically consistent, is

Vn,g = Σ_{k=1}^g (η̃(k) − η̃)^T (η̃(k) − η̃).    (2.5)

The jackknife method can also be used for estimates of functions of parameters. The delta or Taylor method requires partial derivatives (of the function with respect to the parameters); the jackknife method eliminates the need for this. As above, let η̃(k) be the estimator of η with the kth block deleted, k = 1, . . . , g, and let η̃ be the estimator based on the entire sample. Let b(η) be a (real-valued) quantity of interest.
From each subsample, compute an estimate of b(η); i.e., b(η̃(k)), k = 1, . . . , g, and b(η̃) are obtained. The jackknife estimate of the standard error of b(η̃) is

[Vn,g(b)]^{1/2} = {Σ_{k=1}^g [b(η̃(k)) − b(η̃)]²}^{1/2}.    (2.6)

What the above means for the computing sequence of the jackknife method is that one should maintain a table of the parameter estimates for the full sample and for each jackknife subsample. One can then use this table for computing estimates of one or more functions of the parameters, together with the corresponding standard errors.

The use of the jackknife for functions of parameters also gets around the problem of finding the 'optimal' parametrization for speed of convergence to asymptotic normality (and hence for the estimate of the asymptotic covariance matrix used in the delta method).

3. Some advantages of the IFM method.

In this brief section, we summarize some advantages of the IFM method. In the discussion below, the ML method refers to maximum likelihood estimation of all parameters simultaneously, as in (2.2).

1. The IFM method makes inference for many multivariate models computationally feasible.

2. More statistical models exist for univariate and bivariate distributions, so the method allows one to do inference and modelling starting with univariate and lower-dimensional margins.

3. It allows one to compare models for the dependence structure and to do a sensitivity analysis of the models for predictions and inferences. There is some robustness against misspecification of the dependence structure. There should also be more robustness against outliers or perturbations of the data, compared with the ML method.

4. Sparse multivariate data can create problems for the ML method, but the IFM method avoids the sparseness problem to a certain degree, especially if all parameters can be estimated from univariate and bivariate likelihoods; this could be a major advantage in small-sample situations.
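The grouped jackknife of (2.5) and (2.6) is model-free given a routine that computes the estimate from a data set. The following is a minimal sketch (assuming numpy; the estimator, the block scheme, and all names are illustrative, not tied to any particular IFM model):

```python
import numpy as np

# Grouped (delete-m) jackknife, as in (2.5)-(2.6).  `estimate` maps an
# (n x d) data array to a parameter row vector; in the IFM setting it
# would wrap the sequence of marginal-likelihood maximizations.

def jackknife_cov(y, estimate, g):
    """Estimate of n^{-1} V in (2.5), using g blocks of size m = n/g."""
    n = y.shape[0]
    m = n // g
    eta = np.atleast_1d(estimate(y))
    diffs = []
    for k in range(g):
        keep = np.ones(n, dtype=bool)
        keep[k * m:(k + 1) * m] = False           # delete the kth block
        diffs.append(np.atleast_1d(estimate(y[keep])) - eta)
    diffs = np.array(diffs)
    return diffs.T @ diffs       # sum_k (eta_(k) - eta)^T (eta_(k) - eta)

def jackknife_se(y, estimate, g, b):
    """Standard error of b(eta_tilde) as in (2.6); no derivatives of b."""
    n = y.shape[0]
    m = n // g
    b_full = b(np.atleast_1d(estimate(y)))
    ss = 0.0
    for k in range(g):
        keep = np.ones(n, dtype=bool)
        keep[k * m:(k + 1) * m] = False
        ss += (b(np.atleast_1d(estimate(y[keep]))) - b_full) ** 2
    return np.sqrt(ss)

# Toy check: for the sample mean of iid data, the jackknife variance
# should be close to (sample variance)/n.
rng = np.random.default_rng(1)
y = rng.normal(size=(500, 1))
v = jackknife_cov(y, lambda a: a.mean(axis=0), g=100)
```

As described in the text, one pass over the g subsamples yields the full table of estimates, from which variances and standard errors of any function b can then be computed.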
4. Summary of asymptotic results.

In this section, we present asymptotic results for the case of iid response vectors with no covariates, and then look at two approaches for the extension to include covariates. In the last subsection, results on the consistency of the jackknife are given.

When we refer to a density, we mean a density with respect to an appropriate measure space. The density is a probability mass function in the discrete case, but we use a common notation and terminology to cover both the continuous and discrete response cases.

4.1. Independent and identically distributed case.

Throughout this subsection, we assume that the usual regularity conditions (see, for example, Serfling, 1980) for asymptotic maximum likelihood theory hold for the multivariate model as well as for all of its margins. A unification of the variations of the IFM method is through inference or estimating functions, in which each function is a score function, or the (partial) derivative of a log-likelihood, of some marginal density. Let η = (α1, . . . , αd, θ) be the row vector of parameters and let ψ be a row vector of inference functions of the same dimension as η.

Let Y, Y1, . . . , Yn be iid with density f(·; η). Suppose the row vector of estimating equations for the estimator η̃ is

Σ_{i=1}^n ψ(Yi, η̃) = 0.    (4.1)

Let ∂ψ^T/∂η be the matrix with (j, k) component ∂ψj(y, η)/∂ηk, where ψj is the jth component of ψ and ηk is the kth component of η. From an expansion of (4.1), similar to the derivation of the asymptotic distribution of an MLE, under the regularity conditions, the asymptotic distribution of n^{1/2}(η̃ − η)^T is equivalent to that of

{−E[∂ψ^T(Y, η)/∂η]}^{−1} n^{−1/2} Σ_{i=1}^n ψ^T(Yi, η).

Hence it has the same asymptotic distribution as {−E[∂ψ^T(Y, η)/∂η]}^{−1} Z^T, where Z ∼ N(0, Cov(ψ(Y; η))). That is, the asymptotic covariance matrix of n^{1/2}(η̃ − η)^T, called the inverse Godambe information matrix, is V = Dψ^{−1} Mψ (Dψ^{−1})^T, where Dψ = E[∂ψ^T(Y, η)/∂η] and Mψ = E[ψ^T(Y, η) ψ(Y, η)].
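The inverse Godambe information V = Dψ^{−1} Mψ (Dψ^{−1})^T can also be estimated directly by replacing the two expectations with sample averages. The sketch below (assuming numpy; the names and the numerical-differentiation step are ours) computes this sandwich estimate from a user-supplied inference function, using central differences for Dψ; in practice, the jackknife of Section 2 avoids even this differentiation.

```python
import numpy as np

# Empirical sandwich estimate of V = D^{-1} M (D^{-1})^T, where
# psi(y_i, eta) returns the row vector of inference functions for one
# observation.  D is estimated by central differences, M by the
# average outer product of the scores.

def godambe_V(y, eta, psi, h=1e-5):
    eta = np.atleast_1d(eta).astype(float)
    p = eta.size
    scores = np.array([np.atleast_1d(psi(yi, eta)) for yi in y])
    M = scores.T @ scores / len(y)                   # estimate of E[psi^T psi]
    D = np.zeros((p, p))
    for k in range(p):                               # estimate of E[d psi^T / d eta]
        e = np.zeros(p); e[k] = h
        up = np.array([np.atleast_1d(psi(yi, eta + e)) for yi in y])
        dn = np.array([np.atleast_1d(psi(yi, eta - e)) for yi in y])
        D[:, k] = ((up - dn) / (2 * h)).mean(axis=0)
    Dinv = np.linalg.inv(D)
    return Dinv @ M @ Dinv.T

# Check on the score of a N(mu, sigma^2) mean, psi(y, mu) = y - mu:
# here D = -1 and M = Var(Y), so V should be close to the true variance.
rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=2.0, size=4000)
V = godambe_V(y, eta=np.array([y.mean()]), psi=lambda yi, m: yi - m[0])
```

Dividing the result by n gives the estimate of the covariance matrix n^{−1}V of η̃ itself.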
4.2. Inclusion of covariates: approach 1.

We now assume that we have independent, non-identically distributed random vectors Yi, i = 1, . . . , n, with Yi having density f(·; ηi), ηi = η(xi, γ), for a function η and a parameter vector γ. The check of the necessary conditions depends somewhat on the specific models; however, we indicate the general types of conditions that must hold.

We assume that each component of η = (α1, . . . , αd, θ) is a function of x; more specifically, αj = aj(x, γj), j = 1, . . . , d, and θ = t(x, γd+1), with a1, . . . , ad, t having components that are each functions of linear combinations of functions of the components of x (that is, there are link functions linking x to the parameters). We assume that the inference function vector ψ has a component for each parameter in γ = (γ1, . . . , γd, γd+1).

We explain the notation here for Examples 2.1 and 2.2. With no covariates, η = (α1, . . . , αd, θ), where αj is the cutoff point for the jth univariate margin, and θ = R = (ρjk) is a correlation matrix. With covariates, αj = aj(x, βj0, βj) = βj0 + βj x^T, j = 1, . . . , d, and t(x, θ) = θ, so that γ = (β10, β1, . . . , βd0, βd, θ), with γj = (βj0, βj) and γd+1 = θ.

In place of f(y; α1, . . . , αd, θ) and fj(yj; αj) in the case of no covariates, we now have the densities

fY|x(y|x; γ) := f(y; a1(x, γ1), . . . , ad(x, γd), t(x, γd+1))

and

fYj|x(yj|x; γj) := fj(yj; aj(x, γj)),  j = 1, . . . , d.

In a simple case, the estimate γ̃ from the IFM method has component γ̃j coming from the maximization of

Lj(γj) = Σ_{i=1}^n log fYj|x(yij|xi; γj),    (4.2)

j = 1, . . . , d, and γ̃d+1 comes from the maximization of L(γ̃1, . . . , γ̃d, γd+1) in γd+1, where

L(γ) = Σ_{i=1}^n log fY|x(yi|xi; γ).    (4.3)

Alternatively, the components of γd+1 may be estimated from log-likelihoods of lower-dimensional margins, such as in Example 2.2.
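The univariate maximization (4.2) is an ordinary one-margin fit. For the probit margin of Example 2.2, a minimal sketch (assuming numpy and scipy; the function name and the simulated design are illustrative) uses a quasi-Newton routine, so that no score functions need to be coded:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Univariate step (4.2) for one probit margin of Example 2.2: maximize
# L_j(gamma_j) = sum_i log f_{Y_j|x}(y_ij | x_i; gamma_j) in
# gamma_j = (beta_j0, beta_j) with BFGS (numerical gradients).

def fit_margin_probit(yj, x):
    X = np.column_stack([np.ones(len(yj)), x])    # intercept + covariate(s)
    def negloglik(gamma):
        p = norm.cdf(X @ gamma)
        p = np.clip(p, 1e-10, 1 - 1e-10)          # guard the logarithms
        return -np.sum(yj * np.log(p) + (1 - yj) * np.log(1 - p))
    res = minimize(negloglik, x0=np.zeros(X.shape[1]), method="BFGS")
    return res.x

# Simulated check with beta_j0 = 0.7, beta_j1 = 0.5 and x_i iid N(0, 1/4),
# matching the design used in the Section 5 simulations.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 0.5, size=3000)
yj = (rng.normal(size=3000) <= 0.7 + 0.5 * x).astype(int)
gamma_tilde = fit_margin_probit(yj, x)
```

Repeating this for j = 1, . . . , d gives γ̃1, . . . , γ̃d, after which γd+1 is estimated from (4.3) or from bivariate log-likelihoods with the γ̃j held fixed.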
In any case, let ψ be the vector of inference functions from partial derivatives of log-likelihood functions of margins.

Conditions for the preceding asymptotic results to hold are of the following type (we do not take up space to write out the numerous equations):

a. mixed derivatives of ψ of first and second order are dominated by integrable functions;
b. products of these derivatives are uniformly integrable;
c. the link functions are twice continuously differentiable, with first and second order derivatives bounded away from zero;
d. the covariates are uniformly bounded, and the sample covariance matrix of the covariates xi is strictly positive definite;
e. a Lindeberg-Feller type condition holds.

References for these types of conditions, and proofs of asymptotic normality, are in, for example, Bradley and Gart (1962) and Hoadley (1971). Assuming that the conditions hold, the asymptotic normality result has the form

n^{1/2} Vn^{−1/2} (γ̃ − γ)^T →d N(0, I),

where Vn = Dn^{−1} Mn (Dn^{−1})^T, with

Dn = n^{−1} Σ_{i=1}^n E[∂ψ^T(Yi, γ)/∂γ]  and  Mn = n^{−1} Σ_{i=1}^n E[ψ^T(Yi, γ) ψ(Yi, γ)].

Details for the case of multivariate discrete models are given in Xu (1996); the results generalize to the continuous case when the assumptions hold.

4.3. Inclusion of covariates: approach 2.

A second approach for asymptotics allows the parameters to be more general functions of the covariates, and treats the covariates as realizations of random vectors. This approach assumes a joint distribution for the response vector and covariates, with the parameters of the marginal distribution of the covariate vector treated as nuisance parameters. This assumption might be reasonable for a random sample of subjects in which xi and Yi are observed together (no control of xi before observing Yi).

Similar to the preceding subsection, we write the conditional density

fY|x(y|x; γ) = f(y; η(x, γ)).

Let Zi = (Yi, xi), i = 1, . . . , n. These are treated as iid random vectors from the density

fZ(z; γ) = fY|x(y|x; γ) · fX(x; ω).
(4.4)

For inference, we are interested in γ and η(x, γ), and not in ω. Marginal distributions of (4.4) for the IFM method are

fYj|x(yj|x; γj) · fX(x; ω),  j = 1, . . . , d.

If ω is treated as a nuisance parameter, then the log-likelihood in γ from (4.5) below is essentially the same as that in approach 1.

Let γ, αj, aj, θ, t be the same as in the preceding subsection, except that aj, j = 1, . . . , d, and t could be more general functions of the covariate vector x. The vector estimate from the IFM method has component γ̃j coming from the maximization of

Lj(γj) = Σ_{i=1}^n log[fYj|x(yij|xi; γj) fX(xi; ω)],    (4.5)

j = 1, . . . , d, and γ̃d+1 coming from the maximization of L(γ̃1, . . . , γ̃d, γd+1) in γd+1, where

L(γ) = Σ_{i=1}^n log[fY|x(yi|xi; γ) fX(xi; ω)].    (4.6)

Note that the optimizations of (4.2) and (4.5), and of (4.3) and (4.6), are the same. Alternatively, the components of γd+1 may be estimated from log-likelihoods of lower-dimensional margins. In any case, let ψ be the vector of inference functions from partial derivatives of log-likelihood functions of margins.

Assuming that the standard regularity conditions hold for fZ and its margins, the asymptotic theory for the iid case in subsection 4.1 holds for the estimates from the IFM method. The asymptotic normality result is

n^{1/2} (γ̃ − γ)^T →d N(0, V),

where V = Dψ^{−1} Mψ (Dψ^{−1})^T, with Dψ = E[∂ψ^T(Y, x, γ)/∂γ] and Mψ = E[ψ^T(Y, x, γ) ψ(Y, x, γ)].

This approach for asymptotics with covariates appears to be new. The asymptotic covariance matrices for n^{1/2}(γ̃ − γ) are different in the two asymptotic approaches; however, the inference functions are the same in both. The use of the empirical distribution function to estimate the inverse Godambe information matrix, or the use of the jackknife, leads to the same standard error estimates in the two approaches.

4.4. Consistency of the jackknife.

Under the assumptions of subsection 4.1, the jackknife estimators in (2.5) and (2.6) are consistent as n → ∞ (equivalently, g → ∞ with m fixed).
That is, n Vn,g − V →p 0 and

n Vn,g(b) − (∂b/∂η) V (∂b/∂η)^T →p 0.

Under the assumptions of subsections 4.2 and 4.3, the jackknife estimator with covariates, which is similar to that in (2.5) and (2.6), is also consistent as n → ∞ with m fixed. Standard techniques are used to prove these results; details are given in Xu (1996).

5. Efficiency and simulation results.

Efficiency comparison of the IFM method with the ML method is a difficult task, because of the general intractability of the asymptotic covariance matrices and the computational time required to obtain the MLE. Nevertheless, comparisons of various types are made in Xu (1996) for a number of multivariate models. One comparison is that of the asymptotic covariance matrices of the MLE and the IFM estimator, over the range of the parameters, in cases where the terms of the covariance matrices can be obtained. More generally, Monte Carlo simulations are required to compare the two estimators. All of the comparisons that were done suggest that the IFM method is highly efficient. Intuitively, we expect the IFM method to be quite efficient because it depends heavily on maximum likelihood, albeit from likelihoods of marginal distributions.

This section summarizes one of the many efficiency comparisons that were made, and an assessment of the jackknife for standard error estimation.

In Xu (1996), simulation results are obtained for dimensions d = 3, 4, for the multivariate probit model for binary or ordinal data with covariates, and for other copula-based models for multivariate discrete data. In almost all of the simulations, the relative efficiency, as measured by the ratio of the mean square error of the IFM estimator to that of the MLE, is close to 1. Typical simulation summaries are given in Tables 1 and 2. Table 1 is from Example 2.1, with trivariate binary data (d = 3), true parameters αj = 0, j = 1, 2, 3, and (i) ρjk = 0.6, 1 ≤ j < k ≤ 3, and (ii) ρ12 = ρ23 = 0.8, ρ13 = 0.64.
In the parametrization for the estimation, and as shown in Table 1, θ_jk = log[(1 + ρ_jk)/(1 − ρ_jk)] (equivalently, ρ_jk = (exp(θ_jk) − 1)/(exp(θ_jk) + 1)), j < k, was used. Table 2 is from Example 2.2; the same parameters as in (i) and (ii) were used for the correlation matrix, but there is one covariate. The parameters used were β_10 = 0.7, β_20 = 0.5, β_30 = 0.3 and β_11 = β_21 = β_31 = 0.5, with x_i iid N(0, 1/4).

Table 1: Parameter estimates, square root mean square errors and efficiency ratios for the multivariate probit model for binary data; d = 3. True parameters α_1 = α_2 = α_3 = 0.

  margin:               1         2         3       (1,2)     (1,3)     (2,3)
  n     parameter:     α_1       α_2       α_3      θ_12      θ_13      θ_23

  θ_12 = θ_13 = θ_23 = 1.3863
  100   IFM          0.003    -0.002     0.005     1.442     1.426     1.420
                    (0.131)   (0.121)   (0.128)   (0.376)   (0.380)   (0.378)
        MLE          0.002    -0.003     0.004     1.441     1.426     1.420
                    (0.131)   (0.121)   (0.128)   (0.376)   (0.380)   (0.378)
        r            0.998     0.999     0.999     0.999     0.999     0.999
  1000  IFM        -0.0006   -0.0016   -0.0008    1.3924    1.3897    1.3906
                    (0.040)   (0.038)   (0.039)   (0.114)   (0.114)   (0.113)
        MLE        -0.0018   -0.0028   -0.0019    1.3919    1.3893    1.3902
                    (0.040)   (0.038)   (0.039)   (0.114)   (0.114)   (0.113)
        r            0.997     0.997     0.997     1.000     1.001     1.000

  θ_12 = θ_23 = 2.1972, θ_13 = 1.5163
  100   IFM         0.0027   -0.0006    0.0003    2.2664    1.5571    2.2586
                    (0.131)   (0.123)   (0.130)   (0.454)   (0.377)   (0.453)
        MLE         0.0015   -0.0020   -0.0012    2.2646    1.5552    2.2579
                    (0.131)   (0.123)   (0.131)   (0.453)   (0.377)   (0.452)
        r            0.999     1.000     0.999     1.001     1.001     1.002
  1000  IFM        -0.0006   -0.0001   -0.0005    2.2009    1.5174    2.2043
                    (0.040)   (0.038)   (0.039)   (0.135)   (0.118)   (0.136)
        MLE        -0.0023   -0.0020   -0.0022    2.2003    1.5166    2.2036
                    (0.040)   (0.038)   (0.039)   (0.135)   (0.118)   (0.137)
        r            0.996     1.000     0.996     0.999     1.000     1.000

Two sample sizes, n = 100 and n = 1000, are used. The number of simulations in each case is 1000.
For each case, the entries of Tables 1 and 2 are, for the IFM estimates and then the MLE: the average and square root mean square error for each parameter, and the ratio r of the first square root mean square error to the second.

In Xu (1996), simulations were also used to assess the adequacy of the jackknife estimator of the standard error, with comparison to the standard error estimator from the Godambe information matrix in cases where the latter could be obtained. The conclusion is that the jackknife estimator does very well. A typical simulation summary is given below.

To simplify the computations, we use a trivariate probit model, and estimate only the dependence parameters θ_12, θ_13, θ_23 from the bivariate likelihoods with the univariate parameters fixed. For the sth simulation, let the estimates be denoted θ̃_12^(s), θ̃_13^(s), θ̃_23^(s). We then compute the jackknife estimate of variance (with g groups of size m such that g × m = n) for θ̃_12^(s), θ̃_13^(s), θ̃_23^(s). Let these variance estimates be v_12^(s), v_13^(s), v_23^(s), and let the asymptotic variance estimates of θ̃_12, θ̃_13, θ̃_23 based on the Godambe information matrix be v_12, v_13, v_23. We compare the following three variance estimate measures, listed after Table 3.

Table 2: Parameter estimates, square root mean square errors and efficiency ratios for the multivariate probit model for binary data with a continuous covariate; d = 3.
True parameters are β_10 = 0.7, β_20 = 0.5, β_30 = 0.3 and β_11 = β_21 = β_31 = 0.5, with x_i ∼ N(0, 1/4).

  margin:            1        1        2        2        3        3      (1,2)    (1,3)    (2,3)
  n    parameter:   β_10     β_11     β_20     β_21     β_30     β_31    θ_12     θ_13     θ_23

  θ_12 = θ_13 = θ_23 = 1.3863
  100   IFM        0.722    0.529    0.488    0.520    0.312    0.524    1.453    1.403    1.473
                  (0.136)  (0.326)  (0.144)  (0.278)  (0.137)  (0.310)  (0.398)  (0.402)  (0.401)
        MLE        0.722    0.532    0.486    0.519    0.311    0.522    1.458    1.407    1.476
                  (0.137)  (0.320)  (0.144)  (0.278)  (0.138)  (0.308)  (0.402)  (0.412)  (0.406)
        r          0.999    1.019    0.999    1.002    0.993    1.005    0.990    0.976    0.989
  1000  IFM        0.704    0.495    0.501    0.504    0.306    0.504    1.413    1.380    1.391
                  (0.042)  (0.089)  (0.046)  (0.084)  (0.041)  (0.093)  (0.140)  (0.109)  (0.124)
        MLE        0.703    0.494    0.500    0.503    0.305    0.503    1.415    1.381    1.393
                  (0.042)  (0.090)  (0.045)  (0.084)  (0.040)  (0.093)  (0.139)  (0.109)  (0.124)
        r          1.004    0.988    1.004    0.993    1.007    1.001    1.000    1.006    1.000

  θ_12 = θ_23 = 2.1972, θ_13 = 1.5163
  100   IFM        0.722    0.529    0.490    0.543    0.300    0.544    2.303    1.556    2.303
                  (0.136)  (0.326)  (0.133)  (0.272)  (0.131)  (0.309)  (0.494)  (0.362)  (0.525)
        MLE        0.721    0.532    0.488    0.542    0.298    0.538    2.318    1.550    2.310
                  (0.136)  (0.317)  (0.134)  (0.279)  (0.131)  (0.306)  (0.504)  (0.371)  (0.533)
        r          1.000    1.028    0.993    0.976    1.002    1.010    0.981    0.978    0.985
  1000  IFM        0.704    0.495    0.502    0.500    0.303    0.506    2.220    1.541    2.213
                  (0.042)  (0.089)  (0.045)  (0.076)  (0.041)  (0.091)  (0.155)  (0.123)  (0.142)
        MLE        0.703    0.494    0.499    0.498    0.301    0.505    2.222    1.541    2.215
                  (0.042)  (0.089)  (0.045)  (0.075)  (0.041)  (0.091)  (0.156)  (0.124)  (0.142)
        r          1.011    1.002    1.007    1.015    1.010    1.010    0.991    0.997    0.997

Table 3: Comparison of estimates of standard errors: (i) true, (ii) Godambe, (iii) jackknife with g groups; n = 500.

  approach               MSE(θ_12)   MSE(θ_13)   MSE(θ_23)

  α = (0.0, 0.7, 0.0)′, θ = (−0.5, 0.5, −0.5)
  (i)                     0.00416     0.00314     0.00426
  (ii)                    0.00402     0.00329     0.00402
  (iii) (g,m) = (500,1)   0.00408     0.00331     0.00410
               (250,2)    0.00407     0.00333     0.00412
               (125,4)    0.00405     0.00333     0.00412
               (50,10)    0.00411     0.00340     0.00418

  α = (0.7, 0.7, 0.7)′, θ = (0.9, 0.7, 0.5)
  (i)                     0.00063     0.00269     0.00452
  (ii)                    0.00059     0.00248     0.00437
  (iii) (g,m) = (500,1)   0.00061     0.00250     0.00441
               (250,2)    0.00061     0.00251     0.00442
               (125,4)    0.00062     0.00253     0.00447
               (50,10)    0.00062     0.00254     0.00450

  α = (1.0, 0.5, 0.0)′, θ = (0.8, 0.6, 0.8)
  (i)                     0.00163     0.00385     0.00141
  (ii)                    0.00174     0.00418     0.00133
  (iii) (g,m) = (500,1)   0.00182     0.00440     0.00136
               (250,2)    0.00184     0.00441     0.00137
               (125,4)    0.00185     0.00443     0.00136
               (50,10)    0.00188     0.00448     0.00139

The three variance estimate measures are:

(i) mean square error (MSE): (1/N) Σ_{s=1}^N (θ̃_12^(s) − θ_12)², (1/N) Σ_{s=1}^N (θ̃_13^(s) − θ_13)², (1/N) Σ_{s=1}^N (θ̃_23^(s) − θ_23)²;

(ii) estimate from the Godambe matrix: v_12, v_13, v_23;

(iii) estimate from the jackknife: (1/N) Σ_{s=1}^N v_12^(s), (1/N) Σ_{s=1}^N v_13^(s), (1/N) Σ_{s=1}^N v_23^(s).

The MSE in (i) should be considered as the true variance of the parameter estimate, and (ii) and (iii) should be compared with each other and with (i). Table 3 summarizes the numerical computation of the variance estimates of θ̃_12, θ̃_13, θ̃_23 based on approaches (i), (ii) and (iii); the sample size is n = 500 and there were N = 500 simulations in each case. For the jackknife method (iii), different combinations of (g, m) were used. Three cases with different marginal parameters α = (α_1, α_2, α_3) and different dependence parameters θ = (θ_12, θ_13, θ_23) were used; the parameter values are reported in the table. In Table 3, the three measures are very close to each other. This was also the case in simulations with other models.
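The experiment behind Table 3 can be sketched in miniature. The following is a hypothetical, simplified setup (not the exact one used in the report): a bivariate probit model for binary data with both univariate thresholds fixed at 0, so that only the latent correlation ρ is estimated from the bivariate likelihood, and its variance is then estimated by the grouped jackknife. The delete-m variance formula below is the standard one, standing in for the report's (2.5). With zero thresholds, P(Y1 = 1, Y2 = 1) = P(Y1 = 0, Y2 = 0) = 1/4 + arcsin(ρ)/(2π), the standard normal orthant probability, so the bivariate log-likelihood depends only on the concordance count.

```python
import numpy as np

def fit_rho(y, grid=np.linspace(-0.95, 0.95, 1901)):
    """Maximize the bivariate binary probit log-likelihood in rho, with the
    univariate thresholds held fixed at 0 (margins have probability 1/2)."""
    n = len(y)
    conc = np.sum(y[:, 0] == y[:, 1])            # concordant pairs (11 or 00)
    p11 = 0.25 + np.arcsin(grid) / (2 * np.pi)   # orthant probability per rho
    ll = conc * np.log(p11) + (n - conc) * np.log(0.5 - p11)
    return grid[np.argmax(ll)]

def jackknife_var(y, g):
    """Delete-m jackknife variance with g non-overlapping groups of size
    m = n/g (so g * m = n), in the standard delete-m form."""
    n = len(y)
    m = n // g
    loo = np.array([fit_rho(np.delete(y, slice(k * m, (k + 1) * m), axis=0))
                    for k in range(g)])
    return (n - m) / (m * g) * ((loo - loo.mean()) ** 2).sum()

rng = np.random.default_rng(0)
n, rho = 500, 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
y = (z > 0).astype(int)

rho_hat = fit_rho(y)            # bivariate-likelihood estimate of rho
v_jack = jackknife_var(y, g=100)  # jackknife variance, (g, m) = (100, 5)
```

The grid maximizer agrees with the closed form ρ̂ = sin(2π(p̂11 − 1/4)), where p̂11 is half the observed concordance rate; the jackknife variance can then be compared across choices of (g, m), as in Table 3.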
6. Data examples.

6.1. Example 1.

This example consists of a data set in Aitchison and Ho (1989) of trivariate counts: the counts of pathogenic bacteria at 50 different sterile locations measured by three different air samplers. One of the objectives of the study is to investigate the relative effectiveness of the three air samplers in detecting pathogenic bacteria.

Summary statistics (means, variances, quartiles, maxima, minima and pairwise correlations) are given in Table 4. The initial data analysis indicates that there is some extra-Poisson variation, as the variance to mean ratio for each margin (or sampler) ranges from 2 to 5, with sampler 3 more spread out than the other two samplers.

Table 4: Trivariate count data: some summary statistics.

  margin   mean   variance   max   Q3   med   Q1   min
  1         4.7    15.07      22    6    4     2    1
  2         6.5    13.64      15    9    6     4    0
  3         6.6    32.61      30    8    6     3    0

  margins   correlation
  1,2         0.0192
  1,3        -0.1666
  2,3        -0.3667

We fit the multivariate Poisson-lognormal model (see Example 2.3) to the data and estimate the parameters using the IFM method, with the (delete-one) jackknife method for standard errors. Table 5 contains the estimates and SEs of the parameters.

Table 5: Trivariate count data: estimates of parameters for the multivariate Poisson-lognormal model.

  margin   μ̃ (SE)          σ̃ (SE)
  1        1.388 (0.098)   0.551 (0.122)
  2        1.784 (0.098)   0.425 (0.090)
  3        1.660 (0.120)   0.672 (0.121)

  margins   ρ̃_jk (SE)
  1,2        0.059 (0.315)
  1,3       -0.260 (0.208)
  2,3       -0.605 (0.206)

The estimated means, variances and correlations, computed as functions of the parameters based on the fitted model, are quite close to the empirical means, variances and correlations in Table 4. Various diagnostics showed the multivariate Poisson-lognormal model to be more appropriate for this data set than other models. The analysis indicates that sampler 3 tends to be negatively correlated with samplers 1 and 2. Similar results were obtained in Aitchison and Ho (1989).
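The "computed as functions of the parameters" check uses the standard Poisson-lognormal moment identities: if Y | Z = z is Poisson(e^z) with Z ∼ N(μ, σ²), then E[Y] = exp(μ + σ²/2) and Var[Y] = E[Y] + E[Y]²(exp(σ²) − 1). A quick sketch, plugging the Table 5 estimates into these identities:

```python
import math

def pln_moments(mu, sigma):
    """Mean and variance of a Poisson-lognormal margin:
    Y | Z = z ~ Poisson(exp(z)) with Z ~ N(mu, sigma^2)."""
    m = math.exp(mu + sigma ** 2 / 2)
    return m, m + m ** 2 * (math.exp(sigma ** 2) - 1)

# Table 5 univariate estimates (mu, sigma) for margins 1, 2, 3:
for mu, sigma in [(1.388, 0.551), (1.784, 0.425), (1.660, 0.672)]:
    print(pln_moments(mu, sigma))
# margin 1: mean ~ 4.66 (empirical 4.7),  variance ~ 12.4 (empirical 15.07)
# margin 3: mean ~ 6.59 (empirical 6.6),  variance ~ 31.4 (empirical 32.61)
```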
Using an approximation to the multivariate log-likelihood, Aitchison and Ho (1989) report estimates from a final model with σ_j = σ for all j and ρ_jk = ρ for all j ≠ k, with estimates (standard errors) μ̂_1 = 1.39 (0.11), μ̂_2 = 1.75 (0.10), μ̂_3 = 1.70 (0.10), σ̂ = 0.56 (0.05), and ρ̂ = −0.28 (0.10). These are quite close to our estimates from the IFM method for this simplified model: μ̃_1 = 1.388 (0.098), μ̃_2 = 1.784 (0.098), μ̃_3 = 1.660 (0.120), σ̃ = 0.523 (0.062), and ρ̃ = −0.347 (0.133).

6.2. Example 2.

This example consists of multivariate (longitudinal) ordinal response data listed in Fienberg et al. (1985) and Conaway (1989). The data come from a study on the psychological effects of the 1979 accident at the Three Mile Island nuclear power plant. We use a multivariate probit model for each level of a binary covariate. This model highlights different features of the data compared with the models and analyses in the cited articles. More detailed data analyses and comparisons of models are given in Xu (1996).

The study focuses on the changes in levels of stress of mothers of young children living within 10 miles of the plant. Four waves of interviews were conducted in 1979, 1980, 1981 and 1982, and one variable measured at each time point is the level of stress (categorized as low, medium or high). Hence stress is treated as an ordinal response variable with three categories, labelled here as L, M, H.

Table 6: Multivariate ordinal data: univariate marginal (and relative) frequencies.

  margin    outcome    1979          1980          1981          1982
  < 5 mi.   L           14 (0.122)    18 (0.157)    14 (0.122)    18 (0.157)
            M           69 (0.600)    73 (0.635)    72 (0.626)    70 (0.609)
            H           32 (0.278)    24 (0.209)    29 (0.252)    27 (0.235)
  > 5 mi.   L            9 (0.059)    35 (0.229)    17 (0.111)    23 (0.150)
            M          110 (0.719)    93 (0.608)   117 (0.765)   110 (0.719)
            H           34 (0.222)    25 (0.163)    19 (0.124)    20 (0.131)
  all       L           23 (0.086)    53 (0.198)    31 (0.116)    41 (0.153)
            M          179 (0.668)   166 (0.619)   189 (0.705)   180 (0.672)
            H           66 (0.246)    49 (0.183)    48 (0.179)    47 (0.175)
There were 268 mothers in the study, stratified into two groups: those living within 5 miles of the plant, and those living between 5 and 10 miles from the plant. There were 115 mothers in the first group and 153 in the second.

Over the four time points and three levels of the ordinal response variable, there were 81 possible four-tuples, of the form LLLL to HHHH, but many of these had a zero count. There was only one subject with a big change in stress level (L to H or H to L) from one year to another, and 42% of the subjects were categorized into the same stress level in all four years. The frequencies by univariate margin (or by year) are given in Table 6. The medium stress category predominates, and there is a higher relative frequency of subjects in the high stress category for the group within five miles of the plant compared with the group exceeding five miles. From Table 6, there are no big changes over time, but there is a small trend towards lower stress levels for the group exceeding five miles.

Table 7: Multivariate ordinal data: estimates of parameters (and SEs) for the multivariate probit model.

              < 5 mi            > 5 mi
  parameter   estimate (SE)     estimate (SE)
  β_11        -1.16 (0.16)      -1.57 (0.17)
  β_12         0.59 (0.13)       0.76 (0.11)
  β_21        -1.01 (0.14)      -0.74 (0.11)
  β_22         0.81 (0.13)       0.98 (0.12)
  β_31        -1.17 (0.16)      -1.22 (0.14)
  β_32         0.67 (0.13)       1.15 (0.13)
  β_41        -1.01 (0.14)      -1.04 (0.13)
  β_42         0.73 (0.13)       1.12 (0.13)
  ρ_12        0.785 (0.060)     0.678 (0.079)
  ρ_13        0.696 (0.067)     0.463 (0.111)
  ρ_14        0.653 (0.086)     0.436 (0.108)
  ρ_23        0.806 (0.059)     0.750 (0.066)
  ρ_24        0.636 (0.096)     0.510 (0.104)
  ρ_34        0.844 (0.052)     0.562 (0.116)

Table 7 has estimates and standard errors for the multivariate probit model by distance group. As would be expected, the dependence parameters for consecutive years are larger. In comparisons of the two groups (< 5 mi and > 5 mi), the dependence parameters are larger for the first group. This means that the mothers in the first group are probably more consistent in their responses over time.
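The univariate stage of the IFM fit can be checked directly against Tables 6 and 7. Assuming the β's in Table 7 are the cutpoints of the ordinal probit margins (consistent with the tabulated values), the univariate MLEs with no covariates have a closed form: inverse-normal transforms of the cumulative marginal relative frequencies. A minimal sketch using the 1979 margin of the "< 5 mi" group:

```python
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf  # standard normal quantile function

# 1979 counts (L, M, H) for the "< 5 mi" group, from Table 6.
counts = [14, 69, 32]
n = sum(counts)                                   # 115 mothers

# Cutpoint estimates: Phi^{-1} of the cumulative relative frequencies.
beta11 = phi_inv(counts[0] / n)                   # P(L)        = 0.122
beta12 = phi_inv((counts[0] + counts[1]) / n)     # P(L) + P(M) = 0.722

# beta11 ~ -1.17 and beta12 ~ 0.59, matching the first two "< 5 mi"
# entries of Table 7 (-1.16 and 0.59) to rounding.
```

The dependence parameters ρ_jk would then come from the bivariate margins (year pairs) with these cutpoints held fixed, as in the IFM method.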
7. Discussion.

The IFM method presented in this report applies to multivariate models with closure properties of the parameters; for example, parameters that are associated with, or are expressed in, lower-dimensional marginal distributions. Together with the jackknife, it makes inference and data analysis computationally feasible for many multivariate models.

There are modifications of the method when a parameter is common to more than one margin, and these will be presented in a separate report. One example in which a parameter can appear in more than one margin is when the copula has a special dependence structure: for the exchangeable dependence structure, a parameter is common to all bivariate margins, and for the autoregressive of order one dependence structure for a latent MVN distribution, each bivariate margin has a parameter which is a power of the lag-one correlation parameter. Other examples arise when different univariate margins have a common parameter; for instance, in a repeated measures study with short time series, it may be reasonable to assume common regression coefficients for the different time points.

Besides working with higher-dimensional margins, which may be computationally difficult in some cases, there are two ways in which the IFM method can be extended:

1. average or weight the estimators from the likelihoods of the margins with the common parameter;

2. create inference functions or estimating equations from the sum of log-likelihoods of the margins that have the common parameter.

For both of these methods, the jackknife is still valid for obtaining standard errors of estimates of parameters. Details on the use of variations of the IFM method for comparisons of models, with sensitivity analyses, are given in Xu (1996) for several data sets.

Acknowledgements. This research has been supported by an NSERC Canada grant.

References.

Aitchison, J. and Ho, C.H. (1989). The multivariate Poisson-log normal distribution.
Biometrika, 76, 643–653.

Anderson, J.A. and Pemberton, J.D. (1985). The grouped continuous model for multivariate ordered categorical variables and covariate adjustment. Biometrics, 41, 875–885.

Ashford, J.R. and Sowden, R.R. (1970). Multivariate probit analysis. Biometrics, 26, 535–546.

Bradley, R.A. and Gart, J.J. (1962). The asymptotic properties of ML estimators when sampling from associated populations. Biometrika, 49, 205–213.

Conaway, M.R. (1989). Analysis of repeated categorical measurements with conditional likelihood methods. J. Amer. Statist. Assoc., 84, 53–62.

Fienberg, S.E., Bromet, E.J., Follman, D., Lambert, D. and May, S.M. (1985). Longitudinal analysis of categorical epidemiological data: a study of Three Mile Island. Environ. Health Perspectives, 63, 241–248.

Godambe, V.P. (1960). An optimal property of regular maximum likelihood estimation. Ann. Math. Statist., 31, 1208–1211.

Godambe, V.P. (1976). Conditional likelihood and unconditional optimum estimating equations. Biometrika, 63, 277–284.

Godambe, V.P. (editor) (1991). Estimating Functions. Oxford University Press, Oxford.

Hoadley, B. (1971). Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. Ann. Math. Statist., 42, 1977–1991.

Hüsler, J. and Reiss, R.-D. (1989). Maxima of normal random vectors: between independence and complete dependence. Statist. Probab. Letters, 7, 283–286.

Joe, H. (1993). Parametric family of multivariate distributions with given margins. J. Mult. Anal., 46, 262–282.

Joe, H. (1994). Multivariate extreme value distributions and applications to environmental data. Canad. J. Statist., 22, 47–64.

Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible distributions. J. Mult. Anal., 57, 240–265.

Lesaffre, E. and Molenberghs, G. (1991). Multivariate probit analysis: a neglected procedure in medical statistics. Statist. in Medicine, 10, 1391–1403.

Lipsitz, S.R., Dear, K.B.G. and Zhao, L. (1994).
Jackknife estimators of variance for parameter estimates from estimating equations with applications to clustered survival data. Biometrics, 50, 842–846.

McLeish, D.L. and Small, C.G. (1988). The Theory and Applications of Statistical Inference Functions. Lecture Notes in Statistics 44, Springer-Verlag, New York.

Nash, J.C. (1990). Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, 2nd edition. Hilger, New York.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Tawn, J.A. (1990). Modelling multivariate extreme value distributions. Biometrika, 77, 397–415.

Xu, J.J. (1996). Statistical Modelling and Inference for Multivariate and Longitudinal Discrete Response Data. Ph.D. thesis, Department of Statistics, University of British Columbia.

