UBC Theses and Dissertations
Statistical modelling and inference for discrete and censored familial data
Zhao, Yinshan
The analysis of familial data with quantitative traits based on the multivariate normal distribution has been well studied. However, little attention has been devoted to traits that do not follow a multivariate normal distribution, such as traits with discrete or censored values. In this thesis, we (1) construct models for familial data when the trait value is discrete and/or censored, and (2) study alternative estimation methods for cases in which maximum likelihood estimation is infeasible. We discuss two existing classes of models: models with multivariate normally distributed random effects, and models constructed from the multivariate normal copula. These two classes include a variety of models that can be applied to familial data. We also propose another class of models, which we call conditional independence models. This type of model is based on a conditional independence assumption: for a trait variable, we assume that a pair of non-sibling relatives is independent conditional on their parents, so that the dependence structure is built on the Markov property. Maximum likelihood estimates are generally difficult to obtain for random effect models and copula models when large families are involved. We propose two estimation procedures based on composite likelihoods. The first is a two-stage method in which the univariate marginal parameters are estimated from the univariate marginal distributions, and the dependence parameters are then estimated separately from the bivariate marginal distributions with the marginal parameters treated as known. In the second, all the parameters are estimated using the likelihoods of the bivariate marginal distributions. The composite likelihood methods can greatly reduce the computation required for parameter estimation, but at the cost of some loss of efficiency.
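The two-stage procedure can be illustrated on the simplest possible case. The sketch below is not the thesis's implementation: it assumes an exchangeable multivariate normal trait (one common correlation within each family, rather than a kinship-dependent structure), and all function names, parameter values and the grid search are choices made here for illustration only. Stage one estimates the mean and standard deviation from the univariate margins; stage two holds them fixed and maximizes the sum of bivariate normal log-likelihoods over all within-family pairs.

```python
import math
import random
from itertools import combinations


def simulate_families(n_families, size, mu, sigma, rho, rng):
    """Simulate an exchangeable normal trait (illustrative assumption).

    A shared family effect gives every within-family pair correlation rho.
    """
    data = []
    for _ in range(n_families):
        shared = rng.gauss(0.0, 1.0)
        family = [
            mu + sigma * (math.sqrt(rho) * shared
                          + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(size)
        ]
        data.append(family)
    return data


def stage_one(data):
    """Stage 1: marginal parameters from the univariate margins only."""
    ys = [y for family in data for y in family]
    mu_hat = sum(ys) / len(ys)
    sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in ys) / len(ys))
    return mu_hat, sigma_hat


def stage_two(data, mu_hat, sigma_hat):
    """Stage 2: dependence parameter from bivariate margins, margins fixed."""
    n_pairs, sum_sq, sum_cross = 0, 0.0, 0.0
    for family in data:
        z = [(y - mu_hat) / sigma_hat for y in family]
        for z1, z2 in combinations(z, 2):
            n_pairs += 1
            sum_sq += z1 * z1 + z2 * z2
            sum_cross += z1 * z2

    def pairwise_loglik(r):
        # Sum of standardized bivariate normal log-densities over all pairs,
        # dropping terms that do not depend on r.
        one_minus = 1.0 - r * r
        return (-0.5 * n_pairs * math.log(one_minus)
                - (sum_sq - 2.0 * r * sum_cross) / (2.0 * one_minus))

    # A plain grid search keeps the sketch dependency-free.
    grid = [k / 1000.0 for k in range(-999, 1000)]
    return max(grid, key=pairwise_loglik)


rng = random.Random(0)
families = simulate_families(2000, 3, mu=1.0, sigma=1.0, rho=0.5, rng=rng)
mu_hat, sigma_hat = stage_one(families)
rho_hat = stage_two(families, mu_hat, sigma_hat)
print(mu_hat, sigma_hat, rho_hat)
```

The second procedure described above would instead maximize the pairwise log-likelihood over the marginal and dependence parameters together, rather than plugging in the stage-one estimates.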
In this thesis, extensive investigations based on asymptotic covariance matrices and simulations were carried out to compare the asymptotic efficiency of these two procedures with that of the maximum likelihood method. In our efficiency comparisons, we investigate the multivariate normal model for a continuous trait, the multivariate probit model for a binary trait, the multivariate Poisson-lognormal mixture model for a count trait, and the multivariate lognormal model for a censored variable. We found that when the dependence is strong, the first approach is inefficient for the regression parameters, whereas when the dependence is weak, the second approach is inefficient for the dependence parameters. In many familial analyses, quantifying familial association is of great interest. For a binary trait, the odds ratio may be used as a measure of association between a parent-offspring pair or a sibling pair. We develop theory so that the asymptotic variance of an odds ratio can be computed from a 2 x 2 contingency table formed by dependent pairs.
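For orientation, the sketch below computes an odds ratio from a 2 x 2 table together with the textbook standard error of its logarithm, which assumes that the pairs are independent. That assumption fails when several pairs come from the same family, which is the situation the theory above addresses; the corrected variance from the thesis is not reproduced here. The counts and function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def naive_log_or_se(a, b, c, d):
    """Standard error of log(OR), valid only for independent pairs."""
    return math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)


# Hypothetical parent-offspring counts, invented for illustration:
# rows = parent affected / unaffected, columns = offspring affected / unaffected.
a, b, c, d = 40, 60, 20, 80

or_hat = odds_ratio(a, b, c, d)
se = naive_log_or_se(a, b, c, d)

# Naive 95% interval; with dependent pairs this interval can be
# miscalibrated because the independence assumption does not hold.
ci = (or_hat * math.exp(-1.96 * se), or_hat * math.exp(1.96 * se))
print(or_hat, se, ci)
```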