UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Statistical modelling and inference for discrete and censored familial data Zhao, Yinshan 2004-12-01

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata


831-ubc_2004-902935.pdf [ 7.45MB ]
JSON: 831-1.0099795.json
JSON-LD: 831-1.0099795-ld.json
RDF/XML (Pretty): 831-1.0099795-rdf.xml
RDF/JSON: 831-1.0099795-rdf.json
Turtle: 831-1.0099795-turtle.txt
N-Triples: 831-1.0099795-rdf-ntriples.txt
Original Record: 831-1.0099795-source.json
Full Text

Full Text

Statistical Modelling and Inference for Discrete and Censored Familial Data by Yinshan Zhao B.Sc, USTC 1993 M.A., York University 1998 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Statistics) we accept this thesis as conforming to the required standard The University of British Columbia Mar 2004 © Yinshan Zhao, 2004 Abstract Analysis of familial data with quantitative traits based on the multivariate normal distribution has been well studied. However, little attention has been devoted to traits which do not have a multivariate normal distribution, such as traits with discrete or censored values. In this thesis, we devote our effort to (1) construct models for familial data when the trait value is discrete and/or censored, and (2) study alternative estimation methods when maximum likelihood estimation is infeasible. We discuss two existing classes of models: models with random effects which are multi variate normally distributed, and models constructed from the multivariate normal copula. These two classes include a variety of models which can be applied to familial data. We also propose another class of models which we call conditional independence models. This type of model is based on a conditional independence assumption: for a trait variable, we assume independence of a pair of non-sibling relatives conditional on their parents, so that the dependence structure is built on the Markov property. Maximum likelihood estimates are generally difficult to obtain for random effect models and copula models when there are large families involved. We propose two estimation procedures based on composite likelihoods: the first is a two-stage method in which univariate marginal parameters are estimated based on univariate marginal distributions and the dependence parameters are esti mated separately based on bivariate marginal distributions with the marginal parameters treated as known; whereas in the second, all the parameters are estimated using the likelihoods of bivari ate marginal distributions. The composite likelihood methods can greatly reduce computation in parameter estimation, but with a price of efficiency loss. In this thesis, extensive investigations based on asymptotic covariance matrices and simulations were carried out to compare the asymp totic efficiency of these two procedures with the maximum likelihood method. In our efficiency ii comparisons, we investigate the multivariate normal model for a continuous trait, the multivariate probit model for a binary trait, the multivariate Poisson-lognormal mixture model for a count trait and multivariate lognormal model for a censored variable. We found that when the dependence is strong, the first approach is inefficient for the regression parameters; whereas when the dependence is weak, the second approach is inefficient for the dependence parameters. In many familial analyses, quantifying familial association is of great interest. For a binary trait, the odds ratio may be used as a measure of association between a parent-offspring pair or a sibling pair. We develop theories so that the asymptotic variance of an odds ratio can be computed from a 2 x 2 contingency table formed by dependent pairs. iii Contents Abstract ii Contents v List of Tables viiList of Figures x Glossary xii Acknowledgements xiiDedication xiv 1 Introduction 1 1.1 Familial Data and Notation 1 1.2 Genetics Background 5 1.3 Neurofibromatosis Type 1 and 2 6 1.4 Motivation and Background for Statistical Models 6 1.5 Outline of the Thesis 9 2 Models for Non-normal Familial Data 11 2.1 General Introduction of Model Classes 2 2.1.1 Models with Multinormal Random Effects (Mixture Models) 12 2.1.2 Multinormal Copula Models 14 2.1.3 Conditional Independence (CI) Model 16 iv 2.2 Binary Traits 21 2.2.1 Multivariate Probit (MVP) Model 22 2.2.2 Two-Component Model 22.3 Count Traits 31 2.3.1 Poisson-Lognormal Mixture 32.3.2 Multinormal Copula Model for Count Traits 32 2.3.3 Conditional Independence Models for Count Traits 33 2.4 Survival Data 39 2.4.1 Multivariate Lognormal Models 40 2.4.2 Multivariate Lognormal Frailty2.4.3 Multinormal Copula Model 1 3 Estimating Procedures Generated from Composite Likelihood Functions 42 3.1 Notation 44 3.2 Estimation Procedures based on Composite Likelihood 46 3.2.1 General Properties 43.2.2 A Two-stage Estimating Procedure (CL1) = 48 3.2.3 Estimating Equations Based on BCL (CL2) 60 3.3 Estimating the Asymptotic Covariance Matrix of 0 5 3.4 Remarks 64 Efficiency Comparisons 68 4.1 Continuous Response: MVN model 69 4.1.1 MLE4.1.2 Efficiency of the Two-Stage Estimating Approach (CL1) 70 4.1.3 Estimates Based on Bivariate Composite Likelihood (CL2) . .' 82 4.2 Binary Response: MVP Model 89 4.2.1 Likelihood and Composite Likelihood Estimating Functions 89 4.2.2 Efficiency Comparison 91 4.2.3 Conclusion 104.3 Count Response: Multivariate Poisson-Lognormal Mixture Model 101 4.3.1 ML and CL estimating functions 101 4.3.2 Efficiency Comparison 102 4.4 Survival Data Subject to Right Censoring: Multivariate Lognormal Model 104 4.5 Conclusion 106 5 Inferences for Log Odds Ratio with Dependent Pairs 108 5.1 Interclass Odds Ratio 110 5.1.1 Estimator of 7po Ill 5.1.2 Asymptotic Variance of 7po Il5.1.3 Methods to Estimate the Asymptotic Variance of %0 in (5.5) 116 5.1.4 Asymptotic Relative Efficiency of jpo 5.2 Intraclass Odds Ratio 120 5.2.1 The Estimator of -yss 125.2.2 Asymptotic Variance of yss 1 5.2.3 Weighted Estimator of ^ss 124 5.2.4 Methods to Estimate the Asymptotic Variance of %s 125 5.2.5 Efficiency Comparison 126 5.3 Discussion 129 6 Applications to NF1 and NF2 Data 130 6.1 Intertriginous Freckling in NF1 Patients 1 6.1.1 Interclass Odds Ratio 136.1.2 Intraclass Odds Ratio 2 6.2 Presence of Subcutaneous Neurofibromas in NF1 Patients 133 6.2.1 Summary of Patients 134 6.2.2 Probit Model6.2.3 Two-Component Series Model 136 6.2.4 Comparison of the Probit Model and the Two-Component Series Model . . . 137 6.3 Age at First Symptom of NF2 139 6.3.1 Summary of Patients . . .6.3.2 Model 13vi 6.3.3 Results 140 6.4 Number of Intracranial Meningiomas in NF2 Patients 141 6.4.1 Summary of Patients 146.4.2 Multinormal Copula Model with Negative Binomial Margin for Number of Meningiomas 142 6.4.3 Multivariate Poisson-Lognormal Mixture Model for Number of Meningiomas 144 6.4.4 Comparison of the Two Models 146.5 Discussion 146 7 Conclusion and Future Work 147 7.1 Conclusion 147.2 Future WorkAppendix A Dependence Structure of Quantitative Traits 150 A.l Mendelian Models 151 A.2 Polygenic Models 3 A.3 Major Gene Models 154 A.4 Models for Autosomal-Dominant Diseases 155 A. 5 Comments 156 Appendix B Appendix for Chapter 4 157 B. l General Results 15B.2 Asymptotic Properties of the Covariates 158 B.3 Maple Code '. . 160 vii List of Tables 4.1 AREs of CL1 for Type-3 families with equal number of offspring (MVN model) ... 82 4.2 Range of Pr(y = 1) for each combination of /?o and B\ (MVP model) 92 4.3 ARE of the CL1 and CL2 estimators for the exchangeable case at different levels of Bx and p (MPLN model) 103 4.4 ARE of the CL1 and CL2 estimators for the exchangeable case at different levels of a1 and p (MVPL model) 104 4.5 ARE of the CL1 and CL2 estimators for the exchangeable case with family size varying from 1 to 4 (MVN model with right censoring) 105 5.1 Contingency table formed by independent parent-offspring pairs . 109 6.1 Interclass odds ratio in intertriginous freckling 132 6.2 Intraclass odds ratio of intertriginous freckling 3 6.3 Number of affected pairs included in the analysis of presence of subcutaneous neu rofibromas in NF1 patients 134 6.4 Parameter estimates of the Probit model for the presence of subcutaneous neurofi bromas in NF1 patients. The three dependence parameters are: pss (sib-sib corre lation), ppo (parent-offspring correlation) and p2+ (correlation between any relative pair higher than 2nd degree) 135 6.5 Parameter estimates of the Probit model with CI structure for the presence of sub cutaneous neurofibromas in NF1 patients 135 6.6 The ML estimates of the two-component series model for presence of subcutaneous neurofibromas in NF1 patients . 137 viii 6.7 Probabilities and correlations of the heritable component in the two-component series model for the presence of subcutaneous neurofibromas in NF1 patients. Index I is the index of a parent and j, j' are the indices of Z's two children 137 6.8 Number of affected pairs included in the analysis of age at first symptom of NF2 . . 139 6.9 Parameter estimates of the MVN model for age at first symptom of NF2 140 6.10 Number of affected pairs included in the analysis of number of meningiomas in NF2 patients 146.11 Summary of meningioma counts in NF2 patients 142 6.12 Parameter estimates of the multinormal copula model with negative binomial margin for number of meningiomas in NF2 patients 143 6.13 Parameter estimates of the multivariate Poisson-lognormal mixture model for num ber of meningiomas in NF2 patients ; 144 6.14 Expected number of meningiomas based on the multinormal copula model with negative binomial margin (NBMC) and the multivariate Poisson-lognormal mixture (MPLN) model 145 A.l Covariances between relatives in a simple Mendelian model 152 ix List of Figures 1.1 Diagram of a Type-I pedigree 3 1.2 Diagram of a Type-II pedigree 4 2.1 Path from i to j. If ji and J2 are full siblings, the path is (i,... J2, • •., j); whereas if ji and J2 are half siblings, the path is (i,... ,ji,jo,J2, • •• ,j) 20 4.1 Efficiency of CL1 for exchangeable dependence structure. (MVN model) 77 4.2 Efficiency of CL1 for Type-3 families with k* = 3 offspring (MVN model); p2 = {p2 — p?)/(l — Pi) is the sib-sib correlation conditional on their parent 78 4.3 ARE of /?I,CLI against average number of offspring, k* (MVN model) 80 4.4 Efficiency of PO,CLI, &CLV PifiLi and P2,CLI (MVN model) 81 4.5 Efficiency of CL1 and BCL for the exchangeable case. Families are of size two and six with proportion 3:1 (MVN model) 83 4.6 Efficiency of PCLI, PBCL and pcm (MVN model); family sizes are 2 or 6 with pro portion 3:1. The CL2 estimate is weighted by family size; whereas the BCL estimate is not 87 4.7 Efficiency CL1 and CL2 for Type-3 families. The family size varies; the number of offspring is from 1 to 5 with equal proportion (MVN model) 88 4.8 ARE of A),CLI and p\,CL\ when family size k = 2. (MVP model) 93 4.9 ARE of /3cLi) PWCLI an(l hem f°r *ne exchangeable case with family size varying from 1 to 4 (MVP model) 95 4.10 ARE of PCLU PWCLI and pcL2 for the exchangeable case with family size varying from 1 to 4. (MVP model) 6 x 4.11 ARE of J3CL1, hwchx and PcL2 f°r Type-3 families, at pi = 0.3 (MVP model). Solid line: CL1, dashed line: WCL1, dotted line: CL2 97 4.12 ARE of Pcii? PWCLI and PCLI at Pi = °-3 for Type-3 families (MVP model). Solid line: CL1, dashed line: WCL1, dotted line: CL2 98 4.13 ARE of /3Cil, PWCLI and PcL2 for Type-3 families, at p\ = 0.6 (MVP model). Solid line: CL1, dashed line: WCL1, dotted line: CL2 99 4.14 ARE of PCLII PWCLI and PCL2 f°r Type-3 families, at p\ = 0.6 (MVP model). Solid line: CL1, dashed line: WCL1, dotted line: CL2 ' 100 5.1 ARE of 7po based on a multivariate binary beta-binomial model when ct\ = BQ = a,a0 = BX = 3 118 5.2 ARE of 7po based on the multivariate Probit model 119 5.3 Efficiency of ypo versus the variance-mean ratio (V/M) of family size 120 5.4 Efficiencies of jss estimators (multivariate binary beta-binomial model). Solid line: no weights, dotted line: optimal weights, dashed line: simple weights 127 5.5 Efficiencies of 7SS estimators (multivariate Probit model). Solid line: no weights, dotted line: optimal weights, dashed line: simple weights 128 5.6 Efficiency comparison of 7SS estimators with varied proportion of families with size 2 and 6. Solid line: no weights, dotted line with "+": optimal weights, dashed line: simple weights. In the figure on the right hand side, the dotted line with "+" and the dashed line are coincident 129 6.1 Region of Pn|3 and Pn\4 (intraclass odds ratio) 133 6.2 Predictions of the prevalence of subcutaneous nuerofabromatosis in NF1 patients by the probit model and the two-component model 138 xi Glossary 1. pmf: probability mass function pdf: probability density function cdf: cumulative distribution function MLE: maximum likelihood estimation (or estimator) CL: composite likelihood MVN: multivariate normal i.i.d.: independent and identically distributed AVar: asymptotic variance 2. 4>, <&: the pdf and cdf of univariate standard normal distribution 4>k{ • ; fi>, S), $fc( • ; p, _): the pdf and cdf of fc-variate normal distribution with mean // and covariance matrix S 3. V: maximum A: minimum J.: independent 4. 91fc fc-dimensional real space 5. —>: converge in probability —>• converge in distribution 6- _ipd: positive definite ordering, i.e. if A _P_; B, then A — B is non-negative definite. xii Acknowledgements I would like to express my deep gratitude to my supervisor, Professor Harry Joe. Without his support and guidance, this thesis would not have been completed. Very special thanks are due the other members of my supervisory committee, Professor John Petkau and Professor Nancy Heckman, for their invaluable advice and encouragement. Also, I thank Professor Jan Friedman for his helpful advice and for providing us with the NF data used in this thesis. Additionally, I would like to thank Dr. Gareth Evans and Dr. Michael Baser for providing the family information of the NF2 data. I am thankful for the simulating environment of the department. It is a pleasure to acknowledge helpful discussions with my fellow graduate students: Jochen Brumm, Rachel MacKay, Weiliang Qiu and Rong Zhu and all of the others. Finally, I thank Steven Ruuth for his expertise in numerical methods. YINSHAN ZHAO The University of British Columbia Mar 2004 xiii To my parents and my brother xiv Chapter 1 Introduction This thesis is concerned with models and estimation methods for familial data that may be discrete or censored. The major contributions of this thesis are: (1) a new class of models called conditional independence models, (2) estimating approaches for models which are limited in practice due to computational difficulties, and the asymptotic properties of these approaches, and (3) a nonpara-metric method to estimate familial odds ratios for binary traits. In this chapter, we first introduce the structure of familial data being considered, followed by a brief overview of some genetic back ground and two genetic disorders which are used as examples in this thesis. Then, we address the challenges in analyses of discrete or censored familial data. An outline of this thesis is given at the end of this chapter. 1.1 Familial Data and Notation Many genetic studies seek to understand how different genetic and environmental factors affect a trait or condition. These studies examine a sample of a large number of families. The data obtained from such a study come from measuring trait variables and other variables of interest, such as age and treatments, from one or more members of each family, and therefore the data are called familial data. This thesis presents only models for a single response measurement on each individual. This measurement can be any one of several different types: (1) quantitative, such as body size or blood pressure; (2) survival, i.e. a realization of nonnegative random variable, with possible censoring, such as life time or onset age of a disease, normally subject to right censoring; 1 (3) count (non-negative integer), such as number of tumors; (4) binary, such as absence/presence of certain feature; (5) ordinal, such as level of severity of a disease. We denote the number of families in the sample by n and the number of members in the ith. family by k{. Then Yj = (Yn,..., Y^)' is the vector of measured values of a single trait on the members of the ith family and Xj = (Xji,..., ) is a ki x m covariate matrix containing variables which may influence the trait. When we refer to a family in general, the family index i will be omitted. For familial data, trait values for individuals in different families are considered independent. Members in the same family are expected to resemble each other more than members in different families. Sometimes, the familial resemblance is the primary interest of a study. Other times, the primary interest rests on other factors. However, in order to obtain correct statistical inferences, the familial dependence has to be taken into account. From the statistical point of view, analyses of these data involve two tasks: first, constructing a statistical model which (1) describes the relation between the trait and the covariates; (2) quantifies the degree of association between different types of relatives, and making statistical inferences based on the model. In this thesis, we focus on non-normal responses, more particularly, survival, binary and count responses. In the rest of this thesis, the term "non-normal responses" refers to these three types of data. Before we proceed, we define the family types considered in this thesis. In familial analysis, the sample unit is a family. In the literature, "familial data" sometimes only refers to data with nuclear families as the sample units, i.e., to data obtained by sampling independent families con taining two parents and their offspring. The familial data considered in this thesis are not limited to nuclear families. The data structure under consideration varies according to the sample unit. Here we define the types of families which may be considered as sample units. The first four types are small families with one or two generations: Type-1 a pair of relatives: typically twins, parent-offspring pair or sibling pair. Type-2 a group of full siblings: the size of each group could vary from family to family. Type-3 one parent and his/her offspring. Type-4 a nuclear family: two parents and their common offspring. 2 The remaining two types of sample units are pedigrees with multiple generations. In general, we only consider pedigrees without twins or half siblings. Discussion will be given later on how to generalize our models to include those individuals. Type-I pedigree: a pedigree formed by family members who share a common ancestor. The spouse of a family member is not included. Thus, only one parent of each individual is included in the pedigree. Such a pedigree can be illustrated by a tree as shown in Figure 1.1. The root of the tree is called the founder of the pedigree who is the ancestor of all the other members. We define the following notation to describe this type of pedigree. For individual i, let Ai = (ii,t2, • • •) be the set of i's direct ancestors with its elements indexed so that i\ is i's parent, it is «_.'s parent, and so on. For the founder, Ai = 0. For two individuals i and j, we define A^j — Ai\Aj, namely, Ai\j is the collection of all the individuals who are the ancestors of i, but not j. If i and j are siblings, Ai\j is empty. Otherwise Ai\j — ... ,igij), where igij is the most distant ancestor among all the ancestors of i who are not shared by j. Also we define A'^ as the same collection but in reverse order. Let D. be i's descendants set. Clearly, (i, A) is also a tree with i being the root. Let 5. be a sibling group whose parent is i and T — {i : Si ^ 0}, i.e. the set of indices corresponding to parents. The parent i combined with his/her offspring (Si) is called a family unit, the basic unit of a pedigree. Each family unit is a Type-3 family. We index a family unit by the index of the parent and denote it by Ui = (i, Si). Two units overlap, if and only if an individual is an offspring in one of the units and the parent in another unit. In Figure 1, there are three family units and unit 1 overlaps with both units 2 and 3. Unit Uhitl 'nit 3 Figure 1.1: Diagram of a Type-I pedigree 3 Type-II pedigree: a pedigree formed by family members of a common ancestor, plus possibly their spouses. The spouses are called marry-in's and do not have their ancestors in the pedigree. To describe such a pedigree, we need to modify the previous notation. Let mi be the index of the spouse of individual i. The family unit is (i, m,i, Si), a Type-4 family. Such a pedigree is illustrated in Figure 1.2. Suppose individual 1 is the founder. Individual 2, 3 and 6 are marry-ins. Uni Uhit 1 Uhit 3 Figure 1.2: Diagram of a Type-II pedigree The type of sample unit involved in a study depends on the purpose of the study. For example, data with Type-I pedigrees are often collected from a population affected by an autosomal dominant disorder, caused by alteration or mutation of an autosome (i.e. chromosome 1 - 22, non-sex chromosome). Within a family, affected individuals normally share the same mutation. However, commonly even within the same family there is marked variation in the expression of the disorder. The focus of the study would be the reason for the variable expression. On the other hand, data with Type-II pedigrees are collected from a general population. Therefore, Type-I pedigree and Type-II pedigree data require different model assumptions. A model for Type-I pedigrees is not necessarily a special case of a model for Type-II pedigrees. In data with Type-I pedigrees as sample units, information is sometimes available only on a few individuals, for example a sibling group. In this case, even though the family only contains siblings, it should be considered as a pedigree. 4 1.2 Genetics Background To comprehend the genetic basis of familial resemblance, it is crucial to understand the transmission of genetic information from generation to generation both at the individual and population level. Mendelian laws of transmission tell us how a gene is passed from the parents to their offspring: During gamete formation, the two alleles of a gene separate from each other and go into different gametes. The segregation of one gene pair is independent of other gene pairs. (The second part is not true if two genes are closely located on the same chromosome.) According to Mendel's laws, an offspring receives half of its genetic materials from the father and the other half from the mother. In genetics, the specific allelic composition is called a genotype. Given the parents' genotype, the offspring's genotype is not dependent on other ancestors. The more closely related are two individuals, the more likely they share a gene by descent. On the population level, we consider a population which is infinite in size and randomly mating with no inbreeding. We will further assume that the effects of migration, selection and mutation on the population are insignificant. Although such an idealized situation is never realized perfectly, it is close enough to the truth for practical purposes. With these assumptions, the gene frequencies and the genotype frequencies remain constant from generation to generation. This characteristic is described by the Hardy-Weinberg equilibrium (Lange, 1997). Genotypes often cannot be easily observed, but their effects are reflected on an observable character or characters (i.e., trait), such as eye-color, or fingerprints. The form of the observable character (or the value of the trait) is called a phenotype. It is important to understand how genotypic difference affects phenotypic difference. There are different mechanisms. The simplest case is that a trait is completely determined by a single gene. Then the phenotypic difference can be explained solely by the genotypic difference. More often, the trait is not just affected by genetic factors, but also affected by random environmental factors. Then there are two sources of variation: genetic and environmental. Normally these two sources are treated independently in a statistical model. In reality, there are few traits determined by a single gene. More generally, a trait can be affected by several genes jointly, or by a large number of genes, each of them with a small effect and assumed to be acting independently. It also can be a combination of the two cases: the trait is affected by one or more genes with major effects and a large number of genes with small effects which are assumed to be additive. Each of these mechanisms leads to different resemblance patterns 5 among relatives. The basic idea is that if a trait is determined by genes and random environmental factors, the resemblance between relatives is determined by (1) the possibility of their sharing of the responsible genes and (2) the proportion of variation due to genetic factors against that due to environmental factors. 1.3 Neurofibromatosis Type 1 and 2 This thesis is motivated by problems that arise from several studies on two genetic disorders, neurofibromatosis type 1 (NF1) and type 2 (NF2). The methodologies developed in this thesis are illustrated in analyses of different features in NF1 and NF2 patients. Therefore we give a brief introduction of these two disorders. Both NF1 and NF2 are autosomal dominant genetic disorders. Common features of both are nerve tumors. Although both are called neurofibromatosis, their clinical manifestations and genetic origins are distinct. The gene for NF1 is located on chromosome 17. Features, such as intertriginous freckling, neurofibromas of different types, Lisch nodules and learning disabilities are common in NF1 patients. The gene for NF2 is located on chromosome 22. The hallmark of NF2 is a nerve tumor called bilateral vestibular schwannoma, but other tumors such as meningiomas and nonvestibular schwannomas are also common. Both disorders have highly variable expressivity manifested in many different ways, including variation in age at onset, types and numbers of clinical features, overall disease severity, and rate of progression. Clinical studies indicate that the expression tends to be similar within a family. But substantial phenotypic differences are also observed within families despite the fact that family members share an identical mutant allele. The reason for the variability in the disease expression is unknown. Clearly, variation in the mutant allele itself does not account for all of the variability. Many different genetic and nongenetic causes may exist and act alone or in combination. Comparison of the strength of association between different types of relatives will help us to identify the sources of the variation. 1.4 Motivation and Background for Statistical Models The statistical analysis of familial association started early in the 20th century. In 1918, Fisher published his first and also one of his most important papers in genetics, The Correlation Between 6 Relatives on the Supposition of Mendelian Inheritance. In this paper, he showed how the pheno typic variation could be partitioned into heritable and non-heritable components, and therefore established the link between the association observed between relatives and the underlying genetic mechanism. This paper along with the work of Haldane (1932) and Wright (1921) laid the founda tion of quantitative genetics. Since then, sophisticated models for quantitative traits (continuous variables) have been developed, mainly based on the normal distribution. (A detailed discussion is given in Appendix A.) In addition, there are papers on estimation procedures for interclass and intraclass correlations for familial data, for example, Donner and Koval (1980), Kleffe (1993) and Donner et al. (1998). These estimation procedures are based on the MVN distribution assump tion without adjustment for covariates and mainly apply to data with Type-2, Type-3 or Type-4 families. Cases with covariates are also considered by Srivastava and Ng (1990). As for traits of other kinds, some work is scattered in the literature, but in terms of both quantity and maturity of this work there is no comparison with the well established normal models for quantitative traits. Many open questions remain. Difficulties in analysis of non-normal familial data mostly lie in two aspects. The first aspect is the lack of models which can provide a flexible dependence structure able to reflect the nature of familial resemblance, i.e., the degree to which the dependence of two relatives depends on their biological relationship. Generally, that dependence is stronger for two closely related individuals and weaker for two distantly related individuals. When the families under consideration are arbitrary, the dependence structure varies from family to family. Many models existing in the literature can only be applied to certain types of families. Other models may provide a flexible dependence structure, but give results with no natural interpretation. The second aspect concerns the computational problems in parameter estimation. Many of those multivariate models do not have a simple closed-form likelihood. Therefore, the maximum likelihood estimators (MLEs) are generally difficult to obtain. Even though there are numerical techniques and approximation methods available, those methods may not be stable when the family size becomes large. Below are some examples for binary, count and survival familial data. The latent threshold method was introduced to analyze a binary trait (Falconer, 1965, 1967). Normality is assumed for the latent variables, leading to a multivariate probit model. A pleasing aspect of this model is that it is well connected to quantitative genetics and therefore provides a meaningful interpretation of the parameters. It also admits a wide range of dependence structures. 7 Moreover, it can be extended to analyze an ordinal response (Manatunga and Williamson, 2001). However, its drawback is clear as well. When family size is large, likelihood evaluation involves high dimensional numerical integrations. Other approaches such as regressive logistic models (Bonney, 1986) and conditional logistic regression models (Connolly and Liang, 1988; Tosteson et al., 1991; Abel et al., 1993) have been developed. However, as Tregouiet et al. (1999) pointed out, the results from these models may depend on the size of the family or the ordering of the family members. They propose a model using a parametric copula which overcomes this problem, but their model only applies to nuclear families. A general approach to modeling correlated count data is the Poisson mixture model, in which the Poisson parameters are modelled by fixed and random effects (Cameron and Trivedi, 1998). If the random effects are assumed to have a gamma distribution, the joint distribution has a closed form, but with only one level of dependence. To achieve a more flexible dependence structure, multivariate log normally distributed random effects have been considered in modelling animal breeding (Tempelman, 1996; Tempelman and Gianola, 1999). Again, computational difficulties arise in parameter estimation for such a model. The multivariate Poisson model is also used to model bivariate count data (Xu, 1996; Joe, 1997). In such a model, each count variable is the sum of two independent Poisson random variables, one of which is shared within the sample unit. The dependence comes from the shared component. Similar models can be formed with the multivariate negative binomial or the generalized Poisson distributions. However, this type of model is difficult to extend to accommodate the dependence structure of a complex pedigree. In recent years substantial research has been devoted to developing methodologies for clus tered survival data. An example is the univariate frailty model (Vaupel et al., 1979) where the frailties induce an exchangeable dependence structure. Clearly, it is limited by its unduly simple dependence structure. It is possible to allow more than one frailty in the sample unit, so that a nested or hierarchical dependence structure can be obtained (Bandeen-Roche and Liang, 1996; Joe, 1993). However, this can only be applied to certain types of familial data, such as Type-3 families. Pickles et al. (1994) as well as Yashin and Iachine (1995) propose a correlated frailty model. In this model, instead of sharing a common frailty, each individual has his own frailty correlated with frail ties in the same family. As in the multivariate probit model and log-normal Poisson mixture model, the likelihood function is difficult to evaluate. Later, Korsgaard and Andersen (1998) proposed an 8 additive genetic gamma frailty model. The frailties are constructed as a summation of independent gamma variables. This model provides a dependence structure similar to the polygenic model (see Appendix A.2). But there is no natural explanation of the decomposition. In this thesis, we devote our efforts to overcoming these challenges. We consider different modelling approaches which can lead to models that are both practical and interpretable, including random effects models, multivariate normal copula models and a family of new models called conditional independence models. Conditional independence models are constructed based on the assumption that the trait values of two non-sibling relatives are independent conditional on their parents. This approach reduces the task of modelling a complex pedigree to modelling a family unit containing only the parents and their offspring and simplifies the computation in estimation. For models where the MLE method is difficult to apply, we propose two alternative estimating procedures based on composite likelihoods and investigate the asymptotic relative efficiency of the estimates. We also propose a nonparametric method to estimate the interclass and intraclass odds ratios for a binary trait. The next section is an overview of the work included in this thesis. 1.5 Outline of the Thesis The rest of this thesis is organized as follows. In Chapter 2, we discuss three different modelling approaches for familial data. The first approach uses random effects to construct the dependence structure, and the second uses the multivariate normal copula. These two existing approaches have been applied to familial data or other multivariate data. In this thesis we provide a comprehensive summary of these models in the context of familial data analysis. Chapter 2 also serves as background for Chapter 3, in which estimation procedures are addressed. We then present the conditional independence models and study the properties of this model family. After the general introduction of the modelling approaches, we study some specific models for binary, count and survival responses. In Chapter 3, we propose two likelihood-based estimation methods. The first approach is a two-stage method in which univariate marginal parameters are estimated in the first stage based on the likelihoods of the univariate marginal distributions, whereas the dependence parameters are estimated in the second stage based on the bivariate marginal distributions with the marginal 9 parameters fixed at their estimated values from the first stage. In the second approach, all the parameters are estimated simultaneously based on the likelihoods of the bivariate marginal dis tributions. Both methods yield asymptotically consistent parameter estimates. In Chapter 4, we investigate the performance of the two methods. The asymptotic efficiencies of the estimates are compared with that of the MLEs when the MLEs can be obtained. Our results show that both methods behave well except that the two-stage method can be inefficient for the estimation of the regression coefficients when the dependence is strong. Chapter 5 deals with estimation of intraclass and interclass odds ratios for a binary trait. The problem arises from a study in NF1 patients. We propose a nonparametric method to esti mate intraclass and interclass odds ratios using relative pairs and derive the asymptotic variances. Asymptotic efficiency is compared with that of the MLE based on a parametric model. In Chapter 6, the models and inferential approaches studied in the previous chapters are applied to datasets of NF1 and NF2 patients. In Chapter 7, the final chapter, we discuss some future research topics related to this thesis. 10 Chapter 2 Models for Non-normal Familial Data In this chapter, we introduce models for non-normal familial data. We will focus on three different types of traits: binary, count and survival. A general assumption underlying the models is the "common univariate margin" assumption: the univariate marginal distribution of each individual belongs to the same family of distributions, say, Yij ~ Fi( • \6ij). In familial data analyses, it is common, but not necessary, to assume that the dependence structure between siblings is exchangeable and the order of the siblings is irrelevant to the distribution. The models introduced in this chapter can be classified into three categories: models with random effects which are multivariate normally distributed, multinormal copula models and condi tional independence models. The first two classes of models were originally developed for multivari ate data under different circumstances. They are feasible for familial data analysis because they provide a wide range of choices of distributions and a flexible dependence structure. One can find a considerable amount of research on the properties of these models in the literature. Therefore, we provide only a brief summary of those properties which are important to familial data analysis. Models of the first two classes are limited in practice due to computational difficulties. So we develop a third class of models based on a conditional independence assumption. The dependence structure generated by this approach is an approximation to the dependence structure under poly genic effects. The conditional independence assumption enables one to easily carry out the task of model fitting. Models from this class have not been studied in the literature. Therefore, the properties of this class will be discussed in detail. The first part of this chapter is an overview of the three classes of models. General prop-11 erties such as dependence structure and likelihood are discussed for each class. Then we study some specific models for each type of trait we consider, including: multivariate probit (MVP) and two-component models for binary traits; Poisson-lognormal mixture, multinormal copula and conditional independence models for count traits; multivariate lognormal, multivariate lognormal frailty and multivariate copula models for survival data. 2.1 General Introduction of Model Classes 2.1.1 Models with Multinormal Random Effects (Mixture Models) Definition A random effects model has a hierarchical structure. The first level specifies the dis tribution of the observed response vector, Y, conditional on an unobservable random vector A = (Ai, Afc). Given A = A = (Ai, Afc), we assume that Y_, ...,Tj. are condition ally independent with pdf (if Y"j is continuous) or pmf (if Y. is discrete) g(yf, A., a.). Note that the a. represents other parameters, if they exist, which are not random. The second level specifies the distribution of A. We assume that h(A) = Z ~ N(fi, _), for some elementwise function h. Applications and Examples Such models include the models Tempelman (1996) proposed for animal breeding, in which multivariate normal random effects were incorporated into a negative binomial distribution, and the multivariate lognormal frailty model applied to animal data by Korsgaard, Madsen and Jensen (1998). We describe two examples in the later sections: the multivariate Poisson-lognormal (MPLN) (Section 2.3.1), and the multivariate lognormal frailty models (Section 2.4.2). In both cases, h(A) = (log Ai,..., log Afc)'. The parameter A. is linked to the conditional mean in the MPLN models and to the hazard in the frailty models. Dependence Structure The dependence among the Y.s comes entirely from the dependence among the Z.s. Let Ui(Zi) = E(Y;|Z) = E(Yi\Zi). Then Cov(y_,Yj) = Cov(E(Y.|Z),E(Y)|Z)) + E(C<JV(Y_, Yj|Z)) = Cov(i/_(Z_), Vj{Zjj), 12 since E(Cov(Y";,Yj|Z)) = 0 from the conditional independence assumption. Intuitively, the corre lation between Yj and Yj increases with the correlation between Zi and Zj. However, the range of the correlation between Y> and Yj is generally smaller than [—1,1]. Likelihood The unconditional joint pdf/pmf of Y = y is then given by /(y) = /(y;Ax,£)= f {TT»(yi;^(2i)."i)}^*(»;M,s)dz. (2.1) Unfortunately, there is no closed form for the above integration, except for a few special cases. With an arbitrary S, use of multi-dimensional numerical integration is inevitable in evaluating the likelihood. Advantages and Disadvantages The advantage of normal distributions for the random effects is clear — its great flexibility in dependence structure. It also establishes a link to traditional quantitative genetics. This provides a meaningful interpretation of the models. We might consider Zj as a latent quantitative trait under the influence of polygenic and multiple environmental factors. The mean vector of Z, /z, can be specified as a linear function of the covariates, i.e. fx = X/3. The covariance matrix of Z, S, can be partitioned into different correlation components according to the assumptions on genetic and environmental influences, as described in (A.3). The major disadvantage is the computational difficulty in parameter estimation. Even with today's powerful computers and numerical methods, multi-dimensional integration is formidable when the dimension is higher than four. To overcome this obstacle, we need to consider estimating methods which are less computationally demanding than ML. Some likelihood-type approaches and their performance are investigated in Chapters 3 and 4, and the use of those approaches is illustrated in Chapter 6. Another disadvantage of using normal random effects is the limitation in the type of the distributions. When we assume a parameter of the conditional distribution is random, the shape of the resulting marginal distribution differs from that of the conditional distribution. For example, the Poisson lognormal mixture has a marginal distribution which is much more skewed than the Poisson distribution. In the next section, we introduce the multinormal copula model which is more 13 flexible in modelling the marginal distributions. 2.1.2 Multinormal Copula Models Definition The theory of copulas provides a means of constructing various multivariate distributions. The joint distribution is formed by combining the marginal distributions through a copula, a multivariate distribution function with {7(0,1) univariate margins. (Properties and parametric families of copulas can be found in Joe (1997) or Nelsen (1999).) Among the extensive parametric families of copulas, the multinormal (or Gaussian) copula provides a wide range of dependence and therefore is suitable for familial data analysis. Suppose Yj has univariate marginal cdf Gj. The joint distribution of Y constructed using a multinormal copula is G(y) = MS-MGifoi)], • • •, ^[GkiVk)}; R(a)), (2.2) where R = (pij) is a ^-dimensional correlation matrix. The cdf Gj can be chosen as any family of univariate distributions. For example, for count data, we can choose the Poisson or negative binomial as the marginal distribution; for survival data, we can choose the Weibull distribution. Applications and Examples Applications of multinormal copulas can be found in longitudinal data analysis (Xu, 1996; Song, 2000), but not in familial data. Examples of multinormal copulas are given in Sections 2.2.1, 2.3.2 and 2.4.3. Dependence Structure Since pij = Corr($_1[Gj(Yj)], $_1[Gj(Yj)]) for fixed Gj and Gj, Corr(Yj, Y,) is a monotone transformation of p^. In fact, that Corr(Yj, Yj) and pij are very close to each other (Song, 2000). When p^ = 0, we have independence between Yj and Yj. For the bivariate case, when p^ = 1 or —1, the most extreme positive (or negative) dependence between Yj and Yj is achieved (see discussion of the upper and lower Frechet bounds in Joe, 1997, Chapter 3). For continuous Gj, when p^ = 1, Yj and Yj are perfectly dependent. Moreover, if Gj — Gj or Gj and Gj are related 14 by a location/scale shift, then the dependence is linear and Corr(Y_, Yj) = 1 as well. For discrete Gi, assuming that Yj and Yj have same support, Corr(Y., Yj) = 1 only if p.j = 1 and G. = Gj. Likelihood The response Yj can be treated as a transformed standard normal random variable. Let Z = (Z_,...,Zfc)' ~ N(0,R). When G. is continuous with corresponding density function gi, Yi = G~l(${Zi)). Let Zi = $_1(G.(y.)) and z = (zi, z*)'. The joint density function of Y can be expressed as When Gi is discrete, the transformation from Z to Y is discrete. The joint pmf of Y involves an integral of the multinormal density over a A;-dimensional rectangle. Multidimensional integration is also required to evaluate the likelihood function when censoring is present in survival data. Thus, the application of multinormal copula models is limited in practice. Advantages and Disadvantages One advantage of the multinormal copula model is that the marginal distributions and the joint distribution can be modelled separately. Combining the copula and a variety of univariate marginal models, a large number of multivariate distributions can be constructed, including both continuous and discrete distributions. Moreover, in models formed by a copula, the parameters which describe the association between the variables can be easily separated from those of the univariate marginals. This allows us to partition parameters into two groups: one group only related to the univariate marginals and the other only related to the copula. Another advantage of this class of model is its flexible dependence structure. The correlation matrix R = (p.j) can adopt the structures described for quantitative traits in Chapter A. Since the correlations between the responses are close to those between the Zs, the dependence structure generated from such a model is similar to the dependence structure of a quantitative trait. As mentioned before, the major disadvantage of the multinormal copula model is its com putational infeasibility. 15 2.1.3 Conditional Independence (CI) Model In this section, we present a new method to construct models for traits influenced by many genes and other factors. We consider this class of models when the primary interest is to determine if there is a familial correlation and how strong it is, but not to identify all the variance components. The motivation of this model is to simplify the task of modelling and computing the likelihood. There is a practical reason to consider this class of models as well. In data analysis, the information provided by the data might not be enough to identify every variance component. For example, suppose most families in a dataset are small containing no more than two generations, while only a few families are large with multiple generations. Under this scenario, it is impossible to obtain satisfactory estimation of the associations between distant relatives. So we focus on modelling the two most important dependencies, parent-offspring and sib-sib dependence, while approximating the association between other relatives. The main idea of the conditional independence models is that the dependence between two relatives of second degree or higher comes through their parents. Conditionally on the status of their parents, the two individuals are assumed to be independent. However full siblings and monozygotic twins need to be treated differently. Because of the polygenic and environmental effects, they tend to resemble each other more than other types of relatives because (1) they share more genetic material and (2) they are more likely to share a common environment. For these reasons, we allow stronger dependence among them. We begin with data formed by Type-I pedigrees without monozygotic twins or half siblings. We first state the conditional independence assumption. We then derive the dependence structure and the likelihood of a model under the CI assumption. Finally we extend the model to include monozygotic twins and half siblings and Type-II pedigrees. Examples for binary and count traits are given in Sections 2.2.2 and 2.3.3, respectively. The notation used in the following sections was introduced in Section 1.1. Conditional Independence Assumption for Type-I Pedigree The conditional independence assumption is stated as follows: If a sibling group Si is not empty, then conditionally on the value of the parent, Yi, (1) (Yj : j € Si) are non-negatively dependent; (2) (Yj : j £ 5.) and {Yj> : j' £ Z>.) are mutually 16 independent. Under this assumption, the Ys corresponding to a group of siblings are allowed to be pos itively dependent on each other conditional on the value of their parent. Normally we assume an exchangeable dependence structure among siblings. From this assumption, two important properties follow and the likelihood and dependence structure of a CI model can be derived from these two properties. Properties of the CI Model for Type-I Pedigree Property 1 The joint pdf/pmf ofY can be written as the product of the marginal pdf of the founder and the conditional probabilities of all the sibling groups given the value of their parent. Let 1 be the index of the founder. The joint pdf/pmf of Y is /(y; X, 0) = /i(yi; Xi, 0) J] /^(ys,; Vi, XSi, 9), (2.3) ier where fi is the univariate marginal pdf/pmf ofY\ and fs^Yi *s the conditional pdf/pmf of the sibling group Si, Xi is a vector of covariate values of the founder and Xs; is a matrix of covariate values of the sibling group Si. As a result, once we specify the joint distribution of the family units, we can derive the likelihood of the whole pedigree. Assuming exchangeable dependence among siblings, a family unit has two levels of dependence, parent-offspring dependence and sib-sib dependence, and can be modelled with a nested dependence structure — the sibling group is nested within a family unit, since generally the sib-sib dependence is stronger than the parent-offspring dependence. Property 2 For two individuals j and j', if j is individuali's descendant, i.e., j £ Di, while j' is not i's descendant, i.e., j' Di, then Yj JL Yj'|Y;. Proof: For discrete Yj, we need to show that Pr(Yj = yj\Yi = yi,Yj> = yj>) = Pr(Y7- = yj\Yi = ?/j). This is true when j is i's child. Now suppose j is not i's child. Then let (ji,.. .,JL) = Aj n Di, i.e., the collection of j's ancestors who were descended from i. From the way js are indexed, j\ is 17 ji's parent, J2 is jis parent and so on. So the last element ji is-Vs child. Pr(Yj = yj|Y; = yu Yj, = yj,) • • • > ijL = VJL \Yi = y» Yj, = yj,) yji yjL = E---.EPr(yi = wi1s-i Vii) PrOjfi = Vji \YJ2 =Vh)--- P*(YjL = Vji.\Yi = Vi) = E---EPrft- = wiy* Vh)---^(Y0L =VJL\Yi=yi) Viz VJL = Pr(Yj = yj\Yi = y_) by induction. An analogous derivation holds for continuous Y if we replace summations by integrations and probabilities by densities in the proof. The second property enables us to determine the dependence between any two family mem bers once the model for family units is specified. If we take two individuals, i and j, from the family tree then there are two possibilities: (1) both are on the same branch, i.e., one is the other's ancestor; (2) they are on two different branches, i.e., they are not ancestor-descendant. Case 1: Consider the first case. Suppose individual^ is individual j's descendant. We define the path from i to j as (i,Ai\j) (note that igi<. — j). It follows directly from Property 2 that the path satisfies the Markov property. We can, therefore, view Y. and Yj as two points on a Markov process separated by gij steps. Case 2: Now consider the second case when individual i and j are on two branches. Suppose the two branches meet at k. Then k is the first common ancestor shared by i and j. To go from i to j, we can follow the path described below: start from i and follow Vs ancestors up to the one who is /c's child, then go to another A;'s children who is jf's ancestor, from there descend to j. Expressed in our notation, this path is (i, A^j, A'.^vj). Next we show that it satisfies the Markov property. There are three possibilities: (1) Both are fc's children. Then Aj\j and A1-^ are both empty and i is directly connected to j. (2) Only one is fc's child. Suppose it is i. Then Ai\j is empty. The path is (i^A'-^j) = (Yj, Yj ., Yj1, Yj). From Property 2, the stretch from Yjg.. to j satisfies the Markov property and Y. ± Yjg. ,_1 \Yjg.. (since i is not jgj i descendant, while j9ji-i is). Therefore the whole path satisfies the Markov property. (3) Neither is fc's child. Then the path is (Yj, Y^, ..., Yig. , Yj ., ..., Yj1, Yj), where i9i j and j9j i are fc's children. The two 18 stretches from Yj to Yjs. . and from Yjg.. to Yj satisfy the Markov property. If we write IQ = i, we have Yj _j J_ Yjsiii|Yj9iJ.. Therefore the Markov property is satisfied everywhere. The dependence between any two individuals in a pedigree can be derived based on the Markov property. The correlation structure will depend on the distribution under consideration. Example 2.1 Suppose that the Ys are normally distributed. Suppose that the parent-offspring and sib-sib correlations are specified as ppo and pss, respectively. For case 1, variables on the same branch can be thought of an autocorrelated process with lag one correlation ppo. The correlation between individuals i and j is ppo3. For case 2, the variables on the path from i to j are also autocorrelated but with correlation pss between Yj9. and Yj ., while pp0 elsewhere. The correlation is ppo3 +93'lpss when i and j are on two branches. Thus, the correlation between grandparent and grandchild is ppo, the correlation between uncle and nephew is ppopss, and the correlation between first cousins is p2popss-From the above example, we see that if p\ and p2 are positive, the dependence between two family members is always positive, but declines by generation. The same pattern will be shown in our other examples later. Extension to Include Monozygotic Twins and Half Siblings To extend the ideas to families with monozygotic twins, we can consider three levels of dependence within a family unit. In this case, each family unit can still be considered as a nested cluster, but with three levels—the twins are nested in the full siblings group. Again, let us use the normal distribution as an example. Suppose ji and ji are twins and the correlation between them is ps. Individual i is the gjth descendant of ji and j is the gjth descendant of J2- Then the correlation between i and j is pg^+93 p$. The correlation between other individuals remain the same. It is also straightforward to extend the model to include half siblings. We assume that the half siblings are independent conditionally on their common parent. Under this assumption, half siblings are treated as second degree relatives. Suppose j\ and J2 are half siblings with common parent jo, i is the gjth descendant of j\ and j is the gjth. descendant of J2- Unlike the full siblings, the path between i and j will go through jo as shown in Figure 2.1. So Corr(Yj,Yj) = p^,+s-'+2 under the normal distribution. 19 j1 =|j2 jo Full siblings Half siblings Figure 2.1: Path from i to j. If j\ and j% are full siblings, the path is («,... ,ji,J2, • • • > j); whereas if ji and j2 are half siblings, the path is (i,..., ji,jo, J2, • • •, j)-Extension to Type-II Pedigree In a Type-II pedigree, marry-in's are included. To extend the CI model to Type-II pedigree, we assume Independence between spouses: Yi and the spouse Ymi are independent. In the extension, Ymi and Yj are independent if j is not one of 's descendants, i. e. j Dm. We also modify the conditional independence assumption as follows: The parent-offspring dependence is the same for both parents. Conditional on the value of Y{ and/or Ymi, Yj and Yj> are non-negatively dependent if j, j' G Si; conditional on the value of Yi, Yj and Yj> are independent if j in Si and j' not in Di. Based on the new assumptions, (2.3) can be modified as where B is the set of the founder and all the marry-in's. Discussion The issue is the appropriateness of the conditional independence assumption. It is similar to the Mendelian genetics assumption that the genotype is only determined by the genotypes of the parents. Here we assume that the values of the phenotype have the same property. It seems an /(y)=n^x*>*) n fsiw^ivs i'tUii Vrriii J*-Si H 1 (2.4) ieB ieRT 20 oversimplification of the reality. But we should regard it as an approximation of reality. The biggest challenge of models for familial data is to accommodate the dependence structure of a complex pedigree. The two properties of the dependence structure generated from this assumption suggest it is reasonable for familial data. First, the dependence between two relatives is weaker as they are more distantly related, which agrees with the structures discussed in Appendix A. Second, there are no strong assumptions made on the two most important dependencies — the parent-offspring and sib-sib dependence — therefore they can be modelled with reasonable flexibility. There are two major advantages of the CI models. First, the task of modelling a com plex pedigree reduces to the much easier task of-modelling family units. The choices of marginal distributions are much broader as only two levels of dependence need to be considered. Second, CI models can be handled computationally. The conditional independence assumption allows us to write the likelihood as a product of univariate marginal likelihoods and conditional likelihoods which are computable. Even though a CI model may look like a choice of convenience, the sim plicity of this model does make it appealing when identifying the variance components is not of interest. 2.2 Binary Traits A binary trait is characterized by two states, the presence and absence of a certain feature. For each individual there is an indicator random variable Y such that Y = 0 if the trait is absent and Y = 1 if the trait is present. The feature may have both genetic and environmental determinants. In this section, we introduce two models to describe their effects: the multivariate probit model and two-component model. In the latter, the trait is assumed to be affected by two independent components, a heritable component and a non-heritable component. The probit model is a well accepted method to analyze multivariate binary responses. Therefore our discussion of this model is brief. On the other hand, the two-component model is new. Therefore we give a detailed discussion of the properties of this model. 21 2.2.1 Multivariate Probit (MVP) Model The multivariate probit model was introduced to familial data analysis in the 1960's (Falconer, 1965). The idea is that Yj is determined by an unobserved continuous variable Zj. When the value of Zi exceeds the fixed threshold 0, Yi = 1; otherwise, Yi = 0. It is assumed that Z = (Z\,Z^)' has distribution N(p, R), where p is usually specified as a linear function of the covariates, i.e., p, = X/3. Unit variance is assumed for each Zi since variance is not identifiable when only binary data are observed. It is easy to see that Pr(Yj = 1) = Pr(Zj > 0) = $(//;) and Pr(Yj = l,Yj = 1) = $2(Pi, Pj', Pij)- It follows that Coiv(Yi,Yj) = $2(pu Pj\ Pij) ~ ^(Mt)^(Mi)> which is an increasing function of py. The pmf of Y is given by where 21 = A\ x • • • x Ak with Ai = (—00,0 ] when j/j = 0 and Ai = (0, 00) when yi = 1. The MVP model also belongs to the multinormal copula family. It carries the advantages and disadvantages of this class of models. 2.2.2 Two-Component Model The idea of a two-component model is to assume that the presence of a binary trait is caused by two independent components: a heritable component and a non-heritable component. The heritable component includes more than genetic factors. It includes non-genetic aspects which one might also inherit from the family, such as life style or personality. The non-heritable component could be any non-familial factors, like age, gender, treatment. Observed genetic information, such as mutation type, can be treated as a non-heritable factor. We propose two different two-component models: parallel and series. In a parallel two-component model, we assume that either of the components alone can cause the presence of the trait. If the presence of the trait is due to heritable factors, we say that event AH has occurred, on the other hand, if it is due to non-heritable factors, we say the event AN has occurred. Let A be the event that the trait is present, that is A = AH U AN- Let Y, YH and Yv be the indicators for A, AH and AN, respectively. Then Y = YH V YN- In a series two-component model, we assume that the (2.5) 22 trait is only present when both components are present. In this case, mathematically, A = AH C\AN and Y = YH A Yv. In this model, the event AH means a person is genetically prone to develop the trait, but will not become affected unless he/she is exposed to unhealthy environmental factors, or grows old. The parallel and series models stand for different mechanisms, but one can be obtained from the other by a change of notation. Specifically, suppose Y = YH V Yv. Let Y = 1 — Y, YH = 1 - YH and YN = 1 - YN, then Y = YHA YN. In the following, we first discuss how to model the two components, then explore the de pendence pattern generated by a two-component model and, finally, show how to evaluate the likelihood of an extended pedigree. Models for YAT and YH Let Y, YH and Y^ be the indicator vectors of a family. We assume that YH and YAT are independent and model them separately. YVJ is determined by the characteristics of individual i which are not shared by others. Therefore the YVJS are independent both within and between families. YVJ may depend on a set of covariates Xj, which characterize the individual exposure, and can be modelled by a parametric model such as logit[Pr(Yw = l|Xi)] = XiP or H>-1\Pi(YNi = l\Xi)] = Xip. YH is determined by the heritable factors. Therefore, Y#jS are dependent within families, but independent across families. We will construct a YH that follows a conditional independence model. In addition to the CI assumption, we also make the following two assumptions on YH-A-l Identical univariate margins: the probability of Ym = 1, denoted by pn, does not vary among individuals. A-2 Exchangeability among full siblings: let YHS be the vector of indicators for k full siblings, then Pr(YHs = y) = PR(Y#S = y'), where y' is an arbitrary permutation of y. First consider models for YH for a sample containing only Type-2 families. Exchangeability is assumed within each family. Since we also assume that siblings are positively dependent, one 23 way to model Y# is to use an exchangeable mixture model. In such a model, YJII,...,Yjjk\P = irl~ Bernoulli(7r) where G is a distribution function with support [0,1]. The unconditional probability of Y# is given by ' . • PT(Ym = yi,i = l,...,k)= f 7r^(l-7r)fc-^dG(7r;0), (2.6) Jo where y+ — y%- A simple example of a choice of G is the Beta(o:, 8) distribution which yields the multivariate binary Beta-binomial(a, 8) distribution of Y#. In this example, (2.6) becomes ft(y„ = „,« = !,...,*) = S!L!^+»z»t), B(a, 8) where B(a, 6) is the beta function. Another example for a choice of G is the probit model: P - $(Z) and Z ~ iV(/i0,<72). Then G(n;m,a>) = Pr (Z < = $ (^'^ ~ ^ An alternative to the mixture model is a copula which provides an exchangeable dependence. An example is the multivariate extension of the Frank copula (Joe, 1997, Section 7.1). The cdf of YH is given by F„(y) = — log ji-(i-e-°)JJ i=l 1 _ e-aFi(yi) 1 - e~a , a > 0, where F\ is the univariate marginal pdf of Y{. For example, one joint probability from this is Pr(y„_ =0,t = l,...,*) = —log < a l-ri-O 1 -e" -a(l-pH) l-e-a We need to mention that YJJ and Yjv may sometimes not be identifiable. For example, if YJJ follows a beta-binomial distribution with parameters a and 8 and Pr(Yjvi — 1) = PN does not vary from person to person, then the three parameters a, 8 and pjv can not be identified in the joint distribution of Y if we only have data containing Type-1 families. When data contain families of different sizes and structures or there are covariates present, the parameters are normally identifiable. 24 Next consider models for Y# for Type-3 families. Let 1 be the index of the parent and Y#;_i be the indicator vector for the siblings. Then YH,-i\Ym=yi~Fk(- ;e(yi)), where k is the number of siblings. To keep pn constant for both parent and siblings, 6(yi) needs to satisfy the following constraint: Pv(Ym = 1) - Pr(Ym = 1) Pv(Ym = l\YHl = 1; 6(1)) + Pr(Ym = 0) Pr(Ym = l\YHi = 0; 0(0)) = pHPT(Ym = l\Ym = 1;0(1)) + (1 -pH)Pr(Ym = \\Ym = O;0(O)) = PH, 1-For example, if YH,-I follows a multivariate binary beta-binomial distribution with parameters (cci, 0i) or (Q!O,/3O), given YHI = 1 or 0, respectively, then this constraint is cti a0 PH R + (1-pg) , =pH-C<1 + PI OLQ+ fj0 The above models for Type-3 families are applicable to family units in a Type-I pedigree, in which each family unit is a Type-3 family. Dependence Structure The strength of dependence between two binary variables can be indexed by the so called risk ratio (R), defined by the conditional probability It measures the increased risk of having the trait for an individual when the trait is observed on a relative. When Yj and Yj are positively dependent, R{j > Pr(Yj = 1) and the closer PUJ is to 1, the stronger is the dependence. First, let us assume that the non-heritable component does not exist. In this case Y = YH and Ptij = Rji since Pr(Yj = 1) = Pr(Yj = 1) = pn- To determine the dependence between any two family members, we need to know the univariate marginal probabilities, the dependence between parent and offspring, and the dependence between siblings. Let p = Pr(Yj = 1), q = 1 — p. For a parent-offspring pair (i,j), let Pl,(m|0 = Pr(*j = m\Yi = 0 AND Pl,(ml) = Pt(Y3 =m,Yi = 0, 25 with l,m = 0 or 1. Then we specify the parent-offspring conditional probability matrix (CPM): Pi = /Pl,(0|0) Pl,(l|0) \Pl,(0|l) Pl,(l|l)y Similarly, let P2 = (P2,(m\l)) be the CPM for a sibling pair and P2,(mi) be the joint probabilities. The role of Pi and P2 is similar to a Markov chain transition matrix. With assumption A-l, we can write P/ = A + A{B for / = 1, 2, where fq p\ B - ( P ~P KQ pJ \-q Q and A; = 1 — P„,(o|i) ~Pi,(i|o) = Pi,(i|i) -Pi,(i|o) = Pi,(0|0) ~ Pi,(o\i)- This can be shown as follows: A + A;B = P[P/,(0|0) -PJ,(0|1)] -P[P.,(1|1) -P.,(i|o)] -9[PJ,(O|O) - Pi,(o|i)] - P.,(i|o)] ' Q ~ P/,(oi) + PPi,(o|o) P ~ Pi,(ii) + PPi,(i|o) ^ - Pt,(oo) + 9PJ,(o|i) P ~ P/,(io) + «Pi,(i|i)y P;,(oo) + PPi,(o|o) P;,(io) + PPi,(i|o) ^Pi.(oi) + 5Pj,(o|i) P.,(ii) + «P.,(i|i)y 9P;,(o|o) + PPi,(o|o) QPi,(i\o) + PPi,(i|i ^PP.,(o|i) + «Pi,(o|i) PPJ,(iH) + 9P.,(i|i)y ^,(o|o) Pi,(i|o)\ = Fi Once p, Pi and P2 are specified, for any two individuals i and j in the family, we can derive their CPM, denoted by p(JJ), and this will immediately give us their risk ratio (Rij). First, consider individuals i and j on the same branch. Then the following result holds. Lemma 2.1 When i and j are on the same branch, the conditional probability matrix ofYj given Y_ is the gij-step transition matrix p(i,j) = F9iJ =A + A9iJB (2.7) 26 Proof: Since each branch of the family tree is a Markov chain with stationary transition probability matrix Pi, we have p(*>-?') = PfJ. To verify the second equality we use an induction proof. We have shown that this is true when = 1. Now assume that it is true for g.j. Then p(ij'>Pi = (A + Af JB)(A + AiB) = A2 + Af 'BA + AiAB + Af J+1B2 = A + Af 'i+1B. The last step follows because A2 = A, AB = BA = 0 and B2 = B. We just verified that the formula holds for + 1. It thereby is established for all g. • (This proof is provided in "An Introduction to Stochastic Modeling", Taylor (1994), page 127-128.) The conditional probability matrix gives us Rij = p+A^'q. When the dependence between parent and offspring is positive, the case we consider in a family setting, then 0 < Ai < 1. Consequently, the dependence between two individuals on the same branch is positive but declines by generation. From Lemma 2.1 we also can derive the CPM for two individuals on different branches. When i and j are on different branches, the path between them is a Markov chain with transition matrix P2 between Y_n. . and Y9„.., and Pi elsewhere. The conditional probability matrix for i and j is given by p(i,j) _ p(i'i9i,j)p2pO'9j,iJ') = (A + Af J'B)(A + A2B)(A + A?'*B) (2.8) = A + A2A1iJ'+w,iB. (2.8) holds when A_\j or Aj\i is empty, since A + A?B = A + B = The expression for P(*J) yields Rij = p+A2A[ iJ+Bi,iq. Provided A2, Ai > 0, the dependence is positive and decreases when individuals i and j are more distantly related. 27 Actually, Ai is equal to the parent-offspring correlation: gPi,(u) -PPi,(io) Ai =Pi,(i|i) -Pi,(i|o) pq Pl,(ll) -P(P1,(11) + Pl,(10)) P9 Pl,(li) P9 Similarly, A2 is equal to the sib-sib correlation. Therefore, CorrfYi, Yj) = (pRij - p2)/(pq) = { Af',J when i and j are on the same branch, y A2Af •J+Sj'1 when « and j are on different branches. This is the same correlation structure for the CI model as in Example. 2.1. Now, consider that the non-heritable component exists. To make two individuals compara ble, let Pi(YNi = 1) = Pv(YNj = 1) = pN. If Y = Ym V YNi, then Pr(Y* = 1) = pH + PN - PHPN =PH + qHPN and Pr(Y = l,Yj = l) = P{iiJ) + 2p{l^pN+p^)PN = ptJ)+^(PH-p^%N + (l-2pH+p{iiJ))P'N = PS\J)(1 - PN? + 2pHPN + (1 - 2pH)p%, where the p^s are the joint probabilities of Ym and YHJ- With pn and PN fixed, Pr(Y = l,Yj = 1) reaches its upper limits p# + qHPN when p^ = p#, i.e., YHI and Y/f j are perfectly dependent. So the upper bound of Rij is ^ <pg + ggpfy J _ Ptf + qHPN ' When Y = Ytfj A Y^, Pr(Y = 1) = PffPiV and Pr(Y* = l,Yj = 1) = P^pfv- In this case> the expression for the risk ratio is simpler: Rij = p^pPN, where p^ = Pif/Vpff- Obviously, Rij achieves its upper bound PN when p^ = 1. Remarks: 1. With the involvement of the independent YJVJ and YNJ, the dependence between Y and Yj is weaker than that between Yni and YHJ, provided that Yni and YHJ are positively associated with each other. 28 2. The dependence between Y_ and Yj decreases with the dependence between Ym and YJJJ-Likelihood In this section we derive the likelihood. Since the parallel and series models can be obtained by changing notation, we only consider the case of Y = YH V Yjv. For YH = YH and Y_v = y_v, Y = YH V YN (the element wise maximum). The likelihood of Y = YH is L(y) = Pr(Y# = yH) rijPr(YjVj = ym)- However, YH and YN are not directly observable. Given the value of Yj, there are different possible values of Ym and Ym- For Yj = 0, we must have Ym = Y_VJ = 0. But, for Yj = 1, there are two possibilities: (1) Ym = 1, in which case the value of Y_VJ does not matter; (2) Ym = 0, in which case Ym has to be 1. For a family with k+ members having the feature, there are 2k+ different combinations to consider. The likelihood is the summation of the probability over all of the combinations. Let D = {i : Yj = 1} be the set of affected individuals and D = {i : Yj = 0} be the set of unaffected individuals; YH = {Ym : i G D}, YH = {YHi • i £ D}, Yff — {YNi :ieD},YDN = {YNi : i € D}. Then YH = YdN = 0. Given a possible value of Y^ = y, the likelihood is Pr(Y£ = y, YdH = 0) J] [Pr(Y^ = 1)]1_W J] Pr(Y^ = 0). Let C be the collection of all possible values of Y^. Then the likelihood is L = Pr(Y^ = 0) J_ ( Pr(YH = y, YJ = 0) J] [Pr(Y4- = 1)]1_W ] . i&D yec \ jeD J The calculation of the probabilities involving the YJVJS is straightforward based on the model specified for this variable. Here we discuss how to calculate the joint probabilities of YH- For Type-2 families, the probability is directly given by the specified joint distribution of YJJ. For example, if a mixture model is specified, then the probability is given by (2.6) and it only depends on y+. For an Type-I pedigree (including a Type-3 family as a special case), the probability is given by (2.3) and can be written as Pr(Y# = y) = Pr(Ym = YI) J[ Pv(YHSI = YHSi\Ym = Vi). Here we use the family shown in Figure 1.1 as an example to demonstrate the algorithm. With 29 YHI = YH2 = YH^ = Yji§ = 1 and YH3 = YHS = 0, the joint probability of YH can be written as Pr(Ym = 1) Pr(Yff2 = 1, Ym = 0\YM = 1) Pr(YH4 = 1, YHS = 0\YH2 = 1) Px(YH6 = 1\YH3 = 0). A family tree may not be complete. Values of some parents might be missing. For example, if the value of individual 3 in Figure 1.1 is not observed, the joint probability of the remaining individuals has to be calculated by summing the full joint probability over YH3-Even though for a big family with many affected members, 2fc+ can be quite large, it is manageable since the individual conditional and marginal probabilities are very easy to evaluate. Two-Component Models for Type-4 Families and Type-II Pedigrees For a Type-4 family, let Y\ and Yj> be the indicators of the parents and Ys be those of their offspring. We assume that YHI and YHI are independent. Consider the following model for Y#iS: YH,s\YHi+Ym = l~Fk{-;9l), . = 0,1,2 where Fk is an exchangeable mixture model with parameters 0\. To keep pn constant for all i, $i needs to satisfy the following constraint: Pr(yHi = 1) = Pr(YHl + Ym = 0) Pr(Ym = l\YHi + YH2 = 0; 0Q)+ + 2 Pv(Ym + YH2 = 1) Pr(Ym = l\YM + YH2 = 1; i) + Pr(y„i + YH2 = 2) Pr(Ym = l\YHl + YH2 = 2; 02) (2.9) =(1 - PH)2 Pr(Ym = l\YHl + YH2 = 0; 0O) + 2(1 -pH)PHPr(Ym = 1\YH1 +YH2 = l;0i) + p2H PT(Ym = l\YHl + YH2 - 2; 62) = pH, for i > 2 For a special case, in which the indicators for two siblings are independent given the values of those for their two parents, the model is specified by four probabilities: pn and pH,i = Pr(Ym = l\Ym + YH2 = I), for i > 2 and I = 0,1,2, where pn,i satisfies the constraint in (2.9). These models can be applied to family units in a Type-II pedigree, since each family unit in such a pedigree is a Type-4 family. 30 Discussion Here we would like to discuss the two component assumption. It is plausible to consider the heritable and non-heritable factors as two components of the binary trait. However, are they separable and independent like two components in an electrical system? It is a rather doubtful assumption. Does this not oversimplify reality? This is not an unusual question in statistical practice. The idea of the two-component model is similar to that of the well accepted random effects model. We can not always verify the assumption that the mean structure can be separated into random effects and fixed effects either. Models are an approximate reflection of the reality. It might work well with certain binary traits and might not with others. 2.3 Count Traits In this section we consider non-negative integer random variables representing the results of counts, such as the number of meningiomas in a patient with NF2. Often analysis of univariate count data is based on the Poisson distribution, or a Poisson mixture distribution when the data are overdispersed relative to the Poisson. In this section we study parametric models for multivariate counts, including the Poisson-lognormal mixture, the multinormal copula model and a new model developed based on the conditional independence assumption. The first two models have been studied in the literature and their strengths and weaknesses have been discussed in Section 2.1. Therefore, our effort here is mainly focused on the development of the CI model and its properties. 2.3.1 Poisson-Lognormal Mixture This is an example of a random effects model discussed in Section 2.1. Given Aj — Aj, it is assumed that Yi|Aj = Aj ~ Poisson(Aj). The random vector log A itself follows a MVN distribution with mean fi = X/3 and covariance matrix £ = cr2R(a). Here R is the correlation matrix which depends on dependence parameters a. The moments of Y can be easily obtained through conditional expectation and properties 31 of the Poisson and lognormal distributions. Let pij — (R(a))jj = Corr(log Aj,log\j). E(Yi) = exp(m + ^o2) = vu Var(Yj) = Vi + v2(exp(a2)-l), Cov(Yi,Yj) = UiUj(exp(a2pij) - 1), i ^ j, CWY Y) = exp^2^) - 1 k *' 3! [(exp(a2) - 1 + ^i)(eXp(c72) - 1 + vj1)]^' Since E(Yj) < Var(Yj), the marginal distribution of Yj is overdispersed relative to the Poisson. The correlation between Yj and Yj depends directly on p^ and |Corr(Yj,Yj)\ < \pij\- The range of the correlation between Yj and Yj is not as wide as for the lognormal distribution. When vi — UJ = v, the correlation is bounded by Kexp(-a2)-l] <Corr(y.y.}< ^[exP(a2)-l] 1 + i4exp(a2) - 1] ~ - l + i/[exp(a2) -1] (Joe, 1997, Section 7.2). The upper bound goes to 1 as either v or a2 goes to infinity. The joint probability function of Y is given by Pr(Yj = yj,i = l,...,fc;/i,a2,R) = f TTPr(Y = yj|Aj)/(A;a2R)d\ (2.10) = [ nPr(Yj = yj|Aj = e2i)^(z;/x,^2R)dz where 2lfe is the positive orthant of the A;-dimensional Euclidean space and / is the density function of the multivariate lognormal distribution. The distribution of the counts based on a Poisson-lognormal mixture model is highly skewed. Data generated from such a model normally have a few extremely large counts whereas the majority are around the median. For data with a high frequency of zero counts, we can consider the zero-inflated Poisson-lognormal mixture distribution (see Yau and Lee (2001) for an example). Another way to model multivariate counts is to incorporate the random effects with the negative binomial distribution (see Tempelman (1996) as an example). 2.3.2 Multinormal Copula Model for Count Traits To construct a multinormal copula model for a count trait, the marginal distribution in (2.2), Gj, can be any univariate distribution for count data, such as Poisson, negative binomial, or generalized 32 Poisson. The parameters of the marginal distribution can be related to a linear function of the covariate X through a link function as in a univariate regression model. For example, if Gi is the Poisson(Aj) cdf, we can specify that log(Aj) = Xj/3. Let Z = (Zi,..., Zk)' ~ JV(0, R), then Pr(Y; = Vi) = Pr(zj(t« - 1) < Zt < Zi(yi)), where Zi(l) = for / = 0,1,..., and z(—1) = —co. The joint pmf of Y is given by Pr(Y = y) = Pr (zi{yi) < Zt < Zi(yi + 1), % = 1,..., k). 2.3.3 Conditional Independence Models for Count Traits In this section, we construct conditional independence models for count data and study the model properties, such as dependence structure and likelihood. The models we considered are based on a univariate distribution belonging to the convolution-closed infinitely divisible class and a random operator A, which are introduced in the following section. Convolution-Closed Infinitely Divisible Distributions and Random Operator A Consider a class of distributions {Fg, 9 > 0}, which is convolution-closed infinitely divisible (CCID), i.e., Fg1 * Fg2 — Fg1+g2, where * is the convolution operator. Such distributions for count data include the Poisson, negative binomial (9, p) with fixed p, and generalized Poisson (0, rj) with fixed r/. The negative binomial distribution, denoted by NB(0,p), has pmf T(9 + v)v9ay Hv) = r(/)y, , 0<p<landg = l-p, 0>O. The mean and variance are given by Qq/p and Oq/p2, respectively. The generalized Poisson distri bution, denoted by GP(0,7/), has pmf ™ = *%gf - »>o.o<,<.. The mean and variance are 9/(1 — 77) and 9/(1 — T7)3. For two independent random variables Z\ and Z2 with CCID distributions Fgx and Fg2, respectively, Z\ + Z2 ~ Fg1+g2. Let GajV be the conditional distribution of Z\ given Z\ + Z2 — y. A(Y, a), 0 < a < 1, is a random operator such that given Y = y, A(y, a) ~ Gajy and A(Y, a) ~ Fag if Y ~Fe. Examples of the A(Y, a) are: 33 1. Poisson: A(y, a) ~ Binomial(y,a); 2. negative binomial (NB): A(y, a) ~ Beta-binomial(y, a6, (1 — a)9); 3. generalized Poisson (GP): A(y,a) ~ quasi-binomial(y, a, (), which has the following pmf /(*) = y\ a(l — a) a + (k k) i + ck U + CVJ fc-i 1 - a + C(y - fc) y-fc-l where ( = rj/9. In all the three examples, EA(y, a) = ay (see Joe (1997) Section 8.4 for details). Models for Type-3 Families Let Y be the vector of count variables of a Type-3 family where YL is the variable corre sponding to the parent. Define Yi = Z\ + Z\2 + Z13, and for i > 1, Y = Z\ + Z22 + Z;3, where Zi, Z12, Z22 and Z^ are independent variables with distribution Z\ ~ F^, Zi2, Z22 ~ Fg2 and ZiZ ~ i^i3. The interpretation of the model is that Y is formed by latent components Zi, Z12 or Z22 and ZJ3; among the latent components, Zi is shared by all the family members while Z22 is only shared by siblings. Zi, Z12 and Z22 together are the heritable (or familial) components. ZJ3 is the non-heritable component determined by individual characteristics and could be modelled as depending on a set of covariates Xj through a link function h, i.e., h(6is) — Xj^. If Fg is the Poisson, NB or GP distribution, for Y ~ FQ, Var(Y) = 87, where 7 1 for Poisson, (l-p)/p2 forNB, 1/(1 - r?)3 for GP. Therefore, for these three distributions, it is straightforward to derive the parent-offspring and 34 sib-sib correlations. Let 6{ = 9\ + 92 + 0^. Then Varza 9X Carr(Yi,Yj) = { if i and j are parent and offspring ^(VarYiXVarY,) (2.11) (VarZi + VarZ22) (0i + 02) [ v/(Vary-i)(VarYj) if i and j are siblings Model for Type-I Pedigrees Now we apply this type of model to family units in a Type-I pedigree. For the family unit [/j, i G RT, we specify (2.12) Yi — Z^p + Z^ + ZJ3 Yj = Z? + zg + Zj3, for j G 5<, where ZJ° ~ zg, zg ~ Ffl2 and ZJ3 ~ Jfy8, I = i or j. To link the family units together, we need the stochastic representation: Yj = A(YU a{) + zg + Zj3, for j G St, (2.13) where Oj = 9i/(9i + 92 + 9a). By definition, A(Yi,a>i) ~ FQ1 and A(yi,cti) has the same distribution as Z^\Yi = yi. Also, since Yi is independent of zg and Z73, A{Yi,oti) is independent of them as well. Example: The stochastic representation for the pedigree in Figure 1.1: Y1 = Z[l) + Zg} + Z13, Y2 = ^(yx.aO + Z^ + Zaa, Y3 = A(Yuai) + zS + Z33, Y4 = A(Y2,a2) + zif + Z43, Y5 = ^(Ya^^ + Z^ + Zss, Y5 = A(Y3,a3) + Z$ + z63. Based on such a model, the path between individuals % and j is a discrete Markov series (Joe, 1997, section 8.4) with varying lag one correlation. Suppose (Y^,..., Y;m) is the path between 35 i and j, with l\ = i and lm = j. Thus, for i' = 1,..., m — 1, the path has the following stochastic representation: YWl =.4(^,0*) + ^. where ay and Zj depend on the relationship between and ly. If/j'+i and ly are parent and offspring, ay = 9\/(9\ +62 + 61,3) and Z,' is the summation of the second and third components in (2.13), hence Z[.l+l ~ Fe2+9l., 3; if 'i'+i and ^/ are siblings, ay = (0i + 62)/{9\ + 82 + fy.,3) and Z,' is the third component in 2.13, hence Zi ~ Fa, For the Poisson, NB or GP distribution, the correlation between any two family members is given in the result below. Result 2.1 For a model specified in (2.12), when F is the Poisson, NB or GP distribution, we have m—1 CorrtY,^-)- Ylpf,f+u (2.14) j'=i where pj>,j>+i = Corr(Y/.,, Yt.l+1). Proof: When Y has a Poisson, NB or GP distribution, given Y = y, E[A(y, a)] = ay. Therefore E(lrW1l^)=«i'^+E^l+iand Cov^Y^) = EY/.( Y/.,+1 - EY.,EY/.,+i = E{Y.,E(Y.,+1|Y(.,)} - EY/.,E{E(Y.,+1|Y;.,)} = EayYj2 +EYi.lEZ'l.l+i - EYj.,(Eaj/Y., +EZ/.,+i) = a j'Var Yi,.,-Let CT,2, = VarYj.,, then pyty+1 = Corr(Y.,, Y,.,+l) = ayoi., Moreover, = E(a^Yi/|Yi,_1) + EZ/i,+i = aj/aj,.! Yj.,_l + aj'EZ/., + EZ/.7+i. By induction, i' i' E{Ylil+l\Yh) = Yh\{ajl+EZ'hY[ajl + --- + EZ[il j'=l j'=2 36 It follows for the preceding covariance calculation that i' i' Cov(Yh,Yhl+l) = Vary,, «i'= 4 II ai' f=i j'=i i' 2 TT °V+i = ah 11 — Pj'J'+l j'=l i' = ahah+1 n pj'j'+i-j'=i Therefore Con^Y^ , Y,.,+1) = YYj'=i Pj',j'+i- The equation (2.14) is obtained by taking i' + 1 = m. • When 9iz is a constant, say 6$, let 6 = 6\ + 02 + #3- Then the parent-offspring and sib-sib correlations are p\ = 0i/0 and p2 = (#i + #2)/#, respectively. In this case, for the Poisson, NB or GP distribution, CoTv(Yi,Yj) = { pf'J i and j on the same branch, pf '3+93'lp2 i and j on different branches. Likelihood As shown in (2.3), the likelihood involves the conditional likelihoods of the sibling groups. Here we show how to compute those conditional likelihoods for the model proposed in the preceding section. Let zf1* = A(yi, a^. The conditional likelihood of Yst given Yi = t/i is: y 1 E E v*(zi]* = w Pr(4} = * - io) nPr^ = VJ - 0. (=0 i0=o jeSi where y = min(yj, j E Si). We know that Z^* ~ Gai,yi-Extension to Type-4 Families To extend the model to a Type-4 family, we consider four independent latent components for an individual: Yj = Z\ + Z2 + Zs + Z4. Among these components, Z\ is related to parent 1 while Z2 is related to parent 2 and these two components have the same distribution. Z3 is shared by siblings. The last component is the non-heritable component. 37 If we let the siblings share the first two components, this forces the parent-offspring corre lation to be less than half of the sib-sib correlation which is unusual under a polygenic effect. For this reason, we construct the model as follows: Yi = Zn + Zi2 + Zi3 + ZiA, i = 1,2 (parents), (2.15) Yi = An (Yi, ai) + Ai2{Y2, a2) + Z33 + ZiA, i > 2 (offspring). (2.16) where ZiUZi2 ~ F6l, Zj3 ~ Fg3, j = 1,2,3 and ZiA ~ Fgi4; at = 91/(291 + 83 + 9U), 1 = 1,2. An and Ayi are two independent realizations of A, as are Ai2 and Ai<2. Z33 is shared by siblings representing additional dependence from other sources. All the Z-components are independent. So, the sib-sib correlation is partly from the A operators and partly from Z33. Both the parents and offspring have a marginal distribution F2$14-$3+gi4, for all i. For the parents, this is directly from (2.16). By the definition of A, we have An, Ayi, Ai2 and Ay2 have the same distribution Fg1. An and Ai2 are independent since Yi and Y2 are independent. Therefore, for i > 2, Y ~ F2g1+g3+0n as well. Next, we use the Poisson distribution as an example to illustrate the parent-offspring and sib-sib correlations for such a model. In this case, Y has mean Aj = 2#i +#3 + #i4. The correlation between parent i and offspring j is = Var(^j(Y,aj)) = 9l 3> V(VarYj)(VarY,) y/x~X~' Since 9\ < Aj/2, for all i, the parent-offspring correlation is always less than or equal to 0.5 in this model. This agrees with a polygenic model for quantitative traits when there is no household effect. To obtain the correlation between offspring j and j', we first derive the covariance: CoviY^Yj,) = Cav(Ajl{Y1,ai)1Ajn(Yuai)) + Coy{Aj2{Y2,a2),Ajl2(Y2,a2)) +VarZ33. Based on the fact that A,j(Y, a{) ~ Poisson(0^A;) and Aji(yi,a{) ~ Binomial(yj, on), we have Cw(Ajl(Yl,al),Aj.l(Yhal)) - E[Aji(Yi,ai)Ajn(Yi,ai)} - E[Aji(Yi,ai)]E[Aj'i(Yi,ai)] = E{E[Aji{yhal)Ajn{yi,ai)}\Yl = yt} - a2\f = Eafy,2 - a2A2 = a2A/. 38 Hence, Cov(Yj,Yj>) = a?Ai + o?2X2 + 03, and it follows that con{Yj,Yi.) = <*Xl+^l + 03. If Aj = A for all i, then the parent-offspring and sib-sib correlations do not vary from relative pair to pair. The parent-offspring correlation becomes p\ = 6\/X. Recall that a, = 0i/Aj, therefore ct\ = a2 — p\. Consequently, the sib-sib correlation becomes p2 = 1p\ + 03j\ = (A0i + 63)/X, where A = 20i/A. 2.4 Survival Data This section is concerned with data representing times to the occurrence of some specified event, such as death, onset of a disease or a clinical feature. The time to an event, denoted by Tj, is a positive continuous random variable. The special feature of survival data is the presence of censoring. There are three types of censoring: left, right and interval. In general, if the survival time of family member i is censored, we observe a time interval A( = (0,Ri), (Lj,oo) or (Li,Ri\, corresponding to left, right and interval censored, respectively. We make the following assumptions about the mechanisms behind the censoring, regardless of its type: the censoring times of a family are independent of the survival times. Suppose a full parametric model is specified for the distribution of T = (Ti,... ,Tk)'. Let S, and // denote the Z-dimensional marginal survival and density function of T. When the censoring times are independent of the survival times, the model can be estimated based on the partial likelihood of the survival times. Without loss of generality, assume that the first k\ individuals in the family are not censored and have observed failure time t\,..., tkl, and the rest are censored with censoring intervals A^+i, • • •, Ak. Then the partial likelihood of this family is: L = fki{U,i = l,...,ki) / fk2\ki{tj,j = h + l,...,k\Ti = ti,i = 1,... ,ki)dtkl+i ...dtk, where k2 = k — k\, fk2\kl is the conditional density function of Tkl+i,...,Tk given T\,...,Tkl and 21 = Akl+i x ... x Ak. 39 In particular, if only right censoring is present, Let Cj be the censoring time and Yj = Tj VCj, either the time of event or the time of censoring, whichever is observed first. Then the partial likelihood becomes: L = fia {yui = l,---,h)Sk2\kl {yj,j = h + 1,... ,k | Tj = ti,i = 1,... ,h), where Sk2\kl is the conditional survival function. 2.4.1 Multivariate Lognormal Models Here we assume that log T = (logTi,..., logT^)' ~ iV(X/3, £). We model log T as a usual quanti tative trait. The only difference in the analysis is that censoring may be present. Suppose right censoring is present. Let Z = log Y and zi be the vector of the observed times of event with length k\ and z2 the vector of censored times with length k2 = k — k\. The partial likelihood is given by L = ^1(zi;/x1,Si)$fc2(-z2;-/X2,S2), where /zx and Si are the mean and covariance matrix of Zn pJn, and S2 are the conditional mean and covariance matrix of Z2 given Zi = zi. 2.4.2 Multivariate Lognormal Frailty In a frailty model, the hazard of Tj conditionally on Aj = Aj has the from . hi{t\ki = Aj) = Xih*(t) for some h* (t). If h*(t) — 1, then the conditional distribution of Tj is exponential(Aj), while if h*(t) = it is Weibull(Aj, 7). Aj is called the frailty representing the underlying risk. The observed survival function (also called the marginal survival function) of Tj can be obtained by integrating out AJ: Si(t) = J exp(-AjF*(i))/j(Aj)dAj = E{exp(-AjH*(i))}, where H*(t) = §r]h*(t)dt and /j is the density function of Aj. We see that S{(t) is the Laplace transform of Aj, evaluated at H*(t). 40 Under the familial setting, it is natural to assume that each family member has a separate frailty and these are correlated with each other. Again we choose the multivariate lognormal as the distribution of A, i.e., The distribution of T is difficult to obtain as there is no simple expression for the Laplace transform of lognormal random variables. Let Si(t\Xi) and fi(t\Xi) be the conditional survival and density function of Tj. Then with right censoring, the contribution to the partial likelihood of this family is where di = 1 if the time of the event is observed, 0 if it is censored. 2.4.3 Multinormal Copula Model Suppose that Tj has univariate marginal survival function Si. In this section we construct the joint survival function of T using a multinormal copula. Since it is more convenient to use the survival functions in survival analysis, we replace the cdf Gj in (2.2) with the survival function Si. The two definitions are equivalent. Then we have Suppose Si is fully specified with a density function /j. Consider only right censoring. Let Zi = $_1(5j(j/j)), zi be the vector of uncensored times and z2 the vector of censored ones. The partial likelihood is given by log A = Z~ N(n,a2K(a)). S(t) = $k..., *-HSk{tk)); R(a)). (2.17) L = $ fel(zi;0,Ri)]][ fi(Vi) $fc2(z2;/4R2)> 4>{zi) where /x^ R2 are the conditional mean and covariance matrix of Z2 given Zi = zi. 41 Chapter 3 Estimating Procedures Generated from Composite Likelihood Functions As mentioned in Chapter 2, the major difficulty in implementing models with multinormal random effects and multivariate normal copula is parameter estimation when data sets involve large fam ilies. The MLE is generally computationally difficult to obtain since it involves high dimensional integration. We will view the MVN model as a special case, which is used for quantitative traits or survival data (if on the log scale). Computational difficulty also occurs with the MVN model when censoring is present. Therefore, developing estimation procedures that are less computationally demanding is important. The models we mentioned share a common property: the parameters which specify the models can be classified as univariate marginal parameters and dependence parameters. The former characterizes the univariate margins, such as the means and variances in the MVN model, while the latter, joint with the univariate marginal parameters, fully specify the multivariate law, such as the correlations in the MVN model. This property allows us to develop estimation methods based on composite likelihood from the univariate and bivariate margins. A composite likelihood (CL), as named by Lindsay (1988), is formed by adding together individual component log likelihoods, each of which is a log likelihood or conditional log likelihood of a marginal distribution of a multivariate model. We consider two CL approaches for familial data: the first is based on both CL of the univariate margins and bivariate margins and estimates the marginal parameters and the dependent 42 parameters in two steps while the second only uses the bivariate CL and estimates the parameters simultaneously. Weighting schemes are also considered to improve the efficiency of these approaches. To illustrate the main ideas, we use a simple MVP model as an example. Suppose Yij = l(Zij > 0), where the latent variables Zij have a normal distribution with common mean p and variance 1. Moreover, suppose Cov(Zjj,Zij>) = p for all i and j / /. In this model, there is one marginal parameter, p, and one dependence parameter, p. In the first approach, we estimate p using the univariate marginal log likelihood of Yij = : hiVijifi) = Vij log$i(/i) + (1 - yij) log(l -We form a CL function by summing h(yij',p) over all i and j: ^i(p) = YliYljh(yij', p), then maximize ^UCL(P) with respect to p. This leads to p = $T l(y), where y is the grand average of yij. We then estimate p using the bivariate marginal log likelihood. For a pair (Yij, Yiji) = (y%j,yiji), the bivariate log likelihood is: h(vij,yij>;p,P) = iog{<MM,P;p) {l- yi3){1- y^[$i(p) - MP,P;P)p+^'-2w [l-2$1(/z) + $2(^,/i;p)p^'} = (1 - 2/tj)(l - Vij') log *2(A», Ml p) + {Vij + Vij' - ZVijVij') log[*i(/i) - $2(p, p; p)] +yijyij> log[l - 2§i(p) + §2{p,p;p)]-We form the second CL function by summing h(yij,yij>; p, p) over all possible pairs: ^BCL(P,P) = ^ Ylh(yij,yij';n,p)-i j>j' The estimate of p is obtained by replacing p by p and then maximizing $BCL(P-, p) with respect to p. In the second approach, we form a CL function *BCL(M> p) = YIWi fefo»J> yij'iPi p), i j>j' where U;J is a weight depending on the family size. We maximize $*BCL(p, p) and thus estimate p and p in one step. The motivation and choice of the weights will be given later. 43 The same ideas work for models with covariates although the notation is more complicated. In Section 3.1, we introduce notation to reflect the structure of the parameters. In Section 3.2, we first give a general introduction to estimating procedures based on composite likelihood. We then present the two approaches described above and consider their performance under independence. Section 3.3 deals with methods to estimate the covariance matrix of the estimates and Section 3.4 contains some remarks. In the next chapter, we examine the efficiency of these procedures by comparing them with the ML method based on some special cases for which the comparison is actually possible. 3.1 Notation Let F( • ; 7ix,... 7^) be the joint cdf of Yj, a vector of dimension fcj, where the vector consists of all the parameters appearing in the univariate margins of YJ; 7i2 consists of those not in the univariate margins, but in the bivariate margins, and so on. We assume that margins of order higher than M, including the full joint distribution, are fully determined by 7^,... ,7JA^. The following discussion deals with models having M = 2, but the arguments can be extended to models having M > 2. (i) Let 7^ be the elements of 7^ pertaining to Yjj's marginal distribution. That is, suppose the marginal distribution of Yjj is F\( • ;7j^)- Suppose that 7^ is of length d\: Jj) - (JJ) JJ) y Let 7ilf, - (rfa],---,^)', * = l,---,di, and 7il = (7^,... ,7jMl)'- Similarly, suppose the bivariate marginal distribution of Yjj and Yjj/ is F2( • , • ; Tii \ lW wriere Jjj) _ iJJd') Jjj')y Iii — Wj2,l > • • • Ii2,d2 I " Let-^-JV) (3,1) (3,2) (kul) (*i,*i-i)y ,_1 doandV -(V V ) ^ li2,l - \li2,l ' H2,l ' H2,l > • • • ' H2,l ' • • • ' H2,l ' i, . . . , U2 ana 7i2 - (7i2,D • • • > 7j2,d2/-Furthermore, suppose ~yim, m = 1,2, can be modelled as a function of a set of parameters #m, i.e. 7jm = hm(XJm^;6m,fcj) for a parametric family hm( • ;0m,k), m = 1,2. One particular case is 7im = X.[ m^9m. Here XJm^ is a matrix of covariates, not necessarily linked to the mean of the distribution. The values in are treated as known constants. Suppose that the vectors 0\ 44 and 62 are two distinct sets of parameters. Let 7^ = (7^,7^2)1 0' = (^i>^2)> n' = (hl5 h^), and X,; = 0 0 x: (2) Then 7^ = h(Xj; 0, k). In the linear case 7^ = Xj0. To illustrate the notation, we list a few examples. Example 3.1 MVN model. For h = 3, Yj = (Yji, yi2, ^3) with E(Y<) = fJ,{ = Xj/3; Var(Yjj) = of- = a2 for all j. For an exchangeable dependence model, Corr(Yj.,-, Yjj<) = pijj, = a for j > j'. Then, -yn = Pi with of = (aa, of2> cr23)'. We can write 7iX = X^0i with 0\ = /3 and Xjl> = Xi 0 0 1 where 0/1 is a vector or matrix of Os/ls with dimensions matching those of Xj, 7^ and 0\. Also, 7i2 = (Pi2i,Pi3i)Pi32)' with 02 = a. Since 7i2 can be written as la, X^ = 1. As an example of a non-exchangeable dependence model, suppose pj2i = Pi3i = «i and Pi32 = OL2- Then (1 o\ 62 = I * I and X?) = 1 0 0 1 Example 3.2 MVP model. For h = 3, Yj = (YJI, ^2,^3)', Yij = I(Zjj > 0) where Zjj is the latent variable and Z = (Zn, Zj2, Z^)' has a multivariate normal distribution with mean ^ = Xj/3 and a common variance 1. For an exchangeable dependence model, Corr(Zy, Zij>) = p^j/ = a for j > f. Then, jn = fj,{, 01 = (3 and x|x) = XJ; 7i2 = (pi2i,Pm,Pm)', 02 = at and XJ2) = 1. Example 3.3 PLNM model with exchangeable dependence and ki = 3. Yj = (^1)^2,^3)' and \ = (AJI, Aj2, Aj3)' has mean /ZJ = Xj/3, Var(Aj) = cr2 = a2 and Corr(Ajj, A^/) = pijf = a for j > j'. Then, the parameterization is same as Example 1. Example 3.4 Gaussian copula with Weibull margins. The marginal density function of the failure time Tij is: /,..// \l>ij-1 /(t;X0-) = ^(- exp aij 45 where log a^- = Xjj/3 and bij = b. The joint survival function of Tj is specified as in (2.17). Suppose each of the off-diagonal entries of Rj, pijj>, is a linear combination of a. Let pj be the collection of all the Pijj'S. Let aj = (ajj) and bj = (bij). Then the marginal parameters are specified as 7a = (ai>Di)> 0[ = (r3',b); the dependence parameters are specified as -yi2 — Pi and 62 = a. 3.2 Estimation Procedures based on Composite Likelihood 3.2.1 General Properties Composite likelihood methods are estimation procedures that involve maximizing a function based on the summation of individual component log likelihoods from the marginal distributions. The CL method is appealing for the following reasons: firstly, it inherits some properties of the ordinary likelihood. In particular, under regularity conditions, the estimates based on CL are consistent and asymptotically unbiased. Secondly, under many circumstances the estimates are much easier to compute than the ML estimates. With the growth of research in areas involving multivariate data, there is an increasing use of CL. The following are some recent examples: Xu (1996) proposed using inference functions formed by univariate marginal likelihood for multivariate discrete data; Heagerty and Lele (1998) and Cur-riero and Lele (1999) considered pairwise composite likelihood estimation for binary spatial data; Parner (2001) used pairwise likelihood contributions to analyze familial survival data; Joreskog and Moustaki (2001) compared a CL approach with two other approaches in factor analysis of ordinal variables; Lele and Taper (2002) considered a CL approach to estimate the variance components of a MVN random effects model. Let \I/(n) = Yli=i be a CL function, where ^i(6) is the contribution of the ith family. As with the ML method, the problem of maximizing \J/(n) usually becomes the problem of solving the equations: 90 e=en ' Let ipi(9) — d^i(9)/d6. Then V(n)(^) = Y^itfiiO): called the composite score function, is the inference function generated from ^(n). This can be directly applied to the one-step procedure based on the bivariate CL function. The two-stage procedure involve two CL functions: the univariate CL function, $uCL{n)i arid the bivariate CL function, ^BCL{H)- The procedure becomes to solve 46 two sets of equations: dVuCL(n)(Ol) d*BCL(n){0l,02) d02 0—0n 0=0n = 0. = 0. Next, we state the asymptotic properties of the estimators from CL methods. Under similar regularity conditions applied to the MLE (Serfling, 1980), Egipi(d) = 0 (Lindsay, 1988), therefore V>(„) is unbiased. Assume that M, the size of the largest family, is bounded. The standard theory for inference functions (Godambe, 1991) can be applied to derive the asymptotic properties of 0n, the estimate of 6 obtained from solving ip^(0) = Ohp. Theorem 3.1 Assume that the size of the largest family is bounded and the families are from a finite mixture of family structures. Under the usual regularity conditions for log-likelihood of univariate and bivariate margins, as n -» oo, Vn-(dn-0)AN(0,G^(0)), where G^(d) is the Godambe information matrix defined as: G^O) = f lim D„(0)1' [ lim M^O)]'1 \ lim D^(0)1 , (3.1) Ln-yoo J Ln->oo ^ J Ln->oo J with and M^O) = -Ee{ipin)(0)ip[n)(6)}. A proof of the above theorem can be found in Godambe (1991). An approximation of the covariance matrix of 0n is taken as V-n"1D-1Ml/)(D-1)', (3.2) provided the inverse of D^, exists. In the next two sections, we introduce two estimating procedures based on CL. In short, we call the two methods CL1 and CL2. 47 3.2.2 A Two-stage Estimating Procedure (CL1) Inference Functions In this approach, 0\ and 62 are estimated in two separate steps. The idea is: for each set of parameters, consider the CL from the marginal distribution with the lowest order in which the parameters are identifiable. More specifically, B\ is identifiable from the univariate marginal distributions of Y, and therefore we consider estimating it by maximizing the CL function of the univariate margins (UCL): i=l j=l where l\ is the log likelihood based on the univariate marginal density of Yj. Note that \&r/CL is a log likelihood if Yi, • • • , Y{ki are independent random variables for each i. Differentiating $JJCL with respect to 0\ leads to the composite score function: *UCLM = = EfHW'HWCY,,,,,), (3.4) i where H^fX^) = dh^X^; 0U k^/dOx and j ~ l1 din EjdhjYij) Ejdh(Yiki) -ZjdhjYn) Ejdh(Yiki) 07*1,1 Hi'! Hlfi Hl'di dh(Yg) dh{Yiki) dhjYq) dh(Yiki) Hl!l Hi,! Hl,2 Hl,k In the second step, the first element YJj ^i(Yj)/^7ii,i — ^i(Yi)/97^\ since is only related to Yi. It is the same for other elements. When ^ = X^Oi, H^fXJ1*) = x\1]. • 62 can be identified by the bivariate marginal distribution of Y, so we consider estimating it by maximizing the composite likelihood function of the bivariate margins (BCL). The contribution of Yi is the summation of the log likelihoods of all the possible pairs (Yj, Yij'), j > j'- Let h denote the bivariate log likelihood. The BCL function is **CL(»I,»2) = EE/2(y0^y';7g)(«i),7g')(»i),7gJ"')(»2)). (3.5) i==l 3>f 48 The estimating function for 02 is 3*BCL(0I,02) V»2BCI,(01>*2) = X;(Hf)(xf)))g2BCL(Yi,7,), 90, where g2BCL(Y, ^ dhiX^Yn,) _ (dhjYij.Yy) . ,\ ,7J h d^ v d^r) ,J 3) and H|2)(XJ2)) = 3h2(X,<2);02,fci)/002. Let t/>CL1 = (II>UCL^2BCL)' AND SCLI = (SUCL'&BCL)- TNEN ^Cii = E(H'(^))W(Y<,7<). (3.6) (3.7) Implementation Computationally, the estimating procedure can be carried out in two steps: Step 1: estimate 0\ by 0I,CLI, a solution of V'c/czX^i) = 0 (usually equivalent to maximizing *UCL(OI))-Step 2: estimate 02 by 92,CLI, a solution of ^2BCL(^I,CLI^2) = 0 (usually equivalent to maxi mizing ^>BCL{0I,CLI,02))-Asymptotic Variance under Linearity We first derive the Godambe information matrix of ibcLi assuming a linear relationship between Xj and 7j, that is, 7j = Xj0. Then Hj can be replaced by Xj. Firstly, D^, = T>CLI — n-x£j X^AjXj With Aj = -E dgCLi(Yj,7j) / A?* o \ A(2,l) A(2,2) i i ( gdgi/cx,(Yi,7n) o So = BCLI = - E 1 XJA^X*! ^ g9g2BCi,(Yi,7f) •gdg2BC£JYi,7.) y 0 \ X!-,Ai2,1)Xvi X'-0Af'2)X. •il -^i2l (3.8) H2 ) 49 Also, = MCLI = n"1 EiX^jXj with / ni1>1} aa'2) \ / Var(gr/cL(Yi)) CovCgr/cLTO.gzfYi)) \ ni = Var(gCL1(Yi)) n^1) o(2-2) ^ Cov(g2BCL(Yi),g1(Yi)) Var(g2BCL(Yi)) / Then = MCLi = - V V XJaOj2'1^! x;2nf '2)Xi2 / - ^-ir»-i (3.9) The approximation of the covariance matrix of OCLI is VCLI = n ^CLI^CL1^CLI)'• Write '1,11 0 DCLi = -n ^Mi,n Mi,i2^ and MCLI = -n \M1>2i Mi,22// \Di,2i Di>22 / where Di,«» = ^X^A?'0^, and M1>Wi = Ei^A-'"''^'- We have /DLH o V1 ( D71X 0 \ (3.10) Di,2i D 1,22/ V-DiiDi^iDiJ! D122/ »-i It follows that Var(01)CL1) » Di;t1M1,11(Di;l1)' / A(1-1)^ (3.11) -l E XJX A^Xn £ XS^P^ E X^ Aj1'1^ Var(02,CL1) « (-Di;LDi,2iDi;li D^2) MCii "Dl,22Dl,21Dl -1 11 ,22 Cov(di,CLl,d2lCLl) « (l>r,ll °)MCL1 D 1,22 (3.12) (3.13) Performance of GL1 Estimates in Independent Case 50 In this two-stage approach, in the first step, Yn, ... , Yiki are treated as if they are in dependent and in the second step, the pairs (Yj, Yj') are treated as if they are independent. If YJI, ... , Yjfc; are actually independent, is this approach equivalent to the MLE method in terms of asymptotic variance? To answer this question, we compare the Godambe information matrix of ipCLi and Fisher's information matrix of the full likelihood score function. The Fisher information matrix is defined by F(0) = lim - V E TWOO n ' i where lki is the log likelihood function of Yj. F(0) represents the average Fisher information per family. Theorem 3.2 Assume the same conditions in Theorem 3.1. Let GCL\{6) be the Godambe infor mation matrix defined in (3.1) with D^, and as in (3.8) and (3.9) and F(0) be the Fisher information matrix as defined in (3.14)'• Suppose 6'0 = (6'01, ^0,2) *5 the true value of 6. When Yi, ••• Yjfci are independent, i.e. /fc(yi; 0o,i, #0,2) = Yij /i(Yjl &o,i), we have (a) GCLI(6O) = F(0O) when 0o,2 is in the interior of the parameter space; and (b) lim GC£i(0) = lim F(0) 02~>@0,2 02—^00,2 when 0o,2 is on the boundary of the parameter space and F(0) and GCLI{6) are continuous at 0Q,2-dlki(Yi,6) 36 dlki(Yi,6) 86 (3.14) Before we prove Theorem 3.2, we first introduce some notation used in this proof. Here Y is the response vector of a family in general, so the family index i is suppressed. Suppose h is a function of Y and 6' = (0'1? 6'2). We use a dot on top of h to denote the first partial derivative, i.e., MY) = dh{Y;6) 36 and ftW(Y) = dh{Y;6x,62) 36, So, h(Y) = AW(Y) h^(Y) 51 We use h(Y; OQ) to represent the derivative evaluated at 0Q, i.e., h(Y-e0) = dh(Y;0) ae and A(/>(Y;0o) = dh(Y;61,02) 0=0o 0=00 We use double dots to denote the second partial derivative, i.e., y</>(Y)and ^(Y;*.)^™'9*' 0=0o dexde\, — " 00,00;, where = 1, 2. Let fk and Zfc be the pdf/pmf and log likelihood function of Y, respectively, and v be a Lebesgue measure when Y is continuous or a counting measure when Y is discrete. The regularity conditions (Serfling, 1980) ensure that a/&(y;fl)cfr(y) _ f dfk(y;d) ae -I de -du(y) and dfik(y;0)dv(y) _ [ dik{y;0) -I -du{y) de j de Thus, we have the following three results which are needed in the proof of Theorem 3.2. Results 3.1 and 3.2 are standard results of likelihood theory. Result 3.1 / a/fc(ya*/'*2W) = ^/ /fc(y;*i,*,)«My) = 0. This leads to E0Jkl)(0o) = 0. Result 3.2 Eeoif l'\Y;e0) = -E^fY;0O)[/f}(Y;0O)]'. Result 3.3 E0/P(Yj;eo,l)ik 2)(Y-eo) = O. Proof: By definition, i?(Yr,e0,i) = /r(^;»o,i)//i(^;»o,i) p(i) 42)(Y;0O) = /r(Y;0o)/A(Y;0o). _ i(2) 52 Since fk(Y;B0) = /I(YJ; 0o,i)/j!Li(Y_j|Yj; 0O)) where Y_j is Y without Yj and /JJ_1 is the condi tional pdf/pmf of Y_j, 00/v.. a M'(2W«\_A (yj;^o,i)/A;_1(Y_i|Y,;0o) Therefore, Thus, E0o/W(Y,;0O)i)/i2)(Y;0o) = E^A(1)(yi;go,i)/g(Y-i|yJ-;go) /*(Y;0O) = j ff\yj]e0)f;i"l(y-j\yj;9o)dHy) = I ffHwBo) [|f*kW(y-j\yj-,9)dv(y_j) dv{yj). Similar to Result 3.1, / f^l{y-j\yj\B)dv{y-j) = 0. Therefore, Eg/^iYj-0o>i)42)(Y;B0) = 0. Proof of Theorem 3.2: The asymptotic properties of the CL estimate and the MLE estimate are both derived under the condition that the parameters are in the interior of the parameter space. In (a), #o,2 satisfies this condition. We first prove that MCLI(0O) = &CLI(BO), so that GCLI{9O) = lim MCLi(0o) = lim BCLI(90). n—Hx> n—>oo As defined in (3.10), Dcxi and MCLI are partitioned into submatrices Di^/ and Mi^, where I, l'=l or 2. (1) Claim Diili(fl0) = Mi,1i(flo). By Result 3.2, D1>11(0o) = -E{^i(0o)} = -EEE[(i1,1)(y«;»oIi) i j = EEE?)^-^)?)(^;^)'-i j 53 By definition, Mltll(0o) = E{^UCL(eo)rl>'UCL(0o)} EE(E^;*O,I)J ^E^^-^o.oj • The last equality holds since El^\Yij\ flo.iH^Qfy?'; ^o.i)]' = 0 for i 7^ i' under the assumption that individuals from different families are independent. Under independence of Yij and Yiji, E 41}(yy; tfo.OPi15(Yij'10o,i)]' = 0 for j # j'. Therefore Mi,n(9o) = EEE'i1)(y«'tf''.i)'i1)(^i«o)i)' = Di,n(»o). (2) Claim D1)2i(6»0) = 0. Di,2i(0o) = -E{tf]Ci(0o)} = - E E E 42,1^(Yij, Yy'l #0,1, ^0,2) » J>j' = E E E (42)(% %^o,i, ^0,2)) (/^(yy, Yir, 0o,i, 0o,2))', * j>j' by Result 3.2. Under independence, i21)(Yij,Yij,;0OtU0Ofi) = i?\Yij;0o,1) + i<iLHYir,0Otl). By Result 3.3 (with k = 2), E(/^(yy-.y^;^,!,^)) (4x)(^^o,i) + /i1)(y^;^o,i))' = o; therefore Di^i^o) = 0. (3) Claim Mi,2i(0o) = 0. 54 Mi>21(0o) = E IJ>2BCL(OOWUCL(OO) \ * 3>3 J \ * J = EE(E^^^'^o,!,^)) (E^^;^) i \3>3 J \ 3 = E E EE (42)(^.^i';*O,l,0O,2)) (^(V^O,!))'-* i>i' j" Let (A) = E (j,2 2\Yij,Yiji;0O>1,0O2)j (4^(^V';^0,1)) , an arbitrary term in the above sum mation. If jf 7^ j" and j' 7^ j" , under independence, (A) = (E iPiY&Yij,;0o,i,0o,2)) (E /'^(Y^; 0o,i))' = 0, since by Result 3.1, E (Yijn-0^) = 0. If j = j" or j' = j", (A) = 0 by Result 3.3 with k = 2. Therefore Mi;2i(0o) — 0- By symmetry, Mi^C^o) = 0 as well. (4) Claim Di,22(0o) = Mli22(0o). D1)22(0o) = -E{^22ici(0o)} = ~ E E E ?2)(^> Yif; 0o,i, 0O,2) * 3>3 = E E(4^(*y'> Yij1, #0,1, #0,2)) (4^(*y> #0,1, #0,2))', by Result 3.2. Meanwhile, Mi,22(0o) =E{^2BCL(0o)^'2BCL(0o)} = E EE1?^-^;^-^)) (^5;42)(yij,yijv;0o,i,0o,2)X \ * 3>j' / \ * 3>3' = EE(E42)(^,^'^o,i,0o,2)) fEf22)(y«'y«';flo,i^o,2)) • i \j>j' J \l>l' / Let (B) = E ^42)(^j>yy';^o,i,0o,2)) (42)O^>^';0o,i,0o,2)) , an arbitrary term in the above sum mation. If (j, f) and (I, I') are two pairs with no common individual, (B)= 0. On the other hand, 55 if (Ji j') and (/, I') share one common individual, say j = I, but j' / I', then (B)= [[[ ( A2)(yij^yijr^do,udo,2) \ ( f22)(yij,yii';Oo,uQo,2) V ill \h(Vij,Oo,i)h(Vij>,Oo,i)J \ fi(yij,Oo,i)fi(yu>,00tl)J hiVij, Ooii)fi(yij>,Oo!i)f1(yiil,e0ii)du(yij)dv(yiji)dv(yii>). Since f2\yij,yij<;0o,\,00,2) = h(yij, &o,i) JZ{2) (yij'\Yij = yif, 60,1, Oo,2), where /* is the conditional pdf/pmf, (B) = fff [/2(2)(^,^;0o,i,0o,2)j [/2*(2)(%^;0o,i,0o,2)]7^^ = / / fPHYi^Yij,; 00,1,0o,2)du(yif) J /2*(2)(Yy, YiV; 0OyU 0Ot2)du(yu fi(yij,Oo,i)dv{yij) = 0, since by Result 3.1 the integrals inside the square brackets are zero. It follows that M1)22(0O) = E E E(Z22)(Yij, Yij,; 0O)i, 0Ol2)) (42)(Yy, Yif; 0O>1,0O>2))' = D1)22(0O). Based on (1) - (4), MCLI(0O) = DCLi(0o), therefore GCLi(0o) = lim MCL1(0O) = lim DCLI(0O)-n-voo ra->oo Next we prove that F(0Q) — Mexico)- We write vra ^ r 1 fPu(*o) F12(0O) F(0O) = hm -?W°°N \F21(0o) F22(0O), with F^(0o) = EE4?(Yi^o)C)(Yi,0o)', J, /' = 1, 2 an, (5) Claim Fn(0o) = MM1(0O). /«(Yi;0o)=^(Y^^2) 90i dl^ (YJ; 0i, 0O)2) 0i=eo,i 82=80,2 d0i fli=flo,i Under independence, /^(YJ; 0I, ©0,2) = 52jh(Yij-,Oo,i). Therefore, i^(Yi;0o) = '£ii)(Yij;0o,i). (3.15) 56 It follows that (6) Claim Fi2(0o) = 0. .,,?.(?,^,,)(FH'=" ..„ F12(0O) = EE ikiHYuOoWg(Y;,0O)]' i = EE ^E/i1)(y«;»o,i)j (^(Yi.flo))' = EEE/i1)(^;0o,i)[42)(Yi,eo)]'. 2 J By Result 3.3, Fi2(0o) = 0. (7) Claim F22(0O) = M1)22(0O)-F22(0O) = Z)iE 4f (Yi,0o)[4f (Yi,0o)]'-As defined before, ^ is a function of 72 which in turn is a function of 02. So for a vector Y of length k, we can write ifc(Y;01,02)-/fc(Y;7l(01),72(02)). To prove (7), it suffices to show that ^(Y;7l(0o,i),72(^2)) = ^^^^ i>j' since M1;22(0) = £E fE^^,^^^0^))) (E^^^^T^^))) . * \3>j' J \3>3' J (For a simpler notation, we omit 7l in the likelihood.) We can write mnr.*\ V ^(Y;72(g2))a72j,/)(g2) lk (Y,0o)-2^ UJI) 3>3' U'2 10=00 —d i') <• (i i') Let 72 v be the vector of parameters in 72 without 72 . We have = dik(Yn^'\^'\e0,2)) dlK{Y;l2(02)) faUJ') 02=00,2 70J')=70J')(9 } dj3,3') 57 02=00,2 Under independence, Therefore, /*(Y;7?J"^72'WJ')(^))=/2(yi,^;72J'*,''))+ E 'iM). lk (Y'^)-E ~m QQT-J>3' l6>2=6»o,2 = E42)(^,^;7F)(0o,2)) Based on (3) and (5) - (7), we have F(0O) = Mcii(0o) The earlier results (1) - (4) then yield GcLi(tfo) =F(0O). In (b), 0o,2 is on the boundary of the parameter space. As long as F(0) and Gcxi(0) are continuous at 0o,2, lim G(0O) = lim F(0O). • 02~>#0,2 02—i-00,2 Weights for the Linear Case Theorem 3.2 tells us that the CL1 estimates are asymptotically as efficient as the ML estimates when the data are independent within each family. The next question is how well this method performs when the data are correlated. Later investigations show that it can be inefficient compared with the MLE when the data are highly correlated. To improve the efficiency of OCLU we consider adding weights in the estimating function. For the linear case, consider VWi = EX*W^Li(Yi,7i). i The idea can be generalized to nonlinear cases. For any fixed Wj, E(ibWCLl) = 0. Therefore, ipwcLi are unbiased estimating functions which would provide a consistent estimate of 0. With Wj in the estimating equations, we have Dv, = DH,CM=n-1J]x;W^AiWiXi i = MH,CL1 =n~l ^XjWjaWiXi. 58 From Chaganty (1997), it follows that the optimal choice of Wj is Wopt)j = fi{ 1 Aj. The corre sponding covariance matrix is Vwopt = X* ST^X* where X* = AjXj. Since the optimal choice of weights, Wop^, depends on the unknown parameters 9, we have to replace them with estimates. For this reason, the weighted estimates need to be implemented by an iterated method such as the one described below. 1. Estimate 9 without weights, and set 9 = 9. 2. Calculate the weights Woptti(0). 3. Update 9 using the weights 'W0pt,i(9). 4. Set 9 equal to the new 9 and update W,>pt,i(0)-5. Repeat 3-4 until convergence or for a few times. Let 9wopt be the estimate from the above iteration and 9\vopt be the estimate when the Wop^i are known. We conjecture that ||#vvopt — 0Wopt|| ~* 0 and the asymptotic variance of 0Opitu converges to Vwopt m probability. However, we do not pursue a rigorous proof of the asymptotic properties of 9wopt in this thesis for the reason given below. Theoretically, the weighted method is more efficient. However, the improvement is achieved at the price of increasing computation. Often, the evaluation of Wj is too difficult or time consuming to carry out. To reduce computation, we could consider weighting only guCL- Let w'-( a 1y where Wpor]j = AJ1'1^^1'1^)-1, which is the optimal choice for gi when 9% is known. Comparing with the optimal weights fi~1A,, Wpar-ti is much easier to compute since it only involves the bivariate marginal distributions. The partial weights will mainly improve the efficiency of the estimate of 9\ which, according to our investigation, suffers more efficiency loss (see details in the 59 next chapter). Under the partial weights Wpar)i, Var(0Wpar) « ^X*i(n*(1'1))"lx<i) where X*X = A^'^XJI- Since no weights modify g2BCL, Var(02) remains the same as in (3.12). Estimation with the partial weights can be implemented as follows: 1. Estimate 6 without weights, and set 6 = 6. 2. Calculate the weights Wpar^(6). 3. Update 6\ using the weights Wpar)j(0). 3.2.3 Estimating Equations Based on BCL (CL2) Inference Functions Since both 6\ and 62 can be identified by the bivariate marginal distributions, an alternative to the two-stage method is to estimate the two sets of parameters simultaneously by maximizing the BCL function ^BCL defined in (3.5). However, this method could overemphasize large families since they contribute a quadratically increasing number of pairs to the BCL. It may be sensible to weight the contribution of each family according to its size. An individual from a family of size k appears in k — 1 pairs. Therefore we weight the contribution of the family by l/(k — 1). With u>i = l/(fcj — 1) (hi > 1), the weighted BCL function is: n ^BCM = ^WiY,h{Yij,Yir,0). (3.16) »=1 j>j' When Yn,... , (k{ > 1) are independent, j>j' 1 3>j' 3 We can see that under independence \&BCL is indeed the log likelihood function of Yj. Instead of using a family size-based weight, in theory we can derive the optimal weights. However, as mentioned before, such weights depend on the unknown 0 and are difficult to evaluate, so we do not pursue this. 60 The estimating function based on ^BQL IS +CLM) = &N^{E) = X;^[H(X«)],8CM(Yil7i)> (3.17) where H(X;) = dh{Xf, 0, /80 and gcz,2(Yj,7j = 2^ • The function gcL2 can be written as (giscL,g2BCZ,)', in terms of the partial derivatives with respect to 7a and 7i2, respectively. For ipcL2, i with A* = -E{dgCL2(Yi,7i)/37;} and = MCL2 = n-1 ^^2[H(Xi)]'n*H(Xi) i with nj = Var(gCL2(Y<,7<)). Note that the weight Wi = l/(fcj — 1) can only be applied when hi > 1. When fcj = 1, we set Wi — 1 and use the univariate log-likelihood, i.e., n *BCL(*) = (3-18) i=i where Z(j)(0) = Zi(Yii;0) when-ft* = 1, and l^0) = Y^j>j'h{Yij,Yij'\0) otherwise. Performance of CL2 Estimates in Independence Case Theorem 3.3 Assume the same conditions in Theorem 3A. Under independence, i.e., /fc(Y; 0o,l) 00,: r±i/i(Yi;0o,i), we have AVar(0i)CL2) = AVar(0i)ML£;) AVar(02,CL2) hpd AVar(02,ML£;), where AVar denotes asymptotic covariance matrix. The relations hold for the limits of the covari ance matrices if 0Q,2 on the boundary of the parameter space. 61 In our proof, we will need the following matrix version of the Cauchy-Schwarz inequality given by Chaganty (1997). Lemma 3.1 Let Bj, Dj, l<i<mbetxp matrices. Let Ej, 1 <i < ki be t xt positive definite matrices. Then, assuming that the inverses exist, (EB^) (EB^B<)(ED^) ^(E^D*) • The equality holds ifT>i = SjBj. Proof of Theorem 3.3: To avoid extra notation, we assume ki > 1 for all i. Based on the results in the proof of Theorem 3.2, Fi2(#o) = 0, so we have AV™(d1MLE) = ^Pn^o), (3-19) AVar(02)MLE) = ^F^(0o).- (3.20We write 1 (D2.11 D2ji2 Dc£,2 = n \D2,2i D2,22> where Bw = ~Ei~zr[EEty; *o,i, M, 1,1' = 1,2. Similarly, we write « j>j' 1 /M2,n M2,i2 McL2 = - I n \M29i M292j where MW = E 7j^ri)2 E ^E 4° fe ^'5 0o,i, M j ^E 4° fe Yif; 0o,i, 0C ,2) , I, I' = 1, 2. (1) Claim D2in(0o) = -Fii(0o). 62 Since under independence, Z^O^,#0,1,#0,2) = /^'^(Vtj; 0o,i) + ^'^(iij'j.^o.i), D2III(»O) - -E^i E {'T'Vtfj'o,!)+31,1)(n»';«o,i)} j 1 j>j' = -EE'T,1)(^;ffo,i) = Di>u(tf0) = Fn(fl0). The last step is from step (5) in the proof of Theorem 3.2. (2) Claim D2,2i(0o) = 0. The proof is similar to step (2) in the proof of Theorem 2. Then D2,i2(0o) = 0 as well. (3) ClaimM2,ii(flo) = F1i(fl0). Under independence, k. _l E ^ ' 0o,i' 00,2) = E ^ ^o-1)-Therefore, M2,n(0o) = EE^E 4X)(^;0o,i)j ^E^fe^j =Fii(0o). (4) M2l2i(tfo) = 0. Under independence, M2,21(0O)=E^E^E/?)(^'%^O,1,0O,2^ ^E ^(^^O,!)^ -Then M2,2i(0o) = 0, by a similar proof as we gave for Mi)2i(0o) = 0 in Step (3) of Theorem 3.2. Then M2,i2(0o) = 0 also. By (2) and (4), we have 1 /D-I, 0 Dcl2(0o) = - 2,11 1 and 1 /M2U 0 MCL2(0O) = - ' n\ 0 M2,22> 63 Hence ^.p-1 TV/T Tk-1 1 AVar(0liCL2) = -DjiM^nDjJ!, AVar(02,CL2) = ^D2-j2M2i22D2-j2. By (1) and (3), Therefore, under independence, AVar(0i!c/,2) = AVarf^^MLs)-By a similar proof as in Step (4) of Theorem 3.2, we can show that D2,22(0o) = £ k -1 E E(4^(*u> Yj/; 0o,i, ^0,2)) (4^(**j, Yij'] 0o,i, ^0,2)) , M2,22(0O) = £7jferh)2 EE('2^ Let 1 1 Wj = diag and -1 Bi = E(42)(y^-, Y^; 0o,l, «0,2)) (42)(^U> Yif, 00,1, 0o,2))'. Then D2,22(0o) = w*Bi and M2,22(0o) = WiBjWi; hence AVar(02,BCi) = ^E W*B*) (E W*BiWi) (E WiB* By the matrix Cauchy-Schwarz inequality (Lemma 3.1, (EW*B*) ^WiBiWij (EB*W*) ^ (^Bi) ' Meanwhile, we have shown in step (7) of Theorem 3.2 that F22 = J^i BJ- Therefore AVar(02)BCL) ^pd AVar(02)MLii;). Equality holds when k{ is a constant. • The performance of the CL2 estimate in the dependent case is investigated in the next chapter. 64 3.3 Estimating the Asymptotic Covariance Matrix of 6 Different methods can be considered to estimate the asymptotic covariance matrix of 0. One approach is to evaluate the Godambe information matrix at 0 analytically. This can be time-consuming or even sometimes impossible. For example, with survival data, the matrix cannot be evaluated without specifying the censoring distribution. An alternative approach is to use resampling methods such as jackknife or bootstrapping. Naturally, the sampling units are the families. A third approach is to evaluate the Godambe information matrix empirically or using parametric resampling techniques based on the asymptotic distribution of the CL estimate. We use the jackknife method in our examples in Chapter 6. Let 6-i be the estimator of 0 with Yj deleted, i = 1,..., n. The jackknife estimator of V, as defined in (3.2), is n Vj = Y,(O-i-0)(d-i-ey, (3.2i) i where 0 is the estimate based on all families. A proof of consistency of this estimator for the i.i.d. case and the case with covariates is given by Joe (1997) (Chapter 10). The theorem can be directly applied to familial data with fixed family structure. In general, family structures in the data are not identical, but we assume that the population under consideration is a finite mixture of types of family structures. Since the result holds for each type of family and the estimating function is a summation over all types of families, the consistency can be inferred from Joe's proof. For large samples, the jackknife estimator can be obtained by deleting more than one family at a time in order to reduce the computation time. 3.4 Remarks In this section, we provide some general remarks on the CL and related estimating equation meth ods. Remark 1: For the multivariate probit model for binary data, a related estimating equation approach was proposed by Reboussin and Liang (1998). In their approach, a set of weighted estimating equations is constructed from the first and second moments. The estimating equations for 6\ are the same as the weighted CL1 estimating equations, but the estimating equations for 02 65 are different. For multivariate probit models, under independence, the method of moments leads to the same estimating equations as the ML method for the regression parameters. However, when the estimating equations based on moments differ from the score equations, the method of moments is known to be less efficient than the ML method. We have shown that under independence, the CL1 method is equivalent to the ML method. So under independence, the CL1 method is better than the method of moments in general. Hence, Reboussin and Liang's method of moments idea cannot be expected to have high efficiency if it is extended to some other models. Also, this method cannot be directly applied to the Poisson-log normal mixture model, in which the regression parameters can not be estimated with only the first moment as the moment depends on cr2. Remark 2: The CL1 and CL2 methods are not restricted to the models discussed in this thesis. In particular, the CL2 method works for multivariate models, whether or not the parameters are common to the different margins. It works even when the univariate margins do not belong to the same distribution family. The parameters can be estimated as long as the bivariate marginal likelihoods can be evaluated. Therefore, we can consider a more general form of the BCL function: n i=i j>? where Zjjy is the log likelihood of the bivariate margin of and Y{y. This estimation approach can be considered if the high-dimensional likelihood is too difficult or impossible to compute. Remark 3: The idea can be extended to trivariate CL. However, as the dimension becomes higher, the computational demands increase, so the implementation of such an approach is more difficult. When all the parameters are either univariate or bivariate parameters, it is not clear that the efficiency will be improved by using the likelihoods of marginal distribution of higher order. Remark 4: Typically, the CL methods can be implemented by numerically maximizing the CL functions. For example, the quasi-Newton method can be used to avoid solving the system derivative equations. Numerical optimalization becomes more difficult as the total number of parameters increases since the search for the optimal point has to be carried out in a higher dimensional space. So the CL2 method is numerically more difficult to implement than the CL1 method. To improve the numerical optimization, we can use the CL1 estimate as the starting value for the CL2 method. Remark 5: If the sample size is not large, the numerical optimalization has to be done subject to 66 a constraint that the estimated, dependence parameters must lie inside the parameter space of the multivariate joint distribution. Without this constraint, the solution of the estimating equations may be outside of the parameter space. For models based on the MVN distribution, the constraint on the dependence parameters is that the correlation matrix of each family is positive definite. Remark 6: In the CL2 method, we weight the BCLs by the family size. Naturally, one would consider the same weights in the second stage of the CL1 method. However, those weights are inappropriate when the dependence is weak. As shown in Theorem 3.2, the unweighted CL1 method is equivalent to the ML method under independence. Hence, the weights will only worsen the efficiency when the dependence is weak. When the dependence is strong, the weights can be helpful. But as shown in the next chapter, the CL1 estimates of the dependence parameters perform well even under strong dependence. So weights are not essential. For these reasons, this weighting scheme is not used in the CL1 method. 67 Chapter 4 Efficiency Comparisons The CL methods are not fully efficient. However, it is not known at what parameter values the CL estimates tend to lose more efficiency and how much efficiency is lost for familial data. In this chapter, we intend to partially answer these questions in the context of familial data analysis. We define the asymptotic relative efficiency (ARE) of the CL estimate of the parameter 6 as ARE(0CL) = AVar(0ML£;)/AVar(0cL), where AV&T(9MLE) and AVar(0cL) are derived from the inverted Fisher and Godambe information matrices respectively. We plan to make comparisons for four types of data: continuous, binary, count and survival subject to right censoring. For each type of data we select one distribution as a representative: the MVN for continuous, the MVP for binary, the PLNM for count and the MVN with random right censoring for survival. For each model, we select some cases in which comparisons are possible. For the MVN model, we are able to make theoretical comparisons of the ML method and the CL methods. For the MVP model, we randomly generate covariates and family size and compute the information matrices based on the generated families. For the other two models, the comparisons are done by simulations in which the likelihood is evaluated by numerical integration or approximation. By these comparisons, we will gain some insight into how the performance is affected by different factors, such as degree of dependence, family size and data type. 68 4.1 Continuous Response: MVN model First, we consider the multivariate normal model: Yj ~ iV(Xj/3, a2Rj(a)), where /J is an Tri dimensional vector of unknown regression parameters; and Rj(a) = [pijj'(ct)] is a correlation matrix, with a = (cci,aq)', a vector of dependence parameters. The likelihood of a MVN distribution has a closed form and therefore we are able to derive the theoretical asymptotic variances of both the ML estimates and the CL estimates, so that the efficiency can be compared analytically. For this reason, we conduct comparisons in various settings. The results from the MVN distribution can provide insight into the general behavior of the CL estimates, not limited to just the continuous response. We first derive Fisher's information matrix and the asymptotic variances of the ML estimates and then investigate the AREs of the CL1 and CL2 estimates with conclusions of the comparisons at the end of each subsection. 4.1.1 MLE Assuming R"1 exists, the log-likelihood is: 1 = E "Il0S27r " Tl0^2 ~ \l0S 1^1 - A(Y* - X^'RT-HYi - Xj/3), The score functions are given by: Ir^E^rHY,-^) i i = -±tr (R"1^) + ± £(Yj - XiPyR-^R-HYi - Xj/3), j = 1,, *3 where Ta = ——Rj. These functions are derived based on the general results in Appendix B.l. J OCti 69 For 6' = (/3',a 2,a'), Fisher's information matrix is: / Pn 0 0 \ n F22 F3 32 \ 0 F32 F33 J where Fn = cr-2 ^ X-R^X;; F22 = (2cr2)-1 £V h\ F23 is a q x 1 vector with the jth entry equal to — (2cr2)-1 Y^i trTjjR^1 and F33 is a q x q matrix with the entry in the jith row and j'th column equal to (2a2)-1 J2i trryRr^y/R-1. As AVar(0MLE) = n_1F_1, we obtain kV*c{PMLE) = ^ = a2 (EjXfR"1^)-1, ,2 x , -P22 F32 °MLE AVar <*MLE (4.1) (4.2) \ F32 F33 J (With the MVN distribution, the variance of /3 is the exact variance. But to keep our notation simple, we do not distinguish Var and AVar in this section. Instead, we use AVar everywhere.) 4.1.2 Efficiency of the Two-Stage Estimating Approach (CL1) In this section, we first derive the estimating equations and then compare the efficiency for different cases. Finally, we provide a summary of the comparison results. Estimating Equations: In Step 1, f3 and cr2 are estimated by maximizing the sum of the univariate log likelihood of Yj 13 • ^CL^/V2) = (log27r + loga2 + {Vij j^H i 3 The estimates of /3 and cr2 are obtained by solving J]xf(Yi-Xi/3)' = 0 i (4.3) (4.4) 70 leading to the ordinary least squares estimators of /3 and cr2. If the weighted CL1 method is used, the estimating equations for /3 with the optimal weights are the same as in the full likelihood method: i In Step 2, a is estimated by maximizing the sum of the log-likelihoods of the bivariate margins with /3, a2 replaced by fie Lit and &CLI obtained from Step 1. The BCL function is *BCL(Y; f3, a2, a) = - ]T ]T j log 2TT + log a2 + i log(l - p2ijf)+ (Yjj - Xjj/3)2 - 2pijj,{Yij - •Kijp){Yir - X{j,(3) + {Yjj, - Xif(3)2 \ J (4.5) = c - I log a2 Y, ki(ki-lJ-^EE ~ Ph')~ i i jz£ji J2(Yi - XiPYMYi - Xi/3) 2a2 where c is a constant and Aj = (cijjj') with aijj' = * k -Piji'i1-P%j')~l forj^f. This leads to the following estimating equation for af. 2*CLI E i 9pdf ~ E^Yi - ^cLi/BaCYi - X^CL1) = 0, / = 1,...,9, (4.6) where = dAi/dai. EfBciency The investigation is divided into two parts: (1) efficiency of f3CL1 and (2) efficiency of all estimates under different dependence structures. 1. Efficiency of f3CL1 From (4.3) we have J3CL1 = (EiXfxi)_1 £;xFYi- Therefore, AVar(£Cil) = a2 fe XjX^ fe XjR^) f ^ Xf X^ . (4.7) 71 The variance of 0MLE 1S given in (4.1). We consider the case of only one covariate, denoted by Xj. Then, m = 2, X; = (l,Xj) and /3 = (A),/3i)- The results can be extended to more than one covariate. In the previous chapter, we treated the covariates as known constants. To simplify our analysis of the efficiency, we make the following assumptions about x^: The values of xn,... ,Xiki are independent realizations of a random variable with mean 0 and variance 1. The assumptions about the mean and the variance can easily be satisfied by centering and standardization. Here we assume that the value of X{j is not correlated within families. Cases where the x's are related within family will be considered later. We will derive the asymptotic efficiency of ficLi based on the following results: when the number of families n —>• oo, Result 4.1 J2ijxij/K ^ 0; Result 4.2 xjj/K 4 1; Result 4.3 K~l ^xjRT" 1 ^ 0; Result 4.4 K-1 VJ* xjRjXj - K~l £V tr Rj 4 0, and K~l xjRr1* - K~x trRT/1 4 0. where K = Y17=i *ne total number of observations. These results are derived based on the assumption that M, the size of the largest family, is bounded. See Appendix B.2 for a proof of results 4.3 and 4.4. It follows that when n is large i V since trRj = k{. Also, i , i / £i I'HT i 0 i V 0 X^EitrR"1 72 follows from Result 4.4. Based on the above results, (4.7) and (4.1) become AVar(/3CL1) ~ a2 0 K -l (4.8) AVar(/3ML£) ~ a2 (Eii'RT1!)-1 o \ (4.9) V 0 (EitrRT1)"1 y The relative efficiency of @O,CLI is approximately AVar(A,,MLE) X2 ARE whereas AVar(/30)cLi) (Ei l'H*l)(Ei I'H-T1!)' _ AVar(/3i)MLE) _ ARE ,5 K (4.10) (4.11) AVar(AiCil) EitrRT1' So with this MVN model, the ARE of ficLi omy depends on the correlation matrix Rj and family sizes. When m > 2, Xj = (1,XJI,--- ,Xj(m_i)). If the covariates are realizations of m — 1 mutually independent random variables with mean 0 and variance 1, then K~l Ei xj/xiJ' and K~l Ej xiiRixil' converge to 0 as / 7^ I' (see Appendix B.2 for a proof). Therefore, ARE^jcii, / = 1,..., m — 1, is also given by (4.11). Returning to the case of one covariate, we next investigate some cases in which the x values are correlated within families. The results 1-3 still hold when the x values are correlated within families. Therefore the asymptotic relative efficiency of $O,CLI will not be affected. We only need to consider the asymptotic relative efficiency of $I,CLI-The extreme case is that all family members share the same x value X{. Suppose £j is a realization of a random variable Xi with mean 0 and variance a2. It can be shown that o 2 ^ I'Rii AVar(/9i,cLi) ^ a2 K2 and AVar(/3i,ML£) ^ ^Eii'Rf1! 73 As a result, ARE^ c^ is the same as (4.10). For this case, since the information on 8\ is only from the difference between families, /3I,CLI is not affected much by the dependence between Yj. More generally, suppose the components of Xj are correlated with covariance matrix cr2RX;. Then we have K •^0 i This leads to AVarOW) = ^'lRsiy so that the ratio ARE^ is given by K2 Eitr(RiRXi)Eitr(Rr1RXi)' For the very special case that RXi = Rj, we have K 1 ARE^CU ^ tr R? " 1 + Ei E^< P%rIK' and this efficiency decreases when p2--, increases. 2. Efficiency of All Estimates under Different Dependence Structures We consider the following dependence structures of YJ: exchangeable dependence structure, dependence structure of a Type-3 family; dependence structure of a Type-4 family. For each case, correlations are the same across families. We also consider families with either fixed or varying family size. These results can give us some idea of efficiency for data with more complicated family structure. a. Exchangeable Rj If the largest family is of size M, then — 1/(M — 1) < p < 1. Since M can be much larger than 1, the lower limit of p is close to 0. For this reason, we consider 0 < p < 1. The asymptotic relative efficiencies are: 74 ARE A ARE; n.CLi ARE, JCL1 ARE PC LI (l + ap)(l- bp) 1 1 + bp2/{I -P) 1 (l + ap2)(l--dp2) (1 + P2? (1 — dp)ce1 where a h = c = d = e = fi = J_ ^ kj(kj - l) K^l + ih- l)p I ^ kiiki - 1){1 + {ki - l)p2) K b2/c (1 + fa - l)p)2 h{ki - 1)((1 -(hi - 3)p2)(l + (ki - l)p)2 + ki(ki - 2)V). 2n , , 2x , 2p3(l-P2)ZWki-2)(ki-l) AREp0CL1 and AREp1CL1 are obtained directly from (4.10) and (4.11). Var(a2:L1) and Var(pcLi) are derived using Maple (see Appendix B.3 for the Maple code). Some remarks concerning the dependence of these AREs on family size and the dependence parameter p are given below: 1. Family size: (a) When ki = k for all i, ARE a = ARE^2 = ARE.W. = 1, while V ' ' ' P0,CL1 CL\ PCLl ' ARE, -l-l"-1* . KCLX 1 + (A: - l)p k(k — 1) (b) The term i n — is a strictly convex function of k since dk2 2(1 -p) By Jensen's Inequality, k(k- 1) 1 + (k - l)p\ ~ [1 + {k - l)p]3 k- 1 > 0. 6> 75 where k = K/n. This implies that ARE < 1 (* ~ 1)P2 '&.CL1 " 1 + (jfc - l)p' Equality holds if and only if ki = k for all i. This implies that: for given K and n, the efficiency is always lower when family sizes vary, and further that, roughly speaking, ARE decreases when k increases. (c) Roughly speaking, ARECT2 and ARE^ CL1 increase with k and decrease with the vari ation in ki. 2. Dependence parameter p: (a) J3I,CLI loses efficiency when \p\ increases. As p approaches 1, ARE^ goes to 0. This is because when p = 1, Q\,MLE has no estimation error, whereas PitcLi has estimation error. (b) When p = 1, ARE A = 0 and K2 ARE* =ARE-2 -- _ l9. Clearly AREA and AREA2 are less than one unless is constant, (c) ARE^.2^ is a function of p2. Comparing to ARE^ , it decreases slower when p increases. Figure 4.1 shows the four ratios with the maximum family size M varying from 2 to 10 and mNm/K fixed at 1/M, where Nm is the number of families with m members, m = 1,...,M. (i.e. there are the same total number of individuals from each size of family.) b. Type-3 family with varying number of offspring Let p\ be the parent-offspring correlation and p2 be the sib-sib correlation. Assuming that the index of the parent is 1, then A R-j — (1 Pi Pi Pi 1 P2 Pi P2 1 \Pl P2 P2 Pi P2 P2 1/ 76 I I I I I I I J I I ft 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Figure 4.1: Efficiency of CL1 for exchangeable dependence structure. (MVN model) The correlations p\ and p2 need to satisfy the following condition: Mp\ — (M — l)p2 < 1, where M is the maximum number of offspring in a family (Srivastava and Katapa, 1986). For a fixed number of offspring k* in all families, the relative efficiency was calculated analytically using Maple. The Maple code is attached in Appendix B.3. Since there is no simple expression of the efficiency, we show the results graphically. Figure 4.2 shows the case when k* = 3. When k* increases, the efficiency generally decreases. 77 o g> 'g 0) oo d CD d d d o d (a) A) 1"• yj b--0.5 0.0 0.5 P2 (C) a2 1.0 (b) A oo d o c a> 'o CD d o d -0.5 Pi=0 •-- P!=0.3 p1=0.5 -•- p1=0.8 (e) P2 o c gj 'g it CD CO d CD d d o d O c CD 'g it CD CO d co d d co d o d P2 p2 Figure 4.2: Efficiency of CLl for Type-3 families with k* = 3 offspring (MVN model); P2 = (P2 — Pi)/(1 — Pi) is the sib-sib correlation conditional on their parent. In Figure 4.2, the efficiency is plot at different levels of pi (0, 0.3, 0.5 and 0.8). To make curves more comparable, the efficiency was plotted against the conditional sib-sib correlation, 78 Pi = (P2 — Pi)/(1 — Pi)- Some remarks follow: 1. The efficiency of f3\&Li approaches zero when p2 approaches to its lower and upper bounds. Given pi, Q\,CL\ is most efficient when the conditional sib-sib correlation is close to 0. 2. The estimates of the other parameters perform well when the conditional dependence between siblings is non-negative. Except for &CLl, the efficiency decreases rapidly as the conditional dependence approaches its lower bound. However, in analysis of familial data, we expect the conditional sib-sib correlation to be non-negative. 3. When the parent-offspring correlation p\ is 0, the CL1 estimate of pi is as efficient as the MLE regardless of the strength of the dependence among the siblings. When there is a varying number of offspring in each family, the efficiency is affected by both the mean and relative dispersion of family size. We randomly generated families with numbers of offspring varying from 1 to 7. We generated 500 sets of families. Each set has a different combination of family sizes. We calculated the efficiency of the parameter estimates for each combination at pi = 0.4 and p2 — 0.8, then plotted the efficiency against the mean of k\, denoted by k*, and the relative dispersion, measured by the variance-mean ratio of A;*'s, V/M. We deliberately chose p2 far away from pi since the effect of the relative dispersion of family sizes is more obvious under strong conditional sib-sib dependence. The following patterns were observed: 1. Since trR"1 is almost linear with k*, ARE^ mainly depends on k*. Figure 4.3 shows that the efficiency is monotonely decreasing in k*. 2. In Figure 4.4, we plot the efficiency of the other four parameters. The x-axis is k* and the y-axis is the variance-mean ratio of the A;*s. For J3O,CLI, ®CL\ and P2,CLI, the efficiency is almost linearly decreasing with both k* and V/M. On the other hand, ARE^ CL1 is mainly affected by V/M. For different values of pi and p2, similar patterns were observed, c. Type-4 family In a Type-4 family, there are two parents with their common offspring. Assume that the correlation between father and child is the same as that between mother and child. Let pi and p2 79 Figure 4.3: ARE of /3I,CLI against average number of offspring, k* (MVN model). be the parent-offspring and sib-sib correlations, respectively, and p3 be the correlation between the parents. The three dependence parameters must satisfy the following conditions: (p3 + l)[(M-l)p2 + l] , 2M Pl>°' where M is the maximum number of offspring in a family. This can be proved with the same techniques used by Srivastava and Katapa (1986). Case c is actually an extension of case b. The effect of p3 is symmetric to that of p2. We first examine the boundary cases: 1. When the dependence parameters approach the boundary: (p3 + l)[(M-l)p2 + l] = 2 2M Pl' ARE a and ARE a approach 0. 2. When either p2 or P3 approaches 1, ARE^^^ approaches 0. Table 4.1 presents some results for constant family sizes. Except for the efficiency of the estimates of other parameters is generally high. Discussion 80 (J2 variance-mean ratio of k* mean of k* ET0.9 - c g> =e 0.8 2 10 mean of k* variance-mean ratio of k* 1, 00.9-o 'o £. 0.8-Q> variance-mean ratio of k* 2 10 mean of k* 00.9-I variance-mean ratio of k* 2 8 mean of k* Figure 4.4: Efficiency of p\,CLi, °CL1, pifiLi and /52,CLI (MVN model). The performance of the CL1 estimators is affected by the degree of dependence among the family members. When the correlation parameters approach the boundaries, Rj is close to singular and the efficiency of /3I,CLI g°es to 0. The CL1 estimators of the other parameters are less affected by the correlation parameters and perform fairly well. Their efficiency can reach 0 at some particular parameter values, such as in case b, when the conditional sib-sib correlation approaches its lower boundary. The efficiency is also affected by family size. In general, the efficiency tends to decrease as the average family size increases and the relative dispersion of family size increases. When the relative dispersion of family sizes is large, the data are mainly formed by either small or large families. Individuals or pairs contributed by different families are equally weighted in the CL1 method. That is why large relative dispersion of family size combined with high dependence has a strong effect on their efficiency. 81 pi = 0.2 P2 = 0.4 p3 = 0.3 pi = 0.2 P2 = 0.7 P3 = 0.5 Pi = 0.5 P2 = 0.7 p3 = 0.5 number of offspring 0o .987 .921 .982 A .777 .475 .475 a2 .993 .936 .974 Pi .989 .904 .972 k* = 3 P2 .999 .985 .995 P3 .991 .914 .987 00 .978 .893 .961 01 .727 .412 .412 a2 .987 .898 .942 Pi .987 .888 .960 k* = 5 P2 .997 .947 .969 P3 .980 .865 .966 Table 4.1: AREs of CL1 for Type-3 families with equal number of offspring (MVN model) 4.1.3 Estimates Based on Bivariate Composite Likelihood (CL2) Estimating Equations In the CL2 estimating approach, j3, a2 and a are estimated simultaneously by maximizing the product of the likelihoods of the bivariate margins. First, we will show why it is appropriate to adjust the BCL by family size. Suppose ki > 1 for each i. We estimate all the parameters by maximizing the BCL function defined in (4.5). Differentiating and setting to zero lead to the following set of equations: = ^a^XjAi(Yi-X<j9) = 0 i th = ^ a a2 ^ - 1) - J](Yi - X^j'A^Yi - Xi/3) = 0 (4.12) (4.13) £(V. - - *» - 0 (4,4) Let $BCL be the estimator based on the above estimating equations. From (4.12), we have PBCL = (EiXjAiXir^XjAiYi. Therefore, AVar0BCi) = a2[J2 X'^Xi £ XjA;RjA;Xi £ AjX; (4.15) 82 We consider the case of one covariate and make the same assumptions for the covariate as on page 72: the values of xn,... , are independent realizations of a random variable with mean 0 and variance 1. Asymptotically, AVar(/30,BCi) - S<1'Bil AVar(A,BCL) a2(trA02' The efficiency relative to the MLE is: ARE - 'ElW where Bj = cr2A;RjA AREs (£il'Bil)(E.l'R-1l) (trA,)2 3I.BC* (EjtrBO^trR"1) Using the exchangeable case as an example, in Figure 4.5, we plot the efficiency of both the CLl and BCL estimators relative to the MLE for the special case of a mixture of two family sizes: size two and six with proportion 3:1. For /?o, the efficiency of the BCL estimator is much lower than the CLl estimator when the association is high. For Bi, the BCL estimator is better when the correlation is strong, whereas the CLl estimator is better when the correlation is weak. It is Figure 4.5: Efficiency of CLl and BCL for the exchangeable case. Families are of size two and six with proportion 3:1 (MVN model). not surprising that the estimates based on BCL are inefficient when the correlation p is close to 83 zero, since (4.12) can be seen as the CLl estimating equation (4.3) weighted by Aj. When p = 0, Aj = (ki — 1)1. By Theorem 3.2, the CLl is as efficient as the MLE when p — 0. Hence it is more efficient than a method with unequal weights. This justifies the previous consideration in Section 3.2.3 of weighting the BCL function by family size. In the CL2 method, the weight of family i is given by Wi = 1 for ki = 1 and Wi = l/{ki — 1) if ki > 1. Let Z(j)(/3, cr2, a) be the contribution to the CL from the ith. family before weighting. After weighting by w{, the BCL function becomes: ^*BCL{Y;/3,a2,a) = J^Wil^ifi^2 ,a) i \loga2Y^h-\y£dWiYJ log(l - fijj,) -i i jjtj' = C ^ J2(Yi - Xi(3)'A*(Yi - Xj/3), where c* is a constant and 1 when ki = 1 Aj/(fcj — 1) when ki > 1 After taking partial derivatives of l2 and rescaling, we obtain estimating equations of the CL2 method: ft = EX'*A**(Yi-Xj/3)=0 i V>2* = a2 J>-^(Yj-Xj/3)'A*(Yj-Xj/3) = 0 (4.16) (4.17) r3 Pijf dpijji J2(Yi - XiPY^p-iYi - Xj/3) = 0 (4.18) da The estimate of /3 generated from the CL2 method is hem = (E X. Ai x<) E XlAiY-The variance of $CL2 1S Var(j3CL2) = a2 ^ XjA?X^ ^ Xj A?Rj A?X^ ^ XjAJX4j . For the example illustrated in Figure 4.5, (3o,cL2 is as efficient as /3O,CLI and (3i,cL2 is generally more efficient than $itcLi- This will be shown in the next section. 84 Efficiency First, we show that the CL2 method provides a better estimate of Q\ than the CLl method. Theorem 4.1 Consider the case of one covariate. Under the assumption that the values ofxn,..., x^ are independent realizations of a random variable with mean 0 and variance 1, we have AVar0hCL2) < AVar(pi,CLi). Proof: From (4.8), we have AVar^^cLi) — 1/K. By a similar deduction, we also have EtrBJ AVar(/31)CL2) (£trA*)2' where B* = A*RjA*. (1) We first show that £tr AJ > K. It is trivial for ki = 1. For kt > 1, (2) Next we claim that trB* < tr A*. It is true when fcj = 1. For ki > 1, we have tr-A* — . _ -i y,aijj-and tr Bj' = (j^ _ 1)2 E ^ij^^iji where is the jth column of Aj. To prove the claim, it suffices to show ajjR,jajj ^ Qijj{ki 1). By symmetry, we only need to show that this is true when j = 1. ^jiRiaji = afn + 2ajn ^ anjpnj + ^ a2^- + ^ anjO-iij'Pijj' j=2 j=2 jft>>2 Since k{ k{ ^ k{ ^2 Otll + E a^3PH3 = E i _ n2 ~ E i = k ~ 1, ' j=2 j=2 1 j=2 1 P^3 85 a^Rjaji = (ki — l)am + am ^aajPiij + E8iij + E aHjavij'Pijj j=2 j=2 j±j'>2 ki ' • ~ Pil.7 / V ,--o 1 Pili / o-9 ^ P1I7V + £ ^•=2X ^7 ^3) j=2^ PiljPilj' Pijj ^>2(1-P?y)(l-^) 2 - iu ^ili _i_ PiijPnj'Pijj 0- ~ P?y)U ~ Pij') j#y>2 (1 - P?y)(l - Pij') = (ki - l)am ~ Y2 Piij + pfij' ~ Ipiijpiij'pjjji j>?>2 tt-PiljW-Pil?) Since paj + p 2aj, > 2\pnjpnj>\ > 2|piljPiij'Pw'|, Paj + Pa/ ~ 2pnjPnj'Pijj' > 0. Therefore, a^Rjaji < (fcj - l)am-From (1) and (2) we easily obtain that AVar(/Ji)cL2) < AVar(/3i)cLi)- The equality holds when Rj = I for all i. • Next we give some results for the exchangeable case and for Type-3 families, a. Exchangeable (ki > 2) 1. ARE H = ARE a and ARE.2 = ARE.2 . P0.CL2 P0,CL1 CCL2 aCLl 2. ARE* 1 01.CL2 [i+p_p2(i_a)](i_6p)' where a = K'1 £ - 1) and 6 = if-1 £ + (ki - l)p]. When p = 0, ARE A = 1. r Pl,CL2 When p > 0, since 0 < a < 1 and 6 > 1/fe, AREA > - \-, > - =7- > 0.8. KCL2 - (! + p _ p2)(! _ p/fc) - (1 + p _ p2) -3. In Figure 4.6 we plot the efficiency of pen, PCL2 and PBCL for the case of family sizes two and six mixed with proportion 3:1. When p is close to 0, the CL2 estimator is worse than both the CL1 and BCL estimators. However, as p increases, the CL2 estimator surpasses them at around p = 0.2. 86 0.0 0.2 0.4 0.6 0.8 1.0 P Figure 4.6: Efficiency of pen, PBCL and pcx2 (MVN model); family sizes are 2 or 6 with proportion 3:1. The CL2 estimate is weighted by family size; whereas the BCL estimate is not. b. Type-3 family In Figure 4.7 we plot the efficiency of the CL2 estimators for all the parameters for the case of number of offspring varying from 1 to 5 with equal proportion. Similar to Figure 4.2, the efficiency was calculated at four different levels of p\ = (0,0.3,0.5,0.8), and plotted against the conditional correlation p2. For the purpose of comparison, we also plotted the efficiency of the CLl estimators. The plots show that 1. /3Q: when p2 is small, the two estimators are similar. For positive p2, the CL2 estimator loses less efficiency when either pi or p2 increases. 2. fii: unlike the CLl estimator, the CL2 estimator performs very well when pi or p2 is large. 3. cr2: the efficiency pattern is similar to f3$. 4. pi and P2: The CL2 estimator loses efficiency when the dependence is small or negative. It performs better than the CLl estimator when the dependence increases. 87 Figure 4.7: Efficiency CLl and CL2 for Type-3 families. The family size varies; the number of offspring is from 1 to 5 with equal proportion (MVN model). 88 Discussion In summary, compared with the CLl approach, the advantages of the CL2 approach are: 1. the efficiency of the estimate of 3\ is much improved, while the efficiency of the estimates of Po and <72 is similar to the former approach; and 2. the efficiencies of the estimates of the dependence parameters are improved when the corre lations are high. The drawback of this approach is that when the dependency is weak, the estimates of a are less efficient compared with the CLl approach. 4.2 Binary Response: MVP Model The multivariate probit model for binary data is specified in Section 2.2.1 with the probability that Yj = yj given in (2.5). Let p{ = {pijfj > j'), then 7^ = (/x-,p-). Suppose fi{ - xf}/3 and Pi = Xf )a. Then 6' = (/3', a'). We have and 7J = Xj0. 4.2.1 Likelihood and Composite Likelihood Estimating Functions MLE Suppose there are Mj different possible outcomes of Yj with non-zero probabilities. Let yj,/ denote the Zth outcome and TTU denote the associated probability. For the MVP model, Xj = where 2l£'j is the rectangular region in 5Hfei in which Zj falls when Yj = yj^. For each i, ^ nu = 1. L(Yj;0) = n4(Yi=yM)> 1=1 89 where I is the indicator function. The estimating functions are ^^J(Yi = yw)Xj*^=0. The Fisher Information matrix is (diru/frli) (dnu/dii)' n Xj. (4.19) Let Aj = (Ajjj/) denote the term inside the square brackets in (4.19). Then F = -Vx^AjXj. (4.20) i The Two-stage Estimating Approach (CLl) Let Vi] = EYij = 1 - $(/%), Vi = (i/a,...,!/^)', a?- = Var(l^j) = *l>y)[l - $l>y)], and The univariate log likelihood of Yjj is hiYijiP) = (1 - r-j) log*Oy) + Ytj log[l - $(Wj)]. In the first step, /3 is estimated by the solution of the following equations J](xS1))'gf/CL(Yi,/Xj) = 0, (4.21) i where gUCL{ii,Vi) = 5/Xji dfiiki Here fli(Yjj) = (1 - YjjWjj Yjrfjj = <fo {Yjj - [1 - $(/ijj)]) = ^(^ -Let Aj = diag(<^/<7?). Then we have gr/CL(Yj,//j) = Aj(Yj - i/j). In the second step, the bivariate marginal log likelihood is given by: h{Yij, Yiji;nij,Hiji,pnji) = (1 - Yij){\ - Yiji) log[$2(/ijj, P-ij>\Pijj')] + (1 - Yij)Yij> log[$i(/ijj) - $2(Mij)/iy';Pw')] +yjj(l - Vjj') log[$i(/ijj0 - $2(p-ij,P>ij<;pijj>)] +YijYiji log[l - $i(/ijj) - + $2(p<ij,Vij';Pijj<)] 90 The estimating equations for a are: E(x!J))Wi(Yiitt,ft) = 0 i where sir \ I vkyYijiYiji] Uij, fail, Pijj') ., g2BCL(Yi;/ii,pi) = a " —> 3>3 \ oPijj' ((-i)Yij+Y^'Mt*ij,t*ij>;pijj>) . > . y 1*2 {Yij > Yiji; p^, //jj', p^ji) ' y The Godambe information matrix for /3 is given by (3.12) where dK a/ij ^ V4y and fi,1'1 = Cov(gr/cL(Yi,/xi)) = AiCov(Yi)Ai. 3. CL2 method To estimate both (3 and a based on the weighted BCL, the estimating functions are: 0*i(Y»i) , 1 ^ dhjYjj,^,) {i:ki=l} M {i:ki>\} j>j' H sr^ dpn 4>ii(Yn - pn) 1 9pjj ^ ( 1)1 ^^i/ij^1 ®ij'\j3 ) V2,CL2 fc - lf^. 9a {i:*i=l} i>j' E fc. _ 1 E gpijj/ (-l) Yij+YiJ'(f)2(pjj,pjj>; pjjj') where ^ = $(/ia) and ^ = $ • ^"'^ m' 4.2.2 Efficiency Comparison In the first part of the comparison, we focus on the efficiency of (3c Li, since in the previous section, we observed that the efficiency is strongly affected by the within-family dependence. In the second 91 part we focus on comparing the efficiencies of all parameter estimators from the CLl, CLl with optimal weights (WCL1) and CL2 methods. The AREs need to be evaluated from the information matrices which depend on the values of the covariates and their coefficients. In our comparison, we considered one covariate and simulated the values of the covariate from a uniform distribution on (—1,1). These values are independent both within a family and among families. The efficiency is compared at different levels of Po and A which are listed in Table 4.2 along with the corresponding range of Pr(Y = 1). Only positive values of /3 were investigated; by symmetry, the results can be extended to negative values. 00 A Pr(y = 1) 0 0 0.50 0 I 0.159 ~ 0.841 0 2 0.023 ~ 0.977 0.67 0 0.749 0.67 1 0.371 ~ 0.953 0.67 2 0.092 ~ 0.996 1.65 0 0.951 1.65 1 0.742 ~ 0.996 1.65 2 0.363 ~ 1.000 Table 4.2: Range of Pr(Y = 1) for each combination of Po and A (MVP model). Efficiency of /3CL1 To gain some understanding of how the performance of the estimators is affected by the degree of dependence, we consider the bivariate case; that is, the family size k{ = 2 for all families. We generated x values for 1000 families. The efficiency of PO,CLI and A,CL2 was evaluated at the nine combinations of Po and A when p varied in the range (—1,1]. The results are plotted in Figure 4.8. As expected, the efficiency of PO,CLI is only slightly affected by p. For positive p, ARE^Q is generally above 0.9. The efficiency of A^LI is affected more by p. In all the cases, the stronger the dependence among Zj, the less efficient is A,CLI- When A = 0 and pi is close 1, AREr. is close to 0. PI,CLl 92 93 Next we will have a closer look at the worst scenario: Z's are perfectly dependent within family and (3\ = 0. The family size is not restricted. When fa = 0, then pij = Po for all j. Perfect dependence then implies Zn = Z& = • • • = Zikr As a result, Yn = Y{2 = • • • = Y^. By the ML method, Pi can be estimated without error. The MLE of p0 is P0,MLE = $~HY), where Y = ya/n. Let a2 = Var(ya) = $(A))(1 - *(#>)) and 6 = a2/4>2(p0). Then VZX{PO,MLE) = 5/n. For 3Cil) / 52ijxij \ ( Ei kiXij iMl,11=ij:(xS")'j.x!i»=i Suppose x is centred, i.e., Ylijxij = °- Then Var(/?0,CLi) = <^Z)i ki/(Yli ki)2- When &j is constant across all families, Var(/3o,CLi) = S/n — the same as VSX{PO,MLE)- Meanwhile, Vax(PitcLi) = <5 Ei(Ej Xij)2/(J2 xlj)2- Since there is no estimation error in the MLE, the efficiency of Pi is zero. As shown in Figure 4.8, when Pi ^ 0, the efficiency is no longer zero since the MLE is also subject to estimating error. Comparison of CLl, WCL1 and CL2 Estimates a. Exchangeable In the following comparisons, efficiency was computed under the first four combinations of Po and Pi in Table 4.2. For each combination, p varies from 0 to 0.90. The other combinations of Po and Pi are not included since in those cases the numerical results are not reliable when p exceeds a certain level. The x values of 500 families were generated for each simulation with family size being randomly generated from 1 to 4 (the number of families of each size is 141, 139, 105 and 115, respectively). 94 95 |5o=0 p,=0 —I 1 r 0.2 0.4 0.6 Po=0 P,=1 —I 1 1 1 0.2 0.4 0.6 0.8 f>0=0.67 p,=0 —i 1 r 0.2 0.4 0.6 CL1 WCL1 CL2 po=0.67 P,=1 I I 0.0 0.2 i i 0.4 0.6 I 0.8 Figure 4.10: ARE of PCLI, PWCLI and pcvi for the exchangeable case with family size varying from 1 to 4. (MVP model) The results are plotted in Figure 4.9 and Figure 4.10. The following patterns are observed: 1. The efficiency of the weighted CL1 estimator is always close to 1 under all the conditions and for all three parameters. 2. The CL2 estimators of p\ and 3\ are better than the CL1 estimators. The improvement of $\,CL2 is substantial when p is close to 1 and 3\ is close to 0. 3. pcL2 loses efficiency when p is close to 0 and 3\ is close to 1. Recall that in the MVN model, ARE(/3O,CL2) = ARE(/3n,CLi) for the exchangeable case. However, in the MVP model, ARE(/3n,CL2) is larger than ARE(/3n,CLi)- If we compare the efficiency loss of $O,CLI in the MVP model to that in the MVN model, we find that there is more efficiency loss in the MVN model. 96 Figure 4.11: ARE of f$CL\i PWCLI ana" &CL2 f°r Type-3 families, at p\ line: CL1, dashed line: WCL1, dotted line: CL2. 97 = 0.3 (MVP model). Solid P2 P2 P2 P2 (a) Pi P2 q q CO O " CO o CO 6 " t d efficiency CO d " d CL1 WCL1 CL2 <N _ o Po=0.67 ft=0 •N _ o p<p0.67 p1=1 o d " o d " 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 P2 P2 (b) P2 Figure 4.12: ARE of pCL1, pWCL1 and pCL2 at pi line: CLl, dashed line: WCL1, dotted line: CL2. 98 = 0.3 for Type-3 families (MVP model). Solid P2 P2 (a) fa P2 (b) 0\ Figure 4.13: ARE of PQLII 0WCL\ and 0CL2 f°r Type-3 families, at p\ line: CL1, dashed line: WCL1, dotted line: CL2. 99 = 0.6 (MVP model). Solid P2 P2 (a) Pi P2 P2 P2 (b) p2 Figure 4.14: ARE of PCL\, PWCLI and PCLI f°r Type-3 families, at p\ = 0.6 (MVP model). Solid line: CLl, dashed line: WCL1, dotted line: CL2. 100 b. Type-3 family Again, 500 families were generated. In each family, the number of offspring was randomly generated from 1 to 3; there were 165 families with one offspring, 163 with two and 172 with three. The same four combinations for Po and Pi were considered. In the first case, the parent-offspring correlation pi was fixed at 0.3 and the sib-sib correlation p2 varied from 0 to 0.9. The efficiency of the four parameters was plotted in Figure 4.11 and Figure 4.12. In the second case, p\ was set at 0.6 and p2 was varied from 0.3 to 0.9. The results are in Figure 4.13 and Figure 4.14. The patterns for the efficiency of estimation of Po and P\ are the same as in the exchangeable case. PI,CLI lost more efficiency when pi = 0.6 than when p\ = 0.3. All three estimators of p\ are highly efficient. The efficiency of p2)cz,2 is relatively low when the dependence is weak. 4.2.3 Conclusion For the MVP model, we observe patterns similar to those seen in the case of the MVN model. As familial dependence increases, PI,CLI loses efficiency, while PI,CL2 is generally more efficient. The CL2 estimator of the dependence parameters loses efficiency when the dependence is weak. Unlike the MVN, for the MVP model the efficiency is affected by the magnitude of /3. The effect on the CL2 and weighted CLl methods is rather small, but is greater for the CLl estimators. In general, at a fixed dependence level, the efficiencies of the CLl estimators are relatively higher when Pi is far enough from 0 so that a change in x causes a notable change in the marginal probability. 4.3 Count Response: Multivariate Poisson-Lognormal Mixture Model The model is defined in Section 2.3.1 with pmf given in (2.11). In this model, Oi = (/3, CT2), 02 = a, Iii = (K.o-i) and 7i2 = {pij3>,j > f). 4.3.1 ML and CL estimating functions ML 101 The score functions are oo Vij=0 Viki=0 LetTr(yi;7i(6>)) = Pr(Yi-yi). CL1 and CL2 The univariate marginal likelihood function of Yj is h{Yij) = log J Pr(Yj = y»j|Ay = eZiJ)(j>i(zij;iJ,ij,o2)dzij and the bivariate marginal likelihood function of Yij and Y^/ is h(Yij,Yijt) = log jj Pr(Yj = yij\Kij = eZii)Pr(Yij> = yij/|Ajj/ = eV)^^,^-,; /^j, /_tij,cr2, pijj>).dzijdziji The CL1 and CL2 estimating functions are formed as described in Section 3.2.2 and Section 3.2.3. 4.3.2 Efficiency Comparison Since the calculation of the information matrices is difficult to carry out, the efficiency comparison is conducted by simulations. We considered the exchangeable dependence structure. In each simulation we generated 2,000 samples each containing 3,000 families. The family sizes were randomly generated from 1 to 4. To reduce the computing time, we use a discrete covariate which was randomly generated with value —1, 0 or 1. We first fixed 3\ = 0.5, a2 = 0.25. We chose three different levels of 0o — —1,0 and 1. For 0o = —1, the count variable is close to a binary variable — around 90% of the counts are 0 or 1. As 3Q increases, the count variable becomes closer to a continuous random variable. At each level of 00, we conducted three simulations with the correlation between the latent variables, p, set at 0.2, 0.5 and 0.8, respectively. The AREs are reported in Table 4.3. The mean and variance reported in the table are calculated at Xij = 0; p* is Corr(Yij, Yj/) with Xij = Xiji = 0. In some cases, the ARE is slightly over 1 due to simulation noise. 102 P P* CL1 CL2 A) fa °2 P 0o 0i °L P Pv =-1 mean: 0.417 variance: 0.466 0.2 0.5 0.8 0.019 0.050 0.083 1.002 1.002 1.002 0.997 0.997 0.990 0.999 0.997 0.952 0.986 0.907 0.989 1.001 1.002 1.001 0.876 0.997 0.996 1.001 0.938 0.984 0.995 0.976 0.954 A)=o mean: 1.133 variance: 1.498 0.2 0.5 0.8 0.044 0.114 0.190 0.998 0.992 1.006 1.003 0.982 0.966 0.985 0.982 0.982 0.927 0.973 0.968 0.998 0.997 1.002 0.875 0.988 0.989 0.994 0.912 0.994 0.981 0.995 0.959 A) = l mean: 3.080 variance: 5.775 0.2 0.5 0.8 0.084 0.219 0.364 1.000 0.987 0.999 0.992 0.984 0.924 0.966 0.972 0.956 0.830 0.927 0.936 0.998 0.996 1.001 0.908 0.989 0.984 0.985 0.933 0.980 0.971 0.977 0.943 Table 4.3: ARE of the CL1 and CL2 estimators for the exchangeable case at different levels of 0\ and p (MPLN model). At all levels of Bo, the efficiency of 0\,CLI decreases as p increases and the efficiency loss increases with Bo- The efficiency of oCLl is relatively low when p = 0.8. 0\,CL2 is more efficient than $I,CLI and less affected by p and Bo- The efficiency of pcm is lower than pen except for the last case, /3n = 1 and p = 0.8 and that could be due to simulation noise. Also we see that p* is small for all the other cases. This is consistent with the MVN and the MVP models, in which pem also tends to be less efficient when the familial dependence is weak. When a2 increases, the counts become more and more overdispersed relative to the Poisson distribution. We also did simulations to investigate how this affects the estimates. In these simu lations, we fixed Bo = 0, B\ = 0.5, p = 0.5, and three different level of a2 were chosen: 0.1, 0.25 and 1. The AREs are reported in Table 4.4. Even though p is fixed, as a2 increases, p* increases and 0itcL\ becomes less efficient. The estimator o2CLX also loses efficiency as a2 increases. As the CL1 method tends to lose efficiency as dependence is high, we also did a simulation with a2 = 1 and p = 0.8 to get an idea of how much worse the estimate can be. The results are also reported in Table 4.4. In this case, the efficiencies of 0I,CLI and OQL1 are much lower than the corresponding estimates from CL2. Moreover, the CL2 method provides a better estimate of p as well. We also conducted simulations with Bo = 0, a2 = 0.25, p = 0.5, and different levels of B\. There is very little change in the efficiencies of all the parameters, and therefore the results are not tabulated here. 103 a2 Mean Variance P* CLl CL2 Po A °l P A) A o2 P p = 0.5 0.10 0.25 1.00 1.051 1.133 1.649 1.167 1.498 6.319 0.049 0.114 0.279 1.002.1 0.992 1.001 0.998 0.997 0.990 0.999 0.997 0.955 0.852 0.896 0.940 1.009 0.996 1.001 0.913 0.997 0.996 1.001 0.938 0.977 0.972 0.989 0.949 p = 0.8 1.00 0.527 1.649 .632 0.895 0.673 0.809 0.932 0.955 0.954 0.953 0.963 Table 4.4: ARE of the CLl and CL2 estimators for the exchangeable case at different levels of a 2 and p (MVPL model). In conclusion, both Po and a 2 affect the efficiency of the CLl estimates, especially A,cii • Once again, we observed (1) efficiency loss of A,CLI when the dependence is strong; (2) the CL2 method is less efficient than the CLl method for the dependence parameters when the dependence is weak. 4.4 Survival Data Subject to Right Censoring: Multivariate Log-normal Model The goal of this section is to gain some knowledge on how the CLl or CL2 estimates are affected by the presence of censoring. Let us consider failure times with a multivariate lognormal distribution subject to right censoring. The model is described in Section 2.4.1. The weighted CLl method is not considered in this section since the optimal weights are difficult to compute in the presence of censoring. When censoring is present, the asymptotic variances of the estimates generally depend on the censoring mechanism. For the case of random censoring, these depend on the distribution of the censoring time C. Moreover, the information matrices are difficult to derive even when a censoring distribution is specified. For this reason, a simulation study was conducted to assess the efficiency. In the simulation study, we consider an exchangeable dependence structure among failure times within a family. The marginal distribution of log(Tjj) is normal with mean /ijj = Po + A^ij and variance a 2. The value of iy was generated from uniform (—1,1). For convenience, the distribution of the log censoring time, log(Cjj), is also chosen as normal with parameters pc and 104 p r CLl CL2 Po Pi °2 p Po Pi °2 p 0.1 0 0.2 0.5 0.8 0.996 0.985 1.000 0.991 0.995 0.984 0.999 0.997 0.994 0.988 0.996 0.992 0.998 0.994 0.999 0.996 0.998 0.997 1.000 0.913 0.996 0.995 1.000 0.889 0.995 0.996 0.999 0.876 1.000 0.999 1.000 0.884 0.4 0 0.2 0.5 0.8 0.937 0.788 0.979 0.957 0.945 0.807 0.980 0.946 0.945 0.842 0.973 0^936 0.962 0.906 0.976 0.943 0.960 0.968 0.987 0.967 0.974 0.970 0.993 0.964 0.978 0.969 0.992 0.928 0.986 0.977 0.993 0.903 0.7 0 0.2 0.5 0.8 0.885 0.447 0.924 0.916 0.882 0.480 0.909 0.908 0.874 0.547 0.902 0.891 0.894 0.677 0.905 0.887 0.942 0.958 0.961 0.957 0.944 0.949 0.961 0.952 0.949 0.945 0.966 0.922 0.964 0.939 0.966 0.898 0.9 0 0.2 0.5 0.8 0.846 0.160 0.863 0.885 0.841 0.183 0.845 0.876 0.840 0.239 0.839 0.869 0.871 0.353 0.867 0.889 0.924 0.966 0.934 0.943 0.926 0.947 0.937 0.940 0.943 0.932 0.949 0.925 0.982 0.913 0.976 0.919 Table 4.5: ARE of the CLl and CL2 estimators for the exchangeable case with family size varying from 1 to 4 (MVN model with right censoring). cr2. dj is independent from Ty and independent both within and between families. By varying p,c, we can achieve the desired censoring rate. Suppose r is the target censoring rate. We choose p,c such that Pr(Cy < T0) = r, where log (To) ~ N(P0,a 2). In each generated sample, the censoring rate fluctuates around r, but the average is close enough to r. In each simulation we generated 5,000 samples each containing 2,000 families. The family size was randomly generated from 1 to 4. We fixed Po = 1, Pi = 1 and a 2 = 4 since they have little impact on the efficiency. The censoring rate r was chosen at four different levels: 0, 0.2, 0.4 and 0.8. The case of r = 0 was also considered for the purpose of comparison. Four levels of dependence were considered: p = 0.1, 0.4, 0.7 and 0.9. The AREs are shown in Table 4.5. The following is a summary of the simulation results: 1. The CLl method: (a) Regardless of the degree of censoring, PifiLi loses efficiency as p increases. However, the loss of efficiency becomes less when r increases. 105 (b) The efficiency of other parameters is generally high and not much affected by the cen soring rate, especially when p is small. When p is relatively large, there is a decreasing trend in ARE^ and AREpCil. 2. The CL2 method: (a) The estimator pchi loses efficiency when p is close to 0 and, among the four parameters, it is also the most affected by the censoring rate. Its efficiency decreases with r, especially when p = 0.4 or 0.7. (b) The other parameters are not much affected by the censoring rate. The performance of $i,CL2 is satisfactory. We observed an increasing trend in AREa and ARE^.2 , and a decreasing trend in ARE a when p = 0.7 or 0.9. 4.5 Conclusion From the different models and dependence structures investigated in this chapter, we obtained similar results. The following is a summary of the main patterns for relative efficiency. 1. Regression parameters: the CLl method is easily affected by the following factors. (1) de pendence: it tends to lose more efficiency when dependence becomes stronger. The efficiency can be 0 in certain extreme cases. (2) data type: there is more efficiency loss for continuous responses than discrete responses. (3) censoring rate: when right censoring occurs, there is less efficiency loss when the censoring rate increases. (4) family size: the efficiency decreases with the average family size. The CL2 method is less affected by these factors. The efficiency of the CL2 estimate is close to 1 most of the time and generally better than that of the CLl estimate. 2. Other univariate parameters: both methods are reasonable. 3. Dependence parameters: the CL2 method is generally better for stronger dependence, even though exceptions can occur; the CLl method is better for weaker dependence. 4. Effect of family size: the efficiency of all the parameters is negatively associated with the mean and relative dispersion of family size (measured by the variance-mean ratio). 106 We make a few final remarks about these two approaches. They are not limited to familial data. We can apply them to other correlated data. We recommend the CL2 method when the initial data analysis suggests a strong dependence. It generally provides better estimation for all the parameters, especially the regression parameters. However, it is numerically more difficult to implement since computation increases when the total number of parameters increases. Therefore we recommend the CL1 method when the initial data analysis suggests a weak dependence. 107 Chapter 5 Inferences for Log Odds Ratio with Dependent Pairs This chapter is motivated by an association study of clinical features on NF1 patients (see Szudek et al. (2000)). The objective of the study is to quantify the association between the occurrence of the same clinical feature in relatives affected with NF1. Two associations are of special interest in the study: interclass association, i.e., association between parent and offspring. (2) intraclass association, i.e., association between siblings. As mentioned in an earlier chapter, comparing the strength of these two types of associations may help to reveal characteristics that separate familial and non-familial factors. For continuous variables with a multivariate normal distribution, the intraclass and interclass correlations are used to measure the sib-sib and parent-offspring associations. For a binary response, we study the intraclass and interclass (log) odds ratios as analogies of the intraclass and interclass correlations. Let X and Y be the indicators of the occurrence of a feature for a parent and an offspring, respectively. If the feature is present, i.e., the individual is affected, the indicator equals 1. Then there are four possible outcomes: (1, 1), (1, 0), (0, 1), and (0, 0). Suppose the probabilities associated with the four outcomes are P = (P\,P2,P3,Pi)'. The interclass odds ratio is _P\JP± The intraclass odds ratio, rss, is defined in a similar way. 108 If we have a random sample of n independent parent and offspring pairs, (Xi,Yi), i = 1.. .. ,n, and these pairs are identically distributed Bernoulli random variables. We can form a 2x2 table with counts, Oi, O2, O3 and O4, of the four outcomes (1, 1), (1, 0), (0, 1), and (0, 0) (Table 5.1). The odds ratio can be estimated by f>-°*2* (51) and the log odds ratio, 7 = log[(PiP4)/(P2-P3)], can be estimated by 7 = log(f). Based on the 1.1. d. assumption, the counts in the 2x2 table can be considered as a random sample from the multinomial (n;P) distribution. As a result, the asymptotic variance of 7 is h~l £; 1/P*-Offspring 1 0 Parent 1 0 02 03 04 Table 5.1: Contingency table formed by independent parent-offspring pairs In NF1 association study, the sample units are not independent relative pairs. For the interclass odds ratio, the data contain families with one NF1 parent and at least one NF1 offspring. The odds ratio was estimated based on a contingency table formed by all possible parent-offspring pairs. For the intraclass odds ratio, the data contain families with at least two NF1 siblings. The odds ratio was estimated based on a contingency table formed by all possible sibling pairs. In this case, pairs in the contingency table are not independent. Firstly, family members are dependent if there exists a familial association; secondly, the same individual appears in more than one pair when the family size is larger than two. Assuming exchangeability within a family and closure of multivariate binary distribution under margins, Oi/N, where N is the total number of pairs, is an unbiased estimate of P;, I = 1 to 4. Therefore, the cross-product ratio of the table is still a valid point estimate of the odds ratio. However, the dependence among pairs will affect the variance of the estimators and this has to be taken into consideration. The idea of forming a 2 x 2 contingency table by all possible relative pairs has been used by Hunt et al. (1988) to test familial aggregation for families of a fixed size. (Familial aggregation is a terminology commonly used in the literature referring to the resemblance among family members.) The test is based on the standard x2 statistic generated from the contingency table. He showed that 109 the standard x statistic is appropriate for intraclass aggregation despite the dependence among the pairs. For interclass aggregation, an adjustment based on the affected rate and the intraclass aggregation is needed. However, Hunt did not consider to estimate of the intraclass/interclass odds ratio and provide an appropriate standard error for the estimate. In this chapter, we derive the asymptotic variance of the estimators based on such a con tingency table and then compare their efficiency with the MLE from a parametric model. The rest of this chapter is organized as follows. In Section 5.1, we first derive the asymptotic variance of the estimator of the interclass log odds ratio and write it in a form that allows an interpretation of its magnitude as a function of the degree of association among siblings conditional on their parents. The method is derived for both Type-4 families, i.e., families containing two parents and multiple offspring, and Type-3 families, i.e., families containing one parent and multiple offspring. The latter is discussed as a special case. Then we propose estimation of the asymptotic variance of the estimator followed by an efficiency comparison of the proposed estimator with the MLE generated from a parametric model. We find that except for very strong dependence, the efficiency of our estimator is generally above 0.85. In Section 5.2, we consider estimating the intraclass log odds ratio based on all sibling pairs. Since the simple estimator tends to emphasize large families disproportionately, weights are considered to improve the efficiency. We then propose a method to evaluate the asymptotic variance and compare the efficiencies of the unweighted and weighted estimators with the MLE based on a parametric model. Two applications of these methods will be given later in Section 6.1. 5.1 Interclass Odds Ratio Assume a random sample of n families. Let kn (= 1 or 2) be the number of parents and ki2 be the number of offspring in the ith family. Let Xj = (Xij1 ,j\ = 1,..., kn) be the vector of indicators for the parents in family i and Yj = (Yj2, j2 = 1,..., k{2) be the vector corresponding to the offspring. Within a family, the parents and the offspring are considered as two classes. We assume that the individuals belonging to the same class are exchangeable. We also assume closure under margins. That is, there is a probability distribution, p(x, y; k\, k2) closed under margins, such that Pr(Xi = x,Yi =y) =Pr(Xi = x*,Yi = y*) = p(x, y; kn, ki2) 110 where x* and y* are arbitrary permutations of x and y, respectively. This guarantees a common intraclass odds ratio for different families: = P1P4 =p(l,l;l,l)p(0,0;l,l) 7po P2P3 p(l,0;l,l)p(0,l;l,l): 5.1.1 Estimator of jpo We pair each parent with each of his/her offspring and then form a contingency table based on all the pairs as in Table 5.1. We estimate the interclass odds ratio using the cross-product ratio of the counts Oi, 02, O3 and O4. The estimator is denoted by fpo. The log odds ratio is estimated by Ipo = log7>. Since the probability that a pair falls in cell 1,1 = 1,..., 4, is Pi for all the pairs, the expected value of Oi is E(Oi) = NPi, where N is the total number of parent-offspring pairs. Therefore Oi/N is an unbiased estimate of Pj. As a result, rpo is asymptotically unbiased. 5.1.2 Asymptotic Variance of 7po General Since %0 is a function of O = (Oi, O2,03,04)', we first obtain Var(O), then use the Delta method to obtain the asymptotic variance of 7po. Clearly the distribution of O is no longer multinomial since the pairs contributed by the same family are not independent due to the intraclass dependence and repeated use of the same individual in multiple pairs. Let Oi = {On, I = l)---)4), be the counts in the contingency table contributed by the ith. family. The number of parent-offspring pairs formed by the zth family is Ni = knfa. Let if) be the indicator that the jth pair contributed by the ith. family falls in cell Then On = Ylf=i ifj • The variance of Qu is Var(Ofl) = ^Var^J + ^Cov^.lg)) j=i j'ft = NiPiil-Pti + biiP^ -P2), where hi = Ni(Ni — 1) and P^ is the probability that pairs j and j' both fall in cell /. The 111 covariance between On and Ojm, / ^ m, is cov(ou,oim) = cov(x;/«,x;/(;)) j=i j'=i = EEcov(ig>4?)) j=i j'=i = ECov^^ifV^Cov^,^) = -NiPtPn + biiP^-PtPrn), where PJ£ is the probability that pairs j and / fall in cells I and m respectively. Since the two pairs are symmetric, PJ£ = PJ^. Next we derive P;vm, l,m = 1,... ,4. If we randomly choose two distinct pairs from all the possible pairs within a family, there are three types of combinations: type I: two pairs containing the same parent and different offspring. This occurs with probability (j) _ kiiki2(kj2 - 1) _ ki2 - 1 0,1 ~ Ni(Nt - 1) ~ Nt - 15 type II: two pairs containing the same offspring and different parents. This occurs with probability (i) _ ki2kii(kn - 1) _ kn - 1 hl ~ Ni(Ni - 1) ~ Ni - 1 0 kn = 1, 1/(2^-1) ka = 2; type III: two pairs containing no common individuals. This occurs with probability (*) 1 («) (*) «/// = 1-«/-«//• For families with one parent, = 1, a^] = a^Jj = 0; for families with two parents, df* = afjj = (ka - l)/(2ki2 - 1), afj = l/(2ki2 - 1). Let Pim\d be the probability that two distinct pairs fall in cells / and m given that the pair set is of type d, d = I, II or III. Then p(«) _ V/iW P , 112 The variance and covariance of O can be easily obtained based on the above results for Oj. Since Oi = Yli On and families are independent, we have Var(0<) = iV^l-PO + EwP-i?) i=\ Cov(0,,Om) = -NPtPm + Y,1>i(Pi%-PiPm), for/^m. i=l As %o = log PlPj PiPs" (5.2) (5.3) (5.4) where Pi = Oi/N, by the Delta method, the asymptotic variance of 7 is given by: since Cov(P) = Cov(0)/AT2. As 9JP1= (L _J_ __L J_V dp' IPI' P2' p3'p4y ' algebraic simplification leads to the following result: Result 5.1 The asymptotic variance of the interclass log odds ratio estimator given in (5.4) is 1_ A JL_ • J_v-, - N2^p+ N2l^bi 1=1 1 i 1 2 1 V- n ' 4 p(i) E tJL_ P2 , .1=1 ri J where a2 = E^Pf \ Pd = EMf/N and 9 pW 9 p' _l_ ZJ14 + zi23 •(0 9 pW 9 pW •"12 ZJ13 9 p(») 9 p(0 \ zi24 _ ZJ34 1 P1P4" P2P3 PiP2 P1P3 P2P4 P3P4; (5.5) Vd = d = I,II and III. ll\d £ p2 .1=1 ^ 2P^ 2P23|d 2P12|d 2P 13|d 2P 24|d 2P 34|<2 PlP4 P2P3 PiP2 PiP3 P2P4 P3P4 (5.6) In the above result, the term OQ/N is the asymptotic variance given by N independent pairs. Next, we show that there is an easy interpretation of vi and VJJ. When d = I, the same parent is in both pairs. Therefore, Pim\i = 0 when I = 1,2 and m = 3,4 or when I = 3,4 and m = 1,2. Hence = E Ht|J 0^12|/ 9£34|7 p2 Z Z J=l ^ 'PlP2 P3P4' (5.7) 113 Let cl u\i nv\i Pv ' 1 if Z = 2, 2 if / = 1, 3 if I = 4, 4 if/= 3. (5.8) Then uj can be written as Ylt=i Cl/Pl-Let X be the indicator of a parent, and Yi, Y2 be those of two offspring. We write py.\x = Pr(Y; = yi\X = x) and pyiy2\x = Pr(Y1 = yu Y2 = y2\X = x), for x,yi = 0 or 1. Then rt2|j P2 Pr(x = i,yi = i,y2 = i) Pn|i Pr(X = l,y2 = l) Pill Pr(x = i, yx = i, y2 = o) _ pioii _ PI|I - Pn|i Pr(X = l,y2 = 0) Po|i 1 -Pi|i It follows that ci = Pii|i-Pi|i Pi|i(l -Pi|i)' which is the conditional correlation coefficient of Yi and Y2 given X = 1. It can be shown that ci = c2. By symmetry, c3 = c± is the conditional correlation coefficient of Yi and Y2 given X = 0. This indicates that the magnitude of vi depends on the degree of conditional association between the siblings given the status of one of the parents. In general < 0 if the siblings are conditionally negatively dependent, vl ^ = 0 if the siblings are conditionally independent, > 0 if the siblings are conditionally positively dependent. Since |c/| < 1, |u/| < a\. Moreover, v\ can be simplified as Vl=ci (pi+£)+C4(i+^)-Likewise, VJJ depends on the degree of conditional correlation between the two parents given the value of one of their offspring and can be rewritten as 114 where c[ = Pn\n/Pi - Pi3\n/P3 and c4 = PU\II/PA ~ P^u/Pi-Unfortunately, there is no such intuitive interpretation for vni-For a special case of no interclass or intraclass dependence, vj = 0 and VJJ = 0 since the conditional correlations, ci, C4, c[ and c4, are equal to 0. Under independence, Pim\d = PiPm, and therefore vm = 0 as well. This leads to a2 = O-Q/N. Type-3 Family If the sample contains only Type-3 families, that is kn we have the following result: Result 5.2 When kn = 1 for all i, the asymptotic variance of a2 = {al + BIVI)/N, with N = £?=i ki2 and Bi = £^1 M*i2 " l)/N. Particularly, when the family size is constant, i.e., ki2 = k, a2 = (CTQ + {k — \)vj)/N. When family size varies, Bj can be written in terms of the sample mean (M) and variance {V = ^2{ki2 — M)2 j'n) of k{2: Bi = V/M + (M — 1). In general, the variance of 7po increases when either c\ or C4 becomes larger. Now we consider two cases: (1) the offspring are independent conditional on the status of the parent, and (2) the offspring are perfectly dependent conditional on the status of the parent. In the first case, c\ = C4 = 0, and vj = 0. Hence a2 = OQ/N. Therefore, when conditional independence occurs, the resulting asymptotic variance is the same as having N independent pairs. In the second case, ci = C4 = 1 and a2 = {Bj + \)o-1/N since v\ = a\ in this case. So when perfect dependence occurs, if the family size is constant, a2 = kal/N = o\jn. As would be expected in this case, it is the variance achieved considering only one offspring from each family. If the family size is not a constant, Bi + l 1 N > n' The estimate is less efficient than considering only one offspring in each family. = 1 for all i, afj = afn = 0. Then 115 5.1.3 Methods to Estimate the Asymptotic Variance of 7po in (5.5) First consider data with Type-4 families. To calculate the asymptotic variances of %0, we need to know Pim\d, which can be derived from the joint probabilities of two parents and two offspring. A natural way to estimate these joint probabilities is to use all the possible parent-offspring sets containing two parents and two offsprings. The sample proportion of counts corresponding to one outcome can be used as the estimate of the probability of that outcome. If the dataset is formed by Type-3 families, instead of estimating the Pim\d, we estimate the conditional correlations c\ and C4 in (5.8). In this case, the ANOVA estimator for unbalanced data can be borrowed for our use. To estimate ci, we construct an ANOVA table by all siblings whose parent is affected. Suppose there are n\ families with affected parent and a total number of N\ siblings from these families. Let % = Yj/fcj for i G I\ and Yi = N^1 Ylieh £j wnere I\ is the set of indices of families with affected parent. The sum of squares within and between families are: ieh j SSB = Y/HYi-Y1). The corresponding mean squares are MSE = SSE/(Ni - m) and MSB = SSB/(ni - 1). From the expected sum of squares (Searle et al., 1992), 2 (ni-l)(MSB-MSE) °B= £ie/lMi-Wtfi) is an unbiased estimate of Cov(Yj,-, Yj/), for % 6 I\, and a\ + MSE is an unbiased estimate of Var(Yj) for i G I\. The above results only depend on the moments of the distribution. Then an estimate of c\ is given by Cl a% + MSE' In a similar way, C4 can be estimated based the ANOVA table formed by all siblings whose parent is unaffected. The ANOVA approach can be inefficient when c\ and C4 is close to 1. The combination estimator proposed by Srivastava (1993) may be considered. An alternative approach to estimate the variance of 7po is to use the jackknife method with one family or one block of families removed at a time. 116 5.1.4 Asymptotic Relative Efficiency of 7po Asymptotic Variance of the MLE To compare the efficiency based on a parametric model, we need the asymptotic variance of the ML estimate of 7po. Let p(x, y; k\, k2; 6) be a multivariate binary distribution with hi, k2 being the dimension of the two classes. We assume that it is exchangeable in x and in y, and is closed under margins. Then 7po(0) = logp(l, 1; 1,1; 6) + logp(0,0; 1,1; 6) - logp(l, 0; 1,1; d) - logp(0,1; 1,1; 0) Let 0 be-the MLE and Vg be the asymptotic covariance matrix of n1//2(0 — 0), i.e., the inverse of Fisher's information F. Then the asymptotic variance of n1/2 [7(0) — 7(0)] is dj_ dj oe' eee' If the proportion of families with sizes k\, k2 is (asymptotically) qkl,k2 f°r ki = 1,2, k2 = 1,..., M, then Fisher's information matrix is given by i? V" V* „ ST V dp{yi,y;ki,k2;d) r3p(x,y;fci,fc2;g) / F=Z.Z^i^ QQ QQ< j v(wMte\Q). fc1=lA;2=l x€{0,l}fci yelO,!}^ Results We compare the efficiency with specific computations from the multivariate binary beta-binomial and multivariate probit models for Type-3 families. In our comparisons, we set the number of siblings in a family varying from 1 to 4 with the same frequency for each size. This is reasonable because most families contains no more than 4 offspring. If we change the maximum family size, it will not change the general patterns addressed below. (a) The relative efficiency is typically above 0.85. (b) The relative efficiency decreases as the dependence becomes strong. (c) The relative efficiency tends to decrease as the variance-mean ratio of family size increases. We first show the comparison based on the multivariate binary beta-binomial distribution. Suppose the probability of the presence of the binary trait among parents is TT, and that given the 117 Figure 5.1: ARE of 7po based on a multivariate binary beta-binomial model when ct\ = Po = a,a0 = Pi= P parent's state, Xi = I, I = 0 or 1, (the subscript ji is omitted since there is only one parent per family), the joint conditional distribution of Yj is a multivariate binary beta-binomial distribution with parameters (a/,/?/). The conditional pmf of Y is Pp,v _ ,,iY _ n - B{aI + yi+,pI + k-yi+) Pr(Y - y\Xl - /) - ^—W) , where B( • , • ) is the Beta function and y;+ = YlVj- The interclass odds ratio is (aiPo)/(aoPi). In general, the efficiency depends on TT, and the conditional correlation between offspring, which is given by l/(a/ + Pi + 1). As a simple case, we set a\ = PQ = a and ao = Pi = P so that the conditional correlation among siblings given X{ = 1 is the same as that given X{ = 0, and is equal to l/(o + P +1). In this case, the efficiency is not affected by the value of TT by the symmetry of the conditional distribution. For this reason we set TT = 0.5 for the calculation. The efficiency of 7po is plotted in Figure 5.1. When both a and P are small, i.e. the siblings are strongly correlated, the efficiency is relatively low. But even when the correlation is almost 1, it is still above 0.85. We next show the comparison based on the multivariate Probit model. Let Z;o be the latent variable of the parent and Zij, j = 1,..., ki, be those of the offspring. Let ni = EZJO and fi2 = EZjj, j > 0, and pi, p2 be the parent-offspring and sib-sib correlations between the latent variables. 118 P2 P~2 Figure 5.2: ARE of 7p0 based on the multivariate Probit model. We calculate the efficiency at different combinations of the means and different levels of pi and p2- The results are displayed in Figure 5.2. In order to make the results comparable at different levels of pi, the efficiency is plotted against the conditional sib-sib correlation p2. The efficiencies based on this multivariate Probit model are similar to those based on the multivariate binary beta-binomial model. If we change the distribution of family sizes, the efficiency of the estimator exhibits the same general patterns but with different magnitudes. We find that the efficiency tends to be lower when the variation of the family size is large compared with the mean family size. We illustrate this pattern by a comparison based on the multivariate binary beta-binomial model. The results are shown in Figure 5.3, in which each point corresponds to the efficiency associated with a particular distribution of family size. In all cases, the number of offspring in a family is set to be 1, 2, 3 or 4 but with a randomly generated frequency. The efficiencies are calculated based on a multivariate binary beta-binomial model with a\ = 2, d\ — 1, ao = 1, Po — 2, TT = 0.5. 119 variance-mean ratio of family size Figure 5.3: Efficiency of 7po versus the variance-mean ratio (V/M) of family size 5.2 Intraclass Odds Ratio Consider a random sample of n families, each containing a varying number of siblings. Let ki be the number of siblings in the ith family and Yj = {YijiJ — L • • • > hi) De the indicator vector for the siblings. We assume exchangeability and closure under margins, that is, there is a distributionp(y; k) closed under margins, such that for y of dimension fcj, Pr(Yi = y)=Pr(Yi = y*)=p(y;fci) for y* being any arbitrary permutation of y. From the closure under margins, the odds ratio is = P1P4 =p(l,l;2)p(0,0;2) r" P2P3 p(l,0;2)p(0,l;2) Due to the exchangeability, P2 = P3 = (1 — Pi — PA)/2. Therefore, the intraclass odds ratio is completely determined by Pi and P4: 4PiP4 ^2' (I-P1-P4)2 5.2.1 The Estimator of iss For family i, there are (2') possible sib-sib pairs. The indicators of a pair fall into three categories: both equal to 1, one equal to 1 and the other equal to 0, and both equal to 0. To form the 2x2 120 contingency table of pairs from all the families, let 0\ be the count of pairs both equal to 1 and O4 be the count of those both equal to 0. As 02 and O3 are not distinguished, we set each of them equal to half of the counts for heterogeneous pairs; that is, O2 = O3 = (N — 0\ — 04)/2, given the total number of pairs, N. The estimate of the intraclass odds ratio from such a contingency table is _ 0:04 _ 40x04 rss_ 0203 " (iV-Oi-04)2- ^'yj The corresponding log odds ratio estimate is %s = log Oi - 2 log(AT - Ox - 04) + log 04 + log 4. (5.10) 5.2.2 Asymptotic Variance of %s With similar derivations as in the preceding section, the variance and covariance of (Oi,04) is given by Var(0;) = NP\{l-Pl) + Ylbi{P®-P?), for 1 = 1,4, (5.11) i=l Cov(Oi,04) = -ArPiP4 + X>(Pi4 -PiP*), (5-12) i=i with N = £?=i (fe2<), k = (ki)((ki) - 1) and Plm =(1~ ai)Plm\A + OiPlm\3, h m = 1, 4, where ai is the probability that two random pairs from the (2') possible pairs contain only three distinct individuals and P^|3 is the probability that two such pairs fall in the cell / and m respec tively, while Pjj|4 is the same probability for two random pairs containing four individuals. It can be shown that _ ki(ki-l)(kj-2)/2 _ 4 ft)(ft)-l)/2 + * (Hunt et al., 1988). As P14|3 = 0, we obtain pfj = (1 - cn)Pu\4. 121 With substitution for P^, (5.11) and (5.12) become n Var(0,) = NPtil-Pti + YihaiPu^ + biil-aAPu^-biP?), for/= 1,4, (5.13) Cov(Ou04) = -7YPiP4 + z^(6i(l-ai)P14|4-6iP1P4). (5.14) Let i=l BI = Y, w#> = Y w - and B* = YI w (5-15) v0=,Pi 0 - r2 PiP4, vi=h> ° o p4; \PIP4 p42; v o p44|3/ v2 = | Pn|4 Pl414 ], v3 = - ( p* Plf -fl4|4 ^44|4/ \P1P4 P4 The covariance matrix of (Oi,04), denoted by V, can be written as: 3 Let V = NVo + Nj^BdVd-d=l , = 1 (diss djssy 9 N \ dPx' dP4 ' ' where diss 1 1 djss _ 1 1 dP, -P~i+P~2 m-J\ + p-2 The asymptotic variance of jss given by the Delta method is: 3 a2 = Nd>'VQ(f> + NY Bd<i>'Vd<t>. d=l 122 If we write Pi P2J ^\P2 P, (5.16) v3 = iVVv30 = then we have the following result. Result 5.3 The asymptotic variance of the intraclass log odds ratio estimator given in (5.10) is 1 1 3 d=l with Bd defined in (5.15) and Vd defined in (5.16). If each family contains the same number of siblings, say fc, then N = nQ) and Sl=2(t_2), B2=<"-2y-3), B3=<t+iy-2>, (Ms, from which it follows that CT2 = 2 • k — 2 erg + —— (4ui + (fc - 3)u2 + {k + l)v3) nfc(A; - 1) L 2 For the special case where the siblings are mutually independent, let p = Pr(Yjj = 1) and q = 1-p. For this case, Pi = p2,P4 = q2,P2 = pq, p14|4 = p2q2 and PU\L = pL, P44|L = qL, for L = 3 or 4. Consequently, Pff = (l-a^ + a^3 Pff = (l-a«)pV P4(? = (1 - a^4 + a^3. Substituting these probabilities into (5.17), we find Yld=i Bd,Vd = 0. Hence a2 = OQ/N, which is the same variance when the sample contains N independent pairs. Surprisingly, although the pairs are not independent due to the inclusion of the same individuals in multiple pairs, the asymptotic variance of 7SS is not inflated. Generally speaking, when siblings are positively dependent, cr2 > a\. When there are large families in the sample, a2 may be substantially larger than 02,. 123 5.2.3 Weighted Estimator of %s In the contingency table, a large family contributes many more pairs than a small family. For example, the contribution from a family of six is equivalent to that from 5 families of three or 15 families of two. Since the number of pairs is disproportional to the family size, the large families might be over-emphasized. With this concern, we consider assigning weights to each family based on the size of the family. Let rik be the number of families of size k and C)(fc) = (0[k\ O^f1) be the counts from these families. Consider a weight u>k for O^. Then the weighted total counts are Ow = Ylk WfcO^- Let lw,ss be the weighted estimate of the log odds ratio calculated from Ow, that is %>ss = l0S(N>Zwf-Pw4r ^ where N' = VJfc Wknk (2) • Next we derive its asymptotic variance. Let V^) = Var(0(fc))/nfc(2)- Then the covariance matrix of Ow is k Let vw = £«£nfcQ)v<*>. - —(— — — —\ *w ~ JV'^Pi + P2'P4 + P2j' If we write rk = (N')2(j)'wV^4>wi *ben the asymptotic variance of iWtSs is °~l = <P'wYw<t>w = ^7)2 E wknk (2) Tfcl It can be shown that rk = o\ + Y?d=i ^f^Vd-, where the B^ are given by (5.18). In particular, T2 = CTQ since B^ = 0 for all d. Without loss of generality, let w2 = 1, then °l = (777)2 + E wlnk Q) T* with N' = n2 + Efc>3 wknk (2) • To find the optimal weights, we solve the following equations: WkTk ~ N'al = 0, for A; > 3, obtained by differentiating cr2 with respect to the u>kS and then setting the derivatives equal to 0. 124 Result 5.4 The optimal weights which minimize the asymptotic variance ofjw,ss (5.19) are Wk = r2/Tk, for k>2. The corresponding asymptotic variance is tr^ = a2/N'. The crf^ is derived as the following: v ' fc>3 * T2 fn2 , 1 v> fc>3 N' N' by the definition of N'. Since the weight wk is a function of T2 and rk, it depends on Pi, P4 and Pim\Li f°r m = 1, 4 and L = 3, 4. In practice, these probabilities are unknown and need to be estimated. Therefore jw has to be obtained iteratively. Instead of using the optimal weights, a simple approach to down-weight the large families is to set the weights such that wkNk, where is the number of pairs contributed by a family of size k, is proportional to k, that is, wk = 2/(k — 1). Result 5.5 When set wk = 2/(k — 1), the asymptotic variance ofjw is <impu = 7j\J7)2 E YZ\ l2°l + (* - 2) ^ + (k~ ^ + '(* + ^3)] • (5-20) with N' = Ylk nkk-5.2.4 Methods to Estimate the Asymptotic Variance of 7SS To estimate the asymptotic variance of we need to estimate and Py|4- A simple way to obtain the estimates is to use the sample proportions based on all possible sets of four siblings to estimate the P^|4 and use all the possible sets of three siblings to estimate the Pij\z- However, if the dataset has limited number of families containing three or more siblings, this method will not be able to provide reliable estimates of the probabilities. In this case, the jackknife method can be used to estimate the variance of %s. 125 5.2.5 Efficiency Comparison In this subsection, we investigate the efficiency of the weighted and unweighted estimators by comparing them with the MLE based on a parametric model. Asymptotic Variance of the MLE Consider a parametric exchangeable model p(y; k, 9) which is closed under margins and defined for all k > 2. This means that the bivariate margin of Yj, Yj is p(yi, yj-, 2,9) for all i ^ j. The log odds ratio based on the parametric model is 7(0) = logp(l, 1; 2,9) + logp(0,0; 2,9) - 2 logp(l, 0; 2,9). Let 9 be the MLE and Vg be the asymptotic covariance matrix of n1/2(9 — 9). Then the asymptotic variance of nll2[y(9) — 7(0)] is dl v dj 89' 989' The matrix V0 is the inverse of Fisher's information F. If the proportion of families with size k is (asymptotically) qk for k = 2,..., M, then F-Z^fc 2^ ^ ^ / p(yk,k,9). *=2 y^elo,!}* Results We compare the efficiency with specific computations from the multivariate binary beta-binomial and multivariate probit models. In our comparisons, we set the number of siblings in a family varying from 2 to 5 with the same frequency for each size. The conclusions are the following. (a) The relative efficiency with the optimal weights is close to 1 (> 0.95). (b) The relative efficiency of the unweighted estimator decreases when the odds ratio increases and is not much affected by p; the relative efficiency can get below 0.6. (c) The relative efficiency for the estimator with simple weights is lower when the odds ratio is low, but it is typically above 0.8. For a multivariate binary beta-binomial model with parameters (a, 3). In this model the affected rate, p, is a/(a + 6) and the intraclass odds ratio is (a + 1)(B+ l)/(aB). 126 For a multivariate Probit model, let Zij be the latent variable corresponding to Yij. Let p, be the mean of Zjj, and p = Corr(Zjj, Zij>) for j ^ j'. Then, the affected rate is p = ${p). In our comparisons, we vary the degree of the intraclass dependence at four different affected rates (p): 0.05, 0.20, 0.35 and 0.50. When p is above 0.5, the results are same as that for 1 — p. For each case, we calculate the efficiency under three situations: unweighted, weighted by optimal weights and weighted according to family size (referred to as simple weights). In Figures 5.4 and 5.5, we plot the results for the multivariate binary beta-binomial model and the multivariate Probit model, respectively. The plots show that the unweighted estimator (solid line) loses efficiency when the dependence increases, whereas its efficiency does not seem much affected by p. The optimal weights (dotted line) greatly enhance the efficiency, particularly when the dependence is strong. When p — 0.05, the efficiency of the optimally weighted estimator is relatively low. In other cases, it is always above 0.95. Using the simple weights is worse than no weights when the odds ratio is low, however, the performance of this estimator improves when the dependence increases and is almost as good as the optimally weighted estimator for strong dependence. The plots also show that its efficiency climbs faster when p is closer to 0.5. 5 10 15 20 intraclass odds ratio 25 5 10 15 20 intraclass odds ratio 25 Figure 5.4: Efficiencies of jss estimators (multivariate binary beta-binomial model). Solid line: no weights, dotted line: optimal weights, dashed line: simple weights. 127 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 P P Figure 5.5: Efficiencies of 7SS estimators (multivariate Probit model). Solid line: no weights, dotted line: optimal weights, dashed line: simple weights. Next, we investigate how the distribution of family sizes affects the efficiency of the three estimators based on the multivariate binary beta-binomial model. In this case, we fix p at 0.5 and the intraclass odds ratio at a low level (1.44) and a high level (9). We only consider two different family sizes: 2 and 6. In Figure 5.6 the efficiencies are plotted against n2/n, the proportion of families of size 2. At both odds ratio levels, the unweighted estimator is least efficient when n2/n = 15/16, that is, when each type of family contributes equal number of pairs to the contingency table. The efficiency of the estimator with optimal weights increases monotonically with the proportion n2/n. When the odds ratio is small, the estimator with simple weights is least efficient when n2/n = 6/7, that is, each type of family contributes an equal number of individuals to the sample, whereas the efficiency of this estimator is almost coincident with that of the optimal weights when the odds ratio is high. Moreover, analogous to the interclass odds ratio, the efficiency of both the estimators, unweighted or weighted by family size, decreases almost linearly with the variance-mean ratio of the number of pairs contributed by each family. Lastly we give some considerations on using weights. When the dependence among siblings 128 0) 0.0 0.2 0.4 0.6 0.8 1.0 proportion of families with size 2 'g 0.0 0.2 0.4 0.6 0.8 1.0 proportion of families with size 2 Figure 5.6: Efficiency comparison of jss estimators with varied proportion of families with size 2 and 6. Solid line: no weights, dotted line with "+": optimal weights, dashed line: simple weights. In the figure on the right hand side, the dotted line with "+" and the dashed line are coincident. is weak, the loss of efficiency is modest if the unweighted estimator is used and, therefore, there is no compelling reason to use any weights. On the contrary, when the dependence is very strong (odds ratio > 5 when the affected rate p is not too small) and there is a great deal of variation in family size, the simple weights are recommended since they work almost as well as the optimal weights and are straightforward to use. 5.3 Discussion In this chapter, we proposed a simple method to estimate common odds ratios across families with mixed sizes. The method is directly extended from the odds ratio estimator based on a standard contingency table and is easy to carry out. The method can also be used in many other studies. For example, the interclass odds ratio estimator is useful for a family based case-control study. The methods are not limited to family studies either. They can be applied to clustered binary data. 129 Chapter 6 Applications to NF1 and NF2 Data In this chapter, we apply the models and estimation methods studied in the previous chapters to some measurements on NF1 and NF2 patients. On the one hand, these measurements have a remarkable variation among patients. On the other hand, they exhibit a certain degree of similarity among relatives. In this chapter, we analyze the strength of association between different types of relatives. The measurements are treated as response variables in our analyses. These variables belong to different data types, and therefore require different models and methods of analysis. For some variables, adjustment for covariates, such as age and mutation type, is required. Among the four response variables analyzed in this chapter, the first two are from NF1 patients and the data are from two large NF1 databases: the National Neurofibromatosis Foun dation International Database and the Neurofibromatosis Institute Database; the second two are from NF2 patients and the data are from various sources. The data from these sources can not be considered as random samples from certain population. Our inferences are valid based on the assumption that these data are representative of the NF1/2 population. The first variable is studied using Type-2 and Type-3 families. The other three variables are studied using Type-I pedigrees. The first variable is the presence of intertriginous freckling (freckling in intertriginous area, such as armpits, where skin comes into contact with itself) in NF1 patients. We estimated the interclass and intraclass odds ratio using the method presented in Chapter 5. The second variable is the presence of subcutaneous neurofibromas (tumors or growths located along a nerve or nervous tissue under the skin) in NF1 patients. A multivariate probit model and a two-component series model are compared in this example. The third variable is 130 age at first symptom of NF2, where right censoring is present. We consider a MVN model for this variable. The CLl, CL2 methods are compared with the MLE method in this example. The last variable is number of intracranial meningiomas (a type of brain tumor) in NF2 patients. A multinormal copula model with negative-binomial margin and a Poisson log-normal mixture model are compared in this example. 6.1 Intertriginous Freckling in NF1 Patients In this section we use the method presented in Chapter 5 to evaluate the parent-offspring (interclass) and sib-sib (intraclass) odds ratios in intertriginous freckling. 6.1.1 Interclass Odds Ratio There are 261 families containing one affected parent and his/her affected offspring. Among these families there are 187 with one child, 52 with two children, 22 with three to five. In total there are 364 parent-offspring pairs. The prevalence of intertriginous freckling among the parents is 0.88 and 0.80 among the offspring. The estimates of Pi to P4;from these pairs are A = 0.714 (both affected), P2 = 0.165 (parent affected, offspring unaffected), P3 = 0.091 (parent unaffected, offspring affected), P4 = 0.030 (both unaffected). To estimate the standard error of the log odds ratio (5.17), we need to estimate the condi tional correlations among siblings given the state of their parent, ci and c4 defined in (5.8). The estimation method is presented in Section 5.1.3. c\ = 0.004, obtained from an ANOVA table of all sibling groups with affected parent and c4 = 0.200, obtained from all sibling groups with unaffected parent. The jackknife method, in which one family was removed at a time, is also used to estimate the standard error of the log odds ratio. The estimates are given in Table 6.1. The naive standard error in the table is calculated ignoring the dependence among the pairs. The other two standard errors are not very different from the naive standard error, since the conditional associations among siblings are mild. The estimate of the standard error based on (5.5) is slightly larger than the 131 odds ratio log odds ratio s.e.* naive s.e.* Jackknife s.e.* 1.51 0.368 0.402 0.377 0.387 * standard error Table 6.1: Interclass odds ratio in intertriginous freckling jackknife estimate. 6.1.2 Intraclass Odds Ratio There are 193 families with at least two siblings. Among them there are 151 families with two siblings, 30 with three and 11 with four. There is one large family with eight siblings, all affected. In total, there are 335 sibling pairs. To obtain good estimates of Pjj|3 and Pij|4, the probabilities of two pairs falling in respective categories, usually a large number of families with size over two is needed. In this data set, there are not enough families to produce the estimates. However, the exchangeability assumption can help narrow down the region of these values. For k exchangeable binary variables, the joint probabilities are fully determined by Pr(Yi = 1, i = 1, ...j), j = 1, ...k. (George and Bowman, 1995). In fact, P44|3 = 1-3P2+PU|3 P44|4 = 1 - 4P2 + 2PX - 4PU|3 + P11)4 Pl4|4 = Pi - 2Pn|3 + Pll|4-Therefore, we only need to know PU|3 and Pn|4. Given Pi to P4, Pu|3 and Pn|4 are restricted to a relatively small region (Figure 6.1). Since an estimate is not available, we choose the point at the center of the region as the values of Pn|3 and Pn|4 and carry out the standard error calculation at these values. We calculated both the unweighted and optimal weighted estimates and the corresponding standard errors based on (5.17) and (5.20). The results are reported in Table 6.2. For the optimal weighted estimates, the estimated weights are 0.684, 0.526, 0.292 for families of sizes 3, 4 and 8, respectively. After weighting, there is a slight decrease in the estimate of Pi and an increase in the log odds ratio estimate. We also calculate the naive standard error (0.314) and the jackknife standard error (0.357) for the unweighted estimate. The standard error reported in Table 6.2 is larger than the jackknife estimate. This could be due to the choice of Pn|3 and Pn|4. 132 0.50 0.55 0.60 0.65 Figure 6.1: Region of Pn|3 and Pu\n (intraclass odds ratio). equal weights optimal weights PI (both affected) 0.704 0.680 P2 (one affected, one unaffected) 0.110 0.116 P4 (both unaffected) 0.075 0.088 log odds ratio 1.461 1.490 standard error *0.385 *0.359 * The standard errors are calculated at = 0.631, and Py|4 = 0.577. Table 6.2: Intraclass odds ratio of intertriginous freckling The naive standard error is about 10% smaller than the jackknife standard error. 6.2 Presence of Subcutaneous Neurofibromas in NF1 Patients Subcutaneous neurofibromas are a common manifestation among NF1 patients. Significant parent-offspring and sib-sib associations have been observed in a previous study (Szudek et al., 2002). In this section we first apply a multivariate probit model on the dataset used by Szudek. As an alternative, we then fit a two-component series model. The purposes of this section are to (1) demonstrate the CLl and CL2 methods in multivariate probit regression, and (2) compare the two-component model with the probit model as a standard method in binary data analysis. In addition, in the probit analysis, we also compare a model of conditional independence structure with that of a more general dependence structure. The model comparison is based on the Akaike information criterion (AIC) which is defined as —21og(L) + 2m, where L is the likelihood and m is the number of parameters in the model. 133 6.2.1 Summary of Patients There are 867 NF1 patients included in our analysis. These patients are from 371 independent families. Among all the families there are 12 of size one, 264 of size two, 68 with three, 18 with four, 5 with five, 2 with six and 2 with seven. The numbers of affected pairs included are summarized in Table 6.3. Type of relative pair Number of Pairs siblings 353 parent-offspring 264 second degree 57 third degree 23 fourth degree 1 Table 6.3: Number of affected pairs included in the analysis of presence of subcutaneous neurofi bromas in NF1 patients Subcutaneous neurofibromas are known to be increasing in frequency with age, therefore age is included as a covariate in both models. The average age of the patients is 20.5 (range: 0.2 -78.3) at the time of examination. Around 95% of the patients are under 50 years old. 6.2.2 Probit Model We first consider a dependence structure with three parameters: pss (sib-sib correlation), ppo (parent-offspring correlation) and p2+ (correlation between any relative pair higher than second degree). The correlation between a third or fourth degree relative pair is set to be equal to the correlation between a second degree relative pair because there are only 24 pairs of third or fourth degree. The parameters are estimated using the CL1, CL2 and MLE methods. The maximum family size is seven in this dataset, so we are able to compute the MLEs using the second order approximation (Joe 1995) for MVN rectangle probabilities. The results are reported in Table 6.4. The covariate age is incorporated on its natural scale, log scale and log-log scale. The log-log transformation provides the best fit of these three and is used in the final model. The point estimates from all the three methods basically agree , especially the regression parameter estimates. The standard errors of the CL1 and CL2 estimates are close to each other, but around 20% larger than those of the MLEs. 134 CLl CL2 MLE estimate s.e. estimate s.e. estimate s.e. Covariate coefficients: Po -3.109 0.291 -3.121 0.288 -3.199 0.240 age* 2.517 0.254 2.536 0.250 2.606 0.207 Correlations: Pss 0.660 0.099 0.594 0.121 0.652 0.093 Ppo 0.493 0.116 0.443 0.109 0.463 0.096 P2+ 0.710 0.311 0.698 0.322 0.670 0.275 age*=log(log(age+2)) Table 6.4: Parameter estimates of the Probit model for the presence of subcutaneous neurofibromas in NF1 patients. The three dependence parameters are: pss (sib-sib correlation), pp0 (parent-offspring correlation) and p2+ (correlation between any relative pair higher than 2nd degree). Incidentally, the unconstrained estimates from the CL2 method lead to an inappropriate cor relation matrix (not non-negative definite) for a multi-generation family. This shows the drawback of the CL methods if the positive definiteness constraint is not used in the estimating procedure. It is no surprise that the estimates of p2+ have large standard errors because there are only a small number of 2+ relative pairs and half of them are from four families. Since there is not enough information on the dependence between those relatives, we then consider a dependence structure based on the conditional independence (CI) assumption. In this case, the correlation between any relative pair higher than first degree is a function of ppo and pss as shown in Example 2.1. Again the results from the three methods agree. CLl CL2 MLE estimate s.e. estimate s.e. estimate s.e. Covariate coefficients: Po -3.109 0.291 -3.130 0.284 -3.224 0.288 age* 2.517 0.254 2.544 0.247 2.626 0.250 Correlations: Pss 0.676 0.096 0.604 0.118 0.658 0.085 Ppo 0.536 0.111 0.470 0.107 0.489 0.092 age*=log(log(age+2)) Table 6.5: Parameter estimates of the Probit model with CI structure for the presence of subcuta neous neurofibromas in NF1 patients. The negative log-likelihoods of the two models are 425.0 and 426.0, respectively (based on 135 the MLEs). It indicates that the model with the CI structure provides almost as good a fit as the first model. The AICs of the two models are 860.7 and 860.0, respectively. Based on the estimates of pss and ppo and their standard errors, we find that the sib-sib dependence is slightly stronger than the parent-offspring dependence. The estimate of p2+ is high but has a large standard error. Therefore, we do not have sufficient information to draw any conclusion on the dependence between more distant relatives. 6.2.3 Two-Component Series Model Now we consider a two-component series model for the presence of subcutaneous neurofibromas. In this model, the occurrence of subcutaneous neurofibromas is assumed to be caused by the heritable and non-heritable components jointly. The non-heritable component is modelled by a logistic regression. Age is included as a covariate on its natural scale. The multivariate binary beta-binomial model is used to specify distribution of the heritable components of a family unit. Let Yjtu be the heritable component of the parent / and Yj^u the corresponding components of Z's offspring. Let kn be the length of Yi_u. Then, Pr(V7>jj) = p/,for all i and j Yi-u\YItu = 0 ~ beta-binomial(a0,do, ku) Yi-u\Yitu = 1 ~ beta-binomial(ai,/3i,A;«) Meanwhile, Pi ,o + (1 -Pi)n ,o =Pl-Therefore, there are actually four independent parameters for the heritable component. The pa rameterization used in the ML estimation procedure is 9 = (logit pi, log an, log(/3o/o!o)? logai). The transformations improve numerical stability. The ML estimates of the model are reported in Table 6.6. The estimates are tranformed into probabilities and correlations and reported in Table 6.7. To avoid the tedious calculation of the Delta method, the 95% confidence intervals are obtained by resampling 0 from its estimated asymptotic MVN distribution. The estimates of ao and a\ have relatively large standard errors and are sensitive to the starting point. These two parameters are difficult to estimate because they involve the joint dis-136 estimate SE Non-heritable: intercept -3.399 0.353 age 0.232 0.027 Heritable: logit (pi) 0.409 0.128 log(ao) -3.504 4.475 log(A)/ao) 0.922 0.291 log(ai) 0.186 1.459 Table 6.6: The ML estimates of the two-component series model for presence of subcutaneous neurofibromas in NF1 patients estimate 95% confidence interval Univariate probability: Pi 0.601 (0.539, 0.658) Conditional probabilities: Pi{YItij = l\YItU = 0) 0.285 (0.183, 0.414) Pr(y/><i = l\Ym = 1) 0.811 (0.708, 0.884) Parent-offspring correlation: Corr (Y/^Yj,^) 0.526 (0.309, 0.695) Sib-sib correlation: Corr(Y/)i(,Yw) 0.677 (0.340, 0.750) Table 6.7: Probabilities and correlations of the heritable component in the two-component series model for the presence of subcutaneous neurofibromas in NF1 patients. Index I is the index of a parent and j, f are the indices of Vs two children. tribution of three variables: indicators of the parent and his/her two offspring. To obtain a more informative inference, we need more families with at least two offspring. 6.2.4 Comparison of the Probit Model and the Two-Component Series Model We first compare the AICs of the two models. Recall that the AIC of the probit model is 860.0. The AIC of the two-component model is 849.4. This indicates that the two-component series model provides a better overall fit than the probit model. Next, we use the two models to predict the prevalence of subcutaneous nuerofibromas in NF1 patients at different ages and compare the results with that observed in the data. The results are plotted in Figure 6.2. The dashed line is the probit model prediction and the solid line is the 137 two-component model prediction. The circles on the broken line are the proportions of affected patients in different age groups. Patients under 50 years old are grouped in 5-year intervals. There are only 40 patients over 50 years old, so those patients are grouped in 10-year intervals. From the plot we observe a decrease in the estimated prevalence after age 50. This decrease is very likely due to the death of patients with more serious forms of the disease. Subcutaneous neurofibroma itself is a benign complication of NF1, but it has been reported to have an association with other more severe tumors, such as plexiform nuerofabroma (Szudek, 2000). Therefore there could be a bias in older patients. For this reason, we limit our comparison to before age 50. The estimate from the probit model has a major departure from the data around age 20; whereas the estimate based on the two-component model follows the data well. This indicates that the two-component is better in modelling the age covariate while adequately approximating the dependence. CO d a I , a. o o d 0 20 40 60 age Figure 6.2: Predictions of the prevalence of subcutaneous nuerofabromatosis in NF1 patients by the probit model and the two-component model. The above comparison is based on the fit of the models. In terms of implementation, the two-component model is easier to implement than the probit model as it takes much less computing time. -o- empirical — probit two-component 138 Type of relative pair Number of Pairs siblings parent-offspring second degree third degree other 206 152 196 96 280 Table 6.8: Number of affected pairs included in the analysis of age at first symptom of NF2 In this section we apply the CL1 and CL2 methods to a MVN model for survival data. The response is the age at first symptom of NF2. Censoring can occur because some patients were diagnosed by gene testing before any manifestation was present. We assume that the censoring time is independent of the age at first symptom. 6.3.1 Summary of Patients There are 721 individuals from 526 unrelated families. The family sizes ranged from 1 to 23 affected members. There are 378 with one member, 35 with two, 18 with three, 16 with four, 13 with 5-8 and 6 with at least ten. Among the 721 individuals included, 70 were right censored, i.e., asymptomatic. There are 30 families with one censored case, 9 with two, 3 with three, and 2 families, each of size ten, with five and eight censored cases, respectively. The number of affected pairs included in the analysis is summarized in Table 6.8. 6.3.2 Model In previous work (Zhao et al., 2002), we used a simple random effects model to assess the intrafa-milial correlation without distinguishing the correlations between different types of relatives. In the study, we only considered inherited cases. An intrafamilial correlation of 0.51 was observed with a 95% confidence interval (0.35, 0.64). In this section, we further examine the correlations between various types of relative pairs, such as parent-offspring, siblings and more distant relatives. Individuals with both new and inherited mutation are included. The previous analysis indicated that the univariate marginal distribution of the data is close to a normal distribution. Therefore, we use a MVN model in our analysis. 6.3 Age at First Symptom of NF2 139 Mutations have been found throughout the NF2 gene, and different mutations tend to occur in different families (Friedman, 1999). These mutations are of various types, but most can be classified as nonsense, frameshift, splice-site, missense, or large deletions. In general, patients with a truncating mutation (i.e. nonsense or frameshift) are associated with an earlier onset. Therefore we include the indicator of truncating mutation as a covariate. Another covariate, the indicator of new mutation ( 1 = new mutation and 0 = inherited mutation), is included as well. We consider four dependence parameters: the sib-sib correlation (pss), the parent-offspring correlation {ppo), the correlation between second degree relatives (p2), the correlation between relatives more distant than second degree relatives (p3+). 6.3.3 Results We implement the CLl and the CL2 methods by minimizing the negative CL functions using the quasi-Newton algorithm (Nash, 1990). The standard errors are estimated by the jackknife method with one family removed each time. We also compute the ML estimates. The likelihood for censored family members is computed using the second order approximation proposed by Joe (1995) for MVN rectangle probabilities. The results are reported in Table 6.9. CLl CL2 MLE estimate s.e. estimate s.e. estimate s.e. Marginal: Bo 27.776 1.353 27.228 1.395 22.814 1.031 New mutation -3.496 1.415 -2.966 1.494 1.028 1.149 Truncating -7.295 1.159 -7.377 1.165 -5.997 1.245 a2 12.823 0.575 12.751 0.556 12.253 0.368 Correlations: Pss 0.425 0.093 0.454 0.079 0.616 0.051 Ppo 0.192 0.060 0.138 0.086 0.303 0.068 P2 0.170 0.093 0.100 0.085 0.220 0.069 P3+ 0.105 0.115 0.081 0.109 0.115 0.067 Table 6.9: Parameter estimates of the MVN model for age at first symptom of NF2 The point estimates and the standard errors from the CLl and CL2 methods are similar. The ML estimates of the regression coefficients and the two correlations, pss and ppo, are quite different from those CL estimates. This could be due to the normality assumption being violated. 140 We observed a certain degree of right skewness in the sample distribution. The ML method could be more sensitive to the departure from the model assumptions than the CL methods. Another possibility is that there are a few large families which are highly influential. Further investigation is needed to explain the differences. The results indicate that the sib-sib correlation is considerably higher than the correlation between other type of relatives. It also suggests a decreasing pattern when two relatives become more distantly related. In terms of computation, the computing times are 20, 42 and 342 seconds for the CL1, CL2 and ML methods, respectively. Moreover, the ML method is relatively sensitive to the starting point. 6.4 Number of Intracranial Meningiomas in NF2 Patients The variable considered in this section is a count trait: number of intracranial meningiomas in NF2 patients. This variable was analyzed in the previous study (Zhao et al., 2002). Since the information on relative types was not available, we assumed an exchangeable dependence structure within a family and modelled the counts using a gamma mixture of negative binomial distribution. An significant familial correlation was observed. In the current study, the information on relatives has become available and we would like to assess the dependence between different types of relatives. The gamma mixture of negative binomial model is not suitable for this analysis since it can only provide one level of dependence. Therefore, in this section, we consider a multinormal copula model (Section 2.3.2) with negative binomial margin and a Poisson-lognormal mixture model (Section 2.3.1) to analyze the data. 6.4.1 Summary of Patients In the dataset, there are 628 individuals from 431 families with information on meningiomas. Among the families there are 35 of size two, 16 with three, 15 with four, 11 with 5-7, 2 with nine and 2 with ten. The number of affected pairs included is summarized in Table 6.10. Among the 628 patients, there are 350 new mutation cases and 278 inherited cases. The counts of meningiomas are summarized in Table 6.11 by new mutation cases and inherited cases. 141 Type of relative pair Number of Pairs siblings 104 parent-offspring 157 second degree 109 third degree 44 other 67 Table 6.10: Number of affected pairs included in the analysis of number of meningiomas in NF2 patients. There are some patients recorded with meningiomas, where the count is unknown. These individuals are summarized in the last column. The sample mean and variance calculated based on the known counts are 1.75 and 6.12 for new mutation cases and 0.87 and 3.79 for the inherited cases. This indicates overdispersion with respect to the Poisson distribution. Type of Number of Meningioma mutation 0 1 2 3 4 5 6 7 8 9 10 19 20 > 0, but no count New 130 71 48 19 17 12 10 1 5 3 0 6 1 27 Inherited 160 48 19 7 5 5 2 0 1 2 1 1 0 27 Table 6.11: Summary of meningioma counts in NF2 patients 6.4.2 Multinormal Copula Model with Negative Binomial Margin for Number of Meningiomas We first considered a multinormal copula model. Since overdispersion relative to the Poisson distribution is observed, we chose the negative-binomial distribution to model the univariate margin of the count variable Y^. The univariate pmf of Yij is The parameter pij is the mean of Y^ and we specify log/ijj = BQ + B\Xij, where Xij = 1 for a new mutation case and 0 for an inherited case. We also assume different fyj values for the new and inherited mutation cases: 0ij = Q\ if it is a new mutation, otherwise, &ij = 6Q . We consider 3 dependence parameters: the sib-sib correlation (pss), the parent-offspring correlation (ppo), and the correlation between relatives of second or higher degree (p2+)-142 We use the CL1 and CL2 methods to estimate the parameters. The jackknife method is used to estimate the standard errors. Individuals with positive but unknown count are included in the analysis by calculating the probability of > 0. The results are reported in Table 6.12. The CL1 estimates of the marginal parameters are very close to the CL2 estimates. As for the dependence parameters, the standard error of the CL2 estimate for pss is considerably larger than that of the CL1 estimate. This is due to two special influential cases. The first case is a family containing two siblings with counts 5 and 9. The removal of this family results a decrease of 0.102 in the CL2 estimate of pss. The second case is a family containing two siblings with counts 1 and 10. The removal of this family results an increase of 0.156 in the CL2 estimate of pss. Similar changes occur in the CL1 estimate, but with smaller magnitudes (a decrease of 0.04 in the first case and an increase of 0.05 in the second case). The CL2 estimate is more sensitive to unusual pairs from small families since pairs from a small family are weighted more in the CL2 method than in the CL1 method. When there are influential observations in the data, the jackknife standard error can be inflated. To show the effect, we calculate the jackknife standard errors without using the two sets of estimates obtained with the two influential families deleted. The results are reported as s.e.* in Table 6.12. CL1 CL2 estimate s.e. s.e.* estimate s.e. s.e.* Marginal: A) 0.024 0.158 0.144 0.030 0.155 0.143 Pi 0.590 0.175 0.162 0.587 0.171 0.160 Oo 0.454 0.110 0.105 0.442 0.104 0.099 0.829 0.115 0.115 0.827 0.114 0.114 Correlations: Pss 0.481 0.119 0.100 0.408 0.202 0.080 Ppo 0.436 0.081 0.080 0.400 0.092 0.097 P2+ 0.454 0.229 0.228 0.382 0.218 0.217 —log likelihood 911.4 910.7 s.e.*: jackknife standard error with two outliers removed. Table 6.12: Parameter estimates of the multinormal copula model with negative binomial margin for number of meningiomas in NF2 patients 143 6.4.3 Multivariate Poisson-Lognormal Mixture Model for Number of Menin giomas We also use a multivariate Poisson-lognormal mixture model to analyze the same data. In this model, Yij|Ay = Aj, ~ Poisson(Ajj) and logAjj ~ Normally,a?-), where //jj = /3n + 0\Xij with = 1 for a new mutation case and 0 for an inherited case, and tr2- = a\ if it is a new mutation, otherwise, tr2- = a\. We consider the same dependence parameters for (logAji,..., logAjj) as in the multinormal copula model. The CL2 and CL2 estimates and jackknife standard errors are reported in Table 6.13. The results show that the two influential families have a even stronger effect on the estimates of pss in this model. CL1 CL2 estimate s.e. s.e.* estimate s.e. s.e.* Marginal: 00 -0.823 0.207 0.207 -0.867 0.205 0.203 01 0.914 0.219 0.218 0.955 0.215 0.214 1.718 0.329 0.307 1.838 0.356 0.335 o\ 1.083 0.146 0.146 1.085 0.147 0.147 Correlations: Pss 0.677 0.167 0.137 0.584 0.288 0.102 Ppo 0.599 0.123 0.123 0.578 0.137 0.137 P2+ 0.652 0.285 0.285 0.574 0.297 0.296 —log likelihood 911.8 910.9 s.e.*: jackknife standard error with two outliers removed. Table 6.13: Parameter estimates of the multivariate Poisson-lognormal mixture model for number of meningiomas in NF2 patients 6.4.4 Comparison of the Two Models Since both models contain the same number of parameters, we use the negative log likelihood evaluated at the CL estimates as a measure of the overall fit. The negative log likelihood of the copula model is computed using Joe's second order approximation for MVN rectangle probabilities. As for the Poisson-lognormal mixture model, the likelihood of a family is computed using numerical integration if the family size is less than 3; otherwise, the likelihood is computed using the Monte Carlo method. The results are reported at the bottom of Tables 6.12 and 6.13. Since the negative 144 new mutation: 0 1 2 3 4 5 6 7-9 >= 10 Pearson x2 observed 130 71 48 19 17 12 10 9 7 expected (MCNB) 132 66 42 27 18 12 8 12 5 5.4 expected (MPLN) 126 77 45 26 10 7 8 10 8 5.2 inherited mutation: 0 1 2 3,4 5,6 >=7 observed 160 48 19 12 7 5 expected (MCNB) 163 39 19 18 7 5 4.2 expected (MPLN) 162 45 19 14 5 6 1.4 Table 6.14: Expected number of meningiomas based on the multinormal copula model with negative binomial margin (NBMC) and the multivariate Poisson-lognormal mixture (MPLN) model log likelihoods of the two models are very close to each other (910.7 and 910.9 based on the CL2 estimates), there is no indication that one model fits better than the other. To check the marginal fit, we calculate the expected counts based on the CL2 estimates and compare them with the observed counts (Table 6.14). The expected counts for those individuals with positive but unknown counts are calculated conditional on Yij > 0 then subtracted from the total expected counts for all the individuals. The Pearson x2 statistic is also reported in the table. Since the data are correlated, the statistic is not chi-square distributed. However, a small value of the statistic still indicates a reasonable fit. In all the cases, the Pearson x2 statistics are less than the number of the categories. For the new mutation case, the fits of the two models are similar. For the inherited case, the Poisson-lognormal mixture model fits the data better. The estimated means based on the two models are almost the same. For the first model, the estimated mean is 1.85 for the new mutation cases and 1.03 for the inherited mutation cases. For the second model, the estimated means are 1.89 and 1.05, respectively. However, there is a difference in the standard deviations. For the first model, the estimated standard deviation is 2.45 for the new mutation cases and 1.85 for the inherited mutation cases. For the second model, the estimated standard deviations are 2.99 and 2.63, respectively. With the estimated parameters, the Poisson-lognormal mixture distribution has a longer right tail than the negative binomial distribution. For example, in the new mutation case, the fitted negative binomial distribution has a probability of 0.0003 to produce a count over 20 (the largest count in our data set), whereas the fitted Poisson-lognormal mixture distribution has a probability 10 times higher to produce such a count. The estimates of the dependence parameters from both models suggest an exchangeable 145 dependence structure within a family. However, the estimate of p2+ has a large standard error. 6.5 Discussion The examples demonstrate that the CLl and CL2 methods work well in practice. They also revealed some aspects that need to be investigated in future research. For example, it will be useful to gain some understanding of the consequences of departures from model assumptions, the influence of outliers or influential observations on the parameter estimates and the influence of large families on the estimates of the dependence parameters. In Chapter 4, we studied the asymptotic properties of the CL methods. The properties of the CL estimates may not be the same when the sample size is close to the data size in our examples. It would be interesting to further investigate the behavior of the CL estimators when the sample size is not very large. We will discuss these issues further in Chapter 7. 146 Chapter 7 Conclusion and Future Work 7.1 Conclusion The major contributions of our work to familial data analysis are in modelling and estimation approaches for non-normal traits. The CI model proposed in Chapter 2 provides a new approach to construct models for familial data. We gave two examples: the two-component model for binary trait and the CI model for count trait using a distribution of convolution-closed infinitely divisible class. Potentially, many other models can be constructed based on the CI assumption. The ex ample in Section 6.2 demonstrated that the CI model works well with real data and can be easily implemented. We also considered two existing classes of models: random effects models and multi-normal copula models, which are suitable for modelling familial data, but hindered in practice by their computational difficulties. Equipped with the CL estimation approaches discussed in Chap ter 3, these models become feasible. Our investigations in the asymptotic efficiency comparisons showed that the relative asymptotic efficiencies of these two methods are satisfactory except for some extreme cases. Those investigations help us better understand under what situations these methods work well and provide hints on how to improve them. 7.2 Future Work Due to the complexity in the nature of familial data, there are still many variations and extensions that remain to be explored. Listed below are several research problems that look interesting for 147 future exploration. 1. Extend the multinormal copula models to semiparametric margins. There is of special interest in survival models. A Cox-type regression model has been considered to analyze clustered failure time data (Wei et al., 1989; Lee et al., 1992; Spiekerman and Lin, 1998). The dependence of the failure times within a cluster is assumed to be exchangeable, but not specified by a parametric model. The marginal hazard function for the failure time Tij is \ij(t;Xij) = \0(t)eXi>13, where Ao is the common baseline hazard function. /3 is estimated from "quasi-partial likelihood", which takes the form of the partial likelihood of independent observations. The cumulative baseline hazard function is estimated by an Aalen-Breslow type estimator. Spiekerman and Lin (1998) established the asymptotic results for the estimators and developed procedures to approximate the covariance matrix. To combine the Cox-type of regression with the multinormal copula, we allow different dependence structure among the variables. It is natural to use the two-stage method to estimate the marginal and dependence parameters. We can adapt Spiekerman and Lin's procedure to estimate the regression parameters and the cumulative base-line hazard, then estimate the dependence parameters based on BCLs. King et al. (1996) and Bandeen-Roche and Liang (1996) used a similar procedure for a gamma frailty model with unspecified common base-line hazard, but they only provided asymptotic results for their first stage. The remaining work is to establish the asymptotic results for the estimating procedure and develop a method to estimate the standard errors of the dependence parameters. Even though we use the existing procedure in the first step, the asymptotic results need to be modified, since they are based on an exchangeable dependence structure. The major challenge is to develop the asymptotic properties for the dependence parameter estimators. To estimate those parameters, we need to plug-in the estimated marginal cdf in the bivariate marginal likelihood function. Unlike a full parametric model, the estimated marginal cdf is a step function. 2. Develop models for multiple responses and longitudinal familial data. In this thesis, we only consider one measurement on each individual. Often, values of more than one trait are recorded on each individual. It is of interest to see how the traits are related to each other. Sometimes patients are followed over time and the value of a trait is recorded longitudinally. There is a need to model the change of the trait over time. In both cases, there are more associations 148 that need to be taken into account: the association of the two measurements on the same family member, the association of two measurements across family members. Moreover, for the first case, the data types of two traits might differ. 3. Investigate other properties of the CL methods. As mentioned in the data examples, in practice we might encounter problems such as departures from model assumptions or outliers. It would be useful to investigate the robustness of these methods compared to the ML method. Intuitively, the CL methods might be more robust to violations of the distribution assumptions. The estimates of the marginal parameters might be closer to the true marginal characteristics since only marginal information is used. On the other hand, the influence of outliers on the CL methods could be stronger. It also important to see how these methods are affected by large families. Another direction is to examine their small sample properties. For many studies, the sample size is around a few hundred families. For the dependence parameters, the convergence is relatively slower than for the marginal parameters. The investigations can be done by conducting simulations with different sample sizes. 4. Further develop the weighted CL1 method. In Chapter 3, we derived the theoretical optimal weights for the CL1 estimating functions. Since the optimal weights depend on unknown parameters, the method needs to be implemented iteratively. In the future, we will develop the asymptotic theory and consider implementation of the method. 149 Appendix A Dependence Structure of Quantitative Traits In this appendix, we introduce the dependence structure of quantitative traits. Analysis of quan titative traits is fundamental to familial data analysis. It will help us to understand the patterns of association in non-normal traits. Moreover, the dependence structure of quantitative traits can be reasonably linked to non-quantitative traits. A good example is the multivariate probit model for binary traits. In such a model, the presence or absence of a binary trait is assumed to be determined by an unobservable quantitative trait which is under the influence of multiple factors. This idea can be used to generate other types of non-normal traits, such as the log-normal Poisson model for count data, or the log-normal frailty model for survival data. These models are discussed in detail in Chapter 2. As mentioned in Section 1.2, the value of a quantitative trait can be affected by a single gene, several genes, a large number of genes with comparable effects, or a gene with a major effect plus a large number of genes with small effects. The trait is normally also affected by environmental factors. How and how much the genetic and environmental factors affect a trait is often reflected in the association pattern within family members. The idea of modelling the dependence structure of quantitative traits is to decompose the total phenotypic variance into different components attributable to various factors. Some variance can be attributed to known and measurable factors — these factors are treated as fixed effects. Genetic factors are normally 150 modelled as random effects since modelling is at the phenotypic level without genotype information at loci and therefore genetic effects are unobservable. The major gene model is an exception, in which information on the major gene is available, but genetic effects additional to the major gene are still unobservable. Environment effects due to factors such as a shared household are unobservable as well. These factors are modelled as random effects. Environmental factors and genetic factors are usually considered additive. In this appendix, different models for quantitative traits will be introduced. Behind each model there is a different mechanism and therefore each model has a different dependence structure. But they share a common feature: the degree of association due to genetic factors decreases when two relatives are more distantly related. This is because of the fact that the more distantly related two individuals are, the less likely they are to share a gene by descent. Section A.l discusses the simplest case of Mendelian models. Section A.2 discusses polygenic models. The major gene models and models for autosomal-dominant diseases are discussed in Sections A.3 and A.4. The appendix ends with some general comments. A.l Mendelian Models Mendelian models involve only one locus or, in a more general case, a finite number of loci. In order to proceed from the simple to the complex, we first consider a trait controlled by a single locus with two alleles A and A'. If the frequencies of A and A' are p and q, then according to the Hardy-Weinberg equilibrium there are three possible genotypes, AA, AA' and A'A' with frequencies p2, 2pq and q2. Suppose the value of the phenotype (Y) is completely determined by the genotype (G) the variance of Y is purely phenotypic. Furthermore, suppose Y(AA) = fi + a, Y(AA') = fj, + d and Y(A'A') — fi — a, where —a<d<a. When d / 0, we say one allele is dominant over the other. The variance of Y can be expressed as a2 — 2pq(a + (q — p)d)2 + (pqad)2. The first term is called the additive genetic variance a2, and the second term is called the dominance genetic variance cr2,. If we do a regression of Y on the number of A alleles, a2 is the variation explained by the regression and a\ is the unexplained variation (Li, 1976). The additive genetic variance is closely related to the concept of heritability {h2), which is defined as the ratio of additive genetic variance to the total variance. It represents the degree of 151 influence of the phenotype value on the next generation. The dominance genetic variance arises from the phenomenon of dominance among alleles. When there is no dominant allele, a\ = 0. To compare these two components, let us look at the ratio of a2, to a2 ra pq 2 l + S(q—p)r\ where r = \d/a\ is the degree of dominance and S is the sign of d. This ratio is mainly affected by a. When a is small and p is not too close to 1 or 0, the contribution of a\ is negligible comparing to a2. (For example, when p = q = 0.5 and a = 1, the ratio is less than 0.125.) Based on random mating and Mendel's laws, the phenotypic covariance between any pair of relatives can be derived after some calculations. In Table A.l we list the covariance for several common types of relative pairs. Relationship Covariance Offspring and one parent Offspring and midparent* Identical twins °l + °d Full siblings <rl/2 + a2/4 Half siblings Nephew and uncle °2/4 Grandparent and grandchild First cousins <*/* *midparent is the average of the two parents Table A.l: Covariances between relatives in a simple Mendelian model A full explanation about how to derive the above covariances can be found in standard references, such as Falconer (1989). In general, the covariance between the trait values of two relatives, Yj and Yj, can be written as Cov(Yi,Yj) = 2$a2a + Aa2d, where $ and A are known as the kinship coefficient and the identity coefficient in genetics literature (Falconer, 1989). Remarks: 1. The table shows that closer biological relationship generally leads to greater phenotypic sim ilarity. 152 2. Parent-offspring correlation is always less 0.5. This makes sense, since an offspring and parent only share 50% of their genetic materials. 3. The sib-sib correlation is generally higher than the parent-offspring correlation due to the dominance effect. When the locus has a strong effect on the trait (i.e., a is large) and one allele is obviously dominant over the other, the sib-sib correlation could be much stronger than the parent-offspring correlation. Otherwise, the difference between the sib-sib correlation and parent-offspring correlation is rather small. More often the value of Y is not completely determined by the genotype. Instead, it may be subject to some random variation. In this case, we assume that given the genotype G, Y has mean p + g and variance a2, where g is the loci effect with value a, d and —a when G = AA,AA' and A'A' respectively; a2 is also called environmental variance, representing the influence of non-genetic circumstances. Then the total phenotypic variation in the population is a2 + a2. If the environmental contribution is uncorrelated within families, it will not affect the covariance between relatives, but will weaken the correlation since it increases the total variance. When more than one locus is involved, if there is no interaction between the loci, the variance components can be derived by summing over each individual locus. When interactions among the loci are included, the genetic variance will be partitioned into three components: 2 2,2,2 ag = aa + ad + °7> where the new term a2 is associated with the interactions, and thus called the interaction ge netic variance. The covariances between different types of relatives can be found in Kempthorne (1955a,b). A.2 Polygenic Models A polygenic (PG) model involves many loci. It is assumed that these loci act independently and additively, each with relatively small effects. Under this assumption, the vector of trait values Y for a family is multivariate normally distributed by the central limit theorem (Lange, 1978). Moreover, the total genetic contribution to the phenotypic variation is the sum of the contribution 153 of the individual loci. Therefore the genotypic variability still can be partitioned into the additive and dominance components; each is summed over all participating loci. The variation of Y comes from different types of factors: measurable environmental factors which are typically modelled as fixed effects, unobservable factors such as a shared household effect which are modelled as random effects, and the polygenic factors which are modelled as random effects. The model is specified as Y = X/3 + G + S (A.l) where G denotes the polygenic factor and is multivariate normal with mean 0 and covariance matrix 2a2a* + a2dA (A.2) where a2, cr2. are defined as before, and $ and A are matrices of the kinship and identity coefficients. S is the environmental factor which can be further decomposed into components. For instance, if there is a shared household effect that needs to be considered, then S can be specified as normal with mean 0 and covariance matrix ajjH + <r2I, where a\ is the common household variance, H is a matrix of household indicators, cr2 is the random variance, and I is the identity matrix. As a result, the covariance matrix of Y is given by 2a2<& + a2A + a2H + ae2I. (A.3) In many polygenic models, the dominance variance is omitted. As we have mentioned before, when the effect of a locus is small, the additive component outweights the dominance component. This was also justified by Morton (1974) and Amos (1988). On the contrary, when a notable dominance variance component is detected in a polygenic model, it could indicate the existence of gene or genes which have a large effect on the trait compared with the other genes. Such a gene is called a major gene. A.3 Major Gene Models In a major gene model, a quantitative trait is determined by the contribution of a major locus plus a polygenic contribution (Elston and Stewart, 1971; Morton and Maclean, 1974). In a linkage study (a study to identify the major gene using markers), a marker locus is linked with this major gene, 154 and the distribution is a mixture of normals. It is not surprising that computational problems arise when a large number of alleles are observed at the locus (Ott, 1979). Thus a random effects model may be a good approximation for such cases (Andrade et al., 1999). The model is represented as Y = X/3 + G + M + S, (A.4) where G is the polygenic effect with only the additive component, M is the major gene effect whose covariance matrix depends on the recombination fraction and the IBD (identical by descent) value (Amos, 1994), and H is the environmental effect. We will not discuss this model in detail as linkage study is not the focus of this thesis. A.4 Models for Autosomal-Dominant Diseases A genetic disorder is autosomal-dominant if it is caused by a single, abnormal gene on one of the autosomal chromosomes (one of the first 22 non-sex chromosomes) and a mutation at one copy of the allele is sufficient for expression of such a disease. One of the parents will usually have the disease in this mode of inheritance and only one parent must have an abnormal gene in order for the child to inherit the disease. Examples are NF1 and NF2. Since a new mutation is rare, we can assume that in a family the mutant alleles are inherited from a common ancestor and therefore identical. However, wide expressive variability can occur among the affected relatives. Differences in the other genes may be important reason for the variation. When we study a population of autosomal-dominant disease, the trait of interest is normally a clinical feature of the disease, such as the onset time. If the variation in the trait is due to a polygenic effect, then the polygenic models can be used to model those effects. If the variation is due to a polygenic effect plus a modifying gene effect, then the major gene models can be used. Differences in the normal allele at the responsible locus could be another reason for the phenotypic differences. For siblings, the mutant allele is from their affected parent, whereas the normal allele is from the unaffected parent. With no inbreeding, affected siblings have 50% chance to inherit the same normal allele from their unaffected parent. Two affected family members, except for siblings, do not share an identical normal allele by descent, since their unaffected parents are not related, i.e., they do not share any gene by descent. Suppose we model the normal allele effect as a random effect A with distribution iV(0, a\). Let Ai and Aj denote the normal allele effects of two individuals i and j. Hi and j are monozygotic 155 twins, they share an identical normal allele by descent, then Aj = Aj. If i and j are full siblings, there is a 50% chance that Aj = Aj and 50% chance that Aj and Aj are independent. For any other types of relatives, Aj and Aj are independent. Now let p be the chance of two individuals to share an identical normal allele by descent. The covariance due to the normal allele effects is then Cov(Aj,Aj) = E(AjAj) = pE(AjAj|Aj = Aj) + (1 - p)E(AjAj|Aj JL Aj) = P°l-The covariance Cov(Aj,Aj) is a\ between monozygotic twins, o\/2 between full siblings and 0 between any other relative pairs. If we also consider a polygenic effect and environmental effect, the model can be expressed as Y = X/3 + G + A + S (A.5) where A is the vector of the normal allele effects of a family with covariance matrix crJ^T, where T is a matrix with 7^ equal to the probability of individuals i and j sharing an identical normal allele by descent. A.5 Comments Up to now, we have discussed how to specify the dependence structure based on how genetic and environmental factors are involved. In practice, often it is the other way around — the correlation structure of a trait is estimable and the question to be answered is what are the genetic and environmental influences on the trait. This can be assessed by comparing the correlation between different type of relatives. For example, if the parent-offspring correlation is comparable with the sib-sib correlation and the correlation declines by half when the two individuals are one degree less related, this suggests a polygenic effect. To identify a modifying gene effect or a normal allele effect, it is important to compare the correlation among monozygotic twins, parent-offspring, full siblings and half siblings. 156 Appendix B Appendix for Chapter 4 B.l General Results The first part are some general results on matrices from algebra and calculus. The second part are some results on expectations and covariances of quadratic forms of normal random variables. These results were used to derive the information matrices and the proof of Theorem B.l in Appendix B.2. Let A = (aij) be a symmetric and positive definite matrix and A{j be the cofactor corre sponding to dij. Then (Lang, 1997) |A| = ^^ciijAij, for any i 3 A~'= W\(A"] ||,*|A| = .,(A-|A). Let Z and Y be n dimensional random vectors, Z ~ iV(0,I) and Y ~ N(0, S). A and B are n x n symmetric matrices. 1. E(ZiZjZk) = 0 E(YiYjYk)=0, i,j,k = l,.:.,n More generally, E(CiY, Y^Y) = 0, where Ci and C2 are arbitrary matrices. 157 2. E(Z'AZ) = tr(A) E(Y'AY) = tr (AS) Var(Z'AZ) = 2tr (A2) Var(Y'AY) = 2tr ((AS)2) Cov(Z'AZ, Z'BZ,) = 2tr (AB) Cov(Y'AY, Y'BY) = 2tr (ASBS) (Seber, 1977). B.2 Asymptotic Properties of the Covariates We prove the asymptotic properties of matrices in the information matrices under the assumptions in Section 4.1.2. Results (1) and (2) in Theorem B.l are Results 4.3 and 4.4 in the case of one covariate; result (3) is used for the case of more than one covariate. Lemma B.l Let Sn, n = 1,2,..., be a sequence of random variables with finite variance. A P sufficient condition for Sn — E5n —> 0 is that Var(Sn) -> 0. The proof follows from Chebyshev's inequality. Theorem B.l Let Xj = (Xn, ...,Xiki)' be a ki-dimensional random vector, i = l,2,...,n. Sup pose 1 < ki < M and M is bounded. Assume {Xj} has the following properties: (a) EXj = 0, (b) Cov(Xj) = Ifcjxfc; and (c) Xi,-- - ,Xn are mutually independent. Suppose Dj = (d^y) is a symmetric ki x ki matrix with \dijj/\ bounded by T, a finite real number. As n —>• oo, (1) K-1 £\ XPil A 0, where K = Zi h-(2) K~l ^ X'PiXi - K~l Zi tr Dj -A 0. (3) IfYi = (Yi,YjjtJ' is another sequence of random vectors with properties (a) - (c) and Xj and Yy are independent for all i and i', then as n —> oo, K~l Yli X'^iYi —t 0. 158 Proof: The proofs of (l)-(3) all follow from Lemma B.l. (1) Let &i = Djl. Firstly, E(^EX^)=^E(EX0'a^O. Secondly, we need to show that Var(iT"1 ^ X^Djl) -> 0. i as n —>• oo. This holds because Var^-^X^) = K-2^2a^Vax(Xi)ai i i = *"2E4 < T2^2kf/K2. The inequality holds because, for the jth element of aj, \dij\ = \Yj'dijj>\ ^ fcjM. Moreover, J2i kf < nM3 and K2 = (Yi h)2 > n2 since 1 < fcj < M. This implies that Ejfcf <M3 - n ' As n ->• oo, K ->• oo and T2MZ/K -» 0 since T and M are finite. (2) We have E(X_1 Ej X-DjXj) = if-^trDj (see Appendix B.l). Next we will show that Vax(K~l EjX^DjXj) ->• 0 as n -> oo. Since Var(X^DjXj) = 2trD2 (general results of expecta tions) and trD2 = Ytfjj' < k2T2, we have Var^-^X^DjXj) = |^EtrD^ 2T2^1. i i Similarly as in (1), we have K2 - n ' Therefore, as n oo, Var^-1 Yi X-DjXj) ->• 0. (3) Firstly, E(K~l Yi X^DjYj) = K~l Yi dy^EpC^/) = 0. Also, Var(X'iDlYi) ^^Cw^J^.djuJA) id' W = E E dijj'dmiE(XijYijiXiiYii'). 3,3' 1,1' 159 Due to independence, E{XijYij>XaYu>) = 0 as j ^ f oxl^V and EX2 Y-2, = EXj-EY^,. Therefore, Var^DjYj) = 2 £ d^EXg-EYj2, j,j' = 2^ d-tfVaxXijVaxYif j,j' j,j' = Var(X^DjXj). Since we have shown that K~l Zi Var(X£DjXj) —> 0 as n —> 00 in (2), it follows that If-.1 J^VarCXjDiYi) -> 0. i The cases (1) and (2) can be extended to the case that Xj has non-identity correlation . matrix Rj. Then (1) still holds and (2) becomes: (2') K~l ^XpjXj - K~l £jtrDjRj -A 0 Proof: Let Zj = R~1/2X,. then Xj = Rj/2Zj. The left hand side of (1) and (2) can be written as K~l Zi ZjR^Djl and K~l Zi ZjRj^DjRpZj, respectively. Since Zj has correlation matrix Ifcixfci? we can apply (1) and (2) and use the identity trR^DjR^2 = trRjDj (see Appendix B.l). B.3 Maple Code 1. Var((j27L1) and Var(/3cz,i) under exchangeable dependence and with family size fixed at k. readlib(linalg): with(linalg): ###MLE ss:=2*s"4*(l+(k-l)*a~2)/k: rr:=2*(1-a)"2*(l+(k-l)*a)~2/k/(k-1): ###CL1 #M: covariance matrix of the estimating functions a2:=rh*4*(l-kk+kk-2)+rh-3*(4*kk-2*kk~2)+rh~2*(kk-2-kk+2)+2*kk*rh+l: m:=matrix(2,2): m[l,l]:=2*ss*ss*(kk+l)*(l+kk*rh*rh): m[l,2]:=-(kk+l)*kk*(kk-l)*ss*ss*rh~2*(l-rh)~2: m[2,l] :=m[l,2] : m[2,2]:=(kk+1)*kk*(1-rh)~2*a2*ss*ss/2: #D: expectations of derivatives of the estimating functions # with respect to theta d:=matrix(2,2): d[l,l]:=kk+l: 160 d[l,2]:=0: d[2,1]:=(kk+1)*kk*rh*(l-rlT2)/2: d[2,2]:=-(kk+1)*kk*(l+rh*rh)*ss/2: #Godambe information matrix dl:=inverse(d): dl:=map(simplify,dl); tem:=multiply(dl,m): tem:=map(simplify,tem): gm:=multiply(tem,transpose(dl)) : gm:=map(simplify,gm); gm:=map(factor,gm); #var(ss) gmCl.l]; #var(rr) gm[2,2] ; quit; 2. ARE^.2 , AREpt CL1 and AREp2 CL1 for Type-3 family with fixed number of offspring k. #efficiency: 1 parent k offspring readlib(linalg): with(linalg): with(codegen,C): ###constants e:=a/(k*a~2-l-(k-l)*b): c:=l-k*a*e: f : = (a*(l-a*e)+e)/(a*(k-l)*(b-D): f:=simplify(f): d:=l-a*e-(k-l)*b*f: d:=simplify(d): bb:=d+(k-l)*f: bb:=simplify(bb): ###Fisher's information matrix (mle) mle:=matrix(3,3): mle[l,l]:=(k+l)/(2*s~4): mle [1,2]:=k*e/s~2: mle [2,1]:-mle[1,2]: mle[1,3]:=k*(k-l)*f/(2*s~2): mle[3,1]:-mle [1,3]: trl:=k~2*e"2+k*c*bb: tr2:=k*(c*bb+e~2*k): mle [2,2]: = (trl+tr2)II: mle [2,3]:=e*(k-l)*k*bb: mle[3,2]:=mle[2,3]: 161 cl:=(k-l)*f: c2:=d+(k-2)*f: mle[3,3]:=k*(cl~2+(k-l)*c2-2)/2: #asy covariance matrix of the MLE map(simplify, mle); mv:=inverse(mle): mv:=map(simplify,mv): ###asy covariance matrix of the CL1 estimate ###D--C-1}%*%M%*%D"{-1} ##matrix D and D~{-1} teml:=l/(l-a~2): tem2:=-a/(l-a~2): dal:=diff(teml, a): da2:=diff(tem2, a): teml:=l/(l-b~2): tem2:=-b/(l-b~2): dbl:=diff(teml, b) : db2:=diff(tem2, b): Dl:=matrix(3,3): Dl[l,l]:=k+l: Dl[l,2]:=s~2*k*e*2: Dl[l,3]:=s~2*k*(k-l)*f: Dl[2,l]:=2*k*a/(l-a~2): ddal:=diff(dal,a): dda2:=diff(da2,a): tem:=ddal+dda2*a: Dl[2,2] :=2*s~2*k*(-da2-tem): Dl[2,3]:=0: Dl[3,l]:=k*(k-l)*b/(l-b-2): Dl[3,2]:=0: ddbl:=diff(dbl.b): ddb2:=diff(db2,b): tem:=ddbl+ddb2*b: Dl[3,3]:=s"2*k*(k-1)*(-db2-tem): map(simplify, Dl); DV:=inverse(Dl): DV:=map(simplify, DV); ##matrix M M:=matrix(3,3): M[l,l]:=k+l: teml:=k*(dal+a*da2): tem2:=k*(dal+a*da2): M[l,2] :=teml+tem2: 162 Cl:=l: c2:=b: c3:=dbl*(k-l): c4:=db2: M[l,3]:=k*(cl*c3+(k-l)*c2*c4): M[2,l] :=M[1,2] : teml:=k*(dal+da2*a): tem2:=da2+a*dal: tem3:=k*dal*a+da2*(l+(k-l)*b): tem4:=a*da2: tr1:=teml~2+k*tem2*tem3: cl:=tem4+dal: c2:=tem4+dal*b: tr2:=k*tem2*tem3+k*(cl~2+(k-l)*c2~2): M[2,2]:=trl+tr2: trl:=k*(k-l)*tem3*a*(dbl+db2): c3:=(k-l)*(dbl+b*db2): c4:=db2+b*(k-1)*dbl+(k-2)*db2*b: tr2:=k*(cl*c3+(k-i)*c2*c4): M[2,3]:=trl+tr2: M[3,l] :=M[1,3] : M[3,2] :=M[2,3] : M[3,3] :=k*(c3-2+(k-l)*c4"2): MV:=map(simplify, M); ##D-{-l}7„*y.M7„*y„D-{-l} tem:=multiply(DV, MV): tem:=map(simplify, tem): dvt:=transpose(DV): gm:=multiply(tem, dvt): ###AREs sigma:=simplify(mv[1,1]/(gm[l,1] *2*s~4) ) : b:=(k*a"2-l)/(k-l): rhl:=simplify(mv[2,2]/(gm[2,2] *2*s"4) ) ; rh2:=simplify(mv[3,3]/(gm[3,3] *2*s"4) ) ; quit; 163 Bibliography Abel, L., Golmard, J., and Mallet, A. (1993). An autologistic model for the genetic analysis of familial binary data. American Journal of Human Genetics, 53: 894-907. Amos, C.I. (1988). Robust Methods for Detection of Genetic Linkage for Data from Extended Families and Pedigree. PhD thesis, Louisiana State University, New Orleans. Amos, C.L (1994). Robust variance-components approach for assessing genetic linkage in pedigrees. American Journal of Human Genetics, 54: 535-543. Andrade, M., Amos, C, and Thiel, T. (1999). Methods to estimate genetic components of variance for quantitative traits in family studies. Genetic Epidemiology, 17: 64-76. Bandeen-Roche, K. and Liang, K.-Y. (1996). Modelling failure-time associations in data with multiple levels of clustering. Biometrika, 83: 29-39. Bonney, G.E. (1986). Regressive logistic models for familial disease and other binary traits. Bio metrics, 42: 611-625. Cameron, A. and Trivedi, P. (1998). Regression Analysis of Count Data. Cambridge University Press, Cambridge. Chaganty, N.R. (1997). Loss in efficiency due to misspecification of the correlation structure in gee. In Proceedings of the 51st session of the International Statistical Institute, Istanbul, Turkey, 2: 127-128. Connolly, M. and Liang, K.-Y. (1988). Conditional logistic regression models for correlated binary data. Biometrika, 75: 501-506. 164 Curriero, F. and Lele, S. (1999). A composite likelihood approach to semivariogram estimation. Journal of Agricultural, Biological and Environmental Statistics, 4: 9-28. Donner, A. and Koval, J.J. (1980). The estimation of intraclass correlation in the analysis of family data. Biometrics, 36: 19-25. Donner, A., Eliasziw, M. and Shoukri, M. (1998). Review of inference procedures for the interclass correlation coefficient with emphasis on applications to family studies. Genetic Epidemiology, 15: 627-646. Elston, R.C. and Stewart, J. (1971). A general model for the genetic analysis of pedigree data. Human Heredity, 21: 523-542. Falconer, D. (1965). The inheritance of liability to certain diseases, estimated from the incidences among relatives. Annals of Human Genetics, 29: 51-79. Falconer, D. (1967). The inheritance of liability to diseases with variable age of onset, with particular reference of diabetes mellitus. Annal of Human Genetics, 31: 1-20. Falconer, D. (1989). Introduction to Quantitative Genetics. Longman Scientific and Technical, Essex. Fisher, R.A. (1918). The correlation between relatives on the supposition of Mendelian Inheritance. Transactions of the Royal Society of Edinburgh. 52: 399-433. Friedman, J., editor (1999). Neurofibromatosis, Phenotype, Natural History, and Pathogenesis. Johns Hopkins University Press, Baltimore, 3rd edition. George, E.O. and Bowman, D. (1995). Full likelihood procedure for analyzing exchangeable binary data. Biometrics, 51: 512-523. Godambe, V.P. (1991). Estimating Functions. Oxford University Press, New York. Haldane, J. (1932). The Causes of Evolution. Longmans, Green, London. Heagerty, P. and Lele, S. (1998). A composite likelihood approach to binary data in space. Journal of the American Statistical Association, 93: 1099-1111. 165 Hunt, S., Hasstedt, S., and Williams, R. (1988). Testing for familial aggregation of a dichotomous trait. Genetic Epidemiology, 3: 299-312. Joe, H. (1993). Parametric families of multivariate distributions with given margins. Journal of Multivariate Analysis, 46: 262-282. Joe, H. (1995). Approximation to multivariate normal rectangle probabilities based on conditional expectations. Journal of the American Statistical Association, 90: 957-964. Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London. Joreskog, K.G. and Moustaki, I. (2001). Factor analysis of ordinal variables: a comparison of three approaches. Multivariate Behavioral Research, 36: 347-387. Kempthorne, O. (1955a). The correlations between relatives in random mating populations. Cold Spring Harbor Symposia Quantitative Biology, 20: 60-75. Kempthorne, O. (1955b). The theoretical values of correlations between relatives in random mating populations. Genetics, 40: 153-167. King, T.M., Beaty, T.H., and Liang, K.-Y. (1996). Comparison of methods for survival analysis of dependent data. Genetic Epidemiology, 13: 139-158. Kleffe, J. (1993). Parent-offspring and sibling correlation estimation based on MINQUE theory. Biometrika, 80: 393-404. Korsgaard, I. and Andersen, A. (1998). The additive genetic gamma frailty model. Scandinavian Journal of Statistics Theory and Applications, 25: 255-269. Lange, K. (1978). Central limit theorems for pedigrees. Journal of Mathematical Biology, 6: 59-66. Lange, K. (1997). Mathematical and Statistical Methods for Genetic Analysis. Springer, New York. Lee, E., Wei, L.J., and Amato, D.A. (1992). Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In Survival Analysis: State of the Art, Klein, J.P., Liang, K.-Y., and Goel, P.K., editors, Kluwer Academic Dordrecht, 237-247. 166 Lele, S. and Taper, M.L. (2002). A composite likelihood approach to (co)variance components estimation. Journal of Statistical Planning and Inference, 103: 117-135. Li, C.-C. (1976). First Course in Population Genetics. The Boxwood Press. Lindsay, B.G. (1988). Composite likelihood methods. Contemporary Mathematics, 80: 221-239. Manatunga, A.K. and Williamson, J.M. (2001). Assessing familial aggregation with an ordinal response. Communications in Statistics: Theory and Methods, 30: 627-641. Morton, N. (1974). Analysis of family resemblance. I. introduction. American Journal of Human Genetics, 26: 318-330. Morton, N. and Maclean, C. (1974). Analysis of family resemblance. III. complex segregation analysis of quantitative traits. American Journal of Human Genetics, 26: 489-503. Nash, J. (1990). Compact Numerical Methods for Computers: Linear Algebra and Function Min imisation. Hilger, New York. Ott, J. (1979). Maximum likelihood estimation by counting methods under polygenic and mixed models in human pedigrees. American Journal of Human Genetics, 31: 161-175. Parner, E.T. (2001). A composite likelihood approach to multivariate survival data. Scandinavian Journal of Statistics, 28: 295-302. Pickles, A., Crouchley, R., SimonofT, E., Eaves, L., Meyer, J., Rutter, M., Hewitt, J., and Silberg, J. (1994). Survival models for developmental genetic data: age of onset of puberty and antisocial behavior in twins. Genetic Epidemiology, 11: 155-170. Reboussin, B.A. and Liang, K.-Y. (1998). An estimating equations approach for the LISCOMP model. Psychometrika, 63(1): 165-182. Searle, S.R., Casella, G., and McCulloch, CE. (1992). Variance Components. John Wiley & Sons, Inc., New York. Seber, G.A.F. (1977). Linear Regression Analysis. Griffin, London. 167 Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York. Song, P. (2000). Multivariate dispersion models generated from Gaussian copula Scandinavian Journal of Statistics, 27: 305-320. Spiekerman, C.F. and Lin, D.Y. (1998). Marginal regression models for multivariate failure time data. Journal of the American Statistical Association, 93: 1164-1175. Srivastava, M.S. and Ng, F. (1990). Estimation of intraclass correlation in regression models. Gujarat Statistics Review, Professor Khatri Memorial Volume, 229-236. Srivastava, M.S. and Katapa, R. (1986). Comparison of estimators of interclass and intraclass correlations from familial data. The Canadian Journal of Statistics, 14: 29-46. Srivastava, M.S. (1993). Estimation of the intraclass correlation coefficient. Annals of Human Genetics, 57: 159-65. Szudek, J., Birch, P., Riccardi, V.M., Evans, D., and Friedman, J. (2000). Association of clinical features in neurofibromatosis 1 (nfl). Genetic Epidemiology, 19: 429-439. Szudek, J., Joe, H. and Friedman, J. (2002). Analysis of intrafamilial phenotypic variation in neurofibromatosis 1 (nfl). Genetic Epidemiology, 23: 150-164. Taylor, H.M. (1994). An Introduction to Stochastic Modeling. Academic Press, In, San Diego. Tempelman, R.J. (1996). A mixed effects model for overdispersed count data in animal breeding. Biometrics, 52: 265-279. Tempelman, R.J. and Gianola, D. (1999). Genetic analysis of fertility in dairy cattle using negative binomial mixed models. Journal of Dairy Science, 82: 1834-1847. Tosteson, T., Rosner, B., and Redline, S. (1991). Logistic regression for clustered binary data in proband studies with application to familial aggregation of sleep disorders. Biometrics, 47: 1257-1265. 168 Tregouiet, D.-A., Ducimetiere, P., Bocquet, V., Visvikis, S., Soubrier, F., and Tiret, L. (1999). A parametric copula model for analysis of familial binary data. American Journal of Human Genetics, 64: 886-893. Vaupel, J., Manton, K., and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16: 439-454. Wei, L.J., Lin, D.Y., and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. Journal of the American Statistical Association, 84: 1065-1073. Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20: 557-585. Xu, J.J. (1996). Statistical Modelling and Inference for Multivariate and Longitudinal Discrete Response Data. PhD thesis, Department of Statistics, University of British Columbia. Yashin, A. and Iachine, I. (1995). Survival of related individuals: an extension of some fundamental results of heterogeneity analysis. Mathematical Population Studies, 5: 321-340. Yau, K.K.W. and Lee, A.H. (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20: 2907-2920. Zhao, Y., Kumar, R., Baser, M., Evans, D., Wallace, A., Kluwe, L., Mautner, V., Parry, D., Rouleau, G., Joe, H., and Friedman, J. (2002). Intrafamilial correlation of clinical manifestations in neurofibromatosis. Genetic Epidemiology, 23: 245-259. 169 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items