Some Statistical Models for the Multivariate Analysis of Longitudinal Data

by Peter Xue-Kun Song

B.Sc., Jilin University, China, 1985
M.Sc., Southwest Jiaotong University, China, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1996
© Peter X.-K. Song, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics, The University of British Columbia, Vancouver, Canada

ABSTRACT

This thesis develops some statistical models for the multivariate analysis of longitudinal data on the basis of the dispersion models of Jørgensen (1987a, 1996), consisting of three topics: multivariate dispersion models and their application to regression analysis, stationary time series models with non-normal margins, and state space models with Markov latent processes. The goal of the thesis is to develop statistical models which can accommodate features of both trend and dependence for longitudinal data. This thesis focusses mainly on the following three types of longitudinal data: (1) many short time series, (2) a few long stationary time series, and (3) a few long non-stationary time series with time-varying covariates.
A class of multivariate dispersion models is proposed to deal with data of type (1), in a spirit similar to multivariate analysis based on the multivariate normal distribution. Under these multivariate parametric models, population-averaged models (Diggle et al., 1994) are revisited, and approximate inferences for regression parameters are presented, including the generalized estimating equation (GEE) of Liang and Zeger (1986) as a special case. The thesis also presents a class of stationary autoregressive moving-average (ARMA) models with exponential dispersion model margins for data of type (2). The class of ARMA models is defined as a special case of a class of stationary infinite-order moving average processes constructed by means of the thinning operation of Joe (1996a). For analysis of type (3) data, two classes of state space models are proposed, one with stationary latent processes and another with non-stationary latent processes. To estimate regression parameters in both classes of models, we develop an algorithm for solving the so-called Kalman estimating equation (KEE), corresponding to a modified EM algorithm where the E-step is approximated by the Kalman smoother, which estimates the latent process via the best linear unbiased predictor (BLUP).

Two simulation studies are conducted in the thesis based on Poisson-gamma models. One compares the efficiency of the KEE approach and the Monte Carlo EM (MCEM) algorithm. The other examines the utility of model diagnosis for detecting misspecification of stationarity or non-stationarity of the latent process. The thesis contains two data analyses. One data set consists of daily counts of emergency room visits for respiratory diseases to the hospital of Prince George, British Columbia, along with covariates of air pollution variables and meteorological variables.
These data are analyzed through state space models to investigate the relationship between air pollution and respiratory morbidity. The other data set, consisting of the monthly number of poliomyelitis cases in the USA from 1970 to 1983, is analyzed based on the Poisson stationary-gamma model to study whether there is evidence of a decreasing trend in the rate of polio infection in the USA.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 General
  1.2 Literature review
  1.3 Overview of topics
2 Dispersion models
  2.1 Definition
  2.2 Exponential dispersion models
  2.3 Tweedie class
  2.4 Proper dispersion models
  2.5 Thinning operation
  2.6 Multivariate dispersion model densities
3 Multivariate dispersion models
  3.1 Copulas
  3.2 Multivariate normal copulas
  3.3 Interpretation of dependence parameters
  3.4 Multivariate dispersion models
  3.5 Normal scores and their approximations
  3.6 Small-dispersion asymptotics for joint density
  3.7 Approximate covariance
4 Likelihood-based inference for population-averaged models
  4.1 Models
  4.2 Approximate inference based on the Pearson residual
  4.3 Approximate inference based on the deviance residual
  4.4 Approximate inference based on the modified deviance residual
  4.5 Implementation of the estimating procedure
5 Stationary models with exponential dispersion model margins
  5.1 Introduction
  5.2 Moving average processes of infinite order
    5.2.1 Hidden process
    5.2.2 Autocorrelation function
  5.3 Autoregressive moving average processes
  5.4 Autoregressive processes
    5.4.1 EDAR(1) process
    5.4.2 Partial autocorrelation function
  5.5 Truncation error
  5.6 Further questions
6 BLUP, Kalman filter and smoother
  6.1 Some general results for linear prediction
  6.2 General forms of Kalman filter and smoother
7 State space models with stationary latent process
  7.1 Introduction
  7.2 Models
    7.2.1 Definition
    7.2.2 Moment structure
  7.3 Kalman filter and smoother
    7.3.1 Derivation
    7.3.2 Mean square error matrix
  7.4 Estimation
    7.4.1 Estimation of regression parameters
    7.4.2 Estimating the dispersion and correlation parameters
    7.4.3 Calculation of Godambe information matrix
  7.5 Model checking
  7.6 Data analysis
    7.6.1 Poisson stationary-gamma model
    7.6.2 Implementation of estimation
    7.6.3 Numerical results
    7.6.4 Model diagnosis
  7.7 Proof of asymptotic normality
8 State space models with non-stationary latent process
  8.1 Introduction
  8.2 Models
  8.3 Kalman filter and smoother
  8.4 Parameter estimation
    8.4.1 Kalman estimating equation
    8.4.2 Estimation of dispersion parameters
  8.5 Derivation of Godambe information matrix
    8.5.1 Some moment calculations
    8.5.2 Derivatives of Kalman filter and smoother
    8.5.3 Variability matrix
    8.5.4 Sensitivity matrix
  8.6 Residual analysis
9 Simulation studies under Poisson-gamma models
  9.1 Models
  9.2 Comparison between MCEM and KEE
    9.2.1 MCEM algorithm
    9.2.2 Gibbs sampler
    9.2.3 Numerical results
    9.2.4 Conclusion
  9.3 Detecting non-stationarity
    9.3.1 When non-stationarity is present
    9.3.2 When stationarity is present
    9.3.3 Conclusion
10 Data analysis
  10.1 Introduction
  10.2 Data
  10.3 Methodology
    10.3.1 The model
    10.3.2 Parameter estimation
    10.3.3 Residual analysis
  10.4 Data analysis
    10.4.1 Model identification
    10.4.2 Model checking
  10.5 Conclusions
    10.5.1 Final model
    10.5.2 Comparisons with the previous analysis
11 Further research
  11.1 Comparison of approximate inferences
  11.2 Estimation for stationary time-series models
  11.3 Extensions of state space models
Bibliography

List of Tables

2.1 Summary of Tweedie exponential dispersion models
2.2 Summary of contractions
7.1 Monthly number of U.S. cases of poliomyelitis for 1970 to 1983
7.2 Estimates of coefficients and standard errors
9.1 Estimates and standard errors from KEE and MCEM
9.2 Overview of the simulation strategy for model diagnosis
9.3 Estimates of coefficients given by Model I based on the data generated by Model II
9.4 Estimates of coefficients given by Model II based on the data generated by Model I
10.1 Estimates and standard errors for air pollution effects
10.2 Empirical variance-covariance matrix for standardized residuals
10.3 Estimates and standard errors for short-term effects

List of Figures

3.1 Four bivariate normal copulas with different values of γ
3.2 Monte Carlo simulation plots of three dependence measures
3.3 Comparison of three types of residuals from gamma distribution
5.1 Yearly mean sunspot numbers
6.1 The comb structure
7.1 Monthly counts of polio incidences
7.2 Checking correlation structures for both latent and observed processes
7.3 Checking distribution assumptions for the latent and observed processes
7.4 Checking the linearity of the time trend
7.5 Checking the latent structure
8.1 Simulation of 100 observations from a process with four categories
9.1 168 simulated counts from the Poisson-gamma model
9.2 Log-profile-likelihood versus the dispersion parameter given increasingly
9.3 Log-profile-likelihood versus the dispersion parameter given decreasingly
9.4 Patterns for log-likelihood and coefficient estimates
9.5 Comparison between the true latent process and the estimated latent processes
9.6 ACF and PACF functions for the residuals of the latent process
9.7 168 simulated counts from a state space model
9.8 ACF and PACF functions for the residuals from the latent process
10.1 Emergency room visits for four respiratory diseases
10.2 Meteorological and air pollution variables
10.3 First-year residuals against second-year residuals for the four categories
10.4 First-year residuals against second-year residuals for the latent process
10.5 Residuals against log-linear predictions for the four categories
10.6 Residuals against log-linear predictions for the latent process
10.7 Autocorrelation function of residuals from the latent process
10.8 Autocorrelation functions of residuals for the four categories
10.9 Residuals against lag 1 residuals for the four categories
10.10 Residuals against lag 1 residuals for the latent process
10.11 Residuals for the four categories against the lag 1 smoother
10.12 The estimated latent process from the Kalman smoother
10.13 Day-of-the-week effects

Acknowledgements

I thank my supervisor, Dr. Bent Jørgensen, who has been a constant source of inspiration, encouragement and guidance for me throughout my graduate career at UBC. I would like to acknowledge the assistance and valuable comments from the other members of my supervisory committee, Dr. James Zidek and Dr. Martin Puterman. Many thanks go to Dr. Harry Joe for introducing me to his research work, which was both inspiring and stimulating. I acknowledge my debt to Dr. Søren Lundbye-Christensen and Dr. Li Sun, without whom this research would not have been possible. I would also like to thank Dr. Ruben Zamar, Dr. John Petkau, Dr. Jian Liu, Dr. Nancy Heckman and Dr. Paul Gustafson for their sincere support and encouragement during my Ph.D. program at UBC. I thank my wife Ru Yan for her support and patience. My daughter Angela Song came and brought uncountable happiness to me during my stay at UBC. Finally, I would like to thank the Department of Statistics, UBC, as well as my supervisor Dr. Bent Jørgensen for their financial support in the past four years.
Chapter 1

Introduction

1.1 General

Statistical studies of longitudinal data are commonly undertaken in a number of practical areas and for a variety of purposes. In the most general sense, "longitudinal data" means any collection of data indexed by time. Following established terminology in biostatistical applications, we shall usually think of longitudinal data as a collection of time-series, one for each of a number of subjects. The main characteristic of longitudinal data is thus the presence of repeated observations for each subject, the observations being correlated over time. Of primary interest in the analysis of longitudinal data are the mechanisms of change, for instance growth, aging, time profiles or the effects of covariates (explanatory variates). See Diggle et al. (1994) or Lindsey (1993) for a general introduction to longitudinal data analysis.

Longitudinal data sets usually consist of either many short time-series or a few long time-series; each series may be either stationary or non-stationary. Classical time-series analysis (in the Box-Jenkins sense, say) corresponds essentially to the special case of one long stationary time-series. See Box and Jenkins (1970) and Brockwell and Davis (1987). Furthermore, the response at a given time may be either univariate or multivariate, and there may be explanatory variates, either constant or time-varying.

Depending on the particular type of longitudinal data at hand, any of several different types of statistical analysis may be appropriate. Some methods for longitudinal data analysis are adapted from classical time-series analysis, while others, such as those based on generalized linear models, originate in regression analysis. In particular, state-space models (or dynamical models) have developed in parallel in both classical time-series analysis and longitudinal data analysis (cf.
Fahrmeir and Tutz, 1994), whereas the generalized estimating equation method of Liang and Zeger (1986) was mainly inspired by ideas from generalized linear models.

A key distinction in longitudinal data analysis is between long and short time-series. For long time-series, stationarity is crucial, and assuming stationarity for non-stationary data, or vice versa, may invalidate the analysis. For short time-series, this question is less important, but if the number of subjects is large enough one should still be able to determine the nature of the correlation structure accurately. There is no hard and fast rule for determining when a series is considered long or short, and the choice of method may also depend on other aspects of the data at hand.

In this thesis, we concentrate on three particular types of longitudinal data, which together cover a broad range of data-types and applications. The first type consists of many short time-series, where methods from multivariate data analysis may be employed. The second type involves one or a few long stationary time-series. The third consists of one or a few long non-stationary time-series with multivariate responses. In the first and third cases, we assume that the series are accompanied by time-varying covariates. For stationary time-series, we can take advantage of the already extensive arsenal of techniques available from classical time-series analysis. Regression analysis for long non-stationary time-series, on the other hand, presents new unsolved problems, the main one arising from the distinction between correlation, trend and regression effects in the data.

Understanding the nature of both the correlation and the trend of the data is the main concern in modelling longitudinal data. However, the marginal model (Diggle et al., 1994), currently one of the most popular approaches for the analysis of short time series, deals only with the pattern of trend, treating the correlation as a set of nuisance parameters. This model is unsatisfactory in practice when the pattern of correlation is of interest in the analysis. The goal of this thesis is to develop statistical models which can accommodate features of both trend and correlation for longitudinal data.

Corresponding to the three classes of longitudinal data introduced above, this thesis concentrates on three topics: (1) multivariate dispersion models with application to regression analysis under the population-averaged model (Zeger et al., 1988); (2) stationary time series models with exponential dispersion model margins, motivated by Wolfer's monthly mean relative sunspot numbers (Yule, 1927); (3) state space models with either stationary or non-stationary Markov latent processes.

1.2 Literature review

Due to the relative paucity of multivariate non-normal distribution families, the analysis of short time series data is in general difficult. For the case of normal data, the theory for longitudinal studies is quite well developed. Lindsey (1993) gives a good summary of such developments and provides further references. Lindsey (1993) also described recent developments in the analysis of longitudinal categorical data, and later Diggle, Liang and Zeger (1994) presented a comprehensive introduction to the theory of longitudinal data analysis based on generalized linear models. The central method in the book of Diggle et al. is what they call the generalized estimating equation (GEE) approach, which is developed by incorporating a so-called "working" correlation matrix into a Fisher scoring equation obtained with independent observations (see also Liang and Zeger, 1986). The correlation matrix is then treated as a set of nuisance parameters in their quasi-likelihood estimation procedure. Fitzmaurice et al.
(1993) made an important comment on this quasi-likelihood approach, arguing that when the experimental design includes time-varying covariates, it becomes very important to obtain a close approximation to the correlation structure in order to achieve high efficiency. Zhao and Prentice (1990) suggest estimating the regression parameters and correlation parameters simultaneously to improve the efficiency of the GEE approach.

Attempts at likelihood-based estimation for longitudinal data have appeared in the literature in the past several years (cf. Laird, 1990 for the general multivariate model approach to binary outcomes and Joe, 1996b for the copula approach to categorical data). These attempts give rise to the development of multivariate non-normal distribution families that can be applied in the longitudinal data setting. McCullagh and Nelder (1989, Chapter 6) and Lipsitz et al. (1990) developed some general multivariate models for discrete multivariate data for solving certain problems. Some recent models for association in discrete data are available (cf. Carey et al., 1993 and Fitzmaurice et al., 1993). Other recent work includes Leroux and Puterman (1992), Jørgensen (1992), Jørgensen and Lauritzen (1993) and Xu (1996).

The problem of generalizing the classical time series models of Box and Jenkins (1970) (see also Brockwell and Davis, 1987) to analyze non-normal stationary time series data has received considerable attention in the past two decades or so, starting with the papers of Lawrance and Lewis (1977) and Jacobs and Lewis (1977) for the case of exponential margins. Models for other marginal distributions such as the gamma, Poisson and negative binomial followed. A unification and generalization of some of these models was recently proposed by Joe (1996a); further detailed references are available therein.

To analyze time series data with time-varying covariates, state space models (or dynamical models) are very useful.
These models offer a convenient conceptual framework for time series data, where the latent process may be interpreted as an unobserved intensity process, often providing a useful interpretation of the data-generating mechanism. Harrison and Stevens (1976) first introduced state space models in the context of Gaussian time series. See also West and Harrison (1989). The generalization of these models for non-normal data has been developed extensively in the literature. Cox (1981) first proposed the concepts of parameter-driven models and observation-driven models for the regression analysis of time-dependent data. West, Harrison and Migon (1985) proposed the dynamic generalized linear model, where the latent process is specified through only first and second moments. State space models for univariate count data have been proposed by Azzalini (1982), Zeger (1988), Harvey and Fernandes (1989) and Chan and Ledolter (1995). See also Grunwald, Raftery and Guttorp (1993) and Delampady, Yee and Zidek (1993). The development of estimation methods for state space models is mainly based on the Kalman filter techniques of Kalman (1963) (cf. Durbin, 1990, Fahrmeir and Kaufmann, 1991, Fahrmeir and Tutz, 1994 and De Jong, 1990).

1.3 Overview of topics

We begin by reviewing (in Chapter 2) the theory of dispersion models, referring to the forthcoming monograph of Jørgensen (1996), including dispersion models, exponential dispersion models, proper dispersion models and Tweedie models. We then study the thinning operation of Joe (1996a) under exponential dispersion models, the key to constructing the stationary time series models introduced in Chapter 5. As a natural multivariate extension of dispersion models, the multivariate dispersion densities of Jørgensen and Lauritzen (1993) are introduced in Section 2.6, where some special cases are exhibited.

Chapters 3 and 4 form the first part of the thesis and deal with short time series data.
In Chapter 3, we apply the copula approach (cf. Joe, 1996b) to form a family of multivariate dispersion models whose dependence structures are inherited from the multivariate normal distribution in a certain way. We introduce a new dependence measure similar to Kendall's τ and Spearman's ρ, the so-called normal scoring u, to explain the dependence structure of a copula. The normal copula approach is very useful for dealing with non-normal multivariate data, including integer data, directional data and positively skewed data. Compared with existing results, ours provides a more flexible approach to the analysis of non-normal longitudinal data.

Chapter 4 presents a likelihood-based approach to the regression analysis of short time series data under the population-averaged model (Zeger et al., 1988), in which the aggregate response for the population is the focus, and each short series forms a response vector assumed to follow a multivariate dispersion model. A theory of approximate inference is proposed for this regression model through saddlepoint approximations and tail area approximations for the joint likelihood function. As a result, we derive several estimating equations for regression parameters, which include the generalized estimating equations of Liang and Zeger (1986) as a special case. Since our approach is based on a fully specified probability model, this sheds new light on the generalized estimating equation approach, and suggests cases where a likelihood-based approach is preferred.

Chapter 5 focusses on the second topic of the thesis: the construction of stationary time series models with exponential dispersion model margins, motivated by Wolfer's monthly mean relative sunspot data (Yule, 1927). The analysis of stationary non-normal long time series data gives rise to the problem of developing stationary time-series models with non-normal marginal distributions.
We introduce a class of stationary infinite-order moving average processes with margins in the class of infinitely divisible exponential dispersion models. Our main tool for constructing these processes is the thinning operation of Joe (1996a). As a special case we obtain a class of autoregressive moving average processes with exponential dispersion model margins, denoted EDARMA(p, q), which are different from Joe's (1996a) ARMA models. Our construction is based on first defining an infinite-order moving average model, and then defining the EDARMA(p, q) model as a special case. Although this approach is less direct than that of Joe (1996a), it is mathematically more convenient, and allows us to transplant many techniques from linear time-series analysis into our framework. In addition, the class of stationary processes could be used to model the latent processes in the state space models presented in Chapter 7.

The third topic of this thesis is dealt with in Chapters 6 to 10, where we develop state space models for long time series data with time-varying covariates. State space models may have either stationary or non-stationary latent processes. Most previous work has focussed either on models with a stationary latent process, such as Zeger (1988) and Chan and Ledolter (1995), or on linear Gaussian state space models. Our approach is based on either a stationary latent process or a non-stationary latent process, and uses distributions from the Tweedie class, providing a very flexible approach to the analysis of non-normal time series data.

Following Jørgensen, Labouriau and Lundbye-Christensen (1994), we base our estimation procedure on what we call the Kalman estimating equation (KEE), corresponding to the EM algorithm with the E-step approximated by the Kalman smoother. We also study residual analysis for model checking in detail.
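The thinning construction mentioned above can be illustrated in its simplest special case, the Poisson margin, where the operation amounts to ordinary binomial thinning (a minimal simulation sketch only, not the general operator for infinitely divisible exponential dispersion models developed in the thesis):

```python
import numpy as np

def simulate_poisson_ar1(mu, alpha, n, seed=0):
    """Simulate a stationary Poisson AR(1) series via binomial thinning.

    X_t = alpha o X_{t-1} + eps_t, where alpha o X ~ Binomial(X, alpha)
    (each unit of the previous count survives independently with
    probability alpha) and eps_t ~ Poisson((1 - alpha) * mu), so that
    X_t ~ Poisson(mu) marginally and corr(X_t, X_{t+h}) = alpha ** h.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(mu)                                # start in the stationary law
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)         # the thinning step
        x[t] = survivors + rng.poisson((1 - alpha) * mu)  # innovation restores the margin
    return x

x = simulate_poisson_ar1(mu=5.0, alpha=0.6, n=50_000)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(x.mean(), x.var(), lag1)   # mean and variance near 5, lag-1 correlation near 0.6
```

The simulated series has an exactly Poisson margin at every t, which is the property that distinguishes thinning-based autoregressions from Gaussian-style linear recursions applied to counts.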
In Chapter 6, we introduce the BLUP and the Kalman filter and smoother, which are crucial for the analysis of state space models in the subsequent chapters.

Chapter 7 presents a class of state space models for long time series data, assuming that the latent processes are given by the class of stationary AR(1) models of Joe (1996a) with non-normal margins and without time-varying covariates. These stationary latent processes are constructed by the thinning operation of Joe (1996a), which leads to a structure different from the linear structure of classical state space models, so standard results for the Kalman filter and smoother do not apply to our models. Instead we use the results of Chapter 6. We also give a complete proof of the asymptotic normality of the KEE estimates for these models.

Chapter 7 also gives an illustrative application of state space models with a stationary AR(1) gamma process to the analysis of the polio data (Zeger, 1988). The polio data consist of the monthly number of poliomyelitis cases in the USA from 1970 to 1983, previously analyzed by Zeger (1988) and Chan and Ledolter (1995). We re-analyze the data to study whether there is evidence of a decreasing trend in the rate of polio infection in the USA. Our KEE estimates are closer to Chan and Ledolter's MCEM estimates than to Zeger's quasi-likelihood estimates. Both Chan and Ledolter's method and ours lead to a significantly decreasing trend, whereas Zeger's method leads to a non-significant trend. We conduct a detailed model checking diagnosis by means of residual analysis to confirm the adequacy of our model assumptions for the data. Neither Chan and Ledolter nor Zeger present such a model checking procedure in their analyses.
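A generic version of such a residual check can be sketched in a few lines (an illustrative sketch only, not the exact diagnostic procedure of the thesis): the sample autocorrelations of standardized residuals are compared with the approximate ±2/√n whiteness band, so that leftover serial dependence shows up as correlations outside the band.

```python
import numpy as np

def sample_acf(r, max_lag):
    """Sample autocorrelations of a residual series r at lags 1..max_lag."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    denom = np.sum(r * r)
    return np.array([np.sum(r[:-h] * r[h:]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
n = 2000
# Residuals from an adequate model should look like white noise ...
resid_white = rng.normal(size=n)
# ... while leftover MA(1) dependence (coefficient 0.8) should show up at lag 1.
resid_corr = np.convolve(rng.normal(size=n + 1), [1.0, 0.8], mode="valid")

band = 2 / np.sqrt(n)            # approximate 95% whiteness band
acf_white = sample_acf(resid_white, 5)
acf_corr = sample_acf(resid_corr, 5)
print(band)                      # about 0.045
print(acf_corr[0])               # near 0.8 / (1 + 0.8**2) = 0.49, well outside the band
```

The same comparison at several lags, for both the observed-process and latent-process residuals, is the pattern behind the ACF-based plots listed among the figures of Chapters 7 to 10.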
Chapter 8 proposes a class of state space models for multivariate response vectors with non-stationary latent processes, where covariates may enter both the observed and the latent processes. The models open up some new possibilities for detailed investigation of the time-varying structure of long time series data, allowing us to address substantive questions that are not easily dealt with by existing methods. The marginal distributions for the components of the response vector are allowed to be different. Hence our theory provides a very realistic model for many types of multivariate longitudinal data.

Chapter 9 conducts two simulation studies under Poisson-gamma models (Poisson response and gamma latent process): (1) a comparison of the efficiency of the Kalman estimating equation approach and the Monte Carlo EM (MCEM) algorithm; (2) an examination of the utility of residual analysis for detecting stationarity or non-stationarity of the latent process. The simulation experiments allow us to conclude that the estimates from the KEE method are very similar to those from the MCEM approach. We also found that the residual analysis is able to effectively distinguish between the stationary and non-stationary cases.

Chapter 10 presents an application of the state space models with non-stationary latent process proposed in Chapter 8 to the analysis of data from Prince George, British Columbia, previously analyzed by Knight et al. (1993). The data consist of daily counts of emergency room visits for each of four categories of respiratory diseases, along with covariates of air pollution variables and meteorological variables. These data are analyzed through state space models to investigate the relationship between air pollution and respiratory morbidity. Because our model includes a more realistic correlation structure compared with Knight et al.'s (1993) analysis, we arrive at a much more parsimonious model for the systematic part.
The interpretation of the latent process in connection with these data provides new insight into the relation between air pollution and respiratory morbidity.

Finally, in Chapter 11 we make some general comments on possible further studies for some of the topics of the thesis.

Chapter 2

Dispersion models

In this chapter, we first give a brief introduction to the theory of univariate dispersion models of Jørgensen (1987a, 1996), which is the basis of the entire theory that we develop in this thesis. We then discuss in detail the thinning operation of Joe (1996a) for the class of exponential dispersion models, and finally discuss a possible multivariate extension of dispersion models.

2.1 Definition

One attempt to generalize the normal distribution, linked to generalized linear models, leads to the class of dispersion models, whose statistical properties are very similar to those of the normal distribution. Dispersion models were originally introduced by Sweeting (1981) and Jørgensen (1983), and the models were studied in detail in Jørgensen (1986, 1987a). A univariate (regular) dispersion model DM(μ, σ²) with position parameter μ ∈ Ω and dispersion parameter σ² > 0 is defined by a probability density function of the form

p(y; μ, σ²) = a(y; σ²) exp{−d(y; μ)/(2σ²)}, y ∈ C, (2.1.1)

where a > 0 is a suitable function and d : C × Ω → ℝ is a regular unit deviance satisfying

d(y; μ) > d(y; y) = 0 for μ ≠ y, and (∂²d/∂μ²)(y; y) = 2/V(y) > 0,

V being called the variance function.

Obviously, the normal distribution N(μ, σ²) is a dispersion model with unit deviance (y − μ)² and variance function V(μ) = 1. Other dispersion models have many analogies with the normal distribution, particularly because d(y; μ) behaves like a generalized measure of squared distance for small σ². Other characteristics of the densities are:

(a) The factor a in (2.1.1) is independent of the position parameter μ.
(b) The densities are often unimodal; in general, the deviance d tends to make the density peak near μ, and the peak becomes higher and sharper the smaller the value of σ².

(c) The dispersion models cover a comprehensive range of non-normal distributions such as the Poisson, gamma, negative binomial, binomial, inverse Gaussian, von Mises and simplex distributions.

Dispersion models are in general approximately normal N{μ, σ²V(μ)} for σ² small, so that μ and σ²V(μ) represent the asymptotic mean and variance, respectively. According to Jørgensen (1996), as the dispersion parameter σ² tends to zero, the density of a regular dispersion model DM(μ, σ²) can be approximated as follows:

p(y; μ, σ²) ∼ (2πσ²)^(−1/2) V^(−1/2)(y) exp{−d(y; μ)/(2σ²)}, y ∈ Ω. (2.1.2)

The asymptotic relation (2.1.2) is usually called the saddlepoint approximation for a regular dispersion model. We use the convention that, throughout the rest of this thesis, the symbol "∼" as in (2.1.2) will always be used for asymptotic relationships as the dispersion parameter tends to zero.

2.2 Exponential dispersion models

Exponential dispersion models were originally proposed by Tweedie (1947), and later reintroduced by Nelder and Wedderburn (1972) as the class of error distributions for generalized linear models. A reproductive exponential dispersion model is defined by the following density for a random variable Y, with respect to a suitable measure:

p(y; θ, λ) = c(y; λ) exp[λ{θy − κ(θ)}]. (2.2.1)

An additive exponential dispersion model for a random variable Z is defined by a probability density function of the following form, with respect to a suitable measure:

p(z; θ, λ) = c*(z; λ) exp{θz − λκ(θ)}. (2.2.2)

The two types are connected by the duality transformation Z = λY, so in principle any exponential dispersion model has both an additive and a reproductive form. The parameter θ is called the canonical parameter and λ the index parameter.
The maximum possible domain for θ, denoted Θ, is called the canonical parameter domain. The domain for λ, called the index set and denoted Λ, is a subset of ℝ₊. The function κ is called the cumulant function. The reproductive form (2.2.1) is denoted Y ~ ED(μ, σ²), where μ = τ(θ) = κ'(θ) is the mean and σ² = λ⁻¹ is the dispersion parameter. The variance of Y is σ²V(μ) = σ²τ'{τ⁻¹(μ)}. The additive form (2.2.2) is denoted by Z ~ ED*(θ, λ), and the mean and variance of Z are

E(Z) = ξ = λτ(θ) and Var(Z) = λV(ξ/λ) = λV(μ),  (2.2.3)

where τ(θ) = κ'(θ) and V(μ) = τ'{τ⁻¹(μ)}. We define the unit deviance for the above exponential dispersion models as follows,

d(y; μ) = 2[ sup_θ {yθ - κ(θ)} - yτ⁻¹(μ) + κ{τ⁻¹(μ)} ].  (2.2.4)

Then the density (2.2.1) can be re-written in the form (2.1.1) in terms of the d of (2.2.4) and the a function given by

a(y; σ²) = c(y; λ) exp[ λ sup_θ {yθ - κ(θ)} ].

This proves that the reproductive model (2.2.1) is indeed a dispersion model (2.1.1). However, the additive model (2.2.2) is not, in general, of the form (2.1.1). Both types of exponential dispersion models satisfy a convolution property, given as follows. Let Y₁, ..., Yₙ be independent with Yᵢ ~ ED(μ, σ²/wᵢ) for weights wᵢ. Then

(1/w₊) Σ_{i=1}^n wᵢYᵢ ~ ED(μ, σ²/w₊),  (2.2.5)

where w₊ = Σᵢ wᵢ. If Z₁, ..., Zₙ are independent and Zᵢ ~ ED*(θ, λᵢ), then

Z₊ = Σ_{i=1}^n Zᵢ ~ ED*(θ, λ₁ + ... + λₙ).  (2.2.6)

Furthermore, we have Λ = ℝ₊ if and only if (2.2.2) is infinitely divisible (Jørgensen, 1986). The saddlepoint approximation to the density function of the additive model (2.2.2) is given by

p(z; θ, λ) ~ {2πλV(z/λ)}^{-1/2} exp[ -zτ⁻¹(z/λ) + λκ{τ⁻¹(z/λ)} ] exp{θz - λκ(θ)}  (2.2.7)

as λ goes to ∞.

Table 2.1: Summary of Tweedie exponential dispersion models.
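The reproductive convolution property (2.2.5) is easy to check by simulation in the gamma case, where ED(μ, σ²) can be realized as the gamma distribution with shape 1/σ² and scale μσ², so that V(μ) = μ². The following is a minimal sketch, with illustrative values of μ, σ² and the weights wᵢ:

```python
import random
from statistics import fmean, variance

random.seed(42)

def rgamma_ed(mu, sigma2, n):
    """Draws from the gamma model ED(mu, sigma2): shape 1/sigma2,
    scale mu*sigma2, so that E(Y) = mu and Var(Y) = sigma2 * mu**2."""
    shape, scale = 1.0 / sigma2, mu * sigma2
    return [random.gammavariate(shape, scale) for _ in range(n)]

mu, sigma2 = 2.0, 0.5
w = [1.0, 2.0, 3.0]                  # illustrative weights w_i
wplus = sum(w)
n = 20000

# independent Y_i ~ ED(mu, sigma2 / w_i)
samples = [rgamma_ed(mu, sigma2 / wi, n) for wi in w]

# the weighted average (1/w+) sum_i w_i Y_i should be ED(mu, sigma2 / w+)
ybar = [sum(wi * s[k] for wi, s in zip(w, samples)) / wplus for k in range(n)]

print(fmean(ybar))       # close to mu = 2.0
print(variance(ybar))    # close to (sigma2 / w+) * mu^2 = 1/3
```

The check works because each wᵢYᵢ is a gamma variable with the common scale μσ², so the additive property (2.2.6) applies to the numerator.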
Distribution          p          S       Ω       Θ
Extreme stable        p < 0      ℝ       ℝ₊      ℝ₀
Normal                p = 0      ℝ       ℝ       ℝ
[Do not exist]        0 < p < 1  (none)  ℝ₊      ℝ₀
Poisson               p = 1      ℕ₀      ℝ₊      ℝ
Compound Poisson      1 < p < 2  ℝ₀      ℝ₊      ℝ₋
Gamma                 p = 2      ℝ₊      ℝ₊      ℝ₋
Positive stable       2 < p < 3  ℝ₊      ℝ₊      -ℝ₀
Inverse Gaussian      p = 3      ℝ₊      ℝ₊      -ℝ₀
Positive stable       p > 3      ℝ₊      ℝ₊      -ℝ₀
Extreme stable        p = ∞      ℝ       ℝ       ℝ₋

Notation: S is the support, Ω the mean domain and Θ the canonical parameter domain; ℝ₀ = [0, ∞) and -ℝ₀ = (-∞, 0].

2.3 Tweedie class

The Tweedie class is an important class of univariate exponential dispersion models, defined as the special case of (2.2.1) with unit variance function of the form V(μ) = μ^p for some p ∈ (-∞, 0] ∪ [1, ∞), and denoted Tw_p(μ, σ²), so that the variance is σ²μ^p. Table 2.1 shows a summary of the different types of Tweedie models. The Tweedie models may be characterized as the only exponential dispersion models that satisfy the following scale transformation property,

c Tw_p(μ, σ²) ~ Tw_p(cμ, c^{2-p}σ²), for all c > 0.  (2.3.1)

Hence each Tweedie model also has an additive form. The density function of the Tweedie distribution Tw_p(μ, σ²), considered as an additive model, has the following form, with β = (p - 2)/(p - 1),

p(z; θ, λ) = c*_β(λ; z) exp{θz - λκ_β(θ)},  (2.3.2)

where κ_β(θ) = β⁻¹(β - 1){θ/(β - 1)}^β and τ_β(θ) = κ'_β(θ). The function c*_β(λ; z) is defined as follows,

c*_β(λ; z) = 1, for β < 0, z = 0;
c*_β(λ; z) = z⁻¹ Σ_{k=1}^∞ λ^k κ^k_β(-1/z)/k!, for β < 0, z > 0;  (2.3.3)
c*_β(λ; z) = (πz)⁻¹ Σ_{k=1}^∞ Γ(1 + kβ) λ^k κ^k_β(-1/z) sin(-kπβ)/k!, for 0 < β < 1, z > 0.

Many exponential dispersion models have variance functions that are asymptotically of the Tweedie form, leading to a general convergence theorem with the Tweedie models as limiting distributions (Jørgensen, Martinez and Tsao, 1994). For this reason, the Tweedie models occupy a central position among exponential dispersion models.
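The scale transformation property (2.3.1) can be checked numerically, first as a variance identity (Var(cY) = c²σ²μ^p must match the variance of Tw_p(cμ, c^{2-p}σ²)), and then by simulation in the gamma case p = 2, where c^{2-p} = 1 so that scaling leaves the dispersion parameter unchanged. A minimal sketch with illustrative parameter values:

```python
import random
from statistics import fmean, variance

# (1) variance identity behind (2.3.1): Var(cY) = c^2 * s2 * mu^p must
# equal the variance c^(2-p)*s2*(c*mu)^p of Tw_p(c*mu, c^(2-p)*s2)
mu, s2, c = 1.7, 0.4, 2.5
for p in (0, 1, 1.5, 2, 3):
    lhs = c ** 2 * s2 * mu ** p
    rhs = (c ** (2 - p) * s2) * (c * mu) ** p
    assert abs(lhs - rhs) < 1e-9

# (2) simulation check in the gamma case p = 2
random.seed(1)
n = 20000
y = [c * random.gammavariate(1.0 / s2, mu * s2) for _ in range(n)]   # c * Tw_2(mu, s2)
print(fmean(y))                       # close to c*mu = 4.25
print(variance(y) / fmean(y) ** 2)    # estimated dispersion, close to s2 = 0.4
```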
2.4 Proper dispersion models

By normalizing the saddlepoint approximation (2.1.2) for a regular dispersion model (2.1.1), we obtain a norming constant a₀(μ, σ²) that makes the right-hand side of (2.1.2) a density function with respect to Lebesgue measure. Considering the special case where the norming constant a₀(μ, σ²) = a(σ²) does not depend on μ, we may write (2.1.2) as follows,

p(y; μ, σ²) = a(σ²) V^{-1/2}(y) exp{-d(y; μ)/(2σ²)},  y ∈ C.  (2.4.1)

This is called a proper dispersion model. Exactly three exponential dispersion models are also proper dispersion models, namely the normal, gamma and inverse Gaussian families. Other examples of proper dispersion models are the von Mises distribution and the simplex distribution of Barndorff-Nielsen and Jørgensen (1991). For more details, refer to Jørgensen (1996).

2.5 Thinning operation

We now describe the thinning operation of Joe (1996a), which is crucial for developing both the stationary ARMA models with exponential dispersion model margins in Chapter 5 and the state space models with stationary Markov latent processes in Chapter 7. The thinning operation may be interpreted as a kind of inverse of convolution: whereas convolution increases the value of λ (c.f. (2.2.6)), we show that thinning decreases the value of λ. For the convenience of the convolution expressions, we adopt the additive rather than the reproductive form of the exponential dispersion model for this study. Suppose X₁ and X₂ are independent with

Xᵢ ~ ED*(θ, λᵢ),  i = 1, 2.  (2.5.1)

Let W = X₁ + X₂ denote their sum. By the properties of the exponential family, it is easy to show that W is sufficient for the parameter θ. So the conditional distribution of X₁ given W = x does not depend on the parameter θ, and is denoted

X₁ | X₁ + X₂ = x ~ G(λ₁, λ₂, x).

The conditional density function corresponding to G(λ₁, λ₂, x) is given by
p(x₁ | x; λ₁, λ₂) = c*(x₁; λ₁) c*(x - x₁; λ₂) / c*(x; λ₁ + λ₂),  (2.5.2)

where c* is given in the density (2.2.2). We call the distribution G(λ₁, λ₂, x) the contraction corresponding to ED*(θ, λ), where λ = λ₁ + λ₂. This terminology comes from the fact that if the model ED*(θ, λ) has support containing only non-negative values (excluding, for example, the normal density), then G(λ₁, λ₂, x) is concentrated on the interval from 0 to x. We now define the thinning operation A(·; α), where α ∈ [0, 1]. Let X be a random variable with distribution ED*(θ, λ), an infinitely divisible exponential dispersion model, and let ᾱ = 1 - α. Define the random variable A(X; α) by the conditional distribution

A(X; α) | X = x ~ G(αλ, ᾱλ, x).

By Proposition 2.1 below, the marginal distribution of A(X; α) is then ED*(θ, αλ), and we say that A(X; α) is the thinning of X by the proportion α. The assumption of infinite divisibility is made to ensure that both αλ and ᾱλ are valid values of the index parameter for any value of α in (0, 1). In the extreme cases α = 0, 1, we define A(X; 1) = X and A(X; 0) = 0. The proof of the thinning property hinges on the next proposition, which is easily proved using density functions when they exist. However, for completeness we include a proof that does not depend on the existence of densities.

Proposition 2.1 (Joe, 1996a) Let ED*(θ, λ) be an exponential dispersion model with index set Λ, and suppose that X ~ ED*(θ, λ) and Y | X = x ~ G(λ₁, λ₂, x), where λ, λ₁, λ₂ ∈ Λ and λ₁ + λ₂ = λ. Then Y and X - Y are independent, Y ~ ED*(θ, λ₁) and X - Y ~ ED*(θ, λ₂).

Proof: Suppose that X₁ and X₂ are independent with distributions (2.5.1), and let W = X₁ + X₂. Then the conditional distribution of X₁ | W and the marginal distribution of W are the same as, respectively, the conditional distribution of Y | X and the marginal distribution of X.
Hence the joint distribution of (X₁, W) is the same as the joint distribution of (Y, X), which in turn implies that the joint distribution of (X₁, X₂) is the same as the joint distribution of (Y, X - Y), completing the proof. □

The following moment property, due to Joe (1996a),

E{A(X; α) | X = x} = αx,  (2.5.3)

is important for many calculations. Hence, not only does the thinning operation decrease the value of λ; one may say that it also decreases the expectation, as evidenced by (2.5.3).

Table 2.2: Summary of contractions.

Distribution          ED*(θ, λ)      G(αλ, ᾱλ, x)          Reference
Normal                N(θλ, λ)       N(αx, αᾱλ)            Joe (1996a)
Poisson               Po(λe^θ)       Bi(x, α)              Al-Osh and Alzaid (1987)
Gamma                 Ga*(θ, λ)      x Beta(αλ, ᾱλ)        Lewis (1983)
Inverse Gaussian      IG*(θ, λ)      Simplex               Barndorff-Nielsen and Jørgensen (1991)
Negative binomial     Nb*(θ, λ)      Beta-Bi(αλ, ᾱλ, x)    Joe (1996a)
Generalized Poisson   GP*(θ, λ)      Quasi-binomial        Consul (1989, p. 195)
Tweedie               Tw*_p(θ, λ)    GTw_p(αλ, ᾱλ, x)      Jørgensen and Song (1995)

Table 2.2 presents a summary of the contractions for various exponential dispersion models, including the six examples considered by Joe (1996a). Except for the normal and Poisson distributions, the notation follows that used in equation (2.2.2). Details about the various contractions may be found either in the references mentioned or in Joe (1996a). The contraction is particularly simple in the case of normal or gamma margins. In the normal case with X ~ N(θλ, λ) we may write A(X; α) = αX + X₀, where X₀ ~ N(0, αᾱλ) is independent of X. In the gamma case we have A(X; α) = GX, where G ~ Beta(αλ, ᾱλ) is independent of X. Let

g(x; λ) = -xτ⁻¹(x/λ) + λκ{τ⁻¹(x/λ)}.  (2.5.4)

Inserting this in (2.2.7), we obtain an asymptotic approximation for the density of the contraction G(αλ, ᾱλ, x), namely

p(w | x; αλ, ᾱλ) ~ [ V(x/λ) / (2παᾱλ V{w/(αλ)} V{(x - w)/(ᾱλ)}) ]^{1/2} exp{ g(x; λ) - g(w; αλ) - g(x - w; ᾱλ) },

where V(μ) is the unit variance function τ'{τ⁻¹(μ)}.
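The gamma row of Table 2.2 lends itself to a direct simulation check. A minimal sketch, assuming the additive parametrization Ga*(θ, λ) with shape λ and rate -θ (so the mean is λ/(-θ) and the variance λ/θ²), and illustrative values of θ, λ and α:

```python
import random
from statistics import fmean, variance

random.seed(7)

# additive gamma model Ga*(theta, lam): shape lam, rate -theta (theta < 0)
theta, lam, alpha = -2.0, 5.0, 0.3
abar = 1.0 - alpha
n = 20000

x = [random.gammavariate(lam, 1.0 / -theta) for _ in range(n)]        # X ~ Ga*(theta, lam)
gg = [random.betavariate(alpha * lam, abar * lam) for _ in range(n)]  # G ~ Beta(a*lam, abar*lam)
a = [gi * xi for gi, xi in zip(gg, x)]                                # A(X; alpha) = G * X

# the marginal of the thinning should be Ga*(theta, alpha*lam)
print(fmean(a))      # close to alpha*lam/(-theta) = 0.75
print(variance(a))   # close to alpha*lam/theta^2  = 0.375
```

The mean of A(X; α) matching αE(X) also illustrates the conditional moment property (2.5.3).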
This approximation gives the exact result in two cases, the normal and inverse Gaussian distributions. In several other cases, such as the Poisson and gamma cases, it amounts to replacing all factorials and gamma functions by Stirling's approximation.

2.6 Multivariate dispersion model densities

A simple multivariate extension of the univariate dispersion models might be defined similarly to (2.1.1), except that now y and μ are p-dimensional vectors while σ² remains one-dimensional. Examples of this type of multivariate dispersion model include the multivariate von Mises-Fisher distribution, the multivariate hyperboloid distribution and some special cases of the multivariate simplex distribution of Barndorff-Nielsen and Jørgensen (1991). Inference for this type of multivariate model follows the same pattern as for univariate models; see Jørgensen (1987a). It is obvious that this multivariate extension oversimplifies the covariance structure, because only a single parameter σ² is employed for handling the correlation structure. Jørgensen and Lauritzen (1993) proposed a different multivariate extension of the dispersion model density, following the spirit of the multivariate normal density. Consider a p-variate multivariate normal distribution N_p(μ, Σ) with mean μ and covariance matrix Σ. Its density function is of the form

p(y; μ, Σ) = (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2)(y - μ)ᵀ Σ⁻¹ (y - μ)},  y ∈ ℝᵖ.  (2.6.1)

Equivalently, it may be written in terms of the notation of dispersion models as follows,

p(y; μ, Σ) = (2π)^{-p/2} |Σ|^{-1/2} exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)},  (2.6.2)

where δ(y; μ) = (δ₁, ..., δ_p)ᵀ and δᵢ = δ(yᵢ; μᵢ) = d^{1/2}(yᵢ; μᵢ) sgn(yᵢ - μᵢ) = yᵢ - μᵢ.
If we generalize the expression (2.6.2) to allow the δᵢ (the signed square roots of the unit deviances dᵢ) to be taken from the class of dispersion models, in principle we could define a family of multivariate dispersion model densities as follows,

p(y; μ, Σ) = A(y; Σ) exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)},  y ∈ C,  (2.6.3)

where A > 0 is a suitable function, Σ is a positive definite matrix, δ = (δ₁, ..., δ_p)ᵀ is the vector with components ±d^{1/2}(yᵢ; μᵢ) and C is the convex support for Y. The multivariate representation is simplified when Σ is a diagonal matrix.

Proposition 2.2 Suppose Y has a density of the form (2.6.3). If the dispersion matrix Σ is diagonal, then the components of Y are independent and Yᵢ ~ DM(μᵢ, σᵢᵢ), where σᵢᵢ is the i-th diagonal element of the matrix Σ.

An immediate example of this definition is the multivariate location model defined by the density

p(y; μ, Σ) = A(Σ) exp{-(1/2) δᵀ(y - μ) Σ⁻¹ δ(y - μ)},

where

A⁻¹(Σ) = ∫ exp{-(1/2) δᵀ(z) Σ⁻¹ δ(z)} dz,
δ(y - μ) = (±δ(y₁ - μ₁), ..., ±δ(y_p - μ_p))ᵀ,

± is taken as the sign of yᵢ - μᵢ, and δ is a non-negative function defined on ℝ satisfying δ(0) = 0. If the function A in (2.6.3) factorizes in the form A(y; Σ) = C(Σ) B(y) for all y, for suitable functions C and B, we call (2.6.3) a multivariate proper dispersion model. Obviously, the multivariate normal distribution is a multivariate proper dispersion model. Other examples are given as follows.

Example 2.1 (Multivariate gamma density) The univariate gamma distribution, being a proper dispersion model, has unit deviance function

d(y; μ) = 2{y/μ - log(y/μ) - 1}.

Now define a multivariate gamma distribution with position parameter vector μ and dispersion matrix Σ by the joint density function

p(y; μ, Σ) = A(y; Σ) exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)}

with δ(y; μ) = (δ₁, ..., δ_p)ᵀ, where

δᵢ = δ(yᵢ; μᵢ) = [2{yᵢ/μᵢ - log(yᵢ/μᵢ) - 1}]^{1/2} sign(yᵢ - μᵢ).
Here the A function is independent of the parameter μ and factorizes as A(y; Σ) = C(Σ) Π_{i=1}^p yᵢ⁻¹, where, writing Σ⁻¹ = Λ = (λᵢⱼ), the constant C⁻¹(Σ) can be expressed, apart from a product of gamma-function factors involving the diagonal elements λᵢᵢ, as the expectation of the cross-term factor exp{-Σ_{i>j} λᵢⱼ δ(Xᵢ; 1)δ(Xⱼ; 1)}, the expectation being taken under the probability measure generated by p independent univariate gamma random variables Xᵢ ~ Ga(1, λᵢᵢ⁻¹), i = 1, ..., p. This expectation can be evaluated by the method of Monte Carlo simulation. Consequently, we may construct the multivariate exponential distribution and the multivariate χ²-distribution as special cases of multivariate gamma distributions.

Example 2.2 (Multivariate von Mises density) The univariate von Mises distribution is a dispersion model, but not an exponential dispersion model, with unit deviance function

d(y; μ) = 2{1 - cos(y - μ)}.

We now define a multivariate von Mises distribution with position parameter μ and dispersion matrix Σ by the joint density function

p(y; μ, Σ) = A(Σ) exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)}

with δ(y; μ) = (δ₁, ..., δ_p)ᵀ, where

δᵢ = δ(yᵢ; μᵢ) = [2{1 - cos(yᵢ - μᵢ)}]^{1/2} sign(yᵢ - μᵢ),

and

A⁻¹(Σ) = ∫ exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)} dy = ∫ exp{-(1/2) δ₀ᵀ(x) Σ⁻¹ δ₀(x)} dx,

where δ₀(x) = δ(x; 0). So the A function is independent of μ, since the deviance function is periodic.

Example 2.3 (Multivariate hyperbola distribution) As is well known, the univariate hyperbola distribution (c.f. Barndorff-Nielsen, 1978) is a proper dispersion model with unit deviance function

d(y; μ) = y/μ + μ/y - 2, for y, μ > 0.

Hence we define a multivariate hyperbola distribution with parameter vector μ and dispersion matrix Σ by the joint density function

p(y; μ, Σ) = C(Σ) Π_{i=1}^p yᵢ⁻¹ exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)},  y ∈ ℝ₊ᵖ,

with δ(y; μ) = (δ₁, ..., δ_p)ᵀ, where

δᵢ = δ(yᵢ; μᵢ) = (yᵢ/μᵢ + μᵢ/yᵢ - 2)^{1/2} sign(yᵢ - μᵢ).

By the transformation zᵢ = yᵢ/μᵢ, i = 1, ..., p, we obtain

C⁻¹(Σ) = ∫_{ℝ₊ᵖ} Π_{i=1}^p zᵢ⁻¹ exp{-(1/2) δ₁ᵀ(z) Σ⁻¹ δ₁(z)} dz,

which is independent of the parameter vector μ, where δ₁(z) = δ(z; 1).
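As an alternative to Monte Carlo, in low dimensions the normalizing constant of the bivariate gamma case of Example 2.1 can be approximated by direct numerical integration of the kernel z₁⁻¹z₂⁻¹ exp{-(1/2) δᵀΣ⁻¹δ} (by the transformation zᵢ = yᵢ/μᵢ the constant does not depend on μ, as in Example 2.3). The dispersion matrix below is hypothetical, and the grid integration is a rough sketch rather than a production method:

```python
import math

def delta(z):
    """Signed square root of the gamma unit deviance, evaluated at mu = 1."""
    d = 2.0 * (z - math.log(z) - 1.0)
    return math.copysign(math.sqrt(max(d, 0.0)), z - 1.0)

def cinv(s11, s22, s12, h=0.02, upper=8.0):
    """Grid approximation of C(Sigma)^{-1}, the double integral of
    z1^-1 z2^-1 exp{-0.5 delta' Sigma^-1 delta} over (0, upper)^2."""
    det = s11 * s22 - s12 * s12
    a, b, c = s22 / det, -s12 / det, s11 / det    # entries of Sigma^{-1}
    grid = [h * k for k in range(1, int(upper / h))]
    dv = [delta(z) for z in grid]
    total = 0.0
    for z1, d1 in zip(grid, dv):
        for z2, d2 in zip(grid, dv):
            q = a * d1 * d1 + 2.0 * b * d1 * d2 + c * d2 * d2
            total += math.exp(-0.5 * q) / (z1 * z2)
    return total * h * h

print(cinv(0.3, 0.3, 0.1))    # C(Sigma)^{-1} for the hypothetical Sigma
```

With Σ diagonal the double integral factorizes into a product of one-dimensional integrals, in line with Proposition 2.2.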
It is obvious that the multivariate hyperbola distribution is a multivariate proper dispersion model.

Example 2.4 (Multivariate symmetric hyperbolic distribution) The univariate symmetric hyperbolic distribution is a proper dispersion model with unit deviance function

d(y; μ) = 2[{1 + (y - μ)²}^{1/2} - 1].

Hence the multivariate symmetric hyperbolic distribution is defined by the joint density function

p(y; μ, Σ) = C(Σ) exp{-(1/2) δᵀ(y; μ) Σ⁻¹ δ(y; μ)},  y ∈ ℝᵖ,

with δ(y; μ) = (δ₁, ..., δ_p)ᵀ, where

δᵢ = δ(yᵢ; μᵢ) = (2[{1 + (yᵢ - μᵢ)²}^{1/2} - 1])^{1/2} sign(yᵢ - μᵢ).

By the transformation zᵢ = yᵢ - μᵢ, i = 1, ..., p, we obtain

C⁻¹(Σ) = ∫_{ℝᵖ} exp{-(1/2) δ₀ᵀ(z) Σ⁻¹ δ₀(z)} dz,

where δ₀(z) = δ(z; 0). Obviously, it is a multivariate proper dispersion model. In general, although it is difficult to obtain closed forms for the marginal distributions, and hence for the moments, it might be possible, with heavy computing, to develop a multivariate analysis for multivariate non-normal data analogous to the classical theory, based on the multivariate dispersion model densities.

Chapter 3

Multivariate dispersion models

In Section 2.6 we have seen that Jørgensen and Lauritzen's (1993) multivariate dispersion model densities do not, in general, have closed forms for their marginals and moments. This deficiency limits their applicability, for example their capacity to model the pattern of marginal means, which is often of interest in the analysis of short time series. To overcome this, we now study an alternative method for constructing multivariate models with given dispersion model marginals and indices of dependence, through the copula approach of Sklar (1959) (see also Joe, 1996b). Our aim is to develop a joint distribution that reproduces the given marginals. Constructing multivariate distributions by means of copulas has proved popular in recent years (c.f. Joe, 1993; Hutchinson and Lai, 1990; Marshall and Olkin, 1988; Joe, 1996b).
The motivation for the copula approach is probably rooted in the aim of forming multivariate non-normal distributions by combining given non-normal marginal models, in a certain way, with dependence patterns. Being a multivariate joint distribution that contains only information regarding dependence, a copula produces a new multivariate distribution whenever new suitable marginals are fed into it. A parametric copula is usually obtained by applying certain variable transformations to a multivariate parametric distribution, such that the transformed variables' distribution no longer contains any parameters of the marginals. As a consequence, what remains relates only to the parametrization of dependence and has nothing to do with that of the marginals. With regard to dependence indices, it is known that Pearson's correlation coefficient is mainly adequate for measuring linear dependence in the multivariate normal distribution, so when studying multivariate non-normal distributions we need new tools to measure dependence between components. Among many possible definitions, Kendall's τ, Spearman's ρ and the normal scoring ν defined in the present chapter are introduced to describe non-linear dependence. Amongst the several types of copulas available in the literature, we are particularly interested in the multivariate normal copula, which is "extracted" from the multivariate normal distribution, say N_m(μ, Γ), where Γ = (γᵢⱼ) is a Pearson correlation matrix. The main reason for this choice is that the multivariate normal distribution has a nice parameter structure: the marginal parameter vector μ and the correlation parameter matrix Γ are "orthogonal", in the sense that the univariate marginal distributions N(μᵢ, 1) are independent of the correlation parameters Γ (a property not satisfied by, for example, the Dirichlet distribution).
This property enables us to remove the marginal information without any loss of information regarding the dependence part. There are a few other reasons that make the normal copula preferable, notably:

• The normal copula may retain some good properties of the multivariate normal distribution, where the dependence is characterized through the correlation matrix; this parametrization has been commonly used in practice and seems easy to comprehend. Additionally, most computational tasks involving the normal distribution are available in statistical software.

• Normal copulas may accommodate both "positive" and "negative" dependence, since the entries of Γ may take values between -1 and 1.

• For small dispersion parameters, univariate dispersion models have normal approximations, saddlepoint approximations and Edgeworth expansions to assist asymptotic investigations.

The copula approach to generating multivariate distributions gives rise to an important question of how to interpret the parameters of the matrix Γ in the new multivariate distributions. A natural approach to this question is to study the possible relationships between Γ and the three measures τ, ρ and ν. We show that γᵢⱼ is identical to the dependence value of the (i, j)-th pair measured by the normal scoring ν (c.f. (3.3.1) for details), leading to a new interpretation of γᵢⱼ. We also obtain two closed forms for the relationships between γᵢⱼ and ρ and between γᵢⱼ and τ; both are non-linear. A Monte Carlo simulation illustrates that the value of γᵢⱼ is close to both τ and ρ. A class of multivariate dispersion models is then defined by the multivariate normal copula, parametrized by the marginal position parameters and dispersion parameters along with the dependence matrix.
It is remarkable that Jørgensen and Lauritzen's (1993) multivariate dispersion model densities of Section 2.6 turn out to be the limiting distributions of our multivariate dispersion models for small dispersion parameters. This result links the two classes of multivariate extensions of dispersion models, and it helps us to understand the multivariate generalization of dispersion models from a different point of view.

3.1 Copulas

We start with the definition of a copula. Let u_{-I} denote the subvector of u obtained by omitting the components indicated by the set I, where I is a subset of the indices {1, ..., m}.

Definition 3.1 A mapping C: (0, 1)^m → (0, 1) is called a copula if (1) it is a continuous distribution function; and (2) each margin is a univariate uniform distribution, namely

lim C(u) = uᵢ,  uᵢ ∈ (0, 1),

where the limit is taken as uⱼ → 1 for all j ≠ i. It is easy to prove that lim_{u_{-I} → 1} C(u) is a copula, denoted by C_I(u_I), where I is any subset of {1, ..., m}. Copulas are easy to construct. For example, we may construct a copula from a given multivariate distribution as follows. Consider a random vector X ~ F(x), where F(·) is a given continuous multivariate distribution function with margins F₁(x₁), ..., F_m(x_m). Then Uᵢ = Fᵢ(Xᵢ) is a random variable that follows the uniform distribution U(0, 1), for i = 1, ..., m. Now consider the inverse distribution functions xᵢ = Fᵢ⁻¹(uᵢ) and obtain the copula C_F(u₁, ..., u_m) as

C_F(u₁, ..., u_m) = F{F₁⁻¹(u₁), ..., F_m⁻¹(u_m)}.

By the definition above, a copula might be interpreted as the remainder after transformations making the marginals uniform, so that the parameters of the marginals are no longer contained in the copula. Clearly, the joint distribution F(x) can be fully recovered by complementing the copula with its own marginal distributions,

F(x) = C_F{F₁(x₁), ..., F_m(x_m)}.
However, it is interesting to observe that the function C_F(u₁, ..., u_m) can be used to accommodate other univariate distributions, say G₁(y₁), ..., G_m(y_m). Therefore, a new multivariate distribution, G(y), is obtained from the copula C_F(·) by plugging in the margins G₁(y₁), ..., G_m(y_m),

G(y) = C_F{G₁(y₁), ..., G_m(y_m)}.

An important property of a multivariate distribution generated by a copula is that the i-th margin of G(y) gives the original Gᵢ(yᵢ), for i = 1, ..., m. Note that, in general, transforming the marginals of a joint distribution F(x) might change its dependence part. However, the multivariate normal distributions do not suffer from this problem. This is because, as mentioned above, the multivariate normal distribution is parametrized by two orthogonal sets of parameters governing the marginals and the correlations, respectively. This parametrization makes the multivariate normal distribution a good candidate for constructing copulas with easy interpretations.

3.2 Multivariate normal copulas

Consider an m-variate multivariate normal distribution

X = (X₁, ..., X_m)ᵀ ~ N_m(0, Γ),

where the matrix Γ = (γᵢⱼ) is chosen to be a correlation matrix whose diagonal elements are all 1, so that in effect γᵢⱼ is Pearson's correlation of Xᵢ and Xⱼ. We denote the distribution function of X by Φ_m(·|Γ). Clearly, all m marginal components of X have standard univariate normal distributions N(0, 1), with distribution and density functions denoted by Φ(·) and φ(·), respectively. An m-dimensional normal copula is defined as

C_Φ(u|Γ) = Φ_m{Φ⁻¹(u₁), ..., Φ⁻¹(u_m) | Γ},  u ∈ (0, 1)^m,  (3.2.1)

and the density of C_Φ(u|Γ) is hence

c(u|Γ) = |Γ|^{-1/2} exp{-(1/2) xᵀΓ⁻¹x + (1/2) xᵀx} = |Γ|^{-1/2} exp{(1/2) xᵀ(I - Γ⁻¹)x},  (3.2.2)

where x = (x₁, ..., x_m)ᵀ with xᵢ = Φ⁻¹(uᵢ), i = 1, ..., m.
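Sampling from a distribution generated this way is straightforward: draw a correlated normal pair, push it through Φ to obtain the copula uniforms, and apply the quantile functions of the desired margins. A minimal sketch with exponential margins Gᵢ(y) = 1 - exp(-y/μᵢ) and illustrative values of γ and the means:

```python
import math, random
from statistics import fmean

random.seed(3)
g = 0.7                        # copula dependence parameter (illustrative)
mu1, mu2 = 1.0, 4.0            # exponential margin means (illustrative)
n = 20000

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

ys = []
for _ in range(n):
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    x1, x2 = z1, g * z1 + math.sqrt(1.0 - g * g) * z2   # Corr(x1, x2) = g
    u1, u2 = Phi(x1), Phi(x2)                           # uniforms from the copula
    # feed the uniforms to the exponential quantile functions G_i^{-1}
    ys.append((-mu1 * math.log(1.0 - u1), -mu2 * math.log(1.0 - u2)))

m1 = fmean(y[0] for y in ys)
m2 = fmean(y[1] for y in ys)
print(m1, m2)    # the margins keep their means, close to 1.0 and 4.0
```

The dependence is injected entirely through the normal pair; the marginal parameters never enter the copula step, mirroring the orthogonality discussed above.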
Figure 3.1 shows some plots of bivariate normal copulas (bivariate density functions) with different values of the correlation parameter γ, namely -0.9, -0.5, 0.5 and 0.9. The normal copulas with negative values of γ turn out to be concentrated in the opposite direction relative to those with positive values of γ. This figure also indicates a nice curvature pattern, which seems similar to that of the normal distribution.

Figure 3.1: Four bivariate normal copulas with γ = -0.9, -0.5, 0.5 and 0.9.

3.3 Interpretation of dependence parameters

An important issue that we discuss in this section is how to interpret the elements of the matrix Γ in the normal copula (3.2.2). As mentioned above, three types of dependence measures are employed to explain γᵢⱼ. First we show that γᵢⱼ turns out to be identical to the normal scoring ν(Uᵢ, Uⱼ), and then we proceed to establish its connections with the other two dependence measures, Kendall's τ and Spearman's ρ. Here are the definitions of the three dependence measures.

Definition 3.2 (Kendall's τ) Let F(x₁, x₂) be a continuous bivariate distribution function. Then Kendall's dependence measure τ for any independent pairs (X₁, X₂) and (X'₁, X'₂) with distribution F is defined by

τ(X₁, X₂) = P{(X₁ - X'₁)(X₂ - X'₂) > 0} - P{(X₁ - X'₁)(X₂ - X'₂) < 0} = 4 ∫∫ F(x₁, x₂) dF(x₁, x₂) - 1.

From the definition it is clear that Kendall's τ is a bivariate measure of monotone dependence for continuous variables, gauging the difference between the probability of two random concordant pairs and the probability of two random discordant pairs.

Definition 3.3 (Spearman's ρ) Let F(x₁, x₂) be a continuous bivariate distribution function with marginal distributions F₁(x₁) and F₂(x₂). Then Spearman's dependence measure ρ is defined as the Pearson correlation of F₁(X₁) and F₂(X₂) for any pair (X₁, X₂) ~ F.
Therefore,

ρ(X₁, X₂) = Corr{F₁(X₁), F₂(X₂)} = 12 ∫∫ F₁(x₁)F₂(x₂) dF(x₁, x₂) - 3.

Spearman's ρ is also a bivariate measure of monotone dependence for continuous variables; it reflects the association between two (monotonically increasing) transformed random variables that are non-informative with respect to their original variables. Since Pearson's correlation measures the dependence between two normal random variables so satisfactorily, why not use the Pearson correlation for non-normal variables after normalizing transformations? This leads to another dependence measure, defined as follows.

Definition 3.4 (Normal scoring ν) Let F(x₁, x₂) be a continuous bivariate distribution function with marginal distributions F₁(x₁) and F₂(x₂). Then the normal scoring dependence measure ν is defined as the Pearson correlation between q₁(X₁) and q₂(X₂) for any pair (X₁, X₂) ~ F, namely

ν(X₁, X₂) = Corr{q₁(X₁), q₂(X₂)},

where the qᵢ(·) are two transformations such that qᵢ(Xᵢ), i = 1, 2, follow normal distributions. Note that such a normalizing transformation q(·) always exists for a continuous random variable.

Proposition 3.1 Let X be a continuous random variable distributed by F(x). Define

q(x) = Φ⁻¹{F(x)}.

Then q(X) follows a normal distribution. The function q(x) is called the normal scoring function, and the values of q(x) are called normal scores.

Proof: Let Y = q(X), where the one-to-one function q(·) is such that

Φ(y) = P(Y ≤ y) = F{q⁻¹(y)},

which leads to q(x) = Φ⁻¹{F(x)}. □

Actually, the normal scoring index ν is a bivariate measure of dependence for continuous variables whose value is calculated from normal scores. It is not hard to generalize the definition of ν to discrete distributions. Details are omitted, but in the following ν is assumed well defined in both the continuous and discrete cases.
Now we apply the three dependence measures above to interpret the matrix Γ that parametrizes the normal copula given in (3.2.2). Consider the bivariate marginal normal copula density of the components (Uᵢ, Uⱼ) obtained from (3.2.2),

c_Φ,ij(uᵢ, uⱼ | γᵢⱼ) = |Γᵢⱼ|^{-1/2} exp{-(1/2)(xᵢ, xⱼ)(Γᵢⱼ⁻¹ - I₂)(xᵢ, xⱼ)ᵀ},

where Γᵢⱼ, a 2 × 2 symmetric submatrix of Γ, has all diagonal elements 1 and off-diagonal elements γᵢⱼ, and I₂ is the bivariate identity matrix.

Normal scoring ν. The normal scoring ν of (Uᵢ, Uⱼ) is seen immediately to be

ν(Uᵢ, Uⱼ) = Corr{qᵢ(Uᵢ), qⱼ(Uⱼ)} = Corr{Φ⁻¹(Uᵢ), Φ⁻¹(Uⱼ)}.

Note that

Cov{Φ⁻¹(Uᵢ), Φ⁻¹(Uⱼ)} = ∫∫ Φ⁻¹(uᵢ)Φ⁻¹(uⱼ) c_Φ,ij(uᵢ, uⱼ|γᵢⱼ) duᵢ duⱼ = ∫∫ xᵢxⱼ ψᵢⱼ(xᵢ, xⱼ|γᵢⱼ) dxᵢ dxⱼ = γᵢⱼ,

where ψᵢⱼ(xᵢ, xⱼ|γᵢⱼ) is the joint density function of N₂(0, Γᵢⱼ). Similarly, Var{qᵢ(Uᵢ)} = Var{qⱼ(Uⱼ)} = 1, and therefore

ν(Uᵢ, Uⱼ) = Corr{qᵢ(Uᵢ), qⱼ(Uⱼ)} = γᵢⱼ.  (3.3.1)

Spearman's ρ. Spearman's ρ of (Uᵢ, Uⱼ) is

ρᵢⱼ = ρ(Uᵢ, Uⱼ) = 12 ∫∫_{(0,1)²} uᵢuⱼ c_Φ,ij(uᵢ, uⱼ|γᵢⱼ) duᵢ duⱼ - 3.

Note that both Φ(Xᵢ) and Φ(Xⱼ) are uniformly distributed as U(0, 1), so it is not hard to prove that

ρᵢⱼ = 12 E{Φ(Xᵢ)Φ(Xⱼ)} - 3 = 12 Cov{Φ(Xᵢ), Φ(Xⱼ)},

where the expectation is taken under the bivariate distribution of (Xᵢ, Xⱼ) ~ N₂(0, Γᵢⱼ). A Monte Carlo simulation study shows that ρᵢⱼ and γᵢⱼ are quite close when the simulation size is large enough, say 5,000. See the two top plots of Figure 3.2.

Kendall's τ. Kendall's τ is given by

τᵢⱼ = τ(Uᵢ, Uⱼ) = 4 ∫∫_{(0,1)²} C_Φ,ij(uᵢ, uⱼ|γᵢⱼ) dC_Φ,ij(uᵢ, uⱼ|γᵢⱼ) - 1.

It may be re-written as

τᵢⱼ = 4 E Φᵢⱼ(Xᵢ, Xⱼ|γᵢⱼ) - 1,

where the expectation is taken under the bivariate normal distribution (Xᵢ, Xⱼ) ~ N₂(0, Γᵢⱼ). A Monte Carlo simulation study is shown in the two bottom plots of Figure 3.2. Although Kendall's τ and the values of the parameter γ are not identical, they show a close relation in the dependence pattern they reflect. In other words, increasing the value of γ results in a similar increase in the value of τ, and vice versa.
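For the bivariate normal copula, the relationships of τ and ρ to γ have the classical closed forms τ = (2/π) arcsin γ and ρ = (6/π) arcsin(γ/2), presumably the two closed forms referred to in the chapter introduction. A small Monte Carlo check in the spirit of Figure 3.2, with an illustrative sample size and value of γ:

```python
import math, random
from itertools import combinations

random.seed(11)
g, n = 0.5, 1000
pts = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pts.append((z1, g * z1 + math.sqrt(1 - g * g) * z2))

# sample Kendall's tau: concordant minus discordant pairs
conc = sum(1 if (a[0] - b[0]) * (a[1] - b[1]) > 0 else -1
           for a, b in combinations(pts, 2))
tau_hat = conc / (n * (n - 1) / 2)

# sample Spearman's rho: Pearson correlation of the ranks
def ranks(v):
    order = sorted(range(n), key=lambda i: v[i])
    r = [0] * n
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

rx = ranks([p[0] for p in pts]); ry = ranks([p[1] for p in pts])
mr = (n + 1) / 2
num = sum((a - mr) * (b - mr) for a, b in zip(rx, ry))
den = math.sqrt(sum((a - mr) ** 2 for a in rx) * sum((b - mr) ** 2 for b in ry))
rho_hat = num / den

print(round(2 / math.pi * math.asin(g), 3))      # theoretical tau = 0.333
print(round(6 / math.pi * math.asin(g / 2), 3))  # theoretical rho = 0.483
print(tau_hat, rho_hat)                          # sample estimates, close to the above
```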
Figure 3.2: Monte Carlo simulation study showing plots of Spearman's ρ versus Pearson's γ (top plots) and Kendall's τ versus Pearson's γ (bottom plots), for simulation sizes n = 1000 and n = 5000; the normal scoring dependence measurements are identical to the Pearson correlations given in the normal distribution.

3.4 Multivariate dispersion models

Now we define a family of multivariate dispersion models through multivariate normal copulas. Let Yᵢ, i = 1, ..., m, be m univariate random variables distributed according to dispersion models DM(μᵢ, σᵢ²), respectively. The densities and distributions of DM(μᵢ, σᵢ²) are denoted by gᵢ(yᵢ; μᵢ, σᵢ²) and Gᵢ(yᵢ; μᵢ, σᵢ²), respectively. We may construct a multivariate dispersion model via the normal copula C_Φ(·|Γ) by the following definition.

Definition 3.5 An m-variate multivariate dispersion model generated by the normal copula C_Φ(·|Γ), denoted by MDM_m(μ, σ², Γ), with margins gᵢ(yᵢ; μᵢ, σᵢ²), i = 1, ..., m, is defined by the joint distribution of the form

G(y; μ, σ², Γ) = C_Φ{G₁(y₁; μ₁, σ₁²), ..., G_m(y_m; μ_m, σ_m²) | Γ}  (3.4.1)

or equivalently by the density given by

g(y; μ, σ², Γ) = c_Φ{G₁(y₁; μ₁, σ₁²), ..., G_m(y_m; μ_m, σ_m²) | Γ} g₁(y₁; μ₁, σ₁²) ··· g_m(y_m; μ_m, σ_m²),  (3.4.2)

where μ = (μ₁, ..., μ_m)ᵀ and σ² = (σ₁², ..., σ_m²)ᵀ. This definition gives the same marginal distributions as those we started with, regardless of the values of the matrix Γ. Consequently, we may express the logarithm of the density g(y; μ, σ², Γ) in detail as follows,

log g(y; μ, σ², Γ) = log c_Φ{G₁(y₁; μ₁, σ₁²), ..., G_m(y_m; μ_m, σ_m²) | Γ} + Σ_{i=1}^m log gᵢ(yᵢ; μᵢ, σᵢ²)
$$ =-\frac{1}{2}\log|\Gamma|+\frac{1}{2}\sum_{i=1}^m\left[\Phi^{-1}\{G_i(y_i)\}\right]^2-\frac{1}{2}\sum_{i,j=1}^m\gamma^{ij}\,\Phi^{-1}\{G_i(y_i)\}\,\Phi^{-1}\{G_j(y_j)\}+\sum_{i=1}^m\log g_i(y_i;\mu_i,\sigma_i^2), $$
where $G_i(y_i)\equiv G_i(y_i;\mu_i,\sigma_i^2)$, $\Gamma^{-1}$, whose $(i,j)$-th element is denoted $\gamma^{ij}$, is the inverse matrix of $\Gamma$, and the $\Phi^{-1}\{G_i(y_i)\}$ are normal scoring functions.

Proposition 3.2 If $\Gamma$ is the identity matrix $I$, then the components $Y_1,\dots,Y_m$ are independent.

Proof When $\Gamma=I$, the logarithm of the joint density becomes
$$ \log g(y;\mu,\sigma^2,I)=\sum_{i=1}^m\log g_i(y_i;\mu_i,\sigma_i^2) $$
since $|\Gamma|=1$ and
$$ \sum_{i,j=1}^m\gamma^{ij}\,\Phi^{-1}\{G_i(y_i)\}\,\Phi^{-1}\{G_j(y_j)\}=\sum_{i=1}^m\left[\Phi^{-1}\{G_i(y_i)\}\right]^2. $$
Therefore, $Y_1,\dots,Y_m$ are independent. □

3.5 Normal scores and their approximations

For convenience, we denote the normal scores by $q_i=\Phi^{-1}\{G_i(y_i)\}$, $i=1,\dots,m$, with respect to the $i$-th margin $G_i$. The accuracy of the evaluation of the normal scores entering the joint density $g(y;\mu,\sigma^2,\Gamma)$ is crucial for its potential applications. Clearly, when $G_i$ is a univariate normal distribution $\mathcal{N}(\mu_i,\sigma_i^2)$, we obtain
$$ q_i=\Phi^{-1}\{G_i(y_i)\}=\frac{y_i-\mu_i}{\sigma_i}, $$
and therefore
$$ \log g(y;\mu,\sigma^2,\Gamma)=-\frac{1}{2}\log|\Gamma|-\frac{1}{2}\sum_{i,j=1}^m\frac{\gamma^{ij}}{\sigma_i\sigma_j}(y_i-\mu_i)(y_j-\mu_j)-\frac{m}{2}\log(2\pi)-\frac{1}{2}\sum_{i=1}^m\log\sigma_i^2, $$
which is exactly the logarithm of the density function of the multivariate normal distribution $\mathcal{N}_m(\mu,\Sigma)$ with $\Sigma=\mathrm{diag}(\sigma_i)\,\Gamma\,\mathrm{diag}(\sigma_i)$. However, for non-normal margins $G_i$, the evaluation of $\Phi^{-1}\{G_i(y_i)\}$ is not so straightforward, and many approximations have been proposed in the recent literature. According to Jørgensen (1996) (see also Jørgensen, 1987b), the so-called technique of small-dispersion asymptotics plays a key role in approximating normal scores, and this evaluation is essentially equivalent to assessing the normal approximation to the distribution of the dispersion model.
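To make Definition 3.5 concrete, the MDM log-density can be evaluated by combining the normal-copula term with the marginal log-densities. The following minimal Python sketch (a hypothetical helper, not from the thesis) assumes gamma margins with unit variance function $V(\mu)=\mu^2$, so that $\mathrm{E}\,Y_i=\mu_i$ and $\mathrm{Var}\,Y_i=\sigma_i^2\mu_i^2$; Proposition 3.2 then corresponds to the copula term vanishing at $\Gamma=I$:

```python
import numpy as np
from scipy import stats

def mdm_gamma_logpdf(y, mu, sigma2, Gamma):
    """Log-density of an m-variate dispersion model with gamma margins,
    glued together by the normal copula c_Phi(.|Gamma).
    Gamma margins are parametrized so E Y_i = mu_i, Var Y_i = sigma2_i*mu_i^2."""
    y, mu, sigma2 = map(np.asarray, (y, mu, sigma2))
    shape = 1.0 / sigma2              # gamma shape parameter 1/sigma2
    scale = mu * sigma2               # gamma scale parameter mu*sigma2
    # normal scores q_i = Phi^{-1}{G_i(y_i)}
    u = stats.gamma.cdf(y, a=shape, scale=scale)
    q = stats.norm.ppf(u)
    Ginv = np.linalg.inv(Gamma)
    _, logdet = np.linalg.slogdet(Gamma)
    # log c_Phi(u|Gamma) = -0.5*log|Gamma| - 0.5*q^T (Gamma^{-1} - I) q
    log_copula = -0.5 * logdet - 0.5 * q @ (Ginv - np.eye(len(y))) @ q
    log_margins = stats.gamma.logpdf(y, a=shape, scale=scale).sum()
    return log_copula + log_margins
```

At $\Gamma=I$ the copula term is identically zero and the function returns exactly the sum of the marginal gamma log-densities, which is the content of Proposition 3.2.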
Suppressing the index $i$ for convenience, five related approaches are available for approximating the normal scores, starting with the Pearson residual
$$ r_p=\frac{y-\mu}{V^{1/2}(\mu)}, $$
a one-to-one function of $y$ for fixed $\mu$. According to the theory of tail area approximation (c.f. Jørgensen, 1987b, and Reid, 1995), we have the following small-dispersion asymptotic normality,
$$ G(y;\mu,\sigma^2)\simeq\Phi(r_p/\sigma), \qquad (3.5.1) $$
where $G(y;\mu,\sigma^2)$ is the distribution function of a dispersion model $DM(\mu,\sigma^2)$ given in (2.1.1), $\Phi(\cdot)$ is the distribution function of $\mathcal{N}(0,1)$ and $r_p$ is the Pearson residual. Here the notation $\simeq$ is introduced in Section 2.1. From (3.5.1), the normal score is therefore approximated by
$$ q(y)=\Phi^{-1}\{G(y;\mu,\sigma^2)\}\simeq r_p/\sigma=\frac{y-\mu}{\sigma V^{1/2}(\mu)}. $$
In addition, there are four other types of standardized residuals similar to the Pearson residual: (1) the Wald residual, defined by
$$ r_w=V^{1/2}(y)\{\tau^{-1}(y)-\tau^{-1}(\mu)\},\qquad y\in\Omega, $$
where $\tau$ denotes the mean value mapping, (2) the score residual $r_s$, (3) the dual score residual $u$ of Reid (1995), and (4) the deviance residual
$$ r=\pm\,d^{1/2}(y;\mu), $$
where the sign $\pm$ is that of $y-\mu$. The approximate equivalence of the five standardized residuals gives rise to similar normal asymptotics,
$$ G(y;\mu,\sigma^2)\simeq\Phi(r_w/\sigma)\ \text{or}\ q(y)\simeq r_w/\sigma,\qquad G(y;\mu,\sigma^2)\simeq\Phi(r_s/\sigma)\ \text{or}\ q(y)\simeq r_s/\sigma, $$
$$ G(y;\mu,\sigma^2)\simeq\Phi(u/\sigma)\ \text{or}\ q(y)\simeq u/\sigma,\qquad\text{and}\qquad G(y;\mu,\sigma^2)\simeq\Phi(r/\sigma)\ \text{or}\ q(y)\simeq r/\sigma, $$
as the dispersion parameter $\sigma^2$ becomes small. To adjust for a certain bias caused by the above approximations, Barndorff-Nielsen (1990) and Jensen (1992) proposed the modified deviance residual defined by
$$ r^*=\frac{r}{\sigma}+\frac{\sigma}{r}\log\frac{u}{r}, $$
and they proved that
$$ G(y;\mu,\sigma^2)=\Phi(r^*)\{1+O(\sigma^3)\}, $$
where the ratio $u/r$ is always positive because the two residuals $u$ and $r$ take the same sign. As is shown in Jørgensen (1996), see also Figure 3.3, this approximation tends to be very good, and is often significantly better than the closest contender $\Phi(r/\sigma)$ based on the deviance residual.
However, this approximation may not be reliable in some extreme cases where $u$ tends to zero while $r$ remains bounded as $y$ tends to the lower end of the domain, making $r^*$ tend to infinity instead of minus infinity as it ordinarily should.

We now present a more refined approximation to the normal score based on the modified deviance residual $r^*$. Since $\Phi^{-1}$ is continuously differentiable, a first-order expansion gives
$$ q=\Phi^{-1}\{G(y;\mu,\sigma^2)\}=\Phi^{-1}\left[\Phi(r^*)\{1+O(\sigma^3)\}\right]=\Phi^{-1}\{\Phi(r^*)\}+\left[\phi\{\Phi^{-1}(\xi)\}\right]^{-1}\Phi(r^*)\,O(\sigma^3), $$
where $\xi$ lies between $\Phi(r^*)$ and $\Phi(r^*)\{1+O(\sigma^3)\}$ and $\phi$ denotes the standard normal density. By the monotonicity of $\Phi^{-1}$,
$$ q\le r^*+\sqrt{2\pi}\,\Phi(r^*)\exp\left\{\frac{(r^*)^2}{2}\right\}O(\sigma^3). \qquad (3.5.2) $$

Proposition 3.3 Let $Y\sim ED(\mu,\sigma^2)$. Then as $\sigma^2\to0$, $q(Y)=r^*+O_p(\sigma^3)$.

Proof Consider the limit $\sigma^2\to0$. First note that $\log(u/r)=o_p(1)$, where $u$ is the dual score residual and $r$ the deviance residual, so that
$$ r^*=\frac{r}{\sigma}+O_p(\sigma) \qquad\text{and}\qquad (r^*)^2=\frac{r^2}{\sigma^2}+o_p(1). $$
Second, we know that $\mathrm{Var}(Y)=\sigma^2V(\mu)<\infty$, or
$$ \frac{(y-\mu)^2}{\sigma^2V(\mu)}=O_p(1). $$
By a Taylor expansion of the unit deviance, it follows that
$$ \frac{d(y;\mu)}{\sigma^2}=\frac{(y-\mu)^2}{\sigma^2V(\mu)}+\frac{o_p(|y-\mu|^2)}{\sigma^2}=O_p(1). \qquad (3.5.3) $$
Thus the inequality (3.5.2) can be further written as
$$ q=r^*+\exp\left\{\frac{(r^*)^2}{2}\right\}O(\sigma^3)=r^*+\exp\left\{\frac{r^2}{2\sigma^2}+o_p(1)\right\}O(\sigma^3)=r^*+O_p(1)\,O(\sigma^3)=r^*+O_p(\sigma^3). $$
The last equality is obtained by (3.5.3). □

Figure 3.3 shows quantile plots for gamma residuals. Clearly, when the dispersion parameter $\sigma^2$ is small, there are no big differences among the three approximations, but as $\sigma^2$ increases, the Pearson residual shows a substantial bias away from the diagonal line. Similar phenomena for other dispersion models are illustrated in Jørgensen (1996). In summary, when the dispersion parameter $\sigma^2$ is not small, either $r/\sigma$ or $r^*$ is superior to $r_p/\sigma$ for approximating normal scores. This comparison has a great impact on the approximate inference developed in Chapter 4.
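The comparison underlying Figure 3.3 can be reproduced in part with a short Python sketch (illustrative only; the modified deviance residual $r^*$ is omitted here). It evaluates the exact gamma normal score $q=\Phi^{-1}\{G(y;\mu,\sigma^2)\}$ against the Pearson and deviance approximations, using $V(\mu)=\mu^2$ and the gamma unit deviance $d(y;\mu)=2\{y/\mu-1-\log(y/\mu)\}$:

```python
import numpy as np
from scipy import stats

def gamma_normal_score(y, mu, sigma2):
    """Exact normal score q = Phi^{-1}{G(y; mu, sigma2)} for the gamma
    dispersion model (shape 1/sigma2, scale mu*sigma2)."""
    u = stats.gamma.cdf(y, a=1.0 / sigma2, scale=mu * sigma2)
    return stats.norm.ppf(u)

def pearson_approx(y, mu, sigma2):
    # r_p / sigma with r_p = (y - mu)/V^{1/2}(mu) = (y - mu)/mu
    return (y - mu) / (mu * np.sqrt(sigma2))

def deviance_approx(y, mu, sigma2):
    # r / sigma with unit deviance d(y; mu) = 2{y/mu - 1 - log(y/mu)}
    d = 2.0 * (y / mu - 1.0 - np.log(y / mu))
    return np.sign(y - mu) * np.sqrt(d / sigma2)
```

For small $\sigma^2$ both approximations track the exact score closely, while for $\sigma^2$ around 1 and $y$ well away from $\mu$ the deviance residual is markedly closer to the exact score than the Pearson residual, in line with the discussion above.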
3.6 Small-dispersion asymptotics for the joint density

In this section, we focus on approximations for the logarithm of the joint density $g(y)$ given by (3.4.2). According to Jørgensen (1996), the four types of residuals $r_p$, $r_w$, $r_s$ and $u$ are nearly equivalent in terms of their contributions to the tail area approximation for the class of regular dispersion models, so the Pearson residual $r_p$ is chosen as their representative in the following study. The other two types of residuals, the deviance residual $r$ and the modified deviance residual $r^*$, are also used in the following study of approximation.

Figure 3.3: Comparison of three types of residuals from the gamma distribution: $r^*$ is represented by a solid line, $r/\sigma$ by a dashed line, $r_p/\sigma$ by a broken line, and the normal scores by a dotted line (mostly covered by the line of $r^*$).

Approximation based on the Pearson residual

When the dispersion parameter $\sigma^2$ is small,
$$ \log g(y;\mu,\sigma^2,\Gamma)\approx-\frac{1}{2}\log|\Gamma|+\frac{1}{2}\sum_{i=1}^m\left(\frac{r_{p_i}}{\sigma_i}\right)^2-\frac{1}{2}\sum_{i,j=1}^m\frac{\gamma^{ij}}{\sigma_i\sigma_j}\,r_{p_i}r_{p_j}+\sum_{i=1}^m\log g_i(y_i;\mu_i,\sigma_i^2) $$
$$ =-\frac{1}{2}\log|\Gamma|+\sum_{i=1}^m\log a(y_i;\sigma_i^2)-\frac{1}{2}(y-\mu)^T\left(D\,\Gamma\,D\right)^{-1}(y-\mu), $$
where $D=\mathrm{diag}\{\sigma_iV^{1/2}(\mu_i)\}$.

Approximation based on the deviance residual
$$ \log g(y;\mu,\sigma^2,\Gamma)\approx-\frac{1}{2}\log|\Gamma|+\frac{1}{2}\sum_{i=1}^m\frac{r_i^2}{\sigma_i^2}-\frac{1}{2}\sum_{i,j=1}^m\frac{\gamma^{ij}}{\sigma_i\sigma_j}\,r_ir_j+\sum_{i=1}^m\log g_i(y_i;\mu_i,\sigma_i^2) $$
$$ =-\frac{1}{2}\log|\Gamma|+\frac{1}{2}\sum_{i=1}^m\frac{r_i^2}{\sigma_i^2}-\frac{1}{2}\sum_{i,j=1}^m\frac{\gamma^{ij}}{\sigma_i\sigma_j}\,r_ir_j+\sum_{i=1}^m\left\{\log a(y_i;\sigma_i^2)-\frac{r_i^2}{2\sigma_i^2}\right\} $$
$$ =-\frac{1}{2}\log|\Gamma|+\sum_{i=1}^m\log a(y_i;\sigma_i^2)-\frac{1}{2}\,r^T(y;\mu)\left(G\,\Gamma\,G\right)^{-1}r(y;\mu), $$
where $r(y;\mu)=(r_1,\dots,r_m)^T$ and $G=\mathrm{diag}(\sigma_i)$. So the density may be re-written approximately as
$$ g(y;\mu,\sigma^2,\Gamma)\approx|\Gamma|^{-1/2}\prod_{i=1}^ma(y_i;\sigma_i^2)\exp\left\{-\frac{1}{2}\,r^T(y;\mu)\,\Sigma^{-1}\,r(y;\mu)\right\}, \qquad (3.6.1) $$
where $\Sigma=G\,\Gamma\,G$.
It is remarkable that the density function obtained by normalizing the right-hand side of the approximation (3.6.1) coincides with the definition of the multivariate dispersion density of Jørgensen and Lauritzen (1993) given in Section 2.6.

Approximation based on the modified deviance residual
$$ \log g(y;\mu,\sigma^2,\Gamma)\approx-\frac{1}{2}\log|\Gamma|+\frac{1}{2}\sum_{i=1}^m(r_i^*)^2-\frac{1}{2}\sum_{i,j=1}^m\gamma^{ij}\,r_i^*r_j^*+\sum_{i=1}^m\log g_i(y_i;\mu_i,\sigma_i^2), $$
and it can be further approximated in terms of the deviance residual $r$ by using $r^*=r/\sigma+O_p(\sigma)$. This approximation provides a more accurate representation of the joint density $g(y;\mu,\sigma^2,\Gamma)$ than the other two; however, because $r^*$ involves the parameter $\sigma^2$, this version of approximate inference may result in a complex estimating procedure.

3.7 Approximate covariance

It is of interest to investigate how the matrix $\Gamma$ characterizes the correlation structure of $Y_1,\dots,Y_m$. It follows from (3.3.1) that
$$ \nu(Y_i,Y_j)=\nu(U_i,U_j)=\gamma_{ij}, $$
so the $(i,j)$-th element $\gamma_{ij}$ of $\Gamma$ turns out to be the normal scoring $\nu$ measurement of the dependence between $Y_i$ and $Y_j$. In addition, a direct relationship between the Pearson correlation and the parameter $\gamma_{ij}$ can be established approximately as follows. Through some straightforward calculations, we find that for any $i,j$,
$$ \mathrm{Cov}(Y_i,Y_j)=\mathrm{Cov}\left[G_i^{-1}\{\Phi(X_i)\},\,G_j^{-1}\{\Phi(X_j)\}\right], $$
where $(X_i,X_j)\sim\mathcal{N}_2(0,\Gamma_{ij})$ and $\Gamma_{ij}$ is a $2\times2$ symmetric matrix with diagonal elements 1 and off-diagonal element $\gamma_{ij}$. By (3.5.1), the approximation based on the Pearson residual gives, for $\sigma_k^2$ small, $k=1,\dots,m$,
$$ G_k^{-1}\{\Phi(x_k)\}\approx\sigma_kV^{1/2}(\mu_k)\,x_k+\mu_k. $$
Hence, for both $\sigma_i^2$ and $\sigma_j^2$ small,
$$ \mathrm{Cov}(Y_i,Y_j)\approx\mathrm{Cov}\{\sigma_iV^{1/2}(\mu_i)X_i+\mu_i,\ \sigma_jV^{1/2}(\mu_j)X_j+\mu_j\}=\sigma_i\sigma_jV^{1/2}(\mu_i)V^{1/2}(\mu_j)\,\gamma_{ij}. $$
Thus, noting that $\mathrm{Var}(Y_k)=\sigma_k^2V(\mu_k)$, $k=1,\dots,m$, we obtain
$$ \mathrm{Corr}(Y_i,Y_j)\approx\gamma_{ij}. $$
Hence the Pearson correlation of $(Y_i,Y_j)$ is approximately $\gamma_{ij}$ when both $\sigma_i^2$ and $\sigma_j^2$ are small.
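This local-correlation property is easy to check numerically. The following hypothetical Python sketch (not from the thesis) draws from the normal copula with gamma margins and computes the Pearson correlation of $(Y_i,Y_j)$; for small $\sigma^2$ it should be close to $\gamma_{ij}$, while for larger dispersion the correlation is attenuated:

```python
import numpy as np
from scipy import stats

def mdm_gamma_corr(gamma, sigma2, mu=(1.0, 2.0), n=40000, seed=3):
    """Monte Carlo Pearson correlation of (Y_i, Y_j) under a bivariate
    dispersion model with gamma margins and normal copula parameter gamma."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, gamma], [gamma, 1.0]])
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = stats.norm.cdf(x)                      # uniform scores
    # gamma margins with E Y = mu, Var Y = sigma2 * mu^2
    y = stats.gamma.ppf(u, a=1.0 / sigma2, scale=np.array(mu) * sigma2)
    return np.corrcoef(y[:, 0], y[:, 1])[0, 1]
```

With $\sigma^2=0.01$ the simulated correlation is essentially $\gamma$, whereas with $\sigma^2=1$ (exponential-like margins) it falls below $\gamma$, illustrating that $\gamma_{ij}$ is only a local linear correlation.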
In other words, $\gamma_{ij}$ gives a local linear correlation in the neighbourhood of $(\mu_i,\mu_j)$. This property will be used in Chapter 4 to explain the working correlation matrix of the GEE approach suggested by Liang and Zeger (1986).

Chapter 4

Likelihood-based inference for population-averaged models

This chapter studies the regression analysis of short time series data through population-averaged models (Zeger et al., 1988), in which the aggregate response for the population is the focus. A likelihood-based theory of approximate inference about regression parameters is presented, under the assumption that each series, as a vector of responses, follows a multivariate dispersion model introduced in the previous chapter.

The motivation for such a study is to investigate, from a probabilistic point of view, a possible theoretical explanation of the generalized estimating equation (GEE) proposed by Liang and Zeger (1986). Being a quasi-likelihood approach, the GEE has its limitations: the method incorporates a so-called "working" correlation matrix and then treats this matrix as a set of nuisance parameters in the estimation procedure. Only second-moment assumptions are made for the margins, and the standard errors of the regression parameter estimates are estimated using the so-called sandwich estimator. This makes the procedure semi-parametric in nature, leaving open the question of the existence of an underlying probability model.

Approximate inference for the regression parameters of the population-averaged models is based on both Pearson and deviance residuals that approximate the normal scores in the multivariate dispersion models. An important finding in this chapter is an asymptotic relation between our likelihood-oriented method and Liang and Zeger's GEE approach as the dispersion parameter becomes small. As a consequence, the GEE method may be justified through a fully specified probability model, thereby reinforcing the theoretical
basis of the GEE approach. When the dispersion parameter is not small, we propose an alternative approach based on the deviance residual, which provides a much more accurate approximation to the normal score but results in a biased inference function. Following Kuk (1995), a so-called bias-corrected estimate of the regression parameters is suggested, and is shown to be asymptotically unbiased and asymptotically normally distributed.

4.1 Models

Consider a data set consisting of $n$ independent short time series, each observed over the same $m$ occasions. Assume that the $i$-th series (the $m$-vector of responses), $Y_i$, follows a multivariate exponential dispersion model,
$$ Y_i\sim MDM_m(\mu_i,\sigma_i^2,\Gamma),\qquad i=1,\dots,n, $$
where $\mu_i=(\mu_{i1},\dots,\mu_{im})^T$ and $\sigma_i^2=(\sigma_{i1}^2,\dots,\sigma_{im}^2)^T$ are respectively the vector of mean parameters and the vector of dispersion parameters for the $i$-th series. We assume $\sigma_{it}^2=\sigma^2/w_{it}^2$, where the $w_{it}$ are known positive weights, $t=1,\dots,m$, and $\sigma^2$ is an unknown constant dispersion parameter common to all series.

Now assume that each series is associated with a $p$-vector of covariates $x_{it}$ available at time $t$. Let $X_i=(x_{i1},\dots,x_{im})^T$ be the $m\times p$ matrix of covariates associated with the $i$-th series. Hence the data for the $i$-th subject consist of the pair of matrices $(Y_i,X_i)$. It follows immediately that, given the covariate matrix $X_i$, the marginal distribution of the $t$-th component $Y_{it}$ of $Y_i$, $t=1,\dots,m$, $i=1,\dots,n$, is
$$ Y_{it}\,|\,X_i\sim ED\left(\mu_{it},\,\sigma^2/w_{it}^2\right). $$
We further assume that the marginal means follow a generalized linear model,
$$ h(\mu_{it})=\eta_{it},\qquad\eta_{it}=x_{it}^T\beta, $$
where $\beta$ is a $p$-vector of regression parameters of interest and the link function $h$ is usually chosen to be the canonical link (for example, the log link for the Poisson distribution and the logistic link for the binomial distribution).
The main objective is to estimate the parameters $\theta=(\beta,\sigma^2,\Gamma)$. We group the $\mu_{it}(\beta)$ together to form the $m$-vector of marginal means, $\mu_i(\beta)=(\mu_{i1},\dots,\mu_{im})^T$. Note that the dependence among the components of $Y_i$ is characterized by the joint distribution $MDM_m(\mu_i,\sigma_i^2,\Gamma)$. With observations $y_{it}$, $t=1,\dots,m$, $i=1,\dots,n$, the full log-likelihood function for the parameter $\theta$ is
$$ \ell(\beta,\sigma^2,\Gamma;y)=-\frac{n}{2}\log|\Gamma|+\sum_{i=1}^n\sum_{t=1}^m\log g(y_{it};\mu_{it},\sigma^2/w_{it}^2)-\frac{1}{2}\sum_{i=1}^nq_i^T(y_i;\beta,\sigma^2)\left(\Gamma^{-1}-I_m\right)q_i(y_i;\beta,\sigma^2), $$
where $q_i(y_i;\beta,\sigma^2)=(q_{i1},\dots,q_{im})^T$ is the vector of normal scores depending on the parameters $(\beta,\sigma^2)$ and $y_i=(y_{i1},\dots,y_{im})^T$; indeed $q_{it}=\Phi^{-1}\{G_{it}(y_{it};\mu_{it},\sigma^2/w_{it}^2)\}$, where $G_{it}$ is the distribution function of $ED(\mu_{it},\sigma^2/w_{it}^2)$. In principle, the maximum likelihood estimate $\hat\beta$ can be obtained, by applying the chain rule of differentiation with respect to $\beta$, as the solution to the estimating equation
$$ \Psi(\beta;Y,\sigma^2,\Gamma)=\frac{\partial\ell}{\partial\beta}=\sum_{i=1}^n\sum_{t=1}^m\frac{\partial\log g(y_{it};\mu_{it},\sigma^2/w_{it}^2)}{\partial\beta}-\sum_{i=1}^n\left(\frac{\partial q_i^T}{\partial\beta}\right)\left(\Gamma^{-1}-I_m\right)q_i=0, \qquad (4.1.1) $$
where, by the chain rule,
$$ \frac{\partial q_{it}}{\partial\beta}=\frac{\partial q_{it}}{\partial G_{it}}\,\frac{\partial G_{it}}{\partial\mu_{it}}\,\frac{\partial\mu_{it}}{\partial\beta}=\frac{w_{it}^2}{\sigma^2V(\mu_{it})\,\varphi(q_{it})}\left\{\int_{-\infty}^{y_{it}}u\,g_{it}(u)\,du-G_{it}(y_{it})\,\mu_{it}\right\}\frac{\partial\mu_{it}}{\partial\beta}. $$
Here $\varphi(\cdot)$ is the density function of the standard normal distribution and $g_{it}$ is the density function of $G_{it}$. It is worth noting that, since equation (4.1.1) is derived from an exact full log-likelihood, the inference function $\Psi$ must be unbiased, that is, $\mathrm{E}\,\Psi=0$. Therefore, under some mild regularity conditions, standard theory for inference functions implies asymptotic normality (Cox and Hinkley, 1974, or Godambe, 1976): with both $\sigma^2$ and $\Gamma$ fixed,
$$ \sqrt{n}\left(\hat\beta-\beta\right)\xrightarrow{d}\mathcal{N}_p(0,G),\qquad n\to\infty, $$
with $G=\lim_n nJ^{-1}$, where $J$ is the Godambe information matrix defined by $J=S(\beta)V^{-1}(\beta)S^T(\beta)$, with sensitivity matrix $S=\mathrm{E}_\beta\,\Psi'(\beta;Y,\sigma^2,\Gamma)$ and variability matrix $V=\mathrm{E}_\beta\{\Psi(\beta;Y,\sigma^2,\Gamma)\Psi^T(\beta;Y,\sigma^2,\Gamma)\}$.
Here we make several remarks: (1) In the asymptotic result above, the Godambe information matrix is actually the Fisher information matrix, because the inference function (4.1.1) is derived directly from the true likelihood function. (2) For Poisson count data without overdispersion, $\sigma^2=1$, so $\Psi$ depends only on $\beta$ and $Y$ for given $\Gamma$. (3) Unfortunately, it is difficult to obtain the estimate $\hat\beta$ directly from equation (4.1.1) because of the complexity of the normal scores $q_{it}$. Therefore, approximations to the normal scores in (4.1.1) are invoked; details are given in the following sections.

Similarly, given $\beta$ and $\sigma^2$, differentiating the full log-likelihood function with respect to $\Gamma^{-1}$ and noting that (see, for example, Rogers, 1980)
$$ \frac{\partial|\Gamma|}{\partial\Gamma^{-1}}=-|\Gamma|\,\Gamma, $$
we obtain the maximum likelihood estimate of $\Gamma$,
$$ \hat\Gamma=\frac{1}{n}\sum_{i=1}^nq_i(y_i;\beta,\sigma^2)\,q_i^T(y_i;\beta,\sigma^2). \qquad (4.1.2) $$
Expressing the derivative of the likelihood with respect to the dispersion parameter $\sigma^2$ is quite complex, because the function $a$ and the normal scores $q_{it}$ are non-linear in $\sigma^2$, so that no closed form is available, in general, for the estimate of $\sigma^2$. In the following sections, various versions of the estimate of $\sigma^2$ are given under different choices of the normal scoring approximation. In all cases, to find $\hat\sigma^2$ we use the saddlepoint approximation to the function $a$,
$$ a(y_{it};\sigma^2)\approx\{2\pi\sigma^2V(y_{it})\}^{-1/2}. $$

4.2 Approximate inference based on the Pearson residual

The first approximation is based on the Pearson residual, which leads to an unbiased inference function, as follows. Let $W_i=\mathrm{diag}(w_{it})$. With the Pearson residual, the log-likelihood function for the parameters $(\beta,\sigma^2,\Gamma)$ is
$$ \ell(\beta,\sigma^2,\Gamma;Y)\approx-\frac{n}{2}\log|\Gamma|+\sum_{i=1}^n\sum_{t=1}^m\log a(y_{it};\sigma^2/w_{it}^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\mu_i)^TW_iV_i^{-1}\,\Gamma^{-1}\,V_i^{-1}W_i(y_i-\mu_i), $$
where $V_i=\mathrm{diag}\{V^{1/2}(\mu_{it})\}$.
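The moment-type estimate (4.1.2) is straightforward to compute once the normal scores are available. A minimal Python sketch (hypothetical helper, assuming gamma margins with a common dispersion $\sigma^2$ for illustration):

```python
import numpy as np
from scipy import stats

def estimate_Gamma(Y, mu, sigma2):
    """Moment-type estimate (4.1.2): Gamma_hat = (1/n) sum_i q_i q_i^T,
    where q_i is the vector of normal scores for series i.
    Y: (n, m) responses; mu: length-m marginal means; gamma margins assumed."""
    q = stats.norm.ppf(stats.gamma.cdf(Y, a=1.0 / sigma2, scale=mu * sigma2))
    return q.T @ q / Y.shape[0]
```

As a sanity check, when the series are generated with independent components the estimate should be close to the identity matrix, consistent with Proposition 3.2.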
Taking the derivative of the log-likelihood with respect to the regression parameter $\beta$, we obtain
$$ \frac{\partial\ell}{\partial\beta}=\frac{1}{\sigma^2}\sum_{i=1}^n\frac{\partial\mu_i^T}{\partial\beta}\,W_iV_i^{-1}\,\Gamma^{-1}\,V_i^{-1}W_i(y_i-\mu_i), $$
and hence the estimate of $\beta$, denoted $\hat\beta_p$, is given as the solution to the equation
$$ U_p(\beta;Y,\Gamma)=\sum_{i=1}^n\frac{\partial\mu_i^T}{\partial\beta}\,W_iV_i^{-1}\,\Gamma^{-1}\,V_i^{-1}W_i(y_i-\mu_i)=\sum_{i=1}^nD_i^TW_i\,\Sigma_i^{-1}\,W_i(y_i-\mu_i)=0, $$
where $D_i=\partial\mu_i/\partial\beta^T$ and $\sigma^2\Sigma_i=\sigma^2\,\mathrm{diag}\{V^{1/2}(\mu_{it})\}\,\Gamma\,\mathrm{diag}\{V^{1/2}(\mu_{it})\}$ is the approximate covariance matrix of $Y_i$, as shown in Section 3.7, when the dispersion parameter $\sigma^2$ becomes small. It is easy to show that, for given $\Gamma$,
$$ \mathrm{E}_\beta\,U_p(\beta;Y,\Gamma)=0, $$
so the inference function $U_p(\beta;Y,\Gamma)$ is unbiased. Therefore, we have the asymptotic normality
$$ \sqrt{n}\left(\hat\beta_p-\beta\right)\xrightarrow{d}\mathcal{N}_p(0,G_p),\qquad n\to\infty, $$
with $G_p=\lim_n nJ_p^{-1}$, where $J_p$ is the Godambe information matrix $J_p=S_pV_p^{-1}S_p^T$, with sensitivity and variability matrices given respectively by
$$ S_p=\mathrm{E}_\beta\,U_p'(\beta;Y,\Gamma),\qquad V_p=\mathrm{E}_\beta\{U_p(\beta;Y,\Gamma)\,U_p^T(\beta;Y,\Gamma)\}. $$
Because of the asymptotic normality, we then have $\hat\beta_p-\beta=O_p(n^{-1/2})$, which proves that the estimate $\hat\beta_p$ is consistent and asymptotically unbiased for $\beta$. Furthermore, letting all the weights $w_{it}$ be 1, we obtain a simpler version of the estimating equation for $\beta$,
$$ \sum_{i=1}^nD_i^T\Sigma_i^{-1}(y_i-\mu_i)=0. \qquad (4.2.1) $$
Remarkably, equation (4.2.1) coincides with the form of Liang and Zeger's (1986) generalized estimating equation. It is also remarkable that $\Sigma_i$ (or $\Gamma$), treated as nuisance parameters in their paper, may be interpreted, in our setting, as dependence measures in the sense explained in the previous chapter. The computation of $\hat\beta_p$ may be done with the GEE routine available in S-PLUS.
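A bare-bones Python sketch of solving (4.2.1) by Fisher scoring, assuming Poisson margins with the log link and a given working matrix $\Gamma$ (illustrative only; in practice one would call an existing GEE routine):

```python
import numpy as np

def gee_poisson_log(Y, X, Gamma, n_iter=25):
    """Fisher scoring for sum_i D_i^T Sigma_i^{-1} (y_i - mu_i) = 0 (eq. 4.2.1)
    with Poisson margins, log link and working correlation Gamma.
    Y: (n, m) responses; X: (n, m, p) covariates."""
    n, m, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        score = np.zeros(p)
        info = np.zeros((p, p))
        for i in range(n):
            mu = np.exp(X[i] @ beta)                 # inverse log link
            D = mu[:, None] * X[i]                   # D_i = d mu_i / d beta^T
            A = np.sqrt(mu)                          # V^{1/2}(mu), V(mu) = mu
            Sigma = (A[:, None] * Gamma) * A[None, :]
            Sinv = np.linalg.inv(Sigma)
            score += D.T @ Sinv @ (Y[i] - mu)
            info += D.T @ Sinv @ D
        beta = beta + np.linalg.solve(info, score)
    return beta
```

With independent working correlation ($\Gamma=I$) this reduces to ordinary Poisson regression scoring, and the solver recovers the true $\beta$ from simulated data up to sampling error.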
From the approximate log-likelihood, we also obtain an estimate of $\Gamma$,
$$ \hat\Gamma=\frac{1}{n\hat\sigma^2}\sum_{i=1}^nW_iV_i^{-1}(y_i-\hat\mu_i)(y_i-\hat\mu_i)^TV_i^{-1}W_i, $$
and the following estimate of $\sigma^2$,
$$ \hat\sigma_p^2=\frac{1}{nm-p}\sum_{i=1}^n(y_i-\hat\mu_i)^TW_i\,\Sigma_i^{-1}\,W_i(y_i-\hat\mu_i). $$

4.3 Approximate inference based on the deviance residual

When the dispersion parameter is not small, we argued in Section 3.5 that the approximation by the Pearson residual is likely to cause serious bias and hence result in inaccurate inference. The deviance residual, on the other hand, provides a better approximation to the normal scores, leading to a new estimating equation as follows. Let $r_i=(r_{i1},\dots,r_{im})^T$. The full log-likelihood function for the parameters $(\beta,\sigma^2,\Gamma)$ is approximately of the form
$$ \ell(\beta,\sigma^2,\Gamma;Y)\approx-\frac{n}{2}\log|\Gamma|+\sum_{i=1}^n\sum_{t=1}^m\log a(y_{it};\sigma^2/w_{it}^2)-\frac{1}{2\sigma^2}\sum_{i=1}^nr_i^T(y_i;\mu_i)\,W_i\,\Gamma^{-1}\,W_i\,r_i(y_i;\mu_i). $$
The estimate $\hat\beta_r$ is then defined as the solution to the equation
$$ U_r(\beta;Y)=\sum_{i=1}^nR_i^TW_i\,\Gamma^{-1}\,W_i\,r_i(y_i;\mu_i)=0, \qquad (4.3.1) $$
where $R_i$ is the $m\times p$ matrix
$$ R_i(\beta,\sigma^2)=\mathrm{diag}\left\{\frac{y_{it}-\mu_{it}}{r_{it}V(\mu_{it})}\right\}D_i. $$
Clearly, the inference function $U_r$ is no longer unbiased, i.e., $\mathrm{E}_\beta\,U_r(\beta;Y)\neq0$, and therefore the estimate $\hat\beta_r$ is inconsistent (for the true parameter $\beta$). To compute $\hat\beta_r$, given current estimates of $\Gamma$ and $\sigma^2$, we iterate between approximate Fisher scoring for $\beta$ and updating the estimates of $\Gamma$ and $\sigma^2$.

Inspired by Kuk (1995), we now consider a bias-corrected estimator of $\beta$. Let $\beta^*$ be the vector of parameters such that
$$ \mathrm{E}_\beta\,U_r(\beta^*;Y)=0. $$
Then, as the sample size $n$ increases, we have the standard asymptotic normality
$$ \sqrt{n}\left(\hat\beta_r-\beta^*\right)\xrightarrow{d}\mathcal{N}_p(0,G_r), $$
with $G_r=\lim_n nJ_r^{-1}$, where $J_r$ is the Godambe information matrix $J_r=S_rV_r^{-1}S_r^T$, with sensitivity and variability matrices given respectively by
$$ S_r=\mathrm{E}_\beta\{U_r'(\beta^*;Y)\}\qquad\text{and}\qquad V_r=\mathrm{E}_\beta\{U_r(\beta^*;Y)\,U_r^T(\beta^*;Y)\}. $$
Suppose that there exists a one-to-one differentiable vector function $f$ such that $\beta^*=f(\beta)$ holds componentwise. Define a new estimator of $\beta$ by $\tilde\beta=f^{-1}(\hat\beta_r)$. Then it follows from the Slutsky-Fréchet theorem (c.f. Serfling, 1980) that
$$ \sqrt{n}\left(\tilde\beta-\beta\right)\xrightarrow{d}\mathcal{N}_p(0,\tilde G), $$
where $\tilde G=HG_rH^T$ and $H$ is the matrix of derivatives of the components of $\beta=f^{-1}(\beta^*)$ with respect to the components of $\beta^*$. Hence the estimate $\tilde\beta$ is consistent and asymptotically normal.

With a specified function vector $f$, we derive the bias correction as follows. Define the bias by $b(\beta)=\beta^*-\beta=f(\beta)-\beta$. Start with an initial estimate $b^{(0)}$ for the bias of $\hat\beta_r$ and then update
$$ b^{(k+1)}=f\left(\hat\beta_r-b^{(k)}\right)-\left(\hat\beta_r-b^{(k)}\right) \qquad (4.3.2) $$
and
$$ \tilde\beta^{(k+1)}=\hat\beta_r-b^{(k+1)}. $$
If the limit $b$ of $b^{(k)}$ exists, then letting $k\to\infty$ in equation (4.3.2) gives $b=f(\hat\beta_r-b)-(\hat\beta_r-b)$, so that $\tilde\beta=f^{-1}(\hat\beta_r)$. A natural way to initialize the bias is
$$ b^{(0)}=f(\hat\beta_p)-\hat\beta_p, $$
where $\hat\beta_p$ is the estimate of $\beta$ obtained from the GEE approach. In general, it is hard to obtain a closed form for the function $f$, but numerical evaluation of $f$ at given points is feasible. For instance, given that $\hat\beta_r$ has the asymptotic mean $\beta^*=f(\beta)$, $f(\beta)$ may be approximated by the average of $\hat\beta_r$ over simulated samples from the model with parameters set at $\beta$. For details, we refer to Kuk (1995) and the numerical examples therein.

We may consider a further approximation for $r_i$,
$$ r_i(y_i;\mu_i)\approx V_i^{-1}(y_i-\mu_i)\qquad\text{and}\qquad R_i\approx V_i^{-1}D_i. $$
Therefore we obtain the same approximate form as equation (4.2.1),
$$ \sum_{i=1}^nD_i^TW_i\,\Sigma_i^{-1}\,W_i(y_i-\mu_i)=0. $$
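The fixed-point iteration (4.3.2) is simple to implement once the bias map $f$ can be evaluated (for example by simulation, as described above). The following hypothetical Python sketch uses an artificial linear bias map purely to exercise the iteration; it is not the thesis's estimator:

```python
import numpy as np

def bias_correct(beta_r, f, n_iter=50, b0=None):
    """Kuk-style bias correction: iterate b <- f(beta_r - b) - (beta_r - b)
    (eq. 4.3.2) and return beta_tilde = beta_r - b. At the fixed point,
    beta_tilde = f^{-1}(beta_r), where f maps the true beta to the
    asymptotic mean beta* of the biased estimator."""
    b = np.zeros_like(beta_r) if b0 is None else b0
    for _ in range(n_iter):
        b = f(beta_r - b) - (beta_r - b)
    return beta_r - b
```

With any contraction-type bias map the iteration converges geometrically, and the corrected estimate recovers $f^{-1}(\hat\beta_r)$ without requiring an explicit inverse of $f$.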
Similarly, we can then obtain the following estimate of $\Gamma$,
$$ \hat\Gamma=\frac{1}{n\hat\sigma^2}\sum_{i=1}^nW_i\,r_i(y_i;\hat\mu_i)\,r_i^T(y_i;\hat\mu_i)\,W_i, $$
and an analogous moment-type estimate of $\sigma^2$ based on the deviance residuals.

4.4 Approximate inference based on the modified deviance residual

In this case a closed form for the estimate of $\Gamma$ is available. However, since $r^*$ involves the parameter $\sigma^2$, it is nearly impossible to make inference for $\beta$. For theoretical completeness, we present some of the calculations in detail. Let $r_i^*=(r_{i1}^*,\dots,r_{im}^*)^T$, where the $r_{it}^*$ denote the modified deviance residuals. The log-likelihood function for the parameters $(\beta,\sigma^2,\Gamma)$ may be approximated by
$$ \ell(\beta,\sigma^2,\Gamma;Y)\approx-\frac{n}{2}\log|\Gamma|+\sum_{i=1}^n\sum_{t=1}^m\log a(y_{it};\sigma^2/w_{it}^2)+\frac{1}{2}\sum_{i=1}^n\sum_{t=1}^m(r_{it}^*)^2-\frac{1}{2}\sum_{i=1}^n\sum_{s,t=1}^m\gamma^{st}r_{is}^*r_{it}^* $$
$$ =-\frac{n}{2}\log|\Gamma|+\sum_{i=1}^n\sum_{t=1}^m\log a(y_{it};\sigma^2/w_{it}^2)+\frac{1}{2}\sum_{i=1}^n(r_i^*)^T\left(I-\Gamma^{-1}\right)r_i^*. $$
To derive the likelihood estimating equation, we first need the derivative $\partial r_i^*/\partial\mu_i^T=\mathrm{diag}(\partial r_{it}^*/\partial\mu_{it})$, whose $t$-th diagonal element involves the derivatives of both the unit deviance $d_{it}=d(y_{it};\mu_{it})$ and the correction term $\log(u_{it}/r_{it})$. In matrix notation, we may write
$$ \frac{\partial r_i^*}{\partial\mu_i^T}=\frac{1}{\sigma}(E_i+F_i), $$
where $E_i=\mathrm{diag}(e_{it})$ and $F_i=\mathrm{diag}(f_{it})$ collect the two groups of terms. The estimating equation is finally given by
$$ \sum_{i=1}^n\left(\frac{\partial r_i^{*T}}{\partial\beta}\right)\left(I-\Gamma^{-1}\right)r_i^*=0. $$
It turns out that the approximation based on the modified deviance residual does not retain the simplicity of the estimation procedure for $\beta$. However, we obtain the following estimate of $\Gamma$,
$$ \hat\Gamma=\frac{1}{n}\sum_{i=1}^nr_i^*(y_i;\hat\mu_i,\hat\sigma^2)\,r_i^{*T}(y_i;\hat\mu_i,\hat\sigma^2). $$
4.5 Implementation of the estimating procedure

To compute the estimate of $\beta$, we choose the estimating equation based on the Pearson residual when the dispersion parameter is small, and otherwise the estimating equation based on the deviance residual, in accordance with the accuracy of the normal scoring approximations shown in Section 3.5 and Figure 3.3. We implement our estimating procedure as follows. Whichever approximating residual is used for computing $\hat\beta$, we suggest always estimating $\Gamma$ by
$$ \hat\Gamma=\frac{1}{n}\sum_{i=1}^nr_i^*\left(y_i;\hat\beta,\hat\sigma^2\right)r_i^{*T}\left(y_i;\hat\beta,\hat\sigma^2\right) \qquad (4.5.1) $$
and $\sigma^2$ by
$$ \hat\sigma^2=\frac{1}{nm}\sum_{i=1}^nr^T\{y_i;\mu_i(\hat\beta)\}\,W_i\,\hat\Gamma^{-1}\,W_i\,r\{y_i;\mu_i(\hat\beta)\}. \qquad (4.5.2) $$
The algorithm is as follows.

(1) Fit the data by a generalized linear model under the assumption of independence, and obtain $\hat\beta_0$ and $\hat\sigma_0^2$. These estimates should be good starting values, since they are robust against misspecification of $\Gamma$; here $\Gamma=I$.

(2) Given $\hat\beta_0$ and $\hat\sigma_0^2$, obtain $\hat\Gamma_0$ by equation (4.5.1).

(3) With $\hat\sigma_0^2$ and $\hat\Gamma_0$ given, fit the model again to compute $\hat\beta_p$ using the GEE method of Liang and Zeger (1986), corresponding to approximate inference based on Pearson residuals, or to compute $\tilde\beta$ using the bias-corrected approach of Kuk (1995), corresponding to approximate inference based on deviance residuals.

(4) Use the new $\hat\beta$ to update $\hat\sigma^2$ by equation (4.5.2) and then update $\hat\Gamma$ by equation (4.5.1).

(5) Repeat Steps (3) and (4) until convergence.

Clearly, the standard errors of the estimates can be computed using the Godambe information matrices given in the previous sections. The jackknife approach is another way to calculate standard errors; see Joe (1996b, Chapter 10) for details.
Finally, to compare the efficiency of the two estimators $\hat\beta_p$ and $\tilde\beta$, we may calculate the asymptotic relative efficiency (ARE) of $\hat\beta_p$ versus $\tilde\beta$, which is given by the diagonal elements of
$$ \mathrm{ARE}=\mathrm{diag}(G_p)\left\{\mathrm{diag}(\tilde G)\right\}^{-1}=\mathrm{diag}(G_p)\left\{\mathrm{diag}(HG_rH^T)\right\}^{-1}. $$

Chapter 5

Stationary models with exponential dispersion model margins

5.1 Introduction

Let us first consider the sunspot data, reported by Andrews and Herzberg (1980), which are monthly mean relative sunspot numbers over the period 1749-1976. These monthly data are simple monthly means of the daily relative sunspot values calculated by Wolfer's formula (c.f. Andrews and Herzberg, 1980), so they are in fact non-negative continuous data rather than integer data. Because of the large volume of monthly data, we show Wolfer's yearly mean relative sunspot numbers instead in Figure 5.1 (a time-series plot based on the monthly data is too compressed to exhibit the pattern of the data).

Many studies have assumed that the sunspot data (for example, Wolfer's yearly mean relative sunspot numbers) form a stationary time series (c.f. Yule, 1927). Most such studies assume that the sunspot data approximately follow a normal distribution, allowing one to use classical ARMA models. Noting that the sunspot data are positive numbers with occasional zeroes, we would suggest that the data are likely to follow a compound Poisson marginal distribution, resulting from a sum of gamma random variables (representing Wolfer's mean relative sunspot numbers) with a Poisson number of terms, representing the number of places with sunspots present on the surface of the sun. In the compound Poisson setting, a zero sunspot number is obtained when the Poisson random variable takes the value zero, interpreted as the case where no sunspots occurred for that
Figure 5.1: Yearly mean sunspot numbers over 1749-1976.

year (or month). It is natural to ask whether the classical ARMA model can really be used to fit such data once we realize that the data are not from a normal distribution. More generally, a fundamental question is whether it is possible to generalize the classical ARMA models to fit stationary time series with non-normal margins such as compound Poisson, Poisson, gamma, negative binomial or inverse Gaussian.

In the past two decades or so, this issue has received considerable attention, starting with the papers of Lawrance and Lewis (1977) and Jacobs and Lewis (1977) for the case of exponential margins. Models for other marginal distributions such as gamma, Poisson and negative binomial followed (c.f. Gaver and Lewis 1980, Lawrance 1982, McKenzie 1986, 1988, Lewis, McKenzie and Hugus 1989, McCormick and Park 1992 and Al-Osh and Aly 1992). A unification and generalization of some of these models was recently proposed by Joe (1996a), based on convolution-closed families of infinitely divisible distributions, allowing the construction of autoregressive moving average (ARMA) models of arbitrary orders.

Let us first consider Joe's (1996a) construction of an AR(1) model of the form
$$ X_t=A_t(X_{t-1};\alpha)+\varepsilon_t,\qquad t=1,2,\dots, \qquad (5.1.1) $$
where $A_t$ is a thinning operation defined in Section 2.5, and the two terms on the right-hand side of the equation are independent. The subscript $t$ on the operator $A_t$ emphasizes that a different operator is used for each $t$, and the operator $A_t$ is assumed to be conditionally independent of $\varepsilon_t$ given $X_{t-1}$. We want the marginal distribution of $X_t$ to be an infinitely divisible exponential dispersion model $ED^*(\theta,\lambda)$.
By the definition of the thinning operation, the variable $X_{t-1}$ with distribution $ED^*(\theta,\lambda)$ is 'thinned' to a variable $A_t(X_{t-1};\alpha)$ with distribution $ED^*(\theta,\alpha\lambda)$ for some $\alpha\in(0,1)$. Assuming that the innovations $\varepsilon_t$ are independent and identically distributed $ED^*(\theta,(1-\alpha)\lambda)$ and using the convolution formula
$$ ED^*(\theta,\lambda_1)*ED^*(\theta,\lambda_2)=ED^*(\theta,\lambda_1+\lambda_2), \qquad (5.1.2) $$
the desired distribution for $X_t$ is obtained. The thinning operation, defined in Section 2.5, generalizes the notion of binomial thinning of a Poisson variable (c.f. McKenzie, 1988) and the beta 'thinning' of the gamma distribution used by Lewis, McKenzie and Hugus (1989).

The generalization of the thinning operation to the thinning of a vector of correlated variables is not straightforward. For this reason, the generalization of (5.1.1) to higher-order AR processes is complicated, as evidenced by the technique used by Joe (1996a). In contrast, it is easy to construct moving average processes, because only the thinning of independent variables is involved. Thus, following Joe (1996a), we define a $q$-th order moving average process $\{X_t\}$ with margins $ED^*(\theta,\lambda)$ and coefficients $\alpha_1,\dots,\alpha_q\in[0,1]$ by
$$ X_t=\varepsilon_t+\sum_{j=1}^qA_{t,j}(\varepsilon_{t-j};\alpha_j), \qquad (5.1.3) $$
where $\{\varepsilon_t\}$ is a sequence of i.i.d. $ED^*\!\left(\theta,\lambda/(1+\sum_{j=1}^q\alpha_j)\right)$ innovations, and where the $q$ terms in the sum on the right-hand side of (5.1.3) are conditionally independent given the innovations. The definition in equation (5.1.3) is easily extended to a moving average model of infinite order, as made precise in Section 5.2. In the normal case we know that any causal ARMA model may be represented as an infinite order moving average model. Our approach, then, is to use this equivalence to define what we call EDARMA($p,q$) processes from the infinite order moving average processes in Section 5.2.
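For Poisson margins, where the thinning operation $A_t(\cdot;\alpha)$ is binomial thinning, the AR(1) construction (5.1.1) can be simulated directly. The following Python sketch is a hypothetical illustration: the marginal $\mathrm{Poisson}(\lambda)$ distribution follows from the convolution formula (5.1.2), and the lag-one autocorrelation equals $\alpha$:

```python
import numpy as np

def poisson_ar1(lam, alpha, T, seed=0):
    """Simulate X_t = A_t(X_{t-1}; alpha) + eps_t with Poisson(lam) margins:
    A_t is binomial thinning and eps_t ~ Poisson((1 - alpha)*lam), so that
    Poisson(alpha*lam) convolved with Poisson((1-alpha)*lam) gives Poisson(lam)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T, dtype=np.int64)
    x[0] = rng.poisson(lam)
    for t in range(1, T):
        thinned = rng.binomial(x[t - 1], alpha)          # A_t(X_{t-1}; alpha)
        x[t] = thinned + rng.poisson((1.0 - alpha) * lam)
    return x

x = poisson_ar1(lam=5.0, alpha=0.6, T=20000)
```

Empirically, the simulated series has mean and variance both near $\lambda$, as a Poisson margin requires, and lag-one autocorrelation near $\alpha$.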
Although this approach is less direct than that of Joe (1996a), it is mathematically more convenient, and allows us to transplant many techniques from linear time-series analysis into our framework.

5.2 Moving average processes of infinite order

We now use the thinning operation to construct a class of moving average processes of infinite order with univariate margins of exponential dispersion model form. Moving average processes of infinite order with normal margins have been considered in many applied fields such as geophysics (Dimri, 1992) and spectroscopy (Jansson, 1984), and are also known as convolution models. The extension to an infinite order moving average model may be based on the Hilbert space L^2(\Omega, \mathcal{A}, P) whenever second moments exist (c.f. Brockwell and Davis 1987, Chapter 2). Without moment assumptions (at the extremes of the canonical parameter domain), the infinite sums appearing below may be interpreted in terms of convergence in probability. From now on ED*(\theta, \lambda) denotes an infinitely divisible exponential dispersion model.

Definition 5.1 The process {X_t}_{t=-\infty}^{\infty} defined by

X_t = \sum_{j=0}^{\infty} A_{t,j}(\epsilon_{t-j}; \alpha_j),  (5.2.1)

where the coefficients \alpha_0, \alpha_1, \ldots \in [0,1] satisfy \alpha_0 = 1 and \alpha_+ = \sum_{j=0}^{\infty} \alpha_j < \infty, is said to be an infinite order moving average process with margins ED*(\theta, \lambda), denoted EDMA(\infty), provided:

1. the innovations {\epsilon_t}_{t=-\infty}^{\infty} are i.i.d. ED*(\theta, \lambda/\alpha_+);

2. the thinning operators {A_{t,j}(\epsilon_{t-j}; \alpha_j)}_{j \geq 0, -\infty < t < \infty} are conditionally independent given the innovations {\epsilon_t}_{t=-\infty}^{\infty}.

Since A_{t,0}(\epsilon_t; 1) = \epsilon_t, we may write the EDMA(\infty) process in the form

X_t = \epsilon_t + \sum_{j=1}^{\infty} A_{t,j}(\epsilon_{t-j}; \alpha_j).

It is useful to remember in the sequel that the terms of this infinite sum are independent with distributions A_{t,j}(\epsilon_{t-j}; \alpha_j) \sim ED*(\theta, \alpha_j \lambda / \alpha_+).
The distribution of any partial sum is easily calculated from this result by the convolution formula (5.1.2). The EDMA(\infty) construction produces a large class of what appear to be new infinite order moving average models. If we take \alpha_j = 0 for j > q, the EDMA(\infty) process reduces to a moving average process of finite order q, denoted EDMA(q), as defined in (5.1.3). This model contains several known time-series models as special cases, corresponding to the following marginal distributions: Poisson (Al-Osh and Alzaid, 1987; McKenzie, 1988), gamma (Lewis, 1983; Lewis, McKenzie and Hugus, 1989), inverse Gaussian (Joe, 1996a), negative binomial (Joe, 1996a) and generalized Poisson (Alzaid and Al-Osh, 1993). Note that, in the case of negative binomial margins, the EDMA(q) model is different from that defined by McKenzie (1986) and Al-Osh and Aly (1992).

5.2.1 Hidden process

Let us define \sigma-algebras M_k = \sigma\{\epsilon_k, \epsilon_{k-1}, \ldots\} and M = \sigma\{\ldots, \epsilon_{-1}, \epsilon_0, \epsilon_1, \ldots\}. The process {Y_t} defined from {X_t} by Y_t = E(X_t | M) = E(X_t | M_t) is called the hidden process corresponding to {X_t}. Noting that

Y_t = \sum_{j=0}^{\infty} \alpha_j \epsilon_{t-j},  (5.2.2)

we find that Y_t is an ordinary strictly stationary infinite order moving average process, except that the innovations \epsilon_t follow an exponential dispersion model, rather than Gaussian white noise. We may then write X_t in the form

X_t = \sum_{j=0}^{\infty} (\delta_{jt} + \alpha_j \epsilon_{t-j}) = \delta_t + Y_t,  (5.2.3)

where \delta_{jt} = A_{t,j}(\epsilon_{t-j}; \alpha_j) - \alpha_j \epsilon_{t-j} and \delta_t = \sum_{j=0}^{\infty} \delta_{jt}, which both have zero mean, and the \delta_t's are independent. When second moments exist, \delta_{jt} and \epsilon_{t-j} are uncorrelated, and so are \delta_t and Y_t. We may hence think of X_t as a kind of state space model corresponding to the hidden process Y_t. In the case of a normal marginal distribution, (5.2.2) is a normal moving average process, and X_t is then a normal state space model corresponding to the hidden process Y_t.
Hence, in the case of normal margins, the EDMA(q) processes have a different structure than the classical normal MA(q) processes.

5.2.2 Autocorrelation function

Assume now that \theta \in \mathrm{int}\,\Theta, let \mu = \lambda\tau(\theta) and \sigma^2 = \lambda\tau'(\theta) denote respectively the mean and variance of X_t, and let \mu_\epsilon = \mu/\alpha_+ and \sigma_\epsilon^2 = \sigma^2/\alpha_+ denote respectively the mean and variance of \epsilon_t. By the properties recorded above, it follows that the hidden process Y_t has mean and variance

E(Y_t) = \mu, \quad \mathrm{Var}(Y_t) = \sigma_\epsilon^2 \sum_{j=0}^{\infty} \alpha_j^2.

The autocovariance function of Y_t at lag h \geq 0 is given by

\mathrm{Cov}(Y_t, Y_{t+h}) = \sigma_\epsilon^2 \sum_{j=0}^{\infty} \alpha_j \alpha_{j+h},

and the corresponding autocorrelation function is

\rho_Y(h) = \frac{\sum_{j=0}^{\infty} \alpha_j \alpha_{j+h}}{\sum_{j=0}^{\infty} \alpha_j^2}.  (5.2.4)

Since X_t and Y_t have the same autocovariances for lags h > 0, the autocorrelation function of X_t is, for h > 0,

\rho_X(h) = \frac{1}{\alpha_+} \sum_{j=0}^{\infty} \alpha_j \alpha_{j+h}.  (5.2.5)

In particular, the autocorrelation function of the EDMA(q) process is \sum_{j=0}^{q-h} \alpha_j \alpha_{j+h} / \alpha_+ \in [0, (1 + q - h)/(1 + q)] for 1 \leq h \leq q, and 0 otherwise, in agreement with results of Joe (1996a). We have not investigated the range of the autocorrelation function (5.2.5) in detail, but apparently the range is a subset of that of (5.2.4). Note in particular that \rho_X(h) tends faster to zero as h tends to infinity, because of the assumption \sum \alpha_j < \infty, compared with (5.2.4) where only \sum \alpha_j^2 < \infty is assumed.

5.3 Autoregressive moving average processes

As we know from classical time-series analysis, any causal autoregressive moving average time-series model may be expressed as an infinite order moving average model. This equivalence is now employed to define autoregressive moving average processes with exponential dispersion model margins.
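Formula (5.2.5) is easy to evaluate numerically once the coefficient sequence is truncated. The sketch below is illustrative only (the function name is ours); it checks the formula against the closed form a^h/(1+a) that holds for geometric coefficients \alpha_j = a^j, as derived for the EDAR(1) process in Section 5.4.1.

```python
import numpy as np

def edma_acf(alpha, h):
    """Autocorrelation rho_X(h) = sum_j alpha_j * alpha_{j+h} / alpha_+  (eq. 5.2.5)
    for a truncated coefficient sequence alpha with alpha[0] == 1; h = 0 returns 1,
    the lag-zero autocorrelation of X_t itself."""
    a = np.asarray(alpha, dtype=float)
    if h == 0:
        return 1.0
    if h >= len(a):
        return 0.0
    return float(a[:-h] @ a[h:]) / a.sum()

# Geometric coefficients alpha_j = a**j: rho_X(h) should equal a**h / (1 + a).
a = 0.5
coeffs = a ** np.arange(200)          # truncation error is negligible at 200 terms
print(edma_acf(coeffs, 1))            # close to 0.5 / 1.5
```

The same function applies to an EDMA(q) process by passing a coefficient vector that is zero beyond lag q.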
Definition 5.2 An EDMA(\infty) process {X_t} is said to be an autoregressive moving average process of order (p, q) with margins ED*(\theta, \lambda), denoted EDARMA(p, q), if the coefficients \alpha_j are determined by

\sum_{j=0}^{\infty} \alpha_j z^j = \frac{\psi(z)}{\phi(z)},  (5.3.1)

where \phi and \psi are polynomials of degree p and q, respectively, given by

\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p, \quad \psi(z) = 1 + \psi_1 z + \cdots + \psi_q z^q,

and where \phi(z) \neq 0 for |z| \leq 1.

In the case p = 0 we have EDARMA(0, q) = EDMA(q), a moving average process of order q with coefficients \alpha_j = \psi_j, j = 1, \ldots, q. When q = 0 the process is called an autoregressive process of order p, denoted EDAR(p), which we study in Section 5.4. The coefficients {\alpha_j} for the EDAR(p) process are determined by the relation \sum_{j=0}^{\infty} \alpha_j z^j = \phi^{-1}(z), and may be found by expanding \phi^{-1}(z) in a power series. Similarly, in the general case the coefficients may be obtained by expanding the ratio \psi(z)/\phi(z) on the right-hand side of (5.3.1) in a power series, in the usual fashion.

Let us consider the hidden process {Y_t} corresponding to {X_t}, assuming that X_t has finite expectation. From (5.2.2) and (5.3.1), letting B denote the backward shift operator, we obtain \phi(B) Y_t = \psi(B) \epsilon_t. Hence Y_t is an ordinary ARMA process given by

Y_t - \phi_1 Y_{t-1} - \cdots - \phi_p Y_{t-p} = \epsilon_t + \psi_1 \epsilon_{t-1} + \cdots + \psi_q \epsilon_{t-q},  (5.3.2)

in terms of the exponential dispersion model innovations \epsilon_t. In this case we may hence think of the EDARMA(p, q) process X_t as a state space model corresponding to a hidden ARMA process.

5.4 Autoregressive processes

In this section we assume that second moments are finite. When X_t is an autoregressive EDAR(p) process, the hidden process Y_t is an AR(p) process,

Y_t - \phi_1 Y_{t-1} - \cdots - \phi_p Y_{t-p} = \epsilon_t.  (5.4.1)

It follows that the autocorrelation function \rho_Y(h) satisfies the recursion formula \rho_Y(h) = \phi_1 \rho_Y(h-1) + \cdots + \phi_p \rho_Y(h-p), just as in the normal case. For the EDAR(p) process X_t,
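The power-series expansion of \psi(z)/\phi(z) in (5.3.1) amounts to a simple recursion: matching coefficients of z^j in \psi(z) = \phi(z) \sum_j \alpha_j z^j gives \alpha_j = \psi_j + \sum_{i=1}^{\min(j,p)} \phi_i \alpha_{j-i}. A small sketch of this computation (the function name and the numerical coefficients are ours, not the thesis's):

```python
import numpy as np

def edarma_coeffs(phi, psi, n):
    """First n coefficients alpha_j of psi(z)/phi(z)  (eq. 5.3.1), with
    phi(z) = 1 - phi_1 z - ... - phi_p z^p and psi(z) = 1 + psi_1 z + ... + psi_q z^q.
    Uses alpha_j = psi_j + sum_i phi_i * alpha_{j-i}."""
    alpha = np.zeros(n)
    psi_full = np.zeros(n)
    psi_full[: len(psi) + 1] = [1.0] + list(psi)   # psi_0 = 1 by convention
    for j in range(n):
        alpha[j] = psi_full[j] + sum(phi[i - 1] * alpha[j - i]
                                     for i in range(1, min(j, len(phi)) + 1))
    return alpha

# EDAR(1) with phi_1 = 0.5: alpha_j should be the powers 1, 0.5, 0.25, ...
print(edarma_coeffs([0.5], [], 5))
```

For an EDARMA(1,1) with \phi_1 = 0.5 and \psi_1 = 0.3, the same recursion gives \alpha_0 = 1, \alpha_1 = 0.8, and \alpha_j = 0.5\,\alpha_{j-1} thereafter.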
a modified recursive relation holds for its autocorrelation function \rho_X(h), as shown in the following theorem. No such recursive relations seem to hold for the autocorrelation function of Joe's (1996a) autoregressive processes.

Theorem 5.1 Define \rho(h) = \rho_X(h) for h \neq 0 and \rho(0) = \sum_{j=0}^{\infty} \alpha_j^2 / \alpha_+. If X_t follows an EDAR(p) process, then \rho satisfies the recursive relation

\rho(h) = \phi_1 \rho(h-1) + \cdots + \phi_p \rho(h-p) \quad \text{for } h > 0.

Proof From equation (5.3.1) with \psi(z) = 1, we obtain

\phi(z) \sum_{j=0}^{\infty} \alpha_j z^j = 1.

Identifying the coefficients of the term z^{j+h} on both sides, we obtain

\alpha_{j+h} = \phi_1 \alpha_{j+h-1} + \cdots + \phi_p \alpha_{j+h-p}.

It follows that

\frac{\sum_{j=0}^{\infty} \alpha_j \alpha_{j+h}}{\alpha_+} = \phi_1 \frac{\sum_{j=0}^{\infty} \alpha_j \alpha_{j+h-1}}{\alpha_+} + \cdots + \phi_p \frac{\sum_{j=0}^{\infty} \alpha_j \alpha_{j+h-p}}{\alpha_+},

and so \rho(h) = \phi_1 \rho(h-1) + \cdots + \phi_p \rho(h-p). \square

5.4.1 EDAR(1) process

We now investigate the EDAR(1) process and compare it with Joe's (1996a) AR(1) process (5.1.1). The coefficients of the EDAR(1) process are given by \alpha_j = a^j, where a = \phi_1 \in (0,1). It follows that \alpha_+ = \sum_{j=0}^{\infty} a^j = 1/(1-a), so that \epsilon_t \sim ED*(\theta, (1-a)\lambda). From (5.2.5), the autocorrelation function is \rho_X(h) = a^h/(1+a) for h > 0. With gamma margins, the process has the form

X_t = \epsilon_t + A_{t,1}\epsilon_{t-1} + \cdots + A_{t,j}\epsilon_{t-j} + \cdots,  (5.4.2)

where the A_{t,j}'s are independent and A_{t,j} \sim \mathrm{Beta}(a^j \lambda, (1 - a^j)\lambda). By comparison, the process (5.1.1) takes the form (1 - G_t B) X_t = \epsilon_t, where G_t \sim \mathrm{Beta}(a\lambda, (1-a)\lambda). The autocorrelation function of this process has the familiar form a^h. By inversion, we obtain

X_t = \epsilon_t + G_t \epsilon_{t-1} + \cdots + G_t G_{t-1} \cdots G_{t-j+1} \epsilon_{t-j} + \cdots,  (5.4.3)

which is different from the expansion (5.4.2). While both models may thus be written as infinite order moving averages with random coefficients, it is clear that there is more than one way to define an autoregressive process.
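The recursion of Theorem 5.1 can be checked numerically against direct evaluation of (5.2.5). The sketch below uses hypothetical EDAR(2) coefficients (\phi_1, \phi_2) = (0.5, 0.3); note that evaluating (5.2.5) at h = 0 already yields the theorem's modified value \rho(0) = \sum_j \alpha_j^2 / \alpha_+, which is strictly less than 1.

```python
import numpy as np

# Hypothetical EDAR(2) process with (phi_1, phi_2) = (0.5, 0.3).
phi = [0.5, 0.3]
n = 400                                    # truncation level; alpha_j decays geometrically
alpha = np.zeros(n)
alpha[0] = 1.0
for j in range(1, n):                      # alpha_j from the expansion of 1/phi(z)
    alpha[j] = sum(phi[i] * alpha[j - 1 - i] for i in range(min(j, 2)))
a_plus = alpha.sum()

def rho(h):
    """rho of Theorem 5.1: rho_X(|h|) for h != 0, and sum_j alpha_j^2 / alpha_+ at h = 0."""
    h = abs(h)
    return float(alpha[: n - h] @ alpha[h:]) / a_plus

# Recursion rho(h) = phi_1 rho(h-1) + phi_2 rho(h-2), using rho(-1) = rho(1).
for h in (1, 2, 5):
    assert abs(rho(h) - (phi[0] * rho(h - 1) + phi[1] * rho(h - 2))) < 1e-10
print(rho(0), rho(1))                      # rho(0) < 1, unlike an ordinary ACF
```

The negative-lag value \rho(-1) = \rho(1) enters the h = 1 step, consistent with the symmetry of the autocorrelation function.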
5.4.2 Partial autocorrelation function

We now investigate the partial autocorrelation functions of both {X_t} and {Y_t} in the case of an EDAR(p) process. In order to handle the process X_t we need the following generalization of the classical definition of the partial autocorrelation function.

Definition 5.3 The partial autocorrelation function \pi_X(\cdot) of a stationary time series {X_t} is defined by \pi_X(1) = \mathrm{Corr}(X_2, X_1) = \rho_X(1), and for h \geq 2,

\pi_X(h) = \mathrm{Corr}\{X_{h+1} - E(X_{h+1} | X_2, \ldots, X_h),\; X_1 - E(X_1 | X_2, \ldots, X_h)\}.

For the normal distribution the conditional expectation E(X_1 | X_2, \ldots, X_h) is the regression of X_1 onto the variables 1, X_2, \ldots, X_h, and similarly for E(X_{h+1} | X_2, \ldots, X_h), so the definition reduces to the usual one in this case. It is simple to see that

\pi_X(h) = \mathrm{Corr}\{X_{h+1} - E(X_{h+1} | X_2, \ldots, X_h),\; X_1\}.  (5.4.4)

The partial autocorrelation function of Y_t is the same as for the classical AR(p) process. In particular, the representation (5.4.1) shows that \pi_Y(h) = 0 for h > p. While it is difficult to compute the partial autocorrelation function \pi_X(h) of X_t for lags p or less, the following result confirms the autoregressive nature of the EDAR(p) process by showing that partial autocorrelations for higher lags are zero.

Theorem 5.2 The partial autocorrelation function \pi_X(h) of the EDAR(p) process is zero for lags h > p.

Proof By (5.4.4) it is sufficient to prove that for h > p,

\mathrm{Cov}\{X_{h+1} - E(X_{h+1} | X_2, \ldots, X_h),\; X_1\} = 0.  (5.4.5)

Combining (5.2.3) and (5.4.1), we obtain

X_t - \sum_{j=1}^{p} \phi_j X_{t-j} = \epsilon_t + \delta_t - \sum_{j=1}^{p} \phi_j \delta_{t-j}.  (5.4.6)

It follows that for h > p,

E(X_{h+1} | X_2, \ldots, X_h) = \sum_{j=1}^{p} \phi_j X_{h+1-j} + \mu_\epsilon + E(\delta_{h+1} | X_2, \ldots, X_h) - E\Big(\sum_{j=1}^{p} \phi_j \delta_{h+1-j} \,\Big|\, X_2, \ldots, X_h\Big)
 = \sum_{j=1}^{p} \phi_j X_{h+1-j} + \mu_\epsilon + E(\delta_{h+1} | X_2, \ldots, X_h) - \sum_{j=1}^{p} \phi_j \delta_{h+1-j},
where the last equality is a consequence of \sigma(\delta_{h+1-p}, \ldots, \delta_h) \subset \sigma(X_2, \ldots, X_h), because of (5.2.3). Now since

E(\delta_{h+1} | X_2, \ldots, X_h) = E\{E(X_{h+1} - Y_{h+1} | M, X_2, \ldots, X_h) | X_2, \ldots, X_h\} = E\{E(X_{h+1} | M) - Y_{h+1} | X_2, \ldots, X_h\} = 0,

it follows that the left-hand side of (5.4.5) is

\mathrm{Cov}\{X_{h+1} - E(X_{h+1} | X_2, \ldots, X_h),\; X_1\} = \mathrm{Cov}(\delta_{h+1} + \epsilon_{h+1} - \mu_\epsilon,\; X_1) - \mathrm{Cov}\{E(\delta_{h+1} | X_2, \ldots, X_h),\; X_1\} = 0,

completing the proof. \square

Finally, we consider the Yule-Walker equations for the EDAR(p) process. First, it is easy to show that for j = n, \ldots, 1 with n \leq p,

\mathrm{Cov}\{X_{n+1} - E(X_{n+1} | X_1, \ldots, X_n),\; X_j\} = 0.

Taking n = p and going through arguments similar to the above, we obtain

\mathrm{Cov}\{E(X_{p+1} | X_1, \ldots, X_p),\; X_j\} = \sum_{i=1}^{p} \phi_i \mathrm{Cov}(Y_{p+1-i}, X_j) = \sum_{i=1}^{p} \phi_i \mathrm{Cov}(X_{p+1-i}, X_j).

Therefore, for j = p, \ldots, 1,

\sum_{i=1}^{p} \phi_i \mathrm{Cov}(X_{p+1-i}, X_j) = \mathrm{Cov}(X_{p+1}, X_j).

Defining \rho_0 = \sum_j \alpha_j^2 / \alpha_+, we obtain the following Yule-Walker-like equations:

\begin{pmatrix} \rho_0 & \rho_X(1) & \rho_X(2) & \cdots & \rho_X(p-1) \\ \rho_X(1) & \rho_0 & \rho_X(1) & \cdots & \rho_X(p-2) \\ \vdots & & & & \vdots \\ \rho_X(p-1) & \rho_X(p-2) & \rho_X(p-3) & \cdots & \rho_0 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \rho_X(1) \\ \rho_X(2) \\ \vdots \\ \rho_X(p) \end{pmatrix}.

Unfortunately, it is not completely straightforward to estimate the coefficients \phi_1, \ldots, \phi_p from this equation based on sample moments, because of the presence of \rho_0. We have

\mathrm{Var}(X_t - Y_t) = \sigma^2 (1 - \rho_0),

allowing us to estimate \rho_0 from the variance of X_t and that of X_t - Y_t, but of course Y_t is not observable.

5.5 Truncation error

We now consider briefly, for the EDMA(\infty) process, the error incurred by truncating the infinite sum (5.2.1) to a finite number of terms. Let us denote the truncated series by

X_t^N = \sum_{j=0}^{N} A_{t,j}(\epsilon_{t-j}; \alpha_j).  (5.5.1)

Then

X_t - X_t^N \sim ED^*\Big(\theta,\; \frac{\alpha_+ - \alpha_{(N)}}{\alpha_+} \lambda\Big),

where \alpha_{(N)} = \sum_{j=0}^{N} \alpha_j. When the second moments are finite, the corresponding mean squared error is

E(X_t - X_t^N)^2 = \sigma^2 \frac{\alpha_+ - \alpha_{(N)}}{\alpha_+} + \mu^2 \Big(\frac{\alpha_+ - \alpha_{(N)}}{\alpha_+}\Big)^2.
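When \rho_0 and the autocorrelations are available, solving the Yule-Walker-like system is ordinary linear algebra. The sketch below (our own illustration, with hypothetical coefficients) builds the system from known \alpha_j and recovers the autoregressive coefficients; the practical difficulty noted in the text is that \rho_0 is not directly estimable from data.

```python
import numpy as np

# Hypothetical EDAR(2) coefficients; rho_0 and rho_X(h) are computed from alpha_j here.
phi_true = [0.5, 0.3]
n = 400
alpha = np.zeros(n)
alpha[0] = 1.0
for j in range(1, n):
    alpha[j] = sum(phi_true[i] * alpha[j - 1 - i] for i in range(min(j, 2)))
a_plus = alpha.sum()
rho = lambda h: float(alpha[: n - h] @ alpha[h:]) / a_plus   # yields rho_0 at h = 0

p = 2
# Left-hand matrix has rho_0 (not 1) on the diagonal and rho_X(|i - j|) off it.
R = np.array([[rho(abs(i - j)) for j in range(p)] for i in range(p)])
r = np.array([rho(h) for h in range(1, p + 1)])
phi_hat = np.linalg.solve(R, r)
print(phi_hat)                              # recovers phi_true up to truncation error
```

This is a consistency check of the system rather than an estimator: with sample moments one would still need a value for \rho_0.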
To obtain E(X_t - X_t^N)^2 \leq \eta, where \eta is a given tolerance level, it is hence enough to choose N so large that

\alpha_{(N)} \geq \alpha_+ \Big(1 - \frac{\eta}{\sigma^2 + \mu^2}\Big).  (5.5.2)

In the special case of an EDARMA(p, q) process we have \alpha_+ = (1 + \psi_1 + \cdots + \psi_q)/(1 - \phi_1 - \cdots - \phi_p), and N may be determined from (5.5.2) by a simple search. For the EDAR(1) process with \alpha_j = a^j we obtain \alpha_+ = 1/(1-a), and equation (5.5.2) becomes a^{N+1} \leq \eta/(\sigma^2 + \mu^2), for which it is sufficient to take

N \geq \frac{\log \eta - \log(\sigma^2 + \mu^2)}{\log a}.

5.6 Further questions

The ideas of this chapter may be extended in several directions, and many issues, such as parameter estimation and inference, remain to be addressed. One may construct state space models in various ways from the processes constructed above. To take an example, let Z_t be a given stationary EDMA(\infty) process with marginal distribution ED*(\theta, \lambda/\alpha). If \eta_t is independent of the Z_t and has distribution ED*(\theta, \delta), then the process defined by

X_t = A_t(Z_t; \alpha) + \eta_t  (5.6.3)

has marginal distribution ED*(\theta, \lambda + \delta). If X_t is observed and Z_t is latent, we have a simple stationary state space model. A variation of this idea is obtained by letting X_t = Z_t + \eta_t, where Z_t has an inverse Gaussian margin and \eta_t follows a reciprocal inverse Gaussian distribution (c.f. Seshadri 1992). A moving average process of infinite order, MA(\infty), with normal margins,

X_t = \sum_{j=0}^{\infty} \alpha_j \epsilon_{t-j},  (5.6.4)

which, as mentioned earlier, is also known as a convolution process, has a variety of applications in many branches of science. An important question is that of deconvolution, whereby the sequence {\epsilon_t} is recovered from the observed process {X_t} and the parameters {\alpha_j}. There is a large literature available on approaches to the deconvolution of (5.6.4) for normal systems (c.f.
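The search for the truncation order N described after (5.5.2) is straightforward to code. The sketch below is our own illustration with hypothetical numbers; it finds the smallest N meeting the sufficient condition a^{N+1} \leq \eta/(\sigma^2 + \mu^2) for an EDAR(1) process.

```python
import math

def truncation_order(a, mu, sigma2, eta):
    """Smallest N with E(X_t - X_t^N)^2 <= eta for an EDAR(1) process (alpha_j = a**j),
    via the sufficient condition a**(N+1) <= eta / (sigma2 + mu**2) from (5.5.2)."""
    target = eta / (sigma2 + mu * mu)
    N = 0
    while a ** (N + 1) > target:        # a**(N+1) is the untruncated fraction of alpha_+
        N += 1
    return N

# Hypothetical numbers: a = 0.6, mu = 2, sigma^2 = 4, tolerance eta = 1e-4.
N = truncation_order(0.6, 2.0, 4.0, 1e-4)
print(N)
```

A looser tolerance \eta gives a smaller N, and N grows only logarithmically in 1/\eta because the tail fraction decays geometrically.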
Dimri 1992, Jansson 1984 and references therein). We may hence ask if it is possible to deconvolve our model

X_t = \sum_{j=0}^{\infty} A_{t,j}(\epsilon_{t-j}; \alpha_j),

that is, to recover {\epsilon_t} from the observed process {X_t}. A possible solution is to estimate the hidden process Y_t by means of filtering methods. Since Y_t is of the form (5.6.4), one may then adapt existing deconvolution methods to the filtered version of Y_t.

Chapter 6

BLUP, Kalman filter and smoother

This chapter makes the key technical preparations for results to be developed in the next four chapters. We start with the definition of the best linear unbiased predictor (BLUP) and investigate some of its basic properties. We then develop the Kalman filter and smoother, originally proposed by Kalman (1960) and Kalman and Bucy (1961), under a structure more general than the classical linear model. The reason for studying such a generalization is that the latent process of the state space model considered in Chapter 7 has a non-linear stochastic structure, described by the thinning operation (c.f. Section 2.5), and it requires the Kalman filter and smoother in a broader framework than the classical one presented in, for example, Harvey (1981). In the literature there are some generalizations of the Kalman filter theory to non-linear models, for instance those of Kitagawa (1987), Spall (1988) and Naik-Nimbalkar and Rajarshi (1995). But, as far as we know, none has yet dealt with the theory under a stochastic functional model (c.f. Equation (7.2.2)).

6.1 Some general results for linear prediction

Throughout this section we consider random vectors X, Y and Z with finite second moments, and denote their means, variances and covariances by

E \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} \mu_X \\ \mu_Y \\ \mu_Z \end{pmatrix} \quad \text{and} \quad \mathrm{Var} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} \Sigma_X & \Sigma_{XY} & \Sigma_{XZ} \\ \Sigma_{YX} & \Sigma_Y & \Sigma_{YZ} \\ \Sigma_{ZX} & \Sigma_{ZY} & \Sigma_Z \end{pmatrix}.
We define the linear predictor (or least squares prediction) of X given Y by X|Y \sim [m_{X|Y}; C_{X|Y}], where m_{X|Y} and C_{X|Y}, called respectively the predictor (or BLUP) and the mean square error, are given by

m_{X|Y} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (Y - \mu_Y)  (6.1.1)

and

C_{X|Y} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX}.  (6.1.2)

For the true moments we use the Greek letters \mu and \Sigma, while the linear predictor is denoted by the Roman letters m and C, with appropriate subscripts to denote the variables in question. When no conditioning is involved, the linear predictor gives the true mean and variance. If X and Y are jointly multivariate normally distributed, then m_{X|Y} and C_{X|Y} coincide with the conditional mean and variance, which is not generally the case otherwise. Furthermore, m_{X|Y} is the linear combination of the elements of Y that minimizes the mean square error for prediction of X, i.e. the best linear unbiased predictor (BLUP) (c.f. Brockwell and Davis 1991, p. 64). In general the linear predictor behaves much like the conditional mean and variance do in the context of the multivariate normal distribution, as the following list of properties shows.

1. The mean square error may be written in the form

C_{X|Y} = E\{(X - m_{X|Y})(X - m_{X|Y})^T\} = \mathrm{Var}(X - m_{X|Y}).

Thus C_{X|Y} resembles \mathrm{Var}\{X - E(X|Y)\} = E\{\mathrm{Var}(X|Y)\}.

2. For suitable matrices or scalars \alpha and \beta, we have (\alpha X + \beta)|Y \sim [\alpha m_{X|Y} + \beta;\; \alpha C_{X|Y} \alpha^T].

3. For the multivariate normal distribution, the conditional mean is a linear function of Y and the conditional variance does not depend functionally on the value of Y. Similarly for linear predictors, we have that if E(X|Y) is linear in Y then

X|Y \sim [E(X|Y);\; E\{\mathrm{Var}(X|Y)\}].  (6.1.3)

4. The prediction error X - m_{X|Y} is uncorrelated with Y,

\mathrm{Cov}(X - m_{X|Y}, Y) = \mathrm{Cov}(X, Y) - \Sigma_{XY} \Sigma_Y^{-1} \mathrm{Var}(Y) = 0,

an important and often used property.

5.
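Equations (6.1.1)–(6.1.2) translate directly into linear algebra. The sketch below (our own illustration, with made-up moments) computes the BLUP and its mean square error and checks the uncorrelated-error property numerically.

```python
import numpy as np

def blup(mu_x, mu_y, S_x, S_xy, S_y, y):
    """Linear predictor X|Y ~ [m_{X|Y}; C_{X|Y}] of (6.1.1)-(6.1.2):
    m = mu_x + S_xy S_y^{-1} (y - mu_y),  C = S_x - S_xy S_y^{-1} S_yx."""
    K = S_xy @ np.linalg.inv(S_y)
    m = mu_x + K @ (y - mu_y)
    C = S_x - K @ S_xy.T
    return m, C

# Hypothetical moments: scalar X, bivariate Y.
mu_x, mu_y = np.array([1.0]), np.array([0.0, 0.0])
S_x  = np.array([[2.0]])
S_xy = np.array([[0.8, 0.4]])
S_y  = np.array([[1.0, 0.2], [0.2, 1.0]])
m, C = blup(mu_x, mu_y, S_x, S_xy, S_y, np.array([1.0, -1.0]))
# Property 4: Cov(X - m_{X|Y}, Y) = S_xy - (S_xy S_y^{-1}) S_y = 0.
resid_cov = S_xy - (S_xy @ np.linalg.inv(S_y)) @ S_y
print(m, C, np.allclose(resid_cov, 0.0))
```

With these numbers the gain is (0.75, 0.25), so m = 1.5 and C = 1.3, and C is smaller than the unconditional variance, as it must be.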
If Y and Z are uncorrelated, then X|(Y, Z) \sim [m_{X|Y,Z}; C_{X|Y,Z}], with

m_{X|Y,Z} = \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (Y - \mu_Y) + \Sigma_{XZ} \Sigma_Z^{-1} (Z - \mu_Z)  (6.1.4)

and

C_{X|Y,Z} = \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{YX} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZX}.  (6.1.5)

The following results show that the joint, marginal and conditional linear predictors behave much like the joint, marginal and conditional means and variances for the multivariate normal distribution.

Theorem 6.1 Let X, Y and Z be random vectors with finite second moments. Then the joint predictor of X, Y given Z is given by

\begin{pmatrix} X \\ Y \end{pmatrix} \Big| Z \sim \left[ \begin{pmatrix} m_{X|Z} \\ m_{Y|Z} \end{pmatrix}; \begin{pmatrix} C_{X|Z} & C_{XY|Z} \\ C_{YX|Z} & C_{Y|Z} \end{pmatrix} \right],  (6.1.6)

where C_{XY|Z} = \Sigma_{XY} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZY}. The linear predictor of X given Y and Z (the "conditional" linear predictor) is given by

X|(Y, Z) \sim \left[ m_{X|Z} + C_{XY|Z} C_{Y|Z}^{-1} (Y - m_{Y|Z});\; C_{X|Z} - C_{XY|Z} C_{Y|Z}^{-1} C_{YX|Z} \right].  (6.1.7)

Proof The joint predictor (6.1.6) is obtained directly from the definition. To show (6.1.7), we note that the vectors (Y, Z) have the same span as (Y - m_{Y|Z}, Z), and that the two components of the latter vector are uncorrelated. Therefore (6.1.7) follows from (6.1.4) and (6.1.5), by noting that \mathrm{Cov}(X, Y - m_{Y|Z}) = C_{XY|Z} and \mathrm{Var}(Y - m_{Y|Z}) = C_{Y|Z}. Hence the linear predictor of X given Y and Z is the same as that given (Y - m_{Y|Z}, Z). Using (6.1.4) we find

m_{X|Y,Z} = \mu_X + \big( \mathrm{Cov}(X, Y - m_{Y|Z}),\; \mathrm{Cov}(X, Z) \big) \begin{pmatrix} C_{Y|Z} & 0 \\ 0 & \Sigma_Z \end{pmatrix}^{-1} \begin{pmatrix} Y - m_{Y|Z} \\ Z - \mu_Z \end{pmatrix} = m_{X|Z} + C_{XY|Z} C_{Y|Z}^{-1} (Y - m_{Y|Z}),

and by (6.1.5)

C_{X|Y,Z} = \Sigma_X - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZX} - C_{XY|Z} C_{Y|Z}^{-1} C_{YX|Z} = C_{X|Z} - C_{XY|Z} C_{Y|Z}^{-1} C_{YX|Z}.

This completes the proof. \square

Corollary 6.1 If X and Y are uncorrelated given Z, and either E(X|Z) or E(Y|Z) is linear in Z, then

X \Big| \begin{pmatrix} Y \\ Z \end{pmatrix} = X|Z \sim [m_{X|Z};\; C_{X|Z}].  (6.1.8)

Proof From (6.1.7), if C_{XY|Z} = 0 then (6.1.8) holds. Hence we only need to show C_{XY|Z} = 0 under the given conditions. By symmetry, it is sufficient to prove the result in the case E(X|Z) = \alpha Z + \beta.
Since \mathrm{Cov}(X, Y|Z) = 0, we obtain

\Sigma_{XY} = \mathrm{Cov}(X, Y) = E\{\mathrm{Cov}(X, Y|Z)\} + \mathrm{Cov}\{E(X|Z), E(Y|Z)\} = \mathrm{Cov}\{E(X|Z), E(Y|Z)\} = \mathrm{Cov}(\alpha Z + \beta, Y)

and

\Sigma_{XZ} = \mathrm{Cov}(X, Z) = \mathrm{Cov}\{E(X|Z), Z\} = \mathrm{Cov}(\alpha Z + \beta, Z) = \alpha \Sigma_Z,

implying that

C_{XY|Z} = \Sigma_{XY} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZY} = \alpha \Sigma_{ZY} - \alpha \Sigma_Z \Sigma_Z^{-1} \Sigma_{ZY} = 0.

This proves the corollary. \square

Theorem 6.2 Assume Y|Z \sim [m_{Y|Z}; C_{Y|Z}] and X|(Y, Z) \sim [\alpha Y + \beta Z + \gamma;\; C_{X|YZ}]. The joint predictor of X, Y given Z (called the combination) is

\begin{pmatrix} X \\ Y \end{pmatrix} \Big| Z \sim \left[ \begin{pmatrix} \alpha m_{Y|Z} + \beta Z + \gamma \\ m_{Y|Z} \end{pmatrix}; \begin{pmatrix} C_{X|YZ} + \alpha C_{Y|Z} \alpha^T & \alpha C_{Y|Z} \\ C_{Y|Z} \alpha^T & C_{Y|Z} \end{pmatrix} \right].  (6.1.9)

In particular, X|Z \sim [\alpha m_{Y|Z} + \beta Z + \gamma;\; C_{X|YZ} + \alpha C_{Y|Z} \alpha^T].

Proof The formula for the combination (6.1.9) may be shown by noting that

(\alpha, \beta) = (\Sigma_{XY}, \Sigma_{XZ}) \begin{pmatrix} \Sigma_Y & \Sigma_{YZ} \\ \Sigma_{ZY} & \Sigma_Z \end{pmatrix}^{-1} \quad \text{and} \quad \gamma = \mu_X - \alpha \mu_Y - \beta \mu_Z.

Hence

\Sigma_{XY} = \alpha \Sigma_Y + \beta \Sigma_{ZY} \quad \text{and} \quad \Sigma_{XZ} = \alpha \Sigma_{YZ} + \beta \Sigma_Z.

From the definition (6.1.1) we obtain

m_{X|Z} = \mu_X + (\alpha \Sigma_{YZ} + \beta \Sigma_Z) \Sigma_Z^{-1} (Z - \mu_Z) = \mu_X + \alpha (m_{Y|Z} - \mu_Y) + \beta (Z - \mu_Z) = \alpha m_{Y|Z} + \beta Z + \gamma.

Similarly, it is seen from (6.1.2) that

C_{XY|Z} = \Sigma_{XY} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZY} = \alpha \Sigma_Y + \beta \Sigma_{ZY} - (\alpha \Sigma_{YZ} + \beta \Sigma_Z) \Sigma_Z^{-1} \Sigma_{ZY} = \alpha C_{Y|Z}.  (6.1.10)

The last part of (6.1.9) is obtained by using \beta = (\Sigma_{XZ} - \alpha \Sigma_{YZ}) \Sigma_Z^{-1}, giving

C_{X|YZ} + \alpha C_{Y|Z} \alpha^T = \Sigma_X - \alpha \Sigma_{YX} - \beta \Sigma_{ZX} + \alpha C_{Y|Z} \alpha^T = \Sigma_X - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZX} - \alpha (C_{YX|Z} - C_{Y|Z} \alpha^T).

According to (6.1.10) the last term of this expression vanishes, and hence

C_{X|Z} = \Sigma_X - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{ZX} = C_{X|YZ} + \alpha C_{Y|Z} \alpha^T. \square

6.2 General forms of Kalman filter and smoother

This section presents both the Kalman filter and smoother under a general framework in which only the mean structure is assumed to be linear, extending the theory given by, for example, Harvey (1981). Assume that two processes of random vectors {Y_t} \in R^r and {\theta_t} \in R^q follow the comb structure defined as follows:

Given \theta_t, Y_t is uncorrelated with the rest of the Y_t's, and \theta_t is first-order Markov.  (6.2.1)

Figure 6.1 illustrates the comb structure at time t.
[Figure 6.1: The comb structure.]

In addition, we assume that the processes satisfy the following moment structure:

E(Y_t | \theta_t) = A_t \theta_t + a_t \quad \text{and} \quad \mathrm{Var}(Y_t | \theta_t) = W_t(\theta_t) + W_t^0,  (6.2.2)

and

E(\theta_t | \theta_{t-1}) = B_t \theta_{t-1} + b_t \quad \text{and} \quad \mathrm{Var}(\theta_t | \theta_{t-1}) = D_t(\theta_{t-1}) + D_t^0,  (6.2.3)

where the mean structure is assumed to be linear, and both W_t(\theta_t) and D_t(\theta_{t-1}) are matrices of entrywise functions of their arguments, with the constant terms expressed separately. Such expressions for the conditional variances make the theories developed later in Chapters 7 and 8 easier to implement. Let E\{W_t(\theta_t)\} = \bar{W}_t, E\{D_t(\theta_{t-1})\} = \bar{D}_{t-1} and E(\theta_t) = r_t. Let Y^t be the set of the first t vectors Y_1, \ldots, Y_t. The Kalman filter is the BLUP for \theta_t given Y^t, and the Kalman smoother is the BLUP for \theta_t based on the totality of observations, Y^n say. Both can be calculated recursively by the following theorems.

Theorem 6.3 Under the comb structure and the above moment structure, given the prediction at time t-1, \theta_{t-1}|Y^{t-1} \sim [m_{t-1}; C_{t-1}], the Kalman filter is computed as follows. First compute

\theta_t|Y^{t-1} \sim [B_t m_{t-1} + b_t;\; H_t] \quad \text{and} \quad Y_t|Y^{t-1} \sim [f_t;\; Q_t],

where

H_t = \bar{D}_{t-1} + D_t^0 + B_t C_{t-1} B_t^T,

and

f_t = A_t (B_t m_{t-1} + b_t) + a_t, \quad Q_t = \bar{F}_{t-1} + F_t^0 + A_t B_t C_{t-1} B_t^T A_t^T.

Then update the prediction of \theta_t given Y^t by \theta_t|Y^t \sim [m_t; C_t], where

m_t = B_t m_{t-1} + b_t + H_t^T A_t^T Q_t^{-1} (Y_t - f_t)

and

C_t = H_t - H_t^T A_t^T Q_t^{-1} A_t H_t.

Note that the Kalman filtering is done by a forward updating procedure.

Theorem 6.4 Now assume that all the Kalman filter quantities m_t and C_t are given. Under the comb structure and the moment structure given above, the Kalman smoother can be computed backwards as follows.
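The forward recursion of Theorem 6.3 can be sketched in the scalar case. In the code below (an illustration of ours, not the thesis's implementation) we pass the total variances D = \bar{D} + D^0 and W = \bar{W} + W^0 directly, which makes Q_t = W + A H_t A because \bar{F}_{t-1} + F_t^0 = \bar{W}_t + W_t^0 + A_t(\bar{D}_{t-1} + D_t^0)A_t^T; the toy parameter values are hypothetical.

```python
def kalman_filter(y, A, a, B, b, D, W, m0, C0):
    """Forward recursion of Theorem 6.3, scalar case.  D and W are the total
    state and observation variance components; for each t:
    H = D + B C B,  f = A (B m + b) + a,  Q = W + A H A,
    m <- B m + b + H A Q^{-1} (y_t - f),  C <- H - H A Q^{-1} A H."""
    m, C = m0, C0
    ms, Cs = [], []
    for yt in y:
        H = D + B * C * B                 # prediction variance of theta_t
        f = A * (B * m + b) + a           # prediction mean of Y_t
        Q = W + A * H * A                 # prediction variance of Y_t
        K = H * A / Q                     # gain H_t A_t^T Q_t^{-1}
        m = B * m + b + K * (yt - f)      # filtered mean
        C = H - K * A * H                 # filtered variance
        ms.append(m)
        Cs.append(C)
    return ms, Cs

# Hypothetical scalar model: theta_t = 0.9 theta_{t-1} + noise, Y_t = theta_t + noise.
ms, Cs = kalman_filter([1.0, 0.5, 1.2], A=1.0, a=0.0, B=0.9, b=0.0,
                       D=0.5, W=1.0, m0=0.0, C0=1.0)
print(ms, Cs)
```

The filtered variances C_t do not depend on the data and converge to a fixed point of the Riccati-type recursion, a useful sanity check.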
Given the prediction at time t+1, \theta_{t+1}|Y^n \sim [m_{t+1}^*; C_{t+1}^*], the Kalman smoother and the corresponding error are given by

m_t^* = m_t + C_t B_{t+1}^T H_{t+1}^{-1} (m_{t+1}^* - B_{t+1} m_t - b_{t+1})

and

C_t^* = C_t - C_t B_{t+1}^T H_{t+1}^{-1} B_{t+1} C_t + C_t B_{t+1}^T H_{t+1}^{-1} C_{t+1}^* H_{t+1}^{-1} B_{t+1} C_t.

The recursion starts at t = n by taking m_n^* = m_n and C_n^* = C_n.

We now prove the two theorems.

Proof of Theorem 6.3: It follows from the comb structure that

E(Y_t | \theta_{t-1}) = A_t B_t \theta_{t-1} + A_t b_t + a_t,  (6.2.4)

\mathrm{Var}(Y_t | \theta_{t-1}) = E\{\mathrm{Var}(Y_t | \theta_t) | \theta_{t-1}\} + \mathrm{Var}\{E(Y_t | \theta_t) | \theta_{t-1}\} = F_t(\theta_{t-1}) + F_t^0,

where

F_t(\theta_{t-1}) = E\{W_t(\theta_t) | \theta_{t-1}\} + A_t D_t(\theta_{t-1}) A_t^T \quad \text{and} \quad F_t^0 = W_t^0 + A_t D_t^0 A_t^T,

and

\mathrm{Cov}(Y_t, \theta_t | \theta_{t-1}) = \mathrm{Cov}\{E(Y_t | \theta_t), \theta_t | \theta_{t-1}\} = A_t \mathrm{Var}(\theta_t | \theta_{t-1}) = A_t D_t(\theta_{t-1}) + A_t D_t^0.

Since both conditional expectations E(Y_t | \theta_{t-1}) and E(\theta_t | \theta_{t-1}) are linear in \theta_{t-1}, by the third property (6.1.3) we immediately obtain

\begin{pmatrix} Y_t \\ \theta_t \end{pmatrix} \Big| \theta_{t-1} \sim \left[ \begin{pmatrix} A_t B_t \theta_{t-1} + A_t b_t + a_t \\ B_t \theta_{t-1} + b_t \end{pmatrix}; \begin{pmatrix} \bar{F}_{t-1} + F_t^0 & A_t \bar{D}_{t-1} + A_t D_t^0 \\ \bar{D}_{t-1} A_t^T + D_t^{0T} A_t^T & \bar{D}_{t-1} + D_t^0 \end{pmatrix} \right],  (6.2.5)

where

\bar{F}_{t-1} = E\{F_t(\theta_{t-1})\} = E[E\{W_t(\theta_t) | \theta_{t-1}\}] + A_t E\{D_t(\theta_{t-1})\} A_t^T = \bar{W}_t + A_t \bar{D}_{t-1} A_t^T.

Now assume that we currently know \theta_{t-1}|Y^{t-1} \sim [m_{t-1}; C_{t-1}]. Applying Theorem 6.2, we obtain the prediction of (Y_t, \theta_t) given Y^{t-1} as follows:

\begin{pmatrix} Y_t \\ \theta_t \end{pmatrix} \Big| Y^{t-1} \sim [m_{t|t-1};\; C_{t|t-1}],  (6.2.6)

where the predictor is

m_{t|t-1} = \begin{pmatrix} A_t (B_t m_{t-1} + b_t) + a_t \\ B_t m_{t-1} + b_t \end{pmatrix}

and the prediction error is

C_{t|t-1} = \begin{pmatrix} \bar{F}_{t-1} + F_t^0 & A_t \bar{D}_{t-1} + A_t D_t^0 \\ \bar{D}_{t-1} A_t^T + D_t^{0T} A_t^T & \bar{D}_{t-1} + D_t^0 \end{pmatrix} + \begin{pmatrix} A_t B_t C_{t-1} B_t^T A_t^T & A_t B_t C_{t-1} B_t^T \\ B_t C_{t-1} B_t^T A_t^T & B_t C_{t-1} B_t^T \end{pmatrix}.

In particular, \theta_t|Y^{t-1} \sim [B_t m_{t-1} + b_t;\; H_t], where

H_t = \bar{D}_{t-1} + D_t^0 + B_t C_{t-1} B_t^T,

and Y_t|Y^{t-1} \sim [f_t;\; Q_t], where

f_t = A_t (B_t m_{t-1} + b_t) + a_t, \quad Q_t = \bar{F}_{t-1} + F_t^0 + A_t B_t C_{t-1} B_t^T A_t^T.
B tT ) T A 1 (Ft - i + FÂ° + A t B t C t - r B T A t 7 ) " 1 (Yt - ft) Btmt_i + b t + H t T A 4 T Q - 1 ( Y t - ft) (6.2.8) and Ct = H t - H T A i T Q t - 1 A t H t , (6.2.9) so proof is complete. â€¢ Proof of Theorem 6.4: Suppose that we know the current Kalman smoother 6t+1\Yn ~ [m* + 1; c;+1 (6.2.10) and the entire sequence of the Kalman filters 0 t |Y*~[mt ; C t ] . (6.2.11) Note that for given 9t, 0t+i is independent of Y * by the assumption (6.2.1) of the comb structure, then from (6.2.5), 0t+i|Y*,0t = 0t+i|0t~ fBt+itft + bt+i; Dt + DÂ° t+i (6.2.12) By Theorem 6.2 and both (6.2.11) and (6.2.12), \ Ot+i mt ^ Bt+imt + bt+i J ctBJ+i \ B f + i C t H f + i J (6.2.13) Chapter 6. BLUP, Kalman filter and smoother 85 Furthermore, by Theorem 6.1 and (6.2.13), we have 0t|0t+i,Y* ~ m f + C t B ^ 1 H t ^ 1 ( c 9 t + 1 â€” B i + i r r i t â€” b i + i ) ; C i â€” C t B t ^ . 1 H t ^ 1 B < + i C t . (6.2.14) Similar to the calculation of the equation (6.2.4), it is easy to prove that for each s > 1, E (Y t + s | t9 t + 1 ) is linear in t9 t + 1. From the Corollary 6.1, o \o V â„¢ â€” o \o " V * and again from Theorem 6.2 and (6.2.10), 1 et \ / - i * / - i D T T T T f i * ^ LA m*+i / / (6.2.15) c *+i H t+ iBt+ iCi C * + 1 where m* and C* are the Kalman smoother and the corresponding prediction error given by m* = m , + C f B7 + 1 Hr + \(m* + 1 - B t + 1 m t - b t + 1 ) (6.2.16) and C* = Ct - C t B ^ i H ^ B t + i C f + C t B ^ i H ^ C ^ H ^ B t + x C t . (6.2.17) We complete the proof. â€¢ Chapter 7 State space models with stationary latent process This chapter presents a class of state space models for one-dimensional time series of observations, assuming that the latent processes are the stationary AR(1) models of Joe (1996a) with exponential dispersion model margins and without time-varying covariates. The development of the Kalman filter and smoother under the models is not a straight-forward application of the classic theory (c.f. 
Harvey, 1981), because the latent process is modelled by the thinning operation (c.f. Section 2.5) rather than by the linear model of the classical context. Hence we have to derive both predictors by applying the theory of Chapter 6. Estimates of the parameters are obtained through the so-called Kalman estimating equation (KEE), corresponding to a modified EM-algorithm with the E-step approximated by the Kalman smoother. The asymptotic normality of the estimates is proven rigorously. Model checking diagnostics are also proposed via residual analysis. We illustrate the KEE method on the analysis of the polio incidence data (Zeger, 1988) that were previously analyzed by both Zeger (1988) and Chan and Ledolter (1995).

7.1 Introduction

For simplicity, in this chapter we concentrate on one-dimensional time series data; however, most results in this chapter may be generalized to the multivariate setting in a way similar to what we discuss in Chapter 8. Let {Y_t : 1 \leq t \leq n} be a time series of observations associated with a set of k-dimensional time-varying covariates, {x_t : 1 \leq t \leq n}. The objective of the analysis is to model the observations Y_t as a function of the covariates x_t, with a certain correlation structure introduced via an unobserved latent process \theta_t. The class of state space models considered in the present chapter is structured as follows: the latent process \theta_t follows the stationary first-order autoregressive model AR(1) of Joe (1996a), with a margin in the family of exponential dispersion models and without time-varying covariates. Given the latent process \theta_t, the observations Y_t are conditionally independent with distributions in the Tweedie class (c.f. Section 2.3). Under such a model, the problem of estimation is usually difficult because the latent process results in a complex likelihood function.
Cox (1981) divided regression models for time-dependent data into two classes: "parameter-driven models" and "observation-driven models". The class of models defined above may be viewed as being of the parameter-driven type. Although observation-driven models have some advantages from a computational point of view (c.f. Harvey and Fernandes 1989), the parameter-driven model is conceptually more attractive and has therefore attracted a lot of attention in recent years. It is noteworthy that the parameter-driven model in Cox (1981) tends to stress the latent structure, which does not fully reflect the character of our models, in which the observed part is also (and may be more) important. We prefer and use the term state space model instead of parameter-driven model, although some authors, such as Zeger (1988) and Chan and Ledolter (1995), have used the term parameter-driven model for models similar to ours. State space models, or dynamical models, have been developed extensively since their introduction in the context of Gaussian time-series modelling by Harrison and Stevens (1976) (see also West and Harrison 1989). The models provide a convenient conceptual framework for longitudinal data, because the latent process may be interpreted as an unobserved intensity process, often providing a useful interpretation of the data-generating mechanism. An additional important feature of these models is that the latent process may be estimated by the Kalman filter techniques of Kalman (1963) (c.f. Durbin 1990, Fahrmeir and Kaufmann 1991, Jones 1993, as well as Fahrmeir and Tutz 1994). State space models for univariate count data have been proposed by Azzalini (1982), Zeger (1988) and Chan and Ledolter (1995) (c.f. Grunwald, Raftery and Guttorp 1993).
A major obstacle to the development of state space models for non-Gaussian data is that likelihood-based methods may require computationally intensive techniques such as Markov chain Monte Carlo simulation (see Chan and Ledolter, 1995). A computationally simpler alternative to state space models is the quasi-likelihood approach of Liang and Zeger (1986), which accommodates a wide range of data by using only second-moment assumptions, and is robust against misspecification of the correlation structure, in the sense of producing asymptotically correct standard errors. In particular, models for serially correlated discrete observations have been studied by Zeger (1988), Kaufmann (1987), Stiratelli, Laird and Ware (1984), Zeger, Liang and Self (1985) and Zeger and Qaqish (1988). Further contributions and references may be found in Zeger and Liang (1986), Diggle, Liang and Zeger (1994) and Dwyer et al. (1991).

The less than full efficiency of the quasi-likelihood estimators is not a major concern, but the lack of full distributional assumptions often makes the method difficult to interpret, and leads to an over-reliance on estimated regression coefficients and their standard errors. Many of the state space models for non-Gaussian data proposed in the literature share with the quasi-likelihood approach the problem that the probability model is not fully specified. Similarly, in the dynamic generalized linear model of West, Harrison and Migon (1985), the latent process is specified in terms of first and second moments only. In the exponential discount model (or the Bayesian steady forecasting model) (cf. Smith 1979, Key and Godolphin 1981, West 1986 and Smith and Miller 1986), the state process is only vaguely described via predictive distributions. By contrast, the state space models proposed in the present chapter and Chapter 8
are based on a fully specified class of parametric distributions, opening up some new possibilities for detailed modelling of the longitudinal structure of the data. In particular, we allow time-varying covariates to enter either via the latent process or via the observation model. These two types of covariates are called long-term and short-term covariates, respectively. The long-term covariates feed long-term variations (relatively speaking) through the latent process, whereas the short-term covariates feed variations that are short-term relative to the latent process. As a consequence, we stress the need for checking both the observed and unobserved parts of the model by means of residual analysis, something which may be difficult for partly specified models.

Different structures for the latent process give rise to different modelling interpretations as well as different implementations of the estimation procedures. Therefore, we present our development of the state space models in two separate chapters. In the present chapter, we focus on state space models with a stationary latent process in the absence of long-term covariates. Later, in Chapter 8, we consider state space models with more general latent structures, including time-varying covariates, leading to non-stationary latent processes.

With regard to state space models with a stationary latent process, Zeger (1988) and Chan and Ledolter (1995) proposed two different methods of statistical inference for the regression parameters. Zeger (1988) used a quasi-likelihood approach resembling the GEE method developed for population-averaged models (cf. Liang and Zeger, 1986), while Chan and Ledolter (1995) used a Markov chain Monte Carlo algorithm, a computationally intensive approach.
Both methods were developed for Poisson count data using a log-linear model, but imposed different assumptions on the stationary latent structure: Zeger (1988) considered a weakly stationary latent process specified by its first two moments, while Chan and Ledolter (1995) used a stationary process defined by exp{AR(1)}, where the AR(1) is the first-order autoregressive model with a normal margin. The class of state space models presented in this chapter bases the observed part on the Tweedie family, which covers discrete data (including Poisson counts) as well as continuous and mixed data, and incorporates a more flexible latent part via the class of AR(1) models with exponential dispersion model margins.

To estimate the parameters, we use what we call the Kalman estimating equation (KEE), corresponding to a modified version of the EM-algorithm with the E-step approximated by the Kalman smoother. We provide a rigorous proof of the asymptotic normality of the estimators obtained from the KEE method. Additionally, we develop a systematic approach to model checking diagnosis based on residual analysis, allowing us to avoid any serious misspecification of the models. It is notable that neither Zeger (1988) nor Chan and Ledolter (1995) discussed the problem of model diagnosis.

There is a concern about the difficulty of extending the latent process from the stationary AR(1) to higher-order Markov structures. As we have pointed out before, generalizing the AR(1) process of Joe (1996a) to a higher-order AR process is complicated, as evidenced by the technique of Joe (1996a), which hampers the further development of the models presented in this chapter towards high-order correlation structures. A possible path around this problem would model the latent part via the EDAR models discussed in Chapter 5, since all EDAR models have a unified structural representation regardless of the order of the model.
Finally, as an illustration of the parameter-driven model and the KEE estimation approach, we re-analyze Zeger's polio incidence data and demonstrate our model checking diagnosis in detail. The polio data were previously analyzed by both Zeger (1988) and Chan and Ledolter (1995), who did not discuss any corresponding model diagnoses for the assumptions made in their analyses.

7.2 Models

We focus on one-dimensional time series, although the generalization to multivariate serial data could be done, for example, in a way similar to that presented in Chapter 8.

7.2.1 Definition

Conditional on the latent process $\theta_t$, the $Y_t$ are assumed to be a time series of independent observations following Tweedie exponential dispersion models,

$$Y_t \mid \theta_t \sim \mathrm{Tw}_p(a_t\theta_t,\ \nu^2\theta_t^{1-p}), \qquad t = 1, \dots, n, \qquad (7.2.1)$$

with $a_t = \exp(x_t^T\beta)$, where $x_t$ is a $k$-vector of time-varying covariates, $\beta$ is a $k$-vector of regression parameters of interest and $\nu^2$ is the dispersion parameter. As shown in Section 2.3, the Tweedie class accommodates a considerable range of distributions, including continuous, discrete and mixed types. For the Poisson case ($p = 1$), (7.2.1) reduces to $\nu^2\mathrm{Po}(a_t\theta_t/\nu^2)$, so in this case we take $\nu^2 = 1$ in all formulas throughout the chapter.

Assume that the latent process $\{\theta_t\}$ is the stationary first-order AR(1) with margins in the class of infinitely divisible exponential dispersion models, defined by Joe (1996a) as

$$\theta_t = A_t(\theta_{t-1}; \alpha) + e_t, \qquad t = 1, 2, \dots, \qquad (7.2.2)$$

where $A_t(\cdot\,; \alpha)$ is the thinning operation defined in Section 2.5, $\alpha \in [0,1]$ is the autocorrelation coefficient and $\{e_t\}$ is a sequence of i.i.d. innovations distributed as $\mathrm{ED}^*\{\omega, (1-\alpha)\lambda\}$.

It is worth noting that the values of the latent process must be positive by the definition of the model (7.2.1), which allows only certain exponential dispersion models to be used as marginal distributions for the latent process.
This excludes the normal distribution as a marginal distribution for the latent process. A possible choice of marginal distribution is the gamma distribution, exactly the distribution chosen for the analysis of the polio data (Zeger, 1988) in Section 7.6. The AR(1) model with gamma margins $\mathrm{Ga}^*(\omega, \lambda)$ was first proposed by Lewis et al. (1989) as follows:

$$\theta_t = G_t\theta_{t-1} + e_t, \qquad t = 1, \dots, \qquad (7.2.3)$$

where $\{e_t\}$ is a sequence of i.i.d. innovations with distributions $\mathrm{Ga}^*\{\omega, (1-\alpha)\lambda\}$, and $\{G_t\}$ is a sequence of i.i.d. random variables with beta distribution $\mathrm{Beta}\{\alpha\lambda, (1-\alpha)\lambda\}$.

7.2.2 Moment structure

We now study the moment structure of the model, in order to interpret the correlation structure of the model, and for use in the Kalman filter and smoother in Section 7.3. Throughout the rest of this chapter, we let $\bar\alpha = 1 - \alpha$.

The conditional moment structure

For the observed process (7.2.1), we have

$$\mathrm{E}(Y_t \mid \theta_t) = a_t\theta_t, \qquad \mathrm{Var}(Y_t \mid \theta_t) = a_t^p\nu^2\theta_t. \qquad (7.2.4)$$

For the latent process (7.2.2), because of stationarity, let

$$\mathrm{E}(\theta_t) = \mu = \lambda\tau(\omega), \qquad \mathrm{Var}(\theta_t) = \sigma^2 = \lambda V(\mu/\lambda), \qquad (7.2.5)$$

where $\tau(\omega) = \kappa'(\omega)$ and $V(\cdot)$ is the variance function of the model $\mathrm{ED}^*(\omega, \lambda)$. Then the conditional moments of the latent process are given by

$$\mathrm{E}(\theta_t \mid \theta_{t-1}) = \alpha\theta_{t-1} + \bar\alpha\mu, \qquad \mathrm{Var}(\theta_t \mid \theta_{t-1}) = F(\theta_{t-1}) + \bar\alpha\sigma^2, \qquad (7.2.6)$$

where $F(\theta_{t-1}) = \mathrm{Var}\{A_t(\theta_{t-1}; \alpha) \mid \theta_{t-1}\}$. Note that the conditional variance of the latent model given in (7.2.6) is a non-linear function of $\theta_{t-1}$ and no longer follows the structure of the linear model. By equation (2.5.3), we have $\mathrm{E}\{A_t(\theta_{t-1}; \alpha) \mid \theta_{t-1}\} = \alpha\theta_{t-1}$. Furthermore,

$$\mathrm{E}\{F(\theta_{t-1})\} = \mathrm{Var}\{A_t(\theta_{t-1}; \alpha)\} - \mathrm{Var}[\mathrm{E}\{A_t(\theta_{t-1}; \alpha) \mid \theta_{t-1}\}] = \alpha\sigma^2 - \alpha^2\sigma^2 = \alpha\bar\alpha\sigma^2.$$

The marginal moment structure

The marginal moments of the observed process are given by

$$\mathrm{E}(Y_t) = a_t\mu, \qquad \mathrm{Var}(Y_t) = a_t^p\nu^2\mu + a_t^2\sigma^2,$$

while (7.2.5) gives the marginal moments of the latent process.
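As a concrete illustration, the gamma-margin latent process (7.2.3) together with the Poisson observation model of (7.2.1) is easy to simulate, which also gives a numerical check of the moments above. The following sketch (with function and variable names of our own choosing) uses the beta-gamma algebra: if $\theta$ has the gamma margin with index $\lambda = 1/\sigma^2$ and mean $\mu = 1$, and $G \sim \mathrm{Beta}\{\alpha\lambda, \bar\alpha\lambda\}$, then $G\theta + e$ with a gamma innovation of index $\bar\alpha\lambda$ again has the stationary gamma margin.

```python
import numpy as np

def simulate_poisson_gamma(a, alpha, sigma2, rng):
    """Simulate the gamma AR(1) latent process (7.2.3) with margin of
    mean mu = 1 and variance sigma2 (index lam = 1/sigma2), and counts
    Y_t | theta_t ~ Po(a_t * theta_t) as in (7.2.1) with p = 1.
    (A sketch; the names are ours, not from the thesis.)"""
    lam = 1.0 / sigma2
    n = len(a)
    theta = np.empty(n)
    th = rng.gamma(shape=lam, scale=1.0 / lam)          # stationary start
    for t in range(n):
        G = rng.beta(alpha * lam, (1.0 - alpha) * lam)  # thinning variate G_t
        e = rng.gamma(shape=(1.0 - alpha) * lam, scale=1.0 / lam)  # innovation e_t
        th = G * th + e                                 # theta_t = G_t theta_{t-1} + e_t
        theta[t] = th
    y = rng.poisson(a * theta)                          # conditionally Poisson counts
    return y, theta

rng = np.random.default_rng(0)
a = np.exp(0.2 * np.sin(2 * np.pi * np.arange(2000) / 12))  # toy seasonal a_t
y, theta = simulate_poisson_gamma(a, alpha=0.5, sigma2=0.5, rng=rng)
```

With these settings the sample mean of $\theta_t$ should be close to $\mu = 1$ and its lag-one autocorrelation close to $\alpha$.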
The covariances

We have the following covariances:

$$\mathrm{Cov}(Y_t, \theta_t) = a_t\sigma^2, \qquad \mathrm{Cov}(\theta_t, \theta_{t+s}) = \alpha^s\sigma^2, \qquad (7.2.7)$$

and

$$\mathrm{Cov}(Y_t, Y_{t+s}) = a_t a_{t+s}\alpha^s\sigma^2, \qquad \mathrm{Cov}(Y_t, \theta_{t+s}) = a_t\alpha^s\sigma^2, \qquad \mathrm{Cov}(Y_{t+s}, \theta_t) = a_{t+s}\alpha^s\sigma^2. \qquad (7.2.8)$$

By the definition of the latent process (7.2.2), the autocorrelation function is $\mathrm{Corr}(\theta_t, \theta_{t+s}) = \alpha^s$; moreover, the correlation of the observed process is given by

$$\mathrm{Corr}(Y_t, Y_{t+s}) = \frac{a_t a_{t+s}\alpha^s\sigma^2}{\sqrt{(a_t^p\nu^2\mu + a_t^2\sigma^2)(a_{t+s}^p\nu^2\mu + a_{t+s}^2\sigma^2)}}.$$

7.3 Kalman filter and smoother

We now develop the Kalman filter and smoother. Because the latent process (7.2.2) is a stochastic functional and the variance structure (7.2.6) is non-linear, the classic theory for the derivation of the Kalman filter and smoother is not applicable to our models. Therefore, we must derive these directly from their definitions, using the tools developed in Chapter 6 under a general setting.

Before working out the details, we state a fact that will be useful in the following calculations. Define the innovations of the observed process by $\eta_t = Y_t - \mathrm{E}(Y_t \mid \theta_t) = Y_t - a_t\theta_t$. Then $\{\eta_t\}$ is uncorrelated with any function of $\{\theta_t\}$, and it is also uncorrelated with $\{e_t\}$. A sketch of the proof is as follows. For any measurable function $f$, by the assumption of conditional independence,

$$\mathrm{Cov}\{\eta_t, f(\theta_{t+s})\} = \mathrm{E}\,\mathrm{Cov}\{Y_t - a_t\theta_t, f(\theta_{t+s}) \mid \theta_t\} + \mathrm{Cov}[\mathrm{E}(Y_t - a_t\theta_t \mid \theta_t), \mathrm{E}\{f(\theta_{t+s}) \mid \theta_t\}] = \mathrm{E}\,\mathrm{Cov}\{Y_t, f(\theta_{t+s}) \mid \theta_t\} - a_t\,\mathrm{E}\,\mathrm{Cov}\{\theta_t, f(\theta_{t+s}) \mid \theta_t\} + 0 = 0, \qquad (7.3.1)$$

and $\mathrm{Cov}(\eta_t, e_{t+s}) = 0$ by a similar argument.

7.3.1 Derivation

The Kalman filter predicts a future value of the observed process and the current value of the latent process based on past observations.
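Before giving the derivation, it may help to preview the filter recursions (7.3.2) and (7.3.3) obtained below in algorithmic form. This is a sketch of our own; in particular, the initial values $m_0 = \mu$, $c_0 = \sigma^2$ are our assumption (the data analysis of Section 7.6 instead initializes the filter from a moving-average series of the data).

```python
import numpy as np

def kalman_filter(y, a, alpha, sigma2, nu2=1.0, mu=1.0, p=1):
    """One forward pass of the filter recursions (7.3.2)-(7.3.3).
    Returns filtered means m_t with errors c_t, and one-step forecasts
    f_t with variances Q_t.  (Sketch; initialization m_0 = mu,
    c_0 = sigma2 is our own choice.)"""
    n = len(y)
    m = np.empty(n); c = np.empty(n); f = np.empty(n); Q = np.empty(n)
    m_prev, c_prev = mu, sigma2
    abar = 1.0 - alpha
    for t in range(n):
        u_prev = alpha**2 * c_prev + (1.0 - alpha**2) * sigma2   # u_{t-1}
        f[t] = a[t] * (alpha * m_prev + abar * mu)               # forecast of Y_t
        Q[t] = a[t]**2 * u_prev + a[t]**p * nu2 * mu             # forecast variance
        m[t] = alpha * m_prev + abar * mu + a[t] * u_prev / Q[t] * (y[t] - f[t])
        c[t] = nu2 * mu * a[t]**p * u_prev / Q[t]                # filtering error
        m_prev, c_prev = m[t], c[t]
    return m, c, f, Q

# toy run on a constant series, just to exercise the recursions
y = np.ones(200)
a = np.full(200, 1.3)
m, c, f, Q = kalman_filter(y, a, alpha=0.5, sigma2=0.5)
```

Starting from $c_0 = \sigma^2$, the errors $c_t$ stay in $(0, \sigma^2]$ and settle to a steady state when $a_t$ is constant.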
Noting that the conditional moments (7.2.4) and (7.2.6) of the observed and latent processes satisfy, respectively, the conditions (6.2.2) and (6.2.3), by (6.2.6) we obtain the prediction of $(Y_t, \theta_t)^T$ given $Y^{t-1}$, where the predictor is

$$\alpha\begin{pmatrix} a_t \\ 1 \end{pmatrix} m_{t-1} + \bar\alpha\begin{pmatrix} a_t \\ 1 \end{pmatrix}\mu$$

and the prediction error is

$$\begin{pmatrix} a_t \\ 1 \end{pmatrix}(a_t,\ 1)\{\alpha^2 c_{t-1} + (1 - \alpha^2)\sigma^2\} + \begin{pmatrix} a_t^p\nu^2\mu & 0 \\ 0 & 0 \end{pmatrix},$$

where $u_t = \alpha^2 c_t + (1 - \alpha^2)\sigma^2$. Assume that $Y_t \mid Y^{t-1} \sim [f_t; Q_t]$. Then, by Theorem 6.2 of Chapter 6, we have

$$f_t = \alpha a_t m_{t-1} + \bar\alpha a_t\mu, \qquad Q_t = \alpha^2 a_t^2 c_{t-1} + (1 - \alpha^2)a_t^2\sigma^2 + a_t^p\nu^2\mu = a_t^2 u_{t-1} + a_t^p\nu^2\mu.$$

Moreover, applying equation (6.2.8), we update the prediction of $\theta_t$ given a new observation $Y_t$ together with $Y^{t-1}$ as follows:

$$m_t = \alpha m_{t-1} + \bar\alpha\mu + \frac{a_t u_{t-1}}{a_t^2 u_{t-1} + a_t^p\nu^2\mu}(Y_t - f_t) \qquad (7.3.2)$$

and

$$c_t = \frac{\nu^2\mu\,a_t^p\,u_{t-1}}{a_t^2 u_{t-1} + a_t^p\nu^2\mu}. \qquad (7.3.3)$$

The Kalman smoother predicts the latent process from the total vector of observations, to be used in the estimation procedure presented in Section 7.4. Assume that we know $\theta_t \mid Y^t \sim [m_t; c_t]$, $t = 1, \dots, n$. Denote the Kalman smoother predictor given all observations $Y^n$ by $\theta_t \mid Y^n \sim [m_t^*; c_t^*]$, and assume that we currently know $\theta_{t+1} \mid Y^n \sim [m_{t+1}^*; c_{t+1}^*]$. Then the formulas given in (6.2.16) and (6.2.17) give the smoother and its prediction error:

$$m_t^* = m_t + \frac{\alpha c_t}{u_t}\left(m_{t+1}^* - \alpha m_t - \bar\alpha\mu\right) \qquad (7.3.4)$$

and

$$c_t^* = c_t - \frac{\alpha^2 c_t^2}{u_t} + \frac{\alpha^2 c_t^2}{u_t^2}c_{t+1}^* = \frac{(1 - \alpha^2)\sigma^2}{u_t}c_t + \frac{\alpha^2 c_t^2}{u_t^2}c_{t+1}^*. \qquad (7.3.5)$$

The recursion starts at $t = n$ with $m_n^* = m_n$ and $c_n^* = c_n$.

7.3.2 Mean square error matrix

Let $m^* = (m_1^*, \dots, m_n^*)^T$ and $\theta = (\theta_1, \dots, \theta_n)^T$. Then $\theta \mid Y^n \sim [m^*; C^*]$, where $C^*$ is the mean square error matrix of the form $C^* = \mathrm{E}\{(\theta - m^*)(\theta - m^*)^T\}$, whose diagonal elements $c_t^*$ are given by (7.3.5).
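In code, the backward pass (7.3.4)-(7.3.5), started at $m_n^* = m_n$, $c_n^* = c_n$, can be sketched as follows (our own sketch; the filter outputs $m_t$, $c_t$ are taken as given, and the sketch also accumulates the lag-one covariances of (7.3.6) derived next):

```python
import numpy as np

def kalman_smoother(m, c, alpha, sigma2, mu=1.0):
    """Backward recursions (7.3.4)-(7.3.5) for the smoother m*_t, c*_t,
    plus the lag-one covariances c*_{t,t+1} of (7.3.6).
    (Sketch; m, c are the filter output arrays.)"""
    n = len(m)
    ms = np.array(m, dtype=float).copy()
    cs = np.array(c, dtype=float).copy()
    cross = np.zeros(n)                    # cross[t] holds c*_{t,t+1}
    for t in range(n - 2, -1, -1):
        u = alpha**2 * c[t] + (1.0 - alpha**2) * sigma2          # u_t
        gain = alpha * c[t] / u
        ms[t] = m[t] + gain * (ms[t + 1] - alpha * m[t] - (1.0 - alpha) * mu)
        cs[t] = (1.0 - alpha**2) * sigma2 * c[t] / u + gain**2 * cs[t + 1]
        cross[t] = gain * cs[t + 1]
    return ms, cs, cross

# toy input mimicking a filter output
m = np.linspace(0.5, 1.5, 100)
c = np.full(100, 0.3)
ms, cs, cross = kalman_smoother(m, c, alpha=0.5, sigma2=0.5)
```

Smoothing can only sharpen the filter, so $c_t^* \le c_t$, with equality at $t = n$.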
The off-diagonal elements are given by

$$c_{t,t+s}^* = \mathrm{E}\{(\theta_t - m_t^*)(\theta_{t+s} - m_{t+s}^*)\} = \mathrm{Cov}(\theta_t,\ \theta_{t+s} - m_{t+s}^*),$$

where the last equality is obtained using $\mathrm{E}\theta_t = \mathrm{E}m_t^* = \mu$ and

$$\mathrm{Cov}(m_t^*,\ \theta_{t+s} - m_{t+s}^*) = \mathrm{E}\,\mathrm{Cov}\{m_t^*,\ \theta_{t+s} - m_{t+s}^* \mid Y^n\} + \mathrm{Cov}\{\mathrm{E}(m_t^* \mid Y^n),\ \mathrm{E}(\theta_{t+s} - m_{t+s}^* \mid Y^n)\} = 0.$$

By the definition of the model, the conditional distribution of $\theta_t$ given $(\theta_{t+s}, Y^n)$ equals that given $(\theta_{t+s}, Y^{t+s-1})$, which depends on $\theta_{t+s}$ only through $\theta_{t+s} - \alpha m_{t+s-1}$. As discussed by Jørgensen et al. (1995), we obtain

$$\mathrm{E}\{(\theta_t - m_t^*)(\theta_{t+s} - m_{t+s}^*)\} = \frac{\mathrm{Cov}(\theta_t,\ \theta_{t+s} - \alpha m_{t+s-1})}{\mathrm{Var}(\theta_{t+s} - \alpha m_{t+s-1})}\,c_{t+s}^*,$$

where $\mathrm{Var}(\theta_{t+s} - \alpha m_{t+s-1}) = u_{t+s-1}$ and

$$\mathrm{Cov}(\theta_t,\ \theta_{t+s} - \alpha m_{t+s-1}) = \mathrm{Cov}(\theta_t, \theta_{t+s}) - \mathrm{Cov}(\theta_t, \alpha m_{t+s-1}) = \alpha\{\mathrm{Cov}(\theta_t, \theta_{t+s-1}) - \mathrm{Cov}(\theta_t, m_{t+s-1})\} = \alpha\,\mathrm{Cov}(\theta_t,\ \theta_{t+s-1} - m_{t+s-1}).$$

Following the same procedure, we further obtain

$$\mathrm{Cov}(\theta_t,\ \theta_{t+s-1} - m_{t+s-1}) = \frac{\mathrm{Cov}(\theta_t,\ \theta_{t+s-1} - \alpha m_{t+s-2})}{\mathrm{Var}(\theta_{t+s-1} - \alpha m_{t+s-2})}\,c_{t+s-1} = \alpha\,\mathrm{Cov}(\theta_t,\ \theta_{t+s-2} - m_{t+s-2})\frac{c_{t+s-1}}{u_{t+s-2}},$$

and finally, iterating down to lag zero,

$$c_{t,t+s}^* = \alpha^s c_{t+s}^*\prod_{i=0}^{s-1}\frac{c_{t+i}}{u_{t+i}}.$$

The following recursive relation is especially useful for calculating the matrix $C^*$:

$$c_{t,t+s+1}^* = \alpha\,\frac{c_{t+s}}{u_{t+s}}\,\frac{c_{t+s+1}^*}{c_{t+s}^*}\,c_{t,t+s}^*.$$

In particular,

$$c_{t,t+1}^* = \alpha\frac{c_t}{u_t}c_{t+1}^*. \qquad (7.3.6)$$

7.4 Estimation

This section presents a modified EM-algorithm for the maximum likelihood estimation of the regression parameters $\beta$, given $\nu^2$ and $\sigma^2$, by way of the Kalman estimating equation first proposed in Jørgensen et al. (1994). The asymptotic normal distribution of the resulting estimators is also described in this section, while the detailed proof of its validity appears in Section 7.7. We then introduce the moment estimators of the dispersion parameters $\nu^2$ and $\sigma^2$ and of the correlation parameter $\alpha$, as well as the calculation of the Godambe information matrix.

7.4.1 Estimation of regression parameters

Let $Y = (Y_1, \dots, Y_n)^T$ and $X = (x_1, \dots, x_n)$.
Also write $\mu(\beta) = (a_1\theta_1, \dots, a_n\theta_n)^T$. Assuming the $\theta_t$ fixed, by differentiating the joint likelihood of the model we obtain the Fisher scoring equation

$$X A^{1-p}\{Y - \mu(\beta)\} = 0, \qquad (7.4.1)$$

where $A^c$ denotes the diagonal matrix $\mathrm{diag}(a_1^c, \dots, a_n^c)$ for any real $c$; the estimate $\hat\beta$ is then the solution of equation (7.4.1). Treating the latent process $\theta_t$ as a set of missing values, we apply the EM-algorithm for estimation as follows. The M-step is completed by solving the equation

$$\psi(\beta) = X A^{1-p}\{Y - \tilde\mu(\beta)\} = \sum_{t=1}^n x_t a_t^{1-p}(Y_t - a_t m_t^*) = 0, \qquad (7.4.2)$$

obtained by replacing $\mu(\beta)$ in (7.4.1) by $\tilde\mu(\beta) = (a_1 m_1^*, \dots, a_n m_n^*)^T$, where $m_t^*$ is the Kalman smoother. Consequently, equation (7.4.2) is called the Kalman estimating equation (KEE). The E-step, on the other hand, is done by computing the Kalman smoother (7.3.4), given both the currently updated estimate $\hat\beta$ from the M-step and the moment estimates of $\sigma^2$, $\nu^2$ and $\alpha$. For most distributions in the Tweedie family, including the cases $p = 0$ (normal), $p = 1$ (Poisson) and $p = 2$ (gamma), the M-step may easily be carried out using standard statistical software for generalized linear models.

An alternative algorithm for solving equation (7.4.2) is the so-called Newton scoring algorithm, defined by

$$\beta^{(k+1)} = \beta^{(k)} - S^{-1}(\beta^{(k)})\psi(\beta^{(k)}),$$

where the matrix $S(\beta)$ is given in Theorem 7.1 below.

The asymptotic normality of the estimator $\hat\beta$ from the KEE holds under some mild regularity conditions, as stated in the following theorem.

Theorem 7.1 Suppose $\theta_t$ is a stationary process. Under mild regularity conditions, for $\nu^2$, $\sigma^2$ and $\alpha$ fixed,

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N_k\{0, \Sigma(\beta)\},$$

with $\Sigma(\beta) = \lim_{n\to\infty} nJ^{-1}(\beta)$, where $J$ is the Godambe information matrix defined by $J(\beta) = S^T(\beta)V^{-1}(\beta)S(\beta)$, $S(\beta) = \mathrm{E}_\beta\{\partial\psi(\beta)/\partial\beta^T\}$ is the sensitivity matrix and $V(\beta) = \mathrm{E}_\beta\{\psi(\beta)\psi^T(\beta)\}$ is the variability matrix.

The proof of the theorem is given in Section 7.7.
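With the smoother $m_t^*$ held fixed from the E-step, the M-step (7.4.2) in the Poisson case ($p = 1$) amounts to a weighted log-linear fit, and the Newton scoring iteration can be sketched as follows (our own sketch, not the thesis code, which was written in S-PLUS):

```python
import numpy as np

def kee_m_step(y, X, m_star, max_iter=50, tol=1e-8):
    """Solve the KEE (7.4.2) for p = 1: sum_t x_t (Y_t - a_t m*_t) = 0
    with a_t = exp(x_t' beta), by Newton (Fisher scoring) iteration.
    m_star is the Kalman smoother output, held fixed during the M-step."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        a = np.exp(X @ beta)
        score = X.T @ (y - a * m_star)                 # psi(beta)
        info = X.T @ (X * (a * m_star)[:, None])       # minus the sensitivity
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# toy data: Poisson counts with a log-linear trend, m*_t frozen at 1
rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
beta_true = np.array([0.3, -0.7])
y = rng.poisson(np.exp(X @ beta_true))
m_star = np.ones(n)
beta_hat = kee_m_step(y, X, m_star)
```

In the full algorithm this solve alternates with the smoother pass, the E-step, until the convergence criterion on $\beta$ is met.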
From the theorem, the asymptotic standard errors of $\hat\beta$ may be calculated from the asymptotic variance $\Sigma(\beta)$. Moreover, an analogue of Wald's test is available for testing the joint significance of a subset of the regression parameters. The test statistic is defined by

$$W = \hat\beta_1^T\{\hat J^{11}(\hat\beta_1)\}^{-1}\hat\beta_1,$$

where $\beta_1$ is the subvector of interest and $J^{11}(\beta_1)$ is the corresponding block of the asymptotic covariance matrix of $\hat\beta$. Asymptotically, this statistic follows a $\chi^2(k_1)$ distribution, where $k_1$ is the dimension of $\beta_1$.

7.4.2 Estimating the dispersion and correlation parameters

The likelihood estimates of the dispersion parameters could, in principle, be obtained. However, the numerical computations required are in general not easy to carry out. For example, in the gamma case the dispersion parameters enter through the gamma function, which leads to a digamma function when taking derivatives with respect to the parameters, so solving the corresponding estimating equation for the maximum likelihood estimates is not straightforward. Alternatively, we use moment estimates for the dispersion parameters. The parameter $\mu$ is assumed known in the following discussion.

The equation $\mathrm{Var}(Y_t - a_t m_t^*) = a_t^p\mu\nu^2 - a_t^2 c_t^*$ leads to the following moment estimator of the dispersion parameter $\nu^2$:

$$\hat\nu^2 = \frac{1}{n\mu}\sum_{t=1}^n a_t^{-p}(Y_t - a_t m_t^*)^2 + \frac{1}{n\mu}\sum_{t=1}^n a_t^{2-p}c_t^*. \qquad (7.4.3)$$

To estimate the dispersion parameter $\sigma^2$ of the latent process, note that $\mathrm{Var}(Y_t) = a_t^p\nu^2\mu + a_t^2\sigma^2$, so the moment estimator is of the form

$$\hat\sigma^2 = \frac{1}{n}\sum_{t=1}^n\frac{(Y_t - a_t\mu)^2}{a_t^2} - \frac{\nu^2\mu}{n}\sum_{t=1}^n a_t^{p-2}. \qquad (7.4.4)$$

We now consider the estimation of the autocorrelation coefficient $\alpha$. Observing that

$$\alpha\sigma^2 = \mathrm{Cov}(\theta_t, \theta_{t+1}) = \mathrm{Cov}(m_t^*, m_{t+1}^*) + c_{t,t+1}^* = \mathrm{Cov}(m_t^*, m_{t+1}^*) + \alpha\frac{c_t}{u_t}c_{t+1}^*,$$
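Collecting the moment estimators (7.4.3) and (7.4.4) in code (a sketch under our own naming; $\mu$ is taken as known, and the smoother quantities are supplied as arrays):

```python
import numpy as np

def nu2_hat(y, a, m_star, c_star, p=1, mu=1.0):
    """Moment estimator (7.4.3) of nu^2."""
    n = len(y)
    return (np.sum(a**(-p) * (y - a * m_star)**2)
            + np.sum(a**(2 - p) * c_star)) / (n * mu)

def sigma2_hat(y, a, nu2=1.0, p=1, mu=1.0):
    """Moment estimator (7.4.4) of sigma^2, from
    Var(Y_t) = a_t^p nu2 mu + a_t^2 sigma2."""
    return float(np.mean((y - a * mu)**2 / a**2 - nu2 * mu * a**(p - 2)))

# toy check in the Poisson case (p = 1, nu2 = 1, mu = 1): overdispersed
# counts Y_t ~ Po(theta_t) with E(theta) = 1, Var(theta) = 0.5
rng = np.random.default_rng(3)
theta = rng.gamma(shape=2.0, scale=0.5, size=4000)
a = np.ones(4000)
y = rng.poisson(a * theta)
s2 = sigma2_hat(y, a)          # should be near Var(theta) = 0.5
v2 = nu2_hat(y, a, m_star=theta, c_star=np.zeros(4000))
```

With the latent process pretended known ($m_t^* = \theta_t$, $c_t^* = 0$), $\hat\nu^2$ should be near the Poisson value 1, and $\hat\sigma^2$ near $\mathrm{Var}(\theta_t) = 0.5$.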
where $c_{t,t+1}^*$ is given in (7.3.6), we obtain

$$\alpha\left(\sigma^2 - \frac{c_t}{u_t}c_{t+1}^*\right) = \mathrm{Cov}(m_t^*, m_{t+1}^*).$$

This suggests the following moment estimate of $\alpha$:

$$\hat\alpha = \frac{1}{n-1}\sum_{t=1}^{n-1}\frac{(m_t^* - \mu)(m_{t+1}^* - \mu)}{\hat\sigma^2 - (c_t/u_t)c_{t+1}^*}.$$

It should be noted that for small data sizes the moment estimate of $\alpha$ may occasionally fail to yield a positive definite covariance structure.

7.4.3 Calculation of the Godambe information matrix

We first derive the ($k \times k$) sensitivity matrix $S(\beta)$. Define

$$D_\beta(t) = \mathrm{E}\left(\frac{\partial m_t}{\partial\beta^T}\right), \qquad D_\beta^*(t) = \mathrm{E}\left(\frac{\partial m_t^*}{\partial\beta^T}\right).$$

Noting that $\mathrm{E}(Y_t - a_t m_t^*) = 0$, we have

$$S(\beta) = (1-p)\sum_{t=1}^n a_t^{1-p}x_t x_t^T\,\mathrm{E}(Y_t - a_t m_t^*) - \sum_{t=1}^n a_t^{2-p}x_t x_t^T\,\mathrm{E}\,m_t^* - \sum_{t=1}^n a_t^{2-p}x_t\,\mathrm{E}\left(\frac{\partial m_t^*}{\partial\beta^T}\right) = -\sum_{t=1}^n a_t^{2-p}x_t\{\mu x_t^T + D_\beta^*(t)\}.$$

The ($k \times k$) variability matrix is given by

$$V(\beta) = \mathrm{E}\{\psi(\beta)\psi^T(\beta)\} = X A^{1-p}\,\mathrm{Var}(Y - Am^*)\,A^{1-p}X^T, \qquad (7.4.6)$$

where $A = A^1$ and $\mathrm{Var}(Y - Am^*)$ is an $n \times n$ matrix whose $(t,s)$-th element is of the form

$$\begin{cases} a_t^p\nu^2\mu - a_t^2 c_t^*, & t = s, \\ -a_t a_s c_{t,s}^*, & t \ne s. \end{cases}$$

A sketch of the derivation of $\mathrm{Var}(Y - Am^*)$ rests on the following two facts. Under the assumption that the conditional residuals of the two processes are uncorrelated, we have, first,

$$\mathrm{Cov}(Y_t - a_t\theta_t,\ Y_s - a_s\theta_s) = \mathrm{Cov}(Y_t - a_t m_t^*,\ Y_s - a_s m_s^*) + a_t a_s c_{t,s}^*,$$

where $c_{t,t}^* = c_t^*$ by the convention of Section 7.3.2. The second fact is that

$$\mathrm{Cov}(Y_t - a_t\theta_t,\ Y_s - a_s\theta_s) = \begin{cases} a_t^p\nu^2\mu, & t = s, \\ 0, & t \ne s, \end{cases}$$

obtained by arguments based on the conditional covariance given $\theta_t$, say.

Finally, we establish recursive relations for both $D_\beta$ and $D_\beta^*$. From equation (7.3.2), taking expectations of the derivatives with respect to $\beta^T$, we have

$$D_\beta(t) = \alpha\left(1 - \frac{a_t^2 u_{t-1}}{a_t^2 u_{t-1} + a_t^p\nu^2\mu}\right)D_\beta(t-1) - \frac{a_t^2 u_{t-1}\mu}{a_t^2 u_{t-1} + a_t^p\nu^2\mu}\,x_t^T,$$

starting with $D_\beta(0) = 0$.
The last equality is obtained by noting that

$$\frac{\partial f_t}{\partial\beta^T} = \frac{\partial a_t}{\partial\beta^T}(\alpha m_{t-1} + \bar\alpha\mu) + \alpha a_t\frac{\partial m_{t-1}}{\partial\beta^T} = a_t x_t^T(\alpha m_{t-1} + \bar\alpha\mu) + \alpha a_t\frac{\partial m_{t-1}}{\partial\beta^T},$$

since $\partial a_t/\partial\beta^T = a_t x_t^T$. Moreover, noting that $\mathrm{E}(m_{t+1}^* - \alpha m_t - \bar\alpha\mu) = 0$, we obtain

$$D_\beta^*(t) = D_\beta(t) + \frac{\alpha c_t}{u_t}\{D_\beta^*(t+1) - \alpha D_\beta(t)\}, \qquad (7.4.7)$$

starting at $t = n$ with $D_\beta^*(n) = D_\beta(n)$.

7.5 Model checking

We now develop methods of model diagnosis for both the observed and unobserved parts of the model. In principle, there are in all seven types of residuals, classified as either marginal or conditional, formed on the basis of either the Kalman filter or the smoother. In this section, we are interested in only three of these seven residuals, namely those used in our model checking procedure.

(1) For the observed part, we use the conditional residuals based on the Kalman filter, defined by

$$\hat\eta_t = Y_t - f_t.$$

By the fourth basic property and Theorem 6.3 in Chapter 6, for any $s > 0$, $\hat\eta_{t+s}$ is uncorrelated with $Y_t$, and hence these residuals are uncorrelated, with variances $Q_t$.

(2) For the latent process, we choose the conditional residuals based on the Kalman filter, given by

$$\hat\varepsilon_t = m_t - (\alpha m_{t-1} + \bar\alpha\mu).$$

By an argument similar to that in part (1), these residuals are uncorrelated, and their variances are

$$\mathrm{Var}(\hat\varepsilon_t) = (\nu^2\mu)^{-2}a_t^{2-2p}c_t^2 Q_t,$$

because of the fact that $\hat\varepsilon_t = (\nu^2\mu)^{-1}a_t^{1-p}c_t\hat\eta_t$.

(3) Marginal residuals of the latent process based on the Kalman filter are also considered, defined by $r_t = m_t - \mu$, with moments given by

$$\mathrm{E}(r_t) = 0, \qquad \mathrm{Var}(r_t) = \sigma^2 - c_t, \qquad \mathrm{Cov}(r_t, r_{t+s}) = \alpha^s\sigma^2 - \alpha^s c_t. \qquad (7.5.1)$$

In what follows, a bar denotes a standardized residual; for instance, $\bar\varepsilon_t = \hat\varepsilon_t/\sqrt{\mathrm{Var}(\hat\varepsilon_t)}$.

Checking the correlation structure

A simple way to check the first-order Markov assumption for the latent process is to plot $\bar\varepsilon_t$ versus $\bar\varepsilon_{t-1}$. Similarly, the plot of $\bar\eta_t$ against $\bar\eta_{t-1}$ provides a check of the
conditional independence of the observed process over time. One may also inspect the autocorrelation functions of $\bar\varepsilon_t$ and of $\bar\eta_t$ for the same purpose.

Checking distributional assumptions

As in generalized linear models, we may check the form of the variance functions (and hence the distributional forms) by plotting the standardized filter residuals versus the corresponding fitted values. To check the conditional distribution assumption for the latent part, we plot $\bar\varepsilon_t$ against $\log(\alpha m_{t-1} + \bar\alpha\mu)$. In a similar way, we may plot $\bar\eta_t$ against $\log(f_t)$ to check the conditional distribution assumption for the observed part.

Checking the covariate structure

Plots of the standardized residuals against each covariate are useful for detecting nonlinearity in the model. For instance, to check the log link, we could plot $\log(Y_t)$ against $\log(f_t^*)$, where $f_t^* = a_t m_t^*$. In principle, this plot should show a linear relationship.

Checking the latent process

From (7.5.1), we have

$$\mathrm{Cov}(r_t, r_{t+s})/\alpha^s = \sigma^2 - c_t, \qquad (7.5.2)$$

where the right-hand side of equation (7.5.2) is constant in the lag $s$. In practice, $\mathrm{Cov}(r_t, r_{t+s})$ is replaced by the sample autocovariance function, and both $\alpha$ and $\sigma^2$ by their moment estimators. If the values on the left-hand side of equation (7.5.2) are roughly constant as a function of the lag $s$, the AR(1) structure is confirmed. In practice, there is a certain risk of a misleading conclusion, because equation (7.5.2) involves estimators; in particular, the powers $\alpha^s$ are very sensitive to error when $\hat\alpha$ is not precise enough. Nevertheless, we can check the pattern over the first several lags, which is more reliable than the pattern at large lags.
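The constancy check based on (7.5.2) is easy to automate. The sketch below applies it to a synthetic Gaussian AR(1) series, used purely as a stand-in with a known autocorrelation (the names are ours): for a true AR(1), the ratios of the sample autocovariance to $\alpha^s$ should be roughly flat in $s$.

```python
import numpy as np

def acov(x, s):
    """Sample autocovariance of x at lag s."""
    n = len(x)
    xc = x - x.mean()
    return float(np.sum(xc[:n - s] * xc[s:]) / n)

def ar1_ratio_check(r, alpha, lags=(1, 2, 3, 4)):
    """Ratios gamma_hat(s) / alpha^s of (7.5.2); roughly constant in s
    under a correctly specified AR(1) latent structure."""
    return np.array([acov(r, s) / alpha**s for s in lags])

# synthetic AR(1) with unit variance and autocorrelation alpha
rng = np.random.default_rng(1)
alpha, n = 0.6, 20000
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = alpha * x[t - 1] + np.sqrt(1 - alpha**2) * rng.normal()
ratios = ar1_ratio_check(x, alpha)
```

Here the true autocovariance is $\gamma(s) = \alpha^s$, so every ratio should be near 1; a systematic drift with $s$ would instead point to a misspecified latent correlation structure.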
For the latent model with gamma margins, we have an alternative way to check the AR(1) structure, by inspecting the simulated residuals given by

$$\hat e_t = m_t - G_t m_{t-1}, \qquad (7.5.3)$$

where the $G_t$ are an i.i.d. sequence of random variates simulated from the distribution $\mathrm{Beta}\{\alpha\lambda, (1-\alpha)\lambda\}$ with $\lambda = 1/\sigma^2$, independent of the $Y_t$'s. For given $\alpha$ and $\lambda$, since $\mathrm{E}(G_t) = \alpha$ and $\mathrm{Var}(G_t) = \alpha(1-\alpha)/(\lambda+1)$, it is easy to show that $\mathrm{Cov}(\hat e_t, \hat e_{t+s}) = 0$ and

$$\mathrm{Var}(\hat e_t) = \mathrm{Var}\{(m_t - \alpha m_{t-1}) + (\alpha - G_t)m_{t-1}\} = \mathrm{Var}(m_t - \alpha m_{t-1}) + \mathrm{Var}(G_t)\{\mathrm{Var}(m_{t-1}) + \mu^2\}.$$

Therefore, the standardized $\hat e_t$ form a noise series provided the AR(1) structure is correct. Any dependence pattern would indicate inadequacy of the stationarity assumption.

7.6 Data analysis

We now apply the parameter-driven model to the analysis of the polio incidence data taken from Zeger (1988) as an example. Table 7.1 reports the polio data, the monthly numbers of poliomyelitis cases in the USA from 1970 to 1983, given in Table 2 of Zeger (1988). Figure 7.1 gives a time series plot of the data. The central question investigated by Zeger (1988) is whether or not the data provide evidence of a long-term decrease in the rate of U.S. polio infections, that is, whether or not the data follow a decreasing time trend.

Table 7.1: Monthly number of U.S. cases of poliomyelitis for 1970 to 1983.

       Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
 1970    0    1    0    0    1    3    9    2    3    5    3    5
 1971    2    2    0    1    0    1    3    3    2    1    1    5
 1972    0    3    1    0    1    4    0    0    1    6   14    1
 1973    1    0    0    1    1    1    1    0    1    0    1    0
 1974    1    0    1    0    1    0    1    0    1    0    0    2
 1975    0    1    0    1    0    0    1    2    0    0    1    2
 1976    0    3    1    1    0    2    0    4    0    2    1    1
 1977    1    1    0    1    1    0    2    1    3    1    2    4
 1978    0    0    0    1    0    1    0    2    2    4    2    3
 1979    3    0    0    2    7    8    2    4    1    1    2    4
 1980    0    1    1    1    3    0    0    0    0    1    0    1
 1981    1    0    0    0    0    0    1    2    0    2    0    0
 1982    0    1    0    1    0    1    0    2    0    0    1    2
 1983    0    1    0    0    0    1    2    1    0    1    3    6

Figure 7.1: Monthly counts of polio incidences. Dots show the observations. The solid line represents the estimated latent process obtained by the Kalman smoother. The broken line shows the moving average series of the data, with both 6 and 12 lags, used as the initialization of the latent process for the Kalman filter. [Horizontal axis: time, monthly over 1970-1983; vertical axis: polio counts.]

These data were previously analyzed by Zeger (1988) and by Chan and Ledolter (1995) under a stationary latent structure. Both assumed that the data are conditionally independent given the latent process and follow a Poisson distribution, with the conditional mean a function of the covariates and the regression parameters. Zeger (1988) used a quasi-likelihood approach similar to the GEE in his model, where the latent process is assumed stationary without specification of its marginal distribution. Chan and Ledolter (1995) assumed the latent process to be exp{AR(1)}, where the AR(1) is a stationary process with Gaussian margins, and used a Markov chain Monte Carlo approach to deal with the estimation problem.

7.6.1 Poisson stationary-gamma model

In the present section, we tentatively fit the following model to Zeger's (1988) data; the model assumptions will be confirmed through residual analysis later.
Taking $p = 1$ in model (7.2.1), we assume that, given the latent process $\theta_t$, the polio counts are conditionally independent over time and follow a Poisson distribution with mean $a_t\theta_t$ (so that $\nu^2 = 1$), namely,

$$Y_t \mid \theta_t \sim \mathrm{Po}(a_t\theta_t), \qquad t = 1, \dots, 168,$$

where

$$x_t = \left(1,\ t,\ \cos(2\pi t/12),\ \sin(2\pi t/12),\ \cos(2\pi t/6),\ \sin(2\pi t/6)\right)^T,$$

the same as in Zeger (1988), including the constant term, the time trend and the seasonal effects. The latent process is chosen to be an AR(1) model with gamma margins given by equation (7.2.3),

$$\theta_t = G_t\theta_{t-1} + e_t, \qquad t = 1, \dots,$$

where the mean $\mu$ of $\theta_t$ is set to 1, forcing the unconditional mean $\mathrm{E}(Y_t)$ in (7.2.4) to depend only on $x_t^T\beta$ and not on the moments of the $\theta_t$ series, as is desirable in the regression analysis. So we have

$$\mathrm{E}(Y_t) = a_t, \qquad \mathrm{Var}(Y_t) = a_t(1 + a_t\sigma^2).$$

Here the latent process introduces both overdispersion and autocorrelation into $Y_t$. Furthermore, we obtain a log-linear model for the marginal expectation:

$$\log\{\mathrm{E}(Y_t)\} = x_t^T\beta. \qquad (7.6.1)$$

We must now obtain an estimator of $\beta$ and test whether or not the time trend is significant.

7.6.2 Implementation of estimation

Under the Poisson stationary-gamma model, the Kalman filter and smoother take the following forms. The filter is given by

$$m_t = \alpha m_{t-1} + \bar\alpha + c_t(Y_t - f_t)$$

with prediction error

$$c_t = \frac{a_t u_{t-1}}{a_t + a_t^2 u_{t-1}} = \frac{u_{t-1}}{1 + a_t u_{t-1}},$$

where $u_t = \alpha^2 c_t + (1 - \alpha^2)\sigma^2$, $f_t = \alpha a_t m_{t-1} + \bar\alpha a_t$ and $Q_t = a_t^2 u_{t-1} + a_t$. The smoother is obtained as

$$m_t^* = m_t + \frac{\alpha c_t}{u_t}(m_{t+1}^* - \alpha m_t - \bar\alpha)$$

with

$$c_t^* = \frac{(1 - \alpha^2)\sigma^2}{u_t}c_t + \frac{\alpha^2 c_t^2}{u_t^2}c_{t+1}^*.$$

The estimator of $\beta$ is obtained via the Newton scoring algorithm as the root of the inference function $\psi$ of the form

$$\psi(\beta) = \sum_{t=1}^n x_t(Y_t - a_t m_t^*).$$

The sensitivity matrix is given by

$$S(\beta) = -\sum_{t=1}^n a_t x_t\{x_t^T + D_\beta^*(t)\}.$$

From (7.4.6), the variability matrix is given by

$$V(\beta) = X\,\mathrm{Var}(Y - Am^*)\,X^T = X\{\mathrm{diag}(a_t) - \mathrm{diag}(a_t)\,C^*\,\mathrm{diag}(a_t)\}X^T.$$
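The specialized Poisson-gamma recursions above are simple enough to state directly in code (a sketch; the initialization $m_0 = 1$, $c_0 = \sigma^2$ is our own convention, whereas the analysis below initializes from a moving-average series):

```python
import numpy as np

def poisson_gamma_filter(y, a, alpha, sigma2):
    """Filter of Section 7.6.2 (p = 1, nu2 = 1, mu = 1):
    c_t = u_{t-1} / (1 + a_t u_{t-1}),
    m_t = alpha m_{t-1} + (1 - alpha) + c_t (Y_t - f_t)."""
    n = len(y)
    m = np.empty(n); c = np.empty(n)
    m_prev, c_prev = 1.0, sigma2
    for t in range(n):
        u = alpha**2 * c_prev + (1.0 - alpha**2) * sigma2   # u_{t-1}
        f = a[t] * (alpha * m_prev + (1.0 - alpha))         # forecast of Y_t
        c[t] = u / (1.0 + a[t] * u)
        m[t] = alpha * m_prev + (1.0 - alpha) + c[t] * (y[t] - f)
        m_prev, c_prev = m[t], c[t]
    return m, c

y = np.ones(50)
a = np.full(50, 1.2)
m, c = poisson_gamma_filter(y, a, alpha=0.5, sigma2=0.77)
```

When $a_t$ is constant, the errors $c_t$ converge quickly to a steady-state value, which keeps the cost of each KEE iteration linear in $n$.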
Also, the forms of $D_\beta(t)$ and $D_\beta^*(t)$ simplify to

$$D_\beta(t) = \alpha(1 - a_t c_t)D_\beta(t-1) - a_t c_t x_t^T$$

and

$$D_\beta^*(t) = D_\beta(t) + \frac{\alpha c_t}{u_t}\{D_\beta^*(t+1) - \alpha D_\beta(t)\}.$$

The estimator of the dispersion parameter $\sigma^2$ is of the form

$$\hat\sigma^2 = \sum_{t=1}^n\{(Y_t - a_t)^2 - a_t\}\Big/\sum_{t=1}^n a_t^2,$$

and the estimator of the correlation parameter is given by

$$\hat\alpha = \frac{1}{n-1}\sum_{t=1}^{n-1}\frac{(m_t^* - 1)(m_{t+1}^* - 1)}{\hat\sigma^2 - (c_t/u_t)c_{t+1}^*}.$$

7.6.3 Numerical results

We initialize the Kalman filter with the moving average series of the polio counts based on both 6 and 12 lags, with the seasonal effects removed, shown by the broken line in Figure 7.1. The starting value for $\alpha$ is chosen as the autocorrelation of the starting $\theta$ series, and the starting value for $\sigma^2$ is set to 1, as in Chan and Ledolter (1995). The convergence criterion is $\max(\|\beta^{(k+1)} - \beta^{(k)}\|) < 10^{-5}$.

Our results, as well as those reported by Zeger (1988) and Chan and Ledolter (1995), are shown in Table 7.2.

Table 7.2: Estimates of coefficients and standard errors: (a) Zeger's parameter-driven model using GEE; (b) Chan and Ledolter's parameter-driven model using an MCMC algorithm; (c) our parameter-driven model using the KEE approach.

                                (a) GEE           (b) MCMC          (c) KEE
                              est.  std.err.    est.  std.err.    est.  std.err.
  Intercept (log rate          .17    .13        .42               .49    .16
    in Jan. 1976)
  Trend (x 10^-3)            -4.35   2.68      -4.62   1.38      -3.87   1.64
  cos(2 pi t / 12)            -.11    .16        .15    .09        .13    .10
  sin(2 pi t / 12)            -.48    .17       -.50    .12       -.50    .12
  cos(2 pi t / 6)              .20    .14        .44    .10        .45    .10
  sin(2 pi t / 6)             -.41    .14       -.04    .10       -.06    .10
  sigma^2                      .77               .54               .81
  rho(1)                       .77               .88               .36

Note: rho(1) represents the estimate of the first-lag autocorrelation of the latent process.

In general, the estimates from the three different approaches seem similar; however, a few differences should be pointed out:
(1) Our KEE estimates are closer to Chan and Ledolter's maximum likelihood (MCMC) estimates than to Zeger's estimates; note that the estimates corresponding to the four seasonal effects are almost the same in (b) and (c).

(2) With regard to the time trend, both our method and Chan and Ledolter's method found the slope to be significantly different from zero, implying a significant decreasing time trend for the polio data. Zeger reported insignificance for the slope of the time trend at the .05 level, with a t-ratio of 1.62, while both Chan and Ledolter's and our t-ratios exceed 2.3.

(3) The standard errors of the KEE estimates corresponding to non-constant covariates are uniformly smaller than those given by Zeger, and nearly the same as those obtained by Chan and Ledolter.

(4) Our estimate of the lag-one autocorrelation, $\hat\alpha$, is about half the one reported by Zeger and about two-and-a-half times smaller than that obtained by Chan and Ledolter. The lag-one autocorrelation of the raw polio counts is close to .3, so the value of .36 is reasonable for the lag-one autocorrelation of the latent process, under the assumption that the observations are conditionally independent given the latent part.

The solid line in Figure 7.1 shows the estimated latent process from the Kalman smoother. Although the computation for the KEE involves the updating of predictions of the latent process, the most time-consuming part of the method, it is worthwhile, because the availability of predictions of the latent process makes model checking diagnosis feasible, as discussed in the following section.

With regard to the computational aspects of the analysis of the polio data, our KEE program took about 16 minutes on a Sun Sparc-10 workstation to complete a total of 166 iterations until convergence.
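The significance statement in point (2) can be reproduced from Table 7.2 with the Wald statistic of Section 7.4.1, using the reported KEE trend estimate $-3.87 \times 10^{-3}$ and its standard error $1.64 \times 10^{-3}$ (a simple numerical check, nothing more):

```python
import numpy as np

def wald_statistic(beta_sub, cov_sub):
    """W = beta_1' {J^11(beta_1)}^{-1} beta_1, compared with chi-square(k_1)."""
    b = np.atleast_1d(np.asarray(beta_sub, dtype=float))
    V = np.atleast_2d(np.asarray(cov_sub, dtype=float))
    return float(b @ np.linalg.solve(V, b))

# KEE time-trend estimate and standard error from Table 7.2
W = wald_statistic([-3.87e-3], [[(1.64e-3) ** 2]])
t_ratio = 3.87 / 1.64          # |estimate| / std. err.
```

With $k_1 = 1$, the 5% critical value is 3.841 and $W = t^2 \approx 5.57$, so the decreasing trend is significant, in agreement with the text.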
The program was written in S-PLUS code rather than C code, so the KEE approach is fairly competitive in terms of computational time. The KEE approach is also very easy to implement, compared with the MCMC algorithm, which usually needs to be programmed in C because of the tremendous demands that the simulations place on computing memory. More detailed comparisons between the MCEM and KEE methods on the computational aspects are found in Chapter 9.

7.6.4 Model diagnosis

When reading this section, the reader should bear one thing in mind: all the time series plots, such as the autocorrelation function (acf) and partial autocorrelation function (pacf) plots, are only approximate diagnostics, since the distributions considered here are non-normal.

Figure 7.2 provides a check of the correlation structures of both the latent part and the observed part.

Figure 7.2: Checking correlation structures for both latent and observed processes. (Panels: latent process, residual lag plot and acf; observed process, residual lag plot and acf.)

The two top displays are, respectively, the plot of ε̂_t versus ε̂_{t−1} and the acf plot of ε̂_t. The two bottom displays are, respectively, the plot of η̂_t versus η̂_{t−1} and the acf plot of η̂_t. All four plots show absence of serial correlation in the residuals from the latent process and from the observed process, respectively. This confirms the assumptions of a first-order Markov latent process and of conditional independence for the observed process.

Figure 7.3: Checking distribution assumptions for the latent and observed process.
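The acf-based checks behind Figures 7.2 and 7.5 are easy to automate. The following sketch (not from the thesis; the function names are illustrative) computes the sample autocorrelation function of a residual series and flags any lags falling outside the approximate 95% white-noise bands ±2/√n:

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation function of the series x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(1, max_lag + 1):
        cov = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        acf.append(cov / var)
    return acf

def serial_correlation_check(residuals, max_lag=20):
    """Return the lags whose sample autocorrelation exceeds the
    approximate 95% white-noise bands +/- 2/sqrt(n)."""
    n = len(residuals)
    band = 2.0 / math.sqrt(n)
    acf = sample_acf(residuals, max_lag)
    return [(k + 1, r) for k, r in enumerate(acf) if abs(r) > band]
```

An empty list from `serial_correlation_check` on the standardized residuals supports the white-noise assumption, in the same approximate sense as the plots; with non-normal margins the bands are only a rough guide.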
Figure 7.3 shows plots of the filter residuals ε̂_t and η̂_t against their respective log linear predictors (latent and observed process against log predictors), for the purpose of checking the distributional assumptions. The plot for checking the assumption of a gamma distribution (latent part) looks fine, while the one for checking the assumption of a Poisson distribution (observed part) shows clear discreteness. Taking into account the small expected counts, the second plot also seems to be in agreement with the proposed model. Figure 7.4 shows the plot of η̂_t versus time t, to check the adequacy of linearity for the time-trend covariate. The plot looks fairly random, confirming the proposed model structure for the time-trend covariate, which is the main focus in the analysis of the data.

Figure 7.4: Checking the linearity of the time trend.

Figure 7.5 provides a check of the structure of the latent process by plotting the autocorrelation and partial autocorrelation functions of the standardized residuals ê_t, which are supposed to form a noise series; in computing ê_t, G_t is a sequence of beta variables randomly generated from the distribution Beta(a/σ², (1 − a)/σ²). Both the acf and pacf plots show no significant peaks, confirming the adequacy of the assumption for the latent structure.

Figure 7.5: Checking the latent structure.

7.7 Proof of asymptotic normality

Lemma 7.1 If {θ_t} is a stationary Markov process of order 1, then {θ_t} is a φ-mixing process. For the proof, refer to Billingsley (1968, page 167).

Lemma 7.2 Assume {θ_t} is a φ-mixing process and that, given θ_t, {Y_t} is conditionally independent.
Then {Y_t} is a φ-mixing process, and therefore under some mild regularity conditions

(1/n) Σ_{t=1}^n (Y_t − E Y_t) → 0, as n → ∞,   (7.7.1)

and n^{−1/2} Σ_{t=1}^n (Y_t − E Y_t) is asymptotically normal with mean 0 and variance

lim_{n→∞} Σ_{j=−n+1}^{n−1} (1 − |j|/n) r_Y(j),

where r_Y(j) = Cov(Y_{t+j}, Y_t).

Proof. First we prove that {Y_t} is a φ-mixing process. Let B_{i,j} = σ(Y_t, i ≤ t ≤ j) for all 1 ≤ i ≤ j ≤ ∞, and let A ∈ B_{1,n} and B ∈ B_{n+s,∞}. Since, given θ_t, Y_t is conditionally independent,

P(A ∩ B) = ∫ P(A ∩ B | θ_{n+s} = x) p_{n+s}(x) dx = ∫ P(A | θ_{n+s} = x) P(B | θ_{n+s} = x) p_{n+s}(x) dx;

here p_t(x) denotes the density function of θ_t. Since {θ_t} is a φ-mixing process, there exists a function φ(s) with φ(s) → 0 as s → ∞ such that

|p_{n,n+s}(z, x) − p_n(z) p_{n+s}(x)| ≤ φ(s)  for all 1 ≤ n, s < ∞.

Hence,

P(A | θ_{n+s} = x)/P(A) = ∫ P(A | θ_n = z, θ_{n+s} = x) p_{n+s|n}(x|z) p_n(z) dz / {P(A) p_{n+s}(x)}
= ∫ p_n(z|A) p_{n,n+s}(z, x)/{p_n(z) p_{n+s}(x)} dz
≤ ∫ p_n(z|A) [1 + φ(s)/{p_n(z) p_{n+s}(x)}] dz
= 1 + φ(s) ∫ p_n(z|A)/{p_n(z) p_{n+s}(x)} dz,

where p_{t,t'}(z, x) denotes the joint density of θ_t and θ_{t'} at the states z and x, p_{t'|t}(x|z) denotes the conditional density of θ_{t'} given θ_t, and p_t(z|A) denotes the conditional density of θ_t given a Borel set A. It follows that

P(A ∩ B) ≤ P(A) P(B) + φ(s) P(A) ∫∫ {p_n(z|A)/p_n(z)} P(B | θ_{n+s} = x) dz dx
= P(A) P(B) + φ(s) {∫ P(A | θ_n = z) dz} {∫ P(B | θ_{n+s} = x) dx}.

Proceeding by the same argument, we may also obtain

P(A ∩ B) ≥ P(A) P(B) − φ(s) {∫ P(A | θ_n = z) dz} {∫ P(B | θ_{n+s} = x) dx}.

So

|P(A ∩ B) − P(A) P(B)| ≤ φ*(s),

where

φ*(s) = φ(s) {∫ P(A | θ_n = z) dz} {∫ P(B | θ_{n+s} = x) dx} → 0 as s → ∞.

The proof of the remaining parts of the theorem is an application of theorems appearing in Chapter 3 of Stout (1974) and Chapter 4 of Billingsley (1968), so the details are omitted. □

Now we prove Theorem 7.1.
Under some regularity conditions, √n(β̂ − β) can be approximated by

√n(β̂ − β) ≈ {−(1/n) ∂ψ(β)/∂β^T}^{−1} (1/√n) ψ(β),

where

−(1/n) ∂ψ(β)/∂β^T = (1/n) Σ_{t=1}^n x_t a_t^{2−p} x_t^T m*_t + (1/n) Σ_{t=1}^n x_t a_t^{2−p} ∂m*_t/∂β^T − (1 − p)(1/n) Σ_{t=1}^n x_t a_t^{1−p} x_t^T (Y_t − a_t m*_t).   (7.7.2)

From the definition of the Kalman smoother, denoting H_t = (h_{t1}, …, h_{tn}) = Cov(θ_t, Y) Var^{−1}(Y), we obtain

m*_t − E(m*_t) = m*_t − μ = H_t (Y − E Y).   (7.7.3)

The first term on the right-hand side of equation (7.7.2) may be written as

(1/n) Σ_{t=1}^n x_t a_t^{2−p} x_t^T m*_t = (1/n) Σ_{t=1}^n x_t a_t^{2−p} x_t^T E(m*_t) + (1/n) Σ_{t=1}^n x_t a_t^{2−p} x_t^T (m*_t − E m*_t),

where, by (7.7.3) and Lemma 7.2,

(1/n) Σ_{t=1}^n x_t a_t^{2−p} x_t^T (m*_t − E m*_t) = (1/n) Σ_{j=1}^n {Σ_{t=1}^n h_{tj} x_t a_t^{2−p} x_t^T} (Y_j − E Y_j) = o_p(1),

provided that

Σ_{t=1}^n h_{tj} x_t a_t^{2−p} x_t^T = O(1) uniformly in j.   (7.7.4)

Similarly, the second term on the right-hand side of equation (7.7.2) can be evaluated as

(1/n) Σ_{t=1}^n x_t a_t^{2−p} ∂m*_t/∂β^T = (1/n) Σ_{t=1}^n x_t a_t^{2−p} D*_β(t) + o_p(1),

where the last equality follows from the chain rule applied to ∂m*_t/∂β together with Lemma 7.2, under a condition analogous to (7.7.4). Because (1/√n) Σ_{t=1}^n a_t^{1−p} x_t x_t^T (Y_t − a_t m*_t) = O_p(1), by the asymptotic normality proven below, the third term on the right-hand side of equation (7.7.2) is O_p(1/√n).

By now we have proved that equation (7.7.2) gives

−(1/n) ∂ψ(β)/∂β^T = (1/n) Σ_{t=1}^n a_t^{2−p} x_t {μ x_t + D*_β(t)}^T + o_p(1),

where D*_β(t) is given in (7.4.7). On the other hand, noting that

Y_t − a_t m*_t = Y_t − a_t μ − a_t H_t (Y − E Y) = (1 − a_t h_{tt})(Y_t − E Y_t) − a_t Σ_{j≠t} h_{tj} (Y_j − E Y_j) = Σ_{j=1}^n h̃_{tj} (Y_j − E Y_j),

where h̃_{tj} = 1 − a_t h_{tt} if j = t and h̃_{tj} = −a_t h_{tj} if j ≠ t, we have

(1/√n) ψ(β) = (1/√n) Σ_{t=1}^n Σ_{j=1}^n x_t a_t^{1−p} h̃_{tj} (Y_j − E Y_j).

By Lemma 7.2, (1/√n) ψ(β) is asymptotically normal with mean 0 and variance

lim_{n→∞} (1/n) Σ_{i,j} X A^{1−p} h̃_i r_Y(i − j) h̃_j^T A^{1−p} X^T,

where h̃_t = (h̃_{t1}, …, h̃_{tn})^T. By (7.4.6), the asymptotic variance may also be written as

lim_{n→∞} (1/n) X A^{1−p} Var(Y − A m*) A^{1−p} X^T,

where Y = (Y_1,
…, Y_n)^T and m* = (m*_1, …, m*_n)^T. Summarizing the above arguments, we finally conclude that √n(β̂ − β) is asymptotically normal with mean 0 and variance matrix given by

n {Σ_{t=1}^n a_t^{2−p} x_t (μ x_t + D*_β(t))^T}^{−1} {X A^{1−p} Var(Y − A m*) A^{1−p} X^T} {Σ_{t=1}^n a_t^{2−p} (μ x_t + D*_β(t)) x_t^T}^{−1},

which may also be written as lim_{n→∞} n S^{−1}(β) V(β) S^{−T}(β). The proof of the asymptotic normality is completed. □

Chapter 8

State space models with non-stationary latent process

In this chapter, we present a class of state space models structured as follows: the latent process is assumed to be a Markov process including time-varying covariates, and the observations are conditionally independent given the latent process, over time as well as across the components of the response vector. The class of models generalizes the state space models discussed in the previous chapter in two respects: (1) the models allow time-varying covariates to enter the latent process, which results in a non-stationary latent process, and (2) the models deal with multivariate time series data, where the correlations over both time and components are introduced by the latent process. The models are inspired by the analysis of the Prince George data on the numbers of emergency room visits for four categories of respiratory diseases; refer to Chapter 10 for details on these data. This provides a fully parametric alternative to the quasi-likelihood approach of Liang and Zeger (1986). Following the KEE method presented in Chapter 7, we establish estimation of the regression parameters in the present chapter, and we also propose the analysis of residuals from both the observation and latent processes for model diagnosis.

8.1 Introduction

Let Y_t be a d-dimensional response vector such that the components of Y_t (called categories) all reflect the same underlying tendency θ_t, say, but where the categories may not
all have the same distribution. We assume that the latent process θ_t is a Markov process including time-varying covariates, and that the observations are conditionally independent given the latent process, both across categories and over time, giving a realistic correlation structure for many types of data. We consider a log-linear regression model for the marginal means E(Y_t). These state space models generalize the class of state space models with stationary latent process proposed in Chapter 7 in two respects: (1) they allow the latent process to include time-varying covariates, and (2) they deal with multivariate time series data.

The two types of covariates, entering either via the latent process or via the observation model, are called long-term and short-term covariates, respectively; the terminology derives from the fact that, for the long-term covariates, the Markov structure of the latent process creates a carry-over effect, which to some extent obviates the need for lagged covariates. Another important feature of these models is that the latent process naturally introduces both the correlation over time within each time series and the correlation across categories among all time series, giving a more realistic model setting for multivariate longitudinal data.

Additionally, as in the previous chapter, the models are based on distributions in the class of Tweedie exponential dispersion models (see Section 2.3). This accommodates a wide variety of data, including discrete, continuous and mixed data. The model provides a fully parametric alternative to the quasi-likelihood models of Liang and Zeger (1986) (cf. Diggle, Liang and Zeger, 1994). However, the Tweedie models all have unbounded support, which in particular excludes binary data from our method.
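The carry-over effect of a long-term covariate can be illustrated numerically. In the model of Section 8.2, the transition mean of the latent process is b_t θ_{t−1} with b_t = exp(Δz_t^T β), so E(θ_t) = exp(z_t^T β + γ): a step increase in z raises every subsequent latent mean, without any lagged covariates. A minimal sketch for a scalar covariate (function name illustrative, not from the thesis):

```python
import math

def latent_mean_path(z, beta, gamma):
    """Mean of the latent process, E(theta_t) = exp(z_t * beta + gamma),
    built up through the increments b_t = exp((z_t - z_{t-1}) * beta),
    as in the Markov transition E(theta_t | theta_{t-1}) = b_t * theta_{t-1}."""
    level = math.exp(gamma)   # theta_0 mean, from the constant covariate
    z_prev = 0.0              # convention z_0 = 0
    means = []
    for z_t in z:
        b_t = math.exp((z_t - z_prev) * beta)
        level *= b_t          # carry-over: the new level persists
        means.append(level)
        z_prev = z_t
    return means
```

With z = (0, 0, 1, 1, 1), a one-unit step at t = 3 multiplies all later means by exp(β), which is the carry-over effect described above.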
The class of state space models proposed in the present chapter is inspired by the special case of a Poisson response vector and a gamma latent process, which is used for the analysis of the Prince George data in Chapter 10, where the relation between ambient air pollution and emergency room visits for respiratory diseases is discussed. In that analysis the air pollution variables are taken as long-term covariates, the latent process represents the air pollution's current respiratory morbidity potential, and the short-term variables, such as temperature and humidity, affect the rate at which this potential morbidity produces respiratory disease. Note that in this case the categories all have the same marginal distribution.

As with the estimation method used in the previous chapter for the models with stationary latent process, we also centre our estimation approach for the state space models with non-stationary latent process on the Kalman estimating equation (KEE), corresponding to the EM algorithm with the E-step approximated by the Kalman smoother. We also use a Newton scoring algorithm in the M-step, which is an adaptation of Fisher's scoring method to estimating equations. The dispersion parameters are estimated by a bias-corrected version of the Pearson estimator. We also consider the analysis of residuals from both the latent and observed parts of the model.

8.2 Models

Consider a d-dimensional vector of observations Y_t recorded at equally spaced times t = 1, …, n, and denote the full data vector by Y = (Y_1^T, …, Y_n^T)^T. Let θ_t be a univariate latent process and denote the vector of θs by θ = (θ_0, θ_1, …, θ_n)^T, where θ_0 is an initializing variable. We assume that the nd components of Y are conditionally independent given θ, and that the conditional distribution of Y_t given θ depends only on θ_t. Let x_t ∈ R^k and z_t ∈ R^l be two separate vectors of time-varying covariates,
containing respectively short-term and long-term covariates, neither of which contains the constant covariate. The conditional distribution of the ith category of Y_t given θ_t is assumed to follow a Tweedie exponential dispersion model,

Y_it | θ_t ~ Tw_{p_i}(a_it θ_t, ν_i²/θ_t^{p_i−1}),   (8.2.1)

for i = 1, …, d, t = 1, …, n, where

a_it = exp(x_t^T α_i).   (8.2.2)

The short-term covariates represent modulating factors that have an immediate effect on Y_it relative to the value of θ_t. Note that the corresponding regression parameters α_i ∈ R^k may vary among the d categories. The shape parameters p_i of the Tweedie model, which must satisfy p_i ≥ 1 or p_i ≤ 0, are known, but may vary from category to category, allowing each category to have a different distribution. The parameters ν_i² in (8.2.1) are dispersion parameters, so that even in the case where all the p_i are equal, the d categories may have different dispersions. Note that p_i ≥ 1 corresponds to non-negative distributions, while p_i ≤ 0 corresponds to distributions supported on the whole real line (cf. Section 2.3). In particular, p_i = 0 gives the normal distribution, p_i = 1 gives the Poisson distribution, 1 < p_i < 2 gives the compound Poisson distributions with a positive probability at zero, p_i = 2 is the gamma distribution and p_i = 3 is the inverse Gaussian distribution. The Tweedie class hence accommodates a considerable range of distributions, including continuous, discrete and mixed ones. In the Poisson case (p_i = 1), (8.2.1) reduces to

Y_it | θ_t ~ Tw_1(a_it θ_t, ν_i²) = ν_i² Po(a_it θ_t/ν_i²),

so in this case we take ν_i² = 1 in all formulas in the following. In (8.2.1) the conditional expectation is the first argument a_it θ_t, and the conditional variance is Var(Y_it | θ_t) = ν_i² a_it^{p_i} θ_t. Hence θ_t may be interpreted as the "intensity" of the observation Y_t, being modified for the ith category by the covariates x_t via a_it.
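For the two Tweedie cases used most heavily below, p_i = 1 (Poisson) and p_i = 2 (gamma), conditional draws of Y_it given θ_t, with mean a_it θ_t and variance ν_i² a_it^{p_i} θ_t, are easy to generate. A sketch under those assumptions (the text fixes ν² = 1 in the Poisson case; names are illustrative):

```python
import math
import random

def draw_conditional(p, a, theta, nu2, rng):
    """Draw Y | theta with mean a*theta and variance nu2 * a**p * theta,
    for the two Tweedie cases p = 1 (Poisson, nu2 = 1 assumed) and
    p = 2 (gamma)."""
    mean = a * theta
    if p == 1:
        # Poisson(a*theta) by inversion (sequential search)
        u = rng.random()
        k, prob, cum = 0, math.exp(-mean), math.exp(-mean)
        while u > cum:
            k += 1
            prob *= mean / k
            cum += prob
        return k
    if p == 2:
        # gamma with shape theta/nu2 and scale nu2*a:
        # mean = a*theta, variance = nu2 * a**2 * theta
        return rng.gammavariate(theta / nu2, nu2 * a)
    raise ValueError("only the cases p = 1 and p = 2 are sketched here")
```

The gamma parameterization follows from matching the first two conditional moments; the compound Poisson case 1 < p < 2 can be simulated as a Poisson sum of gammas but is omitted here.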
The latent process of the state space model is assumed to be an exponential dispersion model Markov process, defined by the transition distribution

θ_t | θ_{t−1} ~ Tw_q(b_t θ_{t−1}, σ²/θ_{t−1}^{q−1})   (8.2.3)

for t = 1, …, n, where the parameter q is known and satisfies q ≥ 2, making the latent process positive. In particular, this excludes the possibility of a normal distribution for the latent process. The conditional expectation is b_t θ_{t−1} and the conditional variance is

Var(θ_t | θ_{t−1}) = σ² b_t^q θ_{t−1}.

The latent process may be interpreted as a random walk with drift, and is hence non-stationary. The parameter σ² in (8.2.3) is a dispersion parameter, and b_t depends on the long-term covariates z_t via their increments Δz_t = z_t − z_{t−1} as follows,

b_t = exp(Δz_t^T β),   (8.2.4)

β ∈ R^l being a regression parameter. For convenience, we assume from now on that z_0 = 0, which may be obtained by subtracting z_0 from all z_t.

Note that the Markov structure of the latent process produces a carry-over effect, such that an increase in the level of z at a given time t may have an effect on subsequent time points. An important characteristic of long-term covariates is that they have the same effect on all d categories of Y_t, in contrast to short-term covariates, whose effects may vary across categories.

For a single series of observations Y, we take θ_0 = g = exp(γ) to be degenerate and equal to the exponential of the constant covariate. The means of the latent and observed processes are log-linear in the covariates, as shown by

E(θ_t) = τ_t = g b_1 ⋯ b_t = exp(z_t^T β + γ)

and

E(Y_it) = exp(x_t^T α_i + z_t^T β + γ).

The marginal variance of the observed vector Y_t consists of two terms,

Var(Y_t) = Λ_t τ_t + a_t a_t^T Var(θ_t),

where Λ_t = diag(ν_1² a_{1t}^{p_1}, …, ν_d² a_{dt}^{p_d}), a_t = (a_{1t}, …, a_{dt})^T and Var(θ_t) is given in (8.2.7) below.
The second term shows overdispersion relative to the model (8.2.1), and shows that the correlation between categories is positive. The covariance between two observation vectors separated by a lag of s > 0 is

Cov(Y_t, Y_{t+s}) = a_t a_{t+s}^T b_{t+1} ⋯ b_{t+s} Var(θ_t).

Figure 8.1 shows simulations of the model for q = 2 (a gamma latent process) and four categories with p_1 = 0, p_2 = 1, p_3 = 1.5 and p_4 = 2. The four observation processes represent the following characteristics, respectively: symmetric continuous (normal observation process), discrete (Poisson observation process), positively skewed with a positive probability at zero (compound Poisson observation process) and positively skewed (gamma observation process).

Figure 8.1: Simulation of 100 observations from a process with four categories with q = 2, p_i = 0, 1, 1.5, 2, ν_i² = 1, σ² = 0.05 and θ_0 = 2, without any covariate effects. Observations are shown with solid lines, and the latent processes with dashed lines.

Now consider the case where h independent series of observations are available, indexed by j = 1, …, h, the jth series having length n_j. We let Y_ijt denote the observation for category i, series j and time t. The vector of observations and the latent value at time t for series j are denoted Y_jt and θ_jt, respectively, and the corresponding values of the time-varying covariates are denoted z_jt and x_jt. We assume that series j has an associated vector of covariates w_j that are constant over time, and which by convention contains the constant covariate. The covariates w_j represent characteristics specific to the jth observation series.
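The latent side of the simulation in Figure 8.1 (q = 2, no covariates, so b_t = 1) reduces to a gamma random walk. A sketch of that piece, with E(θ_t | θ_{t−1}) = θ_{t−1} and Var(θ_t | θ_{t−1}) = σ² θ_{t−1}, using the same values θ_0 = 2 and σ² = 0.05 as the figure (function name illustrative):

```python
import random

def simulate_latent(n, theta0=2.0, sigma2=0.05, seed=1):
    """Simulate the gamma latent random walk (q = 2, b_t = 1):
    theta_t | theta_{t-1} ~ Gamma(shape=theta_{t-1}/sigma2, scale=sigma2),
    so that E(theta_t | theta_{t-1}) = theta_{t-1} and
    Var(theta_t | theta_{t-1}) = sigma2 * theta_{t-1}."""
    rng = random.Random(seed)
    theta, path = theta0, []
    for _ in range(n):
        theta = rng.gammavariate(theta / sigma2, sigma2)
        path.append(theta)
    return path
```

Feeding each θ_t into conditional observation draws (normal, Poisson, compound Poisson, gamma for p_i = 0, 1, 1.5, 2) would reproduce the four panels of the figure.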
Let the conditional distribution of the data vector for series j given θ_j0 be given by the model (8.2.1)-(8.2.3) with parameters α_ij ∈ R^k and β_j ∈ R^l depending on j, but with σ² independent of j. Let the marginal distribution of θ_j0 be

θ_j0 ~ Tw_r(g_j, ω²),   (8.2.5)

with

g_j = exp(w_j^T γ),   (8.2.6)

where γ ∈ R^m is a third regression parameter. The variance of θ_j0 is Var(θ_j0) = ω² g_j^r. The parameter ω² is a dispersion parameter, reflecting random variation between series, whereas w_j reflects systematic variation. The variance of θ_jt is

Var(θ_jt) = σ² φ_jt τ_jt + ω² τ_jt²   (8.2.7)

where

φ_jt = b_jt^{q−1} + b_jt b_{j,t−1}^{q−1} + ⋯ + b_jt b_{j,t−1} ⋯ b_{j2} b_{j1}^{q−1}.

In this random effects model the marginal expectations of the observations are

E(Y_ijt) = exp(x_jt^T α_ij + z_jt^T β_j + w_j^T γ).

Note that if we interpret Tw_r(g, 0) as the degenerate distribution at g, taking ω² = 0 in (8.2.5) gives the special case of the general model (8.2.1)-(8.2.6) in which θ_j0 is constant and equal to g_j.

8.3 Kalman filter and smoother

We now consider the Kalman filter and smoother for the model (8.2.1)-(8.2.6). For notational convenience we consider a single series of observations, suppressing the index j from the notation. The Kalman filter predicts a future observation Y_{t+1} and the current value of the latent process θ_t based on the past and current observations Y_1, …, Y_t. The smoother predicts the latent process from the total vector of observations Y_1, …, Y_n. The innovations for the latent process are given by

ξ_t = θ_t − E(θ_t | θ^{t−1}) = θ_t − b_t θ_{t−1}   (8.3.1)

and for the observed process

φ_t = Y_t − E(Y_t | θ^t) = Y_t − a_t θ_t,   (8.3.2)

where θ^t = (θ_0, …, θ_t)^T. Using the innovations, the model may be put on additive form as follows,

Y_t = a_t θ_t + φ_t,  θ_t = b_t θ_{t−1} + ξ_t,

where φ_t and θ_t are uncorrelated, and so are ξ_t and θ_{t−1}. All 2n innovations are uncorrelated, by arguments similar to those given in Section 7.3.1.
Hence the model is structured like an ordinary Gaussian state space model. The variances of the innovations are

Var(ξ_t) = E(ξ_t²) = E{Var(θ_t | θ^{t−1})} = σ² b_t^{q−1} τ_t

and

Var(φ_t) = E(φ_t φ_t^T) = Λ_t τ_t.   (8.3.3)

With the notation introduced in Chapter 6, we may either use the standard results for ordinary Gaussian state space models (see, e.g., Harvey, 1981) or apply the results developed in Chapter 6 under a general setup to obtain the recursive formulas for prediction, filtering and smoothing. Let

θ_{t−1} | Y^{t−1} ~ [m_{t−1}; C_{t−1}]  and  D_t = b_t C_{t−1} + σ² b_t^{q−1} τ_{t−1},

where Y^{t−1} = (Y_1, …, Y_{t−1}). The prediction is then given by

Y_t | Y^{t−1} ~ [f_t; Q_t],   (8.3.4)

where

f_t = a_t b_t m_{t−1},  Q_t = b_t D_t a_t a_t^T + Λ_t τ_t.

The filtered values of the latent process are, for t = 1, …, n,

θ_t | Y^t ~ [m_t; C_t],   (8.3.5)

starting with m_0 = g and C_0 = ω² g^r, where

m_t = b_t {m_{t−1} + D_t a_t^T Q_t^{−1} (Y_t − f_t)}   (8.3.6)

and

C_t = b_t D_t (1 − b_t D_t a_t^T Q_t^{−1} a_t).

Given all n observations, the smoothed version of the latent process is given by the following backward recursion for t = n − 1, …, 0,

θ_t | Y^n ~ [m*_t; C*_t],   (8.3.7)

starting with m*_n = m_n and C*_n = C_n, where

m*_t = m_t + (C_t / D_{t+1}) (m*_{t+1} − b_{t+1} m_t).
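For a single series with one category (d = 1), the recursions (8.3.4)-(8.3.7) can be written out directly. The following sketch follows the formulas above; the smoother-variance recursion uses the standard fixed-interval form, which the text leaves implicit, and all names are illustrative:

```python
def kalman_filter_smoother(y, a, b, lam, sigma2, q, g, omega2, r):
    """Univariate (d = 1) Kalman filter (8.3.4)-(8.3.6) and smoother (8.3.7).
    y, a, b, lam hold Y_t, a_t, b_t and Lambda_t = nu^2 * a_t^p for t = 1..n;
    tau_t = E(theta_t) = g * b_1 * ... * b_t."""
    n = len(y)
    m, C, D = [g], [omega2 * g ** r], [None]     # index 0..n; D[0] unused
    tau = g                                      # tau_0
    for t in range(1, n + 1):
        at, bt, lt = a[t - 1], b[t - 1], lam[t - 1]
        Dt = bt * C[t - 1] + sigma2 * bt ** (q - 1) * tau   # D_t (tau = tau_{t-1})
        tau *= bt                                           # tau_t
        f = at * bt * m[t - 1]                              # prediction f_t
        Q = bt * Dt * at * at + lt * tau                    # prediction variance Q_t
        gain = bt * Dt * at / Q
        m.append(bt * m[t - 1] + gain * (y[t - 1] - f))     # filtered mean m_t
        C.append(bt * Dt * (1.0 - bt * Dt * at * at / Q))   # filtered variance C_t
        D.append(Dt)
    ms, Cs = m[:], C[:]                                     # start smoother at t = n
    for t in range(n - 1, -1, -1):
        J = C[t] / D[t + 1]                                 # J_t = C_t / D_{t+1}
        ms[t] = m[t] + J * (ms[t + 1] - b[t] * m[t])        # b[t] holds b_{t+1}
        Cs[t] = C[t] + J * J * (Cs[t + 1] - b[t] * D[t + 1])
    return m, C, ms, Cs
```

The smoothed variances never exceed the filtered ones, reflecting the extra information used by the smoother.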
Chapter 8. State space models with non-stationary latent process

8.4 Parameter estimation

We now consider parameter estimation for the model defined in Section 8.2 based on h independent series of observations, corresponding to N = n_1 + ⋯ + n_h vectors of observations Y_jt. The unknown parameters of this model are α_ij, β_j, γ, ν_i², σ² and ω². Using a slightly different notation from Sections 8.1 and 8.3, we now denote the dN × 1 data vector by

Y = (Y_11^T, …, Y_{1n_1}^T, …, Y_{h1}^T, …, Y_{hn_h}^T)^T,

and we define the vector of smoothed values by

m* = (m*_{10}, …, m*_{1n_1}, …, m*_{h0}, …, m*_{hn_h})^T.

We also define the two parameter vectors

α = (α_{11}^T, …, α_{dh}^T)^T,  β = (β_1^T, …, β_h^T)^T.

Using arguments similar to those in Section 7.3.2, the mean squared error matrix C*, defined by C* = E{(θ − m*)(θ − m*)^T}, has diagonal elements C*_t and off-diagonal elements given by

C*_{t,t+s} = C*_{t+s} ∏_{u=t}^{t+s−1} (C_u / D_{u+1}).

8.4.1 Kalman estimating equation

Following the ideas presented in Chapter 7, we estimate the regression parameters from the Kalman estimating equation. This estimation method corresponds to replacing the E-step of the EM algorithm by the Kalman smoother, but instead of the EM algorithm we use the Newton scoring algorithm.

Let us consider estimation of α, β and γ for known values of the dispersion parameters ν², σ² and ω². We define the estimate η̂ as the solution to the Kalman estimating equation ψ(η) = 0. Here η = (α^T, β^T, γ^T)^T, and ψ is the vector with components

ψ_1(η) = X^T K_a^{−1} (Y − A m*),   (8.4.1)

ψ_2(η) = ΔZ^T K_b^{−1} B m*,   (8.4.2)

and

ψ_3(η) = W^T K_g^{−1} (m*_0 − g),   (8.4.3)

where A and B are suitably defined matrices such that the elements of Y − Am* are Y_t − a_t m*_t and those of Bm* are m*_t − b_t m*_{t−1}. The matrices X, ΔZ and W are the design matrices corresponding to the generalized linear models (8.2.2), (8.2.4) and (8.2.6), and K_a, K_b and K_g and the vectors g and m*_0 are defined by

K_a = diag{ν_i² a_{ijt}^{p_i−1}} (in the order of Y),

K_b = diag{b_{11}^{q−1}, …, b_{1n_1}^{q−1}, …, b_{h1}^{q−1}, …, b_{hn_h}^{q−1}},

K_g = diag{g_1^{r−1}, …, g_h^{r−1}},

and

g = (g_1, …, g_h)^T,  m*_0 = (m*_{10}, …, m*_{h0})^T.

The estimating equation ψ(η) = 0 is unbiased, and the asymptotic standard errors of the estimator η̂ may be calculated from the inverse of the Godambe information matrix,

J(η) = S^T(η) V^{−1}(η) S(η),

where S(η) = E{∂ψ(η)/∂η^T} is the sensitivity matrix,
and where V(η) = Var{ψ(η)} is the variability matrix. Both these matrices are calculated in Section 8.5. If the dispersion parameters are unknown, this gives approximately correct asymptotic standard errors provided ψ depends little on ν_i², σ² and ω². An analogue of Wald's test is available for testing the joint significance of a subset of regression parameters. The test statistic is defined as

W = η̂_1^T {J^{11}(η̂)}^{−1} η̂_1,

where η_1 is the subvector whose significance we want to test, and J^{11}(η) is the corresponding block of the asymptotic covariance matrix of η̂. Asymptotically, this statistic follows a χ²(u) distribution, where u is the size of the subvector η_1.

The Newton scoring algorithm, a parallel of Fisher's scoring method, is defined as the Newton algorithm applied to the equation ψ(η) = 0, but with the derivative of ψ replaced by its expectation S(η). The resulting algorithm gives the following updated value for η,

η* = η − S^{−1}(η) ψ(η).   (8.4.4)

An advantage of this algorithm is that the calculation of ψ(η) and S(η) may be done recursively, in parallel with the calculation of the Kalman smoother, as shown in Section 8.5. To start the algorithm, we first estimate the θ-process for a given series by the average of the d components of Y_jt. If d = 1, some smoothing is required, using for example a moving average of the series. Initial parameter estimates may then be obtained by fitting the three generalized linear models mentioned above.

8.4.2 Estimation of dispersion parameters

Based on the augmented data vector (Y, θ), simple estimates of the parameters ν_i², σ² and ω² are Pearson estimates, based on weighted sums of the squared innovations, estimated from the Kalman smoother. These estimates would be biased downwards because of the substitution of the smoother. However, the bias may be expressed in terms of the mean squared error matrix C* of the smoother, as explained in the following, leading to the following adjusted Pearson estimates for the dispersion parameters.
Note that E(ξ_jt²) = σ² b_jt^{q−1} τ_jt and, since ξ̂*_jt = m*_jt − b_jt m*_{j,t−1} and ξ_jt − ξ̂*_jt are uncorrelated, that

E(ξ_jt²) = E(ξ̂*_jt²) + E{(ξ_jt − ξ̂*_jt)²}.

Straightforward calculations (cf. Jørgensen et al., 1995) yield

E{(ξ_jt − ξ̂*_jt)²} = C*_jt + b_jt² C*_{j,t−1} − 2 b_jt C_{j,t−1} C*_jt / D_jt.

Hence we obtain

σ² = {E(ξ̂*_jt²) + C*_jt + b_jt² C*_{j,t−1} − 2 b_jt C_{j,t−1} C*_jt / D_jt} / (b_jt^{q−1} τ_jt),

and an unbiased estimate of σ² is

σ̂² = (1/N) Σ_{j=1}^h Σ_{t=1}^{n_j} ξ̂*_jt² / (b_jt^{q−1} τ_jt) + (1/N) Σ_{j=1}^h Σ_{t=1}^{n_j} {C*_jt + b_jt² C*_{j,t−1} − 2 b_jt C_{j,t−1} C*_jt / D_jt} / (b_jt^{q−1} τ_jt),   (8.4.5)

where the last term corrects for the bias introduced by using the smoother for the latent process. Similarly, estimates of ν_i² may be obtained from

ν_i² a_ijt^{p_i} τ_jt = E(φ_ijt²) = E(φ̂*_ijt²) + a_ijt² C*_jt,

where φ̂*_jt = Y_jt − a_jt m*_jt. This leads to the corresponding adjusted estimate of ν_i²,

ν̂_i² = (1/N) Σ_{j=1}^h Σ_{t=1}^{n_j} φ̂*_ijt² / (τ_jt a_ijt^{p_i}) + (1/N) Σ_{j=1}^h Σ_{t=1}^{n_j} a_ijt^{2−p_i} C*_jt / τ_jt.   (8.4.6)

Based on the same arguments, the adjusted estimator of the random effects variance is

ω̂² = (1/h) Σ_{j=1}^h (m*_{j0} − g_j)² / g_j^r + (1/h) Σ_{j=1}^h C*_{j0} / g_j^r.   (8.4.7)

When the dispersion parameters are unknown, the Newton scoring algorithm alternates between updating η via (8.4.4) and updating ν_i², σ² and ω² via (8.4.5), (8.4.6) and (8.4.7).

8.5 Derivation of the Godambe information matrix

This section presents the calculations of the sensitivity matrix S and the variability matrix V entering the Godambe information matrix J = S^T V^{−1} S and the Newton scoring algorithm.

8.5.1 Some moment calculations

Since ê*_t = e_t − a_t (m*_t − θ_t), where a_t θ_t is the conditional expectation of Y_t given θ_t (so that e_t = Y_t − a_t θ_t and ê*_t = Y_t − a_t m*_t), we have for all s, t ≥ 0

Cov(e_t, e_s) = Cov(ê*_t, ê*_s) + a_t a_s^T C*_{ts}.

According to the results in Section 8.3, the e_t are mutually uncorrelated, and we hence obtain

E(ê*_t ê*_s^T) = Λ_t τ_t − a_t a_t^T C*_t  for t = s,  and  −a_t a_s^T C*_{ts}  for t ≠ s.   (8.5.1)

A special case of this result is Cov(ê*_t, m*_0) = −a_t C*_{t0}. From Section 8.3 we know that Cov(e_t, ξ_s) = 0, and using again that θ_t = m*_t − (m*_t − θ_t), we obtain

Cov(ê*_t, ξ̂*_s) = a_t Cov(m*_t − θ_t, m*_s − θ_s − b_s (m*_{s−1} − θ_{s−1})) = a_t C*_{ts} − a_t b_s C*_{t,s−1}.   (8.5.2)
To obtain the covariance matrix of ξ̂* = (ξ̂*_1, …, ξ̂*_n), we note that the innovation ξ̂*_t may be written as a sum of two uncorrelated terms,

ξ̂*_t = ξ_t + {(m*_t − θ_t) − b_t (m*_{t−1} − θ_{t−1})}.

The innovations ξ_t are mutually uncorrelated with variance σ² b_t^{q−1} τ_t, and since Var(θ − m*) = C*, we find

E(ξ̂*_t ξ̂*_{t+s}) = σ² b_t^{q−1} τ_t − C*_t + 2 b_t C*_{t,t−1} − b_t² C*_{t−1}  for s = 0,   (8.5.3)

and

E(ξ̂*_t ξ̂*_{t+s}) = −C*_{t,t+s} + b_t C*_{t−1,t+s} + b_{t+s} C*_{t,t+s−1} − b_t b_{t+s} C*_{t−1,t+s−1}  for s > 0.

A special case of this formula is

E{(m*_0 − τ_0) ξ̂*_t} = −C*_{0t} + b_t C*_{0,t−1}.

8.5.2 Derivatives of the Kalman filter and smoother

In the calculation of the sensitivity matrix we need the expected values of the partial derivatives of m_t and m*_t with respect to the parameters α, β and γ, denoted D_α(t) = E{∂m_t/∂α^T}, D*_α(t) = E{∂m*_t/∂α^T}, and so on, which we now calculate. For the Kalman filter, we have from (8.2.6) and (8.3.6) that m_0 = g = exp(w^T γ) and

m_t = b_t {m_{t−1} + D_t a_t^T Q_t^{−1} (Y_t − f_t)},

where a_t is a function of α, b_t is a function of β, and D_t and Q_t are functions of α and β. Since E(Y_t − f_t) = 0, we obtain the recursion

D_α(t) = b_t (1 − b_t D_t a_t^T Q_t^{−1} a_t) D_α(t−1) − b_t² D_t τ_{t−1} a_t^T Q_t^{−1} (∂a_t/∂α),

starting with D_α(0) = 0. Note that

∂a_t/∂α = blockdiag(x_t^T a_{1t}, …, x_t^T a_{dt}).

Analogously, we obtain

D_β(t) = b_t (1 − b_t D_t a_t^T Q_t^{−1} a_t) {τ_{t−1} Δz_t^T + D_β(t−1)},

starting with D_β(0) = 0, and

D_γ(t) = b_t (1 − b_t D_t a_t^T Q_t^{−1} a_t) D_γ(t−1),

starting with D_γ(0) = g w^T. For the smoother, we immediately obtain the recursions

D*_α(t) = D_α(t) + (C_t/D_{t+1}) {D*_α(t+1) − b_{t+1} D_α(t)},

D*_β(t) = D_β(t) + (C_t/D_{t+1}) {D*_β(t+1) − b_{t+1} D_β(t) − τ_{t+1} Δz_{t+1}^T}

and

D*_γ(t) = D_γ(t) + (C_t/D_{t+1}) {D*_γ(t+1) − b_{t+1} D_γ(t)},

all starting for t = n with D*_α(n) = D_α(n), D*_β(n) = D_β(n) and D*_γ(n) = D_γ(n).

8.5.3 Variability matrix

We now switch back to the notation of Section 8.4. By definition, the variability matrix of the estimating function ψ is

V = ( V_11  V_12  V_13
      V_21  V_22  V_23
      V_31  V_32  V_33 ),

where V_ij = E{ψ_i ψ_j^T}.
From Section 8.5.1 we have the variance-covariance structure of Y − Am*, Bm* and m*_0. Straightforward calculations give

V_11 = X^T K_a^{−1} Var(Y − Am*) K_a^{−1} X,

V_12 = X^T K_a^{−1} Cov(Y − Am*, Bm*) K_b^{−1} ΔZ,

V_13 = X^T K_a^{−1} Cov(Y − Am*, m*_0) K_g^{−1} W,

V_22 = ΔZ^T K_b^{−1} Var(Bm*) K_b^{−1} ΔZ,

V_23 = ΔZ^T K_b^{−1} Cov(Bm*, m*_0) K_g^{−1} W,

and

V_33 = W^T K_g^{−1} Var(m*_0) K_g^{−1} W.

The variances and covariances involved in these expressions may be calculated from the results of Section 8.5.1 by noting that Y − Am* is the vector of predicted innovations ê*_t and Bm* is the vector of predicted innovations ξ̂*_t.

8.5.4 Sensitivity matrix

As in the previous section, we may write the sensitivity matrix of ψ in the form

S = ( S_11  S_12  S_13
      S_21  S_22  S_23
      S_31  S_32  S_33 ),

where S_ij = E{∂ψ_i/∂η_j}, with j = 1, 2, 3 referring to α, β and γ, respectively. Using the derivatives of the Kalman smoother from Section 8.5.2, we obtain

S_11 = −X^T K_a^{−1} {(∂A/∂α) τ + A D*_α},

where D*_α denotes the vectors D*_α(t) stacked in the appropriate order, and where ∂A/∂α is built from the derivatives ∂a_it/∂α. The remaining blocks of S are, using similar notation,

S_12 = −X^T K_a^{−1} A D*_β,  S_13 = −X^T K_a^{−1} A D*_γ,

S_21 = ΔZ^T K_b^{−1} B D*_α,  S_22 = ΔZ^T K_b^{−1} B D*_β,  S_23 = ΔZ^T K_b^{−1} B D*_γ,

S_31 = W^T K_g^{−1} D*_{0α},  S_32 = W^T K_g^{−1} D*_{0β},  S_33 = W^T K_g^{−1} (D*_{0γ} − ∂g/∂γ).

Here a subscript 0 refers to differentiation of m*_0.

8.6 Residual analysis

Residual analysis for both the observed and unobserved parts of the model is an important ingredient of our approach. The basic idea is to use plots of standardized residuals (i.e. residuals divided by their standard deviations) in much the same way as in generalized linear models, in order to check the distributional assumptions and the regression part of the model. We may also check the correlation structure of the model by means of methods from time series analysis. We may consider each series separately, or we may combine residuals from several series.
The main type of residuals are the conditional residuals, defined as the predicted values of the innovations $\varphi$ and $\xi$ based on either the Kalman filter or the smoother. All residuals have mean 0. The properties derived in the following do not take into account the effect of substituting parameter estimates. The predicted values of the innovations based on the Kalman filter are
$$\hat\varphi_{jt} = Y_{jt} - f_{jt} \qquad \text{and} \qquad \hat\xi_{jt} = m_{jt} - b_{jt}m_{j,t-1}.$$
The prediction errors $\hat\varphi_{jt}$ are mutually uncorrelated over time, and since
$$\hat\xi_{jt} = b_{jt}\Lambda_{jt}\mathbf{a}_{jt}^TQ_{jt}^{-1}\hat\varphi_{jt},$$
so are the predicted innovations of the latent process. This property of the filter residuals $\hat\varphi_{jt}$ and $\hat\xi_{jt}$ makes them especially useful for residual plots. Moreover, $\hat\varphi_{jt}$ is uncorrelated with $\mathbf{f}_{jt}$, and $\hat\xi_{jt}$ is uncorrelated with $b_{jt}m_{j,t-1}$. The variances of the two sets of residuals are, respectively,
$$\mathrm{Var}(\hat\varphi_{jt}) = Q_{jt} \qquad \text{and} \qquad \mathrm{Var}(\hat\xi_{jt}) = b_{jt}D_{jt} - C_{jt}.$$
A disadvantage of the filter residuals is that their variances may be large for the first few observations of each series. This disadvantage is not shared by residuals based on the Kalman smoother, which in fact have smaller variances than the corresponding filter residuals. These residuals are
$$\hat\varphi_{jt}^* = Y_{jt} - \mathbf{a}_{jt}m_{jt}^* \qquad \text{and} \qquad \hat\xi_{jt}^* = m_{jt}^* - b_{jt}m_{j,t-1}^*.$$
However, the smoother residuals do not have the same uncorrelatedness properties as the filter residuals, making them less useful for some purposes. Their variances are
$$\mathrm{Var}(\hat\varphi_{jt}^*) = \Lambda_{jt}\tau_{jt} - \mathbf{a}_{jt}\mathbf{a}_{jt}^TC_{jt}^*$$
and
$$\mathrm{Var}(\hat\xi_{jt}^*) = \sigma^2b_{jt}\lambda_{j,t-1} - C_{jt}^* + 2b_{jt}C_{j,t,t-1}^* - b_{jt}^2C_{j,t-1}^*,$$
the latter being the $s = 0$ case of the innovation covariance formula of Section 8.5.1. Standardized residuals (having unit variance) are denoted $\tilde\xi_{jt}^*$, etc.; the residuals $\hat\varphi_{jt}^*$ are standardized componentwise. The correlation structure of the model is easy to check by means of the autocorrelation functions of the standardized filter residuals (the series are considered separately for each $j$).
We may also plot $\tilde\xi_{jt}$ against $m_{j,t-1}$ to check the Markov assumption for the latent process, and $\tilde\varphi_{jt}$ against $m_{j,t-1}$ to check the conditional independence of the counts over time. To check the assumption that the categories are conditionally independent given the latent process, we consider the empirical variance-covariance matrices of the vector of standardized prediction errors,
$$\frac{1}{n_j}\sum_{t=1}^{n_j} Q_{jt}^{-1/2}\,\hat\varphi_{jt}\hat\varphi_{jt}^T\,Q_{jt}^{-T/2}, \qquad (8.6.1)$$
whose expectations are all the $d \times d$ identity matrix. The off-diagonal elements have asymptotic standard errors $n_j^{-1/2}$. Note that the standardization depends on the version of the square-root matrix $Q^{1/2}$ chosen.

As in generalized linear models, we may check the form of the variance functions (and hence the distributional forms) by plotting the standardized residuals against the corresponding log fitted values. A "megaphone" shape would indicate that the chosen shape parameter in the corresponding variance function is incorrect. In the case of several series, we may use the standardized smoother residuals $m_{j0}^* - g_j$ for checking the form of the random-effects distribution. It is crucial to use the smoother residuals for this purpose because, by definition, $m_{j0} = g_j$, which is not random. Plots of standardized residuals against each covariate are useful for detecting non-linearity of the model; the smoother residuals are the best choice of residuals for this purpose. To check the log-link assumption, we plot $\log Y_{jt}$ against the log fitted values $\log(a_{jt}m_{jt}^*)$. This plot should show a linear relationship; curvature or another unusual shape would indicate inadequacy of the log-link assumption. Residual analysis may also help to determine whether a covariate is long-term or short-term.
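The empirical covariance check of equation (8.6.1) is easy to compute directly. A minimal sketch in Python follows; the function name and the crude standard-error screen are our own illustration, not part of the thesis:

```python
import numpy as np

def empirical_cov_check(phi, Q, se_mult=2.0):
    """Empirical variance-covariance matrix of standardized prediction
    errors, as in (8.6.1): (1/n) sum_t Q_t^{-1/2} phi_t phi_t' Q_t^{-T/2}.
    phi: (n, d) prediction errors; Q: (n, d, d) their covariance matrices.
    Returns the matrix and a crude screen of the off-diagonal elements
    against se_mult asymptotic standard errors n^{-1/2}."""
    n, d = phi.shape
    S = np.zeros((d, d))
    for t in range(n):
        # symmetric square root of Q_t via eigendecomposition
        w, V = np.linalg.eigh(Q[t])
        Qinv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
        z = Qinv_sqrt @ phi[t]
        S += np.outer(z, z)
    S /= n
    se = 1.0 / np.sqrt(n)                     # asymptotic s.e. of off-diagonals
    off = S[~np.eye(d, dtype=bool)]
    ok = bool(np.all(np.abs(off) < se_mult * se))
    return S, ok
```

Under the model assumptions the returned matrix should be close to the identity; off-diagonal elements well beyond a few multiples of $n^{-1/2}$ point to correlated categories.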
The basic idea is that a short-term covariate would show an association with the observation residuals ($\varphi$ or $\varphi^*$) if it had been incorrectly fitted as a long-term covariate, and similarly, a long-term covariate would show an association with the latent-process residuals ($\xi$ or $\xi^*$) if it had been incorrectly fitted as a short-term covariate.

Chapter 9

Simulation studies under Poisson-gamma models

In this chapter, Poisson-gamma models are used in simulation studies that address two issues: (1) the comparative efficiency of the KEE approach and the Monte Carlo EM (MCEM) algorithm, and (2) the utility of residual analysis for detecting misspecification between stationarity and non-stationarity of the latent process. For issue (1), our study is based on a one-dimensional observed Poisson process (with one category) and a non-stationary gamma latent process that includes a time-trend covariate. For the model with a stationary gamma latent process, a comparison has already been made on the basis of the analysis of the polio data (see Table 7.1); as shown in Chapter 7, the KEE approach gives estimates of the regression parameters that are competitive with the MCEM approach. We therefore decided to conduct the comparison in this chapter on the model with a non-stationary latent process. When time-varying covariates enter the latent process, the process becomes substantially non-stationary, which increases the complexity of the model and complicates the statistical inference. As seen in Chapter 8, different covariates may play different roles in statistical models, and arranging them on different levels of the model according to their nature is therefore a reasonable approach. The basic goal of the comparison is to understand the differences between the coefficient estimates, and the associated standard errors, given by the MCEM and KEE approaches.
As a matter of fact, Chapter 8 suggests using the Godambe information matrix as the asymptotic variance matrix from which the standard errors may be computed. However, this suggestion rests on an asymptotic result that has only been proven for independent observations (cf. Godambe, 1976). The comparison is therefore expected to provide some guidance for further application of the KEE approach, in particular as regards the calculation of standard errors. In addition, the derivation of the MCEM method presented in this chapter may be applied to a state space model whose latent process includes many covariates, rather than the single covariate (the time trend) considered in our simulation study.

The simulation study for issue (2) mainly concerns misspecification of the latent process. Whether a stationary or a non-stationary structure is incorporated in the latent process is clearly crucial for the modelling and hence for the analysis of the data. It is therefore important to examine the utility of the residual analyses developed in Chapters 7 and 8, respectively, for detecting misspecification of stationarity and non-stationarity of the latent process. We use the strategy of a case-and-control study; that is, we generate a data set from the model with a stationary latent process but fit it with the model with a non-stationary latent structure (and vice versa), and then apply the diagnostic tools of model checking to detect the inadequacy of such a switch. As shown in the simulation study, the residual analyses proposed in the previous chapters effectively detect the mistakes in the modelling structure and hence help assure appropriate model assumptions in the analysis of data.

9.1 Models

We use two types of Poisson-gamma models for our simulation studies in this chapter.
They are:

Model I: The same model as studied in Section 7.6.1, with a stationary latent process,
$$Y_t \mid \theta_t \sim \mathrm{Po}(\lambda_t\theta_t), \qquad t = 1, \ldots, n,$$
where $\lambda_t = \exp\{\mathbf{x}_t^T\eta\}$, $\mathbf{x}_t = (t, \cos(2\pi t/12), \sin(2\pi t/12), \cos(2\pi t/6), \sin(2\pi t/6))^T$ includes time-trend and seasonal-effects covariates, and $\eta \in \mathbb{R}^5$ is the parameter of interest. Note that the model uses slightly different notation than before, in order to distinguish it from Model II below. The stationary latent process is assumed to be
$$\theta_t = G_t\theta_{t-1} + \varepsilon_t, \qquad t = 1, \ldots, n, \qquad (9.1.1)$$
where the $G_t$ are i.i.d. random variables with beta distribution $\mathrm{Beta}\{\rho\lambda, (1-\rho)\lambda\}$, $\rho$ being the autocorrelation.

Model II: The following one-dimensional Poisson-gamma model is a special case of the models proposed in Chapter 8,
$$Y_t \mid \theta_t \sim \mathrm{Po}(a_t\theta_t), \qquad t = 1, \ldots, n,$$
where $a_t = \exp\{\mathbf{x}_t^T\alpha\}$, $\mathbf{x}_t = (\cos(2\pi t/12), \sin(2\pi t/12), \cos(2\pi t/6), \sin(2\pi t/6))^T$ includes only the seasonal-effects covariates, and $\alpha \in \mathbb{R}^4$. The latent process is the first-order Markov gamma process
$$\theta_t \mid \theta_{t-1} \sim \mathrm{Ga}(b\theta_{t-1}, \sigma^2/\theta_{t-1}), \qquad t = 1, \ldots, n, \qquad (9.1.2)$$
where $\theta_0$ is set to 1 and $b = e^\beta \in \mathbb{R}_+$. This allows the time trend (assumed to be a long-term covariate) to enter the latent process, and hence results in a non-stationary latent process. Both $\beta$ and $\alpha$ are parameters of interest in the model, and $\sigma^2$ is the dispersion parameter.

Note that in both models we rule out the intercept covariate because of model identification, in contrast to the analysis of the polio data in Section 7.6.1, where we imposed the condition $E(\theta_t) = 1$ instead. Also note that only Model II is used for the study of issue (1).

9.2 Comparison between MCEM and KEE

It follows from Model II that the conditional moment structures are given by
$$E(Y_t \mid \theta_t) = a_t\theta_t, \qquad \mathrm{Var}(Y_t \mid \theta_t) = a_t\theta_t,$$
and
$$E(\theta_t \mid \theta_{t-1}) = b\theta_{t-1}, \qquad \mathrm{Var}(\theta_t \mid \theta_{t-1}) = b^2\theta_{t-1}\sigma^2.$$
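Model II can be simulated directly from these conditional specifications. A minimal sketch, assuming a shape/scale gamma parametrization chosen by us to match the two conditional moments above (the function name and defaults are our own, not from the thesis):

```python
import numpy as np

def simulate_model2(n=168, alpha=(0.10, -0.20, 0.20, -0.05),
                    beta=0.005, sigma2=0.1, theta0=1.0, seed=1):
    """Simulate from Model II: Y_t | theta_t ~ Po(a_t theta_t), with
    seasonal a_t = exp(x_t' alpha) and a first-order Markov gamma latent
    process satisfying E(theta_t|theta_{t-1}) = b theta_{t-1} and
    Var(theta_t|theta_{t-1}) = sigma2 b^2 theta_{t-1}, b = exp(beta)."""
    rng = np.random.default_rng(seed)
    lam = 1.0 / sigma2
    b = np.exp(beta)
    t = np.arange(1, n + 1)
    X = np.column_stack([np.cos(2*np.pi*t/12), np.sin(2*np.pi*t/12),
                         np.cos(2*np.pi*t/6),  np.sin(2*np.pi*t/6)])
    a = np.exp(X @ np.asarray(alpha))
    theta = np.empty(n)
    prev = theta0
    for i in range(n):
        # shape lam*prev, scale b/lam: mean b*prev, variance b^2*prev/lam
        theta[i] = rng.gamma(shape=lam * prev, scale=b / lam)
        prev = theta[i]
    Y = rng.poisson(a * theta)
    return Y, theta, a
```

The shape $\lambda\theta_{t-1}$ and scale $b/\lambda$, with $\lambda = 1/\sigma^2$, reproduce exactly the conditional mean $b\theta_{t-1}$ and conditional variance $\sigma^2 b^2\theta_{t-1}$ displayed above.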
Conditionally on $\theta_0$, the marginal mean of the observed process is
$$\log E(Y_t) = \mathbf{x}_t^T\alpha + \beta t,$$
which is the same marginal model as in the stationary case, although the interpretations of the parameters are different. Let $\eta = (\alpha^T, \beta)^T$ be the vector of parameters to be estimated, and let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ and $\theta = (\theta_0, \ldots, \theta_n)^T$. Conditionally on the initial latent variable $\theta_0$, the full log-likelihood of the augmented data $(\mathbf{Y}, \theta)$ is
$$\ell(\eta) = \ell_{Y|\theta}(\alpha) + \ell_\theta(\beta), \qquad (9.2.1)$$
which shows that the parameter vector $\alpha$ and the parameter $\beta$ are L-independent. Apart from constant terms, the likelihood for $\alpha$ is
$$\ell(\alpha) = \ell_{Y|\theta}(\alpha) = \sum_{t=1}^n\{Y_t\log\theta_t + Y_t\log a_t - a_t\theta_t\} = \sum_{t=1}^n\{-\theta_t\exp(\mathbf{x}_t^T\alpha) + Y_t\log\theta_t + Y_t\mathbf{x}_t^T\alpha\}. \qquad (9.2.2)$$
Letting $\lambda = 1/\sigma^2$, the likelihood for $\beta$ is
$$\sum_{t=1}^n\{\theta_{t-1}\lambda\log\lambda - \beta\lambda\theta_{t-1} + \lambda\theta_{t-1}\log\theta_t - \log\theta_t - \lambda e^{-\beta}\theta_t - \log\Gamma(\lambda\theta_{t-1})\}$$
$$= \lambda(\log\lambda - \beta)\sum_{t=1}^n\theta_{t-1} + \lambda\sum_{t=1}^n\theta_{t-1}\log\theta_t - \sum_{t=1}^n\log\theta_t - \lambda e^{-\beta}\sum_{t=1}^n\theta_t - \sum_{t=1}^n\log\Gamma(\lambda\theta_{t-1}).$$
The EM algorithm is natural for estimating the parameter vector $\eta$, treating the latent process $\{\theta_t\}$ as a set of missing values. The real difficulty in implementing the EM algorithm is that the E-step cannot be computed directly and easily. An approximation to the E-step may be obtained using a sample average and the Monte Carlo method, which is discussed in detail in the following section.

9.2.1 MCEM algorithm

The EM algorithm consists of two steps:

E step: given the current $\eta^{(i)} = (\alpha^{(i)T}, \beta^{(i)})$ and $\mathbf{Y}$, compute
$$Q(\eta, \eta^{(i)}) = E\log f(\eta \mid \theta, \mathbf{Y}), \qquad (9.2.3)$$
where the expectation is taken under the probability density $f(\theta \mid \eta^{(i)}, \mathbf{Y})$.

M step: find the maximizer $\eta^{(i+1)}$ of $Q(\eta, \eta^{(i)})$.

The algorithm is iterated until $\|\eta^{(i+1)} - \eta^{(i)}\|$ or $|Q_{i+1} - Q_i|$ is sufficiently small. When the E step cannot be implemented explicitly, we use the Monte Carlo method to approximate the Q function.
In particular, the Monte Carlo E-step is given as:

(a) Draw $\theta_1, \ldots, \theta_m \sim f(\theta \mid \mathbf{Y}, \eta^{(i)})$.

(b) Let $Q_{i+1}(\eta, \eta^{(i)}) = \frac{1}{m}\sum_{j=1}^m \log f(\eta \mid \theta_j, \mathbf{Y})$.

Accordingly, the M-step maximizes $Q_{i+1}$ to obtain $\eta^{(i+1)}$. It is worth noting that step (a) of the Monte Carlo E-step is usually conducted by the Gibbs sampler, which generates an ergodic Markov chain $\theta_1, \ldots, \theta_m$ whose invariant distribution is the conditional distribution $f(\theta \mid \mathbf{Y}, \eta^{(i)})$. The ergodicity implies that, as $m \to \infty$, $Q_{i+1}(\eta, \eta^{(i)}) \to Q(\eta, \eta^{(i)})$. Gelfand and Smith (1990) advocated using every $h$-th ($h \approx 50$) value in the sequence to obtain more nearly independent contributions, but some authors, such as Zeger and Karim (1991), opposed such an implementation and insisted on using the whole sequence for the computation. In our simulation studies we tried both; since there was no substantial difference between the two implementations in our computation of the E-step, we use the whole sequence.

Within the context of Monte Carlo EM, the observed Fisher information matrix of the estimator $\hat\eta = (\hat\alpha^T, \hat\beta)^T$ may be approximated as the complete information minus the missing information, given respectively as
$$\text{complete info.} = \begin{pmatrix} \frac{1}{m}\sum_{j=1}^m I_{\alpha\alpha,j} & 0 \\ 0 & \frac{1}{m}\sum_{j=1}^m I_{\beta\beta,j} \end{pmatrix} \qquad (9.2.4)$$
and
$$\text{missing info.} = \begin{pmatrix} \frac{1}{m}\sum_{j=1}^m \dot\ell_{\alpha,j}\dot\ell_{\alpha,j}^T - \bar{\dot\ell}_\alpha\bar{\dot\ell}_\alpha^T & \frac{1}{m}\sum_{j=1}^m \dot\ell_{\alpha,j}\dot\ell_{\beta,j} - \bar{\dot\ell}_\alpha\bar{\dot\ell}_\beta \\ \cdot & \frac{1}{m}\sum_{j=1}^m \dot\ell_{\beta,j}^2 - \bar{\dot\ell}_\beta^2 \end{pmatrix}, \qquad (9.2.5)$$
where
$$I_{\alpha\alpha,j} = -\frac{\partial^2\log f(\eta \mid \theta_j, \mathbf{Y})}{\partial\alpha\,\partial\alpha^T}, \qquad I_{\beta\beta,j} = -\frac{\partial^2\log f(\eta \mid \theta_j, \mathbf{Y})}{\partial\beta^2},$$
$$\dot\ell_{\alpha,j} = \frac{\partial\log f(\eta \mid \theta_j, \mathbf{Y})}{\partial\alpha}, \qquad \dot\ell_{\beta,j} = \frac{\partial\log f(\eta \mid \theta_j, \mathbf{Y})}{\partial\beta},$$
and bars denote averages over the Gibbs sample, $\bar{\dot\ell}_\alpha = \frac{1}{m}\sum_{j=1}^m\dot\ell_{\alpha,j}$ and $\bar{\dot\ell}_\beta = \frac{1}{m}\sum_{j=1}^m\dot\ell_{\beta,j}$.
Clearly, the log-likelihood Â£(77) is linear in B,C,D,E and i* 1 . So, for given #1,..., 9m ~ / (0|Y,?yW), the Monte Carlo E-step is given by Q i + 1 (77, VÂ°) = E { - ^exp (xtTa) + rtxtTa} + A(log A - f3)B + C â€” D- Xe'^E - F t=i (9.2.7) where we use bars to denote the sample averages for corresponding variables; for instance, &t = m Â£JLi &t,j a n ( l &t,j is the i-th component of 6j. For the M-step, by taking the derivatives of the function Qi+i with respect to the parameter ct, we obtain J2xt{Yt-aA)=0 (9.2.8) t=i and the estimator ct^+1^ is the solution to the equation (9.2.8). To solve the equation, we may either use the G L I M software for Poisson regression or use the Newton-Raphson algorithm directly. Similarly, by differentiating Qi+i with respect to the parameter f3, we obtain P+V = \og(E) - \og(B). (9.2.9) The standard errors of the estimators oc and f3 may be given as the square roots of the diagonal elements of the inverse of the observed Fisher information matrix approximated as follows, Fisher info, w complete info. â€” missing info. (9.2.10) where both the complete info, and missing info, are approximated on the basis of the estimated likelihood Qi for such a large i that convergence is obtained. Chapter 9. Simulation studies under Poisson-gamma models 150 (9.2.11) One may also update the estimate of the dispersion parameter a2 in the M C E M procedure, although sometimes it goes out of control in simulated samples for some unclear reason. Here we propose two types of estimators for a2. One is based on the fact that Vai(Yi) = atbt + a**+\{1~*)*2. l â€” o We hence propose a moment estimator of a2 as ^ 2 = ( l - & ) E ^ { ( ^ - ^ ) 2 - q ^ } Another estimator is based on the likelihood function using Stirling's approximation for the T function. The estimate of o2 or A is obtained by taking derivative with respect to the parameter A in equation (9.2.7), a2 = H. 
(pB + b^E-G-B) , (9.2.12) where G is defined as the average of the term Â£ " 0t-i \og(0t) over the samples Bi,..., Bm. We now derive the formula for computing the approximate observed Fisher informa-tion matrix as follows. Suppose ff = (a T , /?) is the parameter estimate obtained from the M C E M algorithm and 0t are the smoothed means output at the last iteration. For the complete information, -y m -y m n n E W = â€” E E Ot,jatxtxJ = atxtxj0v (9.2.13) m jTi m j=i t = 1 t=i and 1 m -j TO n - E hpj = - Â£ ^ E Otj = A e " ^ . (9.2.14) For the missing information, laj(ot) = - thOtj), IpAP) = f - A E ^ - i j + Ae - ' X X - ) â€¢ (9-2-15) t=i \ t=i t=i / Chapter 9. Simulation studies under Poisson-gamma models 151 Also -i m n n 4 ( a ) = - Â£ Â£ x ' ( ^ - aA,j) = Â£ x t ( Y t - a & ) ( 9 - 2 - 1 6 ) m j=i t=i t=i and -i m n / n n \ W ) = - E Â£ - A Â£ J + Â£ = - + \e-?E. (9.2.17) m j=i t=i \ t=i t=i ) Notice that both Â£a(6t) Â« 0 and lfj({i) ~ 0 since a and ^ are maximum likelihood estimators making the first-order derivatives zero. Therefore, Tanner (1991) suggested approximating the observed Fisher information matrix by excluding the terms of la(oc) and tp(P) in the missing information matrix given by equation (9.2.5). 9.2.2 Gibbs sampler In order to implement the M C E M algorithm, we must sample from the conditional distri-bution /(0|Y, rj) where 77 = (a T , f3)T is given by the previous M-step. We use the Gibbs sampler to to generate the required Markov chain. Let us first review the method by considering only the case of three variables U, V and W. Suppose that the conditional distribution of each, given the remaining two has a simple form while the joint distribution is complicated. For simplicity, denote the conditional distributions by [U\V, W], [V\U, W] and V], respectively. Then the Gibbs sampler generates a random variate from the joint distribution [U, V, W] as follows. 
Given arbitrary starting values $U_0, V_0, W_0$, draw $U_1$ from $[U \mid V_0, W_0]$, then draw $V_1$ from $[V \mid U_1, W_0]$, and finally complete the first iteration by drawing $W_1$ from $[W \mid U_1, V_1]$. After a large number, $m$, of iterations, we obtain $U_m, V_m, W_m$. Geman and Geman (1984) showed that under mild conditions the joint distribution of $(U_m, V_m, W_m)$ converges at an exponential rate to the joint distribution $[U, V, W]$ as $m \to \infty$.

With regard to our implementation, each iteration needs to generate an $n$-vector from $n$ conditional densities. In effect, we incorporate a modified rejection sampling in the Gibbs sampling algorithm to generate each component $\theta_t$ of $\theta$ for $t \neq n$; this modified version of rejection sampling is given as follows. Rejection sampling (cf. Ripley, 1987) is a generating method for univariate distributions. Assume that we want to sample from a density $f(x)$ that is difficult to sample from directly, and that a constant $c \geq 1$ is available such that $c\,g(x) \geq f(x)$, where $g(x)$ is a density from which sampling is easy to perform. Then the following steps result in a random variate $x$ with density $f(x)$:

(1) Generate a random variate $x^*$ from $g(x)$.
(2) Generate a uniform variate $u$ from $U(0, 1)$.
(3) If $u \leq f(x^*)/\{c\,g(x^*)\}$, accept $x = x^*$; otherwise return to step (1).

The modified version of rejection sampling is described as follows. Suppose now that the density $f(x)$ is only known up to a multiplicative constant $c_f$, say $f(x) = c_f f^*(x)$, and assume similarly that another density is given by $g(x) = c_g g^*(x)$, where $c_g$ may also be unknown.
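Both versions of the scheme share the same accept/reject loop. A generic Python sketch of the standard version, with a Beta(2,2) target as an illustration of our own (note the acceptance rule accepts when $u \leq f(x^*)/\{c\,g(x^*)\}$):

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, c, size, seed=4):
    """Standard rejection sampling: propose x* ~ g, accept when
    u <= f(x*) / (c g(x*)) with u ~ U(0,1). Requires c*g(x) >= f(x)."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < size:
        x = g_sample(rng)
        u = rng.uniform()
        if u <= f(x) / (c * g_pdf(x)):
            out.append(x)
    return np.array(out)
```

For example, the Beta(2,2) density $6x(1-x)$ is bounded by $1.5$ on $(0,1)$, so a uniform proposal with envelope constant $c = 1.5$ is valid and accepts about two proposals in three.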
If, for a known constant $K$, we have $f^*(x) \leq K g^*(x)$, a random variate $x$ can be generated from the density $f(x)$ by the following modified rejection sampling:

(1) Generate a random variate $x^*$ from $g(x)$.
(2) Generate a uniform variate $u$ from $U(0, 1)$.
(3) If $u \leq f^*(x^*)/\{K g^*(x^*)\}$, accept $x = x^*$; otherwise return to step (1).

To find the constant $K$ for our Poisson-gamma model, we need the following lemma.

Lemma 9.1 The function $f(z) = z^{1/z}$, $z > 0$, is bounded and has a unique maximum $e^{1/e}$ at $z = e$.

Proof. Clearly, $f$ is continuous on $(0, \infty)$, and $f(z) \leq 1$ for $z \in (0, 1]$. Since $f(z) = \exp(\log z/z)$, we have $f(z) \to 1$ as $z \to \infty$. Taking the first derivative, $f'(z) = z^{1/z}(1 - \log z)/z^2$, which vanishes only at $z = e$, giving the maximum $e^{1/e}$. □

Let $\theta_{-t}$ be $\theta$ with $\theta_t$ omitted. Under our Poisson-gamma setting, we sample the $t$-th component $\theta_t$ from $f_\eta(\theta_t \mid \theta_{-t}, \mathbf{Y})$. Note that, for $t \neq n$,
$$f_\eta(\theta_t \mid \theta_{-t}, \mathbf{Y}) \propto f_\eta(\theta, \mathbf{Y}) \propto f_\alpha(\mathbf{Y} \mid \theta)f_\beta(\theta) \propto f(y_t \mid \theta_t)f(\theta_t \mid \theta_{t-1})f(\theta_{t+1} \mid \theta_t)$$
$$\propto \theta_t^{y_t}\exp(-a_t\theta_t)\,\theta_t^{\lambda\theta_{t-1}-1}\exp(-\lambda b^{-1}\theta_t)\,\frac{(\lambda\theta_{t+1}/b)^{\lambda\theta_t}}{\Gamma(\lambda\theta_t)}$$
$$= \frac{(\lambda\theta_{t+1})^{\lambda\theta_t}}{\Gamma(\lambda\theta_t)}\,\theta_t^{(y_t+\lambda\theta_{t-1})-1}\exp\{-\theta_t(a_t + \lambda b^{-1} + \lambda\log b)\}.$$
In the notation of the modified rejection sampling, we thus have
$$f^*(\theta_t) = \frac{(\lambda\theta_{t+1})^{\lambda\theta_t}}{\Gamma(\lambda\theta_t)}\,\theta_t^{(y_t+\lambda\theta_{t-1})-1}\exp\{-\theta_t(a_t + \lambda b^{-1} + \lambda\log b)\}, \qquad (9.2.18)$$
and the function $g^*(\cdot)$ is obtained by applying Stirling's formula, $\Gamma(p) \sim \sqrt{2\pi}e^{-p}p^{p-1/2}$ as $p \to \infty$, or rather the fact that
$$\Gamma(p) \geq \sqrt{2\pi}e^{-p}p^{p-1/2}.$$
It follows that
$$\Gamma(\lambda\theta_t) \geq \sqrt{2\pi}\,(\lambda\theta_t)^{\lambda\theta_t-1/2}\exp(-\lambda\theta_t),$$
and furthermore
$$\frac{(\lambda\theta_{t+1})^{\lambda\theta_t}}{\Gamma(\lambda\theta_t)} \leq \frac{1}{\sqrt{2\pi}}\left(\frac{\theta_{t+1}}{\theta_t}\right)^{\lambda\theta_t}(\lambda\theta_t)^{1/2}\exp(\lambda\theta_t) \leq \frac{\lambda^{1/2}}{\sqrt{2\pi}}\exp(\lambda e^{-1}\theta_{t+1})\,\theta_t^{1/2}\exp(\lambda\theta_t),$$
where the last inequality follows from Lemma 9.1 and the fact that
$$\left(\frac{\theta_{t+1}}{\theta_t}\right)^{\lambda\theta_t} = \left\{\left(\frac{\theta_{t+1}}{\theta_t}\right)^{\theta_t/\theta_{t+1}}\right\}^{\lambda\theta_{t+1}} \leq \exp(\lambda e^{-1}\theta_{t+1}). \qquad (9.2.19)$$
Thus we obtain
$$g^*(\theta_t) = \theta_t^{(y_t+\lambda\theta_{t-1}+1/2)-1}\exp\{-\theta_t(a_t + \lambda b^{-1} - \lambda + \lambda\log b)\}$$
and $f^*(\theta_t) \leq K g^*(\theta_t)$, where
$$K = \frac{\lambda^{1/2}}{\sqrt{2\pi}}\exp(\lambda e^{-1}\theta_{t+1})$$
and
$$g^*(\theta_t) \propto \mathrm{gamma}\left(y_t + \lambda\theta_{t-1} + \tfrac{1}{2},\; a_t + \lambda b^{-1} - \lambda + \lambda\log b\right).$$
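The envelope above rests entirely on Lemma 9.1. A quick numerical check of the lemma (our own illustration, evaluating $f(z) = z^{1/z}$ on a fine grid):

```python
import numpy as np

# Numerical check of Lemma 9.1: f(z) = z^(1/z) on (0, inf) is bounded
# and attains its unique maximum e^(1/e) at z = e.
z = np.linspace(0.05, 50.0, 200_000)
f = z ** (1.0 / z)
zmax = z[np.argmax(f)]   # grid location of the maximum, should be near e
```

On this grid the maximizing $z$ sits next to $e \approx 2.71828$ and the maximal value next to $e^{1/e} \approx 1.44467$, in line with the lemma.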
In summary, we generate $\theta_t$ by executing the following routine:

(1) Generate $\theta^* \sim \mathrm{gamma}\left(y_t + \lambda\theta_{t-1} + \tfrac{1}{2},\; a_t + \lambda b^{-1} - \lambda + \lambda\log b\right)$.
(2) Generate $u \sim U(0, 1)$.
(3) If
$$\frac{\lambda^{1/2}}{\sqrt{2\pi}}\,\Gamma(\lambda\theta^*)\,(\theta^*)^{1/2}\exp(\lambda\theta^* + \lambda e^{-1}\theta_{t+1})\,(\lambda\theta_{t+1})^{-\lambda\theta^*} \leq u^{-1}, \qquad (9.2.20)$$
let $\theta_t = \theta^*$; otherwise return to step (1).

For $t = n$, a similar derivation leads to the conditional density
$$f_\eta(\theta_n \mid \theta_{-n}, \mathbf{Y}) \propto f(y_n \mid \theta_n)f(\theta_n \mid \theta_{n-1}) \propto \mathrm{gamma}(y_n + \lambda\theta_{n-1},\; a_n + \lambda b^{-1}).$$

In some cases the bound given in (9.2.19) is too crude for the rejection sampling to run efficiently. An alternative bound, which we actually used in our programming, is obtained as follows: for $\varepsilon > 0$,
$$\left(\frac{\theta_{t+1}}{\theta_t}\right)^{\lambda\theta_t} = \varepsilon^{-\lambda\theta_t}\left(\frac{\varepsilon\theta_{t+1}}{\theta_t}\right)^{\lambda\theta_t} \leq \varepsilon^{-\lambda\theta_t}\exp(\varepsilon\lambda e^{-1}\theta_{t+1}).$$
Consequently, the rejection sampling may be modified as follows.

(1') Generate $\theta^* \sim \mathrm{gamma}\left(y_t + \lambda\theta_{t-1} + \tfrac{1}{2},\; a_t + \lambda b^{-1} - \lambda + \lambda\log b + \lambda\log\varepsilon\right)$.
(2') Generate $u \sim U(0, 1)$.
(3') If
$$\frac{\lambda^{1/2}}{\sqrt{2\pi}}\,\Gamma(\lambda\theta^*)\,(\theta^*)^{1/2}\exp(\lambda\theta^* + \varepsilon\lambda e^{-1}\theta_{t+1})\,(\varepsilon\lambda\theta_{t+1})^{-\lambda\theta^*} \leq u^{-1},$$
let $\theta_t = \theta^*$; otherwise return to step (1').

The method gives some flexibility to the sampling procedure through adjustment of $\varepsilon$. In some cases it is necessary to monitor the outcomes of the rejection sampling, to ensure that the samples are reasonably balanced and support sound inference.

9.2.3 Numerical results

The true parameters for Model II were chosen to be $\alpha = (0.10, -0.20, 0.20, -0.05)^T$, with covariates $\mathbf{x}_t = (\cos(2\pi t/12), \sin(2\pi t/12), \cos(2\pi t/6), \sin(2\pi t/6))^T$, $t = 1, \ldots, 168$, and $\beta = 0.005$ (an increasing trend). The dispersion parameter $\sigma^2$ is set to 0.1 and $\theta_0$ is set to 1.
Figure 9.1 shows the 168 observations simulated from the Poisson-gamma model with these parameters, where the solid line represents the latent process and the dots are the Poisson counts. There is an apparent increasing trend in the latent part, so the underlying process appears to be non-stationary.

Figure 9.1: 168 simulated counts from the Poisson-gamma model where the latent process is non-stationary. Observations are indicated by dots, and the latent process by a solid line.

To perform the MCEM algorithm, we set initial values as follows. The starting values of $\alpha$ were obtained by fitting a log-linear model to the time series of counts, with an offset specified by a moving average of the count data. In order to remove the seasonal effects, the moving average was computed based on both lag 6 and lag 12 (first averaging the raw series with lag 6, then averaging the averaged series with lag 12), so that the resulting series characterized mainly the trend. This series was also used as the initial latent process for the Gibbs sampler. The starting value of $\beta$ was then obtained from the initial latent process by differentiating the joint log-likelihood with respect to $\beta$. The starting value for $\sigma^2$ was chosen as 0.001. The same initialization was also used for the KEE approach.

The Markov chain sample size $m$ was initially chosen to be 2,000. As mentioned above, the Gibbs sampler starts with the $\theta$ process set to the moving average of the count data. A run of 4,000 $\theta$'s was generated by the Gibbs sampler; the first 2,000 were discarded as transient values, and the last 2,000 were retained to compute $Q^{(i+1)}$. The final value of $\theta$ from the previous run was then used as the starting value of $\theta$ for the Gibbs sampler in the next M-step. This procedure was run repeatedly for 400 iterations, with the estimate of the dispersion parameter $\sigma^2$ updated at each iteration by equation (9.2.12). Unfortunately, this tentative scheme for the MCEM algorithm did not succeed in estimating the dispersion parameter $\sigma^2$, because Stirling's approximation to the gamma function was used to obtain the estimate.
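The moving-average initialization described above (a lag-6 average of the raw series, followed by a lag-12 average of the result) might be sketched as follows; the end-padding that keeps the series at its original length is our own choice, not specified in the thesis:

```python
import numpy as np

def moving_average(x, span):
    """Centered moving average of the given span (series shortened at ends)."""
    kernel = np.ones(span) / span
    return np.convolve(x, kernel, mode='valid')

def init_latent(counts):
    """Initialization sketch: average the raw series with lag 6, then
    average the result with lag 12, leaving mainly the trend; pad the ends
    with the nearest smoothed value so the output matches the input length."""
    smooth = moving_average(moving_average(np.asarray(counts, float), 6), 12)
    pad = len(counts) - len(smooth)
    left, right = pad // 2, pad - pad // 2
    return np.concatenate([np.full(left, smooth[0]), smooth,
                           np.full(right, smooth[-1])])
```

The lag-12 average annihilates any period-12 sinusoid exactly, so a trend-plus-seasonal series is reduced to (essentially) its trend, which is why this smoother is a sensible initial latent process.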
In fact, whenever the estimate of $\sigma^2$ became negative, the MCEM algorithm stopped; this relates to the rejection sampling, which cannot always produce samples that are robust against the approximation underlying the estimate of $\sigma^2$. We tried a few alternative ways to estimate $\sigma^2$, but none was successful, so we eventually adopted the plotting-profile-likelihood approach to overcome the difficulty. This method requires much more computing time and space, because each iteration now needs many more runs to obtain the maximizer of the profile likelihood. For a fixed value of $\sigma^2$, the profile likelihood for the parameter $\eta$ is given by equation (9.2.6) with the parameter $\lambda$ replaced by the given value $1/\sigma^2$. The whole implementation of the MCEM algorithm was then based on the likelihood with known $\sigma^2$, rather than the joint likelihood with unknown $\sigma^2$. Accordingly, the M-step, consisting of solving equations (9.2.8) and (9.2.9), was run until the convergence criterion $\max\|\eta^{(i+1)} - \eta^{(i)}\| < 10^{-4}$ was attained. The resulting maximizers of the profile likelihood, denoted $\hat\alpha(\sigma^2)$ and $\hat\beta(\sigma^2)$, are the maximum profile-likelihood estimates of $\alpha$ and $\beta$ for given $\sigma^2$.

Figure 9.2: Log-profile-likelihood versus the dispersion parameter given increasingly.

To implement the plotting-profile-likelihood method numerically, referring to the pattern of the adjusted Pearson estimate of $\sigma^2$ produced by the KEE approach, we chose the range of the dispersion parameter $\sigma^2$ as the interval [0.001, 2.15]. We implemented the procedure in both cases where the dispersion parameter was updated increasingly and decreasingly, with a difference of 0.001 between two consecutive iterations.
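The plotting-profile-likelihood device (fix $\sigma^2$ on a grid, maximize over the remaining parameters, and read off the grid point with the highest profile value) can be illustrated on a toy model where the inner maximization is trivial. This sketches only the device, not the Poisson-gamma likelihood itself:

```python
import numpy as np

def profile_loglik_grid(x, sigma2_grid):
    """Plotting-profile-likelihood sketch (toy normal example): for each
    fixed sigma2, maximize the log-likelihood over the remaining parameter
    (here mu, whose maximizer is the sample mean regardless of sigma2) and
    record the profile value. The estimate of sigma2 is read off at the
    grid point with the highest profile value."""
    x = np.asarray(x, float)
    n = len(x)
    mu_hat = x.mean()                       # inner maximizer
    rss = np.sum((x - mu_hat) ** 2)
    prof = np.array([-0.5 * n * np.log(2 * np.pi * s2) - 0.5 * rss / s2
                     for s2 in sigma2_grid])
    return prof, sigma2_grid[np.argmax(prof)]
```

In the normal toy model the profile is maximized at $\sigma^2 = \mathrm{RSS}/n$, so the grid read-off recovers the maximum likelihood estimate up to the grid spacing, which is exactly the behaviour exploited in Figures 9.2 and 9.3.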
Figure 9.3: Log-profile-likelihood versus the dispersion parameter given decreasingly.

Figure 9.2 shows the plot of the estimated log-profile-likelihood of the observed data versus the values of the dispersion parameter $\sigma^2$, fed increasingly starting at 0.001. Conversely, with the values of $\sigma^2$ given decreasingly starting at 0.3, the estimated log-profile-likelihood of the observed data versus $\sigma^2$ is plotted in Figure 9.3. The top display of Figure 9.2 indicates that the log-profile-likelihood first rises rapidly and then declines slowly once $\sigma^2$ exceeds 0.03 or so. The bottom display shows the detailed pattern of the log-profile-likelihood for $\sigma^2$ on the interval [0.01, 0.1]. The two displays in Figure 9.3 show a similar pattern, with the log-profile-likelihood rising slowly and then falling quickly. Both figures indicate that the log-profile-likelihood attains its maximum around $\sigma^2 = 0.025$.

We then took the parameter values at $\sigma^2 = 0.028$, which gives the highest profile likelihood in Figure 9.2, as the starting (and fixed) value of the dispersion parameter for a new MCEM run. This time the starting latent process and the starting values of the regression parameters were set to the profile-likelihood maximizers corresponding to $\sigma^2 = 0.028$, and the Markov chain sample size was increased to 4,000. Figure 9.4 shows the patterns of the log-likelihood and the coefficient estimates over 100 iterations. All displays are stable, and the values of the parameters change only slightly; the first two significant digits are virtually unchanged.
9.2.4 Conclusion

Table 9.1 reports the approximate maximum likelihood estimates from the MCEM approach, based on Figure 9.2 with $\sigma^2 = 0.028$, together with the estimates from the KEE approach.

Table 9.1: Estimates and standard errors from KEE and MCEM.

                        MCEM                    KEE
                 Estimates  Std err.    Estimates  Std err.    True values
  Trend             0.0125    0.0300       0.0120    0.0149         0.0050
  cos(2πt/12)       0.1348    0.0494       0.1014    0.0774         0.1000
  sin(2πt/12)      -0.3300    0.0546      -0.2281    0.0835        -0.2000
  cos(2πt/6)        0.2264    0.0513       0.1592    0.0698         0.2000
  sin(2πt/6)       -0.0731    0.0522      -0.0571    0.0702        -0.0500
  σ²                0.0280                 0.1310                   0.1000

Except that the estimate of $\beta$ (the trend covariate) is somewhat overestimated, the remaining estimates are relatively close to the corresponding true values. The standard errors of the KEE method are uniformly slightly higher than those of the MCEM method. However, it would be unfair to reject the KEE in favour of the MCEM on the basis of a simple comparison of approximately computed standard errors alone. One may, for example, compare the predicted latent processes obtained from the two methods.

Figure 9.4: Patterns for log-likelihood and coefficient estimates.

Figure 9.5 presents the true latent process, the Kalman smoother ($m_t^*$) from the KEE approach, and the smoothed means ($\bar\theta_t$) from the MCEM approach.

Figure 9.5: Comparison between the true latent process (solid line) and the estimated latent processes from the Gibbs sampler (dashed line) and from the Kalman smoother (broken line).

It is easy to compute the mean squared error for the two estimated processes: the smoothed means $\bar\theta_t$ give $n^{-1}\sum_t(\theta_t - \bar\theta_t)^2 = 7.9140$, while the Kalman smoother $m_t^*$ gives $n^{-1}\sum_t(\theta_t - m_t^*)^2 = 0.7410$.
The BLUP thus gives an overall better prediction of the latent process than the smoothed means of the Gibbs sampler. This might be the reason why the KEE produces a stable sequence of dispersion-parameter estimates converging towards the true value. Another advantage of the KEE method is computational efficiency, in terms of both simplicity of programming and far shorter computation times. For the analysis of the simulated data, we coded the KEE method in the S-PLUS language, whereas we were forced to program the MCEM in the C language because of the huge dynamic memory demand of the Gibbs sampler and the plotting-profile-likelihood approach, which are impossible to run in S-PLUS code. Even though C code is much more efficient than S-PLUS in terms of computational speed, our program for the plotting-profile-likelihood estimation via MCEM took about 30 hours to complete, whereas the program for the KEE converged in only about 16 minutes. As is well known, the rejection sampling within the Gibbs sampler accounts for much of this time; in addition, plotting-profile-likelihood estimation is very time-consuming because the MCEM has to be carried out for a whole sequence of $\sigma^2$ values. As a result, we make the following recommendation concerning the MCEM approach: if simple starting values for the MCEM algorithm (such as those from GLIM) are unavailable, or if a faster computing procedure is needed, the KEE provides high-quality starting values, since the KEE estimates are already near the true values. For example, the Kalman smoother provides very competitive starting values for the latent process, as shown in Figure 9.5. We tentatively ran the MCEM algorithm using the KEE output as starting values, and the log-likelihood fluctuated without any substantial changes occurring. A careful future investigation of this issue would be interesting.
9.3 Detecting non-stationarity

This section is mainly devoted to examining the utility of the residual analyses developed in Chapters 7 and 8, respectively. The assumption of stationarity or non-stationarity for the latent process substantially affects the interpretation of the model and hence of the data analysis. Developing tools for checking whether or not the latent process is stationary is therefore important for assuring the appropriateness of the data analysis. The reliability of the model-checking diagnosis based on the analysis of residuals is the main concern of the present section. Our studies are organized through the case-and-control strategy indicated in the following table.

Table 9.2: Overview of the simulation strategy for model diagnosis.

                                    Model that generates data
Model that fits data           Model I (stationary)   Model II (non-stationary)
Model I (stationary)           see Chapter 7          see Section 9.3.1
Model II (non-stationary)      see Section 9.3.2      see Chapter 8

In Section 9.3.1 the data generated from Model II are fitted by Model I, and in Section 9.3.2 the data generated from Model I are fitted by Model II. We then study the residuals from both fittings.

9.3.1 When non-stationarity is present

The data set given in the previous section (also shown in Figure 9.1) is used for our study here. We now fit the data by Model I, in which the latent process is a stationary AR(1) with gamma margins. The starting values for the dispersion parameter σ² and the autocorrelation parameter ρ are set to 0.05 and 0.1, respectively. As usual, the starting latent process is set to the moving average of the data. We ran 500 iterations; in our experience, however, 150 iterations are normally enough to reach convergence.
Unfortunately, in the current case the estimate of α does not seem to converge: the maximum difference between two consecutive estimates of α remains about as high as 1,500, and the last 100 iterations only improve this maximum difference from 1,507 to 1,498.

Figure 9.6: ACF and PACF functions for the residuals of the latent process.

It is clear that all estimates reported in Table 9.3 are fairly poor compared with the true values. Nevertheless, let us try the residual plot to see if there is any indication of violation of the stationarity assumption. If the stationary AR(1) structure were correct, the sequence of residuals e_t = m_t − G_t m_{t−1} should be a noise series, where the G_t are independent beta random variates from Beta(ρλ, (1 − ρ)λ).

Table 9.3: Estimates of coefficients given by Model I based on the data generated by Model II.

              Estimate   Std. err.   True value
Trend           0.0237     0.0002       0.0050
cos(2πt/12)     0.2718     0.0179       0.1000
sin(2πt/12)    -0.8187     0.0186      -0.2000
cos(2πt/6)      0.0910     0.0153       0.2000
sin(2πt/6)     -0.3079     0.0154      -0.0500
σ²              0.6512
ρ               0.7175

Figure 9.6 shows the autocorrelation and partial autocorrelation functions for e_t. Both displays clearly suggest that the latent process is neither an autoregressive process nor a moving-average process.

9.3.2 When stationarity is present

First we generated a data set from Model I with true parameter values given as follows. The covariates were the same as those used to generate the data in Section 9.3.1, and so were the values of the parameters associated with the corresponding covariates. The correlation coefficient ρ was set to 0.5 and the dispersion parameter to σ² = 0.5, i.e. λ = 2. The 168 simulated observations and the latent process are shown in Figure 9.7.
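The ACF/PACF residual diagnostics used in these stationarity checks can be computed directly. The sketch below implements the sample ACF and the Durbin-Levinson recursion for the PACF; the series m_t and G_t are hypothetical stand-ins for the fitted quantities, not output from the thesis's fitting algorithm:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function of a series x, lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / (len(x) * c0)
                             for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    out = [1.0]
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
        out.append(phi[k, k])
    return np.array(out)

# residuals of the stationary AR(1) latent-process fit: e_t = m_t - G_t m_{t-1}
# (m and G below are hypothetical stand-ins for the fitted filter output)
rng = np.random.default_rng(2)
m = rng.gamma(2.0, 1.0, size=200)
G = rng.beta(2.0, 2.0, size=199)
e = m[1:] - G * m[:-1]

bands = 1.96 / np.sqrt(len(e))            # approximate 95% confidence bands
outside = np.abs(acf(e, 20)[1:]) > bands  # lags falling outside the bands
```

Many lags falling outside the bands, as in Figure 9.6, is the signal that the assumed latent-process structure is misspecified.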
These counts were fitted by Model II, starting with σ² = 0.05 and with the moving average of the data as the initial latent process. After 200 iterations the program stopped according to the convergence criterion: the maximum difference between two consecutive estimates was less than 10⁻⁵. Table 9.4 reports the estimates. The estimate for the covariate cos(2πt/12) has the opposite sign to the true value 0.1. The remaining estimates coincide in sign with the true values, and the covariates trend, cos(2πt/12) and sin(2πt/6) are not significant at the 5% level. If the non-stationary latent process were correct, the residuals ζ_t = (m_t − b_t m_{t−1})/√(b_t² D_{t−1} + C_t)

Figure 9.7: 168 simulated counts (represented by dots) from a state space model with a stationary gamma latent process (represented by a solid line).

Table 9.4: Estimates of coefficients given by Model II based on the data generated by Model I.

              Estimate   Std. err.   True value
Trend           0.0013     0.0321       0.0050
cos(2πt/12)    -0.0612     0.0730       0.1000
sin(2πt/12)    -0.2700     0.0813      -0.2000
cos(2πt/6)      0.3101     0.0671       0.2000
sin(2πt/6)     -0.1053     0.0667      -0.0500
σ²              1.9748

should be uncorrelated. Figure 9.8 shows the autocorrelation function and partial autocorrelation function of these residuals. It is clear that the ζ_t are correlated, which suggests that something is wrong with the assumptions for the latent process.

Figure 9.8: ACF and PACF functions for the residuals from the latent process.
9.3.3 Conclusion

As discussed above, the residual analyses developed in the previous chapters are very effective for detecting misspecification of the assumptions concerning the structure of the latent process. Such tools are valuable in practice for checking model assumptions in data analysis, and this use of residual analysis for model checking provides an objective basis for statistical inference. Evidently, the ability to detect misspecification of the structure of the latent process lends credibility to the KEE approach and increases the confidence of its users in applied areas. It is worth noting that model-checking diagnostics via residual analysis are not currently available for either the MCEM or the GEE method. This is because, even though residuals may be computed in both approaches, the properties needed for model checking are not easy to derive in general, which hampers the development of model-checking diagnostics based on these methods.

Chapter 10

Data analysis

In this chapter we present an application of state space models to a regression analysis of data from Prince George, British Columbia, previously analyzed by Knight et al. (1993). The data consist of daily recordings of the number of emergency room visits for each of four categories of respiratory diseases, along with measurements of meteorological variables and air pollution. We used a state space model assuming conditionally independent Poisson counts for the four categories given a latent morbidity process, the latent process being a gamma Markov process. The main objective of the investigation was to examine the relationship between air pollution and respiratory morbidity, taking into account seasonality and meteorological conditions.
We found that Total Reduced Sulphur significantly influences the expected number of emergency room visits for the four disease categories, in agreement with the conclusion of Knight et al. (1993). However, our final model is simpler than theirs; in particular, we found no evidence of seasonal variation beyond that explained by the meteorological variables.

10.1 Introduction

We consider data from Prince George, British Columbia, consisting of daily counts of emergency room visits for respiratory diseases, classified into four categories (Asthma, Bronchitis, Ear infections and Others) for the period 1984 to 1986, along with daily measurements of air pollution and meteorological variables. The air pollution variables were Sulphur (total reduced sulphur compounds) and Particulates (total suspended particulates). The main objective of the investigation was to examine the relationship between air pollution and respiratory morbidity. The monitoring of air pollution in Prince George (cf. Lambert et al., 1987) shows frequent excursions above the provincial air quality standards, and there has long been public concern in Prince George that the air quality in the city may be adversely affecting the health of residents.

Recent studies of the association between air pollution and human health that have used emergency room visits or hospital admissions as an indicator of human health include Levy et al. (1977), Pope (1989), Samet et al. (1981) and Bates and Sizto (1983). Other indicators, such as mortality (Dockery et al., 1993) and elementary school absences (Ransom and Pope, 1992), have also been used. We are particularly concerned with modelling the dynamics of the disease-generating process (at an aggregate level), and our model is well suited to count indicators, such as emergency room visits and school absences, that can be presumed to be closely related to daily levels of ambient air pollution.
The Prince George data were previously analyzed by Knight et al. (1993), who used a log-linear Poisson model in which the counts were assumed independent over time as well as between the four categories. They found evidence of over-dispersion, and hence used a quasi-likelihood analysis. A state space model based on Poisson counts, proposed in Chapter 8, is used for the present analysis, assuming that the counts are conditionally independent given a latent Markov process. This implies serial correlation for the counts as well as correlations between the four disease categories, giving a more realistic modelling of the correlation structure for this type of data. The ability of our model to account for correlations among the four categories turns out to be crucial in the data analysis.

A new feature of our model is that the air pollution covariates are treated as long-term covariates and enter the model via the latent Markov process. We may hence interpret the latent process as a latent morbidity process representing the air pollution's potential for causing respiratory diseases. In particular, the Markov assumption implies a day-to-day carry-over of the effect of air pollution. The meteorological and seasonal variables are assumed to have an immediate effect on the counts, and hence enter as covariates in the conditional Poisson distribution for the counts; such covariates are accordingly short-term covariates. This approach gives a realistic modelling of the longitudinal structure of the data that is not easily achieved with the quasi-likelihood method developed by Liang and Zeger (cf. Diggle, Liang and Zeger, 1994). In particular, the new model has both a marginal and a conditional interpretation, and a detailed residual analysis allows all model assumptions to be checked from the data.
10.2 Data

The available data consist of daily counts of emergency room visits for respiratory diseases and the corresponding daily values of various meteorological variables and air pollution parameters from April 1, 1984 to March 31, 1986, a total of 730 days. For a more detailed description of the data, we refer to Knight et al. (1993).

The counts refer to emergency room visits by residents of Prince George to the single hospital in Prince George. All visits were classified according to discharge diagnosis. The counts were originally classified into 33 diagnostic categories; for this analysis these categories were collapsed into four, referred to as Asthma, Bronchitis, Ear infections and Other respiratory diseases. Figure 10.1 shows time-series plots of the daily counts for each of the four categories.

The meteorological data were all obtained at the Prince George airport, and consist of daily summaries of temperature and humidity. Three measurements were recorded: temperature, maximum relative humidity and minimum relative humidity. The temperature is the average of 24 hourly readings in degrees Celsius, and the maximum/minimum relative humidity are the largest/smallest of 24 hourly readings. There were no missing values for temperature, but a missing value of minimum humidity for September 29, 1985 was substituted by 38, the average of the remaining records of minimum humidity.

Air quality parameters have been measured at six monitoring stations in Prince George since 1980, and the daily averages of the readings were used. The two summaries used in this study were Sulphur (total reduced sulphur compounds) and Particulates (total suspended particulates). Sulphur (recorded in ppb) was measured by a continuous monitor, recording daily averages.
The original Sulphur data had many individual missing values, but these were "predicted" using available measurements from the other stations (for details, see Knight et al., 1993). Particulates (recorded in μg/m³) was measured by a 24-hour vacuum sampler once every six days, on the same day for all six monitoring stations. For the purpose of the analysis, we have filled in the missing five-day blocks by linear interpolation. Figure 10.2 shows time-series plots of the meteorological and air pollution variables.

Figure 10.1: Emergency room visits for four respiratory diseases (Asthma, Bronchitis, Ear infections, Others).

Figure 10.2: Meteorological and air pollution variables (Temperature, Humidity (Max, Min), Sulphur (TRS), Particulates (TSP)).

10.3 Methodology

Referring to Chapter 8, we summarize the main features of the model and the estimation technique as follows.

10.3.1 The model

The response variable Y_t is a four-dimensional vector consisting of the daily counts of emergency room visits for the four respiratory disease categories. The main idea of the model is to consider the number of daily emergency room visits to be a Poisson process driven by a latent Markov process, denoted {θ_t}.
The components of Y_t are assumed to be conditionally independent given θ_t, and the conditional distribution for the ith component of Y_t is Poisson,

Y_it | θ_t ~ Po(a_it θ_t), i = 1, 2, 3, 4,

where log a_it is a linear predictor depending on the short-term covariates x_t, i.e. a_it = exp(x_t^T α_i), with α_i denoting the parameter vector for the ith category. The latent morbidity process θ_t represents the "overall potential" for air pollution to cause emergency room visits with respiratory symptoms. It is assumed to follow a gamma Markov process, defined by

θ_t | θ_{t−1} ~ Ga(b_t θ_{t−1}, σ²),

where Ga(μ, σ²) denotes the gamma distribution with mean μ and coefficient of variation σ. The parameter σ² is a dispersion parameter expressing the smoothness of the process, and log b_t is a linear predictor depending on the long-term covariates via their increments. Hence,

b_t = exp(Δz_t^T β), Δz_t = z_t − z_{t−1},

where z_t is the vector process of air pollution measurements and β is the parameter vector. The latent process is standardized by taking θ_0 = 1, and the long-term covariates are centralized by subtracting z_1 from all values and taking z_0 = 0.

The marginal moments of Y_t are

E(Y_t) = a_t τ_t, Var(Y_t) = A_t τ_t + a_t a_t^T φ_t σ² τ_t,

where A_t = diag(a_t), τ_t = E(θ_t) = θ_0 b_1 ··· b_t and φ_t = b_t + b_t b_{t−1} + b_t b_{t−1} b_{t−2} + ··· + b_t b_{t−1} ··· b_1. The log-expectation of Y_it is

log E(Y_it) = x_t^T α_i + z_t^T β,

showing that the model is log-linear in the covariates x_t and z_t. The two sets of covariates x_t and z_t are called short-term and long-term covariates respectively, a crucial distinction, although the terminology short-term/long-term should not be taken too literally. We now explain the differences between the two types of covariates.

The long-term covariates z_t represent measurements of the two air pollutants, which are expected to have an effect on respiratory diseases.
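The conditional Poisson and gamma Markov specification above is easy to simulate. The sketch below uses hypothetical covariates and parameter values (none taken from the thesis); it relies on the fact that a gamma distribution with mean μ and squared coefficient of variation σ² has shape 1/σ² and scale μσ²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 0.1
t = np.arange(1, n + 1)

# hypothetical short-term covariates x_t and per-category parameters alpha_i
x = np.column_stack([np.cos(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 12)])
alpha = np.array([[0.1, -0.2, 0.3, 0.0],
                  [-0.2, 0.1, 0.0, 0.2]])   # 2 covariates x 4 categories

# hypothetical long-term covariate z_t, entering through its increments
z = np.cumsum(rng.normal(scale=0.1, size=n))
dz = np.diff(np.concatenate([[0.0], z]))    # z_0 = 0
beta = 0.5
b = np.exp(dz * beta)                       # b_t = exp(dz_t * beta)

# gamma Markov latent process: theta_t | theta_{t-1} ~ Ga(b_t theta_{t-1}, sigma2)
theta = np.empty(n)
prev = 1.0                                  # theta_0 = 1
for i in range(n):
    mu = b[i] * prev                        # conditional mean
    theta[i] = rng.gamma(shape=1.0 / sigma2, scale=mu * sigma2)
    prev = theta[i]

# conditionally independent Poisson counts for the four categories
a = np.exp(x @ alpha)                       # a_{it} = exp(x_t' alpha_i)
y = rng.poisson(a * theta[:, None])
```

Simulated series of this kind are what drive the simulation studies of Chapter 9.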
Because the long-term covariates enter the model via the latent process, this process represents the potential for air pollution to cause emergency room visits with respiratory symptoms. The terminology "long-term" is used because the Markov structure of the latent process produces a carry-over effect from day to day, such that an increase in the air pollution level on a given day may have an effect on subsequent days. An important characteristic of long-term covariates is that they have the same effect on all four categories of Y_t.

The short-term covariates x_t enter via the conditional distribution of the observed emergency room visit counts given the latent process θ_t, and hence represent modulating factors that have an immediate (acute) effect on the visit counts relative to the value of θ_t. Such covariates may include local weather conditions and seasonal effects; they also include separate constants for the four categories, corresponding to the different incidence rates for the symptoms.

10.3.2 Parameter estimation

Following Chapter 8, we use the so-called Kalman estimating equation for estimation. For a sample of n observations, equations (8.4.1) and (8.4.2) become

ψ_1(η, θ) = Σ_{t=1}^{n} (Σ_{i=1}^{4} Y_it − Σ_{i=1}^{4} a_it θ_t) = 0

and

ψ_2(η, θ) = Σ_{t=1}^{n} λ⁻¹ (θ_t − b_t θ_{t−1}) Δz_t = 0,

where η = (α^T, β^T)^T and θ = (θ_0, θ_1, ..., θ_n) denotes the latent process. Using the Newton scoring algorithm given by (8.4.4), the M-step of the EM algorithm solves the Kalman estimating equations

ψ_1(η, m*(η)) = 0, ψ_2(η, m*(η)) = 0,

where m* = m*(η) is the Kalman smoother (cf. Section 8.3). The standard errors of the estimates may be computed via the inverse of the Godambe information matrix.

10.3.3 Residual analysis

The Kalman filter and smoother allow us to define residuals for both the observed and latent processes, to be used for model checking.
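The thesis's filter and smoother are developed in Chapter 8 for the gamma latent process; as an illustration of the BLUP computation that the KEE's E-step approximates, the sketch below implements a generic scalar linear-Gaussian Kalman filter with Rauch-Tung-Striebel smoothing. The function name and the noise variances C and V are illustrative assumptions, not the thesis's recursions:

```python
import numpy as np

def rts_smoother(y, b, C, A, V, m0, P0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for a scalar state:
    state:       theta_t = b_t * theta_{t-1} + noise with variance C
    observation: y_t     = A_t * theta_t     + noise with variance V
    Returns the smoothed state estimates (the BLUP in the linear case)."""
    n = len(y)
    m_f = np.zeros(n); P_f = np.zeros(n)   # filtered mean / variance
    m_p = np.zeros(n); P_p = np.zeros(n)   # one-step-ahead predictions
    m, P = m0, P0
    for t in range(n):
        m_p[t] = b[t] * m
        P_p[t] = b[t] ** 2 * P + C
        K = P_p[t] * A[t] / (A[t] ** 2 * P_p[t] + V)   # Kalman gain
        m = m_p[t] + K * (y[t] - A[t] * m_p[t])
        P = (1.0 - K * A[t]) * P_p[t]
        m_f[t], P_f[t] = m, P
    m_s = m_f.copy()                       # backward smoothing pass
    for t in range(n - 2, -1, -1):
        J = P_f[t] * b[t + 1] / P_p[t + 1]
        m_s[t] = m_f[t] + J * (m_s[t + 1] - m_p[t + 1])
    return m_s

# hypothetical usage on a short series
y = np.array([1.0, 1.4, 0.8, 1.2, 1.1])
m_star = rts_smoother(y, b=np.ones(5), C=0.05, A=np.ones(5), V=0.5,
                      m0=1.0, P0=0.1)
```

In the KEE algorithm, an m* of this kind replaces the latent process in the estimating equations at each iteration.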
The most useful type of residuals for our purpose are the conditional residuals, which may be characterized as observed values minus predicted values. There are two types of conditional residuals for the ith observed process, defined in terms of the Kalman filter and smoother by

R_it = Y_it − f_it and R*_it = Y_it − a_it m*_t,

respectively. For the latent process, we define the conditional residuals as

r_t = m_t − b_t m_{t−1},

based on the Kalman filter. The residuals are then standardized to have zero mean and unit variance. All references to residuals from now on are to these standardized residuals with parameter estimates inserted. The standardized residuals may be used for checking the correlation structure, the distributional assumptions and the covariate structure of the model, as done in Section 10.4 for the Prince George data. The residuals r_t and R_it are particularly useful for checking the correlation structure of the data, because they are uncorrelated over time.

10.4 Data analysis

10.4.1 Model identification

We now describe the main considerations that went into finding an appropriate model for the Prince George data.

Following Knight et al. (1993), we transformed Sulphur and Particulates to log(0.5 + Sulphur) and log(Particulates), respectively. For convenience, we shall refer to log(0.5 + Sulphur) simply as log-Sulphur. We also made a logarithmic transformation of the minimum and maximum humidity (note that Knight et al. (1993) did not transform humidity). For ease of interpretation, the difference and the sum of the high and low log-humidities were used as short-term covariates. The three meteorological covariates were centralized by subtracting their respective means. In this way, the intercept parameter represents the log expected number of emergency room visits for a day with average temperature and humidity, and with air pollution as on April 1, 1984. The intercept is allowed to depend on the day of the week.
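The covariate construction just described, together with the linear interpolation of Particulates from Section 10.2, can be sketched as follows. All series here are hypothetical stand-ins for the Prince George measurements:

```python
import numpy as np

rng = np.random.default_rng(3)
days = np.arange(730)

# Particulates: one reading every six days, gaps filled by linear interpolation
measured_days = days[::6]
measured_tsp = 40 + 10 * np.sin(2 * np.pi * measured_days / 365)  # hypothetical
tsp = np.interp(days, measured_days, measured_tsp)

# hypothetical daily series for the remaining covariates
sulphur = rng.gamma(2.0, 3.0, size=730)      # ppb; readings may be zero
hum_max = rng.uniform(60, 100, size=730)
hum_min = rng.uniform(20, 60, size=730)
temperature = rng.normal(5, 10, size=730)

log_sulphur = np.log(0.5 + sulphur)          # the 0.5 offset keeps zeros finite
log_particulates = np.log(tsp)
diff_humidity = np.log(hum_max) - np.log(hum_min)
sum_humidity = np.log(hum_max) + np.log(hum_min)

# centre the three meteorological covariates at their means
short_term = np.column_stack([temperature, diff_humidity, sum_humidity])
short_term = short_term - short_term.mean(axis=0)
```

With this centering, the intercept is interpretable exactly as described: the log expected count on a day with average meteorological conditions.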
We now investigate the need for lagging the log-Sulphur covariate. Following Knight et al. (1993), we consider up to lag 2 for log-Sulphur. It does not seem reasonable to include lags for log-Particulates, because of the linear interpolation used for this covariate (cf. Section 10.2). Our preliminary model for the Prince George data is hence as follows.

Short-term covariates: log-temperature, difference of log-humidities, sum of log-humidities, 7 day-of-week factors.
Long-term covariates: lags 0, 1 and 2 of log-Sulphur, and log-Particulates.

Table 10.1: Estimates and standard errors for air pollution effects.

         log-Sulphur       log-Particulates
lag 0     0.029 (0.016)    -0.120 (0.058)
lag 1    -0.023 (0.017)
lag 2     0.048 (0.016)

The estimates of the coefficients for the air pollution covariates are reported in Table 10.1. To verify the need for lags 1 and 2 of log-Sulphur, we applied the Wald test described in Section 10.3. The result was W = 10.251, which, compared with a χ²(2)-distribution, gave a p-value of 0.006. This confirms the need for including the two lags in the model. Only the lag 2 coefficient is significant individually, but we have no explanation for this finding. Knight et al. (1993) found similar results, except that they found both the lag 0 and lag 2 coefficients to be significant, but not the lag 1 coefficient.

Figure 10.3: First-year residuals against second-year residuals for the four categories (Asthma, Bronchitis, Ear infections, Others).

To investigate the possibility of a seasonality effect, we plotted the second year's residuals against the first year's, for both r_t and R_it, as shown in Figures 10.3 and 10.4 respectively. In the presence of seasonality not accounted for by the model, we would expect to see a correlation between the first and second year's residuals. None of these five plots
indicated any such correlation, but it should be kept in mind that a two-year period may be too short to detect seasonality. However, it seems that the meteorological variables have accounted for all seasonal variation in the present data.

Figure 10.4: First-year residuals against second-year residuals for the latent process (smoother residuals).

10.4.2 Model checking

In order to verify the main model assumptions, we now analyse the residuals from the above model. Figure 10.5 shows plots of R_it against log f_it for each i, where the f_it are the predicted values of the observed processes based on the Kalman filter. The discreteness of the data combined with the low expected counts, especially for the Asthma process, accounts for the curved bands with negative slope found in these plots. The lowest band in each plot corresponds to the observation y = 0, the next lowest to y = 1, and so on. In particular, there can be no large negative residuals for small fitted values. This feature, together with the negative slope of the curves on which the residuals lie, makes the residual plots show an apparent downward trend. These traits dominate the plots for log expected counts less than 1 (and are hence prominent in Figure 10.5), but disappear above this value, where the Poisson residuals become approximately normal. Taking this into account, the plots are compatible with the Poisson assumption.

Figure 10.5: Residuals against log-linear predictions for the four categories (Asthma, Bronchitis, Ear infections, Others).
In particular, the plots confirm the appropriateness of the Poisson variance function. There appears to be an outlier in the category Others, but otherwise the number of larger residuals seems compatible with the number of observations.

The downward trend in Figure 10.5 will be apparent in any plot of residuals against fitted values, making some plots difficult to interpret. For example, Figure 10.6, which plots the residuals r_t against the log fitted values log(b_t m_{t−1}) and is useful for checking the gamma assumption for the latent process, showed such a slight downward trend. Otherwise this plot showed no distinctive features, thereby confirming the appropriateness of the gamma assumption.

Figure 10.6: Residuals against log-linear predictions for the latent process.

To check the correlation structure for the latent process, we plotted the autocorrelation function (ACF) of r_t (Figure 10.7). The plot shows that the autocorrelations at lags 1, 16 and 27 fall slightly outside the asymptotic 95% confidence bands. However, the confidence bands are approximate, even for Gaussian time series, and the significance would seem to be slight in any case, confirming the Markov assumption for the latent process.

Figure 10.7: Autocorrelation function of residuals from the latent process.

Similar ACF plots of R_it for each of the four categories, shown in Figure 10.8, did not show anything unexpected. This confirms that the R_it are uncorrelated over time for each of the four categories, and moreover confirms the adequacy of assuming conditional independence of the Poisson counts over time, given the latent process. Further plots of R_it against R_{i,t−1} for each i, and of r_t against r_{t−1}, in Figures 10.9 and 10.10 respectively, confirmed these conclusions.
Figure 10.8: Autocorrelation functions of residuals for the four categories (Asthma, Bronchitis, Ear infections, Others).

Figure 10.9: Residuals against lag 1 residuals for the four categories (Asthma, Bronchitis, Ear infections, Others).

Figure 10.10: Residuals against lag 1 residuals for the latent process.

An additional check of the correlation structure of the model was obtained by plotting R*_it against the lag 1 smoother m*_{t−1} for each category (see Figure 10.11), to check that the counts depend on the current value of the latent process and not on yesterday's value. These plots exhibited a slight downward trend, especially for Asthma and Bronchitis, but since this effect is to be expected for plots of residuals against fitted values, as noted above, we interpreted the plots as being in agreement with the model.

In order to check the assumption that the four categories share the same latent process, we plotted the residuals R_it against the long-term covariates, to confirm that the dependence of the counts on the long-term covariates is via the latent process. These eight plots (not shown) did not exhibit any unexpected patterns. Further plots of the residuals r_t against the long-term covariates, and of R_it against the short-term covariates, also confirmed the adequacy of the form of the regression model.
Finally, to check the correlations between the four categories, we standardized the vector of residuals R_it, i = 1, 2, 3, 4, by the square root of its variance-covariance matrix, and calculated the corresponding empirical variance-covariance matrix, which is expected to be the identity matrix (for details, see Chapter 8). The result is given in Table 10.2.

Figure 10.11: Residuals for the four categories against the lag 1 smoother (Asthma, Bronchitis, Ear infections, Others).

Table 10.2: Empirical variance-covariance matrix for standardized residuals.

         1        2        3        4
1     0.931    0.060    0.014   -0.013
2     0.060    0.889    0.050    0.153
3     0.014    0.050    0.931    0.131
4    -0.013    0.153    0.131    1.240

The interpretation of the results in Table 10.2 is complicated by the fact that they depend on the choice of the square-root matrix used for the standardization. However, the four R_it's are not far from being uncorrelated, so the matrix in Table 10.2 is close to the empirical correlation matrix for the four R_it's. If we take the asymptotic standard errors of the elements of the matrix to be 1/√730 ≈ 0.037, some of the elements of the fourth row are somewhat larger than predicted by the model, suggesting a problem with the category Others. It is not surprising that this category should show correlation with the categories Bronchitis and Ear infections, for example due to practical problems with the classification of certain cases. Otherwise, this does not seem to indicate any serious shortcoming of our model.
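The standardization behind Table 10.2 can be sketched with a Cholesky square root. Here R is a hypothetical 730 x 4 residual matrix, and the empirical covariance stands in for the model-based variance-covariance matrix used in the thesis; as the text notes, the result depends on which square-root matrix is chosen:

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical 730 x 4 matrix standing in for the four daily residual series R_it
R = rng.multivariate_normal(mean=np.zeros(4), cov=np.eye(4), size=730)

# empirical covariance as a stand-in for the model-based variance-covariance matrix
Sigma = np.cov(R, rowvar=False)
L = np.linalg.cholesky(Sigma)        # one possible choice of square-root matrix
Z = np.linalg.solve(L, R.T).T        # standardized residual vectors
emp = np.cov(Z, rowvar=False)        # should be close to the identity matrix
```

Off-diagonal entries of `emp` well beyond the 1/√730 ≈ 0.037 yardstick would flag residual correlation between categories, as seen for the fourth row of Table 10.2.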
10.5 Conclusions

10.5.1 Final model

We now comment on the interpretation of the model, whose parameter estimates are summarized in Tables 10.1 and 10.3, with asymptotic standard errors in brackets. The Wald test for the joint significance of lags 0, 1 and 2 of log-Sulphur had the value W = 11.706, which gave a p-value of 0.0085 compared with a χ²(3)-distribution. Hence, the effect of log-Sulphur on emergency room visits is highly significant.

The coefficient for log-Particulates is negative, but only slightly significant. Moreover, due to the use of linear interpolation (described in Section 10.2), this result should be interpreted with care.

Table 10.3: Estimates and standard errors for short-term effects.

                 Asthma             Bronchitis         Ear                Others
Temperature       0.0106 (0.0060)   -0.0147 (0.0045)    0.0002 (0.0039)   -0.0018 (0.0035)
Diff. Humidity    0.0682 (0.2336)    0.2964 (0.1763)    0.0903 (0.1168)   -0.0569 (0.0906)
Sum Humidity     -0.0388 (0.1590)    0.1780 (0.1207)   -0.0099 (0.0816)   -0.1638 (0.0646)
Sunday            0.0796 (0.1760)    0.9099 (0.1620)    1.8208 (0.1552)    2.4185 (0.1531)
Monday           -0.1050 (0.1807)    0.3878 (0.1695)    1.1237 (0.1601)    1.8075 (0.1554)
Tuesday          -0.3667 (0.1893)    0.1634 (0.1745)    1.0236 (0.1614)    1.5843 (0.1571)
Wednesday        -0.6121 (0.1986)    0.4878 (0.1685)    1.1242 (0.1606)    1.7100 (0.1564)
Thursday         -0.5163 (0.1951)    0.3026 (0.1722)    1.0648 (0.1613)    1.6419 (0.1570)
Friday           -0.1725 (0.1834)    0.1881 (0.1741)    1.1538 (0.1604)    1.8060 (0.1561)
Saturday          0.1798 (0.1747)    1.0689 (0.1612)    1.9462 (0.1554)    2.5418 (0.1535)
σ² = 0.019

We have fitted the above model without log-Particulates, in order to see whether correlation between log-Particulates and log-Sulphur could have influenced the coefficient of log-Sulphur. The resulting estimates for the effect of log-Sulphur (not shown) were close to those of Table 10.1.
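The two Wald p-values quoted in this chapter can be checked against the closed-form chi-square survival function, which is available in elementary form for 2 and 3 degrees of freedom; a sketch:

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) of the chi-square distribution,
    in closed form for df = 2 and df = 3."""
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise NotImplementedError("only df = 2 and df = 3 are needed here")

p_two_lags = chi2_sf(10.251, 2)   # joint test for lags 1 and 2 of log-Sulphur
p_all_lags = chi2_sf(11.706, 3)   # joint test for lags 0, 1 and 2
# compare with the p-values 0.006 and 0.0085 quoted in the text
```

Both values reproduce the rounded p-values reported for the two Wald tests.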
The actual correlation between log-Particulates and log-Sulphur was 0.36, a modest value, confirming that the effect of log-Particulates on the coefficient for log-Sulphur is likely to be of minor importance.

Figure 10.12 shows the Kalman smoother estimate of the latent process, which estimates the potential morbidity due to air pollution without regard to meteorological conditions. The estimated process is highly variable, and the estimate of the dispersion parameter (σ̂ = 0.13, from Table 10.3) shows that the stochastic variation of the latent morbidity process θ_t itself is high. There is evidence of major episodes of elevated risk of respiratory morbidity due to air pollution in December 1985 and late March 1986, and minor episodes in November-December 1984 and April 1985. These episodes are to some extent evident in the counts for three of the categories (Bronchitis, Ear and Others); see Figure 10.1.

Figure 10.12: The estimated latent process from the Kalman smoother.

Figure 10.13 shows a plot of the day-of-the-week effects from Table 10.3 for the four categories, which are very similar to the plot of daily averages shown by Knight et al. (1993). This plot shows a fairly clear weekly cycle, with more emergency room visits during the weekends than on workdays.

10.5.2 Comparisons with the previous analysis

While our main findings agree with those of Knight et al. (1993), some of the significant effects that they found were not confirmed by our analysis, which produced a more parsimonious model. We shall argue that there are objective reasons for these discrepancies. It is crucial in our analysis that we did not find any seasonal variation other than what can be explained by the meteorological variables. As a rule, one does not expect to find an interaction when the corresponding main effects are not significant, ruling out the possibility of finding interactions involving seasonality based on our model.
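The smoother behind Figure 10.12 operates on a non-Gaussian state space model, but its Gaussian ancestor, the fixed-interval (Rauch-Tung-Striebel) smoother for a local-level model, conveys the idea; the thesis's version replaces the exact Gaussian conditional mean with the BLUP. The following is an illustrative sketch with hypothetical inputs, not the thesis's algorithm or data.

```python
def rts_smoother(y, q, r, m0, p0):
    """Fixed-interval (RTS) smoother for the Gaussian local-level model:
    theta_t = theta_{t-1} + w_t, w_t ~ N(0, q);  y_t = theta_t + v_t, v_t ~ N(0, r)."""
    n = len(y)
    a, P = [0.0] * n, [0.0] * n          # one-step predicted means / variances
    m, p = [0.0] * n, [0.0] * n          # filtered means / variances
    mp, pp = m0, p0
    for t in range(n):                   # forward (Kalman filter) pass
        a[t], P[t] = mp, pp + q          # predict
        k = P[t] / (P[t] + r)            # Kalman gain
        m[t] = a[t] + k * (y[t] - a[t])  # update with observation y_t
        p[t] = (1 - k) * P[t]
        mp, pp = m[t], p[t]
    ms, ps = m[:], p[:]                  # backward (smoothing) pass
    for t in range(n - 2, -1, -1):
        g = p[t] / P[t + 1]              # smoothing gain
        ms[t] = m[t] + g * (ms[t + 1] - a[t + 1])
        ps[t] = p[t] + g * g * (ps[t + 1] - P[t + 1])
    return ms, ps

# Toy series (hypothetical numbers, not the Prince George data).
ms, ps = rts_smoother([3.0, 4.0, 2.0, 5.0, 4.0], q=0.5, r=1.0, m0=3.0, p0=1.0)
```

For the local-level model each smoothed mean is a convex combination of filtered means, so the smoothed path stays within the range of the observations, and the smoothed variances are no larger than the filtered ones.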
Thus, while Knight et al. (1993) found interactions between month and category, between temperature and season, and between humidity and season, no such effects were found in our analysis. We also found no interactions between season and the air pollution covariates, contrary to Knight et al. (1993), who found significantly different effects of the air pollution covariates for the four seasons.

Because we consider air pollution measurements as long-term covariates, it is an assumption in our model that the air pollution effects are the same for all four categories. However, this assumption was confirmed via the residual analysis, in agreement with the findings by Knight et al. (1993).

At a general level, we expect that an analysis in which the correlation structure of the data is taken into account should produce larger standard errors of estimates, and hence fewer significant results, compared with an analysis based on assuming independent counts. However, while Knight et al. (1993) worked mostly with the assumption of no serial correlation for the counts, they did perform a check of this assumption by incorporating lagged counts as covariates, creating a kind of autoregressive model. This check showed that their conclusions are not much affected by the assumption of no serial correlation.

The principal difference between the two approaches is that, while Knight et al. (1993) assumed that the four daily counts are conditionally independent given the covariates, we assume conditional independence given the covariates and the latent process, implying that, in our model, the daily counts are correlated given the covariates. The estimated correlations from our analysis were in the range 0.235 to 0.888, indicating that an analysis based on assuming independence of the four categories might create spurious significances. By taking these correlations into account, our model is hence more realistic.
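The mechanism behind these correlations is easy to mimic: conditionally on a shared latent morbidity level the four counts are independent Poisson, yet marginally they are positively correlated. The rates and gamma parameters below are hypothetical, chosen only to illustrate the effect, and do not come from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(7)

# Shared latent morbidity level (here i.i.d. gamma with mean 1, for simplicity;
# the thesis uses a serially dependent latent process).
n = 5000
theta = rng.gamma(shape=4.0, scale=0.25, size=n)

# Conditionally independent Poisson counts for four hypothetical categories.
lam = np.array([2.0, 5.0, 10.0, 15.0])
counts = rng.poisson(lam[None, :] * theta[:, None])

# Marginally the counts are positively correlated, because all four
# categories share the same theta on each day.
corr = np.corrcoef(counts, rowvar=False)
```

Ignoring this shared-factor structure and treating the columns of `counts` as independent is exactly the situation in which spurious significances can arise.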
As mentioned in Section 10.4, there may in fact be more correlation present than explained by our model, but a more detailed analysis of the correlation structure is unlikely to alter the conclusions by much.

Finally, it should be pointed out that the χ²-tests used by Knight et al. (1993) in their quasi-likelihood Poisson analysis have not yet been studied in detail in the literature. They use a χ²-approximation to the scaled deviance difference, but this approximation ignores the randomness of the scale estimator. If anything, this would tend to overemphasize parameter effects, although the effect would disappear for large sample sizes. Moreover, this aspect is likely to be less important than the question of the correlation structure discussed above.

Chapter 11

Further research

This chapter gives a brief discussion of further research for some topics of the thesis, covering the following three parts: (1) comparison of the approximate inferences for population-averaged models in Chapter 4; (2) development of an estimation procedure for the stationary time-series models proposed in Chapter 5; and (3) some theoretical questions concerning the state space models proposed in Chapters 7 and 8. We will mainly list questions and, for some problems, give specific suggestions.

11.1 Comparison of approximate inferences

The main features of the two likelihood-based approximate inference methods proposed in Chapter 4 may be summarized as follows:

Method based on Pearson residuals: This method relates to an unbiased inference function derived from Pearson residuals. These residuals give a bad approximation to normal scores when the dispersion parameter is large. The GEE approach is a limiting case of the method when the dispersion parameter becomes small.

Method based on deviance residuals: This method is associated with a biased inference function derived from deviance residuals.
These residuals give good approximations to normal scores even when the dispersion parameter is not small. A bias-correction is usually needed to obtain reasonable estimates.

It is not clear to us whether, and how, the bias-correction approach can improve the estimation procedure of the inference based on deviance residuals. Before proceeding with any theoretical work, a simulation study for numerical comparison should be made first. In general, the following questions should be addressed through the simulation.

(a) With regard to the estimates from the approach based on deviance residual approximations, does a bias-correction improve the estimates substantially? Is there any relationship between the role of the bias-correction and the values of the dispersion parameter and sample size?

(b) Compare the estimates from the two approximate methods, taking both the value of the dispersion parameter and the sample size into account.

In addition, the application of the multivariate dispersion models in survival analysis (see Lawless, 1982) would be interesting.

11.2 Estimation for stationary time-series models

For the stationary time-series models, we need to develop the estimation theory of the model and fit the model to the sunspot data that motivated our study. Due to the similarity of our model and the classic ARMA models, an analogous estimating approach such as least-squares estimation could be used for our models. Maximum likelihood estimation might be difficult because this method requires evaluation of high-dimensional integrals.

Another interesting question is to develop a numerical tool for computing the partial autocorrelation function defined in Chapter 5. It is very difficult to compute such a function directly, and therefore approximations seem inevitable, but at present we do not know how to approximate this function.
11.3 Extensions of state space models

For the state space models proposed in Chapters 7 and 8, here are some questions concerning further studies.

(a) Develop models that are able to accommodate high-order Markov structure to replace the need for the lagged covariates which appeared in the model for the Prince George data.

(b) Give a theoretical proof of the asymptotic normality of the estimators from the Kalman estimating equation in Chapter 8.

(c) Give a theoretical proof of the convergence of the modified EM-algorithm where the E-step is replaced by the Kalman smoother.

(d) Study the orthogonality of the regression parameters and the dispersion parameters. If this holds, the asymptotic variance matrix of the estimates is block diagonal.

(e) In Chapter 8, extend the state space models to deal with the spatial correlation likely to be present in the case where data from several air monitoring stations and hospitals are available, a problem we avoided by taking daily averages over the six monitoring stations in Prince George. One might address this problem by working with a multi-dimensional latent process, having for example a component for each hospital, while allowing all monitoring stations to influence each component.

(f) Develop a test for the hypothesis that the dispersion parameter of the latent process is zero. This test would provide information about the relevance of the latent process.

Bibliography

[1] Al-Osh, M.A. and Aly, E.A.A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Comm. Statist. A 21, 2483-2492.

[2] Al-Osh, M.A. and Alzaid, A.A. (1987). First-order integer-valued autoregressive (INAR(1)) process. J. Time Series Analysis 8, 261-275.

[3] Andrews, D.F. and Herzberg, A.M. (1980). DATA - a collection of problems and data from many fields for the student and research worker in statistics.

[4] Azzalini, A. (1982). Approximate filtering of parameter driven processes. J.
Time Series Anal. 3, 38-44.

[5] Barndorff-Nielsen, O.E. (1978). Hyperbolic distributions and distributions on hyperbolae. Scand. J. Statist. 5, 151-157.

[6] Barndorff-Nielsen, O.E. (1990). p* and Laplace's method. Braz. J. Probab. Statist. 4, 89-103.

[7] Barndorff-Nielsen, O.E. and Jørgensen, B. (1991). Some parametric models on the simplex. J. Mult. Anal. 39, 106-116.

[8] Bates, D.V. and Sizto, R. (1983). Relationship between air pollution and hospital admissions in southern Ontario. Canadian Journal of Public Health 74, 117-122.

[9] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

[10] Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. Springer-Verlag, New York.

[11] Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.

[12] Carey, V., Zeger, S.L. and Diggle, P. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika 80, 517-26.

[13] Chan, K.S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. J. Amer. Statist. Assoc. 90, 242-252.

[14] Consul, P.C. (1989). Generalized Poisson Distribution: Properties and Applications. Marcel Dekker, New York.

[15] Cox, D.R. (1981). Statistical analysis of time series: some recent developments. Scand. J. Statist. 8, 93-115.

[16] Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall, London.

[17] De Jong, P. (1990). The diffuse Kalman filter. Ann. Statist. 19, 1073-1083.

[18] De Jong, P. and Shephard, N. (1995). The simulation smoother for time series models. Biometrika 82, 339-350.

[19] Delampady, M., Yee, I.M.L. and Zidek, J.V. (1993). Hierarchical Bayesian analysis of a discrete time series of Poisson counts. Statist. Comp. 3, 7-15.

[20] Dempster, A.P., Laird, N. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy.
Statist. Soc. Ser. B 39, 1-38.

[21] Diggle, P.J., Liang, K.-Y. and Zeger, S.L. (1994). The Analysis of Longitudinal Data. Oxford University Press, Oxford.

[22] Dimri, V. (1992). Deconvolution and Inverse Theory. Elsevier Science Publishers B.V., The Netherlands.

[23] Dockery, D.W., Pope, C.A., Xu, X., Spengler, J.D., Ware, J.H., Fay, M.E., Ferris, B.G. and Speizer, F.E. (1993). An association between air pollution and mortality in six U.S. cities. New England Journal of Medicine 329, 1753-1759.

[24] Durbin, J. (1990). Extensions of Kalman modelling to non-Gaussian observations. Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali 12, 3-12.

[25] Dwyer, J.H., Feinleib, M., Lipert, P. and Hoffmeister, H., editors (1991). Statistical Models for Longitudinal Studies of Health. Wiley, New York.

[26] Fahrmeir, L. and Kaufmann, H. (1991). On Kalman filtering, posterior mode estimation and Fisher scoring in dynamic exponential family regression. Metrika 38, 37-60.

[27] Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, New York.

[28] Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2 (2nd Edition). Wiley, New York.

[29] Fitzmaurice, G.M. and Laird, N.M. (1993). A likelihood-based method for analysing longitudinal binary responses. Biometrika 80, 141-51.

[30] Fitzmaurice, G.M., Laird, N.M. and Rotnitzky, A.G. (1993). Regression models for discrete longitudinal responses. Statistical Science 8, 284-309.

[31] Gaver, D.P. and Lewis, P.A.W. (1980). First-order autoregressive gamma sequences and point processes. Adv. Appl. Probab. 12, 727-745.

[32] Gelfand, A.E. and Smith, A.F.M. (1990). Sampling based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398-409.

[33] Geman, S. and Geman, D. (1984).
Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.

[34] Godambe, V.P. (1976). Conditional likelihood and unconditional optimum estimating equations. Biometrika 63, 277-284.

[35] Grunwald, G.K., Raftery, A.E. and Guttorp, P. (1993). Time series of continuous proportions. J. Roy. Statist. Soc. Ser. B 55, 103-116.

[36] Harrison, P.J. and Stevens, C.F. (1976). Bayesian forecasting (with discussion). J. Roy. Statist. Soc. Ser. B 38, 205-247.

[37] Harvey, A.C. (1981). Time Series Models. Allan, Oxford.

[38] Harvey, A.C. and Fernandes, C. (1989). Time series models for counts or qualitative observations (with discussion). J. Business Econ. Statist. 7, 407-422.

[39] Hutchinson, T.P. and Lai, C.D. (1990). Continuous Bivariate Distributions, Emphasising Applications. Rumsby, Sydney, Australia.

[40] Jacobs, P.A. and Lewis, P.A.W. (1977). A mixed autoregressive-moving average exponential sequence and point process (EARMA 1,1). Adv. Appl. Probab. 9, 87-104.

[41] Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. I: Correlational and runs properties. J. R. Statist. Soc. B 40, 94-105.

[42] Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. II: Asymptotic properties. J. R. Statist. Soc. B 40, 222-228.

[43] Jacobs, P.A. and Lewis, P.A.W. (1978). Stationary discrete autoregressive-moving average time series generated by mixtures. J. Time Series Analysis 4, 19-36.

[44] Jansson, P.A. (1984). Deconvolution with Applications in Spectroscopy. Academic Press, Inc.

[45] Jensen, J.L. (1992). A note on a conjecture of H.E. Daniels. Braz. J. Probab. Statist. 6, 85-95.

[46] Jones, R.H. (1993). Longitudinal Data with Serial Correlation. Chapman and Hall, London.

[47] Joe, H. (1993). Parametric families of multivariate distributions with given margins. J. Mult. Anal. 46, 262-282.

[48] Joe, H. (1996a).
Time series models with univariate margins in the convolution-closed infinitely divisible class. J. Appl. Probab. 33, 664-677.

[49] Joe, H. (1996b). Multivariate Models and Dependence Concepts. Forthcoming monograph.

[50] Jørgensen, B. (1983). Maximum likelihood estimation and large-sample inference for generalized linear and nonlinear regression models. Biometrika 70, 19-28.

[51] Jørgensen, B. (1986). Some properties of exponential dispersion models. Scand. J. Statist. 13, 187-198.

[52] Jørgensen, B. (1987a). Exponential dispersion models (with discussion). J. Roy. Statist. Soc. Ser. B 49, 127-162.

[53] Jørgensen, B. (1987b). Small-dispersion asymptotics. Braz. J. Probab. Statist. 1, 59-90.

[54] Jørgensen, B. (1996). The Theory of Dispersion Models. Forthcoming monograph.

[55] Jørgensen, B. and Lauritzen, S.L. (1993). Constructing multivariate proper dispersion models. Unpublished manuscript.

[56] Jørgensen, B., Martinez, J.R. and Tsao, M. (1994). Asymptotic behaviour of the variance function. Scand. J. Statist. 21, 223-243.

[57] Jørgensen, B., Labouriau, R. and Lundbye-Christensen, S. (1994). Linear growth curve analysis based on exponential dispersion models. Technical Report #140, Department of Statistics, University of British Columbia. To appear in J. Roy. Statist. Soc. Ser. B.

[58] Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics, University of British Columbia.

[59] Jørgensen, B. and Song, X.-K. (1995). Stationary time-series models with exponential dispersion model margins. Technical Report #154, Department of Statistics, University of British Columbia.

[60] Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 34-45.

[61] Kalman, R.E. and Bucy, R.S. (1961). New results in linear filtering and prediction theory.
Journal of Basic Engineering 83, 95-108.

[62] Kalman, R.E. (1963). New methods in Wiener filtering theory. Proceedings of the First Symposium on Engineering Application of Random Function Theory and Probability (Bogdanoff, J.L. and Kozin, F., editors). Wiley, New York.

[63] Kaufmann, H. (1987). Regression methods for non-stationary categorical time series: Asymptotic estimation theory. Ann. Statist. 17, 79-98.

[64] Key, P.B. and Godolphin, E.J. (1981). On the Bayesian steady forecasting model. J. Roy. Statist. Soc. Ser. B 43, 92-96.

[65] Kitagawa, G. (1987). Non-Gaussian state-space modelling of nonstationary time-series (with discussion). J. Amer. Statist. Assoc. 82, 1032-1063.

[66] Knight, K., Leroux, B.G., Millar, J. and Petkau, A.J. (1993). Air pollution and human health: A study based on data from Prince George, British Columbia. Unpublished manuscript.

[67] Kuk, A.Y.C. (1995). Asymptotically unbiased estimation in generalized linear models with random effects. J. R. Statist. Soc. B 57, 395-407.

[68] Laird, N.M. (1991). Topics in likelihood-based methods for longitudinal data analysis. Statistica Sinica 1, 33-50.

[69] Lambert, M.L., Lamble, S.J. and Girard, R.W. (1987). 1986 Annual Air Quality Report for Prince George. Waste Management Branch, Ministry of Environment & Parks of British Columbia, Prince George, British Columbia.

[70] Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York.

[71] Lawrance, A.J. (1980). The mixed exponential solution to the first-order autoregressive model. J. Appl. Probab. 17, 546-552.

[72] Lawrance, A.J. (1982). The innovation distribution of a gamma distributed autoregressive process. Scand. J. Statist. 9, 234-236.

[73] Lawrance, A.J. and Lewis, P.A.W. (1977). An exponential moving average sequence and point process (EMA 1). J. Appl. Probab. 14, 98-113.

[74] Lawrance, A.J. and Lewis, P.A.W. (1980). The exponential autoregressive-moving average EARMA(p,q) process. J. R. Statist. Soc. B 42, 150-161.
[75] Lawrance, A.J. and Lewis, P.A.W. (1981). A new autoregressive time series model in exponential variables (NEAR(1)). Adv. Appl. Probab. 13, 826-845.

[76] Lawrance, A.J. and Lewis, P.A.W. (1985). Modelling and residual analysis of non-linear autoregressive time series in exponential variables. J. R. Statist. Soc. B 47, 000-000.

[77] Leroux, B.G. and Puterman, M.L. (1992). Maximum-penalized-likelihood estimation for independent and Markov-dependent mixture models. Biometrics 48, 545-558.

[78] Levy, D., Gent, M. and Newhouse, M.T. (1977). Relationship between acute respiratory illness and air pollution levels in an industrial city. American Review of Respiratory Disease 116, 167-173.

[79] Lewis, P.A.W. (1983). Generating negatively correlated gamma variates using the beta-gamma transform. In Proceedings of the 1983 Winter Simulation Conference, eds. S. Roberts, J. Banks, B. Schmeiser. IEEE Press, New York, pp. 175-176.

[80] Lewis, P.A.W. and McKenzie, E. (1991). Minification processes and their transformations. J. Appl. Probab. 28, 45-47.

[81] Lewis, P.A.W., McKenzie, E. and Hugus, D.K. (1989). Gamma processes. Commun. Statist.-Stochastic Models 5(1), 1-30.

[82] Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.

[83] Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford University Press, Oxford.

[84] Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1990). Maximum likelihood regression methods for paired binary data. Statistics in Medicine 9, 1517-1525.

[85] McCormick, W.P. and Park, Y.S. (1992). Asymptotic analysis of extremes from autoregressive negative binomial processes. J. Appl. Probab. 29, 904-920.

[86] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd edition. Chapman and Hall, London.

[87] McKenzie, E. (1981).
Extending the correlation structure of exponential autoregressive-moving average processes. J. Appl. Probab. 18, 181-189.

[88] McKenzie, E. (1986). Autoregressive moving average processes with negative-binomial and geometric marginal distributions. Adv. Appl. Probab. 18, 679-705.

[89] McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20, 822-835.

[90] Marshall, A.W. and Olkin, I. (1988). Families of multivariate distributions. J. Amer. Statist. Assoc. 83, 834-841.

[91] Naik-Nimbalkar, U.V. and Rajarshi, M.B. (1995). Filtering and smoothing via estimating functions. J. Amer. Statist. Assoc. 90, 301-306.

[92] Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135, 370-384.

[93] Pope, C.A. (1989). Respiratory disease associated with community air pollution and a steel mill, Utah Valley. American Journal of Public Health 79, 623-628.

[94] Ransom, M.R. and Pope, C.A. (1992). Elementary school absences and PM10 pollution in Utah Valley. Environmental Research 58, 204-219.

[95] Reid, N. (1995). The roles of conditioning in inference (with discussion). Statistical Science 10, 138-199.

[96] Ripley, B. (1987). Stochastic Simulation. Wiley, New York.

[97] Rogers, G.S. (1980). Lecture Notes in Statistics: Matrix Derivatives. Marcel Dekker, Inc., New York.

[98] Samet, J.M., Bishop, Y., Speizer, F.E., Spengler, J.D. and Ferris, B.F. (1981). The relationship between air pollution and emergency room visits in an industrial community. Journal of the Air Pollution Control Association 31, 236-240.

[99] Seshadri, V. (1992). General exponential models on the unit simplex and related multivariate inverse Gaussian distributions. Statist. Probab. Letters 14, 385-391.

[100] Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81, 115-131.

[101] Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

[102] Sklar, A. (1959).
Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 8, 229-231.

[103] Smith, J.Q. (1979). A generalization of the Bayesian steady forecasting model. J. Roy. Statist. Soc. Ser. B 41, 375-387.

[104] Smith, R.L. and Miller, J.E. (1986). A non-Gaussian state space model and application to prediction of records. J. Roy. Statist. Soc. Ser. B 48, 79-88.

[105] Spall, J.C. (ed.) (1988). Bayesian analysis of time series models. Marcel Dekker, New York.

[106] Stiratelli, R., Laird, N. and Ware, J.H. (1984). Random effects models for serial observations with binary response. Biometrics 40, 961-971.

[107] Stout, W.F. (1974). Almost Sure Convergence. Academic Press, New York.

[108] Sweeting, T.J. (1981). Scale parameters: A Bayesian treatment. J. Roy. Statist. Soc. Ser. B 43, 333-338.

[109] Tanner, M.A. (1991). Tools for Statistical Inference (Lecture Notes in Statistics, 67). Springer-Verlag, Berlin.

[110] Tweedie, M.C.K. (1947). Functions of a statistical variate with given means, with special reference to Laplacian distributions. Proc. Cambridge Phil. Soc. 49, 41-49.

[111] West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York.

[112] West, M., Harrison, P.J. and Migon, H.S. (1985). Dynamic generalized linear models and Bayesian forecasting. J. Amer. Statist. Assoc. 80, 73-97.

[113] Xu, J. (1996). Ph.D. Thesis. Department of Statistics, University of British Columbia.

[114] Yule, G.U. (1927). On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Phil. Trans. Roy. Soc. A 226, 267-298.

[115] Zeger, S.L. (1988). A regression model for time series of counts. Biometrika 75, 621-629.

[116] Zeger, S.L., Liang, K.-Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time-dependent covariates. Biometrika 72, 31-38.

[117] Zeger, S.L. and Qaqish, B. (1988).
Markov regression models for time series: A quasi-likelihood approach. Biometrics 44, 1019-1031.

[118] Zeger, S.L., Liang, K.-Y. and Albert, P.S. (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44, 1049-1060.

[119] Zeger, S.L. and Karim, M.R. (1991). Generalized linear models with random effects: A Gibbs sampling approach. J. Amer. Statist. Assoc. 86, 79-86.

[120] Zhao, L.P. and Prentice, R.L. (1990). Correlated binary regression using a generalized quadratic model. Biometrika 77, 642-648.