STATISTICAL A N A L Y S I S OF DISCRETE TIME SERIES WITH APPLICATION TO THE A N A L Y S I S OF WORKERS ' COMPENSATION C L A I M S D A T A by R. Keith Freeland B.Sc. The University of Calgary, 1990 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE F A C U L T Y OF G R A D U A T E STUDIES M A N A G E M E N T SCIENCE DIVISION we accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH C O L U M B I A February 1998 Â© R. Keith Freeland, 1998 In presenting this thesis in partial, fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain ; shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada Date DE-6 (2/88) Abstract This thesis examines the statistical properties of the Poisson AR(1) model of A l -Osh and Alzaid (1987) and McKenzie (1988). The analysis includes forecasting, estimation, testing for independence and specification and the addition of regressors to the model. The Poisson AR(1) model is an infinite server queue, and as such is well suited for modeling short-term disability claimants who are waiting to recover from an injury or illness. One of the goals of the thesis is to develop statistical methods for analyzing series of monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB) of British Columbia. We consider four types of forecasts, which are the k-step ahead conditional mean, median, mode and distribution. For low count series the k-step ahead conditional distribution is practical and much more informative than the other forecasts. We consider three estimation methods: conditional least squares (CLS), generalized least squares (GLS) and maximum likelihood (ML). In the case of CLS estimation we find an analytic expression for the information and in the GLS case we find an approximation for the information. We find neat expressions for the score function and the observed Fisher information matrix. The score expressions leads to new definitions of residuals. Special care is taken to test for independence since the test is on the boundary of the parameter space. The score test is asymptotically equivalent to testing whether the CLS estimate of the correlation coefficient is zero. Further we define a Wald and ii likelihood ratio test. Then we use the general specification test of McCabe and Leybourne (1996) to test whether the model is sufficient to explain the variation found in the data. Next we add regressors to the model and update our earlier forecasting, estimation and testing results. We also show the model is identifiable. We conclude with a detailed application to monthly W C B claims counts. The preliminary analysis includes plots of the series, autocorrelation function and partial autocorrelation function. Model selection is based on the preliminary analysis, t-tests for the parameters, the general specification test and residuals. We also include forecasts for the first six months of 1995. iii Table of Contents Abstract ii List of Tables vii List of Figures x Acknowlegements xiV 1 Introduction 1 1.1 General 1 1.2 Overview of topics 4 2 Poisson AR(1) model 10 2.1 Model definition .- 10 2.2 Interpretation 12 2.3 Basic properties 13 2.4 Poisson AR(p) model 15 2.5 A n illustrative example 22 3 Forecasting 26 3.1 Minimum Mean squared error 26 3.2 Minimum mean absolute error 32 3.3 Forecasts distributions 33 3.4 Prediction intervals.... 36 3.5 Duration 39 4 Estimation 41 4.1 Likelihood theory and estimating functions 41 4.2 Conditional least squares for the Poisson AR(1) model 64 4.3 Generalized least squares for the Poisson AR( 1) model 69 iv 4.4 The score and Fisher information for the Poisson AR(1) model 81 4.5 The score and Fisher information for a general AR(1) model 94 4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model 98 4.7 Comparison of methods 103 5 Testing for independence 108 5.1 Gaussian AR(1) 108 5.2 Conditional least squares 109 5.3 Score test I l l 5.4 The score function on the boundary of the parameter space 114 6 General misspecification test 121 6.1 Overview 121 6.2 Outline test 123 6.3 Details for the Poisson AR(1) model 127 7 Models with covariates 134 7.1 Model definition and introduction 134 7.2 Forecasting 135 7.3 Estimation 142 7.4 Testing 148 8 Application to counts of workers collecting disability benefits 154 8.1 Workers' Compensation Data 154 8.2 Model selection and testing 162 8.3 Arrival process 175 8.4 Forecasting 180 8.5 Gaussian AR(1) models 184 Bibliography 190 Appendix ...195 vi List of Tables 3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts 35 3.4.1 95% prediction intervals for the k-step ahead conditional distribution 39 4.7.1 The diagonal elements of the Godambe information matrix for GLS, CLS and C M L when a = 0.3 and X = 1 105 4.7.2 The diagonal elements of the Godambe information matrix for GLS, CLS and C M L when a = 0.7 and 1 = 1 105 i 5.4.1 Tests for independence in the illustrative data set 119 5.4.2 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with a = 0 and A, = 1 119 5.4.3 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with a = 0.1 and X = 1 120 8.1.1 A summary of simple statistics for data sets 1 through 5 155 8.2.1 Tests for independence in data set 1 * 169 8.2.2 This table displays the seasonal arrival rate for data set 3 169 8.2.3 This table summaries the parameter estimation for data sets 1 to 5; included are the parameter estimates and the upper and lower 95% confidence limits 170 vii 8.2.4 This table summarizes the joint information matrix test of models 1,2 and 3 on data set 3 170 8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*,2,.. .,5 170 8.3.1 This table summaries the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence interval. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model 178 8.3.2 This table displays the seasonal arrival rate for data set 3A 179 8.3.3 This table summarizes the information matrix test for data sets 1A-5A 179 8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1 * '. 181 8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2 181 8.4.3 The k-step ahead conditional means, conditional medians and point mass forecast for data set 3 182 8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4 182 vm 8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5 182 8.4.6 The marginal means, medians and distributions for data set 3 183 8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1 to 5 185 8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model 186 ix List of Figures 2.5.1 Times series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 2 2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 25 2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 25 3.3.1 Time series plot of monthly claims count collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury 35 4.4.1 Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 92 4.4.2 Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93 4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93 4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. A l l claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93 4.7.1 The asymptotic efficiency of conditional least squares as a function of a when X = l 104 4.7.2 The asymptotic efficiency of conditional least squares as a function of X when xi a =0.3 104 4.7.3 Box plots comparing the sampling distributions of a when the arrival process {s,} is uniform over {0,1,2} 106 4.7.4 Box plots comparing the sampling distributions of i when the arrival process {s t } is uniform over {0,1,2} 107 5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of a . X - 1 and n = 100 .....110 5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of X. a = 0.01 and n = 100 I l l 5.4.1 Plots of the likelihood as a function of a for five simulated samples of size 200 with a = 0 and X = 1 115 5.4.2 Plots of the likelihood as a function of a for five simulated samples of size 200 with a = 0.1 and A, = 1 115 8.1.1 A time series plot of data set 1 158 8.1.2 A time series plot of data set 2 158 8.1.3 A time series plot of data set 3 158 8.1.4 A time series plot of data set 4 159 xii 8.1.5 A time series plot of data set 5 159 8.1.6 A C F ' s and PACF's for data sets 1 to 3. 160 8.1.7 A C F ' s and PACF's for data sets 4 to 5 161 8.1.8 A C F ' s and PACF's for data sets 1* 161 8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1*. ...170 8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2 171 8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3 172 8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3 : 173 8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4 174 8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 4 174 8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model 186 xiii Acknowledgements I thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their encouragement and guidance throughout my graduate career at U B C . I would like to acknowledge the assistance and valuable comments of another member of my supervising committee, Dr. Bent J0rgensen. I would like to thank the Workers' Compensation Board of British Columbia for providing the data, which motivated this thesis. I further thank the Workers' Compensation Board of British Columbia for providing me with some financial support. I thank my wife Mary Kelly for her support and patience. Our first son Terry Freeland was born during our stay at U B C and our second son Sandy Freeland was born only a few weeks ago. Both boys have brought us vast amounts joy and happiness. Finally I would like to thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their financial support to me in the past six years. xiv Chapter 1 1. Introduction 1.1 General Count data is generated when the number of occurrences of an event type are recorded. Examples include the monthly number of motor vehicle accidents, the annual number of murders, the number of goals scored in a hockey game, the number of stolen bases by a baseball player in a season and the monthly number of new cancer patients. Often the prime objective in studying count data is to understand how counts are changing over time, for instance seasonal patterns such as more suicides in February, and trends such as increases in the number of cancer patients. For this reasons count data is often collected over time to form a time series. Often the term discrete time series is used and includes more general situations in which the discrete values were not necessarily counts. Methods for examining and modeling time series have been around a long time. The most popular method for modeling time series is the Box-Jenkins modeling procedure (ARTJVIA models), developed by Box and Jenkins (1970). The limitation of these models is that the random variation in the series is assumed to be normally distributed. When counts are "large" it is often assumed that time series of counts can be adequately approximated by continuous time series which are normally distributed. The reasoning for this is that many common distributions for count data (e.g. binomial, Chapter 1. Introduction 2 Poisson and negative binomial) have an approximate normal distribution when the distribution mean is "large". Often series have very small counts or even many consecutive zeroes. Series such as the monthly number of Polio cases in the US, or the monthly number of new cases of women with AIDS in Vancouver wil l typically contain many consecutive months with no occurrences. Because of the inappropriateness of classic time series methods for the study of time series of count data, much effort has and continues to be put into the development of methodology necessary to study such time series. It is important to find out when and to what extent classical time series methods fail. Throughout the thesis we wil l compare our results for the Poisson AR(1) model to the results one would get by incorrectly using a Gaussian AR(1) model. For example we get inefficient parameter estimates when estimation is based on the pseudo Gaussian AR(1) likelihood. However when it comes to testing for independence in a Poisson AR(1) time series one can incorrectly assume a Gaussian AR(1) model with little consequence. Several authors have studied the time series of polio counts in the US, see Zeger (1988), Chan and Ledolter (1995), J0rgensen et al (1995) and Song (1996). These works are based on the theory of state space models. For an overview of state space models see Fahrmeir and Tutz (1994). Other authors have taken a different approach to modeling count time series, by trying to develop models with similar properties of the popular A R I M A models. See Al-Osh and Alzaid (1987), McKenzie (1988), Joe (1996 & 1997), J0rgensen and Song (1995) and Song (1996). Chapter 1. Introduction 3 One of the goals of this thesis is to develop models to use in an analysis of data on short-term disability (STD) claim counts at the Workers' Compensation Board of British Columbia (WCB). The W C B provides disability insurance for more than 130,000 employers in British Columbia. Every year the W C B receives about 200,000 new claims. There are two broad types of disability claims: health care only claims and wage loss claims. In this thesis claims counts for those workers collecting Short Term Wage Loss Benefits (STWLB) will be examined. A large portion of the economy in British Columbia has historically been based on the resource sector. In industries such as forestry, fishing and mining, injuries such as broken bones, cuts, dislocations and sprains are predominant. Lately the depletion of natural resources and other environmental concerns have reduced the activities of many resource industries. This however has not brought a decrease in the number of claims, since injuries from other industries are becoming more prevalent. For example, the number of repetitive strain injuries suffered by store clerks and computer programmers has increased dramatically. About half of all the claims at the W C B are strains and of these about half are back strains. The data which have been provided by the W C B contains the monthly count of the number of workers collecting STWLB. These data are grouped by 47 injuries, 16 industries, 10 claims centers, 2 genders and 4 age groups, resulting in 47x16x10x2x4=60,160 separate series. Of course many of the series have no occurrences at all. Chapter 1. Introduction 4 Each month the number of workers collecting STWLB is composed of 2 components, new claims and continuing claims. Since the number of continuing claims this month depends on the total number collecting claims last month, these data have an auto-regressive structure of order 1. The count AR(1) model discussed in Al-Osh and Alzaid (1987) is appropriate for studying such data. Much of the work done to date on such AR(1) models for count data has concerned probabilistic issues, such as correlation structure, distribution and time reversibility. Little or no attention has been devoted to issues of inference and forecasting. There is a need for statisticians and econometricians to study these models keeping statistical and practicality issues in mind. This thesis focuses on forecasting, estimation, inference, covariates, residual analysis, and the fitting of the model to actual data. 1.2 Overview of topics The outline of this thesis is as follows. In Chapter 2 we begin by presenting the Poisson AR(1) model, Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the W C B claim data is discussed. We can think of disabled workers as remaining in a queue waiting to get healthy. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. This is important since it gives justification for the use of binomial thinning to model the correlation in these data. We continue the chapter by reviewing known properties of the Poisson AR(1) model. Then we discuss three extensions of this model to a Poisson AR(p) model that are found in the literature. Our contribution is to show that the partial autocorrelation function can be used to select Chapter 1. Introduction 5 the number of lags in the Poisson AR(p) model of Jin-Guan and Yuan (1991). That is, the partial autocorrelations beyond lag p are zero, as they are for the Gaussian AR(p) model. The chapter concludes with an illustration applied to one of the W C B data series. We consider Chapter 3 to be a major contribution. In this chapter we consider how to generate forecasts from the Poisson AR(1) model. Two criteria are considered for finding optimal forecasts, the minimum squared error and minimum absolute error of the forecasts. The squared error approach results in the conditional mean as the optimal forecast, while the absolute error approach yields the conditional median as the optimal forecast. It is interesting to note that the conditional median is an integer, which may be desirable from the point of view of data cohesion. We derive formulas for the k-step ahead conditional mean and variance as well as the k-step ahead conditional moment generating function. Our analysis is concerned with count series, more specifically low count series. That is, the number of possible outcomes for the k-step ahead count is small. In such cases it is possible and more useful to calculate the probability mass for each of these outcomes, which we call point mass forecasting. We show how to find confidence intervals around each point mass forecast when parameter estimates are used. The chapter concludes by finding confidence intervals for the mean duration of claims. Chapter 4 begins with a brief review of likelihood theory and estimating functions and follows the exposition in Barndorff-Nielsen and Sprensen (1994) and Godambe and Heyde (1987). This theory is used to unify the following methods of estimation and inference for this model: conditional least squares (CLS), generalized least squares Chapter 1. Introduction 6 (GLS), and conditional maximum likelihood (CML). Al-Osh and Alzaid (1987) considered estimation by CLS and C M L , but ignored inference except to refer to the general results for CLS in Klimko and Nelson(1978). Wooldridge (1991) considers GLS estimation and inference for parametric models of conditional means and variances. Our contributions include the following: we prove a strong law of large numbers which requires simply moment restrictions that typically hold for autoregressive processes. Then we find an analytic expression for the expected information matrix in the CLS case and derive an approximation for the expected information matrix in the GLS case. Further we find neat forms for expressing the score function and observed Fisher information matrix in terms of conditional expectations. This result generalizes to other cases where each observation is a drawing from a convolution of two distributions. It is relatively easy to numerically calculate the expected Fisher information. This makes Fisher scoring much faster than using the observed information in a Gauss-Newton method. We also show that the Poisson AR(1) model is a -mixing. We directly compare the asymptotic efficiency of CLS to C M L , and find that the efficiency is one when a = 0 and decreases as a increases. Finally, we use a simulation to show that the methods of CLS and GLS are more robust to model misspecification than C M L . To date in the literature no one has considered any statistical tests on the Poisson AR(1) model. In Chapter 5 we develop tests to check i f a series is independent. This is important since it would be inappropriate to fit an autoregressive model to a series of Chapter 1. Introduction 7 independent random variables. We note that special care must be taken since we are testing to see i f the parameter value lies on the boundary of its domain. A test based on the score is developed and found to be asymptotically equivalent. to a one sided test of the least squares estimator for a . In fact the difference between the score and CLS statistics is in the denominators, which are respectively the sample mean and variance. Since under the null hypothesis, the series consists of independent and identically distributed Poisson random variables, the sample mean and variance should be asymptotically equivalent. However i f the data are over-dispersed then the sample mean will be smaller than the sample variance, or the score based statistic wil l be larger than the CLS statistic. It is further found that the score is a well defined martingale for values of a in a neighborhood of zero, and hence a Taylor series expansion of the score function around the point a = 0 is also well defined. This leads to a one sided score, Wald and likelihood ratio test. Chapter 6 considers a general specification test for the model. Having satisfied ourselves that a series is dependent is it adequately modeled by the Poisson AR(1) process? This general specification test is often referred to as the information matrix test. Basically it tests to see i f the difference between the two representations of the information matrix (sometimes referred to as the inner and outer products) is distributed about zero, which should be the case i f the model is correctly specified. Chesher (1983) showed that the information matrix test can be interpreted as a score test, and amounts to testing whether the model parameters are stochastic. This test wil l therefore indicate Chapter 1. Introduction 8 whether or not over-dispersion is present in the data. McCabe and Leybourne (1996) extended Chesher's result to a much more general setting, including time series of dependent non-homogeneous random vectors. In this chapter we outline the result of McCabe and Leybourne and present the details for the Poisson AR(1) model. To assess the test we simulate 100 series for which the Poisson model is the correct specification and 100 series for which the Poisson model is a misspecification. We find that the level of the test is approximately correct for finite series of length 200 and that the test has strong power. In Chapter 7 we extend the model to include covariates. It is the first time anyone has considered adding covariates to both the continuation process and the arrival process. Joe (1996) briefly considers the addition of covariates to the mean of the process. Note Joe's parameterization is slightly different than ours. We find the k-step ahead moment generating function. From this we get the k-step ahead conditional mean, variance and distribution. Further we find the marginal distribution. Then we show how to construct individual confidence intervals around the point mass forecasts (k-step ahead conditional distribution). Next we give necessary and sufficient conditions for the model to be identifiable. We further give conditions for the model to be a -mixing. If the covariates are such that the model is identifiable, a-mixing and stationary then the maximum likelihood estimates are asymptotically normal. We conclude the chapter by outlining testing for independence and for specification. Chapter 1. Introduction 9 Joe (1997) applied the Poisson AR(1) model to daily counts of children reporting symptoms associated with air pollution. The advantage of the W C B data over the data used by Joe (1997) is that the WCB data can be interpreted as a queue and hence justifies our use of the Poisson AR(1) model. In Chapter 8 we apply the methods in the previous chapters to modeling and forecasting five monthly time series of counts of disabled workers collecting STWLB. Our analysis includes time-series plots as well as plots of the sample autocorrelation function and partial autocorrelation function. A l l five series appear to be autoregressive of order 1 and one of the series appears to be seasonal. Model selection is based on t-ratios, the information matrix specification test and residual analysis. We give a detailed discussion of the interpretation of the residuals. For the five W C B series we analyze in this chapter we were able to obtain additional data about the number of arrivals each month. The previously analyzed data contain the total number collecting each month, that is, the number continuing plus the number of arrivals. We use these data to assess whether the arrivals are Poisson. In addition we directly estimate the arrival parameters and compare them to the estimates obtained using the Poisson AR(1) model. Next we calculate point mass forecasts for the five series. We note that the k-step ahead distribution quickly approaches the marginal distribution. We conclude the chapter by comparing the results of the Poisson AR(1) model to the results that would come from assuming a Gaussian AR(1) model. Chapter 2 2. Poisson AR(1) model In this chapter, we begin by presenting the Poisson AR(1) model, Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the W C B claim data is discussed. We can think of workers collecting STWLB as waiting in a queue. When they begin receiving benefits they enter the queue and when they stop receiving benefits they exit the queue. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. Next we review some basic properties of the model, which are shown to follow from the moment generating function. We then discuss extensions of the model to a Poisson AR(p) model which has properties similar to the Gaussian AR(p) model. We conclude the chapter with an illustrative example which demonstrates the theory covered in this chapter. 2.1 Model formulation Let X0,X1,---,Xn be a series of dependent Poisson counts generated according to the model where X0 has a Poisson distribution with parameterX/(l-a), written as X0 ~ Pol Xt - aÂ° Xt_x + ÂŁ tÂ» (2.1.1) and {e,} is a series of independent identically distributed (iid) Poisson random 10 Chapter 2. Poisson AR(1) model 11 variables with parameter X , that is s ( ~ Po(X). The thinning operator " Â° " is defined as follows; given Xt_x, aoXt_l=_2^Bu(a), where B â€ž ( a ) , B 2 ( ( a ) , . . . , B ; f ( i ( (a) are iid Bernoulli random variables with P(Bit (a) = 1) = 1 - P(BU (a) = 0) = a . Since a o Xt_x given Xt_x is a sum of iid Bernoulli random variables i f follows that it has a binomial distribution with parameters a andX,_j, written as a o J M | J M ~ Bi(a,Xt_x). The term thinning operator is used since its operation is to randomly thin out or reduce the number in a group. It is further assumed that Btj(a) and e, are independent. For comparison purposes we define the Gaussian AR(1) model as, Xt = aX,_x + X+r\,, | a | < l , (2.1.2) where { n , } ^ is a sequence of independent identically distributed normal random variables with mean zero and variance a 2 . An important distinction between the Poisson AR(1) model and the Gaussian AR(1) model is that in the Poisson model Xt is composed of two random components, the survivorship component a o Xt_x \ Xt_x, and the new entrant (innovation) component s,. In the Gaussian model given Xt_x the first component a l , . , is not random. The Poisson model is also complicated by the fact that these two random components are not observed. That is the distribution of X, given Xt_x is given by the convolution of the two random components. This makes the Poisson model much harder to work with than the Gaussian model. Chapter 2. Poisson AR(1) model 12 This model has been studied by Al-Osh and Alzaid (1987) and McKenzie (1988). Joe (1996) extended this model by developing a general method to define a random operator for cases were the marginal distribution is in the convolution-closed infinitely divisible class. His method is consistent with the AR(1) Poisson model in that it defines the binomial thinning operator when the marginal distribution is Poisson. 2.2 Model interpretation This model can be interpreted as a birth and death process, see Ross (1983, Section 5.3) for an introduction to birth and death processes. Each individual at time, t-l, has probability a of continuing to be alive at time, t, and at each time, t, the number of births follows a Poisson distribution with mean X. â€˘ Alternatively, the model can be interpreted as an infinite server queue, see Ross (1983, Example 2.3 b). The service time is geometric with parameter 1 - a and the arrival process is Poisson with mean X . One fundamental result from queuing theory is that the expected length of the queue is equal to the arrival rate times the expected waiting time, or L = XW, where , L, X and W are respectively the expected queue length, the arrival rate and the expected waiting time, see Little (1961). In this example the mean waiting time, W, is equal to l / ( l - a ) and the expected queue length i s ÂŁ = A , / ( l - a ) . With regards to short-term disability claims, Xt is the number of workers collecting short-term wage loss benefits (STWLB) at time t. It equals the sum of the Chapter 2. Poisson AR(1) model 13 number of workers continuing to collect from time t -1, a o Xt_x, and the number of new claims at time t, e,. The waiting time l / ( l - a) is the mean number of months that a newly disabled worker is expected to collect STWLB. This is referred to as duration at the W C B and is an important input into managerial decision making. Note that it is not directly extractable from available W C B data. 2.3 Basic properties In this section we look at the stationary distribution of the Poisson AR(1) process and basic quantities, such as mean, variance, moment generating function and correlation. For the Poisson APv(l) model the conditional mean and variance of Xt given Xt_x are respectively E[Xt\Xt_x] =aX._.+A, and Var[Xt\X t_(\ = a(\-a)Xt_x +X .The Gaussian AR(1) model has the same conditional mean but the conditional variance, a 2 , is different, McKenzie (1988). In the following proposition the stationary distribution for the Poisson AR(1) model is given and from this we can find the unconditional moments of Xt. McKenzie (1988) sketches the proof of this result, we give a simple proof using moment generating functions. Proposition 2.3.1 When X0 is Poisson with mean X/(l-a) the marginal distribution of Xt is Poisson with mean X,/(l - a ) . Chapter 2. Poisson AR(1) model 14 Proof: We use induction to prove the result. By assumption the result is true for X0. We now assume Xt_x is Poisson with mean A , / ( l - a ) , that is the moment generating ( X \ function of X,_x is Mx (s) = exp (es -1) . The moment generating function of Xt U - a v ') is, MXi(s) = E[exp(sX!j\ = E[E[exv(saoXl_x+set)\Xt_x]] = E^(aes +(1 - a))*'"' exp(X(es - l))j = E[exp(s'X,_x)]exp(X(es -l)) X = e x p ( T â€” ( e s ' - l ) ) e x p ( X i y - l ) ) where es' = ae 5 +1 - a . Making this substitution gives us, Mx (s) = e x p ( - ^ (aes +1 - a -1)) cxp(X(es -1)) ' 1-a = e x p ( - ^ - ( e s - l ) ) . 1-a Thus the marginal distribution of Xt is Poisson with mean X/(\-a) and the result follows. Remark Since the marginal distribution of Xt is Poisson it follows that the unconditional mean Chapter 2. Poisson AR(1) model 15 and variance of Xt are both equal t o X , / ( l - a ) . In addition the first three marginal moments of Xt are respectively , + (^-) 2 and + 3(T^-)2 + (-^f . An important quantity for this model is the autocorrelation function, which is the same as in the standard Gaussian AR(1) model, and is given by pk =ak, k = 1,2,..., see Al-Osh and Alzaid (1987). 2.4 Poisson AR(p) model A natural extension of the Poisson AR(1) model is the following Poisson AR(p) model, Xt = a 1 Â° I ( . 1 + a 2 o I ( . 2 + - + a â€ž o I , . f + ÂŁ ( ! (2.4.1) where {s,}â„˘, is again a sequence of iid Poisson random variables, " Â° " is the binomial thinning operator, given X0,Xi,...,Xl_x a , Â°Xt_15a2 oXt_2,...,ap Â°Xt_p and s, are mutually independent and a y. â‚¬[0,l], j = 1,2,...,p. Note that the marginal distribution is Poisson only for the case p = 1. This extension is studied in detail in Jin-Guan and Yuan (1991). Alzaid and Al-Osh (1990) examine a different Poisson AR(p) model where the thinning operator is defined such that a , Â°Xt,a2 oXt,...,ap Â°Xt given Xt has a multinomial distribution with parameters a , , a 2 , . . . , a p and Xt. A non-linear Poisson AR(p) model is found in Joe (1996). Joe writes his model as Xt = At[Xt_x,Xt_2,...,Xt_p^j + et where the probability function of At{Xt_x,Xt_2,...,Xt_p} is zp+l dimensional sum. Chapter 2. Poisson AR(1) model 16 The model by Jin-Guan and Yuan (1991) has the same autocorrelation function as the Gaussian AR(p) model. The parameters are easily estimated by conditional least squares, see Theorem 2.4.4. A drawback to this model is the lack of a physical interpretation as in the Poisson AR(1) model. Jin-Guan and Yuan give no interpretation of this model, nor can we think of any physical system it could represent. However it is often the case that A R M A models do not have physical interpretation, so this may not be much of a concern. It is possible to give a physical interpretation of the model in Alzaid and Al-Osh (1990) but the interpretation is a bit odd and is as follows. Let Xt be the number of newborn females in period t. Each female has reproductive life span of p periods and is permitted to have at most one offspring. From the cohort Xt of newborn females the distribution of their female offspring in the next p periods is assumed to be multinomial. That is they let a , oXt,a2 oXt,...,ap Â°Xt denote the female offspring respectively in periods ^ + 1, t + 2,...,t+p and assume that a , Â°Xt,a2 Â°Xt,...,ap Â° X r given Xt has a multinomial distribution with parameters al,a2,...,ap and Xt. The innovation 8, represents the number of newborn females that immigrate into the system at time t. The autocorrelation function for this process is the same as the Gaussian ARMA(p,p- l ) process. The higher order AR(p) models of Joe (1996) are extremely difficult to use in practice. The conditional moments for his model are nonlinear and involve the calculation of multi-dimensional sums. In his AR(2) model the calculation of ÂŁ[x,|X f_j,.Ar,_2] Chapter 2. Poisson AR(1) model 17 requires the calculation of four 2-dimensional sums. This is a major draw back since it makes conditional least squares estimation much more difficult. Maximum likelihood estimation is possible but also difficult, since even when p = 2 the conditional probabilities involve 4-dimensional sums. Joe (1997) contains an example where an AR(2) model is estimated. However the model does have a physical interpretation and as a special case can be interpreted as a queue. For the AR(2) model there exists random variables Z],Z2,ZJ,ZU,ZU,Z23 and Z 1 2 3 such that the distribution of Xt+],Xl+2,Xt+2 is the same as the distribution of Zx+Zn+Zu+Zm, Z2+Zu+Z23+Zm, Z 3 + Z 1 3 +Z 2 3 +Z 1 2 3 . This is not a queue since Z 1 3 represents the number present at both time t+\ and t+3 but are not present at time t+2. It is possible to make Z 1 3 degenerate and equal to zero, in which case the model becomes a queue. In the remainder of this section we examine the properties of the Poisson AR(p) model of Jin-Guan and Yuan (1991) and henceforth refer to it simply as the Poisson AR(p) model. The objective of the rest of this section is to show that we can use the sample autocorrelation function and partial autocorrelation function to select the order in a Poisson AR(p) model in exactly the same way as they are used to select the order in a Gaussian AR(p) model. Theorem 2.4.1 (Jin-Guan and Yuan, 1991) Let {s,}â„˘ be count valued random variables with mean p., finite variance a 2 and let a y. e[0,l], j = 1,2,...,p. If the roots of Chapter 2. Poisson AR(1) model 18 G* - o ^ e ' - 1 -a2Qp~2 a ^ G - a p = 0 are inside the unit circle, then there exists a unique stationary count valued time series {Xt} which satisfies (2.4.1) and cov[X s ,s ,] = 0, s < t. The proof of this result is long and is found in Jin-Guan and Yuan (1991). We define the sample covariance and sample correlation respectively as y k =_Z' (x< ~ xlx^ ~ x)md P * = y * h o â€˘ Theorem 2.4.2 The Poisson AR(p) process {Xt} defined by (2.4.1) is ergodic. Theorem 2.4.3 y k and pk are strongly consistent. Jin-Guan and Yuan (1991) show that the Poisson AR(1) process {Xt} ergodic and use this to prove the strong consistency of the sample covariance's and correlations. The following result, also from Jin-Guan and Yuan (1991), implies that the autocorrelation function of the Poisson AR(p) model is the same as for the Gaussian AR(p) model. We provide a simple alternative proof to that found in Jin-Guan and Yuan (1991). Proposition 2.4.1 The Yule-Walker equation, pk =ct,p t_1 +a 2 p t _ 2 Hâ€”kx p t , holds for the Poisson AR(p) model. Chapter 2. Poisson AR(1) model 19 Proof: Multiplying (2.4!) by Xt_k and taking expectations we get, Next we take the expectation of (2.4.1) and multiply by ZifX,^] to get, (2 4 3) +apE[Xl_p]E[Xl_k] + E[Et]E[Xt_k] We note that ÂŁ[s,X__ t] = i s f e j i i f x , ^ ] and that because of stationarity E\_Xt_sXt_k] -E[Xt_s]E[Xt_k] = co\[Xt,Xt+s_k] = ys_k. Taking the difference between (2.4.2) and (2.4.3) we get y k =a.xy k_{ +a 2y k_2-\â€”Kx^y k_p. Finally dividing this by y 0 completes the proof. There is a common misconception that the sample autocorrelations for a series of iid data wil l be zero. In fact for iid data we can expected one autocorrelation out of 20 to be larger in absolute value than 2n~^ and as a result values of | p t | larger than 2n~^ are statistically significant at the 5% level. Parameter estimates for the Poisson AR(p) model can be found using the method of conditional least squares, Klimko and Nelson (1978). In this method the parameter estimates are choosen to minimize the sum of squared distances between Xt and ÂŁ[X_|X___,X__2,...]. Details for the Poisson AR(1) case are found in Section 4.2. Chapter 2. Poisson AR(1) model 20 Theorem 2.4.4 The conditional least squares estimates, du,d2n,...,dpn and Xn, of a! ,a 2 , . . .a andX are strongly consistent andfurther d 2 â€ž - a 2 where E is a finite covariance matrix. Jin-Guan and Yuan (1991) prove this by noting that the Poisson AR(p) satisfies the conditions needed in Klimko and Nelson (1978) for the conditional least squares estimates to be consistent and asymptotically normal. The covariance matrix S is equal to the inverse of the Godambe information, see Section 4.1.1 and an expression for it is given in Jin-Guan and Yuan (1991). Definition 2.4.1 Thepth partial autocorrelation, n , is the last coefficient, ap, when fitting a Poisson AR(p) model, and measures the excess correlation at lag p which is not accounted for in a Poisson AR(p-l) model. The following new result is useful in model selection, since it shows that a Poisson AR(p) process has the same partial autocorrelations as a Gaussian AR(p) process. Corollary 2.4.1 If the series {Xt} follows a Poisson AR(p) process and satisfies the conditions in Theorem 2.4.1 then the partial autocorrelations beyond lag p are zero, that Chapter 2. Poisson AR(1) model 21 is np+k =0,for k>\. Proof: By Theorem 2.4.1 Xt is uniquely defined by (2.4.1) with p lags. Therefore the only way to represent Xt is by the following equation, X, = a, o x,_x + a 2 o X,_2 +â€˘ â€˘ -+a p Â° Xt,p + a p+i Â° Xt_p_x +â€˘ â€˘ -+a p+k Â° Xt_p_k + e, is to set a p + 1 = a p + 2 =â€˘ â€˘ â€˘ = ap+k = 0 and hence 7c^ t = 0 . â€˘ As a result of Theorem 2.4.4 we can use conditional least squares to find strongly consistent estimates of the partial autocorrelation coefficients. Another estimate comes from the Yule-Walker equations. Setting k = p in the Yule-Walker equation and solving for ap gives us ap = pp -a^^ - a 2 P P -2 otp_,p,. This suggests the following estimate for np, Kp = Pp -O-lPp-l -<*2Pâ€ž-2 ^ p - l P l ' ( 2 A 4 ) where d 1 ,c t 2 , . . . ,d , are any 4n consistent estimates ofa 1 , a 2 , . . . , a H when fitting a Poisson AR(p-l) model. If {Xt} is a sequence of iid random variables satisfying certain mild moment restrictions, for example finite variance and fourth moment, then n^n has a standard normal asymptotic distribution, where the sample partial autocorrelation coefficient k is defined as in (2.4.4). As a result, values of \iik\ larger than 2 Â« - ^ are statistically Chapter 2. Poisson APv(l) model significant at the 5% level. 22 2.5 An illustrative example To illustrate the points in this chapter and the following chapters we have selected one of the series form the WCB claims data set, which is analyzed in more detail in Chapter 8. As mentioned in Section 1.1 the W C B data are monthly claims counts of workers collecting STWLB and are grouped into more than 60,000 separate series. One of the categories in which the data is grouped is type of industry, most of which are strongly affected by seasonality. Examples of seasonally affected industries are logging, hotels, restaurants, fishing, and retail. To model the series from these industries it will be necessary to add seasonal regressors to the Poisson AR(1) model defined in Section 2.1. The addition of covariates or regressors to the model will be covered in Chapter 7. The heavy manufacturing industry is one which we feel is less sensitive to seasonality. It is from this industry that we have selected a series to examine in the following example. Example 2.5.1 The data series used in this example and the examples in Chapters 3 through 6 consist of monthly claims counts of workers collecting STWLB from the Richmond claims center between January 1987 and December 1994. A l l the claimants are males, are between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Chapter 2. Poisson AR(1) model 23 A time series plot of this data is given in Figure 2.5.1. From this plot the data appear to be stationary and non-seasonal. This is confirmed by the correlogram, Figure 2.5.2. In a non-stationary time series the autocorrelations do not come down to zero except for large lags. If the series were seasonal then we would expect the absolute value of the sample autocorrelation at lags 6 and 12 to be large. The correlogram for an AR(1) process with 0 < a < 1 should move to zero exponentially as the lag increases. While the autocorrelations in our correlogram approach zero quickly it does not appear exponential. A good discussion on interpreting correlograms is found in Chatfield (1989) section 2.7.2 and 4.1.1. Figure 2.5.3 is a plot of the sample partial autocorrelations, which were calculated using (2.4.4). This data set consists of 12x8 = 96 observations. As noted at the end of Section 2.4 any partial autocorrelation larger than 2/V96 = 0.204 is statistically significant at the 5% level. Consequently the second partial autocorrelation is on the border of being significant. Since the correlogram does not give any strong evidence against an AR(1) plus the fact that we have a physical interpretation which suggest an AR(1) model, it is appropriate to proceed assuming an AR(1) model. Suppose we know the "true" parameter values are a = 0.40 and A, = 5.2 (these are actually the maximum likelihood estimates calculated in Section 4.6). Then the unconditional expected number of claimants collecting STWLB each month is 5.2 / (1 - 0.40) = 8.67. The expected waiting time, or the expected number of months that Chapter 2. Poisson AR( 1) model 24 a new injured claimant can expect to be off work, is 1 / (1 - 0.40) = 1.67. A property of the geometric service time is that it is memoryless. That is, at the start of the current month all claimants collecting this month, both old and new claimants, can expect to collect, in addition to the current month, for an average 0.67 months. Claims Counts (heavy manufacturing industry, males, ages 25-34, burns) 20 _ _ â€” â€” 1/87 1/88 1/89, 1/90 1/91 1/92 1/93 1/94 Month Figure 2.5.1 Times series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Correlogram (heavy manufacturing industry, males, ages25-34, burns) Figure 2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Chapter 2. Poisson AR(1) model 25 Sample Partial Autocorrelation Function 0 5 (heavy manufacturing industry, males, ages 25-34,burns) 0.4 -0.3 . 0.2. 0.1 -0 I "^""""'â€˘â€˘â€”-A.. , . I .-T~ â€” -0.1 - 2 3 4 ! Lag k Figure 2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Chapter 3 3. Forecasting The first three sections of this chapter concern forecasting when the model parameters are known. We begin by considering two criteria for optimal forecasting: minimum mean squared error forecasting and minimum mean absolute error forecasting. The first criterion almost always results in a non-integer forecast and is the same point forecast that the usual Gaussian AR(1) model would produce. The second criterion leads to an integer forecast which is attractive from the point of view of data cohesion. When forecasting a future count from a low count time series there is only a small set of possible values the future outcome is likely to take. In these situations it is practical and desirable to give the individual probabilities for each outcome in the set. We call this point mass forecasting. A fourth type of forecast is the conditional mode, which is found by selecting the outcome (point mass forecast) with the largest probability. We conclude the chapter by constructing individual confidence intervals for the point mass forecasts when using parameter estimates. 3.1 Minimum mean squared error Consider a sample } ( = 0 from the Poisson AR(1) model in Chapter 2. The objective is to find a forecast, XN+k of XN+k that minimizes the expected squared error given the 26 Chapter 3. Forecasting 27 sample. That is to find XN+k which minimizes N+k A N+k XN = E[X2N+k IXN ] - 2 X / V + J t . E f X ^ | XN ] + X 2 JV+t The first order conditions are, 0 This implies that the conditional mean, XN+k = E[XN+k | XN ], is the forecast of XN+k that minimizes the mean squared forecast error. Note that this is a general result when forecasting time series and does not depend on the model. The next result is for the &-step ahead conditional mean, and has never explicitly appeared in the literature except for the case were ^=1. It is the same as the k-step ahead conditional mean for the Gaussian AR(1) model. Proposition 3.1.1 In the Poisson AR(1) model the k-step ahead conditional mean is Proof: We prove the result by induction. Since s, is independent of a o Xt_x the one step ahead conditional mean is given by E[XN+k\XN] = akXN + 1-a X, k = 1,2,3,.... E[XN+1\XN] = E[aoXN+sN+i\XN] Chapter 3. Forecasting 28 Now suppose that the ÂŁ-1 step ahead conditional mean is 1 - a ' E[XN+k_l\XH] = ak-1Xlf+- X 1-a Then the k-step ahead conditional mean is E[XN+IC\XN] = E = E E[XN+ISXN+\\XN\ k-\ a*->XN+1 + -X 1-a 1-a*"1 ak~l\oXN+X]+ X 1-a l - a * â€ž = aKXN + 1-a and the result follows by induction. It is also interesting to know the variation of XN+k given XN around the forecast XN+k, which is given in the following proposition. Proposition 3.1.2 In the Poisson AR(1) model the k-step ahead conditional variance is givenbyVar[XN+k\XN] = ak(\-ak)XN + l-^X, k = 1,2,3,. .. 1-a Proof: We prove the result by induction. The one step ahead conditional variance is, Var[XN+l\XN] = a(l-a)XN+X Now suppose that the k-l step ahead conditional variance is Chapter 3. Forecasting 29 1-a k-i Var[XN+l_x | XN ] = a w (l - a k~l )XN + -jâ€” X Then the k-step ahead conditional variance is Var[XN+k\XN] = E[Var[XNJXN+l]\XN] + Var[E[XNJXN+x]\XN] = E 1-a i -1 a ^ l - a " ) ^ ^ ^ 1-a 1-a : a ' - 1 ( l - a ' - 1 ) [ a X j V + ? . ] + 1 i a a X + a 2 t " 2 [ a ( l - a ) X J V +X] ^ a ' - a ^ + a ^ - a 2 * ] ^ ( a " - a 2 t - 2 ) ( l - a ) + l - a i - 1 +a 2 *- 2 ( l - a ) 1-a = a * ( l - a ) * X i V + ^ ^ A , . 1-a â€˘ For the Gaussian AR(1) model the k-step ahead conditional variance is ( l - a 2 ) - 1 ( l - a 2 t ) a 2 . As k goes to infinity the conditional mean and variance respectively go to the stationary (unconditional) mean and variance of the process. That is l i m ^ E [ X N + k \ X N ] = yx_a and l i m ^ Var[XN+k\XN] = A third result which actually includes the two previous results is the conditional moment generating function of XN+k given X Chapter 3. Forecasting 30 Theorem 3.1.1 For the Poisson AR(1) model the distribution of XN+k given XN is a convolution of a binomial distribution with parameters ak and XN and a Poisson distribution with parameter X1^-. That is, the k-step ahead conditional moment generating function is given by ^w=i^+( i -<**) r e x pf T~(es -1)} <3- li> Proof: We prove the result by induction. The one step ahead conditional moment generating function is given by m * m ^=4expHa Â° X N + & ^ \ X N \ = [aes +(1 - a ) ] ' " exp{x(es -1)} Now suppose that the ÂŁ-1 step ahead conditional moment generating function is " w r . W = [Â«"Â«* + (1 - Â« " ) f - ')} â€˘ Then the k-step ahead conditional moment generating function is Chapter 3. Forecasting 31 - 4 [ a - V + ( 1 - Â« Â» ) J - e x p | x l ^ - ( e ' - l ) = 4 " - | ^ ] e x p { x l ^ - ( e - - l ) } ( e ' - l ) where e' = a * ~ V + ( l - a i _ 1 ) . Substituting this for e' gives MxN^xNis) = (a[a*-V + ( l -a H ) ]+( l -a ) f = exp{x([a *" V + (1 - a )] -1)} expjx (e' -1) j = [ a V + ( l - a * f e x p j ^ ( e - l ) j The above result shows that the distribution of XN+k \ XN is a convolution of a binomial distribution with parameters a* and XN and a Poisson distribution with parameter ^â€˘JnS~- Hence it has mean akXN +X1f^- and variance ak(l-ak}XN + X1=^, which of course agrees with Propositions 3.1.1 and 3.1.2. This extends the results in McKenzie (1988) to cases were k>\. Chapter 3. Forecasting 32 In the usual Gaussian AR(1) model XN+k\XN has a normal distribution with mean a and variance T f V ^ â€˘ So while the conditional mean of XN+k\XN in the Poisson and Gaussian models are the same, their conditional distributions are quite different. Corollary 3.1.1 Let \ik denote the distribution of XN+k\XN and let \i be the distribution of a Poisson random variable with mean . Then, u k => u . That is u k converges weakly to u or XN+k\XN has a Poisson limiting distribution with mean . Proof: From (3.1.1) we have which is the moment generating function of a Poisson distribution with mean â€˘ The result follows since convergence of the moment generating function implies convergence of the probability measure whenever the moment generating function exists in a neighborhood of zero, see for example Billingsley (1986). â€˘ 3.2 Minimum mean absolute error The objective in this section is to find a forecast, XN+kof XN+k that minimizes the Chapter 3. Forecasting 33 expected absolute error given the sample. That is to find XN+k which minimizes Let pk(x\XN) be the conditional probability function of XN+k given XN. We will define the conditional median of XN+k given XN as the smallest non-negative integer mk for which '^l*_0Pk(x\XN) ^ 0.5. A n alternative definition is to let mk be the largest non-negative integer such that ^^_Qpk{x\XN) < 0.5. However i f pk(0\XN) > 0.5 then the median would not be defined. Proposition 3.2.1 In the Poisson AR(1) model the k-step ahead conditional median is the forecast which minimizes the expected absolute forecast error. That is, E^xN+t-xN^\xN] has a global minimum at XN+k = mk. Proof: Suppose that XN+k is between m -1 and m, where m is a non-negative integer. Then ÂŁ | ^ - ^ ^ ] = ^ ( ^ " x K W " 2 Z . ( ^ - x ) A W - T h e s l o P e o f m i s expectation as a function of XN+k is _ 0 - Z^x-mPk(.x) â€˘ ^ m<mk the slope is negative and i f m > mk +1 the slope is positive. The minimum therefore occurs at XN+k = mk. â€˘ It is interesting to note that we did not have to restrict our search to values in the non-Chapter 3. Forecasting negative integers in order to get an integer solution. 34 3.3 Point mass forecasts In this section we look at point mass forecasts, or the k-step ahead conditional distribution. In cases were the counts are low, the median forecast may not be very informative. For example, consider the following 2 cases of a discrete random variable X: case 1, P(X = 0) = 1-P(X = 1) =.50 and case 2 P(X = 0) = 1-P(X = 5) =.90. In both cases the median of X is 0 and the mean is 0.5, but in case 2, there is almost twice the probability of observing a zero. Since there are only 2 outcomes in this example it would be more informative to give the probability distribution for the outcomes. By Theorem 3.1.1 the distribution of XN+k given XN is a convolution of a binomial distribution with parameters ak and XN and a Poisson distribution with parameter X ^ f^-. The probability mass function of XN+k given XN is, *-Â° \ s ) \x-s)\ [ 1-a 1-a J In the following example we illustrate the conditional mean, median, mode and point mass forecasts. The conditional mode is easy to find since it is the point, x, at which the probability mass, pâ€ž(x\xâ€ž), is largest, where x is a non-negative integer. Example 3.3.1 This is a continuation of Example 2.5.1. Our series consists of 8x12 = 96 observations and the last observation is Xg6 = 11. Again we assume that the "true" Chapter 3. Forecasting 35 parameter values are a =0.40 and A. = 5.2. Figure 3.3.1 is a time series plot of the observations and predicted values (one step ahead conditional mean forecasts). It indicates that one step ahead conditional mean forecasts are reasonably good. Conditional forecasts for the first 6 months of 1995 given a count of 11 in December 1994 are displayed in Table 3.3.1. The table includes the k-step ahead conditional mean and median forecasts as well as the k-step ahead conditional distribution (point mass forecasts). The last column contains the limiting distribution or unconditional distribution. 20 Monthly Claims Counts (heavy manufacturing industry, males, ages 25-34, burns) Month . Observed - Fredicted Figure 3.3.1 Time series plot of monthly claims count collecting STWLB from January 1987 to December 1994: All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury. Chapter 3. Forecasting 36 k---l k=2 k=3 k=4 k=5 k=6 k--=oo mean 9.60 9.04 8.82 8.73 8.69 8.68 8.67 median 9 9 9 9 9 9 8 mode 9 9 8 8 8 8 8 V(5|H) 0.025 0.038 0.051 0.058 0.061 0.066 0.065 0.068 0.066 0.069 0.067 0.070 0.067 0.070 ^ ( 6 | l l ) 0.068 0.089 0.097 0.099 0.101 0.101 0.101 A W " ) 0.101 0.117 0.122 0.124 0.125 0.125 0.126 ^(8|11) 0.129 0.133 0.135 0.135 0.136 0.136 0.136 A ( 9 | H ) 0.142 0.134 0.132 0.131 0.131 0.131 0.131 A ( I O | H ) 0.138 0.121 0.116 0.115 0.114 0.114 0.113 A ( I I | H ) 0.118 0.099 0.093 0.091 0.090 0.090 0.089 A ( I 2 | H ) 0.091 0.074 0.068 0.066 0.065 0.065 0.065 A(13|11) 0.063 0.051 0.046 0.044 0.044 0.043 0.043 A ( l 4 | l l ) 0.040 0.032 0.029 0.028 0.027 0.027 0.027 0.046 0.040 0.035 0.033 0.032 0.032 0.032 Table 3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts. Remarks 1. The conditional mean forecasts quickly approach the unconditional mean of 8.67. 2. The conditional median forecast is 9 and remains unchanged except in the limit where fjPa}(x\U) = 0.50. 3. The conditional mode is easy to find from the table of point mass forecasts (conditional distribution). For example, the largest 1-step ahead point mass, j?,(x|ll), Chapter 3. Forecasting 37 is 0.142 and occurs whenjc=9. 4. The 6 step ahead conditional distribution is very close to the limiting distribution. 3.4 Prediction intervals The point mass forecast in Section 3.3 were calculated assuming the "true" parameter i values were known. Since in practice we have to estimate the parameters we would like to include this uncertainty by constructing point mass prediction intervals. The following result, which is found in Serfling (1980), will be used to develop the prediction intervals. Theorem 3.4.1 Suppose that the sequence, {^}"=1, of k-dimensional random vectors is asymptotically normal with mean \i and covariance matrix n~l2Z = w - 1 [a , . , . ] . Let g(y) be a real-valued function having a non zero derivative at y = u.. Then g(Yn) is asymptotically normal with mean gir1) and covariance matrix Now suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by dâ€ž and Xn. Under mild regularity conditions, see Section / is the Fisher information matrix and aa and X0 are the "true" parameter values. As a consequence of the above result for fixed x, pk(x\Xn;an,Xâ€ž) has an dn,X\ is asymptotically normal with mean (a 0 , X 0 ) and variance nxi 1 , where Chapter 3. Forecasting asymptotically normal distribution with mean pk(x\Xn;aQ,X0) and variance 38 o2k(x;a0,X0) = n~ f \ da a=a0,A.=>-o J V \ + 2 r i d P k d P k da dX + i a=a 0 ,X=X 0 dX >2\ ,(3.4. where ial and / x ' are the diagonal element of / 1 and i_[ is the off diagonal element. The partial derivatives of the point mass probability pk(x\Xn;a,X) can either be found directly or with the help of the expressions in Section 4.5 and are given by â€”pk(x\Xn) = (pk (x-\\Xn-\)-pk (x\ Xn j)ka k~x da 1 -a . \-kak-l-(k-\)ak ( 3 A 2 ) + {Pk(x-\\Xn)-Pk(x\Xn))X- K } (1-aY and ~ Pk <A xn) = {pk (x-W- P k (x\ xn)) dX 1 -a A n approximate 95% confidence interval for pk (x\ Xn ;a 0, X0) is pk(x\Xn;an,Xn)Â±2v k(x;an,Xâ€žy We now continue Example 3.3.1. (3.4.3) (3.4.4) Example 3.4.1 The maximum likelihood estimates for our data set are d = 0.40 and X, = 5.2. The inverse of the expected Fisher information matrix at these parameter estimates is given by Chapter 3. Forecasting 39 0.62 -5.17 1 ~ [-5.17 50.05 The details of these calculations are found in Example 4.4.1. In section 4.4 we show that the Poisson AR(1) model satisfies the regularity conditions for the maximum likelihood estimates to be asymptotically normal. Consequently we can use (3.4.4) to construct individual approximate 95% prediction intervals for the k-step ahead point mass forecasts. Table 3.4.1 contains the 95% prediction intervals for the first 6 months of 1995. Remarks 1) The width of the prediction intervals increases with the number of steps ahead. However after about 6 steps ahead the width changes very little as it is very close to its maximum. 2) The prediction intervals for the 6 step ahead conditional distribution is almost the same as the prediction intervals for the unconditional (limiting) distribution. 3) These are individual confidence intervals for the probabilities in the k-step ahead conditional distribution NOT the forecasts. Chapter 3. Forecasting 40 k- 1 k=2 k= 3 k=4 k -5 k=6 k - / A ( O | I I ) 0 000 OOOf 0.000 0.000 0 000 0 000 0.000 0.000 0 000 0 000 0.000 0.000 0 000 0 000 0 000 0.001 0.000 0.002 0 000 0 002 0.000 0.003 0.000 0 001 0.000 0.003 0 000 0 005 A(2|ll) 0 ooo o ooi 0.002 0.007 0 002 0 010 0.002 0.010 0 002 0 011 0.002 0.011 0 002 0 011 0 000 0 012 0.006 0.021 0 007 0 02"' 0.007 0.029 0 008 0 029 0.008 0.030. 0 00S 0 010 0 005 0 029 0.017 0.046 0 019 0 055 0.020 0.058 0 020 0 059 0.021 0.060 0 021 0 060 ft(5|ll) 0 018 0 058 0.037 0.080 0 040 0 091 0.042 0.095 0 041 0 096 0.043 0.097 0 043 0 097 A(6lll) 0 041 0 091 0.064 0.115 0 06S 0 125 0.071 0.128 0072 0 129 0.073 0.12l> 0 073 0 130 A(7IÂ») 0 0~8 0 125 0.093 0.140 0 099 0 14> 0.102 0.146 O M P (i 147 0.103 0.147 0 104 0 147 0 111 0 145 0.118 0.I4S 0 123 0 146 0.125 0.14o 0 126 0 145 0.126 0.145 0 126 0 145 A(9|ll) (lir, 0 152 0.130 0.138 OPH) 0 134 0.128 0.135 0 127 0 135 0.127 0.135 0 126 0 136 A ( I O | H ) 0 12^ 0 151 0.111 0.131, 0 102 0 110 0.099 0.130 0 098 0 129 0.098 0.129 0 098 0 129 A ( I I | H ) 0 098 0 1 V 0.081 0.117 ()0"2 0 114 0.069 0.112 0 06X 0 112 0.068 0.111 0 068 0 111 A(12|11) 0 0~0 0 112 0.053 0.095 0 046 0 091 0.044 0.088 0.043 0.088 0.042 0.087 0.042 0 087 A( '3 |II) 0 04^ 0 0S2 0.031 (MPl 0 026 0 066 0.025 0.064 0 024 0 063 0.024 0.063 0 023 0 063 A(14|11) 0 026 0 054 0.016 0.049 0 015 0 045 0.013 0.O41 0 012 0 042 0.012 0.042 0 012 0 042 Table 3.4.1 95% confidence intervals for the k-step ahead conditional distribution. 3.5 Duration In this section we examine duration, which is the number of months that the W C B continues to pay a claim. Under the Poisson AR(1) model assumption the duration time is geometric with parameter 1 - a , see Section 2.2, and the mean duration is (l - a)" 1 . Suppose we have a sample of size n and denote the maximum likelihood Chapter 3. Forecasting 1 41 estimators by dn and Xn. We assume that dn is asymptotically normal with mean a 0 and variance a 2 ( a 0 ,A , 0 ) , where a 2 (a 0 ,A, 0 ) is that portion of the inverse of the Fisher information matrix which pertains to a , and a 0 and k0 are the "true" parameter values. As a result of Theorem 3.4.1 the estimated duration ( l - d j 1 is asymptotically normal with mean (l - a 0 ) 1 and variance (l -a 0)~ 4CT 2(a 0,X 0)n~ l. Hence an approximate 95% confidence interval for the mean duration is ( l - d n ) 1 Â± 1 . 9 6 ( l - d â€ž ) 2a(dn,Xnjn~^2. Example 3.4.1. For our illustrative example the estimated duration is (1-0.40) 1 = 1.667 months and an approximate 95% confidence interval for the mean duration is 1.667 Â± 1.96(1 - 0.40)"2 ^ 0.62/96 or (1.229,2.104). Chapter 4 4. Estimation Section 4 ! outlines the asymptotic theory of estimating functions and is illustrated by reviewing estimation and inference for the Gaussian AR(1) model. In Sections 4.2 and 4.3 we show that the methods of conditional least squares (CLS) and generalized least squares (GLS) satisfy the regularity conditions for asymptotic normality of Section 4.1. Further we derive an analytic expression for the Godambe information matrix in the CLS case and an approximation to the Godambe information matrix in the GLS case. Then in Section 4.4 we show that the score function and observed Fisher information matrix can be neatly represented as conditional expectations. Further in Section 4.5 we show that we can generalize our expressions in Section 4.4 to general AR(1) models. In Section 4.6 we show that the method of maximum likelihood (ML) estimation satisfies the regularity conditions for asymptotic normality. Finally in Section 4.7 we compare the asymptotic efficiency and robustness of the CLS and M L . 4.1 Likelihood theory and estimating functions The goal of this section is to present a unified approach to estimation through the use of estimating functions. We provide the estimating function approach since it includes many standard estimating techniques, such as maximum likelihood, generalized method of moments, conditional least squares and generalized least squares. 42 Chapter 4. Estimation 43 A detailed review of likelihood theory and an introduction to estimating functions is found in Barndorff-Nielsen and Sprensen (1994). Godambe and Heyde (1987) give more details on the asymptotics and optimality of estimating functions. Both of these papers consider estimation and inference for multivariate continuous time stochastic processes. However in this review we will restrict ourselves to univariate discrete time stochastic processes. Many applications of estimating functions can be found in Godambe (1991). Let {X,}" be a discrete time stochastic process for which we consider parametric statistical models of the form (Q",3",P e";0 e Â© ) , where Q" is the sample space, 3" is a sigma field, Pe" is a probability measure defined on the measurable space (Q",3") and 0 is an open subset of W. We assume that all of the probability measures in the family P" = [PQ',Q eÂ©} are dominated by a common rj -finite measure p. and we let dP" d\i denote a version of the Radon-Nikodym derivative. Definition 4.1.1 We call a p-dimensional function ÂĄ,,(0) = xÂĄn(Xl,X2,...,Xn;Q) a regular estimating function if it satisfies the following conditions for all 0=(e 1,e 2,...,e p)â‚¬0. Chapter 4. Estimation 44 1) ^ ( 9 ) is a zero mean square integrable martingale with respect to Pe" and the sigma field Zn=v{Xx,X2,...,Xn), 2) the covariance matrix, ÂŁ , e [ x P n r (0)T n (0)] , is positive definite, 3) xÂĄn is almost surely differentiable with respect to the components of 0 , 4) ^n is nonsingular, where denote the matrix of partial derivatives of xÂĄn with respect to the components of 0 . The parameter 0 is estimated by solving the system of equations Wn (0) = 0 and we let 0 denote the estimate. Estimating functions or estimating equations arise from many standard estimation techniques. The first order conditions for minimizing conditional least squares is an example of an estimating equation. The score function is another common estimating function. In the case were ^â€ž(0) is the score function we will write Un instead of x ÂĄ n . Finally note that xVn is also called an inference function, since both estimation and inference are based on it. Definition 4.1.2 A sequence {3,}^ of sigma fields is said to be adapted to a sequence { X , } ^ of random variables if Xt is 3, measurable. Further, is called an adapted sequence. Definition 4.1.3 Let {3,}"=1 be an increasing sequence of sigma fields. An adapted sequence { X , , ^ , } ^ on a probability space (Q,3 ,f) is called a martingale if for all t Chapter 4. Estimation 45 -filial] exists and is finite and JE'[X ( |3,_ 1] = X T _ V Definition 4.1.4 A stochastic process { X , } ^ is called a martingale difference sequence if Xt is measurable with respect to 3, and E^Xt\Xst_^ = 0. Remarks 1) Unless otherwise stated we will assume our martingales to be vector valued and denote the transpose of Xt by Xf. 2) If {Xt}â„˘ is a martingale difference sequence then the sequence is uncorrelated. That is for all s and t such that s < t ,E[XSX,] = E[XSE[X, | 3 , ] = E[Xs0] = 0 3) If the sequence { X , } ^ is a martingale then a martingale difference sequence {Y,}â„˘=1 can be formed by defining Yt = Xt - Xt_x. Further the variance of the martingale is In the following well known example we illustrate definitions 4.1.1 to 4.1.4. Example 4.1.1 Let u. be Lebesque measure, Q" + 1 = 5R"+1, 3" + 1 be the Borel sigma field of 9T + 1 and 0 = (0,1) x 9? + , where SR+ is the set of positive real numbers. The family of probability measures Pn+1 is defined by the following version of the Radon-Nikodym derivative, Chapter 4. Estimation 46 " 1 p(X;a,X) = ]J-j=e 4(x,-ocAr,.,-x)2 y 2K j where ( a , X ) e 0 and X = ( X 0 , X , , . . . , X â€ž ) r e9T + 1 . This is the Gaussian AR(1) model defined in (2.1.2), with a e(0,l), a 2 = 1 and X Xn = hs n where s n is normal with mean zero and variance l / f l - a 2 ) . Note that 1-a ' defining X0 in this way makes the unconditional distribution of Xt normal with mean A. / ( l - a ) and variance l / ( l - a 2 ) . Note that p(X;a,X) is the likelihood, and the log-likelihood is proportional to 1 ^ l(a,X,o2;X) = --YJ(Xt-X-aXt_l) + l o g ( l - a 2 ) -2 (=i X 1-a The relative contribution of the last two terms of this expression to the log-likelihood is small for large samples. Since our primary objective is to obtain asymptotic properties we wil l consider instead the following approximation to the log-likelihood, 1 " l(a,X,o2) = -^_(Xt-X-aXtJ The score function associated with this log-likelihood is Uâ€ž{a,X) = ^ X ^ - X - a x S t=\ _{Xt-X-aX,_x) (4.1.1) Chapter 4. Estimation It is useful to express Un as 47 Un{a,X) = f n \ t = \ n V /=i J Next we show that Un is a regular estimating function. It is well known that score functions are martingales. It is easy to verify that Un is a martingale with respect to 3â€ž = cy(X0,Xx,X2,...,Xn) as follows E[Un\Xn_x] = E fn-\ \ n-l V /=i J The martingale differences of the score are ut = Ut -Ut_x = (Xt_xst,st)T, which have the following variance, (=1 n 3 - . ^ t=l ) Chapter 4. Estimation 48 .EJu^f] = E X 2 _ ,s 2 X,_,e 2 'E[XI^-. E[XT_X]E E[X^]E[Z)] 4 ? ] . 1 + f X A. l - a z U - a V 1-a X 1-a This matrix is positive-definite since for any 2-dimensional vector / = (/,, l2 Y, lTE[utuJ]l = l\ 1 ( X ^ 1 - a ' U - a 1 (, X X + 21J2 + l2 1-a 1 1 - a 2 + / 2 1-a is positive except i f lx=l2=0. The variance of UN is simply n time the variance of ut and is positive-definite. The partial derivatives of UN exist and are given by the following matrix: - z (=1 X , t-\ (4.1.2) and has the following determinant: det(UN) = -n^"_^{XT - X ) , which is strictly negative except for the case were X , = X 2 -â€˘â€˘â€˘XN. Therefore UN is a regular estimating function. Chapter 4. Estimation 49 The solution of Un(a,)Cj - 0 gives the following parameter estimates, Y(xt.x-x)(xt-x) * n - ^ r - â€” â€” (4.1-3) (=1 K=-l_Xt-dn-_Xt_x. (4.1.4) Definition 4.1.5 Let {/w,}", 6e a sequence of martingale differences, which generates the martingale Mn=_^^mt. (M)n =_~j"^oE^mlmf\^t_l'^ is called the quadratic characteristic of Mn and [M]n - __"t_xmimJ is called the quadratic variation of Mn. Remark If Mn is a martingale with respect to 3â€ž then [M]n â€”(M) is also a martingale with respect to the same sigma field 3 n . Definition 4.1.6 A sequence {Xt}â„˘_x is said to be stationary if for every finite k the joint distribution of the vector (Xt, Xt+XXl+k) is independent of t. Definition 4.1.7 A sequence {X,}â„˘ is said to be a -mixing if there exist a sequence of positive numbers {a(*)} , convergent to zero, such that Chapter 4. Estimation 50 \P(A nB)- P(A)P{B)\ < a(n) for any set A e a ( X j , X 2 , . . . , X t ) , any set B &<j(Xk+n,Xk+n+l,...) and k>l, n>\. Definition 4.1.8 Let {Xt}â„˘=l be a stationary stochastic process define on the probability n space (Q,3,P). If for every two sets A and B e 3 , l i m n _ 1 V P ( X , â‚¬.Ar\Xt eB) = P{Xl eA)P{X1 eB) then the sequence {X,}â„˘ is called ergodic. Theorem 4.1.1 Let g be a measurable function onto 9?* and define Yt = g(Xt,Xl+l,...,Xl+s).Then 1) if {X,}â„˘=1 is stationary then {Y,}â„˘=] is stationary, 2) if {Xt}â„˘ is a-mixing a(n) = 0(n~r) for some r>0 then {Y,}â„˘=1 is a-mixing a(n) = 0(n~r), 3) if{Xt}â„˘={ is ergodic then {^}", is ergodic. For a proof of parts 1 and 3 see Stout (1974, pp. 170, 182) and for a proof of part 2 see White and Domowitz (1984, Lemma 2.1). Remarks 1) Parts 1 and 3 hold even when 5 is infinite. Chapter 4. Estimation 51 2) If {X,}â„˘ is stationary and a -mixing then it is ergodic. The converse is not always true. Theorem 4.1.2 (Ergodic theorem) If { X , } ^ is stationary ergodic with ÂŁ[|X,|] < oo then " as n-*_X,^E[X,]. (=i For a proof of the Ergodic theorem see Stout (1974, p. 181). The following regularity conditions are needed to establish a central limit theorem: Let {/H,}^, be a sequence of martingale differences, which generate the martingale Mn = __" t m , â€˘ C 1 ) U m - MM~] -Â°' C2) {M)JVar[Mn}^r\, CT)[M}JVar[Mn}\x\. Theorem 4.1.3 If Mn is a one dimensional zero mean square integrable martingale satisfying conditions CI and C2 then (M)'/2Mn-%N(0,l) (mixing) (4.1.5) and if condition C2 is replaced by condition C2' then Chapter 4. Estimation [M]~/2 Mn - i JV(0,1) (mixing). A proof of this result can be found in Hall and Heyde (1980, ch. 3). Remarks 1) Condition CI can be replaced by weaker conditions, see Hall and Heyde (1980, ch. 3). 2) Loosely speaking the regularity conditions insure that Mn is not dominated by a few of the mt 's and that the variance is standardized to unity. 3) If the martingale difference sequence is stationary and var[/w,]<oo then condition CI is satisfied. 4) If the martingale difference sequence {mt }â„˘=l is stationary ergodic and E^mf j < oo then conditions C2 and C2 ' are satisfied with r | 2 = 1 almost surely, and the result is called asymptotic normality. 5) If n 2 is not degenerate the result is called mixed asymptotic normality and the model is called non-ergodic. 6) For the multivariate case, one simply checks that for all non-zero ^-dimensional vectors / the conditions for the univariate martingale lTMn hold. In this case the Cramer-Wold (1936) device says that (4.1.4) and (4.1.5) hold for the vector Mn. Example 4.1.2 In this example we show the estimating function Un in Example 4.1.1 52 (4.1.6) Chapter 4. Estimation 53 satisfies condition C I . For any 2 dimensional vector l = {lx,l2)T w e n a v e P<zr[/rw,j = lTE\utuJ]l, which we showed in Example 4.1.1 to be equal to l\ â€” r + K â€” â€” ^ L J 1-a ^ 1-a The variance of lTUn is simply n times this. Substituting into condition CI we get, 2 1 2 .2 1 (j X â„˘â„˘,4i2,..,n}MlTu'] r 1 1 - a 2 T 1 1-a lnri vâ€ž â€” r 4 â€” ^ - - lim [ 1-a V 1-a ; J = 0 We defer verifying conditions C2 and C2 ' to Example 4.1.5. Theorem 4.1.4 (Continuous Mapping Theorem, CMT) Let Zn and Z be random vectors from some sample space Cl to k-dimensional Euclidean space 9?*, Further let g() be a measurable function from 9?* to 9?m. We will allow g(-) to be discontinuous at a set of d zero measure points Dg. IfZn-+Z and if P(Z sDg) = 0 then the Continuous Mapping d Theorem says that g(Zn) â€”Â» g(Z). See McCabe and Tremayne (1993, Section 4.4) for a sketch of the proof and for many illustrations of its use. Remark d d The following three cases of the C M T are widely used. If Xn -Â» X and Yn -> c then, Chapter 4. Estimation 54 d XJâ€ž->Xc n n and Xn d X â€”=â€”Â» â€”, c*Q. Y. c We require the following additional regularity conditions for the next theorem. C3) Y ; 1 (0 0)Yâ€ž (0) - 4 0, for any 9 0 in a suitable neighborhood of G . C4) I f 0 n A e thcn%\Q)Vn{Qn)^Ip. Theorem 4.1.5 Let xVn be a regular inference function and let 0â€ž be a solution of ^ ( 0 ) = 0. If the regularity conditions CI, C2, C3 and C4 are satisfied then, ( T ( 0 ) ) ; ^ â€ž ( 0 ) ( 0 â€ž - 0 ) ^ i v [ o , / J . The proof of this result is outlined in Godambe and Heyde (1987). Remarks 1) In condition C3 a suitable neighborhood of 0 means one such that 0â€ž is in the neighborhood for all n greater than some number N. Chapter 4. Estimation 55 2) The result of Theorem 4.1.5 also hold when condition C2 is replaced by C2 ' and (ÂĄ (9 ) ) B is replaced by [^(9)]^ The following regularity conditions, for the sequence of martingale differences {vj/1}, lead to a simplification of Theorem 4.1.5. C5 For all s and t, JE'[v|/Jvj/Jj = ÂŁ'[v(/(v(/f]<oo and 2?[\|> J = 2?[\|) J , where v|), is the matrix of partial derivative of v|/, with respect to the components of 9 . C6 Either n\V]n^E[y ty]} or n\V)H A ^ ^ f ] , C7 n ' Y ^ A ^ , ] . Remarks 1) Condition C5 implies condition CI . 2) Conditions C5 and C6 imply either conditions C2 or C2' . 3) If {y,} is stationary then conditon C5 holds. 4) If {v|/1} is stationary ergodic, E |\|/ t\\i ] |j < oo and ii[|v|> ,|] < oo then conditons C6 and C7 hold. Definition 4.1.4 When condition C5 holds for the regular estimating function x ÂĄ n we define the variability matrix as V(Q) = Var[y, (9)] = E[y, (9 )y ] (9)] Chapter 4. Estimation 56 Remark A desired property of an estimating function is to give similar estimates in repeated samples. That is, we would like the function x ÂĄ n to not vary much from sample to sample or the variance of xÂĄ n to be as small as possible. We therefore desire V to be as small as possible. Definition 4.1.4 When condition C5 holds for the regular estimating function xÂĄ n we define the sensitivity matrix as S(&) = E^f R(0)]. Remark Another desired property of an estimating function is that it should be easy to distinguish between small changes in the parameters. The steeper the estimating function is around the true parameter value the easier it is to distinguish and identify the parameter. It is therefore desirable for S(&) to be as "large" as possible. Definition 4.1.4 When condition C5 holds for the regular estimating function xÂĄ n we define the Godambe information matrix as j = S T (0 )V~l (0 )S(Q ). Definition 4.1.9 When estimation is based on the score and it satisfies the conditions for a regular estimating function as well as condition C5, then we define the Fisher information matrix as / = V = -S. Chapter 4. Estimation 57 Remarks 1) The Godambe information can be interpreted as a measure of the amount of information that can be obtained about the parameter from the estimating function. 2) A l l the estimating functions considered in this chapter satisfy conditions C5, C6 and 3) A l l of the estimating functions considered in this thesis are score or quasi-score functions, that is Ln, where Ln(x0,x1,...,xn:Q) is either a likelihood or a pseudo likelihood. Since in this case x ÂĄ n will be a matrix of second partial derivatives of Ln with respect to 9 , S(9) will be a symmetric matrix. By pseudo likelihood we mean when the . true likelihood is replaced by a simpler likelihood, which retains some of the properties of the true likelihood. For example the Poisson AR(1) likelihood could be replaced by the pseudo Gaussian AR(1) likelihood, which is much simpler. Recall that the Poisson AR(1) model and the Gaussian AR(1) model have the same conditional mean, marginal mean, autocorrelation function and partial autocorrelation function. Corollary 4.1.1 Let xVn be a regular inference function and let Qn be a solution of xÂĄn(9) = 0. If the regularity conditions C3, C4, C5, C6 and C7 are satisfied then, CI. d Proof: From Theorem 4.1.5 we have Chapter 4. Estimation 58 W e ) ) : K t ( e ) ( e â€ž - e ) ^ z , where Z a p-dimensional standard normal random variable. From the C M T we have The left hand side is equivalent to nyi(dn - 0 j , while random variable on the right hand side has a multivariate normal distribution with variance covariance matrix S-T(Q)V(d)S-\Q) = r\ â€˘ Example 4.1.4 We show that condition C5 holds for the Gaussian AR(1) model in Example 4.1.1. In Example 4.1.1 we showed that V = E[utuj] 1 J X 1 - a ' 1 -a 1-a J 1 -a 1 (4.1.7) which is finite and due to stationarity is independent of t. The expected value of ut is given by, S = E[ut] -E X , 1 _ Chapter 4. Estimation 59 1 - a ' + ( x V V l - a j 1 -a X 1-a 1 which is also finite and independent of t. Hence condition C5 is satisfied. We wil l defer showing that conditions C6 and C7 hold to Example 4.1.5. Ideally we would like the variation in 0 â€ž to be as small as possible. That is we want j~x to be small or the Godambe information to be large. The following regularity condition allows comparison between the Godambe information and the Fisher information. We denote the score function as Un(x;Q) = pn(x;Q)~x j$ pn(x;Q), where -kpW) = )Â»air ^ <>'G )> â€˘ â€˘ â€˘' A P(x>etf â€˘ C8 The score function is a regular inference function and the order of integration and differentiation may be interchanged as follows, d r â€ž . / ^ / â€ž N , / N r d dQ jTâ€ž(x;0K(x;0)^(x)=Jâ€”[^â€ž(x;0K(x;0)]^(x). Theorem 4.1.6 For a regular estimating function which satisfies conditions C5 through C8 the Godambe information is always less than or equal to the Fisher information. More specifically det(/- j)-0. Next we state a strong law of large numbers for martingales and prove some special cases which will be used later in the thesis. Chapter 4. Estimation 60 Theorem 4.1.7 (Martingale strong law) Let {Yt,3t} be a martingale difference sequence, where 3, = <j(Yx,Y2,...,Yt), the a -field generated by Yx,Y2,...,Yt. If there exist an r > 1 such that Zâ„˘ ] -EI^I ]/tl+r < o Â° , then Z " . , % ^ n~* Â® a ^ m o s t surely as n â€”Â» oo. For a proof of this strong law see Chow (1960) and Stout (1974). Remark r T y n as If there exists an M such that ÂŁ[Y, 2 1<M for all t then 2Lt=Ytln^r0, since z : , ^ v ^ z > A 2 < Â» -Next in Theorems 4.1.8 and 4.1.9 we prove a strong law of large numbers which only requires restrictions on moments. In such cases, Theorems 4.1.8 and 4.1.9 can be used instead of the Ergodic theorem which requires showing the process is ergodic. Suppose a stochastic process {X,}â„˘ has the following moment retrictions: = oo, (4.1.8) for all positive integers s and t and k = 1,2, â€˘â€˘â€˘,p. Further suppose the conditional expectation of X] given 3 M = G(Xx,X2,...,Xt_x) is a polynomial in Xt_x of degree k. That is, the conditional expectation can be written as ÂŁ[JT*|3 /. 1] = Â« M J r * 1 +aâ€žJT^1+-+fl^_t-y f_1 (4.1.9) Chapter 4. Estimation where a.n ^ 0. 61 Theorem 4.1.8 Let {X^ be a stochastic process which satisfies the moment restrictions in (4.1.8) and (4.1.9). Then the strong law of large numbers holds for Xkt , that is - __t_x Xk â€”> E[Xk ] almost surely as n â€”> oo and k = 1,2,â€˘â€˘â€˘,/Â». Proof: First note conditions (4.1.8) and (4.1.9) together imply E[Xk]= â€”-â€” l~ako {aHE[Xf-1] +ak2E[Xk-2]+...+ak^E[Xt] + akk}. We will proceed to prove the result by induction of k. Consider the case k = 1. Let Yt = Xt - f l 1 0 I M -an and note that {Yt,3t} forms a martingale difference sequence. Further Yt satisfies condition (4.1.8) and hence satisfies the conditions for Theorem 4.1.7. Therefore as n -Â» oo, 1 " as -_{X,-ai0Xt^-au}^0 or ^ J x ( + ^ { X â€ž - X 0 } + a u 4 0 n t=\ n __â€ž as which implies that \ ÂŁ M ^ - = ]. We now assume that for / = 1,2,...,k, where k<p, X't->E[X't]. Let Chapter 4. Estimation 62 Yt=X*+x - f l t + 1 / M - f l w / H ^k+xkXt_x-aMMV Then {7,,3,} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 the following sums converge almost surely as n â€”Â» oo, 1 ", 0 8 _ Z ~ a*+i,o^r/-"i _ flt+i,i^-i ~~ ak+\,kXt-\ _ ak+\,k+i ^ 0 n t=\ 1 ak+\,0 The proof then follows by induction on k. Theorem 4.1.9 Under the assumptions of Theorem 4.1.8 1 " as " i=j almost surely as n â€”> co. Proof: Note that repeated application of the conditional expectation assumption implies that E[Xt\3l_j] = bjXt_j+Cj, for some bj and c., and this implies that E[Xt_jXt] = bJE[Xf] + cJE[Xt].Let Y, = Xt_}(Xt-b}Xt_}-Cj). {Tâ€ž3,_,.+1} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 we have the following, Chapter 4. Estimation 63 1 ^ <" - ÂŁ j r (JT, -& JT - c )->o i n i n i n ^ n t=j n t=J n t=J 1 " as - _ Z X t - j X t -tbjEiX^ + CjEiX,]-E[Xt_jX,] â€˘ Remarks ^_ as 1) Under the same conditions the more general result i . Xlt -> E\Xkt_}X\ ] can be proven using basically the same proof, only it is more cumbersome to write down. 2) We believe that these are new results. 3) The conditions for Theorem 4.1.7 are satisfied by the Poisson AR(1) model. Example 4.1.5 We continue with Example 4.1.4. Since the unconditional distribution of Xt is normal with mean X / ( l - a ) and variance 1/(1-a 2) we have that for all s and t E[Xf]= E[X*]<co, k = 1,2,-â€˘â€˘. Also the conditional expectation of Xt given Xt_x is linear in Xt_x. Hence we can apply Theorems 4.1.6 and 4.17 to get, as and Chapter 4. Estimation 64 Theorems 4.1.6 and 4.17 can also be applied to n xUn(Q0) which gives us, as - _ where the equality to S results because ut is independent of the parameters. Applying the C M T we establish condition C3 as follows U;xUn={n-xUnYnxUn^S-xO = 0. Also since Un(Q) = Un(Q0) for all G and 0 O in the parameter space condition C4 is trivially satisfied. Since conditions C3 through C7 are satisfied Corollary 4.1.1 gives us the following asymptotic result, v A Â« ~ K ) where / = Fand appears in (4.1.7). Chapter 4. Estimation 65 4.2 Conditional least squares for the Poisson AR(1) model We begin by defining the family of statistical models. Let p be the counting measure, â‚¬1" =XQ> where K 0 is the set of non-negative integers, 3" be the set of all possible subsets of Q" , and 0 = (0,1) x 9 i + . We define the family of probability measures P" by the following version of the Radon-Nikodym derivative, pX;a,X =f[p(Xt\Xt_l)p X, , where p{Xl\X,_l)= ÂŁ a s \ - a x ' - l ~ s â€” , (4.2.1) s=0 Lr-1 V s J Xt-s ! pX0 = V J a ^ (4.2.2) and a,A. e 0 . This defines the Poisson AR(1) model given in (2.1.1). In this section we consider estimation of the Poisson AR(1) model parameters by the method known as conditional least squares (CLS). Klimko and Nelson (1978) and Hall and Heyde (1980, ch. 6) consider CLS estimation and inference for stochastic processes. The parameter estimates are selected to minimize the sum of the squared distances of each observation Xt from its conditional expected value given the previous observations XQ,Xl,...,Xt_l, 2 ? [ X J 3 M ] = aXt_x +X , where 3 , is the standard filtration, Chapter 4. Estimation 66 that is 3, = cs(X0,Xv..,Xt). The problem is equivalent to maximizing the following function over the parameter space, ^ r=l The first order conditions to this maximization problem lead to the following estimating function, a,A, = Qn a,X _Xt_, Xt-oXt_,-X = , = 1 _ Xt-aXt_x-\ V r=l J Note this estimating function is the same as the score function for the Gaussian AR(1) model given in (4.1.1) and is therefore called a quasi score. The following are a direct consequence of this equivalence: 1) from (4.1.2) the partial derivatives of the estimating function are, Xf-\ Xt-\ 2) this matrix is non-singular (except i f X0 = Xx -â€˘ â€˘ â€˘ = Xn = 0), 3) from (4.1.3) and (4.1.4) the solutions to 9 =0 are, (=1 Chapter 4. Estimation 67 _(x,_x-x)(xt-x) a . and (Note that in cases were d â€ž is negative we set it equal to zero) 4) xÂĄn is a martingale with respect to 3 n = a X0,Xx,...,Xn , since the conditional expectation of Xt given X r - 1 is the same in both the Poisson and Gaussian AR(1) models. Next we calculate the variance of v|/,, which is given by ÂŁ[\|/,\|/f] = Xt_x Xt â€”oXt_x â€” X Xt_x Xt â€” aXt_x â€” X E E Xt_x Xt â€” oXt_x â€” A, 2V\ Xt -aXt_x Xl,E[xt-aXt_x-'k2\^ Xt_xE Xtâ€”oXt_xâ€”X |3(_ E E -aXt_x-X |3 X(-aAr(_,-A. 13 4 4 Xf_x(a 1-a XM+A)] ^ [ ^ ( a 1-a Xt_x+X) x,_x(a 1-a X,_,+A)] E (a 1-a Xt_x+X)] a 1-a E a 1-a E Xlx] + XE[x2t_x} a 1-a ÂŁ[X ( 2 . ,] + A J : [ X M ] xU + XE[xt_x] a 1-a E[Xt_x] + X Chapter 4. Estimation 68 f a 1-a 1-a â€˘ + 3 1-a v l - a ' A ^ , 1 - a , 2 1-a . A f A a 1 - a { â€” â€” + +A 1-a \l-a ( A ^ 2U 1-a J a 1-a A ( A â€˘ + â€˘ + A a 1-a aA + 1-a U - a J I \ l - a A3 aA + f â„˘ | A 2 1-a z A2 1+a A V ' a A + f ! Â± ^ V + ^ U 3 Q Â± | ) n + a ^ i 1-a + A I 1 - a J 1-a V l - a y Note that the above expectations are with respect to the Poisson AR(1) model. This T matrix is positive definite since for any 2 dimensional vector 1= lx,l2 , the quantity / r 4 l / , M / f ] / = / 2 aA + ( l + 3 a ^ 2 | 1+a ^ 3 V 1-a j 1-a + 2lxl2 aA + ^ |A2 1-a + / 2 1+a A r _ a A _ + l + a ^ + U + a 1-a J ak'A 1+a / 2 A % 1+a â€˘ + -1-a + /, 1+a K A K is positive except i f /, = l2 = 0. The expected value of \j>, is given by, E xf E[Xt] E 1 1-a ( A V 1 - a , 1 -a 1 -a 1 Since J and ÂŁ[v|),] are both finite and independent of t condition C5 holds and Chapter 4. Estimation 69 we define the variability matrix and the sensitivity matrix respectively as V = E^\i t\\i Tt j and S = E\y ,]. The Godambe information and its inverse are given by, J = sv-ls = 2 â€˘> a 1-a A,+ 1+a A X2 , 1+a , 3 1-a 1-a X2 1+a , 2 1-a l-^X2 1-a a 1-a + 1+a A, and a 1-a A + 1 + a 1-a - 1+a A, A + 1+a A. 1 + a A 2 1-a Note these explicit expressions for the Godambe information and its inverse are new. Condition (4.1.8) is satisfied since the marginal distribution of Xt is Poisson with mean A/ 1-a and all moments for the Poisson distribution exist and are finite. Condition (4.1.9) is also satisfied since i s [ x , | . X M ] = aX ( _ 1 +A,. We can therefore use Theorems 4.1.8 and 4.1.9 to show the following: as 1) n~l [ÂĄ]n -> V, which is condition C6, as 2) n~lxFâ€ž -> S, which is condition C7, 3) Â«- l v Fâ€ž->Â° Chapter 4. Estimation 70 Since xÂĄn a, X is independent of a and X condition C4 holds. Further condition C3 is Y " 1 a,X a,X =(n~lxÂĄn a,A, ) n~lxÂĄn a,X , which by the C M T converges in probability to zero. Finally we can apply Corollary 4.1.1, since conditions C3 through C7 are satisfied, to get 4^ Â«" ^ i V ( 0 , 7 - ' ) . 4.3 Generalized least squares for the Poisson AR(1) model In this section we use generalized (weighted) least squares (GLS) to estimate the model parameters, see Wooldridge (1991a). In this method, parameter estimates are selected to minimize the sum of the weighted squared distance of each observation Xt from its conditional expected value given the previous observation Xt_x, E\Xt\Xt_l'\ = aXt_x +X . The weights are the inverse of the estimated conditional variance I / ar[X,|X ( _,] = d ; i l - d â€ž Xt_x + Xn, where dâ€ž and Xn are strongly consistent estimates, such as the conditional least squares estimates. Observations with large conditional variances are given smaller weightings than those with smaller conditional variances. This problem is equivalent to minimizing the following function over the parameter space, Chapter 4. Estimation 71 ,=i aâ€ž l - a â€ž X, , + Aâ€ž The first order conditions to this minimization problem lead to the following estimating function, a,A, =Qn a ,A '^X_x Xt-aXt_x-\ ^ ^ X , - a X M - A . (4.3.1) The sequence {v(/1}" , where \)/1 = - , is martingale difference sequence since X(_| X ( â€” OLX ( _J â€” A dâ€ž l - d â€ž X, , +Aâ€ž n n tâ€”i n X, -oXt_x - A, . dâ€ž l - d â€ž X, , + A,â€ž , Jt-\ dâ€ž l - d â€ž X, , +A,â€ž v 1 V n Â« /â€”I n v ^ = 0. Therefore xÂĄ t is a martingale with respect to 3, = a [xo, Xx,..., Xt, d n , A n j . The partial derivative of \\i t with respect to a and A, are Chapter 4. Estimation 72 aâ€ž l - a â€ž X, , +A. X. , 1 This matrix xÂĄn = /^"_,v|/, is non-singular except i f all of the Xt's are zero. Notice that the matrix X,-aX,_x-\2 (xf_x X,_^ (<*Â» X,_x+kn) is positive definite and hence the variance of xÂĄ n is also positive definite. xÂĄ n is therefore a regular inference function. If we assume that for all n Aâ€ž >5 for some 5 >0 then v|/ f is dominated by the function xt-oxt_x-x2(xU xt_x~] (4.3.2) which has a finite expectation. By the Dominated Convergence Theorem we have l i n v ^ , E\]y , V f ] = ÂŁ [ l i m ^ y ty f ] = E Xt-oX,_x-X = E = E\ (a 1-a Xt_x +A.) 1 (a 1-a X^+kf 1 (a 1-a Xt_x+X) fx2 K.xt-\ 1 xt. U - i 1 X , - a Z , _ , - A r-l 1 (4.3.3) Chapter 4. Estimation 73 Further this implies l i m _ nlVar[Vn] = l i m ^ E[y , V f ] 1 = E\ (a 1-a + (X2 X ^ Note that strong consistency of dâ€ž and A,n is required to use the Dominated Convergence Theorem. Similarly \|>, is dominated by the function (4.3.4) which has a finite expectation. By the Dominated Convergence Theorem we have l i m ^ ÂŁ[\j>, ] = E[\imn^x v|> ,] 1' y1 Y ^ (a 1-a + * M 1 We will therefore define the variability matrix Fand sensitivity matrix S by V = -S = E (a 1-a Xt_x+k) I' Y2 Y (4.3.5) Theorem 4.3.1 Let xÂĄn be the inference function defined in (4.3.1). Then the following converge under the Poisson AR(1) model: n-l[V]n->V (4.3.6) Chapter 4. Estimation 74 n^n\s (4.3.7) Before we prove this result we need to show the following new result. Proposition 4.3.1 If the stochastic process follows the Poisson AR(1) model with a e 0,1 , then it is a -mixing with a n = 0(a"). Proof: We let pk(Xt+k\Xt), p(Xt\Xt_x) and p Xt denote respectively the conditional probability of Xt+k given Xt, the conditional probability of Xt given Xt_x and the marginal probability of Xt. These are respectively defined in (3.3.1), (4.2.1) and (4.2.2). We denote the joint distribution of Xx, X2,..., Xk by, p Xx,X2-,Xk =p(Xk\Xk_x)p(Xk_x\Xt+k_2)-p(X2\Xx)p xx the joint distribution of Xk+n, Xk+n+x,... by oo p xâ€ž+k,xn+k+x--- =pxk+n Y\p{xt+1\xt) t=k+n and the joint distribution of Xx, X2,..., Xk, Xk+n, Xk+n+x,... by p X\>x2,...,xk,xk+n,xk+n+x,... -p xx,x2...,xk pn(xk+n\xk)Y[p{xt+i\xt) t=k+n Pn k+n \ Xk ) = p Xx,X2...,Xk p Xn+k,Xn+k+x... -p xk+n For any A eo Xx,X2,...,Xk and any B = a Xk+n,Xk+n+xwe have the following, Chapter 4. Estimation 75 \P AnB - P A P B\ = __{p x 1 , x 2 , . . . , x k , x k + n , x k + n + l , . . . A,B -p Xx,X2...,Xk p X n + k , X n + k + x . . . }| Z â€ž V V Y â€ž V Y \Pn(Xk+n\Xk)-P X k+n P ^i,A-2---,Ak p * - n + k , A . n + k + x , . A,B P X , k+n = e = e = e = e |(aT(l-a")x*~V 1"a mm(x,Xk)( v 1-a (x-s)\ 1-a 1-a 1-a" 1-a - + a , 1-a" (any-l(l-a")Xt-se 1-a" 1-a 1-a' min(xâ€”-1) - + a (a")f(l-a")Xt-1-se" ^ 1-a (x-l-s)\ 1-a min(x-Uft-l) (â€˘ÂŁ _ x! 1-a" 1-a + anXk __ (s+l) k (a")s(l-a")JfÂ»-'-ie" 1-a (x-\-s)\ x\ a"Xk __ (5 + l)| j=0 k (a")s(l-a")Jf'-'-Je v 5 ) 1-a (je-l-s)! Since f 1 _/v " A 0< v 5 y n w i . n n ^ - i - ' / , * i -a V ( a " r ( l - a " ) ^ - H c 1 -a ' 1 -a J C - l - S ! <1, min(x-Uf t-l) I t - U t the summation on the right is bounded by _ s +1 , which is less than s=0 Hence we have Chapter 4. Estimation 76 or pn(x\Xk)<e'X 1-a" 1-a V_ 1-a xk-ixk +a"Xt k k pMxk)=e x 1-a" 1-a 1-a + <9(a") JC! , 1-a" X -X +-< 1 - a " l - a A A, 1-a 1-a \ 1-a A JC! â€˘ + 0(a") p{x) = p(x) Xa 1 + + 0(a") A a : 1 - a . ( 1 - a - ) ' JC! + 0(a") = jp(x){l + 0(a : ) + 0(a")} = p(x){l + 0(a"j\ where 0 < a . < a . Therefore Pk(Xk+n\Xk)-p Xk+n p Xk+n {l + 0(a")}-p Xk+n px, k+n px, k+n = 0(a") and Chapter 4. Estimation 77 \P AnB -PAPB\ = Y,p Xx,X2...,Xk p Xn+k,Xn+k+v.. 0(a") A,B = 0(a")YJpXl,X2...,Xk pXn+k,Xn = 0(a") A,B Therefore {X,} is a -mixing with a n = 0(a") . Proof: (Theorem 4.3.1) We begin by introducing some notation. Let â€˘ X,-oXt_x-X (xlx X_ (aâ€ž 1 -d , Xt_l+Xn) X i 1 f v2 X = X - a X _ , - A , Xt-i xt_x and Showing (4.3.6) is equivalent to showing n lSn ->V. Since { X , } ^ is stationary and a -mixing with a n =0(a") by Theorem 4.1.1 { i f } " is also stationary and a -mixing with a n = 0(a"). Further {X,}" = 1 and {7/}" are Chapter 4. Estimation 78 both ergodic. Since Ym and Ys are both positive and Ym<Xâ€žY, we have that the variance of -1 "-2 -1 * ^ as n Sn is less than or equal to the variance of Xn n Sn. By assumption Xn â€”> A and by the Ergodic theorem n Sn ->E\Yt J. Hence Xnn Sn converges almost surely and therefore its asymptotic variance is zero. Chebyshev's inequality can then be used to show that r i P n~lSn - ÂŁ [ Â« - 1 5 n ] - > 0 . Since lim n_ > 0 OÂŁ[l^ I] = r we therefore have the desired result, P n~lSn â€”> V. The proof of (4.3.7) is similar and is omitted. â€˘ P Condition C2 ' is satisfied since we have shown that n~1[^/]â€žâ€”>V and n mn->Â«. = V. Condition CI is also satisfied since l i m , , ^ max,^ 2 nj Var\y,] = V. Condition C4 is satisfied since a,X j x5'n(&â€ž,Xâ€ž) = I2 â€˘ Finally condition C3 is satisfied since [*â€ž a0,X0 a,X = [*â€ž a,A, \ \ a,X = [n~lxÂĄn a , A ] ~ V l v F n a A which by the C M T Theorem 4.1.4 converges in probability to zero. We can therefore use Theorem 4.1.5 which implies Chapter 4. Estimation [ y ] ; % â€ž a,X ->7V 0,7, By the C M T Theorem 4.1.4 the following holds n*(% a,X = ( r t â€ž a ,A ) ~ V * [ Y ] * - Â» , r V * = K " * Therefore n - ^ ( O . F - 1 ) . The following theorem gives an approximation for V Theorem 4.3.2 The Junction V in 4.3.5 is equivalent to V a,X -X a 1-a a 1-a 1 a 1-a 0 + X -X [a 1-a ] a 1-a -X a 1-a 1 a 1-a JC (_J + X Chapter 4. Estimation Proof: Making use of the following two identities, x2_x =(a 1 -a xt_x + Xj x. a 1 -a V [a 1 - a ] J [a 1 - a ] and, = â€” r â€” ( a 1 _ a + ^ ) â€” r â€” a 1-a a 1 -a we can rewrite the information matrix as follows, V a,X = ( E X a 1 -a [a 1 - a ] 1 a 1 -a X2 -X 1 a 1-a 0 [a 1 - a ] a 1 -a -X a 1 -a a 1 - a xt_x + X -X 1 a 1 -a a 1 -a 1 a 1 -a -X [a 1 - a ] a 1 -a a 1 -a a 1 - a xt_x + X To approximate the expectation above we consider the following function, gx = a1-a x + X Chapter 4. Estimation 81 th The k derivative of g x with respect to x is, g x = it! â€”1 *[<x 1 -a ]* [a 1 -a x + A]* + 1 and a Taylor expansion about the point A/ 1 - a is, gx =g\ U - a + g' ( X Y X 1-a x â€”-V 1-a 1-a ^ 1 - a j 2! 1-a ( x V i / x Y A V i V* 1 - a ; 3! v l - a y v 1 - a J 4! - + i? ^ where X Y 1 1-aj 5! i? ^ wil l be largest in absolute value when ^ = 0. Now we can approximate the expected value of g Xt_x . 1 [a 1-a f f x } [a 1-a ] 3 ( X 1+a X [ 1+a X]3 U - a J [ l + a A ] 4 U - a [ a l - a ] 4 f ^ / A, ^ [ 1+a A,] 1-a U - a where the error in the approximation is negative and bounded in absolute value by Chapter 4. Estimation 82 E[R(0)] [ a ( l - a )]5f x + 10 4.4 The score and Fisher information for the Poisson AR(1) model In this section we derive expressions for the score function and observed Fisher information for the Poisson AR(1) model. The conditional likelihood given X0 is L(a,X) = Yl"_lp(Xt\Xt_l), where p(Xt\Xt_l) is defined in (4.2.1.). Let Ua be the score with respect to a , that is where /(â€˘) is the log-likelihood. The partial derivative of the conditional probability is found by making use of the following derivative Ua=^t(aXXâ€žXl,...,Xn) = fj ^p(Xt\Xt_x) pmxt_x) s r â€” ( 5a 1 a ^ l - a ) * - 5 ' a ( l - a ) 1-a s x a* ( l - a ) x-s and the relation Chapter 4. Estimation 83 -â€”p(y\x)= ÂŁ -j r a ' ( l - a ) x " ' 7 r - - - p(y\x) da " a ( l - a j ^ s j [y-s)l 1 - a 1 f x -1) 1 v 1 ' ^ A e A- ^ / ^ a (1 -a ) T r 1 - a i_\ [s-lj (7-5)1 1 - a X^y-s p(y I *) x v \ x 1 L s-i (-, â€ž v-i-s e K â€” z 1 -a S ( j - l - 5 > 1 -a p(y I (4.4.1) 1 -a {p(y-l\x-\)-p(y\x)\ Thus the score with respect to a is, U^^^ipiXi-W^-V-pmX^/pmX.-t). (4.4.2) <=i t a Let C/"x denote the score with respect to A , that is a "~^rP(xt\xt-\) Ux(a,X) = â€”l(a,X;XQ,Xl, , * j = ÂŁ ^ _ _ _ 5A. n P(x,\x,-i) The partial derivative of the conditional probability is found by making use of the following derivative dX ^ ; ^rP(y\x) = dX min(y,x) ( J \ -X * y-\-s s=0 \ S J _i*)(x A = -p(y\x)+ __ M a ' ( l -a ) s=0 c-s e X X y y-l-s ( v - 1 - 5 ) ! = -p(y\x)+p(y-i\x). Chapter 4. Estimation Thus the score with respect to X is, The second derivative of the conditional probability can now easily be found. d p(y\ x) = 7-^- 4r p{y\ x) + da 1-a 5a 1 I x ^-p(y-l\x-l)-ÂŁp(y\x) 1-a l l - a + 1-a {p(y-l\X-l)-p(y\X)} 7 ^ {^{Piy -2\x-2)-p(y-l\x-1)} 1 -a [ 1 - a L J {p(y-l\x-\)-p{y\x)}\ 1-a = - ^ { p ( y - l|x -1) - /;(v|x) - ^( v - 2|x - 2) + p{y - l|x -1) (1 -a ) +x[^( v - 2|x - 2) - ,p( v - l|x -1) - p( y - l\x -1) + p{ v|x)]} = W v|x) - 2^( v - l|x -1) + p( y - 2|x - 2) (1 -a ) -x[p(y\x) - 2p{y - l|x -1) + p( y - 2\x - 2)]}, dadX p(y\*)=~: ^p(y-i\x-i)-^P(y\x) = ^ {[~P(y-^-l) + p(y-2\x-l)] - M v | x ) + Jp(v-l|x)]} = r 5 - {p{y\x) - P(y-l\x)-p(y-l\x-l) + p(y-2\x-1)} 1-a and Chapter 4. Estimation 85 Â§^p(y\X) = -^p(y\x) + ^p(y-i\x) = p(y\x) - 2p(y -l\x) + p(y -2\x). The second derivatives of the log-likelihood are then, â€ž pmx^^p^x^-^pix^x^ t=\ p(xt\xt_xy tZj^ripiX,^^)2-2p(Xt\Xt_x)p(Xt-\\Xt_x-\) t=\ (1 -a ) + p(Xt\X!_x)p(Xt-2\Xt_x-2)-Xt_x[p(Xl\Xl_x)p(Xt-2\Xt_x - 2 ) -p(Xt - 1 | X M - l ) 2 ] } / ^ ( X f | X M ) 2 1 ^ f2a2X t_ xp(X l -l\Xt_x -1) 2 x + a 2 X ( _ , ( X ( _ 1 - l ) J p ( X , - 2 | X , _ 1 - 2 ) a 2 ( l - a ) 2 f P(Xt\X,_x) aX^p(Xt-\\Xt_x-\) ,21 (4.4.3) . ^ ( x j x ^ ^ I ^ ^ i x ^ o - ^ m x j x ^ ) ] Â§ / > ( * , I * M ) 2 = ^ ( X , | X , _ > ( X , - ^ X ^ ) - ^ - l | X f _ , ) 2 â€”/ v I v \2 t=l p(xt\xt_xy 1 " X2p(Xt-2\Xt_x) P(Xt\Xt_x) Xp(X,-l\Xt_x) p{xt\xt_x) â€˘ and Chapter 4. Estimation 86 â€ž p(Xt\ Xt_x) p(X, I Xt_x) - A p ( JT, | X,_x )^-p(X,\ Xt_x) I _ â€˘y1 oAoa ok da * it p(xt\xt_x)2 _fXt_x pjX,\Xt_x)p{Xt - 2 1 X ^ - 1 ) - ^ ^ -\\Xt_x)p(Xt-\\Xt_x-\) _ 1 ^ [A.oJr < _ l j p(X,-2 |X, . 1 -1) Xp(Xt - \\Xt_x)aXt_xp(X, - l\X,_x -1) A a ( l - a ) t r l p(Xt\Xt_x) P(X,\X,_X)2 J' Next we show how to write the first and second derivatives of the log-likelihood as conditional expectations. To make these conditional expectations more clear, we consider the following example with just 2 random variables. Let X and Y be independent random variables and denote their densities respectively as fx(x) and fY(y), where the densities are with respect to the measure v (Lebesgue measure or counting measure). Let Z = X + Y be the convolution of X and Y. The joint distribution of Z and X is fx (JC) fY (z - x) (note the Jacobian for this transformation is 1). The density for Z is found by integrating out X, fz 0 ) = f fx (x)fr 0 - x ) d v O) â€˘ Jâ€”oo The conditional density for X given Z is fxM = f A x ) M z ~ X ) . The conditional moments for X and Y are then, Chapter 4. Estimation 87 E[xk\z] = JlxkfX]Z(x)dv(x) \ xkfx(x)fY(z-x)dv(x) Jâ€”CO fzV) and E[Yk\z] = E[(Z-X)k\Z = \_x (*-*)* fx\z (x)dv (x) J {z-xf fx(x)fY(z-x)dv(x) v â€”CO Proposition 4.4.1 For the Poisson AR(1) model, with conditional probabilities p(Xt\Xt_x) given in (4.2.1), the following equalities hold: aXt_xp{Xt -\\Xt_x -l)/p(Xt\Xt_x) = Et[aoXt_x] (4.4.4) Xp(X, -1|X,_x)/p(X, | Xt_x) = E, [e, ] (4.4.5) a2X,_x{Xt_x -l)p(X, -2\X,_X -2)/p(X,\Xt_x) = Et[(aoXt_xf]-Et[aÂ°Xt_x] (4.4.6) aXXt_xp(Xt -2\X,_X -l)/p(Xt\X,_x) = ÂŁ ( [ ( a o I M ) e J (4.4.7) X2p(Xt-2\Xt_x)/p(Xt\Xt_x) = El[e2]-El[st] ' (4.4.8) where ÂŁ,[â€˘] denotes the conditional expectation with respect to the sigma field 3t=a(X0,Xx, ,Xt_x,Xt). Chapter 4. Estimation Proof: We first show (4.4.4). Rearranging (4.4.1) we see that aXt_xp(Xt-\\Xt_x-\) p(Xt\Xt_x) p{Xt\X_x) % = Et[aoXt_l]. 1 V t-\ \ x/i \X,-i-x __ x a (1 -a ) (X._A ... .y.^exXx' V x J Next the left hand side of (4.4.5) is Xp(Xt-\\Xt_x) 1 mHXÂŁ-x'-\(xtA p(Xt\Xt_x) p(Xt\Xt_x) _ V x J X.-l-x j min( )|<J^ ^ P(Xt\Xt-\) *=o 1 V x J min(X, .-X.-OfX ~\ p(Xt\Xt_x) tt V x ) | a x (1 - a )X"l~x (X,-x) | a " ( l - a )* - - ' ( JT , -x ) (X,-l-x)\ e'xXx'-(Xt-x e-xXx'~x (X,-x)l which is the right hand side of (4.4.5). Next the left hand of (4.4.6) is Chapter 4. Estimation 89 a 2 X M ( X f _ , - l ) ^ ( X t -2|*_,-2) 1 minCJT,^,^,.,^) X a 2 X , _ , ( X M - l ) / y _ o \ -x^x.-2-x a (1-a) 1 j=0 mii^A-, -2,X,_,-2) L (- l v x y ( X , - 2 - x ) l /KXJX,.,) ^ 1 min(*, -2,X,_,-2) (X , - 2 ) l ( X ( _ ! - 2 - x ) ! x ! X,_,-2-a* + 2 ( l -a ) A ' - ' e - x ^ - 2 -( X r - 2 - x ) ! 2 (x + l X x + 2 ) i ; : - ; | a - 2 ( l - a ) ' - 2 - * ekrk X^X,-2-x (X,-2-x)\ p{Xt\Xt_x) ^ * " \ x + 2 j />(*,!*,_,)â€˘ ^ ^ \ x + 2 f V ' ( X ( - 2 - x ) ! > x ( x - l ) a (1-a) . ^ V ; - V ' ( X , - x ) ! fX, ^ P ( X t \ X t - \ ) x=0 = ÂŁ | ( a o Z M ) 2 ] - ÂŁ ( [ a o I M ] v x J which is the right hand side of (4.4.6). Next the left hand side of (4.4.7) is Chapter 4. Estimation 90 aXXt_xp(Xt-2\X,_x-l) P(Xt\Xt_x) 1 pmx,_x i P{Xt\Xt_x 1 p(Xt\X,_ 1 pmx,_ 1 P(X,\Xt_ 1 p(X,\Xt_x 1 I 1 Z a -^> xam(X,r2,X,_x-\) V x J a * ( l - a ) ' x,,-\-x e A, ' ( X , - 2 - x ) ! (X,_! -1)! â€ž e XXX' 1 x tt '-'(X^-l-xy.xl V ; ( X , - 2 - x ) ! n(X, -2 ,X M - l ) / j ^ i=0 v x + l y X ( x + i r % Â« ' ( i - a r ( X , - 2 - x ) ! min(X,,-l,XM) / F ^ Z A -' x=0 min(X,,-l,XM) / j ^ \ V X J a ' ( l - a ) ' -X>\ X-x x,.-x e K ' x=0 V x J __ x "x a.x(\-a)x"-x{Xt-x) {X,-\-x)\ e~xXx--x p(Xt\X, = E,[{aoX,_l)et] (X,-x)\ exXx'~x min( (y \ __ x " l ax{l-a)x"-'(Xt-x) x=Q V X J \X,-X)\ which is the right hand side of (4.4.7). Finally the left hand side of (4.4.8) is X2p{Xt-2\Xt_x) P(Xt\Xt_x) j mm(X,y2,X,.,) Jy \ p(Xt\X,_x) tt z ^2 r k a - a ) V x J X.-2-x (X,-2-x)\ j ^(X,r2,X,_i) f jÂŁ y p(X,\Xt_x) tt {\-a)x--\Xt-xlXt-x-l) V x J exXx'~x ( X , - x ) l e~xXx'-x p(Xt\Xt_x) t% { x J = E,[e2]-Et[et] ax(l-a)x^{(Xt-x)2-(X,-x)}^^^ Chapter 4. Estimation 91 which is the right hand side of (4.4.8). â€˘ Proposition 4.4.1 leads to the following new expression for the first and second derivatives of the log-likelihood: u * = a ^ l r ^ ' [ a o X ' - l ] " j B ' - i [ a o X ' - i ] } ' ^j~â€”f t {2aEt [ a Â° ' " ^ t a Â° x ^ i + e > [ ( a Â° x-i y - ^ [ Â« ^ H J - ( ÂŁ j a o I M ] ) 2 } a 2 ( l - a ) 2 1 ^ " l ) E ' [ a Â° X ' ^ + V a r ' t a Â° 1 " t a Â° ] } , aX t {E,[(Â« ) s , ] - ÂŁ , [ a o jr,_,]ÂŁ,[e,]} _ ^ | c â€ž , [ a . A r M , Â« , ] and " = ^ S { ÂŁ . ^ ] - ÂŁ . [ ÂŁ . ] - ( ÂŁ . [ 8 , ] ) 2 } -jTZl^W-^t'.Il-A (=1 Note that ÂŁ ( [ a o I M ] ^ a o I M ! E,[et]*et, ÂŁ M [ a o I M ] = a I M and Chapter 4. Estimation 92 Et-i[ÂŁt] ~ ^ â€˘ Further note that given Xt the correlation between a o Xt_x and s, is -1. This follows because C o v , [ a o I M , ÂŁ , ] = Covt[(X, -s,)et] = El[{Xt-el)et]-El[{Xt-st)]Et[el] ^X^^^-E^-X^z^E^,}2 and Vart[a Â°Xt_x] = Vart[{Xt-&,)] = Vart[e, ]. In the cases were the time series is comprised of low counts the expected information is easy to calculate numerically. For example i f the probability of a count larger than 6 is almost zero then there are 36 outcomes to sum over in the calculation of E Et [a Â° Xt_x ] 2 or E E, [e, ]2 J . Hence the expected information is easily to calculate. The above representation of the score function leads to a new definition of residuals. The usual way to define a residual is to take the difference between a random variable and its conditional expectation. That is, for the Poisson AR(1) model residuals can be defined as rt - Xt -Et_x\Xt] = Xt -oXt_x -X. However since Xt is comprised of two random components it would be nice to define two sets of residuals, one for each random component. The natural way to define such residuals is as follows: for the continuation component let rXt =ot Â°Xt_x -aXt_x and for the arrival component let r2t=et-X. Unfortunately these definitions won't work, because aÂ°Xt_x and e, are not observable. Chapter 4. Estimation 93 However we can replace a o I H and e, respectively with is, [a o J M ] and is,[e (] (their conditional expectations given the observed values of Xt_x and Xt). That is, redefine the residuals as r* = E, [râ€ž ] = [a Â° ] - a l ( _ , and r2* = E[r2t ] = E, [e, ] - X. Remarks 1. These conditional expectations are easily calculated with the help of Proposition 4.4.1. 2. We are using the aggregate data to estimate the individual unobserved processes (continuation process and arrival process). 3. The residuals are equivalent to the martingale differences of the score function. Further note our new expressions for the score function lead to these new definitions for residuals, which otherwise would not have been obvious. 4. Adding the components of the two new sets of residuals gives the old set of residuals. That is, r\t + rit = Et [a Â° Xt-\ ]- + Et [s / ]- ^ = Et [a o Xt_x + s, ] - oXt_x - X = El[Xt]-aXt_1-X = Xt-aXt_l-X 5. A l l the residuals above should be standardized before making residual plots. 6. Since the two sets of residuals are calculated from aggregate data they might be Chapter 4. Estimation 94 highly correlated. This appears to be the case in Example 4.4.1. For low count series (series with many zeros) there appears to be less correlation, see the residual plots for data set 1*, Figure 8.2.1, and the discussion of the residuals in Section 8.2. Example 4.4.1 In this example we continue our analysis of our illustrative data set by examining the residual plots. Figures 4.4.1 and 4.4.2 are respectively the residual plots for the continuation and arrival processes, while Figures 4.4.3 and 4.4.4 are respectively the autocorrelations respectively for the continuation and arrival residuals. Both residual plots have patterns similar to the Pearson residual plot in Figure 2.5.4. None of the autocorrelations in the continuation residuals are statistically significant at the 5% level. The second autocorrelation for the arrival residuals is on the border line significant at the 5% level. Residuals: Continuation Process (heavy manufacturing industry, males, ages25-34, burns) Figure 4.4.1 Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Chapter 4. Estimation 95 Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns) Figure 4.4.2 Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Autocorrelations of the Residuals: Continuation Process (heavy manufacturing industry, males, ages 25-34, burns) i : i Figure 4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Autocorrelations of the Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns) JS 2 o o o < -0.5 11 Lag Figure 4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury. Chapter 4. Estimation 96 4.5 The score and Fisher information for a general AR(1) model We now show how to extend this to a general AR(1) model of the following form. Let Xi = a Â° X t - \ + s t , where aÂ°Xt_x\Xt_x has densi ty /^ lx^^a) , st has density g(s;X) and Xt\Xt_x has density h(xt\xt_x;a,X) = f f(s\xt_x;a)g(xt -x;X)dv(s). Jâ€”oo The conditional density of a o Xt_x \ Xt_x, Xt is f(s\xt_x;a)g(xt -s;X) f(s\xt_x,xt;a,X) h(xt\xt_x;a,X) Let 3, =a(XQ,Xx,...,Xt). The conditional moments of aoXt_x\3t and e, |3, can be expressed as follows. E[{aoXt_x)k\^] = [yfWx^x^aMdvis) f xkf(s\xt_x;a)g(xt-s;X)dv(s) J -co h{xt\xt_x;a,X) ÂŁ[e? |3 , ] = E[{Xt-aoXt_x)k\z] = \ {xt-s)kf(s\xt_x,xt;a,X)dv{s) Jâ€”oo J'CO ^ ÂŁ / (s |* , - i ; a )g(X-*;A,)rfv( j ) -00 h{xt\xt_x\a,X) Given a sample x0,xx,...,xn from the above AR(1) model the likelihood is the product of the conditional densities , that is L(a,X) = ^ " ^ ( x J x ^ j j a j A y n ^ ) . To simplify things Chapter 4. Estimation 97 we take h(x0) = l. The log-likelihood and score are respectively then ÂŁ(a,X) = ]T" = ilog(/*(x r |*,-i;a,A.)) and U(a,X) = f-^-ÂŁ(a,X)" da n â€”n{xt\xt_{,a,X) ,=1 h(xt\xt^;a,X) n -Z-h(Xt\xt.i\0.,X) y 1 5A, ,=1 h(xt\xt_x;a,X) If -â€” f(s\xt_x;a) and â€”g(xt -s;X) are polynomials of the following form, 5a dX â€”/(*| ;a) = (a0 + axs + a2s2 +â€˘ â€˘ -+apsp )f(s\xt_l;a), and â€” g(xt -s;X) = (a0 +ax(x, -s)+a2(xt -s)2 +â€”+ap(xt -s)p)g(xt -s;X) oX then the score can be rewritten is terms of conditional moments as, ( n p U(a,X) = Z Z Â« , 4 a o J T M ) ' | 3 f /=1 1=0 \t=\ i=0 Note that we can write Chapter 4. Estimation 98 â€” / ( j | x M ; a ) = x ( j ; a ) / ( 5 | x M ; a ) 5a and 5A g ( x ( - 5; A) = y (x, - s; X)g(xt - s; X) by simply defining i(s;a) = â€” //f and y (x, -s;X) = â€”g/g. Thus the score can be 5a / dX I written in terms of conditional expectations as, <7(a,A) = ^ ÂŁ [ T ( a o X M ) | 3 j A I>[Y(ÂŁ,)|3,] V/=I The second derivative of the log-likelihood are, (=1 5a -&(x, | x , _ , ; a , A ) h(xt\xt_x;a,X) f 5 ^ â€”/?(xJx , _ , ;a ,A) 5a "aX dadX A ( x , | x M ; a , X ) /?(xJx f _j;a,A) 5a A ( x , | x M ;a , A ) / j ( x J x M ; a , A ) â€” / j ( x ( | x M ; a , A ) 5A / ? ( x , | x M ; a , A ) A (=1 1? 5A: 2 ^ a A ^ d ^ ' - l ) A a ,x( X rlVl) \ 2 Chapter 4. Estimation 99 If we let x a = -^x and y x = J^y then we can write, /(5|x,_!;a) =xa(5;a)/(5|xr_1) + (x(5;a)) 5a and d2 2 g(xt -s;X)=yx(xt -s;X)g(x, -s;X) + (y(xt -s;Xj) g(x, -s;X). dX2 Using these two equations we can rewrite the second derivatives of the log-likelihood as, = S {^ fT a Â° ) + (x (a o ))213,1 - (^ [x (cc o )| 3, ])2 j t=i = Z l ^ a C a ^ ^ ^ J + MxCaoX^)!^]} lox = X{4^ a o ^- i )Y( ÂŁ ( ) l 3 J -^ ( a o ^- i )|3 ,My (ÂŁ r )|3,]} = _Cov[x(aoXt_}),y(et)\3t] 1=1 = __{E[y x(e +Var[y(et)\Zt]} r=l Applying these results to the AR(1) models of Joe (1996) is a bit more complicated, since Joe allows the distribution of a o Xt_x to depend on A . . If we define Q 8(-) such that â€”f(s\xt_x;a,X) = 8(s;X)f(s\xt_l;a,X), then the score function with 5A. respect to A. becomes Chapter 4. Estimation 100 ÂŁi<[8(aÂ°X,_J+Y(e,)|3,]. Similar modifications can be made to the expressions for the second derivatives of the log-likelihood. In the following three cases a Â° Xt_x is independent of A,. 1) The Poisson AR(1) model, see Section 4.4, 2) The Gaussian AR(1) model, with a Â°Xt_x = oXt_x and s, ~ N(X,G2). This is slightly different than Joe (1996), who lets a Â°Xt_x\Xt_x ~N(aXt_x,ao2) and s, ~ N{X, (1 - a )a2). Note however that both models are equivalent. 3) The generalized Poisson AR(1) model, where Xt has a generalized Poisson distribution with parameters A,/(l-a) and A/(l-a). Note Joe (1996) allows the two parameters to be different. In these 3 cases the functions x(s) and y(s) are polynomials in s, and hence the score function and observed Fisher information can be written in terms of the conditional moments of a o Xt_x and er 4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model In Section 4.3 we showed that the Poisson AR(1) model is a -mixing. Since it is also Chapter 4. Estimation 101 stationary the model is ergodic. If the Fisher information is finite and positive definite then the regularity conditions C1-C3 and C5-C7 hold. The next theorem shows that the Fisher information is finite. Theorem 4.6.1 Let uat=Uai-UaJ_v uXt=UXt-Uu_x and ut =(ual,uXlf, where Uw and are the score functions for the Poisson AR(1) respectively with respect to a and X. Further, let ut denote the matrix of partial derivatives of ut with respect to a and X and let Ut =^"_1Â«t- For any two dimensional vector I and any positive integer k, E (lTut )k < oo and i s [ / r i i , / ] < oo. Proof. For the Poisson AR(1) model we note the following: 1) E.laoX.J <X? (4.6.1) 2) Et[etf<X? (4.6.2) 3) Et[aoXtjEt[ej<X?+! (4.6.3) for all positive integers k and /. Also note that, for any two dimensional vector /, {lTut}k ={lxuat +l2uxt}" is a polynomial in E,[aÂ° Xt_x] and E,[et] of degree k, which we write as Chapter 4. Estimation 102 I k-i for some constants a0,ax,---,ak. The expected value of j / 7 " , } is j ; ^ ( [ o o j M ] h ÂŁ , [ 8 > ] i=0 ^ E K i 4 ^ [ Â« Â° ^ - . r ^ [ 8 r ] ' " (=0 1=0 < 00 The proof for the second part is automatic since E (lTu,)2} = E[lTutuJl] = E[lTutl\ The next theorem shows that the Fisher information is positive definite. Proposition 4.6.1 E 1 7 {lTux) =0 if and only iflx =l2 = 0, where l = (lx,l2) Proof. It is sufficient to show that Var[lxEt[a Â° Xt_x] + l2Et[et]]* 0, or that Var[lxaX!_xp(Xt-l\Xt_x-l) + l2Xp(Xt-l\Xt_x)] * 0 when / is non-zero. We prove the result by contradiction. Suppose there exists a non-zero / such that Var[lxaXt_xp(Xt-l\Xt_x-l) Chapter 4. Estimation 103 +l2Xp(Xt - l | X M ) ] = 0. This implies that lpXt_xp(Xt-\\Xt_x-l) + l2Xp(X,-l\X^j\ = c (4.6.4) almost everywhere for some constant c. In particular (4.6.4) holds when Xt â€” 1 and Xt_x = 0 which gives us c = lxa0p(0\-i) + l2Xp(0\6j\ = l2Xex or l2 = cX~lex. Taking Xt = 1 and Xt_x = 1 in (4.6.4) we can solve for lx as follows c = lxa\p(0\6) + l2Xp(0\l)] = lxae~x +cX~lexX(\-a)e~x = lxae~x +c(l-a) or lx = cex. Finally taking Xt = 1 and Xt_x = 2 in (4.6.4) we have c = lxa\p({\\) + l2Xp(Q\2)\ = cexa(l-a)e~x +cX-lexX2(l-af ex = c a ( l - a ) + c2(l-a) 2 = c ( l - a ) { 2 - a } This implies either a = l , a = 2 or c = 0. Since a = 1 and a =2 are outside the parameter space we conclude that c = 0, however this implies that lx=l2=0. â€˘ Chapter 4. Estimation 104 It can be shown that the regularity condition C4 follows from the uniform strong law of large numbers, see Ferguson (1996, Ch. 16). Therefore the regularity conditions for Corollary 4.1.1 are satisfied, and as a result the maximum likelihood estimators ctn and Xn have the following asymptotic distribution: â€˘Jn (dâ€ž - o ^ d N(o,r x), where / is the Fisher information matrix. The parameter estimates can be found using a Newton-Raphson type iterative procedure as follows: let U(Q) be the score function and let t/(0) denote the matrix of partial derivative of U with respect to 9, where 9 =(a,X)T. The iterative procedure is defined by 0(~i) = ew-[/(eW)- i c/(0W). In some cases this procedure can be modified by replacing C/(0 ( r )) with i^t/(9 ( r ))] and is sometimes referred to as Fisher Scoring. We use Fisher Scoring to estimate the parameter in the Poisson AR(1) model, since 4c/(0(r))J is easy to calculate. The CLS estimates generally work well as starting values. Occasionally (less than 1% of the time) we found that the CLS estimates caused the algorithm to diverge. Example 4.6.1 The maximum likelihood estimates for our illustrative example are Chapter 4. Estimation 105 d = 0.40 and X = 5.2 . The inverse of the expected Fisher information matrix at these parameter estimates is given by ._,._[ 0.62 -5.17" ' ~[_-5.17 5 0 - 0 5 4.7 Comparison of methods Al-Osh and Alzaid (1987) used a simulation to compare the following three estimation methods for the Poisson AR(1) model: Yule-Walker, CLS and C M L . They note that the algebraic formulas for the Yule-Walker and CLS estimates are almost identical and is also seen in their numerical comparisons. Their results also show that there is only a small efficiency gain in C M L when a alpha is small, say less than 0.3. However for larger values of a alpha, say between 0.5 and 1, the efficiency gain in C M L seems well worth the effort. Our analysis differs in that we look at the asymptotic efficiency (AE). That is we consider what happens as the sample size goes to infinity and our results are not based on a simulation. We let 0 n be an estimate of 0 and denote it Godambe information by j and its inverse by Let be the k, k element of j'1 and let be the k, k element of the inverse of the Fisher information matrix. Cox and Hinkley (1974) define the A E of the k* component of 9 â€ž as AE(Q nk) = Z"1 / j ' " 1 . Chapter 4. Estimation 106 A E of CLS (A = l) AE(d) AE (X) Figure 4.7.1 The asymptotic efficiency of conditional least squares as a function of a when X = 1. 0.86 0.84 0.82 0.8 0.78 0.76 0.74 0.72 to m csi AE of CLS (a = 0.3) in X in CO in cri AE{& ) Figure 4.7.2 The asymptotic efficiency of conditional least squares as a Junction ofX when a =0.3. Figure 4.7.1 shows that as a goes to zero the A E goes to 1, that is for small values of a there is little difference between CLS and C M L . However as a goes to one the A E goes to 0, that is there is a substantial advantage in using C M L when a is large. Figure 4.7.2 shows that as X increases the A E in estimating a rises while the A E in Chapter 4. Estimation 107 estimating X drops. In both cases the A E appears to be approaching a limit of about 0.83. For GLS estimation we can use Theorem 4.3.2 to find bounds on the Godambe information. Unfortunately we found no increase in information in moving from CLS estimation to GLS estimation. Tables 4.7.1 and 4.7.2 show that the CLS information is contained within our bounds for the GLS information. Further the upper bound on the GLS information is not significantly closer to the C M L information. G L S C L S C M L E l | (2.00,2.20) 2.14 2.30 9 (0.788,0.797) 0.794 0.802 Table 4.7.1 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when a = 0.3 and 1 = 1. GLS C L S CML E l (6.18,7.24) 7.17 8.35 a (0.573,0.619) 0.616 0.668 Table 4.7.2 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when a = 0.7 and 31 = 1. Next we tested our estimation methods on some misspecifed data. We simulated 200 series of length 100 using binomial thinning with parameter a = 0.5 and misspecifying the arrival process by letting the distribution of e, be uniform over {0,1,2}. The resulting sampling distributions of d and X for the estimation methods CLS, GLS and C M L are summarized by box plots in Figures 4.7.4 and 4.7.5. Figure 4.7.3 shows that C M L estimates for a are strongly biased, in fact almost Chapter 4. Estimation 108 all of the C M L estimates for a are greater than 0.5. In contrast the CLS and GLS estimates for a are only slightly biased. Note that the interpretation a has not changed, that is a is the probability that a claimant in the current period continues to be a claimant in the next period. In Figure 4.7.4 we see that C M L estimates for X are biased estimates of the mean of e, which is 1, in fact almost all of the C M L estimates for X are smaller than 1. In contrast the sample mean of the CLS and GLS estimates for X are very close to 1. We can think of X as the mean parameter of et, however it is not used to specify the distribution of s ( . Estimates of X are therefore estimates of the mean of e,. Box Plots (estimates for alpha) o CO CD h-; TO <=> " I CO LU i _ CD u o ' CD E 2 CO CM d < 9 "cLS GLS CML Estimation Method Figure 4.7.3 Box plots comparing the sampling distributions of a when the arrival process {e,}is uniform over {0,1,2}. Chapter 4. Estimation 109 Box Plots (estimates for lambda) 00 8 w CD 00 0- d CLS GLS Estimation Method CML Figure 4.7.4 Box plots comparing the sampling distributions of Xwhen the arrival process {s,}is uniform over {0,1,2}. Chapter 5 5 Testing for independence In this chapter we consider testing i f the thinning parameter a in the Poisson AR(1) model is zero, that is testing for independent observations. We have a strong belief that an AR(1) model is appropriate for W C B data and in such cases it may not be necessary to test for independence. However the test for independence is interesting from a mathematical point of view and would be necessary before fitting the model to other data sets, which may not be interpretable as a queue. The first section examines what would happen i f one blindly went ahead and used the standard Gaussian AR(1) asymptotic results. The next section bases inference on the conditional least squares estimator for a . As noted in section 4.3 this estimator is the same as the Gaussian estimator for a , however this time the asymptotic properties of the estimator are worked out under the Poisson model. We then derive a score based test and show that it is asymptotically equivalent to the CLS based test. Finally we examine the score function in detail about the point a = 0 , leading to non-standard Wald and likelihood ratio tests. 5.1 Gaussian AR(1) Suppose we decided to ignore the fact that our time series is integer valued, and used the 110 Chapter 5. Testing for independence 111 usual Gaussian AR(1) model. In Example 4.1.5 we showed that yfn{an - a ) is asymptotically normal with mean zero and variance 1 - a 2 . Under the null hypothesis, H0:a = 0, 4ndn is asymptotically standard normal. We reject the null hypothesis when 4nan is larger than the critical value ÂŁ 8 , where ÂŁ s is selected so that the probability of committing type I error is 5 (the significance level of the test). That is, under/^0:a = 0, p(~Jnan > %s) = P{Zn > %s) = 5 , where Z denotes the standard normal random variable. Note this is a one sided test and that a negative estimate for a would lead us to accept the null hypothesis. The power of the test is the probability of not committing a type II error, that is the probability of rejecting H0:a = 0 when 7f,:0 < a < 1 is true. The power of the test under the Gaussian assumption is, P(^tdn>^a) = P = P\ Vn(dâ€ž-a) ÂŁ a - V Â« a 1 - a 2 1 - a 2 2 V 1 - a ' In the next section we continue by using the correct asymptotic distribution for 4ndn. 5.2 Conditional least squares A more sophisticated test for the hypothesis is based on the conditional least squares method. In section 4.2 we showed that the CLS estimate for a was the same as the Gaussian based estimate, but with a larger asymptotic variance, l - a 2 + a ( l - a ) 2 /x , . Chapter 5. Testing for independence 112 However under the null hypothesis both variance expressions reduce to 1 and hence we would use the same critical value for both tests. Another way to put this is that the significance level of the test in section 5.1 is correct. However the power function in section 5.1 is incorrect under the Poisson AR(1) assumption since the variance used is too small. The difference between the Gaussian and Poisson based variance is a(l-a)2/X, which is small when X is large or whena is near zero or one. To help assess this effect on the power function we consider two graphs of the power function: first the power as a function of a , and second as a function of X . Power of Gaussian vs Poisson Test (function of alpha) GAR(1) PAR(1) T - ^ - r ^ - i - c o c o o c N i o c o O O O Q T - T - T - C M C M C M o d d d d d d d d Alpha Figure 5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function o / a . 1 and Â« = 100. Figure 5.2.1 show that for a between 0.01 and 0.16 the Gaussian based power is understating the "true" power and for a larger than 0.16 the Gaussian based power is overstating the "true" power. However overall the error in the Gaussian based power Chapter 5. Testing for independence 113 seems small. For example i f the true model is the Poisson AR(1) with a = 0.30, X = 1 and we performed the test on a sample of size 100, then the probability of rejecting H0:a = 0 is 90.0% and not 93.2% as given by the Gaussian based power. Figure 5.2.2 shows the following: Gaussian based power is independent of X, for extremely small values of X the "true" power based on the Poisson AR(1) model is large, but as X increases the "true" power quickly converges to the Gaussian based power. For example i f the true model is the Poisson AR(1) with a = 0.01, X = 0.1 and we performed the test on a sample of size 100, then the probability of rejecting H0:a = 0 is 8.0% and not 6.1% as given by the Gaussian based power. 0.25 0.2 0.15 1 I 0.1 0.05 0 Figure 5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function ofX. a =0.01 and n = 100. Power of Gaussian vs Poisson Test (function of lambda) o o o Lambda CM CM CM O O O Chapter 5. Testing for independence 114 5.3 Score test In this section we derive a test for a = 0 by considering the distribution of the score function when a = 0. We begin by noting the following simplification for the conditional probabilities, equation 4.1, when a = 0, p(X,\X,_x) = e~xXx' /Xt!. That is, the series is iid Poisson. Making use of this simplification the score function (4.4.1) reduces to, tfa(0A)=4E*,-,(*,-*.)-A- t=\ Under H0:a=0, Ua(0,X) is a zero mean square integrable martingale. Its quadratic characteristic and variance are both equal to Â«(l + X). Since conditions for the as strong law hold, from Theorem 4.1.8 and 4.19, we have that nx [Ua (0, X)]n ->(l + X). The martingale differences each have identical moments, and all of these moments exist and are finite. This insures that the Lindeberg condition trivially holds. Thus the conditions for the central limit theorem are satisfied and the distribution of the score is n'^2Ua(0,X)^N(0,l + X). To find the distribution of the score under the alternative hypothesis Hx :0 < a < 1 we rewrite the score, U Â« (0, X) = \ J X_X {X, - a l , , - X) + ^ ÂŁ X]_x A /=i A /=i (* 1 r x, ^ 2 \ 1-a U - a J ) Chapter 5. Testing for independence 115 The first term in the score is a zero mean square integrable martingale. Applying the strong law of large numbers we get, 1 a -Ua(0,X)^0 + -n X Therefore under the alternative hypothesis n /2Ua(0,X) will diverge since it has no mean correction. This implies that the test is consistent or that asymptotically the null hypothesis is rejected with probability one when the alternative hypothesis is true. Also note again that the test is one sided, that is large positive values of the score lead us to reject the null hypothesis. In practice we replace the nuisance parameter X with an estimate. Under H0:a=0 the maximum likelihood estimate for the Poisson mean parameter is the sample mean X = X. Plugging this estimate into the score function we get, The second term is op{\) and the first term ignoring the l/X is a zero mean square integrable martingale with variance A,2. Again the central limit theorem holds and Chapter 5. Testing for independence 116 _ i / - d together with the continuous mapping theorem we get n /2Ua(0,X)â€”> N(0,Y)-Alternatively we could have found the distribution of the score by noting that, -rU.iO,X)=n'-' - Jnâ€ž (5.3.1) where dâ€ž is the conditional least squares estimator for a . Since both X and â€” 'zZ"_l(Xt_l - X) converge in probability to X under the null hypothesis and to X/(l â€” a) under the alternative hypothesis, the continuous mapping theorem implies that -^=rUa(0,X) and 4nan both have the same asymptotic distribution. The difference between the score test and the conditional least squares t-test is in the estimate for X in the denominator of the statistic. To summarize, the test based on the Gaussian model has the correct significance level, but the wrong power. The error in the power, however, is not large. The test based on the score is asymptotically equivalent under alternatives to the conditional least squares t-test. So from the point of view of testing for independence it is interesting to note that naively using the standard Gaussian test gives the correct result except for a small error in the test's power. 5.4 The Score function on the boundary of the parameter space In this section we take a detailed look at the score function about the point a = 0, and although this is a boundary point of the parameter space the score is well defined and Chapter 5. Testing for independence 117 behaved. Recall the interpretation of a is the probability that an individual in the system at time t â€” 1 remains in the system at time t. In such a case values of a less than zero would not make any sense. However the probability function p(x;a) = a 1 ( l - a ) 1 x , x=0, 1, is well defined for any real number a and its derivatives exist and are continuous. Similarly, even though the Poisson AR(1) likelihood is not defined for negative values of a , we can still examine the function's mathematical properties at the point a = 0. To get an idea of what the likelihood looks like at a = 0 we simulated ten data sets and plotted the likelihood as a function of a . Figure 5.4.1 shows five plots of the likelihood for simulated independent and identically distributed Poisson samples with mean 1, which correspond to the case where a =0. For comparisons we have also included five plots for the case were a is near zero in Figure 5.4.2. alpha=0, lambda=1, n=200 8,000 , alpha Figure 5.4.1 Plots of the likelihood as a function of a forfive simulated samples ofsize 200 with a = 0 and k = l. Chapter 5. Testing for independence 118 alpha=0.1, lambda=1, n=200 7,000 6,000 .. 5,000 o CO 4,000 -o .c < 3,000 . I o X 2,000 . 1,000 . 0 . -1,000ÂŁ ffl B t o o o I I I Figure 5.4.2 Plots of the likelihood as a function of a for five simulated samples ofsize 200 with a = 0.1 and X = \. In both figures the likelihood plots are smooth with a unique maximum. These plots do not answer the question of whether or not the score function is well behaved about a = 0, but suggest that the derivative of the likelihood about a = 0 may exist and be continuous. Also note, the maximum for one of the simulation series in Figure 5.4.1 occurs at a point where a is negative. We know that the score is defined at a = 0, we even gave an expression for this in section 5.3. However is it defined for a < 0 and if so how does it behave? Recall that the conditional probabilities p(Xt\Xt_i), equation 4.1, are polynomials in a . One might think that a negative value for a could cause the conditional probabilities to become negative. To consider this possibility we rewrite the conditional probabilities as follows 1=1 v x . xt\ -X * X, -x (Xt-x)\-The first term is positive and the summation is less than one in absolute value. Therefore Chapter 5. Testing for independence 119 given a value 8 > 0 it is possible to find a neighborhood K 0 of a = 0 such that the probability of p(Xt\Xt_x) being non-positive is less than 8 . Since p{Xt\Xt_x) is a polynomial in a , the derivatives of p(Xt\Xt_x) with respect to a exist and are continuous. Further derivatives of the log(p(Xt\Xt_x)) with respect to a wil l be the ratio of two polynomials in a , with the polynomial in the denominator being positive for a in some neighborhood of zero. Hence the derivative of log( /?(X, |X M ) ) with respect to a exists and is continuous for a in some neighborhood of zero. Before calculating the Fisher information matrix we note the following simplification when a = 0. p{Xt-i\Xt^-i) e'^ Xt\ =Xt(Xt-l)-{Xt-i + l) p(Xt\Xt_x) (Xt-i)\e-W X' Substituting this into Equation 4.4.3 we find the observed Fisher information. ~ L x = ~ X J^^-l ~^ ~ Xt-\ + Xt-\{Xt-\ ~ l) '~^2 ~ ~ Xt-\ "^2~ J Under the null hypothesis the expected information is Chapter 5. Testing for independence 120 -E[L] = -E X ~^ ~ Xt_i - Xt_x â€” - Xt_x â€” + Xt_x â€” rl y Xj_ y X^ x2 - l X2 X2 X / x (x + x2) y = -2X- + X+(X + X2)-^ + X ^ ^ - X ^ r X K 'X2 X2 X2 = l + X In a similar manner we find iaX = 1 and ixx =l/X. The inverse of the Fisher information matnx is 1 -X -x I X . Let an be the value of a that maximizes the likelihood when the search for a is not restricted to (0,1). We call a â€ž the likelihood maximizer to distinguish it from the maximum likelihood estimate dâ€ž, which must lie in the parameter space (0,1). Under the null hypothesis Vnct n converges in probability to a standard normal random variable and Vndn converges in probability to a random variable Z*, where Z* is defined as, / . v fO z<0 P(Z <z) = \ , . , v ; \P(Z<z) z>0 where Z has a standard normal distribution. We can define the Wald statistic in the usual way, Wn = n(d n - 0) 2 . If the maximum likelihood estimator dâ€ž is replace by the likelihood maximizer dâ€ž then the Wald statistic has the usual chi-square distribution with one degree of freedom. However Chapter 5. Testing for independence 121 for the maximum likelihood estimate the Wald statistic converges to a modified chi-square random variable defined by, P ( X * < x ) = ^ P ( X l < x ) , *>0, (5.4.1) where %, has a chi-square distribution with one degree of freedom. We also define the likelihood ratio statistic in the usual way, that is 2 log(A), where A = L(dn,Xn;X)/L(0,Xn;X) and dâ€ž and Xn are the maximum likelihood estimates and Xn is the sample mean. Our analysis at the beginning of this section showed that the derivatives of the log-likelihood with respect to a exist and are continuous at a = 0. It is therefore possible to make the usual Taylor series expansion of the log-likelihood about a = 0 and show its asymptotic equivalence to the Wald statistic. Example 5.4.1 In this example we test our illustrative data set for independence. Table 5.4.1 shows the test statistic values as well as the 5% and 1% critical values. In all the cases the null hypothesis of independence is rejected. Test Statistic Value 5% Critical Value 1% Critical Value CLS 4.66 1.645 2.33 Our Score 6.06 1.645 2.33 Wald 15.09 2.71 5.41 L R 24.27 2.71 5.41 Table 5.4.1 Tests for independence in the illustrative data set. Chapter 5. Testing for independence 122 To assess how well the tests for independence work for small samples we apply them to two sets of simulated data. For both sets of data we generate 1000 series of length 200. In the first set we let a = 0 and X = 1 and in the second set we let a = 0.1 and X = 1. Table 5.4.2 show the percentage of time the null hypothesis of independence was rejected by each test for the first set of data, while Table 5.4.3 show the percentage of time the null hypothesis of independence was rejected by each test for the second set of data. Test Percentage of rejections at the 5% level Percentage of rejections at the 1% level CLS 4.2% 0.7% Our Score 4.3% 0.6% Wald 4.1% 1.0% L R 3.7% 0.6% Table 5.4.2 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with a = 0 and X = 1. Test Percentage of rejections at the 5% level Percentage of rejections at the 1% level CLS 37.4% 16.0% Our Score 36.3% 15.0% Wald 35.9% 17.0% L R 36.0% 15.5% Table 5.4.3 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with a = 0.1 and X = 1. From Table 5.4.2 we see that the probability of committing type 1 error is about the same for all the tests. In Table 5.4.2 we see that the power of the CLS test may be Chapter 5. Testing for independence 123 slightly higher than the other tests at the 5% level and that the power of the Wald test may be slightly higher than the other tests at the 1% level. The table also shows that there is a large probability of committing a type II error when a is small. Chapter 6 6. General misspecification test In this chapter we apply a general specification test, sometimes called the information matrix test, to our model. The test is equivalent to a score test of whether the parameters in the model are stochastic or not. Section 6.1 motivates the information test. Then in Section 6.2 we derive the test for the simplest possible case and state the more general result found in McCabe and Leyboume (1996). Finally in Section 6.3 we give the details for the Poisson AR(1) model and evaluate the test on some simulated data. 6.1 Overview A n advantage of maximum likelihood estimation is that the asymptotic variance of its estimators often attain the Cramer-Rao lower bound. However, a drawback to maximum likelihood estimation is that the estimates are not robust to model misspecification. We saw this in Section 4.7 where we found that for the misspecified model our parameter estimates are severely significantly biased. It is therefore important to check a model's specification i f maximum likelihood estimation is to be used. Let L and / denote the likelihood and log-likelihood respectively, and let 0 be a vector of parameters. Also let the first and second partial derivatives of L and / with respect to 0 be denoted by L\, L\, / e and / e respectively. The expected Fisher 124 Chapter 6. General misspecification test 125 information can be expressed in two ways: the Hessian form - ÂŁ [ / e ] and the outer product form 2s[/ e/ e r]. When the model is correctly specified / e + 44^ has a distribution with mean zero. A n equivalent expression for le +/ e/ e r is Le/L, which is a zero mean martingale. The equivalence of these two expressions is well known and can be shown by considering the second partial derivative of the log-likelihood as follows: 7 = â€” / 9 ae 9 59 L h L I L \ L L 6 9 Barndroff-Nielsen and Sorensen (1994) state that Le/L is a martingale. This is easy to prove as follows: Let ft(yt) denote the conditional density of Yt given the past observations, and write the likelihood as Ln =n"_j / , (v,)- L e t 3 m be the sigma field generated by yx, y2, â€˘ â€˘ â€˘, ym. For m < n we have, 3â€ž \Â°V 1=1 Jt=m+\ ;=1 Â°V t=m+l J , , = J r - r , -dyÂ» Chapter 6. General misspecification test 126 = ^ J E L + 1 / . & â€˘ â€˘ â€˘ + J^J r L + , ft cv. ^ â€˘ â€˘ â€˘ ^ = A - i + A i Lm Therefore Le/L is a martingale. Under certain regularity conditions, see Section 4.1, the martingale central limit theorem can be used to find the asymptotic distribution of A good description of the information matrix test is found in White (1982). Chesher (1983) shows that for sequences of independent observations the information matrix test can be interpreted as a score based test of whether the model parameter are stochastic. McCabe and Leybourne (1996) extend Chesher's result to a much more general setting. Namely to sequences of random vectors, which are possibly dependent and non-stationary. 6.2 Outline of the test In this section we derive the score based interpretation of the information matrix test in the simplest case and then state the more general result found in McCabe and Leybourne (1996). Let yx,y2,â€˘â€˘â€˘,)>â€ž be a sequence of observations. We will assume that the distribution of yt depends on an unobserved random parameter 0 , , and denote the joint Chapter 6. General misspecification test 127 density of y, and 0, by / ( v , , 0 , ) . By definition f(y,,Qt) = f(y,\Qt)g(Qt), where f(y,\Qt) is the conditional distribution of yt given 0, and g(Bt) is the marginal distribution of 0 , . We will also assume the sequence of parameters {0,}" is uncorrelated and that ÂŁâ€˘[01 ] = p 0 and Var\Q ,] = 7 i > 0 . I f 7 i = 0 then the parameters are not stochastic and the observations are independent and identically distributed. Let L(y\Q) be the conditional likelihood, where y = (y{,---,yâ€ž)T and 0 =(0, ,--- ,0 n ) r . Consider the following Taylor series expansion of L(y\Q) about the vector p = (p 0 ,â€˘ â€˘ â€˘, p 0 ) : L(y\9) = L(y\Â») + ( 9 - p ) r L 9 ( v | p ) +1(0 - p ) ^ (v|p)(9 - p) +Op(\p - p f ) . To proceed we need the following well-known property of the trace operator, E(XT AX) = tr(AL), where X is a n x p random matrix, A is a n x n matrix of constants and = E . In the following we will use Ee to denote expectation with respect to 0 . The marginal likelihood is then, ZOO = E9[L(y\Q)] = L(y\u) + Ee [(0 - p ) r ]4 (y\ p) + } ÂŁ e [(9 - p ) r Z 9 (y| p)(9 - p)] + ÂŁ 9 [o, (|||0 - pf)] = L(y\p) + 0 +1 ^(Z e (y| p)Q) + o ( ÂŁ e [||9 - p f ]) where Q = var(0) = diag{%). Therefore i f K is close to zero the likelihood can be approximated by L(y)Â«Z,(v|p) + j f r (L e (y |p )Q) . Using the relationship Z e = L ( / e from Section 6.1, we can rewrite the likelihood approximation as Chapter 6. General misspecification test 128 ÂŁCv)Â«ÂŁ (y |u ){ l + ifr((/ e +/e/er)le^ where / = log(Z(v|0)). We are interested in testing the null hypothesis H0:n = 0 (0 is not stochastic) against the alternative hypothesis Ha:n > 0 (0 is stochastic). The score with respect to 71 is, Um(y,n) = !"log(Z(v)) on l + l/r((/e+/e/9r)|9=(1Q) l^((/9+/e/er)|9=(i/n) I + ^ + U J I B - , . " ) ' Evaluated at TC = 0 the score simplifies to Un(y,0) - \tr{i[l^ + / e / e r ) | e = | a j . It is not completely obvious that this is a martingale since the number of parameters and the dimension of / e and / e are all increasing with n . Note that the elements of /9 are f (yt |0,) and the diagonal element of / 9 are ÂŁrf(yt\9t). The score therefore become Uâ€ž(y,0)=Ydll{^fiyt\Qt) f(yt\Â®t)) n8=n. â€˘ This is, of course, the same as replacing the n parameters in 0 with the single (1-dimensional) parameter u 0 , that is Un(y,0) = Z " = 1 t e / ^ > o ) + ( i / ( v ^ 0 ) ) 2 } - Or letting / ( ^ , v) = ÂŁ ; = 1 l o g ( / ( v > 0 ) ) we can then write the score as Un(y,0) = l +11*, which is now clearly a martingale. Since Chapter 6. General misspecification test 129 {y,}J= 1 is a sequence of independent and identically distributed random variables and assuming the derivatives of / with respect to p 0 are continuous, j ^ / ( y , | p 0 ) + (g^/ (v , |p 0 ) ) \ is also a sequence of independent and identically distributed random variables. Assuming the variance exists and is finite, that is var^/(v, |p 0 ) + ( ^ / ( v , | p 0 ) ) j = r j 2 , the central limit theorem implies that n^v'HL +L C )^W(0,1). V "Ho "Mo "Mo/ v ' / For the more general case we let {y, j be a sequence of x 1 vectors of observations. It is assumed that the marginal distribution of yt depends on the parameter 0, which is a k x 1 random vector. Let L(y\Q) and Z(y) be respectively the conditional and marginal likelihood, where y = (yf, ,yTN)T and 0 = (0f, ,0 TN)T. Also assume that E[Q ] = p and var[0 ] = Q(II), where n = (TI ,, , K M)T, and that Q(0) = . McCabe and Leybourne show that the marginal likelihood can be approximated as L(y) Â» I (v |p){l + ^ ( ( / e +/ e/ e%M a)}. We are interested in testing the hypothesis that none of the parameters are stochastic, H0:H = 0, against the alternative that at least one of the parameters is stochastic, H0:ni > 0 for at least one i, 1 = 1,2,â€˘â€˘â€˘,m. McCabe and Leybourne show that the test statistic to be used in this case is UN = trl(l6 + V / J L ^ / ^ ^ ^ l n ^ } or Chapter 6. General misspecification test 130 equivalently UN = vec (/e +447)le=n 'an v e c [ p ] ' e ' w n e r e e is the mx 1 unit vector and vec is the operation of column stacking. Under H0:YI = 0 UN is a zero mean martingale, and under appropriate regularity conditions, again see section 4.1, \U)N UN â€”> N(0,l) as 7V->oo. 6.3 Details for the Poisson AR(1) model In this section we present the details of the specification test for the Poisson AR(1) model and apply the test to simulated data, some for which the Poisson AR(1) is the correct specification and some for which the Poisson AR(1) is not the correct specification. We will denote the conditional probability of Xt given and the parameter values a, and Xt by, mm(Xu,X,_x) ( -ft- ^ pl(Xt\X_l) = 2 M <(l-a,) x=0 V x J -X, i X.-x (Xt-x)\' We assume that the sequences {ot,}"=1 and {A.,}"=1 are independent and identically distributed with the following means, variances and covariance: ÂŁ[a (] = a , E\X J = X , Var[at] = ni >0, Var[Xt] = n2>0 and Cov[at,Xt] = -TC3 < 0. We can justify the use of a negative covariance as follows: To simplify the argument we assume a, =a and Xt=X for all t. The marginal mean of Xt is p = X./(l-a). If the mean p is fixed then increasing a corresponds to decreasing X Chapter 6. General misspecification test 131 and we therefore assume a and A, to be negatively correlated. Our vector of random parameters is 0 = ( a 1 5 a 2 , ,an,'kx,X2, , A , n ) r and the first and second derivatives of the log-likelihood with respect to 0 are given by and 6 \_diag(laih ,laiXi, ,lanK) . diag(lXi,lXi, ,lx) where / a | = ÂŁ \og{p (X, \ Xt_x)), l x = ^ \og{pt {Xt \ Xt_x)), /â€ž, = Â§ l o g ( A (X , | X,_x)), k=i}og{Pt^Xt\Xt_x)) a n d / ^ =1$x\og{pt{Xt\Xt_x)). The variance matrix of 0 can be written as Q(iT) = Klin where /â€ž denotes the n x n identity matrix. The derivatives of Q are d dn, 0.1 oâ€ž Â° .J oâ€ž 0.1 7 J Chapter 6. General misspecification test 132 Â°- J Finally the score statistic, to test the null hypothesis HQ:nl =0,n2 = 0 & 71^=0 against the alternative hypothesis Ha:nl > 0 for at least one / , is given by Un = Y,-i{%, (a>^) + Za<(a,X) + llt(a,X) + lXi (a,X) - 2 / a (a,X)lXi (a,X) -2/aA (a,X)). Note that all the derivatives in this expression can be found in section 4.4. If this joint test rejected the null hypothesis then the test can be separated into individual tests to try and identify which parts of the model are inadequate. In the Poisson AR(1) model there are two random components: the thinning operator and the arrival process. Recall in Section 4.4 we showed the Pearson residuals could be decomposed into continuation and arrival residuals. In a similar manner we can decompose the score statistic into components. However in this case the decomposition is into three parts: continuation component (for the binomial thinning operator), arrival component and an interaction component for the interaction between the binomial thinning operator and the arrival process. In the test statistic Un the component /]]"_1{/ a ! i(a,A) + / a <(a,A.)} t e s t s m e adequacy of the binomial thinning operator to explain the variation in those who continue to collect from one period to the next. Binomial thinning assumes that the recovery of individuals is independent, which is probably a reasonable assumption. It is unlikely that one individual's recovery time would affect another's, unless we had limited medical Chapter 6. General misspecification test 133 services and recovery on one meant that the next individual could start treatment. The binomial thinning operator also assumes that all individuals recover at the same rate. This is a less realistic assumption, since there is wide variation in individual health due to genetic factors and lifestyle choices, such as diet and exercise. Recovery rates should vary from person to person, since we would expect recovery rates to depend on the person's health immediately prior to injury or illness. We now examine the thinning operator in more detail. The Bernoulli assumption seems reasonable. That is, for individual i who is collecting disability at time t-1 there are two outcomes: recovers (stops collecting STD) or does not recover (continues to collect STD). This, of course, ignores other possibilities, such as death, which we could either assume has a negligible probability or can simply be added to the recovery probability and thought of as an exit probability. However, each individual collecting disability at time t-1 wil l have a different level of health and therefore will have a different probability of recovery before the next period. The number of individuals who continue to collect at time t can be written as Z M ' P<' w n e r e Pi ^ independent Bernoulli random variables, and the probability that individual / recovers is -P(p\. = 0) = 1 - a,.. This thinning operator has more variation than the binomial thinning operator. The question is how much more variation or is the binomial thinning operator sufficient to account for the majority of the variation? If we found that the binomial thinning Chapter 6. General misspecification test 134 operator was not sufficient to explain the variation, that is, the result of the specification test was to reject the hypothesis that the binomial thinning parameter was non-stochastic, then we would have to consider over-dispersed models such as in McKenzie (1986), A l -Osh and Aly (1992) and Joe (1996). The problem with the thinning operators used in McKenzie (1986) and Al-Osh and Aly (1992) is that they are a random sum of geometric random variables and it is possible to have a o Xt_x > Xt_x, which wouldn't make sense i f we want to use it to model a queue. adequacy of the Poisson distribution in describing the arrival process. Two reasons for using the Poisson distribution are: first it has a simple probability mass function with well-known properties and second when combined with the binomial thinning operator the marginal distribution of our process remains Poisson. In practice we often use the Poisson distribution to model count data. The main criticism of this is that most real data are over-dispersed, that is, the Poisson distribution is not sufficient to describe the variation found in the data. One method for dealing with over-dispersed count data is to let the Poisson mean X be random. Usually it is assumed that X has a gamma distribution, which transforms the distribution into a negative binomial. For AR(1) models with negative binomial marginals see McKenzie (1986), A l -Osh and Aly (1992) and Joe (1996). Finally, the component ^" = 1 {/ 0 ( (a,X,)/ X ( (a,X.) + / a X (a ,A))j tests i f the arrival The component statistic Un tests the Chapter 6. General misspecification test 135 and departure processes are independent. If the workers are a small cohort then the arrival and departure processes are dependent, since, as workers recover, the cohort size increases, which raises the cohort's exposure to injury and increases the number of new injuries (arrivals). In most industries this is unlikely to be a problem since the number of injured workers is usually a very small fraction of the industries work force. Next, suppose we believe that the binomial thinning operator with a fixed non-stochastic parameter is sufficient to explain the variation in the number of individuals remaining in the system from one period to the next. However, we suppose we have picked the arrival process to be Poisson out of convenience and wish to check the adequacy of this specification. In this case, the score statistic simplifies to Un = Z"- i (^ , ( a ' A - )+^ , (a^)) â€˘ Using the expression in Sections 4.5 we can rewrite the low count series, it is easy to calculate the variance numerically, which is given by Since the martingale differences, Ut - Ut_x, are bounded by A.2 + X] which has finite moments, the weak law of large numbers holds for the martingale differences and their squares. That is, nxUn -*E[U, -Ut_x] = 0 and n\U\ ->ÂŁ (U, -Ut_x)2 score statistic as Un = L;.,{(*]) 2-2XE,[s,]+X' +E,[s>]-(E,[s,]f -ÂŁ,[*,]} E^s] - (l + 2X)Et [s,]}, which is, of course, is a zero mean martingale. For = var[C7, -Ut_x\ The bound on the martingale differences also means that the Lindeberg Chapter 6. General misspecification test 136 -y d condition is satisfied; hence var[(7n ] 2 Un -> iV(0,l). To evaluate how well this test performs, we apply it to some simulated data which are correctly specified, and to some simulated data which are misspecified. In the first case, we simulated 100 series of length 200 from the Poisson AR(1) model with a = 0.5 and X = 1. At the 1% level of significance the information matrix test rejects 3 series and at the 5% level it rejects 8 series. The number of rejections is only slightly higher than expected, so it appears that a sample size of 200 is sufficient for the distribution of the test statistic to be approximately standard normal. Next, to assess the power of the test we simulate some misspecified data by letting e, follow the uniform distribution over the set of integers {0,1,2}, that is, P(st = /) = X , / = 0,1,2. Note that the mean for s, remains the same. This time, out of 100 series, the information matrix test rejects 92 series at the 1% level and all 100 series at the 5% level. This indicates that the test has strong power against this type of misspecification. Example 6.3.1 In this example we test the specification of the Poisson arrivals in our illustrative data set. The information matrix test statistic is 1.45 (p-value 14.7%). We therefore accept the null hypothesis that the Poisson parameter is non-stochastic. Chapter 7 7. Models with covariates 7.1 Model definition and introduction Let X0,X1,---,Xn be a series of dependent Poisson counts generated according to the following model Xt =a ,oX,_, +et, where X0 has a Poisson distribution with mean A,0 and {s,}^, is a series of independently distributed Poisson random variables with mean Xt. The thinning operator " Â° " is defined as in Section 2.1. Given Xt_x, a , " ! , . , and s, are assumed to be independent; this can be checked with the model specification test, see Sections 6.3 and 7.5. As in Section 6.3 we will denote the conditional probability of Xt given Xt_x as Pt{Xt\Xt_x). A n easy way to incorporate covariates into the model is to use a link function, which is the common method in generalized linear models. The idea behind the link function is to map an unrestricted (real) parameter space for the regression coefficients into the restricted parameter space required by the model. For example, since the 137 Chapter 7. Models with covariates 138 parameter space for a , is (0,1), the following link is appropriate a , = exp(lfY) l + exp(Jfy) ' where Yt is an m -dimensional vector of time-varying covariates and y eSRm is an m-dimensional vector of parameters. Similarly, taking Xt = exp(Z, rP), where Zt is a p-dimensional vector of time-varying covariates and p e$R p, will insure that Xt is positive. 7.2 Forecasting The first step in the process of forecasting is to find the k-step ahead distribution, that is, the conditional distribution of XN+k given XN. This distribution is defined by the conditional moment generating function of XN+k given XN; and is given in the following theorem. Theorem 7.2.1 For the Poisson AR(1) model defined in Section 7.1 the k step ahead conditional moment generating Junction is given by Proof: The result is proved using induction. The one step ahead conditional moment generating function is AN+klAN X ' t 0 1 A f + i a j V + 2 " ' a j V + i e +(1 a A T + i a N+2 Chapter 7. Models with covariates 139 N+l Â° XN + ÂŁ N+l ]}| XN 1 = [a ^ + (1 - Â» ) f " expjX w + 1 (es -1)}. Now, suppose that the k-l step ahead conditional moment generating function is Mx ,x (s) = A N+k-\ \ A N v ' N+k-l f N+k-\ 1 N+k-l N+k-l exp ( ^ - l ) X ^ , n a ; I i=N+l 7=1+1 /V+t-1 where we define ]^[cc y=0 when i = N + k-l. Then the k step ahead conditional 7='-+l moment generating function is MxN^Nis) = E[e^\XN] = E[E[esX^\XN+l]\XN] = E N+k f N+k ^ eÂ° UaJ+ 1 - n S j=N+2 \ j=N+2 J N+k N+k exp (e*-l) \ \ X N [ i=N+2 j=i+l J N+k N+k = E[e?â„˘\Xâ€ž]exp\(S-l)YlXlYlaJ I i=N+2 j=i+l N+k N+k : [ a w + 1 e ' + ( l - a w + 0 ] ^ e x p { X w + 1 ( e ' - l ) } e x p ( e s - l ) X ^ f [ a 4 ' i=N+2 j=i+l where e' = ^Y[^N+2aj + ~ 11^+2 a.>)' Substituting this in for e' gives Chapter 7. Models with covariates 140 exp< A, ' n ; ^ Â« , + ( i - n M W i - a Â» Â« ) , ' i C - Â« J + ( > - i C Â« Â« > ) l - 0 } H ( ' - o s ^ n Â« J N+k N+k i=N+2 j=i+l N+k f JV+* ^ 1-j=N+l N+k N+k ex P ( ^ - l ) X M l a ; â€˘ I i=N+l j=i+l J â€˘ Remarks 1. The distribution of XN+k\XN is a convolution of a Binomial distribution with parameters n ^ + 1 a 7 and XN, and a Poisson distribution with parameter Z N+k â€ž -iâ€”rN+k . . -wâ€”rN+k -r-iN+k . -iâ€”rN+k . A, I a , . Hence, it has mean XM a , + > A,, a , and vanance A â€ž a , 1 - I I a . + > A,,. a , . " 1 1 j=N+l J\ 1 1 j=N+l J) Z_(/=Af+l 11 1 y=,-+l J 2. From the conditional moment generating function the conditional distribution of Xt given X0 is a convolution of a Binomial distribution with parameters J^J^rXy and X0 and a Poisson distribution with parameter X ' - i ^ < l X - i a . / â€˘ Hence, i f the unconditional distribution of X0 is Poisson with mean A 0 , then the unconditional distribution of Xt is Poisson withmean A< (+A-Ma,+A,_ 2a,a (_,+ +A,0a,a,_, a , . For comparison purposes we define a corresponding Gaussian AR(1) model with covariates as follows: Chapter 7. Models with covariates 141 Xt = Xt +(xt Xt_\ + s r , where s ( is normally distributed with mean zero and variance a 2 and the parameters Xt and a, are defined as in the Poisson model. For this Gaussian AR(1) model, XN+k\XN has a normal distribution with mean . A ^ T T ^ a ,+y J V + * A , T T j v + * a, and variance Â« 1 1 j=N+l J ÂŁâ€”li=N+\ ' 1 17=1+1 J l + ]^^Ja 2)o" 2. As in section 3.1 the conditional mean of XN+k\XN in the Poisson and Gaussian models are the same, but the conditional distributions are quite different. Let pk (x) be the conditional probability function of XN+k given XN and define the conditional median of XN+k given XN as the smallest non-negative integer mk such that ^ â„˘ ! 0 ^ 05. This is the same definition used in Chapter 3. If, at time N, we know the future covariate values, and { Z , } ^ * , as well as the parameter values y and |3 ; then the k step ahead conditional probability function, pk (x), can be found from the k step ahead moment generating. Proposition 3.2.1 says that the k step ahead conditional median is the forecast that minimizes the k step ahead absolute forecast error. With regard to forecasting count series, the biggest advantage in using a data cohesive model is that individual probabilities (point mass forecasts) can be calculated for all possible future outcomes. This is especially useful for low count series, where only a few of the outcomes have non-zero probabilities. If the purpose of a model is to make forecasts, it is helpful to select covariates that are either deterministic or easy to forecast, since future covariate values are needed to Chapter 7. Models with covariates 142 forecast with these types of models. Next we consider the limiting distribution for a couple of simple cases. Example 7.2.1 Suppose that there are two different rates of new injuries, and each month the rate switches back and forth. That is X2t+x=Xx and X2t=X2, t = 0,1,2,-â€˘â€˘. Also assume that the recovery rate is constant over time, that is a , =a fixed. The unconditional distributions of X2t+X and X2t are Poisson with respective means, Xx+X2a +A,,a2 + +Xxa2'~2 + X2a2'~1 and X2+Xxa+X2a2+ +X2a2t~2+Xxa2'~l. There is no limiting distribution in this case. However the following subsequential limits hold X2t+x^P0(Â±f^-) and I 2 ( ^ P o ( ^ i ) . Also i f X0~Po[^^) then the unconditional or marginal distribution is X2t+X ~ P0{^^j and X2t ~ ^,(X2+a2x'). Example 7.2.2 Consider the following modification of Example 1. Suppose there are two distinct seasons (winter and summer) and that the two seasons have different injury rates, that is, x [A,, 1 = 0,1, 5 1 2 ' + ' 1^2 ' = '11 If X0 ~ ^ ^ ' ^ ^ ^ ' ^ m e n for / = l,2,---,6 the marginal distribution is Y P((l-a'h . ( l - a 6 ) ( a ' ^ ^ " ^ . ) \ , ( l - a > 2 (l-a')(a%+a"%) 1 2'+' Â° ^ 1-a + (l-a)(l-a 1 2) J ^ ^12<+/+6 ~ ro 1-a + ( l- a )(l-a 1 2 ) Chapter 7. Models with covariates 143 Example 2 is fairly realistic in that a lot of industries have two well defined seasons. Exposure to injury is often highest during the summer months, when there is a larger work force and more over-time. In contrast, during the winter months, when there is less work, the work force is smaller and those employees who are working are encouraged to take holidays. In British Columbia, the logging industry and the fishing industry are two examples of industries whose exposure to injury changes according to the season. In Section 3.4 we showed how to construct confidence intervals for the probability mass forecasts. The following modifications are needed when regressors are included in the model. Suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by y n and (3â€ž. We will assume that (y â€ž,Pâ€ž) is asymptotically normal with mean ( y 0 , P 0 ) r and variance n'H'1, where i is the Fisher information matrix and y 0 and p 0 are the "true" parameter values. The k step ahead probability function can be written as , i , r x T - H M ( I X ) X y 1 , x-s s~Â° S (X-S)\ where a n k = I X I l , a * > k = W>- 3 1 1 ( 1 X* = ^ n + i a n + i , k - i +'kn^n+2,k-i+ +^â€ž + *_ia I I + t _ l i l +Xn+Ic. By Theorem 3.4.1 for fixed x, pk(x\Xn;y â€ž,Pâ€ž) has an asymptotically normal distribution with mean pk (x\ Xn;y 0 , p 0 ) and variance Chapter 7. Models with covariates 144 < * * ( * ; Y O Â» P O ) = N I &y 7=Yo.P= dy + 2 Y=YO.P=PO dy dpk YP Y=Y 0 .P=PO ap + ax -1 5/7t Y=Yo.P=Po 8X (7.2.1) Y=Yo.P=Po where the matrix / 1 is partitioned as i = ZY 1 *YP *PY 'p and the matrices iy \ = (/pr') and are of dimension pxp, pxm and mxm. The partial derivatives of the point mass probability pk(x\Xn;y , P ) are given by 9 pk(x\Xa) = ^pk(x\XH)4-amJ[+4TP*W.)4-X dy da dy dX dy (7.2.2) and (7.2.3) Expressions for the partial derivatives pk(x\Xn) and â€”?p k(x\X ) are found in dan>k dX (4.4.1) and (4.4.3) respectively. The other 3 partial derivatives are given by ay <*â€ž,* =o.â€žtkJ^Yn+J(l-an+J)t ^X' = YX >-\ / j n+m n+m,k-m dy &i Chapter 7. Models with covariates 145 and 8 . * a o ^ _ Z_i ^ n+m^ n+nP" n+m.k-m ' where a n 0 s 1. A n approximate 95% confidence interval for pk(x\Xn;y 0 , P 0 ) is A (x|Xâ€ž;Y n ,p n )+2a , (x;Y n ,p n ) . (7.2.4) 7.3 Estimation The model parameter can be estimated using the Newton-Raphson iterative scheme discussed in Section 4.6. In this case, the vector of parameter is 9 =(y 1 5 y 2 , . . . ,y m , P 1 ; p 2 , . . . ,P . The addition of regressors in the model makes ÂŁ[f / (9 ( r ) )] less practical. In Chapter 8 we fit Poisson AR(1) models to some WCB data. In these models a is held constant over time. The parameter estimates were found using a quasi-Newton procedure, in which we calculate 17(0(r)) numerically. We found the following starting values worked well a = 0.9, pj = P 2 = = P P = 0. We now consider the asymptotic properties of the model. Let Uyn = ^ " _ , 4 ( <Â§~a/ and U ? n = Yt=Jh ap^f denote the score function for the departure parameters y and the arrival parameters P respectively, where 4 = 1 = i 0 g P t (Xt\ Xt_x), k, =-i;l = -i;lÂ°SPt(Xt\X,-i)> W Â° - , = Yta,(l-at) and =Z,X,.We also denote the Chapter 7. Models with covariates 146 martingale differences as uyt=Uyt -U t_x and wp, = - U^t_x. Using the expressions in Section 4.5 for the derivatives of the conditional probability the martingale differences can be written as Â«,,=*, p ^ - X ^ - V - p X X ^ ) j (\-at)Pt{Xt\Xt_x) atYt = pt(Xt-\\Xt_,-Y)-pt(X,\Xt_x) / {\-at)Pt(Xt\Xt_x) a,Yt = E,[at Xt_,]-atXt_x Yt and lip, = Pt{Xt-\\X,)-pt(Xt\Xt) lp,(X,\Xt)XtZ, = ÂŁ,[>,]-*., Zt. The following proposition shows this model can be identified. Proposition 7.3.1 The Poisson AR(1) model defined in Section 7.1 can be identified if and only if the following two matrices have full row rank: \Yx,Y2,...,Yn] and [Zj, Z 2 , . . . , Zn ]. Proof: A statistical model is identifiable i f its Fisher information is positive definite. It is therefore sufficient to show that 1 " L a.il-a,)Yt L a , ( l - a , ) K a E â€˘ . =0 (7 3 1) | M k^tzt k,^tzt | b i f and only i f ax=a2= =am=b1=b2= =bp=0, where a ={al,a2,...,am) and a\bT Y Chapter 7. Models with covariates 147 b = (bl,b2,...,bm) . The left hand side of (7.3.1) can be written as f^E (iaat{l-at)aTYt+ixXtbTZt)2 = f^E (c 1 (Z a | ( + c2jj , where cu =at(l-at)aTYl and c2t=XtbTZt. From Proposition 4.6.1 ÂŁ ^ c 1 ( / a ( f +c2tk,) = 0 ifandonlyif cu =c2t = 0. That is (7.3.1) holds i f and only i f aT[Yx,Y2,... , 7n] = 0 and bT[Zl,Z2,...,Zn] = 0 which by assumption hold i f and only i f a, =a 2 â€˘ If the regularity conditions for the martingale CLT hold, then the score can be used to make inference about the parameters y and p . Basically, i f the covariate processes are "well" behaved then these conditions will be satisfied. As mentioned in example 7.2.2, in many industries, exposure to injury is seasonal, in which case the addition of seasonal covariate to the arrival of injuries is appropriate. A common method for modeling seasonality in monthly data is to use indicator covariates for each month. Let Zt = {za,zt2,...,zn2f, where /'* component is one in month / and zero in all other months. That is, 1 i = j 0 i * y Z12MJ= n . . . *',./ = 1,2,...,12. Chapter 7. Models with covariates 148 Note, i f a constant is included in the regressors then one of the monthly indicators must be dropped. Often it is more appropriate to use seasonal indictors rather than monthly indicators. However, the number and length of the seasons is somewhat arbitrary. Some authors prefer to use sinusoidal covariates, such as Z, = sin(27ir/12) ,sin(27tr/12) T ,sin(27tf/12) . When all 12 sinusoidal components are included both models are the same. Usually only a few of the sinusoidal components are needed. However, this still generates 12 different monthly levels. Suppose we have a model with 12 values of the thinning parameter, one for each month of the year, that is, a 1 5 a 2 , ,ot12 are the thinning parameters for January, February, , December respectively. Similarly, suppose the model also has 12 values of the departure parameter, X,, X 2, ,Xl2, again corresponding to the 12 months of the year. Theorem 7.3.1 If X0 has a Poisson distribution with mean p.0 then the marginal distribution of Xj is Poisson with mean u. ; , where 1 1 2 Â»j = umod 1 2o) = : ÂŁ / ( M ) A * l-ala2 a 1 2 ^ and mod12(10+j+k) fU\k)= I"I ai+mod 1 2(/+A-l) ' /=1 Remark Note that \iJ = Paoiau)> t h a t is> xnt+k ~ p0(.Vk) f o r * = UÂ» ,12 and Chapter 7. Models with covariates 149 t = 0 ,1 ,2 , . . . . Proof: Since Xnt+k is a convolution of a 1 2 / + i ^ 1 2 , + t _ i and em+k> i t s niean must be equal to ak[ik_l + Xk, or, in other words, p t = a t p ( t _ , + X , t . We wil l show this equation holds for k = 1, and omit the tedious details for k = 2 ,3 , ,12 . We begin by writing p 0 and pj out in long form. That is, oi 2 ci 3 ocj2A.[ + ccjCt^ oi[2A.2+ ~Kx]2A.ii A.j2 1â€”a,^ a 1 2 and _ A . 1 + a 1 a 3 a 4 a 1 2 / l 2 + a 1 a 4 a 5 a 1 2 X 2 + +a]al2Xn +a1A,,2 l - a ^ a 1 2 Then . a j a 2 a 3 a 1 2 A, ,+a , a 3 a 4 a 1 2 A , 2 + a 1 a 4 a 5 a 1 2 X 2 + +alanXu+alXl2 ajPo+A,, -l - a , a 2 a 1 2 _ / l 1 + a 1 a 3 a 4 a 1 2 A . 2 + a , a 4 a 5 a 1 2 X 2 + +a1anXn+alXn l -a jd .2 a 1 2 = M i It can be shown that the sequence of random vectors (Xl2t+l,Xn 2, â€˘ ,Xi2t+12)T ^ is stationary. Proposition 7.3.2 If the stochastic process Xt follows the Poisson AR(1) model as Chapter 7. Models with covariates 150 defined in Section 7.1 and there exist an a m a v < 1 and a Xâ€žâ€ž < oo such that a, < a _ and X,, < Xmax for all t, then it is a -mixing a(ri) = O a"max . The proof of Proposition 7.3.2 is very similar to that of Proposition 4.3.1, but the notation required makes it tedious an is therefore omitted. Proposition 7.3.4 Under the assumptions of Proposition 7.3.2 all the moments of Xt exist and are finite. Further, all the moments of ut and ut exist and are finite, where ut = Ut â€”Ut_x, Ut is the score function and ut is the matrix of partial derivative of ut with respect to y and P . Proof: As noted in Remarks following Theorem 7.2.1 the marginal distribution of Xt is Poisson with mean Xt +Xt_lat +Xt_2atat_l+---+X0atat_l---al which is bounded by A max 1 ~ a mi / ( 1 - a max) < 0 0 â€˘ Hence, all the moments of Xt exist and are finite. The second part follows from the fact that both ut and ut are bounded polynomials in Xt and hence all of there moments exist and are finite. â€˘ Proposition 7.3.3 If in addition to the assumptions of Proposition 7.3.1 we assume that the model is idenifiable and Xt is stationary, then the Fisher information, i, is finite and Chapter 7. Models with covariates 151 Proof: By Proposition 7.3.2 Xt is a -mixing, which combined with stationarity implies that Xt is ergodic. As a consequence of Theorem 4.1.1 ut and ut are both ergodic. Further, ut and ut have finite variances due to Proposition 7.3.2. Hence conditions for Corollary 4.1.1 are therefore satisfied and the result follows. â€˘ 7.4 Testing In the first part of the section we consider testing for independence, or more formally testing the null hypothesis, H0:a = 0, against the alternative hypothesis, Ha:0<a < 1. We begin by finding the Fisher information matrix under the null hypothesis. Let lt-ptXt\Xt_x , lta=-Â§a-lt, hx,=~i7,h' 4 x = & f 4 x ' haX,=~Â§r,ha 3 1 1 ( 1 hx, = a hx, â€˘ A l l of these partial derivative are found in Section 4.4. When a = 0 we have the following simplifications: â€˘ _ Xt_xXt _ '(a â€” Â« A M i Kt Xt ' / - 2 Y ^ - Y + Y , Y 1 <X*{Xt~l) Y2 X ' A, A, A, Y Y Y2 Y â€” 0 Y ' Y Y2 1 Y I _i_ Y ' Chapter 7. Models with covariates X, , X] -X, XX1 i = _ <-i < <aX, 12 12 ^1 M ltX, . 2 *1 The expected values of these under the assumption a = 0 are: E ^ E ^ - X ^ = 0, E[hx] = E ^ - l = 0, ^ T^ Xt â€ž X, â€ž X ltâ€ž â€” E IlXT 1 XT i XT 1 i XF i Â« h i X* A. A* A* A, , A. /^+A.2, - ^ ( - I T 1 " ^ - ! " A,M+A,M - y - A , M 2 A,, A,, A.r â€” 01 1 ^ ' - i 1 i ^t-i â€” LK, , â€” A,, , A,, , H A,, A,j A,, A ( _ A,2M A,M Chapter 7. Models with covariates 153 = A,_,A.2 X,_xXt+X2t X] ~ X) _ _ j _ Next we calculate the partial derivative with respect to P . 'p ap ' tx- ap' 'lap ap taX 8Xt â€˘ ap 'pp ap 'p = f ex^dx/ . a^, A < ap ap ap2 * We then have Chapter 7. Models with covariates 154 X = -xt_xz,, xtzt and ap ap L ^ ' J ap .= -J-x2.z.z5+o-a2A" A,, â€˘"' ' ' ap 2 = - A , Z , Z j . The Fisher information is therefore given by i â€” 1 Â± + â€” Xt_xZt XtZtZt We will assume there exist a positive definite matrix / such that l i m n ^ 0 0 in -> i and let a 2 denote the first entry in r 1 . We define the following three sets of parameter "estimates": Let dn and Pâ€ž be the unrestricted maximizers of the likelihood, dn and pâ€ž be the maximum likelihood estimates (that is maximizing over the parameter space) and p* be the maximum likelihood estimate of p when a = 0. Under the null hypothesis V w a n / a converges in probability to a standard normal Chapter 7. Models with covariates 155 random variable and V n d n / a converges in probability to a random variable Z*, where Z* is defined as, 0 z<0 PZ <z = P(Z<z) z>0 where Z has a standard normal distribution. If we respectively define the Wald and likelihood ratio statistics as: Wn=na2Ja2 and 2log( A ) , where A = Z(dâ€ž, pâ€ž; X)/Z(0, p*; X). Then under the null hypothesis both converges to the usual chi-square random variable with one degree of freedom. However if redefine the Wald and likelihood ratio statistics as: WN =na2n/o2 and 2 log( A ) , where A = L(d n, P N; X)/L(0, P *; X), respectively. Then under the null hypothesis both converges to a modified chi-square random variable defined by, P X* = I + I i > ( X l <X), x>0, where x i has a chi-square distribution with one degree of freedom. Consider the following zero mean martingale i * M ( * ' 7 * M ~ x ' ' - g . ( C M Â» ) - Â» ÂŁ & where U^a,^) is the score function with respect to a . Under the null hypothesis Chapter 7. Models with covariates 156 ^an(O'P) i s a niean zero martingale. However, under the alternative it is necessary to " X2 subtract â€”â€” from ÂŁ7^(0, P) to get a zero mean martingale. Hence large positive values of ÂŁ7^(0, p) give evidence in favor of the alternative hypothesis. That is, the score test is one-sided. Next we consider the information matrix test for specification. We wil l allow covariates in both the arrival and departure process and use the same notation as in Section 7.3. That is, we let ut=Ut- Ut_x, where Ut denote the score function at time t and let ut denote the matrix of partial derivative of ut with respect to y and p . The information matrix test is based on the following martingale: Mn = mt, where mt = \Tm+p utuTt +ut lm+p and lm+p is a m+p vector of l 's . The quadratic variation of Mn is given by [M]n = ^"_im2. Under the assumptions of Proposition 7.3.3 mt is stationary and a -mixing. Further all the moments of mt exist and are finite. This is sufficient for the conditions of Theorem 4.1.1 to hold and hence [Af] ^ Mn converges in probability to a standard normal random variable. Chapter 8 8. Application to counts of workers collecting disability benefits In this chapter we analyze five data series drawn from the W C B data set. Section 8.1 contains descriptions of the five data series and preliminary analysis of the data. Next, in Section 8.2, we carry out our in-depth analysis, which includes: model estimation, selection and testing. We calculate forecasts for the first six months of 1995, in Section 8.3. Finally, in Section 8.4 we show what happens i f the Gaussian AR(1) model is fit to the data. 8.1 Workers' Compensation Data We have selected the following five data series for analysis in this chapter: Each series contains monthly counts of claimants collecting STWLB from the W C B . A l l the claimants are male, between the ages of 35 and 54, work in the logging industry and reported their claim to the Richmond service delivery location. The distinguishing difference between the five series is the nature of the injury. We will refer to the five series as data set 1, 2, 3, 4 and 5. The claimants in data set 1 have burn related injuries. In data set 2 the claimants have soft tissue injures, such as contusions and bruises. The claimants in data set 3 have cuts, lacerations or punctures. Claimants in data set 4 have dermatitis and data set 5 contains claimants with dislocations. 157 Chapter 8. Application to counts of workers collecting disability benefits 158 Table 8.1.1 contains a summary of simple descriptive statistics for the five data sets and Figures 8.1.1 through 8.1.5 contain time series plots for the five data sets. Plots of the sample autocorrelation function and sample partial autocorrelation function are found in Figures 8.1.6 and 8.1.7. Data Set Minimum Maximum Median Mean Variance 1 0 2 0 0.34 0.33 2 4 17 9 9.83 9.48 3 1 21 5 6.13 11.70 4 0 2 0 0.28 0.27 5 0 4 1 0.92 0.76 Table 8.1.1 A summary ofsimple statistics for data sets 1 through 5. A property of the Poisson AR(1) model is that the marginal mean and variance should be the same, see Proposition 2.3.1. In Table 8.1.1 we see that for each data set the mean and variance are close except for data set 3, where the variance is almost twice the mean. However, i f the Poisson mean is non-constant this would cause the variance to be larger than the mean. In the time series plot of data set 1, Figure 8.1.1, there is a significant change in the pattern after the middle of 1993. It therefore seems unlikely that an AR(1) model will fit the series well. This is further confirmed by the sample autocorrelation function, which doesn't decay fast enough for an AR(1) model. Also, the first two lags of the partial autocorrelation function are statistically significant at the 5% level, suggesting that an AR(2) model might be appropriate. However, an AR(2) model has the problem that it is not easy to interpret and it has more than one specification. See the discussion at the Chapter 8. Application to counts of workers collecting disability benefits 159 beginning of Section 2.4. The difficulty with analyzing a series with very low counts, such as data set 1, is that a single claim can drastically change the shape and correlation pattern of the series. For instance, in data set 1 there appears to be a single claimant collecting STWLB between May 1993 and December 1994. It is impossible to tell this for sure, since this same pattern could be caused by several individual claims, although, based on the earlier claims frequency in the series, this is less likely. Further investigation shows that our conjecture of a single long duration claimant is correct. Since the frequency of severe claims is low (one claim in ten years) and since our model is not designed to handle such claims we removed this outlier from data set 1. To distinguish this new series from the original we refer to it as data set 1*. The sample autocorrelation function and sample partial autocorrelation function for data set 1* are found in Figure 8.1.8. There is a slight seasonal pattern in the sample autocorrelation function. However, a seasonal model may not be necessary, since the correlations at lag 6 and 12 are well within the 5% confidence limits. The sample partial autocorrelation function indicates that an AR(1) model is appropriate. The claims counts in the second data set are significantly higher than in data set 1. Hence one or two persistent claims would not have a profound effect on the series' shape and correlation pattern. The time series plot of data set 2, Figure 8.1.2, looks stationary with possible seasonality. A seasonal pattern appears in the sample autocorrelations function, which has with large negative correlations at lags 7, 8 and 9, and large positive Chapter 8. Application to counts of workers collecting disability benefits 160 correlations at lags 13 and 14. The sample partial autocorrelation function is consistent with the AR(1) model. The claims counts in data set 3 are also relatively large, with a mean of 6.133. In July 1988 the claims count is unusually high at 21 and is the only observation above 14. The time series plot, Figure 8.1.3, shows a season pattern and a drop in the variation after January 1990. The sample autocorrelation function confirms a seasonal pattern, while the sample partial autocorrelation function suggest an AR(1) model is appropriate, see Figure 8.1.6. The claims counts in data set 4 are low. Between June 1990 and April 1991 there appears to be a persistent claim, again it is impossible to tell for sure without further investigation. The later half of the series appears to have a slightly lower claims frequency than the first half of the series. The sample autocorrelation function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model. In data set 5 the claims counts are low with slightly higher claims counts occurring between January 1990 and December 1992, which again could be caused by a single claimant with a severe or reoccurring dislocation. The sample autocorrelation function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model. Chapter 8. Application to counts of workers collecting disability benefits Time Series Plot (data set 1) Month Figure 8.1.1 A time series plot of data set 1. Time Series Plot (data set 2) i i i i i i i i i C ( o C | s . C o O C o } C o C , â€” C c M ^ O C T t TOOOJOOOJDOOJOOOJOOJ^OJJOOJJOOJJOO) Month Figure 8.1.2 A time series plot of data set 2 Time Series Plot (data set 3) Month Figure 8.1.3 A time series plot of data set 3 Chapter 8. Application to counts of workers collecting disability benefits Time Series Plot (data set 4) Month Figure 8.1.4 A time series plot of data set 4 Time Series Plot (data set 5) Figure 8.1.5 A time series plot of data set 5. Chapter 8. Application to counts of workers collecting disability benefits 163 ACF (data set 1) l i f t Confidence Limits 1 3 5 7 9 11 13 15 2 4 6 8 10 12 14 16 Lag Number ACF (data set 2) o -1.0 8 ^ -'MM â€˘ | H " Confidence Limits 1 3 5 7 9 11 13 15 2 4 6 8 10 12 14 16 Lag Number ACF (data set 3) I Bii-Confktence Limits 1 3 5 7 S 11 13 15 2 4 6 S 10 12 14 16 Lag Number PACF (data set 1) ! â€˘ flu Confidence Limits 3 5 7 2 4 6 3 11 13 15 10 12 14 16 Lag Number PACF (data set 2) Confidence Limits 1 3 5 7 9 11 13 15 2 4 6 8 10 12 14 16 Lag Number PACF (data set 3) â€˘ 111 wB HÂ«Â» Confidence Limits 3 5 7 9 11 13 15 2 4 6 8 10 12 14 16 Lag Number Figure 8.1.6 ACF's and PACF's for data sets 1 to 3. Chapter 8. Application to counts of workers collecting disability benefits 164 ACF (data set 4) Confidence Limits 1 3 5 7 9 11 13 15 2 4 6 I Lag Number ACF (data 5) 10 12 14 16 Confidence Limits 1 3 5 7 11 13 15 10 12 14 16 PACF (data set 4) LL -.5 O < P"ini"ii""!!^ lZii^ B â€˘ â€˘ â€˘ â€” a Confidence Limits HUfcoefficiant 1 3 5 7 9 11 13 15 2 4 6 8 10 12 14 16 Lag Number PACF (data set 5) ConfKdence Limits 1 3 5 7 9 2 4 6 8 11 13 15 10 12 14 16 Lag Number Lag Number Figure 8.1.7ACF's and PACF's for data sets 4 and 5. ACF (data set 1*) 8 -i.o t r Confidence Limits 3 5 7 2 4 6 11 13 15 10 12 14 16 PACF (data set 1*) ÂŁ -1.0 M JLE-, Confidence Limits 1 3 2 9 11 13 10 12 1 Lag Number Lag Number Figure 8.1.8 ACF and PACFfor data set 1 * Chapter 8. Application to counts of workers collecting disability benefits 165 8.2 Model selection and testing In this section we select and estimate a Poisson AR(1) model for each of our data sets. We restrict the class of models by considering only models where the departure rate is fixed, that is, a , = a for all t, and where the arrival rate is either constant or depends on sinusoidal seasonal regressors. The parameter estimates for data sets 1*, 2, 3, 4 and 5 are summarized in Table 8.2.3. Our preliminary analysis in Section 8.1 suggested the following need for regressors: data sets 1* and 2 might need seasonal regressors, data set 3 wil l almost certainly need seasonal regressors and data sets 4 and 5 will not require seasonal regressors. In analyzing data set 1*, we fail to find any seasonal regressors for which the coefficients were significantly different from zero, at the 5% level. We therefore proceeded to analyze the model with a constant arrival rate. The information matrix test statistic for the joint test that neither the departure nor the arrival parameters are stochastic is 0.240 (P-value 0.81), while for the individual test that the arrival parameter is non-stochastic is 0.134 (P-value 0.89). In both cases we accept the null hypothesis of non-stochastic parameters. Henceforth in this section we will refer to these two tests as simply the joint information matrix test and the individual information matrix test. In Table 8.2.3 we see that the lower bound of the 95% confidence interval for a is close to zero, 0.007. It is therefore worth testing for independence in this series. In Table Chapter 8. Application to counts of workers collecting disability benefits 166 8.2.1 we see that all of the tests reject independence at the 5% level and that the CLS and Wald test reject independence at the 1% level. We conclude that it is unlikely the series is independent. In Figure 8.2.1 we have included three residual plots: Pearson, continuation and arrival, see Section 4.4 for the development of continuation and arrival residuals. Recall the Pearson residuals can be decomposed into the continuation and arrival residuals as follows: rt = Xt - a I M - X = E,[Xt]-oX,_x-X = Et[a o I M + ÂŁ , ] - O L Y ( _ 1 -X = Et[a o J M ] - a I M + i?,[e,]-A where rXt = Et[a Â°X,_x]-aXt_x and r2t = i? ([s,]-X are, respectively, the continuation and arrival residuals at time t. The residual can be standardized as follows: rt j Et_x^rf , rXjjJE7/_1 [ l* 2 and r2t j [ r 2 ? ]^ . Recall Et is the operation of expectation conditional on 3, =G(X0,Xx,...,Xt). Since the continuation and arrival residuals are new ideas, we now analyze these residuals in detail for data set 1*. We begin by considering the following cases for the standardized continuation residuals of data set 1*: Case 1. Xt_x = 0. Chapter 8. Application to counts of workers collecting disability benefits 167 In this case, a Â°Xt_x given Xt_x = 0, is not random but identically equal to zero, hence Et[a o I M ] = 0. Since aXt_x is also equal to zero in this case, the residual at time t is zero. This is a key observation and an important property. It shows that, in this case, all of the deviation between the observed value of Xt and its expected value at time t-1 is due to the arrival process and not the continuation process. Case 2. Xt_x = 1 and Xt = 0. In this case, a o J , . , given Xt=0, is non-random and equal to zero since nobody continues. Therefore Et [a Â° Xt_x ] = 0. Since oX,_x is positive, the residual at time t is negative. For data set 1* the standardized residual is -0.709. Case 3. Xt_x = 2 and X, = 0. This case is similar to case 2, the difference is that oXt_x is twice as large as in case 2. For data set 1* the standardized residual is -0.919. Note that the residual is not twice the value in case 2, since, in this case, the standardization is conditional on Xt_x = 2 and not Case 4. Xt_x =1 andX, =1. In this case, a Â°Xt_x given Xt_x = 1 and Xt = 1, is a random variable taking values in the set {0,1}. That is, the one individual collecting at time t is either continuing to collect from time t-1 or is a new claim at time t. For data set 1* the standardized residual is Chapter 8. Application to counts of workers collecting disability benefits 168 1.364. Case 5. Xt_x = 1 and Xt = 2. In this case, a Â° Xt_x given Xt_x = 1 and Xt = 2 is again a random variable taking values in the set {0,1}. Note this is a different random variable than the one in case 4, that is P(aoXt_x = l\Xt_x =\,Xt = 2)>P(aoXt_x =\\Xt_x =\,Xt =l) . At time t there are two possibilities: two arrivals or one arrival and one continuing claim. For data set 1* the standardized residual is 1.727. Although other cases are possible these are the only five observed in data set 1*. In a similar manner we now analyze the standardized arrival residuals for data set 1*. Casel. Xt_x =0 andX, =0. Conditional on Xt = 0, e, is non-random and equal to zero, hence ÂŁ,[ÂŁ,] = 0. This results in a negative residual. For data set 1* the standardized residual is -0.366. Case 2. X,_x = 1 and X, = 0. This case is the same as case 1 except that the scaling (standard deviation conditional on Xt_x = 1) is different. For data set 1* the standardized residual is -.520. Case 3. Xt_x = 2 and Xt = 0. This case is the same as cases 1 and 2 except that the scaling (standard deviation Chapter 8. Application to counts of workers collecting disability benefits 169 conditional on Xt_x = 2) is different. For data set 1* the residual is -.654. Case 4. Xt_x =0 and Xt =1. In this case, e ( given Xt_x = 0 and Xt = 1 is non-random and equal to 1 For data set 1* the standardized residual is 2.366. Case 5. Xt_x = 1 and X, = 1. In this case, s ( given Xt_x = 1 and Xt = 1 is a random variable taking values in the set {0,1}. Thus ÂŁ,[e,] > 0. For data set 1* the standardized residual is 0.638. Case 6. Xt_x = 1 and Xt - 2. In this case, e t given Xt_x = 1 and Xt = 2 is a random variable taking values in the set {1,2}. Thus E,[et] > 0. For data set 1* the standardized residual is 4.043. For data set 2 we found the coefficients for the following seasonal regressors statistically significant at the 5% level: sin(27t t/12) and COS(2TI t/\2). If these regressors are essential to the model then the information matrix test for a simpler model should reject the null hypothesis of non-stochastic parameters. For the model with a constant arrival rate the joint information matrix test statistic is -.537 (P-value 0.59), while the individual information matrix test statistic is 0.417 (P-value 0.68). In both cases we accept the null hypothesis of non-stochastic parameters. That is, the model with a constant arrival rate is sufficient to explain the variation observed in the series. The Chapter 8. Application to counts of workers collecting disability benefits 170 simulation in Section 6.3 showed that the information matrix test had good power against the misspecification considered. However, the power of the test may be lower for other types of misspecification. We therefore need to check the residuals before making any conclusions. The residual plots for this simple model, Figure 8.2.2, look random. Further none of the sample autocorrelation for the residuals are statistically significant at the 5% level. We therefore choose to use the simpler model with a constant arrival rate. In our analysis of data set 3 we found the following seasonal regressors statistically significant at the 5% level: sin(27tf/12) and COS(2TC t/12). The joint information matrix test for this model is 1.008 (0.31 P-value). Figure 8.2.3 shows the residuals plotted in chronological order. At the 5% level, none of the sample autocorrelations are significant. In Figure 8.2.3 the residuals are plotted against the two seasonal regressors. These plots indicate no significant problems. Before we can go ahead and accept this model we need to show that simpler models are not adequate. We consider three simpler models with the following regressors: constant only (model 1), constant plus sin(27i t/12) (model 2) and constant plus cos(27r t/12) (model 3). The joint information matrix tests for these three models are summarized in Table 8.2.4. For model 1, the Pearson and continuation residuals have a significant sample autocorrelation at lag 12, while the arrival residuals have significant sample autocorrelations at lags 2 and 12. These correlations in the residuals along with the low P-Chapter 8. Application to counts of workers collecting disability benefits 171 value for the joint information matrix test lead us to reject model 1. The Pearson, continuation and arrival residuals for model 2 have significant sample autocorrelations at lags 2, 12 and 2, respectively. Although the P-value for the joint information matrix test is above .05 we reject the model because the residuals are correlated. In the case of model 3 the residuals appear to be uncorrelated, however we reject the model due to the low P-value of the joint information matrix test. In the case of data set 4 we fit a model with a constant arrival rate. The joint information matrix statistic is 0.222 (0.82 P-value). This leads us to accept the null hypothesis that the parameters are non-stochastic. In the three residual plots, see Figure 8.2.5, the suspected persistent claim is evident (observations 54-64). In the case of the Pearson and arrival residuals this causes a band of residuals close to zero. However, in the case of the continuation residuals this band of residuals is quite far from zero. This causes the lag 1 sample autocorrelation for the continuation residuals to be significant at the 5% level. Otherwise the residuals for this model look good, and we decide not to remove or further investigate the suspected outlier. For data set 5 we also fit a model with a constant arrival rate. The joint information matrix test statistic is -0.514 (0.61 P-value) and therefore we accept the null hypothesis of non-stochastic parameters. The residual plots are found in Figure 8.2.6. None of the sample autocorrelation for the residuals are significant at the 5% level. Chapter 8. Application to counts of workers collecting disability benefits 172 Recall duration is the number of months that a claimant collects STWLB. In Section 2.2 we showed that the mean duration is ( l - a ) _ 1 . We further showed how to construct 95% confidence intervals for the mean duration in Section 3.5. The mean durations, Table 8.2.5, for data sets 1*, 2, 3 and 4 are between one and two months. In the case of data set 5 the mean duration is longer, between two and three months. Test Statistic Value 5% Critical Value 1% Critical Value CLS 2.456 1.645 2.33 Our Score 2.256 1.645 2.33 Wald 6.893 2.71 5.41 L R 5.037 2.71 5.41 Usual Score 5.090 2.71 5.41 Table 8.2.1 Tests for independence in data set I *. Month Arrival Rate January 2.353 February 2.415 March 2.737 April 3.310 May 4.060 June 4.783 July 5.177 August 5.043 September 4.450 October 3.680 November 3.000 December 2.547 Table 8.2.2 This table displays the seasonal arrival rate for data set 3. Chapter 8. Application to counts of workers collecting disability benefits 173 Data Set Parameter 1 imei Bd Estimate I ppci Hd 1* a (><)()- 0.240 0.472 1* X 0 064 0.134 0.204 2 a D.344 0.472 0.599 2 X 3.898 5.188 6.478 3 a 0.294 0.406 0.519 3 P, (constant) 1 dV) 1.250 1.4M 3 P 2 (sin(2jr?/12)) -d401 -0.243 -0.051 3 P 3 (cos(2jtr712)) -0.483 -0.315 -0.147 4 a H B f l H 0.404 0.604 4 X mum 0.170 0 2M 5 a 0.539 0.652 0.765 5 X d.2Â«w 0.333 0.457 Table 8.2.3 This table summaries the parameter estimation for data sets 1 to 5; included are the parameter estimates and the upper and lower 95% confidence limits. Model Test statistic P-value 1 2.313 0.02 2 1.729 0.08 3 2.114 0.03 Table 8.2.4 this table summarizes the joint information matrix test of models 1,2 and 3 on data set 3. Data set Duration 95% confidence interval 1* 1.316 (0.881,1.717) 2 1.894 (1.397,2.352) 3 1.684 (1.338,2.002) 4 1.678 (1.066,2.242) 5 2.874 (1.864,3.804) Table 8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*2,...,5. Chapter 8. Application to counts of workers collecting disability benefits 174 Pearson Residuals (data set 2) Arrival Residuals (data set 2) â€˘ â€˘ â€˘ â€˘ Continuation Residuals (data set 2) Figure 8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2. Chapter 8. Application to counts of workers collecting disability benefits 175 Pearson Residuals (data set 3) Arrival Residuals (data set 3) Continuation Residuals (data set 3) Figure 8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3. Chapter 8. Application to counts of workers collecting disability benefits 176 Pearson Residuals s i n ( 2 ' p l / 1 2 ) Continuation Residuals iJ s in (2 'p i f 12) Arrival Residuals s i n ( 2 * p i / 1 2 ) Pearson Residuals cos ( 2 ' p i / 1 2 ) Continuation Residuals c o s ( 2 * p i / 1 2 ) Arrival Residuals H I c o s ( 2 * p i / 1 2 ) Figure 8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3. Chapter 8. Application to counts of workers collecting disability benefits 177 Figure 8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 4. Chapter 8. Application to counts of workers collecting disability benefits 178 8.3 Arr iva l process The Poisson AR(1) model assumes the continuation and arrival processes are not directly observable. For the five data sets described in Section 8.1 we were able to obtain the arrival data from the WCB. To distinguish the two sets of data we will refer to the arrivals as data sets 1A, 2A, 3A, 4A and 5A. In general data of this form may not be readily available, since it requires more detailed record keeping. Since these data are available we consider the following two questions: how well were we able to estimate the arrival rates with the Poisson AR(1) model? Are the arrival processes Poisson? We begin by estimating the arrival rates for the arrival data, which we assume to be independent and Poisson. For data sets 1A, 2A, 3A and 5A we assume a constant mean, while for data set 3A we assume a seasonally changing mean. The maximum likelihood estimates and 95% confidence intervals for the parameters are found in Table 8.3.1. For data sets 1*, 4 and 5 the parameter estimates from the Poisson AR(1) model are contained within these 95% confidence limits. The estimated arrival rate for data set 2A, 4.475, is contained in the some what wider 95% confidence interval for the same parameter in the Poisson AR(1) model. In the case of data set 3, two of the three parameter estimates from the Poisson AR(1) model are contained within the 95% confidence intervals for the same parameter as calculated from data set 3A. The estimate for p, in data set 3A is contained in the 95% confidence interval for p j as calculated from data set 3A. Chapter 8. Application to counts of workers collecting disability benefits 179 Table 8.3.2 displays the estimated seasonal arrival rate for data set 3A. These rates are slightly lower than those estimated by the Poisson AR(1) model. Also note the seasonal pattern has shifted by one month. That is, the lowest and highest months are respectively February and August, where as for the Poisson AR(1) model they are January and July. Overall, the Poisson AR(1) model appears to do a good job at estimating the arrival rate. Next we check the Poisson assumption. If the Poisson assumption is correct we would expect the mean and variance of each arrival process to be close. This is the case for data sets 1A, 4A and 5A. However, for data set 2A the variance, 7.171, is "much larger" than the mean, 4.475. Since the arrival rate is non-constant for data set 3A we would expect the variance to be larger than the mean. The Poisson specification can be formally tested by the information matrix test as follows. The Poisson probability function is p{Xt) = The first and second derivative of log[p(X,)] are: and Chapter 8. Application to counts of workers collecting disability benefits 180 2\og[p(Xt)] = - ^ . dX1 ~L"V " J x Therefore the information matrix test statistic is x - x 2-X, t=\ A 2 Since the denominator is constant, we consider the following statistic instead un(X) = ÂŁ { x , - x 2 - x t } . -V We have that [U(X)]~/2Un(X)^>N 0,1 , since the data are independent and identically distributed with finite moments. Note the test is basically checking whether the mean and variance are the same. The parameter X is unknown, so we replace it with X (the maximum likelihood estimate). Un(X) is related to Un(X) as follows: n-y>Uâ€ž(X) = n-y>Â±{xt-X2-Xt} t=\ = n-y^lxt-X 2-Xt+ X-X2 +2 Xt-X X-x] = Â«^X{x,-X 2-Xt\+n/2 X-X2 +2 X-X Xt-X = Â» - % â€ž ( * . ) + 0 , ( 1 ) . The results of the information matrix test are summarized in Table 8.3.3. In the case of data set 2A, the P-value is small which suggests the arrivals are unlikely to be Chapter 8. Application to counts of workers collecting disability benefits 181 Poisson. Finally we check the assumption of independent arrivals. None of the first 12 sample autocorrelations for data sets 1A, 3A and 5A were significant at the 5% level. Note for data set 3A we used the sample autocorrelations in the residuals. For data set 2A lags 1 and 12 had autocorrelations that were significant at the 5% level. This indicates possible seasonality as well as violation of the independence assumption. For data set 4A the sixth lag was significant at the 5% level, indicating possible seasonality. To conclude, the Poisson AR(1) model adequately estimates the arrival parameter. A l l of the data sets expect data set 2A appear to be Poisson and independent. Parameter Duu 1 MlllKlk 95% C.I. Data Set Estimate 95% C.I. X 1A 0.158 (0.087,0.229) 1 0.134 (0.064,0.204) X 2A 4.575 (4.192.4.958) 2 5.188 (3.898,6.478) p, (constant) 3A â€˘ â€˘ J I M (1.016,1.226) 3 1.250 (1.039,1.461) p 2 (sin(2nf/12)) 3 A -0.381 (-0.526,-0.236) 3 -0.243 (-0.401,-0.051) P 3 (cos(2^/12)) 3 \ -0.281 (-0.412,-0.137) 3 -0.315 (-0.483,-0.147 X 4A 0.150 (0.081,0.219) 4 0.170 (0.090,0.251) X 5A 0.308 (0.:<W.().4M-) 5 0.333 (0.209,0.457) Table 8.3.1 This table summaries the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence interval. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model. Chapter 8. Application to counts of workers collecting disability benefits 182 Arrival Rate Arrival Rate Month (arrival data) (Poisson AR(1)) January 1.987 2.353 February 1.916 2.415 March 2.095 2.737 April 2.538 3.310 May 3.235 4.060 June 4.064 4.783 July 4.735 5.177 August 4.911 5.043 September 4.490 4.450 October 3.706 3.680 November 2.908 3.000 December 2.315 2.547 Table 8.3.2 This table displays the seasonal arrival rate for data set 3 A. Data Set Test P-value Statistic 1A -0.543 0.59 2A 2.954 0.00 3A 1.407 0.16 4A -0.385 0.70 5A 0.170 0.87 Table 8.3.3 This table summarizes the information matrix test for data sets 1A-5A. 8.4 Forecasting In this section we calculate forecasts for the first 6 months of 1995, which are found in Tables 8.4.1-8.4.6. Since the models used for data sets 1*, 2,4 and 5 are simple, that is the arrival and departure rates are fixed over time, we can apply the forecasting techniques of Chapter 3. For these four data sets we have calculated individual 95% confidence intervals for the k-step ahead conditional distribution k=l,2,...,6,oo. Chapter 8. Application to counts of workers collecting disability benefits 183 Note that in each case the 95% confidence intervals for the 6-step ahead conditional distribution are very close to the 95% confidence intervals for the marginal distribution. Therefore i f we require forecasts beyond six months into the future we can simply use the marginal distribution. A few of the lower bounds in Tables 8.4.4 and 8.4.5 are negative. This is because the confidence intervals are constructed by applying an asymptotic normal result to a finite sample and are therefore only approximate. However, the confidence intervals that are affected by a negative lower bound are for probabilities that are very small (less than 1%). In Section 7.2 we show how to construct individual confidence intervals for the k-step ahead conditional distribution when the arrival and departure rates depend on regressors. However, to practically apply these results would require a significant amount of programming. Therefore for data set 3 we have only calculated the k-step ahead conditional distribution. A possible alternative for constructing confidence intervals for the k-step ahead conditional distribution is to use a bootstrap method, see Efron (1982), which may be more practical than the method in Section 7.2. The marginal distribution for data set 3 is given in Table 8.4.6. Notice the marginal mean peaks in August, which is one month after the arrival rate peaks, see Table 8.2.2. Also note, there is a fair bit of difference between the 6-step ahead conditional distribution and the marginal distribution for June. Chapter 8. Application to counts of workers collecting disability benefits 184 k 1 k = 2 k - 3 k = 4 k - 5 k = 6 k ^(o| i) 0.812 0.937 0.773 0.920 0.762 0.919 0.758 0.920; 0.757 0.920 0.756 0.920 i'."5h H O M 0 063 0 171 0.079 0.202 0 08) 0 211 0.081 0.214:0.080 0.215 0.080 0.215 0.080 0.215 Pkw) 0 000 0 016, 0.000 0.023 0 0()0 0 02S 0.000 0.026; 0.000 0.026 0.000 0.026 0 000 0 026 PkW) o o o o o o o r o.ooo 0.002 0.000 0.002 0.000 0.002- 0.000 0.002 0.000 0.002 0.000 0.002 Pkw) 0 000 0 000 0.000 0.000 0 000 0 000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Table 8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1 *. k - 1 1 k = 2 k 3 k = 4 k 5 k = 6 k - - * ! Pt(0|7) 0000 0000 0.000 0.000 0 000 0 000 0.000 0.000. 0 000 0 000 0.000 0.000 0.000 0.000 Pk(P) 0000 0 001 [ 0.000 0.001 0 000 0 001 0.000 0.001 0 000 0 001 0.000 0.001 0 000 0 001 PkW) O001 0 00'' 0.001 0.007 0 001 0 006 0.001 0.005 0 001 0 005 0.001 0.005 0.001 0.004 pkm 0.004 0 024 0.004 0.020 0.004 0 017 0.004 0.015 0 003 0 014 0.003 0.014 0.003 0 014 PkW) 0 014 0 057 0.013 0.044 0 012 0 03Xi 0.011 0.035 0 010 0 033; 0.010 0.033 0 010 0 032 Pk(p) 0 034 0 104 0.030 0.078 0 02" 0 06* 0.025 0.063' 0 024 0 061 0.024 0.060 , 0.023 0.059 PAW 0.066 0.149 0.055 0.114 0.050 0 101 0.047 0.096 0 046 0 093 0.045 0.092 0 044 0.091 PM 0 103 0 174 0.085 0.140 0 078 0 129 0.074 0.123 0.0~"2 0 121 0.072 0.120 0.071 0.119 Pk{P) 0 135 0 169 0.112 0.149 0.104 0 142 0.101 0.138,0099 0 137 0.098 0.136 0.098 0.136 Pki$l) 0 13" 0 151j 0.130 0.139 0 123 0 HX 0.120 0.137 0 119 o i r 1 0.118 0.137 0 118 0 13" ^(10|7) 0094 0 145 0.114 0.133 0 119 0 129 0.121 0.128| 0 122 0\r 0.123 0.127 0123 0 127 ft(H|7) 0.054 0.122 0.083 0.122 0.092 0 122 0.096 0.123 0 098 0 123 0.099 0.123 0.100 0.123 P*(12|7) 0 025 0 091 0.054 0.102 0 065 0 106 0.069 0.107 0 072 0 108 0.073 0.109 0.073 0.109 Pt(l3|7) 0 008 0 061 0.031 0.077 0 041 0 083 0.046 0.086 ' 0 048 0 087 0.049 0.088 0.049 0.089 PkWT) 0 001 0.037i 0.016 0.054 0 024 0 061 0.027 0.064' 0 029 0 06* O.030 0.066, 0.030 0.066 Table 8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2. Chapter 8. Application to counts of workers collecting disability benefits 185 k 1 k-2 k-3 k-4 k-5 k-6 mean 4.383 4.194 4.440 5.113 6.136 7.274 median ' 4 4 5 5 6 7 mode 4 4 4 5 6 6 P*(0|5) 0.007 0.007 0.005 0.003 0.001 0.001 Pk(P) 0.041 0.039 0.029 0.018 0.010 0.005 P*(2|5) 0.109 0.105 0.085 0.058 0.034 0.020 Pk(P) 0.182 0.177 0.154 0.117 0.078 0.051 P*(4|5) 0.213 0.211 0.198 0.170 0.130 0.095 Pk{5\5) 0.187 0.189 0.193 0.189 0.167 0.138 Pkffl) 0.131 0.134 0.150 0.168 0.173 0.162 ^ ( 7 I 5 ) 0.074 0.078 0.096 0.124 0.150 0.159 Pt(Â«|5) 0.035 0.038 0.051 0.078 0.110 0.134 # ) 0.014 0.016 0.024 0.042 0.071 0.098 ft(l0|5) 0.005 0.006 0.010 0.020 0.040 0.064 ft(ll|5) 0.002 0.002 0.003 0.008 0.020 0.037 P*(12|5) 0.000 0.001 0.001 0.003 0.009 0.020 0.000 0.000 0.000 0.002 0.006 0.016 Table 8.4.3 The k-step ahead conditional means, medians, modes and point mass forecast for data set 3. k- 1 1 k= 2 k 3 k=4 k 5 k= 6 ft(0|l) 0 V 1 0^12 0.696 0 S " 7 0.669 0 8 6 2 0.655 0.85V 0 647 0 861 0 643 0.862 0 640 0 8 6 4 PkW) 0 086 0 202 0.120 0.257 0.134 0 2_,5 0.137 0.284 0 136 0 2 9 0 0.136 0.292 0 134 0 295 ft(2|l) 0 001 0 023 0.004 0.042, 0 005 0 050 0.005 0.054 0 004 0 056 0.004 0.057 0 003 0 058 Pk{$) 0.000 0.002 -0.001 0.004 -0.001 0.006 -0.001 0.006 -0 001 0 00" -0.001 0.007 -0.001 0.007 PkW) 0 000 0 000 0 000 0.000 0 000 0 000 0.000 0.001 0 000 0 001 0.000 0.001 0 000 0 001 Table 8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4. k= 1 | k= 2 k-3 k=4 k=6 1 â€” />*(o|i) 0.032 0.141; 0.135 0.247 0.196 0.327, 0.221 0.392 0.234 0.435 0.241 0.463 0 235 0 534 0 289 0 420 0.338 0.436 0 336 0 427 0.347 0.404 0.357 0.387 0.363 0.377 0 361 (H74 A(2|l) 0.327 0.509 0.241 0.335 0 203 0 276 0.170 0.260 0.149 0.252 0.136 0.248 0 101 0 250 WKKKKKKKtm M M M H N I I BSill'lilillllllll ll^^Hlif^^Bil PkW) 0.080 0.160; 0.067 0.144 0.049 0.129 0.038 0.118 0.031 0.110 0.026 0.105 0 009 0 103 PM 0 007 0 0*1 0.004 0.045 0 003 0 04* â€˘ 0.003 0.038 0 002 0 034 0 001 0.032 -0 003 0 0*0 Table 8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5. Chapter 8. Application to counts of workers collecting disability benefits 186 Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec. k-1 k=2 k-3 k=4 / k = 5 k-6 . k=7 k-S k-9 k-10 k-11 k-12 mean 4.341 4.177 4.433 5.110 6.135 7.274 8.130 8.344 7.838 6.862 5.786 4 896 median 4 4 4 5 6 7 8 8 8 7 6 5 mode 4 4 4 5 6 7 8 8 7 6 5 4 P*(0) 0.013 0.015 0.012 0.006 0.002 0.001 0.000 0.000 0.000 0.001 0.003 0.007 PkO) 0.057 0.064 0.053 0.031 0.013 0.005 0.002 0.002 0.003 0.007 0.018 0.037 Pk(2) 0.123 0.134 0.117 0.079 0.041 0.018 0.010 0.008 0.012 0.025 0.051 0.090 PkO) 0.178 0.186 0.172 0.134 0.083 0.044 0.026 0.023 0.032 0.056 0.099 0.146 PkW 0.193 0.195 0.191 0.171 0.128 0.081 0.054 0.048 0.062 0.097 0.143 0.179 Pk(5) 0.167 0.163 0.169 0.175 0.157 0.118 0.087 0.080 0.097 0.133 0.166 0.175 Pk(6) 0.121 0.113 0.125 0.149 0.160 0.143 0.118 0.111 0.127 0.152 0.160 0.143 PkO) 0.075 0.068 0.079 0.109 0.141 0.148 0.137 0.133 0.142 0.149 0.132 0.100 P*(8) 0.041 0.035 0.044 0.070 0.108 0.135 0.139 0.139 0.139 0.128 0.096 0.061 Pk(9) 0.020 0.016 0.022 0.040 0.073 0.109 0.126 0.128 0.121 0.097 0.061 0.033 P*(10) 0.009 0.007 0.010 0.020 0.045 0.079 0.102 0.107 0.095 0.067 0.036 0.016 Pk (11) 0.003 0.003 0.004 0.009 0.025 0.052 0.076 0.081 0.068 0.042 0.019 0.007 Pt(12) 0.001 0.001 0.001 0.004 0.013 0.032 0.051 0.057 0.044 0.024 0.009 0.003 0.001 0.000 0.001 0.002 0.010 0.035 0.070 0.082 0.056 0.024 0.007 0.002 Table 8.4.6 The marginal means, medians and distributions for data set 3. 8.5 Gaussian AR(1) models In this section we fit the Gaussian AR(1) model, see Example 4.1.1 for the model definition, to our five data sets and compare the results to the Poisson AR(1) model. Table 8.5.1 contains the parameter estimates for the Gaussian AR(1) model. Most of the estimates are quite close to the corresponding parameter estimates of the Poisson AR(1) model. The largest difference is in the estimate for a in data set 4, where the Gaussian model estimates 0.291 and the Poisson model estimates 0.404. However, the Gaussian estimate is still well within the 95% confidence interval for a given by the Poisson model. Chapter 8. Application to counts of workers collecting disability benefits 187 The Gaussian model gives wider 95% confidence intervals for X than the Poisson model. In fact for data set 1* the Gaussian 95% confidence interval for X includes zero. For data sets 2, 3 and 5 the Gaussian models give wider 95% confidence intervals for a than the Poisson model. While for data sets 1 and 4 the Gaussian model gives narrower 95% confidence intervals for a than the Poisson model. Note the parameters P,, P 2 M & P 3 for data set 3 can not be directly compared to the corresponding parameters in the Poisson model, since in the Gaussian model we have not used an exponential link function. However, the estimated arrival rates for the Gaussian model, Table 8.5.2, can be directly compared to the arrival rates of the Poisson model. While the arrival rates calculated by the Poisson AR(1) model are slightly higher than those calculated directly from the arrival data the arrival rates calculated by the Gaussian AR(1) model are slightly lower those calculated directly from the arrival data. The biggest advantage in using the Poisson AR(1) model over the Gaussian AR(1) model is in the area of forecasting. Figure 8.5.1 displays Poisson AR(1) model forecasts and the Gaussian AR(1) model forecasts for the first six months of 1995 for data set 1*. From the Poisson AR(1) model we have calculated the forecast distribution, which is the bar portion of the chart. The three lines on the chart mark the predicted values and the 95% prediction interval for the Gaussian AR(1) model. The cumulative probabilities are marked on the left Y-axis. For the 1-step ahead distribution there is a 0.875 probability that the count will be zero and a 0.117 probability that the count will be 1. While for the 6-step ahead distribution there is 0.838 probability Chapter 8. Application to counts of workers collecting disability benefits 188 of a count of zero and a 0.148 probability of a count of 1. The numbers for the predicted values in the Gaussian AR(1) model are labeled on the right Y-axis. The mean predicted value is about 0.2 and the 95% prediction interval is approximately between -0.6 to 1.0. The lower bound of the 95% prediction interval as given by the Gaussian AR(1) model is also negative for data sets 3, 4 and 5. In data set 2 the counts are sufficiently high, so that the lower bound is positive. This illustrates the importance of using appropriate distributional assumptions in modeling. Data Set Parameter Gaussian Estimate Gaussian 95% C I . Poisson Estimate Poisson 95% C I . 1* a 0.221 (0.047,0.395) 0.240 (0.007,0.472) 1* X 0.136 (-0.045,0.317) 0.134 (0.064,0.204) 2 a 0.449 (0.289,0.609) 0.472 (0.344,0.599) 2 X 5.400 (3.823,6.977) 5.188 (3.898,6.478) 3 a 0.489 (0.330,0.648) 0.406 (0.294,0.519) 3 P T (constant) 6.147 (5.192,7.102) 1.250 (1.039,1.461) 3 P 2 (sin(27tf/12)) -1.683 (-2.794,-0.572) -0.243 (-0.401,-0.051) 3 P 3 (cos(27tf/12)) -1.138 (-2.238,-0.038) -0.315 (-0.483,-0.147) 4 a 0.291 (0.120,0.462) 0.404 (0.203,0.604) 4 X 0.199 (0.014,0.384) 0.170 (0.090,0.251) 5 a 0.591 (0.447,0.735) 0.652 (0.539,0.765) 5 X 0.376 (0.153,0.599) 0.333 (0.209,0.457) Table 8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1 to 5. Chapter 8. Application to counts of workers collecting disability benefits 189 Arrival Rate Arrival Rate Arrival Rate Month Gaussian AR(1) Poisson AR(1) (arrival data) January 1.871 2.353 1.987 February 2.008 2.415 1.916 March 2.449 2.737 2.095 April 3.076 3.310 2.538 May 3.720 4.060 3.235 June 4.209 4.783 4.064 July 4.412 5.177 4.735 August 4.274 5.043 4.911 September 3.833 4.450 4.490 October 3.207 3.680 3.706 November 2.563 3.000 2.908 December 2.074 2.547 2.315 Table 8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model. 1.000 c o 0.950 J 2 "k-ffl 0.900 â€˘5 > 0.850 ra 3 E 0.800 u 0.750 Forecasts (data set 1*) H i II i j l i i , Hill â€˘Illlllllll . 'â€˘ . 1 2 3 4 k-steps ahead 5 6 1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 l i i i i l l 2 1 0 â€”Aâ€” upper bd. â€”â€˘â€” forecast â€”â€˘â€” lower bd. Figure 8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model. Chapter 8. Application to counts of workers collecting disability benefits 190 8.6 Summary In this chapter we have illustrated the methods developed in the earlier chapters. The preliminary analysis is identical to the preliminary analysis of continuous valued time series. This includes time series plots, as well as plots of the autocorrelation and partial autocorrelation function. This analysis gives us a starting point for model selection. In Chapter 4 we showed that the parameters can be estimated via maximum likelihood estimation and gave expression for the observed Fisher information. The expected Fisher information is easily evaluated numerically, and can be inverted to get the asymptotic variance of the parameter estimates. This allows us to perform t-tests on the model parameters and is useful in determining which covariates to include in the model. For models with covariates it is more difficult to calculate the expected Fisher information. In this case the observed Fisher information can be used and can easily be calculated by numerically differentiating the score function. If the t-statistic for a is small, we can further test for independence with the tests found in Chapter 5. This was the case for data set 1*, which we concluded to be a dependent series. Model selection is further refined with the help of the information matrix test, see Chapter 6, and the new residuals defined in Section 4.4. The information matrix test checks whether the model is sufficient to explain the variation found in the data. Patterns in the residuals may indicate the need of additional regressors. We found the simple Chapter 8. Application to counts of workers collecting disability benefits 191 Poisson AR(1) model without regressors was sufficient to model data sets 1*, 2, 4 and 5. In the case for data set 3 the model required the following two seasonal regressors in the arrival process: sin(27tf/12) and cos(27tr/12). The W C B was able to provide us with the arrival process, which the Poisson AR(1) model assumes to be latent. From these data we were able to directly estimate the arrival parameter. We found these estimates to be close to the estimates found using the aggregate data. The additional data also allowed us to check whether the arrivals were independent and Poisson. Independence was assessed using the sample autocorrelation function. We found that the arrivals for data set 2 appeared to be dependent. Further we found that the arrivals for data set 4 may be seasonal, which wasn't indicated in the aggregate data. The information matrix test was used to assess the Poisson assumption, which was only rejected in the case of data set 2. We conclude the Poisson assumption appears to be realistic for the W C B data. In Chapter 3 we considered forecasting for the Poisson AR(1) model. If data cohesion is considered important the k-step ahead conditional median or mode can be used as a forecast. The k-step ahead conditional mean can also be used as a forecast. However it is not integer valued. For low count series we proposed using the k-step ahead conditional distribution as a forecast. In the case of data sets 1*, 4 and 5 the counts are low enough that the k-step ahead conditional distribution is easy to read. However for data sets 2 and 3 the k-step ahead conditional distribution is much harder to read, since the non-zero probabilities are spread over more non-negative integers. Chapter 8. Application to counts of workers collecting disability benefits 192 The analysis in Chapter 8 concluded with a look at the results from fitting a Gaussian AR(1) model to the 5 data sets. We found that the parameter estimates were similar to the Poisson AR(1) estimates. We know from Section 4.7 that the loss of efficiency in using the Gaussian AR(1) likelihood increases with a , but that the Gaussian estimates are more robust than the Poisson estimates. With the possible exception of data set 2, our analysis has found the Poisson AR(1) adequate. Further in four of the five series the estimates for a was above 0.400. We therefore favor the Poisson estimates. Finally using the Gaussian AR(1) prediction intervals is meaningless, since the data are discrete valued. In fact we saw that the Gaussian AR(1) prediction intervals can include negative values, which again is meaningless for non-negative data. We conclude with some future possible avenue for analyzing the W C B data. Our analysis only considered simple sinusoidal regressors. Better models may be found by considering economic regressors, such as employment rates, weather conditions, sales etc. The W C B often funds accident prevention programs and is interested in whether or not the program has had an effect on injury rates. Indicator regressors can be used to model the change in the arrival rates before and after the start of an accident prevention program. We can then estimate the change and test whether it is significantly different than zero. Bibliography [I] Al-Osh, M . A . and Aly, E .A.A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics A 21, 2483-2492. [2] Al-Osh, M . A . and Alzaid, A . A . (1987). First-order integer-valued autoregressive (INAR(l)) process. Journal of Time Series Analysis 8, 261-275. [3] Al-Osh, M . A . and Alzaid, A . A . (1988). On maximum likelihood estimation for a subcritical branching process with immigration. Pakistan Journal of Statistics 4, 147-156. [4] Alzaid, A . A and Al-Osh, M . A . (1988). First-order integer-valued autoregressive (INAR(l)) process: Distributional and regression properties. Statistica Neerlandica 42,53-61. ^ [5] Alzaid, A . A . and Al-Osh, M . A . (1990). An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27, 314-324. [6] Barndorff-Nielsen, O.E. and Sorensen, M . (1994). A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review 62 133-165. [7] Box, G.E.P. and Jenkins, G .M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day. [8] Billingsley, P. (1986). Probability and Measure, 2nd Edition. New York: John Wiley & Sons. [9] Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. New York: Springer-Verlag. [10] Brown, B . M . (1971). Martingale Central Limit Theorems. The Annals of Mathematics and Statistics 42, 59-66. [II] Chan, K.S. and Ledolter, J. (1995). Monte Carlo E M estimation for time series models involving counts. Journal of American Statistical Association 90, 242-252. [12] Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika 61,291 -298. [13] Chesher, A . (1983). The Information Matrix Test, Simplified Calculation Via a 193 Bibliography 194 Score Test Interpretation. Economics Letters 13, 45-48. [14] Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591-605. [15] Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall. [16] Cramer, H . And Wold, H . (1936). Some theorems on distribution functionsl. Journal of London Mathematical Society 11, 290-295. [ 17] Crowder, M.J . (1976). Maximum Likelihood Estimation for Dependent Observations. Journal of the Royal Statistical Society, Series B 38, 45-53. [18] Davidson, J. (1994). Stochastic Limit Theory. New York: Oxford University Press. [19] Davidson, R. and MacKinnon, J.G. (1993). Estimation and Inference in Econometrics. New York: Oxford University Press. [20] Efron, B . (1982). The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics. [21] Fahrmeir, L . (1988). A Note on Asymptotic Testing Theory for Nonhomogeneous Observations. Stochastic Processes and their Applications 28, 267-273. [22] Fahrmeir, L . and Tutz, G. (1994). Multivariate Statistical Modeling Based on Generalized Linear Models. New York: Springer-Verlag. [23] Ferguson, T.S. (1996). A Course in Large Sample Theory. New York: Chapman and Hall. [24] Godambe, V.P. and Heyde, C.C. (1987). Quasi-likelihood and optimal estimation. International Statistical Review 55, 231-244 [25] Godfrey, L .G . (1988). Misspecification tests in econometrics: The Lagrange multiplier principle and other approaches. New York: Cambridge University Press. [26] Hall, P. And Heyde, C.C. (1980). Martingal Limit Theory and its Application. New York: Academic Press. [27] Hall, W.J. and Mathiason, D.J. (1990). On Large-Sample Estimation and Testing in Parametric Models. International Statistical Review 58, 77-97. [28] Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press. [29] Harvey, A . C . (1981). Time Series Models. Oxford: Allan. Bibliography 195 [30] Harvey, A . C . and Fernandes, C. (1989). Time series models for count or qualitative observations. Journal of Business & Economic Statistics 7, 407-417. [31] Hiller, F.S. and Lieberman, G.J. (1986). Interoduction to Operations Research, 4th edition. Oakland: Holden-Day. [32] Jacobs, P.A. and Lewis, P.A.W. (1977). A mixed autoregressive-moving average exponential sequence and point process ( E A R M A 1,1/ Advances in Applied Probability 9, 87-104. [33] Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. I. : Correlation and runs properties. Journal of the Royal Statistical Society Series B 40, 94-105. [34] Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. II. : Asymptotic properties. Journal of the Royal Statistical Society Series B 40, 222-228. [35] Jacobs, P.A. and Lewis, P.A.W. (1978). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4, 19-36. [36] Jin-Guan, D. and Yuan, L . (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis 12, 129-142. [37] Joe, H . (1997). Multivariate models and dependence concepts. London: Chapman & Hall. [38] Joe, H . (1996). Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability 33, 664-677. [39] Jprgensen, B. , Lundbye-Christensen, S., Song, X . - K . and Sun, L . (1995). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics , University of British Columbia. [40] Jprgensen, and Song, X . - K . (1998). Stationary time-series models with exponential dispersion model margins. Journal of Applied Probability 35 (to appear). [41] Kalman, R.E. (1987). Regression methods for non-stationary categorical time series: Asymptotic estimation theory. Annals of Statistics. 17, 79-98. [42] Klimko, L . A . and Nelson, P.I. (1978). On conditional least squares estimation for stochastic process. The Annals of Statistics 6, 629-642. [43] Little, J.D.C. (1961). A proof for the queuing formula L = XW. Operations Research 9, 383-387. Bibliography 196 [44] MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Times Series. Monographs on Statistics and Applied Probability 70. London: Chapman and Hall. [45] McCabe, B and Tremanyne, A (1993). Elements of modern asymptotic theory with statistical applications. Manchester: Manchester University Press. [46] McCabe, B . and Leybourne, S. (1996). A General Test. Working Paper. [47] McKenzie, E. (1988). Some A R M A models for dependent sequences of Poison counts. Advances in Applied Probability 20, 822-835. [48] McKenzie, E. (1986). Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Advances in Applied Probability 18, 679-705. [49] McLeish, D.L. (1975). A maximal inequality and dependent strong laws. Annals of Probability 3, 826-836. [50] Pierce, D. A . (1982). The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics. The Annals of Statistics 10, 475-478. [51] Ross, S.M. (1983). Stochastic Processes. New York: John Wiley & Sons. [52] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons. [53] Song, X . - K . (1996). Some statistical models for the multivariate analysis of longitudinal data. Ph.D thesis. Department of Statistics, University of British Columbia. [54] Sprott, D.A. (1983). Estimating the parameters of a convolution by maximum likelihood. Journal of the American Statistical Association 78, 457-460. [55] Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press. [56] White H . (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25. [57] White H . (1984). Asymptotic Theory for Econometricans. New York: Academic Press. [58] White, H . And Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica 52, 143-161. [59] Wooldridge, J .M. (1991a). On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Bibliography 197 Econometrics 47, 5-46. [60] Wooldridge, J .M. (1991b). Specification testing and quasi-maximum-likelihood estimation. Journal of Econometrics 48, 29-55. [61] Zeger, S.L. (1995). A regression model for time series of counts. Biometrika 75, 621-629. [62] Zeger, S.L., Liang, K . - Y . and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. Biometrika 72, 31-38 Appendix The following is a list of the data sets used in this thesis. Data set 0 refers to the illustrative data set introduce in Section 2.5, and unlike the other series starts at January 1987. 0 1 1* 2 3 4 5 1A 2A 3A 4A 5A Jan-85 0 0 9 6 0 0 0 2 2 0 0 Feb-85 0 0 6 7 1 0 0 0 3 1 0 Mar-85 0 0 6 8 0 1 0 3 4 0 1 Apr-85 0 0 7 9 0 1 0 3 5 0 0 May-85 0 0 10 6 1 1 0 8 1 1 0 Jun-85 0 0 8 8 0 1 0 2 4 0 0 Jul-85 0 0 14 5 0 1 0 10 4 0 0 Aug-85 0 0 8 3 0 1 0 4 1 0 0 Sep-85 0 0 7 7 0 0 5 4 0 0 Oct-85 0 0 10 11 0 1 0 8 8 0 1 Nov-85 0 0 10 8 1 1 0 9 5 1 0 Dec-85 0 0 12 4 0 2 0 6 3 0 2 Jan-86 0 0 8 2 0 0 0 6 1 0 0 Feb-86 0 0 8 3 0 0 0 4 2 0 0 Mar-86 1 1 8 4 0 0 1 5 3 0 0 Apr-86 1 1 8 5 1 0 0 4 4 1 0 May-86 1 1 13 7 1 1 1 8 2 1 0 Jun-86 1 1 12 8 0 0 1 8 5 0 0 Jul-86 0 0 14 12 0 0 0 7 8 0 0 Aug-86 0 0 13 11 0 1 0 6 6 0 1 Sep-86 0 0 13 12 0 1 0 6 7 0 1 Oct-86 0 0 8 6 1 1 0 3 5 1 0 Nov-86 0 0 13 2 1 1 0 8 1 1 0 Dec-86 1 1 10 2 0 1 1 3 0 0 0 Jan-87 6 1 1 12 3 0 0 1 5 2 0 0 Feb-87 11 0 0 12 3 0 0 0 7 1 0 0 Mar-87 5 0 0 9 5 0 0 0 4 2 0 0 Apr-87 5 0 0 8 6 0 1 0 4 2 0 1 May-87 5 0 0 13 13 2 0 0 10 9 2 0 Jun-87 2 0 0 9 12 0 0 0 4 6 0 0 Jul-87 7 0 0 8 21 0 0 0 3 15 0 0 Aug-87 4 0 0 6 9 0 0 0 3 3 0 0 Sep-87 5 0 0 7 11 1 0 0 4 6 1 0 Oct-87 4 0 0 10 11 0 0 0 8 7 0 0 Nov-87 6 1 1 17 10 0 2 1 11 5 0 2 Dec-87 8 0 0 11 8 0 1 0 6 2 0 1 Jan-88 7 1 1 13 5 0 0 1 8 2 0 0 Feb-88 7 0 0 10 4 0 0 0 4 2 0 0 Mar-88 9 1 1 9 4 0 2 0 5 3 0 2 Apr-88 9 2 2 15 4 0 2 2 7 2 0 1 May-88 13 0 0 13 2 0 2 0 8 1 0 0 Jun-88 12 0 0 12 9 0 2 0 9 8 0 1 Jul-88 11 0 0 8 8 0 1 0 2 5 0 0 198 Appendix 199 0 1 1* 2 3 4 5 1A 2A 3A 4A 5A Aug-88 13 1 1 8 5 0 0 1 5 3 0 0 Sep-88 16 0 0 9 10 0 0 0 2 8 0 0 Oct-88 8 0 0 9 12 1 1 0 4 4 1 1 Nov-88 14 0 0 12 11 1 1 0 5 7 0 0 Dec-88 10 0 0 9 9 0 1 0 4 3 0 0 Jan-89 6 0 0 5 4 0 1 0 2 1 0 0 Feb-89 6 0 0 9 5 0 1 0 4 2 0 0 Mar-89 15 0 0 10 5 0 2 0 5 2 0 1 Apr-89 9 0 0 6 10 0 2 0 2 6 0 0 May-89 15 0 0 8 14 0 2 0 5 5 0 0 Jun-89 13 0 0 17 7 1 2 0 11 2 1 0 Jul-89 11 0 0 16 11 1 2 0 11 4 0 1 Aug-89 14 0 0 17 12 1 3 0 12 9 0 0 Sep-89 11 0 0 16 7 1 1 0 7 2 0 0 Oct-89 17 0 0 8 8 1 2 0 4 6 0 2 Nov-89 8 0 0 10 14 1 2 0 8 7 0 2 Dec-89 10 0 0 7 6 1 1 0 2 2 0 1 Jan-90 11 0 0 8 4 1 1 0 3 0 0 0 Feb-90 13 0 0 7 3 1 0 0 1 1 0 0 Mar-90 10 0 0 4 4 1 1 0 1 3 0 1 Apr-90 8 0 0 5 4 1 3 0 3 3 0 1 May-90 8 0 0 4 7 0 1 0 2 3 0 0 Jun-90 6 0 0 4 6 0 1 0 2 4 0 0 Jul-90 9 1 1 10 9 0 1 0 4 3 0 0 Aug-90 12 0 0 9 8 0 1 0 2 5 0 0 Sep-90 11 1 1 12 2 0 2 1 6 0 0 1 Oct-90 9 0 0 12 4 0 1 0 4 3 0 0 Nov-90 11 1 1 11 3 0 1 1 6 0 0 0 Dec-90 7 0 0 9 1 0 2 0 3 0 0 1 Jan-91 9 0 0 8 3 0 2 0 0 2 0 0 Feb-91 11 0 0 9 1 0 1 0 3 0 0 0 Mar-91 6 0 0 8 4 1 1 0 3 2 1 0 Apr-91 4 0 0 6 3 0 3 0 3 1 0 1 May-91 6 0 0 8 5 0 4 0 3 4 0 2 Jun-91 6 0 0 13 3 0 3 0 6 1 0 1 Jul-91 12 0 0 13 8 0 2 0 5 6 0 0 Aug-91 10 0 0 10 11 0 1 0 3 4 0 0 Sep-91 12 0 0 7 7 0 1 0 2 4 0 0 Oct-91 8 1 1 17 9 0 0 1 11 6 0 0 Nov-91 6 0 0 14 5 0 0 0 2 1 0 0 Dec-91 1 0 0 10 3 1 0 0 0 1 1 0 Jan-92 3 0 0 12 6 0 1 0 3 2 0 0 Feb-92 5 0 0 6 4 0 2 0 0 2 0 1 Mar-92 5 0 0 4 5 0 0 0 1 4 0 0 Apr-92 10 0 0 7 6 0 0 0 4 3 0 0 May-92 12 0 0 8 7 1 0 0 4 3 1 0 Jun-92 9 0 0 10 7 1 0 0 4 4 0 0 Jul-92 7 0 0 16 3 1 0 0 9 2 0 0 Aug-92 9 0 0 15 5 1 0 0 8 4 0 0 Sep-92 12 0 0 10 5 2 0 0 1 2 1 0 Appendix 200 0 1 1* 2 3 4 5 1A 2A 3A 4A 5A Oct-92 14 0 0 14 4 0 1 0 6 2 0 1 Nov-92 11 0 0 16 4 0 1 0 5 3 0 1 Dec-92 9 0 0 12 2 0 1 0 3 0 0 0 Jan-93 3 0 0 10 3 0 1 0 2 1 0 0 Feb-93 4 0 0 11 6 0 1 0 2 1 0 0 Mar-93 10 0 0 10 3 0 1 0 2 1 0 0 Apr-93 2 0 0 8 1 0 1 0 5 0 0 0 May-93 7 1 0 9 3 0 0 1 6 2 0 0 Jun-93 9 1 0 10 6 1 0 1 6 3 1 0 Jul-93 9 1 13 5 0 0 1 8 3 0 0 Aug-93 3 1 0 6 9 0 1 0 1 5 0 1 Sep-93 6 1 0 8 9 0 1 0 4 6 0 0 Oct-93 9 1 0 9 5 0 1 0 5 1 0 0 Nov-93 9 1 0 6 6 0 1 0 1 4 0 0 Dec-93 9 1 0 9 4 0 1 0 5 2 0 0 Jan-94 6 1 12 6 0 0 1 4 3 0 0 Feb-94 5 1 0 8 2 0 0 0 2 0 0 0 Mar-94 6 1 0 9 4 0 0 0 4 3 0 0 Apr-94 5 1 0 5 1 1 0 0 2 0 1 0 May-94 9 1 0 6 6 0 0 0 5 4 0 0 Jun-94 7 1 0 9 5 0 1 0 2 3 0 1 Jul-94 11 1 0 9 3 0 0 0 4 1 0 0 Aug-94 12 1 0 13 2 0 0 0 6 1 0 0 Sep-94 11 2 1 12 2 0 1 1 5 1 0 0 Oct-94 12 2 1 10 2 1 2 0 4 1 0 1 Nov-94 7 2 1 9 9 0 3 1 2 8 0 1 Dec-94 11 1 0 7 5 0 2 0 3 3 0 0
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Statistical analysis of discrete time series with application...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Statistical analysis of discrete time series with application to the analysis of workers' compensation… Freeland, R. Keith 1998
pdf
Page Metadata
Item Metadata
Title | Statistical analysis of discrete time series with application to the analysis of workers' compensation claims data |
Creator |
Freeland, R. Keith |
Date Issued | 1998 |
Description | This thesis examines the statistical properties of the Poisson AR(1) model of Al-Osh and Alzaid (1987) and McKenzie (1988). The analysis includes forecasting, estimation, testing for independence and specification and the addition of regressors to the model. The Poisson AR(1) model is an infinite server queue, and as such is well suited for modeling short-term disability claimants who are waiting to recover from an injury or illness. One of the goals of the thesis is to develop statistical methods for analyzing series of monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB) of British Columbia. We consider four types of forecasts, which are the k-step ahead conditional mean, median, mode and distribution. For low count series the k-step ahead conditional distribution is practical and much more informative than the other forecasts. We consider three estimation methods: conditional least squares (CLS), generalized least squares (GLS) and maximum likelihood (ML). In the case of CLS estimation we find an analytic expression for the information and in the GLS case we find an approximation for the information. We find neat expressions for the score function and the observed Fisher information matrix. The score expressions leads to new definitions of residuals. Special care is taken to test for independence since the test is on the boundary of the parameter space. The score test is asymptotically equivalent to testing whether the CLS estimate of the correlation coefficient is zero. Further we define a Wald and likelihood ratio test. Then we use the general specification test of McCabe and Leybourne (1996) to test whether the model is sufficient to explain the variation found in the data. Next we add regressors to the model and update our earlier forecasting, estimation and testing results. We also show the model is identifiable. We conclude with a detailed application to monthly WCB claims counts. The preliminary analysis includes plots of the series, autocorrelation function and partial autocorrelation function. Model selection is based on the preliminary analysis, t-tests for the parameters, the general specification test and residuals. We also include forecasts for the first six months of 1995. |
Extent | 14121540 bytes |
Subject |
Time-series analysis. Prediction theory. |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-05-29 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0088709 |
URI | http://hdl.handle.net/2429/8500 |
Degree |
Doctor of Philosophy - PhD |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
GraduationDate | 1998-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1998-271447.pdf [ 13.47MB ]
- Metadata
- JSON: 831-1.0088709.json
- JSON-LD: 831-1.0088709-ld.json
- RDF/XML (Pretty): 831-1.0088709-rdf.xml
- RDF/JSON: 831-1.0088709-rdf.json
- Turtle: 831-1.0088709-turtle.txt
- N-Triples: 831-1.0088709-rdf-ntriples.txt
- Original Record: 831-1.0088709-source.json
- Full Text
- 831-1.0088709-fulltext.txt
- Citation
- 831-1.0088709.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0088709/manifest