STATISTICAL ANALYSIS OF DISCRETE TIME SERIES WITH APPLICATION TO THE ANALYSIS OF WORKERS' COMPENSATION CLAIMS DATA

by

R. Keith Freeland

B.Sc., The University of Calgary, 1990

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, MANAGEMENT SCIENCE DIVISION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

February 1998

© R. Keith Freeland, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department
The University of British Columbia
Vancouver, Canada
Date

DE-6 (2/88)

Abstract

This thesis examines the statistical properties of the Poisson AR(1) model of Al-Osh and Alzaid (1987) and McKenzie (1988). The analysis includes forecasting, estimation, testing for independence and specification, and the addition of regressors to the model. The Poisson AR(1) model is an infinite server queue, and as such is well suited for modeling short-term disability claimants who are waiting to recover from an injury or illness. One of the goals of the thesis is to develop statistical methods for analyzing series of monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB) of British Columbia. We consider four types of forecasts, which are the k-step ahead conditional mean, median, mode and distribution.
For low count series the k-step ahead conditional distribution is practical and much more informative than the other forecasts. We consider three estimation methods: conditional least squares (CLS), generalized least squares (GLS) and maximum likelihood (ML). In the CLS case we find an analytic expression for the information, and in the GLS case we find an approximation for the information. We find neat expressions for the score function and the observed Fisher information matrix. The score expressions lead to new definitions of residuals. Special care is taken to test for independence since the test is on the boundary of the parameter space. The score test is asymptotically equivalent to testing whether the CLS estimate of the correlation coefficient is zero. Further we define a Wald and likelihood ratio test. Then we use the general specification test of McCabe and Leybourne (1996) to test whether the model is sufficient to explain the variation found in the data. Next we add regressors to the model and update our earlier forecasting, estimation and testing results. We also show the model is identifiable. We conclude with a detailed application to monthly WCB claims counts. The preliminary analysis includes plots of the series, autocorrelation function and partial autocorrelation function. Model selection is based on the preliminary analysis, t-tests for the parameters, the general specification test and residuals. We also include forecasts for the first six months of 1995.

Table of Contents

Abstract ii
List of Tables vii
List of Figures x
Acknowledgements xiv

1 Introduction 1
1.1 General 1
1.2 Overview of topics 4

2 Poisson AR(1) model 10
2.1 Model definition 10
2.2 Interpretation 12
2.3 Basic properties 13
2.4 Poisson AR(p) model 15
2.5 An illustrative example 22

3 Forecasting 26
3.1 Minimum mean squared error 26
3.2 Minimum mean absolute error 32
3.3 Forecast distributions 33
3.4 Prediction intervals 36
3.5 Duration 39

4 Estimation 41
4.1 Likelihood theory and estimating functions 41
4.2 Conditional least squares for the Poisson AR(1) model 64
4.3 Generalized least squares for the Poisson AR(1) model 69
4.4 The score and Fisher information for the Poisson AR(1) model 81
4.5 The score and Fisher information for a general AR(1) model 94
4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model 98
4.7 Comparison of methods 103

5 Testing for independence 108
5.1 Gaussian AR(1) 108
5.2 Conditional least squares 109
5.3 Score test 111
5.4 The score function on the boundary of the parameter space 114

6 General misspecification test 121
6.1 Overview 121
6.2 Outline test 123
6.3 Details for the Poisson AR(1) model 127

7 Models with covariates 134
7.1 Model definition and introduction 134
7.2 Forecasting 135
7.3 Estimation 142
7.4 Testing 148

8 Application to counts of workers collecting disability benefits 154
8.1 Workers' Compensation Data 154
8.2 Model selection and testing 162
8.3 Arrival process 175
8.4 Forecasting 180
8.5 Gaussian AR(1) models 184

Bibliography 190
Appendix 195

List of Tables

3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts 35
3.4.1 95% prediction intervals for the k-step ahead conditional distribution 39
4.7.1 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.3 and λ = 1 105
4.7.2 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.7 and λ = 1 105
5.4.1 Tests for independence in the illustrative data set 119
5.4.2 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1 119
5.4.3 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1 120
8.1.1 A summary of simple statistics for data sets 1 through 5 155
8.2.1 Tests for independence in data set 1* 169
8.2.2 This table displays the seasonal arrival rate for data set 3 169
8.2.3 This table summarizes the parameter estimation for data sets 1 to 5; included are the parameter estimates and the upper and lower 95% confidence limits 170
8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3 170
8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5 170
8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model 178
8.3.2 This table displays the seasonal arrival rate for data set 3A 179
8.3.3 This table summarizes the information matrix test for data sets 1A-5A 179
8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1* 181
8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2 181
8.4.3 The k-step ahead conditional means, conditional medians and point mass forecasts for data set 3 182
8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4 182
8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5 182
8.4.6 The marginal means, medians and distributions for data set 3 183
8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1 to 5 185
8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model 186

List of Figures

2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 22
2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 25
2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 25
3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury 35
4.4.1 Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 92
4.4.2 Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93
4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93
4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury 93
4.7.1 The asymptotic efficiency of conditional least squares as a function of α when λ = 1 104
4.7.2 The asymptotic efficiency of conditional least squares as a function of λ when α = 0.3 104
4.7.3 Box plots comparing the sampling distributions of α̂ when the arrival process {ε_t} is uniform over {0,1,2} 106
4.7.4 Box plots comparing the sampling distributions of λ̂ when the arrival process {ε_t} is uniform over {0,1,2} 107
5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of α; λ = 1 and n = 100 110
5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of λ; α = 0.01 and n = 100 111
5.4.1 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1 115
5.4.2 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1 115
8.1.1 A time series plot of data set 1 158
8.1.2 A time series plot of data set 2 158
8.1.3 A time series plot of data set 3 158
8.1.4 A time series plot of data set 4 159
8.1.5 A time series plot of data set 5 159
8.1.6 ACFs and PACFs for data sets 1 to 3 160
8.1.7 ACFs and PACFs for data sets 4 to 5 161
8.1.8 ACFs and PACFs for data set 1* 161
8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1* 170
8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2 171
8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3 172
8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3 173
8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4 174
8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 4 174
8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model 186

Acknowledgements

I thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their encouragement and guidance throughout my graduate career at UBC. I would like to acknowledge the assistance and valuable comments of another member of my supervising committee, Dr. Bent Jørgensen. I would like to thank the Workers' Compensation Board of British Columbia for providing the data, which motivated this thesis. I further thank the Workers' Compensation Board of British Columbia for providing me with some financial support. I thank my wife Mary Kelly for her support and patience. Our first son Terry Freeland was born during our stay at UBC and our second son Sandy Freeland was born only a few weeks ago. Both boys have brought us vast amounts of joy and happiness. Finally I would like to thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their financial support to me in the past six years.

Chapter 1

1. Introduction

1.1 General

Count data is generated when the number of occurrences of an event type is recorded. Examples include the monthly number of motor vehicle accidents, the annual number of murders, the number of goals scored in a hockey game, the number of stolen bases by a baseball player in a season and the monthly number of new cancer patients.
Often the prime objective in studying count data is to understand how counts are changing over time, for instance seasonal patterns, such as more suicides in February, and trends, such as increases in the number of cancer patients. For this reason count data is often collected over time to form a time series. Often the term discrete time series is used, and includes more general situations in which the discrete values are not necessarily counts. Methods for examining and modeling time series have been around a long time. The most popular method for modeling time series is the Box-Jenkins modeling procedure (ARIMA models), developed by Box and Jenkins (1970). The limitation of these models is that the random variation in the series is assumed to be normally distributed. When counts are "large" it is often assumed that time series of counts can be adequately approximated by continuous time series which are normally distributed. The reasoning for this is that many common distributions for count data (e.g. binomial, Poisson and negative binomial) have an approximate normal distribution when the distribution mean is "large". Often, however, series have very small counts or even many consecutive zeroes. Series such as the monthly number of polio cases in the US, or the monthly number of new cases of women with AIDS in Vancouver, will typically contain many consecutive months with no occurrences. Because of the inappropriateness of classic time series methods for the study of time series of count data, much effort has been, and continues to be, put into the development of the methodology necessary to study such time series. It is important to find out when and to what extent classical time series methods fail. Throughout the thesis we will compare our results for the Poisson AR(1) model to the results one would get by incorrectly using a Gaussian AR(1) model.
For example, we get inefficient parameter estimates when estimation is based on the pseudo Gaussian AR(1) likelihood. However, when it comes to testing for independence in a Poisson AR(1) time series, one can incorrectly assume a Gaussian AR(1) model with little consequence. Several authors have studied the time series of polio counts in the US; see Zeger (1988), Chan and Ledolter (1995), Jørgensen et al. (1995) and Song (1996). These works are based on the theory of state space models. For an overview of state space models see Fahrmeir and Tutz (1994). Other authors have taken a different approach to modeling count time series, by trying to develop models with properties similar to those of the popular ARIMA models. See Al-Osh and Alzaid (1987), McKenzie (1988), Joe (1996 & 1997), Jørgensen and Song (1995) and Song (1996). One of the goals of this thesis is to develop models to use in an analysis of data on short-term disability (STD) claim counts at the Workers' Compensation Board of British Columbia (WCB). The WCB provides disability insurance for more than 130,000 employers in British Columbia. Every year the WCB receives about 200,000 new claims. There are two broad types of disability claims: health care only claims and wage loss claims. In this thesis, claims counts for those workers collecting Short Term Wage Loss Benefits (STWLB) will be examined. A large portion of the economy in British Columbia has historically been based on the resource sector. In industries such as forestry, fishing and mining, injuries such as broken bones, cuts, dislocations and sprains are predominant. Lately the depletion of natural resources and other environmental concerns have reduced the activities of many resource industries. This however has not brought a decrease in the number of claims, since injuries from other industries are becoming more prevalent.
For example, the number of repetitive strain injuries suffered by store clerks and computer programmers has increased dramatically. About half of all the claims at the WCB are strains, and of these about half are back strains. The data which have been provided by the WCB contain the monthly count of the number of workers collecting STWLB. These data are grouped by 47 injuries, 16 industries, 10 claims centers, 2 genders and 4 age groups, resulting in 47x16x10x2x4 = 60,160 separate series. Of course many of the series have no occurrences at all. Each month the number of workers collecting STWLB is composed of two components, new claims and continuing claims. Since the number of continuing claims this month depends on the total number collecting claims last month, these data have an autoregressive structure of order 1. The count AR(1) model discussed in Al-Osh and Alzaid (1987) is appropriate for studying such data. Much of the work done to date on such AR(1) models for count data has concerned probabilistic issues, such as correlation structure, distribution and time reversibility. Little or no attention has been devoted to issues of inference and forecasting. There is a need for statisticians and econometricians to study these models keeping statistical and practical issues in mind. This thesis focuses on forecasting, estimation, inference, covariates, residual analysis, and the fitting of the model to actual data.

1.2 Overview of topics

The outline of this thesis is as follows. In Chapter 2 we begin by presenting the Poisson AR(1) model of Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the WCB claim data is discussed. We can think of disabled workers as remaining in a queue waiting to get healthy. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. This is important since it gives justification for the use of binomial thinning to model the correlation in these data.
We continue the chapter by reviewing known properties of the Poisson AR(1) model. Then we discuss three extensions of this model to a Poisson AR(p) model that are found in the literature. Our contribution is to show that the partial autocorrelation function can be used to select the number of lags in the Poisson AR(p) model of Jin-Guan and Yuan (1991). That is, the partial autocorrelations beyond lag p are zero, as they are for the Gaussian AR(p) model. The chapter concludes with an illustration applied to one of the WCB data series.

We consider Chapter 3 to be a major contribution. In this chapter we consider how to generate forecasts from the Poisson AR(1) model. Two criteria are considered for finding optimal forecasts, the minimum mean squared error and the minimum mean absolute error of the forecasts. The squared error approach results in the conditional mean as the optimal forecast, while the absolute error approach yields the conditional median as the optimal forecast. It is interesting to note that the conditional median is an integer, which may be desirable from the point of view of data cohesion. We derive formulas for the k-step ahead conditional mean and variance as well as the k-step ahead conditional moment generating function. Our analysis is concerned with count series, more specifically low count series. That is, the number of possible outcomes for the k-step ahead count is small. In such cases it is possible and more useful to calculate the probability mass for each of these outcomes, which we call point mass forecasting. We show how to find confidence intervals around each point mass forecast when parameter estimates are used. The chapter concludes by finding confidence intervals for the mean duration of claims.

Chapter 4 begins with a brief review of likelihood theory and estimating functions, and follows the exposition in Barndorff-Nielsen and Sørensen (1994) and Godambe and Heyde (1987).
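The point mass forecasting described in the Chapter 3 overview can be illustrated concretely. For the Poisson AR(1) model, iterating the thinning recursion gives X_{t+k} | X_t = x as the convolution of a Binomial(x, α^k) survivor count and a Poisson(λ(1−α^k)/(1−α)) accumulated-arrival count; the sketch below (function names are ours, not the thesis's notation) computes the k-step ahead conditional distribution, mean and median from that decomposition.

```python
from math import comb, exp, factorial

def kstep_pmf(x, alpha, lam, k, top):
    """P(X_{t+k} = y | X_t = x) for y = 0, ..., top-1.

    X_{t+k} | X_t = x is a Binomial(x, alpha^k) survivor count plus an
    independent Poisson(lam * (1 - alpha^k) / (1 - alpha)) arrival count.
    """
    a_k = alpha ** k
    mu_k = lam * (1.0 - a_k) / (1.0 - alpha)
    binom = [comb(x, j) * a_k ** j * (1.0 - a_k) ** (x - j) for j in range(x + 1)]
    pois = [exp(-mu_k) * mu_k ** j / factorial(j) for j in range(top)]
    # convolve the two independent components
    return [sum(binom[j] * pois[y - j] for j in range(min(x, y) + 1))
            for y in range(top)]

probs = kstep_pmf(x=5, alpha=0.3, lam=1.0, k=2, top=30)
mean_k = sum(y * p for y, p in enumerate(probs))     # conditional mean
cdf, median_k = 0.0, 0
for y, p in enumerate(probs):                        # conditional median
    cdf += p
    if cdf >= 0.5:
        median_k = y
        break
```

For a low count series the whole vector `probs` is itself the forecast; as a check, `mean_k` reproduces the closed form α^k x + λ(1−α^k)/(1−α).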
This theory is used to unify the following methods of estimation and inference for this model: conditional least squares (CLS), generalized least squares (GLS), and conditional maximum likelihood (CML). Al-Osh and Alzaid (1987) considered estimation by CLS and CML, but ignored inference except to refer to the general results for CLS in Klimko and Nelson (1978). Wooldridge (1991) considers GLS estimation and inference for parametric models of conditional means and variances. Our contributions include the following: we prove a strong law of large numbers which requires only simple moment restrictions that typically hold for autoregressive processes. Then we find an analytic expression for the expected information matrix in the CLS case and derive an approximation for the expected information matrix in the GLS case. Further we find neat forms for expressing the score function and observed Fisher information matrix in terms of conditional expectations. This result generalizes to other cases where each observation is a drawing from a convolution of two distributions. It is relatively easy to numerically calculate the expected Fisher information. This makes Fisher scoring much faster than using the observed information in a Gauss-Newton method. We also show that the Poisson AR(1) model is α-mixing. We directly compare the asymptotic efficiency of CLS to CML, and find that the efficiency is one when α = 0 and decreases as α increases. Finally, we use a simulation to show that the methods of CLS and GLS are more robust to model misspecification than CML.

To date in the literature no one has considered any statistical tests on the Poisson AR(1) model. In Chapter 5 we develop tests to check if a series is independent. This is important since it would be inappropriate to fit an autoregressive model to a series of independent random variables.
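Of the estimation methods compared above, CLS is the simplest to exhibit: since E[X_t | X_{t−1}] = αX_{t−1} + λ, minimizing the sum of squared one-step prediction errors reduces to ordinary least squares of X_t on X_{t−1} with an intercept. A minimal sketch (our own illustration, not the thesis's code), with a path simulated by binomial thinning:

```python
import numpy as np

def simulate_par1(alpha, lam, n, rng):
    """Simulate a Poisson AR(1) path: X_t = alpha ∘ X_{t-1} + Poisson(lam)."""
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(lam / (1.0 - alpha))      # start at the stationary law
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

def cls_estimates(x):
    """CLS for (alpha, lam): OLS of X_t on X_{t-1} with an intercept."""
    z, y = x[:-1].astype(float), x[1:].astype(float)
    zc, yc = z - z.mean(), y - y.mean()
    alpha_hat = (zc @ yc) / (zc @ zc)
    lam_hat = y.mean() - alpha_hat * z.mean()
    return alpha_hat, lam_hat

rng = np.random.default_rng(0)
x = simulate_par1(alpha=0.5, lam=2.0, n=5000, rng=rng)
alpha_hat, lam_hat = cls_estimates(x)
```

On a long simulated path the estimates land close to the true (α, λ) = (0.5, 2.0), consistent with the efficiency comparison described above.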
We note that special care must be taken since we are testing to see if the parameter value lies on the boundary of its domain. A test based on the score is developed and found to be asymptotically equivalent to a one-sided test of the least squares estimator of α. In fact the difference between the score and CLS statistics is in the denominators, which are respectively the sample mean and variance. Since under the null hypothesis the series consists of independent and identically distributed Poisson random variables, the sample mean and variance should be asymptotically equivalent. However, if the data are over-dispersed then the sample mean will be smaller than the sample variance, and the score based statistic will be larger than the CLS statistic. It is further found that the score is a well defined martingale for values of α in a neighborhood of zero, and hence a Taylor series expansion of the score function around the point α = 0 is also well defined. This leads to one-sided score, Wald and likelihood ratio tests.

Chapter 6 considers a general specification test for the model. Having satisfied ourselves that a series is dependent, is it adequately modeled by the Poisson AR(1) process? This general specification test is often referred to as the information matrix test. Basically it tests whether the difference between the two representations of the information matrix (sometimes referred to as the inner and outer products) is distributed about zero, which should be the case if the model is correctly specified. Chesher (1983) showed that the information matrix test can be interpreted as a score test, and amounts to testing whether the model parameters are stochastic. This test will therefore indicate whether or not over-dispersion is present in the data. McCabe and Leybourne (1996) extended Chesher's result to a much more general setting, including time series of dependent non-homogeneous random vectors.
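The relationship between the score and CLS statistics described above can be made concrete. Both share the lag-1 sample autocovariance numerator; in the illustrative normalization assumed here (ours, not taken verbatim from Chapter 5), the score version divides by the sample mean (the Poisson null variance) and the CLS version by the sample variance, so the two separate exactly when the data are over- or under-dispersed:

```python
import numpy as np

def independence_stats(x):
    """One-sided statistics for H0: alpha = 0 (illustrative normalization).

    Shared numerator: the scaled lag-1 sample autocovariance. The score
    version scales by the sample mean, the CLS version by the variance.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    num = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / np.sqrt(n)
    score_stat = num / xbar      # Poisson null: variance equals mean
    cls_stat = num / x.var()     # Gaussian-style: sample variance
    return score_stat, cls_stat

# on a strongly autocorrelated Poisson AR(1) path both statistics are large
rng = np.random.default_rng(1)
alpha, lam, n = 0.5, 1.0, 2000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
score_stat, cls_stat = independence_stats(x)
```

Under the Poisson null the two denominators converge to the same limit, so the statistics agree asymptotically, matching the discussion above.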
In this chapter we outline the result of McCabe and Leybourne and present the details for the Poisson AR(1) model. To assess the test we simulate 100 series for which the Poisson model is the correct specification and 100 series for which the Poisson model is a misspecification. We find that the level of the test is approximately correct for finite series of length 200 and that the test has strong power.

In Chapter 7 we extend the model to include covariates. This is the first time anyone has considered adding covariates to both the continuation process and the arrival process. Joe (1996) briefly considers the addition of covariates to the mean of the process. Note that Joe's parameterization is slightly different from ours. We find the k-step ahead moment generating function. From this we get the k-step ahead conditional mean, variance and distribution. Further we find the marginal distribution. Then we show how to construct individual confidence intervals around the point mass forecasts (the k-step ahead conditional distribution). Next we give necessary and sufficient conditions for the model to be identifiable. We further give conditions for the model to be α-mixing. If the covariates are such that the model is identifiable, α-mixing and stationary, then the maximum likelihood estimates are asymptotically normal. We conclude the chapter by outlining testing for independence and for specification.

Joe (1997) applied the Poisson AR(1) model to daily counts of children reporting symptoms associated with air pollution. The advantage of the WCB data over the data used by Joe (1997) is that the WCB data can be interpreted as a queue, which justifies our use of the Poisson AR(1) model. In Chapter 8 we apply the methods of the previous chapters to modeling and forecasting five monthly time series of counts of disabled workers collecting STWLB.
Our analysis includes time series plots as well as plots of the sample autocorrelation function and partial autocorrelation function. All five series appear to be autoregressive of order 1 and one of the series appears to be seasonal. Model selection is based on t-ratios, the information matrix specification test and residual analysis. We give a detailed discussion of the interpretation of the residuals. For the five WCB series we analyze in this chapter we were able to obtain additional data about the number of arrivals each month. The previously analyzed data contain the total number collecting each month, that is, the number continuing plus the number of arrivals. We use these data to assess whether the arrivals are Poisson. In addition we directly estimate the arrival parameters and compare them to the estimates obtained using the Poisson AR(1) model. Next we calculate point mass forecasts for the five series. We note that the k-step ahead distribution quickly approaches the marginal distribution. We conclude the chapter by comparing the results of the Poisson AR(1) model to the results that would come from assuming a Gaussian AR(1) model.

Chapter 2

2. Poisson AR(1) model

In this chapter, we begin by presenting the Poisson AR(1) model of Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the WCB claim data is discussed. We can think of workers collecting STWLB as waiting in a queue. When they begin receiving benefits they enter the queue and when they stop receiving benefits they exit the queue. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. Next we review some basic properties of the model, which are shown to follow from the moment generating function. We then discuss extensions of the model to a Poisson AR(p) model which has properties similar to the Gaussian AR(p) model. We conclude the chapter with an illustrative example which demonstrates the theory covered in this chapter.
2.1 Model formulation

Let X₀, X₁, ..., Xₙ be a series of dependent Poisson counts generated according to the model

X_t = α ∘ X_{t−1} + ε_t, (2.1.1)

where X₀ has a Poisson distribution with parameter λ/(1−α), written as X₀ ~ Po(λ/(1−α)), and {ε_t} is a series of independent identically distributed (iid) Poisson random variables with parameter λ, that is ε_t ~ Po(λ). The thinning operator "∘" is defined as follows: given X_{t−1},

α ∘ X_{t−1} = Σ_{i=1}^{X_{t−1}} B_{t,i}(α),

where B_{t,1}(α), B_{t,2}(α), ..., B_{t,X_{t−1}}(α) are iid Bernoulli random variables with P(B_{t,i}(α) = 1) = 1 − P(B_{t,i}(α) = 0) = α. Since α ∘ X_{t−1} given X_{t−1} is a sum of iid Bernoulli random variables it follows that it has a binomial distribution with parameters α and X_{t−1}, written as α ∘ X_{t−1} | X_{t−1} ~ Bi(α, X_{t−1}). The term thinning operator is used since its operation is to randomly thin out or reduce the number in a group. It is further assumed that the B_{t,i}(α) and ε_t are independent.

For comparison purposes we define the Gaussian AR(1) model as

X_t = αX_{t−1} + λ + η_t, |α| < 1, (2.1.2)

where {η_t} is a sequence of independent identically distributed normal random variables with mean zero and variance σ².

An important distinction between the Poisson AR(1) model and the Gaussian AR(1) model is that in the Poisson model X_t is composed of two random components, the survivorship component α ∘ X_{t−1} | X_{t−1} and the new entrant (innovation) component ε_t. In the Gaussian model, given X_{t−1}, the first component αX_{t−1} is not random. The Poisson model is also complicated by the fact that these two random components are not observed. That is, the distribution of X_t given X_{t−1} is given by the convolution of the two random components. This makes the Poisson model much harder to work with than the Gaussian model.

This model has been studied by Al-Osh and Alzaid (1987) and McKenzie (1988).
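The recursion (2.1.1) is easy to simulate directly, which is a useful check on one's reading of the thinning operator: each count is carried forward through independent Bernoulli(α) survival indicators and augmented by Poisson(λ) arrivals. A small sketch using only the standard library (names and the Poisson sampler are ours, chosen for the small rates used here):

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson sample via Knuth's product method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(x, alpha, rng):
    """alpha ∘ x: the sum of x iid Bernoulli(alpha) indicators, so Bi(alpha, x)."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate(alpha, lam, n, rng):
    """X_t = alpha ∘ X_{t-1} + eps_t with X_0 ~ Po(lam / (1 - alpha))."""
    path = [poisson_draw(lam / (1.0 - alpha), rng)]
    for _ in range(1, n):
        path.append(thin(path[-1], alpha, rng) + poisson_draw(lam, rng))
    return path

rng = random.Random(7)
x = simulate(alpha=0.5, lam=1.0, n=20000, rng=rng)
mean_x = sum(x) / len(x)                 # should be near lam/(1-alpha) = 2
num = sum((a - mean_x) * (b - mean_x) for a, b in zip(x[:-1], x[1:]))
den = sum((a - mean_x) ** 2 for a in x)
acf1 = num / den                         # should be near alpha = 0.5
```

The sample mean and lag-1 autocorrelation of a long path should sit near λ/(1−α) and α respectively, anticipating the properties derived in Section 2.3.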
Joe (1996) extended this model by developing a general method to define a random operator for cases where the marginal distribution is in the convolution-closed infinitely divisible class. His method is consistent with the Poisson AR(1) model in that it defines the binomial thinning operator when the marginal distribution is Poisson.

2.2 Model interpretation

This model can be interpreted as a birth and death process; see Ross (1983, Section 5.3) for an introduction to birth and death processes. Each individual at time t−1 has probability α of continuing to be alive at time t, and at each time t the number of births follows a Poisson distribution with mean λ.

Alternatively, the model can be interpreted as an infinite server queue, see Ross (1983, Example 2.3 b). The service time is geometric with parameter 1−α and the arrival process is Poisson with mean λ. One fundamental result from queuing theory is that the expected length of the queue is equal to the arrival rate times the expected waiting time, or L = λW, where L, λ and W are respectively the expected queue length, the arrival rate and the expected waiting time, see Little (1961). In this example the mean waiting time W is equal to 1/(1−α) and the expected queue length is L = λ/(1−α).

With regard to short-term disability claims, X_t is the number of workers collecting short-term wage loss benefits (STWLB) at time t. It equals the sum of the number of workers continuing to collect from time t−1, α ∘ X_{t−1}, and the number of new claims at time t, ε_t. The waiting time 1/(1−α) is the mean number of months that a newly disabled worker is expected to collect STWLB. This is referred to as duration at the WCB and is an important input into managerial decision making. Note that it is not directly extractable from available WCB data.
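The queueing reading above gives the two managerial quantities in closed form, and the geometric-service claim is easy to check by simulating individual claimants. In this sketch (our illustration), a claimant's stay counts the first month and then continues each subsequent month with probability α:

```python
import random

def mean_duration(alpha):
    """Mean months on benefits: geometric service with parameter 1 - alpha."""
    return 1.0 / (1.0 - alpha)

def expected_queue_length(alpha, lam):
    """Little's law L = lam * W: mean number collecting benefits at once."""
    return lam * mean_duration(alpha)

rng = random.Random(3)
alpha = 0.7
stays = []
for _ in range(100_000):
    months = 1                      # the month the claim starts
    while rng.random() < alpha:     # continues with probability alpha
        months += 1
    stays.append(months)
avg_stay = sum(stays) / len(stays)  # should be near 1/(1 - 0.7) ≈ 3.33
```

With α = 0.7 the average simulated stay is close to the closed-form duration of about 3.33 months, and Little's law then gives the expected number collecting at once directly from the arrival rate.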
2.3 Basic properties

In this section we look at the stationary distribution of the Poisson AR(1) process and basic quantities such as the mean, variance, moment generating function and correlation.

For the Poisson AR(1) model the conditional mean and variance of $X_t$ given $X_{t-1}$ are respectively

$$E[X_t \mid X_{t-1}] = \alpha X_{t-1} + \lambda \quad \text{and} \quad \mathrm{Var}[X_t \mid X_{t-1}] = \alpha(1-\alpha)X_{t-1} + \lambda.$$

The Gaussian AR(1) model has the same conditional mean, but its conditional variance, $\sigma^2$, is different; see McKenzie (1988).

In the following proposition the stationary distribution for the Poisson AR(1) model is given, and from this we can find the unconditional moments of $X_t$. McKenzie (1988) sketches the proof of this result; we give a simple proof using moment generating functions.

Proposition 2.3.1 When $X_0$ is Poisson with mean $\lambda/(1-\alpha)$, the marginal distribution of $X_t$ is Poisson with mean $\lambda/(1-\alpha)$.

Proof: We use induction to prove the result. By assumption the result is true for $X_0$. We now assume $X_{t-1}$ is Poisson with mean $\lambda/(1-\alpha)$, that is, the moment generating function of $X_{t-1}$ is $M_{X_{t-1}}(s) = \exp\!\left(\frac{\lambda}{1-\alpha}(e^s - 1)\right)$. The moment generating function of $X_t$ is

$$M_{X_t}(s) = E[\exp(sX_t)] = E\big[E[\exp(s\,\alpha \circ X_{t-1} + s\varepsilon_t) \mid X_{t-1}]\big] = E\big[(\alpha e^s + (1-\alpha))^{X_{t-1}}\big]\exp(\lambda(e^s - 1))$$
$$= E[\exp(s' X_{t-1})]\exp(\lambda(e^s - 1)) = \exp\!\left(\frac{\lambda}{1-\alpha}(e^{s'} - 1)\right)\exp(\lambda(e^s - 1)),$$

where $e^{s'} = \alpha e^s + 1 - \alpha$. Making this substitution gives

$$M_{X_t}(s) = \exp\!\left(\frac{\lambda}{1-\alpha}(\alpha e^s + 1 - \alpha - 1)\right)\exp(\lambda(e^s - 1)) = \exp\!\left(\frac{\lambda}{1-\alpha}(e^s - 1)\right).$$

Thus the marginal distribution of $X_t$ is Poisson with mean $\lambda/(1-\alpha)$ and the result follows. □

Remark Since the marginal distribution of $X_t$ is Poisson, it follows that the unconditional mean and variance of $X_t$ are both equal to $\lambda/(1-\alpha)$. In addition, writing $\mu = \lambda/(1-\alpha)$, the first three marginal moments of $X_t$ are respectively $\mu$, $\mu + \mu^2$ and $\mu + 3\mu^2 + \mu^3$.
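Proposition 2.3.1 can also be verified numerically: applying one step of the transition mechanism (binomial thinning plus a Poisson innovation) to the Po($\lambda/(1-\alpha)$) distribution should return the same distribution. A minimal sketch, truncating the support at 60, which loses negligible mass for these parameter values:

```python
from math import comb, exp, factorial
import numpy as np

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)          # stationary mean, 8.67
N = 60                          # truncation point for the support

pois = np.array([exp(-mu) * mu**x / factorial(x) for x in range(N)])    # Po(mu) pmf
eps  = np.array([exp(-lam) * lam**x / factorial(x) for x in range(N)])  # Po(lam) pmf

# Push Po(mu) through one step of (2.1.1): X_t | X_{t-1} = j is Bi(alpha, j) + Po(lam).
nxt = np.zeros(N)
for j in range(N):
    for s in range(j + 1):                       # s = surviving claimants out of j
        p_js = pois[j] * comb(j, s) * alpha**s * (1 - alpha)**(j - s)
        for e in range(N - s):                   # e = new claims
            nxt[s + e] += p_js * eps[e]

print(np.max(np.abs(nxt - pois)))  # essentially zero: Po(mu) is invariant
```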
An important quantity for this model is the autocorrelation function, which is the same as in the standard Gaussian AR(1) model and is given by $\rho_k = \alpha^k$, $k = 1, 2, \dots$; see Al-Osh and Alzaid (1987).

2.4 Poisson AR(p) model

A natural extension of the Poisson AR(1) model is the following Poisson AR(p) model,

$$X_t = \alpha_1 \circ X_{t-1} + \alpha_2 \circ X_{t-2} + \cdots + \alpha_p \circ X_{t-p} + \varepsilon_t, \qquad (2.4.1)$$

where $\{\varepsilon_t\}_{t=1}^{\infty}$ is again a sequence of iid Poisson random variables, "$\circ$" is the binomial thinning operator, and, given $X_0, X_1, \dots, X_{t-1}$, the components $\alpha_1 \circ X_{t-1}, \alpha_2 \circ X_{t-2}, \dots, \alpha_p \circ X_{t-p}$ and $\varepsilon_t$ are mutually independent, with $\alpha_j \in [0,1]$, $j = 1, 2, \dots, p$. Note that the marginal distribution is Poisson only in the case $p = 1$. This extension is studied in detail in Jin-Guan and Yuan (1991).

Alzaid and Al-Osh (1990) examine a different Poisson AR(p) model in which the thinning operator is defined such that $\alpha_1 \circ X_t, \alpha_2 \circ X_t, \dots, \alpha_p \circ X_t$ given $X_t$ has a multinomial distribution with parameters $\alpha_1, \alpha_2, \dots, \alpha_p$ and $X_t$. A non-linear Poisson AR(p) model is found in Joe (1996). Joe writes his model as $X_t = A_t(X_{t-1}, X_{t-2}, \dots, X_{t-p}) + \varepsilon_t$, where the probability function of $A_t(X_{t-1}, X_{t-2}, \dots, X_{t-p})$ given $\{X_{t-1}, X_{t-2}, \dots, X_{t-p}\}$ is a $(p+1)$-dimensional sum.

The model of Jin-Guan and Yuan (1991) has the same autocorrelation function as the Gaussian AR(p) model, and its parameters are easily estimated by conditional least squares; see Theorem 2.4.4. A drawback of this model is the lack of a physical interpretation of the kind available for the Poisson AR(1) model. Jin-Guan and Yuan give no interpretation of the model, nor can we think of any physical system it could represent. However, it is often the case that ARMA models do not have a physical interpretation, so this may not be much of a concern.

It is possible to give a physical interpretation of the model in Alzaid and Al-Osh (1990), but the interpretation is a bit odd and is as follows. Let $X_t$ be the number of newborn females in period $t$.
Each female has a reproductive life span of $p$ periods and is permitted to have at most one offspring. From the cohort $X_t$ of newborn females, the distribution of their female offspring over the next $p$ periods is assumed to be multinomial. That is, $\alpha_1 \circ X_t, \alpha_2 \circ X_t, \dots, \alpha_p \circ X_t$ denote the female offspring in periods $t+1, t+2, \dots, t+p$ respectively, and it is assumed that $\alpha_1 \circ X_t, \alpha_2 \circ X_t, \dots, \alpha_p \circ X_t$ given $X_t$ has a multinomial distribution with parameters $\alpha_1, \alpha_2, \dots, \alpha_p$ and $X_t$. The innovation $\varepsilon_t$ represents the number of newborn females that immigrate into the system at time $t$. The autocorrelation function of this process is the same as that of the Gaussian ARMA(p, p-1) process.

The higher order AR(p) models of Joe (1996) are extremely difficult to use in practice. The conditional moments for his model are nonlinear and involve the calculation of multi-dimensional sums. In his AR(2) model the calculation of $E[X_t \mid X_{t-1}, X_{t-2}]$ requires the calculation of four 2-dimensional sums. This is a major drawback since it makes conditional least squares estimation much more difficult. Maximum likelihood estimation is possible but also difficult, since even when $p = 2$ the conditional probabilities involve 4-dimensional sums. Joe (1997) contains an example where an AR(2) model is estimated. However, the model does have a physical interpretation and as a special case can be interpreted as a queue. For the AR(2) model there exist random variables $Z_1, Z_2, Z_3, Z_{12}, Z_{13}, Z_{23}$ and $Z_{123}$ such that the distribution of $(X_{t+1}, X_{t+2}, X_{t+3})$ is the same as the distribution of $(Z_1 + Z_{12} + Z_{13} + Z_{123},\ Z_2 + Z_{12} + Z_{23} + Z_{123},\ Z_3 + Z_{13} + Z_{23} + Z_{123})$. This is not a queue since $Z_{13}$ represents the number present at both time $t+1$ and time $t+3$ but not present at time $t+2$. It is possible to make $Z_{13}$ degenerate and equal to zero, in which case the model becomes a queue.
In the remainder of this section we examine the properties of the Poisson AR(p) model of Jin-Guan and Yuan (1991), henceforth referred to simply as the Poisson AR(p) model. The objective of the rest of this section is to show that the sample autocorrelation function and partial autocorrelation function can be used to select the order of a Poisson AR(p) model in exactly the same way as they are used to select the order of a Gaussian AR(p) model.

Theorem 2.4.1 (Jin-Guan and Yuan, 1991) Let $\{\varepsilon_t\}_{t=1}^{\infty}$ be count valued random variables with mean $\mu$, finite variance $\sigma^2$, and let $\alpha_j \in [0,1]$, $j = 1, 2, \dots, p$. If the roots of

$$\theta^p - \alpha_1\theta^{p-1} - \alpha_2\theta^{p-2} - \cdots - \alpha_p = 0$$

are inside the unit circle, then there exists a unique stationary count valued time series $\{X_t\}$ which satisfies (2.4.1) and $\mathrm{cov}[X_s, \varepsilon_t] = 0$ for $s < t$.

The proof of this result is long and is found in Jin-Guan and Yuan (1991).

We define the sample covariance and sample correlation respectively as

$$\hat{\gamma}_k = \frac{1}{n}\sum_t (X_t - \bar{X})(X_{t+k} - \bar{X}) \quad \text{and} \quad \hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0.$$

Theorem 2.4.2 The Poisson AR(p) process $\{X_t\}$ defined by (2.4.1) is ergodic.

Theorem 2.4.3 $\hat{\gamma}_k$ and $\hat{\rho}_k$ are strongly consistent.

Jin-Guan and Yuan (1991) show that the Poisson AR(p) process $\{X_t\}$ is ergodic and use this to prove the strong consistency of the sample covariances and correlations.

The following result, also from Jin-Guan and Yuan (1991), implies that the autocorrelation function of the Poisson AR(p) model is the same as for the Gaussian AR(p) model. We provide a simple alternative proof to that found in Jin-Guan and Yuan (1991).

Proposition 2.4.1 The Yule-Walker equation, $\rho_k = \alpha_1\rho_{k-1} + \alpha_2\rho_{k-2} + \cdots + \alpha_p\rho_{k-p}$, holds for the Poisson AR(p) model.

Proof: Multiplying (2.4.1)
by $X_{t-k}$ and taking expectations, we get

$$E[X_t X_{t-k}] = \alpha_1 E[X_{t-1}X_{t-k}] + \cdots + \alpha_p E[X_{t-p}X_{t-k}] + E[\varepsilon_t X_{t-k}]. \qquad (2.4.2)$$

Next we take the expectation of (2.4.1) and multiply by $E[X_{t-k}]$ to get

$$E[X_t]E[X_{t-k}] = \alpha_1 E[X_{t-1}]E[X_{t-k}] + \cdots + \alpha_p E[X_{t-p}]E[X_{t-k}] + E[\varepsilon_t]E[X_{t-k}]. \qquad (2.4.3)$$

We note that $E[\varepsilon_t X_{t-k}] = E[\varepsilon_t]E[X_{t-k}]$ and that, because of stationarity, $E[X_{t-s}X_{t-k}] - E[X_{t-s}]E[X_{t-k}] = \mathrm{cov}[X_{t-s}, X_{t-k}] = \gamma_{k-s}$. Taking the difference between (2.4.2) and (2.4.3) we get $\gamma_k = \alpha_1\gamma_{k-1} + \alpha_2\gamma_{k-2} + \cdots + \alpha_p\gamma_{k-p}$. Finally, dividing this by $\gamma_0$ completes the proof. □

There is a common misconception that the sample autocorrelations for a series of iid data will be zero. In fact, for iid data we can expect one autocorrelation out of 20 to be larger in absolute value than $2n^{-1/2}$, and as a result values of $|\hat{\rho}_k|$ larger than $2n^{-1/2}$ are statistically significant at the 5% level.

Parameter estimates for the Poisson AR(p) model can be found using the method of conditional least squares, Klimko and Nelson (1978). In this method the parameter estimates are chosen to minimize the sum of squared distances between $X_t$ and $E[X_t \mid X_{t-1}, X_{t-2}, \dots]$. Details for the Poisson AR(1) case are found in Section 4.2.

Theorem 2.4.4 The conditional least squares estimates, $\hat{\alpha}_{1n}, \hat{\alpha}_{2n}, \dots, \hat{\alpha}_{pn}$ and $\hat{\lambda}_n$, of $\alpha_1, \alpha_2, \dots, \alpha_p$ and $\lambda$ are strongly consistent, and further

$$\sqrt{n}\big((\hat{\alpha}_{1n}, \dots, \hat{\alpha}_{pn}, \hat{\lambda}_n) - (\alpha_1, \dots, \alpha_p, \lambda)\big) \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ is a finite covariance matrix.

Jin-Guan and Yuan (1991) prove this by noting that the Poisson AR(p) model satisfies the conditions needed in Klimko and Nelson (1978) for the conditional least squares estimates to be consistent and asymptotically normal. The covariance matrix $\Sigma$ is equal to the inverse of the Godambe information, see Section 4.1.1, and an expression for it is given in Jin-Guan and Yuan (1991).

Definition 2.4.1 The $p$th partial autocorrelation, $\pi_p$, is the last coefficient, $\alpha_p$, when fitting a Poisson AR(p) model, and measures the excess correlation at lag $p$ which is not accounted for in a Poisson AR(p-1) model.
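For the AR(1) case ($p = 1$) the conditional mean $E[X_t \mid X_{t-1}] = \alpha X_{t-1} + \lambda$ is linear, so conditional least squares reduces to ordinary least squares of $X_t$ on $X_{t-1}$. The sketch below (illustrative only; the simulation parameters are arbitrary) pairs the estimator with a sample autocorrelation check of $\rho_k = \alpha^k$:

```python
import numpy as np

def simulate(n, alpha, lam, rng):
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n + 1):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

def cls_ar1(x):
    """CLS for Poisson AR(1): minimize sum_t (X_t - alpha*X_{t-1} - lam)^2."""
    z, y = x[:-1].astype(float), x[1:].astype(float)
    alpha_hat = np.cov(z, y, bias=True)[0, 1] / z.var()  # OLS slope
    lam_hat = y.mean() - alpha_hat * z.mean()            # OLS intercept
    return alpha_hat, lam_hat

def sample_acf(x, kmax):
    xc = x - x.mean()
    g0 = (xc * xc).mean()
    return np.array([(xc[:-k] * xc[k:]).mean() / g0 for k in range(1, kmax + 1)])

rng = np.random.default_rng(7)
x = simulate(20_000, alpha=0.4, lam=5.2, rng=rng)
a_hat, l_hat = cls_ar1(x)
print(a_hat, l_hat)      # near (0.4, 5.2)
print(sample_acf(x, 3))  # near (0.4, 0.16, 0.064), i.e. rho_k = alpha^k
```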
The following new result is useful in model selection, since it shows that a Poisson AR(p) process has the same partial autocorrelations as a Gaussian AR(p) process.

Corollary 2.4.1 If the series $\{X_t\}$ follows a Poisson AR(p) process and satisfies the conditions in Theorem 2.4.1, then the partial autocorrelations beyond lag $p$ are zero, that is, $\pi_{p+k} = 0$ for $k \geq 1$.

Proof: By Theorem 2.4.1, $X_t$ is uniquely defined by (2.4.1) with $p$ lags. Therefore the only way to represent $X_t$ by the equation

$$X_t = \alpha_1 \circ X_{t-1} + \alpha_2 \circ X_{t-2} + \cdots + \alpha_p \circ X_{t-p} + \alpha_{p+1} \circ X_{t-p-1} + \cdots + \alpha_{p+k} \circ X_{t-p-k} + \varepsilon_t$$

is to set $\alpha_{p+1} = \alpha_{p+2} = \cdots = \alpha_{p+k} = 0$, and hence $\pi_{p+k} = 0$. □

As a result of Theorem 2.4.4 we can use conditional least squares to find strongly consistent estimates of the partial autocorrelation coefficients. Another estimate comes from the Yule-Walker equations. Setting $k = p$ in the Yule-Walker equation and solving for $\alpha_p$ gives $\alpha_p = \rho_p - \alpha_1\rho_{p-1} - \alpha_2\rho_{p-2} - \cdots - \alpha_{p-1}\rho_1$. This suggests the following estimate of $\pi_p$,

$$\hat{\pi}_p = \hat{\rho}_p - \hat{\alpha}_1\hat{\rho}_{p-1} - \hat{\alpha}_2\hat{\rho}_{p-2} - \cdots - \hat{\alpha}_{p-1}\hat{\rho}_1, \qquad (2.4.4)$$

where $\hat{\alpha}_1, \hat{\alpha}_2, \dots, \hat{\alpha}_{p-1}$ are any $\sqrt{n}$-consistent estimates of $\alpha_1, \alpha_2, \dots, \alpha_{p-1}$ when fitting a Poisson AR(p-1) model.

If $\{X_t\}$ is a sequence of iid random variables satisfying certain mild moment restrictions, for example finite variance and fourth moment, then $\sqrt{n}\,\hat{\pi}_k$ has a standard normal asymptotic distribution, where the sample partial autocorrelation coefficient $\hat{\pi}_k$ is defined as in (2.4.4). As a result, values of $|\hat{\pi}_k|$ larger than $2n^{-1/2}$ are statistically significant at the 5% level.

2.5 An illustrative example

To illustrate the points in this chapter and the following chapters we have selected one of the series from the WCB claims data set, which is analyzed in more detail in Chapter 8.
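Equation (2.4.4) translates directly into code. The sketch below (our own illustration) computes $\hat{\pi}_p$ from sample autocorrelations and lower-order coefficient estimates; for an exact AR(1) with $\rho_k = \alpha^k$ and $\hat{\alpha}_1 = \alpha$ it returns $\pi_2 = 0$, as Corollary 2.4.1 requires:

```python
def pacf_estimate(rho_hat, alpha_hat):
    """Eq. (2.4.4): pi_p = rho_p - alpha_1*rho_{p-1} - ... - alpha_{p-1}*rho_1.

    rho_hat[i] holds the lag-(i+1) sample autocorrelation; alpha_hat holds the
    p-1 coefficient estimates from the AR(p-1) fit.
    """
    p = len(alpha_hat) + 1
    return rho_hat[p - 1] - sum(alpha_hat[j] * rho_hat[p - 2 - j] for j in range(p - 1))

# AR(1) with alpha = 0.4: rho_1 = 0.4, rho_2 = 0.16, and the AR(1) fit gives alpha_1 = 0.4
pi_2 = pacf_estimate([0.4, 0.16], [0.4])
print(pi_2)  # essentially zero: no excess correlation at lag 2
```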
As mentioned in Section 1.1, the WCB data are monthly claims counts of workers collecting STWLB and are grouped into more than 60,000 separate series. One of the categories by which the data are grouped is type of industry, and most industries are strongly affected by seasonality. Examples of seasonally affected industries are logging, hotels, restaurants, fishing, and retail. To model the series from these industries it will be necessary to add seasonal regressors to the Poisson AR(1) model defined in Section 2.1. The addition of covariates or regressors to the model is covered in Chapter 7. The heavy manufacturing industry is one which we feel is less sensitive to seasonality, and it is from this industry that we have selected a series to examine in the following example.

Example 2.5.1 The data series used in this example and in the examples of Chapters 3 through 6 consists of monthly claims counts of workers collecting STWLB from the Richmond claims center between January 1987 and December 1994. All the claimants are males between the ages of 25 and 34, are employed in the heavy manufacturing industry, and are collecting STWLB due to a burn related injury.

A time series plot of the data is given in Figure 2.5.1. From this plot the data appear to be stationary and non-seasonal. This is confirmed by the correlogram, Figure 2.5.2. In a non-stationary time series the autocorrelations do not come down to zero except at large lags. If the series were seasonal then we would expect the absolute value of the sample autocorrelation at lags 6 and 12 to be large. The correlogram for an AR(1) process with $0 < \alpha < 1$ should decay to zero exponentially as the lag increases. While the autocorrelations in our correlogram approach zero quickly, the decay does not appear exponential. A good discussion on interpreting correlograms is found in Chatfield (1989), Sections 2.7.2 and 4.1.1.
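For later reference in this example, the key model-implied quantities under the fitted values $\alpha = 0.40$ and $\lambda = 5.2$ (the maximum likelihood estimates reported in Section 4.6) and $n = 96$ observations can be computed directly:

```python
from math import sqrt

alpha, lam, n = 0.40, 5.2, 96

print(round(lam / (1 - alpha), 2))    # unconditional mean claimants per month: 8.67
print(round(1 / (1 - alpha), 2))      # mean duration in months: 1.67
print(round(1 / (1 - alpha) - 1, 2))  # memoryless: expected additional months beyond the current one: 0.67
print(round(2 / sqrt(n), 3))          # 5% significance bound for sample ACF/PACF: 0.204
```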
Figure 2.5.3 is a plot of the sample partial autocorrelations, which were calculated using (2.4.4). The data set consists of $12 \times 8 = 96$ observations. As noted at the end of Section 2.4, any partial autocorrelation larger in absolute value than $2/\sqrt{96} = 0.204$ is statistically significant at the 5% level. Consequently the second partial autocorrelation is on the border of being significant. Since the correlogram does not give any strong evidence against an AR(1), and since we have a physical interpretation which suggests an AR(1) model, it is appropriate to proceed assuming an AR(1) model.

Suppose we know the "true" parameter values are $\alpha = 0.40$ and $\lambda = 5.2$ (these are actually the maximum likelihood estimates calculated in Section 4.6). Then the unconditional expected number of claimants collecting STWLB each month is $5.2/(1-0.40) = 8.67$. The expected waiting time, or the expected number of months that a newly injured claimant can expect to be off work, is $1/(1-0.40) = 1.67$. A property of the geometric service time is that it is memoryless. That is, at the start of the current month all claimants collecting this month, both old and new, can expect to collect, in addition to the current month, for an average of 0.67 months.

Figure 2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994.
All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Chapter 3

3. Forecasting

The first three sections of this chapter concern forecasting when the model parameters are known. We begin by considering two criteria for optimal forecasting: minimum mean squared error forecasting and minimum mean absolute error forecasting. The first criterion almost always results in a non-integer forecast and is the same point forecast that the usual Gaussian AR(1) model would produce. The second criterion leads to an integer forecast, which is attractive from the point of view of data cohesion. When forecasting a future count from a low count time series there is only a small set of possible values the future outcome is likely to take. In these situations it is practical and desirable to give the individual probabilities for each outcome in the set. We call this point mass forecasting. A fourth type of forecast is the conditional mode, which is found by selecting the outcome (point mass forecast) with the largest probability. We conclude the chapter by constructing individual confidence intervals for the point mass forecasts when using parameter estimates.

3.1 Minimum mean squared error

Consider a sample $\{X_t\}_{t=0}^{N}$ from the Poisson AR(1) model of Chapter 2.
The objective is to find a forecast $\hat{X}_{N+k}$ of $X_{N+k}$ that minimizes the expected squared error given the sample, that is, to find $\hat{X}_{N+k}$ which minimizes

$$E[(X_{N+k} - \hat{X}_{N+k})^2 \mid X_N] = E[X_{N+k}^2 \mid X_N] - 2\hat{X}_{N+k}E[X_{N+k} \mid X_N] + \hat{X}_{N+k}^2.$$

The first order condition is

$$-2E[X_{N+k} \mid X_N] + 2\hat{X}_{N+k} = 0.$$

This implies that the conditional mean, $\hat{X}_{N+k} = E[X_{N+k} \mid X_N]$, is the forecast of $X_{N+k}$ that minimizes the mean squared forecast error. Note that this is a general result when forecasting time series and does not depend on the model.

The next result gives the $k$-step ahead conditional mean, which has not explicitly appeared in the literature except for the case $k = 1$. It is the same as the $k$-step ahead conditional mean for the Gaussian AR(1) model.

Proposition 3.1.1 In the Poisson AR(1) model the $k$-step ahead conditional mean is given by

$$E[X_{N+k} \mid X_N] = \alpha^k X_N + \frac{1-\alpha^k}{1-\alpha}\lambda, \qquad k = 1, 2, 3, \dots.$$

Proof: We prove the result by induction. Since $\varepsilon_t$ is independent of $\alpha \circ X_{t-1}$, the one step ahead conditional mean is

$$E[X_{N+1} \mid X_N] = E[\alpha \circ X_N + \varepsilon_{N+1} \mid X_N] = \alpha X_N + \lambda.$$

Now suppose that the $(k-1)$-step ahead conditional mean is

$$E[X_{N+k-1} \mid X_N] = \alpha^{k-1}X_N + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda.$$

Then the $k$-step ahead conditional mean is

$$E[X_{N+k} \mid X_N] = E\big[E[X_{N+k} \mid X_{N+1}] \mid X_N\big] = E\left[\alpha^{k-1}X_{N+1} + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda \,\Big|\, X_N\right] = \alpha^{k-1}[\alpha X_N + \lambda] + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda = \alpha^k X_N + \frac{1-\alpha^k}{1-\alpha}\lambda,$$

and the result follows by induction. □

It is also interesting to know the variation of $X_{N+k}$ given $X_N$ around the forecast $\hat{X}_{N+k}$, which is given in the following proposition.

Proposition 3.1.2 In the Poisson AR(1) model the $k$-step ahead conditional variance is given by

$$\mathrm{Var}[X_{N+k} \mid X_N] = \alpha^k(1-\alpha^k)X_N + \frac{1-\alpha^k}{1-\alpha}\lambda, \qquad k = 1, 2, 3, \dots.$$

Proof: We prove the result by induction. The one step ahead conditional variance is

$$\mathrm{Var}[X_{N+1} \mid X_N] = \alpha(1-\alpha)X_N + \lambda.$$

Now suppose that the $(k-1)$-step ahead conditional variance is
$$\mathrm{Var}[X_{N+k-1} \mid X_N] = \alpha^{k-1}(1-\alpha^{k-1})X_N + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda.$$

Then the $k$-step ahead conditional variance is

$$\mathrm{Var}[X_{N+k} \mid X_N] = E\big[\mathrm{Var}[X_{N+k} \mid X_{N+1}] \mid X_N\big] + \mathrm{Var}\big[E[X_{N+k} \mid X_{N+1}] \mid X_N\big]$$
$$= E\left[\alpha^{k-1}(1-\alpha^{k-1})X_{N+1} + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda \,\Big|\, X_N\right] + \mathrm{Var}[\alpha^{k-1}X_{N+1} \mid X_N]$$
$$= \alpha^{k-1}(1-\alpha^{k-1})[\alpha X_N + \lambda] + \frac{1-\alpha^{k-1}}{1-\alpha}\lambda + \alpha^{2(k-1)}[\alpha(1-\alpha)X_N + \lambda]$$
$$= [\alpha^k - \alpha^{2k-1} + \alpha^{2k-1} - \alpha^{2k}]X_N + \left[\alpha^{k-1}(1-\alpha^{k-1}) + \alpha^{2(k-1)} + \frac{1-\alpha^{k-1}}{1-\alpha}\right]\lambda$$
$$= \alpha^k(1-\alpha^k)X_N + \frac{1-\alpha^k}{1-\alpha}\lambda. \qquad \Box$$

For the Gaussian AR(1) model the $k$-step ahead conditional variance is $\frac{1-\alpha^{2k}}{1-\alpha^2}\sigma^2$.

As $k$ goes to infinity the conditional mean and variance go respectively to the stationary (unconditional) mean and variance of the process, that is,

$$\lim_{k\to\infty} E[X_{N+k} \mid X_N] = \frac{\lambda}{1-\alpha} \quad \text{and} \quad \lim_{k\to\infty} \mathrm{Var}[X_{N+k} \mid X_N] = \frac{\lambda}{1-\alpha}.$$

A third result, which actually includes the two previous results, is the conditional moment generating function of $X_{N+k}$ given $X_N$.

Theorem 3.1.1 For the Poisson AR(1) model the distribution of $X_{N+k}$ given $X_N$ is the convolution of a binomial distribution with parameters $\alpha^k$ and $X_N$ and a Poisson distribution with parameter $\lambda\frac{1-\alpha^k}{1-\alpha}$. That is, the $k$-step ahead conditional moment generating function is given by

$$m_{X_{N+k}|X_N}(s) = \big[\alpha^k e^s + (1-\alpha^k)\big]^{X_N}\exp\left\{\lambda\frac{1-\alpha^k}{1-\alpha}(e^s - 1)\right\}. \qquad (3.1.1)$$

Proof: We prove the result by induction. The one step ahead conditional moment generating function is

$$m_{X_{N+1}|X_N}(s) = E[\exp(s\,\alpha \circ X_N + s\varepsilon_{N+1}) \mid X_N] = [\alpha e^s + (1-\alpha)]^{X_N}\exp\{\lambda(e^s - 1)\}.$$

Now suppose that the $(k-1)$-step ahead conditional moment generating function is

$$m_{X_{N+k-1}|X_N}(s) = \big[\alpha^{k-1}e^s + (1-\alpha^{k-1})\big]^{X_N}\exp\left\{\lambda\frac{1-\alpha^{k-1}}{1-\alpha}(e^s - 1)\right\}.$$

Then the $k$-step ahead conditional moment generating function is

$$m_{X_{N+k}|X_N}(s) = E\big[m_{X_{N+k}|X_{N+1}}(s) \mid X_N\big] = E\big[[\alpha^{k-1}e^s + (1-\alpha^{k-1})]^{X_{N+1}} \mid X_N\big]\exp\left\{\lambda\frac{1-\alpha^{k-1}}{1-\alpha}(e^s - 1)\right\}$$
$$= m_{X_{N+1}|X_N}(s')\exp\left\{\lambda\frac{1-\alpha^{k-1}}{1-\alpha}(e^s - 1)\right\},$$

where $e^{s'} = \alpha^{k-1}e^s + (1-\alpha^{k-1})$.
Substituting for $e^{s'}$ gives

$$m_{X_{N+k}|X_N}(s) = \big(\alpha[\alpha^{k-1}e^s + (1-\alpha^{k-1})] + (1-\alpha)\big)^{X_N}\exp\{\lambda(e^{s'} - 1)\}\exp\left\{\lambda\frac{1-\alpha^{k-1}}{1-\alpha}(e^s - 1)\right\}$$
$$= \big[\alpha^k e^s + (1-\alpha^k)\big]^{X_N}\exp\{\lambda\alpha^{k-1}(e^s - 1)\}\exp\left\{\lambda\frac{1-\alpha^{k-1}}{1-\alpha}(e^s - 1)\right\} = \big[\alpha^k e^s + (1-\alpha^k)\big]^{X_N}\exp\left\{\lambda\frac{1-\alpha^k}{1-\alpha}(e^s - 1)\right\},$$

since $e^{s'} - 1 = \alpha^{k-1}(e^s - 1)$ and $\alpha^{k-1} + \frac{1-\alpha^{k-1}}{1-\alpha} = \frac{1-\alpha^k}{1-\alpha}$. □

The above result shows that the distribution of $X_{N+k}$ given $X_N$ is a convolution of a binomial distribution with parameters $\alpha^k$ and $X_N$ and a Poisson distribution with parameter $\lambda\frac{1-\alpha^k}{1-\alpha}$. Hence it has mean $\alpha^k X_N + \lambda\frac{1-\alpha^k}{1-\alpha}$ and variance $\alpha^k(1-\alpha^k)X_N + \lambda\frac{1-\alpha^k}{1-\alpha}$, which of course agrees with Propositions 3.1.1 and 3.1.2. This extends the results in McKenzie (1988) to cases where $k > 1$.

In the usual Gaussian AR(1) model, $X_{N+k} \mid X_N$ has a normal distribution with mean $\alpha^k X_N + \frac{1-\alpha^k}{1-\alpha}\lambda$ and variance $\frac{1-\alpha^{2k}}{1-\alpha^2}\sigma^2$. So while the conditional means of $X_{N+k} \mid X_N$ in the Poisson and Gaussian models are the same, their conditional distributions are quite different.

Corollary 3.1.1 Let $\mu_k$ denote the distribution of $X_{N+k} \mid X_N$ and let $\mu$ be the distribution of a Poisson random variable with mean $\frac{\lambda}{1-\alpha}$. Then $\mu_k \Rightarrow \mu$; that is, $\mu_k$ converges weakly to $\mu$, or $X_{N+k} \mid X_N$ has a Poisson limiting distribution with mean $\frac{\lambda}{1-\alpha}$.

Proof: From (3.1.1) we have

$$\lim_{k\to\infty} m_{X_{N+k}|X_N}(s) = \exp\left\{\frac{\lambda}{1-\alpha}(e^s - 1)\right\},$$

which is the moment generating function of a Poisson distribution with mean $\frac{\lambda}{1-\alpha}$. The result follows since convergence of the moment generating function implies convergence of the probability measure whenever the moment generating function exists in a neighborhood of zero; see for example Billingsley (1986). □

3.2 Minimum mean absolute error

The objective in this section is to find a forecast $\hat{X}_{N+k}$ of $X_{N+k}$ that minimizes the expected absolute error given the sample, that is, to find $\hat{X}_{N+k}$ which minimizes $E[\,|X_{N+k} - \hat{X}_{N+k}|\mid X_N]$.

Let $p_k(x \mid X_N)$ be the conditional probability function of $X_{N+k}$ given $X_N$. We will define the conditional median of $X_{N+k}$ given $X_N$ as the smallest non-negative integer $m_k$ for which $\sum_{x=0}^{m_k} p_k(x \mid X_N) \geq 0.5$. An alternative definition is to let $m_k$ be the largest non-negative integer such that $\sum_{x=0}^{m_k} p_k(x \mid X_N) < 0.5$.
However, if $p_k(0 \mid X_N) > 0.5$ then under this alternative definition the median would not be defined.

Proposition 3.2.1 In the Poisson AR(1) model the $k$-step ahead conditional median is the forecast which minimizes the expected absolute forecast error. That is, $E[\,|X_{N+k} - \hat{X}_{N+k}|\mid X_N]$ has a global minimum at $\hat{X}_{N+k} = m_k$.

Proof: Suppose that $\hat{X}_{N+k}$ is between $m-1$ and $m$, where $m$ is a non-negative integer. Then

$$E[\,|X_{N+k} - \hat{X}_{N+k}|\mid X_N] = \sum_{x \geq m}(x - \hat{X}_{N+k})\,p_k(x \mid X_N) - \sum_{x < m}(x - \hat{X}_{N+k})\,p_k(x \mid X_N).$$

The slope of this expectation as a function of $\hat{X}_{N+k}$ is $\sum_{x<m} p_k(x \mid X_N) - \sum_{x \geq m} p_k(x \mid X_N)$. If $m \leq m_k$ the slope is negative, and if $m \geq m_k + 1$ the slope is positive. The minimum therefore occurs at $\hat{X}_{N+k} = m_k$. □

It is interesting to note that we did not have to restrict our search to values in the non-negative integers in order to get an integer solution.

3.3 Point mass forecasts

In this section we look at point mass forecasts, or the $k$-step ahead conditional distribution. In cases where the counts are low, the median forecast may not be very informative. For example, consider the following two cases of a discrete random variable $X$: case 1, $P(X=0) = 1 - P(X=1) = 0.50$, and case 2, $P(X=0) = 1 - P(X=5) = 0.90$. In both cases the median of $X$ is 0 and the mean is 0.5, but in case 2 there is almost twice the probability of observing a zero. Since there are only two outcomes in each case, it would be more informative to give the probability distribution of the outcomes.

By Theorem 3.1.1 the distribution of $X_{N+k}$ given $X_N$ is a convolution of a binomial distribution with parameters $\alpha^k$ and $X_N$ and a Poisson distribution with parameter $\lambda\frac{1-\alpha^k}{1-\alpha}$. The probability mass function of $X_{N+k}$ given $X_N$ is

$$p_k(x \mid X_N) = \sum_{s=0}^{\min(x, X_N)}\binom{X_N}{s}\alpha^{ks}(1-\alpha^k)^{X_N - s}\,\exp\!\left(-\lambda\frac{1-\alpha^k}{1-\alpha}\right)\frac{\left(\lambda\frac{1-\alpha^k}{1-\alpha}\right)^{x-s}}{(x-s)!}.$$

In the following example we illustrate the conditional mean, median, mode and point mass forecasts. The conditional mode is easy to find since it is the non-negative integer $x$ at which the probability mass $p_k(x \mid x_N)$ is largest.

Example 3.3.1 This is a continuation of Example 2.5.1. Our series consists of $8 \times 12 = 96$ observations and the last observation is $X_{96} = 11$. Again we assume that the "true" parameter values are $\alpha = 0.40$ and $\lambda = 5.2$. Figure 3.3.1 is a time series plot of the observations and predicted values (one step ahead conditional mean forecasts); it indicates that the one step ahead conditional mean forecasts are reasonably good. Conditional forecasts for the first 6 months of 1995, given a count of 11 in December 1994, are displayed in Table 3.3.1. The table includes the $k$-step ahead conditional mean and median forecasts as well as the $k$-step ahead conditional distribution (point mass forecasts). The last column contains the limiting (unconditional) distribution.

Figure 3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994, with predicted values. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury.

                  k=1     k=2     k=3     k=4     k=5     k=6     k=inf
mean              9.60    9.04    8.82    8.73    8.69    8.68    8.67
median            9       9       9       9       9       9       8
mode              9       9       8       8       8       8       8
p_k(x<=4 | 11)    0.025   0.051   0.061   0.065   0.066   0.067   0.067
p_k(5 | 11)       0.038   0.058   0.066   0.068   0.069   0.070   0.070
p_k(6 | 11)       0.068   0.089   0.097   0.099   0.101   0.101   0.101
p_k(7 | 11)       0.101   0.117   0.122   0.124   0.125   0.125   0.126
p_k(8 | 11)       0.129   0.133   0.135   0.135   0.136   0.136   0.136
p_k(9 | 11)       0.142   0.134   0.132   0.131   0.131   0.131   0.131
p_k(10 | 11)      0.138   0.121   0.116   0.115   0.114   0.114   0.113
p_k(11 | 11)      0.118   0.099   0.093   0.091   0.090   0.090   0.089
p_k(12 | 11)      0.091   0.074   0.068   0.066   0.065   0.065   0.065
p_k(13 | 11)      0.063   0.051   0.046   0.044   0.044   0.043   0.043
p_k(14 | 11)      0.040   0.032   0.029   0.028   0.027   0.027   0.027
p_k(x>=15 | 11)   0.046   0.040   0.035   0.033   0.032   0.032   0.032

Table 3.3.1 $k$-step ahead conditional means, medians, modes and point mass forecasts.

Remarks

1.
The conditional mean forecasts quickly approach the unconditional mean of 8.67.

2. The conditional median forecast is 9 and remains unchanged except in the limit, where $\sum_{x=0}^{8} p_\infty(x \mid 11) = 0.50$ and the median is 8.

3. The conditional mode is easy to find from the table of point mass forecasts (conditional distribution). For example, the largest 1-step ahead point mass, $p_1(x \mid 11)$, is 0.142 and occurs when $x = 9$.

4. The 6-step ahead conditional distribution is very close to the limiting distribution.

3.4 Prediction intervals

The point mass forecasts in Section 3.3 were calculated assuming the "true" parameter values were known. Since in practice we have to estimate the parameters, we would like to include this uncertainty by constructing point mass prediction intervals. The following result, which is found in Serfling (1980), will be used to develop the prediction intervals.

Theorem 3.4.1 Suppose that the sequence $\{Y_n\}_{n=1}^{\infty}$ of $k$-dimensional random vectors is asymptotically normal with mean $\mu$ and covariance matrix $n^{-1}\Sigma$, $\Sigma = [\sigma_{ij}]$. Let $g(y)$ be a real-valued function having a nonzero derivative at $y = \mu$. Then $g(Y_n)$ is asymptotically normal with mean $g(\mu)$ and variance $n^{-1}\,\nabla g(\mu)'\,\Sigma\,\nabla g(\mu)$.

Now suppose we have a sample of size $n$ and denote the maximum likelihood estimates for this sample by $\hat{\alpha}_n$ and $\hat{\lambda}_n$. Under mild regularity conditions, see Section 4.4, $(\hat{\alpha}_n, \hat{\lambda}_n)$ is asymptotically normal with mean $(\alpha_0, \lambda_0)$ and variance $n^{-1}I^{-1}$, where $I$ is the Fisher information matrix and $\alpha_0$ and $\lambda_0$ are the "true" parameter values.

As a consequence of the above result, for fixed $x$, $p_k(x \mid X_N; \hat{\alpha}_n, \hat{\lambda}_n)$ has an
The 1 x partial derivatives of the point mass probability p (x\X ;a,X) k can either be found n directly or with the help of the expressions in Section 4.5 and are given by â€”p (x\X ) da k n = (p (x-\\X -\)-p k 1-a n (x\ X j)ka ~ k k . { ( -\\X )- (x\X ))X- + Pk x n Pk x n \-ka - -(k-\)a k l k K ( 3 A 2 ) } n (1-aY and ~ Pk <A x ) = {p (x-WdX n k P k (3.4.3) (x\ x )) n 1-a A n approximate 95% confidence interval for p (x\ X ;a , X ) is k p (x\X ;a ,X )Â±2v k n n n n 0 0 (x;a ,Xâ€žy k n (3.4.4) We now continue Example 3.3.1. Example 3.4.1 The maximum likelihood estimates for our data set are d = 0.40 and X, = 5.2. The inverse of the expected Fisher information matrix at these parameter estimates is given by Chapter 3. Forecasting 39 0.62 1 ~ [-5.17 -5.17 50.05 The details of these calculations are found in Example 4.4.1. In section 4.4 we show that the Poisson AR(1) model satisfies the regularity conditions for the maximum likelihood estimates to be asymptotically normal. Consequently we can use (3.4.4) to construct individual approximate 95% prediction intervals for the k-step ahead point mass forecasts. Table 3.4.1 contains the 95% prediction intervals for the first 6 months of 1995. Remarks 1) The width of the prediction intervals increases with the number of steps ahead. However after about 6 steps ahead the width changes very little as it is very close to its maximum. 2) The prediction intervals for the 6 step ahead conditional distribution is almost the same as the prediction intervals for the unconditional (limiting) distribution. 3) These are individual confidence intervals for the probabilities in the k-step ahead conditional distribution NOT the forecasts. Chapter 3. Forecasting k- 1 A(O|II) A(2|ll) 40 k= 3 k=2 k-5 k=4 k-/ k=6 0 000 OOOf 0.000 0.000 0 000 0 000 0.000 0.000 0 000 0 000 0.000 0.000 0 000 0 000 0 000 0.000 0.002 0 000 0 002 0.000 0.000 0.000 0.003 0 000 0 005 0.002 0.011 0 002 0 011 0.030. 
Table 3.4.1 95% confidence intervals for the $k$-step ahead conditional distribution.

3.5 Duration

In this section we examine duration, which is the number of months that the WCB continues to pay a claim. Under the Poisson AR(1) model assumption the duration time is geometric with parameter $1-\alpha$, see Section 2.2, and the mean duration is $(1-\alpha)^{-1}$.

Suppose we have a sample of size $n$ and denote the maximum likelihood
As a result of Theorem 3.4.1, the estimated duration $(1-\hat\alpha_n)^{-1}$ is asymptotically normal with mean $(1-\alpha_0)^{-1}$ and variance $(1-\alpha_0)^{-4}\sigma^2(\alpha_0,\lambda_0)\,n^{-1}$. Hence an approximate 95% confidence interval for the mean duration is

$$(1-\hat\alpha_n)^{-1} \pm 1.96\,(1-\hat\alpha_n)^{-2}\sqrt{\sigma^2(\hat\alpha_n,\hat\lambda_n)/n}.$$

Example 3.5.1 For our illustrative example the estimated duration is $(1-0.40)^{-1} = 1.667$ months, and an approximate 95% confidence interval for the mean duration is $1.667 \pm 1.96\,(1-0.40)^{-2}\sqrt{0.62/96}$, or $(1.229, 2.104)$.

Chapter 4

4. Estimation

Section 4.1 outlines the asymptotic theory of estimating functions and illustrates it by reviewing estimation and inference for the Gaussian AR(1) model. In Sections 4.2 and 4.3 we show that the methods of conditional least squares (CLS) and generalized least squares (GLS) satisfy the regularity conditions for asymptotic normality of Section 4.1. Further, we derive an analytic expression for the Godambe information matrix in the CLS case and an approximation to the Godambe information matrix in the GLS case. Then in Section 4.4 we show that the score function and observed Fisher information matrix can be neatly represented as conditional expectations. In Section 4.5 we generalize the expressions of Section 4.4 to general AR(1) models. In Section 4.6 we show that the method of maximum likelihood (ML) estimation satisfies the regularity conditions for asymptotic normality. Finally, in Section 4.7 we compare the asymptotic efficiency and robustness of the CLS and ML estimators.

4.1 Likelihood theory and estimating functions

The goal of this section is to present a unified approach to estimation through the use of estimating functions. We adopt the estimating function approach because it includes many standard estimation techniques, such as maximum likelihood, the generalized method of moments, conditional least squares and generalized least squares.
A detailed review of likelihood theory and an introduction to estimating functions is found in Barndorff-Nielsen and Sørensen (1994). Godambe and Heyde (1987) give more details on the asymptotics and optimality of estimating functions. Both of these papers consider estimation and inference for multivariate continuous time stochastic processes; in this review, however, we restrict ourselves to univariate discrete time stochastic processes. Many applications of estimating functions can be found in Godambe (1991).

Let $\{X_t\}_{t=1}^\infty$ be a discrete time stochastic process for which we consider parametric statistical models of the form $(\Omega^n, \mathcal{F}^n, P_\theta^n;\ \theta \in \Theta)$, where $\Omega^n$ is the sample space, $\mathcal{F}^n$ is a sigma field, $P_\theta^n$ is a probability measure defined on the measurable space $(\Omega^n, \mathcal{F}^n)$ and $\Theta$ is an open subset of $\mathbb{R}^p$. We assume that all of the probability measures in the family $\mathcal{P}^n = \{P_\theta^n : \theta \in \Theta\}$ are dominated by a common $\sigma$-finite measure $\mu$, and we let $dP_\theta^n/d\mu$ denote a version of the Radon-Nikodym derivative.

Definition 4.1.1 We call a p-dimensional function $\Psi_n(\theta) = \Psi_n(X_1, X_2, \dots, X_n; \theta)$ a regular estimating function if it satisfies the following conditions for all $\theta = (\theta_1, \theta_2, \dots, \theta_p) \in \Theta$:

1) $\Psi_n(\theta)$ is a zero mean, square integrable martingale with respect to $P_\theta^n$ and the sigma fields $\mathcal{F}_n = \sigma(X_1, X_2, \dots, X_n)$;
2) the covariance matrix $E_\theta[\Psi_n(\theta)\Psi_n(\theta)^T]$ is positive definite;
3) $\Psi_n$ is almost surely differentiable with respect to the components of $\theta$;
4) $\dot\Psi_n$ is nonsingular, where $\dot\Psi_n$ denotes the matrix of partial derivatives of $\Psi_n$ with respect to the components of $\theta$.

The parameter $\theta$ is estimated by solving the system of equations $\Psi_n(\theta) = 0$, and we let $\hat\theta_n$ denote the estimate. Estimating functions, or estimating equations, arise from many standard estimation techniques; the first order conditions for minimizing the conditional least squares criterion are one example of an estimating equation.
The score function is another common estimating function. In the case where $\Psi_n(\theta)$ is the score function we write $U_n$ instead of $\Psi_n$. Finally, note that $\Psi_n$ is also called an inference function, since both estimation and inference are based on it.

Definition 4.1.2 A sequence $\{\mathcal{F}_t\}_{t=1}^\infty$ of sigma fields is said to be adapted to a sequence $\{X_t\}_{t=1}^\infty$ of random variables if $X_t$ is $\mathcal{F}_t$ measurable. Further, $\{X_t, \mathcal{F}_t\}_{t=1}^\infty$ is called an adapted sequence.

Definition 4.1.3 Let $\{\mathcal{F}_t\}_{t=1}^\infty$ be an increasing sequence of sigma fields. An adapted sequence $\{X_t, \mathcal{F}_t\}_{t=1}^\infty$ on a probability space $(\Omega, \mathcal{F}, P)$ is called a martingale if for all $t$, $E[|X_t|]$ exists and is finite and $E[X_t | \mathcal{F}_{t-1}] = X_{t-1}$.

Definition 4.1.4 A stochastic process $\{X_t\}_{t=1}^\infty$ is called a martingale difference sequence if $X_t$ is measurable with respect to $\mathcal{F}_t$ and $E[X_t | \mathcal{F}_{t-1}] = 0$.

Remarks
1) Unless otherwise stated we assume our martingales to be vector valued, and we denote the transpose of $X_t$ by $X_t^T$.
2) If $\{X_t\}_{t=1}^\infty$ is a martingale difference sequence then the sequence is uncorrelated. That is, for all $s$ and $t$ such that $s < t$, $E[X_s X_t] = E\big[X_s E[X_t | \mathcal{F}_s]\big] = E[X_s \cdot 0] = 0$.
3) If the sequence $\{X_t\}_{t=1}^\infty$ is a martingale then a martingale difference sequence $\{Y_t\}_{t=1}^\infty$ can be formed by defining $Y_t = X_t - X_{t-1}$. Further, since the differences are uncorrelated, the variance of the martingale is the sum of the variances of its differences.

In the following well known example we illustrate Definitions 4.1.1 to 4.1.4.

Example 4.1.1 Let $\mu$ be Lebesgue measure, $\Omega^{n+1} = \mathbb{R}^{n+1}$, $\mathcal{F}^{n+1}$ be the Borel sigma field of $\mathbb{R}^{n+1}$ and $\Theta = (0,1) \times \mathbb{R}_+$, where $\mathbb{R}_+$ is the set of positive real numbers. The family of probability measures $\mathcal{P}^{n+1}$ is defined by the following version of the Radon-Nikodym derivative,

$$p(X; \alpha, \lambda) = \left[\prod_{t=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_t - \alpha X_{t-1} - \lambda)^2}\right]\sqrt{\frac{1-\alpha^2}{2\pi}}\; e^{-\frac{1-\alpha^2}{2}\left(X_0 - \frac{\lambda}{1-\alpha}\right)^2},$$

where $(\alpha, \lambda) \in \Theta$ and $X = (X_0, X_1, \dots, X_n) \in \mathbb{R}^{n+1}$. This is the Gaussian AR(1) model defined in (2.1.2), with $\alpha \in (0,1)$, $\sigma^2 = 1$ and $X_0 = \frac{\lambda}{1-\alpha} + \varepsilon_0$, where $\varepsilon_0$ is normal with mean zero and variance $1/(1-\alpha^2)$.
Note that defining $X_0$ in this way makes the unconditional distribution of $X_t$ normal with mean $\lambda/(1-\alpha)$ and variance $1/(1-\alpha^2)$. Note that $p(X;\alpha,\lambda)$ is the likelihood, and the log-likelihood is proportional to

$$l(\alpha, \lambda, \sigma^2; X) = -\frac{1}{2}\sum_{t=1}^n (X_t - \lambda - \alpha X_{t-1})^2 + \frac{1}{2}\log(1-\alpha^2) - \frac{1-\alpha^2}{2}\left(X_0 - \frac{\lambda}{1-\alpha}\right)^2.$$

The relative contribution of the last two terms of this expression to the log-likelihood is small for large samples. Since our primary objective is to obtain asymptotic properties, we consider instead the following approximation to the log-likelihood,

$$l(\alpha, \lambda, \sigma^2) = -\frac{1}{2}\sum_{t=1}^n (X_t - \lambda - \alpha X_{t-1})^2.$$

The score function associated with this log-likelihood is

$$U_n(\alpha, \lambda) = \sum_{t=1}^n \begin{pmatrix} X_{t-1}(X_t - \lambda - \alpha X_{t-1}) \\ X_t - \lambda - \alpha X_{t-1} \end{pmatrix}. \qquad (4.1.1)$$

It is useful to express $U_n$ as

$$U_n(\alpha, \lambda) = \sum_{t=1}^n \begin{pmatrix} X_{t-1}\varepsilon_t \\ \varepsilon_t \end{pmatrix}, \qquad \varepsilon_t = X_t - \lambda - \alpha X_{t-1}.$$

Next we show that $U_n$ is a regular estimating function. It is well known that score functions are martingales, and it is easy to verify directly that $U_n$ is a martingale with respect to $\mathcal{F}_n = \sigma(X_0, X_1, X_2, \dots, X_n)$:

$$E[U_n | \mathcal{F}_{n-1}] = \sum_{t=1}^{n-1}\begin{pmatrix} X_{t-1}\varepsilon_t \\ \varepsilon_t\end{pmatrix} + E\left[\begin{pmatrix} X_{n-1}\varepsilon_n \\ \varepsilon_n\end{pmatrix}\Bigg|\,\mathcal{F}_{n-1}\right] = U_{n-1}.$$

The martingale differences of the score are $u_t = (X_{t-1}\varepsilon_t,\ \varepsilon_t)^T$, which have the following variance:

$$E[u_t u_t^T] = \begin{pmatrix} E[X_{t-1}^2\varepsilon_t^2] & E[X_{t-1}\varepsilon_t^2] \\ E[X_{t-1}\varepsilon_t^2] & E[\varepsilon_t^2] \end{pmatrix} = \begin{pmatrix} \dfrac{1}{1-\alpha^2} + \left(\dfrac{\lambda}{1-\alpha}\right)^2 & \dfrac{\lambda}{1-\alpha} \\ \dfrac{\lambda}{1-\alpha} & 1 \end{pmatrix}.$$

This matrix is positive definite, since for any 2-dimensional vector $l = (l_1, l_2)^T$,

$$l^T E[u_t u_t^T]\, l = \frac{l_1^2}{1-\alpha^2} + \left(l_1\frac{\lambda}{1-\alpha} + l_2\right)^2,$$

which is positive except if $l_1 = l_2 = 0$. The variance of $U_n$ is simply $n$ times the variance of $u_t$, and is positive definite. The partial derivatives of $U_n$ exist and are given by the following matrix:

$$\dot U_n = -\sum_{t=1}^n \begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}, \qquad (4.1.2)$$

which has determinant $\det(\dot U_n) = n\sum_{t=1}^n (X_{t-1} - \bar X)^2$, where $\bar X = n^{-1}\sum_{t=1}^n X_{t-1}$; this is strictly positive except in the case where $X_0 = X_1 = \cdots = X_{n-1}$. Therefore $U_n$ is a regular estimating function.
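The score (4.1.1) and its root can be computed directly. The sketch below (an illustrative series constructed to satisfy $X_t = 0.5X_{t-1} + 2$ exactly, so the least squares fit is exact; the helper names are ours) checks that the regression solution drives both score components to zero:

```python
def score(x, alpha, lam):
    """Approximate Gaussian AR(1) score (4.1.1):
    U_n = sum over t of (X_{t-1}*e_t, e_t), e_t = X_t - lam - alpha*X_{t-1}."""
    pairs = list(zip(x[:-1], x[1:]))
    u1 = sum(xl * (xc - lam - alpha * xl) for xl, xc in pairs)
    u2 = sum(xc - lam - alpha * xl for xl, xc in pairs)
    return u1, u2

def solve_score(x):
    """Root of U_n = 0: least squares regression of X_t on (X_{t-1}, 1)."""
    n = len(x) - 1
    lag, cur = x[:-1], x[1:]
    mlag, mcur = sum(lag) / n, sum(cur) / n
    alpha = sum((a - mlag) * (b - mcur) for a, b in zip(lag, cur)) / \
            sum((a - mlag) ** 2 for a in lag)
    return alpha, mcur - alpha * mlag

x = [0.0]
for _ in range(10):
    x.append(0.5 * x[-1] + 2.0)   # noiseless AR(1) recursion

a_hat, l_hat = solve_score(x)
u1, u2 = score(x, a_hat, l_hat)
```

Because the test series lies exactly on the regression line, `solve_score` recovers $(\alpha, \lambda) = (0.5, 2)$ and the score vanishes there up to floating point error.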
The solution of $U_n(\alpha,\lambda) = 0$ gives the following parameter estimates:

$$\hat\alpha_n = \frac{\sum_{t=1}^n (X_{t-1} - \bar X_{(0)})(X_t - \bar X_{(1)})}{\sum_{t=1}^n (X_{t-1} - \bar X_{(0)})^2}, \qquad (4.1.3)$$

$$\hat\lambda_n = \frac{1}{n}\sum_{t=1}^n (X_t - \hat\alpha_n X_{t-1}), \qquad (4.1.4)$$

where $\bar X_{(0)} = n^{-1}\sum_{t=1}^n X_{t-1}$ and $\bar X_{(1)} = n^{-1}\sum_{t=1}^n X_t$.

Definition 4.1.5 Let $\{m_t\}_{t=1}^\infty$ be a sequence of martingale differences which generates the martingale $M_n = \sum_{t=1}^n m_t$. Then $\langle M\rangle_n = \sum_{t=1}^n E[m_t m_t^T | \mathcal{F}_{t-1}]$ is called the quadratic characteristic of $M_n$, and $[M]_n = \sum_{t=1}^n m_t m_t^T$ is called the quadratic variation of $M_n$.

Remark If $M_n$ is a martingale with respect to $\mathcal{F}_n$ then $[M]_n - \langle M\rangle_n$ is also a martingale with respect to the same sigma field $\mathcal{F}_n$.

Definition 4.1.6 A sequence $\{X_t\}_{t=1}^\infty$ is said to be stationary if for every finite $k$ the joint distribution of the vector $(X_t, X_{t+1}, \dots, X_{t+k})$ is independent of $t$.

Definition 4.1.7 A sequence $\{X_t\}_{t=1}^\infty$ is said to be $a$-mixing if there exists a sequence of positive numbers $\{a(n)\}$, convergent to zero, such that

$$|P(A \cap B) - P(A)P(B)| \le a(n)$$

for any set $A \in \sigma(X_1, X_2, \dots, X_k)$, any set $B \in \sigma(X_{k+n}, X_{k+n+1}, \dots)$ and $k \ge 1$, $n \ge 1$.

Definition 4.1.8 Let $\{X_t\}_{t=1}^\infty$ be a stationary stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$. If for every two sets $A$ and $B \in \mathcal{F}$,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^n P(X_1 \in A \cap X_t \in B) = P(X_1 \in A)\,P(X_1 \in B),$$

then the sequence $\{X_t\}_{t=1}^\infty$ is called ergodic.

Theorem 4.1.1 Let $g$ be a measurable function onto $\mathbb{R}^k$ and define $Y_t = g(X_t, X_{t+1}, \dots, X_{t+s})$. Then
1) if $\{X_t\}_{t=1}^\infty$ is stationary then $\{Y_t\}_{t=1}^\infty$ is stationary;
2) if $\{X_t\}_{t=1}^\infty$ is $a$-mixing with $a(n) = O(n^{-r})$ for some $r > 0$ then $\{Y_t\}_{t=1}^\infty$ is $a$-mixing with $a(n) = O(n^{-r})$;
3) if $\{X_t\}_{t=1}^\infty$ is ergodic then $\{Y_t\}_{t=1}^\infty$ is ergodic.

For a proof of parts 1 and 3 see Stout (1974, pp. 170, 182), and for a proof of part 2 see White and Domowitz (1984, Lemma 2.1).

Remarks
1) Parts 1 and 3 hold even when $s$ is infinite.
2) If $\{X_t\}_{t=1}^\infty$ is stationary and $a$-mixing then it is ergodic. The converse is not always true.

Theorem 4.1.2 (Ergodic theorem) If $\{X_t\}_{t=1}^\infty$ is stationary ergodic with $E[|X_1|] < \infty$ then $n^{-1}\sum_{t=1}^n X_t \to E[X_1]$ almost surely as $n \to \infty$.

For a proof of the Ergodic theorem see Stout (1974, p. 181).

The following regularity conditions are needed to establish a central limit theorem. Let $\{m_t\}_{t=1}^\infty$ be a sequence of martingale differences which generates the martingale $M_n = \sum_{t=1}^n m_t$.

C1) $\lim_{n\to\infty} \operatorname{Var}[m_n]/\operatorname{Var}[M_n] = 0$;
C2) $\langle M\rangle_n / \operatorname{Var}[M_n] \xrightarrow{p} \eta^2$;
C2') $[M]_n / \operatorname{Var}[M_n] \xrightarrow{p} \eta^2$.

Theorem 4.1.3 If $M_n$ is a one-dimensional zero mean square integrable martingale satisfying conditions C1 and C2 then

$$\langle M\rangle_n^{-1/2}\, M_n \xrightarrow{d} N(0,1) \quad \text{(mixing)}, \qquad (4.1.5)$$

and if condition C2 is replaced by condition C2' then

$$[M]_n^{-1/2}\, M_n \xrightarrow{d} N(0,1) \quad \text{(mixing)}. \qquad (4.1.6)$$

A proof of this result can be found in Hall and Heyde (1980, ch. 3).

Remarks
1) Condition C1 can be replaced by weaker conditions; see Hall and Heyde (1980, ch. 3).
2) Loosely speaking, the regularity conditions ensure that $M_n$ is not dominated by a few of the $m_t$'s and that the variance is standardized to unity.
3) If the martingale difference sequence is stationary and $\operatorname{Var}[m_t] < \infty$ then condition C1 is satisfied.
4) If the martingale difference sequence $\{m_t\}_{t=1}^\infty$ is stationary ergodic and $E[m_t^2] < \infty$, then conditions C2 and C2' are satisfied with $\eta^2 = 1$ almost surely, and the result is called asymptotic normality.
5) If $\eta^2$ is not degenerate the result is called mixed asymptotic normality and the model is called non-ergodic.
6) For the multivariate case, one simply checks that for all non-zero p-dimensional vectors $l$ the conditions for the univariate martingale $l^T M_n$ hold. In this case the Cramer-Wold (1936) device says that (4.1.5) and (4.1.6) hold for the vector $M_n$.

Example 4.1.2 In this example we show that the estimating function $U_n$ in Example 4.1.1 satisfies condition C1.
For any 2-dimensional vector $l = (l_1, l_2)^T$ we have $\operatorname{Var}[l^T u_t] = l^T E[u_t u_t^T]\, l$, which we showed in Example 4.1.1 to be equal to

$$\frac{l_1^2}{1-\alpha^2} + \left(l_1\frac{\lambda}{1-\alpha} + l_2\right)^2.$$

The variance of $l^T U_n$ is simply $n$ times this. Substituting into condition C1 we get

$$\lim_{n\to\infty} \frac{\operatorname{Var}[l^T u_n]}{\operatorname{Var}[l^T U_n]} = \lim_{n\to\infty} \frac{1}{n} = 0.$$

We defer verifying conditions C2 and C2' to Example 4.1.5.

Theorem 4.1.4 (Continuous Mapping Theorem, CMT) Let $Z_n$ and $Z$ be random vectors from some sample space $\Omega$ to k-dimensional Euclidean space $\mathbb{R}^k$. Further, let $g(\cdot)$ be a measurable function from $\mathbb{R}^k$ to $\mathbb{R}^m$. We allow $g(\cdot)$ to be discontinuous on a set $D_g$ of measure zero. If $Z_n \xrightarrow{d} Z$ and if $P(Z \in D_g) = 0$, then the Continuous Mapping Theorem says that $g(Z_n) \xrightarrow{d} g(Z)$.

See McCabe and Tremayne (1993, Section 4.4) for a sketch of the proof and for many illustrations of its use.

Remark The following three cases of the CMT are widely used. If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$ then

$$X_n Y_n \xrightarrow{d} Xc \quad\text{and}\quad \frac{X_n}{Y_n} \xrightarrow{d} \frac{X}{c},\quad c \ne 0.$$

We require the following additional regularity conditions for the next theorem.

C3) $\dot\Psi_n^{-1}(\theta_0)\Psi_n(\theta) \xrightarrow{p} 0$, for any $\theta$ in a suitable neighborhood of $\theta_0$.
C4) If $\theta_n \xrightarrow{p} \theta_0$ then $\dot\Psi_n^{-1}(\theta_n)\dot\Psi_n(\theta_0) \xrightarrow{p} I_p$.

Theorem 4.1.5 Let $\Psi_n$ be a regular inference function and let $\hat\theta_n$ be a solution of $\Psi_n(\theta) = 0$. If the regularity conditions C1, C2, C3 and C4 are satisfied then

$$\langle\Psi(\theta)\rangle_n^{-1/2}\,\dot\Psi_n(\theta)\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, I_p).$$

The proof of this result is outlined in Godambe and Heyde (1987).

Remarks
1) In condition C3, a suitable neighborhood of $\theta_0$ means one such that $\hat\theta_n$ is in the neighborhood for all $n$ greater than some number $N$.
2) The result of Theorem 4.1.5 also holds when condition C2 is replaced by C2' and $\langle\Psi(\theta)\rangle_n$ is replaced by $[\Psi(\theta)]_n$.

The following regularity conditions, for the sequence of martingale differences $\{\psi_t\}$, lead to a simplification of Theorem 4.1.5.
C5) For all $s$ and $t$, $E[\psi_t\psi_t^T] = E[\psi_s\psi_s^T] < \infty$ and $E[\dot\psi_t] = E[\dot\psi_s]$, where $\dot\psi_t$ is the matrix of partial derivatives of $\psi_t$ with respect to the components of $\theta$.
C6) Either $n^{-1}[\Psi]_n \xrightarrow{p} E[\psi_t\psi_t^T]$ or $n^{-1}\langle\Psi\rangle_n \xrightarrow{p} E[\psi_t\psi_t^T]$.
C7) $n^{-1}\dot\Psi_n \xrightarrow{p} E[\dot\psi_t]$.

Remarks
1) Condition C5 implies condition C1.
2) Conditions C5 and C6 together imply either condition C2 or C2'.
3) If $\{\psi_t\}$ is stationary then condition C5 holds.
4) If $\{\psi_t\}$ is stationary ergodic, $E[\,|\psi_t\psi_t^T|\,] < \infty$ and $E[\,|\dot\psi_t|\,] < \infty$, then conditions C6 and C7 hold.

Definition 4.1.9 When condition C5 holds for the regular estimating function $\Psi_n$, we define the variability matrix as $V(\theta) = \operatorname{Var}[\psi_t(\theta)] = E[\psi_t(\theta)\psi_t(\theta)^T]$.

Remark A desired property of an estimating function is that it give similar estimates in repeated samples. That is, we would like the function $\Psi_n$ not to vary much from sample to sample, or the variance of $\Psi_n$ to be as small as possible. We therefore want $V$ to be as small as possible.

Definition 4.1.10 When condition C5 holds for the regular estimating function $\Psi_n$, we define the sensitivity matrix as $S(\theta) = E[\dot\psi_t(\theta)]$.

Remark Another desired property of an estimating function is that it should be easy to distinguish between small changes in the parameters. The steeper the estimating function is around the true parameter value, the easier it is to distinguish and identify the parameter. It is therefore desirable for $S(\theta)$ to be as "large" as possible.

Definition 4.1.11 When condition C5 holds for the regular estimating function $\Psi_n$, we define the Godambe information matrix as $J = S^T(\theta)\,V^{-1}(\theta)\,S(\theta)$.

Definition 4.1.12 When estimation is based on the score and it satisfies the conditions for a regular estimating function as well as condition C5, then we define the Fisher information matrix as $I = V = -S$.
Remarks
1) The Godambe information can be interpreted as a measure of the amount of information that can be obtained about the parameter from the estimating function.
2) All the estimating functions considered in this chapter satisfy conditions C5, C6 and C7.
3) All of the estimating functions considered in this thesis are score or quasi-score functions, that is $\Psi_n = \partial L_n/\partial\theta$, where $L_n(x_0, x_1, \dots, x_n; \theta)$ is either a likelihood or a pseudo likelihood. Since in this case $\dot\Psi_n$ is a matrix of second partial derivatives of $L_n$ with respect to $\theta$, $S(\theta)$ will be a symmetric matrix. By pseudo likelihood we mean that the true likelihood is replaced by a simpler likelihood which retains some of the properties of the true likelihood. For example, the Poisson AR(1) likelihood could be replaced by the pseudo Gaussian AR(1) likelihood, which is much simpler. Recall that the Poisson AR(1) model and the Gaussian AR(1) model have the same conditional mean, marginal mean, autocorrelation function and partial autocorrelation function.

Corollary 4.1.1 Let $\Psi_n$ be a regular inference function and let $\hat\theta_n$ be a solution of $\Psi_n(\theta) = 0$. If the regularity conditions C3, C4, C5, C6 and C7 are satisfied then

$$n^{1/2}(\hat\theta_n - \theta) \xrightarrow{d} N(0, J^{-1}).$$

Proof: From Theorem 4.1.5 we have

$$\langle\Psi(\theta)\rangle_n^{-1/2}\,\dot\Psi_n(\theta)\,(\hat\theta_n - \theta) \xrightarrow{d} Z,$$

where $Z$ is a p-dimensional standard normal random variable. Since $n^{-1}\langle\Psi\rangle_n \xrightarrow{p} V(\theta)$ and $n^{-1}\dot\Psi_n \xrightarrow{p} S(\theta)$, the CMT shows that the left hand side is equivalent to $n^{1/2}\,V^{-1/2}(\theta)S(\theta)(\hat\theta_n - \theta)$, while the random variable on the right hand side has a standard multivariate normal distribution. Hence $n^{1/2}(\hat\theta_n - \theta)$ converges in distribution to a multivariate normal random variable with variance covariance matrix $S^{-1}(\theta)V(\theta)S^{-T}(\theta) = J^{-1}$. □

Example 4.1.4 We show that condition C5 holds for the Gaussian AR(1) model in Example 4.1.1. In Example 4.1.1 we showed that

$$V = E[u_t u_t^T] = \begin{pmatrix} \dfrac{1}{1-\alpha^2} + \left(\dfrac{\lambda}{1-\alpha}\right)^2 & \dfrac{\lambda}{1-\alpha} \\ \dfrac{\lambda}{1-\alpha} & 1 \end{pmatrix}, \qquad (4.1.7)$$

which is finite and, due to stationarity, independent of $t$. The expected value of $\dot u_t$ is given by

$$S = E[\dot u_t] = -\begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix} = -\begin{pmatrix} \dfrac{1}{1-\alpha^2} + \left(\dfrac{\lambda}{1-\alpha}\right)^2 & \dfrac{\lambda}{1-\alpha} \\ \dfrac{\lambda}{1-\alpha} & 1 \end{pmatrix},$$

which is also finite and independent of $t$.
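The variability matrix just computed can be checked by simulation. This is a hedged sketch, not part of the thesis: the parameter values $\alpha = 0.5$, $\lambda = 1$ and the seed are arbitrary, and the check simply averages the outer products $u_t u_t^T$ along one long simulated Gaussian AR(1) path:

```python
import math, random

rng = random.Random(7)
alpha, lam, n = 0.5, 1.0, 20000
mu, v0 = lam / (1 - alpha), 1.0 / (1 - alpha ** 2)   # marginal mean and variance

x = mu + math.sqrt(v0) * rng.gauss(0.0, 1.0)         # stationary starting value
s11 = s12 = s22 = 0.0
for _ in range(n):
    x_new = lam + alpha * x + rng.gauss(0.0, 1.0)    # Gaussian AR(1) step
    e = x_new - lam - alpha * x                      # score residual epsilon_t
    s11 += (x * e) ** 2                              # (X_{t-1} eps_t)^2
    s12 += x * e * e                                 # X_{t-1} eps_t^2
    s22 += e * e                                     # eps_t^2
    x = x_new

v11_hat, v12_hat, v22_hat = s11 / n, s12 / n, s22 / n
# closed-form entries of V from (4.1.7)
v11 = v0 + mu * mu
v12 = mu
v22 = 1.0
```

The sample averages should sit close to the closed-form entries of $V$, with Monte Carlo error of order $n^{-1/2}$.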
Hence condition C5 is satisfied. We defer showing that conditions C6 and C7 hold to Example 4.1.5.

Ideally we would like the variation in $\hat\theta_n$ to be as small as possible. That is, we want $J^{-1}$ to be small, or the Godambe information to be large. The following regularity condition allows comparison between the Godambe information and the Fisher information. We denote the score function by $U_n(x;\theta) = p_n(x;\theta)^{-1}\frac{\partial}{\partial\theta}p_n(x;\theta)$, where $\frac{\partial}{\partial\theta}p_n(x;\theta) = \big(\frac{\partial}{\partial\theta_1}p_n(x;\theta), \dots, \frac{\partial}{\partial\theta_p}p_n(x;\theta)\big)^T$.

C8) The score function is a regular inference function and the order of integration and differentiation may be interchanged as follows:

$$\frac{d}{d\theta}\int \Psi_n(x;\theta)\,p_n(x;\theta)\,d\mu(x) = \int \frac{\partial}{\partial\theta}\big[\Psi_n(x;\theta)\,p_n(x;\theta)\big]\,d\mu(x).$$

Theorem 4.1.6 For a regular estimating function which satisfies conditions C5 through C8, the Godambe information is always less than or equal to the Fisher information; more specifically, $\det(I - J) \ge 0$.

Next we state a strong law of large numbers for martingales and prove some special cases which will be used later in the thesis.

Theorem 4.1.7 (Martingale strong law) Let $\{Y_t, \mathcal{F}_t\}$ be a martingale difference sequence, where $\mathcal{F}_t = \sigma(Y_1, Y_2, \dots, Y_t)$ is the sigma field generated by $Y_1, Y_2, \dots, Y_t$. If there exists an $r \ge 1$ such that $\sum_{t=1}^\infty E[|Y_t|^{2r}]/t^{1+r} < \infty$, then $n^{-1}\sum_{t=1}^n Y_t \to 0$ almost surely as $n \to \infty$.

For a proof of this strong law see Chow (1960) and Stout (1974).

Remark If there exists an $M$ such that $E[|Y_t|^{2r}] \le M$ for all $t$, then $n^{-1}\sum_{t=1}^n Y_t \xrightarrow{a.s.} 0$, since $\sum_{t=1}^\infty E[|Y_t|^{2r}]/t^{1+r} \le M\sum_{t=1}^\infty t^{-(1+r)} < \infty$.

Next, in Theorems 4.1.8 and 4.1.9, we prove a strong law of large numbers which only requires restrictions on moments. In such cases Theorems 4.1.8 and 4.1.9 can be used instead of the Ergodic theorem, which requires showing the process is ergodic. Suppose a stochastic process $\{X_t\}_{t=1}^\infty$ has the following moment restrictions:

$$E[X_t^{2k}] = E[X_s^{2k}] < \infty \qquad (4.1.8)$$

for all positive integers $s$ and $t$ and $k = 1, 2, \dots, p$. Further, suppose the conditional expectation of $X_t^k$ given $\mathcal{F}_{t-1} = \sigma(X_1, X_2, \dots, X_{t-1})$ is a polynomial in $X_{t-1}$ of degree $k$. That is, the conditional expectation can be written as

$$E[X_t^k | \mathcal{F}_{t-1}] = a_{k0}X_{t-1}^k + a_{k1}X_{t-1}^{k-1} + \cdots + a_{kk}, \qquad (4.1.9)$$
where $a_{k0} \ne 1$.

Theorem 4.1.8 Let $\{X_t\}_{t=1}^\infty$ be a stochastic process which satisfies the moment restrictions (4.1.8) and (4.1.9). Then the strong law of large numbers holds for $X_t^k$; that is, $n^{-1}\sum_{t=1}^n X_t^k \to E[X_1^k]$ almost surely as $n \to \infty$, for $k = 1, 2, \dots, p$.

Proof: First note that conditions (4.1.8) and (4.1.9) together imply

$$E[X_t^k] = \frac{1}{1 - a_{k0}}\big\{a_{k1}E[X_t^{k-1}] + a_{k2}E[X_t^{k-2}] + \cdots + a_{kk}\big\}.$$

We proceed by induction on $k$. Consider the case $k = 1$. Let $Y_t = X_t - a_{10}X_{t-1} - a_{11}$, and note that $\{Y_t, \mathcal{F}_t\}$ forms a martingale difference sequence. Further, $Y_t$ satisfies condition (4.1.8) and hence satisfies the conditions of Theorem 4.1.7. Therefore, as $n \to \infty$,

$$\frac{1}{n}\sum_{t=1}^n \big(X_t - a_{10}X_{t-1} - a_{11}\big) \xrightarrow{a.s.} 0,$$

and since $n^{-1}(X_n - X_0) \xrightarrow{a.s.} 0$, this implies that

$$\frac{1}{n}\sum_{t=1}^n X_t \xrightarrow{a.s.} \frac{a_{11}}{1 - a_{10}} = E[X_1].$$

We now assume that for $l = 1, 2, \dots, k$, where $k < p$, $n^{-1}\sum_{t=1}^n X_t^l \xrightarrow{a.s.} E[X_1^l]$. Let

$$Y_t = X_t^{k+1} - a_{k+1,0}X_{t-1}^{k+1} - a_{k+1,1}X_{t-1}^k - \cdots - a_{k+1,k+1}.$$

Then $\{Y_t, \mathcal{F}_t\}$ is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7,

$$\frac{1}{n}\sum_{t=1}^n \big(X_t^{k+1} - a_{k+1,0}X_{t-1}^{k+1} - a_{k+1,1}X_{t-1}^k - \cdots - a_{k+1,k+1}\big) \xrightarrow{a.s.} 0,$$

so that, using the induction hypothesis for the lower-order terms,

$$\frac{1}{n}\sum_{t=1}^n X_t^{k+1} \xrightarrow{a.s.} \frac{a_{k+1,1}E[X_1^k] + \cdots + a_{k+1,k+1}}{1 - a_{k+1,0}} = E[X_1^{k+1}].$$

The proof then follows by induction on $k$. □

Theorem 4.1.9 Under the assumptions of Theorem 4.1.8,

$$\frac{1}{n}\sum_{t=j+1}^n X_{t-j}X_t \xrightarrow{a.s.} E[X_{t-j}X_t]$$

almost surely as $n \to \infty$.

Proof: Note that repeated application of the conditional expectation assumption implies that $E[X_t | \mathcal{F}_{t-j}] = b_j X_{t-j} + c_j$ for some $b_j$ and $c_j$, and this implies $E[X_{t-j}X_t] = b_j E[X_t^2] + c_j E[X_t]$. Let $Y_t = X_{t-j}(X_t - b_j X_{t-j} - c_j)$.
Then $\{Y_t\}$ is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 we have

$$\frac{1}{n}\sum_{t=j+1}^n X_{t-j}\big(X_t - b_j X_{t-j} - c_j\big) \xrightarrow{a.s.} 0,$$

and hence, applying Theorem 4.1.8 to the terms in $X_{t-j}^2$ and $X_{t-j}$,

$$\frac{1}{n}\sum_{t=j+1}^n X_{t-j}X_t \xrightarrow{a.s.} b_j E[X_t^2] + c_j E[X_t] = E[X_{t-j}X_t]. \qquad \square$$

Remarks
1) Under the same conditions the more general result $n^{-1}\sum_t X_{t-j}^k X_t^l \xrightarrow{a.s.} E[X_{t-j}^k X_t^l]$ can be proven using essentially the same argument; it is merely more cumbersome to write down.
2) We believe that these are new results.
3) The conditions of Theorems 4.1.8 and 4.1.9 are satisfied by the Poisson AR(1) model.

Example 4.1.5 We continue with Example 4.1.4. Since the unconditional distribution of $X_t$ is normal with mean $\lambda/(1-\alpha)$ and variance $1/(1-\alpha^2)$, we have that for all $s$ and $t$, $E[X_t^k] = E[X_s^k] < \infty$, $k = 1, 2, \dots$. Also, the conditional expectation of $X_t^k$ given $X_{t-1}$ is a polynomial of degree $k$ in $X_{t-1}$. Hence we can apply Theorems 4.1.8 and 4.1.9 to get

$$\frac{1}{n}[U]_n \xrightarrow{a.s.} V \quad\text{and}\quad \frac{1}{n}U_n \xrightarrow{a.s.} 0.$$

Theorems 4.1.8 and 4.1.9 can also be applied to $n^{-1}\dot U_n(\theta_0)$, which gives us

$$\frac{1}{n}\dot U_n \xrightarrow{a.s.} S,$$

where the equality to $S$ results because $\dot u_t$ is independent of the parameters. Applying the CMT we establish condition C3 as follows:

$$\dot U_n^{-1}U_n = \big(n^{-1}\dot U_n\big)^{-1}\, n^{-1}U_n \xrightarrow{p} S^{-1}\cdot 0 = 0.$$

Also, since $\dot U_n(\theta) = \dot U_n(\theta_0)$ for all $\theta$ and $\theta_0$ in the parameter space, condition C4 is trivially satisfied. Since conditions C3 through C7 are satisfied, Corollary 4.1.1 gives us the asymptotic result

$$n^{1/2}\big((\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T\big) \xrightarrow{d} N(0, I^{-1}),$$

where $I = V$, and $V$ appears in (4.1.7).

4.2 Conditional least squares for the Poisson AR(1) model

We begin by defining the family of statistical models. Let $\mu$ be the counting measure, $\Omega^n = \mathbb{N}_0^{n+1}$, where $\mathbb{N}_0$ is the set of non-negative integers, let $\mathcal{F}^n$ be the set of all possible subsets of $\Omega^n$, and let $\Theta = (0,1) \times \mathbb{R}_+$. We define the family of probability measures $\mathcal{P}^n$ by the following version of the Radon-Nikodym derivative,

$$p(X; \alpha, \lambda) = \left[\prod_{t=1}^n p(X_t | X_{t-1})\right] p(X_0),$$

where

$$p(X_t | X_{t-1}) = \sum_{s=0}^{\min(X_t, X_{t-1})} \binom{X_{t-1}}{s}\,\alpha^s(1-\alpha)^{X_{t-1}-s}\; \frac{e^{-\lambda}\lambda^{X_t-s}}{(X_t-s)!} \qquad (4.2.1)$$

and

$$p(X_0) = \frac{e^{-\lambda/(1-\alpha)}\,\big(\lambda/(1-\alpha)\big)^{X_0}}{X_0!}. \qquad (4.2.2)$$
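The transition probability (4.2.1) is a finite convolution and can be evaluated directly. The sketch below (the values $\alpha = 0.4$, $\lambda = 5.2$ and $X_{t-1} = 11$ match the running example; the function name is ours) checks that it is a proper distribution with conditional mean $\alpha X_{t-1} + \lambda$ and conditional variance $\alpha(1-\alpha)X_{t-1} + \lambda$:

```python
from math import comb, exp, factorial

def p_trans(xt, xprev, alpha, lam):
    """One-step transition probability (4.2.1) of the Poisson AR(1) model:
    s survivors of X_{t-1} (binomial thinning with probability alpha)
    plus a Poisson(lam) innovation making up the remaining xt - s."""
    total = 0.0
    for s in range(min(xt, xprev) + 1):
        total += (comb(xprev, s) * alpha ** s * (1 - alpha) ** (xprev - s)
                  * exp(-lam) * lam ** (xt - s) / factorial(xt - s))
    return total

probs = [p_trans(x, 11, 0.4, 5.2) for x in range(100)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
```

The conditional mean here is $0.4 \cdot 11 + 5.2 = 9.6$ and the conditional variance is $0.24 \cdot 11 + 5.2 = 7.84$; the fact that the variance is smaller than the mean (underdispersion conditionally, despite an equidispersed Poisson marginal) is what the GLS weighting of Section 4.3 exploits.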
Here $(\alpha, \lambda) \in \Theta$. This defines the Poisson AR(1) model given in (2.1.1).

In this section we consider estimation of the Poisson AR(1) model parameters by the method known as conditional least squares (CLS). Klimko and Nelson (1978) and Hall and Heyde (1980, ch. 6) consider CLS estimation and inference for stochastic processes. The parameter estimates are selected to minimize the sum of the squared distances of each observation $X_t$ from its conditional expected value given the previous observations $X_0, X_1, \dots, X_{t-1}$,

$$E[X_t | \mathcal{F}_{t-1}] = \alpha X_{t-1} + \lambda,$$

where $\mathcal{F}_t$ is the standard filtration, that is $\mathcal{F}_t = \sigma(X_0, X_1, \dots, X_t)$. The problem is equivalent to maximizing the following function over the parameter space:

$$Q_n(\alpha, \lambda) = -\sum_{t=1}^n (X_t - \alpha X_{t-1} - \lambda)^2.$$

The first order conditions for this maximization problem lead to the following estimating function,

$$\Psi_n(\alpha, \lambda) = \sum_{t=1}^n \begin{pmatrix} X_{t-1}(X_t - \alpha X_{t-1} - \lambda) \\ X_t - \alpha X_{t-1} - \lambda \end{pmatrix}.$$

Note that this estimating function is the same as the score function for the Gaussian AR(1) model given in (4.1.1), and it is therefore called a quasi-score. The following are direct consequences of this equivalence:
1) from (4.1.2), the matrix of partial derivatives of the estimating function is

$$\dot\Psi_n = -\sum_{t=1}^n \begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix};$$

2) this matrix is non-singular (except if $X_0 = X_1 = \cdots = X_{n-1}$);
3) from (4.1.3) and (4.1.4), the solutions to $\Psi_n = 0$ are

$$\hat\alpha_n = \frac{\sum_{t=1}^n (X_{t-1} - \bar X_{(0)})(X_t - \bar X_{(1)})}{\sum_{t=1}^n (X_{t-1} - \bar X_{(0)})^2} \quad\text{and}\quad \hat\lambda_n = \frac{1}{n}\sum_{t=1}^n \big(X_t - \hat\alpha_n X_{t-1}\big)$$

(note that in cases where $\hat\alpha_n$ is negative we set it equal to zero);
4) $\Psi_n$ is a martingale with respect to $\mathcal{F}_n = \sigma(X_0, X_1, \dots, X_n)$, since given $X_{t-1}$ the conditional expectation of $X_t$ is the same in both the Poisson and Gaussian AR(1) models.
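A quick simulation check of consequence 3: under binomial thinning with Poisson innovations, the quasi-score (CLS) estimates recover the true parameters. This is a hedged sketch, not the thesis's code; the product-of-uniforms Poisson sampler, the seed and the tolerances are our own arbitrary choices:

```python
import math, random

rng = random.Random(12345)
alpha, lam, n = 0.4, 5.2, 5000

def poisson(mu):
    # simple product-of-uniforms Poisson sampler, adequate for small mu
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

x = [poisson(lam / (1 - alpha))]                 # draw X_0 from the stationary law
for _ in range(n):
    survivors = sum(rng.random() < alpha for _ in range(x[-1]))
    x.append(survivors + poisson(lam))            # alpha o X_{t-1} + epsilon_t

# CLS estimates (4.1.3)-(4.1.4)
lag, cur = x[:-1], x[1:]
mlag, mcur = sum(lag) / n, sum(cur) / n
a_hat = sum((a - mlag) * (b - mcur) for a, b in zip(lag, cur)) / \
        sum((a - mlag) ** 2 for a in lag)
l_hat = mcur - a_hat * mlag
```

With this sample size the estimates typically land within a few hundredths of $\alpha = 0.4$ and within a couple of tenths of $\lambda = 5.2$, in line with the asymptotic variances derived below.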
Next we calculate the variance of $\psi_t$. Writing $e_t = X_t - \alpha X_{t-1} - \lambda$ and using the conditional variance $E[e_t^2 | \mathcal{F}_{t-1}] = \alpha(1-\alpha)X_{t-1} + \lambda$, we have

$$E[\psi_t\psi_t^T] = E\begin{pmatrix} X_{t-1}^2 e_t^2 & X_{t-1}e_t^2 \\ X_{t-1}e_t^2 & e_t^2 \end{pmatrix} = E\left[\big(\alpha(1-\alpha)X_{t-1}+\lambda\big)\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}\right].$$

Evaluating the expectations with respect to the stationary Poisson marginal distribution, with mean $\lambda/(1-\alpha)$, gives

$$V = E[\psi_t\psi_t^T] = \begin{pmatrix} \alpha\lambda + (1+3\alpha)\dfrac{\lambda^2}{1-\alpha} + (1+\alpha)\dfrac{\lambda^3}{(1-\alpha)^2} & \alpha\lambda + (1+\alpha)\dfrac{\lambda^2}{1-\alpha} \\ \alpha\lambda + (1+\alpha)\dfrac{\lambda^2}{1-\alpha} & (1+\alpha)\lambda \end{pmatrix}.$$

Note that the above expectations are with respect to the Poisson AR(1) model. This matrix is positive definite, since for any 2-dimensional vector $l = (l_1, l_2)^T$ the quadratic form

$$l^T E[\psi_t\psi_t^T]\, l = E\big[(l_1X_{t-1} + l_2)^2\big(\alpha(1-\alpha)X_{t-1}+\lambda\big)\big]$$

is positive except if $l_1 = l_2 = 0$. The expected value of $\dot\psi_t$ is given by

$$S = E[\dot\psi_t] = -\begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix} = -\begin{pmatrix} \dfrac{\lambda}{1-\alpha} + \left(\dfrac{\lambda}{1-\alpha}\right)^2 & \dfrac{\lambda}{1-\alpha} \\ \dfrac{\lambda}{1-\alpha} & 1 \end{pmatrix}.$$

Since $E[\psi_t\psi_t^T]$ and $E[\dot\psi_t]$ are both finite and independent of $t$, condition C5 holds, and we define the variability matrix and the sensitivity matrix respectively as $V = E[\psi_t\psi_t^T]$ and $S = E[\dot\psi_t]$. The Godambe information matrix is $J = S^T V^{-1} S$, and since $S$ is symmetric its inverse is

$$J^{-1} = S^{-1}V S^{-1} = \begin{pmatrix} \dfrac{\alpha(1-\alpha)^2}{\lambda} + (1+\alpha)(1-\alpha) & -(1+\alpha)\lambda \\ -(1+\alpha)\lambda & \lambda + \dfrac{(1+\alpha)\lambda^2}{1-\alpha} \end{pmatrix}.$$

Note these explicit expressions for the Godambe information and its inverse are new.

Condition (4.1.8) is satisfied since the marginal distribution of $X_t$ is Poisson with mean $\lambda/(1-\alpha)$, and all moments of the Poisson distribution exist and are finite. Condition (4.1.9) is also satisfied since $E[X_t | \mathcal{F}_{t-1}] = \alpha X_{t-1} + \lambda$, and more generally the conditional moments of the binomial-Poisson convolution are polynomials of the required degree. We can therefore use Theorems 4.1.8 and 4.1.9 to show the following:
1) $n^{-1}[\Psi]_n \xrightarrow{a.s.} V$, which is condition C6;
2) $n^{-1}\dot\Psi_n \xrightarrow{a.s.} S$, which is condition C7;
3) $n^{-1}\Psi_n \xrightarrow{a.s.} 0$.
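These closed forms can be checked numerically. The sketch below evaluates the expectations defining $V$ by direct summation over the stationary Poisson distribution, and confirms that the stated inverse agrees with $J = S V^{-1} S$ (plain 2x2 arithmetic; $\alpha = 0.4$, $\lambda = 5.2$ are arbitrary test values):

```python
from math import exp

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)          # stationary Poisson mean
c = alpha * (1 - alpha)         # slope of the conditional variance

def e_poisson(f, tol=1e-15):
    """E[f(X)] for X ~ Poisson(mu), by direct summation of the pmf."""
    total, x, p = 0.0, 0, exp(-mu)
    while p > tol or x < 3 * mu:
        total += f(x) * p
        x += 1
        p *= mu / x
    return total

# variability matrix V = E[(c*X + lam) * [[X^2, X], [X, 1]]]
V = [[e_poisson(lambda x: x * x * (c * x + lam)), e_poisson(lambda x: x * (c * x + lam))],
     [e_poisson(lambda x: x * (c * x + lam)), e_poisson(lambda x: c * x + lam)]]

# closed-form entries stated in the text
v11 = alpha * lam + (1 + 3 * alpha) * lam ** 2 / (1 - alpha) + (1 + alpha) * lam ** 3 / (1 - alpha) ** 2
v12 = alpha * lam + (1 + alpha) * lam ** 2 / (1 - alpha)
v22 = (1 + alpha) * lam

# sensitivity matrix, then J = S V^{-1} S and its inverse
S = [[-(mu + mu * mu), -mu], [-mu, -1.0]]
detV = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[V[1][1] / detV, -V[0][1] / detV], [-V[1][0] / detV, V[0][0] / detV]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

J = matmul(matmul(S, Vinv), S)
detJ = J[0][0] * J[1][1] - J[0][1] * J[1][0]
Jinv = [[J[1][1] / detJ, -J[0][1] / detJ], [-J[1][0] / detJ, J[0][0] / detJ]]
```

The summed expectations should match the closed-form entries of $V$, and `Jinv` should match the explicit inverse given above.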
Since $\dot\Psi_n(\alpha,\lambda)$ is independent of $\alpha$ and $\lambda$, condition C4 holds. Further, condition C3 follows since

$$\dot\Psi_n^{-1}(\alpha_0,\lambda_0)\,\Psi_n(\alpha,\lambda) = \big(n^{-1}\dot\Psi_n(\alpha_0,\lambda_0)\big)^{-1}\, n^{-1}\Psi_n(\alpha,\lambda),$$

which by the CMT converges in probability to zero. Finally, since conditions C3 through C7 are satisfied, we can apply Corollary 4.1.1 to get

$$n^{1/2}\big((\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T\big) \xrightarrow{d} N(0, J^{-1}).$$

4.3 Generalized least squares for the Poisson AR(1) model

In this section we use generalized (weighted) least squares (GLS) to estimate the model parameters; see Wooldridge (1991a). In this method, parameter estimates are selected to minimize the sum of the weighted squared distances of each observation $X_t$ from its conditional expected value given the previous observation, $E[X_t | X_{t-1}] = \alpha X_{t-1} + \lambda$. The weights are the inverse of the estimated conditional variance $\widehat{\operatorname{Var}}[X_t | X_{t-1}] = \hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n$, where $\hat\alpha_n$ and $\hat\lambda_n$ are strongly consistent estimates, such as the conditional least squares estimates. Observations with large conditional variances are given smaller weights than those with smaller conditional variances. This problem is equivalent to minimizing the following function over the parameter space:

$$\sum_{t=1}^n \frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}.$$

The first order conditions for this minimization problem lead to the following estimating function,

$$\Psi_n(\alpha, \lambda) = \sum_{t=1}^n \frac{1}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}\begin{pmatrix} X_{t-1}(X_t - \alpha X_{t-1} - \lambda) \\ X_t - \alpha X_{t-1} - \lambda \end{pmatrix}. \qquad (4.3.1)$$

The sequence $\{\psi_t\}_{t=1}^n$, where $\psi_t$ is the $t$th summand in (4.3.1), is a martingale difference sequence since

$$E\left[\frac{1}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}\begin{pmatrix} X_{t-1}(X_t - \alpha X_{t-1} - \lambda) \\ X_t - \alpha X_{t-1} - \lambda \end{pmatrix}\Bigg|\,\mathcal{F}_{t-1}\right] = 0.$$

Therefore $\Psi_n$ is a martingale with respect to $\mathcal{F}_t = \sigma(X_0, X_1, \dots, X_t, \hat\alpha_n, \hat\lambda_n)$. The partial derivatives of $\psi_t$ with respect to $\alpha$ and $\lambda$ are

$$\dot\psi_t = -\frac{1}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}.$$

This matrix $\dot\Psi_n = \sum_{t=1}^n \dot\psi_t$ is non-singular except if all of the $X_t$'s are zero.
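Because the weights are held fixed at the preliminary estimates, solving (4.3.1) is an ordinary weighted least squares problem. A sketch (the preliminary values `alpha0`, `lam0` are assumed inputs, e.g. the CLS estimates; the test series is an artificial noiseless recursion):

```python
def gls_estimates(x, alpha0, lam0):
    """Weighted least squares regression of X_t on (X_{t-1}, 1) with weights
    1/(alpha0*(1-alpha0)*X_{t-1} + lam0); i.e. the root of (4.3.1)."""
    sw = swx = swxx = swy = swxy = 0.0
    for xl, xc in zip(x[:-1], x[1:]):
        w = 1.0 / (alpha0 * (1 - alpha0) * xl + lam0)
        sw += w
        swx += w * xl
        swxx += w * xl * xl
        swy += w * xc
        swxy += w * xl * xc
    det = swxx * sw - swx * swx            # weighted normal equations, 2x2 solve
    alpha_hat = (swxy * sw - swx * swy) / det
    lam_hat = (swxx * swy - swx * swxy) / det
    return alpha_hat, lam_hat

x = [0.0]
for _ in range(12):
    x.append(0.5 * x[-1] + 2.0)            # noiseless test series: exact fit expected
a_hat, l_hat = gls_estimates(x, 0.3, 1.0)
```

Since the test series lies exactly on the line $X_t = 0.5X_{t-1} + 2$, any positive weighting returns the same exact solution; on real data the weighting changes the estimates and, as shown below, their asymptotic variance.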
Notice that the matrix

$$\psi_t\psi_t^T = \frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\big(\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n\big)^2}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}$$

is positive semi-definite, and the variance of $\Psi_n$ is positive definite; $\Psi_n$ is therefore a regular inference function.

If we assume that $\hat\lambda_n \ge \delta$ for some $\delta > 0$ and all $n$, then $\psi_t\psi_t^T$ is dominated by the function

$$\frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\delta^2}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}, \qquad (4.3.2)$$

which has a finite expectation. By the Dominated Convergence Theorem we have

$$\lim_{n\to\infty} E[\psi_t\psi_t^T] = E\Big[\lim_{n\to\infty}\psi_t\psi_t^T\Big] = E\left[\frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\big(\alpha(1-\alpha)X_{t-1} + \lambda\big)^2}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}\right] = E\left[\frac{1}{\alpha(1-\alpha)X_{t-1} + \lambda}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}\right], \qquad (4.3.3)$$

where the last equality follows by first taking the conditional expectation given $\mathcal{F}_{t-1}$. Further, this implies

$$\lim_{n\to\infty}\operatorname{Var}[\psi_t] = \lim_{n\to\infty} E[\psi_t\psi_t^T].$$

Note that strong consistency of $\hat\alpha_n$ and $\hat\lambda_n$ is required to use the Dominated Convergence Theorem. Similarly, $\dot\psi_t$ is dominated by the function

$$\frac{1}{\delta}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}, \qquad (4.3.4)$$

which has a finite expectation, and by the Dominated Convergence Theorem

$$\lim_{n\to\infty} E[\dot\psi_t] = E\Big[\lim_{n\to\infty}\dot\psi_t\Big] = -E\left[\frac{1}{\alpha(1-\alpha)X_{t-1} + \lambda}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}\right].$$

We will therefore define the variability matrix $V$ and sensitivity matrix $S$ by

$$V = -S = E\left[\frac{1}{\alpha(1-\alpha)X_{t-1} + \lambda}\begin{pmatrix} X_{t-1}^2 & X_{t-1} \\ X_{t-1} & 1 \end{pmatrix}\right]. \qquad (4.3.5)$$

Theorem 4.3.1 Let $\Psi_n$ be the inference function defined in (4.3.1). Then the following converge under the Poisson AR(1) model:

$$n^{-1}[\Psi]_n \xrightarrow{p} V, \qquad (4.3.6)$$

$$n^{-1}\dot\Psi_n \xrightarrow{p} S. \qquad (4.3.7)$$

Before we prove this result we need to establish the following new result.

Proposition 4.3.1 If the stochastic process $\{X_t\}$ follows the Poisson AR(1) model with $\alpha \in (0,1)$, then it is $a$-mixing with $a(n) = O(\alpha^n)$.

Proof: We let $p_k(X_{t+k}|X_t)$, $p(X_t|X_{t-1})$ and $p(X_t)$ denote respectively the conditional probability of $X_{t+k}$ given $X_t$, the conditional probability of $X_t$ given $X_{t-1}$, and the marginal probability of $X_t$; these are defined in (3.3.1), (4.2.1) and (4.2.2) respectively. We denote the joint distribution of $X_1, X_2, \dots, X_k$ by

$$p(X_1, X_2, \dots, X_k) = p(X_k|X_{k-1})\,p(X_{k-1}|X_{k-2})\cdots p(X_2|X_1)\,p(X_1),$$

and the joint distribution of $X_{k+n}, X_{k+n+1}, \dots$ by
$$p(X_{k+n}, X_{k+n+1}, \dots) = p(X_{k+n})\prod_{t=k+n}^{\infty} p(X_{t+1}|X_t),$$

and the joint distribution of $X_1, \dots, X_k, X_{k+n}, X_{k+n+1}, \dots$ by

$$p(X_1, \dots, X_k, X_{k+n}, X_{k+n+1}, \dots) = p(X_1, X_2, \dots, X_k)\;p_n(X_{k+n}|X_k)\prod_{t=k+n}^{\infty} p(X_{t+1}|X_t).$$

For any $A \in \sigma(X_1, X_2, \dots, X_k)$ and any $B \in \sigma(X_{k+n}, X_{k+n+1}, \dots)$ we have

$$|P(A \cap B) - P(A)P(B)| = \Big|\sum_{A,B} p(X_1, \dots, X_k)\,\big\{p_n(X_{k+n}|X_k) - p(X_{k+n})\big\}\,p(X_{k+n+1}, \dots \mid X_{k+n})\Big|.$$

The n-step conditional distribution is the convolution of a binomial with a Poisson,

$$p_n(x|X_k) = \sum_{s=0}^{\min(x, X_k)} \binom{X_k}{s}(\alpha^n)^s(1-\alpha^n)^{X_k - s}\; e^{-\lambda\frac{1-\alpha^n}{1-\alpha}}\,\frac{\big(\lambda\frac{1-\alpha^n}{1-\alpha}\big)^{x-s}}{(x-s)!}.$$

Since each binomial weight is bounded by one and the accumulated innovation mean satisfies $\lambda(1-\alpha^n)/(1-\alpha) = \lambda/(1-\alpha) + O(\alpha^n)$, a term-by-term comparison with the stationary probability $p(x) = e^{-\lambda/(1-\alpha)}\big(\lambda/(1-\alpha)\big)^x/x!$ gives

$$p_n(x|X_k) = p(x)\,\{1 + O(\alpha^n)\}.$$

Therefore

$$|P(A \cap B) - P(A)P(B)| = O(\alpha^n)\sum_{A,B} p(X_1, \dots, X_k)\,p(X_{k+n}, X_{k+n+1}, \dots) = O(\alpha^n),$$

so $\{X_t\}$ is $a$-mixing with $a(n) = O(\alpha^n)$. □

Proof (Theorem 4.3.1): We begin by introducing some notation.
Proof (Theorem 4.3.1): We begin by introducing some notation. Let

  Y_t = (X_t - αX_{t-1} - λ)^2 (X_{t-1}, 1)^T (X_{t-1}, 1) / (α̂_n(1-α̂_n)X_{t-1} + λ̂_n)^2,

and let Ỹ_t denote the same quantity evaluated at the true parameters (α, λ). Showing (4.3.6) is equivalent to showing n^{-1} Σ_{t=1}^n Y_t → V. Since {X_t} is stationary and α-mixing with α_n = O(α^n), by Theorem 4.1.1 {Ỹ_t} is also stationary and α-mixing with mixing coefficients of order O(α^n). Further, {X_t} and {Ỹ_t} are both ergodic. Since the entries of Y_t and Ỹ_t are positive and Y_t is dominated as in (4.3.2), the difference between n^{-1} Σ Y_t and n^{-1} Σ Ỹ_t vanishes as the parameter estimates converge. By assumption λ̂_n → λ, and by the ergodic theorem n^{-1} Σ Ỹ_t → E[Ỹ_t] almost surely. Hence n^{-1} Σ Y_t converges almost surely, and therefore its asymptotic variance is zero. Chebyshev's inequality can then be used to show that

  n^{-1} Σ_{t=1}^n Y_t - E[ n^{-1} Σ_{t=1}^n Y_t ] →^P 0.

Since lim_{n→∞} E[Ỹ_t] = V, we have the desired result, n^{-1} Σ Y_t →^P V. The proof of (4.3.7) is similar and is omitted. □

The regularity conditions can now be verified: condition C2' is satisfied since we have shown that n^{-1}[Ψ]_n →^P V; conditions C1 and C4 follow from the stationarity and finite variance of the summands; and condition C3 follows from the continuous mapping theorem (Theorem 4.1.4), which shows that the difference between the inference function evaluated at (α̂_n, λ̂_n) and at (α, λ), suitably normalized, converges in probability to zero. We can therefore use Theorem 4.1.5, which implies

  [Ψ]_n^{-1/2} Ψ_n(α, λ) →^d N(0, I).

By the continuous mapping theorem (Theorem 4.1.4) it follows that

  √n ( (α̂_n, λ̂_n)^T - (α, λ)^T ) →^d N(0, V^{-1}).

The following theorem gives an approximation for V.

Theorem 4.3.2  The function V in (4.3.5) can be approximated by expanding g(x) = [α(1-α)x + λ]^{-1} about the marginal mean λ/(1-α) and taking expectations term by term; every entry of the approximating matrix is then an explicit rational function of α and λ, and the approximation error is negative and bounded in absolute value.

Proof: Making use of the following two identities,
  x² g(x) = [α(1-α)x + λ] / [α(1-α)]² - 2λ / [α(1-α)]² + λ² g(x) / [α(1-α)]²

and

  x g(x) = 1 / [α(1-α)] - λ g(x) / [α(1-α)],

where g(x) = [α(1-α)x + λ]^{-1}, we can rewrite every entry of the information matrix as a linear function of E[g(X_{t-1})]. To approximate this expectation we consider the function g. The kth derivative of g with respect to x is

  g^{(k)}(x) = k! (-1)^k [α(1-α)]^k / [α(1-α)x + λ]^{k+1},

and a Taylor expansion about the point μ = λ/(1-α) is

  g(x) = g(μ) + g'(μ)(x-μ) + g''(μ)(x-μ)²/2! + g'''(μ)(x-μ)³/3! + g''''(μ)(x-μ)⁴/4! + R(x),

where the remainder R, which involves the fifth derivative of g, is largest in absolute value when x = 0. Now we can approximate the expected value of g(X_{t-1}): taking expectations term by term and using the central moments of the Poisson(μ) marginal distribution (E[(X-μ)²] = E[(X-μ)³] = μ and E[(X-μ)⁴] = μ + 3μ²) reduces each term to an explicit rational function of α and λ, where the error in the approximation is negative and bounded in absolute value by a term of order [α(1-α)]⁵.

4.4 The score and Fisher information for the Poisson AR(1) model

In this section we derive expressions for the score function and observed Fisher information for the Poisson AR(1) model. The conditional likelihood given X_0 is

  L(α, λ) = ∏_{t=1}^n p(X_t | X_{t-1}),

where p(X_t | X_{t-1}) is defined in (4.2.1). Let U^α be the score with respect to α, that is,

  U^α = (∂/∂α) ℓ(α, λ; X_0, X_1, ..., X_n) = Σ_{t=1}^n [∂p(X_t | X_{t-1})/∂α] / p(X_t | X_{t-1}),

where ℓ(·) is the log-likelihood. The partial derivative of the conditional probability is found by making use of the derivative

  (∂/∂α) α^s (1-α)^{x-s} = ( s/α - (x-s)/(1-α) ) α^s (1-α)^{x-s}

and the relation

  s/α - (x-s)/(1-α) = (s - αx) / [α(1-α)].
This yields

  (∂/∂α) p(y|x) = [x/(1-α)] { p(y-1 | x-1) - p(y|x) }.    (4.4.1)

Thus the score with respect to α is

  U^α = [1/(1-α)] Σ_{t=1}^n X_{t-1} { p(X_t - 1 | X_{t-1} - 1) - p(X_t | X_{t-1}) } / p(X_t | X_{t-1}).    (4.4.2)

Let U^λ denote the score with respect to λ, that is,

  U^λ(α, λ) = (∂/∂λ) ℓ(α, λ; X_0, X_1, ..., X_n) = Σ_{t=1}^n [∂p(X_t | X_{t-1})/∂λ] / p(X_t | X_{t-1}).

The partial derivative of the conditional probability is found by making use of the derivative

  (∂/∂λ) p(y|x) = -p(y|x) + p(y-1|x).

Thus the score with respect to λ is

  U^λ = Σ_{t=1}^n { p(X_t - 1 | X_{t-1}) - p(X_t | X_{t-1}) } / p(X_t | X_{t-1}).

The second derivatives of the conditional probability can now easily be found:

  (∂²/∂α²) p(y|x) = [x/(1-α)²] { p(y-1|x-1) - p(y|x)
      + (x-1)[ p(y-2|x-2) - p(y-1|x-1) ] - x[ p(y-1|x-1) - p(y|x) ] },

  (∂²/∂α∂λ) p(y|x) = [x/(1-α)] { p(y|x) - p(y-1|x) - p(y-1|x-1) + p(y-2|x-1) },

and

  (∂²/∂λ²) p(y|x) = p(y|x) - 2p(y-1|x) + p(y-2|x).
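Identity (4.4.1) is easy to verify numerically. The sketch below (our code; names are ours) implements the one-step transition probability of (4.2.1) as a binomial-Poisson convolution and compares the analytic derivative with a central finite difference:

```python
import math

def p_trans(y, x, alpha, lam):
    """P(X_t = y | X_{t-1} = x): Binomial(x, alpha) thinning convolved
    with Poisson(lam) arrivals, as in (4.2.1)."""
    return sum(math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
               * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
               for s in range(0, min(x, y) + 1))

alpha, lam, x, y = 0.3, 1.5, 4, 3
# identity (4.4.1): d/d(alpha) p(y|x) = x/(1-alpha) * {p(y-1|x-1) - p(y|x)}
analytic = x / (1 - alpha) * (p_trans(y - 1, x - 1, alpha, lam)
                              - p_trans(y, x, alpha, lam))
h = 1e-6
numeric = (p_trans(y, x, alpha + h, lam) - p_trans(y, x, alpha - h, lam)) / (2 * h)
```

The two values agree to finite-difference accuracy, and the transition probabilities sum to one over y.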
Substituting these expressions into the quotient rule gives the second derivatives of the log-likelihood; for example,

  (∂²/∂α²) ℓ = Σ_{t=1}^n { [(∂²/∂α²) p(X_t|X_{t-1})] p(X_t|X_{t-1}) - [(∂/∂α) p(X_t|X_{t-1})]² } / p(X_t|X_{t-1})²,    (4.4.3)

with analogous expressions for (∂²/∂λ²)ℓ and (∂²/∂λ∂α)ℓ.

Next we show how to write the first and second derivatives of the log-likelihood as conditional expectations. To make these conditional expectations more clear, we consider the following example with just two random variables. Let X and Y be independent random variables and denote their densities respectively as f_X(x) and f_Y(y), where the densities are with respect to the measure ν (Lebesgue measure or counting measure). Let Z = X + Y be the convolution of X and Y. The joint distribution of Z and X is f_X(x) f_Y(z - x) (note the Jacobian for this transformation is 1). The density for Z is found by integrating out X,

  f_Z(z) = ∫_{-∞}^{∞} f_X(x) f_Y(z - x) dν(x).

The conditional density for X given Z is

  f_{X|Z}(x|z) = f_X(x) f_Y(z - x) / f_Z(z).

The conditional moments for X and Y are then,
  E[X^k | z] = ∫ x^k f_{X|Z}(x|z) dν(x) = ∫ x^k f_X(x) f_Y(z - x) dν(x) / f_Z(z)

and

  E[Y^k | z] = E[(Z - X)^k | Z = z] = ∫ (z - x)^k f_X(x) f_Y(z - x) dν(x) / f_Z(z).

Proposition 4.4.1  For the Poisson AR(1) model, with conditional probabilities p(X_t | X_{t-1}) given in (4.2.1), the following equalities hold:

  αX_{t-1} p(X_t - 1 | X_{t-1} - 1) / p(X_t | X_{t-1}) = E_t[α∘X_{t-1}],    (4.4.4)
  λ p(X_t - 1 | X_{t-1}) / p(X_t | X_{t-1}) = E_t[ε_t],    (4.4.5)
  α² X_{t-1}(X_{t-1} - 1) p(X_t - 2 | X_{t-1} - 2) / p(X_t | X_{t-1}) = E_t[(α∘X_{t-1})²] - E_t[α∘X_{t-1}],    (4.4.6)
  αλ X_{t-1} p(X_t - 2 | X_{t-1} - 1) / p(X_t | X_{t-1}) = E_t[(α∘X_{t-1}) ε_t],    (4.4.7)
  λ² p(X_t - 2 | X_{t-1}) / p(X_t | X_{t-1}) = E_t[ε_t²] - E_t[ε_t],    (4.4.8)

where E_t[·] denotes the conditional expectation with respect to the sigma field ℑ_t = σ(X_0, X_1, ..., X_{t-1}, X_t).

Proof: We first show (4.4.4). By the convolution representation above, the conditional distribution of α∘X_{t-1} given X_{t-1} and X_t places mass proportional to C(X_{t-1}, s) α^s (1-α)^{X_{t-1}-s} e^{-λ} λ^{X_t-s}/(X_t-s)! on each value s. Hence

  E_t[α∘X_{t-1}] p(X_t | X_{t-1})
    = Σ_{s=1}^{min(X_t, X_{t-1})} s C(X_{t-1}, s) α^s (1-α)^{X_{t-1}-s} e^{-λ} λ^{X_t-s} / (X_t-s)!
    = αX_{t-1} Σ_{r=0}^{min(X_t-1, X_{t-1}-1)} C(X_{t-1}-1, r) α^r (1-α)^{(X_{t-1}-1)-r} e^{-λ} λ^{(X_t-1)-r} / ((X_t-1)-r)!
    = αX_{t-1} p(X_t - 1 | X_{t-1} - 1),

using s C(X_{t-1}, s) = X_{t-1} C(X_{t-1}-1, s-1) and the substitution r = s - 1; dividing by p(X_t | X_{t-1}) gives (4.4.4). Next, the left hand side of (4.4.5) satisfies

  λ p(X_t - 1 | X_{t-1}) = Σ_s (X_t - s) C(X_{t-1}, s) α^s (1-α)^{X_{t-1}-s} e^{-λ} λ^{X_t-s} / (X_t-s)! = E_t[ε_t] p(X_t | X_{t-1}),

since ε_t = X_t - α∘X_{t-1}, which is the right hand side of (4.4.5). Similarly, using s(s-1) C(X_{t-1}, s) = X_{t-1}(X_{t-1}-1) C(X_{t-1}-2, s-2),

  α² X_{t-1}(X_{t-1}-1) p(X_t - 2 | X_{t-1} - 2) = E_t[(α∘X_{t-1})(α∘X_{t-1} - 1)] p(X_t | X_{t-1}),

which is the right hand side of (4.4.6).
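Equality (4.4.4) can be checked by computing the conditional mean of the unobserved thinning component by direct enumeration (a sketch; function names are ours):

```python
import math

def p_trans(y, x, alpha, lam):
    """One-step transition probability (4.2.1)."""
    return sum(math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
               * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
               for s in range(0, min(x, y) + 1))

def cond_mean_thinning(y, x, alpha, lam):
    """E[alpha o X_{t-1} | X_{t-1}=x, X_t=y], enumerating the unobserved
    number of 'survivors' s."""
    num = sum(s * math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
              * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
              for s in range(0, min(x, y) + 1))
    return num / p_trans(y, x, alpha, lam)

alpha, lam, x, y = 0.4, 2.0, 5, 4
closed_form = alpha * x * p_trans(y - 1, x - 1, alpha, lam) / p_trans(y, x, alpha, lam)
direct = cond_mean_thinning(y, x, alpha, lam)
```

The closed form of (4.4.4) and the brute-force enumeration agree to machine precision.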
Next, the left hand side of (4.4.7) is

  αλ X_{t-1} p(X_t - 2 | X_{t-1} - 1) / p(X_t | X_{t-1})
    = [1/p(X_t | X_{t-1})] Σ_s s (X_t - s) C(X_{t-1}, s) α^s (1-α)^{X_{t-1}-s} e^{-λ} λ^{X_t-s} / (X_t-s)!
    = E_t[(α∘X_{t-1}) ε_t],

using s C(X_{t-1}, s) = X_{t-1} C(X_{t-1}-1, s-1) together with a shift of the summation index, exactly as in the proofs of (4.4.4) and (4.4.5). Finally, the left hand side of (4.4.8) is

  λ² p(X_t - 2 | X_{t-1}) / p(X_t | X_{t-1})
    = [1/p(X_t | X_{t-1})] Σ_s (X_t - s)(X_t - s - 1) C(X_{t-1}, s) α^s (1-α)^{X_{t-1}-s} e^{-λ} λ^{X_t-s} / (X_t-s)!
    = E_t[ε_t(ε_t - 1)] = E_t[ε_t²] - E_t[ε_t],

which is the right hand side of (4.4.8). □

Proposition 4.4.1 leads to the following new expressions for the first and second derivatives of the log-likelihood:

  U^α = [1/(α(1-α))] Σ_{t=1}^n { E_t[α∘X_{t-1}] - αX_{t-1} },
  U^λ = (1/λ) Σ_{t=1}^n { E_t[ε_t] - λ },

  (∂²/∂α²) ℓ = [1/(α²(1-α)²)] Σ_{t=1}^n { Var_t[α∘X_{t-1}] - (1 - 2α) E_t[α∘X_{t-1}] - α²X_{t-1} },

  (∂²/∂α∂λ) ℓ = [1/(αλ(1-α))] Σ_{t=1}^n Cov_t[α∘X_{t-1}, ε_t],

and

  (∂²/∂λ²) ℓ = (1/λ²) Σ_{t=1}^n { E_t[ε_t²] - E_t[ε_t] - (E_t[ε_t])² } = (1/λ²) Σ_{t=1}^n { Var_t[ε_t] - E_t[ε_t] }.

Note that E_t[α∘X_{t-1}] ≠ α∘X_{t-1}, E_t[ε_t] ≠ ε_t, E_{t-1}[α∘X_{t-1}] = αX_{t-1} and E_{t-1}[ε_t] = λ. Further note that, given X_t, the correlation between α∘X_{t-1} and ε_t is -1.
This follows because

  Cov_t[α∘X_{t-1}, ε_t] = Cov_t[X_t - ε_t, ε_t] = E_t[(X_t - ε_t)ε_t] - E_t[X_t - ε_t] E_t[ε_t] = -Var_t[ε_t],

since X_t is known given ℑ_t, and

  Var_t[α∘X_{t-1}] = Var_t[X_t - ε_t] = Var_t[ε_t].

In the cases where the time series is comprised of low counts, the expected information is easy to calculate numerically. For example, if the probability of a count larger than 5 is almost zero, then there are only 36 pairs (X_{t-1}, X_t) to sum over in the calculation of E[E_t[α∘X_{t-1}]²] or E[E_t[ε_t]²]. Hence the expected information is easy to calculate.

The above representation of the score function leads to a new definition of residuals. The usual way to define a residual is to take the difference between a random variable and its conditional expectation. That is, for the Poisson AR(1) model residuals can be defined as

  r_t = X_t - E_{t-1}[X_t] = X_t - αX_{t-1} - λ.

However, since X_t is comprised of two random components, it would be nice to define two sets of residuals, one for each random component. The natural way to define such residuals is as follows: for the continuation component let r_{1t} = α∘X_{t-1} - αX_{t-1}, and for the arrival component let r_{2t} = ε_t - λ. Unfortunately these definitions won't work, because α∘X_{t-1} and ε_t are not observable. However, we can replace α∘X_{t-1} and ε_t respectively with E_t[α∘X_{t-1}] and E_t[ε_t] (their conditional expectations given the observed values of X_{t-1} and X_t). That is, redefine the residuals as

  r*_{1t} = E_t[r_{1t}] = E_t[α∘X_{t-1}] - αX_{t-1}  and  r*_{2t} = E_t[r_{2t}] = E_t[ε_t] - λ.
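These conditional expectations follow directly from Proposition 4.4.1, so the two residual series are straightforward to compute. A sketch (our code) that also checks that the two component residuals add up to the usual Pearson-type residual:

```python
import math

def p_trans(y, x, alpha, lam):
    """One-step transition probability (4.2.1) of the Poisson AR(1) model."""
    return sum(math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
               * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
               for s in range(0, min(x, y) + 1))

def component_residuals(x_prev, x_curr, alpha, lam):
    """Continuation and arrival residuals r*_1t, r*_2t via Proposition 4.4.1."""
    py = p_trans(x_curr, x_prev, alpha, lam)
    if x_curr >= 1 and x_prev >= 1:
        e_thin = alpha * x_prev * p_trans(x_curr - 1, x_prev - 1, alpha, lam) / py
    else:
        e_thin = 0.0   # no survivors possible if X_t = 0 or X_{t-1} = 0
    e_arr = lam * p_trans(x_curr - 1, x_prev, alpha, lam) / py if x_curr >= 1 else 0.0
    return e_thin - alpha * x_prev, e_arr - lam

alpha, lam = 0.4, 1.0
r1, r2 = component_residuals(3, 2, alpha, lam)
pearson = 2 - alpha * 3 - lam   # X_t - alpha X_{t-1} - lambda
```

By construction r1 + r2 equals the Pearson-type residual, since E_t[α∘X_{t-1}] + E_t[ε_t] = X_t.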
Remarks

1. These conditional expectations are easily calculated with the help of Proposition 4.4.1.
2. We are using the aggregate data to estimate the individual unobserved processes (the continuation process and the arrival process).
3. The residuals are equivalent to the martingale differences of the score function. Further note that our new expressions for the score function lead to these new definitions for residuals, which otherwise would not have been obvious.
4. Adding the components of the two new sets of residuals gives the old set of residuals. That is,

  r*_{1t} + r*_{2t} = E_t[α∘X_{t-1}] + E_t[ε_t] - αX_{t-1} - λ
    = E_t[α∘X_{t-1} + ε_t] - αX_{t-1} - λ
    = E_t[X_t] - αX_{t-1} - λ = X_t - αX_{t-1} - λ.

5. All the residuals above should be standardized before making residual plots.
6. Since the two sets of residuals are calculated from aggregate data they might be highly correlated. This appears to be the case in Example 4.4.1. For low count series (series with many zeros) there appears to be less correlation; see the residual plots for data set 1*, Figure 8.2.1, and the discussion of the residuals in Section 8.2.

Example 4.4.1  In this example we continue our analysis of our illustrative data set by examining the residual plots. Figures 4.4.1 and 4.4.2 are respectively the residual plots for the continuation and arrival processes, while Figures 4.4.3 and 4.4.4 are respectively the autocorrelations of the continuation and arrival residuals. Both residual plots have patterns similar to the Pearson residual plot in Figure 2.5.4. None of the autocorrelations in the continuation residuals is statistically significant at the 5% level. The second autocorrelation for the arrival residuals is borderline significant at the 5% level.

Figure 4.4.1  Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Figure 4.4.2  Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 4.4.3  Autocorrelations of the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 4.4.4  Autocorrelations of the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

4.5 The score and Fisher information for a general AR(1) model

We now show how to extend this to a general AR(1) model of the following form. Let

  X_t = α∘X_{t-1} + ε_t,

where α∘X_{t-1} | X_{t-1} has density f(s | x_{t-1}; α), ε_t has density g(ε; λ), and X_t | X_{t-1} has density

  h(x_t | x_{t-1}; α, λ) = ∫_{-∞}^{∞} f(s | x_{t-1}; α) g(x_t - s; λ) dν(s).

The conditional density of α∘X_{t-1} | X_{t-1}, X_t is

  f(s | x_{t-1}, x_t; α, λ) = f(s | x_{t-1}; α) g(x_t - s; λ) / h(x_t | x_{t-1}; α, λ).

Let ℑ_t = σ(X_0, X_1, ..., X_t). The conditional moments of α∘X_{t-1} | ℑ_t and ε_t | ℑ_t can be expressed as follows.
  E[(α∘X_{t-1})^k | ℑ_t] = ∫ s^k f(s | x_{t-1}, x_t; α, λ) dν(s)
    = ∫ s^k f(s | x_{t-1}; α) g(x_t - s; λ) dν(s) / h(x_t | x_{t-1}; α, λ)

and

  E[ε_t^k | ℑ_t] = E[(X_t - α∘X_{t-1})^k | ℑ_t]
    = ∫ (x_t - s)^k f(s | x_{t-1}; α) g(x_t - s; λ) dν(s) / h(x_t | x_{t-1}; α, λ).

Given a sample x_0, x_1, ..., x_n from the above AR(1) model, the likelihood is the product of the conditional densities, that is, L(α, λ) = ∏_{t=1}^n h(x_t | x_{t-1}; α, λ) h(x_0). To simplify things we take h(x_0) = 1. The log-likelihood and score are respectively ℓ(α, λ) = Σ_{t=1}^n log h(x_t | x_{t-1}; α, λ) and

  U(α, λ) = ( Σ_{t=1}^n [∂h(x_t | x_{t-1}; α, λ)/∂α] / h(x_t | x_{t-1}; α, λ),
              Σ_{t=1}^n [∂h(x_t | x_{t-1}; α, λ)/∂λ] / h(x_t | x_{t-1}; α, λ) )^T.

If the partial derivatives of f and g are polynomials of the following form,

  (∂/∂α) f(s | x_{t-1}; α) = (a_0 + a_1 s + a_2 s² + ⋯ + a_p s^p) f(s | x_{t-1}; α)

and

  (∂/∂λ) g(x_t - s; λ) = (b_0 + b_1 (x_t - s) + b_2 (x_t - s)² + ⋯ + b_p (x_t - s)^p) g(x_t - s; λ),

then the score can be rewritten in terms of conditional moments as

  U(α, λ) = ( Σ_{t=1}^n Σ_{i=0}^p a_i E[(α∘X_{t-1})^i | ℑ_t], Σ_{t=1}^n Σ_{i=0}^p b_i E[ε_t^i | ℑ_t] )^T.

Note that we can write

  (∂/∂α) f(s | x_{t-1}; α) = τ(s; α) f(s | x_{t-1}; α)  and  (∂/∂λ) g(x_t - s; λ) = γ(x_t - s; λ) g(x_t - s; λ)

by simply defining τ(s; α) = [(∂/∂α) f] / f and γ(x_t - s; λ) = [(∂/∂λ) g] / g. Thus the score can be written in terms of conditional expectations as

  U(α, λ) = ( Σ_{t=1}^n E[τ(α∘X_{t-1}) | ℑ_t], Σ_{t=1}^n E[γ(ε_t) | ℑ_t] )^T.

The second derivatives of the log-likelihood follow from the quotient rule applied to these ratios. If we let τ_α = (∂/∂α) τ and γ_λ = (∂/∂λ) γ, then we can write

  (∂²/∂α²) f(s | x_{t-1}; α) = [τ_α(s; α) + τ(s; α)²] f(s | x_{t-1}; α)

and

  (∂²/∂λ²) g(x_t - s; λ) = [γ_λ(x_t - s; λ) + γ(x_t - s; λ)²] g(x_t - s; λ).
Using these two equations we can rewrite the second derivatives of the log-likelihood as

  (∂²/∂α²) ℓ = Σ_{t=1}^n { E[τ_α(α∘X_{t-1}) | ℑ_t] + Var[τ(α∘X_{t-1}) | ℑ_t] },

  (∂²/∂α∂λ) ℓ = Σ_{t=1}^n { E[τ(α∘X_{t-1}) γ(ε_t) | ℑ_t] - E[τ(α∘X_{t-1}) | ℑ_t] E[γ(ε_t) | ℑ_t] }
    = Σ_{t=1}^n Cov[τ(α∘X_{t-1}), γ(ε_t) | ℑ_t],

and

  (∂²/∂λ²) ℓ = Σ_{t=1}^n { E[γ_λ(ε_t) | ℑ_t] + Var[γ(ε_t) | ℑ_t] }.

Applying these results to the AR(1) models of Joe (1996) is a bit more complicated, since Joe allows the distribution of α∘X_{t-1} to depend on λ. If we define δ(·) such that

  (∂/∂λ) f(s | x_{t-1}; α, λ) = δ(s; λ) f(s | x_{t-1}; α, λ),

then the score function with respect to λ becomes

  Σ_{t=1}^n E[δ(α∘X_{t-1}) + γ(ε_t) | ℑ_t].

Similar modifications can be made to the expressions for the second derivatives of the log-likelihood. In the following three cases α∘X_{t-1} is independent of λ:

1) The Poisson AR(1) model; see Section 4.4.
2) The Gaussian AR(1) model, with α∘X_{t-1} = αX_{t-1} and ε_t ~ N(λ, σ²). This is slightly different from Joe (1996), who lets α∘X_{t-1} | X_{t-1} ~ N(αX_{t-1}, ασ²) and ε_t ~ N(λ, (1-α)σ²). Note however that both models are equivalent.
3) The generalized Poisson AR(1) model, where X_t has a generalized Poisson distribution whose two parameters are both scaled by 1/(1-α). Note Joe (1996) allows the two parameters to be different.

In these three cases the functions τ(s) and γ(s) are polynomials in s, and hence the score function and observed Fisher information can be written in terms of the conditional moments of α∘X_{t-1} and ε_t.

4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model

In Section 4.3 we showed that the Poisson AR(1) model is α-mixing. Since it is also stationary, the model is ergodic. If the Fisher information is finite and positive definite, then the regularity conditions C1-C3 and C5-C7 hold. The next theorem shows that the Fisher information is finite.
Theorem 4.6.1  Let u_{αt} = U_{αt} - U_{α,t-1} and u_{λt} = U_{λt} - U_{λ,t-1}, and let u_t = (u_{αt}, u_{λt})^T, where U_{αt} and U_{λt} are the score functions for the Poisson AR(1) model with respect to α and λ. Further, let u̇_t denote the matrix of partial derivatives of u_t with respect to α and λ, and let U̇_n = Σ_{t=1}^n u̇_t. For any two-dimensional vector l and any positive integer k,

  E[(l^T u_t)^k] < ∞  and  E[l^T u̇_t l] < ∞.

Proof: For the Poisson AR(1) model we note the following:

  1) E_t[α∘X_{t-1}]^k ≤ X_t^k,    (4.6.1)
  2) E_t[ε_t]^k ≤ X_t^k,    (4.6.2)
  3) E_t[α∘X_{t-1}]^k E_t[ε_t]^l ≤ X_t^{k+l},    (4.6.3)

for all positive integers k and l. Also note that, for any two-dimensional vector l, (l^T u_t)^k = (l_1 u_{αt} + l_2 u_{λt})^k is a polynomial in E_t[α∘X_{t-1}] and E_t[ε_t] of degree k, which we write as

  Σ_{i=0}^k a_i E_t[α∘X_{t-1}]^i E_t[ε_t]^{k-i}

for some constants a_0, a_1, ..., a_k. The expected value of (l^T u_t)^k is therefore bounded:

  E[ Σ_{i=0}^k a_i E_t[α∘X_{t-1}]^i E_t[ε_t]^{k-i} ] ≤ Σ_{i=0}^k |a_i| E[X_t^k] < ∞,

since the Poisson marginal distribution has finite moments of all orders. The proof for the second part is automatic, since by the information identity |E[l^T u̇_t l]| = E[l^T u_t u_t^T l] = E[(l^T u_t)²] < ∞. □

The next proposition shows that the Fisher information is positive definite.

Proposition 4.6.1  E[(l^T u_t)²] = 0 if and only if l_1 = l_2 = 0, where l = (l_1, l_2)^T.

Proof: It is sufficient to show that

  Var[ l_1 αX_{t-1} p(X_t - 1 | X_{t-1} - 1) + l_2 λ p(X_t - 1 | X_{t-1}) ] ≠ 0

when l is non-zero. We prove the result by contradiction. Suppose there exists a non-zero l such that this variance is zero. This implies that

  l_1 αX_{t-1} p(X_t - 1 | X_{t-1} - 1) + l_2 λ p(X_t - 1 | X_{t-1}) = c    (4.6.4)

almost everywhere for some constant c. In particular, (4.6.4) holds when X_t = 1 and X_{t-1} = 0, which gives us

  c = l_1 α · 0 · p(0 | -1) + l_2 λ p(0 | 0) = l_2 λ e^{-λ},

or l_2 = c λ^{-1} e^{λ}. Taking X_t = 1 and X_{t-1} = 1 in (4.6.4) we can solve for l_1 as follows:

  c = l_1 α p(0 | 0) + l_2 λ p(0 | 1) = l_1 α e^{-λ} + c λ^{-1} e^{λ} · λ(1-α) e^{-λ} = l_1 α e^{-λ} + c(1-α),

or l_1 = c e^{λ}.
Finally, taking X_t = 1 and X_{t-1} = 2 in (4.6.4) we have

  c = l_1 α · 2 · p(0 | 1) + l_2 λ p(0 | 2)
    = c e^{λ} · 2α(1-α) e^{-λ} + c λ^{-1} e^{λ} · λ (1-α)² e^{-λ}
    = 2cα(1-α) + c(1-α)²
    = c(1 - α²).

This implies either α = 0 or c = 0. Since α = 0 lies outside the parameter space considered here, we conclude that c = 0; however, this implies that l_1 = l_2 = 0. □

It can be shown that the regularity condition C4 follows from the uniform strong law of large numbers; see Ferguson (1996, Ch. 16). Therefore the regularity conditions for Corollary 4.1.1 are satisfied, and as a result the maximum likelihood estimators α̂_n and λ̂_n have the following asymptotic distribution:

  √n ( (α̂_n, λ̂_n)^T - (α, λ)^T ) →^d N(0, I^{-1}),

where I is the Fisher information matrix.

The parameter estimates can be found using a Newton-Raphson type iterative procedure as follows: let U(θ) be the score function and let U̇(θ) denote the matrix of partial derivatives of U with respect to θ, where θ = (α, λ)^T. The iterative procedure is defined by

  θ^{(i+1)} = θ^{(i)} - [U̇(θ^{(i)})]^{-1} U(θ^{(i)}),  i = 0, 1, 2, ....

In some cases this procedure can be modified by replacing U̇(θ^{(i)}) with E[U̇(θ^{(i)})], and is then sometimes referred to as Fisher scoring. We use Fisher scoring to estimate the parameters in the Poisson AR(1) model, since E[U̇(θ^{(i)})] is easy to calculate. The CLS estimates generally work well as starting values. Occasionally (less than 1% of the time) we found that the CLS estimates caused the algorithm to diverge.

Example 4.6.1  The maximum likelihood estimates for our illustrative example are α̂ = 0.40 and λ̂ = 5.2. The inverse of the expected Fisher information matrix at these parameter estimates is given by

  I^{-1} = [  0.62  -5.17 ]
           [ -5.17  50.05 ]
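The thesis fits the model by Fisher scoring with CLS starting values. As an illustration only, the sketch below (our code; it uses SciPy's Nelder-Mead simplex in place of Fisher scoring) obtains the CML estimates by direct numerical maximization of the conditional log-likelihood on simulated data:

```python
import math
import numpy as np
from scipy.optimize import minimize

def p_trans(y, x, alpha, lam):
    """One-step transition probability of the Poisson AR(1) model."""
    return sum(math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
               * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
               for s in range(0, min(x, y) + 1))

def neg_cloglik(theta, xs):
    """Negative conditional log-likelihood given X_0."""
    alpha, lam = theta
    if not (0.0 < alpha < 1.0) or lam <= 0.0:
        return np.inf
    return -sum(math.log(p_trans(int(xs[t]), int(xs[t - 1]), alpha, lam))
                for t in range(1, len(xs)))

# Simulate a series with alpha = 0.5, lambda = 2.
rng = np.random.default_rng(1)
n, alpha0, lam0 = 400, 0.5, 2.0
x = np.empty(n, dtype=int)
x[0] = rng.poisson(lam0 / (1 - alpha0))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha0) + rng.poisson(lam0)

# CLS estimates (least-squares regression of X_t on X_{t-1}) as starting values.
xm, ym = x[:-1].mean(), x[1:].mean()
a_cls = ((x[:-1] - xm) * (x[1:] - ym)).sum() / ((x[:-1] - xm) ** 2).sum()
l_cls = ym - a_cls * xm

res = minimize(neg_cloglik, [a_cls, l_cls], args=(x,), method="Nelder-Mead")
a_ml, l_ml = res.x
```

The optimizer can only improve on the CLS starting values, and the CML estimates land near the true (0.5, 2.0).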
4.7 Comparison of methods

Al-Osh and Alzaid (1987) used a simulation to compare the following three estimation methods for the Poisson AR(1) model: Yule-Walker, CLS and CML. They note that the algebraic formulas for the Yule-Walker and CLS estimates are almost identical, and this is also seen in their numerical comparisons. Their results also show that there is only a small efficiency gain from CML when α is small, say less than 0.3. However, for larger values of α, say between 0.5 and 1, the efficiency gain from CML seems well worth the effort.

Our analysis differs in that we look at the asymptotic efficiency (AE). That is, we consider what happens as the sample size goes to infinity, and our results are not based on a simulation. We let θ̂_n be an estimate of θ, and denote its Godambe information by J_n and the inverse by J_n^{-1}. Let j^{kk} be the (k, k) element of J_n^{-1} and let i^{kk} be the (k, k) element of the inverse of the Fisher information matrix. Cox and Hinkley (1974) define the AE of the kth component of θ̂_n as

  AE(θ̂_{nk}) = i^{kk} / j^{kk}.

Figure 4.7.1  The asymptotic efficiency of conditional least squares as a function of α when λ = 1.

Figure 4.7.2  The asymptotic efficiency of conditional least squares as a function of λ when α = 0.3.

Figure 4.7.1 shows that as α goes to zero the AE goes to 1; that is, for small values of α there is little difference between CLS and CML. However, as α goes to one the AE goes to 0; that is, there is a substantial advantage in using CML when α is large. Figure 4.7.2 shows that as λ increases the AE in estimating α rises while the AE in estimating λ drops. In both cases the AE appears to be approaching a limit of about 0.83.

For GLS estimation we can use Theorem 4.3.2 to find bounds on the Godambe information. Unfortunately we found no increase in information in moving from CLS estimation to GLS estimation. Tables 4.7.1 and 4.7.2 show that the CLS information is contained within our bounds for the GLS information.
Further, the upper bound on the GLS information is not significantly closer to the CML information.

       GLS              CLS     CML
  α    (2.00, 2.20)     2.14    2.30
  λ    (0.788, 0.797)   0.794   0.802

Table 4.7.1  The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.3 and λ = 1.

       GLS              CLS     CML
  α    (6.18, 7.24)     7.17    8.35
  λ    (0.573, 0.619)   0.616   0.668

Table 4.7.2  The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.7 and λ = 1.

Next we tested our estimation methods on some misspecified data. We simulated 200 series of length 100 using binomial thinning with parameter α = 0.5 and misspecifying the arrival process by letting the distribution of ε_t be uniform over {0, 1, 2}. The resulting sampling distributions of α̂ and λ̂ for the estimation methods CLS, GLS and CML are summarized by box plots in Figures 4.7.3 and 4.7.4.

Figure 4.7.3 shows that the CML estimates for α are strongly biased; in fact, almost all of the CML estimates for α are greater than 0.5. In contrast, the CLS and GLS estimates for α are only slightly biased. Note that the interpretation of α has not changed; that is, α is the probability that a claimant in the current period continues to be a claimant in the next period. In Figure 4.7.4 we see that the CML estimates for λ are biased estimates of the mean of ε_t, which is 1; in fact, almost all of the CML estimates for λ are smaller than 1. In contrast, the sample means of the CLS and GLS estimates for λ are very close to 1. We can think of λ as the mean parameter of ε_t; however, it is not used to specify the distribution of ε_t. Estimates of λ are therefore estimates of the mean of ε_t.

Figure 4.7.3  Box plots comparing the sampling distributions of α̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.
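The experiment behind Figures 4.7.3 and 4.7.4 can be reproduced in miniature (our sketch, with fewer and shorter series than the thesis's 200 series of length 100 to keep the run time down; SciPy's Nelder-Mead is used in place of Fisher scoring):

```python
import math
import numpy as np
from scipy.optimize import minimize

def p_trans(y, x, alpha, lam):
    return sum(math.comb(x, s) * alpha**s * (1 - alpha)**(x - s)
               * math.exp(-lam) * lam**(y - s) / math.factorial(y - s)
               for s in range(0, min(x, y) + 1))

def neg_cloglik(theta, xs):
    alpha, lam = theta
    if not (0.0 < alpha < 1.0) or lam <= 0.0:
        return np.inf
    return -sum(math.log(p_trans(int(xs[t]), int(xs[t - 1]), alpha, lam))
                for t in range(1, len(xs)))

rng = np.random.default_rng(7)
a_cls_all, a_cml_all, l_cml_all = [], [], []
for _ in range(10):
    n, alpha0 = 150, 0.5
    x = np.empty(n, dtype=int)
    x[0] = 2
    for t in range(1, n):
        # correctly specified thinning, misspecified uniform{0,1,2} arrivals
        x[t] = rng.binomial(x[t - 1], alpha0) + rng.integers(0, 3)
    xm, ym = x[:-1].mean(), x[1:].mean()
    a_cls = ((x[:-1] - xm) * (x[1:] - ym)).sum() / ((x[:-1] - xm) ** 2).sum()
    a_cls_all.append(a_cls)
    start = [min(max(a_cls, 0.1), 0.9), max(ym - a_cls * xm, 0.1)]
    res = minimize(neg_cloglik, start, args=(x,), method="Nelder-Mead",
                   options={"xatol": 1e-3, "fatol": 1e-3})
    a_cml_all.append(res.x[0])
    l_cml_all.append(res.x[1])

mean_a_cls = float(np.mean(a_cls_all))
mean_a_cml = float(np.mean(a_cml_all))
mean_l_cml = float(np.mean(l_cml_all))
```

Because the conditional mean αX_{t-1} + λ is still correctly specified, CLS stays roughly centered at 0.5, while the Poisson-likelihood CML estimates drift as described in the text.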
Figure 4.7.4  Box plots comparing the sampling distributions of λ̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.

Chapter 5  Testing for independence

In this chapter we consider testing if the thinning parameter α in the Poisson AR(1) model is zero, that is, testing for independent observations. We have a strong belief that an AR(1) model is appropriate for WCB data, and in such cases it may not be necessary to test for independence. However, the test for independence is interesting from a mathematical point of view and would be necessary before fitting the model to other data sets, which may not be interpretable as a queue.

The first section examines what would happen if one blindly went ahead and used the standard Gaussian AR(1) asymptotic results. The next section bases inference on the conditional least squares estimator for α. As noted in Section 4.3 this estimator is the same as the Gaussian estimator for α; however, this time the asymptotic properties of the estimator are worked out under the Poisson model. We then derive a score based test and show that it is asymptotically equivalent to the CLS based test. Finally, we examine the score function in detail about the point α = 0, leading to non-standard Wald and likelihood ratio tests.

5.1 Gaussian AR(1)

Suppose we decided to ignore the fact that our time series is integer valued, and used the
That is, under/^ :a = 0, 0 p(~Jna > % ) = P{Z > % ) = 5 , where Z denotes the standard normal random variable. n s n s Note this is a one sided test and that a negative estimate for a would lead us to accept the null hypothesis. The power of the test is the probability of not committing a type II error, that is the probability of rejecting H :a = 0 when 7f,:0 < a < 1 is true. The power of the test 0 under the Gaussian assumption is, P(^td >^ ) n a = P = P\ V Vn(dâ€ž-a) Â£ -VÂ«a 1-a 1-a 2 1-a' a 2 2 In the next section we continue by using the correct asymptotic distribution for 5.2 4nd . n Conditional least squares A more sophisticated test for the hypothesis is based on the conditional least squares method. In section 4.2 we showed that the C L S estimate for a was the same as the Gaussian based estimate, but with a larger asymptotic variance, l - a + a ( l - a ) / x , . 2 2 Chapter 5. Testing for independence 112 However under the null hypothesis both variance expressions reduce to 1 and hence we would use the same critical value for both tests. Another way to put this is that the significance level of the test in section 5.1 is correct. However the power function in section 5.1 is incorrect under the Poisson AR(1) assumption since the variance used is too small. The difference between the Gaussian and Poisson based variance is a(l-a) /X, 2 which is small when X is large or whena is near zero or one. To help assess this effect on the power function we consider two graphs of the power function: first the power as a function of a , and second as a function of X . Power of Gaussian vs Poisson Test (function of alpha) GAR(1) PAR(1) T - ^ - r ^ - i - c o c o o c N i o c o O o O d O Q d T - T - T - C M d Alpha d d C d M C d M d Figure 5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function o / . Â« = 100. 
Figure 5.2.1 shows that for α between 0.01 and 0.16 the Gaussian based power understates the "true" power, and for α larger than 0.16 the Gaussian based power overstates the "true" power. However, overall the error in the Gaussian based power seems small. For example, if the true model is the Poisson AR(1) with α = 0.30 and λ = 1, and we performed the test on a sample of size 100, then the probability of rejecting H_0: α = 0 is 90.0% and not 93.2% as given by the Gaussian based power.

Figure 5.2.2 shows the following: the Gaussian based power is independent of λ; for extremely small values of λ the "true" power based on the Poisson AR(1) model is large, but as λ increases the "true" power quickly converges to the Gaussian based power. For example, if the true model is the Poisson AR(1) with α = 0.01 and λ = 0.1, and we performed the test on a sample of size 100, then the probability of rejecting H_0: α = 0 is 8.0% and not 6.1% as given by the Gaussian based power.

Figure 5.2.2  A comparison of the power for the Gaussian and Poisson based tests as a function of λ, with α = 0.01 and n = 100.
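Both power curves can be evaluated directly from the normal approximation. A sketch (ours; because of rounding and the exact convention used for the critical value, the thesis's reported percentages may differ slightly in the last digit):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(alpha, var, n=100, crit=1.645):
    """P(sqrt(n) alpha_hat > crit) when sqrt(n)(alpha_hat - alpha) ~ N(0, var)."""
    return 1.0 - phi((crit - math.sqrt(n) * alpha) / math.sqrt(var))

def power_gauss(alpha, n=100):
    # Gaussian AR(1) asymptotic variance: 1 - alpha^2
    return power(alpha, 1.0 - alpha**2, n)

def power_poisson(alpha, lam, n=100):
    # Poisson AR(1) CLS asymptotic variance: 1 - alpha^2 + alpha(1-alpha)^2/lambda
    return power(alpha, 1.0 - alpha**2 + alpha * (1.0 - alpha)**2 / lam, n)
```

At α = 0 both tests have size approximately 5%; at α = 0.3, λ = 1 the Gaussian formula overstates the Poisson based power, and at α = 0.01, λ = 0.1 it understates it, as in the figures.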
The martingale differences each have identical moments, and all of these moments exist and are finite. This ensures that the Lindeberg condition trivially holds. Thus the conditions for the central limit theorem are satisfied, and the distribution of the score is

  n^{−1/2} U_α(0, λ) →d N(0, 1 + λ).

To find the distribution of the score under the alternative hypothesis H₁: 0 < α < 1, we rewrite the score as

  U_α(0, λ) = (1/λ) Σ_{t=1}^n X_{t-1}(X_t − αX_{t-1} − λ) + (α/λ) Σ_{t=1}^n X²_{t-1}.

The first term in the score is a zero mean square integrable martingale. Applying the strong law of large numbers we get

  n⁻¹ U_α(0, λ) → 0 + (α/λ) ( λ/(1 − α) + (λ/(1 − α))² )  almost surely.

Therefore under the alternative hypothesis n^{−1/2} U_α(0, λ) will diverge, since it has no mean correction. This implies that the test is consistent, or that asymptotically the null hypothesis is rejected with probability one when the alternative hypothesis is true. Also note again that the test is one sided; that is, large positive values of the score lead us to reject the null hypothesis.

In practice we replace the nuisance parameter λ with an estimate. Under H₀: α = 0 the maximum likelihood estimate of the Poisson mean parameter is the sample mean, λ̂ = X̄. Plugging this estimate into the score function we get

  U_α(0, X̄) = (1/X̄) Σ_{t=1}^n (X_{t-1} − X̄)(X_t − X̄) + Σ_{t=1}^n (X_t − X̄).

The second term is o_p(1), and the first term, ignoring the factor 1/X̄, is a zero mean square integrable martingale whose summands have variance λ². Again the central limit theorem holds, and together with the continuous mapping theorem we get n^{−1/2} U_α(0, X̄) →d N(0, 1). Alternatively, we could have found the distribution of the score by noting that it can be written in terms of the conditional least squares estimator α̂ₙ of α, namely
  n^{−1/2} U_α(0, X̄) = √n α̂ₙ · (1/(n X̄)) Σ_{t=1}^n (X_{t-1} − X̄)².    (5.3.1)

Since both X̄ and n⁻¹ Σ_{t=1}^n (X_{t-1} − X̄)² converge in probability to λ under the null hypothesis, and to λ/(1 − α) under the alternative hypothesis, the continuous mapping theorem implies that n^{−1/2} U_α(0, X̄) and √n α̂ₙ both have the same asymptotic distribution. The difference between the score test and the conditional least squares t-test is in the estimate of λ in the denominator of the statistic.

To summarize, the test based on the Gaussian model has the correct significance level but the wrong power. The error in the power, however, is not large. The test based on the score is asymptotically equivalent under alternatives to the conditional least squares t-test. So, from the point of view of testing for independence, it is interesting to note that naively using the standard Gaussian test gives the correct result except for a small error in the test's power.

5.4 The score function on the boundary of the parameter space

In this section we take a detailed look at the score function about the point α = 0; although this is a boundary point of the parameter space, the score is well defined and well behaved there. Recall that the interpretation of α is the probability that an individual in the system at time t − 1 remains in the system at time t. Under this interpretation, values of α less than zero would not make any sense. However, the probability function p(x; α) = α(1 − α)^x, x = 0, 1, ..., is well defined for any real number α, and its derivatives exist and are continuous. Similarly, even though the Poisson AR(1) likelihood is not defined for negative values of α, we can still examine the function's mathematical properties at the point α = 0. To get an idea of what the likelihood looks like at α = 0, we simulated ten data sets and plotted the likelihood as a function of α.
Figure 5.4.1 shows five plots of the likelihood for simulated independent and identically distributed Poisson samples with mean 1, which corresponds to the case α = 0. For comparison we have also included, in Figure 5.4.2, five plots for the case where α is near zero.

[Figure 5.4.1: Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1.]

[Figure 5.4.2: Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1.]

In both figures the likelihood plots are smooth with a unique maximum. These plots do not answer the question of whether or not the score function is well behaved about α = 0, but they suggest that the derivative of the likelihood about α = 0 may exist and be continuous. Also note that the maximum for one of the simulated series in Figure 5.4.1 occurs at a point where α is negative.

We know that the score is defined at α = 0; we even gave an expression for it in Section 5.3. However, is it defined for α < 0, and if so, how does it behave? Recall that the conditional probabilities p(X_t | X_{t-1}), equation 4.1, are polynomials in α. One might think that a negative value of α could cause the conditional probabilities to become negative. To consider this possibility we rewrite the conditional probabilities by factoring out the x = 0 term:

  p(X_t | X_{t-1}) = e^{−λ} (λ^{X_t} / X_t!) (1 − α)^{X_{t-1}} [ 1 + Σ_{x=1}^{min(X_t, X_{t-1})} C(X_{t-1}, x) (α/(1 − α))^x X_t! / ((X_t − x)! λ^x) ].

The first factor is positive and, for α in a sufficiently small neighborhood of zero, the summation is less than one in absolute value with high probability. Therefore, given a value δ > 0, it is possible to find a neighborhood K of α = 0 such that the probability of p(X_t | X_{t-1}) being non-positive is less than δ. Since p(X_t | X_{t-1}) is a polynomial in α, the derivatives of p(X_t | X_{t-1}) with respect to α exist and are continuous.
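Because the conditional probability is a polynomial in α, it can be evaluated, and differentiated, at and even slightly below α = 0. A sketch that checks the finite-difference derivative of log p(X_t | X_{t-1}) at α = 0 against the summand X_{t-1}(X_t − λ)/λ of the score from Section 5.3 (`cond_pmf` is a hypothetical helper name; the values X_t = 2, X_{t-1} = 3, λ = 1 are illustrative):

```python
import math

def cond_pmf(xt, xtm1, a, lam):
    """Binomial(X_{t-1}, a) thinning convolved with Poisson(lam) arrivals.

    The sum below is a polynomial in a, so it is well defined for any
    real a, including small negative values.
    """
    total = 0.0
    for x in range(min(xt, xtm1) + 1):
        total += (math.comb(xtm1, x) * a**x * (1.0 - a)**(xtm1 - x)
                  * math.exp(-lam) * lam**(xt - x) / math.factorial(xt - x))
    return total

xt, xtm1, lam = 2, 3, 1.0
h = 1e-6
# central difference straddling alpha = 0, using a small negative alpha
fd = (math.log(cond_pmf(xt, xtm1, h, lam))
      - math.log(cond_pmf(xt, xtm1, -h, lam))) / (2.0 * h)
analytic = xtm1 * (xt - lam) / lam   # score summand at alpha = 0
```

The two quantities agree to several decimal places, illustrating that the score is continuous through the boundary point α = 0.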
Further, derivatives of log p(X_t | X_{t-1}) with respect to α are ratios of two polynomials in α, with the polynomial in the denominator being positive for α in some neighborhood of zero. Hence the derivative of log p(X_t | X_{t-1}) with respect to α exists and is continuous for α in some neighborhood M of zero.

Before calculating the Fisher information matrix we note the simplification when α = 0:

  p(X_t − i | X_{t-1} − i) = e^{−λ} λ^{X_t − i} / (X_t − i)! = [ X_t(X_t − 1)⋯(X_t − i + 1) / λ^i ] p(X_t | X_{t-1}).

Substituting this into equation 4.4.3 we find the observed Fisher information at α = 0,

  −l_αα = Σ_{t=1}^n { [X_{t-1}(X_t − λ)/λ]² − X_{t-1}(X_{t-1} − 1)[1 − 2X_t/λ + X_t(X_t − 1)/λ²] }.

Under the null hypothesis the expected information per observation is

  −E[l_αα]/n = E[X²_{t-1}] E[(X_t − λ)²]/λ² − E[X_{t-1}(X_{t-1} − 1)] E[1 − 2X_t/λ + X_t(X_t − 1)/λ²]
             = (λ + λ²)λ/λ² − λ²(1 − 2 + 1) = 1 + λ,

so that i_αα = 1 + λ. In a similar manner we find i_αλ = 1 and i_λλ = 1/λ. The inverse of the Fisher information matrix is

  [  1     −λ    ]
  [ −λ   λ + λ² ].

Let ᾶₙ be the value of α that maximizes the likelihood when the search for α is not restricted to (0, 1). We call ᾶₙ the likelihood maximizer to distinguish it from the maximum likelihood estimate α̂ₙ, which must lie in the parameter space (0, 1). Under the null hypothesis √n ᾶₙ converges in distribution to a standard normal random variable, and √n α̂ₙ converges in distribution to a random variable Z*, where Z* is defined by

  P(Z* ≤ z) = 0 for z < 0,  and  P(Z* ≤ z) = P(Z ≤ z) for z ≥ 0,

where Z has a standard normal distribution.

We can define the Wald statistic in the usual way, Wₙ = n(α̂ₙ − 0)². If the maximum likelihood estimator α̂ₙ is replaced by the likelihood maximizer ᾶₙ, then the Wald statistic has the usual chi-square distribution with one degree of freedom. However,
for the maximum likelihood estimate the Wald statistic converges to a modified chi-square random variable defined by

  P(χ̃² ≤ x) = ½ + ½ P(χ²₁ ≤ x),  x ≥ 0,    (5.4.1)

where χ²₁ has a chi-square distribution with one degree of freedom.

We also define the likelihood ratio statistic in the usual way, that is, 2 log Λ, where Λ = L(α̂ₙ, λ̂ₙ; X)/L(0, X̄ₙ; X), α̂ₙ and λ̂ₙ are the maximum likelihood estimates, and X̄ₙ is the sample mean. Our analysis at the beginning of this section showed that the derivatives of the log-likelihood with respect to α exist and are continuous at α = 0. It is therefore possible to make the usual Taylor series expansion of the log-likelihood about α = 0 and show its asymptotic equivalence to the Wald statistic.

Example 5.4.1 In this example we test our illustrative data set for independence. Table 5.4.1 shows the test statistic values as well as the 5% and 1% critical values. In all cases the null hypothesis of independence is rejected.

  Test        Statistic Value   5% Critical Value   1% Critical Value
  CLS         4.66              1.645               2.33
  Our Score   6.06              1.645               2.33
  Wald        15.09             2.71                5.41
  LR          24.27             2.71                5.41

Table 5.4.1 Tests for independence in the illustrative data set.

To assess how well the tests for independence work in small samples, we apply them to two sets of simulated data. For both sets we generate 1000 series of length 200. In the first set we let α = 0 and λ = 1, and in the second set we let α = 0.1 and λ = 1. Table 5.4.2 shows the percentage of times the null hypothesis of independence was rejected by each test for the first set of data, while Table 5.4.3 shows the corresponding percentages for the second set of data.
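The Wald and LR critical values in Table 5.4.1 (2.71 at 5% and 5.41 at 1%) follow from the modified chi-square distribution in (5.4.1): since P(χ̃² > c) = ½ P(χ²₁ > c) and χ²₁ = Z², the upper tail can be computed from the standard normal CDF. A quick check (`modified_chisq_tail` is a hypothetical helper name):

```python
from statistics import NormalDist

def modified_chisq_tail(c):
    """P(chi-tilde^2 > c) = 0.5 * P(chi^2_1 > c) = 1 - Phi(sqrt(c)), c >= 0.

    The factor one half reflects the point mass at zero: under H0 the
    boundary-restricted estimate equals zero with probability one half.
    """
    return 1.0 - NormalDist().cdf(c**0.5)

# Critical values from Table 5.4.1:
tail_5 = modified_chisq_tail(2.71)   # close to 0.05
tail_1 = modified_chisq_tail(5.41)   # close to 0.01
```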
  Test        Rejections at the 5% level   Rejections at the 1% level
  CLS         4.2%                         0.7%
  Our Score   4.3%                         0.6%
  Wald        4.1%                         1.0%
  LR          3.7%                         0.6%

Table 5.4.2 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1.

  Test        Rejections at the 5% level   Rejections at the 1% level
  CLS         37.4%                        16.0%
  Our Score   36.3%                        15.0%
  Wald        35.9%                        17.0%
  LR          36.0%                        15.5%

Table 5.4.3 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1.

From Table 5.4.2 we see that the probability of committing a type I error is about the same for all the tests. In Table 5.4.3 we see that the power of the CLS test may be slightly higher than that of the other tests at the 5% level, and that the power of the Wald test may be slightly higher than that of the other tests at the 1% level. The table also shows that there is a large probability of committing a type II error when α is small.

Chapter 6

6. General misspecification test

In this chapter we apply a general specification test, sometimes called the information matrix test, to our model. The test is equivalent to a score test of whether the parameters in the model are stochastic or not. Section 6.1 motivates the information matrix test. Then in Section 6.2 we derive the test for the simplest possible case and state the more general result found in McCabe and Leybourne (1996). Finally, in Section 6.3 we give the details for the Poisson AR(1) model and evaluate the test on some simulated data.

6.1 Overview

An advantage of maximum likelihood estimation is that the asymptotic variance of its estimators often attains the Cramér-Rao lower bound. However, a drawback of maximum likelihood estimation is that the estimates are not robust to model misspecification.
We saw this in Section 4.7, where we found that for the misspecified model our parameter estimates are severely biased. It is therefore important to check a model's specification if maximum likelihood estimation is to be used.

Let L and l denote the likelihood and log-likelihood respectively, and let θ be a vector of parameters. Also let the first and second partial derivatives of L and l with respect to θ be denoted by L_θ, L_θθ, l_θ and l_θθ respectively. The expected Fisher information can be expressed in two ways: the Hessian form −E[l_θθ] and the outer product form E[l_θ l_θᵀ]. When the model is correctly specified, l_θθ + l_θ l_θᵀ has a distribution with mean zero. An equivalent expression for l_θθ + l_θ l_θᵀ is L_θθ/L, which is a zero mean martingale. The equivalence of these two expressions is well known and can be shown by considering the second partial derivative of the log-likelihood as follows:

  l_θθ = (∂/∂θᵀ)(L_θ/L) = L_θθ/L − (L_θ/L)(L_θ/L)ᵀ = L_θθ/L − l_θ l_θᵀ.

Barndorff-Nielsen and Sorensen (1994) state that L_θ/L is a martingale. This is easy to prove as follows. Let f_t(y_t) denote the conditional density of Y_t given the past observations, and write the likelihood as Lₙ = Π_{t=1}^n f_t(y_t). Let ℱ_m be the sigma field generated by y₁, y₂, ..., y_m. For m < n, each factor f_t with t > m integrates out term by term under the conditional expectation, so that

  E[ (Lₙ)_θ / Lₙ | ℱ_m ] = (L_m)_θ / L_m.

Therefore L_θ/L is a martingale. Under certain regularity conditions (see Section 4.1), the martingale central limit theorem can be used to find the asymptotic distribution of the test statistic.

A good description of the information matrix test is found in White (1982). Chesher (1983) shows that for sequences of independent observations the information matrix test can be interpreted as a score based test of whether the model parameters are stochastic.
McCabe and Leybourne (1996) extend Chesher's result to a much more general setting, namely to sequences of random vectors which are possibly dependent and non-stationary.

6.2 Outline of the test

In this section we derive the score based interpretation of the information matrix test in the simplest case, and then state the more general result found in McCabe and Leybourne (1996). Let y₁, y₂, ..., yₙ be a sequence of observations. We will assume that the distribution of y_t depends on an unobserved random parameter θ_t, and denote the joint density of y_t and θ_t by f(y_t, θ_t). By definition f(y_t, θ_t) = f(y_t | θ_t) g(θ_t), where f(y_t | θ_t) is the conditional distribution of y_t given θ_t and g(θ_t) is the marginal distribution of θ_t. We will also assume that the sequence of parameters {θ_t} is uncorrelated with E[θ_t] = μ₀ and Var[θ_t] = π ≥ 0. If π = 0 then the parameters are not stochastic and the observations are independent and identically distributed.

Let L(y | θ) be the conditional likelihood, where y = (y₁, ..., yₙ)ᵀ and θ = (θ₁, ..., θₙ)ᵀ. Consider the following Taylor series expansion of L(y | θ) about the vector μ = (μ₀, ..., μ₀)ᵀ:

  L(y | θ) = L(y | μ) + (θ − μ)ᵀ L_θ(y | μ) + ½ (θ − μ)ᵀ L_θθ(y | μ)(θ − μ) + O(‖θ − μ‖³).

To proceed we need the following well-known property of the trace operator: E[XᵀAX] = tr(AΣ), where X is a random vector with E[XXᵀ] = Σ and A is a conformable matrix of constants. In the following we will use E_θ to denote expectation with respect to θ. The marginal likelihood is then

  L(y) = E_θ[L(y | θ)]
       = L(y | μ) + E_θ[(θ − μ)ᵀ] L_θ(y | μ) + ½ E_θ[(θ − μ)ᵀ L_θθ(y | μ)(θ − μ)] + E_θ[O(‖θ − μ‖³)]
       = L(y | μ) + 0 + ½ tr(L_θθ(y | μ) Ω) + o(E_θ[‖θ − μ‖²]),

where Ω = Var(θ) = π Iₙ. Therefore, if π is close to zero, the likelihood can be approximated by

  L(y) ≈ L(y | μ) + ½ tr(L_θθ(y | μ) Ω).
Using the relationship L_θθ/L = l_θθ + l_θ l_θᵀ from Section 6.1, we can rewrite the likelihood approximation as

  L(y) ≈ L(y | μ) { 1 + ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} Ω ) },

where l = log L(y | θ). We are interested in testing the null hypothesis H₀: π = 0 (θ is not stochastic) against the alternative hypothesis H₁: π > 0 (θ is stochastic). The score with respect to π is

  Uₙ(y, π) = (∂/∂π) log L(y) = ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} ∂Ω/∂π ) / ( 1 + ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} Ω ) ).

Evaluated at π = 0, so that Ω = 0 and ∂Ω/∂π = Iₙ, the score simplifies to

  Uₙ(y, 0) = ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} ).

It is not completely obvious that this is a martingale, since the number of parameters and the dimensions of l_θ and l_θθ are all increasing with n. Note that the elements of l_θ are (∂/∂θ_t) log f(y_t | θ_t) and the diagonal elements of l_θθ are (∂²/∂θ_t²) log f(y_t | θ_t). The score therefore becomes

  Uₙ(y, 0) = ½ Σ_{t=1}^n { (∂²/∂θ_t²) log f(y_t | θ_t) + [ (∂/∂θ_t) log f(y_t | θ_t) ]² } |_{θ_t = μ₀}.

This is, of course, the same as replacing the n parameters in θ with a single (1-dimensional) parameter μ₀. Letting l(μ₀, y_t) = log f(y_t | μ₀), we can then write the score as

  Uₙ(y, 0) = ½ Σ_{t=1}^n { l_μμ(μ₀, y_t) + l_μ(μ₀, y_t)² },

which is now clearly a martingale. Since {y_t} is a sequence of independent and identically distributed random variables, and assuming the derivatives of log f(y_t | μ₀) with respect to μ₀ are continuous, the summands are also independent and identically distributed. Assuming the variance exists and is finite, that is,

  σ² = Var[ l_μμ(μ₀, y_t) + l_μ(μ₀, y_t)² ],

the central limit theorem implies that (n σ²)^{−1/2} Σ_{t=1}^n { l_μμ(μ₀, y_t) + l_μ(μ₀, y_t)² } →d N(0, 1).

For the more general case we let {y_t} be a sequence of vectors of observations. It is assumed that the marginal distribution of y_t depends on the parameter θ_t, which is a k × 1 random vector.
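Before stating the general result, the simplest case above can be illustrated concretely (this instance is ours, not an example from the thesis): take f(y_t | θ_t) to be Poisson with mean θ_t. Then l_μ = y/μ − 1 and l_μμ = −y/μ², and the summand l_μμ + l_μ² has mean zero under the null hypothesis, which is the information matrix equality per observation. A sketch verifying this by enumeration (μ = 1.3 and the truncation at 60 are arbitrary):

```python
import math

mu = 1.3
total = 0.0
for y in range(61):
    p = math.exp(-mu) * mu**y / math.factorial(y)   # Poisson pmf
    score = y / mu - 1.0      # l_mu for a single Poisson observation
    hess = -y / mu**2         # l_mumu
    total += p * (hess + score**2)
# total is ~0: the Hessian and outer-product forms of the information agree
```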
Let L(y | θ) and L(y) be respectively the conditional and marginal likelihoods, where y = (y₁ᵀ, ..., y_Nᵀ)ᵀ and θ = (θ₁ᵀ, ..., θ_Nᵀ)ᵀ. Also assume that E[θ_t] = μ and Var[θ_t] = Ω(π), where π = (π₁, ..., π_m)ᵀ and Ω(0) = 0. McCabe and Leybourne show that the marginal likelihood can be approximated as

  L(y) ≈ L(y | μ) { 1 + ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} Ω ) }.

We are interested in testing the hypothesis that none of the parameters are stochastic, H₀: π = 0, against the alternative that at least one of the parameters is stochastic, H₁: π_i > 0 for at least one i, i = 1, 2, ..., m. McCabe and Leybourne show that the test statistic to be used in this case is

  U_N = ½ tr( (l_θθ + l_θ l_θᵀ)|_{θ=μ} (∂Ω/∂πᵀ) ι_m ),

or equivalently

  U_N = ½ vecᵀ[ (l_θθ + l_θ l_θᵀ)|_{θ=μ} ] vec[ (∂Ω/∂πᵀ) ι_m ],

where ι_m is the m × 1 unit vector and vec is the operation of column stacking. Under H₀: π = 0, U_N is a zero mean martingale, and under appropriate regularity conditions (again see Section 4.1), ⟨U⟩_N^{−1/2} U_N →d N(0, 1) as N → ∞.

6.3 Details for the Poisson AR(1) model

In this section we present the details of the specification test for the Poisson AR(1) model and apply the test to simulated data, some for which the Poisson AR(1) is the correct specification and some for which it is not. We denote the conditional probability of X_t given X_{t-1} and the parameter values α_t and λ_t by

  p_t(X_t | X_{t-1}) = Σ_{x=0}^{min(X_t, X_{t-1})} C(X_{t-1}, x) α_t^x (1 − α_t)^{X_{t-1} − x} e^{−λ_t} λ_t^{X_t − x} / (X_t − x)!.

We assume that the sequences {α_t} and {λ_t} are independent and identically distributed with the following means, variances and covariance: E[α_t] = α, E[λ_t] = λ, Var[α_t] = π₁ ≥ 0, Var[λ_t] = π₂ ≥ 0 and Cov[α_t, λ_t] = −π₃ ≤ 0. We can justify the use of a negative covariance as follows. To simplify the argument, assume α_t = α and λ_t = λ for all t. The marginal mean of X_t is μ = λ/(1 − α).
If the mean μ is fixed, then increasing α corresponds to decreasing λ, and we therefore assume α_t and λ_t to be negatively correlated.

Our vector of random parameters is θ = (α₁, α₂, ..., αₙ, λ₁, λ₂, ..., λₙ)ᵀ, and the first and second derivatives of the log-likelihood with respect to θ are

  l_θ = (l_{α₁}, ..., l_{αₙ}, l_{λ₁}, ..., l_{λₙ})ᵀ

and

  l_θθ = [ diag(l_{α₁α₁}, ..., l_{αₙαₙ})   diag(l_{α₁λ₁}, ..., l_{αₙλₙ}) ]
         [ diag(l_{α₁λ₁}, ..., l_{αₙλₙ})   diag(l_{λ₁λ₁}, ..., l_{λₙλₙ}) ],

where l_{α_t} = (∂/∂α_t) log p_t(X_t | X_{t-1}), l_{λ_t} = (∂/∂λ_t) log p_t(X_t | X_{t-1}), and l_{α_tα_t}, l_{α_tλ_t} and l_{λ_tλ_t} are the corresponding second derivatives. The variance matrix of θ can be written as

  Ω(π) = [  π₁ Iₙ   −π₃ Iₙ ]
         [ −π₃ Iₙ    π₂ Iₙ ],

where Iₙ denotes the n × n identity matrix. The derivatives of Ω are

  ∂Ω/∂π₁ = [ Iₙ  0 ]     ∂Ω/∂π₂ = [ 0  0  ]     ∂Ω/∂π₃ = [  0   −Iₙ ]
           [ 0   0 ],              [ 0  Iₙ ],              [ −Iₙ   0  ].

Finally, the score statistic for testing the null hypothesis H₀: π₁ = 0, π₂ = 0 and π₃ = 0 against the alternative hypothesis H₁: π_i > 0 for at least one i is given by

  Uₙ = ½ Σ_{t=1}^n { l_{α_tα_t}(α, λ) + l²_{α_t}(α, λ) + l_{λ_tλ_t}(α, λ) + l²_{λ_t}(α, λ) − 2 l_{α_t}(α, λ) l_{λ_t}(α, λ) − 2 l_{α_tλ_t}(α, λ) }.

Note that all the derivatives in this expression can be found in Section 4.4. If this joint test rejects the null hypothesis, then the test can be separated into individual tests to try to identify which parts of the model are inadequate.

In the Poisson AR(1) model there are two random components: the thinning operator and the arrival process. Recall that in Section 4.4 we showed the Pearson residuals could be decomposed into continuation and arrival residuals. In a similar manner we can decompose the score statistic into components. In this case, however, the decomposition is into three parts: a continuation component (for the binomial thinning operator), an arrival component, and an interaction component for the interaction between the binomial thinning operator and the arrival process.

In the test statistic Uₙ, the component ½ Σ_{t=1}^n { l_{α_tα_t}(α, λ) + l²_{α_t}(α, λ) }
tests the adequacy of the binomial thinning operator to explain the variation in those who continue to collect from one period to the next. Binomial thinning assumes that the recoveries of individuals are independent, which is probably a reasonable assumption: it is unlikely that one individual's recovery time would affect another's, unless we had limited medical services and recovery of one individual meant that the next individual could start treatment. The binomial thinning operator also assumes that all individuals recover at the same rate. This is a less realistic assumption, since there is wide variation in individual health due to genetic factors and lifestyle choices, such as diet and exercise. Recovery rates should vary from person to person, since we would expect recovery rates to depend on the person's health immediately prior to injury or illness.

We now examine the thinning operator in more detail. The Bernoulli assumption seems reasonable. That is, for individual i who is collecting disability at time t − 1 there are two outcomes: recovers (stops collecting STD) or does not recover (continues to collect STD). This, of course, ignores other possibilities, such as death, which we could either assume has a negligible probability or simply add to the recovery probability and think of as an exit probability. However, each individual collecting disability at time t − 1 will have a different level of health and therefore a different probability of recovery before the next period. The number of individuals who continue to collect at time t can then be written as Σ_{i=1}^{X_{t-1}} ρ_i, where the ρ_i are independent Bernoulli random variables and the probability that individual i recovers is P(ρ_i = 0) = 1 − α_i. This thinning operator has more variation than the binomial thinning operator.
The question is how much more variation, or whether the binomial thinning operator is sufficient to account for the majority of the variation. If we found that the binomial thinning operator was not sufficient to explain the variation, that is, the result of the specification test was to reject the hypothesis that the binomial thinning parameter is non-stochastic, then we would have to consider over-dispersed models such as those in McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996). The problem with the thinning operators used in McKenzie (1986) and Al-Osh and Aly (1992) is that they are a random sum of geometric random variables, and it is possible to have α ∘ X_{t-1} > X_{t-1}, which would not make sense if we want to use the operator to model a queue.

The component statistic ½ Σ_{t=1}^n { l_{λ_tλ_t}(α, λ) + l²_{λ_t}(α, λ) } tests the adequacy of the Poisson distribution in describing the arrival process. Two reasons for using the Poisson distribution are: first, it has a simple probability mass function with well-known properties; and second, when combined with the binomial thinning operator the marginal distribution of our process remains Poisson. In practice we often use the Poisson distribution to model count data. The main criticism of this is that most real data are over-dispersed; that is, the Poisson distribution is not sufficient to describe the variation found in the data. One method for dealing with over-dispersed count data is to let the Poisson mean λ be random. Usually it is assumed that λ has a gamma distribution, which transforms the distribution into a negative binomial. For AR(1) models with negative binomial marginals see McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996).

Finally, the component Σ_{t=1}^n { l_{α_t}(α, λ) l_{λ_t}(α, λ) + l_{α_tλ_t}(α, λ) } tests if the arrival and departure processes are independent.
If the workers are a small cohort, then the arrival and departure processes are dependent, since, as workers recover, the cohort size increases, which raises the cohort's exposure to injury and increases the number of new injuries (arrivals). In most industries this is unlikely to be a problem, since the number of injured workers is usually a very small fraction of the industry's work force.

Next, suppose we believe that the binomial thinning operator with a fixed non-stochastic parameter is sufficient to explain the variation in the number of individuals remaining in the system from one period to the next, but that we have picked the arrival process to be Poisson out of convenience and wish to check the adequacy of this specification. In this case the score statistic simplifies to

  Uₙ = ½ Σ_{t=1}^n { l_{λ_tλ_t}(α, λ) + l²_{λ_t}(α, λ) }.

Using the expressions in Section 4.5 we can rewrite the score statistic as

  Uₙ = (1/(2λ²)) Σ_{t=1}^n { E_t[s_t²] − (1 + 2λ) E_t[s_t] + λ² },

where s_t denotes the number of arrivals in period t and E_t[·] denotes expectation conditional on X_t and X_{t-1}. This is, of course, a zero mean martingale. For low count series it is easy to calculate its variance numerically from the martingale differences.

Since the martingale differences U_t − U_{t-1} are bounded by λ + X_t², which has finite moments, the weak law of large numbers holds for the martingale differences and their squares; that is, n⁻¹ Uₙ → E[U_t − U_{t-1}] = 0 and n⁻¹[U]ₙ → E[(U_t − U_{t-1})²]. The bound on the martingale differences also means that the Lindeberg condition is satisfied; hence Var[Uₙ]^{−1/2} Uₙ →d N(0, 1).

To evaluate how well this test performs, we apply it to some simulated data which are correctly specified, and to some simulated data which are misspecified. In the first case we simulated 100 series of length 200 from the Poisson AR(1) model with α = 0.5 and λ = 1. At the 1% level of significance the information matrix test rejects 3 series, and at the 5% level it rejects 8 series.
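The zero mean claim for the simplified statistic can be checked numerically: by the tower property, the conditional mean of the summand E_t[s_t²] − (1 + 2λ)E_t[s_t] + λ², given X_{t-1} alone, equals E[ε²] − (1 + 2λ)E[ε] + λ² = 0 for ε ~ Poisson(λ). A sketch (`weights` and `check` are hypothetical helper names; α = 0.5, λ = 1, X_{t-1} = 3 and the truncation at 40 are illustrative choices):

```python
import math

def weights(xt, xtm1, a, lam):
    """Terms of the binomial-thinning/Poisson-arrival convolution
    p_t(xt | xtm1), indexed by the number of survivors x."""
    return [(math.comb(xtm1, x) * a**x * (1.0 - a)**(xtm1 - x)
             * math.exp(-lam) * lam**(xt - x) / math.factorial(xt - x))
            for x in range(min(xt, xtm1) + 1)]

def check(xtm1, a, lam, upper=40):
    """Average the summand over X_t given X_{t-1} = xtm1; should be ~0."""
    total = 0.0
    for xt in range(upper + 1):
        w = weights(xt, xtm1, a, lam)
        p = sum(w)                                               # p_t(xt|xtm1)
        m1 = sum(wi * (xt - x) for x, wi in enumerate(w)) / p    # E_t[s_t]
        m2 = sum(wi * (xt - x)**2 for x, wi in enumerate(w)) / p # E_t[s_t^2]
        total += p * (m2 - (1.0 + 2.0 * lam) * m1 + lam**2)
    return total

residual = check(3, 0.5, 1.0)   # ~0 when the Poisson arrivals are correct
```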
The number of rejections is only slightly higher than expected, so it appears that a sample size of 200 is sufficient for the distribution of the test statistic to be approximately standard normal. Next, to assess the power of the test, we simulate some misspecified data by letting ε_t follow the uniform distribution over the set of integers {0, 1, 2}, that is, P(ε_t = i) = 1/3, i = 0, 1, 2. Note that the mean of ε_t remains the same. This time, out of 100 series, the information matrix test rejects 92 series at the 1% level and all 100 series at the 5% level. This indicates that the test has strong power against this type of misspecification.

Example 6.3.1 In this example we test the specification of the Poisson arrivals in our illustrative data set. The information matrix test statistic is 1.45 (p-value 14.7%). We therefore accept the null hypothesis that the Poisson parameter is non-stochastic.

Chapter 7

7. Models with covariates

7.1 Model definition and introduction

Let X₀, X₁, ..., Xₙ be a series of dependent Poisson counts generated according to the following model:

  X_t = α_t ∘ X_{t-1} + ε_t,

where X₀ has a Poisson distribution with mean λ₀ and {ε_t} is a series of independently distributed Poisson random variables with mean λ_t. The thinning operator "∘" is defined as in Section 2.1. Given X_{t-1}, α_t ∘ X_{t-1} and ε_t are assumed to be independent; this can be checked with the model specification test, see Sections 6.3 and 7.5. As in Section 6.3 we will denote the conditional probability of X_t given X_{t-1} as p_t(X_t | X_{t-1}).

An easy way to incorporate covariates into the model is to use a link function, which is the common method in generalized linear models. The idea behind the link function is to map an unrestricted (real) parameter space for the regression coefficients into the restricted parameter space required by the model. For example, since the
Models with covariates 138 parameter space for a , is (0,1), the following link is appropriate a, = exp(lfY) l + exp(Jfy)' where Y is an m -dimensional vector of time-varying covariates and y eSR is an mm t dimensional vector of parameters. Similarly, taking X = exp(Z, P), where Z is a pr t t dimensional vector of time-varying covariates and p e$R , will insure that X p t is positive. 7.2 Forecasting The first step in the process of forecasting is to find the k-step ahead distribution, that is, the conditional distribution of X N+k given X . N This distribution is defined by the conditional moment generating function of X given X ; N+k N and is given in the following theorem. Theorem 7.2.1 For the Poisson AR(1) model defined in Section 7.1 the k step ahead conditional moment generating Junction is given by 0 1 N+kl N A A X ' t Af+i jV+2"' jV+i a a e +(1 a A T + i N+2 a Proof: The result is proved using induction. The one step ahead conditional moment generating function is Chapter 7. Models with covariates 139 N+l Â° N + X = [a ^ Â£N+l ]}| XN 1 + (1 - Â» ) f " expjX (e -1)}. s w + 1 Now, suppose that the k-l step ahead conditional moment generating function is f N+k-l M , (s) = x x A N+k-\ \ A N N+k-\ N+k-l 1 exp ' v N+k-l ( ^ - l ) X ^ , n ; a I 7=1+1 i=N+l /V+t-1 where we define ]^[cc =0 when i = N + k-l. Then the k step ahead conditional y 7='-+l moment generating function is x ^ is) M N N = E[e^\X ] N = E[E[e ^\X ]\X ] sX N+l f N+k =E eÂ° Ua N+k N+k ^ 1-nS J+ j=N+2 N \ j=N+2 N+k exp (e*-l) J N+k [ \\ i=N+2 j=i+l X N J N+k E[ ?â„¢\Xâ€ž]exp\(S-l)Y X Yla = e l I l J i=N+2 j=i+l N+k :[a where e' = ^Y[^N+2 j a w + 1 e'+(l-a 0]^exp{X w + + ( '-l)}exp (e -l) s w + 1 e X ^ f [ i=N+2 j=i+l ~ 1 1 ^ + 2 .>)' Substituting this in for e' gives a N+k a 4 ' Chapter 7. 
  M_{X_{N+k}|X_N}(s) = [ Π_{j=N+1}^{N+k} α_j e^s + (1 − Π_{j=N+1}^{N+k} α_j) ]^{X_N} exp{ (e^s − 1) Σ_{i=N+1}^{N+k} λ_i Π_{j=i+1}^{N+k} α_j },

which completes the induction. ∎

Remarks

1. The distribution of X_{N+k} | X_N is a convolution of a binomial distribution with parameters Π_{j=N+1}^{N+k} α_j and X_N and a Poisson distribution with parameter Σ_{i=N+1}^{N+k} λ_i Π_{j=i+1}^{N+k} α_j. Hence it has mean

  X_N Π_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i Π_{j=i+1}^{N+k} α_j

and variance

  X_N Π_{j=N+1}^{N+k} α_j (1 − Π_{j=N+1}^{N+k} α_j) + Σ_{i=N+1}^{N+k} λ_i Π_{j=i+1}^{N+k} α_j.

2. From the conditional moment generating function, the conditional distribution of X_t given X₀ is a convolution of a binomial distribution with parameters Π_{j=1}^t α_j and X₀ and a Poisson distribution with parameter Σ_{i=1}^t λ_i Π_{j=i+1}^t α_j. Hence, if the unconditional distribution of X₀ is Poisson with mean λ₀, then the unconditional distribution of X_t is Poisson with mean λ_t + λ_{t-1} α_t + λ_{t-2} α_t α_{t-1} + ⋯ + λ₀ α_t α_{t-1} ⋯ α₁.

For comparison purposes we define a corresponding Gaussian AR(1) model with covariates as follows:

  X_t = λ_t + α_t X_{t-1} + ε_t,

where ε_t is normally distributed with mean zero and variance σ², and the parameters λ_t and α_t are defined as in the Poisson model. For this Gaussian AR(1) model, X_{N+k} | X_N has a normal distribution with mean X_N Π_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i Π_{j=i+1}^{N+k} α_j and variance σ² Σ_{i=N+1}^{N+k} Π_{j=i+1}^{N+k} α_j². As in Section 3.1, the conditional means of X_{N+k} | X_N in the Poisson and Gaussian models are the same, but the conditional distributions are quite different.

Let p_k(x) be the conditional probability function of X_{N+k} given X_N, and define the conditional median of X_{N+k} given X_N as the smallest non-negative integer m such that Σ_{x=0}^m p_k(x) ≥ 0.5. This is the same definition used in Chapter 3.
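Remark 1 gives the k-step ahead distribution in closed form, so point mass forecasts and the conditional median are straightforward to compute. A sketch (`kstep_pmf`/`kstep_median` are hypothetical helper names, and the particular α and λ paths are illustrative covariate-driven values):

```python
import math

def kstep_pmf(x, XN, alphas, lams):
    """P(X_{N+k} = x | X_N): convolution of Binomial(X_N, prod of alphas)
    and Poisson(sum_i lam_i * product of the later alphas).
    alphas and lams hold alpha_{N+1..N+k} and lambda_{N+1..N+k}."""
    a = math.prod(alphas)
    mu = sum(li * math.prod(alphas[i + 1:]) for i, li in enumerate(lams))
    total = 0.0
    for s in range(min(x, XN) + 1):
        total += (math.comb(XN, s) * a**s * (1.0 - a)**(XN - s)
                  * math.exp(-mu) * mu**(x - s) / math.factorial(x - s))
    return total

def kstep_median(XN, alphas, lams):
    """Smallest m with cumulative k-step ahead probability >= 0.5."""
    cum, m = 0.0, 0
    while True:
        cum += kstep_pmf(m, XN, alphas, lams)
        if cum >= 0.5:
            return m
        m += 1

alphas = [0.6, 0.5, 0.4]   # illustrative alpha_{N+1..N+3}
lams = [1.2, 0.8, 1.0]     # illustrative lambda_{N+1..N+3}
pmf0 = kstep_pmf(0, 5, alphas, lams)   # point mass forecast of a zero count
med = kstep_median(5, alphas, lams)    # conditional median forecast
```

For a low count series, listing `kstep_pmf(x, ...)` over the handful of outcomes with non-zero probability gives exactly the distributional forecast advocated in the thesis.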
If, at time N, we know the future covariate values $\{Y_t\}_{t=N+1}^{N+k}$ and $\{Z_t\}_{t=N+1}^{N+k}$, as well as the parameter values $\gamma$ and $\beta$, then the k-step ahead conditional probability function, $p_k(x)$, can be found from the k-step ahead moment generating function. Proposition 3.2.1 says that the k-step ahead conditional median is the forecast that minimizes the k-step ahead absolute forecast error.

With regard to forecasting count series, the biggest advantage of using a data coherent model is that individual probabilities (point mass forecasts) can be calculated for all possible future outcomes. This is especially useful for low count series, where only a few of the outcomes have non-negligible probabilities. If the purpose of a model is to make forecasts, it is helpful to select covariates that are either deterministic or easy to forecast, since future covariate values are needed to forecast with these types of models.

Next we consider the limiting distribution for a couple of simple cases.

Example 7.2.1 Suppose that there are two different rates of new injuries, and each month the rate switches back and forth; that is, $\lambda_{2t+1}=\lambda_1$ and $\lambda_{2t}=\lambda_2$, $t=0,1,2,\ldots$. Also assume that the recovery rate is constant over time, that is, $\alpha_t=\alpha$ fixed. The unconditional distributions of $X_{2t+1}$ and $X_{2t}$ are Poisson with respective means
$$\lambda_1+\lambda_2\alpha+\lambda_1\alpha^2+\cdots+\lambda_2\alpha^{2t-1}+\lambda_1\alpha^{2t} \quad\text{and}\quad \lambda_2+\lambda_1\alpha+\lambda_2\alpha^2+\cdots+\lambda_2\alpha^{2t-2}+\lambda_1\alpha^{2t-1}.$$
There is no limiting distribution in this case. However, the following subsequential limits hold:
$$X_{2t+1}\xrightarrow{D}P\!\left(\frac{\lambda_1+\alpha\lambda_2}{1-\alpha^2}\right) \quad\text{and}\quad X_{2t}\xrightarrow{D}P\!\left(\frac{\lambda_2+\alpha\lambda_1}{1-\alpha^2}\right).$$
Also, if $X_0\sim P\!\left(\frac{\lambda_2+\alpha\lambda_1}{1-\alpha^2}\right)$, then the unconditional or marginal distributions are $X_{2t}\sim P\!\left(\frac{\lambda_2+\alpha\lambda_1}{1-\alpha^2}\right)$ and $X_{2t+1}\sim P\!\left(\frac{\lambda_1+\alpha\lambda_2}{1-\alpha^2}\right)$.

Example 7.2.2 Consider the following modification of Example 7.2.1.
Suppose there are two distinct seasons (winter and summer) and that the two seasons have different injury rates; that is,
$$\lambda_{12t+i}=\lambda_1 \quad\text{and}\quad \lambda_{12t+i+6}=\lambda_2 \qquad\text{for } i=1,2,\ldots,6,\ \ t=0,1,2,\ldots.$$
If $X_0\sim P\!\left(\dfrac{\lambda_2(1-\alpha^6)}{1-\alpha}+\dfrac{(1-\alpha^6)(\alpha^6\lambda_1+\alpha^{12}\lambda_2)}{(1-\alpha)(1-\alpha^{12})}\right)$, then the marginal distributions are
$$X_{12t+i}\sim P\!\left(\frac{\lambda_1(1-\alpha^i)}{1-\alpha}+\frac{(1-\alpha^6)(\alpha^i\lambda_2+\alpha^{i+6}\lambda_1)}{(1-\alpha)(1-\alpha^{12})}\right)$$
and
$$X_{12t+i+6}\sim P\!\left(\frac{\lambda_2(1-\alpha^i)}{1-\alpha}+\frac{(1-\alpha^6)(\alpha^i\lambda_1+\alpha^{i+6}\lambda_2)}{(1-\alpha)(1-\alpha^{12})}\right)$$
for $i=1,2,\ldots,6$.

Example 7.2.2 is fairly realistic in that many industries have two well-defined seasons. Exposure to injury is often highest during the summer months, when there is a larger work force and more overtime. In contrast, during the winter months, when there is less work, the work force is smaller and those employees who are working are encouraged to take holidays. In British Columbia, the logging industry and the fishing industry are two examples of industries whose exposure to injury changes with the season.

In Section 3.4 we showed how to construct confidence intervals for the probability mass forecasts. The following modifications are needed when regressors are included in the model. Suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by $\hat\gamma_n$ and $\hat\beta_n$. We will assume that $(\hat\gamma_n,\hat\beta_n)$ is asymptotically normal with mean $(\gamma_0,\beta_0)$ and variance $n^{-1}i^{-1}$, where $i$ is the Fisher information matrix and $\gamma_0$ and $\beta_0$ are the "true" parameter values.

The k-step ahead probability function can be written as
$$p_k(x\mid X_n)=\sum_{s=0}^{\min(x,X_n)}\binom{X_n}{s}a_{n,k}^s(1-a_{n,k})^{X_n-s}\,e^{-\lambda_{n,k}}\frac{\lambda_{n,k}^{x-s}}{(x-s)!},$$
where $a_{n,k}=\prod_{j=n+1}^{n+k}\alpha_j$ and $\lambda_{n,k}=\lambda_{n+k}+\lambda_{n+k-1}\alpha_{n+k}+\cdots+\lambda_{n+1}\alpha_{n+2}\cdots\alpha_{n+k}$. By Theorem 3.4.1, for fixed x, $p_k(x\mid X_n;\hat\gamma_n,\hat\beta_n)$ has an asymptotically normal distribution with mean $p_k(x\mid X_n;\gamma_0,\beta_0)$ and variance $n^{-1}\sigma_k^2(x;\gamma_0,\beta_0)$, where
$$\sigma_k^2(x;\gamma_0,\beta_0)=\left.\frac{\partial p_k}{\partial\gamma}^{\!T}\!i^{\gamma\gamma}\frac{\partial p_k}{\partial\gamma}\right|_{\gamma=\gamma_0,\beta=\beta_0}+2\left.\frac{\partial p_k}{\partial\gamma}^{\!T}\!i^{\gamma\beta}\frac{\partial p_k}{\partial\beta}\right|_{\gamma=\gamma_0,\beta=\beta_0}+\left.\frac{\partial p_k}{\partial\beta}^{\!T}\!i^{\beta\beta}\frac{\partial p_k}{\partial\beta}\right|_{\gamma=\gamma_0,\beta=\beta_0}, \tag{7.2.1}$$
where the inverse of the information matrix is partitioned as
$$i^{-1}=\begin{pmatrix} i^{\gamma\gamma} & i^{\gamma\beta}\\ i^{\beta\gamma} & i^{\beta\beta}\end{pmatrix}$$
and the matrices $i^{\gamma\gamma}$, $i^{\gamma\beta}=(i^{\beta\gamma})^T$ and $i^{\beta\beta}$ are of dimension $m\times m$, $m\times p$ and $p\times p$ respectively. The partial derivatives of the point mass probability $p_k(x\mid X_n;\gamma,\beta)$ are given by
$$\frac{\partial}{\partial\gamma}p_k(x\mid X_n)=\frac{\partial p_k(x\mid X_n)}{\partial a_{n,k}}\frac{\partial a_{n,k}}{\partial\gamma}+\frac{\partial p_k(x\mid X_n)}{\partial\lambda_{n,k}}\frac{\partial\lambda_{n,k}}{\partial\gamma} \tag{7.2.2}$$
and
$$\frac{\partial}{\partial\beta}p_k(x\mid X_n)=\frac{\partial p_k(x\mid X_n)}{\partial\lambda_{n,k}}\frac{\partial\lambda_{n,k}}{\partial\beta}. \tag{7.2.3}$$
Expressions for the partial derivatives $\partial p_k(x\mid X_n)/\partial a_{n,k}$ and $\partial p_k(x\mid X_n)/\partial\lambda_{n,k}$ are found in (4.4.1) and (4.4.3) respectively. The other three partial derivatives are given by
$$\frac{\partial a_{n,k}}{\partial\gamma}=a_{n,k}\sum_{j=n+1}^{n+k}(1-\alpha_j)Y_j, \qquad \frac{\partial\lambda_{n,k}}{\partial\beta}=\sum_{m=1}^{k}\lambda_{n+m}a_{n+m,k-m}Z_{n+m}$$
and
$$\frac{\partial\lambda_{n,k}}{\partial\gamma}=\sum_{m=1}^{k-1}\lambda_{n+m}a_{n+m,k-m}\sum_{j=n+m+1}^{n+k}(1-\alpha_j)Y_j,$$
where $a_{n+k,0}$ is taken to be one. An approximate 95% confidence interval for $p_k(x\mid X_n;\gamma_0,\beta_0)$ is
$$\hat p_k(x\mid X_n;\hat\gamma_n,\hat\beta_n)\pm 2n^{-1/2}\hat\sigma_k(x;\hat\gamma_n,\hat\beta_n). \tag{7.2.4}$$

7.3 Estimation

The model parameters can be estimated using the Newton–Raphson iterative scheme discussed in Section 4.6. In this case, the vector of parameters is $\theta=(\gamma_1,\gamma_2,\ldots,\gamma_m,\beta_1,\beta_2,\ldots,\beta_p)^T$. The addition of regressors to the model makes $E[\dot U(\theta^{(r)})]$ less practical to compute. In Chapter 8 we fit Poisson AR(1) models to some WCB data. In these models $\alpha_t$ is held constant over time. The parameter estimates were found using a quasi-Newton procedure, in which we calculate $\dot U(\theta^{(r)})$ numerically. We found the following starting values worked well: $\alpha=0.9$, $\beta_1=\beta_2=\cdots=\beta_p=0$.

We now consider the asymptotic properties of the model. Let
$$U_{\gamma,n}=\frac{\partial}{\partial\gamma}\sum_{t=1}^{n}\log p_t(X_t\mid X_{t-1}) \quad\text{and}\quad U_{\beta,n}=\frac{\partial}{\partial\beta}\sum_{t=1}^{n}\log p_t(X_t\mid X_{t-1})$$
denote the score functions for the departure parameters $\gamma$ and the arrival parameters $\beta$, respectively, where $p_t(X_t\mid X_{t-1})$ is the conditional probability function of the model and $\partial\alpha_t/\partial\gamma=\alpha_t(1-\alpha_t)Y_t$, $\partial\lambda_t/\partial\beta=\lambda_tZ_t$. We also denote the martingale differences as $u_{\gamma,t}=U_{\gamma,t}-U_{\gamma,t-1}$ and $u_{\beta,t}=U_{\beta,t}-U_{\beta,t-1}$.
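The conditional likelihood being maximized is simply a product of Binomial–Poisson convolution probabilities. A minimal sketch of the negative log-likelihood under the links of Section 7.1 (our own code, with our own function names; a quasi-Newton routine such as the one described above would then be applied to this function):

```python
from math import comb, exp, log, factorial

def transition_prob(x_prev, x_curr, alpha, lam):
    """p_t(x_curr | x_prev): Binomial(x_prev, alpha)-Poisson(lam) convolution."""
    return sum(comb(x_prev, s) * alpha**s * (1 - alpha)**(x_prev - s)
               * exp(-lam) * lam**(x_curr - s) / factorial(x_curr - s)
               for s in range(min(x_prev, x_curr) + 1))

def neg_log_lik(gamma, beta, series, Y, Z):
    """Conditional negative log-likelihood of the Poisson AR(1) with covariates.

    gamma, beta: parameter vectors; Y[t], Z[t]: covariate vectors for time t.
    alpha_t and lambda_t come from the logistic and log links of Section 7.1.
    """
    nll = 0.0
    for t in range(1, len(series)):
        eta = sum(g * y for g, y in zip(gamma, Y[t]))
        alpha_t = exp(eta) / (1 + exp(eta))                   # logistic link
        lam_t = exp(sum(b * z for b, z in zip(beta, Z[t])))   # log link
        nll -= log(transition_prob(series[t - 1], series[t], alpha_t, lam_t))
    return nll
```

As a sanity check, when $\gamma$ is pushed toward $-\infty$ (so $\alpha_t\approx0$) this reduces to the ordinary Poisson log-likelihood.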
Using the expressions in Section 4.5 for the derivatives of the conditional probability, the martingale differences can be written as
$$u_{\gamma,t}=\left(\alpha_tX_{t-1}\frac{p_t(X_t-1\mid X_{t-1}-1)}{p_t(X_t\mid X_{t-1})}-\alpha_tX_{t-1}\right)Y_t=\bigl(E_t[\alpha_t\circ X_{t-1}]-\alpha_tX_{t-1}\bigr)Y_t$$
and
$$u_{\beta,t}=\left(\lambda_t\frac{p_t(X_t-1\mid X_{t-1})}{p_t(X_t\mid X_{t-1})}-\lambda_t\right)Z_t=\bigl(E_t[\varepsilon_t]-\lambda_t\bigr)Z_t,$$
where $E_t$ denotes expectation conditional on $X_t$ and $X_{t-1}$.

The following proposition shows this model can be identified.

Proposition 7.3.1 The Poisson AR(1) model defined in Section 7.1 can be identified if and only if the following two matrices have full row rank: $[Y_1,Y_2,\ldots,Y_n]$ and $[Z_1,Z_2,\ldots,Z_n]$.

Proof: A statistical model is identifiable if its Fisher information is positive definite. It is therefore sufficient to show that
$$(a^T,b^T)\,i\,\binom{a}{b}=0 \quad\text{if and only if}\quad a_1=a_2=\cdots=a_m=b_1=b_2=\cdots=b_p=0, \tag{7.3.1}$$
where $a=(a_1,a_2,\ldots,a_m)^T$ and $b=(b_1,b_2,\ldots,b_p)^T$. The left-hand side of (7.3.1) can be written as
$$\frac{1}{n}\sum_{t=1}^{n}E\Bigl[\bigl(c_{1t}\,a^TY_t+c_{2t}\,b^TZ_t\bigr)^2\Bigr],$$
where $c_{1t}=E_t[\alpha_t\circ X_{t-1}]-\alpha_tX_{t-1}$ and $c_{2t}=E_t[\varepsilon_t]-\lambda_t$, so that $u_{\gamma,t}=c_{1t}Y_t$ and $u_{\beta,t}=c_{2t}Z_t$. From Proposition 4.6.1, $E[(c_{1t}a^TY_t+c_{2t}b^TZ_t)^2]=0$ if and only if $a^TY_t=0$ and $b^TZ_t=0$. That is, (7.3.1) holds if and only if $a^T[Y_1,Y_2,\ldots,Y_n]=0$ and $b^T[Z_1,Z_2,\ldots,Z_n]=0$, which by assumption holds if and only if $a=0$ and $b=0$. $\Box$

If the regularity conditions for the martingale CLT hold, then the score can be used to make inferences about the parameters $\gamma$ and $\beta$. Basically, if the covariate processes are "well" behaved, then these conditions will be satisfied. As mentioned in Example 7.2.2, in many industries exposure to injury is seasonal, in which case the addition of seasonal covariates to the arrival process is appropriate. A common method for modeling seasonality in monthly data is to use indicator covariates for each month.
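As a small aside (our own construction, not from the thesis), monthly indicator covariates and the full set of sinusoidal components at the harmonics of period 12 span the same space of 12-periodic sequences, so either parameterization can represent twelve arbitrary monthly levels; months are coded 0–11 below for convenience:

```python
import math

def indicator_row(t):
    """Monthly indicator covariates: 1 in the month containing t, else 0."""
    month = t % 12
    return [1.0 if j == month else 0.0 for j in range(12)]

def sinusoid_row(t):
    """Constant plus sin/cos pairs at the harmonic frequencies of period 12."""
    row = [1.0]
    for k in range(1, 6):
        row.append(math.sin(2 * math.pi * k * t / 12))
        row.append(math.cos(2 * math.pi * k * t / 12))
    row.append(math.cos(math.pi * t))  # the 6-cycle (Nyquist) component
    return row
```

A discrete-Fourier identity gives the exact coefficients that turn the 12 sinusoidal columns into any single monthly indicator, which is the sense in which "both models are the same" when all 12 components are included.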
Let $Z_t=(z_{t1},z_{t2},\ldots,z_{t,12})^T$, where the j-th component is one in month j and zero in all other months; that is,
$$z_{tj}=\begin{cases}1 & \text{mod}_{12}(t)=j\\ 0 & \text{otherwise}\end{cases} \qquad j=1,2,\ldots,12.$$
Note that if a constant is included in the regressors, then one of the monthly indicators must be dropped. Often it is more appropriate to use seasonal indicators rather than monthly indicators. However, the number and length of the seasons is somewhat arbitrary. Some authors prefer to use sinusoidal covariates, such as $Z_t=\bigl(\sin(2\pi t/12),\cos(2\pi t/12),\sin(4\pi t/12),\ldots\bigr)^T$. When all 12 sinusoidal components are included, both models are the same. Usually only a few of the sinusoidal components are needed; however, this still generates 12 different monthly levels.

Suppose we have a model with 12 values of the thinning parameter, one for each month of the year; that is, $\alpha_1,\alpha_2,\ldots,\alpha_{12}$ are the thinning parameters for January, February, ..., December respectively. Similarly, suppose the model also has 12 values of the arrival parameter, $\lambda_1,\lambda_2,\ldots,\lambda_{12}$, again corresponding to the 12 months of the year.

Theorem 7.3.1 If $X_0$ has a Poisson distribution with mean $\mu_0$, then the marginal distribution of $X_j$ is Poisson with mean $\mu_{\text{mod}_{12}(j)}$, where
$$\mu_j=\frac{1}{1-\alpha_1\alpha_2\cdots\alpha_{12}}\sum_{r=0}^{11}\lambda_{\text{mod}_{12}(j-r)}\prod_{i=0}^{r-1}\alpha_{\text{mod}_{12}(j-i)},$$
$\text{mod}_{12}(\cdot)$ is taken in $\{1,2,\ldots,12\}$ and the empty product ($r=0$) equals one.

Remark Note that $\mu_{12t+k}=\mu_k$, that is, $X_{12t+k}\sim P(\mu_k)$, for $k=1,\ldots,12$ and $t=0,1,2,\ldots$.

Proof: Since $X_{12t+k}$ is a convolution of $\alpha_k\circ X_{12t+k-1}$ and $\varepsilon_{12t+k}$, its mean must satisfy $\mu_k=\alpha_k\mu_{k-1}+\lambda_k$. We will show this equation holds for $k=1$, and omit the tedious details for $k=2,3,\ldots,12$. We begin by writing $\mu_0\,(=\mu_{12})$ and $\mu_1$ out in long form. That is,
$$\mu_0=\frac{\lambda_{12}+\alpha_{12}\lambda_{11}+\alpha_{12}\alpha_{11}\lambda_{10}+\cdots+\alpha_{12}\alpha_{11}\cdots\alpha_2\lambda_1}{1-\alpha_1\alpha_2\cdots\alpha_{12}}$$
and
$$\mu_1=\frac{\lambda_1+\alpha_1\lambda_{12}+\alpha_1\alpha_{12}\lambda_{11}+\cdots+\alpha_1\alpha_{12}\cdots\alpha_3\lambda_2}{1-\alpha_1\alpha_2\cdots\alpha_{12}}.$$
Then
$$\alpha_1\mu_0+\lambda_1=\frac{\alpha_1\lambda_{12}+\alpha_1\alpha_{12}\lambda_{11}+\cdots+\alpha_1\alpha_{12}\cdots\alpha_2\lambda_1+\lambda_1(1-\alpha_1\alpha_2\cdots\alpha_{12})}{1-\alpha_1\alpha_2\cdots\alpha_{12}}=\mu_1,$$
since the term $\alpha_1\alpha_{12}\alpha_{11}\cdots\alpha_2\lambda_1$ cancels with $-\lambda_1\alpha_1\alpha_2\cdots\alpha_{12}$. $\Box$

It can be shown that the sequence of random vectors $\mathbf{X}_t=(X_{12t+1},X_{12t+2},\ldots,X_{12t+12})$ is stationary.

Proposition 7.3.2 If the stochastic process $X_t$ follows the Poisson AR(1) model as defined in Section 7.1 and there exist an $\alpha_{\max}<1$ and a $\lambda_{\max}<\infty$ such that $\alpha_t\leq\alpha_{\max}$ and $\lambda_t\leq\lambda_{\max}$ for all t, then it is $\alpha$-mixing with $\alpha(n)=O(\alpha_{\max}^n)$.

The proof of Proposition 7.3.2 is very similar to that of Proposition 4.3.1, but the notation required makes it tedious and it is therefore omitted.

Proposition 7.3.4 Under the assumptions of Proposition 7.3.2, all the moments of $X_t$ exist and are finite. Further, all the moments of $u_t$ and $\dot u_t$ exist and are finite, where $u_t=U_t-U_{t-1}$, $U_t$ is the score function and $\dot u_t$ is the matrix of partial derivatives of $u_t$ with respect to $\gamma$ and $\beta$.

Proof: As noted in the Remarks following Theorem 7.2.1, the marginal distribution of $X_t$ is Poisson with mean $\lambda_t+\lambda_{t-1}\alpha_t+\lambda_{t-2}\alpha_t\alpha_{t-1}+\cdots+\lambda_0\alpha_t\alpha_{t-1}\cdots\alpha_1$, which is bounded by $\lambda_{\max}/(1-\alpha_{\max})<\infty$. Hence all the moments of $X_t$ exist and are finite. The second part follows from the fact that both $u_t$ and $\dot u_t$ are bounded polynomials in $X_t$, and hence all of their moments exist and are finite. $\Box$

Proposition 7.3.3 If, in addition to the assumptions of Proposition 7.3.2, we assume that the model is identifiable and $X_t$ is stationary, then the Fisher information, $i$, is finite and the maximum likelihood estimates are asymptotically normal with variance $n^{-1}i^{-1}$.

Proof: By Proposition 7.3.2, $X_t$ is $\alpha$-mixing, which combined with stationarity implies that $X_t$ is ergodic. As a consequence of Theorem 4.1.1, $u_t$ and $\dot u_t$ are both ergodic. Further, $u_t$ and $\dot u_t$ have finite variances due to Proposition 7.3.2.
Hence the conditions for Corollary 4.1.1 are satisfied and the result follows. $\Box$

7.4 Testing

In the first part of this section we consider testing for independence or, more formally, testing the null hypothesis $H_0:\alpha=0$ against the alternative hypothesis $H_a:0<\alpha<1$. We begin by finding the Fisher information matrix under the null hypothesis. Let $l_t=\log p_t(X_t\mid X_{t-1})$, and write $l_{t\alpha}=\partial l_t/\partial\alpha$, $l_{t\lambda}=\partial l_t/\partial\lambda$, $l_{t\alpha\alpha}=\partial^2l_t/\partial\alpha^2$, $l_{t\alpha\lambda}=\partial^2l_t/\partial\alpha\,\partial\lambda$ and $l_{t\lambda\lambda}=\partial^2l_t/\partial\lambda^2$. All of these partial derivatives are found in Section 4.4. When $\alpha=0$, the conditional distribution of $X_t$ given $X_{t-1}$ is Poisson with mean $\lambda_t$, and we have the following simplifications:
$$l_{t\alpha}=X_{t-1}\left(\frac{X_t}{\lambda_t}-1\right) \quad\text{and}\quad l_{t\lambda}=\frac{X_t}{\lambda_t}-1.$$
The expected values of these under the assumption $\alpha=0$ are
$$E[l_{t\alpha}]=E[X_{t-1}]\,E\!\left[\frac{X_t}{\lambda_t}-1\right]=0 \quad\text{and}\quad E[l_{t\lambda}]=E\!\left[\frac{X_t}{\lambda_t}\right]-1=0,$$
while the second moments are
$$E[l_{t\alpha}^2]=\frac{\lambda_{t-1}^2+\lambda_{t-1}}{\lambda_t}, \qquad E[l_{t\alpha}l_{t\lambda}]=\frac{\lambda_{t-1}}{\lambda_t}, \qquad E[l_{t\lambda}^2]=\frac{1}{\lambda_t}.$$
Next we calculate the partial derivatives with respect to $\beta$. Since $\lambda_t=\exp(Z_t^T\beta)$, we have $\partial\lambda_t/\partial\beta=\lambda_tZ_t$, so that $l_{t\beta}=(X_t-\lambda_t)Z_t$ and
$$E[l_{t\alpha}l_{t\beta}^T]=\lambda_{t-1}Z_t^T \quad\text{and}\quad E[l_{t\beta}l_{t\beta}^T]=\lambda_tZ_tZ_t^T.$$
The Fisher information is therefore given by
$$i_n=\frac{1}{n}\sum_{t=1}^{n}\begin{pmatrix}\dfrac{\lambda_{t-1}^2+\lambda_{t-1}}{\lambda_t} & \lambda_{t-1}Z_t^T\\[2mm] \lambda_{t-1}Z_t & \lambda_tZ_tZ_t^T\end{pmatrix}.$$
We will assume there exists a positive definite matrix $i$ such that $i_n\to i$ as $n\to\infty$, and let $\sigma^2$ denote the first diagonal entry of $i^{-1}$.
We define the following three sets of parameter "estimates": let $\tilde\alpha_n$ and $\tilde\beta_n$ be the unrestricted maximizers of the likelihood, let $\hat\alpha_n$ and $\hat\beta_n$ be the maximum likelihood estimates (that is, maximizing over the parameter space) and let $\beta_n^*$ be the maximum likelihood estimate of $\beta$ when $\alpha=0$.

Under the null hypothesis, $\sqrt n\,\tilde\alpha_n/\sigma$ converges in distribution to a standard normal random variable and $\sqrt n\,\hat\alpha_n/\sigma$ converges in distribution to a random variable $Z^*$, where $Z^*$ is defined by
$$P(Z^*\leq z)=\begin{cases}0 & z<0\\ P(Z\leq z) & z\geq 0,\end{cases}$$
where $Z$ has a standard normal distribution. If we respectively define the Wald and likelihood ratio statistics as $\tilde W_n=n\tilde\alpha_n^2/\sigma^2$ and $2\log\tilde\Lambda$, where $\tilde\Lambda=L(\tilde\alpha_n,\tilde\beta_n;X)/L(0,\beta_n^*;X)$, then under the null hypothesis both converge to the usual chi-square random variable with one degree of freedom. If, however, we redefine the Wald and likelihood ratio statistics as $W_n=n\hat\alpha_n^2/\sigma^2$ and $2\log\Lambda$, where $\Lambda=L(\hat\alpha_n,\hat\beta_n;X)/L(0,\beta_n^*;X)$, then under the null hypothesis both converge to a modified chi-square random variable defined by
$$P(\bar\chi^2\leq x)=\tfrac12+\tfrac12P(\chi_1^2\leq x), \qquad x\geq 0,$$
where $\chi_1^2$ has a chi-square distribution with one degree of freedom.

Consider the following zero-mean martingale:
$$U_{\alpha n}(0,\beta)=\sum_{t=1}^{n}X_{t-1}\left(\frac{X_t}{\lambda_t}-1\right),$$
where $U_{\alpha n}(\alpha,\beta)$ is the score function with respect to $\alpha$. Under the null hypothesis, $U_{\alpha n}(0,\beta)$ is a mean-zero martingale. Under the alternative, however, it is necessary to subtract a positive quantity from $U_{\alpha n}(0,\beta)$ to get a zero-mean martingale. Hence large positive values of $U_{\alpha n}(0,\beta)$ give evidence in favor of the alternative hypothesis; that is, the score test is one-sided.

Next we consider the information matrix test for specification. We will allow covariates in both the arrival and departure processes and use the same notation as in Section 7.3.
That is, we let $u_t=U_t-U_{t-1}$, where $U_t$ denotes the score function at time t, and let $\dot u_t$ denote the matrix of partial derivatives of $u_t$ with respect to $\gamma$ and $\beta$. The information matrix test is based on the following martingale:
$$M_n=\sum_{t=1}^{n}m_t, \qquad m_t=\mathbf{1}_{m+p}^T\bigl(u_tu_t^T+\dot u_t\bigr)\mathbf{1}_{m+p},$$
where $\mathbf{1}_{m+p}$ is an $(m+p)$-vector of 1's. The quadratic variation of $M_n$ is given by $[M]_n=\sum_{t=1}^{n}m_t^2$. Under the assumptions of Proposition 7.3.3, $m_t$ is stationary and $\alpha$-mixing. Further, all the moments of $m_t$ exist and are finite. This is sufficient for the conditions of Theorem 4.1.1 to hold, and hence $[M]_n^{-1/2}M_n$ converges in distribution to a standard normal random variable.

Chapter 8

8. Application to counts of workers collecting disability benefits

In this chapter we analyze five data series drawn from the WCB data set. Section 8.1 contains descriptions of the five data series and preliminary analysis of the data. Next, in Section 8.2, we carry out our in-depth analysis, which includes model estimation, selection and testing. We calculate forecasts for the first six months of 1995 in Section 8.3. Finally, in Section 8.4 we show what happens if the Gaussian AR(1) model is fit to the data.

8.1 Workers' Compensation Data

We have selected the following five data series for analysis in this chapter. Each series contains monthly counts of claimants collecting STWLB from the WCB. All the claimants are male, between the ages of 35 and 54, work in the logging industry and reported their claim to the Richmond service delivery location. The distinguishing difference between the five series is the nature of the injury. We will refer to the five series as data sets 1, 2, 3, 4 and 5. The claimants in data set 1 have burn-related injuries. In data set 2 the claimants have soft tissue injuries, such as contusions and bruises. The claimants in data set 3 have cuts, lacerations or punctures. Claimants in data set 4 have dermatitis, and data set 5 contains claimants with dislocations.
Table 8.1.1 contains a summary of simple descriptive statistics for the five data sets, and Figures 8.1.1 through 8.1.5 contain time series plots for the five data sets. Plots of the sample autocorrelation function and sample partial autocorrelation function are found in Figures 8.1.6 and 8.1.7.

Data Set   Minimum   Maximum   Median   Mean   Variance
1          0         2         0        0.34   0.33
2          4         17        9        9.83   9.48
3          1         21        5        6.13   11.70
4          0         2         0        0.28   0.27
5          0         4         1        0.92   0.76

Table 8.1.1 A summary of simple statistics for data sets 1 through 5.

A property of the Poisson AR(1) model is that the marginal mean and variance should be the same; see Proposition 2.3.1. In Table 8.1.1 we see that for each data set the mean and variance are close, except for data set 3, where the variance is almost twice the mean. However, if the Poisson mean is non-constant, this would cause the variance to be larger than the mean.

In the time series plot of data set 1, Figure 8.1.1, there is a significant change in the pattern after the middle of 1993. It therefore seems unlikely that an AR(1) model will fit the series well. This is further confirmed by the sample autocorrelation function, which does not decay fast enough for an AR(1) model. Also, the first two lags of the partial autocorrelation function are statistically significant at the 5% level, suggesting that an AR(2) model might be appropriate. However, an AR(2) model has the problem that it is not easy to interpret and it has more than one specification; see the discussion at the beginning of Section 2.4.

The difficulty with analyzing a series with very low counts, such as data set 1, is that a single claim can drastically change the shape and correlation pattern of the series. For instance, in data set 1 there appears to be a single claimant collecting STWLB between May 1993 and December 1994.
It is impossible to tell this for sure, since the same pattern could be caused by several individual claims, although, based on the earlier claims frequency in the series, this is less likely. Further investigation shows that our conjecture of a single long-duration claimant is correct. Since the frequency of severe claims is low (one claim in ten years) and since our model is not designed to handle such claims, we removed this outlier from data set 1. To distinguish this new series from the original, we refer to it as data set 1*. The sample autocorrelation function and sample partial autocorrelation function for data set 1* are found in Figure 8.1.8. There is a slight seasonal pattern in the sample autocorrelation function. However, a seasonal model may not be necessary, since the correlations at lags 6 and 12 are well within the 5% confidence limits. The sample partial autocorrelation function indicates that an AR(1) model is appropriate.

The claims counts in the second data set are significantly higher than in data set 1. Hence one or two persistent claims would not have a profound effect on the series' shape and correlation pattern. The time series plot of data set 2, Figure 8.1.2, looks stationary with possible seasonality. A seasonal pattern appears in the sample autocorrelation function, which has large negative correlations at lags 7, 8 and 9, and large positive correlations at lags 13 and 14. The sample partial autocorrelation function is consistent with the AR(1) model.

The claims counts in data set 3 are also relatively large, with a mean of 6.133. In July 1988 the claims count is unusually high at 21 and is the only observation above 14. The time series plot, Figure 8.1.3, shows a seasonal pattern and a drop in the variation after January 1990.
The sample autocorrelation function confirms a seasonal pattern, while the sample partial autocorrelation function suggests an AR(1) model is appropriate; see Figure 8.1.6.

The claims counts in data set 4 are low. Between June 1990 and April 1991 there appears to be a persistent claim; again, it is impossible to tell for sure without further investigation. The latter half of the series appears to have a slightly lower claims frequency than the first half. The sample autocorrelation function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model.

In data set 5 the claims counts are low, with slightly higher claims counts occurring between January 1990 and December 1992, which again could be caused by a single claimant with a severe or recurring dislocation. The sample autocorrelation function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model.

[Figure 8.1.1 A time series plot of data set 1.]
[Figure 8.1.2 A time series plot of data set 2.]
[Figure 8.1.3 A time series plot of data set 3.]
[Figure 8.1.4 A time series plot of data set 4.]
[Figure 8.1.5 A time series plot of data set 5.]
[Figure 8.1.6 ACFs and PACFs for data sets 1 to 3.]
[Figure 8.1.7 ACFs and PACFs for data sets 4 and 5.]
[Figure 8.1.8 ACF and PACF for data set 1*.]

8.2 Model selection and testing

In this section we select and estimate a Poisson AR(1) model for each of our data sets. We restrict the class of models by considering only models where the departure rate is fixed, that is, $\alpha_t=\alpha$ for all t, and where the arrival rate is either constant or depends on sinusoidal seasonal regressors. The parameter estimates for data sets 1*, 2, 3, 4 and 5 are summarized in Table 8.2.3.
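The restricted model class of this section — a fixed thinning parameter $\alpha$ plus Poisson arrivals whose rate is constant or driven by sinusoidal regressors — is straightforward to simulate, which is useful for checking estimation code. The sketch below is ours, with invented parameter values rather than estimates from the thesis:

```python
import math
import random

def simulate_poisson_ar1(n, alpha, beta, seed=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with
    lambda_t = exp(beta[0] + beta[1]*sin(2*pi*t/12) + beta[2]*cos(2*pi*t/12))."""
    rng = random.Random(seed)

    def lam(t):
        return math.exp(beta[0] + beta[1] * math.sin(2 * math.pi * t / 12)
                        + beta[2] * math.cos(2 * math.pi * t / 12))

    def poisson(mu):
        # Knuth's method; adequate for the small rates used here
        L, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    x, series = 0, []
    for t in range(1, n + 1):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)  # alpha o x
        x = survivors + poisson(lam(t))
        series.append(x)
    return series
```

Over a full seasonal cycle the sample mean of such a series should settle near the average arrival rate divided by $1-\alpha$, mirroring the stationary-mean results of Section 7.2.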
Our preliminary analysis in Section 8.1 suggested the following needs for regressors: data sets 1* and 2 might need seasonal regressors, data set 3 will almost certainly need seasonal regressors, and data sets 4 and 5 will not require seasonal regressors.

In analyzing data set 1*, we failed to find any seasonal regressors whose coefficients were significantly different from zero at the 5% level. We therefore proceeded to analyze the model with a constant arrival rate. The information matrix test statistic for the joint test that neither the departure nor the arrival parameter is stochastic is 0.240 (P-value 0.81), while for the individual test that the arrival parameter is non-stochastic it is 0.134 (P-value 0.89). In both cases we accept the null hypothesis of non-stochastic parameters. Henceforth in this section we will refer to these two tests simply as the joint information matrix test and the individual information matrix test.

In Table 8.2.3 we see that the lower bound of the 95% confidence interval for $\alpha$ is close to zero, 0.007. It is therefore worth testing for independence in this series. In Table 8.2.1 we see that all of the tests reject independence at the 5% level and that the CLS and Wald tests reject independence at the 1% level. We conclude that it is unlikely the series is independent.

In Figure 8.2.1 we have included three residual plots: Pearson, continuation and arrival; see Section 4.4 for the development of the continuation and arrival residuals. Recall that the Pearson residuals can be decomposed into the continuation and arrival residuals as follows:
$$r_t=X_t-\alpha X_{t-1}-\lambda=E_t[X_t]-\alpha X_{t-1}-\lambda=E_t[\alpha\circ X_{t-1}+\varepsilon_t]-\alpha X_{t-1}-\lambda=r_{1t}+r_{2t},$$
where $r_{1t}=E_t[\alpha\circ X_{t-1}]-\alpha X_{t-1}$ and $r_{2t}=E_t[\varepsilon_t]-\lambda$ are, respectively, the continuation and arrival residuals at time t. The residuals can be standardized as follows: $r_{1t}/\sqrt{E_{t-1}[r_{1t}^2]}$ and $r_{2t}/\sqrt{E_{t-1}[r_{2t}^2]}$. Recall that $E_t$ is the operation of expectation conditional on $\mathcal F_t=\sigma(X_0,X_1,\ldots,X_t)$.

Since the continuation and arrival residuals are new ideas, we now analyze these residuals in detail for data set 1*. We begin by considering the following cases for the standardized continuation residuals of data set 1*:

Case 1. $X_{t-1}=0$.
The residual can be standardized as follows: r j E _ ^rf t r jJE7 _ [l* Xj / and r j [ r 2 1 on 3, 2t 2 t x , ? ]^ . Recall E is the operation of expectation conditional t =G(X ,X ,...,X ). 0 x t Since the continuation and arrival residuals are new ideas, we now analyze these residuals in detail for data set 1*. We begin by considering the following cases for the standardized continuation residuals of data set 1*: Case 1. X _ = 0. t x 167 Chapter 8. Application to counts of workers collecting disability benefits In this case, a Â°X _ given X _ = 0, is not random but identically equal to zero, hence t x t x E [a o I ] = 0. Since aX _ is also equal to zero in this case, the residual at time t is zero. t M t x This is a key observation and an important property. It shows that, in this case, all of the deviation between the observed value of X and its expected value at time t-1 is due to the t arrival process and not the continuation process. Case 2. X _ = 1 and X = 0. t x t In this case, a o J , . , given X =0, t is non-random and equal to zero since nobody continues. Therefore E [a Â° X _ ] = 0. Since oX,_ is positive, the residual at time t is t t x x negative. For data set 1* the standardized residual is -0.709. Case 3. X _ = 2 and X, = 0. t x This case is similar to case 2, the difference is that oX _ is twice as large as in case 2. For t x data set 1* the standardized residual is -0.919. Note that the residual is not twice the value in case 2, since, in this case, the standardization is conditional on X _ = 2 and not t x Case 4. X _ =1 andX, =1. t x In this case, a Â°X _ given X _ = 1 and X = 1, is a random variable taking values in the t x t x t set {0,1}. That is, the one individual collecting at time t is either continuing to collect from time t-1 or is a new claim at time t. For data set 1* the standardized residual is Chapter 8. Application to counts of workers collecting disability benefits 168 1.364. Case 5. X _ = 1 and X = 2. 
In this case, $\alpha\circ X_{t-1}$, given $X_{t-1}=1$ and $X_t=2$, is again a random variable taking values in the set {0,1}. Note that this is a different random variable from the one in Case 4; that is,
$$P(\alpha\circ X_{t-1}=1\mid X_{t-1}=1,X_t=2)>P(\alpha\circ X_{t-1}=1\mid X_{t-1}=1,X_t=1).$$
At time t there are two possibilities: two arrivals, or one arrival and one continuing claim. For data set 1* the standardized residual is 1.727.

Although other cases are possible, these are the only five observed in data set 1*. In a similar manner we now analyze the standardized arrival residuals for data set 1*.

Case 1. $X_{t-1}=0$ and $X_t=0$.

Conditional on $X_t=0$, $\varepsilon_t$ is non-random and equal to zero; hence $E_t[\varepsilon_t]=0$. This results in a negative residual. For data set 1* the standardized residual is -0.366.

Case 2. $X_{t-1}=1$ and $X_t=0$.

This case is the same as Case 1 except that the scaling (the standard deviation conditional on $X_{t-1}=1$) is different. For data set 1* the standardized residual is -0.520.

Case 3. $X_{t-1}=2$ and $X_t=0$.

This case is the same as Cases 1 and 2 except that the scaling (the standard deviation conditional on $X_{t-1}=2$) is different. For data set 1* the residual is -0.654.

Case 4. $X_{t-1}=0$ and $X_t=1$.

In this case, $\varepsilon_t$, given $X_{t-1}=0$ and $X_t=1$, is non-random and equal to 1. For data set 1* the standardized residual is 2.366.

Case 5. $X_{t-1}=1$ and $X_t=1$.

In this case, $\varepsilon_t$, given $X_{t-1}=1$ and $X_t=1$, is a random variable taking values in the set {0,1}. Thus $E_t[\varepsilon_t]>0$. For data set 1* the standardized residual is 0.638.

Case 6. $X_{t-1}=1$ and $X_t=2$.

In this case, $\varepsilon_t$, given $X_{t-1}=1$ and $X_t=2$, is a random variable taking values in the set {1,2}. Thus $E_t[\varepsilon_t]>0$. For data set 1* the standardized residual is 4.043.

For data set 2 we found the coefficients for the following seasonal regressors statistically significant at the 5% level: $\sin(2\pi t/12)$ and $\cos(2\pi t/12)$.
If these regressors are essential to the model then the information matrix test for a simpler model should reject the null hypothesis of non-stochastic parameters. For the model with a constant arrival rate the joint information matrix test statistic is -.537 (P-value 0.59), while the individual information matrix test statistic is 0.417 (P-value 0.68). In both cases we accept the null hypothesis of non-stochastic parameters. That is, the model with a constant arrival rate is sufficient to explain the variation observed in the series. The 170 Chapter 8. Application to counts of workers collecting disability benefits simulation in Section 6.3 showed that the information matrix test had good power against the misspecification considered. However, the power of the test may be lower for other types of misspecification. We therefore need to check the residuals before making any conclusions. The residual plots for this simple model, Figure 8.2.2, look random. Further none of the sample autocorrelation for the residuals are statistically significant at the 5% level. We therefore choose to use the simpler model with a constant arrival rate. In our analysis of data set 3 we found the following seasonal regressors statistically significant at the 5% level: sin(27tf/12) and COS(2TC t/12). The joint information matrix test for this model is 1.008 (0.31 P-value). Figure 8.2.3 shows the residuals plotted in chronological order. At the 5% level, none of the sample autocorrelations are significant. In Figure 8.2.3 the residuals are plotted against the two seasonal regressors. These plots indicate no significant problems. Before we can go ahead and accept this model we need to show that simpler models are not adequate. We consider three simpler models with the following regressors: constant only (model 1), constant plus sin(27i t/12) (model 2) and constant plus cos(27r t/12) (model 3). The joint information matrix tests for these three models are summarized in Table 8.2.4. 
For model 1, the Pearson and continuation residuals have a significant sample autocorrelation at lag 12, while the arrival residuals have significant sample autocorrelations at lags 2 and 12. These correlations in the residuals, along with the low P-value for the joint information matrix test, lead us to reject model 1. The Pearson, continuation and arrival residuals for model 2 have significant sample autocorrelations at lags 2, 12 and 2, respectively. Although the P-value for the joint information matrix test is above 0.05, we reject the model because the residuals are correlated. In the case of model 3 the residuals appear to be uncorrelated; however, we reject the model due to the low P-value of the joint information matrix test. In the case of data set 4 we fit a model with a constant arrival rate. The joint information matrix statistic is 0.222 (P-value 0.82). This leads us to accept the null hypothesis that the parameters are non-stochastic. In the three residual plots, see Figure 8.2.5, the suspected persistent claim is evident (observations 54-64). In the case of the Pearson and arrival residuals this causes a band of residuals close to zero. However, in the case of the continuation residuals this band of residuals is quite far from zero. This causes the lag 1 sample autocorrelation for the continuation residuals to be significant at the 5% level. Otherwise the residuals for this model look good, and we decide not to remove or further investigate the suspected outlier. For data set 5 we also fit a model with a constant arrival rate. The joint information matrix test statistic is -0.514 (P-value 0.61) and therefore we accept the null hypothesis of non-stochastic parameters. The residual plots are found in Figure 8.2.6. None of the sample autocorrelations for the residuals are significant at the 5% level.
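The residual autocorrelation checks used throughout this section compare sample autocorrelations with approximate ±1.96/√n significance bands. A minimal sketch (the function and the toy residual series are ours, for illustration only):

```python
from math import sqrt

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_{max_lag} of a series."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # lag-0 autocovariance
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / (n * c0)
            for k in range(1, max_lag + 1)]

residuals = [(-1) ** t for t in range(120)]   # toy residual series, n = 120
bound = 1.96 / sqrt(len(residuals))           # approximate 5% band
flagged = [k + 1 for k, r in enumerate(sample_acf(residuals, 12)) if abs(r) > bound]
print(flagged)  # every lag is flagged for this strongly dependent toy series
```

For genuinely white-noise residuals, roughly one in twenty lags would be expected to fall outside the band by chance, which is why the text looks at the pattern across lags rather than any single exceedance.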
Recall duration is the number of months that a claimant collects STWLB. In Section 2.2 we showed that the mean duration is (1-α)^{-1}. We further showed how to construct 95% confidence intervals for the mean duration in Section 3.5. The mean durations, Table 8.2.5, for data sets 1*, 2, 3 and 4 are between one and two months. In the case of data set 5 the mean duration is longer, between two and three months.

Test                 CLS     Our Score   Wald    LR      Usual Score
Statistic value      2.456   2.256       6.893   5.037   5.090
5% critical value    1.645   1.645       2.71    2.71    2.71
1% critical value    2.33    2.33        5.41    5.41    5.41

Table 8.2.1 Tests for independence in data set 1*.

Month       Arrival Rate
January     2.353
February    2.415
March       2.737
April       3.310
May         4.060
June        4.783
July        5.177
August      5.043
September   4.450
October     3.680
November    3.000
December    2.547

Table 8.2.2 This table displays the seasonal arrival rate for data set 3.

Data Set   Parameter          Lower 95%   Estimate   Upper 95%
1*         α                  0.007       0.240      0.472
1*         λ                  0.064       0.134      0.204
2          α                  0.344       0.472      0.599
2          λ                  3.898       5.188      6.478
3          α                  0.294       0.406      0.519
3          β1 (constant)      1.039       1.250      1.461
3          β2 (sin(2πt/12))   -0.401      -0.243     -0.051
3          β3 (cos(2πt/12))   -0.483      -0.315     -0.147
4          α                  0.203       0.404      0.604
4          λ                  0.090       0.170      0.251
5          α                  0.539       0.652      0.765
5          λ                  0.209       0.333      0.457

Table 8.2.3 This table summarizes the parameter estimates for data sets 1* to 5; included are the parameter estimates and the upper and lower 95% confidence limits.

Model            1       2       3
Test statistic   2.313   1.729   2.114
P-value          0.02    0.08    0.03

Table 8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3.
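The mean durations reported below follow from the fitted α via duration = 1/(1-α). A sketch of the point estimate together with a plain delta-method interval, assuming (our assumption, not stated in this section) that the Section 3.5 interval is of this standard form:

```python
def mean_duration_ci(alpha_hat, alpha_lo, alpha_hi, z=1.96):
    """Mean duration 1/(1 - alpha) with a delta-method 95% interval.

    The standard error of alpha is recovered from its reported Wald CI.
    """
    d = 1.0 / (1.0 - alpha_hat)
    se_alpha = (alpha_hi - alpha_lo) / (2 * z)     # half-width / z
    se_d = se_alpha / (1.0 - alpha_hat) ** 2       # d/dalpha of 1/(1-alpha)
    return d, (d - z * se_d, d + z * se_d)

# Data set 1*: alpha = 0.240 with 95% CI (0.007, 0.472)
d, (lo, hi) = mean_duration_ci(0.240, 0.007, 0.472)
print(round(d, 3))  # 1.316, matching Table 8.2.5
```

This reproduces the point estimates in Table 8.2.5 exactly; the reported intervals are slightly asymmetric, so Section 3.5 may use a refinement of this plain delta method.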
Data set   Duration   95% confidence interval
1*         1.316      (0.881, 1.717)
2          1.894      (1.397, 2.352)
3          1.684      (1.338, 2.002)
4          1.678      (1.066, 2.242)
5          2.874      (1.864, 3.804)

Table 8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5.

Figure 8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2.

Figure 8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3.

Figure 8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3.

Figure 8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4.

8.3 Arrival process

The Poisson AR(1) model assumes the continuation and arrival processes are not directly observable. For the five data sets described in Section 8.1 we were able to obtain the arrival data from the WCB. To distinguish the two sets of data we will refer to the arrivals as data sets 1A, 2A, 3A, 4A and 5A. In general data of this form may not be readily available, since it requires more detailed record keeping.
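Estimating a constant arrival rate from such arrival data amounts to the Poisson maximum likelihood estimate (the sample mean) with a Wald interval. A sketch, assuming purely for illustration a 120-month series whose mean matches data set 1A:

```python
from math import sqrt

def poisson_rate_ci(counts, z=1.96):
    """ML estimate (sample mean) and Wald 95% CI for a constant Poisson rate."""
    n = len(counts)
    lam = sum(counts) / n
    se = sqrt(lam / n)   # Fisher information for a Poisson mean is n / lam
    return lam, (lam - z * se, lam + z * se)

# Hypothetical monthly arrival counts averaging 0.158 (cf. data set 1A)
counts = [0] * 101 + [1] * 19
lam, (lo, hi) = poisson_rate_ci(counts)
print(round(lam, 3), round(lo, 3))
```

With these assumed inputs the interval comes out close to the (0.087, 0.229) reported for data set 1A in Table 8.3.1; the actual series length is not stated in this excerpt, so the match is only indicative.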
Since these data are available we consider the following two questions: how well were we able to estimate the arrival rates with the Poisson AR(1) model? Are the arrival processes Poisson? We begin by estimating the arrival rates for the arrival data, which we assume to be independent and Poisson. For data sets 1A, 2A, 4A and 5A we assume a constant mean, while for data set 3A we assume a seasonally changing mean. The maximum likelihood estimates and 95% confidence intervals for the parameters are found in Table 8.3.1. For data sets 1*, 4 and 5 the parameter estimates from the Poisson AR(1) model are contained within these 95% confidence limits. The estimated arrival rate for data set 2A, 4.475, is contained in the somewhat wider 95% confidence interval for the same parameter in the Poisson AR(1) model. In the case of data set 3, two of the three parameter estimates from the Poisson AR(1) model are contained within the 95% confidence intervals for the same parameters as calculated from data set 3A. The estimate for β1 in data set 3A is, however, contained in the 95% confidence interval for β1 as calculated from data set 3. Table 8.3.2 displays the estimated seasonal arrival rates for data set 3A. These rates are slightly lower than those estimated by the Poisson AR(1) model. Also note the seasonal pattern has shifted by one month. That is, the lowest and highest months are respectively February and August, whereas for the Poisson AR(1) model they are January and July. Overall, the Poisson AR(1) model appears to do a good job of estimating the arrival rate. Next we check the Poisson assumption. If the Poisson assumption is correct we would expect the mean and variance of each arrival process to be close. This is the case for data sets 1A, 4A and 5A. However, for data set 2A the variance, 7.171, is "much larger" than the mean, 4.475.
Since the arrival rate is non-constant for data set 3A we would expect the variance to be larger than the mean. The Poisson specification can be formally tested by the information matrix test as follows. The Poisson probability function is

p(X_t) = e^{-λ} λ^{X_t} / X_t!.

The first and second derivatives of log[p(X_t)] are

(∂/∂λ) log[p(X_t)] = X_t/λ - 1

and

(∂²/∂λ²) log[p(X_t)] = -X_t/λ².

Therefore the information matrix test statistic is based on

Σ_{t=1}^{n} [(X_t - λ)² - X_t] / λ².

Since the denominator is constant, we consider the following statistic instead:

U_n(λ) = Σ_{t=1}^{n} {(X_t - λ)² - X_t}.

We have that [Var U_n(λ)]^{-1/2} U_n(λ) → N(0,1), since the data are independent and identically distributed with finite moments. Note the test is basically checking whether the mean and variance are the same. The parameter λ is unknown, so we replace it with λ̂ = X̄ (the maximum likelihood estimate). U_n(λ̂) is related to U_n(λ) as follows:

n^{-1/2} U_n(λ̂) = n^{-1/2} Σ_{t=1}^{n} {(X_t - X̄)² - X_t}
               = n^{-1/2} Σ_{t=1}^{n} {(X_t - λ)² - X_t} - n^{1/2} (X̄ - λ)²
               = n^{-1/2} U_n(λ) + o_p(1).

The results of the information matrix test are summarized in Table 8.3.3. In the case of data set 2A, the P-value is small, which suggests the arrivals are unlikely to be Poisson. Finally we check the assumption of independent arrivals. None of the first 12 sample autocorrelations for data sets 1A, 3A and 5A were significant at the 5% level. Note for data set 3A we used the sample autocorrelations of the residuals. For data set 2A lags 1 and 12 had autocorrelations that were significant at the 5% level. This indicates possible seasonality as well as violation of the independence assumption. For data set 4A the sixth lag was significant at the 5% level, indicating possible seasonality. To conclude, the Poisson AR(1) model adequately estimates the arrival parameter.
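The statistic above is essentially a Poisson dispersion test. Under the Poisson null, Var[(X - λ)² - X] = 2λ², which suggests the normalization used in this sketch; the thesis's exact scaling may differ:

```python
from math import sqrt

def poisson_im_test(x):
    """Information matrix (dispersion) test for an i.i.d. Poisson sample.

    Returns z = sum((x_t - xbar)^2 - x_t) / (xbar * sqrt(2n)),
    approximately N(0, 1) under the Poisson null hypothesis.
    """
    n = len(x)
    xbar = sum(x) / n
    u = sum((v - xbar) ** 2 - v for v in x)  # U_n evaluated at the ML estimate
    return u / (xbar * sqrt(2 * n))

# Strongly overdispersed sample: variance 25 versus mean 5
z = poisson_im_test([0] * 50 + [10] * 50)
print(z > 3)  # True: the Poisson null is clearly rejected
```

A large positive value signals overdispersion relative to the Poisson (variance exceeding the mean), the pattern found for data set 2A; a large negative value signals underdispersion.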
All of the data sets except data set 2A appear to be Poisson and independent.

Parameter          Data Set   Estimate   95% C.I.           Data Set   Estimate   95% C.I.
λ                  1A         0.158      (0.087, 0.229)     1*         0.134      (0.064, 0.204)
λ                  2A         4.575      (4.192, 4.958)     2          5.188      (3.898, 6.478)
β1 (constant)      3A         1.121      (1.016, 1.226)     3          1.250      (1.039, 1.461)
β2 (sin(2πt/12))   3A         -0.381     (-0.526, -0.236)   3          -0.243     (-0.401, -0.051)
β3 (cos(2πt/12))   3A         -0.281     (-0.412, -0.137)   3          -0.315     (-0.483, -0.147)
λ                  4A         0.150      (0.081, 0.219)     4          0.170      (0.090, 0.251)
λ                  5A         0.308      (0.209, 0.407)     5          0.333      (0.209, 0.457)

Table 8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model.

Month       Arrival Rate (arrival data)   Arrival Rate (Poisson AR(1))
January     1.987                         2.353
February    1.916                         2.415
March       2.095                         2.737
April       2.538                         3.310
May         3.235                         4.060
June        4.064                         4.783
July        4.735                         5.177
August      4.911                         5.043
September   4.490                         4.450
October     3.706                         3.680
November    2.908                         3.000
December    2.315                         2.547

Table 8.3.2 This table displays the seasonal arrival rate for data set 3A.

Data Set         1A       2A      3A      4A       5A
Test Statistic   -0.543   2.954   1.407   -0.385   0.170
P-value          0.59     0.00    0.16    0.70     0.87

Table 8.3.3 This table summarizes the information matrix test for data sets 1A-5A.

8.4 Forecasting

In this section we calculate forecasts for the first 6 months of 1995, which are found in Tables 8.4.1-8.4.6. Since the models used for data sets 1*, 2, 4 and 5 are simple, that is, the arrival and departure rates are fixed over time, we can apply the forecasting techniques of Chapter 3. For these four data sets we have calculated individual 95% confidence intervals for the k-step ahead conditional distribution, k = 1, 2, ..., 6, ∞.
Note that in each case the 95% confidence intervals for the 6-step ahead conditional distribution are very close to the 95% confidence intervals for the marginal distribution. Therefore if we require forecasts beyond six months into the future we can simply use the marginal distribution. A few of the lower bounds in Tables 8.4.4 and 8.4.5 are negative. This is because the confidence intervals are constructed by applying an asymptotic normal result to a finite sample and are therefore only approximate. However, the confidence intervals that are affected by a negative lower bound are for probabilities that are very small (less than 1%). In Section 7.2 we show how to construct individual confidence intervals for the k-step ahead conditional distribution when the arrival and departure rates depend on regressors. However, to practically apply these results would require a significant amount of programming. Therefore for data set 3 we have only calculated the k-step ahead conditional distribution. A possible alternative for constructing confidence intervals for the k-step ahead conditional distribution is to use a bootstrap method, see Efron (1982), which may be more practical than the method in Section 7.2. The marginal distribution for data set 3 is given in Table 8.4.6. Notice the marginal mean peaks in August, which is one month after the arrival rate peaks, see Table 8.2.2. Also note, there is a fair bit of difference between the 6-step ahead conditional distribution and the marginal distribution for June.
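For the constant-rate model the k-step ahead conditional distribution has a closed form: X_{T+k} given X_T is the convolution of a Binomial(X_T, α^k) survivor count and a Poisson(λ(1 - α^k)/(1 - α)) arrival count. A sketch using the data set 1* estimates (α = 0.240, λ = 0.134) and a last observation of 0:

```python
from math import comb, exp, factorial

def kstep_pmf(x_last, alpha, lam, k, y_max=10):
    """k-step ahead conditional pmf of the Poisson AR(1) model.

    X_{T+k} | X_T = x_last ~ Binomial(x_last, alpha**k) + Poisson(mu_k),
    where mu_k = lam * (1 - alpha**k) / (1 - alpha).
    """
    a_k = alpha ** k
    mu_k = lam * (1 - a_k) / (1 - alpha)
    pmf = []
    for y in range(y_max + 1):
        p = sum(comb(x_last, j) * a_k**j * (1 - a_k)**(x_last - j)
                * exp(-mu_k) * mu_k**(y - j) / factorial(y - j)
                for j in range(min(x_last, y) + 1))
        pmf.append(p)
    return pmf

one_step = kstep_pmf(0, 0.240, 0.134, 1)
print(round(one_step[0], 3), round(one_step[1], 3))  # 0.875 0.117, cf. Section 8.5
```

As k grows, α^k vanishes and the pmf converges to the marginal Poisson(λ/(1 - α)) distribution, which is why the 6-step forecasts above are already close to the marginal ones.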
k          1                2                3                4                5                6                ∞
p_k(0|0)   (0.812, 0.937)   (0.773, 0.920)   (0.762, 0.919)   (0.758, 0.920)   (0.757, 0.920)   (0.756, 0.920)   (0.756, 0.920)
p_k(1|0)   (0.063, 0.171)   (0.079, 0.202)   (0.081, 0.211)   (0.081, 0.214)   (0.080, 0.215)   (0.080, 0.215)   (0.080, 0.215)
p_k(2|0)   (0.000, 0.016)   (0.000, 0.023)   (0.000, 0.025)   (0.000, 0.026)   (0.000, 0.026)   (0.000, 0.026)   (0.000, 0.026)
p_k(3|0)   (0.000, 0.001)   (0.000, 0.002)   (0.000, 0.002)   (0.000, 0.002)   (0.000, 0.002)   (0.000, 0.002)   (0.000, 0.002)
p_k(4|0)   (0.000, 0.000)   (0.000, 0.000)   (0.000, 0.000)   (0.000, 0.000)   (0.000, 0.000)   (0.000, 0.000)   (0.000, 0.000)

Table 8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1*.

Table 8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2.

k           1       2       3       4       5       6
mean        4.383   4.194   4.440   5.113   6.136   7.274
median      4       4       5       5       6       7
mode        4       4       4       5       6       6
p_k(0|5)    0.007   0.007   0.005   0.003   0.001   0.001
p_k(1|5)    0.041   0.039   0.029   0.018   0.010   0.005
p_k(2|5)    0.109   0.105   0.085   0.058   0.034   0.020
p_k(3|5)    0.182   0.177   0.154   0.117   0.078   0.051
p_k(4|5)    0.213   0.211   0.198   0.170   0.130   0.095
p_k(5|5)    0.187   0.189   0.193   0.189   0.167   0.138
p_k(6|5)    0.131   0.134   0.150   0.168   0.173   0.162
p_k(7|5)    0.074   0.078   0.096   0.124   0.150   0.159
p_k(8|5)    0.035   0.038   0.051   0.078   0.110   0.134
p_k(9|5)    0.014   0.016   0.024   0.042   0.071   0.098
p_k(10|5)   0.005   0.006   0.010   0.020   0.040   0.064
p_k(11|5)   0.002   0.002   0.003   0.008   0.020   0.037
p_k(12|5)   0.000   0.001   0.001   0.003   0.009   0.020
p_k(13|5)   0.000   0.000   0.000   0.002   0.006   0.016

Table 8.4.3 The k-step ahead conditional means, medians, modes and point mass forecasts for data set 3.

Table 8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4.
Table 8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5.

Month    Jan.    Feb.    Mar.    Apr.    May     June    July    Aug.    Sept.   Oct.    Nov.    Dec.
k        1       2       3       4       5       6       7       8       9       10      11      12
mean     4.341   4.177   4.433   5.110   6.135   7.274   8.130   8.344   7.838   6.862   5.786   4.896
median   4       4       4       5       6       7       8       8       8       7       6       5
mode     4       4       4       5       6       7       8       8       7       6       5       4
p_k(0)   0.013   0.015   0.012   0.006   0.002   0.001   0.000   0.000   0.000   0.001   0.003   0.007
p_k(1)   0.057   0.064   0.053   0.031   0.013   0.005   0.002   0.002   0.003   0.007   0.018   0.037
p_k(2)   0.123   0.134   0.117   0.079   0.041   0.018   0.010   0.008   0.012   0.025   0.051   0.090
p_k(3)   0.178   0.186   0.172   0.134   0.083   0.044   0.026   0.023   0.032   0.056   0.099   0.146
p_k(4)   0.193   0.195   0.191   0.171   0.128   0.081   0.054   0.048   0.062   0.097   0.143   0.179
p_k(5)   0.167   0.163   0.169   0.175   0.157   0.118   0.087   0.080   0.097   0.133   0.166   0.175
p_k(6)   0.121   0.113   0.125   0.149   0.160   0.143   0.118   0.111   0.127   0.152   0.160   0.143
p_k(7)   0.075   0.068   0.079   0.109   0.141   0.148   0.137   0.133   0.142   0.149   0.132   0.100
p_k(8)   0.041   0.035   0.044   0.070   0.108   0.135   0.139   0.139   0.139   0.128   0.096   0.061
p_k(9)   0.020   0.016   0.022   0.040   0.073   0.109   0.126   0.128   0.121   0.097   0.061   0.033
p_k(10)  0.009   0.007   0.010   0.020   0.045   0.079   0.102   0.107   0.095   0.067   0.036   0.016
p_k(11)  0.003   0.003   0.004   0.009   0.025   0.052   0.076   0.081   0.068   0.042   0.019   0.007
p_k(12)  0.001   0.001   0.001   0.004   0.013   0.032   0.051   0.056   0.044   0.024   0.009   0.003
p_k(13)  0.000   0.000   0.000   0.001   0.006   0.018   0.032   0.036   0.027   0.013   0.004   0.001

Table 8.4.6 The marginal means, medians, modes and distributions for data set 3.

8.5 Gaussian AR(1) models

In this section we fit the Gaussian AR(1) model, see Example 4.1.1 for the model definition, to our five data sets and compare the results to the Poisson AR(1) model. Table 8.5.1 contains the parameter estimates for the Gaussian AR(1) model. Most of the estimates are quite close to the corresponding parameter estimates of the Poisson AR(1) model. The largest difference is in the estimate for α in data set 4, where the Gaussian model estimates 0.291 and the Poisson model estimates 0.404. However, the Gaussian estimate is still well within the 95% confidence interval for α given by the Poisson model. The Gaussian model gives wider 95% confidence intervals for λ than the Poisson model.
In fact for data set 1* the Gaussian 95% confidence interval for λ includes zero. For data sets 2, 3 and 5 the Gaussian models give wider 95% confidence intervals for α than the Poisson model, while for data sets 1* and 4 the Gaussian model gives narrower 95% confidence intervals for α than the Poisson model. Note the parameters β1, β2 and β3 for data set 3 cannot be directly compared to the corresponding parameters in the Poisson model, since in the Gaussian model we have not used an exponential link function. However, the estimated arrival rates for the Gaussian model, Table 8.5.2, can be directly compared to the arrival rates of the Poisson model. While the arrival rates calculated by the Poisson AR(1) model are slightly higher than those calculated directly from the arrival data, the arrival rates calculated by the Gaussian AR(1) model are slightly lower than those calculated directly from the arrival data. The biggest advantage in using the Poisson AR(1) model over the Gaussian AR(1) model is in the area of forecasting. Figure 8.5.1 displays the Poisson AR(1) model forecasts and the Gaussian AR(1) model forecasts for the first six months of 1995 for data set 1*. From the Poisson AR(1) model we have calculated the forecast distribution, which is the bar portion of the chart. The three lines on the chart mark the predicted values and the 95% prediction interval for the Gaussian AR(1) model. The cumulative probabilities are marked on the left Y-axis. For the 1-step ahead distribution there is a 0.875 probability that the count will be zero and a 0.117 probability that the count will be 1, while for the 6-step ahead distribution there is a 0.838 probability of a count of zero and a 0.148 probability of a count of 1. The numbers for the predicted values in the Gaussian AR(1) model are labeled on the right Y-axis.
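Those Gaussian prediction limits are easy to sketch. In the sketch below the innovation variance is set equal to the marginal mean purely for illustration (the thesis's fitted variance will differ somewhat), so the numbers are only indicative of the behavior described next:

```python
from math import sqrt

def gaussian_ar1_interval(x_last, alpha, lam, z=1.96):
    """One-step 95% prediction interval from a Gaussian AR(1) fit."""
    mu = lam / (1 - alpha)                 # marginal mean
    pred = mu + alpha * (x_last - mu)      # conditional mean forecast
    sigma = sqrt(mu)                       # assumed innovation s.d. (illustrative)
    return pred - z * sigma, pred, pred + z * sigma

# Gaussian estimates for data set 1*: alpha = 0.221, lambda = 0.136
lo, pred, hi = gaussian_ar1_interval(0, 0.221, 0.136)
print(lo < 0 < hi)  # True: the interval spills below zero for this low-count series
```

Because the Gaussian interval is symmetric around a small predicted mean, its lower bound is forced below zero whenever the counts are low, which is exactly the defect discussed below.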
The mean predicted value is about 0.2 and the 95% prediction interval lies approximately between -0.6 and 1.0. The lower bound of the 95% prediction interval as given by the Gaussian AR(1) model is also negative for data sets 3, 4 and 5. In data set 2 the counts are sufficiently high that the lower bound is positive. This illustrates the importance of using appropriate distributional assumptions in modeling.

Data Set   Parameter          Gaussian Estimate   Gaussian 95% C.I.    Poisson Estimate   Poisson 95% C.I.
1*         α                  0.221               (0.047, 0.395)       0.240              (0.007, 0.472)
1*         λ                  0.136               (-0.045, 0.317)      0.134              (0.064, 0.204)
2          α                  0.449               (0.289, 0.609)       0.472              (0.344, 0.599)
2          λ                  5.400               (3.823, 6.977)       5.188              (3.898, 6.478)
3          α                  0.489               (0.330, 0.648)       0.406              (0.294, 0.519)
3          β1 (constant)      6.147               (5.192, 7.102)       1.250              (1.039, 1.461)
3          β2 (sin(2πt/12))   -1.683              (-2.794, -0.572)     -0.243             (-0.401, -0.051)
3          β3 (cos(2πt/12))   -1.138              (-2.238, -0.038)     -0.315             (-0.483, -0.147)
4          α                  0.291               (0.120, 0.462)       0.404              (0.203, 0.604)
4          λ                  0.199               (0.014, 0.384)       0.170              (0.090, 0.251)
5          α                  0.591               (0.447, 0.735)       0.652              (0.539, 0.765)
5          λ                  0.376               (0.153, 0.599)       0.333              (0.209, 0.457)

Table 8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1* to 5.

Month       Arrival Rate (Gaussian AR(1))   Arrival Rate (Poisson AR(1))   Arrival Rate (arrival data)
January     1.871                           2.353                          1.987
February    2.008                           2.415                          1.916
March       2.449                           2.737                          2.095
April       3.076                           3.310                          2.538
May         3.720                           4.060                          3.235
June        4.209                           4.783                          4.064
July        4.412                           5.177                          4.735
August      4.274                           5.043                          4.911
September   3.833                           4.450                          4.490
October     3.207                           3.680                          3.706
November    2.563                           3.000                          2.908
December    2.074                           2.547                          2.315

Table 8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model.
Figure 8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model.

8.6 Summary

In this chapter we have illustrated the methods developed in the earlier chapters. The preliminary analysis is identical to the preliminary analysis of continuous-valued time series. This includes time series plots, as well as plots of the autocorrelation and partial autocorrelation functions. This analysis gives us a starting point for model selection. In Chapter 4 we showed that the parameters can be estimated via maximum likelihood estimation and gave expressions for the observed Fisher information. The expected Fisher information is easily evaluated numerically, and can be inverted to get the asymptotic variance of the parameter estimates. This allows us to perform t-tests on the model parameters and is useful in determining which covariates to include in the model. For models with covariates it is more difficult to calculate the expected Fisher information. In this case the observed Fisher information can be used and can easily be calculated by numerically differentiating the score function. If the t-statistic for α is small, we can further test for independence with the tests found in Chapter 5. This was the case for data set 1*, which we concluded to be a dependent series. Model selection is further refined with the help of the information matrix test, see Chapter 6, and the new residuals defined in Section 4.4. The information matrix test checks whether the model is sufficient to explain the variation found in the data. Patterns in the residuals may indicate the need for additional regressors. We found the simple
Poisson AR(1) model without regressors was sufficient to model data sets 1*, 2, 4 and 5. In the case of data set 3 the model required the following two seasonal regressors in the arrival process: sin(2πt/12) and cos(2πt/12). The WCB was able to provide us with the arrival process, which the Poisson AR(1) model assumes to be latent. From these data we were able to directly estimate the arrival parameter. We found these estimates to be close to the estimates found using the aggregate data. The additional data also allowed us to check whether the arrivals were independent and Poisson. Independence was assessed using the sample autocorrelation function. We found that the arrivals for data set 2 appeared to be dependent. Further, we found that the arrivals for data set 4 may be seasonal, which wasn't indicated in the aggregate data. The information matrix test was used to assess the Poisson assumption, which was only rejected in the case of data set 2. We conclude the Poisson assumption appears to be realistic for the WCB data. In Chapter 3 we considered forecasting for the Poisson AR(1) model. If data cohesion is considered important, the k-step ahead conditional median or mode can be used as a forecast. The k-step ahead conditional mean can also be used as a forecast; however, it is not integer valued. For low count series we proposed using the k-step ahead conditional distribution as a forecast. In the case of data sets 1*, 4 and 5 the counts are low enough that the k-step ahead conditional distribution is easy to read. However, for data sets 2 and 3 the k-step ahead conditional distribution is much harder to read, since the non-zero probabilities are spread over more non-negative integers. The analysis in Chapter 8 concluded with a look at the results from fitting a Gaussian AR(1) model to the 5 data sets.
We found that the parameter estimates were similar to the Poisson AR(1) estimates. We know from Section 4.7 that the loss of efficiency in using the Gaussian AR(1) likelihood increases with α, but that the Gaussian estimates are more robust than the Poisson estimates. With the possible exception of data set 2, our analysis has found the Poisson AR(1) adequate. Further, in four of the five series the estimate for α was above 0.400. We therefore favor the Poisson estimates. Finally, using the Gaussian AR(1) prediction intervals is meaningless, since the data are discrete valued. In fact we saw that the Gaussian AR(1) prediction intervals can include negative values, which again is meaningless for non-negative data.

We conclude with some possible future avenues for analyzing the WCB data. Our analysis only considered simple sinusoidal regressors. Better models may be found by considering economic regressors, such as employment rates, weather conditions, sales, etc. The WCB often funds accident prevention programs and is interested in whether or not a program has had an effect on injury rates. Indicator regressors can be used to model the change in the arrival rates before and after the start of an accident prevention program. We can then estimate the change and test whether it is significantly different from zero.

Bibliography

[1] Al-Osh, M.A. and Aly, E.A.A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics A 21, 2483-2492.
[2] Al-Osh, M.A. and Alzaid, A.A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, 261-275.
[3] Al-Osh, M.A. and Alzaid, A.A. (1988). On maximum likelihood estimation for a subcritical branching process with immigration. Pakistan Journal of Statistics 4, 147-156.
[4] Alzaid, A.A. and Al-Osh, M.A. (1988).
First-order integer-valued autoregressive (INAR(1)) process: Distributional and regression properties. Statistica Neerlandica 42, 53-61.
[5] Alzaid, A.A. and Al-Osh, M.A. (1990). An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27, 314-324.
[6] Barndorff-Nielsen, O.E. and Sørensen, M. (1994). A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review 62, 133-165.
[7] Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.
[8] Billingsley, P. (1986). Probability and Measure, 2nd Edition. New York: John Wiley & Sons.
[9] Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. New York: Springer-Verlag.
[10] Brown, B.M. (1971). Martingale Central Limit Theorems. The Annals of Mathematical Statistics 42, 59-66.
[11] Chan, K.S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association 90, 242-252.
[12] Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika 61, 291-298.
[13] Chesher, A. (1983). The Information Matrix Test: Simplified Calculation via a Score Test Interpretation. Economics Letters 13, 45-48.
[14] Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591-605.
[15] Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.
[16] Cramér, H. and Wold, H. (1936). Some theorems on distribution functions. Journal of the London Mathematical Society 11, 290-295.
[17] Crowder, M.J. (1976). Maximum Likelihood Estimation for Dependent Observations. Journal of the Royal Statistical Society, Series B 38, 45-53.
[18] Davidson, J. (1994). Stochastic Limit Theory. New York: Oxford University Press.
[19] Davidson, R. and MacKinnon, J.G. (1993).
Estimation and Inference in Econometrics. New York: Oxford University Press.
[20] Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
[21] Fahrmeir, L. (1988). A Note on Asymptotic Testing Theory for Nonhomogeneous Observations. Stochastic Processes and their Applications 28, 267-273.
[22] Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modeling Based on Generalized Linear Models. New York: Springer-Verlag.
[23] Ferguson, T.S. (1996). A Course in Large Sample Theory. New York: Chapman and Hall.
[24] Godambe, V.P. and Heyde, C.C. (1987). Quasi-likelihood and optimal estimation. International Statistical Review 55, 231-244.
[25] Godfrey, L.G. (1988). Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches. New York: Cambridge University Press.
[26] Hall, P. and Heyde, C.C. (1980). Martingale Limit Theory and its Application. New York: Academic Press.
[27] Hall, W.J. and Mathiason, D.J. (1990). On Large-Sample Estimation and Testing in Parametric Models. International Statistical Review 58, 77-97.
[28] Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.
[29] Harvey, A.C. (1981). Time Series Models. Oxford: Allan.
[30] Harvey, A.C. and Fernandes, C. (1989). Time series models for count or qualitative observations. Journal of Business & Economic Statistics 7, 407-417.
[31] Hillier, F.S. and Lieberman, G.J. (1986). Introduction to Operations Research, 4th edition. Oakland: Holden-Day.
[32] Jacobs, P.A. and Lewis, P.A.W. (1977). A mixed autoregressive-moving average exponential sequence and point process (EARMA 1,1). Advances in Applied Probability 9, 87-104.
[33] Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. I: Correlation and runs properties. Journal of the Royal Statistical Society, Series B 40, 94-105.
[34] Jacobs, P.A. and Lewis, P.A.W.
(1978). Discrete time series generated by mixtures. II. : Asymptotic properties. Journal of the Royal Statistical Society Series B 40, 222228. [35] Jacobs, P.A. and Lewis, P.A.W. (1978). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4, 1936. [36] Jin-Guan, D. and Yuan, L. (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis 12, 129-142. [37] Joe, H . (1997). Multivariate models and dependence concepts. London: Chapman & Hall. [38] Joe, H . (1996). Time series models with univariate margins in the convolutionclosed infinitely divisible class. Journal of Applied Probability 33, 664-677. [39] Jprgensen, B., Lundbye-Christensen, S., Song, X . - K . and Sun, L . (1995). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics , University of British Columbia. [40] Jprgensen, and Song, X . - K . (1998). Stationary time-series models with exponential dispersion model margins. Journal of Applied Probability 35 (to appear). [41] Kalman, R.E. (1987). Regression methods for non-stationary categorical time series: Asymptotic estimation theory. Annals of Statistics. 17, 79-98. [42] Klimko, L . A . and Nelson, P.I. (1978). On conditional least squares estimation for stochastic process. The Annals of Statistics 6, 629-642. [43] Little, J.D.C. (1961). A proof for the queuing formula L = XW. Operations Research 9, 383-387. Bibliography 196 [44] MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Times Series. Monographs on Statistics and Applied Probability 70. London: Chapman and Hall. [45] McCabe, B and Tremanyne, A (1993). Elements of modern asymptotic theory with statistical applications. Manchester: Manchester University Press. [46] McCabe, B . and Leybourne, S. (1996). A General Test. Working Paper. [47] McKenzie, E. (1988). 
Some A R M A models for dependent sequences of Poison counts. Advances in Applied Probability 20, 822-835. [48] McKenzie, E. (1986). Autoregressive moving-average processes with negativebinomial and geometric marginal distributions. Advances in Applied Probability 18, 679-705. [49] McLeish, D.L. (1975). A maximal inequality and dependent strong laws. Annals of Probability 3, 826-836. [50] Pierce, D. A . (1982). The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics. The Annals of Statistics 10, 475-478. [51] Ross, S.M. (1983). Stochastic Processes. New York: John Wiley & Sons. [52] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons. [53] Song, X . - K . (1996). Some statistical models for the multivariate analysis of longitudinal data. Ph.D thesis. Department of Statistics, University of British Columbia. [54] Sprott, D.A. (1983). Estimating the parameters of a convolution by maximum likelihood. Journal of the American Statistical Association 78, 457-460. [55] Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press. [56] White H . (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25. [57] White H . (1984). Asymptotic Theory for Econometricans. New York: Academic Press. [58] White, H . And Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica 52, 143-161. [59] Wooldridge, J.M. (1991a). On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Bibliography 197 Econometrics 47, 5-46. [60] Wooldridge, J.M. (1991b). Specification testing and quasi-maximum-likelihood estimation. Journal of Econometrics 48, 29-55. [61] Zeger, S.L. (1995). A regression model for time series of counts. Biometrika 75, 621-629. [62] Zeger, S.L., Liang, K . - Y . and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. 
Biometrika 72, 31-38.

Appendix

The following is a list of the data sets used in this thesis. Data set 0 refers to the illustrative data set introduced in Section 2.5, and unlike the other series starts at January 1987.

[Tables of monthly claims counts, January 1985 through December 1994, for data series labelled 0, 1, 1*, 2, 3, 4, 5 and 1A-5A. The multi-column layout of the tables is garbled in this copy, so the counts are not reproduced here; only the row labels (months) and column labels (series) are recoverable.]
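The series tabulated in this appendix are the kind of low monthly counts the thesis models with the Poisson AR(1) process, which is built from binomial thinning: X_t = α∘X_(t-1) + ε_t with ε_t ~ Poisson(λ), so the stationary marginal distribution is Poisson(λ/(1-α)). As a hedged illustration only (this sketch is not code from the thesis; the function name, seed and parameter values are invented), a series of this form can be simulated as follows:

```python
import math
import random

def simulate_inar1(alpha, lam, n, seed=42):
    """Simulate a Poisson AR(1) / INAR(1) series via binomial thinning:
    X_t = alpha o X_{t-1} + e_t, where e_t ~ Poisson(lam).
    The stationary marginal distribution is Poisson(lam / (1 - alpha))."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's multiplication method; adequate for the small means here.
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    # Start from the stationary marginal so the whole series is stationary.
    x = poisson(lam / (1.0 - alpha))
    series = []
    for _ in range(n):
        # Binomial thinning: each of the x current "claimants" survives
        # to the next month independently with probability alpha.
        survivors = sum(rng.random() < alpha for _ in range(x))
        x = survivors + poisson(lam)  # survivors plus new Poisson arrivals
        series.append(x)
    return series

# Ten years of hypothetical monthly counts (parameter values illustrative).
counts = simulate_inar1(alpha=0.5, lam=3.0, n=120)
```

In the queueing reading used in the thesis, the thinned term counts claimants still collecting benefits a month later and the Poisson innovation counts newly injured workers, which is why the model behaves as an infinite server queue.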
Item Metadata
Title | Statistical analysis of discrete time series with application to the analysis of workers' compensation claims data
Creator | Freeland, R. Keith
Date Issued | 1998
Extent | 14121540 bytes
Subject | Time-series analysis; Prediction theory
Genre | Thesis/Dissertation
Type | Text
File Format | application/pdf
Language | eng
Date Available | 2009-05-29
Provider | Vancouver : University of British Columbia Library
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI | 10.14288/1.0088709
URI | http://hdl.handle.net/2429/8500
Degree | Doctor of Philosophy - PhD
Program | Business Administration
Affiliation | Business, Sauder School of
Degree Grantor | University of British Columbia
Graduation Date | 1998-05
Campus | UBCV
Scholarly Level | Graduate
Aggregated Source Repository | DSpace