STATISTICAL ANALYSIS WITH THE STATE SPACE MODEL

By

Singfat Chu-Chun-Lin

B.Sc. (Hons.) Queen's University at Kingston

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming
to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1991
© Singfat Chu-Chun-Lin, 1991

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Commerce and Business Administration
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada
V6T 1W5

Date:

Abstract

The State Space Model (SSM) encompasses the class of multivariate linear models, in particular, regression models with fixed, time-varying and random parameters, time series models, unobserved components models and combinations thereof. The well-known Kalman Filter (KF) provides a unifying tool for conducting statistical inferences with the SSM.

A major practical problem with the KF concerns its initialization when either the initial state or the regression parameter (or both) in the SSM are diffuse. In these situations, it is common practice either to apply the KF to a transformation of the data which is functionally independent of the diffuse parameters or else to initialize the KF with an arbitrarily large error covariance matrix. However, neither approach is entirely satisfactory. The data transformation required in the first approach can be computationally tedious and furthermore it may not preserve the state space structure.
The second approach is theoretically and numerically unsound. Recently, however, De Jong (1991) has developed an extension of the KF, called the Diffuse Kalman Filter (DKF), to handle these diffuse situations. The DKF does not require any data transformation.

The thesis contributes further to the theoretical and computational aspects of conducting statistical inferences using the DKF. First, we demonstrate the appropriate initialization of the DKF for the important class of time-invariant SSM's. This result is useful for maximum likelihood statistical inference with the SSM. Second, we derive and compare alternative pseudo-likelihoods for the diffuse SSM. We uncover some interesting characteristics of the DKF and the diffuse likelihood with the class of ARMA models. Third, we propose an efficient implementation of the DKF, labelled the collapsed DKF (CDKF). The latter is derived upon sweeping out some columns of the pertinent matrices in the DKF after an initial number of iterations. The CDKF coincides with the KF in the absence of regression effects in the SSM. We demonstrate that in general the CDKF is superior in practicality and performance to alternative algorithms proposed in the literature. Fourth, we consider maximum likelihood estimation in the SSM using an EM (Expectation-Maximization) approach. Through a judicious choice of the complete data, we develop a CDKF-EM algorithm which does not require the evaluation of lag one state error covariance matrices for the most common estimation exercise required for the SSM, namely the estimation of the covariance matrices of the disturbances in the SSM. Last, we explore the topic of diagnostic testing in the SSM.
We discuss and illustrate the recursive generation of residuals and the usefulness of the latter in pinpointing likely outliers and points of structural change.

Table of Contents

Abstract
List of Tables
List of Figures
Summary of Notation
List of Abbreviations
Acknowledgement
1 Introduction
  1.1 Preliminaries
2 The State Space Model
  2.1 Defining the SSM
  2.2 Characteristics of the SSM
  2.3 Specializations of the SSM
  2.4 The Statistics of the SSM
    2.4.1 Filtering
    2.4.2 Forecasting
    2.4.3 Likelihood Evaluation
    2.4.4 Smoothing
    2.4.5 Information Filter
    2.4.6 Computational Aspects
  2.5 Summary
3 Time invariance and Stationarity in the State Space Model
  3.1 Preliminaries
  3.2 Automatic Initialization of the Kalman Filter
  3.3 Summary
  3.4 Appendix
4 The Diffuse State Space Model
  4.1 Anchoring the Diffuse SSM
  4.2 Pseudo-Likelihoods for the Diffuse SSM
    4.2.1 The Diffuse Likelihood
    4.2.2 Connection between the Diffuse and the Marginal Likelihoods
  4.3 Statistical Inference with the Diffuse SSM
    4.3.1 Filtering and Likelihood Evaluation with the DKF
    4.3.2 Square Root DKF
    4.3.3 Automatic Initialization of the DKF
    4.3.4 Pitfall of employing the "big k" method
  4.4 Characteristics of the DKF with ARMA Models
    4.4.1 Autoregressive Processes
    4.4.2 Mixed ARMA Processes
  4.5 Summary
5 Efficient Algorithms for the State Space Model
  5.1 The Canonical Form of the Diffuse State Space Model
  5.2 Switching from the DKF to the KF
  5.3 The Collapsed DKF
  5.4 Efficiency of Collapsing Strategies
    5.4.1 Numerical illustration
  5.5 Summary
6 Maximum Likelihood Estimation in the State Space Model
  6.1 The EM approach
    6.1.1 The general DKF-EM algorithm
    6.1.2 Illustrations
  6.2 Estimation of covariance matrices in the SSM
    6.2.1 A new CDKF-EM algorithm
    6.2.2 Estimation of Structural Models
  6.3 Summary
  6.4 Appendix
7 Residual Analysis in the State Space Model
  7.1 Connection with the Literature
  7.2 Backward Orthogonalization of Predicted Residuals
  7.3 Illustrations
  7.4 Summary
8 Epilogue
Appendix: CAPM Dataset
Bibliography

List of Tables

5.1 Dimensionalities in filtering algorithms
5.2 Dimensionalities in smoothing algorithms
5.3 Run times (seconds) for state prediction, regression parameter estimation and likelihood evaluation
5.4 Run times (seconds) for smoothing
6.1 Estimation results with Johnson & Johnson data
6.2 Summaries for financial data
6.3 Estimates with financial data
6.4 Estimation results for airline departures data (I)
6.5 Estimation results for airline departures data (II)
6.6 Estimation results for tobacco products sales data
6.7 Estimation results for UK weddings data

List of Figures

4.1 $\lambda_D$ evaluated with DKF and "big k" methods
7.1 Tobacco sales data
7.2 Diagnostics with tobacco sales data
7.3 UK weddings data
7.4 Diagnostics with UK weddings data

Summary of Notation

$v$  column vector $v$
$v^{\#}$  number
of components in vector $v$
$1$  column vector with all entries equal to 1
$M$  matrix $M$
$0$  matrix with all entries equal to 0
$I_p$  $p \times p$ identity matrix
$M(i,j)$  $(i,j)$ entry of matrix $M$
$M'$  transpose of $M$
$M^*$  conjugate transpose of $M$
$M^-$  Moore-Penrose generalized inverse of $M$
$M^{-1}$  equals $J'(JMJ')^{-1}J$ with rank$(JMJ')$ = rank$(M)$ if $M$ is symmetric and singular
$(M, N)$  matrix with column blocks $M$ and $N$
$(M; N)$  matrix with row blocks $M$ and $N$
$|M|$  determinant of $M$
tr$(M)$  trace of $M$
vec$(M)$  stack of columns of $M$
$M \otimes N$  Kronecker product of $M$ and $N$
$M^{1/2}$  a square root of positive semidefinite matrix $M$
diag$(M, N)$  a block diagonal matrix with diagonal matrix blocks $M$ and $N$
E$(x)$  unconditional mean of random variable $x$
Cov$(x)$  unconditional covariance matrix of random variable $x$
$x \sim (\mu, V)$  random variable $x$ with mean $\mu$ and covariance matrix $V$
$\hat{x}$  filtered estimate of random variable $x$
$\tilde{x}$  smoothed estimate of random variable $x$
$y|x$  random variable $y$ conditional on $x$
Pred$(y|x)$  random variable $a + bx$ where $a$ and $b$ minimize the diagonal elements of Cov$(y - a - bx)$
Mse$(y|x)$  equals Cov$\{y - \mathrm{Pred}(y|x)\}$
$\lambda(y)$  $-2 \times$ log-likelihood of data $y$ apart from constants; subscripted variants of $\lambda(y)$ denote $-2 \times$ the diffuse, marginal and concentrated log-likelihoods of data $y$ apart from constants
$i$  positive (complex) square root of $-1$
$t$  (integer) time index
$n$  number of observations in dataset

List of Abbreviations

SSM  State Space Model (as defined in thesis)
ASSM  Augmented State Space Model (as employed in literature)
KF  Kalman Filter
DKF  Diffuse Kalman Filter
IF  Information Filter
AR  autoregressive
MA  moving-average
ARMA  autoregressive moving-average
ARIMA  integrated autoregressive moving-average
mmse  minimum mean square error
mle  maximum likelihood estimator
log  natural logarithm
gls  generalized least-squares
iff  if and only if

Acknowledgement

I wish to express my deep gratitude to my supervisor, Professor Piet de Jong, for his boundless inspiration, guidance, patience, support and advice throughout the course of this
work.

Professor Martin L. Puterman provided much encouragement and advice throughout my graduate career. I thank him for making my initial stay at UBC possible.

Professor Jian Liu made valuable comments on this work.

I gratefully acknowledge the financial support of the Faculty of Commerce and Business Administration both for the Dean E.D. MacPhee Fellowship and for the numerous lectureship appointments. Partial support was also provided by the Natural Sciences and Engineering Research Council under grant 5-88-5-77.

I commend Colleen Colclough and Ruby Visser for their gracious and expert handling of my dealings with the University administration.

I wish to recognise Miss Celine Anime and Mr Chung Chai Tsang, both former teachers of mine, who were inspiring role models during my younger days in Mauritius.

Thanks, Paul, Melisa, Yuppy and Junior Ho for the happy memories amid the nicest living environment.

To Pa (late), Ma, Vivi and Fifi,
This is yours too.

Chapter 1
Introduction

This thesis deals with the statistical and computational aspects of prediction, model fitting and diagnostic testing in the State Space Model (SSM), a model which has become increasingly prominent in the time series literature during the last two decades.

The SSM originates from systems science and owes much of its theoretical basis to the seminal contributions of Kalman (1960) and Kalman and Bucy (1961). Duncan and Horn (1972) introduced the SSM to the statistical community from the standpoint of a random parameter regression model and connected its theory with the fixed parameter regression theory. Harrison and Stevens (1976) refer to the SSM as the Dynamic Linear Model in their work on Bayesian forecasting.

The SSM describes an observation process in terms of an underlying unobserved time series known as the state.
An example of the model is the time series model where the observation (at time $t$) is specified as the sum of fixed regression effects and the state (at time $t$), with the latter having components which are interpreted as the unobserved trend and seasonalities. A simple stochastic model for the state may, for example, stipulate that its trend component follows a random walk model while its seasonal components sum to a white noise process over the span of a year. Observe that this time series model consists of both fixed and dynamic random effects. The SSM that will be defined in the next Chapter is a generalization of this example.

The SSM has carved a niche in engineering and (more recently) in statistical and socio-economic applications. The flagship application is perhaps the NASA space program, which uses the SSM to monitor the progress of its spacecraft. In the field of econometrics, the SSM has been employed in the estimation of unobserved wage rates (Watson and Engle, 1983), the estimation of historical unobserved trend and cycle components of the British industrial production index (Crafts et al., 1989) and in the seasonal adjustment of census data (Burridge and Wallis, 1984). Business applications include inventory control (Downing et al., 1980), short-term forecasting (Mehra, 1979) and statistical quality control (Phadke, 1981). In the area of policy, Harvey and Durbin (1986) report an interesting study of the impact of seat belt legislation on road casualties. Recently, in a series of contributions, Harvey (1984, 1989) expounds on the merits, in particular the ease of interpretation, of a class of SSM's called the structural models over another class of SSM's, namely the ARMA time series models which have been popularized by Box and Jenkins (1970). Other uses of the SSM in the statistical arena deal with cross-validation (De Jong, 1988b) and spline smoothing (Kohn and Ansley, 1987a).
An interesting application of the SSM deals with the prediction of outcomes of National Football League (NFL) games (Sallas and Harville, 1981, 1989).

The SSM is formally defined in Chapter 2. The definition is nonstandard in that fixed and random effects are treated separately. The conceptual and computational advantages derived from this definition will be displayed throughout the thesis. We demonstrate that many practical statistical models are in fact special instances of the SSM. Consequently they can all be treated in a unified fashion upon casting them as SSM's. The major part of the Chapter is devoted to a summary of the technology associated with the SSM. In this respect we cover the statistical and computational aspects of recursive filtering (i.e. the Kalman Filter), smoothing and likelihood evaluation. The concepts discussed therein are central to the contributions presented later in the thesis.

The KF needs to be properly initialized to allow its use for maximum-likelihood-based inference in the SSM. Chapter 3 addresses this issue for an important class of SSM's. The Chapter first defines the concepts of time invariance and stationarity in the SSM context. Thereafter, assuming that a time invariant SSM has applied since time immemorial, we derive closed-form expressions for the unconditional mean and covariance matrix of the states. The results hold for both stationary and nonstationary SSM's and are useful for initializing both the KF and the Diffuse Kalman Filter (De Jong, 1991b) when they are applied to these time invariant models.

In Chapter 4, we turn to the problems of the initialization of the KF and the definition of an appropriate likelihood for the general SSM, in particular the diffuse SSM. The latter arises when there is uncertainty about the initial state or the regression parameter in the SSM. These diffuse situations can be handled in a unified fashion by including a diffuse random vector (i.e.
a random vector with an arbitrarily large covariance matrix) in the SSM. However, the effect of this diffuse random vector needs to be factored out prior to any statistical inference. This leads us to the study of the diffuse and marginal likelihoods, which are both suitable pseudo-likelihoods for the diffuse SSM. We establish the exact relationship between these two pseudo-likelihoods. De Jong (1991b) has developed an extension of the KF, called the Diffuse Kalman Filter (DKF), to handle recursive filtering, smoothing, likelihood evaluation and gls estimation of regression effects in the diffuse SSM. Using basic arguments, we demonstrate why the DKF is a natural extension of the KF. The Chapter concludes with the presentation of two interesting characteristics of the DKF with the class of nonstationary ARMA models. These models are often used in socio-economic applications. We demonstrate that when the DKF is applied to nonstationary autoregressive models, it reduces de facto to the KF after an initial number of iterations. This "collapse" of the DKF sets the motivation for the work of the following Chapter, where we implement a collapsed form of the DKF which, for arbitrary SSM's, is generally not equivalent to the KF. With nonstationary mixed ARMA $(p,q)$ processes, we demonstrate that it is critical from a computational standpoint to restrict ourselves to the invertible parametrization of these models.

The implementation aspects of the DKF are discussed in Chapter 5. The starting point is the redefinition of the diffuse SSM in such a way that the diffuse parameter vector is partitioned as $\gamma = (\gamma_1; \gamma_2)$ with $\gamma_1$ and $\gamma_2$ being solely associated with, respectively, the initial state and the unknown regression parameter. A proper estimate of $\gamma_1$ is obtainable (save for collinearity problems) after an initial number of DKF iterations and this in turn can be used to provide limiting estimates of the subsequent states.
From a computational standpoint, this suggests collapsing the DKF, specifically factoring out those columns of various matrices in the DKF which are associated with $\gamma_1$. The resulting filter, labelled the collapsed DKF (CDKF), coincides with the KF in the absence of a regression effect in the SSM. The smoothing algorithm associated with the DKF can also be collapsed in a fashion analogous to the CDKF. We provide the details of the intricate adjustments required by this smoothing algorithm when it has to be switched back to the smoothing algorithm associated with the DKF in the pre-collapse time period. In the final section, we compare the CDKF and its associated smoothing algorithm with alternative algorithms discussed in the literature. We conclude that the use of the CDKF (and its associated smoothing algorithm) can lead to appreciable computational savings since it employs recursions of state error covariance matrices of lower dimensionalities than its competitors.

Maximum likelihood estimation of parameters in the SSM is covered in Chapter 6. It is well-known that maximum likelihood estimators possess such desirable properties as asymptotic consistency, efficiency, unbiasedness and normality. The estimation method, labelled the CDKF-EM method, embeds the CDKF within the EM algorithm, a popular derivative-free likelihood optimization algorithm. We generalize and unify previous works in the literature. We also propose a novel CDKF-EM algorithm specifically designed for the estimation of the error covariance matrices in the SSM. This new algorithm is simpler and computationally more efficient than the general algorithm discussed in the first part of the Chapter since it does not require the evaluation of lag one state error covariance matrices. We illustrate the CDKF-EM algorithm using examples borrowed from the literature.
Of interest is the fact that in some cases, the new CDKF-EM algorithm generates solutions with higher log-likelihoods than previously reported.

In Chapter 7, we explore the topic of diagnostic testing in the SSM. The KF generates a sequence of uncorrelated residuals known as the innovations. The latter have proved useful in tests of goodness-of-fit (see Harvey (1989), pp. 256-260) but they often fail to distinguish between outliers and structural breaks in the SSM. In that regard, it is worthwhile to study alternative residuals. The SSM defined in this thesis employs a single disturbance vector $u_t$ with specific components of the latter applying to the observation and state equations (at time $t$). The Chapter focusses on the study of $v_t$, which is defined as the predictor of $u_t$ conditional on the whole observation set. We demonstrate that the $v_t$'s are more useful than the innovations in the detection of outliers and structural breaks in the SSM. The $v_t$'s are serially correlated and we therefore consider the idea of orthogonalizing them (in a backward direction). We conclude that these backward orthogonalized versions of $v_t$ merely correspond to the innovations. This tells us that no advantage is derived from using orthogonalized versions of the $v_t$'s in lieu of the innovations in statistical tests of goodness-of-fit in the SSM.

1.1 Preliminaries

For clarity and completeness, we now define the notation employed in this thesis. Matrices are denoted by capital roman or calligraphic characters (e.g. $M$, $\mathcal{M}$) and vectors by ordinary characters (e.g. $v$). A matrix with all entries equal to zero is written as $0$, the identity matrix is denoted by $I$ and a vector with all entries equal to one is denoted by $1$. For (appropriate) matrix $M$, the determinant, the Moore-Penrose generalized inverse, the transpose, the conjugate transpose and a Choleski root are respectively denoted by $|M|$, $M^-$, $M'$, $M^*$ and $M^{1/2}$.
For (appropriate) matrices $M$ and $N$, $M \otimes N$, $(M, N)$ and $(M; N)$ respectively stand for the Kronecker product of $M$ and $N$, the matrix with column blocks $M$ and $N$ and the matrix with row blocks $M$ and $N$.

Time series observations are denoted by $y_t$, $t = 1, \ldots, n$, with $t$ the time index and $n$ the number of observations in the dataset. We will often use the shorthand notation $y$ to denote the stack of observations $(y_1; \ldots; y_n)$. For a statistical model with parameter vector $\theta$ and under the assumption of normally distributed disturbances, $-2$ times the log-likelihood of $y$, apart from constant terms which do not depend on $\theta$, is denoted by $\lambda(y; \theta)$ or more compactly by $\lambda(y)$ when the role of $\theta$ is unambiguous.

For random variables $x$ and $y$, $x \sim (\mu, V)$ is shorthand for saying that $x$ has mean E$(x) = \mu$ and covariance matrix Cov$(x) = V$, whereas Pred$(x|y)$ denotes the inhomogeneous linear combination of the components of $y$ which minimizes the diagonal elements of Cov$\{x - \mathrm{Pred}(x|y)\} \equiv$ Mse$(x|y)$. In this thesis, we often consider the prediction of a random vector $x_t$ conditional on observation vectors $(y_1; \ldots; y_{t-1})$ and $(y_1; \ldots; y_n)$. We use the shorthand notation Mse$(\hat{x}_t)$ and Mse$(\tilde{x}_t)$ to respectively denote Mse$(x_t|y_1; \ldots; y_{t-1})$ and Mse$(x_t|y_1; \ldots; y_n)$.

Chapter 2
The State Space Model

This Chapter reviews the state of the art in state space technology and in the process it introduces the fundamental concepts behind the contributions presented later in the thesis. The development of the theory associated with these fundamental concepts can be found in the lucid textbooks of Anderson and Moore (1979) and Harvey (1981, 1989).

The programme of this Chapter is as follows. The SSM is formally defined in Section 1. This definition of the SSM differs from that commonly employed in the SSM literature. We spell out the analytic and computational advantages arising from such a definition of the SSM. The next section describes some attractive characteristics of the SSM.
In Section 3, we demonstrate that familiar statistical models in the linear models literature are in fact special cases of the SSM. Section 4 deals with the prediction aspects associated with the SSM, namely filtering and smoothing. These operations are conducted via a pair of recursive algorithms known as the Kalman Filter (KF) and the Smoothing algorithm. Direct implementation of these algorithms can be numerically unsafe, specifically with regard to maintaining the positive semi-definiteness of covariance matrices. An attractive solution is to employ the square root forms of the KF and the smoothing algorithm. These propagate covariance matrices in terms of their Choleski square roots. We describe a computationally efficient version of these square root algorithms. We also briefly discuss a variant of the KF which is known as the Information Filter.

2.1 Defining the SSM

The SSM stipulates the generation of an observation process, $y = (y_1; y_2; \ldots; y_n)$. In this thesis, the SSM is defined according to the following pair of equations

$$y_t = X_t\beta + Z_t\alpha_t + G_t u_t, \quad t = 1, 2, \ldots, n \tag{2.1}$$

$$\alpha_{t+1} = W_t\beta + T_t\alpha_t + H_t u_t, \quad t = 0, 1, \ldots, n \tag{2.2}$$

The first equation is called the observation (or measurement) equation while the second equation is known as the state (or transition or system) equation. The latter equation specifies the dynamics of the unobserved random vector $\alpha_t$, which is known as the state (or system vector) at time $t$. The term $\beta$ represents a regression parameter, while the $u_t$'s are serially uncorrelated disturbances with mean $0$ and covariance matrix $\sigma^2 I$. The SSM is anchored with $\alpha_0 = 0$, thereby implying an initial state $\alpha_1 = W_0\beta + H_0 u_0$. The system matrices $Z_t$, $G_t$, $T_t$ and $H_t$ as well as the regression matrices $X_t$ and $W_t$ are all assumed known.

Our definition of the SSM differs from the one commonly employed in the literature. We label the latter definition of the SSM the augmented SSM (ASSM) on account of the fact that the state is augmented to accommodate the regression parameter $\beta$.
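As an illustration of how equations (2.1)-(2.2) generate data, the following is a minimal simulation sketch, not code from the thesis. It assumes time-invariant system and regression matrices and Gaussian disturbances (the model itself only requires mean $0$ and covariance $\sigma^2 I$); the function names `matvec`, `vadd` and `simulate_ssm` are hypothetical.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def vadd(*vs):
    """Add vectors of equal length componentwise."""
    return [sum(parts) for parts in zip(*vs)]

def simulate_ssm(n, beta, X, Z, G, W, T, H, sigma=1.0, seed=0):
    """Simulate y_1..y_n and alpha_1..alpha_n from (2.1)-(2.2).

    Assumptions of this sketch: the matrices do not vary with t and
    u_t is Gaussian with covariance sigma^2 * I.
    """
    rng = random.Random(seed)
    k = len(G[0])                 # dimension of the disturbance u_t
    alpha = [0.0] * len(T)        # anchoring: alpha_0 = 0
    ys, states = [], []
    for t in range(n + 1):        # the state equation runs over t = 0, 1, ..., n
        u = [rng.gauss(0.0, sigma) for _ in range(k)]
        if t >= 1:                # observations exist for t = 1, ..., n
            ys.append(vadd(matvec(X, beta), matvec(Z, alpha), matvec(G, u)))
            states.append(alpha)
        alpha = vadd(matvec(W, beta), matvec(T, alpha), matvec(H, u))
    return ys, states

# Scalar example: a random walk state observed with noise, plus a fixed level beta.
# The first component of u_t drives the observation, the second drives the state.
ys, states = simulate_ssm(
    n=5, beta=[2.0],
    X=[[1.0]], Z=[[1.0]], G=[[1.0, 0.0]],
    W=[[0.0]], T=[[1.0]], H=[[0.0, 0.5]],
)
```

Note how a single disturbance vector $u_t$ enters both equations, with $G$ and $H$ selecting and scaling the components that apply to each.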
The specification of the ASSM which is equivalent to equations (2.1)-(2.2) is,

$$y_t = (Z_t, X_t)\begin{pmatrix}\alpha_t\\ \beta\end{pmatrix} + G_t u_t, \tag{2.3}$$

$$\begin{pmatrix}\alpha_{t+1}\\ \beta\end{pmatrix} = \begin{pmatrix}T_t & W_t\\ 0 & I\end{pmatrix}\begin{pmatrix}\alpha_t\\ \beta\end{pmatrix} + \begin{pmatrix}H_t\\ 0\end{pmatrix}u_t, \quad t = 0, 1, \ldots, n \tag{2.4}$$

The above specification of the ASSM points out that the standard approach in the SSM literature has been to deal with regression effects implicitly rather than explicitly.

We now spell out the advantages associated with using our definition of the SSM as opposed to the ASSM.

1. Statistical. Regression effects ($X_t\beta$ and $W_t\beta$) are explicitly introduced in the model. These effects form an integral part of any statistical model and should be handled explicitly and not ignored or removed in an ad hoc fashion from the observations. Furthermore this feature is useful (i) conceptually, to separate fixed regression effects from the purely random effects induced by the states, (ii) theoretically, to introduce diffuse parameters in the SSM and (iii) empirically, to describe, for instance, outliers and model shifts in the SSM.

2. Computational. It will be shortly demonstrated that the performance of filtering and smoothing algorithms depends on the size of the state. It is therefore beneficial to keep the dimension of the state in the SSM to a minimum. Furthermore we will argue later in the Chapter that smoothing algorithms based on the ASSM are inefficient since the smoothed estimate of $\beta$ corresponds to its final estimate in the filtering cycle and is therefore not effectively updated during the smoothing cycle.

3. Analytic. Using the same $\beta$ and $u_t$ in the observation and state equations is not restrictive since through appropriate choices of $X_t$, $W_t$, $G_t$ and $H_t$, different components of $\beta$ and $u_t$ can be brought into either equation.
This parsimony of notation contrasts with the situation in the ASSM, which in general employs distinct regression parameters and disturbance vectors in the measurement and state equations.

The technical material introduced in this Chapter assumes that the regression parameter $\beta$ is either fixed and known or random with known covariance matrix. For the latter case (a rare occurrence in practice), it is necessary to employ the ASSM specification wherein $\beta$ is included in the state. The more common cases of fixed but unknown $\beta$ and random but diffuse $\beta$ require special treatment and will be covered in Chapter 4.

2.2 Characteristics of the SSM

We now list some general characteristics of the SSM. These establish the usefulness of the model, especially when viewed in the context of its specializations, which are described in the next section.

1. Dimensionality. The dimensions of the system matrices are arbitrary at each time point except for conformability constraints. In particular, the SSM covers both univariate and vector observations in a unified framework.

2. State. Often the state has a physical meaning. For example, the progress of a spacecraft can be monitored using a SSM with a state whose components describe the velocity, acceleration, coordinates and rate of fuel consumption of the spacecraft. In the structural model introduced in the next section, the components of the state describe economic constructs such as trend and seasonalities. However, in many situations, the state can only be given an abstract interpretation.

Observe that the current state embodies all the information up to the present. Therefore "knowledge" about the state makes the storage of past observations redundant. The latter aspect is a major feature of the filtering and smoothing algorithms described later in the Chapter.

3. Dynamics. In many natural or scientific phenomena, the evolution mechanism of the state is known to vary with time.
The SSM provides a simple and elegant framework for capturing such knowledge via time-varying system matrices and regression matrices.

4. Markovian Nature. The fact that the state equation is Markovian is not restrictive. When the dynamics of the state involve multiple lags, a suitable augmentation of the state keeps the Markovian feature intact. As an illustration, suppose $\alpha_{t+1} = W_t\beta + T_t\alpha_t + S_t\alpha_{t-1} + H_t u_t$. In this case, an appropriate state equation is,

$$\begin{pmatrix}\alpha_{t+1}\\ \alpha_t\end{pmatrix} = \begin{pmatrix}W_t\\ 0\end{pmatrix}\beta + \begin{pmatrix}T_t & S_t\\ I & 0\end{pmatrix}\begin{pmatrix}\alpha_t\\ \alpha_{t-1}\end{pmatrix} + \begin{pmatrix}H_t\\ 0\end{pmatrix}u_t$$

Kalman (1960) exploited the Markovian nature of the state equation to design the famous recursive filter named after him, namely the Kalman Filter.

5. Non-Uniqueness. The SSM specification for a particular process is not unique. For example, if $U$ is any orthogonal matrix, then the SSM defined in (2.1)-(2.2) is equivalent to another SSM where $Z_t$ is postmultiplied by $U$ and $W_{t-1}$, $T_{t-1}$ and $H_{t-1}$ are premultiplied by $U'$. Furthermore it is possible for "equivalent" SSM's to have states and hence system matrices of different dimensions (equations (2.1)-(2.2) and (2.3)-(2.4) for example). Obviously from a computational standpoint, SSM's with system matrices of minimum dimensions are preferred.

6. Missing Data. Missing observations do not require any special handling. Stoffer (1981) has demonstrated a "zeroing-out" strategy whereby missing components of $y_t$ as well as the corresponding rows of $X_t$, $Z_t$ and $G_t$ are replaced by zeroes and the "revised" data are then processed in the usual manner except for a minor adjustment in the Kalman Filter.

7. Linearity. Repeated backsubstitutions of the state equation in the observation equation lead to $y = X\beta + Gu$, where $y$ and $u$ are respectively the stacks of the observations $y_t$ and disturbances $u_t$, and $X$ and $G$ are built up from $X_t$, $Z_t$, $W_t$, $T_t$ and $H_t$. This implies that the SSM is a linear model if $X$ and $G$ do not depend on the data or the error terms.
As such, the well-known linear model theory applies to the SSM.

8. Data Irregularities. The regression matrices $X_t$ and $W_t$ are useful for intervention purposes in the presence of outliers and structural changes in the SSM.

2.3 Specializations of the SSM

It is now shown that the SSM straddles a wide class of popular multivariate linear models, in particular regression models with fixed, time-varying or random coefficients, time series models and unobserved components models.

Regression Model

Consider the SSM with scalar observations $y_t$ (generalization for vector observations is immediate) of the form,

$$y_t = X_t\beta + \alpha_t, \qquad \alpha_{t+1} = a\,\alpha_t + u_t$$

Through appropriate specifications of $X_t$ and $a$, we immediately recognise the following familiar statistical models:

• White Noise Model: $y_t = u_t$, (put $X_t = 0$, $a = 0$).
• Fixed Parameter Regression Model: $y_t = X_t\beta + u_t$, (put $a = 0$).
• Autoregressive Model of order 1: $(y_t - \beta) = a\,(y_{t-1} - \beta) + u_t$, (put $X_t = 1$).
• Regression Model with autoregressive disturbances of order 1: $y_t = X_t\beta + \alpha_t$, $\alpha_{t+1} = a\,\alpha_t + u_t$, (interpret the state as the autoregressive disturbance term).
• Random Walk with drift: $y_t = \beta + y_{t-1} + u_t$, (put $X_t - X_{t-1} = 1$, $a = 1$).

The random coefficients regression model (Nicholls and Pagan, 1985) can be described by the SSM,

$$y_t = Z_t\alpha_t + G_t u_t, \qquad \alpha_{t+1} = W_t\beta + T_t\alpha_t + H_t u_t$$

Special cases of this model include (i) the regression model with random-walk parameters (put $W_t = 0$ and $T_t = I$), (ii) the Return to Normality Model discussed in Harvey (1981, p. 202) where the regression parameter $\alpha_t$ evolves according to $(\alpha_{t+1} - \beta) = \phi(\alpha_t - \beta) + u_t$ (put $W_t = 1 - \phi$, $T_t = \phi$ and $H_t = 1$) and (iii) the time-varying (but non-random) coefficient regression model (put $H_t = 0$).

ARMA Model

There are several advantages in casting ARMA models as SSM's. First, scalar and vector ARMA processes are dealt with in a unified framework. Second, their log-likelihood is evaluated in an exact and efficient fashion via the Kalman Filter.
This contrasts with the Box-Jenkins methodology, which relies on ad hoc procedures such as back-forecasting. Third, note that through the zeroing-out strategy, missing observations do not require any special handling.

Consider the vector ARMA $(p,q)$ model,

$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t + B_1\epsilon_{t-1} + \cdots + B_q\epsilon_{t-q}$$

which, following Gardner et al. (1980), can be specified as a SSM of the form,

$$y_t = (I, 0, \ldots, 0)\,\alpha_t, \quad t = 1, 2, \ldots \tag{2.5}$$

$$\alpha_{t+1} = \begin{pmatrix} A_1 & I & 0 & \cdots & 0\\ A_2 & 0 & I & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ A_{m-1} & 0 & 0 & \cdots & I\\ A_m & 0 & 0 & \cdots & 0 \end{pmatrix}\alpha_t + \begin{pmatrix} I\\ B_1\\ B_2\\ \vdots\\ B_{m-1} \end{pmatrix}\epsilon_{t+1}, \quad t = 0, 1, 2, \ldots \tag{2.6}$$

where $m = \max(p, q+1)$, $A_i = 0$ for $i > p$ and $B_j = 0$ for $j > q$.

In this specification, the first row block of the state vector is the observation itself. To see this, denote block component $j$ of the state by $\alpha_{t,j}$. Then by repeated backsubstitutions,

$$\begin{aligned}
\alpha_{t,1} &= A_1\alpha_{t-1,1} + \alpha_{t-1,2} + \epsilon_t\\
&= A_1\alpha_{t-1,1} + (A_2\alpha_{t-2,1} + \alpha_{t-2,3} + B_1\epsilon_{t-1}) + \epsilon_t\\
&= A_1\alpha_{t-1,1} + \cdots + A_m\alpha_{t-m,1} + \epsilon_t + B_1\epsilon_{t-1} + \cdots + B_{m-1}\epsilon_{t+1-m}
\end{aligned}$$

As stated earlier, there exist alternative specifications for the SSM. For instance, Akaike (1975) defines a SSM where $\alpha_{t,j}$ ($1 \le j \le m$) is the $(j-1)$-step ahead predictor of the process.

An immediate extension of the ARMA model is the regression model with ARMA error structure, $y_t = X_t\beta + v_t$, where the disturbance term $v_t$ follows an ARMA process. This model can be written as the SSM above (equations 2.5-2.6) with the regression effect $X_t\beta$ included in the observation equation and the state now interpreted as $v_t$. Another extension is the mixed linear model where the random effects evolve according to an ARMA process. This is cast as the SSM defined in (2.1)-(2.2) with $\beta$ and $\alpha_t$ respectively representing the fixed and random effects, and the transition equation is as defined for the ARMA process above.
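The construction in (2.5)-(2.6) can be checked numerically. The sketch below is an illustration under stated assumptions rather than the thesis's own code: it treats the scalar case only, starts the state at $\alpha_0 = 0$ (equivalently, zero pre-sample values, not the stationary initialization discussed in Chapter 3), and the function names `arma_to_ssm`, `ssm_path` and `arma_path` are hypothetical.

```python
def arma_to_ssm(ar, ma):
    """Return (Z, T, H) of (2.5)-(2.6) for a scalar ARMA(p, q) model.

    ar = [A_1, ..., A_p] and ma = [B_1, ..., B_q] are lists of floats.
    """
    p, q = len(ar), len(ma)
    m = max(p, q + 1)
    a = list(ar) + [0.0] * (m - p)        # A_i = 0 for i > p
    b = list(ma) + [0.0] * (m - 1 - q)    # B_j = 0 for j > q
    T = [[0.0] * m for _ in range(m)]
    for i in range(m):
        T[i][0] = a[i]                    # first column holds A_1, ..., A_m
        if i + 1 < m:
            T[i][i + 1] = 1.0             # shifted identity above the last row
    H = [1.0] + b                         # (1; B_1; ...; B_{m-1})
    Z = [1.0] + [0.0] * (m - 1)           # picks the first state component
    return Z, T, H

def ssm_path(ar, ma, eps):
    """y_1..y_n from the state space form, with alpha_0 = 0 and shocks eps_1..eps_n."""
    Z, T, H = arma_to_ssm(ar, ma)
    m = len(H)
    alpha = [0.0] * m
    ys = []
    for e in eps:
        # alpha_{t+1} = T alpha_t + H eps_{t+1}, then y_{t+1} = Z alpha_{t+1}
        alpha = [sum(T[i][j] * alpha[j] for j in range(m)) + H[i] * e for i in range(m)]
        ys.append(sum(z * x for z, x in zip(Z, alpha)))
    return ys

def arma_path(ar, ma, eps):
    """The same series from the ARMA difference equation with zero pre-sample values."""
    ys = []
    for t, e in enumerate(eps):
        y = e
        y += sum(a * ys[t - i] for i, a in enumerate(ar, start=1) if t - i >= 0)
        y += sum(b * eps[t - j] for j, b in enumerate(ma, start=1) if t - j >= 0)
        ys.append(y)
    return ys

# ARMA(1,1) example: m = max(1, 2) = 2, so the state has two components.
Z, T, H = arma_to_ssm([0.5], [0.4])
```

Feeding the same shock sequence to `ssm_path` and `arma_path` should give matching series up to floating-point rounding, which mirrors the backsubstitution argument above.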
The mixed linear model SSM with ARMA random effects has been employed by Sallas and Harville (1981) for the prediction of football scores and dairy cattle breeding values.

Unobserved Components Model

The familiar time series model comprising trend, seasonal and irregular components is a member of the class of unobserved components models. These models arise in many applications. For instance, Watson and Engle (1983) use the SSM to estimate "unobserved" wage rates whereas Downing et al. (1980) estimate the shrinkage or loss of materials in an inventory control system. In general, the data are used to suggest the form of the unobserved components model, but it is often possible to impose plausible models for the form of the components. In such cases, the models are known as structural models.

An interesting application of the structural model is described in Harvey and Durbin (1986), who assess the impact of seat belt legislation on British road casualties. One of the structural models used by the authors is of the form

$$\begin{aligned}
y_t &= X_t\beta + \mu_t + c_t + \epsilon_t,\\
\mu_{t+1} &= W_t\beta + \mu_t + \nu_t + \eta_t, &&\text{(Level)}\\
\nu_{t+1} &= \nu_t + \zeta_t, &&\text{(Local Slope)}\\
c_t &= Z_t\gamma_t, &&\text{(Seasonality)}\\
\gamma_{t+1} &= \Gamma\gamma_t + \omega_t,
\end{aligned}$$

where $\mu_t$ is a local linear trend with its level and slope ($\nu_t$) determined by random walks, $c_t$ describes aspects of the vector of the seasonal components $\gamma_t$ in effect at time $t$ (the evolution of $\gamma_t$ being specified by the transition matrix $\Gamma$ and disturbances $\omega_t$), $\epsilon_t$ is the irregular component (i.e.
measurement error) and $X_t$ and $W_t$ are matrices of explanatory variables such as car traffic index, real price of gasoline and indicator variables reflecting the permanent effects (via $X_t$) or the transient effects (via $W_t$) of the seat belt legislation.

This structural model can be written as the following SSM,

$$y_t = X_t\beta + (1, 0, Z_t)\,\alpha_t + Gu_t,$$

$$\alpha_{t+1} = \begin{pmatrix}\mu_{t+1}\\ \nu_{t+1}\\ \gamma_{t+1}\end{pmatrix} = \begin{pmatrix}W_t\\ 0\\ 0\end{pmatrix}\beta + \begin{pmatrix}1 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & \Gamma\end{pmatrix}\alpha_t + Hu_t,$$

where, in the notation of (2.1)-(2.2), the terms $Gu_t$ and $Hu_t$ carry the irregular component $\epsilon_t$ and the state disturbances $(\eta_t, \zeta_t, \omega_t)$ respectively.

The above SSM with $\beta = 0$ and $\gamma_t$ describing quarterly variations will often be used in this thesis and will be labelled the Quarterly Basic Structural Model (QBSM).

SSM's have also been employed in the study of non-linear and non-normal dynamic phenomena, both possibly occurring in a continuous time setting; see for example Kitagawa (1987, 1989) and Pena and Guttman (1988), the latter employing Bayesian concepts to propose robust recursive algorithms for these situations. A class of nonlinear models of interest to time series practitioners is the state dependent model (Priestley, 1988), where the current state depends on its previous realizations and/or past observations. Special cases include the bilinear models (Granger and Andersen, 1978; Subba Rao, 1981), the threshold autoregressive models (Tong and Lim, 1980), the exponential autoregressive model (Haggan and Ozaki, 1981) and the autoregressive conditional heteroscedasticity model (ARCH) introduced by Engle (1982) and thereafter generalized by Bollerslev (1986). Finally we also mention the seminal work of Harvey and Fernandes (1989), who extend the SSM technology to deal with count or qualitative time series data. Non-linear models are often of limited practical utility and furthermore their statistical analyses are complicated and lack the elegance of their linear counterparts. This thesis deals exclusively with the class of linear SSM's.

2.4 The Statistics of the SSM

Statistical issues concerning the SSM include the reconstruction of the states $\{\alpha_t\}$ and the evaluation of the likelihood function.
Estimation of the states can be achieved in two ways, namely filtering and smoothing, whereby the states are respectively estimated conditional on $(y_1; y_2; \dots; y_{t-1})$ and $(y_1; y_2; \dots; y_n)$.

2.4.1 Filtering

The famous Kalman Filter (Kalman, 1960) provides a recursive algorithm for the filtering process, with the recursiveness occurring as a result of the Markovian nature of the transition equation in the SSM. Filtering can be viewed as the process of updating a predictor in light of new information. In the Bayesian context, this is equivalent to computing a posterior distribution given a prior distribution and the data. It is therefore appropriate that the KF be also viewed as a Bayesian procedure; see Harrison and Stevens (1976) and Meinhold and Singpurwalla (1983). Jazwinski (1970), Anderson and Moore (1979) and Harvey (1981, 1989) derive the KF using the classical ideas of the Prediction or Projection Theorem. We now state without proof the following prediction results (where $\beta$ is assumed known), the equations of which make up the KF.

Theorem 2.1 (Kalman Filter) Suppose $y_1, \dots, y_n$ are generated by the SSM. Then $\hat\alpha_t$, the predictor of the state $\alpha_t$ conditional on $(y_1, \dots, y_{t-1})$, $t \le n+1$, and its associated mse matrix $\sigma^2P_t$ are evaluated according to the following recursions:

$$e_t = y_t - X_t\beta - Z_t\hat\alpha_t, \qquad D_t = Z_tP_tZ_t' + G_tG_t', \qquad K_t = (T_tP_tZ_t' + H_tG_t')D_t^{-1},$$

$$\hat\alpha_{t+1} = W_t\beta + T_t\hat\alpha_t + K_te_t \quad\text{and}\quad P_{t+1} = T_tP_tT_t' + H_tH_t' - K_tD_tK_t',$$

with $\hat\alpha_1 = W_0\beta$ and $P_1 = H_0H_0'$.

The quantities employed in the KF have physical interpretations: $e_t$ is called the innovation or the one-step ahead prediction error resulting from the prediction of $y_t$ conditional on $(y_1; \dots; y_{t-1})$ and its covariance matrix is $\sigma^2D_t$; $K_t$ is the Kalman gain matrix and it is used in updating the estimate of the state in light of observation $y_t$ (or equivalently the innovation $e_t$).

Some merits of the Kalman Filter are the following:

1. It is suited for on-line or real-time applications.
2.
It is efficient in terms of storage: past data need not be stored; they manifest themselves in the estimate of the current state vector.
3. It produces minimum mean square linear estimators of the states.
4. It provides a recursive scheme for evaluating the likelihood of the SSM (see section 2.4.3). This contrasts, for example, with the Box-Jenkins evaluation of ARMA processes, which employs the technique of iterated back-forecasting.
5. It produces a sequence of residuals, namely the innovations $e_t$. The latter are generalizations of the "recursive residuals" discussed by Brown et al. (1975) in their study of the stability of regression parameters in the fixed regression model. The innovations represent aspects of the observation $y_t$ that cannot be predicted from previous observations and consequently they are serially uncorrelated. The latter property makes the innovations useful in statistical tests of goodness-of-fit in the SSM. This topic is discussed in more detail in Chapter 7.

A major practical problem with the KF concerns the specification of $\hat\alpha_1$ and $P_1$, i.e. $W_0\beta$ and $H_0$. This will be resolved in the next Chapter for the class of SSM's with a time invariant state equation. More general cases will be dealt with in Chapter 4. Observe that the computation of $P_t$ is the most time-consuming exercise in the KF. This emphasises the importance of keeping the dimension of the state to a minimum. In Chapter 5, we will demonstrate that the CDKF outperforms its competitors on account of the fact that it propagates a $P_t$ of lower dimension.

We stated previously that the "zeroing-out" strategy permits one to process missing observations in an automatic way except for a minor adjustment in the KF. We now discuss the details of this adjustment. The zeroing-out strategy implies that the matrices $D_t$ are singular, and consequently the KF, which requires $D_t^{-1}$, fails. In this situation, it is practical to employ a generalized inverse $D_t^{-} = J'(JD_tJ')^{-1}J$ where $J$ is a "selector" matrix (e.g.
a permutation of the identity matrix) such that $\mathrm{rank}(JD_t) = \mathrm{rank}(D_t)$. As De Jong (1991a, section 3) remarks and illustrates with an example, although the choice of $J$ is immaterial for prediction purposes, it can however affect likelihood evaluation and maximum likelihood estimation. To avoid these irregular situations, De Jong (1991a) proposes the consideration of the regular SSM.

Definition 2.1 Let $y$ denote the stack of observations $y_t$ generated by the SSM. Then $y$ is said to be generated by a regular SSM if $y = Fz$ where $\mathrm{Cov}(z)$ is nonsingular and has the same rank as $\mathrm{Cov}(y)$ and $F$ is functionally independent of any unknown parameter.

The "zeroing-out" strategy coupled with the use of $D_t^{-}$ as defined above is implicitly tantamount to dealing with a regular SSM with $z$ interpreted as the stack of the nonmissing elements in $y$. More generally, in the consideration of the regular SSM, $y\#$, $|D_t|$ and the log-likelihood $\lambda(y)$ are to be interpreted respectively as $z\#$, $|JD_tJ'|$ and $\lambda(z)$. Henceforth in this thesis, we imply the regular SSM whenever we refer to the SSM.

In Chapter 4, we discuss the DKF, a filtering algorithm specifically designed by De Jong (1991b) to handle diffuse parameters in the SSM. It turns out that the DKF is the KF with the vectors $e_t$ and $\hat\alpha_t$ turned into matrices $E_t$ and $A_t$ with the same row dimensions. The DKF can be viewed as a generalization of the KF since it uses the KF to update each column of $A_t$. The DKF therefore enjoys most of the attractive features of the KF.

2.4.2 Forecasting

Forecasting is easily carried out with the KF. For instance, using the concept of iterated predictions, the $k$-step ahead predictor of the state is

$$\begin{aligned}
\mathrm{Pred}(\alpha_{t+k} \mid y_1, \dots, y_t) &= \mathrm{Pred}\{\mathrm{Pred}(\alpha_{t+k} \mid y_1, \dots, y_{t+k-1}) \mid y_1, \dots, y_t\}\\
&= W_{t+k-1}\beta + T_{t+k-1}\,\mathrm{Pred}(\alpha_{t+k-1} \mid y_1, \dots, y_t)\\
&= \{W_{t+k-1} + T_{t+k-1}W_{t+k-2} + \dots + (T_{t+k-1}\cdots T_{t+2})W_{t+1}\}\beta + (T_{t+k-1}\cdots T_{t+1})\hat\alpha_{t+1}.
\end{aligned}$$

The mse matrix of the $k$-step ahead predictor is evaluated from the prediction error, $\alpha_{t+k} - \mathrm{Pred}(\alpha_{t+k} \mid y_1, \dots, y_t)$. It follows that

$$\begin{aligned}
\sigma^{-2}\mathrm{Mse}(\alpha_{t+k} \mid y_1, \dots, y_t) ={}& H_{t+k-1}H_{t+k-1}' + T_{t+k-1}H_{t+k-2}H_{t+k-2}'T_{t+k-1}' + \dots\\
&+ (T_{t+k-1}\cdots T_{t+2})H_{t+1}H_{t+1}'(T_{t+k-1}\cdots T_{t+2})'\\
&+ (T_{t+k-1}\cdots T_{t+1})P_{t+1}(T_{t+k-1}\cdots T_{t+1})'.
\end{aligned}$$

Therefore the $k$-step ahead observation predictor and its mse matrix are

$$\mathrm{Pred}(y_{t+k} \mid y_1, \dots, y_t) = X_{t+k}\beta + Z_{t+k}\,\mathrm{Pred}(\alpha_{t+k} \mid y_1, \dots, y_t) \quad\text{and}$$

$$\mathrm{Mse}(y_{t+k} \mid y_1, \dots, y_t) = Z_{t+k}\,\mathrm{Mse}(\alpha_{t+k} \mid y_1, \dots, y_t)\,Z_{t+k}' + \sigma^2G_{t+k}G_{t+k}'.$$

2.4.3 Likelihood Evaluation

Assuming that the disturbances $u_t$ are normally distributed, Schweppe (1965) and Harvey (1981), the latter employing the Prediction Error Decomposition, have shown that $-2$ times the log-likelihood of the SSM, apart from a constant, is

$$\lambda(y \mid \theta, \sigma^2) = y\#\,\log\sigma^2 + \sum_{t=1}^{n}\log|D_t| + \sigma^{-2}\sum_{t=1}^{n}e_t'D_t^{-1}e_t. \tag{2.7}$$

Here $\theta$ denotes the parameters in the SSM (i.e. the system and regression matrices) and $\mathrm{Cov}(u_t) = \sigma^2I$. Interestingly, the likelihood function is expressed as a function of only the innovations $e_t$ and their covariance matrices $\sigma^2D_t$. This implies that the roles of the regression effects $X_t\beta$ and $W_t\beta$ are implicitly buried within the innovations.

Assuming that $\sigma^2$ is known and furthermore noting that $e_t$ and $D_t$ are produced by the KF, it ensues that $\lambda(y \mid \theta, \sigma^2)$ can be evaluated in a recursive fashion by attaching to the KF the extra recursion $q_{t+1} = q_t + e_t'D_t^{-1}e_t$ with $q_1 = 0$. If $\sigma^2$ is unknown then it can be concentrated out (i.e. replaced by its mle) of the above log-likelihood function. The mle of $\sigma^2$ is $\hat\sigma^2 = q_{n+1}/y\#$ and upon its substitution in (2.7) we obtain, apart from a constant, the $\sigma^2$-concentrated log-likelihood

$$\lambda(y \mid \theta) = y\#\,\log q_{n+1} + \sum_{t=1}^{n}\log|D_t|.$$

In Chapter 4, we derive two pseudo-likelihoods, namely the marginal and diffuse likelihoods. We demonstrate that they are both equal to $\lambda(y \mid \theta, \sigma^2)$ plus some additional terms. It will also be shown that the diffuse log-likelihood can be evaluated in a recursive fashion via the DKF.

2.4.4 Smoothing

The KF constructs an estimate of the state at time $t$ using only the information available at time $t-1$. In many situations however, it is desirable to estimate the state using the entire dataset.
For instance, the common exercise of least squares estimation of the parameters of a statistical model is tantamount to a smoothing operation. Smoothing can be a complicated procedure requiring in many cases the inversion of covariance matrices of the order of the data.

The SSM however provides a framework for recursive smoothing. This feature is again due to the Markovian characteristic of the state equation. Smoothing in the SSM context translates to updating the filtered estimate of the state, $\hat\alpha_t$, using the observation vector $(y_t; y_{t+1}; \dots; y_n)$ or equivalently the innovation vector $(e_t; e_{t+1}; \dots; e_n)$. Therefore smoothing algorithms are designed to run backward using output produced by a forward run of the KF. Smoothing algorithms fall into three categories, namely (i) fixed-interval, which assumes that $n$, the number of observations, is fixed and $t$, the time index, varies, (ii) fixed-point, where $t$ is fixed and $n$ increases, and (iii) fixed-lag, where both $t$ and $n$ vary. These are all detailed in Jazwinski (1970) and Anderson and Moore (1979, Chapter 7). De Jong (1988b, 1989) makes significant contributions towards enhancing the computational efficiency of these smoothing algorithms.

The fixed-interval smoothing algorithm, by far the most commonly used, suffices for the purpose of this thesis. We now state without proof De Jong's (1988b, 1989) results concerning fixed-interval smoothing.

Theorem 2.2 (De Jong, 1988b) Suppose the KF is run and for $t = 1, \dots, n$, the quantities $Z_t'D_t^{-1}e_t$, $Z_t'D_t^{-1}Z_t$, $\hat\alpha_t$, $P_t$ and $L_t = T_t - K_tZ_t$ are stored. The smoothing algorithm proceeds as follows: initialize $r_n$ and $R_n$ respectively as a vector and a matrix of zeroes and for $t = n, \dots, 1$ run the recursions

$$r_{t-1} = Z_t'D_t^{-1}e_t + L_t'r_t \quad\text{and}\quad R_{t-1} = Z_t'D_t^{-1}Z_t + L_t'R_tL_t.$$

Then the smoothed estimator of the state $\alpha_t$ and its associated mse matrix are

$$\tilde\alpha_t = \mathrm{Pred}(\alpha_t \mid y_1, \dots, y_n) = \hat\alpha_t + P_tr_{t-1} \quad\text{and}\quad \mathrm{Mse}(\tilde\alpha_t) = \mathrm{Mse}(\alpha_t \mid y_1, \dots, y_n) = \sigma^2(P_t - P_tR_{t-1}P_t).$$

Furthermore, for $1 \le t \le r \le n+1$, the mse matrix between smoothed estimators of the states is

$$\mathrm{Mse}(\tilde\alpha_t, \tilde\alpha_r) = \sigma^2P_tL_{r-1,t}'(I - R_{r-1}P_r),$$

where $L_{r-1,t} = L_{r-1}L_{r-2}\cdots L_t$ with $L_{t-1,t} = I$.

Kohn and Ansley (1989) independently derive a scalar version of this smoothing algorithm. Furthermore, they compare the efficiency of the algorithm to an alternative one discussed in Anderson and Moore (1979, p187) and report, for instance, savings of the order of 50% in the number of multiplication and division steps for a SSM with a state consisting of 15 components. This is not surprising since the above smoothing algorithm avoids further matrix inversions following the forward KF pass.

An interesting method to compute $\mathrm{Mse}(\tilde\alpha_{t-1}, \tilde\alpha_t)$ is provided by Watson and Engle (1983). They augment the state $\alpha_t$ by $\alpha_{t-1}$, redefine the transition equation appropriately and thereafter run the KF and smoothing algorithm. This method automatically yields $\mathrm{Mse}(\tilde\alpha_{t-1}, \tilde\alpha_t)$ as the off-diagonal matrix block of the mse matrix of the smoothed estimate of the augmented state. This approach is however computationally inefficient due to the dimensionality of the augmented state.

Two computational drawbacks of the ASSM are vividly portrayed in the smoothing exercise. First, the smoothed estimate of $\beta$ corresponds to its final estimate obtained in the KF and hence it is effectively not updated during the smoothing cycle. Second, the augmented state in the ASSM implies that a smoothing algorithm will require more data storage than a smoothing algorithm based on the SSM described by (2.1)-(2.2). These properties make smoothing algorithms based on the ASSM patently inefficient. A numerical illustration attesting to this fact is provided in Chapter 5.

To conclude this subsection, we mention that the smoothing algorithm described in Theorem 2.2 is easily extended to handle diffuse parameters.
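The filtering, likelihood and smoothing recursions of Theorems 2.1 and 2.2 and section 2.4.3 can be sketched in a few lines of Python for the simplest setting: a time invariant SSM with scalar observations, a scalar state and $\beta$ known. This is only an illustrative sketch under those assumptions; the function and argument names are hypothetical (`gg`, `hh`, `hg` stand for $GG'$, $HH'$ and $HG'$, while `xb` and `wb` stand for the constant regression effects $X\beta$ and $W\beta$), and the returned mse quantities are scaled, i.e. they must be multiplied by $\sigma^2$.

```python
import math


def kalman_filter(y, Z, T, gg, hh, hg, a1, P1, xb=0.0, wb=0.0):
    """Scalar Kalman filter following the recursions of Theorem 2.1.

    Returns the stored quantities (e_t, D_t, K_t, predictor, P_t),
    q_{n+1} and the sum of log D_t.
    """
    a, P = a1, P1
    q, logdet = 0.0, 0.0
    stored = []
    for yt in y:
        e = yt - xb - Z * a              # innovation e_t
        D = Z * P * Z + gg               # scaled innovation variance D_t
        K = (T * P * Z + hg) / D         # Kalman gain K_t
        stored.append((e, D, K, a, P))
        q += e * e / D                   # q_{t+1} = q_t + e_t' D_t^{-1} e_t
        logdet += math.log(D)
        a = wb + T * a + K * e           # predictor of alpha_{t+1}
        P = T * P * T + hh - K * D * K   # P_{t+1}
    return stored, q, logdet


def minus2_concentrated_loglik(q, logdet, n):
    """sigma^2-concentrated criterion of section 2.4.3, apart from a constant."""
    return n * math.log(q) + logdet


def fixed_interval_smoother(stored, Z, T):
    """Backward pass of Theorem 2.2; returns (smoothed state, P - P R P) pairs."""
    r, R = 0.0, 0.0                      # r_n = 0 and R_n = 0
    smoothed = []
    for e, D, K, a, P in reversed(stored):
        L = T - K * Z
        r = Z * e / D + L * r            # r_{t-1}
        R = Z * Z / D + L * R * L        # R_{t-1}
        smoothed.append((a + P * r, P - P * R * P))
    smoothed.reverse()
    return smoothed
```

A simple sanity check is a constant state observed with noise (`T = 1`, `hh = 0`, `Z = 1`, `gg = 1`) started at `a1 = 0`, `P1 = 1`: the smoothed estimate should then be the same at every $t$ and equal to `sum(y) / (n + 1)`.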
The extended smoothing algorithm for diffuse parameters will be discussed in detail in Chapter 4.

2.4.5 Information Filter

When there is uncertainty regarding either the initial state or the regression parameter in the SSM, it is common practice to initialize the KF with a large $P_1$. This method, which is commonly known as the "big k" method, is inexact and can furthermore be numerically unstable; see Chapter 4 for an illustration. The numerical problems can sometimes be avoided by using the Information Filter (IF), which differs from the KF in that it recurs the inverse of $P_t$. The IF is easily derived from the KF using the well-known Matrix Inversion Lemma; see Anderson and Moore (1979, p139).

The IF has some serious shortcomings vis-a-vis the KF. For example, it does not follow that generalized inverses of the covariance matrices can be readily employed if these matrices are singular. For instance, Ansley and Kohn (1985b) remark that the IF breaks down for a large class of ARMA models. Furthermore, the IF is numerically inefficient, requiring the inversion of large covariance matrices when observations $y_t$ are of a multivariate nature. Applications of the IF can be found in Kitagawa (1981) and Sallas and Harville (1988), which both deal with nonstationary time series models.

2.4.6 Computational Aspects

Direct implementation of the KF may lead to asymmetric or even negative definite covariance matrices due to rounding errors. These problems can be circumvented through the propagation of Choleski square roots of these covariance matrices. A survey on square root filtering is provided in Anderson and Moore (1979, p147-162). The following efficient square root form of the KF is due to De Jong (1991a).

Theorem 2.3 (Square-Root Kalman Filter) The Kalman Filter described in Theorem 2.1 can be generated as follows: let $U$ be such that $UU' = I$ and

$$\begin{pmatrix} Z_tP_t^{1/2} & G_t\\ T_tP_t^{1/2} & H_t \end{pmatrix}U = \begin{pmatrix} \bar D_t & 0\\ \bar K_t & P_{t+1}^{1/2} \end{pmatrix} \tag{2.8}$$

with $\bar D_t$ of minimal column dimension and with the same row dimension as $Z_t$.
Then $D_t = \bar D_t\bar D_t'$, $P_{t+1} = P_{t+1}^{1/2}P_{t+1}^{1/2\prime}$, $\bar K_t = K_t\bar D_t$ and $\hat\alpha_{t+1} = W_t\beta + T_t\hat\alpha_t + \bar K_t\bar D_t^{-1}(y_t - X_t\beta - Z_t\hat\alpha_t)$.

The orthogonal matrix $U$ may be obtained by various means, for example Givens rotations or Householder transformations, or the QR algorithm as it is commonly known. All the computations reported in this thesis are based on square-root algorithms with the QR algorithm as their core component. The computer codes were written in the APL language and run on an AT-type microcomputer. The codes for the QR algorithm were taken from Heizer (1983).

The square-root form of the smoothing algorithm follows the same concept as the square root KF. In particular, it suffices to propagate $R_t$ in square-root form. Thus for $t = n, \dots, 1$, we find an orthogonal matrix $U$ such that

$$\left((D_t^{-1/2}Z_t)',\ (R_t^{1/2}L_t)'\right)U = R_{t-1}^{1/2}.$$

2.5 Summary

We have demonstrated the versatility and usefulness of the SSM. The importance of treating fixed effects separately from random effects has been emphasised. Statistical inferences in the SSM can be achieved within the unified framework of the KF and the associated smoothing algorithm. A major problem with the KF concerns its initialization. This problem is dealt with in the next Chapter for the special case of SSM's with a time invariant state equation. The issues of how to handle diffuse initial states and regression parameters in the SSM are covered in Chapter 4. In particular, it will be shown that these diffuse situations can be handled by a generalization of the KF technology reported in this Chapter.

Chapter 3

Time invariance and Stationarity in the State Space Model

Most practical time series models have time invariant parameters. ARMA models and structural models are typical examples. In their study of ARMA models, Box and Jenkins (1970) consider the concept of stationarity and use it to develop such analytic features as the unconditional mean and variance of the ARMA process.
The work in this Chapter extends these features to a more general class of SSM's, namely those with a time invariant state equation. The results are useful for initializing both the KF and the DKF when they are applied to such SSM's.

This Chapter is divided as follows. The concepts and definitions of time invariance and stationarity within the SSM context, which were originally devised by De Jong (1991c), are reported in section 1. This section describes and investigates consequences of these definitions. We derive the necessary and sufficient conditions for stationarity in the SSM. These generalize the well-known conditions for stationarity in ARMA models. Section 2 deals with the evaluation of the unconditional mean and covariance matrix of the states for the time invariant SSM assuming the latter has applied since time immemorial. Four applications are reported. The Chapter concludes with an Appendix containing all the technical proofs.

3.1 Preliminaries

Definition 3.1 The SSM is said to be time invariant if for $t = 1, \dots, n$, $W_t = W$, $T_t = T$ and $H_t = H$ in the state equation (2.2).

Remarks

1. The above definition does not say anything about $W_0$, $T_0$ and $H_0$; in particular they may differ from the subsequent values of these same matrices.
2. Time invariance does not impose any conditions on the observation equation.
3. Time invariance implies that $T$ is square. This follows since if $T_t$ is a $p \times q$ matrix with $p \ne q$, then $T_{t+1}$ must have $p$ columns, thereby implying that $T_t$ is not time invariant.

Definition 3.2 The SSM is said to be stationary if it is time invariant and both $E(\alpha_t)$ and $\mathrm{Cov}(\alpha_t)$ are invariant to $t$.

This definition of stationarity parallels the definition of second-order or weak stationarity used in the ARMA model literature. Weak stationarity assumes that the autocovariance function is a function of the lag between the arguments.
In the case of stationary SSM's, this assumption translates to the following result.

Lemma 3.1 For a stationary SSM, $\mathrm{Cov}(\alpha_t, \alpha_{t-s}) = T^s\,\mathrm{Cov}(\alpha_t)$, $0 \le s < t$.

Proof. The result is obvious for $s = 0$. For $s > 0$, observe that

$$\alpha_t = (I + T + T^2 + \dots + T^{s-1})W\beta + T^s\alpha_{t-s} + (Hu_{t-1} + THu_{t-2} + \dots + T^{s-1}Hu_{t-s})$$

and hence $\mathrm{Cov}(\alpha_t, \alpha_{t-s}) = T^s\,\mathrm{Cov}(\alpha_{t-s}) = T^s\,\mathrm{Cov}(\alpha_t)$.

The following Theorem states the necessary and sufficient conditions for stationarity in the SSM.

Theorem 3.1 The SSM is stationary if and only if $W_0$ and $H_0$ are such that $W_0\beta = W\beta + TW_0\beta$ and $H_0H_0' = TH_0H_0'T' + HH'$.

Proof. Suppose $W_0$ and $H_0$ are as specified. Then $E(\alpha_2) = W\beta + TW_0\beta = W_0\beta$ and $\sigma^{-2}\mathrm{Cov}(\alpha_2) = TH_0H_0'T' + HH' = H_0H_0'$. Clearly for $t > 2$, $E(\alpha_t) = W_0\beta$ and $\mathrm{Cov}(\alpha_t) = \sigma^2H_0H_0'$, which are both invariant to $t$. Conversely, if $W_0$ and $H_0$ do not satisfy the given relations then $E(\alpha_2) \ne W_0\beta = E(\alpha_1)$ and $\mathrm{Cov}(\alpha_2) \ne \sigma^2H_0H_0' = \mathrm{Cov}(\alpha_1)$, and hence the SSM is not stationary.

The equations $W_0\beta = W\beta + TW_0\beta$ and $H_0H_0' = TH_0H_0'T' + HH'$ need not have proper solutions for $W_0$ or $H_0$. This is the case, for example, when $T$, $H$ and $W\beta$ are all equal to one. Furthermore, note that these equations do not appear to bear any connection to the Box-Jenkins approach, where stationarity conditions are couched in terms of roots of polynomials. However, a consequence of the next result, which gives necessary and sufficient conditions for the existence of proper solutions and hence for stationarity, is that these two approaches do in fact coincide for the ARMA model. The result is stated in terms of the eigenvalues of $T$. An eigenvalue of $T$ is called stationary if it has modulus less than one; otherwise it is called nonstationary.

Theorem 3.2 The equation $W_0\beta = W\beta + TW_0\beta$ has a solution for $W_0$ if and only if $(W\beta)'x = 0$ whenever $T'x = x$. The equation $H_0H_0' = TH_0H_0'T' + HH'$ has a solution for $H_0$ if and only if $H'x = 0$ whenever $x$ is an eigenvector of $T'$ corresponding to a nonstationary eigenvalue.

The proof is given in the Appendix at the end of the Chapter.
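To make the conditions of Theorems 3.1 and 3.2 concrete, consider as an illustrative special case (an assumption introduced here, not an example from the thesis) a scalar state with constant $T$, $H$ and $W\beta$. The two equations then reduce to

$$W_0\beta = W\beta + T\,W_0\beta \;\Longrightarrow\; W_0\beta = \frac{W\beta}{1 - T} \quad (T \ne 1), \qquad H_0^2 = T^2H_0^2 + H^2 \;\Longrightarrow\; H_0^2 = \frac{H^2}{1 - T^2} \quad (|T| < 1).$$

For $|T| < 1$ these are the familiar unconditional mean and (scaled) variance of an AR(1) process. For $T = 1$ the first equation is solvable only if $W\beta = 0$, and for $|T| \ge 1$ the second is solvable only if $H = 0$, which is in line with Theorem 3.2 and with the case where $T$, $H$ and $W\beta$ are all equal to one.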
A sufficient condition for both conditions in the Theorem to hold is that $T$ only have stationary eigenvalues. Then, as suggested by Gardner et al. (1980), $H_0$ may be solved from $\mathrm{vec}(H_0H_0') = \{I - (T \otimes T)\}^{-1}\mathrm{vec}(HH')$. Note that $I - (T \otimes T)$ is nonsingular since the eigenvalues of $T \otimes T$ are the pairwise products of the eigenvalues of $T$, all of which have modulus less than one.

Application to ARMA(p,q) model. The SSM specification of an ARMA($p$,$q$) model is given in equations (2.5)-(2.6). Recall that the determinant of a matrix does not change if a multiple of one row is added to another row. Consider $T - zI$ and add $zI$ times the first block of rows to the second block of rows, leading to a second block of rows of $(A_2 + zA_1 - z^2I, 0, I, 0, \dots, 0)$. Repeat this procedure by multiplying the resulting second block of rows by $zI$ and adding to the third block of rows, and so on for subsequent row blocks. The resulting matrix has a determinant equal, up to sign, to

$$\det(A_m + zA_{m-1} + \dots + z^{m-1}A_1 - z^mI)$$

or equivalently, again up to sign, to $\det\{(-z^m)(I - z^{-1}A_1 - \dots - z^{-m}A_m)\}$.

The Box-Jenkins approach states that the model is stationary iff the roots ($z^{-1}$) of the polynomial $(I - z^{-1}A_1 - \dots - z^{-m}A_m)$ all lie outside the unit circle. This is equivalent to $z$ lying inside the unit circle, or in other words to $T$ having all stationary eigenvalues.

3.2 Automatic Initialization of the Kalman Filter

The KF recursion for the SSM must be initialized with the unconditional mean and covariance matrix of the initial state. In this section, explicit expressions for these quantities are developed for the time invariant SSM. The expressions hold for both the stationary and nonstationary cases. In related work, Ansley and Kohn (1985a) show how to initialize the KF for the special case of ARIMA models.

The previous section dealt with the assignment of $W_0$ and $H_0$ in order to induce stationarity in the SSM. A conceptual tool to get around the explicit assignments of $W_0$ and $H_0$ is to suppose that the SSM has applied since time immemorial.

Definition 3.3 The SSM is said to have applied since time immemorial if it is time invariant and the state equation $\alpha_{t+1} = W\beta + T\alpha_t + Hu_t$ is assumed to hold for $t = \tau, \dots, -1, 0, 1, \dots, n$ where $\tau \to -\infty$ and $\alpha_\tau = 0$.

The concept of a model having applied since time immemorial is exploited in the Box-Jenkins methodology to evaluate the unconditional mean and variance of an ARMA process. This time immemorial assumption can also be used to find the unconditional mean and covariance matrix of the states in a time invariant SSM. We first consider two simple classes of time invariant SSM's.

Theorem 3.3 Suppose the SSM has applied since time immemorial. If $T$ has all stationary eigenvalues then for $t = \dots, -1, 0, 1, \dots, n+1$,

$$E(\alpha_t) = (I - T)^{-1}W\beta \quad\text{and}\quad \mathrm{Cov}(\alpha_t) = \sigma^2M,$$

where $M$ is such that $M = TMT' + HH'$. If $T$ has all nonstationary eigenvalues then $\{\mathrm{Cov}(\alpha_t)\}^{-1} = 0$.

The proof of this Theorem is also given in the Appendix. An immediate consequence of the Theorem is that for stationary SSM's which are assumed to have applied since time immemorial, the KF may be initialized with $E(\alpha_1) = (I - T)^{-1}W\beta$ and $P_1 = M$. These generalize the expressions derived by Gardner et al. (1980), who assume $\beta = 0$.

In general, $T$ can have both stationary and nonstationary eigenvalues. In this case, the time immemorial argument leads to an arbitrarily large covariance matrix for the states. This has led to the suggestion that the KF should then be initialized either explicitly or implicitly with a state covariance matrix of the form $kI$ or more generally $kC + D$ where $k$ is large. For example, both Burridge and Wallis (1985) and Burmeister, Wall and Hamilton (1986) propose taking $\mathrm{Cov}(\alpha_1) = kI$ with $k$ large. In their theoretical works on filtering, smoothing and signal extraction, Ansley and Kohn (1985b, 1987) and Kohn and Ansley (1986) use $\mathrm{Cov}(\alpha_1) = kC + D$ with $k \to \infty$.
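For the stationary case covered by Theorem 3.3, the time immemorial argument can be mimicked numerically: start from a zero state far in the past and iterate the moment recursions implied by the state equation until they settle. The Python sketch below does this with plain lists; the function names and the fixed number of iterations are assumptions made for illustration, and direct iteration is used here in place of the vec formula of Gardner et al. (1980). It converges only when $T$ has all stationary eigenvalues.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def time_immemorial_moments(T, H, Wb, n_iter=500):
    """Approximate E(alpha_t) and M = sigma^{-2} Cov(alpha_t) for stationary T.

    T: p x p transition matrix, H: p x r disturbance loading,
    Wb: p x 1 column holding W*beta. The state is taken to be zero
    n_iter periods ago, as in Definition 3.3 with a finite start.
    """
    p = len(T)
    mean = [[0.0] for _ in range(p)]
    M = [[0.0] * p for _ in range(p)]
    HHt = matmul(H, transpose(H))
    Tt = transpose(T)
    for _ in range(n_iter):
        Tm = matmul(T, mean)
        mean = [[Wb[i][0] + Tm[i][0]] for i in range(p)]   # mean <- W*beta + T*mean
        TMT = matmul(matmul(T, M), Tt)
        M = [[TMT[i][j] + HHt[i][j] for j in range(p)]     # M <- T M T' + H H'
             for i in range(p)]
    return mean, M
```

For a scalar AR(1) state with `T = [[0.5]]` and `H = [[1.0]]`, the returned `M` should be close to $1/(1 - 0.25) = 4/3$, and the limits can then serve as the initial mean and $P_1$ for the KF.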
De Jong (1988b, 1991b) also employs the specification $kC + D$. There appears to be no literature explicitly justifying the choice of $E(\alpha_1)$ or the use of $\mathrm{Cov}(\alpha_1) = kC + D$ with $k$ large, or showing how to actually determine the matrices $C$ and $D$. The next Theorem provides expressions for these quantities under the time immemorial assumption.

Theorem 3.4 Suppose the SSM is assumed to have applied since time immemorial. Then for $t = \dots, -1, 0, 1, \dots, n+1$,

$$\sigma^2\{\mathrm{Cov}(\alpha_t)\}^{-1} = S'M^{-1}S = \lim_{k\to\infty}(kU_1U_1' + U_2MU_2')^{-1},$$

where $M$ is such that $M = QMQ' + SHH'S'$. Furthermore, if for given $x$, $x'U_1 = 0$ then

$$x'E(\alpha_t) = x'U_2(I - Q)^{-1}SW\beta.$$

The matrices $Q$, $S$, $U_1$ and $U_2$ are as follows: $U$ is any matrix such that $UTU^{-1} = \mathrm{diag}(P, Q)$ where $P$ has all nonstationary eigenvalues and $Q$ has all stationary eigenvalues; $U = (R; S)$ with $S$ having the same row dimension as $Q$, and $U^{-1} = (U_1, U_2)$ where $U_2$ has the same column dimension as the row dimension of $S$.

Proof. Consider $U\alpha_{t+1} = UW\beta + UTU^{-1}(U\alpha_t) + UHu_t$. This can be written as the following system of equations,

$$n_{t+1} = RW\beta + Pn_t + RHu_t, \qquad m_{t+1} = SW\beta + Qm_t + SHu_t,$$

with $n_t = R\alpha_t$ and $m_t = S\alpha_t$.

Since $P$ has all nonstationary eigenvalues, it follows that $\{\mathrm{Cov}(n_t)\}^{-1} = 0$. The matrix $Q$ has all stationary eigenvalues, thereby implying $E(m_t) = (I - Q)^{-1}SW\beta$ and $\mathrm{Cov}(m_t) = \sigma^2M$ where $M$ is such that $M = QMQ' + SHH'S'$. These results are now used to show

$$\begin{aligned}
\sigma^2\{\mathrm{Cov}(\alpha_t)\}^{-1} &= \sigma^2\left(U^{-1}\{\mathrm{Cov}(U\alpha_t)\}U^{-1\prime}\right)^{-1} = \sigma^2U'\{\mathrm{Cov}(U\alpha_t)\}^{-1}U = S'M^{-1}S\\
&= \lim_{k\to\infty}U'\mathrm{diag}(k^{-1}I, M^{-1})U = \lim_{k\to\infty}\{U^{-1}\mathrm{diag}(kI, M)U^{-1\prime}\}^{-1}\\
&= \lim_{k\to\infty}(kU_1U_1' + U_2MU_2')^{-1}.
\end{aligned}$$

Finally, if $x'U_1 = 0$, then $x'E(\alpha_t) = x'U^{-1}E(U\alpha_t) = x'\{U_1E(n_t) + U_2E(m_t)\} = x'U_2(I - Q)^{-1}SW\beta$. This concludes the proof of the Theorem.

Noteworthy features of Theorem 3.4 are:

1. It encompasses Theorem 3.3. If all the roots of $T$ are stationary, then $U = S = I$ and $U_1$ is null and the Theorem implies the well known results (Gardner et al., 1980), $\mathrm{Cov}(\alpha_t) = \sigma^2M$ where $M = TMT' + HH'$ and $E(\alpha_t) = (I - T)^{-1}W\beta$. If all the roots of $T$ are nonstationary then $U_2$ is null, $U_1 = I$ and thus $\{\mathrm{Cov}(\alpha_0)\}^{-1} = \lim_{k\to\infty}k^{-1}I = 0$.
2.
A general construction for $U$ is as follows. Suppose $\lambda_1, \lambda_2, \dots, \lambda_p$ are the roots of $T$ with algebraic multiplicities $n_1, n_2, \dots, n_p$, where conjugate pairs of roots are included just once and $|\lambda_1| \ge |\lambda_2| \ge \dots$. For $i = 1, 2, \dots, p$, define matrices $N_i$ as follows. If $\lambda_i$ is real then $N_i$ has $n_i$ columns spanning the null space of $(T - \lambda_iI)^{n_i}$. If $\lambda_i$ is complex then $N_i$ has $2n_i$ columns spanning the null space of $(T^2 - 2r_iT + |\lambda_i|^2I)^{n_i}$ where $r_i$ is the real part of $\lambda_i$. Put $U^{-1} = (N_1, N_2, \dots, N_p)$. The decomposition $UTU^{-1} = \mathrm{diag}(P, Q)$ is called a Real Jordan Canonical form of $T$ (see Brown (1988), p 141-150).
3. With $U$ as constructed in 2., the matrix $Q$ is block diagonal with blocks of size $n_i \times n_i$ or $(2n_i) \times (2n_i)$ depending on whether the corresponding root is real or complex. Accordingly, the inversions of $I - Q \otimes Q$ and $I - Q$ are reduced to inverting relatively small diagonal blocks.
4. If $x = S'\gamma$ for some $\gamma$, then $x'E(\alpha_t) \to \gamma'(I - Q)^{-1}SW\beta$. This result is obtained upon noting that $SU_2 = I$ and $SU_1 = 0$.
5. The Theorem suggests that for nonstationary SSM's, $\alpha_0$ may be specified as $\alpha_0 = U_1\gamma + c$ where $\gamma$ and $c$ are uncorrelated, $\sigma^{-2}\mathrm{Cov}(\gamma) = kI$ with $k$ large and $\sigma^{-2}\mathrm{Cov}(c) = U_2MU_2'$. This specification in turn implies the initialization of the KF with $P_1 = kTU_1U_1'T' + TU_2MU_2'T' + HH'$, where $k \to \infty$. This however is not a satisfactory solution for it induces numerical instability in the KF. The latter situation arises since the update of $P_t$ in the KF (Theorem 2.1) may possibly involve the difference of two large quantities to yield a required small quantity. These numerical difficulties will be circumvented via the use of an extended KF in the next Chapter.
6. The Information Filter sometimes serves as a useful alternative to the KF, in particular when the latter is to be initiated with a large state error covariance matrix ($P_1$), as for instance in the case of nonstationary SSM's.
For these cases, the Theoremsuggests that a suitable initialization for the IF is P = S’M S. This contrastswith the common practice of employing P = 0.We now illustrate the results of Theorem 3.4 in four applications. These provideexpressions for the unconditional mean and covariance matrix of the states in the AR (2)model and three other empirical models borrowed from the literature. As stated in point5. above, the derived expressions are useful for initializing the DKF when it is appliedChapter 3. Time invariance and Stationarity in the State Space Model 35to these models. The results for the second and third applications were obtained withthe use of software written in APL.Application to the AR(2) model. The AR(2) model Yt = /3+ ayt_i + by_2 + ct hasa SSM representation,(/3 (a i (iYt = (1, 0) o, at+1 = I I + I I at + I I €t+i0) b 0) \0)The roots of T are (5 = (a + v’)/2 and -y = (a — /i)/2 where d = a2 + 4b. Ifboth roots are stationary then M =c2Cov(at) satisfies M TMT’ + diag(1, 0) andE(at) = /3(1 — a — b)1(1; b). When both roots are nonstationary, {Cov(at)} —* 0 andboth components of E(at) diverge. Now suppose 6 is stationary and ‘-y is nonstationaryand hence the roots are real. Define the matrix U with columns U1 = (1; —6) andU2=(l;—-y). Put Q=6. Then11 (5 /i \ /1 \lim — E(ao) + /3 I I I I andk-*ook (‘—X-y—) \ —7 )J \ —6)2_______(i (1—6lim1TCov(ao)- (1- (59(-y - 6)2 72 ) -6 62Application to the seasonal adjustment of data. Burridge and Wallis (1985) dealwith the seasonal adjustment of U.S. employment data. Part of one of the models relatesto a cyclical component specified by2.26 1 0 0 1—1.52 0 1 0 —.989W/3=O, T= 2.144O90.26 0 0 1 .006860 0 0 0 .00001Chapter 3. Time invariance and Stationarity in the State Space Model 36The matrix T has a nonstationary root 1 repeated twice and stationary roots of 0 and.26. 
Assuming the model has applied since time immemorial then $\sigma^{-2}\mathrm{Cov}(\alpha_0)$ is of the form

.99 -.02 -.06 0 .12 .24 .12 -.33 x iO.94 -.24 0 .48 -.24 .67 x iOk +.07 0 .12 -.33 x iO0 1x10’°

where $k \to \infty$. Burridge and Wallis (1985) employed $\mathrm{Cov}(\alpha_0) = 10^{12}I$.

Application to the unobserved components model. Burmeister, Wall and Hamilton (1986) apply the Kalman filter to estimate the unobserved monthly inflation rate in economic time series. One of their SSM's has parameters

$$W = 0, \qquad T = \begin{pmatrix} \phi' & 0 \\ I & 0 \end{pmatrix}, \qquad H' = (1, 0, 0, 0),$$

with $\sigma^2$ = 1.9537 x iO, where $\phi' = (.14135, .89635, -.3817, .11173)$. The roots of T are 0, $.1905 \pm .2905i$, .8497 and $-1.0894$. Under the time immemorial assumption $\sigma^{-2}\mathrm{Cov}(\alpha_0)$ is of the form (upper triangles of symmetric matrices shown)

$$k\begin{pmatrix} .27 & -.25 & .23 & -.21 & .19 \\ & .23 & -.21 & .19 & -.18 \\ & & .19 & -.18 & .16 \\ & & & .16 & -.15 \\ & & & & .14 \end{pmatrix} + \begin{pmatrix} 1.40 & 1.42 & 1.00 & 1.00 & .70 \\ & 1.53 & 1.30 & 1.11 & .90 \\ & & 1.64 & 1.20 & 1.19 \\ & & & 1.73 & 1.12 \\ & & & & 1.80 \end{pmatrix}$$

where $k \to \infty$. Burmeister, Wall and Hamilton (1986) used $\mathrm{Cov}(\alpha_0) = 20I$.

Application to panel survey data. In many applications, it is possible to directly partition the state into a nonstationary component and a stationary component. For instance, Pfeffermann (1991) employs the following state-space model for the estimation and seasonal adjustment of population means based on rotating panel surveys carried out on a quarterly basis

Yt = (10 100 1/4 1/4 1/4 1/40 0) , t = 1,2,...t+i = diag(T1,T2)o+et , t=0,1,2,... where000000110 0 0p000 00010 0 00 0 0 0 p3 0Ti= 0 0 -1 -1 -1 T2=OOpO 0000 1 0 0000001000 1 0010000Var(et) diag(Vi,V2), = diag(4,cr,o,0,0) andV2 = o diag{(i-p)’,i,(p4+p1), ,0,0}

Here $T_1$ describes the quarterly basic structural model reported in Chapter 2 while $T_2$ describes the rotation patterns of the units of the panel data under the assumption that observations from the same unit follow an AR(1) model with autoregressive coefficient $\rho$ ($|\rho| < 1$). 
Observe that $T_1$ has eigenvalues i,i,i,-i and 0 while $T_2$ has all its eigenvalues equal to 0.

To initialize the KF, Pfeffermann (1991), following Harvey and Peters (1990), uses a diagonal covariance matrix of the form $\mathrm{diag}(P_1, P_2)$ where $P_1$ is diagonal with a zero diagonal element plus four arbitrarily large diagonal elements and $P_2 = \sigma^2(1 - \rho^2)^{-1}I_6$. These specifications are in line with the results of Theorem 3.4. In particular, observe that the transition matrix is already in the required "nonstationary-stationary" diagonal form thereby implying that $U = I$. Thus $P_1 = \mathrm{diag}(kI_4, 0)$, $k \to \infty$, and $P_2$ satisfies $P_2 = T_2P_2T_2' + V_2$.

3.3 Summary

We have derived expressions for the unconditional mean and covariance matrix of the states in time invariant SSM's under the assumption that the latter have applied since time immemorial. These expressions are useful for initializing the KF. However if the covariance matrix of the states is arbitrarily large, as in the case of nonstationary SSM's, then the KF will fail numerically. The next Chapter demonstrates an extension of the KF to deal with this problem. The results of this Chapter will prove useful for initializing this extended KF for the class of time invariant nonstationary SSM's.

3.4 Appendix

The following Lemma and its consequences will be useful for the various proofs. Proof and details can be found in Brown (1988).

Lemma 3.2 Every square matrix T has a Jordan Decomposition such that $T' = U(D + K)U^{-1}$, where D is a diagonal matrix with the eigenvalues of T as its diagonal entries and K is a matrix with zeroes everywhere except on its superdiagonal where there may be one or more ones. The matrices D and K are related as follows: if $K(j, j+1) = 1$ then $D(j, j) = D(j+1, j+1)$. 
The matrices D and U may be complex with the columns of U being the generalized eigenvectors of $T'$.

Furthermore D and K have the following useful properties: (i) $DK = KD$, (ii) $K^m = 0$ where m is the order of T and (iii) $(D + K)^t = D^t\sum_{j=0}^{m-1}\binom{t}{j}(D^{-1}K)^j$.

Proof of Theorem 3.2 The equation $W_0\beta = W\beta + TW_0\beta$ is equivalent to

$$\{\beta' \otimes (I - T)\}\,\mathrm{vec}(W_0) = W\beta.$$

Clearly the equation is consistent if $\beta = 0$. Suppose $\beta \neq 0$. Then consistency results iff for $x \neq 0$, $x'\{\beta' \otimes (I - T)\} = 0$ implies $x'W\beta = 0$. Since $\beta \neq 0$, $x'\{\beta' \otimes (I - T)\} = 0$ implies $x'T = x'$. Thus for given W, T and $\beta$, there is a $W_0$ such that $W_0\beta = W\beta + TW_0\beta$ if $T'x = x$ implies $(W\beta)'x = 0$.

Next consider $w^*H_0H_0'w = w^*TH_0H_0'T'w + w^*HH'w$ for w arbitrary. Then for $t \geq 0$,

$$w^*H_0H_0'w = w^*T^{t+1}H_0H_0'T'^{\,t+1}w + w^*T^tHH'T'^{\,t}w + \cdots + w^*HH'w.$$

Using Lemma 3.2, write $w^*T^tHH'T'^{\,t}w$ as $w^*U^{-1*}(D + K)^{*t}(U^*HH'U)(D + K)^tU^{-1}w$. A typical entry of $(D + K)^tU^{-1}w$ is

$$\lambda^t\left\{r_0 + t\,r_1/\lambda + \cdots + \binom{t}{k}r_k/\lambda^k\right\}$$

where $\lambda$ is an eigenvalue of T and $(r_0, \ldots, r_k)$ are consecutive elements in $U^{-1}w$, with $r_k \neq 0$ if $k > 0$. Thus for large t, $w^*T^tHH'T'^{\,t}w$ is dominated by a term of the form

$$x^*HH'x\,|\lambda|^{2(t-k)}\binom{t}{k}^2|r_k|^2$$

where x is a generalized eigenvector of $T'$ associated with eigenvalue $\lambda$. If $\lambda$ is nonstationary and $x^*HH'x \neq 0$ then the dominating term diverges and hence the equation cannot hold for finite $H_0$. Thus the stated condition is necessary for stationarity.

To show sufficiency, suppose the condition holds. Premultiply and postmultiply both sides of $H_0H_0' = TH_0H_0'T' + HH'$ by $U^*$ and U to yield

$$C = (D + K)^*C(D + K) + U^*HH'U$$

where $C = U^*H_0H_0'U$. Without any loss in generality assume $D + K$ and $U^*HH'U$ are arranged so that $D + K$ has two separate blocks, one containing all the nonstationary and the other all the stationary eigenvalues of $T'$. A solution for C is obtained by setting the diagonal block of C corresponding to the "nonstationary" block in $D + K$ to zero, which is possible since the condition states that the "nonstationary" block in $U^*HH'U$ is a zero matrix. 
The vec of the "stationary" diagonal block of C (denoted by $\tilde{C}$) can be solved as $\mathrm{vec}(\tilde{C}) = \{I - (L \otimes L^*)\}^{-1}\mathrm{vec}(R)$ where L and R are respectively the "stationary" blocks of $(D + K)$ and $U^*HH'U$. Thus $U^{-1*}\mathrm{diag}(0, \tilde{C})U^{-1}$ is a solution to $H_0H_0' = TH_0H_0'T' + HH'$. ∎

Proof of Theorem 3.3. Consider the state equation $\alpha_{t+1} = W_t\beta + T_t\alpha_t + H_tu_t$ where for $t > 0$, $W_t = W$, $T_t = T$ and $H_t = H$. Backsubstitution shows

$$\alpha_t = (I + T + \cdots + T^{t-2})W\beta + T^{t-1}W_0\beta + Hu_{t-1} + THu_{t-2} + \cdots + T^{t-2}Hu_1 + T^{t-1}H_0u_0$$

and in turn this implies

$$E(\alpha_t) = (I + T + \cdots + T^{t-2})W\beta + T^{t-1}W_0\beta \quad\text{and}$$

$$\sigma^{-2}\mathrm{Cov}(\alpha_t) = HH' + THH'T' + \cdots + T^{t-2}HH'(T')^{t-2} + T^{t-1}H_0H_0'(T')^{t-1}.$$

Clearly the behaviour of $T^t$ dictates the convergence of $E(\alpha_t)$ and $\mathrm{Cov}(\alpha_t)$. Observe that

$$T'^{\,t} = U(D + K)^tU^{-1} = U\Big\{D^t\sum_{j=0}^{m-1}\binom{t}{j}(D^{-1}K)^j\Big\}U^{-1}.$$

Suppose T has all stationary eigenvalues. Then $D^t \to 0$ implying $T^t \to 0$. Accordingly, using a geometric sum argument, $(I + T + T^2 + \cdots + T^t) \to (I - T)^{-1}$. Hence as $t \to \infty$, $E(\alpha_t) = (I + T + \cdots + T^{t-2})W\beta + T^{t-1}W_0\beta \to (I - T)^{-1}W\beta$. The limiting covariance matrix of the state vector is such that

$$\mathrm{vec}\{\mathrm{Cov}(\alpha_t)\} \approx \sigma^2\big(I + (T \otimes T) + \cdots + (T \otimes T)^t\big)\mathrm{vec}(HH').$$

As $t \to \infty$, $\mathrm{vec}\{\mathrm{Cov}(\alpha_t)\} \to \sigma^2\{I - (T \otimes T)\}^{-1}\mathrm{vec}(HH')$ which is the solution $\sigma^2\mathrm{vec}(M)$ where M is such that $M = TMT' + HH'$. Gardner, Harvey and Phillips (1980) obtain a similar solution. Note that $(I - T)$ and $\{I - (T \otimes T)\}$ are both nonsingular since T does not have any root with unit modulus.

Now suppose T has all nonstationary roots. Clearly $(I + T + T^2 + \cdots + T^t)$ diverges and hence $x'E(\alpha_t)$ diverges for every $x \neq 0$. Finally it is shown that for any w, $w^*T^tHH'T'^{\,t}w$ converges to zero or diverges to $+\infty$. Thus every root of $\mathrm{Cov}(\alpha_t)$ converges to zero or diverges to $+\infty$ which implies that $\{\mathrm{Cov}(\alpha_t)\}^{-1} \to 0$. Recall from the proof of Theorem 3.2 that $w^*T^tHH'T'^{\,t}w$ is dominated by a term of the form $x^*HH'x\,|\lambda|^{2(t-k)}\binom{t}{k}^2|r_k|^2$ where x is an eigenvector of $T'$ associated with the nonstationary eigenvalue $\lambda$. 
As $t \to \infty$ the dominating term diverges unless $x^*HH'x = 0$, in which case $U^*HH'U = 0$ which in turn implies $H = 0$ and therefore $w^*T^tHH'T'^{\,t}w = 0$. ∎

Chapter 4

The Diffuse State Space Model

This Chapter is concerned with the problem of diffuseness in the SSM. In this thesis, a diffuse random variable is viewed as one with an arbitrarily large covariance matrix. Diffuseness arises in three situations within the SSM context: (i) when the SSM is nonstationary and is assumed to have applied since time immemorial, a situation we encountered in the last Chapter, (ii) when there is uncertainty about the initial state in a time-varying SSM and (iii) when the regression parameter vector $\beta$ is unknown; this covers both the cases of fixed but unknown $\beta$ and random $\beta$ with unknown covariance matrix. To reflect the lack of knowledge about the parameters in the last two situations, it is convenient to regard them as diffuse random variables. It will be shown in this Chapter that these three diffuse situations can be addressed in a unified fashion using a transparent generalization of the KF technology introduced in Chapter 2. In the next Chapter, we will demonstrate that this approach, when properly implemented, is superior in practicality and computational performance to alternative approaches discussed in the literature.

These alternative approaches can be categorised into four methods, all of which apply to the ASSM which, as described in Chapter 2, employs an augmented state to accommodate the regression parameter $\beta$. The first one, commonly known as the "big k" method, initiates the KF with an arbitrarily large covariance matrix in order to reflect the diffuseness in the initial state. The "big k" method is popular in empirical work (see Burridge and Wallis (1985) and Den Butter and Mourik (1990) for example) since it makes use of readily-available KF software. 
However, as we will illustrate graphically later in this Chapter, it is inexact and numerically unstable. The second method employs the Information Filter (IF); see Kitagawa (1981), Sallas and Harville (1988) and Pole and West (1989). The drawbacks of the IF have already been discussed in Chapter 2. The third method, due to Harvey and Pierse (1984), is best introduced within the sphere of the ARIMA model. Here the state vector used is the one associated with the differenced model (i.e. the stationary ARMA model) but augmented with the first d (the order of differencing) raw observations. More generally, the augmented part of the state corresponds to regression-type estimates based on an initial stretch of the raw observations. This technique is tantamount to producing an estimate of the state in effect at time $t = d$ and allows one to initiate the KF at $t = d$, where d is equal to the number of regression estimates. The method has two drawbacks: (i) the evaluation of the regression-type estimates can be difficult and messy (e.g. with missing data) and (ii) the excessive augmentation of the state, when the SSM also includes an unknown regression parameter vector, makes it computationally unattractive. The fourth method, devised by Ansley and Kohn (1985b), applies in more generality than the three methods discussed above. Conceptually, the method amounts to removing the diffuseness in the SSM through a data transformation and thereafter applying the KF to the transformed data. This data transformation is achieved in an implicit fashion by a "modified" KF which is hereafter referred to as the AKKF. However the implementation of the AKKF can be complicated; in particular, "existing Kalman Filter software cannot be used" (Bell and Hillmer, 1991).

The method used in this Chapter to treat the diffuseness problem in the SSM is due to De Jong (1991b). 
It expresses diffuse aspects of the SSM via a parameter vector $\gamma$, extends the KF technology discussed in Chapter 2 to estimate $\gamma$ in parallel with the non-diffuse aspects of the state and finally indicates the appropriate adjustments required for factoring out the diffuse effects. This extended KF, called the Diffuse Kalman Filter (DKF), operates by applying the KF technology to each column of a column-augmented state. The latter consists of $\gamma^{\#} + 1$ columns, with the first $\gamma^{\#}$ columns each corresponding to a particular aspect of $\gamma$ and the last column corresponding to the non-diffuse aspects of the state. The DKF is therefore a transparent generalization of the KF; it turns two vector recursions in the KF into matrix recursions. The DKF and the AKKF are conceptually similar since they are designed to factor out the effects of the diffuse parameter. However they differ in approach: $\gamma$ is factored out explicitly in the DKF but implicitly in the AKKF. Although matrix recursions are employed in the DKF, it does not follow that the DKF is computationally expensive since its performance, like that of the KF, is significantly more dependent on the number of rows than the number of columns in the matrices that it computes. The issue of computational comparisons is covered in detail in the next Chapter.

In this Chapter, we address two original topics associated with diffuseness in the SSM. First, we derive, compare and relate two alternative pseudo-likelihoods called the diffuse and marginal likelihoods, which are suitably normalized likelihoods for a SSM with diffuse parameters. This study is useful since it allows us to relate likelihoods evaluated from different transformations of the data. In the absence of diffuseness in the SSM, both the diffuse and marginal likelihoods reduce to the "ordinary" likelihood defined in Chapter 2. Second, we report some interesting properties of the DKF when it is applied to the often-used class of nonstationary ARMA models. 
We show that when it is applied to autoregressive processes, the DKF collapses de facto to the KF after an initial run. With mixed ARMA processes, we demonstrate the prudence of restricting grid-searching of the diffuse likelihood function to the invertibility region. This avoids numerical roundoff and overflow problems in the DKF.

The contents of this Chapter are as follows. In the first section, the SSM is redefined to incorporate diffuse parameters. Section 2 is concerned with the derivation and comparison of the marginal and diffuse likelihoods. Section 3 deals with the DKF. We start with an intuitive explanation of the concepts behind the derivation of the DKF and thereafter we summarize the results of De Jong (1991b) concerning filtering, smoothing, gls estimation of the regression parameter and evaluation of the diffuse likelihood with the DKF. The results of Chapter 3 are then used to initialize the DKF for use with the class of time-invariant SSM's. We also report the square-root forms of the DKF and its associated smoothing algorithm. The section concludes with a graphical illustration of the pitfalls of employing the inexact "big k" method as opposed to an exact method like the DKF. Section 4 illustrates the DKF with the class of nonstationary ARMA models.

4.1 Anchoring the Diffuse SSM

In order to accommodate the three diffuse situations described in the preamble, it is necessary to redefine the anchoring of the SSM. This will be achieved via the device of a diffuse random variable.

Definition 4.1 A sequence of random variables $\{\gamma_1, \gamma_2, \ldots\}$ is said to be diffuse if $\mathrm{Cov}(\gamma_k)$ is nonsingular and the sequence of inverses of $\mathrm{Cov}(\gamma_k)$ converges, as $k \to \infty$, to the zero matrix in the Euclidean norm.

Remarks

1. The above definition of diffuseness translates to the assumption of a noninformative prior in Bayesian analysis.

2. This thesis henceforth uses the expression "diffuse random vector" to refer to the above sequence of random variables.
Definition 4.2 The diffuse SSM (DSSM) is the SSM defined by equations (2.1)-(2.2) with $\alpha_0$ and $\beta$ now specified as

$$\alpha_0 = a + A\gamma \quad\text{and}\quad \beta = b + B\gamma \qquad (4.1)$$

where a and b are known, $(A; B)$ is of full column rank, $\gamma \sim (c, \sigma^2C)$ with $C^{-1} \to 0$ (i.e. $\gamma$ is diffuse) and $\gamma$ is uncorrelated with $(u_0, \ldots, u_n)$.

The above specifications are flexible: (i) if $\alpha_0$ is totally diffuse then $\mathrm{rank}(A) = \alpha_0^{\#}$, i.e. the number of components in $\alpha_0$, (ii) if $\alpha_0$ is partially diffuse then $\mathrm{rank}(A) < \alpha_0^{\#}$ and (iii) if $\alpha_0$ is not diffuse then A is null. Similar statements apply to $\beta$ and $\mathrm{rank}(B)$.

Remark

In line with Definition 4.1, a diffuse SSM is a sequence of SSM's with each term in the sequence corresponding to a term in the sequence $\{\gamma_k\}$.

4.2 Pseudo-Likelihoods for the Diffuse SSM

The presence of a diffuse parameter in the DSSM implies that the likelihood function of the latter converges pointwise to zero at every possible set of values of the parameter and it is therefore uninformative. One approach suggested to deal with this problem is to consider a likelihood based on a SSM where $\gamma$, and consequently the initial conditions, are fixed. However, it has been established in the literature (Tunnicliffe Wilson, 1989, and Shephard and Harvey, 1990) that this approach may lead to erroneous statistical inferences with a high probability. This has led researchers to study instead pseudo-likelihood functions based on particular normalizations of the probability distribution of the model. An instance of normalization is differenced data, which is employed by Box and Jenkins (1970) in the context of scalar ARIMA models. In this section, we generalize the differencing technique and thereafter construct normalized likelihoods based on implicit differencing of the data.

The differencing operation effects a data transformation which results in the transformed data being functionally independent of the diffuse parameter. 
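As a small numerical illustration of this functional independence, the following sketch applies first differencing to a random walk whose starting level plays the role of the diffuse parameter. It is an illustration under stated assumptions, not part of the thesis: the helper names `difference_matrix`, `matvec` and `simulate_level_series` are hypothetical, and the shock values are arbitrary.

```python
def difference_matrix(n):
    """(n-1) x n first-difference matrix: row i has -1 in column i and +1 in column i+1."""
    return [[(-1.0 if j == i else 1.0 if j == i + 1 else 0.0) for j in range(n)]
            for i in range(n - 1)]


def matvec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_level_series(gamma, shocks):
    """Random walk started at level gamma: y_t = gamma + (sum of the shocks up to t)."""
    y, level = [], gamma
    for shock in shocks:
        level += shock
        y.append(level)
    return y


shocks = [0.5, -1.0, 2.0, 0.25, -0.75]          # arbitrary illustrative disturbances
N = difference_matrix(len(shocks))

# The coefficient of gamma in y is a column of ones; differencing annihilates it.
n_times_x = matvec(N, [1.0] * len(shocks))

# The differenced data are the same whatever the starting level gamma is.
diffs_small = matvec(N, simulate_level_series(0.0, shocks))
diffs_large = matvec(N, simulate_level_series(1.0e6, shocks))
```

Here `n_times_x` is a vector of zeros, and `diffs_small` and `diffs_large` agree, so any likelihood built from the differenced data carries no information about the starting level.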
The likelihood based on the transformed data is called the marginal (or restricted or residual or invariant) likelihood. The marginal likelihood was introduced by Kalbfleisch and Sprott (1970), who were interested in defining a likelihood for those aspects of the data which are invariant to a set of nuisance parameters. These can be viewed in an analogous fashion to diffuse parameters. Thus if the "exact" likelihood is a product of two terms each supplying exclusive information on two different sets of parameters, one of which is considered a nuisance for the estimation process, then that part of the exact likelihood which relates to the parameter set of interest is called the marginal likelihood. Kalbfleisch and Sprott (1970) "expounds marginal likelihood as a means for treating nuisance parameters and reducing bias in the parameters of interest" (Tunnicliffe Wilson, 1989). Marginal likelihood has been employed in the modelling of variance components (Patterson and Thompson, 1975) and in the estimation of ARMA parameters (Cooper and Thompson, 1977). Parameter estimation based on the marginal likelihood is commonly known as restricted maximum likelihood (REML) estimation. Both Ansley and Kohn (1985b) and Sallas and Harville (1988) employ a marginal likelihood for estimation purposes within the context of the SSM.

Two practical problems confront the use of the marginal likelihood. First, there is the issue of data transformation. This can be computationally tedious and intricate (e.g. in the case of missing observations). Furthermore, although differencing is a popular data transformation in the scalar time series arena, its application to vector time series is still a moot point. For instance, Tsay and Tiao (1990) report that for the case of vector ARMA processes, identification of the "genuine" nonstationary components of the vector process is not currently feasible and furthermore differencing of the individual
components of the observation vector is not justified and may even induce noninvertibility in the resulting stationary model. Second, we are faced with the issue of evaluating the marginal likelihood in an efficient manner. From our discussion in Chapter 2, we would expect that this could be done recursively. However this is not the case since the SSM structure is not maintained after a data transformation except in special circumstances. For instance, a necessary condition for the implementation of the AKKF of Ansley and Kohn (1985b) is that the matrix A (defined in equation 4.1) have a canonical form.

De Jong (1991b) has proposed an alternative pseudo-likelihood called the diffuse likelihood which does not have the shortcomings of the marginal likelihood, for it is based on the untransformed data. Furthermore he demonstrates the evaluation of the diffuse log-likelihood via the DKF. In the next two subsections we review the derivation of the diffuse likelihood and establish its connection with the marginal likelihood.

4.2.1 The Diffuse Likelihood

An expression for the diffuse likelihood is best derived upon regarding the DSSM as a linear model. Repeated substitution of the transition equation into the measurement equation, taking into account the anchoring of the DSSM defined in equation (4.1), shows that the DSSM can be written as the linear model $y = \mathcal{X}(a; b) + \mathcal{X}(A; B)\gamma + Gu$ where y and u are respectively the stacks of the observations and the error terms and $\mathcal{X}$ and G are built up from the regression and system matrices in the model. We will find it convenient to employ the following shorthand notations: $GG' = \Sigma$, $\mathcal{X}(a; b) = x$ and $\mathcal{X}(A; B) = X$, and the linear model itself will be denoted by $(y, \{x, X\}, G)$. 
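The repeated substitution can be mimicked numerically. The sketch below is an illustration only and rests on simplifying assumptions that are not in the thesis: a time-invariant model with scalar observations, no regression effects and $\alpha_0 = A\gamma$, so that the coefficient of $\gamma$ in $y_t$ is $ZT^tA$. The helper names `matmul` and `diffuse_design` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffuse_design(Z, T, A, n):
    """Rows Z T^t A for t = 1..n: the coefficients of gamma in y_1, ..., y_n."""
    rows = []
    power_times_A = A                              # T^0 A
    for _ in range(n):
        power_times_A = matmul(T, power_times_A)   # now T^t A
        rows.extend(matmul(Z, power_times_A))      # Z is 1 x p, so this adds one row
    return rows


# Local linear trend with a fully diffuse (level, slope) state, i.e. A = I.
X_trend = diffuse_design(Z=[[1.0, 0.0]],
                         T=[[1.0, 1.0], [0.0, 1.0]],
                         A=[[1.0, 0.0], [0.0, 1.0]],
                         n=4)

# Scalar explosive autoregression with coefficient 2: the t-th entry is 2**t.
X_ar = diffuse_design(Z=[[1.0]], T=[[2.0]], A=[[1.0]], n=4)
```

For the local linear trend the rows come out as $(1, t)$, so that under these assumptions the diffuse part of the model is an ordinary linear trend regression; for the scalar autoregression the entries are the powers of the autoregressive coefficient.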
Furthermore we remind the reader that the various log-likelihoods ($\lambda(y)$, $\lambda_d(y)$ and $\lambda_m(y)$) employed in the thesis are to be interpreted as $-2$ times the "exact" log-likelihood minus all additive constants which are independent of the parameters. These preliminaries set the stage for the derivation of the diffuse likelihood.

Theorem 4.1 (De Jong, 1991b) Suppose y is generated by a DSSM with $(\gamma; u)$ normally distributed. Then as $C^{-1} \to 0$, $\lambda(y) - \log|\sigma^2C|$ converges to the diffuse log-likelihood,

$$\lambda_d(y) = (y^{\#} - \gamma^{\#})\log\sigma^2 + \log|\Sigma| + \log|S| + \sigma^{-2}(q - s'S^{-1}s)$$

where $S = X'\Sigma^{-1}X$, $s = X'\Sigma^{-1}(y - x)$ and $q = (y - x)'\Sigma^{-1}(y - x)$.

The proof of the Theorem makes use of the following Lemma.

Lemma 4.1 Suppose y is generated by a SSM with $(\gamma; u)$ normally distributed. Then $\gamma|y \sim N(\hat\gamma, \sigma^2(C^{-1} + S)^{-1})$ where $\hat\gamma = (C^{-1} + S)^{-1}(C^{-1}c + s)$.

Proof. That $\gamma|y$ is normally distributed is a well known property of the normal distribution. Two well-known identities are needed to prove the result. Suppose P and Q are nonsingular matrices and R is conformable with P. Then

$$PR'(RPR' + Q)^{-1} = (R'Q^{-1}R + P^{-1})^{-1}R'Q^{-1} \qquad (4.2)$$

$$(R'Q^{-1}R + P^{-1})^{-1} = P - PR'(RPR' + Q)^{-1}RP \qquad (4.3)$$

These identities are now used to derive $E(\gamma|y)$ and $\mathrm{Cov}(\gamma|y)$. Under the normality assumption, it follows that

$$\begin{aligned}
E(\gamma|y) &= E(\gamma) + \mathrm{Cov}(\gamma, y)\{\mathrm{Cov}(y)\}^{-1}[y - E(y)] \\
&= c + CX'(XCX' + \Sigma)^{-1}\{y - x - Xc\} \\
&= c + (X'\Sigma^{-1}X + C^{-1})^{-1}X'\Sigma^{-1}(y - x - Xc) \quad\text{by (4.2)} \\
&= c + (S + C^{-1})^{-1}(s - Sc) \\
&= (I - (C^{-1} + S)^{-1}S)c + (C^{-1} + S)^{-1}s \\
&= \hat\gamma.
\end{aligned}$$

The last equality is obtained upon noting that $(C^{-1} + S)(I - (C^{-1} + S)^{-1}S) = C^{-1}$. Finally,

$$\begin{aligned}
\sigma^{-2}\mathrm{Cov}(\gamma|y) &= \sigma^{-2}\big[\mathrm{Cov}(\gamma) - \mathrm{Cov}(\gamma, y)\{\mathrm{Cov}(y)\}^{-1}\{\mathrm{Cov}(\gamma, y)\}'\big] \\
&= C - CX'\{XCX' + \Sigma\}^{-1}XC \\
&= (C^{-1} + S)^{-1} \quad\text{by (4.3)}.
\end{aligned}$$

This asserts the Lemma. ∎

Proof of Theorem 4.1. 
Using Bayes' Theorem, it follows that $\lambda(y) = \lambda(\gamma) + \lambda(y|\gamma) - \lambda(\gamma|y)$ with

$$\begin{aligned}
\lambda(\gamma) &= \gamma^{\#}\log\sigma^2 + \log|C| + \sigma^{-2}(\gamma - c)'C^{-1}(\gamma - c) \\
\lambda(y|\gamma) &= y^{\#}\log\sigma^2 + \log|\Sigma| + \sigma^{-2}(y - x - X\gamma)'\Sigma^{-1}(y - x - X\gamma) \\
&= y^{\#}\log\sigma^2 + \log|\Sigma| + \sigma^{-2}(q - 2s'\gamma + \gamma'S\gamma) \\
\lambda(\gamma|y) &= \gamma^{\#}\log\sigma^2 - \log|C^{-1} + S| + \sigma^{-2}(\gamma - \hat\gamma)'(C^{-1} + S)(\gamma - \hat\gamma).
\end{aligned}$$

Following direct simplification, $\lambda(y) = (y^{\#} - \gamma^{\#})\log\sigma^2 + \log|\sigma^2C| + \log|\Sigma| + \log|C^{-1} + S| + \sigma^{-2}\{q + c'C^{-1}c - (s + C^{-1}c)'(C^{-1} + S)^{-1}(s + C^{-1}c)\}$. Now let $C^{-1} \to 0$. Then the limit of $\lambda(y) - \log|\sigma^2C|$ is $\lambda_d(y)$ as stated in the Theorem. ∎

The mle's of $\gamma$ and $\sigma^2$ are respectively $\hat\gamma = S^{-1}s$ and $\hat\sigma^2 = (q - s'S^{-1}s)/(y^{\#} - \gamma^{\#})$. Subtracting $\log|\sigma^2C|$ from $\lambda(y)$ is tantamount to a normalization of the exact log-likelihood to ensure nondegeneracy. The "ordinary" log-likelihood described in Chapter 2 is a special instance of the diffuse likelihood; it is obtained upon regarding $\gamma = 0$.

4.2.2 Connection between the Diffuse and the Marginal Likelihoods

In this subsection, we derive the marginal likelihood and establish its connection to the diffuse likelihood. As previously stated, the marginal likelihood is based on a transformation of $(y, \{x, X\}, G)$ which is functionally independent of $\gamma$. This data transformation is achieved by a class of well-known linear maps whose interesting properties are described in the following Lemma.

Lemma 4.2 Let $M = I - X[X'\Sigma^{-1}X]^{-1}X'\Sigma^{-1}$. Then (i) M is idempotent with rank $y^{\#} - \gamma^{\#}$, (ii) $MX = 0$ and (iii) $\Sigma^{-1}M = M'\Sigma^{-1}$.

Furthermore suppose N is a $(y^{\#} - \gamma^{\#}) \times y^{\#}$ matrix spanning the row space of M. Then (iv) $NX = 0$ and (v) $N'(N\Sigma N')^{-1}N = \Sigma^{-1}M$.

Proof. Results (i)-(iii) are direct. For the second part of the Lemma, write $N = JM$ where J is of full row rank. Then (iv) immediately follows from (ii). Now let $V = N'(N\Sigma N')^{-1}N$. Then

$$\begin{aligned}
N\Sigma V = N\Sigma N'(N\Sigma N')^{-1}N = N &\;\Rightarrow\; M\Sigma V = M \quad\text{since N spans the row space of M} \\
&\;\Rightarrow\; M'V = \Sigma^{-1}M \quad\text{by result (iii)} \\
&\;\Rightarrow\; V = \Sigma^{-1}M \quad\text{by results (iii) and (i)}.
\end{aligned}$$

This asserts result (v) of the Lemma. ∎

Corollary 4.1 For q, s and S as defined in Theorem 4.1 and N as defined in Lemma 4.2,

$$q - s'S^{-1}s = \{N(y - x)\}'(N\Sigma N')^{-1}\{N(y - x)\}.$$
Proof. Direct manipulation of the pertinent quantities leads to

$$\begin{aligned}
q - s'S^{-1}s &= (y - x)'\Sigma^{-1}(y - x) - \{(y - x)'\Sigma^{-1}X\}\{X'\Sigma^{-1}X\}^{-1}\{X'\Sigma^{-1}(y - x)\} \\
&= (y - x)'\Sigma^{-1}M(y - x) \\
&= \{N(y - x)\}'(N\Sigma N')^{-1}\{N(y - x)\}
\end{aligned}$$

with the final equality following from result (v) of Lemma 4.2. ∎

The results of the Lemma and the Corollary are now used to establish the marginal likelihood and thereafter connect it with the diffuse likelihood. It is critical to recall at this stage that we are considering the regular SSM whereby y is interpreted as the stack of non-missing elements of the observations $y_t$. This ensures that every aspect of the non-missing data can manifest itself in all possible linear transformations of y, in particular Ny.

Definition 4.3 The marginal likelihood of data y is the likelihood based on the transformed data Ny where N is as defined in Lemma 4.2.

Theorem 4.2 The marginal log-likelihood apart from an additive constant equals

$$\lambda_m(y) = \lambda(Ny) = (y^{\#} - \gamma^{\#})\log\sigma^2 + \log|N\Sigma N'| + (q - s'S^{-1}s)/\sigma^2.$$

Proof. The result follows from Corollary 4.1. ∎

An immediate consequence of the Theorem is the following result.

Corollary 4.2 The mle of $\sigma^2$ is the same when it is calculated from either the diffuse or marginal likelihoods.

We now establish the connection between the diffuse and marginal likelihoods.

Theorem 4.3 For the model $(y, \{x, X\}, G)$, the diffuse and marginal log-likelihoods differ by $\log|NN'| - \log|X'X|$.

Proof. We first express $|N\Sigma N'|$ as follows,

$$\begin{aligned}
|N\Sigma N'| &= \big|(NN')^{-1}N\{N'(N\Sigma N')^{-1}N\}N'(NN')^{-1}\big|^{-1} \\
&= |NN'|^2\,\big|N\Sigma^{-1}MN'\big|^{-1} \\
&= |NN'|^2\,\big|N\Sigma^{-1}N' - N\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}N'\big|^{-1} \\
&= |NN'|^2\,\begin{vmatrix} N\Sigma^{-1}N' & N\Sigma^{-1}X \\ X'\Sigma^{-1}N' & X'\Sigma^{-1}X \end{vmatrix}^{-1}|X'\Sigma^{-1}X| \\
&= |NN'|^2\,\left|\begin{pmatrix} N \\ X' \end{pmatrix}\Sigma^{-1}(N' \;\; X)\right|^{-1}|X'\Sigma^{-1}X| \\
&= |NN'|^2\,|\Sigma|\,\left|\begin{pmatrix} N \\ X' \end{pmatrix}(N' \;\; X)\right|^{-1}|X'\Sigma^{-1}X| \\
&= |NN'|^2\,|\Sigma|\,\begin{vmatrix} NN' & 0 \\ 0 & X'X \end{vmatrix}^{-1}|X'\Sigma^{-1}X| \\
&= |NN'|\,|X'X|^{-1}\,|\Sigma|\,|S|.
\end{aligned}$$

The second equality follows from result (v) of Lemma 4.2; the sixth equality from the well known formula for the determinant of a covariance matrix and the penultimate equality uses $NX = 0$. Upon substitution of the above expression for $|N\Sigma N'|$, we obtain

$\lambda_m(y) = (y^{\#} - \gamma^{\#})\log\sigma^2 + \log|N\Sigma N'| + (q - s'S^{-1}s)/\sigma^2$
$= (y^{\#} - \gamma^{\#})\log\sigma^2 + \log|\Sigma| + \log|S| + (q - s'S^{-1}s)/\sigma^2 + \log|NN'| - \log|X'X|$

$= \lambda_d(y) + \log|NN'| - \log|X'X|$.

This asserts the Theorem. ∎

It follows from Theorem 4.3 that the difference between the diffuse and marginal log-likelihoods, namely $\log|NN'| - \log|X'X|$, can be interpreted as a penalty term resulting from the non-removal of diffuse effects in the SSM. The diffuse likelihood coincides with the marginal likelihood when the SSM is non-diffuse. Hence it can be used to discriminate between diffuse and non-diffuse SSM's. The penalty term contrasts with its counterpart in Akaike's Information Criterion where it is expressed as an ad hoc function of the number of parameters in the model. The next two results illustrate the significance of this penalty term in the case of time invariant SSM's.

Theorem 4.4 If matrix N represents ordinary differencing then $\lambda_m(y) = \lambda_d(y)$.

Proof. Consider the scalar model $(1 - L)y_t = v_t$, where $v_t$ is an arbitrary disturbance. This can be written as $y = 1y_0 + v$ where y and v are respectively the stacks of the $y_t$'s and $v_t$'s. Here $X = 1$ thereby implying that $|X'X| = n$. The matrix N has dimensions $(n - 1) \times n$ with $N(i, i) = -1$, $N(i, i+1) = 1$ and zero elsewhere. It follows that $NN'$ is tridiagonal with diagonal entries of 2 and subdiagonal and superdiagonal entries of $-1$. It is easy to show that $|NN'| = n$. Therefore $\lambda_m(y) = \lambda_d(y)$ as asserted. ∎

Remarks

1. The Theorem is easily extended to higher order differencing i.e. $(1 - L)^dy_t$, $d \geq 1$, and to seasonal differencing i.e. $(1 - L^s)y_t$, $s \geq 1$. It also extends to vector observations $y_t$.

2. The Theorem indicates that differencing of the data, whereby the differenced data are regarded as noise, is in fact unnecessary since statistical inferences based on the diffuse and marginal likelihoods will coincide. This is verified in the following application which uses the DKF in the context of an ARIMA(0,1,1) model.

Application to IBM stock prices. 
Box and Jenkins (1970) fit the model $z_t = e_t + 0.09e_{t-1}$, $t = 1, 2, \ldots, 369$, where $z_t = y_t - y_{t-1}$ with $y_t$ representing the closing prices of IBM stock for the period 17th May 1961 to 2nd November 1962 (Series B). In this situation, the diffuse and marginal log-likelihoods are respectively based on data $y_t$ and its differenced form $z_t$. Using the square-root DKF algorithm (presented in the next section), we obtain $\lambda_d(y) = 1814.9$ and $\hat\sigma^2 = 52.2$. These results coincide with those provided by Box and Jenkins.

The next example links Theorem 3.4 and Theorem 4.3. It is shown that in the context of time invariant SSM's with state transition matrix T, the penalty term $\log|NN'| - \log|X'X|$ is a function of the nonstationary eigenvalues of T.

Theorem 4.5 Consider the DSSM,

$$y_t = Z_t\alpha_t + G_tu_t, \qquad \alpha_{t+1} = T\alpha_t + Hu_t, \qquad t = 0, 1, 2, \ldots$$

Then the difference between the diffuse and marginal log-likelihoods of this DSSM is a function of the nonstationary eigenvalues of T.

Proof. As a consequence of Theorem 3.4 one can write $\alpha_0 = U_1\gamma + \alpha_0^{\dagger}$. Repeated substitution of the transition equation into the measurement equation allows one to write $y_t = Z_tT^tU_1\gamma + v_t$ where $v_t$ is a linear function of $\alpha_0^{\dagger}$ and $(u_0, \ldots, u_t)$ and $U_1$ is as defined in Theorem 3.3. Therefore the t-th row of X is $Z_tT^tU_1 = Z_tU_1P^t$. Matrix X also affects the determination of N (see Lemma 4.2). Therefore we conclude that the difference between the diffuse and marginal log-likelihoods is explained by the nonstationary eigenvalues of T. ∎

Application to scalar explosive autoregressive processes. To assess the relevance of Theorem 4.5, consider the scalar autoregressive model $y_t = ay_{t-1} + \epsilon_t$ where $a > 1$. In this case, the t-th component of X is $a^t$ while an appropriate $(n - 1) \times n$ matrix N has entries $N(i, i) = -a$, $N(i, i+1) = 1$ and zero elsewhere. 
It is easy to show through direct algebra that $|X'X| = a^2|NN'|$, thereby implying that the diffuse and marginal likelihoods differ by the logarithm of the squared modulus of the nonstationary eigenvalue, a.

4.3 Statistical Inference with the Diffuse SSM

The estimation of the states and regression parameters in the DSSM requires dealing with the diffuse parameter $\gamma$. In the initial part of this section, we demonstrate using ideas borrowed from Rosenberg (1973) that an efficient filtering algorithm for the DSSM is tantamount to a modified KF which estimates $\gamma$ in parallel with the nondiffuse aspects of the states. We then show that the DKF of De Jong (1991b) immediately follows from these ideas. The section ends with a summary of the results of De Jong (1991a, 1991b) concerning diffuse filtering, smoothing, likelihood evaluation and generalized least squares estimation of regression parameters with the DKF. We also briefly discuss the implementation of the DKF via its square-root form.

For this informal introduction to the ideas behind the DKF, we assume without any loss in generality $(a; b) = 0$. The following result is useful for subsequent discussions.

Lemma 4.3 The DSSM can be expressed as $y_t = X_t\gamma + v_t$ with $X_t$ built up from the system and regression matrices and where $v_t$ is generated as follows,

$$v_t = Z_t\alpha_t^{\dagger} + G_tu_t, \qquad \alpha_{t+1}^{\dagger} = T_t\alpha_t^{\dagger} + H_tu_t \quad\text{where}\quad \sigma^{-2}\mathrm{Cov}(\alpha_1^{\dagger}) = H_0H_0'.$$

Proof. The proof easily follows upon repeated substitutions of the transition equation into the observation equation. ∎

Now suppose the KF is applied to $\{v_t\}$. This is possible since the initial state of this process is non-diffuse. Denote the innovations and their covariance matrices by $e_t = v_t - \mathrm{Pred}(v_t|v_1; \ldots; v_{t-1})$ and $\sigma^2D_t$. Put $v = (v_1; \ldots; v_n)$, $e = (e_1; \ldots; e_n)$ and $D = \mathrm{Diag}(D_1, \ldots, D_n)$. Then the log-likelihood of v is given by

$$\lambda(v) = n\log\sigma^2 + \log|D| + \sigma^{-2}e'D^{-1}e.$$

Observe that $e = Kv = K(y - X\gamma)$ where y is the stack of the $y_t$'s and K is lower triangular with ones on the diagonal, thereby implying $|K| = 1$. 
It is crucial to recognise that $K$ orthogonalizes $v$ (the innovations $e = Kv$ are mutually uncorrelated) and is implicitly produced by the KF. Furthermore,

$e'D^{-1}e = (Kv)'D^{-1}(Kv) = (y - X\gamma)'K'D^{-1}K(y - X\gamma) = \{\mathcal{K}(y - X\gamma)\}'\{\mathcal{K}(y - X\gamma)\}$

where $\mathcal{K} = D^{-1/2}K$. Therefore the gls estimate of $\gamma$ is obtained upon regressing $\mathcal{K}y$ on $\mathcal{K}X$, a process that can be done in parallel with the filtering of $v$. This suggests applying a KF-like algorithm to the augmented observation $(X_t, y_t)$ instead of $y_t$ alone. This is the concept behind the DKF.

4.3.1 Filtering and Likelihood Evaluation with the DKF

This subsection summarises the work of De Jong (1991b) with regards to the use of the DKF in filtering, smoothing, gls estimation of regression parameters and evaluation of the diffuse log-likelihood. First consider how the diffuse log-likelihood defined in Theorem 4.1 can be evaluated in a recursive fashion. As Schweppe (1965) did earlier in the derivation of the likelihood of a SSM (see Chapter 2), De Jong (1991b) exploits the fact that $\Sigma^{-1} = K'D^{-1}K$ to express $S$ and $s$ as,

$S = X'\Sigma^{-1}X = (KX)'D^{-1}(KX)$, $s = X'\Sigma^{-1}(y - x) = (KX)'D^{-1}K(y - x)$

In Chapter 2, we saw that the stack of innovations produced by the KF is $e = K(y - X\beta)$ where $X\beta$ is a known mean effect. Therefore replacing $\beta$ by $(a; b)$ allows the computation of $f = K(y - x)$. Furthermore each column of $KX$ may be computed by substituting $y$ by a column of zeroes and $\beta$ by the relevant column of $-(A; B)$. These ideas are used in the DKF for the recursive evaluation of $S$ and $s$ and ultimately the diffuse log-likelihood. We now formally define the DKF.

Definition 4.4 The DKF is the KF (see Theorem 2.1) with the equations for $e_t$ and $\hat\alpha_{t+1}$ respectively replaced by

$E_t = (X_tB, y_t - X_tb) - Z_tA_t$ and $A_{t+1} = W_t(-B, b) + T_tA_t + K_tE_t$

with $A_1 = W_0(-B, b) + T_0(-A, a)$ and $P_1$ unchanged. Furthermore attach the recursion $Q_{t+1} = Q_t + E_t'D_t^{-1}E_t$ with $Q_1 = 0$ to the DKF. The matrix $Q_t$ is of the form,

$Q_t = \begin{pmatrix} S_t & s_t \\ s_t' & q_t \end{pmatrix}$

and hence the diffuse log-likelihood as stated in Theorem 4.1 is given by,

$\lambda_d(y) = (y^\# - \gamma^\#)\log\sigma^2 + \log|S_{n+1}| + \sum_{t=1}^{n}\log|D_t| + \sigma^{-2}(q_{n+1} - s_{n+1}'S_{n+1}^{-1}s_{n+1})$

Remarks

1.
The DKF turns two vector recursions in the KF into matrix recursions. Furthermore evaluation of the diffuse log-likelihood is made possible upon appending the recursion of the matrix $Q_t$ to the DKF. This contrasts with the KF where $Q_t$ is a scalar.

2. The matrices $A$ and $B$ are the same matrices used in defining $\alpha_0$ and $\beta$. We will address the appropriate specification of $A$ for the nonstationary time-invariant SSM in a future subsection.

3. The last columns of $A_t$ and $E_t$ are interpreted as the nondiffuse aspects of the state and the innovation at time $t$.

The above ideas put us in a position to appreciate De Jong's (1991b) results on filtering and smoothing with the DSSM. The predictors therein are interpreted as limiting predictors since they assume that the diffuse parameter $\gamma$ is such that $\sigma^2\{\mathrm{Cov}(\gamma)\}^{-1} = C^{-1} \to 0$.

Theorem 4.6 (De Jong, 1991b) Suppose $y = (y_1; \ldots; y_n)$ is generated by the DSSM. Let (i) $\hat{x}_t$ and $\tilde{x}$ respectively denote the limiting predictors of the random variable $x$ conditional on $(y_1; \ldots; y_{t-1})$ and $(y_1; \ldots; y_n)$ and (ii) $M_\gamma$ denote all but the last column of matrix $M$. Then

$\hat\gamma_t = S_t^{-1}s_t$, $\sigma^{-2}\mathrm{Mse}(\hat\gamma_t) = S_t^{-1}$

$\hat\alpha_t = A_t(-\hat\gamma_t; 1)$, $\sigma^{-2}\mathrm{Mse}(\hat\alpha_t) = P_t + A_{t\gamma}S_t^{-1}A_{t\gamma}'$

$y_t - \hat{y}_t = E_t(-\hat\gamma_t; 1)$, $\sigma^{-2}\mathrm{Mse}(\hat{y}_t) = D_t + E_{t\gamma}S_t^{-1}E_{t\gamma}'$

$\hat\beta_t = b + B\hat\gamma_t$, $\sigma^{-2}\mathrm{Mse}(\hat\beta_t) = BS_t^{-1}B'$

Furthermore, for $1 \le t \le r \le n + 1$,

$\tilde\alpha_t = F_t(-\hat\gamma_{n+1}; 1)$ and $\sigma^{-2}\mathrm{Mse}(\tilde\alpha_t, \tilde\alpha_r) = P_tL_{r-1,t}(I - R_{r-1}P_r) + F_{t\gamma}S_{n+1}^{-1}F_{r\gamma}'$

where $N_{t-1} = Z_t'D_t^{-1}E_t + L_t'N_t$ and $R_{t-1} = Z_t'D_t^{-1}Z_t + L_t'R_tL_t$ with $N_n$ and $R_n$ equal to zero matrices, $F_t = A_t + P_tN_{t-1}$, $L_t = T_t - K_tZ_t$ and $L_{r-1,t} = L_t'L_{t+1}' \cdots L_{r-1}'$ with $L_{t-1,t} = I$.

If $S_t$ is singular then a generalized inverse (e.g. $S_t^-$) can be employed. The results of the Theorem clearly generalize those described in Chapter 2 for the non-diffuse SSM. Thus the DKF is a transparent generalization of the KF.
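To make the recursions of Definition 4.4 and the estimate of the diffuse parameter in Theorem 4.6 concrete, the following sketch runs a scalar DKF on a local level model and compares the resulting S^{-1}s with a directly computed gls estimate. The model (y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t, alpha_1 = gamma diffuse, no regression effects), the function names and the data are assumptions chosen for illustration; exact rational arithmetic is used so that the two routes can be compared without rounding.

```python
from fractions import Fraction

def dkf_local_level(ys, q):
    """Scalar DKF with Z = T = 1, Var(eps) = sigma^2, Var(eta) = q * sigma^2."""
    a_gam, a_f = Fraction(-1), Fraction(0)    # A_1 = (-1, 0), i.e. alpha_1 = gamma
    P = Fraction(0)                           # alpha_1 has no nondiffuse part
    S = s = qq = Fraction(0)                  # blocks of Q_t, with Q_1 = 0
    for y in ys:
        D = P + 1                             # D_t = Z P Z' + G G'
        e_gam, e_f = -a_gam, y - a_f          # E_t = (0, y_t) - Z A_t
        K = P / D                             # Kalman gain
        a_gam, a_f = a_gam + K * e_gam, a_f + K * e_f   # A_{t+1} = T A_t + K E_t
        P = P * (1 - K) + q                   # P_{t+1}
        S += e_gam * e_gam / D                # Q_{t+1} = Q_t + E_t' D_t^{-1} E_t
        s += e_gam * e_f / D
        qq += e_f * e_f / D
    return S, s, qq

def gls_level(ys, q):
    """Direct gls estimate of gamma in y = 1*gamma + v, Cov(v) = sigma^2 * Sigma."""
    n = len(ys)
    # Sigma(i, j) = q * min(i, j) + [i == j], with rows and columns indexed from 0
    M = [[q * min(i, j) + (1 if i == j else 0) for j in range(n)] + [Fraction(1)]
         for i in range(n)]
    for i in range(n):                        # forward elimination of Sigma w = 1
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [x - f * z for x, z in zip(M[r], M[i])]
    w = [Fraction(0)] * n
    for i in reversed(range(n)):              # back substitution
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

ys = [Fraction(v) for v in (3, 5, 4, 6, 8)]   # made-up data
q = Fraction(1, 2)
S, s, qq = dkf_local_level(ys, q)
gamma_hat = s / S                             # limiting predictor of gamma
assert gamma_hat == gls_level(ys, q)
```

The first column of A_t and E_t carries the response of the filter to the diffuse parameter, while the last column is what an ordinary KF started from a zero initial state would produce.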
The conceptual elegance and the computational simplicity of the DKF make it an attractive proposition vis-a-vis a competitor like the AKKF of Ansley and Kohn (1985b).

4.3.2 Square Root DKF

The connection between the KF and the DKF makes it obvious that their square-root forms should be similar except for differences in the dimensionalities of various matrices. This subsection presents a slightly modified version of the square root DKF originally devised by De Jong (1991a). The algorithm proceeds as follows,

• Step 0. Initialize $A = W_0(-B, b) + T_0(-A, a)$, $P = H_0$, $\lambda = m = 0$ and set $Q$ to a null matrix.

For steps $t = 1, 2, \ldots, n$ do

• Step 1. Postmultiply the matrix on the left-hand side below by an orthogonal matrix $U$ such that

$\begin{pmatrix} ZP^{1/2} & G \\ TP^{1/2} & H \end{pmatrix}U = \begin{pmatrix} D & 0 \\ K & P \end{pmatrix}$

and $D$ is row-echelon with the same number of rows as $Z$.

• Step 2. $E = D^{-1}\{(X_tB, y_t - X_tb) - ZA\}$, $A = W_t(-B, b) + TA + KE$, $\lambda = \lambda + \log|D|^2$, $m = m + \text{column-rank}(D)$

• Step 3. Update $Q$ via an orthogonal transformation $U$, $(Q', E')U = (Q', 0)$

De Jong (1991a) shows that at the end of each iteration, $D = D_t^{1/2}$, $P = P_{t+1}^{1/2}$, $A = A_{t+1}$ and $Q = Q_{t+1}^{1/2}$. The matrix $Q$ has canonical form $Q = \{(\mathcal{Q}, w); (0, r)\}$, with $r$ a scalar. Upon premultiplying $Q$ by its transpose, we immediately recognise $S_t = \mathcal{Q}'\mathcal{Q}$, $s_t = \mathcal{Q}'w$ and $q_t = r^2 + w'w$. Therefore it follows that (i) $\hat\gamma_t = S_t^{-1}s_t = (\mathcal{Q}'\mathcal{Q})^{-1}\mathcal{Q}'w = \mathcal{Q}^{-1}w$ and (ii) $m\hat\sigma^2 = q_t - s_t'S_t^{-1}s_t = (r^2 + w'w) - w'\mathcal{Q}(\mathcal{Q}'\mathcal{Q})^{-1}\mathcal{Q}'w = r^2$.

Step 3 in the square-root DKF devised by De Jong (1991a) is more elaborate in the sense that the update of the (generalized) inverse of $Q$ is achieved with an orthogonal matrix $U$ such that,

$\begin{pmatrix} Q' & E' \\ Q^{-1} & 0 \end{pmatrix}U = \begin{pmatrix} Q' & 0 \\ R & W \end{pmatrix}$

De Jong shows that $R$ automatically holds $Q^{-1}$ if $Q$ was square on the previous iteration. If $Q$ is not square, then its generalized inverse must be explicitly computed. This is necessary for only a few initial iterations unless a multicollinearity problem exists. The advantage of this approach is that $S_{t+1}^{-1/2}$ is immediately available in $R$.
This “luxury” however comes at the price of carrying out an orthogonal transformation on an augmented matrix at each iteration.

The use of either version of step 3 depends on the purpose under consideration. If the square-root DKF is used for likelihood evaluation and smoothing, then our step 3 is more appropriate since only $\hat\gamma_{n+1}$ is required and therefore only one matrix inversion is needed. However if we are interested in monitoring the behaviour of $\hat\gamma_t$, for example in a study of stability of regression relationships over time, then De Jong's version is more appropriate.

On a final note, we mention that a square-root form of the diffuse smoothing algorithm is identical to the one used in the non-diffuse context except for the difference in the dimensions of the pertinent quantities.

4.3.3 Automatic Initialization of the DKF

We now demonstrate, using the results of Chapter 3, the appropriate initialization of the DKF for the class of nonstationary time-invariant SSM's. From Theorem 3.4, it follows that the initial state can be written as $\alpha_1 = T(U_1\gamma + \alpha_0^*) + Hu_0$ where $\gamma \sim (0, kI)$ with $k \to \infty$ and $\sigma^{-2}\mathrm{Cov}(\alpha_0^*) = U_2MU_2'$. Therefore for this class of SSM's, an appropriate initialization of the DKF is $A_1 = (-TU_1, a) + W(-B, b)$ and $P_1 = TU_2MU_2'T' + HH'$.

4.3.4 Pitfall of employing the “big k” method

We have emphasized the fact that the “big k” method is inexact. As previously noted, it is nevertheless a popular approach adopted in empirical works since it employs readily-available KF software. We now demonstrate that employing the “big k” method can have serious consequences for all aspects of statistical inference. This can be seen by considering the behaviour of the time series $\sigma^2D_t$, the variance of the innovations, as evaluated in the first instance from the KF using the “big k” method and in the second instance using the DKF. For the purpose of illustration, we consider the QBSM discussed in Chapter 2.
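Before turning to the QBSM, the mechanism behind this pitfall can be seen in a much smaller setting. The sketch below is not the QBSM: it assumes a scalar local level model with unit measurement variance and state variance q = 0.5, and the function name is hypothetical. It performs a single KF step from the initial state variance P_1 = k. In exact arithmetic P_2 = k/(k + 1) + q, which tends to 1 + q as k grows; in double precision a very large k destroys this quantity entirely.

```python
def next_state_variance(P1, q=0.5):
    """One KF step for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t,
    with Var(eps) = 1 and Var(eta) = q; returns P_2 given P_1."""
    D = P1 + 1.0          # innovation variance
    K = P1 / D            # Kalman gain
    return P1 - K * P1 + q

exact_limit = 1.0 + 0.5                   # value of P_2 as k tends to infinity
moderate = next_state_variance(1e4)       # close to the limit, but visibly inexact
huge = next_state_variance(1e20)          # 1e20 + 1.0 rounds to 1e20, so K == 1.0

assert abs(moderate - exact_limit) < 1e-3
assert moderate != exact_limit            # a moderate k is only an approximation
assert huge == 0.5                        # the unit contribution has been lost
```

A moderate k therefore gives an approximation whose error depends on k, while a very large k loses information through cancellation, so no single choice of k is safe.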
Recall that the QBSM is a nonstationary SSM with its state transition matrix having four nonstationary eigenvalues each with unit modulus. Therefore for the “big k” method, the KF is initialized with the estimate of the initial state covariance matrix equal to $kI$ where $k$ is large.

[Figure: "Prediction Variance: Effect of KF Initialization". Log innovation variance plotted against iteration for the DKF and for the “big k” method with k = 10E12 and k = 10E4.]

Figure 4.1: $\sigma^2D_t$ evaluated with DKF and “big k” methods.

Since the DKF can be viewed as equivalent to the “big k” method with $k$ arbitrarily large, one would expect $\sigma^2D_t$ to be highest when the DKF is employed. However this is not the case in Figure 4.1. This is explained by the fact that the use of the “big k” method, with $k$ large, is numerically unstable. Bell and Hillmer (1991) report a similar phenomenon upon applying a modified AKKF to a seasonal ARMA model.

4.4 Characteristics of the DKF with ARMA Models

The ARMA model is a popular tool in Time Series Analysis. Box and Jenkins (1970) have built up a complete set of statistical techniques around stationary ARMA models. However many applications, especially those arising in the socio-economic areas, require the use of nonstationary ARMA models such as ARIMA models. In these situations, statistical inferences have been traditionally conducted with the stationary models resulting from iterated differencing of these nonstationary models. This practice has however been questioned in the case of vector data by Tsay and Tiao (1990) who ultimately go on to recommend that statistical analysis be carried out using the raw data. The material in the last Chapter and the current Chapter indicates that the DKF provides a means for statistical inference in nonstationary ARMA models at all levels of generality.
In particular, vector ARMA processes and data irregularity problems such as missing data are covered.

In the next subsection, we display an interesting collapsing property of the DKF when it is applied to nonstationary autoregressive processes. The subsection thereafter focusses on the consequences of noninvertibility on the diffuse likelihood. For ease of presentation, we have only considered scalar ARMA models; however generalization to the vector models is direct.

4.4.1 Autoregressive Processes

We demonstrate in this subsection that when the DKF is applied to a nonstationary AR(p) process, the estimate of the diffuse parameter $\gamma$ associated with the state $\alpha_0$ is in fact obtained from the first $p$ observations (we are assuming a regular SSM, i.e. without any missing observations). This has the interesting implication that after the $p$-th iteration, the DKF self-collapses to the KF.

Theorem 4.7 Suppose the DKF is applied to a nonstationary AR(p) process. Partition $A_t = (A_{t\gamma}, a_t)$ and $E_t = (E_{t\gamma}, e_t)$ where $a_t$ and $e_t$ are vectors. Then for $t > p$, $E_{t\gamma} = A_{t\gamma} = 0$ and $a_t$ and $e_t$ correspond to the limiting predictors of the state and the innovation at time $t$.

Proof. Consider the AR(p) model, $y_t = a_1y_{t-1} + \ldots + a_py_{t-p} + \varepsilon_t$, $t \ge 1$. Upon repeated substitutions, one can write $y = (y_1; \ldots; y_n)$ as $y = Ay^\dagger + B\varepsilon$ where $y^\dagger = (y_1; \ldots; y_p)$ and $\varepsilon = (\varepsilon_1; \ldots; \varepsilon_n)$. Hence all the diffuse aspects of the process can be efficiently inferred from $y^\dagger$ only and this in turn implies that $S$ and $s$ attain their final values after iteration $p$ of the DKF. Since $S$ is a positive semi-definite matrix, it then follows that for $t > p$, $E_{t\gamma}$ and in turn $A_{t\gamma}$ are matrices of zeroes. Consequently, for $t > p$, $a_t$ and $e_t$ must correspond to the limiting predictors of the state and the innovation at time $t$. □

Remark

The Theorem asserts that when the DKF is applied to nonstationary AR(p) processes, it collapses de facto to the KF i.e.
at $t = p + 1$, we can reinitialize by replacing $A_t = (0, a_t)$ and $Q_t$ with $a_t$ and $q_{p+1} - s_{p+1}'S_{p+1}^{-1}s_{p+1}$, update the diffuse log-likelihood by $\log|S_{p+1}|$ and then run the KF for $t \ge p + 1$. Hence this collapsed DKF is as computationally efficient as any alternative algorithm proposed in the literature (such as those described in the next Chapter).

The de facto collapse of the DKF to the KF occurs whenever the diffuse aspects of the SSM are completely determined by an initial stretch of the observations $(y_1; \ldots; y_m)$ (say), in which case $A_t = (A_{t\gamma}, a_t) = (0, a_t)$ for $t > m$. Whether $A_t$ exhibits such a behaviour in other SSM's is a moot point. It is an easier task to find SSM's which do not lead to such $A_t$'s. Consider the following two examples. First, suppose the SSM contains a diffuse regression parameter $\beta$. In that case, the optimal estimator of $\beta$ is based on the entire observation set and hence $A_{t\gamma}$ does not necessarily stay a zero matrix. Second, consider a nonstationary mixed ARMA(p,q) process. This can be written as $y = Ay^\dagger + B\varepsilon + C\varepsilon^\dagger$ where $y$, $y^\dagger$ and $\varepsilon$ are as described in the proof of Theorem 4.7 and $\varepsilon^\dagger = (\varepsilon_{1-q}; \ldots; \varepsilon_0)$. In this case, the diffuse aspects of the process are captured in both $y^\dagger$ and $\varepsilon^\dagger$. The optimal estimator of $\varepsilon^\dagger$ requires the entire observation set $y$ and consequently $A_{t\gamma} \ne 0$.

Remark

Since it is often possible to approximate a mixed ARMA process by a relatively low order AR process, we would expect, in view of Theorem 4.7, the entries of the matrices $E_{t\gamma}$ to be close to zero after an initial number of DKF iterations. Therefore we expect $S_t$ and $s_t$ to substantially attain their final values in the early iterations of the DKF.

The next Chapter focusses on the computational aspects of the DKF. The work therein is motivated by the results in this subsection. In particular, it is shown that after an initial run, the DKF can always be switched to a KF based on the ASSM (i.e. the SSM with augmented states).
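The self-collapse asserted in Theorem 4.7 is easy to observe in the simplest case p = 1. The sketch below assumes the scalar state space form y_t = alpha_t, alpha_{t+1} = a1 * alpha_t + u_t with alpha_0 = gamma diffuse; the function name, the coefficient a1 = 2 and the data are illustrative assumptions. It records the diffuse and nondiffuse parts of E_t at each iteration.

```python
from fractions import Fraction

def dkf_ar1(ys, a1):
    """DKF for y_t = alpha_t, alpha_{t+1} = a1 * alpha_t + u_t, alpha_0 = gamma.
    Returns the pairs (E_t_gamma, e_t) for t = 1, ..., n."""
    A_gam, a_f = -a1, Fraction(0)        # A_1 = T(-A, a) with A = 1 and a = 0
    P = Fraction(1)                      # P_1 = H H'
    out = []
    for y in ys:
        D = P                            # Z = 1 and G = 0
        E_gam, e_f = -A_gam, y - a_f     # E_t = (0, y_t) - Z A_t
        K = a1 * P / D                   # K_t = T P Z' / D
        A_gam, a_f = a1 * A_gam + K * E_gam, a1 * a_f + K * e_f
        P = a1 * P * (a1 - K) + 1        # P_{t+1} = T P L' + H H', L = T - K Z
        out.append((E_gam, e_f))
    return out

a1 = Fraction(2)                         # explosive AR(1), illustrative
ys = [Fraction(v) for v in (1, 3, 5, 11, 20)]
path = dkf_ar1(ys, a1)

assert path[0][0] == a1                              # gamma is identified at t = 1
assert all(E_gam == 0 for E_gam, _ in path[1:])      # diffuse column vanishes for t > 1
# for t > 1 the last column holds the usual AR(1) innovations
assert all(path[t][1] == ys[t] - a1 * ys[t - 1] for t in range(1, len(ys)))
```

From the second iteration onwards only the last columns of A_t and E_t are active, so the matrix recursions carry no more information than the vector recursions of the KF.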
This KF is however not computationally efficient due to the dimensions of the states. This leads us to consider a collapsing strategy which consists of reducing the column dimensions of pertinent matrices in the DKF. This collapsed DKF is shown to outperform the competition.

4.4.2 Mixed ARMA Processes

Box and Jenkins (1970, p198-199) observe that a stationary ARMA(p,q) model may have up to $2^q$ representations, thereby implying that the processes described by these representations have the same autocovariance function. Therefore these processes must also share the same likelihood function and consequently they also generate one-step ahead prediction errors with identical means and variances. Osborn (1976) argues that due to roundoff errors, grid searching of likelihood values across the parameter space must be restricted to the invertibility region. This invertibility property is satisfied by only one of these $2^q$ parametrizations of the ARMA model. For completeness, we now define the concept of invertibility.

Definition 4.5 The ARMA(p,q) process $a(L)y_t = b(L)\varepsilon_t$, where $L$ is the lag operator (i.e. $Lx_t = x_{t-1}$) and $a(\cdot)$ and $b(\cdot)$ are polynomials of order $p$ and $q$ in $L$, is said to be invertible if the roots of $b(L) = 0$ lie outside the unit circle.

In this subsection, we demonstrate that the above remarks carry over to nonstationary ARMA processes with the diffuse likelihood used instead of the “exact” likelihood. We show how roundoff errors arise in the DKF and therefore stress the prudence of keeping to the invertible region while grid-searching the diffuse likelihood function.

Theorem 4.8 Up to $2^q$ parametrizations of a nonstationary ARMA(p,q) process share the same diffuse likelihood function.

Proof. It suffices to consider an MA(q) process, since the nonstationary ARMA(p,q) process, $a(L)y_t = b(L)\varepsilon_t$, can be viewed as $z_t = b(L)\varepsilon_t$ with $z_t = a(L)y_t$. Suppose the model is invertible with the roots of $b(L)$ (possibly complex) denoted by $\theta_j$, j = 1, . . .
, q. The spectrum of $z = (z_1; \ldots; z_n)$ is given by,

$f(\lambda) = \sigma^2\prod_{j=1}^{q}|1 + \theta_je^{i\lambda}|^2 = \sigma^2\prod_{j=1}^{q}(1 + \theta_j^2 + 2\theta_j\cos\lambda)$

When $\theta_j$ is real, it follows that

$1 + \theta_j^2 + 2\theta_j\cos\lambda = \theta_j^2\{1 + (1/\theta_j)^2 + 2(1/\theta_j)\cos\lambda\}$

This asserts that the spectrum is invariant to root flippings. Since the spectrum stands in a 1:1 relationship with the autocovariance function, this implies that the latter and hence by extension the diffuse log-likelihood is invariant to possibly $2^q$ different parametrizations of the ARMA(p,q) process. □

Theorem 4.8 implies that an identification problem is likely during grid-searching of the diffuse likelihood function. Restricting the grid-searching to the invertibility region avoids this identification problem. However a more consequential argument for keeping to the invertibility region is that the evaluation of the diffuse likelihood of nonstationary, noninvertible ARMA processes is prone to acute roundoff errors. To see this, write the ARMA model, as in the previous subsection, as $y = Ay^\dagger + B\varepsilon + C\varepsilon^\dagger$. Then it is easy to see that in this context, roundoff errors are likely to arise since when $t$ is sufficiently large, the entries of the $t$-th row of $A$, $B$ and $C$ diverge.

We now illustrate with a simple example how the evaluation of the diffuse likelihood of noninvertible processes is plagued by overflow problems when the DKF is employed. Consider the ARIMA(0,1,1) process $y_t = y_{t-1} + \varepsilon_t + b\varepsilon_{t-1}$, where the disturbances $\varepsilon_t$'s are serially independent with mean zero and variance $v$. Assign $b = \theta$ ($\theta < 1$) and $v = \sigma^2$ for the invertible model and $b = 1/\theta$ and $v = \theta^2\sigma^2$ for the noninvertible model. Upon assuming that this process has applied since time-immemorial, it follows from Theorem 3.4 that the stack of observations can be written as $y = 1\gamma + (1, 1)\alpha + Gu$, where 1 denotes a column of ones, $\sigma^{-2}\mathrm{Cov}(\alpha) = \{(1, -1); (-1, 1)\}$ and $G$ is lower-triangular with diagonal entries of one and subdiagonal entries of $(1 + b)$. The “non-diffuse” portion of $\sigma^{-2}\mathrm{Cov}(y)$ is $\Sigma = \sigma^{-2}(1, 1)\mathrm{Cov}(\alpha)(1, 1)' + GG' = GG'$. This implies $|\Sigma| = 1$. Upon direct manipulation,
Upon direct manipulation,it can be shown that S(b) = 1’(GG’)’l’ = b (here y# = 1) where 5(b)is S evaluated with MA coefficient b. Let C(b, v) denote the sum of square of errorsq — s’S’s evaluated with parameters b and v. Then it is easy to deduce that C(8, 2) =82C(8,82o). Therefore, after noting that log E = 0 independantly of the value of b,Chapter 4. The Diffuse State Space Model 70we obtainy#_1)d(yI8_l,82J = (y#— 1)log(822+log 0 + (9u)2C(91,6o)y#—1= 2(y#— 1)logS+ (y# — 1)log2+log6_2#_1) 2t+ (0oC(0,)y#—1= (y—1)log92+lo >= A’(yI6,2)as asserted in Theorem 4.8. Observe however that with the noninvertible process, Sblows up and this makes computations in the DKF unsound. This demonstrates theprudence of restricting ourselves to invertible ARMA models.4.5 SummaryIn this Chapter, we have treated the problem of statistical inference in the DSSM. Wehave demonstrated that recursive ifitering, smoothing, evaluation of the log-likelihoodand the gis estimation of regression parameters can be carried out with a transparentextension of the KF labelled the DKF. We have considered the merits of the diffuseand marginal likelihoods as suitable pseudo-likelihoods for the DSSM. We displayed twointeresting characteristics of the DKF when it is applied to nonstationary ARMA models.First with autoregressive models, the DKF is shown to collapse de facto to the KF after aninitial number of iterations. Second we have stressed the prudence of restricting the grid-searching of the diffuse likelihood function to the invertibility region. This is necessaryto avoid numerical roundoff and overflow problems. The work in the next Chapter ismotivated by the collapsibility property of the DKF. 
There we show a means of forcing the collapse of the DKF, which for arbitrary SSM's, is not necessarily equal to the KF.

Chapter 5

Efficient Algorithms for the State Space Model

In this Chapter we show that the DKF, when properly implemented, is superior in performance to alternative algorithms proposed in the literature for the purposes of filtering, smoothing, likelihood evaluation, gls estimation of regression effects and diagnostics generation in the DSSM. This may appear surprising since the DKF has two apparent shortcomings: (i) the vector recursions for $e_t$, $\hat\alpha_t$ and the scalar recursion for $q_t$ in the KF are replaced by matrix recursions and (ii) it does not immediately provide limiting predictors of the state or estimates of the regression effects.

The alternative approaches to the DKF do not suffer from these shortcomings since they apply the KF to all but an initial stretch of the observations. Ansley and Kohn (1985b, 1990) switch from their modified KF (hereafter called AKKF) to the standard KF after an initial startup period whereas Sallas and Harville (1988) and Pole and West (1989) both initially use the Information Filter (IF) and thereafter switch to the KF. In both cases, the switch to the KF is explained by the fact that once a proper estimate of the diffuse parameter $\gamma$ can be constructed (from an initial stretch of the observations) then it can subsequently be used to construct limiting predictors of the states via the KF. This concept is evident in Harvey and Pierse (1984) and Bell and Hillmer (1991): they both deal directly with the diffuseness problem by using an initial stretch of the observations to construct regression type estimates for initializing the KF used for the subsequent stretch of the observations.

It therefore appears that the usefulness of either the AKKF or the IF is confined to providing estimates for initializing the KF used thereafter.
Since the DKF achieves the same objectives as the AKKF and IF, it is of interest to study the merits of switching from the DKF to the KF after a sufficient number of iterations. De Jong (1991a, 1991b) provides the implementation details and discusses the utility of switching from the DKF to the KF. The latter is based on a SSM with states augmented by the diffuse parameter $\gamma$.

We make several contributions in this Chapter. In section 1, we demonstrate that without any loss in generality, the SSM can be defined with the diffuse parameter $\gamma$ partitioned as $\gamma = (\gamma_1; \gamma_2)$ where $\gamma_1$ and $\gamma_2$ are respectively the diffuse effects associated with the initial state and the regression parameter. With this redefined SSM, we argue in the following section that, from the standpoint of likelihood evaluation, De Jong's collapse of the DKF to the KF only necessitates the augmentation of the states by $\gamma_2$. This KF, which we label the Augmented KF (AKF), coincides with the alternative algorithms to the DKF (discussed above) since these are based on the ASSM wherein the states are augmented by the regression parameter $\beta$. We next show in section 3 that an analogue to the AKF is a column-reduced DKF, labelled the collapsed DKF (CDKF), where the appropriate submatrices of $A_t$, $E_t$ and $Q_t$ associated with $\gamma_1$ are partialled out after an initial stretch of the observations has been processed. Both the AKF and CDKF coincide with the KF in the absence of a regression parameter in the SSM. Square root forms of the AKF and CDKF and their associated smoothing algorithms are also described in these two sections. Section 4 is devoted to the comparison of the computational aspects of the DKF, AKF and CDKF. We conclude that the CDKF is generally more efficient than either the DKF or the AKF since (i) it employs matrices $A_t$, $E_t$ and $Q_t$ with lower dimensionalities than the DKF and (ii) it carries recursions for mse matrices $P_t$ of lower dimensionalities than the AKF.
This tells us that the performance of filtering and smoothing algorithms is significantly more affected by the number of rows than the number of columns in their pertinent matrices and thereby validates our reservation (see Chapter 2) on incorporating a regression parameter within the state, as in the ASSM specification.

5.1 The Canonical Form of the Diffuse State Space Model

In the previous Chapter, the DSSM was anchored with $\alpha_0 = a + A\gamma$ and $\beta = b + B\gamma$ with $(A; B)$ of full column rank. The following result shows that without any loss of generality, $(A; B)$ may assume a canonical structure.

Lemma 5.1 The DSSM (see Definition 4.2) can be transformed such that $(A; B)$ has the canonical structure $\{(A_1, A_2); (0, \mathcal{B})\}$ where $A_1$ and $\mathcal{B}$ respectively have the same row dimensions as $A$ and $B$.

Proof. There exists a nonsingular matrix $Q$ such that $(A; B)Q$ has the stated canonical structure. The effect of $Q$ is undone upon reinterpreting $\gamma$ as $Q^{-1}\gamma$. □

The canonical structure on $(A; B)$ often arises naturally; therefore it is rarely necessary to determine the transformation matrix $Q$. In most applications, $A_2 = 0$ but this does not necessarily entail further simplifications of the results reported in this Chapter. Also note that the theoretical results developed in the previous Chapter are unaffected by nonsingular transformations of the diffuse parameter. Henceforth the DSSM will always be anchored with,

$\alpha_0 = a + (A_1, A_2)(\gamma_1; \gamma_2)$ and $\beta = b + (0, \mathcal{B})(\gamma_1; \gamma_2)$

Consequently, $\gamma_1$ and $\gamma_2$ can then be viewed as the diffuse parameters associated respectively with the initial state and the regression parameter. The next two sections discuss two ramifications of partitioning $\gamma$ in such a fashion.
5.2 Switching from the DKF to the KF

De Jong (1991a) rewrites the DSSM as,

$y_t = X_tb + (Z_t, X_tB)\begin{pmatrix} \alpha_t \\ \gamma \end{pmatrix} + G_tu_t$ (5.1)

$\begin{pmatrix} \alpha_{t+1} \\ \gamma \end{pmatrix} = \begin{pmatrix} W_t \\ 0 \end{pmatrix}b + \begin{pmatrix} T_t & W_tB \\ 0 & I \end{pmatrix}\begin{pmatrix} \alpha_t \\ \gamma \end{pmatrix} + \begin{pmatrix} H_t \\ 0 \end{pmatrix}u_t$ (5.2)

This is a SSM with states augmented by the diffuse parameter $\gamma$. Clearly if $(\hat\alpha_t; \hat\gamma_t)$ is available and corresponds to the limiting predictor of $(\alpha_t; \gamma)$ using $(y_1, \ldots, y_{t-1})$, then subsequent iterations of the KF (applied to (5.1)-(5.2)) yield $(\hat\alpha_r; \hat\gamma_r)$, $r > t$, and the associated error covariance matrix.

Since $B = (0, \mathcal{B})$ where the zero matrix consists of $\gamma_1^\#$ columns, it follows that the first $\gamma_1^\#$ columns of both $X_tB$ and $W_tB$ only contain zero entries. Consequently $\gamma_1$, apart from being captured in the initial state $\alpha_1$, does not figure in either $y_t$ or $\alpha_t$ and hence it need not be accommodated in the augmented state in (5.2). In essence then, the omission of $\gamma_1$ from the augmented state has no repercussion with regards to likelihood evaluation or prediction in the SSM. This observation leads us to the definition of the Augmented KF (AKF).

Definition 5.1 The AKF is the KF applied to the SSM,

$y_t = X_tb + \mathcal{Z}_t\delta_t + G_tu_t$, $\delta_{t+1} = \mathcal{W}_tb + \mathcal{T}_t\delta_t + \mathcal{H}_tu_t$ (5.3)

where

$\delta_t = \begin{pmatrix} \alpha_t \\ \gamma_2 \end{pmatrix}$, $\mathcal{Z}_t = (Z_t, X_t\mathcal{B})$, $\mathcal{W}_t = \begin{pmatrix} W_t \\ 0 \end{pmatrix}$, $\mathcal{T}_t = \begin{pmatrix} T_t & W_t\mathcal{B} \\ 0 & I \end{pmatrix}$ and $\mathcal{H}_t = \begin{pmatrix} H_t \\ 0 \end{pmatrix}$

Remarks

1. The AKF coincides with the alternative KF-based algorithms listed in the preamble since these are based on the ASSM wherein the regression parameter is incorporated within the state.

2. The AKF coincides with the KF when $\beta$ is null.

3. When $\gamma_1^\# > 0$, it ensues that the state in (5.3) has $\gamma_1^\#$ fewer components than the state in (5.2). Consequently the AKF will outperform the KF based on (5.1)-(5.2). The computational savings can be appreciable, as for instance with monthly seasonal data when $\gamma_1^\# \ge 11$.

4. The AKF does not update the estimate of $\gamma_1$. However $\mathrm{Pred}(\gamma_1 \mid y_1; \ldots; y_n)$ is sometimes required for smoothing purposes.
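The reconstruction of this estimator is dealt with in the proof of Theorem 5.2.

We now turn to the problem of the appropriate initialization of the AKF. This requires the construction of the limiting predictor of the augmented state and its associated error covariance matrix at the point of collapse.

Theorem 5.1 Suppose $y$ is generated by the DSSM. Apply the DKF and partition,

$A_m = (A_{m\gamma}, a_m)$ and $Q_m = \begin{pmatrix} S_m & s_m \\ s_m' & q_m \end{pmatrix}$

Suppose $S_m$ is nonsingular and for $t \ge m$, replace $B = (0, \mathcal{B})$ by $\mathcal{B}$ and run the AKF initialized with

$\hat\delta_m = \begin{pmatrix} \hat\alpha_m \\ \hat\gamma_{m,2} \end{pmatrix} = \begin{pmatrix} a_m \\ 0 \end{pmatrix} - \begin{pmatrix} A_{m\gamma} \\ (0, -I) \end{pmatrix}S_m^{-1}s_m$ and

$\mathcal{P}_m = \sigma^{-2}\mathrm{Mse}(\hat\delta_m) = \begin{pmatrix} P_m + A_{m\gamma}S_m^{-1}A_{m\gamma}' & -A_{m\gamma}S_m^{*2} \\ -S_m^{*2\prime}A_{m\gamma}' & S_m^{22} \end{pmatrix}$

where $I$ is of order $\gamma_2^\#$, $S_m^{*2}$ denotes the last $\gamma_2^\#$ columns of $S_m^{-1}$ and $S_m^{22} = \sigma^{-2}\mathrm{Mse}(\hat\gamma_{m,2})$ is the bottom diagonal block of order $\gamma_2^\#$ of $S_m^{-1}$. Then subsequent recursions of the AKF yield the limiting predictor of $\delta_t$, $t > m$, and its error covariance matrix.

Furthermore if the recursion $q_{t+1}^* = q_t^* + e_t'D_t^{-1}e_t$, where $e_t$ is the innovation at time $t$, $\mathrm{Cov}(e_t) = \sigma^2D_t$ and $q_m^* = q_m - s_m'S_m^{-1}s_m$, is attached to the KF, then $-2 \times$ the diffuse log-likelihood, apart from a constant, is given by,

$\lambda_d(y) = (y^\# - \gamma^\#)\log\sigma^2 + \log|S_m| + \sum_{t=1}^{n}\log|D_t| + \sigma^{-2}q_{n+1}^*$

Proof. It follows from Theorem 4.6 that $\hat\delta_m$ is the limiting predictor of $\delta_m$ and that $\mathcal{P}_m = \sigma^{-2}\mathrm{Mse}(\hat\delta_m)$. Therefore subsequent iterations of the AKF yield the limiting predictor of $\delta_t$, $t > m$, and its error covariance matrix. The second result is obtained as follows. Let $C = \sigma^{-2}\mathrm{Cov}(\gamma) \to \infty$. Then

$\lambda(y) - \log|C| = \{\lambda(y_1; \ldots; y_{m-1}) - \log|C|\} + \lambda(y_m; \ldots; y_n \mid y_1; \ldots; y_{m-1})$

The first term on the right tends to $\{(y_1; \ldots; y_{m-1})^\# - \gamma^\#\}\log\sigma^2 + \log|S_m| + \sum_{t=1}^{m-1}\log|D_t| + \sigma^{-2}q_m^*$, while the second term is supplied by the AKF through its innovations $e_t$ and their covariance matrices $\sigma^2D_t$, $t \ge m$. Adding the two contributions gives the stated expression. This completes the proof of the Theorem. □

With scalar observations, the switch from the DKF to the AKF can take place at the earliest when $m = 1 + \gamma^\#$, in which case $q_m^* = 0$. This will be the case when each iteration of the DKF leads to the identification of a separate component of $\gamma$. This however is not the case in general.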
For instance if for $1 \le t \le \gamma^\#$ either of $Z_t$ or $X_t$ or $W_t$ is equal to a zero matrix then more than $\gamma^\#$ iterations of the DKF will be required to identify $\gamma$ and in this case $q_m^* \ne 0$. With vector observations, fewer than $\gamma^\#$ iterations may be needed to identify $\gamma$.

The AKF has three attractive features vis-a-vis the DKF. First, it automatically generates (i) limiting estimates of the states and the innovations and (ii) gls estimates of the regression parameter; with the DKF, such estimates are only obtained after further computations. Second, it employs a scalar recursion for $q_t^*$ as opposed to the matrix recursion for $Q_t$ in the DKF. This implies that $y^\#\hat\sigma^2 = q_{n+1}^*$ can be read off from the AKF; this compares with the DKF where extra computations are required to evaluate the same estimate. Third, it is numerically more stable than the DKF since it does not carry a recursion for $S_t$. The latter was shown to explode when the DKF was applied to nonstationary noninvertible ARMA models. These advantages are however overridden by the fact that the high dimensionality of the augmented state implies time-consuming computations of crucial matrices in the AKF, in particular the computation of the state error covariance matrix $\sigma^2P_t$. This fact will be highlighted during the discussion on the efficiencies of collapsing strategies in section 5.4.

The square root form of the AKF is as described in Theorem 2.3 except for the modified system matrices which are specified in (5.3). The next Lemma describes the safe computation of its initializing quantities (i.e. $\hat\delta_m$, $\mathcal{P}_m^{1/2}$ and $q_m^*$) using output generated by the square root DKF.

Lemma 5.2 Suppose the square root form DKF described in section 4.3.2 is run until say $t = m$ when $\mathrm{rank}(\mathcal{Q}) = \gamma_1^\# + \gamma_2^\#$. Write (i) $Q_m^{1/2} = \{(\mathcal{Q}, w); (0, r)\}$ where $\mathcal{Q} = \{(\mathcal{Q}_{11}, \mathcal{Q}_{12}); (0, \mathcal{Q}_{22})\}$ with $\mathcal{Q}_{11}$ and $\mathcal{Q}_{22}$ both square matrices with respective order $\gamma_1^\#$ and $\gamma_2^\#$, $w$ is a vector and $r$ is a scalar and (ii) $A_m = (A_{m\gamma}, a_m) = (A_{m1}, A_{m2}, a_m)$ where
Efficient Algorithms for the State Space Model 78am is the final column of Am while Ami consists of the first columns of Am. Then thereinitializations described in Theorem 5.1 can be evaluated as follows :— ( &m — (am “ ( Am7 -13m— I ii ii iQ W,7m,2 ) 0 ) \ {0,—17#})pi/2= ((p112, AmiQ)U Am7S*20= r and \€—A+logQwhere U is an orthogonal matrix, (P,,/2,AmiQ’)U has no trailing zero columns andc’*2— I t—ii f—i. ç’—i— -12.22 , -22Proof. With matrix Q’ as stated, it follows that(Sm QQI(Q o(Q w(Q’Q Q’wk. s q ) w’ r ) \ 0 r ) \ w’Q w’w+r2Therefore ISmI = 1Q12 and q — sS;’sm = r2 and these assert the reinitialization of Xand The expression for L ensues upon noting that S;’sm = (Q’Q)1Q w = Q1w.Using the canonical expression for Q it follows that,S;;;1 == ( Qr —Q1Q2j ) ( -1 o )0 —Q22 Qi211 Q22/ ,-—1ç—1’ I r—lç ,,—1,-,—l’rv ç—1’ ,——1r ,—1t---1’-11 -11 1 ..12 -‘22 22 -12 -iii — -ii 412 -22 22I —1 (—i (I ,—1 j—1’— 22 -22-12 11 -22 -22It is then a direct task, using this expression, to verify that Pj2 when post-multipliedby its transpose equals Pm as specified in Theorem 5.1. .It is clear that for I m, the smoothing algorithm associated with the KF (seeTheorem 2.2) can also be used in conjunction with the AKF provided that the pertinentChapter 5. Efficient Algorithms for the State Space Model 79matrices are appropriately redefined. This smoothing procedure is however not efficientsince it entails the redundant smoothing of 72. This is explained by the fact that 72 canbe viewed as a (diffuse) regression parameter and hence its smoothed estimate coincideswith the final estimate provided by the AKF. An extra drawback arises when t < msince & is not available then. Thus for t < m, we need to revert to a diffuse smoothingalgorithm. 
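The following Theorem specifies the necessary adjustments for extending the smoothing algorithm associated with the AKF to a diffuse smoothing algorithm.

Theorem 5.2 Suppose the DKF is switched to the AKF at $t = m$ and for $t \ge m$, run the smoothing algorithm described in Theorem 2.2 using AKF-derived data. Partition $\eta_{m-1} = (n_1; n_2)$ and $R_{m-1} = \{(R_{11}, R_{12}); (R_{21}, R_{22})\}$, with $n_1$ and $R_{11} = \sigma^{-2}\mathrm{Cov}(n_1)$ of the same order as the unaugmented state $\alpha_m$. For $t < m$, carry the modified recursions

$N_{t-1} = Z_t'D_t^{-1}E_t + L_t'N_t$ and $R_{t-1} = Z_t'D_t^{-1}Z_t + L_t'R_tL_t$

where $N_{m-1} = (0, n_1)$ with matrix 0 having $\gamma_1^\# + \gamma_2^\#$ columns and $R_{m-1} = R_{11}$. Then for $1 \le t \le r \le m$, $\tilde\alpha_t$ and $\sigma^{-2}\mathrm{Mse}(\tilde\alpha_t, \tilde\alpha_r)$ are respectively given by,

$F_t(\omega; 1)$ and $P_tL_{r-1,t}(I - R_{r-1}P_r) + F_{t\gamma}WF_{r\gamma}' - P_tL_{m,t}JF_{r\gamma}' - F_{t\gamma}J'L_{m,r}'P_r$

where $F_t = A_t + P_tN_{t-1}$, $\omega = H\eta_{m-1} - S_m^{-1}s_m$, $H = S_m^{-1}\{A_{m\gamma}', (0; -I)\}$ with matrix 0 having dimensions $\gamma_1^\# \times \gamma_2^\#$, $W = \sigma^{-2}\mathrm{Mse}(\omega) = S_m^{-1} - HR_{m-1}H'$ and $J = (R_{11}, R_{12})H'$.

The proof of this Theorem (and Theorems 5.4 and 6.1) requires the following result.

Lemma 5.3 Consider the KF recursion given in Theorem 2.1. Then for $t \ge 1$,

$e_t = Z_t(\alpha_t - \hat\alpha_t) + G_tu_t$ and $\alpha_{t+1} - \hat\alpha_{t+1} = L_t(\alpha_t - \hat\alpha_t) + J_tu_t$

where $L_t = T_t - K_tZ_t$ and $J_t = H_t - K_tG_t$.

Proof. For the first result, observe that

$e_t = y_t - X_t\beta - Z_t\hat\alpha_t$
$= (X_t\beta + Z_t\alpha_t + G_tu_t) - X_t\beta - Z_t\hat\alpha_t$
$= Z_t(\alpha_t - \hat\alpha_t) + G_tu_t$

This result is now used to prove the second result

$\alpha_{t+1} - \hat\alpha_{t+1} = (W_t\beta + T_t\alpha_t + H_tu_t) - (W_t\beta + T_t\hat\alpha_t + K_te_t)$
$= T_t(\alpha_t - \hat\alpha_t) + H_tu_t - K_t\{Z_t(\alpha_t - \hat\alpha_t) + G_tu_t\}$
$= (T_t - K_tZ_t)(\alpha_t - \hat\alpha_t) + (H_t - K_tG_t)u_t$
$= L_t(\alpha_t - \hat\alpha_t) + J_tu_t$

This concludes the proof of the Lemma. □

Remark

The results of Lemma 5.3 still hold when the AKF is applied to the SSM defined by (5.3) provided that $\alpha_t$ is interpreted as the augmented state $\delta_t = (\alpha_t; \gamma_2)$ and all the matrices are appropriately redefined.

Proof of Theorem 5.2. We first show that $-\omega = \hat\gamma_{n+1} = \mathrm{Pred}(\gamma \mid y_1; \ldots; y_n)$. For $t \ge m$, let $\delta_t = (\alpha_t; \gamma_2)$, $\mathcal{Z}_t = (Z_t, X_t\mathcal{B})$, $\mathcal{L}_t = \mathcal{T}_t - K_t\mathcal{Z}_t$ where $\mathcal{T}_t$ is the transition matrix in (5.4) and $K_t$ is the Kalman gain matrix generated by the AKF at time $t$. Then

$\mathrm{Pred}(\gamma \mid y_1; \ldots; y_n) = \mathrm{Pred}(\gamma \mid y_1; \ldots; y_{m-1}; e_m; \ldots; e_n)$
$= \mathrm{Pred}(\gamma \mid y_1; \ldots; y_{m-1}) + \sigma^{-2}\sum_{t=m}^{n}\mathrm{Cov}(\gamma, e_t)D_t^{-1}e_t$
$= S_m^{-1}s_m + \sigma^{-2}\sum_{t=m}^{n}\mathrm{Cov}(\gamma, \delta_t - \hat\delta_t)\mathcal{Z}_t'D_t^{-1}e_t$
$= S_m^{-1}s_m + \sigma^{-2}\mathrm{Cov}(\gamma, \delta_m - \hat\delta_m)\sum_{t=m}^{n}\mathcal{L}_{t-1,m}\mathcal{Z}_t'D_t^{-1}e_t$
$= S_m^{-1}s_m + \sigma^{-2}\mathrm{Cov}\{\gamma, -H'S_m(\gamma - \hat\gamma_m)\}\eta_{m-1}$
$= S_m^{-1}s_m - \sigma^{-2}\mathrm{Mse}(\hat\gamma_m)S_mH\eta_{m-1}$
$= S_m^{-1}s_m - H\eta_{m-1}$

where $\mathcal{L}_{t-1,m} = \prod_{i=m}^{t-1}\mathcal{L}_i'$, with $\mathcal{L}_{m-1,m} = I$. The second equality follows since the stack of observations $(y_1; \ldots; y_{m-1})$ is uncorrelated with $(e_m; \ldots; e_n)$, which is the stack of innovations produced by the AKF for $t \ge m$. The third and fourth equalities use Lemma 5.3 and the fact that $\gamma$ and $u_t$ are uncorrelated for all $t$. In the fifth equality, the expression for $\delta_m - \hat\delta_m$ follows from Theorem 5.1. Furthermore repeated backsubstitutions of $\eta_t$ as defined in Theorem 2.2 show that $\eta_{m-1}$ is as asserted.

We now show that $\mathrm{Mse}(\hat\gamma_{n+1}) = \sigma^2W$:

$\mathrm{Mse}(\hat\gamma_{n+1}) = \mathrm{Cov}\{\gamma, \gamma - \mathrm{Pred}(\gamma \mid y)\}$
$= \mathrm{Cov}\{\gamma, \gamma - \mathrm{Pred}(\gamma \mid y_1; \ldots; y_{m-1})\} + \mathrm{Cov}(\gamma, \eta_{m-1})H'$
$= \mathrm{Mse}(\hat\gamma_m) + \mathrm{Cov}(S_m^{-1}s_m - H\eta_{m-1}, \eta_{m-1})H'$
$= \sigma^2(S_m^{-1} - HR_{m-1}H') = \sigma^2W$

The above expression for $\hat\gamma_{n+1}$ implies that,

$\tilde\alpha_t = F_t(-\hat\gamma_{n+1}; 1) = (F_{t\gamma}, f_t)(H\eta_{m-1} - S_m^{-1}s_m; 1)$
$= \{F_{t\gamma}(-S_m^{-1}s_m) + f_t\} + F_{t\gamma}H\eta_{m-1}$
$= \hat\alpha_{t|m} + F_{t\gamma}H\eta_{m-1}$

where $F_t = (F_{t\gamma}, f_t)$ and $\hat\alpha_{t|m} = \mathrm{Pred}(\alpha_t \mid y_1; \ldots; y_{m-1})$.

Let $P_{t,r} = P_tL_{r,t}$. Then for $1 \le t \le r \le m$,

$\sigma^{-2}\mathrm{Mse}(\tilde\alpha_t, \tilde\alpha_r) = \sigma^{-2}\mathrm{Cov}(\alpha_t - \tilde\alpha_t, \alpha_r - \tilde\alpha_r)$
$= \sigma^{-2}\mathrm{Cov}(\alpha_t - \hat\alpha_{t|m} - F_{t\gamma}H\eta_{m-1}, \alpha_r - \hat\alpha_{r|m} - F_{r\gamma}H\eta_{m-1})$
$= \sigma^{-2}\{\mathrm{Mse}(\hat\alpha_{t|m}, \hat\alpha_{r|m}) + \mathrm{Cov}(F_{t\gamma}H\eta_{m-1}, F_{r\gamma}H\eta_{m-1}) - \mathrm{Cov}(\alpha_t - \tilde\alpha_t + F_{t\gamma}H\eta_{m-1}, F_{r\gamma}H\eta_{m-1}) - \mathrm{Cov}(F_{t\gamma}H\eta_{m-1}, \alpha_r - \tilde\alpha_r + F_{r\gamma}H\eta_{m-1})\}$
$= \sigma^{-2}\{\mathrm{Mse}(\hat\alpha_{t|m}, \hat\alpha_{r|m}) - \mathrm{Cov}(F_{t\gamma}H\eta_{m-1}, F_{r\gamma}H\eta_{m-1}) - \mathrm{Cov}(\alpha_t - \tilde\alpha_t, F_{r\gamma}H\eta_{m-1}) - \mathrm{Cov}(F_{t\gamma}H\eta_{m-1}, \alpha_r - \tilde\alpha_r)\}$
$= \{P_{t,r-1}(I - R_{r-1}^*P_r) + F_{t\gamma}S_m^{-1}F_{r\gamma}' - P_{t,m}R_{11}P_{r,m}'\} - F_{t\gamma}HR_{m-1}H'F_{r\gamma}' - \sigma^{-2}\mathrm{Cov}(P_{t,m}n_1, \eta_{m-1})H'F_{r\gamma}' - \sigma^{-2}F_{t\gamma}H\mathrm{Cov}(\eta_{m-1}, n_1)P_{r,m}'$
$= P_{t,r-1}\{I - (R_{r-1} - L_{m,r}R_{11}L_{m,r}')P_r\} - P_{t,m}R_{11}P_{r,m}' + F_{t\gamma}WF_{r\gamma}' - P_{t,m}(R_{11}, R_{12})H'F_{r\gamma}' - F_{t\gamma}H(R_{11}; R_{21})P_{r,m}'$
$= P_{t,r-1}(I - R_{r-1}P_r) + P_{t,m}R_{11}P_{r,m}' - P_{t,m}R_{11}P_{r,m}' + F_{t\gamma}WF_{r\gamma}' - P_{t,m}JF_{r\gamma}' - F_{t\gamma}J'P_{r,m}'$
$= P_{t,r-1}(I - R_{r-1}P_r) + F_{t\gamma}WF_{r\gamma}' - P_{t,m}JF_{r\gamma}' - F_{t\gamma}J'P_{r,m}'$

where the expression for $\mathrm{Mse}(\hat\alpha_{t|m}, \hat\alpha_{r|m})$ is obtained upon noting that

$\hat\alpha_{t|m} = (A_t + P_tN_{t-1}^*)(-S_m^{-1}s_m; 1)$

with $N_{m-1}^* = 0$, and hence $R_t^*$ follows the same recursion as $R_t$ except that $R_{m-1}^* = 0$. This concludes the proof of the Theorem. □

Remarks

1.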
For $1 \le t \le r \le m$, the expression for $\mathrm{Mse}(\hat\alpha_t, \hat\alpha_r)$ is, except for a couple of adjustment terms induced by the switch from the smoothing algorithm associated with the KF to the diffuse smoothing algorithm, equal to the mse expression given in Theorem 4.6 for an uncollapsed DKF.
2. For $1 \le t < m < r \le n+1$, $\mathrm{Mse}(\hat\alpha_t, \hat\alpha_r)$ can be derived in a fashion analogous to Theorem 5.2. This is however not required in this thesis since we only use lag zero and lag one mse matrices of predictors of the states.

With regard to square-root smoothing, observe that the square-root smoothing algorithm associated with the KF (see section 2.4.6) can be employed using the AKF-derived data in the post-switch time period (i.e. $t \ge m$). In the pre-switch time period however, square-root propagation of the lag zero mse matrices of the states does not appear to be possible in light of the adjustments arising upon the switch from the AKF to the DKF. Ansley and Kohn (1990) make the same comments with regard to the smoothing algorithm associated with the AKKF.

The final section of this Chapter makes clear that the switch from the DKF to the AKF is not advantageous save for the case of a null $\beta$. This conclusion is explained by the fact that the AKF is based on a SSM with augmented states and, as we have previously noted in Chapter 2, it is the dimensions of the states which determine the performances of filtering and smoothing algorithms. The next section discusses a collapse of the DKF. This actually reduces the dimensions of pertinent matrices in the DKF and hence it is of major computational interest.

5.3 The Collapsed DKF

The preceding section suggests the idea of partialling out the effect of $\gamma_1$ in the DKF itself after an initial run of the latter. Specifically this entails the partialling out of those columns and rows of $E_t$, $A_t$ and $Q_t$ which relate to $\gamma_1$.
This modified DKF, which welabel the collapsed DKF (CDKF), can be viewed as an analogue of the AKF since theonly diffuse effects that it estimates are those associated with the regression parameter.We now state the main result of this section.Chapter 5. Efficient Algorithms for the State Space Model 84Theorem 5.3 Suppose the DKF is applied to observations y generated by the DSSMuntil t = m. Partition Am and Qm conformably with‘and 72,/ Smii Sm12 5m1Am (Aml,Am2,am) and Qm = Sm21 Sm22 3m2I ISmi 8m2 q,and suppose that Sm11 is nonsingular. For t m, replace B = (0, B) by B where thelatter is defined in Lemma 5.1 and reinitialize Am, Pm and Qm as followsAm = (Am2 — AmiSiSmi2,am — AmiSimi), Pm = Fm + AmiS1 and( Sm.i Sm2lSrnliSmi —Qm1I I C—iO I I 0—1\ S)1J— 8m2 q, —8mi’m1i5i0 (1wnere ?m22.1— 0m22—‘m21mip-’m12•Then this collapsed DKF can be employed in lieu of the standard DKF for limiting prediction of the states and gis estimation of !. Furthermore —2 x the diffuse log-likelihood,apart from a constant, is=— 7#)lOg 2 +ogDt +logSmii +logS+i+ a (qn+i—Proof. Without any loss in generality, assume= (-xi; 72) ‘— N{O, diag a2(Ci,C2)}.Let Ci —* co. Then from Lemma 4.1,Pred(jyi; . .. ; ym—i) = {Sm + diag(0, C)}m anda2Mse(7Iy1;. .. ; ym—i) = {Sm + diag(0, C’)}Observe that {Sm + diag(0, C’)} can be written as,—iI ( i 0 ‘\ ( Sm11 0 ‘ ( I ShSm12SmiS1 0 Sm22.i + Cl) 0 JChapter 5. Efficient Algorithms for the State Space Model 85Ir c—1c \ Icy—iI ‘ “m11’m12 1 I mi1—i ) 0 {Sm22i+C1}’ —sm2is;h I= UDU’Let = (Sm22.i+ C’)-1 and p = 1(Sm2 —SmiSi i). Then using the above identity,the predictor of conditional on (yi;. . . ; y—) jS= am — (Ami,Am2)UD’U’(smi;sm2)= (Am2 — AmiSiSmi2,am — AmiSismi)(p; 1) and2Mse(&m) = Pm + (Ami,Am)UDU’(Ami,Am— i A cy—i A i I A A a—i a \flI A A a—i a \— Fm 1 mi’m1imi 1- k/1m2— m1mi1mi2) ‘m2 —‘1mimii°mi2)Therefore (ym; . . . 
; y) can be envisaged as being generated by a DSSM anchored with,= (am— AmiSiSmi) + (Am2 — Am1SlSmi2)W where w .‘ {p,cr2f)Thus the DKF, reinitialized as stated in the Theorem, yields the same limiting predictorsas the standard DKF.Next, express the diffuse log-likelihood asAd(y)= )‘(yi; . .. ; ym_i) + )‘(ym;.. . ; iIyi;.. . ; Ym—i) —log Iu2CiI — log Ia2CThe two likelihoods on the right can be evaluated by appealing to Theorem 4.1 andDefinition 4.4. These assert that the log-likelihood of a diffuse SSM with the diffuseparameter (c,u2C) is given by,(y#+ log JC’ + S +u2{q + c’C’c — (s + Cc)’(C + S)(s + C_ic)}Chapter 5. Efficient Algorithms for the State Space Model 86Recall Y1 (0,o2Ci). Thus as C1 —‘ oo,rn—iym—i) — log Ia2CiI = {(yi; . . . ; Yrn_i)# — 7}log cr2 + log DtI+ log I5rn + diag(O, C)I + log Io2C+ [q — ,,{Sm + diag(O, C’)}srn1/2{(yl;...;yrn_i)# _7}logcr2 +logD+ log I5VmiiI — log II + log Ia2C+ cr2(qm — SiSiSrni —Furthermore A(ym;... ; yyi;.. . ; y,,_i) equals(y;...;y)#1ogu2 +S1j+ +— (+i + it)’(1 + +Q1t)}where Q {(S, .s); (4’, q)} follows the same recursion as Q except that it is initializedwith Q = 0. Now let C2 —* oo. Then in view of the initialization of Qrn, it is a simpletask to ascertain the stated expression for )d(). This concludes the proof of the Theorem.Remarks1. With scalar observations, the earliest time that the DKF can be collapsed to theCDKF is when m = 1 + y, in which case Qrn = 0. This contrasts to the switchfrom the DKF to the AKF which can take place at m = 1 + y at the earliest.2. When /3 is null, both the CDKF and the AKF coincide with the KF.3. The matrix U performs a sweep operation : it factors out the effect of in theDKF. As such, the matrices A, E and Q generated by the CDKF respectivelycorrespond to the71-partialled-out versions of A, E and Qt in the DKF.Chapter 5. Efficient Algorithms for the State Space Model 874. The matrix.A defined in Lemma 5.1 can be in row-echelon form. 
Then using theideas in Theorem 5.3, it is possible to implement a progressive collapse of the DKF.This amounts to partialling out the columns related to those distinct elements of yias soon as the latters are identified. Ansley and Kohn (1990) employ this strategyin turning their AKKF to the AKF. However in our opinion, progressive collapsingis not an attractive proposition in view of the intricate bookeeping that is requiredwhen smoothing (see Theorems 5.2 and 5.4) is called for.We now consider the square root form of the CDKF. It follows the line of the squareroot form of the DKF described in section 3.2.3 except for the modification of the pertinent matrices for t rn. The following Lemma indicates the safe computations of thereinitialized quantities.Lemma 5.4 Suppose the square root form DKF described in section 4.3.2 is run fort < m where m is as described in Theorem 5.2. WriteQ11 Q12 w1= 0 Q22 w20 Orwhere for i 1,2, square matrix Q22 and vector w both conform with and r is a scalar.Then the reinitializations described in Theorem 5.3 can be obtained as follows :Am = (Am2— AmiQQi2,am— AmiQri’wi), -p’!2 = (P,/2,AmiQj)U,1’2’ w2= I I and =+logQnO r)where U is any orthogonal matrix and pf2 is without any trailing zero columns.Chapter 5. Efficient Algorithms for the State Space Model 88Q’I11 Q’112— iV / fV i I (,I / I-1211 1212 1-22 -22 .12W 1wQ11 wQ12 + wQ22 ww1 + ww + r2I /I \ —1——1= kii11) 1112— ii 12,— IfV fi \—1fl — f— 11—11) liWl — ..11) W1,IQI. \ /i1 r I tV \ —1 i-’ I,= i 12 12 + 22 22) — 1211k11 11) ii 12 = 2222,= (QQi)Qiwi — (Q2wi + Q2w)= —Q2w,= (wwi + ww2 + r2) — wQn((QiQn)1Qiwi = ww + r2We then obtain upon using these expressions, the asserted reinitializations of .A, Am, ‘ph2and Q’/2.It is clear that the collapse the DKF to the CDKF also permeates to the relatedsmoothing algorithm. Note that the smoothing algorithm related to the CDICF makesuse of Pred(72yi; . .. ; y,). Hovever for t < m, this smoothing algorithm additionallyrequires Pred(71yi; . .. ; yj. 
The next Theorem details this reconstruction as well asother necessary adjustments to this smoothing algorithm for the pre-collapse time period.Theorem 5.4 Suppose the DKF is collapsed to the CDKF at t m. Put N = 0, azero matrix with dimensions a+i x (y + 1), and R 0, a square matrix of order a,+i.IterateProof. With Q’ as specified in the Lemma, we obtainSmii Sm12 8m1 Q1 0 0Sm21 Sm22 5m = Qu12 Q2 05rn2 q w w rQ1100Q 12 W1Q22 W20 rqi/2Therefore,-‘m11 Q’H andSiSmi2ShsrniSm22.1Sm21S18 — Sm2I c—iq—8mimh131N.1 = ZD’E + LN and Ji_1 = ZDZ + LRLChapter 5. Efficient Algorithms for the State Space Model 89except that fort < m, N is replaced byJVg WZthA1m_i (0, Nm_i) where 0 consists of7columns. Then for 1 m t r, & andcr2Mse(&t, &r) are obtained as in Theorem4.6 provided is replaced by 7n+i,2•Furthermore for 1 I r m, à anda2Mse(àt, &r) are respectively,Ft(w; 1), PL._1,(I— Rr_iPr) +F,7WF,— PtL,JF,1— FtiJ’Lm,rPrwhere/ “ ‘\ IT c—i r’I 7n+1,1 .Lmi — *m11Smi — ‘.372W1 1=1\ 7n+i,2 ) \-2 (sh — HRm_iH’ + G1’G’ GFW = o Mse(w)=IPG’ F)F1 and Fe,.,, respectively denote the first y1# and the first 7# columns of F =H =S1A,G = HNm_i,2— SiSmi2 where Nm_i = (Nm_1,2, flm—i) with Ttm1 as itsfinal column, I’ =oMse(-/+i,)and J = Rm_1H’.Proof. From Theorems 5.3 and 4.6, Pred{aml(7i;72)} is given by either71 /(Ami, Am2, am)—72 or (Am2 — AmiS1S i,am — AmiSimi) ( —72\11This implies after direct algebra that‘ = Simi — SiSmi272.HencePred(7iIy) = Pred(S1smiIy)—Pred(S1Smi272y)= Pred(Sismi ui; . . . ; Ym—1 em;. . . ; e,j — SiSmi2/= S18mi + -2 COV(Simi, t — ât)ZD’et— SlSm12n+l,2= Si(mi— Sm125’n+i,2) +cTCov(S.hsml, tm— &m)Nm1(n+i,2; 1)Chapter 5. Efficient Algorithms for the State Space Model 90a—i I a— ‘‘miikm1 — mi27n+i,2+J2COV(SmiSmi, —AmiSismi)Nm_i(5’n+i,;1)= S•i(mi— Smi25’n+i,2) — SiAiNm_i(’5’n+i,;1)= SiSmi + G5’+i,2 — Hflm_iwhere et = Et(—5’2, ; 1) is the limiting innovation and Nm_i(—5’n÷i,2;1) coincides with n1in Theorem 5.2. 
The third and fourth equalities, as in the proof of Theorem 5.2, follow from Lemma 5.3. Therefore $w = \mathrm{Pred}(-\gamma \mid y)$ can be written as,
$$w = \begin{pmatrix} I & G \\ 0 & I \end{pmatrix} \begin{pmatrix} H\eta_{m-1} - S_{m11}^{-1}s_{m1} \\ -\hat\gamma_{n+1,2} \end{pmatrix}$$
Observe that $\sigma^{-2}\mathrm{Mse}(H\eta_{m-1} - S_{m11}^{-1}s_{m1}) = S_{m11}^{-1} - HR_{m-1}H'$ (as in Theorem 5.2) and that $H\eta_{m-1} - S_{m11}^{-1}s_{m1}$ is uncorrelated with $\hat\gamma_{n+1,2}$. Therefore
$$\sigma^{-2}\mathrm{Mse}(\gamma \mid y) = \begin{pmatrix} I & G \\ 0 & I \end{pmatrix} \begin{pmatrix} S_{m11}^{-1} - HR_{m-1}H' & 0 \\ 0 & \Gamma \end{pmatrix} \begin{pmatrix} I & 0 \\ G' & I \end{pmatrix} = \begin{pmatrix} S_{m11}^{-1} - HR_{m-1}H' + G\Gamma G' & G\Gamma \\ \Gamma G' & \Gamma \end{pmatrix} = W$$
Now observe that in both the pre-switch (from DKF to AKF) and the pre-collapse (from DKF to CDKF) periods, we employ the same DKF quantities. Furthermore the adjustments to $N_{m-1}$ in both Theorem 5.2 and the present Theorem coincide since $N_{m-1}$ as described in this Theorem can always be written as $N_{m-1} = (0, n_1)$ upon factoring out the effect of $\gamma_2$ and this in turn leads to a reinitialization $\mathcal{N}_{m-1} = (0, n_1)$ as in Theorem 5.2. Thus the arguments developed in the proof of the cross state mse expressions in Theorem 5.2 apply verbatim here. This concludes the proof of the Theorem.

A square-root form for the smoothing algorithm associated with the CDKF is similar to the one associated with the DKF (section 4.3.2) except that it only applies for $t \ge m$, the post-collapse period. Due to the adjustments at $t = m$, a square-root smoothing algorithm for $t < m$ appears intricate at best.

5.4 Efficiency of Collapsing Strategies

Ansley and Kohn (1990, p. 282) and Bell and Hillmer (1991, p. 284) have raised concerns about the computational efficiency of the DKF, specifically the fact that it employs recursions of the matrices $A_t$ and $E_t$ in lieu of the vector recursions for $a_t$ and $e_t$ in the AKF. However we counter through the observation that the efficiency of the DKF and the AKF depends significantly more on the dimensions of the mse matrices of the predictors of the states.
Thus the AKF, since it is based on a SSM with augmented states, is not necessarily more computationally efficient than the DKF.

In the present section, we demonstrate that the CDKF, the collapsed version of the DKF which was derived in the last section, is computationally superior to both the DKF and the AKF-type algorithms (unless $\beta$ is null, in which case they all coincide with the KF) proposed by Ansley and Kohn (1990), Bell and Hillmer (1991) and Harvey and Pierse (1984). As discussed previously, these alternative algorithms are all based on the ASSM wherein the states are augmented to incorporate the regression parameter $\beta$. These algorithms therefore construct larger error covariance matrices than the CDKF and this time-consuming activity explains the computational superiority of the latter algorithm. The same explanation carries over to the associated smoothing algorithm. Additionally, the latter employs, among other quantities, the state error covariance matrices generated by the CDKF and therefore has lower data storage requirements than the smoothing algorithm associated with the AKF.

Strategy   $A_t$                                        $P_t$                                                $Q_t$
DKF        $r \times (\gamma_1^\# + \gamma_2^\# + 1)$   $r \times r$                                         $(\gamma_1^\# + \gamma_2^\# + 1) \times (\gamma_1^\# + \gamma_2^\# + 1)$
AKF        $(r + \gamma_2^\#) \times 1$                 $(r + \gamma_2^\#) \times (r + \gamma_2^\#)$         $1 \times 1$
CDKF       $r \times (\gamma_2^\# + 1)$                 $r \times r$                                         $(\gamma_2^\# + 1) \times (\gamma_2^\# + 1)$
Table 5.1: Dimensionalities in filtering algorithms

A minor reason behind the computational superiority of the CDKF over the AKF is that the switch from the DKF to the AKF generally takes place at a later stage than the collapse of the DKF to the CDKF. This follows since the switch can only occur when $S_t$ is nonsingular while the collapse requires only that the topmost diagonal block of order $\gamma_1^\#$ of $S_t$ be nonsingular.
Therefore the use of the CDKF as opposed to the AKF implies gains in the areas of computational efficiency and data storage requirements when smoothing is called for.

The DKF, CDKF and AKF (and their associated smoothing algorithms) share the same recursions except that they employ matrices of different dimensions. Hence the difference in their computational performances is solely attributable to the dimensions of pertinent matrices in these algorithms. As stated in Chapter 2, the most time-consuming recursion in the KF, and by extension the DKF, CDKF and AKF, is the one concerning the state error covariance matrix, i.e. $P_t$. In the same vein, the smoothing algorithm (see Theorems 2.2 and 4.6) evaluates a covariance matrix $R_t$ with the same dimension as $P_t$. The dimensions of these matrices, as well as other pertinent matrices, subsequent to the switch or collapse of the DKF are given in Table 5.1 for the filtering cycle and in Table 5.2 for the smoothing cycle. These tables assume that the transition matrix $T$ is time-invariant with dimensions $r \times r$. For the time-varying case, the relative differences will be exactly the same.

Strategy   $N_t$                                        $R_t$
DKF        $r \times (\gamma_1^\# + \gamma_2^\# + 1)$   $r \times r$
AKF        $(r + \gamma_2^\#) \times 1$                 $(r + \gamma_2^\#) \times (r + \gamma_2^\#)$
CDKF       $r \times (\gamma_2^\# + 1)$                 $r \times r$
Table 5.2: Dimensionalities in smoothing algorithms

Since the computations of $P_t$, $Q_t$ and $R_t$ are the most time-consuming in the filtering and smoothing algorithms, especially when their square-root forms are employed, we can immediately infer that the CDKF should be the most efficient of these three strategies for two reasons, namely (i) it is based on a SSM with states of minimal dimensionality and (ii) its associated smoothing algorithm requires less data storage.

5.4.1 Numerical Illustration

In order that the DKF and the CDKF can be compared on an equal footing with the AKF, we must estimate the state, its error covariance matrix and also the gls estimate of $\beta$ at each iteration of the algorithms.
The same benchmark applies with regard to smoothing. For illustration purposes, we have chosen a regression model with $\gamma_2^\# = 0, 1, 2, 3$ regressors and a quarterly seasonal error term. This can be expressed as the following SSM,
$$y_t = X_t\beta + (1, 0, 0)\alpha_t, \qquad \alpha_{t+1} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}\alpha_t + \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}u_t$$
The transition matrix has nonstationary roots $-1$ and $\pm i$. Hence following the collapse stage, the quantities $E_t$, $A_t$ and $Q_t$ in the CDKF have $\gamma_1^\# = 3$ fewer columns than their counterparts in the DKF. The AKF is based on a SSM with augmented states $(\alpha_t; \beta)$ (i.e. the ASSM). Therefore the matrices $P_t$ in the AKF and $R_t$ in the associated smoothing algorithm consist of $\gamma_2^\#$ more rows and columns than their counterparts in the DKF and its associated smoothing algorithm.

The computations were carried out on an AT-type microcomputer using the square root forms of the DKF, AKF and CDKF and their associated smoothing algorithms. Recall that these square root algorithms are identical except that they employ matrices of different dimensions. Tables 5.3 and 5.4 display the run times of these square root algorithms when the same 20 randomly generated observations are employed to construct limiting estimates (filtered and smoothed) of the state and the related error covariance matrix and the gls estimate of the regression parameter at each iteration of these algorithms, as well as to evaluate the diffuse log-likelihood at the final iteration.

$\gamma_2^\#$   0       1       2       3
DKF             23.34   30.92   40.53   53.99
AKF             15.26   19.33   26.75   37.29
CDKF            16.04   19.28   23.07   28.95
Table 5.3: Run times (seconds) for state prediction, regression parameter estimation and likelihood evaluation

$\gamma_2^\#$   0       1       2       3
DKF             38.34   45.98   55.86   69.86
AKF             31.03   42.67   60.15   77.28
CDKF            31.46   35.15   38.50   43.55
Table 5.4: Run times (seconds) for smoothing

These run times confirm our assertion that the CDKF is computationally more efficient than its competitors in all facets of statistical inference with the DSSM.
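As a quick arithmetic check on Tables 5.3 and 5.4, the sketch below divides each algorithm's run time by the corresponding CDKF run time. Only the timings come from the tables; the dictionary layout and the function name `ratios_to_cdkf` are illustrative assumptions.

```python
# Run times in seconds, transcribed from Tables 5.3 and 5.4.
# Each list is ordered by number of regressors: 0, 1, 2, 3.
FILTERING = {
    "DKF": [23.34, 30.92, 40.53, 53.99],
    "AKF": [15.26, 19.33, 26.75, 37.29],
    "CDKF": [16.04, 19.28, 23.07, 28.95],
}
SMOOTHING = {
    "DKF": [38.34, 45.98, 55.86, 69.86],
    "AKF": [31.03, 42.67, 60.15, 77.28],
    "CDKF": [31.46, 35.15, 38.50, 43.55],
}


def ratios_to_cdkf(table):
    """Return every strategy's run time divided by the CDKF run time, column by column."""
    base = table["CDKF"]
    return {name: [t / b for t, b in zip(times, base)] for name, times in table.items()}


if __name__ == "__main__":
    for label, table in (("filtering", FILTERING), ("smoothing", SMOOTHING)):
        print(label)
        for name, ratios in ratios_to_cdkf(table).items():
            print(f"  {name:5s}", " ".join(f"{r:5.2f}" for r in ratios))
```

With three regressors, for example, smoothing with the AKF takes roughly 1.8 times as long as smoothing with the CDKF, and the ratio grows with the number of regressors.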
The results also clearly indicate that the inclusion of the regression parameters within the state leads to patently inefficient smoothing algorithms.

5.5 Summary

We have demonstrated that a properly implemented DKF, labelled the CDKF, is computationally superior to alternatives discussed in the literature. The next Chapter discusses maximum likelihood estimation of parameters in the SSM. This is conducted by embedding the DKF within the iterative EM algorithm. Chapter 7 deals with the recursive generation of residuals using the DKF. The results of this Chapter indicate that computational efficiency will be enhanced if the algorithms developed in these two Chapters employ the CDKF in lieu of the DKF.

Chapter 6

Maximum Likelihood Estimation in the State Space Model

In many applications of the SSM, the focus is on the estimation of the unknown parameters in its system matrices on account of their practical interpretation. In this Chapter, maximum likelihood estimates (mle's) are derived for these unknown parameters under the assumption that they are time invariant. The estimation method, labelled the DKF-EM method, consists of embedding the DKF within the EM (Expectation-Maximization) algorithm. The latter is a popular derivative-free likelihood optimization procedure.

The time series literature reports several applications employing an EM approach for the estimation of unknown parameters in the SSM: Harvey and Peters (1990) for the estimation of the error covariance matrices $\sigma^2GG'$ and $\sigma^2HH'$ in the structural models, Shumway and Stoffer (1981) for the estimation of a nonstationary scalar transition matrix $T$ which is interpreted as an inflation rate, and Watson and Engle (1983) for the estimation of the observation matrix $Z$ with the components of the latter interpreted as the unobserved wage rates.

All these works employ a KF-EM estimation method with the KF initiated according to the big "k" method.
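The numerical hazard of the big "k" initialization can be seen in a one-step sketch. It assumes a scalar local level observation $y_1 = \alpha_1 + \varepsilon_1$ with unit observation variance and prior state variance $k$; this toy model and the function names are illustrative assumptions, not one of the models analysed in this thesis. The first function applies the usual KF variance update $P - KP$, the second an algebraically equivalent form that avoids the subtraction.

```python
def kf_posterior_variance(k, obs_var=1.0):
    """One KF measurement update for y = alpha + eps with prior Var(alpha) = k."""
    f = k + obs_var      # innovation variance
    gain = k / f         # Kalman gain
    return k - gain * k  # updated state variance, computed as P - K * P


def exact_posterior_variance(k, obs_var=1.0):
    """Same quantity written as obs_var * k / (k + obs_var), with no cancellation."""
    return obs_var * (k / (k + obs_var))


if __name__ == "__main__":
    for k in (1e3, 1e6, 1e12, 1e17):
        print(f"k = {k:.0e}  KF update: {kf_posterior_variance(k)!r}  "
              f"stable form: {exact_posterior_variance(k)!r}")
```

As $k$ grows, the subtraction cancels more and more leading digits. In double precision arithmetic the KF update returns exactly zero for $k = 10^{17}$, although the true posterior variance is close to one.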
Initializing the KF in such a fashion is, as argued in previous chapters, both theoretically and computationally unsound. The DKF is more appropriate in diffuse situations and therefore in this Chapter, we focus on maximum likelihood estimation in the SSM via the DKF-EM technology. The DKF (like the KF) serves two purposes: (i) to evaluate the diffuse log-likelihood and (ii) to provide the required data for the smoothing algorithm; the latter is required by the EM algorithm. In view of the results of the previous chapter, it is computationally more efficient to employ the CDKF in lieu of the DKF. The results reported in this Chapter are obtained via the CDKF-EM estimation method.

This Chapter is organized into two main sections. Section 1 reviews the concepts behind the EM algorithm and discusses its merits relative to other likelihood maximizing procedures. A general CDKF-EM algorithm for estimating system matrices in the SSM is developed. It generalizes and unifies the works of Shumway and Stoffer (1981), Watson and Engle (1983) and Harvey and Peters (1990). The CDKF-EM estimation technology is illustrated via two financial applications. The first application requires the estimation of the exponential growth rate of a time series of quarterly earnings of a company. This is tantamount to estimating the growth coefficient of the trend in a trend-seasonal transition matrix. The second application employs the Capital Asset Pricing Model, under the assumption that the market premium (equivalent to the state) follows a random walk, to estimate the risk-free rate of return (equivalent to an unknown regression parameter) and the "betas" (equivalent to the observation matrix) of three stocks traded on the New York Stock Exchange (NYSE).

The majority of estimation applications in the SSM deal with the estimation of the covariance matrices of the disturbances in the SSM, namely $\sigma^2GG'$ and $\sigma^2HH'$.
The second half of the Chapter focusses on this estimation problem. Following a judicious choice of the complete data (this concept is explained in the next section), we investigate a new and computationally more efficient CDKF-EM estimation method which avoids the time-consuming computation of lag one state error covariance matrices. In a recent discussion paper, Koopman (1991) also suggests a similar estimation strategy. The section concludes with a tabulation of the results obtained on applying this novel version of the CDKF-EM algorithm to structural models previously illustrated by Harvey (1989), Harvey and Peters (1990) and West and Harrison (1989). Interestingly, we obtain solutions with higher log-likelihoods than previously found.

6.1 The EM approach

The EM algorithm is a popular tool for likelihood maximization in statistical models which either explicitly involve missing data or which can be formulated in terms of missing information. As such, it is often employed for maximum likelihood estimation purposes in a variety of applications. Instances of its applications can be traced quite far back in statistical history. For example, Lauritzen (1981) reports a paper by Thiele (1880) who formulated a time series model consisting of the sum of a regression component, a Brownian motion and a white noise. Thiele employed a variant of the Kalman Filter to estimate the regression component and evaluated the variances of the Brownian motion and the white noise via an iterative process which is akin to the EM algorithm. A recent application of the EM algorithm deals with the reconstruction of data images in the field of positron emission tomography (PET) (Vardi et al., 1985).
Dempster, Laird and Rubin (1977), building on previous work by Orchard and Woodbury (1972) and Sundberg (1974), generalize and unify the theory behind the EM algorithm and demonstrate its usefulness in such applications as variance components estimation, hyperparameter estimation, finite mixture models and factor analysis.

The SSM provides a suitable framework for explaining the concepts behind the EM algorithm. In the SSM, we observe the incomplete data $y_t$ as a function of an unobserved time series $\alpha_t$. Call $(\{y_t\}, \{\alpha_t\})$ the complete data. Suppose the mle of a parameter vector $\psi$ is required. One possible estimation method is to maximize the likelihood based on the incomplete data. However this is usually a complicated task requiring the optimization of a nonlinear likelihood function. An attractive alternative is to employ the EM algorithm. This consists of the repeated iteration of the following two steps:
1. Expectation or E-step: complete data sufficient statistics for $\psi$ are estimated conditional on the current estimate of $\psi$ and the observed data $y$.
2. Maximization or M-step: a new estimate for $\psi$ is evaluated using the expectation of the complete data sufficient statistics computed in the E-step.
Dempster, Laird and Rubin (1977) prove that repeated iterations of the E and M steps eventually lead to a stationary point of the likelihood function.
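The two steps can be sketched on a deliberately simple model. The example assumes scalar data $y_i = \alpha_i + \varepsilon_i$ with unobserved $\alpha_i \sim N(\mu, \tau^2)$ and $\varepsilon_i \sim N(0, 1)$. This static model, the data values and all function names are illustrative assumptions and not the CDKF-EM algorithm developed later in this Chapter; here the complete data sufficient statistics for $(\mu, \tau^2)$ are $\sum \alpha_i$ and $\sum \alpha_i^2$.

```python
import math


def log_likelihood(y, mu, tau2):
    """Incomplete-data log-likelihood: marginally y_i ~ N(mu, tau2 + 1)."""
    v = tau2 + 1.0
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - mu) ** 2 / v) for yi in y)


def em_step(y, mu, tau2):
    """One EM iteration for (mu, tau2)."""
    n = len(y)
    gain = tau2 / (tau2 + 1.0)
    # E-step: posterior mean and variance of each alpha_i given y_i
    # under the current parameter values.
    post_mean = [mu + gain * (yi - mu) for yi in y]
    post_var = gain  # tau2 * 1 / (tau2 + 1), identical for every i
    # M-step: maximize the expected complete-data log-likelihood.
    new_mu = sum(post_mean) / n
    new_tau2 = sum((m - new_mu) ** 2 for m in post_mean) / n + post_var
    return new_mu, new_tau2


def run_em(y, mu=0.0, tau2=1.0, iterations=200):
    """Iterate EM and record the log-likelihood after every step."""
    path = [log_likelihood(y, mu, tau2)]
    for _ in range(iterations):
        mu, tau2 = em_step(y, mu, tau2)
        path.append(log_likelihood(y, mu, tau2))
    return mu, tau2, path


if __name__ == "__main__":
    data = [2.1, -0.3, 4.0, 1.2, 3.3, -1.5, 0.8, 2.9]
    mu_hat, tau2_hat, path = run_em(data)
    print("mu:", mu_hat, "tau2:", tau2_hat)
    print("log-likelihood at start and end:", path[0], path[-1])
```

In this toy case the result can be checked against the closed-form mle, namely the sample mean for $\mu$ and the sample variance (divisor $n$) minus one for $\tau^2$, provided the latter is positive.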
The starting estimate of $\psi$ is arbitrarily chosen; however from the viewpoint of accelerating the convergence of the EM algorithm, it is often possible to employ an easily-derived consistent estimate of $\psi$.

The appealing features of the EM algorithm are: (i) the E and M steps are often very simple and interpretable, amounting in many cases to regression-like equations, (ii) the sequence of log-likelihoods is monotone nondecreasing (Boyles, 1983; Wu, 1983), (iii) a neighbourhood of the stationary point of the likelihood function is usually found within a few initial iterations even when the EM algorithm is initiated with poor estimates of the unknown parameters and (iv) the parameter estimates are mle's if a global maximum of the likelihood function is attained. Consequently, in the PET application, subsequent iterations of the EM algorithm yield sharper images. The EM algorithm does have some deficiencies: (i) it does not provide covariance matrices for the parameter estimates and (ii) its convergence rate may be linear or sublinear in the neighbourhood of the stationary point and thus unbearably slow for many applications. Louis (1982) and Meilijson (1989) have proposed extensions to the EM algorithm for computing mse matrices for the parameter estimates and speeding up the convergence of the EM algorithm using pseudo-Aitken's acceleration methods. These extensions are not covered in this thesis.

Alternative likelihood maximization methods include Newton-Raphson (for example the Gill-Murray-Pitfield algorithm) and quasi-Newton methods such as scoring, which require the use of complex software to solve the nonlinear equations which arise from differentiating the likelihood function based on the incomplete data $y$. These estimation methods have the merits of (i) requiring fewer (but more involved) steps than the EM algorithm to achieve convergence and (ii) providing mse matrices of the parameter estimates.
However they are not as stable as the EM algorithm in the sense that their repeated applications do not generally yield a monotone sequence of log-likelihoods. Some researchers like Watson and Engle (1983) have used a combination of both the EM and scoring algorithms in order to exploit their desirable features. The EM algorithm is first used to locate a neighbourhood of the stationary point of the likelihood function and the scoring algorithm is subsequently applied to achieve the convergence of the parameter estimates and the estimation of an approximate information matrix.

A serious flaw of the EM algorithm and most optimization routines is that they converge to a stationary point and not necessarily to the global maximum of the likelihood function. It is therefore always a wise strategy to initiate these algorithms with different starting points and then compare the results. For instance, in the applications reported in section 6.1.2, we have used two strategies to initiate the EM algorithm. The first strategy (labelled A) initiates the EM algorithm with solutions provided by previous researchers while the second strategy (labelled B) uses "naive" starting points where for example the relative variances of disturbances in the SSM are set to one.

6.1.1 The general DKF-EM algorithm

This subsection customizes the EM algorithm for use with the SSM. We generalize and unify the works of Shumway and Stoffer (1981), Watson and Engle (1983) and Harvey and Peters (1990) dealing with the estimation of unknown parameters in the system matrices (i.e. $Z$, $T$, $G$ or $H$) in the SSM. Consider the SSM,
$$y_t = X_t\beta + Z\alpha_t + Gu_t, \qquad \alpha_{t+1} = W_t\beta + T\alpha_t + Hu_t$$
where $\alpha_0 \sim N(a_0, \sigma^2P_0)$ with $a_0$ and $P_0$ known, $X_t$ and $W_t$ are known and $u_t \sim N(0, \sigma^2I)$. Suppose $\psi = (\beta; Z, G, T, H, \sigma^2)$ and furthermore assume without any loss in generality that $y_t$ and $\alpha_t$ respectively have $p$ and $k$ components. Then assuming $\{\alpha_t\}$ is known, the
Then assuming {at} is known, theSSM can be expressed as,/ \ I \ 3 /I ( X a 0 I, 0 \ ( GI 11 I z +1 Ut0 ct®Ik)Twhere z = vec (Z) and ‘r = vec (T).It then follows from well-known linear models theory that the mie of (/3; z; T) is,X W’GG’ GH’ ( X a®I 0= crØI,, 0 I IHG’ HH’) \ W 0 a 00 aØI,x w /(GG’ GH’\ ( Ytx aØI,, 0 I IIHG’ HH’ ) \0 c0IkThis general mie expression is easily modified when some of the parameters (possiblytime-varying) are known. Furthermore,/ —1(GG’ GH’”\= tr M /(y;a)#HG’ HH’)(C.”I f = (M/n)”2 whereH)Chapter 6. Maximum Likelihood Estimation in the State Space Model 102/3I Yt ‘\ (Xt cØI, 0M = >mtmt with mt= I I — i ILt+i) \W 0 ø’k)TIt is easily ascertained that the sufficient statistics for the above mie’s are E Yt,::‘:=i YtY, =i Yta, =i cr and E=1 j,i = 0,1. Therefore the E-step of theEM algorithm evaluates Et = E(atly) and E(at+1a+,Iy) = Mse(&t+2,&t+j) + &t+i&.In essence then, the E-step constructs the complete data sufficient statistics by runningthe smoothing algorithm associated with the CDKF. The M-step uses these completedata sufficient statistics to evaluate a mew estimate for j,.We conclude this subsection by pointing out a problem which affects the estimation ofsystem matrices in the SSM. In Chapter 2, we discussed the non-uniqueness of the SSMspecification. This property implies that parameter estimation may be only feasible up toa scale and/or orthogonal transformation and consequently the EM algorithm convergesto a ridge of stationary points of the likelihood function. This characteristic is evidencedby the multiple solutions obtained in the applications illustrated in this Chapter.6.1.2 fliustrationsEstimation of growth rate of earnings. Sliumway (1988, pl8f3-192) describes anexponential trend plus seasonal variation model for a time series of quarterly earnings(from the fourth quarter, 1970 to the first quarter, 1980) of the U.S. 
conglomerate Johnson & Johnson with the SSM,
$$y_t = (1, 1, 0, 0)\alpha_t + (0, 0, 1)u_t, \qquad \alpha_{t+1} = \begin{pmatrix} \phi & 0 & 0 & 0 \\ 0 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}\alpha_t + \begin{pmatrix} h_1 & 0 & 0 \\ 0 & h_2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}u_t$$
The mle's of $\phi$, $h_1$, $h_2$ and $\sigma^2$ conditional on $y$ are easily derived using the results of the previous subsection. In particular, the mle of $\phi$ conditional on $y$ is $S_{10}[1;1]/S_{11}[1;1]$ where $S_{ij} = \sum_t \{\mathrm{Mse}(\hat\alpha_{t+i}, \hat\alpha_{t+j}) + \hat\alpha_{t+i}\hat\alpha_{t+j}'\}$, $i, j = 0, 1$. Table 6.1 lists the results obtained upon initiating the EM algorithm with (i) the final results obtained by Shumway (strategy A) and (ii) naive estimates of the parameters (strategy B). The second and third columns report the $\sigma^2$-concentrated log-likelihoods (denoted by $\lambda$) respectively computed from the starting and final estimates generated by the EM algorithm.

Starting Points ($\phi$, $h_1$, $h_2$)   Start. $\lambda$   Final $\lambda$   Solutions ($\phi$, $h_1$, $h_2$, $\sigma^2$)   Number of Iterations
A: 1.037, 0.53, 1.66                     -93.99             -93.83            1.037, 0.5853, 1.528, 0.0394                  2
B: 1, 1, 1                               -136.67            -94.04            1.0368, 1.0583, 1.8851, 0.0234                295
Table 6.1: Estimation results with Johnson & Johnson data

The results tell us that during the study period, Johnson & Johnson experienced an average 3.7% quarterly increase in earnings. Furthermore the high value of $h_2$ is evidence of seasonal fluctuations in the earnings figures. The EM algorithm does not provide the standard errors of its estimates. However, in the present context, hypothesis testing on a particular value of $\phi$ can be conducted in an indirect fashion by analysing the innovations generated upon employing the stated value of $\phi$ in the transition matrix.

It is clear from the results of strategy A that Shumway's solution is indeed very close to a stationary point of the likelihood function. His results and ours may possibly differ due to the initialization of the KF: Shumway initializes the KF according to the big "k" method whereas we employ an exact method, namely the CDKF. Interestingly, the estimate of $\phi$ converges very quickly (within the first 5 iterations) as opposed to the other estimates.
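The update for the growth coefficient is a ratio of smoothed second moments of the trend component. The sketch below assumes that the smoothed trend estimates, their mse's and the lag one mse cross terms are already available from the E-step. The function and argument names are hypothetical, and the update is written in the generic least squares form for a scalar autoregressive coefficient rather than as a transcription of the $S_{ij}$ notation used above.

```python
def growth_rate_update(mean, var, lag1_cov):
    """M-step estimate of phi in trend[t+1] = phi * trend[t] + disturbance.

    mean[t]     : smoothed estimate of the trend at time t
    var[t]      : mse of that estimate
    lag1_cov[t] : mse cross term between the trend at t+1 and the trend at t
    """
    n = len(mean) - 1
    # Sum over t of E(trend[t+1] * trend[t] | y).
    cross = sum(lag1_cov[t] + mean[t + 1] * mean[t] for t in range(n))
    # Sum over t of E(trend[t] ** 2 | y).
    square = sum(var[t] + mean[t] ** 2 for t in range(n))
    return cross / square


if __name__ == "__main__":
    # Noise-free exponential trend with zero mse terms:
    # the update should recover the growth factor.
    trend = [1.037 ** t for t in range(40)]
    zeros = [0.0] * len(trend)
    print(growth_rate_update(trend, zeros, zeros))
```

When the mse terms are zero the update reduces to an ordinary regression of the trend on its own lag, which is why a noise-free exponential series returns its growth factor.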
The other applications reported in this Chapter which also employ seasonal SSM's also experience slow rates of convergence of the error covariance matrices.

Finally note that in this application, the CDKF coincides with the KF and thus A and E are vectors. This contrasts with employing the DKF, in which case A and E would consist of 4 columns. Thus employing the CDKF in lieu of the DKF leads to substantial computational savings.

Estimation of asset betas. The Capital Asset Pricing Model (CAPM), devised by Sharpe (1965) and which earned him a share of the 1990 Nobel prize in Economics, is used in Finance to estimate the "beta" (a measure of riskiness or volatility) of financial assets vis-a-vis a market portfolio. The latter is defined as the basket of all assets in a financial market, such as the New York Stock Exchange. In its simplest form, the CAPM can be viewed as a simple linear regression model with the intercept and regressor being respectively interpreted as the risk-free rate of return and the market premium, which is a premium offered as compensation for investing in a risky asset as opposed to a risk-free asset. By design, risk-free assets like government treasury bills are assigned a "beta" of zero while the market portfolio has a "beta" of one. Over the years, several modifications have been proposed to the CAPM. One such modification posits a random walk model for the unobserved market premium. This modified CAPM can be written as the SSM,

\[ y_t = 1\beta + Z\alpha_t + (G, 0)\,u_t, \qquad \alpha_{t+1} = \alpha_t + (0, 1)\,u_t \]

where $y_t$ is a vector of the rates of return of p assets, $\beta$ is the unknown risk-free rate of interest, $\alpha_t$ is the market premium, thereby implying that Z is a vector of the "betas" of the assets, and G is diagonal with unknown diagonal entries $g_i$, $i = 1, \ldots, p$. More complicated asset-pricing models exist.
For instance in the Arbitrage Pricing Model the state $\alpha_t$ has components which are thought to be inflation rate, growth in industrial production, difference between long-term and short-term treasury bond yields etc. (Chen et al., 1986). The estimation of these components is of obvious practical interest.

We now derive mle's for the parameters of interest in the above CAPM. Denote $z = \mathrm{vec}(Z)$ and rewrite the CAPM as,

\[ y_t = (1_p,\ \alpha_t \otimes I_p) \begin{pmatrix} \beta \\ z \end{pmatrix} + (G, 0)\,u_t \]

Then the mle of $(\beta; z)$ is given by,

\[ \begin{pmatrix} \hat\beta \\ \hat z \end{pmatrix} = \left\{ \sum_t (1_p,\ \alpha_t \otimes I_p)'\, \mathrm{Diag}(g_1^{-2}, \ldots, g_p^{-2})\, (1_p,\ \alpha_t \otimes I_p) \right\}^{-1} \sum_t (1_p,\ \alpha_t \otimes I_p)'\, \mathrm{Diag}(g_1^{-2}, \ldots, g_p^{-2})\, y_t \]
\[ = \begin{pmatrix} n \sum_i g_i^{-2} & g_1^{-2} \sum_t \alpha_t & g_2^{-2} \sum_t \alpha_t & \cdots & g_p^{-2} \sum_t \alpha_t \\ g_1^{-2} \sum_t \alpha_t & g_1^{-2} \sum_t \alpha_t^2 & 0 & \cdots & 0 \\ g_2^{-2} \sum_t \alpha_t & 0 & g_2^{-2} \sum_t \alpha_t^2 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ g_p^{-2} \sum_t \alpha_t & 0 & 0 & \cdots & g_p^{-2} \sum_t \alpha_t^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_t \sum_i g_i^{-2} y_t[i] \\ g_1^{-2} \sum_t \alpha_t y_t[1] \\ g_2^{-2} \sum_t \alpha_t y_t[2] \\ \vdots \\ g_p^{-2} \sum_t \alpha_t y_t[p] \end{pmatrix} \]

where $y_t[i]$ is the i-th component of $y_t$. Therefore the E-step of the EM algorithm constructs the estimates of the sufficient statistics $\sum_t \alpha_t$ and $\sum_t \alpha_t^2$ conditional on y and feeds them to the M-step to update the current estimate of $(\beta; z)$. The mle's of $\sigma^2$ and G conditional on y are derived according to the expressions given in the previous subsection.

Using the CDKF in lieu of the DKF in this application implies that (i) the matrices A and E consist of 2 as opposed to 3 columns and (ii) $Q_t$ has order 2 as opposed to order 3. Thus the use of the CDKF attracts both computational and storage savings.

In order to obtain proper "betas", i.e. "betas" such that the market portfolio is assigned a "beta" equal to one, it is necessary to augment $y_t$ by a proxy representing the market rate of returns. The choice of an appropriate market index has been a subject of debate in the financial research community. Two common choices are the CRSP (Center for Research in Security Prices) equally-weighted and value-weighted market indices which are respectively the arithmetic and price-weighted averages of the rates of return of all the assets in the NYSE.
Therefore an interesting side result of the current application is to produce evidence, if any, in favour of these two market indices. This is achieved by comparing the relative "betas" estimated when no market index is used to the "betas" estimated by augmenting $y_t$ with the two CRSP market indices.

The dataset considered in this application (courtesy of Dr. Dilip Madan of the University of Maryland) consists of 336 monthly rates of returns of p = 3 assets for the period 1959-1986 inclusive (see Appendix at the end of the dissertation). Listed in Table 6.2 are the summary statistics of the CRSP equally-weighted and value-weighted market indices (denoted by EW and VW) and these 3 assets.

Asset                 EW     VW     1      2      3
Mean (x10^4)          87     117    95     90     79
Variance (x10^4)      18     28     22     27     21

Table 6.2: Summaries for financial data

Table 6.3 summarises our results (with assets in the same order as above) obtained under 3 strategies, namely (I) without the use of a market proxy (therefore yielding relative "betas", which are denoted with a † sign), (II) using the equally-weighted CRSP index and (III) using the value-weighted CRSP index. The average monthly market premium $\bar\alpha = n^{-1} \sum_t \hat\alpha_t$ is also reported. In all instances, the EM algorithm was initiated with estimates of the "betas" and the diagonal elements of G (i.e. the $g_i$'s) all set to one.

                                            I          II         III
$\lambda$                                   6767.74    9880.62    9880.64
$\hat\beta$ (x10^3)                         5.4160     7.5432     7.5433
$\hat z$          market proxy              -          1          1
                  asset 1                   1†         1.0480     1.0480
                  asset 2                   1.1256†    1.1538     1.1538
                  asset 3                   1.0034†    1.0519     1.0519
$\hat\sigma^2$ (x power of 10)              2.48       2.59       2.59
diag $\hat\sigma^2 \hat G \hat G'$ (x power of 10)
                  market proxy              -          0.1904     0.0938
                  asset 1                   0.4422     0.3428     0.3041
                  asset 2                   0.4814     0.3886     0.3908
                  asset 3                   0.3540     0.2869     0.2130
iterations                                  56         51         57
$\bar\alpha$ (x10^3)                        3.1556     1.1553     1.1552

Table 6.3: Estimates with financial data

From Table 6.3, we infer the following:

1. The CAPM explains about 80% of the variations in the assets since the residual variances (i.e. the diagonal elements of $\hat\sigma^2 \hat G \hat G'$) each account for less than 20% of the unconditional variances listed in Table 6.2.

2.
The similarity in the results produced by strategies II and III suggests that the value-weighted and the equally-weighted CRSP market indices are equally good market proxies.

3. The relative betas of assets 1 through 3 in the three strategies are about the same (since 1.04 : 1.15 : 1.05 translates to 1 : 1.1 : 1.003). In strategy I, the estimates of the betas were rescaled so that the first asset had a beta of one. This has the drawback of subsequently confounding the estimates of $\beta$ and $\bar\alpha$.

4. The results of strategies II and III indicate average risk-free and market premium annualized rates of return of 9.4% ($= 1.00754^{12} - 1$) and 1.4%. Therefore the average annualized rate of return in the NYSE for the period 1959-1986 is 10.8%, a figure which is in line with commonly held estimates.

5. The higher beta for the second asset tells us that the returns of this asset are more sensitive to market fluctuations than the returns of the other two assets. The latter have "betas" close to one and therefore can be categorised as conservative assets.

6.2 Estimation of covariance matrices in the SSM

The most common parameter estimation required in the SSM deals with the estimation of the covariance matrices of its disturbances. This section considers a novel and more efficient CDKF-EM algorithm for this specific application. Koopman (1991) recently suggested a similar approach in the case of structural models. As a motivation, consider the quarterly basic structural model (QBSM),

\[ y_t = (1\ 0\ 1\ 0\ 0)\,\alpha_t + (0\ 0\ 0\ 1)\,u_t, \]
\[ \alpha_{t+1} = \begin{pmatrix} \mu_{t+1} \\ \beta_{t+1} \\ \gamma_{t+1} \\ \gamma_t \\ \gamma_{t-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \alpha_t + \begin{pmatrix} h_1 & 0 & 0 & 0 \\ 0 & h_2 & 0 & 0 \\ 0 & 0 & h_3 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} u_t \]

The QBSM is a seasonal structural model. Non-seasonal structural models include (i) the basic structural model (BSM) where the state consists of only a level ($\mu_t$) and a slope ($\beta_t$) component and (ii) the random walk plus noise (RWM) model where both the observation matrix and transition matrix are equal to 1. The illustrations described later in this section employ the RWM, BSM and QBSM.

The unknown parameters in these structural models are $\sigma^2$ and a subset of $\{h_1, h_2, h_3\}$. In the case of the QBSM, their mle's are derived upon exploiting the following relationships,

\[ \epsilon_{0,t} = (0\ 0\ 0\ 1)\,u_t = y_t - z_t\alpha_t, \quad \text{where } z_t = (1\ 0\ 1\ 0\ 0) \]
\[ \epsilon_{1,t} = (h_1\ 0\ 0\ 0)\,u_t = (1\ 0\ 0\ 0\ 0)\,\alpha_{t+1} - (1\ 1\ 0\ 0\ 0)\,\alpha_t \]
\[ \epsilon_{2,t} = (0\ h_2\ 0\ 0)\,u_t = (0\ 1\ 0\ 0\ 0)(\alpha_{t+1} - \alpha_t) \]
\[ \epsilon_{3,t} = (0\ 0\ h_3\ 0)\,u_t = (0\ 0\ 1\ 1\ 1)\,\alpha_{t+1} + (0\ 0\ 0\ 0\ 1)\,\alpha_t \]

Denote the variance of $\epsilon_{i,t}$ by $\sigma_i^2$, $i = 0, \ldots, 3$ with $\sigma_0^2 = \sigma^2$. Clearly then, the mle of $\sigma_i^2$ is $\hat\sigma_i^2 = 1/n \sum_t \epsilon_{i,t}^2$. Using the general CDKF-EM algorithm described in section 6.1.1, we therefore obtain, conditional on y,

\[ \hat\sigma^2 = 1/n \sum_t \{ (y_t - z_t\hat\alpha_t)(y_t - z_t\hat\alpha_t)' + z_t\,\mathrm{Mse}(\hat\alpha_t)\,z_t' \} \]
\[ \hat h_1^2 = 1/(n\hat\sigma^2) \Big\{ S_{11}(1;1) + \sum_{i=1}^{2}\sum_{j=1}^{2} S_{00}(i;j) - 2 \times [S_{10}(1;1) + S_{10}(1;2)] \Big\} \]
\[ \hat h_2^2 = 1/(n\hat\sigma^2) \{ S_{11}(2;2) + S_{00}(2;2) - 2 \times S_{10}(2;2) \} \]
\[ \hat h_3^2 = 1/(n\hat\sigma^2) \Big\{ \sum_{i=3}^{5}\sum_{j=3}^{5} S_{11}(i;j) + S_{00}(5;5) + 2 \times \sum_{i=3}^{5} S_{10}(i;5) \Big\} \]

where $S_{ij} = \sum_t \{\mathrm{Mse}(\hat\alpha_{t+i}, \hat\alpha_{t+j}) + \hat\alpha_{t+i}\hat\alpha_{t+j}'\}$, $i,j = 0,1$.

These mle's require the computation of $\mathrm{Mse}(\hat\alpha_{t-1}, \hat\alpha_t)$. To obtain this quantity, Harvey and Peters (1990), while working with the QBSM, innovate on an idea originally devised by Watson and Engle (1983) by augmenting $\alpha_t$ as defined above with the lagged components $(\gamma_{t-3}; \mu_{t-1}; \beta_{t-1})$ and thereafter write $\epsilon_{i,t}$, $i = 1, 2, 3$ as contrast functions of the augmented state. The mle's of $\sigma_i^2$ conditional on y are then evaluated using smoothed estimates of the augmented states and their mse matrices. From the discussions and illustrations in Chapters 2 and 5, it is clear that the performance of filtering and smoothing algorithms is dependent on the size of the state. This may explain why Harvey and Peters (1990) found the performance of the EM algorithm to be unsatisfactory.
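The disturbance relationships exploited above for the QBSM can be checked numerically. The following is a minimal sketch, not part of the original derivation: it simulates the QBSM with known states and recovers each $\epsilon_{i,t}$ from the contrasts of $\alpha_t$ and $\alpha_{t+1}$. The function names `simulate_qbsm` and `disturbance_contrasts`, the standard normal draws (i.e. $\sigma = 1$) and the random initial state are illustrative assumptions.

```python
import random

# QBSM system matrices as stated in the text (state = level, slope, 3 seasonals).
Z = [1, 0, 1, 0, 0]
G = [0, 0, 0, 1]
T = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, -1, -1, -1],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0]]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def simulate_qbsm(n, h1, h2, h3, seed=0):
    """Simulate n observations; the initial state is an arbitrary random draw (assumption)."""
    rng = random.Random(seed)
    H = [[h1, 0, 0, 0], [0, h2, 0, 0], [0, 0, h3, 0],
         [0, 0, 0, 0], [0, 0, 0, 0]]
    alpha = [[rng.gauss(0, 1) for _ in range(5)]]
    ys, us = [], []
    for _ in range(n):
        u = [rng.gauss(0, 1) for _ in range(4)]  # sigma = 1 assumed
        a = alpha[-1]
        ys.append(dot(Z, a) + dot(G, u))
        alpha.append([dot(T[i], a) + dot(H[i], u) for i in range(5)])
        us.append(u)
    return ys, alpha, us


def disturbance_contrasts(y_t, a_t, a_next):
    """Return (eps0, eps1, eps2, eps3) as contrasts of y_t, alpha_t and alpha_{t+1}."""
    e0 = y_t - dot(Z, a_t)
    e1 = dot([1, 0, 0, 0, 0], a_next) - dot([1, 1, 0, 0, 0], a_t)
    e2 = a_next[1] - a_t[1]
    e3 = dot([0, 0, 1, 1, 1], a_next) + dot([0, 0, 0, 0, 1], a_t)
    return e0, e1, e2, e3
```

Each contrast involves both $\alpha_t$ and $\alpha_{t+1}$, which is why the expected squared contrasts bring in the lag one terms $S_{10}$.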
The next subsection proposes a means for avoiding the computation of lag one state error covariance matrices.

6.2.1 A new CDKF-EM algorithm

In this subsection, we consider a version of the CDKF-EM algorithm for the efficient estimation of G and H in the SSM. Its derivation is based on the observation that it is more natural to regard the disturbances in the structural model or in the general SSM as linear functions of $u_t$ and not the state $\alpha_t$. This suggests the consideration of $(y; u)$ as the complete data in lieu of $(y; \alpha)$. Consequently we are led to devise a CDKF-EM algorithm which does not require any lag 1 error covariance terms since the components of u (as opposed to $\alpha$) are serially uncorrelated. Koopman (1991) has suggested a similar implementation of this version of the EM algorithm for structural models.

The E-step of this new CDKF-EM algorithm requires the evaluation of $E(u_t \mid y)$ and $\mathrm{Mse}(u_t \mid y)$. The following Theorem indicates the recursive evaluation of these quantities.

Theorem 6.1 (De Jong, 1991c) Suppose $y = (y_1; y_2; \ldots; y_n)$ is generated by the SSM. Then $v_t = \mathrm{Pred}(u_t \mid y)$ and $V_t = I - \sigma^{-2}\mathrm{Mse}(u_t \mid y)$ are computed as,

\[ v_t = G_t' D_t^{-1} e_t + J_t' r_t, \qquad V_t = G_t' D_t^{-1} G_t + J_t' R_t J_t \]

where $J_t = H_t - K_t G_t$ and all the quantities are as defined in the KF and in the smoothing algorithm presented in Chapter 2.

The proof is given in the Appendix at the end of the Chapter. Koopman (1991) also proves a similar result. The above Theorem will also prove useful in the next Chapter where we consider diagnostic testing in the SSM. We now state the main result of this section.

Theorem 6.2 Consider the SSM,

\[ y_t = X_t\beta + Z_t\alpha_t + G u_t, \qquad \alpha_{t+1} = W_t\beta + T_t\alpha_t + H u_t \]

where $\alpha_0 = 0$, $X_t$, $Z_t$, $W_t$ and $T_t$ are known and $u_t \sim N(0, \sigma^2 I)$. Then the complete data sufficient statistics in the mle's of $\beta$, $\sigma^2$, G and H are constructed from

\[ b_t = \begin{pmatrix} X_t \\ W_t \end{pmatrix} \beta + \begin{pmatrix} G \\ H \end{pmatrix} v_t \quad \text{and} \quad B_t = \sigma^2 \begin{pmatrix} G \\ H \end{pmatrix} (I - V_t)(G'\ H'). \]

Proof.
Define for $t = 0, 1, \ldots, n$,

\[ f_t = \begin{pmatrix} X_t \\ W_t \end{pmatrix} \beta + \begin{pmatrix} G \\ H \end{pmatrix} u_t, \qquad F = \begin{pmatrix} G \\ H \end{pmatrix} (G'\ H') \]

Observe that $\alpha_{t+1} = T_t\alpha_t + (0, I) f_t$ and $y_t = Z_t\alpha_t + (I, 0) f_t$. Therefore $f = (f_0; f_1; \ldots; f_n)$ is a complete dataset for the SSM. Furthermore $-2 \times$ the log-likelihood of f, apart from a constant, is,

\[ \lambda(\psi \mid f) = f^{\#} \log(\sigma^2) + n \log|F| + \sigma^{-2} \sum_t \{f_t - (X_t; W_t)\beta\}' F^{-1} \{f_t - (X_t; W_t)\beta\} \]

Thus $E(\lambda(\psi \mid f) \mid y)$ requires the evaluation of

\[ E(f_t \mid y) = b_t \quad \text{and} \quad E(f_t f_t' \mid y) = \mathrm{Mse}(f_t \mid y) + E(f_t \mid y) E'(f_t \mid y) = B_t + b_t b_t' \]

and hence the Theorem is asserted. ∎

Therefore the Theorem indicates that the E-step of the CDKF-EM algorithm uses the iterations described in Theorem 6.1 to evaluate $v_t$ and $V_t$. The M-step is described in the following result.

Theorem 6.3 (M-step) For the SSM described in Theorem 6.2, the mle's of $\sigma^2$, F and $\beta$ conditional on the observed data y are

\[ \hat\sigma^2 = 1/f^{\#}\ \mathrm{tr}\Big[ \hat F^{-1} \sum_t \{ B_t + b_t b_t' + (X_t; W_t)\hat\beta\hat\beta'(X_t; W_t)' - (X_t; W_t)\hat\beta b_t' - b_t\hat\beta'(X_t; W_t)' \} \Big] \]
\[ \hat F = 1/(n\hat\sigma^2) \sum_t \{ B_t + b_t b_t' + (X_t; W_t)\hat\beta\hat\beta'(X_t; W_t)' - (X_t; W_t)\hat\beta b_t' - b_t\hat\beta'(X_t; W_t)' \} \]
\[ \hat\beta = \Big[ \sum_t (X_t; W_t)' \hat F^{-1} (X_t; W_t) \Big]^{-1} \sum_t (X_t; W_t)' \hat F^{-1} b_t \]

Proof. Differentiating $\lambda(\psi \mid f)$ in turn with respect to $\sigma^2$, F and $\beta$ and equating each normal equation to zero respectively yields,

\[ \hat\sigma^2 = 1/f^{\#}\ \mathrm{tr}\Big[ F^{-1} \sum_t \{ f_t f_t' + (X_t; W_t)\beta\beta'(X_t; W_t)' - (X_t; W_t)\beta f_t' - f_t\beta'(X_t; W_t)' \} \Big] \]
\[ \hat F = 1/(n\sigma^2) \sum_t \{ f_t f_t' + (X_t; W_t)\beta\beta'(X_t; W_t)' - (X_t; W_t)\beta f_t' - f_t\beta'(X_t; W_t)' \} \]
\[ \hat\beta = \Big[ \sum_t (X_t; W_t)' F^{-1} (X_t; W_t) \Big]^{-1} \sum_t (X_t; W_t)' F^{-1} f_t \]

Taking the expectation of these mle's conditional on y immediately leads to the results listed in the Theorem. ∎

If $\beta = 0$, as for instance in the structural models, then the expressions given in the Theorem for $\hat\sigma^2$ and $\hat F$ simplify considerably.

Corollary 6.1 Suppose $\beta = 0$ in Theorem 6.3. Then the new estimates of $\sigma^2$ and $(G; H)$ are

\[ \mathrm{tr}\Big( \sum_{t=0}^{n} v_t v_t' \Big) \Big/ \mathrm{tr}\Big( \sum_{t=0}^{n} V_t \Big) \quad \text{and} \quad \begin{pmatrix} \hat G \\ \hat H \end{pmatrix} \Big\{ I - 1/n \sum_{t=0}^{n} (V_t - v_t v_t'/\hat\sigma^2) \Big\}^{1/2} \]

Proof. Put $\beta = 0$ in Theorem 6.3. Then

\[ \hat\sigma^2 = 1/f^{\#}\ \mathrm{tr}\Big\{ \sum_t \sigma^2 (I - V_t) + v_t v_t' \Big\} \ \Longrightarrow\ \hat\sigma^2 = \mathrm{tr}\Big( \sum_t v_t v_t' \Big) \Big/ \mathrm{tr}\Big( \sum_t V_t \Big) \]

Furthermore the new estimate of $(G; H)$ is

\[ \begin{pmatrix} G \\ H \end{pmatrix} \Big\{ \sum_t [\sigma^2 (I - V_t) + v_t v_t'] / n\hat\sigma^2 \Big\}^{1/2} = \begin{pmatrix} \hat G \\ \hat H \end{pmatrix} \Big\{ I - 1/n \sum_t (V_t - v_t v_t'/\hat\sigma^2) \Big\}^{1/2} \]

This completes the proof of the Corollary. ∎

In order to derive consistent estimates of $\sigma^2$, it may be necessary to impose some constraints on G and H. For instance in the QBSM, the zero constraints must be enforced in the updated estimates of G and H. Furthermore these estimates should also be rescaled such that the updated estimate of G has the form $G = (0\ 0\ 0\ 1)$.

This new version of the CDKF-EM algorithm is clearly simpler and more efficient than the one discussed in section 6.1. It provides gains in storage requirements whenever $u_t$ is of lower dimensionality than $\alpha_t$. This occurs for instance in the seasonal structural model when the number of seasons is greater than 3. Furthermore, as previously noted, it does not require the evaluation of lag one error covariance matrices.

6.2.2 Estimation of Structural Models

This subsection illustrates the improved CDKF-EM algorithm with some structural models. A factor influencing the convergence of the EM algorithm is the initial estimate of the parameter. It is desirable that a consistent and easily computable estimate be employed. Harvey and Todd (1983) and Harvey (1989, p56) have suggested the construction of consistent estimates of the unknown parameters in structural models through the consideration of the autocovariance function. They state that the autocovariance function of $(1 - L)(1 - L^s) y_t$, where L is the usual lag operator, s is the number of seasons and $\sigma_j^2$, $j = 0, 1, 2, 3$ are as defined previously in this Chapter, is

\[ \gamma(0) = 4\sigma_0^2 + 2\sigma_1^2 + s\sigma_2^2 + 6\sigma_3^2 \]
\[ \gamma(1) = -2\sigma_0^2 + (s - 1)\sigma_2^2 - 4\sigma_3^2 \]
\[ \gamma(2) = (s - 2)\sigma_2^2 + \sigma_3^2 \]
\[ \gamma(i) = (s - i)\sigma_2^2, \quad i = 3, \ldots, s - 2 \]
\[ \gamma(s - 1) = \sigma_0^2 + \sigma_2^2 \]
\[ \gamma(s) = -2\sigma_0^2 - \sigma_1^2 \]
\[ \gamma(s + 1) = \sigma_0^2 \]
\[ \gamma(i) = 0, \quad i \geq s + 2 \]

Therefore particular estimates of $\sigma_j^2$ are obtained upon solving any set of four autocovariance equations listed above.
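The same moment-matching idea is easiest to see in the random walk plus noise model, where the analogous equations for $(1 - L) y_t$ reduce to $\gamma(0) = \sigma_1^2 + 2\sigma_0^2$ and $\gamma(1) = -\sigma_0^2$. The sketch below solves these two equations; the function name `rwm_moment_estimates` is hypothetical, and the inputs 57.23 and -31 are the sample autocovariances reported later in this section for the purse snatchings series.

```python
def rwm_moment_estimates(gamma0, gamma1):
    """Solve gamma(0) = s1 + 2*s0 and gamma(1) = -s0 for (s0, s1),
    where s0 is the observation noise variance and s1 the level variance."""
    sigma0_sq = -gamma1
    sigma1_sq = gamma0 + 2.0 * gamma1
    return sigma0_sq, sigma1_sq


if __name__ == "__main__":
    s0, s1 = rwm_moment_estimates(57.23, -31.0)
    # The level variance estimate comes out negative for these inputs.
    print(round(s0, 2), round(s1, 2))
```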
A serious flaw is that there is no guarantee of non-negative solutions. To get around this problem, we may substitute a small positive number for each negative estimate of $\sigma_j^2$. We will employ this strategy (labelled C) in addition to strategies A (which uses the reported solutions) and B (which uses naive estimates) to initiate the EM algorithm in the next 3 applications.

We now report the results of applying this new version of the CDKF-EM algorithm in the first instance to structural models that have been estimated by Harvey (1989) and Harvey and Peters (1990). In the second instance, we apply the CDKF-EM algorithm to structural models considered (but not estimated) by West and Harrison (1989). In the applications below, the EM algorithm is stopped when the increase in the $\sigma^2$-concentrated log-likelihood is less than a small power of 10.

Estimation of variability in purse snatchings. Harvey (1989, p89) uses a random walk plus noise model (RWM) for a time series of reported purse snatchings in the Hyde Park area of Chicago. He reports estimates $\hat\sigma^2 = 24.79$ and $\hat h_1 = 0.4557$. With these as starting points in the new EM algorithm (strategy A) we could not attain a higher log-likelihood (= -554.264). For strategy B we evaluated the lag 0 and lag 1 covariances of $(1 - L) y_t$ as 57.23 and -31, thereby implying an estimate of $\sigma_1^2$ of -4.77; strategy B therefore initiates the EM algorithm with $h_1 = 0.01$. After 130 iterations, we obtain a log-likelihood of -554.264, $\hat\sigma^2 = 24.8347$ and $\hat h_1 = 0.4537$. Finally strategy C initiates the EM algorithm with $h_1 = 1$. After 26 iterations, we obtain a similar log-likelihood with $\hat\sigma^2 = 24.7763$ and $\hat h_1 = 0.4557$. For this dataset, the CDKF-EM algorithm converges in all instances to a ridge of local maxima.

Estimation of seasonality in air travel. Box and Jenkins (1970, p531) provide a dataset containing the number of monthly international airline departures for the period January 1949 to December 1960.
This dataset is a popular benchmark test in time series analysis. Harvey (1990, p93-94) and Harvey and Peters (1990) aggregate the data into 48 quarterly observations and thereafter apply the log transformation to them.

In the tables below, the first column lists the starting points employed by strategies A, B and C. The second and third columns respectively list the $\sigma^2$-concentrated log-likelihoods based on the initial and final estimates of the parameters; the fourth column contains the final parameter estimates and the last column reports the number of iterations required for convergence.

The results in Table 6.4 relate to Harvey (1990) who estimates the QBSM using all the 48 observations.

Start. points              Start      Final      Solutions                                           Number of
$h_1, h_2, h_3$            $\lambda$  $\lambda$  $\hat\sigma^2, \hat h_1, \hat h_2, \hat h_3$          iterations
A : 22.19, 1, 11           121.88     123.22     6.88 x 10^-7, 29.9946, 0.8138, 10.7035              110
B : 15, 15, 15             115.74     123.21     1.37 x 10^-7, 66.9838, 2.0305, 23.9407              185
C : 3.52, 1, 1.24          119.76     123.21     2.83 x 10^-6, 14.7877, 0.3258, 5.2631               301

Table 6.4: Estimation results for airline departures data (I)

Harvey and Peters (1990), on the other hand, only employ the first 40 observations for estimation purposes. They report solutions obtained upon using four estimation methods, namely (1) TD, maximization of the time domain prediction error decomposition form of the likelihood function using the Gill-Murray-Pitfield algorithm, (2) EM, the EM algorithm as discussed in section 6.1 but modified to incorporate a line search in order to speed up convergence and using a stopping criterion based on differences in log-likelihoods, (3) EM*, same as (2) but using a different stopping criterion which is based on differences between prediction error variances and (4) FD, maximization of the frequency domain form of the likelihood function using the Gill-Murray-Pitfield algorithm.

Start. points              Start      Final      Solutions                                           Number of
$h_1, h_2, h_3$            $\lambda$  $\lambda$  $\hat\sigma^2, \hat h_1, \hat h_2, \hat h_3$          iterations
A (TD) : 13, 1, 5.77       101.71     102.31     2.17 x 10^-6, 18.24, 0.84, 6.20                     99
A (EM) : 14.35, 1, 4.47    102.16     102.31     2.41 x 10^-6, 17.31, 0.82, 5.89                     96
A (EM*) : 9.8, 4.11, 1     90.43      102.32     2.22 x 10^-6, 18.07, 0.74, 6.11                     236
A (FD) : 16.63, 1, 5.48    102.27     102.31     1.95 x 10^-6, 19.26, 0.88, 6.54                     88
B : 10, 10, 10             97.79      102.32     4.89 x 10^-7, 38.32, 2.03, 13.09                    140
C : 9.1, 1.24, 1           95.16      102.31     3.17 x 10^-6, 15.14, 0.59, 5.11                     240

Table 6.5: Estimation results for airline departures data (II)

In both Tables 6.4 and 6.5, the estimate of $h_2$ is relatively small compared to the estimates of $h_1$ and $h_3$. The same situation occurs in the next two applications (see Tables 6.6 and 6.7). Ledolter explains this phenomenon in the discussion of Harvey (1984) as follows: structural models (apart from the random walk model) can be expressed as ARIMA models with MA coefficients lying on the boundary of the invertibility region and it is "structural components with small variances that introduce moving operators in the equivalent ARIMA model that are close to the invertibility boundary".

West and Harrison (1989) and Ng and Young (1990) discuss subjective interventions in state-space models, specifically structural models. Following their Bayesian approach, the first set of authors fix the unknown parameters of the basic structural model a priori rather than estimate them. The second set of authors assume zero variance for the level and slope components of the state except at intervention points. Both papers illustrate their methods with two applications. We have a two-pronged interest in the work of these researchers: (i) provide mle's for the unknown parameters of these structural models under the assumption of no data irregularities and (ii) provide diagnostics based on the models estimated in (i) and thereafter incorporating the necessary interventions.
The latter area of work is covered in the next Chapter.

Estimation of trend in tobacco sales. West and Harrison (1989) employ a BSM to model standardized monthly total sales of tobacco products by a major company in the UK for the period 1955-1959. In this application, we ignore the effects of possible outliers and structural breaks in the model. Table 6.6 lists the results obtained upon applying the CDKF-EM algorithm to this dataset.

Start. points      Start      Final      Solutions                             Number of
$h_1, h_2$         $\lambda$  $\lambda$  $\hat\sigma^2, \hat h_1, \hat h_2$      iterations
B : 0.2, 0.2       -665.42    -659.93    540.80, 0.9357, 0.0626                124
C : 1, 1           -678.31    -659.92    539.67, 0.9402, 0.0600                105

Table 6.6: Estimation results for tobacco products sales data

These results tell us that the level component in the state accounts for more variability in the observations than the slope component. In fact in our work in diagnostic testing in Chapter 7, we attribute the cause of data irregularities in this model to shifts in the mean level.

Estimation of seasonality in UK weddings. West and Harrison (1989) posit a QBSM for the quarterly number of UK weddings for the period 1965-1970. Ignoring possible data irregularities, we obtain the estimates for the parameters of this model listed in Table 6.7.

Start. points              Start      Final      Solutions                                        Number of
$h_1, h_2, h_3$            $\lambda$  $\lambda$  $\hat\sigma^2, \hat h_1, \hat h_2, \hat h_3$       iterations
B : 1, 1, 1                -174.25    -150       0.0582, 0.5035, 0.0933, 38.1495                  181
C : 0.01, 0.03, 0.052      -166.08    -150       0.1865, 0.0888, 0.5, 21.30                       287

Table 6.7: Estimation results for UK weddings data

The interesting finding in the UK weddings dataset is the wide variability of the seasonal effects. West and Harrison (1989) attribute this to the abolition of a tax incentive which used to affect the timing of weddings. This will be considered in more detail in the next Chapter.

Remarks

1. The results from the three seasonal models considered in this section suggest that the likelihood function surface is flat.
This is attested by the multiplicity of solutions. These findings are in line with those observed by Laird et al. (1987) who employ the EM approach to estimate variance components models, of which the structural model is one.

2. The relatively high number of iterations required for convergence of the CDKF-EM algorithm does not necessarily constitute a drawback. For instance, scoring methods require fewer iterations in the neighbourhood of a stationary point of the likelihood function but each of these iterations is very involved, usually requiring several passes of the KF to compute first and second derivatives of the likelihood with respect to the unknown parameters. For instance Watson and Engle (1983) find the performance of the EM algorithm and the scoring method to be comparable in an application involving vector observations.

6.3 Summary

This Chapter considered maximum likelihood estimation in the SSM using an EM approach. We showed a general method for estimating unknown time invariant system matrices in the SSM. We also developed an efficient CDKF-EM estimation method for the estimation of error covariance matrices in the SSM. This novel approach does not require the computation of lag one state error covariance matrices. The preceding chapters dealt with the prediction aspects of the SSM. Therefore, following the guidelines set forward by Box and Jenkins (1970), it remains to cover the topic of model-fitting or diagnostic testing in order to complete the statistical analysis of the SSM. This is the focus of the following Chapter.

6.4 Appendix

Proof of Theorem 6.1 The proof makes use of the fact that the innovation vector $e = (e_1; \ldots; e_n)$ generated by the KF has the same information content as the observation vector $y = (y_1; y_2; \ldots; y_n)$. Therefore,

\[ \mathrm{Pred}(u_t \mid y) = \mathrm{Pred}(u_t \mid e) = \mathrm{Pred}(u_t \mid e_t; e_{t+1}; \ldots; e_n) = \sum_{j=t}^{n} \mathrm{Cov}(u_t, e_j)(\sigma^2 D_j)^{-1} e_j \]

Using Lemma 5.3, $e_t = Z_t(\alpha_t - a_t) + G_t u_t$ and for $j = t + 1, \ldots, n$,

\[ e_j = G_j u_j + Z_j \{ J_{j-1} u_{j-1} + L_{j-1}(\alpha_{j-1} - a_{j-1}) \} \]
\[ = G_j u_j + Z_j \{ J_{j-1} u_{j-1} + L_{j-1} J_{j-2} u_{j-2} + L_{j-1} L_{j-2} (\alpha_{j-2} - a_{j-2}) \} \]
\[ = G_j u_j + Z_j \{ J_{j-1} u_{j-1} + L_{j-1} J_{j-2} u_{j-2} + \cdots + (L_{j-1} L_{j-2} \cdots L_{t+1}) J_t u_t + (L_{j-1} L_{j-2} \cdots L_t)(\alpha_t - a_t) \} \]

After noting that $\mathrm{Cov}(u_t, \alpha_j - a_j) = 0$ for all t and $j \leq t$, we obtain

\[ \mathrm{Pred}(u_t \mid y) = G_t' D_t^{-1} e_t + \sum_{j=t+1}^{n} (Z_j L_{j-1} L_{j-2} \cdots L_{t+1} J_t)' D_j^{-1} e_j \]
\[ = G_t' D_t^{-1} e_t + J_t' r_t \]
\[ = v_t, \text{ as asserted} \]

Finally,

\[ \mathrm{Mse}(u_t \mid y) = \mathrm{Cov}(u_t) - \sum_{j=t}^{n} \mathrm{Cov}(u_t, e_j)(\sigma^2 D_j)^{-1} \{\mathrm{Cov}(u_t, e_j)\}' \]
\[ = \sigma^2 \Big( I - G_t' D_t^{-1} G_t - J_t' \Big\{ \sum_{j=t+1}^{n} (Z_j L_{j-1} L_{j-2} \cdots L_{t+1})' D_j^{-1} (Z_j L_{j-1} L_{j-2} \cdots L_{t+1}) \Big\} J_t \Big) \]
\[ = \sigma^2 ( I - G_t' D_t^{-1} G_t - J_t' R_t J_t ) \]
\[ = \sigma^2 (I - V_t) \]

This asserts the Theorem.

Chapter 7

Residual Analysis in the State Space Model

Time series models, especially those arising in socioeconomic applications, are prone to data irregularities such as discordant observations, or outliers as they are commonly called, and structural breaks. The exercise of detecting these unanticipated or extraordinary events, generally dubbed residual analysis or diagnostic testing, is now firmly entrenched as an essential and integral part of any statistical modelling. Residual analysis allows us to revise the statistical model under consideration and consequently it enhances the various facets of statistical inference, namely parameter estimation, prediction and tests for goodness of fit.

The residual analysis literature is extensive in the case of the linear regression model where the observations are mutually independent. Foremost publications are the established textbooks of Belsley et al. (1980) and Cook and Weisberg (1982). The latter authors introduce three types of residuals and discuss their uses in the detection of outliers and influential observations. For the SSM defined by equations (2.1)-(2.2), these residuals are defined as follows,

1. Ordinary or Signal residuals

\[ y_t - \mathrm{Pred}(X_t\beta + Z_t\alpha_t \mid y_1, \ldots, y_n) = Z_t \{ \alpha_t - \mathrm{Pred}(\alpha_t \mid y_1, \ldots, y_n) \} + G_t u_t \]

2. Deleted or Leave-one-out residuals

\[ y_t - \mathrm{Pred}(y_t \mid y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_n) \]
\[ = Z_t \{ \alpha_t - \mathrm{Pred}(\alpha_t \mid y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_n) \} + G_t \{ u_t - \mathrm{Pred}(u_t \mid y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_n) \} \]

3.
Recursive or Innovation residuals

\[ e_t = y_t - \mathrm{Pred}(y_t \mid y_1, \ldots, y_{t-1}) = Z_t \{ \alpha_t - \mathrm{Pred}(\alpha_t \mid y_1, \ldots, y_{t-1}) \} + G_t \{ u_t - \mathrm{Pred}(u_t \mid y_1, \ldots, y_{t-1}) \} \]

The ordinary and deleted residuals are often outputted by standard regression packages. Brown et al. (1975) demonstrate the usefulness of the recursive residuals in assessing the constancy of the regression parameter in the context of cross-sectional regression analysis.

The ideas pertaining to the above residuals clearly extend to dynamic linear models but they become more intricate due to the dependent nature of the observations (or more precisely the states). The latter characteristic also makes the leave-one-out residual less useful. This point is emphasised in a recent paper dealing with diagnostics for ARIMA fitting of time series data where Bruce and Martin (1989) state that "the dependency aspect of time series data gives rise to a smearing effect, which confounds the diagnostics for the coefficients ..." and thereafter propose a "leave-k-out" diagnostics approach to deal with patches of outliers.

Carrying out residual analysis via a linear regression model approach has the drawback of being computationally demanding since the evaluation of the residuals requires the inversion of error covariance matrices with dimensions equal to the size of the data. The execution of this exercise via a SSM approach is however more attractive from a computational standpoint. For instance, innovations are automatically generated by the KF, or by the DKF after further algebraic manipulation. Early works in residual analysis within the SSM context made exclusive use of the innovations; see Harvey (1990, p256-260) who surveys their uses in tests of misspecification for serial correlation, non-linearity, heteroscedasticity and normality in the SSM. Innovations have the nice statistical property of uncorrelatedness but suffer from the fact that they may not convey as much information content as the signal and deleted residuals.
These alternative residuals can be generated in an efficient fashion using for example the recursive algorithms derived independently by De Jong (1988b, 1989) in the vector data case and Kohn and Ansley (1989) in the scalar data case.

The residuals described above confound the observation errors ($G_t u_t$) and the errors incurred in the estimation of the states. Therefore they are unlikely to distinguish between outliers and structural breaks. The detection of these data irregularities is more satisfactorily addressed via the separate studies of estimators of $G_t u_t$ and the state or transition errors $H_t u_t$. Harvey and Koopman (1991) advocate and study the use of these residuals for goodness-of-fit tests and diagnostic checks.

The SSM specification employed in this thesis attracts two benefits for residual analysis. First, estimates of $G_t u_t$ and $H_t u_t$ can be generated in a unified fashion from estimators of the disturbance vector $u_t$. Second, interventions in the SSM to incorporate data irregularities are easily carried out via the use of the regression matrices X and W.

This Chapter emphasises the use of the predicted residuals $v_t = \mathrm{Pred}(u_t \mid y_1, \ldots, y_n)$ for exploratory residual analysis. Recursive formulae for the generation of $v_t$ and $\mathrm{Mse}(v_t)$ have already been provided in the previous Chapter (Theorem 6.1). Other predictors of $u_t$ are less useful. For instance, observe that the random variable $u_t$ conditional on $y_1, \ldots, y_{t-1}$ has mean 0 and covariance matrix $\sigma^2 I$ and therefore $\mathrm{Pred}(u_t \mid y_1, \ldots, y_{t-1})$ is uninformative from the standpoint of diagnostics. Furthermore, as argued above, leave-one-out residuals are almost similar in characteristics to signal residuals in the presence of dependent data. These points will be apparent in the illustrations presented in the final section of the Chapter.
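To make the recursions of Theorem 6.1 concrete, the following is a minimal sketch for the scalar random walk plus noise model with $Z = T = 1$, $G = (0\ 1)$ and $H = (h_1\ 0)$. It assumes a proper (non-diffuse) initial state with mean `a1` and scaled variance `P1`, so the ordinary KF applies rather than the DKF or CDKF; the function name and these defaults are illustrative assumptions and not the thesis's implementation.

```python
def rwm_disturbance_smoother(y, h1, a1=0.0, P1=1.0):
    """Return lists (v, V) with v[t] = Pred(u_t | y) and
    V[t] = I - Mse(u_t | y) / sigma^2 for the random walk plus noise model."""
    n = len(y)
    e, D, K = [], [], []
    a, P = a1, P1
    # Forward pass: Kalman filter with variances scaled by sigma^2.
    for t in range(n):
        e.append(y[t] - a)            # innovation, Z = 1
        D.append(P + 1.0)             # D = Z P Z' + G G'
        K.append(P / D[t])            # K = (T P Z' + H G') / D, with H G' = 0
        a = a + K[t] * e[t]
        P = P * (1.0 - K[t]) + h1 * h1  # P = T P L' + H J'
    # Backward pass: r and R hold r_t and R_t, starting from r_n = 0, R_n = 0.
    v, V = [None] * n, [None] * n
    r, R = 0.0, 0.0
    for t in reversed(range(n)):
        J = (h1, -K[t])               # J = H - K G
        v[t] = (J[0] * r, e[t] / D[t] + J[1] * r)
        V[t] = ((J[0] * J[0] * R, J[0] * J[1] * R),
                (J[1] * J[0] * R, 1.0 / D[t] + J[1] * J[1] * R))
        L = 1.0 - K[t]                # L = T - K Z
        r = e[t] / D[t] + L * r       # r_{t-1} = Z' D^{-1} e_t + L' r_t
        R = 1.0 / D[t] + L * L * R    # R_{t-1} = Z' D^{-1} Z + L' R_t L
    return v, V
```

The first component of `v[t]` estimates the level disturbance and the second the observation disturbance, so $G v_t$ and $H v_t$ can be read off directly.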
A recursive algorithm for the generation of the leave-one-out residuals is provided by De Jong (1988b, 1989).

In the first section of this Chapter, we demonstrate that the results of De Jong (1988b, 1989) and Kohn and Ansley (1989) concerning signal residuals are in fact consequences of Theorem 6.1. Since the prediction residuals $v_t$ are serially correlated, we consider, in section 2, whitening or orthogonalizing $\{v_t\}$ in a backward direction and conclude that this orthogonalized sequence corresponds, up to a scale factor, to the innovations. This tells us that innovations are as statistically efficient as the backward orthogonalized prediction residuals in statistical tests for goodness-of-fit of the SSM. In section 3, we apply residual analysis to both the tobacco sales dataset and the UK weddings dataset which were discussed in the last Chapter. We illustrate the detection of outliers and points of structural breaks via simple graphical devices.

7.1 Connection with the Literature

In many applications, it is only necessary to consider specific aspects of $v_t$, for example $G_t v_t$ and $H_t v_t$. These have lower dimensionalities than $v_t$ and hence it is worthwhile to specialize Theorem 6.1 to these cases.

We now demonstrate how the results of De Jong (1988b, 1989) and hence Kohn and Ansley (1989) concerning the signal residuals follow from Theorem 6.1. The following Theorem due to De Jong (1989) is reexpressed with notation consistent with this thesis; see also Koopman (1991) for a closely connected result.

Theorem 7.1 (De Jong, 1989) Consider the SSM defined by $y_t = X_t\beta + Z_t\alpha_t + G_t u_t$ and $\alpha_{t+1} = W_t\beta + T_t\alpha_t + H_t u_t$ where $G_t H_t' = 0$ and the $u_t$'s are mutually and serially uncorrelated zero-mean disturbance vectors with covariance matrix $\sigma^2 I$. Let $y = (y_1; \ldots; y_n)$. Then

\[ \mathrm{Pred}(G_t u_t \mid y) = G_t G_t' m_t \quad \text{and} \quad \mathrm{Mse}(G_t u_t \mid y) = \sigma^2 (G_t G_t' - G_t G_t' M_t G_t G_t') \]

where

\[ m_t = D_t^{-1} e_t - K_t' r_t \quad \text{and} \quad M_t = D_t^{-1} + K_t' R_t K_t \]

with $r_t$ and $R_t$ as defined in Theorem 2.2.

Proof.
We first establish a useful identity, namely $Z_t P_t L_t' = -G_t J_t'$, where $L_t = T_t - K_t Z_t$ and $J_t = H_t - K_t G_t$. To see this, manipulate a couple of equations making up the KF to obtain

$$Z_t P_t L_t' = Z_t P_t (T_t - K_t Z_t)' = D_t K_t' - (D_t - G_t G_t') K_t' = -G_t (H_t - K_t G_t)' = -G_t J_t'.$$

Using the expression given in Theorem 6.1 for $v_t$, it follows that

$$\begin{aligned}
G_t v_t &= G_t G_t' D_t^{-1} e_t + G_t J_t' r_t \\
&= G_t G_t' D_t^{-1} e_t - Z_t P_t (T_t - K_t Z_t)' r_t \\
&= G_t G_t' D_t^{-1} e_t - \{D_t K_t' - (D_t - G_t G_t') K_t'\} r_t \\
&= G_t G_t' (D_t^{-1} e_t - K_t' r_t) \\
&= G_t G_t' m_t .
\end{aligned}$$

The third equality follows from the KF (see Chapter 2), taking into account that $G_t H_t' = 0$. Finally, $\sigma^{-2}\mathrm{Mse}(G_t v_t) = \sigma^{-2} G_t\,\mathrm{Mse}(v_t)\,G_t'$, which by Theorem 6.1 and the identity above equals

$$\begin{aligned}
\sigma^{-2}\mathrm{Mse}(G_t v_t) &= G_t G_t' - G_t G_t' D_t^{-1} G_t G_t' - G_t J_t' R_t J_t G_t' \\
&= G_t G_t' - G_t G_t' (D_t^{-1} + K_t' R_t K_t) G_t G_t' \\
&= G_t G_t' - G_t G_t' M_t G_t G_t' .
\end{aligned}$$

These assert the Theorem.

Theorem 6.1 can also be specialized for the efficient generation of $H_t v_t$ and their associated error covariance matrices. These residuals, which are sometimes known as the smoothed auxiliary residuals, estimate the errors associated with components of the state and convey information which is usually not apparent in the innovations.

Theorem 7.2 Consider the SSM described in Theorem 7.1. Then

$$\mathrm{Pred}(H_t u_t \mid y) = H_t H_t' r_t \quad\text{and}\quad \mathrm{Mse}(H_t u_t \mid y) = \sigma^2 (H_t H_t' - H_t H_t' R_t H_t H_t').$$

Proof. Observe that $\mathrm{Pred}(H_t u_t \mid y) = H_t v_t = H_t \{G_t' D_t^{-1} e_t + (H_t - K_t G_t)' r_t\} = H_t H_t' r_t$ upon noting that $H_t G_t' = 0$. Finally, $\mathrm{Mse}(H_t u_t \mid y) = \mathrm{Cov}(H_t u_t) - \mathrm{Cov}\{\mathrm{Pred}(H_t u_t \mid y)\} = \sigma^2 (H_t H_t' - H_t H_t' R_t H_t H_t')$.

7.2 Backward Orthogonalization of Predicted Residuals

The predicted residuals $v_t$ are serially correlated since they are inhomogeneous linear combinations of $(y_1, \ldots, y_n)$ or equivalently $(e_1, \ldots, e_n)$. This property makes the use of the $v_t$'s in statistical tests for goodness-of-fit in the SSM a complicated task. This contrasts with the ease with which the (uncorrelated) innovations lend themselves to the same tests. Hence we are led to consider the idea of whitening or orthogonalising the $v_t$'s.

Theorem 7.3 Suppose that for $1 \le t \le n$, the space spanned by $(v_t; v_{t+1}; \ldots; v_n)$ coincides with the space spanned by $(e_t; e_{t+1}; \ldots; e_n)$. Then a backward orthogonalization of $v_t$ corresponds, up to a weighting matrix, to the innovations $e_t$ generated in the KF.

Proof.
The assumption in the Theorem implies

$$\begin{aligned}
v_t - \mathrm{Pred}(v_t \mid v_{t+1}; \ldots; v_n) &= v_t - \mathrm{Pred}(v_t \mid e_{t+1}; \ldots; e_n) \\
&= (G_t' D_t^{-1} e_t + J_t' r_t) - J_t' r_t \\
&= G_t' D_t^{-1} e_t .
\end{aligned}$$

Hence the backward orthogonalized version of $v_t$ corresponds, up to a scale factor, to the innovations.

Remarks

1. A sufficient condition for the Theorem to hold is that $G_t$ has full rank for all $t$.

2. The Theorem implies that innovations are as efficient as backward whitened versions of the $v_t$'s in statistical tests of goodness-of-fit.

7.3 Illustrations

Theorem 6.1 was useful for maximum likelihood estimation of parameters in the SSM. We now illustrate the Theorem in a different setting, namely exploratory residual analysis. This consists of assessing time series plots of the studentized observation residuals, $\{\mathrm{Mse}(G_t v_t)\}^{-1/2} G_t v_t$, and the studentized auxiliary residuals, $\{\mathrm{Mse}(H_t v_t)\}^{-1/2} H_t v_t$. We have chosen the final two datasets covered in the previous Chapter for the purpose of illustration.

Diagnostics for tobacco sales. The sales figures are graphed in Figure 7.1. The observations clearly suggest the presence of outliers and possibly structural breaks. The various residuals, displayed in the top two graphs of Figure 7.2, were obtained from the structural model estimated in the previous Chapter. A cursory examination of these diagnostic plots (especially the smoothed observation residuals and the auxiliary residuals) suggests the presence of one-time mean effects or outliers at Dec'55, Jan'57 and Jan'58. In particular, note the statistically significant departures of the innovations and the residuals associated with the level component from their expected value of zero at these points.

The basic structural model (model C) employed in the previous Chapter was modified to incorporate interventions at these points. This consists of defining X as indicator variables at Dec'55, Jan'57 and Jan'58.
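The intervention just described, indicator variables at the three suspect months, can be sketched in a few lines of Python. This is only an illustration: the helper names `month_index` and `indicator_matrix` are hypothetical, and the series is assumed to be monthly, starting in January 1955 with 60 observations (a guess based on the axis of Figure 7.1, not a value stated in the text).

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_index(label, start_year=1955):
    """Map a label such as "Dec'55" to a 0-based month index.

    Assumes two-digit years in the 1900s and a series that starts in
    January of start_year (both are assumptions of this sketch).
    """
    name, yy = label.split("'")
    return (1900 + int(yy) - start_year) * 12 + MONTHS.index(name)


def indicator_matrix(labels, n_obs, start_year=1955):
    """Return an n_obs x len(labels) matrix with one indicator column per label."""
    x = [[0.0] * len(labels) for _ in range(n_obs)]
    for j, label in enumerate(labels):
        x[month_index(label, start_year)][j] = 1.0
    return x


# One regression column per intervention point named in the text.
X = indicator_matrix(["Dec'55", "Jan'57", "Jan'58"], n_obs=60)
```

Each column of `X` is zero except at its own intervention date, so the corresponding regression coefficient absorbs a one-time mean effect at that month.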
The matrix $H$ has revised parameters $h_1 = 1.30$ and $h_2 = 0.08$ whereas $\hat\sigma^2 = 223.8$ (compared respectively to 0.94, 0.06 and 540.8 in the pre-intervention model). The residuals (innovations, signal and level residuals) produced by this revised model (displayed in the bottom half of Figure 7.2) look reasonable and furthermore reflect the larger variability in the data for the period Jan'58 - Dec'59.

Diagnostics for UK Weddings. West and Harrison (1989) attribute the unanticipated seasonal variations in the observations (see Figure 7.3), particularly in the first quarter of each year, to the tax benefits enjoyed upon matrimony. These benefits were abolished at the end of 1967. The diagnostic plots in the top half of Figure 7.4 clearly indicate this fact. Specifically, observe the huge residual associated with the seasonal component of the state at the first quarter of 1968.

We intervened in the model and associated a dummy regression variable with the seasonal component of the state for the first quarter of 1968. The revised model has parameters $h_1 = 0.11$, $h_2 = 0.014$, $h_3 = 1.32$ and $\hat\sigma^2 = 30.69$ (compared respectively to 0.50, 0.09, 0.06 and 38.15 in the model without any intervention). The residuals in the revised model (see bottom half of Figure 7.4), especially those arising after 1968, appear to conform to expectations. In the absence of any financial incentives, we would expect a seasonal low during the first quarter since it coincides with the winter months, and seasonal highs during the two middle quarters. From the revised model, we inferred that the abolition of the tax benefits caused a decrease of about 34,200 weddings (or a 28% drop) from the expected number in the first quarter of 1968.

7.4 Summary

We have demonstrated that the specification of the SSM employed in this thesis allows us to generate, in a unified fashion, residuals which are useful for pinpointing likely outliers and points of structural change in the SSM.
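As a rough illustration of how such residuals can be generated, the following Python sketch applies the formulae of Theorems 7.1 and 7.2 to a scalar local level model. Several things here are assumptions rather than the thesis's own implementation: the initial state is taken to be non-diffuse (so the ordinary KF is used instead of the DKF), there are no regression effects, the backward recursions for $r_t$ and $R_t$ are written in the standard form of De Jong (1989) because Theorem 2.2 is not reproduced in this Chapter, and the function name `local_level_residuals` is hypothetical.

```python
def local_level_residuals(y, g, h, a1=0.0, p1=1.0):
    """Smoothed observation and level residuals for a local level model.

    Assumed model (single source of error, Cov(u_t) = sigma^2 * I):
        y_t         = alpha_t + g * u_t[0]      so G = (g, 0)
        alpha_{t+1} = alpha_t + h * u_t[1]      so H = (0, h) and G H' = 0
    alpha_1 has mean a1 and variance sigma^2 * p1 (non-diffuse by assumption).
    All variances returned are in units of sigma^2.
    """
    n = len(y)
    e, d, k = [0.0] * n, [0.0] * n, [0.0] * n
    a, p = a1, p1
    # Forward pass: ordinary Kalman filter with Z = T = 1.
    for t in range(n):
        e[t] = y[t] - a                # innovation e_t
        d[t] = p + g * g               # D_t = Z P Z' + G G'
        k[t] = p / d[t]                # K_t = T P Z' / D_t, since H G' = 0
        a += k[t] * e[t]
        p = p * (1.0 - k[t]) + h * h   # P_{t+1} = T P L' + H J'

    # Backward pass, started from r_n = 0 and R_n = 0.
    r, big_r = 0.0, 0.0
    out = [None] * n
    for t in range(n - 1, -1, -1):
        m = e[t] / d[t] - k[t] * r                # m_t = D^{-1} e_t - K' r_t
        big_m = 1.0 / d[t] + k[t] ** 2 * big_r    # M_t = D^{-1} + K' R_t K
        out[t] = {
            "obs": g * g * m,                     # G G' m_t        (Theorem 7.1)
            "obs_mse": g * g - g ** 4 * big_m,    # G G' - G G' M_t G G'
            "level": h * h * r,                   # H H' r_t        (Theorem 7.2)
            "level_mse": h * h - h ** 4 * big_r,  # H H' - H H' R_t H H'
        }
        ell = 1.0 - k[t]                          # L_t = T - K_t Z
        r = e[t] / d[t] + ell * r                 # r_{t-1}
        big_r = 1.0 / d[t] + ell * ell * big_r    # R_{t-1}
    return out


residuals = local_level_residuals([1.0, 3.0], g=1.0, h=0.5, a1=0.0, p1=2.0)
```

Dividing each residual by the square root of $\sigma^2$ times its mean squared error gives the studentized versions plotted in the illustrations. In this small model the output can be checked directly against the conditional expectation computed from the joint covariance of the data and the disturbances.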
These residuals, unlike the innovations, are serially correlated and should therefore be interpreted with care. We have shown that the backward whitened versions of these residuals correspond (up to a weighting matrix) to the innovations generated by the KF. This implies that innovations are as efficient as these whitened residuals in statistical tests of goodness-of-fit.

[Figure 7.1: Tobacco sales data. Vertical axis from 600 to 1100; horizontal axis monthly, Jan 1955 to Jan 1960.]

[Figure 7.2 (tobacco sales diagnostics). Panels: Original Model, Innovations & Signal Residuals; Original Model, Smoothed Auxiliary Residuals (Level, Slope); Model with Intervention, Innovations & Signal Residuals; Model with Intervention, Smoothed Auxiliary Residuals (Level, Slope). Vertical axes: standardized residuals; horizontal axes monthly, Jan 1955 to Jan 1960.]
[Figure 7.3: UK weddings data. Vertical axis in thousands ('000), from 70 to 170; horizontal axis quarterly, 1965 to 1971.]

[Figure 7.4 (UK weddings diagnostics). Panels: Original Model, Innovations & Signal Residuals; Original Model, Smoothed Auxiliary Residuals (Level, Slope, Seasonality); Model with Intervention, Innovations & Signal Residuals; Model with Intervention, Smoothed Auxiliary Residuals (Level, Slope, Seasonality). Vertical axes: standardized residuals; horizontal axes quarterly, 1965 to 1971.]

Chapter 8

Epilogue

In this thesis, we have shown that the three facets of statistical analysis with the SSM, namely prediction, model-fitting and residual analysis, can be conducted in a computationally sound and efficient manner using the DKF, which is the Kalman Filter extended in order to handle diffuse effects in the SSM.

We have illustrated throughout the thesis the conceptual and computational advantages of our definition of the SSM over the standard definition employed in the literature, namely the ASSM. Recursive algorithms based on the latter have been shown to be inefficient when the states in the ASSM accommodate a regression parameter and/or initial diffuse effects, since they then require recursions of larger error covariance matrices than the DKF.
Another disadvantage of including a regression parameter in the state arises in the smoothing cycle, wherein the smoothed estimator of the regression parameter is effectively not updated since it coincides with its final estimator in the filtering cycle.

We have discussed practical issues concerning the use of the DKF, namely its initialization and its efficient implementation. We have shown how the DKF could be initialized in the general case as well as in particular instances such as when the SSM is time invariant. We have implemented a collapsing strategy in the DKF whereby columns of pertinent matrices related to diffuse initial conditions were factored out. This revised DKF, labelled the CDKF, which coincides with the KF when regression effects are absent from the SSM, has been shown to be computationally more efficient than alternative algorithms considered in the literature.

We have displayed two beneficial consequences of employing a single source of error in the SSM. First, it allowed us to implement a novel and efficient CDKF-EM approach which does not require the estimation of lag-one state error covariance matrices for the maximum likelihood estimation of the covariance matrices of the error vector in the SSM. Second, we showed how the predictor of the error vector conditional on the entire dataset can be obtained recursively. These estimates were then used to detect likely outliers and structural breaks in the SSM.

Appendix: CAPM Dataset

The dataset was kindly provided by Dr. Dilip Madan, University of Maryland. Read across and down. Entries in each column are monthly rates of return for the three assets.
Example(.00488 ; .00153; .00509), (-.01604; .02413 ; .03227) etc..00488 .00153 .00509.03378 .07872 .04534.03052 .03892 .04550.02093 .02631 .00987-.05316 -.10617 -.08731-.01749 -.01572 -.03292-.01202 -.02909 -.03807-.00321 -.01668 -.01467.07804 .05111 .06694.01214 -.01640 -.00129.02313 .05060 .02535.03810 -.00859 .00185-.04063 -.00709 -.04625-.05668 -.07087 -.08011.05346 .08264 .06045-.01239 .02177 .02360.05362 .05869 .06336.03980 .05834 .05317.00026 -.00849 -.01286.01528 .09369 .02447-.00065 .01083 .03446.01836 -.00259 -.01343-.00022 .02053 .03021.01958 .02128 .01063-.00644 -.02633 -.01106.08750 .00799 .03519-.01604 .02413 .03227-.00876 .03910 .03543.00753 -.00653 -.01043.01396 .04359 .02384-.00287 .00868 -.00263.03306 .04078 .04443.03044 .01588 .01471.05385 .02855 .04241.03832 .03008 .04395-.00229 .03824 .04389.03695 .01951 .02989.06536 .06916 .02560.04792 -.01064 .01806-.07937 -.06558 -.09000.02504 .03517 .00977.12416 .10097 .13546-.00766 -.03700 -.03178-.00132 .05658 .01680.05478 .06585 .07177-.00694 -.03556 .00030.02394 .01915 .01883.01041 .00977 .01194-.00663 .01181 -.01976-.00252 -.00407 -.00405-.00468 .00467 .01670-.01176 -.02218 -.00591.03324 .04000 .02734.00288 -.00590 - .00740-.02186 -.00538 -.01263-.03393 -.08579 -.05524-.08353 -.07456 -.07760.01378 -.00466 .02407-.00586 .00869 -.00231-.00329 .00237 .01892- .05926 -.04643 -.04989.03798 .04033 .02503.01118 -.03933 -.02236-.00200 .00302 .01749-.03827 -.07757 -.07178.03924 .00750 .04233.05504 AJ3143 .01655- .02598 - .02390- .03444-.01909 .01390 -.04414-.00498 .00770 .00199-.02873 -.00471 .00071-.07169 -.08016 -.09779-.03258 -.04971 -.05365.01665 .01728 -.01003.03491 .04262 .03144-.01304 -.02987 -.02801-.03505- .00905 - .00808.03485 -.00061 .02842.01734 .02048 .02921-.00302 .01548 .00254.02140 .04828 .04824-.01540 .01553 .00163-.00486 .01195- .01284-.02801 -.07065 -.06289.01984 .07106 .02728.00375 .01148 .03698- .04000 -.02890 -.02684-.00681 -.00701 -.01778.03477 -.01601 -.03141.01034 -.01518 .01248.01987 
.05097.01233 .00143.03547 .03677.02181 .06228.02140 .02766.02561 .05777.02117.00728.05415.02978.01519.03724136Appendix: CAPM Dataset 137.08576 .11789 .09994.02738 .08751 .03871.06433 .09556 .06151-.04199 -.03762 -.05762.00410 -.07368 -.05400.07802 .12163 .09577-.01269 -.03754 -.03446.04046 .02523 .02390.01004 -.02060 -.00440.01957 .00660 .02581-.08161 -.07171 -.03729.07821 .05345 .07560-.08882 -.08982 -.06666-.12047 -.11071 -.08238.08884 .08788 .05401-.04746 -.03388 -.01826.05034 .05716 .04601.02161 .07846 .02923-.01385 -.06055 -.04150-.02845 -.06168 -.05308.01587 .03794 .02355.01766 .00258 .01202-.00968 -.02122 .01487.02468 -.01991 .01639-.05566 -.06866 -.01700-.06642 -.06723 -.03791.07761 .09048 .07028-.0 1144 -.04287 .02047-.00687 .05354 .00202-.07467 -.03801 -.01158-.05316 -.10298 -.06862.20065 .06390 .21557.15402 .19543 .08092.01600 .05121 .08181-.07351 -.04895 -.07783.04024 .07696 .06146.11204 .17004 .14620-.01916 .00492 -.01265.00578 -.01767 -.01495-.00123 -.03328 -.02646-.00356.00238 -.00381-.04347 - .05871 -.03825.00057 -.00641 -.00260.00738 -.00395 -.00186-.03 117 -.03256 -.03703.01883 .00341 .02061.02933 .01418 .02022.04321 .02152 .05921-.07823 -.04737 -.05577.00013 -.00600 -.00227.08002 .03335 .05140-.05321 -.04181 -.01976.10130 .05973 .07073-.06444 -.07420 -.05777.05562 .07688 .02984.06190 .06513 .03061.01776 .02397 .00731-.04848 -.03633 -.02824.05350 .08663 .04744.01049 .00316 -.00681.00295 .0203 1 .05592-.00446 -.00223 .02148.06204 .03414 .03575.05275 .07311 .04180- .05875 -.05895 -.00979-.02465 -.02966 -.00280-.01715 -.03953 -.03121-.11437 -.16418 -.11602.01101 -.00359 .01046-.08675 -.00969 -.02994-.10610 -.06650 -.09100-.01773 -.03795 -.04117-.01292 .06492 .09249.07187 .04990 .03656-.05480 -.03188 -.00615.04050 .03322 .03347.00994 .03371 -.00557-.01188 -.01690 -.01819.00278 -.01268 -.01229.02182 .00406 -.03885.02187 .04189 .06165.03894 .03532 .01706.01146 .04519 .03042.04100 .01968 .04843.00167 .00429 .00699.05541 -.00814 .00648.07839 .05467 
.04417-.04785 -.04135 -.01939.02161 .03755 .02940-.08081 -.06860 -.08060-.01448 .00070 -.01293-.02572 -.03004 -.00413-.00295 .01695 -.01166-.03412 -.04836 -.03248.03772 .04152 .04197.08108 .06549 .06844.06776 .04626 .03572.00132 -.00480 .00073-.01083 -.00079 -.00814.06075 .08147 .08777.03640 .02434 .00923-.03220 -.03823 -.01816-.00719 -.01242 -.01785-.00634 .01638 .01743.00247 -.03338 .01001.01495 -.03433 -.00290.07600 .08756 .04532.04143 -.03505 .01387-.04428 -.02153 - .00342-.05663 -.03825 .00311-.03900 -.12378 -.13364-.01950 -.03489 -.04007.01155 .07801 .04573.07908 .09545 .02840-.05120 -.03301 -.06751-.00022 -.00808 -.02513.03755 .03176 .02528.05937 .06495 .02745.0 1784 .03969 .02155.08856 .08683 .06774Appendix: CAPM Daiaset 138-.04467 -.04224 -.07186.02260 .02803 -.01452.00137 -.02982 -.03608-.04581 -.04159 -.03653-.06845 -.05918 -.04970.08680 .10346 .09831.04755 .06836 .07846-.10496 -.11840 -.09411.04939 .04795 .05707.01981 .01795 -.00873.03206 .00990 .01048-.07246 -.08609 -.08062.03404 .10487 .04552.06077 - .00218 .03377.02749 .12732 .09354.03299 -.00331 -.01642-.05670 -.04634 -.00859-.04553 .02893 -.01914-.00292 -.03099 -.02681.06408 .02972 .03664- .03266 -.00285 -.02933.04914 .05548 .04331-.04256 -.00453 -.01856.14788 .16851 .11336.00280 .04122 .04850.09146 .08260 .06611-.03127 -.04357 -.02216-.02396 -.01691 -.01642.00616 -.04841 -.02854-.01160 .00939 -.00634-.01098 -.00468 -.03757.00518 .01440 .00969.08983 .10779 .06663.02094 -.03234 -.01243-.02228 .01388 .02802.07113 .02856 .03426.02569 .01368 .01490-.03602 -.00694 -.01509-.04740 -.08905 -.05804.04393 .05273 .07335-.02559 -.02601 -.00299-.00744 -.00192 -.04461-.02434 -.01408 -.00400.04768 .0349 1 .04258-.00163 -.00023 -.03876.00535 .02346 .02888.04396 .02872 .02753.01901 .02536 .01811-.03785 -.05004- .03385.00690 -.02123 -.02298.04549 .06756 .08332.06894 .02200 .04872.00016 -.05645 -.00775.06792 .05224 .07461.01182 .02858 .00888.10915 .09398 .06943.00780 .03459 .03477.01606 .04328 -.00204-.05553 -.07644 
-.05209.04925 .02293 .05434- .04998 - .05334- .03777-.05096 - .03713 -.04824.12889 .14373 .14028.04919 .10020 .03990.04167 .03691 .02399-.00614 .00897 .00906.01568 -.02049 .02952.03691 .05742 .03401-.03884 -.06501 -.03690-.05965 -.07030 -.06563.11213 .12325 .10835-.00930 -.03926 -.00139.02262 -.00023 .00862.05626 .05513 .06738-.00309 -.01907 -.00914.06854 .07975 .08742.07963 .08708 .09132.05076 .01318 .06446.09072 .07577 .06013.00046 .03024 .02335-.00703 -.01835 -.00879.05775 .05226 .03151-.00203 .00103- .01689-.00286- .00536 .00713.05136 .06597 .03842-.01164 -.01443 -.01531.00451 -.01772 -.01335.01491 .00304 .01715.07242 .06531 .06813.07885 .04290 .03175-.00381 .00176 .00860.00871 .04504 .04777-.08970 -.09564 -.10789.06707 .01880 .01899.04016 .03769 .02494-.02087 -.01321 -.00811.06955 .09137 .07477.01715 -.06166 -.01682-.0273 7 -.05894 -.06538-.04413 -.00628 -.02378-.01197 -.00045 -.03072-.03189 .00852 -.02857.04524 -.01120 -.00262-.02211 .02439 .00840.06484 .01443 .02524.01555 .06979 .01632.00582 .04651 .00502-.00883 .00552 -.02902.01581 .01272 .03733.01397 .03906 .01896.03535 -.01558 -.03242.03756 .03901 .02083.00009 -.04152 .00087.02621 .01885 .01066-.04015 -.04467 -.03129.04794 .06467 .04023.06504 .07057 .07876.01509 -.00566 .02751-.09890 -.06565- .08089-.01890 -.01764 -.01138Appendix: CAPMDataset 139Read across and down. Entries are monthly rate of returns for the CRSP equally-weightedand value-weighted market indices. 
Example : (.00958 ; .04149), (.01120 ; .02747) etc..00958 .04149 — .01120 .02747 — .00490 .01280 — .03917 .02559.02004 .00758 .00121 .00450 .03427 .03015 -.01160 -.01508-.04475 -.04589 .01514 .02322 .01822 .01486 .02902 .02124-.06726 -.03910 .01246 .00813 -.01212 -.02447 -.01548 -.01919.03328 .02533 .02313 .02136 -.02077 -.01807 .03004 .03495-.05811 -.05880 -.00457 -.02299 .04793 .04659 .04811 .03740.06446 .08239 .03709 .05986 .03096 .05042 .00586 .00941.02589 .04176 -.02848 -.04239 .03070 .01154 .02716 .02112-.01881 -.02979 .02714 .02110 .04609 .04630 .00067 -.00351-.03626 -.00789 .01904 .0 1544 -.00564 -.00594 -.06273 -.06806-.08452 -.09804 -.08265 -.08497 .06608 .06378 .02287 .02800-.05007 -.06058 .00418 -.02184 .11182 .13783 .01306 -.00914.05129 .07819 -.02253 -.01477 .03421 .02083 .04789 .03845.02038 .03268 -.01801 -.01577 -.00185 -.00968 .05408 .05094-.01264 -.01950 .02937 .01633 -.00508 -.00766 .02273 .00772.02590 .02024 .01766 .02661 .01767 .03179 .00432 -.00348.01690 .01230 .01624 .01530 .01975 .02772 -.01140 -.00903.03052 .03699 .00953 .01720 .00247 .00074 .00384- .00693.03785 .05926 .00710 .02795 -.01069 .00512 .03407 .03590-.00434 -.00784 -.05035 -.07439 .01711 .02900 .03009 .04497.03223 .03212 .02878 .04744 .00165 .02972 .01233 .03212.01005 .04228 -.01024 .01108 -.02131 -.02182 .02371 .03372-.05109 -.07242 -.01112 -.04999 -.01208 -.01208 -.07461 -.09325-.00668 -.01355 .04583 .01299 .01666 .03814 .00444 .01643.08330 .14337 .00992 .02088 .04301 .05193 .04196 .03708-.04143 -.01790 .02356 .05147 .04825 .07033 -.00604 .00298.03328 .03782 -.02801 -.03594 .00770 .00690 .03098 .05574-.03890 -.00343 -.03130 -.04138 .00681 -.00421 .08971 .11645.02336 .05951 .01181 .01845 -.02166 -.02687 .01654 .02810.04174 .05859 .01047 .01620 .05739 .07289 -.03696 -.01562-.00722 -.00923 -.05026 -.07071 .03117 .01889 .02127 .01059.00304 -.00257 -.06235 -.09557 -.06301 -.07997 .05017 .05010-.02231 -.01632 .05504 .07636 -.03151 -.04740 -.01766 -.04501-.07634 -.05607 .05956 .05238 
-.00269 -.00903 -.09977 -.12998-.06159 -.08786 -.05059 -.07009 .07460 .07432 .04980 .06828.04258 .08838 -.01576 -.04624 .05260 .03279 .06173 .08773.04965 .09963 .01475 .02573 .04401 .05361 .03396 .03403-.03662 -.04224 .00428 -.00845 -.04064 -.04601 .04228 .05205-.00568 -.01265 -.03962 -.05032 .00018 -.02168 .09056 .11232.02391 .06023 .03049 .03000 .00922 .00171 .00628 .00536.01724- .00908 -.02185 -.03300 -.00188 -.02258 .03791 .02915Appendix: CAPM Dataset 140-.00655 -.02543 — .01038 -.00174 — .04951 .06556 — .01105 -.01313-.02545- .05051 - .04021 -.06447 - .00534 - .02362 - .04635 -.06336-.01876 -.06291 -.00852 -.03135 .05210 .10438 -.03026 -.03454.05259 .10404 -.00139 -.00911 -.11611 -.16980 .01522 -.01002-.00057 .10283 .00318 .01038 -.02419 -.01766 -.04331 -.05963-.03503 -.07316 -.01896 -.03387 -.07276 -.04621 -.08537 -.09296-.11028 -.07791 .16800 .11779 -.04017 -.04206 -.02350 -.06631.13483 .30024 .06037 .03453 .02901 .07982 .04685 .03106.05499 .06829 .05150 .07621 -.06358 - .04080 - .02055 -.04364-.03606 -.04109 .06086 .03704 .03128 .03112 -.01018 .00048.12524 .19089 .00098 .07222 .02971 .01084 -.01099 -.01497-.00907 - .01864 .04749 .04847 -.00737 -.00001 .00053 -.01354.02572 .02195 -.02128 -.02419 .00515 .03402 .05806 .09372-.03965 -.00069 -.01677 -.01623 -.01079 .00368 .00384 .01592-.01238 -.00540 .05119 .06241 -.01548 -.00838 -.01412 -.01456.00040 .00639 -.03944 -.02840 .04234 .07877 .00553 .00341-.05739 -.03637 -.01217 .00650 .03182 .06494 .08347 .07615.01896 .04454 -.01323 -.00604 .05683 .05966 .03755 .06751-.00660 -.00768 -.10221 -.16731 .03170 .04482 .01652 .01170.04721 .08699 -.02897 -.02915 .06199 .08643 .00680 .01737-.01488 -.00773 .04467 .05410 .01532 .02813 .06299 .07851-.00037- .00908 -.06925 -.10039 .06059 .07540 .02280 .04380.06193 .06412 -.00344 -.03120 -.10759 -.13698 .04880 .06376.05840 .07789 .03345 .03966 .06848 .09627 .02003 .04153.02906 .02628 .01965 .01800 .10769 .05668 -.03369 -.02104-.04353 -.00337 .01833 .01956 .04297 .08058 -.01642 
.01191
.00814 .01993 -.00789 -.00412 .00075 -.01852 -.05598 -.06503
-.05570 -.06441 .05729 .07096 .04585 .04131 -.02781 -.02020
-.02207 -.02273 -.04924 -.03775 -.00833 -.00122 .04188 .04788
-.02911 -.03329 -.01988 -.02074 -.02112 -.01444 .12520 .11348
.01264 .02535 .11569 .13465 .04713 .07346 .01615 .01450
.03695 .05305 .02794 .04766 .03343 .04331 .07215 .06774
.00373 .05105 .03830 .03808 -.03039 -.02021 .01239 -.00868
.01749 .02427 -.01824 -.03523 .02563 .04803 -.00821 -.01127
-.00888 -.00610 -.03688 -.04973 .01669 .01977 .00526 -.00599
-.05129 -.04961 .02325 .02578 -.01565 -.03727 .11144 .11793
.00205 .00682 .00331 -.00533 -.00938 -.01677 .02517 .01825
.07950 .10633 .01661 .02128 -.00037 -.00828 -.00277 -.01114
.05872 .04541 .01719 .01406 -.00351 .01465 -.00463 -.00294
-.03667 -.04905 .04462 .03495 .06884 .06549 .04554 .04245
.00737 .01326 .07374 .07056 .05560 .05967 -.01322 -.00631
.05146 .03973 .01509 .00034 -.05480 -.07165 .07312 .06207
-.07957 -.05350 .05402 .04518 .01857 .00726 -.02677 -.01967

Bibliography

[1] Akaike, H. (1975). Markovian representation of stochastic processes by canonical variables. SIAM Journal on Control, 13, 162-173.
[2] Anderson, B.D.O. and Moore, J.B. (1979). Optimal Filtering. Englewood Cliffs, New Jersey: Prentice Hall.
[3] Ansley, C.F. and Kohn, R. (1984). On the estimation of ARIMA models with missing values. In Time Series Analysis of Irregularly Observed Data, Parzen, E. (ed). New York: Springer-Verlag.
[4] Ansley, C.F. and Kohn, R. (1985a). A structured state space approach to computing the likelihood of an ARIMA process and its derivatives. Journal of Statistical Computation and Simulation, 21, 135-169.
[5] Ansley, C.F. and Kohn, R. (1985b). Estimation, filtering and smoothing in state space models with incompletely specified initial conditions. Annals of Statistics, 13, 1286-1316.
[6] Ansley, C.F. and Kohn, R. (1987). Efficient generalised cross-validation for state space models. Biometrika, 74, 139-148.
[7] Ansley, C.F. and Kohn, R. (1990).
Filtering and smoothing in state space models with partially diffuse initial conditions. Journal of Time Series Analysis, 11, 275-293.
[8] Bell, W. and Hillmer, S. (1991). Initializing the Kalman Filter for nonstationary time series models. Journal of Time Series Analysis, 12, 283-300.
[9] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307-327.
[10] Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day.
[11] Boyles, R.A. (1983). On the convergence of the EM algorithm. Journal of the Royal Statistical Society, Series B, 45, 47-50.
[12] Brown, C.B. (1988). A Second Course In Linear Algebra. New York: Wiley.
[13] Brown, R.L., Durbin, J. and Evans, J.M. (1975). Techniques for testing the constancy of regression relationships over time (with discussion). Journal of the Royal Statistical Society, Series B, 37, 141-192.
[14] Burmeister, E., Wall, K.D. and Hamilton, J.D. (1986). Estimation of unobserved expected monthly inflation using Kalman filtering. Journal of Business and Economic Statistics, 4, 147-160.
[15] Burridge, P. and Wallis, K.F. (1985). Calculating the variance of seasonally adjusted series. Journal of the American Statistical Association, 80, 541-552.
[16] Chen, N.F., Roll, R. and Ross, S. (1986). Economic forces and the stock market. Journal of Business, 59, 383-403.
[17] Cook, R.D. and Weisberg, S. (1982). Residuals and Influence in Regression. New York: Chapman and Hall.
[18] Cooper, D.M. and Thompson, R. (1977). A note on the estimation of parameters of the autoregressive-moving average process. Biometrika, 64, 625-628.
[19] Crafts, N.F.R., Leybourne, S.J. and Mills, T.C. (1989). Trends and cycles in British industrial production 1700-1913. Journal of the Royal Statistical Society, Series A, 152, 43-60.
[20] De Jong, Piet (1988a). The likelihood for a state space model. Biometrika, 75, 165-169.
[21] De Jong, Piet (1988b).
A cross-validation filter for time series models. Biometrika, 75, 594-600.
[22] De Jong, Piet (1989). Smoothing and interpolation with the state space model. Journal of the American Statistical Association, 84, 1085-1088.
[23] De Jong, Piet (1991a). Stable algorithms for the state space model. Journal of Time Series Analysis, 12, 143-157.
[24] De Jong, Piet (1991b). The diffuse Kalman filter. Annals of Statistics, 19, 1073-1083.
[25] De Jong, Piet (1991c). Linear Time Series for Statistical Analysis and Prediction. Unpublished manuscript.
[26] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
[27] Den Butter, F.A.G. and Mourik, T.J. (1990). Seasonal adjustment using structural time series models: an application and a comparison with the Census X-11 method. Journal of Business and Economic Statistics, 8, 385-394.
[28] Downing, D.J., Pike, D.H. and Morrison, G.W. (1980). Application of the Kalman Filter to inventory control. Technometrics, 22, 17-22.
[29] Duncan, D.B. and Horn, S.D. (1972). Linear dynamic recursive estimation from the viewpoint of regression analysis. Journal of the American Statistical Association, 67, 815-821.
[30] Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica, 50, 987-1007.
[31] Gardner, G., Harvey, A.C. and Phillips, G.D.A. (1980). Algorithm AS154. An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics, 29, 311-317.
[32] Granger, C.W.J. and Andersen, A. (1978). An introduction to bilinear time series models. Gottingen: Vandenhoeck and Ruprecht.
[33] Haggan, V. and Ozaki, T. (1981). Modelling non-linear random vibrations using an amplitude-dependent autoregressive time series model. Biometrika, 68, 189-196.
[34] Harrison, P.J. and Stevens, C.F. (1976).
Bayesian forecasting. Journal of the Royal Statistical Society, Series B, 38, 205-247.
[35] Harvey, A.C. (1981). Time series models. New York: Wiley.
[36] Harvey, A.C. (1984). A unified view of statistical forecasting (with discussion). Journal of Forecasting, 3, 245-283.
[37] Harvey, A.C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge: Cambridge University Press.
[38] Harvey, A.C. and Durbin, J. (1986). The effects of seat belt legislation on British road casualties: a case study in structural time series modelling. Journal of the Royal Statistical Society, Series A, 149, 187-227.
[39] Harvey, A.C. and Fernandes, C. (1989). Time series models for count or qualitative observations (with discussion). Journal of Business and Economic Statistics, 7, 407-422.
[40] Harvey, A.C. and Koopman, S.J. (1991). Diagnostic checking of unobserved components time series models. Research memorandum T1-1191/66. Tinbergen Instituut.
[41] Harvey, A.C. and Peters, S. (1990). Estimation procedures for structural time series models. Journal of Forecasting, 9, 89-108.
[42] Harvey, A.C. and Pierse, R.G. (1984). Estimating missing observations in economic time series. Journal of the American Statistical Association, 79, 125-131.
[43] Harvey, A.C. and Phillips, G.D.A. (1979). Maximum likelihood estimation of regression models with autoregressive-moving average disturbances. Biometrika, 66, 49-58.
[44] Harvey, A.C. and Todd, P.H.J. (1983). Forecasting economic time series with structural and Box-Jenkins models (with discussion). Journal of Business and Economic Statistics, 1, 299-315.
[45] Helzer, G. (1983). Applied Linear Algebra with APL. Toronto: Little, Brown and Company.
[46] Jazwinski, A.H. (1970). Stochastic Processes and Filtering Theory. New York: Academic Press.
[47] Kalbfleisch, J.D. and Sprott, D.A. (1970). Application of likelihood methods to models involving large numbers of parameters.
Journal of the Royal Statistical Society, Series B, 32, 175-194.
[48] Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 34-45.
[49] Kalman, R.E. and Bucy, R.S. (1961). New results in linear filtering and prediction theory. Journal of Basic Engineering, 83, 95-108.
[50] Kitagawa, G. (1981). A nonstationary time series model and its fitting by a recursive filter. Journal of Time Series Analysis, 2, 103-116.
[51] Kitagawa, G. (1987). Non-Gaussian state space modelling of nonstationary time series (with discussion). Journal of the American Statistical Association, 82, 1032-1044.
[52] Kitagawa, G. (1989). Non-Gaussian state space modelling of time series. 1989 Proceedings of the American Statistical Association, Business and Economic Statistics Section.
[53] Kohn, R. and Ansley, C.F. (1986). Estimation, prediction and interpolation for ARIMA models with missing data. Journal of the American Statistical Association, 81, 751-761.
[54] Kohn, R. and Ansley, C.F. (1987a). A new algorithm for spline smoothing based on smoothing a stochastic process. SIAM Journal on Scientific and Statistical Computation, 8, 33-48.
[55] Kohn, R. and Ansley, C.F. (1987b). Signal extraction for finite nonstationary time series. Biometrika, 74, 411-421.
[56] Kohn, R. and Ansley, C.F. (1989). A fast algorithm for signal extraction, influence and cross-validation in state space models. Biometrika, 76, 65-79.
[57] Koopman, S.J. (1991). Efficient smoothing algorithms for time series models. Discussion paper. Department of Statistics, London School of Economics.
[58] Laird, N., Lange, N. and Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. Journal of the American Statistical Association, 82, 97-105.
[59] Lancaster, P. and Tismenetsky, M. (1985). The Theory of Matrices. New York: Academic Press.
[60] Lauritzen, S.L. (1981).
Time series analysis in 1880: A discussion of contributions made by T.N. Thiele. International Statistical Review, 49, 319-331.
[61] Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226-233.
[62] Mehra, R.K. (1979). Kalman Filters and their applications to forecasting. TIMS Studies in the Management Sciences, 12, 75-94.
[63] Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society, Series B, 51, 127-138.
[64] Meinhold, R.J. and Singpurwalla, N.D. (1983). Understanding the Kalman Filter. The American Statistician, 37, 123-127.
[65] Meinhold, R.J. and Singpurwalla, N.D. (1987). A Kalman-Filter smoothing approach for extrapolations in certain dose-response, damage-assessment, and accelerated-life-testing studies. The American Statistician, 41, 101-106.
[66] Nicholls, D.F. and Pagan, A.R. (1985). Varying coefficient regression. In Handbook of Statistics, Volume 5, Hannan, E.J., Krishnaiah, P.R. and Rao, M.M. (eds). New York: Elsevier Science Publishers B.V.
[67] Orchard, T. and Woodbury, M.A. (1972). A missing information principle: theory and applications. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics, 1, 697-715.
[68] Osborn, D.R. (1976). Maximum likelihood estimation of moving average processes. Annals of Economic and Social Measurement, 75-87.
[69] Patterson, H.D. and Thompson, R. (1975). Maximum likelihood estimation of components of variance. Proceedings Eighth International Biometric Conference, Corsten, L.C.A. and Postelnicu, T. (eds). Bucharest: Academy of the Socialist Republic of Romania.
[70] Pena, D. and Guttman, I. (1988). Bayesian approaches to robustifying the Kalman Filter. In Bayesian Analysis of Time Series and Dynamic Models, Spall, J.C. (ed). New York: Marcel Dekker.
[71] Pfeffermann, D. (1991).
Estimation and seasonal adjustment of population meansusing data from repeated surveys (with discussion). Journal of Business and Economic Statistics, 9 , 163-177.[72] Phadke, M.S. (1981). Quality audit using adaptive Kalman Filtering. ASQC QualityCongress Transactions-San Francisco, 1045-1052.[73] Plackett, R.L. (1950). Some theorems in least squares. Biometrika, 37 , 149-157.[74] Pole, A. and West, M. (1989). Reference analysis for the DLM. Journal of TimeSeries Analysis, 10 , 131-147.[75] Priestley, M.B. (1988). Non-linear and non-stationary time series analysis. LondonAcademic Press.[76] Rosenberg, B. (1973). Random coefficient models : the analysis of a cross-section oftime series by stochastically convergent parameter regression. Annals of Economicand Social Measurement, 2 , 399-428.[77] Sallas, W.M. and Harville,D.A. (1981). Best linear recursive estimation for mixedlinear models. Journal of the American Statistical Association, 76 , 860-869.[78] Sallas,W.M. and Harville,D.A. (1988). Noninformative priors and restricted maximum likelihood estimation in the Kalman Filter. In Bayesian Analysis of TimeSeries and Dynamic Models, Spall J.C. (ed), New York: Marcel Dekker.[79] Schweppe, F.C. (1965). Evaluation of likelihood functions for Gaussian signals.IEEE Transactions in Information Theory, 11 , 61-70.[80] Sliarpe, W.F. (1964). Capital asset prices: A theory of market equilibrium. Journalof Finance, 19 425-442.[81] Shephard, N.G. and Harvey, A.C. (1990). On the probability of estimating a deterministic component in the local level model. Journal of Time Series Analysis, 11339-347.Bibliography 147[82] Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs,NJ : Prentice Hall.[83] Shumway, R.H. and Stoffer, D.S. (1982). An approach to time series smoothing andforecasting using the EM algorithm. Journal of Time Series Analysis, 3 , 253-264.[84] Stoffer, D.S. (1982). 
Estimation of Parameters in a Linear Dynamic System withMissing Observations. PhD thesis, University of California, Davis.[85] Subba Rao, T. (1981). On the theory of bilinear time series models. Journal of theRoyal Statistical Society, Series B, 43 , 244-255.[86] Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics, 1 , 49-58[87] Thiele, T.N. (1880). Sur la compensation de quelques erreurs quasi-systematiquespar la methode des moindres carrees. Kobenhavn: Reitzel.[88] Tong, H. and Lim, K.S. (1980). Threshold autoregression, limit cycles and cyclicaldata. Journal of the Royal Statistical Society, Series B, 42 , 245-292.[89] Tsay, R.S. and Tiao, G.C. (1990). Asymptotic properties of multivariate nonstationary processes with applications to autoregressions. Annals of Statistics, 18220-250.[90] Tunniciffe Wilson, G. (1989). On the use of marginal likelihood in time series modelestimation. Journal of the Royal Statistical Soc-iety, Series B, 51 , 15-27.[91] Vardi, Y., Shepp, L.A. and Kaufmann, L. (1985). A statistical model for positronemission tomography (with comments). Journal of the American Statistical Association, 80 , 8-37.[92] Watson, M.W. and Engle, R. (1983). Alternative algorithms for the estimationof dynamic factor, MIMIC and varying coefficient regression models. Journal ofEconometrics, 23 , 385-400.[93] West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models. NewYork: Springer Verlag.[94] Wu, C.F.J. (1983). On the convergence of the EM algorithm. Annals of Statistics,11 , 95-103.
Title | Statistical analysis with the state space model |
Creator | Chu-Chun-Lin, Singfat |
Date Issued | 1992 |
Extent | 2483668 bytes |
Subject | State-space methods; Time-series analysis |
Genre | Thesis/Dissertation |
Type | Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2008-12-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0098959 |
URI | http://hdl.handle.net/2429/3318 |
Degree | Doctor of Philosophy - PhD |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
GraduationDate | 1992-05 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |