UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Asymptotic inference for segmented regression models Wu, Shiying 1992

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata

Download

Media
ubc_1992_fall_wu_shiying.pdf [ 5.12MB ]
Metadata
JSON: 1.0086617.json
JSON-LD: 1.0086617+ld.json
RDF/XML (Pretty): 1.0086617.xml
RDF/JSON: 1.0086617+rdf.json
Turtle: 1.0086617+rdf-turtle.txt
N-Triples: 1.0086617+rdf-ntriples.txt
Original Record: 1.0086617 +original-record.json
Full Text
1.0086617.txt
Citation
1.0086617.ris

Full Text

ASYMPTOTIC INFERENCE FOR SEGMENTED REGRESSION  MODELS  By SHIYING W U  B . S c , Beijing University, 1983 M . S c , The University of B r i t i s h C o l u m b i a , 1988  A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR THE DEGREE DOCTOR OF  OF  PHILOSOPHY  in T H E F A C U L T Y OF G R A D U A T E STUDIES DEPARTMENT OF  STATISTICS  We accept this thesis as conforming to the required standard  T H E UNIVERSITY OF BRITISH C O L U M B I A October 1992 ©Shiying Wu,  1992  In  presenting  degree freely  at  this  the  thesis  in  partial  fulfilment  University  of  British  Columbia, I agree  available for  copying  of  department publication  this or of  reference  thesis by  this  for  his thesis  and study. scholarly  or for  her  financial  3 / ^ = ^ ' . S 1^  The University of British Columbia Vancouver, Canada  Date  DE-6  (2/88)  O c i  /</  I further  purposes  CX^  gain  the  requirements that  agree  may  representatives.  permission.  Department of  of  It  shall not  be is  that  the  an  advanced  Library shall make  permission for  granted  by  understood be  for  allowed  the that  without  it  extensive  head  of  my  copying  or  my  written  A s y m p t o t i c inference for segmented regression models  Abstract  T h i s thesis deals w i t h the estimation of segmented multivariate regression models. A segmented regression model is a regression model which has different analytical forms i n different regions of the domain of the independent variables. W i t h o u t knowing the number of these regions and their boundaries, we first estimate the number of these regions by using a modified Schwarz' criterion. Under fairly general conditions, the estimated number of regions is shown to be weakly consistent. We then estimate the change points or "thresholds" where the boundaries lie and the regression coefficients given the (estimated) number of regions by minimizing the sum of squares of the residuals. It is shown that the estimates of the thresholds converge at the rate of (9p(ln'^n/n), if the model is discontinuous at the thresholds, and Op{n~^^^) if the model is continuous. In both cases, the estimated regression coefficients and residual variances are shown to be asymptotically normal. It is worth noting that the condition required of the error distribution is local exponential boundedness which is satisfied by any d i s t r i b u t i o n with zero mean and a moment generating function provided its second derivative is bounded near zero. A s an illustration, a segmented bivariate regression model is fitted to real data and the relevance of the asymptotic results is examined through simulation studies. T h e identifiability of the segmentation variable is also discussed.  Under different  conditions, two consistent estimation procedures of the segmentation variable are given. T h e results are then generalized to the case where the noises are  heteroscedastic  and autocorrelated. T h e noises are modeled as moving averages of an infinite number of independently, identically distributed random variables multiplied by different constants  i n different regions. It is shown that w i t h a slight modification of our assumptions, the estimated number of regions is still consistent. A n d the threshold estimates retain the convergence rate of Op{\n^ n/n) when the segmented regression model is discontinuous at the thresholds. T h e estimation procedures also give consistent estimates of the residual variances for each region. 
These estimates and the estimates of the regression coefficients are shown to be asymptotically normal. T h e consistent estimate of the segmentation variable is also given. Simulations are carried out for different model specifications to examine the performance of the procedures for different sample sizes.  ni  Table of Contents  Abstract  ii  Table of Contents  iv  List of Tables  vi  List of Figures  vii  Acknowledgement  viii  C h a p t e r 1. Prologue  1  1.1 Introduction  1  1.2 A review of segmented regression and related problems  3  1.3 N e w contributions and their relationship to previous work  8  1.4 O u t l i n e of the following chapters  11  C h a p t e r 2. E s t i m a t i o n of segmented regression models  14  2.1 Identifiability of the segmentation variable  15  2.2 E s t i m a t i o n procedures  23  2.3 General remarks  30  C h a p t e r 3.  A s y m p t o t i c results of the estimators for segmented regression  models  32  3.1 A s y m p t o t i c results when the segmentation variable is known  33  3.2 Consistency of the estimated segmentation variable  60  3.3 A simulation study  74  3.4 General remarks  80  3.5 A p p e n d i x : A discussion of the continuous model  83  C h a p t e r 4. Segmented regression models w i t h heteroscedastic noise  97  4.1 E s t i m a t i o n procedures  98  4.2 A s y m p t o t i c properties of the parameter estimates  98  4.3 A simulation study  104  4.4 General remarks  107  4.5 A p p e n d i x : Proofs  107  C h a p t e r 5. S u m m a r y and future research  142  5.1 A brief summary of previous chapters  142  5.2 Future research on the current model  142  5.3 Further generalizations  143  References  145  List of Tables  Table 3.1 Frequency of correct identification of P i n 100 repetitions and the estimated thresholds for segmented regression models  149  Table 3.2 E s t i m a t e d regression coefficients and variances of noise and their standard errors w i t h n = 200  150  Table 3.3 T h e empirical distribution of / i n 100 repetitions by MIC,  SC and YC for piecewise constant model  151  Table 3.4 T h e estimated thresholds and their standard errors for piecewise constant model  152  Table 4.1 Frequency of correct identification of P in 100 repetitions and the estimated thresholds for segmented regression models w i t h two regimes . 153 Table 4.2 Estimated regression coefficients and variances of noise and their standard errors w i t h n = 200  154  Table 4.3 Frequency of correct identification of /° in 100 repetitions and the estimated threshold for a segmented regression model w i t h three regimes 154 Table 4.4 Estimated regression coefficients and noise variances and their standard errors w i t h n = 200  155  List of Figures  Figure 2.1  (xi,X2)  uniformly distributed over the shaded area  156  Figure 2.2  [xi,X2)  uniformly distributed over the eight points  157  Figure 2.3 M i l e per gallon vs. w^eight for 38 cars Figure 4.1  {xi,X2)  158  uniformly distributed over each of six regions  w i t h indicated mass  159  Acknowledgements  I thank m y supervisor, D r . Jian L i u , for his inspiration, guidance, support and advice throughout the course of the work reported i n this thesis. I vi^ish to express m y deep gratitude to Professor James V . Zidek, for his guidance, encouragement, patience and valuable advice. T h i s thesis benefitted from the helpful comments of Professor Piet De Jong to whom I a m indebted. Professor J o h n Petkau and M r . Feifang H u also made valuable comments. 
M a n y thanks go to D r . H a r r y Joe and Nancy E . Heckman for their encouragement and support during m y stay at U B C . Special thanks to Professor James V . Zidek, who provided boundless support throughout m y graduate career at U B C . T h e financial support from the Department of Statistics, University of B r i t i s h C o l u m b i a is acknowledged w i t h great appreciation. I also acknowledge the support of the University of B r i t i s h C o l u m b i a through a University Graduate Fellowship.  Vlll  Chapter 1  PROLOGUE  1.1  Introduction  This thesis deals with asymptotic estimation for segmented multivariate regression models. A segmented regression model is a regression model which has different analytical forms in different regions of the domain of the independent variables. This model may be useful when a response variable depends on the independent variables through a function whose form cannot be uniformly well approximated by a single finite Taylor expansion, and hence the usual linear regression models are not applicable. In such a situation, the possibility of regaining the simplicity of the Taylor expansion and added modeling flexibility is achieved by allowing the response to depend on these variables differently in different subregions of the domains of certain independent variables. For example, Yeh et al (1983) discuss the idea of an "anaerobic threshold". It is hypothesized that if a person's workload exceeds a certain threshold where his muscles cannot get enough oxygen, then the aerobic metabolic processes become anaerobic processes. This threshold is called "anaerobic threshold". In this case a model with two segments is suggested by the subject oriented theory. McGee and Carleton (1970) discuss another example where the dependent structure of the selhng volume of a regional stock exchange on that of New York Stock Exchange and American Stock Exchange is thought to be clianged by a change of govenment regulation. A model with four segments is considered appropriate in  their analysis. Examples of this kind in various contexts are given by Sprent (1961), Dunicz (1969), Schuize (1984) and many others. In some situations, although a segmented model Is considered suitable, the appropriate number of segments may not be known, as i n the example mentioned above and the exchange rate problem we shall discuss in Chapter 5. Furthermore, in the case of multivariate regression, it may not be clear which of the independent variables relate to the change of the dependent structure, or, which independent variable can be best used as the segmentation variable. In some problems where the independent variables are of low dimension, graphical approaches may be effective in determining the number of segments and which independent variable can best be chosen as the segmentation variable. However, if the independent variables are of high dimension, the interrelations of the independent variables may thwart such an approach. Tlierefore, an objective and automated approach is in order. In this thesis, we develop procedures to estimate the model parameters, including the segmentation variable, the number of segments, the location of the thresholds, and other parameters in the model. Note that the word "threshold" is used to emphasize that the dependent structure changes when the segmentation variable exceeds certain values. The estimation procedures are based on least squares estimation and a modified version of Schwarz' (1978) criterion. 
These estimators are shown to be consistent under fairly mild conditions. In addition, asymptotic distributions are derived for the estimated regression coefficients and the estimated variance of the noises.  The procedures are then generahzed to accommodate situations when the noise levels are different from segment to segment, and when the noise is autocorrelated. It is shown that the consistency of these estimators is retained. Simulated data sets are analyzed by the proposed  procedures to show their performances for finite sample sizes, and the results seem satisfactory.  1.2  A review of segmented regression and related problems  One problem closely related to segmented regression is the change-point problem. A segmented regression problem reduces to a change-point problem i f the regression functions are unknown constants and the boundaries of the segments are to be estimated.  In general, a  change-point problem refers to the problem of making inferences about the point in a sequence of random variables at which the law governing evolution of the process changes. As a matter of fact, part of the work in this thesis is greatly inspired by Yao's (1988) work on the change-point problem. The segmented regression problem and change-point problem have attracted much attention since the 1950's. Shaban (1980) gives a rather complete hst of references from the 1950's to 1970's. Among other authors, Quandt (1958) postulates a model of the form:  where t* is unknown. Under the assumption that ej's are independent normal random variables, he obtains the maximum likelihood estimates for the parameters including t*. Robison (1964) considers a two-phase polynomial regression problem of the form:  = + p i ^ ^ x , + p ^ ^ x j + . . . + + i  = {2;  iî;>  Also assuming noises are independent normal variables, he obtains the maximum likehhood estimate and confidence interval for the change-point. Adding to the model of Quandt (1958) the assumption that the model is everywhere continuous and the variances of the {et} are identical, Hudson (1966) gives a concise method  for calculating the overall least squares estimator of the intersection point of two intersecting regression lines. For the same problem, Hinkley (1969) derives an asymptotic distribution for the maximum likelihood estimate of the intersection which is claimed to be a better approximation to the finite sample distribution than the asymptotic normal distribution of Feder and Sylwester (1968). For the change-point problem, Hinkley (1970) derives the asymptotic distribution of the maximum likelihood estimate of the change-point. He assumes that exactly one change occurs and that the means of the two submodels are known. He also gives the asymptotic distribution when these means are unknown, and the noises are assumed to be identically, independently distributed normal random variables ("iid normal" hereafter). As Hinkley notes, the maximum likehhood estimate is not consistent and the asymptotic result is not good for small samples when the two means are unknown. In all of these problems, the number of change points is assumed to be exactly one. For problems where the number of change-points may be more than one, Quandt (1958, p880) concludes "The exact number of switches must be assumed to be known". McGee and Carleton (1970) treat the estimation problem for cases where more than one change may occur. 
Their model is:  yt = Po^ + fii'^xu  + ••• + di'^Xkt +  if  te h _ i , Tj),  where 1 < TI < • • • < TL < T^^I = N and the { e j are iid N{0,a-).  Note that L and the r^'s  are unknown. Constrained by the computing power available at that time (1970), they propose a estimation method which essentially combines least squares estimation with hierarchical clustering. While being computationally efficient, their method is suboptimal (resulting from the use of hierarchical clustering), subjective (in terms of choice of L) and lacking theoretical  justification. Goldfeld and Quandt (1972, 1973a) discuss the so-caUed switching regression model specified as follows: = htPi  + ^u,  iiT^'zt < 0;  Here Zt = {zn, • • •, Zkt}' are the observations on some exogenous variables (including, possibly, some or all of the regressors), TT = (TTI, • • •, TT^)' is an unknown parameter, and the  {un}  are independent normal random variables with zero means and variances, <T?, i = 1,2. The parameters, /3i, /JT, o"!,CTIand TT are to be estimated. They define d{zt) = l(x'2, >o) «'•nd reexpress the model as  yt = x[[{l - d{zt))(3^ + d{zt)(32] + (1 - d{zt))uu + d{zt)u2t.  For estimation the "D-method" is proposed: d{zt) is replaced by  J-co  \/27rc7  io-^  and the maximum lil^elihood estimates for the parameters are obtained. As they point out, the D-method can be extended to the case of more than two regimes. Gallent and Fuller (1973) consider the problem of estimating the parameters in a piecewise polynomial model with continuous derivative, where the join points are unknown. They reparametrize the model so that the Gauss-Newton method can be applied to obtain the least squares estimates.  A n F statistic is suggested for model selection (including the number of  regimes) without theoretical justification. Poirer (1973) relates sphne models and piecewise regression models. Assuming the change points known, he develops tests to detect structural changes in the model and to decide whether certain of the model coefficients vanish.  Ertel and Fowlkes (1976) also point out that the regression models for linear spline and piecewise linear regression have many common elements. The primary difference between them is that in the linear spline case, adjacent regression lines are required to intersect at the changepoints, while in the piecewise hnear case, adjacent regression hues are fitted separately. He develops some efficient algorithms to obtain least squares estimates for these models. Feder (1975a) considers a one-dimensional segmented regression problem; it is assumed that the function is continuous over the entire range of the covariate and the number of segments is known. Under certain additional assumptions, he shows that the least squares estimates of the regression coefficients of the model are asymptoticaUy normally distributed. Note that the two assumptions that the function is continuous and that the number of segments is known are essential for his results. For the simplest two segments regression problem with continuity assumption, Miao (1988) proposes a hypothesis test procedure for the existence of a change-point together with a confidence interval of the change-point, based on the theory of Gaussian processes. Statistical hypothesis tests for segmented regression models are studied by many authors, among them are Quandt (1960), Sprent (1961), Hinkley (1969), Feder (1975b) and Worsley (1983). 
Bayesian methods for the problem are considered by Farley and Hinich (1970), Bacon and Watts (1971), Broemehng (1974), Ferreira (1975), Holbert and Broemehng (1977) and Salazar, Broemehng and Chi (1981). Quandt (1972), Goldfeld and Quandt (1972, 1973b) and Quandt and Ramsey (1978) treat the problem as a random mixture of two regression lines. Closely related to the problem studied in this thesis, Yao (1988) studies the following change-point problem: a sequence of independent normally distributed random variables have a common variance, but their means change / times along the sequence, with / unknown. He  adopts the Schwarz criterion for estimating / and proves that such an estimator is consistent. Yao noted that consistency need not obtain without the normahty assumption. Yao and A u (1989) consider the problem of estimating a step function, g{t), over t G [0,1] in the presence of additive noise. They assume that i,- = i/n (i = 1, • • •, n) are fixed points and the noise has a sixth or higher moment, and derive limiting distributions for the least squares estimators of the locations and sizes of the jumps when the number of jumps is either known or bounded. The discontinuity of g{i) at each change point makes the estimated locations of the jumps converge rapidly to their true values. This thesis is primarily about situations like those described above, where the segmented regression model may be viewed as a partial explanation model tries to capture our impression that an abrupt change in the mechanism underlying the process. It is linked to other paradigms in modern regression theory as well. Much of this theory (see the references below, for example) is concerned with regression functions of say, y on x, which cannot be well approximated globally by the leading terms of its Taylor expansion, and hence by a global linear model. This has led to various approaches to "nonparametric regression" (see Friedman, 1991, for a recent survey). One such approach is that of Cleveland (1979) when the dimension of x is 1; his results, which use a linear model in a moving local window, are extended to higher dimensions by Cleveland and Devlin (1988). Weerahandi and Zidek (1988) use a Taylor expansion explicitly to construct a locally weighted smoother, also when the dimension of a; is 1; a different expansion is used at each i-value thereby avoiding the shortcomings of using a single global expansion. However, difficulties confront local weighting methodologies like those described above as well as kernel smoothers and splines because of the "curse of dimensionality" which becomes progressively more serious as the dimension of x grows beyond 2. These difficulties are weU  described by Friedman (1991) who presents an alternative methodology called "multivariate adaptive regression splines," or " M A R S . " M A R S avoids the curse of dimensionality by partitioning I's domain into a data-determined, but moderate number of subdomains within which spline functions of low dimensional subvectors of a; are fitted. B y using splines of order exceeding 0, M A R S can lead to continuous smoothers. In contrast, its forerunner, called "recursive partitioning" by Friedman, must be discontinuous, because a different constant is fitted in different subdomains. But, like M A R S it avoids the curse of dimensionality because it depends locally on a small number (in fact, none) of the coordinates of x.  
Friedman (1991) attributes to Breiman and Meisel (1976), a  natural extension of recursive partitioning wherein a hnear function of x is fitted within each subdomain. However, it can encounter the curse of dimensionality when these subdomains are small and Friedman (1991) ascribes the lack of popularity of this extension to this feature. However, the curse of dimensionality is relative. If the subdomains of x are large the "curse" becomes less problematical. A n d within such subdomains, the Taylor expansion leads to linear models like those used by Breiman and Meisel (1976) and here, as natural approximants; in contrast, splines seem somewhat ad hoc. A n d linear models have a long history of application in statistics.  1.3  New contributions and their relationship to previous work  In this thesis, we address the problem of making asymptotic inference for the following model: p  (1.1)  i=i where zt = ( ï t i , . . . , x^p)' is an observed random variable; f/ is assumed to have zero mean  and unit variance, wliile  r,-, ctj (i = 1 , . . . , / + 1, j = 0 , l , . . . , p ) , / and d are unlvnown  parameters. Our main contributions are as follows. A sequence of procedures are proposed to estimate all these parameters, based on least squares estimation and our modified Schwarz' criterion. It is shown that under mild conditions, the estimator, /, of / is consistent. Furthermore, a bound on the rate of convergence of fi and the asymptotic normality for estimators of Pij, ai (z = / , . . . , / + 1, J = 0 , 1 , . . . ,p) are obtained under certain additional assumptions. When the segmentation is related to a few highly correlated covariates, it may not be clear which covariate can best be chosen as the segmentation variable. In such a case, d will be treated as an unknown parameter to be estimated. A new concept of identifiabihty of d is introduced to formulate the problem precisely. We prove that the least squares estimate of d is consistent. In addition, we propose another consistent and computationally efficient estimate of d. A l l of these are achieved without the Gaussian assumption on the noises. In many practical situations, it is necessary to assume that the noises are heteroscedastic and serially correlated.  Our estimation procedures and the asymptotic results are general-  ized to such situations. Asymptotic theory for stationary processes are developed to estabhsh consistency and asymptotic normality of the estimates. Note that in Model (1.1) if f3ij = 0 for all z = 1, • • •,/ -|- 1 and j = 1, •  equation (1.1)  reduces to the change-point problem discussed by Yao (1988), Xd being the explanatoi-y variable controlhng the allocation of measurements associated with different dependence  structures.  Although our formulation is somewhat different from that of Yao (1988) in that we introduce an explanatory variable to allocate response measurements, both formulations are essentially the same from the point of view of an experimental design. If the other covariates are all known  functionals of x^, as in segmented polynomial regressions, and / is known, (1.1) reduces to the case discussed by Feder (1975a).  Unlike all the above mentioned work on segmented regression except McGee and Carleton (1970), we assume that the number of segments is unknown, and that the noise may be dependent. In terms of estimating /, we generalize Yao's (1988) work on the change-point problem to a multiple segmented regression set-up. 
Furthermore, his conditions on the noises are relaxed in the sense that the e^'s do not have to be (a) normally distributed (rather, they could follow any of the many distributions commonly used for noise); (b) identically distributed; and (c) independent.  In terms of making asymptotic inference on the regression coefficients and the  change points, we do not assume continuity of the underlying function which is essential for Feder's (1975a) results. We find that without the continuity assumption, the estimated change points converge to the true ones at a much faster rate than the rate given by Feder. Finally, a consistent estimator is obtained for d, an additional parameter not found in any of the previous work.  Our results also relate to M A R S . In fact, our estimation procedure can be viewed as adaptive regression using a different method of partitioning than Breiman and Meisel (1976). By placing an upper bound on the number of partitions, we can avoid the difficulties caused by curse of dimensionahty, of fitting our model to data in high dimensional space (but recognize that there are trade-offs involved). A n d we have adopted a different stopping criterion in partitioning a;-space; it is based on ideas of model selection rather than testing and seems more appealing to us. Finally, and most importantly, we are able to provide a large sample theory for our methodology. This feature of our work seems important to us. Although the M A R S methodology appears to be supported by the empirical studies of Friedman (1991), there is  an inevitable concern about the general merits of any procedure when it lacks a theoretical foundation. Interestingly enough, it can be shown that in some very special cases, our estimation procedures coincide with those of M A R S in estimating the change points, if our stopping criterion were adopted in M A R S . This seems to indicate that, with our techniques, M A R S may be modified to possess certain optimalities (e.g. consistency) or suboptimalities for more general cases. So in summary, with the estimation procedures proposed in this thesis we regain some of the simphcity of the (piecewise) Taylor expansion and attendant linear models, while retaining some of the virtues of added modeling flexibihty possessed by nonparametric approaches. Our large sample theory gives us precise conditions under which our methodology would work well, given sufficiently large samples. A n d by restricting the number of a;-subdomains sufficiently we avoid the curse of dimensionality. Partitioning for our methodology, is data-based like that of MARS.  1.4  Outline of the following chapters  This dissertation is organized as follows. In Chapter 2, the identifiability of the segmentation variable i n the segmented regression model is discussed first. We introduce a concept of  identifiability  and demonstrate how the concept naturally arises from the problem. Then  we give an equivalent condition which is crucial in establishing the consistency. Finally, we give a sequence of procedures to estimate all the parameters involved in a "basic" segmented regression model with uncorrelated and homoscedastic noise. These procedures are illustrated with an example.  The consistency of the estimates given in Chapter 2 is proved in Chapter 3. Conditions under which the procedures give consistent estimates are also discussed. For technical reasons, the consistency of estimates other than that of the segmentation variable is estabhshed first. 
The estimation problem is treated as a problem of model selection, with the models represented by the possible number of segments, assuming the segmentation variable is known. Schwarz' criterion is tuned to an order of magnitude that can distinguish systematic bias from random noise and is used to select models. Then, with the estabhshed theories, the consistency of the estimated segmentation variable is proved. Simulations with various model specifications are carried out to demonstrate the finite sample behavior of the estimators, which prove to be satisfactory.  Results given in Chapter 2 and Chapter 3 are generalized to the case where the noise levels in different segments are different. The noise often derives from factors that cannot be clearly specified and about which little is known. In many practical situations, like that of the economic example mentioned above, the noise may represent a variety of factors of different magnitudes, over different segments. Therefore a heteroscedastic specification of the noise is often necessary. To meet practical needs further, the noise term in the model is assumed to be autocorrelated. The estimation procedures given in Chapter 2 are modified to accommodate these necessities and presented in Chapter 4. It is shown that under a moving average specification of the noise, the estimates given by the procedures are consistent. Further, the parameters specified in the moving average model of the noise term can be estimated by the estimated residuals. Simulation results are given to shed light on the finite sample behavior of the estimates.  A summary of the results established in this thesis is given in Chapter 5. Future research is also discussed. One line of future research comes from the similarity between segmented  regression and spline techniques. Our model can first be generalized to the case where there are more than one segmentation variables. Then an "oblique" threshold model can be considered. A n oblique threshold is one made by a linear combination of explanatory variables. This is reasonable because often there is no reason to beheve that the threshold has to be parallel to any of the axes. Finally, by partitioning the domain of the explanatory variables into polygons, an adaptive regression splines could be developed. This could serve as an alternative to Friedman's (1988) multivariate adaptive regression sphne method, or M A R S .  Chapter 2  ESTIMATION OFSEGMENTED REGRESSION MODELS  In this chapter, we consider a special case of model (1.1) where the {ctj} are all equal and the {et} are independent and identically distributed. In this case, the model can be reformulated as foUows. Let (2/1,a:ii,...,xip), ..., (?/„,x„i, . . •,Xnp) be the independent observations of the response,  y,  and the covariates,   = {0io,Piu  xi,...,Xp.  Let Xt =  [l, Xti,...,  Xtp)'  for i = l , . . . , n and  Pip)', i = 1 , . . . , / + 1. Then,  yt  = x'Ji + et,  if xtd  G {Ti-i,Ti], i = 1 , . . . , / + 1,  where the {et} are i i d with mean zero and variance  t = l,...,n,  (2.1)  and are independent of { x j , —00 =  •'"0 < Ti < • • • < T/+1 = 00. The Pi, Ti, (i = 1,..., / + 1), /, d andCT^are unknown parameters. When Pd = 0, the segmentation variable Xtd becomes an exogenous variable as considered by Goldfeld and Quandt (1972, 1973a). A sequence of estimation procedures is given to estimate the parameters in model (2.1). The estimation is done in three steps. First, the segmentation variable or the parameter d is estimated, if it is not known  a priori.  
Then, with  d  known or supposed known, if estimated,  the number of structural changes / and the locations of structural changes r^'s are estimated by a modified Schwarz' criterion. Finally, based on the estimated d, I and r^'s, the Pi's and <7^ are estimated by ordinary least squares. It will be shown in the next chapter that all these estimators are consistent, under certain conditions.  It is obvious that to estimate d consistently, it has to be identifiable. In Section 2.1, we discuss the identifiability of  d.  Specifically, we introduce a concept of  identifiability  and give  equivalent conditions, all illustrated by examples. These conditions will be used in the next chapter to provide the consistency of the estimator of d. Our estimation procedures are given in Section 2.2. In particular, two procedures are given to estimate d under different conditions. The first one assumes less prior knowledge while the second one requires less computational effort. Based on the estimated d, the estimation procedures for other parameters are then given. Finally, all the procedures are illustrated by an example in which the dependence of gas consumption on the weight and horse power of different cars is examined. Some general remarks are made in Section 2.3. In the sequel, either a superscript or a subscript 0 will be used to denote the true parameter values.  2.1  Identifiability of the segmentation variable  Although in some appfications, the parameter  d  can be determined  a priori  from back-  ground knowledge about the problem of concern, it can be hard to determine d with reasonable certainty, due to a lack of background information. For instance, i f the segmentation is related to a few highly correlated covariates, it may not be clear which one can best be chosen as the segmentation variable. Therefore, there is a need for a defensible choice of d based on the data. When the vector of covariates are of high dimension and d cannot be identified by graphical methods, a computational procedure is required. However, when some of the covariates are highly correlated, it may not be clear whether d can be uniquely identified. In the following, we discuss the exact meaning of being "identified" and give a set of conditions under which d  can be uniquely identified. To simplify notation, let x have the same distribution as that of x i and R° = {x : x^o G ( r ? _ i , r ? ] } , j = 1 , . . . , / ° + 1. A n d for any d, let {Rff^t\  be a partition of RP where i?^ =  {x : Xd £ (rj_i,Tj]}, - c o = TQ < n < • • • < r; < r;+i = oo. Let X be a known upper bound on the number of thresholds. any partition {Rj}^^^,  Intuitively speaking, dP is identifiable if for any d ^ d°, and  there is at least one region, say Rf, on which the model exhibits clear  nonlinearity. Note that L is involved. Indeed, the identifiabihty oi d° does depend on L when the domain of X takes a certain special form. This can be easily seen in the following two examples.  Example 1 x is uniformly distributed over the shaded area in Figure 2.1,  y = l(xi>i) +  where  is an indicator function. A n d  i2° = {x : x i e ( - 0 0 , 1 ] } , ii:^ = {x : xi e (1, oo)}.  For X = 1, no threshold on X2 can make the model piecewise linear over its domain. The only possible threshold which makes the model piecewise linear is r i = 1 as defined in the model. For i = 2, however, TI — —1, T2 — 1 also make the model piecewise hnear over its domain. Hence either Xi or X2 can be used as the threshold variable.  
%  The same phenomenon can also be seen in the next example.  Example 2  x is uniformly distributed with probabilities concentrated at the 8 points as  specified in Figure 2.2, Y = l(xi>0) •X2 + e. 16  For X = 1, no threshold on X2 can make the model piecewise linear over its domain. For L = 2, however, TI = —1/2, T2 = 1/2 make the model piecewise linear over its domain. Hence either xi or X2 can be used as the threshold variable.  ^  Sometimes, but not always, one cannot determine whether or not the model is linear on unless the model can be uniquely determined on both Rf n R^ and Rf n R^ for a pair of adjacent linear on Rf.  In Example 2, if Rf = {-x. : X2 < 0}, dropping the point ( — 1, —1) makes the model Furthermore, since in model (2.1) we did not exclude the possibility of (3i = Pj  for nonadjacent  to ensure the detection of nonlinearity on Rf, the model has to be uniquely  determined on Rf n R^ and Rf D R°j for at least one pair of adjacent  To this end, we need  1 " -^Xtx;i(^,efi.nHO^.)  (2.2)  be positive definite for z = 1,2 and some A; e {0, • • •, /° - 1}. Asymptotically, we need (2.2) to hold with probabiHty approaching 1 as n becomes large, and its L H S should not gradually degenerate to a singular matrix. This in turn can be stated as follow: For any set A , let A(A) be the smallest eigenvalue of jE[xx'l(xeyi)]. Define ^{{Rj}fii) ,2mRj  Definition 2.1  n Rl+i)}.  =  We win need d° to be identifiable, defined as follows:  d^ is identifiable w.r.t. L if for every d ^  A=  mi  Xi{R^}f+,')>0,  (2.3)  where the inf is taken over all possible partitions of the form {Rj}^^^ .  If /" = 1, then k = 0 and X{{R^}f+^) = max^ mini=i,2{A(i2^^ n Rf)}. the identifiability of d^ in the two examples given above.  Now, let us examine  Example 1 (continued)  dP is not identifiable w.r.t. L = 2.  Since for d = 2, and (ri,r2) = ( - 1 , 1 ) , either P{RJ n i i ? ) = 0 or P{RJ n iE^) = 0 for all j = 1,2,3. dP is identifiable w.r.t.  L — 1.  Since for any T\, there exists r G {1,2} such that  £^[xx'l(xeiî<'nR°)] is positive definite, for i = 1 , 2 .  Example 2 (continued)  f  is not identifiable w.r.t. L = 2.  Let d = 2. If (ri,r2) = (—0.5,0.5) then each of Rj D R'- will contain no more than two points with positive masses, i = 1,2, j = 1,2,3. Hence ^fxx'l^^g^jnjjo)] will be degenerate for all d° is identifiable w.r.t. L = 1. Since for any TI and i = 1,2, there exists r G {1,2} such that Rf n R'i contain at least 3 points, with positive masses, which are not collinear. Hence £{xx'l(x.e7î''niî°)} is positive definite.  Because we have effectively just 4 choices of r i , the  eigenvalues of JEJ{xx'l(3(.£/?<Jn/i9)}, ^ = 1,2, are positive.  %  In more complicated cases, the identifiabihty condition may not be easy to verify. A n equivalent condition is given in the theorem below. This theorem is essential in showing that the two methods of estimating d^ given in the next section are consistent.  Theorem 2.1  The following conditions are equivalent:  (i) d° is identifiable w.r.t. L, (ii) for any d ^ d°, there exist sets {Aj]^!^ of the form Aj = {x : Oj < Xd < bj] such that (a) \{AJr]Rl_^-) > 0 for some 0 < k < P - 1 and all i = 1,2, s = 1, (b) for any partition {Rj]^^l,  A^ C Ri for some r, 5 G {1, • • •, X + 1}.  L + 1, and H  Before proving the theorem, let us find Aj's in the two examples given above. Assume, arbitrarily, d = 2. In Example 1, let Af = {x :-2  < X2 < -0.5} and  = {x : 0.5 < X2 < 2}.  Then, Af and A^ satisfy (ii) i n Theorem 2.1. In Example 2, Af = {x : -1 < X2 < 0} and A2 = {x. 
: 0 < X2 < 1}. Note that in this case, Af H A^ = {0}; the sets overlap.  For any measurable set C in  , let  A'^(C) = jmn A({x : Lemma 2.1  G C } n i2?).  A'^([a,u]) is right continuous in u. X'^{[u,b]) is left continuous in u.  Also, hmfc__oo A''((-oo, b]) = 0, hm<,_oo A''([a, +00)) = 0 and X'^i{a}) = 0. Proof Let A = {x : a < Xd < u} n Rl, As = {x : u < Xd < u + S} n R° and A+ = {x : a < Xd < u + ê} n Ri- Then A^ = AU As. Let a be the normalized eigenvector corresponding to X{A), the smallest eigenvalue of £[xx'l(.{xgyi})]- Then X{A) = a'i;[xx'l({x6^})]a = a'i;[xx'l({xeA+})]a-  a'i;[xx'l({xe>i.})]a  > A(A+)-a'£;[xx'l({xe^,})]a >X{A+)-tr{E[xx'l^^^^A,})])  = A(A+) - E[x'xl({xe>ia)]By the dominated convergence theorem, i^[x'xl(.{xeAi})] = -^[x'xl(^x:u<^<j<u-i-5}nR°)] converges to 0 as ^  0+. Therefore, X(A) < A(A+) < X{A) + o(l) and A(£'[xx'l(^x:a<:r^<u}n/î°)])  is right continuous in u. Replacing R° by R2 in the above argument, we have that A(£'[xx' ^({7s.:a<xa<u}nR°)])  right continuous in u. Since A'^([a,t/]) is the minimum of the two right  continuous functions, it is also right continuous. Now, let A = {x : u < Xd < b} 0 R^, As - {x : u - 6 < Xd < u} D R^ and A _ = {x : u — 6 < Xd < b} f] R^. Then A- = AU As- Let a be the normalized eigenvector corresponding  to \{A), the smallest eigenvalue of E[x.-x.'l^^xeA})]- Then A(A) = a'i;[xx'l({,e^})]a = a'£[xx'l({xe^_})]a -  a'f;[xx'l({xg^,})]a  > A(A_)-a'X;[xx'l({,g^,})]a > A(A_)-ir(^[xx'l({x€^,})]) = A(A_)-£[x'xl(.{xç^,))]. By the dominated convergence theorem, X^[x'xl({xeA«})] = •E^[x'xl({x:u-5<xd<u}nflO)] converges to 0 as ^ ^ 0+. Therefore, X{A) < A ( A _ ) < X(A) + o(l) and A(i;[xx'l(^x:«<a:<i<fc}nH;)]) is left continuous in u. Replacing ^{{x:u<xd<b}nR°)])  by R2 in the above argument, we have that A ( £ [ x x '  is left continuous in u. Since X'^([u,b]) is the minimum of the two left con-  tinuous functions, so it is also left continuous. Observe that 0 < X'{[a,+^))  < /r(i;[xx'l^x,„<,,«^}nflO)]) < ^ [ x ' x l  {{x:a<xj«x.}nR°))l-  By the dominated convergence theorem, the RHS converges to 0 as a ^ cx). Thus lim A'*([a,+oo)) = 0. a—KX>  Similarly, 0 < A'^(-oo,6]) < tr{E[xx'l^^^,_^^^^<tynR°)])  < i^[x'xl(^x:-oo<r,<6}nii?)]-  By the dominated convergence theorem again, the R H S converges to 0 as 6 ^ —00. Thus lim  A''((-oo,6]) = 0.  6-* —00  Since the {d + l ) t h row of the matrix £'[xx'l(^3ç.^^_a-}n/jO)] is its first row multiphed by a, its rank is less than or equal to p and hence it is degenerate. i;[xx'l({,,,,=,}nRO)]. Hence A''({a}) = 0.  %  So does the rank of  Let  = sup{6 : A'^([6,+00)) > A} where A > 0 is given by Définition 2.1, 6^^^^ = co,  and, recursively, bj_i - sup{6 < bj : X^{[b,bj]) > A } , j = 2 , . . . , i , where, by convention, 6;_i = - 0 0 i f {b < b* : X-'iib, b*j]) > A} =  Lemma 2.2 (i)  Suppose  - 0 0 = 60  <  is identifiable w.r.t. L. Let 65 = — 0 0 . Then < . . . < 62 < 62+1 -  ^"'^  (ii) A ' ' ( ( - o o , 6 î ] ) > A . Proof (i) Lemma 2.1 imphes hma_^oo A'^ffa, 00)) = 0, so 6^ < 0 0 . A n d 6^ > - 0 0 . For if it were not, i.e., 6^ = Tl  Ç. ( — 0 0 , 0 0 ) ,  that 62 = T2  <  • • • <  —00,  ^h^n since limf,_t_oo A'^((-oo, 6]) = 0, there exists  such that A'^((—00, ri]) < A . In view of the definition of 6^ ^.nd the assumption we have that A'^((ri,oo)) < A . For any  TL < TL^I  = 00,  we have  X'^{{TJ_I,TJ])  T2,---,TL  such that  — 0 0 — TQ < TI  <  < A , j = 1, • • •, X + 1. This contradicts to  the definition of A . So, — 00 < 62 < 0 0 . 
Assume that 6^, • • •, 62 have been well defined and satisfy — o o < 6 ^ < - - - < 6 2 < o o . We will now show that - 0 0 < 6*_j < 6^. By Lemma 2.1, X'^{{a}) — 0 and X'^{[u,b]) is left continuous in u. Hence, bj_i < bj. Suppose bj_^ = —CO. Since  lim6__oo  A''((—00,6]) = 0, there exists r j _ i € ( — 0 0 , 6 * )  such that A'^((—00,rj_i]) < A . For this TJ-I, let TQ = 00 and choose r i , - - - , r j _ 2 such that 00 = To < Tl < • • • < Tj-2 < Tj-i- Then X'^iin-uTk])  < A'^((-^,r,_a]) < A ,  k = l,---J-l.  Since bj_-^ = — 0 0 , A''([rj_i, 6^]) < A . By right continuity of X'^{[a,-]), there exists Sj > 0 such that Tj = bj + Sj e (6^,6^^j) and  X'^{[TJ^I,TJ])  < A.  Repeating this argument we can see  that there exists Sk > 0, such that Tk = b^ + 6k £ (KiK+i) k = j, • • •, L. By the definition of 62, X'^([TL, 00)) < A .  A'^([r/.._i, rfc]) < A , where  In summary, we have X\{Tk-urk])  < X\[Tk-i,rk])  < A,  and A'^((rL,oo)) < A . That is, the partition {Rjjf^l, inini=i_2 A(i2^ni2°) = X'^{{TJ-I,TJ])  If not, X'^{(-(X),b'^])  such that n =  A, j =  1, • • •,  where  L +  l,...,L,  = {x: Xd £ ( r j _ i , r j ] } , satisfy  1. This again contradicts the definition  < 6^ for j = 2, • • •, i + 1. Thus, (i) is verified,  of A . B y induction, —oo < (u)  <  k =  < A . Then, by the right continuity of A'^([a,-]), there exists  > 0  + ^1 < ^2 and A''((-oo, ri]) < A . By the definition of b^, X'^{[Ti,b^]) < A and  hence there exists 62 > 0, such that tt = 62 + ^2 < ^3 and A'^([ri, r2]) < A . B y repeating this process we shall see that there exists — 00 = TQ < r i < • • • < r / , _ i < bl<TL  = bl + 6L< TL+1 = 00 such that A'^((rj_i, TJ]) < A , j = 1, • • •, X + 1.  This leads again to a contradiction to the definition of A .  ^  Proof of Theorem 2.1 Without loss of generality, /° = 1 is assumed. Suppose (ii) holds. The condition A(Af n i??) > 0 for ah s and i imphes mim^siK^i  ^ ^?)} > 0- Then, X{{R'^}^+^) >  mini=i,2 A(i2^ n i2?) > min,=i,2 A(Af n R'^) > mini,^{A(Af n i?^)}. We conclude that d° is identifiable w.r.t. L by taking the infima in the last inequality. Now assume (i) holds. Let Aj  — {-x. : Xd £ l^j-ii^j]},  where bj is defined in Lemma 2.2, j = 1 , - - - , X + 1.  By Lemma 2.2, - 0 0 = 6^ < 6J' < • • • < 6^ < ^l+i definition of b^s, X'^([u,b*j]) > A for all u <  =  and A'^((-oo, 6|]) > A . By the  j = 2 , - - - , X + 1. By Lemma 2.1, X'^{lu,b])  is left continuous in u. Hence, A'^([6^_j, 6*']) > A , j = 2, • • •, X + 1. By the definition of A'^(-), X{Af n i?0) = A({x : Xd e ( - 0 0 , b1]} n X;0) > A'^((-oo, b^]) > A , and A(Af n R°) = A({x : Xd € [K-i^K]}'^R^i)  2.1 (u).  > ^'^ilK-i^K])  > A.s =  2,---,L  +  1. That is, {A^}^+/ satisfy (a) in Theorem  It remains to show that for any {Rj}f^i, r, 5 € {1, • • •, X -f 1} such that Rf C  where Rj = {x. :  £ ( r j _ i , r , ] } , there exists  . We shall show it by sequential exhaustive argument.  If Rf 75 Af then r i < 6*. If R^ 75 Af, i = 1,2, then r2 <  If i?^ 7$ A,^, i = 1,2,3, then  Ta <b^. •• : If i2£ 75 Af, i = 1, • • •, X , then, rz, < bl and hence igf+i D A ^ ^ ^ . This completes the proof of Theorem 2.1.  Corollary 2.2  1[  Suppose the distribution of Z i = ( x n , . . . , Xip)' has support ( a i , 6 i ) x ••• X  (flp, 6p), where —00 < Ui < bi < 0 0 , i — 1,... ,p. Then for any integer X > / ° , d° is identifiable w.r.t. X .  Proof For any d ^ d^, any X + l mutually exclusive subsets of the form {x : Xd £ [a, T]]}, where a < Tj  and [a,r]] C ia.d,bd), will serve as the {Aj}^^l in Theorem 2.1. Hence the identifiabihty  of d° follows.  
^  Corollary 2.3 Suppose the support of distribution of z i = (xn,... of R P . Then for any integer X >  Proof  ,Xxp)'  is a convex subset  is identifiable w.r.t. X .  Since the support of distribution of Z i is convex, it contains a subset of the form  (ai, 61) X . . . X (ttp, b-p), where —00 < a, < bi < 0 0 , i = 1,... ,p. For any d 7^ c?°, any X + l mutually exclusive subsets of the form {x : serve as the {A'j)^^l in Theorem 2.1.  € [a, 77]}, where a < rj and [a, T]] C (a^, 6^), will  f  2.2 Estimation procedures  The least squares criterion is used to select d. The idea is simple. Suppose that d^ is identifiable and that a wrong d were chosen as the threshold variable. Then for sufficiently  large n, on at least one of the Rj^s, say Rf, the model exhibits nonhnearity, resulting in a large sum of squared errors on Rf.  Hence, the total sum of squared residuals is large. In contrast,  if d° were chosen, by adjusting the f / s , the model on each {x : f j _ i < x^o < fj} would be roughly hnear, resulting in a smaher total sum of squared errors. Therefore, d should be chosen as the d resulting in the smallest total sum of squared errors. To simphfy the implementation of this idea, let  \enJ In{A) := c f i a p ( l ( x , e ^ ) , . . . , l ( x „ e A ) ) , A C R''+'' XniA)  :=  In{A)Xn,  H^{A) := Sn{A)  Xr.{A)[X'M)Xn{A)]-X'M  := Y:,{UA)  -  Hn{A))Yn,  and Tn{A) :=  è'MA)ên,  where in general for any matrix M, M~ denotes a generahzed inverse. Note that X „ ( A ) , Hn{A) and Sn{A) are, respectively, the covariates, "hat matrix" and the sum of squared residual errors from fitting a linear model based on just the observations in A. Finally, for any {RjYjtl  define the total sum of squares over different regions as ;+! i=i  The first method for estimating  is given below.  Method 1 Suppose d° is identifiable w.r.t. L . Choose d to minimize the sum of squared errors. More precisely, let  := S^{ff,...,  f^), where  < • • • < f | minimize  S^{TI,  . . . , r^) over ah  ( r i , . . . , TL). Select d such that  < 5^ for d = 1,... ,p. Should multiple minimizers occur, we  define d to be the smallest of them.  Remark  When calculating SniRj),  at least p data points must be in  to ensure the  regression coefficients on that segment are uniquely determined.  This method requires intensive computation. A s Feder (1975a) and other authors note, S^{TI, TL),  • • •, TL) may not be differentiable at the true change points. So to minimize 5'^(TI, • • • ,  one has to search all ( r i , • • •,TL).  Fortunately, we can do this by restricting ourselves to  the finite set {xid, • • •, Xnd}, without loss of generality. Even so, exhausting all (T^, • • •, T^) for any d needs (£) x ( i + 1) linear fits. Although a method more efficient than actually doing the (2) x{L + l) fits exists, there is still a lot of work for any i > 3 and large n. So, under stronger conditions, we give another more efficient method. This method is based on the following idea. Suppose z i = (xu, •.., xip)' is a continuous random vector and the support of its distribution is ( c i , 6i) X . . . x ( o p , bp), where —oo < a,- < 6,- < oo, (i — 1, - • • ,p). Then for any d we can partition {ad,bd) into 2L + 2 disjoint intervals such that there are an equal number of observations in each of the intervals. For any d ^ d°, on all these intervals the model will exhibit nonlinearity and hence the linear fits will result in larger sum of squared errors. 
If d = d^, then there are at least X + l intervals that are entirely embedded in one of the ( r ° _ j , r ° ] ' s . Hence, on those intervals, the model is linear and the sum of squared errors from hnear fits are smaller. Thus, the total of the smallest L + 1 sums of squared errors for d = d° is expected to be smaller than that for d ^ d^. It is easy to see that the above argument holds as long as the number of partitions is no less than X + 2. The practical advantages of choosing a number larger than X + 2 will be discussed in Section 3.2. We summarize the above discussion as follows:  Method 2  Suppose Zi = ( x n , . . . , xip)' is a continuous random vector and the support of its  distribution is  X . . . X (ap,6p), where - o o < a,- < 6j < oo, i = 1,.. .,p. Let r'j be the  [100 X j/{2L + 2)]th percentile of Xt^'s,  = { x i : xu G (^j^-i, r^^]}, j = 1,..., 2X + 2. Select  d, so that  for aU d = 1, • • •, p, where 5^=x;'5•n(4)) :=1  and 5„(À(''-)) is the ith smallest of 5 „ ( À ^ ) , • • •, 5„(À^£,+2)-  Remark  For any d, Method 2 requires only 2X + 2 linear fits (independent of n).  The  computational effort is significantly reduced compared with Method 1.  Now, with d'^ estimated above, we can assume that rf" is known and estimate other parameters. For simphcity, we shall drop the superscript, d, on  and rj^'s in the rest of this  section. First we estimate P and the thresholds,  , . . . , r^J, by minimizing the modified Schwarz'  criterion (Schwarz, 1978),  MICil)  for some constants CQ > 0,  := l n [ 5 ( f i , . . . , f;)/(n -p*)] +  ££O^Î^)!l!l^ n  (2.4)  > 0. In equation (2.4), p* = (I + l)p + I ^ (I + l){p + 1) is the  total number of fitted parameters, and for any fixed /, f i , . . . , f/ are the least squares estimates which minimize 6 ' „ ( r i , . . . , r;) subject to —oo = TQ < TI < • • • < r;+i = oo. Recall that Schwarz' criterion (SC) is defined by  SC{1) = ln[Sin,fi)l{n  - I)] + / ^ ^ . 26  (2.5)  We can see that the distinction between MIC{1) and SC{1) hes in the severity of the penalty for overspecification. A n d a severer penalty is essential for the correct specification of a nonGaussian, segmented regression model, since SC{1) is derived under Gaussian assumption (cf., Yao, 1988). Both criteria are sometimes referred as penalized least squares. W i t h estimates, / of / ° , and fj for r ° , i = 1 , . . . , / available, we then estimate the other regression parameters  and the residual variance  by the ordinary least squares estimates,  h = [x;(4)x„(Âi)]-x;(Ài)Yn, î = i , . . . , / + i , and =  5„(fi,...,f/)/(n-p*),  where Ri = {x : f , _ i < x^o < fi}, p* = (l + l)p + I. Under regularity conditions essential for the identifiability of the regression parameters, we shall see in Chapter 3 that the ordinary least squares estimates Pj will be unique with probabihty approaching 1, for j = 1 , . . . , / -|- 1, as n —>• oo. While for a really large sample size, we do not expect the choice of  and CQ to be crucial,  for small to moderate sample sizes, this choice does influence the model selection. Below, we briefly discuss the choice of CQ and ^oIn general, when selecting models, a relatively large penalty term would be preferable for the models that can be easily identified. This is because a larger penalty will greatly reduce the probabihty of overestimation while not risking underestimation too much. 
However, if the model is difficult to identify (e.g., a continuous model with  \\dj+i  —  Pj\\  small), the penalty  should not be too large since the risk of underestimation is now high. Another factor infiuencing the choice of the penalty is the error distribution. A distribution with heavy tails is likely to generate extreme values, making it look as though a change in  response has occurred. To counter this effect, one needs a heavier penalty. In fact, if ej has only finite order moments, a penalty of order  for some a > 0 is needed to make the  estimation of 1° consistent. Given that the best criterion is model dependent and no uniformly optimal choice can be made, the following considerations guide us to a reasonable choice of  and CQ:  (1) From the proof of Lemma 3.2 in Section 3.1, we shall see that it is possible that the exponent 2 + SQ in the penalty term of MIC may be further reduced, while keeping the model selection procedure consistent. A n d since the Schwarz' criterion (where the exponent is 1) is obtained by maximizing the posterior likelihood in a model selection paradigm and is widely used in model selection problems, it may be used as a basehne reference. Adopting such a view,  should  be small to reduce the potential risk of underestimation when the noise is normal and n is not large. (2) For a small sample, it is practically difficult to distinguish normal and double exponential noise, or t distributed noise. A n d , hence, one would not expect the choice of SC or any other reasonable criterion to make a drastic difference. (3)  A s Yao (1988) noted for large samples, SC tends to overestimate /° if the noise is not  normal. We observe such overestimation in our simulations under different model specifications when n = 50 (see Section 3.3). Based on (1), we should choose a small ^o- A n d by (2), with SQ chosen, we can choose some moderate no, and solve for CQ by forcing MIC equal to SC at UQ. By (3), no < 50 seems desirable. In the simulation reported in the next section, we (arbitrarily) choose 6o to be 0.1 (which is considered to be small). W i t h such a 6o, we arbitrarily choose no = 20 and solve for Co. We get Co = 0.299.  In summary, since the "best" selection of the penalty is model dependent for finite samples, no optimal pair of (co,^o) can be recommended. On the other hand, our choice of  = 0.1  and Co = 0.299 performs reasonably well for most of the cases we experimented with in our simulation. The simulation results are reported in Section 3.3. Further study is needed on the choice of 6o and co under different assumptions. A data set used in Henderson and Velleman (1981) is analyzed below to illustrate the estimation procedures proposed above. The data consist of measurements of three variables, miles per gallon (y), weight (xj) and horse power (x^), on thirty eight 1978-79 model automobiles. The dependence of y on Xi and X2 is of interest. Graphs of the data show a certain nonlinear dependence structure between y and xi (see Figure 2.3). Suppose we want to fit a model of the form (2.1). In this case, it becomes  yt = Pio + Piixn + Pi23:t2 + Q, if xtd £ ( r , _ i , r i ] , i = l , . . . , / - f 1,  where  (2.6)  is assumed to have zero mean and variance <t^. To demonstrate the use of two methods  of estimating  let us ignore the information given by Figure 2.3 (which suggests <i° = 1 and  /° = 1) and estimate d° by calculation. First, we (arbitrarily) choose L - 2 and apply Method 1. We get 5^ = 120.0 and Si = 136.0. Hence  = 1 is chosen by Method 1. 
W i t h Z = 2 we get on applying Method 2, S^ = 14.6  and Si — 15.3. Thus, d — 1 is also chosen by Method 2. Both methods agree with the casual observation made above about Figure 2.3. Next, with d = 1, we calculate and compare MIC{1) for / = 0,1,2 to estimate / ° . For illustrative purposes, the constants CQ and 6o in the penalty term of MIC are chosen as 0.2 and 0.05 respectively, to enable the piecewise model to remain competitive for this small sample example. The MIC values for / = 0,1,2 are 2.28, 2.11 and 2.31 respectively. Thus / = 1 is chosen  by the criterion. Then with / = 1, f i = 2.7 is obtained. W i t h these estimates, the estimated coefficients are (  o , / 3 i 2 ) = (48.82,-5.23,-0.08), (/320,/32iJ22) = (30.76,-1.84,-0.05) and â2 = 4.90. Finally, treating the MIC as a general model selection criterion rather than a tool for finding  two more competing models are fitted to the data. These are  2/t = /?o+/?ia;n + ef,  (2.7)  2/i = /3o + Pxxn + P2x\i + P:iXt2 + ft-  (2.8)  and  From Figure 2.3, both models seem appealing. The MIC values for these two models are 2.24 and 2.12. Thus, the segmented model is chosen as the "best". Needless to say, it is only the "best" among the few models considered; further model reduction may be possible.  2.3  General remarks  In Section 2.1, we have discussed the identifiability of cP. It can be seen from Corollary 2.3 that i n many regression problems, dP can be treated as identifiable w.r.t. any L > But, it is important to reahze that ^ is not always uniquely identifiable and to know when it is not uniquely identifiable, i n an asymptotic sense. It is also important to bear i n mind the question of identifiability in a design problem. The results in Section 2.1 have provided an answer to these questions. Moreover, these results not only provide a foundation for estimating dP in model (2.1) for continuous covariates, but they also address the same problem when the covariates are discrete or ordered categorical. For example, one may want to know which of the two covariates, the dose of certain drug or age group, alters the dependent structure of blood  pressure on the two. In this case, the identification of cP is important even when the change point is not uniquely defined. As in the example of automobiles, the MIC  we proposed in the last section should be  treated as a method of model selection, and not merely as a tool of estimating dP. In fact, in the case when dP is only identifiable w.r.t. some number less than the known L, d^ and P can be jointly estimated by minimizing MIC over all the combinations of d{<. p) and /(< L). In the next chapter, the consistency of these estimates, under certain conditions, will be shown. From a much broader perspective, our estimation procedures can be seen as a general adaptive model fitting technique. The upper bound L on the number of segments is imposed to ensure computational feasibility and to avoid the "curse of dimensionality"; in other words, L ensures there are sufficient data to enable each piece of the model to be well estimated even when the covariate is a vector of high dimension. W i t h this upper bound, the number of segments and the boundaries of each segment are selected by the data. It will be shown in the next chapter that these estimates are also consistent.  Chapter 3  A S Y M P T O T I C RESULTS FOR ESTIMATORS OF SEGMENTED REGRESSION MODELS  In this chapter, asymptotic results for the estimators given in the last chapter are proved. 
The exact conditions under which these results hold are stated and explained. It will be seen that these conditions seem realistic for many practical problems. More importantly, the techniques we use in this chapter constitute a foundation for the generalizations of Model (2.1) given in Chapter 4. In some cases the parameter d^0 is known a priori; in such cases the notation required for presenting the proofs of our results is relatively simple, and so we first prove the results for these cases. In Section 3.1 we establish the consistency of the estimated number of segments, the estimated thresholds and the estimated regression coefficients. Then, for the discontinuous model, an upper bound is given for the rate of convergence of the estimated change points. The asymptotic normality of the estimated regression coefficients and of the estimated variance of the noise is also established. In Section 3.2 we move to the case of unknown d^0 and prove the consistency of the two estimators of d^0 given in Section 2.2. It will be easy to see that the results proved in Section 3.1 still hold if d^0 is replaced by its consistent estimate. In Section 3.3, the finite sample behavior of these estimators is investigated by simulation for various models and noise distributions. Some general remarks are made in Section 3.4. The asymptotic normality of the various estimates for the continuous model is established in Section 3.5.

3.1  Asymptotic results when the segmentation variable is known

In this section, the parameter d in model (2.1) is assumed known. Consequently, we can simplify the notation given at the beginning of Section 2.2. For any -∞ ≤ α < η ≤ ∞, let

    I_n(α, η) := diag(1_{(x_{1d} ∈ (α,η])}, ..., 1_{(x_{nd} ∈ (α,η])}),   X_n(α, η) := I_n(α, η) X_n,

and

    H_n(α, η) := X_n(α, η)[X_n'(α, η) X_n(α, η)]^- X_n'(α, η),

where in general for any matrix A, A^- will denote a generalized inverse, while 1_{(.)} represents the indicator function. Similarly, let

    Y_n(α, η) := I_n(α, η) Y_n,   ê_n(α, η) := I_n(α, η) ê_n,

    S_n(α, η) := Y_n'[I_n(α, η) - H_n(α, η)] Y_n,

    S_n(τ_1, ..., τ_l) := Σ_{i=1}^{l+1} S_n(τ_{i-1}, τ_i),   τ_0 := -∞,  τ_{l+1} := ∞,

and

    T_n(α, η) := ê_n' H_n(α, η) ê_n.

Observe that S_n(α, η) is just the error sum of squares when a linear model is fitted over the "threshold" interval (α, η]. Also, let the forecast of Y_n on the interval (α, η], Ŷ_n(α, η), be defined by Ŷ_n(α, η) := H_n(α, η) Y_n. Then, in terms of the true parameters, (2.1) can be rewritten in the vector form

    Y_n = Σ_{j=1}^{l^0+1} X_n(τ_{j-1}^0, τ_j^0) β_j^0 + ê_n.    (3.1)

To establish the asymptotic theory for the estimation problems of Model (3.1), some assumptions have to be made. First, we assume an upper bound, L, of l^0 can be specified. This is because in practice the sample size n is always finite and hence any l^0 that can be effectively identified is always bounded. We also assume the segmentation does occur at every true threshold, i.e., β_j^0 ≠ β_{j+1}^0, j = 1, ..., l^0, so that these parameters are uniquely defined. The covariates {x_t} are assumed to form a strictly stationary, ergodic random sequence. Further, {x_t} and the error sequence {e_t} are assumed independent. These are the basic assumptions underlying our analysis. To simplify the problem further, we assume in this chapter that the errors {e_t} are iid random variables with mean zero and variance σ_0². In addition, a local exponential boundedness condition is placed on the distribution of the errors {e_t}.
A random variable Z is said to be locally exponentially bounded if there exist two positive constants, c_0 and T_0, such that

    E(e^{uZ}) ≤ e^{c_0 u²},  for every |u| ≤ T_0.    (3.2)

The above assumptions are summarized in

Assumption 3.0:  The covariates {x_t} and the errors {e_t} are independent, where the {x_t} are strictly stationary and ergodic with E(x_1' x_1) < ∞, and the {e_t} are iid with a locally exponentially bounded distribution having mean zero and variance σ_0². For the number of thresholds l^0, there exists a known L such that l^0 ≤ L. Also, β_j^0 ≠ β_{j+1}^0 for any j = 1, ..., l^0.

Remark   The local exponential boundedness condition is satisfied by any distribution with zero mean and a moment generating function whose second derivative is bounded around zero. Many distributions commonly used as error distributions, such as those in the symmetrized exponential family, are of this type, and hence all the theorems in this chapter apply to them.

The next assumption is required to identify the number of thresholds l^0 consistently.

Assumption 3.1   There exists δ ∈ (0, min_{1≤j≤l^0}(τ_{j+1}^0 - τ_j^0)/2) such that both E{x_1 x_1' 1_{(x_{1d} ∈ (τ_j^0 - δ, τ_j^0])}} and E{x_1 x_1' 1_{(x_{1d} ∈ (τ_j^0, τ_j^0 + δ])}} are positive definite for each of the true thresholds τ_1^0, ..., τ_{l^0}^0.

Under Assumption 3.1, the design matrix X_n(α, η) has full column rank a.s. as n → ∞ for every open interval (α, η) containing at least one of the (τ_i^0 - δ, τ_i^0 + δ], i = 1, ..., l^0. So β̂(α, η) = [X_n'(α, η)X_n(α, η)]^- X_n'(α, η)Y_n will be unique with probability tending to 1 as n → ∞. It is easy to see that Assumption 3.1 is satisfied if and only if the conditional covariances of z_1 = (x_{11}, ..., x_{1p})', namely Cov(z_1 | x_{1d} ∈ (τ_i^0 - δ, τ_i^0]) and Cov(z_1 | x_{1d} ∈ (τ_i^0, τ_i^0 + δ]), (i = 1, ..., l^0), are both positive definite. Assumption 3.1 means that the model can be uniquely determined over each of {x_1 : x_{1d} ∈ (τ_i^0 - δ, τ_i^0]} and {x_1 : x_{1d} ∈ (τ_i^0, τ_i^0 + δ]}, i = 1, ..., l^0. The remark immediately after the proof of Theorem 3.1 will show that this assumption can be slightly relaxed.

To estimate the thresholds consistently, we need

Assumption 3.2   For any sufficiently small δ > 0, E{x_1 x_1' 1_{(x_{1d} ∈ (τ_i^0 - δ, τ_i^0])}} and E{x_1 x_1' 1_{(x_{1d} ∈ (τ_i^0, τ_i^0 + δ])}} are positive definite, i = 1, ..., l^0. Also, E(x_1' x_1)^u < ∞ for some u > 1.

Obviously, Assumption 3.2 implies Assumption 3.1.

If Model (3.1) is discontinuous at τ_j^0 for some j = 1, ..., l^0, it will be shown that the least squares estimate τ̂_j converges to τ_j^0 at a rate no slower than Op(ln² n/n), under the following assumption:

Assumption 3.3

(A.3.3.1)   The covariates {x_t} are iid random variables. Also, E(x_1' x_1)^u < ∞ for some u > 2.

(A.3.3.2)   Within some small neighborhoods of the true thresholds, x_{1d} has a positive and continuous probability density function f_d(.) with respect to the one-dimensional Lebesgue measure.

(A.3.3.3)   There exists one version of E[x_1 x_1' | x_{1d} = x] which is continuous within some neighborhoods of the true thresholds, and that version has been adopted.

Remark   Assumptions (A.3.3.2)-(A.3.3.3) are satisfied if z_1 = (x_{11}, ..., x_{1p})' has a joint distribution in canonical form from the exponential family.

Note that Assumptions 3.1-3.3 are made on the distribution of {x_t}. When the {x_t} are nonrandom, one may assume the empirical distribution function of {x_t} converges to a distribution function satisfying these assumptions.
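As a quick illustration of condition (3.2): for Z ~ N(0, σ²), E e^{uZ} = e^{σ²u²/2}, so (3.2) holds with c_0 = σ²/2 and any T_0; for the double exponential law DE(0, λ) used in Section 3.3, E e^{uZ} = (1 - λ²u²)^{-1} for |u| < 1/λ, and since -ln(1 - x) ≤ 2x for 0 ≤ x ≤ 1/2, one may take c_0 = 2λ² and T_0 = 1/(λ√2). The short numerical check below (our own sketch, not part of the thesis) verifies the double exponential case for λ = 1/√2, the unit-variance scaling used later.

```python
import numpy as np

# Check (3.2) for Z ~ DE(0, lam) with lam = 1/sqrt(2) (so Var Z = 2*lam^2 = 1):
# log E e^{uZ} = -log(1 - lam^2 u^2) on |u| < 1/lam; take c0 = 2*lam^2 = 1 and
# T0 = 1/(lam*sqrt(2)) = 1, so that lam^2 u^2 <= 1/2 on |u| <= T0.
lam = 1 / np.sqrt(2)
c0, T0 = 2 * lam**2, 1 / (lam * np.sqrt(2))
u = np.linspace(-T0, T0, 2001)
log_mgf = -np.log1p(-(lam * u) ** 2)      # exact log-mgf of the DE law
assert np.all(log_mgf <= c0 * u**2)       # (3.2) holds on |u| <= T0
```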
Now, the main results of this section are presented in the next five theorems. Their proofs are given in the sequel.

Theorem 3.1   Assume for the segmented linear regression model (3.1) that Assumptions 3.0 and 3.1 are satisfied. Then l̂, the minimizer of (2.4), converges to l^0 in probability as n → ∞.

Remark   In the nonlinear minimization of S_n(τ_1, ..., τ_l), the possible values of τ_1 < ... < τ_l may be limited to {x_{1d}, ..., x_{nd}}. This restriction induces no loss of generality.

Theorems 3.2 and 3.3 show that the estimates τ̂, the β̂_j's and σ̂² are consistent.

Theorem 3.2   Assume for the segmented linear regression model (3.1) that Assumptions 3.0 and 3.2 are satisfied. Then τ̂ → τ^0 in probability as n → ∞, where τ^0 = (τ_1^0, ..., τ_{l^0}^0) and τ̂ = (τ̂_1, ..., τ̂_{l̂}) is the least squares estimate of τ^0 based on l = l̂, and l̂ is a minimizer of MIC(l) subject to l ≤ L.

Theorem 3.3   If the marginal cdf F_d of x_{1d} satisfies the Lipschitz condition |F_d(x') - F_d(x'')| ≤ C|x' - x''| for some constant C in a small neighborhood of x_{1d} = τ_j^0 for every j, then under the conditions of Theorem 3.2, the least squares estimates (β̂_j, j = 1, ..., l̂ + 1) based on the estimates l̂ and the τ̂_j's as defined in Section 2.2 are consistent.

The next two theorems show that if Model (3.1) is discontinuous at τ_j^0 for some j = 1, ..., l^0, then the threshold estimate τ̂_j converges to the true threshold τ_j^0 at the rate of Op(ln² n/n), and the least squares estimates of β_j^0 and σ_0² based on the estimated thresholds are asymptotically normal.

Theorem 3.4   Suppose for the segmented linear regression model (3.1) that Assumptions 3.0, 3.2 and 3.3 are satisfied. For any j ∈ {1, ..., l^0} such that P(x_1'(β_{j+1}^0 - β_j^0) ≠ 0 | x_d = τ_j^0) > 0,

    τ̂_j - τ_j^0 = Op(ln² n / n).

Let the β̂_j and σ̂² be the least squares estimates of the β_j^0 and σ_0² based on the estimates l̂ and the τ̂_j's as defined in Section 2.2, j = 1, ..., l^0 + 1.

Theorem 3.5   Suppose for the segmented linear regression model (3.1) that Assumptions 3.0, 3.2 and 3.3 are satisfied. If P(x_1'(β_{j+1}^0 - β_j^0) ≠ 0 | x_d = τ_j^0) > 0 for all j = 1, ..., l^0, then √n(β̂_j - β_j^0) and √n[σ̂² - σ_0²] converge in distribution to normal distributions with finite variances, j = 1, ..., l^0 + 1.

Remark   The asymptotic variances can be computed by first treating l^0 and the τ_j^0, (j = 1, ..., l^0), as known, so that the usual "estimates" of the variances of the estimates of the regression coefficients and residual variance can be written down explicitly, and then substituting l̂ and τ̂_j for l^0 and τ_j^0, (j = 1, ..., l^0), in these variance "estimates". For example, the asymptotic covariance matrix for β̂_j is σ_0² G_j^{-1}, where G_j = E[x_1 x_1' 1_{(x_{1d} ∈ (τ_{j-1}^0, τ_j^0])}].

The proof of Theorem 3.1 is motivated by the following idea. If the model is overfitted (l^0 < l ≤ L), the reduction in the mean squared error will be bounded in probability by a positive sequence tending to zero; in fact, this turns out to be Op(ln² n/n). On the other hand, if the model is underfitted (l < l^0), the inflation in the mean squared error will be of order Op(1). Hence, by setting the penalty term in MIC equal to a quantity of order bigger than Op(ln² n/n) but still tending to 0, we can avoid both overfitting and underfitting. This idea is formulated in a series of lemmas.
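Before turning to the lemmas, here is a minimal sketch of the criterion they justify, reusing the hypothetical seg_sse helper from the sketch in the automobile example above. The parameter count p* below, taken as (l+1)(p+1) + l, is our assumption; any count growing linearly in l leads to the same comparisons in the proof of Theorem 3.1.

```python
import numpy as np
from itertools import combinations

def mic(y, X, xd, l, c0=0.299, delta0=0.1):
    """MIC(l): log of the minimized mean squared residual of the l-threshold
    fit, plus a penalty of order (ln n)^(2+delta0)/n, slightly heavier than
    Schwarz' ln(n)/n.  The threshold search is a plain grid over observed
    x_td values, as the remark after Theorem 3.1 permits."""
    n = len(y)
    cand = np.unique(xd)[:-1]
    sse = min(seg_sse(y, X, xd, taus) for taus in combinations(cand, l))
    p_star = (l + 1) * X.shape[1] + l                 # assumed count
    return np.log(sse / n) + c0 * p_star * np.log(n) ** (2 + delta0) / n

def lhat(y, X, xd, L=2):
    """Estimate the number of thresholds by minimizing MIC over l <= L."""
    return min(range(L + 1), key=lambda l: mic(y, X, xd, l))
```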
The result of Lemma 3.1 is a consequence of the local exponential boundedness assumption, which gives the added flexibility of modeling with non-Gaussian noises. Using the properties of the hat matrix H_n(x_{sd}, x_{td}), Lemma 3.2 establishes a uniform bound on T_n(α, η) for all α < η. With this lemma, we show in Proposition 3.1 that the mean squared residuals differ from the mean squared pure errors only by Op(ln² n/n), which in the sequel motivates the choice of the penalty term in our MIC. Given Lemma 3.2 and Proposition 3.1, the results of Lemmas 3.3 and 3.4 are more or less expected.

Lemma 3.1   Let Z_1, ..., Z_k be i.i.d. locally exponentially bounded random variables, i.e., E(e^{uZ_1}) ≤ e^{c_0 u²} for |u| ≤ T_0, where T_0 and c_0 ∈ (0, ∞). Let S_k = Σ_{i=1}^k a_i Z_i, where the a_i's are constants. Then for any t_0 > 0 satisfying |t_0 a_i| ≤ T_0, i ≤ k,

    P{|S_k| ≥ x} ≤ 2 exp(-t_0 x + c_0 t_0² Σ_{i=1}^k a_i²).    (3.3)

Proof   It follows from Markov's inequality that for the hypothesized t_0,

    P{S_k ≥ x} = P{e^{t_0 S_k} ≥ e^{t_0 x}} ≤ e^{-t_0 x} E(e^{t_0 S_k}) = e^{-t_0 x} E(e^{t_0 Σ_{i=1}^k a_i Z_i}) ≤ e^{-t_0 x} e^{c_0 t_0² Σ_{i=1}^k a_i²},

and, to conclude the proof of (3.3),

    P{S_k ≤ -x} = P{-S_k ≥ x} ≤ e^{-t_0 x} e^{c_0 t_0² Σ_{i=1}^k a_i²}.  ¶

Lemma 3.2   Assume for the segmented linear regression model (3.1) that Assumption 3.0 is satisfied. Let T_n(α, η), -∞ ≤ α < η ≤ ∞, be defined as in the beginning of this section. Then

    P{sup_{α<η} T_n(α, η) ≥ (9 p_0³/T_0²) ln² n} → 0,  as n → ∞,    (3.4)

where p_0 is the true order of the model and T_0 is the constant associated with the local exponential boundedness condition for the {e_t}.

Proof   Conditioning on X_n, we have

    P{sup_{α<η} T_n(α, η) ≥ (9p_0³/T_0²) ln² n | X_n}
    = P{max_{x_{sd}<x_{td}} ê_n' H_n(x_{sd}, x_{td}) ê_n ≥ (9p_0³/T_0²) ln² n | X_n}
    ≤ Σ_{x_{sd}<x_{td}} P{ê_n' H_n(x_{sd}, x_{td}) ê_n ≥ (9p_0³/T_0²) ln² n | X_n}.

Since H_n(x_{sd}, x_{td}) is nonnegative definite and idempotent, it can be decomposed as H_n(x_{sd}, x_{td}) = W'ΛW, where W is orthogonal and Λ = diag(1, ..., 1, 0, ..., 0) with p := rank(H_n(x_{sd}, x_{td})) = rank(Λ) ≤ p_0. Set Q = (I_p, 0)W. Then Q has full row rank p. Let Q' = (q_1, ..., q_p) and U_l = q_l' ê_n, l = 1, ..., p. Then

    ê_n' H_n(x_{sd}, x_{td}) ê_n = Σ_{l=1}^p U_l².

Since p ≤ p_0 and

    P{Σ_{l=1}^p U_l² ≥ (9p_0³/T_0²) ln² n | X_n} ≤ P{Σ_{l=1}^p U_l² ≥ p(9p_0²/T_0²) ln² n | X_n}
    ≤ P{U_l² ≥ (9p_0²/T_0²) ln² n for some l | X_n} ≤ Σ_{l=1}^p P{|U_l| ≥ (3p_0/T_0) ln n | X_n},

it suffices to show, for any l, that

    Σ_{x_{sd}<x_{td}} P{|U_l| ≥ (3p_0/T_0) ln n | X_n} → 0,  as n → ∞.

Noting that p = trace(H_n(x_{sd}, x_{td})) = Σ_{l=1}^p ||q_l||², we have ||q_l||² = q_l' q_l ≤ p ≤ p_0, l = 1, ..., p. By Lemma 3.1, with t_0 = T_0/p_0 we have

    P{|U_l| ≥ 3p_0 ln n/T_0 | X_n} ≤ 2 exp(-(T_0/p_0)(3p_0/T_0) ln n) exp(c_0 (T_0/p_0)² p_0) = 2n^{-3} exp(c_0 T_0²/p_0),

and hence

    Σ_{x_{sd}<x_{td}} P{|U_l| ≥ 3p_0 ln n/T_0 | X_n} ≤ n(n-1) n^{-3} exp(c_0 T_0²/p_0) → 0,

as n → ∞, where c_0 is the constant specified in Lemma 3.1. Finally, by appealing to the dominated convergence theorem we obtain the desired result without conditioning.  ¶

Proposition 3.1   Consider the segmented regression model (3.1).

(i) For any j and (α, η] ⊂ (τ_{j-1}^0, τ_j^0],

    S_n(α, η) = ê_n'(α, η) ê_n(α, η) - T_n(α, η).

(ii) Suppose Assumption 3.0 is satisfied. Let m ≥ 1. Then uniformly for all (α_1, ..., α_m) such that -∞ ≤ α_1 ≤ ... ≤ α_m ≤ ∞,

    Σ_{i=1}^{m+l^0+1} S_n(ξ_{i-1}, ξ_i) = ê_n' ê_n + Op(ln² n),

where ξ_0 = -∞, ξ_{m+l^0+1} = ∞, and {ξ_1, ..., ξ_{m+l^0}} is the set {τ_1^0, ..., τ_{l^0}^0, α_1, ..., α_m} after ordering its elements.

Proof: (i) Observe that

    S_n(α, η) = Y_n'[I_n(α, η) - H_n(α, η)]Y_n
    = (X_n(α, η)β_j^0 + ê_n(α, η))'(X_n(α, η)β_j^0 + ê_n(α, η))
      - (X_n(α, η)β_j^0 + ê_n(α, η))' H_n(α, η) (X_n(α, η)β_j^0 + ê_n(α, η))
    = β_j^0' X_n'(α, η)X_n(α, η)β_j^0 + 2ê_n'(α, η)X_n(α, η)β_j^0 + ê_n'(α, η)ê_n(α, η)
      - [β_j^0' X_n'(α, η)H_n(α, η)X_n(α, η)β_j^0 + 2ê_n'(α, η)H_n(α, η)X_n(α, η)β_j^0 + ê_n'(α, η)H_n(α, η)ê_n(α, η)].

Noting that H_n(α, η) is idempotent and

    X_n'(α, η)H_n(α, η)X_n(α, η) = X_n'(α, η)X_n(α, η),

we have

    (X_n(α, η) - H_n(α, η)X_n(α, η))'(X_n(α, η) - H_n(α, η)X_n(α, η)) = X_n'(α, η)X_n(α, η) - X_n'(α, η)X_n(α, η) = 0

and hence X_n(α, η) = H_n(α, η)X_n(α, η).
Therefore 5 „ ( a , 7/) = ê U a , 7 ? ) ë „ ( a , 7?) - 4 ( a , 7 ? ) 5 " „ ( a , 7 7 ) è „ ( Q , 7/)  =ê'„("> '/)ë„(a, 7;) - T „ ( a , 77). (ii) B y (i), m+l° + l «•=1 m+l° + l  •- E  «=i  =ê'„ê„-  K(ei-i,6)ê„(ei-i,6)-r„(e.-i,e.)]  E  î^n(6-l,^i).  Note that each of (6-1 > ft] is contained in one of ( r ° _ i , rj"], j — 1, • • • ,1^ + 1. B y Lemma 3.2, ET=/^'  <{m + P + l ) s u p „ < , r „ ( a < r?) = Op{\n' n).  Lemma 3.3  Under the condition of Theorem  3.1,  there exists 8  £  %  (0, mini<j</o (rj'^j —  TJ)/2)  such that for r = 1 , . . . ,  [ 5 „ ( r ° - 6,  + 6)-  5 „ ( r ° - S, r,) - 5 „ ( T ° , r ° + ê)]/n ^  C.  (3.5)  for some Cr > 0 as n —* 0 0 .  Proof  It suffices to prove the result when 1° = 1. For notational simplicity, we omit the  subscripts and superscripts 0 in this proof. For the S in Assumption 3 . 1 , let Xj* = X „ ( r i —^, r i ) , ^ 2 = ^ n ( r i , n + 6), X* = X „ ( r i - ^, n + <5) = X ; + X;, el* = è„(ri - «5, rj), €* =  = è„(ri, n + 8 ) ,  + €2 and P = ( X * ' X * ) ~ X * ' y n . As in ordinary regression, we have Sn{ri-8,Tx  + 8)  =\\xfpx + x*j2 +  r-x*'p\?  =\\x:{h-h+x;cP2-h+n' =\mh  - h?  + m i h  - h?  +  +2ê*'x;{h - h  + 2e~*'x^ip, -  h  It then follows from the strong law of large numbers for stationary ergodic stochastic processes that as n - * 00, 1  '  1 , —XX* "  '  '  1 "  f ^{Xixil(^^,g(^,_5,^,])} > 0, ""'i <  if  \ i ; { x i x i l ( x , . e ( n , n + 6 ] ) } > 0,  if  and  To  j=2.  Therefore,  Similarly, it can be shown that f (  - ;â*)'X;(xixil(,,,g(,,_5,n])) • (^1 - /3*), n  •'  02-n'E{x^K[l(^^^ç^r„n+5]))-02-n,  V x ; ( ; â ^ - ^ ) ^ 0 ,  if  j=i,  if  J=2,  for j = 1,2,  and n Thus as n —>• oo, ^ 5 „ ( r i — ^, r i -f- ^) has a finite hmit, this limit being given by lim - 5 „ ( r i - S,TI +6) n—*oo n ={h  - ^ * ) ' i ; ( x a x ; i ( . , , e ( n - . , x , i ) ) • ( À - P') + 02 - / 3 ' ) ' £ ( x i x ; i ( , , , e ( . „ . , + „ ) ) • 0,  + a^P{xtHe{n-S,n  - p*)  + S]}.  It remains to show that ^ 5 n ( T i — S,TI) and ^ 5 „ ( r i , r i + ^) converge to a-P{xid  G (TI -  ^, n ] } and cr^Pjxid G ( n , rj+<î]}, respectively, and either {Pi - / 3 * ) ' £ ( x i x i ^ ( ^ j _ s ^ r i ] ) ) 0 i  -  P*) > 0 or (;92 -^*)'£^(xixil(ij_^ç(^j,^j4.5]))(/32 -y3*) > 0. The latter is a direct consequence of the assumed conditions while the former can be shown again by the law of large numbers. To this end, we first write 5„(TI — 8,TI) in the following form (bearing in mind that P is assumed to he 1 in the proof), Sn{ri-6,Ti)  =  êl'll-Tr.{n-6,n)  using Proposition 3.1 (i). B y the strong law of large numbers,  Ul'êl iê*Xi 71  ^  E[4l^,,,e(r..s,r,])] E[eiy.il{^^^^^r,-s,T,])]  = <T'P{xrd G ( n = 0,  6,n]},  and W = lim„_oo ^I'X^  is positive definite under tlie assumption. Tlierefore,  and hence ^ 5 „ ( r i - S,TI)  a'^P{xid G  show that ~Sn{T\,Ti  a'^P{xid G {TI,TI + 6]}. This completes the proof.  Lemma 3.4  + 6)  - 8, ri]}. The same argument can also be used to %  Under the condition of Theorem 3.1, we have  (i) for every I <  , P{àj > <TQ + C}  (ii) for every I such that  I, as n ^ oo for some C > 0, and  <l < L, where L is an upper bound of  ,  0 < -I'^ln - à] = Op{ln\n)ln), n  (3.6)  where âj — ^ 5 „ ( f i , . . . , f ( ) is the estimated CTQ when the number of true thresholds is assumed to be I.  Proof  (i) Since / < / ° , for the 6 G (0, mini<j<;o (rj'^i - rj')/2) in Assumption  1 < r- < /o, such that ( f i , . . . , f i ) G Ar := { ( r i , . . . , r , ) : \TS - r ° | > S, for  3.1, there exists all s = 1 , . . . , / } .  
Hence, if we can show that for each r, 1 < r < / ° , with probabihty approaching 1,  min  Sn{Ti,---,Ti)/n>  +Cr,  for some Cr > 0, then by choosing C := mini<r<;o{Cr}, we will have proved the desired result. For any ( r i , - - - , r / ) G A ^ , let f i < ••• < 6+io+i be the ordered set { r i , . . . , r;, TI", . . . , T°_i,  T°-ë,  r°+6,  T°^i ,...,T^o} and let fo = - o o , 6+/0+2 = oo- Then it follows from Proposition  3.1 (ii) that uniformly i n 1 n n 1+1°+2  1 _T =- E ^n(6-l,ei) (3.7) = n^  E  - ^ " ( 0 - 1 , 0 ) + ' î n ( r ° - ^ , r ° ) + 5 „ ( r ° , r « + 6)]  + i [ 5 „ ( r ° - 6, r ° + ^) - 5„(r,° - S, r ° ) - 5 „ ( r ° , r ° + 6)] n = -~e'nën + Op(ln2(n)/n) + - [ 5 „ ( r ° - <5,r° + 6)-  5 „ ( r ° - (5,rO) - 5 „ ( r ° , r ° + <?)].  By the strong law of large numbers the first term on the R H S is  + o(l) a.s.. B y Lemma 3.3,  the third term on the R H S is Cr + Op(l) a.s.. Thus 1 n where Cr is defined in (3.5). (u) Let ^1 < ••• < ^/+;o be the ordered set, { n , • • •, f;,  , • • •, r,^},  = T§ = - o o and  ^;+(o+i = T°o^^ = 00. Since / > P, by Proposition 3.1 (ii) again, ^ n ^ n >'5'n(7"i , • • •, Tjo)  i.2  =4f-n + This proves (ii).  Opiln'in)).  ^  Proof of Theorem 3.1  B y Lemma 3.4 (i), for / < P and sufficiently large n , there exists  c > 0 such that MIC{1) = \n{âf) + p*{lnnf+^/n  > Inia^ + C/2) > In(al) + l n ( l + C/(2a^))  with probability approaching 1. By Lemma 3.4 (ii), for / > 1°, MIC{1) = In(âf) + p*(lnn)2+Vn Thus, P{1 >  —* 1 as  Incrf.  oo. B y Lemma 3.4 (ii) and the strong law of large numbers, for  /o < / < X , 0 > [a? - U'^ên] - [ 4 - U'jn]  = Op{ln' n / n ) ,  and [âl - cl] = [âfo Hence 0 < (âfo-àf)/â%  + [^è'jn  = Op(ln^ n/n).  MIC{1) - MIC{f)  - CT'O] = Opiln' n/n) + Op(l) ^ Op(l).  Note that for 0 < a; < 1/2, I n ( l - x ) > -2x.  = l n ( â f ) - l n ( 4 ) + CQ{1 -  Therefore,  f){\unf+^°ln  = l n ( l - ( 4 - â f ) / 4 ) + co(/ - /°)(lnn)2+«o/n > - 20p(ln2(n)/n) + co(/ - /°)(ln n)2+*Vn >0 for sufficiently large n. Whence / ^ /" as n ^ oo.  Remark:  f  From the proof of Theorem 3.1 it can be seen that if the term Co/(ln n)^+''o/n is  replaced by / - c n " " ^ , where a € (0,1) and c is a constant, the model selection procedure is still consistent. In fact, such a penalty is proposed by Yao (1989) for a one-dimensional piecewise constant model.  Remark If the assumed 6 in Assumption 3.1 is replaced by assumed sequences {flj}, {bj] such that - o c < oi < r f < 6i < • • • < a;o < r^o < 6/o < oo, and such that both E{x.ix.[l(^^^^f^a. .^o-^-^]  and £{xixil(2.j^g(,.o^{,^.])} are positive definite for j = 1 , . . . , / ° , then the conclusion of Lemma 3.3 still holds with 6 replaced by aj and bj, respectively. Therefore, the conclusion of Theorem 3.1 still holds.  To prove Theorem 3.2, we need the following lemma.  Lemma 3.5  Under the assumptions of Theorem 3.2, for any sufficiently  small 6 G (0,  mini<j</o(r^^j — rj')/2), there exists a constant Cr > 0 such that  ^ [ 5 „ ( r ° - <5,r° + 6)-  5 „ ( r ° - S,T^) - Sn(r^,T°,  + S)] ^ Cr, as n ^ oo,  where r = 1, • • Proof  It suffices to prove the result for the case when  = 1. For any small ^ > 0, all the  arguments in the proof of Lemma 3.3 apply, under Assumption  3.2. Hence the result holds.  IF Remark: Although the proofs of Lemma 3.3 and Lemma 3.5 are essentially the same, the assumptions, and hence the conclusions of these lemmas are different. In Lemma 3.3 Cr is fixed for the existing 6. 
While Lemma 3.5 implies that for any sequence of {6m} such that and  > 0  —^ 0 as m ^ oo, there exist {Cr(m)} such that the conclusion of Lemma 3.5 holds for  all m .  Proof of Theorem 3.2  B y Theorem 3.1, the problem can be restricted to {/ = / ° } . For any  suflîciently small 8' > 0, substituting S' for the 6 in (3.7) in the proof of Lemma 3.4 (i), we have  the following inequality -Snin, n  - • • ,Tio)  >-ë'^èn + Op{ln\n)ln) n 1 + -[5„(r° r ° + 6') - 5„(r," - 8', r ° ) - 5 „ ( r ° , r," + 8% n uniformly i n ( n , • • - J T / O ) £ Ar  { ( r i , • • • ,r;o) : Ir^, -  >  1 < s < / ° } . B y Lemma 3.5, the  last term on the RHS converges to a positive Cr- For sufficiently large n, this Cr will dominate the term Op(ln^ n/ra). Thus, uniformly in Ar, r = 1,... ,1^, and with probability tending to 1, 1 o / n  -Sn{ri,---,Tio)  ^ > -1e „,e „ + — Cr . n 1  This implies that with probability approaching 1 no r in Ar is qualified as a candidate for the role of f, where f = ( f i , • • •, fjo). In other words, P{T 6 Af) for all r, P{f G f l t l i ^ r )  riil^'- -  I<  1,  n -> oo. Note that for 8' < mino<i<;o{(rP^i - rP)/2},  =  r=l  1 as n ^ oo. Since this is true  - ^ r l < è'Jor  some 1 < v <  = {r € f l  r=l  r=l  Thus we have, 1°  P{\fr - r ° | < 8' for r = 1,...,/") = P{f e Ç] A';) ^ 1, as n ^ oo, r=l  which completes the proof.  ^  The proof of Theorem 3.3 requires a series of preliminary results. The key step is to establish Lemma 3.6 which implies the estimation errors of the regression coefficients are controlled by the estimation errors of the thresholds.  Proposition 3.2 Let { x „ } be a sequence of random variables. If z „ = Op(l), then there exists a positive sequence { a „ } , such that a „ ^ 0 as n ^ oo and Xn = O p ( a „ ) .  Proof Let €k =  = 1/2'', k = 1,2,- •  Since a;„ = Op(l), for e\ and ^ i , tliere exists A''i > 0  such that for all re > Ni P C k n l > Si) < €i. A n d for each pair of  and 6k, there exists Nk > iVjt_i such that for all n > Nk,  P(\Xn\ > 6k) < €k-  Let a„ = 1 if n < iVi and an = 6k ii Nk < n < Nk+i, k = 1,2, - • •. Then a„ Also, for any e > 0, there exists ko such that 0 <  0 as  re  oo.  < €. Thus for any re > Nk^, Nk < n < Nk+i  for some k > ko, and  P(\xn\ > a „ ) = P{\x^\ > 6k) < ffc < ffco < e.  Again by x „ = Op(l), there exists M > 1 such that  Pi\xn\  for all re < Nko • This completes the proof.  Lemma 3.6  >M)<e  %  Let Rj = (rj'_i,r]'], Rj = (fj_i,fj],  An,j = \fj — Tj \ = Op(a„), j = 1, • • • , +  =  TQ =  - o o , rfo+j = 7^,0+1  =  00,  and  1, where {an} is a sequence of positive numbers.  Suppose that {(zt,Xtd)} is a strictly stationary and ergodic sequence and that the marginal cdf, Fd, of Xid satisfies the Lipschitz condition, \Fd{x') - Fd{x")\ < C\x' — x"\, for some constant C in a small neighborhood of xid = TJ for every j. If for some u > 1, E\zi\^  where 1/v = 1 — 1/u.  < 0 0 , then  Proof  It suffices to sliow that  \^i\\^(x,defli) - l(x.j6fl,)l = C>p((a„)i/'').  Since, for  every j = 1 , . . . , / ° ,  where for J = 1, the first term is defined as 0. Hence it suffices to show that for every i .  B y assumption, A „ j = Op(a„). So for all e > 0 there exists M > 0 such that P ( A „ j > a „ M ) < € for all n. Thus  1  "  E  l ^ ' | l ( k , . - r ° | < a „ M ) > «y'^M) + 6.  Hence it remains to show that ^ i / , ^ I]"=i kt|l(|x,j-T9|<a„M) is bounded in probabihty. 
However, in view of the Holder's inequality and the assumptions, the expected value of this last quantity is bounded above by ( £ ' | 2 i | " ) ^ / " a ô ' ' ^ ' ' ( C a „ i l / ) ^ / " for some constant C. This shows that 1  "  an n is bounded in  and hence in probabihty.  Proof of Theorem 3.3  %  Let /Sj" be the "least squares estimates" of  j = 1, • • •, /° -f-1, when  P and {T\I - • • IT^Q) are assumed known. Then by the law of large numbers, j = 1, • • •, /" -f 1. So it suffices to show that Pj ~  = Op(l) for each j.  — /3j = Op(l),  Set x ; = / « ( r P . i , r j ' ) X „ and Xj = / „ ( f , _ i , f , ) X „ . Then,  h -  ^;  - ( i x ; ' x ; ) - ] { i ( x j - x ; ) % + i x ; r „ } + [ ( i x ; ' x ; ) - ] [ i ( x , - x;)'y„]  =:(/){(//) + (///)}+ where (/) = [ ( ^ X j X , ) " - ( ^ X / X ; ) " ] , (//) = i ( X ; . - X ; ) % , ( / / / ) = i X ; y „ and {IV)  =  [ ( i X / X / ) - ] . B y the strong law of large numbers, both (III) and (IV) are Op(l). B y Theorem 3.2, f — r ° = Op(l). Proposition 3.2 implies that there exists a sequence {«n}, a„ —> 0 as n  oo such that f - r ° = O p ( a „ ) . Note that (//) = ^ Y,^^^ ^tyti'^ix.jeR,)  ~ h^t^eR,)) where  Rj = (•fj_i,fj], Rj = (rj'_i,rj']. Taking u > 1 and Zt = ai'xtyt for any real vector a, it follows from Lemma 3.6 that (//) = Op(l). If (J) = Op(l), then 'pj - P* = Op(l), j = 1, • • •,/° + 1. So, it remains only to show that (/) = Op(l). B y the strong law of large numbers, ^XJ'XJ show that ^X'jXj-i^XJ'X*  ^fxiXil^^^^g^^o^^^o])} > 0. If we can  = Op(l), then for sufficiently large n, ( i X j X y ) - i and ( ^ X / ' X * ) " !  exist with probability approaching 1. A n d , ( ^ X j X j ) ~ — ( ^ X j * ' X * ) ~ = Op(l). So, it suffices to show that ^ X j X j — ^Xj'XJ  = Op(l). Let a 7^ 0 be a constant vector and Zt — (a'xj)^.  Then a ' ( i X j X , - i X ; ' X ; ) a = 1 E L i a'x,x^a(l(^,^,^^.) - ! ( . . , , « , ) ) = \  ^ti^^.^eR,)  "  ^(xtjeRj))- Taking the sequence {un} in the last paragraph and u > 1, it follows from Lemma 3.6 that a ' ( i X j X , - - i X ; ' X ; ) a = Op(l) and hence i X j X , - - i X / ' X * = Op(l).  This completes the proof.  %  The proof of Theorem 3.4 depends on the following results.  Proposition 3.3 (Serfling, 1980, p32)  Let {y^t, 1 < t < Kn,n  = 1,2,...} be a double array  with independent random variables within rows. Suppose, for some v > 2,  Then n B-'[J2y-t-^-]^  N{Q,l),  where n^t = E{ynt), An = E<=i Mnt and Bl = Lemma 3.7  asn-^oo,  Var(ynt).  Let {kn} be a sequence of positive numbers such that kn ^ 0 and nkn —> oo.  Assumptions 3.0 and 3.3 imply that for any j = 1, - • • ,P, (i) ^X;(r« -  fc„,r°)X„(r°  - fc„,r°) ^ £ ( x i x l | a ; i , = r ° ) / , ( r ° ) ,  ^ X ; ( r ° , r j ' + fc„)X„(r°,r° + kn) ^ E{xix[\x,d  =  r^)féir°),  (ii) ^ 6 U r « - kn,r^)en{r^ - kn,T^) ^  a'Mr^),  ^ 4 ( r « , r ° + A : „ K ( r ° , r ° + kn) ^ cToV.(r°), (Hi) - kn,r^)Xn(r°  - kn,T^) ^  0,  - ^ < ( r ° , r » + kn)Xn{Tf,T^ + kn) ^ 0.  Proof  It suffices to show the second equation in each of (i), (ii) and (iii), the proofs of the  first deferring only i n a formahstic sense. (i)  Note that X'niTf,Tf  + /:„)X„(rj', r? + A;„) = E t l i X t x ; i ( , . , e ( , o , , o + , „ ] ) .  be a constant vector, r/„t = a'xtx;al(^,^e(^o_^o^jt„]),  Let a ^ 0  = E(ynt), and al = Var{ynt).  
If  X;[(a'xt)2|r9] > 0, then E[(a'yity\Tf]  > 0 and  =^{l(x..€{r°,rO + fc„])£^[(a'xi)2|xtd]} =E[iBi'xrf\xrd  = &n]fd{0n)kn  = i ; [ ( a ' x i ) ' | i i < i = r°]/d(r°)fc„ + o{kn), where dn €  {'''J^TJ  + A;„] and /d(-) is the marginal density function of Xtd- Similarly, al=Eyl-,^l  where rjn € i^j  = E[(ei%)'\Xtd = VnUMkn  - f^l  = E[(B.%y\Xtd  + 0{kn),  = T^]UT^)kn  + ^n] and for sufficiently large n,  E\yni-t^nr<2''-\E\ynir  > 0. B y Minkowski's inequality, for  + ti:)  = 2 ' ' - H ^ [ ( a ' x i f " I x i , = ^n]fdUr.)kn + ( i ; [ ( a ' x i = 2''-'E[iai'xxf'\x,d  = Tf]MT])kn  l^i, =  en]fd(On)knr}  + Oikn),  where Ç„ € i'^j^'^j + ^n]- So by setting An = nfin and  = ncr^, we have  i=l iE[{a'x,y\xu  = r°]/,(r9)A;„ + o(A;„))V2  -0, as n ^ oo since v > 2. Hence by Proposition 3.3, n  Bn'[J2ynt-An]^N{0,l), t=l  US U  OO.  Now, since Bllinknf  = Opiln^n)/ln'n 53  =  Op{ln-^n),  we obtain  1 = — V ynt  a'X;(xixJ|a;fd = T^)aifd{T^), as n ^ oo.  K i;[(a'xi)2|xid = rj»] = 0, it suffices to show that ; ^ a ' X ; ( r j > , T"? + A;„)X„(r?,  + fc„  converges to 0 i n i i . i ; ( ^ a ' X ; ( r ° , 7-° + K)Xn{Tl  + fc„)a)  1  =£[(a'xtfl(..,e(.o,.o+jt„])]/fc„ =^{l(r..e(rO,TO+fc„])£[(a'xt)'|xid]}/A:„ = i ; [ ( a ' x i f | a : i d = ^„]/d(^„) = £ [ ( a ' x a f | x i , = r ° ] / . ( r ° ) + o(l) =o(l), as n —>• oo, where 0„ € (u)  ('''JITJ  + 'i^n)- This completes the proof.  Similarly to (i), let y^t = ^t'^(x,de(r°,r°+k„]), fJ-n = E(ynt), and al = Var(ynt).  fin =^[f?l(x„e(T°,T°-l-fc„])]  = al[fd{T^)kn + o(kn)l =E{ylù  -  ni  = Eiet)P{xtdeiTf,Tf  +  = Ei4)UT^)kn  +  = Eiet)MT^)kn  + oikn).  kn])-fll  0ikn)-fll  Then  By Minkowski's inequality, for u > 2, ^iî/„i-/^nr<2'^-'(^iynir+//;:)  = 2 ' ' - i f ; ( e ^ ) / d ( r ? ) f c „ + o(fc„). So by setting A„ = n/z„ and  = na^, we have  è ^\ynt - M n l V ^ ; : =n-^''''-'^E\ynt - M n | 7 ( ^ | y n * - tin?)"" i=i _(./2_i) 1''-'[E{exrU{r^)K + o{kn)] <n {E{e,YU{r])kn^o{K)Yn ^0, as n —»^ oo. Hence by Proposition 3.3, n  By the fact that Bllinknf  = Op(/n2n)//n''n =  Op{ln-''n),  we obtain  (iii)  For any a 7^ 0, E{^e'n{rlr^j 1  + fc„)X„(r]', r]> + ^ „ ) a f  "  1 =  E  ^[^?(^'^0'l(x..6(rO,rO + fc„])]  = ^ a 2 ( i ; [ ( a ' x , ) 2 | x i , = r]>]/,(r°) + o(l)) ^  0  as n  oo.  f  The approach of the fohowing proof is to show that uniformly for all TJ such that \TJ — TJ\ > Op(ln^ n/n),  5 „ ( r i , • • •, r;o) > 5'„(r{*, • • •, rfo) for sufficiently large n. We shall achieve this by  showing  5n(r?_i + 6, TJ) + Snirj, rf^^ - S) - [5„(rj'_, + 6, r^) + 5 „ ( r ? , T°^, - S)] + Op{ln' n) > 0  for sufficiently large n.  Proof of Theorem 3.4  B y Theorem 3.1, the problem can be restricted to {/ = P}. Suppose  for some j, P ( x U / 9 , V i - P'j) ^ 0|xd = r?) > 0. Hence A = XJ[(xi(y3P+i - P']))'\xd = r?] > 0. Let  P(a,T})  be the minimizer of \\Ynia,T]) - Xn{(x,Tf)P\\'^. Set  kn  = A ' l n ^ n/n for n = 1,2,- • - ,  where K will be chosen later. The proofs of Lemma 3.6 and Theorem 3.3 show that if a „ Vi then / 3 ( a „ , 7 / „ )  «5 Vn  /3(rj'_i + <Ç,rj' + kn) small  S  0(a,T))  as n ^  + <5,rj') as n  oo.  Hence, for rj* + A;„ ^  oo. B y Assumption  Therefore P{TJ_I  + S,T^  +  kn)  Pj.  oo,  3.2, for any sufficiently  e (rj'_i,rj'), i ; { x i x i l{x,de(T9_^+s,T°])} is positive definite, hence  as n —» oo.  
rj* as n ^  P{Tf_-^ +  6,Tf)  So, there exists a sufficiently smah  <5 > 0 such that for all sufficiently large n , ||/?(r?_i + S,T^ + kn) - P°j\\ < \\P°j - P%i\\ and iP{Tf_,  +  ê,T^  +  kn)  - P^+i)'Eixix[\xu  = rf) {P{TU  + ^'^i +  kn)  - P]+i) > A / 2 with  probabihty approaching 1. Hence by Theorem 3.2, for any c > 0, there exists Ni such that for n > Ni, with probability larger than 1 - 6, we have Z=l,---,/°,  {\)\fi-Tf\<S,  (ii) WkrU  +  + ^n) -  < 2||^? - P'HA'  (in) (/3(r«_, + 6,T9 + kn) - P'^j^JE{xix\\xid  and  = r^){M-i  +  + ^")) " Z^^+i) > A / 2 .  Let A,- = {{n, - • • ,r,o) : \Ti - Tf\ < S, i = 1, - • •,P, \TJ -  >  J = 1, • • •,/«. Since for  the least squares estimates f i , • • •, f^o, 5 „ ( f i , • • •, f/o)  inf  <  5„(rf,  •••  ,T^O),  {5n(ri, • • •, r,o) - 5 „ ( r ° , • • •, rfo)} > 0  (TI,-,T,O)6>1,-  implies ( f i , • • •, fio) ^ A j , or, |fj — rj"] < fc„ = ii'ln'^ n / n when (i) holds. By (i), if we show that for each j, there exists N > Ni such that for all ra > TV, with probability larger than 1 — 2e, inf(Ti,...,T,o)eAj{'S'n(T-i,---,T/o) - 5„(ri°,---,r°o)} > 0, we wiU have proved the desired result. Furthermore, by symmetry, we can consider the case when TJ > TJ only. Hence Aj may be replaced by A'j = { ( r i , • • •, r,o) A'j, let  6 < • • • < 6/0+1  : \Ti-T^\  < S, i = 1,-• • ,1°,  be the set {n,r,o,  r», •  TJ-T^  • •, T-P.^,  > K}.  For any ( n , • • •, r(o) G  rj».! + S, r^+j  -S,T^^^,---,  after ordering its elements and let fo = - o o , ^2i°+2 — oo. Using Proposition 3.1 (ii) twice, we have E  Sn{^i-uii)  + 5„(r]'_i + <5,r°) + 5 „ ( r ] ' , r ] V i - ^)  =4c„ +  Op(ln2  =[Sn{rl  • • •, r ° ) + Op(ln2 n)] + Op{\n^ n)  n)  = 5 „ ( r ° , . . . , r ° o ) + Op(ln2 n). Thus, Sn{T\, •  • -jT/o) >5„(6, • ••,6/0 + 1) 2/°+2  = ^  5„(f,_l,6)  :=1  =  Sn{ii-,,ii)  + 5„(Tf_i + 8,rj) + 5„(r,-,r]Va - <5)  5„(6-i,6) + 5 n ( r ° _ i + ^ , r ° ) + 5'„(r°,r]Vi - 8) +[^n(r°_a + 8,Tj) + 5„(r,-,rO+, - <^)] - [Snir°_, =Sn{Tl...,T°) HSnirU  + +  + 8,T^) + 5 „ ( r ° , r ° , i - 8)]  Op{ln\) + Snirj,T^+r - é)] - [5„(r°_i + ,Ç,r°) + Snir^T^^,  - 8)],  }  where Op{ln'^n) is independent of ( r i , • • •, r;o) G Aj. It suffices to show that for 5 „ = {TJ : TJ G (TJ + kn, rj* + 6)} and sufficiently large n, inf  {5n(r?_i - ^, rj) + 5„(r,-, r?+i - ^) - [5„(r?_i + 6, r]) + Snir^ rj'+i - 6)]}  ^'^^^  (3.8)  with probability larger than 1 — 2e for some fixed M' > 0. Let n  5 „ ( a , r ? ; ^ ) = | | y „ ( a , 77) - X „ ( a , 7?)^||2 = E ^ ^ / * " Since 5„(Q;, 77) = 5'„(Q, 77; P(a, 77)), we have 5„(r?_i+^,r,) > 5 „ ( r 9 _ i + ^, r9 + kn) + 5 „ ( r ° + A;„, TJ) =Sn{rf_i + 6, rf;P{T^_,  + S,r° + kn)) + 5 „ ( r 9 , + Â;„;^(rf_j +6,T^  + kn))  + 5 „ ( r ] ' + Â:„,r,) >5'„(rj'_i + S,T^) + 5 „ ( r ° , r 9 + fc„;/3(r°_i + <Ç,r° + A:„)) + 5 „ ( r ] ' + A;„,r,). And since (r^ + kn,T^^i - ^] C ( T J , T J ^ . ! ] for sufficiently large n,  Snirf + kn,T^+i - ^;^°+x) = Ur]  + fcn,r°+i - è)ln{r] + fc„,r]Vi - <!?).  Applying Proposition 3.1 (i), we have 0 <Sn{T] + kn,T]^i - 60%,)  - [ 5 „ ( r ° + fc„, T,) + 5„(r,-,r°+i - .5)]  =Tn{r] + Ar„, r,) + r„(r,-, T]^, - S). By Lemma 3.2, the R H S is Op(ln^ n). Thus, Snir^rf^i-S) <Snirf,Tl,-6;Pl,) = 5 „ ( r ° , r ; + kn;P"j+i) + 5 „ ( r ° + kn,T^+r <SniT^,T]  60%,)  + kn, P%,) + 5 „ ( r ° + kn, Tj) + Sn{Tj, T^+j - S) + Op{\n' 7l),  (3.9)  where Op{ln^ n) is independent of TJ. Hence  (3.10) >Sn{rJ,T]^,  -S)-  5 „ ( r ? , r 9 + knJ'j+x) - 5 „ ( r ? + Ar„,T,) + O p ( W n).  Therefore, by (3.9) and (3.10) [ 5 „ ( r ? 
_ i + (5, TJ) + 5„(r,-, r^+i - 6)] - [5„(Tf_i + ^, rj>) + 5 „ ( r ] ' , rj'+i - 6)] >5n(r?,  + A:„; ^ ( r ? _ i + 6, r? + A;„)) - 5'„(r]', r? +  ^P^^) + ^^(In^ n).  Let M > 0 such that the term |Op(ln^ n)| < M l n ^ ra with probability larger than 1 - e for all n > Ni. To show (3.8), it suffices to show that for sufficiently large n,  Snirf,  + A:„;/3(r°_i + ê, r ° + K)) - 5 „ ( r j ' , rj» + K; P'j+,) - Mln'n  >  M'ln'n,  or  SniTf,T]  + kn; P{Tf_, + 6, r? + kn)) - 5 „ ( r j ' , rj> + kn, P°j+r) > (M' + M)ln'n  with large probabihty. Recall Sn{a,vJ) + kn)Pj+i + ^n(Tf,Tf  = \\Yn{a,rj) - Xn{a,T})P\\^ and Yn{Tf,Tf  + kn,  +  + ^n)) - ^ n ( r ° , T» + kn, ^ ° + i ) ]  = ; ^ [ r n ( r ° , r ° + kn) - X „ ( r « , r ° + kn)KrU - | | y „ ( r « , r « + kn) - Xn{rf,T^  +  +  + kn)P'j+,\\']  - | | c „ ( r ; , 7 - ° + A;„)||2]  + ^n)(^?+l -  =  J:'"^^^'  + kn) =  + kn). Taking K sufficiently large and applying (ii), (iii) and  Lemma 3.7 (i), (iii), we can see that there exists N > Ni such that for any n > N, -L-lSnir^T^  (3.11)  +  > A / 4 - A / 8 > ( M ' +Af)//!:  + S,T^ +  '  + ^n)0°+l  -  kn))r  +  ^ i " + kn))  with probabihty larger than 1 — 2e. Since /:„ = Klv?n/n,  Proof of Theorem 3.5 n S"=i  B y Lemma 3.4 (n),  -  the above imphes (3.11).  ^  J2t=i A = Op{ln^ n/n). So,  and  share the same asymptotic distribution. Applying the central hmit theorem to {e^},  we conclude that the asymptotic distribution of Let {Pi, - • • iPfo^i)  Z)"=i is normal.  be the "least squares estimates" of (Pi, • • •,P%^i)  when P and r ? ,  ( i = 1, • • •, P), are assumed known. Then it is clear that ^/n[{P*', • • •,P*o+i)'-{Pi', converges i n distribution to a normal distribution.  ••,^p+i')']  So it suffices to show that Pj — Pj =  Ovin-'I'). Set X ; = / „ ( r j ' _ i , r P ) X „ and Xj = J „ ( f , _ i , f , ) X „ . Then, h - ^; - ( ^ x ; ' x ; ) - ] [ i x j y „ ] + [(ix;'A7)-][i(x, - x;)'y„] =[(ix;.x,)- - ( i x ; ' x ; ) - ] { i ( x ; . - x;)'y„ + i x ; y „ } + [ ( i x ; ' x ; ) - ] [ i ( x , - x;)'y„] =:(/){(//)+ (J7/)} + (n/)(//). where (/) = [ ( ^ X j X , ) " - ( ^ X / ' X / ) " ] , ( / / ) = i ( X j - X ; ) ' y „ , ( / / / ) = i x ; y „ and ( I F ) = [ ( i X ; ' x ; ) - ] . As in the proof of Theorem 3.3, both (III) and (IV) are 0^(1). B y Theorem 3.4, f - r ° = Op{ln^n/n).  The order of Op(n"^''^) of (I) and (II) follows from Lemma 3.6 by taking  a„ = In^n/n, zt = (a'xj)'^ and zt — a'xf^j respectively, for any real vector a and u > 2. This completes the proof.  ^  3.2 Consistency of the estimated segmentation variable  Since d is assumed unknown in this section, we wiU use the notation such as 5„(yi), Tn{A) introduced i n Section 2.2. The two theorems in this section show that the two methods of estimating d9 given in Section 2.2 produce consistent estimates, respectively.  T h e o r e m 3 . 6 If dP is asymptotically identifiable w.r.t. L, then under the conditions of Theorem 3.1, d given in Method 1 satisfies P{d = dP) — - 1 as n ^  ex.  T h e o r e m 3 . 7 Assume {xj} are iid random vectors. If Zi — ( x n , . . . , X i p ) ' is a continuous random vector and the support of its distribution is ( a i , 6 i ) X . . . X (ap,bp), where —oo < ai < bi < oc, i = I,...  ,p, and for any a G R P , E[{z[zi)^] < oo, for some u > I, then d given by  Method 2 satisfies P{d = dP) —r 1 as n  oo.  To prove Theorem 3.6, some results similar to those presented in the last section are needed.  Lemmas 3.2'-3.3' and Proposition 3.1' below are generahzations of Lemmas 3.2-3.3  and Proposition 3.1 respectively. 
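For orientation before the supporting lemmas, here is a minimal sketch of Method 2 as we read it from the proof of Theorem 3.7 and the remark following it: the range of each candidate x_d is split into 2(L+1) cells at empirical percentiles, a linear model is fitted in each cell, and d is scored by the sum of the L+1 smallest cell sums of squares, since at least L+1 of the cells must lie inside a single true regime when d = d^0. The names and the exact tie-handling below are ours, not the thesis'.

```python
import numpy as np

def ls_sse(y, X):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def method2(y, X, Z, L=2):
    """Score each candidate d by the sum of the L+1 smallest SSEs among
    2(L+1) percentile cells of x_td, and choose the smallest score."""
    k = 2 * (L + 1)
    score = {}
    for d in range(Z.shape[1]):
        xd = Z[:, d]
        edges = np.quantile(xd, np.arange(1, k) / k)   # interior cut points
        cell = np.digitize(xd, edges)                  # labels 0, ..., k-1
        sses = sorted(ls_sse(y[cell == j], X[cell == j]) for j in range(k))
        score[d] = sum(sses[:L + 1])
    return min(score, key=score.get) + 1, score        # 1-based d-hat
```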
Lemma 3.2'  Assume for the segmented linear regression model (3.1) that Assumption  satisfied. For any d ^ do and j ^ 1, • • • , / ° -|- 1, let R'j(a, 77) = { x i : a < xid < v}<^R°j,  3.0 is <  a < 7/ < 00. Then  P{svipT4R'jia,rj)) a<Ti  > ^In'n}  ^0,  as  n 0 ,  J-Q  where Po is the true order of the model and To is the constant associated with the local exponential boundedness condition for the {et}. Proof  Conditioning on X „ , we have for any j and d ^ do that Q„3  P{supr„(i2,^(a,77))>^ln2 7 i | X j a<TI J-0 = P { max ê'nHn{R%Xsd,Xtd))ên <  J2  P{'<Hn{R^ix,d,Xtd))ên>^ln'n\Xn}. ^0  x,d<x,d Since IIn{Rj{x3d,Xtd))  > M l n ' n | X„}  is nonnegative definite and idempotent, it can be decomposed as  Hn{RJ{x,d,Xtd)) 61  = W'AW,  where W is orthogonal and A = diag{l, - •• ,1,0, - •• ,0) with p := rank{Hn{RJ{xsd,Xtd)))  =  Tank{K) < po. Set Q = (/p,0)W. Then Q has fuh row rank p. Let Ç ' = ( q i , - - - , q p ) and C/, = q 5 ê „ , / =  Then p (=1  Since p < po and 7~r  -'o  ^  9pg  /=1  <i'{f^f > ^iri^ri  for  some l\Xn}  Pq  it suffices to show, for any /, that  E  Pi^^  > M  ^ I Xn} ^ 0 ,  Noting that p = trace{H^{RJ{x,d,xtd)))  = E L i II  asn^O.  \?^ we have || q, f=  q^q, < p < po,  / = 1,... ,p. B y Lemma 3.1, with <o = î o / p o we have E  ^{|C^/I > 3poInn/To I X „ } <  E  2exp(—^ • ^Inn)exp(co(ro/po)'po)  < n(n - l)/n^exv{coT^/po) as n ^  ^ 0,  oo, where CQ is the constant specified in Lemma 3.1. Finally, by appealing to the  dominated convergence theorem we obtain the desired result without conditioning. P r o p o s i t i o n 3.1' Consider the segmented regression model 3.1. (i) For any subset B of the domain of X\ and any j,  SniB n R^j) = -e'niB n R''j)ên{B D E " ) - T „ ( 5 n iZ^).  %  (ii) Let  be a partition of the domain o / x i , where m is a finite positive integer. Then, m+1  m+1  i=i  i=i  /or a / / F u r t h e r , if Bi = { x i : r j _ i < x i ^ < r,} for d ^ do then Assumption 3.0 implies m+1  Sn{Bi n R]) = ê'n{R]yn{R]) + Op(ln2 n) i=l  uniformly for all T\, - • • ,Tjn such that —oo = TQ < r i • • • < r^^ < r ^ + i = oo. Proof: (i) Denote A =  Bf\R].  Sn{A) =y,:(/n(A) -  Hn{A))Yn  = (X„(A)/3° + èn{A))'{UA) =P'j'X'^{A)Xn{A)P]  - Hn{,A)){XMW'j  +  UA))  + 2ê'n{A)Xn{A)P] + 4(A)è„(/l)  - [^°X(^)^n(^)X„(A)^° +  24(A)^„(A)X„(A)/3« + ê'„(A)JÏ„(A)è„(A)].  Since X ; ( A ) ^ „ ( A ) X „ ( A ) = X ; ( A ) X „ ( A ) and ^ „ ( A ) is idempotent, we have  [Xn{A) - i r „ ( A ) X „ ( A ) ] ' [ X „ ( A ) - ^ „ ( A ) X „ ( A ) ] = 0  and hence 5 ' „ ( A ) X „ ( A ) = X „ ( A ) . Thus,  5„(A) =  4(^)fn(^) - ê„(A)^„(A)ê„(A) =  è'n{A)én{A)  (ii) B y (i), m+1  Y,Sn{B,f\R]) i=l m+1  = Y KiBi t=i  n R])UBi  n i2°) - r„(5.- n R])]  m+1  =ê'„(i2?)è„(ii:°) - E  .=1  ^-(^<^ ^ i ) -  - T„(A).  1£ Bi = { x i : r i _ i < xid < Ti}, denote Bi n R° by RJ{Ti_i,Ti) plies Y.TJ'i TniBi n PQ) = ZtV  Tn{RJ(Ti-i, Ti)) < (m + 1) s u p , < , T„(i2^^(a, T?)) = Op{ln' n)  uniformly for all —oo < r i < • • • <  L e m m a 3.3'  for all i. Lemma 3.2' im-  < oo. %  Let A be a subset of the domain o / x i .  / / both £'[xixil(xieAnHO)]  X^[xiXil(xjeyiniî<'^j)] û'^c positive definite. Then under Assumption [Sn{A) - Sn{A n R°,) - Sn{A n i?°+i)]/n  3.0,  ^  for some Cr > Q as n ^ oo, r = 1, • • •, / ° .  Proof  It suffices to prove the result when /° = 1. For notational simplicity, we omit the  subscripts and superscripts 0 in this proof. Let  = X „ ( A n Rj), êj = ê„(A fi Rj), j — 1,2,  X * = X i * + Xj*, €* = €l + ë | and 'p = ( X * ' X * ) - X * ' y „ . 
As in ordinary regression, we have Sn(A)  =\\x;:0i-'p) + x;02-h  + n\'  -  - h\'  =\\x;0i  + \\x;02  + Wn? + 2 € * ' X ; ( À  -  ^ )  + 26--'x,*(^2 -  h  It then follows from the strong law of large numbers for stationary ergodic stochastic processes that as n —> oo, ^ ^ v ^ * = ^èx,x;i(x.eA) ^ ix;'x; ^  £ { x i x ; i ( x , e ^ ) } > 0,  £ { x i x l l ( x , e x n R , ) } > 0, ; = 1,2,  and ix*V„ ^  i;{î/iXal(xieA)}-  Therefore, ^ ^  {^{xixil(x,6^)}}-^£{î/ixil(x,6^)} 64  Similarly, it can be shown that  Tt  for J = 1, 2, and n Thus as n —>• oo, ^ 5 „ ( A ) has a finite limit, this limit being given by lim  -Sn{A)  n-K» n = (  - ^ * ) ' £ ( x i x i l ( x , e ^ n R , ) ) • (^1 - n  + 02 - ^ • ) ' i ; ( x i x ' i l ( x , e ^ n R , ) ) • 02  -  + a ^ P j x i e A}. It remains to show that ^5„(>1 n Rj) converges to a-P{xi least one of 0i - P*)'E(xxx[l^^^^^nR,))0i  £ A (1 Rj}, j = 1,2, and at  - P") and 02 - $')'Eix^^[li^,eAnR,))02  - P')  is positive. The latter is a direct consequence of the assumed conditions while the former can be shown again by the strong law of large numbers. By Proposition 3.1' (i),  Sn(A nRr) = ê'^iA n Ri)êniA n Ri) - T„(A n  =  - Tn{A n R^).  The strong law of large numbers implies  -êîêi ^ Tt  E[ell^^^eAnR,)] = (T^P{^I  -fi'Xj Tt  e AO R^),  ^ [ f i X i l ( x , e ^ n i î i ) ] = 0,  as n ^ oo and W = lim„_^co ^ - ' ^ i ' - ^ i * is positive definite. Therefore,  Ri) = i-ê[x;)i-x^'xn-i-x:'€,)  -TniAn n  n  and hence ^ 5 „ ( A n i^i) that ^SniA n R2) ^  (T^P{XI  CT^Pfxi  n 6 AD  ^ ow-'o  =o  n  Ri}. The same argument  e An R2}. This completes the proof  can also be used to show ^  P r o o f o f T h e o r e m 3.6 For d = (f,hy  Lemma 3.4 (ii),  n  Thus, it suffices to show for d ^ dP, that ^S^ > <^o+C for some constant C > 0 with probabihty approaching 1. Again, /° = 1 is assumed for simplicity, li d ^ d^,hy and Theorem 2.1, for any {Rj]'fil,  the identifiability of d'^  there exist r, 5 e {1, • • •, X + 1} such that  A f = { x i : Xid e [as,b,]} is defined in Theorem 2.1. Let  D  where  = { ( r i , . . . , r i ) : Rf D A'^ for some  r } . Then for any ( r i , . . . , TL), (TI, • • •, TL) G Bs for at least one s 6 {1, • • •, X + 1}. Since d is chosen such that  <  for all d, it suffices to show that for d ^ d° and each s, there exists  Cs > 0 such that  inf  i5^(n,...,rz,)>a2 + C.  (3.12)  (TI,...,TI,)6B, n  with probabihty approaching 1 as n ^  oo. For any  {TI,...,TL)  € P ^ , let R'1^2 = {x :  G  ( r r _ i , a s ) } , i î ^ ^ 3 = {x : Xd e (&i,r,.]}. Then J?^ = A'^^ U R'[_^_ol> Ri+s- Note that the total sum of squared errors decreases as the partition becomes finer. B y Proposition 3.1' and the strong  law of large numbers, n  j=i >-[  Y  Sn{R'^) +  >-{  E  [SniR'^nR'i)  T"'^ + -[SniAi) n = -{è'^{Rl)UR°i) n  SMi)]  + Sn{R'jnRl)]  + [SniAinR'i)  +  Sn{AinR'',)]} (3.13)  - 5 „ ( A f n R°) - SniAi n R°)] + ^RDURD  = i{è'„è„ + Op(ln^ n)} + ^[SniAi)  + Op{\n' n)]  - 5„(A^ n ii!?) - 5„(Af n iî?)]  =al + Op(l) + - [ 5 „ ( A f ) - SniAi n iE°) - SniAi n i2«)]. Now it remains to show that i [ 5 „ ( A ^ ) - 5 „ ( A f n A ? ) - 5 „ ( A f fli?^)] >  for some Cs > 0,  with probability approaching 1. B y Theorem 2.1, £^[xiXil(xie^,nRO)]j * — 1)2, are positive definite. Applying Lemma 3.3' we obtain the desired result.  ^  To prove Theorem 3.7, we first define the Â;th percentile of a distribution function F as Pk := inft{/ : Fit) > k/100}. 
Let  and  be the j * 100/(2X + 2)th percentile of F'^ and F^  respectively, where F*^ is the distribution function of  and Fn is the empirical distribution  function of {xtd}, i = 1,..., 2X + 2. If x^d has positive density function over a neighborhood of Pj for each j, then by Theorem 2.3.1 of Serfling (1980, p75),  converges to pj almost surely  for any j. Now, we are ready to introduce three lemmas required by the proof of Theorem 3.7. In these three lemmas, we shall omit "d" in  Lemma 3.8 Suppose izt,Xtd)  and  for notational simpficity.  is a strictly stationary process and the marginal cdf of xtd has  bounded derivative at pj for all j . If rj - pj = Op(l), j = 1, • • •, 2X + 2, and for some u > 1 jEl^tl" < oo, then 1 " ~E^*(^(^"^e(ry_i,r,)) " l(x,<ie(py_ i ,Py )) ) = "  Op(l).  t=l  P r o o f B y the assumption, the marginal cdf, Fd, of xid satisfies Lipschitz condition in a small neighborhood of x-^d — Pj for every j. By Proposition 3.2, TJ — pj — Op(l) implies that there exists a positive sequence {an} such that a„ ^ 0 as Lemma 3.6 in with  ^ oo and rj — pj = (9p(a„). Applying  and fj replaced by pj and TJ respectively, we obtain the desired result.  IT  For any j G {1, • • - , 2Z + 2}, let Rj = { x i :  < x^d < Pj} and Rj = {xj : rj^i  < xid <  rj}. Also let  x:^ = Xn{RjnR°), X* = Xn{Rj),  f; = èn(i2,), and X*r = Xn{Rj n i2j ), X* =  Xn{Rj),  K =  ëniRj),  where i = 1,2. Under the conditions of Theorem 3.7, the support of the distribution of z i is ( a i , 6 i ) X . . . X (ap,bp). Hence, for d ^ dP, E[xix[l(^^^çfi.CRO)] Lemma 3.9 Under the conditions of Theorem 3.7, (i) i X . - X . - . = (ii) liK'K (iii) \x:;ë;  -  ^X:;x;^ +  is of full rank, i = 1,2.  Op(l), i = l , 2;  = Op{l); and = O p ( n - i / 2 ) , i x . * . ' ? ; = Op(i), i =  1,2.  P r o o f : W i t h loss of generality, we can assume P{Rj f] R'-) > 0, i = 1, 2. (i) For any a 7^ 0,  1  1  1 "  Taking Zt — (a'xt)^!^^,^/??) and applying Lemma 3.8, we have  \x*:xi = ^x:;x:^ + o^ii), Tl  i = i , 2.  Tl  (ii) Take Zt = ejl(x,g/î9). Lemma 3.8 implies the desired result. (in)  Take zt = a'x^Ci for any a. Lemma 3.8 imphes ^[X^^'e* - X*,'e*] = Op(l). So, it suffices = Op(7i-i/2). For any a 7^ 0,  to show that ^X*/e*  1  1 " t=i  where {a.'x.t£tl(x,eR°nRj)}  is a martingale difference sequence. B y the central hmit theorem for  a martingale difference sequence (Bilhngsley, 1968), a'(^X^/e*)  L e m m a 3.10 Let n{A) —  l(x,e>i)  = Op{n-'^/'^).  ^^"2/ set A in the domain of x.\.  conditions of Theorem 3.7, for j = 1, • • •, 2Z + 2, (0  HRJ)  = HRJ)  (ii) 'Pr = K +  + Op{l) = 2rF2 + = ^P + Op(l), where K  'pp  Op(l),  =  =  (x;'x;)-x;'y„,  ix;'x;)-x;'Yn,  h = {^[xixil(x,eiîy)]}~^i:[î/iXil(xj6R.)]. (Hi) \[Sn{Ri) (iv)  - Sn{Rj)] = Op(l) and  SniRj)/n(R,)  - Sn{Rj)ln{R,)  = Op(l).  t  Then under the  P r o o f W i t h loss of generality, we can assume P{Rj f] (i) N o t e t h a t i n ( P , ) - i 7 z ( i 2 , )  ) > 0, i — 1,2.  =  3.8 with Zt = 1, we get ^n(Rj)  By applying Lemma = ^n(Rj) + Op(l). B y the strong law of large numbers for  ergodic processes,  ^n{Rj) = i E  M^,eR,) = ElMx.eR,)]  + Op(l) = P ( x , € Rj) + Op(l) =  (u) B y the strong law of large numbers for ergodic sequence, ^X*'X* ^ and ^X*'Yn ^  £'[X'IJ/I1(X,6RJ)].  Hence,  + Op(l).  -Efxix'j l(x,6Hj)] > 0  — /3p as u -> oo.  Since x ; ' F „ = xrp'xrpA° +  x;;x;^p', +  x;'ê;  and X*'Yn = Xi^' XirPi  + X^r' X2rP2 +  X^/êl,  Lemma 3.9 (i) and (iii) imply  ( ^ x . ; ' x . * . ) - - ( i x . v x . " ; ) - = op(i), Tl  Tt  i = 1,2 and -x:'Yn - - x ; ' y „  =èxi'x:,. 
Tt  ixr;xrp)/3? + Tt  =Op(l). This implies ^X;'Yn  K-K  Tt  71  +  hx;',; -  x;'e;)  Tt  = Op(l) since ^ X ; ' y „ = Op(l). Thus,  = {x:'x:rx:'Yn  =[(ix;'x;)71  C-x;;x;^ - lx;;x;Xpl  - (x;'x;)-x;'y„  - (ix;'x;)-]ix;'r„ + (ix;'x;)-[ix;'r„ - ix;'y„] 7i Ti 7Z 71 71  =Op(l)Op(l) + Op(l)op(l) = Op(l).  Tl  Tl  =hxxAl  - P\)+xiXPr  Tl  - P\) +  =(|,-^?)'(ix,Vxrj(|,-^?) +(^,-/3°)'(ix;/x;,)(,i-^2°) + i e ; ' e ; + \e*'\xiXPr Tl  - Pi) + xuhr  -  m-  Tl  B y (ii) and Lemma 3.9 (iii), 'Pr = Pp + Op{l) and ^e'^'Xf^ = Op(l), i = 1,2. Thus,  ={h - ^m^xi'xiM,  - /3?) + (P, - »°.)'èx;;x;M,  Tl  - 0°) + U',; + 0,(1).  Tl  Tl  Similarly,  =W, - / 3 f ) ' ( i x , - / x , ; ) ( f t - ^ « ) + 0,  - p°,y{^x;;x;,)0,  ft  - (fi,) +  Tl  ^.-j,;  +  0,(1).  ll  Hence, by Lemma 3.9 (i) and (ii), ^SniRj)  -  ^SniR,)  Th  TL  =CPP - m l x ' j x ; ,  -  Tl  HPp -  \XI'X',XPP  -  pi)  Tl  - lx;;x;^m  p'2)'[^x;/xi Tl  - P') +  Tl  - ^e;'e;] + Tl  0^(1)  Tl  = Op(l). (iv) B y (i) and (iii), n(Rj) n  n{Rj) n{Rj)  n  n{R,)  Lemma 3.10 sets down the fundation for Theorem 3.7 and will be used repetedly in its proof.  P r o o f o f T h e o r e m 3.7  Let d ^ dP. Suppose a hnear model is fitted on _ff^ = { x i : xu, €  with the mean squared error à'j{d) = Sn{RJ)/n{R'j). Lemma 3.3'and Lemma 3.10 (i) imply -;^Sn{RJ)-  Under the assumed conditions,  ^^^[Sn{RJr\RVl  + Sn{RJ^Rl)]  ^  Cj  for some Cj > 0. Proposition 3.1' (i) and Lemma 3.2' imply the second term on the L H S , 1 —[SniRjnR°,)  +  = ; ^ E ' n ( R l ^R°yn(Rj = ^/niRl)èniR^)  +  SniR^nR'2)]  n R'i) + Op{ln' n)] Op(ln'n/n),  which converges to (TQ by the strong law of large numbers. Thus, P(àj{d) as n  oo. Since this holds for every  >  (^0+Cfc/2)1(H^^.^^A^)+Op(l)  E  >al  by Lemma 3.10 (iv)  + C +  Op{l)  for some C > 0. By Lemma (3.10) (i)  n 2(^+1)  rtd  2(L+1)  ^  > (TQ + Cj/2)  1  Thus, 1 -  1 ^"^^  1 2 = 2^o + y + If  «p(l)-  = <f°, there are at least Z + 1 E^'s, say,  , i = 1, • • •, Z + 1, which are entirely  embedded in one of the P^'s. By Proposition 3.1 and Lemma 3.2,  1  ^  ^^[4(4.)fn(4)-r„(4)] [ - 6 ' „ ( 4 ) ê „ ( E , ^ . ) + OpOn^ n/n)],  i=l,...,i+l.  By Lemma 3.10 (i) and the strong law of large numbers, the RHS is al + Op(ln" n / n ) . and Lemma 3.10 (i), (iv) imply.  1  1  ^+1  " i=i  L+1  ,  L+1  = E ( ^ ( 7 q : T j + «p(i)K^o + ''p(i)) = ^^0+Op(l)-  So, with probabihty approaching 1, 5^° <  ioi d ^ (f.  ^  This  Remark  The number 2{L + 1) in Theorem 3.7 is not necessary. Actually, all we need is a  number larger than ( i + l ) . S o X - h 2 will do. A n d with probabihty approaching 1, 5„(Ê^°^), the smallest of the {Sn{Rf)}  will be one of those obtained from the data entirely contained  in one regime. Hence, if we let  = SniRf-j^^), with probability approaching 1,  di^dP. However, by changing Z, + 2 and Sn{Rfiy) to 2(X + 1) and we expect that the chance of  <  <  for  SniRf^j) respectively,  for any d ^ dP will be reduced for small sample size. In  fact, this was shown by a simulation study we performed but have not included in this thesis for the sake of brevity. The rate of correct identification is significantly higher when ^f^^ ^niR^j-^) is used. If the number of regimes is chosen to be too large, then the number of observations in each regime will be small and the variance of 5^ will increase. Hence, it will undermine our selection of d. Through our simulation, we found that 2(X + 1) is a reasonable choice. In addition, with small sample size, one of R^^-^ n R'- (z = 1,2) may have very few observations for some d ^ cJ". 
In such a case S_n(R̂_{(1)}^d) is likely to be smaller than S_n(R̂_{(1)}^{d^0}) by chance. Using Σ_{j=1}^{L+1} S_n(R̂_{(j)}^d) may average out this effect.

3.3  A simulation study

In this section, simulations of model (3.1) are carried out to examine the performance of the proposed procedure under various conditions. Constrained by our computing power, we study only moderate sample sizes under the segmented regression setup with two to three dependence structures, that is, l^0 = 1 and 2, respectively. Let {e_t} be iid with mean 0 and variance σ_0², and z_t = (x_{t1}, ..., x_{tp})' so that x_t' = (1, z_t'), where the {x_{tj}} are iid N(0, 4). Let DE(0, λ) denote the double exponential distribution with mean 0 and variance 2λ². For d = 1 and τ_1^0 = 1, the following 5 sets of specifications of the model are used, for reasons given below:

(a) p = 2, β_1 = (0, 1, 1)', β_2 = (1.5, 0, 1)', e_t ~ N(0, 1);

(b) p = 2, β_1 = (0, 1, 1)', β_2 = (1.5, 0, 1)', e_t ~ DE(0, 1/√2);

(c) p = 2, β_1 = (0, 1, 0)', β_2 = (1, 1, 0.5)', e_t ~ DE(0, 1/√2);

(d) p = 3, β_1 = (0, 1, 0, 1)', β_2 = (1, 0, 0.5, 1)', e_t ~ DE(0, 1/√2);

(e) p = 3, β_1 = (0, 1, 1, 1)', β_2 = (1, 0, 1, 1)', e_t ~ DE(0, 1/√2).
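As an illustration of these specifications, the following sketch (ours; the generator name and seed handling are not from the thesis) draws one sample from Model (3.1) with d = 1 and τ_1^0 = 1, covering specifications (a)-(e).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(beta1, beta2, noise, n, p):
    """One sample from model (3.1) with d = 1, tau_1 = 1: x_tj iid N(0,4),
    x_t = (1, z_t')', and e_t either N(0,1) or DE(0, 1/sqrt(2)) (variance 1)."""
    Z = rng.normal(0.0, 2.0, size=(n, p))            # sd 2 gives variance 4
    X = np.column_stack([np.ones(n), Z])
    if noise == "normal":
        e = rng.normal(0.0, 1.0, size=n)
    else:                                            # numpy's Laplace is DE
        e = rng.laplace(0.0, 1.0 / np.sqrt(2), size=n)
    left = X[:, 1] <= 1.0                            # segment on x_t1 at 1
    y = np.where(left, X @ np.asarray(beta1), X @ np.asarray(beta2)) + e
    return y, X

# Specification (a): discontinuous at tau = 1 with a jump of size 0.5.
y, X = simulate([0, 1, 1], [1.5, 0, 1], "normal", n=200, p=2)
```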
(ii)  For smaller sample sizes, the bias of f i is related to the shape of the underlying model.  It is seen that the biases are positive for Models (a) and (b), and negative for the others. In an experiment where Models (a) and (b) are changed so that the jump size at Xi = TI is -0.5, instead of 0.5, negative biases are observed for every sample size. These biases decrease as the sample size becomes larger. (iii)  The standard error of f i is relatively large in all the cases considered. A n d , as expected,  the standard error decreases as the sample size increases. This suggests that a large sample size is needed for a reliable estimate of r f . A n experiment with sample size of 400 for a model similar to Model (e) is reported in Section 4.3. In that experiment the standard error of f i is significantly reduced. (iv)  The choice oi 6o = 0.1 seems adequate for most of the models we experimented with since  it does not generate a pattern, like always overestimating / for n = 30 and underestimating / for n = 50, or vice-versa.  By the continuity of Model (e), its identification is expected to be the most difficult of all the cases considered. The CQ chosen above seems too big for this case, since the tendency toward underestimating / is obvious when the sample size is small. However, a more plausible explanation for this is that with the small sample size and the noise level, there is simply not enough information to reveal the underlying model. Therefore, choosing a lower dimensional model with positive probability may be appropriate by the principle of parsimony.  In summary, since the optimal selection of the penalty is model dependent for samples of moderate size, no optimal pair of (co,^o) can be recommended. On the other hand, our choice of ^0 and Co shows a reasonable performance for the models we experimented with.  Table 3.2 shows the estimated values of the other parameters for the models in Table 3.1 for a sample size of 200. The results indicate that, in general, the estimated /3j's and CTQ are quite close to their true values even when f i is inaccurate. So, for the purpose of estimating /3j's and al, and interpolation when the model is continuous, a moderate sized sample say of size 200 may be sufficient. When the model is discontinuous, interpolation near the threshold may not be accurate due to the inaccurate f i . A careful comparison of the estimates obtained from Models (a) and (b) shows that the estimation errors are generally smaller with normally distributed errors. The estimates of  have relatively larger standard errors. This is due to  the fact that a small error in P21 would result in a relatively large error in $ 2 0 -  To assess the performance of the MIC when 1° = 2, and to compare it with the Schwarz Criterion (SC) as well as a criterion proposed by Yao (1989), simulations were done for a much simpler model with sample sizes up to n = 450. Here we adopt Yao's (1989) setup where an univariate piecewise constant model is to be estimated. Note that such a model is a special  case of Model (3.1). Specifically, Yao's model is  where Xt is set to be t/n for i = 1, • • •, n, e< is iid with mean zero and finite 2mth moment for some positive integer m. Yao shows that with m > 3, the minimizer of logâf -f- / • C „ / n is a consistent estimate of 1° for / < L, the known upper bound of satisfying Cnn"^/"*  where {C„} is any sequence  oo and C „ / n —>• 0 as n —* oo. 
Four sets of specifications of this model  are experimented with: (f) r ° = 1/3,  = 2/3, /3?o - 0, 0% = 2, P% = 4, e, ~ DEiO, 1/^2);  (g) r f = 1/3, T° = 2/3, P% = 0, P% = 2, P% = 4,  -  tj/VU;  (h) r " = 1/3, rO = 2/3, 0% = 0, /3?o = 1, P'zo = - 1 , Q ~ ^'^^(0,1/V2); and (i)  = 1/3,  where  = 2/3, 0% = 0, y3°o = 1, P'so = - 1 ,  ~  tr/VU,  refers to the Student-t distribution with degree of freedom of 7.  In each of these cases the variances of ej are scaled to 1 so the noise levels are comparable. Note that for ej ~ tj/y/ÏÂ,  ^^(ef) < oo and Ele]] = oo. It barely satisfies Yao's (1989) condition  with m = 3 and does not satisfy our exponential boundedness condition. In Yao's (1989) paper, {Cn} is not specified, so we have to choose a {Cn} satisfying the conditions. The simplest {C„} is c i n " . W i t h m = 3, we have n"~'^l'^  oo implying a > 2/2. (We shall call the criterion with  such a Cn, Y C , hereafter.) To reduce the potential risk of underestimating / ° , we round 2/3 up to 0.7 as our choice of a. The  and CQ in MIC are chosen as 0.1 and 0.299 respectively, for  the reasons previously mentioned. Ci is chosen by the same method as we used to choose CQ, that is, forcing log no = cing" and solving for c j . W i t h no = 20 and a = 0.7, we get ci = 0.368. The results for model selection are reported in Tables 3.3-3.4. Table 3.3 tabulates the empirical distributions of the estimated  for different sample sizes. From the table, it is seen  that for most cases, MIC  and YC perform significantly better than SC. A n d with sample size  of 450, MIC and YC correctly identify /" in more then 90% of the cases. For Models (f ) and (g), which are more easily identified, YC makes more correct identifications than MIC. for Models (h) and (i), which are harder to identify, MIC  But  makes more correct identifications.  From Theorem 3.1 and the remark after its proof, it is known that both MIC  and YC are  consistent for the models with double exponential noise. This theory seems to be confirmed by our simulation.  The effect on model selection of varying the noise distribution does not seem significant. This may be due to the scaling of the noises by their variances, since variance is more sensitive to tail probabilities compared to quantiles or mean absolute deviation. Because most people are familiar with the use of variance as an index of dispersion, we adopt it, although other measures may reveal the tail effect on model identification better for our moderate sample sizes. Table 3.4 shows the estimated thresholds and their standard deviations for Models (f), (g), (h), (i), conditional on I = l'^. Overall, they are quite accurate, even when the sample size is 50. For Models (h) and (i), the accuracy of  is much better than that of f i , since T2 is much easier to  identify by the model specification. In general, for models which are more difficult to identify, a larger sample size is needed to achieve the same accuracy.  Finally, the small sample performance of the two methods given in Section 2.2 for the identification of the segmentation variable is examined. Models (b), (d) and (e).  The experiment is carried out for  Among Models (a)-(e). Models (b) and (e) seem to be the most  difficult in terms of identifying / ° , and are also expected to be difficult for identifying d. Note that for all the models considered, d is asymptotically identifiable w.r.t. any X > 1 by Corollary 2.2. For X = 2, 100 replications are simulated with sample sizes of 50, 100 and 200. 
W i t h sample  sizes of 100 and 200, both methods identify 1° correctly in every case. W i t h sample size of 50, the correct identification rate of Method 1 is 100% for Models (b), (d), and 96% for Model (e); for Method 2 the rates are 98, 94 and 88 for Models (b), (d) and (e), respectively. From these results, we observe that for sample sizes of 100 or more, the two methods perform very well. And for a sample size of 50, Method 1 performs better than Method 2. This suggests that if the sample size is small. Method 1 may be more reliable. Otherwise, Method 2 gives a good estimate with a high computational efficiency.  3.4 General remarks  In this chapter, we proved the consistency of the estimators given in Chapter 2. In addition, when the model is discontinuous at the thresholds, we proved that the estimated thresholds converge rapidly to their true values at the rate of In^ n/n. Consequently, the estimated regression coefficients and the estimated variance of the noise are shown to have the same asymptotic distributions as in the case where the thresholds are known, under the specified conditions. We put emphasis on the case where the model is discontinuous for the following two reasons: First, if the model is continuous at the thresholds, then we have for any z € R P and x ' = (1, z'), x'^^o = x'P%, i f X , = rj», J = 1,..., /O. This implies for ah j, E.-^d(/^(°+i)i P% ~ f^U+i)o  0% ~ f^U+i)d)'''j • Since this holds for any x such that Xd =  =  , we can conclude  that /J^j+i),- = /5ji for i ^ 0,d and all j. By aggregating the data over Xd, we obtain an ordinary hnear regression problem and, hence,  (z 7^ 0, c?, j = 1, • • •, /°  1), can be estimated by least  squares estimates with all the properties given by the classical theory. The residuals can then be used to fit a one-dimensional continuous piecewise hnear model to estimate I, - • • ,1° + 1).  (i = 0, d, j =  For this one-dimensional continuous problem, Feder (1975a) shows that the  restricted (by continuity) least squares estimates of the thresholds and the regression coefficient are asymptoticaUy normally distributed when the covariates are viewed as nonrandom. So the problem is essentially solved except for a few technical points. In the Appendix of this chapter, we shall use Feder's idea to show that for a multidimensional continuous model with random covariates, the unrestricted least squares estimates possess similar properties. That is, the {/3j} are asymptoticaUy normally distributed, and so are the thresholds estimates given by the {Pj} instead of least squares.  Second, noting that continuity requires P^j^i-^i ~ 0% for i ^ (},d and all j, it would seem that a response surface over a multidimensional space will rarely be well approximated by such a continuous piecewise model.  Problems where the models are either continuous at all thresholds or discontinuous at all thresholds have now been solved. The next question is what i f the model is continuous at some thresholds, and discontinuous at others. This problem can be treated as follows. First, decide if the model is continuous at each threshold. This can be done by comparing fj, the least squares estimate of rj", with fj, the solution of pjo - P(j+i)o - {P(j+i)d - Pjd)'''j- B y the established convergence of the /S^'s and the fj's, if the model were discontinuous at TJ, then fj would converge to TJ. Meanwhile,  or P(j+i)i would converge to different values for some  i ^ 0,d or fj would converge to some point different from rj", or both. 
Thus, a large difference between fj and fj or between 0ji and P(j+i)i for some i ^ 0,d would indicate discontinuity. Then, by noting that Theorem 3.4 does not assume the model is discontinuous at all r^'s, we see that fj - rj* = Op(ln^n/n) for ah r^'s which are thresholds of model discontinuity. By the proof of Theorem 3.5, it is seen that these f / s can replace the corresponding r j ' s without changing the asymptotic distributions of the other parameters.  So, between each successive  pair of thresholds at which the model is discontinuous, the asymptotic results for a continuous model can be applied. In summary, regardless of whether the model is continuous or not, we can always obtain estimates of TJ''S which converge to their true values no slower than  Op{ll\/n),  and the estimated regression coefficients always have asymptoticaUy normal distributions.  Note that most results given in this chapter do not require that x i have a joint density which is everywhere positive over its domain. Hence, one component of X i could be a function of other components, as long as they are not collinear. In particular, x i could be a basis of pth order polynomials.  Since our estimation procedure is computationally intensive, one may worry about its computational feasibility. However, we do not thin]< this is a serious problem, especially with the ever growing speed of modern computers. The simulations reported in the last section are done with a Sparc 2 work station. Even with our inefl^icient program, which inverts an order rp (p-t- 1) X (p-|-1) matrices, 100 runs for model (a) consumes only about 9 minutes of C P U time with a sample size of n = 50 and only about 35 minutes with n = 100. Hence, each run would consume approximately .35 minutes of C P U time if n = 100. A more efficient program is under development; it uses an iterative method to avoid matrix inversion. A preliminary test shows that, with the same problems mentioned above, the C P U time consumed by this program is about 15 and 40 seconds for n = 50 and 100, respectively. Hence, each run would only take a few seconds of C P U time. Unfortunately, further modifications are needed for the new program to counter the problem of error evolution for large sample size. Nevertheless, even with our inefficient program, we believe our procedure is computationally feasible if L is small and n is not too large (say, Z < 5, n < 1000). A n d with a better program and a faster computer, the computation time could be substantially reduced, making much more complicated model  fitting computationally feasible. Finally, as we mentioned in Section 3.1, the choice of Co in MIC  and  needs further study.  3.5 Appendix: A discussion of the continuous model  In Section 3.1, we estabhshed the asymptotic normality of coefficient estimators for Model (3.1) when it is discontinuous at the thresholds. In this section, we shall establish the corresponding result for Model (3.1) when it is everywhere continuous. If Assumptions  3.0-3.1 are  assumed by Theorem 3.1, the attention can be restricted to {/ = / ° } . First, we shall show that the /3j's converge at a rate no slower than Op{n~^l- Inn) by a method similar to that of Feder (1975a). Now let ^ = (/3;,...,^;o+i)'; ^° = ( ^ ? ' , - - - J ? o V i ) ' ; f = (^',ri,---,r;o)'; f° = ( ^ ° ' , r f , . . . , r ° ) ' ; S =  : /5j 7^ /^j+i, i = 1, • • •,  m(6X) = x ' [ ^  -oo < n < • • • < r,o < oo};  l(^,g(^._,,^^])^j];  and /i(Ç;Xi) = ( ^ ( f ; x i ) , - - - , M e ; x f c ) ) ' ,  where Xfc = ( x i , • • • ,Xfc)'. 
Assuming no measurement errors, Feder (1975a) seeks the values at which the response must be observed to uniquely determine the model over the domain of the covariate. To find these values, he introduces a concept of identifiability. VVe adapt his concept to our problem.  D e f i n i t i o n For any C = {6*', r^*, • • •, r* )' G S, tlie parameter (9 = (/3[, • • •,0\o+J is identified at / i * = /x(f*,Xfc) by Xk if the equation  = / i * uniquely determines 0 =  Next we prove a lemma adapted from Feder (1975a).  The proof follows that of Feder  (1975a).  L e m m a A 3 . 1 If 9 is identified at fp = /i(Ç°,Xyt) hy Xk = ( x i , - - - , X f c ) , then there exist neighborhoods, M, of fi(^'^,Xk) and T of Xk such that (a) for all (k-dimensional)  vectors p, = {fii, • • •,pk)' € M and (p + I) X k matrices X^ G T  such that p, can be represented as jl — /i(^, X^) for some ^ £E,  0 is identified at fi by XI; and  (b) the induced transformation 9 = 9{fi;X^) satisfies the Lipschitz condition \\9i —^2|| < C\\fii /Ï2II for some constant C > 0, whenever X^ G T and p., = n{Çi;X^),  Proof:  p2 = più'iXk)  S M.  Since 9 is identified at fjP by Xk, it follows that for any possible choice of parameters  Tl, - •• ,Tio consistent with 9^, for each j there must exist p + 1 components of Xk, X j j , • • •, Xj^^^ such that Xj.^d € iTj-i,Tj]n{T^_-^^,T^],  i = 1, - • • , p - | - l , and the matrix (x_,-,, • • •, Xj^^^J is nonsin-  gular. B y continuity, the Xj. 's may be perturbed shghtly without disturbing the nonsingularity of ( x j j , • • •, Xjp^j). Assertions (a) and (b) follow directly from the properties of nonsingular hnear transformations. (Recall that if / i = X6 for a nonsingular X , then 9 = X~'p, and hence ll^ll < tr{X-''X-')M\).  H  R e m a r k It is clear from the proof that for a continuous model, it is necessary and sufficient to identify 9'^, that within each r-partition, there are p + 1 observations ( x j j , • • •, xj^,^ J such that the matrix X = ( x j j , • • • ,Xjp^j) is of full rank. In particular, if z has a positive density over a neighborhood of rj* for each j, then with large n, a Xk exists such that 9 is identified at fi{e\Xk)  hyXk.  Another concept introduced by Feder (1975a) is called the center of observations.  This  concept is modified in the next definition to fit our multivariate setup. D é f i n i t i o n Let z = ( x i , • • •, Xp)'. z° = {x°, • • •, x^)' is a center of observation if for any ^ > 0, both P ( { z : ||z - z ° | | < S, Xd < x^}) and P ( { z : ||z - z ° | | < 6, Xd > x"}) a-^e positive.  Remark For any a < ?/, if constant vectors z i , - - - , Z p + i are centers of observations such that Xtd € (a, 77), t = l , - - - , p + 1, and the matrix Xp+i = ( x i , • • •, Xp+i) is of full rank where Xj = (1,Z;)', by Lemma (A3.1) there exists a neighborhood, T , of X p + i , such that T C {x : a < xtrf < 77}, P{T) > 0 and X*^^ is of fuU rank if X;^^ 6 T . Hence, for any a / 0 and random vector x , i;[(a'x)^l(,,e(„,,|)] > ^[(a'x)2l(x6T)] > 0 implying that £ ^ [ x x ' l ( 2 . ^ ç ( i s positive definite. Therefore, a sufficient condition for Assumption 3.1 to hold is that for some è G (0,mini<j<;o(r°^i - TJ)/2), within each of {x : x^ £ {TJ  —6,  TJ)}  and {x : x^ G {'''J,TJ-\-S)}  there are p + 1 centers of observations forming a full rank  matrix for every j. In particular, ordinal categorical covariates are allowed in this assumption.  Lemma A3.2 (Feder, 1975a) Let V be an inner product space and X, Suppose x £  y £ y,  y subspaces of V.  
and x*, y* are the orthogonal projections 0 / x + y onto X,  y  respectively. If there exists an a < I such that -x. £ X, y £ y implies |x'y| < a||x||||y||, then ||x + y | | < ( | | x * | | + | | y * | | ) / ( l - a ) .  Lemma A3.3 For any real TI <  , let T be the random linear space spanned by the 2{p + 1)  column vectors o / ( A „ ( - o o , n ) , X „ ( r i , o o ) ) , and let C = X „ ( r i , r ° ) A ^ ° , where Ap° = Then under Assumptions  3.0-3.1, there exists a < I such that for sufficiently large n, K'gl < ^\m\9\\ 85  P^-P^.  uniformly in T\ < r ° and g £ T with probability approaching 1.  Proof:  It suffices to show that with large probability, for all Vi < r ° and g £  iC'gf  Define  <  a'WCfWgf.  = X „ ( - < x ) , r i ) , X^ = X „ ( r i , o o ) , X^ = X „ ( - o o , r O ) , X^ = ^ « ( r O , ^ ) . For any  g e :F, there exist pijo  G R-^+^ such that g = X i  + X2P2- Noting that | | X „ ( r i , rf)/32i|- <  \\X2P2\\\^e have \\X4n,T^)M'  _  M'  <  \\Xn{n,T^)P2\\' \\xJi\\^ + \\X2M'  |2 \\X2P:  . J | X „ ( r i , r » ) / ? 2 | P + ||Xi/?2|P -  (A3.1)  \\X2P2\\' + \\XxP2\\' \\XnM''  Suppose A, B are positive definite matrices and A ( M ) denotes the largest eigenvalue of any symmetric matrix M. Then for any P ^ 0, P'AP _ {B^'^P)'{B-^I^AB-^I^){B^/'^~P) _ ~P'{A + B)p {B^np)'{B-^l''AB-^n){B^I-ip) + {B^f^py{B^/^p)  ^ A(5-V2^5-i/2) - A(E-i/2A5-i/2) + T  This result can be appfied to the RHS of (A3.1) since X^Xn = XfX^  + X^'X^ and with  probability approaching 1, Xf X^, X2 X2 are positive definite. Thus, \x:P2r  _  \\XnP2\V  p'2C-x;'xi)P2  ^  Ai  P'2CnXrXt + iX*2X*2)~P2 - A l + 1 '  where Al = xii^x; n  x;)-'/\lx{'xi){\xU;)-''') n  n  is bounded in probabihty since both ^X^'X^ and ^X2 X2 converge to positive definite matrices. Therefore, by (A3.1) and (A3.2) there exists 0 < a < 1 such that with probabihty  approaching 1,  for all Tl < T^ and g Ç. T. Thus, with probabihty approaching 1,  <[E(A^°'x,)^l(..,,(.,,.o„][E(x;^2)^l(..,e(n,.?I)] t=i t-i = |Kin|X„(ra,r«)^2||2 ||.||2||  ,|2ll^Yn(ri,r»)^2|P  =iiai M  — ^ ^ i i , —  <«'iicini5ii' for all Tl <  and g E J^. This completes the proof.  L e m m a A3.4 that P{W)  Suppose Assumptions  ^  3.0-3.1 are satisfied. Let W he a subset of R P such  > 0. Then under Assumptions  3.0-3.1, min^,gvv |z^(xf)| = Op{lnn/^/n),  where  />(xO = M l ; x t ) - M e ° ; x t ) .  P r o o f Without loss of generality, we can assume P = 1. If we can show that Y,7=i ^ti'^t) = Op{\n^ n), then for any I ^ C R " such that P(W) > 0, min^.evv \i>i^t)\ =  Op{\nn/y/n).  Let  be the linear space spanned by the 2(p + 1) column vectors of (A„(—oo,fi),  X„(fi,oo)),  be the linear space spanned by / / ( f ° ; X „ ) , and :F+ = :F ®  X^)]  be the direct sum of the two vector spaces. Let Q'^,Q denote the orthogonal projections onto .;£•+,  respectively. Let i>(X„) = (j>(xa), • • •, £>(x„))'. Then | | ^ ( X „ ) - ê j p = S^ih)  < \\ên\\'.  Since botli / i ( f ° , X „ ) and / x ( f ; X „ ) belong to T"^, by orthogonality, l K l ; x „ ) - g + y n i i ' + iiê+5^n-F„iP =IKl;Xn)-F„||2  <lk-n|P  = I K e ° ; X „ ) - Q+y„|p + ||Q+y„ -  YX-  Subtracting HQ'^yn — y n | P from both sides, we have that  <IKe°;X„)-Q+n|p  Therefore,  <||/z(|;x„) -  + i|Q+y„ - M e " ; ^ n ) l i  < l l O + è „ | | + ||Q+ê„|| =2iig+ëni|. Since YJt=\ ^li'^t) = \\i>(Xn)\\'^, it remains to show that ||(5"'"ên|| = Op{lnn). of generahty, we can assume that n < T^. 
Let /3° =  and A/3° ^  Kf°,A„) = (X„(-oo,r{'),X„(r{',oo))4° = ( X „ ( - œ , f i ) + X „ ( f i , r ° ) , X „ ( f i , oo) - X „ ( f i , rO))/3° = [(X„(-cx),f,),X„(fi,oo)) + (X„(fi,r°),-X„(fa,r«))]^« = ( X „ ( - ^ , f i ) , X „ ( f i , oo))^° + X„(fa, r°)A/3°.  -0^.  Without loss Note that  This imphes that T'^ is also generated by the direct sum of T and vector C, where C X„(fi,rf)A/3°. B y Lemma A3.3, there exists a < 1 such that for sufficiently large n, IC'^I < allClllkll for ah f\ < r ° and g Ci P with probability approaching 1. Since Q{Q^èn)  — Q^n and C'(Q^fn)/IICI| =  C'ên/IICII) it follows from Lemma A3.2 that with probability approaching 1,  Therefore, if it is shown that ||Qên|| = Op(lnn) and C'?n/l|C|| = Op(lnra), the desired result obtains. Define X = (Â:i,Â'2). Then =è'^X{X'X)-X'X{X'X)-X'èn =è'nX{X'X)-X'ln =  ~e'nMX[Xi)-X[èn  +  è'MX'2X2)-XUn  = r „ ( - o o , f i ) + r„(fi,(X)). Therefore by Lemma 3.2, ||Qên|| = Op(lnra) uniformly for all fx. We next show that uniformly in n < rJ", C'ên/||CII = Op(lnn) for ||C|| 7^ 0, where C ^{M-^i)  and ^ = ( X „ ( - o o , r f ) - X „ ( - o o , f i ) ) . Let yt = x'^A/3°. Conditional on X „ , we have  that AQ%\  31nn  IICII -  <P( <  To  1^")  l^if^^-<-^--^;,'l > ^ | A „ ) pJEr=iytl(x„j<x.^<r„j)gt|  3 In 71  where To is specified in Lemma 3.1. Since | 2 / . l ( x „ , < x „ < x „ , ) / ( E r = i 2/i l(:<:.d<x„<x.,))^/-| < 1 and n  n  for any  x^d,  by Lemma 3.1,  <  Y  2exp(-To.^)exp(coro^)  <n{n - l)/n^exp{coT^) as  0,  —>• oo, where CQ is the constant specified in Lemma 3.1. Finally, by appealing to the  dominated convergence theorem we obtain the desired result without conditioning. This completes the proof.  ^  T h e o r e m A 3 . 1 Suppose Assumptions 3.0 and 3.1 are satisfied. Let X ° = ( x ° , • • - j x " ) . If 6 is identified at  X ° ) by  and x j , • • •, x° are centers of observations, then  = Op(lnn/A).  Proof  Lemma A3.4 implies that with probability approaching 1, within any small neighbor-  hood of x ° , there exists a xj^ such that  i = 1, - •• ,k. Lemma A3.1 imphes the conclusion of the theorem.  If  C o r o l l a r y A 3 . 1 Under the conditions of Theorem A3.1, f - r ° = Op(lnn/y/n) (^1, • • •,'fio)', fj =  0 - Pj+xfi)l0i+i,d  - hd),  i = 1, • • •,  P r o o f For any j = 1, • • •, /°, by continuity of the model at the end points x^ = r^,  where f =  for all {xi, i ^ d}. Then by choosing the {x^, i ^ d} so that they are not collinear, we deduce that  =  for ah i ^ 0,d. By assumption, /9°^ ^  Therefore, TJ can be reestimated  by solving  —  and hence, fj — r ° has the same order as  1  Next we shall establish the asymptotic normahty of ^, and f when the model is continuous. The idea is to form a pseudo problem by deleting all the observations in a small neighborhood of each r ° so that classical techniques can be apphed, and then to show that the problem of concern is "close" to the pseudo problem. The term "pseudo problem" is used because in practice the r^'s are unknown and so are the observations to be deleted. This idea is due to Sylwester (1965) and is used by Feder (1975a). Assume xj, has positive density function fd{xd) over a neighborhood of r ° , j = l , - - - , / ° . Our pseudo problem is formed by deleting all the observations in {x : r ° — d„ <  < r ° + rf„}  where dn = 1/ln^ n. Intuitively speaking, the number of observations deleted will be Op{ndn). This will be confirmed later in Lemma A3.6. 
Adopting Feder's (1975a) notation, we define n* as the sample size in the pseudo problem, and let n** = n - n*, 9* he the least squares estimate in the pseudo problem, and  the summation over the n* terms of the pseudo problem,  = Yl't=i " E * - Generally, a single asterisk refers to the pseudo problem. Theorem A3.1 and Corollary A3.1 carry over directly to the pseudo problem. Thus,  Theorem A3.2 If the conditions of Theorem A3.1 is satisfied in the pseudo problem, then  9' -9°  Further, if Model (3.1) is continuous, f  =  Op{lnn/V^).  — r ° = Op{\n n/y/n).  L e m m a A 3 . 5 Suppose {xt} is an iid sequence. Under the conditions of Theorem A3.2  where Gj = £;[xx'l(^_^ç(^<^^,^o])], j = 1, • • • , / ° + 1.  Proof  Let 5*(f) = ^ ^'(yt  - / / ( ^ X t ) ) ^ . Theorem A3.2 imphes that f* £ ( r ° - dn,Tf + dn]  with probability approaching 1. Since there are no observations within this region, it follows that 5*(f) computed within this region does not depend on r and is a paraboloid in 9. In particular, it is twice differentiable in 6. For the reminder of the proof, denote S*(Ç) by S*{d). Thus, with probability approaching 1, 6* may be obtained by setting the derivative of S*(9) to 0:  t=i j=i  n = ^ ^x,(x',(/3, -  - fOl(x..e(rO_,+.„,.o_.„])-*  Hence, ^T,ti^t^tM..,e(rO_^+d„,rf-d„]))0j  - P'j) = 7[T.7=i^tetl(.,,ç(rO_^+d^,rf-d„])-  By  Lemma 3.6 and the strong law of large numbers, 1  "  - Y 1  ^t^tMx,deir°_^  +  d„,r°-d„]))  "  = G , + Op(l), where Gj = ^fxix'^l^j-^^çf^o J,T°])]-  Under the assumptions of the pseudo problem, Gj is  positive definite. Thus,  ^0]  - P'j) = [Gj +  è x , C , l ( , . , e ( . ; ^ , + <i„,rO-.„l)-  The Lindeberg-Feller central limit theorem for double sequences implies the assertion of the lemma.  f  It now remains to sliow tliat 9 in the original problem and 9* in the pseudo problem do not differ by too much. In fact, we shall show that 9 — 9* = Op{n~^/'^) and hence that the two have the same asymptotic distribution.  L e m m a A 3 . 6 Suppose Assumptions 3.0, 3.1 and 3.3 are satisfied. Then under the conditions of Theorem A3.2, 9 - 9* = Op(n-i/2).  P r o o f The hypotheses imply that 9 is identified at  X ^ ) both in original problem and in  the pseudo problem, by some X° = (x^, • • •, x ° ) , where x J , • • •, x ° are centers of observations. It follows from Theorems A3.1 and A3.2 that ^-é»" = Opin''^/^ In n), a.nd 9'-9'^ = Op{n-'^/^'Inn). Let an = (ln7i)5/4 and  =  : \0 - 9''\ < <x„/V^, | r , - - r ^ l <  j = l , - - - , / ° } . Then  ^ and ^* both lie in J/„ with probability approaching 1. Note that function S*(^) depends only on (9 for f €  so that S'{0  = S*{9). RecaU that  S(0=^f^(^i  +  K^t))\  and S*{0 = li^i^t  + '^(^t))'.  Tl  Thus, SiO  =S*{0+lf^{et  + t^{xt))'  (A3.3)  Without loss of generality, we can assume that z is bounded. It follows from the definition of Un and the boundedness of z, that  sup  max  |i/(^;xt)| = 0 ( a „ / v ^ ) .  Note that n** is the (1, l ) t h component of J2'j=i XI,(T° - d^,  + dn)Xn{T° - d^, rf + d „ ) . By  Lemma 3.7 (i), n'* = Op(ndn). Thus, 1 ** sup  l-^i^'i^t)]  <{alln)n'*ln ^Opialdnln)  Also, for any (5 > 0 and ^ Ç.lin  <§E[f:.\C,Xt)] <§(sup max ieu^x,ue[j.(T°-d,.,rf+d„]  K^;xOlfi?K*)  <^0i^)0p{ndn) for some M > 0, where 0{a\ln) Q as n -* oo, ^  and Op{ndn) are independent of ^ € ZYn. Since a\dn —>•  ^ti^i^y^t) = Op{l/n) uniformly for all ^ G ZY„. 
Thus, by (A3.3)  S{0  = S*{0 + ^f^^l  where Op(l/n) is uniformly small for ^  + Op{h  (A3.4)  £lin-  Since ^ and ^* are least squares estimates for the original and the pseudo problem respectively, Sii) < Sit),  S'it)  < S'ii).  (A3.5)  (A3.4) and (A3.5) imply  0 < Sit)  - 5(0 = S'it)  - S*ii) + Op{-) < opi-). Tt  Tl  (A3.6)  Therefore, S*(i) -  S*{i') = Op(^). Since dS*(i*)/d9  = 0 and 5*(f) is a paraboloid in 6,  Taylor's expansion yields  s'ii)  = s*in+l{ê  -  - r)'.  Equations (A3.6) and (A3.7) imply Ô - 9* = Op(7i-§).  (^3.7)  If  Lemma A3.6 implies that ^/n{9 — 9^) and y/n{9* — 9^) have the same asymptotic distribution. Thus, by Lemma A3.5 we have  Theorem A3.3 Suppose the conditions of Lemma A3.6 are satisfied. Then,  ^A^(/3, -  - i N{0, alGf),  j = 1, • • •, /° + 1  where Gj is defined in Lemma A3.5.  For any j =  + 1, let  and A  = $j,o -  Then  =  fj =  A/3, = pj^d - Pj+i,d-  hence.  V ( A / 3 o - A/3S) - - M ^ ( A / 3 ° - A/3,) -A/3/ A/3,A/32 = - i ^ ( A / 3 o - A/3°) + - ^ ( A ^ , - A/33).  M^i  - r°) = - ^ v ^ ( A ^ o - A/3°) + _ ^ v ^ ( A / 3 , - A/32) + «P(1)95  So we have  Theorem A3.4 Under the conditions of Theorem A3.3, if Model (3.1) is continuous, then {fj — Tj) and _^^(,{APo — A/?o) + zr^{^Pd  — A/3°) have the same asymptotic  distribution.  Chapter 4  SEGMENTED REGRESSION MODELS W I T H HETEROSCEDASTIC A U T O C O R R E L A T E D NOISE  In this chapter, we consider the situation where the noise is autocorrelated and the noise levels are different in different regimes. Specifically, consider the model  yt = x'j^j + o-jfi, if Xtd € ( r j _ i , TJ], J = 1,..., / + 1, ^ = 1,..., n,  where €t = YlT i^iCt-i, with  (4.1)  < oo. The {CJ} are iid, have mean zero, have variance a^,  and are independent of the {xj}, Xj = {l,Xti,...,  Xtp)'. A n d —oo = TQ < TI < • • • < TI^I = oo,  while the CTJ (j = 1 , . . . , / + 1) are positive parameters. We adopt the parametrization which forces aç — l / E o ° ^ i ^ ^° that the {et} have unit variances. Further, we assume that there exists a ^ > 3 / 2 , ko > 0 such that  < k/{i + 1)'' for all i. Note that this implies {et} is a  stationary ergodic process. Estimation procedures are given i n Section 4.1. In Section 4.2, it is shown that the asymptotic results obtained i n Chapter 3 remain vahd. Since a major part of the proofs formally resemble those i n Chapter 3, all the proofs are put in Section 4.5 as an appendix. Simulation results are reported i n Section 4.3. Section 4.4 contains some remarks.  4.1 Estimation procedures  W i t h the notation introduced in Chapter 3, the model can be rewritten in the vector form,  y„ = J ] X „ ( T f _ „ r ° ) ^ , + c-,  (4.2)  i=i  where  :=  [^'-^x'ajUrl„rf)%.  A l l the parameters are estimated as in Chapter 2 except for the variances {a^,..., a-fo_^_-^}. These are estimated by  â] = Snifj-i,fj)/nj.  i = 1 , . . . , / + 1,  where fij is the number of observations falling in the jth estimated regime and / is the estimate of /° produced by the estimation procedure in Section 2.2. We shall see in the next section that the asymptotic results in Section 3.2 are essentially unchanged for this modification of the model. After estimating Pj and aj we may use the estimated residuals, êt — {yt — x.[Pj)/âj,  if  Xtd € ( f j _ i , f j ] , to estimate the parameters in the moving average model for the e'^s.  4.2 Asymptotic properties of the parameter estimates  To establish the asymptotic theory, we need to make some assumptions for Model (4.2). 
Below is a basic assumption which is assumed to hold throughout this section.  Assumption  4.0;  The {xj} is a strictly stationary ergodic process with £ ' ( x j x i ) < oo. €t =  tpiCt-i, where ipi < ko/{i-\- if  The et are given by  for some ko > 0, 6 > 3/2 and all i, the {Q} o^fe iid,  locally exponentially bounded random variables with mean zero, variance  = 1/ J2ilo '^h  are independent of the {xj}. For the number of threshold P, there exists a specified L such that P < L. Also, for anyj = l,...,l\  p° ^  0%,.  Note that {e^} is a stationary ergodic process and each  has unit variance. Additional  assumptions analogous to those in Section 3.1 are also needed to establish the consistency of the estimates.  For convenience, we restate Assumptions  3.1-3.2 as Assumptions 4-1-4-^,  respectively. A s s u m p t i o n 4.1 There exists 6 e (0,mini<j<;o(rj'.^;^-r]')/2) such that both E{x.iXil^^^^ç.(^.,.o_g,,o-^^} and 'i-{xideiT9,T°+s])} are positive definite for each of the true thresholds  E{xix[  T^,...,T°O-  A s s u m p t i o n 4.2 For any sufficiently small 6 > 0, £^{xiXil(3,j_^ç(^p_5 .^o])} and jE'{xixil(^j_^g(^p .,.0^5]^} are positive definite, i = l,---,l°.  Also, £ ' ( x i x i ) " < 00 for some u>  I.  To establish the asymptotic normality for the /9j's and â j ' s , we need to establish it for the least squares estimates of the /3j's and o-|'s with P and r^, • • •, T^O known. To this end, we specify the probabihty structure of { x J and {0} exphcitly. If {Q, T, V) is a probability space, a measurable transformation T : fi —> measure-preserving if P{T~'A)  is said to be  = P{A) for all A € !F- If T is measure-preserving, a set A €  is called invariant if T~'{A) — A. The class T of all invariant sets is a sub-cr-field of T, called the invariant cr-field, and T is said to be ergodic if all the sets in T have probabihty zero or one. (cf. Hah and Heyde, 1980, P281.) As Hall and Heyde point out (1980, P281): "Any stationary process { x „ } may be thought of as being generated by a measure-preserving transformation, in the sense that there exists a variable x defined on a probability space  {Q.,T,V),  and a measure-preserving map T : fi — >  fi,  such that the sequence {x'„} defined by XQ = x and xj,(u;) — x(T"a;), n > 1, a; G  has the  same distribution as { x „ } . " Therefore, we can assume that the stationary and ergodic sequence {xt,Ct} is generated by a measure preserving transformation T on a probability space without loss of generality. A s s u m p t i o n 4.3 (A.4.3.1)  Let (fi, J^, •p) he a probability space. Let {^t,Ct}t^-oo  the iid random sequence such  that (i) {Xf} and { C J are independent; (ii) (xtXt) = (x(r*a;), C(T'a>)), a; G fi, i = 0 , ± 1 , - - - , where T is an ergodic measurepreserving transformation  and (x, ^) is a random variable defined on the probability space  {^,T,V);and (iii) E{x\x.iY (A.4-3.2)  < 00 for some u > 2.  Within some small neighborhoods of the true thresholds, x\d has a positive and con-  tinuous probability density function /,(•) with respect to the one dimensional Lebesgue measure. (A.4-3-3)  There exists one version of E[-x.\X.'^\xxd — x] which is continuous within some neigh-  borhoods of the true thresholds and that version has been adopted. Consider the segmented linear regression model (4.2) of the previous section. Let / be the minimizer of MIC{1).  T h e o r e m 4.1  For the segmented linear regression model (4.2) suppose Assumptions 4-0 and  4.1 are satisfied. Then I converges to /° in probability as n ^ 00.  
The next two theorems show that the estimates f, 0j and aj are consistent, under Assumptions 4.0 and 4-2.  Theorem 4.2  Assume for the segmented linear regression model (4-Sj Assumptions 4-0 and  4.2 are satisfied. Then f - r ° = Op(l), where r ° = ( r f , . . . , r^o) and f — (fi,...,  fj) is the least squares estimate of r ° based on I — I,  and I is a minimizer of MIC {I) subject to I < L.  Theorem 4.3  If the marginal cdf Fj, of xn satisfies Lipschitz Condition \Fd{x') - Fd{x")\ <  C\x' — x"\ for some constant C at a small neighborhood of X\d = rj" for every j, then under the conditions of Theorem 4-2, the least squares estimates Pj and aj, j = 1,... ,1 + 1, based on the estimates I and fj's as defined in Section 2.2, are consistent.  Next, we show that if Model (4.2) is discontinuous at r ° for some j = 1, • • • , / ° , then the threshold estimates, fj, converge to the true thresholds, r ° , at the rate of Op(ln' n/n), and the least squares estimates of Pj and <7| based on the estimated thresholds are asymptotically normally distributed.  Theorem 4.4 Suppose for the segmented linear regression model (4-2) that Assumptions 4-0, 4.2 and 4.3 are satisfied. IfP{x[{Pj+i  - Pj) / 0\xd = r?) > 0 for some j = 1,---,P,  then  For j = 1, • • •, /° + 1, let Pj be the least squares estimates of Pj based on the estimates / and fj's as defined in Section 2.2, and aj be as defined in Section 4.1. Define Gj = Z;(xix'il(^^_^ç(^o_^_^o])), 00  E,- = aj[G-'  2Y,l{i)Gj'E{xil^,^^^^rO_^,rO^^^^^  + i=l  Pj = P{TU  <  < r'j)  and vj=pjil-pj)Eiet)  + p'j[iv-3h\0)  where 7(1) = £ ' ( e i € i + , ) , 77 = cryE(<^f)  Theorem 4.5  and j =  oo +2 ^ 7^(0], »=-oo +  Then, we have the following result.  Suppose for the segmented linear regression model (4-2) Assumptions 4-0, 4-2  and 4.3 are satisfied. If P{x.\{Pj^x - Pj) 7^ O^d = r?) > 0 for all j = 1, • • • t h e n  V^CPJ - Pj)  N{0, S , ) and ^Pj{à]  - u])  iV(0, v^a)),  as n ->• 00, j = 1, - • • ,f + 1.  Note that i f 7(1) = 0, i > 0, then Ylj — <^o^7^ as shown in Section 3.1. The next theorem shows that Method 1 of Section 2.2 for estimating dP produces a consistent estimate.  Theorem 4.6 If d° is asymptotically identifiable w.r.t. L, then under the conditions of Theorem 4-1, d given in Method 1 of Section 2.2 satisfies P(d = d^) —>^ 1 as TI — » • 00.  Remark:  Although the result of Theorem 3.7 is expected to carry over i f aj = a for all j, it  does not carry over i n general. Hence, Method 2 given in Section 2.2 is not generally consistent. Below is a counterexample.  Example 4.1. Let x = (1,2:1,X2)' where (xi,X2)  is a random vector with domain [0,6] x [0,6].  Divide the domain into six parts as shown i n Figure 4.1. On each part, (xi,X2)  is uniformly  distributed with mass indicated i n the figure. Let d = 1, Z*' = 2, L = 2 and ( r i , r 2 ) = (0.5,1). Hence, i?? = {x : 0 < x i < 0.5}, ii:^ = {x : 0.5 < x i < 1} and i?^ = {x : 1 < x i < 6}. The model is yt = ^^ l(x,eK«) + <^j(t: if Xt G R'j, 102  where the { x J are independent samples from the distribution of x , the {et} are iid iV(0,1) and independent of {xt}. Let o-^ = 1 and  = cr^ = 10. Define Rj = {x : X i 6 (j — 1, j]}, i =  1,2, J = 1, • • - , 6 . It is easy to see that on each Rj, the mass is 1/6 = 1/(2X + 2). Suppose we fit a constant on each of on R).  Rj.  For j > 1, AMSE(R])  Let us calculate = a | = 10.  AMSE{Rl)  AMSE{R^j),  the asymptotic mean squared error  And  = ^2 ^ i + a l X i + 5f  = ^  +  BJ,  where Bi is the asymptotic mean bias. 
Observe that the marginal distribution of Xi on (0,1] is uniform and symmetric about n = 0.5; hence Bi = 1 and  AMSE{R\)  =  13/2 < 10. Therefore,  with probabihty approaching 1 as n —»^ oo, the M S E on Rl wiU be chosen as the smaUest M S E among those on 72], j = 1, • • •, 6. For i = 2 and  j  > 1,  where B2 represents the asymptotic mean bias on each of Rj, j > 1. The asymptotic mean squared error on Rl should be no larger than the asymptotic mean squared error obtained by setting the model to 0:  \  ij -  1  20  2  20  ^  20  20  20  100  Thus, with large probability as n ^ 0 0 , the M S E on Rl will be chosen as the smallest M S E among those on  Rj,  j = l , - - - , 6 . Since  AMSE{R\)  > AMSE{R\),  X2,  rather than xi, wih be  chosen by Method 2 as the segmentation variable with probability approaching 1 as n —>^ 00. f  4.3 A simulation study  In this section, simulation experiments involving model (4.2) are carried out to examine the small sample performance of our proposed procedures under various conditions. As in Section 3.3, segmented regression models with two to three regimes are investigated. Let 4 = 0.7eJ_i - 0.1e;_2 + Ct, where the { 0 } are i i d with a locally exponentially bounded distribution having zero means and unit variances. Note that the {e^} can alternatively be defined by  (l-ei-^5)(l-C2-^5)e', = Ct, where B is the backward shift operator defined by Bh'^ = e[_j, j = 0, ± 1 , ± 2 , • • -, and (6,6)  =  (2,5). Since |6| > 1 for i = 1,2, {ej} is a causal A R ( 2 ) process. Hence, it can be written as = Sjlo  where  is the coefficient of  in the polynomial, V>(2) = l/[{l —  ^z){l-^z)].  Expanding tp(z), we get  t=0  fc=0  .=0 it=0  Let j = i + k, then  «=0 j = »  j=0 i=0  So  t=0  t=0  Thus for any S > 3/2, taking ko > 0 sufficiently large, we have €t — e'Jy/Var{€[),  < ko/(j  + 1)*. Let  so that Var{et) = 1 for all t. Then the {et} satisfy the condition of Model  (4.2) [In this case ^yVar{e't) = 1.33 (c.f Example 3.3.5, Brockweh and Davis, 1987)].  Let Zt = {xti, - • • ,xtp)' and xJ = ( l , z ' J , where {xtj} are nd iV(0,4). Let DE{Q,\)  denote  the double exponential distribution with mean 0 and variance 2A^. For d = 1 and r ° = 1, the following 3 sets of model specifications are used: (a')  p = 2  = (0,1, l y , p2 = (1.5,0,1)', tTi = 0.8,  = 1, 0 ~ ^ ( 0 , 1 ) ,  (d')  p = 3,  = (0,1,0,1)', ^2 = (1,0,0.5,1)', a i =0.8, <T2 = l,Ct-^  (e')  p=3ji  DEiO,l/V2),  = (0,1,1,1)', 02 = (1,0,1,1)', (Tl = 0.8, (72 = 1, 0 ~ i ? ^ ( 0 , 1 / v ^ ) .  Note that the regression coefficients in Models (a'), (d') and (e') are the same as those in Models (a), (d) and (e). Beyond the reasons given in Section 3.3, these models are selected so that the results in this section will be comparable to those in Section 3.3. In all, 100 replications are simulated with different sample sizes, 50, 100 and 200. For the reason given in Section 3.3, the results reported in Tables 4.1 and 4.2 are obtained by setting L = 2 to save some computational effort. The two constants, êo and CQ in MIC,  are chosen as  0.1 and 0.299 respectively, as explained in Section 3.1. Table 4.1 shows the estimates /, f i and its standard error, based on the MIC.  The following observations derive from the table.  (i) For all models, in more than 90% of the cases 1° is correctly identified. Hence, for estimating f our residts seem satisfactory. Comparing these results to those in Table 3.1, it seems that Models (a'), (d') and (e') are more diflRcult to identify than Models (a), (d) and (e). 
(ii)  As in Section 3.3, f i seems biased for small sample size. This bias is related to the shape  of the model. Note that the biases for Model (a') are all positive and those for Model (d') are all negative. These biases decrease as the sample size becomes larger. (iii)  The standard error of f i is relatively large in all the cases considered. A n d , as expected,  the standard error decreases as the sample size increases. This suggests that a large sample size is needed for reliable estimation of r f . A n experiment of n = 400 is carried out for Model  (e'). We again obtained correct identification in 99% of tlie cases. But the standard error of fi reduces from 1.111 for n = 200 to 0.707 when n = 400.  (iv)  A larger  estimate  niay perform better in these cases, since there seems to be a tendency to over  especially as n becomes large. Because in practice, the model structure is unknown  and one cannot choose the best (SofCo), we adopt the same values for these parameters as in Section 3.3.  Table 4.2 shows the estimated values of the other parameters for the models in Table 4.1 only for a sample size of 200. The results indicate that, except for P20, the estimated y3j's are quite close to their true values even when f i is inaccurate. So, for the purpose of estimating the ySj's, and interpolation when the model is continuous, a moderate sample size such as 200 may be sufficient. When the model is discontinuous, interpolation near the threshold may not be accurate due to the inaccurate f i . As we saw in Section 3.3, the estimates of /32o have relatively large standard errors. This is due to the fact that a small error in P21 would result in a relatively large error in $20- The relatively large error for  may also be due to the inaccurate f i .  Simulations have also been carried out for a model with /° = 2. Specifically, the model is:  (j)  p = 2,  = (1,1,0)', P2 = (0,0,1), Ps = (0.5,0,0.5), a i = 0.7, ^2 = 0.8,  r{' = - l , T° = l,  = 1  (:t^DE{0,l/V2).  The results are reported in Tables 4.3-4.4. Table 4.3 tabulates the empirical distributions of the estimated /" for different sample sizes. W i t h n = 200, 1° is correctly identified 95 out 100 rephcations. The standard errors of fj (j = 1,2) are relatively smah indicating that the thresholds in this model are easier to identify. The Pj''s and the â ] ' s are given in Table 4.4. The results are similar to those in Table 4.2.  4.4 General remarks  In this chapter, we generalized the results in Chapter 3 to the case where the noise is heteroscedastic and autocorrelated. Although the ideas used in this generalization are the same as those of Chapter 3, it can be seen in Section 4.5 that a more technical analysis is required to prove these results. The simulation results given in the last section indicate that this model is in general more difficult to identify, compared with the model discussed in the last chapter. There are several questions which need further investigation. First, can the residuals be used to estimate the tpi's in the moving average specification of the noise once the estimates of the regression coefficients are obtained? If so, what procedure should be used to reduce the impact of the bias in the estimated r ° ' s ? Once the Vt's are estimated, can the information obtained be used to reestimate the other parameters of the model to obtain better estimates? Second, the asymptotic distribution of the estimates given i n this chapter are for discontinuous models. 
If the model were continuous, one could aggragate the data over the segmentation variable regions to obtain a linear regression problem. The /3ji's {i ^ 0,d) can be estimated by least squares. The residuals can be then be used to estimate f3ji, /Sjd and aj (j = 1, • • • , / ° + 1) by least squares again in a one-dimensional segmented regression problem. A number of questions remain to be answered: Are these estimates consistent? What are their asymptotic distributions? If the parameters are estimated directly by least squares, are the estimates, unrestricted by continuity, consistent? What are their asymptotic distributions? Some of these problems will be discussed further in the next chapter as future research topics.  4.5 Appendix: Proofs  Although a major part of the proof appear to resemble those in Chapter 3, there are some  extra difficulties resulted from the correlated errors. First, we have to show that the result of Lemma 3.2 still holds under dependent assumptions. This is accomplished in Lemmas 4.1 and 4.2. Second, the results of Lemma 3.7 have to be re-established by calculating the limits of sample moments. Third, we have to establish the asymptotic normality of the estimated regression coefficients and the variances of the errors for known thresholds. This is done in Lemmas 4.9 and 4.10 by using a central hmit theorem for stationary processes. The proof of Theorem 4.1 will be given after a series of related lemmas. L e m m a 4.1 (Susko, 1991) Suppose \ai\ < ko/i^ for some Â;o > 0, ^ > 3/2. Then |a,+,|)2 <  YlZiŒZi  oo.  P r o o f : B y assumption, \ai\ < ko/i^ for some ko > 0, S > 3/2. Therefore,  Now,  oo  oo  oo  oo  ;=1  /=1  .=1  /=1  oo  ^  1=1 ^ ^  ^ ^  ^  oo ^  j=,+i oo =  E  j  V/ /  oo  =  E  <  E  dt  .j  /  min  dt  /  roo  dt  -I.  So, oo  oo  L,2  °°  D E i - ' + ^ d ^ s t ^ E ' / . ^ " - " -  (4.3)  By assumption, S > 3/2, so 2(6 — 1) > 1, and hence  f;(f;ia,+,i)2<oo. The next Lemma is slightly modified version of Lemma 1 of Susko (1991).  L e m m a 4.2  Let {Ct} be iid, locally exponentially bounded random variables. Let  €t = S i ^ o '^iCt-i, and assume there exists 6 > 3/2, ko > 0 such that  < ko/{i + 1)'' for all  i. Let Sk = Yii=i ^i^i> where the a'^s are constants. Then there exists 0 < c i < oo and Ti > 0, such that for any x >Q, k > 1 and t satisfying 0 < /||a|| < T i ,  P{\Sk\ >x}<  Proof  2e-*^+=i*'ll''ll'.  The assumption of locally exponentially boundedness means that for some TQ > 0 and  0 < Co < oo, f;(e*^i) < e''"*^ for \t\ < To. Now it follows from Markov's inequality that for sufficiently small t > 0,  P{Sk >x}  = P{e*^* > e'^} < e-*^X;(e'^*).  And fc  k  Sk = Y  = E  E  i=l  1  where  fc-1  ^(^) =  =  t j=0  Ec-.E«iV'i+.-. i=0  Hence,  = ^(^)+^(^)'  j=0  E'^'^-'E^'t-j^'-i' :=0  ^ w  oo  i=l  if | ^ E t i a / V ' / + i | < To for aU i. Let Mi = E S o C E t i  Note that we can assume  y/M^ > 0 without loss of generality (since otherwise Cj = 0 a.s.). Since iV»,! < ko/{i+ 1)^, from the previous lemma Afi < oo. Observe that for all i,  (E«'V'/+.)^<(è«?)(E^'+.) /=i 1=1 1=1 <iwP(Ei^'+'i)'^iHi'(Ei^'+«i)'/=i /=i  Hence i f t is such that | i | | | a | | <  k  TQ/^/M^,  then for aU i  oo  l*E"'^'+«l ^ MIHI(El^'+.l) < 1=1  \t\\HVM'i<To.  1=1  Therefore, for any t such that |t|||a|| < To/y/M^ and c = c o M i ,  Also,  if I Z)}=o '^k-j'>Pi-j\ < To for all i. 
Let n = i- j, m = i - I, then  i=0 j=0 k-1  i  = E  i j-1  E ^l-j^i-j  + 2E  E  ak-jQk-irpi-ji^i-i]  j = l /=0  1=0 i = 0  fc-1 0  fc-1  i - 1 n+1  = E E ^l-i+n'^l + 2 E E E afc-(i-n)afc-(.-m) V ' n ^ m t"=0 n=t" j = l n=0 m = : fc-1 t fc-2 fc-1 «• = E E ' ^ f c - ' + n ' ^ " + 2 5 ^ J2 E flfc+n-iafc+m-.V'nV'^ t=0 n=0  n=0 t = n + l m = n + l  fc-1 fc-1 fc-2 fc-1 fc-1  ^ E  ^" E  n=0  +  2|  ^  Y^k+n-iak+m-iMm  n=Om=n+lz=m  i=n  fc-1 fc-2 fc-1 fc-1  i  /V A A. A  < E ^ n H i ' + 2 E  A,  E  J.  iV'.^'-iiE Ck+n-iak+m-i\  n=0  n=0 m=n+l  »=m  < E ^ n N l ' + 2 E E l^n^'mlNI^ n=0 n=0 m=n+l =ii«iP(Ei^'^i)' n=l  Therefore, for any t such that |<|||a|| < To/y/M^ and the c = CQMI, we have «  ItY^k-j^i-jl j=o fc-1 i  «•  < |f||^afc_jV.-il j=0  t=0 j=0 <7o. and hence  Since A(A;) and  are independent we get that for Ti = To/y/Ml  P{Sk >x}<  e-'^Eie*''^''^)E{e'^^'^) < e-«-e2ct^||.||^ ^  where c i = 2c and |f|||a|| < T j .  and any A;,  ^-tx^c,t^\\a\\^^  Finally, to conclude the proof, we note that  P{Sk < -x}  Lemma 4.3  = P{-Sk  > x}.  f  Assume for the segmented linear regression model (4-2) that Assumption 4-0 is  satisfied. Define  (Tmax  := rnaxj  <7i  and redefine Tn(a,T]) := ê ^ ' ^ „ ( a ,  77)6^, - 0 0  < a <  77 <  00.  Then P { s u p Tnia, 77) >  Qfj2 „ 3 In^ TI}  a<Ti  0,  as  n 0 ,  ±1  where po is the true order of the model and T, is the constant specified in Lemma 4-2.  Proof  Conditioning on X „ , we have P { s u p T „ ( a , 7 7 ) > £ 4 f ^ l n 2 7i I X „ } a<r]  =P{ <  J-i  max ê r ^ „ ( x , r f , x , , K > ^ - ^ I n ^ n I X „ } E  PK'Hn{x,d,xu)èl>^^\n'n\Xn]. 1  X,d<X,d  Since Hnixad, Xtd) is nonnegative definite and idempotent, it can be decomposed as Hnixsd, Xtd) = W'AW,  where W is orthogonal and A = diag{l, • • •, 1,0, • • •, 0) with p := rank{Hn(xsd, Xtd))  = rank{A) < po- Set Q = {Ip,0)W. Then Q has fuh row rank p. Let Q' = ( q i , • • •,qp) and Ui = q^el = q J E l l V ^ i / n ( r P . i , rP)]c„, / = 1 , . . . ,p. Then,  1=1  Since p < po, as in the proof of Lemma 3.2, it suffices to show, for any /, that  E  m ' > ^ % ^ l n ' n | X , } - > 0 ,  Noting that p = trace{Hn{x,d,Xtd)) II q ; E ! l V ^ i ^ n ( r ? . i , r f ) r<  asn^O.  = E f = i II qi IP> we have || q, ||2= qjq, < p < po and  a L . || q/ |P< ^LxJ^o < crLxPg, where / = l , . . . , p . B y Lemma 112  4.2, with ^0 = Tx/umaxPo we have  T2 V,  ^  E  i_2  2 e x p ( - - ^ . ^ ^ h i n ) e x p ( c i ( - ^ ) V L . F o )  <n(n - l)/n3exp(ciToVPo) -> 0,  as ra -> oo, where c\ is the constant specified in Lemma 4.2. Finally, by appealing to the dominated convergence theorem we obtain the desired result without conditioning.  %  Consider the segmented regression model 4-1 •  C o r o l l a r y 4.1  (i) For any j and ( a , /?] C ( r ^ . i , r]>],  5 „ ( a , 7/) = a]è'n{a, r])€n(a, rj) - Tn{a, rj).  (ii) Suppose Assumption 4-0 is satisfied. Let m > 1. Then uniformly for all (oi, • • •, a^) such that -oo < cx < • • • <  < oo,  m+l°+l  5„(6,---,W)=  where 6 = -oo,  Y i=i  SniÇi.x,^i)  fm+zo+i = oo, and {^i,-• • ,^m+i°}  = rn'ë^n +  Op{ln'n),  is the set {ri°, • • •, r°o, ai, • • •, a„} after  ordering its elements. P r o o f : (i) Replace ë „ ( a , rj) in the proof of Proposition 3.1 (i) by c^(a, rj) = / „ ( a , r])ê^ and note €^(0,77) = ajën(a,rj)  when (a,77) C {TJ_I,T^].  The result obtains immediately.  (") B y (i), SniÙ,  •  ••,Çm+l°)  «=1  m+l°+l 1=1  m+l° + l «=1  Note that each of (^j_i,^j] is contained in one of ( r ° _ i , rj*], j = 1, • • •, /° + 1. B y Lemma 4.3, E . 
' l t ' " " ' ' Tn{ii-x,ii)  L e m m a 4.4  < (m + /« + 1) sup,<, r „ ( a < T?) = O^Cln^ n).  1[  Under the condition of Theorem 4-1, there exists S G (0, mini<j</o(TJ^-, — TJ')/2)  such that for r = 1 , . . . , /°,  [ 5 „ ( r ° - 6,r° + S)-  Snir"^ - é,r^,) - 5 „ ( r ° , r ° + <5)]/n ^  (4.4)  /or some C r > 0 as n —>^ oo, r = 1 , . . . , / ° + 1.  Proof  It suffices to prove the result when /" = 1. For notational simplicity, we omit the  subscripts and superscripts 0 in this proof. For the 6 in Assumption 4.I, denote S,Ti),X^ =  = XniTi,Ti+S),X* +  = Xn(Ti-S,Ti+S),ël  and /3 = {X*'X*)~X*'Yn.  =\\x{Pi+x;h  +  = <7i/„(rin)ë„,  = X„(ri -  =  Cr2ln(Tl,Ti+6)ën,  As in ordinary regression, we have  ê*-x'k?  = | | X r (  - ^ ) + ^2*(^2-^) + 6 l P =\\x*{h - h '  + \\x;02  - h '  +  + 2e*'xr( - h  + ^i-'x^i^  -  h  Note that { x J and { j / J in Model (4.2) are strictly stationary and ergodic. It then follows from  the strong law of large numbers for stationary ergodic stochastic processes that as n —»• oo, 1 ' 1 " -X* X" = - V x i X ' i l ( ^ . ^ e ( ^ j _ 5 , T i + 5]) 71 .^ «•=1 -xfx;  as  ^{xix'il(^j^ç(.,j_6,^j + 6])} > 0,  ' i;{xix'il(^,^e(ri-5,Ti])} > 0,  if  j=l,  £{xixil(^,,G(^,,^,+5])} > 0,  if  j=2,  and - X * Y„ ^  E{yiXil(xue{Ti-s,Ti+i])},  Th  where E{yiXil^^^^ç(^r,-s,T,+s])}  = -E{xixil(^j^e(rj-5,ri])} + £^{xixil(^^^6(^i,^,+5])}^2-  Therefore,  P ^  { X ; { x i x i l ( ^ „ e ( ^ j _ 5 , ^ , + 5 ] ) } } ' ^ ^ { î / i X i l ( x i , e ( n - 5 , r i + 5 ] ) } =: P'-  Similarly, it can be shown that f iP, - ^ • ) ' E ( x i x ' i l ( x . . e ( n - 5 , n ] ) ) ( ^ i 7t  02 - ^•)'i;(x:xil(,,,e(.,,,,+^]))(/32 - ^S*),  if  J=l,  if  j=2.  - c * ' x ; ( ^ , - ^ ) ^ 0 , for j = 1,2, Th  and n where pi = P{xid  € (n - 6,TI]} and p2 = P{xid  € ( r i , r i + S]}. Thus, as n -> oo, ^ 5 „ ( r i -  6, Tl + (5) has a finite limit, given by lim - 5 „ ( r i - 6,TI + S) = (  - /3-)'i;(xixil(,,,e(.,_5,.,]))(;3i - n +  + 02 - ^ * ) ' £ ( x i x ; i ( . , , e ( , , , , , + 5 ] ) ) ( / 3 2 -  PI  (Tlpi+alp2.  It remains to show that ~Sn{Ti - S,TI) and ^ 5 ' „ ( r i , r i + ^) converge to ajpi and cr^p2 respectively, and either (  - P*yEixix[l(,,,^^r,-s,n]))0i  - P*) > 0 or (^2 -  P*yE{xix[  1(xide(Ti,Ti+s])) • (02 — p*) > 0. The latter is a direct consequence of the assumed conditions while the former can be shown again by the strong law of large numbers. To this end, we first write 5n(Ti — 6,TI) in the following form,  Sniri - 6,  n) = êl'êl -  Tn{ri - 6, n )  using Corollary 4.1 (i). Bearing in mind Eel = 1» by the strong law of large numbers, iê-'ëî ^  <rlE[ell^^^,^^r,-s,r.])]  = <TlP{xid e in  -  Tl  Tl  and W = lim„_»oo ^X^'X^  is positive definite under the assumption. Therefore,  Tn{ri - è,T,) = {--el'Xl){-XrX*,)-{-Xl'ël)  n  Thus, ^Sn{Ti—è,Ti) 6)  n  n  ^  = 0.  cr^pi. The same argument can also be used to show that ^ 5 „ ( r i , r i - | -  o'2P2. This completes the proof.  Now define al — Y^j^i  PJ(TJ,  f  where Pj = P{xid G  large numbers to {efl(x,<,e(TO_i,Tp])} for all j , we obtain ^è^'è^  Lemma 4.5  OW-'O  7"°]}.  ^  Applying the strong law of al.  Under the condition of Theorem 4-i, we have  (i) for every I < 1°, P{âf  > al + C} —>• 1, as n  oo for some C > 0, and  (ii) for every I such that P < I < L, where L is an upper bound of P,  0 < i ^ ' c ' ^ _ âf =  Op(ln\n)/n),  Tt  where aj = ^ 5 ' „ ( f i , . . . , f;) is the estimated al when the true number of thresholds is assumed to he I.  Proof  (i) Since / < / ° , for 6 € (0, mini<j<;o (rj*^! 
- T^)/2) in Assumption  1 < r < /o, such that {h,...,fi)€  4.I, there exists  A^ := { ( r i , . . . , r,) : | r , - r"! >(5, s = 1,..., /}. Hence, if  we can show that for each r, 1 < r <  min  with probability approaching 1,  Snin,---,Ti)/n>  al + Cr,  for some Cr > 0, then by choosing C := mini<r<(o {Cr}, we prove the desired result. For any ( r i , - - - , r / ) G Ar, let 6 < ••• < , r ° - 6,  + ^, T°^i,.  ..,Tfo}  and let  ^0 = -0°,  be the ordered set { r i , . . . , r,, r f , . . . , ^i+i°+2  = oo- Then it follows from Corollary  4.1 (ii) that uniformly in Ar,  -SniTi,---,Ti)  n  n . 1+1°+2  =-  E  "^"(0-1,6) (4.5)  = -[  E  + ^[5„(r° -  +  SMJ-U^J)  5 „ ( r ° - S,r°) + 5 „ ( r ° , r ° + ,5)]  r ° + ,5) - 5 „ ( r ° - ^, r ° ) - 5 „ ( r ° , r ° + 6)]  = i e - ' 6 - + Op(ln2(n)/n) + i [ 5 „ ( r ° - ^, r ° + S) - 5 „ ( r ° - ^, r ° ) - 5 „ ( r ° , r ° + ^)]. 7i  Tt  By the strong law of large numbers the first term on the R H S isCTQ+ o ( l ) a.s.. B y Lemma 4.4, the third term on the R H S is Cr + o(l) a.s.. Thus  -SniTl,---,Ti)>al+Cr Tt  + Opil),  where Cr is defined in (4.4). (u) Let 6  < ••• < (1+1° be the ordered set, {h-,-• • ,TI,T^,•  • • ,Tfo},  -  =  -00  and  Çi+io^i  = OO. Since / > / ° , by Corollary 4.1 (ii) again,  = T°o^,  >5n(r°,.-.,rfo)  =naï  =  E  '^n(6-l,6)  j=l  =ël'rn + This proves (ii).  Op{ln\n)).  ^  P r o o f o f T h e o r e m 4.1  By Lemma 4.5 (i), for / < f and sufhciently large n, there exists  C > 0 such that  MIC{1) = ln(<7f ) + p * ( l n n ) 2 + V n > \n{al + C / 2 ) > ln(a2) + l n ( l + Cl{2al))  with probabihty approaching 1. By Lemma 4.5 (ii), for / > / ° ,  MIC{1) = lii{âj)+  Thus, P{1 > /"} 1°  p*(Innf+^/n  Ina^.  1 as n —»• oo. B y Lemma 4.5 (ii) and the strong law of large numbers, for  <1<L,  0>âf-  àfo = [àf - i e - ' c - ] - [âfo - ^e-'e-] = ^^(In^  n/n),  and [^?o - al] = [âfo - Uv-<\  Hence 0 < (âfo - à'\)/à]„  + \-jV-<. - <^Vi = Op(ln2 n/n) + Op(l) = 0^(1).  = Op{ln'^{n)/n).  Note that for 0 < x < 1/2, l n ( l - x) >  -2x.  Therefore, MIC{1) - MIC{f)  = l n ( â f ) - l n ( 4 ) + Co(/ -  f){\nnf^^°ln  = l n ( l - ( 4 - 4 ) / 4 ) + co(/ - /°)(In(n))2+*Vn > - 20j,{\n\n)/n)  + co(/ - / ° ) ( l n ( n ) ) 2 + « V n  >0 for sufficiently large n. Whence /  /° as n ^ oo.  %  To prove Theorem 4.2, we need the foUowing lemma.  Under the assumptions of Theorem 4-2, for any sufficiently small 6 G (0,  Lemma 4.6  mini<j<jo(r°^.i — r ° ) / 2 ) , there exists a constant Cr > 0 such that  - [ 5 „ ( r ° - 6, r ° + S) - 5 „ ( r ° - 6, r ° ) - 3^(4,T°  + S)] ^  Cr, as n ^  oc,  Tt  where r = 1, • • •, Proof It suffices to prove the result for the case when P = 1. For any small ^ > 0, all the arguments in the proof of Lemma 4.4 apply, under Assumption 4-2. Hence, the result holds.  Proof of Theorem 4.2  B y Theorem 4.1, the problem can be restricted to {/ =  For any  sufficiently small 6' > 0, substituting 6' for the 6 in (4.5) in the proof of Lemma 4.5 (i), we have the foUowing inequality: -Sn{n,---,Tl<>)  n >Uîèl  +  Op{ln\n)/n)  + ^ [ 5 „ ( r ° - y , 4 + 6') - 5 „ ( r ° - 8', r ° ) - 5 „ ( r ° , r ° + 6% uniformly in ( r i , - - - , r ; o ) G Ar := { ( n , • • •, r/o) : jr, - T°\ > 6' ,1 < s <  B y Lemma 4.6,  the last term on the R H S converges to a positive Cr for every r. A n d for sufficiently large n,  the O pilv? {n) I n) < imni<r<io(Cr). Thus, uniformly in Ari r = 1 , . . . , i ^ , and with probabihty tending to 1, i5„(ri,...,r,o)>iCf- + n n  ^ .  
1  This imphes that with probability approaching 1 no r i n  is quahfied as a candidate of f,  where f = ( f i , • • • ,fjo). In other words, P ( f € A%) -> 1 as n -> oo. Since this is true for ah r, P{f e H r l i ^ r ) ^ 1> 05 n /"  1°  n  oo. Note that for S' < mino<i<,o{(rP+i - r f )/2}, i°  - ^ r l < S'} = f]{\K  r=l  - r"r\ < S'Jor  some 1 < ir < 1°} = {f e f]  r=l  A^.  r=l  Thus we have, 1°  Pi\fr-T^\<6'  for r = l,...,P)  = Fife  f| A^) ^ 1, as n ^ oo, r=l  which completes the proof.  P r o o f o f T h e o r e m 4.3  ^  Let aj* and Pj be the "least squares estimates" of aj and /?  j = 1, - • • ,1° + 1, when /° and (rf, • • •, rjj) are assumed known. First, we shaU show that the Pj^s are consistent. B y the strong law of large numbers for ergodic sequence, Pj — Pj = Op(l), J = 1, • • •, /° + 1. So it suffices to show that Pj — Pj = Op(l) for each j. Set X ; = / „ ( r j ' _ i , r ] ' ) X „ and Xj = / „ ( f , _ i , f , ) X „ . Then,  <i\^^^r  - {\^U]r\^^y^\ + a'-x-'x-m'-ix, - x;)'y„]  =[(ixjx,)- -  i^-xfxjmkx'j -  =:(I){{II) + {in)}  +  x ; ) X  +  ix;y„}  + [(^x/x;)-][i(x, -  x;)%]  iIV)iII).  where (/) = [ ( ^ X j X , ) " - ( i X / X / ) " ] , ( / / ) = i ( X j - X ; ) ' F „ , ( / / / ) = i X ; F „ and ( / V ) = [ ( i X / ' X / ) - ] . B y the strong law of large numbers, both (III) and (IV) are O p ( l ) . B y Theorem  4.2, f — r ° = Op(l).  Proposition 3.2 implies that there exists a sequence { a „ } , a„  0 as  n -> oo such that f - r ° = O p ( a „ ) . Note that (//) = ^ X;r=i '<^i2/Kl(x.aeR, ) ~ l(^<d€Ri)) where -^j = ('''j-i»'Ty]' - ^ i = (^i-i''^}']- Taking u > 1 and  = aJxtyt for any real vector a, it follows  from Lemma 3.6 that ( / / ) = Op(l). It is shown in the proof of Theorem 3.3 that (/) = Op(l). Thus, ; â ^ - ; â ; = o p ( i ) , i = i , . . . , z ° + i . Next, we shall show that the â^'s are consistent. When  and (r^', • • •, T,°O) are known,  the least squares estimates ô-|*'s are obtained from each regime separately. Hence within each regime, applying Corollary 4.1 (i) and Lemma 4.3, we obtain that  n  "i^f =  E  + Op(/n^n),  (4.6)  «=1  where Uj = Y^^=i ^(x^eR^)  number of observations in the j t h regime. B y the strong law  of large numbers and Lemma 4.3 Uj/n  pj as n ^ oo, and  = ^ - 1 E^?l(^..ei.o) + O p ( ^ ) t=i "  Therefore, it remains to show that aj - âf Lemma 3.6 to  = a] + Op(l).  = Op(l). Recall fij = ^ J L ^ ^(xt^eRj)- Applying  = 1 we obtain ^ftj = ^TIJ + Op(l) = pj + Op(l). Thus, it suffices to show  5 „ ( f , _ i , f , ) - 5 „ ( r } ' _ i , r j ' ) = Op(l). Since  ^„(f^_i,f^))F„,  Sn{fj-l,fj)  =  y^(/„(fj_i,f,)  Sn{TU,r^)  =  F,:(/„(r]'_a,r») - ^„(Tf_i,rj'))y„,  -  and  we have that 5„(f,_i,f,)-5„(r°_„r°)  n «=1 n  + Kx,(x;'x;)-x;'y„ - y,:x;(x;'x;)-x;'y„} n  = Eî''(i(x..6R,) t=i +  - {y^x,(x;.x,)-(xj - x;')y„  (4-7)  - ( x ; ' x ; ) - ] x ; ' y „ + y,:(x,- - x ; ) ( x ; ' x ; ) - x ; y „ }  n = E^<(^(^.<ieA,) - l ( x . d G H ° ) ) «=1  - {Y:,XA{X'^XJ)+ y^x,[(xjje,)-  - {x;'x;)-  +  - x;')y„  - ( x ; ' x ; ) - ] x ; ' y „ + y^(x,- - x ; ) ( x ; ' x ; ) - x ; y „ }  n = E2't(^(^<^eÂ,) -  l(x.,Gfi?))  - {((//) + (///))'[(/) + (/F)](/J) + ((//) + (///))'[(/)](///) + (//)'(/F)(//7)}.  Taking u> 1 and Zt = j/f, it foUows from Theorem 4.2 and Lemma 3.6 that ^ E " = i 2/i (l(r,<iefi ) -l(xMefl?)) = Op(l)-  A s we have previously shown, (/) = Op(l), ( / / ) = Op(l), ( / / / ) =  Op(l) and (IV) = O p ( l ) . 
Hence,

\[ \frac{1}{n}\big[S_n(\hat\tau_{j-1},\hat\tau_j) - S_n(\tau_{j-1}^0,\tau_j^0)\big] = o_p(1) - \big\{(o_p(1)+O_p(1))[o_p(1)+O_p(1)]o_p(1) + (o_p(1)+O_p(1))[o_p(1)]O_p(1) + o_p(1)O_p(1)O_p(1)\big\} = o_p(1). \]

□

Proposition 4.1 (Brockwell and Davis, 1987, pp. 219-220). Let

\[ e_t = \sum_{j=-\infty}^{\infty}\psi_j\zeta_{t-j}, \]

where $\{\zeta_t\}$ is iid with mean zero and variance $\sigma_\zeta^2$, $E\zeta_t^4 = \eta\sigma_\zeta^4$ and $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$. Then,

\[ E(e_t^4) = 3\gamma^2(0) + (\eta-3)\sigma_\zeta^4\sum_j\psi_j^4, \tag{4.8} \]

and

\[ \lim_{n\to\infty} n\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t^2\Big) = (\eta-3)\gamma^2(0) + 2\sum_{j=-\infty}^{\infty}\gamma^2(j), \tag{4.9} \]

where $\gamma(\cdot)$ is the autocovariance function of $\{e_t\}$. We would remark that under Assumption 4.0, $\gamma(j) = \sigma_\zeta^2\sum_{i=0}^\infty\psi_i\psi_{i+j}$; in particular, $\gamma(0) = E(e_1^2) = 1$. Now, we restate Lemma 3.7 with appropriately modified hypotheses.

Lemma 4.7. Let $\{k_n\}$ be a sequence of positive numbers such that $k_n \to 0$ and $nk_n \to \infty$. Suppose Assumptions 4.0 and 4.3 are satisfied. Then for any $j = 1,\cdots,l^0$,

(i) $\frac{1}{nk_n}X_n'(\tau_j^0-k_n,\tau_j^0)X_n(\tau_j^0-k_n,\tau_j^0) \xrightarrow{p} E(x_1x_1'\mid x_{1d}=\tau_j^0)f_d(\tau_j^0)$ and $\frac{1}{nk_n}X_n'(\tau_j^0,\tau_j^0+k_n)X_n(\tau_j^0,\tau_j^0+k_n) \xrightarrow{p} E(x_1x_1'\mid x_{1d}=\tau_j^0)f_d(\tau_j^0)$;

(ii) $\frac{1}{nk_n}\epsilon_n'(\tau_j^0-k_n,\tau_j^0)\epsilon_n(\tau_j^0-k_n,\tau_j^0) \xrightarrow{p} \sigma_j^2 f_d(\tau_j^0)$ and $\frac{1}{nk_n}\epsilon_n'(\tau_j^0,\tau_j^0+k_n)\epsilon_n(\tau_j^0,\tau_j^0+k_n) \xrightarrow{p} \sigma_{j+1}^2 f_d(\tau_j^0)$;

(iii) $\frac{1}{nk_n}\epsilon_n'(\tau_j^0-k_n,\tau_j^0)X_n(\tau_j^0-k_n,\tau_j^0) \xrightarrow{p} 0$ and $\frac{1}{nk_n}\epsilon_n'(\tau_j^0,\tau_j^0+k_n)X_n(\tau_j^0,\tau_j^0+k_n) \xrightarrow{p} 0$.

Proof. (i) is the same as in Lemma 3.7; hence it suffices to show the second statement in each of (ii) and (iii).

(ii) Noting for sufficiently large $n$ that $\epsilon_n(\tau_j^0,\tau_j^0+k_n) = \sigma_{j+1}e_n(\tau_j^0,\tau_j^0+k_n)$, it suffices to show that

\[ \frac{1}{nk_n}e_n'(\tau_j^0,\tau_j^0+k_n)e_n(\tau_j^0,\tau_j^0+k_n) \xrightarrow{p} f_d(\tau_j^0), \quad n \to \infty. \]

Let $y_{nt} = 1_{(x_{td}\in(\tau_j^0,\tau_j^0+k_n])}$, $\mu_n = E(y_{nt})$ and $v_n = \mathrm{Var}(y_{nt})$. Then,

\[ \mu_n = P\big(x_{td}\in(\tau_j^0,\tau_j^0+k_n]\big) = (f_d(\tau_j^0)+o(1))k_n, \]
\[ v_n = E\big(1_{(x_{td}\in(\tau_j^0,\tau_j^0+k_n])}\big) - \big[E\big(1_{(x_{td}\in(\tau_j^0,\tau_j^0+k_n])}\big)\big]^2 = \mu_n - \mu_n^2 = (f_d(\tau_j^0)+o(1))k_n. \]

In particular, $\mu_n/k_n \to f_d(\tau_j^0)$ as $n\to\infty$. It therefore suffices to show that

\[ \frac{1}{nk_n}\sum_{t=1}^n e_t^2 1_{(x_{td}\in(\tau_j^0,\tau_j^0+k_n])} - \mu_n/k_n \xrightarrow{p} 0, \quad n\to\infty, \]

or

\[ \frac{1}{nk_n}\sum_{t=1}^n\big(e_t^2y_{nt} - \mu_n\big) \xrightarrow{p} 0, \quad n\to\infty. \]

Since $E(e_t^2) = 1$ and hence $E(e_t^2y_{nt}) = E(e_t^2)E(y_{nt}) = \mu_n$, this last result would be implied by $\frac{1}{(nk_n)^2}\mathrm{Var}(\sum_{t=1}^n e_t^2y_{nt}) \to 0$. Note that

\[ \frac{1}{(nk_n)^2}\mathrm{Var}\Big(\sum_{t=1}^n e_t^2y_{nt}\Big) = \frac{1}{(nk_n)^2}\Big\{\mu_n^2\,\mathrm{Var}\Big(\sum_{t=1}^n e_t^2\Big) + v_n\,E\Big[\sum_{t=1}^n e_t^4\Big]\Big\} = O(1)\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t^2\Big) + o(1)E(e_1^4). \]

It remains to show that $\mathrm{Var}(\frac{1}{n}\sum_{t=1}^n e_t^2) = o(1)$ and $E(e_1^4) = O(1)$. To this end observe that $\sum_{j=0}^\infty\psi_j^4 \le (\sum_{j=0}^\infty|\psi_j|)^4 < \infty$, and hence by equation (4.8), $E(e_1^4) = O(1)$. Now,

\[ \sum_{j=0}^\infty\gamma^2(j) = \sum_{j=0}^\infty\Big(\sigma_\zeta^2\sum_{i=0}^\infty\psi_i\psi_{i+j}\Big)^2 \le \sigma_\zeta^4\sum_{j=0}^\infty\Big(\sum_{i=0}^\infty|\psi_i\psi_{i+j}|\Big)^2 \le C\sum_{j=0}^\infty\Big(\sum_{i=0}^\infty|\psi_{i+j}|\Big)^2 < \infty. \]

Consequently, $\sum_{j=-\infty}^\infty\gamma^2(j) = 2\sum_{j=0}^\infty\gamma^2(j) - \gamma^2(0) < \infty$, and hence, by equation (4.9), $\mathrm{Var}(\frac{1}{n}\sum_{t=1}^n e_t^2) = o(1)$.

(iii) Since $\epsilon_n(\tau_j^0,\tau_j^0+k_n) = \sigma_{j+1}e_n(\tau_j^0,\tau_j^0+k_n)$, it suffices to show that

\[ \frac{1}{nk_n}e_n'(\tau_j^0,\tau_j^0+k_n)X_n(\tau_j^0,\tau_j^0+k_n) \xrightarrow{p} 0, \quad n\to\infty, \]

or, for any $a \ne 0$,

\[ E\Big[\frac{1}{nk_n}e_n'(\tau_j^0,\tau_j^0+k_n)X_n(\tau_j^0,\tau_j^0+k_n)a\Big]^2 = o(1). \]

But

\[ E\big[a'x_1 1_{(x_{1d}\in(\tau_j^0,\tau_j^0+k_n])}\big] = \big(E[a'x_1\mid x_{1d}=\tau_j^0]f_d(\tau_j^0)+o(1)\big)k_n \]

and

\[ E\big[(a'x_1)^2 1_{(x_{1d}\in(\tau_j^0,\tau_j^0+k_n])}\big] = \big(E[(a'x_1)^2\mid x_{1d}=\tau_j^0]f_d(\tau_j^0)+o(1)\big)k_n. \]

Consequently,

\[ E\Big[\frac{1}{nk_n}\sum_{t=1}^n e_t(a'x_t)1_{(x_{td}\in(\tau_j^0,\tau_j^0+k_n])}\Big]^2 = o(1) + O(1)\frac{1}{n^2}\sum_{t>s}|\gamma(t-s)| = o(1) + O(1)\frac{1}{n^2}\sum_{k=1}^{n-1}(n-k)\sum_{i=0}^\infty|\psi_i\psi_{i+k}| \le o(1) + O\Big(\frac{1}{n}\Big)\Big(\sum_{i=0}^\infty|\psi_i|\Big)^2 = o(1). \]

This completes the proof. □

With Lemmas 3.6, 4.3, 4.7 and Theorems 4.2, 4.3, the proof of Theorem 4.4 is analogous to that of Theorem 3.4.

Proof of Theorem 4.4. By Theorem 4.1, the problem can be restricted to $\{\hat l = l^0\}$. Suppose for some $j$, $P(x_1'(\beta_{j+1}-\beta_j) \ne 0\mid x_{1d} = \tau_j^0) > 0$.
Hence $\Delta = E[(x_1'(\beta_{j+1}-\beta_j))^2\mid x_{1d} = \tau_j^0] > 0$. Let $\bar\beta(a,\eta)$ be the minimizer of $\|Y_n(a,\eta) - X_n(a,\eta)\beta\|^2$. Set $k_n = K\ln^2 n/n$ for $n = 1,2,\cdots$, where $K$ will be chosen later. The proofs of Lemma 3.6 and Theorem 4.3 show that if $a_n \to a$ and $\eta_n \to \eta$, then $\bar\beta(a_n,\eta_n) \xrightarrow{p} \bar\beta(a,\eta)$ as $n \to \infty$. Hence, for $\delta \in (0, \tau_j^0-\tau_{j-1}^0)$, $\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n) \xrightarrow{p} \bar\beta(\tau_{j-1}^0+\delta,\tau_j^0)$ as $n \to \infty$. By Assumption 4.2, for any sufficiently small $\delta > 0$, $E\{x_1x_1'1_{(x_{1d}\in(\tau_{j-1}^0+\delta,\tau_j^0])}\}$ is positive definite; hence, by the strong law of large numbers, $\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0) \xrightarrow{a.s.} \beta_j$ as $n \to \infty$. Therefore $\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n) \xrightarrow{p} \beta_j$. So, there exists a sufficiently small $\delta > 0$ such that for all sufficiently large $n$, $\|\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n) - \beta_j\| < \|\beta_j - \beta_{j+1}\|$ and

\[ \big(\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)-\beta_{j+1}\big)'E(x_1x_1'\mid x_{1d}=\tau_j^0)\big(\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)-\beta_{j+1}\big) > \Delta/2 \]

with probability approaching 1. Hence by Theorem 4.2, for any $\epsilon > 0$, there exists $N_1$ such that for $n > N_1$, with probability larger than $1-\epsilon$, we have

(i) $|\hat\tau_i - \tau_i^0| < \delta$, $i = 1,\cdots,l^0$;

(ii) $\|\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n) - \beta_{j+1}\|^2 < 2\|\beta_j - \beta_{j+1}\|^2$; and

(iii) $(\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)-\beta_{j+1})'E(x_1x_1'\mid x_{1d}=\tau_j^0)(\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)-\beta_{j+1}) > \Delta/2$.

Let $A_j = \{(\tau_1,\cdots,\tau_{l^0}) : |\tau_i - \tau_i^0| < \delta,\ i = 1,\cdots,l^0,\ |\tau_j - \tau_j^0| > k_n\}$, $j = 1,\cdots,l^0$. Since for the least squares estimates $\hat\tau_1,\cdots,\hat\tau_{l^0}$, $S_n(\hat\tau_1,\cdots,\hat\tau_{l^0}) \le S_n(\tau_1^0,\cdots,\tau_{l^0}^0)$,

\[ \inf_{(\tau_1,\ldots,\tau_{l^0})\in A_j}\big\{S_n(\tau_1,\ldots,\tau_{l^0}) - S_n(\tau_1^0,\ldots,\tau_{l^0}^0)\big\} > 0 \]

implies $(\hat\tau_1,\cdots,\hat\tau_{l^0}) \notin A_j$, or $|\hat\tau_j - \tau_j^0| \le k_n = K\ln^2 n/n$ when (i) holds. By (i), if we show that for each $j$ there exists $N \ge N_1$ such that for all $n \ge N$, with probability larger than $1-2\epsilon$, $\inf_{(\tau_1,\ldots,\tau_{l^0})\in A_j}\{S_n(\tau_1,\ldots,\tau_{l^0}) - S_n(\tau_1^0,\ldots,\tau_{l^0}^0)\} > 0$, we will have proved the desired result. Furthermore, by symmetry, we can consider the case when $\tau_j > \tau_j^0$ only. Hence $A_j$ may be replaced by $A_j' = \{(\tau_1,\cdots,\tau_{l^0}) : |\tau_i - \tau_i^0| < \delta,\ i = 1,\cdots,l^0,\ \tau_j - \tau_j^0 > k_n\}$. For any $(\tau_1,\cdots,\tau_{l^0}) \in A_j'$, let $\xi_1 \le \cdots \le \xi_{2l^0+1}$ be the set $\{\tau_1,\ldots,\tau_{l^0},\ \tau_1^0,\cdots,\tau_{j-1}^0,\ \tau_{j-1}^0+\delta,\ \tau_{j+1}^0-\delta,\ \tau_{j+1}^0,\cdots,\tau_{l^0}^0\}$ after ordering its elements, and let $\xi_0 = -\infty$, $\xi_{2l^0+2} = \infty$. Using Corollary 4.1 (ii) twice, we have

\[ \sum_{i=1}^{2l^0+2} S_n(\xi_{i-1},\xi_i) = S_n(\tau_1^0,\cdots,\tau_{l^0}^0) + O_p(\ln^2 n). \]

Thus,

\[ S_n(\tau_1,\cdots,\tau_{l^0}) \ge \sum_{i=1}^{2l^0+2} S_n(\xi_{i-1},\xi_i) + \big[S_n(\tau_{j-1}^0+\delta,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta)\big] - \big[S_n(\tau_{j-1}^0+\delta,\tau_j^0) + S_n(\tau_j^0,\tau_{j+1}^0-\delta)\big] \]
\[ = S_n(\tau_1^0,\cdots,\tau_{l^0}^0) + O_p(\ln^2 n) + \big[S_n(\tau_{j-1}^0+\delta,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta)\big] - \big[S_n(\tau_{j-1}^0+\delta,\tau_j^0) + S_n(\tau_j^0,\tau_{j+1}^0-\delta)\big], \]

where the $O_p(\ln^2 n)$ is independent of $(\tau_1,\cdots,\tau_{l^0}) \in A_j'$. It suffices to show that for $B_n = \{\tau_j : \tau_j \in (\tau_j^0+k_n,\tau_j^0+\delta)\}$ and sufficiently large $n$,

\[ \inf_{\tau_j\in B_n}\big\{S_n(\tau_{j-1}^0+\delta,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta) - \big[S_n(\tau_{j-1}^0+\delta,\tau_j^0) + S_n(\tau_j^0,\tau_{j+1}^0-\delta)\big]\big\} \ge M'\ln^2 n \tag{4.10} \]

with probability larger than $1-2\epsilon$ for some fixed $M' > 0$. Let

\[ S_n(a,\eta;\beta) = \|Y_n(a,\eta) - X_n(a,\eta)\beta\|^2 = \sum_{t=1}^n(y_t - x_t'\beta)^2 1_{(x_{td}\in(a,\eta])}. \]

Since $S_n(a,\eta) = S_n(a,\eta;\bar\beta(a,\eta))$, we have

\[ S_n(\tau_{j-1}^0+\delta,\tau_j) \ge S_n(\tau_{j-1}^0+\delta,\tau_j^0+k_n) + S_n(\tau_j^0+k_n,\tau_j) \]
\[ = S_n\big(\tau_{j-1}^0+\delta,\tau_j^0;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) + S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) + S_n(\tau_j^0+k_n,\tau_j) \tag{4.11} \]
\[ \ge S_n(\tau_{j-1}^0+\delta,\tau_j^0) + S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) + S_n(\tau_j^0+k_n,\tau_j). \]

And since $(\tau_j^0+k_n,\tau_{j+1}^0-\delta] \subset (\tau_j^0,\tau_{j+1}^0]$ for sufficiently large $n$,

\[ \epsilon_n(\tau_j^0+k_n,\tau_{j+1}^0-\delta) = \sigma_{j+1}e_n(\tau_j^0+k_n,\tau_{j+1}^0-\delta). \]
Applying Corollary 4.1 (i), we have

\[ 0 \le S_n(\tau_j^0+k_n,\tau_{j+1}^0-\delta;\beta_{j+1}) - \big[S_n(\tau_j^0+k_n,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta)\big] = T_n(\tau_j^0+k_n,\tau_j) + T_n(\tau_j,\tau_{j+1}^0-\delta). \]

By Lemma 4.3, the RHS is $O_p(\ln^2 n)$. Thus,

\[ S_n(\tau_j^0,\tau_{j+1}^0-\delta) \le S_n(\tau_j^0,\tau_{j+1}^0-\delta;\beta_{j+1}) = S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) + S_n(\tau_j^0+k_n,\tau_{j+1}^0-\delta;\beta_{j+1}) \]
\[ \le S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) + S_n(\tau_j^0+k_n,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta) + O_p(\ln^2 n), \]

where the $O_p(\ln^2 n)$ is independent of $\tau_j$. Hence

\[ S_n(\tau_j,\tau_{j+1}^0-\delta) \ge S_n(\tau_j^0,\tau_{j+1}^0-\delta) - S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) - S_n(\tau_j^0+k_n,\tau_j) + O_p(\ln^2 n). \tag{4.12} \]

Therefore, by (4.11) and (4.12),

\[ \big[S_n(\tau_{j-1}^0+\delta,\tau_j) + S_n(\tau_j,\tau_{j+1}^0-\delta)\big] - \big[S_n(\tau_{j-1}^0+\delta,\tau_j^0) + S_n(\tau_j^0,\tau_{j+1}^0-\delta)\big] \ge S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) - S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) + O_p(\ln^2 n). \]

Let $M > 0$ be such that the term $|O_p(\ln^2 n)| \le M\ln^2 n$ with probability larger than $1-\epsilon$ for all $n > N_1$. To show (4.10), it suffices to show that for sufficiently large $n$,

\[ S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) - S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) - M\ln^2 n \ge M'\ln^2 n, \]

or

\[ S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) - S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1}) \ge (M'+M)\ln^2 n \tag{4.13} \]

with large probability. Recall $S_n(a,\eta;\beta) = \|Y_n(a,\eta) - X_n(a,\eta)\beta\|^2$ and $Y_n(\tau_j^0,\tau_j^0+k_n) = X_n(\tau_j^0,\tau_j^0+k_n)\beta_{j+1} + \epsilon_n(\tau_j^0,\tau_j^0+k_n)$. Taking $K$ sufficiently large and applying (ii), (iii) and Lemma 4.7 (i), (iii), we can see that there exists $N \ge N_1$ such that for any $n \ge N$,

\[ \frac{1}{nk_n}\big[S_n\big(\tau_j^0,\tau_j^0+k_n;\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big) - S_n(\tau_j^0,\tau_j^0+k_n;\beta_{j+1})\big] \]
\[ = \frac{1}{nk_n}\big[\|Y_n(\tau_j^0,\tau_j^0+k_n) - X_n(\tau_j^0,\tau_j^0+k_n)\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\|^2 - \|Y_n(\tau_j^0,\tau_j^0+k_n) - X_n(\tau_j^0,\tau_j^0+k_n)\beta_{j+1}\|^2\big] \]
\[ = \frac{1}{nk_n}\big[\|\sigma_{j+1}e_n(\tau_j^0,\tau_j^0+k_n) + X_n(\tau_j^0,\tau_j^0+k_n)\big(\beta_{j+1}-\bar\beta(\tau_{j-1}^0+\delta,\tau_j^0+k_n)\big)\|^2 - \|\sigma_{j+1}e_n(\tau_j^0,\tau_j^0+k_n)\|^2\big] \ge \Delta/4 - \Delta/8 \ge (M'+M)/K \]

with probability larger than $1-2\epsilon$; here the quadratic term is at least $\Delta/4$ by (iii) and Lemma 4.7 (i), while the cross term is at least $-\Delta/8$ by (ii) and Lemma 4.7 (iii). Since $k_n = K\ln^2 n/n$, the above implies (4.13). □

The following lemma (cf. Hall and Heyde, 1980; Liu, 1991) plays an important role in establishing the central limit theorem for the sample moments involving the $\{e_t\}$. Before we state the lemma, we need to introduce some notation. Let $T$ be an ergodic one-to-one measure-preserving transformation on the probability space $(\Omega,\mathcal{F},P)$. Suppose $\mathcal{U}_0$ is a sub-$\sigma$-field of $\mathcal{F}$ satisfying $\mathcal{U}_0 \subseteq T^{-1}(\mathcal{U}_0)$. Also suppose that $Z_0$ is a square integrable r.v. defined on $(\Omega,\mathcal{F},P)$ with $E(Z_0) = 0$, and that $\{Z_t\}$ is a sequence of r.v.'s defined by $Z_t = Z_0(T^t\omega)$, $\omega \in \Omega$. Let $\mathcal{U}_k = T^{-k}(\mathcal{U}_0)$, $k = 0,\pm1,\cdots$.

Lemma 4.8. Suppose that $\mathcal{U}_0 \subseteq T^{-1}(\mathcal{U}_0)$ and put $\mathcal{U}_k = T^{-k}(\mathcal{U}_0)$. Let $E(Z_0^2) < \infty$ and $E(Z_0) = 0$. If

\[ \sum_{m=1}^{\infty}\Big\{\big(E[E(Z_0\mid\mathcal{U}_{-m})^2]\big)^{1/2} + \big(E[Z_0 - E(Z_0\mid\mathcal{U}_m)]^2\big)^{1/2}\Big\} < \infty, \]

then $\sigma^{*2} := \lim_{n\to\infty}E(S_n^2)/n$ exists, where $S_n := \sum_{t=1}^n Z_t$. Further,

\[ \frac{S_n}{\sqrt n} \xrightarrow{d} N(0,\sigma^{*2}). \]

Proof. The proof is obtained from Hall and Heyde (1980, Theorem 5.5 and Corollary 5.4) or Liu (1991, Theorem 4.1). □

Proposition 4.2 (Brockwell and Davis, 1987, Remark 2, p. 212). Let

\[ e_t = \sum_{j=-\infty}^{\infty}\psi_j\zeta_{t-j}, \]

where $\{\zeta_t\}$ is an iid sequence of random variables each with mean zero and variance $\sigma_\zeta^2$. If $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$, then $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$ and

\[ \lim_{n\to\infty}n\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t\Big) = \sum_{h=-\infty}^{\infty}\gamma(h) = \sigma_\zeta^2\Big(\sum_{j=-\infty}^{\infty}\psi_j\Big)^2. \]

To facilitate the statement of the next result, let

\[ G_j = E\big(x_1x_1'1_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big), \qquad \Gamma_j = G_j + 2\sum_{i=1}^{\infty}\gamma(i)E\big(x_11_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big)E\big(x_1'1_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big), \]

and

\[ \Sigma_j = \sigma_j^2G_j^{-1}\Gamma_jG_j^{-1}, \]

where $\gamma(i) = E(e_1e_{1+i})$ and $j = 1,\cdots,l^0+1$. Also recall that for each $j = 1,\cdots,l^0+1$, $\hat\beta_j^*$ is the least squares estimate of $\beta_j$ given the $\tau_j^0$'s.

Lemma 4.9. Under Assumptions 4.0, 4.1 and 4.3,

\[ \sqrt n\big(\hat\beta_j^* - \beta_j\big) \xrightarrow{d} N(0,\Sigma_j), \qquad j = 1,\cdots,l^0+1. \]

Proof: First, we shall show that

\[ \frac{1}{\sqrt n}\sum_{t=1}^n x_te_t1_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} \xrightarrow{d} N(0,\Gamma_j). \]

It suffices to show that for any constant vector $a$,

\[ \frac{1}{\sqrt n}\sum_{t=1}^n a'x_te_t1_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} \xrightarrow{d} N(0,\sigma_a^2), \]

where $\sigma_a^2 = a'\Gamma_ja$. By Assumption 4.3, $\{x_t\}_{-\infty}^{\infty}$ is an iid sequence of random variables.
Let $\mathcal{F}_t = \sigma(\zeta_s,x_s,\ s\le t)$ denote the $\sigma$-field generated by $\{\zeta_s,x_s,\ s\le t\}$, and $Z_t = a'x_te_t1_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])}$ for a given constant vector $a$. To show that $n^{-1/2}\sum_{t=1}^n Z_t$ has an asymptotic normal distribution, one needs to verify the conditions of Lemma 4.8. Thus, it suffices to show that $EZ_0 = 0$, $EZ_0^2 < \infty$, $\sum_{m=1}^{\infty}(E[E(Z_0\mid\mathcal{F}_{-m})^2])^{1/2} < \infty$, and

\[ \sum_{m=1}^{\infty}\big(E[Z_0 - E(Z_0\mid\mathcal{F}_m)]^2\big)^{1/2} < \infty. \tag{4.14} \]

Observe that $EZ_0 = a'E(x_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])})Ee_0 = 0$ and $EZ_0^2 = a'E(x_0x_0'1_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])})a\,Ee_0^2 < \infty$. Also, for $m \ge 1$, $Z_0 = a'x_0e_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])}$ is $\mathcal{F}_m$-measurable, hence $Z_0 - E(Z_0\mid\mathcal{F}_m) = Z_0 - Z_0 = 0$. So (4.14) is trivial. It remains to show that $\sum_{m=1}^{\infty}(E[E(Z_0\mid\mathcal{F}_{-m})^2])^{1/2} < \infty$. Now, note that since $E(Z_0\mid\mathcal{F}_{-m}) = E(a'x_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])})\sum_{i=m}^{\infty}\psi_i\zeta_{-i}$,

\[ E[E(Z_0\mid\mathcal{F}_{-m})^2] = \big[E\big(a'x_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])}\big)\big]^2E\Big[\sum_{i=m}^{\infty}\psi_i\zeta_{-i}\Big]^2 = \big[E\big(a'x_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])}\big)\big]^2\sigma_\zeta^2\sum_{i=m}^{\infty}\psi_i^2 = c_a^2\sum_{i=m}^{\infty}\psi_i^2, \]

where $c_a^2 = [E(a'x_01_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])})]^2\sigma_\zeta^2$. Thus

\[ \sum_{m=1}^{\infty}\big(E[E(Z_0\mid\mathcal{F}_{-m})^2]\big)^{1/2} = |c_a|\sum_{m=1}^{\infty}\Big(\sum_{i=m}^{\infty}\psi_i^2\Big)^{1/2} < \infty \]

under our assumption that $|\psi_i| \le k_\psi/(i+1)^\delta$ for all $i$. Indeed, replacing the $\delta$ in equation (4.3) with $2\delta$, we obtain that

\[ \sum_{i=m}^{\infty}\frac{1}{(i+1)^{2\delta}} \le \frac{1}{(2\delta-1)m^{2\delta-1}}. \tag{4.15} \]

Since $(2\delta-1)/2 > 1$, $\sum_{m=1}^{\infty}m^{-(2\delta-1)/2} < \infty$. This shows that $n^{-1/2}\sum_{t=1}^n Z_t$ has an asymptotic normal distribution. We next calculate the asymptotic variance of $n^{-1/2}\sum_{t=1}^n Z_t$. By Lemma 4.8, it is

\[ \lim_{n\to\infty}\frac{1}{n}E\Big(\sum_{t=1}^n Z_t\Big)^2 = E\big[(a'x_1)^21_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big]Ee_1^2 + \big[E\big(a'x_11_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big)\big]^2\lim_{n\to\infty}\frac{1}{n}\Big[E\Big(\sum_{t=1}^n e_t\Big)^2 - \sum_{t=1}^n Ee_t^2\Big] \]
\[ = a'G_ja + a'\big[E\big(x_11_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big)E\big(x_1'1_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])}\big)\big]a\Big[\lim_{n\to\infty}n\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t\Big) - 1\Big], \]

where $\lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^n Ee_t^2 = Ee_1^2 = 1$ by our assumption. By Proposition 4.2,

\[ \lim_{n\to\infty}n\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t\Big) = \sum_{i=-\infty}^{\infty}\gamma(i). \]

Hence $\lim_{n\to\infty}n\mathrm{Var}(\frac{1}{n}\sum_{t=1}^n e_t) - 1 = \sum_{i=-\infty}^{\infty}\gamma(i) - \gamma(0) = 2\sum_{i=1}^{\infty}\gamma(i)$, and

\[ \lim_{n\to\infty}\frac{1}{n}E\Big(\sum_{t=1}^n Z_t\Big)^2 = a'\Gamma_ja, \]

which is $\sigma_a^2$. By the strong law of large numbers for ergodic sequences,

\[ \frac{1}{n}X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0) \to G_j \quad\text{a.s.} \]

as $n\to\infty$. With sufficiently large $n$, $(X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0))^{-1}$ exists a.s., and

\[ \Big(\frac{1}{n}X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0)\Big)^{-1} \to G_j^{-1} \quad\text{a.s.} \]

as $n\to\infty$. Hence,

\[ \hat\beta_j^* = \big(X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0)\big)^{-1}\big(X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0)\beta_j + X_n'(\tau_{j-1}^0,\tau_j^0)\epsilon_n\big) = \beta_j + \sigma_j\big(X_n'(\tau_{j-1}^0,\tau_j^0)X_n(\tau_{j-1}^0,\tau_j^0)\big)^{-1}X_n'(\tau_{j-1}^0,\tau_j^0)e_n. \]

Since $\sigma_j^2G_j^{-1}\big[G_j + 2\sum_{i=1}^{\infty}\gamma(i)E(x_11_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])})E(x_1'1_{(x_{1d}\in(\tau_{j-1}^0,\tau_j^0])})\big]G_j^{-1} = \Sigma_j$,

\[ \sqrt n\big(\hat\beta_j^* - \beta_j\big) \xrightarrow{d} N(0,\Sigma_j). \]

This completes the proof. □

Lemma 4.10. Under the conditions of Lemma 4.9,

\[ \frac{1}{\sqrt n}\sum_{t=1}^n\big(e_t^21_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} - p_j\big) \xrightarrow{d} N(0,v_j) \quad\text{as } n\to\infty, \]

where $v_j = p_j(1-p_j)E(e_1^4) + p_j^2\big[(\eta-3)\gamma^2(0) + 2\sum_{i=-\infty}^{\infty}\gamma^2(i)\big]$ and $p_j = P(\tau_{j-1}^0 < x_{1d} \le \tau_j^0)$.

Proof. Let $\mathcal{F}_t = \sigma(\zeta_s,x_s,\ s\le t)$ be the $\sigma$-field generated by $\{\zeta_s,x_s,\ s\le t\}$ and $Z_t = e_t^21_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} - p_j$. To show that $n^{-1/2}\sum_{t=1}^n Z_t$ has the asymptotic normal distribution, one needs to verify that the conditions of Lemma 4.8 obtain. That is, it must be shown that $EZ_0 = 0$, $EZ_0^2 < \infty$,

\[ \sum_{m=1}^{\infty}\big(E[E(Z_0\mid\mathcal{F}_{-m})^2]\big)^{1/2} < \infty, \]

and

\[ \sum_{m=1}^{\infty}\big(E[Z_0 - E(Z_0\mid\mathcal{F}_m)]^2\big)^{1/2} < \infty, \]

the latter having the appearance of (4.14). We obtain $EZ_0 = Ee_0^2\,E1_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])} - p_j = 1\cdot p_j - p_j = 0$, and

\[ EZ_0^2 = E\big(e_0^21_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])} - p_j\big)^2 = E\big(e_0^41_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])}\big) + p_j^2 - 2p_jE\big(e_0^21_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])}\big) = p_jEe_0^4 - p_j^2 < \infty. \]

Also, for $m \ge 1$, $Z_0$ is $\mathcal{F}_m$-measurable. Hence, $Z_0 - E(Z_0\mid\mathcal{F}_m) = Z_0 - Z_0 = 0$. So (4.14) is trivial. It remains only to show that $\sum_{m=1}^{\infty}(E[E(Z_0\mid\mathcal{F}_{-m})^2])^{1/2} < \infty$. Recall that $E(e_1^2) = \sigma_\zeta^2\sum_{i=0}^{\infty}\psi_i^2$ is assumed to be 1. Hence,

\[ E[E(Z_0\mid\mathcal{F}_{-m})^2] = E\big[E\big(e_0^21_{(x_{0d}\in(\tau_{j-1}^0,\tau_j^0])} - p_j\mid\mathcal{F}_{-m}\big)\big]^2 = E\big[p_jE(e_0^2\mid\mathcal{F}_{-m}) - p_j\big]^2 \]
\[ = p_j^2E\Big[E\Big(\Big(\sum_{i=0}^{\infty}\psi_i\zeta_{-i}\Big)^2\,\Big|\,\mathcal{F}_{-m}\Big) - 1\Big]^2 = p_j^2E\Big[\sigma_\zeta^2\sum_{i=0}^{m-1}\psi_i^2 + \Big(\sum_{i=m}^{\infty}\psi_i\zeta_{-i}\Big)^2 - 1\Big]^2 \]
\[ = p_j^2E\Big[\Big(\sum_{i=m}^{\infty}\psi_i\zeta_{-i}\Big)^2 - \sigma_\zeta^2\sum_{i=m}^{\infty}\psi_i^2\Big]^2 = p_j^2\Big[E\Big(\sum_{i=m}^{\infty}\psi_i\zeta_{-i}\Big)^4 - \Big(\sigma_\zeta^2\sum_{i=m}^{\infty}\psi_i^2\Big)^2\Big]. \]

Using equation (4.8) with $\psi_i = 0$ for $i < m$, we have

\[ E\Big(\sum_{i=m}^{\infty}\psi_i\zeta_{-i}\Big)^4 = 3\Big(\sigma_\zeta^2\sum_{i=m}^{\infty}\psi_i^2\Big)^2 + (\eta-3)\sigma_\zeta^4\sum_{i=m}^{\infty}\psi_i^4, \]

so that

\[ E[E(Z_0\mid\mathcal{F}_{-m})^2] \le p_j^2(\eta-1)\sigma_\zeta^4\Big(\sum_{i=m}^{\infty}\psi_i^2\Big)^2 \le p_j^2(\eta-1)\sigma_\zeta^4k_\psi^4\Big(\sum_{i=m}^{\infty}\frac{1}{(i+1)^{2\delta}}\Big)^2. \]

By (4.15), $\sum_{i=m}^{\infty}(i+1)^{-2\delta} \le 1/((2\delta-1)m^{2\delta-1})$. Thus,

\[ \sum_{m=1}^{\infty}\big(E[E(Z_0\mid\mathcal{F}_{-m})^2]\big)^{1/2} \le \sum_{m=1}^{\infty}p_j\sqrt{\eta-1}\,\sigma_\zeta^2k_\psi^2\frac{1}{(2\delta-1)m^{2\delta-1}} < \infty. \]

Finally,

\[ v_j = \lim_{n\to\infty}\frac{1}{n}E\Big(\sum_{t=1}^n Z_t\Big)^2 = \lim_{n\to\infty}\frac{1}{n}E\Big(\sum_{t=1}^n\big(e_t^21_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} - p_j\big)\Big)^2. \]

Expanding the square and using the independence of $\{x_t\}$ and $\{e_t\}$, the diagonal terms contribute $p_jE(e_1^4) - p_j^2$ and the off-diagonal terms contribute $p_j^2[\lim_{n\to\infty}n\mathrm{Var}(\frac{1}{n}\sum_{t=1}^n e_t^2) - \mathrm{Var}(e_1^2)]$, whence

\[ v_j = p_j(1-p_j)E(e_1^4) + p_j^2\lim_{n\to\infty}n\,\mathrm{Var}\Big(\frac{1}{n}\sum_{t=1}^n e_t^2\Big). \]

By equation (4.9), $\lim_{n\to\infty}n\mathrm{Var}(\frac{1}{n}\sum_{t=1}^n e_t^2) = (\eta-3)\gamma^2(0) + 2\sum_{i=-\infty}^{\infty}\gamma^2(i)$. This completes the proof. □

Proof of Theorem 4.5. We shall show the conclusion for the $\hat\beta_j$'s first. Let $\hat\beta_j^*$ denote the least squares estimate of $\beta_j$ when $(\tau_1^0,\cdots,\tau_{l^0}^0)$ is known, $j = 1,\cdots,l^0+1$. By Lemma 4.9, it suffices to show that $\hat\beta_j$ and $\hat\beta_j^*$ share the same asymptotic distribution for all $j$. In turn, it suffices to show that $\hat\beta_j - \hat\beta_j^* = o_p(n^{-1/2})$. Set $X_j^* = I_n(\tau_{j-1}^0,\tau_j^0)X_n$ and $\hat X_j = I_n(\hat\tau_{j-1},\hat\tau_j)X_n$. Then,

\[ \hat\beta_j - \hat\beta_j^* = \big[(\tfrac{1}{n}\hat X_j'\hat X_j)^- - (\tfrac{1}{n}X_j^{*\prime}X_j^*)^-\big]\big\{\tfrac{1}{n}(\hat X_j - X_j^*)'Y_n + \tfrac{1}{n}X_j^{*\prime}Y_n\big\} + \big[(\tfrac{1}{n}X_j^{*\prime}X_j^*)^-\big]\big[\tfrac{1}{n}(\hat X_j - X_j^*)'Y_n\big] =: (I)\{(II)+(III)\} + (IV)(II), \]

where $(I) = [(\tfrac{1}{n}\hat X_j'\hat X_j)^- - (\tfrac{1}{n}X_j^{*\prime}X_j^*)^-]$, $(II) = \tfrac{1}{n}(\hat X_j - X_j^*)'Y_n$, $(III) = \tfrac{1}{n}X_j^{*\prime}Y_n$ and $(IV) = [(\tfrac{1}{n}X_j^{*\prime}X_j^*)^-]$. As in the proof of Theorem 4.3, both $(III)$ and $(IV)$ are $O_p(1)$, and the order $o_p(n^{-1/2})$ of $(I)$ and $(II)$ follows from Lemma 3.6 by taking $a_n = \ln^2 n/n$, $z_t = (a'x_t)^2$ and $z_t = a'x_ty_t$ respectively, for any real vector $a$ and $u > 2$. Thus, $\hat\beta_j - \hat\beta_j^* = o_p(n^{-1/2})$.

Next, we prove the conclusion for the $\hat\sigma_j^2$'s. Let $\hat\sigma_j^{2*}$ denote the least squares estimate of $\sigma_j^2$ when $(\tau_1^0,\cdots,\tau_{l^0}^0)$ is known, $j = 1,\cdots,l^0+1$. By Lemma 4.3, $T_n(\tau_{j-1}^0,\tau_j^0) = O_p(\ln^2 n)$. Hence,

\[ \hat\sigma_j^{2*} = \frac{1}{n_j}S_n(\tau_{j-1}^0,\tau_j^0) = \frac{1}{n_j}\sum_{t=1}^n\epsilon_t^21_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} + O_p(\ln^2 n/n). \]

By Lemma 4.10,

\[ \frac{1}{\sqrt n}\sum_{t=1}^n\big(\epsilon_t^21_{(x_{td}\in(\tau_{j-1}^0,\tau_j^0])} - p_j\sigma_j^2\big) \xrightarrow{d} N(0,v_j\sigma_j^4). \]

Therefore

\[ \frac{1}{\sqrt n}\big(S_n(\tau_{j-1}^0,\tau_j^0) - np_j\sigma_j^2\big) \xrightarrow{d} N(0,v_j\sigma_j^4), \]

and hence

\[ \sqrt n\,p_j\big(\hat\sigma_j^{2*} - \sigma_j^2\big) \xrightarrow{d} N(0,v_j\sigma_j^4). \]

It remains to show that $\hat\sigma_j^2 - \hat\sigma_j^{2*} = o_p(n^{-1/2})$. As in the proof of Theorem 4.3, it suffices to show that $\tfrac{1}{n}[S_n(\hat\tau_{j-1},\hat\tau_j) - S_n(\tau_{j-1}^0,\tau_j^0)] = o_p(n^{-1/2})$. By equation (4.7),

\[ \frac{1}{n}\big[S_n(\hat\tau_{j-1},\hat\tau_j) - S_n(\tau_{j-1}^0,\tau_j^0)\big] = \frac{1}{n}\sum_{t=1}^n y_t^2\big(1_{(x_{td}\in\hat R_j)} - 1_{(x_{td}\in R_j^0)}\big) - \big\{((II)+(III))'[(I)+(IV)](II) + ((II)+(III))'[(I)](III) + (II)'(IV)(III)\big\}. \]

Taking $a_n = \ln^2 n/n$, $u > 2$ and $z_t = y_t^2$, it follows from Lemma 3.6 that $\tfrac{1}{n}\sum_{t=1}^n y_t^2(1_{(x_{td}\in\hat R_j)} - 1_{(x_{td}\in R_j^0)}) = o_p(n^{-1/2})$. Also, it is shown in the proof of Theorem 4.3 that both $(III)$ and $(IV)$ are $O_p(1)$, and the order $o_p(n^{-1/2})$ of $(I)$ and $(II)$ follows from Lemma 3.6 by taking $a_n = \ln^2 n/n$, $z_t = (a'x_t)^2$ and $z_t = a'x_ty_t$ respectively, for any real vector $a$ and $u > 2$. This shows that $\hat\sigma_j^2 - \hat\sigma_j^{2*} = o_p(n^{-1/2})$. □

Proof of Theorem 4.6. For $d = d^0$, by Lemma 4.5 (ii),

\[ \frac{1}{n}S_n^{d^0} \xrightarrow{p} \sigma_0^2. \]

For $d \ne d^0$, we shall show that $\tfrac{1}{n}S_n^d \ge \sigma_0^2 + C$ for some constant $C > 0$ with probability approaching 1. Again, $p = 1$ is assumed for simplicity. If $d \ne d^0$, by the identifiability of $d^0$, for any $\{R_j^d\}_{j=1}^{L+1}$ there exist $r, s \in \{1,\cdots,L+1\}$ such that $R_r^d \supset A_s$, where $A_s$ is defined in Theorem 2.1. Let $B_s = \{(\tau_1,\ldots,\tau_L) : R_r^d \supset A_s\ \text{for some } r\}$. Then any $(\tau_1,\ldots,\tau_L)$ satisfies $(\tau_1,\ldots,\tau_L) \in B_s$ for at least one $s \in \{1,\cdots,L+1\}$.
Since $\hat d$ is chosen such that $S_n^{\hat d} \le S_n^d$ for all $d$, it suffices to show that for $d \ne d^0$ and each $s$, there exists $C_s > 0$ such that

\[ \inf_{(\tau_1,\ldots,\tau_L)\in B_s}\frac{1}{n}S_n^d(\tau_1,\ldots,\tau_L) \ge \sigma_0^2 + C_s \tag{4.16} \]

with probability approaching 1 as $n \to \infty$. For any $(\tau_1,\ldots,\tau_L) \in B_s$, let $R_{L+2}^d = \{x : x_d \in (\tau_{r-1},a_s]\}$ and $R_{L+3}^d = \{x : x_d \in (b_s,\tau_r]\}$. Then $R_r^d = R_{L+2}^d \cup A_s \cup R_{L+3}^d$. From Lemma 4.3 and the proof of Lemma 3.2', we can see that the conclusion of Lemma 3.2' still holds under the current assumptions. Hence, the conclusions of Proposition 3.1' and Lemma 3.3' also hold. Therefore, by (3.13),

\[ \frac{1}{n}S_n^d(\tau_1,\ldots,\tau_L) = \sigma_0^2 + o_p(1) + \frac{1}{n}\big[S_n(A_s) - S_n(A_s\cap R_1^0) - S_n(A_s\cap R_2^0)\big]. \]

Now it remains to show that $\tfrac{1}{n}[S_n(A_s) - S_n(A_s\cap R_1^0) - S_n(A_s\cap R_2^0)] \ge C_s$ for some $C_s > 0$, with probability approaching 1. By Theorem 2.1, $E[x_1x_1'1_{(x_1\in A_s\cap R_i^0)}]$, $i = 1,2$, are positive definite. Applying Lemma 3.3' we obtain the desired result. □

Chapter 5

SUMMARY AND FUTURE RESEARCH

5.1 A brief summary of previous chapters

In this thesis, we propose a set of procedures for estimating the parameters of a segmented regression model. The consistency of the estimators is established under fairly general conditions. For the "basic" model, where the noise is an iid sequence and locally exponentially bounded, it is shown that if the model is discontinuous at a threshold, then the least squares estimate of the threshold converges at the rate of $O_p(\ln^2 n/n)$. For both continuous and discontinuous models, the asymptotic normality of the estimated regression coefficients and the noise variance is established. The least squares "identifier" of the segmentation variable is shown to be consistent if the segmentation variable is asymptotically identifiable. A more efficient method of identifying the segmentation variable is given under stronger conditions. Most of these results are generalized to the case where the noise is heteroscedastic and autocorrelated. A simulation study is carried out to demonstrate the small-sample behavior of the proposed estimators. The proposed procedures perform reasonably well in identifying the models, but they indicate the need for large sample sizes when estimating the thresholds.

5.2 Future research on the current model

First, further work on choosing $\delta_0$ and $c_0$ in the MIC is needed. One way to reduce the risk of mis-specifying the model is to try different $(\delta_0, c_0)$ values over a certain range. If several $(\delta_0, c_0)$ pairs produce the same $\hat l$, we would be more confident of our choice. Otherwise, different models can be fitted, and the estimated regression coefficients and noise variances may then indicate which $(\delta_0, c_0)$ is more appropriate. In particular, when the noise is autocorrelated, recursive estimation procedures need to be investigated.

Second, the asymptotic normality of the estimated regression coefficients for continuous models needs to be generalized to the case where the noise is heteroscedastic and autocorrelated. The techniques used in Sections 3.5 and 4.5 are useful, but additional tools are needed, such as a central limit theorem for a double array of martingale sequences.

Third, the local exponential boundedness assumption made on the noise may be relaxed. Note that this assumption implies that $\epsilon_1$ has moments of every order. If $\epsilon_1$ is assumed to have only moments up to a finite order, a model selection criterion with a penalty term of the form $Cn^\alpha$ ($0 < \alpha < 1$) may well be consistent. This has been shown by Yao (1989) for a one-dimensional step function with fixed covariates and iid noise.
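To make the role of $(\delta_0, c_0)$ concrete, the following is a minimal sketch of the criterion computation in Python. It assumes segments are fit by ordinary least squares on a single segmentation variable, takes the penalty to be linear in the number of thresholds $l$ (as in the comparison of $MIC(l)$ and $MIC(l^0)$ above), and uses illustrative default values for $c_0$ and $\delta_0$; the function names, the exhaustive grid search, and the defaults are ours, not the thesis's algorithm.

```python
import numpy as np
from itertools import combinations

def segment_sse(X, y):
    # Residual sum of squares of an OLS fit on one segment; lstsq's
    # pseudo-inverse guards against rank deficiency on short segments.
    if len(y) == 0:
        return 0.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def mic(X, y, d, taus, c0=0.3, delta0=0.1):
    # MIC for thresholds `taus` on column `d` of the design matrix:
    #   ln(sigma_hat^2) + l * c0 * (ln n)^(2 + delta0) / n.
    n = len(y)
    edges = [-np.inf] + sorted(taus) + [np.inf]
    sse = sum(segment_sse(X[(X[:, d] > a) & (X[:, d] <= b)],
                          y[(X[:, d] > a) & (X[:, d] <= b)])
              for a, b in zip(edges[:-1], edges[1:]))
    return np.log(sse / n) + len(taus) * c0 * np.log(n) ** (2 + delta0) / n

def choose_l(X, y, d, grid, L=3, **kw):
    # l_hat = argmin over l <= L of the best MIC over threshold sets drawn
    # from a candidate grid (exhaustive, so keep L small and the grid coarse).
    best = min((mic(X, y, d, list(taus), **kw), l)
               for l in range(L + 1)
               for taus in combinations(grid, l))
    return best[1]
```

Re-running `choose_l` for a few $(c_0, \delta_0)$ pairs and checking whether $\hat l$ stays the same is one way to carry out the sensitivity check suggested above.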
5.3 Further generalizations

Further generalization of the segmented regression model will enable broader applications. First, there may be more than one segmentation variable. For example, changes in economic policy may be triggered by simultaneous extremes in a number of key economic indices. The results in this thesis may be generalized to the case where more than one segmentation variable is present. Further, since sometimes there is no reason to believe that the segmentation has to be parallel to any of the axes, a threshold defined in terms of a linear combination of explanatory variables may be appropriate. A least squares approach or that of Goldfeld and Quandt (1972, 1973a) can be applied. Large sample properties of the estimators given by these approaches would need to be investigated.

In many economic problems, the explanatory variables exhibit certain kinds of dependence over time. The explanatory variables and the noise may also be dependent. Our results can be generalized in this direction, since the iid assumption on $\{x_t\}$ is not essential. Once such generalizations are accomplished, we expect this model to be useful for many economic problems, since many economic policies and business decisions are threshold-based, at least to some extent. In fact, the segmented regression model has been applied to a foreign exchange rate problem by Liu and Susko (1992) with significantly better results than other approaches reported in the literature, and the need for a theoretical justification for this approach is obvious.

If $y_t$ and $x_{ti}$ in Model 2.1 are replaced by $x_t$ and $x_{t-i}$ respectively ($i = 1,\cdots,p$), where $\{x_t\}$ is a time series, then the model becomes a threshold autoregressive model. This interesting class of nonlinear time series models has been studied by many authors; see, for example, Tong (1987) for a review of some recent work on nonlinear time series analysis. Because this model is very similar to ours in its structure, the approaches used in this thesis may also shed some light on its model selection problem and on the large sample properties of its least squares estimates. In particular, we expect that a criterion similar to the MIC can be used to select the number of thresholds for the threshold autoregressive model.

REFERENCES

Bacon, D.W. and Watts, D.G. (1971). Estimating the transition between two intersecting straight lines. Biometrika, 58, 525-534.

Bellman, R. (1969). Curve fitting by segmented straight lines. J. Amer. Statist. Assoc., 64, 1079-1084.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, N.Y.

Breiman, L. and Meisel, W.S. (1976). General estimates of the intrinsic variability of data in nonlinear regression models. J. Amer. Statist. Assoc., 71, 301-307.

Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. Springer-Verlag, N.Y.

Broemeling, L.D. (1974). Bayesian inferences about a changing sequence of random variables. Commun. Statist., 3, 234-255.

Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc., 74, 829-836.

Cleveland, W.S. and Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local fitting. J. Amer. Statist. Assoc., 83, 596-610.

Dunicz, B.L. (1969). Discontinuities in the surface structure of alcohol-water mixtures. Kolloid-Zeitschr. u. Zeitschrift f. Polymere, 230, 346-357.

Ertel, J.E. and Fowlkes, E.B. (1976). Some algorithms for linear spline and piecewise multiple linear regression. J. Amer. Statist. Assoc., 71, 640-648.
Farley, J.U. and Hinich, M.J. (1970). A test for a shifting slope coefficient in a linear model. J. Amer. Statist. Assoc., 65, 1320-1329.

Feder, P.I. and Sylwester, D.L. (1968). On the asymptotic theory of least squares estimation in segmented regression: identified case (preliminary report). Abstracted in Ann. Math. Statist., 39, 1362.

Feder, P.I. (1975a). On asymptotic distribution theory in segmented regression problems - identified case. Ann. Statist., 3, 49-83.

Feder, P.I. (1975b). The log likelihood ratio in segmented regression. Ann. Statist., 3, 84-97.

Ferreira, P.E. (1975). A Bayesian analysis of a switching regression model: Known number of regimes. J. Amer. Statist. Assoc., 70, 370-374.

Friedman, J.H. (1988). Multivariate Adaptive Regression Splines. Report 102, Department of Statistics, Stanford University.

Friedman, J.H. (1991). Multivariate Adaptive Regression Splines. Ann. Statist., 19, 1-141.

Gallant, A.R. and Fuller, W.A. (1973). Fitting segmented polynomial regression models whose join points have to be estimated. J. Amer. Statist. Assoc., 68, 144-147.

Goldfeld, S.M. and Quandt, R.E. (1972). Nonlinear Methods in Econometrics. North-Holland Publishing Co.

Goldfeld, S.M. and Quandt, R.E. (1973a). The estimation of structural shifts by switching regressions. Ann. Econ. Soc. Measurement, 2, 475-485.

Goldfeld, S.M. and Quandt, R.E. (1973b). A Markov model for switching regressions. Journal of Econometrics, 1, 3-16.

Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. Academic Press.

Hawkins, D.M. (1980). A note on continuous and discontinuous segmented regressions. Technometrics, 22, 443-444.

Henderson, H.V. and Velleman, P.F. (1981). Building multiple regression models interactively. Biometrics, 37, 391-411.

Henderson, R. (1986). Change-point problem with correlated observations, with an application in material accountancy. Technometrics, 28, 381-389.

Hinkley, D.V. (1969). Inference about the intersection in two-phase regression. Biometrika, 56, 495-504.

Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57, 1-17.

Holbert, D. and Broemeling, L. (1977). Bayesian inferences related to shifting sequences and two-phase regression. Commun. Statist. Theor. Meth., A6(3), 265-275.

Hudson, D.J. (1966). Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc., 61, 1097-1129.

Jennrich, R.J. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist., 40, 633-643.

Liu, J. and Liu, Z. (1991). Higher order moments and limit theory of a general bilinear time series. Unpublished manuscript.

Liu, J. and Susko, E.A. (1992). Forecasting exchange rates using segmented time series regression models - a nonlinear multi-country model. Unpublished manuscript.

MacNeill, I.B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist., 6, 422-433.

McGee, V.E. and Carleton, W.T. (1970). Piecewise regression. J. Amer. Statist. Assoc., 65, 1109-1124.

Miao, B.Q. (1988). Inference in a model with at most one slope-change point. Journal of Multivariate Analysis, 27, 375-391.

Müller, H.G. and Stadtmüller, U. (1987). Estimation of heteroscedasticity in regression analysis. Ann. Statist., 15, 610-625.
Poirier, D.J. (1973). Piecewise regression using cubic splines. J. Amer. Statist. Assoc., 68, 515-524.

Quandt, R.E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Amer. Statist. Assoc., 53, 873-880.

Quandt, R.E. (1960). Tests of the hypothesis that a linear regression system obeys two separate regimes. J. Amer. Statist. Assoc., 55, 324-330.

Quandt, R.E. (1972). A new approach to estimating switching regressions. J. Amer. Statist. Assoc., 67, 306-310.

Quandt, R.E. and Ramsey, J.B. (1978). Estimating mixtures of normal distributions and switching regressions (with discussion). J. Amer. Statist. Assoc., 73, 730-752.

Robison, D.E. (1964). Estimates for the points of intersection of two polynomial regressions. J. Amer. Statist. Assoc., 59, 214-224.

Sacks, J. and Ylvisaker, D. (1978). Linear estimation for approximately linear models. Ann. Statist., 6, 1122-1137.

Schulze, U. (1984). A method of estimation of change points in multiphasic growth models. Biometrical Journal, 26, 495-504.

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6, 461-464.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Shaban, S.A. (1980). Change point problem and two-phase regression: an annotated bibliography. International Statistical Review, 48, 83-93.

Shao, J. (1990). Asymptotic theory in heteroscedastic nonlinear models. Statistics & Probability Letters, 10, 77-85.

Shumway, R.H. and Stoffer, D.S. (1991). Dynamic linear models with switching. J. Amer. Statist. Assoc., 86, 763-769.

Sprent, P. (1961). Some hypotheses concerning two phase regression lines. Biometrics, 17, 634-645.

Susko, E.A. (1991). Segmented regression modelling with an application to German exchange rate data. M.Sc. thesis, Department of Statistics, University of British Columbia.

Sylwester, D.L. (1965). On the maximum likelihood estimation for two-phase linear regression. Technical Report No. 11, Department of Statistics, Stanford University.

Tong, H. (1987). Non-linear time series models of regularly sampled data: A review. Proc. First World Congress of the Bernoulli Society, Tashkent, USSR, 2, 355-367. VNU Science Press, The Netherlands.

Weerahandi, W. and Zidek, J.V. (1988). Bayesian nonparametric smoothers for regular processes. The Canadian Journal of Statistics, 16, 61-73.

Worsley, K.J. (1983). Testing for a two-phase multiple regression. Technometrics, 25, 35-42.

Wu, C.F.J. (1981). Asymptotic theory of nonlinear least squares estimation. Ann. Statist., 9, 501-513.

Yao, Y. (1988). Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters, 6, 181-189.

Yao, Y. and Au, S.T. (1989). Least-squares estimation of a step function. Sankhya: The Indian Journal of Statistics, Ser. A, 51, 370-381.

Yeh, M.P., Gardner, R.M., Adams, T.D., Yanowitz, F.G., and Crapo, R.O. (1983). "Anaerobic threshold": Problems of determination and validation. J. Appl. Physiol. Respirat. Environ. Exercise Physiol., 55, 1178-1186.

Zwiers, F. and Storch, H.V. (1990). Regime-dependent autoregressive time series modeling of the Southern Oscillation. Journal of Climate, 3, 1347-1363.
Table 3.1: Frequency of correct identification of $l^0$ in 100 repetitions and the estimated thresholds for segmented regression models ($m$, $m_u$, $m_o$ are the frequencies of correct, under- and over-estimation of $l^0$ by the MIC).

sample size                  30              50              100             200
Model (a)  m (m_u, m_o)      79 (18, 3)      95 (4, 1)       100 (0, 0)      100 (0, 0)
           tau_1 (SE)        1.168 (1.500)   1.033 (1.353)   1.410 (0.984)   1.259 (0.665)
Model (b)  m (m_u, m_o)      70 (21, 9)      86 (8, 6)       99 (0, 1)       100 (0, 0)
           tau_1 (SE)        1.022 (1.546)   1.220 (1.407)   1.432 (0.908)   1.245 (0.692)
Model (c)  m (m_u, m_o)      80 (6, 14)      97 (1, 2)       100 (0, 0)      100 (0, 0)
           tau_1 (SE)        0.890 (0.737)   0.761 (0.502)   0.901 (0.221)   0.932 (0.151)
Model (d)  m (m_u, m_o)      85 (8, 7)       99 (0, 1)       100 (0, 0)      100 (0, 0)
           tau_1 (SE)        0.791 (1.009)   0.860 (0.665)   0.971 (0.232)   0.963 (0.169)
Model (e)  m (m_u, m_o)      68 (23, 9)      87 (12, 1)      100 (0, 0)      100 (0, 0)
           tau_1 (SE)        0.463 (1.735)   0.708 (1.332)   0.989 (0.923)   0.940 (0.707)

Table 3.2: Estimated regression coefficients and variances of noise and their standard errors with n = 200 (conditional on $\hat l = 1$; a dash marks coefficients absent from a model).

estimate (SE)   Model (a)        Model (b)        Model (c)        Model (d)        Model (e)
beta_10         -0.003 (0.145)   -0.018 (0.146)    0.004 (0.143)   -0.008 (0.154)   -0.059 (0.177)
beta_11          1.001 (0.038)    0.995 (0.037)    1.000 (0.035)    0.995 (0.041)    0.985 (0.045)
beta_12          1.000 (0.024)    0.996 (0.025)   -0.004 (0.025)    0.000 (0.024)    1.000 (0.025)
beta_13              -                -                -             0.994 (0.023)    0.995 (0.025)
beta_20          1.485 (0.345)    1.388 (0.332)    0.962 (0.243)    1.009 (0.225)    0.960 (0.283)
beta_21          0.005 (0.063)    0.019 (0.067)    0.008 (0.055)    0.000 (0.049)    0.008 (0.057)
beta_23          1.006 (0.034)    0.998 (0.034)    0.495 (0.032)    0.498 (0.032)    0.998 (0.036)
beta_24              -                -                -             0.997 (0.034)    0.996 (0.036)
sigma^2          0.953 (0.160)    0.944 (0.158)    0.948 (0.108)    0.950 (0.154)    0.956 (0.156)

Table 3.3: The empirical distribution of $\hat l$ in 100 repetitions by MIC, SC and YC for the piecewise constant model ($n_0$, $n_1$, $n_2$, $n_3$ are the frequencies of $\hat l$ = 0, 1, 2, 3 respectively).

                  n0, n1, n2, n3
sample size       50              150             450
Model (f)  MIC    5, 30, 48, 17   0, 18, 79, 3    0, 0, 98, 2
           YC     5, 36, 45, 14   0, 36, 64, 0    0, 9, 91, 0
           SC     0, 17, 52, 31   0, 1, 64, 35    0, 0, 83, 17
Model (g)  MIC    5, 38, 51, 6    0, 23, 72, 5    0, 0, 99, 1
           YC     7, 41, 48, 4    0, 46, 53, 1    0, 7, 93, 0
           SC     3, 18, 56, 23   0, 2, 79, 19    0, 0, 87, 13
Model (h)  MIC    0, 3, 81, 16    0, 0, 96, 4     0, 0, 98, 2
           YC     0, 3, 84, 13    0, 0, 100, 0    0, 0, 100, 0
           SC     0, 0, 63, 37    0, 0, 82, 18    0, 0, 87, 13
Model (i)  MIC    0, 5, 85, 10    0, 0, 97, 3     0, 0, 100, 0
           YC     0, 7, 86, 7     0, 0, 100, 0    0, 0, 100, 0
           SC     0, 1, 73, 26    0, 0, 83, 17    0, 0, 93, 7

Table 3.4: The estimated thresholds and their standard errors for the piecewise constant model (conditional on $\hat l = 2$).

sample size          50              150             450
Model (f)  tau_1  0.335 (0.078)   0.338 (0.039)   0.334 (0.012)
           tau_2  0.660 (0.032)   0.666 (0.008)   0.667 (0.003)
Model (g)  tau_1  0.313 (0.076)   0.332 (0.032)   0.334 (0.013)
           tau_2  0.656 (0.015)   0.669 (0.009)   0.667 (0.002)
Model (h)  tau_1  0.316 (0.027)   0.334 (0.007)   0.333 (0.002)
           tau_2  0.662 (0.030)   0.667 (0.006)   0.667 (0.003)
Model (i)  tau_1  0.323 (0.023)   0.332 (0.010)   0.334 (0.004)
           tau_2  0.661 (0.030)   0.666 (0.007)   0.667 (0.003)

Table 4.1: Frequency of correct identification of $l^0$ in 100 repetitions and the estimated thresholds for segmented regression models with two regimes ($m$, $m_u$, $m_o$ are the frequencies of correct, under- and over-estimation of $l^0$ by the MIC).

sample size                  50              100             200
Model (a')  m (m_u, m_o)     95 (3, 2)       98 (0, 2)       99 (0, 1)
            tau_1 (SE)       1.322 (1.681)   1.412 (1.293)   1.223 (1.060)
Model (d')  m (m_u, m_o)     91 (1, 8)       95 (0, 5)       99 (0, 1)
            tau_1 (SE)       0.808 (0.545)   0.936 (0.256)   0.960 (0.109)
Model (e')  m (m_u, m_o)     94 (3, 3)       98 (0, 2)       99 (0, 1)
            tau_1 (SE)       0.693 (1.583)   1.088 (1.470)   1.175 (1.111)
Table 4.2: Estimated regression coefficients and variances of noise and their standard errors with n = 200 (conditional on $\hat l = 1$; a dash marks coefficients absent from a model).

estimate (SE)   Model (a')       Model (d')       Model (e')
beta_10         -0.049 (0.247)    0.007 (0.190)   -0.056 (0.227)
beta_11          0.993 (0.066)    0.998 (0.059)    0.985 (0.065)
beta_12          1.003 (0.017)   -0.001 (0.020)    0.999 (0.019)
beta_13              -            0.998 (0.018)    0.997 (0.018)
beta_20          1.258 (0.730)    0.957 (0.461)    0.749 (0.596)
beta_21          0.033 (0.129)    0.013 (0.107)    0.045 (0.126)
beta_23          0.998 (0.033)    0.503 (0.029)    1.002 (0.030)
beta_24              -            0.998 (0.026)    0.999 (0.029)
sigma_1^2        0.656 (0.117)    0.639 (0.167)    0.634 (0.166)
sigma_2^2        0.929 (0.271)    1.050 (0.391)    0.963 (0.361)

Table 4.3: Frequency of correct identification of $l^0$ in 100 repetitions and the estimated thresholds for a segmented regression model with three regimes ($m$, $m_u$, $m_o$ are the frequencies of correct, under- and over-estimation of $l^0$ by the MIC).

sample size                 50               100              200
Model (j)  m (m_u, m_o)     62 (26, 12)      86 (6, 8)        95 (0, 5)
           tau_1 (SE)       -1.211 (0.251)   -1.051 (0.151)   -1.034 (0.078)
           tau_2 (SE)        1.046 (0.493)    1.060 (0.388)    0.974 (0.096)

Table 4.4: Estimated regression coefficients and noise variances and their standard errors with n = 200 (conditional on $\hat l = 2$).

Model (j), estimate (SE)    j = 1            j = 2            j = 3
beta_j0                      0.987 (0.290)   -0.029 (0.212)    0.454 (0.413)
beta_j1                      0.996 (0.062)    0.097 (0.480)    0.011 (0.092)
beta_j2                     -0.001 (0.017)    1.000 (0.032)    0.499 (0.028)
sigma_j^2                    0.511 (0.165)    0.681 (0.269)    1.002 (0.294)

[Figure 2.1: $(x_1, x_2)$ uniformly distributed over the shaded area.]

[Figure 2.2: $(x_1, x_2)$ uniformly distributed over the eight points.]

[Figure 2.3: Mile per gallon vs. weight for 38 cars.]

[Figure 4.1: $(x_1, x_2)$ uniformly distributed over each of six regions with indicated mass.]
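The simulation results above can be reproduced in spirit with a short Monte Carlo exercise. The following Python fragment, for a hypothetical discontinuous two-regime design (the coefficients, the uniform design, and the sample size are illustrative and are not the thesis's models (a)-(j)), estimates the threshold by minimizing the pooled residual sum of squares over candidate values, which is the least squares approach studied in Chapters 3 and 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_threshold(x, y, grid):
    # Least squares threshold estimate: minimize the pooled SSE of
    # separate OLS fits on {x <= tau} and {x > tau}.
    best_sse, best_tau = np.inf, None
    for tau in grid:
        left = x <= tau
        if left.sum() < 3 or (~left).sum() < 3:
            continue  # skip segments too short to fit
        sse = 0.0
        for mask in (left, ~left):
            Z = np.column_stack([np.ones(mask.sum()), x[mask]])
            beta, *_ = np.linalg.lstsq(Z, y[mask], rcond=None)
            sse += float(np.sum((y[mask] - Z @ beta) ** 2))
        if sse < best_sse:
            best_sse, best_tau = sse, tau
    return best_tau

# Hypothetical two-regime model, discontinuous at tau0 = 1:
n, tau0 = 200, 1.0
x = rng.uniform(-3, 3, n)
y = np.where(x <= tau0, 1.0 + x, 2.0 + 0.5 * x) + rng.normal(0.0, 1.0, n)
tau_hat = fit_threshold(x, y, np.sort(x))
```

Repeating this over many replications and several sample sizes produces tables of the same shape as Tables 3.1 and 4.1, and the shrinking spread of the estimate illustrates the $O_p(\ln^2 n/n)$ rate for discontinuous models.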
