"Science, Faculty of"@en . "Statistics, Department of"@en . "DSpace"@en . "UBCV"@en . "Wu, Shiying"@en . "2008-12-18T21:17:59Z"@en . "1992"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "This thesis deals with the estimation of segmented multivariate regression models.\r\nA segmented regression model is a regression model which has different analytical forms\r\nin different regions of the domain of the independent variables. Without knowing the\r\nnumber of these regions and their boundaries, we first estimate the number of these\r\nregions by using a modified Schwarz' criterion. Under fairly general conditions, the estimated\r\nnumber of regions is shown to be weakly consistent. We then estimate the change\r\npoints or \"thresholds\" where the boundaries lie and the regression coefficients given the\r\n(estimated) number of regions by minimizing the sum of squares of the residuals. It is\r\nshown that the estimates of the thresholds converge at the rate of (Op(ln\u00B2n/n), if the\r\nmodel is discontinuous at the thresholds, and Op{n-\u00B9/2) if the model is continuous. In\r\nboth cases, the estimated regression coefficients and residual variances are shown to be\r\nasymptotically normal. It is worth noting that the condition required of the error distribution\r\nis local exponential boundedness which is satisfied by any distribution with zero\r\nmean and a moment generating function provided its second derivative is bounded near\r\nzero. As an illustration, a segmented bivariate regression model is fitted to real data\r\nand the relevance of the asymptotic results is examined through simulation studies.\r\nThe identifiability of the segmentation variable is also discussed. Under different\r\nconditions, two consistent estimation procedures of the segmentation variable are given.\r\nThe results are then generalized to the case where the noises are heteroscedastic\r\nand autocorrelated. The noises are modeled as moving averages of an infinite number of\r\nindependently, identically distributed random variables multiplied by different constants\r\nin different regions. It is shown that with a slight modification of our assumptions, the\r\nestimated number of regions is still consistent. And the threshold estimates retain the\r\nconvergence rate of Op(ln\u00B2 n/n) when the segmented regression model is discontinuous at\r\nthe thresholds. The estimation procedures also give consistent estimates of the residual\r\nvariances for each region. These estimates and the estimates of the regression coefficients\r\nare shown to be asymptotically normal. The consistent estimate of the segmentation\r\nvariable is also given. Simulations are carried out for different model specifications to\r\nexamine the performance of the procedures for different sample sizes."@en . "https://circle.library.ubc.ca/rest/handle/2429/3141?expand=metadata"@en . "5371601 bytes"@en . "application/pdf"@en . "A S Y M P T O T I C I N F E R E N C E F O R S E G M E N T E D R E G R E S S I O N M O D E L S B y S H I Y I N G W U B . S c , Beijing University, 1983 M . 
M.Sc., The University of British Columbia, 1988

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA, October 1992
© Shiying Wu, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Department of Statistics, The University of British Columbia, Vancouver, Canada.)

Also assuming the noises to be independent normal variables, Quandt (1958) obtains the maximum likelihood estimate and a confidence interval for the change-point. Adding to the model of Quandt (1958) the assumptions that the model is everywhere continuous and that the variances of the {ε_t} are identical, Hudson (1966) gives a concise method for calculating the overall least squares estimator of the intersection point of two intersecting regression lines. For the same problem, Hinkley (1969) derives an asymptotic distribution for the maximum likelihood estimate of the intersection which is claimed to be a better approximation to the finite sample distribution than the asymptotic normal distribution of Feder and Sylwester (1968). For the change-point problem, Hinkley (1970) derives the asymptotic distribution of the maximum likelihood estimate of the change-point. He assumes that exactly one change occurs and that the means of the two submodels are known. He also gives the asymptotic distribution when these means are unknown and the noises are assumed to be identically, independently distributed normal random variables ("iid normal" hereafter). As Hinkley notes, the maximum likelihood estimate is not consistent and the asymptotic result is not good for small samples when the two means are unknown.

In all of these problems, the number of change points is assumed to be exactly one. For problems where the number of change-points may be more than one, Quandt (1958, p. 880) concludes "The exact number of switches must be assumed to be known". McGee and Carleton (1970) treat the estimation problem for cases where more than one change may occur. Their model is:

  y_t = β_0^{(j)} + β_1^{(j)} x_{1t} + ... + β_k^{(j)} x_{kt} + ε_t, if t ∈ [τ_{j-1}, τ_j),

where 1 ≤ τ_1 < ... < τ_L < τ_{L+1} = N and the {ε_t} are iid N(0, σ²). Note that L and the τ_j's are unknown. Constrained by the computing power available at that time (1970), they propose an estimation method which essentially combines least squares estimation with hierarchical clustering. While computationally efficient, their method is suboptimal (a consequence of the use of hierarchical clustering), subjective (in terms of the choice of L) and lacking theoretical justification.
Goldfeld and Quandt (1972, 1973a) discuss the so-called switching regression model, specified as follows:

  y_t = x_t'β_1 + u_{1t}, if π'z_t ≤ 0;  y_t = x_t'β_2 + u_{2t}, if π'z_t > 0.

Here z_t = (z_{1t}, ..., z_{kt})' are the observations on some exogenous variables (including, possibly, some or all of the regressors), π = (π_1, ..., π_k)' is an unknown parameter, and the {u_{it}} are independent normal random variables with zero means and variances σ_i². Letting d(z_t) denote the indicator of {π'z_t > 0}, they reexpress the model as

  y_t = x_t'[(1 − d(z_t))β_1 + d(z_t)β_2] + (1 − d(z_t))u_{1t} + d(z_t)u_{2t}.

For estimation the "D-method" is proposed: d(z_t) is replaced by the smooth approximation

  ∫_{−∞}^{π'z_t} (2πσ²)^{−1/2} exp(−v²/(2σ²)) dv,

and the maximum likelihood estimates for the parameters are obtained. As they point out, the D-method can be extended to the case of more than two regimes.

Gallant and Fuller (1973) consider the problem of estimating the parameters in a piecewise polynomial model with continuous derivative, where the join points are unknown. They reparametrize the model so that the Gauss-Newton method can be applied to obtain the least squares estimates. An F statistic is suggested for model selection (including the number of regimes) without theoretical justification. Poirier (1973) relates spline models and piecewise regression models. Assuming the change points known, he develops tests to detect structural changes in the model and to decide whether certain of the model coefficients vanish. Ertel and Fowlkes (1976) also point out that the regression models for linear splines and piecewise linear regression have many common elements. The primary difference between them is that in the linear spline case, adjacent regression lines are required to intersect at the change-points, while in the piecewise linear case, adjacent regression lines are fitted separately. They develop some efficient algorithms to obtain least squares estimates for these models.

Feder (1975a) considers a one-dimensional segmented regression problem; it is assumed that the function is continuous over the entire range of the covariate and that the number of segments is known. Under certain additional assumptions, he shows that the least squares estimates of the regression coefficients of the model are asymptotically normally distributed. Note that the two assumptions, that the function is continuous and that the number of segments is known, are essential for his results. For the simplest two-segment regression problem with the continuity assumption, Miao (1988) proposes a hypothesis test procedure for the existence of a change-point, together with a confidence interval for the change-point, based on the theory of Gaussian processes.

Statistical hypothesis tests for segmented regression models are studied by many authors, among them Quandt (1960), Sprent (1961), Hinkley (1969), Feder (1975b) and Worsley (1983). Bayesian methods for the problem are considered by Farley and Hinich (1970), Bacon and Watts (1971), Broemeling (1974), Ferreira (1975), Holbert and Broemeling (1977) and Salazar, Broemeling and Chi (1981). Quandt (1972), Goldfeld and Quandt (1972, 1973b) and Quandt and Ramsey (1978) treat the problem as a random mixture of two regression lines.

Closely related to the problem studied in this thesis, Yao (1988) studies the following change-point problem: a sequence of independent normally distributed random variables have a common variance, but their means change l times along the sequence, with l unknown.
He adopts the Schwarz criterion for estimating l and proves that such an estimator is consistent. Yao noted that consistency need not obtain without the normality assumption. Yao and Au (1989) consider the problem of estimating a step function, g(t), over t ∈ [0,1] in the presence of additive noise. They assume that t_i = i/n (i = 1, ..., n) are fixed points and the noise has a sixth or higher moment, and derive limiting distributions for the least squares estimators of the locations and sizes of the jumps when the number of jumps is either known or bounded. The discontinuity of g(t) at each change point makes the estimated locations of the jumps converge rapidly to their true values.

This thesis is primarily about situations like those described above, where the segmented regression model may be viewed as a partial explanation model that tries to capture our impression that an abrupt change has occurred in the mechanism underlying the process. It is linked to other paradigms in modern regression theory as well. Much of this theory (see the references below, for example) is concerned with regression functions of, say, y on x, which cannot be well approximated globally by the leading terms of their Taylor expansions, and hence by a global linear model. This has led to various approaches to "nonparametric regression" (see Friedman, 1991, for a recent survey). One such approach is that of Cleveland (1979) when the dimension of x is 1; his results, which use a linear model in a moving local window, are extended to higher dimensions by Cleveland and Devlin (1988). Weerahandi and Zidek (1988) use a Taylor expansion explicitly to construct a locally weighted smoother, also when the dimension of x is 1; a different expansion is used at each x-value, thereby avoiding the shortcomings of using a single global expansion.

However, difficulties confront local weighting methodologies like those described above, as well as kernel smoothers and splines, because of the "curse of dimensionality", which becomes progressively more serious as the dimension of x grows beyond 2. These difficulties are well described by Friedman (1991), who presents an alternative methodology called "multivariate adaptive regression splines", or "MARS". MARS avoids the curse of dimensionality by partitioning x's domain into a data-determined, but moderate, number of subdomains within which spline functions of low dimensional subvectors of x are fitted. By using splines of order exceeding 0, MARS can lead to continuous smoothers. In contrast, its forerunner, called "recursive partitioning" by Friedman, must be discontinuous, because a different constant is fitted in different subdomains. But, like MARS, it avoids the curse of dimensionality because it depends locally on a small number (in fact, none) of the coordinates of x. Friedman (1991) attributes to Breiman and Meisel (1976) a natural extension of recursive partitioning wherein a linear function of x is fitted within each subdomain. However, this extension can encounter the curse of dimensionality when the subdomains are small, and Friedman (1991) ascribes its lack of popularity to this feature. However, the curse of dimensionality is relative. If the subdomains of x are large, the "curse" becomes less problematic. And within such subdomains, the Taylor expansion leads to linear models like those used by Breiman and Meisel (1976) and here, as natural approximants; in contrast, splines seem somewhat ad hoc.
And linear models have a long history of application in statistics.

1.3 New contributions and their relationship to previous work

In this thesis, we address the problem of making asymptotic inference for the following model:

  y_t = Σ_{j=0}^p β_{ij} x_{tj} + σ_i ε_t, if x_{td} ∈ (τ_{i-1}, τ_i], i = 1, ..., l+1,  (1.1)

where x_t = (x_{t1}, ..., x_{tp})' is an observed random vector (with x_{t0} ≡ 1); ε_t is assumed to have zero mean and unit variance, while τ_i, σ_i, β_{ij} (i = 1, ..., l+1, j = 0, 1, ..., p), l and d are unknown parameters.

Our main contributions are as follows. A sequence of procedures is proposed to estimate all these parameters, based on least squares estimation and our modified Schwarz' criterion. It is shown that, under mild conditions, the estimator, l̂, of l is consistent. Furthermore, a bound on the rate of convergence of the threshold estimates τ̂_i and the asymptotic normality of the estimators of β_{ij} and σ_i (i = 1, ..., l+1, j = 0, 1, ..., p) are obtained under certain additional assumptions.

When the segmentation is related to a few highly correlated covariates, it may not be clear which covariate can best be chosen as the segmentation variable. In such a case, d will be treated as an unknown parameter to be estimated. A new concept of identifiability of d is introduced to formulate the problem precisely. We prove that the least squares estimate of d is consistent. In addition, we propose another consistent and computationally efficient estimate of d. All of these are achieved without the Gaussian assumption on the noises.

In many practical situations, it is necessary to assume that the noises are heteroscedastic and serially correlated. Our estimation procedures and the asymptotic results are generalized to such situations. Asymptotic theory for stationary processes is developed to establish consistency and asymptotic normality of the estimates.

Note that in Model (1.1), if β_{ij} = 0 for all i = 1, ..., l+1 and j = 1, ..., p, equation (1.1) reduces to the change-point problem discussed by Yao (1988), x_d being the explanatory variable controlling the allocation of measurements associated with different dependence structures. Although our formulation is somewhat different from that of Yao (1988) in that we introduce an explanatory variable to allocate response measurements, both formulations are essentially the same from the point of view of an experimental design. If the other covariates are all known functionals of x_d, as in segmented polynomial regressions, and l is known, (1.1) reduces to the case discussed by Feder (1975a).

Unlike all the above-mentioned work on segmented regression except McGee and Carleton (1970), we assume that the number of segments is unknown, and that the noise may be dependent. In terms of estimating l, we generalize Yao's (1988) work on the change-point problem to a multiple segmented regression set-up. Furthermore, his conditions on the noises are relaxed in the sense that the ε_t's do not have to be (a) normally distributed (rather, they could follow any of the many distributions commonly used for noise); (b) identically distributed; and (c) independent. In terms of making asymptotic inference on the regression coefficients and the change points, we do not assume continuity of the underlying function, which is essential for Feder's (1975a) results. We find that without the continuity assumption, the estimated change points converge to the true ones at a much faster rate than the rate given by Feder.
Finally, a consistent estimator is obtained for d, an additional parameter not found in any of the previous work.

Our results also relate to MARS. In fact, our estimation procedure can be viewed as adaptive regression using a different method of partitioning than Breiman and Meisel (1976). By placing an upper bound on the number of partitions, we can avoid the difficulties, caused by the curse of dimensionality, of fitting our model to data in a high dimensional space (but recognize that there are trade-offs involved). And we have adopted a different stopping criterion in partitioning x-space; it is based on ideas of model selection rather than testing and seems more appealing to us. Finally, and most importantly, we are able to provide a large sample theory for our methodology. This feature of our work seems important to us. Although the MARS methodology appears to be supported by the empirical studies of Friedman (1991), there is an inevitable concern about the general merits of any procedure when it lacks a theoretical foundation. Interestingly enough, it can be shown that in some very special cases, our estimation procedures coincide with those of MARS in estimating the change points, if our stopping criterion were adopted in MARS. This seems to indicate that, with our techniques, MARS may be modified to possess certain optimalities (e.g. consistency) or suboptimalities for more general cases.

So, in summary, with the estimation procedures proposed in this thesis we regain some of the simplicity of the (piecewise) Taylor expansion and attendant linear models, while retaining some of the virtues of the added modeling flexibility possessed by nonparametric approaches. Our large sample theory gives us precise conditions under which our methodology would work well, given sufficiently large samples. And by restricting the number of x-subdomains sufficiently, we avoid the curse of dimensionality. Partitioning, for our methodology, is data-based, like that of MARS.

1.4 Outline of the following chapters

This dissertation is organized as follows. In Chapter 2, the identifiability of the segmentation variable in the segmented regression model is discussed first. We introduce a concept of identifiability and demonstrate how the concept naturally arises from the problem. Then we give an equivalent condition which is crucial in establishing the consistency. Finally, we give a sequence of procedures to estimate all the parameters involved in a "basic" segmented regression model with uncorrelated and homoscedastic noise. These procedures are illustrated with an example.

The consistency of the estimates given in Chapter 2 is proved in Chapter 3. Conditions under which the procedures give consistent estimates are also discussed. For technical reasons, the consistency of estimates other than that of the segmentation variable is established first. The estimation problem is treated as a problem of model selection, with the models represented by the possible number of segments, assuming the segmentation variable is known. Schwarz' criterion is tuned to an order of magnitude that can distinguish systematic bias from random noise and is used to select models. Then, with the established theory, the consistency of the estimated segmentation variable is proved. Simulations with various model specifications are carried out to demonstrate the finite sample behavior of the estimators, which proves to be satisfactory.
Results given in Chapters 2 and 3 are generalized in Chapter 4 to the case where the noise levels in different segments are different. The noise often derives from factors that cannot be clearly specified and about which little is known. In many practical situations, like that of the economic example mentioned above, the noise may represent a variety of factors of different magnitudes over different segments. Therefore, a heteroscedastic specification of the noise is often necessary. To meet practical needs further, the noise term in the model is also allowed to be autocorrelated. The estimation procedures given in Chapter 2 are modified to accommodate these necessities and presented in Chapter 4. It is shown that under a moving average specification of the noise, the estimates given by the procedures are consistent. Further, the parameters specified in the moving average model of the noise term can be estimated from the estimated residuals. Simulation results are given to shed light on the finite sample behavior of the estimates.

A summary of the results established in this thesis is given in Chapter 5. Future research is also discussed. One line of future research comes from the similarity between segmented regression and spline techniques. Our model can first be generalized to the case where there is more than one segmentation variable. Then an "oblique" threshold model can be considered. An oblique threshold is one made by a linear combination of explanatory variables. This is reasonable because often there is no reason to believe that the threshold has to be parallel to any of the axes. Finally, by partitioning the domain of the explanatory variables into polygons, an adaptive regression spline method could be developed. This could serve as an alternative to Friedman's (1988) multivariate adaptive regression spline method, or MARS.

Chapter 2 ESTIMATION OF SEGMENTED REGRESSION MODELS

In this chapter, we consider a special case of model (1.1) where the {σ_i} are all equal and the {ε_t} are independent and identically distributed. In this case, the model can be reformulated as follows. Let (y_1, x_{11}, ..., x_{1p}), ..., (y_n, x_{n1}, ..., x_{np}) be independent observations of the response, y, and the covariates, x_1, ..., x_p. Let x_t = (1, x_{t1}, ..., x_{tp})' for t = 1, ..., n and β_i = (β_{i0}, β_{i1}, ..., β_{ip})', i = 1, ..., l+1. Then,

  y_t = x_t'β_i + ε_t, if x_{td} ∈ (τ_{i-1}, τ_i], i = 1, ..., l+1, t = 1, ..., n,  (2.1)

where the {ε_t} are iid with mean zero and variance σ², and are independent of {x_t}; −∞ = τ_0 < τ_1 < ... < τ_{l+1} = ∞. The β_i, τ_i (i = 1, ..., l+1), l, d and σ² are unknown parameters. When β_d = 0, the segmentation variable x_{td} becomes an exogenous variable as considered by Goldfeld and Quandt (1972, 1973a).

A sequence of estimation procedures is given to estimate the parameters in model (2.1). The estimation is done in three steps. First, the segmentation variable, or the parameter d, is estimated if it is not known a priori. Then, with d known (or supposed known, if estimated), the number of structural changes l and the locations of the structural changes, the τ_i's, are estimated by a modified Schwarz' criterion. Finally, based on the estimated d, l and τ_i's, the β_i's and σ² are estimated by ordinary least squares. It will be shown in the next chapter that all these estimators are consistent, under certain conditions.
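Before turning to identifiability, it may help to see the least squares core of the second and third steps in computational form. The following is a minimal sketch in Python, not code from the thesis: the function names (segment_sse, fit_thresholds) and the brute-force enumeration are our own illustrative choices, and the sketch assumes the design matrix X already carries the intercept column of x_t = (1, x_{t1}, ..., x_{tp})'.

import itertools
import numpy as np

def segment_sse(X, y, idx):
    """SSE and coefficients from an OLS fit restricted to the rows in idx.

    np.linalg.lstsq returns a pseudo-inverse solution, loosely mirroring
    the generalized inverse the thesis allows for rank-deficient segments.
    """
    Xs, ys = X[idx], y[idx]
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    return float(resid @ resid), beta

def fit_thresholds(X, y, xd, l):
    """Minimize S_n(tau_1, ..., tau_l) by brute force, restricting the
    candidate thresholds to observed values of the segmentation variable
    xd (as the text notes, this loses no generality)."""
    candidates = np.unique(xd)[:-1]          # the largest value would empty a segment
    best_sse, best_taus, best_betas = np.inf, None, None
    for taus in itertools.combinations(candidates, l):
        cuts = np.concatenate(([-np.inf], taus, [np.inf]))
        sse, betas, ok = 0.0, [], True
        for i in range(l + 1):
            idx = (xd > cuts[i]) & (xd <= cuts[i + 1])
            if idx.sum() < X.shape[1]:       # need enough points per segment
                ok = False
                break
            s, b = segment_sse(X, y, idx)
            sse += s
            betas.append(b)
        if ok and sse < best_sse:
            best_sse, best_taus, best_betas = sse, np.array(taus), betas
    return best_sse, best_taus, best_betas

For a given l, fit_thresholds returns S_n(τ̂_1, ..., τ̂_l) together with the per-segment OLS fits; the combinatorial cost of the enumeration is exactly the computational burden discussed for Method 1 in Section 2.2 below.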
It is obvious that to estimate d consistently, it has to be identifiable. In Section 2.1, we discuss the identifiability of d. Specifically, we introduce a concept of identifiability and give equivalent conditions, all illustrated by examples. These conditions will be used in the next chapter to prove the consistency of the estimator of d. Our estimation procedures are given in Section 2.2. In particular, two procedures are given to estimate d under different conditions. The first one assumes less prior knowledge, while the second one requires less computational effort. Based on the estimated d, the estimation procedures for the other parameters are then given. Finally, all the procedures are illustrated by an example in which the dependence of gas consumption on the weight and horse power of different cars is examined. Some general remarks are made in Section 2.3. In the sequel, either a superscript or a subscript 0 will be used to denote the true parameter values.

2.1 Identifiability of the segmentation variable

Although in some applications the parameter d can be determined a priori from background knowledge about the problem of concern, it can be hard to determine d with reasonable certainty due to a lack of background information. For instance, if the segmentation is related to a few highly correlated covariates, it may not be clear which one can best be chosen as the segmentation variable. Therefore, there is a need for a defensible choice of d based on the data. When the vector of covariates is of high dimension and d cannot be identified by graphical methods, a computational procedure is required. However, when some of the covariates are highly correlated, it may not be clear whether d can be uniquely identified. In the following, we discuss the exact meaning of being "identified" and give a set of conditions under which d can be uniquely identified.

To simplify notation, let x have the same distribution as that of x_1, and let R_j⁰ = {x : x_{d⁰} ∈ (τ_{j-1}⁰, τ_j⁰]}, j = 1, ..., l⁰+1. And for any d, let {R_j^d}_{j=1}^{L+1} be a partition of R^p, where R_j^d = {x : x_d ∈ (τ_{j-1}, τ_j]}, −∞ = τ_0 < τ_1 < ... < τ_L < τ_{L+1} = ∞. Let L be a known upper bound on the number of thresholds. Intuitively speaking, d⁰ is identifiable if for any d ≠ d⁰ and any partition {R_j^d}_{j=1}^{L+1}, there is at least one region, say R_j^d, on which the model exhibits clear nonlinearity. Note that L is involved. Indeed, the identifiability of d⁰ does depend on L when the domain of x takes a certain special form. This can easily be seen in the following two examples.

Example 1. x is uniformly distributed over the shaded area in Figure 2.1, and y = 1_{(x_1>1)} + ε, where 1_{(·)} is an indicator function. Here R_1⁰ = {x : x_1 ∈ (−∞, 1]} and R_2⁰ = {x : x_1 ∈ (1, ∞)}. For L = 1, no threshold on x_2 can make the model piecewise linear over its domain. The only possible threshold which makes the model piecewise linear is τ_1 = 1, as defined in the model. For L = 2, however, τ_1 = −1, τ_2 = 1 also make the model piecewise linear over its domain. Hence either x_1 or x_2 can be used as the threshold variable. □

The same phenomenon can also be seen in the next example.

Example 2. x is uniformly distributed with probabilities concentrated at the 8 points specified in Figure 2.2, and y = 1_{(x_1>0)} · x_2 + ε. For L = 1, no threshold on x_2 can make the model piecewise linear over its domain.
For L = 2, however, τ_1 = −1/2, τ_2 = 1/2 make the model piecewise linear over its domain. Hence either x_1 or x_2 can be used as the threshold variable. □

Sometimes, but not always, one cannot determine whether or not the model is linear on R_j^d unless the model can be uniquely determined on both R_j^d ∩ R_k⁰ and R_j^d ∩ R_{k+1}⁰ for a pair of adjacent R_k⁰'s. In Example 2, if R_j^d = {x : x_2 ≤ 0}, dropping the point (−1, −1) makes the model linear on R_j^d. Furthermore, since in model (2.1) we did not exclude the possibility of β_i = β_j for nonadjacent R_i⁰ and R_j⁰, to ensure the detection of nonlinearity on R_j^d, the model has to be uniquely determined on R_j^d ∩ R_k⁰ and R_j^d ∩ R_{k+1}⁰ for at least one pair of adjacent R_k⁰'s. To this end, we need

  (1/n) Σ_{t=1}^n x_t x_t' 1_{(x_t ∈ R_j^d ∩ R_{k+i}⁰)}  (2.2)

to be positive definite for i = 1, 2 and some k ∈ {0, ..., l⁰−1}. Asymptotically, we need (2.2) to hold with probability approaching 1 as n becomes large, and its left-hand side should not gradually degenerate to a singular matrix. This in turn can be stated as follows. For any set A, let λ(A) be the smallest eigenvalue of E[xx'1_{(x∈A)}]. Define

  λ({R_j^d}_{j=1}^{L+1}) = max_{j,k} min_{i=1,2} {λ(R_j^d ∩ R_{k+i}⁰)}.

We will need d⁰ to be identifiable, defined as follows:

Definition 2.1. d⁰ is identifiable w.r.t. L if for every d ≠ d⁰,

  Δ = inf λ({R_j^d}_{j=1}^{L+1}) > 0,  (2.3)

where the inf is taken over all possible partitions of the form {R_j^d}_{j=1}^{L+1}. If l⁰ = 1, then k = 0 and λ({R_j^d}_{j=1}^{L+1}) = max_j min_{i=1,2} {λ(R_j^d ∩ R_i⁰)}.

Now, let us examine the identifiability of d⁰ in the two examples given above.

Example 1 (continued). d⁰ is not identifiable w.r.t. L = 2, since for d = 2 and (τ_1, τ_2) = (−1, 1), either P(R_j^d ∩ R_1⁰) = 0 or P(R_j^d ∩ R_2⁰) = 0 for all j = 1, 2, 3. d⁰ is identifiable w.r.t. L = 1, since for any τ_1 there exists r ∈ {1, 2} such that E[xx'1_{(x∈R_r^d ∩ R_i⁰)}] is positive definite for i = 1, 2. □

Example 2 (continued). d⁰ is not identifiable w.r.t. L = 2. Let d = 2. If (τ_1, τ_2) = (−0.5, 0.5), then each of the R_j^d ∩ R_i⁰ will contain no more than two points with positive masses, i = 1, 2, j = 1, 2, 3. Hence E[xx'1_{(x∈R_j^d ∩ R_i⁰)}] will be degenerate for all j and i. d⁰ is identifiable w.r.t. L = 1, since for any τ_1 and i = 1, 2, there exists r ∈ {1, 2} such that R_r^d ∩ R_i⁰ contains at least 3 points, with positive masses, which are not collinear. Hence E[xx'1_{(x∈R_r^d ∩ R_i⁰)}] is positive definite. Because we have effectively just 4 choices of τ_1, the eigenvalues of the matrices E[xx'1_{(x∈R_r^d ∩ R_i⁰)}] that arise are bounded away from zero, and hence Δ > 0. □

The following theorem characterizes identifiability.

Theorem 2.1. The following two statements are equivalent: (i) d⁰ is identifiable w.r.t. L; (ii) for any d ≠ d⁰, there exist L+1 mutually exclusive sets A_1^d, ..., A_{L+1}^d, each of the form {x : x_d ∈ (a, η]}, such that (a) λ(A_s^d ∩ R_{k+i}⁰) > 0 for some 0 ≤ k ≤ l⁰−1 and all i = 1, 2, s = 1, ..., L+1, and (b) for any partition {R_j^d}_{j=1}^{L+1}, A_s^d ⊆ R_r^d for some r, s ∈ {1, ..., L+1}.

Before proving the theorem, let us find the A_j^d's in the two examples given above. Assume, arbitrarily, d = 2. In Example 1, let A_1^d = {x : −2 ≤ x_2 ≤ −0.5} and A_2^d = {x : 0.5 ≤ x_2 ≤ 2}. Then A_1^d and A_2^d satisfy (ii) in Theorem 2.1. In Example 2, A_1^d = {x : −1 ≤ x_2 ≤ 0} and A_2^d = {x : 0 ≤ x_2 ≤ 1}. Note that in this case, A_1^d ∩ A_2^d = {x : x_2 = 0}; the sets overlap.

For any measurable set C in R, let λ^d(C) = min_i λ({x : x_d ∈ C} ∩ R_i⁰).

Lemma 2.1. λ^d([a, u]) is right continuous in u, and λ^d([u, b]) is left continuous in u. Also, lim_{b→−∞} λ^d((−∞, b]) = 0, lim_{a→∞} λ^d([a, +∞)) = 0 and λ^d({a}) = 0.

Proof. Let A = {x : a ≤ x_d ≤ u} ∩ R_i⁰, A_δ = {x : u < x_d ≤ u + δ} ∩ R_i⁰ and A_+ = {x : a ≤ x_d ≤ u + δ} ∩ R_i⁰. Then A_+ = A ∪ A_δ.
Let a be the normalized eigenvector corresponding to λ(A), the smallest eigenvalue of E[xx'1_{(x∈A)}]. Then

  λ(A) = a'E[xx'1_{(x∈A)}]a = a'E[xx'1_{(x∈A_+)}]a − a'E[xx'1_{(x∈A_δ)}]a ≥ λ(A_+) − a'E[xx'1_{(x∈A_δ)}]a ≥ λ(A_+) − tr(E[xx'1_{(x∈A_δ)}]) = λ(A_+) − E[x'x1_{(x∈A_δ)}].

By the dominated convergence theorem, E[x'x1_{(x∈A_δ)}] = E[x'x1_{({x: u<x_d≤u+δ}∩R_i⁰)}] → 0 as δ ↓ 0; since also λ(A) ≤ λ(A_+), the right continuity of λ^d([a, u]) in u follows. A symmetric argument, with A_δ = {x : u − δ ≤ x_d < u} ∩ R_i⁰ and the bound λ(A) ≥ λ(A_−) − tr(E[xx'1_{(x∈A_δ)}]) = λ(A_−) − E[x'x1_{(x∈A_δ)}], where A_− denotes the enlarged set, together with the dominated convergence theorem applied to E[x'x1_{({x: u−δ≤x_d<u}∩R_i⁰)}], gives the left continuity of λ^d([u, b]) in u. Similarly, 0 ≤ λ^d((−∞, b]) ≤ tr(E[xx'1_{(x_d≤b)}]) = E[x'x1_{(x_d≤b)}] → 0 as b → −∞, and the assertions lim_{a→∞} λ^d([a, +∞)) = 0 and λ^d({a}) = 0 follow in the same way. □

Let Δ > 0 be given by Definition 2.1. Let b*_{L+1} = ∞ and, recursively (downward from j = L+1),

  b*_{j-1} = sup{b < b*_j : λ^d([b, b*_j]) ≥ Δ}, j = 2, ..., L+1,

where, by convention, b*_{j-1} = −∞ if {b < b*_j : λ^d([b, b*_j]) ≥ Δ} = ∅.

Lemma 2.2. Suppose d⁰ is identifiable w.r.t. L. Let b*_0 = −∞. Then (i) −∞ = b*_0 < b*_1 < ... < b*_L < b*_{L+1} = ∞, and (ii) λ^d((−∞, b*_1]) ≥ Δ.

Proof. (i) Lemma 2.1 implies lim_{a→∞} λ^d([a, ∞)) = 0, so b*_L < ∞. And b*_L > −∞. For if it were not, i.e., if b*_L = −∞, then since lim_{b→−∞} λ^d((−∞, b]) = 0, there exists τ_1 ∈ (−∞, ∞) such that λ^d((−∞, τ_1]) < Δ. In view of the definition of b*_L and the assumption that b*_L = −∞, we have that λ^d((τ_1, ∞)) < Δ. For any τ_2, ..., τ_L such that −∞ = τ_0 < τ_1 < τ_2 < ... < τ_L < τ_{L+1} = ∞, we have λ^d((τ_{j-1}, τ_j]) < Δ, j = 1, ..., L+1. This contradicts the definition of Δ. So, −∞ < b*_L < ∞.

Assume that b*_j, ..., b*_L have been well defined and satisfy −∞ < b*_j < ... < b*_L < ∞. We now show that −∞ < b*_{j-1} < b*_j. By Lemma 2.1, λ^d({a}) = 0 and λ^d([u, b]) is left continuous in u; hence, b*_{j-1} < b*_j. Suppose b*_{j-1} = −∞. Since lim_{b→−∞} λ^d((−∞, b]) = 0, there exists τ_{j-1} ∈ (−∞, b*_j) such that λ^d((−∞, τ_{j-1}]) < Δ. For this τ_{j-1}, let τ_0 = −∞ and choose τ_1, ..., τ_{j-2} such that −∞ = τ_0 < τ_1 < ... < τ_{j-2} < τ_{j-1}. Then λ^d((τ_{k-1}, τ_k]) ≤ λ^d((−∞, τ_{j-1}]) < Δ, k = 1, ..., j−1. Since b*_{j-1} = −∞, λ^d([τ_{j-1}, b*_j]) < Δ. By the right continuity of λ^d([a, ·]), there exists δ_j > 0 such that τ_j = b*_j + δ_j ∈ (b*_j, b*_{j+1}) and λ^d([τ_{j-1}, τ_j]) < Δ. Repeating this argument, we see that there exist δ_k > 0 such that τ_k = b*_k + δ_k ∈ (b*_k, b*_{k+1}) and λ^d([τ_{k-1}, τ_k]) < Δ, for k = j, ..., L. By the definition of b*_L, λ^d((τ_L, ∞)) < Δ. In summary, we have λ^d((τ_{k-1}, τ_k]) ≤ λ^d([τ_{k-1}, τ_k]) < Δ, k = 1, ..., L, and λ^d((τ_L, ∞)) < Δ. That is, the partition {R_j^d}_{j=1}^{L+1}, where R_j^d = {x : x_d ∈ (τ_{j-1}, τ_j]}, satisfies min_{i=1,2} λ(R_j^d ∩ R_i⁰) = λ^d((τ_{j-1}, τ_j]) < Δ, j = 1, ..., L+1. This again contradicts the definition of Δ. By induction, −∞ < b*_{j-1} < b*_j for j = 2, ..., L+1. Thus, (i) is verified.

(ii) If not, λ^d((−∞, b*_1]) < Δ.
Then, by the right continuity of λ^d([a, ·]), there exists δ_1 > 0 such that τ_1 = b*_1 + δ_1 < b*_2 and λ^d((−∞, τ_1]) < Δ. By the definition of b*_1, λ^d([τ_1, b*_2]) < Δ, and hence there exists δ_2 > 0 such that τ_2 = b*_2 + δ_2 < b*_3 and λ^d([τ_1, τ_2]) < Δ. By repeating this process, we see that there exist −∞ = τ_0 < τ_1 < ... < τ_{L-1} < b*_L < τ_L < ∞ such that λ^d((τ_{j-1}, τ_j]) < Δ for j = 1, ..., L, and λ^d((τ_L, ∞)) < Δ. This contradicts the definition of Δ, and (ii) follows. □

Proof of Theorem 2.1. First assume (ii) holds. λ(A_s^d ∩ R_i⁰) > 0 for all s and i implies min_{i,s} {λ(A_s^d ∩ R_i⁰)} > 0. For any partition {R_j^d}_{j=1}^{L+1}, by (b) there exist r, s with A_s^d ⊆ R_r^d, so that

  λ({R_j^d}_{j=1}^{L+1}) ≥ min_{i=1,2} λ(R_r^d ∩ R_i⁰) ≥ min_{i=1,2} λ(A_s^d ∩ R_i⁰) ≥ min_{i,s} {λ(A_s^d ∩ R_i⁰)}.

We conclude that d⁰ is identifiable w.r.t. L by taking the infima in the last inequality.

Now assume (i) holds. Let A_j^d = {x : x_d ∈ [b*_{j-1}, b*_j]}, where the b*_j are defined in Lemma 2.2, j = 1, ..., L+1. By Lemma 2.2, −∞ = b*_0 < b*_1 < ... < b*_L < b*_{L+1} = ∞ and λ^d((−∞, b*_1]) ≥ Δ. By the definition of the b*_j's, λ^d([u, b*_j]) ≥ Δ for all u < b*_{j-1}, j = 2, ..., L+1. By Lemma 2.1, λ^d([u, b]) is left continuous in u. Hence, λ^d([b*_{j-1}, b*_j]) ≥ Δ, j = 2, ..., L+1. By the definition of λ^d(·), λ(A_1^d ∩ R_i⁰) = λ({x : x_d ∈ (−∞, b*_1]} ∩ R_i⁰) ≥ λ^d((−∞, b*_1]) ≥ Δ, and λ(A_s^d ∩ R_i⁰) = λ({x : x_d ∈ [b*_{s-1}, b*_s]} ∩ R_i⁰) ≥ λ^d([b*_{s-1}, b*_s]) ≥ Δ, s = 2, ..., L+1. That is, {A_s^d}_{s=1}^{L+1} satisfy (a) in Theorem 2.1(ii). It remains to show that for any partition {R_j^d}_{j=1}^{L+1}, where R_j^d = {x : x_d ∈ (τ_{j-1}, τ_j]}, there exist r, s ∈ {1, ..., L+1} such that A_s^d ⊆ R_r^d. We show this by a sequential exhaustive argument. If R_1^d ⊉ A_1^d, then τ_1 < b*_1. If R_2^d ⊉ A_i^d, i = 1, 2, then τ_2 < b*_2. If R_3^d ⊉ A_i^d, i = 1, 2, 3, then τ_3 < b*_3. Continuing in this way, if no R_j^d, j = 1, ..., L, contains any of the A_i^d's, then τ_L < b*_L, so that A_{L+1}^d = {x : x_d ∈ [b*_L, ∞)} ⊆ {x : x_d ∈ (τ_L, ∞)} = R_{L+1}^d. Hence (b) holds, completing the proof. □

Corollary 2.2. Suppose x_1 = (x_{11}, ..., x_{1p})' is a continuous random vector whose support is (a_1, b_1) × ... × (a_p, b_p), where −∞ ≤ a_i < b_i ≤ ∞, i = 1, ..., p. Then for any integer L ≥ l⁰, d⁰ is identifiable w.r.t. L.

Proof. For any d ≠ d⁰, any L+1 mutually exclusive subsets of the form {x : x_d ∈ [a, η]}, where a < η and [a, η] ⊂ (a_d, b_d), will serve as the {A_j^d}_{j=1}^{L+1} in Theorem 2.1. Hence the identifiability of d⁰ follows. □

Corollary 2.3. Suppose the support of the distribution of x_1 = (x_{11}, ..., x_{1p})' is a convex subset of R^p. Then for any integer L ≥ l⁰, d⁰ is identifiable w.r.t. L.

Proof. Since the support of the distribution of x_1 is convex, it contains a subset of the form (a_1, b_1) × ... × (a_p, b_p), where −∞ < a_i < b_i < ∞, i = 1, ..., p. For any d ≠ d⁰, any L+1 mutually exclusive subsets of the form {x : x_d ∈ [a, η]}, where a < η and [a, η] ⊂ (a_d, b_d), will serve as the {A_j^d}_{j=1}^{L+1} in Theorem 2.1. □

2.2 Estimation procedures

The least squares criterion is used to select d. The idea is simple. Suppose that d⁰ is identifiable and that a wrong d were chosen as the threshold variable. Then for sufficiently large n, on at least one of the R_j^d's, say R_1^d, the model exhibits nonlinearity, resulting in a large sum of squared errors on R_1^d. Hence, the total sum of squared residuals is large. In contrast, if d⁰ were chosen, by adjusting the τ̂_j's, the model on each {x : τ̂_{j-1} < x_{d⁰} ≤ τ̂_j} would be roughly linear, resulting in a smaller total sum of squared errors. Therefore, d̂ should be chosen as the d resulting in the smallest total sum of squared errors.

To simplify the implementation of this idea, let Y_n := (y_1, ..., y_n)', ε_n := (ε_1, ..., ε_n)' and X_n := (x_1, ..., x_n)', and for any A ⊆ R^{p+1} define
  I_n(A) := diag(1_{(x_1∈A)}, ..., 1_{(x_n∈A)}),
  X_n(A) := I_n(A)X_n,
  H_n(A) := X_n(A)[X_n'(A)X_n(A)]⁻X_n'(A),
  S_n(A) := Y_n'(I_n(A) − H_n(A))Y_n, and
  T_n(A) := ε_n'H_n(A)ε_n,

where, in general, for any matrix M, M⁻ denotes a generalized inverse. Note that X_n(A), H_n(A) and S_n(A) are, respectively, the covariates, the "hat matrix" and the sum of squared residual errors from fitting a linear model based on just the observations in A. Finally, for any {R_j}_{j=1}^{l+1}, define the total sum of squares over the different regions as

  S_n(R_1, ..., R_{l+1}) := Σ_{j=1}^{l+1} S_n(R_j).

The first method for estimating d⁰ is given below.

Method 1. Suppose d⁰ is identifiable w.r.t. L. Choose d̂ to minimize the sum of squared errors. More precisely, let Ŝ_n^d := S_n(τ̂_1^d, ..., τ̂_L^d), where τ̂_1^d < ... < τ̂_L^d minimize S_n(τ_1, ..., τ_L) over all (τ_1, ..., τ_L). Select d̂ such that Ŝ_n^{d̂} ≤ Ŝ_n^d for d = 1, ..., p. Should multiple minimizers occur, we define d̂ to be the smallest of them.

Remark. When calculating S_n(R_j), at least p + 1 data points must lie in R_j to ensure that the regression coefficients on that segment are uniquely determined. A computational sketch of Method 1 follows.
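This is a minimal sketch of Method 1 under the same hypothetical naming conventions as before; it reuses the fit_thresholds helper sketched at the opening of this chapter and is ours, not code from the thesis.

def select_d_method1(X, y, L):
    """Method 1: choose the segmentation variable d whose best segmented
    fit with L thresholds attains the smallest total sum of squared
    errors; ties resolve to the smallest such d."""
    p = X.shape[1] - 1                       # X columns: intercept, x_1, ..., x_p
    best_d, best_sse = None, np.inf
    for d in range(1, p + 1):
        sse, _, _ = fit_thresholds(X, y, X[:, d], L)
        if sse < best_sse:                   # strict '<' keeps the smallest minimizer
            best_d, best_sse = d, sse
    return best_d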
This method requires intensive computation. As Feder (1975a) and other authors note, S_n(τ_1, ..., τ_L) may not be differentiable at the true change points. So to minimize S_n(τ_1, ..., τ_L), one has to search over all (τ_1, ..., τ_L). Fortunately, we can do this by restricting ourselves to the finite set {x_{1d}, ..., x_{nd}}, without loss of generality. Even so, exhausting all (τ_1, ..., τ_L) for any d needs on the order of (n choose L) × (L+1) linear fits. Although a method more efficient than actually doing the (n choose L) × (L+1) fits exists, there is still a lot of work for any L ≥ 3 and large n. So, under stronger conditions, we give another, more efficient method.

This method is based on the following idea. Suppose x_1 = (x_{11}, ..., x_{1p})' is a continuous random vector and the support of its distribution is (a_1, b_1) × ... × (a_p, b_p), where −∞ ≤ a_i < b_i ≤ ∞ (i = 1, ..., p). Then for any d we can partition (a_d, b_d) into 2L+2 disjoint intervals such that there is an equal number of observations in each of the intervals. For any d ≠ d⁰, on all of these intervals the model will exhibit nonlinearity, and hence the linear fits will result in larger sums of squared errors. If d = d⁰, then there are at least L+1 intervals that are entirely embedded in one of the (τ_{j-1}⁰, τ_j⁰]'s. Hence, on those intervals the model is linear, and the sums of squared errors from the linear fits are smaller. Thus, the total of the smallest L+1 sums of squared errors for d = d⁰ is expected to be smaller than that for d ≠ d⁰. It is easy to see that the above argument holds as long as the number of intervals in the partition is no less than L + 2. The practical advantages of choosing a number larger than L + 2 will be discussed in Section 3.2. We summarize the above discussion as follows.

Method 2. Suppose x_1 = (x_{11}, ..., x_{1p})' is a continuous random vector and the support of its distribution is (a_1, b_1) × ... × (a_p, b_p), where −∞ ≤ a_i < b_i ≤ ∞, i = 1, ..., p. Let τ̂_j^d be the [100 × j/(2L+2)]th percentile of the x_{td}'s, and let R̂_j^d = {x_t : x_{td} ∈ (τ̂_{j-1}^d, τ̂_j^d]}, j = 1, ..., 2L+2. Select d̂ so that Ŝ_n^{d̂} ≤ Ŝ_n^d for all d = 1, ..., p, where

  Ŝ_n^d = Σ_{i=1}^{L+1} S_n(R̂^{d,(i)})

and S_n(R̂^{d,(i)}) is the ith smallest of S_n(R̂_1^d), ..., S_n(R̂_{2L+2}^d).

Remark. For any d, Method 2 requires only 2L+2 linear fits (independent of n). The computational effort is significantly reduced compared with Method 1.

Now, with d⁰ estimated as above, we can assume that d⁰ is known and estimate the other parameters. For simplicity, we shall drop the superscript d on the R̂_j's and τ̂_j's in the rest of this section. First we estimate l⁰ and the thresholds τ_1⁰, ..., τ_{l⁰}⁰ by minimizing the modified Schwarz' criterion (Schwarz, 1978),

  MIC(l) := ln[S_n(τ̂_1, ..., τ̂_l)/(n − p*)] + p* c_0 (ln n)^{2+δ_0}/n,  (2.4)

for some constants c_0 > 0, δ_0 > 0. In equation (2.4), p* = (l+1)p + l ≈ (l+1)(p+1) is the total number of fitted parameters, and for any fixed l, τ̂_1, ..., τ̂_l are the least squares estimates which minimize S_n(τ_1, ..., τ_l) subject to −∞ = τ_0 < τ_1 < ... < τ_{l+1} = ∞. Recall that Schwarz' criterion (SC) is defined by

  SC(l) = ln[S_n(τ̂_1, ..., τ̂_l)/(n − p*)] + p* ln n / n.  (2.5)

We can see that the distinction between MIC(l) and SC(l) lies in the severity of the penalty for overspecification. A severer penalty is essential for the correct specification of a non-Gaussian segmented regression model, since SC(l) is derived under the Gaussian assumption (cf. Yao, 1988). Both criteria are sometimes referred to as penalized least squares.

With the estimate l̂ of l⁰ and the estimates τ̂_i of τ_i⁰, i = 1, ..., l̂, available, we then estimate the other regression parameters and the residual variance by the ordinary least squares estimates

  β̂_i = [X_n'(R̂_i)X_n(R̂_i)]⁻X_n'(R̂_i)Y_n, i = 1, ..., l̂+1,

and

  σ̂² = S_n(τ̂_1, ..., τ̂_{l̂})/(n − p*),

where R̂_i = {x : τ̂_{i-1} < x_{d⁰} ≤ τ̂_i} and p* = (l̂+1)p + l̂. Under regularity conditions essential for the identifiability of the regression parameters, we shall see in Chapter 3 that the ordinary least squares estimates β̂_j will be unique with probability approaching 1, for j = 1, ..., l̂+1, as n → ∞.

While for a really large sample size we do not expect the choice of δ_0 and c_0 to be crucial, for small to moderate sample sizes this choice does influence the model selection. Below, we briefly discuss the choice of c_0 and δ_0. In general, when selecting models, a relatively large penalty term would be preferable for models that can be easily identified. This is because a larger penalty will greatly reduce the probability of overestimation while not risking underestimation too much. However, if the model is difficult to identify (e.g., a continuous model with ||β_{j+1} − β_j|| small), the penalty should not be too large, since the risk of underestimation is now high. Another factor influencing the choice of the penalty is the error distribution. A distribution with heavy tails is likely to generate extreme values, making it look as though a change in response has occurred. To counter this effect, one needs a heavier penalty.
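Computationally, Method 2 and the MIC-based choice of l amount to the following minimal sketch (again ours, not the thesis's code, with hypothetical names; segment_sse and fit_thresholds are the helpers sketched earlier, and the defaults c0 = 0.299 and delta0 = 0.1 anticipate the values arrived at below).

def select_d_method2(X, y, L):
    """Method 2: split each candidate column on its 2L+2 empirical
    quantile bins and total the L+1 smallest per-interval SSEs."""
    p = X.shape[1] - 1
    totals = []
    for d in range(1, p + 1):
        xd = X[:, d]
        edges = np.quantile(xd, np.linspace(0.0, 1.0, 2 * L + 3))
        edges[0], edges[-1] = -np.inf, np.inf
        sses = []
        for j in range(2 * L + 2):
            idx = (xd > edges[j]) & (xd <= edges[j + 1])
            if idx.sum() >= X.shape[1]:
                sses.append(segment_sse(X, y, idx)[0])
        totals.append(sum(sorted(sses)[: L + 1]))
    return 1 + int(np.argmin(totals))        # the estimated d

def mic(sse, n, l, p, c0=0.299, delta0=0.1):
    """Modified Schwarz' criterion (2.4); p* = (l+1)p + l parameters."""
    p_star = (l + 1) * p + l
    return np.log(sse / (n - p_star)) + p_star * c0 * np.log(n) ** (2 + delta0) / n

def select_l(X, y, xd, L):
    """Choose l in {0, ..., L} by minimizing MIC(l)."""
    n, p = X.shape[0], X.shape[1] - 1
    vals = []
    for l in range(L + 1):
        sse, _, _ = fit_thresholds(X, y, xd, l)
        vals.append(mic(sse, n, l, p))
    return int(np.argmin(vals))

Note that select_l includes l = 0, so the search also entertains the no-threshold model; only the penalty term distinguishes this rule from minimizing the residual sum of squares, which would always favor the largest l.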
In fact, if ε_t has only finite-order moments, a penalty of order n^a for some a > 0 is needed to make the estimation of l⁰ consistent.

Given that the best criterion is model dependent and no uniformly optimal choice can be made, the following considerations guide us to a reasonable choice of δ_0 and c_0. (1) From the proof of Lemma 3.2 in Section 3.1, we shall see that it is possible that the exponent 2 + δ_0 in the penalty term of MIC may be further reduced, while keeping the model selection procedure consistent. And since Schwarz' criterion (where the exponent is 1) is obtained by maximizing the posterior likelihood in a model selection paradigm and is widely used in model selection problems, it may be used as a baseline reference. Adopting such a view, δ_0 should be small, to reduce the potential risk of underestimation when the noise is normal and n is not large. (2) For a small sample, it is practically difficult to distinguish normal from double exponential, or t distributed, noise. Hence, one would not expect the choice of SC, or any other reasonable criterion, to make a drastic difference. (3) As Yao (1988) noted for large samples, SC tends to overestimate l⁰ if the noise is not normal. We observe such overestimation in our simulations under different model specifications when n = 50 (see Section 3.3).

Based on (1), we should choose a small δ_0. And by (2), with δ_0 chosen, we can choose some moderate n_0 and solve for c_0 by forcing MIC to equal SC at n_0. By (3), n_0 ≤ 50 seems desirable. In the simulation reported in the next section, we (arbitrarily) choose δ_0 to be 0.1 (which is considered to be small). With such a δ_0, we arbitrarily choose n_0 = 20 and solve for c_0. We get c_0 = 0.299. In summary, since the "best" selection of the penalty is model dependent for finite samples, no optimal pair (c_0, δ_0) can be recommended. On the other hand, our choice of δ_0 = 0.1 and c_0 = 0.299 performs reasonably well for most of the cases we experimented with in our simulation. The simulation results are reported in Section 3.3. Further study is needed on the choice of δ_0 and c_0 under different assumptions.

A data set used in Henderson and Velleman (1981) is analyzed below to illustrate the estimation procedures proposed above. The data consist of measurements of three variables, miles per gallon (y), weight (x_1) and horsepower (x_2), on thirty-eight 1978-79 model automobiles. The dependence of y on x_1 and x_2 is of interest. Graphs of the data show a certain nonlinear dependence structure between y and x_1 (see Figure 2.3). Suppose we want to fit a model of the form (2.1). In this case, it becomes

  y_t = β_{i0} + β_{i1}x_{t1} + β_{i2}x_{t2} + ε_t, if x_{td} ∈ (τ_{i-1}, τ_i], i = 1, ..., l+1,  (2.6)

where ε_t is assumed to have zero mean and variance σ².

But it is important to realize that d⁰ is not always uniquely identifiable, and to know when it is not uniquely identifiable in an asymptotic sense. It is also important to bear in mind the question of identifiability in a design problem. The results in Section 2.1 have provided an answer to these questions. Moreover, these results not only provide a foundation for estimating d⁰ in model (2.1) for continuous covariates, but they also address the same problem when the covariates are discrete or ordered categorical. For example, one may want to know which of two covariates, the dose of a certain drug or age group, alters the dependence structure of blood pressure on the two.
In this case, the identification of d⁰ is important even when the change point is not uniquely defined. As in the example of the automobiles, the MIC we proposed in the last section should be treated as a method of model selection, and not merely as a tool for estimating d⁰. In fact, in the case when d⁰ is only identifiable w.r.t. some number less than the known L, d⁰ and l⁰ can be jointly estimated by minimizing MIC over all the combinations of d (≤ p) and l (≤ L). In the next chapter, the consistency of these estimates, under certain conditions, will be shown.

From a much broader perspective, our estimation procedures can be seen as a general adaptive model fitting technique. The upper bound L on the number of segments is imposed to ensure computational feasibility and to avoid the "curse of dimensionality"; in other words, L ensures there are sufficient data to enable each piece of the model to be well estimated, even when the covariate is a vector of high dimension. With this upper bound, the number of segments and the boundaries of each segment are selected by the data. It will be shown in the next chapter that these estimates are also consistent.

Chapter 3 ASYMPTOTIC RESULTS FOR ESTIMATORS OF SEGMENTED REGRESSION MODELS

In this chapter, asymptotic results for the estimators given in the last chapter are proved. The exact conditions under which these results hold are stated and explained. It will be seen that these conditions seem realistic for many practical problems. More importantly, the techniques we use in this chapter constitute a foundation for the generalizations, in Chapter 4, of Model (2.1). In some cases the parameter d⁰ is known a priori; in such cases the notation required for presenting the proofs of our results is relatively simple, and so we first prove the results for these cases. In Section 3.1 we establish the consistency of the estimated number of segments, the estimated thresholds and the estimated regression coefficients. Then, for the discontinuous model, an upper bound is given for the rate of convergence of the estimated change points. The asymptotic normality of the estimated regression coefficients and of the estimated variance of the noise is also established. In Section 3.2 we move to the case of unknown d⁰ and prove the consistency of the two estimators of d⁰ given in Section 2.2. It will be easy to see that the results proved in Section 3.1 still hold if d⁰ is replaced by its consistent estimate. In Section 3.3, the finite sample behavior of these estimators is investigated by simulation for various models and noise distributions. Some general remarks are made in Section 3.4. The asymptotic normality of the various estimates for the continuous model is established in Section 3.5.

3.1 Asymptotic results when the segmentation variable is known

In this section, the parameter d in model (2.1) is assumed known. Consequently, we can simplify the notation introduced at the beginning of Section 2.2. For any −∞ ≤ a < η ≤ ∞, let

  I_n(a, η) := diag(1_{(x_{1d}∈(a,η])}, ..., 1_{(x_{nd}∈(a,η])}),
  X_n(a, η) := I_n(a, η)X_n, and
  H_n(a, η) := X_n(a, η)[X_n'(a, η)X_n(a, η)]⁻X_n'(a, η),

where in general, for any matrix A, A⁻ will denote a generalized inverse, while 1_{(·)} represents the indicator function.
Similarly, let

  Y_n(a, η) := I_n(a, η)Y_n,  ε_n(a, η) := I_n(a, η)ε_n,
  S_n(a, η) := Y_n'[I_n(a, η) − H_n(a, η)]Y_n,
  S_n(τ_1, ..., τ_l) := Σ_{i=1}^{l+1} S_n(τ_{i-1}, τ_i), τ_0 := −∞, τ_{l+1} := ∞,

and

  T_n(a, η) := ε_n'H_n(a, η)ε_n.

Observe that S_n(a, η) is just the error sum of squares when a linear model is fitted over the "threshold" interval (a, η]. Also, let the forecast of Y_n on the interval (a, η], Ŷ_n(a, η), be defined by

  Ŷ_n(a, η) := H_n(a, η)Y_n.

Then, in terms of the true parameters, (2.1) can be rewritten in the vector form

  Y_n = Σ_{i=1}^{l⁰+1} X_n(τ_{i-1}⁰, τ_i⁰)β_i⁰ + ε_n.  (3.1)

To establish the asymptotic theory for the estimation problems of Model (3.1), some assumptions have to be made. First, we assume an upper bound, L, on l⁰ can be specified. This is because in practice the sample size n is always finite, and hence any l⁰ that can be effectively identified is always bounded. We also assume the segmentation does occur at every true threshold, i.e., β_i⁰ ≠ β_{i+1}⁰, i = 1, ..., l⁰, so that these parameters are uniquely defined. The covariates {x_t} are assumed to be a strictly stationary, ergodic random sequence. Further, {x_t} and the error sequence {ε_t} are assumed independent. These are the basic assumptions underlying our analysis. To simplify the problem further, we assume in this chapter that the errors {ε_t} are iid random variables with mean zero and variance σ_0². In addition, a local exponential boundedness condition is placed on the distribution of the errors {ε_t}. A random variable Z is said to be locally exponentially bounded if there exist two positive constants, c_0 and T_0, such that

  E(e^{uZ}) ≤ e^{c_0 u²}, for every |u| ≤ T_0.  (3.2)

The above assumptions are summarized as follows.

Assumption 3.0. The covariates {x_t} and the errors {ε_t} are independent, where the {x_t} are strictly stationary and ergodic with E(x_1'x_1) < ∞, and the {ε_t} are iid with a locally exponentially bounded distribution having mean zero and variance σ_0². For the number of thresholds l⁰, there exists a known L such that l⁰ ≤ L. Also, for any j = 1, ..., l⁰, β_j⁰ ≠ β_{j+1}⁰.

Remark. The local exponential boundedness condition is satisfied by any distribution with zero mean and a moment generating function with second derivative bounded around zero. Many distributions commonly used as error distributions, such as those in the symmetrized exponential family, are of this type, and hence all the theorems in this chapter will apply to them.
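As a worked illustration of this remark (ours, not from the thesis), consider first Z ~ N(0, σ²) and then the general moment generating function argument:

\[
  \mathbb{E}\,e^{uZ} = e^{\sigma^2 u^2/2} \le e^{c_0 u^2}
  \quad \text{for all } u, \text{ with } c_0 = \tfrac{\sigma^2}{2},\ Z \sim N(0,\sigma^2);
\]
more generally, if \(M(u) = \mathbb{E}\,e^{uZ}\) exists for \(|u| \le T_0\), with \(\mathbb{E}Z = 0\) and \(M''(u) \le 2c_0\) on that interval, then a second-order Taylor expansion about \(0\) gives
\[
  M(u) = M(0) + M'(0)\,u + \tfrac{1}{2}M''(\xi)\,u^2
       \le 1 + c_0 u^2 \le e^{c_0 u^2}, \qquad |u| \le T_0,
\]
since \(M(0) = 1\), \(M'(0) = \mathbb{E}Z = 0\), and \(1 + x \le e^x\).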
The next assumption is required to identify the number of thresholds l⁰ consistently.

Assumption 3.1. There exists δ ∈ (0, min_i(τ_i⁰ − τ_{i-1}⁰)) such that E{x_1x_1'1_{(x_{1d}∈(τ_i⁰−δ, τ_i⁰])}} and E{x_1x_1'1_{(x_{1d}∈(τ_i⁰, τ_i⁰+δ])}} are positive definite, i = 1, ..., l⁰.

Assumption 3.2. For every δ > 0, E{x_1x_1'1_{(x_{1d}∈(τ_i⁰−δ, τ_i⁰])}} and E{x_1x_1'1_{(x_{1d}∈(τ_i⁰, τ_i⁰+δ])}} are positive definite, i = 1, ..., l⁰. Also, E(x_1'x_1)^u < ∞ for some u > 1.

Obviously, Assumption 3.2 implies Assumption 3.1. If Model (3.1) is discontinuous at τ_j⁰ for some j = 1, ..., l⁰, it will be shown that the least squares estimate τ̂_j converges to τ_j⁰ at a rate no slower than O_p(ln² n/n), under the following assumption.

Assumption 3.3.
(A.3.3.1) The covariates {x_t} are iid random variables. Also, E(x_1'x_1)^u < ∞ for some u > 2.
(A.3.3.2) Within some small neighborhoods of the true thresholds, x_{1d} has a positive and continuous probability density function f_d(·) with respect to the one-dimensional Lebesgue measure.
(A.3.3.3) There exists one version of E[x_1x_1'|x_{1d} = x] which is continuous within some neighborhoods of the true thresholds, and that version has been adopted.

Remark. Assumptions (A.3.3.2)-(A.3.3.3) are satisfied if x_1 = (x_1, ..., x_p) has a joint distribution in canonical form from the exponential family. Note that Assumptions 3.1-3.3 are made on the distribution of {x_t}. When the {x_t} are nonrandom, one may assume the empirical distribution function of the {x_t} converges to a distribution function satisfying these assumptions.

Now, the main results of this section are presented in the next five theorems. Their proofs are given in the sequel.

Theorem 3.1. Assume for the segmented linear regression model (3.1) that Assumptions 3.0 and 3.1 are satisfied. Then l̂, the minimizer of (2.4), converges to l⁰ in probability as n → ∞.

Remark. In the nonlinear minimization of S_n(τ_1, ..., τ_l), the possible values of τ_1 < ... < τ_l may be limited to {x_{1d}, ..., x_{nd}}. This restriction induces no loss of generality.

Theorems 3.2 and 3.3 show that the estimates τ̂, β̂_j's and σ̂² are consistent.

Theorem 3.2. Assume for the segmented linear regression model (3.1) that Assumptions 3.0 and 3.2 are satisfied. Then τ̂ → τ⁰ in probability as n → ∞, where τ⁰ = (τ_1⁰, ..., τ_{l⁰}⁰) and τ̂ = (τ̂_1, ..., τ̂_{l̂}) is the least squares estimate of τ⁰ based on l = l̂, and l̂ is a minimizer of MIC(l) subject to l ≤ L.

Theorem 3.3. If the marginal cdf F_d of x_{1d} satisfies the Lipschitz condition |F_d(x') − F_d(x'')| ≤ C|x' − x''| for some constant C in a small neighborhood of x_{1d} = τ_j⁰ for every j, then under the conditions of Theorem 3.2, the least squares estimates (β̂_j, j = 1, ..., l̂+1) based on the estimates l̂ and the τ̂_j's, as defined in Section 2.2, are consistent.

The next two theorems show that if Model (3.1) is discontinuous at τ_j⁰ for some j = 1, ..., l⁰, then the threshold estimate τ̂_j converges to the true threshold τ_j⁰ at the rate of O_p(ln² n/n), and the least squares estimates of β_j⁰ and σ_0² based on the estimated thresholds are asymptotically normal.

Theorem 3.4. Suppose for the segmented linear regression model (3.1) that Assumptions 3.0, 3.2 and 3.3 are satisfied. For any j ∈ {1, ..., l⁰} such that P(x_1'(β_{j+1}⁰ − β_j⁰) ≠ 0 | x_{1d} = τ_j⁰) > 0,

  τ̂_j − τ_j⁰ = O_p(ln² n/n).

Let β̂_j and σ̂² be the least squares estimates of β_j⁰ and σ_0² based on the estimates l̂ and the τ̂_j's, as defined in Section 2.2, j = 1, ..., l⁰+1.

Theorem 3.5. Suppose for the segmented linear regression model (3.1) that Assumptions 3.0, 3.2 and 3.3 are satisfied. If P(x_1'(β_{j+1}⁰ − β_j⁰) ≠ 0 | x_{1d} = τ_j⁰) > 0 for all j = 1, ..., l⁰, then √n(β̂_j − β_j⁰) and √n(σ̂² − σ_0²) converge in distribution to normal distributions with finite variances, j = 1, ..., l⁰+1.

Remark. The asymptotic variances can be computed by first treating l⁰ and the τ_j⁰
The proof of Theorem 3.1 is motivated by the following idea. If the model is overfitted ($l^0 < l \le L$), the reduction in the mean squared error is bounded in probability by a positive sequence tending to zero; in fact, this reduction turns out to be $O_p(\ln^2 n/n)$. On the other hand, if the model is underfitted ($l < l^0$), the inflation in the mean squared error exceeds a positive constant with probability tending to one. Hence, by setting the penalty term in MIC equal to a quantity of order bigger than $O_p(\ln^2 n/n)$ but still tending to 0, we can avoid both overfitting and underfitting. This idea is formulated in a series of lemmas. The result of Lemma 3.1 is a consequence of the local exponential boundedness assumption, which gives the added flexibility of modeling with non-Gaussian noises. Using the properties of the hat matrix $H_n(x_{sd}, x_{td})$, Lemma 3.2 establishes a uniform bound on $T_n(\alpha,\eta)$ over all $\alpha < \eta$. With this lemma, we show in Proposition 3.1 that the mean squared residuals differ from the mean of the squared pure errors only by $O_p(\ln^2 n/n)$, which in turn motivates the choice of the penalty term in our MIC. Given Lemma 3.2 and Proposition 3.1, the results of Lemmas 3.3 and 3.4 are more or less expected.
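The selection step that this argument justifies is simple to express in code. The sketch below is a minimal illustration, not the thesis' own program: it writes the penalty in the form $c_0\,l\,(\ln n)^{2+\delta_0}/n$ used in the proof of Theorem 3.1 below, adopts the values $\delta_0 = 0.1$ and $c_0 = 0.299$ chosen later in the simulation section, and assumes the helper `total_ssr` from the earlier sketch.

```python
import numpy as np
from itertools import combinations

def best_ssr(X, y, xd, l):
    """min over tau_1 < ... < tau_l of S_n(tau_1, ..., tau_l); thresholds are
    restricted to observed values of xd, which loses no generality (see the
    remark after Theorem 3.1). Exhaustive search: fine for small n and l."""
    if l == 0:
        return total_ssr(X, y, xd, [])
    grid = np.unique(xd)[:-1]    # keep the last cell nonempty
    return min(total_ssr(X, y, xd, list(t)) for t in combinations(grid, l))

def mic_select(X, y, xd, L, c0=0.299, delta0=0.1):
    """Return l_hat minimizing MIC(l) = ln(sigma_hat_l^2) + c0*l*(ln n)^(2+delta0)/n."""
    n = len(y)
    mic = []
    for l in range(L + 1):
        sigma2 = best_ssr(X, y, xd, l) / n
        mic.append(np.log(sigma2) + c0 * l * np.log(n) ** (2 + delta0) / n)
    return int(np.argmin(mic))
```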
Lemma 3.1. Let $Z_1,\ldots,Z_k$ be iid locally exponentially bounded random variables, i.e., $E(e^{uZ_1}) \le e^{c_0u^2}$ for $|u|\le T_0$, where $T_0, c_0 \in (0,\infty)$. Let $S_k = \sum_{i=1}^k a_iZ_i$, where the $a_i$'s are constants. Then for any $x > 0$ and $t_0 > 0$ satisfying $|t_0 a_i| \le T_0$, $i \le k$,
\[
P\{|S_k| > x\} \le 2\exp\Big(-t_0x + c_0t_0^2\sum_{i=1}^k a_i^2\Big). \tag{3.3}
\]
Proof. It follows from Markov's inequality that, for the hypothesized $t_0$,
\[
P\{S_k > x\} = P\{e^{t_0S_k} > e^{t_0x}\} \le e^{-t_0x}E(e^{t_0S_k}) = e^{-t_0x}\prod_{i=1}^k E(e^{t_0a_iZ_i}) \le e^{-t_0x}e^{c_0t_0^2\sum_{i=1}^k a_i^2},
\]
and, to conclude the proof of (3.3), $P\{S_k < -x\} = P\{-S_k > x\} \le e^{-t_0x}e^{c_0t_0^2\sum_{i=1}^k a_i^2}$. □

Lemma 3.2. Assume for the segmented linear regression model (3.1) that Assumption 3.0 is satisfied. Let $T_n(\alpha,\eta)$, $-\infty \le \alpha < \eta \le \infty$, be defined as in the beginning of this section, and let $p_0 = p+1$ denote the number of regression coefficients. Then
\[
P\Big\{\sup_{\alpha<\eta} T_n(\alpha,\eta) > \frac{9p_0^3}{T_0^2}\ln^2 n\Big\} \to 0, \quad \text{as } n \to \infty; \tag{3.4}
\]
in particular, $\sup_{\alpha<\eta}T_n(\alpha,\eta) = O_p(\ln^2 n)$.

Proof. The possible values of the interval $(\alpha,\eta]$ generate at most $n(n-1)/2$ distinct index sets, corresponding to intervals $(x_{sd},x_{td}]$. Conditional on $X_n$,
\[
P\{\sup_{\alpha<\eta}T_n(\alpha,\eta) > \tfrac{9p_0^3}{T_0^2}\ln^2 n \mid X_n\} = P\{\max_{s,t}\, \varepsilon_n'H_n(x_{sd},x_{td})\varepsilon_n > \tfrac{9p_0^3}{T_0^2}\ln^2 n \mid X_n\} \le \sum_{s<t} P\{\varepsilon_n'H_n(x_{sd},x_{td})\varepsilon_n > \tfrac{9p_0^3}{T_0^2}\ln^2 n \mid X_n\}.
\]
Since $H_n(x_{sd},x_{td})$ is nonnegative definite and idempotent, it can be decomposed as $H_n(x_{sd},x_{td}) = W'\Lambda W$, where $W$ is orthogonal and $\Lambda = \mathrm{diag}(1,\ldots,1,0,\ldots,0)$ with $p := \mathrm{rank}(H_n(x_{sd},x_{td})) = \mathrm{rank}(\Lambda) \le p_0$. Set $Q = (I_p,0)W$; then $Q$ has full row rank $p$. Let $Q' = (q_1,\ldots,q_p)$ and $U_l = q_l'\varepsilon_n$, $l = 1,\ldots,p$. Then $\varepsilon_n'H_n(x_{sd},x_{td})\varepsilon_n = \sum_{l=1}^p U_l^2$, and since $p \le p_0$, it suffices to show, for each $l$, that $\sum_{s<t} P\{U_l^2 > (9p_0^2/T_0^2)\ln^2 n \mid X_n\} \to 0$. Noting that $p = \mathrm{trace}(H_n(x_{sd},x_{td})) = \sum_{l=1}^p \|q_l\|^2$, we have $\|q_l\|^2 = q_l'q_l \le p \le p_0$, $l = 1,\ldots,p$. By Lemma 3.1, with $t_0 = T_0/p_0$ we have
\[
\sum_{s<t} P\{|U_l| > 3p_0\ln n/T_0 \mid X_n\} \le \sum_{s<t} 2\exp\Big(-\frac{T_0}{p_0}\cdot\frac{3p_0\ln n}{T_0}\Big)\exp\big(c_0(T_0/p_0)^2p_0\big) \le n(n-1)n^{-3}\exp(c_0T_0^2/p_0) \to 0,
\]
as $n \to \infty$, where $c_0$ is the constant specified in Lemma 3.1. Finally, by appealing to the dominated convergence theorem, we obtain the desired result without conditioning. □

Proposition 3.1. Consider the segmented regression model (3.1).
(i) For any $j$ and $(\alpha,\eta] \subseteq (\tau^0_{j-1},\tau^0_j]$, $S_n(\alpha,\eta) = \varepsilon_n'(\alpha,\eta)\varepsilon_n(\alpha,\eta) - T_n(\alpha,\eta)$.
(ii) Suppose Assumption 3.0 is satisfied and let $m \ge 1$. Then, uniformly for all $(\alpha_1,\ldots,\alpha_m)$ such that $-\infty < \alpha_1 < \cdots < \alpha_m < \infty$,
\[
\sum_{i=1}^{m+l^0+1} S_n(\xi_{i-1},\xi_i) = \varepsilon_n'\varepsilon_n + O_p(\ln^2 n),
\]
where $\xi_0 = -\infty$, $\xi_{m+l^0+1} = \infty$, and $\{\xi_1,\ldots,\xi_{m+l^0}\}$ is the set $\{\tau^0_1,\ldots,\tau^0_{l^0},\alpha_1,\ldots,\alpha_m\}$ after ordering its elements.

Proof. (i) Observe that $S_n(\alpha,\eta) = Y_n'(I_n(\alpha,\eta) - H_n(\alpha,\eta))Y_n$ and, on $(\alpha,\eta] \subseteq (\tau^0_{j-1},\tau^0_j]$, $Y_n(\alpha,\eta) = X_n(\alpha,\eta)\beta^0_j + \varepsilon_n(\alpha,\eta)$. Since $H_n(\alpha,\eta)$ is idempotent and $X_n'(\alpha,\eta)H_n(\alpha,\eta)X_n(\alpha,\eta) = X_n'(\alpha,\eta)X_n(\alpha,\eta)$, we have
\[
(X_n(\alpha,\eta) - H_n(\alpha,\eta)X_n(\alpha,\eta))'(X_n(\alpha,\eta) - H_n(\alpha,\eta)X_n(\alpha,\eta)) = 0,
\]
and hence $X_n(\alpha,\eta) = H_n(\alpha,\eta)X_n(\alpha,\eta)$. Expanding the quadratic form, all terms involving $\beta^0_j$ cancel, and therefore $S_n(\alpha,\eta) = \varepsilon_n'(\alpha,\eta)\varepsilon_n(\alpha,\eta) - T_n(\alpha,\eta)$.
(ii) By (i),
\[
\sum_{i=1}^{m+l^0+1} S_n(\xi_{i-1},\xi_i) = \sum_{i=1}^{m+l^0+1}\big[\varepsilon_n'(\xi_{i-1},\xi_i)\varepsilon_n(\xi_{i-1},\xi_i) - T_n(\xi_{i-1},\xi_i)\big] = \varepsilon_n'\varepsilon_n - \sum_{i=1}^{m+l^0+1} T_n(\xi_{i-1},\xi_i).
\]
Note that each $(\xi_{i-1},\xi_i]$ is contained in one of the $(\tau^0_{j-1},\tau^0_j]$, $j = 1,\ldots,l^0+1$. By Lemma 3.2, $\sum_i T_n(\xi_{i-1},\xi_i) \le (m+l^0+1)\sup_{\alpha<\eta}T_n(\alpha,\eta) = O_p(\ln^2 n)$. □
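The tail bound (3.3) is easy to probe numerically. In the sketch below (ours, not from the thesis), the noise is variance-one double exponential, for which $c_0 = 1$ and $T_0 = 1$ are valid constants, since then $E(e^{uZ}) = 1/(1-u^2/2) \le e^{u^2}$ for $|u| \le 1$; the bound is conservative but holds.

```python
import numpy as np

rng = np.random.default_rng(0)
k, reps = 50, 100_000
a = rng.uniform(-1.0, 1.0, size=k)                    # fixed coefficients a_i
Z = rng.laplace(0.0, 1 / np.sqrt(2), size=(reps, k))  # variance-1 Laplace noise
S = Z @ a                                             # S_k = sum a_i Z_i

c0, T0 = 1.0, 1.0        # valid constants for this Z (see lead-in above)
x = 16.0                 # tail level, roughly 4 standard deviations of S_k
# pick t0 near the optimizer of the bound, subject to |t0 * a_i| <= T0
t0 = min(T0 / np.abs(a).max(), x / (2 * c0 * (a ** 2).sum()))
bound = 2 * np.exp(-t0 * x + c0 * t0 ** 2 * (a ** 2).sum())
print(f"empirical P(|S_k|>x) = {np.mean(np.abs(S) > x):.2e}, bound = {bound:.2e}")
```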
Lemma 3.3. Under the conditions of Theorem 3.1, there exist $\delta \in (0, \min_j(\tau^0_{j+1}-\tau^0_j))$ and constants $C_r > 0$ such that, as $n \to \infty$,
\[
\frac1n\big[S_n(\tau^0_r-\delta,\tau^0_r+\delta) - S_n(\tau^0_r-\delta,\tau^0_r) - S_n(\tau^0_r,\tau^0_r+\delta)\big] \to C_r \quad \text{in probability}, \quad r = 1,\ldots,l^0.
\]
Proof. It suffices to prove the result when $l^0 = 1$; for notational simplicity, we omit the subscripts and superscripts 0 in this proof. For the $\delta$ in Assumption 3.1, let $X_1^* = X_n(\tau_1-\delta,\tau_1)$, $X_2^* = X_n(\tau_1,\tau_1+\delta)$, $X^* = X_n(\tau_1-\delta,\tau_1+\delta) = X_1^* + X_2^*$, $\varepsilon_1^* = \varepsilon_n(\tau_1-\delta,\tau_1)$, $\varepsilon_2^* = \varepsilon_n(\tau_1,\tau_1+\delta)$, $\varepsilon^* = \varepsilon_1^* + \varepsilon_2^*$ and $\bar\beta = (X^{*\prime}X^*)^-X^{*\prime}Y_n$. As in ordinary regression,
\[
S_n(\tau_1-\delta,\tau_1+\delta) = \|X_1^*(\beta_1-\bar\beta) + X_2^*(\beta_2-\bar\beta) + \varepsilon^*\|^2 = \|X_1^*(\beta_1-\bar\beta)\|^2 + \|X_2^*(\beta_2-\bar\beta)\|^2 + \|\varepsilon^*\|^2 + 2\varepsilon^{*\prime}X_1^*(\beta_1-\bar\beta) + 2\varepsilon^{*\prime}X_2^*(\beta_2-\bar\beta).
\]
It then follows from the strong law of large numbers for stationary ergodic processes that, as $n \to \infty$, $\frac1nX_1^{*\prime}X_1^* \to E\{x_1x_1'1(x_{1d}\in(\tau_1-\delta,\tau_1])\} > 0$, $\frac1nX_2^{*\prime}X_2^* \to E\{x_1x_1'1(x_{1d}\in(\tau_1,\tau_1+\delta])\} > 0$, and $\bar\beta$ converges to a fixed vector $\beta^*$ (the population least squares coefficient on $(\tau_1-\delta,\tau_1+\delta]$); moreover $\frac1n\varepsilon^{*\prime}X_j^*(\bar\beta - \beta^*) \to 0$, $j = 1,2$. Thus $\frac1nS_n(\tau_1-\delta,\tau_1+\delta)$ has a finite limit, this limit being
\[
(\beta_1-\beta^*)'E\big(x_1x_1'1(x_{1d}\in(\tau_1-\delta,\tau_1])\big)(\beta_1-\beta^*) + (\beta_2-\beta^*)'E\big(x_1x_1'1(x_{1d}\in(\tau_1,\tau_1+\delta])\big)(\beta_2-\beta^*) + \sigma^2P\{x_{1d}\in(\tau_1-\delta,\tau_1+\delta]\}.
\]
It remains to show that $\frac1nS_n(\tau_1-\delta,\tau_1)$ and $\frac1nS_n(\tau_1,\tau_1+\delta)$ converge to $\sigma^2P\{x_{1d}\in(\tau_1-\delta,\tau_1]\}$ and $\sigma^2P\{x_{1d}\in(\tau_1,\tau_1+\delta]\}$ respectively, and that at least one of the two quadratic forms above is positive. The latter is a direct consequence of the assumed conditions, since $\beta_1 \ne \beta_2$ implies that $\beta_1$ and $\beta_2$ cannot both equal $\beta^*$. The former can be shown again by the law of large numbers: by Proposition 3.1 (i), $S_n(\tau_1-\delta,\tau_1) = \varepsilon_1^{*\prime}\varepsilon_1^* - T_n(\tau_1-\delta,\tau_1)$; the strong law of large numbers gives $\frac1n\varepsilon_1^{*\prime}\varepsilon_1^* \to \sigma^2P\{x_{1d}\in(\tau_1-\delta,\tau_1]\}$ and $\frac1n\varepsilon_1^{*\prime}X_1^* \to 0$, so that $\frac1nT_n(\tau_1-\delta,\tau_1) = (\frac1n\varepsilon_1^{*\prime}X_1^*)(\frac1nX_1^{*\prime}X_1^*)^-(\frac1nX_1^{*\prime}\varepsilon_1^*) \to 0$; the same argument applies on $(\tau_1,\tau_1+\delta]$. □

Lemma 3.4. Under the conditions of Theorem 3.1, let $\hat\sigma_l^2 := \min_{\tau_1<\cdots<\tau_l} S_n(\tau_1,\ldots,\tau_l)/n$. Then
(i) there exists $C > 0$ such that, for every $l < l^0$, $\hat\sigma_l^2 \ge \frac1n\varepsilon_n'\varepsilon_n + C$ with probability approaching 1; and
(ii) for every $l$ with $l^0 \le l \le L$, $\hat\sigma_l^2 = \frac1n\varepsilon_n'\varepsilon_n + O_p(\ln^2 n/n)$.

Proof. (i) Take $\delta$ as in Assumption 3.1 with, in addition, $\delta < \min_j(\tau^0_{j+1}-\tau^0_j)/2$, and let $A_r := \{(\tau_1,\ldots,\tau_l) : |\tau_s - \tau^0_r| \ge \delta \text{ for all } s = 1,\ldots,l\}$. If $l < l^0$, every $(\tau_1,\ldots,\tau_l)$ belongs to $A_r$ for some $r$, since $l$ candidate thresholds cannot come within $\delta$ of all $l^0$ of the $\tau^0_r$'s. Hence, if we can show that for each $r$, $1 \le r \le l^0$, with probability approaching 1,
\[
\min_{(\tau_1,\ldots,\tau_l)\in A_r} S_n(\tau_1,\ldots,\tau_l)/n \ge \frac1n\varepsilon_n'\varepsilon_n + C_r, \quad \text{for some } C_r > 0, \tag{3.7}
\]
then (i) follows by choosing $C := \min_r C_r$. For $(\tau_1,\ldots,\tau_l) \in A_r$, refine the partition generated by $(\tau_1,\ldots,\tau_l)$ by adding the points $\tau^0_1,\ldots,\tau^0_{l^0}$ and $\tau^0_r-\delta$, $\tau^0_r+\delta$; the refinement can only decrease the total sum of squared errors, every cell of the refinement other than $(\tau^0_r-\delta,\tau^0_r+\delta]$ lies within a single regime, and $(\tau^0_r-\delta,\tau^0_r+\delta]$ itself remains unsplit at $\tau^0_r$. Proposition 3.1 (ii) then gives, uniformly over $A_r$,
\[
S_n(\tau_1,\ldots,\tau_l)/n \ge \frac1n\varepsilon_n'\varepsilon_n + O_p(\ln^2 n/n) + \frac1n\big[S_n(\tau^0_r-\delta,\tau^0_r+\delta) - S_n(\tau^0_r-\delta,\tau^0_r) - S_n(\tau^0_r,\tau^0_r+\delta)\big],
\]
and by Lemma 3.3 the last term converges to $C_r' > 0$; this establishes (3.7) with $C_r = C_r'/2$.
(ii) If $l \ge l^0$, then on one hand, by Proposition 3.1 (ii) applied to the partition generated by the candidate thresholds together with the true ones, $S_n(\tau_1,\ldots,\tau_l) \ge \varepsilon_n'\varepsilon_n - O_p(\ln^2 n)$ uniformly in the candidates; on the other hand, taking the candidates to include $(\tau^0_1,\ldots,\tau^0_{l^0})$,
\[
n\hat\sigma_l^2 \le S_n(\tau^0_1,\ldots,\tau^0_{l^0}) = \varepsilon_n'\varepsilon_n + O_p(\ln^2(n)),
\]
by Proposition 3.1 (ii) again. This proves (ii). □

Proof of Theorem 3.1. By Lemma 3.4 (i), for $l < l^0$ and sufficiently large $n$, there exists $C > 0$ such that
\[
\mathrm{MIC}(l) = \ln(\hat\sigma_l^2) + c_0\,l(\ln n)^{2+\delta_0}/n \ge \ln(\sigma_0^2 + C/2) \ge \ln(\sigma_0^2) + \ln(1 + C/(2\sigma_0^2))
\]
with probability approaching 1. By Lemma 3.4 (ii), for $l \ge l^0$, $\mathrm{MIC}(l) = \ln(\hat\sigma_l^2) + c_0\,l(\ln n)^{2+\delta_0}/n \to \ln\sigma_0^2$. Thus, $P\{\hat l \ge l^0\} \to 1$ as $n \to \infty$.
By Lemma 3.4 (ii) and the strong law of large numbers, for $l^0 \le l \le L$,
\[
0 \ge \big[\hat\sigma_l^2 - \tfrac1n\varepsilon_n'\varepsilon_n\big] - \big[\hat\sigma_{l^0}^2 - \tfrac1n\varepsilon_n'\varepsilon_n\big] = O_p(\ln^2 n/n),
\]
and $\hat\sigma_l^2 - \sigma_0^2 = [\hat\sigma_l^2 - \tfrac1n\varepsilon_n'\varepsilon_n] + [\tfrac1n\varepsilon_n'\varepsilon_n - \sigma_0^2] = O_p(\ln^2 n/n) + o_p(1) = o_p(1)$. Hence $0 \le (\hat\sigma_{l^0}^2 - \hat\sigma_l^2)/\hat\sigma_{l^0}^2 = O_p(\ln^2 n/n)$. Note that for $0 \le x \le 1/2$, $\ln(1-x) \ge -2x$. Therefore, for $l^0 < l \le L$,
\[
\mathrm{MIC}(l) - \mathrm{MIC}(l^0) = \ln(\hat\sigma_l^2) - \ln(\hat\sigma_{l^0}^2) + c_0(l-l^0)(\ln n)^{2+\delta_0}/n = \ln\big(1 - (\hat\sigma_{l^0}^2-\hat\sigma_l^2)/\hat\sigma_{l^0}^2\big) + c_0(l-l^0)(\ln n)^{2+\delta_0}/n \ge -2O_p(\ln^2(n)/n) + c_0(l-l^0)(\ln n)^{2+\delta_0}/n > 0
\]
for sufficiently large $n$, with probability approaching 1. Whence $\hat l \to l^0$ in probability as $n \to \infty$. □

Remark. From the proof of Theorem 3.1 it can be seen that if the term $c_0\,l(\ln n)^{2+\delta_0}/n$ is replaced by $l\,c\,n^{a-1}$, where $a \in (0,1)$ and $c$ is a constant, the model selection procedure is still consistent. In fact, such a penalty is proposed by Yao (1989) for a one-dimensional piecewise constant model.

Remark. If the assumed $\delta$ in Assumption 3.1 is replaced by assumed sequences $\{a_j\}$, $\{b_j\}$ such that $-\infty < a_1 < \tau^0_1 < b_1 < \cdots < a_{l^0} < \tau^0_{l^0} < b_{l^0} < \infty$, and such that both $E\{x_1x_1'1(x_{1d}\in(a_j,\tau^0_j])\}$ and $E\{x_1x_1'1(x_{1d}\in(\tau^0_j,b_j])\}$ are positive definite for $j = 1,\ldots,l^0$, then the conclusion of Lemma 3.3 still holds with $\delta$ replaced by $a_j$ and $b_j$, respectively. Therefore, the conclusion of Theorem 3.1 still holds.

To prove Theorem 3.2, we need the following lemma.

Lemma 3.5. Under the assumptions of Theorem 3.2, for any sufficiently small $\delta \in (0, \min_j(\tau^0_{j+1}-\tau^0_j))$, there exist $C_r > 0$ such that
\[
\frac1n\big[S_n(\tau^0_r-\delta,\tau^0_r+\delta) - S_n(\tau^0_r-\delta,\tau^0_r) - S_n(\tau^0_r,\tau^0_r+\delta)\big] \to C_r, \quad \text{as } n \to \infty, \quad r = 1,\ldots,l^0.
\]
Proof. It suffices to prove the result for the case $l^0 = 1$. For any small $\delta > 0$, all the arguments in the proof of Lemma 3.3 apply under Assumption 3.2. Hence the result holds. □

Remark. Although the proofs of Lemma 3.3 and Lemma 3.5 are essentially the same, the assumptions, and hence the conclusions, of these lemmas are different. In Lemma 3.3, $C_r$ is fixed for the existing $\delta$, while Lemma 3.5 implies that for any sequence $\{\delta_m\}$ such that $\delta_m > 0$ and $\delta_m \to 0$ as $m \to \infty$, there exist $\{C_r(m)\}$ such that the conclusion of Lemma 3.5 holds for all $m$.

Proof of Theorem 3.2. By Theorem 3.1, the problem can be restricted to $\{\hat l = l^0\}$. For any sufficiently small $\delta' > 0$, substituting $\delta'$ for the $\delta$ in (3.7) in the proof of Lemma 3.4 (i), we have the inequality
\[
\frac1nS_n(\tau_1,\ldots,\tau_{l^0}) \ge \frac1n\varepsilon_n'\varepsilon_n + O_p(\ln^2(n)/n) + \frac1n\big[S_n(\tau^0_r-\delta',\tau^0_r+\delta') - S_n(\tau^0_r-\delta',\tau^0_r) - S_n(\tau^0_r,\tau^0_r+\delta')\big],
\]
uniformly in $(\tau_1,\ldots,\tau_{l^0}) \in A_r := \{(\tau_1,\ldots,\tau_{l^0}) : |\tau_s - \tau^0_r| > \delta',\ 1 \le s \le l^0\}$. By Lemma 3.5, the last term on the right-hand side converges to a positive $C_r$. For sufficiently large $n$, this $C_r$ dominates the $O_p(\ln^2 n/n)$ term. Thus, uniformly in $A_r$, $r = 1,\ldots,l^0$, and with probability tending to 1,
\[
\frac1nS_n(\tau_1,\ldots,\tau_{l^0}) \ge \frac1n\varepsilon_n'\varepsilon_n + \frac{C_r}{2}.
\]
This implies that, with probability approaching 1, no $\tau$ in $A_r$ is qualified as a candidate for the role of $\hat\tau$, where $\hat\tau = (\hat\tau_1,\ldots,\hat\tau_{l^0})$.
In other words, $P(\hat\tau \in A_r^c) \to 1$ as $n \to \infty$. Since this is true for all $r$, $P(\hat\tau \in \cap_{r=1}^{l^0}A_r^c) \to 1$ as $n \to \infty$. Note that for $\delta' < \min_j(\tau^0_{j+1}-\tau^0_j)/2$, on $\cap_r A_r^c$ each $\tau^0_r$ has some $\hat\tau_s$ within distance $\delta'$, and since there are exactly $l^0$ of each, $|\hat\tau_r - \tau^0_r| \le \delta'$ for every $r$. As $\delta'$ is arbitrary, $\hat\tau - \tau^0 = o_p(1)$. □

The following elementary fact will be needed: if $x_n = o_p(1)$, then there exists a sequence of positive numbers $a_n \to 0$ such that $P(|x_n| > a_n) \to 0$, so that $x_n = O_p(a_n)$. Indeed, take $\epsilon_k \downarrow 0$ and $\delta_k \downarrow 0$. Since $x_n = o_p(1)$, there exists $N_1$ such that for all $n \ge N_1$, $P(|x_n| > \delta_1) < \epsilon_1$; and for each pair $(\epsilon_k,\delta_k)$ there exists $N_k > N_{k-1}$ such that for all $n \ge N_k$, $P(|x_n| > \delta_k) < \epsilon_k$. Let $a_n = 1$ if $n < N_1$ and $a_n = \delta_k$ if $N_k \le n < N_{k+1}$, $k = 1,2,\ldots$. Then $a_n \to 0$ as $n \to \infty$, and for any $\epsilon > 0$ there exists $k_0$ with $0 < \epsilon_{k_0} < \epsilon$; thus, for any $n \ge N_{k_0}$, $N_k \le n < N_{k+1}$ for some $k \ge k_0$, and $P(|x_n| > a_n) = P(|x_n| > \delta_k) \le \epsilon_k \le \epsilon_{k_0} < \epsilon$.

Lemma 3.6. Suppose $\{(z_t, x_{td})\}$ is strictly stationary and ergodic with $E|z_1|^u < \infty$ for some $u > 1$, the marginal cdf $F_d$ of $x_{1d}$ satisfies the Lipschitz condition of Theorem 3.3 in neighborhoods of the true thresholds, and $\hat\tau_j - \tau^0_j = O_p(a_n)$, $j = 1,\ldots,l^0$, for some positive sequence $a_n \to 0$. Then
\[
\frac1n\sum_{t=1}^n |z_t|\,\big|1(x_{td}\in \hat R_j) - 1(x_{td}\in R^0_j)\big| = O_p\big((a_n)^{1/v}\big),
\]
where $\hat R_j = (\hat\tau_{j-1},\hat\tau_j]$, $R^0_j = (\tau^0_{j-1},\tau^0_j]$ and $1/v = 1 - 1/u$.

Proof. Since, for every $j = 1,\ldots,l^0$,
\[
\big|1(x_{td}\in\hat R_j) - 1(x_{td}\in R^0_j)\big| \le 1(|x_{td}-\tau^0_{j-1}| \le \Delta_{n,j-1}) + 1(|x_{td}-\tau^0_j| \le \Delta_{nj}), \qquad \Delta_{nj} := |\hat\tau_j - \tau^0_j|
\]
(where for $j = 1$ the first term is defined as 0), it suffices to show that $\frac1n\sum_t |z_t|1(|x_{td}-\tau^0_j| \le \Delta_{nj}) = O_p((a_n)^{1/v})$ for every $j$. By assumption, $\Delta_{nj} = O_p(a_n)$, so for all $\epsilon > 0$ there exists $M > 0$ such that $P(\Delta_{nj} > a_nM) < \epsilon$ for all $n$. Thus, with probability at least $1-\epsilon$,
\[
\frac1n\sum_{t=1}^n |z_t|1(|x_{td}-\tau^0_j| \le \Delta_{nj}) \le \frac1n\sum_{t=1}^n |z_t|1(|x_{td}-\tau^0_j| \le a_nM).
\]
Hence it remains to show that $(a_n)^{-1/v}\frac1n\sum_t |z_t|1(|x_{td}-\tau^0_j| \le a_nM) = O_p(1)$. By Hölder's inequality and the Lipschitz condition,
\[
E\big[|z_t|1(|x_{td}-\tau^0_j| \le a_nM)\big] \le (E|z_t|^u)^{1/u}\big(P\{|x_{td}-\tau^0_j| \le a_nM\}\big)^{1/v} \le (E|z_t|^u)^{1/u}(2CMa_n)^{1/v},
\]
and Markov's inequality completes the proof. □

Proof of Theorem 3.3. By Theorem 3.2 and the fact noted above, there exists a positive sequence $a_n \to 0$ such that $\hat\tau - \tau^0 = O_p(a_n)$. Set $X_j^* = I_n(\tau^0_{j-1},\tau^0_j)X_n$ and $\hat X_j = I_n(\hat\tau_{j-1},\hat\tau_j)X_n$, so that $\hat\beta_j = (\frac1n\hat X_j'\hat X_j)^-\,\frac1n\hat X_j'Y_n$. Write $(II) := \frac1n\hat X_j'Y_n - \frac1nX_j^{*\prime}Y_n = \frac1n\sum_t x_ty_t(1(x_{td}\in\hat R_j) - 1(x_{td}\in R^0_j))$, where $\hat R_j = (\hat\tau_{j-1},\hat\tau_j]$ and $R^0_j = (\tau^0_{j-1},\tau^0_j]$. Taking $u > 1$ and $z_t = a'x_ty_t$ for any real vector $a$, it follows from Lemma 3.6 that $(II) = o_p(1)$. If also $(I) := (\frac1n\hat X_j'\hat X_j)^- - (\frac1nX_j^{*\prime}X_j^*)^- = o_p(1)$, then $\hat\beta_j - \beta^0_j = o_p(1)$, $j = 1,\ldots,l^0+1$, since by the strong law of large numbers $\frac1nX_j^{*\prime}X_j^* \to E\{x_1x_1'1(x_{1d}\in(\tau^0_{j-1},\tau^0_j])\} > 0$ and $\frac1nX_j^{*\prime}Y_n \to E\{x_1x_1'1(x_{1d}\in R^0_j)\}\beta^0_j$. So it remains only to show that $(I) = o_p(1)$. If we can show that $\frac1n\hat X_j'\hat X_j - \frac1nX_j^{*\prime}X_j^* = o_p(1)$, then for sufficiently large $n$, $(\frac1n\hat X_j'\hat X_j)^{-1}$ and $(\frac1nX_j^{*\prime}X_j^*)^{-1}$ exist with probability approaching 1, and $(I) = o_p(1)$. Let $a \ne 0$ be a constant vector and $z_t = (a'x_t)^2$. Then
\[
a'\Big(\frac1n\hat X_j'\hat X_j - \frac1nX_j^{*\prime}X_j^*\Big)a = \frac1n\sum_{t=1}^n z_t\big(1(x_{td}\in\hat R_j) - 1(x_{td}\in R^0_j)\big).
\]
Taking the sequence $\{a_n\}$ of the preceding paragraph and $u > 1$, it follows from Lemma 3.6 that this is $o_p(1)$, and hence $\frac1n\hat X_j'\hat X_j - \frac1nX_j^{*\prime}X_j^* = o_p(1)$. This completes the proof. □
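For moderate $n$, the least squares estimates studied in Theorems 3.2-3.3 can be computed by the exhaustive device already mentioned: search the candidate thresholds over the observed values of $x_{td}$ and read off the cell-wise least squares coefficients. A minimal sketch (ours, reusing `total_ssr` and `combinations` from the earlier sketches):

```python
import numpy as np
from itertools import combinations

def fit_segmented(X, y, xd, l):
    """Least squares estimates (tau_hat, beta_hats) for a model with l thresholds:
    tau_hat minimizes S_n(tau_1,...,tau_l) over observed xd values, and beta_hat_j
    is the OLS coefficient on the j-th cell (tau_hat_{j-1}, tau_hat_j]."""
    grid = np.unique(xd)[:-1]
    best = min(combinations(grid, l), key=lambda t: total_ssr(X, y, xd, list(t)))
    cuts = [-np.inf] + list(best) + [np.inf]
    betas = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        idx = (xd > a) & (xd <= b)
        betas.append(np.linalg.pinv(X[idx].T @ X[idx]) @ (X[idx].T @ y[idx]))
    return list(best), betas
```

The search examines $\binom{n}{l}$ candidates, so in practice one would use dynamic programming or a coarse-to-fine grid; the sketch only mirrors the definition of the estimator.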
The proof of Theorem 3.4 depends on the following results.

Proposition 3.3 (Serfling, 1980, p. 32). Let $\{y_{nt},\ 1 \le t \le K_n,\ n = 1,2,\ldots\}$ be a double array with independent random variables within rows. Suppose, for some $v > 2$, $\sum_{t=1}^{K_n} E|y_{nt} - \mu_{nt}|^v/B_n^v \to 0$, where $\mu_{nt} = E(y_{nt})$, $A_n = \sum_{t=1}^{K_n}\mu_{nt}$ and $B_n^2 = \sum_{t=1}^{K_n}\mathrm{Var}(y_{nt})$. Then
\[
B_n^{-1}\Big[\sum_{t=1}^{K_n} y_{nt} - A_n\Big] \to N(0,1) \quad \text{in distribution, as } n \to \infty.
\]

Lemma 3.7. Let $\{k_n\}$ be a sequence of positive numbers such that $k_n \to 0$ and $nk_n \to \infty$. Assumptions 3.0 and 3.3 imply that, for any $j = 1,\ldots,l^0$:
(i) $\frac{1}{nk_n}X_n'(\tau^0_j-k_n,\tau^0_j)X_n(\tau^0_j-k_n,\tau^0_j) \to E(x_1x_1'\mid x_{1d}=\tau^0_j)f_d(\tau^0_j)$ and $\frac{1}{nk_n}X_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n) \to E(x_1x_1'\mid x_{1d}=\tau^0_j)f_d(\tau^0_j)$, in probability;
(ii) $\frac{1}{nk_n}\varepsilon_n'(\tau^0_j-k_n,\tau^0_j)\varepsilon_n(\tau^0_j-k_n,\tau^0_j) \to \sigma_0^2f_d(\tau^0_j)$ and $\frac{1}{nk_n}\varepsilon_n'(\tau^0_j,\tau^0_j+k_n)\varepsilon_n(\tau^0_j,\tau^0_j+k_n) \to \sigma_0^2f_d(\tau^0_j)$;
(iii) $\frac{1}{nk_n}\varepsilon_n'(\tau^0_j-k_n,\tau^0_j)X_n(\tau^0_j-k_n,\tau^0_j) \to 0$ and $\frac{1}{nk_n}\varepsilon_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n) \to 0$.

Proof. It suffices to show the second statement in each of (i), (ii) and (iii), the proofs of the first differing only in a formalistic sense.
(i) Note that $X_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n) = \sum_{t=1}^n x_tx_t'1(x_{td}\in(\tau^0_j,\tau^0_j+k_n])$. Let $a \ne 0$ be a constant vector, $y_{nt} = a'x_tx_t'a\,1(x_{td}\in(\tau^0_j,\tau^0_j+k_n])$, $\mu_n = E(y_{nt})$, and $\sigma_n^2 = \mathrm{Var}(y_{nt})$. If $E[(a'x_1)^2 \mid x_{1d}=\tau^0_j] > 0$, then
\[
\mu_n = E\big\{1(x_{td}\in(\tau^0_j,\tau^0_j+k_n])\,E[(a'x_1)^2\mid x_{td}]\big\} = E[(a'x_1)^2\mid x_{1d}=\theta_n]f_d(\theta_n)k_n = E[(a'x_1)^2\mid x_{1d}=\tau^0_j]f_d(\tau^0_j)k_n + o(k_n)
\]
for some $\theta_n \in (\tau^0_j,\tau^0_j+k_n]$, by (A.3.3.2)-(A.3.3.3); similarly $\sigma_n^2 = E[(a'x_1)^4\mid x_{1d}=\tau^0_j]f_d(\tau^0_j)k_n + o(k_n) - \mu_n^2$, so that $B_n^2 := n\sigma_n^2$ is of exact order $nk_n$. By Minkowski's inequality, for some $v \in (2,u]$,
\[
E|y_{n1} - \mu_n|^v \le 2^{v-1}(E|y_{n1}|^v + \mu_n^v) = 2^{v-1}E[(a'x_1)^{2v}\mid x_{1d}=\tilde\theta_n]f_d(\tilde\theta_n)k_n + O(k_n^v) = O(k_n),
\]
where $\tilde\theta_n \in (\tau^0_j,\tau^0_j+k_n]$.
So, setting $A_n = n\mu_n$ and $B_n^2 = n\sigma_n^2$, we have
\[
\sum_{t=1}^n E|y_{nt} - \mu_n|^v/B_n^v = n\,O(k_n)/(nk_n)^{v/2}\,O(1) = O\big((nk_n)^{1-v/2}\big) \to 0, \quad \text{as } n \to \infty,
\]
since $v > 2$ and $nk_n \to \infty$. Hence, by Proposition 3.3, $B_n^{-1}[\sum_{t=1}^n y_{nt} - A_n] \to N(0,1)$ in distribution. Now, since $B_n^2/(nk_n)^2 = O(1/(nk_n)) \to 0$, we obtain
\[
\frac{1}{nk_n}\sum_{t=1}^n y_{nt} \to a'E(x_1x_1'\mid x_{1d}=\tau^0_j)a\,f_d(\tau^0_j), \quad \text{as } n \to \infty.
\]
If $E[(a'x_1)^2\mid x_{1d}=\tau^0_j] = 0$, it suffices to show that $\frac{1}{nk_n}a'X_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n)a$ converges to 0 in $L_1$:
\[
E\Big(\frac{1}{nk_n}a'X_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n)a\Big) = E\big[(a'x_t)^21(x_{td}\in(\tau^0_j,\tau^0_j+k_n])\big]/k_n = E[(a'x_1)^2\mid x_{1d}=\theta_n]f_d(\theta_n) = E[(a'x_1)^2\mid x_{1d}=\tau^0_j]f_d(\tau^0_j) + o(1) = o(1),
\]
where $\theta_n \in (\tau^0_j,\tau^0_j+k_n]$. This proves (i).
(ii) Similarly to (i), let $y_{nt} = \varepsilon_t^21(x_{td}\in(\tau^0_j,\tau^0_j+k_n])$, $\mu_n = E(y_{nt})$ and $\sigma_n^2 = \mathrm{Var}(y_{nt})$. Then $\mu_n = E[\varepsilon_t^2]P\{x_{td}\in(\tau^0_j,\tau^0_j+k_n]\} = \sigma_0^2[f_d(\tau^0_j)k_n + o(k_n)]$ and $\sigma_n^2 = E(\varepsilon_t^4)f_d(\tau^0_j)k_n + o(k_n) - \mu_n^2$. By Minkowski's inequality, for $v > 2$ (all moments of $\varepsilon_t$ being finite by local exponential boundedness), $E|y_{n1} - \mu_n|^v \le 2^{v-1}(E|y_{n1}|^v + \mu_n^v) = 2^{v-1}E(\varepsilon_t^{2v})f_d(\tau^0_j)k_n + o(k_n)$; the argument of (i) then yields $\frac{1}{nk_n}\sum_t y_{nt} \to \sigma_0^2f_d(\tau^0_j)$.
(iii) Let $y_{nt} = a'x_t\varepsilon_t1(x_{td}\in(\tau^0_j,\tau^0_j+k_n])$ for any $a \ne 0$. By the independence of $\{x_t\}$ and $\{\varepsilon_t\}$, $E(y_{nt}) = 0$ and
\[
\mathrm{Var}\Big(\frac{1}{nk_n}\sum_{t=1}^n y_{nt}\Big) = \frac{1}{nk_n^2}\sigma_0^2\big(E[(a'x_1)^2\mid x_{1d}=\tau^0_j]f_d(\tau^0_j) + o(1)\big)k_n \to 0, \quad \text{as } n \to \infty. \ \Box
\]

The approach of the following proof is to show that, uniformly for all $\tau_j$ such that $|\tau_j - \tau^0_j| > K\ln^2 n/n$, $S_n(\tau_1,\ldots,\tau_{l^0}) > S_n(\tau^0_1,\ldots,\tau^0_{l^0})$ for sufficiently large $n$. We shall achieve this by showing that
\[
S_n(\tau^0_{j-1}+\delta,\tau_j) + S_n(\tau_j,\tau^0_{j+1}-\delta) - \big[S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_{j+1}-\delta)\big] + O_p(\ln^2 n) > 0
\]
for sufficiently large $n$.

Proof of Theorem 3.4. By Theorem 3.1, the problem can be restricted to $\{\hat l = l^0\}$. Suppose for some $j$, $P(x_1'(\beta^0_{j+1} - \beta^0_j) \ne 0 \mid x_{1d} = \tau^0_j) > 0$. Hence $A := E[(x_1'(\beta^0_{j+1} - \beta^0_j))^2 \mid x_{1d} = \tau^0_j] > 0$. Let $\bar\beta(\alpha,\eta)$ be the minimizer of $\|Y_n(\alpha,\eta) - X_n(\alpha,\eta)\beta\|^2$. Set $k_n = K\ln^2 n/n$ for $n = 1,2,\ldots$, where $K$ will be chosen later. The proofs of Lemma 3.6 and Theorem 3.3 show that if $\alpha_n \to \alpha$ and $\eta_n \to \eta$, then $\bar\beta(\alpha_n,\eta_n) - \bar\beta(\alpha,\eta) \to 0$ in probability as $n \to \infty$. Hence, since $\tau^0_j + k_n \to \tau^0_j$, $\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n) - \bar\beta(\tau^0_{j-1}+\delta,\tau^0_j) \to 0$ as $n \to \infty$. By Assumption 3.2, for any sufficiently small $\delta \in (0, \tau^0_j - \tau^0_{j-1})$, $E\{x_1x_1'1(x_{1d}\in(\tau^0_{j-1}+\delta,\tau^0_j])\}$ is positive definite, hence $\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j) \to \beta^0_j$ as $n \to \infty$. Therefore $\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n) \to \beta^0_j$. So there exists a sufficiently small $\delta > 0$ such that, for all sufficiently large $n$, $\|\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n) - \beta^0_j\| < \|\beta^0_j - \beta^0_{j+1}\|$ and
\[
\big(\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n) - \beta^0_{j+1}\big)'E(x_1x_1'\mid x_{1d}=\tau^0_j)\big(\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n) - \beta^0_{j+1}\big) > A/2
\]
with probability approaching 1. Hence, by Theorem 3.2, for any $\epsilon > 0$ there exists $N_1$ such that for $n \ge N_1$, with probability larger than $1-\epsilon$, we have (i) $|\hat\tau_i - \tau^0_i| < \delta$ for all $i$, and (ii) the above quadratic form exceeds $A/2$. Let
\[
A_j = \{(\tau_1,\ldots,\tau_{l^0}) : |\tau_i - \tau^0_i| \le \delta,\ i = 1,\ldots,l^0;\ |\tau_j - \tau^0_j| > k_n\}, \quad j = 1,\ldots,l^0.
\]
Since for the least squares estimates $\hat\tau_1,\ldots,\hat\tau_{l^0}$ we have $S_n(\hat\tau_1,\ldots,\hat\tau_{l^0}) \le S_n(\tau^0_1,\ldots,\tau^0_{l^0})$, the inequality $\inf_{(\tau_1,\ldots,\tau_{l^0})\in A_j}\{S_n(\tau_1,\ldots,\tau_{l^0}) - S_n(\tau^0_1,\ldots,\tau^0_{l^0})\} > 0$ implies $(\hat\tau_1,\ldots,\hat\tau_{l^0}) \notin A_j$, that is, $|\hat\tau_j - \tau^0_j| \le k_n = K\ln^2 n/n$ when (i) holds. By (i), if we show that for each $j$ there exists $N \ge N_1$ such that for all $n \ge N$, with probability larger than $1-2\epsilon$, $\inf_{(\tau_1,\ldots,\tau_{l^0})\in A_j}\{S_n(\tau_1,\ldots,\tau_{l^0}) - S_n(\tau^0_1,\ldots,\tau^0_{l^0})\} > 0$, we will have proved the desired result. Furthermore, by symmetry, we can consider the case $\tau_j > \tau^0_j$ only.
Hence $A_j$ may be replaced by $A'_j = \{(\tau_1,\ldots,\tau_{l^0}) : |\tau_i - \tau^0_i| \le \delta,\ i = 1,\ldots,l^0;\ \tau_j - \tau^0_j > k_n\}$. For any $(\tau_1,\ldots,\tau_{l^0}) \in A'_j$, let $\xi_1 \le \cdots \le \xi_{2l^0+1}$ be the set $\{\tau_1,\ldots,\tau_{l^0},\ \tau^0_1,\ldots,\tau^0_{j-1},\ \tau^0_{j-1}+\delta,\ \tau^0_{j+1}-\delta,\ \tau^0_{j+1},\ldots,\tau^0_{l^0}\}$ after ordering its elements, and let $\xi_0 = -\infty$, $\xi_{2l^0+2} = \infty$. Using Proposition 3.1 (ii) twice, we have
\[
\sum_{i:\,(\xi_{i-1},\xi_i]\not\subseteq(\tau^0_{j-1}+\delta,\tau^0_{j+1}-\delta]} S_n(\xi_{i-1},\xi_i) + S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_{j+1}-\delta) = \varepsilon_n'\varepsilon_n + O_p(\ln^2 n) = \big[S_n(\tau^0_1,\ldots,\tau^0_{l^0}) + O_p(\ln^2 n)\big] + O_p(\ln^2 n) = S_n(\tau^0_1,\ldots,\tau^0_{l^0}) + O_p(\ln^2 n).
\]
Thus, since the cells of the $\xi$-partition lying inside $(\tau^0_{j-1}+\delta,\tau^0_{j+1}-\delta]$ are exactly $(\tau^0_{j-1}+\delta,\tau_j]$ and $(\tau_j,\tau^0_{j+1}-\delta]$,
\[
S_n(\tau_1,\ldots,\tau_{l^0}) \ge S_n(\xi_1,\ldots,\xi_{2l^0+1}) = \sum_{i=1}^{2l^0+2} S_n(\xi_{i-1},\xi_i) = S_n(\tau^0_1,\ldots,\tau^0_{l^0}) + O_p(\ln^2 n) + \big[S_n(\tau^0_{j-1}+\delta,\tau_j) + S_n(\tau_j,\tau^0_{j+1}-\delta)\big] - \big[S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_{j+1}-\delta)\big],
\]
where the $O_p(\ln^2 n)$ term is uniform over $(\tau_1,\ldots,\tau_{l^0}) \in A'_j$. It suffices to show that, for $B_n = \{\tau_j : \tau_j \in (\tau^0_j+k_n,\tau^0_j+\delta)\}$ and sufficiently large $n$,
\[
\inf_{\tau_j\in B_n}\big\{S_n(\tau^0_{j-1}+\delta,\tau_j) + S_n(\tau_j,\tau^0_{j+1}-\delta) - \big[S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_{j+1}-\delta)\big]\big\} \ge M'\ln^2 n \tag{3.8}
\]
with probability larger than $1-2\epsilon$ for some fixed $M' > 0$. Let
\[
S_n(\alpha,\eta;\beta) = \|Y_n(\alpha,\eta) - X_n(\alpha,\eta)\beta\|^2 = \sum_{\alpha<x_{td}\le\eta}(y_t - x_t'\beta)^2.
\]
Since $S_n(\alpha,\eta) = S_n(\alpha,\eta;\bar\beta(\alpha,\eta))$, we have
\[
S_n(\tau^0_{j-1}+\delta,\tau_j) \ge S_n(\tau^0_{j-1}+\delta,\tau^0_j+k_n) + S_n(\tau^0_j+k_n,\tau_j) = S_n(\tau^0_{j-1}+\delta,\tau^0_j;\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)) + S_n(\tau^0_j,\tau^0_j+k_n;\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)) + S_n(\tau^0_j+k_n,\tau_j) \tag{3.9}
\]
\[
\ge S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_j+k_n;\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)) + S_n(\tau^0_j+k_n,\tau_j).
\]
And since $(\tau^0_j+k_n,\tau^0_{j+1}-\delta] \subset (\tau^0_j,\tau^0_{j+1}]$ for sufficiently large $n$, $S_n(\tau^0_j+k_n,\tau^0_{j+1}-\delta;\beta^0_{j+1}) = \varepsilon_n'(\tau^0_j+k_n,\tau^0_{j+1}-\delta)\varepsilon_n(\tau^0_j+k_n,\tau^0_{j+1}-\delta)$, whence, by Proposition 3.1 (i) and Lemma 3.2,
\[
S_n(\tau_j,\tau^0_{j+1}-\delta) \ge S_n(\tau^0_j,\tau^0_{j+1}-\delta) - S_n(\tau^0_j,\tau^0_j+k_n;\beta^0_{j+1}) - S_n(\tau^0_j+k_n,\tau_j) + O_p(\ln^2 n). \tag{3.10}
\]
Therefore, by (3.9) and (3.10),
\[
\big[S_n(\tau^0_{j-1}+\delta,\tau_j) + S_n(\tau_j,\tau^0_{j+1}-\delta)\big] - \big[S_n(\tau^0_{j-1}+\delta,\tau^0_j) + S_n(\tau^0_j,\tau^0_{j+1}-\delta)\big] \ge S_n(\tau^0_j,\tau^0_j+k_n;\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)) - S_n(\tau^0_j,\tau^0_j+k_n;\beta^0_{j+1}) + O_p(\ln^2 n).
\]
Let $M > 0$ be such that the term $|O_p(\ln^2 n)|$ is bounded by $M\ln^2 n$ with probability larger than $1-\epsilon$ for all $n \ge N_1$. To show (3.8), it then suffices to show that, for sufficiently large $n$,
\[
S_n(\tau^0_j,\tau^0_j+k_n;\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)) - S_n(\tau^0_j,\tau^0_j+k_n;\beta^0_{j+1}) \ge (M'+M)\ln^2 n \tag{3.11}
\]
with large probability. Recall $S_n(\alpha,\eta;\beta) = \|Y_n(\alpha,\eta) - X_n(\alpha,\eta)\beta\|^2$ and $Y_n(\tau^0_j,\tau^0_j+k_n) = X_n(\tau^0_j,\tau^0_j+k_n)\beta^0_{j+1} + \varepsilon_n(\tau^0_j,\tau^0_j+k_n)$. Writing $\bar\beta$ for $\bar\beta(\tau^0_{j-1}+\delta,\tau^0_j+k_n)$, taking $K$ sufficiently large and applying event (ii) above together with Lemma 3.7 (i) and (iii), we can see that there exists $N \ge N_1$ such that for any $n \ge N$,
\[
\frac{1}{nk_n}\big[S_n(\tau^0_j,\tau^0_j+k_n;\bar\beta) - S_n(\tau^0_j,\tau^0_j+k_n;\beta^0_{j+1})\big] = \frac{1}{nk_n}(\beta^0_{j+1}-\bar\beta)'X_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n)(\beta^0_{j+1}-\bar\beta) - \frac{2}{nk_n}\varepsilon_n'(\tau^0_j,\tau^0_j+k_n)X_n(\tau^0_j,\tau^0_j+k_n)(\beta^0_{j+1}-\bar\beta) \ge A/4 - A/8 \ge (M'+M)/K
\]
with probability larger than $1-2\epsilon$ (the density $f_d(\tau^0_j)$ being absorbed into the constants). Since $k_n = K\ln^2 n/n$, the above implies (3.11). □

Proof of Theorem 3.5. By Lemma 3.4 (ii), $\hat\sigma^2 - \frac1n\sum_{t=1}^n\varepsilon_t^2 = O_p(\ln^2 n/n)$. So $\sqrt n(\hat\sigma^2 - \sigma_0^2)$ and $\sqrt n(\frac1n\sum_t\varepsilon_t^2 - \sigma_0^2)$ share the same asymptotic distribution; applying the central limit theorem to $\{\varepsilon_t^2\}$, we conclude that this asymptotic distribution is normal. Let $(\beta_1^*,\ldots,\beta_{l^0+1}^*)$ be the "least squares estimates" of $(\beta^0_1,\ldots,\beta^0_{l^0+1})$ when $l^0$ and the $\tau^0_i$, $i = 1,\ldots,l^0$, are assumed known. Then it is clear that $\sqrt n[(\beta_1^{*\prime},\ldots,\beta_{l^0+1}^{*\prime})' - (\beta^{0\prime}_1,\ldots,\beta^{0\prime}_{l^0+1})']$ converges in distribution to a normal distribution. So it suffices to show that $\hat\beta_j - \beta_j^* = o_p(n^{-1/2})$. Set $X_j^* = I_n(\tau^0_{j-1},\tau^0_j)X_n$ and $\hat X_j = I_n(\hat\tau_{j-1},\hat\tau_j)X_n$. Then
\[
\hat\beta_j - \beta_j^* = \Big[\big(\tfrac1n\hat X_j'\hat X_j\big)^- - \big(\tfrac1nX_j^{*\prime}X_j^*\big)^-\Big]\Big\{\tfrac1n(\hat X_j - X_j^*)'Y_n + \tfrac1nX_j^{*\prime}Y_n\Big\} + \Big[\big(\tfrac1nX_j^{*\prime}X_j^*\big)^-\Big]\Big[\tfrac1n(\hat X_j - X_j^*)'Y_n\Big] =: (I)\{(II)+(III)\} + (IV)(II),
\]
where $(I) = (\frac1n\hat X_j'\hat X_j)^- - (\frac1nX_j^{*\prime}X_j^*)^-$, $(II) = \frac1n(\hat X_j - X_j^*)'Y_n$, $(III) = \frac1nX_j^{*\prime}Y_n$ and $(IV) = (\frac1nX_j^{*\prime}X_j^*)^-$. As in the proof of Theorem 3.3, both $(III)$ and $(IV)$ are $O_p(1)$. By Theorem 3.4, $\hat\tau - \tau^0 = O_p(\ln^2 n/n)$. The order $o_p(n^{-1/2})$ of $(I)$ and $(II)$ then follows from Lemma 3.6 by taking $a_n = \ln^2 n/n$ and $z_t = (a'x_t)^2$ and $z_t = a'x_ty_t$, respectively, for any real vector $a$ and $u > 2$. This completes the proof. □

3.2 Consistency of the estimated segmentation variable

Since $d$ is assumed unknown in this section, we will use notation such as $S_n(A)$, $T_n(A)$ introduced in Section 2.2. The two theorems in this section show that the two methods of estimating $d^0$ given in Section 2.2 produce consistent estimates, respectively.
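Method 1's selection rule surfaces explicitly in the proof of Theorem 3.6 below: $\hat d$ is chosen so that $\hat\sigma^2_{\hat d} \le \hat\sigma^2_d$ for every candidate $d$, where $\hat\sigma^2_d$ is the minimized mean squared error when the sample is segmented on variable $d$ with $L$ thresholds. A minimal sketch of that rule follows (ours, not the thesis' program; Method 1's full definition is in Section 2.2, and `best_ssr` is the helper from the MIC sketch).

```python
import numpy as np

def select_segmentation_variable(X, y, Z, L):
    """Method-1-style choice of the segmentation variable: for each candidate
    column d of the covariate matrix Z, compute sigma2_d = (min S_n^d)/n using
    L thresholds, and return the d with the smallest fitted variance."""
    n = len(y)
    sigma2 = [best_ssr(X, y, Z[:, d], L) / n for d in range(Z.shape[1])]
    return int(np.argmin(sigma2))
```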
Theorem 3.6. If $d^0$ is asymptotically identifiable w.r.t. $L$, then under the conditions of Theorem 3.1, $\hat d$ given by Method 1 satisfies $P(\hat d = d^0) \to 1$ as $n \to \infty$.

Theorem 3.7. Assume the $\{x_t\}$ are iid random vectors. If $z_1 = (x_{11},\ldots,x_{1p})'$ is a continuous random vector, the support of its distribution is $(a_1,b_1)\times\cdots\times(a_p,b_p)$, where $-\infty \le a_i < b_i \le \infty$, $i = 1,\ldots,p$, and $E[(z_1'z_1)^u] < \infty$ for some $u > 1$, then $\hat d$ given by Method 2 satisfies $P(\hat d = d^0) \to 1$ as $n \to \infty$.

To prove Theorem 3.6, some results similar to those presented in the last section are needed. Lemmas 3.2'-3.3' and Proposition 3.1' below are generalizations of Lemmas 3.2-3.3 and Proposition 3.1, respectively.

Lemma 3.2'. Assume for the segmented linear regression model (3.1) that Assumption 3.0 is satisfied. For any $d \ne d^0$ and $j = 1,\ldots,l^0+1$, let $R'_j(\alpha,\eta) = \{x_1 : \alpha < x_{1d} \le \eta\}\cap R^0_j$, $-\infty \le \alpha < \eta \le \infty$. Then
\[
P\Big\{\sup_{\alpha<\eta} T_n(R'_j(\alpha,\eta)) > \frac{9p_0^3}{T_0^2}\ln^2 n\Big\} \to 0, \quad \text{as } n \to \infty;
\]
in particular, $\sup_{\alpha<\eta}T_n(R'_j(\alpha,\eta)) = O_p(\ln^2 n)$.

Proof. The argument parallels that of Lemma 3.2. Conditional on $X_n$, the supremum is a maximum over at most $n(n-1)/2$ index sets $R'_j(x_{sd},x_{td})$, so
\[
P\{\sup_{\alpha<\eta}T_n(R'_j(\alpha,\eta)) > \tfrac{9p_0^3}{T_0^2}\ln^2 n \mid X_n\} \le \sum_{s<t} P\{T_n(R'_j(x_{sd},x_{td})) > \tfrac{9p_0^3}{T_0^2}\ln^2 n \mid X_n\}.
\]
Since $H_n(R'_j(x_{sd},x_{td}))$ is nonnegative definite and idempotent, it can be decomposed as $W'\Lambda W$ with $W$ orthogonal and $\Lambda = \mathrm{diag}(1,\ldots,1,0,\ldots,0)$, $p := \mathrm{rank}(H_n(R'_j(x_{sd},x_{td}))) = \mathrm{rank}(\Lambda) \le p_0$. Set $Q = (I_p,0)W$, $Q' = (q_1,\ldots,q_p)$ and $U_l = q_l'\varepsilon_n$, so that $T_n(R'_j(x_{sd},x_{td})) = \sum_{l=1}^p U_l^2$, and it suffices to bound each $U_l$. Noting that $p = \mathrm{trace}(H_n(R'_j(x_{sd},x_{td}))) = \sum_{l=1}^p\|q_l\|^2$, we have $\|q_l\|^2 = q_l'q_l \le p \le p_0$, $l = 1,\ldots,p$. By Lemma 3.1, with $t_0 = T_0/p_0$,
\[
\sum_{s<t} P\{|U_l| > 3p_0\ln n/T_0 \mid X_n\} \le \sum_{s<t} 2\exp\Big(-\frac{T_0}{p_0}\cdot\frac{3p_0\ln n}{T_0}\Big)\exp\big(c_0(T_0/p_0)^2p_0\big) \le n(n-1)n^{-3}\exp(c_0T_0^2/p_0) \to 0,
\]
as $n \to \infty$, where $c_0$ is the constant specified in Lemma 3.1. Finally, by appealing to the dominated convergence theorem, we obtain the desired result without conditioning. □
Proposition 3.1'. Consider the segmented regression model (3.1).
(i) For any subset $B$ of the domain of $x_1$ and any $j$, $S_n(B\cap R^0_j) = \varepsilon_n'(B\cap R^0_j)\varepsilon_n(B\cap R^0_j) - T_n(B\cap R^0_j)$.
(ii) Let $\{B_i\}_{i=1}^{m+1}$ be a partition of the domain of $x_1$, where $m$ is a finite positive integer. Then, for each $j$,
\[
\sum_{i=1}^{m+1} S_n(B_i\cap R^0_j) = \varepsilon_n'(R^0_j)\varepsilon_n(R^0_j) - \sum_{i=1}^{m+1}T_n(B_i\cap R^0_j)
\]
for all $\{B_i\}$. Further, if $B_i = \{x_1 : \tau_{i-1} < x_{1d} \le \tau_i\}$ for $d \ne d^0$, then Assumption 3.0 implies
\[
\sum_{i=1}^{m+1} S_n(B_i\cap R^0_j) = \varepsilon_n'(R^0_j)\varepsilon_n(R^0_j) + O_p(\ln^2 n)
\]
uniformly for all $\tau_1,\ldots,\tau_m$ such that $-\infty = \tau_0 < \tau_1 < \cdots < \tau_m < \tau_{m+1} = \infty$.

Proof. (i) Denote $A = B\cap R^0_j$. Then
\[
S_n(A) = Y_n'(I_n(A) - H_n(A))Y_n = (X_n(A)\beta^0_j + \varepsilon_n(A))'(I_n(A) - H_n(A))(X_n(A)\beta^0_j + \varepsilon_n(A)).
\]
Since $X_n'(A)H_n(A)X_n(A) = X_n'(A)X_n(A)$ and $H_n(A)$ is idempotent, we have $[X_n(A) - H_n(A)X_n(A)]'[X_n(A) - H_n(A)X_n(A)] = 0$ and hence $H_n(A)X_n(A) = X_n(A)$. Thus all terms involving $\beta^0_j$ cancel, and $S_n(A) = \varepsilon_n'(A)\varepsilon_n(A) - \varepsilon_n'(A)H_n(A)\varepsilon_n(A) = \varepsilon_n'(A)\varepsilon_n(A) - T_n(A)$.
(ii) By (i),
\[
\sum_{i=1}^{m+1} S_n(B_i\cap R^0_j) = \sum_{i=1}^{m+1}\big[\varepsilon_n'(B_i\cap R^0_j)\varepsilon_n(B_i\cap R^0_j) - T_n(B_i\cap R^0_j)\big] = \varepsilon_n'(R^0_j)\varepsilon_n(R^0_j) - \sum_{i=1}^{m+1}T_n(B_i\cap R^0_j).
\]
If $B_i = \{x_1 : \tau_{i-1} < x_{1d} \le \tau_i\}$, denote $B_i\cap R^0_j$ by $R'_j(\tau_{i-1},\tau_i)$ for all $i$. Lemma 3.2' implies $\sum_{i=1}^{m+1}T_n(B_i\cap R^0_j) = \sum_{i=1}^{m+1}T_n(R'_j(\tau_{i-1},\tau_i)) \le (m+1)\sup_{\alpha<\eta}T_n(R'_j(\alpha,\eta)) = O_p(\ln^2 n)$ uniformly for all $-\infty < \tau_1 < \cdots < \tau_m < \infty$. □

Lemma 3.3'. Let $A$ be a subset of the domain of $x_1$ such that both $E[x_1x_1'1(x_1\in A\cap R^0_r)]$ and $E[x_1x_1'1(x_1\in A\cap R^0_{r+1})]$ are positive definite. Then, under Assumption 3.0,
\[
\big[S_n(A) - S_n(A\cap R^0_r) - S_n(A\cap R^0_{r+1})\big]/n \to C_r \quad \text{for some } C_r > 0, \quad \text{as } n \to \infty, \quad r = 1,\ldots,l^0.
\]
Proof. It suffices to prove the result when $l^0 = 1$; for notational simplicity, we omit the subscripts and superscripts 0 in this proof. Let $X_j^* = X_n(A\cap R_j)$, $\varepsilon_j^* = \varepsilon_n(A\cap R_j)$, $j = 1,2$, $X^* = X_1^* + X_2^*$, $\varepsilon^* = \varepsilon_1^* + \varepsilon_2^*$ and $\bar\beta = (X^{*\prime}X^*)^-X^{*\prime}Y_n$. As in ordinary regression,
\[
S_n(A) = \|X_1^*(\beta_1-\bar\beta) + X_2^*(\beta_2-\bar\beta) + \varepsilon^*\|^2 = \|X_1^*(\beta_1-\bar\beta)\|^2 + \|X_2^*(\beta_2-\bar\beta)\|^2 + \|\varepsilon^*\|^2 + 2\varepsilon^{*\prime}X_1^*(\beta_1-\bar\beta) + 2\varepsilon^{*\prime}X_2^*(\beta_2-\bar\beta).
\]
It then follows from the strong law of large numbers for stationary ergodic processes that, as $n \to \infty$, $\frac1nX^{*\prime}X^* \to E\{x_1x_1'1(x_1\in A)\} > 0$, $\frac1nX_j^{*\prime}X_j^* \to E\{x_1x_1'1(x_1\in A\cap R_j)\} > 0$, $j = 1,2$, and $\frac1nX^{*\prime}Y_n \to E\{y_1x_11(x_1\in A)\}$. Therefore $\bar\beta \to \beta^* := \{E[x_1x_1'1(x_1\in A)]\}^{-1}E[y_1x_11(x_1\in A)]$, and, as in the proof of Lemma 3.3, $\frac1n\varepsilon^{*\prime}X_j^*(\bar\beta-\beta^*) \to 0$, $j = 1,2$. Thus $\frac1nS_n(A)$ has a finite limit, this limit being
\[
(\beta_1-\beta^*)'E\big(x_1x_1'1(x_1\in A\cap R_1)\big)(\beta_1-\beta^*) + (\beta_2-\beta^*)'E\big(x_1x_1'1(x_1\in A\cap R_2)\big)(\beta_2-\beta^*) + \sigma^2P\{x_1\in A\}.
\]
It remains to show that $\frac1nS_n(A\cap R_j)$ converges to $\sigma^2P\{x_1\in A\cap R_j\}$, $j = 1,2$, and that at least one of the two quadratic forms above is positive. The latter is a direct consequence of the assumed conditions since $\beta_1 \ne \beta_2$. For the former, by Proposition 3.1' (i), $S_n(A\cap R_1) = \varepsilon_1^{*\prime}\varepsilon_1^* - T_n(A\cap R_1)$; the strong law of large numbers implies $\frac1n\varepsilon_1^{*\prime}\varepsilon_1^* \to \sigma^2P\{x_1\in A\cap R_1\}$ and $\frac1n\varepsilon_1^{*\prime}X_1^* \to 0$, while $W := \lim_n\frac1nX_1^{*\prime}X_1^*$ is positive definite; therefore
\[
\frac1nT_n(A\cap R_1) = \Big(\frac1n\varepsilon_1^{*\prime}X_1^*\Big)\Big(\frac1nX_1^{*\prime}X_1^*\Big)^-\Big(\frac1nX_1^{*\prime}\varepsilon_1^*\Big) \to 0\cdot W^{-1}\cdot 0 = 0,
\]
and hence $\frac1nS_n(A\cap R_1) \to \sigma^2P\{x_1\in A\cap R_1\}$. The same argument shows $\frac1nS_n(A\cap R_2) \to \sigma^2P\{x_1\in A\cap R_2\}$. This completes the proof. □

Proof of Theorem 3.6. For $d = d^0$, by Lemma 3.4 (ii), $\frac1nS_n^{d^0} = \frac1n\varepsilon_n'\varepsilon_n + O_p(\ln^2 n/n) \to \sigma_0^2$. Thus, it suffices to show, for $d \ne d^0$, that $\frac1nS_n^d \ge \sigma_0^2 + C$ for some constant $C > 0$ with probability approaching 1.
Again, $l^0 = 1$ is assumed for simplicity. If $d \ne d^0$, by the identifiability of $d^0$ and Theorem 2.1, for any partition $\{R_j^d\}_{j=1}^{L+1}$ there exist $r, s \in \{1,\ldots,L+1\}$ such that $R_r^d \supseteq A_s^d$, where $A_s^d = \{x_1 : x_{1d} \in [a_s,b_s]\}$ is defined in Theorem 2.1. Let $B_s = \{(\tau_1,\ldots,\tau_L) : R_r^d \supseteq A_s^d \text{ for some } r\}$. Then any $(\tau_1,\ldots,\tau_L)$ belongs to $B_s$ for at least one $s \in \{1,\ldots,L+1\}$. Since $\hat d$ is chosen such that $\hat\sigma^2_{\hat d} \le \hat\sigma^2_d$ for all $d$, it suffices to show that for $d \ne d^0$ and each $s$, there exists $C_s > 0$ such that
\[
\inf_{(\tau_1,\ldots,\tau_L)\in B_s} \frac1nS_n^d(\tau_1,\ldots,\tau_L) \ge \sigma_0^2 + C_s \tag{3.12}
\]
with probability approaching 1 as $n \to \infty$. For any $(\tau_1,\ldots,\tau_L) \in B_s$, let $R''_1 = \{x : x_d \in (\tau_{r-1},a_s)\}$ and $R''_2 = \{x : x_d \in (b_s,\tau_r]\}$, so that $R_r^d = A_s^d \cup R''_1 \cup R''_2$. Note that the total sum of squared errors decreases as the partition becomes finer. By Proposition 3.1' and the strong law of large numbers,
\[
\frac1nS_n^d(\tau_1,\ldots,\tau_L) \ge \frac1n\Big[\sum_{j\ne r}S_n(R_j^d) + S_n(A_s^d) + S_n(R''_1) + S_n(R''_2)\Big]
\]
\[
\ge \frac1n\Big\{\sum_{j\ne r}\big[S_n(R_j^d\cap R^0_1) + S_n(R_j^d\cap R^0_2)\big] + \sum_{i=1,2}\big[S_n(R''_i\cap R^0_1) + S_n(R''_i\cap R^0_2)\big] + \big[S_n(A_s^d\cap R^0_1) + S_n(A_s^d\cap R^0_2)\big]\Big\} + \frac1n\big[S_n(A_s^d) - S_n(A_s^d\cap R^0_1) - S_n(A_s^d\cap R^0_2)\big] \tag{3.13}
\]
\[
= \frac1n\big\{\varepsilon_n'(R^0_1)\varepsilon_n(R^0_1) + \varepsilon_n'(R^0_2)\varepsilon_n(R^0_2) + O_p(\ln^2 n)\big\} + \frac1n\big[S_n(A_s^d) - S_n(A_s^d\cap R^0_1) - S_n(A_s^d\cap R^0_2)\big] = \sigma_0^2 + o_p(1) + \frac1n\big[S_n(A_s^d) - S_n(A_s^d\cap R^0_1) - S_n(A_s^d\cap R^0_2)\big],
\]
uniformly in $(\tau_1,\ldots,\tau_L)$. Now it remains to show that $\frac1n[S_n(A_s^d) - S_n(A_s^d\cap R^0_1) - S_n(A_s^d\cap R^0_2)] \ge C_s$ for some $C_s > 0$ with probability approaching 1. By Theorem 2.1, $E[x_1x_1'1(x_1\in A_s^d\cap R^0_i)]$, $i = 1,2$, are positive definite. Applying Lemma 3.3', we obtain the desired result. □

To prove Theorem 3.7, we first define the $k$th percentile of a distribution function $F$ as $p_k := \inf_t\{t : F(t) \ge k/100\}$. Let $p_j^d$ and $\hat p_j^d$ be the $j\cdot100/(2L+2)$th percentiles of $F^d$ and $F_n^d$ respectively, where $F^d$ is the distribution function of $x_{1d}$ and $F_n^d$ is the empirical distribution function of $\{x_{td}\}$, $j = 1,\ldots,2L+2$. If $x_{1d}$ has a positive density function over a neighborhood of $p_j$ for each $j$, then by Theorem 2.3.1 of Serfling (1980, p. 75), $\hat p_j$ converges to $p_j$ almost surely for every $j$. Now we are ready to introduce three lemmas required by the proof of Theorem 3.7. In these three lemmas we shall omit "$d$" in $\hat p_j^d$ and $p_j^d$ for notational simplicity.

Lemma 3.8. Suppose $(z_t, x_{td})$ is a strictly stationary ergodic process and the marginal cdf of $x_{td}$ has a bounded derivative at $p_j$ for all $j$. If $\hat r_j - p_j = o_p(1)$, $j = 1,\ldots,2L+2$, and $E|z_t|^u < \infty$ for some $u > 1$, then
\[
\frac1n\sum_{t=1}^n z_t\big(1(x_{td}\in(\hat r_{j-1},\hat r_j]) - 1(x_{td}\in(p_{j-1},p_j])\big) = o_p(1).
\]
Proof. Since $\hat r_j - p_j = o_p(1)$, by the elementary fact noted before Lemma 3.6 there is a positive sequence $a_n \to 0$ with $\hat r_j - p_j = O_p(a_n)$ for all $j$; the claim then follows exactly as in the proof of Lemma 3.6, with the bounded derivative of the cdf at $p_j$ supplying the Lipschitz bound. □

Lemma 3.9. Under the conditions of Theorem 3.7, let $R_j = (p_{j-1},p_j]$ and $\hat R_j = (\hat p_{j-1},\hat p_j]$, and for $i = 1,2$ let $X_{ij}^*$ and $\hat X_{ij}$ select the rows of $X_n$ with $x_t \in R^0_i$ and $x_{td}$ in $R_j$, respectively $\hat R_j$, with $\varepsilon_j^*$, $\hat\varepsilon_j$ defined analogously. Then
(i) $\frac1n\hat X_{ij}'\hat X_{ij} = \frac1nX_{ij}^{*\prime}X_{ij}^* + o_p(1)$, $i = 1,2$;
(ii) $\frac1n\hat\varepsilon_j'\hat\varepsilon_j = \frac1n\varepsilon_j^{*\prime}\varepsilon_j^* + o_p(1)$; and
(iii) $\frac1n\hat X_{ij}'\hat\varepsilon_j = \frac1nX_{ij}^{*\prime}\varepsilon_j^* + o_p(1) = O_p(n^{-1/2})$.

Proof. (i) For any $a \ne 0$, take $z_t = (a'x_t)^21(x_t\in R^0_i)$ and apply Lemma 3.8. (ii) Take $z_t = \varepsilon_t^21(x_t\in R^0_i)$; Lemma 3.8 implies the desired result. (iii) Take $z_t = a'x_t\varepsilon_t$ for any $a$; Lemma 3.8 implies $\frac1n[\hat X_{ij}'\hat\varepsilon_j - X_{ij}^{*\prime}\varepsilon_j^*] = o_p(1)$. So it suffices to show that $\frac1nX_{ij}^{*\prime}\varepsilon_j^* = O_p(n^{-1/2})$. For any $a \ne 0$,
\[
\frac1na'X_{ij}^{*\prime}\varepsilon_j^* = \frac1n\sum_{t=1}^n a'x_t\varepsilon_t1(x_t\in R^0_i,\ x_{td}\in R_j),
\]
where $\{a'x_t\varepsilon_t1(x_t\in R^0_i,\ x_{td}\in R_j)\}$ is a martingale difference sequence. By the central limit theorem for martingale difference sequences (Billingsley, 1968), $a'(\frac1nX_{ij}^{*\prime}\varepsilon_j^*) = O_p(n^{-1/2})$. □
Lemma 3.10. Let $n(A) = \sum_{t=1}^n 1(x_t\in A)$ for any set $A$ in the domain of $x_1$. Then, under the conditions of Theorem 3.7, for $j = 1,\ldots,2L+2$:
(i) $\frac1nn(\hat R_j) = \frac1nn(R_j) + o_p(1) = \frac{1}{2L+2} + o_p(1)$;
(ii) $\hat\beta_r = \hat\beta_p + o_p(1) = \beta_p + o_p(1)$, where $\hat\beta_r = (\hat X^{*\prime}\hat X^*)^-\hat X^{*\prime}Y_n$, $\hat\beta_p = (X^{*\prime}X^*)^-X^{*\prime}Y_n$ and $\beta_p = \{E[x_1x_1'1(x_1\in R_j)]\}^{-1}E[y_1x_11(x_1\in R_j)]$, $\hat X^*$ and $X^*$ selecting the rows of $X_n$ with $x_{td}\in\hat R_j$ and $x_{td}\in R_j$ respectively;
(iii) $\frac1n[S_n(\hat R_j) - S_n(R_j)] = o_p(1)$; and
(iv) $S_n(\hat R_j)/n(\hat R_j) - S_n(R_j)/n(R_j) = o_p(1)$.

Proof. Without loss of generality, we can assume $P(R_j\cap R^0_i) > 0$, $i = 1,2$.
(i) Applying Lemma 3.8 with $z_t \equiv 1$ gives $\frac1nn(\hat R_j) = \frac1nn(R_j) + o_p(1)$. By the strong law of large numbers for ergodic processes, $\frac1nn(R_j) = \frac1n\sum_t 1(x_{td}\in R_j) = P(x_{td}\in R_j) + o_p(1) = \frac{1}{2L+2} + o_p(1)$.
(ii) By the strong law of large numbers for ergodic sequences, $\frac1nX^{*\prime}X^* \to E[x_1x_1'1(x_1\in R_j)] > 0$ and $\frac1nX^{*\prime}Y_n \to E[y_1x_11(x_1\in R_j)]$; hence $\hat\beta_p \to \beta_p$ as $n \to \infty$. Writing $X^{*\prime}Y_n = X_{1j}^{*\prime}X_{1j}^*\beta^0_1 + X_{2j}^{*\prime}X_{2j}^*\beta^0_2 + X^{*\prime}\varepsilon_j^*$ and analogously for $\hat X^{*\prime}Y_n$, Lemma 3.9 (i) and (iii) imply
\[
\Big(\frac1n\hat X^{*\prime}\hat X^*\Big)^- - \Big(\frac1nX^{*\prime}X^*\Big)^- = o_p(1) \quad \text{and} \quad \frac1n\hat X^{*\prime}Y_n - \frac1nX^{*\prime}Y_n = o_p(1).
\]
Thus
\[
\hat\beta_r - \hat\beta_p = \Big[\Big(\frac1n\hat X^{*\prime}\hat X^*\Big)^- - \Big(\frac1nX^{*\prime}X^*\Big)^-\Big]\frac1n\hat X^{*\prime}Y_n + \Big(\frac1nX^{*\prime}X^*\Big)^-\Big[\frac1n\hat X^{*\prime}Y_n - \frac1nX^{*\prime}Y_n\Big] = o_p(1)O_p(1) + O_p(1)o_p(1) = o_p(1).
\]
(iii) Expanding,
\[
\frac1nS_n(\hat R_j) = (\hat\beta_r-\beta^0_1)'\frac1n\hat X_{1j}'\hat X_{1j}(\hat\beta_r-\beta^0_1) + (\hat\beta_r-\beta^0_2)'\frac1n\hat X_{2j}'\hat X_{2j}(\hat\beta_r-\beta^0_2) + \frac1n\hat\varepsilon_j'\hat\varepsilon_j + \frac2n\hat\varepsilon_j'\big[\hat X_{1j}(\hat\beta_r-\beta^0_1) + \hat X_{2j}(\hat\beta_r-\beta^0_2)\big],
\]
and similarly for $\frac1nS_n(R_j)$ with $\hat\beta_p$ in place of $\hat\beta_r$. By (ii) and Lemma 3.9 (i)-(iii), corresponding terms differ by $o_p(1)$, whence $\frac1n[S_n(\hat R_j) - S_n(R_j)] = o_p(1)$.
(iv) By (i) and (iii),
\[
\frac{S_n(\hat R_j)}{n(\hat R_j)} - \frac{S_n(R_j)}{n(R_j)} = \frac{n}{n(\hat R_j)}\cdot\frac{S_n(\hat R_j)}{n} - \frac{n}{n(R_j)}\cdot\frac{S_n(R_j)}{n} = o_p(1). \ \Box
\]

Lemma 3.10 sets down the foundation for Theorem 3.7 and will be used repeatedly in its proof.

Proof of Theorem 3.7. Let $d \ne d^0$. Suppose a linear model is fitted on $\hat R_j^d = \{x_1 : x_{1d} \in (\hat p^d_{j-1},\hat p^d_j]\}$, with mean squared error $\hat\sigma_j^2(d) = S_n(\hat R_j^d)/n(\hat R_j^d)$. Under the assumed conditions, Lemma 3.3' and Lemma 3.10 (i) imply
\[
\frac1nS_n(R_j^d) - \frac1n\big[S_n(R_j^d\cap R^0_1) + S_n(R_j^d\cap R^0_2)\big] \to C_j \quad \text{for some } C_j > 0.
\]
Proposition 3.1' (i) and Lemma 3.2' imply that the second term on the left-hand side equals
\[
\frac1n\big[\varepsilon_n'(R_j^d\cap R^0_1)\varepsilon_n(R_j^d\cap R^0_1) + \varepsilon_n'(R_j^d\cap R^0_2)\varepsilon_n(R_j^d\cap R^0_2)\big] + O_p(\ln^2 n/n) = \frac1n\varepsilon_n'(R_j^d)\varepsilon_n(R_j^d) + O_p(\ln^2 n/n),
\]
which converges to $\sigma_0^2P(x_{1d}\in R_j^d) = \sigma_0^2/(2L+2)$ by the strong law of large numbers. Thus, by Lemma 3.10 (iv) and (i), $P(\hat\sigma_j^2(d) \ge \sigma_0^2 + C_j/2) \to 1$ as $n \to \infty$ (after absorbing the factor $2L+2$ into $C_j$). Since this holds for every $j$, the Method 2 criterion for $d$, being built from the cell-wise mean squared errors $\hat\sigma_j^2(d)$, satisfies, by Lemma 3.10 (iv),
\[
\hat\sigma^2(d) \ge \sigma_0^2 + C + o_p(1) \quad \text{for some } C > 0.
\]
By Lemma 3.10 (i), the corresponding quantity for $d = d^0$ satisfies $\hat\sigma^2(d^0) = \sigma_0^2 + o_p(1)$, since at most $l^0 \le L$ of the $2L+2$ percentile cells of $x_{td^0}$ can contain a true threshold. Hence $P(\hat\sigma^2(d^0) < \hat\sigma^2(d) \text{ for all } d \ne d^0) \to 1$, that is, $P(\hat d = d^0) \to 1$ as $n \to \infty$. □

Yao (1989) shows that if $E(\epsilon_t^{2m}) < \infty$ for a positive integer $m$, the minimizer of $\log\hat\sigma_l^2 + l\,C_n/n$ over $l \le L$, the known upper bound of $l^0$, is a consistent estimate of $l^0$, where $\{C_n\}$ is any sequence satisfying $C_nn^{-2/m} \to \infty$ and $C_n/n \to 0$ as $n \to \infty$. Four sets of specifications of this model are experimented with:
(f) $\tau^0_1 = 1/3$, $\tau^0_2 = 2/3$, $\beta^0_{10} = 0$, $\beta^0_{20} = 2$, $\beta^0_{30} = 4$, $\epsilon_t \sim DE(0, 1/\sqrt2)$;
(g) $\tau^0_1 = 1/3$, $\tau^0_2 = 2/3$, $\beta^0_{10} = 0$, $\beta^0_{20} = 2$, $\beta^0_{30} = 4$, $\epsilon_t \sim t_7/\sqrt{7/5}$;
(h) $\tau^0_1 = 1/3$, $\tau^0_2 = 2/3$, $\beta^0_{10} = 0$, $\beta^0_{20} = 1$, $\beta^0_{30} = -1$, $\epsilon_t \sim DE(0, 1/\sqrt2)$; and
(i) $\tau^0_1 = 1/3$, $\tau^0_2 = 2/3$, $\beta^0_{10} = 0$, $\beta^0_{20} = 1$, $\beta^0_{30} = -1$, $\epsilon_t \sim t_7/\sqrt{7/5}$,
where $t_7$ refers to the Student-$t$ distribution with 7 degrees of freedom. In each of these cases the variances of $\epsilon_t$ are scaled to 1, so the noise levels are comparable. Note that for $\epsilon_t \sim t_7/\sqrt{7/5}$, $E(\epsilon_t^6) < \infty$ and $E(\epsilon_t^8) = \infty$: it barely satisfies Yao's (1989) condition with $m = 3$ and does not satisfy our exponential boundedness condition. In Yao's (1989) paper, $\{C_n\}$ is not specified, so we have to choose a $\{C_n\}$ satisfying the conditions. The simplest $\{C_n\}$ is $c_1n^a$. With $m = 3$, we need $n^{a-2/3} \to \infty$, implying $a > 2/3$. (We shall call the criterion with such a $C_n$ "YC" hereafter.) To reduce the potential risk of underestimating $l^0$, we round $2/3$ up to $0.7$ as our choice of $a$. The $\delta_0$ and $c_0$ in MIC are chosen as 0.1 and 0.299 respectively, for the reasons previously mentioned. $c_1$ is chosen by the same method as we used to choose $c_0$, that is, forcing $\log n_0 = c_1n_0^a$ and solving for $c_1$. With $n_0 = 20$ and $a = 0.7$, we get $c_1 = 0.368$.
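This calibration amounts to solving two one-line equations; the following check (ours, not part of the thesis) reproduces the two constants quoted above.

```python
import numpy as np

n0, delta0, a = 20, 0.1, 0.7
c0 = np.log(n0) / np.log(n0) ** (2 + delta0)  # forces log(n0) = c0*(log n0)^(2+delta0)
c1 = np.log(n0) / n0 ** a                     # forces log(n0) = c1*n0^a
print(round(c0, 3), round(c1, 3))             # 0.299 0.368
```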
The results for model selection are reported in Tables 3.3-3.4. Table 3.3 tabulates the empirical distributions of the estimated $l^0$ for different sample sizes. From the table, it is seen that in most cases MIC and YC perform significantly better than SC, and with a sample size of 450, MIC and YC correctly identify $l^0$ in more than 90% of the cases. For Models (f) and (g), which are more easily identified, YC makes more correct identifications than MIC; but for Models (h) and (i), which are harder to identify, MIC makes more correct identifications. From Theorem 3.1 and the remark after its proof, it is known that both MIC and YC are consistent for the models with double exponential noise, and this theory seems to be confirmed by our simulation. The effect on model selection of varying the noise distribution does not seem significant. This may be due to the scaling of the noises by their variances, since the variance is more sensitive to tail probabilities than are quantiles or the mean absolute deviation. Because most people are familiar with the use of variance as an index of dispersion, we adopt it, although other measures may reveal the tail effect on model identification better for our moderate sample sizes.

Table 3.4 shows the estimated thresholds and their standard deviations for Models (f), (g), (h) and (i), conditional on $\hat l = l^0$. Overall, they are quite accurate, even when the sample size is 50. For Models (h) and (i), the accuracy of $\hat\tau_2$ is much better than that of $\hat\tau_1$, since $\tau^0_2$ is much easier to identify by the model specification. In general, for models which are more difficult to identify, a larger sample size is needed to achieve the same accuracy.

Finally, the small sample performance of the two methods given in Section 2.2 for the identification of the segmentation variable is examined. The experiment is carried out for Models (b), (d) and (e). Among Models (a)-(e), Models (b) and (e) seem to be the most difficult in terms of identifying $l^0$, and are also expected to be difficult for identifying $d$. Note that for all the models considered, $d$ is asymptotically identifiable w.r.t. any $L \ge 1$ by Corollary 2.2. For $L = 2$, 100 replications are simulated with sample sizes of 50, 100 and 200. With sample sizes of 100 and 200, both methods identify $d^0$ correctly in every case. With a sample size of 50, the correct identification rate of Method 1 is 100% for Models (b) and (d), and 96% for Model (e); for Method 2 the rates are 98%, 94% and 88% for Models (b), (d) and (e), respectively. From these results, we observe that for sample sizes of 100 or more, the two methods perform very well, and for a sample size of 50, Method 1 performs better than Method 2. This suggests that if the sample size is small, Method 1 may be more reliable; otherwise, Method 2 gives a good estimate with high computational efficiency.

3.4 General remarks

In this chapter, we proved the consistency of the estimators given in Chapter 2. In addition, when the model is discontinuous at the thresholds, we proved that the estimated thresholds converge rapidly to their true values, at the rate of $\ln^2 n/n$. Consequently, the estimated regression coefficients and the estimated variance of the noise were shown to have the same asymptotic distributions as in the case where the thresholds are known, under the specified conditions. We put emphasis on the case where the model is discontinuous for the following two reasons.

First, if the model is continuous at the thresholds, then for any $z \in R^p$ and $x' = (1, z')$, $x'\beta^0_j = x'\beta^0_{j+1}$ if $x_d = \tau^0_j$, $j = 1,\ldots,l^0$. This implies, for all $j$,
\[
\sum_{i\ne 0,d}(\beta^0_{(j+1)i} - \beta^0_{ji})x_i = \beta^0_{j0} - \beta^0_{(j+1)0} + (\beta^0_{jd} - \beta^0_{(j+1)d})\tau^0_j.
\]
Since this holds for any $x$ such that $x_d = \tau^0_j$, we can conclude that $\beta^0_{(j+1)i} = \beta^0_{ji}$ for $i \ne 0, d$ and all $j$. By aggregating the data over $x_d$, we obtain an ordinary linear regression problem, and hence the $\beta^0_{ji}$ ($i \ne 0, d$, $j = 1,\ldots,l^0+1$) can be estimated by least squares estimates with all the properties given by the classical theory. The residuals can then be used to fit a one-dimensional continuous piecewise linear model to estimate the $\beta^0_{ji}$ ($i = 0, d$, $j = 1,\ldots,l^0+1$). For this one-dimensional continuous problem, Feder (1975a) shows that the restricted (by continuity) least squares estimates of the thresholds and the regression coefficients are asymptotically normally distributed when the covariates are viewed as nonrandom. So the problem is essentially solved except for a few technical points. In the Appendix of this chapter, we shall use Feder's idea to show that for a multidimensional continuous model with random covariates, the unrestricted least squares estimates possess similar properties: the $\hat\beta_j$ are asymptotically normally distributed, and so are the threshold estimates given by the $\hat\beta_j$'s instead of least squares.

Second, noting that continuity requires $\beta^0_{(j+1)i} = \beta^0_{ji}$ for $i \ne 0, d$ and all $j$, it would seem that a response surface over a multidimensional space will rarely be well approximated by such a continuous piecewise model.
Problems where the models are either continuous at all thresholds or discontinuous at all thresholds have now been solved. The next question is what happens if the model is continuous at some thresholds and discontinuous at others. This problem can be treated as follows. First, decide whether the model is continuous at each threshold. This can be done by comparing $\hat\tau_j$, the least squares estimate of $\tau^0_j$, with $\tilde\tau_j$, the solution of $\hat\beta_{j0} - \hat\beta_{(j+1)0} = (\hat\beta_{(j+1)d} - \hat\beta_{jd})\tau_j$. By the established convergence of the $\hat\beta_j$'s and the $\hat\tau_j$'s, if the model were discontinuous at $\tau^0_j$, then $\hat\tau_j$ would converge to $\tau^0_j$, while either $\hat\beta_{ji}$ and $\hat\beta_{(j+1)i}$ would converge to different values for some $i \ne 0, d$, or $\tilde\tau_j$ would converge to some point different from $\tau^0_j$, or both. Thus, a large difference between $\hat\tau_j$ and $\tilde\tau_j$, or between $\hat\beta_{ji}$ and $\hat\beta_{(j+1)i}$ for some $i \ne 0, d$, would indicate discontinuity. Then, by noting that Theorem 3.4 does not assume the model is discontinuous at all the $\tau^0_j$'s, we see that $\hat\tau_j - \tau^0_j = O_p(\ln^2 n/n)$ for all $\tau^0_j$'s which are thresholds of model discontinuity. By the proof of Theorem 3.5, these $\hat\tau_j$'s can replace the corresponding $\tau^0_j$'s without changing the asymptotic distributions of the other parameter estimates. So, between each successive pair of thresholds at which the model is discontinuous, the asymptotic results for a continuous model can be applied. In summary, regardless of whether the model is continuous or not, we can always obtain estimates of the $\tau^0_j$'s which converge to their true values no slower than $O_p(\ln n/\sqrt n)$, and the estimated regression coefficients always have asymptotically normal distributions.

Note that most results given in this chapter do not require that $x_1$ have a joint density which is everywhere positive over its domain. Hence, one component of $x_1$ could be a function of other components, as long as they are not collinear. In particular, $x_1$ could be a basis of $p$th order polynomials.

Since our estimation procedure is computationally intensive, one may worry about its computational feasibility. However, we do not think this is a serious problem, especially with the ever growing speed of modern computers. The simulations reported in the last section were done on a Sparc 2 workstation. Even with our inefficient program, which repeatedly inverts $(p+1)\times(p+1)$ matrices, 100 runs for Model (a) consume only about 9 minutes of CPU time with a sample size of $n = 50$ and only about 35 minutes with $n = 100$. Hence, each run would consume approximately 0.35 minutes of CPU time if $n = 100$. A more efficient program is under development; it uses an iterative method to avoid matrix inversion. A preliminary test shows that, for the same problems mentioned above, the CPU time consumed by this program is about 15 and 40 seconds for $n = 50$ and $100$, respectively. Hence, each run would only take a few seconds of CPU time. Unfortunately, further modifications are needed for the new program to counter the problem of error accumulation for large sample sizes. Nevertheless, even with our inefficient program, we believe our procedure is computationally feasible if $L$ is small and $n$ is not too large (say, $L \le 5$, $n \le 1000$). And with a better program and a faster computer, the computation time could be substantially reduced, making much more complicated model fitting computationally feasible.

Finally, as we mentioned in Section 3.1, the choice of $\delta_0$ and $c_0$ in MIC needs further study.
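The continuity diagnostic described in the first paragraph of this section reduces to comparing two numbers per threshold once the cell-wise coefficients are in hand. A small illustration (ours; coefficient vectors are indexed with 0 for the intercept and d for the segmentation coordinate of $x = (1, z')'$, and the tolerance is a user-supplied judgment call):

```python
def implied_threshold(beta_j, beta_j1, d):
    """tau_tilde_j solving beta_j0 - beta_{j+1,0} = (beta_{j+1,d} - beta_jd)*tau;
    under continuity at the threshold this agrees with tau_hat_j in the limit."""
    return (beta_j[0] - beta_j1[0]) / (beta_j1[d] - beta_j[d])

def looks_discontinuous(beta_j, beta_j1, tau_hat_j, d, tol):
    """Flag a threshold as discontinuous if tau_tilde differs from tau_hat by
    more than tol, or if some slope with index i != 0, d differs across cells."""
    slopes_differ = any(abs(beta_j[i] - beta_j1[i]) > tol
                        for i in range(len(beta_j)) if i not in (0, d))
    return abs(implied_threshold(beta_j, beta_j1, d) - tau_hat_j) > tol or slopes_differ
```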
3.5 Appendix: A discussion of the continuous model

In Section 3.1, we established the asymptotic normality of the coefficient estimators for Model (3.1) when it is discontinuous at the thresholds. In this section, we shall establish the corresponding result for Model (3.1) when it is everywhere continuous. If Assumptions 3.0-3.1 are assumed, then by Theorem 3.1 attention can be restricted to $\{\hat l = l^0\}$. First, we shall show that the $\hat\beta_j$'s converge at a rate no slower than $O_p(n^{-1/2}\ln n)$, by a method similar to that of Feder (1975a). Now let $\theta = (\beta_1',\ldots,\beta'_{l^0+1})'$; $\theta^0 = (\beta^{0\prime}_1,\ldots,\beta^{0\prime}_{l^0+1})'$; $\xi = (\theta',\tau_1,\ldots,\tau_{l^0})'$; $\xi^0 = (\theta^{0\prime},\tau^0_1,\ldots,\tau^0_{l^0})'$;
\[
\Xi = \{\xi : \beta_j \ne \beta_{j+1},\ j = 1,\ldots,l^0;\ -\infty < \tau_1 < \cdots < \tau_{l^0} < \infty\};
\]
$\mu(\xi;x) = x'\big[\sum_j 1(x_d\in(\tau_{j-1},\tau_j])\beta_j\big]$; and $\mu(\xi;X_k) = (\mu(\xi;x_1),\ldots,\mu(\xi;x_k))'$, where $X_k = (x_1,\ldots,x_k)'$.

Assuming no measurement errors, Feder (1975a) seeks the values at which the response must be observed to uniquely determine the model over the domain of the covariates. To find these values, he introduces a concept of identifiability. We adapt his concept to our problem.

Definition. For any $\xi^* = (\theta^{*\prime},\tau_1^*,\ldots,\tau^*_{l^0})' \in \Xi$, the parameter $\theta = (\beta_1',\ldots,\beta'_{l^0+1})'$ is identified at $\mu^* = \mu(\xi^*,X_k)$ by $X_k$ if the equation $\mu(\xi;X_k) = \mu^*$ uniquely determines $\theta = \theta^*$.

Next we prove a lemma adapted from Feder (1975a). The proof follows that of Feder (1975a).

Lemma A3.1. If $\theta^0$ is identified at $\mu^0 = \mu(\xi^0,X_k)$ by $X_k = (x_1,\ldots,x_k)'$, then there exist neighborhoods $M$ of $\mu(\xi^0,X_k)$ and $T$ of $X_k$ such that (a) for all ($k$-dimensional) vectors $\bar\mu = (\bar\mu_1,\ldots,\bar\mu_k)' \in M$ and $(p+1)\times k$ matrices $X_k^* \in T$ such that $\bar\mu$ can be represented as $\bar\mu = \mu(\xi,X_k^*)$ for some $\xi \in \Xi$, $\theta$ is identified at $\bar\mu$ by $X_k^*$; and (b) the induced transformation $\theta = \theta(\bar\mu;X_k^*)$ satisfies the Lipschitz condition $\|\theta_1 - \theta_2\| \le C\|\bar\mu_1 - \bar\mu_2\|$ for some constant $C > 0$, whenever $X_k^* \in T$ and $\bar\mu_1 = \mu(\xi_1;X_k^*)$, $\bar\mu_2 = \mu(\xi_2;X_k^*) \in M$.

Proof. Since $\theta$ is identified at $\mu^0$ by $X_k$, it follows that for any possible choice of parameters $\tau_1,\ldots,\tau_{l^0}$ consistent with $\theta^0$, for each $j$ there must exist $p+1$ components of $X_k$, namely $x_{j_1},\ldots,x_{j_{p+1}}$, such that $x_{j_id} \in (\tau_{j-1},\tau_j]\cap(\tau^0_{j-1},\tau^0_j]$, $i = 1,\ldots,p+1$, and the matrix $(x_{j_1},\ldots,x_{j_{p+1}})$ is nonsingular. By continuity, the $x_{j_i}$'s may be perturbed slightly without disturbing the nonsingularity of $(x_{j_1},\ldots,x_{j_{p+1}})$. Assertions (a) and (b) follow directly from the properties of nonsingular linear transformations. (Recall that if $\bar\mu = X'\theta$ for a nonsingular $X$, then $\theta = (X')^{-1}\bar\mu$, and hence $\|\theta\|^2 \le \mathrm{tr}(X^{-1\prime}X^{-1})\|\bar\mu\|^2$.) □
H R e m a r k It is clear from the proof that for a continuous model, it is necessary and sufficient to identify 9'^, that within each r-partition, there are p + 1 observations (xj j , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, xj^,^ J such that the matrix X = (xj j , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,Xjp^j) is of full rank. In particular, if z has a positive density over a neighborhood of rj* for each j, then with large n, a Xk exists such that 9 is identified at fi{e\Xk) hyXk. Another concept introduced by Feder (1975a) is called the center of observations. This concept is modified in the next definition to fit our multivariate setup. D \u00C3\u00A9 f i n i t i o n Let z = ( x i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Xp)'. z\u00C2\u00B0 = {x\u00C2\u00B0, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, x^)' is a center of observation if for any ^ > 0, both P({z : ||z - z\u00C2\u00B0 | | < S, Xd < x^}) and P({z : ||z - z\u00C2\u00B0 | | < 6, Xd > x\"}) a-^ e positive. Remark For any a < ?/, if constant vectors z i , - - - ,Zp+ i are centers of observations such that Xtd \u00E2\u0082\u00AC (a, 77), t = l , - - - , p + 1, and the matrix Xp+i = ( x i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Xp+i) is of full rank where Xj = (1,Z;)', by Lemma (A3.1) there exists a neighborhood, T, of Xp+i , such that T C {x : a < xtrf < 77}, P{T) > 0 and X*^^ is of fuU rank if X;^^ 6 T. Hence, for any a / 0 and random vector x , i;[(a 'x)^l(,,e(\u00E2\u0080\u009E,,|)] > ^[(a 'x)2l(x6T)] > 0 implying that \u00C2\u00A3 ^ [ x x ' l ( 2 . ^ \u00C3\u00A7 ( i s positive definite. Therefore, a sufficient condition for As-sumption 3.1 to hold is that for some \u00C3\u00A8 G (0,mini 0. Then under Assumptions 3.0-3.1, min^,gvv |z^(xf)| = Op{lnn/^/n), where />(xO = M l ; x t ) - M e \u00C2\u00B0 ; x t ) . P r o o f Without loss of generality, we can assume P = 1. If we can show that Y,7=i ^ ti'^t) = Op{\n^ n), then for any I ^ C R \" such that P(W) > 0, min^.evv \i>i^t)\ = Op{\nn/y/n). Let be the linear space spanned by the 2(p + 1) column vectors of (A\u00E2\u0080\u009E(\u00E2\u0080\u0094oo,fi) , X \u00E2\u0080\u009E ( f i , o o ) ) , be the linear space spanned by / / ( f \u00C2\u00B0 ;X\u00E2\u0080\u009E ) , and :F+ = :F \u00C2\u00AE X^)] be the direct sum of the two vector spaces. Let Q'^,Q denote the orthogonal projections onto .;\u00C2\u00A3\u00E2\u0080\u00A2+, respectively. Let i>(X\u00E2\u0080\u009E) = (j>(xa), \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, \u00C2\u00A3>(x\u00E2\u0080\u009E))'. Then | | ^ (X\u00E2\u0080\u009E) - \u00C3\u00AA j p = S^ih) < \\\u00C3\u00AAn\\'. Since botli / i ( f \u00C2\u00B0 , X \u00E2\u0080\u009E ) and /x(f ;X\u00E2\u0080\u009E) belong to T\"^, by orthogonality, l K l ; x \u00E2\u0080\u009E ) - g + y n i i ' + i i \u00C3\u00AA + 5 ^ n - F \u00E2\u0080\u009E i P = I K l ; X n ) - F \u00E2\u0080\u009E | | 2 (Xn)\\'^, it remains to show that ||(5\"'\"\u00C3\u00AAn|| = Op{lnn). Without loss of generahty, we can assume that n < T^. Let /3\u00C2\u00B0 = and A/3\u00C2\u00B0 ^ -0^. 
Note that K f \u00C2\u00B0 , A \u00E2\u0080\u009E ) = (X\u00E2\u0080\u009E( -oo , r{ ' ) ,X\u00E2\u0080\u009E( r{ ' , oo ) )4\u00C2\u00B0 = ( X \u00E2\u0080\u009E ( - \u00C5\u0093 , f i ) + X \u00E2\u0080\u009E ( f i , r \u00C2\u00B0 ) , X\u00E2\u0080\u009E( f i , oo) - X \u00E2\u0080\u009E ( f i , rO))/3\u00C2\u00B0 = [ (X\u00E2\u0080\u009E ( - cx ) , f , ) ,X\u00E2\u0080\u009E ( f i , oo ) ) + ( X \u00E2\u0080\u009E ( f i , r \u00C2\u00B0 ) , - X \u00E2\u0080\u009E ( f a , r \u00C2\u00AB ) ) ] ^ \u00C2\u00AB = ( X \u00E2\u0080\u009E ( - ^ , f i ) , X\u00E2\u0080\u009E ( f i , oo))^\u00C2\u00B0 + X\u00E2\u0080\u009E(fa, r\u00C2\u00B0)A/3\u00C2\u00B0. This imphes that T'^ is also generated by the direct sum of T and vector C, where C X \u00E2\u0080\u009E ( f i , r f ) A / 3 \u00C2\u00B0 . By Lemma A3.3, there exists a < 1 such that for sufficiently large n, IC'^I < allClllkll for ah f\ < r \u00C2\u00B0 and g Ci P with probability approaching 1. Since Q{Q^\u00C3\u00A8n) \u00E2\u0080\u0094 Q^n and C'(Q^fn)/IICI| = C'\u00C3\u00AAn/IICII) it follows from Lemma A3.2 that with probability approaching 1, Therefore, if it is shown that ||Q\u00C3\u00AAn|| = Op(lnn) and C'?n/l|C|| = Op(lnra), the desired result obtains. Define X = (\u00C3\u0082:i,\u00C3\u0082'2). Then =\u00C3\u00A8'^X{X'X)-X'X{X'X)-X'\u00C3\u00A8n =\u00C3\u00A8'nX{X'X)-X'ln = ~e'nMX[Xi)-X[\u00C3\u00A8n + \u00C3\u00A8'MX'2X2)-XUn =r\u00E2\u0080\u009E ( - o o , f i ) + r\u00E2\u0080\u009E ( f i , (X)) . Therefore by Lemma 3.2, ||Q\u00C3\u00AAn|| = Op(lnra) uniformly for all fx. We next show that uniformly in n < rJ\", C'\u00C3\u00AAn/||CII = Op(lnn) for ||C|| 7^ 0, where C -^{M-^i) and ^ = ( X \u00E2\u0080\u009E ( - o o , r f ) - X \u00E2\u0080\u009E ( - o o , f i ) ) . Let yt = x'^A/3\u00C2\u00B0. Conditional on X \u00E2\u0080\u009E , we have that AQ%\ 31nn IICII - To 1^\") ^ | A \u00E2\u0080\u009E ) < p J E r = i y t l ( x \u00E2\u0080\u009E j < x . ^ < r \u00E2\u0080\u009E j ) g t | 3 In 71 where To is specified in Lemma 3.1. Since |2 / . l (x\u00E2\u0080\u009E ,\u00E2\u0080\u00A2 oo, where CQ is the constant specified in Lemma 3.1. Finally, by appealing to the dominated convergence theorem we obtain the desired result without conditioning. This completes the proof. ^ T h e o r e m A 3 . 1 Suppose Assumptions 3.0 and 3.1 are satisfied. Let X \u00C2\u00B0 = (x\u00C2\u00B0 , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - j x \" ) . If 6 is identified at X\u00C2\u00B0 ) by and x j , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, x\u00C2\u00B0 are centers of observations, then P r o o f Lemma A3.4 implies that with probability approaching 1, within any small neighbor-= O p ( l n n / A ) . hood of x \u00C2\u00B0 , there exists a xj^ such that i = 1, - \u00E2\u0080\u00A2\u00E2\u0080\u00A2 ,k. Lemma A3.1 imphes the conclusion of the theorem. If C o r o l l a r y A 3 . 1 Under the conditions of Theorem A3.1, f - r \u00C2\u00B0 = Op(lnn/y/n) where f = (^1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 'fio )', fj = 0 - Pj+xfi)l0i+i,d - hd), i = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, P r o o f For any j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00B0 , by continuity of the model at the end points x^ = r^, for all {xi, i ^ d}. Then by choosing the {x^, i ^ d} so that they are not collinear, we deduce that = for ah i ^ 0,d. 
By assumption, /9\u00C2\u00B0^ ^ Therefore, TJ can be reestimated by solving and hence, fj \u00E2\u0080\u0094 r \u00C2\u00B0 has the same order as \u00E2\u0080\u0094 1 Next we shall establish the asymptotic normahty of ^, and f when the model is continuous. The idea is to form a pseudo problem by deleting all the observations in a small neighborhood of each r \u00C2\u00B0 so that classical techniques can be apphed, and then to show that the problem of concern is \"close\" to the pseudo problem. The term \"pseudo problem\" is used because in practice the r^'s are unknown and so are the observations to be deleted. This idea is due to Sylwester (1965) and is used by Feder (1975a). Assume xj, has positive density function fd{xd) over a neighborhood of r \u00C2\u00B0 , j = l , - - - , / \u00C2\u00B0 . Our pseudo problem is formed by deleting all the observations in {x : r \u00C2\u00B0 \u00E2\u0080\u0094 d\u00E2\u0080\u009E < < r \u00C2\u00B0 + rf\u00E2\u0080\u009E} where dn = 1/ln^ n. Intuitively speaking, the number of observations deleted will be Op{ndn). This will be confirmed later in Lemma A3.6. Adopting Feder's (1975a) notation, we define n* as the sample size in the pseudo problem, and let n** = n - n*, 9* he the least squares estimate in the pseudo problem, the summation over the n* terms of the pseudo problem, and = Yl't=i \" E * - Generally, a single asterisk refers to the pseudo problem. Theorem A3.1 and Corollary A3.1 carry over directly to the pseudo problem. Thus, Theorem A3.2 If the conditions of Theorem A3.1 is satisfied in the pseudo problem, then 9' -9\u00C2\u00B0 = Op{lnn/V^). Further, if Model (3.1) is continuous, f \u00E2\u0080\u0094 r \u00C2\u00B0 = Op{\n n/y/n). L e m m a A 3 . 5 Suppose {xt} is an iid sequence. Under the conditions of Theorem A3.2 where Gj = \u00C2\u00A3;[xx'l(^_^\u00C3\u00A7(^<^^,^o])], j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , /\u00C2\u00B0 + 1. P r o o f Let 5*(f) = ^ ^'(yt - / / (^Xt))^ . Theorem A3.2 imphes that f* \u00C2\u00A3 ( r \u00C2\u00B0 - dn,Tf + dn] with probability approaching 1. Since there are no observations within this region, it follows that 5*(f) computed within this region does not depend on r and is a paraboloid in 9. In particular, it is twice differentiable in 6. For the reminder of the proof, denote S*(\u00C3\u0087) by S*{d). Thus, with probability approaching 1, 6* may be obtained by setting the derivative of S*(9) to 0: t=i j=i n = ^ ^ x , ( x ' , ( / 3 , - - fOl(x..e(rO_,+.\u00E2\u0080\u009E,.o_.\u00E2\u0080\u009E])-- * Hence, ^T,ti^t^tM..,e(rO_^+d\u00E2\u0080\u009E,rf-d\u00E2\u0080\u009E]))0j - P'j) = 7[T.7=i^tetl(.,,\u00C3\u00A7(rO_^+d^,rf-d\u00E2\u0080\u009E])- By Lemma 3.6 and the strong law of large numbers, 1 \" - Y ^t^tMx,deir\u00C2\u00B0_^ + d\u00E2\u0080\u009E,r\u00C2\u00B0-d\u00E2\u0080\u009E])) 1 \" = G , + Op ( l ) , where Gj = ^fxix'^l^j-^^\u00C3\u00A7f^o J,T\u00C2\u00B0])]- Under the assumptions of the pseudo problem, Gj is positive definite. Thus, ^ 0 ] - P'j) = [Gj + \u00C3\u00A8 x , C , l ( , . , e ( . ; ^ , + 0 and ^ \u00C3\u0087.lin <\u00C2\u00A7E[f:.\C,Xt)] < \u00C2\u00A7 ( s u p max K ^ ; x O l f i ? K * ) ieu^x,ue[j.(T\u00C2\u00B0-d,.,rf+d\u00E2\u0080\u009E] <^0i^)0p{ndn) for some M > 0, where 0{a\ln) and Op{ndn) are independent of ^ \u00E2\u0082\u00AC ZYn. Since a\dn \u00E2\u0080\u0094>\u00E2\u0080\u00A2 Q as n -* oo, ^ ^ti^i^y^t) = Op{l/n) uniformly for all ^ G ZY\u00E2\u0080\u009E. 
Thus, by (A3.3) S{0 = S*{0 + ^f^^l + Op{h (A3.4) where Op(l/n) is uniformly small for ^ \u00C2\u00A3lin-Since ^ and ^* are least squares estimates for the original and the pseudo problem respec-tively, Sii) < Sit), S'it) < S'ii). (A3.5) (A3.4) and (A3.5) imply 0 < Sit) - 5(0 = S'it) - S*ii) + Op{-) < opi-). (A3.6) Tt Tl Therefore, S*(i) - S*{i') Taylor's expansion yields = Op(^). Since dS*(i*)/d9 = 0 and 5*(f) is a paraboloid in 6, s'ii) = s*in+l{\u00C3\u00AA - - r ) ' . (^3.7) Equations (A3.6) and (A3.7) imply \u00C3\u0094 - 9* = Op(7i-\u00C2\u00A7). If Lemma A3.6 implies that ^/n{9 \u00E2\u0080\u0094 9^) and y/n{9* \u00E2\u0080\u0094 9^) have the same asymptotic distribu-tion. Thus, by Lemma A3.5 we have Theorem A3.3 Suppose the conditions of Lemma A3.6 are satisfied. Then, ^A^(/3, - - i N{0, alGf), j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00B0 + 1 where Gj is defined in Lemma A3.5. For any j = + 1, let and A \u00C3\u0082 = $j,o - A/3, = pj^d - Pj+i,d-Then = fj = hence. V ( A / 3 o - A/3S) - - M ^ ( A / 3 \u00C2\u00B0 - A/3,) - A / 3 / A/3,A/32 = - i ^ ( A / 3 o - A/3\u00C2\u00B0) + - ^ ( A ^ , - A/33). M^i - r\u00C2\u00B0) = - ^ v ^ ( A ^ o - A/3\u00C2\u00B0) + _ ^ v ^ ( A / 3 , - A/32) + \u00C2\u00ABP(1)-95 So we have Theorem A3.4 Under the conditions of Theorem A3.3, if Model (3.1) is continuous, then {fj \u00E2\u0080\u0094 Tj) and _^^(,{APo \u00E2\u0080\u0094 A/?o) + zr^{^Pd \u00E2\u0080\u0094 A/3\u00C2\u00B0) have the same asymptotic distribution. Chapter 4 S E G M E N T E D R E G R E S S I O N M O D E L S W I T H H E T E R O S C E D A S T I C A U T O C O R R E L A T E D N O I S E In this chapter, we consider the situation where the noise is autocorrelated and the noise levels are different in different regimes. Specifically, consider the model yt = x'j^j + o-jfi, if Xtd \u00E2\u0082\u00AC ( r j _ i , TJ ] , J = 1,..., / + 1, ^ = 1,... , n, (4.1) where \u00E2\u0082\u00ACt = YlT i^iCt-i, with < oo. The {CJ} are i id , have mean zero, have variance a^, and are independent of the {xj}, Xj = {l,Xti,..., Xtp)'. And \u00E2\u0080\u0094oo = TQ < TI < \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 < TI^I = oo, while the CTJ (j = 1 , . . . , / + 1) are positive parameters. We adopt the parametrization which forces a\u00C3\u00A7 \u00E2\u0080\u0094 l / E o \u00C2\u00B0 ^ i ^ ^\u00C2\u00B0 that the {et} have unit variances. Further, we assume that there exists a ^ > 3 /2 , ko > 0 such that < k/{i + 1)'' for all i. Note that this implies {et} is a stationary ergodic process. Estimation procedures are given in Section 4.1. In Section 4.2, it is shown that the asymp-totic results obtained in Chapter 3 remain vahd. Since a major part of the proofs formally resemble those in Chapter 3, all the proofs are put in Section 4.5 as an appendix. Simulation results are reported in Section 4.3. Section 4.4 contains some remarks. 4.1 Estimation procedures With the notation introduced in Chapter 3, the model can be rewritten in the vector form, y\u00E2\u0080\u009E = J ] X \u00E2\u0080\u009E ( T f _ \u00E2\u0080\u009E r \u00C2\u00B0 ) ^ , + c-, (4.2) i=i where := [^'-^x'ajUrl\u00E2\u0080\u009Erf)%. A l l the parameters are estimated as in Chapter 2 except for the variances {a^,..., a-fo_^_-^}. These are estimated by \u00C3\u00A2] = Snifj-i,fj)/nj. i = 1, . . . , /+ 1, where fij is the number of observations falling in the jth estimated regime and / is the estimate of /\u00C2\u00B0 produced by the estimation procedure in Section 2.2. 
We shall see in the next section that the asymptotic results in Section 3.2 are essentially unchanged for this modification of the model. After estimating Pj and aj we may use the estimated residuals, \u00C3\u00AAt \u00E2\u0080\u0094 {yt \u00E2\u0080\u0094 x.[Pj)/\u00C3\u00A2j, if Xtd \u00E2\u0082\u00AC ( f j_ i , f j ] , to estimate the parameters in the moving average model for the e'^s. 4.2 Asymptotic properties of the parameter estimates To establish the asymptotic theory, we need to make some assumptions for Model (4.2). Below is a basic assumption which is assumed to hold throughout this section. Assumption 4.0; The {xj} is a strictly stationary ergodic process with \u00C2\u00A3 ' ( x jx i ) < oo. The et are given by \u00E2\u0082\u00ACt = tpiCt-i, where ipi < ko/{i-\- if for some ko > 0, 6 > 3/2 and all i, the {Q} o^fe iid, locally exponentially bounded random variables with mean zero, variance = 1/ J2ilo '^h are independent of the {xj}. For the number of threshold P, there exists a specified L such that P < L. Also, for anyj = l,...,l\ p\u00C2\u00B0 ^ 0%,. Note that {e^} is a stationary ergodic process and each has unit variance. Additional assumptions analogous to those in Section 3.1 are also needed to establish the consistency of the estimates. For convenience, we restate Assumptions 3.1-3.2 as Assumptions 4-1-4-^, respectively. A s s u m p t i o n 4.1 There exists 6 e (0,mini 0, \u00C2\u00A3^{xiXil(3,j_^\u00C3\u00A7(^p_5 .^ o])} and jE'{xixil(^j_^g(^p .,.0^ 5]^ } are pos-itive definite, i = l,---,l\u00C2\u00B0. Also, \u00C2\u00A3 ' ( x i x i ) \" < 00 for some u> I. To establish the asymptotic normality for the /9j's and \u00C3\u00A2 j ' s , we need to establish it for the least squares estimates of the /3j's and o-|'s with P and r^, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, T^O known. To this end, we specify the probabihty structure of { x J and {0} exphcitly. If {Q, T, V) is a probability space, a measurable transformation T : fi \u00E2\u0080\u0094> is said to be measure-preserving if P{T~'A) = P{A) for all A \u00E2\u0082\u00AC !F- If T is measure-preserving, a set A \u00E2\u0082\u00AC is called invariant if T~'{A) \u00E2\u0080\u0094 A. The class T of all invariant sets is a sub-cr-field of T, called the invariant cr-field, and T is said to be ergodic if all the sets in T have probabihty zero or one. (cf. Hah and Heyde, 1980, P281.) As Hall and Heyde point out (1980, P281): \"Any stationary process { x \u00E2\u0080\u009E } may be thought of as being generated by a measure-preserving transformation, in the sense that there exists a variable x defined on a probability space {Q.,T,V), and a measure-preserving map T : fi \u00E2\u0080\u0094> fi, such that the sequence {x'\u00E2\u0080\u009E} defined by XQ = x and xj,(u;) \u00E2\u0080\u0094 x(T\"a;), n > 1, a; G has the same distribution as { x \u00E2\u0080\u009E } . \" Therefore, we can assume that the stationary and ergodic sequence {xt,Ct} is generated by a measure preserving transformation T on a probability space without loss of generality. A s s u m p t i o n 4.3 (A.4.3.1) Let (fi , J^, \u00E2\u0080\u00A2p) he a probability space. Let {^t,Ct}t^-oo the iid random sequence such that (i) {Xf} and { C J are independent; (ii) (xtXt) = (x(r*a;), C(T'a>)), a; G fi, i = 0 , \u00C2\u00B1 1 , - - - , where T is an ergodic measure-preserving transformation and (x, ^ ) is a random variable defined on the probability space {^,T,V);and (iii) E{x\x.iY < 00 for some u > 2. 
(A.4-3.2) Within some small neighborhoods of the true thresholds, x\d has a positive and con-tinuous probability density function /,(\u00E2\u0080\u00A2) with respect to the one dimensional Lebesgue measure. (A.4-3-3) There exists one version of E[-x.\X.'^\xxd \u00E2\u0080\u0094 x] which is continuous within some neigh-borhoods of the true thresholds and that version has been adopted. Consider the segmented linear regression model (4.2) of the previous section. Let / be the minimizer of MIC{1). T h e o r e m 4.1 For the segmented linear regression model (4.2) suppose Assumptions 4-0 and 4.1 are satisfied. Then I converges to /\u00C2\u00B0 in probability as n ^ 00. The next two theorems show that the estimates f, 0j and aj are consistent, under As-sumptions 4.0 and 4-2. Theorem 4.2 Assume for the segmented linear regression model (4-Sj Assumptions 4-0 and 4.2 are satisfied. Then f - r \u00C2\u00B0 = Op(l), where r \u00C2\u00B0 = ( r f , . . . , r^o) and f \u00E2\u0080\u0094 (fi,..., fj) is the least squares estimate of r \u00C2\u00B0 based on I \u00E2\u0080\u0094 I, and I is a minimizer of MIC {I) subject to I < L. Theorem 4.3 If the marginal cdf Fj, of xn satisfies Lipschitz Condition \Fd{x') - Fd{x\")\ < C\x' \u00E2\u0080\u0094 x\"\ for some constant C at a small neighborhood of X\d = rj\" for every j, then under the conditions of Theorem 4-2, the least squares estimates Pj and aj, j = 1,... ,1 + 1, based on the estimates I and fj's as defined in Section 2.2, are consistent. Next, we show that if Model (4.2) is discontinuous at r \u00C2\u00B0 for some j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , / \u00C2\u00B0 , then the threshold estimates, fj, converge to the true thresholds, r \u00C2\u00B0 , at the rate of Op(ln' n/n), and the least squares estimates of Pj and <7| based on the estimated thresholds are asymptotically normally distributed. Theorem 4.4 Suppose for the segmented linear regression model (4-2) that Assumptions 4-0, 4.2 and 4.3 are satisfied. IfP{x[{Pj+i - Pj) / 0\xd = r?) > 0 for some j = 1,---,P, then For j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00B0 + 1, let Pj be the least squares estimates of Pj based on the estimates / and fj's as defined in Section 2.2, and aj be as defined in Section 4.1. Define Gj = Z;(xix'il(^^_^\u00C3\u00A7(^o_^_^o])), 00 E,- = aj[G-' + 2Y,l{i)Gj'E{xil^,^^^^rO_^,rO^^^^^ i=l Pj = P{TU < < r'j) and oo vj=pjil-pj)Eiet) + p'j[iv-3h\0) + 2 ^ 7^(0], \u00C2\u00BB=-oo where 7(1) = \u00C2\u00A3 ' (ei\u00E2\u0082\u00ACi+,) , 77 = cryE(<^f) and j = + Then, we have the following result. Theorem 4.5 Suppose for the segmented linear regression model (4-2) Assumptions 4-0, 4-2 and 4.3 are satisfied. If P{x.\{Pj^x - Pj) 7^ O^d = r?) > 0 for all j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 t h e n V^CPJ - Pj) N{0, S,) and ^Pj{\u00C3\u00A0] - u]) iV(0, v^a)), as n ->\u00E2\u0080\u00A2 00, j = 1, - \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,f + 1. Note that i f 7(1) = 0, i > 0, then Ylj \u00E2\u0080\u0094 <^o^7^ as shown in Section 3.1. The next theorem shows that Method 1 of Section 2.2 for estimating dP produces a consistent estimate. Theorem 4.6 If d\u00C2\u00B0 is asymptotically identifiable w.r.t. L, then under the conditions of Theo-rem 4-1, d given in Method 1 of Section 2.2 satisfies P(d = d^) \u00E2\u0080\u0094>^ 1 as TI \u00E2\u0080\u0094 \u00C2\u00BB \u00E2\u0080\u00A2 00. 
Remark: Although the result of Theorem 3.7 is expected to carry over if aj = a for all j, it does not carry over in general. Hence, Method 2 given in Section 2.2 is not generally consistent. Below is a counterexample. Example 4.1. Let x = (1,2:1,X2)' where (xi,X2) is a random vector with domain [0,6] x [0,6]. Divide the domain into six parts as shown in Figure 4.1. On each part, (xi,X2) is uniformly distributed with mass indicated in the figure. Let d = 1, Z*' = 2, L = 2 and ( r i , r2 ) = (0.5,1). Hence, i?? = {x : 0 < x i < 0.5}, i i :^ = {x : 0.5 < x i < 1} and i?^ = {x : 1 < x i < 6}. The model is yt = ^^ l(x,eK\u00C2\u00AB) + <^j(t: if Xt G R'j, 102 where the { x J are independent samples from the distribution of x , the {et} are iid iV(0,1) and independent of {xt}. Let o-^ = 1 and = cr^ = 10. Define Rj = {x : X i 6 (j \u00E2\u0080\u0094 1, j]}, i = 1,2, J = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - ,6 . It is easy to see that on each Rj, the mass is 1/6 = 1/(2X + 2). Suppose we fit a constant on each of Rj. Let us calculate AMSE{R^j), the asymptotic mean squared error on R). For j > 1, AMSE(R]) = a | = 10. And AMSE{Rl) = ^2 ^ i + a l X i + 5f = ^ + BJ, where Bi is the asymptotic mean bias. Observe that the marginal distribution of Xi on (0,1] is uniform and symmetric about n = 0.5; hence Bi = 1 and AMSE{R\) = 13/2 < 10. Therefore, with probabihty approaching 1 as n \u00E2\u0080\u0094\u00C2\u00BB^ oo, the M S E on Rl wiU be chosen as the smaUest M S E among those on 72], j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, 6. For i = 2 and j > 1, where B2 represents the asymptotic mean bias on each of Rj, j > 1. The asymptotic mean squared error on Rl should be no larger than the asymptotic mean squared error obtained by setting the model to 0: \ ij - 1 20 2 20 ^ 20 20 20 100 Thus, with large probability as n ^ 0 0 , the M S E on Rl will be chosen as the smallest M S E among those on Rj, j = l , - - - , 6 . Since AMSE{R\) > AMSE{R\), X2, rather than xi, wih be chosen by Method 2 as the segmentation variable with probability approaching 1 as n \u00E2\u0080\u0094>^ 00. f 4.3 A simulation study In this section, simulation experiments involving model (4.2) are carried out to examine the small sample performance of our proposed procedures under various conditions. As in Section 3.3, segmented regression models with two to three regimes are investigated. Let 4 = 0.7eJ_i - 0.1e;_2 + Ct, where the {0} are i id with a locally exponentially bounded distribution having zero means and unit variances. Note that the {e^} can alternatively be defined by (l-ei-^5)(l-C2-^5)e', = Ct, where B is the backward shift operator defined by Bh'^ = e[_j, j = 0, \u00C2\u00B1 1 , \u00C2\u00B1 2 , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 -, and (6,6) = (2,5). Since |6| > 1 for i = 1,2, {ej} is a causal AR(2) process. Hence, it can be written as = S j l o where is the coefficient of in the polynomial, V>(2) = l/[{l \u00E2\u0080\u0094 ^z){l-^z)]. Expanding tp(z), we get t=0 fc=0 .=0 it=0 Let j = i + k, then \u00C2\u00AB=0 j=\u00C2\u00BB j=0 i=0 So t=0 t=0 Thus for any S > 3/2, taking ko > 0 sufficiently large, we have < ko/(j + 1)*. Let \u00E2\u0082\u00ACt \u00E2\u0080\u0094 e'Jy/Var{\u00E2\u0082\u00AC[), so that Var{et) = 1 for all t. Then the {et} satisfy the condition of Model (4.2) [In this case ^yVar{e't) = 1.33 (c.f Example 3.3.5, Brockweh and Davis, 1987)]. Let Zt = {xti, - \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,xtp)' and xJ = ( l , z ' J , where {xtj} are nd iV(0,4). 
Let DE{Q,\) denote the double exponential distribution with mean 0 and variance 2A^. For d = 1 and r \u00C2\u00B0 = 1, the following 3 sets of model specifications are used: (a') p = 2 = (0,1, l y , p2 = (1.5,0,1)', tTi = 0.8, = 1, 0 ~ ^ (0 ,1 ) , (d') p = 3, \u00C3\u0082 = (0,1,0,1)', ^2 = (1,0,0.5,1)', a i =0.8, 0, ^ > 3/2. Then YlZi\u00C5\u0092Zi |a,+,|)2 < oo. Proof : By assumption, \ai\ < ko/i^ for some ko > 0, S > 3/2. Therefore, oo oo oo oo ^ ;=1 /=1 .=1 /=1 ^ ^ Now, oo ^ oo 1=1 ^ ^ ^ j=,+i oo j = E V / / dt oo .j = E / min < E / roo -I. dt dt So, oo oo L,2 \u00C2\u00B0\u00C2\u00B0 D E i - ' + ^ d ^ s t ^ E ' / . ^ \" - \" -(4.3) By assumption, S > 3/2, so 2(6 \u00E2\u0080\u0094 1) > 1, and hence f ; ( f ; ia ,+ , i )2 3/2, ko > 0 such that < ko/{i + 1)'' for all i. Let Sk = Yii=i ^i^i> where the a'^ s are constants. Then there exists 0 < c i < oo and Ti > 0, such that for any x >Q, k > 1 and t satisfying 0 < / | |a | | < T i , P{\Sk\ >x}< 2e-*^+=i*'ll''ll'. P r o o f The assumption of locally exponentially boundedness means that for some TQ > 0 and 0 < Co < oo, f ; (e*^i) < e''\"*^ for \t\ < To. Now it follows from Markov's inequality that for sufficiently small t > 0, A n d where Hence, P{Sk >x} = P{e*^* > e'^} < e-*^X;(e'^*). fc k oo Sk = Y = E E = ^ ( ^ ) + ^ ( ^ ) ' 1 i = l j=0 fc-1 t ^(^) = E ' ^ ' ^ - ' E ^ ' t - j ^ ' - i ' :=0 j=0 ^ w = E c - . E \u00C2\u00AB i V ' i + . - . i=0 i = l if | ^ E t i a / V ' / + i | < To for aU i. Let Mi = E S o C E t i Note that we can assume y/M^ > 0 without loss of generality (since otherwise Cj = 0 a.s.). Since iV\u00C2\u00BB,! < ko/{i+ 1)^, from the previous lemma Afi < oo. Observe that for all i, ( E \u00C2\u00AB ' V ' / + . ) ^ < ( \u00C3\u00A8 \u00C2\u00AB ? ) ( E ^ ' + . ) /=i 1=1 1=1 < i w P ( E i ^ ' + ' i ) ' ^ i H i ' ( E i ^ ' + \u00C2\u00AB i ) ' -/=i /=i Hence i f t is such that | i | | | a | | < TQ/^/M^, then for aU i k oo l * E \" ' ^ ' + \u00C2\u00AB l ^ M I H I ( E l ^ ' + . l ) < \t\\HVM'iPi-j\ < To for all i. Let n = i- j, m = i - I, then i=0 j=0 k-1 i i j-1 = E E ^l-j^i-j + 2 E E ak-jQk-irpi-ji^i-i] 1=0 i =0 j = l /=0 fc-1 0 fc-1 i - 1 n+1 = E E ^l-i+n'^l + 2 E E E afc-(i-n)afc-(.-m) V ' n ^ m t\"=0 n=t\" j = l n=0 m = : fc-1 t fc-2 fc-1 \u00C2\u00AB\u00E2\u0080\u00A2 = E E '^fc- '+n '^\" + 2 5 ^ J2 E flfc+n-iafc+m-.V'nV'^ t=0 n=0 n=0 t = n + l m = n + l fc-1 fc-1 fc-2 fc-1 fc-1 ^ E ^ \" E + 2| ^ Y^k+n-iak+m-iMm n=0 i=n n = O m = n + l z = m fc-1 fc-2 fc-1 fc-1 i /V A A. A A, J. < E ^ n H i ' + 2 E E i V ' . ^ ' - i i E Ck+n-iak+m-i\ n=0 n=0 m=n+l \u00C2\u00BB=m < E ^ n N l ' + 2 E E l^n^'mlNI^ n=0 n=0 m=n+l = i i \u00C2\u00AB i P ( E i ^ ' ^ i ) ' n = l Therefore, for any t such that |<|||a|| < To/y/M^ and the c = CQMI , we have \u00C2\u00AB \u00C2\u00AB\u00E2\u0080\u00A2 ItY^k-j^i-jl < | f | | ^ a f c _ j V . - i l j=o j=0 fc-1 i t=0 j=0 <7o. and hence Since A(A;) and are independent we get that for Ti = To/y/Ml and any A;, P{Sk >x}< e-'^Eie*''^''^)E{e'^^'^) < e-\u00C2\u00AB-e2ct^||. | |^ ^ ^-tx^c,t^\\a\\^^ where c i = 2c and |f| | |a| | < T j . Finally, to conclude the proof, we note that P{Sk < -x} = P{-Sk > x}. f Lemma 4.3 Assume for the segmented linear regression model (4-2) that Assumption 4-0 is satisfied. Define (Tmax := rnaxj <7i and redefine Tn(a,T]) := \u00C3\u00AA ^ ' ^ \u00E2\u0080\u009E ( a , 77)6^, - 0 0 < a < 77 < 00. 
Then Qfj2 \u00E2\u0080\u009E3 P{sup Tnia, 77) > In^ TI} 0, as n 0 , a \u00C2\u00A34 f ^ l n 2 7i I X \u00E2\u0080\u009E } a ^ - ^ I n ^ n I X \u00E2\u0080\u009E } < E PK'Hn{x,d,xu)\u00C3\u00A8l>^^\n'n\Xn]. X,d ^ % ^ l n ' n | X , } - > 0 , asn^O. Noting that p = trace{Hn{x,d,Xtd)) = Ef=i II qi IP> we have || q, | |2= qjq, < p < po and II q ; E ! lV^ i ^ n ( r ? . i , r f ) r< a L . || q/ |P< ^LxJ^o < crLxPg, where / = l , . . . , p . By Lemma 112 4.2, with ^0 = Tx/umaxPo we have T2 V, i_2 ^ E 2 e x p ( - - ^ . ^ ^ h i n ) e x p ( c i ( - ^ ) V L . F o ) 0, as ra -> oo, where c\ is the constant specified in Lemma 4.2. Finally, by appealing to the dominated convergence theorem we obtain the desired result without conditioning. % C o r o l l a r y 4.1 Consider the segmented regression model 4-1 \u00E2\u0080\u00A2 (i) For any j and (a , /?] C ( r ^ . i , r]>], 5 \u00E2\u0080\u009E ( a , 7/) = a]\u00C3\u00A8'n{a, r])\u00E2\u0082\u00ACn(a, rj) - Tn{a, rj). (ii) Suppose Assumption 4-0 is satisfied. Let m > 1. Then uniformly for all (oi, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, a^ ) such that -oo < cx < \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 < < oo, m+l\u00C2\u00B0+l 5 \u00E2\u0080\u009E ( 6 , - - - , W ) = Y Sni\u00C3\u0087i.x,^i) = rn'\u00C3\u00AB^n + Op{ln'n), i=i where 6 = -oo, fm+zo+i = oo, and {^i,-\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,^m+i\u00C2\u00B0} is the set {ri\u00C2\u00B0, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, r\u00C2\u00B0o, ai, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, a\u00E2\u0080\u009E} after ordering its elements. Proof : (i) Replace \u00C3\u00AB\u00E2\u0080\u009E(a , rj) in the proof of Proposition 3.1 (i) by c^(a, rj) = / \u00E2\u0080\u009E ( a , r])\u00C3\u00AA^ and note \u00E2\u0082\u00AC^(0,77) = aj\u00C3\u00ABn(a,rj) when (a,77) C {TJ_I,T^]. The result obtains immediately. (\") B y (i), Sni\u00C3\u0099, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2\u00E2\u0080\u00A2,\u00C3\u0087m+l\u00C2\u00B0) \u00C2\u00AB=1 m+l\u00C2\u00B0+l 1=1 m+l\u00C2\u00B0 + l \u00C2\u00AB=1 Note that each of (^j_i,^j] is contained in one of ( r \u00C2\u00B0_ i , rj*], j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00B0 + 1. By Lemma 4.3, E . ' l t ' \" \" ' ' Tn{ii-x,ii) < (m + /\u00C2\u00AB + 1) sup,<, r \u00E2\u0080\u009E ( a < T?) = O^Cln^ n). 1[ L e m m a 4.4 Under the condition of Theorem 4-1, there exists S G (0, mini 0 as n \u00E2\u0080\u0094>^ oo, r = 1, . . . , /\u00C2\u00B0 + 1. P r o o f It suffices to prove the result when /\" = 1. For notational simplicity, we omit the subscripts and superscripts 0 in this proof. For the 6 in Assumption 4.I, denote = X \u00E2\u0080\u009E ( r i -S,Ti),X^ = XniTi,Ti+S),X* = Xn(Ti-S,Ti+S),\u00C3\u00ABl = < 7 i / \u00E2\u0080\u009E ( r i n ) \u00C3\u00AB \u00E2\u0080\u009E , = Cr2ln(Tl,Ti+6)\u00C3\u00ABn, = + and /3 = {X*'X*)~X*'Yn. As in ordinary regression, we have =\\x{Pi+x;h + \u00C3\u00AA*-x'k? = | | X r ( \u00C3\u0082 - ^ ) + ^2*(^2-^) + 6 l P =\\x*{h - h ' + \\x;02 - h ' + + 2 e * ' x r ( \u00C3\u0082 - h + ^i-'x^i^ - h Note that { x J and { j / J in Model (4.2) are strictly stationary and ergodic. It then follows from the strong law of large numbers for stationary ergodic stochastic processes that as n \u00E2\u0080\u0094\u00C2\u00BB\u00E2\u0080\u00A2 oo, 1 ' 1 \" as -X* X\" = - VxiX ' i l (^ .^e(^ j_5 ,Ti + 5]) ^{xix'il(^j^\u00C3\u00A7(.,j_6,^j + 6])} > 0, 71 . 
^ -xfx; and \u00C2\u00AB\u00E2\u0080\u00A2=1 ' i;{xix'il(^,^e(ri-5,Ti])} > 0, if j = l , \u00C2\u00A3{xixil(^, ,G(^, ,^,+5])} > 0, if j=2, - X * Y \u00E2\u0080\u009E ^ E{yiXil(xue{Ti-s,Ti+i])}, Th where E{yiXil^^^^\u00C3\u00A7(^r,-s,T,+s])} = -E{xixil(^j^e(rj-5,ri])}\u00C3\u0082 + \u00C2\u00A3^{xixil(^^^6(^i,^,+5])}^2-Therefore, P ^ {X ; {x ix i l ( ^ \u00E2\u0080\u009Ee (^ j_5 ,^ ,+5 ] ) } } ' ^^ { \u00C3\u00AE / iXi l (x i ,e (n -5 , r i+5] )} =: P'-Similarly, it can be shown that f iP, - ^ \u00E2\u0080\u00A2 ) 'E(xix ' i l (x . .e (n-5 ,n]) ) (^ i - if J= l , 02 - ^\u00E2\u0080\u00A2) ' i ; (x:xi l( , , ,e( . , , , ,+^]))( /32 - ^S*), if j=2. 7t - c * ' x ; ( ^ , - ^ ) ^ 0 , for j = 1,2, Th and n where pi = P{xid \u00E2\u0082\u00AC (n - 6,TI]} and p2 = P{xid \u00E2\u0082\u00AC ( r i , r i + S]}. Thus, as n -> oo, ^ 5 \u00E2\u0080\u009E ( r i -6, Tl + (5) has a finite limit, given by l im - 5 \u00E2\u0080\u009E ( r i - 6,TI + S) = ( \u00C3\u0082 - /3- ) ' i ; (x ix i l ( , , ,e ( . ,_5 , . , ] ) ) ( ;3 i - n + 02 - ^ * ) ' \u00C2\u00A3 ( x i x ; i ( . , , e ( , , , , , + 5 ] ) ) ( / 3 2 - PI + (Tlpi+alp2. It remains to show that ~Sn{Ti - S,TI) and ^ 5 ' \u00E2\u0080\u009E ( r i , r i + ^) converge to ajpi and cr^p2 respectively, and either ( \u00C3\u0082 - P*yEixix[l(,,,^^r,-s,n]))0i - P*) > 0 or (^2 - P*yE{xix[ 1(xide(Ti,Ti+s])) \u00E2\u0080\u00A2 (02 \u00E2\u0080\u0094 p*) > 0. The latter is a direct consequence of the assumed conditions while the former can be shown again by the strong law of large numbers. To this end, we first write 5n(Ti \u00E2\u0080\u0094 6,TI) in the following form, Sniri - 6, n) = \u00C3\u00AAl'\u00C3\u00AAl - Tn{ri - 6, n ) using Corollary 4.1 (i). Bearing in mind Eel = 1\u00C2\u00BB by the strong law of large numbers, i \u00C3\u00AA - ' \u00C3\u00AB \u00C3\u00AE ^ al + C} \u00E2\u0080\u0094>\u00E2\u0080\u00A2 1, as n oo for some C > 0, and (ii) for every I such that P < I < L, where L is an upper bound of P, 0 < i ^ ' c ' ^ _ \u00C3\u00A2f = Op(ln\n)/n), Tt where aj = ^ 5 ' \u00E2\u0080\u009E ( f i , . . . , f;) is the estimated al when the true number of thresholds is assumed to he I. P r o o f (i) Since / < / \u00C2\u00B0 , for 6 \u00E2\u0082\u00AC (0, mini(5, s = 1,..., /}. Hence, if we can show that for each r, 1 < r < with probability approaching 1, min Snin,---,Ti)/n> al + Cr, for some Cr > 0, then by choosing C := minial+Cr + Opil), Tt where Cr is defined in (4.4). (u) Let 6 < \u00E2\u0080\u00A2\u00E2\u0080\u00A2\u00E2\u0080\u00A2 < (1+1\u00C2\u00B0 be the ordered set, {h-,-\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,TI,T^,\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,Tfo}, - = - 0 0 and \u00C3\u0087i+io^i = T\u00C2\u00B0o^, = OO. Since / > / \u00C2\u00B0 , by Corollary 4.1 (ii) again, >5n ( r \u00C2\u00B0 , . - . , r f o ) =na\u00C3\u00AF = E '^n(6-l,6) j=l =\u00C3\u00ABl'rn + Op{ln\n)). This proves (ii). ^ P r o o f of T h e o r e m 4.1 By Lemma 4.5 (i), for / < f and sufhciently large n, there exists C > 0 such that MIC{1) = ln(<7f ) +p*(lnn)2+Vn > \n{al + C/2) > ln(a2) + l n ( l + Cl{2al)) with probabihty approaching 1. By Lemma 4.5 (ii), for / > / \u00C2\u00B0 , MIC{1) = lii{\u00C3\u00A2j)+ p*(Innf+^/n Ina^. Thus, P{1 > /\"} 1 as n \u00E2\u0080\u0094\u00C2\u00BB\u00E2\u0080\u00A2 oo. By Lemma 4.5 (ii) and the strong law of large numbers, for 1\u00C2\u00B0 <1\u00C3\u00A2f- \u00C3\u00A0fo = [\u00C3\u00A0f - i e - ' c - ] - [\u00C3\u00A2fo - ^e-'e-] = ^^(In^ n/n), and [^ ?o - al] = [\u00C3\u00A2fo - Uv-<\ + \-jV-<. 
- <^Vi = Op(ln2 n/n) + Op(l) = 0^(1). Hence 0 < (\u00C3\u00A2fo - \u00C3\u00A0'\)/\u00C3\u00A0]\u00E2\u0080\u009E = Op{ln'^{n)/n). Note that for 0 < x < 1/2, l n ( l - x) > -2x. Therefore, MIC{1) - MIC{f) = l n ( \u00C3\u00A2 f ) - l n ( 4 ) + Co(/ - f){\nnf^^\u00C2\u00B0ln = ln ( l - ( 4 - 4 ) / 4 ) + co(/ - /\u00C2\u00B0)(In(n))2+*Vn > - 20j,{\n\n)/n) + co(/ - /\u00C2\u00B0) ( ln(n ) )2+\u00C2\u00ABVn >0 for sufficiently large n. Whence / /\u00C2\u00B0 as n ^ oo. % To prove Theorem 4.2, we need the foUowing lemma. Lemma 4.6 Under the assumptions of Theorem 4-2, for any sufficiently small 6 G (0, mini 0 such that - [ 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 - 6, r \u00C2\u00B0 + S) - 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 - 6, r \u00C2\u00B0 ) - 3^(4,T\u00C2\u00B0 + S)] ^ Cr, as n ^ oc, Tt where r = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, Proof It suffices to prove the result for the case when P = 1. For any small ^ > 0, all the arguments in the proof of Lemma 4.4 apply, under Assumption 4-2. Hence, the result holds. Proof of Theorem 4.2 By Theorem 4.1, the problem can be restricted to {/ = For any sufficiently small 6' > 0, substituting 6' for the 6 in (4.5) in the proof of Lemma 4.5 (i), we have the foUowing inequality: -Sn{n,---,Tl<>) n >U\u00C3\u00AE\u00C3\u00A8l + Op{ln\n)/n) + ^ [ 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 - y , 4 + 6') - 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 - 8', r \u00C2\u00B0 ) - 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , r \u00C2\u00B0 + 6% uniformly in ( r i , - - - , r ;o) G Ar := { ( n , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, r/o) : jr, - T\u00C2\u00B0\ > 6' ,1 < s < By Lemma 4.6, the last term on the RHS converges to a positive Cr for every r. And for sufficiently large n, the O pilv? {n) I n) < imni i C f - + ^ . n n 1 This imphes that with probability approaching 1 no r in is quahfied as a candidate of f, where f = ( f i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,fjo). In other words, P ( f \u00E2\u0082\u00AC A%) -> 1 as n -> oo. Since this is true for ah r, P{f e H r l i ^ r ) ^ 1> 05 n oo. Note that for S' < mino oo such that f - r \u00C2\u00B0 = Op(a\u00E2\u0080\u009E) . Note that ( / /) = ^ X;r=i '<^i2/Kl(x.aeR, ) ~ l(^ 1 and = aJxtyt for any real vector a, it follows from Lemma 3.6 that ( / / ) = Op(l). It is shown in the proof of Theorem 3.3 that (/) = Op(l). Thus, ; \u00C3\u00A2 ^ - ; \u00C3\u00A2 ; = o p ( i ) , i = i , . . . , z \u00C2\u00B0 + i . Next, we shall show that the \u00C3\u00A2^'s are consistent. When and (r^', \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, T,\u00C2\u00B0O) are known, the least squares estimates \u00C3\u00B4-|*'s are obtained from each regime separately. Hence within each regime, applying Corollary 4.1 (i) and Lemma 4.3, we obtain that n \" i ^ f = E + Op(/n^n), (4.6) \u00C2\u00AB=1 where Uj = Y^^=i ^ (x^eR^) number of observations in the j t h regime. By the strong law of large numbers and Lemma 4.3 Uj/n pj as n ^ oo, and = ^ - 1 E ^ ? l ( ^ . . e i . o ) + O p ( ^ ) = a] + Op(l). t=i \" Therefore, it remains to show that aj - \u00C3\u00A2f = Op(l). Recall fij = ^ J L ^ ^(xt^eRj)- Applying Lemma 3.6 to = 1 we obtain ^ftj = ^TIJ + Op(l) = pj + Op(l). Thus, it suffices to show 5 \u00E2\u0080\u009E ( f , _ i , f , ) - 5 \u00E2\u0080\u009E ( r } ' _ i , r j ' ) = Op(l). 
Since Sn{fj-l,fj) = y^(/\u00E2\u0080\u009E(f j_i , f , ) - ^\u00E2\u0080\u009E(f^_i,f^))F\u00E2\u0080\u009E, and Sn{TU,r^) = F,:(/\u00E2\u0080\u009E(r]'_a,r\u00C2\u00BB) - ^\u00E2\u0080\u009E(Tf_i,rj'))y\u00E2\u0080\u009E, we have that 5 \u00E2\u0080\u009E ( f , _ i , f , ) - 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 _ \u00E2\u0080\u009E r \u00C2\u00B0 ) n \u00C2\u00AB=1 n + K x , ( x ; ' x ; ) - x ; ' y \u00E2\u0080\u009E - y , : x ; ( x ; ' x ; ) - x ; ' y \u00E2\u0080\u009E } n = E \u00C3\u00AE ' ' ( i ( x . . 6 R , ) - - { y ^ x , ( x ; . x , ) - ( x j - x ; ' ) y \u00E2\u0080\u009E (4-7) t=i + - ( x ; ' x ; ) - ] x ; ' y \u00E2\u0080\u009E + y,:(x,- - x ; ) ( x ; ' x ; ) - x ; y \u00E2\u0080\u009E } n = E^<(^(^. 1 and Zt = j/f, it foUows from Theorem 4.2 and Lemma 3.6 that ^ E \" = i 2/i (l(r, 0 0 . Since i;(ef) = 1 and hence E{e]ynt) = E{\u00E2\u0082\u00AC^)E{ynt) = /^n, this last result would be imphed by Note that 1 \" ^ y a r ( E e ? y n t ) J n n \" t=l \u00C2\u00AB=1 = Jk^{^^^iE^tf^n] + E[J2etal]} = 0 ( l ) . F \u00C3\u00BB r ( i \u00C3\u008B ^ ? ) + 0(l)- i-\u00C2\u00A3(4) = 0 ( l ) F \u00C3\u00BB r ( ^ \u00C3\u008B e ? ) + o( l ) i ; ( . t ) . It remains to show that Var{^ Ylt=i f?) = o(l) and Eie^) = 0(1). To this end observe that YlJLo < a-nd hence by equation (4.8), that \u00C2\u00A3^(e|) ~ 0(1) . Now, OO OO OO oo oo Y^'u) = E ( ^ c E ^ ' ^ ^ + . ) ' ^ E ( E i^'V'.+ii)^ j=o j=0 i=0 j=0 i=0 oo \u00C2\u00B0\u00C2\u00B0 u oo oo ^ - c E ( E 7 7 W ' ^ ' ^ ^ ' ^ ' ^ E ( E l^ '+ iD ' < j=0 i=0 ^ ' j=o i=0 Consequently, Y,-oo 7^(j) = 2 Ylf=o l^U) \" 7^(0) < oo, and hence, by equation (4.9), y \u00C2\u00AB . ( i | : 4 ) = o ( i ) . (iii) Since \u00E2\u0082\u00AC^(T^,T^ + K) = (7j\u00C3\u00ABn{T^,+ K), it suffices to show that ^ \u00C3\u00AB \u00E2\u0080\u009E ( r P , r \u00C2\u00B0 + Ar\u00E2\u0080\u009E)X\u00E2\u0080\u009E(r\u00C2\u00B0,r] ' + k^) ^ 0, n o o , or, for any a 7^ 0, E[^\u00E2\u0082\u00ACn(TlTJ + kn)Xn(TlT] + K)aif = o(l). But ^[^'xil(x>.6(r0,x\u00C2\u00B0+fc\u00E2\u0080\u009E])] = ( ^ [ a ' x i l x i , = r \u00C2\u00B0 ] / d ( r \u00C2\u00B0 ) + o(l))kn and ^[(a'xi)2l(, . ,e(rO , ,o+,\u00E2\u0080\u009Ej)] = (E[(a'xi)'\xu = r\u00C2\u00B0] /d(r j ' ) + o(l))kn. Consequently, 1 \" 1 \" t>s oo oo = o ( i ) + o ( i ) ^ E E i ^ ' ^ ^ i t>s ij:i\u00E2\u0080\u0094j=t \u00E2\u0080\u0094 s = o ( i ) + o ( i ) ^ E E E i ^ ' ^ i i fc=l a=l i,j:i\u00E2\u0080\u0094j=k ^ n\u00E2\u0080\u00941 oo = o ( i ) + o ( i ) - j E ( \" - ^ ) E i ^ i + ' ^ ^ ^ i ^ k=l i=o ^ oo n\u00E2\u0080\u00941 < o ( i ) + o ( i ) - E E i ^ i + ^ ^ i i \" i=0 oo oo < o ( i ) + o ( - ) E E i ^ i + ^ ^ i i ^ oo oo 0. Hence A = E[{x[0j+i - Pj)f\xd = rj] > 0. Let /3(a, TJ) be the minimizer of | | y\u00E2\u0080\u009E (a , TJ) \u00E2\u0080\u0094 X\u00E2\u0080\u009E(a,77)y3|p. Set \u00E2\u0080\u0094 Kln^ n/n for n = 1,2, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - , where K will be chosen later. The proofs of Lemma 3.6 and Theorem 4.3 show that if a \u00E2\u0080\u009E 'Hn Til then j \u00C3\u00A2 ( a \u00E2\u0080\u009E , 7 ? \u00E2\u0080\u009E ) y5(a, 77) as TI \u00E2\u0080\u0094\u00C2\u00BB\u00E2\u0080\u00A2 oc. Hence, for rj\" + k /3(r\u00C2\u00B0_j + ^, rj\" + kn) Pi'r'j-i + ,^ TJ\") as \u00E2\u0080\u0094>\u00E2\u0080\u00A2 0 0 . By Assumption 4-2, for any sufficiently small ^ \u00E2\u0082\u00AC ( 'r\u00C2\u00B0_i,rj ' ) , i ^ l x i x i 1 ( 2 ; J J 6 ( T ? _ I + ( 5 , T ? ] ) } is positive definite, hence, by the strong law of large numbers, ${Tf_i + S, rf) \"-4' Pj as TI 0 0 . Therefore PiTf_i + 6, rj\" + kn)^ Pj. 
So, there exists a sufficiently small ^ > 0 such that for ah sufficiently large n, \\P(TJ_I + S,TJ + kn) - Pj\\ < \\~Pj-P,+x\\ and {P{rj_i+6,TJ+kn)-~Pj+x)'E{-Kix[\xid = rj\") (/SCrf.i+5, r j ' + A ; \u00E2\u0080\u009E ) - ^ , + i ) > A / 2 with probability approaching 1. Hence by Theorem 4.2, for any e > 0, there exists Ni such that for n> Ni, with probability larger than 1 \u00E2\u0080\u0094 e, we have (i) | f i - r P | < < 5 , i = l , - . - , / o , (u) ||/3(r?_i + <5, r9 + fc\u00E2\u0080\u009E) - Pj^^f < 2\\Pj - Pj+i\\' and (iu) iPiTf_i + 6, rj\u00C2\u00BB + kn) - Pj+r)'E{xix[\xid = rj){P{rU + + ^-)) \" -^i+i) > A / 2 . Let Aj = { ( n , \u00E2\u0080\u00A2 . -, r ,o) : jr.- - r f l < ^, i = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00AB, \TJ - rfl > j = 1, \u00E2\u0080\u00A2 - \u00E2\u0080\u00A2, /\u00C2\u00AB. Since for the least squares estimates f i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , f / o , 5 \u00E2\u0080\u009E ( f i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , f i o ) < 5\u00E2\u0080\u009E(r{ ' , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, r^ ^ ), inf { 5 \u00E2\u0080\u009E ( r i , . . . , r i o ) - 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , . . . , r \u00C2\u00B0 o ) } > 0 implies (fi,---,fio) ^ Aj, or, \fj-TJ\ < kn = Kln^ n/n when (i) holds. By (i), if we show that for each j , there exists N > Ni such that for all n > N, with probabihty larger than 1 - 2e, inf(Ti,...,T,o)eyij{'5'n(T\"i,\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 j T j o ) - 5'n(r{',\u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , r , o ) } > 0, we wil l have proved the desired result. Furthermore, by symmetry, we can consider the case when TJ > TJ only. Hence Aj may be replaced by = {(rj, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , r ( o ) : \Ti-Tf\ < 6, i = l , - - - , / \u00C2\u00B0 , TJ-T] > kn}. For any ( r i , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , r , o ) G A'j, let Cl < \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 < be the set { n , . . . , r^o, T\u00C2\u00B0, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, T]_,,T]_., + S, r^+i -6,r\u00C2\u00B0^,,---, } after ordering its elements and let = \u00E2\u0080\u0094 0 0 , ^2i\u00C2\u00B0+2 \u00E2\u0080\u0094 oo- Using Corollary 4.1 (ii) twice, we have = [ 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , r \u00C2\u00B0 ) + Op(ln2 ^ ^ (^^ ^2 = 5 \u00E2\u0080\u009E ( r { ' , . . . , r \u00C2\u00B0 o ) + Op(ln2 n). Thus, \u00E2\u0080\u00A25n(n, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 - jTio) >Sn{\u00C3\u0087l,- \u00E2\u0080\u00A2\u00E2\u0080\u00A2,^2l<> + l) 2l\u00C2\u00B0+2 E \u00E2\u0080\u00A2 5 ' n ( e i - l , ^ i ) + Snir^x + S,Tj) + Sn{T,,T%, - b) +[5\u00E2\u0080\u009E(r j '_i + r,) + 5\u00E2\u0080\u009E(r,-, r ] ^ : - b)\ - \Sn{r]_i + r]) + 5\u00E2\u0080\u009E(r9, ^ \u00C2\u00AB , 1 - S)\ = 5 \u00E2\u0080\u009E ( r { ' , . . . , r \u00C2\u00B0 ) + 0p(/n2n) + [ 5 \u00E2\u0080\u009E ( 7 f _ i + b,r,) + 5\u00E2\u0080\u009E ( r , - , r \u00C2\u00B0 , i - ^)] - [5n(r\u00C2\u00B0_i + 0. Let n 5 \u00E2\u0080\u009E ( a , r?;^) = | | y\u00E2\u0080\u009E (a , 7?) - X \u00E2\u0080\u009E ( a , 7/)^|p = J^iyt - x0)H^,^,^^^,r,)). Since 5 \u00E2\u0080\u009E ( a , 77) = Sn(a, 77; /3(a, 77)), we have >Sn{TU + >^ + ^n) + Sn{TJ + K,Tj) = 5 \u00E2\u0080\u009E ( r ? 
_ i + S, Tf-J(T\u00C2\u00B0_i + 6, + k^)) + 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , rf + A:\u00E2\u0080\u009E; ^ ( r ? , ! + 6, r \u00C2\u00B0 + A:\u00E2\u0080\u009E)) (4.11) + 5\u00E2\u0080\u009E ( r9 + A;\u00E2\u0080\u009E,r,) >5\u00E2\u0080\u009E( r j '_ i + S,TJ) + 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , r \u00C2\u00B0 + A:\u00E2\u0080\u009E;^(7-]'_i + S,TJ + fc\u00E2\u0080\u009E)) + 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 + fc\u00E2\u0080\u009E,r,). And since (r? + rP^j - <\u00C3\u00AE] C (TJJTJ^I] for sufficiently large TI, Snir] + A:\u00E2\u0080\u009E,r\u00C2\u00B0,i - = a j + i c U r \u00C2\u00B0 + A:\u00E2\u0080\u009E,rjVi - ^ ) 6 n ( r \u00C2\u00B0 + A:\u00E2\u0080\u009E,r\u00C2\u00B0,i - ^). Applying Corollary 4.1 (i), we have 0 5 \u00E2\u0080\u009E ( r \u00C2\u00B0 , r j ' ^ i - <5) - 5\u00E2\u0080\u009E ( r j ' , r \u00C2\u00B0 + k^Jj+i) - S^irj + k^^rj) + Op{\n' n). Therefore, by (4.11) and (4.12) [5\u00E2\u0080\u009E(r\u00C2\u00B0_i + S, TJ) + Snirj,rjVi - S)] - [5\u00E2\u0080\u009E(7f_i + 6, rj) + 5\u00E2\u0080\u009E ( r\u00C2\u00AB , rj^^ - 6)] > 5 \u00E2\u0080\u009E ( r ? , r \u00C2\u00B0 + kn-Jirj.i + S,TJ + k^)) - SniT\u00C2\u00B0,T\u00C2\u00B0 + kn-Jj+i) + Op(ln2 n). (4.12) Let M > 0 such that the term |Op(ln^ n)| < Mln^ n with prohabihty larger than 1 - e for all n > Ni. To show (4.10), it suffices to show that for sufficiently large n, Sn(r^,T\u00C2\u00B0 + kn-JiT9_, + 6, r \u00C2\u00B0 + k^)) - SniT^,T\u00C2\u00B0 + k^; Pj+i) - Mln-'n > M'ln'n, or SniT^rf + k n , + ^ ' + ^n)) \" Sn{r\u00C2\u00B0+ k^,Pj+i) > ( M ' + M)ln'n (4.13) with large probabihty. RecaU Sn(a,rj;P) = ll^n(a,7/) - X\u00E2\u0080\u009E (a , 7?)^ | |2 and y\u00E2\u0080\u009E( r ] ' , r j ' + A;\u00E2\u0080\u009E) = X{T^,TJ + kn)Pj+i + \u00E2\u0082\u00ACniTj,T^ + kn)- Taking K sufficiently large and applying (ii), (in) and Lemma 4.7 (i), (iii), we can see that there exists N > Ni such that for any n > N, ^ [ 5 \u00E2\u0080\u009E ( r j ' , rj\u00C2\u00BB + kn, 0{T\u00C2\u00B0_, + S, + kn)) - Snirl rj\u00C2\u00BB + kn;Pj+i)] = ^ [ r n ( T - , ^ r ? + kn) - Xnir^T^ + fc\u00E2\u0080\u009E)/9(r\u00C2\u00B0_i + S,T\u00C2\u00B0 + kn)\\' - | |y\u00E2\u0080\u009E(rj>,r\u00C2\u00B0 + kn) - Xn{r\u00C2\u00B0,T^ + kn)Pj+xf] -\\aj+l\u00C3\u00A8n{Tf,T^ + kn)\\'] + ^^^n(rj, r\u00C2\u00B0 + A:\u00E2\u0080\u009E)X\u00E2\u0080\u009E(rO, r\u00C2\u00AB + kn)iPj+i - + ^' + ^n)) > A / 4 - A / 8 > ( M ' + M ) / A ' with probabihty larger than 1 - 2e. Since kn = Kln^n/n, the above imphes (4.13). ^ The following Lemma (cf. Hall and Heyde, 1980, L iu 1991) plays an important role in establishing the central hmit theorem for the sample moments involving the {et}. Before we state the lemma, we need to introduce some notation. Let T be an ergodic one-to-one measure-preserving transformation on tlie probability space (fi , T, P). Suppose Ito is a sub-cr-field of satisfying Z/Q \u00C3\u0087 T~^{UO). Also suppose that ZQ is a square integrable r.v. defined on P) with E(Zo) = 0, and that {Zt} is a sequence of r.v.'s defined by Zt = ZQ{T^UI), a; \u00E2\u0082\u00AC fi. Let Uk = T'^'iUo), k = 0,\u00C2\u00B1l,--L e m m a 4.8 Suppose thatUo \u00C3\u0087 T-^{UQ) andputUk = T-''{UQ). Let E{Zl) < oo and E{ZQ) = 0. / / oo Y,{iE[E{Zo\U.m)fy' + {E[Zo- EiZopm)?)^/-\"} < oo, m = l then a*\"^ := fim\u00E2\u0080\u009E_oo '^^f\"^ exists, where 5\u00E2\u0080\u009E := Yjt=\ '^t- Further, Sn d \fn N{0,a''). P r o o f The proof is obtained from Hall and Heyde (1980, Theorem 5.5 and Corollary 5.4) or Liu (1991, Theorem 4.1). 
^ P r o p o s i t i o n 4.2 (Brockwell and Davis, 1987, Remark 2, p212) Let oo i=-oo where the {Ct} is an iid sequence of random variables each with mean zero and variance a'^. If T:T=-oo \^J\ < ^> then, ZZ-oo hih)\ < oo and .. n oo oo ]imnVari-Yet)= ^ l(h) = ^ ^J?\u00E2\u0080\u00A2 t = l h=\u00E2\u0080\u0094oo j=-oo To facihtate the statement of the next result let Gj = \u00C2\u00A3'(xixil(^j^\u00C3\u00A7(^o_^^.,o])), 131 and = aJGj'TjGj', where 7(1) = \u00C2\u00A3^(ei\u00E2\u0082\u00ACi+,) and j = 1, - \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 ,P + 1. Also recall that for each j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 , /\u00C2\u00B0 + 1, is the least squares estimate of/3j given r^'s. L e m m a 4.9 Under the Assumptions 4-0, 4-i and 4-3, j = h---,P + l. Proof : First, we shall show that It suffices to show that for any constant vector a, where <7^ = a'TjU. By Assumption 4-3, {x.t}^^oo is an iid sequence of random variables. Let Tt = a((^s,'^s, s < t) denote the cr-field generated by {(s,Xs, s < t}, and Zt = a'x.t\u00E2\u0082\u00ACtl(^x,de(,T\u00C2\u00B0_^,T\u00C2\u00B0]) for a given constant vector a. To show that Z]\"=i has an asypmtotic normal distribution, one needs to verify the conditions of Lemma 4.8. Thus, it suffices to show that EZQ = 0, EZQ < 00, E : : = i ( ^ [ ^ ( ^ o | ^ - \u00E2\u0080\u009E . m ^ < 0 0 , and 00 Y,{E[Zo-EiZo\Tm)?y^' 1, Zo = \" ' xo fo l (2 ; ode (TJ ' _ i ,T\u00C2\u00B0 ] ) is .T^m-measurable, hence - E{Zo\^m) = Zo - Zo = 0. So (4.14) is trivial. It remains to show that Y^'^^iiE[E{Zo\J^-mf]y^^ < oo. Now, note that ElEiZolJ'-m)? oo i=0 oo = ^ [ ^ ( \" ' ^ o l ( . o , e ( r ^ \u00E2\u0080\u009E r O ] ) ) E ^ ' ^ - ' l ' oo = [ x ; ( a ' x o i ( . \u00E2\u0080\u009E , e ( , c ^ , , . o ] ) ) ] 2 i ; [ 5 ; v . C - , f oo =[X;(a'xol(.\u00E2\u0080\u009E,e(,^^,rO ]))]2 ^fcr^2 oo E t=m where cj = [E{a'xol(^^^e{T\u00C2\u00B0_\u00E2\u0080\u009Erf]))?(^C Thus CO Y{E[E{Zo\T.m)?V^' m=l oo oo m=:l \u00C2\u00AB=m oo oo m=l \u00C2\u00BB=m oo oo s v J J t o E l E T - f W r . Tn=l \u00C2\u00BB=Tn ^ under our assumption that \ipi\ < ka/\u00C3\u0087i + 1)'^ for all i. Replacing the 6 in equation (4.3) with 26, we obtain that \u00C2\u00B0\u00C2\u00B0 1 \u00C2\u00B0\u00C2\u00B0 1 1 E u + 1)25 = E + i)2S ^ I2S _ i)Tn2\u00C2\u00AB-i \u00E2\u0080\u00A2 (\"^ -1 )^ 133 Since 2(5 - 1 > 1, 771 = 1 OO 771=1 This shows that E \" = i \u00E2\u0080\u00A2^ t ^^ .s an asymptotic normal distribution. We next calculate the asymptotic variance of ra\"^/^ Z)\"=i ^t- By Lemma 4.8, it is n-+oo n n 1 =^[(\" 'x i ) ' l (x . ,6(r ( E ^ ? ) ] ' where lim\u00E2\u0080\u009E..^oo -^-E^CEfLi ^t) = Ee\ = 1 by our assumption. By Proposition 4.2, 71 OO ^ h j n ^ n F a r ( - E f t ) = E t=l i=-oo Hence, hm\u00E2\u0080\u009E^oo nVar{l ^t) - ^ = ET=-oo T ( 0 - 7(0) = 2 E . ^ x 7 (0 , and l im ^ = a'Tja, 7i->oo n which is CT^. By the strong law of large numbers for ergodic sequences, as 71 \u00E2\u0080\u0094>\u00E2\u0080\u00A2 oo. W i t h sufficiently large n, (X^( r j ' _ i , r \u00C2\u00B0 )X\u00E2\u0080\u009E ( rP_ i , rj*))\"^ exists a.s., and 71/ as 71 ^ oo. Hence, = ( ^ ; ( ^ - i , r \u00C2\u00B0 ) X \u00E2\u0080\u009E ( r \u00C2\u00B0 _ i , r j ' ) ) - i ( X ; ( 7 f _ i , r ? ) X \u00E2\u0080\u009E ( 7 f _ i , r j > ) ^ , + X ; ( r \u00C2\u00B0 _ i , r\")?:) =Pj + a , ( X ; ( r ] ' _ i , r \u00C2\u00AB ) X \u00E2\u0080\u009E ( r \u00C2\u00B0 _ i , r \u00C2\u00B0 ) ) - i x ; ( 7 f _ i , r \u00C2\u00B0 ) c \u00E2\u0080\u009E . Since a ] G - i ' [ G , + 2ESi7(0i^(xi l ( . 
, ,e( .<^^, .o]))X;(xi l ( ,^ ,e( .o^ v ^ ( ^ ; - / 3 i ) ^ m \u00C2\u00A3 i ) -This completes the proof. f Lemma 4.10 Under the condition of Lemma 4-9, 1 \" asn^oo, where vj = p , ( l - pj)Eiei) + pj[iv - 3)7^(0) + 2 ZT=-oo 7^(0] '^rid p, = P{T'J_I < xu < rf). P r o o f It suffices to show that Let Tt = 1, Zo is J'm-measurable. Hence, Zo-E{Zo\Tm) = ZQ-ZQ - 0. So (4.14) is trivial. It remains only to show that Em=i(^[^(^oi.^-m)^])^/^ < oo. Recall that E^el) = al E . ^ o V'.\" is assumed to be 1. Hence, E[E{ZQ\T-m)? = E[Ei4Hxode(rO_,,rO])-Pi\^-m)f =E\pjE{el\T.m)-Pj? oo ^p]E[E{{Y,i^,^-if\^-ra)-lf i=0 m-1 =p)E[Y,i^>i + {Y.^iC-i?-if i=0 \u00C2\u00BB=m =p)E[{\u00C2\u00B1i.iC-if-f:^Hf i=m i=m oo oo =p][EiZ^iC-ir-{E^'-i)'^-i~ m i= m Using equation (4.8) by setting ipi = Q for i < m, we have i=m i=m oo oo oo t=m \u00C2\u00AB=m oo oo ^ ( ' / - i K c E ^ i ) ' :=m < ( r ; - l ) a ^ f c ^ ( E - i - ^ f . By (4.15), YlZm + 1 ) \" < 1/(2^ - l)m2*-i . Thus, oo J^iElEiZolT.m)?}'^' < f : p . v ^ ^ - i k i i \u00C2\u00B1 j r ^ ) m=l \u00C2\u00BB=m ' m=l ' i + pj - p'^(^3) - p,'^(f?)] - l im i y i ; ( e ? ) p 2 = p , \u00C2\u00A3 ( e t ) + J i m ^ \u00C2\u00A3[ ( ,2 _ i)(^2 _ _ p2^(^4) 1 \u00C2\u00B0\u00C2\u00B0 =p, ( l - pj)E{et) + p] J i m n F a r ( - ^'t)-By equation (4.9), limn^oo nVari^ E t = i f?) = (^ - 3)7^(0) + 2 E S - o o 7 ' (0 - This completes the proof. ^ P r o o f o f T h e o r e m 4.5 We shall show the conclusion for the j9j's first. Let Pj denote the least squares estimate of Pj when (rf, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, r^ o ) is known, j = 1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, /\u00C2\u00B0 +1. By Lemma 4.9, it suffices to show that Pj and Pj share the same asymptotic distribution, for all j . In turn, it suffices to show that Pj - Pj = Op{n~'/-). Set X ; = / \u00E2\u0080\u009E ( r j ' _ i , r j ' )X\u00E2\u0080\u009E and Xj = Ufj-ufj)Xn. Then, = [ ( i x j x , ) - - ( i x ; ' x ; ) - ] [ i x j y \u00E2\u0080\u009E ] + [ ( i x ; ' x ; ) - ] [ i ( x , - x ; ) ' y \u00E2\u0080\u009E ] It 7t Tt Tt /* = [ ( i x ; . x , ) - - {^x;'x;r]{kx'j - x ; ) X + i x ; y \u00E2\u0080\u009E } + [ ( i x ; ' x ; ) - ] [ ^ ( x , - x ; ) X ] =:( /){(/ /) + ( / / / ) } + ( / y ) ( / / ) . where (/) = [(^X'^Xj)- - ( i X / X / ) \" ] , ( / / ) = i ( X j - X ; ) ' y \u00E2\u0080\u009E , ( / / / ) = i X ; y \u00E2\u0080\u009E and (IV) = [ ( i x / x ; ) - ] . As in the proof of Theorem 4.3, both (III) and (IV) are Op(l). And the order of Op(ra~^/^) of (I) and (II) foUows from Lemma 3.6 by taking a\u00E2\u0080\u009E = In^n/n, Zt = (a'x^)^ and Zt = a'xtj/f respectively, for any real vector a and u > 2. Thus, Pj \u00E2\u0080\u0094 Pj = Op{n~'/'^). Next, we proof the conclusion for the 2 and Zt = yt, it follows from Lemma 3.6 that n ^ J2^=i Vt i^(xtdefii) ~ \u00E2\u0080\u00A2'\u00E2\u0080\u00A2(a i^deH?)) = Op(ra~^/2). Also, it is shown in the proof of Theorem 4.3 that both (III) and (IV) are Op(l) . The order of Op (n -^ /2) of (j) ^nd (II) follows from Lemma 3.6 by taking a\u00E2\u0080\u009E = lv?n/n, Zt = (a'xi)^ and Zt = aJxtyt respectively, for any real vector a and u > 2. This shows that a] - \u00C3\u00A0]* = o(ra-^/2)_ ^ P r o o f o f T h e o r e m 4.6 For d = (f,hy Lemma 4.5 (u), -Sn \u00E2\u0080\u0094 ^ o - Q -n For d ^ dP, -we shall show that > CTQ + C for some constant C > 0 with probability approaching 1. Again, = 1 is assumed for simplicity. 
J\u00C3\u008E d d9,hy the identifiability of d\u00C2\u00B0, for any {Rj}f^i , there exist r ,5 \u00E2\u0082\u00AC {1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, X +1} such that Rf D where is defined in Theorem 2.1. Let 5s = { (n, . . . , TL) : Rf D Af for some r}. Then for any ( n , . . . , TL), ( n , \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, TL) \u00E2\u0082\u00AC for at least one s e {1, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2, L + 1}. Since d is chosen such that S^ < for all d, it suffices to show that iox d^ dP and each s, there exists > 0 such that inf i 5 ^ ( r j , . . . , r L ) > a ^ + C , (4.16) (Ti,...,Ti)\u00E2\u0082\u00ACB, n with probabihty approaching 1 as n -> oo. For any {TI,...,TL) \u00E2\u0082\u00AC Bs, let -^\u00C2\u00A3,^.2 = {x : a;, \u00E2\u0082\u00AC ( r r_ i , a , )} , i2|,+3 = {x : Xd \u00E2\u0082\u00AC ( 6 \u00E2\u0080\u009E r r ] } . Then Ri = Afxj Rj^^^ U From Lemma 4.3 and the proof of Lemma 3.2', we can see that the conclusion of Lemma 3.2' still holds under current assumptions. Hence, the conclusions of Proposition 3.1' and Lemma 3.3' also hold. Therefore, by (3.13) i 5 ^ ( r i , ...,TL) = al + Op(l) + ^[5\u00E2\u0080\u009E(Af ) - 5 \u00E2\u0080\u009E (Af n R^) - 5 \u00E2\u0080\u009E (Af n R^)]. Now it remains to show that i [ 5 \u00E2\u0080\u009E ( A f ) - 5 \u00E2\u0080\u009E ( A f n i2?) -5\u00E2\u0080\u009E (Af ni?^)] > for some C, > 0, with probabihty approaching 1. By Theorem 2.1, Z;[xixil(xjg^^P^o)], i = 1,2, are positive definite. Applying Lemma 3.3' we obtain the desired result. f Chapter 5 S U M M A R Y A N D F U T U R E R E S E A R C H 5.1 A brief summary of previous chapters In this thesis, we propose a set of procedures for estimating the parameters of a segmented regression model. The consistency of the estimators is established under fairly general con-ditions. For the \"basic\" model where the noise is an iid sequence and locally exponentially bounded, it is shown that if the model is discontinuous at a threshold, then the least squares estimate of the threshold converges at the rate of Op{lv?nln). For both continuous and discon-tinuous models, the asymptotic normality of the estimated regression coefficients and the noise variance is established. The least squares \"identifier\" of the segmentation variable is shown to be consistent, if the segmentation variable is asymptotically identifiable. A more efficient method of identifying the segmentation variable is given under stronger conditions. Most of these results are generalized to the case where the noise is heteroscedastic and autocorrelated. A simulation study is carried out to demonstrate the small sample behavior of the proposed estimators. The proposed procedures perform reasonably weU in identifying the models, but indicate the need for large sample sizes for estimating the thresholds. 5.2 Future research on the current model First, further work on choosing and CQ in the MIC is needed. One way to reduce the risk of mis-specifying the model is to try different (^O)Co) values over certain range. If several (<5o,co) pairs produced the same /, we would be more confident of our choice. Otherwise different models can be fitted. And the estimated regression coefficients and noise variance may then indicate what {60, CQ) is more appropriate. In particular, when the noise is autocorrelated, recursive estimation procedures need to be investigated. 
Second, the asymptotic normality of the estimated regression coefficients for continuous models needs to be generalized to the case where the noise is heteroscedastic and autocorrelated. The techniques used in Sections 3.5 and 4.5 are useful, but additional tools are needed, such as a central limit theorem for a double array of martingales.

Third, the local exponential boundedness assumption made on the noise may be relaxed. Note that this assumption implies that $e_1$ has moments of all orders. If $e_1$ is assumed to have moments only up to a finite order, a model selection criterion with a penalty term of the form $Cn^{\alpha}$ ($0 < \alpha < 1$) may well be consistent. This has been shown by Yao and Au (1989) for a one-dimensional step function with fixed covariates and iid noise.

5.3 Further generalizations

Further generalization of the segmented regression model will enable its broader application. First, there may be more than one segmentation variable. For example, changes in economic policy may be triggered by simultaneous extremes in a number of key economic indices. The results in this thesis may be generalized to the case where more than one segmentation variable is present. Further, since sometimes there is no reason to believe that the segmentation has to be parallel to any of the axes, a threshold defined in terms of a linear combination of explanatory variables may be appropriate. A least squares approach, or that of Goldfeld and Quandt (1972, 1973a), can be applied; the large sample properties of the estimators given by these approaches would need to be investigated.

In many economic problems, the explanatory variables exhibit certain kinds of dependence over time. The explanatory variables and the noise may also be dependent. Our results can be generalized in this direction, since the iid assumption on $\{x_t\}$ is not essential. Once such generalizations are accomplished, we expect this model to be useful for many economic problems, since many economic policies and business decisions are threshold-based, at least to some extent. In fact, the segmented regression model has been applied to a foreign exchange rate problem by Liu and Susko (1992), with significantly better results than other approaches reported in the literature. The need for a theoretical justification of this approach is therefore obvious.

If $y_t$ and $x_{ti}$ in Model 2.1 are replaced by $x_t$ and $x_{t-i}$ respectively ($i = 1, \ldots, p$), where $\{x_t\}$ is a time series, then the model becomes a threshold autoregressive model. This interesting class of nonlinear time series models has been studied by many authors; see, for example, Tong (1987) for a review of some recent work on nonlinear time series analysis. Because this model is very similar to ours in structure, the approaches used in this thesis may also shed some light on its model selection problem and on the large sample properties of its least squares estimates. In particular, we expect that a criterion similar to the MIC can be used to select the number of thresholds for the threshold autoregressive model; a least squares fit of a two-regime version is sketched below.
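The sketch below illustrates the least squares threshold estimation underlying this connection, applied to a two-regime threshold autoregression: the threshold is chosen by a grid search minimizing the pooled residual sum of squares. It is a minimal illustration under stated assumptions, not an implementation from the thesis; the simulated model, the candidate grid of sample quantiles, and the name fit_tar2 are invented for this example.

    # A minimal sketch of least squares threshold estimation for a two-regime
    # threshold autoregression x_t = a + b * x_{t-1} + e_t, with the regime
    # determined by whether x_{t-1} exceeds tau.  All names are illustrative.
    import numpy as np

    def fit_tar2(x, candidates):
        """Grid search for the threshold minimizing the pooled SSE."""
        y, z = x[1:], x[:-1]                       # response and lagged regressor
        best_sse, best = np.inf, None
        for tau in candidates:
            low = z <= tau
            if low.sum() < 3 or (~low).sum() < 3:  # keep both regimes estimable
                continue
            sse, coefs = 0.0, []
            for mask in (low, ~low):
                Z = np.column_stack([np.ones(mask.sum()), z[mask]])
                beta, *_ = np.linalg.lstsq(Z, y[mask], rcond=None)
                resid = y[mask] - Z @ beta
                sse += float(resid @ resid)
                coefs.append(beta)
            if sse < best_sse:
                best_sse, best = sse, (tau, coefs[0], coefs[1])
        return best

    # Simulate a series with a regime switch at threshold 0 and recover it.
    rng = np.random.default_rng(1)
    x = np.zeros(400)
    for t in range(1, 400):
        if x[t - 1] <= 0.0:
            x[t] = 0.5 + 0.6 * x[t - 1] + rng.normal()
        else:
            x[t] = -0.5 - 0.3 * x[t - 1] + rng.normal()
    tau_hat, beta_low, beta_high = fit_tar2(x, np.quantile(x, np.linspace(0.1, 0.9, 33)))
    print(tau_hat, beta_low, beta_high)

The same grid search carries over to the regression setting by replacing the lagged series with the covariates and the segmentation variable; selecting the number of thresholds would then fall to an MIC-type criterion, as discussed in Section 5.2.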
REFERENCES

Bacon, D.W. and Watts, D.G. (1971). Estimating the transition between two intersecting straight lines. Biometrika, 58, 525-543.
Bellman, R. (1969). Curve fitting by segmented straight lines. J. Amer. Statist. Assoc., 64, 1079-1084.
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, N.Y.
Breiman, L. and Meisel, W.S. (1976). General estimates of the intrinsic variability of data in nonlinear regression models. J. Amer. Statist. Assoc., 71, 301-307.
Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. Springer-Verlag, N.Y.
Broemeling, L.D. (1974). Bayesian inferences about a changing sequence of random variables. Commun. Statist., 3, 234-255.
Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc., 74, 829-836.
Cleveland, W.S. and Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local fitting. J. Amer. Statist. Assoc., 83, 596-610.
Dunicz, B.L. (1969). Discontinuities in the surface structure of alcohol-water mixtures. Kolloid-Zeitschr. u. Zeitschrift f. Polymere, 230, 346-357.
Ertel, J.E. and Fowlkes, E.B. (1976). Some algorithms for linear spline and piecewise multiple linear regression. J. Amer. Statist. Assoc., 71, 640-648.
Farley, J.U. and Hinich, M.J. (1970). A test for a shifting slope coefficient in a linear model. J. Amer. Statist. Assoc., 65, 1320-1329.
Feder, P.I. and Sylwester, D.L. (1968). On the asymptotic theory of least squares estimation in segmented regression: identified case (preliminary report). Abstracted in Ann. Math. Statist., 39, 1362.
Feder, P.I. (1975a). On asymptotic distribution theory in segmented regression problems: identified case. Ann. Statist., 3, 49-83.
Friedman, J.H. (1988). Multivariate Adaptive Regression Splines. Report 102, Department of Statistics, Stanford University.
Friedman, J.H. (1991). Multivariate Adaptive Regression Splines. Ann. Statist., 19, 1-141.
Feder, P.I. (1975b). The log likelihood ratio in segmented regression. Ann. Statist., 3, 84-97.
Ferreira, P.E. (1975). A Bayesian analysis of a switching regression model: known number of regimes. J. Amer. Statist. Assoc., 70, 730-734.
Gallant, A.R. and Fuller, W.A. (1973). Fitting segmented polynomial regression models whose join points have to be estimated. J. Amer. Statist. Assoc., 68, 144-147.
Goldfeld, S.M. and Quandt, R.E. (1972). Nonlinear Methods in Econometrics. North-Holland Publishing Co.
Goldfeld, S.M. and Quandt, R.E. (1973a). The estimation of structural shifts by switching regressions. Ann. Econ. Soc. Measurement, 2, 475-485.
Goldfeld, S.M. and Quandt, R.E. (1973b). A Markov model for switching regressions. Journal of Econometrics, 1, 3-16.
Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. Academic Press.
Hawkins, D.M. (1980). A note on continuous and discontinuous segmented regressions. Technometrics, 22, 443-444.
Henderson, H.V. and Velleman, P.F. (1981). Building multiple regression models interactively. Biometrics, 37, 391-411.
Henderson, R. (1986). Change-point problem with correlated observations, with an application in material accountancy. Technometrics, 28, 381-389.
Hinkley, D.V. (1969). Inference about the intersection in two-phase regression. Biometrika, 56, 495-504.
Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57, 1-17.
Holbert, D. and Broemeling, L. (1977). Bayesian inferences related to shifting sequences and two-phase regression. Commun. Statist. Theor. Meth., A6(3), 265-275.
Jennrich, R.J. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist., 40, 633-643.
Hudson, D.J. (1966). Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc., 61, 1097-1129.
Liu, J. and Liu, Z. (1991). Higher order moments and limit theory of a general bilinear time series. Unpublished manuscript.
Liu, J. and Susko, E.A. (1992). Forecasting exchange rates using segmented time series regression models - a nonlinear multi-country model. Unpublished manuscript.
MacNeill, I.B. (1978). Properties of sequences of partial sums of polynomial regression residuals with applications to tests for change of regression at unknown times. Ann. Statist., 6, 422-433.
McGee, V.E. and Carleton, W.T. (1970). Piecewise regression. J. Amer. Statist. Assoc., 65, 1109-1124.
Miao, B.Q. (1988). Inference in a model with at most one slope-change point. Journal of Multivariate Analysis, 27, 375-391.
Muller, H.G. and Stadtmuller, U. (1987). Estimation of heteroscedasticity in regression analysis. Ann. Statist., 15, 610-625.
Poirier, D.J. (1973). Piecewise regression using cubic splines. J. Amer. Statist. Assoc., 68, 515-524.
Quandt, R.E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Amer. Statist. Assoc., 53, 873-880.
Quandt, R.E. (1960). Tests of the hypothesis that a linear regression system obeys two separate regimes. J. Amer. Statist. Assoc., 55, 324-330.
Quandt, R.E. (1972). A new approach to estimating switching regressions. J. Amer. Statist. Assoc., 67, 306-310.
Quandt, R.E. and Ramsey, J.B. (1978). Estimating mixtures of normal distributions and switching regressions (with discussion). J. Amer. Statist. Assoc., 73, 730-752.
Robison, D.E. (1964). Estimates for the points of intersection of two polynomial regressions. J. Amer. Statist. Assoc., 59, 214-224.
Sacks, J. and Ylvisaker, D. (1978). Linear estimation for approximately linear models. Ann. Statist., 6, 1122-1137.
Schulze, U. (1984). A method of estimation of change points in multiphasic growth models. Biometrical Journal, 26, 495-504.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6, 461-464.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
Shaban, S.A. (1980). Change point problem and two-phase regression: an annotated bibliography. International Statistical Review, 48, 83-93.
Shao, J. (1990). Asymptotic theory in heteroscedastic nonlinear models. Statistics & Probability Letters, 10, 77-85.
Shumway, R.H. and Stoffer, D.S. (1991). Dynamic linear models with switching. J. Amer. Statist. Assoc., 86, 763-769.
Sylwester, D.L. (1965). On the maximum likelihood estimation for two-phase linear regression. Technical Report No. 11, Department of Statistics, Stanford Univ.
Sprent, P. (1961). Some hypotheses concerning two phase regression lines. Biometrics, 17, 634-645.
Susko, E.A. (1991). Segmented regression modelling with an application to German exchange rate data. M.Sc. thesis, Department of Statistics, University of British Columbia.
Tong, H. (1987). Non-linear time series models of regularly sampled data: a review. Proc. First World Congress of the Bernoulli Society, Tashkent, USSR, 2, 355-367. VNU Science Press, The Netherlands.
Weerahandi, W. and Zidek, J.V. (1988). Bayesian nonparametric smoothers for regular processes. The Canadian Journal of Statistics, 16, 61-73.
Worsley, K.J. (1983). Testing for a two-phase multiple regression. Technometrics, 25, 35-42.
Yao, Y. (1988). Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters, 6, 181-189.
Wu, C.F.J. (1981). Asymptotic theory of nonlinear least squares estimation. Ann. Statist., 9, 501-513.
Yao, Y. and Au, S.T. (1989). Least-squares estimation of a step function. Sankhya: The Indian Journal of Statistics, Series A, 51, 370-381.
Yeh, M.P., Gardner, R.M., Adams, T.D., Yanowitz, F.G., and Crapo, R.O. (1983). "Anaerobic threshold": problems of determination and validation. J. Appl. Physiol.: Respirat. Environ. Exercise Physiol., 55, 1178-1186.
Zwiers, F. and Storch, H.V. (1990). Regime-dependent autoregressive time series modeling of the Southern Oscillation. Journal of Climate, 3, 1347-1363.

Table 3.1: Frequency of correct identification of l⁰ in 100 repetitions and the estimated thresholds for segmented regression models (m, m_u, m_o are the frequencies of correct, under- and over-estimation of l⁰; for each model, the first row gives MIC: m (m_u, m_o) and the second row gives τ̂ (SE)).

             sample size:  30              50              100             200
Model (a)                  79 (18, 3)      95 (4, 1)       100 (0, 0)      100 (0, 0)
                           1.168 (1.500)   1.033 (1.353)   1.410 (0.984)   1.259 (0.665)
Model (b)                  70 (21, 9)      86 (8, 6)       99 (0, 1)       100 (0, 0)
                           1.022 (1.546)   1.220 (1.407)   1.432 (0.908)   1.245 (0.692)
Model (c)                  80 (6, 14)      97 (1, 2)       100 (0, 0)      100 (0, 0)
                           0.890 (0.737)   0.761 (0.502)   0.901 (0.221)   0.932 (0.151)
Model (d)                  85 (8, 7)       99 (0, 1)       100 (0, 0)      100 (0, 0)
                           0.791 (1.009)   0.860 (0.665)   0.971 (0.232)   0.963 (0.169)
Model (e)                  68 (23, 9)      87 (12, 1)      100 (0, 0)      100 (0, 0)
                           0.463 (1.735)   0.708 (1.332)   0.989 (0.923)   0.940 (0.707)

Table 3.2: Estimated regression coefficients and noise variances and their standard errors with n = 200 (conditional on l̂ = 1; entries are estimate (SE)).

         Model (a)        Model (b)        Model (c)        Model (d)        Model (e)
β̂₁₀     -0.003 (0.145)   -0.018 (0.146)    0.004 (0.143)   -0.008 (0.154)   -0.059 (0.177)
β̂₁₁      1.001 (0.038)    0.995 (0.037)    1.000 (0.035)    0.995 (0.041)    0.985 (0.045)
β̂₁₂      1.000 (0.024)    0.996 (0.025)   -0.004 (0.025)    0.000 (0.024)    1.000 (0.025)
β̂₁₃      0.994 (0.023)    0.995 (0.025)
β̂₂₀      1.485 (0.345)    1.388 (0.332)    0.962 (0.243)    1.009 (0.225)    0.960 (0.283)
β̂₂₁      0.005 (0.063)    0.019 (0.067)    0.008 (0.055)    0.000 (0.049)    0.008 (0.057)
β̂₂₃      1.006 (0.034)    0.998 (0.034)    0.495 (0.032)    0.498 (0.032)    0.998 (0.036)
β̂₂₄      0.997 (0.034)    0.996 (0.036)
σ̂²       0.948 (0.108)    0.950 (0.154)    0.956 (0.156)    0.953 (0.160)    0.944 (0.158)

Table 3.3: The empirical distribution of l̂ in 100 repetitions by MIC, SC and YC for the piecewise constant model (n₀, n₁, n₂, n₃ are the frequencies of l̂ = 0, 1, 2, 3 respectively).

                  sample size:  50              150             450
Model (f)   MIC                 5, 30, 48, 17   0, 18, 79, 3    0, 0, 98, 2
            YC                  5, 36, 45, 14   0, 36, 64, 0    0, 9, 91, 0
            SC                  0, 17, 52, 31   0, 1, 64, 35    0, 0, 83, 17
Model (g)   MIC                 5, 38, 51, 6    0, 23, 72, 5    0, 0, 99, 1
            YC                  7, 41, 48, 4    0, 46, 53, 1    0, 7, 93, 0
            SC                  3, 18, 56, 23   0, 2, 79, 19    0, 0, 87, 13
Model (h)   MIC                 0, 3, 81, 16    0, 0, 96, 4     0, 0, 98, 2
            YC                  0, 3, 84, 13    0, 0, 100, 0    0, 0, 100, 0
            SC                  0, 0, 63, 37    0, 0, 82, 18    0, 0, 87, 13
Model (i)   MIC                 0, 5, 85, 10    0, 0, 97, 3     0, 0, 100, 0
            YC                  0, 7, 86, 7     0, 0, 100, 0    0, 0, 100, 0
            SC                  0, 1, 73, 26    0, 0, 83, 17    0, 0, 93, 7

Table 3.4: The estimated thresholds and their standard errors for the piecewise constant model (conditional on l̂ = 2).

                  sample size:  50              150             450
Model (f)   τ̂₁ (SE)            0.335 (0.078)   0.338 (0.039)   0.334 (0.012)
            τ̂₂ (SE)            0.660 (0.032)   0.666 (0.008)   0.667 (0.003)
Model (g)   τ̂₁ (SE)            0.313 (0.076)   0.332 (0.032)   0.334 (0.013)
            τ̂₂ (SE)            0.656 (0.015)   0.669 (0.009)   0.667 (0.002)
Model (h)   τ̂₁ (SE)            0.316 (0.027)   0.334 (0.007)   0.333 (0.002)
            τ̂₂ (SE)            0.662 (0.030)   0.667 (0.006)   0.667 (0.003)
Model (i)   τ̂₁ (SE)            0.323 (0.023)   0.332 (0.010)   0.334 (0.004)
            τ̂₂ (SE)            0.661 (0.030)   0.666 (0.007)   0.667 (0.003)
Table 4.1: Frequency of correct identification of l⁰ in 100 repetitions and the estimated thresholds for segmented regression models with two regimes (m, m_u, m_o are the frequencies of correct, under- and over-estimation of l⁰; for each model, the first row gives MIC: m (m_u, m_o) and the second row gives τ̂ (SE)).

             sample size:  50              100             200
Model (a')                 95 (3, 2)       98 (0, 2)       99 (0, 1)
                           1.322 (1.681)   1.412 (1.293)   1.223 (1.060)
Model (d')                 91 (1, 8)       95 (0, 5)       99 (0, 1)
                           0.808 (0.545)   0.936 (0.256)   0.960 (0.109)
Model (e')                 94 (3, 3)       98 (0, 2)       99 (0, 1)
                           0.693 (1.583)   1.088 (1.470)   1.175 (1.111)

Table 4.2: Estimated regression coefficients and noise variances and their standard errors with n = 200 (conditional on l̂ = 1; entries are estimate (SE)).

         Model (a')       Model (d')       Model (e')
β̂₁₀     -0.049 (0.247)    0.007 (0.190)   -0.056 (0.227)
β̂₁₁      0.993 (0.066)    0.998 (0.059)    0.985 (0.065)
β̂₁₂      1.003 (0.017)   -0.001 (0.020)    0.999 (0.019)
β̂₁₃      0.998 (0.018)    0.997 (0.018)
β̂₂₀      1.258 (0.730)    0.957 (0.461)    0.749 (0.596)
β̂₂₁      0.033 (0.129)    0.013 (0.107)    0.045 (0.126)
β̂₂₃      0.998 (0.033)    0.503 (0.029)    1.002 (0.030)
β̂₂₄      0.998 (0.026)    0.999 (0.029)
σ̂₁²      0.656 (0.117)    0.639 (0.167)    0.634 (0.166)
σ̂₂²      0.929 (0.271)    1.050 (0.391)    0.963 (0.361)

Table 4.3: Frequency of correct identification of l⁰ in 100 repetitions and the estimated thresholds for a segmented regression model with three regimes (m, m_u, m_o are the frequencies of correct, under- and over-estimation of l⁰).

Model (j)            sample size:  50               100              200
MIC: m (m_u, m_o)                  62 (26, 12)      86 (6, 8)        95 (0, 5)
τ̂₁ (SE)                           -1.211 (0.251)   -1.051 (0.151)   -1.034 (0.078)
τ̂₂ (SE)                            1.046 (0.493)    1.060 (0.388)    0.974 (0.096)

Table 4.4: Estimated regression coefficients and noise variances and their standard errors with n = 200 (conditional on l̂ = 2; Model (j); entries are estimate (SE)).

            j = 1            j = 2            j = 3
β̂_j0       0.987 (0.290)   -0.029 (0.212)    0.454 (0.413)
β̂_j1       0.996 (0.062)    0.097 (0.480)    0.011 (0.092)
β̂_j2      -0.001 (0.017)    1.000 (0.032)    0.499 (0.028)
σ̂_j²       0.511 (0.165)    0.681 (0.269)    1.002 (0.294)

Figure 2.1: (x₁, x₂) uniformly distributed over the shaded area.

Figure 2.2: (x₁, x₂) uniformly distributed over the eight points.

Figure 2.3: Miles per gallon vs. weight for 38 cars.

Figure 4.1: (x₁, x₂) uniformly distributed over each of six regions with the indicated mass.