Inference in Partially Linear Models with Correlated Errors

by

ISABELLA RODICA GHEMENT

B.Sc., The University of Bucharest, Romania, 1996
M.Sc., The University of Bucharest, Romania, 1997

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE STUDIES
(Statistics)

The University of British Columbia
August 2005

© ISABELLA RODICA GHEMENT, 2005

Abstract

We study the problem of performing statistical inference on the linear effects in partially linear models with correlated errors. To estimate these effects, we introduce usual, modified and estimated modified backfitting estimators, relying on locally linear regression. We obtain explicit expressions for the conditional asymptotic bias and variance of the usual backfitting estimators under the assumption that the model errors follow a mean zero, covariance-stationary process. We derive similar results for the modified backfitting estimators under the more restrictive assumption that the model errors follow a mean zero, stationary autoregressive process of finite order. Our results assume that the width of the smoothing window used in locally linear regression decreases at a specified rate, and the number of data points in this window increases. These results indicate that the squared bias of the considered estimators can dominate their variance in the presence of correlation between the linear and non-linear variables in the model, therefore compromising their $\sqrt{n}$-consistency. We suggest that this problem can be remedied by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. We argue that this rate is slower than the rate that is optimal for estimating the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect.
For this reason, data-driven methods devised for accurate estimation of the non-linear effect may fail to yield a satisfactory choice of smoothing for estimating the linear effects. We introduce three data-driven methods for accurate estimation of the linear effects. Two of these methods are modifications of the Empirical Bias Bandwidth Selection method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. We use the data-driven choices of smoothing supplied by these methods as a basis for constructing approximate confidence intervals and tests of hypotheses for the linear effects. Our inferential procedures do not account for the uncertainty associated with the fact that the choices of smoothing are data-dependent and the error correlation structure is estimated from the data. We investigate the finite sample properties of our procedures via a simulation study. We also apply these procedures to the analysis of data collected in a time-series air pollution study.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Literature Review
    1.1.1 Partially Linear Models with Uncorrelated Errors
    1.1.2 Partially Linear Models with Correlated Errors
  1.2 Thesis Objectives
2 A Partially Linear Model with Correlated Errors
  2.1 The Model
  2.2 Assumptions
  2.3 Notation
  2.4 Linear Algebra - Useful Definitions and Results
  2.5 Appendix
3 Estimation in a Partially Linear Model with Correlated Errors
  3.1 Generic Backfitting Estimators
    3.1.1 Usual Generic Backfitting Estimators
    3.1.2 Modified Generic Backfitting Estimators
    3.1.3 Estimated Modified Generic Backfitting Estimators
    3.1.4 Usual, Modified and Estimated Modified Speckman Estimators
4 Asymptotic Properties of the Local Linear Backfitting Estimator $\hat{\beta}_{I,S_h^c}$
  4.1 Exact Conditional Bias of $\hat{\beta}_{I,S_h^c}$ given X and Z
  4.2 Exact Conditional Variance of $\hat{\beta}_{I,S_h^c}$ given X and Z
  4.3 Exact Conditional Measure of Accuracy of $\hat{\beta}_{I,S_h^c}$ given X and Z
  4.4 The $\sqrt{n}$-consistency of $\hat{\beta}_{I,S_h^c}$
  4.5 Generalization to Local Polynomials of Higher Degree
  4.6 Appendix
5 Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators, $\hat{\beta}_{\Psi^{-1},S_h^c}$ and $\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$
  5.1 Exact Conditional Bias of $\hat{\beta}_{\Psi^{-1},S_h^c}$ given X and Z
  5.2 Exact Conditional Variance of $\hat{\beta}_{\Psi^{-1},S_h^c}$ given X and Z
  5.3 Exact Conditional Measure of Accuracy of $\hat{\beta}_{\Psi^{-1},S_h^c}$ given X and Z
  5.4 The $\sqrt{n}$-consistency of $\hat{\beta}_{\Psi^{-1},S_h^c}$
  5.5 Generalization to Local Polynomials of Higher Degree
  5.6 The $\sqrt{n}$-consistency of $\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$
  5.7 Appendix
6 Choosing the Correct Amount of Smoothing
  6.1 Notation
  6.2 Choosing h for $c^T\hat{\beta}_{I,S_h^c}$ and $c^T\hat{\beta}_{\Psi^{-1},S_h^c}$
    6.2.1 Review of Opsomer and Ruppert's EBBS method
    6.2.2 Modifications to the EBBS method
    6.2.3 Plug-in method
  6.3 Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$
    6.3.1 Estimating $m$
    6.3.2 Estimating $\sigma_\epsilon^2$ and $\Psi$
  6.4 Choosing h for $c^T\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$
7 Confidence Interval Estimation and Hypothesis Testing
  7.1 Confidence Interval Estimation
    7.1.1 Bias-Adjusted Confidence Interval Construction
    7.1.2 Standard Error-Adjusted Confidence Interval Construction
  7.2 Hypothesis Testing
8 Monte Carlo Simulations
  8.1 The Simulated Data
  8.2 The Estimators
  8.3 The MSE Comparisons
  8.4 Confidence Interval Coverage Comparisons
    8.4.1 Standard Confidence Intervals
    8.4.2 Bias-Adjusted Confidence Intervals
    8.4.3 Standard Error-Adjusted Confidence Intervals
  8.5 Confidence Interval Length Comparisons
  8.6 Conclusions
9 Application to Air Pollution Data
  9.1 Data Description
  9.2 Data Analysis
    9.2.1 Models Entertained for the Data
    9.2.2 Importance of Choice of Amount of Smoothing
    9.2.3 Choosing an Appropriate Model for the Data
    9.2.4 Inference on the PM10 Effect on Log Mortality
10 Conclusions
Bibliography
Appendix A MSE Comparisons
Appendix B Validity of Confidence Intervals
Appendix C Confidence Interval Length Comparisons

List of Tables

8.1 Values of $l$ for which the standard 95% confidence intervals for $\beta_1$ constructed from the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study.

List of Figures

8.1 Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_1(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.
8.2 Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_2(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.
9.1 Pairwise scatter plots of the Mexico City air pollution data.
9.2 Results of gam inferences on the linear PM10 effect $\beta_1$ in model (9.3) as a function of the span used for smoothing the seasonal effect $m_1$:
estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for $\beta_1$ (bottom left) and p-values of t-tests for testing the statistical significance of $\beta_1$.
9.3 The top panel displays a scatter plot of log mortality versus PM10. The ordinary least squares regression line of log mortality on PM10 is superimposed on this plot. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.
9.4 Plots of the fitted seasonal effect $m_1$ in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots.
9.5 Plots of the residuals associated with model (9.2) for various spans.
9.6 P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2).
9.7 Plots of the fitted weather surface $m_2$ in model (9.4) when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09. The surface $m_2$ was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).
9.8 Degrees of freedom consumed by the fitted weather surface $m_2$ in model (9.4) versus the span used for smoothing $m_2$ when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09.
9.9 Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.
9.10 Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.
9.11 Plot of residuals associated with model (9.3) versus temperature, given relative humidity. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.
9.12 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3).
The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.
9.13 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).
9.14 Usual local linear backfitting estimate of the linear PM10 effect in model (9.4) versus the smoothing parameter.
9.15 Preliminary estimates of the seasonal effect $m_1$ in model (9.3), obtained with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.
9.16 Residuals associated with model (9.3), obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.
9.17 Estimated order for the AR process describing the serial correlation in the residuals associated with model (9.3) versus $l$, where $l = 0, 1, \ldots, 26$. Residuals were obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.
9.18 Estimated bias squared, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The estimated variance curves corresponding to small values of $l$ are dominated by those corresponding to large values of $l$ when the smoothing parameter is large. In contrast, the estimated squared bias and mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$ when the smoothing parameter is large.
9.19 Estimated bias squared, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$.
The curves corresponding to large values of $l$ dominate those corresponding to small values of $l$.
9.20 Plug-in choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.
9.21 Global EBBS choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.
9.22 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with plug-in choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.
9.23 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents intervals corresponding to values of $l$ that are reasonable for the data; the intervals corresponding to $l = 3, \ldots, 7$ do not cross the horizontal line passing through zero.
9.24 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.
A.1 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.
A.2 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.
A.3 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.
A.4 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.
A.5 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.
A.6 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.
A.7 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.
A.8 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.
A.9 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.
A.10 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.
A.11 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.
A.12 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.
A.13 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.
A.14 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.
A.15 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.
A.16 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 significance level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.
A.17 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.
A.18 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.
A.19 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.
A.20 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.
A.21 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.
A.22 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.
A.23 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.
A.24 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.
A.25 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.
A.26 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.
A.27 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.
A.28 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.
A.29 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.
A.30 Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.
B.1 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_1(z)$.
B.2 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_1(z)$.
B.3 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_1(z)$.
B.4 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line.
Estimates were obtained with $\rho = 0.6$ and $m(z) = m_1(z)$.
B.5 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_1(z)$.
B.6 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_2(z)$.
B.7 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_2(z)$.
B.8 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_2(z)$.
B.9 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$.
The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_2(z)$.
B.10 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_2(z)$.
C.1 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_1(z)$.
C.2 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.2$ and $m(z) = m_1(z)$.
C.3 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S.
Lengths were computed with $\rho = 0.4$ and $m(z) = m_1(z)$.
C.4 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.6$ and $m(z) = m_1(z)$.
C.5 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.8$ and $m(z) = m_1(z)$.
C.6 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_2(z)$.
C.7 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S.
Lengths were computed with $\rho = 0.2$ and $m(z) = m_2(z)$.
C.8 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.4$ and $m(z) = m_2(z)$.
C.9 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.6$ and $m(z) = m_2(z)$.
C.10 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.8$ and $m(z) = m_2(z)$.

Acknowledgements

A huge thank you to my thesis supervisor, Dr. Nancy Heckman, for being such an inspirational mentor to me - amazingly generous with her time, ideas, advice and NSERC funding, immensely passionate about research and teaching, wonderfully encouraging and supportive.

A sincere thank you to Dr. John Petkau, Department of Statistics, University of British Columbia and to Professor Sverre Vedal and Dr.
Eduardo Hernández-Garduño, formerly of the Respiratory Division, Department of Medicine, Faculty of Medicine, University of British Columbia, for kindly providing me with the Mexico City air pollution data. A heartfelt thank you to Dr. John Petkau for generously funding me to analyze these data, for providing me with valuable feedback upon reading the manuscript of this thesis and for his excellent advice over the years.

A sincere thank you to Dr. Lang Wu, Dr. Jim Zidek, Dr. Michael Brauer and Dr. Jean Opsomer for their careful reading of the thesis manuscript and valuable comments and suggestions.

Thank you to the Department of Statistics and the University of British Columbia for providing me with funding that enabled me to pursue my degree.

I would like to thank all faculty, staff and graduate students in the Department of Statistics, University of British Columbia, for making my stay there such an enriching experience.

I would like to thank my family in Romania for believing in me and for loving me unconditionally. I would also like to thank my dear friends in Canada and Romania, whose affection and humour helped me stay grounded. Special thanks to Viviane Diaz-Lima, Lisa Kuramoto and Raluca Balan for their unwavering support and for being my friends.

Finally, thank you to Jeffie, my partner in mischief and adventure, for loving me and our family in Romania beyond measure, and for making magical things happen all the time.

ISABELLA RODICA GHEMENT
The University of British Columbia
August 2005

Jeffie, my ever-loving, ever-caring, ever-there knight in shining armour, and our loving family in Romania.

Chapter 1
Introduction

Semiparametric regression models combine the ease of interpretation of parametric regression models with the modelling flexibility of nonparametric regression models. They generalize parametric regression models by allowing one or more covariate effects to be non-linear.
Just as in nonparametric regression models, the non-linear covariate effects are assumed to change gradually and are captured via smooth, unknown functions whose particular shapes will be revealed by the data.

In this thesis, we are interested in semiparametric regression models for which (i) the response variable is univariate, continuous, (ii) one of the covariate effects is allowed to be smooth, non-linear, and (iii) the remaining covariate effects are assumed to be linear. Given the data $(Y_i, \boldsymbol{X}_i, Z_i)$, $i = 1, \ldots, n$, such models can be specified as:

$$Y_i = \boldsymbol{X}_i^T \boldsymbol{\beta} + m(Z_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{1.1}$$

where $\boldsymbol{\beta}$ is a vector of linear effects, $m$ is a smooth, non-linear effect and the $\epsilon_i$'s are unobservable random errors with zero mean. Model (1.1) is typically referred to as a partially linear regression model.

In many applications, the smooth, non-linear effect $m$ in model (1.1) is not of interest in itself but is included in the model because of its potential for confounding the linear effects $\boldsymbol{\beta}$, which are of main interest. The nature of this confounding is often too complex to specify parametrically. A non-parametric specification of this confounding effect is therefore preferred to avoid modelling biases. The practical choice of the degree of smoothness of the non-linear confounder effect is a delicate issue in these types of applications. This choice should yield accurate point estimators of the linear effects of interest. The choice may be highly sensitive to the correlations between the linear and non-linear variables in the model.

The potential correlation amongst model errors is a qualitatively different source of confounding on the linear effects of interest in a partially linear model. In practice, we need to decide carefully whether we should account for this correlation when assessing the significance and magnitude of the linear effects of interest.
If one decides to ignore the error correlation, one should try to understand the impact of this decision on the validity of the ensuing inferences.

The issues of error correlation, non-linear confounding, and correlation between the linear and non-linear terms in a partially linear regression model are intimately connected. Their interplay needs to be judiciously considered when selecting the degree of smoothness of the estimated non-linear effect. Even when this selection yields accurate estimators of the linear effects of interest in the model, one needs to assess whether it also yields valid confidence intervals and testing procedures for assessing the magnitude and significance of these effects.

1.1 Literature Review

In this section, we provide a survey of some of the most important results in the literature of partially linear regression models of the form (1.1). We treat separately the case when the model errors, $\epsilon_i$, $i = 1, \ldots, n$, are uncorrelated and when they are correlated.

Note that, in (1.1), we observe only one sequence $Y_1, \ldots, Y_n$. In classical longitudinal studies we would observe multiple sequences. Even though in this thesis we are not interested in partially linear models for analyzing data collected in longitudinal studies, we do mention some results which are significant in the literature of these models.

1.1.1 Partially Linear Models with Uncorrelated Errors

The partially linear regression model (1.1) has been investigated extensively under the assumption of independent, identically distributed errors. In this section, we provide a brief overview of some of the most relevant results concerning inferences on $\boldsymbol{\beta}$, the parametric component of the model, that are available in the literature. These results have a common theme: seeing if $\boldsymbol{\beta}$ is estimated at the 'usual' parametric rate of $1/n$ - the rate that would be achieved if $m$ were known.
As Robinson (1988) points out, consistent estimators of $\boldsymbol{\beta}$ that do not have the 'usual' parametric rate of convergence have zero efficiency relative to estimators that have this rate.

Engle et al. (1983) and Wahba (1984) proposed estimating $\boldsymbol{\beta}$ and $m$ simultaneously by minimizing a penalized least squares criterion with penalty based on the $s$th derivative of $m$, with $s > 2$. The performance of the penalized least squares estimator of $\boldsymbol{\beta}$ depends on the correlation between the linear and non-linear variables in the model. Heckman (1986) established the $\sqrt{n}$-consistency of this estimator assuming that the linear and non-linear variables are uncorrelated. Rice (1986) showed that, if the linear and non-linear variables are correlated, the estimator becomes $\sqrt{n}$-inconsistent, unless one 'undersmooths' the estimated $m$. 'Undersmoothing' refers to the phenomenon of estimating $m$ at a slower rate than the 'usual' nonparametric rate of $n^{-4/5}$ - the rate that would be achieved if $\boldsymbol{\beta}$ were known. Rice showed that if one didn't 'undersmooth', the squared bias of the estimated linear effects would dominate their variance. The author remarked that this would have disastrous consequences on the inferences carried out on the linear effects. For instance, conventional confidence intervals for these effects would be misleading. Rice called into question the utility of traditional methods such as cross-validation for choosing the degree of smoothness of the estimated non-linear effect when $\sqrt{n}$-consistency of the estimated linear effects is desired, and rightly so. These methods are devised for 'smoothing', not 'undersmoothing', the estimated non-linear effect.

Green, Jennison and Seheult (1985) proposed estimating $\boldsymbol{\beta}$ and $m$ by minimizing a penalized least squares criterion with penalty based on a discretization of the second derivative of $m$. They termed their estimation method least squares smoothing and showed that it yields estimators that solve a system of backfitting equations.
These equations combine a smoothing step for estimating $m$, carried out using a discretized version of smoothing splines, with a least squares regression step for estimating $\boldsymbol{\beta}$. Green, Jennison and Seheult generalized their least squares smoothing estimators by allowing the smoothing step in the backfitting equations to be carried out using any smoothing method. These generalized least squares smoothing estimators are referred to in the literature as the Green, Jennison and Seheult estimators. Speckman (1988) derived the asymptotic bias and variance of the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$, using locally constant regression with general kernel weights in the smoothing step. Speckman's findings paralleled those of Rice: in the presence of correlation between the linear and non-linear variables in the model, the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$ is $\sqrt{n}$-consistent only if one 'undersmooths' the estimated $m$. Speckman provided a heuristic argument for why the generalized cross-validation method cannot be used to choose the degree of smoothness of the estimated $m$ in practice when $\sqrt{n}$-consistency of the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$ is desired.

Neither Rice nor Speckman proposed methods for 'undersmoothing' the estimated $m$. However, Speckman (1988) introduced a partial-residual flavoured estimator of $\boldsymbol{\beta}$ that does not require 'undersmoothing'. He argued that traditional methods such as generalized cross-validation could be used to select the degree of smoothness of the estimated $m$. Speckman did not address the important issue of whether such data-driven methods would produce amounts of smoothing that yield $\sqrt{n}$-consistent estimators of the linear effects of interest. Sy (1999) established that data-driven methods such as cross-validation and generalized cross-validation do indeed yield $\sqrt{n}$-consistent estimators of these effects, thus paving the way for carrying out valid inferences on these effects, at least for large sample sizes.
Opsomer and Ruppert (1999) proposed estimating $\boldsymbol{\beta}$ and $m$ via the Green, Jennison and Seheult estimators, using locally linear regression with general kernel weights in the smoothing step. They showed that, unless one 'undersmooths' the estimated $m$, their estimator of $\boldsymbol{\beta}$ may not achieve $\sqrt{n}$-consistency. They then suggest how to use the data to choose the appropriate degree of smoothness for accurate estimation of $\boldsymbol{c}^T\boldsymbol{\beta}$, with $\boldsymbol{c}$ known. Opsomer and Ruppert's approach for choosing the right degree of smoothness, referred to as the Empirical Bias Bandwidth Selection (EBBS) method, will be discussed in more detail in Chapter 6. The authors conjectured that EBBS would produce a $\sqrt{n}$-consistent estimator of $\boldsymbol{c}^T\boldsymbol{\beta}$.

1.1.2 Partially Linear Models with Correlated Errors

The independence assumption for the errors associated with a partially linear regression model is not always appropriate in applications. For instance, when the data have been collected sequentially over time, it is likely that present response values will be correlated with past response values. Even in the presence of error correlation, it is desirable to obtain $\sqrt{n}$-consistent estimators for the linear effects in the model.

Engle et al. (1986) were amongst the first authors to consider a partially linear regression model with AR(1) errors. They noted that the correct error correlation structure can be used to transform this model into a model with serially uncorrelated errors, by quasi-differencing all of the data. They proposed estimating the linear effects $\boldsymbol{\beta}$ and the non-linear effect $m$ in the original model by applying the penalized least squares method proposed by Engle et al. (1983) and Wahba (1984) to the quasi-differenced data. Engle et al. (1986) prove that their estimator of $\boldsymbol{\beta}$ is consistent when one estimates $m$ at the 'usual' nonparametric rate of $n^{-4/5}$, but do not show it is $\sqrt{n}$-consistent. They recommend
They recommend 5 choosing both the 'right' degree of smoothness of the estimated m and the autoregressive parameter by minimizing a generalized cross-validation criterion constructed from the quasi-differenced data. This data-driven choice of smoothing may not however yield an accurate estimator of j3, as it is geared at accurate estimation of m. Schick (1996, 1999) considered partially linear regression models with AR(1) errors and ARMA(p,cj) errors, respectively, where p,q > 1. He characterized and constructed effi-cient estimators for the parametric component f3 of these models, assuming appropriate theoretical choice of degree of smoothness for the estimated m. He did not however indicate how one might make this choice in practice. Several authors investigated partially linear models with a-mixing errors. Before review-ing their respective contributions, we provide a definition for the a-mixing concept. For reference, see Ibragimov and Linnik (1971). Definition 1.1.1 A sequence of random variables {et,t = 0,±1,---} is said to be a-mixing if a(k) = sup sup \P(Af\B) - P(A)P(B)\ -* 0 (1.2) as k —> oo; where J7™^ and F^+k are two a-fields generated by {et,t < n} and {et,t > n + k}, respectively. The mixing coefficient a(k) in (1.2) measures the amount of dependence between events involving variables separated by at least k lags. Note that for stationary sequences the supremum over n in (1.2) goes away. Aneiros Perez and Quintela del Rio (2001a) considered a partially linear model with a-mixing, stationary errors. They proposed estimating j3 and m via modifications of the Speckman estimators. Their modifications account for the error correlation structure, assumed to be fully known. The smoothing step involved in estimating /3 and m is based 6 on locally constant regression with Gasser-Miiller weights (Gasser and Miiller, 1984), ad-justed for boundary effects. 
The authors derived the order of the conditional asymptotic bias and variance of the modified Speckman estimator of $\boldsymbol{\beta}$. They found that the conditional asymptotic bias of their estimator of $\boldsymbol{\beta}$ is negligible with respect to its conditional asymptotic variance, shown to have the 'usual' parametric rate of convergence of $1/n$. They concluded they do not need to 'undersmooth' their estimator for $m$ in order to obtain a $\sqrt{n}$-consistent estimator for $\boldsymbol{\beta}$. The fact that the modified Speckman estimator of $\boldsymbol{\beta}$ does not require 'undersmoothing' in the presence of error correlation is not surprising. The estimator inherits this property from the usual Speckman estimator.

Aneiros Perez and Quintela del Rio (2001b) proposed a data-driven modified cross-validation method for choosing the degree of smoothness required for accurate estimation of the regression function $r(\boldsymbol{X}_i, Z_i) = \boldsymbol{X}_i^T\boldsymbol{\beta} + m(Z_i)$ via modified Speckman estimators. It is not clear whether such a method would be suitable for accurate estimation of $\boldsymbol{\beta}$ itself. To address the problem of choosing the degree of smoothness for accurate estimation of $\boldsymbol{\beta}$ via the modified Speckman estimator, Aneiros Perez and Quintela del Rio (2002) developed an asymptotic plug-in method. Their method relies on the more restrictive assumption that the model errors are realizations of an autoregressive process of finite, known order.

You and Chen (2004) considered a partially linear model with $\alpha$-mixing, possibly non-stationary errors. They estimated $\boldsymbol{\beta}$ and $m$ using the usual Speckman estimators, which do not account for error correlation. They then applied a block external bootstrap approach to approximate the distribution of the usual Speckman estimator of $\boldsymbol{\beta}$ and provide a consistent estimator of its covariance matrix. Using this information, they constructed a large-sample confidence interval procedure for estimating $\boldsymbol{\beta}$.
Based on a simulation study, the authors note that the block size seems to have a strong influence on the finite-sample performance of their procedure. However, they do not indicate how one might choose the block size in practice. In the simulation study, the smoothing parameter of the usual Speckman estimator of $\boldsymbol{\beta}$ was selected via cross-validation, modified for correlated errors. This method is appropriate for accurate estimation of $m$ but may not be suitable for accurate estimation of $\boldsymbol{\beta}$.

You, Zhou and Chen (2005) considered a partially linear model with errors assumed to follow a moving average process of infinite order. They proposed a jackknife estimator for $\boldsymbol{\beta}$, which they obtained from a usual Speckman estimator. They showed their estimator to be asymptotically equivalent to the usual Speckman estimator, and proposed a method for estimating its asymptotic variance. They also constructed confidence intervals and tests of hypotheses for $\boldsymbol{\beta}$ based on the jackknife estimator and its estimated variance. In their simulation study, these authors find that confidence interval estimation based on their jackknife estimator has better finite-sample coverage properties than that based on the usual Speckman estimator, even though the latter uses the information on the error structure, while the former does not. In this study, the smoothing was performed with different nearest neighbor smoothing parameter values and the results were shown to be insensitive to the choice of this parameter. This may not always be the case for contexts that are different from that considered by these authors.

As we already mentioned, partially linear regression models with correlated errors can be used for analyzing longitudinal data, that is, data obtained by measuring each of several study units on multiple occasions over time. Longitudinal data are naturally correlated, as the measurements taken on the same study unit are correlated.
In order to estimate the linear effects $\boldsymbol{\beta}$ and the non-linear effect $m$ in such models, Moyeed and Diggle (1994) modified the Green, Jennison and Seheult and the Speckman estimators to account for the longitudinal data structure and for the error correlation, assumed to be known. Their smoothing step used local constant Nadaraya-Watson weights (Nadaraya, 1964 and Watson, 1964). They derived the order of the conditional asymptotic bias and variance of their estimators of $\boldsymbol{\beta}$, obtaining asymptotic constants only for the variance of these estimators. Their results are valid under the assumption that the number of study units goes to infinity and the number of occasions on which each study unit is being measured is kept constant. Note that Moyeed and Diggle did not treat $m$ as a nuisance. To choose the degree of smoothness of the estimated $m$, these authors used a leave-one-subject-out cross-validation method. This method is geared towards accurate estimation of $m$ and may not be appropriate for accurate estimation of $\boldsymbol{\beta}$.

None of the authors considered in this section looked simultaneously at how to choose the right degree of smoothing for accurate estimation of the linear effects and how to construct valid standard errors for the estimated linear effects. To do both requires accounting for the correlation structure of the model errors.

1.2 Thesis Objectives

Throughout this thesis, we will consider only partially linear models of the form (1.1) in which the non-linear effect $m$ is treated as a nuisance. In contrast to the 'usual' view in regression models, we will think of the linear covariates as being random but consider the $Z_i$'s to be fixed. The reason for this is that we are mainly interested in applications for which the $Z_i$'s are consecutive time points (e.g. days, weeks, years). The results in this thesis can be easily modified to account for the case when the $Z_i$'s are random instead of fixed.
However, some expressions need to be re-defined to account for the randomness of the $Z_i$'s. For instance, see the end of Sections 4.1 and 4.2.

In this thesis, we will allow the linear covariates to be mutually correlated and assume they are related to the non-linear covariates via a non-parametric regression relationship. Most importantly, we will assume that the model errors are serially correlated. Within this framework, we will concentrate on developing formal methods for carrying out valid inferences on those linear effects in the model which are of main interest. This entails the following:

1. defining sensible estimators for the linear effects in the model, as well as for the nuisance non-linear effect;
2. deriving the asymptotic bias and variance of the proposed estimators of the linear effects;
3. developing methods for choosing the right degree of smoothness of the estimated non-linear effect in order to accurately estimate the linear effects of interest;
4. developing methods for estimating the correlation structure of the model errors for inference and smoothing;
5. developing methods for assessing the magnitude and statistical significance of the linear effects of interest;
6. investigating the performance of the proposed inferential methods via Monte Carlo simulation studies;
7. using the inferential methods developed in this thesis to answer specific questions related to the impact of air pollution on mortality in Mexico City during 1994-1996, after adjusting for weather patterns and temporal trends.

We conclude this chapter with an overview of the thesis which indicates where and how the above objectives are addressed.

In Chapter 2, we provide a formal definition of the partially linear model with correlated errors of interest in this thesis. We also introduce the notation and assumptions required for establishing the theoretical results in subsequent chapters.
In Chapter 3, we define the following types of estimators for $\boldsymbol{\beta}$ and $m$: (i) local linear backfitting estimators, (ii) modified local linear backfitting estimators, and (iii) estimated modified local linear backfitting estimators.

In Chapter 4, we derive asymptotic approximations for the exact conditional bias and variance of the local linear backfitting estimator of $\boldsymbol{\beta}$. Based on these results we conclude that, in general, the local linear backfitting estimator of $\boldsymbol{\beta}$ is not $\sqrt{n}$-consistent. We argue that the estimator can achieve $\sqrt{n}$-consistency provided we 'undersmooth' the corresponding local linear backfitting estimator of $m$.

In Chapter 5, we replicate the results in Chapter 4 for the modified local linear backfitting estimator of $\boldsymbol{\beta}$. We also provide sufficient conditions under which the estimated modified local linear backfitting estimator of $\boldsymbol{\beta}$ is asymptotically 'close' to its modified counterpart.

In Chapter 6, we develop three data-driven methods for choosing the degree of smoothness of the backfitting estimators of $m$ defined in this thesis in order to accurately estimate $\boldsymbol{\beta}$. Two of these methods are modifications of the Empirical Bias Bandwidth Selection (EBBS) method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. All methods account for error correlation. We suspect that these methods 'undersmooth' the estimated $m$ because they attempt to estimate the amount of smoothing that is optimal for estimating $\boldsymbol{\beta}$, not for estimating $m$. Our theoretical results suggest that, in general, the optimal amount of smoothing for estimating $\boldsymbol{\beta}$ is smaller than the optimal amount of smoothing for estimating $m$. In Chapter 6, we also introduce methods for estimating the correlation structure of the model errors needed to choose the amount of smoothing of the backfitting estimators of $\boldsymbol{\beta}$ and to carry out inferences on $\boldsymbol{\beta}$.
These methods rely on a modified cross-validation criterion similar to that proposed by Aneiros Perez and Quintela del Rio (2001b).

In Chapter 7, we develop three kinds of confidence intervals and tests of hypotheses for assessing the magnitude and significance of a linear combination $\boldsymbol{c}^T\boldsymbol{\beta}$ of the linear effects in the model: standard, bias-adjusted and standard-error adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature of partially linear models.

In Chapter 8, we report the results of a Monte Carlo simulation study. In this study, we investigated the finite sample properties of the usual and estimated modified local linear backfitting estimators of $\boldsymbol{c}^T\boldsymbol{\beta}$ against those of the usual Speckman estimator. We chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6. By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors and for boundary effects. The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the quality of the inferences based on the usual local linear backfitting estimator was superior, and that this estimator should be computed with one of our modifications of EBBS or a non-asymptotic plug-in choice of smoothing. Even though the quality of the inferences based on the usual Speckman estimator was reasonable for most simulation settings, it was not as good as that of the inferences based on the usual local linear backfitting estimator. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings.
In Chapter 9, we use the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggests that there is no conclusive proof that PM10 had a significant short-term effect on log mortality.

In Chapter 10, we summarize the main contributions of this thesis and suggest possible extensions to our work.

Chapter 2
A Partially Linear Model with Correlated Errors

In Section 2.1 of this chapter, we provide a formal definition of the partially linear model of interest in this thesis. In Section 2.2, we introduce assumptions that we use to study the asymptotic behavior of our proposed estimators. In Section 2.3, we introduce some useful notation. In Section 2.4, we give several linear algebra definitions and results which will be utilized throughout this thesis. The chapter concludes with an Appendix which contains a useful theoretical result.

2.1 The Model

Given the data $(Y_i, X_{ij}, Z_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, the specific form of the partially linear model considered in this thesis is:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{m} + \boldsymbol{\epsilon}, \tag{2.1}$$

where $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is the vector of responses, $\mathbf{X}$ is the design matrix for the parametric part of the model (to be defined shortly), $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the vector of unknown linear effects, $\mathbf{m} = (m(Z_1), \ldots, m(Z_n))^T$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^T$ is the vector of model errors. Here,

$$\mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}, \tag{2.2}$$

where $X_{i1}, \ldots, X_{ip}$ are measurements on $p$ variables $X_1, \ldots, X_p$, the $Z_i$'s are fixed design points on $[0, 1]$ following a design density $f(\cdot)$ (see condition (A3) in Section 2.2 for the exact definition), and $m(\cdot)$ is a real-valued, unknown, smooth function defined on $[0, 1]$.

Note that, unless we impose a restriction on $m(\cdot)$, model (2.1) is unidentifiable due to the presence of the intercept $\beta_0$ in the model. For instance, $\beta_0 + m(\cdot) = 0 + (m(\cdot) + \beta_0)$.
To ensure identifiability, we assume that $m(\cdot)$ satisfies the integral restriction:

$$\int_0^1 m(z) f(z)\, dz = 0. \tag{2.3}$$

In practice, we replace (2.3) by the summation restriction:

$$\mathbf{1}^T \mathbf{m} = 0, \tag{2.4}$$

where the symbol $\mathbf{1}$ denotes an $n \times 1$ vector of 1's.

One could think of the smooth function $m(\cdot)$ as being a transformation of the fixed design points $Z_i$, $i = 1, \ldots, n$, that ensures that the partially linear model (2.1) is an adequate description of the variability in the $Y_i$'s. Alternatively, one could think of the function $m(\cdot)$ as representing the confounding effect of a random variable having density $f(\cdot)$ on the linear effects $\beta_1, \ldots, \beta_p$.

We assume that the errors $\epsilon_i$ in model (2.1) are such that $E(\epsilon_i) = 0$, $Var(\epsilon_i) = \sigma_\epsilon^2$ and $Corr(\epsilon_i, \epsilon_j) = \Psi_{i,j}$ for $i \neq j$, where $\sigma_\epsilon > 0$ and $\boldsymbol{\Psi} = (\Psi_{i,j})$ is the $n \times n$ error correlation matrix. Note that $\boldsymbol{\Psi}$ is not necessarily equal to the $n \times n$ identity matrix $\mathbf{I}$. In practice, both the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\boldsymbol{\Psi}$ are typically unknown and need to be estimated from the data.

An alternative formulation for the partially linear model (2.1) can be obtained by removing the constraint (2.3), setting $\mathbf{m}^* = \beta_0 \mathbf{1} + \mathbf{m}$ and re-writing the model as:

$$\mathbf{Y} = \mathbf{X}^* \boldsymbol{\beta}^* + \mathbf{m}^* + \boldsymbol{\epsilon}, \tag{2.5}$$

where $\mathbf{X}^*$ is an $n \times p$ matrix defined as:

$$\mathbf{X}^* = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix} \tag{2.6}$$

and $\boldsymbol{\beta}^* = (\beta_1, \ldots, \beta_p)^T$. The model formulation in (2.5) is frequently encountered in the partially linear model literature and does not require that we impose any identifiability conditions on the function $m^*(z) = \beta_0 + m(z)$, $z \in [0, 1]$. Indeed, the absence of an intercept in model (2.5) ensures that $m^*(\cdot)$ is identifiable. In this thesis, however, we prefer to use the formulation in (2.1), as it makes it easier to understand that model (2.1) is a generalization of a linear regression model and a particular case of an additive model, which typically do contain an intercept.
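As an illustration of the model just defined, the following minimal sketch simulates one data set from model (2.1) with the summation restriction (2.4) imposed. Everything specific in it is an assumption made for the example and is not taken from the thesis: the function name `simulate_plm`, the uniform design $Z_i = i/(n+1)$, the sine shape chosen for $m$, the polynomial dependence of the linear covariates on $Z_i$, and the stationary AR(1) error process.

```python
import numpy as np

def simulate_plm(n=200, beta=(1.0, 0.5), rho=0.4, sigma_u=1.0, seed=0):
    """Simulate (Y, X, Z, m) from Y = X beta + m + eps (illustrative only)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)   # (beta_0, beta_1, ..., beta_p)
    p = beta.size - 1                      # number of linear covariates (p >= 1)

    # Assumed regular design with uniform density on [0, 1]: Z_i = i / (n + 1).
    Z = np.arange(1, n + 1) / (n + 1)

    # Hypothetical smooth effect, centred so that 1^T m = 0 as in (2.4).
    m = np.sin(2 * np.pi * Z)
    m = m - m.mean()

    # Linear covariates made to depend on Z through assumed smooth functions
    # z^j, plus independent noise; the first column of X is the intercept.
    g = np.column_stack([Z ** j for j in range(1, p + 1)])
    X = np.column_stack([np.ones(n), g + rng.normal(scale=0.5, size=(n, p))])

    # Assumed stationary AR(1) errors: eps_t = rho * eps_{t-1} + u_t, |rho| < 1.
    u = rng.normal(scale=sigma_u, size=n)
    eps = np.empty(n)
    eps[0] = u[0] / np.sqrt(1.0 - rho ** 2)  # draw eps_0 from the stationary law
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]

    Y = X @ beta + m + eps
    return Y, X, Z, m
```

Centring `m` after evaluating it on the design points is what enforces (2.4) exactly; without that step the intercept and the level of the smooth effect could not be separated in the simulated data.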
2.2 Assumptions

The asymptotic results derived in Chapters 4 and 5 allow the linear variables in model (2.1) to be correlated with the non-linear variable via the following condition.

(A0) The covariate values $X_{ij}$ and the non-random design points $Z_i$ are related via the nonparametric regression model:

$$X_{ij} = g_j(Z_i) + \eta_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p, \tag{2.7}$$

where

(i) the $g_j(\cdot)$'s are smooth, unknown functions having three continuous derivatives;

(ii) the $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, are independent, identically distributed unobserved random vectors with mean zero and variance-covariance matrix $\Sigma = (\Sigma_{ij})$.

We impose two different sets of assumptions on the errors associated with model (2.1) for studying the asymptotic behaviour of two different estimators of $\beta$. In Section 3.1.1 of Chapter 3 we define the so-called local linear backfitting estimator of $\beta$. The definition of this estimator does not account for the correlation structure of the model errors. In Chapter 4, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition.

(A1) (i) The model errors $\epsilon_i$, $i = 1, \ldots, n$, represent $n$ consecutive realizations from a general covariance-stationary process $\{\epsilon_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and correlation coefficients:

$$\rho_k = \frac{E(\epsilon_t \epsilon_{t-k})}{\sigma_\epsilon^2} = \frac{E(\epsilon_s \epsilon_{s+k})}{\sigma_\epsilon^2}, \quad k = 1, 2, 3, \ldots, \tag{2.8}$$

where $t, s = 0, \pm 1, \pm 2, \ldots$.

(ii) The error correlation matrix $\Psi$ is assumed to be symmetric, positive-definite and to have a bounded spectral norm, that is $\|\Psi\|_S = O(1)$ as $n \to \infty$. (For a definition of the spectral norm of a matrix see Section 2.4.)

(iii) Let $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, be as in (A0)-(ii). Then there exists a $(p+1) \times (p+1)$ matrix $\Phi^{(0)}$ such that the error correlation matrix $\Psi$ satisfies:

$$\frac{1}{n+1}\, \eta^T \Psi \eta = \Phi^{(0)} + o_P(1) \tag{2.9}$$

as $n \to \infty$, where

$$\eta = \begin{pmatrix} 0 & \eta_{11} & \cdots & \eta_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \eta_{n1} & \cdots & \eta_{np} \end{pmatrix}. \tag{2.10}$$

(iv) $\epsilon_j$ is independent of $(\eta_{i1}, \ldots, \eta_{ip})^T$ for any $i, j = 1, \ldots, n$.

In Section 3.1.2 of Chapter 3 we define the so-called modified local linear backfitting estimator of the vector of linear effects $\beta$ in model (2.1). The definition of this estimator assumes full knowledge of the correlation matrix of the model errors. In Chapter 5, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition:

(A2) (i) The $\epsilon_i$'s represent $n$ consecutive realizations from a covariance-stationary autoregressive process of finite order $R$ having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and satisfying:

$$\epsilon_t = \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \cdots + \phi_R \epsilon_{t-R} + u_t, \quad t = 0, \pm 1, \pm 2, \ldots, \tag{2.11}$$

with $\{u_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, being independent, identically distributed random variables having mean 0 and finite, non-zero variance $\sigma_u^2$.

(ii) $\epsilon_j$ is independent of $(\eta_{i1}, \ldots, \eta_{ip})^T$ for any $i, j = 1, \ldots, n$, where $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, are as in (A0)-(ii).

According to Comments 2.2.1 - 2.2.3 below, if the errors satisfy condition (A2), they also satisfy condition (A1).

Comment 2.2.1 If the errors $\epsilon_i$, $i = 1, \ldots, n$, satisfy condition (A2), then one can easily see that they also satisfy condition (A1)-(i). Moreover, one can show that their correlation matrix $\Psi = (\Psi_{ij})$ is given by

$$\Psi_{ii} = 1, \qquad \Psi_{ij} = \rho(|i - j|) \equiv \rho_{|i-j|}, \quad i \neq j,$$

where $\rho$ is a correlation function and the $\rho_k$'s satisfy the Yule-Walker equations:

$$\rho_k = \phi_1 \rho_{k-1} + \cdots + \phi_R \rho_{k-R}, \quad \text{for } k > 0.$$

The general solution of these difference equations is:

$$\rho_k = \psi_1 \lambda_1^{k} + \psi_2 \lambda_2^{k} + \cdots + \psi_R \lambda_R^{k}, \quad \text{for } k \geq 0,$$

where the $\lambda_i$, $i = 1, \ldots, R$, are the roots of the polynomial equation:

$$z^R - \phi_1 z^{R-1} - \cdots - \phi_R = 0.$$

Initial conditions for determining $\psi_1, \ldots, \psi_R$ can be obtained by using $\rho_0 = 1$ together with the first $R - 1$ Yule-Walker equations. For more details, see Chatfield (1989, page 38).
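The correlation structure described in Comment 2.2.1 can be illustrated numerically. The following is a minimal sketch, not part of the original development: it assumes NumPy, and the function names `ar_autocorrelations` and `ar_correlation_matrix` are hypothetical. For convenience it solves the first $R$ Yule-Walker equations for $\rho_1, \ldots, \rho_R$ (with $\rho_0 = 1$) and then obtains higher lags from the recursion, rather than going through the roots $\lambda_i$.

```python
import numpy as np

def ar_autocorrelations(phi, max_lag):
    """Return rho_0, ..., rho_max_lag for a stationary AR(R) process with coefficients phi."""
    phi = np.asarray(phi, dtype=float)
    R = len(phi)
    # Yule-Walker equations for k = 1..R: rho_k = sum_j phi_j * rho_{|k-j|}, with rho_0 = 1.
    A = np.eye(R)
    b = np.zeros(R)
    for k in range(1, R + 1):
        for j in range(1, R + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] += phi[j - 1]           # rho_0 = 1 moves to the right-hand side
            else:
                A[k - 1, lag - 1] -= phi[j - 1]
    rho = np.empty(max(max_lag, R) + 1)
    rho[0] = 1.0
    rho[1:R + 1] = np.linalg.solve(A, b)
    for k in range(R + 1, len(rho)):             # higher lags follow from the recursion
        rho[k] = sum(phi[j] * rho[k - 1 - j] for j in range(R))
    return rho[:max_lag + 1]

def ar_correlation_matrix(phi, n):
    """n x n matrix with entries Psi_ij = rho_{|i-j|}."""
    rho = ar_autocorrelations(phi, n - 1)
    idx = np.arange(n)
    return rho[np.abs(idx[:, None] - idx[None, :])]

rho_ar1 = ar_autocorrelations([0.5], 4)          # AR(1): rho_k = 0.5**k
rho_ar2 = ar_autocorrelations([0.5, 0.3], 4)     # AR(2): rho_1 = phi_1 / (1 - phi_2)
Psi = ar_correlation_matrix([0.5, 0.3], 50)
```

For the illustrative coefficients used here (chosen to give a stationary process), the resulting matrix is symmetric with unit diagonal and positive eigenvalues, in line with condition (A1)-(ii).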
Comment 2.2.2 If the errors $\epsilon_i$, $i = 1, \ldots, n$, satisfy condition (A2), then their correlation matrix $\Psi = (\Psi_{ij})$ satisfies condition (A1)-(ii) by Comment 2.2.1 and result (5.34) of Lemma 5.7.2 (Appendix, Chapter 5). In other words, $\Psi$ is symmetric, positive-definite and has finite spectral norm.

Comment 2.2.3 If the errors $\epsilon_i$, $i = 1, \ldots, n$, associated with model (2.1) satisfy condition (A2) then, by Lemma 2.5.1 in the Appendix of this chapter, $\Psi$ satisfies (2.9) of condition (A1)-(iii), with $\Phi^{(0)} = \Sigma^{(0)}$ and $\Sigma^{(0)}$ defined as in (2.15).

Comment 2.2.4 Due to its parametric nature, assumption (A2) allows us to find an explicit expression for the inverse of the error correlation matrix $\Psi$, making the derivation of the asymptotic results concerning the modified local linear estimator of $\beta$ easier. We have not been able to modify our proof of these results to handle the more general assumption (A1), since finding an explicit expression for $\Psi^{-1}$ under (A1) may not be possible.

The asymptotic results derived in Chapters 4 and 5 assume $h$, the half-width of the window of smoothing involved in the definition of the local linear backfitting estimator and the modified local linear estimator of $\beta$, to be deterministic and to satisfy

$$h \to 0 \tag{2.12}$$

and

$$n h^3 \to \infty \tag{2.13}$$

as $n \to \infty$. These asymptotic results also rely on the conditions below.

(A3) The $Z_i$'s are non-random and follow a regular design, i.e. there exists a continuous strictly positive density $f(\cdot)$ on $[0,1]$ with:

$$\int_0^{Z_i} f(z)\,dz = \frac{i}{n+1}, \quad i = 1, \ldots, n.$$

Moreover, $f(\cdot)$ admits two continuous derivatives.

(A4) $m(\cdot)$ is a smooth function with 3 continuous derivatives.

(A5) $K(\cdot)$, the kernel function used in (3.7) and (3.8), is a probability density function symmetric about 0 and Lipschitz continuous, with compact support $[-1,1]$.

2.3 Notation

Let $Z_i$, $i = 1, \ldots, n$, be design points satisfying the design condition (A3) and let $g_1(\cdot), \ldots, g_p(\cdot)$ be functions satisfying the smoothness assumptions in condition (A0)-(i).
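We define the $n \times (p+1)$ matrix $G$ as:

$$G = \begin{pmatrix} 1 & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ 1 & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix} \equiv \begin{pmatrix} g_0(Z_1) & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ g_0(Z_n) & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix}. \tag{2.14}$$

Furthermore, let the $n \times (p+1)$ matrix $\eta$ be defined as in (2.10) (condition (A1)-(iii)). In light of condition (A0)-(ii), the transposed rows of $\eta$ are independent, identically distributed degenerate random vectors with mean zero and variance-covariance matrix $\Sigma^{(0)}$, where:

$$\Sigma^{(0)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & \Sigma_{11} & \cdots & \Sigma_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \Sigma_{p1} & \cdots & \Sigma_{pp} \end{pmatrix}. \tag{2.15}$$

Using equation (2.7) of condition (A0) ($X_{ij} = g_j(Z_i) + \eta_{ij}$) together with the definitions of $G$ and $\eta$ in equations (2.14) and (2.10), we can express the design matrix $X$ in (2.2) as:

$$X = G + \eta. \tag{2.16}$$

Let $K(\cdot)$ be a kernel function satisfying condition (A5); if $z \in [0,1]$ and $h \in [0,1/2]$, define the following quantity:

$$\nu_l(K, z, h) = \int_{-z/h}^{(1-z)/h} s^l K(s)\,ds, \quad l = 0, 1, 2, 3. \tag{2.17}$$

Note that, if $z \in [h, 1-h]$, i.e. $z$ is an 'interior' point of the interval $[0,1]$, then $\nu_l(K, z, h) = \int_{-1}^{1} s^l K(s)\,ds \equiv \nu_l(K)$, as $[-z/h, (1-z)/h] \supseteq [-1,1]$ and $K(\cdot)$ has compact support on $[-1,1]$ by condition (A5).

Now, for $g_0(\cdot), \ldots, g_p(\cdot)$ as above and $f(\cdot)$ a design density, we let:

$$\int_0^1 g(z) f(z)\,dz = \left( \int_0^1 g_0(z) f(z)\,dz, \ldots, \int_0^1 g_p(z) f(z)\,dz \right)^T, \tag{2.18}$$

and

$$\int_0^1 g(z) m''(z) f(z)\,dz = \left( \int_0^1 g_0(z) m''(z) f(z)\,dz, \ldots, \int_0^1 g_p(z) m''(z) f(z)\,dz \right)^T. \tag{2.19}$$

We also let $\int_0^1 g(z)^T f(z)\,dz = \left( \int_0^1 g(z) f(z)\,dz \right)^T$ and define the $(p+1) \times (p+1)$ matrix $V$ as:

$$V = \Sigma^{(0)} + \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz, \tag{2.20}$$

with $\Sigma^{(0)}$ as in (2.15). We also define the $(p+1) \times 1$ vector $W$ as:

$$W = \frac{\nu_2(K)}{2} \int_0^1 g(z) m''(z) f(z)\,dz - \frac{\nu_2(K)}{2} \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 m''(z) f(z)\,dz. \tag{2.21}$$

Finally, define the $(p+1) \times (p+1)$ matrix $V_\Psi$ as:

$$V_\Psi = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left( 1 + \sum_{k=1}^{R} \phi_k^2 \right) \Sigma^{(0)} + \frac{\sigma_\epsilon^2}{\sigma_u^2} \left( 1 - \sum_{k=1}^{R} \phi_k \right)^2 \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz. \tag{2.22}$$

2.4 Linear Algebra - Useful Definitions and Results

In this section, we first provide an overview of the vector and matrix norm definitions and properties used throughout the remainder of this thesis.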
Let $A = (A_{ij})$ be an arbitrary $m \times n$ matrix and $B = (B_{kl})$ be an $n \times q$ matrix, both having real elements. Also, let $v = (v_1, \ldots, v_n)^T$ be an arbitrary $n \times 1$ vector with real elements. The spectral norm of the matrix $A$ is defined as:

$$\|A\|_S = \max_{\|v\|_2 \neq 0} \frac{\|Av\|_2}{\|v\|_2},$$

with $\|\cdot\|_2$ being the Euclidean norm of a vector, that is $\|v\|_2^2 = \sum_{i=1}^{n} v_i^2$. Furthermore, the Frobenius norm of $A$ is defined as:

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}.$$

It is well-known that $\|A\|_S \leq \|A\|_F$. Clearly, if $A$ is a column vector (that is, $n = 1$), then $\|A\|_S = \|A\|_2$. In particular, if $A$ is a scalar (i.e., $m = n = 1$), then $\|A\|_S$ equals the absolute value of this scalar. It is also known that $\|A \cdot B\|_F \leq \|A\|_F \cdot \|B\|_F$.

We conclude this section by reviewing the definitions of random bilinear and quadratic forms and providing formulas for computing the expected value of such forms.

Suppose $A = (A_{ij})$ is an $n \times n$ matrix with real-valued elements, not necessarily symmetric. Similarly, suppose that $B = (B_{ij})$ is an $n \times m$ matrix with real-valued elements. Let $u$ be an arbitrary $n \times 1$ random vector having real-valued elements. Also, let $v$ be an arbitrary $m \times 1$ random vector with real-valued elements. A bilinear form in $u$ and $v$ with regulator matrix $B$ is defined as:

$$B(u, v) = u^T B v = \sum_{i=1}^{n} \sum_{j=1}^{m} B_{ij} u_i v_j.$$

Note that $B(u, v)$ is random, and its expected value can be computed using the following formula:

$$E(B(u, v)) = \mathrm{trace}\left(B\, Cov(u, v)^T\right) + E(u)^T B\, E(v). \tag{2.23}$$

In particular, a quadratic form in $u$ with regulator matrix $A$ is defined as:

$$Q(u) = u^T A u = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} u_i u_j,$$

with (2.23) reducing to:

$$E(Q(u)) = \mathrm{trace}\left(A\, Var(u)\right) + E(u)^T A\, E(u). \tag{2.24}$$

2.5 Appendix

The following result helps establish that condition (A2) is a special case of condition (A1).

Lemma 2.5.1 Let $\eta$ be defined as in equation (2.10) of condition (A1) and let $\Psi$ be defined as in Comment 2.2.1. Then, as $n \to \infty$,

$$\frac{1}{n+1}\, \eta^T \Psi \eta = \Sigma^{(0)} + o_P(1), \tag{2.25}$$

where $\Sigma^{(0)}$ is defined as in (2.15).
Proof: Let $\eta_l$ denote the $l$th column of $\eta$ and consider $\eta_l^T \Psi \eta_t$, where $l, t = 1, \ldots, p+1$. When $l = 1$ or $t = 1$, this is 0. For $l, t = 2, \ldots, p+1$, we have:

$$\begin{aligned}
\frac{1}{n}\, \eta_l^T \Psi \eta_t &= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{i,l} \Psi_{ij} \eta_{j,t} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{i,l}\, \rho(|i-j|)\, \eta_{j,t} \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_{i,l} \eta_{i,t} + \sum_{k=1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_{i,l} \eta_{i,t} + \sum_{k=1}^{k_0} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \\
&\quad + \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right),
\end{aligned} \tag{2.26}$$

where $[n/2]$ denotes the integer part of $n/2$ and $k_0$ is chosen independently of $n$ in the following fashion. Since $\sum_{k=1}^{\infty} |\rho(|k|)| < \infty$ (see Lemma 5.7.2 for a justification of this result), for any given $\epsilon > 0$ we can choose $k_0$ such that:

$$\sum_{k=k_0+1}^{\infty} |\rho(|k|)| < \frac{\epsilon^2}{C}$$

for some large constant $C$.

In light of condition (A0)-(ii), the first term in (2.26) converges to $\Sigma^{(0)}_{l,t}$ by the Weak Law of Large Numbers applied to the independent random variables $\eta_{i,l} \eta_{i,t}$, $i = 1, \ldots, n$.

The second term in (2.26) converges to zero in probability as $n \to \infty$ by the following argument. The random variables $\eta_{i,l} \eta_{i+k,t}$, $i = 1, \ldots, n-k$, are $k$-dependent and identically distributed by condition (A0)-(ii). The Weak Law of Large Numbers for $k$-dependent random variables implies that $\sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} / (n-k)$ converges to $E(\eta_{1,l} \eta_{1+k,t}) = E(\eta_{1,l}) E(\eta_{1+k,t}) = 0$ in probability as $n \to \infty$. A similar argument yields that the quantity $\sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} / n$ converges to 0 in probability as $n \to \infty$.

Now, consider the third term in (2.26).
By Markov's Inequality and condition (A0)-(ii), for $n$ large enough, we have:

$$\begin{aligned}
P&\left( \left| \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \right| > \epsilon \right) \\
&\leq \frac{1}{\epsilon}\, E \left| \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \right| \\
&\leq \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \left( \frac{1}{n} \sum_{i=1}^{n-k} E|\eta_{i,l} \eta_{i+k,t}| + \frac{1}{n} \sum_{i=k+1}^{n} E|\eta_{i,l} \eta_{i-k,t}| \right) \\
&= \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \left( 2\, \frac{n-k}{n}\, E|\eta_{1,l} \eta_{1+k,t}| \right) \\
&\leq \frac{C}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \leq \frac{C}{\epsilon} \sum_{k=k_0+1}^{\infty} |\rho(|k|)| < \frac{C}{\epsilon} \cdot \frac{\epsilon^2}{C} = \epsilon.
\end{aligned}$$

In conclusion, the third term in (2.26) converges to zero in probability as $n \to \infty$. Combining the previous results yields (2.25).

Chapter 3

Estimation in a Partially Linear Model with Correlated Errors

Obtaining sensible point estimators for the linear effects in a partially linear model with correlated errors is the first important step towards carrying out valid inferences on these effects. Such inferences include conducting hypothesis tests for assessing the statistical significance of the linear effects of interest, and constructing confidence intervals for these effects.

As we have seen in Sections 1.1.1-1.1.2, several methods for estimating the linear and non-linear effects in a partially linear model have been proposed in the literature, both in the presence and absence of correlation amongst model errors. In principle, any of these methods could be used to obtain point estimators for the linear effects in a partially linear model with correlated errors. However, those methods which ignore the correlation structure of the model errors might produce less efficient estimators than the methods which account explicitly for this correlation structure. It is still of interest to consider methods which do not account for the presence of correlation amongst the model errors when estimating the linear effects in the model. Indeed, these methods could yield valid testing procedures based on the inefficient point estimators they produce and the standard errors associated with these estimators.
In the present chapter we show that many of the estimation methods used in the literature for a partially linear model with known correlation structure can be conveniently viewed as particular cases of a generic Backfitting Algorithm. We also show how this generic Backfitting Algorithm can be modified for those instances when the error correlation structure is unknown and must be estimated from the data.

This chapter is organized as follows. In Section 3.1, we discuss the generic Backfitting Algorithm for estimating the linear and non-linear effects in model (2.1) when the error correlation structure is known. In particular, in Sections 3.1.1 and 3.1.2 we discuss the usual and modified generic backfitting estimators of these effects. In Section 3.1.3, we discuss appropriate modifications of these estimators that can be used when the error correlation structure is unknown. In Section 3.1.4, we discuss several generic backfitting estimators which are versions of the estimators introduced by Speckman (1988).

3.1 Generic Backfitting Estimators

In this section, we provide a formal definition for the generic backfitting estimators of the unknowns $\beta$ and $\mathbf{m}$ in model (2.1). We also define and discuss various particular types of these estimators, clearly indicating which of these types we consider in this thesis.

We start by introducing some notation. Let $\Omega$ be an $n \times n$ matrix of weights such that the $(p+1) \times (p+1)$ matrix $X^T \Omega X$ is invertible. Also, let $\mathbb{S}_h$ be a smoother matrix depending on a smoothing parameter $h$ which controls the width of the smoothing window. For example, the local linear smoother matrix is given in (3.6)-(3.8). Next, let $\mathbb{S}_h^c$ be the centered version of $\mathbb{S}_h$, obtained as:

$$\mathbb{S}_h^c = (I - \mathbf{1}\mathbf{1}^T / n)\, \mathbb{S}_h. \tag{3.1}$$

Formal definitions for $\Omega$ and $\mathbb{S}_h^c$ will be provided shortly. For now, we note that the matrix of weights $\Omega$ may possibly depend on the known error correlation matrix $\Psi$ and on the smoother matrix $\mathbb{S}_h^c$.
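The constrained generic backfitting estimators $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ of $\beta$ and $\mathbf{m}$ are defined as the fixed points to the following generic backfitting equations:

$$\hat{\beta}_{\Omega, \mathbb{S}_h^c} = (X^T \Omega X)^{-1} X^T \Omega\, (Y - \hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}) \tag{3.2}$$

$$\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c} = \mathbb{S}_h^c\, (Y - X \hat{\beta}_{\Omega, \mathbb{S}_h^c}). \tag{3.3}$$

Use of the matrix $\mathbb{S}_h^c$ instead of $\mathbb{S}_h$ in equation (3.3) ensures that $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ satisfies the identifiability condition $\mathbf{1}^T \hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c} = 0$.

The motivation behind the generic backfitting equations introduced above is as follows. Given an estimator $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ of the unknown $\mathbf{m}$ in model (2.1), one can construct the vector of partial residuals $Y - \hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$. Regressing these partial residuals on $X$ via weighted least squares yields the generic backfitting estimator $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ in equation (3.2). On the other hand, given an estimator $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ of the unknown $\beta$ in model (2.1), one can construct the vector of partial residuals $Y - X \hat{\beta}_{\Omega, \mathbb{S}_h^c}$. Smoothing these partial residuals on $Z = (Z_1, \ldots, Z_n)^T$ via the smoother matrix $\mathbb{S}_h^c$ yields the generic backfitting estimator $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ in equation (3.3).

In practice, one could solve the generic backfitting equations (3.2)-(3.3) for $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ iteratively by employing a modification of the Backfitting Algorithm of Buja, Hastie and Tibshirani (1989), as follows.

The Generic Backfitting Algorithm

(i) Let $\hat{\beta}^{(0)}$ and $\hat{\mathbf{m}}^{(0)}$ be initial estimators for $\beta$ and $\mathbf{m}$ calculated as follows. We regress $Y$ on the parametric and nonparametric covariates in the model via weighted least squares regression, obtaining:

$$\hat{Y}(x_1, \ldots, x_p, z) = \hat{\gamma}_0 + \hat{\gamma}_1 \cdot x_1 + \cdots + \hat{\gamma}_p \cdot x_p + \hat{\gamma}_{p+1} \cdot (z - \bar{Z}).$$

Here, $\bar{Z} = (Z_1 + \cdots + Z_n)/n$. Note that, if $\bar{\mathbf{Z}} = (\bar{Z}, \ldots, \bar{Z})^T$ is an $n \times 1$ vector, the weighted least squares estimators $\hat{\gamma} = (\hat{\gamma}_0, \hat{\gamma}_1, \ldots, \hat{\gamma}_p)^T$ and $\hat{\gamma}_{p+1}$ above are obtained by minimizing the following criterion with respect to $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_p)^T$ and $\gamma_{p+1}$:

$$\left[ Y - X\gamma - \gamma_{p+1} (Z - \bar{\mathbf{Z}}) \right]^T \Omega \left[ Y - X\gamma - \gamma_{p+1} (Z - \bar{\mathbf{Z}}) \right].$$

We let $\hat{m}^{(0)}(z) = \hat{\gamma}_{p+1} \cdot (z - \bar{Z})$ and $\hat{\mathbf{m}}^{(0)} = (\hat{m}^{(0)}(Z_1), \ldots, \hat{m}^{(0)}(Z_n))^T$. Also, we let $\hat{\beta}^{(0)} = \hat{\gamma}$. Note that $\hat{\mathbf{m}}^{(0)}$ satisfies the identifiability condition (2.4).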
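(ii) Given the estimators $\hat{\beta}^{(l)}$ and $\hat{\mathbf{m}}^{(l)}$, we construct $\hat{\beta}^{(l+1)}$ and $\hat{\mathbf{m}}^{(l+1)}$ as follows:

$$\hat{\beta}^{(l+1)} = (X^T \Omega X)^{-1} X^T \Omega\, (Y - \hat{\mathbf{m}}^{(l)})$$

$$\hat{\mathbf{m}}^{(l+1)} = \mathbb{S}_h^c\, (Y - X \hat{\beta}^{(l)}).$$

Note that $\hat{\mathbf{m}}^{(l+1)}$ satisfies the identifiability condition (2.4), since $\mathbb{S}_h^c = (I - \mathbf{1}\mathbf{1}^T/n)\, \mathbb{S}_h$, for some smoother matrix $\mathbb{S}_h$.

(iii) Repeat (ii) until $\hat{\beta}^{(l)}$ and $\hat{\mathbf{m}}^{(l)}$ do not change much.

If the Generic Backfitting Algorithm converges at the iteration labeled as $l + 1$, say, we set:

$$\hat{\beta}_{\Omega, \mathbb{S}_h^c} = \hat{\beta}^{(l)}, \qquad \hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c} = \hat{\mathbf{m}}^{(l)}.$$

However, we need not iterate to find the generic backfitting estimators $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$. Using the generic backfitting equations (3.2) and (3.3), we can easily derive an explicit expression for the generic backfitting estimator $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$. Simply substitute the expression of $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ given in equation (3.3) into equation (3.2) and solve for $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$:

$$\begin{aligned}
\hat{\beta}_{\Omega, \mathbb{S}_h^c} &= (X^T \Omega X)^{-1} X^T \Omega \left[ Y - \mathbb{S}_h^c (Y - X \hat{\beta}_{\Omega, \mathbb{S}_h^c}) \right] \\
&= (X^T \Omega X)^{-1} X^T \Omega \left[ (I - \mathbb{S}_h^c) Y + \mathbb{S}_h^c X \hat{\beta}_{\Omega, \mathbb{S}_h^c} \right].
\end{aligned}$$

Pre-multiplying both sides of the above equation by $X^T \Omega X$ and rearranging yields

$$X^T \Omega (I - \mathbb{S}_h^c) X\, \hat{\beta}_{\Omega, \mathbb{S}_h^c} = X^T \Omega (I - \mathbb{S}_h^c) Y.$$

Thus, provided the matrix $X^T \Omega (I - \mathbb{S}_h^c) X$ is invertible,

$$\hat{\beta}_{\Omega, \mathbb{S}_h^c} = \left( X^T \Omega (I - \mathbb{S}_h^c) X \right)^{-1} X^T \Omega (I - \mathbb{S}_h^c) Y. \tag{3.4}$$

To obtain the generic backfitting estimator $\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c}$ without iterating, substitute the explicit expression of $\hat{\beta}_{\Omega, \mathbb{S}_h^c}$ obtained above in (3.3) to get:

$$\hat{\mathbf{m}}_{\Omega, \mathbb{S}_h^c} = \mathbb{S}_h^c \left[ I - X \left( X^T \Omega (I - \mathbb{S}_h^c) X \right)^{-1} X^T \Omega (I - \mathbb{S}_h^c) \right] Y. \tag{3.5}$$

Results (3.4) and (3.5) above show that the generic backfitting equations (3.2)-(3.3) have a unique solution as long as the $(p+1) \times (p+1)$ matrix $X^T \Omega (I - \mathbb{S}_h^c) X$ is invertible.

Various specifications for the smoother matrix $\mathbb{S}_h^c$ and the matrix of weights $\Omega$ appearing in the generic backfitting equations (3.2) and (3.3) (or, equivalently, in the explicit equations (3.4) and (3.5)) lead to different types of generic backfitting estimators. In the rest of this section, we discuss several such specifications, together with the particular types of generic backfitting estimators they yield.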
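The equivalence between the iterative scheme and the explicit solution (3.4)-(3.5) can be checked numerically. The sketch below is illustrative only and rests on several assumptions that are not part of the text: NumPy is used, the data are simulated with arbitrary hypothetical choices of $g_1$, $m$ and $\beta$, the smoother is the local linear smoother of (3.6)-(3.8) with the Epanechnikov kernel (3.9), $\Omega = I$, and the iteration starts from $\hat{\mathbf{m}}^{(0)} = 0$ rather than from the weighted least squares fit of step (i). All function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def local_linear_smoother(z, h):
    """Local linear smoother matrix following (3.6)-(3.8)."""
    d = z[:, None] - z[None, :]                  # d[i, j] = Z_i - Z_j
    k = epanechnikov(d / h)
    s1 = (k * d).sum(axis=1, keepdims=True)      # S_{n,1}(Z_i)
    s2 = (k * d**2).sum(axis=1, keepdims=True)   # S_{n,2}(Z_i)
    w = k * (s2 - d * s1)                        # local weights
    return w / w.sum(axis=1, keepdims=True)      # each row sums to one

def center(S):
    """Centered smoother (I - 11^T/n) S, as in (3.1)."""
    n = S.shape[0]
    return S - np.ones((n, 1)) @ (np.ones((1, n)) @ S) / n

def backfit_closed_form(X, Y, Sc, Omega):
    A = X.T @ Omega @ (np.eye(len(Y)) - Sc)
    beta = np.linalg.solve(A @ X, A @ Y)         # explicit expression (3.4)
    return beta, Sc @ (Y - X @ beta)             # equation (3.3) at the fixed point

def backfit_iterative(X, Y, Sc, Omega, max_iter=1000, tol=1e-12):
    P = np.linalg.solve(X.T @ Omega @ X, X.T @ Omega)
    beta, m = P @ Y, np.zeros_like(Y)            # simplified start, not step (i)
    for _ in range(max_iter):
        # Both updates use the previous iterates, as in step (ii).
        beta_new, m_new = P @ (Y - m), Sc @ (Y - X @ beta)
        change = max(np.abs(beta_new - beta).max(), np.abs(m_new - m).max())
        beta, m = beta_new, m_new
        if change < tol:
            break
    return beta, m

rng = np.random.default_rng(0)
n = 200
z = np.arange(1, n + 1) / (n + 1)                # regular design with uniform density
X = np.column_stack([np.ones(n), z + rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + np.sin(2 * np.pi * z) + 0.3 * rng.normal(size=n)
Sc = center(local_linear_smoother(z, h=0.1))

beta_cf, m_cf = backfit_closed_form(X, Y, Sc, np.eye(n))
beta_it, m_it = backfit_iterative(X, Y, Sc, np.eye(n))
```

In this setting the two routes should agree up to numerical tolerance, and the fitted $\hat{\mathbf{m}}$ sums to zero because of the centering in (3.1). Whether the iteration converges in general depends on the data and the smoother; the explicit form only requires $X^T \Omega (I - \mathbb{S}_h^c) X$ to be invertible.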
Note that, if one wishes to estimate the unknowns $\beta^*$ and $\mathbf{m}^*$ in the intercept-free model (2.5), one should carry out an unconstrained backfitting algorithm, using $X^*$ instead of $X$, and $\mathbb{S}_h$ instead of $\mathbb{S}_h^c$ in (3.2)-(3.3).

3.1.1 Usual Generic Backfitting Estimators

The usual generic backfitting estimators are obtained from (3.2)-(3.3) by taking $\Omega = I$. Clearly, these estimators are defined by ignoring the correlation structure of the model errors.

In this thesis, we consider a particular type of usual backfitting estimators, obtained by taking $\mathbb{S}_h$ to be a local linear smoother matrix $S_h$, whose formal definition will be provided shortly. We refer to these estimators as local linear backfitting estimators and denote them by $\hat{\beta}_{I, S_h^c}$ and $\hat{\mathbf{m}}_{I, S_h^c}$. These estimators were introduced by Opsomer and Ruppert (1999) in the context of partially linear models with uncorrelated errors and discussed in Section 1.1.1.

Taking $\mathbb{S}_h$ to be $S_h$ is motivated by the fact that local linear smoothing has been shown by Fan and Gijbels (1992) and Fan (1993) to be an effective smoothing method in nonparametric regression. It has the advantage of achieving full asymptotic minimax efficiency and automatically correcting for boundary bias. For more information on local linear smoothing, the reader is referred to Fan and Gijbels (1996).

We define the $(i,j)$th element of $S_h$ as:

$$S_{ij} = \frac{w_j^{(i)}}{\sum_{k=1}^{n} w_k^{(i)}}, \tag{3.6}$$

with local weights $w_k^{(i)}$, $k = 1, \ldots, n$, given by:

$$w_k^{(i)} = K\!\left( \frac{Z_i - Z_k}{h} \right) \left[ S_{n,2}(Z_i) - (Z_i - Z_k)\, S_{n,1}(Z_i) \right]. \tag{3.7}$$

Here:

$$S_{n,l}(Z) = \sum_{j=1}^{n} K\!\left( \frac{Z - Z_j}{h} \right) (Z - Z_j)^l, \quad l = 1, 2, \tag{3.8}$$

where $Z \in [0,1]$, $h$ is the half-width of the smoothing window and $K$ is a kernel function specified by the user. One possible choice of $K$, which will be used later in this thesis, is the so-called Epanechnikov kernel:

$$K(u) = \begin{cases} \frac{3}{4}(1 - u^2), & \text{if } |u| < 1; \\ 0, & \text{else.} \end{cases} \tag{3.9}$$

3.1.2 Modified Generic Backfitting Estimators

The modified generic backfitting estimators are feasible when the error correlation matrix $\Psi$ is fully known.
These estimators are obtained from (3.2)-(3.3) by taking $\Omega = \Psi^{-1}$. Unlike the usual generic backfitting estimators, which ignore the correlation structure of the model errors, the modified generic backfitting estimators account for this correlation structure and thus would be expected to be more efficient.

In this thesis, we consider a particular case of modified generic backfitting estimators, obtained by taking $\mathbb{S}_h$ to be the local linear smoother matrix $S_h$, whose $(i,j)$th element is defined in (3.6)-(3.8). We refer to these estimators as modified local linear backfitting estimators and denote them by $\hat{\beta}_{\Psi^{-1}, S_h^c}$ and $\hat{\mathbf{m}}_{\Psi^{-1}, S_h^c}$.

3.1.3 Estimated Modified Generic Backfitting Estimators

In practice, the error correlation matrix $\Psi$ is never fully known. More commonly, $\Psi$ is assumed to be known only up to a finite number of parameters, or assumed to be stationary, but otherwise left completely unspecified. In these situations, the modified generic backfitting estimators are no longer feasible. However, these estimators can be adjusted to become feasible by simply replacing $\Omega = \Psi^{-1}$ with $\Omega = \hat{\Psi}^{-1}$, where $\hat{\Psi}$ is an estimator of $\Psi$. We refer to these adjusted estimators as estimated modified generic backfitting estimators.

In this thesis, we consider a particular case of estimated modified generic backfitting estimators, obtained by taking $\mathbb{S}_h$ to be the local linear smoother matrix $S_h$, whose $(i,j)$th element is defined in (3.6)-(3.8). We refer to these estimators as estimated modified local linear backfitting estimators and denote them by $\hat{\beta}_{\hat{\Psi}^{-1}, S_h^c}$ and $\hat{\mathbf{m}}_{\hat{\Psi}^{-1}, S_h^c}$.

Surprisingly, not much information is available in the partially linear regression model literature on estimating the correlation structure of the model errors when it is known only up to a finite number of parameters, or assumed to be stationary, but otherwise left completely unspecified.
Later in this thesis we discuss how one might obtain estimators for the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\Psi$ in practice.

3.1.4 Usual, Modified and Estimated Modified Speckman Estimators

As we have seen earlier, the usual, modified and estimated modified backfitting estimators are obtained from (3.2)-(3.3) by taking $\Omega$ to be $I$, $\Psi^{-1}$ and $\hat{\Psi}^{-1}$, respectively, with $\mathbb{S}_h^c$ determined by the smoothing method chosen. Other estimators are the usual, modified and estimated modified Speckman estimators, which are obtained from (3.2)-(3.3) by taking $\Omega$ to be $(I - \mathbb{S}_h^c)^T$, $(I - \mathbb{S}_h^c)^T \Psi^{-1}$ and $(I - \mathbb{S}_h^c)^T \hat{\Psi}^{-1}$, respectively. Here, $\hat{\Psi}$ is an estimator of $\Psi$, while $\mathbb{S}_h^c$ depends on the smoothing method of our choice. We discuss these estimators below.

The usual Speckman estimators ignore the correlation structure of the model errors. In what follows, we denote these estimators by $\hat{\beta}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$. An explicit expression for $\hat{\beta}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ can be found by taking $\Omega = (I - \mathbb{S}_h^c)^T$ in (3.4):

$$\hat{\beta}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{Y}, \tag{3.10}$$

where $\tilde{X} = (I - \mathbb{S}_h^c) X$ and $\tilde{Y} = (I - \mathbb{S}_h^c) Y$ are partial residuals formed by smoothing $X$ and $Y$ as functions of $Z$. The usual Speckman estimator $\hat{\beta}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ can thus be thought of as being the least squares estimator of $\beta$ obtained by regressing the partial residuals $\tilde{Y}$ on the partial residuals $\tilde{X}$. Later in this thesis, we compare the finite sample behaviour of the usual Speckman estimator $\hat{\beta}_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$, with $\mathbb{S}_h$ being a local constant smoother matrix with Nadaraya-Watson weights, against that of $\hat{\beta}_{I, S_h^c}$, the local linear backfitting estimator, and $\hat{\beta}_{\hat{\Psi}^{-1}, S_h^c}$, the estimated modified local linear backfitting estimator.

The modified Speckman estimators are defined by taking into account the correlation structure of the errors associated with model (2.1) and are feasible when the correlation matrix of these errors is fully known.
We denote these estimators by $\hat{\beta}_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ and note that an explicit expression for $\hat{\beta}_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ can be found by taking $\Omega = (I - \mathbb{S}_h^c)^T \Psi^{-1}$ in (3.4):

$$\hat{\beta}_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c} = (\tilde{X}^T \Psi^{-1} \tilde{X})^{-1} \tilde{X}^T \Psi^{-1} \tilde{Y}. \tag{3.11}$$

One can see that $\hat{\beta}_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ is a weighted least squares estimator, obtained by regressing the partial residuals $\tilde{Y}$ on the partial residuals $\tilde{X}$. The large-sample properties of an unconstrained version of this estimator have been studied by Aneiros Perez and Quintela del Rio (2001a) under the assumption of $\alpha$-mixing errors. Their estimator is given by:

$$\hat{\beta}_{(I - K_h)^T \Psi^{-1}, K_h} = (\tilde{X}^{*T} \Psi^{-1} \tilde{X}^*)^{-1} \tilde{X}^{*T} \Psi^{-1} (I - K_h) Y, \tag{3.12}$$

where $\tilde{X}^* = (I - K_h) X^*$, $X^*$ is defined as in (2.6) and $K_h$ is an uncentered local constant smoother matrix with Gasser-Müller weights. Later in this thesis, we compare the asymptotic properties of $\hat{\beta}_{(I - K_h)^T \Psi^{-1}, K_h}$ against those of $\hat{\beta}_{\Psi^{-1}, S_h^c}$, the modified local linear backfitting estimator. We do not, however, compare the finite sample properties of these estimators, as neither estimator can be computed in practice. Indeed, both estimators depend on the true error correlation matrix, which is typically unknown in applications.

The estimated modified Speckman estimators are feasible in those situations where the error correlation matrix is unknown but estimable. We denote these estimators by $\hat{\beta}_{(I - \mathbb{S}_h^c)^T \hat{\Psi}^{-1}, \mathbb{S}_h^c}$ and $\hat{\mathbf{m}}_{(I - \mathbb{S}_h^c)^T \hat{\Psi}^{-1}, \mathbb{S}_h^c}$. An explicit expression for $\hat{\beta}_{(I - \mathbb{S}_h^c)^T \hat{\Psi}^{-1}, \mathbb{S}_h^c}$ can be obtained by substituting $\hat{\Psi}$ for $\Psi$ in (3.11).

In the remainder of this thesis, we concentrate on the following estimators of $\beta$, the parametric component in model (2.1):

(i) $\hat{\beta}_{I, S_h^c}$, the local linear backfitting estimator;

(ii) $\hat{\beta}_{\Psi^{-1}, S_h^c}$, the modified local linear backfitting estimator;

(iii) $\hat{\beta}_{\hat{\Psi}^{-1}, S_h^c}$, the estimated modified local linear backfitting estimator.
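Returning briefly to the Speckman estimators of Section 3.1.4, the claim that they arise from the generic explicit expression (3.4) under a particular choice of $\Omega$ can be verified numerically. The following sketch is an illustration under stated assumptions, not a reproduction of any computation in the thesis: it uses NumPy, simulated data with hypothetical choices of $g_1$, $m$ and $\beta$, a centered Nadaraya-Watson smoother with an Epanechnikov kernel as the generic smoother, and an AR(1) correlation matrix with an arbitrary parameter 0.6 as a stand-in for a known $\Psi$.

```python
import numpy as np

def nadaraya_watson(z, h):
    """Local constant smoother matrix with Epanechnikov kernel weights."""
    u = (z[:, None] - z[None, :]) / h
    k = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
    return k / k.sum(axis=1, keepdims=True)

def generic_beta(X, Y, Sc, Omega):
    """Generic explicit backfitting estimator, equation (3.4)."""
    A = X.T @ Omega @ (np.eye(len(Y)) - Sc)
    return np.linalg.solve(A @ X, A @ Y)

rng = np.random.default_rng(1)
n = 150
z = np.arange(1, n + 1) / (n + 1)
X = np.column_stack([np.ones(n), np.cos(np.pi * z) + rng.normal(size=n)])
Y = X @ np.array([0.5, -1.0]) + np.sin(2 * np.pi * z) + 0.2 * rng.normal(size=n)

S = nadaraya_watson(z, h=0.1)
Sc = S - S.mean(axis=0, keepdims=True)           # (I - 11^T/n) S, as in (3.1)
R = np.eye(n) - Sc
X_t, Y_t = R @ X, R @ Y                          # partial residuals of X and Y

# Usual Speckman estimator (3.10) versus (3.4) with Omega = (I - S^c)^T.
beta_speckman = np.linalg.solve(X_t.T @ X_t, X_t.T @ Y_t)
beta_generic = generic_beta(X, Y, Sc, R.T)

# Modified Speckman estimator (3.11) versus (3.4) with Omega = (I - S^c)^T Psi^{-1}.
idx = np.arange(n)
Psi_inv = np.linalg.inv(0.6 ** np.abs(idx[:, None] - idx[None, :]))
beta_mod = np.linalg.solve(X_t.T @ Psi_inv @ X_t, X_t.T @ Psi_inv @ Y_t)
beta_mod_generic = generic_beta(X, Y, Sc, R.T @ Psi_inv)
```

Each pair of estimates should coincide up to rounding error, since the two formulas are algebraically identical once $\Omega$ is substituted into (3.4).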
Opsomer and Ruppert (1999) studied the asymptotic behaviour of $\hat{\beta}_{I, S_h^c}$ under the assumption that the model errors are uncorrelated. However, the asymptotic behaviour of $\hat{\beta}_{I, S_h^c}$, $\hat{\beta}_{\Psi^{-1}, S_h^c}$ and $\hat{\beta}_{\hat{\Psi}^{-1}, S_h^c}$ has not been studied under the assumption of error correlation. In Chapter 4 of this thesis, we investigate the asymptotic behaviour of $\hat{\beta}_{I, S_h^c}$ and discuss conditions under which this estimator is $\sqrt{n}$-consistent. In Chapter 5, we obtain similar results for $\hat{\beta}_{\Psi^{-1}, S_h^c}$ for correctly specified $\Psi$. Rather than assuming $\Psi$ to have a general form as in Chapter 4, we restrict it to have a parametric (autoregressive) structure in order to simplify the proofs of all results in Chapter 5. We also give conditions under which $\hat{\beta}_{\hat{\Psi}^{-1}, S_h^c}$ is $\sqrt{n}$-consistent.

Chapter 4

Asymptotic Properties of the Local Linear Backfitting Estimator $\hat{\beta}_{I, S_h^c}$

In this chapter, we investigate the large-sample behaviour of the local linear backfitting estimator $\hat{\beta}_{I, S_h^c}$ as the number of data points in the local linear smoothing window increases and the window size decreases at a specified rate. Recall that an explicit expression for $\hat{\beta}_{I, S_h^c}$ can be obtained from (3.4) by taking $\Omega = I$ and replacing $\mathbb{S}_h^c$ with the centered local linear smoother $S_h^c$:

$$\hat{\beta}_{I, S_h^c} = \left( X^T (I - S_h^c) X \right)^{-1} X^T (I - S_h^c) Y. \tag{4.1}$$

Throughout this chapter, we assume that the errors associated with model (2.1) are a realization from a zero mean, covariance-stationary stochastic process satisfying condition (A1) of Section 2.2. We also assume that the non-linear variable in the model is a fixed design variable following a smooth design density $f(\cdot)$ (condition (A3), Section 2.2) and having a smooth effect $m(\cdot)$ on the mean response (condition (A4), Section 2.2). Finally, we allow the linear variables in the model to be mutually correlated and assume they are related to the non-linear variable via a non-parametric regression relationship (condition (A0), Section 2.2).
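To make the setting of this chapter concrete, the sketch below simulates one data set under simplified versions of conditions (A0), (A2) and (A3) and evaluates the estimator in (4.1). It is an illustration only: NumPy is assumed, $p = 1$, the design density is uniform, the functions $g_1(z) = \sin(\pi z)$ and $m(z) = \cos(2\pi z)$ (centered so that (2.4) holds), the AR(1) parameter and the bandwidth are hypothetical choices, and the helper names are not from the thesis. The sketch also splits the estimation error, given $X$ and $Z$, into a part driven by $\mathbf{m}$ and a part driven by the errors.

```python
import numpy as np

def centered_local_linear(z, h):
    """Centered local linear smoother built from (3.6)-(3.8) and (3.1)."""
    d = z[:, None] - z[None, :]
    k = np.where(np.abs(d / h) < 1, 0.75 * (1 - (d / h) ** 2), 0.0)
    s1 = (k * d).sum(axis=1, keepdims=True)
    s2 = (k * d**2).sum(axis=1, keepdims=True)
    w = k * (s2 - d * s1)
    S = w / w.sum(axis=1, keepdims=True)
    return S - S.mean(axis=0, keepdims=True)

def simulate(n, beta, phi, sigma_u, rng):
    z = np.arange(1, n + 1) / (n + 1)            # regular design, uniform density
    x = np.sin(np.pi * z) + rng.normal(size=n)   # X_i1 = g_1(Z_i) + eta_i1
    X = np.column_stack([np.ones(n), x])
    m = np.cos(2 * np.pi * z)
    m = m - m.mean()                             # enforce 1^T m = 0
    eps = np.empty(n)                            # stationary AR(1) errors
    eps[0] = rng.normal(scale=sigma_u / np.sqrt(1 - phi**2))
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + rng.normal(scale=sigma_u)
    return z, X, m, eps, X @ beta + m + eps

rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])
z, X, m, eps, Y = simulate(300, beta, phi=0.5, sigma_u=0.3, rng=rng)
Sc = centered_local_linear(z, h=0.08)

A = X.T @ (np.eye(len(z)) - Sc)
beta_hat = np.linalg.solve(A @ X, A @ Y)         # estimator (4.1)
bias = np.linalg.solve(A @ X, A @ m)             # part of the error driven by m
noise = np.linalg.solve(A @ X, A @ eps)          # part of the error driven by eps
```

By linearity, `beta_hat` equals `beta + bias + noise` exactly. The `bias` term depends only on $X$, $Z$ and $\mathbf{m}$, while the error correlation enters only through `noise`. If $\mathbf{m}$ were replaced by a centered straight line in $Z$, the `bias` term would vanish, because the local linear smoother reproduces linear functions.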
In Sections 4.1 and 4.2, we provide asymptotic expressions for the exact conditional bias and variance of $\hat{\beta}_{I, S_h^c}$, given $X$ and $Z$. In Section 4.3, we provide an asymptotic expression for an exact conditional quadratic loss criterion that measures the accuracy of $\hat{\beta}_{I, S_h^c}$ as an estimator of $\beta$. In Section 4.4, we discuss the circumstances under which the $\sqrt{n}$-consistency of $\hat{\beta}_{I, S_h^c}$ can be achieved given $X$ and $Z$. In particular, we show that one must 'undersmooth' $\hat{\mathbf{m}}_{I, S_h^c}$, the estimated non-parametric component, to ensure that $\hat{\beta}_{I, S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$. The results in Sections 4.1-4.4 focus on the local linear backfitting estimator $\hat{\beta}_{I, S_h^c}$. In Section 4.5, we indicate how these results can be generalized to local polynomials of higher degree. The chapter concludes with an Appendix containing several auxiliary results.

Throughout this chapter, we let $G_i$ denote the $i$th column of the matrix $G$ defined in (2.14), and $\eta_i$ denote the $i$th column of the matrix $\eta$ defined in (2.10). We also let $\hat{\beta}_{i, I, S_h^c}$ denote the $i$th component of $\hat{\beta}_{I, S_h^c}$.

4.1 Exact Conditional Bias of $\hat{\beta}_{I, S_h^c}$ given $X$ and $Z$

The modelling flexibility of the partially linear model (2.1) comes at a price. On one hand, the presence of the nonparametric term $\mathbf{m}$ in this model safeguards against model mis-specification bias in the estimated relationships between the linear variables $X_1, \ldots, X_p$ and the response. On the other hand, allowing $\mathbf{m}$ to enter the model causes the usual backfitting estimator $\hat{\beta}_{I, S_h^c}$ to suffer from finite sample bias. Indeed, using the explicit expression of $\hat{\beta}_{I, S_h^c}$ in (4.1), together with the model formulation in (2.1), we easily see the conditional bias of $\hat{\beta}_{I, S_h^c}$, given $X$ and $Z$, to be:

$$E(\hat{\beta}_{I, S_h^c} | X, Z) - \beta = \left( X^T (I - S_h^c) X \right)^{-1} X^T (I - S_h^c)\, \mathbf{m}, \tag{4.2}$$

an expression which generally does not equal zero.

Theorem 4.1.1 below provides an asymptotic expression for the exact conditional bias of $\hat{\beta}_{I, S_h^c}$ given $X$ and $Z$.
As we already mentioned, this expression is obtained by assuming that the amount of smoothing $h$ required for computing the estimator $\hat{\beta}_{I, S_h^c}$ is deterministic and satisfies conditions (2.12) and (2.13).

Theorem 4.1.1 Let $V$ and $W$ be defined as in equations (2.20) - (2.21). Under assumptions (A0), (A1) and (A3) - (A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional bias of the usual backfitting estimator $\hat{\beta}_{I, S_h^c}$ of $\beta$, given $X$ and $Z$, is:

$$E(\hat{\beta}_{I, S_h^c} | X, Z) - \beta = -h^2 \cdot V^{-1} W + o_P(h^2). \tag{4.3}$$

Comment 4.1.1 From equation (4.2) above, one can see that the exact conditional bias of $\hat{\beta}_{I, S_h^c}$, given $X$ and $Z$, does not depend upon the error correlation matrix $\Psi$. Hence, it is not surprising that the leading term in (4.3) is unaffected by the possible correlation of the model errors.

Proof of Theorem 4.1.1: Let:

$$B_{n,I} = \frac{1}{n+1}\, X^T (I - S_h^c) X,$$

where the dependence of $B_{n,I}$ upon $h$ is omitted for convenience. We will see below that when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, $B_{n,I}$ converges in probability to the quantity $V$ defined in equation (2.20). Since $V$ is non-singular by Lemma 4.6.11, the explicit expression for $\hat{\beta}_{I, S_h^c}$ in (4.1) holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. We can use this expression to write:

$$\hat{\beta}_{I, S_h^c} = B_{n,I}^{-1} \cdot \left\{ \frac{1}{n+1}\, X^T (I - S_h^c) Y \right\}, \tag{4.4}$$

which holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. Taking conditional expectation in both sides of (4.4) and subtracting $\beta$ yields:

$$E(\hat{\beta}_{I, S_h^c} | X, Z) - \beta = B_{n,I}^{-1} \cdot \left\{ \frac{1}{n+1}\, X^T (I - S_h^c)\, \mathbf{m} \right\}. \tag{4.5}$$

We now show that $B_{n,I}$ converges in probability to $V$ as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, that is:

$$B_{n,I} = V + o_P(1). \tag{4.6}$$

By equation (2.16), $X = G + \eta$, so $B_{n,I}$ can be decomposed as:

$$B_{n,I} = \frac{1}{n+1} G^T (I - S_h^c) G + \frac{1}{n+1} G^T (I - S_h^c) \eta + \frac{1}{n+1} \eta^T (I - S_h^c) G + \frac{1}{n+1} \eta^T (I - S_h^c) \eta.$$

Using $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$ (equation (3.1) with $\mathbb{S}_h = S_h$), we re-write the first term, expand the last term and re-arrange to obtain:

$$\begin{aligned}
B_{n,I} &= \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T G + \frac{1}{n+1} \eta^T \eta + \frac{1}{n+1} G^T (I - S_h) G + \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T (S_h - I) G \\
&\quad + \frac{1}{n+1} G^T (I - S_h^c) \eta + \frac{1}{n+1} \eta^T (I - S_h^c) G - \frac{1}{n+1} \eta^T S_h^c \eta.
\end{aligned} \tag{4.7}$$

To establish (4.6), it suffices to show that

$$\frac{1}{n(n+1)} \cdot G^T \mathbf{1}\mathbf{1}^T G = \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz + o(1), \tag{4.8}$$

$$\frac{1}{n+1}\, \eta^T \eta = \Sigma^{(0)} + o_P(1), \tag{4.9}$$

whereas the remaining terms are $o_P(1)$.

First consider $G^T \mathbf{1}\mathbf{1}^T G / n(n+1)$. Set $Z_0 = 0$, $Z_{n+1} = 1$ and use (A3), the design condition on the $Z_i$'s, to get:

$$\begin{aligned}
\int_0^1 g_j(z) f(z)\,dz &= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} g_j(z) f(z)\,dz \\
&= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} [g_j(z) - g_j(Z_i)] f(z)\,dz + \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} g_j(Z_i) f(z)\,dz \\
&= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} [g_j(z) - g_j(Z_i)] f(z)\,dz + \frac{1}{n+1} \sum_{i=1}^{n+1} g_j(Z_i) \\
&= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} [g_j(z) - g_j(Z_i)] f(z)\,dz + \frac{1}{n+1} \left[ G^T \mathbf{1} \right]_{j+1} + \frac{g_j(1)}{n+1}
\end{aligned}$$

for $j = 0, \ldots, p$ fixed. Re-arranging and using the design condition (A3) and the Lipschitz-continuity of $g_j(\cdot)$ (consequence of (A0)-(i)) yields:

$$\left| \frac{1}{n+1} \left[ G^T \mathbf{1} \right]_{j+1} - \int_0^1 g_j(z) f(z)\,dz \right| \leq \frac{|g_j(1)|}{n+1} + \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} |g_j(z) - g_j(Z_i)| f(z)\,dz = O\!\left( \frac{1}{n} \right)$$

for any $j = 0, \ldots, p$, so:

$$\frac{1}{n+1}\, G^T \mathbf{1} = \int_0^1 g(z) f(z)\,dz + o(1) \tag{4.10}$$

and (4.8) follows.

Next consider $\eta^T \eta / (n+1)$. Fix $i, j = 1, \ldots, p$, and use (A0)-(ii), which specifies the distributional assumptions on the rows of $\eta$, to get:

$$\left[ \frac{1}{n+1}\, \eta^T \eta \right]_{i+1, j+1} = \frac{1}{n+1} \sum_{k=1}^{n} \eta_{ki} \eta_{kj} \to \Sigma_{ij}$$

in probability. Since $[\eta^T \eta / (n+1)]_{i+1, j+1} = 0$ whenever $i = 0$ and $j = 0, \ldots, p$ or $i = 1, \ldots, p$ and $j = 0$, (4.9) follows.

It remains to show that all the other terms in (4.7) are $o_P(1)$. It suffices to show that $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (S_h - I) G_{j+1} / n(n+1)$, $G_{i+1}^T (I - S_h^c) \eta_{j+1} / (n+1)$, $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1)$ and $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$ are $o_P(1)$ for any $i, j = 0, 1, \ldots, p$. These facts follow from lemmas appearing in the Appendix of this chapter.

Let $i, j = 0, 1, \ldots, p$ be fixed and consider $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$.
By result (4.58) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $r = G_{j+1}$, this quantity is $O(h^2)$, so $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$ is $o(1)$. Similarly, by result (4.59) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $r = G_{j+1}$, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (I - S_h) G_{j+1} / (n(n+1))$ is $O(h^2)$. Thus, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (I - S_h) G_{j+1} / (n(n+1))$ is $o(1)$.

Next consider $G_{i+1}^T (I - S_h^c) \eta_{j+1} / (n+1)$. When $j = 0$, this is 0. For $j = 1, \ldots, p$, by result (4.60) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $\xi = \eta_{j+1}$, this quantity is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. Similarly, when $i = 0$, $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1) = 0$. For $i = 1, \ldots, p$, result (4.61) of Lemma 4.6.9 establishes that $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1) = o_P(1)$.

Finally, consider $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$. When $i = 0$ or $j = 0$, this is 0. By result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = I$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$ is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$ for $i, j = 1, \ldots, p$.

Combining these results, we conclude that

$$B_{n,I} = \Sigma^{(0)} + \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz + o_P(1) = V + o_P(1).$$

But $V$ is non-singular by Lemma 4.6.11, so

$$B_{n,I}^{-1} = V^{-1} + o_P(1). \tag{4.11}$$

To establish (4.3), by (4.5) it now suffices to show that:

$$\frac{1}{n+1}\, X^T (I - S_h^c)\, \mathbf{m} = -h^2 W + o_P(h^2). \tag{4.12}$$

This equality is established below with the help of lemmas stated in the Appendix of this chapter.

By equation (2.16), $X = G + \eta$, so $X^T (I - S_h^c) \mathbf{m} / (n+1)$ can be decomposed as:

$$\frac{1}{n+1}\, X^T (I - S_h^c)\, \mathbf{m} = \frac{1}{n+1}\, G^T (I - S_h^c)\, \mathbf{m} + \frac{1}{n+1}\, \eta^T (I - S_h^c)\, \mathbf{m}.$$

Using the identifiability condition on $m(\cdot)$ in (2.4) and the fact that $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$ we obtain:

$$\frac{1}{n+1}\, X^T (I - S_h^c)\, \mathbf{m} = \frac{1}{n+1}\, G^T (I - S_h)\, \mathbf{m} + \frac{1}{n(n+1)}\, G^T \mathbf{1}\mathbf{1}^T (S_h - I)\, \mathbf{m} + \frac{1}{n+1}\, \eta^T (I - S_h^c)\, \mathbf{m}. \tag{4.13}$$

By results (4.66) and (4.67) of Lemma 4.6.10, we obtain $G^T (I - S_h) \mathbf{m} / (n+1) = -h^2 (\nu_2(K)/2) \int_0^1 g(z) m''(z) f(z)\,dz + o_P(h^2)$ as well as $G^T \mathbf{1}\mathbf{1}^T (S_h - I) \mathbf{m} / n(n+1) = h^2 (\nu_2(K)/2) \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 m''(z) f(z)\,dz + o_P(h^2)$. Result (4.61) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = I$ and $r = \mathbf{m}$ establishes that $\eta_{i+1}^T (I - S_h^c) \mathbf{m} / (n+1) = O_P(n^{-1/2} h^2) = o_P(h^2)$. Note that result (4.61) of Lemma 4.6.9 holds trivially when $\xi^* = \eta_1$, as $\eta_1 = 0$ by definition.
Thus, (4.12) holds. This, combined with (4.5) and (4.11), completes the proof of Theorem 4.1.1.

To better understand the effect of the correlation between the linear and non-linear variables in the model on the asymptotic conditional bias of $\hat{\beta}_{I,S_h^c}$, we provide an alternative expression for this bias.

Corollary 4.1.1 Let $Z$ be a random variable with density function $f(\cdot)$ as in assumption (A3). Let $X_1,\dots,X_p$ be random variables related to $Z$ as:
$$X_j = g_j(Z) + \eta_j, \quad j = 1,\dots,p,$$
where the $g_j(\cdot)$'s are smooth functions as in assumption (A0)-(i) and the $\eta_j$'s are random variables satisfying $E(\eta_j|Z) = 0$, $Var(\eta_j|Z) = \Sigma_{jj}$, $Cov(\eta_j,\eta_{j'}) = \Sigma_{jj'}$, $j \neq j'$, with $\Sigma = (\Sigma_{jj'})$ as in assumption (A0)-(ii). Also, let $m(\cdot)$ be a smooth function satisfying assumption (A4) and denote its second derivative by $m''(\cdot)$. Set $X = (X_1,\dots,X_p)^T$. Under the assumptions in Theorem 4.1.1, our previous bias expression can be re-written in terms of $X$ and $Z$ as:
$$E(\hat{\beta}_{0,I,S_h^c}|X,Z) - \beta_0 = h^2\,\frac{\nu_2(K)}{2}\,E(X|Z)^TVar(X|Z)^{-1}Cov(X,m''(Z)) + o_P(h^2) \tag{4.14}$$
and
$$E\left((\hat{\beta}_{1,I,S_h^c},\dots,\hat{\beta}_{p,I,S_h^c})^T\,\middle|\,X,Z\right) - (\beta_1,\dots,\beta_p)^T = -h^2\,\frac{\nu_2(K)}{2}\,Var(X|Z)^{-1}Cov(X,m''(Z)) + o_P(h^2). \tag{4.15}$$

Proof: Let $a = \left(\int_0^1 g_1(z)f(z)\,dz,\dots,\int_0^1 g_p(z)f(z)\,dz\right)^T$ and let $W$ be defined as in (2.21). Set $W = (0, W_2^T)^T$, with:
$$[W_2]_j = \frac{\nu_2(K)}{2}\left[\int_0^1 g_j(z)m''(z)f(z)\,dz - \int_0^1 g_j(z)f(z)\,dz\cdot\int_0^1 m''(z)f(z)\,dz\right],$$
for $j = 1,\dots,p$. Substitute the explicit expression for $V^{-1}$ (result (4.68), Lemma 4.6.11) into (4.3) to obtain:
$$E(\hat{\beta}_{I,S_h^c}|X,Z) - \beta = -h^2\begin{pmatrix} 1 + a^T\Sigma^{-1}a & -a^T\Sigma^{-1} \\ -\Sigma^{-1}a & \Sigma^{-1}\end{pmatrix}\begin{pmatrix} 0 \\ W_2\end{pmatrix} + o_P(h^2) = -h^2\begin{pmatrix} -a^T\Sigma^{-1}W_2 \\ \Sigma^{-1}W_2\end{pmatrix} + o_P(h^2),$$
with $\Sigma$ as in assumption (A0)-(ii). Results (4.14) and (4.15) follow easily from the above by noting that $a = E(X|Z)$, $\Sigma = Var(X|Z)$ and $W_2 = (\nu_2(K)/2)\,Cov(X,m''(Z))$.

Result (4.15) in Corollary 4.1.1 shows that the effect of the correlation between the linear variables and the non-linear variable in the model on the asymptotic bias of the local linear backfitting estimator of the linear effects $\beta_1,\dots,\beta_p$ is through the variance-covariance matrix $Var(X|Z)$ and the covariances $Cov(X,m''(Z))$.
Note that the latter depends on the curvature of the smooth non-linear effect $m(\cdot)$ through its second derivative $m''(\cdot)$. Therefore, the leading term in the bias of $\hat{\beta}_{i,I,S_h^c}$ disappears when there is no correlation between the corresponding linear and non-linear terms in the model, that is, when the correlation between $g_i(Z)$ and $m''(Z)$ is zero. In particular, the leading term disappears if $m(\cdot)$ is a line, or if $g_i(\cdot) = c_i$ for some constant $c_i$.

Opsomer and Ruppert (1999, Theorem 1) obtained a related bias result for the local linear backfitting estimator of the linear effects $\beta_1,\dots,\beta_p$ in a partially linear model with independent, identically distributed errors. These authors derived their result under a different set of assumptions than ours. Specifically, they assumed the design points $Z_i$, $i = 1,\dots,n$, to be random instead of fixed. Furthermore, they did not require that the covariate values $X_{ij}$ and the design points $Z_i$ be related via the nonparametric regression model (2.7). However, they assumed the linear covariates to have mean zero. Finally, they imposed the weaker condition $nh \to \infty$ instead of condition (2.13) ($nh^3 \to \infty$), which allows $h$ to converge to zero faster than our condition does. The asymptotic bias expression derived by Opsomer and Ruppert is
$$-\frac{h^2\nu_2(K)}{2}\{E(Var(X|Z))\}^{-1}Cov(X,m''(Z)) + o_P(h^2).$$
The leading term in this expression is a slight modification of our first term in (4.15), which accounts for the randomness of the $Z_i$'s. The rate of the error associated with Opsomer and Ruppert's asymptotic bias approximation is $o_P(h^2)$ and is of the same order as that associated with the bias approximation in (4.15).

4.2 Exact Conditional Variance of $\hat{\beta}_{I,S_h^c}$ Given X and Z

In this section, we derive an asymptotic expression for the exact conditional variance $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ of the usual backfitting estimator $\hat{\beta}_{I,S_h^c}$ of $\beta$ given $X$ and $Z$. But first, we obtain an explicit expression for the exact conditional variance $Var(\hat{\beta}_{I,S_h^c}|X,Z)$.
Using the expression for $\hat{\beta}_{I,S_h^c}$ in (4.1) together with the fact that
$$Var(Y|X,Z) = \sigma_\epsilon^2\Psi \tag{4.16}$$
from condition (A1), we get:
$$Var(\hat{\beta}_{I,S_h^c}|X,Z) = \sigma_\epsilon^2\left(X^T(I - S_h^c)X\right)^{-1}\cdot X^T(I - S_h^c)\Psi(I - S_h^c)^TX\cdot\left(X^T(I - S_h^c)^TX\right)^{-1}. \tag{4.17}$$
The next result provides an asymptotic expression for this variance.

Theorem 4.2.1 Let $G$, $V$ and $S_h^c$ be defined as in equations (2.14), (2.20) and (3.1) and let $I$ be the $n \times n$ identity matrix. Under conditions (A0) and (A3)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$,
$$Var(\hat{\beta}_{I,S_h^c}|X,Z) = \frac{\sigma_\epsilon^2}{n+1}V^{-1}\Phi^{(0)}V^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}V^{-1}G^T(I - S_h^c)\Psi(I - S_h^c)^TGV^{-1} + o_P\left(\frac{1}{n}\right), \tag{4.18}$$
where $\Phi^{(0)}$ is defined in equation (2.9) and $\Psi$ is the error correlation matrix.

Comment 4.2.1 From equation (4.17), $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ depends upon the error correlation matrix $\Psi$, so we expect the asymptotic approximation of $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ to also depend upon the correlation structure of the model errors. Indeed, result (4.18) of Theorem 4.2.1 shows that, for large samples, the first term in the asymptotic expression of $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ depends on $\Psi$ indirectly via the limiting value $\Phi^{(0)}$ of $\eta^T\Psi\eta/(n+1)$, while the second term depends on $\Psi$ directly.

Comment 4.2.2 By Lemma 4.6.12, the second term in (4.18) is at most $O(1/n)$. Therefore, $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ has a rate of convergence of $1/n$.

Proof of Theorem 4.2.1: From (4.6), $B_{n,I} = X^T(I - S_h^c)X/(n+1) = V + o_P(1)$, so $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ in (4.17) can be written as:
$$Var(\hat{\beta}_{I,S_h^c}|X,Z) = \sigma_\epsilon^2B_{n,I}^{-1}\cdot\frac{1}{(n+1)^2}X^T(I - S_h^c)\Psi(I - S_h^c)^TX\cdot(B_{n,I}^T)^{-1} = \frac{\sigma_\epsilon^2}{n+1}B_{n,I}^{-1}\cdot C_{n,I}\cdot(B_{n,I}^T)^{-1} \tag{4.19}$$
where $C_{n,I} = X^T(I - S_h^c)\Psi(I - S_h^c)^TX/(n+1)$. The dependence of $C_{n,I}$ upon $h$ is omitted for convenience. To establish (4.18), it suffices to show that $C_{n,I}$ satisfies:
$$C_{n,I} = \Phi^{(0)} + \frac{1}{n+1}G^T(I - S_h^c)\Psi(I - S_h^c)^TG + o_P(1). \tag{4.20}$$
Using $X = G + \eta$ (equation (2.16)), $C_{n,I}$ can be decomposed as:
$$C_{n,I} = \frac{1}{n+1}G^T(I - S_h^c)\Psi(I - S_h^c)^TG + \frac{1}{n+1}G^T(I - S_h^c)\Psi(I - S_h^c)^T\eta + \frac{1}{n+1}\eta^T(I - S_h^c)\Psi(I - S_h^c)^TG + \frac{1}{n+1}\eta^T(I - S_h^c)\Psi(I - S_h^c)^T\eta.$$
Expanding the last term and re-arranging yields:
$$C_{n,I} = \frac{1}{n+1}\eta^T\Psi\eta + \frac{1}{n+1}G^T(I - S_h^c)\Psi(I - S_h^c)^TG + \frac{1}{n+1}G^T(I - S_h^c)\Psi(I - S_h^c)^T\eta + \frac{1}{n+1}\eta^T(I - S_h^c)\Psi(I - S_h^c)^TG - \frac{1}{n+1}\eta^TS_h^c\Psi\eta - \frac{1}{n+1}\eta^T\Psi S_h^{cT}\eta + \frac{1}{n+1}\eta^TS_h^c\Psi S_h^{cT}\eta. \tag{4.21}$$
The first term, $\eta^T\Psi\eta/(n+1)$, converges in probability to $\Phi^{(0)}$ by condition (A1)-(iii). We now show that all the other terms, except for the second, are $o_P(1)$. It suffices to show that $G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T\eta_{j+1}/(n+1)$, $\eta_{i+1}^T\Psi S_h^{cT}\eta_{j+1}/(n+1)$, and $\eta_{i+1}^TS_h^c\Psi S_h^{cT}\eta_{j+1}/(n+1)$ are $o_P(1)$; these facts follow from lemmas appearing in the Appendix of this chapter.

First consider $G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T\eta_{j+1}/(n+1)$. Using Lemma 4.6.4 with $\xi = \eta_{j+1}$ and $c = (I - S_h^c)\Psi(I - S_h^c)^TG_{i+1}$, as well as properties of vector and matrix norms from Section 2.4 of Chapter 2, we obtain:
$$\frac{1}{n+1}G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T\eta_{j+1} = \frac{1}{n+1}O_P\left(\|(I - S_h^c)\Psi(I - S_h^c)^TG_{i+1}\|_2\right) = \frac{1}{n+1}O_P\left((1 + \|S_h^c\|_F)\cdot\|\Psi\|_S\cdot\|(I - S_h^c)^TG_{i+1}\|_2\right) = O_P(n^{-1/2}h^{-1/2}).$$
The last equality was derived by using that $\|S_h^c\|_F$ is $O(h^{-1/2})$ by result (4.54) of Lemma 4.6.7, $\|\Psi\|_S$ is $O(1)$ by assumption (A1)-(ii), and $\|(I - S_h^c)^TG_{i+1}\|_2$ is $O(n^{1/2})$ by result (4.53) of Lemma 4.6.7 with $r = G_{i+1}$. We conclude that $G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T\eta_{j+1}/(n+1)$ is $o_P(1)$. Note that Lemma 4.6.4 invoked earlier holds trivially for $\xi = \eta_1$, as $\eta_1 = 0$ by definition.

Next consider $\eta_{i+1}^T\Psi S_h^{cT}\eta_{j+1}/(n+1)$ and $\eta_{i+1}^TS_h^c\Psi S_h^{cT}\eta_{j+1}/(n+1)$. When $i = 0$ or $j = 0$, these quantities are 0, so consider $i,j = 1,\dots,p$. By result (4.63) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T\Psi S_h^{cT}\eta_{j+1}/(n+1)$ is $O_P(n^{-1/2}h^{-1/2}) = o_P(1)$. By result (4.64) of the same lemma with $\xi^* = \eta_{i+1}$, $\Omega = I$, $\Omega^* = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^TS_h^c\Psi S_h^{cT}\eta_{j+1}/(n+1)$ is $O_P(n^{-1}h^{-1}) = o_P(1)$.

Combining these results in (4.21) yields (4.20). This concludes our proof of Theorem 4.2.1.

We now provide an alternative expression for the asymptotic conditional variance of $\hat{\beta}_{I,S_h^c}$ which will shed more light on the effect of the correlation between the linear and non-linear variables in model (2.1) on this variance.
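Before turning to that alternative expression, it may help to see the exact variance (4.17) evaluated numerically. The following is only a minimal sketch under stated assumptions, not code from the thesis: it assumes an Epanechnikov kernel, equally spaced design points $Z_i = i/(n+1)$, an intercept plus a single linear covariate, and AR(1) errors with correlation matrix $\Psi_{ij} = \rho^{|i-j|}$. All function and variable names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    # Assumed kernel: symmetric, non-negative, supported on [-1, 1].
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_smoother(z, h):
    # Row i holds the local linear weights used to smooth at Z_i.
    diff = z[:, None] - z[None, :]                      # diff[i, j] = Z_i - Z_j
    k = epanechnikov(diff / h)
    s1 = np.sum(k * diff, axis=1, keepdims=True)        # S_{n,1}(Z_i)
    s2 = np.sum(k * diff ** 2, axis=1, keepdims=True)   # S_{n,2}(Z_i)
    w = k * (s2 - diff * s1)
    return w / np.sum(w, axis=1, keepdims=True)

def exact_conditional_variance(X, S_c, Psi, sigma2):
    # Sandwich form of (4.17): sigma^2 * A Psi A^T, where beta_hat = A Y
    # and A = (X^T (I - S_c) X)^{-1} X^T (I - S_c).
    n = X.shape[0]
    R = np.eye(n) - S_c
    A = np.linalg.solve(X.T @ R @ X, X.T @ R)
    return sigma2 * A @ Psi @ A.T

rng = np.random.default_rng(0)
n, h, rho, sigma2 = 150, 0.15, 0.5, 1.0                 # illustrative values only
z = np.arange(1, n + 1) / (n + 1)
S_h = local_linear_smoother(z, h)
S_c = (np.eye(n) - np.ones((n, n)) / n) @ S_h           # centered smoother
x1 = np.sin(2 * np.pi * z) + rng.normal(scale=0.5, size=n)  # X = g(Z) + eta
X = np.column_stack([np.ones(n), x1])
idx = np.arange(n)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])        # AR(1) correlation matrix
var_beta = exact_conditional_variance(X, S_c, Psi, sigma2)
print(var_beta)
```

Writing the estimator as $\hat{\beta} = AY$ makes the three-factor product in (4.17) collapse to $\sigma_\epsilon^2A\Psi A^T$, which is why the sketch needs only one linear solve.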
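Corollary 4.2.1 Let $G$ be as in (2.14) and $\Phi^{(0)}$ be as in (2.9). Set
$$\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22}\end{pmatrix} = G^T(I - S_h^c)\Psi(I - S_h^c)^TG, \tag{4.22}$$
where $G_{11}$ is a scalar, $G_{12} = G_{21}^T$ is a $1 \times p$ vector and $G_{22}$ is a $p \times p$ matrix. Also, set:
$$\Phi^{(0)} = \begin{pmatrix} \Phi_{11}^{(0)} & \Phi_{12}^{(0)} \\ \Phi_{21}^{(0)} & \Phi_{22}^{(0)}\end{pmatrix}, \tag{4.23}$$
where $\Phi_{11}^{(0)} = 0$, $\Phi_{12}^{(0)} = (\Phi_{21}^{(0)})^T = 0$ is a $1 \times p$ vector, and $\Phi_{22}^{(0)}$ is a $p \times p$ matrix. If $X$ and $Z$ are as in Corollary 4.1.1 and the assumptions in Theorem 4.2.1 hold, then our previous variance expression can be re-written in terms of $X$ and $Z$:
$$\begin{aligned} Var(\hat{\beta}_{0,I,S_h^c}|X,Z) ={}& \frac{\sigma_\epsilon^2}{n+1}E(X|Z)^TVar(X|Z)^{-1}\Phi_{22}^{(0)}Var(X|Z)^{-1}E(X|Z) \\ &+ \frac{\sigma_\epsilon^2}{(n+1)^2}\Big\{G_{11}\left(1 + E(X|Z)^TVar(X|Z)^{-1}E(X|Z)\right)^2 - 2G_{12}Var(X|Z)^{-1}E(X|Z) \\ &\quad - 2E(X|Z)^TVar(X|Z)^{-1}E(X|Z)\,G_{12}Var(X|Z)^{-1}E(X|Z) \\ &\quad + E(X|Z)^TVar(X|Z)^{-1}G_{22}Var(X|Z)^{-1}E(X|Z)\Big\} + o_P\left(\frac{1}{n}\right) \end{aligned} \tag{4.24}$$
and
$$\begin{aligned} Var\left((\hat{\beta}_{1,I,S_h^c},\dots,\hat{\beta}_{p,I,S_h^c})^T\,\middle|\,X,Z\right) ={}& \frac{\sigma_\epsilon^2}{n+1}Var(X|Z)^{-1}\Phi_{22}^{(0)}Var(X|Z)^{-1} \\ &+ \frac{\sigma_\epsilon^2}{(n+1)^2}Var(X|Z)^{-1}\left[G_{22} - 2E(X|Z)G_{12} + G_{11}E(X|Z)E(X|Z)^T\right]Var(X|Z)^{-1} + o_P\left(\frac{1}{n}\right). \end{aligned} \tag{4.25}$$

Proof: Let $a = \left(\int_0^1 g_1(z)f(z)\,dz,\dots,\int_0^1 g_p(z)f(z)\,dz\right)^T$ be as in Lemma 4.6.11 and $\Sigma = (\Sigma_{ij})$ be the variance-covariance matrix introduced in condition (A0)-(ii). Substituting the explicit expression for $V^{-1}$ (result (4.68), Lemma 4.6.11) into (4.18) yields:
$$Var(\hat{\beta}_{I,S_h^c}|X,Z) = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22}\end{pmatrix} + o_P\left(\frac{1}{n}\right),$$
where $V_{11}$ is a scalar, $V_{12} = V_{21}^T$ is a $1 \times p$ vector and $V_{22}$ is a $p \times p$ matrix given by:
$$V_{11} = \frac{\sigma_\epsilon^2}{n+1}a^T\Sigma^{-1}\Phi_{22}^{(0)}\Sigma^{-1}a + \frac{\sigma_\epsilon^2}{(n+1)^2}\left\{G_{11}(1 + a^T\Sigma^{-1}a)^2 - 2G_{12}\Sigma^{-1}a - 2a^T\Sigma^{-1}a\,G_{12}\Sigma^{-1}a + a^T\Sigma^{-1}G_{22}\Sigma^{-1}a\right\}, \tag{4.26}$$
$$V_{12}^T = -\frac{\sigma_\epsilon^2}{n+1}\Sigma^{-1}\Phi_{22}^{(0)}\Sigma^{-1}a + \frac{\sigma_\epsilon^2}{(n+1)^2}\left\{-G_{11}(1 + a^T\Sigma^{-1}a)\Sigma^{-1}a + \Sigma^{-1}a\,G_{12}\Sigma^{-1}a + (1 + a^T\Sigma^{-1}a)\Sigma^{-1}G_{12}^T - \Sigma^{-1}G_{22}\Sigma^{-1}a\right\} \tag{4.27}$$
and
$$V_{22} = \frac{\sigma_\epsilon^2}{n+1}\Sigma^{-1}\Phi_{22}^{(0)}\Sigma^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}\Sigma^{-1}\left\{G_{22} - 2aG_{12} + G_{11}aa^T\right\}\Sigma^{-1}. \tag{4.28}$$
Results (4.24) and (4.25) follow from (4.26) and (4.28), respectively, since $a = E(X|Z)$ and $\Sigma = Var(X|Z)$.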
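The proof above rests on the closed-form inverse of $V$ in result (4.68). A quick numerical check of that formula can be sketched as follows. The sketch assumes $V$ has the structure suggested by the facts that the first column of $G$ is $\mathbf{1}$ and $\eta_1 = 0$, namely $V = \mathrm{blockdiag}(0, \Sigma) + (1, a^T)^T(1, a^T)$; the numerical values of $a$ and $\Sigma$ are arbitrary illustrations and the function names are hypothetical.

```python
import numpy as np

def build_v(a, Sigma):
    # Assumed structure: V = blockdiag(0, Sigma) + gbar gbar^T with gbar = (1, a)^T.
    p = len(a)
    gbar = np.concatenate(([1.0], a))
    Sigma0 = np.zeros((p + 1, p + 1))
    Sigma0[1:, 1:] = Sigma
    return Sigma0 + np.outer(gbar, gbar)

def v_inverse_closed_form(a, Sigma):
    # Block expression from result (4.68):
    # [[1 + a^T Sigma^{-1} a, -a^T Sigma^{-1}], [-Sigma^{-1} a, Sigma^{-1}]].
    p = len(a)
    Sigma_inv = np.linalg.inv(Sigma)
    V_inv = np.empty((p + 1, p + 1))
    V_inv[0, 0] = 1.0 + a @ Sigma_inv @ a
    V_inv[0, 1:] = -(a @ Sigma_inv)
    V_inv[1:, 0] = -(Sigma_inv @ a)
    V_inv[1:, 1:] = Sigma_inv
    return V_inv

a = np.array([0.3, -1.2])                        # illustrative values only
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])       # illustrative, positive definite
V = build_v(a, Sigma)
V_inv = v_inverse_closed_form(a, Sigma)
max_err = np.max(np.abs(V @ V_inv - np.eye(len(a) + 1)))
print(max_err)
```

Under this structure the Schur complement of the top-left entry of $V$ is exactly $\Sigma$, which is why $V^{-1}$ exists whenever $\Sigma^{-1}$ does.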
Result (4.25) of Corollary 4.2.1 shows that the effect of the correlation between the linear variables and the non-linear variable in model (2.1) on the asymptotic variances of the local linear backfitting estimator of the linear effects $\beta_1,\dots,\beta_p$ is through the conditional variance-covariance matrix $Var(X|Z)$, the conditional mean vector $E(X|Z)$ and the matrices $G_{11}$, $G_{12}$, $G_{22}$ in (4.22).

Comment 4.2.3 In the case $\Psi = I$, $\eta^T\Psi\eta/(n+1) = \eta^T\eta/(n+1) = \Sigma^{(0)} + o_P(1)$ by result (4.9), with $\Sigma^{(0)}$ as in (2.15). Therefore, $\Phi_{22}^{(0)} = \Sigma = Var(X|Z)$. If we also assume, as Opsomer and Ruppert (1999) do, that $E(X|Z) = 0$, then (4.25) becomes:
$$Var\left((\hat{\beta}_{1,I,S_h^c},\dots,\hat{\beta}_{p,I,S_h^c})^T\,\middle|\,X,Z\right) = \frac{\sigma_\epsilon^2}{n+1}Var(X|Z)^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}Var(X|Z)^{-1}G_{22}Var(X|Z)^{-1} + o_P\left(\frac{1}{n}\right). \tag{4.29}$$
Recall that these authors also used different conditions on the rate of convergence of the smoothing parameter $h$ and the design points $Z_i$, $i = 1,\dots,n$. Namely, they imposed the weaker condition $nh \to \infty$ instead of $nh^3 \to \infty$, which allows $h$ to converge to zero faster than our condition does, and they assumed the design points $Z_i$, $i = 1,\dots,n$, to be random instead of fixed. The asymptotic variance expression derived by Opsomer and Ruppert (1999, Theorem 1) is $(\sigma_\epsilon^2/n)\cdot\{E(Var(X|Z))\}^{-1} + o_P(h^2/n + 1/(n^2h))$. The leading term in this variance expression is $(\sigma_\epsilon^2/n)\cdot\{E(Var(X|Z))\}^{-1}$, a slight modification of our first term in (4.29) which accounts for the randomness of the $Z_i$'s. The rate of the error associated with their asymptotic variance approximation is $o_P(h^2/n + 1/(n^2h))$ and is possibly of smaller order than the second term in (4.29), known to be at most $O_P(1/n)$ by result (4.69) of Lemma 4.6.12 (Appendix, Chapter 4) with $\Psi = I$.

4.3 Exact Conditional Measure of Accuracy of $\hat{\beta}_{I,S_h^c}$ given X and Z

Because $\hat{\beta}_{I,S_h^c}$ is generally a biased estimator of $\beta$ for finite samples, any suitable criterion for measuring the accuracy of this estimator should take into account both bias and variance.
A natural way to take both effects into account is to consider
$$E\left(\|\hat{\beta}_{I,S_h^c} - \beta\|_2^2\,\middle|\,X,Z\right) = \left\{E(\hat{\beta}_{I,S_h^c}|X,Z) - \beta\right\}^T\left\{E(\hat{\beta}_{I,S_h^c}|X,Z) - \beta\right\} + \mathrm{trace}\left\{Var(\hat{\beta}_{I,S_h^c}|X,Z)\right\}. \tag{4.30}$$
Using the above equality, which follows from (2.24), and the asymptotic expressions for $E(\hat{\beta}_{I,S_h^c}|X,Z) - \beta$ and $Var(\hat{\beta}_{I,S_h^c}|X,Z)$ in Theorems 4.1.1 and 4.2.1, we obtain:

Corollary 4.3.1 Assume that the conditions in Theorem 4.1.1 and Theorem 4.2.1 hold. Then:
$$E\left(\|\hat{\beta}_{I,S_h^c} - \beta\|_2^2\,\middle|\,X,Z\right) = h^4\cdot W^TV^{-2}W + \frac{\sigma_\epsilon^2}{n+1}\mathrm{trace}\left\{V^{-1}\Phi^{(0)}V^{-1}\right\} + \frac{\sigma_\epsilon^2}{(n+1)^2}\mathrm{trace}\left\{V^{-1}G^T(I - S_h^c)\Psi(I - S_h^c)^TGV^{-1}\right\} + o_P(h^4) + o_P\left(\frac{1}{n}\right). \tag{4.31}$$

4.4 The $\sqrt{n}$-consistency of $\hat{\beta}_{I,S_h^c}$

For obvious reasons, we would like the estimator $\hat{\beta}_{I,S_h^c}$ to have the 'usual' parametric rate of convergence of $1/n$, the rate that would be achieved if $m$ were known, given $X$ and $Z$. If $\hat{\beta}_{I,S_h^c}$ has this rate of convergence, we say that it is $\sqrt{n}$-consistent.

A sufficient condition for $\hat{\beta}_{I,S_h^c}$ to be $\sqrt{n}$-consistent given $X$ and $Z$ is for $E(\|\hat{\beta}_{I,S_h^c} - \beta\|_2^2|X,Z)$ to be $O_P(n^{-1})$. By result (4.31) in Corollary 4.3.1, $E(\|\hat{\beta}_{I,S_h^c} - \beta\|_2^2|X,Z)$ is $O_P(h^4) + O_P(n^{-1})$. This result is due to the fact that the conditional bias of $\hat{\beta}_{I,S_h^c}$ is $O_P(h^2)$, while its conditional variance is $O_P(n^{-1})$. For $E(\|\hat{\beta}_{I,S_h^c} - \beta\|_2^2|X,Z)$ to be $O_P(n^{-1})$, we require $h^4 = O(n^{-1})$, as well as $h \to 0$ and $nh^3 \to \infty$.

To understand the meaning of the above conditions, let us consider that $h = n^{-\alpha}$. For $h \to 0$, we require $\alpha > 0$. Also, for $nh^3 \to \infty$, we require $1 - 3\alpha > 0$. Finally, we want $h^4 = n^{-4\alpha} = O(n^{-1})$, so $\alpha \geq 1/4$. Thus, we require $\alpha \in [1/4, 1/3)$. In summary, $\hat{\beta}_{I,S_h^c}$ achieves $\sqrt{n}$-consistency for $h = n^{-\alpha}$, with $\alpha \in [1/4, 1/3)$.

We argue that $\hat{\beta}_{I,S_h^c}$ computed with an $h$ optimal for estimating $m$ is consistent, but not $\sqrt{n}$-consistent, given $X$ and $Z$. We argue this by finding the amount of smoothing $h$ that is optimal for estimating $m(Z)$ via the local linear backfitting estimator $\hat{m}_{I,S_h^c}(Z)$ where, for $Z \in [0,1]$ fixed,
$$\hat{m}_{I,S_h^c}(Z) = \frac{\sum_{i=1}^n w_i^{(Z)}\,(\,\cdots\,)}{\sum_{i=1}^n w_i^{(Z)}} \tag{4.32}$$
and
$$w_i^{(Z)} = K\left(\frac{Z - Z_i}{h}\right)\left[S_{n,2}(Z) - (Z - Z_i)S_{n,1}(Z)\right]. \tag{4.33}$$
Here, $S_{n,l}(Z)$, $l = 1, 2$, is as in (3.8), $K$ is a kernel function satisfying condition (A5) and the $Z_i$'s are design points satisfying condition (A3). We define the optimal $h$ for estimating $m(Z)$ via $\hat{m}_{I,S_h^c}(Z)$ as:
$$h_{AMSE} = \arg\min_h AMSE\left(\hat{m}_{I,S_h^c}(Z)\,\middle|\,X,Z\right),$$
with $AMSE(\hat{m}_{I,S_h^c}(Z)|X,Z)$ being an asymptotic approximation to the exact conditional mean squared error of $\hat{m}_{I,S_h^c}(Z)$ given $X$ and $Z$:
$$MSE\left(\hat{m}_{I,S_h^c}(Z)\,\middle|\,X,Z\right) = E\left\{\left(\hat{m}_{I,S_h^c}(Z) - m(Z)\right)^2\,\middle|\,X,Z\right\}.$$
To find the order of $AMSE(\hat{m}_{I,S_h^c}(Z)|X,Z)$, and hence $h_{AMSE}$, note that:
$$MSE\left(\hat{m}_{I,S_h^c}(Z)\,\middle|\,X,Z\right) = \left\{E\left(\hat{m}_{I,S_h^c}(Z)\,\middle|\,X,Z\right) - m(Z)\right\}^2 + Var\left(\hat{m}_{I,S_h^c}(Z)\,\middle|\,X,Z\right).$$
By results (4.73) and (4.74) of Lemma 4.6.13, the first term is $O_P(h^4)$ and the second term is $O_P(1/(nh))$, so $MSE(\hat{m}_{I,S_h^c}(Z)|X,Z)$ is $O_P(h^4 + 1/(nh))$. Therefore, $AMSE(\hat{m}_{I,S_h^c}(Z)|X,Z)$ is $O_P(h^4 + 1/(nh))$, and the $h$ that minimizes it satisfies $h_{AMSE} = O(n^{-1/5})$.

For $h = h_{AMSE}$, the estimator $\hat{\beta}_{I,S_h^c}$ has conditional bias of order $O_P(n^{-2/5})$ and conditional variance of order $O_P(n^{-1})$. Thus, $\hat{\beta}_{I,S_h^c}$ is consistent but not $\sqrt{n}$-consistent given $X$ and $Z$, as its squared conditional bias asymptotically dominates its conditional variance. However, for $h = n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, the squared conditional bias of $\hat{\beta}_{I,S_h^c}$ will no longer dominate its conditional variance asymptotically, ensuring that $\hat{\beta}_{I,S_h^c}$ achieves $\sqrt{n}$-consistency given $X$ and $Z$. Note that the estimator $\hat{m}_{I,S_h^c}(Z)$ of $m(Z)$ computed with $h = n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, is 'undersmoothed' relative to that computed with $h = h_{AMSE}$, since $n^{-\alpha} < n^{-1/5}$.

4.5 Generalization to Local Polynomials of Higher Degree

The asymptotic results in this chapter focus on the local linear backfitting estimator $\hat{\beta}_{I,S_h^c}$. A natural question that arises is whether these results generalize to the local polynomial backfitting estimator of $\beta$. The latter estimator is obtained from (4.1) by replacing $S_h^c$, the smoother matrix for locally linear regression, with the smoother matrix for locally polynomial regression of degree $D > 1$.
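As a small arithmetic aside on the rate conditions of Section 4.4, the sketch below tabulates the order of the ratio between squared bias and variance, $h^4/n^{-1} = n^{1-4\alpha}$, for $h = n^{-\alpha}$. It ignores all constants and only illustrates the orders discussed in the text; the function names are hypothetical.

```python
def squared_bias_to_variance_order(n, alpha):
    # With h = n^(-alpha): squared bias ~ h^4, variance ~ 1/n,
    # so the ratio behaves like n^(1 - 4 * alpha). Constants are ignored.
    h = n ** (-alpha)
    return h ** 4 * n

def in_root_n_range(alpha):
    # Range of exponents derived in Section 4.4: alpha in [1/4, 1/3).
    return 0.25 <= alpha < 1.0 / 3.0

for alpha in (0.20, 0.25, 0.30):
    ratios = [squared_bias_to_variance_order(n, alpha) for n in (1e2, 1e4, 1e6)]
    print(alpha, in_root_n_range(alpha), ratios)
```

For $\alpha = 1/5$, the rate of $h_{AMSE}$, the ratio grows with $n$, which is the sense in which the squared bias dominates; for $\alpha \in [1/4, 1/3)$ it stays bounded.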
See Chapter 3 in Fan and Gijbels (1996) for a definition of locally polynomial regression.

Recall that $\hat{\beta}_{I,S_h^c}$ has conditional bias of order $O_P(h^2)$ and conditional variance of order $O_P(n^{-1})$ by Theorems 4.1.1 and 4.2.1. In keeping with the locally polynomial regression literature, we conjecture that the local polynomial backfitting estimator of $\beta$ has conditional bias of order $O_P(h^{D+1})$ and conditional variance of order $O_P(n^{-1})$. Note that we may need boundary corrections if $D$ is even. If our conjecture holds, we see that the conditional variance of the local polynomial backfitting estimator of $\beta$ is of the same order as that of $\hat{\beta}_{I,S_h^c}$. However, the conditional bias of the local polynomial backfitting estimator of $\beta$ is of smaller order than that of $\hat{\beta}_{I,S_h^c}$.

In Section 4.4 we established that $\hat{\beta}_{I,S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$ provided $h$ converges to zero at rate $n^{-\alpha}$, $\alpha \in [1/4, 1/3)$. To ensure that the local polynomial backfitting estimator of $\beta$ is $\sqrt{n}$-consistent given $X$ and $Z$, we conjecture that $h$ should converge to zero at rate $n^{-\alpha}$, $\alpha \in [1/(2D+2), 1/3)$.

4.6 Appendix

Throughout this Appendix, the assumptions and notation introduced in Chapter 2 of this thesis hold, unless otherwise specified. The first result provides an asymptotic bias expression that will be useful for proving subsequent results.

Lemma 4.6.1 Let $S_h = (S_{ij})$ be the uncentered smoother matrix defined by equations (3.6)-(3.8) and $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n)S_h$. Let $r = (r(Z_1),\dots,r(Z_n))^T$, where $r(\cdot) : [0,1] \to \mathbb{R}$ is a smooth function having three continuous derivatives and the $Z_i$'s are fixed design points satisfying condition (A3). Furthermore, let $K$ be a kernel function satisfying condition (A5) whose moments $\nu_l(K,z,h)$, $z \in [0,1]$, $l = 0,1,2,3$, are defined as in (2.17). If $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, then the $j$th element of the vector $(S_h - I)r$ can be approximated as:
$$[(S_h - I)r]_j = B_r(K,Z_j,h)\cdot h^2 + o(h^2) \tag{4.34}$$
uniformly in $Z_j$, $j = 1,\dots,n$, where
$$B_r(K,z,h) = \frac{r''(z)}{2}\cdot\frac{\nu_2(K,z,h)^2 - \nu_1(K,z,h)\,\nu_3(K,z,h)}{\nu_2(K,z,h)\,\nu_0(K,z,h) - \nu_1(K,z,h)^2}, \quad z \in [0,1]. \tag{4.35}$$
Furthermore, if $r^T\mathbf{1} = 0$, then the $j$th element of the vector $(S_h^c - I)r$ can be approximated as:
$$[(S_h^c - I)r]_j = B_r(K,Z_j,h)\cdot h^2 - \left(\frac{1}{n}\sum_{k=1}^n B_r(K,Z_k,h)\right)\cdot h^2 + o(h^2). \tag{4.36}$$

Proof: For $i = 1,\dots,n$, let $y_i = r(Z_i) + \epsilon_i$, with the $\epsilon_i$'s independent, identically distributed random variables with mean 0 and standard deviation $\sigma_\epsilon \in (0,\infty)$. Set $y = (y_1,\dots,y_n)^T$; if $\hat{r}(Z_j) = [S_hy]_j$ is the local linear estimator of $r(Z_j)$ obtained by smoothing $y$ on $Z_1,\dots,Z_n$ via the local linear smoother matrix $S_h$, then $Bias(\hat{r}(Z_j)) = [(S_h - I)r]_j$. Standard results on the asymptotic bias of a local linear estimator yield that $Bias(\hat{r}(Z_j))$ is of order $h^2$, with asymptotic constant $B_r(K,Z_j,h)$, uniformly in $Z_j$, $j = 1,\dots,n$ (Fan and Gijbels, 1993). So the proof of (4.34) is complete.

The definition of $S_h^c$ and $r^T\mathbf{1} = 0$ allow us to write:
$$[(S_h^c - I)r]_j = \left[\left(\left(I - \frac{\mathbf{1}\mathbf{1}^T}{n}\right)S_h - I\right)r\right]_j = [(S_h - I)r]_j - \left[\frac{\mathbf{1}\mathbf{1}^T}{n}S_hr\right]_j = [(S_h - I)r]_j - \left[\frac{\mathbf{1}\mathbf{1}^T}{n}(S_h - I)r\right]_j.$$
Substituting (4.34) in the above result yields (4.36).

The next result establishes the boundedness of a function defined in terms of certain moments of a kernel function $K(\cdot)$. Subsequent results rely on this lemma.

Lemma 4.6.2 Let $K(\cdot)$ be a kernel function satisfying condition (A5) and whose moments $\nu_l(K,z,h)$, $z \in [0,1]$, $l = 0,1,2,3$, are defined as in (2.17). Then, for $h_0 \in [0,1/2]$ small enough and $l = 1,2,3$, we have:
$$\sup_{h \in [0,h_0]}\;\sup_{z \in [0,1]}\left|\frac{\nu_l(K,z,h)}{\nu_2(K,z,h)\,\nu_0(K,z,h) - \nu_1(K,z,h)^2}\right| < \infty. \tag{4.37}$$

Proof: For $z \in [0,1]$, we define the function:
$$\frac{\nu_l(K,z,h)}{\nu_2(K,z,h)\,\nu_0(K,z,h) - \nu_1(K,z,h)^2}. \tag{4.38}$$
To establish the desired result, it suffices to show that, for any $l = 1,2,3$, this function is bounded when restricted to the intervals $[h, 1-h]$, $[0,h]$ and $[1-h,1]$, where $h < h_0$ for some $h_0 \in [0,1/2]$ small enough, and that the three bounds do not depend on $h$.
Let $l = 1,2,3$ be fixed and let $h < h_0$ for some $h_0 \in [0,1/2]$ small enough. The restriction of the function in (4.38) to the interval $[h, 1-h]$ is trivially bounded, as $\nu_l(K,z,h) = \nu_l(K)$ for any $z \in [h, 1-h]$. Clearly, the bound of this restriction does not depend on $h$.

To show that the restriction of this function to the interval $[0,h]$ is also bounded, let us note that, if $z \in [0,h]$, there exists $\alpha \in [0,1]$ such that $z = \alpha h$ and so
$$\nu_l(K,z,h) = \int_{-z/h}^{(1-z)/h}s^lK(s)\,ds = \int_{-\alpha}^{1/h-\alpha}s^lK(s)\,ds = \int_{-\alpha}^{1}s^lK(s)\,ds \equiv \phi_l(\alpha)$$
since $h < h_0$. Thus, when restricted to the interval $[0,h]$, the function in (4.38) is equivalent to:
$$\frac{\phi_l(\alpha)}{\phi_0(\alpha)\phi_2(\alpha) - \phi_1(\alpha)^2} \equiv \frac{\phi_l(\alpha)}{D(\alpha)},$$
where $\alpha \in [0,1]$. To establish boundedness, it suffices to show that the numerator $\phi_l(\alpha)$ is bounded from above while the denominator $D(\alpha)$ is bounded from below for any $\alpha \in [0,1]$ and $l = 1,2,3$. To bound $\phi_l(\alpha)$, note that:
$$|\phi_l(\alpha)| \leq \int|s|^lK(s)\,ds < \infty$$
since $K(\cdot)$ is a continuous function with compact support. To bound $|D(\cdot)|$ from below, we show that $D(\cdot)$ is non-decreasing on $[0,1]$ and satisfies $D(0) > 0$. As
$$D'(\alpha) = \phi_0'(\alpha)\cdot\phi_2(\alpha) + \phi_0(\alpha)\cdot\phi_2'(\alpha) - 2\phi_1(\alpha)\cdot\phi_1'(\alpha),$$
and
$$\frac{d}{d\alpha}\int_{-\alpha}^{1}s^lK(s)\,ds = (-1)^l\alpha^lK(-\alpha) = (-1)^l\alpha^lK(\alpha)$$
for any $l = 0,1,2$ (using Leibniz's Rule and the symmetry of $K$), we obtain:
$$D'(\alpha) = K(\alpha)\left(\int_{-\alpha}^{1}s^2K(s)\,ds + \alpha^2\int_{-\alpha}^{1}K(s)\,ds + 2\alpha\int_{-\alpha}^{1}sK(s)\,ds\right).$$
Since $K$ is non-negative and symmetric about 0, each term above is non-negative and so $D'(\alpha) \geq 0$, that is, $D(\cdot)$ is non-decreasing on $[0,1]$. Further, with $K^*(s)$ the density $K(s)/\int_0^1K(s)\,ds = 2K(s)$, we obtain:
$$D(0) = \int_0^1K(s)\,ds\cdot\int_0^1s^2K(s)\,ds - \left(\int_0^1sK(s)\,ds\right)^2 = \frac{1}{4}\left[\int_0^1s^2K^*(s)\,ds - \left(\int_0^1sK^*(s)\,ds\right)^2\right].$$
Thus, $D(0) = Var(D^*)/4 > 0$, with $D^*$ a random variable with density $K^*$. Finally, note that the upper bound $\int|s|^lK(s)\,ds/D(0)$ of the function $\phi_l(\alpha)/D(\alpha)$, $\alpha \in [0,1]$, does not depend on $h$.
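The boundary argument above can be checked numerically for a concrete kernel. The sketch below assumes the Epanechnikov kernel $K(s) = 0.75(1 - s^2)$ on $[-1,1]$ as one example of a symmetric, non-negative, compactly supported kernel; this choice is an illustration only. It evaluates $\phi_l(\alpha)$ in closed form and inspects $D(\alpha) = \phi_0(\alpha)\phi_2(\alpha) - \phi_1(\alpha)^2$ on a grid. The names `phi` and `D` mirror the symbols in the proof.

```python
import numpy as np

def phi(l, alpha):
    # phi_l(alpha) = integral over [-alpha, 1] of s^l * K(s),
    # with K(s) = 0.75 * (1 - s^2) (assumed Epanechnikov kernel).
    def antiderivative(s):
        # Derivative of this expression in s is s^l * 0.75 * (1 - s^2).
        return 0.75 * (s ** (l + 1) / (l + 1) - s ** (l + 3) / (l + 3))
    return antiderivative(1.0) - antiderivative(-alpha)

def D(alpha):
    # Denominator of the function in (4.38) restricted to z = alpha * h.
    return phi(0, alpha) * phi(2, alpha) - phi(1, alpha) ** 2

alphas = np.linspace(0.0, 1.0, 101)
D_values = D(alphas)
D_at_zero = D(0.0)

# Bound from the proof for l = 1: integral of |s| K(s) over [-1, 1], divided by D(0).
abs_moment_1 = 2.0 * phi(1, 0.0)
ratio_1 = np.abs(phi(1, alphas)) / D_values

print(D_at_zero, D_values.min(), ratio_1.max(), abs_moment_1 / D_at_zero)
```

On this grid $D$ should be smallest at $\alpha = 0$ and increase towards $\alpha = 1$, so the ratio $\phi_1(\alpha)/D(\alpha)$ stays below the $h$-free bound $\int|s|K(s)\,ds/D(0)$.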
A similar argument can be employed to establish that, when $h < h_0$, with $h_0 \in [0,1/2]$, the restriction of the function defined in (4.38) to the interval $[1-h, 1]$ is bounded.

Now, we use Lemma 4.6.1 and Lemma 4.6.2 to derive asymptotic expressions for the Euclidean norms of the biases which can occur when using locally linear regression to estimate a smooth, unknown function $r(\cdot)$.

Lemma 4.6.3 Let $r$, $S_h$ and $S_h^c$ be as in Lemma 4.6.1. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:
$$\frac{1}{n+1}\|(I - S_h)r\|_2^2 = \frac{\nu_2(K)^2}{4}\int_0^1r''(z)^2f(z)\,dz\cdot h^4 + o(h^4). \tag{4.39}$$
If $r$ also satisfies $\mathbf{1}^Tr = 0$, then:
$$\frac{1}{n+1}\|(I - S_h^c)r\|_2^2 = \frac{\nu_2(K)^2}{4}\left[\int_0^1r''(z)^2f(z)\,dz - \left(\int_0^1r''(z)f(z)\,dz\right)^2\right]\cdot h^4 + o(h^4). \tag{4.40}$$

Proof: To establish (4.39), use Lemma 4.6.1 to get:
$$\frac{1}{n+1}\|(I - S_h)r\|_2^2 = \frac{1}{n+1}\sum_{j=1}^n\left[B_r(K,Z_j,h)\cdot h^2 + o(h^2)\right]^2 = \left(\frac{1}{n+1}\sum_{j=1}^nB_r(K,Z_j,h)^2\right)\cdot h^4 + o(h^4). \tag{4.41}$$
The last equality uses the boundedness of $B_r(K,z,h)$ for all $z \in [0,1]$ and $h < h_0$, with $h_0 \in [0,1/2]$ small enough, which is a consequence of Lemma 4.6.2 and the boundedness of $r''(\cdot)$. Now, we use $B_r(K,z,h) = r''(z)\nu_2(K)/2$ for $z \in [h, 1-h]$ to write:
$$\frac{1}{n+1}\sum_{j=1}^nB_r(K,Z_j,h)^2 = \frac{\nu_2(K)^2}{4(n+1)}\sum_{j=1}^nr''(Z_j)^2 - \frac{\nu_2(K)^2}{4(n+1)}\sum_{Z_j \notin [h,1-h]}r''(Z_j)^2 + \frac{1}{n+1}\sum_{Z_j \notin [h,1-h]}B_r(K,Z_j,h)^2.$$
The first term can be shown to equal $(\nu_2(K)^2/4)\int_0^1r''(z)^2f(z)\,dz + o(1)$ by a Riemann integration argument. The second term is $o(1)$, as the sum contains $O(nh)$ terms and $r''(z)$ is bounded for $z \notin [h, 1-h]$. The third term is also $o(1)$, as the sum contains $O(nh)$ terms that have been shown to be bounded for $h$ small enough. Combining these results yields (4.39).

To establish (4.40), we use the fact that $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n)S_h$ (equation (3.1)) and $\mathbf{1}^Tr = 0$ to obtain:
$$\frac{1}{n+1}\|(I - S_h^c)r\|_2^2 = \frac{1}{n+1}\sum_{j=1}^n\left[(S_h^c - I)r\right]_j^2.$$
Substituting (4.36) in the above yields (4.40).

The following result provides a probability bound for a linear combination of independent, identically distributed random variables having zero mean and non-zero, finite variance.

Lemma 4.6.4 Let $\xi = (\xi_1,\dots,\xi_n)^T$ be a vector whose components are independent and identically distributed real-valued random variables. If $E(\xi_1) = 0$ and $0 < Var(\xi_1) < \infty$, then:
$$\xi^Tc = O_P(\|c\|_2) \tag{4.42}$$
for any real-valued vector $c = (c_1,\dots,c_n)^T$.

Proof: By Chebychev's Theorem, we have:
$$\xi^Tc = E\left(\sum_{k=1}^nc_k\xi_k\right) + O_P\left(\sqrt{Var\left(\sum_{k=1}^nc_k\xi_k\right)}\right) = 0 + O_P\left(\sqrt{Var(\xi_1)}\,\|c\|_2\right) = O_P(\|c\|_2).$$

The next lemma provides asymptotic approximations for the elements $S_{ij}$, $i,j = 1,\dots,n$, of the local linear smoother matrix $S_h$ defined in (3.6)-(3.8). These approximations are used to obtain uniform bounds for the elements of $S_h$.

Lemma 4.6.5 Let $S_{ij}$, $i,j = 1,\dots,n$, be local linear smoothing weights defined as in (3.6)-(3.8). Also, let $K(\cdot)$ and $\nu_l(K,z,h)$, $z \in [0,1]$, $l = 0,1,2$, be as in Lemma 4.6.2. Furthermore, let $Z_i$, $i = 1,\dots,n$, be design points with density function $f(\cdot)$ satisfying condition (A3). Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:
$$S_{ij} \approx \frac{1}{f(Z_i)(n+1)h}\cdot\frac{\nu_2(K,Z_i,h) - \frac{Z_i - Z_j}{h}\,\nu_1(K,Z_i,h)}{\nu_2(K,Z_i,h)\,\nu_0(K,Z_i,h) - \nu_1(K,Z_i,h)^2}\cdot K\left(\frac{Z_i - Z_j}{h}\right) \tag{4.43}$$
uniformly in $Z_i$, $i = 1,\dots,n$. Furthermore, for all $h < h_0$, with $h_0 \in [0,1/2]$ small enough, there exists a positive constant $C$ so that:
$$|S_{ij}| \leq \frac{C}{(n+1)h}\,I(|Z_i - Z_j| \leq h) \tag{4.44}$$
uniformly in $Z_i$ and $Z_j$, $i,j = 1,\dots,n$.

Proof: Using the definition of $S_{ij}$ in (3.6)-(3.8) and the fact that $\sum_{j=1}^nw_j^{(Z_i)} = S_{n,2}(Z_i)S_{n,0}(Z_i) - S_{n,1}(Z_i)^2$, we write:
$$(n+1)hS_{ij} = \frac{(n+1)hS_{n,2}(Z_i)}{S_{n,2}(Z_i)S_{n,0}(Z_i) - S_{n,1}(Z_i)^2}\,K\left(\frac{Z_i - Z_j}{h}\right) - \frac{(n+1)h^2S_{n,1}(Z_i)}{S_{n,2}(Z_i)S_{n,0}(Z_i) - S_{n,1}(Z_i)^2}\,K\left(\frac{Z_i - Z_j}{h}\right)\left(\frac{Z_i - Z_j}{h}\right). \tag{4.45}$$
Let $l = 0,1,2,3$ be fixed. By the definition of $S_{n,l}(\cdot)$ in (3.8), the design condition (A3) on the $Z_j$'s and a Riemann integration argument, we obtain that the following asymptotic expression for $S_{n,l}(Z_i)/[(n+1)h^{l+1}]$:
$$\frac{1}{(n+1)h^{l+1}}S_{n,l}(Z_i) = \frac{1}{(n+1)h}\sum_{j=1}^nK\left(\frac{Z_i - Z_j}{h}\right)\left(\frac{Z_i - Z_j}{h}\right)^l = \frac{1}{h}\int_0^1K\left(\frac{Z_i - z}{h}\right)\left(\frac{Z_i - z}{h}\right)^lf(z)\,dz + O(n^{-1}h^{-2})$$
holds uniformly with respect to $Z_i$, $i = 1,\dots,n$, as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$.
Making the change of variables $s = (Z_i - z)/h$ and using a Taylor series expansion of $f(\cdot)$, we express the leading term in the above asymptotic expression as:
$$\begin{aligned}\frac{1}{h}\int_0^1K\left(\frac{Z_i - z}{h}\right)\left(\frac{Z_i - z}{h}\right)^lf(z)\,dz &= \int_{-Z_i/h}^{(1-Z_i)/h}s^lK(s)f(Z_i + sh)\,ds \\ &= \int_{-Z_i/h}^{(1-Z_i)/h}s^lK(s)\left[f(Z_i) + f'(Z_i)\cdot(sh) + \frac{f''(Z_i)}{2}\cdot(sh)^2 + o(h^2)\right]ds \\ &= \int_{-Z_i/h}^{(1-Z_i)/h}s^lK(s)\left[f(Z_i) + O(h)\right]ds \\ &= f(Z_i)\int_{-Z_i/h}^{(1-Z_i)/h}s^lK(s)\,ds + O(h) \\ &= f(Z_i)\,\nu_l(K,Z_i,h) + O(h).\end{aligned}$$
Here, the $O$ term holds uniformly with respect to $Z_i$, $i = 1,\dots,n$, by the smoothness assumptions on $f(\cdot)$ given in condition (A3). Combining these results, we conclude that:
$$\frac{1}{(n+1)h^{l+1}}S_{n,l}(Z_i) = f(Z_i)\,\nu_l(K,Z_i,h) + O(h) + O(n^{-1}h^{-2}) \tag{4.46}$$
uniformly in $Z_i$, $i = 1,\dots,n$, as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. Now, for $l = 0,1,2,3$, we substitute the asymptotic expression of $S_{n,l}(Z_i)/[(n+1)h^{l+1}]$ in (4.46) in the right side of equation (4.45). Using that the quantities $f(z)$, $K(z)$ and $zK(z)$ are bounded for $z \in [0,1]$ (conditions (A3) and (A5), respectively) and re-arranging, we easily obtain (4.43). The asymptotic bound for $S_{ij}$ given in (4.44) follows immediately from Lemma 4.6.2 and (4.43).

The following result follows easily from Lemma 4.6.5. This result will be used to prove Lemma 4.6.7.

Lemma 4.6.6 Let $S_h$ be as in Lemma 4.6.1. Given $C > 0$, there exist $C_1^* > 0$ and $C_2^* > 0$ such that for any $n \geq 1$ and any $v = (v_1,\dots,v_n)^T$ with $|v_j| \leq C$, we have:
$$\left|[S_h^Tv]_k\right| \leq C_1^* \tag{4.47}$$
and
$$\left|[S_hv]_k\right| \leq C_2^*. \tag{4.48}$$
Furthermore, we also have:
$$\|S_h^Tv\|_2^2 \leq n(C_1^*)^2 \tag{4.49}$$
and
$$\|S_hv\|_2^2 \leq n(C_2^*)^2. \tag{4.50}$$

Proof: Use result (4.44) of Lemma 4.6.5 to write:
$$\left|\sum_{j=1}^nS_{jk}v_j\right| \leq \sum_{j=1}^n|S_{jk}||v_j| \leq C\sum_{j=1}^n|S_{jk}| \leq C\cdot\frac{1}{(n+1)h}\cdot O(nh) = O(1).$$
This proves (4.47). Result (4.48) can be derived using a similar reasoning. By result (4.47), we have:
$$\|S_h^Tv\|_2^2 = (S_h^Tv)^T(S_h^Tv) = \sum_{k=1}^n[S_h^Tv]_k^2 \leq n(C_1^*)^2,$$
so (4.49) is proven. Result (4.50) can be shown to hold in a similar manner.

Now, we use Lemmas 4.6.5 and 4.6.6 to establish the following asymptotic bounds.
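The entrywise bound (4.44) and the bounds of Lemma 4.6.6 can be explored numerically by building a local linear smoother matrix directly from the weights. The sketch below is illustrative only: it assumes an Epanechnikov kernel and equally spaced design points $Z_i = i/(n+1)$, and the function names are hypothetical. It checks that rows sum to one, that linear functions are reproduced, that $S_{ij} = 0$ when $|Z_i - Z_j| > h$, and that $(n+1)h\,|S_{ij}|$ and the absolute row sums remain moderate.

```python
import numpy as np

def epanechnikov(u):
    # Assumed kernel: symmetric, non-negative, supported on [-1, 1].
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_smoother(z, h):
    # S[i, j] = w_j / sum_j w_j with
    # w_j = K((Z_i - Z_j)/h) * [S_{n,2}(Z_i) - (Z_i - Z_j) * S_{n,1}(Z_i)].
    diff = z[:, None] - z[None, :]                      # diff[i, j] = Z_i - Z_j
    k = epanechnikov(diff / h)
    s1 = np.sum(k * diff, axis=1, keepdims=True)        # S_{n,1}(Z_i)
    s2 = np.sum(k * diff ** 2, axis=1, keepdims=True)   # S_{n,2}(Z_i)
    w = k * (s2 - diff * s1)
    return w / np.sum(w, axis=1, keepdims=True)

n, h = 200, 0.1                                         # illustrative values only
z = np.arange(1, n + 1) / (n + 1)
S = local_linear_smoother(z, h)

row_sums = S.sum(axis=1)                                # local linear fits reproduce constants
linear_err = np.max(np.abs(S @ z - z))                  # ... and linear functions
outside = np.abs(z[:, None] - z[None, :]) > h           # pairs outside the smoothing window
scaled_max = np.max(np.abs(S)) * (n + 1) * h            # plays the role of C in (4.44)
max_abs_row_sum = np.max(np.sum(np.abs(S), axis=1))     # bounds |[S_h v]_k| when |v_j| <= 1

print(scaled_max, max_abs_row_sum)
```

The absolute row sum exceeds one only near the boundary, where local linear weights can be negative; in the interior all weights are non-negative for this kernel.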
Lemma 4.6.7 Let $r$, $S_h^c$ and $I$ be as in Lemma 4.6.1. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:
$$\|r\|_2 = O(n^{1/2}), \tag{4.51}$$
$$\|S_h^{cT}r\|_2 = O(n^{1/2}), \tag{4.52}$$
$$\|(I - S_h^c)^Tr\|_2 = O(n^{1/2}), \tag{4.53}$$
and
$$\|S_h^c\|_F = O(h^{-1/2}). \tag{4.54}$$

Proof: Using the boundedness of $r(\cdot)$, we write:
$$\|r\|_2^2 = r^Tr = \sum_{i=1}^nr(Z_i)^2 = O(n),$$
so (4.51) is proven. Using $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n)S_h$ and result (4.49) of Lemma 4.6.6 with $v = (I - \mathbf{1}\mathbf{1}^T/n)r$, we have:
$$\|S_h^{cT}r\|_2^2 = \left\|S_h^T\left(I - \frac{\mathbf{1}\mathbf{1}^T}{n}\right)r\right\|_2^2 = \|S_h^Tv\|_2^2 \leq n\cdot(C_1^*)^2$$
for some $C_1^* > 0$ not depending on $n$. This proves (4.52). Result (4.53) follows immediately from results (4.51) and (4.52). Finally, to show result (4.54), we use well-known properties of the Frobenius norm to get:
$$\|S_h^c\|_F = \left\|\left(I - \frac{\mathbf{1}\mathbf{1}^T}{n}\right)S_h\right\|_F \leq \|S_h\|_F + \frac{1}{n}\|\mathbf{1}\mathbf{1}^T\|_F\cdot\|S_h\|_F \leq 2\|S_h\|_F.$$
Thus, it suffices to show that $\|S_h\|_F$ is of order $O(h^{-1/2})$. By result (4.44) of Lemma 4.6.5, we obtain:
$$\|S_h\|_F^2 = \sum_{i=1}^n\sum_{j=1}^nS_{ij}^2 \leq \sum_{i=1}^n\sum_{j=1}^n\frac{C^2}{(n+1)^2h^2}\,I(|Z_i - Z_j| \leq h)$$
for some positive constant $C$. Since the number of non-zero terms in the double sum appearing on the right side of the above inequality is $nO(nh)$, we conclude that $\|S_h\|_F^2$ is $O(h^{-1})$ or, equivalently, that $\|S_h\|_F$ is $O(h^{-1/2})$.

The next result provides a probability bound for the Euclidean norm of a vector of $n$ independent, identically distributed random variables having zero mean and non-zero, finite variance. It also provides a probability bound for the Euclidean norm of a transformation of this vector, obtained by pre-multiplying the vector with the transpose of a centered local linear smoother matrix.

Lemma 4.6.8 Let $\xi$ be as in Lemma 4.6.4 and $S_h^c$ be as in Lemma 4.6.1. Furthermore, let $\Omega$ be an $n \times n$ symmetric, positive definite matrix with $\|\Omega\|_S = O(1)$. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:
$$\|\xi\|_2 = O_P(n^{1/2}), \tag{4.55}$$
$$\|S_h^{cT}\Omega\xi\|_2 = O_P(h^{-1/2}), \tag{4.56}$$
$$\|S_h^c\Omega\xi\|_2 = O_P(h^{-1/2}). \tag{4.57}$$

Proof: By Markov's Theorem:
$$\|\xi\|_2^2 = O_P\left(E(\|\xi\|_2^2)\right) = O_P(nVar(\xi_1)) = O_P(n),$$
so (4.55) is proven. Next, consider (4.56). Set $B = \Omega S_h^c$.
By Markov's Theorem, we have:
$$\|S_h^{cT}\Omega\xi\|_2^2 = \|B^T\xi\|_2^2 = O_P\left(E(\|B^T\xi\|_2^2)\right) = O_P\left(E(\xi^TBB^T\xi)\right).$$
Thus, it suffices to show that $E(\xi^TBB^T\xi)$ is $O(h^{-1})$. Using result (2.24) with $u = \xi$ and $A = BB^T$, together with the symmetry of $\Omega$, we obtain:
$$E(\xi^TBB^T\xi) = \mathrm{trace}\left(BB^T\cdot Var(\xi)\right) + E(\xi)^T\cdot BB^T\cdot E(\xi) = Var(\xi_1)\cdot\mathrm{trace}(BB^T) + 0 = Var(\xi_1)\cdot\|B\|_F^2 \leq Var(\xi_1)\cdot\|\Omega\|_S^2\cdot\|S_h^c\|_F^2 = O(1)\,O(h^{-1}) = O(h^{-1}),$$
by result (4.54) of Lemma 4.6.7. This proves (4.56). Result (4.57) can be established using a similar argument.

The next lemma contains results concerning the asymptotic negligibility of various random or non-random terms. All of these terms depend on a matrix of weights $\Omega$ and on centered or uncentered local linear smoother matrices. Some terms also depend on a matrix of weights $\Omega^*$, possibly different from $\Omega$ itself.

Lemma 4.6.9 Let $\Omega$ and $\Omega^*$ be $n \times n$ symmetric, positive-definite matrices satisfying $\|\Omega\|_S = O(1) = \|\Omega^*\|_S$. Let $S_h$ and $S_h^c$ be as in Lemma 4.6.1. Set $r = (r(Z_1),\dots,r(Z_n))^T$ and $r^* = (r^*(Z_1),\dots,r^*(Z_n))^T$, where $r(\cdot) : [0,1] \to \mathbb{R}$ and $r^*(\cdot) : [0,1] \to \mathbb{R}$ are smooth functions having three continuous derivatives and the $Z_i$'s are fixed design points satisfying condition (A3). Finally, let $\xi = (\xi_1,\dots,\xi_n)^T$ and $\xi^* = (\xi_1^*,\dots,\xi_n^*)^T$ be vectors whose components are independent, identically distributed random variables such that $E(\xi_1) = 0$, $Var(\xi_1) < \infty$ and $E(\xi_1^*) = 0$, $Var(\xi_1^*) < \infty$. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:
$$\frac{1}{n+1}r^{*T}\Omega(I - S_h)r = O(h^2), \tag{4.58}$$
$$\frac{1}{n(n+1)}r^{*T}\Omega\mathbf{1}\mathbf{1}^T(S_h - I)r = O(h^2), \tag{4.59}$$
$$\frac{1}{n+1}r^{*T}\Omega(I - S_h^c)\xi = O_P(n^{-1/2}h^{-1/2}), \tag{4.60}$$
$$\frac{1}{n+1}\xi^{*T}\Omega(I - S_h^c)r = O_P(n^{-1/2}h^2), \tag{4.61}$$
$$\frac{1}{n+1}\xi^{*T}\Omega S_h^c\xi = O_P(n^{-1/2}h^{-1/2}), \tag{4.62}$$
$$\frac{1}{n+1}\xi^{*T}\Omega S_h^{cT}\xi = O_P(n^{-1/2}h^{-1/2}), \tag{4.63}$$
$$\frac{1}{n+1}\xi^{*T}\Omega S_h^c\Omega^*S_h^{cT}\Omega\xi = O_P(n^{-1}h^{-1}), \tag{4.64}$$
$$\frac{1}{n+1}\xi^{*T}\Omega S_h^{cT}\Omega^*S_h^c\Omega\xi = O_P(n^{-1}h^{-1}). \tag{4.65}$$

Proof: Using properties of matrix and vector norms introduced in Section 2.4 of Chapter 2, we get:
$$\left|\frac{1}{n+1}r^{*T}\Omega(I - S_h)r\right| \leq \frac{1}{n+1}\|r^*\|_2\cdot\|\Omega\|_S\cdot\|(I - S_h)r\|_2 = \frac{1}{n+1}O(n^{1/2})\,O(1)\,O(n^{1/2}h^2) = O(h^2)$$
since $\|r^*\|_2$ is $O(n^{1/2})$ by result (4.51) with $r = r^*$ and $\|(I - S_h)r\|_2^2/(n+1)$ is $O(h^4)$ by result (4.39). Thus, (4.58) holds. Similarly, we obtain:
$$\left|\frac{1}{n(n+1)}r^{*T}\Omega\mathbf{1}\mathbf{1}^T(S_h - I)r\right| \leq \frac{1}{n(n+1)}\|r^*\|_2\cdot\|\Omega\|_S\cdot\|\mathbf{1}\mathbf{1}^T\|_F\cdot\|(S_h - I)r\|_2 = \frac{1}{n(n+1)}O(n^{1/2})\,O(1)\,O(n)\,O(n^{1/2}h^2) = O(h^2),$$
so (4.59) holds.

Using result (4.42) with $c = (I - S_h^c)^T\Omega r^*$, we have:
$$\frac{1}{n+1}r^{*T}\Omega(I - S_h^c)\xi = \frac{1}{n+1}O_P\left(\|(I - S_h^c)^T\Omega r^*\|_2\right) \leq \frac{1}{n+1}O_P\left((1 + \|S_h^c\|_F)\cdot\|\Omega\|_S\cdot\|r^*\|_2\right) = \frac{1}{n+1}O_P(h^{-1/2})\,O_P(1)\,O_P(n^{1/2}) = O_P(n^{-1/2}h^{-1/2}),$$
since $\|S_h^c\|_F$ is $O(h^{-1/2})$ by result (4.54) and $\|r^*\|_2$ is $O(n^{1/2})$ by result (4.51) with $r = r^*$. We conclude that (4.60) holds.

From result (4.42) with $c = \Omega(I - S_h^c)r$ and $\xi = \xi^*$, we get:
$$\frac{1}{n+1}\xi^{*T}\Omega(I - S_h^c)r = \frac{1}{n+1}O_P\left(\|\Omega(I - S_h^c)r\|_2\right) \leq \frac{1}{n+1}O_P\left(\|\Omega\|_S\cdot\|(I - S_h^c)r\|_2\right) = \frac{1}{n+1}O_P(1)\,O_P(n^{1/2}h^2) = O_P(n^{-1/2}h^2),$$
since $\|(I - S_h^c)r\|_2^2/(n+1)$ is $O(h^4)$ by result (4.40). Therefore, (4.61) holds.

To prove (4.62), write:
$$\left|\frac{1}{n+1}\xi^{*T}\Omega S_h^c\xi\right| \leq \frac{1}{n+1}\|\xi^*\|_2\cdot\|\Omega\|_S\cdot\|S_h^c\xi\|_2 = \frac{1}{n+1}O_P(n^{1/2})\cdot O(1)\cdot O_P(h^{-1/2}) = O_P(n^{-1/2}h^{-1/2}),$$
since $\|\xi^*\|_2$ is $O_P(n^{1/2})$ by result (4.55) with $\xi = \xi^*$ and $\|S_h^c\xi\|_2$ is $O_P(h^{-1/2})$ by result (4.57) with $\Omega = I$. Result (4.63) follows via a similar argument, but with result (4.57) replaced by result (4.56).

Result (4.64) follows by noting that:
$$\left|\frac{1}{n+1}\xi^{*T}\Omega S_h^c\Omega^*S_h^{cT}\Omega\xi\right| \leq \frac{1}{n+1}\|S_h^{cT}\Omega\xi^*\|_2\cdot\|\Omega^*\|_S\cdot\|S_h^{cT}\Omega\xi\|_2 \leq \frac{1}{n+1}O_P(h^{-1/2})\,O(1)\,O_P(h^{-1/2}) = O_P(n^{-1}h^{-1}),$$
since both $\|S_h^{cT}\Omega\xi^*\|_2$ and $\|S_h^{cT}\Omega\xi\|_2$ are $O_P(h^{-1/2})$ by Lemma 4.6.8. A similar reasoning yields that (4.65) holds. This concludes our proof of the current lemma.

The next lemma provides asymptotic expressions for quantities involving the bias of a local linear estimator of an unknown, smooth regression function $m(\cdot)$.
Lemma 4.6.10 Let $G$ be as in (2.14) and $S_h$ be as in Lemma 4.6.1. Furthermore, let $m = (m(Z_1),\ldots,m(Z_n))^T$, where $m$ satisfies the smoothness conditions in condition (A4) and $Z_1,\ldots,Z_n$ are fixed design points satisfying condition (A3). Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:

$$\frac{1}{n+1}\,G^T(I - S_h)m = -h^2\,\frac{\nu_2(K)}{2}\int_0^1 g(z)m''(z)f(z)\,dz + o(h^2), \tag{4.66}$$
$$\frac{1}{n(n+1)}\,G^T 11^T(S_h - I)m = h^2\,\frac{\nu_2(K)}{2}\int_0^1 g(z)f(z)\,dz\cdot\int_0^1 m''(z)f(z)\,dz + o(h^2), \tag{4.67}$$

where $\int_0^1 g(z)f(z)\,dz$ and $\int_0^1 g(z)m''(z)f(z)\,dz$ are defined as in equations (2.18) and (2.19).

Proof: Let $i = 0,1,\ldots,p$ be fixed. By result (4.34) of Lemma 4.6.1 with $r = m$, the $(i+1)$st element of $G^T(I - S_h)m/(n+1)$ is:

$$\left[\frac{1}{n+1}\,G^T(I - S_h)m\right]_{i+1} = -\frac{1}{n+1}\sum_{j=1}^n g_i(Z_j)\,[(S_h - I)m]_j = -h^2\left[\frac{1}{n+1}\sum_{j=1}^n g_i(Z_j)B_m(K,Z_j,h)\right] + o(h^2).$$

Noting that $B_m(K,z,h) = m''(z)\nu_2(K)/2$ for $z \in [h, 1-h]$, we write:

$$\frac{1}{n+1}\sum_{j=1}^n g_i(Z_j)B_m(K,Z_j,h) = \frac{\nu_2(K)}{2(n+1)}\sum_{j=1}^n g_i(Z_j)m''(Z_j) - \frac{\nu_2(K)}{2(n+1)}\sum_{Z_j \notin [h,1-h]} g_i(Z_j)m''(Z_j) + \frac{1}{n+1}\sum_{Z_j \notin [h,1-h]} g_i(Z_j)B_m(K,Z_j,h).$$

The first term can be shown to equal $(\nu_2(K)/2)\int_0^1 g_i(z)m''(z)f(z)\,dz + o(1)$ by a Riemann integration argument. The second and third terms are $o(1)$, as both sums contain $O(nh)$ terms and these terms are bounded for $h$ small enough, by the following argument. The boundedness of $m''(z)$ is a consequence of condition (A4). Lemma 4.6.2 yields that the function $z \mapsto B_m(K,z,h)$ is bounded for all $z \in [0,1]$ and $h \le h_0$, with $h_0 \in [0,1/2]$ small enough. Combining these results yields (4.66).

Now, consider (4.67). Since the first column of $G$ is the vector $1$, from (4.66):

$$\frac{1}{n+1}\,1^T(I - S_h)m = -h^2\,\frac{\nu_2(K)}{2}\int_0^1 m''(z)f(z)\,dz + o(h^2).$$

Combining this with (4.10) proves (4.67).

The next result concerns the existence of an inverse for the $(p+1)\times(p+1)$ matrix $V$ defined in (2.20). We do not provide a proof for this result, as one can easily verify that $VV^{-1} = V^{-1}V = I$ using the expression for $V^{-1}$ given below.
Lemma 4.6.11 Let $V = \Sigma^{(0)} + \int_0^1 g(z)f(z)\,dz\cdot\int_0^1 g(z)^T f(z)\,dz$ be the $(p+1)\times(p+1)$ matrix introduced in (2.20) and set $a = \left(\int_0^1 g_1(z)f(z)\,dz,\ldots,\int_0^1 g_p(z)f(z)\,dz\right)^T$. Also, let $\Sigma = (\Sigma_{ij})$ be the variance-covariance matrix introduced in condition (A0)-(ii). Then $V^{-1}$ exists and is given by:

$$V^{-1} = \begin{pmatrix} 1 + a^T\Sigma^{-1}a & -a^T\Sigma^{-1} \\ -\Sigma^{-1}a & \Sigma^{-1} \end{pmatrix} \tag{4.68}$$

provided $\Sigma^{-1}$ exists.

The last two lemmas in this Appendix provide several useful asymptotic bounds.

Lemma 4.6.12 Suppose the assumptions in Theorem 4.2.1 hold. Then:

$$\frac{1}{(n+1)^2}\,V^{-1}G^T(I - S_h^c)\Psi(I - S_h^c)^T G V^{-1} = O(n^{-1}). \tag{4.69}$$

Proof: Since the elements of the $(p+1)\times(p+1)$ matrix $V^{-1}$ do not depend upon $n$, it suffices to show that $G^T(I - S_h^c)\Psi(I - S_h^c)^T G/(n+1)^2$ is $O(n^{-1})$. It is enough to show that $G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T G_{j+1}/(n+1)^2$ is $O(n^{-1})$ for any $i,j = 0,1,\ldots,p$.

Let $i,j = 0,1,\ldots,p$ be fixed. Using vector and matrix norm properties introduced in Section 2.4, we obtain:

$$\left|\frac{1}{(n+1)^2}\,G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T G_{j+1}\right| \le \frac{1}{(n+1)^2}\|(I - S_h^c)^T G_{i+1}\|_2\cdot\|\Psi\|_S\cdot\|(I - S_h^c)^T G_{j+1}\|_2 = \frac{1}{(n+1)^2}\,O(n^{1/2})\cdot O(1)\cdot O(n^{1/2}) = O(n^{-1}),$$

since $\|(I - S_h^c)^T G_{i+1}\|_2 = O(n^{1/2}) = \|(I - S_h^c)^T G_{j+1}\|_2$ by result (4.53) of Lemma 4.6.7 with $r = G_{i+1}$ and $r = G_{j+1}$, respectively, and $\|\Psi\|_S = O(1)$ by condition (A1)-(ii). Thus, $G_{i+1}^T(I - S_h^c)\Psi(I - S_h^c)^T G_{j+1}/(n+1)^2$ is $O(n^{-1})$.

Lemma 4.6.13 Suppose the assumptions in Theorems 4.1.1 and 4.2.1 hold. Let $\hat m_{I,S_h^c}(Z)$ be the local linear backfitting estimator of $m(Z)$ defined in (4.32), where $Z \in [0,1]$ is fixed. Also, let $\tilde m_{I,S_h^c}(Z)$ denote the local linear backfitting estimator of $m(Z)$ that would be obtained if $\beta$ were known precisely:

$$\tilde m_{I,S_h^c}(Z) = \ldots \tag{4.70}$$

where the $w_j$'s are as in (4.33). Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:

$$E(\tilde m_{I,S_h^c}(Z)\,|\,X,Z) - m(Z) = O(h^2), \tag{4.71}$$
$$Var(\tilde m_{I,S_h^c}(Z)\,|\,X,Z) = O\left(\frac{1}{nh}\right) \tag{4.72}$$

and

$$E(\hat m_{I,S_h^c}(Z)\,|\,X,Z) - m(Z) = O(h^2), \tag{4.73}$$
$$Var(\hat m_{I,S_h^c}(Z)\,|\,X,Z) = O\left(\frac{1}{nh}\right). \tag{4.74}$$

Proof: The proof of (4.71) and (4.72) can be found in Francisco-Fernandez and Vilar-Fernandez (2001), so we omit it. To prove (4.73), use the definitions of $\hat m_{I,S_h^c}(Z)$ and $\tilde m_{I,S_h^c}(Z)$ in (4.32) and (4.70) to write:

$$\hat m_{I,S_h^c}(Z) = \tilde m_{I,S_h^c}(Z) - w^T X(\hat\beta_{I,S_h^c} - \beta).$$

Thus:

$$E(\hat m_{I,S_h^c}(Z)\,|\,X,Z) - m(Z) = \left\{E(\tilde m_{I,S_h^c}(Z)\,|\,X,Z) - m(Z)\right\} - w^T X\left\{E(\hat\beta_{I,S_h^c}\,|\,X,Z) - \beta\right\} \tag{4.75}$$

and

$$Var(\hat m_{I,S_h^c}(Z)\,|\,X,Z) = Var(\tilde m_{I,S_h^c}(Z)\,|\,X,Z) - 2w^T X\cdot Cov(\hat\beta_{I,S_h^c}, \tilde m_{I,S_h^c}(Z)\,|\,X,Z) + w^T X\cdot Var(\hat\beta_{I,S_h^c}\,|\,X,Z)\cdot X^T w. \tag{4.76}$$

Result (4.73) follows by combining (4.75) and (4.71) and using that $Bias(\hat\beta_{I,S_h^c}\,|\,X,Z)$ is $O_P(h^2)$ by Theorem 4.1.1 and $w^T X$ is $O(1)$. The latter result is easy to establish using the fact that the $w_j$'s are bounded by Lemma 4.6.5. Result (4.74) follows by combining (4.76) and (4.72) and using that $Var(\hat\beta_{I,S_h^c}\,|\,X,Z)$ is $O_P(1/n)$ by Theorem 4.2.1, $Cov(\hat\beta_{I,S_h^c}, \tilde m_{I,S_h^c}(Z)\,|\,X,Z)$ is $O_P(1/(nh))$ by a Cauchy-Schwarz argument and $w^T X$ is $O(1)$.

Chapter 5

Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators

In this chapter, we investigate the asymptotic behavior of the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$ of $\beta$, with $\Psi$ being the true correlation matrix of the model errors. Recall that an explicit expression for $\hat\beta_{\Psi^{-1},S_h^c}$ can be obtained from (3.4) by taking $\Omega = \Psi^{-1}$ and replacing the generic smoother matrix with the centered local linear smoother $S_h^c$:

$$\hat\beta_{\Psi^{-1},S_h^c} = \left(X^T\Psi^{-1}(I - S_h^c)X\right)^{-1}X^T\Psi^{-1}(I - S_h^c)Y. \tag{5.1}$$

To simplify the proofs of the asymptotic results derived in this chapter, we consider that the model errors satisfy assumption (A2), that is, they are consecutive realizations from a stationary AR process of finite order $R$. Assumption (A2) is a special case of the assumption (A1) considered in Chapter 4.

The structure of this chapter is similar to that of Chapter 4, where we studied the asymptotic behaviour of $\hat\beta_{I,S_h^c}$. In the first part of the chapter, we study the asymptotic behaviour of $\hat\beta_{\Psi^{-1},S_h^c}$.
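As a concrete illustration of the explicit expression (5.1), the following sketch computes the estimator on synthetic data. It is a minimal sketch under assumptions, not the thesis's implementation: it uses an Epanechnikov kernel, an AR(1) correlation matrix $\Psi_{ij} = \phi^{|i-j|}$, the centering $S_h^c = (I - 11^T/n)S_h$, and hypothetical function names. The data are noise-free with a linear, zero-sum $m$, a case in which $(I - S_h^c)m = 0$ and the estimator should recover $\beta$ up to rounding.

```python
import numpy as np

def local_linear_smoother(z, h):
    """n x n local linear smoother matrix S_h (Epanechnikov kernel, an assumption)."""
    n = len(z)
    S = np.zeros((n, n))
    for i in range(n):
        d = z - z[i]
        u = d / h
        w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)
        D = np.column_stack([np.ones(n), d])
        DtW = (w[:, None] * D).T
        # first row of (D^T W D)^{-1} D^T W holds the weights of the fit at z[i]
        S[i] = np.linalg.solve(DtW @ D, DtW)[0]
    return S

def centered_smoother(S):
    """S_h^c = (I - 11^T/n) S_h."""
    n = S.shape[0]
    return (np.eye(n) - np.ones((n, n)) / n) @ S

def ar1_correlation(n, phi):
    """Correlation matrix of n consecutive AR(1) observations."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

def modified_backfitting_beta(X, Y, S_c, Psi):
    """Evaluate (X^T Psi^{-1} (I - S_c) X)^{-1} X^T Psi^{-1} (I - S_c) Y."""
    n = len(Y)
    M = np.linalg.solve(Psi, np.eye(n) - S_c)   # Psi^{-1} (I - S_c)
    return np.linalg.solve(X.T @ M @ X, X.T @ M @ Y)

rng = np.random.default_rng(0)
n = 60
z = np.arange(1, n + 1) / (n + 1)
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * z) + rng.normal(size=n)])
beta = np.array([1.0, -2.0])
m = z - z.mean()                                # linear in z and sums to zero
Y = X @ beta + m                                # noise-free responses
S_c = centered_smoother(local_linear_smoother(z, h=0.2))
beta_hat = modified_backfitting_beta(X, Y, S_c, ar1_correlation(n, 0.5))
print(beta_hat)
```

With a non-linear $m$ the term $X^T\Psi^{-1}(I - S_h^c)m$ no longer vanishes, which is the source of the finite sample bias discussed in this chapter.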
The proofs of the asymptotic results concerning $\hat\beta_{\Psi^{-1},S_h^c}$ are however more complicated than those concerning $\hat\beta_{I,S_h^c}$ for the following reason: the exact conditional bias and variance of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$ depend on $\Psi^{-1}$, whereas the exact conditional bias and variance of $\hat\beta_{I,S_h^c}$ given $X$ and $Z$ do not depend on $\Psi^{-1}$. Next, we mention how the asymptotic results concerning the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$ can be generalized to local polynomials of higher degree. We then provide sufficient conditions for the estimators $\hat\beta_{\hat\Psi^{-1},S_h^c}$ and $\hat\beta_{\Psi^{-1},S_h^c}$ to be asymptotically 'close'. The chapter concludes with an Appendix containing several auxiliary results.

5.1 Exact Conditional Bias of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$

Just like the usual local linear backfitting estimate $\hat\beta_{I,S_h^c}$, the modified local linear backfitting estimate $\hat\beta_{\Psi^{-1},S_h^c}$ suffers from finite sample bias. Indeed, using the explicit expression of $\hat\beta_{\Psi^{-1},S_h^c}$ given in equation (5.1), we obtain the exact conditional bias of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$ as:

$$E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta = \left(X^T\Psi^{-1}(I - S_h^c)X\right)^{-1}X^T\Psi^{-1}(I - S_h^c)m, \tag{5.2}$$

an expression which generally does not equal zero.

Theorem 5.1.1 below provides an asymptotic expression for the conditional bias of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$. These derivations assume that the value of $h$ in $S_h^c$ is deterministic and satisfies conditions (2.12)-(2.13).

Theorem 5.1.1 Let $V_\Psi$ and $W$ be defined as in equations (2.21) - (2.22). Under conditions (A0) and (A2)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional bias of the modified local linear backfitting estimate $\hat\beta_{\Psi^{-1},S_h^c}$ of $\beta$, given $X$ and $Z$, is:

$$E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta = -h^2\,\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2 V_\Psi^{-1}W + o_P(h^2). \tag{5.3}$$

Comment 5.1.1 Aneiros Perez and Quintela del Rio (2001a) investigated the large sample properties of an estimator similar to $\hat\beta_{\Psi^{-1},S_h^c}$, namely $\hat\beta_{(I-K_h)^T\Psi^{-1},K_h}$, the unconstrained modified Speckman estimator in (3.12).
Under similar assumptions as ours, Aneiros Perez and Quintela del Rio obtained a faster rate for the asymptotic conditional bias of their estimator, namely $O_P(h^4)$. As seen in (5.3), the rate we obtained for the asymptotic conditional bias of $\hat\beta_{\Psi^{-1},S_h^c}$ is $O_P(h^2)$. However, they did not provide asymptotic constants for this bias, like we do in (5.3). They obtained the same rate of convergence for the asymptotic conditional variance of their estimator as we did for that of $\hat\beta_{\Psi^{-1},S_h^c}$, namely $O_P(1/n)$. Just like us, they do provide an asymptotic constant for this variance.

Proof of Theorem 5.1.1: Let:

$$B_{n,\Psi} = \frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)X, \tag{5.4}$$

where the dependence of $B_{n,\Psi}$ upon $h$ is omitted for convenience. We will see below that when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, $B_{n,\Psi}$ converges in probability to the quantity $V_\Psi$ defined in equation (2.22). Since $V_\Psi$ is non-singular by Lemma 5.7.6, the explicit expression for $\hat\beta_{\Psi^{-1},S_h^c}$ in (5.1) holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. We can use this expression to write:

$$\hat\beta_{\Psi^{-1},S_h^c} = B_{n,\Psi}^{-1}\cdot\left\{\frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)Y\right\}, \tag{5.5}$$

which holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. Taking conditional expectation on both sides of (5.5) and subtracting $\beta$ yields:

$$E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta = B_{n,\Psi}^{-1}\cdot\left\{\frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)m\right\}. \tag{5.6}$$

We now show that $B_{n,\Psi}$ converges in probability to $V_\Psi$ as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, that is:

$$B_{n,\Psi} = V_\Psi + o_P(1). \tag{5.7}$$

Using the fact that $X = G + \eta$ (equation (2.16)), $B_{n,\Psi}$ can be decomposed as:

$$B_{n,\Psi} = \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)G + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\eta + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)G + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)\eta. \tag{5.8}$$

From equation (3.1) applied to $S_h$, $S_h^c = (I - 11^T/n)S_h$, so re-writing the first term, expanding the last term and re-arranging yields:

$$B_{n,\Psi} = \frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T G + \frac{1}{n+1}\,\eta^T\Psi^{-1}\eta + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h)G + \frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T(S_h - I)G + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\eta + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)G - \frac{1}{n+1}\,\eta^T\Psi^{-1}S_h^c\eta. \tag{5.9}$$

To establish (5.7), it suffices to show that

$$\frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T G = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2\int_0^1 g(z)f(z)\,dz\int_0^1 g(z)^T f(z)\,dz + o(1), \tag{5.10}$$
$$\frac{1}{n+1}\,\eta^T\Psi^{-1}\eta = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 + \sum_{k=1}^R\phi_k^2\right)\Sigma^{(0)} + o_P(1), \tag{5.11}$$

while the remaining terms are $o_P(1)$.

The proof of (5.10) is immediate by writing

$$\frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T G = \left(\frac{1}{n+1}\,G^T\Psi^{-1}1\right)\cdot\left(\frac{1}{n+1}\,G^T 1\right)^T\cdot\left(1 + \frac{1}{n}\right)$$

and using Lemma 5.7.3 in the Appendix of this chapter and result (4.10). Result (5.11) is proven in Lemma 5.7.4.

To prove the remaining terms in (5.9) are $o_P(1)$, it suffices to show that the quantities $G_{i+1}^T\Psi^{-1}(I - S_h)G_{j+1}/(n+1)$, $G_{i+1}^T\Psi^{-1}11^T(S_h - I)G_{j+1}/(n(n+1))$, $G_{i+1}^T\Psi^{-1}(I - S_h^c)\eta_{j+1}/(n+1)$, $\eta_{i+1}^T\Psi^{-1}(I - S_h^c)G_{j+1}/(n+1)$ and $\eta_{i+1}^T\Psi^{-1}S_h^c\eta_{j+1}/(n+1)$ are $o_P(1)$. These facts follow from lemmas appearing in the Appendices of this and the preceding chapter.

First consider $G_{i+1}^T\Psi^{-1}(I - S_h)G_{j+1}/(n+1)$. By result (4.58) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $r = G_{j+1}$, this quantity is $O(h^2) = o(1)$. Similarly, from result (4.59) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $r = G_{j+1}$, we have that $G_{i+1}^T\Psi^{-1}11^T(S_h - I)G_{j+1}/(n(n+1))$ is $O(h^2) = o(1)$. By result (4.60) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{j+1}$, we have that $G_{i+1}^T\Psi^{-1}(I - S_h^c)\eta_{j+1}/(n+1)$ is $O_P(n^{-1/2}h^{-1/2}) = o_P(1)$. Using a similar reasoning with (4.61) of Lemma 4.6.9, we obtain that $\eta_{i+1}^T\Psi^{-1}(I - S_h^c)G_{j+1}/(n+1)$ is also $o_P(1)$. Finally, consider $\eta_{i+1}^T\Psi^{-1}S_h^c\eta_{j+1}/(n+1)$. By result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{j+1}$, this quantity is $O_P(n^{-1/2}h^{-1/2}) = o_P(1)$. This concludes our proof of (5.7).
By Lemma 5.7.6 in the Appendix of this chapter, the matrix $V_\Psi$ on the right side of (5.7) is non-singular and admits an inverse $V_\Psi^{-1}$, so (5.7) leads to:

$$B_{n,\Psi}^{-1} = V_\Psi^{-1} + o_P(1). \tag{5.12}$$

To prove the theorem, by (5.6) and (5.12), it suffices to show that:

$$\frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)m = -h^2\,\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2 W + o_P(h^2). \tag{5.13}$$

From equation (2.16), $X = G + \eta$, so:

$$\frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)m = \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)m + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)m.$$

Using the identifiability condition on $m$ in (2.4) and $S_h^c = (I - 11^T/n)S_h$, we obtain:

$$\frac{1}{n+1}\,X^T\Psi^{-1}(I - S_h^c)m = \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h)m + \frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T(S_h - I)m + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h)m + \frac{1}{n(n+1)}\,\eta^T\Psi^{-1}11^T(S_h - I)m. \tag{5.14}$$

By Lemma 5.7.5, the first two terms on the right side of (5.14) are equal to the right side of (5.13). Now, consider $\eta_{i+1}^T\Psi^{-1}(I - S_h)m/(n+1)$, the $(i+1)$th element of the third term in (5.14). Using result (4.42) of Lemma 4.6.4 with $c = \Psi^{-1}(I - S_h)m$ and $\xi = \eta_{i+1}$, together with spectral norm properties introduced in Section 2.4, we obtain:

$$\frac{1}{n+1}\,\eta_{i+1}^T\Psi^{-1}(I - S_h)m = \frac{1}{n+1}\,O_P\left(\|\Psi^{-1}(I - S_h)m\|_2\right) = \frac{1}{n+1}\,O_P\left(\|\Psi^{-1}\|_S\cdot\|(I - S_h)m\|_2\right) = o_P(h^2).$$

The last equality was obtained by using that $\|\Psi^{-1}\|_S$ is bounded (result (5.35) of Lemma 5.7.2) and $\|(I - S_h)m\|_2 = O(n^{1/2}h^2)$ by result (4.39) of Lemma 4.6.3 with $r = m$.

Finally, consider $\eta_{i+1}^T\Psi^{-1}11^T(S_h - I)m/(n(n+1))$, the $(i+1)$th element of the fourth term in (5.14). Using a similar reasoning as above, we obtain:

$$\frac{1}{n(n+1)}\,\eta_{i+1}^T\Psi^{-1}11^T(S_h - I)m = \frac{1}{n(n+1)}\,O_P\left(\|\Psi^{-1}11^T(S_h - I)m\|_2\right) = \frac{1}{n(n+1)}\,O_P\left(\|\Psi^{-1}\|_S\cdot\|11^T\|_F\cdot\|(I - S_h)m\|_2\right) = \frac{1}{n+1}\,O_P\left(\|\Psi^{-1}\|_S\cdot\|(I - S_h)m\|_2\right) = o_P(h^2).$$

This proves (5.13) and completes our proof of Theorem 5.1.1.

5.2 Exact Conditional Variance of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$

In this section, we derive an asymptotic expression for the exact conditional variance of $\hat\beta_{\Psi^{-1},S_h^c}$, given $X$, $Z$:

$$Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) = \frac{\sigma_\epsilon^2}{(n+1)^2}\,B_{n,\Psi}^{-1}\cdot X^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}X\cdot\left(B_{n,\Psi}^{-1}\right)^T \tag{5.15}$$

where $B_{n,\Psi}$ is defined as in (5.4).
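The above equality was obtained by using the explicit formula of $\hat\beta_{\Psi^{-1},S_h^c}$ in (5.1), together with the fact that $Var(Y\,|\,X,Z) = \sigma_\epsilon^2\Psi$ by condition (A2).

Theorem 5.2.1 Under conditions (A0) and (A2)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional variance of the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$ of $\beta$, given $X$ and $Z$, is:

$$Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) = \frac{\sigma_\epsilon^2}{n+1}\cdot\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 + \sum_{k=1}^R\phi_k^2\right)V_\Psi^{-1}\Sigma^{(0)}V_\Psi^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}\,V_\Psi^{-1}G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G V_\Psi^{-1} + o_P\left(\frac{1}{n}\right). \tag{5.16}$$

Comment 5.2.1 By Lemma 5.7.7 in the Appendix of this chapter, the second term in the above asymptotic expression for $Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z)$ is $O_P(n^{-1})$ and hence it does not dominate the first term, which is $O_P(n^{-1})$.

Proof of Theorem 5.2.1: By (5.15), we have:

$$Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) = \frac{\sigma_\epsilon^2}{n+1}\,B_{n,\Psi}^{-1}\cdot C_{n,\Psi}\cdot\left(B_{n,\Psi}^{-1}\right)^T, \tag{5.17}$$

where $C_{n,\Psi} = X^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}X/(n+1)$. Since $B_{n,\Psi}^{-1}$ converges in probability to $V_\Psi^{-1}$ by result (5.12), to prove the theorem it suffices to show that:

$$C_{n,\Psi} = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 + \sum_{k=1}^R\phi_k^2\right)\Sigma^{(0)} + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G + o_P(1). \tag{5.18}$$

This fact is shown below with the help of lemmas in the Appendix of this and the preceding chapter.

By (2.16), $X = G + \eta$, so $C_{n,\Psi}$ can be decomposed as:

$$C_{n,\Psi} = \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}\eta + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}\eta.$$

Expanding the last term and re-arranging yields:

$$C_{n,\Psi} = \frac{1}{n+1}\,\eta^T\Psi^{-1}\eta + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G + \frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}\eta + \frac{1}{n+1}\,\eta^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G - \frac{1}{n+1}\,\eta^T\Psi^{-1}S_h^c\eta - \frac{1}{n+1}\,\eta^T(S_h^c)^T\Psi^{-1}\eta + \frac{1}{n+1}\,\eta^T\Psi^{-1}S_h^c\Psi(S_h^c)^T\Psi^{-1}\eta. \tag{5.19}$$

The first term in the above converges to the first term on the right side of (5.18) by Lemma 5.7.4. The second term in the above is the same as the second term on the right side of (5.18). To show the remaining terms are $o_P(1)$, it suffices to establish that $G_{i+1}^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}\eta_{j+1}/(n+1)$, $\eta_{i+1}^T(S_h^c)^T\Psi^{-1}\eta_{j+1}/(n+1)$ and $\eta_{i+1}^T\Psi^{-1}S_h^c\Psi(S_h^c)^T\Psi^{-1}\eta_{j+1}/(n+1)$ are $o_P(1)$ for all $i,j = 0,1,\ldots,p$.

Let $i,j = 0,1,\ldots,p$ be fixed.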
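From result (4.42) of Lemma 4.6.4 with $c = \Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G_{i+1}$ and $\xi = \eta_{j+1}$ and from the spectral norm properties introduced in Section 2.4 of Chapter 2, we get:

$$\frac{1}{n+1}\,\eta_{j+1}^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G_{i+1} \le \frac{1}{n+1}\,O_P\left(\|\Psi^{-1}\|_S^2\cdot(1 + \|S_h^c\|_F)^2\cdot\|\Psi\|_S\cdot\|G_{i+1}\|_2\right) = O_P(n^{-1/2}h^{-1}) = o_P(1).$$

To derive the above result, we used Lemma 5.7.2 to obtain that $\|\Psi\|_S$ and $\|\Psi^{-1}\|_S$ are $O(1)$. We also used the fact that $\|S_h^c\|_F$ is $O(h^{-1/2})$ by result (4.54) of Lemma 4.6.7, while $\|G_{i+1}\|_2$ is $O(n^{1/2})$ (take $r = G_{i+1}$ in result (4.51) of Lemma 4.6.7).

Next, consider $\eta_{i+1}^T(S_h^c)^T\Psi^{-1}\eta_{j+1}/(n+1) = \eta_{j+1}^T\Psi^{-1}S_h^c\eta_{i+1}/(n+1)$. This quantity is $O_P(n^{-1/2}h^{-1/2}) = o_P(1)$ by result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{j+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{i+1}$.

Finally, $\eta_{i+1}^T\Psi^{-1}S_h^c\Psi(S_h^c)^T\Psi^{-1}\eta_{j+1}/(n+1)$ is $O_P(n^{-1}h^{-1}) = o_P(1)$ by result (4.64) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi^{-1}$, $\Omega^* = \Psi$ and $\xi = \eta_{j+1}$.

5.3 Exact Conditional Measure of Accuracy of $\hat\beta_{\Psi^{-1},S_h^c}$ Given $X$ and $Z$

Any suitable criterion for measuring the accuracy of $\hat\beta_{\Psi^{-1},S_h^c}$ should take into account both bias and variance effects. We use the following measure of accuracy for $\hat\beta_{\Psi^{-1},S_h^c}$, which combines these effects in a natural fashion:

$$E\left(\|\hat\beta_{\Psi^{-1},S_h^c} - \beta\|_2^2\,\big|\,X,Z\right) = \left\{E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta\right\}^T\left\{E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta\right\} + \mathrm{trace}\left\{Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z)\right\}. \tag{5.20}$$

Using equation (5.20) above together with Theorem 5.1.1 and Theorem 5.2.1 we obtain the following result:

Corollary 5.3.1 Assume that the conditions in Theorem 5.1.1 and Theorem 5.2.1 hold. Then, when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:

$$E\left(\|\hat\beta_{\Psi^{-1},S_h^c} - \beta\|_2^2\,\big|\,X,Z\right) = h^4\,\frac{\sigma_\epsilon^4}{\sigma_u^4}\left(1 - \sum_{k=1}^R\phi_k\right)^4 W^T V_\Psi^{-2}W + \frac{\sigma_\epsilon^2}{n+1}\cdot\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 + \sum_{k=1}^R\phi_k^2\right)\mathrm{trace}\left\{V_\Psi^{-1}\Sigma^{(0)}V_\Psi^{-1}\right\} + \frac{\sigma_\epsilon^2}{(n+1)^2}\,\mathrm{trace}\left\{V_\Psi^{-1}G^T\Psi^{-1}(I - S_h^c)\Psi(I - S_h^c)^T\Psi^{-1}G V_\Psi^{-1}\right\} + o_P(h^4) + o_P\left(\frac{1}{n}\right). \tag{5.21}$$

5.4 The $\sqrt{n}$-consistency of $\hat\beta_{\Psi^{-1},S_h^c}$

Just as with the usual backfitting estimator $\hat\beta_{I,S_h^c}$, we would like the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$ to be $\sqrt{n}$-consistent given $X$ and $Z$, that is, we would like $E(\|\hat\beta_{\Psi^{-1},S_h^c} - \beta\|_2^2\,|\,X,Z)$ to be $O_P(n^{-1})$. By result (5.21) of Corollary 5.3.1, $E(\|\hat\beta_{\Psi^{-1},S_h^c} - \beta\|_2^2\,|\,X,Z)$ is $O_P(h^4) + O_P(n^{-1})$.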
This result is due to the fact that the conditional variance of $\hat\beta_{\Psi^{-1},S_h^c}$ is $O_P(n^{-1})$ but its conditional bias is $O_P(h^2)$. We are interested in assessing at what rate the smoothing parameter $h$ should converge to zero so that the squared conditional bias of $\hat\beta_{\Psi^{-1},S_h^c}$ tends to zero, but has the same order of magnitude as the conditional variance of $\hat\beta_{\Psi^{-1},S_h^c}$.

A similar argument as that employed in Section 4.4 yields that $h$ should converge to zero at rate $n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, to ensure that the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$, exactly as for the usual local linear backfitting estimator $\hat\beta_{I,S_h^c}$. Indeed, with $h = n^{-\alpha}$ the squared bias $h^4 = n^{-4\alpha}$ is $O(n^{-1})$ exactly when $\alpha \ge 1/4$, while the requirement $nh^3 \to \infty$ forces $\alpha < 1/3$. Note that $n^{-\alpha} < n^{-1/5}$, so we must 'undersmooth' $\hat m_{\Psi^{-1},S_h^c}$ to achieve $\sqrt{n}$-consistency of $\hat\beta_{\Psi^{-1},S_h^c}$ given $X$ and $Z$. Here, $n^{-1/5}$ is the 'usual' rate of convergence for $h$, which we believe is optimal for estimating $m$ via $\hat m_{\Psi^{-1},S_h^c}$.

5.5 Generalization to Local Polynomials of Higher Degree

The asymptotic results in Sections 5.1-5.4 concern the modified local linear backfitting estimator $\hat\beta_{\Psi^{-1},S_h^c}$. We believe these results readily generalize to the modified local polynomial backfitting estimator of $\beta$. The latter estimator is obtained from (5.1) by replacing $S_h^c$, the smoother matrix for locally linear regression, with the smoother matrix for locally polynomial regression of degree $D > 1$.

In keeping with the locally polynomial regression literature, we conjecture that the modified local polynomial backfitting estimator of $\beta$ has conditional bias of order $O_P(h^{D+1})$ and conditional variance of order $O_P(n^{-1})$. Note that we may need boundary corrections if $D$ is even. We also conjecture that $h$ should converge to zero at rate $n^{-\alpha}$, $\alpha \in [1/(2D+2), 1/3)$, for the modified local polynomial backfitting estimator of $\beta$ to be $\sqrt{n}$-consistent given $X$ and $Z$.

5.6 The $\sqrt{n}$-consistency of $\hat\beta_{\hat\Psi^{-1},S_h^c}$

The estimated modified local linear backfitting estimator $\hat\beta_{\hat\Psi^{-1},S_h^c}$ can be obtained from (5.1) by replacing $\Psi$ with an estimator $\hat\Psi$:

$$\hat\beta_{\hat\Psi^{-1},S_h^c} = \left(X^T\hat\Psi^{-1}(I - S_h^c)X\right)^{-1}X^T\hat\Psi^{-1}(I - S_h^c)Y. \tag{5.22}$$

Deriving asymptotic approximations for the exact conditional bias and variance of $\hat\beta_{\hat\Psi^{-1},S_h^c}$ given $X$ and $Z$ is not possible, as these quantities are not tractable. The reason for this is that $\hat\Psi$ is random, since it is computed from the data. In this section, we give sufficient conditions for $\hat\beta_{\hat\Psi^{-1},S_h^c}$ and $\hat\beta_{\Psi^{-1},S_h^c}$ to be asymptotically 'close', in the sense that the difference between these estimators is $o_P(n^{-1/2})$. Our conditions (5.23) and (5.24) are similar to those imposed by Aneiros Perez and Quintela del Rio (2001a) for establishing the asymptotic equivalence of their modified and estimated modified versions of the Speckman estimator.

Theorem 5.6.1 Suppose that the conditions in Theorems 5.1.1 and 5.2.1 hold. In addition, suppose that:

$$\frac{1}{n}\,X^T(\hat\Psi^{-1} - \Psi^{-1})(I - S_h^c)X = o_P(1), \tag{5.23}$$
$$\frac{1}{\sqrt{n}}\,X^T(\hat\Psi^{-1} - \Psi^{-1})(I - S_h^c)(m + \epsilon) = o_P(1). \tag{5.24}$$

Then, if $h = n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, we have:

$$\hat\beta_{\hat\Psi^{-1},S_h^c} = \hat\beta_{\Psi^{-1},S_h^c} + o_P\left(\frac{1}{\sqrt{n}}\right). \tag{5.25}$$

Proof: To establish (5.25), it suffices to show:

$$\sqrt{n}\,(\hat\beta_{\hat\Psi^{-1},S_h^c} - \beta) = \sqrt{n}\,(\hat\beta_{\Psi^{-1},S_h^c} - \beta) + o_P(1). \tag{5.26}$$

Using the expression for $\hat\beta_{\hat\Psi^{-1},S_h^c}$ in (5.22) and $Y = X\beta + m + \epsilon$ (equation (2.1)), we write the left side of (5.26) as:

$$\sqrt{n}\,(\hat\beta_{\hat\Psi^{-1},S_h^c} - \beta) = \left(\frac{1}{n}X^T\hat\Psi^{-1}(I - S_h^c)X\right)^{-1}\cdot\frac{1}{\sqrt{n}}X^T\hat\Psi^{-1}(I - S_h^c)(m + \epsilon)$$
$$= \left(\frac{1}{n}X^T\Psi^{-1}(I - S_h^c)X + o_P(1)\right)^{-1}\cdot\left(\frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon) + o_P(1)\right)$$
$$= \left[\left(\frac{1}{n}X^T\Psi^{-1}(I - S_h^c)X\right)^{-1} + o_P(1)\right]\cdot\left(\frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon) + o_P(1)\right)$$
$$= \left(\frac{1}{n}X^T\Psi^{-1}(I - S_h^c)X\right)^{-1}\frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon) + \left(\frac{1}{n}X^T\Psi^{-1}(I - S_h^c)X\right)^{-1}\cdot o_P(1) + \frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon)\cdot o_P(1) + o_P(1).$$

By the definition of $\hat\beta_{\Psi^{-1},S_h^c}$ in (5.1) we have:

$$\sqrt{n}\,(\hat\beta_{\hat\Psi^{-1},S_h^c} - \beta) = \sqrt{n}\,(\hat\beta_{\Psi^{-1},S_h^c} - \beta) + \left(\frac{1}{n}X^T\Psi^{-1}(I - S_h^c)X\right)^{-1}\cdot o_P(1) + \frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon)\cdot o_P(1) + o_P(1).$$

Therefore, to prove (5.26), it is enough to show that $\left(X^T\Psi^{-1}(I - S_h^c)X/n\right)^{-1}$ and $X^T\Psi^{-1}(I - S_h^c)(m + \epsilon)/\sqrt{n}$ are $O_P(1)$.

To prove the first fact, let $B_{n,\Psi} = X^T\Psi^{-1}(I - S_h^c)X/(n+1)$.
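By (5.12), $B_{n,\Psi}^{-1} = V_\Psi^{-1} + o_P(1)$, with $V_\Psi$ as in (2.22), so $\left(X^T\Psi^{-1}(I - S_h^c)X/n\right)^{-1} = O_P(1)$. To prove the second fact, use $B_{n,\Psi} = V_\Psi + o_P(1)$ (result (5.7)) and Chebychev's Theorem to write:

$$\frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon) = \frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(Y - X\beta) = \frac{n+1}{\sqrt{n}}\cdot B_{n,\Psi}\cdot(\hat\beta_{\Psi^{-1},S_h^c} - \beta) = \frac{n+1}{\sqrt{n}}\cdot(V_\Psi + o_P(1))\cdot\left\{E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta + O_P\left(\sqrt{Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z)}\right)\right\}.$$

By result (5.3) of Theorem 5.1.1, $E(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z) - \beta$ is $O_P(h^2) = O_P(n^{-2\alpha})$. Also, by result (5.16) of Theorem 5.2.1, $Var(\hat\beta_{\Psi^{-1},S_h^c}\,|\,X,Z)$ is $O_P(n^{-1})$. Since $\alpha \ge 1/4$, we conclude:

$$\frac{1}{\sqrt{n}}X^T\Psi^{-1}(I - S_h^c)(m + \epsilon) = O_P(\sqrt{n})\cdot\left(O_P(n^{-2\alpha}) + O_P\left(\frac{1}{\sqrt{n}}\right)\right) = O_P(n^{1/2 - 2\alpha}) + O_P(1) = O_P(1).$$

This completes our proof of Theorem 5.6.1.

Theorem 5.6.1 implies that $\hat\beta_{\hat\Psi^{-1},S_h^c}$ is $\sqrt{n}$-consistent since $\hat\beta_{\Psi^{-1},S_h^c}$ is $\sqrt{n}$-consistent. One would expect the conditional bias and variance of $\hat\beta_{\hat\Psi^{-1},S_h^c}$ to be similar to those of $\hat\beta_{\Psi^{-1},S_h^c}$.

5.7 Appendix

Throughout this Appendix, we assume that the assumptions and notation introduced in Sections 2.2 and 2.3 of this thesis hold, unless otherwise specified. We also let $I(S)$ denote the indicator function of an arbitrary set $S$.

The first lemma in this Appendix shows that the correlation matrix of $n$ consecutive observations arising from a stationary autoregressive process of finite order $R$ is invertible. The lemma also provides an explicit formula for the inverse of this correlation matrix. A proof of this lemma can be found in David and Bastin (2001, Lemma 1).

Lemma 5.7.1 Let $\epsilon_1,\ldots,\epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2). If $\Psi$ is the correlation matrix of $\epsilon_1,\ldots,\epsilon_n$ defined in Comment 2.2.1, then its inverse exists and is given by:

$$\Psi^{-1} = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left[U^T U - V^T V\right] \tag{5.27}$$

where $U$ and $V$ are $n \times n$ Toeplitz lower triangular matrices defined as

$$[U]_{i,j} = \begin{cases} 1, & i = j, \\ -\phi_{i-j}, & 1 \le i - j \le R, \\ 0, & \text{otherwise}, \end{cases} \qquad [V]_{i,j} = \begin{cases} -\phi_{n-(i-j)}, & n - R \le i - j \le n - 1, \\ 0, & \text{otherwise}. \end{cases} \tag{5.28}$$

That is, $U$ has ones on its diagonal and $-\phi_k$ on its $k$th subdiagonal, $k = 1,\ldots,R$, while the only non-zero entries of $V$ lie in its last $R$ rows and first $R$ columns, with $-\phi_k$ on its $(n-k)$th subdiagonal.

Comment 5.7.1 Let $U$ be as in (5.28) and define

$$[U_{(k)}]_{i,j} = I(j = i - k,\ k + 1 \le i \le n) \tag{5.29}$$

for $k = 1,\ldots,R$.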
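The inverse formula (5.27) can be checked numerically in the AR(1) case. The sketch below is hedged: it assumes that $U$ carries $-\phi_k$ on its $k$th subdiagonal and that $V$ carries $-\phi_k$ on its $(n-k)$th subdiagonal, that the AR(1) correlation matrix is $\Psi_{ij} = \phi^{|i-j|}$, and that $\sigma_\epsilon^2/\sigma_u^2 = 1/(1-\phi^2)$ for AR(1); the helper names are hypothetical.

```python
import numpy as np

def shift_matrix(n, k):
    """U_(k) of (5.29): ones on the k-th subdiagonal, zeros elsewhere."""
    return np.eye(n, k=-k)

def ar_U_V(phi, n):
    """Toeplitz lower triangular U and V for AR coefficients phi = (phi_1, ..., phi_R)."""
    U = np.eye(n)
    V = np.zeros((n, n))
    for k, p in enumerate(phi, start=1):
        U -= p * shift_matrix(n, k)        # -phi_k on the k-th subdiagonal
        V -= p * shift_matrix(n, n - k)    # -phi_k on the (n-k)-th subdiagonal
    return U, V

# AR(1) check: Psi_ij = phi^|i-j| and sigma_eps^2 / sigma_u^2 = 1 / (1 - phi^2)
phi1, n = 0.6, 8
idx = np.arange(n)
Psi = phi1 ** np.abs(idx[:, None] - idx[None, :])
U, V = ar_U_V([phi1], n)
Psi_inv = (U.T @ U - V.T @ V) / (1.0 - phi1**2)

print(np.max(np.abs(Psi_inv @ Psi - np.eye(n))))   # should be at rounding level
```

For $R > 1$ the same construction applies, but the correlations and the ratio $\sigma_\epsilon^2/\sigma_u^2$ would have to be obtained from the Yule-Walker equations, which this sketch does not attempt.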
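Then it can be easily seen that

$$U = I - \phi_1 U_{(1)} - \cdots - \phi_R U_{(R)}.$$

Straightforward algebraic manipulations also yield

$$U^T U = -\sum_{k=1}^R\phi_k\left(U_{(k)}^T + U_{(k)}\right) + \sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right) + \sum_{k=1}^R\phi_k^2\,U_{(k)}^T U_{(k)} + I, \tag{5.30}$$

where

$$[U_{(k)}^T]_{i,j} = I(j = i + k,\ 1 \le i \le n - k), \tag{5.31}$$
$$[U_{(p)}^T U_{(q)}]_{i,j} = I(j = i + p - q,\ 1 - p + q \le i \le n - p), \tag{5.32}$$
$$[U_{(q)}^T U_{(p)}]_{i,j} = I(j = i - p + q,\ 1 \le i \le n - q), \tag{5.33}$$

for $k, p, q = 1,\ldots,R$ and $p < q$.

The next lemma shows that, if $\Psi$ is the correlation matrix of a sample of $n$ consecutive observations arising from a stationary AR process of finite order $R$, then its spectral norm is bounded. Furthermore, the spectral norm of $\Psi^{-1}$ is also bounded.

Lemma 5.7.2 Let $\epsilon_1,\ldots,\epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2). If $\Psi$ is the correlation matrix of $\epsilon_1,\ldots,\epsilon_n$ defined in Comment 2.2.1, then:

$$\|\Psi\|_S = O(1) \tag{5.34}$$

and

$$\|\Psi^{-1}\|_S = O(1). \tag{5.35}$$

Proof: The boundedness of $\|\Psi^{-1}\|_S$ (result (5.35)) follows easily by using the explicit expression for $\Psi^{-1}$ in equation (5.27). To prove the boundedness of $\|\Psi\|_S$ (result (5.34)), use the symmetry of $\Psi$ and a well-known result on spectral norms to get:

$$\|\Psi\|_S \le \max_{1 \le i \le n}\sum_{j=1}^n\left|[\Psi]_{i,j}\right| = \max_{1 \le i \le n}\sum_{h=1-i}^{n-i}|\rho_h|.$$

According to Exercise 13 in Brockwell and Davis (1991), there exist constants $C > 0$ and $s \in (0,1)$ so that $|\rho_h| \le C s^{|h|}$ for all $h$. Combining the previous results yields:

$$\|\Psi\|_S \le \sum_{h=-(n-1)}^{n-1}|\rho_h| \le C\sum_{h=-(n-1)}^{n-1}s^{|h|} \le C\,\frac{1+s}{1-s},$$

and (5.34) follows.

The following lemma provides a useful asymptotic approximation.

Lemma 5.7.3 Let $\epsilon_1,\ldots,\epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2). Let $\Psi$ be the correlation matrix of $\epsilon_1,\ldots,\epsilon_n$. Furthermore, let $G$ be an $n \times (p+1)$ matrix defined as in (2.14). If $n \to \infty$, then:

$$\frac{1}{n+1}\,G^T\Psi^{-1}1 = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2\int_0^1 g(z)f(z)\,dz + o(1). \tag{5.36}$$

Proof: By (5.27), the left side of (5.36) is

$$\frac{1}{n+1}\,G^T\Psi^{-1}1 = \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,G^T U^T U 1 - \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,G^T V^T V 1,$$

so it suffices to show

$$\frac{1}{n+1}\,G^T U^T U 1 = \left(1 - \sum_{k=1}^R\phi_k\right)^2\int_0^1 g(z)f(z)\,dz + o(1), \tag{5.37}$$
$$\frac{1}{n+1}\,G^T V^T V 1 = o(1). \tag{5.38}$$

To establish (5.37), it is enough to show that, for any $i = 0,1,\ldots,p$, we have:

$$\frac{1}{n+1}\,G_{i+1}^T U^T U 1 = \left(1 - \sum_{k=1}^R\phi_k\right)^2\int_0^1 g_i(z)f(z)\,dz + o(1).$$

Let $i = 0,1,\ldots,p$ be fixed. Using the explicit expression for $U^T U$ in result (5.30), we write:

$$\frac{1}{n+1}\,G_{i+1}^T U^T U 1 = -\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)1\right] + \sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right)1\right] + \sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T U_{(k)}1\right] + \frac{1}{n+1}\,G_{i+1}^T 1. \tag{5.39}$$

Therefore, it suffices to prove that the following asymptotic approximations hold:

$$\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)1\right] = 2\left(\sum_{k=1}^R\phi_k\right)\int_0^1 g_i(z)f(z)\,dz + o(1), \tag{5.40}$$
$$\sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right)1\right] = 2\left(\sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\right)\int_0^1 g_i(z)f(z)\,dz + o(1), \tag{5.41}$$
$$\sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T U_{(k)}1\right] = \left(\sum_{k=1}^R\phi_k^2\right)\int_0^1 g_i(z)f(z)\,dz + o(1), \tag{5.42}$$
$$\frac{1}{n+1}\,G_{i+1}^T 1 = \int_0^1 g_i(z)f(z)\,dz + o(1). \tag{5.43}$$

The last result follows from result (4.10).

To prove (5.40), it is enough to show that the equalities below hold for any $k = 1,\ldots,R$:

$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T 1 = \int_0^1 g_i(z)f(z)\,dz + o(1), \tag{5.44}$$
$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)} 1 = \int_0^1 g_i(z)f(z)\,dz + o(1). \tag{5.45}$$

Using the expression of $U_{(k)}^T$ in (5.31) and a Riemann integration argument, the left side of (5.44) can be written as:

$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T 1 = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n g_i(Z_t)\,I(l = t + k,\ 1 \le t \le n - k) = \frac{1}{n+1}\sum_{t=1}^{n-k}g_i(Z_t) = \int_0^1 g_i(z)f(z)\,dz + o(1).$$

Here, we have also used that $k$ does not depend upon $n$, as $R$ itself does not depend upon $n$. Similarly, using the expression for $U_{(k)}$ in (5.29), we obtain that the left side of (5.45) is:

$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)} 1 = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n g_i(Z_t)\,I(l = t - k,\ k + 1 \le t \le n) = \frac{1}{n+1}\sum_{t=k+1}^n g_i(Z_t) = \frac{1}{n+1}\sum_{t=1}^n g_i(Z_t) - \frac{1}{n+1}\sum_{t=1}^k g_i(Z_t) = \int_0^1 g_i(z)f(z)\,dz + o(1).$$

Thus, both (5.44) and (5.45) hold. A similar argument can be used to derive (5.41) and (5.42). The only difference in the proofs is that the range of summation for $t$ in $\sum_t g_i(Z_t)$ changes.

It remains to prove (5.38).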
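To establish this result, it is enough to show that $G_{i+1}^T V^T V 1/(n+1)$ is $o(1)$ for all $i = 0,1,\ldots,p$. Let $i = 0,1,\ldots,p$ be fixed. By the definition of $V$ in (5.28), we have:

$$VG_{i+1} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ -\phi_R\,g_i(Z_1) \\ -\phi_{R-1}\,g_i(Z_1) - \phi_R\,g_i(Z_2) \\ \vdots \\ -\phi_1 g_i(Z_1) - \phi_2 g_i(Z_2) - \cdots - \phi_R\,g_i(Z_R) \end{pmatrix}.$$

Since $g_i(\cdot)$ is bounded by assumption (A0)-(i), $\|VG_{i+1}\|_2 = O(1)$. A similar argument yields $\|V1\|_2 = O(1)$. Combining these results, we obtain:

$$\left|\frac{1}{n+1}\,G_{i+1}^T V^T V 1\right| \le \frac{1}{n+1}\|VG_{i+1}\|_2\cdot\|V1\|_2 = \frac{1}{n+1}\,O(1)\cdot O(1) = O(1/n) = o(1),$$

so $G_{i+1}^T V^T V 1/(n+1)$ is $o(1)$.

The following lemma provides a result concerning the convergence in probability of a random matrix.

Lemma 5.7.4 Suppose the assumptions in Lemma 5.7.1 hold. Let $(\eta_{i1},\ldots,\eta_{ip})^T$, $i = 1,\ldots,n$, be as in condition (A0)-(ii) and let $\eta$ be an $n \times (p+1)$ random matrix defined as in (2.10). Then, as $n \to \infty$:

$$\frac{1}{n+1}\,\eta^T\Psi^{-1}\eta = \frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 + \sum_{k=1}^R\phi_k^2\right)\Sigma^{(0)} + o_P(1) \tag{5.46}$$

where $\Sigma^{(0)}$ is defined as in equation (2.15).

Proof: By (5.27), the left side of (5.46) can be written as:

$$\frac{1}{n+1}\,\eta^T\Psi^{-1}\eta = \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,\eta^T U^T U\eta - \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,\eta^T V^T V\eta,$$

so it suffices to show

$$\frac{1}{n+1}\,\eta^T U^T U\eta = \left(1 + \sum_{k=1}^R\phi_k^2\right)\Sigma^{(0)} + o_P(1), \qquad \frac{1}{n+1}\,\eta^T V^T V\eta = o_P(1).$$

In fact, if $\Sigma^{(0)} = (\Sigma_{ij})$, it is enough to show:

$$\frac{1}{n+1}\,\eta_{i+1}^T U^T U\eta_{j+1} = \left(1 + \sum_{k=1}^R\phi_k^2\right)\Sigma_{ij} + o_P(1), \tag{5.47}$$
$$\frac{1}{n+1}\,\eta_{i+1}^T V^T V\eta_{j+1} = o_P(1), \tag{5.48}$$

for any $i,j = 0,1,\ldots,p$.

Let $i,j = 0,1,\ldots,p$ be fixed. Using the explicit expression for $U^T U$ in (5.30), we write:

$$\frac{1}{n+1}\,\eta_{i+1}^T U^T U\eta_{j+1} = -\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,\eta_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)\eta_{j+1}\right] + \sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left[\frac{1}{n+1}\,\eta_{i+1}^T\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right)\eta_{j+1}\right] + \sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}^T U_{(k)}\eta_{j+1}\right] + \frac{1}{n+1}\,\eta_{i+1}^T\eta_{j+1}. \tag{5.49}$$

In order to establish (5.46), we will show that

$$\sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}^T U_{(k)}\eta_{j+1}\right] = \left(\sum_{k=1}^R\phi_k^2\right)\Sigma_{ij} + o_P(1), \tag{5.50}$$
$$\frac{1}{n+1}\,\eta_{i+1}^T\eta_{j+1} = \Sigma_{ij} + o_P(1), \tag{5.51}$$

and the remaining terms in the right side of (5.49) are $o_P(1)$.

Result (5.51) holds by result (4.9).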
To prove (5.50), we use condition (A0)-(ii) and the Weak Law of Large Numbers for a sequence of independent variables to write:

$$\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}^T U_{(k)}\eta_{j+1} = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n\eta_{t,i}\left[U_{(k)}^T U_{(k)}\right]_{t,l}\eta_{l,j} = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n\eta_{t,i}\,I(l = t,\ 1 \le t \le n - k)\,\eta_{l,j} = \frac{1}{n+1}\sum_{t=1}^{n-k}\eta_{t,i}\eta_{t,j} \xrightarrow[n\to\infty]{P} E(\eta_{1,i}\eta_{1,j}) = \Sigma_{ij},$$

and (5.50) follows easily.

We now show that the first term in (5.49) is $o_P(1)$. We have:

$$\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,\eta_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)\eta_{j+1}\right] = \sum_{k=1}^R\phi_k\left(\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}^T\eta_{j+1}\right) + \sum_{k=1}^R\phi_k\left(\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}\eta_{j+1}\right), \tag{5.52}$$

so it suffices to analyze the term $\eta_{i+1}^T U_{(k)}^T\eta_{j+1}/(n+1)$, as $\eta_{i+1}^T U_{(k)}\eta_{j+1}/(n+1)$ is its transpose, with $i$ and $j$ interchanged. Using the expression for $U_{(k)}^T$ in (5.31), we obtain:

$$\frac{1}{n+1}\,\eta_{i+1}^T U_{(k)}^T\eta_{j+1} = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n\eta_{t,i}\,I(l = t + k,\ 1 \le t \le n - k)\,\eta_{l,j} = \frac{1}{n+1}\sum_{t=1}^{n-k}\eta_{t,i}\eta_{t+k,j} \xrightarrow[n\to\infty]{P} E(\eta_{1,i}\eta_{1+k,j}) = E(\eta_{1,i})E(\eta_{1+k,j}) = 0.$$

The above result was obtained by using the fact that $\{\eta_{t,i}\eta_{t+k,j}\}_{t \ge 1}$ is a sequence of $k$-dependent, identically distributed random variables (condition (A0)-(ii)), so the quantity $\sum_{t=1}^{n-k}\eta_{t,i}\eta_{t+k,j}/(n+1)$ converges to $E(\eta_{1,i}\eta_{1+k,j})$ by the Weak Law of Large Numbers for $k$-dependent sequences of random variables. We conclude that the term $\eta_{i+1}^T U_{(k)}^T\eta_{j+1}/(n+1)$ is $o_P(1)$, so the first term in (5.49) is $o_P(1)$. Using a similar reasoning, we can show that the second term in (5.49) is also $o_P(1)$.

It remains to show (5.48). By the definition of $V$ in (5.28), we have:

$$V\eta_{j+1} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ -\phi_R\,\eta_{1,j} \\ -\phi_{R-1}\,\eta_{1,j} - \phi_R\,\eta_{2,j} \\ \vdots \\ -\phi_1\eta_{1,j} - \phi_2\eta_{2,j} - \cdots - \phi_R\,\eta_{R,j} \end{pmatrix},$$

so $\|V\eta_{j+1}\|_2 = O_P(1)$ by assumption (A0)-(ii). A similar argument gives $\|V\eta_{i+1}\|_2 = O_P(1)$. Combining these results yields:

$$\left|\frac{1}{n+1}\,\eta_{i+1}^T V^T V\eta_{j+1}\right| \le \frac{1}{n+1}\|V\eta_{i+1}\|_2\cdot\|V\eta_{j+1}\|_2 = \frac{1}{n+1}\,O_P(1)\cdot O_P(1) = O_P(1/n) = o_P(1),$$

so $\eta_{i+1}^T V^T V\eta_{j+1}/(n+1)$ is $o_P(1)$. This completes our proof of Lemma 5.7.4.

The following lemma provides asymptotic approximations for non-random quantities involving the bias associated with estimating a smooth function $m(\cdot)$ via locally linear regression.
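The limit in Lemma 5.7.4 can be illustrated by simulation in the scalar AR(1) case, where it reads $\eta^T\Psi^{-1}\eta/(n+1) \approx (1+\phi^2)/(1-\phi^2)\cdot Var(\eta_1)$. The sketch below is an illustration under assumptions rather than part of the proof: it takes i.i.d. normal $\eta_t$, an AR(1) correlation matrix $\Psi_{ij} = \phi^{|i-j|}$ with $\sigma_\epsilon^2/\sigma_u^2 = 1/(1-\phi^2)$, and evaluates the quadratic form through $U^T U - V^T V$ so that no $n \times n$ matrix is formed; the function name is hypothetical.

```python
import numpy as np

def quad_form_psi_inv(x, y, phi):
    """x^T Psi^{-1} y for the AR(1) correlation matrix, via U^T U - V^T V."""
    ratio = 1.0 / (1.0 - phi**2)                      # sigma_eps^2 / sigma_u^2 for AR(1)
    Ux = x - phi * np.concatenate(([0.0], x[:-1]))    # (Ux)_t = x_t - phi * x_{t-1}
    Uy = y - phi * np.concatenate(([0.0], y[:-1]))
    Vx_last, Vy_last = -phi * x[0], -phi * y[0]       # only the last entry of Vx is non-zero
    return ratio * (Ux @ Uy - Vx_last * Vy_last)

rng = np.random.default_rng(1)
n, phi, var_eta = 20000, 0.5, 2.0
eta = rng.normal(scale=np.sqrt(var_eta), size=n)      # i.i.d. column of eta (an assumption)

lhs = quad_form_psi_inv(eta, eta, phi) / (n + 1)
rhs = (1.0 + phi**2) / (1.0 - phi**2) * var_eta       # claimed limit in the AR(1) case
print(lhs, rhs)
```

The two printed values agree only up to Monte Carlo error of order $n^{-1/2}$, in line with the $o_P(1)$ remainder in (5.46).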
Lemma 5.7.5 Suppose the assumptions in Lemma 5.7.1 hold. Let $G$ be an $n \times (p+1)$ matrix defined as in (2.14) such that condition (A0)-(i) is satisfied, and $S_h$ be an $n \times n$ local linear smoother matrix defined as in (3.6)-(3.8). Set $m = (m(Z_1),\ldots,m(Z_n))^T$, where $m(\cdot)$ satisfies condition (A4) and the $Z_t$'s satisfy the design condition (A3). Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:

$$\frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h)m = -h^2\,\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2\frac{\nu_2(K)}{2}\int_0^1 g(z)m''(z)f(z)\,dz + o(h^2) \tag{5.53}$$

and

$$\frac{1}{n(n+1)}\,G^T\Psi^{-1}11^T(S_h - I)m = h^2\,\frac{\sigma_\epsilon^2}{\sigma_u^2}\left(1 - \sum_{k=1}^R\phi_k\right)^2\frac{\nu_2(K)}{2}\int_0^1 m''(z)f(z)\,dz\int_0^1 g(z)f(z)\,dz + o(h^2), \tag{5.54}$$

where $\int_0^1 g(z)f(z)\,dz$ and $\int_0^1 g(z)m''(z)f(z)\,dz$ are defined as in equations (2.18) and (2.19).

Proof: We first prove (5.53). Using the explicit expression for $\Psi^{-1}$ in equation (5.27), we write

$$\frac{1}{n+1}\,G^T\Psi^{-1}(I - S_h)m = \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,G^T U^T U(I - S_h)m - \frac{\sigma_\epsilon^2}{\sigma_u^2}\cdot\frac{1}{n+1}\,G^T V^T V(I - S_h)m,$$

so it suffices to show that

$$\frac{1}{n+1}\,G^T U^T U(I - S_h)m = -h^2\left(1 - \sum_{k=1}^R\phi_k\right)^2\frac{\nu_2(K)}{2}\int_0^1 g(z)m''(z)f(z)\,dz + o(h^2),$$
$$\frac{1}{n+1}\,G^T V^T V(I - S_h)m = o(h^2).$$

These facts follow by proving that

$$\frac{1}{n+1}\,G_{i+1}^T U^T U(I - S_h)m = -h^2\left(1 - \sum_{k=1}^R\phi_k\right)^2\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2), \tag{5.55}$$
$$\frac{1}{n+1}\,G_{i+1}^T V^T V(I - S_h)m = o(h^2), \tag{5.56}$$

for any $i = 0,1,\ldots,p$.

First, consider (5.55). Using the expression for $U^T U$ in (5.30), we have:

$$\frac{1}{n+1}\,G_{i+1}^T U^T U(S_h - I)m = -\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)(S_h - I)m\right] + \sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right)(S_h - I)m\right] + \sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T U_{(k)}(S_h - I)m\right] + \frac{1}{n+1}\,G_{i+1}^T(S_h - I)m. \tag{5.57}$$

Thus, to establish (5.55), it suffices to show that the terms on the right side of (5.57) can be approximated as:

$$\sum_{k=1}^R\phi_k\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(k)}^T + U_{(k)}\right)(S_h - I)m\right] = h^2\left(2\sum_{k=1}^R\phi_k\right)\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2), \tag{5.58}$$
$$\sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\left[\frac{1}{n+1}\,G_{i+1}^T\left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right)(S_h - I)m\right] = h^2\left(2\sum_{\substack{p,q=1 \\ p<q}}^R\phi_p\phi_q\right)\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2), \tag{5.59}$$
$$\sum_{k=1}^R\phi_k^2\left[\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T U_{(k)}(S_h - I)m\right] = h^2\left(\sum_{k=1}^R\phi_k^2\right)\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2), \tag{5.60}$$

and

$$\frac{1}{n+1}\,G_{i+1}^T(S_h - I)m = h^2\,\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2). \tag{5.61}$$

The last result follows easily from result (4.66) of Lemma 4.6.10.

To prove (5.58), it suffices to show that the equalities below hold for any $k = 1,\ldots,R$:

$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T(S_h - I)m = h^2\,\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2), \tag{5.62}$$
$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)}(S_h - I)m = h^2\,\frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(h^2). \tag{5.63}$$

Consider the left side in (5.62). Using the expression for $U_{(k)}^T$ in (5.31), the boundedness of $g_i(\cdot)$ (condition (A0)-(i)) and result (4.34) of Lemma 4.6.1 with $r = m$, we obtain:

$$\frac{1}{n+1}\,G_{i+1}^T U_{(k)}^T(S_h - I)m = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n g_i(Z_t)\left[U_{(k)}^T\right]_{t,l}\left[(S_h - I)m\right]_l = \frac{1}{n+1}\sum_{t=1}^n\sum_{l=1}^n g_i(Z_t)\,I(l = t + k,\ 1 \le t \le n - k)\left[B_m(K,Z_l,h)\,h^2 + o(h^2)\right]$$
$$= h^2\left[\frac{1}{n+1}\sum_{t=1}^{n-k}g_i(Z_t)B_m(K,Z_{t+k},h)\right] + o(h^2) = h^2(V_n + Q_n) + o(h^2), \tag{5.64}$$

where $V_n = \sum_{t=1}^{n-k}g_i(Z_{t+k})B_m(K,Z_{t+k},h)/(n+1)$ and $Q_n = \sum_{t=1}^{n-k}\left(g_i(Z_t) - g_i(Z_{t+k})\right)B_m(K,Z_{t+k},h)/(n+1)$.

A Riemann integration argument allows us to approximate $V_n$ as

$$V_n = \frac{1}{n+1}\sum_{t=1}^n g_i(Z_t)B_m(K,Z_t,h) - \frac{1}{n+1}\sum_{t=1}^k g_i(Z_t)B_m(K,Z_t,h) = \frac{\nu_2(K)}{2}\int_0^1 g_i(z)m''(z)f(z)\,dz + o(1).$$

The last equality was obtained by using the fact that $k$ does not depend on $n$ and $g_i(\cdot)$ is bounded (condition (A0)-(i)). We also used the fact that $B_m(K,z,h)$ is bounded for all $z \in [0,1]$ and $h \le h_0$, with $h_0 \in [0,1/2]$ small enough, by result (4.35) of Lemma 4.6.1, Lemma 4.6.2 and condition (A4).

Using the fact that $g_i(\cdot)$ is Lipschitz continuous (condition (A0)-(i)) and that the $Z_t$'s satisfy the design condition (A3), we bound $Q_n$ as:

$$|Q_n| \le \frac{1}{n+1}\sum_{t=1}^{n-k}\left|g_i(Z_t) - g_i(Z_{t+k})\right|\cdot\left|B_m(K,Z_{t+k},h)\right| \le \frac{C}{n+1}\sum_{t=1}^{n-k}\sum_{l=0}^{k-1}\left|g_i(Z_{t+l}) - g_i(Z_{t+l+1})\right| = O(k/n) = o(1),$$

where $C$ denotes a bound on $|B_m(K,z,h)|$.

Substituting the results concerning $V_n$ and $Q_n$ in (5.64) yields (5.62). A similar argument can be used to prove (5.63). The only difference in the proofs is that the range of summation for $\sum_t g_i(Z_t)B_m(K,Z_{t+k},h)$ changes. Combining (5.62) and (5.63) yields (5.58). Similar arguments can be employed to obtain results (5.59) and (5.60).

It remains to prove (5.56).
B y the definition of V in (5.28), we have: V(J - Sh)m 0 -Mii-Sh)™.]! - f o _ i[(I - 5fc)m]i - fo[(I - Sh)m}2 \ - 0 i [ ( / - Sfc)m]x - 02[(I - 5ft)m]2 Mil - Sh)m}R J B u t ||V(/ - Sh)m\\2 = G(h2), since for i = 1 , . . . , R, \(Sh - I)m]i = 0(h2) by Lemma 4.6.2. We know that | | V r 7 i + 1 | | 2 = 0(1). Combining these results yields: 1 n + l Gf^V'Vil-S^m < 1 n + l 1 n + l ||VGi+1||2-||V(I-Sfc)m||2 0(1) • 0(h2) = 0 ( l / n ) = o{h2) so Gf+lVTV(I - Sh)m/(n + 1) is o(h2). Result (5.54) follows easily, by writing: 1 G r t f - 1 l l r ( S - I)m 1 -G T * _ 1 1 1 n + l lT(Sh - I)m 1 1 + -n n (n + l ) x /""~ \n + l and using Lemma 5.7.3 and result (4.66) of Lemma 4.6.10 wi th G replaced wi th 1. The proof of Lemma 5.7.5 is now complete. Let V<i, be the matr ix defined in (2.22). The next result concerns the existence of an inverse for and provides an explicit expression for this inverse. We do not provide a proof for this result, as one can easily verify that V^V^,1 = V^V<y = I by using the expression of given in Lemma 5.7.6 below. 97 L e m m a 5.7.6 Let V * be the (p+1) x (p+1) matrix introduced in (2.22) and define the px 1 vector a as a — (J* gi(z)f(z)dz,..., gp(z)f(z)dz)T. Here, /(•) is a design density satisfying condition (A3) and gi(-),..., gp(-) are smooth functions satisfying condition (AO)-(i). Furthermore, let £ = (£;j) be the variance-covariance matrix introduced in condition (AO)-(ii). Then V ^ 1 exists and is given by: I 1 1 ( i - E t i « 2 + i + E f = 1 ^ i a r E _ 1 a V -E _ 1 a i' a r E - : E - 1 provided E 1 exists. The last result in this Appendix provides a useful asymptotic bound. L e m m a 5.7.7 Suppose the assumptions in Theorem 5.2.1 hold. Then, if n —> oo, h —> 0 and nh3 —> oo, we /iai;e: — L — V ^ G ^ C I - S £ ) * ( J - 5 ^ ) T * - 1 G V i 1 = 0{n-1). (5.65) Proof: To prove (5.65) it suffices to show that 1 - ^ G T ^ - \ I - Sch)*(I - SifV-'G = Oin-1), (5.66) since the elements of the (p + 1) x (p + 1) matrix V ^ 1 do not depend upon n. 
Result (5.66) follows by showing that G f + 1 * _ 1 ( T - S c h ) V ( I - S c h ) T * - 1 G j + l / ( n +1)2 is 0 ( 0 for any i, j = 0,1,... ,p. 98 Let i, j — 0 , 1 , . . . ,p be fixed. Using vector and matr ix norm properties introduced in Section 2.4, we obtain: 1 (n + l ) ' < 1 I - SchY tf-^+xlla • | | * | | s • | | ( I - SffV-'G^lU ( n + l ) 2 ' < O ( n - 2 ) • | | ( / - SlY^G^W, • | | ( I - Sch)T^lGj+1\\2, (5.67) since | | * | | s is 0 (1 ) by result (5.34) of Lemma 5.7.2. Thus, it suffices to show that ||(/ - Sch)T*-lGi+l\\2 and ||(7 - S%)T*-lGj+1\\2 are 0(n^). Let v = * _ 1 G j + i ; using S£ = (7 - 11T/n)Sh (equation (3.1) wi th = Sh) we write: | |(7 - SlfV-iG^Wt < WV-'G^W, + WSfV-'G^W, = | | « | | 2 + | | ^ « | | 2 = 11*112 + < IHl2 + | | S ? « | | a + 1 5^(7 - - 1 1 > n (5.68) If vt denotes the i t h component of v, we can show that there exists C > 0 such that \vt\ < C for all t = 1 , . . . , n. Indeed, by the expression for in (5.27) and Comment 5.7.1, we have: v- = G T j + 1 * - i = ^ . Gj+1UTU ~ "f2 • Gj+1VTV ~ 2 ' -TtMufk) + uw)+ £ 0 A ( i / f p ) t / ( 9 ) + r7f ? )L/ ( p )) fc=l + 5>2c/ffc)c/(fc) + 7 p, 1 = 1 p<q k=l so it suffices to show that the quantities Gj+Il/Jk), Gj+1U(k), GJ+iuJP)U(q)^ Gj^Uf^U and Gj+1Ujk)U(k) have bounded components for all k,p, q = 1,...,R, p < q, and 99 G j + 1 V T V also has bounded components. These facts follow easily from the sparse-ness of C7(fc) and V (see Comment 5.7.1) and the boundedness of Gj+i's components (condition (AO)-(i)). The boundedness of v's components implies that ||u||2 is 0(nll2), WS^vWv is C(n 1 / / 2) and \\Sl(llTv/n)\\2 is C(n 1 / 2 ) . The last two results follow by result (4.49) of Lemma 4.6.6. Using these asymptotic bounds in (5.68) yields that ||(J - Sch)T^~lGj+i\\2 is 0(n 1 / 2 ) . A similar argument gives that \\(I - Sch)T^~lGi+i\\2 is 0(n1/2). 
Chapter 6 Choosing the Correct Amount of Smoothing

The estimators of the linear effects in model (2.1) considered in this thesis depend on a smoothing parameter $h$. This parameter has a dual function. On one hand, it influences the statistical properties of the estimated linear effects. On the other hand, it controls the shape and smoothness of the estimated non-linear effect. Our focus in this chapter is on developing data-driven methods for choosing $h$ so that we obtain accurate estimators for the linear effects of interest. These methods may not be the most appropriate for accurate estimation of the non-linear effect, as they may undersmooth its estimator. This chapter is organized as follows. In Section 6.1, we introduce some useful notation. In Section 6.2, we introduce methods for choosing the correct smoothing parameter for the usual and modified local linear backfitting estimators of the linear effects of interest in model (2.1). These methods require the accurate estimation of the nonparametric component $m$ and the error correlation structure, topics discussed in Section 6.3. Finally, in Section 6.4 we introduce methods for choosing the correct smoothing parameter for the estimated modified local linear backfitting estimators of the linear effects of interest in model (2.1).

6.1 Notation

In what follows we are interested in the accurate estimation of a linear combination $c^T\beta$ of the linear effects $\beta$ in model (2.1), where $c = (c_0, c_1, \ldots, c_p)^T$ is a known vector with real-valued components (e.g., $c = (0, \ldots, 0, 1, 0, \ldots, 0)^T$). Throughout this chapter, we denote $\hat\beta_{I,S_h^c}$, $\hat\beta_{\Psi^{-1},S_h^c}$ and $\hat\beta_{\hat\Psi^{-1},S_h^c}$ generically by $\hat\beta_{\Omega,h}$ in order to emphasize their dependence upon the amount of smoothing $h$. We want to choose the amount of smoothing $h$ to accurately estimate $c^T\beta$ via $c^T\hat\beta_{\Omega,h}$. Given that $\hat\beta_{\hat\Psi^{-1},h}$ is conceptually different from the other estimators considered here, we defer its discussion to Section 6.4.
In the remainder of this chapter, unless otherwise stated, we assume that $\Omega$ stands for $I$ or $\Psi^{-1}$. The correct choice of $h$ depends on the conditional bias and variance of $c^T\hat\beta_{\Omega,h}$ given $X$ and $Z$. We provide below explicit expressions for these quantities. The exact conditional variance of $c^T\hat\beta_{\Omega,h}$ equals $c^T Var(\hat\beta_{\Omega,h}|X,Z)c$. Expressions for $Var(\hat\beta_{\Omega,h}|X,Z)$ are found in (4.17) when $\Omega = I$ and in (5.15) when $\Omega = \Psi^{-1}$. Thus:

$$c^T Var(\hat\beta_{\Omega,h}|X,Z)c = \sigma_\epsilon^2\, c^T M_{\Omega,h}\Psi M_{\Omega,h}^T c = Var(h;\Omega), \tag{6.1}$$

where

$$M_{\Omega,h} = \left(X^T\Omega(I - S_h^c)X\right)^{-1} X^T\Omega(I - S_h^c). \tag{6.2}$$

The exact conditional bias of $c^T\hat\beta_{\Omega,h}$ equals $c^T Bias(\hat\beta_{\Omega,h}|X,Z)$ and can be obtained from (4.2) when $\Omega = I$ or (5.2) when $\Omega = \Psi^{-1}$:

$$c^T Bias(\hat\beta_{\Omega,h}|X,Z) = c^T M_{\Omega,h} m = Bias(h;\Omega). \tag{6.3}$$

6.2 Choosing $h$ for $c^T\hat\beta_{I,S_h^c}$ and $c^T\hat\beta_{\Psi^{-1},S_h^c}$

The estimator $c^T\hat\beta_{\Omega,h}$ depends on the smoothing parameter $h$. To obtain an accurate estimator $c^T\hat\beta_{\Omega,h}$ of $c^T\beta$ we choose $h$ so that it minimizes a measure of accuracy of $c^T\hat\beta_{\Omega,h}$. Although the smoothing parameter $h$ quantifies the degree of smoothness of $\hat m_{\Omega,h}$, a 'good' value for $h$ should not necessarily be chosen to minimize a measure of accuracy of $\hat m_{\Omega,h}$ as, in the present context, $m$ is merely a nuisance. Since $c^T\hat\beta_{\Omega,h}$ is generally biased in finite samples, we assess its accuracy via its exact conditional mean squared error, given $X$ and $Z$:

$$MSE(h;\Omega) = Bias(h;\Omega)^2 + Var(h;\Omega). \tag{6.4}$$

We define the MSE-optimal amount of smoothing for estimating $c^T\beta$ via $c^T\hat\beta_{\Omega,h}$ as the minimizer of MSE:

$$h^{\Omega}_{MSE} = \arg\min_h MSE(h;\Omega). \tag{6.5}$$

From equations (6.1) and (6.3), one can see that $h^{\Omega}_{MSE}$ depends upon the unknown nonparametric component $m$ as well as the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\Psi$, which are typically unknown. Thus, $h^{\Omega}_{MSE}$ is not directly available. To date, no methods have been proposed for estimating $h^{\Omega}_{MSE}$ when the model errors are correlated.
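The quantities in (6.1) to (6.4) are simple matrix expressions once a smoother matrix is available. The following is a minimal numerical sketch, not the thesis implementation: the Gaussian-kernel Nadaraya-Watson smoother `nw_smoother` is a hypothetical stand-in for the local linear smoother $S_h$ of (3.6)-(3.8), the centering $(I - 11^T/n)S_h$ follows (3.1), and the design, the function $m$, the AR(1) correlation matrix and all numerical values are synthetic assumptions chosen only for illustration.

```python
import numpy as np

def nw_smoother(z, h):
    """Toy Nadaraya-Watson smoother matrix (assumed stand-in for S_h); rows sum to 1."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def m_matrix(X, S_c, Omega):
    """M_{Omega,h} = (X' Omega (I - S_c) X)^{-1} X' Omega (I - S_c), as in (6.2)."""
    n = X.shape[0]
    A = X.T @ Omega @ (np.eye(n) - S_c)   # (p+1) x n
    return np.linalg.solve(A @ X, A)

def exact_mse(c, X, S_c, Omega, Psi, m, sigma2):
    """Return (bias, variance, mse) of c' beta_hat following (6.3), (6.1) and (6.4)."""
    cM = c @ m_matrix(X, S_c, Omega)
    bias = float(cM @ m)
    var = float(sigma2 * (cM @ Psi @ cM))
    return bias, var, bias ** 2 + var

# Synthetic illustration (all values below are assumptions).
rng = np.random.default_rng(0)
n = 60
z = np.arange(1, n + 1) / (n + 1)
X = np.column_stack([np.ones(n), z ** 2 + rng.normal(scale=0.3, size=n)])
m = np.sin(2 * np.pi * z)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Psi = 0.5 ** lags                          # AR(1) correlation matrix with phi = 0.5
c = np.array([0.0, 1.0])
center = np.eye(n) - np.ones((n, n)) / n   # centering as in (3.1)

for h in (0.05, 0.1, 0.2):
    S_c = center @ nw_smoother(z, h)
    bias, var, mse = exact_mse(c, X, S_c, np.eye(n), Psi, m, sigma2=1.0)
    print(f"h={h:.2f}  bias={bias:+.4f}  var={var:.4f}  mse={mse:.4f}")
```

Because $m$, $\sigma_\epsilon^2$ and $\Psi$ enter these formulas directly, the sketch can only be run on simulated data where they are known, which is exactly why $h^{\Omega}_{MSE}$ must be estimated in practice.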
However, when the model errors are uncorrelated and $\Omega = I$, Opsomer and Ruppert (1999) proposed an empirical bias bandwidth selection (EBBS) method for estimating $h^{I}_{MSE}$. We describe this method in Section 6.2.1. In Section 6.2.2, we propose modifications of the EBBS method to estimate $h^{\Omega}_{MSE}$ when the errors are correlated, not only for $\Omega = I$ but also for $\Omega = \Psi^{-1}$. Finally, in Section 6.2.3 we propose a non-asymptotic plug-in method for estimating $h^{\Omega}_{MSE}$ in the presence of error correlation when $\Omega$ equals $I$ or $\Psi^{-1}$. Each method minimizes an estimator of $MSE(\cdot;\Omega)$ over $h$ in some grid. Throughout, we let $\mathcal{H} = \{h(1), \ldots, h(N)\}$ denote the grid, for some integer $N$.

6.2.1 Review of Opsomer and Ruppert's EBBS method

In this section, we provide a detailed review of Opsomer and Ruppert's EBBS method. Throughout this section only, we assume that the errors associated with model (2.1) satisfy $\Psi = I$. Specifically, we assume that these errors satisfy the assumption:

(A6) The model errors $\epsilon_i$, $i = 1, \ldots, n$, are independent, identically distributed, having mean 0 and variance $\sigma_\epsilon^2 \in (0, \infty)$.

We also consider $\Omega = I$, so that the results in this section will apply exclusively to $c^T\hat\beta_{I,h}$, the usual local linear backfitting estimator of $c^T\beta$. Under the above conditions, the EBBS method attempts to estimate $h^{I}_{MSE}$ by minimizing an estimator of $MSE(\cdot;I)$ over $\mathcal{H}$, a grid of possible values of $h$. For a given $h(j) \in \mathcal{H}$, Opsomer and Ruppert find an estimator for $MSE(h(j);I)$ by combining an empirical estimator of $Bias(h(j);I)$ with a residual-based estimator of $Var(h(j);I)$. We discuss the details related to computing these estimators below.

Opsomer and Ruppert use a higher order asymptotic expression for $E(c^T\hat\beta_{I,h}|X,Z) - c^T\beta$, the exact conditional bias of $c^T\hat\beta_{I,h}$, to obtain an expansion (6.6) of this bias in powers of $h$ as $h \to 0$, in which the $a_t$, $t = 1, \ldots, T$, are unknown asymptotic constants referred to as bias coefficients. This expression can be obtained by a more delicate Taylor series analysis in (4.3). This yields the approximation:

$$E(c^T\hat\beta_{I,h}|X,Z) = c^T\beta + \sum_{t=2}^{T+1} a_t h^t + o(h^{1+T}). \tag{6.7}$$

For fixed $h(j) \in \mathcal{H}$, Opsomer and Ruppert estimate $Bias(h(j);I)$, the exact conditional bias of $c^T\hat\beta_{I,h(j)}$, as follows. They calculate $c^T\hat\beta_{I,h(k)}$ for $k \in \{j - k_1, \ldots, j + k_2\}$, for some $k_1$, $k_2$. Note that $j$ must be between $k_1 + 1$ and $N - k_2$, inclusive. They then fit the model:

$$E(c^T\hat\beta_{I,h}|X,Z) = a_0 + a_2 h^2 + \cdots + a_{T+1} h^{T+1} \tag{6.8}$$

to the 'data' $\{(h(k), c^T\hat\beta_{I,h(k)}) : k = j - k_1, \ldots, j + k_2\}$ using ordinary least squares. This results in the fit:

$$\hat E(c^T\hat\beta_{I,h}|X,Z) = \hat a_0 + \hat a_2 h^2 + \cdots + \hat a_{T+1} h^{T+1}. \tag{6.9}$$

An estimator for $Bias(h(j);I)$ is then:

$$\widehat{Bias}(h(j);I) = \hat E(c^T\hat\beta_{I,h(j)}|X,Z) - \hat a_0 = \hat a_2 h(j)^2 + \cdots + \hat a_{T+1} h(j)^{T+1}. \tag{6.10}$$

Here, $k_1$, $k_2$ and $T$ are tuning parameters that must be chosen by the user. We must have $k_1 + k_2 > T$ since the $T + 1$ parameters $a_0, a_2, \ldots, a_{T+1}$ will be estimated using $k_1 + k_2 + 1$ 'data' points.

Opsomer and Ruppert estimate $Var(h(j);I)$, the exact conditional variance of $c^T\hat\beta_{I,h(j)}$, by using (6.1) with $\Psi = I$ but with $\sigma_\epsilon^2$ replaced by the following residual-based estimator:

$$\hat\sigma_{\epsilon,j}^2 = \frac{\|Y - X\hat\beta_{I,h(j)} - \hat m_{I,h(j)}\|_2^2}{n}.$$

This yields:

$$\widehat{Var}(h(j);I) = \hat\sigma_{\epsilon,j}^2\, c^T M_{I,h(j)} M_{I,h(j)}^T c. \tag{6.11}$$

Finally, Opsomer and Ruppert combine (6.10) and (6.11) to obtain the following estimator of $MSE(h(j);I)$, $k_1 + 1 \le j \le N - k_2$:

$$\widehat{MSE}(h(j);I) = \widehat{Bias}(h(j);I)^2 + \widehat{Var}(h(j);I).$$

They then estimate $h^{I}_{MSE}$, the minimizer of $MSE(\cdot;I)$, as follows:

$$\hat h^{I}_{MSE} = \arg\min_{k_1+1 \le j \le N-k_2} \widehat{MSE}(h(j);I).$$

We see that $\hat h^{I}_{MSE}$ attempts to estimate $h^{I}_{MSE}$, the smoothing parameter which is MSE-optimal for estimation of $c^T\beta$. It is not clear, however, whether using $\hat h^{I}_{MSE}$ yields a $\sqrt{n}$-consistent estimator of $c^T\beta$.

The variance estimator $\widehat{Var}(h(j);I)$ in (6.11) depends on the matrix $M_{I,h(j)}$. To speed computation of $M_{I,h(j)}$, and hence $\widehat{Var}(h(j);I)$, Opsomer and Ruppert suggest the following.
First, take $\Omega = I$ and $h = h(j)$ in (6.2) and re-arrange to obtain an alternative expression for $M_{I,h(j)}$:

$$M_{I,h(j)} = \left(X^T(I - S^c_{h(j)})X\right)^{-1} X^T(I - S^c_{h(j)}) = \left(X^T(X - S^c_{h(j)}X)\right)^{-1}\left(X^T - X^T S^c_{h(j)}\right). \tag{6.12}$$

Then, compute the product $S^c_{h(j)}X$ in (6.12) by smoothing each column of $X$ against the design points $Z_1, \ldots, Z_n$. Finally, compute the product $X^T S^c_{h(j)}$ in (6.12) by using the approximation $X^T S^c_{h(j)} \approx (S^c_{h(j)}X)^T$. This approximation is justified by the near symmetry of $S^c_{h(j)}$. These computational tricks can also be used to ease the burden involved in calculating $\hat\beta_{I,h(j)}$, $h(j) \in \mathcal{H}$, as $\hat\beta_{I,h(j)}$ can be easily seen to depend upon $M_{I,h(j)}$.

A peculiar feature of the estimator $\hat\sigma_{\epsilon,j}^2$ of $\sigma_\epsilon^2$ is that it uses residuals based on the 'working' bandwidth $h(j) \in \mathcal{H}$, instead of a bandwidth optimized for estimation of $\sigma_\epsilon^2$. As an alternative to estimating $\sigma_\epsilon^2$ with the 'working' bandwidth $h$, Opsomer and Ruppert suggest that one could use residuals based on a bandwidth optimized for estimation of $\sigma_\epsilon^2$ as in Opsomer and Ruppert (1998).

For implementing the EBBS method in practice, Opsomer and Ruppert (1999) suggest using a grid size $N = 18$ and grid values equally spaced on the log scale. They recommend using the following values for the tuning parameters involved in this method: $k_1 = 1$, $k_2 = 2$ and $T = 1$. For situations where $\widehat{MSE}(\cdot;I)$ is found to have more than one minimum as a function of $h$, they suggest that one could take $\hat h^{I}_{MSE}$ to be either the $h$ value where the global minimum occurs, or the $h$ value where the first local minimum occurs. The authors advise that they found the former approach to be superior to the latter in their simulation studies.

6.2.2 Modifications to the EBBS method

Here we adjust the EBBS method to deal with estimating $h^{\Omega}_{MSE}$ when the model errors are correlated and $\Omega = I$ or $\Omega = \Psi^{-1}$. The modified EBBS method attempts to estimate $h^{\Omega}_{MSE}$ by minimizing an estimator of $MSE(\cdot;\Omega)$ over the grid $\mathcal{H}$.
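The empirical bias step of Section 6.2.1, equations (6.8) to (6.10), followed by the grid minimization, can be sketched as below. This is an illustrative sketch under stated assumptions rather than Opsomer and Ruppert's code: the function names `ebbs_bias` and `ebbs_select` are hypothetical, indices are 0-based, the grid endpoints are arbitrary, and the 'estimates' and 'variances' are synthetic inputs standing in for $c^T\hat\beta_{I,h(k)}$ and $\widehat{Var}(h(k);I)$.

```python
import numpy as np

def ebbs_bias(h_grid, estimates, j, k1=1, k2=2, T=1):
    """Empirical bias at h_grid[j] (0-based j) via the local OLS fit (6.8)-(6.10)."""
    if j - k1 < 0 or j + k2 >= len(h_grid):
        raise ValueError("j must leave k1 grid points to the left and k2 to the right")
    idx = np.arange(j - k1, j + k2 + 1)
    h = h_grid[idx]
    powers = np.arange(2, T + 2)                      # exponents 2, ..., T+1
    design = np.column_stack([np.ones(len(idx))] + [h ** p for p in powers])
    coef, *_ = np.linalg.lstsq(design, estimates[idx], rcond=None)
    # Drop the fitted intercept a_0; what remains is the estimated bias (6.10).
    return float(sum(a * h_grid[j] ** p for a, p in zip(coef[1:], powers)))

def ebbs_select(h_grid, estimates, variances, k1=1, k2=2, T=1):
    """Grid value minimizing estimated bias^2 + estimated variance over admissible j."""
    js = range(k1, len(h_grid) - k2)                  # 1-based: k1+1 <= j <= N-k2
    mse = [ebbs_bias(h_grid, estimates, j, k1, k2, T) ** 2 + variances[j] for j in js]
    return h_grid[k1 + int(np.argmin(mse))]

# Synthetic illustration: N = 18 log-spaced grid values (endpoints are assumptions).
grid = np.geomspace(0.02, 0.5, 18)
est = 2.0 + 3.0 * grid ** 2        # pretend estimates with bias exactly 3 h^2
var = 1e-4 / grid                  # pretend variance estimates, decreasing in h
h_star = ebbs_select(grid, est, var)
print(h_star)
```

Only the variance input changes between the original and the modified EBBS methods, so the same bias routine can serve both.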
For a given $h(j) \in \mathcal{H}$, this estimator is obtained by combining an empirical estimator of $Bias(h(j);\Omega)$, the exact conditional bias of $c^T\hat\beta_{\Omega,h(j)}$, with a residual-based estimator of $Var(h(j);\Omega)$, the exact conditional variance of $c^T\hat\beta_{\Omega,h(j)}$. Specifics are provided below.

The modified EBBS method uses a similar bias-estimation scheme to that employed in the EBBS method in order to estimate $Bias(h(j);\Omega)$. This scheme relies on the following asymptotic bias approximation:

$$E(c^T\hat\beta_{\Omega,h}|X,Z) = c^T\beta + \sum_{t=2}^{T+1} a_t h^t + o(h^{1+T}), \tag{6.13}$$

which parallels (6.7), and yields the estimator $\widehat{Bias}(h;\Omega)$.

However, the modified EBBS method can no longer rely on the residual estimation scheme utilized in the EBBS method for estimating $Var(h(j);\Omega)$. The reason for this is that $Var(h(j);\Omega)$ depends not only on the error variance $\sigma_\epsilon^2$, but also on the error correlation matrix $\Psi$. For $\Omega = I$, we propose to estimate $Var(h;\Omega)$ via:

$$\widehat{Var}(h;\Omega) = \begin{cases} \hat\sigma_\epsilon^2\, c^T M_{\Omega,h}\Psi M_{\Omega,h}^T c, & \text{if } \Psi \text{ is known and } \sigma_\epsilon^2 \text{ is unknown;} \\ \sigma_\epsilon^2\, c^T M_{\Omega,h}\hat\Psi M_{\Omega,h}^T c, & \text{if } \Psi \text{ is unknown and } \sigma_\epsilon^2 \text{ is known;} \\ \hat\sigma_\epsilon^2\, c^T M_{\Omega,h}\hat\Psi M_{\Omega,h}^T c, & \text{if } \Psi \text{ is unknown and } \sigma_\epsilon^2 \text{ is unknown.} \end{cases} \tag{6.14}$$

For $\Omega = \Psi^{-1}$, if $\sigma_\epsilon^2$ is unknown, we propose to estimate $Var(h;\Omega)$ via:

$$\widehat{Var}(h;\Omega) = \hat\sigma_\epsilon^2\, c^T M_{\Omega,h}\Psi M_{\Omega,h}^T c. \tag{6.15}$$

The estimators in (6.14)-(6.15) have been obtained from (6.1) by substituting $\hat\sigma_\epsilon^2$ for $\sigma_\epsilon^2$ and $\hat\Psi$ for $\Psi$ whenever appropriate. Details on how to obtain reasonable estimators of $\sigma_\epsilon^2$ and $\Psi$ are provided in Section 6.3.2.

In summary, the modified EBBS method finds the minimizer:

$$\hat h^{\Omega}_{MSE} = \arg\min_{h \in \mathcal{H}} \left\{\widehat{Bias}(h;\Omega)^2 + \widehat{Var}(h;\Omega)\right\} = \hat h^{\Omega}_{EBBS-L}, \tag{6.16}$$

with $\mathcal{H} = \{h(1), \ldots, h(N)\}$, $\widehat{Bias}(h;\Omega)$ obtained via the bias-estimation scheme described earlier, and $\widehat{Var}(h;\Omega)$ as in (6.14) if $\Omega = I$ or as in (6.15) if $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ is unknown. Here, the label 'EBBS-L' denotes the fact that the modified EBBS method estimates $Bias(h;\Omega)$ by local ordinary least squares regression. It is possible to estimate $Bias(h;\Omega)$ by performing global, rather than local, ordinary least squares fitting.
Specifically, we can perform just one least squares regression, using the 'data' $\{(h(k), c^T\hat\beta_{\Omega,h(k)}) : k = 1, \ldots, N\}$. We refer to the method that finds the minimizer of (6.16), with $\widehat{Bias}(h;\Omega)$ obtained by global ordinary least squares fitting, as the global modified EBBS method. We denote the amount of smoothing this method yields by $\hat h^{\Omega}_{EBBS-G}$.

Before concluding this section, we indicate how the modified EBBS methods can be generalized if one is interested in smoothing parameter selection for accurate estimation of $c^T\beta$ via the usual or modified local polynomial backfitting estimators. For simplicity, in this section only, we denote both of these estimators by $c^T\hat\beta_{\Omega,h}$. The variance-estimation scheme to be used in the generalized modified EBBS methods should be the same as that employed in (6.14)-(6.15). Obviously, the quantities $\hat\sigma_\epsilon^2$, $M_{\Omega,h}$ and $\hat\Psi$ involved in these equations should be computed based on locally polynomial regression of degree $D > 1$, instead of locally linear regression. We conjecture that the bias-estimation scheme would have to rely on the asymptotic approximation

$$E(c^T\hat\beta_{\Omega,h}|X,Z) = c^T\beta + \sum_{t=D+1}^{T+1} a_t h^t + o(h^{1+T})$$

instead of (6.13). Note that we must have $T \ge D$.

6.2.3 Plug-in method

In this section we introduce yet another method for estimating the optimal amount of smoothing $h^{\Omega}_{MSE}$ in the presence of error correlation whenever $\Omega = I$ or $\Omega = \Psi^{-1}$, namely the non-asymptotic plug-in method. Recall that $h^{\Omega}_{MSE}$ was defined as the minimizer of $MSE(\cdot;\Omega)$ in (6.4). Thus, we might find a reasonable estimator for $h^{\Omega}_{MSE}$ by minimizing an estimator of $MSE(\cdot;\Omega)$ over a grid of possible values for the smoothing parameter $h$. We propose estimating $MSE(h(j);\Omega)$ by assembling plug-in estimators of its exact bias and variance components, $Bias(h(j);\Omega)$ and $Var(h(j);\Omega)$. More specifically, we propose to estimate $Var(h(j);\Omega)$ using (6.14) if $\Omega = I$ and (6.15) if $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ is unknown.
Furthermore, we propose to estimate $Bias(h(j);\Omega)$ using (6.3), but with $m$ replaced by an accurate estimator $\hat m$:

$$\widehat{Bias}(h;\Omega) = c^T M_{\Omega,h}\hat m. \tag{6.17}$$

Details on how to obtain an accurate estimator $\hat m$ of $m$ are provided in Section 6.3.1. As remarked before, when $\Omega = \Psi^{-1}$, $M_{\Omega,h}$ depends upon the error correlation matrix $\Psi$. Thus, if $\Psi$ is unknown, we must substitute $\hat\Psi$ for $\Psi$ in the expression for $M_{\Omega,h}$, where $\hat\Psi$ is obtained as in Section 6.3.2.

Finally, minimizing the estimator of $MSE(\cdot;\Omega)$ obtained by combining (6.17) with (6.14) for $\Omega = I$, or (6.15) for $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ unknown, over a grid of possible values for $h$ yields the desired plug-in estimator of $h^{\Omega}_{MSE}$:

$$\hat h^{\Omega}_{MSE} = \arg\min_{h(j) \in \mathcal{H}} \left\{\widehat{Bias}(h(j);\Omega)^2 + \widehat{Var}(h(j);\Omega)\right\} = \hat h^{\Omega}_{PLUG-IN}. \tag{6.18}$$

6.3 Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$

Here we introduce methods for (1) accurately estimating the nonparametric component $m$ in model (2.1) in the presence of error correlation and (2) estimating the variance $\sigma_\epsilon^2$ and the correlation matrix $\Psi$ of the errors associated with model (2.1). Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$ is difficult because of the confounding between the linear, non-linear and correlation effects. We hope that the combined way of estimating $m$, $\sigma_\epsilon^2$ and $\Psi$ proposed in this thesis will enable us to do well when estimating $\beta$.

6.3.1 Estimating $m$

In this section, we consider the issue of accurately estimating the nonparametric component $m$ in model (2.1) when the model errors are correlated. Recall that we need an accurate estimator of $m$ for estimating the exact conditional bias of $c^T\hat\beta_{\Omega,h}$ in the plug-in method in (6.17). We propose estimating $m$ via $\hat m_{\Omega,h}$, with $\Omega = I$ and with $h$ chosen by cross-validation, modified for correlated errors. Throughout this section, we thus consider that $\Omega = I$. We also let $X_i^T = (1, X_{i1}, \ldots, X_{ip})$ denote the $i$th row of the matrix $X$ in (2.2).
To assess the accuracy of $\hat m_{I,h}$ as an estimator of $m$ for a given amount of smoothing $h$, we use the mean average squared error of $\hat m_{I,h}$:

$$\begin{aligned} MASE(h;I) &= E\left[\frac{1}{n}\sum_{i=1}^n \left(\hat m_{I,h}(Z_i) - m(Z_i)\right)^2 \,\Big|\, X, Z\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^n \left(\hat m_{I,h}(Z_i) + X_i^T\beta - m(Z_i) - X_i^T\beta\right)^2 \,\Big|\, X, Z\right] \\ &= E\left[\frac{1}{n}\sum_{i=1}^n \left(\hat Y_i - E(Y_i|X_i,Z_i)\right)^2 \,\Big|\, X, Z\right], \end{aligned} \tag{6.19}$$

where $\hat Y_i = \hat m_{I,h}(Z_i) + X_i^T\beta$ and $E(Y_i|X_i,Z_i) = m(Z_i) + X_i^T\beta$. We define the MASE-optimal amount of smoothing for accurate estimation of $m$ via $\hat m_{I,h}$ as:

$$h^{I}_{MASE} = \arg\min_h MASE(h;I). \tag{6.20}$$

From (6.19) we can see that $h^{I}_{MASE}$ depends on the bias and variance associated with estimating the non-parametric component $m$, which in turn depend on $m$ itself. Since in practice $m$ is unknown, $h^{I}_{MASE}$ has to be estimated.

To estimate $h^{I}_{MASE}$, we propose using the modified (or leave-$(2l+1)$-out) cross-validation method originally formulated by Hart and Vieu (1990) in the context of density estimation and studied by Chu and Marron (1991) and Hardle and Vieu (1992) in the context of nonparametric regression with correlated errors. Aneiros Perez and Quintela del Rio (2001b) recommend modified cross-validation in the context of partially linear models with $\alpha$-mixing errors. These authors used a version of the Speckman estimator with boundary-adjusted Gasser-Müller weights to estimate $m$.

The modified cross-validation method estimates $h^{I}_{MASE}$ by minimizing an estimator of $MASE(h;I)$:

$$\widehat{MASE}(h;I) = \frac{1}{n}\sum_{i=1}^n \left(\hat Y_i^{(-i;l)} - Y_i\right)^2. \tag{6.21}$$

This estimator is obtained from (6.19) by dropping the outer expectation sign, substituting $E(Y_i|X_i,Z_i)$ with $Y_i$, and replacing $\hat Y_i$ with $\hat Y_i^{(-i;l)}$, a prediction of $Y_i = X_i^T\beta + m(Z_i) + \epsilon_i$ based on data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ which are believed to be uncorrelated with $Y_i$. More specifically,

$$\hat Y_i^{(-i;l)} = X_i^T\hat\beta_{I,h}^{(-i;l)} + \hat m_{I,h}^{(-i;l)}(Z_i), \tag{6.22}$$

where $\hat\beta_{I,h}^{(-i;l)}$ and $\hat m_{I,h}^{(-i;l)}(Z_i)$ are estimators of $\beta$ and $m(Z_i)$ obtained from the data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ with $j$ such that $|i - j| > l$.
The estimation procedure used for obtaining $\hat\beta_{I,h}^{(-i;l)}$ and $\hat m_{I,h}^{(-i;l)}(Z_i)$ is the same as that utilized for obtaining $\hat\beta_{I,h}$ and $\hat m_{I,h}(Z_i)$. Recall that the estimation procedure utilized for obtaining the estimators $\hat\beta_{I,h}$ and $\hat m_{I,h}(Z_i)$ of $\beta$ and $m(Z_i)$ is the usual backfitting algorithm, with a (centered) local linear smoother matrix in the smoothing step. However, the backfitting algorithm allows us to evaluate $\hat m_{I,h}^{(-i;l)}(\cdot)$ only at $Z_j$'s with $j$ such that $|i - j| > l$. We cannot evaluate $\hat m_{I,h}^{(-i;l)}(\cdot)$ at $Z_i$. To overcome this problem, we propose to estimate $\beta$ and $m(Z_i)$ as indicated below.

We first carry out the usual backfitting algorithm on all data to obtain the estimator $\hat\beta_{I,h}$ of $\beta$ using all $n$ data points. We then define the partial residuals:

$$r_{j,h} = Y_j - X_j^T\hat\beta_{I,h}, \quad j = 1, \ldots, n. \tag{6.23}$$

From now on, these residuals will become working responses for the modified cross-validation and our 'data set' is $(r_{j,h}, Z_j)$, $j = 1, \ldots, n$. Fix $i$, $1 \le i \le n$. We temporarily remove from the 'data' the $(2l+1)$ 'data points' $(r_{j,h}, Z_j)$ with $|i - j| \le l$. We use the remaining $n - (2l+1)$ data points in a usual local linear regression to obtain the estimator $\hat m_{I,h}^{*(-i;l)}(Z_i)$ and the $n - (2l+1)$ estimators $\hat m_{I,h}^{*(-i;l)}(Z_j)$, with $j$ such that $|i - j| > l$. These estimators are not centered. Subtracting the average of the $\hat m_{I,h}^{*(-i;l)}(Z_j)$, with $j$ such that $|i - j| > l$, from $\hat m_{I,h}^{*(-i;l)}(Z_i)$ yields a centered estimator for $m(Z_i)$:

$$\hat m_{I,h}^{(-i;l)}(Z_i) = \hat m_{I,h}^{*(-i;l)}(Z_i) - \frac{1}{\#\{j : |i - j| > l\}}\sum_{j : |i - j| > l}\hat m_{I,h}^{*(-i;l)}(Z_j). \tag{6.24}$$

The centering approach used above is admittedly ad hoc, but nevertheless attempts to address the need of subjecting $m(\cdot)$ to an appropriate identifiability restriction. Next, we use the estimators in (6.24) in a computationally feasible modified cross-validation criterion:

$$MCV_l(h) = \frac{1}{n}\sum_{i=1}^n \left(r_{i,h} - \hat m_{I,h}^{(-i;l)}(Z_i)\right)^2. \tag{6.25}$$

Minimizing this criterion yields the desired cross-validation amount of smoothing for accurate estimation of $m$ via $\hat m_{I,h}$ when the model errors are correlated.
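The leave-$(2l+1)$-out steps leading to (6.24) and (6.25) can be sketched as follows. This is a minimal illustration under explicit assumptions, not the thesis code: the Epanechnikov kernel, the function names `local_linear_fit` and `mcv`, and the simulated AR(1) errors are all choices made here for concreteness. For simplicity the working responses `r` are held fixed across bandwidths, whereas the partial residuals $r_{j,h}$ in (6.23) would be recomputed from the backfitting fit for each $h$.

```python
import numpy as np

def local_linear_fit(z0, z, r, h):
    """Local linear estimate at z0 (Epanechnikov kernel, an assumed choice)."""
    d = z - z0
    u = d / h
    w = np.where(np.abs(u) < 1, 0.75 * (1 - u ** 2), 0.0)
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d ** 2).sum()
    denom = s0 * s2 - s1 ** 2
    if denom <= 1e-10 * s0 * s2:          # too few points in the window
        return np.nan
    return float(((s2 - s1 * d) * w * r).sum() / denom)

def mcv(h, z, r, l):
    """Modified cross-validation criterion MCV_l(h) of (6.25) for responses r."""
    n = len(z)
    idx = np.arange(n)
    total = 0.0
    for i in range(n):
        keep = np.abs(idx - i) > l        # drop the (2l+1) points nearest in index
        zk, rk = z[keep], r[keep]
        fit_i = local_linear_fit(z[i], zk, rk, h)
        fits_j = np.array([local_linear_fit(zj, zk, rk, h) for zj in zk])
        centred = fit_i - fits_j.mean()   # centering step (6.24)
        total += (r[i] - centred) ** 2
    return total / n

# Synthetic illustration: smooth signal plus AR(1) errors (all values assumed).
rng = np.random.default_rng(1)
n, l = 80, 2
z = np.arange(1, n + 1) / (n + 1)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.normal(scale=0.3)
r = np.sin(2 * np.pi * z) + eps
h_grid = np.array([0.08, 0.12, 0.2, 0.3])
scores = np.array([mcv(h, z, r, l) for h in h_grid])
h_cv = h_grid[np.nanargmin(scores)]
print(dict(zip(h_grid, scores.round(4))), h_cv)
```

Rerunning the last lines for several values of `l` gives a quick view of how sensitive the selected bandwidth is to that choice.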
Note that it is possible to compute a full scale modified cross-validation criterion, by calculating a different estimator of $\beta$ for each $i$. Specifically, we could replace $\hat\beta_{I,h}$ in the right side of (6.23) with $\hat\beta_{I,h}^{(-i;l)}$, the estimator obtained from all data less those data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ with $j$ such that $|i - j| \le l$. However, computing the full scale modified cross-validation criterion would be more involved than computing the computationally convenient criterion in (6.25). Given that $\beta$ is easier to estimate than $m$, we believe that the computational simplification used to estimate $\beta$ will not greatly affect the estimation of $m$. A similar simplification was used by Aneiros Perez and Quintela del Rio (2001b) for their modified cross-validation method.

Although we do not have theoretical results that establish the properties of the modified cross-validation method, our simulation study suggests that it has reasonable finite sample performance and that it produces a reliable estimator of $m$, provided $l$ is taken to be large enough. It is not clear how to best choose $l$ in practice. Recall that $l$ should be specified such that the correlation between $Y_i$ and $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$, with $|i - j| \le l$, is effectively removed when predicting $Y_i$ by the value $\hat Y_i^{(-i;l)}$ in (6.22). Choosing an $l$ value that is too small may not succeed in removing the correlation between these data values, therefore producing an undersmoothed estimator of $m$. Choosing an $l$ value that is too large may remove too much of the underlying systematic structure in the data, therefore producing an estimator of $m$ that is oversmoothed. Whenever possible, one should examine a whole range of values for $l$ to gain more understanding about the sensitivity of the final results to the choice of $l$. Our simulation study suggests that small values of $l$ should probably be avoided.
6.3.2 Estimating $\sigma_\epsilon^2$ and $\Psi$

In this section, we propose a method for estimating the variance $\sigma_\epsilon^2$ and correlation matrix $\Psi$ of the errors associated with model (2.1). The method we propose relies on assumption (A2), that the model errors follow a stationary autoregressive process of unknown, but finite, order. To estimate the order and the corresponding parameters of this process, we apply standard time series techniques to suitable estimators of the model errors. Monte Carlo simulation studies conducted in Chapter 9 indicate that this method performs reasonably well.

Assumption (A2) will clearly not be appropriate for all applications. However, we expect it to cover those situations where the errors can be assumed to be realizations of a stationary stochastic process. Indeed, it can be shown that almost any stationary stochastic process can be modelled as a unique infinite order autoregressive process, independent of the origin of the process. In practice, finite order autoregressive processes are sufficiently accurate because higher order parameters tend to become small and not significant for estimation (Bos, de Waele, Broersen, 2002).

If the $\epsilon_i$'s were observed, we could estimate the order $R$ of the autoregressive process they are assumed to follow by using the finite sample criterion for autoregressive order selection developed by Broersen (2000). This criterion selects the order of the process by achieving a trade-off between over-fitting (selecting an order that is too large) and under-fitting (selecting an order that is too small). Traditional autoregressive order selection criteria either fail to resolve these issues (i.e., the Akaike Information Criterion) or address just the issue of over-fitting (i.e., the corrected Akaike Information Criterion). In addition, Broersen's criterion performs well even when the order of the autoregressive error process is large.
After estimating $R$, we could estimate the error variance $\sigma_\epsilon^2$ and the corresponding autoregressive parameters $\phi_1, \ldots, \phi_R$ by using Burg's algorithm. This algorithm is described, for instance, in Brockwell and Davis (1991). A comparison of various methods for autoregressive parameter estimation has shown that the Burg algorithm is the preferred method (Broersen, 2000).

Finally, we could estimate the error correlation matrix $\Psi$ by replacing $\phi_1, \ldots, \phi_R$ with their estimated values in the expression for $\Psi$ provided in Comment 2.2.1. For instance, if $R$ was estimated to be 1, we would estimate the $(i,j)$th element of $\Psi$ as:

$$\hat\Psi_{ij} = \hat\phi_1^{|i-j|},$$

where $\hat\phi_1$ is the estimator of the autoregressive coefficient $\phi_1$.

However, the $\epsilon_i$'s are unobserved, so we must first estimate them via suitably defined model residuals and then apply the methodology described above to these residuals in order to obtain the desired estimators of $\sigma_\epsilon^2$ and $\Psi$. We propose to estimate the vector of errors $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ by the model residuals $\hat\epsilon_{I,h} = Y - X\hat\beta_{I,h} - \hat m_{I,h}$, where $h$ is chosen by modified cross-validation, as described in Section 6.3.1. As argued in Section 6.3.1, this choice of $h$ is expected to provide an accurate estimator for $X\beta + m$, and therefore a reasonable estimator for $\epsilon = Y - X\beta - m$.

For those applications where the reasonableness of assumption (A2) is questionable, we believe that one could still use the modified cross-validation residuals to estimate the model errors, since the modified cross-validation method does not rely on explicitly incorporating the error correlation structure. For instance, under the more general assumption (A1), one could estimate $\sigma_\epsilon^2$ and $\Psi = (\Psi_{ij})$ from $\hat\epsilon_{I,h} = (\hat\epsilon_1, \ldots, \hat\epsilon_n)^T$ as follows: $\hat\sigma_\epsilon^2$ is computed from the squared residuals $\hat\epsilon_t^2$, $t = 1, \ldots, n$, and, for $i \ne j$, $\hat\Psi_{ij}$ is computed from the lagged residual products $\hat\epsilon_t\hat\epsilon_{t+|i-j|}$, $t = 1, \ldots, n - |i - j|$, that is, as a sample autocorrelation of the residuals at lag $|i - j|$. However, we do not pursue this approach in this thesis.

6.4 Choosing $h$ for $c^T\hat\beta_{\hat\Psi^{-1},S_h^c}$

We conclude this chapter by discussing the choice of smoothing parameter $h$ for the estimated modified local linear backfitting estimator $c^T\hat\beta_{\hat\Psi^{-1},S_h^c}$.
As indicated in Section 6.1, we denote $\hat\beta_{\hat\Psi^{-1},S_h^c}$ by $\hat\beta_{\hat\Psi^{-1},h}$ to emphasize its dependence upon $h$. Our theoretical goal is to choose values of $h$ which minimize measures of accuracy for $c^T\hat\beta_{\hat\Psi^{-1},h}$ similar to those introduced for $c^T\hat\beta_{I,h}$ and $c^T\hat\beta_{\Psi^{-1},h}$. Namely, if $Bias(h;\hat\Psi^{-1}) = c^T Bias(\hat\beta_{\hat\Psi^{-1},h}|X,Z)$ and $Var(h;\hat\Psi^{-1}) = c^T Var(\hat\beta_{\hat\Psi^{-1},h}|X,Z)c$, we wish to choose the value of $h$ that minimizes the quantity $MSE(h;\hat\Psi^{-1})$, obtained by taking $\Omega = \hat\Psi^{-1}$ in (6.4). Denote this value by $h^{\hat\Psi^{-1}}_{MSE}$. In practice, we have to estimate this value from the data. The difficulty that we face is that, since $\hat\Psi$ is estimated and thus random, an expression for $MSE(h;\hat\Psi^{-1})$ is not tractable. To avert this issue, we ignore the effect of estimating $\Psi$ and simply replace $\hat\Psi$ by $\Psi$ in the expression for $MSE(h;\hat\Psi^{-1})$. We have seen earlier in this thesis that, under certain conditions, $\hat\beta_{\Psi^{-1},h}$ and $\hat\beta_{\hat\Psi^{-1},h}$ are asymptotically 'close', so we expect our approach to be reasonable for large sample sizes.

We propose to choose $h$ using suitable modifications of the EBBS and plug-in methods discussed in Sections 6.2.2 and 6.2.3. The global and local modified EBBS methods for choosing the smoothing parameter $h$ of $c^T\hat\beta_{\hat\Psi^{-1},h}$ attempt to estimate $h^{\hat\Psi^{-1}}_{MSE}$:

$$\hat h^{\hat\Psi^{-1}}_{MSE} = \arg\min_{h \in \mathcal{H}} \left\{\widehat{Bias}(h;\hat\Psi^{-1})^2 + \widehat{Var}(h;\hat\Psi^{-1})\right\}. \tag{6.26}$$

For both methods, $\widehat{Var}(h;\hat\Psi^{-1})$ is computed by substituting $\Psi$ with $\hat\Psi$ into the expression of $Var(h;\Psi^{-1})$, the exact conditional variance of $c^T\hat\beta_{\Psi^{-1},h}$. This expression is obtained by taking $\Omega = \Psi^{-1}$ in (6.1). The global modified EBBS method estimates $Bias(h;\hat\Psi^{-1})$ empirically by fitting a global ordinary least squares regression model to the 'data' points $\{(h(k), c^T\hat\beta_{\hat\Psi^{-1},h(k)}) : k = 1, \ldots, N\}$. We denote the amount of smoothing this method yields by $\hat h^{\hat\Psi^{-1}}_{EBBS-G}$. The local modified EBBS method uses only a fraction of these data to accomplish the same task. We denote the amount of smoothing supplied by this method by $\hat h^{\hat\Psi^{-1}}_{EBBS-L}$.
The plug-in method for choosing the value of $h$ in $\hat{\beta}_{\hat{\Psi}^{-1},h}$ tries to estimate (an approximation to) $h_{MSE}^{\hat{\Psi}^{-1}}$:

$$\hat{h}_{MSE}^{\hat{\Psi}^{-1}} = \arg\min_{h \in \mathcal{H}}\left\{\widehat{Bias}(h;\hat{\Psi}^{-1})^2 + \widehat{Var}(h;\hat{\Psi}^{-1})\right\} = \hat{h}^{\hat{\Psi}^{-1}}_{\text{PLUG-IN}}. \tag{6.27}$$

Here, $\widehat{Var}(h;\hat{\Psi}^{-1})$ is as above, and $\widehat{Bias}(h;\hat{\Psi}^{-1})$ is constructed by substituting $\Psi$ with $\hat{\Psi}$ into the expression of $Bias(h;\Psi^{-1})$, the exact conditional bias of $\hat{\beta}_{\Psi^{-1},h}$. This expression is obtained by taking $\Omega = \Psi^{-1}$ in (6.3).

Chapter 7

Confidence Interval Estimation and Hypothesis Testing

In this chapter, we develop statistical methods for assessing the magnitude and statistical significance of a linear combination of linear effects $c^T\beta$ in model (2.1), where $c = (c_0, \ldots, c_p)^T$ is a known vector with real-valued components. Specifically, we propose several confidence intervals for assessing the magnitude of $c^T\beta$, as well as several tests of hypotheses for testing whether $c^T\beta$ is significantly different from some known value of interest.

7.1 Confidence Interval Estimation

We propose to construct approximate $100(1-\alpha)\%$ confidence intervals for $c^T\beta$ from the usual, modified or estimated modified local linear backfitting estimators considered in this thesis, and their associated estimated standard errors. In what follows, we use the notation in Section 6.1 to denote these estimators generically by $c^T\hat{\beta}_{\Omega,h}$, where $\Omega$ can be $I$, $\Psi^{-1}$ or $\hat{\Psi}^{-1}$, respectively, and $h$ is an amount of smoothing that must be chosen from the data. Our confidence intervals use an estimated standard error $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ obtained as follows:

$$\widehat{SE}(c^T\hat{\beta}_{\Omega,h}) = \sqrt{\widehat{Var}(h;\Omega)}. \tag{7.1}$$

Here, $\widehat{Var}(h;\Omega)$ is an estimator of $Var(h;\Omega)$, the conditional variance of $c^T\hat{\beta}_{\Omega,h}$ given $X$ and $Z$. Specifically, for $\Omega = I$, $\widehat{Var}(h;\Omega)$ is defined as in (6.14). For $\Omega = \Psi^{-1}$, if $\sigma_\epsilon^2$ is unknown, $\widehat{Var}(h;\Omega)$ is defined as in (6.15). Finally, for $\Omega = \hat{\Psi}^{-1}$, $\widehat{Var}(h;\Omega)$ is obtained from (6.15) by replacing $\Psi$ with $\hat{\Psi}$.
Note that the standard error expression in (7.1) does not account for the estimation of $\sigma_\epsilon^2$ and $\Psi$ when these quantities are unknown, nor does it account for the data-dependent choice of $h$. Rather, it is a purely 'plug-in' expression.

The performance of a $100(1-\alpha)\%$ confidence interval for $c^T\beta$ depends to a great extent on how well we choose the smoothing parameter $h$ of the estimator $c^T\hat{\beta}_{\Omega,h}$. A poor choice of $h$ can affect the mean squared error of $c^T\hat{\beta}_{\Omega,h}$, resulting in a confidence interval with poor coverage and/or length properties. We want to choose an $h$ for which (i) the bias of $c^T\hat{\beta}_{\Omega,h}$ is small, so the interval is centered near the true $c^T\beta$, and (ii) the variance of $c^T\hat{\beta}_{\Omega,h}$ is small, so the interval has small length. Choosing $h$ to ensure that the confidence interval is valid (in the sense of achieving the nominal coverage) and has reasonable length is crucial to the quality of inferences about $c^T\beta$.

In this thesis, we choose the amount of smoothing $h$ needed for constructing confidence intervals for $c^T\beta$ via the following data-driven choices of $h$, introduced in Chapter 6:

1. the (local) modified EBBS choice, $\hat{h}^{\Omega}_{\text{EBBS-L}}$;
2. the global modified EBBS choice, $\hat{h}^{\Omega}_{\text{EBBS-G}}$;
3. the (non-asymptotic) plug-in choice, $\hat{h}^{\Omega}_{\text{PLUG-IN}}$.

Recall that each of these choices is expected to yield an accurate estimator $c^T\hat{\beta}_{\Omega,h}$ of $c^T\beta$. Throughout the rest of this chapter, unless otherwise specified, we assume that the smoothing parameter $h$ of the estimator $c^T\hat{\beta}_{\Omega,h}$ refers to any of $\hat{h}^{\Omega}_{\text{EBBS-L}}$, $\hat{h}^{\Omega}_{\text{EBBS-G}}$ or $\hat{h}^{\Omega}_{\text{PLUG-IN}}$.

The performance of a $100(1-\alpha)\%$ confidence interval for $c^T\beta$ also depends on how well we estimate $SE(c^T\hat{\beta}_{\Omega,h})$, the true standard error of $c^T\hat{\beta}_{\Omega,h}$. As already mentioned, we will estimate $SE(c^T\hat{\beta}_{\Omega,h})$ by $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ as defined in (7.1). Recall that $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ depends on another smoothing parameter, needed for estimating $\Psi$ via $\hat{\Psi}$, as described in Section 6.3.2.
It is not clear whether the modified cross-validation choice of smoothing proposed in Section 6.3.2 yields a reasonable estimator of $SE(c^T\hat{\beta}_{\Omega,h})$. The Monte Carlo simulations presented in Chapter 8 will shed more light on this issue.

The standard $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by

$$c^T\hat{\beta}_{\Omega,h} \pm z_{\alpha/2}\,\widehat{SE}(c^T\hat{\beta}_{\Omega,h}), \tag{7.2}$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)\%$ quantile of the standard normal distribution.

According to the asymptotic results in this thesis, the estimator $c^T\hat{\beta}_{\Omega,h}$ is biased in finite samples. Consequently, the standard confidence interval for $c^T\beta$ may not be correctly centered and may not provide $1-\alpha$ coverage. We propose two strategies for dealing with this problem.

One strategy is to perform a bias adjustment to the estimator $c^T\hat{\beta}_{\Omega,h}$, to try to ensure that the confidence interval is better centered. This approach, referred to as bias-adjusted confidence interval construction, is discussed in Section 7.1.1.

Another strategy is to perform an adjustment to the estimated standard error of $c^T\hat{\beta}_{\Omega,h}$. The purpose of this adjustment is to inflate the estimated standard error of $c^T\hat{\beta}_{\Omega,h}$ to reflect the bias of $c^T\hat{\beta}_{\Omega,h}$. This approach, referred to as standard error-adjusted confidence interval construction, is discussed in Section 7.1.2.

Throughout, we assume we can use standard normal probability tables to construct the confidence interval in (7.2) and those proposed in Sections 7.1.1 - 7.1.2. This assumption is justified provided the estimator $c^T\hat{\beta}_{\Omega,h}$ is asymptotically normal and our standard error estimators are consistent. Opsomer and Ruppert (1999) established the asymptotic normality of the estimator $c^T\hat{\beta}_{\Omega,h}$ for the case when the model errors are uncorrelated and $\Omega = I$. However, no asymptotic normality results are available as yet for the cases when the model errors are correlated, for either $\Omega = I$ or more general $\Omega$. The simulations conducted in Chapter 8 support the use of normal tables when constructing 95% confidence intervals.
Note that, for small sample sizes, one might widen the confidence intervals by using t-tables instead of standard normal probability tables. The issue of how one might specify the degrees of freedom involved in these t-tables needs to be considered carefully and is beyond the scope of this thesis.

7.1.1 Bias-Adjusted Confidence Interval Construction

The idea underlying the bias-adjusted confidence interval estimation of $c^T\beta$ is to first adjust the estimator $c^T\hat{\beta}_{\Omega,h}$ for possible finite sample bias effects. Then a bias-adjusted $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by:

$$c^T\hat{\beta}_{\Omega,h} - \widehat{Bias}(c^T\hat{\beta}_{\Omega,h}) \pm z_{\alpha/2}\,\widehat{SE}(c^T\hat{\beta}_{\Omega,h}), \tag{7.3}$$

where $\widehat{Bias}(c^T\hat{\beta}_{\Omega,h})$ estimates the finite sample conditional bias of $c^T\hat{\beta}_{\Omega,h}$, given $X$ and $Z$, and is defined either as in (6.17) for $h = \hat{h}^{\Omega}_{\text{PLUG-IN}}$, or as in (6.10) for $h = \hat{h}^{\Omega}_{\text{EBBS-G}}$ and $h = \hat{h}^{\Omega}_{\text{EBBS-L}}$. Neither of these bias expressions takes into account the data-dependent choice of $h$. Furthermore, these bias expressions do not account for the estimation of $\Psi$ when $\Omega = \hat{\Psi}^{-1}$.

The length of the bias-adjusted confidence interval for $c^T\beta$ in (7.3) is the same as that of the standard confidence interval in (7.2). The coverage properties of the bias-adjusted confidence interval may, however, be better than those of the standard confidence interval, because the bias-adjusted confidence interval may be better centered.

Note that the estimated standard error $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ in (7.3) reflects the variability of $c^T\hat{\beta}_{\Omega,h}$, instead of the variability of $c^T\hat{\beta}_{\Omega,h} - \widehat{Bias}(c^T\hat{\beta}_{\Omega,h})$. One could, of course, replace $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ by an estimator of the true standard error of $c^T\hat{\beta}_{\Omega,h} - \widehat{Bias}(c^T\hat{\beta}_{\Omega,h})$. But such an estimator may be difficult to obtain in practice, unless one resorts to computationally expensive bootstrapping methods, and may not necessarily yield a confidence interval with better coverage properties than those of the standard confidence interval.
7.1.2 Standard Error-Adjusted Confidence Interval Construction

We have suggested in Section 7.1.1 that the standard confidence interval for $c^T\beta$ in (7.2) can be improved upon by replacing $c^T\hat{\beta}_{\Omega,h}$ with its bias-adjusted version $c^T\hat{\beta}_{\Omega,h} - \widehat{Bias}(c^T\hat{\beta}_{\Omega,h})$. Another possible way to improve upon the standard confidence interval in (7.2) is to replace $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$ with $\sqrt{\widehat{MSE}(c^T\hat{\beta}_{\Omega,h})}$, the square root of the estimated conditional mean squared error of $c^T\hat{\beta}_{\Omega,h}$ given $X$ and $Z$. The motivation for this latter adjustment is that, compared to $\widehat{SE}(c^T\hat{\beta}_{\Omega,h})$, $\sqrt{\widehat{MSE}(c^T\hat{\beta}_{\Omega,h})}$ is a better measure of the uncertainty associated with estimating $c^T\beta$ via $c^T\hat{\beta}_{\Omega,h}$, as it tries to account for the finite sample bias of $c^T\hat{\beta}_{\Omega,h}$.

A standard error-adjusted $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by:

$$c^T\hat{\beta}_{\Omega,h} \pm z_{\alpha/2}\sqrt{\widehat{MSE}(c^T\hat{\beta}_{\Omega,h})}, \tag{7.4}$$

where

$$\widehat{MSE}(c^T\hat{\beta}_{\Omega,h}) = \left[\widehat{Bias}(h;\Omega)\right]^2 + \left[\widehat{SE}(c^T\hat{\beta}_{\Omega,h})\right]^2. \tag{7.5}$$

Here, $\widehat{Bias}(h;\Omega)$ estimates the conditional bias of $c^T\hat{\beta}_{\Omega,h}$ given $X$ and $Z$, and is defined either as in (6.17) for $h = \hat{h}^{\Omega}_{\text{PLUG-IN}}$, or as in (6.10) for $h = \hat{h}^{\Omega}_{\text{EBBS-G}}$ and $h = \hat{h}^{\Omega}_{\text{EBBS-L}}$.

Note that the length of the standard error-adjusted confidence interval for $c^T\beta$ in (7.4) is at least that of the standard confidence interval in (7.2), due to the fact that $\sqrt{\widehat{MSE}(c^T\hat{\beta}_{\Omega,h})} \geq \widehat{SE}(c^T\hat{\beta}_{\Omega,h})$. This may translate into improved coverage properties for the standard error-adjusted confidence interval.

7.2 Hypothesis Testing

In this section, we exploit the duality between confidence interval estimation and hypothesis testing to develop tests of hypotheses for $c^T\beta$.

Suppose we are interested in testing the null hypothesis

$$H_0 : c^T\beta = \delta \tag{7.6}$$

against the alternative hypothesis

$$H_A : c^T\beta \neq \delta, \tag{7.7}$$

where $\delta$ is a constant.
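Before turning to the corresponding test statistics, the three interval constructions of Section 7.1, namely (7.2), (7.3) and (7.4)-(7.5), can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions: the function name `confidence_intervals` and the numerical inputs are hypothetical, and the point estimate, estimated bias and estimated standard error are taken as given rather than computed from data.

```python
from math import sqrt
from statistics import NormalDist


def confidence_intervals(estimate: float, bias_hat: float, se_hat: float,
                         alpha: float = 0.05) -> dict:
    """Return the standard, bias-adjusted and standard error-adjusted
    100(1 - alpha)% intervals as (lower, upper) tuples."""
    # Upper alpha/2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # (7.2): centred at the estimate, half-width z * SE.
    standard = (estimate - z * se_hat, estimate + z * se_hat)

    # (7.3): same half-width, centre shifted by the estimated bias.
    centre = estimate - bias_hat
    bias_adjusted = (centre - z * se_hat, centre + z * se_hat)

    # (7.4)-(7.5): centred at the estimate, SE replaced by root estimated MSE.
    root_mse = sqrt(bias_hat ** 2 + se_hat ** 2)
    se_adjusted = (estimate - z * root_mse, estimate + z * root_mse)

    return {"standard": standard,
            "bias_adjusted": bias_adjusted,
            "se_adjusted": se_adjusted}


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
result = confidence_intervals(estimate=0.10, bias_hat=0.03, se_hat=0.04)
```

With these inputs the root estimated MSE is 0.05, so the standard error-adjusted interval is wider than the other two, which share the same length but differ in their centres.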
From the confidence intervals introduced in Section 7.1, we construct three test statistics for testing $H_0$ against $H_A$:

$$Z^{(1)}_{\Omega,h} = \frac{c^T\hat{\beta}_{\Omega,h} - \delta}{\widehat{SE}(c^T\hat{\beta}_{\Omega,h})}, \tag{7.8}$$

$$Z^{(2)}_{\Omega,h} = \frac{c^T\hat{\beta}_{\Omega,h} - \widehat{Bias}(h;\Omega) - \delta}{\widehat{SE}(c^T\hat{\beta}_{\Omega,h})}, \tag{7.9}$$

$$Z^{(3)}_{\Omega,h} = \frac{c^T\hat{\beta}_{\Omega,h} - \delta}{\sqrt{\widehat{MSE}(c^T\hat{\beta}_{\Omega,h})}}. \tag{7.10}$$

We will reject $H_0$ at significance level $\alpha$ if $|Z^{(i)}_{\Omega,h}| > z_{\alpha/2}$.

Chapter 8

Monte Carlo Simulations

In this chapter we report the results of a Monte Carlo study on the finite sample properties of estimators and confidence intervals for the linear effect $\beta_1$ in the model:

$$Y_i = \beta_0 + \beta_1 X_i + m(Z_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{8.1}$$

obtained by taking $p = 1$ in (2.1). Even though this model is not too complicated, we hope that it will allow us to understand how the properties of these estimators and confidence intervals will be affected by (1) dependency between the $X_i$'s and the $Z_i$'s, and (2) correlation amongst the $\epsilon_i$'s.

For our study, we have deliberately chosen to use a context similar to that considered by Opsomer and Ruppert (1999) for independent $\epsilon_i$'s, so that we can make direct comparisons. Given this context, the main goals of our simulation study were to:

1. Compare the expected log mean squared error (MSE) of the estimators for $\beta_1$.
2. Compare the performance of the confidence intervals for $\beta_1$ built from these estimators and their associated standard errors.

The rest of this chapter is organized as follows. In Section 8.1, we discuss how we generated the data in our simulation study. In Section 8.2, we provide an overview of the estimators for $\beta_1$ considered in this study. We also specify the methods used for choosing the smoothing parameters of these estimators. In Section 8.3, we compare the expected log mean squared errors (MSE) of the estimators for all simulation settings in our study. Finally, in Sections 8.4 and 8.5, we assess the coverage and length properties of various approximate 95% confidence intervals for $\beta_1$ constructed from these estimators and their associated approximate standard errors.
8.1 The Simulated Data

The data $(Y_i, X_i, Z_i)$, $i = 1, \ldots, n$, in our simulation study were generated from model (8.1) using a modification of the simulation setup adopted by Opsomer and Ruppert (1999). Specifically, we took the sample size $n$ to be 100 and set the values of the linear parameters $\beta_0$ and $\beta_1$ to zero. We considered two $m(\cdot)$ functions:

• $m_1(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$, $z \in [0,1]$;
• $m_2(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$, $z \in [0,1]$.

The $Z_i$'s were equally spaced on $[0,1]$, being defined as $Z_i = i/(n+1)$. Furthermore, $X_i = g(Z_i) + \eta_i$, with $g(z) = 0.4z + 0.3$, $z \in [0,1]$, and $\eta_i = (1 - 0.4)U_i - 0.3$, where the $U_i$'s were independent, identically distributed having a Unif(0,1) distribution. The $\epsilon_t$'s followed a stationary AR(1) model with normal distribution:

$$\epsilon_t = \rho\,\epsilon_{t-1} + u_t, \tag{8.2}$$

where $\rho$ is an autoregressive parameter quantifying the degree of correlation amongst the $\epsilon_t$'s. The $u_t$'s were independent, identically distributed normal random variables having mean 0 and standard deviation $\sigma_u = 0.5$. The $U_i$'s were independent of the $\epsilon_t$'s.

In our simulation study, we used $\rho = 0$ to include the case of independence, as well as $\rho = 0.2$, 0.4, 0.6 and 0.8 to model positive correlation ranging from weak to strong.

The simulation settings corresponding to $\rho = 0$ (the case of independent errors) are the same as those considered by Opsomer and Ruppert (1999), with the following exceptions: (i) we considered $n = 100$ instead of $n = 250$, (ii) we 'centered' the $m(\cdot)$ functions, that is, we subtracted a constant so that these functions integrate to 0 over the interval $[0,1]$ and (iii) we scaled the errors $\eta_i$ to have $E(\eta_i) = 0$ instead of $E(\eta_i) = 0.3$. Opsomer and Ruppert did not specify what value they used for $\beta_1$.

For each model configuration, we generated 500 data sets. Note that there are 10 model configurations altogether, one for each combination of autoregressive parameter $\rho$ and non-linear effect $m(\cdot)$ considered.
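The data-generating mechanism described above can be sketched in a few lines. This is a minimal sketch rather than the code used for the study: the function names and the seed argument are hypothetical, and the first error is drawn from the stationary distribution $N(0, \sigma_u^2/(1-\rho^2))$, which is one way (an assumption here) to obtain a stationary AR(1) sequence.

```python
import math
import random


def m1(z: float) -> float:
    """Centred non-linear effect m_1(z) = 2 sin(3z) - 2 (cos 0 - cos 3) / 3."""
    return 2 * math.sin(3 * z) - 2 * (math.cos(0) - math.cos(3)) / 3


def m2(z: float) -> float:
    """Centred non-linear effect m_2(z) = 2 sin(6z) - 2 (cos 0 - cos 6) / 6."""
    return 2 * math.sin(6 * z) - 2 * (math.cos(0) - math.cos(6)) / 6


def simulate(rho: float, m=m1, n: int = 100, beta0: float = 0.0,
             beta1: float = 0.0, sigma_u: float = 0.5, seed=None):
    """Generate one data set (Y, X, Z) from model (8.1) with AR(1) errors."""
    rng = random.Random(seed)

    # Equally spaced design points Z_i = i / (n + 1).
    z = [i / (n + 1) for i in range(1, n + 1)]

    # X_i = g(Z_i) + eta_i, g(z) = 0.4 z + 0.3, eta_i = 0.6 U_i - 0.3.
    x = [0.4 * zi + 0.3 + (1 - 0.4) * rng.random() - 0.3 for zi in z]

    # AR(1) errors; the stationary start below is an assumption.
    eps = [rng.gauss(0.0, sigma_u / math.sqrt(1 - rho ** 2))]
    for _ in range(n - 1):
        eps.append(rho * eps[-1] + rng.gauss(0.0, sigma_u))

    y = [beta0 + beta1 * xi + m(zi) + ei for xi, zi, ei in zip(x, z, eps)]
    return y, x, z


# One data set for rho = 0.4 and m_1; the study used 500 per configuration.
y, x, z = simulate(rho=0.4, m=m1, seed=1)
```

Looping over the five values of $\rho$ and the two $m(\cdot)$ functions would reproduce the 10 model configurations.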
Figure 8.1 displays data generated from model (8.1) for $\rho = 0$, 0.4, 0.8 and $m_1(z)$. Figure 8.2 provides the same display for $m_2(z)$. The responses $Y_i$ are qualitatively different for different values of $\rho$. For $\rho = 0$, the responses vary randomly about the $m(\cdot)$ curve. As $\rho$ increases from 0.4 to 0.8, the variation of the $Y_i$'s about the curve $m(\cdot)$ makes it virtually impossible to distinguish the non-linear signal $m(\cdot)$ from the autoregressive noise that masks it.

8.2 The Estimators

In this section, we provide an overview of the estimators for the linear effect $\beta_1$ in model (8.1) considered in our simulation study. We also provide an overview of the methods used for choosing the smoothing parameter of these estimators.

Note that $\beta_1 = c^T\beta$, where $c = (0,1)^T$ and $\beta = (\beta_0, \beta_1)^T$. The estimators of $\beta_1$ considered in our simulation study are of the form $\hat{\beta}_1 = c^T\hat{\beta}$, where $\hat{\beta}$ is:

(i) $\hat{\beta}_{I,S_h^c}$, the usual backfitting estimator defined in (3.4) with $\Omega = I$;
(ii) $\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$, the estimated modified backfitting estimator defined in (3.4) with $\Omega = \hat{\Psi}^{-1}$;
(iii) $\hat{\beta}_{(I-S_h^c)^T,S_h^c}$, the usual Speckman estimator defined in (3.4) with $\Omega = (I - S_h^c)^T$.

In all three estimators, $S_h^c$ is a centered smoother matrix, defined in terms of the Epanechnikov kernel in (3.9). For the two backfitting estimators, we take $S_h^c$ to be a centered local linear smoother matrix. For the usual Speckman estimator, we take $S_h^c$ to be a centered local constant smoother matrix with Nadaraya-Watson weights. The latter choice is motivated by the fact that the usual Speckman estimator is typically used with local constant smoother matrices with kernel weights. We are not sure to what extent the differences in performance between the usual Speckman estimator and the two backfitting estimators may be due to this difference in the method of local smoothing.

Note that $\hat{\beta}_{\Psi^{-1},S_h^c}$, the modified backfitting estimator obtained from (3.4) with $\Omega = \Psi^{-1}$, was omitted from our simulation study.
This estimator may have value as a benchmark, but has no practical value due to the fact that the error correlation matrix $\Psi$ is never fully known in applications. For similar reasons, we also omitted $\hat{\beta}_{(I-S_h^c)^T\Psi^{-1},S_h^c}$, the modified Speckman estimator obtained from (3.4) with $\Omega = (I - S_h^c)^T\Psi^{-1}$.

Another estimator not included in our study is $\hat{\beta}_{(I-S_h^c)^T\hat{\Psi}^{-1},S_h^c}$, the estimated modified Speckman estimator obtained from (3.4) with $\Omega = (I - S_h^c)^T\hat{\Psi}^{-1}$. Recall that Aneiros-Perez and Quintela-del-Rio (2001a) investigated the large sample properties of a similar estimator, based on local constant smoothing with Gasser-Muller weights. These authors have a suggestion for estimating $\Psi$ from the data, but they did not explore how well it works in practice. In our simulation study, the estimator $\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$, which is similar to $\hat{\beta}_{(I-S_h^c)^T\hat{\Psi}^{-1},S_h^c}$, does poorly in general. We believe this may be due to a combination of the following: (1) $\Psi$ is hard to estimate in the presence of confounding between the linear, non-linear and correlation effects and (2) the additional variability introduced by estimating $\Psi$ is not properly taken into account when selecting the smoothing parameter and when constructing standard errors for $\hat{\beta}_{\hat{\Psi}^{-1},S_h^c}$ from small samples. We suspect that, if one were to use the methods proposed in this thesis to estimate $\Psi$ for computing $\hat{\beta}_{(I-S_h^c)^T\hat{\Psi}^{-1},S_h^c}$, one would also get an estimator with poor finite sample behaviour.

All three estimators in (i)-(iii) require a data-driven choice of smoothing parameter. For the two backfitting estimators we consider EBBS-G and EBBS-L (see Section 6.2.2) and PLUG-IN (see Section 6.2.3). For the usual Speckman estimator, we use cross-validation, modified for correlated errors (MCV) and for boundary effects. The MCV criterion is similar to that in (6.21), with the addition of a weight function $W$. Here, the predicted value of $Y_i$ is obtained as in (6.22), but with $\Omega = (I - S_h^c)^T$, where $S_h^c$ is the centered local constant smoother matrix.
Also, $W$ is a weight function introduced to allow elimination (or at least significant reduction) of boundary effects that may affect the estimation of the non-linear effect $m$ in model (8.1), and hence the prediction of $Y_i$. $W$ is defined as in Chu and Marron (1991):

$$W(u) = \begin{cases} \dfrac{5}{3}, & \text{if } \dfrac{1}{5} \leq u \leq \dfrac{4}{5}; \\[2mm] 0, & \text{if } 0 \leq u < \dfrac{1}{5} \text{ or } \dfrac{4}{5} < u \leq 1. \end{cases}$$

Recall that EBBS-G depends on the tuning parameters $l$, $N$ and $T$, whereas EBBS-L depends on the tuning parameters $l$, $N$, $T$, $k_1$ and $k_2$. Also, recall that PLUG-IN and MCV depend on the tuning parameter $l$. In our simulation study, we consider $N = 50$, $T = 2$, $k_1 = 5$, $k_2 = 5$, and $l = 0, 1, \ldots, 10$.

For convenience, throughout the remainder of this chapter, we use the notation $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ for the usual local linear backfitting estimators of $\beta_1$. We use the notation $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ for the estimated modified local linear backfitting estimators. Finally, we use the notation $\hat{\beta}^{(l)}_{S,\text{MCV}}$ to refer to the usual Speckman estimator of $\beta_1$. Wherever necessary, we refer to these estimators generically as $\hat{\beta}_1^{(l)}$.

8.3 The MSE Comparisons

In this section, we identify the estimators $\hat{\beta}_1^{(l)}$, including bandwidth selection methods, that appear to be best, in the sense of being most accurate for all simulation settings and for most values of $l$, the tuning parameter used in the modified cross-validation. Recall that the measure of accuracy of $\hat{\beta}_1^{(l)}$ considered in this thesis is the conditional MSE of $\hat{\beta}_1^{(l)}$, $MSE(\hat{\beta}_1^{(l)})$, defined in (6.4). Specifics are provided below.

To compare the accuracy of two estimators for a given simulation setting, we look at the boxplot of differences in the log MSE's of these estimators. If the boxplot is symmetric about 0, then the two estimators have comparable accuracy. We also conduct a level 0.05 two-sided paired t-test to compare the expected log MSE's of the estimators. If the test is significant, we label the boxplot with an S.
The log MSE's of the two estimators are evaluated from the 500 data sets generated for the given simulation setting.

For each backfitting estimation method (usual, estimated modified), we recommend a way to choose the smoothing parameter $h$. Then we compare the resulting backfitting estimators, including a comparison with the usual Speckman estimator, to determine an estimator that is best, in the sense of being most accurate for all simulation settings and most values of $l$.

In Figures A.1-A.10 in Appendix A, we study the methods of bandwidth choice for the usual local linear backfitting estimator. We display boxplots of pairwise differences in the log MSE's of the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, $l = 0, 1, \ldots, 10$. Each figure corresponds to a different simulation setting. From these figures, we see that $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ have comparable accuracy across all simulation settings, provided $l$ is large enough, say $l \geq 4$. They also have better accuracy than $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, which performs poorly for several simulation settings (see, for instance, Figures A.6-A.7). Therefore, we recommend using PLUG-IN and EBBS-G to choose the smoothing parameter for the usual local linear backfitting estimator.

Figures A.11-A.20 display the corresponding plots for the estimated modified local linear backfitting estimator. We see that $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ is the most accurate across all simulation settings, provided $l$ is large enough, say $l \geq 4$. We also see that $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ and $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$ perform very poorly relative to $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ for most simulation settings and most values of $l$. Therefore, we recommend using EBBS-G to choose the smoothing parameter for the estimated modified local linear backfitting estimator.

In Figures A.21-A.30 we compare estimators using our favourite bandwidth selection method. We display boxplots of pairwise differences in the log MSE's of the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$, $l = 0, 1, \ldots, 10$.
Each figure corresponds to a different simulation setting. From these figures, we conclude that the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ have comparable accuracy for all simulation settings, provided $l$ is large enough, say $l \geq 4$. The estimator $\hat{\beta}^{(l)}_{S,\text{MCV}}$ is less accurate than these three estimators for most simulation settings and most values of $l$. In particular, plots such as those in Figures A.24, A.25, A.29 and A.30 strongly support the elimination of $\hat{\beta}^{(l)}_{S,\text{MCV}}$.

The poor performance of $\hat{\beta}^{(l)}_{S,\text{MCV}}$ with respect to the log MSE criterion could be due to the fact that this estimator uses local constant smoothing, instead of local linear smoothing. But it could also be due to the fact that $\hat{\beta}^{(l)}_{S,\text{MCV}}$ is computed with an MCV choice of smoothing. Recall that this choice attempts to estimate the amount of smoothing optimal for estimation of $X\beta + m$. It is not clear whether this choice will provide a reliable estimate of the amount of smoothing optimal for estimation of $c^T\beta$.

8.4 Confidence Interval Coverage Comparisons

In this section, we assess and compare the coverage properties of various confidence intervals for $\beta_1$ constructed from all estimators considered in our simulation study. Our goals are to:

1. Identify those estimators which yield standard confidence intervals for $\beta_1$ with good coverage properties across all simulation settings and most values of $l$.
2. Establish whether the coverage properties of standard confidence intervals for $\beta_1$ can be improved through bias or standard error adjustments.

To assess the coverage properties of a confidence interval $C$ for a given simulation setting, we proceed as follows. We evaluate the confidence interval for each of the 500 simulated data sets. We calculate the proportion of these intervals which contain the true value of $\beta_1$ and denote it by $\hat{p}$. If $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/500}$, the 95% confidence interval for the true coverage, contains the nominal level of $C$, we say that $C$ is valid.
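The validity check just described, together with the anti-conservative and conservative labels defined in the next sentence, can be sketched as a small classification rule. The function name `classify_coverage` and the example proportions are hypothetical and serve only as an illustration.

```python
from math import sqrt


def classify_coverage(p_hat: float, n_sims: int = 500,
                      nominal: float = 0.95, z: float = 1.96) -> str:
    """Classify an interval procedure from its observed coverage p_hat.

    The interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n_sims) is the
    approximate 95% confidence interval for the true coverage.
    """
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_sims)
    lower, upper = p_hat - half_width, p_hat + half_width
    if upper < nominal:
        # Even the upper limit falls short of the nominal level.
        return "anti-conservative"
    if lower > nominal:
        # Even the lower limit exceeds the nominal level.
        return "conservative"
    return "valid"


# Hypothetical observed coverages out of 500 simulated data sets.
labels = [classify_coverage(p) for p in (0.94, 0.90, 0.99)]
```

For these three proportions the rule returns valid, anti-conservative and conservative, respectively.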
If the upper (lower) confidence limit is smaller (bigger) than the nominal level of $C$, we say that $C$ is anti-conservative (conservative).

The confidence intervals for $\beta_1$ considered in our simulation study fall into three categories: standard, bias-adjusted and standard error-adjusted, as defined in (7.2), (7.3) and (7.4).

8.4.1 Standard Confidence Intervals

We now assess the coverage properties of the standard 95% confidence intervals for $\beta_1$ obtained from the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$, where $l = 0, 1, \ldots, 10$. Point estimates and 95% confidence interval estimates for the true coverage achieved by these intervals are displayed in Figures B.1-B.10 in Appendix B. Each figure corresponds to a different simulation setting.

Figures B.1-B.10 show that the standard confidence intervals constructed from the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$ are valid for all simulation settings provided the value of $l$ is large enough. However, the standard confidence intervals obtained from the estimators $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ have extremely poor coverage for many simulation settings and for many values of $l$; see, for instance, Figures B.6 and B.7. In view of these findings, the preferred estimators for constructing standard confidence intervals for $\beta_1$ are $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$. The other estimators cannot be trusted to produce valid inferences on $\beta_1$. More details concerning our findings are provided below.

The standard confidence intervals constructed from the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ are valid for all simulation settings, provided $l$ is large enough, as shown in Table 8.1. From this table, we see that taking $l \geq 1$ when $\rho = 0.2$, $l \geq 2$ when $\rho = 0.4$, $l \geq 3$ when $\rho = 0.6$, and $l \geq 4$ when $\rho = 0.8$ yields valid intervals for the contexts considered.
We recommend using these intervals to conduct inferences on $\beta_1$, with values of $l$ that are large enough. Clearly, taking $l = 0, 1, 2, 3$ is not advised, unless one is certain that $\rho$ is small.

What is not apparent from Table 8.1 is why the confidence intervals constructed from $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ may fail to be valid for smaller values of $l$. Typically, for small $l$'s, the estimates of $\beta_1$ constructed from the simulated data have a tendency to underestimate the true value of $\beta_1$ when $m(z) = m_2(z)$. Furthermore, the estimated standard errors associated with these estimates have a tendency to underestimate the true standard errors both when $m(z) = m_1(z)$ and when $m(z) = m_2(z)$. However, as $l$ increases, the estimates of $\beta_1$ and their associated standard errors improve significantly for all simulation settings.

The standard confidence intervals constructed from the usual Speckman estimator $\hat{\beta}^{(l)}_{S,\text{MCV}}$ are generally valid across all simulation settings even for smaller $l$ values. However, $\hat{\beta}^{(l)}_{S,\text{MCV}}$ does not yield valid confidence intervals when $m(z) = m_2(z)$ and (i) $\rho = 0.4$ and $l = 1$ or 4 and (ii) $\rho = 0.8$ and $l = 3, 4, 5, 6, 7, 8$ or 10. In these two cases, $\hat{\beta}^{(l)}_{S,\text{MCV}}$ yields confidence intervals that are slightly anti-conservative. This lack of continuity in behaviour is of concern and might not be attributable to simulation variability. Indeed, Figures B.6-B.10 show that, for $m(z) = m_2(z)$, $\hat{\beta}^{(l)}_{S,\text{MCV}}$ seems to exhibit an anti-conservative pattern for most $l$'s.

When $\rho = 0$ and $m(z) = m_1(z)$, the standard confidence intervals obtained from the estimators $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ provide the nominal coverage, regardless of how we choose $l$ (see Figure B.1). However, when $\rho = 0$ and $m(z) = m_2(z)$, the intervals constructed from $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ are extremely anti-conservative for all values of $l$ (see Figure B.6).
In addition, the intervals constructed from $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ are mildly anti-conservative for many values of $l$ (see Figure B.6).

As $\rho$ increases, the coverage provided by some of the standard confidence intervals obtained from $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$, $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ deteriorates for many small and/or large values of $l$, depending on the specification of $m(\cdot)$. For instance, when $m(z) = m_2(z)$, the coverage properties of the intervals constructed from $\hat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ are extremely poor (see Figures B.7-B.10). The coverage properties of the intervals constructed from $\hat{\beta}^{(l)}_{U,\text{EBBS-L}}$ are also poor for small $\rho$ values (see Figures B.7-B.8). Finally, the coverage properties of the intervals constructed from $\hat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ worsen as $\rho$ increases, but not dramatically. We do not recommend using these intervals to carry out inferences on $\beta_1$.

8.4.2 Bias-Adjusted Confidence Intervals

In this section, we assess the coverage properties of the bias-adjusted 95% confidence intervals for $\beta_1$. We did not consider a bias-adjusted confidence interval for the usual Speckman estimator $\hat{\beta}^{(l)}_{S,\text{MCV}}$, as this estimator is known to have good bias properties both when $\rho = 0$ (see Speckman, 1988) and when $\rho > 0$ (see Aneiros-Perez and Quintela-del-Rio, 2001a).

Plots (not shown) of the point estimates and 95% confidence interval estimates for the true coverage achieved by the bias-adjusted intervals yield some general conclusions. Only the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ yield bias-adjusted confidence intervals that are valid for all simulation settings provided the value of $l$ is large enough. These values of $l$ are almost identical to those reported in Table 8.1. Again, we see that one should avoid using $l = 0, 1, 2, 3$ unless one is sure that $\rho$ is small enough.

8.4.3 Standard Error-Adjusted Confidence Intervals

Here, we assess the coverage properties of the standard error-adjusted 95% confidence intervals for $\beta_1$.
We did not consider a standard error-adjusted confidence interval for the usual Speckman estimator $\hat{\beta}^{(l)}_{S,\text{MCV}}$, due to its good bias properties.

Plots (not shown) indicate that only the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ provide standard error-adjusted confidence intervals that are valid for all simulation settings provided the value of $l$ is large enough. These values of $l$ are nearly identical to those reported in Table 8.1. Yet again, we see that one should avoid using $l = 0, 1, 2, 3$ unless one is sure that $\rho$ is small enough.

To sum up, we see no reason to recommend bias adjustments to the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ or to their associated standard errors. Indeed, such adjustments do not seem to improve the coverage properties of the confidence intervals obtained from these estimators.

8.5 Confidence Interval Length Comparisons

Recall from the previous section that we identified $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ as the only estimators of $\beta_1$ in our simulation study that yielded valid 95% standard confidence intervals for all simulation settings provided the value of $l$ is large enough. The standard intervals based on $\hat{\beta}^{(l)}_{S,\text{MCV}}$ were found to be competitive, but just not as good. Also recall that the coverage properties of the standard confidence intervals constructed from $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ could not be improved by performing bias-adjustments to these estimators or to their associated standard errors.

Before recommending any of the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ for practical use, we must compare the lengths of the standard confidence intervals for $\beta_1$ constructed from these estimators. We choose to include standard intervals constructed from $\hat{\beta}^{(l)}_{S,\text{MCV}}$ in our comparison to gain more understanding into their properties. When several confidence interval procedures are valid (in the sense of achieving the desired nominal level), we prefer the one with the shortest length.
In this section, we conduct visual and formal comparisons of the lengths of the standard 95% confidence intervals for $\beta_1$ constructed from these estimators. We only consider values of $l$ that are large enough to guarantee the validity of the ensuing confidence intervals, as in Section 8.4. Specifically, we take $l \geq 1$ for $\rho = 0.2$, $l \geq 2$ for $\rho = 0.4$, $l \geq 3$ for $\rho = 0.6$ and $l \geq 4$ for $\rho = 0.8$.

To compare the lengths of two confidence intervals for a given simulation setting we look at the boxplot of differences in the log lengths of these intervals. The lengths are evaluated from the 500 data sets generated for the given simulation setting. If the boxplot is symmetric about 0, then the two confidence intervals have comparable length.

Figures C.1-C.10 in Appendix C (bottom three rows) display boxplots of pairwise differences in the log length of the standard 95% confidence intervals constructed from the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\hat{\beta}^{(l)}_{S,\text{MCV}}$. From these figures, we see that for all simulation settings with $\rho > 0$ and for values of $l$ that are large enough (e.g., larger than 3), the estimators $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ yield shorter confidence intervals than those based on $\hat{\beta}^{(l)}_{S,\text{MCV}}$. This was to be expected, as the log MSE behaviour of $\hat{\beta}^{(l)}_{S,\text{MCV}}$ was seen to be inferior to that of $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$. Furthermore, we notice that the lengths of the confidence intervals constructed from $\hat{\beta}^{(l)}_{U,\text{PLUG-IN}}$ and $\hat{\beta}^{(l)}_{U,\text{EBBS-G}}$ tend to be comparable for many of these $l$ values.

Our previous findings are supported by the results of pairwise level 0.05 two-sided paired t-tests for comparing the expected log lengths of the confidence intervals under consideration for all simulation settings and for values of $l$ that are large enough. We describe these tests below.
Given a simulation setting, for fixed $l$, conduct $\binom{3}{2}$ two-sided paired t-tests to compare the expected log lengths of the intervals obtained from the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{S,\text{MCV}}$. For each test, the null hypothesis is that the expected log lengths of the intervals being compared are the same. The test result is considered significant if the p-value associated with the test is smaller than 0.05. Use the results of the t-tests to identify which estimators yield the shortest confidence interval. If all tests give significant results, we claim that there is a clear winner; in other cases, we say that two estimators might be tied for best.

Figures C.1-C.10 (top row) show the average length of the confidence intervals obtained, with standard error bars superimposed. The figures indicate which of these estimators produces the shortest confidence interval for values of $l$ of interest.

8.6 Conclusions

Based on the results of our simulation study, we recommend using the usual local linear backfitting estimators $\hat{\beta}_{U,\text{PLUG-IN}}$ and $\hat{\beta}_{U,\text{EBBS-G}}$ and the usual Speckman estimator $\hat{\beta}_{S,\text{MCV}}$ to carry out valid inferences about the linear effect $\beta_1$ in model (8.1). The value of $l$ used when computing these estimators should be large enough, that is, at least 4.

Our findings indicate that $\hat{\beta}_{U,\text{PLUG-IN}}$ and $\hat{\beta}_{U,\text{EBBS-G}}$ have comparable accuracy for large values of $l$, and that they are in general more accurate than $\hat{\beta}_{S,\text{MCV}}$. All three estimators yield valid standard 95% confidence intervals for $\beta_1$ when $l$ is large enough. However, the intervals based on $\hat{\beta}_{U,\text{PLUG-IN}}$ and $\hat{\beta}_{U,\text{EBBS-G}}$ tend to have shorter length and are therefore preferred over the interval based on $\hat{\beta}_{S,\text{MCV}}$.

We see no reason to recommend bias adjustments to the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$ and $\hat{\beta}_{U,\text{EBBS-G}}$ or to their associated estimated standard errors. Such adjustments do not seem to improve the coverage properties of the corresponding confidence intervals.
Finally, we do not recommend using the usual backfitting estimator $\hat{\beta}_{U,\text{EBBS-L}}$ or the estimated modified backfitting estimators $\hat{\beta}_{EM,\text{PLUG-IN}}$, $\hat{\beta}_{EM,\text{EBBS-G}}$, $\hat{\beta}_{EM,\text{EBBS-L}}$ to carry out inferences about $\beta_1$. These estimators yielded confidence intervals with poor coverage for many simulation settings and many values of $l$, owing to the difficulties associated with estimating their standard errors.

Figure 8.1: Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_1(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

[Figure 8.2 panels: X1 vs. Z; m(Z) vs. Z; $\epsilon$ vs. Z (three panels); Y vs. Z (three panels).]

Figure 8.2: Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_2(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

Table 8.1: Values of $l$ for which the standard 95% confidence intervals for $\beta_1$ constructed from the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{S,\text{MCV}}$ are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study. Each entry is the set of valid values of $l$.

$m_1(z)$         $\hat{\beta}_{U,\text{PLUG-IN}}$    $\hat{\beta}_{U,\text{EBBS-G}}$    $\hat{\beta}_{S,\text{MCV}}$
$\rho = 0$       {0, ..., 10}       {0, ..., 10}       {0, ..., 10}
$\rho = 0.2$     {0, ..., 10}       {0, ..., 10}       {1, ..., 10}
$\rho = 0.4$     {1, ..., 10}       {2, ..., 10}       {0, ..., 10}
$\rho = 0.6$     {2, ..., 10}       {3, ..., 10}       {0, ..., 10}
$\rho = 0.8$     {3, ..., 10}       {3, ..., 10}       {0, ..., 10}

$m_2(z)$         $\hat{\beta}_{U,\text{PLUG-IN}}$    $\hat{\beta}_{U,\text{EBBS-G}}$    $\hat{\beta}_{S,\text{MCV}}$
$\rho = 0$       {0, ..., 10}       {0, ..., 10}       {0, ..., 10}
$\rho = 0.2$     {0, ..., 10}       {1, ..., 10}       {0, ..., 10}
$\rho = 0.4$     {1, ..., 10}       {2, ..., 10}       {0} ∪ {2, 3} ∪ {5, ...
$\rho = 0.6$     {3, ..., 10}       {3, ..., 10}       {0, ..., 10}
$\rho = 0.8$     {3, ..., 10}       {4, ..., 10}       {0, 1, 2} ∪ {9}

Chapter 9

Application to Air Pollution Data

Many community-level studies have provided evidence that air pollution is associated with mortality.
Statistical analyses of data collected in such studies face various methodological challenges: (1) controlling for observed and unobserved factors, such as season and temperature, that might confound the true association between air pollution and mortality, (2) accounting for serial correlation in the residuals, which might otherwise lead to underestimation of the statistical uncertainty of the estimated association, and (3) assessing and reporting uncertainty associated with the choice of statistical model.

Various statistical models can be used to describe the true association between air pollution and health outcomes of interest based on community-level data. However, the most widely used have been the generalized additive models (GAMs) introduced by Hastie and Tibshirani (1990). These models include a single 'time series' response (e.g. non-accidental mortality rates) and various covariates (e.g. pollutants of interest, time, temperature). The effects of the pollutants of interest on the response are typically presumed to be linear, whereas those of the remaining covariates are presumed to be smooth and non-linear. Schwartz (1994), Kelsall, Samet and Zeger (1997), Schwartz (1999), Samet, Dominici, Curriero et al. (2000), Katsouyanni, Touloumi, Samoli et al. (2001), Moolgavkar (2000) and Schwartz (2000) are just some of the authors who relied on GAMs in order to assess the acute effects of air pollution on health outcomes such as mortality or hospital admissions.

There are various problems that researchers must consider when using GAMs to analyze air pollution data arising from community-level studies. Some of these problems are purely computational, whereas others are more delicate and pertain to the theoretical underpinnings of these models.

Several computational issues associated with the S-Plus implementation of methodology developed by Hastie and Tibshirani (1990) for estimation of GAMs have been brought to light in recent years. We describe these problems here.
The linear and non-linear effects in GAMs applied to air pollution data have typically been estimated using the S-Plus function gam. Dominici et al. (2002) showed that gam may provide incorrect estimates of the linear effects in GAMs and their standard errors if used with the original default parameters. Although the defaults have recently been revised (Dominici et al., 2002), an important problem that remains is that gam calculates the standard errors of the linear effects by assuming that the non-linear effects are effectively linear, resulting in an underestimation of uncertainty (Ramsay et al., 2003a). In air pollution studies, this assumption is likely inadequate, resulting in underestimation of the standard error of the linear pollutant effect (Ramsay et al., 2003a).

The practical choice of the degree of smoothness of the estimated non-linear confounding effects of time and meteorology variables is a delicate issue in air pollution studies which utilize GAMs. Given that the confounding effects are viewed as a nuisance in such studies, the appropriate choice should be informed by the objective of conducting valid inferences about the pollution effect. Most choices performed in the air pollution literature are based on exploratory analyses (see, for instance, Kelsall, Samet and Zeger, 1997) and seem to be justified by a different objective, namely doing well at estimating the non-linear confounding effects. This objective typically ignores the impact of residual correlation on the choice of degree of smoothness, as well as the dependencies between the various variables in the model.

In the present chapter we apply the methodology developed in this thesis to analyze air pollution data collected in Mexico City between January 1, 1994 and December 31, 1996. Our goal is to determine whether the pollutant PM10 has a significant short-term effect on the non-accidental death rate in Mexico City after adjusting for temporal and weather confounding.
We give a description of the data in Section 9.1 and analyze the data in Section 9.2.

9.1 Data Description

PM10 (airborne particulate matter less than 10 microns in diameter) is a major component of air pollution, arising from natural sources (e.g. pollen), road transport, power generation, industrial processes, etc. When inhaled, PM10 particles tend to be deposited in the upper parts of the human respiratory system, from which they can eventually be expelled back into the throat. Health problems begin as the body reacts to these foreign particles. PM10 is associated with mortality, exacerbation of airways disease and decrement in lung function. Although PM10 can cause health problems for everyone, certain people are especially vulnerable to its adverse health effects. These "sensitive populations" include children, the elderly, exercising adults, and those suffering from heart and lung disease.

The data to be analyzed in this chapter were collected in Mexico City over a period of three years, from January 1, 1994 to December 31, 1996, in order to determine if there is a significant short-term effect of PM10 on mortality, after adjusting for potential temporal and weather confounders. The data consist of daily counts of non-accidental deaths, daily levels of ambient concentration of PM10 (10 $\mu$g/m$^3$), and daily levels of temperature (°C) and relative humidity (%). The ambient concentration of PM10 corresponding to a given day was obtained by averaging the PM10 measurements over all the stations in Mexico City.

Pairwise scatter plots of the data are shown in Figure 9.1. The most striking features in these plots are the strong annual cycles in the log mortality levels, the daily level of ambient concentration of PM10, and the daily levels of temperature and relative humidity.
It is likely that the annual cycles in the log mortality levels are produced by unobserved seasonal factors such as influenza and respiratory infections. Note that log mortality and PM10 peak at the same time with respect to the annual cycles. Our analysis of the health effects of PM10 must account for the potential confounding effect of these temporal cycles on the association between PM10 and log mortality. We believe the strength of these cycles will make it difficult to detect whether this association is significant.

9.2 Data Analysis

The following is an overview of our data analysis. First, we introduce the four statistical models that we use to capture the relationship between PM10 and mortality, adjusted for seasonal and meteorological confounding. Three of these models contain smooth nonparametric terms which attempt to control for these confounding effects. Next, we illustrate the importance of choosing the amount of smoothing for estimating the nonparametric terms in these models when the main objective is accurate estimation of the true association between PM10 and mortality. We then focus on determining which of the four models is most relevant for the data. Finally, we use this model as a basis for carrying out inference about the true association between PM10 and mortality.

9.2.1 Models Entertained for the Data

Let $D_i$ denote the observed number of non-accidental deaths in Mexico City on day $i$, and let $P_i$, $T_i$ and $H_i$ denote the daily measures of PM10, temperature and relative humidity, respectively. The models that we entertain for our data are:

$$\log(D_i) = \beta_0 + \beta_1 P_i + \epsilon_i \qquad (9.1)$$

$$\log(D_i) = \beta_0 + \beta_1 P_i + m_1(i) + \epsilon_i \qquad (9.2)$$

$$\log(D_i) = \beta_0 + \beta_1 P_i + m_1(i) + \beta_2 T_i + \beta_3 H_i + \beta_{23} T_i \cdot H_i + \epsilon_i \qquad (9.3)$$

$$\log(D_i) = \beta_0 + \beta_1 P_i + m_1(i) + m_2(T_i, H_i) + \epsilon_i. \qquad (9.4)$$

Here, $i = 1, 2, \ldots, 1096$. Also, $m_1$ is a smooth univariate function, whereas $m_2$ is a smooth bivariate surface.
The function $m_1$ serves as a linear filter on the log mortality and PM10 series and removes any seasonal or long-term trends in the data. For the time being, the error terms in all four models are assumed to be independent, identically distributed, with mean 0 and constant variance $\sigma_\epsilon^2 < \infty$. The independence assumption will be relaxed later.

Models (9.1)-(9.4) treat the log mortality counts as a continuous response. Furthermore, they assume the relationship between PM10 and log mortality to be linear, to allow for easily interpretable inferences about the effect of PM10 on log mortality. The models differ, however, in their specification of the potential seasonal and weather confounding on this relationship. Specifically, model (9.1) ignores the possible seasonal and weather confounding on the relationship between PM10 and log mortality. Models (9.2)-(9.4), however, allow us to adjust this relationship for potential seasonal and weather confounding. Models (9.2) and (9.3) require that we specify the amount of smoothing needed for estimating $m_1$. Model (9.4) requires that we specify the amount of smoothing necessary for estimating both $m_1$ and $m_2$.

To fit models (9.2)-(9.4) to the data, we use the S-Plus function gam with the more stringent convergence parameters recommended by Dominici et al. (2002). We employ a univariate loess smoother to estimate $m_1$ and a bivariate loess smoother to estimate $m_2$. The loess smoothers are local linear smoothers relying on spans corresponding to a fixed number of nearest neighbours instead of a bandwidth.

9.2.2 Importance of Choice of Amount of Smoothing

The inferences made on the linear PM10 effect $\beta_1$ in any of the models (9.2)-(9.4) may be severely affected by the choice of amount of smoothing for estimating the smooth confounding effects in these models. To illustrate the impact of this choice on the conclusions of such inferences, we restrict attention to model (9.3).
Later, we will see that this model is the most appropriate for the data.

Figure 9.2 compares the impact of various choices of smoothing for the seasonal effect $m_1$ in model (9.3) on the following quantities: (i) gam estimates of $\beta_1$, (ii) gam standard errors for the estimates in (i), (iii) 95% confidence intervals for $\beta_1$ constructed from the estimates in (i) and (ii), (iv) gam p-values associated with standard t-tests of significance of $\beta_1$. These quantities were obtained by fitting model (9.3) to the data using gam with loess as a basic smoother. The loess span used for smoothing $m_1$ was allowed to take on values in the range 0.01 to 0.50. The reference distribution for the 95% confidence intervals and the p-values depicted in Figure 9.2 is a t-distribution whose degrees of freedom are the residual (or error) degrees of freedom associated with model (9.3). Note that the estimated standard errors reported by gam do not account for error correlation.

Changing the span for smoothing $m_1$ greatly affects the estimates, standard errors, confidence intervals and p-values in Figure 9.2 and hence the conclusions of our inferences on $\beta_1$, the short-term PM10 effect on log mortality. In particular, using large spans for smoothing $m_1$ suggests that the data provide strong evidence in favour of a significant PM10 effect on log mortality, after adjusting for seasonal and weather confounding. Using small spans for smoothing $m_1$ suggests that the data do not provide enough evidence in support of a significant PM10 effect on log mortality in Mexico City.

Proper choice of amount of smoothing for estimating the seasonal effect $m_1$ in model (9.3) is crucial for making inferences on $\beta_1$, as seen in Figure 9.2. Given the sensitivity of our conclusions to the choice of smoothing, the natural question that arises is: how can we choose the amount of smoothing to be able to make valid inferences on $\beta_1$?
The correct choice of smoothing should be appropriate for accurate estimation of $\beta_1$, not for accurate estimation of $m_1$. This choice should account for the strong relationships between the linear and non-linear variables in the model seen in Figure 9.1, and for potential correlation amongst model errors.

It is important to note that the S-Plus function gam provides no data-driven method for choosing the amount of smoothing. Using gam's default choice of smoothing is not advised when one is concerned with accurate estimation of $\beta_1$. The default choice of smoothing used by gam is 0.50, or 50% of the nearest neighbours. This choice is much larger than the choices that we recommend for estimating $m_1$ (shown in the next section). The theoretical results in this thesis suggest that the correct choice of smoothing for estimating $\beta_1$ should undersmooth the estimated $m_1$. Therefore, this choice of smoothing is most likely smaller than the one we recommend for estimating $m_1$, and certainly not larger.

9.2.3 Choosing an Appropriate Model for the Data

In this section, we focus on the issue of selecting an appropriate model for the data amongst models (9.1)-(9.4). Selecting such a model requires that we balance model complexity with model parsimony. In what follows, we show that model (9.3) is the most appropriate for describing the variability in the log mortality counts, as it is complex enough to capture the main features present in the data, yet relatively inexpensive to fit to these data in terms of degrees of freedom.

Model (9.1) is the simplest of models (9.1)-(9.4) and, not too surprisingly given the strong cycles apparent in Figure 9.1, it provides an inadequate description for the variability in the log mortality counts. In fact, the linear relationship between PM10 and log mortality postulated by model (9.1) explains only 9.25% of the total variability in the log mortality counts.
Figure 9.3 (top panel) shows that the log mortality counts are widely scattered about the regression line obtained by fitting model (9.1) to the data. Figure 9.3 (bottom panel) shows that model (9.1) displays clear lack-of-fit, as it fails to account for the strong annual cycles present in the model residuals. We therefore drop model (9.1) from our pool of candidate models and concentrate instead on models (9.2)-(9.4).

Model (9.4) is the most complex of these models, and will consume significantly more degrees of freedom when fitted to the data than either model (9.2) or model (9.3). As we shall see shortly, comparing model (9.4) against model (9.2) via a series of approximate F-tests suggests that we can drop model (9.4) in favour of model (9.2). We could therefore consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, given that the weather variables are typically included in models for PM10 mortality data, we prefer to use model (9.3). This model is more flexible than model (9.2), as it includes linear marginal effects for the weather variables together with a linear interaction effect between these variables. Compared to model (9.2), this model can be fitted to the data at the expense of just three additional degrees of freedom. Given the large size of the data set, this is an insignificant price to pay for achieving more modelling flexibility.

We now provide more details concerning the choice of an appropriate model for our data amongst models (9.2)-(9.4). As a first step we need to identify spans that are reasonable for smoothing the seasonal effect $m_1$ in these models.

To identify a reasonable range of spans for smoothing $m_1$ in model (9.2), we fit model (9.2) to the data by smoothing $m_1$ with spans ranging from 0.01 to 0.50 in increments of 0.01 and examine plots of the fitted $m_1$ and corresponding model residuals.
From Figures 9.4 and 9.5 we see that the data suggest spans in the range 0.09 to 0.12. Using spans smaller than 0.09 for estimating $m_1$ leads to under-smoothed fits that are visually noisy. On the other hand, using spans larger than 0.12 leads to over-smoothed fits that fail to reflect important seasonal features of the data. In summary, the range 0.09 to 0.12 is reasonable for smoothing the seasonal effect $m_1$ in model (9.2).

Plots of the fitted additive component $m_1$ in models (9.3) and (9.4) (not shown) corresponding to spans in the range 0.09 to 0.12 are similar to those in Figure 9.4 and suggest that this range is also reasonable for smoothing the seasonal effect $m_1$ in models (9.3) and (9.4).

We now show that we can reduce model (9.4) to model (9.2). We use a series of approximate F-tests to compare models (9.4) and (9.2). Each F-test compares a fit of model (9.2), obtained by smoothing $m_1$ with the span $s_1$, against a fit of model (9.4), obtained by smoothing $m_1$ with the span $s_1$ and $m_2$ with the span $s_2$. The test statistic for each F-test is obtained in the usual fashion from the residual sums of squares and the residual (or error) degrees of freedom associated with the two model fits. The residual degrees of freedom of these fits are obtained as the difference between the size of the data set, $n = 1096$, and the trace of the hat matrix associated with the model fit. We allow the span $s_1$ to range between 0.09 and 0.12 in increments of 0.01, and the span $s_2$ to range between 0.01 and 0.50 in increments of 0.01.

The p-values associated with these F-tests are displayed in Figure 9.6. P-values corresponding to spans $s_2$ bigger than 0.04 are quite large, suggesting that the smooth weather surface $m_2$ need not be included in model (9.4). P-values corresponding to spans $s_2$ of 0.02, 0.03 or 0.04 are a bit smaller, suggesting that perhaps the surface $m_2$ should be included in the model.
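The statistic underlying each approximate F-test can be sketched as below. This is an illustration under stated assumptions rather than the S-Plus computation: the function names are hypothetical, the numbers in the usage example are invented, and the sketch stops at the statistic itself. A p-value would be obtained by referring it to an F distribution with (df_reduced - df_full, df_full) degrees of freedom, which requires a statistics library.

```python
def residual_df(n_obs, hat_trace):
    """Residual degrees of freedom: sample size minus the trace of the hat matrix."""
    return n_obs - hat_trace


def approximate_f_statistic(rss_reduced, df_reduced, rss_full, df_full):
    """Approximate F statistic for a reduced model nested in a full model.

    rss_* are residual sums of squares and df_* are residual degrees of
    freedom; the reduced model (here, model (9.2)) has more residual degrees
    of freedom than the full model (here, model (9.4)).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual degrees of freedom")
    if rss_full <= 0.0 or df_full <= 0.0:
        raise ValueError("the full model needs positive RSS and residual df")
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


if __name__ == "__main__":
    n = 1096
    # Hypothetical hat-matrix traces and residual sums of squares.
    df_92 = residual_df(n, 20.0)   # fit of model (9.2)
    df_94 = residual_df(n, 30.0)   # fit of model (9.4)
    print(approximate_f_statistic(12.0, df_92, 11.9, df_94))
```

Because the trace of the hat matrix for model (9.4) grows quickly as $s_2$ shrinks, very small spans for $m_2$ spread the reduction in residual sum of squares over many numerator degrees of freedom.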
However, Figures 9.7 and 9.8, for $s_1 = 0.09$, show that very small spans are not appropriate for estimating the surface $m_2$, as they yield visually rough surfaces that consume unacceptably high numbers of degrees of freedom. Using a span $s_1$ of 0.10, 0.11 or 0.12 instead yielded plots (not shown) that were basically identical to those in Figures 9.7 and 9.8.

In conclusion, the smooth weather surface $m_2$ contributes little to model (9.4), so there is no real need to include either temperature or relative humidity in this model. In other words, we can reduce model (9.4) to model (9.2). Coplots (not shown) of the residuals associated with model (9.2) versus temperature, given relative humidity, and versus relative humidity, given temperature, support this conclusion.

Since there is no real need to include the weather variables, temperature and relative humidity, we could consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, for reasons explained earlier, we prefer to use the more flexible model (9.3).

How well does model (9.3) fit the data? To answer this question, we examine a series of diagnostic plots. Figure 9.9 shows plots of the residuals associated with model (9.3) against PM10 and day of study. These residuals were obtained by smoothing the unknown $m_1$ with a span of 0.09; using spans of 0.10, 0.11 or 0.12 yielded similar plots (not shown). The functional form of the relationship between PM10 and log mortality postulated by model (9.3) is not violated by the data, since no systematic structure is apparent in the plot of residuals versus PM10. The plot of residuals against day of study also shows no systematic structure, suggesting that the seasonal component $m_1$ of the model accounts for the long-term temporal variation in the data reasonably well. Figures 9.10-9.11 show that the functional specification of the weather portion of model (9.3) is not violated by the data.
Indeed, these plots display no obvious systematic structure. The weather coplots corresponding to spans of 0.10, 0.11 and 0.12 were similar, so we omitted them. Finally, Figure 9.12 presents autocorrelation and partial autocorrelation plots for the residuals associated with model (9.3). From these plots, it is apparent that the magnitude of the residual correlation is small. We believe this is due to the fact that most of the short-term temporal variation in log mortality counts has been accounted for by the seasonal component $m_1$ of the model. Comparing Figure 9.12 against Figure 9.13, which displays autocorrelation and partial autocorrelation plots for the raw log mortality counts, supports this belief.

In summary, the assumptions underlying the systematic part of model (9.3) seem reasonable. However, there is some modest suggestion that the independence assumption concerning the error terms in this model may not hold for these data. This assumption will be relaxed to account for the slight temporal correlation present in the data.

Model (9.3) can therefore be used as a basis for carrying out inferences on $\beta_1$, the linear PM10 effect on log mortality, adjusted for seasonal and weather confounding. Accounting for error correlation when conducting such inferences is perhaps not as important as accounting for the strong relationships between the linear and non-linear variables in the model evident in Figure 9.1.

9.2.4 Inference on the PM10 Effect on Log Mortality

In order to conduct valid inferences about the linear effect $\beta_1$ in model (9.3), we must not only estimate it accurately, but also calculate correct standard errors for this estimate.

For model (9.3), $\beta_1 = c^T \beta$, where $c = (0, 1, 0, 0, 0)^T$ and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_{23})^T$. We propose to estimate $\beta_1$ via $c^T \hat{\beta}_{I,S_h^c}$, where $\hat{\beta}_{I,S_h^c}$ is the usual local linear backfitting estimate of $\beta$.
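For reference, the quantities just defined can be written out as a display. The closed form shown for the usual backfitting estimator is the standard one for a partially linear model and is assumed here to agree with the definition given in Chapter 3. The symbols $X$ (the design matrix of the parametric part of model (9.3), with columns for the intercept, $P_i$, $T_i$, $H_i$ and $T_i \cdot H_i$), $Y$ (the vector of log mortality counts) and $S_h^c$ (the centered local linear smoother matrix with smoothing parameter $h$) are likewise assumptions, since they are not defined in this section.

```latex
\begin{align*}
\beta_1 &= c^T \beta, \qquad c = (0, 1, 0, 0, 0)^T, \qquad
\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_{23})^T, \\
\hat{\beta}_{I,S_h^c} &= \left\{ X^T (I - S_h^c) X \right\}^{-1} X^T (I - S_h^c) Y, \\
\hat{\beta}_1 &= c^T \hat{\beta}_{I,S_h^c}.
\end{align*}
```

Written this way, the dependence of the estimate of $\beta_1$ on $h$ enters only through the smoother matrix $S_h^c$.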
Figure 9.14 displays a plot of $c^T \hat{\beta}_{I,S_h^c}$ versus the smoothing parameter $h$, which controls the width of the smoothing window. The large variation in the values of these estimates re-iterates the importance of choosing $h$ appropriately from the data so as to obtain accurate estimates of $\beta_1$.

To choose appropriate values of $h$ from the data, we use the preferred PLUG-IN and EBBS-G methods developed in Chapter 6. Both methods use a grid $H = \{2, 3, \ldots, 548\}$, where the values in the grid represent half-widths of local linear smoothing windows. Recall that both of these methods require that we estimate the underlying correlation structure of the model errors. In addition, PLUG-IN requires that we estimate the seasonal effect $m_1$ in the model. We discuss these topics below.

We estimate the seasonal effect $m_1$ and the error correlation structure using modified (or leave-$(2l+1)$-out) cross-validation, as outlined in Sections 6.3.1 and 6.3.2. We allow the tuning parameter $l$ to take on the values $0, 1, \ldots, 26$. Recall that $l$ quantifies our belief about the range and magnitude of the error correlation. For instance, $l = 0$ signifies that we believe the errors to be independent. When the model errors are truly correlated, we suspect that values of $l$ that are too small may produce under-smoothed estimates of $m_1$, whereas values of $l$ that are too large may produce over-smoothed estimates of $m_1$.

To ascertain what values of $l$ are reasonable for the data, we examine plots of the estimated seasonal effect $m_1$ in model (9.3) corresponding to $l = 0, 1, \ldots, 26$; see Figure 9.15. These plots suggest that using $l = 0$ or $l = 1$ is probably not appropriate, as the corresponding estimates of $m_1$ are visually too rough. Using values of $l$ in the range 2 to 17 seems to yield reasonable estimates of $m_1$. Values of $l$ in the range 18 to 26 seem to yield over-smoothed estimates of $m_1$, so perhaps should be avoided.
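The leave-$(2l+1)$-out idea can be sketched as follows. This is a simplified illustration, not the procedure of Sections 6.3.1 and 6.3.2: it smooths a response directly and ignores the parametric part of model (9.3), it assumes an Epanechnikov-shaped kernel, and all function names and the simulated series are hypothetical.

```python
import math
import random


def local_linear_estimate(i, x, y, h, l=None):
    """Local linear estimate at x[i] using points strictly within distance h.

    When l is not None, the 2l+1 observations with |j - i| <= l are left out.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j in range(len(x)):
        if l is not None and abs(j - i) <= l:
            continue
        d = x[j] - x[i]
        u = d / h
        if abs(u) >= 1.0:
            continue
        w = 1.0 - u * u  # assumed kernel shape (unnormalised Epanechnikov)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y[j]
        t1 += w * d * y[j]
    if s0 == 0.0:
        raise ValueError("no observations left in the smoothing window")
    det = s0 * s2 - s1 * s1
    if det <= 1e-10 * s0 * s2:
        return t0 / s0  # degenerate window: fall back to a weighted mean
    return (s2 * t0 - s1 * t1) / det  # intercept of the weighted linear fit


def modified_cv_residuals(x, y, h, l):
    """Leave-(2l+1)-out residuals; l = 0 gives ordinary leave-one-out residuals."""
    return [y[i] - local_linear_estimate(i, x, y, h, l) for i in range(len(x))]


def modified_cv_score(x, y, h, l):
    residuals = modified_cv_residuals(x, y, h, l)
    return sum(r * r for r in residuals) / len(residuals)


def choose_h(x, y, grid, l):
    """Half-width in the grid that minimises the modified cross-validation score."""
    return min(grid, key=lambda h: modified_cv_score(x, y, h, l))


if __name__ == "__main__":
    random.seed(0)
    n = 200
    x = [float(k) for k in range(1, n + 1)]
    # Hypothetical smooth trend plus AR(1) errors with coefficient 0.6.
    err, y = 0.0, []
    for v in x:
        err = 0.6 * err + random.gauss(0.0, 0.2)
        y.append(math.sin(2.0 * math.pi * v / 100.0) + err)
    for l in (0, 2, 5):
        print(l, choose_h(x, y, [8.0, 12.0, 16.0, 24.0, 32.0, 48.0], l))
```

The grid in the usage example starts above $l + 1$ for every $l$ tried, so that each smoothing window still contains observations after the $2l+1$ neighbours are removed.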
Next, we estimate the error terms in model (9.3) via modified (or leave-$(2l+1)$-out) cross-validation residuals, defined as in Section 6.3.1. Figure 9.16 shows plots of these residuals for various values of $l$.

Now, we use the modified cross-validation residuals to estimate the correlation structure of the model errors. We will operate under the assumption that these errors follow a covariance-stationary autoregressive process of finite order $R$. To estimate $R$, we use the finite sample criterion for autoregressive order selection developed by Broersen (2000). Figure 9.17 shows that our estimate of $R$ is influenced by how we choose the value of the tuning parameter $l$. Choosing $l = 0$ or $1$ yields an $R$ of 28. Choosing larger values of $l$ yields values of $R$ such as 0, 2, 3 or 4. Recall that values of $l$ like $0, 1$ or $18, \ldots, 26$ are likely not appropriate for these data.

Finally, after determining the order $R = R(l)$, $l = 0, 1, \ldots, 26$, of the autoregressive error process, we estimate the error variance $\sigma_\epsilon^2$ and the autoregressive parameters $\phi_1, \ldots, \phi_R$ using Burg's method (Brockwell and Davis, 1991). Furthermore, we estimate the error correlation matrix $\Psi$ by plugging in the estimated values of $\phi_i$ in the expression for $\Psi$ provided in Comment 2.2.1.

Having estimated the seasonal effect $m_1$ and the error correlation structure for model (9.3), we can now tackle the issue of data-driven choice of $h$ for accurate estimation of $\beta_1$ via $c^T \hat{\beta}_{I,S_h^c}$.

The estimated bias squared, variance and mean squared error curves used for determining the PLUG-IN choice of smoothing for $c^T \hat{\beta}_{I,S_h^c}$ are shown in Figure 9.18. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. In general, the mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$. Figure 9.19 displays similar plots used for determining the EBBS-G choice of smoothing. Note that the bias curve in this figure does not depend on $l$.
Also note that mean squared error curves in this figure that correspond to large values of $l$ dominate, in general, the curves that correspond to small values of $l$.

Figures 9.20 and 9.21 display the PLUG-IN and EBBS-G choices of smoothing parameter obtained by minimizing the estimated mean squared error curves in Figures 9.18 and 9.19. Both choices are remarkably stable for values of $l$ that seem appropriate for these data. However, the PLUG-IN choices are much smaller in magnitude than the EBBS-G choices. The PLUG-IN choices that seem appropriate for the data indicate that the seasonal effect $m_1$ should be smoothed using $h \approx 28$. On the other hand, the corresponding EBBS-G choices indicate that $m_1$ should be smoothed using $h \approx 69$.

Figures 9.22 and 9.23 show the 95% confidence intervals constructed for $\beta_1$ with PLUG-IN and EBBS-G choices of smoothing for values of $l$ ranging from 0 to 26. These intervals were obtained from formula (7.2), with $\Omega = I$. Both figures suggest that the choice of $l$ (among those that are reasonable for the data) is not that important. This finding is consistent with the Monte Carlo simulation study conducted in Chapter 8, which indicated that these choices of smoothing were appropriate for conducting inferences on the linear effect $\beta_1$ in model (8.1) provided $l$ was large enough.

From Figure 9.22, there is no conclusive proof that $\beta_1$, the short-term PM10 effect on log mortality, is significantly different from 0. Indeed, the standard confidence intervals for $\beta_1$ based on $c^T \hat{\beta}_{I,S_h^c}$, with $h$ chosen via PLUG-IN, cross the zero line for all values of $l$ that are appropriate for the data. The stability of these confidence intervals across various values of $l$ is quite remarkable, but not entirely surprising given the stability of the corresponding PLUG-IN choices of smoothing shown in Figure 9.20. Figure 9.23 supports the same conclusion for $\beta_1$, at least in part.
However, for all values of $l$ that are appropriate for the data, these intervals either narrowly miss zero or barely contain it, suggesting that perhaps PM10 does have a significant effect on log mortality.

What could explain the discrepancy between Figures 9.22 and 9.23? The standard errors of the estimated PM10 effects are comparable in both figures. However, the PM10 effect estimates obtained with a PLUG-IN choice of smoothing are much smaller than those obtained with EBBS-G. As seen in Figures 9.20 and 9.21, the PLUG-IN choices of smoothing parameter for these data are about 28 or so, and are much smaller than the EBBS-G choices, which are about 69 or so. Figure 9.14 shows that using choices of smoothing parameter $h$ of 28 or so yields smaller PM10 estimates than using values of $h$ of 69 or so.

We favour smaller choices of smoothing parameter. We believe EBBS-G yielded large choices because it used a grid range that was too wide. Recall that EBBS-G attempts to estimate the conditional bias of $c^T \hat{\beta}_{I,S_h^c}$ by assuming a specific form for the relationship between this bias and the smoothing parameter $h$. This relationship is motivated by asymptotic considerations as in (6.13), so it may break down for values of $h \in H$ that are too large. Estimating this relationship based on all the "data" $\{(h, c^T \hat{\beta}_{I,S_h^c}) : h \in H\}$ may therefore not be appropriate. One should perhaps use only "data" for which $h$ is reasonably small to ensure the asymptotic considerations underlying EBBS-G are valid. In other words, one should use a smaller grid range for EBBS-G.

We used EBBS-G with a grid $H = \{2, \ldots, 100\}$ instead of $H = \{2, \ldots, 548\}$ and got a similar result to that obtained via PLUG-IN (see Figure 9.24): there is no conclusive proof that PM10 has a significant effect on log mortality. This finding is not surprising given the strength of the annual cycles present in Figure 9.1.
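The effect of the grid range on an empirically fitted bias relationship can be illustrated with a toy calculation. The sketch below is not EBBS-G: it assumes, purely for illustration, that the estimate behaves like $a_0 + a_1 h^2$ for small $h$ (the actual form in (6.13) is not reproduced here) and that this relationship flattens out for large $h$. The function name and the synthetic estimates are hypothetical.

```python
def fit_quadratic_bias_model(hs, estimates, h_max=None):
    """Least-squares fit of estimate = a0 + a1 * h**2, optionally using only h <= h_max.

    Under the assumed model, a0 plays the role of the bias-free target and
    a1 * h**2 the role of the estimated bias at smoothing parameter h.
    """
    pairs = [(h, e) for h, e in zip(hs, estimates) if h_max is None or h <= h_max]
    if len(pairs) < 2:
        raise ValueError("need at least two grid points to fit the model")
    xs = [h * h for h, _ in pairs]
    ys = [e for _, e in pairs]
    n = len(pairs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xv - x_bar) ** 2 for xv in xs)
    sxy = sum((xv - x_bar) * (yv - y_bar) for xv, yv in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


if __name__ == "__main__":
    grid = list(range(2, 549))
    # Hypothetical estimates: quadratic in h up to h = 100, flat afterwards,
    # mimicking a small-h relationship that breaks down for large h.
    toy_estimates = [0.5 + 1e-4 * min(h, 100) ** 2 for h in grid]
    print("full grid:      ", fit_quadratic_bias_model(grid, toy_estimates))
    print("restricted grid:", fit_quadratic_bias_model(grid, toy_estimates, h_max=100))
```

In this toy setting the restricted grid recovers the small-$h$ relationship, whereas the fit over the full grid is dominated by the many large values of $h$ for which the assumed form no longer holds.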
Figure 9.1: Pairwise scatter plots of the Mexico City air pollution data.

Figure 9.2: Results of gam inferences on the linear PM10 effect $\beta_1$ in model (9.3) as a function of the span used for smoothing the seasonal effect $m_1$: estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for $\beta_1$ (bottom left) and p-values of t-tests for testing the statistical significance of $\beta_1$.

Figure 9.3: The top panel displays a scatter plot of log mortality versus PM10. The ordinary least squares regression line of log mortality on PM10 is superimposed on this plot. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.

Figure 9.4: Plots of the fitted seasonal effect $m_1$ in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots. [Panels include span = 0.01, 0.05, 0.09, 0.10, 0.15, 0.25, 0.35 and 0.50.]
Figure 9.5: Plots of the residuals associated with model (9.2) for various spans. [Panels include span = 0.01, 0.05, 0.09, 0.10, 0.11, 0.12, 0.15, 0.25 and 0.35.]

Figure 9.6: P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2). [Horizontal axis: span for smoothing $m_2$.]

Figure 9.7: Plots of the fitted weather surface $m_2$ in model (9.4) when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09. The surface $m_2$ was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).

Figure 9.8: Degrees of freedom consumed by the fitted weather surface $m_2$ in model (9.4) versus the span used for smoothing $m_2$ when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09.

Figure 9.9: Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.10: Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.
Figure 9.11: Plot of residuals associated with model (9.3) versus temperature, given relative humidity. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.12: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.13: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).

Figure 9.14: Usual local linear backfitting estimate of the linear PM10 effect $\beta_1$ in model (9.4) versus the smoothing parameter.

Figure 9.17: Estimated order for AR process describing the serial correlation in the residuals associated with model (9.3) versus $l$, where $l = 0, 1, \ldots, 26$. Residuals were obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.

Figure 9.18: Estimated bias squared, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The estimated variance curves corresponding to small values of $l$ are dominated by those corresponding to large values of $l$ when the smoothing parameter is large.
In contrast, the estimated squared bias and mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$ when the smoothing parameter is large.

Figure 9.19: Estimated bias squared, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The curves corresponding to large values of $l$ dominate those corresponding to small values of $l$.

Figure 9.20: Plug-in choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

Figure 9.21: Global EBBS choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

Figure 9.22: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with plug-in choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.

Figure 9.23: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents intervals corresponding to values of $l$ that are reasonable for the data; the intervals corresponding to $l = 3, \ldots, 7$ do not cross the horizontal line passing through zero.
Figure 9.24: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.

Chapter 10

Conclusions

In this chapter, we provide an overview of the research problem considered in this thesis. We then outline the main contributions of this thesis and summarize the contents of each chapter. Finally, we suggest possible extensions to our work.

Partially Linear Models

Partially linear models are flexible tools for analyzing data from a variety of applications. They generalize linear regression models by allowing one of the variables in the model to have a non-linear effect on the response.

Inferences on the Linear Effects in Partially Linear Models

In many applications, the primary focus is on conducting inferences on the linear effects $\beta$ in a partially linear model. In these applications, the non-linear effect $m$ in the model is treated as a nuisance. This nuisance effect is a double-edged sword - while it affords greater modelling flexibility, it is also more difficult to estimate than the linear effects and, as such, it complicates the inferences on these effects.

Inferential Goals

Depending on the application, various goals could be relevant to the problem of conducting inferences on the linear effects in a partially linear model with correlated errors.

One goal would be to choose the correct amount of smoothing for accurately estimating the linear effects. One would hope that the methodology used for making this choice produces an amount of smoothing for which the linear effects are estimated at the 'usual' parametric rate of 1/n - the rate that would be achieved if the non-linear effect were known.
Another goal would be to construct valid standard errors for the estimated linear effects.

An additional goal would be to use the estimated linear effects and their associated standard errors to construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects, possibly adjusting for smoothing bias. Little has been done in the literature to address this goal.

Research Questions Concerning the Inferential Goals

Various research questions emerge in connection with the inferential goals listed above:

1. How can we choose the correct amount of smoothing for accurate estimation of the linear effects?
2. How can we estimate the correlation structure of the model errors for conducting inferences on the linear effects?
3. How can we construct valid standard errors for the estimated linear effects?
4. How can we construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects?
5. What is the impact of the choice of amount of smoothing on the validity of the confidence intervals and tests of hypotheses?
6. Could inefficient estimates of the linear effects provide valid inferences?
Thesis Contributions

The major contributions of this thesis to the research questions stated above are: (1) defining sensible estimators of the linear and non-linear effects in partially linear models with correlated errors, (2) deriving explicit expressions for the asymptotic conditional bias and variance of the proposed estimators of the linear effects, (3) developing data-driven methods for selecting the appropriate amount of smoothing for accurate estimation of the linear effects, (4) developing confidence interval and hypothesis testing procedures for assessing the magnitude and statistical significance of the linear effects of main interest, (5) studying the finite-sample properties of these procedures, and (6) applying these procedures to the analysis of an air pollution data set. These contributions are discussed in more detail below.

The estimators we proposed in this thesis are backfitting estimators, relying on locally linear regression, which is known to possess attractive theoretical and practical properties. Many of the backfitting estimators proposed in the literature of partially linear regression models with correlated errors rely on locally constant regression, a method that does not enjoy the good properties of locally linear regression.

In Chapters 4 and 5 of this thesis, we studied the large-sample behaviour of the estimators of linear effects introduced in this thesis as the width of the smoothing window used in locally linear regression decreases at a specified rate, and the number of data points in this window increases. Specifically, we obtained explicit expressions for the conditional asymptotic bias and variance of these estimators. Our asymptotic results are important as they show that, in the presence of correlation between the linear and non-linear variables in the model, the bias of the estimators of the linear effects can dominate their variance asymptotically, therefore compromising their $\sqrt{n}$-consistency.
This problem can be remedied, however, by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. This rate is slower than the rate that is optimal for estimation of the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect.

Selecting the appropriate amount of smoothing for the estimators of the linear effects is a crucial problem, which is complicated by the presence of error correlation and dependencies between the linear and nonlinear components of the model. Our theoretical results indicate that the amount of smoothing that is 'optimal' for estimating the non-linear effect is not 'optimal' for estimating the linear effects. Data-driven methods devised for accurate estimation of the non-linear effect will likely fail to yield a satisfactory choice of smoothing for estimating the linear effects. In this thesis, we proposed three data-driven smoothing parameter selection methods. Two of these methods are modifications of the EBBS method of Opsomer and Ruppert (1999) and rely on the asymptotic bias results derived in this thesis. The third method is a non-asymptotic plug-in method. Our methods fill a gap in the literature of partially linear models with correlated errors, as they are designed specifically for accurate estimation of the linear effects. These methods 'undersmooth' the estimated non-linear effect because they attempt to estimate the amount of smoothing that is MSE-optimal for estimating the linear effects, not the amount of smoothing that is MSE-optimal for estimating the non-linear effect. Our theoretical results suggest that, in general, the amount of smoothing that is MSE-optimal for estimating the linear effects is smaller than the amount of smoothing that is MSE-optimal for estimating the non-linear effect.
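For readers who want a concrete picture of the estimators discussed above, here is a minimal sketch of a usual backfitting estimate of a single linear effect with a local linear smoother. It assumes the familiar closed form $(X^T(I - S^c)X)^{-1}X^T(I - S^c)Y$ with $X = [1, x]$ and a centred smoother $S^c = (I - 11^T/n)S$; for this design the slope reduces to the ratio computed in `backfitting_slope`. The Gaussian kernel, the function names and the toy data are illustrative assumptions, and the definitions in Chapter 3 take precedence over this sketch.

```python
import math


def local_linear_smooth(z, v, h):
    """Local linear fit of v on z (Gaussian kernel, bandwidth h) at each z[i]."""
    fitted = []
    for z0 in z:
        d = [zj - z0 for zj in z]
        w = [math.exp(-0.5 * (dj / h) ** 2) for dj in d]
        s0 = sum(w)
        s1 = sum(wj * dj for wj, dj in zip(w, d))
        s2 = sum(wj * dj * dj for wj, dj in zip(w, d))
        denom = s0 * s2 - s1 * s1
        numer = sum(wj * (s2 - s1 * dj) * vj for wj, dj, vj in zip(w, d, v))
        fitted.append(numer / denom)
    return fitted


def backfitting_slope(x, z, y, h):
    """Slope component of (X'(I - S^c)X)^{-1} X'(I - S^c)Y for X = [1, x]."""
    x_bar = sum(x) / len(x)
    resid_x = [xi - si for xi, si in zip(x, local_linear_smooth(z, x, h))]
    resid_y = [yi - si for yi, si in zip(y, local_linear_smooth(z, y, h))]
    numer = sum((xi - x_bar) * r for xi, r in zip(x, resid_y))
    denom = sum((xi - x_bar) * r for xi, r in zip(x, resid_x))
    return numer / denom


# Toy, noise-free data (hypothetical): y = 2 + 0.5 * x + m(z) with m(z) = 3 * z.
n = 60
z = [i / (n - 1) for i in range(n)]
x = [math.sin(3.0 * i) + 0.5 * z[i] for i in range(n)]
y = [2.0 + 0.5 * xi + 3.0 * zi for xi, zi in zip(x, z)]

beta_hat = backfitting_slope(x, z, y, h=0.2)
print(beta_hat)
```

Because a local linear smoother reproduces functions that are linear in $z$, the slope is recovered up to rounding in this noise-free example. With a curved $m$ and correlated errors the estimate depends on $h$, which is the issue the smoothing-selection methods address.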
The issue of conducting valid inferences on the linear effects in a partially linear model with correlated errors is inter-connected with the appropriate choice of smoothing for estimating these effects. Most literature results devoted to this issue use choices of smoothing that 'do well' for estimation of the non-linear effect and are deterministic. Such choices may not be satisfactory when one wishes to 'do well' for estimation of the linear effects and hence have little practical value in such contexts. The confidence interval and hypothesis testing procedures proposed in this thesis are constructed with data-driven choices of smoothing. They are either standard, bias-adjusted or standard-error adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature of partially linear models. The inferential procedures we introduced in this thesis do not account for the uncertainty associated with the fact that the choice of smoothing is data-dependent and the error correlation structure is estimated from the data. However, simulations indicate that several of these procedures perform reasonably well for finite samples.

In Chapter 8, we conducted a Monte Carlo simulation study to investigate the finite sample properties of the linear effects estimators proposed in this thesis, namely, the usual and estimated modified local linear backfitting estimators. We also compared the properties of these estimators against those of the usual Speckman estimator. In our simulation study, we chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6. By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors (MCV) and for boundary effects.
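Cross-validation modified for correlated errors is referred to elsewhere in this thesis as leave-$(2l+1)$-out cross-validation. The idea can be sketched as follows for a purely nonparametric fit on an ordered series; this is a simplified illustration, not the criterion used in the thesis. When predicting observation $i$, the $2l+1$ observations with indices within $l$ of $i$ are omitted, so that neighbours whose errors are correlated with that of observation $i$ do not influence the prediction. The Gaussian kernel, the function names and the toy data are assumptions.

```python
import math


def kept_indices(i, n, l):
    """Indices j with |i - j| > l: everything except the 2l+1 points centred at i."""
    return [j for j in range(n) if abs(i - j) > l]


def local_linear_at(z0, z, y, h):
    """Local linear prediction at z0 (Gaussian kernel, bandwidth h)."""
    d = [zj - z0 for zj in z]
    w = [math.exp(-0.5 * (dj / h) ** 2) for dj in d]
    s0 = sum(w)
    s1 = sum(wj * dj for wj, dj in zip(w, d))
    s2 = sum(wj * dj * dj for wj, dj in zip(w, d))
    numer = sum(wj * (s2 - s1 * dj) * yj for wj, dj, yj in zip(w, d, y))
    return numer / (s0 * s2 - s1 * s1)


def mcv_score(z, y, h, l):
    """Average squared leave-(2l+1)-out prediction error for bandwidth h."""
    n = len(z)
    total = 0.0
    for i in range(n):
        keep = kept_indices(i, n, l)
        pred = local_linear_at(z[i], [z[j] for j in keep], [y[j] for j in keep], h)
        total += (y[i] - pred) ** 2
    return total / n


# Toy series (hypothetical): an exactly linear trend observed on days 0, ..., 49.
z = [float(i) for i in range(50)]
y = [1.0 + 2.0 * zi for zi in z]
score = mcv_score(z, y, h=5.0, l=3)
print(score)
```

Setting `l = 0` gives ordinary leave-one-out cross-validation; in practice one would evaluate `mcv_score` over a grid of bandwidths and keep the minimizer.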
The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the usual local linear backfitting estimator should be used in practice, with either a global modified EBBS or a non-asymptotic plug-in choice of smoothing. To ensure the validity of the inferences based on this estimator and its associated standard error, one should never use small values of $l$ in the modified (or leave-$(2l+1)$-out) cross-validation criterion utilized in estimating the error correlation structure. Adjusting these inferences for possible bias effects did not affect the quality of our results. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings, owing to the fact that the associated standard errors were too variable. The quality of the inferences based on the Speckman estimator was reasonable for most simulation settings, but not as good as that of the inferences based on the usual local linear backfitting estimator.

In Chapter 9, we used the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggested that there is no conclusive proof that PM10 had a significant short-term effect on log mortality. Our data analysis differs from standard analyses in that it relies on objective methods to adjust this effect for temporal confounding.

Further Work to be Done

As usual, there is further work to be done. The following are just a few of the issues that need additional investigation.

Proofs of the asymptotic normality of the linear effects estimators proposed in this thesis are still pending.
These proofs will provide formal justification for using standard confidence intervals and tests of hypotheses based on these estimators and their associated standard errors.

Further investigation into the appropriate choice of $l$ in the modified cross-validation criterion used in estimating the error correlation structure is needed. This choice should take into account the range and magnitude of the error correlation.

Possible Extensions to Our Work

The work in this thesis can be extended in various directions.

First, we could extend the partially linear model considered in this thesis by allowing additional univariate smooth terms to enter the model. Such models arise frequently in practical applications. Developing inferential methodology for these models is therefore important. To carry out inferences on the linear effects in such models we would need to simultaneously choose the amounts of smoothing for estimating all the non-linear effects. These amounts should be appropriate for accurate estimation of the linear effects and should account for correlation between the linear and non-linear variables and correlation between the model errors.

Second, we could extend the partially linear model considered in this thesis to responses that are not continuous. For instance, the responses could follow a Poisson distribution. Incorporating correlation in such models could be a challenge.

Third, we could extend the partially linear model considered in this thesis by allowing the non-linear variable to be a spatial coordinate, in which case $m$ is a spatial effect. Such a model is termed a spatial partially linear model. Clearly, in many contexts, the errors would be correlated. Spatial partially linear models with correlated errors can be used, for instance, to analyze spatial data observed in epidemiological studies of particulate air pollution and mortality.
Typically, in these applications, the linear effects $\beta$ are of main interest, while the spatial effect $m$ is treated as a nuisance. Ramsay et al. (2003b) considered spatial partially linear models with uncorrelated errors and estimated $\beta$ and $m$ using the S-Plus function gam with loess as a smoother. They used gam's default choice of smoothing to control the degree of smoothness of the estimated $m$. They showed via simulation that the correlation between the linear and spatial terms in the model can lead to underestimation of the true standard errors associated with the estimated linear effects, both when using S-Plus standard errors and so-called asymptotically unbiased standard errors. They cautioned that using such standard errors can compromise the validity of inferences concerning the linear effects, but did not propose a solution for alleviating this problem. Their findings highlight the fact that carrying out inferences on the linear effects in spatial partially linear models with uncorrelated errors is challenging in the presence of correlation between the linear and spatial terms in the model. Obviously, error correlation will further compound the challenges involved in conducting valid inferences on the linear effects in spatial partially linear models. Of course, this work would be relevant in the non-spatial context as well.

Bibliography

[1] Aneiros Perez, G. and Quintela del Rio, A. (2001a). Asymptotic properties in partial linear models under dependence. Test, 10, 333-355.
[2] Aneiros Perez, G. and Quintela del Rio, A. (2001b). Modified cross-validation in semiparametric regression models with dependent errors. Communications in Statistics: Theory and Methods, 30, 289-307.
[3] Aneiros Perez, G. and Quintela del Rio, A. (2002). Plug-in bandwidth choice in partial linear models with autoregressive errors. Journal of Statistical Planning and Inference, 100, 23-48.
[4] Bos, R., de Waele, S. and Broersen, P.M.T. (2002).
Autoregressive spectral estimation by application of the Burg algorithm to irregularly sampled data. IEEE Transactions on Instrumentation and Measurement, 51, 1289-1294.
[5] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods. Second Edition. New York: Springer-Verlag.
[6] Broersen, P.M.T. (2000). Finite Sample Criteria for Autoregressive Order Selection. IEEE Transactions on Signal Processing, 48, 3550-3558.
[7] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models (with discussion). Annals of Statistics, 17, 453-555.
[8] Chatfield, C. (1989). The Analysis of Time Series: An Introduction. Fourth Edition. New York: Chapman and Hall.
[9] Chu, C.-K. and Marron, J.S. (1991). Comparison of two bandwidth selectors with dependent errors. Annals of Statistics, 19, 1906-1918.
[10] David, B. and Bastin, G. (2001). An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems. Automatica, 156, 99-106.
[11] Dominici, F., McDermott, A., Zeger, S.L. and Samet, J.M. (2002). On the use of generalized additive models in time-series studies of air pollution and health. American Journal of Epidemiology, 156, 193-203.
[12] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1983). Nonparametric estimates of the relation between weather and electricity demand. Technical report, U.C. San Diego.
[13] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. The Journal of the American Statistical Association, 81, 310-320.
[14] Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. The Annals of Statistics, 21, 196-216.
[15] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. New York: Chapman and Hall.
[16] Fan, J. and Gijbels, I. (1992).
Variable Bandwidth and Local Linear Regression Smoothers. The Annals of Statistics, 20, 2008-2036.
[17] Francisco-Fernandez, M. and Vilar-Fernandez, J.M. (2001). Local polynomial regression with correlated errors. Communications in Statistics: Theory and Methods, 30, 1271-1293.
[18] Gasser, T. and Müller, H.G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics, 11, 171-185.
[19] Green, P., Jennison, C. and Seheult, A. (1985). Analysis of field experiments by least squares smoothing. Journal of the Royal Statistical Society, Series B, 47, 299-315.
[20] Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models. New York: Chapman and Hall.
[21] Hardle, W. and Vieu, P. (1992). Kernel regression smoothing of time series. Journal of Time Series Analysis, 13, 209-232.
[22] Heckman, N.E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society, Series B, 48, 244-248.
[23] Ibragimov, I.A. and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables. Groningen: Wolters Noordhoff.
[24] Katsouyanni, K., Toulomi, G. and Samoli, E., et al. (1997). Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA2 project. Epidemiology, 12, 521-531.
[25] Kelsall, J.E., Samet, J.M. and Zeger, S.L. (1997). Air pollution and mortality in Philadelphia, 1974-1988. American Journal of Epidemiology, 146, 750-762.
[26] Moolgavakar, S. (2000). Air pollution and hospital admissions for diseases of the circulatory system in three U.S. metropolitan areas. Journal of the Air & Waste Management Association, 50, 1199-1206.
[27] Moyeed, R.A. and Diggle, P.J. (1994). Rate of convergence in semiparametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.
[28] Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141-142.
[29] Opsomer, J.D. and Ruppert, D. (1998). A fully automated bandwidth selection method for fitting additive models. The Journal of the American Statistical Association, 93, 605-620.
[30] Opsomer, J.D. and Ruppert, D. (1999). A root-n consistent estimator for semiparametric additive modelling. Journal of Computational and Graphical Statistics, 8, 715-732.
[31] Ramsay, T., Burnett, R. and Krewski, D. (2003a). The effect of concurvity in generalized additive models linking mortality and ambient air pollution. Epidemiology, 14, 18-23.
[32] Ramsay, T., Burnett, R. and Krewski, D. (2003b). Exploring bias in a generalized additive model for spatial air pollution data. Environmental Health Perspectives, 111, 1283-1288.
[33] Rice, J.A. (1986). Convergence rates for partially splined models. Statistics and Probability Letters, 4, 203-208.
[34] Robinson, P.M. (1988). Root-n-consistent semiparametric regression. Econometrica, 56, 931-954.
[35] Samet, J.M., Dominici, F., Curriero, F., et al. (2000). Fine particulate air pollution and mortality in 20 U.S. cities: 1987-1994 (with discussion). New England Journal of Medicine, 343, 1742-1757.
[36] Schwartz, J. (1994). Nonparametric smoothing in the analysis of air pollution and respiratory illness. The Canadian Journal of Statistics, 22, 471-488.
[37] Schwartz, J. (1999). Air pollution and hospital admissions for heart disease in eight US counties. Epidemiology, 10, 17-22.
[38] Schwartz, J. (2000). Assessing confounding, effect modification, and thresholds in the associations between ambient particles and daily deaths. Environmental Health Perspectives, 108, 563-568.
[39] Shick, A. (1996). Efficient estimation in a semiparametric additive regression model with autoregressive errors. Stochastic Processes and their Applications, 61, 339-361.
[40] Shick, A. (1999). Efficient estimation in a semiparametric additive regression model with ARMA errors. Stochastic Processes and their Applications, 61, 339-361.
[41] Speckman, P.E. (1988). Regression analysis for partially linear models. Journal of the Royal Statistical Society, Series B, 50, 413-436.
[42] Sy, H. (1999). Automatic bandwidth choice in a semiparametric regression model. Statistica Sinica, 9, 775-794.
[43] Truong, Y.K. (1991). Nonparametric curve estimation with time series errors. Journal of Statistical Planning and Inference, 28, 167-183.
[44] Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David, ed.) Iowa State University Press, 205-235.
[45] Watson, G.S. (1964). Smooth regression analysis. Sankhya A, 26, 359-372.
[46] You, J. and Chen, G. (2004). Block external bootstrap in partially linear models with nonstationary strong mixing error terms. The Canadian Journal of Statistics, 32, 335-346.
[47] You, J., Zhou, X. and Chen, G. (2005). Jackknifing in partially linear regression models with serially correlated errors. Journal of Multivariate Analysis, 92, 386-404.

Appendix A

MSE Comparisons

In this appendix, we provide plots to help assess and compare the MSE properties of the estimators of the linear effect $\beta_1$ in model (8.1) that were discussed in Section 8.2.

Figure A.1: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

Figure A.2: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

Figure A.3: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.
Figure A.4: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

Figure A.5: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

Figure A.6: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}_{U,\text{PLUG-IN}}$, $\hat{\beta}_{U,\text{EBBS-G}}$ and $\hat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus U_PLUG_IN", "U_EBBS_L minus U_PLUG_IN" and "U_EBBS_L minus U_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.7: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$ and $\hat{\beta}^{(l)}_{U,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus U_PLUG_IN", "U_EBBS_L minus U_PLUG_IN" and "U_EBBS_L minus U_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.8: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$ and $\hat{\beta}^{(l)}_{U,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.
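The S labels in these boxplots summarize a test of whether the mean difference in log MSE is zero. The Python sketch below illustrates one way such a label could be computed. It assumes a two-sided large-sample z test on the per-data-set differences, which the captions do not specify, and it uses made-up log MSE values; the function name `label_if_significant` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def label_if_significant(diffs, level=0.05):
    """Return 'S' when the mean of diffs differs from 0 at the given level.

    Uses a two-sided large-sample z test on the mean difference. This is an
    assumed choice: the figure captions do not name the test.
    """
    n = len(diffs)
    z = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - level / 2)
    return "S" if abs(z) > critical else ""


# Hypothetical log MSE values for two estimators on 500 simulated data sets.
rng = random.Random(0)
log_mse_a = [rng.gauss(-3.0, 0.5) for _ in range(500)]
log_mse_b = [x + rng.gauss(0.1, 0.2) for x in log_mse_a]  # b is worse on average

# One boxplot corresponds to one list of per-data-set differences.
diffs = [b - a for a, b in zip(log_mse_a, log_mse_b)]
print(repr(label_if_significant(diffs)))
```

Working with per-data-set differences, rather than comparing two separate averages, keeps the pairing between estimators evaluated on the same simulated data.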
[Figure: boxplot panels "U_EBBS_G minus U_PLUG_IN", "U_EBBS_L minus U_PLUG_IN" and "U_EBBS_L minus U_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.9: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$ and $\hat{\beta}^{(l)}_{U,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus U_PLUG_IN", "U_EBBS_L minus U_PLUG_IN" and "U_EBBS_L minus U_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.10: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$ and $\hat{\beta}^{(l)}_{U,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.11: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.12: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.13: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.14: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.15: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.16: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 significance level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.17: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.18: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.19: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "EM_EBBS_G minus EM_PLUG_IN", "EM_EBBS_L minus EM_PLUG_IN" and "EM_EBBS_L minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.20: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{EM,PLUG-IN}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{EM,EBBS-L}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.
[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.21: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.22: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.23: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.24: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.25: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.26: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.
[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.27: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.28: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.29: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure: boxplot panels "U_EBBS_G minus EM_EBBS_G", "U_PLUG_IN minus EM_EBBS_G", "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_EBBS_G", "S_MCV minus U_PLUG_IN" and "S_MCV minus EM_EBBS_G"; horizontal axis l = 0, ..., 10.]
Figure A.30: Boxplots of pairwise differences in log MSE for the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$, $\hat{\beta}^{(l)}_{EM,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

Appendix B

Validity of Confidence Intervals

In this appendix, we provide plots that help assess and compare the coverage properties of various methods for constructing standard 95% confidence intervals for $\beta_1$, the linear effect in model (8.1). For each method, we visualize point estimates and 95% confidence interval estimates for the true coverage achieved by that method.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; the legend marks the method with superior MSE performance.]
Figure B.1: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$.
The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_1(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.2$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; the legend marks the method with superior MSE performance.]
Figure B.2: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.4$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; the legend marks the method with superior MSE performance.]
Figure B.3: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_1(z)$.
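Each circle and segment in these coverage plots is an estimate of a binomial proportion together with an interval for it. The Python sketch below shows one way to compute such a pair. It assumes a Wald (normal approximation) interval, which the captions do not specify, and the counts are invented for illustration; `coverage_estimate` is a hypothetical name.

```python
import math
from statistics import NormalDist


def coverage_estimate(covered, level=0.95):
    """Point estimate and Wald interval for the true coverage probability.

    covered holds one boolean per simulated data set: True when the
    confidence interval built from that data set contains the true beta_1.
    The Wald form is an assumption, not taken from the thesis.
    """
    n = len(covered)
    p_hat = sum(covered) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)


# Hypothetical outcome: 470 of 500 simulated intervals contain the true beta_1.
covered = [True] * 470 + [False] * 30
p_hat, lower, upper = coverage_estimate(covered)

# A method is consistent with its nominal level when the segment crosses
# the horizontal line drawn at that level.
nominal = 0.95
print(f"{p_hat:.3f} [{lower:.3f}, {upper:.3f}]", lower <= nominal <= upper)
```

When the estimated coverage is very close to 0 or 1, the Wald interval can behave poorly, so a different interval for a proportion may be preferable in that case.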
[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.6$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; the legend marks the method with superior MSE performance.]
Figure B.4: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.8$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; the legend marks the method with superior MSE performance.]
Figure B.5: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$"; the legend marks the method with superior MSE performance.]
Figure B.6: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line.
Estimates were obtained with $\rho = 0$ and $m(z) = m_2(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.2$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$"; the legend marks the method with superior MSE performance.]
Figure B.7: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.4$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$"; the legend marks the method with superior MSE performance.]
Figure B.8: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.6$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$"; the legend marks the method with superior MSE performance.]
Figure B.9: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$.
The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure: coverage panels USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV; header "$\rho = 0.8$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$"; the legend marks the method with superior MSE performance.]
Figure B.10: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_2(z)$.

Appendix C

Confidence Interval Length Comparisons

In this appendix, we provide plots that help assess and compare the length properties of three methods for constructing standard 95% confidence intervals for $\beta_1$, the linear effect in model (8.1). These methods rely on the estimators $\hat{\beta}^{(l)}_{U,PLUG-IN}$, $\hat{\beta}^{(l)}_{U,EBBS-G}$ and $\hat{\beta}^{(l)}_{S,MCV}$, and their associated standard errors. We remind the reader that the finite sample properties of these estimators were investigated via simulation in Chapter 8.

[Figure: header "$\rho = 0$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$"; top row: panels U-PLUG-IN, U-EBBS-G and S-MCV, with a legend marking the confidence interval with shortest expected length for each l > 0 (among U-PLUG-IN, U-EBBS-G and S-MCV); bottom three rows: boxplot panels "U_EBBS_G minus U_PLUG_IN", "S_MCV minus U_PLUG_IN" and "S_MCV minus U_EBBS_G"; horizontal axis l = 0, ..., 10.]
1=6 _EBBS_G 1=7 1=8 1=9 1=10 + 1 t 1 t • i i s s 1 S S S • i s 1 S S 1 s I 1=0 1=1 1=2 1=3 1=4 1=5 1=6 1=7 1=8 1=9 1 = 10 Figure C l : Top row: Average length of the standard confidence intervals for the linear effect ft in model (8.1) as a function of I — 0,1, . . . , 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for ft. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0 and m(z) = m\(z). 235 p = 0.2; m(z) = 2sin(3z) - 2(cos(0)-cos(3))/3 U - P L U G - I N U - E B B S - G S - M C V 1.2 1.2 1.2 1 1.15 1.1 i 1 | 1 * * * M * 1.15 1.1 T 1.15 1.1 t * 1.05 1.05 1.05 0 5 10 0 5 10 0 5 10 = I > 1 = confidence interval with shortest expected length for each l> 1 (among U - P L U G - I N , U - E B B S - G and S-MCV) U_EBBS_G minus U PLUG IN l=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 S_MCV minus U EBBS G l=8 l = 9 l=9 1 = 10 1=10 Figure C.2: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I = 0,1,..., 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — 0.2 and m(z) — m 1 (z). 236 p = 0.4; m(z) = 2sin(3z) - 2(cos(0)-cos(3))/3 U - P L U G - I N 1.2 1.1 U - E B B S - G S - M C V 1.2 1.2 * » « » » * « t f 1.1 , * . . . . . . 
* 1.1 T • 1 ff • 1 0 5 10 0 5 10 0 5 10 : I > 2 = confidence interval with shortest expected length for each l> 2 (among U-PLUG-IN, U - E B B S - G and S - M C V ) U_EBBS_G minus U PLUG IN l=6 l=7 l=8 l = 9 1=10 Figure C.3: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I — 0 , 1 , . . . , 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0.4 and m(z) — mi(z). 237 p = 0.6; m(z) = 2sin(3z) - 2(cos(0)-cos(3))/3 1.2 1 0 . 8 U - P L U G - I N » » » <1D « : 0 1 0 U - E B B S - G 1.2 1 0.8 * * * * 1 0 S - M C V I > 3 = confidence interval with shortest expected length for each l> 3 (among U - P L U G - I N , U - E B B S - G and S-MCV) U_EBBS_G minus U PLUG IN l=3 l=4 l=5 l=6 l=7 S MCV minus U PLUG IN l=8 l=9 1=10 -S S S S S J I I I 1 1 L ± ± 1 -1- -1-• + + + » s s s s s 1 1 1 1 1 0.5 0 -0.5 1 0.5 0 -0.5 l=0 l=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 S MCV minus U EBBS G l=8 l=9 1=10 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 Figure C.4: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I — 0 ,1, . . . , 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0.6 and m(z) = mi(z). 
238 p = 0.8; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 U - P L U G - I N U - E B B S - G S - M C V 1.4 1.2 1 0.8 1.4 1.2 • • • 1 0.8 • • • • 0 S 10 0 5 10 I > 4 0.2 0 -0.2 -0.4 -0.6 = r a = confidence interval with shortest expected length for each l> 4 (among U - P L U G - I N , U - E B B S - G and S - M C V ) U_EBBS_G minus U_PLUG_IN "i i 1 1 1 1 1— I f -1 r S S 1=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 S MCV minus U PLUG IN - A i l j ^ Eh Ek ^ 1 f » -S S S S S S S S S S S ' 1 1 1 1 1 1 1 1 1 I | 0.5 1=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 S_MCV minus U EBBS G I 1 1 I I j. JL JL •*• J . j . J . S S S S S S S S S S S I 1 1 1 1 I I I I I I I 0.5 h l=0 1=1 l=2 l = 3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 Figure C.5: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) os a function of I = 0,1,. . . ,10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0.8 and m(z) — mi(z). 239 U - P L U G - I N p = 0; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 U - E B B S - G S - M C V = I > 0 -0.5 0.5 -0.5 - confidence interval with shortest expected length for each l> 0 (among U - P L U G - I N , U - E B B S - G and S - M C V ) U_EBBS_G minus U PLUG IN 1=3 1=4 1=5 1=6 1=7 S_MCV minus U EBBS G - + I 1 1 1 1 . t . 1 , f , i s 1 s 1 ' + ' s s S J i " i • s 1 S 1 S 1 S 1 S S 1=0 1=1 l=2 1=3 1=4 1=5 1=6 1=7 1=8 1=9 1=10 Figure C.6: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I — 0,1, . . . , 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. 
Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — 0 and m(z) — 1712(2). 240 U - P L U G - I N p = 0.2; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 U - E B B S - G S P E C K M A N + MCV I > 1 0.2 0 -0.2 -0.4 0.5 0 -0.5 0.5 0 -0.5 = confidence interval with shortest expected length for each l> 1 (among U - P L U G - I N , U - E B B S - G and S - M C V ) U EBBS G minus U PLUG IN I l l l l l l i i | 1 I » 1 ! S 4-s s s , 1 ( 1 1 1 s s s s 1=0 1=1 l=2 l=3 l=4 l=5 l=6 l= S_MCV minus U_PLUG_IN 7 1= 8 1=9 1=10 i i i i i i + i i - L i J L JL J 1 1 1 1 J - ± + ^ r u r j ^ T r - i - + -1- 4 . s s s s s s s s 1 1 1 1 1 1 1 1 1 1 1 1=0 1=1 l=2 l=3 l=4 l=5 l=6 l= S_MCV minus U _ E B B S _ G 7 1= 8 l=9 1=10 I I I I I I I ; JL 1 1 ± ± t ± j i i i L J L ± J L + 4. 1 1 4 - + -L J , S S S S S S S S S 1 1 1 1 1 1 1 1 1 1 1 1 l=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 Figure C.7: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I = 0,1,. . . ,10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0.2 and m(z) = 7712(2). 241 p = 0.4; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 U - P L U G - I N 0.2 — 0 — -0.2 --0.4 -U - E B B S - G S - M C V 1.2 1.2 1.1 1.1 1 • 1 0 5 10 0 5 10 = l>2 = confidence interval with shortest expected length for each l> 2 (among U - P L U G - I N , U - E B B S - G and S - M C V ) U_EBBS_G minus U PLUG IN 0.5 -0.5 i i i i i i S —I— I — 0 1=1 l=2 l=3 l=4 l=5 l=6 S_MCV minus U PLUG IN l=7 l=8 l=9 1=10 - i r i r !! j . 
+ T~ m—r s s s T S I i 3 4 4 1 s s 1=0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 S_MCV minus U EBBS G l = 8 l=9 1=10 l=8 l=9 1=10 Figure C.8: Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I = 0,1,... ,10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — OA and m(z) = 7 7 1 2 ( 2 ) . 242 p = 0.6; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 1.2 1 0.8 U - P L U G - I N U - E B B S - G 10 10 S - M C V 1.2 1.2 • tj> * « « : « . « « ; » 1 • » : * * # * « » 1 • * • • 0.8 • • 0.8 10 = I ;> 3 0.5 0 -0.5 1 0.5 0 -0.5 = confidence interval with shortest expected length for each l> 3 (among U-PLUG-IN, U - E B B S - G and S-MCV) U_EBBS_G minus U PLUG IN l=2 l=3 l=4 l=5 l=6 l=7 S_MCV minus U EBBS G l = 8 l=9 1=10 " i — r 1 1 1 1 P b|d j d I-7-I h|=l =t I I 1 1 1 i-i-l ^-i hjH Fj=l f=Y=l + + S S S S S S I 1 1 1 I I I S s s s s — i 1 i i i l = 0 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 Figure C.9: Top row: Average length of the standard confidence intervals for the linear effect 3\ in model (8.1) as a function of I = 0,1,. . . ,10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Q\. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — 0.6 and m(z) = 7 7 1 2 ( 2 ) . 
243 p = 0.8; m(z) = 2sin(6z) - 2(cos(0)-cos(6))/6 U - PLUG-IN U - E B B S - G 3 - M C V 1.4 1.2 1.4 1.2 1.4 1.2 • • 1 1 1 • « « • » « « « • • 0.8 • • 0.8 • • • • 0.8 0 5 10 0 5 10 0 5 10 I > 4 0.2 r -0 — -0.2 --0.4 --0.6 -0.5 - confidence interval with shortest expected length for each l> 4 (among U - P L U G - I N , U - E B B S - G and S - M C V ) U_EBBS_G minus U_PLUG_IN ! ! 1 * l = ° 1=1 l=2 l=3 l=4 l=5 l=6 l=7 l=8 l=9 1=10 S_MCV minus U_PLUG_IN I 1 1 ¥ 4* R I 1 p I ' • : J h -s s s s s s s s s s s 1=0 1=1 |=2 l= I I 1 1 1 3 l=4 l=5 l=6 l= S_MCV minus U_EBBS_G i 1 1 , 7 1= B 1=9 1 = 10 1=7 1=8 1=9 1=10 Figure C.10: Top row: Average length of the standard confidence intervals for the linear effect ft in model (8.1) as a function of I = 0,1, . . . , 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for ft. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — 0.8 and m(z) — 1 7 1 2 ( 2 ) . 244
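The "S" labels in the boxplots of Figures C.1 to C.10 flag pairs of methods whose average difference in interval length differs significantly from 0 at the 0.05 level. The sketch below illustrates one way such a label could be computed from paired lengths obtained on the same simulated data sets. It assumes a large-sample z approximation for the mean of the paired differences, which may not be the test used for the figures; the function names `mean_difference_test` and `boxplot_label` and the example lengths are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_difference_test(lengths_a, lengths_b, level=0.05):
    """Two-sided test of H0: mean(lengths_a - lengths_b) = 0 on paired data.

    Returns (average difference, standard error, p-value, significant?).
    The large-sample z approximation used here is an assumption of this
    sketch, not necessarily the test behind the "S" labels in the figures.
    """
    if len(lengths_a) != len(lengths_b):
        raise ValueError("the two methods must be evaluated on the same data sets")
    diffs = [a - b for a, b in zip(lengths_a, lengths_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired lengths")
    avg = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0.0:
        # All differences are identical: significant only if they are nonzero.
        return avg, se, (1.0 if avg == 0.0 else 0.0), avg != 0.0
    z = avg / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return avg, se, p_value, p_value < level


def boxplot_label(lengths_a, lengths_b, level=0.05):
    """Return "S" when the average paired difference differs from 0."""
    return "S" if mean_difference_test(lengths_a, lengths_b, level)[3] else ""


# Hypothetical interval lengths from five simulated data sets (made-up numbers).
s_mcv = [1.00, 1.10, 1.20, 1.30, 1.40]
u_plug_in = [1.10, 1.25, 1.30, 1.45, 1.50]
print(boxplot_label(s_mcv, u_plug_in))
```

Working with paired differences, rather than comparing the two average lengths directly, removes the variation that both methods share because they are applied to the same simulated data sets. With only a handful of simulated data sets, a t reference distribution would be the more cautious choice than the normal approximation used here.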
Inference in partially linear models with correlated errors Ghement, Isabella Rodica 2005
Title | Inference in partially linear models with correlated errors |
Creator | Ghement, Isabella Rodica |
Date Issued | 2005 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2009-12-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0092286 |
URI | http://hdl.handle.net/2429/16950 |
Degree | Doctor of Philosophy - PhD |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2005-11 |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |