Inference in Partially Linear Models with Correlated Errors

by

ISABELLA RODICA GHEMENT

B.Sc., The University of Bucharest, Romania, 1996
M.Sc., The University of Bucharest, Romania, 1997

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE STUDIES (Statistics)

The University of British Columbia
August 2005

© Isabella Rodica Ghement, 2005

Abstract

We study the problem of performing statistical inference on the linear effects in partially linear models with correlated errors. To estimate these effects, we introduce usual, modified and estimated modified backfitting estimators, relying on locally linear regression. We obtain explicit expressions for the conditional asymptotic bias and variance of the usual backfitting estimators under the assumption that the model errors follow a mean zero, covariance-stationary process. We derive similar results for the modified backfitting estimators under the more restrictive assumption that the model errors follow a mean zero, stationary autoregressive process of finite order. Our results assume that the width of the smoothing window used in locally linear regression decreases at a specified rate, and the number of data points in this window increases. These results indicate that the squared bias of the considered estimators can dominate their variance in the presence of correlation between the linear and non-linear variables in the model, therefore compromising their $\sqrt{n}$-consistency. We suggest that this problem can be remedied by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. We argue that this rate is slower than the rate that is optimal for estimating the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect.
For this reason, data-driven methods devised for accurate estimation of the non-linear effect may fail to yield a satisfactory choice of smoothing for estimating the linear effects. We introduce three data-driven methods for accurate estimation of the linear effects. Two of these methods are modifications of the Empirical Bias Bandwidth Selection method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. We use the data-driven choices of smoothing supplied by these methods as a basis for constructing approximate confidence intervals and tests of hypotheses for the linear effects. Our inferential procedures do not account for the uncertainty associated with the fact that the choices of smoothing are data-dependent and the error correlation structure is estimated from the data. We investigate the finite sample properties of our procedures via a simulation study. We also apply these procedures to the analysis of data collected in a time-series air pollution study.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Literature Review
    1.1.1 Partially Linear Models with Uncorrelated Errors
    1.1.2 Partially Linear Models with Correlated Errors
  1.2 Thesis Objectives

2 A Partially Linear Model with Correlated Errors
  2.1 The Model
  2.2 Assumptions
  2.3 Notation
  2.4 Linear Algebra - Useful Definitions and Results
  2.5 Appendix

3 Estimation in a Partially Linear Model with Correlated Errors
  3.1 Generic Backfitting Estimators
    3.1.1 Usual Generic Backfitting Estimators
    3.1.2 Modified Generic Backfitting Estimators
    3.1.3 Estimated Modified Generic Backfitting Estimators
    3.1.4 Usual, Modified and Estimated Modified Speckman Estimators

4 Asymptotic Properties of the Local Linear Backfitting Estimator $\widehat{\beta}_{I,S_h^c}$
  4.1 Exact Conditional Bias of $\widehat{\beta}_{I,S_h^c}$ Given X and Z
  4.2 Exact Conditional Variance of $\widehat{\beta}_{I,S_h^c}$ Given X and Z
  4.3 Exact Conditional Measure of Accuracy of $\widehat{\beta}_{I,S_h^c}$ Given X and Z
  4.4 The $\sqrt{n}$-consistency of $\widehat{\beta}_{I,S_h^c}$
  4.5 Generalization to Local Polynomials of Higher Degree
  4.6 Appendix

5 Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators, $\widehat{\beta}_{\Psi^{-1},S_h^c}$ and $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$
  5.1 Exact Conditional Bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ Given X and Z
  5.2 Exact Conditional Variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ Given X and Z
  5.3 Exact Conditional Measure of Accuracy of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ Given X and Z
  5.4 The $\sqrt{n}$-consistency of $\widehat{\beta}_{\Psi^{-1},S_h^c}$
  5.5 Generalization to Local Polynomials of Higher Degree
  5.6 The $\sqrt{n}$-consistency of $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$
  5.7 Appendix

6 Choosing the Correct Amount of Smoothing
  6.1 Notation
  6.2 Choosing $h$ for $c^T \widehat{\beta}_{I,S_h^c}$ and $c^T \widehat{\beta}_{\Psi^{-1},S_h^c}$
    6.2.1 Review of Opsomer and Ruppert's EBBS method
    6.2.2 Modifications to the EBBS method
    6.2.3 Plug-in method
  6.3 Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$
    6.3.1 Estimating $m$
    6.3.2 Estimating $\sigma_\epsilon^2$ and $\Psi$
  6.4 Choosing $h$ for $c^T \widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$

7 Confidence Interval Estimation and Hypothesis Testing
  7.1 Confidence Interval Estimation
    7.1.1 Bias-Adjusted Confidence Interval Construction
    7.1.2 Standard Error-Adjusted Confidence Interval Construction
  7.2 Hypothesis Testing

8 Monte Carlo Simulations
  8.1 The Simulated Data
  8.2 The Estimators
  8.3 The MSE Comparisons
  8.4 Confidence Interval Coverage Comparisons
    8.4.1 Standard Confidence Intervals
    8.4.2 Bias-Adjusted Confidence Intervals
    8.4.3 Standard Error-Adjusted Confidence Intervals
  8.5 Confidence Interval Length Comparisons
  8.6 Conclusions

9 Application to Air Pollution Data
  9.1 Data Description
  9.2 Data Analysis
    9.2.1 Models Entertained for the Data
    9.2.2 Importance of Choice of Amount of Smoothing
    9.2.3 Choosing an Appropriate Model for the Data
    9.2.4 Inference on the PM10 Effect on Log Mortality

10 Conclusions

Bibliography

Appendix A MSE Comparisons
Appendix B Validity of Confidence Intervals
Appendix C Confidence Interval Length Comparisons

List of Tables

8.1 Values of $l$ for which the standard 95% confidence intervals for $\beta_1$ constructed from the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study.

List of Figures

8.1 Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_1(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

8.2 Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_2(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

9.1 Pairwise scatter plots of the Mexico City air pollution data.

9.2 Results of gam inferences on the linear PM10 effect $\beta_1$ in model (9.3) as a function of the span used for smoothing the seasonal effect $m_1$:
estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for $\beta_1$ (bottom left) and p-values of t-tests for testing the statistical significance of $\beta_1$.

9.3 The top panel displays a scatter plot of log mortality versus PM10. The ordinary least squares regression line of log mortality on PM10 is superimposed on this plot. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.

9.4 Plots of the fitted seasonal effect $m_1$ in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots.

9.5 Plots of the residuals associated with model (9.2) for various spans.

9.6 P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2).

9.7 Plots of the fitted weather surface $m_2$ in model (9.4) when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09. The surface $m_2$ was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).

9.8 Degrees of freedom consumed by the fitted weather surface $m_2$ in model (9.4) versus the span used for smoothing $m_2$ when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09.

9.9 Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

9.10 Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

9.11 Plot of residuals associated with model (9.3) versus temperature, given relative humidity.
The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

9.12 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

9.13 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).

9.14 Usual local linear backfitting estimate of the linear PM10 effect in model (9.4) versus the smoothing parameter.

9.15 Preliminary estimates of the seasonal effect $m$ in model (9.3), obtained with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.

9.16 Residuals associated with model (9.3), obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.

9.17 Estimated order for AR process describing the serial correlation in the residuals associated with model (9.3) versus $l$, where $l = 0, 1, \ldots, 26$. Residuals were obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.

9.18 Estimated bias squared, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The estimated variance curves corresponding to small values of $l$ are dominated by those corresponding to large values of $l$ when the smoothing parameter is large. In contrast, the estimated squared bias and mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$ when the smoothing parameter is large.

9.19 Estimated bias squared, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of $\beta_1$.
The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The curves corresponding to large values of $l$ dominate those corresponding to small values of $l$.

9.20 Plug-in choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

9.21 Global EBBS choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

9.22 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with plug-in choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.

9.23 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents intervals corresponding to values of $l$ that are reasonable for the data; the intervals corresponding to $l = 3, \ldots, 7$ do not cross the horizontal line passing through zero.

9.24 Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.

A.1 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.
A.2 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

A.3 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

A.4 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

A.5 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

A.6 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$.
Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

A.7 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

A.8 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

A.9 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

A.10 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

A.11 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

A.12 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

A.13 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

A.14 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.
A.15 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

A.16 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 significance level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

A.17 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

A.18 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

A.19 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

A.20 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

A.21 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

A.22 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

A.23 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

A.24 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

A.25 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

A.26 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

A.27 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

A.28 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

A.29 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

A.30 Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}^{(l)}_{U,\text{PLUG-IN}}$, $\widehat{\beta}^{(l)}_{U,\text{EBBS-G}}$, $\widehat{\beta}^{(l)}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}^{(l)}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S.
Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

B.1 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_1(z)$.

B.2 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_1(z)$.

B.3 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_1(z)$.

B.4 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line.
Estimates were obtained with $\rho = 0.6$ and $m(z) = m_1(z)$.

B.5 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_1(z)$.

B.6 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_2(z)$.

B.7 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_2(z)$.

B.8 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_2(z)$.

B.9 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$.
The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_2(z)$.

B.10 Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_2(z)$.

C.1 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_1(z)$.

C.2 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.2$ and $m(z) = m_1(z)$.

C.3 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S.
Lengths were computed with $\rho = 0.4$ and $m(z) = m_1(z)$.

C.4 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.6$ and $m(z) = m_1(z)$.

C.5 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.8$ and $m(z) = m_1(z)$.

C.6 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_2(z)$.

C.7 Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S.
Lengths were computed with p = 0.2 and 771(2) = 771.2(2) 241 C.8 Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I = 0,1,..., 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths-were computed with p = 0.4 and 771(2) = m ( 2 ) 242 2 C.9 Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I — 0,1,..., 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p — 0.6 and 771(2) = 7712(2) 243 C.10 Top row: Average length of the standard confidence intervals for the linear effect Pi in model (8.1) as a function of I = 0,1,..., 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for Pi. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with p = 0.8 and 771(2) = 7712(2) 244 xxm Acknowledgements A huge thank you to my thesis supervisor, Dr. Nancy Heckman, for being such an inspirational mentor to me - amazingly generous with her time, ideas, advice and NSERC funding, immensely passionate about research and teaching, wonderfully encouraging and supportive. A sincere thank you to Dr. John Petkau, Department of Statistics, University of British Columbia and to Professor Sverre Vedal and Dr. 
Eduardo Hernandez-Garduño, formerly of the Respiratory Division, Department of Medicine, Faculty of Medicine, University of British Columbia, for kindly providing me with the Mexico City air pollution data. A heartfelt thank you to Dr. John Petkau for generously funding me to analyze these data, for providing me with valuable feedback upon reading the manuscript of this thesis and for his excellent advice over the years.

A sincere thank you to Dr. Lang Wu, Dr. Jim Zidek, Dr. Michael Brauer and Dr. Jean Opsomer for their careful reading of the thesis manuscript and valuable comments and suggestions.

Thank you to the Department of Statistics and the University of British Columbia for providing me with funding that enabled me to pursue my degree. I would like to thank all faculty, staff and graduate students in the Department of Statistics, University of British Columbia, for making my stay there such an enriching experience.

I would like to thank my family in Romania for believing in me and for loving me unconditionally. I would also like to thank my dear friends in Canada and Romania, whose affection and humour helped me stay grounded. Special thanks to Viviane DiazLima, Lisa Kuramoto and Raluca Balan for their unwavering support and for being my friends.

Finally, thank you to Jeffie, my partner in mischief and adventure, for loving me and our family in Romania beyond measure, and for making magical things happen all the time.

ISABELLA RODICA GHEMENT
The University of British Columbia
August 2005

Jeffie, my ever-loving, ever-caring, ever-there knight in shining armour, and our loving family in Romania.

Chapter 1
Introduction

Semiparametric regression models combine the ease of interpretation of parametric regression models with the modelling flexibility of nonparametric regression models. They generalize parametric regression models by allowing one or more covariate effects to be non-linear.
Just as in nonparametric regression models, the non-linear covariate effects are assumed to change gradually and are captured via smooth, unknown functions whose particular shapes will be revealed by the data.

In this thesis, we are interested in semiparametric regression models for which (i) the response variable is univariate and continuous, (ii) one of the covariate effects is allowed to be smooth and non-linear, and (iii) the remaining covariate effects are assumed to be linear. Given the data $(Y_i, \boldsymbol{X}_i, Z_i)$, $i = 1, \ldots, n$, such models can be specified as:

$$Y_i = \boldsymbol{X}_i^T \boldsymbol{\beta} + m(Z_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{1.1}$$

where $\boldsymbol{\beta}$ is a vector of linear effects, $m$ is a smooth, non-linear effect and the $\epsilon_i$'s are unobservable random errors with zero mean. Model (1.1) is typically referred to as a partially linear regression model.

In many applications, the smooth, non-linear effect $m$ in model (1.1) is not of interest in itself but is included in the model because of its potential for confounding the linear effects $\boldsymbol{\beta}$, which are of main interest. The nature of this confounding is often too complex to specify parametrically. A non-parametric specification of this confounding effect is therefore preferred to avoid modelling biases.

The practical choice of the degree of smoothness of the non-linear confounder effect is a delicate issue in these types of applications. This choice should yield accurate point estimators of the linear effects of interest. The choice may be highly sensitive to the correlations between the linear and non-linear variables in the model.

The potential correlation amongst model errors is a qualitatively different source of confounding on the linear effects of interest in a partially linear model. In practice, we need to decide carefully whether we should account for this correlation when assessing the significance and magnitude of the linear effects of interest.
If one decides to ignore the error correlation, one should try to understand the impact of this decision on the validity of the ensuing inferences.

The issues of error correlation, non-linear confounding, and correlation between the linear and non-linear terms in a partially linear regression model are intimately connected. Their interplay needs to be judiciously considered when selecting the degree of smoothness of the estimated non-linear effect. Even when this selection yields accurate estimators of the linear effects of interest in the model, one needs to assess whether it also yields valid confidence intervals and testing procedures for assessing the magnitude and significance of these effects.

1.1 Literature Review

In this section, we provide a survey of some of the most important results in the literature of partially linear regression models of the form (1.1). We treat separately the case when the model errors $\epsilon_i$, $i = 1, \ldots, n$, are uncorrelated and the case when they are correlated. Note that, in (1.1), we observe only one sequence $Y_1, \ldots, Y_n$. In classical longitudinal studies we would observe multiple sequences. Even though in this thesis we are not interested in partially linear models for analyzing data collected in longitudinal studies, we do mention some results which are significant in the literature of these models.

1.1.1 Partially Linear Models with Uncorrelated Errors

The partially linear regression model (1.1) has been investigated extensively under the assumption of independent, identically distributed errors. In this section, we provide a brief overview of some of the most relevant results concerning inferences on $\boldsymbol{\beta}$, the parametric component of the model, that are available in the literature. These results have a common theme: seeing if $\boldsymbol{\beta}$ is estimated at the 'usual' parametric rate of $1/n$ - the rate that would be achieved if $m$ were known.
As Robinson (1988) points out, consistent estimators of $\boldsymbol{\beta}$ that do not have the 'usual' parametric rate of convergence have zero efficiency relative to estimators that have this rate.

Engle et al. (1983) and Wahba (1984) proposed estimating $\boldsymbol{\beta}$ and $m$ simultaneously by minimizing a penalized least squares criterion with penalty based on the $s$th derivative of $m$, with $s > 2$. The performance of the penalized least squares estimator of $\boldsymbol{\beta}$ depends on the correlation between the linear and non-linear variables in the model. Heckman (1986) established the $\sqrt{n}$-consistency of this estimator assuming that the linear and non-linear variables are uncorrelated. Rice (1986) showed that, if the linear and non-linear variables are correlated, the estimator becomes $\sqrt{n}$-inconsistent, unless one 'undersmooths' the estimated $m$. 'Undersmoothing' refers to the phenomenon of estimating $m$ at a slower rate than the 'usual' nonparametric rate of $n^{-4/5}$ - the rate that would be achieved if $\boldsymbol{\beta}$ were known. Rice showed that if one did not 'undersmooth', the squared bias of the estimated linear effects would dominate their variance. The author remarked that this would have disastrous consequences on the inferences carried out on the linear effects. For instance, conventional confidence intervals for these effects would be misleading. Rice called into question the utility of traditional methods such as cross-validation for choosing the degree of smoothness of the estimated non-linear effect when $\sqrt{n}$-consistency of the estimated linear effects is desired, and rightly so. These methods are devised for 'smoothing', not 'undersmoothing', the estimated non-linear effect.

Green, Jennison and Seheult (1985) proposed estimating $\boldsymbol{\beta}$ and $m$ by minimizing a penalized least squares criterion with penalty based on a discretization of the second derivative of $m$. They termed their estimation method least squares smoothing and showed that it yields estimators that solve a system of backfitting equations.
These equations combine a smoothing step for estimating $m$, carried out using a discretized version of smoothing splines, with a least squares regression step for estimating $\boldsymbol{\beta}$. Green, Jennison and Seheult generalized their least squares smoothing estimators by allowing the smoothing step in the backfitting equations to be carried out using any smoothing method. These generalized least squares smoothing estimators are referred to in the literature as the Green, Jennison and Seheult estimators.

Speckman (1988) derived the asymptotic bias and variance of the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$, using locally constant regression with general kernel weights in the smoothing step. Speckman's findings paralleled those of Rice: in the presence of correlation between the linear and non-linear variables in the model, the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$ is $\sqrt{n}$-consistent only if one 'undersmooths' the estimated $m$. Speckman provided a heuristic argument for why the generalized cross-validation method cannot be used to choose the degree of smoothness of the estimated $m$ in practice when $\sqrt{n}$-consistency of the Green, Jennison and Seheult estimator of $\boldsymbol{\beta}$ is desired.

Neither Rice nor Speckman proposed methods for 'undersmoothing' the estimated $m$. However, Speckman (1988) introduced a partial-residual flavoured estimator of $\boldsymbol{\beta}$ that does not require 'undersmoothing'. He argued that traditional methods such as generalized cross-validation could be used to select the degree of smoothness of the estimated $m$. Speckman did not address the important issue of whether such data-driven methods would produce amounts of smoothing that yield $\sqrt{n}$-consistent estimators of the linear effects of interest. Sy (1999) established that data-driven methods such as cross-validation and generalized cross-validation do indeed yield $\sqrt{n}$-consistent estimators of these effects, thus paving the way for carrying out valid inferences on these effects, at least for large sample sizes.
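The two estimators of the linear effects discussed above can be illustrated with a minimal Python sketch. It assumes the commonly quoted matrix forms of the Green, Jennison and Seheult estimator, $(X^T(I-S)X)^{-1}X^T(I-S)Y$, and of the Speckman partial-residual estimator, obtained by regressing $(I-S)Y$ on $(I-S)X$, where $S$ is the smoother matrix. The Epanechnikov kernel, the bandwidth 0.1, the simulated data and all function names are assumptions made purely for illustration and do not come from the works reviewed here.

```python
import numpy as np

def kernel_smoother_matrix(z, h):
    """Locally constant (Nadaraya-Watson) smoother matrix; Epanechnikov weights (assumed)."""
    u = (z[:, None] - z[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1

def gjs_beta(y, X, S):
    """Green-Jennison-Seheult (backfitting) form: (X'(I-S)X)^{-1} X'(I-S)y."""
    R = np.eye(len(y)) - S
    return np.linalg.solve(X.T @ R @ X, X.T @ R @ y)

def speckman_beta(y, X, S):
    """Speckman partial-residual form: least squares of (I-S)y on (I-S)X."""
    R = np.eye(len(y)) - S
    X_res, y_res = R @ X, R @ y
    return np.linalg.solve(X_res.T @ X_res, X_res.T @ y_res)

# Hypothetical simulated data with uncorrelated errors.
rng = np.random.default_rng(0)
n = 400
z = np.arange(1, n + 1) / (n + 1)       # equally spaced design points
x = z + rng.normal(size=n)              # linear covariate correlated with z
m = np.sin(2 * np.pi * z)               # smooth non-linear effect
beta_true = 2.0
y = beta_true * x + m + rng.normal(scale=0.5, size=n)

# No intercept column: this smoother reproduces constants, so (I - S)1 = 0.
X = x[:, None]
S = kernel_smoother_matrix(z, h=0.1)
beta_gjs = gjs_beta(y, X, S)
beta_speckman = speckman_beta(y, X, S)
print(beta_gjs, beta_speckman)
```

Both functions take the same smoother matrix, so the sketch isolates the only difference between the two forms: the partial-residual estimator applies $(I-S)$ to both the response and the covariates before the least squares step.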
Opsomer and Ruppert (1999) proposed estimating $\boldsymbol{\beta}$ and $m$ via the Green, Jennison and Seheult estimators, using locally linear regression with general kernel weights in the smoothing step. They showed that, unless one 'undersmooths' the estimated $m$, their estimator of $\boldsymbol{\beta}$ may not achieve $\sqrt{n}$-consistency. They then suggested how to use the data to choose the appropriate degree of smoothness for accurate estimation of $\boldsymbol{c}^T\boldsymbol{\beta}$, with $\boldsymbol{c}$ known. Opsomer and Ruppert's approach for choosing the right degree of smoothness, referred to as the Empirical Bias Bandwidth Selection (EBBS) method, will be discussed in more detail in Chapter 6. The authors conjectured that EBBS would produce a $\sqrt{n}$-consistent estimator of $\boldsymbol{c}^T\boldsymbol{\beta}$.

1.1.2 Partially Linear Models with Correlated Errors

The independence assumption for the errors associated with a partially linear regression model is not always appropriate in applications. For instance, when the data have been collected sequentially over time, it is likely that present response values will be correlated with past response values. Even in the presence of error correlation, it is desirable to obtain $\sqrt{n}$-consistent estimators for the linear effects in the model.

Engle et al. (1986) were amongst the first authors to consider a partially linear regression model with AR(1) errors. They noted that the correct error correlation structure can be used to transform this model into a model with serially uncorrelated errors, by quasi-differencing all of the data. They proposed estimating the linear effects $\boldsymbol{\beta}$ and the non-linear effect $m$ in the original model by applying the penalized least squares method proposed by Engle et al. (1983) and Wahba (1984) to the quasi-differenced data. Engle et al. (1986) proved that their estimator of $\boldsymbol{\beta}$ is consistent when one estimates $m$ at the 'usual' nonparametric rate of $n^{-4/5}$, but did not show that it is $\sqrt{n}$-consistent.
They recommended choosing both the 'right' degree of smoothness of the estimated $m$ and the autoregressive parameter by minimizing a generalized cross-validation criterion constructed from the quasi-differenced data. This data-driven choice of smoothing may not however yield an accurate estimator of $\boldsymbol{\beta}$, as it is geared at accurate estimation of $m$.

Schick (1996, 1999) considered partially linear regression models with AR(1) errors and ARMA($p$,$q$) errors, respectively, where $p, q > 1$. He characterized and constructed efficient estimators for the parametric component $\boldsymbol{\beta}$ of these models, assuming an appropriate theoretical choice of the degree of smoothness for the estimated $m$. He did not however indicate how one might make this choice in practice.

Several authors investigated partially linear models with $\alpha$-mixing errors. Before reviewing their respective contributions, we provide a definition for the $\alpha$-mixing concept. For reference, see Ibragimov and Linnik (1971).

Definition 1.1.1 A sequence of random variables $\{\epsilon_t, t = 0, \pm 1, \ldots\}$ is said to be $\alpha$-mixing if

$$\alpha(k) = \sup_{n} \; \sup_{A \in \mathcal{F}_{-\infty}^{n},\, B \in \mathcal{F}_{n+k}^{\infty}} |P(A \cap B) - P(A)P(B)| \to 0 \tag{1.2}$$

as $k \to \infty$, where $\mathcal{F}_{-\infty}^{n}$ and $\mathcal{F}_{n+k}^{\infty}$ are two $\sigma$-fields generated by $\{\epsilon_t, t \le n\}$ and $\{\epsilon_t, t \ge n + k\}$, respectively.

The mixing coefficient $\alpha(k)$ in (1.2) measures the amount of dependence between events involving variables separated by at least $k$ lags. Note that for stationary sequences the supremum over $n$ in (1.2) goes away.

Aneiros Perez and Quintela del Rio (2001a) considered a partially linear model with $\alpha$-mixing, stationary errors. They proposed estimating $\boldsymbol{\beta}$ and $m$ via modifications of the Speckman estimators. Their modifications account for the error correlation structure, assumed to be fully known. The smoothing step involved in estimating $\boldsymbol{\beta}$ and $m$ is based on locally constant regression with Gasser-Müller weights (Gasser and Müller, 1984), adjusted for boundary effects.
The authors derived the order of the conditional asymptotic bias and variance of the modified Speckman estimator of $\boldsymbol{\beta}$. They found that the conditional asymptotic bias of their estimator of $\boldsymbol{\beta}$ is negligible with respect to its conditional asymptotic variance, shown to have the 'usual' parametric rate of convergence of $1/n$. They concluded that they do not need to 'undersmooth' their estimator of $m$ in order to obtain a $\sqrt{n}$-consistent estimator of $\boldsymbol{\beta}$. The fact that the modified Speckman estimator of $\boldsymbol{\beta}$ does not require 'undersmoothing' in the presence of error correlation is not surprising. The estimator inherits this property from the usual Speckman estimator.

Aneiros Perez and Quintela del Rio (2001b) proposed a data-driven modified cross-validation method for choosing the degree of smoothness required for accurate estimation of the regression function $r(\boldsymbol{X}_i, Z_i) = \boldsymbol{X}_i^T\boldsymbol{\beta} + m(Z_i)$ via modified Speckman estimators. It is not clear whether such a method would be suitable for accurate estimation of $\boldsymbol{\beta}$ itself.

To address the problem of choosing the degree of smoothness for accurate estimation of $\boldsymbol{\beta}$ via the modified Speckman estimator, Aneiros Perez and Quintela del Rio (2002) developed an asymptotic plug-in method. Their method relies on the more restrictive assumption that the model errors are realizations of an autoregressive process of finite, known order.

You and Chen (2004) considered a partially linear model with $\alpha$-mixing, possibly non-stationary errors. They estimated $\boldsymbol{\beta}$ and $m$ using the usual Speckman estimators, which do not account for error correlation. They then applied a block external bootstrap approach to approximate the distribution of the usual Speckman estimator of $\boldsymbol{\beta}$ and provide a consistent estimator of its covariance matrix. Using this information, they constructed a large-sample confidence interval procedure for estimating $\boldsymbol{\beta}$.
Based on a simulation study, the authors note that the block size seems to have a strong influence on the finite-sample performance of their procedure. However, they do not indicate how one might choose the block size in practice. In the simulation study, the smoothing parameter of the usual Speckman estimator of $\boldsymbol{\beta}$ was selected via cross-validation, modified for correlated errors. This method is appropriate for accurate estimation of $m$ but may not be suitable for accurate estimation of $\boldsymbol{\beta}$.

You, Zhou and Chen (2005) considered a partially linear model with errors assumed to follow a moving average process of infinite order. They proposed a jackknife estimator for $\boldsymbol{\beta}$, which they obtained from a usual Speckman estimator. They showed their estimator to be asymptotically equivalent to the usual Speckman estimator, and proposed a method for estimating its asymptotic variance. They also constructed confidence intervals and tests of hypotheses for $\boldsymbol{\beta}$ based on the jackknife estimator and its estimated variance. In their simulation study, these authors find that confidence interval estimation based on their jackknife estimator has better finite-sample coverage properties than that based on the usual Speckman estimator, even though the latter uses the information on the error structure, while the former does not. In this study, the smoothing was performed with different nearest neighbor smoothing parameter values and the results were shown to be insensitive to the choice of this parameter. This may not always be the case for contexts that are different from that considered by these authors.

As we already mentioned, partially linear regression models with correlated errors can be used for analyzing longitudinal data, that is, data obtained by measuring each of several study units on multiple occasions over time. Longitudinal data are naturally correlated, as the measurements taken on the same study unit are correlated.
In order to estimate the linear effects $\boldsymbol{\beta}$ and the non-linear effect $m$ in such models, Moyeed and Diggle (1994) modified the Green, Jennison and Seheult and the Speckman estimators to account for the longitudinal data structure and for the error correlation, assumed to be known. Their smoothing step used local constant Nadaraya-Watson weights (Nadaraya, 1964 and Watson, 1964). They derived the order of the conditional asymptotic bias and variance of their estimators of $\boldsymbol{\beta}$, obtaining asymptotic constants only for the variance of these estimators. Their results are valid under the assumption that the number of study units goes to infinity and the number of occasions on which each study unit is being measured is kept constant. Note that Moyeed and Diggle did not treat $m$ as a nuisance. To choose the degree of smoothness of the estimated $m$, these authors used a leave-one-subject-out cross-validation method. This method is geared towards accurate estimation of $m$ and may not be appropriate for accurate estimation of $\boldsymbol{\beta}$.

None of the authors considered in this section looked simultaneously at how to choose the right degree of smoothing for accurate estimation of the linear effects and how to construct valid standard errors for the estimated linear effects. To do both requires accounting for the correlation structure of the model errors.

1.2 Thesis Objectives

Throughout this thesis, we will consider only partially linear models of the form (1.1) in which the non-linear effect $m$ is treated as a nuisance. In contrast to the 'usual' view in regression models, we will think of the linear covariates as being random but consider the $Z_i$'s to be fixed. The reason for this is that we are mainly interested in applications for which the $Z_i$'s are consecutive time points (e.g. days, weeks, years). The results in this thesis can be easily modified to account for the case when the $Z_i$'s are random instead of fixed.
However, some expressions need to be re-defined to account for the randomness of the $Z_i$'s. For instance, see the end of Sections 4.1 and 4.2.

In this thesis, we will allow the linear covariates to be mutually correlated and assume they are related to the non-linear covariates via a non-parametric regression relationship. Most importantly, we will assume that the model errors are serially correlated. Within this framework, we will concentrate on developing formal methods for carrying out valid inferences on those linear effects in the model which are of main interest. This entails the following:

1. defining sensible estimators for the linear effects in the model, as well as for the nuisance non-linear effect;
2. deriving the asymptotic bias and variance of the proposed estimators of the linear effects;
3. developing methods for choosing the right degree of smoothness of the estimated non-linear effect in order to accurately estimate the linear effects of interest;
4. developing methods for estimating the correlation structure of the model errors for inference and smoothing;
5. developing methods for assessing the magnitude and statistical significance of the linear effects of interest;
6. investigating the performance of the proposed inferential methods via Monte Carlo simulation studies;
7. using the inferential methods developed in this thesis to answer specific questions related to the impact of air pollution on mortality in Mexico City during 1994-1996, after adjusting for weather patterns and temporal trends.

We conclude this chapter with an overview of the thesis which indicates where and how the above objectives are addressed.

In Chapter 2, we provide a formal definition of the partially linear model with correlated errors of interest in this thesis. We also introduce the notation and assumptions required for establishing the theoretical results in subsequent chapters.
In Chapter 3, we define the following types of estimators for $\boldsymbol{\beta}$ and $m$: (i) local linear backfitting estimators, (ii) modified local linear backfitting estimators, and (iii) estimated modified local linear backfitting estimators.

In Chapter 4, we derive asymptotic approximations for the exact conditional bias and variance of the local linear backfitting estimator of $\boldsymbol{\beta}$. Based on these results we conclude that, in general, the local linear backfitting estimator of $\boldsymbol{\beta}$ is not $\sqrt{n}$-consistent. We argue that the estimator can achieve $\sqrt{n}$-consistency provided we 'undersmooth' the corresponding local linear backfitting estimator of $m$.

In Chapter 5, we replicate the results in Chapter 4 for the modified local linear backfitting estimator of $\boldsymbol{\beta}$. We also provide sufficient conditions under which the estimated modified local linear backfitting estimator of $\boldsymbol{\beta}$ is asymptotically 'close' to its modified counterpart.

In Chapter 6, we develop three data-driven methods for choosing the degree of smoothness of the backfitting estimators of $m$ defined in this thesis in order to accurately estimate $\boldsymbol{\beta}$. Two of these methods are modifications of the Empirical Bias Bandwidth Selection (EBBS) method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. All methods account for error correlation. We suspect that these methods 'undersmooth' the estimated $m$ because they attempt to estimate the amount of smoothing that is optimal for estimating $\boldsymbol{\beta}$, not for estimating $m$. Our theoretical results suggest that, in general, the optimal amount of smoothing for estimating $\boldsymbol{\beta}$ is smaller than the optimal amount of smoothing for estimating $m$. In Chapter 6, we also introduce methods for estimating the correlation structure of the model errors needed to choose the amount of smoothing of the backfitting estimators of $\boldsymbol{\beta}$ and to carry out inferences on $\boldsymbol{\beta}$.
These methods rely on a modified cross-validation criterion similar to that proposed by Aneiros Perez and Quintela del Rio (2001b).

In Chapter 7, we develop three kinds of confidence intervals and tests of hypotheses for assessing the magnitude and significance of a linear combination $\boldsymbol{c}^T\boldsymbol{\beta}$ of the linear effects in the model: standard, bias-adjusted and standard-error adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature of partially linear models.

In Chapter 8, we report the results of a Monte Carlo simulation study. In this study, we investigated the finite sample properties of the usual and estimated modified local linear backfitting estimators of $\boldsymbol{c}^T\boldsymbol{\beta}$ against those of the usual Speckman estimator. We chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6. By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors and for boundary effects. The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the quality of the inferences based on the usual local linear backfitting estimator was superior, and that this estimator should be computed with one of our modifications of EBBS or a non-asymptotic plug-in choice of smoothing. Even though the quality of the inferences based on the usual Speckman estimator was reasonable for most simulation settings, it was not as good as that of the inferences based on the usual local linear backfitting estimator. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings.
In Chapter 9, we use the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggests that there is no conclusive proof that PM10 had a significant short-term effect on log mortality.

In Chapter 10, we summarize the main contributions of this thesis and suggest possible extensions to our work.

Chapter 2
A Partially Linear Model with Correlated Errors

In Section 2.1 of this chapter, we provide a formal definition of the partially linear model of interest in this thesis. In Section 2.2, we introduce assumptions that we use to study the asymptotic behavior of our proposed estimators. In Section 2.3, we introduce some useful notation. In Section 2.4, we give several linear algebra definitions and results which will be utilized throughout this thesis. The chapter concludes with an Appendix which contains a useful theoretical result.

2.1 The Model

Given the data $(Y_i, X_{ij}, Z_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, the specific form of the partially linear model considered in this thesis is:

$$\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{m} + \boldsymbol{\epsilon}, \tag{2.1}$$

where $\boldsymbol{Y} = (Y_1, \ldots, Y_n)^T$ is the vector of responses, $\boldsymbol{X}$ is the design matrix for the parametric part of the model (to be defined shortly), $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the vector of unknown linear effects, $\boldsymbol{m} = (m(Z_1), \ldots, m(Z_n))^T$ and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^T$ is the vector of model errors. Here,

$$\boldsymbol{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix} \tag{2.2}$$

where $X_{i1}, \ldots, X_{ip}$ are measurements on $p$ variables $X_1, \ldots, X_p$, the $Z_i$'s are fixed design points on $[0,1]$ following a design density $f(\cdot)$ (see condition (A3) in Section 2.2 for the exact definition), and $m(\cdot)$ is a real-valued, unknown, smooth function defined on $[0,1]$.

Note that, unless we impose a restriction on $m(\cdot)$, model (2.1) is unidentifiable due to the presence of the intercept $\beta_0$ in the model.
For instance, $\beta_0 + m(\cdot) = 0 + (m(\cdot) + \beta_0)$. To ensure identifiability, we assume that $m(\cdot)$ satisfies the integral restriction:

$$\int_0^1 m(z) f(z)\, dz = 0. \tag{2.3}$$

In practice, we replace (2.3) by the summation restriction:

$$\boldsymbol{1}^T \boldsymbol{m} = 0, \tag{2.4}$$

where the symbol $\boldsymbol{1}$ denotes an $n \times 1$ vector of 1's.

One could think of the smooth function $m(\cdot)$ as being a transformation of the fixed design points $Z_i$, $i = 1, \ldots, n$, that ensures that the partially linear model (2.1) is an adequate description of the variability in the $Y_i$'s. Alternatively, one could think of the function $m(\cdot)$ as representing the confounding effect of a random variable having density $f(\cdot)$ on the linear effects $\beta_1, \ldots, \beta_p$.

We assume that the errors $\epsilon_i$ in model (2.1) are such that $E(\epsilon_i) = 0$, $Var(\epsilon_i) = \sigma_\epsilon^2$ and $Corr(\epsilon_i, \epsilon_j) = \Psi_{ij}$ for $i \neq j$, where $\sigma_\epsilon > 0$ and $\boldsymbol{\Psi} = (\Psi_{ij})$ is the $n \times n$ error correlation matrix. Note that $\boldsymbol{\Psi}$ is not necessarily equal to the $n \times n$ identity matrix $\boldsymbol{I}$. In practice, both the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\boldsymbol{\Psi}$ are typically unknown and need to be estimated from the data.

An alternative formulation for the partially linear model (2.1) can be obtained by removing the constraint (2.3), setting $\boldsymbol{m}^* = \beta_0 \boldsymbol{1} + \boldsymbol{m}$ and re-writing the model as:

$$\boldsymbol{Y} = \boldsymbol{X}^*\boldsymbol{\beta}^* + \boldsymbol{m}^* + \boldsymbol{\epsilon}, \tag{2.5}$$

where $\boldsymbol{X}^*$ is an $n \times p$ matrix defined as:

$$\boldsymbol{X}^* = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix} \tag{2.6}$$

and $\boldsymbol{\beta}^* = (\beta_1, \ldots, \beta_p)^T$. The model formulation in (2.5) is frequently encountered in the partially linear model literature and does not require that we impose any identifiability conditions on the function $m^*(z) = \beta_0 + m(z)$, $z \in [0,1]$. Indeed, the absence of an intercept in model (2.5) ensures that $m^*(\cdot)$ is identifiable. In this thesis, however, we prefer to use the formulation in (2.1), as it makes it easier to understand that model (2.1) is a generalization of a linear regression model and a particular case of an additive model, which typically do contain an intercept.
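The model formulations of Section 2.1 can be illustrated with a short Python sketch. It simulates data from model (2.1), enforces the summation restriction (2.4) by centring the non-linear effect, shows why the intercept and an uncentred non-linear effect cannot be separated, and checks that formulation (2.5) produces the same responses. All numerical values, the shape of the non-linear effect and the AR(1)-type correlation matrix with entries $\rho^{|i-j|}$ are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2

z = np.arange(1, n + 1) / (n + 1)            # fixed design points on [0, 1]
X_star = rng.normal(size=(n, p))             # hypothetical linear covariates, as in (2.6)
X = np.column_stack([np.ones(n), X_star])    # design matrix with intercept, as in (2.2)
beta = np.array([1.0, 0.5, -0.3])            # (beta_0, beta_1, beta_2), assumed values

m_raw = np.sin(2 * np.pi * z) + z**2         # assumed smooth non-linear effect
m = m_raw - m_raw.mean()                     # centring enforces 1^T m = 0, as in (2.4)

# Correlated errors: an assumed AR(1)-type correlation matrix Psi.
rho, sigma_eps = 0.6, 0.3
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Psi = rho ** lags
eps = sigma_eps * (np.linalg.cholesky(Psi) @ rng.normal(size=n))

y = X @ beta + m + eps                       # model (2.1)

# Without a restriction on m, a constant can move between beta_0 and m
# without changing the mean of Y, so the two cannot be told apart.
c = 0.7
beta_shift = beta + np.array([c, 0.0, 0.0])
m_shift = m - c
same_mean = np.allclose(X @ beta + m, X @ beta_shift + m_shift)

# Alternative formulation (2.5): absorb the intercept into m*.
m_star = beta[0] + m
y_alt = X_star @ beta[1:] + m_star + eps
print(same_mean, np.allclose(y, y_alt))
```

Generating the errors through the Cholesky factor of $\boldsymbol{\Psi}$ keeps the sketch tied to the correlation-matrix notation of this section rather than to any particular time series recursion.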
2.2 Assumptions

The asymptotic results derived in Chapters 4 and 5 allow the linear variables in model (2.1) to be correlated with the non-linear variable via the following condition.

(A0) The covariate values $X_{ij}$ and the non-random design points $Z_i$ are related via the nonparametric regression model:

$$X_{ij} = g_j(Z_i) + \eta_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p, \tag{2.7}$$

where
(i) the $g_j(\cdot)$'s are smooth, unknown functions having three continuous derivatives;
(ii) the $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, are independent, identically distributed unobserved random vectors with mean zero and variance-covariance matrix $\boldsymbol{\Sigma}$.

We impose two different sets of assumptions on the errors associated with model (2.1) for studying the asymptotic behaviour of two different estimators of $\boldsymbol{\beta}$.

In Section 3.1.1 of Chapter 3 we define the so-called local linear backfitting estimator of $\boldsymbol{\beta}$. The definition of this estimator does not account for the correlation structure of the model errors. In Chapter 4, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition.

(A1)
(i) The model errors $\epsilon_i$, $i = 1, \ldots, n$, represent $n$ consecutive realizations from a general covariance-stationary process $\{\epsilon_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and correlation coefficients:

$$\rho_k = \frac{E(\epsilon_t \epsilon_{t-k})}{\sigma_\epsilon^2} = \frac{E(\epsilon_s \epsilon_{s+k})}{\sigma_\epsilon^2}, \quad k = 1, 2, 3, \ldots, \tag{2.8}$$

where $t, s = 0, \pm 1, \pm 2, \ldots$.
(ii) The error correlation matrix $\boldsymbol{\Psi}$ is assumed to be symmetric, positive-definite and to have a bounded spectral norm, that is $\|\boldsymbol{\Psi}\|_S = O(1)$ as $n \to \infty$. (For a definition of the spectral norm of a matrix see Section 2.4.)
(iii) Let $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, be as in (A0)-(ii).
Then there exists a $(p+1) \times (p+1)$ matrix $\Phi^{(0)}$ such that the error correlation matrix $\Psi$ satisfies:

\[ \frac{1}{n+1}\, \eta^T \Psi \eta = \Phi^{(0)} + o_P(1) \tag{2.9} \]

as $n \to \infty$, where

\[ \eta = \begin{pmatrix} 0 & \eta_{11} & \cdots & \eta_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \eta_{n1} & \cdots & \eta_{np} \end{pmatrix}. \tag{2.10} \]

(iv) $\epsilon_j$ is independent of $(\eta_{i1}, \dots, \eta_{ip})^T$ for any $i, j = 1, \dots, n$.

In Section 3.1.2 of Chapter 3 we define the so-called modified local linear backfitting estimator of the vector of linear effects $\beta$ in model (2.1). The definition of this estimator assumes full knowledge of the correlation matrix of the model errors. In Chapter 5, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition:

(A2) (i) The $\epsilon_i$'s represent $n$ consecutive realizations from a covariance-stationary autoregressive process of finite order $R$ having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and satisfying:

\[ \epsilon_t = \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \cdots + \phi_R \epsilon_{t-R} + u_t, \quad t = 0, \pm 1, \pm 2, \dots, \tag{2.11} \]

with $\{u_t\}$, $t = 0, \pm 1, \pm 2, \dots$, being independent, identically distributed random variables having mean 0 and finite, non-zero variance $\sigma_u^2$.
(ii) $\epsilon_j$ is independent of $(\eta_{i1}, \dots, \eta_{ip})^T$ for any $i, j = 1, \dots, n$, where $(\eta_{i1}, \dots, \eta_{ip})^T$, $i = 1, \dots, n$, are as in (A0)-(ii).

According to Comments 2.2.1 - 2.2.3 below, if the errors satisfy condition (A2), they also satisfy condition (A1).

Comment 2.2.1 If the errors $\epsilon_i$, $i = 1, \dots, n$, satisfy condition (A2), then one can easily see that they also satisfy condition (A1)-(i). Moreover, one can show that their correlation matrix $\Psi = (\Psi_{ij})$ is given by $\Psi_{ii} = 1$, $\Psi_{ij} = \rho(|i-j|) = \rho_{|i-j|}$, $i \neq j$, where $\rho$ is a correlation function and the $\rho_k$'s satisfy the Yule-Walker equations:

\[ \rho_k = \phi_1 \rho_{k-1} + \cdots + \phi_R \rho_{k-R}, \quad \text{for } k > 0. \]

The general solution of these difference equations is:

\[ \rho_k = \psi_1 \lambda_1^{k} + \psi_2 \lambda_2^{k} + \cdots + \psi_R \lambda_R^{k}, \quad \text{for } k \geq 0, \]

where the $\lambda_i$, $i = 1, \dots, R$, are the roots of the polynomial equation:

\[ z^R - \phi_1 z^{R-1} - \cdots - \phi_R = 0. \]

Initial conditions for determining $\psi_1, \dots, \psi_R$ can be obtained by using $\rho_0 = 1$ together with the first $R - 1$ Yule-Walker equations. For more details, see Chatfield (1989, page 38).

Comment 2.2.2 If the errors $\epsilon_i$, $i = 1, \dots, n$, satisfy condition (A2), then their correlation matrix $\Psi = (\Psi_{ij})$ satisfies condition (A1)-(ii) by Comment 2.2.1 and result (5.34) of Lemma 5.7.2 (Appendix, Chapter 5). In other words, $\Psi$ is symmetric, positive-definite and has finite spectral norm.

Comment 2.2.3 If the errors $\epsilon_i$, $i = 1, \dots, n$, associated with model (2.1) satisfy condition (A2) then, by Lemma 2.5.1 in the Appendix of this chapter, $\Psi$ satisfies (2.9) of condition (A1)-(iii), with $\Phi^{(0)} = \Sigma^{(0)}$ and $\Sigma^{(0)}$ defined as in (2.15).

Comment 2.2.4 Due to its parametric nature, assumption (A2) allows us to find an explicit expression for $\Psi^{-1}$, the inverse of the error correlation matrix, making the derivation of the asymptotic results concerning the modified local linear estimator of $\beta$ easier. We have not been able to modify our proof of these results to handle the more general assumption (A1), since finding an explicit expression for $\Psi^{-1}$ under (A1) may not be possible.

The asymptotic results derived in Chapters 4 and 5 assume $h$, the half-width of the window of smoothing involved in the definition of the local linear backfitting estimator and the modified local linear estimator of $\beta$, to be deterministic and to satisfy

\[ h \to 0 \tag{2.12} \]

and

\[ nh^3 \to \infty \tag{2.13} \]

as $n \to \infty$. These asymptotic results also rely on the conditions below.

(A3) The $Z_i$'s are non-random and follow a regular design, i.e. there exists a continuous strictly positive density $f(\cdot)$ on $[0,1]$ with:

\[ \int_0^{Z_i} f(z)\,dz = \frac{i}{n+1}, \quad i = 1, \dots, n. \]

Moreover, $f(\cdot)$ admits two continuous derivatives.

(A4) $m(\cdot)$ is a smooth function with 3 continuous derivatives.

(A5) $K(\cdot)$, the kernel function used in (3.7) and (3.8), is a probability density function symmetric about 0 and Lipschitz continuous, with compact support $[-1,1]$.
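Comments 2.2.1 and 2.2.2 can be illustrated numerically. The following is a minimal sketch, assuming NumPy; the function names `ar_autocorrelations` and `ar_correlation_matrix` and the AR coefficients used in the usage note are hypothetical choices made for illustration only. Rather than using the root-based general solution, the sketch solves the first $R$ Yule-Walker equations (with $\rho_0 = 1$) as a linear system and then applies the Yule-Walker recursion for higher lags.

```python
import numpy as np


def ar_autocorrelations(phi, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(R) process.

    phi holds the autoregressive coefficients (phi_1, ..., phi_R) of (2.11).
    """
    phi = np.asarray(phi, dtype=float)
    R = len(phi)
    # Yule-Walker equations for k = 1..R, written as A @ (rho_1..rho_R) = b,
    # using rho_0 = 1 and rho_{-k} = rho_k.
    A = np.eye(R)
    b = np.zeros(R)
    for k in range(1, R + 1):
        for j in range(1, R + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] += phi[j - 1]           # phi_j * rho_0
            else:
                A[k - 1, lag - 1] -= phi[j - 1]  # -phi_j * rho_{|k-j|}
    rho = np.empty(max(max_lag, R) + 1)
    rho[0] = 1.0
    rho[1:R + 1] = np.linalg.solve(A, b)
    # Yule-Walker recursion for lags beyond R.
    for k in range(R + 1, len(rho)):
        rho[k] = phi @ rho[k - 1::-1][:R]
    return rho[:max_lag + 1]


def ar_correlation_matrix(phi, n):
    """n x n matrix Psi with Psi_ij = rho_{|i-j|}, as in Comment 2.2.1."""
    rho = ar_autocorrelations(phi, n - 1)
    idx = np.arange(n)
    return rho[np.abs(idx[:, None] - idx[None, :])]
```

For example, `np.linalg.norm(ar_correlation_matrix([0.5, -0.3], n), 2)` can be evaluated for increasing `n` to see that the spectral norm of $\Psi$ stays bounded, in line with condition (A1)-(ii).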
2.3 Notation

Let $Z_i$, $i = 1, \dots, n$, be design points satisfying the design condition (A3) and let $g_1(\cdot), \dots, g_p(\cdot)$ be functions satisfying the smoothness assumptions in condition (A0)-(i). We define the $n \times (p+1)$ matrix $G$ as:

\[ G = \begin{pmatrix} 1 & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ 1 & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix} = \begin{pmatrix} g_0(Z_1) & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ g_0(Z_n) & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix}, \tag{2.14} \]

where $g_0(\cdot) \equiv 1$. Furthermore, let the $n \times (p+1)$ matrix $\eta$ be defined as in (2.10) (condition (A1)-(iii)). In light of condition (A0)-(ii), the transposed rows of $\eta$ are independent, identically distributed degenerate random vectors with mean zero and variance-covariance matrix $\Sigma^{(0)}$, where:

\[ \Sigma^{(0)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & \Sigma_{11} & \cdots & \Sigma_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \Sigma_{p1} & \cdots & \Sigma_{pp} \end{pmatrix}. \tag{2.15} \]

Using equation (2.7) of condition (A0) ($X_{ij} = g_j(Z_i) + \eta_{ij}$) together with the definitions of $G$ and $\eta$ in equations (2.14) and (2.10), we can express the design matrix $X$ in (2.2) as:

\[ X = G + \eta. \tag{2.16} \]

Let $K(\cdot)$ be a kernel function satisfying condition (A5); if $z \in [0,1]$ and $h \in [0, 1/2]$, define the following quantity:

\[ \nu_l(K, z, h) = \int_{-z/h}^{(1-z)/h} s^l K(s)\,ds, \quad l = 0, 1, 2, 3. \tag{2.17} \]

Note that, if $z \in [h, 1-h]$, i.e. $z$ is an 'interior' point of the interval $[0,1]$, then $\nu_l(K, z, h) = \int_{-1}^{1} s^l K(s)\,ds \equiv \nu_l(K)$, as $[-1,1] \subseteq [-z/h, (1-z)/h]$ and $K(\cdot)$ has compact support on $[-1,1]$ by condition (A5).

Now, for $g_0(\cdot), \dots, g_p(\cdot)$ as above and $f(\cdot)$ a design density, we let:

\[ \int_0^1 g(z) f(z)\,dz = \left( \int_0^1 g_0(z) f(z)\,dz, \dots, \int_0^1 g_p(z) f(z)\,dz \right)^T \tag{2.18} \]

and

\[ \int_0^1 g(z) m''(z) f(z)\,dz = \left( \int_0^1 g_0(z) m''(z) f(z)\,dz, \dots, \int_0^1 g_p(z) m''(z) f(z)\,dz \right)^T. \tag{2.19} \]

We also let $\int_0^1 g(z)^T f(z)\,dz = \left( \int_0^1 g(z) f(z)\,dz \right)^T$ and define the $(p+1) \times (p+1)$ matrix $V$ as:

\[ V = \Sigma^{(0)} + \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz, \tag{2.20} \]

with $\Sigma^{(0)}$ as in (2.15). We also define the $(p+1) \times 1$ vector $W$ as:

\[ W = \frac{\nu_2(K)}{2} \left[ \int_0^1 g(z) m''(z) f(z)\,dz - \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 m''(z) f(z)\,dz \right]. \tag{2.21} \]

Finally, define the $(p+1) \times (p+1)$ matrix $V_\Psi$ as:

\[ V_\Psi = (\,\cdots)\; \Sigma^{(0)} + (\,\cdots) \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz. \tag{2.22} \]

2.4 Linear Algebra - Useful Definitions and Results

In this section, we first provide an overview of the vector and matrix norm definitions and properties used throughout the remainder of this thesis.

Let $A = (A_{ij})$ be an arbitrary $m \times n$ matrix and $B = (B_{kl})$ be an $n \times q$ matrix, both having real elements. Also, let $v = (v_1, \dots, v_n)^T$ be an arbitrary $n \times 1$ vector with real elements. The spectral norm of the matrix $A$ is defined as:

\[ \|A\|_S = \max_{\|v\|_2 \neq 0} \frac{\|Av\|_2}{\|v\|_2}, \]

with $\|\cdot\|_2$ being the Euclidean norm of a vector, that is, $\|v\|_2^2 = \sum_{i=1}^n v_i^2$. Furthermore, the Frobenius norm of $A$ is defined as:

\[ \|A\|_F = \left( \sum_{i=1}^m \sum_{j=1}^n A_{ij}^2 \right)^{1/2}. \]

It is well-known that $\|A\|_S \leq \|A\|_F$. Clearly, if $A$ is a column vector (that is, $n = 1$), then $\|A\|_S = \|A\|_2$. In particular, if $A$ is a scalar (i.e., $m = n = 1$), then $\|A\|_S$ equals the absolute value of this scalar. It is also known that $\|A \cdot B\|_F \leq \|A\|_F \cdot \|B\|_F$.

We conclude this section by reviewing the definitions of random bilinear and quadratic forms and providing formulas for computing the expected value of such forms.

Suppose $A = (A_{ij})$ is an $n \times n$ matrix with real-valued elements, not necessarily symmetric. Similarly, suppose that $B = (B_{ij})$ is an $n \times m$ matrix with real-valued elements. Let $u$ be an arbitrary $n \times 1$ random vector having real-valued elements. Also, let $v$ be an arbitrary $m \times 1$ random vector with real-valued elements. A bilinear form in $u$ and $v$ with regulator matrix $B$ is defined as:

\[ B(u, v) = u^T B v = \sum_{i=1}^n \sum_{j=1}^m B_{ij} u_i v_j. \]

Note that $B(u, v)$ is random, and its expected value can be computed using the following formula:

\[ E(B(u, v)) = \mathrm{trace}\left( B\, Cov(u, v)^T \right) + E(u)^T B\, E(v). \tag{2.23} \]

In particular, a quadratic form in $u$ with regulator matrix $A$ is defined as:

\[ Q(u) = u^T A u = \sum_{i=1}^n \sum_{j=1}^n A_{ij} u_i u_j, \]

with (2.23) reducing to:

\[ E(Q(u)) = \mathrm{trace}\left( A\, Var(u) \right) + E(u)^T A\, E(u). \tag{2.24} \]

2.5 Appendix

The following result helps establish that condition (A2) is a special case of condition (A1).
Lemma 2.5.1 Let $\eta$ be defined as in equation (2.10) of condition (A1) and let $\Psi$ be defined as in Comment 2.2.1. Then, as $n \to \infty$,

\[ \frac{1}{n+1}\, \eta^T \Psi \eta = \Sigma^{(0)} + o_P(1), \tag{2.25} \]

where $\Sigma^{(0)}$ is defined as in (2.15).

Proof: Let $\eta_l$ denote the $l$th column of $\eta$ and consider $\eta_l^T \Psi \eta_t$, where $l, t = 1, \dots, p+1$. When $l = 1$ or $t = 1$, this is 0. For $l, t = 2, \dots, p+1$, we have (the factor $n/(n+1) \to 1$ being immaterial):

\[
\begin{aligned}
\frac{1}{n}\, \eta_l^T \Psi \eta_t
&= \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \eta_{i,l} \Psi_{ij} \eta_{j,t}
 = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \rho(|i-j|)\, \eta_{i,l} \eta_{j,t} \\
&= \frac{1}{n} \sum_{i=1}^n \eta_{i,l} \eta_{i,t}
 + \sum_{k=1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \\
&= \frac{1}{n} \sum_{i=1}^n \eta_{i,l} \eta_{i,t}
 + \sum_{k=1}^{k_0} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \\
&\quad + \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right),
\end{aligned}
\tag{2.26}
\]

where $[n/2]$ denotes the integer part of $n/2$ and $k_0$ is chosen independently of $n$ in the following fashion. Since $\sum_{k=1}^{\infty} |\rho(|k|)| < \infty$ (see Lemma 5.7.2 for a justification of this result), for any given $\epsilon > 0$ we can choose $k_0$ such that:

\[ \sum_{k=k_0+1}^{\infty} |\rho(|k|)| < \frac{\epsilon^2}{C} \]

for some large constant $C$.

In light of condition (A0)-(ii), the first term in (2.26) converges to $\Sigma^{(0)}_{l,t}$ in probability, by the Weak Law of Large Numbers applied to the independent random variables $\eta_{i,l} \eta_{i,t}$, $i = 1, \dots, n$.

The second term in (2.26) converges to zero in probability as $n \to \infty$ by the following argument. The random variables $\eta_{i,l} \eta_{i+k,t}$, $i = 1, \dots, n-k$, are $k$-dependent and identically distributed by condition (A0)-(ii). The Weak Law of Large Numbers for $k$-dependent random variables implies that $\sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} / (n-k)$ converges to $E(\eta_{1,l} \eta_{1+k,t}) = E(\eta_{1,l}) E(\eta_{1+k,t}) = 0$ in probability as $n \to \infty$. A similar argument yields that the quantity $\sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} / n$ converges to 0 in probability as $n \to \infty$.

Now, consider the third term in (2.26).
By Markov's Inequality and condition (A0)-(ii), for $n$ large enough, we have:

\[
\begin{aligned}
& P\left( \left| \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \right| > \epsilon \right) \\
&\quad \leq \frac{1}{\epsilon}\, E \left| \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \right| \\
&\quad \leq \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \left( \frac{1}{n} \sum_{i=1}^{n-k} E|\eta_{i,l} \eta_{i+k,t}| + \frac{1}{n} \sum_{i=k+1}^{n} E|\eta_{i,l} \eta_{i-k,t}| \right) \\
&\quad = \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \left( 2\, \frac{n-k}{n}\, E|\eta_{1,l} \eta_{1+k,t}| \right) \\
&\quad \leq \frac{C}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)|
 \leq \frac{C}{\epsilon} \sum_{k=k_0+1}^{\infty} |\rho(|k|)|
 < \frac{C}{\epsilon} \cdot \frac{\epsilon^2}{C} = \epsilon.
\end{aligned}
\]

In conclusion, the third term in (2.26) converges to zero in probability as $n \to \infty$. Combining the previous results yields (2.25).

Chapter 3

Estimation in a Partially Linear Model with Correlated Errors

Obtaining sensible point estimators for the linear effects in a partially linear model with correlated errors is the first important step towards carrying out valid inferences on these effects. Such inferences include conducting hypothesis tests for assessing the statistical significance of the linear effects of interest, and constructing confidence intervals for these effects.

As we have seen in Sections 1.1.1-1.1.2, several methods for estimating the linear and non-linear effects in a partially linear model have been proposed in the literature, both in the presence and absence of correlation amongst model errors. In principle, any of these methods could be used to obtain point estimators for the linear effects in a partially linear model with correlated errors. However, those methods which ignore the correlation structure of the model errors might produce less efficient estimators than the methods which account explicitly for this correlation structure. It is still of interest to consider methods which do not account for the presence of correlation amongst the model errors when estimating the linear effects in the model. Indeed, these methods could yield valid testing procedures based on the inefficient point estimators they produce and the standard errors associated with these estimators.
In the present chapter we show that many of the estimation methods used in the literature for a partially linear model with known correlation structure can be conveniently viewed as particular cases of a generic Backfitting Algorithm. We also show how this generic Backfitting Algorithm can be modified for those instances when the error correlation structure is unknown and must be estimated from the data.

This chapter is organized as follows. In Section 3.1, we discuss the generic Backfitting Algorithm for estimating the linear and non-linear effects in model (2.1) when the error correlation structure is known. In particular, in Sections 3.1.1 and 3.1.2 we discuss the usual and modified generic backfitting estimators of these effects. In Section 3.1.3, we discuss appropriate modifications of these estimators that can be used when the error correlation structure is unknown. In Section 3.1.4, we discuss several generic backfitting estimators which are versions of the estimators introduced by Speckman (1988).

3.1 Generic Backfitting Estimators

In this section, we provide a formal definition for the generic backfitting estimators of the unknowns $\beta$ and $m$ in model (2.1). We also define and discuss various particular types of these estimators, clearly indicating which of these types we consider in this thesis.

We start by introducing some notation. Let $\Omega$ be an $n \times n$ matrix of weights such that the $(p+1) \times (p+1)$ matrix $X^T \Omega X$ is invertible. Also, let $\mathbb{S}_h$ be a smoother matrix depending on a smoothing parameter $h$ which controls the width of the smoothing window. For example, the local linear smoother matrix is given in (3.6)-(3.8). Next, let $\mathbb{S}_h^c$ be the centered version of $\mathbb{S}_h$, obtained as:

\[ \mathbb{S}_h^c = (I - \mathbf{1}\mathbf{1}^T/n)\, \mathbb{S}_h. \tag{3.1} \]

Formal definitions for $\Omega$ and $\mathbb{S}_h^c$ will be provided shortly. For now, we note that the matrix of weights $\Omega$ may possibly depend on the known error correlation matrix $\Psi$ and on the smoother matrix $\mathbb{S}_h^c$.
The constrained generic backfitting estimators $\hat\beta_{\Omega, \mathbb{S}_h^c}$ and $\hat m_{\Omega, \mathbb{S}_h^c}$ of $\beta$ and $m$ are defined as the fixed points of the following generic backfitting equations:

\[ \hat\beta_{\Omega, \mathbb{S}_h^c} = (X^T \Omega X)^{-1} X^T \Omega\, (Y - \hat m_{\Omega, \mathbb{S}_h^c}), \tag{3.2} \]

\[ \hat m_{\Omega, \mathbb{S}_h^c} = \mathbb{S}_h^c\, (Y - X \hat\beta_{\Omega, \mathbb{S}_h^c}). \tag{3.3} \]

Use of the matrix $\mathbb{S}_h^c$ instead of $\mathbb{S}_h$ in equation (3.3) ensures that $\hat m_{\Omega, \mathbb{S}_h^c}$ satisfies the identifiability condition $\mathbf{1}^T \hat m_{\Omega, \mathbb{S}_h^c} = 0$.

The motivation behind the generic backfitting equations introduced above is as follows. Given an estimator $\hat m_{\Omega, \mathbb{S}_h^c}$ of the unknown $m$ in model (2.1), one can construct the vector of partial residuals $Y - \hat m_{\Omega, \mathbb{S}_h^c}$. Regressing these partial residuals on $X$ via weighted least squares yields the generic backfitting estimator $\hat\beta_{\Omega, \mathbb{S}_h^c}$ in equation (3.2). On the other hand, given an estimator $\hat\beta_{\Omega, \mathbb{S}_h^c}$ of the unknown $\beta$ in model (2.1), one can construct the vector of partial residuals $Y - X \hat\beta_{\Omega, \mathbb{S}_h^c}$. Smoothing these partial residuals on $Z = (Z_1, \dots, Z_n)^T$ via the smoother matrix $\mathbb{S}_h^c$ yields the generic backfitting estimator $\hat m_{\Omega, \mathbb{S}_h^c}$ in equation (3.3).

In practice, one could solve the generic backfitting equations (3.2)-(3.3) for $\hat\beta_{\Omega, \mathbb{S}_h^c}$ and $\hat m_{\Omega, \mathbb{S}_h^c}$ iteratively by employing a modification of the Backfitting Algorithm of Buja, Hastie and Tibshirani (1989), as follows.

The Generic Backfitting Algorithm

(i) Let $\hat\beta^{(0)}$ and $\hat m^{(0)}$ be initial estimators for $\beta$ and $m$ calculated as follows. We regress $Y$ on the parametric and nonparametric covariates in the model via weighted least squares regression, obtaining:

\[ \hat Y(x_1, \dots, x_p, z) = \hat\gamma_0 + \hat\gamma_1 x_1 + \cdots + \hat\gamma_p x_p + \hat\gamma_{p+1} (z - \bar Z). \]

Here, $\bar Z = (Z_1 + \cdots + Z_n)/n$. Note that, if $\bar{\mathbf{Z}} = (\bar Z, \dots, \bar Z)^T$ is an $n \times 1$ vector, the weighted least squares estimators $\hat\gamma = (\hat\gamma_0, \hat\gamma_1, \dots, \hat\gamma_p)^T$ and $\hat\gamma_{p+1}$ above are obtained by minimizing the following criterion with respect to $\gamma = (\gamma_0, \gamma_1, \dots, \gamma_p)^T$ and $\gamma_{p+1}$:

\[ \left[ Y - X\gamma - \gamma_{p+1} (Z - \bar{\mathbf{Z}}) \right]^T \Omega \left[ Y - X\gamma - \gamma_{p+1} (Z - \bar{\mathbf{Z}}) \right]. \]

We let $\hat m^{(0)}(z) = \hat\gamma_{p+1} (z - \bar Z)$ and $\hat m^{(0)} = (\hat m^{(0)}(Z_1), \dots, \hat m^{(0)}(Z_n))^T$. Also, we let $\hat\beta^{(0)} = \hat\gamma$.
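Note that $\hat m^{(0)}$ satisfies the identifiability condition (2.4).

(ii) Given the estimators $\hat\beta^{(l)}$ and $\hat m^{(l)}$, we construct $\hat\beta^{(l+1)}$ and $\hat m^{(l+1)}$ as follows:

\[ \hat\beta^{(l+1)} = (X^T \Omega X)^{-1} X^T \Omega\, (Y - \hat m^{(l)}), \]

\[ \hat m^{(l+1)} = \mathbb{S}_h^c\, (Y - X \hat\beta^{(l)}). \]

Note that $\hat m^{(l+1)}$ satisfies the identifiability condition (2.4), since $\mathbb{S}_h^c = (I - \mathbf{1}\mathbf{1}^T/n)\, \mathbb{S}_h$, for some smoother matrix $\mathbb{S}_h$.

(iii) Repeat (ii) until $\hat\beta^{(l)}$ and $\hat m^{(l)}$ do not change much.

If the Generic Backfitting Algorithm converges at the iteration labeled as $L + 1$, say, we set:

\[ \hat\beta_{\Omega, \mathbb{S}_h^c} = \hat\beta^{(L)}, \qquad \hat m_{\Omega, \mathbb{S}_h^c} = \hat m^{(L)}. \]

However, we need not iterate to find the generic backfitting estimators $\hat\beta_{\Omega, \mathbb{S}_h^c}$ and $\hat m_{\Omega, \mathbb{S}_h^c}$. Using the generic backfitting equations (3.2) and (3.3), we can easily derive an explicit expression for the generic backfitting estimator $\hat\beta_{\Omega, \mathbb{S}_h^c}$. Simply substitute the expression of $\hat m_{\Omega, \mathbb{S}_h^c}$ given in equation (3.3) into equation (3.2) and solve for $\hat\beta_{\Omega, \mathbb{S}_h^c}$:

\[
\begin{aligned}
\hat\beta_{\Omega, \mathbb{S}_h^c}
&= (X^T \Omega X)^{-1} X^T \Omega \left[ Y - \mathbb{S}_h^c (Y - X \hat\beta_{\Omega, \mathbb{S}_h^c}) \right] \\
&= (X^T \Omega X)^{-1} X^T \Omega \left[ (I - \mathbb{S}_h^c) Y + \mathbb{S}_h^c X \hat\beta_{\Omega, \mathbb{S}_h^c} \right].
\end{aligned}
\]

Pre-multiplying both sides of the above equation by $X^T \Omega X$ and rearranging yields

\[ X^T \Omega (I - \mathbb{S}_h^c) X\, \hat\beta_{\Omega, \mathbb{S}_h^c} = X^T \Omega (I - \mathbb{S}_h^c) Y. \]

Thus, provided the matrix $X^T \Omega (I - \mathbb{S}_h^c) X$ is invertible,

\[ \hat\beta_{\Omega, \mathbb{S}_h^c} = \left( X^T \Omega (I - \mathbb{S}_h^c) X \right)^{-1} X^T \Omega (I - \mathbb{S}_h^c) Y. \tag{3.4} \]

To obtain the generic backfitting estimator $\hat m_{\Omega, \mathbb{S}_h^c}$ without iterating, substitute the explicit expression of $\hat\beta_{\Omega, \mathbb{S}_h^c}$ obtained above in (3.3) to get:

\[ \hat m_{\Omega, \mathbb{S}_h^c} = \mathbb{S}_h^c \left[ I - X \left( X^T \Omega (I - \mathbb{S}_h^c) X \right)^{-1} X^T \Omega (I - \mathbb{S}_h^c) \right] Y. \tag{3.5} \]

Results (3.4) and (3.5) above show that the generic backfitting equations (3.2)-(3.3) have a unique solution as long as the $(p+1) \times (p+1)$ matrix $X^T \Omega (I - \mathbb{S}_h^c) X$ is invertible.

Various specifications for the smoother matrix $\mathbb{S}_h^c$ and the matrix of weights $\Omega$ appearing in the generic backfitting equations (3.2) and (3.3) (or, equivalently, in the explicit equations (3.4) and (3.5)) lead to different types of generic backfitting estimators.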
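The explicit solution (3.4)-(3.5) can be sketched numerically as follows. This is only an illustration, assuming NumPy: the function names, the simulated data, the regression function, the bandwidth and the true coefficients are all hypothetical choices, and the smoother is taken to be the local linear smoother with Epanechnikov kernel defined in (3.6)-(3.9). The sketch computes the estimators directly from (3.4) and (3.3) rather than by iterating.

```python
import numpy as np


def local_linear_smoother(z, h):
    """Local linear smoother matrix with Epanechnikov kernel, per (3.6)-(3.9)."""
    d = z[:, None] - z[None, :]                  # d[i, j] = Z_i - Z_j
    u = d / h
    k = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
    s1 = (k * d).sum(axis=1, keepdims=True)      # S_{n,1}(Z_i)
    s2 = (k * d**2).sum(axis=1, keepdims=True)   # S_{n,2}(Z_i)
    w = k * (s2 - d * s1)                        # local weights w_j^{(i)}
    return w / w.sum(axis=1, keepdims=True)


def generic_backfitting(X, Y, S, Omega=None):
    """Non-iterative generic backfitting estimators, per (3.4) and (3.3)."""
    n = len(Y)
    Omega = np.eye(n) if Omega is None else Omega
    Sc = S - S.mean(axis=0, keepdims=True)       # (I - 11^T/n) S, as in (3.1)
    A = X.T @ Omega @ (np.eye(n) - Sc)
    beta = np.linalg.solve(A @ X, A @ Y)         # equation (3.4)
    m = Sc @ (Y - X @ beta)                      # equation (3.3)
    return beta, m


# Illustrative data; every choice below is an assumption made for this sketch.
rng = np.random.default_rng(0)
n = 100
z = np.arange(1, n + 1) / (n + 1)                # regular design on [0, 1]
x1 = z**2 + 0.3 * rng.standard_normal(n)         # linear variable correlated with Z
X = np.column_stack([np.ones(n), x1])
m_true = np.sin(2 * np.pi * z)
m_true -= m_true.mean()                          # enforce restriction (2.4)
Y = X @ np.array([1.0, 2.0]) + m_true + 0.1 * rng.standard_normal(n)

S = local_linear_smoother(z, h=0.2)
beta_hat, m_hat = generic_backfitting(X, Y, S)   # Omega = I: usual estimators
```

Passing a different `Omega` (for instance the inverse of a known error correlation matrix) to `generic_backfitting` gives the other members of the generic family in the same way.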
In the rest of this section, we discuss several such specifications, together with the particular types of generic backfitting estimators they yield. Note that, if one wishes to estimate the unknowns $\beta^*$ and $m^*$ in the intercept-free model (2.5), one should carry out an unconstrained backfitting algorithm, using $X^*$ instead of $X$, and $\mathbb{S}_h$ instead of $\mathbb{S}_h^c$ in (3.2)-(3.3).

3.1.1 Usual Generic Backfitting Estimators

The usual generic backfitting estimators are obtained from (3.2)-(3.3) by taking $\Omega = I$. Clearly, these estimators are defined by ignoring the correlation structure of the model errors.

In this thesis, we consider a particular type of usual backfitting estimators, obtained by taking $\mathbb{S}_h$ to be a local linear smoother matrix $S_h$, whose formal definition will be provided shortly. We refer to these estimators as local linear backfitting estimators and denote them by $\hat\beta_{I, S_h^c}$ and $\hat m_{I, S_h^c}$. These estimators were introduced by Opsomer and Ruppert (1999) in the context of partially linear models with uncorrelated errors and discussed in Section 1.1.1.

Taking $\mathbb{S}_h$ to be $S_h$ is motivated by the fact that local linear smoothing has been shown by Fan and Gijbels (1992) and Fan (1993) to be an effective smoothing method in nonparametric regression. It has the advantage of achieving full asymptotic minimax efficiency and automatically correcting for boundary bias. For more information on local linear smoothing, the reader is referred to Fan and Gijbels (1996).

We define the $(i, j)$th element of $S_h$ as:

\[ S_{ij} = \frac{w_j^{(i)}}{\sum_{j=1}^n w_j^{(i)}}, \tag{3.6} \]

with local weights $w_j^{(i)}$, $j = 1, \dots, n$, given by:

\[ w_j^{(i)} = K\!\left( \frac{Z_i - Z_j}{h} \right) \left[ S_{n,2}(Z_i) - (Z_i - Z_j)\, S_{n,1}(Z_i) \right]. \tag{3.7} \]

Here:

\[ S_{n,l}(Z) = \sum_{j=1}^n K\!\left( \frac{Z - Z_j}{h} \right) (Z - Z_j)^l, \quad l = 1, 2, \tag{3.8} \]

where $Z \in [0,1]$, $h$ is the half-width of the smoothing window and $K$ is a kernel function specified by the user. One possible choice of $K$, which will be used later in this thesis, is the so-called Epanechnikov kernel:

\[ K(u) = \begin{cases} \frac{3}{4}(1 - u^2), & \text{if } |u| < 1; \\ 0, & \text{else.} \end{cases} \tag{3.9} \]

3.1.2
Modified Generic Backfitting Estimators

The modified generic backfitting estimators are feasible when the error correlation matrix $\Psi$ is fully known. These estimators are obtained from (3.2)-(3.3) by taking $\Omega = \Psi^{-1}$. Unlike the usual generic backfitting estimators, which ignore the correlation structure of the model errors, the modified generic backfitting estimators account for this correlation structure and thus would be expected to be more efficient.

In this thesis, we consider a particular case of modified generic backfitting estimators, obtained by taking $\mathbb{S}_h$ to be the local linear smoother matrix $S_h$, whose $(i, j)$th element is defined in (3.6)-(3.8). We refer to these estimators as modified local linear backfitting estimators and denote them by $\hat\beta_{\Psi^{-1}, S_h^c}$ and $\hat m_{\Psi^{-1}, S_h^c}$.

3.1.3 Estimated Modified Generic Backfitting Estimators

In practice, the error correlation matrix $\Psi$ is never fully known. More commonly, $\Psi$ is assumed to be known only up to a finite number of parameters, or assumed to be stationary, but otherwise left completely unspecified. In these situations, the modified generic backfitting estimators are no longer feasible. However, these estimators can be adjusted to become feasible by simply replacing $\Omega = \Psi^{-1}$ with $\Omega = \hat\Psi^{-1}$, where $\hat\Psi$ is an estimator of $\Psi$. We refer to these adjusted estimators as estimated modified generic backfitting estimators.

In this thesis, we consider a particular case of estimated modified generic backfitting estimators, obtained by taking $\mathbb{S}_h$ to be the local linear smoother matrix $S_h$, whose $(i, j)$th element is defined in (3.6)-(3.8).
We refer to these estimators as estimated modified local linear backfitting estimators and denote them by $\hat\beta_{\hat\Psi^{-1}, S_h^c}$ and $\hat m_{\hat\Psi^{-1}, S_h^c}$.

Surprisingly, not much information is available in the partially linear regression model literature on estimating the correlation structure of the model errors when it is known only up to a finite number of parameters, or assumed to be stationary, but otherwise left completely unspecified. Later in this thesis we discuss how one might obtain estimators for the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\Psi$ in practice.

3.1.4 Usual, Modified and Estimated Modified Speckman Estimators

As we have seen earlier, the usual, modified and estimated modified backfitting estimators are obtained from (3.2)-(3.3) by taking $\Omega$ to be $I$, $\Psi^{-1}$ and $\hat\Psi^{-1}$, respectively, with $\mathbb{S}_h^c$ determined by the smoothing method chosen. Other estimators are the usual, modified and estimated modified Speckman estimators, which are obtained from (3.2)-(3.3) by taking $\Omega$ to be $(I - \mathbb{S}_h^c)^T$, $(I - \mathbb{S}_h^c)^T \Psi^{-1}$ and $(I - \mathbb{S}_h^c)^T \hat\Psi^{-1}$, respectively. Here, $\hat\Psi$ is an estimator of $\Psi$, while $\mathbb{S}_h^c$ depends on the smoothing method of our choice. We discuss these estimators below.

The usual Speckman estimators ignore the correlation structure of the model errors. In what follows, we denote these estimators by $\hat\beta_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ and $\hat m_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$. An explicit expression for $\hat\beta_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ can be found by taking $\Omega = (I - \mathbb{S}_h^c)^T$ in (3.4):

\[ \hat\beta_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c} = (\tilde X^T \tilde X)^{-1} \tilde X^T \tilde Y, \tag{3.10} \]

where $\tilde X = (I - \mathbb{S}_h^c) X$ and $\tilde Y = (I - \mathbb{S}_h^c) Y$ are partial residuals formed by smoothing $X$ and $Y$ as functions of $Z$. The usual Speckman estimator $\hat\beta_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$ can thus be thought of as being the least squares estimator of $\beta$ obtained by regressing the partial residuals $\tilde Y$ on the partial residuals $\tilde X$.
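The equivalence between (3.10) and the generic formula (3.4) with $\Omega = (I - \mathbb{S}_h^c)^T$ can be checked numerically. The sketch below assumes NumPy and uses a local constant smoother with Nadaraya-Watson weights and an Epanechnikov kernel; the function names, the simulated data and the bandwidth are hypothetical choices made purely for illustration.

```python
import numpy as np


def nadaraya_watson_smoother(z, h):
    """Local constant smoother matrix with Nadaraya-Watson weights."""
    u = (z[:, None] - z[None, :]) / h
    k = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov kernel
    return k / k.sum(axis=1, keepdims=True)


def speckman_usual(X, Y, S):
    """Usual Speckman estimator (3.10): least squares on partial residuals."""
    n = len(Y)
    Sc = S - S.mean(axis=0, keepdims=True)                # centered smoother (3.1)
    R = np.eye(n) - Sc
    X_tilde, Y_tilde = R @ X, R @ Y                       # partial residuals
    beta, *_ = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)
    return beta


# Illustrative data; every choice below is an assumption made for this sketch.
rng = np.random.default_rng(1)
n = 100
z = np.arange(1, n + 1) / (n + 1)
X = np.column_stack([np.ones(n), z**2 + 0.3 * rng.standard_normal(n)])
Y = X @ np.array([1.0, 2.0]) + np.cos(2 * np.pi * z) + 0.1 * rng.standard_normal(n)

S = nadaraya_watson_smoother(z, h=0.2)
beta_speckman = speckman_usual(X, Y, S)
```

Computing the estimator via least squares on $\tilde X$ and $\tilde Y$, rather than forming $(\tilde X^T \tilde X)^{-1}$ explicitly, is a numerical convenience and does not change the estimator.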
Later in this thesis, we compare the finite sample behaviour of the usual Speckman estimator $\hat\beta_{(I - \mathbb{S}_h^c)^T, \mathbb{S}_h^c}$, with $\mathbb{S}_h^c$ being a local constant smoother matrix with Nadaraya-Watson weights, against that of $\hat\beta_{I, S_h^c}$, the local linear backfitting estimator, and $\hat\beta_{\hat\Psi^{-1}, S_h^c}$, the estimated modified local linear backfitting estimator.

The modified Speckman estimators are defined by taking into account the correlation structure of the errors associated with model (2.1) and are feasible when the correlation matrix of these errors is fully known. We denote these estimators by $\hat\beta_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ and $\hat m_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$, and note that an explicit expression for $\hat\beta_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ can be found by taking $\Omega = (I - \mathbb{S}_h^c)^T \Psi^{-1}$ in (3.4):

\[ \hat\beta_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c} = (\tilde X^T \Psi^{-1} \tilde X)^{-1} \tilde X^T \Psi^{-1} \tilde Y. \tag{3.11} \]

One can see that $\hat\beta_{(I - \mathbb{S}_h^c)^T \Psi^{-1}, \mathbb{S}_h^c}$ is a weighted least squares estimator, obtained by regressing the partial residuals $\tilde Y$ on the partial residuals $\tilde X$. The large-sample properties of an unconstrained version of this estimator have been studied by Aneiros Perez and Quintela del Rio (2001a) under the assumption of $\alpha$-mixing errors. Their estimator is given by:

\[ \hat\beta_{(I - K_h)^T \Psi^{-1}, K_h} = (\tilde X^T \Psi^{-1} \tilde X)^{-1} \tilde X^T \Psi^{-1} \tilde Y, \tag{3.12} \]

where $\tilde X = (I - K_h) X^*$, with $\tilde Y$ defined analogously, $X^*$ is defined as in (2.6) and $K_h$ is an uncentered local constant smoother matrix with Gasser-Müller weights. Later in this thesis, we compare the asymptotic properties of $\hat\beta_{(I - K_h)^T \Psi^{-1}, K_h}$ against those of $\hat\beta_{\Psi^{-1}, S_h^c}$, the modified local linear backfitting estimator. We do not, however, compare the finite sample properties of these estimators, as neither estimator can be computed in practice. Indeed, both estimators depend on the true error correlation matrix, which is typically unknown in applications.

The estimated modified Speckman estimators are feasible in those situations where the error correlation matrix is unknown but estimable.
We denote these estimators by $\hat\beta_{(I - \mathbb{S}_h^c)^T \hat\Psi^{-1}, \mathbb{S}_h^c}$ and $\hat m_{(I - \mathbb{S}_h^c)^T \hat\Psi^{-1}, \mathbb{S}_h^c}$. An explicit expression for $\hat\beta_{(I - \mathbb{S}_h^c)^T \hat\Psi^{-1}, \mathbb{S}_h^c}$ can be obtained by substituting $\hat\Psi$ instead of $\Psi$ into (3.11).

In the remainder of this thesis, we concentrate on the following estimators of $\beta$, the parametric component in model (2.1):

(i) $\hat\beta_{I, S_h^c}$, the local linear backfitting estimator;
(ii) $\hat\beta_{\Psi^{-1}, S_h^c}$, the modified local linear backfitting estimator;
(iii) $\hat\beta_{\hat\Psi^{-1}, S_h^c}$, the estimated modified local linear backfitting estimator.

Opsomer and Ruppert (1999) studied the asymptotic behaviour of $\hat\beta_{I, S_h^c}$ under the assumption that the model errors are uncorrelated. However, the asymptotic behaviour of $\hat\beta_{I, S_h^c}$, $\hat\beta_{\Psi^{-1}, S_h^c}$ and $\hat\beta_{\hat\Psi^{-1}, S_h^c}$ has not been studied under the assumption of error correlation. In Chapter 4 of this thesis, we investigate the asymptotic behaviour of $\hat\beta_{I, S_h^c}$ and discuss conditions under which this estimator is $\sqrt{n}$-consistent. In Chapter 5, we obtain similar results for $\hat\beta_{\Psi^{-1}, S_h^c}$ for correctly specified $\Psi$. Rather than assuming $\Psi$ to have a general form as in Chapter 4, we restrict it to have a parametric (autoregressive) structure in order to simplify the proofs of all results in Chapter 5. We also give conditions under which $\hat\beta_{\Psi^{-1}, S_h^c}$ is $\sqrt{n}$-consistent.

Chapter 4

Asymptotic Properties of the Local Linear Backfitting Estimator $\hat\beta_{I, S_h^c}$

In this chapter, we investigate the large-sample behaviour of the local linear backfitting estimator $\hat\beta_{I, S_h^c}$ as the number of data points in the local linear smoothing window increases and the window size decreases at a specified rate. Recall that an explicit expression for $\hat\beta_{I, S_h^c}$ can be obtained from (3.4) by taking $\Omega = I$ and replacing $\mathbb{S}_h^c$ with the centered local linear smoother $S_h^c$:

\[ \hat\beta_{I, S_h^c} = \left( X^T (I - S_h^c) X \right)^{-1} X^T (I - S_h^c) Y. \tag{4.1} \]

Throughout this chapter, we assume that the errors associated with model (2.1) are a realization from a zero mean, covariance-stationary stochastic process satisfying condition (A1) of Section 2.2.
We also assume that the non-linear variable in the model is a fixed design variable following a smooth design density $f(\cdot)$ (condition (A3), Section 2.2) and having a smooth effect $m(\cdot)$ on the mean response (condition (A4), Section 2.2). Finally, we allow the linear variables in the model to be mutually correlated and assume they are related with the non-linear variable via a non-parametric regression relationship (condition (A0), Section 2.2).

In Sections 4.1 and 4.2, we provide asymptotic expressions for the exact conditional bias and variance of $\hat\beta_{I, S_h^c}$, given $X$, $Z$. In Section 4.3, we provide an asymptotic expression for an exact conditional quadratic loss criterion that measures the accuracy of $\hat\beta_{I, S_h^c}$ as an estimator of $\beta$. In Section 4.4, we discuss the circumstances under which the $\sqrt{n}$-consistency of $\hat\beta_{I, S_h^c}$ can be achieved given $X$ and $Z$. In particular, we show that one must 'undersmooth' $\hat m_{I, S_h^c}$, the estimated non-parametric component, to ensure that $\hat\beta_{I, S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$. The results in Sections 4.1-4.4 focus on the local linear backfitting estimator $\hat\beta_{I, S_h^c}$. In Section 4.5, we indicate how these results can be generalized to local polynomials of higher degree. The chapter concludes with an Appendix containing several auxiliary results.

Throughout this chapter, we let $G_i$ denote the $i$th column of the matrix $G$ defined in (2.14), and $\eta_i$ denote the $i$th column of the matrix $\eta$ defined in (2.10). We also let $\hat\beta_{i, I, S_h^c}$ denote the $i$th component of $\hat\beta_{I, S_h^c}$.

4.1 Exact Conditional Bias of $\hat\beta_{I, S_h^c}$ given X and Z

The modelling flexibility of the partially linear model (2.1) comes at a price. On one hand, the presence of the nonparametric term $m$ in this model safeguards against model misspecification bias in the estimated relationships between the linear variables $X_1, \dots, X_p$ and the response. On the other hand, allowing $m$ to enter the model causes the usual backfitting estimator $\hat\beta_{I, S_h^c}$ to suffer from finite sample bias. Indeed, using the explicit expression of $\hat\beta_{I, S_h^c}$ in (4.1), together with the model formulation in (2.1), we easily see the conditional bias of $\hat\beta_{I, S_h^c}$, given $X$, $Z$, to be:

\[ E(\hat\beta_{I, S_h^c} \mid X, Z) - \beta = \left( X^T (I - S_h^c) X \right)^{-1} X^T (I - S_h^c)\, m, \tag{4.2} \]

an expression which generally does not equal zero.

Theorem 4.1.1 below provides an asymptotic expression for the exact conditional bias of $\hat\beta_{I, S_h^c}$, given $X$ and $Z$. As we already mentioned, this expression is obtained by assuming that the amount of smoothing $h$ required for computing the estimator $\hat\beta_{I, S_h^c}$ is deterministic and satisfies conditions (2.12) and (2.13).

Theorem 4.1.1 Let $V$ and $W$ be defined as in equations (2.20) - (2.21). Under assumptions (A0), (A1) and (A3) - (A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional bias of the usual backfitting estimator $\hat\beta_{I, S_h^c}$ of $\beta$, given $X$ and $Z$, is:

\[ E(\hat\beta_{I, S_h^c} \mid X, Z) - \beta = -h^2\, V^{-1} W + o_P(h^2). \tag{4.3} \]

Comment 4.1.1 From equation (4.2) above, one can see that the exact conditional bias of $\hat\beta_{I, S_h^c}$, given $X$ and $Z$, does not depend upon the error correlation matrix $\Psi$. Hence, it is not surprising that the leading term in (4.3) is unaffected by the possible correlation of the model errors.

Proof of Theorem 4.1.1: Let:

\[ B_{n,I} = \frac{1}{n+1}\, X^T (I - S_h^c) X, \]

where the dependence of $B_{n,I}$ upon $h$ is omitted for convenience. We will see below that when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, $B_{n,I}$ converges in probability to the quantity $V$ defined in equation (2.20). Since $V$ is non-singular by Lemma 4.6.11, the explicit expression for $\hat\beta_{I, S_h^c}$ in (4.1) holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. We can use this expression to write:

\[ \hat\beta_{I, S_h^c} = B_{n,I}^{-1} \left\{ \frac{1}{n+1}\, X^T (I - S_h^c)\, Y \right\}, \tag{4.4} \]

which holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. Taking conditional expectation in both sides of (4.4) and subtracting $\beta$ yields:

\[ E(\hat\beta_{I, S_h^c} \mid X, Z) - \beta = B_{n,I}^{-1} \left\{ \frac{1}{n+1}\, X^T (I - S_h^c)\, m \right\}. \tag{4.5} \]

We now show that $B_{n,I}$ converges in probability to $V$ as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, that is:

\[ B_{n,I} = V + o_P(1). \tag{4.6} \]

By equation (2.16), $X = G + \eta$, so $B_{n,I}$ can be decomposed as:

\[ B_{n,I} = \frac{1}{n+1} G^T (I - S_h^c) G + \frac{1}{n+1} G^T (I - S_h^c) \eta + \frac{1}{n+1} \eta^T (I - S_h^c) G + \frac{1}{n+1} \eta^T (I - S_h^c) \eta. \]

Using $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$ (equation (3.1) with $\mathbb{S}_h = S_h$), we re-write the first term, expand the last term and re-arrange to obtain:

\[
\begin{aligned}
B_{n,I} &= \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T G + \frac{1}{n+1} \eta^T \eta + \frac{1}{n+1} G^T (I - S_h) G + \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T (S_h - I) G \\
&\quad + \frac{1}{n+1} G^T (I - S_h^c) \eta + \frac{1}{n+1} \eta^T (I - S_h^c) G - \frac{1}{n+1} \eta^T S_h^c \eta.
\end{aligned}
\tag{4.7}
\]

To establish (4.6), it suffices to show that

\[ \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T G = \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz + o(1), \tag{4.8} \]

\[ \frac{1}{n+1} \eta^T \eta = \Sigma^{(0)} + o_P(1), \tag{4.9} \]

whereas the remaining terms are $o_P(1)$.

First consider $G^T \mathbf{1}\mathbf{1}^T G / (n(n+1))$. Set $Z_0 = 0$, $Z_{n+1} = 1$ and use (A3), the design condition on the $Z_i$'s, to get:

\[
\begin{aligned}
\int_0^1 g_j(z) f(z)\,dz &= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} g_j(z) f(z)\,dz \\
&= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} \left[ g_j(z) - g_j(Z_i) \right] f(z)\,dz + \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} g_j(Z_i) f(z)\,dz \\
&= \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} \left[ g_j(z) - g_j(Z_i) \right] f(z)\,dz + \frac{1}{n+1} \sum_{i=1}^{n+1} g_j(Z_i)
\end{aligned}
\]

for $j = 0, \dots, p$ fixed. Re-arranging and using the design condition (A3) and the Lipschitz-continuity of $g_j(\cdot)$ (consequence of (A0)-(i)) yields:

\[ \left| \frac{1}{n+1} \mathbf{1}^T G_{j+1} - \int_0^1 g_j(z) f(z)\,dz \right| \leq \frac{|g_j(1)|}{n+1} + \sum_{i=1}^{n+1} \int_{Z_{i-1}}^{Z_i} \left| g_j(z) - g_j(Z_i) \right| f(z)\,dz = O\!\left( \frac{1}{n} \right) \]

for any $j = 0, \dots, p$, so:

\[ \frac{1}{n+1} G^T \mathbf{1} = \int_0^1 g(z) f(z)\,dz + o(1) \tag{4.10} \]

and (4.8) follows.

Next consider $\eta^T \eta / (n+1)$. Fix $i, j = 1, \dots, p$, and use (A0)-(ii), which specifies the distributional assumptions on the rows of $\eta$, to get:

\[ \left[ \frac{1}{n+1} \eta^T \eta \right]_{i+1, j+1} = \frac{1}{n+1} \sum_{k=1}^n \eta_{k,i} \eta_{k,j} \to \Sigma_{ij} \]

in probability. Since $[\eta^T \eta / (n+1)]_{i+1, j+1} = 0$ whenever $i = 0$ and $j = 0, \dots, p$, or $i = 1, \dots, p$ and $j = 0$, (4.9) follows.

It remains to show that all the other terms in (4.7) are $o_P(1)$. It suffices to show that $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (S_h - I) G_{j+1} / (n(n+1))$, $G_{i+1}^T (I - S_h^c) \eta_{j+1} / (n+1)$, $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1)$ and $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$ are $o_P(1)$ for any $i, j = 0, 1, \dots, p$. These facts follow from lemmas appearing in the Appendix of this chapter.

Let $i, j = 0, 1, \dots, p$ be fixed and consider $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$. By result (4.58) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $r = G_{j+1}$, this quantity is $O(h^2)$, so $G_{i+1}^T (I - S_h) G_{j+1} / (n+1)$ is $o(1)$. Similarly, by result (4.59) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $r = G_{j+1}$, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (I - S_h) G_{j+1} / (n(n+1))$ is $O(h^2)$. Thus, $G_{i+1}^T \mathbf{1}\mathbf{1}^T (I - S_h) G_{j+1} / (n(n+1))$ is $o(1)$.

Next consider $G_{i+1}^T (I - S_h^c) \eta_{j+1} / (n+1)$. When $j = 0$, this is 0. For $j = 1, \dots, p$, by result (4.60) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = I$ and $\xi = \eta_{j+1}$, this quantity is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. Similarly, when $i = 0$, $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1) = 0$. For $i = 1, \dots, p$, result (4.61) of Lemma 4.6.9 establishes that $\eta_{i+1}^T (I - S_h^c) G_{j+1} / (n+1) = o_P(1)$.

Finally, consider $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$. When $i = 0$ or $j = 0$, this is 0. By result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = I$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T S_h^c \eta_{j+1} / (n+1)$ is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$ for $i, j = 1, \dots, p$.

Combining these results, we conclude that

\[ B_{n,I} = \Sigma^{(0)} + \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz + o_P(1) = V + o_P(1). \]

But $V$ is non-singular by Lemma 4.6.11, so

\[ B_{n,I}^{-1} = V^{-1} + o_P(1). \tag{4.11} \]

To establish (4.3), by (4.5) it now suffices to show that:

\[ \frac{1}{n+1}\, X^T (I - S_h^c)\, m = -h^2\, W + o_P(h^2). \tag{4.12} \]

This equality is established below with the help of lemmas stated in the Appendix of this chapter.

By equation (2.16), $X = G + \eta$, so $X^T (I - S_h^c) m / (n+1)$ can be decomposed as the sum of $G^T (I - S_h^c) m / (n+1)$ and $\eta^T (I - S_h^c) m / (n+1)$. Using the identifiability condition on $m(\cdot)$ in (2.4) and the fact that $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$, we obtain:

\[ \frac{1}{n+1} X^T (I - S_h^c) m = \frac{1}{n+1} G^T (I - S_h) m + \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T (S_h - I) m + \frac{1}{n+1} \eta^T (I - S_h^c) m. \tag{4.13} \]

By results (4.66) and (4.67) of Lemma 4.6.10, we obtain

\[ \frac{1}{n+1} G^T (I - S_h) m = -h^2\, \frac{\nu_2(K)}{2} \int_0^1 g(z) m''(z) f(z)\,dz + o_P(h^2) \]

as well as

\[ \frac{1}{n(n+1)} G^T \mathbf{1}\mathbf{1}^T (S_h - I) m = h^2\, \frac{\nu_2(K)}{2} \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 m''(z) f(z)\,dz + o_P(h^2). \]
2 P Result (4.61) of Lemma 4.6.9 with ft = I and r = m establishes that rjf (I - S )m/(n + 1) = 0 {n~ h ) c i+1) +1 1/2 h = 2 P Note that result (4.61) of Lemma 4.6.9 holds trivially when £* = r^, as r) = 0 1 by definition. Thus, (4.12) holds. This, combined with (4.5) and (4.11) completes the proof of Theorem 4.1.1. To better understand the effect of the correlation between the linear and non-linear variables in the model on the asymptotic conditional bias of Pi s > c : h w e provide an alternative expression for this bias. Corollary 4.1.1 Let Z be a random variable with density function /(•) as in assumption (A3). Let X\,... ,X p be random variables related to Z as: X J = 9j(Z)+Vj, 3 = 1, • • • ,P, where the gj(-) 's are smooth functions as in assumption (A0)-(1) and the r)i's are random variables satisfying E(r)j\Z) = 0, Var(r)j\Z) = S^-, Cov(r)j,r)j<) = E^y, j ^ j', with E = 41 (Ejy) as in assumption (AO)-(ii). Also, let m(-) be a smooth satisfying assumption (A4) and denote its second derivative m"(-). Set X — (Xi,... ,X ) . Under the assumptions T P in Theorem 4-1-1, our previous bias expression can be re-written in terms of X and Z as: E0 \X,Z)-po = \ E(X\Z) Var(X\Z)- Cov(X,m''(Z)) h {K) OtItS% T + o (h ) 1 2 P (4.14) and h u (K) 2 E 2 x,z : Var{X\Z)- Cov{X,m"(Z)) + l o (h ). 2 P V (4.15) Proof: Let a — (Jg gi(z)f(z)dz,..., g (z)f(z)dz) 1 W = (0, Wl) , T \W2\i and let W be denned as in (2.21). Set T p with: = f (z)m"(z)f(z)dz jf - 9j •J (z)f(z)dz 1 1 9j for j = 1,... ,p. Substitute the explicit expression for V - 1 m"{z)f(z)dz, (result (4.68), Lemma 4.6.11) into (4.3) to obtain: l +a E a j r E(3 AX,Z)-3 IiS = -h 2 _ 1 -E a -a zZ~ W T f | _ 1 1 2 E W 2 + o (/i ) 2 P " + _ 1 0 Op(/i ), 2 , with S as in assumption (AO)-(ii). Results (4.14) and (4.15) follow easily from the above by noting that a = E(X\Z), £ = Var(X\Z) 42 and W 2 = Cov(X, Z). 
Result (4.15) in Corollary 4.1.1 shows that the effect of the correlation between the linear variables and the non-linear variable in the model on the asymptotic bias of the local linear backfitting estimator of the linear effects $\beta_1, \ldots, \beta_p$ is through the variance-covariance matrix $Var(X \mid Z)$ and the covariances $Cov(X, m''(Z))$. Note that the latter depends on the curvature of the smooth non-linear effect $m(\cdot)$ through its second derivative $m''(\cdot)$. Therefore, the leading term in the bias of $\widehat{\beta}_{i,I,S_h^c}$ disappears when there is no correlation between the corresponding linear and non-linear terms in the model, that is when the correlation between $g_i(Z)$ and $m''(Z)$ is zero. In particular, the leading term disappears if $m(\cdot)$ is a line, or if $g_i(\cdot) = c_i$ for some constant $c_i$.

Opsomer and Ruppert (1999, Theorem 1) obtained a related bias result for the local linear backfitting estimator of the linear effects $\beta_1, \ldots, \beta_p$ in a partially linear model with independent, identically distributed errors. These authors derived their result under a different set of assumptions than ours. Specifically, they assumed the design points $Z_i$, $i = 1, \ldots, n$, to be random instead of fixed. Furthermore, they did not require that the covariate values $X_{ij}$ and the design points $Z_i$ be related via the nonparametric regression model (2.7). However, they assumed the linear covariates to have mean zero. Finally, they allowed $h$ to converge to zero at a rate slower than ours by assuming $nh \to \infty$ instead of condition (2.13) ($nh^3 \to \infty$).

The asymptotic bias expression derived by Opsomer and Ruppert is

$$-h^2\, \frac{\nu_2(K)}{2}\, \{E(Var(X \mid Z))\}^{-1}\, Cov(X, m''(Z)) + o_P(h^2).$$

The leading term in this expression is a slight modification of our first term in (4.15), which accounts for the randomness of the $Z_i$'s. The rate of the error associated with Opsomer and Ruppert's asymptotic bias approximation is $o_P(h^2)$ and is of the same order as that associated with the bias approximation in (4.15).
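As a numerical companion to the remark that the leading bias term disappears when $m(\cdot)$ is a line, the following sketch evaluates the exact conditional bias $\{X^T(I - S_h^c)X\}^{-1}X^T(I - S_h^c)m$ and the exact conditional variance of the usual backfitting estimator for one simulated design. Everything concrete here is an assumption made for illustration only and is not taken from the thesis: the Epanechnikov kernel, the equally spaced design $Z_i = i/(n+1)$, the choice $g_1(z) = \sin(2\pi z)$, the AR(1) error correlation $\rho^{|i-j|}$, the value of $h$, and all function names. The smoother uses standard local linear weights of the form $K((Z - Z_i)/h)[S_{n,2}(Z) - (Z - Z_i)S_{n,1}(Z)]$.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an illustrative choice of K)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def local_linear_smoother(z, h, kernel=epanechnikov):
    """Uncentered local linear smoother matrix S_h at the design points."""
    d = z[:, None] - z[None, :]                   # d[i, j] = Z_i - Z_j
    k = kernel(d / h)
    s1 = (k * d).sum(axis=1, keepdims=True)       # S_{n,1}(Z_i)
    s2 = (k * d ** 2).sum(axis=1, keepdims=True)  # S_{n,2}(Z_i)
    w = k * (s2 - d * s1)                         # unnormalised weights
    return w / w.sum(axis=1, keepdims=True)       # each row sums to one


def centered_smoother(S):
    """Centered smoother S_h^c = (I - 11^T / n) S_h."""
    n = S.shape[0]
    return (np.eye(n) - np.ones((n, n)) / n) @ S


def exact_bias_and_variance(X, S_c, m, Psi, sigma2=1.0):
    """Exact conditional bias and variance of beta_hat = A Y,
    with A = {X^T (I - S_c) X}^{-1} X^T (I - S_c)."""
    n = X.shape[0]
    M = np.eye(n) - S_c
    A = np.linalg.solve(X.T @ M @ X, X.T @ M)
    return A @ m, sigma2 * A @ Psi @ A.T


# Hypothetical design: all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n, h, rho = 100, 0.2, 0.5
z = np.arange(1, n + 1) / (n + 1)
x1 = np.sin(2 * np.pi * z) + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x1])
idx = np.arange(n)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1) correlation matrix
S_c = centered_smoother(local_linear_smoother(z, h))

m_line = z - z.mean()                             # a centered straight line
m_curved = np.sin(2 * np.pi * z)
m_curved = m_curved - m_curved.mean()             # centered, with curvature

bias_line, var_line = exact_bias_and_variance(X, S_c, m_line, Psi)
bias_curved, var_curved = exact_bias_and_variance(X, S_c, m_curved, Psi)
print("bias when m is a line:", bias_line)
print("bias when m is curved:", bias_curved)
print("variance (does not depend on m):", np.diag(var_line))
```

In this sketch the bias for the straight line is zero up to rounding, because local linear smoothing reproduces lines exactly, while the variance is the same for both choices of $m$ and changes only with $\Psi$, $X$, $Z$ and $h$.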
4.2 Exact Conditional Variance of $\widehat{\beta}_{I,S_h^c}$ Given $X$ and $Z$

In this section, we derive an asymptotic expression for the exact conditional variance $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ of the usual backfitting estimator $\widehat{\beta}_{I,S_h^c}$ of $\beta$, given $X$ and $Z$. But first, we obtain an explicit expression for the exact conditional variance $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$. Using the expression for $\widehat{\beta}_{I,S_h^c}$ in (4.1) together with the fact that

$$Var(Y \mid X, Z) = \sigma_\epsilon^2\, \Psi \tag{4.16}$$

from condition (A1), we get:

$$Var(\widehat{\beta}_{I,S_h^c} \mid X, Z) = \sigma_\epsilon^2\, \{X^T (I - S_h^c) X\}^{-1} \cdot X^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T X \cdot \{X^T (I - S_h^c)^T X\}^{-1}. \tag{4.17}$$

The next result provides an asymptotic expression for this variance.

Theorem 4.2.1 Let $G$, $V$ and $S_h^c$ be defined as in equations (2.14), (2.20) and (3.1) and let $I$ be the $n \times n$ identity matrix. Under conditions (A0) and (A3) - (A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$,

$$Var(\widehat{\beta}_{I,S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{n+1}\, V^{-1} \Phi^{(0)} V^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}\, V^{-1} G^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T G\, V^{-1} + o_P\!\left(\frac{1}{n}\right), \tag{4.18}$$

where $\Phi^{(0)}$ is defined in equation (2.9) and $\Psi$ is the error correlation matrix.

Comment 4.2.1 From equation (4.17), $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ depends upon the error correlation matrix $\Psi$, so we expect the asymptotic approximation of $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ to also depend upon the correlation structure of the model errors. Indeed, result (4.18) of Theorem 4.2.1 shows that, for large samples, the first term in the asymptotic expression of $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ depends on $\Psi$ indirectly via the limiting value $\Phi^{(0)}$ of $\eta^T \Psi \eta/(n+1)$, while the second term depends on $\Psi$ directly.

Comment 4.2.2 By Lemma 4.6.12, the second term in (4.18) is at most $O(1/n)$. Therefore, $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ has a rate of convergence of $1/n$.

Proof of Theorem 4.2.1:

From (4.6), $B_{n,I} = X^T (I - S_h^c) X/(n+1) = V + o_P(1)$, so $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ in (4.17) can be written as:

$$Var(\widehat{\beta}_{I,S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{n+1}\, B_{n,I}^{-1} \cdot C_{n,I} \cdot (B_{n,I}^{-1})^T, \tag{4.19}$$

where $C_{n,I} = X^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T X/(n+1)$. The dependence of $C_{n,I}$ upon $h$ is omitted for convenience.

To establish (4.18), it suffices to show that $C_{n,I}$ satisfies:

$$C_{n,I} = \Phi^{(0)} + \frac{1}{n+1}\, G^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T G + o_P(1). \tag{4.20}$$

Using $X = G + \eta$ (equation (2.16)), $C_{n,I}$ can be decomposed as:

$$C_{n,I} = \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T G + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T \eta + \frac{1}{n+1} \eta^T (I - S_h^c) \Psi (I - S_h^c)^T G + \frac{1}{n+1} \eta^T (I - S_h^c) \Psi (I - S_h^c)^T \eta.$$

Expanding the last term and re-arranging yields:

$$C_{n,I} = \frac{1}{n+1} \eta^T \Psi \eta + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T G + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T \eta + \frac{1}{n+1} \eta^T (I - S_h^c) \Psi (I - S_h^c)^T G - \frac{1}{n+1} \eta^T \Psi S_h^{cT} \eta - \frac{1}{n+1} \eta^T S_h^c \Psi \eta + \frac{1}{n+1} \eta^T S_h^c \Psi S_h^{cT} \eta. \tag{4.21}$$

The first term, $\eta^T \Psi \eta/(n+1)$, converges in probability to $\Phi^{(0)}$ by condition (A1)-(iii). We now show that all the other terms, except for the second, are $o_P(1)$. It suffices to show that $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1}/(n+1)$, $\eta_{i+1}^T \Psi S_h^{cT} \eta_{j+1}/(n+1)$ and $\eta_{i+1}^T S_h^c \Psi S_h^{cT} \eta_{j+1}/(n+1)$ are $o_P(1)$; these facts follow from lemmas appearing in the Appendix of this chapter.

First consider $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1}/(n+1)$. Using Lemma 4.6.4 with $\xi = \eta_{j+1}$ and $c = (I - S_h^c) \Psi (I - S_h^c)^T G_{i+1}$, as well as properties of vector and matrix norms from Section 2.4 of Chapter 2, we obtain:

$$\frac{1}{n+1} G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1} = \frac{1}{n+1}\, O_P\!\left( \|(I - S_h^c) \Psi (I - S_h^c)^T G_{i+1}\|_2 \right) = \frac{1}{n+1}\, O_P\!\left( (1 + \|S_h^c\|_F) \cdot \|\Psi\|_S \cdot \|(I - S_h^c)^T G_{i+1}\|_2 \right) = O_P(n^{-1/2} h^{-1/2}).$$

The last equality was derived by using that $\|S_h^c\|_F$ is $O(h^{-1/2})$ by result (4.54) of Lemma 4.6.7, $\|\Psi\|_S$ is $O(1)$ by assumption (A1)-(ii), and $\|(I - S_h^c)^T G_{i+1}\|_2$ is $O(n^{1/2})$ by result (4.53) of Lemma 4.6.7 with $r = G_{i+1}$. We conclude that $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1}/(n+1)$ is $o_P(1)$. Note that Lemma 4.6.4 invoked earlier holds trivially for $\xi = \eta_1$, as $\eta_1 = 0$ by definition.

Next consider $\eta_{i+1}^T \Psi S_h^{cT} \eta_{j+1}/(n+1)$ and $\eta_{i+1}^T S_h^c \Psi S_h^{cT} \eta_{j+1}/(n+1)$. When $i = 0$ or $j = 0$, these quantities are 0, so consider $i, j = 1, \ldots, p$. By result (4.63) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T \Psi S_h^{cT} \eta_{j+1}/(n+1)$ is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. By result (4.64) of the same lemma with $\xi^* = \eta_{i+1}$, $\Omega = I$, $\Omega^* = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T S_h^c \Psi S_h^{cT} \eta_{j+1}/(n+1)$ is $O_P(n^{-1} h^{-1}) = o_P(1)$.

Combining these results in (4.21) yields (4.20). This concludes our proof of Theorem 4.2.1.

We now provide an alternative expression for the asymptotic conditional variance of $\widehat{\beta}_{I,S_h^c}$ which will shed more light on the effect of the correlation between the linear and non-linear variables in model (2.1) on this variance.

Corollary 4.2.1 Let $G$ be as in (2.14) and $\Phi^{(0)}$ be as in (2.9). Set

$$\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} = G^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T G, \tag{4.22}$$

where $G_{11}$ is a scalar, $G_{12} = G_{21}^T$ is a $1 \times p$ vector and $G_{22}$ is a $p \times p$ matrix. Also, set:

$$\Phi^{(0)} = \begin{pmatrix} \Phi_{11}^{(0)} & \Phi_{12}^{(0)} \\ \Phi_{21}^{(0)} & \Phi_{22}^{(0)} \end{pmatrix}, \tag{4.23}$$

where $\Phi_{11}^{(0)} = 0$, $\Phi_{12}^{(0)} = (\Phi_{21}^{(0)})^T = 0$ is a $1 \times p$ vector, and $\Phi_{22}^{(0)}$ is a $p \times p$ matrix. If $X$ and $Z$ are as in Corollary 4.1.1 and the assumptions in Theorem 4.2.1 hold, then our previous variance expression can be re-written in terms of $X$ and $Z$:

$$Var(\widehat{\beta}_{0,I,S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{n+1}\, E(X \mid Z)^T Var(X \mid Z)^{-1} \Phi_{22}^{(0)} Var(X \mid Z)^{-1} E(X \mid Z)$$
$$+ \frac{\sigma_\epsilon^2}{(n+1)^2} \Big\{ G_{11} \left( 1 + E(X \mid Z)^T Var(X \mid Z)^{-1} E(X \mid Z) \right)^2 - 2\, G_{12} Var(X \mid Z)^{-1} E(X \mid Z)$$
$$- 2\, E(X \mid Z)^T Var(X \mid Z)^{-1} E(X \mid Z)\, G_{12} Var(X \mid Z)^{-1} E(X \mid Z) + E(X \mid Z)^T Var(X \mid Z)^{-1} G_{22} Var(X \mid Z)^{-1} E(X \mid Z) \Big\} + o_P\!\left(\frac{1}{n}\right) \tag{4.24}$$

and

$$Var\!\left( (\widehat{\beta}_{1,I,S_h^c}, \ldots, \widehat{\beta}_{p,I,S_h^c})^T \mid X, Z \right) = \frac{\sigma_\epsilon^2}{n+1}\, Var(X \mid Z)^{-1} \Phi_{22}^{(0)} Var(X \mid Z)^{-1}$$
$$+ \frac{\sigma_\epsilon^2}{(n+1)^2}\, Var(X \mid Z)^{-1} \left[ G_{22} - 2\, E(X \mid Z)\, G_{12} + G_{11}\, E(X \mid Z) E(X \mid Z)^T \right] Var(X \mid Z)^{-1} + o_P\!\left(\frac{1}{n}\right). \tag{4.25}$$

Proof:

Let $a = \left( \int_0^1 g_1(z) f(z)\,dz, \ldots, \int_0^1 g_p(z) f(z)\,dz \right)^T$ be as in Lemma 4.6.11 and $\Sigma = (\Sigma_{ij})$ be the variance-covariance matrix introduced in condition (A0)-(ii). Substituting the explicit expression for $V^{-1}$ (result (4.68), Lemma 4.6.11) into (4.18) yields:

$$Var(\widehat{\beta}_{I,S_h^c} \mid X, Z) = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} + o_P\!\left(\frac{1}{n}\right),$$

where $V_{11}$ is a scalar, $V_{12} = V_{21}^T$ is a $1 \times p$ vector and $V_{22}$ is a $p \times p$ matrix given by:

$$V_{11} = \frac{\sigma_\epsilon^2}{n+1}\, a^T \Sigma^{-1} \Phi_{22}^{(0)} \Sigma^{-1} a + \frac{\sigma_\epsilon^2}{(n+1)^2} \left\{ G_{11} (1 + a^T \Sigma^{-1} a)^2 - 2\, G_{12} \Sigma^{-1} a - 2\, a^T \Sigma^{-1} a\, G_{12} \Sigma^{-1} a + a^T \Sigma^{-1} G_{22} \Sigma^{-1} a \right\}, \tag{4.26}$$

$$V_{21} = -\frac{\sigma_\epsilon^2}{n+1}\, \Sigma^{-1} \Phi_{22}^{(0)} \Sigma^{-1} a + \frac{\sigma_\epsilon^2}{(n+1)^2} \left\{ -(1 + a^T \Sigma^{-1} a)\, \Sigma^{-1} a\, G_{11} + \Sigma^{-1} a\, G_{12} \Sigma^{-1} a + (1 + a^T \Sigma^{-1} a)\, \Sigma^{-1} G_{12}^T - \Sigma^{-1} G_{22} \Sigma^{-1} a \right\} \tag{4.27}$$

and

$$V_{22} = \frac{\sigma_\epsilon^2}{n+1}\, \Sigma^{-1} \Phi_{22}^{(0)} \Sigma^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}\, \Sigma^{-1} \left\{ G_{22} - 2\, a\, G_{12} + G_{11}\, a a^T \right\} \Sigma^{-1}. \tag{4.28}$$

Results (4.24) and (4.25) follow from (4.26) and (4.28), respectively, since $a = E(X \mid Z)$ and $\Sigma = Var(X \mid Z)$.

Result (4.25) of Corollary 4.2.1 shows that the effect of the correlation between the linear variables and the non-linear variable in model (2.1) on the asymptotic variances of the local linear backfitting estimator of the linear effects $\beta_1, \ldots, \beta_p$ is through the conditional variance-covariance matrix $Var(X \mid Z)$, the conditional mean vector $E(X \mid Z)$ and the matrices $G_{11}$, $G_{12}$, $G_{22}$ in (4.22).

Comment 4.2.3 In the case $\Psi = I$, $\eta^T \Psi \eta/(n+1) = \eta^T \eta/(n+1) = \Sigma^{(0)} + o_P(1)$ by result (4.9), with $\Sigma^{(0)}$ as in (2.15). Therefore, $\Phi_{22}^{(0)} = \Sigma = Var(X \mid Z)$. If we also assume, as Opsomer and Ruppert (1999) do, that $E(X \mid Z) = 0$, then (4.25) becomes:

$$Var\!\left( (\widehat{\beta}_{1,I,S_h^c}, \ldots, \widehat{\beta}_{p,I,S_h^c})^T \mid X, Z \right) = \frac{\sigma_\epsilon^2}{n+1}\, Var(X \mid Z)^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2}\, Var(X \mid Z)^{-1} G_{22} Var(X \mid Z)^{-1} + o_P\!\left(\frac{1}{n}\right). \tag{4.29}$$

Recall that these authors also used different conditions on the rate of convergence of the smoothing parameter $h$ and the design points $Z_i$, $i = 1, \ldots, n$. Namely, they allowed $h$ to converge to zero at a rate slower than ours by assuming $nh \to \infty$ instead of $nh^3 \to \infty$, and they assumed the design points $Z_i$, $i = 1, \ldots, n$, to be random instead of fixed.

The asymptotic variance expression derived by Opsomer and Ruppert (1999, Theorem 1) is $(\sigma_\epsilon^2/n) \cdot \{E(Var(X \mid Z))\}^{-1} + o_P(h^2/n + 1/(n^2 h))$. The leading term in this variance expression is $(\sigma_\epsilon^2/n) \cdot \{E(Var(X \mid Z))\}^{-1}$, a slight modification of our first term in (4.29) which accounts for the randomness of the $Z_i$'s. The rate of the error associated with their asymptotic variance approximation is $o_P(h^2/n + 1/(n^2 h))$ and is possibly of smaller order than the second term in (4.29), known to be at most $O_P(1/n)$ by result (4.69) of Lemma 4.6.12 (Appendix, Chapter 4) with $\Psi = I$.

4.3 Exact Conditional Measure of Accuracy of $\widehat{\beta}_{I,S_h^c}$ given $X$ and $Z$

Because $\widehat{\beta}_{I,S_h^c}$ is generally a biased estimator of $\beta$ for finite samples, any suitable criterion for measuring the accuracy of this estimator should take into account both bias and variance. A natural way to take both effects into account is to consider

$$E\!\left( \|\widehat{\beta}_{I,S_h^c} - \beta\|_2^2 \mid X, Z \right) = \{E(\widehat{\beta}_{I,S_h^c} \mid X, Z) - \beta\}^T \{E(\widehat{\beta}_{I,S_h^c} \mid X, Z) - \beta\} + trace\{Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)\}. \tag{4.30}$$

Using the above equality, which follows from (2.24), and the asymptotic expressions for $E(\widehat{\beta}_{I,S_h^c} \mid X, Z) - \beta$ and $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ in Theorems 4.1.1 and 4.2.1, we obtain:

Corollary 4.3.1 Assume that the conditions in Theorem 4.1.1 and Theorem 4.2.1 hold. Then:

$$E\!\left( \|\widehat{\beta}_{I,S_h^c} - \beta\|_2^2 \mid X, Z \right) = h^4\, W^T V^{-2} W + \frac{\sigma_\epsilon^2}{n+1}\, trace\{V^{-1} \Phi^{(0)} V^{-1}\} + \frac{\sigma_\epsilon^2}{(n+1)^2}\, trace\{V^{-1} G^T (I - S_h^c)\, \Psi\, (I - S_h^c)^T G\, V^{-1}\} + o_P(h^4) + o_P\!\left(\frac{1}{n}\right). \tag{4.31}$$

4.4 The $\sqrt{n}$-consistency of $\widehat{\beta}_{I,S_h^c}$

For obvious reasons, we would like the estimator $\widehat{\beta}_{I,S_h^c}$ to have the 'usual' parametric rate of convergence of $1/n$ - the rate that would be achieved if $m$ were known, given $X$ and $Z$. If $\widehat{\beta}_{I,S_h^c}$ has this rate of convergence, we say that it is $\sqrt{n}$-consistent. A sufficient condition for $\widehat{\beta}_{I,S_h^c}$ to be $\sqrt{n}$-consistent given $X$ and $Z$ is for $E(\|\widehat{\beta}_{I,S_h^c} - \beta\|_2^2 \mid X, Z)$ to be $O_P(n^{-1})$.

By result (4.31) in Corollary 4.3.1, $E(\|\widehat{\beta}_{I,S_h^c} - \beta\|_2^2 \mid X, Z)$ is $O_P(h^4) + O_P(n^{-1})$. This result is due to the fact that the conditional bias of $\widehat{\beta}_{I,S_h^c}$ is $O_P(h^2)$, while its conditional variance is $O_P(n^{-1})$. For $E(\|\widehat{\beta}_{I,S_h^c} - \beta\|_2^2 \mid X, Z)$ to be $O_P(n^{-1})$, we require $h^4 = O(n^{-1})$, as well as $h \to 0$ and $nh^3 \to \infty$.
To understand the meaning of the above conditions, let us consider that $h = n^{-\alpha}$. For $h \to 0$, we require $\alpha > 0$. Also, for $nh^3 \to \infty$, we require $1 - 3\alpha > 0$. Finally, we want $h^4 = n^{-4\alpha} = O(n^{-1})$, so $\alpha \ge 1/4$. Thus, we require $\alpha \in [1/4, 1/3)$. In summary, $\widehat{\beta}_{I,S_h^c}$ achieves $\sqrt{n}$-consistency for $h = n^{-\alpha}$, with $\alpha \in [1/4, 1/3)$.

We argue that $\widehat{\beta}_{I,S_h^c}$ computed with an $h$ optimal for estimating $m$ is consistent, but not $\sqrt{n}$-consistent, given $X$ and $Z$. We argue this by finding the amount of smoothing $h$ that is optimal for estimating $m(Z)$ via the local linear backfitting estimator $\widehat{m}_{I,S_h^c}(Z)$ where, for $Z \in [0,1]$ fixed, $\widehat{m}_{I,S_h^c}(Z)$ is defined in (4.32) as a sum over $i = 1, \ldots, n$ involving the local linear weights

$$w_i^{(Z)} = K\!\left( \frac{Z - Z_i}{h} \right) \left[ S_{n,2}(Z) - (Z - Z_i)\, S_{n,1}(Z) \right]. \tag{4.33}$$

Here, $S_{n,l}(Z)$, $l = 1, 2$, is as in (3.8), $K$ is a kernel function satisfying condition (A5) and the $Z_i$'s are design points satisfying condition (A3).

We define the optimal $h$ for estimating $m(Z)$ via $\widehat{m}_{I,S_h^c}(Z)$ as:

$$h_{AMSE} = \arg\min_h\, AMSE\!\left( \widehat{m}_{I,S_h^c}(Z) \mid X, Z \right),$$

with $AMSE(\widehat{m}_{I,S_h^c}(Z) \mid X, Z)$ being an asymptotic approximation to the exact conditional mean squared error of $\widehat{m}_{I,S_h^c}(Z)$ given $X$ and $Z$:

$$MSE\!\left( \widehat{m}_{I,S_h^c}(Z) \mid X, Z \right) = E\!\left\{ \left( \widehat{m}_{I,S_h^c}(Z) - m(Z) \right)^2 \mid X, Z \right\}.$$

To find the order of $AMSE(\widehat{m}_{I,S_h^c}(Z) \mid X, Z)$, and hence $h_{AMSE}$, note that:

$$MSE\!\left( \widehat{m}_{I,S_h^c}(Z) \mid X, Z \right) = \left\{ E\!\left( \widehat{m}_{I,S_h^c}(Z) \mid X, Z \right) - m(Z) \right\}^2 + Var\!\left( \widehat{m}_{I,S_h^c}(Z) \mid X, Z \right).$$

By results (4.73) and (4.74) of Lemma 4.6.13, the first term is $O_P(h^4)$ and the second term is $O_P(1/(nh))$, so $MSE(\widehat{m}_{I,S_h^c}(Z) \mid X, Z)$ is $O_P(h^4 + 1/(nh))$. Therefore, $AMSE(\widehat{m}_{I,S_h^c}(Z) \mid X, Z)$ is $O_P(h^4 + 1/(nh))$, and the $h$ that minimizes it satisfies $h_{AMSE} = \Theta(n^{-1/5})$.

For $h = h_{AMSE}$, the estimator $\widehat{\beta}_{I,S_h^c}$ has conditional bias of order $O_P(n^{-2/5})$ and conditional variance of order $O_P(n^{-1})$. Thus, $\widehat{\beta}_{I,S_h^c}$ is consistent but not $\sqrt{n}$-consistent given $X$ and $Z$, as its squared conditional bias asymptotically dominates its conditional variance.
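To make the rate $n^{-1/5}$ concrete, suppose, purely for illustration, that the asymptotic mean squared error takes the form $C_1 h^4 + C_2/(nh)$ for positive constants $C_1$ and $C_2$ that do not depend on $h$ or $n$ (the constants are hypothetical placeholders; only the orders $h^4$ and $1/(nh)$ come from the argument above). Minimizing over $h$ then gives:

```latex
\begin{align*}
\frac{d}{dh}\left\{ C_1 h^4 + \frac{C_2}{nh} \right\}
  = 4 C_1 h^3 - \frac{C_2}{n h^2} = 0
  &\iff h^5 = \frac{C_2}{4 C_1 n} \\
  &\iff h = \left( \frac{C_2}{4 C_1} \right)^{1/5} n^{-1/5}.
\end{align*}
```

With this choice, $h^4$ is of exact order $n^{-4/5}$, which decays more slowly than $n^{-1}$; this is the sense in which a squared bias of order $h^4$ outweighs a variance of order $n^{-1}$.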
However, for $h = n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, the squared conditional bias of $\widehat{\beta}_{I,S_h^c}$ will no longer dominate its conditional variance asymptotically, ensuring that $\widehat{\beta}_{I,S_h^c}$ achieves $\sqrt{n}$-consistency given $X$ and $Z$. Note that the estimator $\widehat{m}_{I,S_h^c}(Z)$ of $m(Z)$ computed with $h = n^{-\alpha}$, $\alpha \in [1/4, 1/3)$, is 'undersmoothed' relative to that computed with $h = h_{AMSE}$, since $n^{-\alpha} < n^{-1/5}$.

4.5 Generalization to Local Polynomials of Higher Degree

The asymptotic results in this chapter focus on the local linear backfitting estimator $\widehat{\beta}_{I,S_h^c}$. A natural question that arises is whether these results generalize to the local polynomial backfitting estimator of $\beta$. The latter estimator is obtained from (4.1) by replacing $S_h^c$, the smoother matrix for locally linear regression, with the smoother matrix for locally polynomial regression of degree $D > 1$. See Chapter 3 in Fan and Gijbels (1996) for a definition of locally polynomial regression.

Recall that $\widehat{\beta}_{I,S_h^c}$ has conditional bias of order $O_P(h^2)$ and conditional variance of order $O_P(n^{-1})$ by Theorems 4.1.1 and 4.2.1. In keeping with the locally polynomial regression literature, we conjecture that the local polynomial backfitting estimator of $\beta$ has conditional bias of order $O_P(h^{D+1})$ and conditional variance of order $O_P(n^{-1})$. Note that we may need boundary corrections if $D$ is even.

If our conjecture holds, we see that the conditional variance of the local polynomial backfitting estimator of $\beta$ is of the same order as that of $\widehat{\beta}_{I,S_h^c}$. However, the conditional bias of the local polynomial backfitting estimator of $\beta$ is of smaller order than that of $\widehat{\beta}_{I,S_h^c}$.

In Section 4.4 we established that $\widehat{\beta}_{I,S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$ provided $h$ converges to zero at rate $n^{-\alpha}$, $\alpha \in [1/4, 1/3)$. To ensure that the local polynomial backfitting estimator of $\beta$ is $\sqrt{n}$-consistent given $X$ and $Z$, we conjecture that $h$ should converge to zero at rate $n^{-\alpha}$, $\alpha \in [1/(2D+2), 1/3)$.
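The bookkeeping behind these exponent ranges can be sketched in a few lines. The helper names below are hypothetical and introduced only for illustration; the code simply encodes the interval $[1/(2D+2), 1/3)$ for $h = n^{-\alpha}$, which is established in Section 4.4 for $D = 1$ and is only conjectured for $D > 1$.

```python
from fractions import Fraction


def admissible_exponents(degree=1):
    """Return (lower, upper) such that h = n^(-alpha) with
    lower <= alpha < upper is (conjecturally, for degree > 1)
    compatible with root-n consistency of the linear-effect estimator."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return Fraction(1, 2 * degree + 2), Fraction(1, 3)


def is_root_n_rate(alpha, degree=1):
    """Check alpha against the interval [1/(2D+2), 1/3)."""
    lower, upper = admissible_exponents(degree)
    return lower <= Fraction(alpha) < upper


def squared_bias_exponent(alpha, degree=1):
    """Exponent e such that the squared conditional bias,
    of order h^(2(D+1)), is of order n^(-e) when h = n^(-alpha)."""
    return 2 * (degree + 1) * Fraction(alpha)


# Local linear case (D = 1): compare the AMSE-optimal rate alpha = 1/5
# with a rate inside the admissible interval.
for alpha in (Fraction(1, 5), Fraction(1, 4), Fraction(3, 10)):
    print(alpha, is_root_n_rate(alpha), squared_bias_exponent(alpha))
```

A squared-bias exponent of at least 1 means the squared bias is no larger in order than the $n^{-1}$ variance; for $\alpha = 1/5$ and $D = 1$ the exponent is $4/5$, which is why the rate that is optimal for estimating $m$ falls outside the interval.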
4.6 Appendix

Throughout this Appendix, the assumptions and notation introduced in Chapter 2 of this thesis hold, unless otherwise specified. The first result provides an asymptotic bias expression that will be useful for proving subsequent results.

Lemma 4.6.1 Let $S_h = (S_{ij})$ be the uncentered smoother matrix defined by equations (3.6)-(3.8) and $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$. Let $r = (r(Z_1), \ldots, r(Z_n))^T$, where $r(\cdot) : [0,1] \to R$ is a smooth function having three continuous derivatives and the $Z_i$'s are fixed design points satisfying condition (A3). Furthermore, let $K$ be a kernel function satisfying condition (A5) whose moments $\nu_l(K, z, h)$, $z \in [0,1]$, $l = 0, 1, 2, 3$, are defined as in (2.17). If $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, then the $j$th element of the vector $(S_h - I) r$ can be approximated as:

$$[(S_h - I) r]_j = B_r(K, Z_j, h) \cdot h^2 + o(h^2) \tag{4.34}$$

uniformly in $Z_j$, $j = 1, \ldots, n$, where

$$B_r(K, z, h) = \frac{r''(z)}{2} \cdot \frac{\nu_2(K, z, h)^2 - \nu_1(K, z, h)\, \nu_3(K, z, h)}{\nu_2(K, z, h)\, \nu_0(K, z, h) - \nu_1(K, z, h)^2}, \quad z \in [0,1]. \tag{4.35}$$

Furthermore, if $r^T \mathbf{1} = 0$, then the $j$th element of the vector $(S_h^c - I) r$ can be approximated as:

$$[(S_h^c - I) r]_j = B_r(K, Z_j, h) \cdot h^2 - \left( \frac{1}{n} \sum_{k=1}^{n} B_r(K, Z_k, h) \right) \cdot h^2 + o(h^2). \tag{4.36}$$

Proof:

For $i = 1, \ldots, n$, let $y_i = r(Z_i) + \epsilon_i$, with the $\epsilon_i$'s independent, identically distributed random variables with mean 0 and standard deviation $\sigma_\epsilon \in (0, \infty)$. Set $y = (y_1, \ldots, y_n)^T$; if $\widehat{r}(Z_j) = [S_h y]_j$ is the local linear estimator of $r(Z_j)$ obtained by smoothing $y$ on $Z_1, \ldots, Z_n$ via the local linear smoother matrix $S_h$, then $Bias(\widehat{r}(Z_j)) = [(S_h - I) r]_j$. Standard results on the asymptotic bias of a local linear estimator yield that $Bias(\widehat{r}(Z_j))$ is of order $h^2$, with asymptotic constant $B_r(K, Z_j, h)$, uniformly in $Z_j$, $j = 1, \ldots, n$ (Fan and Gijbels, 1993). So the proof of (4.34) is complete.

The definition of $S_h^c$ and $r^T \mathbf{1} = 0$ allow us to write:

$$[(S_h^c - I) r]_j = \left[ \left( \left( I - \frac{\mathbf{1}\mathbf{1}^T}{n} \right) S_h - I \right) r \right]_j = \left[ (S_h - I) r - \frac{1}{n} \mathbf{1}\mathbf{1}^T S_h r \right]_j = [(S_h - I) r]_j - \frac{1}{n} \mathbf{1}^T (S_h - I) r.$$

Substituting (4.34) in the above result yields (4.36).
The next result establishes the boundedness of a function defined in terms of certain moments of a kernel function $K(\cdot)$. Subsequent results rely on this lemma.

Lemma 4.6.2 Let $K(\cdot)$ be a kernel function satisfying condition (A5) and whose moments $\nu_l(K, z, h)$, $z \in [0,1]$, $l = 0, 1, 2, 3$, are defined as in (2.17). Then, for $h_0 \in [0, 1/2]$ small enough and $l = 1, 2, 3$, we have:

$$\sup_{h \in [0, h_0]}\ \sup_{z \in [0,1]} \left| \frac{\nu_l(K, z, h)}{\nu_2(K, z, h)\, \nu_0(K, z, h) - \nu_1(K, z, h)^2} \right| < \infty. \tag{4.37}$$

Proof:

For $z \in [0,1]$, we define the function:

$$\frac{\nu_l(K, z, h)}{\nu_2(K, z, h)\, \nu_0(K, z, h) - \nu_1(K, z, h)^2}. \tag{4.38}$$

To establish the desired result, it suffices to show that, for any $l = 1, 2, 3$, this function is bounded when restricted to the intervals $[h, 1 - h]$, $[0, h]$ and $[1 - h, 1]$, where $h \le h_0$ for some $h_0 \in [0, 1/2]$ small enough, and that the three bounds do not depend on $h$.

Let $l = 1, 2, 3$ be fixed and let $h \le h_0$ for some $h_0 \in [0, 1/2]$ small enough. The restriction of the function in (4.38) to the interval $[h, 1 - h]$ is trivially bounded, as $\nu_l(K, z, h) = \nu_l(K)$ for any $z \in [h, 1 - h]$. Clearly, the bound of this restriction does not depend on $h$.

To show that the restriction of this function to the interval $[0, h]$ is also bounded, let us note that, if $z \in [0, h]$, there exists $\alpha \in [0,1]$ such that $z = \alpha h$ and so

$$\nu_l(K, z, h) = \int_{-z/h}^{(1-z)/h} s^l K(s)\,ds = \int_{-\alpha}^{1/h - \alpha} s^l K(s)\,ds = \int_{-\alpha}^{1} s^l K(s)\,ds \equiv \phi_l(\alpha)$$

since $h \le h_0$. Thus, when restricted to the interval $[0, h]$, the function in (4.38) is equivalent to:

$$\frac{\phi_l(\alpha)}{\phi_0(\alpha)\, \phi_2(\alpha) - \phi_1(\alpha)^2} \equiv \frac{\phi_l(\alpha)}{D(\alpha)},$$

where $\alpha \in [0,1]$. To establish boundedness, it suffices to show that the numerator $\phi_l(\alpha)$ is bounded from above while the denominator $D(\alpha)$ is bounded from below for any $\alpha \in [0,1]$ and $l = 1, 2, 3$. To bound $\phi_l(\alpha)$, note that:

$$|\phi_l(\alpha)| \le \int |s^l|\, K(s)\,ds < \infty$$

since $K(\cdot)$ is a continuous function with compact support. To bound $|D(\cdot)|$ from below, we show that $D(\cdot)$ is non-decreasing on $[0,1]$ and satisfies $D(0) > 0$.
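Before the analytic argument, the two claims about $D(\cdot)$ can be checked numerically for one concrete kernel. The sketch below assumes the Epanechnikov kernel $K(s) = 0.75(1 - s^2)$ on $[-1, 1]$, which is an illustrative choice and not necessarily the kernel used in the thesis; the function names are likewise hypothetical. The integrals $\phi_l(\alpha) = \int_{-\alpha}^{1} s^l K(s)\,ds$ are evaluated in closed form.

```python
import numpy as np


def phi(l, alpha):
    """phi_l(alpha): integral of s^l K(s) over [-alpha, 1] for the
    Epanechnikov kernel K(s) = 0.75 * (1 - s^2), using its antiderivative."""
    lower = -np.asarray(alpha, dtype=float)

    def antiderivative(s):
        return 0.75 * (s ** (l + 1) / (l + 1) - s ** (l + 3) / (l + 3))

    return antiderivative(1.0) - antiderivative(lower)


def D(alpha):
    """D(alpha) = phi_0(alpha) * phi_2(alpha) - phi_1(alpha)^2."""
    return phi(0, alpha) * phi(2, alpha) - phi(1, alpha) ** 2


alphas = np.linspace(0.0, 1.0, 201)
values = D(alphas)
print("D(0) =", values[0])
print("D(1) =", values[-1])
print("non-decreasing on the grid:", bool(np.all(np.diff(values) >= 0)))
```

For this kernel the grid values start at a strictly positive $D(0)$ and increase towards $D(1)$, in line with the two properties that the proof goes on to establish in general.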
As

$$D'(\alpha) = \phi_0'(\alpha) \cdot \phi_2(\alpha) + \phi_0(\alpha) \cdot \phi_2'(\alpha) - 2\, \phi_1(\alpha) \cdot \phi_1'(\alpha),$$

and

$$\phi_l'(\alpha) = \frac{d}{d\alpha} \int_{-\alpha}^{1} s^l K(s)\,ds = (-\alpha)^l K(-\alpha) = (-1)^l \alpha^l K(\alpha)$$

for any $l = 0, 1, 2$ (using Leibnitz's Rule and the symmetry of $K$), we obtain:

$$D'(\alpha) = K(\alpha) \left( \int_{-\alpha}^{1} s^2 K(s)\,ds + \alpha^2 \int_{-\alpha}^{1} K(s)\,ds + 2\alpha \int_{-\alpha}^{1} s K(s)\,ds \right).$$

Since $K$ is non-negative and symmetric about 0, each term above is non-negative and so $D'(\alpha) \ge 0$, that is $D(\cdot)$ is non-decreasing on $[0,1]$. Further, with $K^*(s)$ the density $K(s)/\int_0^1 K(s)\,ds = 2K(s)$, we obtain:

$$D(0) = \int_0^1 K(s)\,ds \cdot \int_0^1 s^2 K(s)\,ds - \left( \int_0^1 s K(s)\,ds \right)^2 = \frac{1}{4} \left\{ \int_0^1 s^2 K^*(s)\,ds - \left( \int_0^1 s K^*(s)\,ds \right)^2 \right\}.$$

Thus, $D(0) = Var(D^*)/4 > 0$, with $D^*$ a random variable with density $K^*$. Finally, note that the upper bound $\int |s^l| K(s)\,ds / D(0)$ of the function $\phi_l(\alpha)/D(\alpha)$, $\alpha \in [0,1]$, does not depend on $h$.

A similar argument can be employed to establish that, when $h \le h_0$, with $h_0 \in [0, 1/2]$, the restriction of the function defined in (4.38) to the interval $[1 - h, 1]$ is bounded.

Now, we use Lemma 4.6.1 and Lemma 4.6.2 to derive asymptotic expressions for the Euclidean norms of the biases which can occur when using locally linear regression to estimate a smooth, unknown function $r(\cdot)$.

Lemma 4.6.3 Let $r$, $S_h$ and $S_h^c$ be as in Lemma 4.6.1. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:

$$\frac{1}{n+1} \|(I - S_h) r\|_2^2 = \frac{\nu_2(K)^2}{4} \int_0^1 r''(z)^2 f(z)\,dz \cdot h^4 + o(h^4). \tag{4.39}$$

If $r$ also satisfies $\mathbf{1}^T r = 0$, then:

$$\frac{1}{n+1} \|(I - S_h^c) r\|_2^2 = \frac{\nu_2(K)^2}{4} \left\{ \int_0^1 r''(z)^2 f(z)\,dz - \left( \int_0^1 r''(z) f(z)\,dz \right)^2 \right\} \cdot h^4 + o(h^4). \tag{4.40}$$

Proof:

To establish (4.39), use Lemma 4.6.1 to get:

$$\frac{1}{n+1} \|(I - S_h) r\|_2^2 = \frac{1}{n+1} \sum_{j=1}^{n} \left[ B_r(K, Z_j, h) \cdot h^2 + o(h^2) \right]^2 = \frac{1}{n+1} \sum_{j=1}^{n} B_r(K, Z_j, h)^2 \cdot h^4 + o(h^4). \tag{4.41}$$

The last equality uses the boundedness of $B_r(K, z, h)$ for all $z \in [0,1]$ and $h \le h_0$, with $h_0 \in [0, 1/2]$ small enough, which is a consequence of Lemma 4.6.2 and the boundedness of $r''(\cdot)$. Now, we use $B_r(K, z, h) = r''(z)\, \nu_2(K)/2$
for z G [h, 1 - h] to write: 2 jf^ , ' Zji\h,l-h] V j ; B (K,Zj,h) . 2 f r l Q r"(z) f(z)dz 2 + o ( l ) by a R i e m a n n T h e second term is o ( l ) , as the s u m contains 0(nh) terms and r"(z) is bounded for z £ [h, 1 — /i]. T h e t h i r d term is also o ( l ) , as the s u m contains 57 0(nh) terms that have been shown to be bounded for h small enough. Combining these results yields (4.39). To establish (4.40), we use the fact that S c = (I — ll /n)S (equation (3.1)) and T h h l r — 0 to obtain: T ^\\(I-Si)r\\l J2[(Si-I)r] . = 2 Substituting (4.36) in the above yields (4.40). The following result provides a probability bound for a linear combination of independent, identically distributed random variables having zero mean and non-zero, finite variance. Lemma 4.6.4 Let £ = ... ,£„) be a vector whose components are independent and T identically distributed real-valued random variables. If E(£i) = 0 and 0 < Var(£\) < oo, then: ?c = 0 (\\c\\ ) P (4.42) 2 for any real-valued vector c — ( c i , . . . Proof: By Chebychev's Theorem, we have: " ec = El I J2 ^kI \+OP c fc=i J / " , VarI <J2 ^k VN 58 c ^ T h e next lemma provides asymptotic approximations for the elements 5 y , i, j = 1 , . . . , n, of the local linear smoother m a t r i x Sh defined i n (3.6)-(3.8). These approximations are used to obtain uniform bounds for the elements of Sh- L e m m a 4.6.5 (3.6)-(3.8). Let Sij, i,j = l,...,n, Also, let K(-) be local linear smoothing weights defined as in and v (K,z,h), z G [0,1], I = 0 , 1 , 2 , as in Lemma 4.6.2. t Furthermore, let Z i = 1 , . . . ,n, be design points with density function /(•) satisfying i% condition (A3). Then, if n —> co, h —» 0 and nh —> co, we have: 3 s 1 v (K,Zi,h)-^v (K,Zi,h) f(Zi)(n + l)h ' v {K, Z h)u {K, Z h) - (K, 2 l] x 2 u uniformly in Zi, i = 1,... ,n. 
0 h (Zi-Zj h K Z hf ' Vl \ u Furthermore, for all h < h , with ho G [0,1/2] small 0 enough, there exists a positive constant C so that: \ S « \ * J ^ - W i - Z i \ Z h ) uniformly in Z and Zj, i,j = (4.44) l,...,n. t Proof: U s i n g the definition of S^ i n (3.6)-(3.8) and the fact that ] T " S ,i(Zi) , 2 n = 1 wf = S , (Zi)S (Zi) n 2 nfi we write: (n + DhS- ~ ( + l)hS (Zj) S ,2(Zi)S fl(Zi) — S i(Zi) (n + l)h S (Zj) (Zi - Zj \ h (Zj-Zj\ n n:2 2 n n ni 2 ntl s , {Zi)s o{Zi) - s {ZiY K n 2 n> ntl \ h (Zi- ){ Zj h )• ( 4 4 5 ) Let I = 0 , 1 , 2, 3 be fixed. B y the definition of «?„,/(•) i n (3.8), the design condition (A3) on the Zj's and a R i e m a n n integration argument, we obtain that the following asymptotic 59 expression for S i(Zi)/[(n + l)h ]: l+1 n< Zj — Zj^ f Zi — Zj\ (n + 1)^+1 ' S n l { Z i ) ~ ( +l ) h ^ n 3= K { h J \ h 4iX^)(^)' ^ "~' ~ /( holds uniformly w i t h respect to Z i +0( A 2) — 1,..., n, as n —> oo, h — > 0 and nh —> oo. 3 it M a k i n g the change of variables s = (Zi — z)/h a n d using a Taylor series expansion of s jf * (^) w =/rr *•* / ( • ) , we express the leading term i n the above asymptotic expression as: r(l-Zi)/h J s K(s) l Zilh f(Zi) + f'(Zi) • (sh) + -^- • (sh) + o(h ) ds f r(i-Zi)/h 2 2 (i-Zi)/h f = / s K(s) [f(Zi) + 0(h)] ds = f(Zi) / s K(s)ds + 0(h) l J-Zi/h l J-Zi/h = f(Z )v (K,Z h) i (s)m+sh)ds l + 0(h) u Here, the O term holds uniformly w i t h respect to Zi,i = l , . . . , n b y the smoothness assumptions on / ( • ) given i n condition (A3). C o m b i n i n g these results, we conclude that: (n + l ) / i ' + 1 < 5 "'' ( Z i ) = ( >( > f Zi l > ^ + °W K Zi + °( ~ ~ ) n lh 2 ( - 6) 4 4 uniformly i n Zi, i = 1,..., n, as n —> oo, h —> 0 and nh —> oo. 3 Now, for I = 0,1,2, 3, we substitute the asymptotic expression of S j(Zi)/[(n n + l)h ] i n l+1 (4.46) i n the right side of equation (4.45). 
Using that the quantities $f(z)$, $K(z)$ and $zK(z)$ are bounded for $z \in [0,1]$ (conditions (A3) and (A5), respectively) and re-arranging, we easily obtain (4.43). The asymptotic bound for $S_{ij}$ given in (4.44) follows immediately from Lemma 4.6.2 and (4.43).

The following result follows easily from Lemma 4.6.5. This result will be used to prove Lemma 4.6.7.

Lemma 4.6.6 Let $S_h$ be as in Lemma 4.6.1. Given $C > 0$, there exist $C_1^* > 0$ and $C_2^* > 0$ such that for any $n \ge 1$ and any $v = (v_1, \ldots, v_n)^T$ with $|v_j| \le C$, we have:

$$\left| [S_h^T v]_j \right| \le C_1^* \tag{4.47}$$

and

$$\left| [S_h v]_j \right| \le C_2^*. \tag{4.48}$$

Furthermore, we also have:

$$\|S_h^T v\|_2^2 \le n (C_1^*)^2 \tag{4.49}$$

and

$$\|S_h v\|_2^2 \le n (C_2^*)^2. \tag{4.50}$$

Proof:

Use result (4.44) of Lemma 4.6.5 to write:

$$\left| \sum_{j=1}^{n} S_{jk} v_j \right| \le \sum_{j=1}^{n} |S_{jk}|\, |v_j| \le C \sum_{j=1}^{n} |S_{jk}| \le \frac{C'}{(n+1)h} \cdot O(nh) = O(1)$$

for some positive constant $C'$. This proves (4.47). Result (4.48) can be derived using a similar reasoning.

By result (4.47), we have:

$$\|S_h^T v\|_2^2 = (S_h^T v)^T (S_h^T v) = \sum_{j=1}^{n} [S_h^T v]_j^2 \le n (C_1^*)^2,$$

so (4.49) is proven. Result (4.50) can be shown to hold in a similar manner.

Now, we use Lemmas 4.6.5 and 4.6.6 to establish the following asymptotic bounds.

Lemma 4.6.7 Let $r$, $S_h^c$ and $I$ be as in Lemma 4.6.1. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:

$$\|r\|_2 = O(n^{1/2}), \tag{4.51}$$

$$\|S_h^{cT} r\|_2 = O(n^{1/2}), \tag{4.52}$$

$$\|(I - S_h^c)^T r\|_2 = O(n^{1/2}), \tag{4.53}$$

and

$$\|S_h^c\|_F = O(h^{-1/2}). \tag{4.54}$$

Proof:

Using the boundedness of $r(\cdot)$, we write:

$$\|r\|_2^2 = r^T r = \sum_{i=1}^{n} r(Z_i)^2 = O(n),$$

so (4.51) is proven.

Using $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$ and result (4.49) of Lemma 4.6.6 with $v = (I - \mathbf{1}\mathbf{1}^T/n) r$, we have:

$$\|S_h^{cT} r\|_2^2 = \left\| S_h^T \left( I - \frac{\mathbf{1}\mathbf{1}^T}{n} \right) r \right\|_2^2 = \|S_h^T v\|_2^2 \le n \cdot (C_1^*)^2$$

for some $C_1^* > 0$ not depending on $n$. This proves (4.52).

Result (4.53) follows immediately from results (4.51) and (4.52).

Finally, to show result (4.54), we use well-known properties of the Frobenius norm to get:

$$\|S_h^c\|_F = \left\| \left( I - \frac{\mathbf{1}\mathbf{1}^T}{n} \right) S_h \right\|_F \le \|S_h\|_F + \frac{1}{n} \|\mathbf{1}\mathbf{1}^T\|_F \cdot \|S_h\|_F \le 2 \|S_h\|_F.$$

Thus, it suffices to show that $\|S_h\|_F$ is of order $O(h^{-1/2})$.
By result (4.44) of Lemma 4.6.5, we obtain:

$$\|S_h\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} S_{ij}^2 \le \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{C^2}{(n+1)^2 h^2}\, I(|Z_i - Z_j| \le h)$$

for some positive constant $C$. Since the number of non-zero terms in the double sum appearing on the right side of the above inequality is $n\, O(nh)$, we conclude that $\|S_h\|_F^2$ is $O(h^{-1})$ or, equivalently, that $\|S_h\|_F$ is $O(h^{-1/2})$.

The next result provides a probability bound for the Euclidean norm of a vector of $n$ independent, identically distributed random variables having zero mean and non-zero, finite variance. It also provides a probability bound for the Euclidean norm of a transformation of this vector, obtained by pre-multiplying the vector with the transpose of a centered local linear smoother matrix.

Lemma 4.6.8 Let $\xi$ be as in Lemma 4.6.4 and $S_h^c$ be as in Lemma 4.6.1. Furthermore, let $\Omega$ be an $n \times n$ symmetric, positive definite matrix with $\|\Omega\|_S = O(1)$. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:

$$\|\xi\|_2 = O_P(n^{1/2}), \tag{4.55}$$

$$\|S_h^{cT} \Omega\, \xi\|_2 = O_P(h^{-1/2}), \tag{4.56}$$

$$\|S_h^c\, \Omega\, \xi\|_2 = O_P(h^{-1/2}). \tag{4.57}$$

Proof:

By Markov's Theorem:

$$\|\xi\|_2^2 = O_P\!\left( E(\|\xi\|_2^2) \right) = O_P(n\, Var(\xi_1)) = O_P(n),$$

so (4.55) is proven.

Next, consider (4.56). Set $B = \Omega S_h^c$. By Markov's Theorem, we have:

$$\|S_h^{cT} \Omega\, \xi\|_2^2 = \|B^T \xi\|_2^2 = O_P\!\left( E(\|B^T \xi\|_2^2) \right) = O_P\!\left( E(\xi^T B B^T \xi) \right).$$

Thus, it suffices to show that $E(\xi^T B B^T \xi)$ is $O(h^{-1})$. Using result (2.24) with $u = \xi$ and $A = B B^T$, together with the symmetry of $\Omega$, we obtain:

$$E(\xi^T B B^T \xi) = trace\!\left( B B^T \cdot Var(\xi) \right) + E(\xi)^T \cdot B B^T \cdot E(\xi) = Var(\xi_1) \cdot trace(B B^T) + 0 = Var(\xi_1) \cdot \|B\|_F^2 \le Var(\xi_1) \cdot \|\Omega\|_S^2 \cdot \|S_h^c\|_F^2 = O(1)\, O(h^{-1}) = O(h^{-1}),$$

by result (4.54) of Lemma 4.6.7. This proves (4.56). Result (4.57) can be established using a similar argument.

The next lemma contains results concerning the asymptotic negligibility of various random or non-random terms. All of these terms depend on a matrix of weights $\Omega$ and on centered or uncentered local linear smoother matrices.
Some terms also depend on a matrix of weights fl*, possibly different than fl itself. Lemma 4.6.9 Let fl and fl* be n x n symmetric, positive-definite matrices satisfying \\fl\\ = 0(1) s = ||n*|| . LetS s and r* = (r*(Zi),... ,r*(Z )) , T n andS c h h be as in Lemma 4.6.1. Setr = (r(Zi),..., r{Z )) T n where r(-) : [0,1] -> R and r*(-) : [0,1] -> R are smooth functions having three continuous derivatives and the Zi's are fixed design points satisfying condition (A3). Finally, let £ = (£i,. • •, £ ) n T a n a > £* = (£!> • • • > £n) be vectors T whose components are independent, identically distributed random variables such that Efa) = 0, Varfa) < oo and E(£*) = 0, Var(£*) < oo . Then, if n -» oo, h -> 0 and 64 nh —> oo, we have: 3 1 -r* Sl(I-S )r n+ = 0(h ), (4.58) - I)r = 0(h% (4.59) 1 2 h -^±—^ mi (S T T h ^ r ^ n C J - S£)£ = Op (n-^h-^) , (4.60) ^r "(/ - Sl)r = O p l n " ^ ) , r 1 (4.61) 2 - L - £ * n S ^ = CMn- / /!- / ) T 1 2 1 (4.62) 2 - L ^ O S f * - Opin-Wh- '*). (4.63) 1 1 1 n - € * n ^ n * s f n« = o { - h- ) (4.64) C nSf (4.65) r n+ T +l l P 1 n Q*Sf Sl£ = Opin^h- ). 1 Proof: Using properties of matrix and vector norms introduced in Section 2.4 of Chapter 2, we get: • ||(I - 5 , ) r | | | ; ^ r * f i ( J - S )r\ < ^ | | r ' | | • r h 2 2 = ^rC»(n / )0(l)0(n / /i ) = C(/i ) 1 2 1 since ||r*|| is © ( n / ) by result (4.51) with r = r* and 1 2 2 2 - S )r\\ /(n 2 2 2 h 2 ||llT||F - + 1) is 0(h ) 4 by result (4.39). Thus, (4.58) holds. Similarly, we obtain: 1 r* mi (S n(n + 1) T T h - I)r - ; -n7r^l) (n+ ^ ( 0 n n l / 2 so (4.59) holds. 
65 l|r1|2 - ||n||s )°( ) ( ) ( 1 0 n 0 - n l / 2 h 2 11(5,1 ~ ) = °( )> h 2 I ) r | 1 2 Using result (4.42) with c = (I - S ) Clr* , we have: c T h 1 r* Cl(I - Si)£ = ^ O ( \ \ ( I n +1 n+ S ) rir*\\ ) T c < —Op n +1 ((1 + \\S%\\ ) • = l 2 T h P 2 • ||r*|| ) = n+1 F ^-Op(h- l )0 {l)Op{n}l ) l 2 2 2 P 0 {n- ' h- l ), l 2 P since \\S \\ is 0{h~ / ) by result (4.54) and ||r*|| is 0{n l ) c 1 h 2 by result (4.51) with l 2 F 2 r — r*. We conclude that (4.60) holds. From result (4.42) with c = ft(I - S )r and £ = £*, we get: c h n+ (||n(J - - S )r = C n(I 1 c T h T S )r\\ ) < c n +1 1 0 (l)Op{n l h ) n +1 l 2 h 2 = 2 P -L_0 m\\ n +1 P • ||(I - S )r\\ ) c s h 0 (n- h ), 1/2 2 P since ||(J - S )r\\ /{n + 1) is 0(h ) by result (4.40). Therefore, (4.61) holds. c 2 h A 2 To prove (4.62), write: 1 1 iiriuMu-11^112 n+1 1 0 {n ' ) • 0(1) • Op(h~ ' ) = n +1 <— n+1 1 2 1 2 P Op(n- h-^ ), 1/2 2 since ||£*|| is Op{n ' ) by result (4.55) with £ = £* and |]S££|| is 0 ( ^ / ) by result 1 2 1 2 2 2 P (4.57) with Q, = I. Result (4.63) follows via a similar argument, but with result (4.57) replaced by result (4.56). Result (4.64) follows by noting that: 1 1 < n + 1 Wnril2-l|n*|| -||s?n<*|| n +1 s 1 -Op(h-- ' )0(l)0 (h- ' ) 1 2 = 1 2 n+ P 2 Opin-'h- ), 1 since both ||S^ft£*||2 and ||S£ f2£|| are Op{h~ ) by Lemma 4.6.8. A similar reasoning T l/2 2 yields that (4.65) holds. This concludes our proof of the current lemma. 66 2 T h e next l e m m a provides asymptotic expressions for quantities involving the bias of a local linear estimator of an unknown, smooth regression function m(-). L e m m a 4 . 6 . 1 0 Let G be as in (2.14) and let m = (m(Zi),... ,m(Z )) , be as in Lemma 4-6.1. Furthermore, where m satisfies the smoothness conditions in condition T n (A4) and Z\,..., Z are fixed design points satisfying condition (A3). 
Then, if n —> oo, n h —> 0 and nh —> oo, we have: 3 - S )m = -h ^p2 -^—G {I T 71+1 — L ^ l l f g(z)m"(z)f(z)dz 1 h ( S - I)m = h ^p- J 2 h (4.66) 2 JQ I T + o(h ) g(z)f(z)dz 1 •£ m"(z)f(z)dz + o(h ) 2 (4.67) where g(z)f(z)dz and g(z)m"(z)ffz)dz are defined as in equations (2.18) and (2.19). Proof: Let i = 0 , 1 , . . . ,p, be fixed. B y result (4.34) of L e m m a 4.6.1 w i t h r = m, the (i + l ) element of G (I — Sh)m/(n + 1) is: T [^-Cril - S J m ] ^ = --±-± ,(Z,)[( g - I)m] Sk i=i 1 = -h L N o t i n g that B (K, m ™ ——^giiZ^BMZ^h)] 2 n + 1 z, h) = m"(z)v (K)/2 for z £ [h, 1 - h], we write: 2 = 3=1 ^LY g Z )rn''{Z ) j ^ 2(n+ % 1) 2 2 3=1 -L-j^g^B^K^^h) V +o(h ). J { E 9i(Z )m"(Z ) j j ' +- Zj$[h,l-h] l{ 0 ^ 9i(Zi)B {K,Z h). m Zj$[h,l-h] 67 j j=l it s t The first term can be shown to equal (u (K)/2)- J gi(z)m"(z)f(z)dz+o(l) by a Riemann X 2 Q integration argument. The second and third terms are o(l), as both sums contain 0(nh) terms and these terms are bounded for h small enough, by the following argument. The boundedness of m"(z) for z £ [h, 1 — h) is a consequence of condition (A4). Lemma 4.6.2 yields that the function z —> B (K, z, h) is bounded for all z G [0,1] and h < h with m 0 h G [0,1/2] small enough. Combining these results yields (4.66). 0 Now, consider (4.67). Since the first column of G is the vector 1, from (4.66): 1 (J - S )m/{n T h j + 1) = "(z)f(z)dz 1 m + o(h ). 2 Combining this with (4.10) proves (4.67). The next result concerns the existence of an inverse for the (p + 1) x (p + 1) matrix V defined in (2.20). We do not provide a proof for this result, as one can easily verify that VV _ 1 = V~ V = I using the expression for V 1 L e m m a 4.6.11 Let V = S ( 0 ) given below. - 1 + ftg{z)f{z)dz • f g(z) f(z)dz l T matrix introduced in (2.20) and set a = ( J g (z)f(z)dz, be the (p+ 1) x (p+ 1) • • •, JQ g (z)f(z)dz) . 1 T x p Also, let S = (Ejj) be the variance-covariance matrix introduced in condition (AO)-(ii). 
Then V - 1 exists and is given by: , V- 1 /1 + provided E r S = - 1 • V _ 1 o - S _ 1 o o I -a E 1 T j _ 1 , , (4.68) £ exists. The last two lemmas in this Appendix provide several useful asymptotic bounds. L e m m a 4.6.12 Suppose the assumptions in Theorem 4-2.1 hold. Then: — L ^ - ^ J - S )9(I h 68 - S ) GVT h 1 = Q(n- ). 1 (4.69) Proof: Since the elements of the (p+1) x (p+ 1) m a t r i x V to show that G (I - S )^{I T c h - S ) G/(n c + if T h do not depend upon n, it suffices is 0(n~ ). It is enough to show that x Gf (I - S )*(I Let i,j = 0 , 1 , . . . ,p be fixed. U s i n g vector and m a t r i x n o r m properties introduced i n +1 - Sl) G /(n 1 c + l ) is © ( n - ) for any i, j = 0,1,... ,p. T h 2 m 1 Section 2.4, we obtain: — Gj (I-S )*(I-SiyG < c (n + l ) 1 2 +1 (n + l ) ' I 2 h s yG c h -±—0(n^) (n + iy since \\(I - S ) G \\ c T h i+1 j+1 2 • \\#\\s • • Oil) • 0(n" ) = 0(n^ ) - S ) G \\ c r h j+l 2 < = 0(n~') 2 2 2 \\ i+1 = ||(I - S ) G \\ c by result (4.53) of L e m m a 4.6.7 T h j+1 2 w i t h r = Gi+i and r = G^+i, respectively, a n d ||\P||s = 0 ( 1 ) by condition ( A l ) - ( i i ) . Thus, Gf (J - S J * ( J - S £ ) G C T +1 / ( n + l ) is 0(n~ ). l 2 i + 1 L e m m a 4 . 6 . 1 3 Suppose the assumptions in Theorems 4-1-1 and4-2.1 hold. Letfhz,s%(Z) be the local linear backfitting estimator ofm(Z) defined in (4.32), where Z G [0,1] is fixed. Also, let rhi s (Z) c t h denote the local linear backfitting estimator ofm(Z) that would be ob- tained if 3 were known precisely: m J i S (4.70) c(Z) where the wf" 's are as in (4.33). Then, if n —> oo, h —• 0 and nh —> co, we h< ave: 1 3 E( , c(Z)\X, mi S Z) - m(Z) = 0(h ), 2 Var(m (Z)\X,Z) = 0 ItSl 69 1 nh (4.71) (4.72) and E(fh c (Z)\X, ItS Z) - m(Z) = 0(h ), h Var(m c(Z)\X, Z) = O (J^J ItS 2 (4.73) . (4.74) Proof: T h e proof of (4.71) and (4.72) can be found i n Francisco-Fernandez and Vilar-Fernandez (2001), so we omit it. 
To prove (4.73), use the definitions of $\widehat{m}_{I,S_h^c}(Z)$ and $\widetilde{m}_{I,S_h^c}(Z)$ in (4.32) and (4.70) to write:

$$\widehat{m}_{I,S_h^c}(Z) = \widetilde{m}_{I,S_h^c}(Z) - w^T X (\widehat{\beta}_{I,S_h^c} - \beta).$$

Thus:

$$E(\widehat{m}_{I,S_h^c}(Z) \mid X, Z) - m(Z) = \{E(\widetilde{m}_{I,S_h^c}(Z) \mid X, Z) - m(Z)\} - w^T X \{E(\widehat{\beta}_{I,S_h^c} \mid X, Z) - \beta\} \tag{4.75}$$

and

$$Var(\widehat{m}_{I,S_h^c}(Z) \mid X, Z) = Var(\widetilde{m}_{I,S_h^c}(Z) \mid X, Z) - 2 w^T X \cdot Cov(\widehat{\beta}_{I,S_h^c}, \widetilde{m}_{I,S_h^c}(Z) \mid X, Z) + w^T X \cdot Var(\widehat{\beta}_{I,S_h^c} \mid X, Z) \cdot X^T w. \tag{4.76}$$

Result (4.73) follows by combining (4.75) and (4.71) and using that $Bias(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ is $O_P(h^2)$ by Theorem 4.1.1 and $w^T X$ is $O(1)$. The latter result is easy to establish using the fact that the $w_t^{(Z)}$'s are bounded by Lemma 4.6.5. Result (4.74) follows by combining (4.76) and (4.72) and using that $Var(\widehat{\beta}_{I,S_h^c} \mid X, Z)$ is $O_P(1/n)$ by Theorem 4.2.1, $Cov(\widehat{\beta}_{I,S_h^c}, \widetilde{m}_{I,S_h^c}(Z) \mid X, Z)$ is $O_P(1/(nh))$ by a Cauchy-Schwarz argument and $w^T X$ is $O(1)$.

Chapter 5

Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators, $\widehat{\beta}_{\Psi^{-1},S_h^c}$ and $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$

In this chapter, we investigate the asymptotic behaviour of the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$ of $\beta$, with $\Psi$ being the true correlation matrix of the model errors. Recall that an explicit expression for $\widehat{\beta}_{\Psi^{-1},S_h^c}$ can be obtained from (3.4) by taking $\Omega = \Psi^{-1}$ and replacing the smoother matrix with the centered local linear smoother $S_h^c$:

$$\widehat{\beta}_{\Psi^{-1},S_h^c} = (X^T \Psi^{-1} (I - S_h^c) X)^{-1} X^T \Psi^{-1} (I - S_h^c) Y. \tag{5.1}$$

To simplify the proofs of the asymptotic results derived in this chapter, we consider that the model errors satisfy assumption (A2), that is, they are consecutive realizations from a stationary AR process of finite order $R$. Assumption (A2) is a special case of the assumption (A1) considered in Chapter 4.

The structure of this chapter is similar to that of Chapter 4, where we studied the asymptotic behaviour of $\widehat{\beta}_{I,S_h^c}$. In the first part of the chapter, we study the asymptotic behaviour of $\widehat{\beta}_{\Psi^{-1},S_h^c}$.
The proofs of the asymptotic results concerning $\widehat{\beta}_{\Psi^{-1},S_h^c}$ are, however, more complicated than those concerning $\widehat{\beta}_{I,S_h^c}$, for the following reason: the exact conditional bias and variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$ depend on $\Psi^{-1}$, whereas the exact conditional bias and variance of $\widehat{\beta}_{I,S_h^c}$ given $X$ and $Z$ do not depend on $\Psi^{-1}$. Next, we mention how the asymptotic results concerning the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$ can be generalized to local polynomials of higher degree. We then provide sufficient conditions for the estimators $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ and $\widehat{\beta}_{\Psi^{-1},S_h^c}$ to be asymptotically 'close'. The chapter concludes with an Appendix containing several auxiliary results.

5.1 Exact Conditional Bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$

Just like the usual local linear backfitting estimate $\widehat{\beta}_{I,S_h^c}$, the modified local linear backfitting estimate $\widehat{\beta}_{\Psi^{-1},S_h^c}$ suffers from finite sample bias. Indeed, using the explicit expression of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given in equation (5.1), we obtain the exact conditional bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$ as:

$$E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta = (X^T \Psi^{-1} (I - S_h^c) X)^{-1} X^T \Psi^{-1} (I - S_h^c) m, \tag{5.2}$$

an expression which generally does not equal zero.

Theorem 5.1.1 below provides an asymptotic expression for the conditional bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$. These derivations assume that the value of $h$ in $S_h^c$ is deterministic and satisfies conditions (2.12)-(2.13).

Theorem 5.1.1 Let $V_\Psi$ and $W$ be defined as in equations (2.21)-(2.22). Under conditions (A0) and (A2)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional bias of the modified local linear backfitting estimate $\widehat{\beta}_{\Psi^{-1},S_h^c}$ of $\beta$, given $X$ and $Z$, is:

$$E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta = -h^2 \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \frac{\nu_2(K)}{2} V_\Psi^{-1} W + o_P(h^2). \tag{5.3}$$

Comment 5.1.1 Aneiros Perez and Quintela del Rio (2001a) investigated the large sample properties of an estimator similar to $\widehat{\beta}_{\Psi^{-1},S_h^c}$, namely $\widehat{\beta}_{(I-K_h)^T\Psi^{-1},K_h}$, the unconstrained modified Speckman estimator in (3.12). Under similar assumptions as ours, Aneiros Perez and Quintela del Rio obtained a faster rate for the asymptotic conditional bias of their estimator, namely $O_P(h^4)$. As seen in (5.3), the rate we obtained for the asymptotic conditional bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ is $O_P(h^2)$. However, they did not provide asymptotic constants for this bias, like we do in (5.3). They obtained the same rate of convergence for the asymptotic conditional variance of their estimator as we did for that of $\widehat{\beta}_{\Psi^{-1},S_h^c}$, namely $O_P(1/n)$. Just like us, they do provide an asymptotic constant for this variance.

Proof of Theorem 5.1.1: Let:

$$B_{n,\Psi} = \frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) X, \tag{5.4}$$

where the dependence of $B_{n,\Psi}$ upon $h$ is omitted for convenience. We will see below that when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, $B_{n,\Psi}$ converges in probability to the quantity $V_\Psi$ defined in equation (2.22). Since $V_\Psi$ is non-singular by Lemma 5.7.6, the explicit expression for $\widehat{\beta}_{\Psi^{-1},S_h^c}$ in (5.1) holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. We can use this expression to write:

$$\widehat{\beta}_{\Psi^{-1},S_h^c} = B_{n,\Psi}^{-1} \cdot \left\{ \frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) Y \right\}, \tag{5.5}$$

which holds on a set whose measure goes to 1 as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$. Taking conditional expectation in both sides of (5.5) and subtracting $\beta$ yields:

$$E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta = B_{n,\Psi}^{-1} \cdot \left\{ \frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) m \right\}. \tag{5.6}$$

We now show that $B_{n,\Psi}$ converges in probability to $V_\Psi$ as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, that is:

$$B_{n,\Psi} = V_\Psi + o_P(1). \tag{5.7}$$

Using the fact that $X = G + \eta$ (equation (2.16)), $B_{n,\Psi}$ can be decomposed as:

$$B_{n,\Psi} = \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) G + \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \eta + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) G + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) \eta. \tag{5.8}$$

From equation (3.1) with $S = S_h$, $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$, so re-writing the first term, expanding the last term and re-arranging yields:

$$\begin{aligned} B_{n,\Psi} ={}& \frac{1}{n(n+1)} G^T \Psi^{-1} \mathbf{1}\mathbf{1}^T G + \frac{1}{n+1} \eta^T \Psi^{-1} \eta + \frac{1}{n+1} G^T \Psi^{-1} (I - S_h) G + \frac{1}{n(n+1)} G^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) G \\ &+ \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \eta + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) G - \frac{1}{n+1} \eta^T \Psi^{-1} S_h^c \eta. \end{aligned} \tag{5.9}$$

To establish (5.7), it suffices to show that

$$\frac{1}{n(n+1)} G^T \Psi^{-1} \mathbf{1}\mathbf{1}^T G = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z) f(z)\,dz \cdot \int_0^1 g(z)^T f(z)\,dz + o(1), \tag{5.10}$$

$$\frac{1}{n+1} \eta^T \Psi^{-1} \eta = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 + \sum_{k=1}^{R} \phi_k^2\right) \Sigma^{(0)} + o_P(1), \tag{5.11}$$

while the remaining terms are $o_P(1)$.

The proof of (5.10) is immediate by writing

$$\frac{1}{n(n+1)} G^T \Psi^{-1} \mathbf{1}\mathbf{1}^T G = \left( \frac{1}{n+1} G^T \Psi^{-1} \mathbf{1} \right) \cdot \left( \frac{1}{n+1} \mathbf{1}^T G \right) \cdot \left( 1 + \frac{1}{n} \right)$$

and using Lemma 5.7.3 in the Appendix of this chapter and result (4.10). Result (5.11) is proven in Lemma 5.7.4.

To prove the remaining terms in (5.9) are $o_P(1)$, it suffices to show that the quantities $G_{i+1}^T \Psi^{-1} (I - S_h) G_{j+1}/(n+1)$, $G_{i+1}^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) G_{j+1}/(n(n+1))$, $G_{i+1}^T \Psi^{-1} (I - S_h^c) \eta_{j+1}/(n+1)$, $\eta_{i+1}^T \Psi^{-1} (I - S_h^c) G_{j+1}/(n+1)$ and $\eta_{i+1}^T \Psi^{-1} S_h^c \eta_{j+1}/(n+1)$ are $o_P(1)$. These facts follow from lemmas appearing in the Appendices of this and the preceding chapter.

First consider $G_{i+1}^T \Psi^{-1} (I - S_h) G_{j+1}/(n+1)$. By result (4.58) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $r = G_{j+1}$, this quantity is $O(h^2) = o(1)$. Similarly, from result (4.59) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $r = G_{j+1}$, we have that $G_{i+1}^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) G_{j+1}/(n(n+1))$ is $O(h^2) = o(1)$. By result (4.60) of Lemma 4.6.9 with $r^* = G_{i+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{j+1}$, we have that $G_{i+1}^T \Psi^{-1} (I - S_h^c) \eta_{j+1}/(n+1)$ is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. Using a similar reasoning with (4.61) of Lemma 4.6.9, we obtain that $\eta_{i+1}^T \Psi^{-1} (I - S_h^c) G_{j+1}/(n+1)$ is also $o_P(1)$. Finally, consider $\eta_{i+1}^T \Psi^{-1} S_h^c \eta_{j+1}/(n+1)$. By result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{j+1}$, this quantity is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. This concludes our proof of (5.7).
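The orders invoked in the argument above, namely $O(h^2)$, $O_P(n^{-1/2}h^{-1/2})$ and $O_P(n^{-1/2}h^2)$, can be checked by hand under a concrete bandwidth. The short computation below is only an illustrative sketch: the parametrisation $h = n^{-a}$ with $0 < a < 1/3$ is an assumption made here for illustration, chosen so that $h \to 0$ and $nh^3 \to \infty$ both hold.

```latex
% Illustrative assumption: h = n^{-a} with 0 < a < 1/3,
% so that h -> 0 and n h^3 = n^{1-3a} -> infinity.
\begin{align*}
h^{2} &= n^{-2a} \longrightarrow 0, \\
n^{-1/2} h^{-1/2} &= n^{-(1-a)/2} \longrightarrow 0 \quad \text{(since } a < 1\text{)}, \\
n^{-1/2} h^{2} &= n^{-1/2 - 2a} \longrightarrow 0 .
\end{align*}
```

Each remainder therefore vanishes for every admissible exponent $a$, which is all that the convergence in (5.7) requires.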
By Lemma 5.7.6 in the Appendix of this chapter, the matrix $V_\Psi$ on the right side of (5.7) is non-singular and admits an inverse $V_\Psi^{-1}$, so (5.7) leads to:

$$B_{n,\Psi}^{-1} = V_\Psi^{-1} + o_P(1). \tag{5.12}$$

To prove the theorem, by (5.6) and (5.12), it suffices to show that:

$$\frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) m = -h^2 \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \frac{\nu_2(K)}{2} W + o_P(h^2). \tag{5.13}$$

From equation (2.16), $X = G + \eta$, so:

$$\frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) m = \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) m + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) m.$$

Using the identifiability condition on $m$ in (2.4) and $S_h^c = (I - \mathbf{1}\mathbf{1}^T/n) S_h$, we obtain:

$$\begin{aligned} \frac{1}{n+1} X^T \Psi^{-1} (I - S_h^c) m ={}& \frac{1}{n+1} G^T \Psi^{-1} (I - S_h) m + \frac{1}{n(n+1)} G^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) m \\ &+ \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h) m + \frac{1}{n(n+1)} \eta^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) m. \end{aligned} \tag{5.14}$$

By Lemma 5.7.5, the first two terms on the right side of (5.14) are equal to the right side of (5.13).

Now, consider $\eta_{i+1}^T \Psi^{-1} (I - S_h) m/(n+1)$, the $(i+1)$th element of the third term in (5.14). Using result (4.42) of Lemma 4.6.4 with $c = \Psi^{-1} (I - S_h) m$ and $\xi = \eta_{i+1}$, together with spectral norm properties introduced in Section 2.4, we obtain:

$$\frac{1}{n+1} \eta_{i+1}^T \Psi^{-1} (I - S_h) m = \frac{1}{n+1} O_P\left( \|\Psi^{-1} (I - S_h) m\|_2 \right) = \frac{1}{n+1} O_P\left( \|\Psi^{-1}\|_S \cdot \|(I - S_h) m\|_2 \right) = o_P(h^2).$$

The last equality was obtained by using that $\|\Psi^{-1}\|_S$ is bounded (result (5.35) of Lemma 5.7.2) and $\|(I - S_h) m\|_2 = O(n^{1/2} h^2)$ by result (4.39) of Lemma 4.6.3 with $r = m$.

Finally, consider $\eta_{i+1}^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) m/(n(n+1))$, the $(i+1)$th element of the fourth term in (5.14). Using a similar reasoning as above, we obtain:

$$\frac{1}{n(n+1)} \eta_{i+1}^T \Psi^{-1} \mathbf{1}\mathbf{1}^T (S_h - I) m = \frac{1}{n(n+1)} O_P\left( \|\Psi^{-1}\|_S \cdot \|\mathbf{1}\mathbf{1}^T\|_F \cdot \|(S_h - I) m\|_2 \right) = o_P(h^2).$$

This proves (5.13) and completes our proof of Theorem 5.1.1.

5.2 Exact Conditional Variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$

In this section, we derive an asymptotic expression for the exact conditional variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$, given $X$, $Z$:

$$Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{(n+1)^2} B_{n,\Psi}^{-1} \cdot X^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} X \cdot (B_{n,\Psi}^{-1})^T, \tag{5.15}$$

where $B_{n,\Psi}$ is defined as in (5.4).
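The above equality was obtained by using the explicit formula of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ in (5.1), together with the fact that $Var(Y \mid X, Z) = \sigma_\epsilon^2 \Psi$ by condition (A2).

Theorem 5.2.1 Under conditions (A0) and (A2)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, the conditional variance of the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$ of $\beta$, given $X$ and $Z$, is:

$$Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{n+1} \cdot \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 + \sum_{k=1}^{R} \phi_k^2\right) V_\Psi^{-1} \Sigma^{(0)} V_\Psi^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2} V_\Psi^{-1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G V_\Psi^{-1} + o_P\left(\frac{1}{n}\right). \tag{5.16}$$

Comment 5.2.1 By Lemma 5.7.7 in the Appendix of this chapter, the second term in the above asymptotic expression for $Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z)$ is $O_P(n^{-1})$ and hence it does not dominate the first term, which is $O_P(n^{-1})$.

Proof of Theorem 5.2.1: By (5.15), we have:

$$Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) = \frac{\sigma_\epsilon^2}{n+1} B_{n,\Psi}^{-1} \cdot C_{n,\Psi} \cdot (B_{n,\Psi}^{-1})^T, \tag{5.17}$$

where $C_{n,\Psi} = X^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} X/(n+1)$. Since $B_{n,\Psi}^{-1} = V_\Psi^{-1} + o_P(1)$ by result (5.12), to prove the theorem it suffices to show that:

$$C_{n,\Psi} = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 + \sum_{k=1}^{R} \phi_k^2\right) \Sigma^{(0)} + \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G + o_P(1). \tag{5.18}$$

This fact is shown below with the help of lemmas in the Appendix of this and the preceding chapter.

By (2.16), $X = G + \eta$, so $C_{n,\Psi}$ can be decomposed as:

$$\begin{aligned} C_{n,\Psi} ={}& \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G + \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} \eta \\ &+ \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} \eta. \end{aligned}$$

Expanding the last term and re-arranging yields:

$$\begin{aligned} C_{n,\Psi} ={}& \frac{1}{n+1} \eta^T \Psi^{-1} \eta + \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G \\ &+ \frac{1}{n+1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} \eta + \frac{1}{n+1} \eta^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G \\ &- \frac{1}{n+1} \eta^T \Psi^{-1} S_h^c \eta - \frac{1}{n+1} \eta^T S_h^{cT} \Psi^{-1} \eta + \frac{1}{n+1} \eta^T \Psi^{-1} S_h^c \Psi S_h^{cT} \Psi^{-1} \eta. \end{aligned} \tag{5.19}$$

The first term in the above converges to the first term on the right side of (5.18) by Lemma 5.7.4. The second term in the above is the same as the second term on the right side of (5.18). To show the remaining terms are $o_P(1)$, it suffices to establish that $G_{i+1}^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} \eta_{j+1}/(n+1)$, $\eta_{i+1}^T S_h^{cT} \Psi^{-1} \eta_{j+1}/(n+1)$ and $\eta_{i+1}^T \Psi^{-1} S_h^c \Psi S_h^{cT} \Psi^{-1} \eta_{j+1}/(n+1)$ are $o_P(1)$ for all $i, j = 0, 1, \ldots, p$.

Let $i, j = 0, 1, \ldots, p$ be fixed.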
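From result (4.42) of Lemma 4.6.4 with $c = \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G_{i+1}$ and $\xi = \eta_{j+1}$, and from the spectral norm properties introduced in Section 2.4 of Chapter 2, we get:

$$\frac{1}{n+1} \eta_{j+1}^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G_{i+1} = \frac{1}{n+1} O_P\left( \|\Psi^{-1}\|_S^2 \cdot (1 + \|S_h^c\|_F)^2 \cdot \|\Psi\|_S \cdot \|G_{i+1}\|_2 \right) = O_P(n^{-1/2} h^{-1}) = o_P(1).$$

To derive the above result, we used Lemma 5.7.2 to obtain that $\|\Psi\|_S$ and $\|\Psi^{-1}\|_S$ are $O(1)$. We also used the fact that $\|S_h^c\|_F$ is $O(h^{-1/2})$ by result (4.54) of Lemma 4.6.7, while $\|G_{i+1}\|_2$ is $O(n^{1/2})$ (take $r = G_{i+1}$ in result (4.51) of Lemma 4.6.7).

Next, consider $\eta_{i+1}^T S_h^{cT} \Psi^{-1} \eta_{j+1}/(n+1) = \eta_{j+1}^T \Psi^{-1} S_h^c \eta_{i+1}/(n+1)$. This quantity is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$ by result (4.62) of Lemma 4.6.9 with $\xi^* = \eta_{j+1}$, $\Omega = \Psi^{-1}$ and $\xi = \eta_{i+1}$. Finally, $\eta_{i+1}^T \Psi^{-1} S_h^c \Psi S_h^{cT} \Psi^{-1} \eta_{j+1}/(n+1)$ is $O_P(n^{-1} h^{-1}) = o_P(1)$ by result (4.64) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi^{-1}$, $\Omega^* = \Psi$ and $\xi = \eta_{j+1}$.

5.3 Exact Conditional Measure of Accuracy of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ Given $X$ and $Z$

Any suitable criterion for measuring the accuracy of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ should take into account both bias and variance effects. We use the following measure of accuracy for $\widehat{\beta}_{\Psi^{-1},S_h^c}$, which combines these effects in a natural fashion:

$$E\left( \|\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta\|_2^2 \mid X, Z \right) = \left( E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta \right)^T \left( E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta \right) + trace\left\{ Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) \right\}. \tag{5.20}$$

Using equation (5.20) above together with Theorem 5.1.1 and Theorem 5.2.1 we obtain the following result:

Corollary 5.3.1 Assume that the conditions in Theorem 5.1.1 and Theorem 5.2.1 hold. Then, when $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:

$$\begin{aligned} E\left( \|\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta\|_2^2 \mid X, Z \right) ={}& h^4 \frac{\sigma_\epsilon^4}{\sigma_u^4} \left(1 - \sum_{k=1}^{R} \phi_k\right)^4 \frac{\nu_2(K)^2}{4} W^T V_\Psi^{-2} W + \frac{\sigma_\epsilon^2}{n+1} \cdot \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 + \sum_{k=1}^{R} \phi_k^2\right) trace\left\{ V_\Psi^{-1} \Sigma^{(0)} V_\Psi^{-1} \right\} \\ &+ \frac{\sigma_\epsilon^2}{(n+1)^2} trace\left\{ V_\Psi^{-1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G V_\Psi^{-1} \right\} + o_P(h^4) + o_P\left(\frac{1}{n}\right). \end{aligned}$$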
(5.21)

5.4 The $\sqrt{n}$-consistency of $\widehat{\beta}_{\Psi^{-1},S_h^c}$

Just as with the usual backfitting estimator $\widehat{\beta}_{I,S_h^c}$, we would like the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$ to be $\sqrt{n}$-consistent given $X$ and $Z$, that is, we would like $E(\|\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta\|_2^2 \mid X, Z)$ to be $O_P(n^{-1})$.

By result (5.21) of Corollary 5.3.1, $E(\|\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta\|_2^2 \mid X, Z)$ is $O_P(h^4) + O_P(n^{-1})$. This result is due to the fact that the conditional variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ is $O_P(n^{-1})$ but its conditional bias is $O_P(h^2)$.

We are interested in assessing at what rate the smoothing parameter $h$ should converge to zero so that the squared conditional bias of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ tends to zero while being of no larger order of magnitude than the conditional variance of $\widehat{\beta}_{\Psi^{-1},S_h^c}$. A similar argument as that employed in Section 4.4 yields that $h$ should converge to zero at rate $n^{-a}$, $a \in [1/4, 1/3)$, to ensure that the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$ is $\sqrt{n}$-consistent given $X$ and $Z$, exactly as for the usual local linear backfitting estimator $\widehat{\beta}_{I,S_h^c}$. Note that $n^{-a} < n^{-1/5}$, so we must 'undersmooth' $\widehat{m}_{\Psi^{-1},S_h^c}$ to achieve $\sqrt{n}$-consistency of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ given $X$ and $Z$. Here, $n^{-1/5}$ is the 'usual' rate of convergence for $h$, which we believe is optimal for estimating $m$ via $\widehat{m}_{\Psi^{-1},S_h^c}$.

5.5 Generalization to Local Polynomials of Higher Degree

The asymptotic results in Sections 5.1-5.4 concern the modified local linear backfitting estimator $\widehat{\beta}_{\Psi^{-1},S_h^c}$. We believe these results readily generalize to the modified local polynomial backfitting estimator of $\beta$. The latter estimator is obtained from (5.1) by replacing $S_h^c$, the smoother matrix for locally linear regression, with the smoother matrix for locally polynomial regression of degree $D > 1$.

In keeping with the locally polynomial regression literature, we conjecture that the modified local polynomial backfitting estimator of $\beta$ has conditional bias of order $O_P(h^{D+1})$ and conditional variance of order $O_P(n^{-1})$. Note that we may need boundary corrections if $D$ is even. We also conjecture that $h$ should converge to zero at rate $n^{-a}$, $a \in [1/(2D+2), 1/3)$, for the modified local polynomial backfitting estimator of $\beta$ to be $\sqrt{n}$-consistent given $X$ and $Z$.

5.6 The $\sqrt{n}$-consistency of $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$

The estimated modified local linear backfitting estimator $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ can be obtained from (5.1) by replacing $\Psi$ with an estimator $\widehat{\Psi}$:

$$\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c} = (X^T \widehat{\Psi}^{-1} (I - S_h^c) X)^{-1} X^T \widehat{\Psi}^{-1} (I - S_h^c) Y. \tag{5.22}$$

Deriving asymptotic approximations for the exact conditional bias and variance of $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ given $X$ and $Z$ is not possible, as these quantities are not tractable. The reason for this is that $\widehat{\Psi}$ is random since it is computed from the data.

In this section, we give sufficient conditions for $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ and $\widehat{\beta}_{\Psi^{-1},S_h^c}$ to be asymptotically 'close', in the sense that the difference between these estimators is $o_P(n^{-1/2})$. Our conditions (5.23) and (5.24) are similar to those imposed by Aneiros Perez and Quintela del Rio (2001a) for establishing the asymptotic equivalence of their modified and estimated modified versions of the Speckman estimator.

Theorem 5.6.1 Suppose that the conditions in Theorems 5.1.1 and 5.2.1 hold. In addition, suppose that:

$$\frac{1}{n} X^T (\widehat{\Psi}^{-1} - \Psi^{-1}) (I - S_h^c) X = o_P(1) \tag{5.23}$$

$$\frac{1}{\sqrt{n}} X^T (\widehat{\Psi}^{-1} - \Psi^{-1}) (I - S_h^c) (m + \epsilon) = o_P(1) \tag{5.24}$$

Then, if $h = n^{-a}$, $a \in [1/4, 1/3)$, we have:

$$\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c} = \widehat{\beta}_{\Psi^{-1},S_h^c} + o_P\left(\frac{1}{\sqrt{n}}\right). \tag{5.25}$$

Proof: To establish (5.25), it suffices to show:

$$\sqrt{n} (\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c} - \beta) = \sqrt{n} (\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta) + o_P(1). \tag{5.26}$$

Using the expression for $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ in (5.22) and $Y = X\beta + m + \epsilon$ (equation (2.1)), we write the left side of (5.26) as:

$$\begin{aligned} \sqrt{n} (\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c} - \beta) ={}& \left( \frac{1}{n} X^T \widehat{\Psi}^{-1} (I - S_h^c) X \right)^{-1} \cdot \frac{1}{\sqrt{n}} X^T \widehat{\Psi}^{-1} (I - S_h^c) (m + \epsilon) \\ ={}& \left( \frac{1}{n} X^T \Psi^{-1} (I - S_h^c) X + o_P(1) \right)^{-1} \cdot \left( \frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) + o_P(1) \right) \\ ={}& \left( \frac{1}{n} X^T \Psi^{-1} (I - S_h^c) X \right)^{-1} \cdot \frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) + \left( \frac{1}{n} X^T \Psi^{-1} (I - S_h^c) X \right)^{-1} \cdot o_P(1) \\ &+ \frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) \cdot o_P(1) + o_P(1). \end{aligned}$$
By the definition of $\widehat{\beta}_{\Psi^{-1},S_h^c}$ in (5.1), we have:

$$\sqrt{n} (\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c} - \beta) = \sqrt{n} (\widehat{\beta}_{\Psi^{-1},S_h^c} - \beta) + \left( \frac{1}{n} X^T \Psi^{-1} (I - S_h^c) X \right)^{-1} \cdot o_P(1) + \frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) \cdot o_P(1) + o_P(1).$$

Therefore, to prove (5.26), it is enough to show that $(X^T \Psi^{-1} (I - S_h^c) X/n)^{-1}$ and $X^T \Psi^{-1} (I - S_h^c) (m + \epsilon)/\sqrt{n}$ are $O_P(1)$.

To prove the first fact, let $B_{n,\Psi} = X^T \Psi^{-1} (I - S_h^c) X/(n+1)$. By (5.12), $B_{n,\Psi}^{-1} = V_\Psi^{-1} + o_P(1)$, with $V_\Psi$ as in (2.22), so $(X^T \Psi^{-1} (I - S_h^c) X/n)^{-1} = O_P(1)$. To prove the second fact, use $B_{n,\Psi} = V_\Psi + o_P(1)$ (result (5.7)) and Chebychev's Theorem to write:

$$\frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) = \frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (Y - X\beta) = \frac{n+1}{\sqrt{n}} \cdot (V_\Psi + o_P(1)) \cdot \left\{ E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta + O_P\left( \sqrt{Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z)} \right) \right\}.$$

By result (5.3) of Theorem 5.1.1, $E(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z) - \beta$ is $O_P(h^2) = O_P(n^{-2a})$. Also, by result (5.16) of Theorem 5.2.1, $Var(\widehat{\beta}_{\Psi^{-1},S_h^c} \mid X, Z)$ is $O_P(n^{-1})$. Since $a \geq 1/4$, we conclude:

$$\frac{1}{\sqrt{n}} X^T \Psi^{-1} (I - S_h^c) (m + \epsilon) = O(\sqrt{n}) \cdot O_P(1) \cdot \left( O_P(n^{-2a}) + O_P\left(\frac{1}{\sqrt{n}}\right) \right) = O_P(n^{1/2 - 2a}) + O_P(1) = O_P(1).$$

This completes our proof of Theorem 5.6.1.

Theorem 5.6.1 implies that $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ is $\sqrt{n}$-consistent since $\widehat{\beta}_{\Psi^{-1},S_h^c}$ is $\sqrt{n}$-consistent. One would expect the conditional bias and variance of $\widehat{\beta}_{\widehat{\Psi}^{-1},S_h^c}$ to be similar to those of $\widehat{\beta}_{\Psi^{-1},S_h^c}$.

5.7 Appendix

Throughout this Appendix, we assume that the assumptions and notation introduced in Sections 2.2 and 2.3 of this thesis hold, unless otherwise specified. We also let $I(S)$ denote the indicator function of an arbitrary set $S$.

The first lemma in this Appendix shows that the correlation matrix of $n$ consecutive observations arising from a stationary autoregressive process of finite order $R$ is invertible. The lemma also provides an explicit formula for the inverse of this correlation matrix. A proof of this lemma can be found in David and Bastin (2001, Lemma 1).

Lemma 5.7.1 Let $\epsilon_1, \ldots, \epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2).
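If $\Psi$ is the correlation matrix of $\epsilon_1, \ldots, \epsilon_n$ defined in Comment 2.2.1, then its inverse exists and is given by:

$$\Psi^{-1} = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left[ U^T U - V^T V \right] \tag{5.27}$$

where $U$ and $V$ are $n \times n$ Toeplitz lower triangular matrices defined as

$$U = \begin{pmatrix} 1 & & & & \\ -\phi_1 & 1 & & & \\ \vdots & \ddots & \ddots & & \\ -\phi_R & \cdots & -\phi_1 & 1 & \\ & \ddots & & \ddots & \ddots \\ 0 & & -\phi_R & \cdots & -\phi_1 \quad 1 \end{pmatrix} \quad \text{and} \quad V = \begin{pmatrix} 0 & & & & \\ \vdots & \ddots & & & \\ -\phi_R & & \ddots & & \\ \vdots & \ddots & & \ddots & \\ -\phi_1 & \cdots & -\phi_R & \cdots & 0 \end{pmatrix}, \tag{5.28}$$

that is, $U$ has first column $(1, -\phi_1, \ldots, -\phi_R, 0, \ldots, 0)^T$ and $V$ has first column $(0, \ldots, 0, -\phi_R, \ldots, -\phi_1)^T$.

Comment 5.7.1 Let $U$ be as in (5.28) and define

$$[U_{(k)}]_{i,j} = I(j = i - k,\; k + 1 \leq i \leq n) \tag{5.29}$$

for $k = 1, \ldots, R$. Then it can be easily seen that

$$U = I - \phi_1 U_{(1)} - \cdots - \phi_R U_{(R)}.$$

Straightforward algebraic manipulations also yield

$$U^T U = I - \sum_{k=1}^{R} \phi_k \left( U_{(k)}^T + U_{(k)} \right) + \sum_{\substack{p,q=1 \\ p<q}}^{R} \phi_p \phi_q \left( U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)} \right) + \sum_{k=1}^{R} \phi_k^2 U_{(k)}^T U_{(k)} \tag{5.30}$$

where

$$[U_{(k)}^T]_{i,j} = I(j = i + k,\; 1 \leq i \leq n - k), \tag{5.31}$$

$$[U_{(p)}^T U_{(q)}]_{i,j} = I(j = i + p - q,\; 1 - p + q \leq i \leq n - p), \tag{5.32}$$

$$[U_{(q)}^T U_{(p)}]_{i,j} = I(j = i - p + q,\; 1 \leq i \leq n - q), \tag{5.33}$$

for $k, p, q = 1, \ldots, R$ and $p < q$.

The next lemma shows that, if $\Psi$ is the correlation matrix of a sample of $n$ consecutive observations arising from a stationary AR process of finite order $R$, then its spectral norm is bounded. Furthermore, the spectral norm of $\Psi^{-1}$ is also bounded.

Lemma 5.7.2 Let $\epsilon_1, \ldots, \epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2). If $\Psi$ is the correlation matrix of $\epsilon_1, \ldots, \epsilon_n$ defined in Comment 2.2.1, then:

$$\|\Psi\|_S = O(1) \tag{5.34}$$

and

$$\|\Psi^{-1}\|_S = O(1). \tag{5.35}$$

Proof: The boundedness of $\|\Psi^{-1}\|_S$ (result (5.35)) follows easily by using the explicit expression for $\Psi^{-1}$ in equation (5.27).

To prove the boundedness of $\|\Psi\|_S$ (result (5.34)), use the symmetry of $\Psi$ and a well-known result on spectral norms to get:

$$\|\Psi\|_S \leq \|\Psi\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |\Psi_{ij}| = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |\rho_{i-j}| \leq \sum_{h=1-n}^{n-1} |\rho_h|.$$

According to Exercise 13 in Brockwell and Davis (1991), there exist constants $C > 0$ and $s \in (0, 1)$ so that:

$$|\rho_h| \leq C s^{|h|}$$

for all $h$. Combining the previous results yields:

$$\|\Psi\|_S \leq C \sum_{h=1-n}^{n-1} s^{|h|} \leq C \left( 1 + 2 \sum_{h=1}^{\infty} s^h \right) = C \cdot \frac{1+s}{1-s},$$

and (5.34) follows.

The following lemma provides a useful asymptotic approximation.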
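Before moving on, the inverse formula (5.27) of Lemma 5.7.1 can be sanity-checked numerically. The sketch below is illustrative only and is not part of the original derivation. It assumes that $U$ and $V$ are the lower triangular Toeplitz matrices with first columns $(1, -\phi_1, \ldots, -\phi_R, 0, \ldots, 0)^T$ and $(0, \ldots, 0, -\phi_R, \ldots, -\phi_1)^T$, that the autocorrelations solve the Yule-Walker equations, and that $\sigma_\epsilon^2/\sigma_u^2 = 1/(1 - \sum_k \phi_k \rho_k)$. The function names and the AR coefficients are hypothetical choices made for the illustration.

```python
import numpy as np

def ar_autocorrelations(phi, n):
    """Autocorrelations rho_0..rho_{n-1} of a stationary AR(R) process (requires n > R)."""
    R = len(phi)
    # Yule-Walker: rho_k = sum_j phi_j * rho_{|k-j|}, k = 1..R, with rho_0 = 1.
    A = np.eye(R)
    b = np.zeros(R)
    for k in range(1, R + 1):
        for j in range(1, R + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] += phi[j - 1]          # phi_k * rho_0 moves to the right side
            else:
                A[k - 1, lag - 1] -= phi[j - 1]
    rho = np.empty(n)
    rho[0] = 1.0
    rho[1:R + 1] = np.linalg.solve(A, b)
    for k in range(R + 1, n):                   # extend by the AR recursion
        rho[k] = sum(phi[j] * rho[k - 1 - j] for j in range(R))
    return rho

def lower_toeplitz(first_col):
    """Lower triangular Toeplitz matrix with the given first column."""
    n = len(first_col)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            T[i, j] = first_col[i - j]
    return T

def check_ar_inverse(phi, n):
    """Max absolute entry of Psi @ Psi_inv - I, with Psi_inv built as in (5.27)."""
    phi = np.asarray(phi, dtype=float)
    R = len(phi)
    rho = ar_autocorrelations(phi, n)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Psi = rho[lags]                             # correlation matrix of the n observations
    col_u = np.zeros(n)
    col_u[0] = 1.0
    col_u[1:R + 1] = -phi
    col_v = np.zeros(n)
    col_v[n - R:] = -phi[::-1]                  # (-phi_R, ..., -phi_1) in the last R slots
    U, V = lower_toeplitz(col_u), lower_toeplitz(col_v)
    ratio = 1.0 / (1.0 - np.dot(phi, rho[1:R + 1]))   # sigma_eps^2 / sigma_u^2
    Psi_inv = ratio * (U.T @ U - V.T @ V)
    return np.max(np.abs(Psi @ Psi_inv - np.eye(n)))

if __name__ == "__main__":
    print(check_ar_inverse([0.5, -0.3], 8))     # AR(2) example
    print(check_ar_inverse([0.6], 5))           # AR(1) example
```

Under these assumptions the reported discrepancy should be at the level of floating-point rounding; a visibly non-zero value would point to a mismatch between the assumed layout of $U$ and $V$ and the one intended in (5.28).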
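Lemma 5.7.3 Let $\epsilon_1, \ldots, \epsilon_n$ be successive observations from an AR process of finite order $R$ satisfying condition (A2). Let $\Psi$ be the correlation matrix of $\epsilon_1, \ldots, \epsilon_n$. Furthermore, let $G$ be an $n \times (p+1)$ matrix defined as in (2.14). If $n \to \infty$, then:

$$\frac{1}{n+1} G^T \Psi^{-1} \mathbf{1} = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z) f(z)\,dz + o(1). \tag{5.36}$$

Proof: By (5.27), the left side of (5.36) is

$$\frac{1}{n+1} G^T \Psi^{-1} \mathbf{1} = \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1} G^T U^T U \mathbf{1} - \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1} G^T V^T V \mathbf{1},$$

so it suffices to show

$$\frac{1}{n+1} G^T U^T U \mathbf{1} = \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z) f(z)\,dz + o(1), \tag{5.37}$$

$$\frac{1}{n+1} G^T V^T V \mathbf{1} = o(1). \tag{5.38}$$

To establish (5.37), it is enough to show that, for any $i = 0, 1, \ldots, p$, we have:

$$\frac{1}{n+1} G_{i+1}^T U^T U \mathbf{1} = \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g_i(z) f(z)\,dz + o(1).$$

Let $i = 0, 1, \ldots, p$, be fixed. Using the explicit expression for $U^T U$ in result (5.30), we write:

$$\begin{aligned} \frac{1}{n+1} G_{i+1}^T U^T U \mathbf{1} ={}& - \sum_{k=1}^{R} \phi_k \cdot \frac{1}{n+1} G_{i+1}^T \left( U_{(k)}^T + U_{(k)} \right) \mathbf{1} + \sum_{\substack{p,q=1 \\ p<q}}^{R} \phi_p \phi_q \cdot \frac{1}{n+1} G_{i+1}^T \left( U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)} \right) \mathbf{1} \\ &+ \sum_{k=1}^{R} \phi_k^2 \cdot \frac{1}{n+1} G_{i+1}^T U_{(k)}^T U_{(k)} \mathbf{1} + \frac{1}{n+1} G_{i+1}^T \mathbf{1}. \end{aligned} \tag{5.39}$$

Therefore, it suffices to prove that the following asymptotic approximations hold:

$$\sum_{k=1}^{R} \phi_k \left[ \frac{1}{n+1} G_{i+1}^T \left( U_{(k)}^T + U_{(k)} \right) \mathbf{1} \right] = 2 \left( \sum_{k=1}^{R} \phi_k \right) \int_0^1 g_i(z) f(z)\,dz + o(1), \tag{5.40}$$

$$\sum_{\substack{p,q=1 \\ p<q}}^{R} \phi_p \phi_q \left[ \frac{1}{n+1} G_{i+1}^T \left( U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)} \right) \mathbf{1} \right] = 2 \left( \sum_{\substack{p,q=1 \\ p<q}}^{R} \phi_p \phi_q \right) \int_0^1 g_i(z) f(z)\,dz + o(1), \tag{5.41}$$

$$\sum_{k=1}^{R} \phi_k^2 \left[ \frac{1}{n+1} G_{i+1}^T U_{(k)}^T U_{(k)} \mathbf{1} \right] = \left( \sum_{k=1}^{R} \phi_k^2 \right) \int_0^1 g_i(z) f(z)\,dz + o(1), \tag{5.42}$$

$$\frac{1}{n+1} G_{i+1}^T \mathbf{1} = \int_0^1 g_i(z) f(z)\,dz + o(1). \tag{5.43}$$

The last result follows from result (4.10). To prove (5.40), it is enough to show that the equalities below hold for any $k = 1, \ldots, R$:

$$\frac{1}{n+1} G_{i+1}^T U_{(k)}^T \mathbf{1} = \int_0^1 g_i(z) f(z)\,dz + o(1), \tag{5.44}$$

$$\frac{1}{n+1} G_{i+1}^T U_{(k)} \mathbf{1} = \int_0^1 g_i(z) f(z)\,dz + o(1). \tag{5.45}$$

Using the expression of $U_{(k)}^T$ in (5.31) and a Riemann integration argument, the left side of (5.44) can be written as:

$$\frac{1}{n+1} G_{i+1}^T U_{(k)}^T \mathbf{1} = \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} g_i(Z_t) I(l = t + k,\; 1 \leq t \leq n - k) = \frac{1}{n+1} \sum_{t=1}^{n-k} g_i(Z_t) = \int_0^1 g_i(z) f(z)\,dz + o(1).$$

Here, we have also used that $k$ does not depend upon $n$, as $R$ itself does not depend upon $n$. Similarly, using the expression for $U_{(k)}$ in (5.29), we obtain that the left side of (5.45) is:

$$\frac{1}{n+1} G_{i+1}^T U_{(k)} \mathbf{1} = \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} g_i(Z_t) I(l = t - k,\; k + 1 \leq t \leq n) = \frac{1}{n+1} \sum_{t=1}^{n} g_i(Z_t) - \frac{1}{n+1} \sum_{t=1}^{k} g_i(Z_t) = \int_0^1 g_i(z) f(z)\,dz + o(1).$$

Thus, both (5.44) and (5.45) hold. A similar argument can be used to derive (5.41) and (5.42). The only difference in the proofs is that the range of summation for $t$ in $\sum_t g_i(Z_t)$ changes.

It remains to prove (5.38). To establish this result, it is enough to show that $G_{i+1}^T V^T V \mathbf{1}/(n+1)$ is $o(1)$ for all $i = 0, 1, \ldots, p$. Let $i = 0, 1, \ldots, p$, be fixed. By the definition of $V$ in (5.28), we have:

$$V G_{i+1} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ -\phi_R g_i(Z_1) \\ -\phi_{R-1} g_i(Z_1) - \phi_R g_i(Z_2) \\ \vdots \\ -\phi_1 g_i(Z_1) - \phi_2 g_i(Z_2) - \cdots - \phi_R g_i(Z_R) \end{pmatrix}.$$

Since $g_i(\cdot)$ is bounded by assumption (A0)-(i), $\|V G_{i+1}\|_2 = O(1)$. A similar argument yields $\|V \mathbf{1}\|_2 = O(1)$. Combining these results, we obtain:

$$\left| \frac{1}{n+1} G_{i+1}^T V^T V \mathbf{1} \right| \leq \frac{1}{n+1} \|V G_{i+1}\|_2 \cdot \|V \mathbf{1}\|_2 = \frac{1}{n+1} \cdot O(1) \cdot O(1) = O(1/n) = o(1),$$

so $G_{i+1}^T V^T V \mathbf{1}/(n+1)$ is $o(1)$.

The following lemma provides a result concerning the convergence in probability of a random matrix.

Lemma 5.7.4 Suppose the assumptions in Lemma 5.7.1 hold. Let $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, be as in condition (A0)-(ii) and let $\eta$ be an $n \times (p+1)$ random matrix defined as in (2.10). Then, as $n \to \infty$:

$$\frac{1}{n+1} \eta^T \Psi^{-1} \eta = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left(1 + \sum_{k=1}^{R} \phi_k^2\right) \Sigma^{(0)} + o_P(1) \tag{5.46}$$

where $\Sigma^{(0)}$ is defined as in equation (2.15).

Proof: By (5.27), the left side of (5.46) can be written as:

$$\frac{1}{n+1} \eta^T \Psi^{-1} \eta = \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1} \eta^T U^T U \eta - \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1} \eta^T V^T V \eta,$$

so it suffices to show

$$\frac{1}{n+1} \eta^T U^T U \eta = \left(1 + \sum_{k=1}^{R} \phi_k^2\right) \Sigma^{(0)} + o_P(1), \qquad \frac{1}{n+1} \eta^T V^T V \eta = o_P(1).$$

In fact, if $\Sigma^{(0)} = (\Sigma_{ij})$, it is enough to show:

$$\frac{1}{n+1} \eta_{i+1}^T U^T U \eta_{j+1} = \left(1 + \sum_{k=1}^{R} \phi_k^2\right) \Sigma_{ij} + o_P(1), \tag{5.47}$$

$$\frac{1}{n+1} \eta_{i+1}^T V^T V \eta_{j+1} = o_P(1), \tag{5.48}$$

for any $i, j = 0, 1, \ldots, p$.

Let $i, j = 0, 1, \ldots, p$, be fixed.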
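Using the explicit expression for $U^T U$ in (5.30), we write:

$$\begin{aligned} \frac{1}{n+1} \eta_{i+1}^T U^T U \eta_{j+1} ={}& - \sum_{k=1}^{R} \phi_k \cdot \frac{1}{n+1} \eta_{i+1}^T \left( U_{(k)}^T + U_{(k)} \right) \eta_{j+1} + \sum_{\substack{p,q=1 \\ p<q}}^{R} \phi_p \phi_q \cdot \frac{1}{n+1} \eta_{i+1}^T \left( U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)} \right) \eta_{j+1} \\ &+ \sum_{k=1}^{R} \phi_k^2 \cdot \frac{1}{n+1} \eta_{i+1}^T U_{(k)}^T U_{(k)} \eta_{j+1} + \frac{1}{n+1} \eta_{i+1}^T \eta_{j+1}. \end{aligned} \tag{5.49}$$

In order to establish (5.46), we will show that

$$\sum_{k=1}^{R} \phi_k^2 \cdot \frac{1}{n+1} \eta_{i+1}^T U_{(k)}^T U_{(k)} \eta_{j+1} = \left( \sum_{k=1}^{R} \phi_k^2 \right) \Sigma_{ij} + o_P(1), \tag{5.50}$$

$$\frac{1}{n+1} \eta_{i+1}^T \eta_{j+1} = \Sigma_{ij} + o_P(1), \tag{5.51}$$

and the remaining terms in the right side of (5.49) are $o_P(1)$.

Result (5.51) holds by result (4.9). To prove (5.50), we use condition (A0)-(ii) and the Weak Law of Large Numbers for a sequence of independent variables to write:

$$\frac{1}{n+1} \eta_{i+1}^T U_{(k)}^T U_{(k)} \eta_{j+1} = \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} \eta_{t,i} [U_{(k)}^T U_{(k)}]_{t,l} \eta_{l,j} = \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} \eta_{t,i} I(l = t,\; 1 \leq t \leq n - k) \eta_{l,j} = \frac{1}{n+1} \sum_{t=1}^{n-k} \eta_{t,i} \eta_{t,j} \xrightarrow[n \to \infty]{P} E(\eta_{1,i} \eta_{1,j}) = \Sigma_{ij},$$

and (5.50) follows easily.

We now show that the first term in (5.49) is $o_P(1)$. We have:

$$\sum_{k=1}^{R} \phi_k \left[ \frac{1}{n+1} \eta_{i+1}^T \left( U_{(k)}^T + U_{(k)} \right) \eta_{j+1} \right] = \sum_{k=1}^{R} \phi_k \left( \frac{1}{n+1} \eta_{i+1}^T U_{(k)}^T \eta_{j+1} \right) + \sum_{k=1}^{R} \phi_k \left( \frac{1}{n+1} \eta_{i+1}^T U_{(k)} \eta_{j+1} \right) \tag{5.52}$$

so it suffices to analyze the term $\eta_{i+1}^T U_{(k)}^T \eta_{j+1}/(n+1)$, as $\eta_{i+1}^T U_{(k)} \eta_{j+1}/(n+1)$ is its transpose, with $i$ and $j$ interchanged. Using the expression for $U_{(k)}^T$ in (5.31), we obtain:

$$\frac{1}{n+1} \eta_{i+1}^T U_{(k)}^T \eta_{j+1} = \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} \eta_{t,i} I(l = t + k,\; 1 \leq t \leq n - k) \eta_{l,j} = \frac{1}{n+1} \sum_{t=1}^{n-k} \eta_{t,i} \eta_{t+k,j} \xrightarrow[n \to \infty]{P} E(\eta_{1,i} \eta_{1+k,j}) = E(\eta_{1,i}) E(\eta_{1+k,j}) = 0.$$

The above result was obtained by using the fact that $\{\eta_{t,i} \eta_{t+k,j}\}_{t \geq 1}$ is a sequence of $k$-dependent, identically distributed random variables (condition (A0)-(ii)), so the quantity $\sum_{t=1}^{n-k} \eta_{t,i} \eta_{t+k,j}/(n+1)$ converges to $E(\eta_{1,i} \eta_{1+k,j})$ by the Weak Law of Large Numbers for $k$-dependent sequences of random variables. We conclude that the term $\eta_{i+1}^T U_{(k)}^T \eta_{j+1}/(n+1)$ is $o_P(1)$, so the first term in (5.49) is $o_P(1)$. Using a similar reasoning, we can show that the second term in (5.49) is also $o_P(1)$.

It remains to show (5.48).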
By the definition of $V$ in (5.28), we have:
\[
V \eta_{j+1} = \begin{pmatrix}
0 \\ \vdots \\ 0 \\
-\phi_R\, \eta_{1,j} \\
-\phi_{R-1}\, \eta_{1,j} - \phi_R\, \eta_{2,j} \\
\vdots \\
-\phi_1\, \eta_{1,j} - \phi_2\, \eta_{2,j} - \cdots - \phi_R\, \eta_{R,j}
\end{pmatrix},
\]
so $\|V \eta_{j+1}\|_2 = O_P(1)$ by assumption (A0)-(ii). A similar argument gives $\|V \eta_{i+1}\|_2 = O_P(1)$. Combining these results yields:
\[
\left| \frac{1}{n+1}\, \eta_{i+1}^T V^T V \eta_{j+1} \right|
\le \frac{1}{n+1} \|V \eta_{i+1}\|_2 \cdot \|V \eta_{j+1}\|_2
= \frac{1}{n+1} \cdot O_P(1) \cdot O_P(1) = O_P(1/n) = o_P(1),
\]
so $\eta_{i+1}^T V^T V \eta_{j+1}/(n+1)$ is $o_P(1)$. This completes our proof of Lemma 5.7.4.

The following lemma provides asymptotic approximations for non-random quantities involving the bias associated with estimating a smooth function $m(\cdot)$ via locally linear regression.

Lemma 5.7.5 Suppose the assumptions in Lemma 5.7.1 hold. Let $G$ be an $n \times (p+1)$ matrix defined as in (2.14) such that condition (A0)-(i) is satisfied, and $S_h$ be an $n \times n$ local linear smoother matrix defined as in (3.6)-(3.8). Set $m = (m(Z_1), \ldots, m(Z_n))^T$, where $m(\cdot)$ satisfies condition (A4) and the $Z_t$'s satisfy the design condition (A3). Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$:
\[
\frac{1}{n+1}\, G^T \Psi^{-1} (I - S_h) m
= -h^2\, \frac{\sigma_\epsilon^2}{\sigma_u^2}\, \frac{\nu_2(K)}{2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z)\, m''(z) f(z)\,dz + o(h^2) \tag{5.53}
\]
and
\[
\frac{1}{n(n+1)}\, G^T \Psi^{-1} 1 1^T (S_h - I) m
= h^2\, \frac{\sigma_\epsilon^2}{\sigma_u^2}\, \frac{\nu_2(K)}{2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z) f(z)\,dz \int_0^1 m''(z) f(z)\,dz + o(h^2), \tag{5.54}
\]
where $\int_0^1 g(z) f(z)\,dz$ and $\int_0^1 g(z)\, m''(z) f(z)\,dz$ are defined as in equations (2.18) and (2.19).

Proof: We first prove (5.53). Using the explicit expression for $\Psi^{-1}$ in equation (5.27), we write
\[
\frac{1}{n+1}\, G^T \Psi^{-1} (I - S_h) m
= \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1}\, G^T U^T U (I - S_h) m
- \frac{\sigma_\epsilon^2}{\sigma_u^2} \cdot \frac{1}{n+1}\, G^T V^T V (I - S_h) m,
\]
so it suffices to show that
\[
\frac{1}{n+1}\, G^T U^T U (I - S_h) m = -h^2\, \frac{\nu_2(K)}{2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g(z)\, m''(z) f(z)\,dz + o(h^2),
\qquad
\frac{1}{n+1}\, G^T V^T V (I - S_h) m = o(h^2).
\]
These facts follow by proving that
\[
\frac{1}{n+1}\, G_{i+1}^T U^T U (I - S_h) m = -h^2\, \frac{\nu_2(K)}{2} \left(1 - \sum_{k=1}^{R} \phi_k\right)^2 \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2), \tag{5.55}
\]
\[
\frac{1}{n+1}\, G_{i+1}^T V^T V (I - S_h) m = o(h^2), \tag{5.56}
\]
for any $i = 0, 1, \ldots, p$.

First, consider (5.55).
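Using the expression for $U^T U$ in (5.30), we have:
\[
\frac{1}{n+1}\, G_{i+1}^T U^T U (I - S_h) m
= \sum_{k=1}^{R} \phi_k \frac{1}{n+1}\, G_{i+1}^T \left(U_{(k)}^T + U_{(k)}\right) (S_h - I) m
- \sum_{p<q} \phi_p \phi_q \frac{1}{n+1}\, G_{i+1}^T \left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right) (S_h - I) m
- \sum_{k=1}^{R} \phi_k^2 \frac{1}{n+1}\, G_{i+1}^T U_{(k)}^T U_{(k)} (S_h - I) m
- \frac{1}{n+1}\, G_{i+1}^T (S_h - I) m. \tag{5.57}
\]
Thus, to establish (5.53), it suffices to show that the last two terms are $o(h^2)$ and the remaining terms can be approximated as:
\[
\sum_{k=1}^{R} \phi_k \frac{1}{n+1}\, G_{i+1}^T \left(U_{(k)}^T + U_{(k)}\right) (S_h - I) m
= h^2\, \frac{\nu_2(K)}{2} \left(2 \sum_{k=1}^{R} \phi_k\right) \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2), \tag{5.58}
\]
\[
\sum_{p<q} \phi_p \phi_q \frac{1}{n+1}\, G_{i+1}^T \left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right) (S_h - I) m
= h^2\, \frac{\nu_2(K)}{2} \left(2 \sum_{p<q} \phi_p \phi_q\right) \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2), \tag{5.59}
\]
\[
\sum_{k=1}^{R} \phi_k^2 \frac{1}{n+1}\, G_{i+1}^T U_{(k)}^T U_{(k)} (S_h - I) m
= h^2\, \frac{\nu_2(K)}{2} \left(\sum_{k=1}^{R} \phi_k^2\right) \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2), \tag{5.60}
\]
and
\[
\frac{1}{n+1}\, G_{i+1}^T (S_h - I) m = h^2\, \frac{\nu_2(K)}{2} \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2). \tag{5.61}
\]
The last result follows easily from result (4.66) of Lemma 4.6.10. To prove (5.58), it suffices to show that the equalities below hold for any $k = 1, \ldots, R$:
\[
\frac{1}{n+1}\, G_{i+1}^T U_{(k)}^T (S_h - I) m = h^2\, \frac{\nu_2(K)}{2} \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2), \tag{5.62}
\]
\[
\frac{1}{n+1}\, G_{i+1}^T U_{(k)} (S_h - I) m = h^2\, \frac{\nu_2(K)}{2} \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(h^2). \tag{5.63}
\]
Consider the left side in (5.62). Using the expression for $U_{(k)}^T$ in (5.31), the boundedness of $g_i(\cdot)$ (condition (A0)-(i)) and result (4.34) of Lemma 4.6.1 with $r = m$, we obtain:
\[
\frac{1}{n+1}\, G_{i+1}^T U_{(k)}^T (S_h - I) m
= \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} g_i(Z_t) \left[U_{(k)}^T\right]_{t,l} \left[(S_h - I) m\right]_l
= \frac{1}{n+1} \sum_{t=1}^{n} \sum_{l=1}^{n} g_i(Z_t)\, I(l = t+k,\ 1 \le t \le n-k) \left[B_m(K, Z_l, h) \cdot h^2 + o(h^2)\right]
= \frac{h^2}{n+1} \sum_{t=1}^{n-k} g_i(Z_t)\, B_m(K, Z_{t+k}, h) + o(h^2)
= h^2 (V_n + Q_n) + o(h^2), \tag{5.64}
\]
where $V_n = \sum_{t=1}^{n-k} g_i(Z_{t+k})\, B_m(K, Z_{t+k}, h)/(n+1)$ and $Q_n = \sum_{t=1}^{n-k} \left(g_i(Z_t) - g_i(Z_{t+k})\right) B_m(K, Z_{t+k}, h)/(n+1)$.

A Riemann integration argument allows us to approximate $V_n$ as
\[
V_n = \frac{1}{n+1} \sum_{t=1}^{n} g_i(Z_t)\, B_m(K, Z_t, h) - \frac{1}{n+1} \sum_{t=1}^{k} g_i(Z_t)\, B_m(K, Z_t, h)
= \frac{\nu_2(K)}{2} \int_0^1 g_i(z)\, m''(z) f(z)\,dz + o(1).
\]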
The last equality was obtained by using the fact that $k$ does not depend on $n$ and $g_i(\cdot)$ is bounded (condition (A0)-(i)). We also used the fact that $B_m(K, z, h)$ is bounded for all $z \in [0,1]$ and $h \le h_0$, with $h_0 \in [0, 1/2]$ small enough, by result (4.35) of Lemma 4.6.1, Lemma 4.6.2 and condition (A4).

Using the fact that $g_i(\cdot)$ is Lipschitz continuous with Lipschitz constant $C_i$ (condition (A0)-(i)) and that the $Z_t$'s satisfy the design condition (A3), we bound $Q_n$ as:
\[
|Q_n| \le \frac{1}{n+1} \sum_{t=1}^{n-k} |g_i(Z_t) - g_i(Z_{t+k})| \cdot |B_m(K, Z_{t+k}, h)|
\le \frac{C}{n+1} \sum_{t=1}^{n-k} \sum_{l=0}^{k-1} |g_i(Z_{t+l}) - g_i(Z_{t+l+1})|
\le \frac{C\, C_i}{n+1} \sum_{t=1}^{n-k} \sum_{l=0}^{k-1} |Z_{t+l} - Z_{t+l+1}|
= o(1),
\]
where $C$ denotes a bound on $|B_m(K, \cdot, h)|$. Substituting the results concerning $V_n$ and $Q_n$ in (5.64) yields (5.62). A similar argument can be used to prove (5.63). The only difference in the proofs is that the range of summation for $t$ in $\sum_t g_i(Z_t)\, B_m(K, Z_{t+k}, h)$ changes. Combining (5.62) and (5.63) yields (5.58). Similar arguments can be employed to obtain results (5.59) and (5.60).

It remains to prove (5.56). By the definition of $V$ in (5.28), we have:
\[
V (I - S_h) m = \begin{pmatrix}
0 \\ \vdots \\ 0 \\
-\phi_R \left[(I - S_h) m\right]_1 \\
-\phi_{R-1} \left[(I - S_h) m\right]_1 - \phi_R \left[(I - S_h) m\right]_2 \\
\vdots \\
-\phi_1 \left[(I - S_h) m\right]_1 - \phi_2 \left[(I - S_h) m\right]_2 - \cdots - \phi_R \left[(I - S_h) m\right]_R
\end{pmatrix}.
\]
But $\|V (I - S_h) m\|_2 = O(h^2)$, since for $i = 1, \ldots, R$, $\left[(S_h - I) m\right]_i = O(h^2)$ by Lemma 4.6.2. We know that $\|V G_{i+1}\|_2 = O(1)$. Combining these results yields:
\[
\left| \frac{1}{n+1}\, G_{i+1}^T V^T V (I - S_h) m \right|
\le \frac{1}{n+1} \|V G_{i+1}\|_2 \cdot \|V (I - S_h) m\|_2
= \frac{1}{n+1} \cdot O(1) \cdot O(h^2) = O(h^2/n) = o(h^2),
\]
so $G_{i+1}^T V^T V (I - S_h) m/(n+1)$ is $o(h^2)$.

Result (5.54) follows easily, by writing:
\[
\frac{1}{n(n+1)}\, G^T \Psi^{-1} 1 1^T (S_h - I) m
= \left( \frac{1}{n+1}\, G^T \Psi^{-1} 1 \right) \cdot \left( \frac{1}{n+1}\, 1^T (S_h - I) m \right) \cdot \left( 1 + \frac{1}{n} \right)
\]
and using Lemma 5.7.3 and result (4.66) of Lemma 4.6.10 with $G$ replaced with $1$. The proof of Lemma 5.7.5 is now complete.

Let $V_\Psi$ be the matrix defined in (2.22). The next result concerns the existence of an inverse for $V_\Psi$ and provides an explicit expression for this inverse.
We do not provide a proof for this result, as one can easily verify that $V_\Psi V_\Psi^{-1} = V_\Psi^{-1} V_\Psi = I$ by using the expression of $V_\Psi^{-1}$ given in Lemma 5.7.6 below.

Lemma 5.7.6 Let $V_\Psi$ be the $(p+1) \times (p+1)$ matrix introduced in (2.22) and define the $p \times 1$ vector $a$ as $a = \left(\int_0^1 g_1(z) f(z)\,dz, \ldots, \int_0^1 g_p(z) f(z)\,dz\right)^T$. Here, $f(\cdot)$ is a design density satisfying condition (A3) and $g_1(\cdot), \ldots, g_p(\cdot)$ are smooth functions satisfying condition (A0)-(i). Furthermore, let $\Sigma = (\Sigma_{ij})$ be the variance-covariance matrix introduced in condition (A0)-(ii). Then $V_\Psi^{-1}$ exists and is given by:
\[
V_\Psi^{-1} = \frac{\sigma_u^2}{\sigma_\epsilon^2}
\begin{pmatrix}
\dfrac{1}{\left(1 - \sum_{k=1}^{R} \phi_k\right)^2} + \dfrac{a^T \Sigma^{-1} a}{1 + \sum_{k=1}^{R} \phi_k^2} & -\dfrac{a^T \Sigma^{-1}}{1 + \sum_{k=1}^{R} \phi_k^2} \\[2ex]
-\dfrac{\Sigma^{-1} a}{1 + \sum_{k=1}^{R} \phi_k^2} & \dfrac{\Sigma^{-1}}{1 + \sum_{k=1}^{R} \phi_k^2}
\end{pmatrix},
\]
provided $\Sigma^{-1}$ exists.

The last result in this Appendix provides a useful asymptotic bound.

Lemma 5.7.7 Suppose the assumptions in Theorem 5.2.1 hold. Then, if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$, we have:
\[
\frac{1}{(n+1)^2}\, V_\Psi^{-1} G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G V_\Psi^{-1} = O(n^{-1}). \tag{5.65}
\]

Proof: To prove (5.65) it suffices to show that
\[
\frac{1}{(n+1)^2}\, G^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G = O(n^{-1}), \tag{5.66}
\]
since the elements of the $(p+1) \times (p+1)$ matrix $V_\Psi^{-1}$ do not depend upon $n$. Result (5.66) follows by showing that $G_{i+1}^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G_{j+1}/(n+1)^2$ is $O(n^{-1})$ for any $i, j = 0, 1, \ldots, p$.

Let $i, j = 0, 1, \ldots, p$ be fixed. Using vector and matrix norm properties introduced in Section 2.4, we obtain:
\[
\left| \frac{1}{(n+1)^2}\, G_{i+1}^T \Psi^{-1} (I - S_h^c) \Psi (I - S_h^c)^T \Psi^{-1} G_{j+1} \right|
\le \frac{1}{(n+1)^2} \left\| (I - S_h^c)^T \Psi^{-1} G_{i+1} \right\|_2 \cdot \|\Psi\|_S \cdot \left\| (I - S_h^c)^T \Psi^{-1} G_{j+1} \right\|_2
\le O(n^{-2}) \cdot \left\| (I - S_h^c)^T \Psi^{-1} G_{i+1} \right\|_2 \cdot \left\| (I - S_h^c)^T \Psi^{-1} G_{j+1} \right\|_2, \tag{5.67}
\]
since $\|\Psi\|_S$ is $O(1)$ by result (5.34) of Lemma 5.7.2. Thus, it suffices to show that $\| (I - S_h^c)^T \Psi^{-1} G_{i+1} \|_2$ and $\| (I - S_h^c)^T \Psi^{-1} G_{j+1} \|_2$ are $O(n^{1/2})$.

Let $v = \Psi^{-1} G_{j+1}$; using $S_h^c = (I - 1 1^T/n) S_h$ (equation (3.1) with $S = S_h$) we write:
\[
\left\| (I - S_h^c)^T \Psi^{-1} G_{j+1} \right\|_2
\le \left\| \Psi^{-1} G_{j+1} \right\|_2 + \left\| (S_h^c)^T \Psi^{-1} G_{j+1} \right\|_2
= \|v\|_2 + \left\| (S_h^c)^T v \right\|_2
= \|v\|_2 + \left\| S_h^T \left( I - \frac{1 1^T}{n} \right) v \right\|_2
\le \|v\|_2 + \left\| S_h^T v \right\|_2 + \left\| S_h^T \left( 1 1^T v/n \right) \right\|_2. \tag{5.68}
\]
If $v_t$ denotes the $t$-th component of $v$, we can show that there exists $C > 0$ such that $|v_t| \le C$ for all $t = 1, \ldots, n$. Indeed, by the expression for $\Psi^{-1}$ in (5.27) and Comment 5.7.1, we have:
\[
v^T = G_{j+1}^T \Psi^{-1}
= \frac{\sigma_\epsilon^2}{\sigma_u^2}\, G_{j+1}^T U^T U - \frac{\sigma_\epsilon^2}{\sigma_u^2}\, G_{j+1}^T V^T V
= \frac{\sigma_\epsilon^2}{\sigma_u^2} \left[ G_{j+1}^T - \sum_{k=1}^{R} \phi_k\, G_{j+1}^T \left(U_{(k)}^T + U_{(k)}\right) + \sum_{p<q} \phi_p \phi_q\, G_{j+1}^T \left(U_{(p)}^T U_{(q)} + U_{(q)}^T U_{(p)}\right) + \sum_{k=1}^{R} \phi_k^2\, G_{j+1}^T U_{(k)}^T U_{(k)} \right] - \frac{\sigma_\epsilon^2}{\sigma_u^2}\, G_{j+1}^T V^T V,
\]
so it suffices to show that the quantities $G_{j+1}^T U_{(k)}^T$, $G_{j+1}^T U_{(k)}$, $G_{j+1}^T U_{(k)}^T U_{(k)}$ and $G_{j+1}^T U_{(p)}^T U_{(q)}$ have bounded components for all $k, p, q = 1, \ldots, R$, $p < q$, and $G_{j+1}^T V^T V$ also has bounded components. These facts follow easily from the sparseness of $U_{(k)}$ and $V$ (see Comment 5.7.1) and the boundedness of $G_{j+1}$'s components (condition (A0)-(i)).

The boundedness of $v$'s components implies that $\|v\|_2$ is $O(n^{1/2})$, $\|S_h^T v\|_2$ is $O(n^{1/2})$ and $\|S_h^T (1 1^T v/n)\|_2$ is $O(n^{1/2})$. The last two results follow by result (4.49) of Lemma 4.6.6. Using these asymptotic bounds in (5.68) yields that $\| (I - S_h^c)^T \Psi^{-1} G_{j+1} \|_2$ is $O(n^{1/2})$. A similar argument gives that $\| (I - S_h^c)^T \Psi^{-1} G_{i+1} \|_2$ is $O(n^{1/2})$.

Chapter 6
Choosing the Correct Amount of Smoothing

The estimators of the linear effects in model (2.1) considered in this thesis depend on a smoothing parameter $h$. This parameter has a dual function. On one hand, it influences the statistical properties of the estimated linear effects. On the other hand, it controls the shape and smoothness of the estimated non-linear effect. Our focus in this chapter is on developing data-driven methods for choosing $h$ so that we obtain accurate estimators for the linear effects of interest. These methods may not be the most appropriate for accurate estimation of the non-linear effect, as they may undersmooth its estimator.

This chapter is organized as follows. In Section 6.1, we introduce some useful notation.
In Section 6.2, we introduce methods for choosing the correct smoothing parameter for the usual and modified local linear backfitting estimators of the linear effects of interest in model (2.1). These methods require the accurate estimation of the nonparametric component $m$ and the error correlation structure, topics discussed in Section 6.3. Finally, in Section 6.4 we introduce methods for choosing the correct smoothing parameter for the estimated modified local linear backfitting estimators of the linear effects of interest in model (2.1).

6.1 Notation

In what follows we are interested in the accurate estimation of a linear combination $c^T \beta$ of the linear effects $\beta$ in model (2.1), where $c = (c_0, c_1, \ldots, c_p)^T$ is a known vector with real-valued components (e.g., $c = (0, \ldots, 0, 1, 0, \ldots, 0)^T$).

Throughout this chapter, we denote $\widehat{\beta}_{I, S_h^c}$ and $\widehat{\beta}_{\Psi^{-1}, S_h^c}$ generically by $\widehat{\beta}_{\Omega, h}$ in order to emphasize their dependence upon the amount of smoothing $h$. We want to choose the amount of smoothing $h$ to accurately estimate $c^T \beta$ via $c^T \widehat{\beta}_{\Omega, h}$. Given that $\widehat{\beta}_{\widehat{\Psi}^{-1}, S_h^c}$ is conceptually different from the other estimators considered here, we defer its discussion to Section 6.4. In the remainder of this chapter, unless otherwise stated, we assume that $\Omega$ stands for $I$ or $\Psi^{-1}$.

The correct choice of $h$ depends on the conditional bias and variance of $c^T \widehat{\beta}_{\Omega, h}$ given $X$ and $Z$. We provide below explicit expressions for these quantities.

The exact conditional variance of $c^T \widehat{\beta}_{\Omega, h}$ equals $c^T Var(\widehat{\beta}_{\Omega, h} | X, Z)\, c$. Expressions for $Var(\widehat{\beta}_{\Omega, h} | X, Z)$ are found in (4.17) when $\Omega = I$ and in (5.15) when $\Omega = \Psi^{-1}$. Thus:
\[
c^T Var(\widehat{\beta}_{\Omega, h} | X, Z)\, c = \sigma_\epsilon^2\, c^T M_{\Omega, h} \Psi M_{\Omega, h}^T c \equiv Var(h; \Omega), \tag{6.1}
\]
where
\[
M_{\Omega, h} = \left( X^T \Omega (I - S_h^c) X \right)^{-1} X^T \Omega (I - S_h^c). \tag{6.2}
\]
The exact conditional bias of $c^T \widehat{\beta}_{\Omega, h}$ equals $c^T Bias(\widehat{\beta}_{\Omega, h} | X, Z)$ and can be obtained from (4.2) when $\Omega = I$ or (5.2) when $\Omega = \Psi^{-1}$:
\[
c^T Bias(\widehat{\beta}_{\Omega, h} | X, Z) = c^T M_{\Omega, h}\, m \equiv Bias(h; \Omega). \tag{6.3}
\]

6.2 Choosing h for $c^T \widehat{\beta}_{I, S_h^c}$ and $c^T \widehat{\beta}_{\Psi^{-1}, S_h^c}$

The estimator $c^T \widehat{\beta}_{\Omega, h}$ of $c^T \beta$ depends on the smoothing parameter $h$. To obtain an accurate estimator of $c^T \beta$, we choose $h$ so that it minimizes a measure of accuracy of $c^T \widehat{\beta}_{\Omega, h}$. Although the smoothing parameter $h$ quantifies the degree of smoothness of $\widehat{m}_{\Omega, h}$, a 'good' value for $h$ should not necessarily be chosen to minimize a measure of accuracy of $\widehat{m}_{\Omega, h}$ as, in the present context, $m$ is merely a nuisance.

Since $c^T \widehat{\beta}_{\Omega, h}$ is generally biased in finite samples, we assess its accuracy via its exact conditional mean squared error, given $X$ and $Z$:
\[
MSE(h; \Omega) = Bias(h; \Omega)^2 + Var(h; \Omega). \tag{6.4}
\]
We define the MSE-optimal amount of smoothing for estimating $c^T \beta$ via $c^T \widehat{\beta}_{\Omega, h}$ as the minimizer of MSE:
\[
h_{MSE}^{\Omega} = \arg\min_h MSE(h; \Omega). \tag{6.5}
\]
From equations (6.1) and (6.3), one can see that $h_{MSE}^{\Omega}$ depends upon the unknown nonparametric component $m$ as well as the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\Psi$, which are typically unknown. Thus, $h_{MSE}^{\Omega}$ is not directly available.

To date, no methods have been proposed for estimating $h_{MSE}^{\Omega}$ when the model errors are correlated. However, when the model errors are uncorrelated and $\Omega = I$, Opsomer and Ruppert (1999) proposed an empirical bias bandwidth selection (EBBS) method for estimating $h_{MSE}^{I}$. We describe this method in Section 6.2.1. We propose modifications of the EBBS method to estimate $h_{MSE}^{\Omega}$ when the errors are correlated, not only for $\Omega = I$ but also for $\Omega = \Psi^{-1}$, in Section 6.2.2. Finally, in Section 6.2.3 we propose a non-asymptotic plug-in method for estimating $h_{MSE}^{\Omega}$ in the presence of error correlation when $\Omega$ equals $I$ or $\Psi^{-1}$. Each method minimizes an estimator of $MSE(\cdot; \Omega)$ over $h$ in some grid. Throughout, we let $\mathcal{H} = \{h(1), \ldots, h(N)\}$ denote the grid, for some integer $N$.

6.2.1 Review of Opsomer and Ruppert's EBBS method

In this section, we provide a detailed review of Opsomer and Ruppert's EBBS method.
Throughout this section only, we assume that the errors associated with model (2.1) satisfy $\Psi = I$. Specifically, we assume that these errors satisfy the assumption:

(A6) The model errors $\epsilon_i$, $i = 1, \ldots, n$, are independent, identically distributed, having mean 0 and variance $\sigma_\epsilon^2 \in (0, \infty)$.

We also consider $\Omega = I$, so that the results in this section will apply exclusively to $c^T \widehat{\beta}_{I, h}$, the usual local linear backfitting estimator of $c^T \beta$.

Under the above conditions, the EBBS method attempts to estimate $h_{MSE}^{I}$ by minimizing an estimator of $MSE(\cdot; I)$ over $\mathcal{H}$, a grid of possible values of $h$. For a given $h(j) \in \mathcal{H}$, Opsomer and Ruppert find an estimator for $MSE(h(j); I)$ by combining an empirical estimator of $Bias(h(j); I)$ with a residual-based estimator of $Var(h(j); I)$. We discuss the details related to computing these estimators below.

Opsomer and Ruppert use a higher order asymptotic expression for $E(c^T \widehat{\beta}_{I, h} | X, Z) - c^T \beta$, the exact conditional bias of $c^T \widehat{\beta}_{I, h}$, to obtain:
\[
E(c^T \widehat{\beta}_{I, h} | X, Z) - c^T \beta = \sum_{t=2}^{T+1} a_t h^t + o(h^{T+1}) \tag{6.6}
\]
as $h \to 0$, where $a_t$, $t = 2, \ldots, T+1$, are unknown asymptotic constants referred to as bias coefficients. This expression can be obtained by a more delicate Taylor series analysis in (4.3). This yields the approximation:
\[
E(c^T \widehat{\beta}_{I, h} | X, Z) = c^T \beta + \sum_{t=2}^{T+1} a_t h^t + o(h^{T+1}). \tag{6.7}
\]
For fixed $h(j) \in \mathcal{H}$, Opsomer and Ruppert estimate $Bias(h(j); I)$, the exact conditional bias of $c^T \widehat{\beta}_{I, h(j)}$, as follows. They calculate $c^T \widehat{\beta}_{I, h(k)}$ for $k \in \{j - k_1, \ldots, j + k_2\}$, for some $k_1$, $k_2$. Note that $j$ must be between $k_1 + 1$ and $N - k_2$, inclusive. They then fit the model:
\[
E(c^T \widehat{\beta}_{I, h} | X, Z) = a_0 + a_2 \cdot h^2 + \cdots + a_{T+1} \cdot h^{T+1} \tag{6.8}
\]
to the 'data' $\left\{ \left( h(k), c^T \widehat{\beta}_{I, h(k)} \right) : k = j - k_1, \ldots, j + k_2 \right\}$ using ordinary least squares. This results in the fit:
\[
\widehat{E}(c^T \widehat{\beta}_{I, h} | X, Z) = \widehat{a}_0 + \widehat{a}_2 \cdot h^2 + \cdots + \widehat{a}_{T+1} \cdot h^{T+1}. \tag{6.9}
\]
An estimator for $Bias(h(j); I)$ is then:
\[
\widehat{Bias}(h(j); I) = \widehat{E}(c^T \widehat{\beta}_{I, h(j)} | X, Z) - \widehat{a}_0 = \widehat{a}_2 \cdot h(j)^2 + \cdots + \widehat{a}_{T+1} \cdot h(j)^{T+1}. \tag{6.10}
\]
Here, $k_1$, $k_2$ and $T$ are tuning parameters that must be chosen by the user. We must have $k_1 + k_2 > T$ since the $T + 1$ parameters $a_0, a_2, \ldots, a_{T+1}$ will be estimated using $k_1 + k_2 + 1$ 'data' points.

Opsomer and Ruppert estimate $Var(h(j); I)$, the exact conditional variance of $c^T \widehat{\beta}_{I, h(j)}$, by using (6.1) with $\Psi = I$ but with $\sigma_\epsilon^2$ replaced by the following residual-based estimator:
\[
\widehat{\sigma}_\epsilon^2 = \frac{\left\| Y - X \widehat{\beta}_{I, h(j)} - \widehat{m}_{I, h(j)} \right\|_2^2}{n}.
\]
This yields:
\[
\widehat{Var}(h(j); I) = \widehat{\sigma}_\epsilon^2\, c^T M_{I, h(j)} M_{I, h(j)}^T c. \tag{6.11}
\]
Finally, Opsomer and Ruppert combine (6.10) and (6.11) to obtain the following estimator of $MSE(h(j); I)$, $k_1 + 1 \le j \le N - k_2$:
\[
\widehat{MSE}(h(j); I) = \widehat{Bias}(h(j); I)^2 + \widehat{Var}(h(j); I).
\]
They then estimate $h_{MSE}^{I}$, the minimizer of $MSE(\cdot; I)$, as follows:
\[
\widehat{h}_{MSE}^{I} = \arg\min_{k_1 + 1 \le j \le N - k_2} \widehat{MSE}(h(j); I).
\]
We see that $\widehat{h}_{MSE}^{I}$ attempts to estimate $h_{MSE}^{I}$, the smoothing parameter which is MSE-optimal for estimation of $c^T \beta$. It is not clear however whether using $\widehat{h}_{MSE}^{I}$ yields a $\sqrt{n}$-consistent estimator of $c^T \beta$.

The variance estimator $\widehat{Var}(h(j); I)$ in (6.11) depends on the matrix $M_{I, h(j)}$. To speed computation of $M_{I, h(j)}$, and hence $\widehat{Var}(h(j); I)$, Opsomer and Ruppert suggest the following. First, take $\Omega = I$ and $h = h(j)$ in (6.2) and re-arrange to obtain an alternative expression for $M_{I, h(j)}$:
\[
M_{I, h(j)} = \left( X^T (I - S_{h(j)}^c) X \right)^{-1} X^T (I - S_{h(j)}^c) = \left( X^T (X - S_{h(j)}^c X) \right)^{-1} \left( X^T - X^T S_{h(j)}^c \right). \tag{6.12}
\]
Then, compute the product $S_{h(j)}^c X$ in (6.12) by smoothing each column of $X$ against the design points $Z_1, \ldots, Z_n$. Finally, compute the product $X^T S_{h(j)}^c$ in (6.12) by using the approximation $X^T S_{h(j)}^c \approx (S_{h(j)}^c X)^T$. This approximation is justified by the near symmetry of $S_{h(j)}^c$. These computational tricks can also be used to ease the burden involved in calculating $\widehat{\beta}_{I, h(j)}$, $h(j) \in \mathcal{H}$, as $\widehat{\beta}_{I, h(j)}$ can be easily seen to depend upon $M_{I, h(j)}$.

A peculiar feature of the estimator $\widehat{\sigma}_\epsilon^2$ of $\sigma_\epsilon^2$ is that it uses residuals based on the 'working' bandwidth $h(j) \in \mathcal{H}$, instead of a bandwidth optimized for estimation of $\sigma_\epsilon^2$.
As an alternative to estimating $\sigma_\epsilon^2$ with the 'working' bandwidth $h$, Opsomer and Ruppert suggest that one could use residuals based on a bandwidth optimized for estimation of $\sigma_\epsilon^2$ as in Opsomer and Ruppert (1998).

For implementing the EBBS method in practice, Opsomer and Ruppert (1999) suggest using a grid size $N = 18$ and grid values equally spaced on the log scale. They recommend using the following values for the tuning parameters involved in this method: $k_1 = 1$, $k_2 = 2$ and $T = 1$. For situations where $\widehat{MSE}(\cdot; I)$ is found to have more than one minimum as a function of $h$, they suggest that one could take $\widehat{h}_{MSE}^{I}$ to be either the $h$ value where the global minimum occurs, or the $h$ value where the first local minimum occurs. The authors advise that they found the former approach to be superior to the latter in their simulation studies.

6.2.2 Modifications to the EBBS method

Here we adjust the EBBS method to deal with estimating $h_{MSE}^{\Omega}$ when the model errors are correlated and $\Omega = I$ or $\Omega = \Psi^{-1}$. The modified EBBS method attempts to estimate $h_{MSE}^{\Omega}$ by minimizing an estimator of $MSE(\cdot; \Omega)$ over the grid $\mathcal{H}$. For a given $h(j) \in \mathcal{H}$, this estimator is obtained by combining an empirical estimator of $Bias(h(j); \Omega)$, the exact conditional bias of $c^T \widehat{\beta}_{\Omega, h}$, with a residual-based estimator of $Var(h(j); \Omega)$, the exact conditional variance of $c^T \widehat{\beta}_{\Omega, h}$. Specifics are provided below.

The modified EBBS method uses a similar bias-estimation scheme to that employed in the EBBS method in order to estimate $Bias(h(j); \Omega)$. This scheme relies on the following asymptotic bias approximation:
\[
E(c^T \widehat{\beta}_{\Omega, h} | X, Z) = c^T \beta + \sum_{t=2}^{T+1} a_t h^t + o(h^{T+1}), \tag{6.13}
\]
which parallels (6.7), and yields the estimator $\widehat{Bias}(h; \Omega)$.

However, the modified EBBS can no longer rely on the residual estimation scheme utilized in the EBBS method for estimating $Var(h(j); \Omega)$. The reason for this is that $Var(h(j); \Omega)$ depends not only on the error variance $\sigma_\epsilon^2$, but also on the error correlation matrix $\Psi$.
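The local bias-estimation scheme that the modified EBBS method borrows from Section 6.2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `log_spaced_grid` and `ebbs_local_bias` are hypothetical, the grid index `j` is zero-based (unlike the one-based indexing of the text), and the values of $c^T \widehat{\beta}_{\Omega, h(k)}$ are assumed to have been computed beforehand by whatever backfitting routine is in use.

```python
import numpy as np


def log_spaced_grid(h_min, h_max, n_grid=18):
    """Grid of smoothing parameters equally spaced on the log scale."""
    return np.geomspace(h_min, h_max, n_grid)


def ebbs_local_bias(h_grid, estimates, j, k1=1, k2=2, T=1):
    """Empirical bias estimate at h_grid[j] (zero-based index j).

    estimates[k] is assumed to hold the estimate of the linear
    combination of interest computed with smoothing h_grid[k].
    """
    lo, hi = j - k1, j + k2
    if lo < 0 or hi >= len(h_grid):
        raise ValueError("j must leave k1 points below and k2 points above")
    if k1 + k2 <= T:
        raise ValueError("need k1 + k2 > T")
    h_loc = np.asarray(h_grid[lo:hi + 1], dtype=float)
    y_loc = np.asarray(estimates[lo:hi + 1], dtype=float)
    powers = np.arange(2, T + 2)  # exponents 2, ..., T + 1
    # Design matrix with columns 1, h^2, ..., h^(T+1); no linear term.
    design = np.column_stack([np.ones_like(h_loc)] +
                             [h_loc ** t for t in powers])
    coef, *_ = np.linalg.lstsq(design, y_loc, rcond=None)
    a_hat = coef[1:]  # fitted bias coefficients a_2, ..., a_(T+1)
    # Fitted value minus the fitted intercept, evaluated at h_grid[j].
    return float(np.sum(a_hat * h_grid[j] ** powers))
```

For example, if the estimates happened to follow `3.0 + 2.0 * h**2` exactly, the function would return `2.0 * h_grid[j]**2` at any admissible `j`, that is, the part of the fitted curve that vanishes as `h` tends to zero.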
For $\Omega = I$, we propose to estimate $Var(h; \Omega)$ via:
\[
\widehat{Var}(h; \Omega) = \begin{cases}
\widehat{\sigma}_\epsilon^2\, c^T M_{\Omega, h} \Psi M_{\Omega, h}^T c, & \text{if } \Psi \text{ is known and } \sigma_\epsilon^2 \text{ is unknown;} \\
\sigma_\epsilon^2\, c^T M_{\Omega, h} \widehat{\Psi} M_{\Omega, h}^T c, & \text{if } \Psi \text{ is unknown and } \sigma_\epsilon^2 \text{ is known;} \\
\widehat{\sigma}_\epsilon^2\, c^T M_{\Omega, h} \widehat{\Psi} M_{\Omega, h}^T c, & \text{if } \Psi \text{ is unknown and } \sigma_\epsilon^2 \text{ is unknown.}
\end{cases} \tag{6.14}
\]
For $\Omega = \Psi^{-1}$, if $\sigma_\epsilon^2$ is unknown, we propose to estimate $Var(h; \Omega)$ via:
\[
\widehat{Var}(h; \Omega) = \widehat{\sigma}_\epsilon^2\, c^T M_{\Omega, h} \Psi M_{\Omega, h}^T c. \tag{6.15}
\]
The estimators in (6.14)-(6.15) have been obtained from (6.1) by substituting $\widehat{\sigma}_\epsilon^2$ for $\sigma_\epsilon^2$ and $\widehat{\Psi}$ for $\Psi$ whenever appropriate. Details on how to obtain reasonable estimators $\widehat{\sigma}_\epsilon^2$ and $\widehat{\Psi}$ are provided in Section 6.3.2.

In summary, the modified EBBS method finds the minimizer:
\[
\widehat{h}_{MSE}^{\Omega} = \arg\min_{h \in \mathcal{H}} \left\{ \widehat{Bias}(h; \Omega)^2 + \widehat{Var}(h; \Omega) \right\} \equiv \widehat{h}_{EBBS-L}^{\Omega}, \tag{6.16}
\]
with $\mathcal{H} = \{h(1), \ldots, h(N)\}$, $\widehat{Bias}(h; \Omega)$ obtained via the bias-estimation scheme described earlier, and $\widehat{Var}(h; \Omega)$ as in (6.14) if $\Omega = I$ or as in (6.15) if $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ is unknown. Here, the label 'EBBS-L' denotes the fact that the modified EBBS method estimates $Bias(h; \Omega)$ by local ordinary least squares regression.

It is possible to estimate $Bias(h; \Omega)$ by performing global, rather than local, ordinary least squares fitting. Specifically, we can perform just one least squares regression, using the 'data' $\left\{ \left( h(k), c^T \widehat{\beta}_{\Omega, h(k)} \right) : k = 1, \ldots, N \right\}$. We refer to the method that finds the minimizer of (6.16), with $\widehat{Bias}(h; \Omega)$ obtained by global ordinary least squares fitting, as the global modified EBBS method. We denote the amount of smoothing this method yields by $\widehat{h}_{EBBS-G}^{\Omega}$.

Before concluding this section, we indicate how the modified EBBS methods can be generalized if one is interested in smoothing parameter selection for accurate estimation of $c^T \beta$ via the usual or modified local polynomial backfitting estimators. For simplicity, in this section only, we denote both of these estimators by $c^T \widehat{\beta}_{\Omega, h}$.

The variance-estimation scheme to be used in the generalized modified EBBS methods should be the same as that employed in (6.14)-(6.15).
Obviously, the quantities $\widehat{\sigma}_\epsilon^2$, $M_{\Omega, h}$ and $\widehat{\Psi}$ involved in these equations should be computed based on locally polynomial regression of degree $D > 1$, instead of locally linear regression. We conjecture that the bias-estimation scheme would have to rely on the asymptotic approximation
\[
E(c^T \widehat{\beta}_{\Omega, h} | X, Z) = c^T \beta + \sum_{t=D+1}^{T+1} a_t h^t + o(h^{T+1})
\]
instead of (6.13). Note that we must have $T > D$.

6.2.3 Plug-in method

In this section we introduce yet another method for estimating the optimal amount of smoothing $h_{MSE}^{\Omega}$ in the presence of error correlation whenever $\Omega = I$ or $\Omega = \Psi^{-1}$, namely the non-asymptotic plug-in method.

Recall that $h_{MSE}^{\Omega}$ was defined as the minimizer of $MSE(\cdot; \Omega)$ in (6.4). Thus, we might find a reasonable estimator for $h_{MSE}^{\Omega}$ by minimizing an estimator of $MSE(\cdot; \Omega)$ over a grid of possible values for the smoothing parameter $h$. We propose estimating $MSE(h(j); \Omega)$ by assembling plug-in estimators of its exact bias and variance components, $Bias(h(j); \Omega)$ and $Var(h(j); \Omega)$.

More specifically, we propose to estimate $Var(h(j); \Omega)$ using (6.14) if $\Omega = I$ and (6.15) if $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ is unknown. Furthermore, we propose to estimate $Bias(h(j); \Omega)$ using (6.3), but with $m$ replaced by an accurate estimator $\widehat{m}$:
\[
\widehat{Bias}(h; \Omega) = c^T M_{\Omega, h}\, \widehat{m}. \tag{6.17}
\]
Details on how to obtain an accurate estimator $\widehat{m}$ of $m$ are provided in Section 6.3.1. As remarked before, when $\Omega = \Psi^{-1}$, $M_{\Omega, h}$ depends upon the error correlation matrix $\Psi$. Thus, if $\Psi$ is unknown, we must substitute $\widehat{\Psi}$ for $\Psi$ in the expression for $M_{\Omega, h}$, where $\widehat{\Psi}$ is obtained as in Section 6.3.2.

Finally, minimizing the estimator of $MSE(\cdot; \Omega)$ obtained by combining (6.17) with (6.14) for $\Omega = I$, or (6.15) for $\Omega = \Psi^{-1}$ and $\sigma_\epsilon^2$ unknown, over a grid of possible values for $h$ yields the desired plug-in estimator of $h_{MSE}^{\Omega}$:
\[
\widehat{h}_{MSE}^{\Omega} = \arg\min_{h \in \mathcal{H}} \left\{ \widehat{Bias}(h; \Omega)^2 + \widehat{Var}(h; \Omega) \right\} \equiv \widehat{h}_{PLUG-IN}^{\Omega}. \tag{6.18}
\]

6.3 Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$

Here we introduce methods for (1) accurately estimating the nonparametric component $m$ in model (2.1) in the presence of error correlation and (2) estimating the variance $\sigma_\epsilon^2$ and the correlation matrix $\Psi$ of the errors associated with model (2.1). Estimating $m$, $\sigma_\epsilon^2$ and $\Psi$ is difficult because of the confounding between the linear, non-linear and correlation effects. We hope that the combined way of estimating $m$, $\sigma_\epsilon^2$ and $\Psi$ proposed in this thesis will enable us to do well when estimating $\beta$.

6.3.1 Estimating $m$

In this section, we consider the issue of accurately estimating the nonparametric component $m$ in model (2.1) when the model errors are correlated. Recall that we need an accurate estimator of $m$ for estimating the exact conditional bias of $c^T \widehat{\beta}_{\Omega, h}$ in the plug-in method in (6.17).

We propose estimating $m$ via $\widehat{m}_{\Omega, h}$, with $\Omega = I$ and with $h$ chosen by cross-validation, modified for correlated errors. Throughout this section, we thus consider that $\Omega = I$. We also let $X_i^T = (1, X_{i1}, \ldots, X_{ip})$ denote the $i$-th row of the matrix $X$ in (2.2).

To assess the accuracy of $\widehat{m}_{I, h}$ as an estimator of $m$ for a given amount of smoothing $h$, we use the mean average squared error of $\widehat{m}_{I, h}$:
\[
MASE(h; I) = E\left[ \frac{1}{n} \sum_{i=1}^{n} \left( \widehat{m}_{I, h}(Z_i) - m(Z_i) \right)^2 \,\Big|\, X, Z \right]
= E\left[ \frac{1}{n} \sum_{i=1}^{n} \left( \widehat{m}_{I, h}(Z_i) + X_i^T \beta - m(Z_i) - X_i^T \beta \right)^2 \,\Big|\, X, Z \right]
= E\left[ \frac{1}{n} \sum_{i=1}^{n} \left( \widehat{Y}_i - E(Y_i | X_i, Z_i) \right)^2 \,\Big|\, X, Z \right], \tag{6.19}
\]
where $\widehat{Y}_i = \widehat{m}_{I, h}(Z_i) + X_i^T \beta$ and $E(Y_i | X_i, Z_i) = m(Z_i) + X_i^T \beta$. We define the MASE-optimal amount of smoothing for accurate estimation of $m$ via $\widehat{m}_{I, h}$ as:
\[
h_{MASE} = \arg\min_h MASE(h; I). \tag{6.20}
\]
From (6.19) we can see that $h_{MASE}$ depends on the bias and variance associated with estimating the non-parametric component $m$, which in turn depend on $m$ itself. Since in practice $m$ is unknown, $h_{MASE}$ has to be estimated.

To estimate $h_{MASE}$,
we propose using the modified (or leave-$(2l+1)$-out) cross-validation method originally formulated by Hart and Vieu (1990) in the context of density estimation and studied by Chu and Marron (1991) and Härdle and Vieu (1992) in the context of nonparametric regression with correlated errors. Aneiros Perez and Quintela del Rio (2001b) recommend modified cross-validation in the context of partially linear models with $\alpha$-mixing errors. These authors used a version of the Speckman estimator with boundary-adjusted Gasser-Müller weights to estimate $m$.

The modified cross-validation method estimates $h_{MASE}$ by minimizing an estimator of $MASE(h; I)$:
\[
\widehat{MASE}(h; I) = \frac{1}{n} \sum_{i=1}^{n} \left( \widehat{Y}_i^{(-i; l)} - Y_i \right)^2. \tag{6.21}
\]
This estimator is obtained from (6.19) by dropping the outer expectation sign, substituting $E(Y_i | X_i, Z_i)$ with $Y_i$, and replacing $\widehat{Y}_i$ with $\widehat{Y}_i^{(-i; l)}$, a prediction of $Y_i = X_i^T \beta + m(Z_i) + \epsilon_i$ based on data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ which are believed to be uncorrelated with $Y_i$. More specifically,
\[
\widehat{Y}_i^{(-i; l)} = X_i^T \widehat{\beta}_{I, h}^{(-i; l)} + \widehat{m}_{I, h}^{(-i; l)}(Z_i), \tag{6.22}
\]
where $\widehat{\beta}_{I, h}^{(-i; l)}$ and $\widehat{m}_{I, h}^{(-i; l)}(Z_i)$ are estimators of $\beta$ and $m(Z_i)$ obtained from the data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ with $j$ such that $|i - j| > l$. The estimation procedure used for obtaining $\widehat{\beta}_{I, h}^{(-i; l)}$ and $\widehat{m}_{I, h}^{(-i; l)}(Z_i)$ is the same as that utilized for obtaining $\widehat{\beta}_{I, h}$ and $\widehat{m}_{I, h}(Z_i)$.

Recall that the estimation procedure utilized for obtaining the estimators $\widehat{\beta}_{I, h}$ and $\widehat{m}_{I, h}(Z_i)$ of $\beta$ and $m$ is the usual backfitting algorithm, with a (centered) local linear smoother matrix in the smoothing step. However, the backfitting algorithm allows us to evaluate $\widehat{m}_{I, h}^{(-i; l)}(\cdot)$ only at $Z_j$'s with $j$ such that $|i - j| > l$. We cannot evaluate $\widehat{m}_{I, h}^{(-i; l)}(\cdot)$ at $Z_i$. To overcome this problem, we propose to estimate $\beta$ and $m(Z_i)$ as indicated below.

We first carry out the usual backfitting algorithm on all data to obtain the estimator $\widehat{\beta}_{I, h}$ of $\beta$ using all $n$ data points.
We then define the partial residuals:
\[
r_{j, h} = Y_j - X_j^T \widehat{\beta}_{I, h}, \quad j = 1, \ldots, n. \tag{6.23}
\]
From now on, these residuals will become working responses for the modified cross-validation and our 'data set' is $(r_{j, h}, Z_j)$, $j = 1, \ldots, n$. Fix $i$, $1 \le i \le n$. We temporarily remove from the 'data' the $(2l + 1)$ 'data points' $(r_{j, h}, Z_j)$ with $|i - j| \le l$. We use the remaining $n - (2l + 1)$ data points in a usual local linear regression to obtain the estimators $\widehat{m}_{I, h}^{*(-i; l)}(Z_i)$ and $\widehat{m}_{I, h}^{*(-i; l)}(Z_j)$, with $j$ such that $|i - j| > l$. These estimators are not centered. Subtracting the average of $\widehat{m}_{I, h}^{*(-i; l)}(Z_j)$, with $j$ such that $|i - j| > l$, from $\widehat{m}_{I, h}^{*(-i; l)}(Z_i)$ yields a centered estimator for $m(Z_i)$:
\[
\widehat{m}_{I, h}^{(-i; l)}(Z_i) = \widehat{m}_{I, h}^{*(-i; l)}(Z_i) - \frac{1}{\left| \{ j : |i - j| > l \} \right|} \sum_{j : |i - j| > l} \widehat{m}_{I, h}^{*(-i; l)}(Z_j). \tag{6.24}
\]
The centering approach used above is admittedly ad-hoc, but nevertheless attempts to address the need of subjecting $\widehat{m}(\cdot)$ to an appropriate identifiability restriction.

Next, we use the estimators in (6.24) in a computationally feasible modified cross-validation criterion:
\[
MCV_l(h) = \frac{1}{n} \sum_{i=1}^{n} \left( r_{i, h} - \widehat{m}_{I, h}^{(-i; l)}(Z_i) \right)^2. \tag{6.25}
\]
Minimizing this criterion yields the desired cross-validation amount of smoothing for accurate estimation of $m$ via $\widehat{m}_{I, h}$ when the model errors are correlated.

Note that it is possible to compute a full scale modified cross-validation criterion, by calculating a different estimator of $\beta$ for each $i$. Specifically, we could replace $\widehat{\beta}_{I, h}$ in the right side of (6.23) with $\widehat{\beta}_{I, h}^{(-i; l)}$, the estimator obtained from all data less those data points $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$ with $j$ such that $|i - j| \le l$. However, computing the full scale modified cross-validation criterion would be more involved than computing the computationally convenient criterion in (6.25). Given that $\beta$ is easier to estimate than $m$, we believe that the computational simplification used to estimate $\beta$ will not affect to a great degree the estimation of $m$.
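The computationally convenient criterion in (6.25) can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: the helper names `local_linear_fit` and `modified_cv` are hypothetical, the Epanechnikov kernel is an assumed choice, and the partial residuals of (6.23) are assumed to have been computed beforehand and passed in as `r`.

```python
import numpy as np


def local_linear_fit(z_train, r_train, z_eval, h):
    """Uncentered local linear estimates at the points z_eval.

    Uses an Epanechnikov kernel (an assumption made for this sketch).
    """
    z_eval = np.atleast_1d(np.asarray(z_eval, dtype=float))
    fitted = np.empty(len(z_eval))
    for idx, z0 in enumerate(z_eval):
        u = (z_train - z0) / h
        w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
        X = np.column_stack([np.ones_like(z_train), z_train - z0])
        WX = X * w[:, None]
        # Solve the weighted normal equations (X'WX) b = X'W r.
        b, *_ = np.linalg.lstsq(X.T @ WX, WX.T @ r_train, rcond=None)
        fitted[idx] = b[0]  # local intercept = estimate at z0
    return fitted


def modified_cv(z, r, h, l):
    """Leave-(2l+1)-out criterion computed on partial residuals r."""
    z = np.asarray(z, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(z)
    index = np.arange(n)
    squared_errors = np.empty(n)
    for i in range(n):
        keep = np.abs(index - i) > l  # drop the points with |i - j| <= l
        z_keep, r_keep = z[keep], r[keep]
        fit_at_i = local_linear_fit(z_keep, r_keep, z[i], h)[0]
        fit_at_kept = local_linear_fit(z_keep, r_keep, z_keep, h)
        # Center by the average fit over the retained design points.
        centered = fit_at_i - fit_at_kept.mean()
        squared_errors[i] = (r[i] - centered) ** 2
    return squared_errors.mean()
```

In use, one would evaluate `modified_cv` over a grid of `h` values for a fixed `l` and keep the minimizer; repeating this for several values of `l` gives a rough picture of how sensitive the selected amount of smoothing is to that choice.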
A similar simplification was used by Aneiros Perez and Quintela del Rio (2001b) for their modified cross-validation method.

Although we do not have theoretical results that establish the properties of the modified cross-validation method, our simulation study suggests that it has reasonable finite sample performance and that it produces a reliable estimator of $m$, provided $l$ is taken to be large enough.

It is not clear how to best choose $l$ in practice. Recall that $l$ should be specified such that the correlation between $Y_i$ and $(Y_j, X_{j1}, \ldots, X_{jp}, Z_j)$, with $|i - j| \le l$, is effectively removed when predicting $Y_i$ by the value $\widehat{Y}_i^{(-i; l)}$ in (6.22). Choosing an $l$ value that is too small may not succeed in removing the correlation between these data values, therefore producing an undersmoothed estimator of $m$. Choosing an $l$ value that is too large may remove too much of the underlying systematic structure in the data, therefore producing an estimator of $m$ that is oversmoothed. Whenever possible, one should examine a whole range of values for $l$ to gain more understanding about the sensitivity of the final results to the choice of $l$. Our simulation study suggests that small values of $l$ should probably be avoided.

6.3.2 Estimating $\sigma_\epsilon^2$ and $\Psi$

In this section, we propose a method for estimating the variance $\sigma_\epsilon^2$ and correlation matrix $\Psi$ of the errors associated with model (2.1). The method we propose relies on assumption (A2), that the model errors follow a stationary autoregressive process of unknown, but finite, order. To estimate the order and the corresponding parameters of this process, we apply standard time series techniques to suitable estimators of the model errors. Monte Carlo simulation studies conducted in Chapter 9 indicate that this method performs reasonably well.

Assumption (A2) will clearly not be appropriate for all applications.
However, we expect it to cover those situations where the errors can be assumed to be realizations of a stationary stochastic process. Indeed, it can be shown that almost any stationary stochastic process can be modelled as a unique infinite order autoregressive process, independent of the origin of the process. In practice, finite order autoregressive processes are sufficiently accurate because higher order parameters tend to become small and not significant for estimation (Bos, de Waele, Broersen, 2002).

If the $\epsilon_i$'s were observed, we could estimate the order $R$ of the autoregressive process they are assumed to follow by using the finite sample criterion for autoregressive order selection developed by Broersen (2000). This criterion selects the order of the process by achieving a trade-off between over-fitting (selecting an order that is too large) and under-fitting (selecting an order that is too small). Traditional autoregressive order selection criteria either fail to resolve these issues (e.g., the Akaike Information Criterion) or address just the issue of over-fitting (e.g., the corrected Akaike Information Criterion). In addition, Broersen's criterion performs well even when the order of the autoregressive error process is large.

After estimating $R$, we could estimate the error variance $\sigma_\epsilon^2$ and the corresponding autoregressive parameters $\phi_1, \ldots, \phi_R$ by using Burg's algorithm. This algorithm is described, for instance, in Brockwell and Davis (1991). A comparison of various methods for autoregressive parameter estimation has shown that the Burg algorithm is the preferred method (Broersen, 2000).

Finally, we could estimate the error correlation matrix $\Psi$ by replacing $\phi_1, \ldots, \phi_R$ with their estimated values in the expression for $\Psi$ provided in Comment 2.2.1. For instance, if $R$ was estimated to be 1, we would estimate the $(i, j)$-th element of $\Psi$ as:
\[
\widehat{\Psi}_{ij} = \widehat{\phi}_1^{\,|i - j|},
\]
where $\widehat{\phi}_1$ is the estimator of the autoregressive coefficient $\phi_1$.
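For the first-order case just described, the estimated correlation matrix and the resulting variance estimate of the form (6.14) are easy to assemble. The sketch below is illustrative only: the function names `ar1_correlation` and `estimated_variance` are hypothetical, and the matrix `M` (playing the role of $M_{\Omega, h}$), the estimate `phi_hat` and the variance estimate `sigma2_hat` are assumed to be supplied by earlier steps of the analysis.

```python
import numpy as np


def ar1_correlation(phi_hat, n):
    """n x n correlation matrix of a stationary AR(1) process.

    Entry (i, j) equals phi_hat ** |i - j|.
    """
    idx = np.arange(n)
    lags = np.abs(np.subtract.outer(idx, idx))
    return float(phi_hat) ** lags


def estimated_variance(c, M, Psi_hat, sigma2_hat):
    """sigma2_hat * c' M Psi_hat M' c for a (p+1) x n matrix M."""
    c = np.asarray(c, dtype=float)
    return float(sigma2_hat * (c @ M @ Psi_hat @ M.T @ c))
```

A quick sanity check on `ar1_correlation` is that the inverse of the matrix it returns is tridiagonal, which reflects the band structure of the inverse correlation matrix of an autoregressive process of order 1.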
However, the $\epsilon_i$'s are unobserved, so we must first estimate them via suitably defined model residuals and then apply the methodology described above to these residuals in order to obtain the desired estimators of $\sigma_\epsilon^2$ and $\Psi$. We propose to estimate the vector of errors $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ by the model residuals $\hat\epsilon_{I,h} = Y - X\hat\beta_{I,h} - \hat m_{I,h}$, where $h$ is chosen by modified cross-validation, as described in Section 6.3.1. As argued in Section 6.3.1, this choice of $h$ is expected to provide an accurate estimator for $X\beta + m$, and therefore a reasonable estimator for $\epsilon = Y - X\beta - m$.

For those applications where the reasonableness of assumption (A2) is questionable, we believe that one could still use the modified cross-validation residuals to estimate the model errors, since the modified cross-validation method does not rely on explicitly incorporating the error correlation structure. For instance, under the more general assumption (A1), one could estimate $\sigma_\epsilon^2$ and $\Psi = (\Psi_{ij})$ from $\hat\epsilon_{I,h} = (\hat\epsilon_1, \ldots, \hat\epsilon_n)^T$ as follows:

$$\hat\sigma_\epsilon^2 = \frac{1}{n}\sum_{t=1}^{n}\hat\epsilon_t^2, \qquad \hat\Psi_{ij} = \hat\sigma_\epsilon^{-2}\,\frac{1}{n}\sum_{t=1}^{n-|i-j|}\hat\epsilon_t\,\hat\epsilon_{t+|i-j|} \quad \text{for } i \neq j.$$

However, we do not pursue this approach in this thesis.

6.4 Choosing $h$ for $c^T\hat\beta_{\hat\Psi^{-1},S_h^c}$

We conclude this chapter by discussing the choice of smoothing parameter $h$ for the estimated modified local linear backfitting estimator $c^T\hat\beta_{\hat\Psi^{-1},S_h^c}$. As indicated in Section 6.1, we denote $\hat\beta_{\hat\Psi^{-1},S_h^c}$ by $\hat\beta_{\hat\Psi^{-1},h}$ to emphasize its dependence upon $h$.

Our theoretical goal is to choose values of $h$ which minimize measures of accuracy for $c^T\hat\beta_{\hat\Psi^{-1},h}$ similar to those introduced for $c^T\hat\beta_{I,h}$ and $c^T\hat\beta_{\Psi^{-1},h}$. Namely, if $Bias(h;\hat\Psi^{-1}) = c^T Bias(\hat\beta_{\hat\Psi^{-1},h} \mid X, Z)$ and $Var(h;\hat\Psi^{-1}) = c^T Var(\hat\beta_{\hat\Psi^{-1},h} \mid X, Z)\,c$, we wish to choose the value of $h$ that minimizes the quantity $MSE(h;\hat\Psi^{-1})$, obtained by taking $\Omega = \hat\Psi^{-1}$ in (6.4). Denote this value by $h_{MSE}$.

In practice, we have to estimate this value from the data. The difficulty that we face is that, since $\hat\Psi$ is estimated and thus random, an expression for $MSE(h;\hat\Psi^{-1})$ is not tractable.
To avert this issue, we ignore the effect of estimating $\Psi$ and simply replace $\hat\Psi$ by $\Psi$ in the expression for $MSE(h;\hat\Psi^{-1})$. We have seen earlier in this thesis that, under certain conditions, $\hat\beta_{\hat\Psi^{-1},h}$ and $\hat\beta_{\Psi^{-1},h}$ are asymptotically 'close', so we expect our approach to be reasonable for large sample sizes.

We propose to choose $h$ using suitable modifications of the EBBS and plug-in methods discussed in Sections 6.2.2 and 6.2.3. The global and local modified EBBS methods for choosing the smoothing parameter $h$ of $c^T\hat\beta_{\hat\Psi^{-1},h}$ attempt to estimate $h_{MSE}$:

$$\hat h_{MSE} = \arg\min_{h \in \mathcal{H}}\left\{\widehat{Bias}(h;\hat\Psi^{-1})^2 + \widehat{Var}(h;\hat\Psi^{-1})\right\}. \tag{6.26}$$

For both methods, $\widehat{Var}(h;\hat\Psi^{-1})$ is computed by substituting $\Psi$ with $\hat\Psi$ into the expression of $Var(h;\Psi^{-1})$, the exact conditional variance of $c^T\hat\beta_{\Psi^{-1},h}$. This expression is obtained by taking $\Omega = \Psi^{-1}$ in (6.1). The global modified EBBS method estimates $Bias(h;\hat\Psi^{-1})$ empirically by fitting a global ordinary least squares regression model to the 'data' points $\{(h(k), c^T\hat\beta_{\hat\Psi^{-1},h(k)}) : k = 1, \ldots, N\}$. We denote the amount of smoothing this method yields by $\hat h_{EBBS-G}$. The local modified EBBS method uses only a fraction of these data to accomplish the same task. We denote the amount of smoothing supplied by this method by $\hat h_{EBBS-L}$.

The plug-in method for choosing the value of $h$ in $c^T\hat\beta_{\hat\Psi^{-1},h}$ tries to estimate (an approximation to) $h_{MSE}$:

$$\hat h_{PLUG-IN} = \arg\min_{h}\left\{\widehat{Bias}(h;\hat\Psi^{-1})^2 + \widehat{Var}(h;\hat\Psi^{-1})\right\}. \tag{6.27}$$

Here, $\widehat{Var}(h;\hat\Psi^{-1})$ is as above, and $\widehat{Bias}(h;\hat\Psi^{-1})$ is constructed by substituting $\Psi$ with $\hat\Psi$ into the expression of $Bias(h;\Psi^{-1})$, the exact conditional bias of $c^T\hat\beta_{\Psi^{-1},h}$. This expression is obtained by taking $\Omega = \Psi^{-1}$ in (6.3).

Chapter 7

Confidence Interval Estimation and Hypothesis Testing

In this chapter, we develop statistical methods for assessing the magnitude and statistical significance of a linear combination of linear effects $c^T\beta$ in model (2.1), where $c = (c_0, \ldots, c_p)^T$ is a known vector with real valued components. Specifically, we propose several confidence intervals for assessing the magnitude of $c^T\beta$, as well as several tests of hypotheses for testing whether $c^T\beta$ is significantly different from some known value of interest.

7.1 Confidence Interval Estimation

We propose to construct approximate $100(1-\alpha)\%$ confidence intervals for $c^T\beta$ from the usual, modified or estimated modified local linear backfitting estimators considered in this thesis, and their associated estimated standard errors. In what follows, we use the notation in Section 6.1 to denote these estimators generically by $c^T\hat\beta_{\Omega,h}$, where $\Omega$ can be $I$, $\Psi^{-1}$ or $\hat\Psi^{-1}$, respectively, and $h$ is an amount of smoothing that must be chosen from the data.

Our confidence intervals use an estimated standard error $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ obtained as follows:

$$\widehat{SE}(c^T\hat\beta_{\Omega,h}) = \sqrt{\widehat{Var}(h;\Omega)}. \tag{7.1}$$

Here, $\widehat{Var}(h;\Omega)$ is an estimator of $Var(h;\Omega)$, the conditional variance of $c^T\hat\beta_{\Omega,h}$ given $X$ and $Z$. Specifically, for $\Omega = I$, $\widehat{Var}(h;\Omega)$ is defined as in (6.14). For $\Omega = \Psi^{-1}$, if $\sigma_\epsilon^2$ is unknown, $\widehat{Var}(h;\Omega)$ is defined as in (6.15). Finally, for $\Omega = \hat\Psi^{-1}$, $\widehat{Var}(h;\Omega)$ is obtained from (6.15) by replacing $\Psi$ with $\hat\Psi$. Note that the standard error expression in (7.1) does not account for the estimation of $\sigma_\epsilon^2$ and $\Psi$ when these quantities are unknown, nor does it account for the data-dependent choice of $h$. Rather, it is a purely 'plug-in' expression.

The performance of a $100(1-\alpha)\%$ confidence interval for $c^T\beta$ depends to a great extent on how well we choose the smoothing parameter $h$ of the estimator $c^T\hat\beta_{\Omega,h}$. A poor choice of $h$ can affect the mean squared error of $c^T\hat\beta_{\Omega,h}$, resulting in a confidence interval with poor coverage and/or length properties. We want to choose an $h$ for which (i) the bias of $c^T\hat\beta_{\Omega,h}$ is small, so the interval is centered near the true $c^T\beta$, and (ii) the variance of $c^T\hat\beta_{\Omega,h}$ is small, so the interval has small length.
Choosing $h$ to ensure that the confidence interval is valid (in the sense of achieving the nominal coverage) and has reasonable length is crucial to the quality of inferences about $c^T\beta$.

In this thesis, we choose the amount of smoothing $h$ needed for constructing confidence intervals for $c^T\beta$ via the following data-driven choices of $h$, introduced in Chapter 6:

1. the (local) modified EBBS choice, $\hat h_{EBBS-L}$;
2. the global modified EBBS choice, $\hat h_{EBBS-G}$;
3. the (non-asymptotic) plug-in choice, $\hat h_{PLUG-IN}$.

Recall that each of these choices is expected to yield an accurate estimator $c^T\hat\beta_{\Omega,h}$ of $c^T\beta$. Throughout the rest of this chapter, unless otherwise specified, we assume that the smoothing parameter $h$ of the estimator $c^T\hat\beta_{\Omega,h}$ refers to any of $\hat h_{EBBS-L}$, $\hat h_{EBBS-G}$ or $\hat h_{PLUG-IN}$.

The performance of a $100(1-\alpha)\%$ confidence interval for $c^T\beta$ also depends on how well we estimate $SE(c^T\hat\beta_{\Omega,h})$, the true standard error of $c^T\hat\beta_{\Omega,h}$. As already mentioned, we will estimate $SE(c^T\hat\beta_{\Omega,h})$ by $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ as defined in (7.1). Recall that $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ depends on another smoothing parameter, needed for estimating $\Psi$, as described in Section 6.3.2. It is not clear whether the modified cross-validation choice of smoothing proposed in Section 6.3.2 yields a reasonable estimator of $SE(c^T\hat\beta_{\Omega,h})$. The Monte Carlo simulations presented in Chapter 8 will shed more light on this issue.

The standard $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by

$$c^T\hat\beta_{\Omega,h} \pm z_{\alpha/2}\,\widehat{SE}(c^T\hat\beta_{\Omega,h}), \tag{7.2}$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)\%$ quantile of the standard normal distribution.

According to the asymptotic results in this thesis, the estimator $c^T\hat\beta_{\Omega,h}$ is biased in finite samples. Consequently, the standard confidence interval for $c^T\beta$ may not be correctly centered and may not provide $1-\alpha$ coverage. We propose two strategies for dealing with this problem.
One strategy is to perform a bias adjustment to the estimator $c^T\hat\beta_{\Omega,h}$, to try to ensure that the confidence interval is better centered. This approach, referred to as bias-adjusted confidence interval construction, is discussed in Section 7.1.1. Another strategy is to perform an adjustment to the estimated standard error of $c^T\hat\beta_{\Omega,h}$. The purpose of this adjustment is to inflate the estimated standard error of $c^T\hat\beta_{\Omega,h}$ to reflect the bias of $c^T\hat\beta_{\Omega,h}$. This approach, referred to as standard error-adjusted confidence interval construction, is discussed in Section 7.1.2.

Throughout, we assume we can use standard normal probability tables to construct the confidence interval in (7.2) and those proposed in Sections 7.1.1 - 7.1.2. This assumption is justified provided the estimator $c^T\hat\beta_{\Omega,h}$ is asymptotically normal and our standard error estimators are consistent. Opsomer and Ruppert (1999) established the asymptotic normality of the estimator $c^T\hat\beta_{\Omega,h}$ for the case when the model errors are uncorrelated and $\Omega = I$. However, no asymptotic normality results are available as yet for the cases when the model errors are correlated, for either $\Omega = I$ or more general $\Omega$. The simulations conducted in Chapter 8 support the use of normal tables when constructing 95% confidence intervals.

Note that, for small sample sizes, one might widen the confidence intervals by using t-tables instead of standard normal probability tables. The issue of how one might specify the degrees of freedom involved in these t-tables needs to be considered carefully and is beyond the scope of this thesis.

7.1.1 Bias-Adjusted Confidence Interval Construction

The idea underlying the bias-adjusted confidence interval estimation of $c^T\beta$ is to first adjust the estimator $c^T\hat\beta_{\Omega,h}$ for possible finite sample bias effects.
Then a bias-adjusted $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by:

$$c^T\hat\beta_{\Omega,h} - \widehat{Bias}(c^T\hat\beta_{\Omega,h}) \pm z_{\alpha/2}\,\widehat{SE}(c^T\hat\beta_{\Omega,h}), \tag{7.3}$$

where $\widehat{Bias}(c^T\hat\beta_{\Omega,h})$ estimates the finite sample conditional bias of $c^T\hat\beta_{\Omega,h}$, given $X$ and $Z$, and is defined either as in (6.17) for $h = \hat h_{PLUG-IN}$, or as in (6.10) for $h = \hat h_{EBBS-G}$ and $h = \hat h_{EBBS-L}$. Neither of these bias expressions takes into account the data-dependent choice of $h$. Furthermore, these bias expressions do not account for the estimation of $\Psi$ when $\Omega = \hat\Psi^{-1}$.

The length of the bias-adjusted confidence interval for $c^T\beta$ in (7.3) is the same as that of the standard confidence interval in (7.2). The coverage properties of the bias-adjusted confidence interval may, however, be better than those of the standard confidence interval, because the bias-adjusted confidence interval may be better centered.

Note that the estimated standard error $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ in (7.3) reflects the variability of $c^T\hat\beta_{\Omega,h}$, instead of the variability of $c^T\hat\beta_{\Omega,h} - \widehat{Bias}(c^T\hat\beta_{\Omega,h})$. One could, of course, replace $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ by an estimator of the true standard error of $c^T\hat\beta_{\Omega,h} - \widehat{Bias}(c^T\hat\beta_{\Omega,h})$. But such an estimator may be difficult to obtain in practice, unless one resorts to computationally expensive bootstrapping methods, and may not necessarily yield a confidence interval with better coverage properties than those of the standard confidence interval.

7.1.2 Standard Error-Adjusted Confidence Interval Construction

We have suggested in Section 7.1.1 that the standard confidence interval for $c^T\beta$ in (7.2) can be improved upon by replacing $c^T\hat\beta_{\Omega,h}$ with its bias-adjusted version $c^T\hat\beta_{\Omega,h} - \widehat{Bias}(c^T\hat\beta_{\Omega,h})$. Another possible way to improve upon the standard confidence interval in (7.2) is to replace $\widehat{SE}(c^T\hat\beta_{\Omega,h})$ with $\sqrt{\widehat{MSE}(c^T\hat\beta_{\Omega,h})}$, the square root of the estimated conditional mean squared error of $c^T\hat\beta_{\Omega,h}$ given $X$ and $Z$.
The motivation for this latter adjustment is that, compared to $\widehat{SE}(c^T\hat\beta_{\Omega,h})$, $\sqrt{\widehat{MSE}(c^T\hat\beta_{\Omega,h})}$ is a better measure of the uncertainty associated with estimating $c^T\beta$ via $c^T\hat\beta_{\Omega,h}$, as it tries to account for the finite sample bias of $c^T\hat\beta_{\Omega,h}$.

A standard error-adjusted $100(1-\alpha)\%$ confidence interval for $c^T\beta$ is given by:

$$c^T\hat\beta_{\Omega,h} \pm z_{\alpha/2}\sqrt{\widehat{MSE}(c^T\hat\beta_{\Omega,h})}, \tag{7.4}$$

where

$$\widehat{MSE}(c^T\hat\beta_{\Omega,h}) = \left[\widehat{Bias}(h;\Omega)\right]^2 + \left[\widehat{SE}(c^T\hat\beta_{\Omega,h})\right]^2. \tag{7.5}$$

Here, $\widehat{Bias}(h;\Omega)$ estimates the conditional bias of $c^T\hat\beta_{\Omega,h}$ given $X$ and $Z$, and is defined either as in (6.17) for $h = \hat h_{PLUG-IN}$, or as in (6.10) for $h = \hat h_{EBBS-G}$ and $h = \hat h_{EBBS-L}$.

Note that the standard error-adjusted confidence interval for $c^T\beta$ in (7.4) is at least as long as the standard confidence interval in (7.2), because $\sqrt{\widehat{MSE}(c^T\hat\beta_{\Omega,h})} \ge \widehat{SE}(c^T\hat\beta_{\Omega,h})$. This may translate into improved coverage properties for the standard error-adjusted confidence interval.

7.2 Hypothesis Testing

In this section, we exploit the duality between confidence interval estimation and hypothesis testing to develop tests of hypotheses for $c^T\beta$. Suppose we are interested in testing the null hypothesis

$$H_0 : c^T\beta = \delta \tag{7.6}$$

against the alternative hypothesis

$$H_A : c^T\beta \neq \delta, \tag{7.7}$$

where $\delta$ is a constant. From the confidence intervals introduced in Section 7.1, we construct three test statistics for testing $H_0$ against $H_A$:

$$Z^{(1)}_{\Omega,h} = \frac{c^T\hat\beta_{\Omega,h} - \delta}{\widehat{SE}(c^T\hat\beta_{\Omega,h})}, \tag{7.8}$$

$$Z^{(2)}_{\Omega,h} = \frac{c^T\hat\beta_{\Omega,h} - \widehat{Bias}(h;\Omega) - \delta}{\widehat{SE}(c^T\hat\beta_{\Omega,h})}, \tag{7.9}$$

$$Z^{(3)}_{\Omega,h} = \frac{c^T\hat\beta_{\Omega,h} - \delta}{\sqrt{\widehat{MSE}(c^T\hat\beta_{\Omega,h})}}. \tag{7.10}$$

We will reject $H_0$ at significance level $\alpha$ if $|Z^{(i)}_{\Omega,h}| > z_{\alpha/2}$, $i = 1, 2, 3$.

Chapter 8

Monte Carlo Simulations

In this chapter we report the results of a Monte Carlo study on the finite sample properties of estimators and confidence intervals for the linear effect $\beta_1$ in the model:

$$Y_i = \beta_0 + \beta_1 X_i + m(Z_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{8.1}$$

obtained by taking $p = 1$ in (2.1).
Even though this model is not too complicated, we hope that it will allow us to understand how the properties of these estimators and confidence intervals will be affected by (1) dependency between the $X_i$'s and the $Z_i$'s, and (2) correlation amongst the $\epsilon_i$'s. For our study, we have deliberately chosen to use a context similar to that considered by Opsomer and Ruppert (1999) for independent $\epsilon_i$'s, so that we can make direct comparisons. Given this context, the main goals of our simulation study were to:

1. Compare the expected log mean squared error (MSE) of the estimators for $\beta_1$.
2. Compare the performance of the confidence intervals for $\beta_1$ built from these estimators and their associated standard errors.

The rest of this chapter is organized as follows. In Section 8.1, we discuss how we generated the data in our simulation study. In Section 8.2, we provide an overview of the estimators for $\beta_1$ considered in this study. We also specify the methods used for choosing the smoothing parameters of these estimators. In Section 8.3, we compare the expected log mean squared errors (MSE) of the estimators for all simulation settings in our study. Finally, in Sections 8.4 and 8.5, we assess the coverage and length properties of various approximate 95% confidence intervals for $\beta_1$ constructed from these estimators and their associated approximate standard errors.

8.1 The Simulated Data

The data $(Y_i, X_i, Z_i)$, $i = 1, \ldots, n$, in our simulation study were generated from model (8.1) using a modification of the simulation setup adopted by Opsomer and Ruppert (1999). Specifically, we took the sample size $n$ to be 100 and set the values of the linear parameters $\beta_0$ and $\beta_1$ to zero. We considered two $m(\cdot)$ functions:

• $m_1(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$, $z \in [0,1]$;
• $m_2(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$, $z \in [0,1]$.

The $Z_i$'s were equally spaced on $[0,1]$, being defined as $Z_i = i/(n+1)$.
Furthermore, $X_i = g(Z_i) + \eta_i$, with $g(z) = 0.4z + 0.3$, $z \in [0,1]$, and $\eta_i = (1 - 0.4)U_i - 0.3$, where the $U_i$'s were independent, identically distributed having a Unif(0,1) distribution. The $\epsilon_i$'s followed a stationary AR(1) model with normal distribution:

$$\epsilon_i = \rho\,\epsilon_{i-1} + u_i, \tag{8.2}$$

where $\rho$ is an autoregressive parameter quantifying the degree of correlation amongst the $\epsilon_i$'s. The $u_i$'s were independent, identically distributed normal random variables having mean 0 and standard deviation $\sigma_u = 0.5$. The $U_i$'s were independent of the $\epsilon_i$'s. In our simulation study, we used $\rho = 0$ to include the case of independence, as well as $\rho = 0.2$, 0.4, 0.6 and 0.8 to model positive correlation ranging from weak to strong.

The simulation settings corresponding to $\rho = 0$ (the case of independent errors) are the same as those considered by Opsomer and Ruppert (1999), with the following exceptions: (i) we considered $n = 100$ instead of $n = 250$, (ii) we 'centered' the $m(\cdot)$ functions, that is, we subtracted a constant so that these functions integrate to 0 over the interval $[0,1]$, and (iii) we scaled the errors $\eta_i$ to have $E(\eta_i) = 0$ instead of $E(\eta_i) = 0.3$. Opsomer and Ruppert did not specify what value they used for $\beta_1$.

For each model configuration, we generated 500 data sets. Note that there are 10 model configurations altogether, one for each combination of autoregressive parameter $\rho$ and non-linear effect $m(\cdot)$ considered.

Figure 8.1 displays data generated from model (8.1) for $\rho = 0$, 0.4, 0.8 and $m_1(z)$. Figure 8.2 provides the same display for $m_2(z)$. The responses $Y_i$ are qualitatively different for different values of $\rho$. For $\rho = 0$, the responses vary randomly about the $m(\cdot)$ curve. As $\rho$ increases from 0.4 to 0.8, the variation of the $Y_i$'s about the curve $m(\cdot)$ makes it virtually impossible to distinguish the non-linear signal $m(\cdot)$ from the autoregressive noise that masks it.
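The data-generating mechanism of this section can be written out as a short sketch. The function name `simulate` and its arguments are hypothetical, and one detail is an assumption not stated in the text: the AR(1) error series is started from its stationary distribution, with variance $\sigma_u^2/(1-\rho^2)$, so that the simulated errors are stationary from the first observation.

```python
import numpy as np


def m1(z):
    """First non-linear effect, centered to integrate to 0 over [0, 1]."""
    return 2 * np.sin(3 * z) - 2 * (np.cos(0) - np.cos(3)) / 3


def m2(z):
    """Second non-linear effect, centered to integrate to 0 over [0, 1]."""
    return 2 * np.sin(6 * z) - 2 * (np.cos(0) - np.cos(6)) / 6


def simulate(m, rho, n=100, beta0=0.0, beta1=0.0, sigma_u=0.5, rng=None):
    """Generate one data set (Y, X, Z) from model (8.1) with AR(1) errors."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.arange(1, n + 1) / (n + 1)            # Z_i = i / (n + 1)
    eta = (1 - 0.4) * rng.uniform(size=n) - 0.3  # mean-zero uniform noise
    x = 0.4 * z + 0.3 + eta                      # X_i = g(Z_i) + eta_i
    eps = np.empty(n)
    # Assumption: draw the first error from the stationary distribution.
    eps[0] = rng.normal(scale=sigma_u / np.sqrt(1 - rho ** 2))
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + rng.normal(scale=sigma_u)
    y = beta0 + beta1 * x + m(z) + eps
    return y, x, z


# One replicate for the configuration rho = 0.4 with m_1; the study itself
# used 500 replicates for each of the 10 configurations.
y, x, z = simulate(m1, rho=0.4, rng=np.random.default_rng(1))
```

With $\rho = 0$ the loop reduces to independent normal errors with standard deviation 0.5, which corresponds to the independence case described above.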
8.2 The Estimators

In this section, we provide an overview of the estimators for the linear effect $\beta_1$ in model (8.1) considered in our simulation study. We also provide an overview of the methods used for choosing the smoothing parameter of these estimators.

Note that $\beta_1 = c^T\beta$, where $c = (0,1)^T$ and $\beta = (\beta_0, \beta_1)^T$. The estimators of $\beta_1$ considered in our simulation study are of the form $\hat\beta_1 = c^T\hat\beta$, where $\hat\beta$ is:

(i) $\hat\beta_{I,S_h^c}$, the usual backfitting estimator defined in (3.4) with $\Omega = I$;
(ii) $\hat\beta_{\hat\Psi^{-1},S_h^c}$, the estimated modified backfitting estimator defined in (3.4) with $\Omega = \hat\Psi^{-1}$;
(iii) $\hat\beta_{(I-S_h^c)^T,S_h^c}$, the usual Speckman estimator defined in (3.4) with $\Omega = (I - S_h^c)^T$.

In all three estimators, $S_h^c$ is a centered smoother matrix, defined in terms of the Epanechnikov kernel in (3.9). For the two backfitting estimators, we take $S_h^c$ to be a centered local linear smoother matrix. For the usual Speckman estimator, we take $S_h^c$ to be a centered local constant smoother matrix with Nadaraya-Watson weights. The latter choice is motivated by the fact that the usual Speckman estimator is typically used with local constant smoother matrices with kernel weights. We are not sure to what extent the differences in performance between the usual Speckman estimator and the two backfitting estimators may be due to this difference in the method of local smoothing.

Note that $\hat\beta_{\Psi^{-1},S_h^c}$, the modified backfitting estimator obtained from (3.4) with $\Omega = \Psi^{-1}$, was omitted from our simulation study. This estimator may have value as a benchmark, but has no practical value due to the fact that the error correlation matrix $\Psi$ is never fully known in applications. For similar reasons, we also omitted $\hat\beta_{(I-S_h^c)^T\Psi^{-1},S_h^c}$, the modified Speckman estimator obtained from (3.4) with $\Omega = (I - S_h^c)^T\Psi^{-1}$. Another estimator not included in our study is $\hat\beta_{(I-S_h^c)^T\hat\Psi^{-1},S_h^c}$, the estimated modified Speckman estimator obtained from (3.4) with $\Omega = (I - S_h^c)^T\hat\Psi^{-1}$.
Recall that Aneiros Perez and Quintela del Rio (2001a) investigated the large sample properties of a similar estimator, based on local constant smoothing with Gasser-Muller weights. These authors have a suggestion for estimating $\Psi$ from the data, but they did not explore how well it works in practice. In our simulation study, the estimator $\hat\beta_{\hat\Psi^{-1},S_h^c}$, which is similar to $\hat\beta_{(I-S_h^c)^T\hat\Psi^{-1},S_h^c}$, does poorly in general. We believe this may be due to a combination of the following: (1) $\Psi$ is hard to estimate in the presence of confounding between the linear, non-linear and correlation effects and (2) the additional variability introduced by estimating $\Psi$ is not properly taken into account when selecting the smoothing parameter and when constructing standard errors for $\hat\beta_{\hat\Psi^{-1},S_h^c}$ from small samples. We suspect that, if one were to use the methods proposed in this thesis to estimate $\Psi$ for computing $\hat\beta_{(I-S_h^c)^T\hat\Psi^{-1},S_h^c}$, one would also get an estimator with poor finite sample behaviour.

All three estimators in (i)-(iii) require a data-driven choice of smoothing parameter. For the two backfitting estimators we consider EBBS-G and EBBS-L (see Section 6.2.2) and PLUG-IN (see Section 6.2.3). For the usual Speckman estimator, we use cross-validation, modified for correlated errors (MCV) and for boundary effects. The MCV criterion is similar to that in (6.21), except that it includes a weight function $W$. In this criterion, the predicted value of $Y_i$ is obtained as in (6.22), but with $\Omega = (I - S_h^c)^T$, where $S_h^c$ is the centered local constant smoother matrix. The weight function $W$ is introduced to allow elimination (or at least significant reduction) of boundary effects that may affect the estimation of the non-linear effect $m$ in model (8.1), and hence the prediction of the $Y_i$'s. $W$ is defined as in Chu and Marron (1991):

$$W(u) = \begin{cases} 5/3, & \text{if } 1/5 \le u \le 4/5; \\ 0, & \text{if } 0 \le u < 1/5 \text{ or } 4/5 < u \le 1. \end{cases}$$

Recall that EBBS-G depends on the tuning parameters $l$, $N$ and $T$, whereas EBBS-L depends on the tuning parameters $l$, $N$, $T$, $k_1$ and $k_2$. Also, recall that PLUG-IN and MCV depend on the tuning parameter $l$. In our simulation study, we consider $N = 50$, $T = 2$, $k_1 = 5$, $k_2 = 5$, and $l = 0, 1, \ldots, 10$.

For convenience, throughout the remainder of this chapter, we use the notation $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$ and $\hat\beta^{(l)}_{U,EBBS-L}$ for the usual local linear backfitting estimators of $\beta_1$. We use the notation $\hat\beta^{(l)}_{EM,PLUG-IN}$, $\hat\beta^{(l)}_{EM,EBBS-G}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ for the estimated modified local linear backfitting estimators. Finally, we use the notation $\hat\beta^{(l)}_{S,MCV}$ to refer to the usual Speckman estimator of $\beta_1$. Wherever necessary, we refer to these estimators generically as $\hat\beta_1^{(l)}$.

8.3 The MSE Comparisons

In this section, we identify the estimators $\hat\beta_1^{(l)}$, including bandwidth selection methods, that appear to be best, in the sense of being most accurate for all simulation settings and for most values of $l$, the tuning parameter used in the modified cross-validation. Recall that the measure of accuracy of $\hat\beta_1^{(l)}$ considered in this thesis is the conditional MSE of $\hat\beta_1^{(l)}$, $MSE(\hat\beta_1^{(l)})$, defined in (6.4). Specifics are provided below.

To compare the accuracy of two estimators for a given simulation setting, we look at the boxplot of differences in the log MSE's of these estimators. If the boxplot is symmetric about 0, then the two estimators have comparable accuracy. We also conduct a level 0.05 two-sided paired t-test to compare the expected log MSE's of the estimators. If the test is significant, we label the boxplot with an S. The log MSE's of the two estimators are evaluated from the 500 data sets generated for the given simulation setting.

For each backfitting estimation method (usual, estimated modified), we recommend a way to choose the smoothing parameter $h$. Then we compare the resulting backfitting estimators, including a comparison with the usual Speckman estimator, to determine an estimator that is best, in the sense of being most accurate for all simulation settings and most values of $l$.
In Figures A.1-A.10 in Appendix A, we study the methods of bandwidth choice for the usual local linear backfitting estimator. We display boxplots of pairwise differences in the log MSE's of the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$ and $\hat\beta^{(l)}_{U,EBBS-L}$, $l = 0, 1, \ldots, 10$. Each figure corresponds to a different simulation setting. From these figures, we see that $\hat\beta^{(l)}_{U,PLUG-IN}$ and $\hat\beta^{(l)}_{U,EBBS-G}$ have comparable accuracy across all simulation settings, provided $l$ is large enough, say $l \ge 4$. They also have better accuracy than $\hat\beta^{(l)}_{U,EBBS-L}$, which performs poorly for several simulation settings (see, for instance, Figures A.6-A.7). Therefore, we recommend using PLUG-IN and EBBS-G to choose the smoothing parameter for the usual local linear backfitting estimator.

Figures A.11-A.20 display the corresponding plots for the estimated modified local linear backfitting estimator. We see that $\hat\beta^{(l)}_{EM,EBBS-G}$ is the most accurate across all simulation settings, provided $l$ is large enough, say $l \ge 4$. We also see that $\hat\beta^{(l)}_{EM,EBBS-L}$ and $\hat\beta^{(l)}_{EM,PLUG-IN}$ perform very poorly relative to $\hat\beta^{(l)}_{EM,EBBS-G}$ for most simulation settings and most values of $l$. Therefore, we recommend using EBBS-G to choose the smoothing parameter for the estimated modified local linear backfitting estimator.

In Figures A.21-A.30 we compare estimators using our preferred bandwidth selection methods. We display boxplots of pairwise differences in the log MSE's of the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$, $\hat\beta^{(l)}_{EM,EBBS-G}$ and $\hat\beta^{(l)}_{S,MCV}$, $l = 0, 1, \ldots, 10$. Each figure corresponds to a different simulation setting. From these figures, we conclude that the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$ and $\hat\beta^{(l)}_{EM,EBBS-G}$ have comparable accuracy for all simulation settings, provided $l$ is large enough, say $l \ge 4$. The estimator $\hat\beta^{(l)}_{S,MCV}$ is less accurate than these three estimators for most simulation settings and most values of $l$. In particular, plots such as those in Figures A.24, A.25, A.29 and A.30 strongly support the elimination of $\hat\beta^{(l)}_{S,MCV}$.
The poor performance of $\hat\beta^{(l)}_{S,MCV}$ with respect to the log MSE criterion could be due to the fact that this estimator uses local constant smoothing, instead of local linear smoothing. But it could also be due to the fact that $\hat\beta^{(l)}_{S,MCV}$ is computed with an MCV choice of smoothing. Recall that this choice attempts to estimate the amount of smoothing optimal for estimation of $X\beta + m$. It is not clear whether this choice will provide a reliable estimate of the amount of smoothing optimal for estimation of $c^T\beta$.

8.4 Confidence Interval Coverage Comparisons

In this section, we assess and compare the coverage properties of various confidence intervals for $\beta_1$ constructed from all estimators considered in our simulation study. Our goals are to:

1. Identify those estimators which yield standard confidence intervals for $\beta_1$ with good coverage properties across all simulation settings and most values of $l$.
2. Establish whether the coverage properties of standard confidence intervals for $\beta_1$ can be improved through bias or standard error adjustments.

To assess the coverage properties of a confidence interval $C$ for a given simulation setting, we proceed as follows. We evaluate the confidence interval for each of the 500 simulated data sets. We calculate the proportion of these intervals which contain the true value of $\beta_1$ and denote it by $\hat p$. If $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/500}$, the 95% confidence interval for the true coverage, contains the nominal level of $C$, we say that $C$ is valid. If the upper (lower) confidence limit is smaller (bigger) than the nominal level of $C$, we say that $C$ is anti-conservative (conservative).

The confidence intervals for $\beta_1$ considered in our simulation study fall into three categories: standard, bias-adjusted and standard error-adjusted, as defined in (7.2), (7.3) and (7.4).
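The three interval constructions in (7.2)-(7.4) and the validity check described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the point estimate, bias estimate and standard error are taken as given numbers, since computing them requires the estimators and the variance and bias expressions of Chapters 3 and 6.

```python
import math

Z_975 = 1.959964  # 0.975 quantile of the standard normal, for 95% intervals


def standard_ci(est, se, z=Z_975):
    """Standard interval (7.2): estimate +/- z * SE."""
    return est - z * se, est + z * se


def bias_adjusted_ci(est, bias, se, z=Z_975):
    """Bias-adjusted interval (7.3): recentred at est - bias, same length."""
    centre = est - bias
    return centre - z * se, centre + z * se


def se_adjusted_ci(est, bias, se, z=Z_975):
    """Standard error-adjusted interval (7.4): SE replaced by sqrt(bias^2 + SE^2)."""
    root_mse = math.sqrt(bias ** 2 + se ** 2)
    return est - z * root_mse, est + z * root_mse


def assess_coverage(intervals, true_value, nominal=0.95):
    """Classify an interval procedure from its simulated intervals.

    Returns (p_hat, (lower, upper), verdict), where (lower, upper) is the
    95% confidence interval for the true coverage.
    """
    n_sim = len(intervals)
    p_hat = sum(lo <= true_value <= hi for lo, hi in intervals) / n_sim
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n_sim)
    lower, upper = p_hat - half, p_hat + half
    if upper < nominal:
        verdict = "anti-conservative"
    elif lower > nominal:
        verdict = "conservative"
    else:
        verdict = "valid"
    return p_hat, (lower, upper), verdict
```

For a given simulation setting one would collect the 500 intervals produced by one procedure and call `assess_coverage(intervals, true_value=0.0)`, since $\beta_1 = 0$ in the simulation design.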
8.4.1 Standard Confidence Intervals

We now assess the coverage properties of the standard 95% confidence intervals for $\beta_1$ obtained from the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$, $\hat\beta^{(l)}_{U,EBBS-L}$, $\hat\beta^{(l)}_{EM,PLUG-IN}$, $\hat\beta^{(l)}_{EM,EBBS-G}$, $\hat\beta^{(l)}_{EM,EBBS-L}$ and $\hat\beta^{(l)}_{S,MCV}$, where $l = 0, 1, \ldots, 10$. Point estimates and 95% confidence interval estimates for the true coverage achieved by these intervals are displayed in Figures B.1-B.10 in Appendix B. Each figure corresponds to a different simulation setting.

Figures B.1-B.10 show that the standard confidence intervals constructed from the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$ and $\hat\beta^{(l)}_{S,MCV}$ are valid for all simulation settings provided the value of $l$ is large enough. However, the standard confidence intervals obtained from the estimators $\hat\beta^{(l)}_{U,EBBS-L}$, $\hat\beta^{(l)}_{EM,PLUG-IN}$, $\hat\beta^{(l)}_{EM,EBBS-G}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ have extremely poor coverage for many simulation settings and for many values of $l$; see, for instance, Figures B.6 and B.7.

In view of these findings, the preferred estimators for constructing standard confidence intervals for $\beta_1$ are $\hat\beta^{(l)}_{U,PLUG-IN}$, $\hat\beta^{(l)}_{U,EBBS-G}$ and $\hat\beta^{(l)}_{S,MCV}$. The other estimators cannot be trusted to produce valid inferences on $\beta_1$. More details concerning our findings are provided below.

The standard confidence intervals constructed from the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$ and $\hat\beta^{(l)}_{U,EBBS-G}$ are valid for all simulation settings, provided $l$ is large enough, as shown in Table 8.1. From this table, we see that taking $l \ge 1$ when $\rho = 0.2$, $l \ge 2$ when $\rho = 0.4$, $l \ge 3$ when $\rho = 0.6$, and $l \ge 4$ when $\rho = 0.8$ yields valid intervals for the contexts considered. We recommend using these intervals to conduct inferences on $\beta_1$, with values of $l$ that are large enough. Clearly, taking $l = 0, 1, 2, 3$ is not advised, unless one is certain that $\rho$ is small.

What is not apparent from Table 8.1 is why the confidence intervals constructed from $\hat\beta^{(l)}_{U,PLUG-IN}$ and $\hat\beta^{(l)}_{U,EBBS-G}$ are not valid for smaller values of $l$.
Typically, for small $l$'s, the estimates of $\beta_1$ constructed from the simulated data have a tendency to underestimate the true value of $\beta_1$ when $m(z) = m_2(z)$. Furthermore, the estimated standard errors associated with these estimates have a tendency to underestimate the true standard errors both when $m(z) = m_1(z)$ and when $m(z) = m_2(z)$. However, as $l$ increases, the estimates of $\beta_1$ and their associated standard errors improve significantly for all simulation settings.

The standard confidence intervals constructed from the usual Speckman estimator $\hat\beta^{(l)}_{S,MCV}$ are generally valid across all simulation settings even for smaller $l$ values. However, $\hat\beta^{(l)}_{S,MCV}$ does not yield valid confidence intervals when $m(z) = m_2(z)$ and (i) $\rho = 0.4$ and $l = 1$ or 4 and (ii) $\rho = 0.8$ and $l = 3, 4, 5, 6, 7, 8$ or 10. In these two cases, $\hat\beta^{(l)}_{S,MCV}$ yields confidence intervals that are slightly anti-conservative. This lack of continuity in behaviour is of concern and might not be attributable to simulation variability. Indeed, Figures B.6-B.10 show that, for $m(z) = m_2(z)$, $\hat\beta^{(l)}_{S,MCV}$ seems to exhibit an anti-conservative pattern for most $l$'s.

When $\rho = 0$ and $m(z) = m_1(z)$, the standard confidence intervals obtained from the estimators $\hat\beta^{(l)}_{U,EBBS-L}$, $\hat\beta^{(l)}_{EM,PLUG-IN}$, $\hat\beta^{(l)}_{EM,EBBS-G}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ provide the nominal coverage, regardless of how we choose $l$ (see Figure B.1). However, when $\rho = 0$ and $m(z) = m_2(z)$, the intervals constructed from $\hat\beta^{(l)}_{U,EBBS-L}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ are extremely anti-conservative for all values of $l$ (see Figure B.6). In addition, the intervals constructed from $\hat\beta^{(l)}_{EM,PLUG-IN}$ and $\hat\beta^{(l)}_{EM,EBBS-G}$ are mildly anti-conservative for many values of $l$ (see Figure B.6).

As $\rho$ increases, the coverage provided by some of the standard confidence intervals obtained from $\hat\beta^{(l)}_{U,EBBS-L}$, $\hat\beta^{(l)}_{EM,PLUG-IN}$, $\hat\beta^{(l)}_{EM,EBBS-G}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ deteriorates for many small and/or large values of $l$, depending on the specification of $m(\cdot)$.
For instance, when $m(z) = m_2(z)$, the coverage properties of the intervals constructed from $\hat\beta^{(l)}_{EM,PLUG-IN}$ and $\hat\beta^{(l)}_{EM,EBBS-L}$ are extremely poor (see Figures B.7-B.10). The coverage properties of the intervals constructed from $\hat\beta^{(l)}_{U,EBBS-L}$ are also poor for small $\rho$ values (see Figures B.7-B.8). Finally, the coverage properties of the intervals constructed from $\hat\beta^{(l)}_{EM,EBBS-G}$ worsen as $\rho$ increases, but not dramatically. We do not recommend using these intervals to carry out inferences on $\beta_1$.

8.4.2 Bias-Adjusted Confidence Intervals

In this section, we assess the coverage properties of the bias-adjusted 95% confidence intervals for $\beta_1$. We did not consider a bias-adjusted confidence interval for the usual Speckman estimator $\hat\beta^{(l)}_{S,MCV}$, as this estimator is known to have good bias properties both when $\rho = 0$ (see Speckman, 1988) and when $\rho > 0$ (see Aneiros-Perez and Quintela-del-Rio, 2001a).

Plots (not shown) of the point estimates and 95% confidence interval estimates for the true coverage achieved by the bias-adjusted intervals yield some general conclusions. Only the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$ and $\hat\beta^{(l)}_{U,EBBS-G}$ yield bias-adjusted confidence intervals that are valid for all simulation settings provided the value of $l$ is large enough. These values of $l$ are almost identical to those reported in Table 8.1. Again, we see that one should avoid using $l = 0, 1, 2, 3$ unless one is sure that $\rho$ is small enough.

8.4.3 Standard Error-Adjusted Confidence Intervals

Here, we assess the coverage properties of the standard error-adjusted 95% confidence intervals for $\beta_1$. We did not consider a standard error-adjusted confidence interval for the usual Speckman estimator $\hat\beta^{(l)}_{S,MCV}$, due to its good bias properties.

Plots (not shown) indicate that only the estimators $\hat\beta^{(l)}_{U,PLUG-IN}$ and $\hat\beta^{(l)}_{U,EBBS-G}$ provide standard error-adjusted confidence intervals that are valid for all simulation settings provided the value of $l$ is large enough.
These values of $l$ are nearly identical to those reported in Table 8.1. Yet again, we see that one should avoid using $l = 0, 1, 2, 3$ unless one is sure that $\rho$ is small enough.

To sum up, we see no reason to recommend bias adjustments to the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ or to their associated standard errors. Indeed, such adjustments do not seem to improve the coverage properties of the confidence intervals obtained from these estimators.

8.5 Confidence Interval Length Comparisons

Recall from the previous section that we identified $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ as the only estimators of $\beta_1$ in our simulation study that yielded valid 95% standard confidence intervals for all simulation settings, provided the value of $l$ is large enough. The standard intervals based on $\hat\beta^{(l)}_{S,\text{MCV}}$ were found to be competitive, but just not as good. Also recall that the coverage properties of the standard confidence intervals constructed from $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ could not be improved by performing bias adjustments to these estimators or to their associated standard errors.

Before recommending any of the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ for practical use, we must compare the lengths of the standard confidence intervals for $\beta_1$ constructed from these estimators. We choose to include standard intervals constructed from $\hat\beta^{(l)}_{S,\text{MCV}}$ in our comparison to gain more understanding of their properties. When several confidence interval procedures are valid (in the sense of achieving the desired nominal level), we prefer the one with the shortest length.

In this section, we conduct visual and formal comparisons of the lengths of the standard 95% confidence intervals for $\beta_1$ constructed from these estimators. We only consider values of $l$ that are large enough to guarantee the validity of the ensuing confidence intervals, as in Section 8.4. Specifically, we take $l \ge 1$ for $\rho = 0.2$, $l \ge 2$ for $\rho = 0.4$, $l \ge 3$ for $\rho = 0.6$ and $l \ge 4$ for $\rho = 0.8$.
To compare the lengths of two confidence intervals for a given simulation setting, we look at the boxplot of differences in the log lengths of these intervals. The lengths are evaluated from the 500 data sets generated for the given simulation setting. If the boxplot is symmetric about 0, then the two confidence intervals have comparable length.

Figures C.1-C.10 in Appendix C (bottom three rows) display boxplots of pairwise differences in the log length of the standard 95% confidence intervals constructed from the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$, $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ and $\hat\beta^{(l)}_{S,\text{MCV}}$. From these figures, we see that for all simulation settings with $\rho > 0$ and for values of $l$ that are large enough (e.g., larger than 3), the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ yield shorter confidence intervals than those based on $\hat\beta^{(l)}_{S,\text{MCV}}$. This was to be expected, as the log MSE behaviour of $\hat\beta^{(l)}_{S,\text{MCV}}$ was seen to be inferior to that of $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$. Furthermore, we notice that the lengths of the confidence intervals constructed from $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ tend to be comparable for many of these values of $l$.

Our previous findings are supported by the results of pairwise level 0.05 two-sided paired t-tests for comparing the expected log lengths of the confidence intervals under consideration, for all simulation settings and for values of $l$ that are large enough. We describe these tests below.

Given a simulation setting, for fixed $l$, conduct $\binom{3}{2}$ two-sided paired t-tests to compare the expected log lengths of the intervals obtained from the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$, $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ and $\hat\beta^{(l)}_{S,\text{MCV}}$. For each test, the null hypothesis is that the expected log lengths of the intervals being compared are the same. The test result is considered significant if the p-value associated with the test is smaller than 0.05. Use the results of the t-tests to identify which estimators yield the shortest confidence interval.
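The pairwise comparisons just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the simulation study: the interval lengths are synthetic, the procedure labels "A", "B", "C" and the helper `paired_log_length_test` are hypothetical, and the p-value uses a normal approximation to the t reference distribution (reasonable with 500 replicates) so that only the Python standard library is needed.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist, mean, stdev


def paired_log_length_test(len_a, len_b, level=0.05):
    """Two-sided paired test of equal expected log lengths.

    len_a[i] and len_b[i] are interval lengths from the same simulated data set.
    Returns (mean log-length difference, test statistic, p-value, significant).
    """
    diffs = [math.log(a) - math.log(b) for a, b in zip(len_a, len_b)]
    d_bar = mean(diffs)
    t_stat = d_bar / (stdev(diffs) / math.sqrt(len(diffs)))
    # Assumption: normal approximation to the t distribution with n - 1 df.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return d_bar, t_stat, p_value, p_value < level


# Synthetic lengths for three hypothetical procedures over 500 data sets.
random.seed(0)
base = [random.lognormvariate(-1.0, 0.2) for _ in range(500)]
lengths = {
    "A": [b * math.exp(random.gauss(-0.05, 0.02)) for b in base],
    "B": [b * math.exp(random.gauss(-0.05, 0.02)) for b in base],
    "C": base,
}

# All 3-choose-2 pairwise comparisons for one setting and one value of l.
results = {
    (a, b): paired_log_length_test(lengths[a], lengths[b])
    for a, b in combinations(lengths, 2)
}
for pair, (d_bar, t_stat, p_value, significant) in results.items():
    print(pair, round(d_bar, 4), round(t_stat, 2), significant)
```

A significant negative mean difference for a pair indicates that the first procedure of the pair tends to give the shorter interval.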
If all tests give significant results, we claim that there is a clear winner; in other cases, we say that two estimators might be tied for best.

Figures C.1-C.10 (top row) show the average length of the confidence intervals obtained, with standard error bars superimposed. The figures indicate which of these estimators produces the shortest confidence interval for the values of $l$ of interest.

8.6 Conclusions

Based on the results of our simulation study, we recommend using the usual local linear backfitting estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ and the usual Speckman estimator $\hat\beta^{(l)}_{S,\text{MCV}}$ to carry out valid inferences about the linear effect $\beta_1$ in model (8.1). The value of $l$ used when computing these estimators should be large enough, that is, at least 4.

Our findings indicate that $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ have comparable accuracy for large values of $l$, and that they are in general more accurate than $\hat\beta^{(l)}_{S,\text{MCV}}$. All three estimators yield valid standard 95% confidence intervals for $\beta_1$ when $l$ is large enough. However, the intervals based on $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ tend to have shorter length and are therefore preferred over the interval based on $\hat\beta^{(l)}_{S,\text{MCV}}$.

We see no reason to recommend bias adjustments to the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$ and $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ or to their associated estimated standard errors. Such adjustments do not seem to improve the coverage properties of the corresponding confidence intervals.

Finally, we do not recommend using the usual backfitting estimator $\hat\beta^{(l)}_{U,\text{EBBS-L}}$ or the estimated modified backfitting estimators $\hat\beta^{(l)}_{EM,\text{PLUG-IN}}$, $\hat\beta^{(l)}_{EM,\text{EBBS-G}}$ and $\hat\beta^{(l)}_{EM,\text{EBBS-L}}$ to carry out inferences about $\beta_1$. These estimators yielded confidence intervals with poor coverage for many simulation settings and many values of $l$, owing to the difficulties associated with estimating their standard errors.

Figure 8.1: Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_1(z)$. The first row shows plots that do not depend on $\rho$.
The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

[Figure panels: X1 vs. Z; m(Z) vs. Z; ε vs. Z; Y vs. Z]

Figure 8.2: Data simulated from model (8.1) for $\rho = 0, 0.4, 0.8$ and $m(z) = m_2(z)$. The first row shows plots that do not depend on $\rho$. The second and third rows each show plots for $\rho = 0, 0.4, 0.8$.

Table 8.1: Values of $l$ for which the standard 95% confidence intervals for $\beta_1$ constructed from the estimators $\hat\beta^{(l)}_{U,\text{PLUG-IN}}$, $\hat\beta^{(l)}_{U,\text{EBBS-G}}$ and $\hat\beta^{(l)}_{S,\text{MCV}}$ are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study.

$m(z) = m_1(z)$
ρ         U,PLUG-IN       U,EBBS-G        S,MCV
0         {0,...,10}      {0,...,10}      {0,...,10}
0.2       {0,...,10}      {1,...,10}      {0,...,10}
0.4       {1,...,10}      {2,...,10}      {0,...,10}
0.6       {2,...,10}      {3,...,10}      {0,...,10}
0.8       {3,...,10}      {3,...,10}      {0,...,10}

$m(z) = m_2(z)$
ρ         U,PLUG-IN       U,EBBS-G        S,MCV
0         {0,...,10}      {0,...,10}      {0,...,10}
0.2       {0,...,10}      {1,...,10}      {0,...,10}
0.4       {1,...,10}      {2,...,10}      {0} ∪ {2,3} ∪ {5,...,10}
0.6       {3,...,10}      {3,...,10}      {0,...,10}
0.8       {3,...,10}      {4,...,10}      {0,1,2} ∪ {9}

Chapter 9

Application to Air Pollution Data

Many community-level studies have provided evidence that air pollution is associated with mortality. Statistical analyses of data collected in such studies face various methodological challenges: (1) controlling for observed and unobserved factors, such as season and temperature, that might confound the true association between air pollution and mortality, (2) accounting for serial correlation in the residuals which, if ignored, might lead to underestimation of the statistical uncertainty of the estimated association, and (3) assessing and reporting uncertainty associated with the choice of statistical model.
Various statistical models can be used to describe the true association between air pollution and health outcomes of interest based on community-level data. However, the most widely used have been the generalized additive models (GAMs) introduced by Hastie and Tibshirani (1990). These models include a single 'time series' response (e.g. non-accidental mortality rates) and various covariates (e.g. pollutants of interest, time, temperature). The effects of the pollutants of interest on the response are typically presumed to be linear, whereas those of the remaining covariates are presumed to be smooth and non-linear. Schwartz (1994), Kelsall, Samet and Zeger (1997), Schwartz (1999), Samet, Dominici, Curriero et al. (2000), Katsouyani, Toulomi, Samoli et al. (2001), Moolgavkar (2000) and Schwartz (2000) are just some of the authors who relied on GAMs in order to assess the acute effects of air pollution on health outcomes such as mortality or hospital admissions.

There are various problems that researchers must consider when using GAMs to analyze air pollution data arising from community-level studies. Some of these problems are purely computational, whereas others are more delicate and pertain to the theoretical underpinnings of these models.

Several computational issues associated with the S-Plus implementation of the methodology developed by Hastie and Tibshirani (1990) for estimation of GAMs have been brought to light in recent years. We describe these problems here. The linear and non-linear effects in GAMs applied to air pollution data have typically been estimated using the S-Plus function gam. Dominici et al. (2002) showed that gam may provide incorrect estimates of the linear effects in GAMs and their standard errors if used with the original default parameters.
Although the defaults have recently been revised (Dominici et al., 2002), an important problem that remains is that gam calculates the standard errors of the linear effects by assuming that the non-linear effects are effectively linear, resulting in an underestimation of uncertainty (Ramsay et al., 2003a). In air pollution studies, this assumption is likely inadequate, resulting in underestimation of the standard error of the linear pollutant effect (Ramsay et al., 2003a).

The practical choice of the degree of smoothness of the estimated non-linear confounding effects of time and meteorology variables is a delicate issue in air pollution studies which utilize GAMs. Given that the confounding effects are viewed as a nuisance in such studies, the appropriate choice should be informed by the objective of conducting valid inferences about the pollution effect. Most choices made in the air pollution literature are based on exploratory analyses (see, for instance, Kelsall, Samet and Zeger, 1997) and seem to be justified by a different objective, namely doing well at estimating the non-linear confounding effects. This objective typically ignores the impact of residual correlation on the choice of degree of smoothness, as well as the dependencies between the various variables in the model.

In the present chapter we apply the methodology developed in this thesis to analyze air pollution data collected in Mexico City between January 1, 1994 and December 31, 1996. Our goal is to determine whether the pollutant PM10 has a significant short-term effect on the non-accidental death rate in Mexico City after adjusting for temporal and weather confounding. We give a description of the data in Section 9.1 and analyze the data in Section 9.2.

9.1 Data Description

PM10, airborne particulate matter less than 10 microns in diameter, is a major component of air pollution, arising from natural sources (e.g.
pollen), road transport, power generation, industrial processes, etc. When inhaled, PM10 particles tend to be deposited in the upper parts of the human respiratory system, from which they can eventually be expelled back into the throat. Health problems begin as the body reacts to these foreign particles. PM10 is associated with mortality, exacerbation of airways disease and decrement in lung function. Although PM10 can cause health problems for everyone, certain people are especially vulnerable to its adverse health effects. These "sensitive populations" include children, the elderly, exercising adults, and those suffering from heart and lung disease.

The data to be analyzed in this chapter were collected in Mexico City over a period of three years, from January 1, 1994 to December 31, 1996, in order to determine whether there is a significant short-term effect of PM10 on mortality, after adjusting for potential temporal and weather confounders. The data consist of daily counts of non-accidental deaths, daily levels of ambient concentration of PM10 (10 μg/m³), and daily levels of temperature (°C) and relative humidity (%). The ambient concentration of PM10 corresponding to a given day was obtained by averaging the PM10 measurements over all the stations in Mexico City.

Pairwise scatter plots of the data are shown in Figure 9.1. The most striking features in these plots are the strong annual cycles in the log mortality levels, the daily level of ambient concentration of PM10, and the daily levels of temperature and relative humidity. It is likely that the annual cycles in the log mortality levels are produced by unobserved seasonal factors such as influenza and respiratory infections. Note that log mortality and PM10 peak at the same time with respect to the annual cycles.
Our analysis of the health effects of PM10 must account for the potential confounding effect of these temporal cycles on the association between PM10 and log mortality. We believe the strength of these cycles will make it difficult to detect whether this association is significant.

9.2 Data Analysis

The following is an overview of our data analysis. First, we introduce the four statistical models that we use to capture the relationship between PM10 and mortality, adjusted for seasonal and meteorological confounding. Three of these models contain smooth non-parametric terms which attempt to control for these confounding effects. Next, we illustrate the importance of choosing the amount of smoothing for estimating the non-parametric terms in these models when the main objective is accurate estimation of the true association between PM10 and mortality. We then focus on determining which of the four models is most relevant for the data. Finally, we use this model as a basis for carrying out inference about the true association between PM10 and mortality.

9.2.1 Models Entertained for the Data

Let $D_i$ denote the observed number of non-accidental deaths in Mexico City on day $i$, and let $P_i$, $T_i$ and $H_i$ denote the daily measures of PM10, temperature and relative humidity, respectively. The models that we entertain for our data are:

\begin{align}
\log(D_i) &= \beta_0 + \beta_1 P_i + \epsilon_i \tag{9.1}\\
\log(D_i) &= \beta_0 + \beta_1 P_i + m_1(i) + \epsilon_i \tag{9.2}\\
\log(D_i) &= \beta_0 + \beta_1 P_i + m_1(i) + \beta_2 T_i + \beta_3 H_i + \beta_{23}\, T_i \cdot H_i + \epsilon_i \tag{9.3}\\
\log(D_i) &= \beta_0 + \beta_1 P_i + m_1(i) + m_2(T_i, H_i) + \epsilon_i. \tag{9.4}
\end{align}

Here, $i = 1, 2, \ldots, 1096$. Also, $m_1$ is a smooth univariate function, whereas $m_2$ is a smooth bivariate surface. The function $m_1$ serves as a linear filter on the log mortality and PM10 series and removes any seasonal or long-term trends in the data.
For the time being, the error terms in all four models are assumed to be independent and identically distributed, with mean 0 and constant variance $\sigma^2_\epsilon < \infty$. The independence assumption will be relaxed later.

Models (9.1)-(9.4) treat the log mortality counts as a continuous response. Furthermore, they assume the relationship between PM10 and log mortality to be linear, to allow for easily interpretable inferences about the effect of PM10 on log mortality. The models differ, however, in their specification of the potential seasonal and weather confounding on this relationship. Specifically, model (9.1) ignores the possible seasonal and weather confounding on the relationship between PM10 and log mortality. Models (9.2)-(9.4), however, allow us to adjust this relationship for potential seasonal and weather confounding. Models (9.2) and (9.3) require that we specify the amount of smoothing needed for estimating $m_1$. Model (9.4) requires that we specify the amount of smoothing necessary for estimating both $m_1$ and $m_2$.

To fit models (9.2)-(9.4) to the data, we use the S-Plus function gam with the more stringent convergence parameters recommended by Dominici et al. (2002). We employ a univariate loess smoother to estimate $m_1$ and a bivariate loess smoother to estimate $m_2$. The loess smoothers are local linear smoothers relying on spans corresponding to a fixed number of nearest neighbours instead of a bandwidth.

9.2.2 Importance of Choice of Amount of Smoothing

The inferences made on the linear PM10 effect $\beta_1$ in any of the models (9.2)-(9.4) may be severely affected by the choice of amount of smoothing for estimating the smooth confounding effects in these models. To illustrate the impact of this choice on the conclusions of such inferences, we restrict attention to model (9.3). Later, we will see that this model is the most appropriate for the data.
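The sensitivity of an estimated linear effect to the amount of smoothing can be illustrated on synthetic data. The sketch below is not the S-Plus gam fit and does not use the Mexico City data. It assumes a single linear covariate with no intercept, a simplified loess-like smoother (local linear fit with tricube weights over the k nearest neighbours), and the closed-form backfitting estimate obtained by solving $x^T(I-S)x\,\hat\beta = x^T(I-S)y$, where $S$ is the smoother matrix. All function names and simulation constants are hypothetical.

```python
import math
import random


def local_linear_rows(z, k):
    """Rows of a loess-like smoother matrix S (local linear, tricube weights).

    The window at each z[i] reaches to its k-th nearest neighbour, so the
    span is k / len(z) rather than a fixed bandwidth.
    """
    rows = []
    for z0 in z:
        dist = [abs(zj - z0) for zj in z]
        d = sorted(dist)[k - 1]  # distance to the k-th nearest point (self included)
        w = [(1.0 - (dj / d) ** 3) ** 3 if dj < d else 0.0 for dj in dist]
        s0 = sum(w)
        s1 = sum(wj * (zj - z0) for wj, zj in zip(w, z))
        s2 = sum(wj * (zj - z0) ** 2 for wj, zj in zip(w, z))
        det = s0 * s2 - s1 * s1
        rows.append([wj * (s2 - (zj - z0) * s1) / det for wj, zj in zip(w, z)])
    return rows


def smooth(rows, v):
    """Apply the smoother matrix to a vector."""
    return [sum(r * vj for r, vj in zip(row, v)) for row in rows]


def backfit_beta(x, y, rows):
    """Solve x'(I - S) x * beta = x'(I - S) y for a single linear covariate."""
    sx, sy = smooth(rows, x), smooth(rows, y)
    num = sum(xi * (yi - fi) for xi, yi, fi in zip(x, y, sy))
    den = sum(xi * (xi - fi) for xi, fi in zip(x, sx))
    return num / den


# Synthetic data: x is correlated with the smooth component m(z).
random.seed(1)
n, beta = 200, 0.5
z = [(i + 0.5) / n for i in range(n)]
m = [math.sin(4.0 * math.pi * zi) for zi in z]
x = [mi + random.gauss(0.0, 0.5) for mi in m]
y = [beta * xi + mi + random.gauss(0.0, 0.3) for xi, mi in zip(x, m)]

estimates = {}
for span in (0.08, 0.20, 0.50, 0.90):
    rows = local_linear_rows(z, max(3, round(span * n)))
    estimates[span] = backfit_beta(x, y, rows)
    print(f"span = {span:.2f}  beta_hat = {estimates[span]:.3f}")
```

In this construction the covariate is deliberately correlated with the smooth term, so an oversmoothed fit leaves part of $m(z)$ to be absorbed by the linear coefficient, and the estimate typically moves away from the true value 0.5 as the span grows.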
Figure 9.2 compares the impact of various choices of smoothing for the seasonal effect $m_1$ in model (9.3) on the following quantities: (i) gam estimates of $\beta_1$, (ii) gam standard errors for the estimates in (i), (iii) 95% confidence intervals for $\beta_1$ constructed from the estimates in (i) and (ii), (iv) gam p-values associated with standard t-tests of significance of $\beta_1$. These quantities were obtained by fitting model (9.3) to the data using gam with loess as a basic smoother. The loess span used for smoothing $m_1$ was allowed to take on values in the range 0.01 to 0.50. The reference distribution for the 95% confidence intervals and the p-values depicted in Figure 9.2 is a t-distribution whose degrees of freedom are the residual (or error) degrees of freedom associated with model (9.3). Note that the estimated standard errors reported by gam do not account for error correlation.

Changing the span for smoothing $m_1$ greatly affects the estimates, standard errors, confidence intervals and p-values in Figure 9.2, and hence the conclusions of our inferences on $\beta_1$, the short-term PM10 effect on log mortality. In particular, using large spans for smoothing $m_1$ suggests that the data provide strong evidence in favour of a significant PM10 effect on log mortality, after adjusting for seasonal and weather confounding. Using small spans for smoothing $m_1$ suggests that the data do not provide enough evidence in support of a significant PM10 effect on log mortality in Mexico City.

Proper choice of the amount of smoothing for estimating the seasonal effect $m_1$ in model (9.3) is crucial for making inferences on $\beta_1$, as seen in Figure 9.2. Given the sensitivity of our conclusions to the choice of smoothing, the natural question that arises is: how can we choose the amount of smoothing to be able to make valid inferences on $\beta_1$?
The correct choice of smoothing should be appropriate for accurate estimation of $\beta_1$, not for accurate estimation of $m_1$. This choice should account for the strong relationships between the linear and non-linear variables in the model seen in Figure 9.1, and for potential correlation amongst model errors.

It is important to note that the S-Plus function gam provides no data-driven method for choosing the amount of smoothing. Using gam's default choice of smoothing is not advised when one is concerned with accurate estimation of $\beta_1$. The default choice of smoothing used by gam is 0.50, or 50% of the nearest neighbours. This choice is much larger than the choices that we recommend for estimating $m_1$ (shown in the next section). The theoretical results in this thesis suggest that the correct choice of smoothing for estimating $\beta_1$ should undersmooth the estimated $m_1$. Therefore, this choice of smoothing is most likely smaller than the one we recommend for estimating $m_1$, and certainly not larger.

9.2.3 Choosing an Appropriate Model for the Data

In this section, we focus on the issue of selecting an appropriate model for the data amongst models (9.1)-(9.4). Selecting such a model requires that we balance model complexity with model parsimony. In what follows, we show that model (9.3) is the most appropriate for describing the variability in the log mortality counts, as it is complex enough to capture the main features present in the data, yet relatively inexpensive to fit to these data in terms of degrees of freedom.

Model (9.1) is the simplest of models (9.1)-(9.4) and, not too surprisingly given the strong cycles apparent in Figure 9.1, it provides an inadequate description of the variability in the log mortality counts. In fact, the linear relationship between PM10 and log mortality postulated by model (9.1) explains only 9.25% of the total variability in the log mortality counts.
Figure 9.3 (top panel) shows that the log mortality counts are widely scattered about the regression line obtained by fitting model (9.1) to the data. Figure 9.3 (bottom panel) shows that model (9.1) displays clear lack-of-fit, as it fails to account for the strong annual cycles present in the model residuals. We therefore drop model (9.1) from our pool of candidate models and concentrate instead on models (9.2)-(9.4).

Model (9.4) is the most complex of these models, and will consume significantly more degrees of freedom when fitted to the data than either model (9.2) or model (9.3). As we shall see shortly, comparing model (9.4) against model (9.2) via a series of approximate F-tests suggests that we can drop model (9.4) in favour of model (9.2). We could therefore consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, given that the weather variables are typically included in models for PM10 mortality data, we prefer to use model (9.3). This model is more flexible than model (9.2), as it includes linear marginal effects for the weather variables together with a linear interaction effect between these variables. Compared to model (9.2), this model can be fitted to the data at the expense of just three additional degrees of freedom. Given the large size of the data set, this is an insignificant price to pay for achieving more modelling flexibility.

We now provide more details concerning the choice of an appropriate model for our data amongst models (9.2)-(9.4). As a first step, we need to identify spans that are reasonable for smoothing the seasonal effect $m_1$ in these models.

To identify a reasonable range of spans for smoothing $m_1$ in model (9.2), we fit model (9.2) to the data by smoothing $m_1$ with spans ranging from 0.01 to 0.50 in increments of 0.01 and examine plots of the fitted $m_1$ and corresponding model residuals.
From Figures 9.4 and 9.5 we see that the data suggest spans in the range 0.09-0.12. Using spans smaller than 0.09 for estimating $m_1$ leads to under-smoothed fits that are visually noisy. On the other hand, using spans larger than 0.12 leads to over-smoothed fits that fail to reflect important seasonal features of the data. In summary, the range 0.09-0.12 is reasonable for smoothing the seasonal effect $m_1$ in model (9.2). Plots of the fitted additive component $m_1$ in models (9.3) and (9.4) (not shown) corresponding to spans in the range 0.09 to 0.12 are similar to those in Figure 9.4 and suggest that this range is also reasonable for smoothing the seasonal effect $m_1$ in models (9.3) and (9.4).

We now show that we can reduce model (9.4) to model (9.2). We use a series of approximate F-tests to compare models (9.4) and (9.2). Each F-test compares a fit of model (9.2), obtained by smoothing $m_1$ with the span $s_1$, against a fit of model (9.4), obtained by smoothing $m_1$ with the span $s_1$ and $m_2$ with the span $s_2$. The test statistic for each F-test is obtained in the usual fashion from the residual sums of squares and the residual (or error) degrees of freedom associated with the two model fits. The residual degrees of freedom of these fits are obtained as the difference between the size of the data set, $n = 1096$, and the trace of the hat matrix associated with the model fit. We allow the span $s_1$ to range between 0.09 and 0.12 in increments of 0.01, and the span $s_2$ to range between 0.01 and 0.50 in increments of 0.01.

The p-values associated with these F-tests are displayed in Figure 9.6. P-values corresponding to spans $s_2$ bigger than 0.04 are quite large, suggesting that the smooth weather surface $m_2$ need not be included in model (9.4). P-values corresponding to spans $s_2$ of 0.02, 0.03 or 0.04 are a bit smaller, suggesting that perhaps the surface $m_2$ should be included in the model.
However, Figures 9.7 and 9.8, for $s_1 = 0.09$, show that very small spans are not appropriate for estimating the surface $m_2$, as they yield visually rough surfaces that consume unacceptably high numbers of degrees of freedom. Using a span $s_1$ of 0.10, 0.11 or 0.12 instead yielded plots (not shown) that were basically identical to those in Figures 9.7 and 9.8.

In conclusion, the smooth weather surface $m_2$ contributes little to model (9.4), so there is no real need to include either temperature or relative humidity in this model. In other words, we can reduce model (9.4) to model (9.2). Coplots (not shown) of the residuals associated with model (9.2) versus temperature, given relative humidity, and versus relative humidity, given temperature, support this conclusion.

Since there is no real need to include the weather variables, temperature and relative humidity, we could consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, for reasons explained earlier, we prefer to use the more flexible model (9.3).

How well does model (9.3) fit the data? To answer this question, we examine a series of diagnostic plots. Figure 9.9 shows plots of the residuals associated with model (9.3) against PM10 and day of study. These residuals were obtained by smoothing the unknown $m_1$ with a span of 0.09; using spans of 0.10, 0.11 or 0.12 yielded similar plots (not shown). The functional form of the relationship between PM10 and log mortality postulated by model (9.3) is not violated by the data, since no systematic structure is apparent in the plot of residuals versus PM10. The plot of residuals against day of study also shows no systematic structure, suggesting that the seasonal component $m_1$ of the model accounts for the long-term temporal variation in the data reasonably well.
Figures 9.10-9.11 show that the functional specification of the weather portion of model (9.3) is not violated by the data. Indeed, these plots display no obvious systematic structure. The weather coplots corresponding to spans of 0.10, 0.11 and 0.12 were similar, so we omitted them.

Finally, Figure 9.12 presents autocorrelation and partial autocorrelation plots for the residuals associated with model (9.3). From these plots, it is apparent that the magnitude of the residual correlation is small. We believe this is due to the fact that most of the short-term temporal variation in log mortality counts has been accounted for by the seasonal component $m_1$ of the model. Comparing Figure 9.12 against Figure 9.13, which displays autocorrelation and partial autocorrelation plots for the raw log mortality counts, supports this belief.

In summary, the assumptions underlying the systematic part of model (9.3) seem reasonable. However, there is some modest suggestion that the independence assumption concerning the error terms in this model may not hold for these data. This assumption will be relaxed to account for the slight temporal correlation present in the data.

Model (9.3) can therefore be used as a basis for carrying out inferences on $\beta_1$, the linear PM10 effect on log mortality, adjusted for seasonal and weather confounding. Accounting for error correlation when conducting such inferences is perhaps not as important as accounting for the strong relationships between the linear and non-linear variables in the model evident in Figure 9.1.

9.2.4 Inference on the PM10 Effect on Log Mortality

In order to conduct valid inferences about the linear effect $\beta_1$ in model (9.3), we must not only estimate it accurately, but also calculate correct standard errors for this estimate.
For model (9.3), $\beta_1 = c^T\beta$, where $c = (0, 1, 0, 0, 0)^T$ and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_{23})^T$. We propose to estimate $\beta_1$ via $c^T\hat\beta_{I,S_h^c}$, where $\hat\beta_{I,S_h^c}$ is the usual local linear backfitting estimate of $\beta$. Figure 9.14 displays a plot of $c^T\hat\beta_{I,S_h^c}$ versus the smoothing parameter $h$, which controls the width of the smoothing window. The large variation in the values of these estimates reiterates the importance of choosing $h$ appropriately from the data so as to obtain accurate estimates of $\beta_1$.

To choose appropriate values of $h$ from the data, we use the preferred PLUG-IN and EBBS-G methods developed in Chapter 6. Both methods use a grid $\mathcal{H} = \{2, 3, \ldots, 548\}$, where the values in the grid represent half-widths of local linear smoothing windows. Recall that both of these methods require that we estimate the underlying correlation structure of the model errors. In addition, PLUG-IN requires that we estimate the seasonal effect $m_1$ in the model. We discuss these topics below.

We estimate the seasonal effect $m_1$ and the error correlation structure using modified (or leave-$(2l+1)$-out) cross-validation, as outlined in Sections 6.3.1 and 6.3.2. We allow the tuning parameter $l$ to take on the values $0, 1, \ldots, 26$. Recall that $l$ quantifies our belief about the range and magnitude of the error correlation. For instance, $l = 0$ signifies that we believe the errors to be independent. When the model errors are truly correlated, we suspect that values of $l$ that are too small may produce under-smoothed estimates of $m_1$, whereas values of $l$ that are too large may produce over-smoothed estimates of $m_1$.

To ascertain what values of $l$ are reasonable for the data, we examine plots of the estimated seasonal effect $m_1$ in model (9.3) corresponding to $l = 0, 1, \ldots, 26$; see Figure 9.15. These plots suggest that using $l = 0$ or $l = 1$ is probably not appropriate, as the corresponding estimates of $m_1$ are visually too rough.
Using values of $l$ in the range 2-17 seems to yield reasonable estimates of $m_1$. Values of $l$ in the range 18-26 seem to yield over-smoothed estimates of $m_1$, so perhaps should be avoided.

Next, we estimate the error terms in model (9.3) via modified (or leave-$(2l+1)$-out) cross-validation residuals, defined as in Section 6.3.1. Figure 9.16 shows plots of these residuals for various values of $l$.

Now, we use the modified cross-validation residuals to estimate the correlation structure of the model errors. We will operate under the assumption that these errors follow a covariance-stationary autoregressive process of finite order $R$. To estimate $R$, we use the finite sample criterion for autoregressive order selection developed by Broersen (2000). Figure 9.17 shows that our estimate of $R$ is influenced by how we choose the value of the tuning parameter $l$. Choosing $l = 0$ or $1$ yields an $\hat R$ of 28. Choosing larger values of $l$ yields $\hat R$ values like 0, 2, 3 or 4. Recall that values of $l$ like 0, 1 or $18, \ldots, 26$ are likely not appropriate for these data.

Finally, after determining the order $\hat R = \hat R(l)$, $l = 0, 1, \ldots, 26$, of the autoregressive error process, we estimate the error variance $\sigma^2_\epsilon$ and the autoregressive parameters $\phi_1, \ldots, \phi_R$ using Burg's method (Brockwell and Davis, 1991). Furthermore, we estimate the error correlation matrix $\Psi$ by plugging the estimated values of $\phi_i$ into the expression for $\Psi$ provided in Comment 2.2.1.

Having estimated the seasonal effect $m_1$ and the error correlation structure for model (9.3), we can now tackle the issue of data-driven choice of $h$ for accurate estimation of $\beta_1$ via $c^T\hat\beta_{I,S_h^c}$.

The estimated squared bias, variance and mean squared error curves used for determining the PLUG-IN choice of smoothing for $c^T\hat\beta_{I,S_h^c}$ are shown in Figure 9.18. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$.
In general, the mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$. Figure 9.19 displays similar plots used for determining the EBBS-G choice of smoothing. Note that the bias curve in this figure does not depend on $l$. Also note that the mean squared error curves in this figure that correspond to large values of $l$ dominate, in general, the curves that correspond to small values of $l$.

Figures 9.20 and 9.21 display the PLUG-IN and EBBS-G choices of smoothing parameter obtained by minimizing the estimated mean squared error curves in Figures 9.18 and 9.19. Both choices are remarkably stable for values of $l$ that seem appropriate for these data. However, the PLUG-IN choices are much smaller in magnitude than the EBBS-G choices. The PLUG-IN choices that seem appropriate for the data indicate that the seasonal effect $m_1$ should be smoothed using $h \approx 28$. On the other hand, the corresponding EBBS-G choices indicate that $m_1$ should be smoothed using $h \approx 69$.

Figures 9.22 and 9.23 show the 95% confidence intervals constructed for $\beta_1$ with PLUG-IN and EBBS-G choices of smoothing for values of $l$ ranging from 0 to 26. These intervals were obtained from formula (7.2), with $\Omega = I$. Both figures suggest that the choice of $l$ (among those that are reasonable for the data) is not that important. This finding is consistent with the Monte Carlo simulation study conducted in Chapter 8, which indicated that these choices of smoothing were appropriate for conducting inferences on the linear effect $\beta_1$ in model (8.1) provided $l$ was large enough.

From Figure 9.22, there is no conclusive proof that $\beta_1$, the short-term PM10 effect on log mortality, is significantly different from 0.
Indeed, the standard confidence intervals for $\beta_1$ based on $c^T\hat{\beta}_{I,S_h^c}$, with $h$ chosen via PLUG-IN, cross the zero line for all values of $l$ that are appropriate for the data. The stability of these confidence intervals across various values of $l$ is quite remarkable, but not entirely surprising given the stability of the corresponding PLUG-IN choices of smoothing shown in Figure 9.20.

Figure 9.23 supports the same conclusion for $\beta_1$, at least in part. However, for all values of $l$ that are appropriate for the data, these intervals either narrowly miss zero or barely contain it, suggesting that perhaps PM10 does have a significant effect on log mortality.

What could explain the discrepancy between Figures 9.22 and 9.23? The standard errors of the estimated PM10 effects are comparable in both figures. However, the PM10 effect estimates obtained with a PLUG-IN choice of smoothing are much smaller than those obtained with EBBS-G. As seen in Figures 9.20 and 9.21, the PLUG-IN choices of smoothing parameter for these data are about 28, and are much smaller than the EBBS-G choices, which are about 69. Figure 9.14 shows that using choices of smoothing parameter $h$ of about 28 yields smaller PM10 estimates than using values of $h$ of about 69.

We favour smaller choices of smoothing parameter. We believe EBBS-G yielded large choices because it used a grid range that was too wide. Recall that EBBS-G attempts to estimate the conditional bias of $c^T\hat{\beta}_{I,S_h^c}$ by assuming a specific form for the relationship between this bias and the smoothing parameter $h$. This relationship is motivated by asymptotic considerations as in (6.13), so it may break down for values of $h \in H$ that are too large. Estimating this relationship based on all the "data" $\{(h, c^T\hat{\beta}_{I,S_h^c}) : h \in H\}$ may therefore not be appropriate.
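The idea of treating the pairs of smoothing parameter and estimate as "data" for a bias model can be sketched as below. The polynomial form used here (a constant plus terms in $h^2$ and $h^3$) is an assumption chosen for illustration and is not claimed to be the form in (6.13); the function name `fit_bias_curve` and its arguments are hypothetical.

```python
import numpy as np

def fit_bias_curve(h, estimates, powers=(2, 3)):
    """Least squares fit of estimates ~ a0 + sum_p a_p * h**p over the supplied grid.

    Returns the intercept a0 (the value the estimates approach as h -> 0 under
    the assumed model) and the fitted bias sum_p a_p * h**p at each grid value.
    """
    h = np.asarray(h, dtype=float)
    est = np.asarray(estimates, dtype=float)
    u = h / h.max()  # rescale to improve numerical conditioning
    X = np.column_stack([np.ones_like(u)] + [u**p for p in powers])
    coef, *_ = np.linalg.lstsq(X, est, rcond=None)
    bias = X[:, 1:] @ coef[1:]
    return coef[0], bias
```

Because the fit uses exactly the pairs passed in, the estimated bias depends on the range of the grid: if the assumed form only holds for small `h`, including large values of `h` can distort the fitted curve over the whole grid.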
One should perhaps use only "data" for which $h$ is reasonably small to ensure the asymptotic considerations underlying EBBS-G are valid. In other words, one should use a smaller grid range for EBBS-G.

We used EBBS-G with a grid $H = \{2, \ldots, 100\}$ instead of $H = \{2, \ldots, 548\}$ and got a similar result to that obtained via PLUG-IN (see Figure 9.24): there is no conclusive proof that PM10 has a significant effect on log mortality. This finding is not surprising given the strength of the annual cycles present in Figure 9.1.

Figure 9.1: Pairwise scatter plots of the Mexico City air pollution data.

Figure 9.2: Results of gam inferences on the linear PM10 effect $\beta_1$ in model (9.3) as a function of the span used for smoothing the seasonal effect $m_1$: estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for $\beta_1$ (bottom left) and p-values of t-tests for testing the statistical significance of $\beta_1$.

Figure 9.3: The top panel displays a scatter plot of log mortality versus PM10. The ordinary least squares regression line of log mortality on PM10 is superimposed on this plot. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.
Figure 9.4: Plots of the fitted seasonal effect $m_1$ in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots.

Figure 9.5: Plots of the residuals associated with model (9.2) for various spans.

Figure 9.6: P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2). (Horizontal axis: span for smoothing $m_2$.)

Figure 9.7: Plots of the fitted weather surface $m_2$ in model (9.4) when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09. The surface $m_2$ was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).

Figure 9.8: Degrees of freedom consumed by the fitted weather surface $m_2$ in model (9.4) versus the span used for smoothing $m_2$ when the fitted seasonal effect $m_1$ (not shown) was obtained with a span of 0.09.

Figure 9.9: Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.10: Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.11: Plot of residuals associated with model (9.3) versus temperature, given relative humidity. The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.12: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3). The span used for smoothing the unknown $m_1$ in model (9.3) is 0.09.

Figure 9.13: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).

Figure 9.14: Usual local linear backfitting estimate of the linear PM10 effect in model (9.4) versus the smoothing parameter.

Figure 9.17: Estimated order for AR process describing the serial correlation in the residuals associated with model (9.3) versus $l$, where $l = 0, 1, \ldots, 26$. Residuals were obtained by estimating $m_1$ with a modified (or leave-$(2l+1)$-out) cross-validation choice of amount of smoothing.

Figure 9.18: Estimated bias squared, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The estimated variance curves corresponding to small values of $l$ are dominated by those corresponding to large values of $l$ when the smoothing parameter is large. In contrast, the estimated squared bias and mean squared error curves corresponding to small values of $l$ dominate those corresponding to large values of $l$ when the smoothing parameter is large.

Figure 9.19: Estimated bias squared, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of $\beta_1$. The different curves correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The curves corresponding to large values of $l$ dominate those corresponding to small values of $l$.

Figure 9.20: Plug-in choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

Figure 9.21: Global EBBS choice of smoothing for estimating $\beta_1$ versus $l$, where $l = 0, 1, \ldots, 26$.

Figure 9.22: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with plug-in choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.
Figure 9.23: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents intervals corresponding to values of $l$ that are reasonable for the data; the intervals corresponding to $l = 3, \ldots, 7$ do not cross the horizontal line passing through zero.

Figure 9.24: Standard 95% confidence intervals for $\beta_1$ based on local linear backfitting estimates of $\beta_1$ with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of $l$, where $l = 0, 1, \ldots, 26$. The shaded area represents confidence intervals corresponding to values of $l$ that are reasonable for the data.

Chapter 10

Conclusions

In this chapter, we provide an overview of the research problem considered in this thesis. We then outline the main contributions of this thesis and summarize the contents of each chapter. Finally, we suggest possible extensions to our work.

Partially Linear Models

Partially linear models are flexible tools for analyzing data from a variety of applications. They generalize linear regression models by allowing one of the variables in the model to have a non-linear effect on the response.

Inferences on the Linear Effects in Partially Linear Models

In many applications, the primary focus is on conducting inferences on the linear effects $\beta$ in a partially linear model. In these applications, the non-linear effect $m$ in the model is treated as a nuisance. This nuisance effect is a double-edged sword: while it affords greater modelling flexibility, it is also more difficult to estimate than the linear effects and, as such, it complicates the inferences on these effects.
Inferential Goals

Depending on the application, various goals could be relevant to the problem of conducting inferences on the linear effects in a partially linear model with correlated errors.

One goal would be to choose the correct amount of smoothing for accurately estimating the linear effects. One would hope that the methodology used for making this choice produces an amount of smoothing for which the linear effects are estimated at the 'usual' parametric rate of $1/n$, the rate that would be achieved if the non-linear effect were known.

Another goal would be to construct valid standard errors for the estimated linear effects.

An additional goal would be to use the estimated linear effects and their associated standard errors to construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects, possibly adjusting for smoothing bias. Little has been done in the literature to address this goal.

Research Questions Concerning the Inferential Goals

Various research questions emerge in connection with the inferential goals listed above:

1. How can we choose the correct amount of smoothing for accurate estimation of the linear effects?
2. How can we estimate the correlation structure of the model errors for conducting inferences on the linear effects?
3. How can we construct valid standard errors for the estimated linear effects?
4. How can we construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects?
5. What is the impact of the choice of amount of smoothing on the validity of the confidence intervals and tests of hypotheses?
6. Could inefficient estimates of the linear effects provide valid inferences?
Thesis Contributions

The major contributions of this thesis to the research questions stated above are: (1) defining sensible estimators of the linear and non-linear effects in partially linear models with correlated errors, (2) deriving explicit expressions for the asymptotic conditional bias and variance of the proposed estimators of the linear effects, (3) developing data-driven methods for selecting the appropriate amount of smoothing for accurate estimation of the linear effects, (4) developing confidence interval and hypothesis testing procedures for assessing the magnitude and statistical significance of the linear effects of main interest, (5) studying the finite-sample properties of these procedures, and (6) applying these procedures to the analysis of an air pollution data set. These contributions are discussed in more detail below.

The estimators we proposed in this thesis are backfitting estimators, relying on locally linear regression, which is known to possess attractive theoretical and practical properties. Many of the backfitting estimators proposed in the literature of partially linear regression models with correlated errors rely on locally constant regression, a method that does not enjoy the good properties of locally linear regression.

In Chapters 4 and 5 of this thesis, we studied the large-sample behaviour of the estimators of linear effects introduced in this thesis as the width of the smoothing window used in locally linear regression decreases at a specified rate, and the number of data points in this window increases. Specifically, we obtained explicit expressions for the conditional asymptotic bias and variance of these estimators.
Our asymptotic results are important as they show that, in the presence of correlation between the linear and non-linear variables in the model, the bias of the estimators of the linear effects can dominate their variance asymptotically, therefore compromising their $\sqrt{n}$-consistency. This problem can be remedied however by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. This rate is slower than the rate that is optimal for estimation of the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect.

Selecting the appropriate amount of smoothing for the estimators of the linear effects is a crucial problem, which is complicated by the presence of error correlation and dependencies between the linear and nonlinear components of the model. Our theoretical results indicate that the amount of smoothing that is 'optimal' for estimating the non-linear effect is not 'optimal' for estimating the linear effects. Data-driven methods devised for accurate estimation of the non-linear effect will likely fail to yield a satisfactory choice of smoothing for estimating the linear effects. In this thesis, we proposed three data-driven smoothing parameter selection methods. Two of these methods are modifications of the EBBS method of Opsomer and Ruppert (1999) and rely on the asymptotic bias results derived in this thesis. The third method is a non-asymptotic plug-in method. Our methods fill a gap in the literature of partially linear models with correlated errors, as they are designed specifically for accurate estimation of the linear effects. These methods 'undersmooth' the estimated non-linear effect because they attempt to estimate the amount of smoothing that is MSE-optimal for estimating the linear effects, not the amount of smoothing that is MSE-optimal for estimating the non-linear effect.
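The mechanism behind this undersmoothing can be illustrated with a back-of-the-envelope calculation. It assumes, as is typical for local linear smoothing but is not restated in this chapter, that the conditional bias of a linear-effect estimator is of order $h^2$ and its conditional variance of order $1/n$; the constants $C_1$ and $C_2$ are placeholders, and the precise conditions are those of Chapters 4 and 5.

```latex
% Assumed orders (illustrative): squared bias ~ C_1 h^4, variance ~ C_2 / n.
\mathrm{MSE}(h) \approx C_1 h^4 + \frac{C_2}{n},
\qquad
\frac{C_1 h^4}{C_2 / n} \to 0 \iff n h^4 \to 0 .

% With h \asymp n^{-1/5}, the usual MSE-optimal order for a local linear
% estimate of the non-linear effect, the squared bias dominates:
n h^4 \asymp n \cdot n^{-4/5} = n^{1/5} \to \infty .
```

Under these assumed orders, the squared bias is negligible relative to the variance only when $n h^4 \to 0$, which requires values of $h$ smaller than those that are MSE-optimal for estimating the non-linear effect.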
Our theoretical results suggest that, in general, the amount of smoothing that is MSE-optimal for estimating the linear effects is smaller than the amount of smoothing that is MSE-optimal for estimating the non-linear effect.

The issue of conducting valid inferences on the linear effects in a partially linear model with correlated errors is inter-connected with the appropriate choice of smoothing for estimating these effects. Most literature results devoted to this issue use choices of smoothing that 'do well' for estimation of the non-linear effect and are deterministic. Such choices may not be satisfactory when one wishes to 'do well' for estimation of the linear effects and hence have little practical value in such contexts. The confidence interval and hypothesis testing procedures proposed in this thesis are constructed with data-driven choices of smoothing. They are either standard, bias-adjusted or standard-error adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature of partially linear models. The inferential procedures we introduced in this thesis do not account for the uncertainty associated with the fact that the choice of smoothing is data-dependent and the error correlation structure is estimated from the data. However, simulations indicate that several of these procedures perform reasonably well for finite samples.

In Chapter 8, we conducted a Monte Carlo simulation study to investigate the finite sample properties of the linear effects estimators proposed in this thesis, namely, the usual and estimated modified local linear backfitting estimators. We also compared the properties of these estimators against those of the usual Speckman estimator. In our simulation study, we chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6.
By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors (MCV) and for boundary effects. The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the usual local linear backfitting estimator should be used in practice, with either a global modified EBBS or a non-asymptotic plug-in choice of smoothing. To ensure the validity of the inferences based on this estimator and its associated standard error, one should never use small values of $l$ in the modified (or leave-$(2l+1)$-out) cross-validation criterion utilized in estimating the error correlation structure. Adjusting these inferences for possible bias effects did not affect the quality of our results. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings, owing to the fact that the associated standard errors were too variable. The quality of the inferences based on the Speckman estimator was reasonable for most simulation settings, but not as good as that of the inferences based on the usual local linear backfitting estimator.

In Chapter 9, we used the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggested that there is no conclusive proof that PM10 had a significant short-term effect on log mortality. Our data analysis differs from standard analyses in that it relies on objective methods to adjust this effect for temporal confounding.
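The reading of "no conclusive proof" used above corresponds to a standard 95% confidence interval that contains zero. A minimal sketch of such a check follows. It assumes the interval is the estimate plus or minus a standard normal quantile times the standard error, which may differ in detail from formula (7.2); the function names `standard_ci` and `excludes_zero` are hypothetical.

```python
from statistics import NormalDist

def standard_ci(estimate, std_error, level=0.95):
    """Two-sided interval: estimate +/- z * std_error, with z a standard normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def excludes_zero(ci):
    """True when the interval lies entirely above or entirely below zero."""
    lo, hi = ci
    return lo > 0.0 or hi < 0.0
```

An interval for which `excludes_zero` returns `False` crosses the zero line, so the corresponding effect would not be declared statistically significant at the chosen level.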
Further Work to be Done

As usual, there is further work to be done. The following are just a few of the issues that need additional investigation.

Proofs of the asymptotic normality of the linear effects estimators proposed in this thesis are still pending. These proofs will provide formal justification for using standard confidence intervals and tests of hypotheses based on these estimators and their associated standard errors.

Further investigation into the appropriate choice of $l$ in the modified cross-validation criterion used in estimating the error correlation structure is needed. This choice should take into account the range and magnitude of the error correlation.

Possible Extensions to Our Work

The work in this thesis can be extended in various directions.

First, we could extend the partially linear model considered in this thesis by allowing additional univariate smooth terms to enter the model. Such models arise frequently in practical applications. Developing inferential methodology for these models is therefore important. To carry out inferences on the linear effects in such models we would need to simultaneously choose the amounts of smoothing for estimating all the non-linear effects. These amounts should be appropriate for accurate estimation of the linear effects and should account for correlation between the linear and non-linear variables and correlation between the model errors.

Second, we could extend the partially linear model considered in this thesis to responses that are not continuous. For instance, the responses could follow a Poisson distribution. Incorporating correlation in such models could be a challenge.

Third, we could extend the partially linear model considered in this thesis by allowing the non-linear variable to be a spatial coordinate, in which case $m$ is a spatial effect. Such a model is termed a spatial partially linear model. Clearly, in many contexts, the errors would be correlated.
Spatial partially linear models with correlated errors can be used, for instance, to analyze spatial data observed in epidemiological studies of particulate air pollution and mortality. Typically, in these applications, the linear effects $\beta$ are of main interest, while the spatial effect $m$ is treated as a nuisance. Ramsay et al. (2003b) considered spatial partially linear models with uncorrelated errors and estimated $\beta$ and $m$ using the S-Plus function gam with loess as a smoother. They used gam's default choice of smoothing to control the degree of smoothness of the estimated $m$. They showed via simulation that the correlation between the linear and spatial terms in the model can lead to underestimation of the true standard errors associated with the estimated linear effects, both when using S-Plus standard errors and so-called asymptotically unbiased standard errors. They cautioned that using such standard errors can compromise the validity of inferences concerning the linear effects, but did not propose a solution for alleviating this problem. Their findings highlight the fact that carrying out inferences on the linear effects in spatial partially linear models with uncorrelated errors is challenging in the presence of correlation between the linear and spatial terms in the model. Obviously, error correlation will further compound the challenges involved in conducting valid inferences on the linear effects in spatial partially linear models. Of course, this work would be relevant in the non-spatial context as well.

Bibliography

[1] Aneiros Perez, G. and Quintela del Rio, A. (2001a). Asymptotic properties in partial linear models under dependence. Test, 10, 333-355.
[2] Aneiros Perez, G. and Quintela del Rio, A. (2001b). Modified cross-validation in semiparametric regression models with dependent errors. Communications in Statistics: Theory and Methods, 30, 289-307.
[3] Aneiros Perez, G. and Quintela del Rio, A. (2002).
Plug-in bandwidth choice in partial linear models with autoregressive errors. Journal of Statistical Planning and Inference, 100, 23-48.
[4] Bos, R., de Waele, S. and Broersen, P.M.T. (2002). Autoregressive spectral estimation by application of the Burg algorithm to irregularly sampled data. IEEE Transactions on Instrumentation and Measurement, 51, 1289-1294.
[5] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods. Second Edition. New York: Springer-Verlag.
[6] Broersen, P.M.T. (2000). Finite Sample Criteria for Autoregressive Order Selection. IEEE Transactions on Signal Processing, 48, 3550-3558.
[7] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models (with discussion). Annals of Statistics, 17, 453-555.
[8] Chatfield, C. (1989). The Analysis of Time Series: An Introduction. Fourth Edition. New York: Chapman and Hall.
[9] Chu, C.-K. and Marron, J.S. (1991). Comparison of two bandwidth selectors with dependent errors. Annals of Statistics, 19, 1906-1918.
[10] David, B. and Bastin, G. (2001). An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems. Automatica, 156, 99-106.
[11] Dominici, F., McDermott, A., Zeger, S.L. and Samet, J.M. (2002). On the use of generalized additive models in time-series studies of air pollution and health. American Journal of Epidemiology, 156, 193-203.
[12] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1983). Nonparametric estimates of the relation between weather and electricity demand. Technical report, U.C. San Diego.
[13] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. The Journal of the American Statistical Association, 81, 310-320.
[14] Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. The Annals of Statistics, 21, 196-216.
[15] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. New York: Chapman and Hall.
[16] Fan, J. and Gijbels, I. (1992). Variable Bandwidth and Local Linear Regression Smoothers. The Annals of Statistics, 20, 2008-2036.
[17] Francisco-Fernandez, M. and Vilar-Fernandez, J.M. (2001). Local polynomial regression with correlated errors. Communications in Statistics: Theory and Methods, 30, 1271-1293.
[18] Gasser, T. and Müller, H.G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics, 11, 171-185.
[19] Green, P., Jennison, C. and Seheult, A. (1985). Analysis of field experiments by least squares smoothing. Journal of the Royal Statistical Society, Series B, 47, 299-315.
[20] Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models. New York: Chapman and Hall.
[21] Härdle, W. and Vieu, P. (1992). Kernel regression smoothing of time series. Journal of Time Series Analysis, 13, 209-232.
[22] Heckman, N.E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society, Series B, 48, 244-248.
[23] Ibragimov, I.A. and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables. Groningen: Wolters Noordhoff.
[24] Katsouyanni, K., Touloumi, G. and Samoli, E., et al. (1997). Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA2 project. Epidemiology, 12, 521-531.
[25] Kelsall, J.E., Samet, J.M. and Zeger, S.L. (1997). Air pollution and mortality in Philadelphia, 1974-1988. American Journal of Epidemiology, 146, 750-762.
[26] Moolgavkar, S. (2000). Air pollution and hospital admissions for diseases of the circulatory system in three U.S. metropolitan areas. Journal of the Air & Waste Management Association, 50, 1199-1206.
[27] Moyeed, R.A. and Diggle, P.J. (1994). Rate of convergence in semiparametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.
[28] Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141-142.
[29] Opsomer, J.D. and Ruppert, D. (1998). A fully automated bandwidth selection method for fitting additive models. The Journal of the American Statistical Association, 93, 605-620.
[30] Opsomer, J.D. and Ruppert, D. (1999). A root-n consistent estimator for semiparametric additive modelling. Journal of Computational and Graphical Statistics, 8, 715-732.
[31] Ramsay, T., Burnett, R. and Krewski, D. (2003a). The effect of concurvity in generalized additive models linking mortality and ambient air pollution. Epidemiology, 14, 18-23.
[32] Ramsay, T., Burnett, R. and Krewski, D. (2003b). Exploring bias in a generalized additive model for spatial air pollution data. Environmental Health Perspectives, 111, 1283-1288.
[33] Rice, J.A. (1986). Convergence rates for partially splined models. Statistics and Probability Letters, 4, 203-208.
[34] Robinson, P.M. (1988). Root-n-consistent semiparametric regression. Econometrica, 56, 931-954.
[35] Samet, J.M., Dominici, F., Curriero, F., et al. (2000). Fine particulate air pollution and mortality in 20 U.S. cities: 1987-1994 (with discussion). New England Journal of Medicine, 343, 1742-1757.
[36] Schwartz, J. (1994). Nonparametric smoothing in the analysis of air pollution and respiratory illness. The Canadian Journal of Statistics, 22, 471-488.
[37] Schwartz, J. (1999). Air pollution and hospital admissions for heart disease in eight US counties. Epidemiology, 10, 17-22.
[38] Schwartz, J. (2000). Assessing confounding, effect modification, and thresholds in the associations between ambient particles and daily deaths. Environmental Health Perspectives, 108, 563-568.
[39] Schick, A. (1996). Efficient estimation in a semiparametric additive regression model with autoregressive errors. Stochastic Processes and their Applications, 61, 339-361.
[40] Schick, A. (1999). Efficient estimation in a semiparametric additive regression model with ARMA errors. Stochastic Processes and their Applications, 61, 339-361.
[41] Speckman, P.E. (1988). Regression analysis for partially linear models. Journal of the Royal Statistical Society, Series B, 50, 413-436.
[42] Sy, H. (1999). Automatic bandwidth choice in a semiparametric regression model. Statistica Sinica, 9, 775-794.
[43] Truong, Y.K. (1991). Nonparametric curve estimation with time series errors. Journal of Statistical Planning and Inference, 28, 167-183.
[44] Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David, ed.). Iowa State University Press, 205-235.
[45] Watson, G.S. (1964). Smooth regression analysis. Sankhya A, 26, 359-372.
[46] You, J. and Chen, G. (2004). Block external bootstrap in partially linear models with nonstationary strong mixing error terms. The Canadian Journal of Statistics, 32, 335-346.
[47] You, J., Zhou, X. and Chen, G. (2005). Jackknifing in partially linear regression models with serially correlated errors. Journal of Multivariate Analysis, 92, 386-404.
Appendix A

MSE Comparisons

In this appendix, we provide plots to help assess and compare the MSE properties of the estimators of the linear effect $\beta_1$ in model (8.1) that were discussed in Section 8.2.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.1: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.2: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.3: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.4: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.5: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.6: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.7: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.8: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.9: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus U_PLUG_IN; U_EBBS_L minus U_PLUG_IN; U_EBBS_L minus U_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.10: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$ and $\widehat{\beta}_{U,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.11: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.12: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.13: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.14: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.15: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.16: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 significance level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.17: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.18: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.19: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Boxplot panels: EM_EBBS_G minus EM_PLUG_IN; EM_EBBS_L minus EM_PLUG_IN; EM_EBBS_L minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.20: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{EM,\text{PLUG-IN}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{EM,\text{EBBS-L}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.21: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_1(z)$.
[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.22: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.23: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.24: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.25: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.26: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.27: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.28: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.29: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Boxplot panels: U_EBBS_G minus EM_EBBS_G; U_PLUG_IN minus EM_EBBS_G; U_EBBS_G minus U_PLUG_IN; S_MCV minus U_EBBS_G; S_MCV minus U_PLUG_IN; S_MCV minus EM_EBBS_G. Horizontal axis: l = 0, 1, ..., 10.]
Figure A.30: Boxplots of pairwise differences in log MSE for the estimators $\widehat{\beta}_{U,\text{PLUG-IN}}$, $\widehat{\beta}_{U,\text{EBBS-G}}$, $\widehat{\beta}_{EM,\text{EBBS-G}}$ and $\widehat{\beta}_{S,\text{MCV}}$ of the linear effect $\beta_1$ in model (8.1), where $l = 0, 1, \ldots, 10$. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSE's of the estimators for 500 data sets simulated from model (8.1) with $\rho = 0.8$ and $m(z) = m_2(z)$.

Appendix B

Validity of Confidence Intervals

In this appendix, we provide plots that help assess and compare the coverage properties of various methods for constructing standard 95% confidence intervals for $\beta_1$, the linear effect in model (8.1). For each method, we visualize point estimates and 95% confidence interval estimates for the true coverage achieved by that method.
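As a rough illustration of how a coverage point estimate and its 95% confidence interval might be computed from simulated intervals, the following minimal sketch may help. It assumes a normal-approximation (Wald) binomial interval for the true coverage; the thesis does not state in this appendix which interval it uses. The function name `coverage_estimate` and the toy intervals are hypothetical and not taken from the thesis.

```python
import math


def coverage_estimate(intervals, true_value, z=1.96):
    """Estimate the true coverage of a confidence interval method.

    intervals  : list of (lower, upper) confidence limits, one per simulated data set
    true_value : the true value of the linear effect used in the simulation
    z          : normal quantile (1.96 for a 95% interval); the Wald form is an assumption

    Returns the point estimate and the lower and upper limits of the interval
    estimate for the true coverage, clipped to [0, 1].
    """
    n = len(intervals)
    # Count the intervals that contain the true value.
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    p_hat = hits / n
    # Wald half-width based on the binomial standard error.
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Toy input (not thesis data): 475 of 500 intervals contain the true value 1.0.
toy_intervals = [(0.0, 2.0)] * 475 + [(2.0, 3.0)] * 25
p_hat, cov_lower, cov_upper = coverage_estimate(toy_intervals, true_value=1.0)
print(p_hat, cov_lower, cov_upper)
```

Under this sketch, a method would be judged consistent with its nominal level when the nominal coverage 0.95 falls inside the interval estimate.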
[Figure B.1 plot. Title: $\rho = 0$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.1: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_1(z)$.

[Figure B.2 plot. Title: $\rho = 0.2$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.2: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Figure B.3 plot. Title: $\rho = 0.4$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.3: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Figure B.4 plot. Title: $\rho = 0.6$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.4: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Figure B.5 plot. Title: $\rho = 0.8$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.5: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Figure B.6 plot. Title: $\rho = 0$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.6: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0$ and $m(z) = m_2(z)$.

[Figure B.7 plot. Title: $\rho = 0.2$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.7: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure B.8 plot. Title: $\rho = 0.4$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.8: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Figure B.9 plot. Title: $\rho = 0.6$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.9: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure B.10 plot. Title: $\rho = 0.8$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Panels: USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L, SPECKMAN + MCV. Legend entry: method with superior MSE performance.]

Figure B.10: Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect $\beta_1$ in model (8.1). Each method depends on a tuning parameter $l = 0, 1, \ldots, 10$. The nominal coverage of each method is indicated via a horizontal line. Estimates were obtained with $\rho = 0.8$ and $m(z) = m_2(z)$.

Appendix C

Confidence Interval Length Comparisons

In this appendix, we provide plots that help assess and compare the length properties of three methods for constructing standard 95% confidence intervals for $\beta_1$, the linear effect in model (8.1). These methods rely on the estimators $\widehat{\beta}_{U,PLUG-IN}$, $\widehat{\beta}_{U,EBBS-G}$ and $\widehat{\beta}_{S,MCV}$ and their associated standard errors. We remind the reader that the finite sample properties of these estimators were investigated via simulation in Chapter 8.
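As an illustration of the kind of paired length comparison summarized in this appendix, the following Python sketch computes the mean pairwise difference in interval lengths between two methods and flags it when it differs from 0 at the 0.05 level. It is a minimal sketch under stated assumptions, not the procedure used in the thesis: the function name `compare_lengths`, the large-sample z-test, and the synthetic lengths are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def compare_lengths(lengths_a, lengths_b, alpha=0.05):
    """Paired comparison of confidence interval lengths from two methods.

    lengths_a[i] and lengths_b[i] are the interval lengths obtained by the two
    methods on the same simulated data set i. Returns the mean difference
    (a minus b), its standard error, and whether the mean difference is
    significantly different from 0 at level alpha. The large-sample z-test
    is an assumption of this sketch, not a statement of the thesis method.
    """
    if len(lengths_a) != len(lengths_b):
        raise ValueError("both methods must be run on the same data sets")
    diffs = [a - b for a, b in zip(lengths_a, lengths_b)]
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0.0:
        # All differences are identical: significant only if they are non-zero.
        significant = mean_diff != 0.0
    else:
        z = mean_diff / std_err
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        significant = p_value < alpha
    return mean_diff, std_err, significant


# Synthetic lengths for illustration only; these are not thesis results.
rng = random.Random(0)
base = [rng.gauss(1.10, 0.05) for _ in range(500)]
method_a = [x + rng.gauss(0.0, 0.02) for x in base]
method_b = [x + 0.05 + rng.gauss(0.0, 0.02) for x in base]

mean_diff, std_err, significant = compare_lengths(method_a, method_b)
print(f"mean difference = {mean_diff:.4f}, s.e. = {std_err:.4f}, S = {significant}")
```

Pairing the lengths by simulated data set removes the variability shared by both methods, which is why differences are analysed rather than the two sets of lengths separately.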
[Figure C.1 plot. Title: $\rho = 0$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 0$ = confidence interval with shortest expected length for each $l > 0$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.1: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_1(z)$.

[Figure C.2 plot. Title: $\rho = 0.2$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 1$ = confidence interval with shortest expected length for each $l > 1$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.2: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.2$ and $m(z) = m_1(z)$.

[Figure C.3 plot. Title: $\rho = 0.4$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 2$ = confidence interval with shortest expected length for each $l > 2$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.3: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.4$ and $m(z) = m_1(z)$.

[Figure C.4 plot. Title: $\rho = 0.6$; $m(z) = 2\sin(3z) - 2(\cos(0) - \cos(3))/3$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 3$ = confidence interval with shortest expected length for each $l > 3$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.4: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.6$ and $m(z) = m_1(z)$.

[Figure C.5 plot. Title: $\rho = 0.8$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 4$ = confidence interval with shortest expected length for each $l > 4$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.5: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.8$ and $m(z) = m_1(z)$.

[Figure C.6 plot. Title: $\rho = 0$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 0$ = confidence interval with shortest expected length for each $l > 0$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.6: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0$ and $m(z) = m_2(z)$.

[Figure C.7 plot. Title: $\rho = 0.2$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 1$ = confidence interval with shortest expected length for each $l > 1$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.7: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.2$ and $m(z) = m_2(z)$.

[Figure C.8 plot. Title: $\rho = 0.4$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 2$ = confidence interval with shortest expected length for each $l > 2$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.8: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.4$ and $m(z) = m_2(z)$.

[Figure C.9 plot. Title: $\rho = 0.6$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 3$ = confidence interval with shortest expected length for each $l > 3$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.9: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.6$ and $m(z) = m_2(z)$.

[Figure C.10 plot. Title: $\rho = 0.8$; $m(z) = 2\sin(6z) - 2(\cos(0) - \cos(6))/6$. Top row panels: U-PLUG-IN, U-EBBS-G, S-MCV. Boxplot rows, each for $l = 0, \ldots, 10$: U_EBBS_G minus U_PLUG_IN, S_MCV minus U_PLUG_IN, S_MCV minus U_EBBS_G. Legend: $l > 4$ = confidence interval with shortest expected length for each $l > 4$ (among U-PLUG-IN, U-EBBS-G and S-MCV).]

Figure C.10: Top row: Average length of the standard confidence intervals for the linear effect $\beta_1$ in model (8.1) as a function of $l = 0, 1, \ldots, 10$. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for $\beta_1$. Boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. Lengths were computed with $\rho = 0.8$ and $m(z) = m_2(z)$.
Inference in partially linear models with correlated errors Ghement, Isabella Rodica 2005-12-31
Item Metadata
Title | Inference in partially linear models with correlated errors |
Creator | Ghement, Isabella Rodica |
Date | 2005 |
Date Issued | 2009-12-21T20:45:55Z |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Collection | Retrospective Theses and Dissertations, 1919-2007 |
Series | UBC Retrospective Theses Digitization Project |
Date Available | 2009-12-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0092286 |
URI | http://hdl.handle.net/2429/16950 |
Degree | Doctor of Philosophy - PhD |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |