Inference in Partially Linear Models with Correlated Errors

by

ISABELLA RODICA GHEMENT

B.Sc., The University of Bucharest, Romania, 1996
M.Sc., The University of Bucharest, Romania, 1997

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE STUDIES
(Statistics)

The University of British Columbia
August 2005

© ISABELLA RODICA GHEMENT, 2005

Abstract

We study the problem of performing statistical inference on the linear effects in partially linear models with correlated errors. To estimate these effects, we introduce usual, modified and estimated modified backfitting estimators, relying on locally linear regression. We obtain explicit expressions for the conditional asymptotic bias and variance of the usual backfitting estimators under the assumption that the model errors follow a mean zero, covariance-stationary process. We derive similar results for the modified backfitting estimators under the more restrictive assumption that the model errors follow a mean zero, stationary autoregressive process of finite order. Our results assume that the width of the smoothing window used in locally linear regression decreases at a specified rate, while the number of data points in this window increases. These results indicate that the squared bias of the considered estimators can dominate their variance in the presence of correlation between the linear and non-linear variables in the model, thereby compromising their √n-consistency. We suggest that this problem can be remedied by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. We argue that this rate is slower than the rate that is optimal for estimating the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect. For this reason, data-driven methods devised for accurate estimation of the non-linear effect may fail to yield a satisfactory choice of smoothing for estimating the linear effects. We introduce three data-driven methods for accurate estimation of the linear effects. Two of these methods are modifications of the Empirical Bias Bandwidth Selection method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. We use the data-driven choices of smoothing supplied by these methods as a basis for constructing approximate confidence intervals and tests of hypotheses for the linear effects. Our inferential procedures do not account for the uncertainty associated with the fact that the choices of smoothing are data-dependent and the error correlation structure is estimated from the data. We investigate the finite sample properties of our procedures via a simulation study. We also apply these procedures to the analysis of data collected in a time-series air pollution study.
Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Literature Review
    1.1.1 Partially Linear Models with Uncorrelated Errors
    1.1.2 Partially Linear Models with Correlated Errors
  1.2 Thesis Objectives

2 A Partially Linear Model with Correlated Errors
  2.1 The Model
  2.2 Assumptions
  2.3 Notation
  2.4 Linear Algebra - Useful Definitions and Results
  2.5 Appendix

3 Estimation in a Partially Linear Model with Correlated Errors
  3.1 Generic Backfitting Estimators
    3.1.1 Usual Generic Backfitting Estimators
    3.1.2 Modified Generic Backfitting Estimators
    3.1.3 Estimated Modified Generic Backfitting Estimators
    3.1.4 Usual, Modified and Estimated Modified Speckman Estimators

4 Asymptotic Properties of the Local Linear Backfitting Estimator β̂_{I,S_c}
  4.1 Exact Conditional Bias of β̂_{I,S_c} Given X and Z
  4.2 Exact Conditional Variance of β̂_{I,S_c} Given X and Z
  4.3 Exact Conditional Measure of Accuracy of β̂_{I,S_c} Given X and Z
  4.4 The √n-consistency of β̂_{I,S_c}
  4.5 Generalization to Local Polynomials of Higher Degree
  4.6 Appendix

5 Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators β̂_{Ψ⁻¹,S_c} and β̂_{Ψ̂⁻¹,S_c}
  5.1 Exact Conditional Bias of β̂_{Ψ⁻¹,S_c} Given X and Z
  5.2 Exact Conditional Variance of β̂_{Ψ⁻¹,S_c} Given X and Z
  5.3 Exact Conditional Measure of Accuracy of β̂_{Ψ⁻¹,S_c} Given X and Z
  5.4 The √n-consistency of β̂_{Ψ⁻¹,S_c}
  5.5 Generalization to Local Polynomials of Higher Degree
  5.6 The √n-consistency of β̂_{Ψ̂⁻¹,S_c}
  5.7 Appendix

6 Choosing the Correct Amount of Smoothing
  6.1 Notation
  6.2 Choosing h for c^T β̂_{I,S_c} and c^T β̂_{Ψ⁻¹,S_c}
    6.2.1 Review of Opsomer and Ruppert's EBBS Method
    6.2.2 Modifications to the EBBS Method
    6.2.3 Plug-in Method
  6.3 Estimating m, σ_ε² and Ψ
    6.3.1 Estimating m
    6.3.2 Estimating σ_ε² and Ψ
  6.4 Choosing h for c^T β̂_{Ψ̂⁻¹,S_c}

7 Confidence Interval Estimation and Hypothesis Testing
  7.1 Confidence Interval Estimation
    7.1.1 Bias-Adjusted Confidence Interval Construction
    7.1.2 Standard Error-Adjusted Confidence Interval Construction
  7.2 Hypothesis Testing

8 Monte Carlo Simulations
  8.1 The Simulated Data
  8.2 The Estimators
  8.3 The MSE Comparisons
  8.4 Confidence Interval Coverage Comparisons
    8.4.1 Standard Confidence Intervals
    8.4.2 Bias-Adjusted Confidence Intervals
    8.4.3 Standard Error-Adjusted Confidence Intervals
  8.5 Confidence Interval Length Comparisons
  8.6 Conclusions

9 Application to Air Pollution Data
  9.1 Data Description
  9.2 Data Analysis
    9.2.1 Models Entertained for the Data
    9.2.2 Importance of Choice of Amount of Smoothing
    9.2.3 Choosing an Appropriate Model for the Data
    9.2.4 Inference on the PM10 Effect on Log Mortality

10 Conclusions

Bibliography

Appendix A  MSE Comparisons
Appendix B  Validity of Confidence Intervals
Appendix C  Confidence Interval Length Comparisons

List of Tables

8.1  Values of l for which the standard 95% confidence intervals for β_1 constructed from the estimators β̂_{U,PLUG-IN}, β̂_{U,EBBS-G} and β̂_{S,MCV} are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study.

List of Figures

8.1  Data simulated from model (8.1) for ρ = 0, 0.4, 0.8 and m(z) = m_1(z). The first row shows plots that do not depend on ρ. The second and third rows each show plots for ρ = 0, 0.4, 0.8.
8.2  Data simulated from model (8.1) for ρ = 0, 0.4, 0.8 and m(z) = m_2(z). The first row shows plots that do not depend on ρ. The second and third rows each show plots for ρ = 0, 0.4, 0.8.
9.1  Pairwise scatter plots of the Mexico City air pollution data.
9.2  Results of gam inferences on the linear PM10 effect β_1 in model (9.3) as a function of the span used for smoothing the seasonal effect m_1: estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for β_1 (bottom left) and p-values of t-tests for testing the statistical significance of β_1.
9.3  The top panel displays a scatter plot of log mortality versus PM10, with the ordinary least squares regression line of log mortality on PM10 superimposed. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.
9.4  Plots of the fitted seasonal effect m_1 in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots.
9.5  Plots of the residuals associated with model (9.2) for various spans.
9.6  P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2).
9.7  Plots of the fitted weather surface m_2 in model (9.4) when the fitted seasonal effect m_1 (not shown) was obtained with a span of 0.09. The surface m_2 was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).
9.8  Degrees of freedom consumed by the fitted weather surface m_2 in model (9.4) versus the span used for smoothing m_2, when the fitted seasonal effect m_1 (not shown) was obtained with a span of 0.09.
9.9  Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown m_1 in model (9.3) is 0.09.
9.10 Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown m_1 in model (9.3) is 0.09.
9.11 Plot of residuals associated with model (9.3) versus temperature, given relative humidity. The span used for smoothing the unknown m_1 in model (9.3) is 0.09.
9.12 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3). The span used for smoothing the unknown m_1 in model (9.3) is 0.09.
9.13 Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).
9.14 Usual local linear backfitting estimate of the linear PM10 effect in model (9.4) versus the smoothing parameter.
9.15 Preliminary estimates of the seasonal effect m_1 in model (9.3), obtained with a modified (or leave-(2l+1)-out) cross-validation choice of amount of smoothing.
9.16 Residuals associated with model (9.3), obtained by estimating m_1 with a modified (or leave-(2l+1)-out) cross-validation choice of amount of smoothing.
9.17 Estimated order of the AR process describing the serial correlation in the residuals associated with model (9.3) versus l, where l = 0, 1, ..., 26. Residuals were obtained by estimating m_1 with a modified (or leave-(2l+1)-out) cross-validation choice of amount of smoothing.
9.18 Estimated squared bias, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of β_1. The different curves correspond to different values of l, where l = 0, 1, ..., 26. The estimated variance curves corresponding to small values of l are dominated by those corresponding to large values of l when the smoothing parameter is large. In contrast, the estimated squared bias and mean squared error curves corresponding to small values of l dominate those corresponding to large values of l when the smoothing parameter is large.
9.19 Estimated squared bias, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of β_1. The different curves correspond to different values of l, where l = 0, 1, ..., 26. The curves corresponding to large values of l dominate those corresponding to small values of l.
9.20 Plug-in choice of smoothing for estimating β_1 versus l, where l = 0, 1, ..., 26.
9.21 Global EBBS choice of smoothing for estimating β_1 versus l, where l = 0, 1, ..., 26.
9.22 Standard 95% confidence intervals for β_1 based on local linear backfitting estimates of β_1 with plug-in choices of smoothing. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents confidence intervals corresponding to values of l that are reasonable for the data.
9.23 Standard 95% confidence intervals for β_1 based on local linear backfitting estimates of β_1 with global EBBS choices of smoothing. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents intervals corresponding to values of l that are reasonable for the data; the intervals corresponding to l = 3, ..., 7 do not cross the horizontal line passing through zero.
9.24 Standard 95% confidence intervals for β_1 based on local linear backfitting estimates of β_1 with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents confidence intervals corresponding to values of l that are reasonable for the data.
A.1-A.10  Boxplots of pairwise differences in log MSE for the estimators β̂_{U,PLUG-IN}, β̂_{U,EBBS-G} and β̂_{U,EBBS-L} of the linear effect β_1 in model (8.1), where l = 0, 1, ..., 10. Boxplots for which the average difference in log MSE is significantly different than 0 at the 0.05 level are labeled with an S. Differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1); the ten figures correspond to ρ = 0, 0.2, 0.4, 0.6, 0.8 with m(z) = m_1(z) (A.1-A.5) and with m(z) = m_2(z) (A.6-A.10).
A.11-A.20  Boxplots of pairwise differences in log MSE for the estimators β̂_{EM,PLUG-IN}, β̂_{EM,EBBS-G} and β̂_{EM,EBBS-L} of the linear effect β_1 in model (8.1), laid out as in A.1-A.10: ρ = 0, 0.2, 0.4, 0.6, 0.8 with m(z) = m_1(z) (A.11-A.15) and with m(z) = m_2(z) (A.16-A.20).
A.21-A.30  Boxplots of pairwise differences in log MSE for the estimators β̂_{U,PLUG-IN}, β̂_{U,EBBS-G}, β̂_{EM,EBBS-G} and β̂_{S,MCV} of the linear effect β_1 in model (8.1), laid out as in A.1-A.10: ρ = 0, 0.2, 0.4, 0.6, 0.8 with m(z) = m_1(z) (A.21-A.25) and with m(z) = m_2(z) (A.26-A.30).
B.1-B.10  Point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect β_1 in model (8.1). Each method depends on a tuning parameter l = 0, 1, ..., 10. The nominal coverage of each method is indicated via a horizontal line. The ten figures correspond to ρ = 0, 0.2, 0.4, 0.6, 0.8 with m(z) = m_1(z) (B.1-B.5) and with m(z) = m_2(z) (B.6-B.10).
C.1-C.10  Top row: average length of the standard confidence intervals for the linear effect β_1 in model (8.1) as a function of l = 0, 1, ..., 10, with standard error bars attached. Bottom three rows: boxplots of pairwise differences in the lengths of the standard confidence intervals for β_1; boxplots for which the average difference in lengths is significantly different than 0 at the 0.05 level are labeled with an S. The ten figures correspond to ρ = 0, 0.2, 0.4, 0.6, 0.8 with m(z) = m_1(z) (C.1-C.5) and with m(z) = m_2(z) (C.6-C.10).

Acknowledgements

A huge thank you to my thesis supervisor, Dr. Nancy Heckman, for being such an inspirational mentor to me - amazingly generous with her time, ideas, advice and NSERC funding, immensely passionate about research and teaching, wonderfully encouraging and supportive.

A sincere thank you to Dr. John Petkau, Department of Statistics, University of British Columbia, and to Professor Sverre Vedal and Dr. Eduardo Hernandez-Garduño, formerly of the Respiratory Division, Department of Medicine, Faculty of Medicine, University of British Columbia, for kindly providing me with the Mexico City air pollution data. A heartfelt thank you to Dr. John Petkau for generously funding me to analyze these data, for providing me with valuable feedback upon reading the manuscript of this thesis and for his excellent advice over the years.

A sincere thank you to Dr. Lang Wu, Dr. Jim Zidek, Dr. Michael Brauer and Dr.
Jean Opsomer for their careful reading of the thesis manuscript and valuable comments and suggestions.

Thank you to the Department of Statistics and the University of British Columbia for providing me with funding that enabled me to pursue my degree. I would like to thank all faculty, staff and graduate students in the Department of Statistics, University of British Columbia, for making my stay there such an enriching experience.

I would like to thank my family in Romania for believing in me and for loving me unconditionally. I would also like to thank my dear friends in Canada and Romania, whose affection and humour helped me stay grounded. Special thanks to Viviane Diaz-Lima, Lisa Kuramoto and Raluca Balan for their unwavering support and for being my friends.

Finally, thank you to Jeffie, my partner in mischief and adventure, for loving me and our family in Romania beyond measure, and for making magical things happen all the time.

ISABELLA RODICA GHEMENT
The University of British Columbia
August 2005

Dedication

Jeffie, my ever-loving, ever-caring, ever-there knight in shining armour, and our loving family in Romania.

Chapter 1

Introduction

Semiparametric regression models combine the ease of interpretation of parametric regression models with the modelling flexibility of nonparametric regression models. They generalize parametric regression models by allowing one or more covariate effects to be non-linear. Just as in nonparametric regression models, the non-linear covariate effects are assumed to change gradually and are captured via smooth, unknown functions whose particular shapes will be revealed by the data. In this thesis, we are interested in semiparametric regression models for which (i) the response variable is univariate and continuous, (ii) one of the covariate effects is allowed to be smooth and non-linear, and (iii) the remaining covariate effects are assumed to be linear. Given the data (Y_i, X_i, Z_i), i = 1, ..., n, such models can be specified as:

Y_i = X_i^T β + m(Z_i) + ε_i,   i = 1, ..., n,   (1.1)

where β is a vector of linear effects, m is a smooth, non-linear effect and the ε_i's are unobservable random errors with zero mean. Model (1.1) is typically referred to as a partially linear regression model.

In many applications, the smooth, non-linear effect m in model (1.1) is not of interest in itself but is included in the model because of its potential for confounding the linear effects β, which are of main interest. The nature of this confounding is often too complex to specify parametrically. A non-parametric specification of this confounding effect is therefore preferred to avoid modelling biases. The practical choice of the degree of smoothness of the non-linear confounder effect is a delicate issue in these types of applications. This choice should yield accurate point estimators of the linear effects of interest, and it may be highly sensitive to the correlations between the linear and non-linear variables in the model.

The potential correlation amongst model errors is a qualitatively different source of confounding on the linear effects of interest in a partially linear model. In practice, we need to decide carefully whether we should account for this correlation when assessing the significance and magnitude of the linear effects of interest. If one decides to ignore the error correlation, one should try to understand the impact of this decision on the validity of the ensuing inferences.

The issues of error correlation, non-linear confounding, and correlation between the linear and non-linear terms in a partially linear regression model are intimately connected. Their interplay needs to be judiciously considered when selecting the degree of smoothness of the estimated non-linear effect. Even when this selection yields accurate estimators of the linear effects of interest in the model, one needs to assess whether it also yields valid confidence intervals and testing procedures for assessing the magnitude and significance of these effects.
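To make the structure of model (1.1) concrete, the sketch below simulates data from it. It is purely illustrative and not part of the thesis: the choices m(z) = sin(2πz), AR(1) errors, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta = np.array([1.0, 2.0, -1.0])           # (beta_0, beta_1, beta_2)

# Fixed design points Z_i on [0, 1] and a smooth, non-linear effect m(.)
Z = (np.arange(n) + 0.5) / n
m = np.sin(2 * np.pi * Z)                    # integrates to 0 over [0, 1]

# Linear covariates; the first is correlated with Z, mimicking the
# 'non-linear confounding' discussed above
X = np.column_stack([np.ones(n),
                     Z + rng.normal(0, 0.3, n),
                     rng.normal(0, 1, n)])

# Serially correlated errors: a mean-zero, stationary AR(1) process
rho, sigma = 0.6, 0.5
eps = np.zeros(n)
eps[0] = rng.normal(0, sigma / np.sqrt(1 - rho**2))
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + rng.normal(0, sigma)

Y = X @ beta + m + eps                       # model (1.1) in vector form
```

Any fitting method for (1.1) must then contend simultaneously with the serial correlation in ε and with the correlation between a column of X and Z.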
T h e issues of error correlation, non-linear confounding, and correlation between the linear and non-linear terms i n a partially linear regression model are intimately connected. T h e i r interplay needs to be judiciously considered when selecting the degree of smoothness of the estimated non-linear effect. E v e n when this selection yields accurate estimators of the linear effects of interest i n the model, one needs to assess whether it also yields valid confidence intervals and testing procedures for assessing the magnitude and significance of these effects.  1.1  Literature Review-  i n this section, we provide a survey of some of the most important results i n the literature of partially linear regression models of the form (1.1). We treat separately the case when the model errors, €$, i = 1 , . . . , n , are uncorrelated and when they are correlated. Note that, i n (1.1), we observe only one sequence Y i , . . . ,Y n  studies we would observe multiple sequences. 2  In classical longitudinal  E v e n though i n this thesis we are not  interested in partially linear models for analyzing data collected in longitudinal studies, we do mention some results which are significant in the literature of these models.  1.1.1  Partially Linear Models with Uncorrelated Errors  The partially linear regression model (1.1)-has been investigated extensively under the assumption of independent, identically distributed errors. In this section, we provide a brief overview of some of the most relevant results concerning inferences on /3, the parametric component of the model, that are available in the literature. These results have a.common theme: seeing if /3 is estimated at the 'usual' parametric rate of 1/n - the rate that would be achieved if m were known. As Robinson (1988) points out, consistent estimators of (3 that do not have the 'usual' parametric rate of convergence have zero efficiency relative to estimators that have this rate. Engle et al. (1983) and Wahba (1984) proposed estimating (3 and m simultaneously by minimizing a penalized least squares criterion with penalty based on the s  th  derivative of  m, with s > 2. The performance of the penalized least squares estimator of @ depends on the correlation between the linear and non-linear variables in the model. Heckman (1986) established the ^/^-consistency of this estimator assuming that the linear and non-linear variables are uncorrelated. Rice (1986) showed that, if the linear and non-linear variables are correlated, the estimator becomes y^n-inconsistent, unless one 'undersmooths' the estimated m. 'Undersmoothing' refers to the phenomenon of estimating m at a slower rate than the 'usual' nonparametric rate of n~ ^ - the rate that would be achieved if 4  5  (3 were known. Rice showed that if one didn't 'undersmooth', the squared bias of the estimated linear effects would dominate their variance. The author remarked that this would have disastrous consequences on the inferences carried out on the linear effects. For instance, conventional confidence intervals for these effects would be misleading. Rice called into question the utility of traditional methods such as cross-validation for choosing the degree of smoothness of the estimated non-linear effect when i/n-consistency 3  of the estimated linear effects is desired, and rightly so. These methods are devised for 'smoothing', not 'undersmoothing', the estimated non-linear effect. 
Green, Jennison and Seheult (1985) proposed estimating (3 and m by minimizing a penalized least squares criterion with penalty based on a discretization of the second derivative of m. They termed their estimation method least squares smoothing and showed that it yields estimators that solve a system of backfitting equations. These equations combine a smoothing step for estimating m, carried out using a discretized version of smoothing splines, with a least squares regression step for estimating (3. Green, Jennison and Seheult generalized their least squares smoothing estimators by allowing the smoothing step in the backfitting equations to be carried out using any smoothing method. These generalized least squares smoothing estimators are referred to in the literature as the Green, Jennison and Seheult estimators. Speckman (1988) derived the asymptotic bias and variance of the Green, Jennison and Seheult estimator of /3, using locally constant regression with general kernel weights in the smoothing step. Speckman's findings paralleled those of Rice: in the presence of correlation between the linear and non-linear variables in the model, the Green, Jennison and Seheult estimator of (3 is -^-consistent only if one 'undersmooths' the estimated m. Speckman provided a heuristic argument for why the generalized cross-validation method cannot be used to choose the degree of smoothness of the estimated m in practice when -y/n-consistency of the Green, Jennison and Seheult estimator of (3 is desired. Neither Rice nor Speckman proposed methods for 'undersmoothing' the estimated m. However, Speckman (1988) introduced a partial-residual flavoured estimator of (3 that does not require 'undersmoothing'. He argued that traditional methods such as generalized cross-validation could be used to select the degree of smoothness of the estimated m. Speckman did not address the important issue of whether such data-driven methods would produce amounts of smoothing that yield -y/n-consistent estimators of the linear effects of interest. Sy (1999) established that data-driven methods such as cross-validation  4  and generalized cross-validation do indeed yield i/n-consistent estimators of these effects, thus paving the way for carrying out valid inferences on these effects, at least for large sample sizes. Opsomer and Ruppert (1999) proposed estimating (3 and m via the Green, Jennison and Seheult estimators, using locally linear regression with general kernel weights in the smoothing step. They showed that, unless one 'undersmooths' the estimated m, their estimator of (3 may not achieve -y/n-consistency. They then suggest how to use the data to choose the appropriate degree of smoothness for accurate estimation of c (3, with c T  known. Opsomer and Ruppert's approach for choosing the right degree of smoothness, referred to as the Empirical Bias Bandwidth Selection (EBBS) method, will be discussed in more detail in Chapter 6. The authors conjectured that EBBS would produce a yfnconsistent estimator of c f3. T  1.1.2  Partially Linear Models with Correlated Errors  The independence assumption for the errors associated with a partially linear regression model is not always appropriate in applications. For instance, when the data have been collected sequentially over time, it is likely that present response values will be correlated with past response values. Even in the presence of error correlation, it is desirable to obtain y^-consistent estimators for the linear effects in the model. Engle et al. 
(1986) were amongst the first authors to consider a partially linear regression model with AR(1) errors. They noted that the correct error correlation structure can be used to transform this model into a model with serially uncorrelated errors, by quasidifferencing all of the data. They proposed estimating the linear effects (3 and the nonlinear effect m in the original model by applying the penalized least squares method proposed by Engle et al. (1983) and Wahba (1984) to the quasi-differenced data. Engle et al. (1986) prove that their estimator of (3 is consistent when one estimates m at the 'usual' nonparametric rate of n~ / , but do not show it is y/n— consistent. They recommend 4  5  5  choosing both the 'right' degree of smoothness of the estimated m and the autoregressive parameter by minimizing a generalized cross-validation criterion constructed from the quasi-differenced data. This data-driven choice of smoothing may not however yield an accurate estimator of j3, as it is geared at accurate estimation of m. Schick (1996, 1999) considered partially linear regression models with AR(1) errors and ARMA(p,cj) errors, respectively, where p,q > 1. He characterized and constructed efficient estimators for the parametric component f3 of these models, assuming appropriate theoretical choice of degree of smoothness for the estimated m. He did not however indicate how one might make this choice in practice. Several authors investigated partially linear models with a-mixing errors. Before reviewing their respective contributions, we provide a definition for the a-mixing concept. For reference, see Ibragimov and Linnik (1971).  Definition 1.1.1 A sequence of random variables {e ,t = 0 , ± 1 , - - - } is said to be at  mixing if a(k) = sup as k —> oo where J ™^ and F^  sup  7  ;  +k  \P(Af\B)  - P(A)P(B)\  -* 0  (1.2)  are two a-fields generated by {e ,t < n} and {e ,t > t  t  n + k}, respectively.  The mixing coefficient a(k) in (1.2) measures the amount of dependence between events involving variables separated by at least k lags. Note that for stationary sequences the supremum over n in (1.2) goes away. Aneiros Perez and Quintela del Rio (2001a) considered a partially linear model with amixing, stationary errors. They proposed estimating j3 and m via modifications of the Speckman estimators. Their modifications account for the error correlation structure, assumed to be fully known. The smoothing step involved in estimating /3 and m is based 6  on locally constant regression with Gasser-Miiller weights (Gasser and Miiller, 1984), adjusted for boundary effects. The authors derived the order of the conditional asymptotic bias and variance of the modified Speckman estimator of (3. They found that the conditional asymptotic bias of their estimator of /3 is negligible with respect to its conditional asymptotic variance, shown to have the 'usual' parametric rate of convergence of 1/n. They concluded they do not need to 'undersmooth' their estimator for m in order to obtain a \Zn-consistent estimator for /3. The fact that the modified Speckman estimator of (3 does not require 'undersmoothing' in the presence of error correlation is not surprising. The estimator inherits this property from the usual Speckman estimator. Aneiros Perez and Quintela del Rio (2001b) proposed a data-driven modified cross-validation method for choosing the degree of smoothness required for accurate estimation of the regression function r(Xi, Zi) = Xf/3 + m(Zi) via modified Speckman estimators. 
It is not clear whether such a method would be suitable for accurate estimation of (3 itself. To address the problem of choosing the degree of smoothness for accurate estimation of (3 via the modified Speckman estimator, Aneiros Perez and Quintela del Rio (2002) developed an asymptotic plug-in method. Their method relies on the more restrictive assumption that the model errors are realizations of an autoregressive process of finite, known order. You and Chen (2004) considered a partially linear model with a-mixing, possibly nonstationary errors. They estimated /3 and m using the usual Speckman estimators, which do not account for error correlation. They then applied a block external bootstrap approach to approximate the distribution of the usual Speckman estimator of (3 and provide a consistent estimator of its covariance matrix. Using this information, they constructed a large-sample confidence interval procedure for estimating f3. Based on a simulation study, the authors note that the block size seems to have a strong influence on the finite-sample performance of their procedure. However, they do not indicate how one might choose the block size in practice. In the simulation study, the smoothing parameter of the usual Speckman estimator of (3 was selected via cross-validation, modified for correlated errors. This method is appropriate for accurate estimation of m but may not 7  be suitable for accurate estimation of /3. You, Zhou and Chen (2005) considered a partially linear model with errors assumed to follow a moving average process of infinite order. They proposed a jackknife estimator for /3, which they obtained from a usual Speckman estimator. They showed their estimator to be asymptotically equivalent to the usual Speckman estimator, and proposed a method for estimating its asymptotic variance. They also constructed confidence intervals and tests of hypotheses for /3 based on the jackknife estimator and its estimated variance. In their simulation study, these authors find that confidence interval estimation based on their jackknife estimator has better finite-sample coverage properties than that based on the usual Speckman estimator, even though the latter uses the information on the error structure, while the former does not. In this study, the smoothing was performed with different nearest neighbor smoothing parameter values and the results were shown to be insensitive to the choice of this parameter. This may not always be the case for contexts that are different from that considered by these authors. As we already mentioned, partially linear regression models with correlated errors can be used for analyzing longitudinal data, that is, data obtained by measuring each of several study units on multiple occasions over time. Longitudinal data are naturally correlated, as the measurements taken on the same study unit are correlated. In order to estimate the linear effects /3 and the non-linear effect m in such models, Moyeed and Diggle (1994) modified the Green, Jennison and Seheult and the Speckman estimators to account for the longitudinal data structure and for the error correlation, assumed to be known. Their smoothing step used local constant Nadaraya-Watson weights (Nadaraya, 1964 and Watson, 1964). They derived the order of the conditional asymptotic bias and variance of their estimators of /3, obtaining asymptotic constants only for the variance of these estimators. 
Their results are valid under the assumption that the number of study units goes to infinity and the number of occasions on which each study unit is being measured is kept constant. Note that Moyeed and Diggle did not treat m as a  8  nuisance. T o choose the degree of smoothness of the estimated m , these authors used a leave-one-subject-out cross-validation method. T h i s method is geared towards accurate estimation of m and may not be appropriate for accurate estimation of j3. None of the authors considered i n this section looked simultaneously at how to choose the right degree of smoothing for accurate estimation of the linear effects and how to construct valid standard errors for the estimated linear effects.  To do b o t h requires  accounting for the correlation structure of the model errors.  1.2  Thesis Objectives  Throughout this thesis, we will consider only partially linear models of the form (1.1) i n which the non-linear effect m is treated as a nuisance. In contrast to the 'usual' view i n regression models, we w i l l think of the linear covariates as being r a n d o m but consider the Zi's to be fixed. T h e reason for this is that we are m a i n l y interested i n applications for which the Z ; ' s are consecutive time points (e.g. days, weeks, years). T h e results i n this thesis can be easily modified to account for the case when the Z^'s are r a n d o m instead of fixed. However, some expressions need to be re-defined to account for the randomness of the Z j ' s . For instance, see the end of Sections 4.1 and 4.2. In this thesis, we w i l l allow the linear covariates to be m u t u a l l y correlated and assume they are related to the non-linear covariates v i a a non-parametric regression relationship. M o s t importantly, we w i l l assume that the model errors are serially correlated.  W i t h i n this framework, we  w i l l concentrate on developing formal methods for carrying out valid inferences on those linear effects i n the model which are of m a i n interest. T h i s entails the following:  1. defining sensible estimators for the linear effects i n the model, as well as for the nuisance non-linear effect;  9  2. deriving the asymptotic bias and variance of the proposed estimators of the linear effects; 3. developing methods for choosing the right degree of smoothness of the estimated non-linear effect in order to accurately estimate the linear effects of interest; 4. developing methods for estimating the correlation structure of the model errors for inference and smoothing; 5. developing methods for assessing the magnitude and statistical significance of the linear effects of interest; 6. investigating the performance of the proposed inferential methods via Monte Carlo simulation studies; 7. using the inferential methods developed in this thesis to answer specific questions related to the impact of air pollution on mortality in Mexico City during 1994-1996, after adjusting for weather patterns and temporal trends. We conclude this chapter with an overview of the thesis which indicates where and how the above objectives are addressed. In Chapter 2, we provide a formal definition of the partially linear model with correlated errors of interest in this thesis. We also introduce the notation and assumptions required for establishing the theoretical results in subsequent chapters. 
In Chapter 3, we define the following types of estimators for $\beta$ and $m$: (i) local linear backfitting estimators, (ii) modified local linear backfitting estimators, and (iii) estimated modified local linear backfitting estimators.

In Chapter 4, we derive asymptotic approximations for the exact conditional bias and variance of the local linear backfitting estimator of $\beta$. Based on these results we conclude that, in general, the local linear backfitting estimator of $\beta$ is not $\sqrt{n}$-consistent. We argue that the estimator can achieve $\sqrt{n}$-consistency provided we 'undersmooth' the corresponding local linear backfitting estimator of $m$.

In Chapter 5, we replicate the results in Chapter 4 for the modified local linear backfitting estimator of $\beta$. We also provide sufficient conditions under which the estimated modified local linear backfitting estimator of $\beta$ is asymptotically 'close' to its modified counterpart.

In Chapter 6, we develop three data-driven methods for choosing the degree of smoothness of the backfitting estimators of $m$ defined in this thesis in order to accurately estimate $\beta$. Two of these methods are modifications of the Empirical Bias Bandwidth Selection (EBBS) method of Opsomer and Ruppert (1999). The third method is a non-asymptotic plug-in method. All methods account for error correlation. We suspect that these methods 'undersmooth' the estimated $m$ because they attempt to estimate the amount of smoothing that is optimal for estimating $\beta$, not for estimating $m$. Our theoretical results suggest that, in general, the optimal amount of smoothing for estimating $\beta$ is smaller than the optimal amount of smoothing for estimating $m$. In Chapter 6, we also introduce methods for estimating the correlation structure of the model errors needed to choose the amount of smoothing of the backfitting estimators of $\beta$ and to carry out inferences on $\beta$. These methods rely on a modified cross-validation criterion similar to that proposed by Aneiros Perez and Quintela del Rio (2001b).

In Chapter 7, we develop three kinds of confidence intervals and tests of hypotheses for assessing the magnitude and significance of a linear combination $c^T\beta$ of the linear effects in the model: standard, bias-adjusted and standard error-adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature of partially linear models.

In Chapter 8, we report the results of a Monte Carlo simulation study. In this study, we investigated the finite sample properties of the usual and estimated modified local linear backfitting estimators of $c^T\beta$ against those of the usual Speckman estimator. We chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6. By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors and for boundary effects. The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the quality of the inferences based on the usual local linear backfitting estimator was superior, and that this estimator should be computed with one of our modifications of EBBS or a non-asymptotic plug-in choice of smoothing.
Even though the quality of the inferences based on the usual Speckman estimator was reasonable for most simulation settings, it was not as good as that of the inferences based on the usual local linear backfitting estimator. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings.

In Chapter 9, we use the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggests that there is no conclusive proof that PM10 had a significant short-term effect on log mortality.

In Chapter 10, we summarize the main contributions of this thesis and suggest possible extensions to our work.

Chapter 2

A Partially Linear Model with Correlated Errors

In Section 2.1 of this chapter, we provide a formal definition of the partially linear model of interest in this thesis. In Section 2.2, we introduce assumptions that we use to study the asymptotic behavior of our proposed estimators. In Section 2.3, we introduce some useful notation. In Section 2.4, we give several linear algebra definitions and results which will be utilized throughout this thesis. The chapter concludes with an Appendix which contains a useful theoretical result.

2.1 The Model

Given the data $(Y_i, X_{ij}, Z_i)$, $i = 1, \ldots, n$, $j = 1, \ldots, p$, the specific form of the partially linear model considered in this thesis is:

Y = X\beta + m + \epsilon,    (2.1)

where $Y = (Y_1, \ldots, Y_n)^T$ is the vector of responses, $X$ is the design matrix for the parametric part of the model (to be defined shortly), $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the vector of unknown linear effects, $m = (m(Z_1), \ldots, m(Z_n))^T$ and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ is the vector of model errors. Here,

X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix},    (2.2)

where $X_{i1}, \ldots, X_{ip}$ are measurements on $p$ variables $X_1, \ldots, X_p$, the $Z_i$'s are fixed design points on $[0,1]$ following a design density $f(\cdot)$ (see condition (A3) in Section 2.2 for the exact definition), and $m(\cdot)$ is a real-valued, unknown, smooth function defined on $[0,1]$.

Note that, unless we impose a restriction on $m(\cdot)$, model (2.1) is unidentifiable due to the presence of the intercept $\beta_0$ in the model. For instance, $\beta_0 + m(\cdot) = 0 + (m(\cdot) + \beta_0)$. To ensure identifiability, we assume that $m(\cdot)$ satisfies the integral restriction:

\int_0^1 m(z) f(z)\, dz = 0.    (2.3)

In practice, we replace (2.3) by the summation restriction:

1^T m = 0,    (2.4)

where the symbol $1$ denotes an $n \times 1$ vector of 1's.

One could think of the smooth function $m(\cdot)$ as being a transformation of the fixed design points $Z_i$, $i = 1, \ldots, n$, that ensures that the partially linear model (2.1) is an adequate description of the variability in the $Y_i$'s. Alternatively, one could think of the function $m(\cdot)$ as representing the confounding effect of a random variable having density $f(\cdot)$ on the linear effects $\beta_1, \ldots, \beta_p$.

We assume that the errors $\epsilon_i$ in model (2.1) are such that $E(\epsilon_i) = 0$, $Var(\epsilon_i) = \sigma_\epsilon^2$ and $Corr(\epsilon_i, \epsilon_j) = \Psi_{ij}$ for $i \neq j$, where $\sigma_\epsilon > 0$ and $\Psi = (\Psi_{ij})$ is the $n \times n$ error correlation matrix. Note that $\Psi$ is not necessarily equal to the $n \times n$ identity matrix $I$. In practice, both the error variance $\sigma_\epsilon^2$ and the error correlation matrix $\Psi$ are typically unknown and need to be estimated from the data.
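To make the preceding definitions concrete, the short sketch below simulates a single draw from model (2.1) with $p = 2$ linear covariates and serially correlated errors. The functions $g_1$, $g_2$, $m$, the coefficient values and the AR(1) error structure are illustrative choices of ours (the AR structure anticipates condition (A2) of Section 2.2); they are not quantities prescribed by the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 500
Z = np.arange(1, n + 1) / (n + 1)                    # fixed regular design on [0,1], f = 1
g = np.column_stack([np.sin(2 * np.pi * Z), Z**2])   # g_j(Z_i), as in (2.7) below
eta = rng.normal(scale=0.3, size=(n, 2))             # eta_ij: mean 0, iid rows
X = np.column_stack([np.ones(n), g + eta])           # design matrix (2.2), X_ij = g_j(Z_i) + eta_ij
beta = np.array([1.0, 2.0, -1.0])                    # (beta_0, beta_1, beta_2), illustrative

m = np.cos(2 * np.pi * Z)                            # satisfies (2.3) when f = 1 on [0,1]

phi, sigma_u = 0.6, 0.5                              # AR(1) errors: eps_t = phi*eps_{t-1} + u_t
eps = np.empty(n)
eps[0] = rng.normal(scale=sigma_u / np.sqrt(1 - phi**2))   # stationary start
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + rng.normal(scale=sigma_u)

Y = X @ beta + m + eps                               # the partially linear model (2.1)
```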
An alternative formulation for the partially linear model (2.1) can be obtained by removing the constraint (2.3), setting $m^* = \beta_0 1 + m$ and re-writing the model as:

Y = X^* \beta^* + m^* + \epsilon,    (2.5)

where $X^*$ is an $n \times p$ matrix defined as:

X^* = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix}    (2.6)

and $\beta^* = (\beta_1, \ldots, \beta_p)^T$. The model formulation in (2.5) is frequently encountered in the partially linear model literature and does not require that we impose any identifiability conditions on the function $m^*(z) = \beta_0 + m(z)$, $z \in [0,1]$. Indeed, the absence of an intercept in model (2.5) ensures that $m^*(\cdot)$ is identifiable. In this thesis, however, we prefer to use the formulation in (2.1), as it makes it easier to understand that model (2.1) is a generalization of a linear regression model and a particular case of an additive model, which typically do contain an intercept.

2.2 Assumptions

The asymptotic results derived in Chapters 4 and 5 allow the linear variables in model (2.1) to be correlated with the non-linear variable via the following condition.

(A0) The covariate values $X_{ij}$ and the non-random design points $Z_i$ are related via the nonparametric regression model:

X_{ij} = g_j(Z_i) + \eta_{ij}, \quad i = 1, \ldots, n, \ j = 1, \ldots, p,    (2.7)

where

(i) the $g_j(\cdot)$'s are smooth, unknown functions having three continuous derivatives;

(ii) the $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, are independent, identically distributed unobserved random vectors with mean zero and variance-covariance matrix $\Sigma = (\Sigma_{jj'})$.

We impose two different sets of assumptions on the errors associated with model (2.1) for studying the asymptotic behaviour of two different estimators of $\beta$. In Section 3.1.1 of Chapter 3 we define the so-called local linear backfitting estimator of $\beta$. The definition of this estimator does not account for the correlation structure of the model errors. In Chapter 4, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition.

(A1) (i) The model errors $\epsilon_i$, $i = 1, \ldots, n$, represent $n$ consecutive realizations from a general covariance-stationary process $\{\epsilon_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and correlation coefficients:

\rho_k = \frac{E(\epsilon_t \epsilon_{t-k})}{\sigma_\epsilon^2} = \frac{E(\epsilon_s \epsilon_{s+k})}{\sigma_\epsilon^2}, \quad k = 1, 2, 3, \ldots,    (2.8)

where $t, s = 0, \pm 1, \pm 2, \ldots$.

(ii) The error correlation matrix $\Psi$ is assumed to be symmetric, positive-definite and to have a bounded spectral norm, that is $\|\Psi\|_S = O(1)$ as $n \to \infty$. (For a definition of the spectral norm of a matrix see Section 2.4.)

(iii) Let $(\eta_{i1}, \ldots, \eta_{ip})^T$, $i = 1, \ldots, n$, be as in (A0)-(ii). Then there exists a $(p+1) \times (p+1)$ matrix $\Phi^{(0)}$ such that the error correlation matrix $\Psi$ satisfies:

\frac{1}{n+1} \eta^T \Psi \eta = \Phi^{(0)} + o_P(1)    (2.9)

as $n \to \infty$, where

\eta = \begin{pmatrix} 0 & \eta_{11} & \cdots & \eta_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \eta_{n1} & \cdots & \eta_{np} \end{pmatrix}.    (2.10)

(iv) $\epsilon_i$ is independent of $(\eta_{j1}, \ldots, \eta_{jp})^T$ for any $i, j = 1, \ldots, n$.

In Section 3.1.2 of Chapter 3 we define the so-called modified local linear backfitting estimator of the vector of linear effects $\beta$ in model (2.1). The definition of this estimator assumes full knowledge of the correlation matrix of the model errors.
In Chapter 5, we study the asymptotic behaviour of this estimator under the assumption that the model errors satisfy the following condition:

(A2) (i) The $\epsilon_i$'s represent $n$ consecutive realizations from a covariance-stationary autoregressive process of finite order $R$ having mean 0, finite, non-zero variance $\sigma_\epsilon^2$ and satisfying:

\epsilon_t = \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \cdots + \phi_R \epsilon_{t-R} + u_t, \quad t = 0, \pm 1, \pm 2, \ldots,    (2.11)

with $\{u_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, being independent, identically distributed random variables having mean 0 and finite, non-zero variance $\sigma_u^2$.

(ii) $\epsilon_i$ is independent of $(\eta_{j1}, \ldots, \eta_{jp})^T$ for any $i, j = 1, \ldots, n$, where the $(\eta_{j1}, \ldots, \eta_{jp})^T$, $j = 1, \ldots, n$, are as in (A0)-(ii).

According to Comments 2.2.1 - 2.2.3 below, if the errors $\epsilon_i$ satisfy condition (A2), they also satisfy condition (A1).

Comment 2.2.1 If the errors $\epsilon_i$, $i = 1, \ldots, n$, satisfy condition (A2), then one can easily see that they also satisfy condition (A1)-(i). Moreover, one can show that their correlation matrix $\Psi = (\Psi_{ij})$ is given by $\Psi_{ii} = 1$, $\Psi_{ij} = \rho(|i-j|) = \rho_{|i-j|}$, $i \neq j$, where $\rho$ is a correlation function and the $\rho_k$'s satisfy the Yule-Walker equations:

\rho_k = \phi_1 \rho_{k-1} + \cdots + \phi_R \rho_{k-R}, \quad \text{for } k > 0.

The general solution of these difference equations is:

\rho_k = \psi_1 \lambda_1^k + \psi_2 \lambda_2^k + \cdots + \psi_R \lambda_R^k, \quad \text{for } k \geq 0,

where the $\lambda_i$, $i = 1, \ldots, R$, are the roots of the polynomial equation:

z^R - \phi_1 z^{R-1} - \cdots - \phi_R = 0.

Initial conditions for determining $\psi_1, \ldots, \psi_R$ can be obtained by using $\rho_0 = 1$ together with the first $R - 1$ Yule-Walker equations. For more details, see Chatfield (1989, page 38).

Comment 2.2.2 If the errors $\epsilon_i$, $i = 1, \ldots, n$, satisfy condition (A2), then their correlation matrix $\Psi = (\Psi_{ij})$ satisfies condition (A1)-(ii) by Comment 2.2.1 and result (5.34) of Lemma 5.7.2 (Appendix, Chapter 5). In other words, $\Psi$ is symmetric, positive-definite and has finite spectral norm.

Comment 2.2.3 If the errors $\epsilon_i$, $i = 1, \ldots, n$, associated with model (2.1) satisfy condition (A2) then, by Lemma 2.5.1 in the Appendix of this chapter, $\Psi$ satisfies (2.9) of condition (A1)-(iii), with $\Phi^{(0)} = \Sigma^{(0)}$ and $\Sigma^{(0)}$ defined as in (2.15).

Comment 2.2.4 Due to its parametric nature, assumption (A2) allows us to find an explicit expression for the inverse of the error correlation matrix $\Psi$, making the derivation of the asymptotic results concerning the modified local linear estimator of $\beta$ easier. We have not been able to modify our proof of these results to handle the more general assumption (A1), since finding an explicit expression for $\Psi^{-1}$ under (A1) may not be possible.

The asymptotic results derived in Chapters 4 and 5 assume $h$, the half-width of the window of smoothing involved in the definition of the local linear backfitting estimator and the modified local linear estimator of $\beta$, to be deterministic and to satisfy

h \to 0    (2.12)

and

n h^3 \to \infty    (2.13)

as $n \to \infty$. These asymptotic results also rely on the conditions below.

(A3) The $Z_i$'s are non-random and follow a regular design, i.e. there exists a continuous strictly positive density $f(\cdot)$ on $[0,1]$ with:

\int_0^{Z_i} f(z)\, dz = \frac{i}{n+1}, \quad i = 1, \ldots, n.

Moreover, $f(\cdot)$ admits two continuous derivatives.

(A4) $m(\cdot)$ is a smooth function with 3 continuous derivatives.

(A5) $K(\cdot)$, the kernel function used in (3.7) and (3.8), is a probability density function symmetric about 0 and Lipschitz continuous, with compact support $[-1,1]$.
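As a computational companion to Comment 2.2.1, the sketch below builds the AR($R$) correlation matrix $\Psi$ with $\Psi_{ij} = \rho_{|i-j|}$ by solving the first $R$ Yule-Walker equations for $\rho_1, \ldots, \rho_R$ and then extending the autocorrelations by the recursion $\rho_k = \phi_1 \rho_{k-1} + \cdots + \phi_R \rho_{k-R}$. The helper names are ours, and the sketch assumes a stationary process with $n > R$.

```python
import numpy as np

def ar_autocorrelations(phi, n_lags):
    """Autocorrelations rho_0, ..., rho_{n_lags} of a stationary AR(R) process
    with coefficients phi = (phi_1, ..., phi_R), via the Yule-Walker equations."""
    R = len(phi)
    A = np.zeros((R, R))            # linear system for (rho_1, ..., rho_R),
    b = np.zeros(R)                 # using rho_0 = 1 and rho_{-k} = rho_k
    for k in range(1, R + 1):       # equation: rho_k = sum_j phi_j rho_{k-j}
        for j in range(1, R + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] -= phi[j - 1]          # phi_j * rho_0 goes to the RHS
            else:
                A[k - 1, lag - 1] += phi[j - 1]
        A[k - 1, k - 1] -= 1.0
    rho = np.empty(n_lags + 1)      # assumes n_lags >= R
    rho[0] = 1.0
    rho[1:R + 1] = np.linalg.solve(A, b)
    for k in range(R + 1, n_lags + 1):          # extend by the recursion
        rho[k] = sum(phi[j] * rho[k - 1 - j] for j in range(R))
    return rho

def ar_correlation_matrix(phi, n):
    """n x n matrix Psi with Psi_ij = rho_{|i-j|}, as in Comment 2.2.1."""
    rho = ar_autocorrelations(phi, n - 1)
    idx = np.arange(n)
    return rho[np.abs(idx[:, None] - idx[None, :])]
```

For instance, `ar_correlation_matrix([0.6], 5)` reproduces the familiar AR(1) pattern $\Psi_{ij} = 0.6^{|i-j|}$.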
2.3 Notation

Let $Z_i$, $i = 1, \ldots, n$, be design points satisfying the design condition (A3) and let $g_1(\cdot), \ldots, g_p(\cdot)$ be functions satisfying the smoothness assumptions in condition (A0)-(i). We define the $n \times (p+1)$ matrix $G$ as:

G = \begin{pmatrix} 1 & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ 1 & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix} = \begin{pmatrix} g_0(Z_1) & g_1(Z_1) & \cdots & g_p(Z_1) \\ \vdots & \vdots & & \vdots \\ g_0(Z_n) & g_1(Z_n) & \cdots & g_p(Z_n) \end{pmatrix},    (2.14)

where $g_0(\cdot) \equiv 1$. Furthermore, let the $n \times (p+1)$ matrix $\eta$ be defined as in (2.10) (condition (A1)-(iii)). In light of condition (A0)-(ii), the transposed rows of $\eta$ are independent, identically distributed degenerate random vectors with mean zero and variance-covariance matrix $\Sigma^{(0)}$, where:

\Sigma^{(0)} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & \Sigma_{11} & \cdots & \Sigma_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \Sigma_{p1} & \cdots & \Sigma_{pp} \end{pmatrix}.    (2.15)

Using equation (2.7) of condition (A0) ($X_{ij} = g_j(Z_i) + \eta_{ij}$) together with the definitions of $G$ and $\eta$ in equations (2.14) and (2.10), we can express the design matrix $X$ in (2.2) as:

X = G + \eta.    (2.16)

Let $K(\cdot)$ be a kernel function satisfying condition (A5); if $z \in [0,1]$ and $h \in [0,1/2]$, define the following quantity:

\nu_l(K, z, h) = \int_{-z/h}^{(1-z)/h} s^l K(s)\, ds, \quad l = 0, 1, 2, 3.    (2.17)

Note that, if $z \in [h, 1-h]$, i.e. $z$ is an 'interior' point of the interval $[0,1]$, then $\nu_l(K, z, h) = \int_{-1}^{1} s^l K(s)\, ds \equiv \nu_l(K)$, as $[-z/h, (1-z)/h] \supseteq [-1,1]$ and $K(\cdot)$ has compact support on $[-1,1]$ by condition (A5).

Now, for $g_0(\cdot), \ldots, g_p(\cdot)$ as above and $f(\cdot)$ a design density, we let:

\int_0^1 g(z) f(z)\, dz = \left( \int_0^1 g_0(z) f(z)\, dz, \ldots, \int_0^1 g_p(z) f(z)\, dz \right)^T    (2.18)

and

\int_0^1 g(z) m''(z) f(z)\, dz = \left( \int_0^1 g_0(z) m''(z) f(z)\, dz, \ldots, \int_0^1 g_p(z) m''(z) f(z)\, dz \right)^T.    (2.19)

We also let $\int_0^1 g(z)^T f(z)\, dz = \left( \int_0^1 g(z) f(z)\, dz \right)^T$ and define the $(p+1) \times (p+1)$ matrix $V$ as:

V = \Sigma^{(0)} + \int_0^1 g(z) f(z)\, dz \cdot \int_0^1 g(z)^T f(z)\, dz,    (2.20)

with $\Sigma^{(0)}$ as in (2.15). We also define the $(p+1)$ vector $W$ as:

W = \frac{\nu_2(K)}{2} \left[ \int_0^1 g(z) m''(z) f(z)\, dz - \int_0^1 g(z) f(z)\, dz \cdot \int_0^1 m''(z) f(z)\, dz \right].    (2.21)

Finally, define the $(p+1) \times (p+1)$ matrix $V_\Psi$ as:

V_\Psi = \frac{\sigma_\epsilon^2}{\sigma_u^2} \left[ \left( 1 + \sum_{k=1}^{R} \phi_k^2 \right) \Sigma^{(0)} + \left( 1 - \sum_{k=1}^{R} \phi_k \right)^2 \int_0^1 g(z) f(z)\, dz \cdot \int_0^1 g(z)^T f(z)\, dz \right].    (2.22)

2.4 Linear Algebra - Useful Definitions and Results

In this section, we first provide an overview of the vector and matrix norm definitions and properties used throughout the remainder of this thesis.

Let $A = (A_{ij})$ be an arbitrary $m \times n$ matrix and $B = (B_{kl})$ be an $n \times q$ matrix, both having real elements. Also, let $v = (v_1, \ldots, v_n)^T$ be an arbitrary $n \times 1$ vector with real elements. The spectral norm of the matrix $A$ is defined as:

\|A\|_S = \max_{\|v\|_2 \neq 0} \frac{\|Av\|_2}{\|v\|_2},

with $\| \cdot \|_2$ being the Euclidean norm of a vector, that is $\|v\|_2^2 = \sum_{i=1}^n v_i^2$. Furthermore, the Frobenius norm of $A$ is defined as:

\|A\|_F = \sqrt{ \sum_{i=1}^m \sum_{j=1}^n A_{ij}^2 }.

It is well-known that $\|A\|_S \leq \|A\|_F$. Clearly, if $A$ is a column vector (that is, $n = 1$), then $\|A\|_F = \|A\|_2$. In particular, if $A$ is a scalar (i.e., $m = n = 1$), then $\|A\|_S$ equals the absolute value of this scalar. It is also known that $\|A \cdot B\|_F \leq \|A\|_F \cdot \|B\|_F$.

We conclude this section by reviewing the definitions of random bilinear and quadratic forms and providing formulas for computing the expected value of such forms.

Suppose $A = (A_{ij})$ is an $n \times n$ matrix with real-valued elements, not necessarily symmetric. Similarly, suppose that $B = (B_{ij})$ is an $n \times m$ matrix with real-valued elements. Let $u$ be an arbitrary $n \times 1$ random vector having real-valued elements. Also, let $v$ be an arbitrary $m \times 1$ random vector with real-valued elements.
A bilinear form in $u$ and $v$ with regulator matrix $B$ is defined as:

B(u, v) = u^T B v = \sum_{i=1}^n \sum_{j=1}^m B_{ij} u_i v_j.

Note that $B(u, v)$ is random, and its expected value can be computed using the following formula:

E(B(u, v)) = \text{trace}(B\, Cov(u, v)^T) + E(u)^T B E(v).    (2.23)

In particular, a quadratic form in $u$ with regulator matrix $A$ is defined as:

Q(u) = u^T A u = \sum_{i=1}^n \sum_{j=1}^n A_{ij} u_i u_j,

with (2.23) reducing to:

E(Q(u)) = \text{trace}(A\, Var(u)) + E(u)^T A E(u).    (2.24)

2.5 Appendix

The following result helps establish that condition (A2) is a special case of condition (A1).

Lemma 2.5.1 Let $\eta$ be defined as in equation (2.10) of condition (A1) and let $\Psi$ be defined as in Comment 2.2.1. Then, as $n \to \infty$,

\frac{1}{n+1} \eta^T \Psi \eta = \Sigma^{(0)} + o_P(1),    (2.25)

where $\Sigma^{(0)}$ is defined as in (2.15).

Proof:

Let $\eta_l$ denote the $l$-th column of $\eta$ and consider $\eta_l^T \Psi \eta_t$, where $l, t = 1, \ldots, p+1$. When $l = 1$ or $t = 1$, this is 0. For $l, t = 2, \ldots, p+1$, writing $\eta_{i,l}$ for the $i$-th entry of $\eta_l$ and noting that replacing $1/(n+1)$ by $1/n$ does not affect the limit, we have:

\frac{1}{n} \eta_l^T \Psi \eta_t = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n \eta_{i,l} \Psi_{ij} \eta_{j,t}
= \frac{1}{n} \sum_{i=1}^n \eta_{i,l} \eta_{i,t} + \sum_{k=1}^{k_0} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right)
+ \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right),    (2.26)

where $[n/2]$ denotes the integer part of $n/2$ and $k_0$ is chosen independently of $n$ in the following fashion. Since $\sum_{k=1}^\infty |\rho(|k|)| < \infty$ (see Lemma 5.7.2 for a justification of this result), for any given $\epsilon > 0$ we can choose $k_0$ such that:

\sum_{k=k_0+1}^{\infty} |\rho(|k|)| < \frac{\epsilon^2}{C}

for some large constant $C$.

In light of condition (A0)-(ii), the first term in (2.26) converges to $\Sigma_{l-1,t-1}$ by the Weak Law of Large Numbers applied to the independent random variables $\eta_{i,l} \eta_{i,t}$, $i = 1, \ldots, n$.

The second term in (2.26) converges to zero in probability as $n \to \infty$ by the following argument. The random variables $\eta_{i,l} \eta_{i+k,t}$, $i = 1, \ldots, n-k$, are $k$-dependent and identically distributed by condition (A0)-(ii). The Weak Law of Large Numbers for $k$-dependent random variables implies that $\sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} / (n-k)$ converges to $E(\eta_{1,l} \eta_{1+k,t}) = E(\eta_{1,l}) E(\eta_{1+k,t}) = 0$ in probability as $n \to \infty$. A similar argument yields that the quantity $\sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} / n$ converges to 0 in probability as $n \to \infty$.

Now, consider the third term in (2.26). By Markov's Inequality and condition (A0)-(ii), for $n$ large enough, we have:

P\left( \left| \sum_{k=k_0+1}^{[n/2]} \rho(|k|) \left( \frac{1}{n} \sum_{i=1}^{n-k} \eta_{i,l} \eta_{i+k,t} + \frac{1}{n} \sum_{i=k+1}^{n} \eta_{i,l} \eta_{i-k,t} \right) \right| > \epsilon \right)
\leq \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \left( \frac{1}{n} \sum_{i=1}^{n-k} E|\eta_{i,l} \eta_{i+k,t}| + \frac{1}{n} \sum_{i=k+1}^{n} E|\eta_{i,l} \eta_{i-k,t}| \right)
\leq \frac{1}{\epsilon} \sum_{k=k_0+1}^{[n/2]} |\rho(|k|)| \cdot 2 E|\eta_{1,l} \eta_{1+k,t}| \leq \frac{C}{\epsilon} \sum_{k=k_0+1}^{\infty} |\rho(|k|)| < \frac{C}{\epsilon} \cdot \frac{\epsilon^2}{C} = \epsilon.

In conclusion, the third term in (2.26) converges to zero in probability as $n \to \infty$. Combining the previous results yields (2.25).

Chapter 3

Estimation in a Partially Linear Model with Correlated Errors

Obtaining sensible point estimators for the linear effects in a partially linear model with correlated errors is the first important step towards carrying out valid inferences on these effects. Such inferences include conducting hypothesis tests for assessing the statistical significance of the linear effects of interest, and constructing confidence intervals for these effects.
As we have seen in Sections 1.1.1-1.1.2, several methods for estimating the linear and non-linear effects in a partially linear model have been proposed in the literature, both in the presence and absence of correlation amongst model errors. In principle, any of these methods could be used to obtain point estimators for the linear effects in a partially linear model with correlated errors. However, those methods which ignore the correlation structure of the model errors might produce less efficient estimators than the methods which account explicitly for this correlation structure. It is still of interest to consider methods which do not account for the presence of correlation amongst the model errors when estimating the linear effects in the model. Indeed, these methods could yield valid testing procedures based on the inefficient point estimators they produce and the standard errors associated with these estimators.

In the present chapter we show that many of the estimation methods used in the literature for a partially linear model with known correlation structure can be conveniently viewed as particular cases of a generic Backfitting Algorithm. We also show how this generic Backfitting Algorithm can be modified for those instances when the error correlation structure is unknown and must be estimated from the data.

This chapter is organized as follows. In Section 3.1, we discuss the generic Backfitting Algorithm for estimating the linear and non-linear effects in model (2.1) when the error correlation structure is known. In particular, in Sections 3.1.1 and 3.1.2 we discuss the usual and modified generic backfitting estimators of these effects. In Section 3.1.3, we talk about appropriate modifications of these estimators that can be used when the error correlation structure is unknown. In Section 3.1.4, we discuss several generic backfitting estimators which are versions of the estimators introduced by Speckman (1988).

3.1 Generic Backfitting Estimators

In this section, we provide a formal definition for the generic backfitting estimators of the unknowns $\beta$ and $m$ in model (2.1). We also define and discuss various particular types of these estimators, clearly indicating which of these types we consider in this thesis.

We start by introducing some notation. Let $\Omega$ be an $n \times n$ matrix of weights such that the $(p+1) \times (p+1)$ matrix $X^T \Omega X$ is invertible. Also, let $S_h$ be a smoother matrix depending on a smoothing parameter $h$ which controls the width of the smoothing window. For example, the local linear smoother matrix is given in (3.6)-(3.8). Next, let $S_h^c$ be the centered version of $S_h$, obtained as:

S_h^c = (I - 1 1^T / n) S_h.    (3.1)

Formal definitions for $\Omega$ and $S_h^c$ will be provided shortly. For now, we note that the matrix of weights $\Omega$ may possibly depend on the known error correlation matrix $\Psi$ and on the smoother matrix $S_h^c$.

The constrained generic backfitting estimators $\hat{\beta}_{\Omega, S_h^c}$ and $\hat{m}_{\Omega, S_h^c}$ of $\beta$ and $m$ are defined as the fixed points of the following generic backfitting equations:

\hat{\beta}_{\Omega, S_h^c} = (X^T \Omega X)^{-1} X^T \Omega (Y - \hat{m}_{\Omega, S_h^c}),    (3.2)

\hat{m}_{\Omega, S_h^c} = S_h^c (Y - X \hat{\beta}_{\Omega, S_h^c}).    (3.3)

Use of the matrix $S_h^c$ instead of $S_h$ in equation (3.3) ensures that $\hat{m}_{\Omega, S_h^c}$ satisfies the identifiability condition $1^T \hat{m}_{\Omega, S_h^c} = 0$.

The motivation behind the generic backfitting equations introduced above is as follows.
Given an estimator $\hat{m}_{\Omega, S_h^c}$ of the unknown $m$ in model (2.1), one can construct the vector of partial residuals $Y - \hat{m}_{\Omega, S_h^c}$. Regressing these partial residuals on $X$ via weighted least squares yields the generic backfitting estimator $\hat{\beta}_{\Omega, S_h^c}$ in equation (3.2). On the other hand, given an estimator $\hat{\beta}_{\Omega, S_h^c}$ of the unknown $\beta$ in model (2.1), one can construct the vector of partial residuals $Y - X \hat{\beta}_{\Omega, S_h^c}$. Smoothing these partial residuals on $Z = (Z_1, \ldots, Z_n)^T$ via the smoother matrix $S_h^c$ yields the generic backfitting estimator $\hat{m}_{\Omega, S_h^c}$ in equation (3.3).

In practice, one could solve the generic backfitting equations (3.2)-(3.3) for $\hat{\beta}_{\Omega, S_h^c}$ and $\hat{m}_{\Omega, S_h^c}$ iteratively by employing a modification of the Backfitting Algorithm of Buja, Hastie and Tibshirani (1989), as follows.

The Generic Backfitting Algorithm

(i) Let $\hat{\beta}^{(0)}$ and $\hat{m}^{(0)}$ be initial estimators for $\beta$ and $m$ calculated as follows. We regress $Y$ on the parametric and nonparametric covariates in the model via weighted least squares regression, obtaining:

\hat{Y}(x_1, \ldots, x_p, z) = \hat{\gamma}_0 + \hat{\gamma}_1 x_1 + \cdots + \hat{\gamma}_p x_p + \hat{\gamma}_{p+1} (z - \bar{Z}).

Here, $\bar{Z} = (Z_1 + \cdots + Z_n)/n$. Note that, if $\bar{Z} = (\bar{Z}, \ldots, \bar{Z})^T$ is an $n \times 1$ vector, the weighted least squares estimators $\hat{\gamma} = (\hat{\gamma}_0, \hat{\gamma}_1, \ldots, \hat{\gamma}_p)^T$ and $\hat{\gamma}_{p+1}$ above are obtained by minimizing the following criterion with respect to $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_p)^T$ and $\gamma_{p+1}$:

[Y - X\gamma - \gamma_{p+1}(Z - \bar{Z})]^T \Omega [Y - X\gamma - \gamma_{p+1}(Z - \bar{Z})].

We let

\hat{m}^{(0)}(z) = \hat{\gamma}_{p+1} (z - \bar{Z})

and $\hat{m}^{(0)} = (\hat{m}^{(0)}(Z_1), \ldots, \hat{m}^{(0)}(Z_n))^T$. Also, we let $\hat{\beta}^{(0)} = \hat{\gamma}$. Note that $\hat{m}^{(0)}$ satisfies the identifiability condition (2.4).

(ii) Given the estimators $\hat{\beta}^{(l)}$ and $\hat{m}^{(l)}$, we construct $\hat{\beta}^{(l+1)}$ and $\hat{m}^{(l+1)}$ as follows:

\hat{\beta}^{(l+1)} = (X^T \Omega X)^{-1} X^T \Omega (Y - \hat{m}^{(l)}),

\hat{m}^{(l+1)} = S_h^c (Y - X \hat{\beta}^{(l)}).

Note that $\hat{m}^{(l+1)}$ satisfies the identifiability condition (2.4), since $S_h^c = (I - 1 1^T/n) S_h$ for some smoother matrix $S_h$.

(iii) Repeat (ii) until $\hat{\beta}^{(l)}$ and $\hat{m}^{(l)}$ do not change much.

If the Generic Backfitting Algorithm converges at the iteration labeled as $I + 1$, say, we set:

\hat{\beta}_{\Omega, S_h^c} = \hat{\beta}^{(I)}, \quad \hat{m}_{\Omega, S_h^c} = \hat{m}^{(I)}.

However, we need not iterate to find the generic backfitting estimators $\hat{\beta}_{\Omega, S_h^c}$ and $\hat{m}_{\Omega, S_h^c}$. Using the generic backfitting equations (3.2) and (3.3), we can easily derive an explicit expression for the generic backfitting estimator $\hat{\beta}_{\Omega, S_h^c}$. Simply substitute the expression of $\hat{m}_{\Omega, S_h^c}$ given in equation (3.3) into equation (3.2) and solve for $\hat{\beta}_{\Omega, S_h^c}$:

\hat{\beta}_{\Omega, S_h^c} = (X^T \Omega X)^{-1} X^T \Omega [Y - S_h^c (Y - X \hat{\beta}_{\Omega, S_h^c})]
= (X^T \Omega X)^{-1} X^T \Omega [(I - S_h^c) Y + S_h^c X \hat{\beta}_{\Omega, S_h^c}].

Pre-multiplying both sides of the above equation by $X^T \Omega X$ and rearranging yields

X^T \Omega (I - S_h^c) X \hat{\beta}_{\Omega, S_h^c} = X^T \Omega (I - S_h^c) Y.

Thus, provided the matrix $X^T \Omega (I - S_h^c) X$ is invertible,

\hat{\beta}_{\Omega, S_h^c} = (X^T \Omega (I - S_h^c) X)^{-1} X^T \Omega (I - S_h^c) Y.    (3.4)

To obtain the generic backfitting estimator $\hat{m}_{\Omega, S_h^c}$ without iterating, substitute the explicit expression of $\hat{\beta}_{\Omega, S_h^c}$ obtained above in (3.3) to get:

\hat{m}_{\Omega, S_h^c} = \left[ S_h^c - S_h^c X (X^T \Omega (I - S_h^c) X)^{-1} X^T \Omega (I - S_h^c) \right] Y.    (3.5)

Results (3.4) and (3.5) above show that the generic backfitting equations (3.2)-(3.3) have a unique solution as long as the $(p+1) \times (p+1)$ matrix $X^T \Omega (I - S_h^c) X$ is invertible.

Various specifications for the smoother matrix $S_h^c$ and the matrix of weights $\Omega$ appearing in the generic backfitting equations (3.2) and (3.3) (or, equivalently, in the explicit equations (3.4) and (3.5)) lead to different types of generic backfitting estimators.
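Before turning to particular specifications, note that, because (3.4) and (3.5) are closed-form, the generic backfitting estimators can be computed with a single linear solve instead of iterating steps (i)-(iii). The following is a minimal sketch under the stated invertibility condition; the function name is ours, and no attempt is made at numerical efficiency.

```python
import numpy as np

def generic_backfitting(Y, X, S_c, Omega):
    """Closed-form generic backfitting estimators (3.4)-(3.5):
    beta_hat = (X' Omega (I - S_c) X)^{-1} X' Omega (I - S_c) Y,
    m_hat    = S_c (Y - X beta_hat).
    Here S_c is the centered smoother matrix S_h^c and Omega the weight matrix."""
    n = len(Y)
    A = X.T @ Omega @ (np.eye(n) - S_c)       # X' Omega (I - S_h^c)
    beta_hat = np.linalg.solve(A @ X, A @ Y)  # solve rather than invert explicitly
    m_hat = S_c @ (Y - X @ beta_hat)          # satisfies 1' m_hat = 0 when S_c is centered
    return beta_hat, m_hat
```

Taking `Omega = np.eye(n)` yields the usual estimators of Section 3.1.1 below, while `Omega = np.linalg.inv(Psi)` yields the modified estimators of Section 3.1.2.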
In the rest of this section, we discuss several such specifications, together with the particular types of generic backfitting estimators they yield. Note that, if one wishes to estimate the unknowns $\beta^*$ and $m^*$ in the intercept-free model (2.5), one should carry out an unconstrained backfitting algorithm, using $X^*$ instead of $X$, and $S_h$ instead of $S_h^c$ in (3.2)-(3.3).

3.1.1 Usual Generic Backfitting Estimators

The usual generic backfitting estimators are obtained from (3.2)-(3.3) by taking $\Omega = I$. Clearly, these estimators are defined by ignoring the correlation structure of the model errors.

In this thesis, we consider a particular type of usual backfitting estimators, obtained by taking $S_h$ to be a local linear smoother matrix, whose formal definition will be provided shortly. We refer to these estimators as local linear backfitting estimators and denote them by $\hat{\beta}_{I, S_h^c}$ and $\hat{m}_{I, S_h^c}$. These estimators were introduced by Opsomer and Ruppert (1999) in the context of partially linear models with uncorrelated errors and discussed in Section 1.1.1.

Taking $S_h$ to be a local linear smoother matrix is motivated by the fact that local linear smoothing has been shown by Fan and Gijbels (1992) and Fan (1993) to be an effective smoothing method in nonparametric regression. It has the advantage of achieving full asymptotic minimax efficiency and automatically correcting for boundary bias. For more information on local linear smoothing, the reader is referred to Fan and Gijbels (1996).

We define the $(i,j)$-th element of $S_h$ as:

S_{ij} = \frac{w_j^{(i)}}{\sum_{k=1}^n w_k^{(i)}},    (3.6)

with local weights $w_k^{(i)}$, $k = 1, \ldots, n$, given by:

w_k^{(i)} = K\left( \frac{Z_i - Z_k}{h} \right) \left[ S_{n,2}(Z_i) - (Z_i - Z_k) S_{n,1}(Z_i) \right].    (3.7)

Here:

S_{n,l}(Z) = \sum_{j=1}^n K\left( \frac{Z - Z_j}{h} \right) (Z - Z_j)^l, \quad l = 1, 2,    (3.8)

where $Z \in [0,1]$, $h$ is the half-width of the smoothing window and $K$ is a kernel function specified by the user. One possible choice of $K$, which will be used later in this thesis, is the so-called Epanechnikov kernel:

K(u) = \begin{cases} \frac{3}{4}(1 - u^2), & \text{if } |u| \leq 1; \\ 0, & \text{else.} \end{cases}    (3.9)

3.1.2 Modified Generic Backfitting Estimators

The modified generic backfitting estimators are feasible when the error correlation matrix $\Psi$ is fully known. These estimators are obtained from (3.2)-(3.3) by taking $\Omega = \Psi^{-1}$. Unlike the usual generic backfitting estimators, which ignore the correlation structure of the model errors, the modified generic backfitting estimators account for this correlation structure and thus would be expected to be more efficient.

In this thesis, we consider a particular case of modified generic backfitting estimators, obtained by taking $S_h$ to be the local linear smoother matrix whose $(i,j)$-th element is defined in (3.6)-(3.8). We refer to these estimators as modified local linear backfitting estimators and denote them by $\hat{\beta}_{\Psi^{-1}, S_h^c}$ and $\hat{m}_{\Psi^{-1}, S_h^c}$.

3.1.3 Estimated Modified Generic Backfitting Estimators

In practice, the error correlation matrix $\Psi$ is never fully known. More commonly, $\Psi$ is assumed to be known only up to a finite number of parameters, or assumed to be stationary, but otherwise left completely unspecified. In these situations, the modified generic backfitting estimators are no longer feasible. However, these estimators can be adjusted to become feasible by simply replacing $\Omega = \Psi^{-1}$ with $\Omega = \hat{\Psi}^{-1}$, where $\hat{\Psi}$ is an estimator of $\Psi$. We refer to these adjusted estimators as estimated modified generic backfitting estimators.

In this thesis, we consider a particular case of estimated modified generic backfitting estimators, obtained by taking $S_h$ to be the local linear smoother matrix whose
Later in this thesis, we compare their asymptotic properties of  /3(;_  ji'$- , 1  J  f  K  h  against those of  /3^-i  s  =, the modified local  linear backfitting estimator. We do not, however, compare the finite sample properties of these estimators, as neither estimator can be computed in practice. Indeed, both estimators depend on the true error correlation matrix, which is typically unknown in applications. The estimated modified Speckman estimators are feasible in those situations where the error correlation matrix is unknown but estimable. ^(i-sjr* ,S£ - 1  a  n  d  ™V-S£)r*"\sj-  A  n e x  P  l i c i t  expression for  obtained by substituting * instead of * into (3.11). 33  We denote these estimators by 3  ( 7  _  S  c  ) T  §-  1 )  S  e  can be  In the remainder of this thesis, we concentrate on the following estimators of 3, the parametric component in model (2.1): (i)  3 c,  (ii)  3^-i  IS  the local linear backfitting estimator;  h  | S  c,  (iii) 3 -i s  the modified local linear backfitting estimator; , the estimated modified local linear backfitting estimator.  * >°h  Opsomer and Ruppert (1999) studied the asymptotic behaviour of 3r c under the asS  sumption that the model errors are uncorrelated. However, the asymptotic behaviour of  Pi,s%>  / ^ * - \ S £  a n  d  3~-i  g  c  has not been studied under the assumption of error correla-  tion. In Chapter 4 of this thesis, we investigate the asymptotic behaviour of 3 ^ and IS  discuss conditions under which this estimator is v ro"- °nsistent. In Chapter 5, we obtain /  similar results for  /3^-i = s  c  for correctly specified \&. Rather than assuming * to have a  general form as in Chapter 4, we restrict it to have a parametric (autoregressive) structure in order to simplify the proofs of all results in Chapter 5. We also give conditions under which 3 -i s  is i/n-consistent.  34  Chapter 4 Asymptotic Properties of the Local Linear Backfitting Estimator (3j In this chapter, we investigate the large-sample behaviour of the local linear backfitting estimator  as the number of data points in the local linear smoothing window  3 c IiS  increases and the window size decreases at a specified rate. Recall that an explicit expression for  /3  J ) S  =  can be obtained from (3.4) by taking Q = I and replacing  with  the centered local linear smoother S° : h  0 c IiS  = (X (I-S'i )X)- X (I-Sl)Y. T  l  l  T  (4.1)  Throughout this chapter, we assume that the errors associated with model (2.1) are a realization from a zero mean, covariance-stationary stochastic process satisfying condition (Al) of Section 2.2. We also assume that the non-linear variable in the model is a fixed design variable following a smooth design density /(•) (condition(A3), Section 2.2) and having a smooth effect m(-) on the mean response (condition (A4), Section 2.2). Finally, we allow the linear variables in the model to be mutually correlated and assume they are related with the non-linear variable via a non-parametric regression relationship (condition (AO), Section 2.2). In Sections 4.1 and 4.2, we provide asymptotic expressions for the exact conditional bias 35  and variance of 3  given X, Z. In Section 4.3, we provide an asymptotic expression  c,  JiS  for an exact conditional quadratic loss criterion that measures the accuracy of 3 ^ as IS  an estimator of 3. In Section 4.4, we discuss the circumstances under which the \fnconsistency of  can be achieved given X and Z. 
In particular, we show that one  3 c IiS  must 'undersmooth' mj,s=, the estimated non-parametric component, to ensure that is -^/n-consistent given X and Z.  3 c^ IS  The results in Sections 4.1-4.4 focus on the  local linear backfitting estimator 3 ^. In Section 4.5, we indicate how these results JS  can be generalized to local polynomials of higher degree. The chapter concludes with an Appendix containing several auxiliary results. Throughout this chapter, we let Gi denote the i  th  column of the matrix G defined in  (2.14), and 77, denote the 7 column of the matrix 77 defined in (2.10). We also let Bij,s th  c h  denote the z component of Pi s th  c  t  h  Exact Conditional Bias of f^i,s  4.1  c h  given X and Z  The modelling flexibility of the partially linear model (2.1) comes at a price. On one hand, the presence of the nonparametric term m in this model safeguards against model misspecification bias in the estimated relationships between the linear variables  Xi,...,  X  p  and the response. On the other hand, allowing m to enter the model causes the usual backfitting estimator expression of  3 c^ IS  to suffer from finite sample bias. Indeed, using the explicit  in (4.1), together with the model formulation in (2.1), we easily see  3 c IS  the conditional bias of Pi sp given X,Z,  to be:  t  E0 \X,Z)-a= IiS%  {X (I-S )X)- X (I-St)m, T  c  1  h  T  (4.2)  an expression which generally does not equal zero. Theorem 4.1.1 below provides an asymptotic expression for the exact conditional bias of  3 c ItS  given X and Z. As we already mentioned, this expression is obtained by 36  assuming that the amount of smoothing h required for computing the estimator /3 c is 7S  deterministic and satisfies conditions (2.12) and (2.13). Theorem  4.1.1 Let V and W be defined as in equations (2.20) - (2.21). Under as-  sumptions (AO), (Al) and (A3) - (A5), if n —» oo, h —» 0 and nh —> oo, the conditional 3  bias of the usual backfitting estimator E0 o \X, IiS  Comment  h  of (3, given X and Z, is:  /3 c IS  h  Z)-(3 = -h - V^W + o (h ). 2  (4.3)  2  P  4.1.1 From equation (4.2) above, one can see that the exact conditional bias  offii s%igiven X and Z, does not depend upon the error correlation matrix t  Hence,  it is not surprising that the leading term in (4.3) is unaffected by the possible correlation of the model errors. Proof of Theorem  4.1.1:  Let:  where the dependence of B j upon h is omitted for convenience. We will see below n  that when n —> oo, h —> 0 and nh —> oo, B j converges in probability to the quantity 3  n  V defined in equation (2.20) . Since V is non-singular by Lemma 4.6.11, the explicit expression for /3/,s= in (4.1) holds on a set whose measure goes to 1 as n —> oo, h —> 0 and nh —> oo. We can use this expression to write: 3  ^  =  j  B  ^'{r7TI  X  T  (  J  "^  which holds on a set whose measure goes t o 1 as m  )  y  }'  ( 4  '  4 )  oo, /i ->0 and nh —> co. Taking 3  conditional expectation in both sides of (4.4) and subtracting /3 yields: E(f3 \X, ItSl  Z)-(3 = B~\ • {^f[X (I T  37  ~ S )m} c  h  (4.5)  converges in probability to V as n -> oo, /i ->0 and nh —> oo,  We now show that B j  3  n  that is: B  = V + o {l).  
niI  By equation (2.16), X — G + rj, so B j  can be decomposed as:  n  TI  ~t~  (4.6)  P  J-  71  *T* 1  Using S£ = (7 — 11 /n)Sh (equation (3.1) with  = Sh), we re-write the first term,  T  expand the last term and re-arrange to obtain: B  -  = ^ T T ) °  -  T  l  l  T  G +  ^ T "  "  T  ^ l  +  G  T  « ~  ^  S  G  ^ °^  <4-7>  Ts  To establish (4.6), it suffices to show that • GUG n(n + 1) 1  T  T  -L-r r T  1  1  = f g(z)f(z)dz • f g(z) f(z)dz + o(l), J J  (4.8)  =^  (4.9)  T  0  0  + o (l), P  whereas the remaining terms are O p ( l ) . First consider G ll G/n(n T  T  + 1). Set Z = 0, Z \ 0  38  n+  = 1 and use (A3), the design  condition on the Z^s, to get: n+l "+  Zi fZi  1  fl  / 9j(z)f(z)dz Jo  i  = E  / '  i=l  f  = E/ =  9j(z)f(z)dz JZi-  l  X  fcW-ft(^)]/W^  +E  i-\  "+  /-Zj  1  ^ ^ - i  n+l  n  „Zi  ~  J z i  for j = 0,..., p fixed.  3  biW-9;(Z.)]/W^+—rEft(^)  =E / 1=1  9 iZi)f(z)dz  "+1  1  =E / i=i  / ' /2t-l  t=l  JZ  i=i  + n+l  1  J j+i  Re-arranging and using the design condition (A3) and the  Lipschitz-continuity of gj(-) (consequence of (A0)-(i)) yields: 1  r  1  TG  I  I  ?±1 rz>  n.(.\  1  fj  Jj+i  n+l  < 1^(1)1 ~ n +I  (z)f(z)dz  =  9j  0  +E  \9i(*)  f^  + E  9j(Z )\f(z)dz  ~  i  r  =O ( - L ^  for any j = 0,... ,p, so: = f g(z)f(z)dz + o(l)  \-G l + n+ T  (4.10)  Jo  1  and (4.8) follows. Next consider r/ ri/(n + 1). Fix i,j = 1,... ,p, and use (AO)-(ii), which specifies the T  distributional assumptions on the rows of rj, to get: 1  Ln + 1  1  JT  V V  i+ij+i  "  n+ _, , K=l  in probability. Since [T7 rj/(n + l ) ] i i i = 0 whenever i = 0 and j = 0,...,p or T  +  J +  i = 1,... ,p and j = 0, (4.9) follows. It remains to show that all the other terms in (4.7) are op{l). It suffices to show that G f ( I - S )G /(n +1  h  j+1  + 1), Gf ll {S T  +1  h  - I)G /n(n j+1  39  + 1), Gf (I +1  - S ) /(n h  Vj+1  + 1),  r]J {I - S )G i/(n  + 1) and vf+iS Vj+i/(n + 1) are o {l) for any i,j = 0,1,... ,p.  c  +1  h  c  j+  h  P  These facts follow from lemmas appearing in the Appendix of this chapter.  — Sh)Gj \/(n  Let i, j — 0,1,... ,p be fixed and consider Gj (I +1  + 1). By result (4.58)  +  of Lemma 4.6.9 with r* = G i, fl = I and r = Gj+i, this quantity is 0(h ), so 2  i+  G f ( I - S )G /{n +1  h  + 1) is o(l). Similarly, by result (4.59) of Lemma 4.6.9 with  j+l  r* = G i, fl = I and r = G , i+  j+1  Gf ll (I  - S )G /(n(n  T  +l  h  Gj 11 (I T  +1  - S )G /(n{n h  + 1)) is Q(h ). Thus, 2  j+1  + 1)) is o(l).  j+1  Next consider Gf (7 - 5^)T7 /(n + 1). When j = 0, this is 0. For j = 1,... ,p, by +1  j+1  result (4.60) of Lemma 4.6.9 with r* = G \, i+  fl = I and £ = rjj , this quantity is +1  C M n - / / ! - / ) = o ( l ) . Similarly, when i - 0, rfi {I - S )G /{n 1  2  1  2  + 1) = 0. For i =  c  P  +l  h  j+1  1,... ,0, result (4.61) of Lemma 4.6.9 establishes that r)f (I - S )G /{n  +1) = o ( l ) .  c  +1  Finally, consider vI+i^h lj+i/( T  n  h  j+1  P  + !)• When z = 0 or j = 0, this is 0. By result  (4.62) of Lemma 4.6.9 with £* = rj , fl = I and £ = rjj+i, lT+iSh lj+i/( r  r  ri  i+l  0 (n-^ h-^ ) 2  2  P  + 1) is  = o {l) for i,j = l,...,p. P  Combining these results, we conclude that B  = £<°>+ f g{z)f(z)dz- f g(z) f(z)dz + o (l) = V + o (l). T  n i /  Jo  P  Jo  P  But V is non-singular by Lemma 4.6.11, so B$ = V - + ( 1 ) .  (4.11)  1  0P  To establish (4.3), by (4.5) it now suffices to show that: ^  T  X  T  ( J - S )m = -h W c  2  h  + o (h ). 
2  P  (4.12)  This equality is established below with the help of lemmas stated in the Appendix of this chapter.  40  By equation (2.16), X = G + rj, so X (I  — S )m/(n  T  c  h  + 1) can be decomposed as:  Using the identifiability condition on m(-) in (2.4) and the fact that S% = (I — 11 /n)Sh T  we obtain: -±-X {I T  - s%)m = 4r ( Gr  n +l  J  -  n+l + ^V (I-S%)m.  + -TTTn n(n + 1)  G r i l T  (  f l f f c  -  J  >  m  (4.13)  T  By results (4.66) and (4.67) of Lemma 4.6.10, we obtain G {I - S )m/(n  + 1) =  -h (v (K)/2)  + 1) =  T  h  fi g(z)m"(z)f(z)dz  2  2  h (v (K)/2) 2  2  £* = r j o (h ). 2  P  + o (h ) 2  P  as well as G ll (S T  - I)m/n(n  T  h  Si g(z)f(z)dz-Si m"(z)f(z)dz + o (h ). 2  P  Result (4.61) of Lemma 4.6.9 with  ft = I and r = m establishes that rjf (I - S )m/(n  + 1) = 0 {n~ h )  c  i+1)  +1  1/2  h  =  2  P  Note that result (4.61) of Lemma 4.6.9 holds trivially when £* = r^, as r) = 0 1  by definition. Thus, (4.12) holds. This, combined with (4.5) and (4.11) completes the proof of Theorem 4.1.1. To better understand the effect of the correlation between the linear and non-linear variables in the model on the asymptotic conditional bias of Pi s > c  :  h  w  e  provide an alternative  expression for this bias.  Corollary 4.1.1 Let Z be a random variable with density function /(•) as in assumption (A3). Let X\,... ,X  p  be random variables related to Z as: X  J = 9j(Z)+Vj,  3 = 1, • • • ,P,  where the gj(-) 's are smooth functions as in assumption (A0)-(1) and the r)i's are random variables satisfying E(r)j\Z) = 0, Var(r)j\Z) = S^-, Cov(r)j,r)j<) = E^y, j ^ j', with E =  41  (Ejy) as in assumption (AO)-(ii). Also, let m(-) be a smooth satisfying assumption (A4) and denote its second derivative m"(-). Set X — (Xi,...  ,X ) .  Under the assumptions  T  P  in Theorem 4-1-1, our previous bias expression can be re-written in terms of X and Z as: E0  \X,Z)-po  = \ E(X\Z) Var(X\Z)- Cov(X,m''(Z)) h {K)  OtItS%  T  + o (h )  1  2  P  (4.14) and h u (K) 2  E  2  x,z  :  Var{X\Z)- Cov{X,m"(Z))  +  l  o (h ). 2  P  V (4.15)  Proof: Let a — (Jg gi(z)f(z)dz,...,  g (z)f(z)dz)  1  W = (0, Wl) , T  \W2\i  and let W be denned as in (2.21). Set  T  p  with:  =  f  (z)m"(z)f(z)dz  jf  -  9j  •J  (z)f(z)dz  1  1  9j  for j = 1,... ,p. Substitute the explicit expression for V  - 1  m"{z)f(z)dz,  (result (4.68), Lemma 4.6.11)  into (4.3) to obtain:  l +a E a j r  E(3 AX,Z)-3 IiS  =  -h  2  _ 1  -E a -a zZ~ W T  f  |  _ 1  1  2  E W  2  +  o (/i ) 2  P  " +  _ 1  0  Op(/i ), 2  ,  with S as in assumption (AO)-(ii). Results (4.14) and (4.15) follow easily from the above by noting that a = E(X\Z),  £ = Var(X\Z) 42  and W  2  = Cov(X,  Z).  Result (4.15) in Corollary 4.1.1 shows that the effect of the correlation between the linear variables and the non-linear variable in the model on the asymptotic bias of the local linear backfitting estimator of the linear effects j3\,..., Q is through the variance-covariance p  matrix Var(X\Z)  and the covariances Cov(X,m"(Z)).  Note that the latter depends on  the curvature of the smooth non-linear effect m(-) through its second derivative m"(-). Therefore, the leading term in the bias of Pi,i,s disappears when there is no correlation c  h  between the corresponding linear and non-linear terms in the model, that is when the correlation between gi(Z) and m"(Z) is zero. In particular, the leading term disappears if m(-) is a line, or if #;(•) = Q for some constant c,. 
Opsomer and Ruppert (1999, Theorem 1) obtained a related bias result for the local linear backfitting estimator of the linear effects $\beta_1, \ldots, \beta_p$ in a partially linear model with independent, identically distributed errors. These authors derived their result under a different set of assumptions than ours. Specifically, they assumed the design points $Z_i$, $i = 1, \ldots, n$, to be random instead of fixed. Furthermore, they did not require that the covariate values $X_{ij}$ and the design points $Z_i$ be related via the nonparametric regression model (2.7). However, they assumed the linear covariates to have mean zero. Finally, they allowed $h$ to converge to zero at a rate slower than ours by assuming $nh \to \infty$ instead of condition (2.13) ($nh^3 \to \infty$).

The asymptotic bias expression derived by Opsomer and Ruppert is

-\frac{h^2 \nu_2(K)}{2} \{ E(Var(X|Z)) \}^{-1} Cov(X, m''(Z)) + o_P(h^2).

The leading term in this expression is a slight modification of our first term in (4.15), which accounts for the randomness of the $Z_i$'s. The rate of the error associated with Opsomer and Ruppert's asymptotic bias approximation is $o_P(h^2)$ and is of the same order as that associated with the bias approximation in (4.15).

4.2 Exact Conditional Variance of $\hat{\beta}_{I, S_h^c}$ Given $X$ and $Z$

In this section, we derive an asymptotic expression for the exact conditional variance $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ of the usual backfitting estimator $\hat{\beta}_{I, S_h^c}$ of $\beta$ given $X$ and $Z$. But first, we obtain an explicit expression for this exact conditional variance. Using the expression for $\hat{\beta}_{I, S_h^c}$ in (4.1) together with the fact that

Var(Y | X, Z) = \sigma_\epsilon^2 \Psi    (4.16)

from condition (A1), we get:

Var(\hat{\beta}_{I, S_h^c} | X, Z) = \sigma_\epsilon^2 (X^T (I - S_h^c) X)^{-1} \cdot X^T (I - S_h^c) \Psi (I - S_h^c)^T X \cdot (X^T (I - S_h^c)^T X)^{-1}.    (4.17)

The next result provides an asymptotic expression for this variance.

Theorem 4.2.1 Let $G$, $V$ and $S_h^c$ be defined as in equations (2.14), (2.20) and (3.1) and let $I$ be the $n \times n$ identity matrix. Under conditions (A0), (A1) and (A3)-(A5), if $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$,

Var(\hat{\beta}_{I, S_h^c} | X, Z) = \frac{\sigma_\epsilon^2}{n+1} V^{-1} \Phi^{(0)} V^{-1} + \frac{\sigma_\epsilon^2}{(n+1)^2} V^{-1} G^T (I - S_h^c) \Psi (I - S_h^c)^T G\, V^{-1} + o_P\left( \frac{1}{n} \right),    (4.18)

where $\Phi^{(0)}$ is defined in equation (2.9) and $\Psi$ is the error correlation matrix.

Comment 4.2.1 From equation (4.17), $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ depends upon the error correlation matrix $\Psi$, so we expect the asymptotic approximation of $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ to also depend upon the correlation structure of the model errors. Indeed, result (4.18) of Theorem 4.2.1 shows that, for large samples, the first term in the asymptotic expression of $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ depends on $\Psi$ indirectly via the limiting value $\Phi^{(0)}$ of $\eta^T \Psi \eta / (n+1)$, while the second term depends on $\Psi$ directly.

Comment 4.2.2 By Lemma 4.6.12, the second term in (4.18) is at most $O(1/n)$. Therefore, $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ has a rate of convergence of $1/n$.

Proof of Theorem 4.2.1:

From (4.6), $B_{n,I} = X^T (I - S_h^c) X / (n+1) = V + o_P(1)$, so $Var(\hat{\beta}_{I, S_h^c} | X, Z)$ in (4.17) can be written as:

Var(\hat{\beta}_{I, S_h^c} | X, Z) = \frac{\sigma_\epsilon^2}{n+1} B_{n,I}^{-1} \cdot C_{n,I} \cdot (B_{n,I}^T)^{-1},    (4.19)

where $C_{n,I} = X^T (I - S_h^c) \Psi (I - S_h^c)^T X / (n+1)$. The dependence of $C_{n,I}$ upon $h$ is omitted for convenience.
To establish (4.18), it suffices to show that $C_{n,I}$ satisfies:

C_{n,I} = \Phi^{(0)} + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T G + o_P(1).    (4.20)

Using $X = G + \eta$ (equation (2.16)), $C_{n,I}$ can be decomposed as:

C_{n,I} = \frac{1}{n+1} \eta^T \Psi \eta + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T G + \frac{1}{n+1} G^T (I - S_h^c) \Psi (I - S_h^c)^T \eta
+ \frac{1}{n+1} \eta^T (I - S_h^c) \Psi (I - S_h^c)^T G - \frac{1}{n+1} \eta^T S_h^c \Psi \eta - \frac{1}{n+1} \eta^T \Psi (S_h^c)^T \eta + \frac{1}{n+1} \eta^T S_h^c \Psi (S_h^c)^T \eta.    (4.21)

The first term, $\eta^T \Psi \eta / (n+1)$, converges in probability to $\Phi^{(0)}$ by condition (A1)-(iii). We now show that all the other terms, except for the second, are $o_P(1)$. It suffices to show that $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1} / (n+1)$, $\eta_{i+1}^T \Psi (S_h^c)^T \eta_{j+1} / (n+1)$ and $\eta_{i+1}^T S_h^c \Psi (S_h^c)^T \eta_{j+1} / (n+1)$ are $o_P(1)$; these facts follow from lemmas appearing in the Appendix of this chapter.

First consider $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1} / (n+1)$. Using Lemma 4.6.4 with $\xi = \eta_{j+1}$ and $c = (I - S_h^c) \Psi (I - S_h^c)^T G_{i+1}$, as well as properties of vector and matrix norms from Section 2.4 of Chapter 2, we obtain:

\frac{1}{n+1} G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1} = \frac{1}{n+1} O_P\left( \| (I - S_h^c) \Psi (I - S_h^c)^T G_{i+1} \|_2 \right)
= \frac{1}{n+1} O_P\left( \| I - S_h^c \|_S \cdot \| \Psi \|_S \cdot \| (I - S_h^c)^T G_{i+1} \|_2 \right) = O_P(n^{-1/2} h^{-1/2}).

The last equality was derived by using that $\| S_h^c \|_S$ is $O(h^{-1/2})$ by result (4.54) of Lemma 4.6.7, that $\| \Psi \|_S$ is $O(1)$ by assumption (A1)-(ii), and that $\| (I - S_h^c)^T G_{i+1} \|_2 = O(n^{1/2})$ by result (4.53) of Lemma 4.6.7 with $r = G_{i+1}$. We conclude that $G_{i+1}^T (I - S_h^c) \Psi (I - S_h^c)^T \eta_{j+1} / (n+1)$ is $o_P(1)$. Note that Lemma 4.6.4 invoked earlier holds trivially for $\xi = \eta_1$, as $\eta_1 = 0$ by definition.

Next consider $\eta_{i+1}^T \Psi (S_h^c)^T \eta_{j+1} / (n+1)$ and $\eta_{i+1}^T S_h^c \Psi (S_h^c)^T \eta_{j+1} / (n+1)$. When $i = 0$ or $j = 0$, these quantities are 0, so consider $i, j = 1, \ldots, p$. By result (4.63) of Lemma 4.6.9 with $\xi^* = \eta_{i+1}$, $\Omega = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T \Psi (S_h^c)^T \eta_{j+1} / (n+1)$ is $O_P(n^{-1/2} h^{-1/2}) = o_P(1)$. By result (4.64) of the same lemma with $\xi^* = \eta_{i+1}$, $\Omega = I$, $\Omega^* = \Psi$ and $\xi = \eta_{j+1}$, $\eta_{i+1}^T S_h^c \Psi (S_h^c)^T \eta_{j+1} / (n+1)$ is $O_P(n^{-1} h^{-1/2}) = o_P(1)$.

Combining these results in (4.21) yields (4.20). This concludes our proof of Theorem 4.2.1.

We now provide an alternative expression for the asymptotic conditional variance of $\hat{\beta}_{I, S_h^c}$ which will shed more light on the effect of the correlation between the linear and non-linear variables in model (2.1) on this variance.
1  -VarWZy^VariXlZ)-  1  n+ [G  - 2E{X\Z)G  22  + G E{X\Z)E{X\Z) } T  l2  n  Var{X\Z)-  1  (4.25)  Proof: Let a = (Jg g (z)f(z)dz,..., 1  1  J g (z)f(z)dz) l  Q  T  p  be as i n L e m m a 4.6.11 and S =  be the variance-covariance m a t r i x introduced i n condition (AO)-(ii).  47  (Ey)  Substituting the  explicit expression for V  1  (result (4.68), Lemma 4.6.11) into (4.18) yields:  Var(p JX,Z)l!S  v  v  21  where V n is a scalar, V  = V  12  V  n  = 4 r n+l  f l r S  1 $  ) s  l a  P  n  is a p x p matrix given by:  22  " i' "  o \-  22  is a 1 x p vector and V  21  1  +  + r (n+l) ^ W ^ 2  1  +a^a)  2  - 2G E" a 1  1 2  - 2a S- aG S- a + c^S^G^E^a}, r  1  (4.26)  1  1 2  Vl = - ^ r S - ^ f ' E - ' a + _ ^ _ { _ (77. +  71+1  ( l + o S- a)S- a r  G l l  1  1  1)  + E - ' o G u S ^ o + (1 + a S - a ) S - G f - E ^ G ^ E ^ a } T  1  (4.27)  1  2  and V  2  2 2  = -^-S-^^En+ l  +  1  2  ' ' ^ 2- ' { G B  (n + l )  - 2aG  + Gnaa^S" .  (4.28)  1  12  Results (4.24) and (4.25) follow from (4.26) and (4.28), respectively, since a = E(X\Z) and E =  Var(X\Z).  Result (4.25) of Corollary 4.2.1 shows that the effect of the correlation between the linear variables and the non-linear variable in model (2.1) on the asymptotic variances of the local linear backfitting estimator of the linear effects j3\,...,P  p  is through the  conditional variance-covariance matrix Var(X\Z), the conditional mean vector E(X\Z) and the matrices G n , G i , G 2  22  in (4.22).  Comment 4.2.3 In the case * = I, rj *r)/(n + 1) = rj rj/(n + 1) = E T  result (4.9), with E  ( 0 )  T  as in (2.15). Therefore, $ ° = E = Var(X\Z). }  2  48  ( 0 )  + o (l) by P  If we also assume,  as Opsomer and Ruppert (1999) do, that E(X\Z)  = 0, then (4.25) becomes:  \  x,z  Var  +  -l  Var(X\Z)  n +1  + o (-)  ; Var(X\Z)- G Var(X\Z)-  (J f  1  u2  1  ,  P  22  (4.29)  Recall that these authors also used different conditions on the rate of convergence of the smoothing parameter h and the design points Zi, i — 1,..., n. Namely, they allowed h to converge to zero at a rate slower than ours by assuming nh —> oo instead of nh —> co, 3  and they assumed the design points Z i — 1,..., n, to be random instead of fixed. it  The asymptotic variance expression derived by Opsomer and Ruppert (1999, Theorem 1) is (of/n) • {E{Var{X\Z))}-  1  + Op(h /n 2  expression is (of/n) • {E(V' ar(X\Z))}~ , 1  + l/(n h)). 2  The leading term in this variance  a slight modification of our first term in (4.29)  which accounts for the randomness of the 2Vs. The rate of the error associated with their asymptotic variance approximation is o (h /n+1 2  P  /'(n h)) and is possibly of smaller 2  order than the second term in (4.29), known to be at most 0 (l/n) P  by result (4.69) of  Lemma 4.6.12 (Appendix, Chapter 4) with \t = I.  4.3  Exact Conditional Measure of Accuracy of f^i,s  c h  given X and Z Because Pi^i ^ generally a biased estimator of 3 for finite samples, any suitable criterion s  for measuring the accuracy of this estimator should take into account both bias and  49  variance. A natural way to take both effects into account is to consider E  (||3/,sj - P\\l\X, Z)  = {E@ \X,  Z)  ItS%  + trace  (3r]  T  -  {E(f3j^\X,  {Var(/3 cjX,Z)}  Z)  -  fs) (4.30)  .  IS  Using the above equality, which follows from (2.24), and the asymptotic expressions for and  Z)-0  E(J3 c\X, I>S  Var(f3 cjX, ItS  in Theorems 4.1.1 and 4.2.1, we obtain:  Z)  Assume that the conditions in Theorem 4-1.1 and Theorem 4-2.1 hold.  
Corollary 4.3.1 Then: E  (Wh.si  -  + J^TW  0\\l\X, Z)=h i  { y ^ i  trace  W V~ W T  ~ Sh)*(I  1  2  +^trace { V  ~ SlfGV- }  * ^ "  1  }  + o (h ) + op ( i ) .  1  (4.31)  4  P  The i/n-consistency of Pi,s  4.4  1  c h  For obvious reasons, we would like the estimator Pi s to have the 'usual' parametric c  t  h  rate of convergence of 1/n - the rate that would be achieved if ra were known, given X and  Z .  If/3 c 7 S  has this rate of convergence, we say that it is v^-consistent. A sufficient  condition for /3/,s= to be y^-consistent given X and Z is for E(\\f3j c — ^|||j-X", Z) to S  be  Op(n~ ). l  By result (4.31) in Corollary 4.3.1, £(||3 e - P\\\\X, Z) is 0 (h ) + O (rT ). This 4  IiS  P  1  P  result is due to the fact that the conditional bias of Pi si is &p(h )> while its conditional 2  t  variance is GpirC ). 1  For E(\\P -(3\\l\X,  Z) to be Op{rT ), we require / i = © ( r r ) , l  ISI  4  1  as well as h —> 0 and nh —•> oo. 3  To understand the meaning of the above conditions, let us consider that h = n~ . For a  h —* 0, we require a > 0. Also, for nh —> oo, we require 1 — 3a > 0. Finally, we want 3  50  h — n~ A  ia  = 0(n  1  ), so a > 1/4. Thus, we require a G [1/4,1/3). In summary, P s  c  It  h  achieves ^/n-consistency for h = n~ , with a G [1/4,1/3). a  We argue that Pi s% computed with an h optimal for estimating m is consistent, but not t  -y/n-consistent, given X and Z. We argue this by finding the amount of smoothing h that is optimal for estimating m(Z) via the local linear backfitting estimator  (Z)  where, for Z G [0,1] fixed, ^si{Z)  =  _  E i=l n  '  ( W  *  (4.32)  (Z)].  (4.33)  i  and wf  =K  ^)  [«5„ (Z) - (Z - ^ ) 5 l2  n>1  Here, S i(Z), I = 1, 2, is as in (3.8), /Y is a kernel function satisfying condition (A5) and n>  the Zj's are design points satisfying condition (A3). We define the optimal h for estimating m(Z) via mi s^(Z) as: :  ^AMSE = argmin AMSE with AMSE  Z) ,  I:S  Z) being an asymptotic approximation to the exact condi-  (rhi s (Z)\X, c  t  (fh i(Z)\X,  h  tional mean squared error of fhi s {Z) given X and Z: c  t  MSE  h  Z) = £ {(ro = (Z) - m(Z))  (rh c (Z)\X, I:S  /iS  h  To find the order of AMSE  {fhi s (Z)\X,  |x, z} .  Z), and hence HAMSE, note that:  c  :  2  h  M S £ (m,,^ (Z)|X, Z) - {E (m (Z)\X,  Z) - m(Z)}  + Var (m (Z)\X,  2  ItS%  IiSfi  Z) .  By results (4.73) and (4.74) of Lemma 4.6.13, the first term is Op(h ) and the sec4  ond term is 0 {l/(nh)), P  (fhi si(Z)\X,  AMSE  t  so MSE (m/, = (Z)\X, Z) is 0 ( / i + l/{nh)). 4  s  Z) is Op(h +l/(nh)), A  P  and the /i that minimizes it satisfies KAMSE =  ©(n" / ). 1  Therefore,  5  51  For h — HAMSE, the estimator 3 ^ has conditional bias of order Op(n~ / ) and condi2 5  IS  tional variance of order Op(n~ ). Thus, x  3 c IS  i consistent but not v^-consistent given s  X and Z, as its squared conditional bias asymptotically dominates its conditional variance. However, for h = n~ , a a  e [1/4,1/3), the squared  conditional bias of fli s  c  t  w  mn  o  h  longer dominate its conditional variance asymptotically, ensuring that 3j c achieves \fnS  consistency given X and Z. Note that the estimator m j ^ ^ Z ) of m(Z) computed with h = n~ ,a G [1/4,1/3), is 'undersmoothed' relative to that computed with h = HAMSE, a  since n~ < n" ^ . 
a  4.5  1  5  Generalization to Local Polynomials of Higher Degree  The asymptotic results in this chapter focus on the local linear backfitting estimator 0i,s%- A natural question that arises is whether these results generalize to the local  polynomial backfitting estimator of 3. The latter estimator is obtained from (4.1) by replacing S , the smoother matrix for locally linear regression, with the smoother matrix c  h  for locally polynomial regression of degree D > 1. See Chapter 3 in Fan and Gijbels (1996) for a definition of locally polynomial regression. Recall that 3 ^ has conditional bias of order Op(h ) and conditional variance of order 2  IS  C?p(n ) by Theorems 4.1.1 and 4.2.1. In keeping with the locally polynomial regres_1  sion literature, we conjecture that the local polynomial backfitting estimator of 3 has conditional bias of order Op(h ) D+l  and conditional variance of order  Note  that we may need boundary corrections if D is even. If our conjecture holds, we see that the conditional variance of the local polynomial backfitting estimator of 3 is of the same order as that of 3  c.  IS  However, the conditional bias of the local polynomial backfitting  estimator of 3 is of smaller order than that of In Section 4.4 we established that  3 c IS  3 c. IS  is y^-consistent given X and Z provided h 52  converges to zero at rate n~ , a G [1/4,1/3). To ensure that the local polynomial a  backfitting estimator of 3 is i/^-consistent given X and Z, we conjecture that h should converge to zero at rate n~ , a E [1/(2D + 2), 1/3). a  4.6  Appendix  Throughout this Appendix, the assumptions and notation introduced in Chapter 2 of this thesis hold, unless otherwise specified. The first result provides an asymptotic bias expression that will be useful for proving subsequent results.  Lemma 4.6.1 Let Sh = (5y) be the uncentered smoother matrix defined by equations (3.6)-(3.8) and S = (I-ll /n)S .  Letr = (r(Z ),...,  T  c  h  h  where r(-) : [0,1] -> R  r(Z )) , T  x  n  is a smooth function having three continuous derivatives and the Zi's are fixed design points satisfying condition (A3). Furthermore, let K be a kernel function satisfying condition (A5) whose moments vi(K,z,h),z  € [0,1], I — 0,1,2,3, are defined as in (2.17). element of the vector (Sh — I)r  If n —> oo, h —> 0 and nh? —> oo, then the j  th  can be  approximated as:  l(S - 1)^. = B (K, Z h)-h  + o(h )  2  h  r  (4.34)  2  jt  uniformly in Zj,j = 1,..., n, where B(K,h\- " r  {  z  ^(K,z,h) -^(K,z,h)u (K,z,h)  )  2  3  = — — ——— — 2 — —, z€[0,1. (4.35) 2 V2(K,z,h)vo(K,z,h)-v{(K,z,h) Furthermore, ifr l = 0, then the j element of the vector (S — I)r can be approximated B {K,z,h) r  T  th  c  h  as: [(S - 7)r]. = B (K, Zj, c  h  r  h)-h -(^J2 2  53  r( >  B  K  J>  Z  • h + o(h ). 2  2  (A. 36)  Proof: For i = 1,... ,n, let yi — r(Zi) + e , with the e;'s independent, identically distributed t  random variables with mean 0 and standard deviation a e (0, co). Set y = (yi,..., e  y); T  n  if r(Zj) = [ShVJj is the local linear estimator of r(Zj) obtained by smoothing y on Z\,..., Z via the local linear smoother matrix Sh, then Bias(r(Zj)) n  = [(Sh — I) ]j-  Standard results on the asymptotic bias of a local linear estimator yield that  r  Bias(r(Zj))  is of order h , with asymptotic constant B (K, Zj,h), uniformly in Zj,j = 1,... , n (Fan 2  r  and Gijbels, 1993). So the proof of (4.34) is complete. 
The definition of S and r l c  T  h  [{si  -  = 0 allow us to write:  J H =  I - ^ ) S n  = [(s -iH-  h  - I li  1  Sr h  n  fc  11  J  =  [(Sh-I)rV-  n  (Sh-I)r  Substituting (4.34) in the above result yields (4.36).  The next result establishes the boundedness of a function defined in terms of certain moments of a kernel function K(-). Subsequent results rely on this lemma.  Lemma 4.6.2 Let K(-) be a kernel function satisfying condition (A5) and whose mo-  ments vi(K, z, h), z 6 [0,1], I = 0,1, 2, 3, are defined as in (2.17). Then, for ho € [0,1/2] small enough and I = 1,2, 3, we have: vi(K,z,h) sup sup i/ (if, z, h)v (K, z, h) - v\(K, z, h) he[o,ho] ze[o,i]  2  2  0  Proof: 54  < oo.  (4.37)  For z G [0,1], we define the function: v (K,z,h) t  v (K, z, h)u (K, z, h) - i>i(K, z, h)  2  2  (4.38)  0  To establish the desired result, it suffices to show that, for any I — 1,2,3, this function is bounded when restricted to the intervals [h, 1 — h], [0, h] a n d [1 — h, 1], where h < ho for some h G [0,1/2] small enough, a n d that the three bounds do not depend on h. 0  Let / = 1, 2 , 3 be fixed and let h < h for some ho G [0,1/2] small enough. T h e restriction 0  of the function i n (4.38) to the interval [h, 1—h] is t r i v i a l l y bounded, as ui(K, z, h) = vi{K) for any z G [h, 1 — h]. Clearly, the bound of this restriction does not depend on h. T o show that the restriction of this function to the interval [0, h] is also bounded, let us note that, i f z G [0,1], there exists a G [0,1] such that z = ah and so v {K,z,h)= l  r(l-z)/h / s K{s)ds J-z/h 1/h—a s K(s)ds l  /  l  •CX  = f s K(s)ds J —a l  = <M°0 since h < ho- Thus, when restricted to the interval [0,h], the function i n (4.38) is equivalent to: _^  <t>i( ) a  <l>i( ) a  =  (po(a)(t)2(a) - </>i(a)  2 _  D(a)  where a G [0,1]. To establish boundedness, it suffices to show that the nominator <f>i(a) is bounded from above while the denominator D(a) is bounded from below for any a G [0,1] and I = 1,2,3. To bound 4>i(a), note that: \M )\ a  ^ J  \s \K{s)ds l  55  since K(-) is a continuous function with compact support. To bound \D(-)\ from below, we show that D(-) is non-decreasing on [0,1] and satisfies D(0) > 0. As D'{a) = </>' (a) • «£ (a) + 0 (a) • $ ( a ) - 20!(a) • < / > ' » , 0  2  o  and da  =  f  s K(s)ds l  J—a  (-l) K(-a) l  = (-!)'*(<*) for any I = 0,1, 2 (using Leibnitz's Rule and the symmetry of K), we obtain: D ' ( a ) = K(a) ( f s K(s)ds + a \J — a 2  f  2  K(s)ds + 2a f  J —a  sK(s)ds)  J —a  . J  Since K is non-negative and symmetric about 0, each term above is non-negative and so D'(a) > 0, that is D(-) is non-decreasing on [0,1]. Further, with K*(s) the density K(s)/ /o K(s)ds = 2K{s), we obtain: D(0) = I K(s)ds • [ s K{s)ds f Jo Jo Uo 2  I s K*l s)ds Jo 2  sK(s)  (sK*(s)ds)  Thus, £>(0) = Var(D*)/4 > 0, with D* a random variable with density K*. Finally, note that the upper bound  \s \K(s)ds/D(0) l  of the function <j>i(a)/D(a), a G [0,1], does  not depend on h. A similar argument can be employed to establish that, when h < h , with ho G [0,1/2], 0  the restriction of the function defined in (4.38) to the interval [1 — h, 1] is bounded. Now, we use Lemma 4.6.1 and Lemma 4.6.2 to derive asymptotic expressions for the Euclidean norms of the biases which can occur when using locally linear regression to estimate a smooth, unknown function r(-).  56  L e m m a 4.6.3  Let r, Sh and S^ be as in Lemma 4-6.1. Then, if n —> oo, h —> 0 and  nb? —> oo:  n + 1 1  I-S )r\\l= -?^- j\"(z) f(z)dz-h V  2  i  + o{h ).  
(4.39)  i  h  7/ r also satisfies l r = 0, then: T  1 n + 1  1  ^(#)  (r-^)r||i  2  j\"{zff{z)dz-( j\"{z)f{z)d;  • / I + O(/J ). 4  K  4  (4.40)  Proof: To establish (4.39), use L e m m a 4.6.1 to get:  = -^ =  (  T  T  T  T  E[5r(Ar,^ /i).^  + o(/i )] 2  >  T  £  ^  2  )  '  ^  4  +  O  (  /  L  4  )  2  -  (  4  '  4  1  )  T h e last equality using the boundedness of B (K, z, h) for a l l z G [0,1] and h < h , w i t h r  0  /io G [0,1/2] small enough, which is a consequence of L e m m a 4.6.2 a n d the boundedness of r"(-). Now, we use B (K, z, h) = r"(z)v {K)/2 r  n+1  4-!  3  4(n + l ) f - f ' 3=1  3=1  V  J  '  4 n+ 1  K  +  v  E Zj£[h,l-h]  T h e first term can be shown to equal (v (K)/2) 2  integration argument.  for z G [h, 1 - h] to write:  2  jf^ , ' Zji\h,l-h]  V  j  ;  B (K,Zj,h) . 2  f  r  l Q  r"(z) f(z)dz 2  + o ( l ) by a R i e m a n n  T h e second term is o ( l ) , as the s u m contains 0(nh)  terms and  r"(z) is bounded for z £ [h, 1 — /i]. T h e t h i r d term is also o ( l ) , as the s u m contains 57  0(nh) terms that have been shown to be bounded for h small enough. Combining these results yields (4.39). To establish (4.40), we use the fact that S  c  = (I — ll /n)S  (equation (3.1)) and  T  h  h  l r — 0 to obtain: T  ^\\(I-Si)r\\l  J2[(Si-I)r] .  =  2  Substituting (4.36) in the above yields (4.40).  The following result provides a probability bound for a linear combination of independent, identically distributed random variables having zero mean and non-zero, finite variance.  Lemma 4.6.4 Let £ =  ... ,£„) be a vector whose components are independent and T  identically distributed real-valued random variables. If E(£i) = 0 and 0 < Var(£\) < oo, then: ?c = 0 (\\c\\ ) P  (4.42)  2  for any real-valued vector c — ( c i , . . .  Proof: By Chebychev's Theorem, we have: " ec = El I J2 ^kI \+OP c  fc=i  J  /  " , VarI <J2 ^k  VN  58  c  ^  T h e next lemma provides asymptotic approximations for the elements 5 y , i, j = 1 , . . . , n, of the local linear smoother m a t r i x Sh defined i n (3.6)-(3.8). These approximations are used to obtain uniform bounds for the elements of Sh-  L e m m a 4.6.5 (3.6)-(3.8).  Let Sij, i,j = l,...,n,  Also, let K(-)  be local linear smoothing weights defined as in  and v (K,z,h),  z G [0,1], I = 0 , 1 , 2 , as in Lemma 4.6.2.  t  Furthermore, let Z i = 1 , . . . ,n, be design points with density function /(•) satisfying i%  condition (A3). Then, if n —> co, h —» 0 and nh —> co, we have: 3  s  1 v (K,Zi,h)-^v (K,Zi,h) f(Zi)(n + l)h ' v {K, Z h)u {K, Z h) - (K, 2  l]  x  2  u  uniformly in Zi, i = 1,... ,n.  0  h  (Zi-Zj h  K  Z hf '  Vl  \  u  Furthermore, for all h < h , with ho G [0,1/2] small 0  enough, there exists a positive constant C so that: \ S « \ * J ^ - W i - Z i \ Z h ) uniformly in Z and Zj, i,j =  (4.44)  l,...,n.  t  Proof: U s i n g the definition of S^ i n (3.6)-(3.8) and the fact that ] T "  S ,i(Zi) , 2  n  = 1  wf  =  S , (Zi)S (Zi) n 2  nfi  we write:  (n + DhS- ~  ( + l)hS (Zj) S ,2(Zi)S fl(Zi) — S i(Zi) (n + l)h S (Zj)  (Zi - Zj \ h (Zj-Zj\  n  n:2  2  n  n  ni  2  ntl  s , {Zi)s o{Zi) - s {ZiY  K  n 2  n>  ntl  \  h  (Zi-  ){  Zj  h  )•  ( 4 4 5 )  Let I = 0 , 1 , 2, 3 be fixed. 
B y the definition of «?„,/(•) i n (3.8), the design condition (A3) on the Zj's and a R i e m a n n integration argument, we obtain that the following asymptotic 59  expression for S i(Zi)/[(n  + l)h ]: l+1  n<  Zj — Zj^ f Zi — Zj\ (n + 1)^+1  '  S n  l { Z i )  ~ ( +l ) h ^ n  3=  K  {  h  J \  h  4iX^)(^)' ^ "~' ~ /(  holds uniformly w i t h respect to Z i  +0(  A  2)  — 1,..., n, as n —> oo, h — > 0 and nh  —> oo.  3  it  M a k i n g the change of variables s = (Zi — z)/h a n d using a Taylor series expansion of  s jf * (^) w  =/rr *•*  / ( • ) , we express the leading term i n the above asymptotic expression as:  r(l-Zi)/h  J  s K(s) l  Zilh  f(Zi)  + f'(Zi)  • (sh) +  -^-  • (sh) + o(h ) ds  f  r(i-Zi)/h  2  2  (i-Zi)/h  f  = /  s K(s) [f(Zi) + 0(h)] ds = f(Zi) /  s K(s)ds + 0(h)  l  J-Zi/h  l  J-Zi/h  = f(Z )v (K,Z h) i  (s)m+sh)ds  l  + 0(h)  u  Here, the O term holds uniformly w i t h respect to Zi,i = l , . . . , n b y the smoothness assumptions on / ( • ) given i n condition (A3). C o m b i n i n g these results, we conclude that:  (n + l ) / i ' +  1 < 5  "''  ( Z i )  =  ( >( >  f  Zi  l  > ^ + °W  K  Zi  + °( ~ ~ ) n  lh  2  ( - 6) 4  4  uniformly i n Zi, i = 1,..., n, as n —> oo, h —> 0 and nh —> oo. 3  Now, for I = 0,1,2, 3, we substitute the asymptotic expression of S j(Zi)/[(n n  + l)h ] i n l+1  (4.46) i n the right side of equation (4.45). U s i n g that the quantities f(z), K(z) and zK(z) are bounded for z € [0,1] (conditions (A3) and (A5), respectively) and re-arranging, we easily obtain (4.43). T h e asymptotic bound for Sy given i n (4.44) follows immediately from L e m m a 4.6.2 and (4.43). T h e following result follows easily from L e m m a 4.6.5. T h i s result w i l l be used to prove L e m m a 4.6.7.  60  L e m m a 4.6.6  Let  be as in Lemma 4-6.1. Given C > 0, there exist C{ > 0 and C > 0 2  such that for any n > 1 and any v = (v\, . . . , v )  T  n  with \VJ\ < C, we have: (4.47)  and \[S v\A<Cl  (4.48)  \S v\\l<n(Clf  (4.49)  h  Furthermore, we also have: T  h  and (4.50)  \s v\\i< (c* y. n  h  2  Proof: Use result (4.44) of L e m m a 4.6.5 to write:  E  SjkVj  <El ;*l>^ ~El ;*l fc=i 5  c  5  fc=i  fc=i  l  -0(nh) = 0 ( 1 ) . ( +l)/T  +  n  T h i s proves (4.47). Result (4.48) can be derived using a similar reasoning. B y result (4.47), we have: \\S v\\l = (Slv) (S v) T  T  h  T  h  = £  [5^]; <  n(Cl)\  so (4.49) is proven. Result (4.50) can be shown to hold i n a similar manner.  Now, we use Lemmas 4.6.5 and 4.6.6 to establish the following asymptotic bounds.  61  Lemma 4.6.7 Let r,S  c h  and I be as in Lemma 4-6-1- Then, if n —> oo, h —> 0 and  nh — > co: 3  ||r|| = 0 ( n / ) 1  \\S r\\ T  h  2  (I-S ) r\\ c  T  h  2  (4.51)  2  2  )  =  0(n^),  (4.52)  =  0(n^),  (4.53)  and  \s \\ c  h  F  =  (4.54)  o(h-^).  Proof: Using the boundedness of /•(•), we write: n  \\r\\l = r r = Y,r{Z ) = 0{n), T  2  %  t=i  so (4.51) is proven. Using S = (I - 11 '/n)S c  and result (4.49) of Lemma 4.6.6 with v = (I -  T  h  h  11 /n)r, T  we have: Sfr\\l  =  Si  [I  - - H  i r  T  *\2 \\Siv\\i<n-(Cl)  =  n for some CJ > 0 not depending on n. This proves (4.52). 1  Result (4.53) follows immediately from results (4.51) and (4.52). Finally, to show result (4.54), we use well-known properties of the Frobenius norm to get:  )s n  11  T  \SI\\F  -  i-  —  Thus, it suffices to show that  h  1 n  < \\S \\ + -\\U \\ h  F  HS/JI^ is of order 0(h 62  T  F  / ).  1  2  -\\S \\ h  F  <2\\S \\ . 
h  F  B y result (4.44) of L e m m a 4.6.5, we obtain:  \\S \\l = 11  Si <  h  i=l  j=l  ± ± I(\Z -Z \< h) t  ^  3  i=l j=l  '  for some positive constant C. Since the number of non-zero terms i n the double s u m appearing on the right side of the above inequality is nO(nh), we conclude that is 0(h- ) x  or, equivalents, that  \\S \\ is h  HS/JI? 2  0(hr / ). 1 2  F  T h e next result provides a probability bound for the E u c l i d e a n norm of a vector of n independent, identically distributed random variables having zero mean and non-zero, finite variance. It also provides a probability b o u n d for the E u c l i d e a n n o r m of a transformation of this vector, obtained by pre-multiplying the vector w i t h the transpose of a centered local linear smoother matrix.  Lemma 4.6.8 Let £ be as in Lemma 4-6-4 d  S° be as in Lemma 4-6.1. Furthermore,  an  h  let fi be an n x n symmetric, positive definite matrix with ||fi||s = C>(1).  Then, if  n —> oo, h —> 0 and nh —> co, we have: 3  ||£|| = 0 {n ' )  (4.55)  1 2  2  P  = Op{h-W)  \\S£toi\\  %  (4.56)  \\S Slt\\ = Op{hr l ) c  (4.57)  l 2  h  2  Proof: B y M a r k o v ' s Theorem: = OP {E{\\H\\l)) = Op(nVar(^)) so (4.55) is proven.  63  = 0 (n), P  Next, consider (4.56). Set B = flS . By Markov's Theorem, we have: c  h  \\Sf ml  = \\B m = Op(E(\\B t\\%) T  Thus, it suffices to show that E(i BB $) r  is 0{h~ ).  T  E(?BB S)  =  T  T  Using result (2.24) with u = £  1/2  and A = BB , together with the symmetry of fl, we obtain: T  E(£ BB £) T  T  = trace (BB • Var{£)) + E{£) • BB  • E(£)  = Varfa)  • \\B\\  T  T  T  • trace {BB ) + 0 = Var(^) T  2 F  <||n||l.||5 || = C7(l)C?(/- ) = 0(/i- ), c  1  fc  F  1  l  by result (4.54) of Lemma 4.6.7. This proves (4.56). Result (4.57) can be established using a similar argument.  The next lemma contains results concerning the asymptotic negligibility of various random or non-random terms. All of these terms depend on a matrix of weights fl and ;  on centered or uncentered local linear smoother matrices. Some terms also depend on a matrix of weights fl*, possibly different than fl itself.  Lemma 4.6.9 Let fl and fl* be n x n symmetric, positive-definite matrices satisfying  \\fl\\ = 0(1) s  = ||n*|| . LetS s  and r* = (r*(Zi),... ,r*(Z )) , T  n  andS  c  h  h  be as in Lemma 4.6.1. Setr = (r(Zi),...,  r{Z ))  T  n  where r(-) : [0,1] -> R and r*(-) : [0,1] -> R are smooth  functions having three continuous derivatives and the Zi's are fixed design points satisfying condition (A3). Finally, let £ = (£i,. • •, £ ) n  T  a n a >  £*  =  (£!> • • • > £n) be vectors T  whose components are independent, identically distributed random variables such that Efa)  = 0, Varfa)  < oo and E(£*) = 0, Var(£*) < oo . Then, if n -» oo, h -> 0 and  64  nh —> oo, we have: 3  1 -r* Sl(I-S )r n+  = 0(h ),  (4.58)  - I)r = 0(h%  (4.59)  1  2  h  -^±—^ mi (S T  T  h  ^ r ^ n C J - S£)£ = Op (n-^h-^)  ,  (4.60)  ^r "(/ - Sl)r = O p l n " ^ ) , r  1  (4.61)  2  - L - £ * n S ^ = CMn- / /!- / ) T  1  2  1  (4.62)  2  - L ^ O S f * - Opin-Wh- '*).  (4.63)  1  1  1  n  - € * n ^ n * s f n« = o { - h- )  (4.64)  C nSf  (4.65)  r  n+  T  +l  l  P  1  n  Q*Sf Sl£ = Opin^h- ). 
1  Proof: Using properties of matrix and vector norms introduced in Section 2.4 of Chapter 2, we get: • ||(I - 5 , ) r | |  | ; ^ r * f i ( J - S )r\ < ^ | | r ' | | • r  h  2  2  = ^rC»(n / )0(l)0(n / /i ) = C(/i ) 1  2  1  since ||r*|| is © ( n / ) by result (4.51) with r = r* and 1  2  2  2  - S )r\\ /(n  2  2  2  h  2  ||llT||F  -  + 1) is 0(h ) 4  by result (4.39). Thus, (4.58) holds. Similarly, we obtain: 1  r* mi (S  n(n + 1)  T  T  h  - I)r -  ;  -n7r^l)  (n+ ^ ( 0  n  n l / 2  so (4.59) holds.  65  l|r1|2  -  ||n||s  )°( ) ( ) ( 1  0  n  0  -  n l / 2 h 2  11(5,1  ~  ) = °( )> h 2  I ) r | 1 2  Using result (4.42) with c = (I - S ) Clr* , we have: c  T  h  1 r* Cl(I - Si)£ = ^ O ( \ \ ( I n +1 n+  S ) rir*\\ )  T  c  < —Op n +1  ((1 + \\S%\\ ) •  =  l 2  T  h  P  2  • ||r*|| ) = n+1  F  ^-Op(h- l )0 {l)Op{n}l ) l 2  2  2  P  0 {n- ' h- l ), l 2  P  since \\S \\ is 0{h~ / ) by result (4.54) and ||r*|| is 0{n l ) c  1  h  2  by result (4.51) with  l 2  F  2  r — r*. We conclude that (4.60) holds. From result (4.42) with c = ft(I - S )r and £ = £*, we get: c  h  n+  (||n(J -  - S )r =  C n(I  1  c  T  h  T  S )r\\ ) < c  n +1 1 0 (l)Op{n l h ) n +1 l 2  h  2  =  2  P  -L_0 m\\ n +1 P  • ||(I - S )r\\ ) c  s  h  0 (n- h ), 1/2  2  P  since ||(J - S )r\\ /{n + 1) is 0(h ) by result (4.40). Therefore, (4.61) holds. c  2  h  A  2  To prove (4.62), write: 1  1 iiriuMu-11^112 n+1 1 0 {n ' ) • 0(1) • Op(h~ ' ) = n +1  <—  n+1  1 2  1 2  P  Op(n- h-^ ), 1/2  2  since ||£*|| is Op{n ' ) by result (4.55) with £ = £* and |]S££|| is 0 ( ^ / ) by result 1 2  1  2  2  2  P  (4.57) with Q, = I. Result (4.63) follows via a similar argument, but with result (4.57) replaced by result (4.56). Result (4.64) follows by noting that: 1  1  < n + 1 Wnril2-l|n*|| -||s?n<*||  n +1  s  1 -Op(h-- ' )0(l)0 (h- ' ) 1 2  =  1 2  n+  P  2  Opin-'h- ), 1  since both ||S^ft£*||2 and ||S£ f2£|| are Op{h~ ) by Lemma 4.6.8. A similar reasoning T  l/2  2  yields that (4.65) holds. This concludes our proof of the current lemma. 66  2  T h e next l e m m a provides asymptotic expressions for quantities involving the bias of a local linear estimator of an unknown, smooth regression function m(-).  L e m m a 4 . 6 . 1 0 Let G be as in (2.14) and let m = (m(Zi),...  ,m(Z )) ,  be as in Lemma 4-6.1.  Furthermore,  where m satisfies the smoothness conditions in condition  T  n  (A4) and Z\,..., Z are fixed design points satisfying condition (A3). Then, if n —> oo, n  h —> 0 and nh —> oo, we have: 3  - S )m = -h ^p2  -^—G {I T  71+1  — L ^ l l  f g(z)m"(z)f(z)dz 1  h  ( S  - I)m = h ^p-  J  2  h  (4.66)  2  JQ  I  T  + o(h )  g(z)f(z)dz  1  •£  m"(z)f(z)dz + o(h ) 2  (4.67)  where  g(z)f(z)dz  and  g(z)m"(z)ffz)dz  are defined as in equations (2.18) and  (2.19).  Proof: Let i = 0 , 1 , . . . ,p, be fixed. B y result (4.34) of L e m m a 4.6.1 w i t h r = m, the (i + l )  element of G (I — Sh)m/(n + 1) is: T  [^-Cril -  S J m ] ^  =  --±-± ,(Z,)[( g  - I)m]  Sk  i=i 1  = -h  L  N o t i n g that B (K, m  ™  ——^giiZ^BMZ^h)]  2  n  +  1  z, h) = m"(z)v (K)/2  for z £ [h, 1 - h], we write:  2  =  3=1  ^LY g Z )rn''{Z ) j  ^  2(n+ % 1) 2  2  3=1  -L-j^g^B^K^^h) V  +o(h ). J  {  E  9i(Z )m"(Z ) j  j  '  +-  Zj$[h,l-h]  l{  0  ^  9i(Zi)B {K,Z h). m  Zj$[h,l-h]  67  j  j=l it  s t  The first term can be shown to equal (u (K)/2)- J gi(z)m"(z)f(z)dz+o(l)  by a Riemann  X  2  Q  integration argument. 
The second and third terms are o(l), as both sums contain 0(nh) terms and these terms are bounded for h small enough, by the following argument. The boundedness of m"(z) for z £ [h, 1 — h) is a consequence of condition (A4). Lemma 4.6.2 yields that the function z —> B (K, z, h) is bounded for all z G [0,1] and h < h with m  0  h G [0,1/2] small enough. Combining these results yields (4.66). 0  Now, consider (4.67). Since the first column of G is the vector 1, from (4.66): 1 (J - S )m/{n T  h  j  + 1) =  "(z)f(z)dz  1  m  + o(h ). 2  Combining this with (4.10) proves (4.67). The next result concerns the existence of an inverse for the (p + 1) x (p + 1) matrix V defined in (2.20). We do not provide a proof for this result, as one can easily verify that VV  _ 1  = V~ V = I using the expression for V 1  L e m m a 4.6.11 Let V = S  ( 0 )  given below.  - 1  + ftg{z)f{z)dz • f g(z) f(z)dz l  T  matrix introduced in (2.20) and set a = ( J g (z)f(z)dz,  be the (p+ 1) x (p+ 1)  • • •, JQ g (z)f(z)dz) .  1  T  x  p  Also, let  S = (Ejj) be the variance-covariance matrix introduced in condition (AO)-(ii). Then V  - 1  exists and is given by:  , V-  1  /1 +  provided E  r  S  =  -  1  •  V _ 1  o  - S  _  1  o  o  I -a E 1 T  j  _ 1  ,  , (4.68)  £  exists.  The last two lemmas in this Appendix provide several useful asymptotic bounds. L e m m a 4.6.12 Suppose the assumptions in Theorem 4-2.1 hold. Then: — L ^ - ^ J  - S )9(I h  68  - S ) GVT  h  1  = Q(n- ). 1  (4.69)  Proof: Since the elements of the (p+1) x (p+ 1) m a t r i x V to show that G (I  - S )^{I  T  c  h  - S ) G/(n c  + if  T  h  do not depend upon n, it suffices  is 0(n~ ).  It is enough to show that  x  Gf (I  - S )*(I  Let i,j  = 0 , 1 , . . . ,p be fixed. U s i n g vector and m a t r i x n o r m properties introduced i n  +1  - Sl) G /(n  1  c  + l ) is © ( n - ) for any i, j = 0,1,... ,p.  T  h  2  m  1  Section 2.4, we obtain:  — Gj (I-S )*(I-SiyG  <  c  (n + l ) 1  2  +1  (n + l ) ' I 2  h  s yG c  h  -±—0(n^) (n + iy since \\(I - S ) G \\ c  T  h  i+1  j+1  2  • \\#\\s •  • Oil) • 0(n" ) = 0(n^ )  - S ) G \\ c  r  h  j+l  2  <  = 0(n~')  2  2  2  \\  i+1  = ||(I - S ) G \\ c  by result (4.53) of L e m m a 4.6.7  T  h  j+1  2  w i t h r = Gi+i and r = G^+i, respectively, a n d ||\P||s = 0 ( 1 ) by condition ( A l ) - ( i i ) . Thus,  Gf (J - S J * ( J - S £ ) G C  T  +1  / ( n + l ) is 0(n~ ). l  2  i + 1  L e m m a 4 . 6 . 1 3 Suppose the assumptions in Theorems 4-1-1 and4-2.1 hold. Letfhz,s%(Z) be the local linear backfitting estimator ofm(Z) defined in (4.32), where Z G [0,1] is fixed. Also, let rhi s (Z) c  t  h  denote the local linear backfitting estimator ofm(Z) that would be ob-  tained if 3 were known precisely:  m  J i S  (4.70)  c(Z)  where the wf" 's are as in (4.33). Then, if n —> oo, h —• 0 and nh —> co, we h< ave: 1  3  E( , c(Z)\X, mi S  Z) - m(Z) = 0(h ), 2  Var(m (Z)\X,Z)  = 0  ItSl  69  1 nh  (4.71) (4.72)  and E(fh c (Z)\X, ItS  Z) - m(Z) = 0(h ),  h  Var(m c(Z)\X,  Z) = O (J^J  ItS  2  (4.73)  .  (4.74)  Proof: T h e proof of (4.71) and (4.72) can be found i n Francisco-Fernandez and Vilar-Fernandez (2001), so we omit it. T o prove (4.73), use the definitions of fhi s (Z)  and rhz,s (Z) i n (4.32) and (4.70) to  c  >  c  h  h  write:  mi %{Z) = -  En  iS  =  fn  \—m  (Z)  (Z)-w X0 T  I ) S i  I t S l  (Z)  -3).  
Thus: E(mj, c(Z)\X,  Z) - m(Z) = {E(m c  S  I>S  (Z)\X, Z) -  m(Z)}  - w X{E0 si\X,  Z) - 3}  T  It  (4.75)  and Var(fh {Z)\X,Z)  = Var(rh {Z)\X,  ItSl  Z) - 2w X T  ItSl  + wX T  • Var{m *(Z)\X,  • Cov(J3 c, rh (Z)\X, ItS  m  Z) • X w.  (4.76)  T  IiS  Result (4.73) follows by combining (4.75) a n d (4.71) a n d using that Bias(3 ^\X, is  is Op(h ) 2  by T h e o r e m 4.1.1 and w X T  using the fact that the wf^s  is 0(1).  are bounded by L e m m a 4.6.5. Result (4.74) follows by IS  wX T  c ,rhi si(Z)\X,  S  t  Z)  T h e latter result is easy to establish  combining (4.76) a n d (4.72) and using that Var(3 cjX, 4.2.1, Cw(3j  Z)  Z) is Op(l/(nh))  is 0 ( 1 ) . 70  Z) is 0 (l/n) P  by T h e o r e m  by a Cauchy-Schwartz argument and  Chapter 5 Asymptotic Properties of the Modified and Estimated Modified Local Linear Backfitting Estimators, and In this chapter, we investigate the asymptotic behavior of the modified local linear backfitting estimator / 3 ^ - i c of 3, w i t h * being the true correlation m a t r i x of the model S  errors. Recall that an explicit expression for 3^-\ f2 =  and replacing 3*-i  iSS  S  c can be obtained from (3.4) by t a k i n g  w i t h the centered local linear smoother =  {X *-\I T  - Sl)X)~  l  X *~\l T  S: c  h  - S%)Y.  (5.1)  To simplify the proofs of the asymptotic results derived i n this chapter, we consider that the model errors satisfy assumption (A2), that is, they are consecutive realizations from a stationary A R process of finite order R.  A s s u m p t i o n (A2) is a special case of the  assumption ( A l ) considered i n Chapter 4.  T h e structure of this chapter is similar to that of Chapter 4, where we studied the asymptotic behaviour of 3 c^. IS  ^  n  the  fi  rst  P * °f *he chapter, we study the asymptotic a r  71  behaviour of / 3 ^ - i c . T h e proofs of the asymptotic results concerning ever more complicated t h a n those concerning conditional bias and variance of  I>S  given  3^,-i c S  exact conditional bias and variance of / 3  J i S  f°  3 c  and  X  c given  r t  and  X  n  e  are how-  3^-i c  S  >S  following reason: the exact  Z  depend on Vl> whereas the  Z  do not depend on  -1  \T/  -  1  .  Next,  we mention how the asymptotic results concerning the modified local linear backfitting estimator  can be generalized to local polynomials of higher degree.  3^-i ^ s  provide sufficient conditions for the estimators 3^-\  g  c  We then  and / 3 ^ - i c to be asymptotically S  'close'. T h e chapter concludes w i t h a n A p p e n d i x containing several a u x i l i a r y results.  5.1  Exact Conditional Bias of /3^-i c given X and Z S  ' h Just like the usual local linear backfitting estimate t i n g estimate  /3/,s= >  the modified local linear backfit-  suffers from finite sample bias. Indeed, using the explicit expression  3^,-i c S  of / 3 ^ - i c given i n equation (5.1), we obtain the exact conditional bias of S  X  and  Z  given  as: £ ( 3 * - i c \ X , Z)-3= i S  ( X  T  * -  ( I  l  -  S D X y  1  X  T  ^ ~  1  ( I  S )m, c  -  h  (5.2)  an expression which generally does not equal zero. T h e o r e m 5.1.1 below provides an asymptotic expression for the conditional bias of / 3 ^ - i c . S  given X and Z. These derivations assume that the value of h i n S ° is deterministic and h  satisfies conditions (2.12)-(2.13).  Theorem 5.1.1 Let  and W be defined as in equations (2.21) - (2.22). 
Under con-  ditions (AO) and (A2)-(A5), if n —> oo, h —> 0 and nh —> oo, the conditional bias of the 3  modified local linear backfitting estimate /3,j-i c of 3, given X and Z , is: S  E 0 ^  t  S  a j X , Z ) - 3  =  -h ^(l-J2<p ) 2  V  k  °  u  \  72  fc=i  /  ^  W  + o (h ). 2  P  (5.3)  Comment 5.1.1 Aneiros Perez and Quintela del Rio (2001a) investigated the large sample properties of an estimator similar to 3^-ig^,  namely  /9(/_K- ) *- , K ' T  1  h  h  *  n  e  u  n  _  constrained modified Speckman estimator in (3.12). Under similar assumptions as ours, Aneiros Perez and Quintela del Rio obtained a faster rate for the asymptotic conditional bias of their estimator, namely Op(h ). 4  the asymptotic conditional bias of 3^-i c S  As seen in (5.3), the rate we obtained for  is Op(h ).  However, they did not provide  2  asymptotic constants for this bias, like we do in (5.3). They obtained the same rate of convergence for the asymptotic conditional variance of their estimator as we did for that of 3^,-i c , namely Op(l/ri).  Just like us, they do provide an asymptotic constant for  S  this variance.  Proof of Theorem  5.1.1:  Let:  where the dependence of B ^ n  upon h is omitted for convenience. We will see below  that when n —> oo, h —• 0 and nh —> co, B ^  converges in probability to the quantity  3  n  V * defined in equation (2.22). Since V * is non-singular by Lemma 5.7.6, the explicit expression for 3^,-1 c in (5.1) holds on a set whose measure goes to 1 as n —> co, h —> 0 S  and nh —> co. We can use this expression to write: 3  •j - l - X ^ - ' t l - S£)yj ,  0*-*,s% =  (5.5)  which holds on a set whose measure goes to 1 as n -> oo, h —> 0 and nh —> co. Taking 3  conditional expectation in both sides of (5.5) and subtracting 3 yields:  E@ - \X,Z)-a 9 ltS%  = B~)f, • j - i - X  7  *- ^1  S£)mj •  (5.6)  We now show that -B ,* converges in probability to V * as n —> oo, h —> 0 and nh —> co, 3  n  that is: B * = V * + o (l). n>  P  73  (5.7)  Using the fact that X = G + r\ (equation (2.16)), B ^ can be decomposed as: n  = ^ T T  g  *  T  _  (  1  -  7  ^  S  G  + —^rG  Ti -t- i  n  T  *  _  (i -  1  + r T T ^ * " ^ - S JG + - i - V * " 1  C  = Sh, S — (I — 11 /n)Sh,  From equation (3.1) with  c  s) h  v  l  ^ - 5^)77.  1  (5.8)  so re-writing the first term,  T  h  expanding the last term and re-arranging yields:  - ^Ti)  *"  G T  +  ^TT)  +  ^TT" *"  G T  r  l l l T G +  *"  l l T ( s  "'  1 ( /  ^i"*" "-^ii  s a G  1  *"  7  )  G+  ^TT  *"  G T  *"  G T  1 ( /  1 ( /  - * S  ) G  -  "STT'' *~ ' ^ T  1  s  ( 5  '  9 )  To establish (5.7), it suffices to show that 1  -G * r  ll G = = 4 { - E <r*\  _ 1  T  n(n + 1)  f 9(z)f(z)dz  1  j  1  g(z) f(z)dz  +  T  o(l),  (5.10)  while the remaining terms are  op(l).  The proof of (5.10) is immediate by writing 1  G * ll G r  T  _ 1  n(n + l)  = (—-—G *"" !^ • 7  1  V +! n  (-J-G lV • (l + T  /  + l  /  1  V  ™  and using Lemma 5.7.3 in the Appendix of this chapter and result (4.10). Result (5.11) is proven in Lemma 5.7.4. To prove the remaining terms in (5.9) are O p ( l ) , it suffices to show that the quantities  Gf ^-\I +1  + 1), GJ+^II^SH  - Sh)Gj+1/(n  S )v /(n + 1), vi**-  1  c  h  j+1  (I - S h)Gj+l/(n c  - I)G /n(n j+1  + 1) and rjj^S^^J{n  74  + 1), Gf *-\l  -  +1  + 1) are o ( l ) . P  These facts follow from lemmas appearing in the Appendices of this and the preceding chapter. First consider Gf &~ (I i  - S )G /(n  1  +1  r* = Gi+i, fi =  h  + 1). 
By result (4.58) of Lemma 4.6.9 with  j+1  and r = G ,  this quantity is 0(h )  result (4.59) of Lemma 4.6.9 with r* = G i,  fi =  i+  Gf *" 11 (5^ - I)G /n{n 1  T  +1  j+1  2  fi = *  i+i  _ 1  1) is O ( n -  c  + 1  h  j+1  and r = Gj+\, we have that  + 1) is 0(h ) = o(l).  By result (4.60) of Lemma 4.6.9 with r* = G , G f * ( 7 - S )r} /(n+  = o(l). Similarly, from  2  j+i  p  1 / 2  /r  1 / 2  _  and £ = T ; , we have that  1  j + 1  ) = o (l). Using a similar reasoning with P  (4.61) of Lemma 4.6.9, we obtain that rj'[ ^>~ (I - S )G /(n 1  Finally, consider rfi ^~ S r) /(n+l). l  +l  fi =  h  j+1  P  By result (4.62) of Lemma 4.6.9 with £* = r/ ,  c  h  + 1) is also o (l).  c  +1  j+l  i+1  and £ = Vj+i> this quantity is (D (n  1 / / 2  P  /i  l ) = o (l).  This concludes our  l 2  P  proof of (5.7). By Lemma 5.7.6 in the Appendix of this chapter, the matrix V * on the right side of (5.7) is non-singular and admits an inverse V ^ , so (5.7) leads to: 1  = V^ + o (l).  (5.12)  P  To prove the theorem, by (5.6) and (5.12), it suffices to show that: 1  — X n  +  1  T  * - \ I  2/  \  R  - St)m = -h °-\ 1 - X> " V k=i 2  a  2  W + o (h ). 2  P  (5.13)  /  From equation (2.16), X = G + r), so: - l X * - ( / - S%)m = -L^CPV-^I r  - S%)m + - I ^ H T ^ I -  1  I  Sl)m.  Using the identifiability condition on m in (2.4) and S£ = (I — 11 /n)Sh, we obtain: T  (5.14) 75  B y L e m m a 5.7.5, the first two terms o n the right side of (5.14) are equal to the right side of (5.13). Now,  consider rrj <£>~ (I - S )m/(n  + 1), the (i + l )  x  +l  (5.14).  h  element of the t h i r d term i n  t h  Using result (4.42) of L e m m a 4.6.4 w i t h c = *  _  1  ( I - S )m h  and £ = r ?  i + 1  ,  together w i t h spectral n o r m properties introduced i n Section 2.4, we obtain:  - l ^ t f - V - S )m  = -l-0P(||*-i(J _  h  =  • I I C - S )m\\ ) h  lb  ~~J~  h  -L  The last equality was obtained by using that | | * | | - S )m\\ h  = 0{n l h ) l  2  Finally, consider r]f ^~ ll (I 1  2  h  2  = o (h ). 2  p  is bounded (result (5.35) of L e m m a  _ 1  by result (4.39) of L e m m a 4.6.3 w i t h r = m.  2  - S )m/n(n  T  +1  • | | ( I - S )m\\ )  J.  Tb \~  5.7.2) and  2  -^-OpiWV-'Ws  =  s  S )m\\ )  + 1), the (i + l )  h  t h  element of the fourth  term i n (5.14). Using a similar reasoning as above, we obtain:  "  •  =  = -L^Opm-'Ws  ' ^Ti)  I ) m  l  |  l  l  T  |  l  f  '  1  1  (  - IK/ - S )m\\ ) h  7  0j,(ll  -  S  h  *' )  m  l  ~  lllT(Sfc  l  s  /)m|l2)  )  = o (h ). 2  2  p  T h i s proves (5.13) and completes our proof of Theorem 5.1.1.  5.2  Exact Conditional Variance of f3^-i c given X 5  and  Z  In this section, we derive an asymptotic expression for the exact conditional variance of 3*-i,s£, given X,Z: Var(3*-i c | X , Z) = a B-% • X *~\l 2  lS  - S )#(I  T  c  t  h  76  - S t f ^ X  •B ^  (5.15)  where B ^ is defined as in (5.4). The above equality was obtained by using the explicit n  formula of  3^-i  >  S  c  in (5.1), together with the fact that Var(Y\X,  Z) — cr * by condition 2  (A2).  Theorem 5.2.1 Under conditions (AO) and (A2)-(A5), ifn —> oo, h —> 0 andnh —> oo, 3  the conditional variance of the modified local linear backfitting estimator 3  9  - i  >  S  of 3,  c  given X and Z, is: Var@ 9  l i a  . \ X , Z) = - 1 _ .  + £  ^  y  - i ( o ) s  v  - i  (5.16)  Comment 5.2.1 By Lemma 5.7.7 in the Appendix of this chapter, the second term in the above asymptotic expression for Var(3^,-i c\X, S  not dominate the first term, which is  Z) is Op(n~ ) and hence it does l  Q (n~ ). 
l  P  Proof of Theorem 5.2.1: By (5.15), we have: 2  |X, Z) =  Var(p -i * 9  where C , * = X *-\l T  n  tS  - S )^(I c  h  • C * • B~^,  - S ) ^- Xj(n c  (5.17)  n>  T  x  h  + 1). Since B~\  V* by 1  result (5.12), to prove the theorem it suffices to show that: C ,* = n  4 f +E^ ) 1  S ( 0 )  + - ^ G * " ( / - S%)#(I - SifV-'G r  T  1  + o (l). P  (5.18)  77  This fact is shown below with the help of lemmas in the Appendix of this and the preceding chapter. By (2.16), X = G + r], SO C * can be decomposed as: n ]  Expanding the last term and re-arranging yields: 1  1  C„,* = —-TV**' *  +— - G * - \ I  1  - SD^I -  + —-G *-\I T  + n + -G *-\I T  - S%)*(I -  T  StfV-'G  sir*- * 1  1 V Sf9- r, n +1  - S%)MI ~ S ) *- r c  T  1 n+1  n+1  T  1  h  t  h  1  '  h  (5.19)  The first term in the above converges to the first term on the right side of (5.18) by Lemma 5.7.4. The second term in the above is the same as the second term on the right side of (5.18). To show the remaining terms are op(l), it suffices to establish that Gj *-\I  - SDMI ~ SD *-\ /(n T  +1  +l  rif *- S *S?9- ri /(n 1  +1  c  1  h  j+1  + 1), rfi S£^^/(n +l  + l) are o (l) for all i, j = 0,1,... ,p. P  Let i,j — 0,1,... ,p be fixed. From result (4.42) of Lemma 4.6.4 with c = SD^{I — SD ^f~ G i T  l  i+  + 1) and  —  and £ = rjj and from the spectral norm properties introduced +1  78  in Section 2.4 of Chapter 2, we get:  -L^nj^ii  - si)*(i -  < - ^ O P W V - X  = Opin-^h- )  si) ^G T  z+1  • (1 + \\S \\ ) • | | * | | • ||G || ) c  2  h  s  F  i+1  = o (l).  1  P  To derive the above result, we used Lemma 5.7.2 to obtain that ||^||s 0(1). We also used the fact that \\S \\ is 0(h' l ) c  c T  0 {n- l h- ' ) P  j+1  F  + 1) = rfi {&~ S r) /(n  1  h  1  j+1  +  h  i+1  P  _  1  i+1  Finally, T,T *- S%*S?*- ri /(n 1  +1  and  +1) is O p ^ / i " ) = o (l) by result 1  1  j+1  P  (4.64) of Lemma 4.6.9 with £* = r> , ft = i+1  5.3  + 1). This quantity is  c  = o {\) by result (4.62) of Lemma 4.6.9 with £* = ry , f i = *  1 2  £ = rj .  1  i+  Next, consider r]f S ^~ r] /(n l 2  d 11~ 1[s are  (take r = G i in result (4.51) of Lemma 4.6.7).  l/2  +1  a n  by result (4.54) of Lemma 4.6.7,  x 2  h  while Gi+i is 0(n )  2  fi* = * and £ = r / . J+1  Exact Conditional Measure of Accuracy of  5  c  Given X and Z Any suitable criterion for measuring the accuracy of Qy-i^i  should take into account  both bias and variance effects. We use the following measure of accuracy for d^-i c , S  which combines in a natural fashion these effects: E  (||3*- = - 3\\l\x,z) lS  = {ECPV-I^IX,Z)  - a)  T  [E0 - O\X,Z) 9  1iS  - a)  + trace  Using equation (5.20) above together with Theorem 5.1.1 and Theorem 5.2.1 we obtain the following result: 79  Corollary 5.3.1 Assume that the conditions in Theorem 5.1.1 and Theorem 5.2.1 hold. Then, when n —> co, h —> 0 and nh? —> oo, we have: E (||3*-xlSc  -  3\\ \X, Z)=h*-£(l-f2  ^)  2  +^I-^-(  + E^)  1  trace  W V^W T  {V-^VJ}  2  + o {h ) + o 4  P  P  (^)  .  (5.21)  5.4 The y^-consistency of  c  S  '  h  Just as with the usual backfitting estimator 3 ^, we would like the modified local linear IS  backfitting estimator E{\\%-\  /3^-i c [S  -0\\ \X,Z)  to be O {n  2  sl  to be ^/n-consist'ent given X and Z, that is, we would like  2  P  By result (5.21) of Lemma 5.3.1, £(||3*-i c iS  -  /9||i|X, Z) is O (/I ) + C (n" ). This 4  1  p  P  result is due to the fact that the conditional variance of B^-i c is Op(n~ )  but its  l  S  conditional bias is Op(h ). 
2  We are interested in assessing at what rate the smoothing parameter h should converge to zero so that the squared conditional bias of 3^-1^ tends to zero, but has the same order of magnitude as the conditional variance of P><s,-\s - A similar argument as that c  h  employed in Section 4.4 yields that h should converge to zero at rate n~ , a G [ 1 / 4 , 1 / 3 ) , a  to ensure that the modified local linear backfitting estimator 3^-\ ^ is y^-consistent s  given X and Z - exactly as for the usual local linear backfitting estimator 0i s - Note c  t  h  that n~ < n / , so we must 'undersmooth' m^-i c to achieve \/n-consistency of a  - 1  5  S  3y-\ c given X and Z. Here, n ~ S  1//5  is the 'usual' rate of convergence for h, which we  believe is optimal for estimating m via m^-i c . S  80  5.5  Generalization to Local Polynomials of Higher Degree  The asymptotic results in Sections 5.1-5.4 concern the modified local linear backfitting estimator  /3^-i =. s  We believe these results readily generalize to the modified local  polynomial backfitting estimator of 8. The latter estimator is obtained from (5.1) by replacing S° , the smoother matrix for locally linear regression, with the smoother matrix h  for locally polynomial regression of degree D > 1. In keeping with the locally polynomial regression literature, we conjecture that the modified local polynomial backfitting estimator of 8 has conditional bias of order and conditional variance of order Op(n~ ).  <D (h ) D+1  P  Note that we may need boundary correc-  l  tions if D is even. We also conjecture that h should converge to zero at rate n~ , a  a € [l/(2D + 2), 1/3), for the modified local polynomial backfitting estimator of 8 to be •v/n-consistent given X and Z.  5.6  The v^-consistency of  s c  The estimated modified local linear backfitting estimator 3 ? - i  g c  can be obtained from  (5.1) by replacing \JJ with an estimator  3~-^ = (x $-\l  - Sftxy  1  T  X $-\l T  - S%)Y.  (5.22)  Deriving asymptotic approximations for the exact conditional bias and variance of /3~-i given X and Z is not possible, as these quantities are not tractable. The reason for this is that \& is random since it is computed from the data. In this section, we give sufficient conditions for 8^-^  g c  and  /3^,-i Sc  to be asymptotically 'close', in the sense that the  difference between these estimators is  Op(n  - 1  81  / ). 2  Our conditions (5.23) and (5.24) are  similar to those imposed by Aneiros Perez and Quintela del Rio (2001a) for establishing the asymptotic equivalence of their modified and estimated modified versions of the Speckman estimator.  Theorem 5.6.1  Suppose that the conditions in Theorems 5.1.1 and 5.2.1 hold. In ad-  dition, suppose that:  1  •-l  ^X {* T  Then, if h = n  a  o (l)  (5.23)  S )(m + e) =  o (l)  (5.24)  1  h  _ L x ( * - *-!)(/ T  =  -*- ){I-S )X  _1  P  c  h  P  , a E [1/4,1/3), we have:  3 -. = 3 . - . 9  ifl£  l S  c  (5.25)  o p ( - ^  +  Proof: To establish (5.25), it suffices to show:  V^(3$-' s c Using the expression for /3 -i  £ ) = V^09-\S%  ~P)  +  (5.26)  Op{l).  
in (5.22) and Y = XB + m + e (equation (2.1)), we  T  write the left side of (5.26) as:  (3 -, , -Q) $  s  = (^X *~\l  = ( ^ X * - ( I - Sl)X r  1 n  1  X *-\I T  - X ' V U I n +  c  c  h  S )X c  h  _ Si)X^j  T  h  + o (l))  - S )X  • -±=X $-\l  1  ~ S )X^  T  1  F  • (±=X *-\I  +op(l)  (^=X ^-\l-S )(m T  + ±=X *~\l T  82  +  e)  + e) + o ( l ) ) P  + e) + op(l)  + e)  T  P  c  h  X *-\l-Sl)(m - o (l)  - St)(m  T  - Sl){m  - Sl){m  + e) • o ( l ) + P  o (l). P  By the definition of 3y c  (5-1)  m  S  ^ (3$-  -a)=y/n~  >h  w  e  have:  (3*-i c  /3)  -  lS  +  Qx *- (-f - SDX^j T  • o (l)  1  P  - S )(m + e) • o (l) + o (l).  + -±=X *-\I T  c  P  h  P  Therefore, to prove (5.26), it is enough to show that [X ^f~ (I T  - S )X/n)  1  c  h  1  and  X * - ( J - S%)(m + e)/y/n are O (1). r  1  p  = X * ( i " - S )X/(n  To prove the first fact, let  T  -1  + 1). By (5.12), B " ^ -  c  h  V ^ + o ( l ) , with Vq, as in (2.22), so (X ^- {I T  - S )X/n)'  1  c  P  = O (1). To prove the  1  h  p  second fact, use B ^ = V * + op(l) (result (5.7)) and Chebychev's Theorem to write: n  - Sl)(m + e) = -^=X ^-\I  -^X ^-\I T  =  - S )(Y - X3)  T  • ( V * + op(l)) • {E@ - O\X, 9  c  h  Z)-3  1IS  (yVar(3*-i =|X,Z)) } .  + O  lS  P  By result (5.3) of Theorem 5.1.1, £?(3*-i,sj\ > ) - Sis 0 {h ) = 0 {n~ ). x  z  2  2a  P  P  Also,  by result (5.16) of Theorem 5.2.1, Var(3*-i,s= \X, Z) is <r?p(n" ). Since a > 1/4, we 1  conclude: ±=X *~\I T  -  S )(m + c  h  e) = O ( V ^ ) • p  - 0  P  ( o p ( n - ) + Op 2a  Q=))  ( n ^ ) + 0 ( l ) - OP(1). F  This completes our proof of Theorem 5.6.1.  Theorem 5.6.1 implies that 3~-i  is v^-consistent since 3^-1 c is v^S  One would expect the conditional bias and variance of 3~-i  83  cons  istent.  to be similar to those of  5.7  Appendix  Throughout this A p p e n d i x , we assume that the assumptions a n d notation introduced i n Sections 2.2 and 2.3 of this thesis hold, unless otherwise specified. W e also let I(S) denote the indicator function of a n arbitrary set S. T h e first lemma i n this A p p e n d i x shows that the correlation m a t r i x of n consecutive observations arising from a stationary autoregressive process of finite order R is invertible. T h e l e m m a also provides an explicit formula for the inverse of this correlation matrix. A proof of this lemma can be found i n D a v i d and B a s t i n (2001,Lemma 1).  L e m m a 5.7.1 Let e i , . . . , e  n  be successive observations from an AR process of finite  order R satisfying condition ( A 2 ) . If ^ is the correlation matrix of t\, . . . , € „ defined in Comment 2.2.1, then its inverse exists and is given by: -l  07  U U - V V] T  T  (5.27)  where U and V are n x n Toeplitz lower triangular matrices defined as (  i 1  \  /  0  o  u  and  -<t>R  V  -<f>R  0  0  -<  -4>i  i  J  -<t>R 0  0 j (5.28)  84  C o m m e n t 5.7.1 Let 14 be as i n (5.28) and define [U(k)}i,j = I(j = i-k,k  + l<i<n)  (5.29)  for k = 1 , . . . , R. T h e n i t c a n be easily seen t h a t u = i - (j>iU(i)  4>RU  Straightforward algebraic manipulations also yield  u u = -Y.MuJk) T  + u{k)) +  £  <t>Mu u T  [p)  p, q = 1  fc=l  + l) (p)) u  iq)  u  + £<f>l fk)Uw u  fc=l  p<q (5.30) where l w\ij  = I(j = i + k l < i < n - k ) ,  U  (5.31)  t  [Uj U ].  . = / ( j = i + p - q, 1 - p + q < i < n - p),  (5.32)  [Uf U ].  . = I(j = i-P  (5.33)  p)  q)  (q)  ip)  + qA<i<n-q),  for fc, p, q — 1 , . . . , R and p < q.  
T h e next lemma shows that, i f * is the correlation m a t r i x of a sample of n consecutive observations arising from a stationary AR process of finite order R, then its spectral n o r m is bounded. Furthermore, the spectral n o r m of  L e m m a 5 . 7 . 2 Let e\,...,e  n  is also bounded.  be successive observations from an AR process of finite  order R satisfying condition ( A 2 ) . If \& is the correlation matrix of e ,... ,e x  n  defined in  Comment 2.2.1, then: ||*||s = 0 ( l )  (5.34)  and = 0(1). 85  (5.35)  +1,  Proof: T h e boundedness of ||\& for  1  j | ^ (result (5.35)) follows easily by using the explicit expression  i n equation (5.27).  To prove the boundedness of  (result (5.34)), use the symmetry of * and a well-  known result on spectral norms to get:  1*11.9 <  |[*U  max  =  max ^  ' J'=l  l<i<n  l<i<n  \  P h  —'  h=l-  A c c o r d i n g to Exercise 13 i n Brockwell and D a v i s (1991), there exist constants C > 0 and s G (0,1) so that:  \Ph\  <Cs  | / j |  for a l l h.  C o m b i n i n g the previous results yields: n-1  E  1*11.9 <  M  (.g<*) -(* b)' ,  r  S  and (5.35) follows.  T h e following l e m m a provides a useful asymptotic approximation.  Lemma 5.7.3 Let e i , . . . , e  n  be successive observations from an AR  process of finite  order R satisfying condition ( A 2 ) . Let \& be the correlation matrix of e\,... ,e . n  more, let G be an nx  (p+1)  matrix defined as in (2.14). If n—* oo, then:  i f -—G *- l= -\-[l-Y <t>k) 2  T  l  \  R  r  2  l  /  a  J  n+1  Further-  «V  a  *=i  /  Proof: 86  J  o  g(z)f(z)dz  + o(l).  (5.36)  B y (5.27), the left side of (5.36) is 1  ^  -G * 1 — T  _ 1  n + 1  ° l  1  -G V V1, T  G U U\ T  n+ 1  T  al  T  n+ 1  so it suffices to show  —G U Ul n +  = (l -  1 G V V1 n+1  = o(l).  l  T  T  T  T  E <f>k^j  f  Q  9(z)f(z)dz + o ( l ) ,  (5.37) (5.38)  To establish (5.37), it is enough to show that, for any i — 0 , 1 , . . . ,p, we have:  y2 £ < M  *=i  i /  / -  >(1). Si(z)/(z)ete + o(  70  Let i — 0,1,... ,p, be fixed. Using the explicit expression for WU  i n result (5.30), we  write: ^ G l M ^ l  =  .2  R  - f -zZ^ a  2  u  .2  (Uj  k)  +U)  1  W  fc=i  *  p<q 2  *  « fe=i  n  ^  •  S  T  T  ^  '  1  -  (  5  3  9  )  Therefore, it suffices to prove that the following asymptotic approximations hold:  £  R  & [ ^ X T ^ i (tff*) + W) G  U  *1 =  2  (E  / ' 9i(z)f(z)dz  + o(l),  (5.40)  r  E P, Q = 1  P, Q = 1  ^ P<9  P<9  /  \ 9i(z)f(z)dz Jo  + o(l),  /  (5.41)  87  E*: fe=i  E*i  n+  —G l  l  /  = J^ (z)f(z)dz  T  i+1  9l  ft(z)/(*)<fc + o(l),  (5.42)  + o(l).  (5.43)  T h e last result follows from result (4.10). To prove (5.40), it is enough to show that the equalities below hold for any k = 1,..., R: ^Gf n  + 1  C/f  f e )  l = f  + o(l)  (5.44)  + o(l).  (5.45)  Jo  1  T  (z)f(z)dz  gi  1  = f *(*)/( z)dz  n +l  Jo  Using the expression of Uj ^ i n (5.31) and a R i e m a n n integration argument, the left side k  of (5.44) c a n be written as:  ^  n  = — r Y\ n + 1  n  YI 9i{Zt)I [l = t +  trtr  k,\<t<n-k)  t=i  = / e)/e)dz + o(l). Jo  ft  Here, we have also used that k does not depend u p o n n, as R itself does not depend upon n. Similarly, using the expression for U(k)  88  m  (5.29), we obtain that the left side of  (5.45) is: 1 — — - G C 7 ( wf c ) l = T  n +1  i+1  n+ ^  n+  t=l  /=1  n  n  t=i ;=i  = ^ E ^ ) ~ E ^ ) + « w t=k+l =  /  t=l  ft(z)/(*)dz  Jo  + o(l).  Thus, b o t h (5.44) and (5.45) hold. A similar argument c a n be used to derive (5.41) a n d (5.42). 
T h e only difference i n the proofs is that the range of summation for t i n J2 gi{Z ) t  t  changes.  It remains to prove (5.38). To establish this result, i t is enough to show that G ^ V V l / ( n + T  h l  1) is op(l) for a l H = 0 , 1 , . . . ,p. Let i = 0 , 1 , . . . , p, be fixed. B y the definition of V i n (5.28), we have:  /o  A  0  VG  i+l  =  -<t>R9i{Z\) -<t>R-i9i(Zi) - <f>R (Z ) gi  \  2  -<?\gi{Z ) - (p2gi(Z ) X  4>R9Z{ZR) J  2  Since &(•) is bounded by assumption (A0)-(i), | | V G j + i | |  = 0 ( 1 ) - A similar argument  2  yields ||V1||2 = 0 ( 1 ) - C o m b i n i n g these results, we obtain: 1  n +l  Gf  V V1 T  + 1  so Gj^V'Vl/in  ^ rxTllVG i|| i +  To ~T~  2  • ||V1|| =  C U ) • 0 ( 1 ) = 0 ( V n ) = o(l),  2  To  X  + l) is o(l). 89  ~x~ 1  The following lemma provides a result concerning the convergence in probability of a random matrix.  Lemma 5.7.4 Suppose the assumptions in Lemma 5.7.1 hold. Let (rjn,..., rji ) , i — T  p  1,... ,n, be as in condition (AO)-(ii) and let rj be an n x (p + 1) random matrix defined  as in (2.10). Then, as n —> oo: ^ * - S  = ^ - ( l + E«)  r  S  + op(l)  ( 0 )  (5.46)  where S ' ' zs defined as in equation (2.15). 0  Proof: By (5.27), the left side of (5.46) can be written as: n +1  cr  2  n+ 1  a  n+1  2  so it suffices to show -L-rfWUri  = (l + E ^)  S ( 0 )  +  n +1 In fact, if £ ^ — (Ey), it is enough to show: L-nf^Ur]^ n + 1I^+iV Vr7  = ^1 + E  T  n  +  i+1  = op(l),  for any i,j = 0,1,... ,p.  90  ^  + OP(1),  (5.47) (5.48)  Let  i,j = 0,l,...,p,  be fixed. Using the explicit expression for * R  1 ^pffi+M U-q  = -  T  i+1  1  fa fc=i  ^T^r+i  ( l + (*)) U  U  i n (5.27), we write:  Vj+i  R  +E  n +  jvl i (C/f C/ + +  p)  C/f L/ )r7  (g)  g)  (p)  j+1  p<g 1  E<  n+l  fc=i  vT+iUj U k)r] i k)  {  + ^jvf+iV i  j+  j+  (5-49)  In order to establish (5.46), we w i l l show that R  1  jvI iUf U  fc=i  n+  +  k)  X>  ik)Vj+1  i j  + o (l), P  (5.50)  vfc=i  1 -Vi iVj+i  ri+  S  2  (5.51)  = Eij + o (l),  +  P  and the remaining terms i n the right side of (5.49) are o p ( l ) . Result (5.51) holds by result (4.9). To prove (5.50), we use condition (AO)-(ii) and the Weak L a w of Large Numbers for a sequence of independent variables to write: 1  -^vI iUf U  n  +  k)  {k)Vj+1  =  1  "  "  -r-r-yEE ^ 7  [ Jk)U(k)] mj u  til  — EE Vtjlil = t=i i=i  n +  n-k  ..  1  V  t,l<t<n-k)riij  P  \  = — T T A , Vt.iVtj  > E (m.iVij)  =  n-+oo  n + l and (5.50) follows easily.  We now show that the first term i n (5.49) is O p ( l ) . We have: R  \  E fa 7—T^f+i^ffc) + U )v i fc=i L  1  I  {k)  j+  J  ( 1 = E fa ( ^TT^+i^ffc)^-fc=i / 1 R  v  + Hfa 91  [j+^^^i  (5.52)  so it suffices to analyze the term nI+iUj )'nj+i/{ + 1)> i  ri  a  s  k  vf+iU(k)Vj+i/(  + 1) is its  n  transpose, w i t h i and j interchanged. Using the expression for 1 7 i n (5.31), we obtain:  ^  n n  = ——r E E n+l —  '7t>^(  i =  i  + ^> 1 < t < n -  k)rj  U  t=i (=i  1  n —  1 =  ^—"\  p  n -\- 1t=l—  —pr  / . Vt,iVt+k,j  n—*oo> E  (r) r) ) hi  = E(r) )E(rj )  1+k:j  hi  = 0.  1+kij  T h e above result was obtained by using the fact that {vt.iVt+kj}^! is a sequence of kdependent, identically distributed random variables (condition (AO)-(ii)), so the quantity Y^t=i Vt,iVt+k,j/(n  + 1) converges to E  {r)i rji j) :i  +k  by the Weak L a w of Large Numbers for  fc-dependent sequences of random variables. W e conclude that the term  'nJ Uj ^ qj /(n+ ,  +1  k  +1  1) is o p ( l ) , so the first term i n (5.49) is o p ( l ) . 
Using a similar reasoning, we can show that the second term i n (5.49) is also o p ( l ) . It remains to show (5.38). B y the definition of V i n (5.28), we have:  \ 0 VVj+i = ~cf>RV\,j  -fpR-lVlj  \ so ||V?7 0(1).  :;+1  - $R?}2,i  4>Rm,i j  -  ||2 = 0(1) by assumption (AO)-(ii). A similar argument gives ||V»7j ||2 +1  C o m b i n i n g these results yields: T7f V V77 T  n  +l  +1  j+1  <  n + l 1 n + l  i+ll|2 • l | V T 7  Op(l)-Op(l) 92  j + 1  ||  2  = 0 (l/n) P  = o (l) P  )  so  rf: V Vr)j /(n  + 1) is o (l).  T  +1  +l  This completes our proof of Lemma 5.7.4.  P  The following lemma provides asymptotic approximations for non-random quantitities involving the bias associated with estimating a smooth function m(-) via locally linear regression.  L e m m a 5.7.5  Suppose the assumptions in Lemma 5.7.1 hold. Let G be an n x (p + 1)  matrix defined as in (2.14) such that condition (AO)-(i) is satisfied, and Sh be an n x n local linear smoother matrix defined as in (3.6)- (3.8). Set m = (m(Zi),...  ,m(Z )) , T  n  where m(-) satisfies condition (A4) and the Z 's satisfy the design condition (A3). Then, t  if' n — > oo, h —> 0 and nh? —> oo: — - j — G * ( I - S )m T  - 1  h  n  +  — -h ^  «  i  f l  2  —  V  CT  <t>k) ^p-  £  fc=i  /  1  f g(z)m"(z)f(z)dz 1  + o(h ) 2  J o  (5.53)  and 2  n (n —G *-ill (S -I)rn 1  T  = h^  T  2  h  X  where  g(z)f(z)dz  and  ( l - | > )  f 9{z)m"(z)f{z)dz  ^  Q  [ g(z)f(z)dz + o (h ), Jo /o  (5.54)  2  P  g(z)m"(z)f(z)dz  are defined as in equations (2.18) and  (2.19).  Proof: We first prove (5.53). Using the explicit expression for - ± - G * - \ l - S )m T  n+1  h  = ^  in equation (5.27), we write  • -^-G U U(I T  2  - —  a 93  2  -  T  cH n + 1 0"e 1  •— G  n+1  rp r  V  S )m h  rri T  V I  - 5,  m,  so it suffices to show that 1 G'WUil n+1 1  G V V(I T  n +1  -  S )m  J  1-X>J  h  g(z)m"(z)f(z)dz  o  + o(h ) 2  - S )m = o{tf).  T  h  These facts follow by proving that — J U U{I n + l  - S )m  T  G  +l  h  = -h ^ - E <f>^j  j[* 9i(z)m"(z)f(z)dz + o(h )  2  2  (5.55) 1 -G n+1  V V ( / - S )m = o(h ), J  + 1  (5.56)  2  h  for any i = 0,1,... ,p. First, consider (5.55). Using the expression for U U in (5.30), we have: T  R  —Gj U U{I n + l  - S,)m = E fa ^ G f  T  +1  ( E / f > + U )(S {k)  + 1  -  h  I)m  k=l R  E  n+  P, ? =1 p<q  n + n +  T  jGi (Ul U +1  p)  Gr J7f C7 +1  fc)  + Ul U )(S  {q)  q)  (fc)  ip)  - I)m  h  (S -/)m /l  (5.57)  -G? (S -I)m +1  h  Thus, to establish (5.53), it suffices to show that the last two terms are o(h ) and the 2  remaining terms can be approximated as: R  E^ k=i  T 1 -—Gf (Uj +1  L  +  k)  1 U )(S -I)m ik)  h  +  J  = * ( ^^) h  2  f 9i(z)m '{z)f{z)dz l  Q  94  + o(h ), 2  (5.58)  £ 4>P<l>q—Gi (Ul U p, q = 1 p<q +1  p)  + Ui U )(S  iq)  = h  q)  I  R  ip)  \  »2{K)  2 y cp 4>  2  p  V  p,  1  - I)m  h  (z)m"(z)f(z)dz  + o(h ), 2  9i  (5.59)  JO  J  =1  f  1  f  q  -Gj U U (S -I)m T  n+  k=l  +l  {k)  {k)  h  h  ^ j  2  f  a  9i(z)m"(z)f(z)dz  + o(h ),  (5.60)  2  and 1  r  (s  T  nm-  h 2 v 2 { K )  M  f (z)m"(z)f(z)dz  + o(h ).  1  (5.61)  2  gi  Jo ^  —G {S -T)m-— h  T h e last result follows easily from result (4.66) of L e m m a 4.6.10. To prove (5.58), i t suffices to show that the equalities below hold for any k = 1 , . . . , R: h v {K) 2  -^Gf  + 1  E7[  f c )  (S  - I)m =  f c  -±-Gf U (S +1  (k)  2  1  9i  f  - I)m =  h  f j f (z)m"(z)f(z)dz  (z)m"(z)f(z)dz  9i  + o(h ),  (5.62)  + o(h ).  (5.63)  2  2  Consider the left side i n (5.62). 
Using the expression for Uj ) i n (5.31), the boundedness k  of cii(-) (condition (A0)-(i)) and result (4.34) of L e m m a 4.6.1 w i t h r = m, we obtain:  —  L-GZ,U? {S -Ih k)  h  = r+~l ^  ^  9i(Z )I(l t  EE9i(Zt) M ] • l(S - J ) m ] , k)  u  h  = t + k, 1 < t < n - k) [B {K, Z h)-h  2  m  h  + o(h )] 2  t=i (=i n—k  + o(h ) 2  ^y {Z )B {K,Z ,h) 9i  t  m  t+k  = h (V + Q ) + o(h ), 2  (5.64)  2  n  n  95  where V = YTtJy gi{Z )B (K, n  Qn =  t+k  (9i(Z )  - 9i{Z ))  t  Z ,  m  t+k  1) and  h)/(n +  t+k  B (K,  Z ,  m  h)/(n + 1).  t+k  A Riemann integration argument allows us to approximate V as n  fc V  n  =  ~~T E 9i(Z )B (K, t=i t  n  +  m  Z h) u  1  n  = ^p-j\i(z)m''(z)f(z)dz  +  i  £ t=i  (Z )B (K,  9i  t  Z , h)  m  t  + o(l).  The last equality was obtained by using the fact that k does not depend on n and gi(-) is bounded (condition (AO)-(i)). We also used the fact that B (K, z, h) is bounded for all m  z G [0,1] and h < ho, with ho G [0,1/2] small enough, by result (4.35) of Lemma 4.6.1, Lemma 4.6.2 and condition (A4). Using the fact that <?*(•) is Lipschitz continuous with Lipschitz constant C* (condition (A0)-(i)) and that the Z 's satisfy the design condition (A3), we bound Q as: t  n  n—k \Qn\ < — j - r E \9i(Zt) ~ 9i(Z )\• t=i ^ n—k k — 1 t+k  n  ^  C  \B (K, Z , m  h)\  t+k  +  ^TT  < i c  E E \9i(Zt+i) t—i i—o  f k )  =  c  gi(Z )\ t+l+1  , k i  =  0  (  1  )  ,  Substituting the results concerning V and Q in (5.64) yields (5.62). A similar argument n  n  can be used to prove (5.63). The only difference in the proofs is that the range of summation for 'S2 gi(Z )B (K,Zt+k,h) t  t  m  changes.  Combining (5.62) and (5.63) yields (5.58). Similar arguments can be employed to obtain results (5.59) and (5.60).  96  It remains to prove (5.56). B y the definition of V i n (5.28), we have:  0  -Mii-Sh)™.]!  V(J - S )m h  - f o _ i [ ( I - 5 )m]i -fo[(Ifc  h  fc  x  2  ft  i + 1  ||  2  J  = 0(1). C o m b i n i n g these results yields: 1  < n + l ||VG || -||V(I-S )m||  Gf^V'Vil-S^m  - S )m/(n  T  R  2  i+1  n + l +l  h  h  1  so Gf V V(I  Mil - S )m}  2  2  4.6.2. We know that | | V r 7  n + l  2  = G(h ), since for i = 1 , . . . , R, \(S - I)m]i = 0(h ) by L e m m a  2  1  h  - S )m] - 0 [(I - 5 )m]  \ -0i[(/ B u t ||V(/ - S )m\\  S )m}  2  fc  2  0 ( 1 ) • 0(h ) = 0 ( l / n ) = o{h ) 2  2  + 1) is o(h ). 2  h  Result (5.54) follows easily, by writing: 1 n(n + l )  G t f l l ( S - I)m r  - 1  r  x  /  ""~  1 \n + l  -G * 1 T  _1  1 n + l  l (S T  h  -  I)m  1 1+ n  and using L e m m a 5.7.3 and result (4.66) of L e m m a 4.6.10 w i t h G replaced w i t h 1. T h e proof of L e m m a 5.7.5 is now complete.  Let V<i, be the m a t r i x defined i n (2.22). T h e next result concerns the existence of a n inverse for  and provides an explicit expression for this inverse. W e do not provide a  proof for this result, as one c a n easily verify that V^V^, = V^V<y = I by using the 1  expression of  given i n L e m m a 5.7.6 below.  97  L e m m a 5.7.6 Let V * be the (p+1) x (p+1) matrix introduced in (2.22) and define the px 1 vector a as a — (J* gi(z)f(z)dz,...,  g (z)f(z)dz) .  Here, /(•) is a design density  T  p  satisfying condition (A3) and gi(-),..., g (-) are smooth functions satisfying condition p  (AO)-(i). Furthermore, let £ = (£;j) be the variance-covariance matrix introduced in condition (AO)-(ii). Then V ^ exists and is given by: 1  I  1  (i-Eti«  1  2  +  i  r  i+ Ef=1^  a E-  a E a  r  _ 1  i'  -E a _1  V  E-  :  1  provided E exists. 
1  The last result in this Appendix provides a useful asymptotic bound.  L e m m a 5.7.7 Suppose the assumptions in Theorem 5.2.1 hold. Then, if n —> oo, h —> 0 and nh —> oo, we /iai;e: 3  — L — V ^ G ^ C I  - S£)*(J - 5 ^ ) * - G V i T  1  1  = 0{n- ).  (5.65)  1  Proof: To prove (5.65) it suffices to show that 1  - ^ G  T  ^ - \ I - S )*(I - SifV-'G  = Oin- ),  c  (5.66)  1  h  since the elements of the (p + 1) x (p + 1) matrix V ^ do not depend upon n. Result 1  (5.66) follows by showing that G f * ( T - S ) V ( I - S ) * - G _ 1  c  + 1  c  h  for any i, j = 0,1,... ,p.  98  T  h  1  j + l  /(n  +1) is 0 ( 0 2  Let i, j — 0 , 1 , . . . ,p be fixed. Using vector and m a t r i x norm properties introduced i n Section 2.4, we obtain: 1 (n + l ) ' 1  < (n+l)  I - S Y c  2  '  tf-^+xlla  h  SffV-'G^lU  • ||*||s • ||(I -  < O ( n - ) • | | ( / - SlY^G^W,  • | | ( I - S ) ^ G \\ ,  2  c  T  (5.67)  l  h  j+1  2  since | | * | | s is 0 ( 1 ) by result (5.34) of L e m m a 5.7.2.  T h u s , it suffices to show that  ||(/ - S ) *- G \\  0(n^).  c  T  h  Let v = *  and ||(7 - S%) *- G \\  l  i+l  _ 1  T  2  j+1  2  G j + i ; using S £ = (7 - 11 /n)Sh  (equation (3.1) w i t h  T  ||(7 - SlfV-iG^Wt  are  l  < WV-'G^W,  +  = Sh) we write:  WSfV-'G^W,  = ||«||2+||^«||2 =  11*112 +  1 - - 1 1 > n  5^(7  < IHl2 + | | S ? « | | +  (5.68)  a  If v denotes the i  t  h  t  component of v, we can show that there exists C > 0 such that  \v \ < C for all t = 1 , . . . , n. Indeed, by the expression for  i n (5.27) and Comment  t  5.7.1, we have:  v- = G T j * - i = ^ . Gj U U T  +1  ~  +1  ~ "f •  Gj V V T  2  +1  -T Muf + u )+  2'  t  fc=l  w  k)  £  0A(i/f t/ p )  ( 9 )  + r7f L/ ) ?)  (p)  p, 1 = 1 p<q  + 5> c/f c/ + 7 2  fc)  (fc)  k=l so it suffices to show that the quantities  and Gj Uj )U( ) +1  k  k  Gj l/J , Gj U( ), +I  k)  +1  k  J+i J ) (q)^  G  u  U  P  have bounded components for all k,p, q = 1,...,R, 99  Gj^Uf^U p < q, a n d  G j V V also has bounded components. These facts follow easily from the sparseT  + 1  ness of C7(fc) and V (see Comment 5.7.1) and the boundedness of Gj+i's components (condition (AO)-(i)). The boundedness of v's components implies that ||u||2 is 0(n l ),  WS^vWv is C(n  l 2  \\Sl(ll v/n)\\ T  1//2  ) and  is C ( n ) . The last two results follow by result (4.49) of Lemma 4.6.6. 1/2  2  Using these asymptotic bounds in (5.68) yields that ||(J - S ) ^~ G i\\ c  T  h  A similar argument gives that \\(I - S ) ^~ G i\\ c  T  h  l  i+  100  2  is  l  j+  0(n ). 1/2  is 0 ( n ) . 1/2  2  Chapter 6 Choosing the Correct A m o u n t of Smoothing The estimators of the linear effects in model (2.1) considered in this thesis depend on a smoothing parameter h. This parameter has a dual function. On one hand, it influences the statistical properties of the estimated linear effects. On the other hand, it controls the shape and smoothness of the estimated non-linear effect. Our focus in this chapter is on developing data-driven methods for choosing h so that we obtain accurate estimators for the linear effects of interest. These methods may not be the most appropriate for accurate estimation of the non-linear effect, as they may undersmooth its estimator. This chapter is organized as follows. In Section 6.1, we introduce some useful notation. In Section 6.2, we introduce methods for choosing the correct smoothing parameter for the usual and modified local linear backfitting estimators of the linear effects of interest in model (2.1). 
These methods require the accurate estimation of the nonparametric component m and the error correlation structure, topics discussed in Section 6.3. Finally, in Section 6.4 we introduce methods for choosing the correct smoothing parameter for the estimated modified local linear backfitting estimators of the linear effects of interest in model (2.1).  101  6.1 Notation In what follows we are interested in the accurate estimation of a linear combination cF 8 of the linear effects 3 in model (2.1), where c = (co, c\,..., c )  is a known vector with  T  v  real-valued components (e.g: c = (0,..., 0,1, 0,..., 0) ). T  Throughout this chapter, we denote fii,s > c  h  and /3~-i  3^,-i c iS  g  generically by 3  c  U  h  in  order to emphasize their dependence upon the amount of smoothing h. We want to choose the amount of smoothing h to accurately estimate c 3 via c 3 T  3~-\  T  n h  . Given that  is conceptually qualitatively different than the other estimators considered here,  we defer its discussion to Section 6.4. In the remainder of this chapter, unless otherwise stated, we assume that Cl stands for J or St . -1  The correct choice of h depends on the conditional bias and variance of c 3 T  n h  given X  and Z. We provide below explicit expressions for these quantities. The exact conditional variance of c 3 T  Va 0n,h\ , r  x  equals c Var(3  \X, Z)c. Expressions for  T  i l h  n  h  ) are found in (4.17) when fl = I and in (5.15) when Cl =  z  c Var(p \X,  Z)c = a c M , *M^ c  T  2  u>h  = Var(h; Cl),  T  a h  h  Thus: (6.1)  where M , n  = (X Cl(I T  f c  The exact conditional bias of c 3 T  - 5 ^ ) X ) - X f i ( I - S ). 1  T  c  h  equals c Bias(3 \X, T  n h  Uh  (6.2)  Z) and can be obtained  from (4.2) when Cl = I or (5.2) when Cl = * : _1  c Bias(/3 \X, T  nA  Z) = c M ,hm T  n  102  = Bias(h; Cl).  (6.3)  6.2  Choosing h for c f3j c and c /3^-i c T  The estimator c ' 3^  h  estimator c 3 T  T  a h  S  depends on the smoothing parameter h. To obtain an accurate  1  c 3  T  >S  of c 3 we choose h so that it minimizes a measure of accuracy of T  U h  . Although the smoothing parameter h quantifies the degree of smoothness of  "in.fti a 'good' value for h should not necessarily be chosen to minimize a measure of accuracy of rfici,h as, in the present context, m is merely a nuisance. Since c 3 T  n h  is generally biased in finite samples, we assess its accuracy via its exact  conditional mean squared error, given X and Z: MSE(h; ft) = Bias(h; ft) + Var(h;  ft).  2  (6.4)  We define the MS ^-optimal amount of smoothing for estimating c 3 via c 3 T  T  a  h  as the  minimizer of MSE: MSE  h  = argmin MSE(h; h  From equations (6.1) and (6.3), one can see that h^  ft).  SE  (6.5)  depends upon the unknown  nonparametric component m as well as the error variance a and the error correlation 2  matrix * , which are typically unknown. Thus, h^  SE  is not directly available.  To date, no methods have been proposed for estimating hf}  SE  when the model errors  are correlated. However, when the model errors are uncorrelated and ft = J , Opsomer and Ruppert (1999) proposed an empirical bias bandwidth selection (EBBS) method for estimating h[j . We describe this method in Section 6.2.1. We propose modifications SB  of the EBBS method to estimate h^  SE  ft = J , but also for ft —  when the errors are correlated not only for  in Section 6.2.2. 
Finally, in Section 6.2.3 we propose a  non-asymptotic plug-in method for estimating h^  SE  in the presence of error correlation  when ft equals I or V& . Each method minimizes an estimator of MSE(-; ft) over h in -1  103  some grid. Throughout, we let H = {h(l),h(N)}  denote the grid, for some integer  N.  6.2.1  Review of Opsomer and Ruppert's EBBS method  In this section, we provide a detailed review of Opsomer and Ruppert's EBSS method. Throughout this section only, we assume that the errors associated with model (2.1) satisfy \& = i". Specifically, we assume that these errors satisfy the assumption: (A6) The model errors e ,i = 1,... ,n, are independent, identically distributed, having t  mean 0 and variance a\ G (0, co). We also consider $"2 = I, so that the results in this section will apply exclusively to 0 , the usual local linear backfitting estimator of c 3. T  Ih  Under the above conditions, the EBBS method attempts to estimate h^  by minimizing  SE  an estimator of MSE(-\ I) over TC, a grid of possible values of h. For a given h(j) G TC, Opsomer and Ruppert find an estimator for MSE(h(j);I)  by combining an empirical  estimator of Bias(h(j); I) with a residual-based estimator of Var(h(j); I). We discuss the details related to computing these estimators below. Opsomer and Ruppert use a higher order asymptotic expression for E(c 0 \X,Z) T  Ih  —  c 0, the exact conditional bias of cFQ , to obtain: T  Ih  T  (6.6) as h —> 0, where a ,t = 1,... ,T, are unknown asymptotic constants referred to as bias t  coefficients. This expression can be obtained by a more delicate Taylor series analysis in (4.3). This yields the approximation: T+l  E(c p \X, T  Iih  Z) = c 0 + J2 ath* + T  t=2  104  o(h ). 1+T  (6.7)  For fixed h(j) € H, Opsomer and Ruppert estimate Bias(h(j)\I), bias of c 3 (j), T  Iih  as follows. They calculate c 3 ^  the exact conditional  for k e {j — ki,..., j + k }, for  T  Ih  2  some ki, k . Note that j must be between ki + 1 and N — k , inclusive. They then fit 2  2  the model: E(c J3 \X,  Z) = a + a -h  Iih  0  to the 'data' j (h(k), c?3 ( ^ Ih  + --- + a  2  T  2  •h  T+1  T+1  (6.8)  : k — j — k\,..., j + A; | using ordinary least squares.  k  2  This results in the fit: E(c 0i, \X,  Z) = a + a -h  T  + --- + a  2  h  0  2  •h . T+1  T+1  (6.9)  An estimator for Bias[h(j); I) is then: Bio7s(h(j);I) = E(c p \X,  Z) - So  T  IMj)  = a • h(j) + • • • + a 2  2  • h(J) .  (6.10)  T+1  T+1  Here, ki, k and T are tuning parameters that must be chosen by the user. We must have 2  k\ + k > T since the T + 1 parameters ao, a i , . . . , ar will be estimated using k\ + k + 1 2  2  'data' points. Opsomer and Ruppert estimate Var(h(j); I), the exact conditional variance of  c 8 ^, T  Ih  by using (6.1) with *ff = I but with a replaced by the following residual-based estimator: 2  ^  _ \\ ~ PiMi) Y  2  ~  X  iMi) 12  m  n  This yields: Var{h{j); I) = ^ M  W  )  M j  W  )  c .  (6.11)  Finally, Opsomer and Ruppert combine (6.10) and (6.11) to obtain the following estimator of MSE(h(j);I),  jfci + 1 < j < N - k : 2  MSE(h(j);I)  = Bia7s(h(j);I)  2  105  + Va7r(h(j); I).  They then estimate hj^ , the minimizer of MSE(-;I), as follows: SE  hMSE=  argmin ki+l<j<N-k  MSE(h(j);I). 2  We see that h^  SE  attempts to estimate h^ , the smoothing parameter which is MSESE  optimal for estimation of c (3. It is not clear however whether using hj^ T  SE  yields a  V^n-consistent estimator of c 0. 
T  The variance estimator Var(h(j); I) in (6.11) depends on the matrix Mj^(j)- To speed computation of M ^,  and hence Var(h(j);I),  I;h  Opsomer and Ruppert suggest the  following. First, take fl = I and h = h(j) in (6.2) and re-arrange to obtain an alternative expression for M  = (x (i T  i  m  - s )xy x (i c  l  - s% )  T  h{j)  U)  = (X (X - S< X))-\X T  - X S ).  T  T  h(j)  Then, compute the product S ^X c  h  c  h{j)  in (6.12) by smoothing each column of X against  the design points Z\,... ,Z . Finally, compute the product X S° ^ T  n  the approximation X S ^ T  T  h  symmetry of S ^y c  h  h  « (S%^X) .  c  (6.12)  in (6.12) by using  This approximation is justified by the near  These computational tricks can also be used to ease the burden  involved in calculating 0i h(j),h(j) t  G Ti, as 8i,h(j)  c  a  n  be easily seen to depend upon  iMJ)-  M  A peculiar feature of the estimator o\ of a\ is that it uses residuals based on the 7  'working' bandwidth h(j) G Ti, instead of a bandwidth optimized for estimation of of. As an alternative to estimating of with the 'working' bandwidth h, Opsomer and Ruppert suggest that one could use residuals based on a bandwidth optimized for estimation of a\ as in Opsomer and Ruppert (1998). For implementing the EBBS method in practice, Opsomer and Ruppert (1999) suggest using a grid size N = 18 and grid values equally spaced on the log scale. They recommend 106  using the following values for the tuning parameters involved in this method: ki = 1, k = 2  2 and T — 1. For situations where MSE(- \ I) is found to have more than one minimum as a function of h, they suggest that one could take h  MSE  to be either the h value where  the global minimum occurs, or the h value where the first local minimum occurs. The authors advise that they found the former approach to be superior to the latter in their simulation studies.  6.2.2  Modifications to the EBBS method  Here we adjust the EBBS method to deal with estimating h^  SE  are correlated and fl = I or Cl — h  MSE  when the model errors  The modified EBBS method attempts to estimate  by minimizing an estimator of MSE(-\Cl)  over the grid Ti. For a given h(j) € H,  this estimator is obtained by combining an empirical estimator of Bias(h(j);Cl)  the  exact conditional bias of c 3  the  t  T  n h  , with a residual-based estimator of Var(h(j);Cl),  exact conditional variance of c 3 . T  Uh  Specifics are provided below.  The modified EBBS method uses a similar bias-estimation scheme to that employed in the EBBS method in order to estimate Bias(h(j);Cl).  This scheme relies on the following  asymptotic bias approximation: E(c p \X, T  n:h  x+i Z) = c 3 + y a h + o(h ), t=2 T  l  t  which parallels (6.7), and yields the estimator  1+T  (6.13)  Bias(h;Cl).  However, the modified EBBS can no longer rely on the residual estimation scheme utilized in the EBBS method for estimating Var(h(j);Cl). Var(h(j);Cl)  The reason for this is that  depends not only on the error variance cr , but also on the error correlation 2  107  matrix * . Forft= J , we propose to estimate Var(h; ft) via: S j ^ M n ^ M j c, if * is known and o f is unknown; h  Var(h; ft) = <  a  c  2  T  CT c Mn,h*iVf^ 2  if * is unknown and o f is known;  M Q ^ M ^ C ,  T  h  c,  (6.14)  if * is unknown and o f is unknown.  For ft = SP , if of is unknown, we propose to estimate Var(h; ft) via: -1  Var(h; ft) = a c 'M ^M 2  T  (6.15)  c.  
T  U  u<h  The estimators in (6.14)-(6.15) have been obtained from (6.1) by substituting o f for o f and * for * whenever appropriate. Details on how to obtain reasonable estimators o f and * are provided in Section 6.3.2. In summary, the modified EBBS method finds the minimizer: h^sE  — argmin ^Bias(h; ft) + Var(h; ft) j = h  _,  2  EBBS  with h G TL = {h(l),...,  L  (6.16)  h(N)}, Bias(h;Q) obtained via the bias-estimation scheme  described earlier, and Var(h;Q) as in (6.14) if ft = I or as in (6.15) if ft = of is unknown. Here, the label 'EBBS method estimates Bias(h;fl)  and  — U denotes the fact that the modified EBBS  by local ordinary least squares regression.  It is possible to estimate Bias(h; ft) by performing global, rather than local, ordinary least squares fitting. Specifically, we can perform just one least squares regression, using the 'data' ^(h(k),c /3 ^ T  nh  : k = 1,..., ivj. We refer to the method that finds the  minimizer of (6.16), with Bias(h;fl)  obtained by global ordinary least squares fitting,  as the global modified EBBS method. We denote the amount of smoothing this method yields by  h£ _ . BBS  G  Before concluding this section, we indicate how the modified EBBS methods can be generalized if one is interested in smoothing parameter selection for accurate estimation of c 0 via the usual or modified local polynomial backfitting estimators. For simplicity, T  in this section only, we denote both of these estimators by c /3 . T  ah  108  The variance-estimation scheme to be used in the generalized modified EBBS methods should be the same as that employed in (6.14)-(6.15). Obviously, the quantities a , M^,h 2  and * involved in these equations should be computed based on locally polynomial regression of degree D > 1, instead of locally linear regression. We conjecture that the bias-estimation scheme would have to rely on the asymptotic approximation T+l  E(c 0 \X,Z) T  Uih  = c 3+  J2  T  a h + o(h ) T  1+T  t  t=D+l  instead of (6.13). Note that we must have T > D.  6.2.3  Plug-in method  In this section we introduce yet another method for estimating the optimal amount of smoothing h$  SE  in the presence of error correlation whenever fi = I or fi = *  the non-asymptotic plug-in method. Recall that h^  SE  _ 1  , namely  was defined as the minimizer of  MSE(-; fi) in (6.4). Thus, we might find a reasonable estimator for h^  SE  by minimizing  an estimator of MSE(-; fi) over a grid of possible values for the smoothing parameter h. We propose estimating MSE(h(j); fi) by assembling plug-in estimators of its exact bias and variance components, Bias(h(j)\fl)  and Var(h(j); fi).  More specifically, we propose to estimate Var(h(j); fi) using (6.14) if fi = J and (6.15) if fi =  and of is unknown. Furthermore, we propose to estimate Bias(h(j) \ f2) using  (6.3), but with m replaced by an accurate estimator m: Bias(h; fi) = c M ,hrn. T  n  (6.17)  Details on how to obtain an accurate estimator m of m are provided in Section 6.3.1. As remarked before, when fi =  M n , / , depends upon the error correlation matrix  Thus, if * is unknown, we must substitute * for * in the expression for JWn.ft) where * is obtained as in Section 6.3.2. 109  Finally, m i n i m i z i n g the estimator of for fl — I, or (6.15) for fl =  MSE(-; fl) obtained by combining (6.17) w i t h (6.14)  and cr unknown, over a grid of possible values for h 2  yields the desired plug-in estimator of h SE M  6.3  h^ :  = argmin {Bias(h(j);fl) hen  SE  2  + Var(h(j);fl)\  J  =h^ _ . 
WG  (6.18)  IN  Estimating m, <j\ and ^  Here we introduce methods for (1) accurately estimating the nonparametric component m  i n model (2.1) i n the presence of error correlation and (2) estimating the variance  of and the correlation m a t r i x \& of the errors associated w i t h model (2.1). E s t i m a t i n g m , of and * is difficult because of the confounding between the linear, non-linear and correlation effects. We hope that the combined way of estimating m , cr and \& proposed 2  i n this thesis w i l l enable us to do well when estimating B.  6.3.1  Estimating m  In this section, we consider the issue of accurately estimating the nonparametric component m  i n model (2.1) when the model errors are correlated.  Recall that we need  an accurate estimation of m for estimating the exact conditional bias of c /3fj h T  m  the  plug-in method i n (6.17). We propose estimating m v i a mn,/,, w i t h fl = I and w i t h h chosen by cross-validation, modified for correlated errors. Throughout this section, we thus consider that fl = I. We also let Xf denote the z  t h  = (1, Xu,...,  X) ip  row of the m a t r i x X i n (2.2).  To assess the accuracy of rrijh as an estimator of m for a given amount of smoothing h,  110  we use the mean average squared error of m j ^ : MASE(h-I)  1  =E  71  2  -^2{fhi (Zi)  - m(Zif)  >h  = E - £(m/ 1  l f c  x,z  ( Z 0 + Xf/3 - m(Zi) --XJ3)  2  2  n  = E -^(Yt-EWXitZij)  x,z  x,z (6.19)  i=l  where yj = rn (Zi) IA  + Xj3 and £?(Yi| JTi, Z ) = m(Zj) + Xjd. We define the MASE4  optimal amount of smoothing for accurate estimation of m via mj/j as: h  M A S E  From (6.19) we can see that h  MASE  = argrain MASE(h; h  (6.20)  I).  depends on the bias and variance associated with  estimating the non-parametric component ra, which in turn depend on m itself. Since in practice m is unknown, h  MASE  To estimate h^ , ASE  has to be estimated.  we propose using the modified (or leave-(2Z +l)-out) cross-validation  method originally formulated by Hart and View (1990) in the context of density estimation and studied by Chu and Marron (1991) and Hardle and Vieu (1992) in the context of nonparametric regression with correlated errors. Aneiros Perez and Quintela del Rio (2001b) recommend modified cross-validation in the context of partially linear models with a-mixing errors. These authors used a version of the Speckman estimator with boundary-adjusted Gasser-Miiller weights to estimate m. The modified cross-validation method estimates h  by minimizing an estimator of  MASE  MASE(h;I): .2  M^E{ha)-\zZ{yth'' - ^ l)  Y  (6.21)  1=1  This estimator is obtained from (6.19) by dropping the outer expectation sign, substituting E(Yi\Xi,  Zi) with Yi, and replacing Yi with Y^ ' \ a prediction of Y$ — Xj3 + 1  111  l  m(Zi) + €i based o n data points (Yj, Xji,...,  Xj , Zj) which are believed to be uncorreP  lated w i t h Yi. M o r e specifically, Y ^ - X j ^ where  0  hV> Ih  and  points (Yj, Xji,...,  fri ~ \z ) hi  I  h  i  +^ i Z i ) ,  are estimators of  (6.22)  0 a n d m(Z\) obtained from the data  Xj , Zj) w i t h j such that \i — j\ > I. T h e estimation procedure used P  for obtaining 0  a n d fhj '  Ih  (Zi) is the same as that utilized for obtaining 0  h  Ih  and  fhi, (Zi). h  R e c a l l that the estimation procedure utilized for obtaining the estimators 0  Ih  rrii h(Zi) :  and  of 0 a n d m is the usual backfitting algorithm, w i t h a (centered) local l i n -  ear smoother m a t r i x i n the smoothing step. 
However, the backfitting algorithm allows us to evaluate fhj~ ' \-) only at Zj's w i t h j such that \i — j\ > I. W e cannot evaluate l l  h  i~h' \')  m  l  &  t Zi- T o overcome this problem, we propose to estimate 0 and m(Zi) as  indicated below. We first carry out the usual backfitting algorithm o n a l l d a t a to obtain the estimator /3  n  h  of 0 using a l l n data points. W e then define the partial residuals:  r  = Yj - Xjp ,  jth  j = l,...,n.  nth  (6.23)  F r o m now on, these residuals w i l l become w o r k i n g responses for the modified crossvalidation and our 'data set' is (rj h, Zj),j = 1 , . . . ,n. F i x i, 1 < i < n. W e temporarily t  remove from the 'data' the (21 + 1) 'data points' (rj h,Zj) t  w i t h \i — j\ < I. W e use  the remaining n — (21 + 1) data points i n a usual local linear regression to obtain the  n — (21 + 1) estimators  fn*Q~ ' 1  h  estimators are not centered. \i — j\ > I from rn*^~ ' \z ) 1  h  l  t  \Zi) a n d m*^~ ' \Zj),  l  %  h  l  w i t h j such that \i — j\ > I. These  Subtracting the average of -Ti^~ ' \Zj), % l  h  w i t h j such that  yields a centered estimator for m ( Z , ) :  ^ (^)=<; ' (^)- - |/_ ,-!>/} E ^ (^)i0  <i )  ii0  #o ;  112  (6-24)  The centering approach used above is admittedly ad-hoc, but nevertheless attempts to address the need of subjecting rn(-) to an appropriate identifiability restriction. Next, we use the estimators in (6.24) in a computationally feasible modified crossvalidation criterion: MCVt(h) = ± £ {n* ~  °(^))  •  2  (' ) 6 25  2=1  Minimizing this criterion yields the desired cross-validation amount of smoothing for accurate estimation of m via mj/j when the model errors are correlated. Note that it is possible to compute a full scale modified cross-validation criterion, by calculating a different estimator of 3 for each i. Specifically, we could replace  B  nh  in the right side of (6.23) with 3 '  , the estimator obtained from all data less those  data points  j  nh  (Yj,Xj\,...  ,Xj , P  Zj)  with  such that \i — j\  However, computing the  < I.  full scale modified cross-validation criterion would be more involved than computing the computationally convenient criterion in (6.25). Given that 3 is easier to estimate than m, we believe that the computational simplification used to estimate 3 will not affect to a great degree the estimation of m. A similar simplification was used by Aneiros Perez and Quintela del Rio (2001b) for their modified cross-validation method. Although we do not have theoretical results that establish the properties of the modified cross-validation method, our simulation study suggests that it has reasonable finite sample performance and that it produces a reliable estimator of m, provided I is taken to be large enough. It is not clear how to best choose I in practice. Recall that I should be specified such that the correlation between  Yi  and (Yj,Xji,...,  is effectively removed when predicting Yi by the value  with \i — j\ <  I,  in (6.22). Choosing an  I  Xj , P  Y~ ' 1  h  1  Zj),  value that is too small may not succeed in removing the correlation between these data values, therefore producing an undersmoothed estimator of m. Choosing an I value that is too large may remove too much of the underlying systematic structure in the data, therefore producing an estimator of m that is oversmoothed. Whenever possible, one 113  should examining a whole range of values for I to gain more understanding about the sensitivity of the final results to the choice of I. 
Our simulation study suggests that small values of I should probably be avoided.  6.3.2  Estimating o f and *  In this section, we propose a method for estimating the variance of and correlation matrix * of the errors associated with model (2.1). The method we propose relies on assumption (A2), that the model errors follow a stationary autoregressive process of unknown, but finite, order. To estimate the order and the corresponding parameters of this process, we apply standard time series techniques to suitable estimators of the model errors. Monte Carlo simulation studies conducted in Chapter 9 indicate that this method performs reasonably well. Assumption (A2) will clearly not be appropriate for all applications. However, we expect it to cover those situations where the errors can be assumed to be realizations of a stationary stochastic process. Indeed, it can be shown that almost any stationary stochastic process can be modelled as an unique infinite order autoregressive process, independent of the origin of the process. In practice, finite order autoregressive processes are sufficiently accurate because higher order parameters tend to become small and not significant for estimation (Bos, de Waele, Broersen, 2002). If the e,'s were observed, we could estimate the order R of the autoregressive process they are assumed to follow by using the finite sample criterion for autoregressive order selection developed by Broersen (2000). This criterion selects the order of the process by achieving a trade-off between over-fitting (selecting an order that is too large) and under-fitting (selecting an order that is too small). Traditional autoregressive order selection criteria either fail to resolve these issues (i.e., the Akaike Information Criterion) or address just the issue of over-fitting (i.e., the corrected Akaike Information Criterion). In addition,  114  Broersen's criterion performs well even when the order of the autoregressive error process is large. After estimating R, we could estimate the error variance a\ and the corresponding autoregressive parameters <PI,--.,<PR by using Burg's algorithm. This algorithm is described, for instance, in Brockwell and Davis (1991). A comparison of various methods for autoregressive parameter estimation has shown that the Burg algorithm is the preferred method (Broersen, 2000). Finally, we could estimate the error correlation matrix \& by replacing 4>i, • • •, 4>R with their estimated values in the expression for \& provided in Comment 2.2.1. For instance, if R was estimated to be 1, we would estimate the  (i,j)  th  element of * as:  where 4>\ is the estimator of the autoregressive coefficient <p\. However, the e,'s are unobserved, so we must first estimate them via suitably defined model residuals and then apply the methodology described above to these residuals in order to obtain the desired estimators of o~\ and * . We propose to estimate the vector of errors e = ( e i , . . . , e ) by the model residuals r  n  ej^ = Y — X8  Ih  — rhih, where h is chosen by modified cross-validation, as described  in Section 6.3.1. As argued in Section 6.3.1, this choice of h is expected to provide an accurate estimator for X8+m,  and therefore a reasonable estimator for e =  Y—X8—m.  
For those applications where the reasonableness of assumption (A2) is questionable, we believe that one could still use the modified cross-validation residuals to estimate the model errors, since the modified cross-validation method does not rely on explicitly incorporating the error correlation structure. For instance, under the more general assumption  115  (Al), one could estimate a and * = (^ij) from  e  2  Ith  =  . . . ,?„) as follows: T  n  —2  1  n-|»-j|  *U  ^  =  ^ £  -fori ^ j  However, we do not pursue this approach in this thesis.  6.4  Choosing h for c (3~-i „„ T  We conclude this chapter by discussing the choice of smoothing parameter h for the estimated modified local linear backfitting estimator we denote 8~-i  g c  c /3~-i T  . As indicated in Section 6.1,  by B^-i to emphasize its dependence upon h. Our theoretical goal is h  to choose values of h which minimize measures of accuracy for cF 8~-\ introduced for c 8  and c 8^,-i .  T  and Var(h;  \I*  h  )=  c Bias(8^-i T  \X, Z)  AX, Z)c, we wish to choose the value of h that min-  ) = c Var(3~-i T  imizes the quantity MSE(h;ty value by h .  Namely, if Bias(h;^  T  I h  similar to those  ), obtained by taking fl — \&  in (6.4). Denote this  In practice, we have to estimate this value from the data. The dif-  MSE  ficulty that we face is that, since * is estimated and thus random, an expression for MSE(h\ * ) is not tractable. 1  To avert this issue, we ignore the effect of estimating \& and simply replace * by * in the expression for MSE{h-y- ). 1  conditions, 3^,-1  h  and 8^-\  h  We have seen earlier in this thesis that, under certain are asymptotically 'close', so we expect our approach to  be reasonable for large sample sizes. We propose to choose h using suitable modifications of the EBBS and plug-in methods discussed in Sections 6.2.2 and 6.2.3. The global and local modified EBBS methods for  116  choosing the smoothing parameter h of c /3~-i  attempt to estimate  T  = argmin{Bias(h\^ h  h-MSE  1  hff : SE  ) + Var(/x;$  (6.26)  w i t h h €H- For b o t h methods, V a r ( / i ; * *) is computed by substituting \& w i t h * into the expression of Var(h; *  _ 1  ) , the exact conditional variance of B^-\ . T h i s expression h  is obtained by t a k i n g ft = Bias(h; *  i n (6.1). T h e global modified E B B S method estimates  ) empirically by fitting a global ordinary least squares regression model to the  'data' points j (ji(k),c P^-i  ^ : k = I,...,  T  h  this method yields by h% _ . BBS  N*j. W e denote the amount of smoothing  T h e local modified E B B S method uses only a fraction  G  of these data to accomplish the same task. We denote the amount of smoothing supplied -i  by this method by  h  _.  EBBS  L  T h e plug-in method for choosing the value of h i n /3^-i m a t i o n to)  tries to estimate (an approxi-  h^ : SE  hftSE  Here, Var(h; *  —  argmin{Bias(h; * *) + Var(h; * *)} = h ' hen  _ .  >  P L UG  IN  (6.27)  ) is as above, and Bias(h; * *) is constructed by substituting * w i t h *  into the expression of Bias(h; \ & ) , the exact conditional bias of _ 1  is obtained by t a k i n g ft — \I>~ i n (6.3). 1  117  j3^,-i . h  T h i s expression  Chapter 7 Confidence Interval Estimation and Hypothesis Testing In this chapter, we develop statistical methods for assessing the magnitude and statistical significance of a linear combination of linear effects c 3 T  (co,... ,c )  T  p  in model (2.1), where c =  is a known vector with real valued components. 
Specifically, we propose  several confidence intervals for assessing the magnitude of c 3, as well as several tests T  of hypotheses for testing whether c 3 is significantly different than some known value of T  interest.  7.1  Confidence Interval Estimation  We propose to construct approximate 100(1 — a)% confidence intervals for c 3 from the T  usual, modified or estimated modified local linear backfitting estimators considered in this thesis, and their associated estimated standard errors. In what follows, we use the notation in Section 6.1 to denote these estimators generically by C 3Q t  H  where Cl can be  ,  -l  I, *  or *  , respectively, and h is an amount of smoothing that must be chosen from  the data. Our confidence intervals use an estimated standard error SE(c f3  ) obtained  T  n  118  h  as follows: SE(c 0 , ) T  n h  Here, Var(h;Cl)  = ^Va7r{h-Cl).  (7.1)  the conditional variance of c 8  is an estimator of Var(h;Cl),  given  T  nh  X and Z. Specifically, for Cl — I, Var(h; Cl) is defined as in (6.14). For Cl =  if of is  unknown, Var(h; Cl) is defined as in (6.15). Finally, for Cl = * \ Var(h; Cl) is obtained from (6.15) by replacing * with *  . Note that the standard error expression in (7.1)  does not account for the estimation of of and * when these quantities are unknown, nor does it account for the data-dependent choice of h. Rather, it is a purely 'plug-in' expression. The performance of a 100(1 — a)% confidence interval for c 3 depends to a great extent T  on how well we choose the smoothing parameter h of the estimator c 8  .  T  n  choice of h can affect the mean squared error of c 3  A poor  h  , resulting in a confidence interval  T  n  h  with poor coverage and/or length properties. We want to choose an h for which (i) the bias of c 3  nh  is small, so the interval is centered near the true c 3, and (ii) the  variance of c 3  nh  is small, so the interval has small length. Choosing h to ensure that  T  T  T  the confidence interval is valid (in the sense of achieving the nominal coverage) and has reasonable length is crucial to the quality of inferences about c 3. T  In this thesis, we choose the amount of smoothing h needed for constructing confidence intervals for c 3 via the following data-driven choices of h, introduced in Chapter 6: T  1. the (local) modified EBBS choice,  h} _\  2. the global modified EBBS choice,  h  E  BBS  _;  EBBS  3. the (non-asymptotic) plug-in choice,  L  G  hp _ . LUG  IN  Recall that each of these choices is expected to yield an accurate estimator c 3 T  a  h  of  c 3. Throughout the rest of this chapter, unless otherwise specified, we assume that the T  119  smoothing parameter h of the estimator c 3 T  h  u  [ l  PLUG-IN'  n h  refers to any of h  _,  EBBS  h  L  _  EBBS  or  G  The performance of a 100(1 — a)% confidence interval for c /3 also depends on how well T  we estimate SE(c 3 ),  the true standard error of c 3 .  will estimate SE(c /3 )  by SE(c 0 )  T  nh  T  nih  As already mentioned, we  T  nh  as defined in (7.1). Recall that  T  nh  SE(c 0 ) T  nh  depends on another smoothing parameter, needed for estimating \fr via  as described  in Section 6.3.2. It is not clear whether the modified cross-validation choice of smoothing proposed in Section 6.3.2 yields a reasonable estimator of SE(c 3 )T  nh  The Monte Carlo  simulations presented in Chapter 8 will shed more light on this issue. 
The standard 100(1 — a)% confidence interval for c 3 is given by T  c 0 j ±z SE(<?0 ),  (7.2)  T  n l  where z /  a 2  a/2  nth  is the 100(1 — a)% quantile of the standard normal distribution. According  to the asymptotic results in this thesis, the estimator c 3 7  U h  is biased in finite samples.  Consequently, the standard confidence interval for cF8 may not be correctly centered and may not provide 1 — a coverage. We propose two strategies for dealing with this problem. One strategy is to perform a bias adjustment to the estimator c 8 , T  n<h  to try  to ensure that the confidence interval is better centered. This approach, referred to as bias-adjusted confidence interval construction, is discussed in Section 7.1.1. Another strategy is to perform an adjustment to the estimated standard error of c 3 . T  nh  purpose of this adjustment is to inflate the estimated standard error of c 3 T  n  the bias of c 8 T  n  h  h  The  to reflect  . This approach, referred to as standard error-adjusted confidence  interval construction, is discussed in Section 7.1.2. Throughout, we assume we can use standard normal probability tables to construct the confidence interval in (7.2) and those proposed in Sections 7.1.1 - 7.1.2. This assumption is justified provided the estimator c / 3 T  nh  is asymptotically normal and our standard  error estimators are consistent. Opsomer and Ruppert (1999) established the asymptotic 120  normality of the estimator c 3 T  and Cl — I.  U h  for the case when the model errors are uncorrelated  However, no asymptotic n o r m a l i t y results are available as yet for the  cases when the model errors are correlated, for either Cl = I or more general Cl. T h e simulations conducted i n Chapter 8 support the use of normal tables when constructing 95% confidence intervals. Note that, for small sample sizes, one might widen the confidence intervals b y using ttables instead of standard n o r m a l probability tables. T h e issue of how one might specify the degrees of freedom involved i n these t-tables needs to be considered carefully a n d is beyond the scope of this thesis.  7.1.1  Bias-Adjusted Confidence Interval Construction  T h e idea underlying the bias-adjusted confidence interval estimation of c 0 T  adjust the estimator c 0 T  n h  is t o first  for possible finite sample bias effects. T h e n a bias-adjusted  100(1 — a)% confidence interval for c 0 is given by: T  c 3 -5ia^(c 3 T  T  n A  where Bias(c '3^ )  )±^/25^(c 3 ), n A  estimates the finite sample conditional bias of c 0  T  T  h  U  and Z, a n d is defined either as i n (6.17) for h = hp _ , LUG  hjsBBS-G  a  n  (7.3)  r  a f e  dh = h  _.  BBBS  L  IN  h  , given X  or as i n (6.10) for h =  Neither of these bias expressions takes into account the  data-dependent choice of h. Furthermore, these bias expressions do not account for the estimation of * when Cl = *  \  T h e length of the bias-adjusted confidence interval for c 0 i n (7.3) is the same as that T  of the standard confidence interval i n (7.2). T h e coverage properties of the bias-adjusted confidence interval may, however, be better t h a n those of the standard confidence interval, because the bias-adjusted confidence interval m a y be better centered.  121  Note that the estimated standard error SE(c  8 )  T  c 8 T  , instead of the variability of c 8 T  n h  place SE(c P  i n (7.3) reflects the variability of  nh  — Bias(c  8  T  n h  ). 
One could, of course, re-  n  h  ) by a n estimator of the true standard error of c 8  T  T  n  h  — Bias(c  8  T  n h  n  ).  h  B u t such an estimator may be difficult to obtain i n practice, unless one resorts to computationally expensive bootstrapping methods, and may not necessarily yield a confidence interval w i t h better coverage properties t h a n those of the standard confidence interval.  7.1.2  Standard Error-Adjusted Confidence Interval Construction  We have suggested i n Section 7.1.1 that the standard confidence interval for c 8 i n T  (7.2) can be improved upon by replacing c 8  w i t h its bias-adjusted version c 0  T  Bias(c 'dfih.)T  —  T  n h  n h  A n o t h e r possible way t o improve u p o n the standard confidence interval  in (7.2) is to replace SE(c  T  8 )  with  Uh  MSE(c 8 ),  the square root of the estimated  T  nh  conditional mean squared error of c 8  given X and Z. T h e m o t i v a t i o n for this latter  T  n h  adjustment is that, compared to SE(c P ),  \J MSE(c  T  the uncertainty associated w i t h estimating c 8 v i a C 8Q T  finite sample bias of c 8 T  n h  8 )  T  nh  is a better measure of  nh  , as it tries to account for the  t  H  .  A standard error-adjusted 100(1 — a)% confidence interval for c 8 is given by: T  c 3 ,, ± z T  n  ^MSE(c (3 )  (7.4)  T  a/2  n>h  where 2  MSE{&8^ )=\Bias{h-n)\ h  + [SE(h;Cl)\  Here, Bias(h; Cl) estimates the conditional bias of c 8 T  either as i n (6.17) for h = hp _ , LUG  IN  n h  12  .  (7.5)  given X and Z, and is defined  or as i n (6.10) for h = h  _  EBBS  G  and h =  h  _.  EBBS  L  Note that the length of the standard error-adjusted confidence interval for c 8 i n (7.4) T  is wider t h a n that of the standard confidence interval i n (7.2) due to the fact that 122  \JMSE(c 3 )  > SE(c '3a,h)-  T  T h i s may translate into improved coverage proper-  T  nh  ties for the standard error-adjusted confidence interval.  7.2 Hypothesis Testing In this section, we exploit the duality between confidence interval estimation a n d hypothesis testing to develop tests of hypotheses for  c 8. T  Suppose we are interested i n testing the null hypothesis  Ho : c 3 = 6 T  (7.6)  against the alternative hypothesis  H : A  c 3^6,  (7.7)  T  where 5 is a constant. F r o m the confidence intervals introduced i n Section 7.1, we construct three test statistics for testing HQ against H : A  Z £ = ^ t~ , ' S E ^ B ^ n  n  6  _ cp -  Bias(h;fl)  T  (2) Z  (7.8)  h  ^  U)h  SE c^ )  =  {  '  h  =  • y/MSE(cTf3 ) nth  W e w i l l reject H at significance level a i f \Z^l\ 0  123  - 6  > z /2a  ( 7  '  9 )  (7-io)  Chapter 8 M o n t e Carlo Simulations In this chapter we report the results of a Monte Carlo study on the finite sample properties of estimators and confidence intervals for the linear effect f3\ in the model: Yi = /?o + PiXi + m(Zi) + ei, i = l,...,n,  (8.1)  obtained by taking p = 1 in (2.1). Even though this model is not too complicated, we hope that it will allow us to understand how the properties of these estimators and confidence intervals will be affected by (1) dependency between the Xi's and the Zi's, and (2) correlation amongst the e;'s. For our study, we have deliberately chosen to use a context similar to that considered by Opsomer and Ruppert (1999) for independent ej's, so that we can make direct comparisons. Given this context, the main goals of our simulation study were to: 1. Compare the expected log mean squared error (MSE) of the estimators for Pi. 2. 
Compare the performance of the confidence intervals for Pi built from these estimators and their associated standard errors. The rest of this chapter is organized as follows. In Section 8.1, we discuss how we generated the data in our simulation study. In Section 8.2, we provide an overview of the 124  estimators for Pi considered in this study. We also specify the methods used for choosing the smoothing parameters of these estimators. In Section 8.3, we compare the expected log mean squared errors (MSE) of the estimators for all simulation settings in our study. Finally, in Sections 8.4 and 8.5, we assess the coverage and length properties of various approximate 95% confidence intervals for Pi constructed from these estimators and their associated approximate standard errors.  8.1  The Simulated Data  The data (Yi,Xi, Zi), i — 1,... ,n, in our simulation study were generated from model (8.1) using a modification of the simulation setup adopted by Opsomer and Ruppert (1999). Specifically, we took the sample size n to be 100 and set the values of the linear parameters Po and Pi to zero. We considered two m(-) functions: • mi(z) = 2sin(3z) - 2(cos(0) - cos(3))/3, z G [0,1]; • m (z) = 2sin(6z) - 2(cos(0) - cos(6))/6, z G [0,1]. 2  The Zi's were equally spaced on [0,1], being defined as Zi = i/(n + 1). Furthermore, Xi = g(Zi) + rji, with g(z) = QAz + 0.3, z G [0,1], and rji = (1 - 0.4)^ - 0.3, where the C/j's were independent, identically distributed having a Unif(0,1) distribution. The €j's followed a stationary AR(1) model with normal distribution: e»- = pet-i + Ui,  (8.2)  where p is an autoregressive parameter quantifying the degree of correlation amongst the ei's. The iij's were independent, identically distributed normal random variables having mean 0 and standard deviation a = 0.5. The Ui's were independent of the e^'s. In u  our simulation study, we used p = 0 to include the case of independence, as well as p — 0.2, 0.4, 0.6 and 0.8 to model positive correlation ranging from weak to strong. 125  T h e simulation settings corresponding to p = 0 (the case of independent errors) are the same as those considered by Opsomer and R u p p e r t (1999), w i t h the following exceptions: (i) we considered n = 100 instead of n = 250, (ii) we 'centered' the m(-) functions, that is, we subtracted a constant so that these functions integrate to 0 over the interval [0,1] and (iii) we scaled the errors 77, to have E{rji) = 0 instead of E(r}i) = 0.3. Opsomer and R u p p e r t d i d not specify what value they used for Pi. For each model configuration, we generated 500 d a t a sets. Note that there are 10 model configurations altogether, one for each combination of autoregressive parameter p and non-linear effect m(-) considered. Figure 8.1 displays data generated from model (8.1) for p — 0, 0.4, 0.8 and mi(z). Figure 8.2 provides the same display for m (z). T h e responses Yi are qualitatively different for 2  different values of p. For p = 0, the responses vary randomly about the m(-) curve. A s p increases from 0.4 to 0.8, the variation of the Yi's about the curve m(-) makes it v i r t u a l l y impossible to distinguish the non-linear signal m(-) from the autoregressive noise that masks it.  8.2  The Estimators  In this section, we provide an overview of the estimators for the linear effect Pi i n model (8.1) considered i n our simulation study. W e also provide an overview of the methods used for choosing the smoothing parameter of these estimators. 
Note that Pi = c 8, where c = ( 0 , 1 ) and 8 = (P , Pi) • T h e estimators of Pi considered T  T  T  0  i n our simulation study are of the form Pi = c 8, where 8 is: T  (i) / 3 c , the usual backfitting estimator defined i n (3.4) w i t h ft — i"; JS  (ii) 8^-1  , the estimated modified backfitting estimator defined i n (3.4) w i t h ft = * ; 'h 126  the usual Speckman estimator defined in (3.4) with fl = (I — S ) •  (iii) 0(j_ cj ^, S  c  T  T  h  In all three estimators, S is a centered smoother matrix, defined in terms of the Epanechh  nikov kernel in (3.9). For the two backfitting estimators, we take S to be a centered c  h  local linear smoother matrix. For the usual Speckman estimator, we take S to be a cenh  tered local constant smoother matrix with Nadaraya-Watson weights. The latter choice is motivated by the fact that the usual Speckman estimator is typically used with local constant smoother matrices with kernel weights. We are not sure to what extent the differences in performance between the usual Speckman estimator and the two backfitting estimators may be due to this difference in the method of local smoothing. Note that /3^-i c, the modified backfitting estimator obtained from (3.4) with fl — S  was omitted from our simulation study. This estimator may have value as a benchmark, but has no practical value due to the fact that the error correlation matrix * is never fully known in applications. For similar reasons, we also omitted 0(i-s° ) '<s>- ,s J the modified T  1  h  h  Speckman estimator obtained from (3.4) with fl = (I — S ) ^~ . c  T  1  h  not included in our study is 0.  CC^TS,-  Another estimator  the estimated modified Speckman estimator  so  1  c  obtained from (3.4) with fl = (I — S ^ ) * 7  . Recall that Aneiros Perez and Quintela  del Rio (2001a) investigated the large sample properties of a similar estimator, based on local constant smoothing with Gasser-Muller weights. These authors have a suggestion for estimating * from the data, but they did not explore how well it works in practice. In our simulation study, the estimator  0~-i„  r  -  which is similar to 0,  r  ^.^.--I^ -  does poorly in general. We believe this may be due to a combination of the following: (1) * is hard to estimate in the presence of confounding between the linear, non-linear and correlation effects and (2) the additional variability introduced by estimating * is not properly taken into account when selecting the smoothing parameter and when constructing standard errors for /3^-i  g c  from small samples. We suspect that, if one were  to use the methods proposed in this thesis to estimate \& for computing 0y_ ^ ~-\ sc  one would also get an estimator with poor finite sample behaviour.  127  T  g c  ,  All three estimators in (i)-(iii) require a data driven choice of smoothing parameter. For the three backfitting estimators we consider EBBS-G and EBBS-L (see Section 6.2.2) and PLUG-IN (see Section 6.2.3). For the usual Speckman estimator, we use cross-validation, modified for correlated errors (MCV) and for boundary effects. The MCV criterion is similar to that in (6.21), namely:  Here, Y^ ^  is obtained as in (6.22), but with Cl = (I—§ ) ,  1  c  T  h  where S is the centered local c  h  constant smoother matrix. Also, W is a weight function introduced to allow elimination (or at least significant reduction) of boundary effects that may affect the estimation of the non-linear effect m in model (8.1), and hence the prediction of Y . 
W is defined as t  in Chu and Marron (1991): ' 5 if I < « < i  W(u) = {  3  5 -  -  5>  ;  0, if 0 < u < | or \ < u < 1. Recall that EBBS-G depends on the tuning parameters I, N and T, whereas EBBS-L depends on the tuning parameters I, N, T, k\ and k . Also, recall that PLUG-IN and 2  MCV depend on the tuning parameter /. In our simulation study, we consider N = 50, T = 2, ki = 5, k = 5, and I = 0,1,..., 10. 2  For convenience, throughout the remainder of this chapter, we use the notation PIJ PLUG-IN^ $u,EBBS-G  a n  d Pu EBBS-L f ° * 1  r  n e  usual local linear backfitting estimators of j3\. We use  the notation P^§M,PLUG-IN^ J^EM,EBBS-G  a  n  d  P EM,EBBS-L {  for  t  h  e  estimated modified lo-  cal linear backfitting estimators. Finally, we use the notation ($ cv *° f r e  M  e r  to the usual  Speckman estimator of (3\. Wherever necessary, we refer to these estimators generically as  128  8.3  The M S E Comparisons  In this section, we identify the estimators /?}' , including bandwidth selection methods, ;  that appear to be best, in the sense of being most accurate for all simulation settings and for most values of /, the tuning parameter used in the modified cross-validation. Recall  ~ (0 that the measure of accuracy of /3i  considered in this thesis is the conditional MSE of  /3[ \ MSE(0[ ), defined in (6.4). Specifics are provided below. l  l)  To compare the accuracy of two estimators for a given simulation setting, we look at the boxplot of differences in the log MSE's of these estimators. If the boxplot is symmetric about 0, then the two estimators have comparable accuracy. We also conduct a level 0.05 two-sided paired t-test to compare the expected log MSE's of the estimators. If the test is significant, we label the boxplot with an S. The log MSE's of the two estimators are evaluated from the 500 data sets generated for the given simulation setting. For each backfitting estimation method (usual, estimated modified), we recommend a way to choose the smoothing parameter h. Then we compare the resulting backfitting estimators, including a comparison with the usual Speckman estimator to determine an estimator that is best, in the sense of being most accurate for all simulation settings and most values of I. In Figures A.1-A.10 in Appendix A, we study the methods of bandwidth choice for the usual local linear backfitting estimator. We display boxplots of pairwise differences in the log MSE's of the estimators PV,PLUG-IN> PU]EBBS-G  a  n  d  PU,EBBS-L> £ = 0,1,..., 10.  Each figure corresponds to a different simulation setting. From these figures, we see that 1$PLUG-IN  a  n  a  PIJEBBS-G  n a v e  comparable accuracy across all simulation settings,  provided I is large enough, say I > 4. They also have better accuracy than @U]EBBS-L> which performs poorly for several simulation settings (see, for instance, Figures A.6A.7). Therefore, we recommend using PLUG-IN and EBBS-G to choose the smoothing parameter for the usual local linear backfitting estimator. 129  Figures A.11-A.20 display the corresponding plots for the estimated modified local linear backfitting estimator. We see that P  _  EM  EBBS  is the most accurate across all simulation  G  settings, provided I is large enough, say I > 4. We also see that PEM EBBS-L PEM,PLUG-IN perform very poorly relative to PEM,EBBS-G f °  r m o s  ^  AN  t simulation settings  and most values of I. Therefore, we recommend using EBBS-G to choose the smoothing parameter for the estimated modified local linear backfitting estimator. 
In Figures A.21-A.30 we compare estimators using our favourite bandwidth selection method. We display boxplots of pairwise differences in the log MSE's of the estimators M]PLUG-IN,  M]EBBS-G> 0EM,EBBS-G  a  n  d  MMCV  1  = 0,1,..., 10. Each figure  corresponds to a different simulation setting. From these figures, we conclude that the estimators P^PLUG-IN^ PU]EBBS-G  a  n  d  P EBBS-G EM  have comparable accuracy for all  simulation settings, provided I is large enough, say I > 4. The estimator P^MCV ^  S  less accurate than these three estimators for most simulation settings and most values of /. In particular, plots such as those in Figures A.24, A.25, A.29 and A.30 strongly support the elimination of P J P  . The poor performance of P^MCV ^ w  M C V  n  respect to  the log MSE criterion could be due to the fact that this estimator uses local constant smoothing, instead of local linear smoothing. But it could also be due to the fact that $3 MCV  I S  computed with an MCV choice of smoothing. Recall that this choice attempts  to estimate the amount of smoothing optimal for estimation of XB + m. It is not clear whether this choice will provide a reliable estimate of the amount of smoothing optimal for estimation of c 3. T  8.4  Confidence Interval Coverage Comparisons  In this section, we assess and compare the coverage properties of various confidence intervals for Pi constructed from all estimators considered in our simulation study. Our goals are to: 130  1. Identify those estimators which yield standard confidence intervals for Pi w i t h good coverage properties across a l l simulation settings and most values of I. 2. E s t a b l i s h whether the coverage properties of standard confidence intervals for Pi can be improved through bias or standard error adjustments.  T o assess the coverage properties of a confidence interval C for a given simulation setting, we proceed as follows. We evaluate the confidence interval for each of the 500 simulated data sets. We calculate the proportion of these intervals which contain the true value of Pi and denote it by p. lfp±  1.96-y/p(l — p)/500, the 95% confidence interval for the  true coverage, contains the nominal level of C , we say that C is valid.  If the upper  (lower) confidence l i m i t is smaller (bigger) t h a n the n o m i n a l level of C , we say that C is anti-conservative (conservative). T h e confidence intervals for Pi considered i n our simulation study fall into three categories: standard, bias-adjusted and standard-error adjusted, as defined i n (7.2), (7.3) and (7.4).  8.4.1  Standard Confidence Intervals  We now assess the coverage properties of the standard 95% confidence intervals for Pi obtained from the estimators PU]PLUG-IN, PEM,EBBS-G>  0EM,EBBS-L  A  N  D  PU,EBBS-G> PU,EBBS-L>  @S!MCV> where I = 0 , 1 , . . . , 10.  0EM,PLUG-IN>  P o i n t estimates and  95% confidence interval estimates for the true coverage achieved by these intervals are displayed i n Figures B.1-B.10 i n A p p e n d i x B . E a c h figure corresponds to a different simulation setting. Figures B.1-B.10 show that the standard confidence intervals constructed from the estimators P ij pLUG-iNi t  @U!EBBS-G  a n o  - 0 S!MCV  a  r  e  v a u  d for all simulation settings provided  the value of I is large enough. 
Figures B.1-B.10 show that the standard confidence intervals constructed from the estimators β̂^(l)_{U,PLUG-IN}, β̂^(l)_{U,EBBS-G} and β̂^(l)_{S,MCV} are valid for all simulation settings provided the value of l is large enough. However, the standard confidence intervals obtained from the estimators β̂^(l)_{U,EBBS-L}, β̂^(l)_{EM,PLUG-IN}, β̂^(l)_{EM,EBBS-G} and β̂^(l)_{EM,EBBS-L} have extremely poor coverage for many simulation settings and for many values of l; see, for instance, Figures B.6 and B.7. In view of these findings, the preferred estimators for constructing standard confidence intervals for β₁ are β̂^(l)_{U,PLUG-IN}, β̂^(l)_{U,EBBS-G} and β̂^(l)_{S,MCV}. The other estimators cannot be trusted to produce valid inferences on β₁. More details concerning our findings are provided below.

The standard confidence intervals constructed from the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} are valid for all simulation settings, provided l is large enough, as shown in Table 8.1. From this table, we see that taking l ≥ 1 when ρ = 0.2, l ≥ 2 when ρ = 0.4, l ≥ 3 when ρ = 0.6, and l ≥ 4 when ρ = 0.8 yields valid intervals for the contexts considered. We recommend using these intervals to conduct inferences on β₁, with values of l that are large enough. Clearly, taking l = 0, 1, 2, 3 is not advised, unless one is certain that ρ is small.

What is not apparent from Table 8.1 is why the confidence intervals constructed from β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} fail to be valid for smaller values of l. Typically, for small l's, the estimates of β₁ constructed from the simulated data have a tendency to underestimate the true value of β₁ when m(z) = m₂(z). Furthermore, the estimated standard errors associated with these estimates have a tendency to underestimate the true standard errors both when m(z) = m₁(z) and when m(z) = m₂(z). However, as l increases, the estimates of β₁ and their associated standard errors improve significantly for all simulation settings.

The standard confidence intervals constructed from the usual Speckman estimator β̂^(l)_{S,MCV} are generally valid across all simulation settings, even for smaller l values. However, β̂^(l)_{S,MCV} does not yield valid confidence intervals when m(z) = m₂(z) and (i) ρ = 0.4 and l = 1 or 4 and (ii) ρ = 0.8 and l = 3, 4, 5, 6, 7, 8 or 10. In these two cases, β̂^(l)_{S,MCV} yields confidence intervals that are slightly anti-conservative. This lack of continuity in behaviour is of concern and might not be attributable to simulation variability. Indeed, Figures B.6-B.10 show that, for m(z) = m₂(z), β̂^(l)_{S,MCV} seems to exhibit an anti-conservative pattern for most l's.

When ρ = 0 and m(z) = m₁(z), the standard confidence intervals obtained from the estimators β̂^(l)_{U,EBBS-L}, β̂^(l)_{EM,PLUG-IN}, β̂^(l)_{EM,EBBS-G} and β̂^(l)_{EM,EBBS-L} provide the nominal coverage, regardless of how we choose l (see Figure B.1). However, when ρ = 0 and m(z) = m₂(z), the intervals constructed from β̂^(l)_{U,EBBS-L} and β̂^(l)_{EM,EBBS-L} are extremely anti-conservative for all values of l (see Figure B.6). In addition, the intervals constructed from β̂^(l)_{EM,PLUG-IN} and β̂^(l)_{EM,EBBS-G} are mildly anti-conservative for many values of l (see Figure B.6).

As ρ increases, the coverage provided by some of the standard confidence intervals obtained from β̂^(l)_{U,EBBS-L}, β̂^(l)_{EM,PLUG-IN}, β̂^(l)_{EM,EBBS-G} and β̂^(l)_{EM,EBBS-L} deteriorates for many small and/or large values of l, depending on the specification of m(·). For instance, when m(z) = m₂(z), the coverage properties of the intervals constructed from β̂^(l)_{EM,PLUG-IN} and β̂^(l)_{EM,EBBS-L} are extremely poor (see Figures B.7-B.10).
The coverage properties of the intervals constructed from β̂^(l)_{U,EBBS-L} are also poor for small ρ values (see Figures B.7-B.8). Finally, the coverage properties of the intervals constructed from β̂^(l)_{EM,EBBS-G} worsen as ρ increases, but not dramatically. We do not recommend using these intervals to carry out inferences on β₁.

8.4.2 Bias-Adjusted Confidence Intervals

In this section, we assess the coverage properties of the bias-adjusted 95% confidence intervals for β₁. We did not consider a bias-adjusted confidence interval for the usual Speckman estimator β̂^(l)_{S,MCV}, as this estimator is known to have good bias properties both when ρ = 0 (see Speckman, 1988) and when ρ > 0 (see Aneiros-Perez and Quintela-del-Rio, 2001a).

Plots (not shown) of the point estimates and 95% confidence interval estimates for the true coverage achieved by the bias-adjusted intervals yield some general conclusions. Only the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} yield bias-adjusted confidence intervals that are valid for all simulation settings provided the value of l is large enough. These values of l are almost identical to those reported in Table 8.1. Again, we see that one should avoid using l = 0, 1, 2, 3 unless one is sure that ρ is small enough.

8.4.3 Standard Error-Adjusted Confidence Intervals

Here, we assess the coverage properties of the standard error-adjusted 95% confidence intervals for β₁. We did not consider a standard error-adjusted confidence interval for the usual Speckman estimator β̂^(l)_{S,MCV}, due to its good bias properties. Plots (not shown) indicate that only the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} provide standard error-adjusted confidence intervals that are valid for all simulation settings provided the value of l is large enough. These values of l are nearly identical to those reported in Table 8.1. Yet again, we see that one should avoid using l = 0, 1, 2, 3 unless one is sure that ρ is small enough.

To sum up, we see no reason to recommend bias adjustments to the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} or to their associated standard errors. Indeed, such adjustments do not seem to improve the coverage properties of the confidence intervals obtained from these estimators.
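The exact forms of the three interval types are given in (7.2)-(7.4) of Chapter 7 and are not reproduced here; the sketch below shows one plausible form of such adjustments (an assumption on our part, not the thesis's precise formulas), with bias_hat denoting an estimate of the smoothing bias of the point estimate:

    import numpy as np

    Z = 1.96  # 97.5% standard normal quantile, for 95% intervals

    def standard_ci(beta1_hat, se):
        return beta1_hat - Z * se, beta1_hat + Z * se

    def bias_adjusted_ci(beta1_hat, se, bias_hat):
        centre = beta1_hat - bias_hat  # recentre at the bias-corrected estimate
        return centre - Z * se, centre + Z * se

    def se_adjusted_ci(beta1_hat, se, bias_hat):
        se_adj = np.sqrt(se ** 2 + bias_hat ** 2)  # inflate the standard error
        return beta1_hat - Z * se_adj, beta1_hat + Z * se_adj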
8.5 Confidence Interval Length Comparisons

Recall from the previous section that we identified β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} as the only estimators of β₁ in our simulation study that yielded valid 95% standard confidence intervals for all simulation settings provided the value of l is large enough. The standard intervals based on β̂^(l)_{S,MCV} were found to be competitive, but just not as good. Also recall that the coverage properties of the standard confidence intervals constructed from β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} could not be improved by performing bias-adjustments to these estimators or to their associated standard errors.

Before recommending either of the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} for practical use, we must compare the lengths of the standard confidence intervals for β₁ constructed from these estimators. We choose to include standard intervals constructed from β̂^(l)_{S,MCV} in our comparison to gain more understanding of their properties. When several confidence interval procedures are valid (in the sense of achieving the desired nominal level), we prefer the one with the shortest length.

In this section, we conduct visual and formal comparisons of the lengths of the standard 95% confidence intervals for β₁ constructed from these estimators. We only consider values of l that are large enough to guarantee the validity of the ensuing confidence intervals, as in Section 8.4. Specifically, we take l ≥ 1 for ρ = 0.2, l ≥ 2 for ρ = 0.4, l ≥ 3 for ρ = 0.6 and l ≥ 4 for ρ = 0.8.

To compare the lengths of two confidence intervals for a given simulation setting, we look at the boxplot of differences in the log lengths of these intervals. The lengths are evaluated from the 500 data sets generated for the given simulation setting. If the boxplot is symmetric about 0, then the two confidence intervals have comparable length.

Figures C.1-C.10 in Appendix C (bottom three rows) display boxplots of pairwise differences in the log lengths of the standard 95% confidence intervals constructed from the estimators β̂^(l)_{U,PLUG-IN}, β̂^(l)_{U,EBBS-G} and β̂^(l)_{S,MCV}. From these figures, we see that for all simulation settings with ρ > 0 and for values of l that are large enough (e.g., larger than 3), the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} yield shorter confidence intervals than those based on β̂^(l)_{S,MCV}. This was to be expected, as the log MSE behaviour of β̂^(l)_{S,MCV} was seen to be inferior to that of β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G}. Furthermore, we notice that the lengths of the confidence intervals constructed from β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} tend to be comparable for many of these l values.

Our previous findings are supported by the results of pairwise level 0.05 two-sided paired t-tests for comparing the expected log lengths of the confidence intervals under consideration, for all simulation settings and for values of l that are large enough. We describe these tests below. Given a simulation setting, for fixed l, we conduct all (3 choose 2) = 3 two-sided paired t-tests comparing the expected log lengths of the intervals obtained from the estimators β̂^(l)_{U,PLUG-IN}, β̂^(l)_{U,EBBS-G} and β̂^(l)_{S,MCV}. For each test, the null hypothesis is that the expected log lengths of the intervals being compared are the same. The test result is considered significant if the p-value associated with the test is smaller than 0.05. We use the results of the t-tests to identify which estimator yields the shortest confidence interval. If all tests give significant results, we claim that there is a clear winner; in other cases, we say that two estimators might be tied for best.

Figures C.1-C.10 (top row) show the average length of the confidence intervals obtained, with standard error bars superimposed. The figures indicate which of these estimators produces the shortest confidence interval for values of l of interest.
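The decision rule used to declare a winner can be sketched as follows (Python; log_lengths is a hypothetical dictionary mapping each estimator's name to its 500 log interval lengths for a fixed l):

    import itertools
    import numpy as np
    from scipy import stats

    def shortest_interval(log_lengths, alpha=0.05):
        names = list(log_lengths)
        all_significant = True
        for a, b in itertools.combinations(names, 2):
            _, p = stats.ttest_rel(log_lengths[a], log_lengths[b])
            all_significant &= (p < alpha)   # every pairwise test must reject
        means = {name: np.mean(vals) for name, vals in log_lengths.items()}
        best = min(means, key=means.get)     # smallest average log length
        return best if all_significant else None  # None: possible tie for best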
8.6 Conclusions

Based on the results of our simulation study, we recommend using the usual local linear backfitting estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} and the usual Speckman estimator β̂^(l)_{S,MCV} to carry out valid inferences about the linear effect β₁ in model (8.1). The value of l used when computing these estimators should be large enough, that is, at least 4. Our findings indicate that β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} have comparable accuracy for large values of l, and that they are in general more accurate than β̂^(l)_{S,MCV}. All three estimators yield valid standard 95% confidence intervals for β₁ when l is large enough. However, the intervals based on β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} tend to have shorter length and are therefore preferred over the interval based on β̂^(l)_{S,MCV}.

We see no reason to recommend bias-adjustments to the estimators β̂^(l)_{U,PLUG-IN} and β̂^(l)_{U,EBBS-G} or to their associated estimated standard errors. Such adjustments do not seem to improve the coverage properties of the corresponding confidence intervals.

Finally, we do not recommend using the usual backfitting estimator β̂^(l)_{U,EBBS-L} or the estimated modified backfitting estimators β̂^(l)_{EM,PLUG-IN}, β̂^(l)_{EM,EBBS-G} and β̂^(l)_{EM,EBBS-L} to carry out inferences about β₁. These estimators yielded confidence intervals with poor coverage for many simulation settings and many values of l, owing to the difficulties associated with estimating their standard errors.

Figure 8.1: Data simulated from model (8.1) for ρ = 0, 0.4, 0.8 and m(z) = m₁(z). The first row shows plots that do not depend on ρ. The second and third rows each show plots for ρ = 0, 0.4, 0.8.

[The panels of Figures 8.1 and 8.2 display X1 vs. Z, m(Z) vs. Z, ε vs. Z and Y vs. Z.]

Figure 8.2: Data simulated from model (8.1) for ρ = 0, 0.4, 0.8 and m(z) = m₂(z). The first row shows plots that do not depend on ρ. The second and third rows each show plots for ρ = 0, 0.4, 0.8.

Table 8.1: Values of l for which the standard 95% confidence intervals for β₁ constructed from the estimators β̂^(l)_{U,PLUG-IN}, β̂^(l)_{U,EBBS-G} and β̂^(l)_{S,MCV} are valid (in the sense of achieving the nominal coverage) for each setting in our simulation study.

m₁(z):

           β̂^(l)_{U,PLUG-IN}   β̂^(l)_{U,EBBS-G}   β̂^(l)_{S,MCV}
ρ = 0      l ∈ {0,...,10}      l ∈ {0,...,10}      l ∈ {0,...,10}
ρ = 0.2    l ∈ {0,...,10}      l ∈ {1,...,10}      l ∈ {0,...,10}
ρ = 0.4    l ∈ {1,...,10}      l ∈ {2,...,10}      l ∈ {0,...,10}
ρ = 0.6    l ∈ {2,...,10}      l ∈ {3,...,10}      l ∈ {0,...,10}
ρ = 0.8    l ∈ {3,...,10}      l ∈ {3,...,10}      l ∈ {0,...,10}

m₂(z):

           β̂^(l)_{U,PLUG-IN}   β̂^(l)_{U,EBBS-G}   β̂^(l)_{S,MCV}
ρ = 0      l ∈ {0,...,10}      l ∈ {0,...,10}      l ∈ {0,...,10}
ρ = 0.2    l ∈ {0,...,10}      l ∈ {1,...,10}      l ∈ {0,...,10}
ρ = 0.4    l ∈ {1,...,10}      l ∈ {2,...,10}      l ∈ {0}∪{2,3}∪{5,...,10}
ρ = 0.6    l ∈ {3,...,10}      l ∈ {3,...,10}      l ∈ {0,...,10}
ρ = 0.8    l ∈ {3,...,10}      l ∈ {4,...,10}      l ∈ {0,1,2}∪{9}

Chapter 9

Application to Air Pollution Data

Many community-level studies have provided evidence that air pollution is associated with mortality. Statistical analyses of data collected in such studies face various methodological challenges: (1) controlling for observed and unobserved factors, such as season and temperature, that might confound the true association between air pollution and mortality, (2) accounting for serial correlation in the residuals which, if ignored, might lead to underestimation of the statistical uncertainty of the estimated association, and (3) assessing and reporting uncertainty associated with the choice of statistical model.

Various statistical models can be used to describe the true association between air pollution and health outcomes of interest based on community-level data. However, the most widely used have been the generalized additive models (GAMs) introduced by Hastie and Tibshirani (1990). These models include a single 'time series' response (e.g. non-accidental mortality rates) and various covariates (e.g. pollutants of interest, time, temperature). The effects of the pollutants of interest on the response are typically presumed to be linear, whereas those of the remaining covariates are presumed to be smooth, non-linear.
Schwartz (1994), Kelsall, Samet and Zeger (1997), Schwartz (1999), Samet, Dominici, Curriero et al. (2000), Katsouyanni, Touloumi, Samoli et al. (2001), Moolgavkar (2000) and Schwartz (2000) are just some of the authors who relied on GAMs in order to assess the acute effects of air pollution on health outcomes such as mortality or hospital admissions.

There are various problems that researchers must consider when using GAMs to analyze air pollution data arising from community-level studies. Some of these problems are purely computational, whereas others are more delicate and pertain to the theoretical underpinnings of these models.

Several computational issues associated with the S-Plus implementation of the methodology developed by Hastie and Tibshirani (1990) for estimation of GAMs have been brought to light in recent years. We describe these problems here. The linear and non-linear effects in GAMs applied to air pollution data have typically been estimated using the S-Plus function gam. Dominici et al. (2002) showed that gam may provide incorrect estimates of the linear effects in GAMs and their standard errors if used with the original default parameters. Although the defaults have recently been revised (Dominici et al., 2002), an important problem that remains is that gam calculates the standard errors of the linear effects by assuming that the non-linear effects are effectively linear, resulting in an underestimation of uncertainty (Ramsay et al., 2003a). In air pollution studies, this assumption is likely inadequate, resulting in underestimation of the standard error of the linear pollutant effect (Ramsay et al., 2003a).

The practical choice of the degree of smoothness of the estimated non-linear confounding effects of time and meteorology variables is a delicate issue in air pollution studies which utilize GAMs. Given that the confounding effects are viewed as a nuisance in such studies, the appropriate choice should be informed by the objective of conducting valid inferences about the pollution effect. Most choices performed in the air pollution literature are based on exploratory analyses (see, for instance, Kelsall, Samet and Zeger, 1997) and seem to be justified by a different objective, namely doing well at estimating the non-linear confounding effects. This objective typically ignores the impact of residual correlation on the choice of degree of smoothness, as well as the dependencies between the various variables in the model.

In the present chapter, we apply the methodology developed in this thesis to analyze air pollution data collected in Mexico City between January 1, 1994 and December 31, 1996. Our goal is to determine whether the pollutant PM10 has a significant short-term effect on the non-accidental death rate in Mexico City after adjusting for temporal and weather confounding. We give a description of the data in Section 9.1 and analyze the data in Section 9.2.

9.1 Data Description

PM10 - airborne particulate matter less than 10 microns in diameter - is a major component of air pollution, arising from natural sources (e.g. pollen), road transport, power generation, industrial processes, etc. When inhaled, PM10 particles tend to be deposited in the upper parts of the human respiratory system, from which they can eventually be expelled back into the throat. Health problems begin as the body reacts to these foreign particles.
PM10 is associated with mortality, exacerbation of airways disease and decrements in lung function. Although PM10 can cause health problems for everyone, certain people are especially vulnerable to its adverse health effects. These "sensitive populations" include children, the elderly, exercising adults, and those suffering from heart and lung disease.

The data to be analyzed in this chapter were collected in Mexico City over a period of three years, from January 1, 1994 to December 31, 1996, in order to determine if there is a significant short-term effect of PM10 on mortality, after adjusting for potential temporal and weather confounders. The data consist of daily counts of non-accidental deaths, daily levels of ambient concentration of PM10 (μg/m³), and daily levels of temperature (°C) and relative humidity (%). The ambient concentration of PM10 corresponding to a given day was obtained by averaging the PM10 measurements over all the stations in Mexico City.

Pairwise scatter plots of the data are shown in Figure 9.1. The most striking features in these plots are the strong annual cycles in the log mortality levels, the daily level of ambient concentration of PM10, and the daily levels of temperature and relative humidity. It is likely that the annual cycles in the log mortality levels are produced by unobserved seasonal factors such as influenza and respiratory infections. Note that log mortality and PM10 peak at the same time with respect to the annual cycles. Our analysis of the health effects of PM10 must account for the potential confounding effect of these temporal cycles on the association between PM10 and log mortality. We believe the strength of these cycles will make it difficult to detect whether this association is significant.

9.2 Data Analysis

The following is an overview of our data analysis. First, we introduce the four statistical models that we use to capture the relationship between PM10 and mortality, adjusted for seasonal and meteorological confounding. Three of these models contain smooth non-parametric terms which attempt to control for these confounding effects. Next, we illustrate the importance of the choice of amount of smoothing for estimating the non-parametric terms in these models when the main objective is accurate estimation of the true association between PM10 and mortality. We then focus on determining which of the four models is most relevant for the data. Finally, we use this model as a basis for carrying out inference about the true association between PM10 and mortality.

9.2.1 Models Entertained for the Data

Let D_i denote the observed number of non-accidental deaths in Mexico City on day i, and let P_i, T_i and H_i denote the daily measures of PM10, temperature and relative humidity, respectively. The models that we entertain for our data are:

log(D_i) = β₀ + β₁P_i + ε_i                                          (9.1)

log(D_i) = β₀ + β₁P_i + m₁(i) + ε_i                                   (9.2)

log(D_i) = β₀ + β₁P_i + m₁(i) + β₂T_i + β₃H_i + β₂₃T_i·H_i + ε_i       (9.3)

log(D_i) = β₀ + β₁P_i + m₁(i) + m₂(T_i, H_i) + ε_i                    (9.4)

Here, i = 1, 2, ..., 1096. Also, m₁ is a smooth univariate function, whereas m₂ is a smooth bivariate surface. The function m₁ serves as a linear filter on the log mortality and PM10 series and removes any seasonal or long-term trends in the data.
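For concreteness, the parametric part of model (9.3) can be encoded as follows (a Python sketch; P, T and H are the daily PM10, temperature and humidity series, and the column order matches the coefficient vector (β₀, β₁, β₂, β₃, β₂₃)):

    import numpy as np

    def design_matrix_93(P, T, H):
        P, T, H = (np.asarray(v, dtype=float) for v in (P, T, H))
        # columns: intercept, PM10, temperature, humidity, interaction;
        # the seasonal effect m1(i) is left to the smoother, not to this matrix
        return np.column_stack([np.ones_like(P), P, T, H, T * H])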
For the time being, the error terms in all four models are assumed to be independent and identically distributed, with mean 0 and constant variance σ²_ε < ∞. The independence assumption will be relaxed later.

Models (9.1)-(9.4) treat the log mortality counts as a continuous response. Furthermore, they assume the relationship between PM10 and log mortality to be linear, to allow for easily interpretable inferences about the effect of PM10 on log mortality. The models differ, however, in their specification of the potential seasonal and weather confounding on this relationship. Specifically, model (9.1) ignores the possible seasonal and weather confounding on the relationship between PM10 and log mortality. Models (9.2)-(9.4), however, allow us to adjust this relationship for potential seasonal and weather confounding. Models (9.2) and (9.3) require that we specify the amount of smoothing needed for estimating m₁. Model (9.4) requires that we specify the amount of smoothing necessary for estimating both m₁ and m₂.

To fit models (9.2)-(9.4) to the data, we use the S-Plus function gam with the more stringent convergence parameters recommended by Dominici et al. (2002). We employ a univariate loess smoother to estimate m₁ and a bivariate loess smoother to estimate m₂. The loess smoothers are local linear smoothers relying on spans corresponding to a fixed number of nearest neighbours instead of a bandwidth.

9.2.2 Importance of Choice of Amount of Smoothing

The inferences made on the linear PM10 effect β₁ in any of the models (9.2)-(9.4) may be severely affected by the choice of amount of smoothing for estimating the smooth confounding effects in these models. To illustrate the impact of this choice on the conclusions of such inferences, we restrict attention to model (9.3). Later, we will see that this model is the most appropriate for the data.

Figure 9.2 compares the impact of various choices of smoothing for the seasonal effect m₁ in model (9.3) on the following quantities:

(i) gam estimates of β₁,
(ii) gam standard errors for the estimates in (i),
(iii) 95% confidence intervals for β₁ constructed from the estimates in (i) and (ii),
(iv) gam p-values associated with standard t-tests of significance of β₁.

These quantities were obtained by fitting model (9.3) to the data using gam with loess as a basic smoother. The loess span used for smoothing m₁ was allowed to take on values in the range 0.01 to 0.50. The reference distribution for the 95% confidence intervals and the p-values depicted in Figure 9.2 is a t-distribution whose degrees of freedom are the residual (or error) degrees of freedom associated with model (9.3). Note that the estimated standard errors reported by gam do not account for error correlation.

Changing the span for smoothing m₁ greatly affects the estimates, standard errors, confidence intervals and p-values in Figure 9.2, and hence the conclusions of our inferences on β₁, the short-term PM10 effect on log mortality. In particular, using large spans for smoothing m₁ suggests that the data provide strong evidence in favour of a significant PM10 effect on log mortality, after adjusting for seasonal and weather confounding. Using small spans for smoothing m₁ suggests that the data do not provide enough evidence in support of a significant PM10 effect on log mortality in Mexico City.
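The quantities (iii) and (iv) can be computed from (i) and (ii) as in the following Python sketch, where fitting at a given span is abstracted by the hypothetical helper fit_93(span), assumed to return the gam estimate of β₁, its standard error and the residual degrees of freedom:

    from scipy import stats

    def t_inference(beta1_hat, se, df_resid, level=0.95):
        # reference distribution: t with the residual (error) df of model (9.3)
        t_crit = stats.t.ppf(0.5 + level / 2.0, df_resid)
        ci = (beta1_hat - t_crit * se, beta1_hat + t_crit * se)
        p_value = 2.0 * stats.t.sf(abs(beta1_hat / se), df_resid)
        return ci, p_value

    # e.g. for span in (0.01, 0.02, ..., 0.50):
    #     beta1_hat, se, df_resid = fit_93(span)   # hypothetical fit
    #     ci, p = t_inference(beta1_hat, se, df_resid)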
Proper choice of the amount of smoothing for estimating the seasonal effect m₁ in model (9.3) is crucial for making inferences on β₁, as seen in Figure 9.2. Given the sensitivity of our conclusions to the choice of smoothing, the natural question that arises is: how can we choose the amount of smoothing so as to be able to make valid inferences on β₁? The correct choice of smoothing should be appropriate for accurate estimation of β₁, not for accurate estimation of m₁. This choice should account for the strong relationships between the linear and non-linear variables in the model seen in Figure 9.1, and for potential correlation amongst model errors.

It is important to note that the S-Plus function gam provides no data-driven method for choosing the amount of smoothing. Using gam's default choice of smoothing is not advised when one is concerned with accurate estimation of β₁. The default choice of smoothing used by gam is 0.50, or 50% of the nearest neighbours. This choice is much larger than the choices that we recommend for estimating m₁ (shown in the next section). The theoretical results in this thesis suggest that the correct choice of smoothing for estimating β₁ should undersmooth the estimated m₁. Therefore, this choice of smoothing is most likely smaller than the one we recommend for estimating m₁, and certainly not larger.

9.2.3 Choosing an Appropriate Model for the Data

In this section, we focus on the issue of selecting an appropriate model for the data amongst models (9.1)-(9.4). Selecting such a model requires that we balance model complexity with model parsimony. In what follows, we show that model (9.3) is the most appropriate for describing the variability in the log mortality counts, as it is complex enough to capture the main features present in the data, yet relatively inexpensive to fit to these data in terms of degrees of freedom.

Model (9.1) is the simplest of models (9.1)-(9.4) and, not too surprisingly given the strong cycles apparent in Figure 9.1, it provides an inadequate description of the variability in the log mortality counts. In fact, the linear relationship between PM10 and log mortality postulated by model (9.1) explains only 9.25% of the total variability in the log mortality counts. Figure 9.3 (top panel) shows that the log mortality counts are widely scattered about the regression line obtained by fitting model (9.1) to the data. Figure 9.3 (bottom panel) shows that model (9.1) displays clear lack-of-fit, as it fails to account for the strong annual cycles present in the model residuals. We therefore drop model (9.1) from our pool of candidate models and concentrate instead on models (9.2)-(9.4).

Model (9.4) is the most complex of these models, and will consume significantly more degrees of freedom when fitted to the data than either model (9.2) or model (9.3). As we shall see shortly, comparing model (9.4) against model (9.2) via a series of approximate F-tests suggests that we can drop model (9.4) in favour of model (9.2). We could therefore consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, given that the weather variables are typically included in models for PM10 mortality data, we prefer to use model (9.3). This model is more flexible than model (9.2), as it includes linear marginal effects for the weather variables together with a linear interaction effect between these variables.
Compared to model (9.2), this model can be fitted to the data at the expense of just three additional degrees of freedom. Given the large size of the data set, this is an insignificant price to pay for achieving more modelling flexibility.

We now provide more details concerning the choice of an appropriate model for our data amongst models (9.2)-(9.4). As a first step, we need to identify spans that are reasonable for smoothing the seasonal effect m₁ in these models.

To identify a reasonable range of spans for smoothing m₁ in model (9.2), we fit model (9.2) to the data by smoothing m₁ with spans ranging from 0.01 to 0.50 in increments of 0.01 and examine plots of the fitted m₁ and corresponding model residuals. From Figures 9.4 and 9.5 we see that the data suggest spans in the range 0.09-0.12. Using spans smaller than 0.09 for estimating m₁ leads to under-smoothed fits that are visually noisy. On the other hand, using spans larger than 0.12 leads to over-smoothed fits that fail to reflect important seasonal features of the data. In summary, the range 0.09-0.12 is reasonable for smoothing the seasonal effect m₁ in model (9.2). Plots of the fitted additive component m₁ in models (9.3) and (9.4) (not shown) corresponding to spans in the range 0.09 to 0.12 are similar to those in Figure 9.4 and suggest that this range is also reasonable for smoothing the seasonal effect m₁ in models (9.3) and (9.4).

We now show that we can reduce model (9.4) to model (9.2). We use a series of approximate F-tests to compare models (9.4) and (9.2). Each F-test compares a fit of model (9.4), obtained by smoothing m₁ with the span s₁ and m₂ with the span s₂, against a fit of model (9.2), obtained by smoothing m₁ with the span s₁. The test statistic for each F-test is obtained in the usual fashion from the residual sums of squares and the residual (or error) degrees of freedom associated with the two model fits. The residual degrees of freedom of each fit are obtained as the difference between the size of the data set, n = 1096, and the trace of the hat matrix associated with the model fit. We allow the span s₁ to range between 0.09 and 0.12 in increments of 0.01, and the span s₂ to range between 0.01 and 0.50 in increments of 0.01.

The p-values associated with these F-tests are displayed in Figure 9.6. P-values corresponding to spans s₂ bigger than 0.04 are quite large, suggesting that the smooth weather surface m₂ need not be included in model (9.4). P-values corresponding to spans s₂ of 0.02, 0.03 or 0.04 are a bit smaller, suggesting that perhaps the surface m₂ should be included in the model. However, Figures 9.7 and 9.8, for s₁ = 0.09, show that very small spans are not appropriate for estimating the surface m₂, as they yield visually rough surfaces that consume unacceptably high numbers of degrees of freedom. Using a span s₁ of 0.10, 0.11 or 0.12 instead yielded plots (not shown) that were basically identical to those in Figures 9.7 and 9.8.

In conclusion, the smooth weather surface m₂ contributes little to model (9.4), so there is no real need to include either temperature or relative humidity in this model. In other words, we can reduce model (9.4) to model (9.2).
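The approximate F-test used above can be sketched as follows (Python; the residual sums of squares and residual degrees of freedom come from the two fits, with the reduced model (9.2) nested in the full model (9.4)):

    from scipy import stats

    def approx_f_test(rss_reduced, df_reduced, rss_full, df_full):
        # df_* are residual degrees of freedom: n minus the trace of the
        # corresponding hat matrix, so df_reduced > df_full
        num = (rss_reduced - rss_full) / (df_reduced - df_full)
        den = rss_full / df_full
        f_stat = num / den
        p_value = stats.f.sf(f_stat, df_reduced - df_full, df_full)
        return f_stat, p_value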
Coplots (not shown) of the residuals associated with model (9.2) versus temperature, given relative humidity, and versus relative humidity, given temperature, support this conclusion.

Since there is no real need to include the weather variables, temperature and relative humidity, we could consider the simpler model (9.2) as being adequate for describing the variability in the log mortality counts. However, for reasons explained earlier, we prefer to use the more flexible model (9.3).

How well does model (9.3) fit the data? To answer this question, we examine a series of diagnostic plots. Figure 9.9 shows plots of the residuals associated with model (9.3) against PM10 and day of study. These residuals were obtained by smoothing the unknown m₁ with a span of 0.09; using spans of 0.10, 0.11 or 0.12 yielded similar plots (not shown). The functional form of the relationship between PM10 and log mortality postulated by model (9.3) is not violated by the data, since no systematic structure is apparent in the plot of residuals versus PM10. The plot of residuals against day of study also shows no systematic structure, suggesting that the seasonal component m₁ of the model accounts for the long-term temporal variation in the data reasonably well. Figures 9.10-9.11 show that the functional specification of the weather portion of model (9.3) is not violated by the data. Indeed, these plots display no obvious systematic structure. The weather coplots corresponding to spans of 0.10, 0.11 and 0.12 were similar, so we omitted them. Finally, Figure 9.12 presents autocorrelation and partial autocorrelation plots for the residuals associated with model (9.3). From these plots, it is apparent that the magnitude of the residual correlation is small. We believe this is due to the fact that most of the short-term temporal variation in the log mortality counts has been accounted for by the seasonal component m₁ of the model. Comparing Figure 9.12 against Figure 9.13, which displays autocorrelation and partial autocorrelation plots for the raw log mortality counts, supports this belief.

In summary, the assumptions underlying the systematic part of model (9.3) seem reasonable. However, there is some modest suggestion that the independence assumption concerning the error terms in this model may not hold for these data. This assumption will be relaxed to account for the slight temporal correlation present in the data. Model (9.3) can therefore be used as a basis for carrying out inferences on β₁, the linear PM10 effect on log mortality, adjusted for seasonal and weather confounding. Accounting for error correlation when conducting such inferences is perhaps not as important as accounting for the strong relationships between the linear and non-linear variables in the model evident in Figure 9.1.
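The autocorrelations plotted in Figures 9.12 and 9.13 are the usual sample quantities; a minimal Python version is:

    import numpy as np

    def sample_acf(x, max_lag):
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        c0 = np.dot(x, x) / len(x)   # lag-0 autocovariance
        return np.array([np.dot(x[:-k], x[k:]) / (len(x) * c0)
                         for k in range(1, max_lag + 1)])

Autocorrelations falling outside roughly ±1.96/√n point to serial correlation worth modelling.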
9.2.4 Inference on the PM10 Effect on Log Mortality

In order to conduct valid inferences about the linear effect β₁ in model (9.3), we must not only estimate it accurately, but also calculate correct standard errors for this estimate. For model (9.3), β₁ = c^T β, where c = (0, 1, 0, 0, 0)^T and β = (β₀, β₁, β₂, β₃, β₂₃)^T. We propose to estimate β₁ via c^T β̂_{I,Ŝc}, where β̂_{I,Ŝc} is the usual local linear backfitting estimate of β. Figure 9.14 displays a plot of c^T β̂_{I,Ŝc} versus the smoothing parameter h, which controls the width of the smoothing window. The large variation in the values of these estimates re-iterates the importance of choosing h appropriately from the data so as to obtain accurate estimates of β₁.

To choose appropriate values of h from the data, we use the preferred PLUG-IN and EBBS-G methods developed in Chapter 6. Both methods use a grid H = {2, 3, ..., 548}, where the values in the grid represent half-widths of local linear smoothing windows. Recall that both of these methods require that we estimate the underlying correlation structure of the model errors. In addition, PLUG-IN requires that we estimate the seasonal effect m₁ in the model. We discuss these topics below.

We estimate the seasonal effect m₁ and the error correlation structure using modified (or leave-(2l+1)-out) cross-validation, as outlined in Sections 6.3.1 and 6.3.2. We allow the tuning parameter l to take on the values 0, 1, ..., 26. Recall that l quantifies our belief about the range and magnitude of the error correlation. For instance, l = 0 signifies that we believe the errors to be independent. When the model errors are truly correlated, we suspect that values of l that are too small may produce under-smoothed estimates of m₁, whereas values of l that are too large may produce over-smoothed estimates of m₁.

To ascertain what values of l are reasonable for the data, we examine plots of the estimated seasonal effect m₁ in model (9.3) corresponding to l = 0, 1, ..., 26; see Figure 9.15. These plots suggest that using l = 0 or l = 1 is probably not appropriate, as the corresponding estimates of m₁ are visually too rough. Using values of l in the range 2-17 seems to yield reasonable estimates of m₁. Values of l in the range 18-26 seem to yield over-smoothed estimates of m₁, so perhaps should be avoided.

Next, we estimate the error terms in model (9.3) via modified (or leave-(2l+1)-out) cross-validation residuals, defined as in Section 6.3.1. Figure 9.16 shows plots of these residuals for various values of l.

Now, we use the modified cross-validation residuals to estimate the correlation structure of the model errors. We will operate under the assumption that these errors follow a covariance-stationary autoregressive process of finite order R. To estimate R, we use the finite sample criterion for autoregressive order selection developed by Broersen (2000). Figure 9.17 shows that our estimate of R is influenced by how we choose the value of the tuning parameter l. Choosing l = 0 or 1 yields an R̂ of 28. Choosing larger l's yields R̂'s like 0, 2, 3 or 4. Recall that values of l like 0, 1 or 18, ..., 26 are likely not appropriate for these data.

Finally, after determining the order R = R(l), l = 0, 1, ..., 26, of the autoregressive error process, we estimate the error variance σ²_ε and the autoregressive parameters φ₁, ..., φ_R using Burg's method (Brockwell and Davis, 1991). Furthermore, we estimate the error correlation matrix Ψ by plugging the estimated values of φ̂₁, ..., φ̂_R into the expression for Ψ provided in Comment 2.2.1.
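Burg's method can be sketched in a few lines (a Python rendering of the standard textbook recursion, see Brockwell and Davis (1991); this is an illustration, not the thesis's own code). The fitted φ̂'s and σ̂²_ε would then be plugged into the expression for Ψ, as described above:

    import numpy as np

    def burg_ar(x, order):
        # AR convention: x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t
        x = np.asarray(x, dtype=float)
        phi = np.zeros(order)
        sigma2 = np.mean(x ** 2)
        f, b = x.copy(), x.copy()   # forward and backward prediction errors
        for m in range(order):
            ff, bb = f[1:], b[:-1]
            k = 2.0 * ff.dot(bb) / (ff.dot(ff) + bb.dot(bb))  # reflection coefficient
            prev = phi[:m].copy()
            phi[:m] = prev - k * prev[::-1]   # Levinson-type coefficient update
            phi[m] = k
            f, b = ff - k * bb, bb - k * ff
            sigma2 *= 1.0 - k * k             # innovation variance shrinks each stage
        return phi, sigma2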
Having estimated the seasonal effect m₁ and the error correlation structure for model (9.3), we can now tackle the issue of data-driven choice of h for accurate estimation of β₁ via c^T β̂_{I,Ŝc}. The estimated bias squared, variance and mean squared error curves used for determining the PLUG-IN choice of smoothing for c^T β̂_{I,Ŝc} are shown in Figure 9.18. The different curves correspond to different values of l, where l = 0, 1, ..., 26. In general, the mean squared error curves corresponding to small values of l dominate those corresponding to large values of l. Figure 9.19 displays similar plots used for determining the EBBS-G choice of smoothing. Note that the bias curve in this figure does not depend on l. Also note that the mean squared error curves in this figure that correspond to large values of l dominate, in general, the curves that correspond to small values of l.

Figures 9.20 and 9.21 display the PLUG-IN and EBBS-G choices of smoothing parameter obtained by minimizing the estimated mean squared error curves in Figures 9.18 and 9.19. Both choices are remarkably stable for values of l that seem appropriate for these data. However, the PLUG-IN choices are much smaller in magnitude than the EBBS-G choices. The PLUG-IN choices that seem appropriate for the data indicate that the seasonal effect m₁ should be smoothed using h ≈ 28. On the other hand, the corresponding EBBS-G choices indicate that m₁ should be smoothed using h ≈ 69.

Figures 9.22 and 9.23 show the 95% confidence intervals constructed for β₁ with PLUG-IN and EBBS-G choices of smoothing for values of l ranging from 0 to 26. These intervals were obtained from formula (7.2). Both figures suggest that the choice of l (among those that are reasonable for the data) is not that important. This finding is consistent with the Monte Carlo simulation study conducted in Chapter 8, which indicated these choices of smoothing were appropriate for conducting inferences on the linear effect β₁ in model (8.1) provided l was large enough.

From Figure 9.22, there is no conclusive proof that β₁, the short-term PM10 effect on log mortality, is significantly different from 0. Indeed, the standard confidence intervals for β₁ based on c^T β̂_{I,Ŝc}, with h chosen via PLUG-IN, cross the zero line for all values of l that are appropriate for the data. The stability of these confidence intervals across various values of l is quite remarkable, but not entirely surprising given the stability of the corresponding PLUG-IN choices of smoothing shown in Figure 9.20.

Figure 9.23 supports the same conclusion for β₁, at least in part. However, for all values of l that are appropriate for the data, these intervals either narrowly miss zero or barely contain it, suggesting that perhaps PM10 does have a significant effect on log mortality. What could explain the discrepancy between Figures 9.22 and 9.23? The standard errors of the estimated PM10 effects are comparable in both figures. However, the PM10 effect estimates obtained with a PLUG-IN choice of smoothing are much smaller than those obtained with EBBS-G. As seen in Figures 9.20 and 9.21, the PLUG-IN choices of smoothing parameter for these data are about 28 or so, and are much smaller than the EBBS-G choices, which are about 69 or so. Figure 9.14 shows that using choices of smoothing parameter h of 28 or so yields smaller PM10 estimates than using values of h of 69 or so.

We favour the smaller choices of smoothing parameter. We believe EBBS-G yielded large choices because it used a grid range that was too wide.
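Whatever the method used to build the estimated curves, the final selection step is the same: minimize the estimated mean squared error over the grid. A minimal Python sketch (bias2_hat and var_hat are hypothetical arrays holding the estimated squared bias and variance of c^T β̂_{I,Ŝc} at each half-width in the grid):

    import numpy as np

    def select_h(H, bias2_hat, var_hat):
        mse_hat = np.asarray(bias2_hat) + np.asarray(var_hat)
        return np.asarray(H)[np.argmin(mse_hat)]

    # full grid used in the text: H = np.arange(2, 549);
    # the narrower grid np.arange(2, 101) corresponds to Figure 9.24

Note that the selected h can only be as trustworthy as the estimated bias curve over the portion of the grid where the minimum occurs, which is precisely the concern with the wide EBBS-G grid.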
Recall that EBBS-G attempts to estimate the conditional bias of c^T β̂_{I,Ŝc} by assuming a specific form for the relationship between this bias and the smoothing parameter h. This relationship is motivated by asymptotic considerations as in (6.13), so it may break down for values of h ∈ H that are too large. Estimating this relationship based on all the "data" {(h, c^T β̂_{I,Ŝc}(h)) : h ∈ H} may therefore not be appropriate. One should perhaps use only "data" for which h is reasonably small, to ensure the asymptotic considerations underlying EBBS-G are valid. In other words, one should use a smaller grid range for EBBS-G. We used EBBS-G with a grid H = {2, ..., 100} instead of H = {2, ..., 548} and got a similar result to that obtained via PLUG-IN (see Figure 9.24): there is no conclusive proof that PM10 has a significant effect on log mortality. This finding is not surprising given the strength of the annual cycles present in Figure 9.1.

Figure 9.1: Pairwise scatter plots of the Mexico City air pollution data.

[Figure 9.2 comprises four panels, each plotted against the span (0.0 to 0.5): estimated PM10 effects, estimated standard errors, 95% confidence intervals and p-values.]

Figure 9.2: Results of gam inferences on the linear PM10 effect β₁ in model (9.3) as a function of the span used for smoothing the seasonal effect m₁: estimated PM10 effects (top left), associated standard errors (top right), 95% confidence intervals for β₁ (bottom left) and p-values of t-tests for testing the statistical significance of β₁.

Figure 9.3: The top panel displays a scatter plot of log mortality versus PM10. The ordinary least squares regression line of log mortality on PM10 is superimposed on this plot. The bottom panel displays a plot of the residuals associated with model (9.1) versus day of study.

[Figure 9.4 comprises panels for spans 0.01, 0.05, 0.09, 0.10, 0.11, 0.12, 0.15, 0.25, 0.35 and 0.50.]

Figure 9.4: Plots of the fitted seasonal effect m₁ in model (9.2) for various spans. Partial residuals, obtained by subtracting the fitted parametric part of the model from the responses, are superimposed as dots.

[Figure 9.5 comprises panels for the same spans as Figure 9.4.]

Figure 9.5: Plots of the residuals associated with model (9.2) for various spans.
[Figure 9.6 comprises four panels of p-values plotted against the span used for smoothing m₂ (0.0 to 0.5).]

Figure 9.6: P-values associated with a series of crude F-tests for testing model (9.4) against model (9.2).

Figure 9.7: Plots of the fitted weather surface m₂ in model (9.4) when the fitted seasonal effect m₁ (not shown) was obtained with a span of 0.09. The surface m₂ was smoothed with spans of 0.01 (top left), 0.02 (top right), 0.03 (bottom left) or 0.04 (bottom right).

[Figure 9.8 plots degrees of freedom (roughly 50 to 300) against the span.]

Figure 9.8: Degrees of freedom consumed by the fitted weather surface m₂ in model (9.4) versus the span used for smoothing m₂ when the fitted seasonal effect m₁ (not shown) was obtained with a span of 0.09.

[Figure 9.9 comprises two residual plots, one against PM10 and one against day of study.]

Figure 9.9: Plot of residuals associated with model (9.3) versus PM10 (top row) and day of study (bottom row). The span used for smoothing the unknown m₁ in model (9.3) is 0.09.

Figure 9.10: Plot of residuals associated with model (9.3) versus relative humidity, given temperature. The span used for smoothing the unknown m₁ in model (9.3) is 0.09.

Figure 9.11: Plot of residuals associated with model (9.3) versus temperature, given relative humidity. The span used for smoothing the unknown m₁ in model (9.3) is 0.09.

[Figure 9.12 shows autocorrelations and partial autocorrelations, roughly between -0.2 and 0.2, for lags up to 250.]

Figure 9.12: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the residuals associated with model (9.3). The span used for smoothing the unknown m₁ in model (9.3) is 0.09.

[Figure 9.13 shows autocorrelations and partial autocorrelations, roughly between -0.6 and 0.6, for lags up to 250.]

Figure 9.13: Autocorrelation plot (top row) and partial autocorrelation plot (bottom row) of the responses in model (9.3).

[Figure 9.14 plots the estimate against the smoothing parameter.]

Figure 9.14: Usual local linear backfitting estimate of the linear PM10 effect in model (9.3) versus the smoothing parameter.

[Figures 9.15 and 9.16 display, respectively, the estimated seasonal effect m₁ in model (9.3) and the modified cross-validation residuals, for various values of l.]

Figure 9.17: Estimated order of the AR process describing the serial correlation in the residuals associated with model (9.3) versus l, where l = 0, 1, ..., 26. Residuals were obtained by estimating m₁ with a modified (or leave-(2l+1)-out) cross-validation choice of amount of smoothing.

[Figure 9.18 comprises three panels plotted against the smoothing parameter (0 to about 200).]

Figure 9.18: Estimated bias squared, variance and mean squared error curves used for determining the plug-in choice of smoothing for the usual local linear backfitting estimate of β₁. The different curves correspond to different values of l, where l = 0, 1, ..., 26. The estimated variance curves corresponding to small values of l are dominated by those corresponding to large values of l when the smoothing parameter is large. In contrast, the estimated squared bias and mean squared error curves corresponding to small values of l dominate those corresponding to large values of l when the smoothing parameter is large.
Figure 9.19: Estimated bias squared, variance and mean squared error curves used for determining the global EBBS choice of smoothing for the usual local linear backfitting estimate of β₁. The different curves correspond to different values of l, where l = 0, 1, ..., 26. The curves corresponding to large values of l dominate those corresponding to small values of l.

Figure 9.20: Plug-in choice of smoothing for estimating β₁ versus l, where l = 0, 1, ..., 26.

Figure 9.21: Global EBBS choice of smoothing for estimating β₁ versus l, where l = 0, 1, ..., 26.

Figure 9.22: Standard 95% confidence intervals for β₁ based on local linear backfitting estimates of β₁ with plug-in choices of smoothing. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents confidence intervals corresponding to values of l that are reasonable for the data.

Figure 9.23: Standard 95% confidence intervals for β₁ based on local linear backfitting estimates of β₁ with global EBBS choices of smoothing. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents intervals corresponding to values of l that are reasonable for the data; the intervals corresponding to l = 3, ..., 7 do not cross the horizontal line passing through zero.

Figure 9.24: Standard 95% confidence intervals for β₁ based on local linear backfitting estimates of β₁ with global EBBS choices of smoothing obtained by using a smaller grid range. The different intervals correspond to different values of l, where l = 0, 1, ..., 26. The shaded area represents confidence intervals corresponding to values of l that are reasonable for the data.

Chapter 10

Conclusions

In this chapter, we provide an overview of the research problem considered in this thesis. We then outline the main contributions of this thesis and summarize the contents of each chapter. Finally, we suggest possible extensions to our work.

Partially Linear Models

Partially linear models are flexible tools for analyzing data from a variety of applications. They generalize linear regression models by allowing one of the variables in the model to have a non-linear effect on the response.

Inferences on the Linear Effects in Partially Linear Models

In many applications, the primary focus is on conducting inferences on the linear effects β in a partially linear model. In these applications, the non-linear effect m in the model is treated as a nuisance. This nuisance effect is a double-edged sword - while it affords greater modelling flexibility, it is also more difficult to estimate than the linear effects and, as such, it complicates the inferences on these effects.

Inferential Goals

Depending on the application, various goals could be relevant to the problem of conducting inferences on the linear effects in a partially linear model with correlated errors. One goal would be to choose the correct amount of smoothing for accurately estimating the linear effects.
One would hope that the methodology used for making this choice produces an amount of smoothing for which the linear effects are estimated at the 'usual' parametric rate of 1/n - the rate that would be achieved if the non-linear effect were known. Another goal would be to construct valid standard errors for the estimated linear effects. An additional goal would be to use the estimated linear effects and their associated standard errors to construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects, possibly adjusting for smoothing bias. Little has been done in the literature to address this goal.

Research Questions Concerning the Inferential Goals

Various research questions emerge in connection with the inferential goals listed above:

1. How can we choose the correct amount of smoothing for accurate estimation of the linear effects?

2. How can we estimate the correlation structure of the model errors for conducting inferences on the linear effects?

3. How can we construct valid standard errors for the estimated linear effects?

4. How can we construct valid confidence intervals and tests of hypotheses for assessing the magnitude and statistical significance of the linear effects?

5. What is the impact of the choice of amount of smoothing on the validity of the confidence intervals and tests of hypotheses?

6. Could inefficient estimates of the linear effects provide valid inferences?

Thesis Contributions

The major contributions of this thesis to the research questions stated above are: (1) defining sensible estimators of the linear and non-linear effects in partially linear models with correlated errors, (2) deriving explicit expressions for the asymptotic conditional bias and variance of the proposed estimators of the linear effects, (3) developing data-driven methods for selecting the appropriate amount of smoothing for accurate estimation of the linear effects, (4) developing confidence interval and hypothesis testing procedures for assessing the magnitude and statistical significance of the linear effects of main interest, (5) studying the finite-sample properties of these procedures, and (6) applying these procedures to the analysis of an air pollution data set. These contributions are discussed in more detail below.

The estimators we proposed in this thesis are backfitting estimators relying on locally linear regression, which is known to possess attractive theoretical and practical properties. Many of the backfitting estimators proposed in the literature on partially linear regression models with correlated errors rely on locally constant regression, a method that does not enjoy the good properties of locally linear regression.

In Chapters 4 and 5 of this thesis, we studied the large-sample behaviour of the estimators of linear effects introduced in this thesis as the width of the smoothing window used in locally linear regression decreases at a specified rate, and the number of data points in this window increases. Specifically, we obtained explicit expressions for the conditional asymptotic bias and variance of these estimators. Our asymptotic results are important as they show that, in the presence of correlation between the linear and non-linear variables in the model, the bias of the estimators of the linear effects can dominate their variance asymptotically, therefore compromising their √n-consistency.
This problem can be remedied, however, by selecting an appropriate rate of convergence for the smoothing parameter of the estimators. This rate is slower than the rate that is optimal for estimation of the non-linear effect, and as such it 'undersmooths' the estimated non-linear effect.

Selecting the appropriate amount of smoothing for the estimators of the linear effects is a crucial problem, which is complicated by the presence of error correlation and dependencies between the linear and non-linear components of the model. Our theoretical results indicate that the amount of smoothing that is 'optimal' for estimating the non-linear effect is not 'optimal' for estimating the linear effects. Data-driven methods devised for accurate estimation of the non-linear effect will likely fail to yield a satisfactory choice of smoothing for estimating the linear effects. In this thesis, we proposed three data-driven smoothing parameter selection methods. Two of these methods are modifications of the EBBS method of Opsomer and Ruppert (1999) and rely on the asymptotic bias results derived in this thesis. The third method is a non-asymptotic plug-in method. Our methods fill a gap in the literature on partially linear models with correlated errors, as they are designed specifically for accurate estimation of the linear effects. These methods 'undersmooth' the estimated non-linear effect because they attempt to estimate the amount of smoothing that is MSE-optimal for estimating the linear effects, not the amount of smoothing that is MSE-optimal for estimating the non-linear effect. Our theoretical results suggest that, in general, the amount of smoothing that is MSE-optimal for estimating the linear effects is smaller than the amount of smoothing that is MSE-optimal for estimating the non-linear effect.

The issue of conducting valid inferences on the linear effects in a partially linear model with correlated errors is inter-connected with the appropriate choice of smoothing for estimating these effects. Most literature results devoted to this issue use choices of smoothing that 'do well' for estimation of the non-linear effect and are deterministic. Such choices may not be satisfactory when one wishes to 'do well' for estimation of the linear effects, and hence have little practical value in such contexts. The confidence interval and hypothesis testing procedures proposed in this thesis are constructed with data-driven choices of smoothing. They are either standard, bias-adjusted or standard error-adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature on partially linear models. The inferential procedures we introduced in this thesis do not account for the uncertainty associated with the fact that the choice of smoothing is data-dependent and the error correlation structure is estimated from the data. However, simulations indicate that several of these procedures perform reasonably well for finite samples.

In Chapter 8, we conducted a Monte Carlo simulation study to investigate the finite sample properties of the linear effects estimators proposed in this thesis, namely, the usual and estimated modified local linear backfitting estimators. We also compared the properties of these estimators against those of the usual Speckman estimator.
The issue of conducting valid inferences on the linear effects in a partially linear model with correlated errors is interconnected with the appropriate choice of smoothing for estimating these effects. Most results in the literature devoted to this issue use choices of smoothing that 'do well' for estimation of the non-linear effect and are deterministic. Such choices may not be satisfactory when one wishes to 'do well' for estimation of the linear effects, and hence have little practical value in such contexts. The confidence interval and hypothesis testing procedures proposed in this thesis are constructed with data-driven choices of smoothing. They are either standard, bias-adjusted or standard-error-adjusted. To our knowledge, adjusting for bias in confidence intervals and tests of hypotheses has not been attempted in the literature on partially linear models. The inferential procedures we introduced in this thesis do not account for the uncertainty associated with the fact that the choice of smoothing is data-dependent and the error correlation structure is estimated from the data. However, simulations indicate that several of these procedures perform reasonably well for finite samples.

In Chapter 8, we conducted a Monte Carlo simulation study to investigate the finite sample properties of the linear effects estimators proposed in this thesis, namely the usual and estimated modified local linear backfitting estimators. We also compared the properties of these estimators against those of the usual Speckman estimator. In our simulation study, we chose the smoothing parameter of the backfitting estimators using the data-driven methods developed in Chapter 6. By contrast, we chose the smoothing parameter of the usual Speckman estimator using cross-validation, modified for correlated errors (MCV) and for boundary effects. The main goals of our simulation study were (1) to compare the expected log mean squared error of the estimators and (2) to compare the performance of the confidence intervals built from these estimators and their associated standard errors. Our study suggested that the usual local linear backfitting estimator should be used in practice, with either a global modified EBBS or a non-asymptotic plug-in choice of smoothing. To ensure the validity of the inferences based on this estimator and its associated standard error, one should never use small values of l in the modified (or leave-(2l+1)-out) cross-validation criterion utilized in estimating the error correlation structure. Adjusting these inferences for possible bias effects did not affect the quality of our results. The quality of the inferences based on the estimated modified local linear estimator was poor for many simulation settings, owing to the fact that the associated standard errors were too variable. The quality of the inferences based on the Speckman estimator was reasonable for most simulation settings, but not as good as that of the inferences based on the usual local linear backfitting estimator.

In Chapter 9, we used the inferential methods developed in this thesis to assess whether the pollutant PM10 had a significant short-term effect on log mortality in Mexico City during 1994-1996, after adjusting for temporal trends and weather patterns. Our data analysis suggested that there is no conclusive proof that PM10 had a significant short-term effect on log mortality. Our data analysis differs from standard analyses in that it relies on objective methods to adjust this effect for temporal confounding.

Further Work to be Done

As usual, there is further work to be done. The following are just a few of the issues that need additional investigation. Proofs of the asymptotic normality of the linear effects estimators proposed in this thesis are still pending. These proofs will provide formal justification for using standard confidence intervals and tests of hypotheses based on these estimators and their associated standard errors. Further investigation into the appropriate choice of l in the modified cross-validation criterion used in estimating the error correlation structure is also needed. This choice should take into account the range and magnitude of the error correlation; a schematic of the leave-(2l+1)-out criterion is sketched below.
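Concretely, the generic form of the criterion reads as follows (our reading of the standard construction for time-series smoothing, as in Hardle and Vieu, 1992, and Chu and Marron, 1991; the version used in Chapter 8 additionally corrects for boundary effects). The Epanechnikov kernel and the local linear fit are illustrative assumptions.

```python
import numpy as np

def mcv_score(y, z, h, l):
    """Leave-(2l+1)-out cross-validation score for a local linear fit of y on z.

    The fit at z[i] withholds observations i-l, ..., i+l (the data are assumed
    ordered in time), so that positively correlated neighbouring errors cannot
    reward near-interpolation; l = 0 recovers ordinary leave-one-out CV.
    """
    n = len(y)
    sq_err = np.full(n, np.nan)
    for i in range(n):
        keep = np.abs(np.arange(n) - i) > l              # drop 2l+1 nearest points
        d = z[keep] - z[i]
        k = np.maximum(0.75 * (1.0 - (d / h) ** 2), 0.0)  # Epanechnikov kernel
        s1, s2 = np.sum(k * d), np.sum(k * d ** 2)
        w = k * (s2 - d * s1)                             # local linear weights
        if np.sum(w) > 0:                                 # skip empty windows
            sq_err[i] = (y[i] - np.dot(w / np.sum(w), y[keep])) ** 2
    return np.nanmean(sq_err)

# Pick h for a given l by minimizing the criterion over a grid, e.g.:
# h_grid = np.linspace(0.05, 0.5, 20)
# h_mcv = h_grid[np.argmin([mcv_score(y, z, h, l=3) for h in h_grid])]
```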
Possible Extensions to Our Work

The work in this thesis can be extended in various directions. First, we could extend the partially linear model considered in this thesis by allowing additional univariate smooth terms to enter the model. Such models arise frequently in practical applications, so developing inferential methodology for them is important. To carry out inferences on the linear effects in such models, we would need to simultaneously choose the amounts of smoothing for estimating all the non-linear effects. These amounts should be appropriate for accurate estimation of the linear effects and should account for correlation between the linear and non-linear variables and correlation between the model errors.

Second, we could extend the partially linear model considered in this thesis to responses that are not continuous. For instance, the responses could follow a Poisson distribution. Incorporating correlation in such models could be a challenge.

Third, we could extend the partially linear model considered in this thesis by allowing the non-linear variable to be a spatial coordinate, in which case m is a spatial effect. Such a model is termed a spatial partially linear model. Clearly, in many contexts, the errors would be correlated. Spatial partially linear models with correlated errors can be used, for instance, to analyze spatial data observed in epidemiological studies of particulate air pollution and mortality. Typically, in these applications, the linear effects β are of main interest, while the spatial effect m is treated as a nuisance. Ramsay et al. (2003b) considered spatial partially linear models with uncorrelated errors and estimated β and m using the S-Plus function gam with loess as a smoother. They used gam's default choice of smoothing to control the degree of smoothness of the estimated m. They showed via simulation that the correlation between the linear and spatial terms in the model can lead to underestimation of the true standard errors associated with the estimated linear effects, both when using S-Plus standard errors and so-called asymptotically unbiased standard errors. They cautioned that using such standard errors can compromise the validity of inferences concerning the linear effects, but did not propose a solution for alleviating this problem. Their findings highlight the fact that carrying out inferences on the linear effects in spatial partially linear models with uncorrelated errors is challenging in the presence of correlation between the linear and spatial terms in the model. Obviously, error correlation will further compound the challenges involved in conducting valid inferences on the linear effects in spatial partially linear models. Of course, this work would be relevant in the non-spatial context as well.

Bibliography

[1] Aneiros Perez, G. and Quintela del Rio, A. (2001a). Asymptotic properties in partial linear models under dependence. Test, 10, 333-355.

[2] Aneiros Perez, G. and Quintela del Rio, A. (2001b). Modified cross-validation in semiparametric regression models with dependent errors. Communications in Statistics: Theory and Methods, 30, 289-307.

[3] Aneiros Perez, G. and Quintela del Rio, A. (2002). Plug-in bandwidth choice in partial linear models with autoregressive errors. Journal of Statistical Planning and Inference, 100, 23-48.

[4] Bos, R., de Waele, S. and Broersen, P.M.T. (2002). Autoregressive spectral estimation by application of the Burg algorithm to irregularly sampled data. IEEE Transactions on Instrumentation and Measurement, 51, 1289-1294.

[5] Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods. Second Edition. New York: Springer-Verlag.

[6] Broersen, P.M.T. (2000). Finite sample criteria for autoregressive order selection. IEEE Transactions on Signal Processing, 48, 3550-3558.

[7] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models (with discussion). Annals of Statistics, 17, 453-555.

[8] Chatfield, C. (1989). The Analysis of Time Series: An Introduction. Fourth Edition. New York: Chapman and Hall.

[9] Chu, C.-K. and Marron, J.S. (1991). Comparison of two bandwidth selectors with dependent errors. Annals of Statistics, 19, 1906-1918.
[10] David, B. and Bastin, G. (2001). An estimator of the inverse covariance matrix and its application to ML parameter estimation in dynamical systems. Automatica, 156, 99-106.

[11] Dominici, F., McDermott, A., Zeger, S.L. and Samet, J.M. (2002). On the use of generalized additive models in time-series studies of air pollution and health. American Journal of Epidemiology, 156, 193-203.

[12] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1983). Nonparametric estimates of the relation between weather and electricity demand. Technical report, U.C. San Diego.

[13] Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. The Journal of the American Statistical Association, 81, 310-320.

[14] Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. The Annals of Statistics, 21, 196-216.

[15] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. New York: Chapman and Hall.

[16] Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20, 2008-2036.

[17] Francisco-Fernandez, M. and Vilar-Fernandez, J.M. (2001). Local polynomial regression with correlated errors. Communications in Statistics: Theory and Methods, 30, 1271-1293.

[18] Gasser, T. and Müller, H.G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics, 11, 171-185.

[19] Green, P., Jennison, C. and Seheult, A. (1985). Analysis of field experiments by least squares smoothing. Journal of the Royal Statistical Society, Series B, 47, 299-315.

[20] Hastie, T.J. and Tibshirani, R.J. (1990). Generalized Additive Models. New York: Chapman and Hall.

[21] Härdle, W. and Vieu, P. (1992). Kernel regression smoothing of time series. Journal of Time Series Analysis, 13, 209-232.

[22] Heckman, N.E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society, Series B, 48, 244-248.

[23] Ibragimov, I.A. and Linnik, Y.V. (1971). Independent and Stationary Sequences of Random Variables. Groningen: Wolters-Noordhoff.

[24] Katsouyanni, K., Touloumi, G., Samoli, E., et al. (1997). Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA2 project. Epidemiology, 12, 521-531.

[25] Kelsall, J.E., Samet, J.M. and Zeger, S.L. (1997). Air pollution and mortality in Philadelphia, 1974-1988. American Journal of Epidemiology, 146, 750-762.

[26] Moolgavkar, S. (2000). Air pollution and hospital admissions for diseases of the circulatory system in three U.S. metropolitan areas. Journal of the Air & Waste Management Association, 50, 1199-1206.

[27] Moyeed, R.A. and Diggle, P.J. (1994). Rate of convergence in semiparametric modelling of longitudinal data. Australian Journal of Statistics, 36, 75-93.

[28] Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141-142.

[29] Opsomer, J.D. and Ruppert, D. (1998). A fully automated bandwidth selection method for fitting additive models. The Journal of the American Statistical Association, 93, 605-620.
[30] Opsomer, J.D. and Ruppert, D. (1999). A root-n consistent estimator for semiparametric additive modelling. Journal of Computational and Graphical Statistics, 8, 715-732.

[31] Ramsay, T., Burnett, R. and Krewski, D. (2003a). The effect of concurvity in generalized additive models linking mortality and ambient air pollution. Epidemiology, 14, 18-23.

[32] Ramsay, T., Burnett, R. and Krewski, D. (2003b). Exploring bias in a generalized additive model for spatial air pollution data. Environmental Health Perspectives, 111, 1283-1288.

[33] Rice, J.A. (1986). Convergence rates for partially splined models. Statistics and Probability Letters, 4, 203-208.

[34] Robinson, P.M. (1988). Root-n-consistent semiparametric regression. Econometrica, 56, 931-954.

[35] Samet, J.M., Dominici, F., Curriero, F., et al. (2000). Fine particulate air pollution and mortality in 20 U.S. cities: 1987-1994 (with discussion). New England Journal of Medicine, 343, 1742-1757.

[36] Schwartz, J. (1994). Nonparametric smoothing in the analysis of air pollution and respiratory illness. The Canadian Journal of Statistics, 22, 471-488.

[37] Schwartz, J. (1999). Air pollution and hospital admissions for heart disease in eight US counties. Epidemiology, 10, 17-22.

[38] Schwartz, J. (2000). Assessing confounding, effect modification, and thresholds in the associations between ambient particles and daily deaths. Environmental Health Perspectives, 108, 563-568.

[39] Schick, A. (1996). Efficient estimation in a semiparametric additive regression model with autoregressive errors. Stochastic Processes and their Applications, 61, 339-361.

[40] Schick, A. (1999). Efficient estimation in a semiparametric additive regression model with ARMA errors. Stochastic Processes and their Applications, 61, 339-361.

[41] Speckman, P.E. (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B, 50, 413-436.

[42] Sy, H. (1999). Automatic bandwidth choice in a semiparametric regression model. Statistica Sinica, 9, 775-794.

[43] Truong, Y.K. (1991). Nonparametric curve estimation with time series errors. Journal of Statistical Planning and Inference, 28, 167-183.

[44] Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings of the 50th Anniversary Conference of the Iowa State Statistical Laboratory (H.A. David, ed.). Iowa State University Press, 205-235.

[45] Watson, G.S. (1964). Smooth regression analysis. Sankhya A, 26, 359-372.

[46] You, J. and Chen, G. (2004). Block external bootstrap in partially linear models with nonstationary strong mixing error terms. The Canadian Journal of Statistics, 32, 335-346.

[47] You, J., Zhou, X. and Chen, G. (2005). Jackknifing in partially linear regression models with serially correlated errors. Journal of Multivariate Analysis, 92, 386-404.

Appendix A

MSE Comparisons

In this appendix, we provide plots to help assess and compare the MSE properties of the estimators of the linear effect β1 in model (8.1) that were discussed in Section 8.2.
[Figures A.1 to A.30: the plots did not survive text extraction; their captions are consolidated below.]

Figures A.1 to A.10 show boxplots of pairwise differences in log MSE for the estimators β̂_{U,PLUG-IN}, β̂_{U,EBBS-G} and β̂_{U,EBBS-L} of the linear effect β1 in model (8.1), where l = 0, 1, ..., 10. Figures A.11 to A.20 show the corresponding boxplots for the estimators β̂_{EM,PLUG-IN}, β̂_{EM,EBBS-G} and β̂_{EM,EBBS-L}, and Figures A.21 to A.30 those for the estimators β̂_{U,PLUG-IN}, β̂_{U,EBBS-G}, β̂_{EM,EBBS-G} and β̂_{S,MCV}. In all figures, boxplots for which the average difference in log MSE is significantly different from 0 at the 0.05 level are labeled with an S, and differences were obtained by evaluating the log MSEs of the estimators for 500 data sets simulated from model (8.1) with the following settings:

Figures A.1 to A.5: m(z) = m1(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures A.6 to A.10: m(z) = m2(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures A.11 to A.15: m(z) = m1(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures A.16 to A.20: m(z) = m2(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures A.21 to A.25: m(z) = m1(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures A.26 to A.30: m(z) = m2(z), with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.

Appendix B

Validity of Confidence Intervals

In this appendix, we provide plots that help assess and compare the coverage properties of various methods for constructing standard 95% confidence intervals for β1, the linear effect in model (8.1). For each method, we visualize point estimates and 95% confidence interval estimates for the true coverage achieved by that method.
[Figures B.1 to B.10: the plots did not survive text extraction; their captions are consolidated below.]

Each of Figures B.1 to B.10 shows point estimates (circles) and 95% confidence interval estimates (segments) for the true coverage achieved by seven different methods for constructing 95% confidence intervals for the linear effect β1 in model (8.1): USUAL + PLUG-IN, USUAL + EBBS-G, USUAL + EBBS-L, MODIFIED + PLUG-IN, MODIFIED + EBBS-G, MODIFIED + EBBS-L and SPECKMAN + MCV. Each method depends on a tuning parameter l = 0, 1, ..., 10. The nominal coverage of each method is indicated via a horizontal line, and the methods with superior MSE performance are highlighted. Estimates were obtained with the following settings:

Figures B.1 to B.5: m(z) = m1(z) = 2sin(3z) - 2(cos(0) - cos(3))/3, with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Figures B.6 to B.10: m(z) = m2(z) = 2sin(6z) - 2(cos(0) - cos(6))/6, with ρ = 0, 0.2, 0.4, 0.6 and 0.8, respectively.
Appendix C

Confidence Interval Length Comparisons

In this appendix, we provide plots that help assess and compare the length properties of three methods for constructing standard 95% confidence intervals for β₁, the linear effect in model (8.1). These methods rely on the estimators β̂_{1,U-PLUG-IN}, β̂_{1,U-EBBS-G} and β̂_{1,S-MCV} and their associated standard errors. We remind the reader that the finite sample properties of these estimators were investigated via simulation in Chapter 8.

All ten figures share the same layout. The top row shows, for each of U-PLUG-IN, U-EBBS-G and S-MCV, the average length of the standard confidence intervals as a function of l, highlighting the interval with the shortest expected length at each l among the three methods. The bottom three rows show boxplots of the pairwise length differences U-EBBS-G minus U-PLUG-IN, S-MCV minus U-PLUG-IN and S-MCV minus U-EBBS-G. Throughout, m₁(z) = 2 sin(3z) − 2(cos(0) − cos(3))/3 and m₂(z) = 2 sin(6z) − 2(cos(0) − cos(6))/6.

[Figure C.1 appears here: panels U-PLUG-IN, U-EBBS-G and S-MCV; shortest expected length highlighted for each l ≥ 0.]

Figure C.1: Top row: Average length of the standard confidence intervals for the linear effect β₁ in model (8.1) as a function of l = 0, 1, ..., 10. Standard error bars are attached. Bottom three rows: Boxplots of pairwise differences in the lengths of the standard confidence intervals for β₁. Boxplots for which the average difference in lengths is significantly different from 0 at the 0.05 level are labeled with an S. Lengths were computed with ρ = 0 and m(z) = m₁(z).

[Figure C.2 appears here: shortest expected length highlighted for each l ≥ 1.]

Figure C.2: As Figure C.1, but with lengths computed with ρ = 0.2 and m(z) = m₁(z).

[Figure C.3 appears here: shortest expected length highlighted for each l ≥ 2.]

Figure C.3: As Figure C.1, but with lengths computed with ρ = 0.4 and m(z) = m₁(z).

[Figure C.4 appears here: shortest expected length highlighted for each l ≥ 3.]

Figure C.4: As Figure C.1, but with lengths computed with ρ = 0.6 and m(z) = m₁(z).

[Figure C.5 appears here: shortest expected length highlighted for each l ≥ 4.]

Figure C.5: As Figure C.1, but with lengths computed with ρ = 0.8 and m(z) = m₁(z).

[Figure C.6 appears here: shortest expected length highlighted for each l ≥ 0.]

Figure C.6: As Figure C.1, but with lengths computed with ρ = 0 and m(z) = m₂(z).

[Figure C.7 appears here: shortest expected length highlighted for each l ≥ 1.]

Figure C.7: As Figure C.1, but with lengths computed with ρ = 0.2 and m(z) = m₂(z).

[Figure C.8 appears here: shortest expected length highlighted for each l ≥ 2.]

Figure C.8: As Figure C.1, but with lengths computed with ρ = 0.4 and m(z) = m₂(z).

[Figure C.9 appears here: shortest expected length highlighted for each l ≥ 3.]

Figure C.9: As Figure C.1, but with lengths computed with ρ = 0.6 and m(z) = m₂(z).

[Figure C.10 appears here: shortest expected length highlighted for each l ≥ 4.]

Figure C.10: As Figure C.1, but with lengths computed with ρ = 0.8 and m(z) = m₂(z).
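The quantities displayed in these figures are simple summaries over simulation replicates: each standard 95% interval has length 2 × 1.96 × SE(β̂₁), the top rows plot the average length per method with a standard error bar, and a boxplot is labeled with an S when the average pairwise difference in lengths differs significantly from 0 at the 0.05 level. The following is a minimal sketch of how such summaries could be reproduced; it is not the code used in the thesis. The arrays of standard errors are hypothetical, and the S label is computed here with a paired t-test, one natural choice since the appendix does not name the specific test.

```python
import numpy as np
from scipy import stats

Z_975 = 1.959964  # standard normal quantile for a 95% interval

def interval_length(se_beta1):
    """Length of a standard 95% CI, beta1_hat +/- z * SE, i.e. 2 * z * SE."""
    return 2.0 * Z_975 * np.asarray(se_beta1)

def summarize(lengths):
    """Average length per method and a standard error for that average,
    as plotted (with error bars) in the top rows of Figures C.1-C.10."""
    lengths = np.asarray(lengths)
    avg = lengths.mean(axis=0)
    se = lengths.std(axis=0, ddof=1) / np.sqrt(lengths.shape[0])
    return avg, se

def s_label(len_a, len_b, alpha=0.05):
    """Paired comparison (method A minus method B): mean length difference
    and whether it differs significantly from 0 -> the 'S' label."""
    diff = np.asarray(len_a) - np.asarray(len_b)
    _, p_value = stats.ttest_1samp(diff, popmean=0.0)
    return diff.mean(), p_value < alpha

# Hypothetical standard errors: 500 replicates, three methods
# (columns: U-PLUG-IN, U-EBBS-G, S-MCV).
rng = np.random.default_rng(0)
se_hat = rng.normal(loc=[0.28, 0.29, 0.30], scale=0.01, size=(500, 3))
lengths = interval_length(se_hat)

avg, se = summarize(lengths)
mean_diff, significant = s_label(lengths[:, 2], lengths[:, 1])  # S-MCV minus U-EBBS-G
print(avg.round(3), se.round(4), round(mean_diff, 3), significant)
```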
