UBC Theses and Dissertations

Linear mixed effects models in functional data analysis. Wang, Wei (2008)

Linear Mixed Effects Models in Functional Data Analysis

by

Wei Wang

B.Sc., University of Science and Technology of China, 2001
M.Sc., National University of Singapore, 2003

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

The Faculty of Graduate Studies (Statistics)

The University of British Columbia
January, 2008

© Wei Wang 2008

Abstract

Regression models with a scalar response and a functional predictor have been extensively studied. One approach is to approximate the functional predictor using basis function or eigenfunction expansions. In the expansion, the coefficient vector can be either fixed or random. A random coefficient vector is also known as a vector of random effects, and the resulting regression model then falls within a mixed effects framework. The random effects provide a model for the within-individual covariance of the observations, but they also introduce an additional parameter into the model: the covariance matrix of the random effects. This additional parameter complicates the covariance matrix of the observations, and the covariance parameters of the model are possibly not identifiable.

We study identifiability in normal linear mixed effects models. We derive necessary and sufficient conditions for identifiability, in particular, conditions of identifiability for regression models with a scalar response and a functional predictor that use random effects.

We study the regression model using the eigenfunction expansion approach with random effects. We assume that the random effects have a general covariance matrix and that the observed values of the predictor are contaminated with measurement error. We propose methods of inference for the regression model's functional coefficient. As an application of the model, we analyze a biological data set to investigate the dependence of a mouse's wheel running distance on its body mass trajectory.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Statement of Co-Authorship

1 Introduction
  1.1 Summary of the biology experiment
  1.2 Review of smoothing methods
  1.3 Functional regression models
  1.4 Introduction to the thesis work
Bibliography

2 Identifiability discussion of linear mixed effects models
  2.1 Introduction
  2.2 Motivation
  2.3 Simple sufficient conditions of identifiability
  2.4 Sufficient conditions of identifiability for a structured Σε
  2.5 Extensions
Bibliography

3 Linear mixed models for measurement error in functional regression
  3.1 Introduction
  3.2 Notation and Preliminaries
  3.3 Parameter estimation
    3.3.1 Initial estimates of parameters other than µ and β0
    3.3.2 Updating Σx(t)
    3.3.3 Updating σε2(t)
    3.3.4 Updating β(t) and σe2(t)
  3.4 Inference for β
    3.4.1 Hypothesis testing for β
  3.5 Model assumption checking
  3.6 Model application
    3.6.1 Choosing the basis functions
    3.6.2 Data description
    3.6.3 Choice of basis functions
    3.6.4 Estimation and residual analysis
    3.6.5 Inference for β(·)
    3.6.6 Inference for βs − βc
  3.7 Simulation study
    3.7.1 Two stage estimate
    3.7.2 One sample comparison
    3.7.3 Two sample comparison
    3.7.4 Edge effect discussion in one-sample MSE comparison
Bibliography

4 Dependence of average lifetime wheel-running on body mass ontogeny
  4.1 Introduction
  4.2 Methods
  4.3 Results
  4.4 Discussion
Bibliography

5 Conclusions and future plan
  5.1 Models with correlated individuals
  5.2 Proposing models with a functional response
    5.2.1 Models of a functional response
    5.2.2 Outline of the model study
    5.2.3 Literature review
Bibliography

Appendices

A Appendix to Chapter 2
  A.1 Proof of Corollary 2.4.1
  A.2 Proof of Corollary 2.4.2

B Appendix to Chapter 3
  B.1 Definition of the first differential
  B.2 Definition of the second differential
  B.3 Matrix algebraic and differential rules
  B.4 Calculations in Section 3.3.2
  B.5 Calculations in Section 3.3.4

List of Tables

3.1 P-values of the test Ho: β(t) = 0 for all t ∈ [−1, 60], in each group.
3.2 P-values of the test Ho: βs(t) = βc(t) for all t ∈ [−1, 60], within each gender.
4.1 P-values of the test Ho: β(t) = 0 for all t ∈ [−1, 60], using test statistic Uf.
4.2 P-values of the test Ho: βs(t) = βc(t) for all t ∈ [−1, 60], using test statistic Uf.

List of Figures

3.1 Plots of log body mass versus week for the four groups of laboratory mice.
3.2 Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice.
3.3 Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components of the four groups of mice, after the individual average log body mass has been removed.
3.4 The constant function and the first three smoothed eigenfunctions of the covariance of centered log body mass of the four groups of mice, calculated as described in Section 3.6.3.
3.5 Residual plots of the fit of Yi = averaged wheel running for the four groups of mice.
3.6 Plots of residuals of Yi = averaged wheel running in the male control group of mice, before and after removing the outlier.
3.7 Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice, as described in Section 3.4.
3.8 Comparison of the permuted values of the generalized likelihood ratio statistic Ul with the Wald statistic Uw for the four groups of mice. The equivalence of Ul and Uw was observed in Table 3.1.
3.9 Plots of β̂s − β̂c and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders.
3.10 MSE of the estimate of β for each γ value in the one sample simulation described in Section 3.7.2, comparing the ECME method with the two stage method.
3.11 Proportion of times Ho is rejected at level α = 0.01, where Ho: β(t) = 0 for all t ∈ [−1, 60], in the one sample simulation described in Section 3.7.2. Two test statistics are considered, Uw and Uf.
3.12 Proportion of times Ho is rejected at level α = 0.05, where Ho: β(t) = 0 for all t ∈ [−1, 60], in the one sample simulation described in Section 3.7.2. Two test statistics are considered, Uw and Uf.
3.13 MSE of the estimate of βs for each γ value in the two sample simulation described in Section 3.7.3, comparing the ECME method with the two stage method.
3.14 MSE of the estimate of βc for each γ value in the two sample simulation described in Section 3.7.3, comparing the ECME method with the two stage method.
3.15 MSE of the estimate of βs − βc for each γ value in the two sample simulation described in Section 3.7.3, comparing the ECME method with the two stage method.
3.16 Proportion of times Ho is rejected at level α = 0.01, where Ho: βs = βc for all t ∈ [−1, 60], in the two sample simulation described in Section 3.7.3. Four test statistics are considered: Ul, Uw, Ue and Uf.
3.17 Proportion of times Ho is rejected at level α = 0.05, where Ho: βs = βc for all t ∈ [−1, 60], in the two sample simulation described in Section 3.7.3. Four test statistics are considered: Ul, Uw, Ue and Uf.
3.18 MSE of the estimate of β for each γ value using truncated log body mass, as described in Section 3.7.4, comparing the ECME method with the two stage method.
4.1 Plots of log body mass versus week for the four groups of laboratory mice.
4.2 Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice.
4.3 Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components for the four groups of mice, after the individual average log body mass has been removed.
4.4 The constant function and the first three smoothed eigenfunctions of the covariance of centered log body mass of the four groups, with all four groups' functions shown in one panel. The eigenfunctions are calculated as described in Section 4.3.
4.5 Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice. Standard errors are calculated as described in Section 4.2.
4.6 Plots of the covariances between the average wheel running and the log body mass trajectory at week j, j = −1, . . . , 60, for the four groups of mice. Solid lines are the covariances from the model and dash-dot lines are the pointwise sample covariances between average wheel running and log body mass. For the calculation, refer to Heckman and Wang (2007).
4.7 Plots of β̂s − β̂c, the difference between the selection β̂ and the control β̂, and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders.

Acknowledgements

For the completion of this thesis, I would like to express my heartfelt gratitude to my supervisor, Professor Nancy Heckman, for all her invaluable advice and guidance, and her endless patience, kindness and encouragement throughout my time in the Department of Statistics of the University of British Columbia. I have learned many things from her, particularly regarding academic research and analytical writing. I truly appreciate all the time and effort she has spent helping me solve the problems I encountered, even in the midst of her own work.

I would also like to express my sincere gratitude and appreciation to my committee members, Professors Paul Gustafson and Lang Wu (listed alphabetically), for their precious advice and help with my thesis. Special thanks to Professor Patrick Carter for providing us with the biological data and the biological background of the thesis work.

Thank you to all faculty, staff and graduate students in the Department of Statistics, University of British Columbia, for making my stay here such an enriching experience.
Statement of Co-Authorship

This thesis was completed under the supervision of my supervisor, Professor Nancy Heckman. Chapter 2 was completed with the help of Professor Heckman.

Chapter 3 is co-authored with Professor Nancy Heckman. My main contributions are the derivation of the model parameter estimation in Section 3.3 and the model diagnostics in Section 3.5. I conducted the programming work for the data analysis and the simulation studies.

Chapter 4 is co-authored with Professor Patrick Carter, School of Biological Sciences, Washington State University. Professor Carter conducted the biological experiment and provided us with the data; he also gives a detailed description of the experiment and the biological background. We describe the statistical method and carry out the data analysis.

Chapter 1

Introduction

This thesis work is motivated by a biology experiment involving data on individuals, data that can be considered as scalar responses depending on function-valued predictors. In the thesis, we provide a functional regression approach to study the biology problem of interest. In this introductory chapter, we first give a brief summary of the experiment and then a review of smoothing methods in functional data analysis. We then give a literature review of the functional regression model, followed by a description of our approach.

1.1 Summary of the biology experiment

Professor Patrick Carter of the School of Biological Sciences of Washington State University conducted an experiment on house mice (Mus domesticus) selectively bred for increased voluntary wheel-running exercise. Details of the experiment are described in Morgan, Garland and Carter (2003). In summary, mice were divided into four groups according to gender and the two treatments, "selection" and "control". The selection group mice were from lines bred with selection on high wheel-running activity at age eight weeks. Control mice were from lines bred at random. Body mass and wheel running activity were recorded for each mouse for sixty-two consecutive weeks, indexed from −1 to 60, except for weeks 34, 38, 39 and 50. The research interest is in how body mass and wheel running are related and whether the relationship depends on the treatment.

Unfortunately, the wheel running distance data have many missing values and are very noisy. In addition, the wheels were cleaned every four weeks, which resulted in spikes in wheel-running activity every fourth week. One way to eliminate this problem is to average the wheel running distance over weeks and use the averaged value in the analysis. After this averaging, a scalar is obtained. We treat the averaged wheel running as the response and the body mass trajectory as the predictor, and we can build a regression model to study the dependence of the averaged wheel running on the body mass. To make use of the functional form of the body mass, we use a functional regression model approach. More details of this model are in Section 1.3.

In the next section, we give a review of smoothing methods which can be used to study the body mass trajectories. Some of these methods will be used in Section 1.3.

1.2 Review of smoothing methods

One can view a body mass trajectory as a process that can be described as a continuous function of time. Though the measurements of the trajectory are made at a finite, discrete set of points, the trajectory lies in an infinite dimensional parameter space.
This functional type of data is the object of study of a branch of statistics, namely functional data analysis. For a reference, please see Ramsay and Silverman (2005). In this section, we give a brief summary of some popular smoothing methods for studying functional data. Ruppert, Wand and Carroll (2003) give a detailed summary of these methods. These methods can be applied to study the body mass trajectories of the mice; to further investigate the relationship between the body mass and the averaged wheel running, an additional method needs to be used.

First, we consider analyzing data on one individual. Fix the ith individual and let Zi(·) denote its trajectory. Let tij denote Zi's jth observational point and let zij denote its observed value at that time. Data collected on this individual are the tij's and the zij's, j = 1, . . . , ni. The dependence of zij on tij through the function Zi is written as

    zij = Zi(tij) + εij,  j = 1, . . . , ni,  (1.1)

where εij represents an error, either measurement error or modelling error, with mean zero. Estimating Zi from the zij's is often referred to as nonparametric regression, since the function Zi is only assumed to be "smooth". One may view zij as the perturbed Zi at time tij.

One of the popular methods to estimate Zi is local polynomial smoothing. See Fan and Gijbels (1996) for details. Let K be a symmetric positive function with K(t) decreasing as |t| increases. Let h > 0 be the smoothing parameter, usually referred to as the "bandwidth". To estimate Zi(t) by a local polynomial of degree p, we minimize

    Σ_{j=1}^{n} [zj − β0 − β1(tj − t) − . . . − βp(tj − t)^p]² K((tj − t)/h)

to obtain (β̂0, β̂1, . . . , β̂p). The estimate of Zi(t) is then Ẑi(t) = β̂0. Compared with other smoothing methods, one advantage of local polynomial smoothing is its simpler theoretical analysis, which has allowed greater insight into its asymptotic properties.

Another popular smoothing method uses spline basis functions to model the function Zi. A spline function of degree q is defined on an interval. One first divides the interval into subintervals by breakpoints, called knots. On each subinterval, the spline function is a polynomial of degree q; on the entire interval, the function has q − 1 continuous derivatives. For a discussion of spline functions, please see Ramsay and Silverman (2005), p. 47. A popular choice of spline basis functions is the B-spline basis. In general, let φ1, . . . , φK be the spline basis functions and assume that Zi can be modelled as

    Zi(t) ≈ Σ_{k=1}^{K} xik φk(t) ≡ φ(t)'xi,  xi ≡ (xi1, . . . , xiK)'.  (1.2)

It thus follows from (1.1) that

    zij = Σ_{k=1}^{K} xik φk(tij) + εij ≡ φ(tij)'xi + εij,  j = 1, . . . , ni.  (1.3)

One way to estimate the unknown coefficient vector xi is by minimizing

    Σ_{j=1}^{ni} [zij − φ(tij)'xi]².  (1.4)

To use the spline basis functions, one must specify the number and the location of the knots. This is not a simple task: an improper choice of knots may result in overfitting or underfitting the data. One strategy to overcome this problem is to use many knots, but to penalize to prevent overfitting. In this approach, called penalized splines, we minimize a penalized version of (1.4) to estimate xi. That is, we minimize

    Σ_{j=1}^{ni} [zij − φ(tij)'xi]² + λ xi'P xi,  (1.5)

where λ is a smoothing parameter and P is a penalty matrix. For examples of P and how to choose λ by cross validation, please see Ramsay and Silverman (2005), p. 94.
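As an illustration of the penalized criterion (1.5), the following minimal R sketch fits a single trajectory with a rich B-spline basis and a second-difference penalty. The basis size, the form of P and the value of λ are illustrative assumptions rather than choices made in this thesis; in practice λ would be selected by cross validation as noted above.

    library(splines)

    # Simulated observations z_j on one individual (hypothetical data).
    set.seed(1)
    t_obs <- seq(0, 1, length.out = 50)
    z     <- sin(2 * pi * t_obs) + rnorm(50, sd = 0.2)

    # B-spline design matrix with Phi[j, k] = phi_k(t_j); many basis functions.
    Phi <- bs(t_obs, df = 20, intercept = TRUE)

    # One common choice of penalty: P = D'D with D the second-difference
    # operator, so x'Px penalizes roughness of the fitted coefficients.
    D <- diff(diag(ncol(Phi)), differences = 2)
    P <- crossprod(D)

    # Minimizer of (1.5): x_hat = (Phi'Phi + lambda P)^{-1} Phi'z.
    lambda <- 1
    x_hat  <- solve(crossprod(Phi) + lambda * P, crossprod(Phi, z))

    # Fitted trajectory Z_hat(t_j) = phi(t_j)'x_hat.
    Z_hat <- Phi %*% x_hat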
Interestingly, there is a close connection between the penalized smoothing estimate of Zi(t) using basis functions and the best linear unbiased predictor (BLUP) of Zi(t) in a random effects model framework. Intuitively, consider the coefficient vector xi in (1.3) as random with covariance matrix Σx. Under a normality assumption on the xi and the εij's, the log likelihood function of the zij's and xi can be written in a form similar to (1.5), up to a constant. The smoothing parameter λ is related to the variance of εij, and the penalty matrix P plays the role of the inverse of Σx. For more discussion of the BLUP, see Robinson (1991).

The random effects view of penalized splines is particularly useful when analyzing data from N individuals: the zij's, j = 1, . . . , ni, i = 1, . . . , N. We model each zij as in (1.3), with the random effects xi assumed independently and identically distributed with covariance matrix Σx. The random effects induce a covariance structure on the zij's that provides a reasonable, yet flexible, model for within-subject dependence. The parameter Σx is estimated using all the zij's. Standard methods to fit random effects or mixed models (containing both random and fixed effects) can thus be used; for a reference, see McCulloch and Searle (2001). Rice and Wu (2001) use a B-spline basis in (1.3) and the EM algorithm to estimate the model parameters. The authors also calculate the best linear unbiased predictor of each individual trajectory.
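The BLUP view can also be made concrete. Under model (1.3) with random xi and i.i.d. errors, the best linear unbiased predictor of xi is E(xi | zi) = ΣxΦ'(ΦΣxΦ' + σ²I)⁻¹(zi − µ). The R lines below illustrate this formula, reusing Phi and z from the previous sketch; the values assigned to µ, Σx and σ² are hypothetical stand-ins, whereas in practice these parameters are estimated from all N individuals.

    # Minimal sketch of the BLUP of x_i, treating the covariance parameters
    # as known; Phi and z are from the sketch earlier in this section.
    mu      <- rep(0, length(z))   # assumed known mean curve at the t_j's
    Sigma_x <- diag(ncol(Phi))     # assumed covariance of the random x_i
    sigma2  <- 0.2^2               # assumed error variance

    # E(x_i | z_i) = Sigma_x Phi' (Phi Sigma_x Phi' + sigma2 I)^{-1} (z_i - mu).
    Sigma_z <- Phi %*% Sigma_x %*% t(Phi) + sigma2 * diag(length(z))
    x_blup  <- Sigma_x %*% t(Phi) %*% solve(Sigma_z, z - mu)

    # Predicted trajectory for this individual.
    Z_blup <- mu + Phi %*% x_blup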
1.3 Functional regression models

Some ideas and methods in functional data analysis can be viewed as extensions of those in classical statistics. One such extension is the functional regression model with a scalar response and a functional predictor. This model can be used to study the dependence of the averaged wheel running on the body mass trajectory. In this section, we introduce the functional regression model and sketch the methods used to study it. Among the rich literature on this topic, we select those papers most relevant to our approach.

Let Yi be the averaged wheel running of the ith individual. Referring to (1.1), we view Zi as the predictor while zij is the perturbed value of Zi at time tij. Data collected on N individuals are Yi and zij, j = 1, . . . , ni, i = 1, . . . , N. The functional regression model is

    Yi = β0 + ∫ β(t)Zi(t) dt + ei.  (1.6)

The unknown functional coefficient β(·) measures the dependence of the scalar response Yi on the predictor Zi and is of primary interest. Usually it is assumed that the function β is smooth.

The functional regression model (1.6) is loosely related to ordinary multiple regression, in that the integral replaces the summation. The ordinary multiple linear regression model for a fixed individual uses "covariates" Zi(ti1), . . . , Zi(tini) and models Yi as

    Yi = β0 + Σ_j βj Zi(tij) + ei.

In our case, where the Zi(tij)'s are unobserved, we are essentially carrying out multiple regression with covariates measured with error. Note that the βj's do not depend on i. Thus, to use all individuals in this model, the Zi's have to be measured at the same time points to obtain the zij's, i.e. tij = tj and ni = n. When individuals are measured at different sets of time points or have different numbers of observations, it is not clear how to apply the multiple regression model. Another problem with multiple regression is that for large values of n, the high-dimensional vector of parameters, β, may be hard to estimate. The functional model (1.6) can easily accommodate the case where different time points are observed for different individuals, and exploiting the continuity of β and the Zi's allows us to reduce the number of parameters used.

In the literature, there are several approaches to calculate the integral ∫ β(t)Zi(t) dt and to estimate β. In the remainder of this section, we give a summary of some of these approaches, focusing on those closest to ours.

One approach uses a discretized version of the integral and estimates β by minimizing a penalized version of the residual sum of squares. Cardot, Crambes, Kneip and Sarda (2007) assumed that data were observed at the same n equally spaced time points, i.e. ni ≡ n, tij = tj and tj − tj−1 = 1/n for all j = 2, . . . , n. Cardot et al. considered model (1.6) but without the intercept β0, i.e. the model

    Yi = ∫ β(t)Zi(t) dt + ei.

The authors discretized the integral ∫ β(t)Zi(t) dt using a grid of t values equal to the tj's and wrote

    Yi = (1/n) Σ_{j=1}^{n} β(tj)Zi(tj) + ei.

Let Zi = (Zi1, . . . , Zin)', zi = (zi1, . . . , zin)' and β = (β(t1), . . . , β(tn))'. The authors used a Total Least Squares method to estimate β and predict Zi as the solutions of the minimization problem

    min_{β ∈ ℝⁿ, Zi ∈ ℝⁿ} (1/N) Σ_{i=1}^{N} [ (Yi − (1/n)Zi'β)² + (1/n)||zi − Zi||² + (λ/n)β'Pβ ],

where λ is a smoothing parameter and P is a penalty matrix. Unfortunately, it is not clear how to apply this approach directly when individuals are measured at different sets of time points.

Another approach uses basis functions to approximate the Zi's and β. In this basis expansion approach, individuals can be measured at different time points and they need not have the same number of observations. This approach approximates the Zi's as in (1.2) and similarly approximates β using basis functions ψ1, . . . , ψJ, as

    β(t) = Σ_{j=1}^{J} βj ψj(t) ≡ β'ψ(t),  β = (β1, . . . , βJ)'.  (1.7)

An advantage of expansions (1.2) and (1.7) is that they reduce the infinite dimensional functional unknowns to finite dimensional unknowns which are conveniently handled as vectors. Estimating β(·) thus reduces to estimating the coefficient vector β. We also notice that, using (1.2) and (1.7), the integral ∫ β(t)Zi(t) dt in (1.6) can be written as

    ∫ β(t)Zi(t) dt = β'Txi,  where T[j, k] = ∫ ψj(t)φk(t) dt.  (1.8)

In this approach, the xi's in (1.2) can be assumed to be fixed or random. When the xi's are assumed random, they are usually modelled as described at the end of Section 1.2: they are assumed independently and identically distributed with a normal distribution with covariance matrix Σx. This assumption contributes to the construction of the likelihood function of the data, the (Yi, zij)'s. The vector β is estimated by maximizing the log likelihood or a penalized log likelihood.
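To make (1.8) concrete, the R sketch below approximates the cross-product matrix T by a Riemann sum on a fine grid. The monomial ψj's and the B-spline φk's are arbitrary stand-ins for whichever bases are actually chosen.

    # Sketch of the cross-product matrix T in (1.8), by numerical quadrature.
    grid <- seq(0, 1, length.out = 1001)
    Psi  <- cbind(1, grid, grid^2)                       # psi_1, psi_2, psi_3
    Phi  <- splines::bs(grid, df = 5, intercept = TRUE)  # phi_1, ..., phi_5

    h     <- grid[2] - grid[1]                           # grid spacing
    T_mat <- h * t(Psi) %*% Phi      # T[j, k] ~ int psi_j(t) phi_k(t) dt

    # Then int beta(t) Z_i(t) dt = beta' T x_i for the coefficient vectors
    # beta (of beta) and x_i (of Z_i), as in (1.8).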
In the following, we give brief summaries of three statistical papers and a book chapter that use a basis expansion approach. In expansions (1.2) and (1.7), the φk's and ψj's are usually chosen to be splines, although this is not necessary; for instance, the φk's can be equal to the principal component functions of the Zi's. James (2002), James and Silverman (2005) and Müller (2005) considered more general models. In the following, we describe their approaches with an emphasis on studying model (1.6).

James (2002) used φk's in (1.3) equal to a basis for natural cubic splines and also considered ψ = φ in (1.7). The Zi's were subject to error, and so the data were the perturbed zij's, given in (1.3). The xi's were multivariate normal and β was estimated by maximizing the log likelihood. With the xi's as the missing data and the {(zij, j = 1, . . . , ni), Yi}'s as the observed data, the author used the EM algorithm to estimate the unknown parameters.

James and Silverman (2005) used φk's in (1.3) equal to the estimated principal component functions of the Zi's and used ψj's equal to cubic spline basis functions. The coefficient vectors, the xi's, were taken to be fixed. When the Zi's were observed without error, the authors estimated β by maximizing the log likelihood of the (Yi, Zij)'s subject to a penalty term which penalized β's that have ∫ β(t)f(t) dt big when Var(∫ Z(t)f(t) dt) is small. When the zij's were observed, the log likelihood was modified to include the measurement error, and a penalty term was added to ensure smooth fits of the Zi's.

Müller (2005) used the idea of the Karhunen-Loève expansion of the Zi process to estimate the φk's and the corresponding xi's. In this approach, the φk's in (1.3) are equal to the first K estimated eigenfunctions of the Zi process. The xi's are random and have a diagonal covariance matrix, and Müller assumes that the estimated xi's have a diagonal covariance matrix too. Müller's method of estimation can be separated into two parts: the xi's are estimated in the first part and β is estimated in the second part. A detailed description of Müller's method is provided in Chapter 3.

Ramsay and Silverman (2005, Chapter 15) first calculated Ẑi(·) by fitting the zij's using least squares and a fairly large number of basis functions. They then estimated β by minimizing a penalized residual sum of squares, defined as

    Σ_{i=1}^{N} [Yi − β0 − ∫ β(t)Ẑi(t) dt]² + λβ'Pβ.

This method is implemented in the R library fda (Ramsay, Wickham and Graves, 2007).
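Once the expansions (1.2) and (1.7) are substituted, the penalized residual sum of squares above becomes a ridge-type least squares problem with a closed-form solution. The following R sketch is a schematic of that computation: the fitted scores, T, P and λ are all placeholder inputs, and this is a minimal re-derivation under those assumptions rather than the fda library's implementation.

    # Schematic of the penalized RSS estimate after the basis expansions.
    set.seed(2)
    N      <- 30; K <- 5
    x_fit  <- matrix(rnorm(N * K), N, K)   # x_i-hat: coefficients of Z_i-hat
    Y      <- rnorm(N)                     # scalar responses
    T_mat  <- diag(K)                      # cross-product matrix (psi = phi here)
    D      <- diff(diag(K), differences = 2)
    P      <- crossprod(D)                 # roughness penalty on beta
    lambda <- 0.1

    U   <- cbind(1, x_fit %*% t(T_mat))    # row i: (1, (T x_i-hat)')
    Pen <- rbind(0, cbind(0, P))           # no penalty on the intercept beta_0
    cf  <- solve(crossprod(U) + lambda * Pen, crossprod(U, Y))
    beta0_hat <- cf[1]
    beta_hat  <- cf[-1]                    # beta-hat(t) = sum_k beta_hat[k] psi_k(t)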
1.4 Introduction to the thesis work

In (1.3), when the xi's are random, the distribution of the observed vector (zi1, . . . , zini)' will not, in general, be identifiable. This concern motivates our work in Chapter 2. Assuming the xi's random introduces an additional parameter, Σx, into the model. This additional parameter complicates the covariance matrix of the zij's and the variance of the Yi's; possibly, the covariance/variance parameters of the model are not identifiable. In Chapter 2, we study identifiability in normal linear mixed effects models. We derive necessary and sufficient conditions for identifiability, including conditions of identifiability for the scalar response and functional predictor mixed effects model.

In Chapter 3 of the thesis, we study model (1.6) assuming the observed Zi's are contaminated with measurement error. We use principal component functions to expand Zi in (1.2) and use the same functions to expand β in (1.7). This choice of basis functions can fit the Zi's well with K fairly small. An added benefit of this eigenfunction approach is that we focus on "estimable directions" of β. We assume that xi in (1.3) is a normal random vector with a general covariance matrix Σx. In Chapter 3, we implement the ECME (Expectation/Conditional Maximization Either) algorithm of Liu and Rubin (1994) to estimate the model parameters. We consider Hessian matrix based and bootstrapped standard errors of the estimate of β. We propose a new integrated t-statistic for hypothesis testing involving β(·). Expressions for the residuals of the model fit are derived and we discuss model diagnostics based on the analysis of residuals.

In Chapter 4, we apply our model to analyze a biological data set, providing a detailed biological background. Chapter 5 summarizes the thesis work and discusses possible future research directions.

Bibliography

[1] Cardot, H., Crambes, C., Kneip, A. and Sarda, P. (2007) Smoothing splines estimators in functional linear regression with errors-in-variables. Computational Statistics & Data Analysis 51, 10, 4832-4848.
[2] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432.
[3] James, G. M. and Silverman, B. W. (2005) Functional adaptive model estimation. Journal of the American Statistical Association 100, 470, 565-576.
[4] Fan, J. and Gijbels, I. (1996) Local Polynomial Modelling and Its Applications. London: Chapman & Hall.
[5] Liu, C. H. and Rubin, D. B. (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 4, 633-648.
[6] McCulloch, C. E. and Searle, S. R. (2001) Generalized, Linear and Mixed Models. New York: Wiley.
[7] Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2003) Ontogenetic trajectories in mice selected for high wheel-running activity. I. Mean ontogenetic trajectories. Evolution 57, 646-657.
[8] Müller, H. G. (2005) Functional modeling and classification of longitudinal data. Scandinavian Journal of Statistics 32, 223-240.
[9] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis, 2nd edition. Springer Series in Statistics.
[10] Ramsay, J. O., Wickham, H. and Graves, S. (2007) R fda library.
[11] Rice, J. A. and Wu, C. O. (2001) Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253-259.
[12] Robinson, G. K. (1991) That BLUP is a good thing: the estimation of random effects. Statistical Science 6, 15-32.
[13] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003) Semiparametric Regression. Cambridge University Press.

Chapter 2

Identifiability discussion of linear mixed effects models

(A version of this chapter will be submitted for publication. Author: Wang, W.)

2.1 Introduction

Mixed effects models have moved beyond the original simple models, becoming more complicated and containing more parameters. However, two different sets of parameters producing the same covariance matrix for the observations may cause problems for parameter estimation and for inference. In principle, parameter identifiability is the first thing that should be verified when building a model. As Demidenko (2004, p. 118) states, "identifiability may be viewed as a necessary property for the adequacy of a statistical model".

We study identifiability in normal linear mixed effects models. In the next section, we define the classical mixed effects model and show that, in an unrestricted form and in a specific restricted form, the model is not identifiable. Sections 2.3 and 2.4 give sufficient conditions to identify parameters in various models. In Section 2.5 we discuss identifiability of extended models.

2.2 Motivation

In this section, we give the definition of nonidentifiability. Then we introduce the general unrestricted classical linear mixed effects model and give two models that are nonidentifiable.
Definition 2.2.1 Let y be the vector of observable random variables with distribution function Pθ, where θ is in the parameter space Θ. This probability model for y is not identifiable if and only if there exist θ, θ* ∈ Θ with θ ≠ θ* and Pθ = Pθ*.

Throughout the paper, we assume all random variables follow normal distributions. As the normal distribution is uniquely characterized by its first two moments, identifiability of a normal distribution function reduces to the identifiability of its mean vector and covariance matrix (Demidenko, 2004, Proposition 10, p. 118).

In the standard linear mixed effects model, y is the observable random vector of length n, and X and Z are known, non-random design matrices with dimensions n × p, n > p, and n × q, n > q, respectively. We assume throughout that both X and Z have full column rank. Then

    y = Xβ + Zu + ε,  u ∼ N(0, Σu),  ε ∼ N(0, Σε),  u independent of ε.  (2.1)

The random effects vector u and the error vector ε are unobservable. This model has been studied and applied by, for instance, McCulloch and Searle (2001).

Unknown parameters in the model are (β, θ), where β ∈ B ⊆ ℝ^p and θ = (Σε, Σu) ∈ Θ̃ ⊆ Θ, the set of all (Σε, Σu) with Σε, n × n, and Σu, q × q, both symmetric and positive definite. Throughout the paper, we assume that β and θ do not have common elements, i.e. we assume that (β, θ) ∈ B ⊗ Θ̃. We also sometimes assume that Σε and Σu do not have common elements; that is, we sometimes assume that Θ̃ = Θ̃ε ⊗ Θ̃u, where Θ̃ε ⊆ Θε and Θ̃u ⊆ Θu, with Θε containing all n × n positive definite symmetric matrices and Θu containing all q × q positive definite symmetric matrices. We take as our operating definition of identifiability the ability to identify the parameters β, Σε and Σu.

The linear mixed effects model has become popular in the analysis of longitudinal and functional data. For instance, the response of individual i at time t can be modelled as Σ_j αjφj(t) + Σ_k uikψk(t) plus error, where the αj's are fixed population effects and the uik's are individual-specific random effects. The inclusion of random effects allows us to realistically model covariance within an individual. Taking the φj's and ψk's equal to B-splines allows us to flexibly model a wide variety of responses. In addition, if we allow some of the αj's to be random but with a very specific covariance structure, then the resulting best linear unbiased predictors of the αj's are computationally equivalent to the parameter estimates in a smoothing spline fit or a penalized B-spline fit. See Verbyla et al. (1999) or Ruppert, Wand and Carroll (2003).

The parameter vector β is always identifiable since Ey = Xβ and X is of full column rank. So, to study identifiability in the normal model, it suffices to study the covariance matrix of y, Σy = ZΣuZ' + Σε, to see when Σε and Σu are identifiable. Thus, nonidentifiability of model (2.1) with parameter space B ⊗ Θ̃ is equivalent to the existence of θ, θ* ∈ Θ̃, with θ ≠ θ*, giving the same Σy, i.e.

    ZΣuZ' + Σε = ZΣu*Z' + Σε*.  (2.2)

In Example 2.2.1 below, we show that the unrestricted model, where Θ̃ = Θ, is nonidentifiable. A similar argument shows that the restricted model in Example 2.2.2 below is not identifiable.
Example 2.2.2 uses the covariance structure assumed for the random effects in the penalized B-spline method. Thus, when using penalized B-splines, one cannot assume a general form for Σε.

Example 2.2.1 Suppose we place no further restrictions on Θ. Then the model is not identifiable. To see this, let (Σε, Σu) ∈ Θ and 0 < a < 1. Let Σu* = (1 − a)Σu and Σε* = Σε + aZΣuZ'. Then Σε* and Σu* satisfy (2.2) and (Σε*, Σu*) ∈ Θ.

Example 2.2.2 Suppose that Θ̃ = Θε ⊗ Θ̃u, where Θ̃u contains all matrices of the form σ²R with R positive definite and known, and σ² > 0. To see that this model is not identifiable, use the same argument as in Example 2.2.1 and note that the constructed Σu* is in Θ̃u if Σu is in Θ̃u.

In practice, one usually assumes a more specific structure for Σε, such as Σε = σ²I. Restrictions may lead to identifiability, and such restrictions and their effects on identifiability will be discussed in the next two sections.
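The construction in Example 2.2.1 is easy to verify numerically. In the R sketch below, Z, Σu and Σε are arbitrary choices; applying the construction with a = 0.3 produces a second parameter pair giving exactly the same covariance matrix of y.

    # Numerical check of Example 2.2.1: two distinct parameter pairs give the
    # same covariance of y, so the unrestricted model is not identifiable.
    set.seed(3)
    n <- 6; q <- 2
    Z       <- matrix(rnorm(n * q), n, q)
    Sigma_u <- crossprod(matrix(rnorm(q * q), q))   # positive definite
    Sigma_e <- diag(n)
    a       <- 0.3

    Sigma_u_star <- (1 - a) * Sigma_u
    Sigma_e_star <- Sigma_e + a * Z %*% Sigma_u %*% t(Z)

    Sigma_y      <- Z %*% Sigma_u %*% t(Z) + Sigma_e
    Sigma_y_star <- Z %*% Sigma_u_star %*% t(Z) + Sigma_e_star
    max(abs(Sigma_y - Sigma_y_star))   # numerically zero: (2.2) holds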
2.3 Simple sufficient conditions of identifiability

In this section, we find sufficient conditions for identifiability of model (2.1), assuming Θ̃ = Θ. A further examination of (2.2) gives us the following sufficient conditions.

Clearly, if Σu is known, then ZΣuZ' is known, and so Σε is completely determined.

If Σε is known, then the model is identifiable. To see this, consider (2.2) with Σε = Σε*. It follows that ZΣuZ' = ZΣu*Z', and so Σu = Σu* since Z is of full column rank.

If ZΣuZ'Σε⁻¹ = K, where K is known and K + I is of full column rank, then the model is identifiable. Suppose, by way of contradiction, that the model is not identifiable. Then (2.2) holds for (Σu, Σε) ≠ (Σu*, Σε*), both in Θ̃, with, by assumption, ZΣuZ' = KΣε and ZΣu*Z' = KΣε*. Substituting these expressions into (2.2) yields

    KΣε + Σε = KΣε* + Σε*,

that is, (K + I)(Σε − Σε*) = 0. Since K + I is of full rank, we must have Σε = Σε*. But, as shown in the previous paragraph, this implies that Σu = Σu*.

The last condition is similar to a common condition for identifiability in simple linear regression models with measurement error. That model assumes

    yi = β0 + β1xi + εi,  εi ∼ (0, σε²),

where xi is observed with error having variance σu². The response yi then has variance σu² + σε². One of the common conditions for model identifiability is that the ratio σu²/σε² be known. The inverse Σε⁻¹ appearing in our last condition can be viewed as a multivariate version of the "denominator". If there are any supplementary data, we may be able to find an estimate of Σu, Σε or K, and we can treat this estimate as the true value. The sufficient conditions for identifiability can then be satisfied.

2.4 Sufficient conditions of identifiability for a structured Σε

As we observed in Examples 2.2.1 and 2.2.2, the model is not identifiable even if we restrict Σu to be a scalar multiple of a known matrix. In this section, we study the effect of putting restrictions on Σε. In Theorem 2.4.1 below, we give a necessary and sufficient condition for nonidentifiability, a condition that relies mainly on the design matrix Z via HZ = Z(Z'Z)⁻¹Z'. The theorem leads to four corollaries. Corollaries 2.4.1 and 2.4.2 give necessary and sufficient conditions for identifiability when Σε arises from an exchangeable covariance structure or is diagonal. Corollary 2.4.3 states an easily checked condition on Σε that guarantees identifiability of the model; that corollary is then applied to two commonly used error structures. Using Corollary 2.4.4, we can generalize a known identifiability result, giving a shorter proof under weaker conditions.

Theorem 2.4.1 Let Θ̃ ⊆ Θ and define HZ = Z(Z'Z)⁻¹Z'. Then model (2.1) with parameter space B ⊗ Θ̃ is nonidentifiable if and only if there exist (Σε, Σu) ∈ Θ̃ and (Σε*, Σu*) ∈ Θ̃, with Σε* ≠ Σε, such that

    HZ[Σε − Σε*] = Σε − Σε*  (2.3)

and

    Σu* = Σu + (Z'Z)⁻¹Z'[Σε − Σε*]Z(Z'Z)⁻¹.  (2.4)

Proof: Nonidentifiability of the model is equivalent to the existence of (Σε, Σu) and (Σε*, Σu*) in Θ̃, not equal, satisfying (2.2). Note that this is equivalent to (Σε, Σu) and (Σε*, Σu*) in Θ̃ with Σε ≠ Σε* satisfying

    Z(Σu − Σu*)Z' = Σε* − Σε.  (2.5)

Suppose the model is nonidentifiable. We premultiply (2.5) by Z', postmultiply it by Z, and then pre- and postmultiply by (Z'Z)⁻¹ to get

    Σu − Σu* = (Z'Z)⁻¹Z'[Σε* − Σε]Z(Z'Z)⁻¹.  (2.6)

This gives (2.4). To derive (2.3), premultiply (2.6) by Z and postmultiply (2.6) by Z' to get

    Z(Σu − Σu*)Z' = HZ[Σε* − Σε]HZ,  (2.7)

which, by (2.5), is the same as

    Σε − Σε* = HZ[Σε − Σε*]HZ.  (2.8)

Premultiplying (2.8) by the idempotent matrix HZ gives HZ[Σε − Σε*] = HZ[Σε − Σε*]HZ. Substituting (2.8) into the right side of the above yields (2.3).

To prove the converse, we want to show that (2.3) and (2.4) lead to (2.5). It is clear from (2.4) that (2.7) holds. If we can show that (2.8) holds, then we are done, since substituting (2.8) into the right side of (2.7) yields (2.5). To show (2.8), from (2.3) and the symmetry of Σε − Σε*, we see that HZ[Σε − Σε*] = [Σε − Σε*]HZ. Premultiplying this identity by the idempotent matrix HZ gives HZ[Σε − Σε*] = HZ[Σε − Σε*]HZ. Substituting (2.3) for the left side of the equation, we see that (2.8) holds. ✷

The proofs of the next two corollaries are in Appendix A.

Corollary 2.4.1 Let 1 be the n-vector with each element equal to one. Suppose that the distribution of ε1, . . . , εn is exchangeable, that is, the covariance matrix of ε is of the form σ²[(1 − ρ)I + ρJ], where J = 11'. Let Θ̃ε = {Σε = σ²[(1 − ρ)I + ρJ], σ² > 0, −1/(n − 1) < ρ < 1}. Suppose the matrix Z satisfies 1'Z = 0 and rank(Z) = q with 1 ≤ q < n − 1. Suppose the parameter space is B ⊗ Θ̃ε ⊗ Θu. Then model (2.1) is identifiable if and only if HZJ ≠ J.

Comments. The condition HZJ = J means the sum of the elements of each row of HZ is equal to one, and this is an easy condition to check. For the case q = 1, i.e. Z a column vector (z1, . . . , zn)' with Σi zi = 0, HZ has (i, j) element zizj/sz², where sz² = Σi zi². The model is identifiable if and only if Z is not a constant vector.

When q = 2, suppose we have the usual simple linear regression model with a centered covariate:

    Z = ( 1  z1
          ⋮  ⋮
          1  zn ),  with Σi zi = 0.  (2.9)

Then HZ has (i, j) element 1/n + zizj/sz², and each row of HZ sums to one. Thus, unfortunately, the model is not identifiable under this Z combined with the exchangeable covariance structure.
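The row-sum condition of Corollary 2.4.1 can be checked in a few lines of R. With toy centered covariate values, the design (2.9) gives HZJ = J (each row of HZ sums to one, so that model is not identifiable), while a single centered column gives HZJ ≠ J.

    # Check of the condition in Corollary 2.4.1 for two designs (toy z values).
    z  <- c(-2, -1, 0, 1, 2)                 # centered covariate, sum(z) = 0
    HZ <- function(Z) Z %*% solve(crossprod(Z), t(Z))

    # q = 2: centered simple linear regression design (2.9).
    Z2 <- cbind(1, z)
    rowSums(HZ(Z2))   # all equal to 1: HZ J = J, not identifiable

    # q = 1: Z a single centered column.
    Z1 <- matrix(z)
    rowSums(HZ(Z1))   # all equal to 0: HZ J != J, identifiable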
Corollary 2.4.2 Suppose that Θ̃ε equals the collection of all diagonal positive definite n × n matrices. Then model (2.1) with parameter space B ⊗ Θ̃ε ⊗ Θu is identifiable if and only if none of the diagonal elements of HZ is equal to one.

Comments. Again, the condition on HZ is easy to check. Consider the case q = 1. As we have seen, the diagonal elements of HZ equal zi²/Σj zj², i = 1, . . . , n. Therefore, the model is identifiable if and only if Z does not have n − 1 zero elements. Consider q = 2 with Z as in (2.9). The model is identifiable provided that, for all i, 1/n + zi²/Σj zj² does not equal 1. So typically, the model is identifiable.

The following corollary provides a sufficient condition for identifiability, a condition that can sometimes be easily checked. Consider (2.3). Note that the rank of HZ(Σε − Σε*) is at most q, since the rank of HZ is q. Thus, for (2.3) to hold, we must be able to find some Σε and Σε* with the rank of Σε − Σε* less than or equal to q. This proves the following.

Corollary 2.4.3 Suppose that Θ̃ε ⊆ Θε. Then model (2.1) with parameter space B ⊗ Θ̃ is identifiable if rank(Σε − Σε*) > q for all distinct Σε, Σε* in the parameter space.

We now apply Corollary 2.4.3 to show model identifiability under the "multiple of a known positive definite matrix" and "MA(1)" covariance structures, respectively, in the next two examples.

Example 2.4.1 Multiple of a known positive definite matrix. Fix R, symmetric and positive definite, and suppose (Σε, Σu) ∈ Θ̃ implies that Σε = σ²R for some σ² > 0. Consider Σε = σ²R and Σε* = σ*²R with σ*² ≠ σ². Clearly Σε − Σε* = (σ² − σ*²)R is invertible, and so is of rank n, which we have assumed is greater than q. Thus, the model is identifiable.

To show that the model in Example 2.4.2 below is identifiable, we need the following lemma, a result in Graybill (1983, p. 285).

Lemma 2.4.1 Let T be the n × n Toeplitz matrix with ones on the two parallel subdiagonals and zeroes elsewhere. Given two scalars a0 and a1, the eigenvalues of the n × n matrix C = a0I + a1T are

    µi = a0 + 2|a1| cos(iπ/(n + 1)),  i = 1, . . . , n.

Example 2.4.2 MA(1). Suppose that n − 1 > q. Let the components of ε have the MA(1) covariance structure, i.e. Σε of the form σ²(I + ρT). Let Θ̃ε = {Σε = σ²(I + ρT), σ² > 0, |ρ| < 1/2} and suppose (Σε, Σu) ∈ Θ̃ implies that Σε ∈ Θ̃ε.

Let Σε and Σε* ∈ Θ̃ε. By Lemma 2.4.1, the eigenvalues of the difference matrix Σε − Σε* = (σ² − σ*²)I + (σ²ρ − σ*²ρ*)T are

    λi = (σ² − σ*²) + 2|σ²ρ − σ*²ρ*| cos(iπ/(n + 1)),  i = 1, . . . , n.

Given any (σ², ρ) and (σ*², ρ*) with (σ², ρ) ≠ (σ*², ρ*), the number of zero λi's is at most one. Hence, the rank of the difference matrix is greater than or equal to n − 1 > q. Therefore, model (2.1) is identifiable under this MA(1) covariance structure.
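A quick numerical check of Example 2.4.2, with arbitrary parameter values: the difference of two distinct MA(1) covariance matrices has at most one zero eigenvalue, so its rank is at least n − 1.

    # Numerical check of Example 2.4.2 for one arbitrary pair of parameters.
    n  <- 8
    Tm <- diag(0, n)
    Tm[abs(row(Tm) - col(Tm)) == 1] <- 1     # Toeplitz matrix T of Lemma 2.4.1

    Sigma      <- 1.0 * (diag(n) + 0.3 * Tm)   # (sigma^2, rho)   = (1.0, 0.3)
    Sigma_star <- 1.5 * (diag(n) + 0.1 * Tm)   # (sigma*^2, rho*) = (1.5, 0.1)

    ev <- eigen(Sigma - Sigma_star, symmetric = TRUE)$values
    sum(abs(ev) > 1e-10)   # rank of the difference: at least n - 1 (> q)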
In longitudinal or functional data analysis, there are usually N individuals, with the ith individual modelled as in (2.1):

    yi = Xiβ + Ziui + εi,
    ui ∼ N(0, Σu),  εi ∼ N(0, Σεi),  (Σu, Σεi) ∈ Θu ⊗ Θ̃εi,  ui and εi independent.  (2.10)

Statistical inference is normally based on the joint model, the model of these N individuals. The following corollary gives sufficient conditions for identifiability of the joint model. The intuition behind the result is that, if we can identify Σu from one individual, then we can identify all of the Σεi's.

Corollary 2.4.4 If an individual model (2.10) is identifiable, then the joint model is identifiable.

Proof: We notice that each individual model (2.10) shares a common parameter, the covariance matrix Σu. If one individual model uniquely determines Σu and its Σεi, the identified Σu will then yield identifiability of all the individual Σεi's since, if ZiΣuZi' + Σεi = ZiΣuZi' + Σεi*, clearly Σεi = Σεi*. Therefore, the joint model is identifiable. ✷

Corollary 2.4.4 reduces the verification of a joint model's identifiability to that of the individual models. For instance, if the ith individual model has Zi of full column rank and Σεi = σ²Ini, where ni is the length of yi, then this individual model is identifiable, by Example 2.4.1, and thus so is the joint model. Note that the other individual models can still have their Zj's not of full column rank.

Demidenko (2004, Chapters 2 & 3) studies the joint model but assumes the covariance matrix of εi is σ²Ini. Suppose each Zi is of dimension ni × q. Demidenko shows that the joint model is identifiable if at least one matrix Zi is of full column rank and Σ_{i=1}^{N}(ni − q) > 0. Using our argument in the previous paragraph, the condition Σ_{i=1}^{N}(ni − q) > 0 can be dropped. Furthermore, our result can be applied to more general Σε's.

2.5 Extensions

In this section, we discuss identifiability of a model in functional regression for a functional predictor y(·) and a scalar response w. We derive a necessary and sufficient condition of nonidentifiability for this model.

We model y(t) as Σ_j αjφj(t) + Σ_k ukψk(t) plus error, with the φj's and ψk's known, the αj's unknown, and the uk's unknown and random. The dependence of the response w on y is modelled through an unknown functional coefficient β: w = β0 + ∫ β(t)[y(t) − E(y(t))] dt + η, where η is mean 0 normal noise. Thus, for appropriately defined ρ and with u = (u1, . . . , uq)', we can write

    w = β0 + ρ'u + η,  β0 ∈ ℝ, ρ ∈ ℝ^q unknown,  η ∼ N(0, ση²) independent of u.  (2.11)

The predictor y is observed at a sequence of discretized points and the observed values are contained in the vector y, which then follows model (2.1). James (2002), Müller (2005) and Heckman and Wang (2007) consider this modelling of y and w and propose different approaches to estimate the functional coefficient β.

We assume models (2.1) and (2.11). For our identifiability discussion here, we consider the unknown parameters to be (β0, β) and θ = (Σε, Σu, ση², ρ). We suppose that (β0, β) ∈ B ⊆ ℝ ⊗ ℝ^p and that θ ∈ Θ̃ ⊆ Θ = Θε ⊗ Θu ⊗ ℝ⁺ ⊗ ℝ^q, where Θε and Θu are as before and ℝ⁺ is the set of positive real numbers.

To study identifiability, we must study the distribution of the random vector (y', w). We see that E(y) = Xβ, E(w) = β0, and (y', w) has covariance matrix

    Σ = ( ZΣuZ' + Σε    ZΣuρ
          ρ'ΣuZ'        ρ'Σuρ + ση² ).

We know the parameter β is identifiable if the matrix X is of full column rank. The identifiability of β0 is also clear. So we focus on identifying the covariance parameters θ. Our discussion in Section 2.2 suggests the unrestricted model won't be identifiable. In fact, we can construct an example following Example 2.2.1 to show the existence of nonidentical θ and θ*, both in Θ, such that (2.2) holds and

    ZΣuρ = ZΣu*ρ*,  (2.12)

    ρ'Σuρ + ση² = ρ*'Σu*ρ* + ση*².  (2.13)

Example 2.5.1 Example 2.2.1, continued.
Let 0 < a < 1, and let Σu* and Σε* be as in Example 2.2.1. Let ρ* = ρ/(1 − a) and ση*² = ση² − aρ'Σuρ/(1 − a). It is not hard to see that (2.12) and (2.13) are satisfied. If, in addition, we restrict a < ση²/(ση² + ρ'Σuρ), we see that ση*² is positive.

The following theorem gives a necessary and sufficient condition for nonidentifiability.

Theorem 2.5.1 Let Θ̃ε ⊆ Θε, let Θ̃ = Θ̃ε ⊗ Θu ⊗ ℝ⁺ ⊗ ℝ^q, and define HZ = Z(Z'Z)⁻¹Z'. Then the model given by (2.1) and (2.11) with parameter space B ⊗ Θ̃ is nonidentifiable if and only if there exist (Σε, Σu, ση², ρ) and (Σε*, Σu*, ση*², ρ*), both in Θ̃, with Σε* ≠ Σε, such that the following hold:

(a) HZ[Σε − Σε*] = Σε − Σε*,
(b) Σu* = Σu + (Z'Z)⁻¹Z'[Σε − Σε*]Z(Z'Z)⁻¹,
(c) ρ* = Σu*⁻¹Σuρ,
(d) ση*² = ση² + ρ'Σu(Σu⁻¹ − Σu*⁻¹)Σuρ.

Proof: Nonidentifiability of the model is equivalent to the existence of nonidentical θ and θ* in Θ̃ satisfying (2.2), (2.12) and (2.13).

First suppose that (2.2), (2.12) and (2.13) hold for some θ ≠ θ*. By the argument given in Theorem 2.4.1, (2.2) implies that (a) and (b) hold. Since Z is of full column rank, (2.12) yields (c). Substituting (c) into (2.13) yields (d). We easily see that, if θ ≠ θ* and if (a) through (d) hold, then Σε ≠ Σε*.

Now suppose that conditions (a) through (d) hold for some Σε ≠ Σε*. Again, by the argument given in Theorem 2.4.1, (2.2) holds. We easily see that (2.12) and (2.13) also hold. ✷

Bibliography

[1] Demidenko, E. (2004) Mixed Models: Theory and Applications. John Wiley & Sons.
[2] Graybill, F. A. (1983) Matrices with Applications in Statistics, 2nd edition. Wadsworth.
[3] Heckman, N. and Wang, W. (2007) Linear mixed models for measurement error in functional regression. Manuscript, Department of Statistics, University of British Columbia.
[4] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432.
[5] McCulloch, C. E. and Searle, S. R. (2001) Generalized, Linear and Mixed Models. John Wiley & Sons.
[6] Müller, H. G. (2005) Functional modeling and classification of longitudinal data. Scandinavian Journal of Statistics 32, 223-240.
[7] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003) Semiparametric Regression. Cambridge University Press.
[8] Verbyla, A. P., Cullis, B. R., Kenward, M. G. and Welham, S. J. (1999) The analysis of designed experiments and longitudinal data by using smoothing splines. Applied Statistics 48, 269-311.

Chapter 3

Linear mixed models for measurement error in functional regression

(A version of this chapter will be submitted for publication. Authors: Heckman, N. and Wang, W.)

3.1 Introduction

Regression models with a functional predictor Z(·) and a scalar response Y have been extensively studied (see, e.g., Ramsay and Silverman, 2005, and references therein). For individual i, the dependence of Yi on Zi is modelled as

    Yi = β0 + ∫_a^b β(t)[Zi(t) − E(Zi(t))] dt + ei.  (3.1)

The goal is to estimate β. Data are collected from N independent individuals, with data on individual i, i = 1, . . . , N, being Yi and Zij ≡ Zi(tij), j = 1, . . . , ni. If the Zi processes are observed with error, then our data on individual i are Yi and zij, j = 1, . . . , ni, with

    zij = Zi(tij) + εij,  Cov(εi1, . . . , εini) = Σεi.  (3.2)

Here, we consider data where the Zij's are observed with error, which could be measurement error or modelling error, and we model the function Zi using random effects with a set of basis functions φ1, . . . , φK.
In our estimation process, we approximate Zi(·) as

    Zi(t) = µ(t) + Σ_{k=1}^{K} xik φk(t),  (3.3)

where µ is an arbitrary function and the xi ≡ (xi1, . . . , xiK)' are independent and normally distributed with mean vector 0 and covariance matrix Σx.

This approach was taken by James (2002), Müller (2005) and James and Silverman (2005), for data with and without measurement error. James used φk's equal to a basis for natural cubic splines and also used this basis for modelling µ. James's approach is similar to that described in Section 3.3. However, we implement a faster algorithm, the ECME (Expectation/Conditional Maximization Either) algorithm of Liu and Rubin (1994), and consider Hessian matrix based and bootstrapped standard errors.

Müller used φk's equal to the first K estimated eigenfunctions of the Zi process. Müller's approach can be separated into two parts: the first part uses only the zij's to determine µ, the φk's and the xik's; the second part incorporates the Yi's to estimate β0 and β. The first part uses the PACE method (principal analysis through conditional expectation) of Yao, Müller and Wang (2005). In PACE, Yao et al. smooth the observed zij's to obtain an estimate µ̂ of µ and then centre the data by subtracting µ̂. Next, the authors smooth the centred data to estimate the covariance function of the Zi's and then estimate the first K eigenfunctions and eigenvalues of the estimated covariance function. Denote these estimates by φ1, . . . , φK and λ̂1, . . . , λ̂K. Let x̂ik be the best linear unbiased estimate of individual i's kth PC score, x̂ik = E(xik | zi1, . . . , zini), calculated assuming normality, (3.2) with Σεi = σ²I and (3.3) with Σx = diag(λ̂1, . . . , λ̂K). The second part of Müller's methodology involves regressing Yi on x̂i1, . . . , x̂iK to estimate βk = ∫ β(t)φk(t) dt and then setting β̂(t) = Σ_k β̂kφk(t). Müller justifies the use of this regression by showing that E(Yi | zij, j = 1, . . . , ni) = β0 + Σ_{k=1}^{∞} βk x̂ik, for PC scores x̂ik calculated with the true eigenfunctions.

Müller's approach has a big computational advantage over James's (2002), in that it can fit the Zi's well with K fairly small. An added benefit of this eigenfunction approach is that we focus on "estimable directions" of β. For instance, consider the extreme case where Zi can be written exactly as Zi(t) = µ(t) + Σ_{k=1}^{K} xikφk(t), so that Zi has no variability in directions orthogonal to the φk's. Since ∫ β(t)[Zi(t) − E(Zi(t))] dt = Σ_k xik ∫ β(t)φk(t) dt, we can only hope to estimate ∫ β(t)φk(t) dt, k = 1, . . . , K. We cannot estimate ∫ β(t)f(t) dt for f orthogonal to the φk's. This issue was noted by James and Silverman (2005), who handled it by adding a penalty term which penalizes β's that have ∫ β(t)f(t) dt big when Var(∫ Z(t)f(t) dt) is small.

Müller's approach has the disadvantage that it does not fully use the Yi's: the xik's in (3.3) are estimated using only the zij's. Clearly, the Yi's also provide information about the xik's if there is a relationship between Yi and Zi, that is, if β ≠ 0.
As James and Silverman (2005) note, "It is an interesting feature of this problem that the responses provide additional information in the estimation of the $Z_i$'s". Also, Müller's calculation, that $E(Y_i\,|\,z_{ij}, j = 1,\ldots,n_i) = \beta_0 + \sum_{k=1}^{\infty}\beta_k\,\hat x_{ik}$, does not hold if the eigenvalues or eigenfunctions are incorrect. In particular, the calculation relies on $\mathrm{Cov}(x_{ij}, x_{ik}) = 0$ for $j \ne k$.

We consider a hybrid approach. Like Müller, we use $\phi_1,\ldots,\phi_K$ equal to the first $K$ estimated eigenfunctions of the $Z_i$ process. Thus we not only improve on James's choice of $\phi_k$'s but also focus on "estimable directions" of $\beta$. We then treat these $\phi_k$'s as fixed and known in our analysis. We use all of the data, the $Y_i$'s and the $z_{ij}$'s, to estimate the $x_{ik}$'s, and we do not place any restrictions on $\Sigma_x$. Thus we improve on Müller's procedure, where the $x_{ik}$'s are estimated using only the $z_{ij}$'s, and $\Sigma_x$ is assumed diagonal and is estimated completely by the eigenanalysis of the $z_{ij}$'s. Our detailed parameter estimation procedure using the ECME algorithm is in Section 3.3.

In this work, we also propose test statistics for hypothesis testing of $\beta(\cdot)$. In Section 3.4.1, we consider testing the nullity of the function $\beta$, i.e. testing $H_o: \beta(t) = 0$ for all $t \in [a,b]$. In the two sample situation, we consider testing the equality of a selection $\beta^s$ and a control $\beta^c$, i.e. testing $H_o: \beta^s(t) = \beta^c(t)$ for all $t \in [a,b]$. We propose a new integrated t-statistic and three test statistics that are more standard. In Section 3.5, we derive expressions for the residuals of the model fit and discuss model diagnostics based on the analysis of residuals. In Section 3.6.2, we apply the model to analyze a biological data set. In Section 3.7, via a simulation study, we compare our ECME estimate of $\beta$ to a modification of Müller's (2005) two-stage estimate and we also study the performance of the different test statistics. Our detailed calculations to derive the ECME estimates are in Appendix B.

3.2 Notation and Preliminaries

Before we fit the model, we introduce some notation and carry out preliminary calculations. In this section and the next, we suppose that the $\phi_k$'s are fixed and known. In practice, however, we will estimate them from an eigenanalysis of the $z_{ij}$'s.

For ease, assume $n_i \equiv n$ and $t_{ij} = t_j$. Suppose that (3.1), (3.2) and (3.3) above hold. Let $Z_i = (Z_{i1},\ldots,Z_{in})'$, $\mu = (\mu(t_1),\ldots,\mu(t_n))'$, let $\epsilon_i = (\epsilon_{i1},\ldots,\epsilon_{in})'$ represent errors, and let $z_i = (z_{i1},\ldots,z_{in})'$ be the observed values. Write
$$z_i = Z_i + \epsilon_i \equiv \mu + Ax_i + \epsilon_i, \qquad (3.4)$$
where $x_i \overset{i.i.d.}{\sim} N(0, \Sigma_x)$, $A_{jk} = \phi_k(t_j)$, $j = 1,\ldots,n$, $k = 1,\ldots,K$, $\epsilon_i \overset{i.i.d.}{\sim} N(0, \sigma_\epsilon^2 R)$ with $R$ known, symmetric and positive definite, and $\{x_1,\ldots,x_N\}$ independent of $\{\epsilon_1,\ldots,\epsilon_N\}$.

If $R$ is known, then the model is identifiable (Wang, 2007). However, if $\sigma_\epsilon^2 R$ is unknown, then the model is not identifiable (Wang, 2007). Wang also discusses model identifiability under other assumptions on the covariance matrix of $\epsilon_i$.

We choose a basis for $\beta$ and write
$$\beta(t) = \sum_{j=1}^{J} \beta_j \psi_j(t) \equiv \beta'\psi(t).$$
Typically, we will take $J = K$ and $\psi_j = \phi_j$, but we write for the general case. Thus we can write
$$Y_i = \beta_0 + \beta' T x_i + e_i, \quad \text{where } T_{jk} = \int_a^b \psi_j(t)\,\phi_k(t)\,dt \text{ and } e_i \sim N(0, \sigma^2). \qquad (3.5)$$
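To make the construction concrete, here is a minimal R sketch (ours, not the thesis's code) that builds the matrix $A$ of (3.4) and the $J \times K$ matrix $T$ of (3.5) by trapezoidal quadrature; the lists phi and psi of basis functions are hypothetical stand-ins for whatever basis is chosen.

make_AT <- function(tgrid, phi, psi) {
  # A[j, k] = phi_k(t_j): columns of A are the basis functions evaluated on the grid
  A <- sapply(phi, function(f) f(tgrid))
  # trapezoid weights for a (possibly unequally spaced) grid t_1 < ... < t_n
  w <- c(diff(tgrid), 0)/2 + c(0, diff(tgrid))/2
  Psi <- sapply(psi, function(f) f(tgrid))
  Tmat <- t(Psi * w) %*% A       # Tmat[j, k] approximates int psi_j(t) phi_k(t) dt
  list(A = A, Tmat = Tmat)
}

When $J = K$, $\psi_j = \phi_j$ and the $\phi_k$'s are orthonormal eigenfunctions, Tmat is approximately the identity matrix.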
We easily see that $z_i$, $x_i$ and $Y_i$ are jointly multivariate normal with
$$E(z_i) = \mu, \quad E(Y_i) = \beta_0, \quad E(x_i) = 0,$$
and
$$\mathrm{Var}(Y_i) \equiv \sigma_Y^2 = \beta' T\Sigma_x T'\beta + \sigma^2, \quad \mathrm{Cov}(z_i) \equiv \Sigma_z = A\Sigma_x A' + \sigma_\epsilon^2 R, \quad \mathrm{Cov}(z_i, Y_i) \equiv \Sigma_{z,Y} = A\Sigma_x T'\beta,$$
and
$$\mathrm{Cov}(z_i, x_i) \equiv \Sigma_{z,x} = A\Sigma_x, \quad \mathrm{Cov}(Y_i, x_i) \equiv \Sigma_{Y,x} = \beta' T\Sigma_x. \qquad (3.6)$$

Let $W_i = (z_i', Y_i)'$ be the $i$th subject's observation and let $\mu_W$ denote $E(W_i)$. Let
$$C = \begin{pmatrix} A \\ \beta' T \end{pmatrix} \qquad (3.7)$$
and let $\Sigma_d$ be the block diagonal matrix
$$\Sigma_d = \mathrm{diag}(\sigma_\epsilon^2 R, \sigma^2). \qquad (3.8)$$
Then
$$\mathrm{Cov}(W_i) \equiv \Sigma_W = C\Sigma_x C' + \Sigma_d \qquad (3.9)$$
and the log-likelihood of the observed data is
$$\Lambda_N = -\frac{N}{2}\ln\det\left(C\Sigma_x C' + \Sigma_d\right) - \frac{1}{2}\sum_{i=1}^{N}(W_i - \mu_W)'\left(C\Sigma_x C' + \Sigma_d\right)^{-1}(W_i - \mu_W)$$
up to an additive constant term. The unknown model parameters are $\theta = (\mu, \beta_0, \Sigma_x, \sigma_\epsilon^2, \beta, \sigma^2)$. Directly maximizing $\Lambda_N$ over $\theta$ does not give us closed forms of the parameter estimates except for $\mu$ and $\beta_0$. So we need to rely on iterative numerical methods to find the estimates. We will elaborate on these methods in the next section.
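The log-likelihood above is straightforward to evaluate for given parameter values. A minimal R sketch (ours), assembling $\Sigma_W$ as in (3.9):

loglik <- function(W, mu_W, C, Sx, Sd) {
  # W: N x (n+1) matrix whose rows are the W_i'; returns Lambda_N up to a constant
  SW <- C %*% Sx %*% t(C) + Sd
  Wc <- sweep(W, 2, mu_W)                  # rows are (W_i - mu_W)'
  Q  <- sum(Wc * t(solve(SW, t(Wc))))      # sum_i (W_i - mu_W)' SW^{-1} (W_i - mu_W)
  -0.5 * nrow(W) * as.numeric(determinant(SW, logarithm = TRUE)$modulus) - 0.5 * Q
}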
3.3 Parameter estimation

In this section, we use the ECME (Expectation/Conditional Maximization Either) algorithm (Liu and Rubin, 1994) to estimate the model parameters in $\theta$. In this balanced design, the MLEs of $\mu$ and $\beta_0$ are $\bar z$ and $\bar Y$ respectively. So throughout we take
$$\mu^{(t)} = \hat\mu = \bar z \quad \text{and} \quad \beta_0^{(t)} = \hat\beta_0 = \bar Y, \quad \text{and so } \mu_W^{(t)} = \bar W, \quad t = 0, 1, \ldots.$$
We estimate the other parameters iteratively and sequentially. Given $\theta^{(t)}$, the parameter estimates at iteration $t$, we update one component of $\theta^{(t)}$ at a time, holding the other components fixed. We treat $(z_i, Y_i, x_i)$, $i = 1,\ldots,N$, as the complete data. We update $\sigma_\epsilon^{2(t)}$ by finding its EM estimate. That is, we find its estimate by maximizing the conditional expected complete data log-likelihood function, where we condition on the observed data. The other components of $\theta^{(t)}$ are updated by maximizing $\Lambda_N$ directly.

Throughout our ECME procedure in Sections 3.3.2-3.3.4, we make the following assumptions.

(a) $A$ is of full column rank;
(b) $T$ is of full row rank;
(c) there exist no $u$ and $V$ such that, for all $i = 1,\ldots,N$, $z_i = u + Vx_i$;
(d) there exists no $v$ such that, for all $i = 1,\ldots,N$, $Y_i = \bar Y + v'(z_i - \bar z)$.

The restrictions on $A$ and $T$ are easily satisfied. Assumption (b) requires $J$, the number of $\psi_j$ basis functions, to be no larger than $K$, the number of $\phi_k$ basis functions. Typically, we will take $J = K$ and $\psi_j = \phi_j$. Assumptions (c) and (d) are common for data where there is noise. We notice that assumption (a) implies that the matrix $C$ defined in (3.7) is also of full column rank.

3.3.1 Initial estimates of parameters other than $\mu$ and $\beta_0$

We choose the initial estimates as follows. We take $\beta^{(0)}$ to be a vector of zeroes. If this were the true value of $\beta$, then we could simply estimate $\sigma^2$ by the sample variance of the $Y_i$'s. To account for the fact that $\beta$ may not be zero, and thus the sample variance of the $Y_i$'s would overestimate $\sigma^2$, we take $\sigma^{2(0)}$ equal to 0.6 times the sample variance of the $Y_i$'s.

The values of $\Sigma_x^{(0)}$ and $\sigma_\epsilon^{2(0)}$ are based on the penalized eigenanalysis of the $z_i$'s sample covariance matrix described in Section 3.6.1. These initial estimates are sensible if $R$ is the identity matrix, but can still be used if $R$ is not the identity. Roughly, the eigenanalysis in Section 3.6.1 partitions the variability of the $z_i$'s into two parts: variability from the $Z_i$ process and variability from the noise. Let $\lambda_1,\ldots,\lambda_n$ denote the calculated eigenvalues and suppose the largest $K$ sufficiently describe the variability in the $z_i$'s. So we will use $K$ eigenfunctions. A reasonable first estimate of $\Sigma_x$ is $\Sigma_x^{(0)}$ diagonal with entries $\lambda_1,\ldots,\lambda_K$. We take $\sigma_\epsilon^{2(0)}$ equal to $\sum_{k=K+1}^{n}\lambda_k/(n-K)$, explaining the remaining variability in the $z_{ij}$'s.

Clearly, under assumptions (a)-(d), we can force $\sigma^{2(0)}$ and $\sigma_\epsilon^{2(0)}$ to be positive and $\Sigma_x^{(0)} > 0$. Given $\theta^{(t)}$, we update $\Sigma_x$, $\sigma_\epsilon^2$, $\beta$ and $\sigma^2$, as described below.

3.3.2 Updating $\Sigma_x^{(t)}$

We update $\Sigma_x^{(t)}$ by maximizing $\Lambda_N$ over $\Sigma_x$ while keeping the other parameters fixed. Let $S_W = \sum_{i=1}^{N}(W_i - \bar W)(W_i - \bar W)'/N$. We show that if $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive and if $S_W - \Sigma_d^{(t)} > 0$, then our update $\Sigma_x^{(t+1)}$ is positive definite, and using it in the log likelihood instead of $\Sigma_x^{(t)}$ increases the log likelihood.

With detailed derivation in Section B.4, differentiating $\Lambda_N$ with respect to $\Sigma_x$ and equating to zero yields the first order condition
$$C'\Sigma_W^{-1}C = C'\Sigma_W^{-1}S_W\Sigma_W^{-1}C. \qquad (3.10)$$
Here, $C$ depends on $\beta^{(t)}$ and $\Sigma_W = C\Sigma_x C' + \Sigma_d$ from (3.9). Equation (3.10) holds provided $\Sigma_W$ is invertible at the critical value of $\Sigma_x$. Since we assume that $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive, $\Sigma_d$ is positive definite. So $\Sigma_W$ will be invertible provided $\Sigma_x$ is non-negative definite.

We now solve (3.10) for $\Sigma_x$, first deriving two useful identities, (3.11) and (3.12). For ease, we drop the hats and superscript $t$'s on the parameter estimates that are being held fixed, that is, on $\hat\mu_W$, $\sigma_\epsilon^{2(t)}$, $\beta^{(t)}$ and $\sigma^{2(t)}$. Direct multiplication and some manipulation of the left hand side of the following shows that
$$C'\Sigma_d^{-1}C \times \left[(C'\Sigma_d^{-1}C)^{-1} + \Sigma_x\right] \times C'\Sigma_W^{-1} = C'\Sigma_d^{-1}.$$
Solving this for $C'\Sigma_W^{-1}$ yields
$$C'\Sigma_W^{-1} = \left[(C'\Sigma_d^{-1}C)^{-1} + \Sigma_x\right]^{-1}(C'\Sigma_d^{-1}C)^{-1}C'\Sigma_d^{-1}. \qquad (3.11)$$
Postmultiplying both sides of identity (3.11) by $C$ yields
$$C'\Sigma_W^{-1}C = \left[(C'\Sigma_d^{-1}C)^{-1} + \Sigma_x\right]^{-1}. \qquad (3.12)$$
Substituting (3.11) into the right side of (3.10) and (3.12) into the left side of (3.10) yields
$$(C'\Sigma_d^{-1}C)^{-1} + \Sigma_x = F S_W F',$$
where $F = (C'\Sigma_d^{-1}C)^{-1}C'\Sigma_d^{-1}$. Note that $F$ is of full row rank. Thus, the critical point is
$$\hat\Sigma_x = F S_W F' - (C'\Sigma_d^{-1}C)^{-1} = F(S_W - \Sigma_d)F', \qquad (3.13)$$
which is strictly positive definite. And so, clearly, we have $\Sigma_W$ invertible at the critical point.

To see that the updated $\hat\Sigma_x$ leads to an increase in $\Lambda_N$, we show that the Hessian matrix $H(\Sigma_x)$ evaluated at $\hat\Sigma_x$ is negative definite. The $ij$th element of $H(\Sigma_x)$ is the second order partial derivative of $\Lambda_N$ with respect to the $i$th and $j$th elements of the vectorized $\Sigma_x$. From calculations in Section B.4, we have
$$H(\hat\Sigma_x) = -(N/2)\,\hat D \otimes \hat D, \quad \text{where } \hat D = C'\hat\Sigma_W^{-1}C, \qquad (3.14)$$
which is clearly negative definite.
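The update (3.13) is a short matrix computation. A minimal R sketch (ours), assuming $S_W$, $C$ and $\Sigma_d$ have already been formed:

update_Sx <- function(SW, C, Sd) {
  # Sigma_x update of (3.13): F (SW - Sd) F' with F = (C' Sd^{-1} C)^{-1} C' Sd^{-1}
  CtSdi <- t(C) %*% solve(Sd)
  F <- solve(CtSdi %*% C, CtSdi)
  F %*% (SW - Sd) %*% t(F)
}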
3.3.3 Updating $\sigma_\epsilon^{2(t)}$

We update $\sigma_\epsilon^{2(t)}$, holding all other parameter estimates fixed, using one E-step and one M-step of the EM algorithm. We show that if $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive and if $\Sigma_x^{(t)} > 0$, then our update $\sigma_\epsilon^{2(t+1)}$ is positive. Increase of the log likelihood after updating $\sigma_\epsilon^{2(t)}$ by $\sigma_\epsilon^{2(t+1)}$ is guaranteed by the property of the EM algorithm.

Recall that $(z_i, Y_i, x_i)$, $i = 1,\ldots,N$, are our complete data and $W_i \equiv (z_i', Y_i)'$, $i = 1,\ldots,N$, are the observed data. In conditional expectations, we let "$\cdot$" stand for the observed data. Abusing notation slightly, we let $f$ denote a generic density function with the exact meaning clear from the arguments.

The E-step of the EM algorithm calculates $E_{\theta^{(t)}}\left[\sum_{i=1}^{N}\ln f(z_i, Y_i, x_i)\,|\,\cdot\right]$ and the M-step maximizes this conditional expectation over $\sigma_\epsilon^2$ to obtain $\sigma_\epsilon^{2(t+1)}$. By the conditional independence of $z_i$ and $Y_i$ given $x_i$,
$$\ln f(z_i, Y_i, x_i) \equiv \ln f(z_i|x_i) + \ln f(Y_i|x_i) + \ln f(x_i).$$
Since only $\ln f(z_i|x_i)$ contains $\sigma_\epsilon^2$, we can ignore the last two terms and obtain $\sigma_\epsilon^{2(t+1)}$ via maximizing $E_{\theta^{(t)}}\left[\sum_{i=1}^{N}\ln f(z_i|x_i)\,|\,\cdot\right]$ over $\sigma_\epsilon^2$. From (3.4), we first get
$$\sum_{i=1}^{N}\ln f(z_i|x_i) = -\frac{N}{2}\ln\det(\sigma_\epsilon^2 R) - \frac{1}{2\sigma_\epsilon^2}\sum_{i=1}^{N}(z_i - \mu - Ax_i)'R^{-1}(z_i - \mu - Ax_i).$$
Following (3.6), we have
$$\mathrm{Cov}(W_i, x_i) = C\Sigma_x, \qquad (3.15)$$
which then leads to the conditional mean and covariance of $x_i$ given $W_i$:
$$E[x_i|W_i] \equiv \mu_{x_i|W_i} = \Sigma_x C'\Sigma_W^{-1}(W_i - \mu_W), \qquad (3.16)$$
$$\mathrm{Cov}[x_i|W_i] \equiv \Sigma_{x|W} = \Sigma_x - \Sigma_x C'\Sigma_W^{-1}C\Sigma_x. \qquad (3.17)$$
Let
$$\tilde s^{(t)} = \sum_{i=1}^{N}\left(z_i - \hat\mu - A\mu_{x_i|W_i}^{(t)}\right)'R^{-1}\left(z_i - \hat\mu - A\mu_{x_i|W_i}^{(t)}\right).$$
Routine calculations yield
$$E_{\theta^{(t)}}\left[\sum_{i=1}^{N}\ln f(z_i|x_i)\,\Big|\,\cdot\right] = -\frac{N}{2}\ln(\det R) - \frac{nN}{2}\ln\sigma_\epsilon^2 - \frac{1}{2\sigma_\epsilon^2}\left[\tilde s^{(t)} + N\,\mathrm{tr}\left(R^{-1}A\Sigma_{x|W}^{(t)}A'\right)\right].$$
Differentiating this conditional mean with respect to $\sigma_\epsilon^2$ and equating the derivative to zero yields
$$\sigma_\epsilon^{2(t+1)} = \frac{1}{nN}\tilde s^{(t)} + \frac{1}{n}\mathrm{tr}\left[R^{-1}A\Sigma_{x|W}^{(t)}A'\right]. \qquad (3.18)$$
We show that the update $\sigma_\epsilon^{2(t+1)}$ is positive in the following. The first term in $\sigma_\epsilon^{2(t+1)}$ is positive by assumption (c) and the fact that $R$ is positive definite. The second term is nonnegative by the following argument. Using the famous matrix identity
$$(V\Sigma V' + \Sigma_0)^{-1} = \Sigma_0^{-1} - \Sigma_0^{-1}V\left(\Sigma^{-1} + V'\Sigma_0^{-1}V\right)^{-1}V'\Sigma_0^{-1},$$
provided the matrix orders are properly defined, we see that
$$\Sigma_{x|W}^{(t)} = \left[\left(\Sigma_x^{(t)}\right)^{-1} + C^{(t)\prime}\left(\Sigma_d^{(t)}\right)^{-1}C^{(t)}\right]^{-1},$$
which is positive definite. Given $\Sigma_{x|W}^{(t)} > 0$, assumption (a) then implies that $A\Sigma_{x|W}^{(t)}A' \ge 0$. Together with the fact that $R > 0$, the second term in (3.18) is thus nonnegative.
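A minimal R sketch (ours) of this EM update, combining the conditional moments (3.16)-(3.17) with (3.18); all arguments are current parameter values:

update_sig_eps2 <- function(z, W, mu, mu_W, A, C, Sx, SW, R) {
  N <- nrow(z); n <- ncol(z)
  SxCt <- Sx %*% t(C)
  mu_x <- t(SxCt %*% solve(SW, t(sweep(W, 2, mu_W))))  # rows are E[x_i | W_i], eq (3.16)
  Sx_W <- Sx - SxCt %*% solve(SW, t(SxCt))             # Cov[x_i | W_i], eq (3.17)
  resid <- sweep(z, 2, mu) - mu_x %*% t(A)             # rows are z_i - mu - A E[x_i | W_i]
  Rinv <- solve(R)
  s_tilde <- sum(resid * (resid %*% Rinv))             # sum_i r_i' R^{-1} r_i
  s_tilde/(n * N) + sum(diag(Rinv %*% A %*% Sx_W %*% t(A)))/n   # eq (3.18)
}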
3.3.4 Updating $\beta^{(t)}$ and $\sigma^{2(t)}$

The updates of $\beta^{(t)}$ and $\sigma^{2(t)}$ maximize $\Lambda_N$ over $\beta$ and $\sigma^2$, holding the other parameters fixed. Suppose that $\sigma_\epsilon^{2(t)} > 0$ and $\Sigma_x^{(t)} > 0$. We find unique critical points, $\hat\beta$ and $\hat\sigma^2$, and show that they increase the log likelihood provided $\hat\sigma^2 > 0$.

Note that $\log f(Y_i, z_i) = \log f(Y_i|z_i) + \log f(z_i)$ and that $\log f(z_i)$ doesn't depend on $\beta$ or $\sigma^2$. We also note that, given $z_i$, $Y_i$ is normal with mean and variance
$$E(Y_i|z_i) \equiv \beta_0 + \beta'G(z_i - \mu), \quad \sigma_{Y|z}^2 \equiv \mathrm{Var}(Y_i|z_i) = \beta'K\beta + \sigma^2, \qquad (3.19)$$
where
$$G = T\Sigma_x A'\Sigma_z^{-1} \qquad (3.20)$$
and $K = T\Sigma_x T' - T\Sigma_x A'\Sigma_z^{-1}A\Sigma_x T'$. Therefore, to maximize $\Lambda_N$ with respect to $\beta$ and $\sigma^2$, we maximize
$$\tilde\Lambda_N = -\frac{N}{2}\ln(\beta'K\beta + \sigma^2) - \frac{1}{2(\beta'K\beta + \sigma^2)}\sum_{i=1}^{N}\left[Y_i - \beta_0 - \beta'G(z_i - \mu)\right]^2. \qquad (3.21)$$
With detailed derivation in Section B.5, equating $\partial\tilde\Lambda_N/\partial\beta$ and $\partial\tilde\Lambda_N/\partial\sigma^2$ to zero yields, respectively,
$$\frac{1}{\beta'K\beta + \sigma^2}\sum_{i=1}^{N}\left[Y_i - \beta_0 - \beta'G(z_i - \mu)\right]G(z_i - \mu) = 0 \qquad (3.22)$$
and
$$\frac{1}{(\beta'K\beta + \sigma^2)^2}\left\{\sum_{i=1}^{N}\left[Y_i - \beta_0 - \beta'G(z_i - \mu)\right]^2 - N(\beta'K\beta + \sigma^2)\right\} = 0. \qquad (3.23)$$
Note that $G$ is of full row rank because of the following two observations. First, $T$ is of full row rank by assumption (b). Second, the matrix $\Sigma_z = A\Sigma_x^{(t)}A' + \sigma_\epsilon^{2(t)}R$ is invertible since it is positive definite. Let
$$M = G\sum_{i=1}^{N}(z_i - \mu)(z_i - \mu)'G'.$$
Then, by assumption (c), $M$ is positive definite. Solving (3.22) for $\beta$ and (3.23) for $\sigma^2$ gives
$$\beta^{(t+1)} = \hat\beta = M^{-1}G\sum_{i=1}^{N}(z_i - \mu)(Y_i - \beta_0), \quad \sigma^{2(t+1)} = \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left[Y_i - \beta_0 - \hat\beta'G(z_i - \mu)\right]^2 - \hat\beta'K\hat\beta. \qquad (3.24)$$
Unfortunately, we are not guaranteed that $\hat\sigma^2$ is positive. However, in all of our data analyses and simulation studies, the final estimate of $\sigma^2$ was always positive.

Again, to check that the update increases $\tilde\Lambda_N$, we show that the Hessian matrix is negative definite. We notice that (3.24) implies
$$\hat\sigma_{Y|z}^2 \equiv \hat\beta'K\hat\beta + \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left[Y_i - \beta_0 - \hat\beta'G(z_i - \mu)\right]^2, \qquad (3.25)$$
which is positive by assumption (d). With detailed calculation in Section B.5, the Hessian matrix $H\tilde\Lambda(\beta, \sigma^2)$, when evaluated at $\hat\beta$ and $\hat\sigma^2$, equals
$$H\tilde\Lambda(\hat\beta, \hat\sigma^2) = -\frac{N}{(\hat\sigma_{Y|z}^2)^2}\begin{pmatrix} 2K\hat\beta\hat\beta'K + \frac{\hat\sigma_{Y|z}^2}{N}M & K\hat\beta \\ \hat\beta'K & \frac{1}{2} \end{pmatrix}. \qquad (3.26)$$
It follows that $H\tilde\Lambda(\hat\beta, \hat\sigma^2) < 0$ by the following argument. Let $x_1 \in \mathbb{R}^J$ and $x_2 \in \mathbb{R}$ with at least one of $x_1$ or $x_2$ non-zero. Direct calculation yields
$$(x_1', x_2)\,H\tilde\Lambda_N(\hat\beta, \hat\sigma^2)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = -\frac{N}{(\hat\sigma_{Y|z}^2)^2}\left[\left(\frac{x_2}{\sqrt 2} + \sqrt 2\,x_1'K\hat\beta\right)^2 + \frac{\hat\sigma_{Y|z}^2}{N}x_1'Mx_1\right] < 0.$$
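The updates in (3.24) are also simple to code. A minimal R sketch (ours), with Kmat standing for the matrix $K$ of (3.19):

update_beta_sigma2 <- function(z, Y, mu, beta0, G, Kmat) {
  zc <- sweep(z, 2, mu)                    # rows are (z_i - mu)'
  Gz <- zc %*% t(G)                        # rows are (G(z_i - mu))'
  M  <- crossprod(Gz)                      # M = G sum_i (z_i - mu)(z_i - mu)' G'
  beta <- solve(M, crossprod(Gz, Y - beta0))
  r <- Y - beta0 - drop(Gz %*% beta)
  sigma2 <- mean(r^2) - drop(t(beta) %*% Kmat %*% beta)  # not guaranteed positive
  list(beta = drop(beta), sigma2 = sigma2)
}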
3.4 Inference for $\beta$

Given the estimate $\hat\beta$ of $\beta$, we estimate the function $\beta$ by $\hat\beta(\cdot) = \hat\beta'\psi(\cdot)$. If the covariance matrix of $\hat\beta$ is $\Sigma_\beta$, then the covariance function of $\hat\beta(\cdot)$, denoted $V_\beta$, is
$$V_\beta(s,t) = \mathrm{Cov}\left(\hat\beta'\psi(s), \hat\beta'\psi(t)\right) = \psi(s)'\Sigma_\beta\psi(t).$$
We base inference for $\beta$ on $\hat\beta$ and an estimate of $\Sigma_\beta$.

We estimate $\Sigma_\beta$ in two ways: using the bootstrap, by resampling the observed $W_i$'s with replacement, or using the observed Hessian matrix $H\tilde\Lambda_N(\hat\beta, \hat\sigma^2)$ defined in (3.26). We take $\hat\Sigma_\beta$ as the $K \times K$ upper corner of the inverse of $-H\tilde\Lambda_N(\hat\beta, \hat\sigma^2)$. In doing this, we are treating the other parameters as known, ignoring the variability introduced by estimating them. Thus, we expect that we may underestimate $V_\beta(t,t)$, while we don't anticipate underestimation with the bootstrap estimate. However, the Hessian-based estimate is very fast to compute.

3.4.1 Hypothesis testing for $\beta$

Testing that $\beta \equiv 0$

To determine if $Y_i$ depends on $Z_i(\cdot)$ we test $H_o: \beta(t) = 0$ for all $t \in [a,b]$. We consider three test statistics.

The first test statistic is the generalized likelihood ratio statistic
$$U_l = \sup\Lambda_N - \sup_{\beta = 0}\Lambda_N.$$
The unrestricted supremum of $\Lambda_N$ is achieved at the ECME estimates described in Section 3.3. We can also use the ECME procedure to calculate the restricted supremum. To obtain it, we observe that under the restriction $\beta = 0$, $z_i$ and $Y_i$ are independent, with $z_i \sim N(\mu, A\Sigma_x A' + \sigma_\epsilon^2 R)$ and $Y_i \sim N(\beta_0, \sigma^2)$. Thus $\hat\mu = \bar z$, $\hat\beta_0 = \bar Y$ and $\hat\sigma^2 = \sum(Y_i - \bar Y)^2/N$. We then calculate $\Sigma_x^{(t)}$ and $\sigma_\epsilon^{2(t)}$ by an ECME method treating these estimates of $\mu$, $\beta_0$ and $\sigma^2$ as known. We update $\Sigma_x$ by maximizing $\Lambda_N$ directly while holding $\sigma_\epsilon^2$ fixed. We update $\sigma_\epsilon^2$ by finding its EM estimate $\sigma_\epsilon^{2(t+1)}$ while holding $\Sigma_x$ fixed. We iterate until convergence occurs.

The second statistic considered is Wald's test statistic using $\hat\beta$, the vector of estimated basis coefficients:
$$U_w = \hat\beta'\hat\Sigma_\beta^{-1}\hat\beta.$$
It is interesting to note that this test statistic can be re-written in terms of a vector of function evaluations of $\hat\beta(\cdot)$. To see this, let $t_i^*$, $i = 1,\ldots,n^*$, be a sequence of time points, let $\tilde\beta$ be the vector containing the values of $\hat\beta(\cdot)$ at the $t_i^*$'s, and let $\Sigma_{\tilde\beta}$ be $\tilde\beta$'s covariance matrix. The Wald test statistic based on $\tilde\beta$ is $\tilde\beta'\hat\Sigma_{\tilde\beta}^+\tilde\beta$, where $\hat\Sigma_{\tilde\beta}^+$ is the Moore-Penrose inverse of an estimate of $\Sigma_{\tilde\beta}$. We now argue that, under mild conditions on the $t_i^*$'s, $\tilde\beta'\hat\Sigma_{\tilde\beta}^+\tilde\beta = U_w$. Define the $n^* \times J$ matrix $\Psi$ as $\Psi_{ij} = \psi_j(t_i^*)$ and suppose that $\Psi$ is of full column rank. Since $\tilde\beta = \Psi\hat\beta$, $\Sigma_{\tilde\beta} = \Psi\Sigma_\beta\Psi'$ and thus it is natural to take $\hat\Sigma_{\tilde\beta} = \Psi\hat\Sigma_\beta\Psi'$. Since $\hat\Sigma_{\tilde\beta}^+ = \Psi^{+\prime}\hat\Sigma_\beta^{-1}\Psi^+$ and $\Psi^+\Psi = I$, $\tilde\beta'\hat\Sigma_{\tilde\beta}^+\tilde\beta = U_w$.

The third statistic is the integrated t-statistic
$$U_f = \int_a^b \frac{\hat\beta^2(t)}{\hat V_\beta(t,t)}\,dt.$$

To calculate the null distribution of a test statistic, and thus calculate a p-value, we use a permutation type method, one that does not rely on distributional assumptions. Under the null hypothesis that $\beta(t) \equiv 0$ for all $t \in [a,b]$, $Y_i$ and $z_i$ are independent. Therefore the joint distribution of $(z_i, Y_i)$ is the same as that of $(z_i, Y_{i'})$ where $i'$ is chosen at random from $\{1,\ldots,N\}$. To simulate the null distribution of a test statistic, we make $Q$ new data sets via permuting the $Y_i$'s: the $q$th data set is simply $\{z_1, Y_{i_1}, \ldots, z_N, Y_{i_N}\}$, where $i_1,\ldots,i_N$ is a random permutation of $1,\ldots,N$. The p-value is then calculated as the proportion of the resulting $Q$ statistic values larger than the original observed value.
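The permutation scheme is easy to implement. A minimal R sketch (ours), where stat_fn is a hypothetical function that refits the model to the supplied data and returns one of the statistics above:

perm_pvalue <- function(z, Y, stat_fn, Q = 500) {
  obs <- stat_fn(z, Y)
  perm <- replicate(Q, stat_fn(z, Y[sample(length(Y))]))  # permute the Y_i's
  mean(perm > obs)                                        # proportion exceeding observed
}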
Testing equality of two $\beta$'s

Often we want to know if the $\beta$'s governing $Y$'s and $Z$'s in two different groups are equal. To denote group membership, we use the superscript "s" or "c" to indicate the "selection" or "control" group respectively. We have data collected independently from the two groups and we want to test $H_o: \beta^s(t) = \beta^c(t)$ for all $t \in [a,b]$. We consider four test statistics.

We assume that the selection and control log-likelihoods, $\Lambda^s_{N^s}$ and $\Lambda^c_{N^c}$, each have the same expressions as in (3.21) but with possibly different parameter values, superscripted by $s$ or $c$.

The first test statistic is from a likelihood ratio test:
$$U_l = \sup\{\Lambda^s_{N^s} + \Lambda^c_{N^c}\} - \sup_{\beta^s = \beta^c}\{\Lambda^s_{N^s} + \Lambda^c_{N^c}\} = \sup\Lambda^s_{N^s} + \sup\Lambda^c_{N^c} - \sup_{\beta^s = \beta^c}\{\Lambda^s_{N^s} + \Lambda^c_{N^c}\}.$$
Each of the two unrestricted suprema is calculated separately, using the ECME estimates from Section 3.3. We can also apply this ECME procedure to calculate the restricted supremum. Under the restriction $\beta^s = \beta^c$, $\Lambda^s_{N^s}$ and $\Lambda^c_{N^c}$ have a common parameter $\beta \equiv \beta^s = \beta^c$. Following the same argument as in Section 3.3, we have
$$\mu^{s(t)} = \bar z^s, \quad \mu^{c(t)} = \bar z^c, \quad \beta_0^{s(t)} = \bar Y^s \quad \text{and} \quad \beta_0^{c(t)} = \bar Y^c.$$
To update each of $\Sigma_x^{s(t)}$ and $\Sigma_x^{c(t)}$, we follow the steps outlined in Section 3.3.2. To update each of $\sigma_\epsilon^{s2(t)}$ and $\sigma_\epsilon^{c2(t)}$, we follow the steps outlined in Section 3.3.3. To update $\beta^{(t)}$, $\sigma^{s2(t)}$ and $\sigma^{c2(t)}$, we proceed as in Section 3.3.4, but with some modification for our updating of $\beta^{(t)}$.

To describe the procedure to update $\beta^{(t)}$, $\sigma^{s2(t)}$ and $\sigma^{c2(t)}$, define $\tilde\Lambda^s_{N^s}$ and $\tilde\Lambda^c_{N^c}$ in a manner analogous to $\tilde\Lambda_N$ in (3.21). By an argument similar to that in Section 3.3.4, we must find $\beta$, $\sigma^{s2}$ and $\sigma^{c2}$ to maximize $\tilde\Lambda^s_{N^s} + \tilde\Lambda^c_{N^c}$. Differentiating $\tilde\Lambda^s_{N^s} + \tilde\Lambda^c_{N^c}$ with respect to $\sigma^{s2}$ and $\sigma^{c2}$ and setting equal to zero yields equations analogous to the equation for $\sigma^{2(t+1)}$ in (3.24). Unfortunately, differentiating $\tilde\Lambda^s_{N^s} + \tilde\Lambda^c_{N^c}$ with respect to $\beta$ and setting equal to zero yields an intractable equation. So we modify our calculation of $\beta$, but retain the above-described updates for $\sigma^{s2}$ and $\sigma^{c2}$. Instead of maximizing $\tilde\Lambda^s_{N^s} + \tilde\Lambda^c_{N^c}$ with respect to $\beta$, we maximize $\tilde{\tilde\Lambda}^s_{N^s} + \tilde{\tilde\Lambda}^c_{N^c}$ with respect to $\beta$, where the $\tilde{\tilde\Lambda}$'s are defined as follows. First consider the term $\beta'K^s\beta + \sigma^{s2}$. Calculate the matrix $K^s$ using $\Sigma_x^{s(t+1)}$ and $\sigma_\epsilon^{s2(t+1)}$. Take $\beta = \beta^{(t)}$ and $\sigma^{s2} = \sigma^{s2(t)}$. Then the resulting expression for $\beta'K^s\beta + \sigma^{s2}$, which we denote $\sigma_{Y|z}^{s2(t)}$, no longer contains the parameters $\beta$ and $\sigma^{s2}$. Let
$$\tilde{\tilde\Lambda}^s_{N^s} = -\frac{N^s}{2}\ln\left(\sigma_{Y|z}^{s2(t)}\right) - \frac{1}{2\sigma_{Y|z}^{s2(t)}}\sum\left[Y_i^s - \beta_0^s - \beta'G^s(z_i^s - \mu^s)\right]^2.$$
Define $\sigma_{Y|z}^{c2(t)}$ and $\tilde{\tilde\Lambda}^c_{N^c}$ similarly.

Differentiating $\tilde{\tilde\Lambda}^s_{N^s} + \tilde{\tilde\Lambda}^c_{N^c}$ with respect to $\beta$ and setting equal to zero yields the update
$$\beta^{(t+1)} = \left[\frac{1}{\sigma_{Y|z}^{s2(t)}}M^s + \frac{1}{\sigma_{Y|z}^{c2(t)}}M^c\right]^{-1}\left[\frac{1}{\sigma_{Y|z}^{s2(t)}}G^s\sum(z_i^s - \mu^s)(Y_i^s - \beta_0^s) + \frac{1}{\sigma_{Y|z}^{c2(t)}}G^c\sum(z_i^c - \mu^c)(Y_i^c - \beta_0^c)\right],$$
where $M^s = G^s\sum(z_i^s - \mu^s)(z_i^s - \mu^s)'G^{s\prime}$ and $M^c$ is defined similarly.

To define the two sample Wald's statistic $U_w$, let $\hat\beta^s$ be the estimate of $\beta^s$ with estimated covariance matrix $\hat\Sigma_{\beta^s}$, and define $\hat\beta^c$ and $\hat\Sigma_{\beta^c}$ similarly. The two sample statistic is
$$U_w = (\hat\beta^s - \hat\beta^c)'\left[\hat\Sigma_{\beta^s} + \hat\Sigma_{\beta^c}\right]^{-1}(\hat\beta^s - \hat\beta^c).$$

We consider a two sample statistic based on function evaluations, rather than on basis coefficients, recalling the notation in Section 3.4.1. Let $\tilde\beta^s = \Psi^s\hat\beta^s$ be the vector of function evaluations of $\hat\beta^s(\cdot)$ at a sequence of time points. Let $\hat\Sigma_{\tilde\beta^s} = \Psi^s\hat\Sigma_{\beta^s}\Psi^{s\prime}$ be the estimate of $\tilde\beta^s$'s covariance matrix. Similarly define $\tilde\beta^c$ and $\hat\Sigma_{\tilde\beta^c}$. The two sample Wald test statistic based on $\tilde\beta^s$ and $\tilde\beta^c$ is
$$U_e = (\tilde\beta^s - \tilde\beta^c)'\left[\hat\Sigma_{\tilde\beta^s} + \hat\Sigma_{\tilde\beta^c}\right]^+(\tilde\beta^s - \tilde\beta^c).$$
In Section 3.4.1, the one sample situation, we argued that we needn't use the function evaluation statistic $U_e$, as it is equivalent to the Wald test statistic $U_w$ under mild conditions. In the two sample situation here, however, the two sample $U_w$ and $U_e$ may not agree unless $\Psi^s = \Psi^c$ and both of them are of full column rank.

To define the two sample integrated t-statistic $U_f$, let $\hat V_{\beta^s}(s,t) = \psi^s(s)'\hat\Sigma_{\beta^s}\psi^s(t)$ be the estimate of the covariance function of $\hat\beta^s(\cdot)$ and define $\hat V_{\beta^c}(s,t)$ similarly. The two sample $U_f$ is
$$U_f = \int \frac{\left[\hat\beta^s(t) - \hat\beta^c(t)\right]^2}{\hat V_{\beta^s}(t,t) + \hat V_{\beta^c}(t,t)}\,dt.$$

Again, we use the permutation method to calculate the null distribution of the test statistics and thus the p-values. Under the null hypothesis that $\beta^s(t) = \beta^c(t)$ for all $t \in [a,b]$, the dependence of $Y_i$ on $z_i$ is identical in both groups. We generate $Q$ "data sets" from the original data set, data sets that follow the null hypothesis. To construct the $q$th "data set", we randomly split the $N^s + N^c$ individuals into two groups of size $N^s$ and $N^c$ and calculate the resulting test statistic. We use the empirical distribution of the obtained $Q$ test statistic values to approximate the null distribution of our statistic. The p-value is then calculated as the proportion of the $Q$ statistic values larger than the original observed value.

3.5 Model assumption checking

After fitting the model, we need to check if the model assumptions are satisfied. Our model diagnostics rely on the analysis of residuals. In this section, we derive expressions for the fitted values of $W_i$ and for the residuals. Fitted values and residuals for $z_i$ and $Y_i$ are then obtained as components.
We can then plot the residuals to check model assumptions and to look for outliers and influential points.

To simplify notation, unknown parameters below stand for their estimates. Using models (3.4) and (3.5), we base our fitted values $\hat W_i$ on the BLUP of the random effects $x_i$, $\mu_{x_i|W_i}$ in (3.16):
$$\hat W_i = \mu_W + C\mu_{x_i|W_i} = \mu_W + C\Sigma_x C'\Sigma_W^{-1}(W_i - \mu_W).$$
Recall the expression for $\Sigma_W$ in (3.9). We get
$$C\Sigma_x C'\Sigma_W^{-1} = \left(C\Sigma_x C' + \Sigma_d - \Sigma_d\right)\Sigma_W^{-1} = I - \Sigma_d\Sigma_W^{-1}$$
and thus
$$\hat W_i = \left(I - \Sigma_d\Sigma_W^{-1}\right)W_i + \Sigma_d\Sigma_W^{-1}\mu_W.$$
It then follows that
$$r_i = W_i - \hat W_i = \Sigma_d\Sigma_W^{-1}(W_i - \mu_W).$$
The last element of $\hat W_i$ gives the fitted value of $Y_i$ and the last element of $r_i$ gives the $Y_i$ residual. As we focus on modelling the dependence of $Y_i$ on $Z_i$, plotting residuals against fitted values of $Y_i$ is useful to detect outliers or influential points of the model fit.

3.6 Model application

In this section, we analyze a biological data set using our model. First we give a general description of how to choose the basis functions $\phi_1,\ldots,\phi_K$. Then we conduct the data analysis.

3.6.1 Choosing the basis functions

We choose the basis functions, the $\phi_k$'s, as functions that give good approximations to $Z_i(\cdot)$: we choose them as the first few estimated eigenfunctions of the $Z_i$ process. To do so, we apply a smoothed principal component analysis to the observed $z_i$'s, penalizing an approximation of the second derivative of the eigenfunction. Let $\hat\Sigma$ be the sample covariance matrix of the $z_i$'s. We find a sequence of orthogonal vectors $\tilde v_j \in \mathbb{R}^n$ to maximize
$$\frac{v_j'\hat\Sigma v_j}{v_j'v_j + \lambda v_j'D'Dv_j},$$
where $\lambda$ is the smoothing parameter and $Dv_j$ calculates second divided differences of the vector $v_j$. The $(n-2) \times n$ matrix $D$ depends on $t_1,\ldots,t_n$ and is defined to differentiate quadratic functions exactly. That is, if $v_j[i] = a + bt_i + ct_i^2$, then $(Dv_j)[i] = 2c$. Given $\lambda$, the vectors $\tilde v_j$ are eigenvectors of the matrix $G^{-1/2}\hat\Sigma G^{-1/2}$, where $G = I + \lambda D'D$. The approach is similar to Ramsay and Silverman's (2005), but we don't use basis function expansions of the $Z_i$.

Choice of $\lambda$ can be done by cross-validation, but for simplicity here, we select $\lambda$ by examining the smoothness of the resulting $\tilde v_j$'s. In the data analysis, we chose $\lambda = 100$.
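For an equally spaced grid, the penalized eigenanalysis can be sketched in a few lines of R (ours, not the thesis's implementation; the matrix D below is the equal-spacing special case of the divided-difference matrix):

smoothed_pca <- function(Sigma_hat, tgrid, lambda = 100) {
  n <- length(tgrid); h <- diff(tgrid)[1]
  D <- matrix(0, n - 2, n)
  for (i in 1:(n - 2)) D[i, i:(i + 2)] <- c(1, -2, 1)/h^2  # second divided differences
  G <- diag(n) + lambda * crossprod(D)                     # G = I + lambda D'D
  Ge <- eigen(G, symmetric = TRUE)
  Gih <- Ge$vectors %*% diag(1/sqrt(Ge$values)) %*% t(Ge$vectors)  # G^{-1/2}
  eigen(Gih %*% Sigma_hat %*% Gih, symmetric = TRUE)       # eigenvectors are the v_j's
}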
3.6.2 Data description

Data were provided by Patrick Carter, School of Biological Sciences, Washington State University, and are described in Morgan et al. (2003). Data are from mice divided into four groups according to gender and the two treatments "selection" and "control". The selection group mice were bred over 16 generations, with selection being on high wheel-running activity at age eight weeks. Control mice were bred at random. In the final generation, body mass and wheel running activity were recorded for each mouse for sixty two consecutive weeks, indexed from −1 to 60, except for weeks 34, 38, 39 and 50. The research interest is to know how body mass and wheel running are related and if the relationship depends on the treatment.

The wheel running distance data have many missing values and are very noisy. In addition, the wheels were cleaned every four weeks, and so we see spikes in wheel-running activity every fourth week. So in our analysis, we average the nonmissing wheel running distances over weeks 5 to 60 and use this average as the response $Y$. The predictor $Z(\cdot)$ is the log transformed body mass. We want to know if any of the groups have $\beta$ non-zero and if there is any difference between the selection $\beta^s$ and the control $\beta^c$ within each gender.

Plots of the observed $z_i$'s and histograms of the $Y_i$'s in each group are in Figures 3.1 and 3.2 respectively. We see that log body mass is roughly monotone with a high rate of increase in weeks −1 to 4. The log body mass in the males is more variable than that in the females.

3.6.3 Choice of basis functions

A smoothed eigenanalysis in each of the four groups yielded a first eigenfunction that was close to constant, indicating that the biggest source of variability in log body mass in each group was the overall size of the mouse. Since a constant eigenfunction is biologically meaningful, we forced our first basis function to be constant and, within each group, calculated the remaining functions via a smoothed eigenanalysis on the centered log body mass as follows. We let
$$\bar z_i = \frac{1}{58}\sum_{\substack{k=-1 \\ k \ne 34, 38, 39, 50}}^{60} z_{ik}$$
be the $i$th mouse's average log body mass and
$$z_{ij} - \bar z_i, \quad j = -1,\ldots,60, \quad j \ne 34, 38, 39, 50,$$
the $i$th mouse's centered log body masses. Within each group, we calculated the sample covariance matrix of the centered log body mass vectors. We then applied the smoothed eigenanalysis to this covariance matrix. Figure 3.3 shows the proportion of cumulative variance explained by the first ten principal components of this analysis in each group. The variance explained by the constant function was 85% in the male selection group, 72% in the male control group, 70% in the female selection group and 63% in the female control group. Figure 3.4 shows the constant function and the first three eigenfunctions for each group. They are displayed one by one, with the four groups' functions together in one panel. There we see the three smoothed eigenfunctions are very similar in the four groups. Figure 3.3 suggests that, if we choose the first three smoothed eigenfunctions in each group, we will capture about 90% of the variability of the log body mass trajectories, beyond the variability captured by the constant function. We will use a constant function plus these three eigenfunctions in our analysis.

3.6.4 Estimation and residual analysis

Before proceeding with inference, we study residuals of our fits to see if there are any outliers. We fit the model (3.4) and (3.5) within each group, using $R = I$, and choose the same basis functions for $\beta$ as for the log body mass, calculated in Section 3.6.3.

Within each group, we plot the $Y_i$ residuals against the fitted $Y_i$'s to detect outliers, as outlined in Section 3.5. Residual plots of each group are in Figure 3.5. In the control males, we see an outlier at the left side. This outlier may be influential in the estimation of $\beta$. So we remove this outlier and refit the model to the male control group. The first panel of Figure 3.6 is the same as the plot in the second panel of Figure 3.5, showing the outlier in the control males. The second panel of Figure 3.6 shows the residual plot of the fit after removing the outlier. There we see no new outliers. We remove the one outlier in the control males in our subsequent analyses.

3.6.5 Inference for $\beta(\cdot)$

In each of the four groups, we estimated $\beta$ and calculated standard errors from both the Hessian matrix and the bootstrap, as described in Section 3.4. These are shown in Figure 3.7.
To determine if $\beta(t) = 0$ for all $t \in [-1, 60]$, we can study the plots in Figure 3.7 and we can conduct a formal hypothesis test. From the plots in Figure 3.7, we see that, except for the female control group, all groups show a region where the zero line is not within one standard error of $\hat\beta(\cdot)$. This suggests that, within these groups, there may be a dependence of averaged wheel running on log body mass.

We conduct a hypothesis test of $H_o: \beta(t) = 0$ for all $t \in [-1, 60]$ in each group using the test statistics in Section 3.4.1. We compute the standard errors using the Hessian matrix (3.26). The results are in Table 3.1. The last three columns contain the permutation p-values of $U_l$, $U_w$ and $U_f$, computed by using 500 permutations. The second column gives the observed value of the test statistic $U_l$ and the next column gives the p-value of $U_l$ based on the fact that negative twice the log-likelihood ratio statistic is asymptotically chi-squared distributed with 4 degrees of freedom. From the table, we see that, in selected and control males, average wheel-running depends on the log body mass trajectories. However, there is insufficient evidence to make this claim in the other two groups.

We observe that the p-values of $U_l$ and $U_w$ are identical. We plot the permuted values of $U_l$ and $U_w$ in Figure 3.8 and see that $U_l$ is an increasing function of $U_w$. This may be due to the normality assumption in the model.

3.6.6 Inference for $\beta^s - \beta^c$

Within each gender, we want to compare the selection $\beta(\cdot)$ with the control $\beta(\cdot)$. Figure 3.9 shows the pointwise difference between the selection $\hat\beta$ and the control $\hat\beta$, together with the standard errors computed from the Hessian matrix and from the bootstrap. For both males and females, the region within one standard error of $\hat\beta^s - \hat\beta^c$ contains much of the zero line, which suggests we can't distinguish between the selection $\beta$ and the control $\beta$.

Table 3.2 gives results of the test $H_o: \beta^s(t) = \beta^c(t)$ for all $t \in [-1, 60]$ in each gender. The last four columns contain the permutation p-values of $U_l$, $U_w$, $U_e$ and $U_f$, computed by using 500 permutations. The second column gives the observed value of the test statistic $U_l$ and the next column gives the p-value of $U_l$ based on the fact that negative twice the log-likelihood ratio statistic is asymptotically chi-squared distributed with 4 degrees of freedom. Given the results, we cannot reject $H_o$ in either gender. That is, within each gender, there is no evidence of a difference between the selected group and the control group in terms of average wheel running's dependence on log body mass.

In conclusion, we find a strong dependence of average wheel running on the log body mass in the selected males and control males. We don't have enough evidence to distinguish the difference between the selected group and the control group in terms of average wheel running's dependence on log body mass in either gender.

3.7 Simulation study

In the simulation study, we compare the pointwise mean squared errors of our ECME estimate of $\beta$ with those of a modified version of the two stage estimate proposed by Müller (2005). We also compare the power of the test statistics proposed in Section 3.4.1 for one-sample and two-sample comparisons.

We will see that, in terms of mean squared error, in the one-sample case, both the ECME estimate and the two stage estimate suffer from an edge effect.
In Section 3.7.4, we look into this edge effect further. The ECME estimate typically has slightly smaller MSE than the two stage estimate, with the improvement in MSE being more noticeable as the dependence of $Y$ on $Z$ increases. In the two-sample case, the two methods are comparable. For testing, in the one-sample case, there is little difference in power between the two statistics considered. In the two sample case, the integrated t-statistic has more power to distinguish the difference between $\beta^s$ and $\beta^c$.

In the one-sample comparison in Section 3.7.2, we simulate data using parameter values estimated from the male selection group data analyzed in Section 3.6.2. In the two-sample comparison in Section 3.7.3, we simulate selection group data using estimates from the male selection group data and control group data using estimates from the male control group data without the outlier.

3.7.1 Two stage estimate

Recall from Section 3.1 that the calculation of Müller's (2005) estimate of $\beta$ had two parts. The first part ignored the $Y_i$'s and only used the $z_i$'s to predict the underlying $x_i$'s. The second part of the analysis was essentially a linear regression of the $Y_i$'s on the predicted $x_i$'s. We would like to study this procedure's ability to estimate $\beta$. We suspect that ignoring the $Y_i$'s in the first part of the procedure will lead to poorer estimation of $\beta$.

To study this in a way that is comparable to our overall methodology, we slightly modify Müller's method. We use linear mixed effects methodology to predict the $x_i$'s from the $z_i$'s. Specifically, we first use our smoothed principal components analysis of the $z_i$'s to determine basis functions, the $\phi_k$'s; see (3.3). We then construct the matrix $T$ and fit model (3.4) using the EM algorithm. Laird (1982) gave an elaboration on the EM computation in linear mixed models. We use the resulting estimated covariances of the $x_i$'s and $\epsilon_i$'s to calculate our predictors, the BLUPs of $x_i$ given $z_i$, that is, to calculate $E(x_i|z_i)$. For the second part of the analysis, we find the least squares estimate of $\beta$ according to the regression model $Y_i = \beta_0 + E(x_i|z_i)'T'\beta + e_i$.

In our fit of (3.4), we force $\Sigma_x$ to be diagonal. This is in keeping with Müller, and stems from the fact that the components of the $x_i$'s are principal component scores. The main difference between our approach and Müller's is in the way we estimate the required variances and covariances for calculating the BLUPs. We use a linear mixed effects model based on eigenfunctions. Müller used a smoothing method to directly estimate the components of the covariance structure.
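A rough R sketch (ours) of the modified two stage estimate, assuming the mixed-model covariance estimates (a diagonal $\Sigma_x$, $\sigma_\epsilon^2$, and $R = I$) have already been obtained:

two_stage_beta <- function(z, Y, mu, A, Tmat, sx_diag, sig_eps2) {
  Sx <- diag(sx_diag)
  Sz <- A %*% Sx %*% t(A) + sig_eps2 * diag(nrow(A))        # Cov(z_i), eq (3.6)
  xhat <- t(Sx %*% t(A) %*% solve(Sz, t(sweep(z, 2, mu))))  # BLUPs E(x_i | z_i)
  X <- xhat %*% t(Tmat)                                     # rows are (T xhat_i)'
  fit <- lm(Y ~ X)                                          # second stage: least squares
  drop(coef(fit)[-1])                                       # estimated basis coefficients
}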
3.7.2 One sample comparison

We simulate data based on parameter and eigenfunction estimates from the data analysis of the male selection group in Section 3.6.2. Let $\mu^s$, $\Sigma_x^s$, $\sigma_\epsilon^{s2}$, $\sigma^{s2}$, $\beta_0^s$, $\beta^s$ and $\beta^s(t) = \psi(t)'\beta^s$ denote these estimates, and let $A^s$ be the matrix constructed from the basis functions, evaluated at the same $t_j$'s as in the data analysis. We simulate the unperturbed predictor $Z_i$ according to a multivariate normal $N(\mu^s, A^s\Sigma_x^s A^{s\prime})$ and let the observed $z_i$ be $Z_i$ plus $N(0, \sigma_\epsilon^{s2}I)$ noise. We consider four possible $\beta$ functions to describe the relationship between $Y_i$ and $Z_i$:
$$Y_i = \beta_0^s + \int_{-1}^{60}\beta(t)\left[Z_i(t) - \bar Z(t)\right]dt + e_i.$$
The integral is calculated using the R function "sintegral" and we simulate $e_i$ from $N(0, \sigma^{s2})$. We take $\beta = \gamma\beta^s$ with $\gamma = 0, 2/3, 4/3$ or $2$. When $\gamma = 0$, $Y_i$ does not depend on $Z_i$. When $\gamma = 2$, the dependence is large. Recall that in our data analysis in Section 3.6.2, we found that $\beta^s$ was significantly different from zero. Thus, for each value of $\gamma$, we can generate an observed data set $(z_i, Y_i^\gamma)$, $i = 1,\ldots,39$.

We first compare our estimate of $\beta$ with the two stage estimate. We run 100 simulations and the MSE of the estimate $\hat\beta$ is calculated as
$$\sum_{r=1}^{100}\left(\hat\beta_r(t) - \beta(t)\right)^2/100$$
at each observation point $t$, where $\hat\beta_r$ denotes the estimate from the $r$th simulated data set. The results are in Figure 3.10, where the first panel shows the pattern of the true $\beta$'s, the next two panels show the pointwise MSE's of our two estimates of $\beta$ and the last panel shows the differences of these pointwise MSE's. In the plots, the MSE's increase as $\gamma$ increases and both estimates of $\beta$ have high MSE's for $t$'s at the edges. The MSE's of both methods are much worse at the left hand edge, probably due to the sharp decrease of the true $\beta$ and the sharp increase in log body mass before week 10. The ECME method seems to be more affected by this edge. We give a further study of this edge effect in Section 3.7.4. However, in the last panel we see that, overall, the ECME estimate has a smaller MSE, with the superiority of ECME increasing as $\gamma$ increases. Therefore, if there is a significant dependence of $Y_i$ on $z_i$ through $\beta$, the ECME method of estimation is preferred.

To test $H_o: \beta(t) = 0$ for all $t \in [-1, 60]$, we use the test statistics $U_w$ and $U_f$, with standard errors calculated using the Hessian matrix (3.26), as described in Section 3.4. For each data set, we run 300 permutations to calculate the p-values. We simulate 100 data sets for each value of $\gamma$ and choose levels $\alpha = 0.01$ and $\alpha = 0.05$. Figures 3.11 and 3.12 summarize the proportion of times $H_o$ was rejected using $U_w$ and $U_f$ at levels $\alpha = 0.01$ and $\alpha = 0.05$ respectively. As expected, the powers increase as $\gamma$ increases. There is little difference in power between the two statistics.
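A minimal R sketch (ours) of one simulated data set under this design; we substitute the trapezoidal rule for sintegral, and MASS::mvrnorm is assumed available:

simulate_one <- function(N, tgrid, mu, A, Sx, sig_eps2, beta0, beta_fn, sigma2) {
  Z <- MASS::mvrnorm(N, mu, A %*% Sx %*% t(A))                       # unperturbed Z_i
  z <- Z + matrix(rnorm(N * length(tgrid), sd = sqrt(sig_eps2)), N)  # observed z_i
  w <- c(diff(tgrid), 0)/2 + c(0, diff(tgrid))/2                     # quadrature weights
  Zc <- sweep(Z, 2, colMeans(Z))                                     # Z_i(t) - Zbar(t)
  Y <- beta0 + drop(Zc %*% (w * beta_fn(tgrid))) + rnorm(N, sd = sqrt(sigma2))
  list(z = z, Y = Y)
}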
The patterns of the two MSE’s are similar to their counterparts in Figure 3.10. The MSE’s of the two estimates are comparable but in general the MSE of the ECME estimate is smaller. To test Ho : β s (t) = β c (t), for all t ∈ [−1, 60], we compare the four  test statistics, Ul , Uw , Ue and Uf , with standard errors calculated using the Hessian matrix (3.26). For each data set, we run 300 permutations to calculate the p-values. We simulate 100 data sets for each value of γ and choose levels α = 0.01 and α = 0.05. Figures 3.16 and 3.17 show the power of the four test statistics with levels 0.01 and 0.05 respectively. The statistic Uf is the most powerful, especially when γ is large, but U f and Uw are comparable.  3.7.4  Edge effect discussion in one-sample MSE comparison  In Figure 3.10, we see that the MSE’s of the ECME estimate of β are much worse at the left hand edge than the MSE’s of the two stage estimate. We suspect this is probably due to the sharp decrease of the true β before week 10 as shown at the first panel of Figure 3.10, and the large increase in log body mass in that same period. In this section, we exclude the early weeks’ data from our analysis. To determine the various parameter values to use in a new simulation, we re-analyze the male selection group data but with z i containing log body 52  Chapter 3. Linear mixed models for measurement error in functional regression mass values a week 5 rather than at week −1. After obtaining the new  parameter estimates, we simulate data in the same way as in Section 3.7.2.  The simulation analysis result is in Figure 3.18. The edge effect still exists but this time the MSE’s of ECME and the two stage method are comparable at the edges.  53  Chapter 3. Linear mixed models for measurement error in functional regression Group Male selected Male control Female selected Female control  Observed Ul 13.723 10.940 9.095 3.280  Asymptotic p-Ul 0.008 0.027 0.059 0.512  p-Ul 0.008 0.044 0.082 0.568  p-Uw 0.008 0.044 0.082 0.568  p-Uf 0.008 0.034 0.264 0.656  Table 3.1: P-values of the test Ho : β(t) = 0, for all t ∈ [−1, 60] in each group. Gender Male Female  Observed Ul 11.919 5.852  Asymptotic p-Ul 0.018 0.210  p-Ul 0.364 0.564  p-Uw 0.356 0.568  p-Ue 0.532 0.884  p-Uf 0.110 0.444  Table 3.2: P-values of the test Ho : β s (t) = β c (t), for all t ∈ [−1, 60], within each gender.  54  Chapter 3. Linear mixed models for measurement error in functional regression  3.0 2.0  10  20  30  40  50  60  0  10  20  30  40  50  Week  Female Selected  Female Control  60  3.0 2.5 2.0  2.0  2.5  3.0  Log body Mass  3.5  Week  3.5  0  Log body Mass  2.5  Log body Mass  3.0 2.5 2.0  Log body Mass  3.5  Male Control  3.5  Male Selected  0  10  20  30 Week  40  50  60  0  10  20  30  40  50  Week  Figure 3.1: Plots of log body mass versus week for the four groups of laboratory mice. 55  60  Chapter 3. Linear mixed models for measurement error in functional regression  0.020 0.000  20 40 60 80  120  20 40 60 80  120  Female Selected  Female Control  0.020 0.010 0.000  0.010  Density  0.020  0.030  Wheel Running (km/week)  0.030  Wheel Running (km/week)  0.000  Density  0.010  Density  0.020 0.010 0.000  Density  0.030  Male Control  0.030  Male Selected  20 40 60 80  120  Wheel Running (km/week)  20 40 60 80  120  Wheel Running (km/week)  Figure 3.2: Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice. 56  Chapter 3. 
Figure 3.3: Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components of the four groups of mice, after the individual average log body mass has been removed.

Figure 3.4: The constant function and the first three smoothed eigenfunctions of the covariance of centered log body mass of the four groups of mice, calculated as described in Section 3.6.3.

Figure 3.5: Residual plots of the fit of the $Y_i$ = averaged wheel running of the four groups of mice.

Figure 3.6: Plots of residuals of $Y_i$ = averaged wheel running in the male control group of mice before and after removing the outlier.

Figure 3.7: Plots of $\hat\beta$ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice, as described in Section 3.4.

Figure 3.8: Comparing the permuted values of the generalized likelihood ratio statistic $U_l$ with the Wald statistic $U_w$ for the four groups of mice. The equivalence of $U_l$ and $U_w$ was observed in Table 3.1.
Figure 3.9: Plots of $\hat\beta^s - \hat\beta^c$ and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders.

Figure 3.10: MSE of the estimate of $\beta$ for each $\gamma$ value in the one sample simulation as described in Section 3.7.2. Compare the ECME method with the two stage method.

Figure 3.11: Proportion of times $H_o$ is rejected using level $\alpha = 0.01$, where $H_o: \beta(t) = 0$ for all $t \in [-1, 60]$, in the one sample simulation as described in Section 3.7.2. Two test statistics are considered, $U_w$ and $U_f$.

Figure 3.12: Proportion of times $H_o$ is rejected using level $\alpha = 0.05$, where $H_o: \beta(t) = 0$ for all $t \in [-1, 60]$, in the one sample simulation as described in Section 3.7.2. Two test statistics are considered, $U_w$ and $U_f$.

Figure 3.13: MSE of the estimate of $\beta^s$ for each $\gamma$ value in the two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method.

Figure 3.14: MSE of the estimate of $\beta^c$ for each $\gamma$ value in the two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method.
Figure 3.15: MSE of the estimate of $\beta^s - \beta^c$ for each $\gamma$ value in the two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method.

Figure 3.16: Proportion of times $H_o$ is rejected using level $\alpha = 0.01$, where $H_o: \beta^s(t) = \beta^c(t)$ for all $t \in [-1, 60]$, in the two sample simulation as described in Section 3.7.3. Four test statistics are considered: $U_l$, $U_w$, $U_e$ and $U_f$.

Figure 3.17: Proportion of times $H_o$ is rejected using level $\alpha = 0.05$, where $H_o: \beta^s(t) = \beta^c(t)$ for all $t \in [-1, 60]$, in the two sample simulation as described in Section 3.7.3. Four test statistics are considered: $U_l$, $U_w$, $U_e$ and $U_f$.

Figure 3.18: MSE of the estimate of $\beta$ for each $\gamma$ value using truncated log body mass as described in Section 3.7.4. Compare the ECME method with the two stage method.

Bibliography

[1] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432.

[2] James, G. M. and Silverman, B. W. (2005) Functional adaptive model estimation. Journal of the American Statistical Association 100, 470, 565-576.

[3] Laird, N. M. (1982) Computation of variance components using the EM algorithm. Journal of Statistical Computation and Simulation 14, 295-303.

[4] Liu, C. H. and Rubin, D. B. (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 4, 633-648.

[5] Magnus, J. R. and Neudecker, H. (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley and Sons.

[6] Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2003) Ontogenetic trajectories in mice selected for high wheel-running activity. I. Mean ontogenetic trajectories. Evolution 57, 646-657.

[7] Müller, H.G. (2005) Functional modeling and classification of longitudinal data. Scandinavian Journal of Statistics 32, 223-240.

[8] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. 2nd edition, Springer Series in Statistics.

[9] Wang, W. (2007) Identifiability in linear mixed effects models. Manuscript, Department of Statistics, University of British Columbia.
[10] Yao, F., Müller, H.G. and Wang, J.L. (2005) Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577-590.

Chapter 4

Dependence of average lifetime wheel-running on body mass ontogeny

(A version of this chapter will be submitted for publication. Authors: Carter, P. A., Heckman, N. and Wang, W.)

4.1 Introduction

In natural populations of animals, strong correlations between large body size and increased Darwinian fitness are frequently measured: large body size is related to increased fecundity, reduced predation and, in general, an increased ability to survive in a given environment (Sogard 1997, Hone and Benton 2005). Variation in body mass is also associated with variation in a number of physiological traits, including metabolic rate and heart mass, and much effort has been expended understanding allometric relationships in animals (Garland and Carter, 1994, and references therein). Body mass has even been shown to impact mutation rates in nuclear and mitochondrial DNA (Gillooly et al., 2005; Estabrook et al., 2007). Body mass is also associated with locomotor performance; however, these relationships are complex. Body mass is positively correlated with home range area and daily movement distances in wild populations of mammals (Garland 1983), but negatively correlated with laboratory measures of voluntary activity (Dewsbery 1980).

Replicate lines of mice which have been selected for high levels of voluntary wheel running activity are a powerful model system with which to investigate the relationships between body mass and voluntary locomotor activity (Garland 2003). After 10 generations of selection, mice selected for high voluntary wheel running activity at 8 weeks of age ran significantly more than their unselected controls; however, body mass at the age at which wheel running was measured and selected was not a significant predictor of the amount of wheel running, nor was it significantly different between selected and control mice (Swallow et al. 1998). A subsequent study, 14 generations after selection, examined wheel running and body mass in this system of mice after 8 weeks of access to running wheels (Swallow et al., 1999). Three major results emerged from this study. First, body mass and wheel running share a negative genetic correlation of −0.50; selected mice were significantly lighter at maturity than control mice. Second, this result is not mediated solely through wheel running activity; the difference in body mass persisted even in mice without wheel access, and in mice with wheel access even when total wheel running was statistically removed. Third, access to running wheels did result in a decrease in body mass, but this decrease was only about a third of the decrease caused by selection.

A detailed study of body mass and wheel running ontogenies in the 16th generation of this system of mice was conducted by Morgan et al. (2003). This experiment utilized 320 mice, half from the selected lines and half from the control lines; half of each of these groups were given lifetime access to running wheels, while the other half did not have wheel access. Morgan et al. (2003) compared the ontogenetic trajectories of the different experimental groups using repeated measures analysis and identified several interesting results.
First, both the position and shape of the wheel running trajectory had responded to selection on wheel running at 8 weeks of age; selected mice ran more across the trajectory and showed a steeper decline in activity at older ages than control mice. This correlated response of the wheel running ontogeny to early-age selection on wheel running indicates a positive genetic covariance between wheel running at younger ages and wheel running at older ages, with the covariance declining as age increases. Second, the position but not the shape of the body mass trajectory had responded to selection, regardless of access to running wheels; selected mice with and without wheel access were significantly lighter than their counterpart control mice across all ages. This result indicates a negative response of the body mass ontogeny to selection on early-age wheel running and hence a negative genetic covariance between wheel running at 8 weeks of age and body mass across ontogeny; in addition, it indicates strong positive genetic covariances among body masses at different ages. Finally, the analysis of the wheel running trajectory revealed that age-specific body mass was a significant covariate for age-specific wheel running; smaller individuals ran more at each age than did larger individuals.

The fact that Swallow et al. (1999) measured a negative genetic covariance between age-specific wheel running and body mass, and that Morgan et al. (2003) showed that body mass is a significant covariate for wheel running at each age, with smaller mice running more than larger mice, raises the question of whether wheel running through ontogeny explicitly depends on the body mass trajectory. Given that body mass at each age was a significant covariate for wheel running at that age, we hypothesized that wheel running is significantly dependent on the body mass trajectory. In addition, given the age-specific negative genetic covariance between body mass and wheel running, we can ask whether this dependence has evolved in response to early-age selection on wheel running. We hypothesize that this dependence has evolved such that we will measure quantitative differences between the selected and control mice in the dependence of wheel running on the body mass trajectory.

To test these hypotheses, we take a function-valued approach. In essence, we regress average wheel running on the log body mass trajectory. We then test whether the "regression coefficients" are zero within each group and whether they are equal between the two groups. To model the log body mass trajectory, we use basis functions with expansion coefficients modeled as random effects.

4.2 Methods

Experiments were performed on house mice (Mus domesticus) from the 16th generation of a selective-breeding experiment for increased voluntary wheel-running exercise (Swallow et al. 1998; Garland 2003). The selection experiment comprises four selection and four control lines; each line is propagated with 10 breeding pairs that contribute one male and one female to the next generation, with the condition that siblings are not mated. Breeders are chosen either randomly with respect to wheel running (control lines) or selectively, within family, as the mice that run the most revolutions on days five and six of six days of wheel exposure administered at eight weeks of age (selection lines).
The individuals in this report are the offspring of the generation 15 breeders for generation 16 of the selection experiment. Five breeding pairs from each of the eight lines were remated to produce second litters at Washington State University. Details of this design are presented in Morgan et al. (2003). Briefly, pups were weaned at 21 days of age and placed in treatment groups at 28 ± 3 days of age in the animal care facility at Washington State University. Four males and four females from each family were used, with half designated for the active treatment and half designated for the sedentary treatment. Each activity group thus contained two females and two males from each of the five families within each of the eight lines, for a total of 160 individuals per activity group, i.e., 320 individuals in the aging colony. Mice in the active treatment group were placed individually in cages with a 11.75 cm radius running wheel and an electronic wheel-revolution counter built into the cage-top. Each mouse thus had the option of voluntarily getting into the wheel and running, or of remaining in the cage and not running. On the same day, sedentary mice were placed individually in standard rodent cages. Photoperiod was 12:12 h, and water was available ad libitum. Mice were provided excess food weekly (Harland Teklad rodent diet W8604) and apparent food consumption was determined by weighing food hoppers. This measure does not account for possible variation in food wasting, as when mice shred food pellets and allow fragments to drop in the litter. A study of food wasting at generation 10 found no significant differences between selection and control lines, but significant variation among the replicate lines within selection group (Koteja et al. 2003). Additionally, once a week each animal was weighed and its weekly wheel revolutions downloaded from the counter device. Cages were cleaned weekly and running wheels were cleaned monthly.

Extra sibs from all 40 families were placed in similar housing and were used as sentinels to monitor the colony monthly for the presence of specific pathogen exposure. At 24 months post start date, a sentinel tested positive for MHV exposure, presumably from a barrier breakdown. No other sentinels tested positive. No treatment was initiated for the virus. Subsequent necropsied mice were monitored specifically for the presence of diseased liver tissue. None were observed to have contracted hepatitis (necropsies performed by LAR veterinarians at WSU).

Before carrying out any analysis, we eliminate line effects in log body mass and wheel-running by subtracting out line means within each group (selection and control) and gender. Suppose individual i is a female in line k of the selection group. We replace individual i's log body mass with the original log body mass minus line k's average plus the selection group average, where the line k average and the selection group average involve only females. We treat the wheel-running similarly.

Let Z_i(·) be the ith mouse's log body mass trajectory. We study how Z_i(·) affects Y_i, the wheel running response averaged over weeks 5 to 60. We use average wheel-running since the wheel running distance data have many missing values and are very noisy. In addition, since the running wheels were cleaned every four weeks, we see spikes in wheel-running activity every fourth week.
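To make the line-effect removal concrete, the following is a minimal sketch, assuming a long-format table with hypothetical columns `group`, `sex`, `line` and a single value column; the actual data layout used in the analysis may differ.

```python
import pandas as pd

def remove_line_effects(df: pd.DataFrame, value: str) -> pd.Series:
    """Subtract each animal's line mean and add back its group mean,
    computed separately within every group x sex combination, as in the
    centering described above (illustrative only)."""
    line_mean = df.groupby(["group", "sex", "line"])[value].transform("mean")
    group_mean = df.groupby(["group", "sex"])[value].transform("mean")
    return df[value] - line_mean + group_mean
```

The same centering would be applied to the log body mass at each week and to the wheel-running values.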
In the following, we describe our model for the dependence of wheel-running on log body mass. This dependence is parameterized through an unknown function β. We estimate β and provide standard errors. We also test the null hypothesis that β = 0 (no dependence of wheel-running on log body mass) and test that the two groups have the same β's. We use permutation type methods to calculate the p-values. For more details of the statistical methodology, see Heckman and Wang (2007).

We model Z_i using random effects with a set of basis functions φ_1, ..., φ_K as

$$Z_i(t) = \mu(t) + \sum_{k=1}^{K} x_{ik}\,\phi_k(t),$$

where µ is a smooth unknown function. The random effects x_i ≡ (x_{i1}, ..., x_{iK})' are assumed normally distributed as N(0, Σ_x). The log body mass trajectory Z_i is observed with error at a sequence of weeks, with the observed value of Z_i at week j equal to z_{ij} ≡ Z_i(j) + ε_{ij}. The ε_{ij}'s are independent and normally distributed with mean 0 and variance σ_ε². Let z_i contain the observed values of the z_{ij}'s. We can then write

$$z_i = \mu + A x_i + \epsilon_i, \qquad \mu = (\mu(-1), \ldots, \mu(60))', \quad A_{jk} = \phi_k(j), \quad \epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{in})'.$$

The dependence of Y_i on Z_i(·) is reflected through the unknown function β(·) as

$$Y_i = \beta_0 + \int_{-1}^{60} \beta(t)\,\big[Z_i(t) - E(Z_i(t))\big]\,dt + e_i,$$

where e_i is an error term distributed as N(0, σ²). The functional parameter β is of primary interest and can be viewed as a generalization of the conventional multiple regression coefficient vector. Choosing a basis for β (for ease, the same as for the Z_i(·)), we approximate β as

$$\beta(t) = \sum_{k=1}^{K} \beta_k\,\phi_k(t).$$

Together with our modelling of Z_i(·), we can write the model of Y_i as

$$Y_i = \beta_0 + \beta' T x_i + e_i, \qquad \beta = (\beta_1, \ldots, \beta_K)', \quad T_{jk} = \int_{-1}^{60} \phi_j(t)\,\phi_k(t)\,dt.$$

The goal is to use the information in the observed data, the z_i's and Y_i's, to estimate the coefficient vector β.

Before estimating β along with the other unknown parameters in the model, we must first estimate the basis functions, the φ_k's. We apply a smoothed eigenanalysis to the z_{ij}'s, penalizing an approximation of the second derivative of the eigenfunction. We choose the amount of smoothing by eye. A detailed description of choosing the eigenfunctions is provided in Section 4.3.

Treating the estimated φ_k's as fixed and known, we then estimate the model parameters via maximum likelihood. We use an iterative numerical method, an implementation of the ECME (Expectation/Conditional Maximization Either) algorithm of Liu and Rubin (1994). Having obtained the estimate β̂, our estimate of β(t) is β̂(t) = Σ_{k=1}^K β̂_k φ_k(t). We estimate the covariance matrix of β̂ both by a Hessian matrix based method and by a bootstrap method. Using the estimated covariance matrix, we can estimate the variance of β̂(t), and thus make inference for β.

To determine if Y_i depends on Z_i(·), we test Ho: β(t) = 0 for all t ∈ [−1, 60] via an integrated t-statistic

$$U_f = \int_{-1}^{60} \frac{\hat\beta^2(t)}{\widehat{\mathrm{var}}(\hat\beta(t))}\,dt.$$

To calculate the null distribution of U_f, and thus calculate a p-value, we use a permutation type method, one that does not rely on distributional assumptions.
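As an illustration, U_f can be approximated on the weekly grid by the trapezoidal rule. This is a minimal sketch assuming β̂ and its estimated pointwise variance have already been evaluated at weeks −1, ..., 60; the implementation used in the thesis may differ in detail.

```python
import numpy as np

def integrated_t_statistic(beta_hat, var_beta_hat, t_grid):
    """Approximate U_f = integral of beta_hat(t)^2 / var(beta_hat(t))
    by the trapezoidal rule on the grid of observation times."""
    return np.trapz(beta_hat ** 2 / var_beta_hat, t_grid)

# e.g. t_grid = np.arange(-1, 61) for the weekly grid used here.
```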
Under the null hypothesis that β(t) ≡ 0 for all t ∈ [−1, 60], Y_i and z_i are independent. Therefore the joint distribution of (z_i, Y_i) is the same as that of (z_i, Y_{i'}), where i' is chosen at random from {1, ..., N} and N is the number of individuals. To simulate the null distribution of U_f, we make Q = 500 new data sets by permuting the Y_i's: the qth data set is simply {(z_1, Y_{i_1}), ..., (z_N, Y_{i_N})}, where i_1, ..., i_N is a random permutation of 1, ..., N. The p-value is then calculated as the proportion of the resulting Q statistic values larger than the original observed value.

Consider the two-sample problem where there are observations independently from a selection group and from a control group. To see if the β's governing the Y's and Z's in the selection group and the control group are equal, we test Ho: β^s(t) = β^c(t) for all t ∈ [−1, 60], where the superscript "s" or "c" indicates the group membership, "selection" or "control" respectively. To test Ho, we use the two-sample test statistic

$$U_f = \int_{-1}^{60} \frac{\big[\hat\beta^s(t) - \hat\beta^c(t)\big]^2}{\widehat{\mathrm{var}}(\hat\beta^s(t)) + \widehat{\mathrm{var}}(\hat\beta^c(t))}\,dt.$$

Again, we use a permutation method to calculate the null distribution of the two-sample U_f and thus the p-values. Under the null hypothesis that β^s(t) = β^c(t) for all t ∈ [−1, 60], the dependence of Y_i on z_i is identical in both groups. We generate Q = 500 "data sets" from the original data set, data sets that follow the null hypothesis. To construct the qth "data set", we randomly split the N^s + N^c individuals into two groups of size N^s and N^c and calculate the resulting U_f. We use the empirical distribution of the obtained Q test statistic values to approximate the null distribution of our statistic. The p-value is then calculated as the proportion of the Q statistic values larger than the original observed value.

The use of function-valued traits in evolutionary biology was first proposed by Kirkpatrick and Heckman (1989). The methodology has found applications in animal breeding, via basis function expansions with random coefficients. See Meyer (2005) and references therein.
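The permutation scheme itself is simple; the cost lies in refitting the model for each permuted data set. Below is a minimal sketch of the one-sample version, where the hypothetical `statistic(z, y)` stands in for the full refit (ECME estimation followed by computing U_f).

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p_value(z, y, statistic, Q=500):
    """p-value for Ho: beta(t) = 0 for all t.  Permuting the array y
    breaks any dependence between trajectories z and responses y, which
    is exactly the null; the two-sample test instead re-splits the
    individuals into groups of sizes N^s and N^c."""
    observed = statistic(z, y)
    exceed = 0
    for _ in range(Q):
        perm = rng.permutation(len(y))
        if statistic(z, y[perm]) > observed:
            exceed += 1
    return exceed / Q
```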
4.3 Results

The body mass trajectories for each experimental group are presented in Figure 4.1. For all four groups, body mass is a roughly monotone increasing function of age, with the highest rate of increase from ages −1 to 4 weeks. The body mass trajectories of males, both selected and control, are more variable than those of the females. Histograms of average wheel running from ages 5 to 60 are presented in Figure 4.2; within each sex, selected mice ran more than control mice, and females ran more than males (Morgan et al., 2003).

In order to determine a few functions which adequately describe the body mass trajectories, a smoothed eigenanalysis of the trajectories was conducted. This smoothed eigenanalysis yielded a first eigenfunction that was close to constant, indicating that the biggest source of variability in body mass was the overall size of the mouse. Since a constant eigenfunction is biologically meaningful, the first basis function was forced to be constant and the remaining functions were calculated via a smoothed eigenanalysis on the trajectories, centered as follows. Let

$$\bar z_i = \frac{1}{58} \sum_{\substack{k = -1 \\ k \neq 34, 38, 39, 50}}^{60} z_{ik}$$

be the ith mouse's average log body mass (58 is the number of weeks retained after dropping weeks 34, 38, 39 and 50). We apply a smoothed eigenanalysis to the sample covariance matrix of the centered log body mass

$$z_{ij} - \bar z_i, \qquad j = -1, \ldots, 60,\ j \neq 34, 38, 39, 50.$$

Using the constant as the first eigenfunction, we found that the corresponding first principal component, overall size, accounts for 85% of the variability in the male selection group, 72% in the male control group, 70% in the female selection group and 63% in the female control group. Figure 4.3 shows the proportion of cumulative variance explained by the next ten principal components in each group after removing the variability of the first principal component. The first three smoothed eigenfunctions in each group capture about 90% of the remaining variability of the body mass trajectories, so these three eigenfunctions plus the constant function will be used in the subsequent analyses. Figure 4.4 shows each of these four leading eigenfunctions; the functions of the four experimental groups are presented together in each panel. Clearly the smoothed eigenfunctions are very similar among the four experimental groups.
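One common recipe for such a smoothed eigenanalysis is Silverman-style half-smoothing: solve the eigenproblem of the sample covariance in the metric induced by a roughness penalty. The sketch below is illustrative only; it uses a squared-second-difference penalty on an equally spaced grid, and the exact variant and smoothing level used here (chosen by eye) may differ.

```python
import numpy as np

def smoothed_eigenfunctions(S, lam, n_funcs):
    """Penalized eigenanalysis of a sample covariance matrix S on an
    equally spaced grid: solve S x = mu (I + lam * P) x, where P
    penalizes squared second differences (an approximation to the
    second derivative).  Returns the leading smoothed eigenvectors."""
    n = S.shape[0]
    D2 = np.diff(np.eye(n), n=2, axis=0)       # second-difference operator
    P = D2.T @ D2
    L = np.linalg.cholesky(np.eye(n) + lam * P)
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ S @ Linv.T)
    order = np.argsort(vals)[::-1][:n_funcs]
    phi = Linv.T @ vecs[:, order]              # back-transform to data scale
    return phi / np.linalg.norm(phi, axis=0)   # normalize each column
```

The returned columns play the role of the estimated φ_k's, evaluated on the weekly grid.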
We model the mean of the averaged wheel running as

$$E(Y) = \beta_0 + \int \beta(t)\,\big[Z(t) - E(Z(t))\big]\,dt,$$

where Z(t) is the logarithm of body mass at age t. Since the observed body mass curves are somewhat rougher than one would expect, we model the observed log body mass at age t as Z(t) plus error. To estimate β(·), we approximate both β(·) and Z(·) using basis function expansions. The basis functions used are the estimated eigenfunctions. To model within-individual variability in body mass, we use a random effects model; that is, we assume that the coefficients in each individual's Z(·) expansion are random. We then estimate all parameters by maximum likelihood.

The β̂ is estimated for each group from the smoothed eigenfunctions. These β̂'s are presented in Figure 4.5. Both the male selection and control groups show regions of β̂ that are non-overlapping with the 0 line, and formal testing shows that β̂ is significantly different from 0 for both selected and control males (Figure 4.5; Table 4.1; p = 0.008 and 0.034, respectively). Hence in male selection and control mice, average wheel running significantly depends on the body mass trajectory. Interestingly, β̂ for the selection males is positive at very young ages, then negative just prior to 10 weeks, positive again at middle age and then negative or not different from 0 at older ages. For females, β̂ overlaps 0 and the results of the formal testing are not significant for either the selection or the control group (Figure 4.5; p = 0.264 and 0.656, respectively). Hence for all females, there is no evidence that average wheel running depends on the body mass trajectory.

Figure 4.6 shows both the pointwise estimates and the model estimates of the covariance between average wheel running and the body mass trajectory for all four experimental groups. The general shapes of both estimates are similar to each other, and they are similar to the shapes of the curves seen for the β̂ estimates in Figure 4.5. For instance, in each group except for the female control, at the first few ages the covariance between average wheel-running and log body mass is large. Correspondingly, at these ages each β̂ is also large, where we believe that average wheel running is positively dependent on log body mass.

The β̂ for selection and control mice were compared within each gender to test for the evolution of the relationship of body mass trajectory and average wheel running (Figure 4.7). No significant differences were detected between selected and control mice in either gender (Table 4.2); no evolution of the dependence of wheel running on body mass trajectory in response to selection on age-specific wheel running can be detected.

4.4 Discussion

As hypothesized, both male selected and control mice showed a significant dependence of wheel running on the body mass ontogeny. This result generally fits those of Swallow et al. (1999), who measured a negative genetic correlation of −0.5 between wheel running and body mass, and Morgan et al. (2003), who measured a negative correlated response of the body mass ontogeny to selection on early-age wheel running. However, the analyses used herein have revealed complexity that was not identified in the two previous studies: both previous studies showed clear negative relationships between body mass and wheel running, whereas we have shown that the relationship is sometimes negative and sometimes positive, depending on age. As can be seen in Figures 4.5 and 4.6, body mass has a positive effect on wheel running at young and some middle ages, with a negative effect at some younger ages and at older ages. Interestingly, Swallow et al. (1999) conducted their study at the younger ages (approximately weeks 8 to 15) when the relationship is negative, and although Morgan et al. (2003) clearly showed a negative genetic covariance between the body mass ontogeny and early-age wheel running, that does not preclude the phenotypic covariance from being at times positive. Hence our results do not disagree with those of the earlier studies but rather elucidate complexity that the earlier studies were not able to identify.

A significant dependence of wheel running on the body mass trajectory was not identified in females. Different results in males and females are not surprising in the context of this model system of mice, in which males and females have differed in traits as diverse as wheel running itself (Swallow et al., 1998) and longevity (Bronikowski et al., 2006). On the other hand, the lack of a measurable difference in females may be caused by a lack of power.

Given that Morgan et al. (2003) showed a negative correlated response of the body mass ontogeny to selection on early-age wheel running, we had hypothesized that the dependence of wheel running on the body mass trajectory would have evolved and hence be different between selected and control mice. However, no such differences were identified. Hence we cannot demonstrate any evolution of the dependence of wheel running on body mass trajectory. Several reasons may explain this. First, it is possible that the dependence is evolving, just slowly, and that a similar study conducted on a later generation would reveal differences between selected and control mice. Second, the dependence of wheel running on body mass may be constrained by genetic covariances between wheel running and body mass, effectively making it difficult to alter the dependence. Such a constraint could be identified by detailed quantitative genetic analyses of the wheel running and body mass trajectories (Kingsolver et al., 2001). Third, the inability to demonstrate any evolution of the dependence of wheel running on body mass trajectory could simply be because of a lack of power.

Developing powerful yet flexible statistical techniques to analyze data sets with a large number of observations on each individual is a challenge.
We often do not have enough degrees of freedom to carry out a "usual" regression analysis, modelling

$$Y_i = \beta_0 + \sum_j \beta_j\,Z_i(t_j) + e_i:$$

for our data, within one group we have around 40 individuals and 58 values of β_j. The function-valued approach effectively reduces the number of parameters in the model. Using eigenfunctions as basis functions has allowed us to fit our data using only β_0 and 4 β_j's. Griswold, Gomulkiewicz and Heckman (2007) have found via simulation that basis-function type techniques are more powerful than traditional ones. Although they have considered a slightly different problem, applicable to a comparison of the log body mass trajectories of two groups, one would expect their results to hold qualitatively for the problem considered here.

The statistical methodology was developed by Heckman and Wang (2007), with this its first application. Here we see its benefits: the method has exposed patterns in dependence that were previously unknown. The method allows full use of information from all individuals, through the use of random effects modelling. And by using a function-valued approach, we can handle a large number of observations per individual.

Group             p-value
Male selected     0.008
Male control      0.034
Female selected   0.264
Female control    0.656

Table 4.1: P-values of the test Ho: β(t) = 0 for all t ∈ [−1, 60], using test statistic U_f.

Gender    p-value
Male      0.110
Female    0.444

Table 4.2: P-values of the test Ho: β^s(t) = β^c(t) for all t ∈ [−1, 60], using test statistic U_f.

[Figure 4.1: Plots of log body mass versus week for the four groups of laboratory mice (male selected, male control, female selected, female control).]

[Figure 4.2: Histograms of the response, averaged wheel running (km/week) from weeks 5 to 60, for the four groups of laboratory mice.]

[Figure 4.3: Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components for the four groups of mice, after the individual average log body mass has been removed.]
[Figure 4.4: The constant function and the first three smoothed eigenfunctions of the covariance of centered log body mass for the four groups. The four groups' functions are plotted together in each panel. The eigenfunctions are calculated as described in Section 4.3.]

[Figure 4.5: Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice. Standard errors are calculated as described in Section 4.2.]

[Figure 4.6: Plots of the covariances between average wheel running and the log body mass trajectory at week j, j = −1, ..., 60, for the four groups of mice. Solid lines are the covariances from the model and dash-dot lines are the pointwise sample covariances between average wheel running and log body mass. For the calculation, refer to Heckman and Wang (2007).]

[Figure 4.7: Plots of β̂^s − β̂^c, the difference between the selection β̂ and the control β̂, and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders.]

Bibliography

[1] Bronikowski, A. M., Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2006) The evolution of aging and age-related physical decline in mice selectively bred for high voluntary exercise. Evolution 60, 1494-1508.

[2] Dewsbury, D. A. (1980) Wheel-running behavior in 12 species of muroid rodents. Behav. Process. 6, 271-280.

[3] Estabrook, G. F., Smith, G. R. and Dowling, T. E. (2007) Body mass and temperature influence rates of mitochondrial DNA evolution in North American Cyprinid fish. Evolution 61, 1176-1187.

[4] Garland, T., Jr. (1983) Scaling the ecological cost of transport to body mass in terrestrial mammals. Am. Nat. 121, 571-587.

[5] Garland, T., Jr. (2003) Selection experiments: an under-utilized tool in biomechanics and organismal biology. Pages 23-56 in Bels, V. L., Gasc, J. P. and Casinos, A., eds. Vertebrate biomechanics and evolution. BIOS Scientific Publishers, Oxford, U.K.

[6] Garland, T., Jr. and Carter, P. A. (1994) Evolutionary physiology. Annu. Rev. Physiol. 56, 579-621.

[7] Gillooly, J. F., Allen, A. P., West, G. B. and Brown, J. H. (2005) The rate of DNA evolution: effects of body size and temperature on the molecular clock. PNAS 102, 140-145.
[8] Gomulkiewicz, R. and Kirkpatrick, M. (1992) Quantitative genetics and the evolution of reaction norms. Evolution 46, 390-411.

[9] Gomulkiewicz, R. and Beder, J. H. (1996) The selection gradient of an infinite-dimensional trait. SIAM Journal of Applied Mathematics 56, 509-523.

[10] Griswold, C. K., Gomulkiewicz, R. and Heckman, N. (2007) Statistical power and hypothesis testing with functional data. Manuscript.

[11] Heckman, N. and Wang, W. (2007) Linear mixed models for measurement error in functional regression. Manuscript, Department of Statistics, University of British Columbia.

[12] Hone, D. W. E. and Benton, B. J. (2005) The evolution of large size: how does Cope's Rule work? Trends in Ecology and Evolution 20, 1, 4-6.

[13] Kingsolver, J. G., Gomulkiewicz, R. and Carter, P. A. (2001) Variation, selection, and evolution of function-valued traits. Genetica 112, 87-104.

[14] Kirkpatrick, M. and Heckman, N. (1989) A quantitative-genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 27, 429-450.

[15] Koteja, P., Carter, P. A., Swallow, J. G. and Garland, T., Jr. (2003) Food wasting in house mice: variation among individuals, families, and genetic lines. Physiology & Behavior 80, 375-383.

[16] Meyer, K. (2005) Advances in methodology for random regression analyses. Australian Journal of Experimental Agriculture 45, 847-859.

[17] Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2003) Ontogenies in mice selected for high voluntary wheel-running activity. I. Mean ontogenies. Evolution 57, 646-657.

[18] Sogard, S. M. (1997) Size-selective mortality in the juvenile stage of teleost fishes: a review. Bulletin of Marine Science 60, 1129-1157.

[19] Swallow, J. G., Carter, P. A. and Garland, T., Jr. (1998) Artificial selection for increased wheel-running behavior in house mice. Behavior Genetics 28, 227-237.

[20] Swallow, J. G., Koteja, P., Carter, P. A. and Garland, T., Jr. (1999) Artificial selection for increased wheel-running activity in house mice results in decreased body mass at maturity. Journal of Experimental Biology 202, 2513-2520.

Chapter 5

Conclusions and future plan

In this thesis, we study identifiability of linear mixed effects models in Chapter 2. In particular, in Section 2.5 we give necessary and sufficient conditions for identifiability of a model used in functional data analysis. That model is our object of study in Chapter 3, where we show how to use the ECME algorithm to estimate the model parameters, develop statistics for hypothesis testing of the functional coefficient β, and derive expressions for the residuals to check for outliers or influential points. We also apply the model to analyze a biological data set in Section 3.6. Chapter 4 provides a detailed background of the biological experiment, a description of the data, and biological interpretations of our analysis. In this chapter, we give possible directions for future research. In Section 5.1, we adjust our Chapter 3 model to account for correlated individuals. In Section 5.2, we consider generalizing our Chapter 3 model from a scalar response to a functional response.

5.1 Models with correlated individuals

In Chapter 4, we apply our model (3.4) and (3.5) to analyze a biological data set. Specifically, the observed ith mouse's log body mass, z_i, is modelled as

$$z_i = \mu + A x_i + \epsilon_i \tag{5.1}$$

and the ith mouse's average wheel running, Y_i, is modelled as

$$Y_i = \beta_0 + \beta' T x_i + e_i. \tag{5.2}$$
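For concreteness, here is a minimal simulation sketch of (5.1)-(5.2), assuming independent individuals and, for simplicity, an identity measurement-error covariance; all inputs are hypothetical and the actual error structure may be richer.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(mu, A, T, beta0, beta, Sigma_x, sigma_eps, sigma_e, N):
    """Draw N individuals from z_i = mu + A x_i + eps_i and
    Y_i = beta0 + beta' T x_i + e_i with x_i ~ N(0, Sigma_x) and
    independent Gaussian errors (illustrative only).  Dimensions:
    mu is an n-vector, A is n x K, T is K x K, beta is a K-vector."""
    K = Sigma_x.shape[0]
    x = rng.multivariate_normal(np.zeros(K), Sigma_x, size=N)   # N x K
    z = mu + x @ A.T + sigma_eps * rng.standard_normal((N, len(mu)))
    y = beta0 + x @ T.T @ beta + sigma_e * rng.standard_normal(N)
    return z, y
```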
We assume the random effects, the x_i's, are independent across individuals. However, in the biological experiment, mice may come from the same family and thus traits may be genetically correlated. This would lead to correlations among the random effects, the x_i's. Clearly, our independence assumption is then violated and we should take the dependence among individuals into consideration. In this section, we show how to handle this family correlation more directly, by adjusting our model to account for the dependence.

In the experiment, two mice came from the same family. Let the subscript i index the family and the subscript k = 1 or 2 differentiate the two family members. Models of the ikth mouse's log body mass and average wheel running are, respectively,

$$z_{ik} = \mu + A x_{ik} + \epsilon_{ik}, \tag{5.3}$$

$$Y_{ik} = \beta_0 + \beta' T x_{ik} + e_{ik}, \quad k = 1, 2. \tag{5.4}$$

We still assume that Cov(x_{i1}) = Σ_x = Cov(x_{i2}) but, within the ith family, x_{i1} and x_{i2} are correlated as

$$\mathrm{Cov}(x_{i1}, x_{i2}) = \alpha\,\Sigma_x, \tag{5.5}$$

where α is a known constant. The constant α comes from genetic theory (see, for instance, Heckman, 2003). We still assume the measurement errors are independent of the random effects and are independent across individuals:

$$\epsilon_{ik} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2 R), \qquad e_{ik} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$

and

$$(x_{11}, x_{12}), \ldots, (x_{N1}, x_{N2}),\ \epsilon_{11}, \epsilon_{12}, \ldots, \epsilon_{N1}, \epsilon_{N2},\ e_{11}, e_{12}, \ldots, e_{N1}, e_{N2} \text{ are mutually independent.}$$

The model (5.3) and (5.4) now includes the dependence between the two family members. In the following, we give another expression of the model, dropping the subscript k. We first group each family's information together. Let

$$z_i = \begin{pmatrix} z_{i1} \\ z_{i2} \end{pmatrix}, \quad y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix}, \quad x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix}, \quad \epsilon_i = \begin{pmatrix} \epsilon_{i1} \\ \epsilon_{i2} \end{pmatrix}, \quad e_i = \begin{pmatrix} e_{i1} \\ e_{i2} \end{pmatrix}.$$

To write a model for the ith family, let

$$\tilde\mu = \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \quad \tilde\beta_0 = \begin{pmatrix} \beta_0 \\ \beta_0 \end{pmatrix}, \quad \tilde A = \begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}, \quad \widetilde{\beta' T} = \begin{pmatrix} \beta' T & 0 \\ 0 & \beta' T \end{pmatrix}.$$

From (5.3) and (5.4), our model for the ith family is

$$z_i = \tilde\mu + \tilde A x_i + \epsilon_i, \tag{5.6}$$

$$y_i = \tilde\beta_0 + \widetilde{\beta' T}\,x_i + e_i. \tag{5.7}$$

To derive the covariances of the (z_i, y_i), we let Σ_{Ax} = A Σ_x A', Σ_{A,β} = A Σ_x T' β and Σ_{β,x} = β' T Σ_x T' β. From (5.5), it is not hard to see that the following holds:

$$\mathrm{Cov}(z_i) = \begin{pmatrix} \Sigma_{Ax} + \Sigma_\epsilon & \alpha\,\Sigma_{Ax} \\ \alpha\,\Sigma_{Ax} & \Sigma_{Ax} + \Sigma_\epsilon \end{pmatrix}, \qquad
\mathrm{Cov}(z_i, y_i) = \begin{pmatrix} \Sigma_{A,\beta} & \alpha\,\Sigma_{A,\beta} \\ \alpha\,\Sigma_{A,\beta} & \Sigma_{A,\beta} \end{pmatrix},$$

$$\mathrm{Cov}(y_i) = \begin{pmatrix} \Sigma_{\beta,x} + \sigma^2 & \alpha\,\Sigma_{\beta,x} \\ \alpha\,\Sigma_{\beta,x} & \Sigma_{\beta,x} + \sigma^2 \end{pmatrix},$$

where Σ_ε = σ²R is the measurement error covariance. Model (5.6) and (5.7) uses the individuals' family information jointly, and the correlation between family members is reflected in the model's covariance structure. To estimate the model parameters, we can use the ECME algorithm, but we may need to modify the algorithm as the model covariance structure is no longer in a simple form.
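A minimal sketch of assembling these blocked covariances, assuming all inputs are given; this only restates the algebra above in code.

```python
import numpy as np

def family_covariances(A, T, beta, Sigma_x, Sigma_eps, sigma2, alpha):
    """Build Cov(z_i), Cov(z_i, y_i) and Cov(y_i) for a family of two
    correlated mice, following (5.5)-(5.7) (illustrative only)."""
    S_Ax = A @ Sigma_x @ A.T                 # Sigma_Ax = A Sigma_x A'
    S_Ab = A @ Sigma_x @ T.T @ beta          # Sigma_{A,beta}, an n-vector
    S_bx = beta @ T @ Sigma_x @ T.T @ beta   # Sigma_{beta,x}, a scalar
    cov_z = np.block([[S_Ax + Sigma_eps, alpha * S_Ax],
                      [alpha * S_Ax, S_Ax + Sigma_eps]])
    cov_zy = np.block([[S_Ab[:, None], alpha * S_Ab[:, None]],
                       [alpha * S_Ab[:, None], S_Ab[:, None]]])
    cov_y = np.array([[S_bx + sigma2, alpha * S_bx],
                      [alpha * S_bx, S_bx + sigma2]])
    return cov_z, cov_zy, cov_y
```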
5.2 Proposing models with a functional response

In Chapter 3, we applied our model to analyze the mouse wheel running data set. In the experiment, the wheel running distances were observed every week for 60 consecutive weeks. We took the averaged wheel running distance as a scalar response, averaging to reduce nuisance effects in the wheel running. However, we would like to consider the entire wheel-running trajectory as a response and to study its dependence on the log body mass trajectory. In this section, we introduce a model which generalizes the scalar response to a functional response. We give an outline of the plan to study the model. Finally, we give a literature review of current work on this functional response and predictor model.

5.2.1 Models of a functional response

Recall our modelling of the scalar response Y_i is

$$Y_i = \beta_0 + \int_a^b \beta(s)\,\big[Z_i(s) - E(Z_i(s))\big]\,ds + e_i.$$

Now we have a functional response Y_i indexed by t. Replacing the univariate function β(·) with a bivariate β(·,·), we consider generalizing the scalar response model to accommodate the functional response as

$$Y_i(t) = \alpha(t) + \int_a^b \beta(t, s)\,\big[Z_i(s) - E(Z_i(s))\big]\,ds + e_i(t). \tag{5.8}$$

We consider a tensor-product expansion of β(·,·) using basis functions ϕ_1, ..., ϕ_J and ψ_1, ..., ψ_L, as

$$\beta(t, s) = \sum_{j=1}^{J}\sum_{l=1}^{L} b_{jl}\,\varphi_j(t)\,\psi_l(s) \equiv \varphi(t)'\,B\,\psi(s), \qquad B[j, l] = b_{jl}. \tag{5.9}$$

The modelling of the predictor Z_i remains the same as

$$Z_i(s) = \mu(s) + \sum_{k=1}^{K} x_{ik}\,\phi_k(s) \equiv \mu(s) + \phi(s)'\,x_i,$$

with µ smooth and x_i ≡ (x_{i1}, ..., x_{iK})' i.i.d. N(0, Σ_x). Thus we can write

$$Y_i(t) = \alpha(t) + \varphi(t)'\,B\,T\,x_i + e_i(t), \qquad T_{lk} = \int_a^b \psi_l(s)\,\phi_k(s)\,ds \equiv \int \psi(s)\,\phi(s)'\,ds. \tag{5.10}$$

Both the Z_i and Y_i are observed at a sequence of discrete time points. Typically, we assume the observed Z_i are perturbed with measurement errors. Let ε_{ij} represent the measurement error of Z_i at time s_{ij}, j = 1, ..., n_i, and let z_{ij} be the contaminated observed value, so that

$$z_{ij} = Z_i(s_{ij}) + \epsilon_{ij} \equiv \mu(s_{ij}) + \phi(s_{ij})'\,x_i + \epsilon_{ij}. \tag{5.11}$$

Let µ = (µ(s_{i1}), ..., µ(s_{in_i}))', let ε_i = (ε_{i1}, ..., ε_{in_i})' represent the errors, and let z_i = (z_{i1}, ..., z_{in_i})' be the observed values. Write

$$z_i = \mu + A x_i + \epsilon_i, \qquad A_{jk} = \phi_k(s_j),\ j = 1, \ldots, n_i,\ k = 1, \ldots, K,$$
$$\epsilon_i \sim N(0, \Sigma_{\epsilon_i}); \qquad x_1, \ldots, x_n, \epsilon_1, \ldots, \epsilon_n \text{ are mutually independent.} \tag{5.12}$$

Suppose the process Y_i is observed at a sequence of discrete time points (t_1, ..., t_{m_i}). Let α = (α(t_1), ..., α(t_{m_i}))', let e_i = (e_i(t_1), ..., e_i(t_{m_i}))' represent the errors, and let y_i = (y_i(t_1), ..., y_i(t_{m_i}))' ≡ (y_{i1}, ..., y_{im_i})' be the observed values. We then write

$$y_i = \alpha + \Phi\,B\,T\,x_i + e_i, \qquad \Phi_{ij} = \varphi_j(t_i), \qquad e_i \sim N(0, \Sigma_{e_i}),$$
$$e_i \text{ independent of both } x_i \text{ and } \epsilon_i; \quad x_1, \ldots, x_n, \epsilon_1, \ldots, \epsilon_n, e_1, \ldots, e_n \text{ are mutually independent.} \tag{5.13}$$

The data available are the observed y_i's and the z_i's, and the unknown parameter of most interest is B. We then use (5.12) and (5.13) as our model of the functional response and the functional predictor for data analysis.
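To fix ideas, the mean structure of (5.13) and the coefficient surface (5.9) are just matrix products once the basis functions are evaluated on grids; a minimal sketch follows, with all inputs assumed given.

```python
import numpy as np

def functional_response_mean(alpha, Phi, B, T, x):
    """E(y_i | x_i) = alpha + Phi B T x_i for one individual, as in
    (5.13).  Phi is m x J, B is J x L, T is L x K, x is a K-vector."""
    return alpha + Phi @ B @ T @ x

def beta_surface(Phi_t, B, Psi_s):
    """Evaluate beta(t, s) = phi(t)' B psi(s) on a grid, as in (5.9):
    rows index t, columns index s.  Phi_t is n_t x J, Psi_s is n_s x L."""
    return Phi_t @ B @ Psi_s.T
```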
5.2.2 Outline of the model study

We outline our plan to study the model (5.12) and (5.13) in this section. The steps basically follow those in Chapter 3: model identifiability, parameter estimation, hypothesis testing and model diagnostics. Throughout, the basis functions are assumed known and fixed. We give a discussion of the choice of basis functions at the end.

In Section 2.5 we gave conditions for model identifiability for the scalar response model. We can use similar methods to study identifiability of this functional response model, with covariance parameters now {Σ_x, Σ_{ε_i}, Σ_{e_i}, i = 1, ..., N}. New results will be needed to guarantee model identifiability under restrictions on the Σ_{ε_i}'s and the Σ_{e_i}'s.

After identifying the model parameters, we can proceed to estimate the unknown parameters using an ECME algorithm, assuming the basis functions known and fixed. The parameter of most interest is the matrix B in (5.9). After obtaining the estimate, similar approaches using the Hessian matrix or the bootstrap can be applied to get the standard errors of B̂. For hypothesis testing of β, it may be of interest to know whether, at a certain fixed point t, β(t, ·) is zero or not. So the test statistics developed for the scalar response model can also be used to test the hypothesis Ho: β(t, s) = 0 for all s ∈ [a, b] when t is fixed. After fitting the model, residuals of the fit of y_i can be obtained based on the BLUP of the random effects x_i. However, for each individual, the fitted value and the residual of y_i are now vectors, not scalars. Appropriate methods for residual analysis need to be proposed.

The basis functions ϕ_j and ψ_l for β and φ_k for Z_i can be chosen using an eigenanalysis approach. As in the scalar response case, we apply an eigenanalysis to the observed z_{ij}'s and take φ_1, ..., φ_K as the first K estimated eigenfunctions. We take L = K and ψ_k = φ_k to determine one set of the basis functions of β(·,·) in (5.9), the set of basis functions that appears in the integral T in (5.10). The other set of basis functions, the ϕ_j's, can be chosen as the estimated eigenfunctions of the Y_i process.

An alternative to the use of a tensor-product expansion of β is to consider using bivariate basis function expansions. Let the ζ_l's be bivariate basis functions. We can approximate β as

$$\beta(s, t) = \sum_{l=1}^{L} \beta_l\,\zeta_l(s, t) \equiv \zeta(s, t)'\,\beta.$$

But we then also need to consider which basis functions to choose.

5.2.3 Literature review

There are two research results closest to our work on the functional response and functional predictor model. Ramsay and Silverman (2005, chapter 16) study a slight modification of model (5.8),

$$y_i(t) = \alpha(t) + \int \beta(t, s)\,Z_i(s)\,ds + e_i(t).$$

Besides using the tensor-product expansion of β as in (5.9), the authors expand α using the ϕ_j's as

$$\alpha(t) = \sum_{j=1}^{J_1} \alpha_j\,\varphi_j(t) \equiv \varphi(t)'\,\alpha.$$

Assuming the Z_i's are observed without error, the authors estimate α and B by minimizing the integrated residual sum of squares, which is

$$\sum_{i=1}^{N} \int \Big[ y_i(t) - \varphi(t)'\alpha - \varphi(t)'\,B \int \psi(s)\,Z_i(s)\,ds \Big]^2\,dt.$$
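On a common observation grid, this criterion reduces to weighted least squares. The sketch below evaluates it by quadrature, assuming the inner integrals c_i = ∫ψ(s)Z_i(s)ds have been precomputed; all names are hypothetical.

```python
import numpy as np

def integrated_rss(y, Phi_t, alpha_c, B, C, w):
    """Ramsay-Silverman criterion on a common time grid (sketch).
    y: N x m responses on the grid; Phi_t: m x J basis values phi_j(t);
    alpha_c: J-vector of alpha coefficients; B: J x L coefficient
    matrix; C: N x L matrix whose ith row is c_i = \int psi(s) Z_i(s) ds,
    precomputed by quadrature; w: m-vector of quadrature weights."""
    fitted = Phi_t @ alpha_c + C @ B.T @ Phi_t.T   # N x m fitted curves
    resid = y - fitted
    return float((resid ** 2 @ w).sum())
```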
Yao, Müller and Wang (2005) consider (5.8) with the Z_i process observed with error. They use φ_k's equal to the eigenfunctions of the Z_i covariance function in (5.11). The random effects x_i have a diagonal covariance matrix with elements σ_k², the eigenvalues of the Z_i covariance function. The errors ε_{ij} are assumed to be i.i.d. with mean zero and variance σ². Yao et al. carry out an eigenanalysis of the Y_i's covariance function, with resulting eigenfunctions equal to the ψ_l's and corresponding eigenvalues equal to the σ_l²'s. They expand the observed y_{ik}, contaminated with measurement errors, as

$$y_{ik} = \mu_Y(t_{ik}) + \sum_{l=1}^{L} u_{il}\,\psi_l(t_{ik}) + e_{ik},$$
$$\mu_Y \text{ smooth}, \quad E(u_{il}) = 0, \quad E(u_{il}^2) = \sigma_l^2, \quad E(u_{il}u_{il'}) = 0,\ l \neq l',$$
$$e_{ik} \overset{\text{i.i.d.}}{\sim} (0, \sigma_e^2), \text{ independent of the } u_{il}\text{'s}. \tag{5.14}$$

Let C(s, t) denote the two-dimensional scatterplot-smoothed cross-covariance surface Cov(Z(s), Y(t)) of the z_{ij}'s and y_{ik}'s. The authors estimate β as

$$\hat\beta(s, t) = \sum_{k=1}^{K}\sum_{l=1}^{L} \frac{\sigma_{kl}}{\sigma_k^2}\,\phi_k(s)\,\psi_l(t),$$

where σ_{kl} is calculated as

$$\sigma_{kl} = \int\!\!\int \phi_k(s)\,C(s, t)\,\psi_l(t)\,ds\,dt.$$

Similarities between our modelling and Yao et al.'s are that the coefficient vector x_i in the expansion of the Z_i is random and the Z_i's are observed with measurement error. However, Yao et al.'s x_i has a diagonal covariance structure, while our covariance matrix of the x_i, Σ_x, can be more general. We use the information in both the z_i's and the y_i's to estimate Σ_x and the other model parameters. In Yao et al., the variances σ_k² of the x_i and the variance σ² of the error ε_{ij} are estimated entirely from the z_i's. Similarly, the σ_l²'s of the u_i and σ_e² of the e_{ik} are estimated entirely from the y_i's.

Bibliography

[1] Heckman, N. (2003) Functional data analysis in evolutionary biology. Recent Advances and Trends in Nonparametric Statistics, 49-60, Akritas, M. G. and Politis, D. N. (editors).

[2] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. 2nd edition, Springer Series in Statistics.

[3] Yao, F., Müller, H. G. and Wang, J. L. (2005) Functional linear regression analysis for longitudinal data. Annals of Statistics 33, 6, 2873-2903.

Appendix A

Appendix to Chapter 2

A.1 Proof of Corollary 2.4.1

To prove it, we use the following result from Graybill, 1983, p. 206.

Lemma A.1.1 Given two scalars a and b, the characteristic equation of the matrix C = (a − b)I + bJ in λ is

$$\big(a + (n-1)b - \lambda\big)\,\big(a - b - \lambda\big)^{n-1} = 0,$$

and hence n − 1 characteristic roots are equal to a − b and one root is equal to a + (n − 1)b.

Proof: To prove the corollary, we use Theorem 2.4.1 and a proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, that H_Z J = J. Fix Σ ∈ Θ̃. Let s > 1, σ*² = sσ² and ρ* = (ρ − 1)/s + 1. It is not hard to check that −1/(n − 1) < ρ* < 1. Define Σ* = σ*²[(1 − ρ*)I + ρ*J]. Then Σ − Σ* = (σ² − σ*²)J and, since H_Z J = J, it is clear that (2.3) is satisfied. We now show that, for any Σ_u ∈ Θ_u, there exists s* > 1 so that Σ_u* defined as in (2.4) is positive definite whenever 1 < s < s*. This will show that the model is not identifiable, which contradicts our assumption. Plugging Σ − Σ* = (σ² − σ*²)J into (2.4) yields

$$\Sigma_u^* = \Sigma_u + \sigma^2(1 - s)\,(Z'Z)^{-1}Z'JZ(Z'Z)^{-1}. \tag{A.1}$$

Since, by assumption, 1'Z ≠ 0 and Z is of full column rank, the matrix (Z'Z)^{-1}Z'JZ(Z'Z)^{-1} is non-negative definite and of rank one, since J = 11'. Let λ be its nonzero, and thus largest, eigenvalue. Let λ_m be the smallest eigenvalue of the matrix Σ_u, and let s* = λ_m/(λσ²) + 1. For any x ∈ ℝ^q, x ≠ 0, we have x'Σ_u*x > 0 by the following argument:

$$x'\Sigma_u^* x = x'\Sigma_u x + \sigma^2(1 - s)\,x'(Z'Z)^{-1}Z'JZ(Z'Z)^{-1}x \geq \lambda_m\,x'x + \sigma^2(1 - s)\,\lambda\,x'x > 0.$$

Now suppose that H_Z J ≠ J and suppose, by contradiction, that the model is not identifiable. Then, by Theorem 2.4.1, there exist nonidentical Σ and Σ* satisfying (2.3) and, since the rank of H_Z is q, the rank of Σ − Σ* is at most q. We have

$$\Sigma - \Sigma^* = \big[(\sigma^2 - \sigma^{*2}) - (\sigma^2\rho - \sigma^{*2}\rho^*)\big]\,I + (\sigma^2\rho - \sigma^{*2}\rho^*)\,J.$$

By Lemma A.1.1, the eigenvalues of Σ − Σ* are (σ² − σ*²) − (σ²ρ − σ*²ρ*), which is of multiplicity n − 1, and (σ² − σ*²) + (n − 1)(σ²ρ − σ*²ρ*), of multiplicity 1. Since Σ − Σ* is not the zero matrix, all of the eigenvalues cannot be equal to 0: we must either have no eigenvalues equal to 0, one eigenvalue equal to 0, or n − 1 eigenvalues equal to 0. In order to have rank(Σ − Σ*) ≤ q, the n − 1 multiple eigenvalues have to be zero, since 1 ≤ q < n − 1 by assumption. That is, σ² − σ*² = σ²ρ − σ*²ρ*, and so Σ − Σ* = (σ² − σ*²)J. But plugging this into (2.3) yields H_Z J = J, contradicting our assumption. ✷
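Lemma A.1.1 is easy to verify numerically for a particular choice of (a, b, n); the following check is purely illustrative.

```python
import numpy as np

# Numerical check of Lemma A.1.1 for one (a, b, n).
a, b, n = 2.0, 0.3, 6
C = (a - b) * np.eye(n) + b * np.ones((n, n))
eigs = np.sort(np.linalg.eigvalsh(C))
# Expect n-1 eigenvalues equal to a - b and one equal to a + (n-1)*b.
assert np.allclose(eigs[:-1], a - b)
assert np.isclose(eigs[-1], a + (n - 1) * b)
```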
A.2 Proof of Corollary 2.4.2

Proof: We first note a fact about the matrix H_Z. Since H_Z is symmetric and idempotent,

$$H_Z[k, k] = \sum_l \big(H_Z[k, l]\big)^2 = \big(H_Z[k, k]\big)^2 + \sum_{l \neq k} \big(H_Z[k, l]\big)^2.$$

Thus, if H_Z[k, k] = 1, then H_Z[k, i] = H_Z[i, k] = 0 for all i ≠ k.

To prove the corollary, we use Theorem 2.4.1 and a proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, that a diagonal element of H_Z is equal to 1. Without loss of generality, we assume H_Z[1, 1] = 1. Then, by our observation, H_Z[1, i] = H_Z[i, 1] = 0 for all i ≠ 1. Fix Σ = diag{σ_1², ..., σ_n²} ∈ Θ̃. Let σ_1*² satisfy 0 < σ_1*² < σ_1² and define Σ* = diag{σ_1*², σ_2², ..., σ_n²}. Then Σ − Σ* = diag{σ_1² − σ_1*², 0, ..., 0}. It is not hard to check that (2.3) is satisfied. Clearly, for any Σ_u ∈ Θ_u, Σ_u* defined as in (2.4) is also in Θ_u. Thus, the model is not identifiable, which contradicts our assumption.

Now suppose that no diagonal element of H_Z is equal to one and suppose, by contradiction, that the model is not identifiable. Then there exist nonidentical diagonal matrices, Σ and Σ*, satisfying (2.3). As Σ ≠ Σ*, at least one of the diagonal elements of Σ − Σ* is not zero. Suppose the kth diagonal element is not zero. By (2.3), the kth diagonal element of H_Z must be one, contradicting our assumption. ✷

Appendix B

Appendix to Chapter 3

In this appendix, we provide the calculations in Sections 3.3.2 and 3.3.4 where we find the updates of Σ_x and {β, σ²} in the ECME procedure. In Section B.4, we derive the first order condition (3.10) and the Hessian matrix (3.14) of Section 3.3.2, where we maximize the log-likelihood Λ_N over Σ_x while holding the other parameters fixed. In Section B.5, we derive the first order conditions (3.22) and (3.23), and the Hessian matrix (3.26), of Section 3.3.4, where we maximize Λ̃_N over {β, σ²} holding the other parameters fixed.

We use the tool of matrix differential calculus, calculating first differentials to obtain the first order conditions and second differentials to obtain the Hessian matrices. The book by Magnus and Neudecker (1988) gives an elegant description of this subject. In Sections B.1-B.3, we follow the book to introduce some definitions and provide some background, mainly from Part Two of the book. We keep the same notation as in the book. Throughout this section, chapters and page numbers all refer to Magnus and Neudecker (1988).

B.1 Definition of the first differential

We first give the definition of the first differential for a vector function (a vector valued function with a vector argument). We show that the function's first differential is connected with its Jacobian matrix. We then give an extension of the definition to a matrix function (a matrix valued function with a matrix argument) and show how to identify the Jacobian matrix from the first differential.

Definition B.1.1 Let f : S → ℝ^m be a function defined on a set S in ℝ^n. Let c be an interior point of S, and let B(c; r) be an n-ball lying in S centred at c of radius r. If there exists a real m × n matrix A, depending on c but not on u, such that

$$f(c + u) = f(c) + A(c)\,u + r_c(u)$$

for all u ∈ ℝ^n with ||u|| < r, and

$$\lim_{u \to 0} \frac{r_c(u)}{||u||} = 0,$$

then the function f is said to be differentiable at c; the m × n matrix A(c) is then called the first derivative of f at c, and the m × 1 vector

$$df(c; u) = A(c)\,u,$$

which is a linear function of u, is called the first differential of f at c (with increment u). If f is differentiable at every point of an open subset E of S, we say f is differentiable on E.

After calculating the first differential, we identify the Jacobian matrix as follows. Let Df be the m × n Jacobian matrix of f, whose ijth element is D_j f_i: the partial derivative of the ith component function f_i of f with respect to the jth coordinate. The First Identification Theorem (p. 87) states that the first derivative A(c) is Df(c) when f is differentiable at c.

Extending the definition from a vector function to a matrix function with a matrix argument is straightforward using the vec operator. The vec operator transforms a matrix into a vector by stacking the columns of the matrix one underneath the other. Recall the norm of a real matrix X is defined by ||X|| = (tr X'X)^{1/2}.
Let ℝ^{n×q} contain all the real n × q matrices. Define a ball B(C; r) with center C and radius r in ℝ^{n×q} by

$$B(C; r) = \{X : X \in \mathbb{R}^{n \times q},\ ||X - C|| < r\}.$$

Let F : S → ℝ^{m×p} be a matrix function defined on a set S in ℝ^{n×q}. That is, F maps an n × q matrix into an m × p matrix F(X). We consider the vector function f : vec S → ℝ^{mp} defined by

$$f(\mathrm{vec}\,X) = \mathrm{vec}\,F(X),$$

and the following gives the definition of the first differential of F.

Definition B.1.2 Let F : S → ℝ^{m×p} be a matrix function defined on a set S in ℝ^{n×q}. Let C be an interior point of S and let B(C; r) ⊂ S be a ball with center C and radius r. If there exist R_C and a real (mp) × (nq) matrix A, depending on C but not on U, such that

$$\mathrm{vec}\,F(C + U) = \mathrm{vec}\,F(C) + A(C)\,\mathrm{vec}\,U + \mathrm{vec}\,R_C(U)$$

for all U ∈ ℝ^{n×q} with ||U|| < r, and

$$\lim_{U \to 0} \frac{R_C(U)}{||U||} = 0,$$

then the function F is said to be differentiable at C. Let

$$dF(C; U) = A(C)\,\mathrm{vec}\,U.$$

Although this is a vector of length mp, it can be formed into a matrix of dimension m × p in the usual natural way. This m × p matrix dF(C; U) is called the first differential of F at C with increment U, and the (mp) × (nq) matrix A(C) is called the first derivative of F at C.

From the definition, it is clear that the differentials of F and f are related by

$$\mathrm{vec}\,dF(C; U) = df(\mathrm{vec}\,C;\ \mathrm{vec}\,U).$$

The Jacobian matrix of F at C is defined as

$$DF(C) = Df(\mathrm{vec}\,C).$$

This is an (mp) × (nq) matrix whose ijth element is the partial derivative of the ith component of vec F(X) with respect to the jth element of vec X, evaluated at X = C. The First Identification Theorem for matrix functions (p. 96) states that if F is differentiable at C, then

$$\mathrm{vec}\,dF(C; U) = DF(C)\,\mathrm{vec}\,U.$$

So we can calculate the differential of F to identify its Jacobian matrix.

B.2 Definition of the second differential

We first introduce the definition of twice differentiable, on which the definition of the second differential is based. The definitions are restricted to real valued functions, since in our calculations we only need to consider second differentials of real valued functions. We then connect the Hessian matrix with the second differential. As in the first differential case, we give an extension of the definition to a real valued function with a matrix argument and also show how to identify the Hessian matrix from the second differential.

Definition B.2.1 Let f : S → ℝ be a real valued function defined on a set S in ℝ^n, and let c be an interior point of S. If f is differentiable in some n-ball B(c) and each of the partial derivatives D_j f is differentiable at c, then we say that f is twice differentiable at c. If f is twice differentiable at every point of an open subset E of S, we say f is twice differentiable on E.

The following is the definition of the second differential.
Definition B.2.2 Let f : S → ℝ be twice differentiable at an interior point c of S ⊂ ℝ^n. Let B(c) be an n-ball lying in S such that f is differentiable at every point in B(c), and let g : B(c) → ℝ be defined by the equation

$$g(x) = df(x; u).$$

Then the differential of g at c with increment u, i.e. dg(c; u), is called the second differential of f at c (with increment u), and is denoted by d²f(c; u).

To calculate the second differential of f, by the definition, we just need to calculate the differential of the first differential of f, i.e. d²f = d(df). We have seen that the Jacobian matrix can be identified from the first differential. Similarly, we can identify the Hessian matrix from the second differential. The Second Identification Theorem (p. 107) states that if f is twice differentiable at c, then

$$d^2 f(c; u) = u'\,\big(Hf(c)\big)\,u,$$

where Hf(c) is the n × n symmetric Hessian matrix of f at c, with (i, j) entry equal to ∂²f(c)/(∂c_i ∂c_j). Therefore, once we have calculated the second differential, the Hessian matrix is obtainable.

Similarly as in Section B.1, the extension of the second differential from vector functions to matrix functions is straightforward using the vec operator. As we only consider real valued functions, we restrict the extension to a real valued function with a matrix argument. We follow the notation in the definition of the first differential of matrix functions. Let the domain of f be S ⊆ ℝ^{n×q} and let C be an interior point of S. Let B(C; r) ⊂ S be a ball with center C and radius r, and let U be a point in ℝ^{n×q} with ||U|| < r, so that C + U ∈ B(C; r). The second differential of f at C is then defined as

$$d^2 f(C; U) = d^2 f(\mathrm{vec}\,C;\ \mathrm{vec}\,U).$$

The Second Identification Theorem for matrix functions (p. 115) says that if f is twice differentiable at C, then

$$d^2 f(C; U) = (\mathrm{vec}\,U)'\,Hf(C)\,\mathrm{vec}\,U,$$

where Hf(C) is the nq × nq symmetric Hessian matrix of f at C, defined as Hf(C) ≡ Hf(vec C). That is, the ijth element of Hf(C) is the second order derivative of f with respect to the ith and jth elements of vec X, where X ∈ S, evaluated at C.

B.3 Matrix algebraic and differential rules

In this section, we list the matrix algebraic and differential rules (chap. 8) which will be used without specific reference in our derivations; a numerical check of two of these rules is sketched after the list. In the following, we let A, B, C and D denote constant matrices, u denote a vector function, and U and V denote matrix functions. We let ⊗ stand for the Kronecker product. The rules are the following.

• tr(AB) = tr(BA), provided AB is square.
• tr(A'B) = (vec A)' vec B.
• tr(ABCD) = (vec D)'(A ⊗ C')(vec B'), provided ABCD is defined and square.
• dA = 0.
• d(AU) = A dU.
• d(U + V) = dU + dV.
• d(UV) = (dU)V + U(dV).
• d(U') = (dU)'.
• d(ln det U) = tr U⁻¹(dU).
• d(U⁻¹) = −U⁻¹(dU)U⁻¹.
• d(tr U) = tr(dU).
• d(vec U) = vec(dU).
• d(u'Au) = u'(A + A')du = (du)'(A + A')u.
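As a sanity check, two of these rules are easy to verify numerically: the differential of ln det via a small perturbation, and the Kronecker trace identity directly. The snippet below is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Check d(ln det U) = tr(U^{-1} dU) by finite differences.
U = np.eye(n) + 0.1 * rng.standard_normal((n, n))
dU = 1e-7 * rng.standard_normal((n, n))
lhs = np.linalg.slogdet(U + dU)[1] - np.linalg.slogdet(U)[1]
rhs = np.trace(np.linalg.solve(U, dU))
assert np.isclose(lhs, rhs, rtol=1e-4, atol=1e-12)

# Check tr(ABCD) = (vec D)'(A kron C')(vec B'), with vec = column stacking.
A, B, C, D = (rng.standard_normal((n, n)) for _ in range(4))
lhs = np.trace(A @ B @ C @ D)
rhs = D.flatten("F") @ np.kron(A, C.T) @ (B.T).flatten("F")
assert np.isclose(lhs, rhs)
```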
B.4 Calculations in Section 3.3.2

Recall the observed data log-likelihood has the expression

$$\Lambda_N = -\frac{N}{2}\ln\det\big(C\Sigma_x C' + \Sigma_d\big) - \frac{1}{2}\sum_{i=1}^{N}(W_i - \mu_W)'\big(C\Sigma_x C' + \Sigma_d\big)^{-1}(W_i - \mu_W).$$

We want to maximize it over Σ_x while holding the other parameters fixed. In this section, holding the other parameters fixed, we calculate the first differential of Λ_N to obtain the first order condition (3.10) and calculate the second differential to obtain the Hessian matrix in (3.14). As we treat Σ_x as the only unknown parameter, it immediately follows from the expression of Σ_W in (3.9) that

$$d\Sigma_W = C\,d\Sigma_x\,C'. \tag{B.1}$$

In our derivation, we will use the shorter notation dΣ_W until we reach (3.10) or (3.14). We have

$$d\Lambda_N = -\frac{N}{2}\,d(\ln\det\Sigma_W) - \frac{1}{2}\sum_{i=1}^{N}(W_i - \mu_W)'\,d(\Sigma_W^{-1})\,(W_i - \mu_W)$$
$$= -\frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}\,d\Sigma_W\big) + \frac{1}{2}\sum_{i=1}^{N}(W_i - \mu_W)'\,\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}\,(W_i - \mu_W)$$
$$= -\frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}\,d\Sigma_W\big) + \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1} S_W \Sigma_W^{-1}(d\Sigma_W)\big)$$
$$= -\frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}(d\Sigma_W)\big), \tag{B.2}$$

where S_W = N⁻¹ Σ_{i=1}^N (W_i − µ_W)(W_i − µ_W)'. Recalling (B.1), we now have

$$d\Lambda_N = -\frac{N}{2}\,\mathrm{tr}\big(C'\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}C\,d\Sigma_x\big).$$

By the First Identification Theorem for matrix functions mentioned in Section B.1, we obtain the Jacobian matrix of Λ_N at Σ_x as

$$D\Lambda_N(\Sigma_x) = -\frac{N}{2}\,\Big[\mathrm{vec}\big(C'\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}C\big)\Big]'.$$

Equating it to zero yields

$$C'\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}C = 0,$$

which is equivalent to the first order condition (3.10).

Next we calculate d²Λ_N to identify the Hessian matrix in (3.14). Starting from (B.2), we have

$$d^2\Lambda_N = -\frac{N}{2}\,\mathrm{tr}\big((d\Sigma_W^{-1})(\Sigma_W - S_W)\Sigma_W^{-1}(d\Sigma_W)\big) - \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}\,d(\Sigma_W - S_W)\,\Sigma_W^{-1}(d\Sigma_W)\big) - \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}(\Sigma_W - S_W)(d\Sigma_W^{-1})(d\Sigma_W)\big)$$
$$= \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}(d\Sigma_W)\big) - \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}(d\Sigma_W)\big) + \frac{N}{2}\,\mathrm{tr}\big(\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}(d\Sigma_W)\big).$$

The first term and the last term on the right hand side are equal, and so they can be combined into one term,

$$N\,\mathrm{tr}\big(\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}(\Sigma_W - S_W)\Sigma_W^{-1}(d\Sigma_W)\big).$$

Then

$$d^2\Lambda_N = N\,\mathrm{tr}\Big(\Sigma_W^{-1}(d\Sigma_W)\Sigma_W^{-1}\big(\tfrac{1}{2}\Sigma_W - S_W\big)\Sigma_W^{-1}(d\Sigma_W)\Big).$$

Recall (B.1). The right hand side of the above is

$$N\,\mathrm{tr}\Big(\Sigma_W^{-1}C\,d\Sigma_x\,C'\,\Sigma_W^{-1}\big(\tfrac{1}{2}\Sigma_W - S_W\big)\Sigma_W^{-1}C\,d\Sigma_x\,C'\Big)
= N\,\mathrm{tr}\Big(C'\Sigma_W^{-1}C\,d\Sigma_x\,C'\Sigma_W^{-1}\big(\tfrac{1}{2}\Sigma_W - S_W\big)\Sigma_W^{-1}C\,d\Sigma_x\Big)$$
$$= N\,(\mathrm{vec}\,d\Sigma_x)'\,\Big[C'\Sigma_W^{-1}C \otimes C'\Sigma_W^{-1}\big(\tfrac{1}{2}\Sigma_W - S_W\big)\Sigma_W^{-1}C\Big]\,\mathrm{vec}(d\Sigma_x).$$

When evaluated at the critical point Σ̂_x, which satisfies the first order condition (3.10), d²Λ_N is then

$$-\frac{N}{2}\,(\mathrm{vec}\,d\Sigma_x)'\,\big(C'\hat\Sigma_W^{-1}C \otimes C'\hat\Sigma_W^{-1}C\big)\,\mathrm{vec}(d\Sigma_x).$$

By the Second Identification Theorem for matrix functions mentioned in Section B.2, we have at Σ̂_x that the Hessian matrix is equal to

$$H(\hat\Sigma_x) = -\frac{N}{2}\,\hat D \otimes \hat D, \quad \text{where } \hat D = C'\hat\Sigma_W^{-1}C.$$

This is the matrix we saw in (3.14).
B.5 Calculations in Section 3.3.4

Recall that we want to maximize the log-likelihood

    \tilde\Lambda_N = -\frac{N}{2} \ln(\beta' K\beta + \sigma^2) - \frac{1}{2(\beta' K\beta + \sigma^2)} \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2

over {β, σ^2} to find the update while fixing the other parameters. In this section we derive the first order conditions (3.22) and (3.23) by calculating the first differential of Λ̃_N with respect to {β, σ^2}. Calculating the second differential of Λ̃_N then gives us the Hessian matrix (3.26).

The following two differentials will facilitate our calculation of dΛ̃_N and d^2 Λ̃_N. Recall the expression of σ^2_{Y|z} in (3.19). We have

    d\sigma^2_{Y|z} \equiv d(\beta' K\beta + \sigma^2) = 2\beta' K \, d\beta + d\sigma^2.    (B.3)

Let

    g(\beta) = \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big) (z_i - \mu)' G'.    (B.4)

We then obtain

    d \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2 = -2 \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big) (d\beta)' G(z_i - \mu)
      = -2 \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big) (z_i - \mu)' G' (d\beta)
      = -2 g(\beta) \, d\beta.    (B.5)

To calculate dΛ̃_N, we use the terms σ^2_{Y|z} and β'Kβ + σ^2 interchangeably:

    d\tilde\Lambda_N = -\frac{N}{2} d\ln(\beta' K\beta + \sigma^2) - d\Big(\frac{1}{2(\beta' K\beta + \sigma^2)}\Big) \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2
        - \frac{1}{2(\beta' K\beta + \sigma^2)} \, d \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2.

By (B.3) and (B.5), the right-hand side above is equal to

    -\frac{N(2\beta' K d\beta + d\sigma^2)}{2\sigma^2_{Y|z}} + \frac{2\beta' K d\beta + d\sigma^2}{2\sigma^4_{Y|z}} \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2 + \frac{1}{\sigma^2_{Y|z}} g(\beta) \, d\beta.

Let

    c(\beta, \sigma^2) = \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)^2 - N\sigma^2_{Y|z}.    (B.6)

Then dΛ̃_N is equal to

    \frac{d\sigma^2}{2\sigma^4_{Y|z}} c(\beta, \sigma^2) + \frac{1}{\sigma^4_{Y|z}} \big[\beta' K \, c(\beta, \sigma^2) + \sigma^2_{Y|z} \, g(\beta)\big] d\beta.    (B.7)

By the First Identification Theorem mentioned in Section B.1, we obtain the first order conditions

    \frac{1}{\sigma^4_{Y|z}} \big[\beta' K \, c(\beta, \sigma^2) + \sigma^2_{Y|z} \, g(\beta)\big] = 0,
    \frac{1}{2\sigma^4_{Y|z}} c(\beta, \sigma^2) = 0,

which lead to (3.22) and (3.23).

Calculating d^2 Λ̃_N to identify the Hessian matrix directly is lengthy and tedious. In fact, we do not need the closed form of the Hessian matrix, only the Hessian matrix evaluated at the critical point {β̂, σ̂^2} given in (3.26). So in our derivation we will make use of the first order conditions to simplify the calculation.

We notice that, equivalently, the critical point {β̂, σ̂^2} only needs to satisfy

    g(\hat\beta) = 0,    (B.8)
    c(\hat\beta, \hat\sigma^2) = 0.    (B.9)

From (B.6), using (B.3) and (B.5), we have

    dc(\beta, \sigma^2) = -2 \sum_{i=1}^N \big(Y_i - \beta_0 - \beta' G(z_i - \mu)\big)(z_i - \mu)' G' \, d\beta - 2N\beta' K \, d\beta - N \, d\sigma^2
      = -2 g(\beta) \, d\beta - 2N\beta' K \, d\beta - N \, d\sigma^2,

which is a function of β. By (B.8),

    dc(\hat\beta, \sigma^2) = -2N (d\beta)' K\hat\beta - N \, d\sigma^2.    (B.10)

Now we calculate d^2 Λ̃_N starting from (B.7). We first calculate

    d\Big(\frac{d\sigma^2}{2\sigma^4_{Y|z}} c(\beta, \sigma^2)\Big),

which is

    d\Big(\frac{1}{2\sigma^4_{Y|z}}\Big) d\sigma^2 \, c(\beta, \sigma^2) + \frac{d\sigma^2}{2\sigma^4_{Y|z}} \, dc(\beta, \sigma^2).

When evaluated at {β̂, σ̂^2}, by (B.9) and (B.10) it is equal to

    -\frac{N d\sigma^2}{2\hat\sigma^4_{Y|z}} \big(2(d\beta)' K\hat\beta + d\sigma^2\big) = -\frac{N}{\hat\sigma^4_{Y|z}} \begin{pmatrix} (d\beta)' & d\sigma^2 \end{pmatrix} \begin{pmatrix} K\hat\beta \\ 1/2 \end{pmatrix} d\sigma^2.    (B.11)

Then we calculate

    d\Big(\frac{1}{\sigma^4_{Y|z}} \big[\beta' K \, c(\beta, \sigma^2) + \sigma^2_{Y|z} \, g(\beta)\big] d\beta\Big),

which is

    d\Big(\frac{1}{\sigma^4_{Y|z}}\Big) \big[\beta' K \, c(\beta, \sigma^2) + \sigma^2_{Y|z} \, g(\beta)\big] d\beta + \frac{1}{\sigma^4_{Y|z}} \, d\big[\beta' K \, c(\beta, \sigma^2) + \sigma^2_{Y|z} \, g(\beta)\big] d\beta.    (B.12)

At {β̂, σ̂^2}, the term in the first square brackets vanishes by (B.8) and (B.9). Thus, at {β̂, σ̂^2}, (B.12) is equal to

    \frac{1}{\hat\sigma^4_{Y|z}} \big[(d\beta)' K \, c(\hat\beta, \hat\sigma^2) + \hat\beta' K \, dc(\hat\beta, \hat\sigma^2) + d\sigma^2_{Y|z} \, g(\hat\beta) + \hat\sigma^2_{Y|z} \, dg(\hat\beta)\big] d\beta.

From (B.4), we have

    dg(\beta) = -(d\beta)' \sum_{i=1}^N G(z_i - \mu)(z_i - \mu)' G'.    (B.13)

Again by (B.8)-(B.10) and the fact that dc(β̂, σ̂^2) is a scalar, (B.12) is equal to

    \frac{1}{\hat\sigma^4_{Y|z}} \Big[\big(-2N(d\beta)' K\hat\beta - N d\sigma^2\big) \hat\beta' K - \hat\sigma^2_{Y|z} (d\beta)' \sum_{i=1}^N G(z_i - \mu)(z_i - \mu)' G'\Big] d\beta
    = -\frac{N}{\hat\sigma^4_{Y|z}} \begin{pmatrix} (d\beta)' & d\sigma^2 \end{pmatrix} \begin{pmatrix} 2K\hat\beta\hat\beta' K + \frac{\hat\sigma^2_{Y|z}}{N} \sum_{i=1}^N G(z_i - \mu)(z_i - \mu)' G' \\ \hat\beta' K \end{pmatrix} d\beta.    (B.14)

Combining (B.11) and (B.14), we eventually get, at (β̂, σ̂^2),

    d^2\tilde\Lambda_N(\hat\beta, \hat\sigma^2) = \begin{pmatrix} (d\beta)' & d\sigma^2 \end{pmatrix} \mathrm{H}\tilde\Lambda(\hat\beta, \hat\sigma^2) \begin{pmatrix} d\beta \\ d\sigma^2 \end{pmatrix},

where

    \mathrm{H}\tilde\Lambda(\hat\beta, \hat\sigma^2) = -\frac{N}{\hat\sigma^4_{Y|z}} \begin{pmatrix} 2K\hat\beta\hat\beta' K + \frac{\hat\sigma^2_{Y|z}}{N} \sum_{i=1}^N G(z_i - \mu)(z_i - \mu)' G' & K\hat\beta \\ \hat\beta' K & 1/2 \end{pmatrix}.

By the Second Identification Theorem mentioned in Section B.2, HΛ̃(β̂, σ̂^2) is the Hessian matrix, and we have seen it in (3.26).
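The two first order conditions are straightforward to solve in practice: (B.8) is the least squares normal equation for β (with β_0 and µ held fixed among the other parameters), and (B.9) then gives σ̂^2 = N^{-1} Σ_i (Y_i - β_0 - β̂'G(z_i - µ))^2 - β̂'Kβ̂. Below is a minimal Python sketch under purely illustrative inputs; G, K, µ, β_0 and the simulated data are invented for the example, not taken from Chapter 3.

    import numpy as np

    rng = np.random.default_rng(2)
    N, q = 300, 4
    G = rng.standard_normal((q, q))          # illustrative G, K, mu, beta0
    K = np.eye(q)
    mu = rng.standard_normal(q)
    beta0 = 1.0

    z = rng.standard_normal((N, q))
    X = (z - mu) @ G.T                       # row i is (G(z_i - mu))'
    beta_true, sigma2_true = rng.standard_normal(q), 0.5
    # responses with Var(Y|z) = beta'K beta + sigma^2, mimicking sigma^2_{Y|z} in (3.19)
    sd = np.sqrt(beta_true @ K @ beta_true + sigma2_true)
    Y = beta0 + X @ beta_true + sd * rng.standard_normal(N)

    # (B.8): g(beta) = 0 is the normal equation X'(Y - beta0 - X beta) = 0
    beta_hat = np.linalg.lstsq(X, Y - beta0, rcond=None)[0]
    resid = Y - beta0 - X @ beta_hat
    print(np.abs(X.T @ resid).max())         # g(beta_hat) is numerically zero

    # (B.9): c(beta, sigma^2) = 0, i.e. RSS = N(beta'K beta + sigma^2)
    sigma2_hat = resid @ resid / N - beta_hat @ K @ beta_hat
    print(sigma2_hat)                        # should be near sigma2_true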
