Science, Faculty of
Statistics, Department of
DSpace
UBCV
Wang, Wei
2008-01-03T18:12:16Z
2008
Doctor of Philosophy - PhD
University of British Columbia
Regression models with a scalar response and
a functional predictor have been extensively
studied. One approach is to approximate the
functional predictor using basis function or
eigenfunction expansions. In the expansion,
the coefficient vector can either be fixed or
random. The random coefficient vector
is also known as random effects and thus the
regression models are in a mixed effects
framework.
The random effects provide a model for the
within individual covariance of the
observations. But it also introduces an
additional parameter into the model, the
covariance matrix of the random effects.
This additional parameter complicates the
covariance matrix of the observations.
Possibly, the covariance parameters of the
model are not identifiable.
We study identifiability in normal linear
mixed effects models. We derive necessary and
sufficient conditions of identifiability,
particularly, conditions of identifiability
for the regression models with a scalar
response and a functional predictor using
random effects.
We study the regression model using the
eigenfunction expansion approach with random
effects. We assume the random effects have a
general covariance matrix
and the observed values of the predictor are
contaminated with measurement error.
We propose methods of inference for the
regression model's functional coefficient.
As an application of the model, we analyze a
biological data set to investigate the
dependence of a mouse's wheel running
distance on its body mass trajectory.
https://circle.library.ubc.ca/rest/handle/2429/253?expand=metadata
680020 bytes
application/pdf
Linear Mixed Effects Models in Functional Data Analysis by Wei Wang B.Sc., University of Science and Technology of China, 2001 M.Sc., National University of Singapore, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in The Faculty of Graduate Studies (Statistics) The University Of British Columbia January, 2008 c© Wei Wang 2008 Abstract Regression models with a scalar response and a functional predictor have been extensively studied. One approach is to approximate the functional predictor using basis function or eigenfunction expansions. In the expansion, the coefficient vector can either be fixed or random. The random coefficient vector is also known as random effects and thus the regression models are in a mixed effects framework. The random effects provide a model for the within individual covariance of the observations. But it also introduces an additional parameter into the model, the covariance matrix of the random effects. This additional parameter complicates the covariance matrix of the observations. Possibly, the covariance parameters of the model are not identifiable. We study identifiability in normal linear mixed effects models. We derive necessary and sufficient conditions of identifiability, particularly, conditions of identifiability for the regression models with a scalar response and a func- tional predictor using random effects. We study the regression model using the eigenfunction expansion ap- proach with random effects. We assume the random effects have a general covariance matrix and the observed values of the predictor are contaminated with measurement error. We propose methods of inference for the regression model’s functional coefficient. As an application of the model, we analyze a biological data set to in- vestigate the dependence of a mouse’s wheel running distance on its body mass trajectory. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . x Statement of Co-Authorship . . . . . . . . . . . . . . . . . . . . . xi 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Summary of the biology experiment . . . . . . . . . . . . . . 1 1.2 Review of smoothing methods . . . . . . . . . . . . . . . . . 2 1.3 Functional regression models . . . . . . . . . . . . . . . . . . 5 1.4 Introduction to the thesis work . . . . . . . . . . . . . . . . . 9 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2 Identifiability discussion of linear mixed effects models . 13 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Simple sufficient conditions of identifiability . . . . . . . . . 16 2.4 Sufficient conditions of identifiability for a structured Σ . . 17 2.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 iii Table of Contents 3 Linear mixed models for measurement error in functional regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.2 Notation and Preliminaries . . . . . . . . . . . . . . . . . . . 29 3.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . 31 3.3.1 Initial estimates of parameters other than µ and β0 . 32 3.3.2 Updating Σ(t)x . . . . . . . . . . . . . . . . . . . . . . 32 3.3.3 Updating σ 2(t) . . . . . . . . . . . . . . . . . . . . . . 34 3.3.4 Updating β(t) and σ2(t) . . . . . . . . . . . . . . . . . 36 3.4 Inference for β . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.1 Hypothesis testing for β . . . . . . . . . . . . . . . . 38 3.5 Model assumption checking . . . . . . . . . . . . . . . . . . . 43 3.6 Model application . . . . . . . . . . . . . . . . . . . . . . . . 44 3.6.1 Choosing the basis functions . . . . . . . . . . . . . . 44 3.6.2 Data description . . . . . . . . . . . . . . . . . . . . . 45 3.6.3 Choice of basis functions . . . . . . . . . . . . . . . . 45 3.6.4 Estimation and residual analysis . . . . . . . . . . . . 46 3.6.5 Inference for β(·) . . . . . . . . . . . . . . . . . . . . 47 3.6.6 Inference for βs − βc . . . . . . . . . . . . . . . . . . 48 3.7 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.7.1 Two stage estimate . . . . . . . . . . . . . . . . . . . 49 3.7.2 One sample comparison . . . . . . . . . . . . . . . . . 50 3.7.3 Two sample comparison . . . . . . . . . . . . . . . . . 51 3.7.4 Edge effect discussion in one-sample MSE comparison 52 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4 Dependence of average lifetime wheel-running on body mass ontogeny . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 iv Table of Contents Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5 Conclusions and future plan . . . . . . . . . . . . . . . . . . . 98 5.1 Models with correlated individuals . . . . . . . . . . . . . . . 98 5.2 Proposing models with a functional response . . . . . . . . . 101 5.2.1 Models of a functional response . . . . . . . . . . . . 101 5.2.2 Outline of the model study . . . . . . . . . . . . . . . 103 5.2.3 Literature review . . . . . . . . . . . . . . . . . . . . 104 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Appendices A Appendix to Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 107 A.1 Proof of Corollary 2.4.1 . . . . . . . . . . . . . . . . . . . . . 107 A.2 Proof of Corollary 2.4.2 . . . . . . . . . . . . . . . . . . . . . 108 B Appendix to Chapter 3 . . . . . . . . . . . . . . . . . . . . . . 110 B.1 Definition of the first differential . . . . . . . . . . . . . . . . 110 B.2 Definition of the second differential . . . . . . . . . . . . . . 113 B.3 Matrix algebraic and differential rules . . . . . . . . . . . . . 115 B.4 Calculations in Section 3.3.2 . . . . . . . . . . . . . . . . . . 116 B.5 Calculations in Section 3.3.4 . . . . . . . . . . . . . . . . . . 118 v List of Tables 3.1 P-values of the test Ho : β(t) = 0, for all t ∈ [−1, 60] in each group. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 P-values of the test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], within each gender. . . . . . . . . . . . . . . . . . . . . . . . . 54 4.1 P-values of the test Ho : β(t) = 0, for all t ∈ [−1, 60], using test statistic Uf . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.2 P-values of the test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], using test statistic Uf . . . . . . . . . . . . . . . . . . . . . . . 87 vi List of Figures 3.1 Plots of log body mass versus week for the four groups of laboratory mice. . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.2 Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice. . . . . . . . . 56 3.3 Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components of the four groups of mice, after the individual average log body mass has been removed. . . . . . . . . . . . . . . . . . 57 3.4 The constant function and the first three smoothed eigenfunc- tions of the covariance of centered log body mass of the four groups of mice, calculated as described in Section 3.6.3. . . . 58 3.5 Residual plots of the fit of the Yi = averaged wheel running of the four groups of mice. . . . . . . . . . . . . . . . . . . . 59 3.6 Plots of residuals of Yi = averaged wheel runining in the male control group of mice before and after removing the outlier. 60 3.7 Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice, as described in Section 3.4. . . . . . . . . . . 61 3.8 Comparing the permuted values of the generalized likelihood ratio statistic Ul with the Wald statistic Uw for four groups of mice. The equivalence of Ul and Uw were observed in Ta- ble 3.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.9 Plots of β̂s− β̂c and the standard errors of the difference com- puted from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders. . . . . . . . . . . . . . . . . . . 63 vii List of Figures 3.10 MSE of the estimate of β for each γ value in one sample simulation as described in Section 3.7.2. Compare the ECME method with the two stage method. . . . . . . . . . . . . . . 64 3.11 Proportion of times Ho is rejected using level α = 0.01, where Ho : β(t) = 0, for all t ∈ [−1, 60] in one sample simulation as described in Section 3.7.2. Two test statistics are considered, Uw and Uf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.12 Proportion of times Ho is rejected using level α = 0.05, where Ho : β(t) = 0, for all t ∈ [−1, 60] in one sample simulation as described in Section 3.7.2. Two test statistics are considered, Uw and Uf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.13 MSE of the estimate of βs for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. . . . . . . . . . . . . . . 67 3.14 MSE of the estimate of βc for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. . . . . . . . . . . . . . . 68 3.15 MSE of the estimate of βs−βc for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. . . . . . . . . . . . . . . 69 3.16 Proportion of times Ho is rejected using level α = 0.01, where Ho : β s = βc, for all t ∈ [−1, 60] in two sample simulation as described in Section 3.7.3. Four test statistics are considered Ul, Uw, Ue and Uf . . . . . . . . . . . . . . . . . . . . . . . . 70 3.17 Proportion of times Ho is rejected using level α = 0.05, where Ho : β s = βc, for all t ∈ [−1, 60] in two sample simulation as described in Section 3.7.3. Four test statistics are considered Ul, Uw, Ue and Uf . . . . . . . . . . . . . . . . . . . . . . . . 71 3.18 MSE of the estimate of β for each γ value using truncated log body mass as described in Section 3.7.4. Compare the ECME method with the two stage method. . . . . . . . . . . . . . . 72 viii List of Figures 4.1 Plots of log body mass versus week for the four groups of laboratory mice. . . . . . . . . . . . . . . . . . . . . . . . . . 88 4.2 Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice. . . . . . . . . 89 4.3 Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components for four groups of mice, after the individual average log body mass has been removed. . . . . . . . . . . . . . . . . . . . . . 90 4.4 The constant function and the first three smoothed eigen- functions of the covariance of centered log body mass of four group. The four groups’ functions are together in one panel. The eigenfunctions are calculated as described in Section 4.3. 91 4.5 Plots of β̂ and its standard errors computed from the Hes- sian matrix (solid) and from the bootstrap (dash-dot) for four groups of mice. Standard errors are calculated as described in Section 4.2. . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.6 Plots of the covariances between the average wheel running and the log body mass trajectory at week j, j = −1, . . . , 60, for four groups of mice. Solid lines are the covariance from the model and dash-dot lines are the pointwise sample covariances between average wheel running and log body mass. For the calculation, refer to Heckmand and Wang (2007). . . . . . . 93 4.7 Plots of β̂s−β̂c, the difference between the selection β̂ and the control β̂, and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash- dot) for both genders. . . . . . . . . . . . . . . . . . . . . . . 94 ix Acknowledgements For the completion of this thesis, I would like very much to express my heartfelt gratitude to my supervisor Professor Nancy Heckman for all her invaluable advice and guidance, endless patience, kindness and encourage- ment during the mentor period in the Department of Statistics of University of British Columbia. I have learned many things from her, particularly re- garding academic research and analytical writing. I truly appreciate all the time and effort she has spent in helping me to solve the problems encoun- tered even when she is in the midst of her work. I would also like to express my sincere gratitude and appreciation to my committee members, namely Professors Gustafson, Paul and Wu, Lang (alphabetically) for their precious advice and help in my thesis. Special thanks to Professor Patrick Carter for providing us the biological data and the biological background of the thesis work. Thank you to all faculty, staff and graduate students in the Department of Statistics, University of British Columbia, for making my stay here such an enriching experience. x Statement of Co-Authorship The thesis is finished under the supervision of my supervisor, Professor Nancy Heckman. Chapter 2 is completed with the help of Professor Heckman. Chapter 3 is co-authored with Professor Nancy Heckman. My main contributions are the derivation of the model parameter estimation in Sec- tion 3.3 and the model diagnostics in Section 3.5. I conduct the programming work of the data analysis and the simulation studies. Chapter 4 is co-authored with Professor Patrick Carter, School of Bio- logical Sciences, Washington State University. Professor Carter conducted the biological experiment and provided us the data. Professor Carter also gives a detailed description of the experiment and the biological background. We describes the statistical method and carry out the data analysis. xi Chapter 1 Introduction This thesis work is motivated by a biology experiment involving data on individuals, data that can be considered as scalar responses depending on function-valued predictors. In the thesis, we provide a functional regression approach to study the biology problem of interest. In this introductory section, we first give a brief summary of the experiment and then a review of smoothing methods in functional data analysis. We give a literature review of the functional regression model followed by a description of our approach. 1.1 Summary of the biology experiment Professor Patrick Carter of the School of Biological Sciences of Washington State University conducted an experiment on house mice (Mus domesticus) selectively bred for increased voluntary wheel-running exercise. Details of the experiment are described in Morgan, Garland and Carter, 2003. In summary, mice were divided into four groups according to gender and the two treatments “selection” and “control”. The selection group mice were from lines bred with selection being on high wheel-running activity at age eight weeks. Control mice were from lines bred at random. Body mass and wheel running activity were recorded for each mouse for sixty two consec- utive weeks, indexed from −1 to 60, except for weeks 34, 38, 39, 50. The research interest is to know how body mass and wheel running are related and if the relationship depends on the treatment. Unfortunately, the wheel running distance data have many missing val- ues and are very noisy. In addition, the wheels were cleaned every four weeks, which resulted in spikes in wheel-running activity every fourth week. 1 Chapter 1. Introduction One way to eliminate the problem is to average the wheel running distance over weeks and use the averaged value in the analysis. After this averaging, a scalar is obtained. We treat the averaged wheel running as the response and the body mass trajectory as the predictor. We can build a regression model to study the dependence of the averaged wheel running on the body mass. To make use of the functional form of the body mass, we use a functional regression model approach. More details of this model are in Section 1.3. In the next section, we give a review of smoothing methods which can be used to study the body mass trajectories. Some of these methods will be used in Section 1.3. 1.2 Review of smoothing methods One can view a body mass trajectory as a process which can be described as a continuous function of time. Though the measurements of the trajec- tory are made at a finite discrete set of points, the trajectory lies in an infinite dimensional parameter space. This functional type of data is the study object of a statistics branch, namely functional data analysis. For a reference, please see Ramsay and Silverman (2005). In this section, we give a brief summary of some popular smoothing methods to study functional data. Ruppert, Wand and Carroll (2003) give a detailed summary of these methods. These methods can be applied to study the body mass trajectories of the mice. To further investigate the relationship between the body mass and the averaged wheel running, an additional method needs to be used. First, we consider analyzing data on one individual. Fix the ith individ- ual and let Zi(·) denote its trajectory. Let tij denote Zi’s jth observational point and let zij denote its observed value at that time. Data collected on this individual are the tij ’s and the zij ’s, j = 1, . . . , ni. The dependence of zij on tij through the function Zi is written as zij = Zi(tij) + ij, j = 1, . . . , ni. (1.1) where ij represents an error, either measurement error or modelling error, 2 Chapter 1. Introduction with mean zero. Estimating Zi from the zij’s is often refered to as nonpara- metric regression, since the function Zi is only assumed to be “smooth”. One may view zij as the perturbed Zi at time tij. One of the popular methods to estimate Zi is local polynomial smooth- ing. See Fan and Gijbels (1996) for details. Let K be a symmetric positive function with K(t) decreasing as |t| increases. Let h > 0 be the smoothing parameter which is usually refered to as the “bandwidth”. To estimate Zi(t) by a local polynomial of degree p, we minimize n∑ j=1 [zj − β0 − β1(tj − t)− . . .− βp(tj − t)p]2K ( tj − t h ) to obtain (β̂0, β̂1, . . . , β̂p). The estimate of Zi(t) is then Ẑi(t) = β̂0. Com- pared with other smoothing methods, one advantage of local polynomial smoothing is its simpler theoretical analysis which has allowed greater in- sight into its asymptotic properties. Another popular smoothing method uses spline basis functions to model the function Zi. A spline function with degree q is defined on an interval. One first divides the interval into subintervals by breakpoints, called knots. On each subinterval, the spline function is a polynomial of degree q. On the entire interval, the function has q−1 continous derivatives. For a discussion of spline functions, please see Ramsay and Silverman (2005) p. 47. A popular choice of spline basis functions is the B-spline basis. In general, let φ1,. . .,φK be the spline basis functions and assume that Zi can be modelled as Zi(t) ∼ K∑ k=1 xikφk(t) ≡ φ(t)′xi, xi ≡ (xi1, . . . , xiK)′. (1.2) It thus follows from (1.1) that zij = K∑ k=1 xikφk(tij) + ij ≡ φ(tij)′xi + ij , j = 1, . . . , ni. (1.3) 3 Chapter 1. Introduction One way to estimate the unknown coefficient vector xi is via minimizing ni∑ j=1 [ zij − φ(tij)′xi ]2 . (1.4) To use the spline basis functions, one must specify the number and the location of “knots”. This is not a simple task. Improper choice of knots may result in overfitting or underfitting the data. One strategy to overcome this problem is to use many knots, but to penalize to prevent overfitting. In this approach, called penalized splines, we minimize a penalized version of (1.4) to estimate xi. That is, we minimize ni∑ j=1 [ zij − φ(tij)′xi ]2 + λx′iPxi, (1.5) where λ is a smoothing parameter and P is a penalty matrix. For examples of P and how to choose λ using the method of cross validation, please see Ramsay and Silverman (2005) p. 94. Interestingly, there is a close connection between the penalized smooth- ing estimate of Zi(t) using basis functions and the best linear unbiased pre- dictor (BLUP) of Zi(t) in a random effects model framework. Intuitively, consider the vector coefficient xi in (1.3) being random and having a covari- ance matrix Σx. Under a normality assumption of the xi and the ij ’s, the log likelihood function of the zij ’s and xi can be written in a form similar to (1.5), up to constant. The smoothing parameter λ is related to the variance of ij and the penalty matrix P is in the same position as the inverse of Σx. For more discussion on the BLUP, see Robinson (1991). The random effects view of penalized splines is particularly useful when analyzing data from N individuals: the zij’s, j = 1, . . . , ni, i = 1, . . . , N . We model each zij as in (1.3), with the random effects xi’s assumed indepen- dently and identically distributed with covariance matrix Σx. The random effects induce a covariance structure on the zij ’s that provides a reasonable, yet flexible model for within subject dependence. The parameter Σx is to be estimated using all the zij ’s. Standard methods to fit random effects or 4 Chapter 1. Introduction mixed models (containing both random and fixed effects) can thus be used. For a reference, see McCulloch and Searle (2001). Rice and Wu (2001) use a B-spline basis in (1.3) and use the EM algorithm to estimate the model parameters. The authors also calculate the best linear unbiased predictor of each individual trajectory. 1.3 Functional regression models Some ideas and methods in functional data analysis can be viewed as exten- sions of those in classical statistics. One of the extensions is the functional regression model with a scalar response and a functional predictor. This model can be used to study the dependence of the averaged wheel running on the body mass trajectory. In this section, we introduce the functional regression model and sketch the methods to study this model. Among the rich literature on this topic, we select those papers most relevant to our approach. Let Yi be the averaged wheel running of the ith individual. Referring to (1.1), we view Zi as the predictor while zij is the perturbed value of Zi at time tij. Data collected on N individuals are Yi and zij, j = 1, . . . , ni, i = 1, . . . , N . The functional regression model is Yi = β0 + ∫ β(t)Zi(t)dt+ ei. (1.6) The unknown functional coefficient β(·) measures the dependence of the scalar response Yi on the predictor Zi and is of primary interest. Usually it is assumed that the function β is smooth. The functional regression model (1.6) is loosely related to ordinary mul- tiple regression in that the integral replaces the summation. The ordinary multiple linear regression model for a fixed individual uses “covariates” Zi(ti1), . . . , Zi(tini). and models Yi as Yi = β0 + ∑ j βjZi(tij) + ei. 5 Chapter 1. Introduction In our case, where the Zi(tij)’s are unobserved, we are essentially carrying out multiple regression with covariates measured with error. Note that the βj ’s do not depend on i. Thus, to use all individuals in this model, the Zi’s have to be measured at the same time points to obtain the zij ’s, i.e. tij = tj and ni = n. When individuals are measured at different sets of time points or have different number of observations, it is not clear how to apply the multiple regression model. Another problem with multiple regression is that for large values of n, the high-dimensional vector of parameters, β, may be hard to estimate. The functional model (1.6) can easily accommodate the case when dif- ferent time points are observed for different individuals. And exploiting the continuity of β and the Zi’s allows us to reduce the number of parameters used. In the literature, there are several approaches to calculate the integral∫ β(t)Zi(t)dt and to estimate β. In the remainder of this section, we give a summary of some of these approaches and focus on those closest to ours. One approach uses a discretized version of the integral and estimates β by minimizing a penalized version of the residual sum of squares. Cardot, Crambes, Kneip and Sarda (2007) assumed that data were observed at the same n equally spaced time points, i.e. ni ≡ n, tij = tj and tj − tj−1 = 1/n for all j = 2, . . . , n. Cardot et al. considered model (1.6) but without the intercept β0, i.e. the model Yi = ∫ β(t)Zi(t)dt+ ei. The authors discretized the integral ∫ β(t)Zi(t)dt using a grid of t values equal to the tj’s and wrote Yi = 1 n n∑ j=1 β(tj)Zi(tj) + ei. Let Zi = (Zi1, . . . , Zin) ′, zi = (zi1, . . . , zin) ′ and β = (β(t1), . . . , β(tn)) ′. The authors used a Total Least Squares method to estimate β and predict Zi as 6 Chapter 1. Introduction the solutions of the minimization problem min β∈<n,Zi∈<n { 1 N N∑ i=1 [( Yi − 1 n Z′iβ )2 + 1 n ||zi − Zi||2 ] + λ n β′Pβ } , where λ is a smoothing parameter and P is a penalty matrix. Unfortunately, it is not clear how to apply this approach directly when individuals are measured at different sets of time points. Another approach uses basis functions to approximate the Zi’s and β. In this basis expansion approach, individuals can be measured at different time points and they don’t need to have the same number of observations. This approach approximates the Zi’s as in (1.2) and similarly approximates β using basis functions, ψ1, . . . , ψJ , as β(t) = J∑ j=1 βjψj(t) ≡ β′ψ(t), β = (β1, . . . , βJ)′. (1.7) An advantage of expansions (1.2) and (1.7) is that they reduce the infinite dimensional functional uncertainty into finite dimensional unknowns which are conveniently handled by vectors. Estimating β(·) thus reduces to esti- mating the coefficient vector β. We also notice that using (1.2) and (1.7), the integral ∫ β(t)Zi(t)dt in (1.6) can be written as∫ β(t)Zi(t)dt = β ′Txi, where T[j, k] = ∫ ψj(t)φk(t)dt. (1.8) In this approach, the xi’s in (1.2) can be assumed to be fixed or random. When the xi’s are assumed random, they are usually modelled as described at the end of Section 1.2. That is, they are assumed independently and identically distributed with a normal distribution with covariance matrix Σx. This assumption contributes to the construction of the likelihood function of the data, the (Yi, zij)’s. The vector β is estimated by maximizing the log likelihood or a penalized log likelihood. In the following, we give brief summaries of three statistical papers and a book chapter that use a basis expansion approach. In expansions (1.2) and 7 Chapter 1. Introduction (1.7), the φk’s and ψj’s are usually chosen to be splines, although this is not necessary. For instance, the φk’s can be equal to the principal component functions of the Zi’s. James (2002), James and Silverman (2005) and Müller (2005) considered more general models. In the following, we describe their approaches with an emphasis on studying model (1.6). James (2002) used φk’s in (1.3) equal to a basis for natural cubic splines and also considered ψ = φ in (1.7). The Zi’s were subject to error, and so the data were the perturbed zij ’s, given in (1.3). The xi’s were multivariate normal and β was estimated by maximizing the log likelihood. With xi’s as the missing data and the {(zij , j = 1, . . . , ni), Yi}’s as the observed data, the author used the EM algorithm to estimate unknown parameters. James and Silverman (2005) used φk’s in (1.3) equal to the estimated principal component functions of the Zij ’s and used ψj ’s equal to cubic spline basis functions. The coefficient vector’s, the xi’s, were taken to be fixed. When the Zi’s were observed without error, the authors estimated β by maximizing the log likelihood of the (Yi, Zij)’s subject to a penalty term which penalized β’s that have ∫ β(t)f(t)dt big when Var( ∫ Z(t)f(t)dt) is small. When the zij ’s were observed, the log likelihood was modified to include the measurement error and a penalty term was added to ensure the smooth fits of the Zi’s. Müller (2005) used the idea of Karhunen-Loève expansion of the Zi pro- cess to estimate the φk’s and the corresponding xi’s. In this approach, the φk’s in (1.3) are equal to the first K estimated eigenfunctions of the Zi process. The xi’s are random and have diagonal covariance matrix, and Müller assumes that the estimated xi’s have diagonal covariance matrix too. Müller’s method of estimating can be separated into two parts where the xi’s are estimated in the first part and β is estimated in the second part. A detailed description of Müller’s method is provided in Chapter 3. Ramsay and Silverman (2005, Chapter 15) first calculated Ẑi(·) by fitting the zij ’s using least squares and a fairly large number of basis functions. They then estimated β by minimizing a penalized residual sum of squares 8 Chapter 1. Introduction which was defined as N∑ i=1 [ Yi − β0 − ∫ β(t)Zi(t)dt ]2 + λβ′Pβ. This method is implemented in the R library fda (Ramsay, Wickham and Graves, 2007). 1.4 Introduction to the thesis work In (1.3), when the xi’s are random, the distribution of the observed vector (zi1, . . . , zini) will not, in general, be identifiable. This concern motivates our work in Chapter 2. Assuming the xi’s random introduces an additional parameter Σx into the model. This additional parameter complicates the covariance matrix of the zij’s and the variance of the Yi’s. Possibly, the covariance/variance parameters of the model are not identifiable. In Chapter 2, we study identifiability in normal linear mixed effects mod- els. We derive necessary and sufficient conditions of identifiability, including conditions of identifiability for the scalar response and functional predictor mixed effects model. In Chapter 3 of the thesis, we study model (1.6) assuming the observed Zi’s are contaminated with measurement error. We use principal component functions to expand Zi in (1.2) and use the same functions to expand β in (1.7). This choice of basis functions can fit the Zi’s well with K fairly small. An added benefit of this eigenfunction approach is that we focus on “estimable directions” of β. We assume that xi in (1.3) is a normal random vector with a general covariance matrix Σx. In Chapter 3, we implement the ECME (Expectation/Conditional Max- imization Either) algorithm of Liu and Rubin (1994) to estimate the model parameters. We consider Hessian matrix based and boot-strapped standard errors of the estimate of β. We propose a new integrated t-statistic for hy- pothesis testing involving β(·). Expressions of the residuals of the model fit are derived and we discuss model diagnostics based on the analysis of residuals. 9 Chapter 1. Introduction In Chapter 4, we apply our model to analyze a biological data set, pro- viding a detailed biological background. Chapter 5 summarizes the thesis work and discusses possible future re- search directions. 10 Bibliography [1] Cardot, H., Crambes, C., Kneip, A. and Sarda, P. (2007) Smooth- ing splines estimators in functional linear regression with errors-in- variables. Computational Statistics & Data Analysis 51, 10, 4832-4848. [2] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432. [3] James, G. M. and Silverman, B. W. (2005) Functional adaptive model estimation. Journal of the American Statistical Association 100, 470, 565-576. [4] Fan, J., and Gijbels, I. (1996) Local Polynomial Modelling and Its Ap- plications. London: Chapman & Hall. [5] Liu, C. H. and Rubin, D. B. (1994) The ECME Algorithm - a simple ex- tension of EM and ECM with faster monotone convergence. Biometrika 81 4, 633-648. [6] McCulloch, C. E. and Searle, S. R. (2001) Generalized, Linear and Mixed Models. New York: Wiley. [7] Morgan, T.J., T. Garland, Jr., and Carter, P. A. (2003) Ontogenetic trajectories in mice selected for high wheel- running activity. I. Mean ontogenetic trajectories. Evolution 57, 646-657. [8] Müller, H. G. (2005) Functional modeling and classification of longitu- dinal data. Scandinavian Journal Statistics 32, 223-240. [9] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. 2nd edition, Springer Series in Statistics. 11 Bibliography [10] Ramsay, J. O., Wickham, H. and Graves, S. (2007) R fda Library. [11] Rice, J. A. and Wu, C. O. (2001) Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253-259. [12] Robinson, G.K. (1991) That BLUP is a good thing: the estimation of random effects. Statistical Science 6, 15-32. [13] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003) Semiparametric Regression. Cambridge University Press. 12 Chapter 2 Identifiability discussion of linear mixed effects models 2.1 Introduction Mixed effects models have moved beyond the original simple models, becom- ing more complicated, containing more parameters. However, two different sets of parameters producing the same covariance matrix for the observa- tions may cause problems for parameter estimation and for inference. In principle, parameter identifiability is the first thing that should be verified when building a model. As Demidenko (2004, p.118) states, “identifiabil- ity may be viewed as a necessary property for the adequacy of a statistical model”. We study identifiability in normal linear mixed effects models. In the next section, we define the classical mixed effects model and show that in an unrestricted form and a specific restricted form, the model is not identifi- able. Sections 2.3 and 2.4 give sufficient conditions to identify parameters in various models. In Section 2.5 we discuss identifiability of extended models. 2.2 Motivation In this section, we give the definition of nonidentifiability. Then we introduce the general unrestricted classical linear mixed effects model and give two models that are nonidentifiable. Definition 2.2.1 Let y be the vector of observable random variables with 0A version of this chapter will be submitted for publication. Author: Wang, W. 13 Chapter 2. Identifiability discussion of linear mixed effects models distribution function Pθ where θ is in the parameter space Θ. This proba- bility model for y is not identifiable if and only if there exist θ,θ∗ ∈ Θ, with θ 6= θ∗ and Pθ = Pθ∗. Throughout the paper, we assume all random variables follow normal dis- tributions. As the normal distribution is uniquely characterized by its first two moments, identifiability of a normal distribution function then reduces to the identifiability of its mean vector and covariance matrix (Demidenko, 2004, Proposition 10, p.118). In the standard linear mixed effects model, y is the observable random vector of length n and X and Z are known, non-random design matrices with dimensions n × p, n > p and n × q, n > q respectively. We assume throughout that both X and Z have full column rank. Then y = Xβ + Zu + , u ∼ N(0,Σu), ∼ N(0,Σ), u independent of . (2.1) The random effects vector u and the error vector are unobservable. This model has been studied and applied by, for instance, McCulloch and Searle (2001). Unknown parameters in the model are (β,θ), where β ∈ B ⊆ <p, θ = (Σ,Σu) ∈ Θ̃ ⊆ Θ = the set of all (Σ,Σu) with Σ, n × n, and Σu, q × q, both symmetric and positive definite. Throughout the paper, we assume that β and θ do not have common elements, i.e. we assume that (β,θ) ∈ B ⊗ Θ̃. We also assume that Σ and Σu do not have common elements. That is, we sometimes assume that Θ̃ = Θ̃ ⊗ Θ̃u where Θ̃ ⊆ Θ and Θ̃u ⊆ Θu, and Θ contains all n × n positive definite symmetric matrices and Θu contains all q×q positive definite symmetric matrices. We take as our operating definition of identifiability the ability to identify the parameters β, Σ and Σu. The linear mixed effects model has become popular in the analysis of longitudinal and functional data. For instance, the response of individual i at time t can be modelled as ∑ αjφj(t) + ∑ uikψk(t) plus error, where 14 Chapter 2. Identifiability discussion of linear mixed effects models the αj ’s are fixed population effects and the uik’s are individual-specific random effects. The inclusion of random effects allows us to realistically model covariance within an individual. Taking φj ’s and ψk’s equal to B- splines allows us to flexibly model a wide variety of responses. In addition, if we allow some of the αj ’s to be random but with a very specific covariance structure, then the resulting best linear unbiased predictors of the αj ’s are computationally equivalent to the parameter estimates in a smoothing spline fit or a penalized B-spline fit. See Verbyla (1999) or Ruppert, Wand and Carroll (2003). The parameter vector β is always identifiable since Ey = Xβ and X is of full column rank. So to study identifiability in the normal model, it suffices to study the covariance matrix of y, Σy = ZΣuZ ′+Σ, to see when Σ and Σu are identifiable. Thus, nonidentifiability of model (2.1) with parameter space B⊗ Θ̃ is equivalent to the existence of θ,θ∗ ∈ Θ̃, with θ 6= θ∗ giving the same Σy, i.e. ZΣuZ ′ + Σ = ZΣ ∗ uZ ′ + Σ ∗. (2.2) In Example 2.2.1 below, we show that the unrestricted model where Θ̃ = Θ is nonidentifiable. A similar argument shows that the restricted model in Example 2.2.2 below is not identifiable. Example 2.2.2 uses the covariance structure assumed for the random effects in the penalized Bspline method. Thus, when using penalized Bsplines, one cannot assume a general form for Σ. Example 2.2.1 Suppose we place no further restrictions on Θ. Then the model is not identifiable. To see this, let (Σ,Σu) ∈ Θ and 0 < a < 1. Then Σ∗u = (1 − a)Σu and Σ∗ = Σ + aZΣuZ′. So Σ∗ and Σ∗u satisfy (2.2) and (Σ ∗,Σ∗u) ∈ Θ. Example 2.2.2 Suppose that Θ = Θ ⊗ Θ̃u where Θ̃u contains all matrices of the form σ2R where R is positive definite and known, and σ2 > 0. To see that this 15 Chapter 2. Identifiability discussion of linear mixed effects models model is not identifiable, use the same argument as in Example 2.2.1 and note that the constructed Σ∗u is in Θ̃u if Σu is in Θ̃u. In practice, one usually assumes a more specific structure for Σ, such as Σ = σ 2 I. Restrictions may lead to identifiability, and such restrictions and their effects on identifiability will be discussed in the next two sections. 2.3 Simple sufficient conditions of identifiability In this section, we find sufficient conditions of identifiability of model (2.1) assuming Θ̃ = Θ. A further examination of (2.2) gives us the following sufficient conditions. Clearly, if Σu is known, then ZΣuZ ′ is known, and so Σ is completely determined. If Σ is known, then the model is identifiable. To see this, consider (2.2) with Σ = Σ ∗. It follows that ZΣuZ ′ = ZΣ∗uZ ′ and so Σu = Σ ∗ u since Z is of full column rank. If ZΣuZ ′Σ −1 = K, where K is known and K+ I is of full column rank, then the model is identifiable. Suppose by way of contradiction the model is not identifiable. Then (2.2) holds for (Σu,Σ) 6= (Σ∗u,Σ∗) both in Θ̃ with, by assumption, ZΣuZ ′ = KΣ and ZΣ ∗ uZ ′ = KΣ ∗. Substituting these expressions into (2.2) yields KΣ + Σ = KΣ ∗ + Σ ∗, that is, (K + I)(Σ −Σ∗) = 0. Since K + I is of full rank, we must have Σ = Σ ∗. But, as shown in the previous paragraph, this implies that Σu = Σ ∗ u. The last condition is similar to a common condition for identifiabililty in simple linear regression models with measurement errors. The model assumes yi = β0 + β1xi + i, i ∼ (0, σ2 ), 16 Chapter 2. Identifiability discussion of linear mixed effects models where xi is observed with error having variance σ 2 u. The response yi then has variance σ2u +σ 2 . One of the common conditions of model identifiability is to assume the ratio σ2u/σ 2 is known. The inverse Σ −1 appearing in our last condition could be viewed as multivariate version of “denominator”. If there are any supplementary data, we may then be able to find an estimate of Σu, Σ or K and we can treat this estimate as the true value. The sufficient conditions for identifiability can then be satisfied. 2.4 Sufficient conditions of identifiability for a structured Σ As we observed from Examples 2.2.1 and 2.2.2, the model is not identifiable even if we restrict Σu to be a scalar multiple of a known matrix. In this section, we study the effect of putting restrictions on Σ. In Theorem 2.4.1 below, we give a necessary and sufficient condition of nonidentifiability, a condition that relies mainly on the design matrix Z via HZ = Z(Z ′Z)−1Z′. The theorem leads to four corollaries: Corollaries 2.4.1 and 2.4.2 give nec- essary and sufficient conditions for identifiability when Σ arises from an exchangeable covariance structure or is diagonal. Corollary 2.4.3 states an easily checked condition on Σ that guarantees identifiability of the model. That corollary is then applied to two commonly used error structures. Us- ing Corollary 2.4.4, we can generalize a known identifiability result, giving a shorter proof under weaker conditions. Theorem 2.4.1 Let Θ̃ ⊆ Θ and define HZ = Z(Z′Z)−1Z′. Then model (2.1) with parameter space B⊗ Θ̃ is nonidentifiable if and only if there exist (Σ,Σu) ∈ Θ̃ and (Σ∗,Σ∗u) ∈ Θ̃, with Σ∗ 6= Σ such that HZ [Σ −Σ∗] = Σ −Σ∗, (2.3) and Σ∗u = Σu + (Z ′Z)−1Z′ [Σ −Σ∗]Z(Z′Z)−1. (2.4) 17 Chapter 2. Identifiability discussion of linear mixed effects models Proof : Nonidentifiability of the model is equivalent to the existence of (Σ,Σu) and (Σ ∗,Σ∗u) in Θ̃, not equal, satisfying (2.2). Note that this is equivalent to having (Σ,Σu) and (Σ ∗,Σ∗u) in Θ̃ with Σ ∗ 6= Σ satisfying Z(Σu −Σ∗u)Z′ = Σ∗ −Σ. (2.5) Suppose the model is nonidentifiable. We premultiply (2.5) by Z′, post- multiply it by Z and then pre- and postmultiply by (Z′Z)−1 to get Σu −Σ∗u = (Z′Z)−1Z′ [Σ∗ −Σ]Z(Z′Z)−1. (2.6) This gives (2.4). To derive (2.3), premultiply (2.6) by Z, postmultiply (2.6) by Z′ to get Z(Σu −Σ∗u)Z′ = HZ [Σ∗ −Σ]HZ (2.7) which, by (2.5), is the same as Σ −Σ∗ = HZ [Σ −Σ∗]HZ. (2.8) Premultiplying (2.8) by the idempotent matrix HZ gives HZ [Σ −Σ∗] = HZ [Σ −Σ∗]HZ. Substituting (2.8) into the right side of the above yields (2.3). To prove the converse, we want to show that (2.3) and (2.4) lead to (2.5). It is clear from (2.4) that (2.7) holds. If we can show that (2.8) holds then we are done since substituting (2.8) into the right side of (2.7) yields (2.5). To show (2.8), from (2.3) and the symmetry of Σ −Σ∗, we see that HZ[Σ −Σ∗] = [Σ −Σ∗]HZ. Premultiplying the above identity by the idempotent matrix HZ gives HZ[Σ− Σ ∗] = HZ[Σ−Σ∗]HZ. Substituting (2.3) for the left side of the equation, we see that (2.8) holds. 2 18 Chapter 2. Identifiability discussion of linear mixed effects models The proofs of the next two corollaries are in Appendix A. Corollary 2.4.1 Let 1 be an n-vector with each element being one. Suppose that the distribution of 1, . . . , n is exchangeable, that is, the covariance matrix of is of the form σ2 [(1− ρ)I + ρJ] where J = 11′. Let Θ̃ = {Σ = σ2 [(1− ρ)I + ρJ] , σ2 > 0, −1/(n − 1) < ρ < 1}. Suppose the matrix Z satisfies 1′Z 6= 0 and rank(Z) = q with 1 ≤ q < n − 1. Suppose the parameter space is B⊗ Θ̃⊗Θu. Then model (2.1) is identifiable if and only if HZJ 6= J. Comments. The condition HZJ = J means the sum of the elements of each row of HZ is equal to one, and this is an easy condition to check. For the case that q = 1, i.e. Z is a column vector (z1, . . . , zn) ′ where ∑ zi 6= 0, HZ = z21 s2z z1z2 s2z · · · z1zn s2z ... ... znz1 s2z znz2 s2z · · · z2ns2z , where s2z = ∑ z2i . The model is identifiable if and only if Z is not a constant vector. When q = 2, suppose we have the usual simple linear regression model with centered covariates: Z = 1 z1 ... ... 1 zn , with ∑ zi = 0. (2.9) Then HZ = 1 n + z21 s2z 1 n + z1z2 s2z · · · 1n + z1zns2z ... ... 1 n + znz1 s2z 1 n + znz2 s2z · · · 1n + z 2 n s2z . and each row of HZ sums to one. Thus unfortunately, the model is not iden- tifiable under this Z combined with the exchangable covariance structure. Corollary 2.4.2 Suppose that Θ̃ equals the collection of all diagonal pos- itive definite n × n matrices. Then model (2.1) with parameter space B ⊗ 19 Chapter 2. Identifiability discussion of linear mixed effects models Θ̃ ⊗Θu is identifiable if and only none of the diagonal elements of HZ is equal to one. Comments. Again, the condition on HZ is easy to check. Consider the case q = 1. As we have seen, diagonal elements of HZ equal z 2 i / ∑ z2j , i = 1, . . . , n. Therefore, the model is identifiable if and only if Z does not have n− 1 zero elements. Consider q = 2 with Z as in (2.9). The model is identifiable provided, for all i, (1/n)+z2i / ∑ z2j doesn’t equal 1. So typically, the model is identifiable. The following corollary provides a sufficient condition for identifiability, a condition that can sometimes be easily checked. Consider (2.3). Note that the rank of HZ(Σ − Σ∗) is at most q, since the rank of HZ is q. Thus, for (2.3) to hold, we must be able to find some Σ and Σ ∗ with the rank of Σ −Σ∗ less than or equal to q. This proves the following. Corollary 2.4.3 Suppose that Θ̃ ⊆ Θ. Then model (2.1) with parameter space B × Θ̃ is identifiable if rank(Σ − Σ∗) > q for all Σ, Σ∗ in the parameter space. Now we apply Corollary 2.4.3 to show model identifiability under the “multiple of a known positive definite matrix” and the “MA(1)” covariance structures respectively in next two examples. Example 2.4.1 Multiple of a known positive definite matrix Fix R, symmetric and positive definite, and suppose (Σ,Σu) ∈ Θ̃ implies that Σ = σ 2R, some σ2 > 0. Consider Σ = σ 2R and Σ ∗ = σ∗2R, σ∗2 6= σ2. Clearly Σ−Σ∗ = (σ2−σ∗2)R is invertible, and so is of rank n, which we have assumed is greater than q. Thus, the model is identifiable. To show the model in Example 2.4.2 below is identifiable, we need the following lemma which is a result in (Graybill, 1983, p.285) Lemma 2.4.1 Let T be the n × n Toeplitz matrix with ones on the two parallel subdiagonals and zeroes elsewhere. Given two scalars a0 and a1, the 20 Chapter 2. Identifiability discussion of linear mixed effects models eigenvalues of the n× n matrix C = a0I + a1T are µi = a0 + 2|a1| cos ipi n+ 1 , i = 1, . . . , n. Example 2.4.2 MA(1) Suppose that n−1 > q. Let the components of have the MA(1) covariance structure, i.e. of the form σ2(I + ρT). Let Θ̃ = {Σ = σ2(I + ρT), σ2 > 0, |ρ| < 1/2} and suppose (Σ,Σu) ∈ Θ̃ implies that Σ ∈ Θ̃. Let Σ and Σ ∗ ∈ Θ̃. By Lemma 2.4.1, the eigenvalues of the difference matrix Σ −Σ∗ = (σ2 − σ∗2)I + (σ2ρ− σ∗2ρ∗)T are λi = (σ 2 − σ∗2) + 2 ∣∣∣σ2ρ− σ∗2ρ∗∣∣∣ cos ipi n+ 1 , i = 1, . . . , n. Given any (σ2, ρ) and (σ∗2, ρ∗), with (σ2, ρ) 6= (σ∗2, ρ∗), the number of zero λi’s is at most one. Hence, the rank of the difference matrix is greater than or equal to n − 1. Therefore, model (2.1) is identifiable under this MA(1) covariance structure. In longitudinal or functional data analysis, usually there are N individ- uals with the ith individual modelled as in (2.1): yi = Xiβ + Ziui + i, ui ∼ N(0,Σu), i ∼ N(0,Σi), (Σu,Σi) ∈ Θu ⊗ Θ̃i, ui and i independent. (2.10) Statistical inference is normally based on the joint model, the model of these N individuals. The following corollary gives sufficient conditions for identifiability of the joint model. The intuition behind the result is that, if we can identify Σu from one individual, then we can identify all of the Σi’s. Corollary 2.4.4 If an individual model (2.10) is identifiable, then the joint model is identifiable. 21 Chapter 2. Identifiability discussion of linear mixed effects models Proof: We notice each individual model (2.10) shares a common parameter, the covariance matrix Σu. If one individual model uniquely determines Σu and its Σi, the identified Σu will then yield identifiability of all the individual Σi’s since, if ZiΣuZ ′ i + i = ZiΣuZ ′ i + ∗ i , clearly i = ∗ i . Therefore, the joint model is identifiable. 2 Corollary 2.4.4 reduces the verification of a joint model’s identifiability to the individuals’. For instance, if the i-th individual model has Zi of full column rank and Σi = σ 2 Ini , where ni is the length of yi, then this individual model is identifiable by Corollary 2.4.1 and thus so is the joint model. Note that the other individual models can still have their Zj’s not of full column rank. Demidenko (2004, Chapters 2 & 3) studies the joint model but assumes the covariance matrix of i is σ 2Ini . Suppose each Zi is of dimension ni× q. Demidenko shows that the joint model is identifiable if at least one matrix Zi is of full column rank and ∑N i=1(ni − q) > 0. Using our argument in the previous paragraph, the condition ∑N i=1(ni − q) > 0 can be dropped. Furthermore, our result can be applied to more general Σ’s. 2.5 Extensions In this section, we discuss identifiability of a model in functional regression for a functional predictor y(·) and a scalar response w. We derive a necessary and sufficient condition of nonidentifiability for this model. We model y(t) as ∑ j αjφj(t) + ∑ k ukψk(t) plus error, with the φj ’s and ψk’s known, the αj’s unknown and the uk’s unknown and random. The dependence of the response w on y is modelled through an unknown functional coefficient, β: w = β0+ ∫ β(t) [y(t)−E(y(t))] dt+η where η is mean 0 normal noise. Thus, for appropriately defined ρ and with u = (u1, . . . , uq) ′, we can write w = β0 + ρ ′u + η, β0 ∈ <, ρ ∈ <q unknown , η ∼ (0, σ2η) independent of u. (2.11) 22 Chapter 2. Identifiability discussion of linear mixed effects models The predictor y is observed at a sequence of discretized points and the observed values are contained in the vector y, which then follows model (2.1). James (2002), Müller (2005) and Heckman and Wang (2007) consider this modelling of y and w and propose different approaches to estimate the functional coefficient, β. We assume model (2.1) and (2.11). For our purpose of identifiability discussion here, we consider the unknown parameters to be (β0,β) and θ = (Σ,Σu, σ 2 η ,ρ). We suppose that (β0,β) ∈ B ⊆ < ⊗ <p and that θ ∈ Θ̃ ⊆ Θ = Θ⊗Θu⊗<+⊗<q where Θ and Θu are as before and <+ is the set of positive real numbers. To study identifiability, we must study the distribution of the random vector (y′, w). We see that E(y) = Xβ, E(w) = β0 and (y ′, w)′ has covari- ance matrix Σ = ( ZΣuZ ′ + Σ ZΣuρ ρ′ΣuZ ′ ρ′Σuρ+ σ 2 η ) . We know the parameter β is identifiable if the matrix X is of full column rank. The identifiability of β0 is also clear. So we focus on identifying the covariance parameters θ. Our discussion in Section 2.2 suggests the unrestricted model won’t be identifiable. In fact, we can construct an example following Example 2.2.1 to show the existence of nonidentical θ and θ∗ both in Θ such that (2.2) holds and ZΣuρ = ZΣ ∗ uρ ∗, (2.12) ρ′Σuρ+ σ 2 η = ρ ∗′Σ∗uρ ∗ + σ∗η 2. (2.13) Example 2.5.1 Example 2.2.1 continued. Let 0 < a < 1, Σ∗u and Σ ∗ be as in Example 2.2.1, and let ρ∗ = ρ/(1 − a) and σ∗η 2 = σ2η − aρ′Σuρ/(1 − a). It is not hard to see (2.12) and (2.13) are satisfied. If, in addition, we restrict a < σ2η/(σ 2 η + ρ ′Σuρ), we see that σ ∗ η 2 is positive. The following theorem gives a necessary and sufficient condition of non- 23 Chapter 2. Identifiability discussion of linear mixed effects models identifiability. Theorem 2.5.1 Let Θ̃ ⊆ Θ, let Θ̃ = Θ̃ ⊗Θu ⊗ <+ ⊗ <q. and define HZ = Z(Z ′Z)−1Z′. Then model (2.1) and (2.11) with parameter space B⊗Θ̃ is nonidentifiable if and only if there exist (Σ,Σu, σ 2 η ,ρ) and (Σ ∗,Σ∗u, σ ∗ η 2,ρ∗) both in Θ̃, with Σ ∗ 6= Σ such that the following hold (a) HZ [Σ −Σ∗] = Σ −Σ∗, (b) Σ∗u = Σu + (Z ′Z)−1Z′ [Σ −Σ∗]Z(Z′Z)−1, (c) ρ∗ = Σ∗u −1 Σuρ, (d) σ∗η 2 = σ2η + ρ ′Σu(Σ −1 u −Σ∗u−1)Σuρ. Proof: Nonidentifiability of the model is equivalent to the existence of nonidentical θ and θ∗ in Θ̃, satisfying (2.2), (2.12) and (2.13). First suppose that (2.2), (2.12) and (2.13) hold for some θ 6= θ∗. By the argument given in Theorem 2.4.1, (2.2) implies that (a) and (b) hold. Since Z is of full column rank, (2.12) yields (c). Substituting (c) into (2.13) yields (d). We easily see that, if θ 6= θ∗ and if (a) through (d) hold, then Σ 6= Σ∗. Now suppose that conditions (a) through (d) hold for some Σ 6= Σ∗. Again, by the argument given in Theorem 2.4.1, (2.2) holds. We easily see that (2.12) and (2.13) also hold. 2 24 Bibliography [1] Demidenko, E. (2004) Mixed Models: Theory and Applications, John Wiley & Sons. [2] Graybill, F. A. (1983) Matrices with Applications in Statistics, Second Edition, Wadsorth, Inc. [3] Heckman, N. and Wang, W. (2007) Linear mixed models for measure- ment error in functional regression. manuscript, Department of Statis- tics, University of British Columbia. [4] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432. [5] McCulloch, C. & Searle, S. R. (2001) Generalized, Linear and Mixed Models, John Wiley & Sons. [6] Müller, H.G. (2005) Functional modeling and classification of longitu- dinal data. Scandinavian Journal of Statistics 32, 223-240. [7] Ruppert, D., Wand, M.P. & Carroll, R.J. (2003). Semiparametric Re- gression, Cambridge University Press. [8] Verbyla, A. P., Cullis, B. R., Kenward, M. G., & Welham, S. J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines. Applied Statistics 48, 269-311. 25 Chapter 3 Linear mixed models for measurement error in functional regression 3.1 Introduction Regression models with a functional predictor Z(·) and a scalar response Y have been extensively studied (see, eg, Ramsay and Silverman, 2005, and references therein). For individual i, the dependence of Yi on Zi is modelled as Yi = β0 + ∫ b a β(t) [ Zi(t)− E(Zi(t)) ] dt+ ei. (3.1) The goal is to estimate β. Data are collected from N independent individuals, with data on in- dividual i, i = 1, . . . , N , being Yi, Zij ≡ Zi(tij), j = 1, . . . , ni. If the Zi processes are observed with error, then our data on individual i are Yi and zij , j = 1, . . . , ni, with zij = Zi(tij) + ij, Cov(i1, . . . , ini) = Σi . (3.2) Here, we consider data where the Zij’s are observed with error which could be measurement error or modelling error, and we model the function Zi using random effects with a set of basis functions φ1, . . . , φK . In our 0A version of this chapter will be submitted for publication. Authors: Heckman, N. and Wang, W. 26 Chapter 3. Linear mixed models for measurement error in functional regression estimation process, we approximate Zi(·) as Zi(t) = µ(t) + K∑ k=1 xikφk(t), (3.3) where µ is an arbitrary function and xi ≡ (xi1, . . . , xiK)′ are independent and normally distributed with mean vector 0 and covariance matrix Σx. This approach was taken by James (2002), Müller (2005) and James and Silverman (2005) for data with and without measurement error. James used φk’s equal to a basis for natural cubic splines and also used this ba- sis for modelling µ. James’s approach is similar to that described in Sec- tion 3.3. However, we implement a faster algorithm, the ECME (Expecta- tion/Conditional Maximization Either) algorithm in Liu and Rubin (1994), and consider Hessian matrix based and boot-strapped standard errors. Müller used φk’s equal to the first K estimated eigenfunctions of the Zi process. Müller’s approach can be separated into two parts: the first part uses only the zij ’s to determine µ, the φk’s and the xij ’s. The second part incorporates the Yi’s to estimate β0 and β. The first part uses the PACE method (principal analysis through conditional expectation) of Yao, Müller and Wang (2005). In PACE, Yao et al. smooth the observed zij ’s to obtain an estimate µ̂ of µ and then centre the data by subtracting µ̂. Next the authors smooth the centred data to estimate the covariance function of the Zi’s and then estimate the first K eigenfunctions and eigenvalues of the estimated covariance function. Denote these estimates by φ1, . . . , φK and λ̂1, . . . , λ̂K . Let x̂ik be the best linear unbiased estimate of individual i’s kth PC score, x̂ik = E(xik|zi1, . . . , zini), calculated assuming normality, (3.2) with Σi = σ 2I and (3.3) with Σx = diag(λ̂1, . . . , λ̂K). The second part of Müller’s methodology involves regressing Yi on x̂i1, . . . , x̂iK to estimate βk =∫ β(t) φk(t) dt and then setting β̂(t) = ∑ β̂kφk(t). Müller justifies the use of this regression by showing that E(Yi|zij , j = 1, . . . , ni) = β0 + ∑∞ 1 βkxik, for the true PC scores xik calculated with the true eigenfunctions. Müller’s approach has a big computational advantage over James’s (2002), in that it can fit the Zi’s well with K fairly small. An added benefit of 27 Chapter 3. Linear mixed models for measurement error in functional regression this eigenfunction approach is that we focus on “estimable directions” of β. For instance, consider the extreme case where Zi can be written exactly as Zi(t) = µ(t)+ ∑K 1 xikφk(t). So Zi has no variability in directions orthogonal to the φk’s. Since ∫ β(t)[Zi(t) − E(Zi(t))] dt = ∑K 1 xik ∫ β(t) φk(t) dt, we can only hope to estimate ∫ β(t) φk(t) dt, k = 1, . . . ,K. We cannot esti- mate ∫ β(t) f(t) dt for f orthogonal to the φk’s. This issue was noted by James and Silverman (2005) who handled it by adding a penalty term which penalizes β’s that have ∫ β(t)f(t)dt big when Var( ∫ Z(t)f(t)dt) is small. Müller’s approach has the disadvantage that it does not fully use the Yi’s: the xik’s in (3.3) are estimated using only the zij ’s. Clearly, the Yi’s also provide information about the xik’s if there is a relationship between Yi and Zi, that is, if β 6= 0. As James and Silverman (2005) note “It is an interesting feature of this problem that the responses provide additional information in the estimation of the Zi’s”. Also, Müller’s calculation, that E(Yi|zij , j = 1, . . . , ni) = β0+ ∑∞ 1 βkxik does not hold if the eigenvalues or eigenfunctions are incorrect. In particular, the calculation relies on Cov(xij, xik) = 0 for j 6= k. We consider a hybrid approach. Like Müller, we use φ1, . . . , φK equal to the first K estimated eigenfunctions of the Zi process. Thus we not only improve on James’s choice of φk’s but also focus on “estimable directions” of β. We then treat these φk’s as fixed and known in our analysis. We use all of the data, the Yi’s and the zij ’s, to estimate the xik’s and we do not place any restrictions on Σx. Thus we improve on Müller’s procedure, where the xik’s are estimated using only the zij ’s, and Σx is assumed diagonal and is estimated completely by the eigenanalysis of the zij ’s. Our detailed parameter estimation procedure using the ECME algorithm is in Section 3.3. In this work, we also propose test statistics for hypothesis testing of β(·). In Section 3.4.1, we consider testing the nullity of the function β, i.e. testing Ho : β(t) = 0, for all t ∈ [a, b]. In the two sample situation, we consider testing the equality of a selection βs and a control βc, i.e. testing Ho : β s(t) = βc(t), for all t ∈ [a, b]. We propose a new integrated t-statistic and three test statistics that are more standard. In Section 3.5, we derive expressions of the residuals of the model fit and discuss model diagnostics 28 Chapter 3. Linear mixed models for measurement error in functional regression based on the analysis of residuals. In Section 3.6.2, we apply the model to analyze a biological data set. In Section 3.7, via a simulation study, we compare our ECME estimate of β to a modification of Müller’s (2005) two-stage estimate and we also study the performance of the different test statistics. Our detailed calculations to derive the ECME estimates are in Appendix B. 3.2 Notation and Preliminaries Before we fit the model, we introduce some notation and carry out prelim- inary calculations. In this section and the next, we suppose that the φk’s are fixed and known. In practice, however, we will estimate them from an eigenanalysis of the zij ’s. For ease, assume ni ≡ n and tij = tj . Suppose that (3.1), (3.2) and (3.3) above hold. Let Zi = (Zi1, . . . , Zin) ′, µ = (µ(t1), . . . , µ(tn)) ′, i = (i1, . . . , in) ′ represent errors and zi = (zi1, . . . , zin) ′ be the observed values. Write zi = Zi + i ≡ µ+ Axi + i, (3.4) where xi i.i.d.∼ N(0,Σx), Ajk = φk(tj), j = 1, . . . , n, k = 1, . . . ,K, i i.i.d.∼ N(0, σ2 R), R known, symmetric and positive definite, {x1, . . . ,xn} independent of {1, . . . , n}. If R is known, then the model is identifiable (Wang, 2007). However, if σ2R is unknown, then the model is not identifiable (Wang, 2007). Wang also discusses model identifiability under other assumptions on the covariance matrix of i. We choose a basis for β and write β(t) = J∑ j=1 βjψj(t) ≡ β′ψ(t). 29 Chapter 3. Linear mixed models for measurement error in functional regression Typically, we will take J = K and ψj = φj , but will write for the general case. Thus we can write Yi = β0 + β ′Txi + ei, where Tjk = ∫ b a ψj(t) φk(t) dt and ei ∼ N(0, σ2). (3.5) We easily see that zi,xi and Yi are jointly multivariate normal with E(zi) = µ, E(Yi) = β0, E(xi) = 0, and Var(Yi) ≡ σ2Y = β′TΣxT′β + σ2, Cov(zi) ≡ Σz = AΣxA′ + σ2R, Cov(zi, Yi) ≡ Σz,Y = AΣxT′β, Cov ( zi,x ′ i ) ≡ Σz,x = AΣx and Cov ( Yi,x ′ i ) ≡ ΣY,x = β′TΣx. (3.6) Let Wi = (z ′ i, Yi) ′ be the ith subject’s observation and let µW denote E(Wi). Let C = ( A β′T ) (3.7) and Σd be a block diagonal matrix as Σd = diag(σ 2 R, σ 2). (3.8) Then Cov(Wi) ≡ ΣW = CΣxC′ + Σd (3.9) and the log-likelihood of the observed data is ΛN = −N 2 ln det ( CΣxC ′ + Σd )−1 2 N∑ i=1 (Wi−µW )′ ( CΣxC ′ + Σd )−1 (Wi−µW ) up to an additive constant term. Unknown model parameters are θ = (µ, β0,Σx, σ 2 ,β, σ 2). Directly maximizing ΛN over θ does not give us closed forms of the parameter estimates except for µ and β0. So we need to rely 30 Chapter 3. Linear mixed models for measurement error in functional regression on iterative numerical methods to find the estimates. We will elaborate on these methods in the next section. 3.3 Parameter estimation In this section, we use the ECME (Expectation/Conditional Maximization Either) algorithm (Liu and Rubin, 1994) to estimate the model parameters in θ. In this balanced design, the MLEs of µ and β0 are z̄ and Ȳ respectively. So throughout we take µ(t) = µ̂ = z̄ and β (t) 0 = β̂0 = Ȳ and so µ (t) W = W̄, t = 0, 1, . . . . We estimate the other parameters iteratively and sequentially. Given θ(t), the parameter estimates at iteration t, we update one component of θ(t) at a time, holding the other components fixed. We treat (zi, Yi,xi), i = 1, . . . , N, as the complete data. We update σ 2(t) by finding its EM estimate. That is, we find its estimate by maximizing the conditional expected com- plete data log-likelihood function, where we condition on the observed data. The other components of θ(t) are updated by maximizing ΛN directly. Throughout our ECME procedure in Sections 3.3.2-3.3.4, we make the following assumptions. (a) A is of full column rank; (b) T is of full row rank; (c) there exists no u and v such that, for all i = 1, . . . , n, zi = u + v ′xi; (d) there exists no v such, for all i = 1, . . . , n, Yi = Ȳ + v ′(zi − z̄). The restrictions on A and T are easily satisfied. Assumption (b) requires J , the number of the ψj basis functions to be no larger than K, the number of the φk basis functions. Typically, we will take J = K and ψj = φj . Assumptions (c) and (d) are common for data where there is noise. We 31 Chapter 3. Linear mixed models for measurement error in functional regression notice assumption (a) implies that the matrix C defined in (3.7) is also of full column rank. 3.3.1 Initial estimates of parameters other than µ and β0 We choose the initial estimates as follows. We take β(0) to be a vector of zeroes. If this were the true value of β, then we could simply estimate σ2 by the sample variance of the Yi’s. To account for the fact that β may not be zero and thus the sample variance of the Yi’s would overestimate σ 2, we take σ2(0) equal to 0.6 times the sample variance of the Yi’s. The values of Σ(0)x and σ 2(0) are based on the penalized eigenanalysis of the zi’s sample covariance matrix described in Section 3.6.1. These initial estimates are sensible if R is the identity matrix, but can still be used if R is not the identity. Roughly, the eigenanalysis in Section 3.6.1 partitions the variability of the zi’s into two parts: variability from the Zi process and variability from the noise. Let λ1, . . . , λn denote the calculated eigenvalues and suppose the largest K sufficiently describe the variability in the zi’s. So we will use K eigenfunctions. A reasonable first estimate of Σx is Σ (0) x diagonal with entries λ1, . . . , λK . We take σ 2(0) equal to ∑n K+1 λk/(n−K), explaining the remaining variability in the zij ’s. Clearly, under assumptions (a)-(d), we can force σ2(0) and σ 2(0) to be positive and Σ(0)x > 0. Given θ (t), we update Σx, σ 2 , β, and σ 2, as described below. 3.3.2 Updating Σ(t)x We update Σ(t)x by maximizing ΛN over Σx while keeping the other pa- rameters fixed. Let SW = ∑N i=1(Wi − W̄)(Wi − W̄)′/N. We show that if σ2(t) and σ 2(t) are positive and if SW −Σ(t)d > 0, then our update Σ(t+1)x is positive definite and using it in the log likelihood instead of Σ(t)x increases the log likelihood. With detailed derivation in Section B.4, differentiating ΛN with respect 32 Chapter 3. Linear mixed models for measurement error in functional regression to Σx and equating to zero yields the first order condition C′Σ−1W C = C ′Σ−1W SW Σ −1 W C. (3.10) Here, C depends on β(t) and ΣW = CΣxC ′ + Σ (t) d from (3.9). Equation (3.10) holds provided ΣW is invertible at the critical value of Σx. Since we assume that σ2(t) and σ 2(t) are positive, Σ (t) d is positive definite. So ΣW will be invertible provided Σx is non-negative definite. We now solve (3.10) for Σx, first deriving two useful identities, (3.11) and (3.12). For ease, we drop the hats and superscript t’s on the parameter estimates that are being held fixed, that is, on µ̂W , σ 2(t) ,β (t), and σ2(t). Direct multiplication and some manipulation of the left hand side of the following shows that ( C′Σ−1d C ) × [( C′Σ−1d C )−1 + Σx ] C′Σ−1W = C ′Σ−1d . Solving this for CΣ−1W yields C′Σ−1W = [( C′Σ−1d C )−1 + Σx ]−1 ( C′Σ−1d C )−1 C′Σ−1d . (3.11) Postmultiplying both sides of identity (3.11) by C yields C′Σ−1W C = (( C′Σ−1d C )−1 + Σx )−1 . (3.12) Substituting (3.11) into the right side of (3.10) and (3.12) into the left side of (3.10) yields ( C′Σ−1d C )−1 + Σx = F SW F ′, where F = ( C′Σ−1d C )−1 C′Σ−1d . Note that F is of full row rank. Thus, the critical point is Σ̂x = F SW F ′ − ( C′Σ−1d C )−1 = F (SW −Σd)F′, (3.13) which is strictly positive definite. And so, clearly we have ΣW invertible at 33 Chapter 3. Linear mixed models for measurement error in functional regression the critical point. To see that the updated Σ̂x leads to an increase in ΛN , we show that the Hessian matrix, H(Σx) evaluated at Σ̂x is negative definite. The ijth element of H(Σx) is the second order partial derivative of ΛN with respect to the ith and jth elements of the vectorized Σx. From calculations in Section B.4, we have H(Σ̂x) = −(N/2) ( D̂⊗ D̂ ) , where D̂ = C′Σ̂ −1 W C, (3.14) which is clearly negative definite. 3.3.3 Updating σ2(t) We update σ 2(t) , holding all other parameter estimates fixed, using one E- step and one M-step of the EM algorithm. We show that if σ2(t) and σ 2(t) are positive and if Σ(t)x > 0, then our update σ 2(t+1) is positive. Increase of the log likelihood after updating σ 2(t) by σ 2(t+1) is guaranteed by the property of the EM algorithm. Recall (zi, Yi,xi), i = 1, . . . , N , are our complete data and Wi ≡ (z′i, Yi)′, i = 1, . . . , N , are the observed data. In conditional expectations, we let “ · ” stand for the observed data. Abusing notation slightly, we let f denote a generic density function with the exact meaning clear from the arguments. The E-Step of the EM algorithm calculates E θ (t) (∑N i=1 ln f(zi, Yi,xi) | · ) and the M-step maximizes this conditional expectation over σ2 to obtain σ 2(t+1) . By the conditional independence of zi and Yi given xi, ln f(zi, Yi,xi) ≡ ln f(zi|xi) + ln f(yi|xi) + ln f(xi). Since only ln f(zi|xi) contains σ2 , we can ignore the last two terms and obtain σ 2(t+1) via maximizing Eθ(t) (∑N i=1 ln f(zi|xi) | · ) over σ2 . From (3.4), we first get N∑ i=1 ln f(zi|xi) = −N 2 ln(det σ2R)− 1 2σ2 N∑ i=1 (zi−µ−Axi)′R−1(zi−µ−Axi). 34 Chapter 3. Linear mixed models for measurement error in functional regression Following (3.6), we have Cov(Wi,xi) = CΣx (3.15) which then leads to the conditional mean and covariance of xi given Wi as E[xi|Wi] ≡ µxi|Wi = ΣxC′Σ−1W (Wi − µW ), (3.16) Cov[xi|Wi] ≡ Σx|W = Σx −ΣxC′Σ−1W CΣx. (3.17) Let s̃ = N∑ i=1 (zi − µ̂−Aµ(t)xi|Wi) ′R−1(zi − µ̂−Aµ(t)xi|Wi). Routine calculations yield E θ (t) ( N∑ i=1 ln f(zi|xi)|· ) = −N 2 ln(detR)−nN 2 lnσ2− 1 2σ2 [ s̃+Ntr(R−1AΣ (t) x|WA ′) ] . Differentiating this conditional mean with respect to σ2 and equating the derivative to zero yields σ2(t+1) = 1 nN s̃+ 1 n tr[R−1AΣ (t) x|WA ′]. (3.18) We show the update σ 2(t+1) is positive in the following. The first term in σ 2(t+1) is positive, by assumption (c) and the fact that R is positive definite. The second term is nonnegative by the following argument. Using the famous matrix identity (VΣV′ + Σ0) −1 = Σ−10 −Σ−10 V ( Σ−1 + V′Σ−10 V )−1 V′Σ−10 provided the matrix orders properly defined, we see that Σ (t) x|W = ( Σ(t)x −1 + C(t)′Σ (t) d −1 C(t) )−1 which is positive definite. Given Σ (t) x|W > 0, assumption (a) then implies 35 Chapter 3. Linear mixed models for measurement error in functional regression that AΣ (t) x|WA ′ ≥ 0. Together with the fact that R > 0, the second term in (3.18) is thus nonnegative. 3.3.4 Updating β(t) and σ2(t) The updates of β(t) and σ2(t) maximize ΛN over β and σ 2, holding the other parameters fixed. Suppose that σ 2(t) > 0 and Σ (t) x > 0. We find unique critical points, β̂ and σ̂2, and show that they increase the log likelihood provided σ̂2 > 0. Note that log f(yi, zi) = log f(yi|zi) + log f(zi), that log f(zi) doesn’t depend on β or σ2. We also note given zi, yi is normal with mean E(Yi|zi) ≡ β0 + β′G(zi − µ), and variance σ2Y |z ≡ Var(Yi|zi) = β′Kβ + σ2 (3.19) where G = TΣxA ′Σ−1z (3.20) and K = TΣxT ′ −TΣxA′Σ−1z AΣxT′. Therefore, to maximize ΛN with respect to β and σ 2, we maximize Λ̃N = −N 2 ln(β′Kβ + σ2)− 1 2(β′Kβ + σ2) N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 .(3.21) With detailed derivation in Section B.5, equating ∂Λ̃N/∂β and ∂Λ̃N/∂σ 2 to zero yields respectively 1 β′Kβ + σ2 N∑ i=1 ( Yi − β0 − β′G(zi − µ) ) G(zi − µ) = 0 (3.22) 36 Chapter 3. Linear mixed models for measurement error in functional regression 1 (β′Kβ + σ2)2 [ N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 −N(β′Kβ + σ2)] = 0. (3.23) Note that G is of full row rank because of the following two observations. First, T is of full row rank by assumption (b). Second, the matrix Σz = Σ(t)x + σ 2(t) R is invertible since it is positive definite. Let M = G N∑ i=1 (zi − µ)(zi − µ)′G′. Then, by assumption (c), M is positive definite. Solving (3.22) for β and (3.23) for σ2 gives β(t+1) = β̂ = M−1G N∑ i=1 (zi − µ)(Yi − β0) σ2(t+1) = σ̂2 = 1 N N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 − β′Kβ. (3.24) Unfortunately, we are not guaranteed that σ̂2 is positive. However, in all of our data analyses and simulation studies, the final estimate of σ2 was always positive. Again, to check if the update increases Λ̃N , we show that the Hessian matrix is negative definite. We notice that (3.24) implies σ̂2Y |z ≡ β̂ ′ Kβ̂ + σ̂2 = 1 N N∑ i=1 ( Yi − β0 − β̂′G(zi − µ) )2 , (3.25) which is positive by assumption (d). With detailed calculation in Sec- tion B.5, the Hessian matrix HΛ̃(β, σ2) when evaluated at β̂ and σ̂2 equals HΛ̃(β̂, σ̂2) = − N (σ̂2Y |z) 2 2Kβ̂β̂′K + σ̂2Y |zN M Kβ̂ β̂ ′ K 12 . (3.26) It follows that HΛ̃(β̂, σ̂2) < 0 by the following argument. Let x1 ∈ <J and 37 Chapter 3. Linear mixed models for measurement error in functional regression x2 ∈ < with at least one of x1 or x2 non-zero. Direct calculation yields (x′1, x2) HΛN (β̂, σ̂ 2) ( x1 x2 ) = − N (σ̂2Y |z) 2 [( x2√ 2 + √ 2x′1Kβ̂ )2 + σ̂2Y |z N x′1Mx1 ] < 0. 3.4 Inference for β Given the estimate β̂ of β, we estimate the function β by β̂ = β̂ ′ ψ. If the covariance matrix of β̂ is Σβ, then the covariance function of β̂, denoted Vβ, is Vβ(s, t) = Cov(β̂ ′ ψ(s), β̂ ′ ψ(t)) = ψ(s)′Σβψ(t). We base inference for β on β̂ and an estimate of Σβ. We estimate Σβ in two ways, using bootstrap by resampling the observed Wi’s with replacement or using the observed Hessian matrix HΛ̃N (β̂, σ̂ 2) defined in (3.26). We take Σ̂β as the K ×K upper corner of the inverse of −HΛ̃N (β̂, σ̂2). In doing this, we are treating the other parameters as known, ignoring the variability introduced by estimating them. Thus, we expect that we may underestimate Vβ(t, t), while we don’t anticipate underestimation with the bootstrap estimate. However, the Hessian-based estimate is very fast to compute. 3.4.1 Hypothesis testing for β Testing that β ≡ 0 To determine if Yi depends on Zi(·) we test Ho : β(t) = 0, for all t ∈ [a, b]. We consider three test statistics. 38 Chapter 3. Linear mixed models for measurement error in functional regression The first test statistic is the generalized likelihood ratio statistic Ul = sup β = 0 ΛN − supΛN . The unrestricted supremum of ΛN is achieved at the ECME estimates de- scribed in Section 3.3. We can also use the ECME procedure to calculate the first supremum. To obtain the first supremum, we observe under the restriction β = 0, zi and Yi are independent, with zi ∼ N(µ,AΣxA′+σ2R) and Yi ∼ N(β0, σ2). Thus µ̂ = z̄, β̂0 = Ȳ and σ̂2 = ∑ (Yi − Ȳ )2/N . We then calculate Σ(t)x and σ 2(t) by an ECME method treating these estimates of µ, β̂ and σ2 as known. We update Σx by maximizing ΛN directly while holding σ2 fixed. We update σ 2 by finding its EM estimate σ 2(t+1) while holding Σx fixed. We iterate untill convergence occurs. The second statistic considered is Wald’s test statistic using β̂, the vector of estimated basis coefficients: Uw = β̂ ′ Σ̂ −1 β β̂. It is interesting to note that this test statistic can be re-written in terms of a vector of function evaluations of β̂. To see this, let t∗i , i = 1, . . . , n ∗, be a sequence of time points, let β̃ be the vector containing the values of β̂ at the t∗i ’s, and let Σβ̃ be β̃’s covariance matrix. The Wald test statistic based on β̃ is β̃ ′ Σ̂ + β̃ β̃, where Σ̂ + β̃ is the Moore-Penrose inverse of an estimate of Σ β̃ . We now argue that, under mild conditions on the t∗i ’s, β̃ ′ Σ̂ + β̃ β̃ = Uw. Define the n∗ × J matrix Ψ as Ψij = ψj(t∗i ) and suppose that Ψ is of full column rank. Since β̃ = Ψβ̂, Σ β̃ = ΨΣβΨ ′ and thus it is natural to take Σ̂ β̃ = ΨΣ̂βΨ ′. Since Σ̂ + β̃ = Ψ +′Σ̂ −1 β Ψ + and Ψ+Ψ = I, β̃ ′ Σ̂ + β̃ β̃ = Uw. The third statistic is the integrated t-statistic Uf = ∫ b a β̂2(t) V̂β(t, t) dt. To calculate the null distribution of a test statistic, and thus calcu- late a p-value, we use a permutation type method, one that does not rely 39 Chapter 3. Linear mixed models for measurement error in functional regression on distributional assumptions. Under the null hypothesis that β(t) ≡ 0 for all t ∈ [a, b], Yi and zi are independent. Therefore the joint distribu- tion of (zi, Yi) is the same as that of (zi, Yi′) where i ′ is chosen at random from {1, . . . , N}. To simulate the null distribution of a test statistic, we make Q new data sets via permuting the Yi’s: the qth data set is simply {z1, Yi1 , . . . , zN , YiN } where i1, . . . , iN is a random permutation of 1, . . . , N . The p-value is then calculated as the proportion of the resulting Q statistic values larger than the original observed value. Testing equality of two β’s Often we want to know if the β’s governing Y ’s and Z’s in two different groups are equal. To denote group membership, we use the superscript “s” or “c” to indicate the “selection” or “control” group respectively. We have data collected independently from the two groups and we want to test Ho : β s(t) = βc(t), for all t ∈ [a, b]. We consider four test statistics. We assume that the selection and control log-likelihoods, ΛsNs and Λ c Nc , each have the same expressions as in (3.21) but with possibly different parameter values, superscripted by s or c. The first test statistic is from a likelihood ratio test Ul = sup βs = βc {ΛsNs + ΛcNc} − sup{ΛsNs + ΛcNc} = sup βs = βc {ΛsNs + ΛcNc} − supΛsNs − supΛcNc . Each of the last two suprema is calculated separately, using the ECME estimates from Section 3.3. We can also apply this ECME procedure to calculate the first supremum. Under the restriction βs = βc, ΛsNs and Λ c Nc have a common parameter β ≡ βs = βc. Following the same argument as in Section 3.3, we have µc(t) = z̄c, µs(t) = z̄s, β c(t) 0 = Ȳ c and β s(t) 0 = Ȳ s. 40 Chapter 3. Linear mixed models for measurement error in functional regression To update each of Σ c(t) x and Σ s(t) x , we follow the steps outlined in Sec- tion 3.3.2. To update each of σ c2(t) and σ s2(t) , we follow the steps outlined in Section 3.3.3. To update β(t), σs2(t) and σc2(t), we proceed as in Section 3.3.4, but with some modification for our updating of β(t). To describe the procedure to update β(t), σs2(t) and σc2(t), define Λ̃sNs and Λ̃cNc in a manner analagous to Λ̃N in (3.21). By an argument similar to that in Section 3.3.4, we must find β, σs2 and σc2 to maximize Λ̃sNs + Λ̃ c Nc . Differentiating Λ̃sNs + Λ̃ c Nc with respect to σ s2 and σc2 and setting equal to zero yields equations analagous to the equation for σ2(t+1) in (3.24). Unfortunately, differentiating Λ̃sNs + Λ̃ c Nc with respect to β and setting equal to zero yields an intractable equation. So we modify our calculation of β, but retain the above-described updates for σs2 and σc2. Instead of maximizing Λ̃sNs + Λ̃ c Nc with respect to β we maximize ˜̃Λ s Ns + ˜̃Λ c Nc with respect to β, where the ˜̃Λ’s are defined as follows. First consider the term β′Ksβ + σs2. Calculate the matrix Ks using Σs(t+1)x and σ s2(t+1) . Take β = β(t) and σs2 = σs2(t). Then the resulting expression for β ′Ksβ + σs2, which we denote σ s2(t) Y |z , no longer contains the parameters β and σ s2. Let ˜̃Λ s Ns = − N s 2 ln(σ s2(t) Y |z )− 1 2σ s2(t) Y |z ∑( Y si − βs0 − β′Gs(zsi − µs) )2 . Define σ c2(t) Y |z and ˜̃Λ c Nc similarly. Differentiating ˜̃Λ s Ns + ˜̃Λ c Nc with respect to β and setting equal to zero yields the update β(t+1) = 1 σ s2(t) Y |z Ms + 1 σ c2(t) Y |z Mc −1 1 σ s2(t) Y |z Gs ∑ (zsi − µs)(Y si − βs0) + 1 σ c2(t) Y |z Gc ∑ (zci − µc)(Y ci − βc0) . 41 Chapter 3. Linear mixed models for measurement error in functional regression where Ms = Gs ∑ (zsi − µs)(zsi − µs)′Gs′ and Mc is defined similarly. To define the two sample Wald’s statistic Uw, let β̂ s be the estimate of βs with estimated covariance matrix Σ̂ s β and define β̂ c and Σ̂ c β similarly. The two sample statistic is Uw = (β̂ s − β̂c)′ ( Σ̂βs + Σ̂βc )−1 (β̂s − β̂c). We consider a two sample statistic based on function evaluations, rather than on basis coefficients, recalling the notation in Section 3.4.1. Let β̃ s = Ψsβ̂ s be the vector of function evaluations of β̂s at a sequence of time points. Let Σ̂ s β̃ = Ψ sΣ̂ s βΨ s′ be the estimate of β̃ s ’s covariance matrix. Similarly define β̃ c and Σ̂ c β̃. The two sample Wald test statistic based on β̃ s and β̃ c is Ue = (β̃ s − β̃c)′ ( Σ̂ β̃ s + Σ̂ β̃ c )+ (β̃ s − β̃c). In Section 3.4.1, the one sample situation, we argued that we needn’t use the function evaluation statistic Ue as it is equivalent to the Wald test statistic Uw under mild conditions. In the two sample situation here, however, the two sample Uw and Ue may not agree unless Ψ s = Ψc and both of them are of full column rank. To define the two sample integrated t-statistic Uf , let V̂βs(s, t) = ψ s(s)′Σ̂ s βψ s(t) be the estimate of the covariance function of β̂s and define V̂βc(s, t) similarly. The two sample Uf is Uf = ∫ [β̂s(t)− β̂c(t)]2 V̂βs(t, t) + V̂βc(t, t) dt. Again, we use the permutation method to calculate the null distribution of the test statistics and thus the p-values. Under the null hypothesis that βs(t) = βc(t) for all t ∈ [a, b], the dependence of Yi on zi is identical in both groups. We generate Q “data sets” from the original data set, data sets that follow the null hypothesis. To construct the qth “data set”, we 42 Chapter 3. Linear mixed models for measurement error in functional regression randomly split the N s +N c individuals into two groups of size N s and N c and calculate the resulting test statistic. We use the empirical distribution of the obtained Q test statistic values to approximate the null distribution of our statistic. The p-value is then calculated as the proportion of the Q statistic values larger than the original observed value. 3.5 Model assumption checking After fitting the model, we need to check if the model assumptions are satis- fied. Our model diagnostics rely on the analysis of residuals. In this section, we derive expressions of the fitted values for Wi and for the residuals. Fitted values and residuals for zi and Yi are then obtained as components. We can then plot the residuals to check model assumptions and to look for outliers and influential points. To simplify notation, unknown parameters below stand for their esti- mates. Using model (3.4) and (3.5), we base our fitted values Ŵi on the BLUP of the random effects xi, µxi|Wi in (3.16), Ŵi = µW + Cµxi|Wi = µW + CΣxC ′Σ−1W (Wi − µW ) Recall the expression of ΣW in (3.9). We get CΣxC ′Σ−1W = ( CΣxC ′ + Σd −Σd ) Σ−1W = I−ΣdΣ−1W and thus Ŵi = ( I−ΣdΣ−1W ) Wi + ΣdΣ −1 W µW . It then follows ri = Wi − Ŵi = ΣdΣ−1W (Wi − µW ). The last element of Ŵi gives the fitted value of Yi and the last element of ri gives the Yi residual. As we focus on modelling the dependence of Yi on 43 Chapter 3. Linear mixed models for measurement error in functional regression Zi, plotting residuals against fitted values of Yi is useful to detect outliers or influential points of the model fit. 3.6 Model application In this section, we analyze a biological data set using our model. First we give a general description of how to choose the basis functions φ1, . . . , φK . Then we conduct the data analysis. 3.6.1 Choosing the basis functions We choose the basis functions, the φk’s, as functions that give good approxi- mations to Zi(·): we choose them as the first few estimated eigenfunctions of the Zi process. To do so, we apply a smoothed principle component analysis to the observed zi’s, penalizing an approximation of the second derivative of the eigenfunction. Let Σ̂ be the sample covariance matrix of the zi’s. We find a sequence of orthogonal vectors ṽj ∈ Rn to maximize v′jΣ̂vj v′jvj + λv ′ jD ′Dvj , where λ is the smoothing parameter and Dvj calculates second divided differences of the vector vj . The (n− 2)×n matrix D depends on t1, . . . , tn and is defined to differentiate quadratic functions exactly. That is, if vj[i] = a+ bti + ct 2 i , then (Dvj)[i] = 2c. Given λ, the vectors ṽj are eigenvectors of the matrix G−1/2Σ̂G−1/2, where G = I+λD′D. The approach is similar to Ramsay and Silverman’s (2005) but we don’t use basis function expansions of the Zi. Choice of λ can be done by cross-validation, but for simplicity here, we select λ by examining the smoothness of the resulting ṽj’s. In the data analysis, we chose λ = 100. 44 Chapter 3. Linear mixed models for measurement error in functional regression 3.6.2 Data description Data were provided by Patrick Carter, School of Biological Sciences, Wash- ington State University, and are described in Morgan et al, 2003. Data are from mice divided into four groups according to gender and the two treatments “selection” and “control”. The selection group mice were bred over 16 generations, with selection being on high wheel-running activity at age eight weeks. Control mice were bred at random. In the final generation, body mass and wheel running activity were recorded for each mouse for sixty two consecutive weeks, indexed from −1 to 60, except for weeks 34, 38, 39, 50. The research interest is to know how body mass and wheel running are related and if the relationship depends on the treatment. The wheel running distance data have many missing values and are very noisy. In addition, the wheels were cleaned every four weeks, and so we see spikes in wheel-running activity every fourth week. So in our analysis, we average the nonmissing wheel running distances over weeks 5 to 60 and use this average as the response Y . The predictor Z(·) is the log transformed body mass. We want to know if any of the groups have β non-zero and if there is any difference between the selection βs and the control βc within each gender. Plots of the observed zi’s and histograms of the Yi’s in each group are in Figures 3.1 and 3.2 respectively. We see that log body mass is roughly monotone with a high rate of increase in weeks −1 to 4. The log body mass in the males is more variable than that in the females. 3.6.3 Choice of basis functions A smoothed eigenanalysis in each of the four groups yielded a first eigen- function that was close to constant, indicating that the biggest source of variability in log body mass in each group was overall size of the mouse. Since a constant eigenfunction is biologically meaningful, we forced our first basis function to be constant and, within each group, calculated the remain- ing functions via a smoothed eigenanalysis on the centered log body mass 45 Chapter 3. Linear mixed models for measurement error in functional regression as follows. We let z̄i = 1 58 60∑ k=−1 k 6=34,38,39,50 zik be the ith mouse’s average log body mass and zij − z̄i j = −1, . . . , 60, j 6= 34, 38, 39, 50 the ith mouse’s centered log body masses. Within each group, we calcu- lated the sample covariance matrix of the centered log body mass vectors. We then applied the smoothed eigenanalysis to this covariance matrix. Fig- ure 3.3 shows the proportion of cumulative variance explained by the first ten principal components of this analysis in each group. The variance ex- plained by the constant function was 85% in the male selection group, 72% in the male control group, 70% in the female selection group and 63% in the female control group. Figure 3.4 shows the constant function and the first three eigenfunctions for each group. They are displayed one by one with the four groups’ functions together in one panel. There we see the three smoothed eigenfunctions are very similar in the four groups. Figure 3.3 suggests that, if we choose the first three smoothed eigen- functions in each group, we will capture about 90% of the variability of the log body mass trajectories, beyond the variability captured by the constant function. We will use a constant function plus these three eigenfunctions in our analysis. 3.6.4 Estimation and residual analysis Before proceeding with inference, we study residuals of our fits to see if there are any outliers. We fit the model (3.4) and (3.5) within each group, using R = I, and choose the same basis functions for β as for the log body mass, calculated in Section 3.6.3. Within each group, we plot the Yi residuals against the fitted Yi’s to detect outliers as outlined in Section 3.5. Residual plots of each group are in Figure 3.5. In the control males, we see an outlier at the left 46 Chapter 3. Linear mixed models for measurement error in functional regression side. This outlier may be influential in the estimation of β. So we remove this outlier and refit the model to the male control group. The first panel of Figure 3.6 is the same as the plot in the second panel of Figure 3.5, showing the outlier in the control males. The second panel of Figure 3.6 shows the residual plot of the fit after removing the outlier. There we see no new outliers. We remove the one outlier in the control males in our subsequent analyses. 3.6.5 Inference for β(·) In each of the four groups, we estimated β and calculated standard errors from both the Hessian matrix and the bootstrap, as described in Section 3.4. These are shown in Figure 3.7. To determine if β(t) = 0 for all t ∈ [−1, 60], we can study the plots in Figure 3.7 and we can conduct a formal hypothesis test. From the plots in Figure 3.7, we see that, except for the female control group, all groups show a region where the zero line is not within one standard error of β̂. This suggests that, within these groups, perhaps there may be a dependence of averaged wheel running on log body mass. We conduct a hypothesis test of Ho : β(t) = 0 for all t ∈ [−1, 60] in each group using the test statistics in Section 3.4.1. We compute the standard errors using the Hessian matrix (3.26). The results are in Table 3.1 below. The last three columns contain the permutation p-values of Ul, Uw and Uf , computed by using 500 permutations. The second column gives the observed value of the test statistic Ul and the next column gives the p-value of Ul based on the fact that negative twice the log-likehood ratio statistic is asymptotically chi-squared distributed with 4 degrees of freedom. From the Table, we see that, in selected and control males, average wheel-running depends on the log body mass trajectories. However, there is insufficient evidence to make this claim in the other three groups. We observe that the p-values of Ul and Uw are identical. We plot the permuted values of Ul and Uw in Figure 3.8 and see that Ul is an increasing function of Uw. This may be due to the normality assumption in the model. 47 Chapter 3. Linear mixed models for measurement error in functional regression 3.6.6 Inference for βs − βc Within each gender, we want to compare the selection β(·) with the control β(·). Figure 3.9 shows the pointwise difference between the selection β̂ and the control β̂ together with the standard errors computed from the Hessian matrix and from the bootstrap. For both males and females, the region within one standard error of β̂s − β̂c contains much of the zero line, which suggests we can’t distinguish between the selection β and the control β. Table 3.2 gives results of the test Ho : β s(t) = βc(t), for all t ∈ [−1, 60] in each gender. The last three columns contain the permutation p-values of Ul, Uw and Uf , computed by using 500 permutations. The second column gives the observed value of the test statistic Ul and the next column gives the p-value of Ul based on the fact that negative twice the log-likehood ratio statistic is asymptotically chi-squared distributed with 4 degrees of freedom. Given the results, we can not reject Ho in either gender. That is, within each gender, there is no evidence of a difference between the selected group and the control group in terms of average wheel running’s dependence on log body mass. In conclusion, we find a strong dependence of average wheel running on the log body mass in the selected males and control males. We don’t have enough evidence to distinguish the difference between the selected group and the control group in terms of average wheel running’s dependence on log body mass in either gender. 3.7 Simulation study In the simulation study, we compare the pointwise mean squared errors of our ECME estimate of β with those of a modified version of the two stage estimate proposed by Müller (2005). We also compare the power of the test statistics proposed in Section 3.4.1 for one-sample and two-sample comparisons. We will see that, in terms of mean squared error, in the one-sample case, both the ECME estimate and the two stage estimate suffer from an edge 48 Chapter 3. Linear mixed models for measurement error in functional regression effect. In Section 3.7.4, we look into this edge effect further. The ECME estimate typically has slightly smaller MSE than the two stage estimate, with the improvement in MSE being more noticeable as the dependence of Y on Z increases. In the two-sample case, the two methods are comparable. For testing, in the one-sample case, there is little difference in power between the two statistics considered. In the two sample case, the integrated t-statistic has more power to distinguish the difference between β s and βc. In the one-sample comparison in Section 3.7.2, we simulate data using parameter values estimated from the male selection group data analyzed in Section 3.6.2. In the two-sample comparison in Section 3.7.3, we simulate selection group data using estimates from the male selection group data and the control group data using estimates from the male control group data without the outlier. 3.7.1 Two stage estimate Recall from Section 3.1 that the calculation of Müller’s (2005) estimate of β had two parts. The first part ignored the yi’s and only used the zi’s to predict the underlying xi’s. The second part of the analysis was essentially a linear regression of the yi’s on the predicted xi’s. We would like to study this procedure’s ability to estimate β. We suspect that ignoring the yi’s in the first part of the procedure will lead to poorer estimation of β. To study this in a way that is comparable to our over-all methodology, we slightly modify the Müller’s method. We use linear mixed effects methodol- ogy to predict the xi’s from the zi’s. Specifically, we first use our smoothed principal components analysis of the zi’s to determine basis functions, the φk’s, see (3.3). We then construct the matrix T and fit model (3.4) using the EM algorithm. Laird (1982) gave an elaboration on the EM computation in linear mixed models. We use the resulting estimated covariances of the xi’s and i’s to calculate our predictors, the BLUPs of xi given zi, that is, to calculate E(xi|zi). For the second part of the analysis, we find the least squares estimate of β according to the regression model Yi = β0 + E(x ′ i|zi)T′β + ei. 49 Chapter 3. Linear mixed models for measurement error in functional regression In our fit of (3.4), we force Σx to be diagonal. This is in keeping with Müller, and stems from the fact that the components of the xi’s are principal component scores. The main difference between our approach and Müller’s is in the way we estimate the required variances and covariances for calculating the BLUPs. We use a linear mixed effects model based on eigenfunctions. Müller used a smoothing method to directly estimate the components of the covariance structure. 3.7.2 One sample comparison We simulate data based on parameter and eigenfunction estimates from the data analysis of the male selection group in Section 3.6.2. Let µs,Σsx, σ s2 , σ s2, β0 s, βs and βs(t) = ψ(t)′βs denote these estimates and let As be the matrix con- structed from the basis functions, evaluated at the same tj’s as in the data analysis. We simulate the unperturbed predictor Zi according to a multivariate normal N (µs,AsΣsxA s′) and let the observed zi be Zi plus N(0, σ s2 I) noise. We consider four possible β functions to describe the relationship between Yi and Zi: Yi = β0 s + ∫ 60 −1 β(t) [ Zi(t)− Z̄i(t) ] dt+ ei. The integral is calculated using the R function “sintegral” and we simulate ei from N(0, σ s2). We take β = γβs with γ = 0, 2/3, 4/3 or 2. When γ = 0, Yi does not depend on Zi. When γ = 2, the dependence is large. Recall in our data analysis in Section 3.6.2, we found that βs was significantly different from zero. Thus, for each value of γ, we can generate an observed data set (zi, Y γ i ), i = 1, . . . , 39. We first compare our estimate of β with the two stage estimate. We run 100 simulations and the MSE of the estimate β̂ is calculated as 100∑ 1 ( β̂(t)− β(t))2 ) /100 50 Chapter 3. Linear mixed models for measurement error in functional regression at each observation point t. The results are in Figure 3.10 where the first panel shows the pattern of the true β’s, the next two panels show the point- wise MSE’s of our two estimates of β and the last panel shows the differences of these pointwise MSE’s. In the plots, the MSE’s increase as γ increases and both estimates of β have high MSE’s for t’s at the edges. The MSE’s of both methods are much worse at the left hand edge, probably due to the sharp decrease of the true β and the sharp increase in log body mass before week 10. The ECME method seems to be more affected by this edge. We will give a further study of this edge effect in Section 3.7.4. However, in the last panel we see that overall, the ECME estimate has a smaller MSE, with the superiority of ECME increasing as γ increases. Therefore, if there is a significant dependence of Yi on zi through β, the ECME method of estimate is preferred. To test Ho : β(t) = 0, for all t ∈ [−1, 60], we use the test statistics Uw and Uf with standard errors calculated using the Hessian matrix (3.26) as described in Section 3.4. For each data set, we run 300 permutations to calculate the p-values. We simulate 100 data sets for each value of γ and choose levels α = 0.01 and α = 0.05. Figure 3.11 and Figure 3.12 summarize the proportion of times Ho was rejected using Uw and Uf but with different levels: α = 0.01 and α = 0.05 respectively. As expected, the powers increase as γ increases. There is little difference in power between the two statistics. 3.7.3 Two sample comparison To simulate data from two independent samples, we choose model param- eters using the male selection group data and the male control group data analysed in Section 3.6.2. We simulate data for the selection group and, sep- arately, for the control group using the same methodology as in Section 3.7.2 but with different true β’s. Let βs and βc be the estimates of β from the original male selection group and male control group data. Let β̄ = (βs + βc)/2 and ∆β = (βs − βc)/2. In the simulation study we set the β of the selection group to β̄ + γ∆β and that of the control group to β̄ − γ∆β with γ = (0, 2/3, 4/3, 2). Thus, the 51 Chapter 3. Linear mixed models for measurement error in functional regression difference between the selection and control β is 2γ∆β = γ(βs − βc). To estimate βs and βc, we fit the selection data and control data sepa- rately. For each value of γ we simulate 100 data sets and calculate the MSE as in Section 3.7.2 in each group. Figures 3.13 and 3.14 show the MSE’s of the two estimates in each group respectively. The pattern of the two MSE’s are similar to their counterparts in Figure 3.10. We calculate the MSE of the estimate βs − βc as 100∑ 1 ( β̂s(t)− β̂c(t)− βs(t) + βc(t))2 ) /100 at each observation point t. Figure 3.15 shows the results. The patterns of the two MSE’s are similar to their counterparts in Figure 3.10. The MSE’s of the two estimates are comparable but in general the MSE of the ECME estimate is smaller. To test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], we compare the four test statistics, Ul, Uw, Ue and Uf , with standard errors calculated using the Hessian matrix (3.26). For each data set, we run 300 permutations to calculate the p-values. We simulate 100 data sets for each value of γ and choose levels α = 0.01 and α = 0.05. Figures 3.16 and 3.17 show the power of the four test statistics with levels 0.01 and 0.05 respectively. The statistic Uf is the most powerful, especially when γ is large, but Uf and Uw are comparable. 3.7.4 Edge effect discussion in one-sample MSE comparison In Figure 3.10, we see that the MSE’s of the ECME estimate of β are much worse at the left hand edge than the MSE’s of the two stage estimate. We suspect this is probably due to the sharp decrease of the true β before week 10 as shown at the first panel of Figure 3.10, and the large increase in log body mass in that same period. In this section, we exclude the early weeks’ data from our analysis. To determine the various parameter values to use in a new simulation, we re-analyze the male selection group data but with zi containing log body 52 Chapter 3. Linear mixed models for measurement error in functional regression mass values a week 5 rather than at week −1. After obtaining the new parameter estimates, we simulate data in the same way as in Section 3.7.2. The simulation analysis result is in Figure 3.18. The edge effect still exists but this time the MSE’s of ECME and the two stage method are comparable at the edges. 53 Chapter 3. Linear mixed models for measurement error in functional regression Group Observed Ul Asymptotic p-Ul p-Ul p-Uw p-Uf Male selected 13.723 0.008 0.008 0.008 0.008 Male control 10.940 0.027 0.044 0.044 0.034 Female selected 9.095 0.059 0.082 0.082 0.264 Female control 3.280 0.512 0.568 0.568 0.656 Table 3.1: P-values of the test Ho : β(t) = 0, for all t ∈ [−1, 60] in each group. Gender Observed Ul Asymptotic p-Ul p-Ul p-Uw p-Ue p-Uf Male 11.919 0.018 0.364 0.356 0.532 0.110 Female 5.852 0.210 0.564 0.568 0.884 0.444 Table 3.2: P-values of the test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], within each gender. 54 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Male Selected Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Male Control Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Female Selected Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Female Control Week Lo g bo dy M as s Figure 3.1: Plots of log body mass versus week for the four groups of labo- ratory mice. 55 Chapter 3. Linear mixed models for measurement error in functional regression Male Selected Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Male Control Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Female Selected Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Female Control Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Figure 3.2: Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice. 56 Chapter 3. Linear mixed models for measurement error in functional regression 2 4 6 8 10 Male Selected Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Male Control Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Female Selected Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Female Control Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 Figure 3.3: Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components of the four groups of mice, after the individual average log body mass has been removed. 57 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 0. 6 0. 8 1. 0 1. 2 1. 4 Week PC F un ct io n 1 0 10 20 30 40 50 60 − 0. 5 − 0. 3 − 0. 1 0. 1 Week PC F un ct io n 2 0 10 20 30 40 50 60 − 0. 2 0. 0 0. 1 0. 2 Week PC F un ct io n 3 0 10 20 30 40 50 60 − 0. 2 0. 0 0. 1 0. 2 0. 3 Week PC F un ct io n 4 Figure 3.4: The constant function and the first three smoothed eigenfunc- tions of the covariance of centered log body mass of the four groups of mice, calculated as described in Section 3.6.3. 58 Chapter 3. Linear mixed models for measurement error in functional regression 40 50 60 70 − 30 − 10 10 20 30 Male Selected Fitted O bs er ve d − Fi tte d 30 35 40 45 50 55 − 30 − 10 10 20 30 Male Control Fitted O bs er ve d − Fi tte d 75 80 85 90 95 − 30 − 10 10 20 30 Female Selected Fitted O bs er ve d − Fi tte d 45 50 55 60 − 30 − 10 10 20 30 Female Control Fitted O bs er ve d − Fi tte d Figure 3.5: Residual plots of the fit of the Yi = averaged wheel running of the four groups of mice. 59 Chapter 3. Linear mixed models for measurement error in functional regression 30 35 40 45 50 55 − 20 0 10 30 The outlier Fitted O bs er ve d − Fi tte d 30 35 40 45 50 55 60 − 20 0 10 30 No outlier Fitted O bs er ve d − Fi tte d Figure 3.6: Plots of residuals of Yi = averaged wheel runining in the male control group of mice before and after removing the outlier. 60 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Male Selected Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Male Control Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Female Selected Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Female Control Week β^ (t ) Figure 3.7: Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for the four groups of mice, as described in Section 3.4. 61 Chapter 3. Linear mixed models for measurement error in functional regression 0 5 10 15 20 25 0 5 10 15 20 Male Selected permuted U_w pe rm ut ed lo g− lik el yh oo d ra tio 0 5 10 15 20 0 5 10 15 Male Control permuted U_w pe rm ut ed lo g− lik el yh oo d ra tio 0 5 10 15 20 25 0 5 10 15 20 Female Selected permuted U_w pe rm ut ed lo g− lik el yh oo d ra tio 0 5 10 15 20 0 5 10 15 Female Control permuted U_w pe rm ut ed lo g− lik el yh oo d ra tio Figure 3.8: Comparing the permuted values of the generalized likelihood ratio statistic Ul with the Wald statistic Uw for four groups of mice. The equivalence of Ul and Uw were observed in Table 3.1. 62 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 40 0 20 60 Male Week βs ( t)− βc ( t) 0 10 20 30 40 50 60 − 40 0 20 60 Female Week βs ( t)− βc ( t) Figure 3.9: Plots of β̂s − β̂c and the standard errors of the difference com- puted from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders. 63 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 20 0 20 40 60 True β Week β γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 0 50 10 0 20 0 MSE of the ECME estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 0 50 10 0 20 0 MSE of the two stage estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 − 20 − 10 0 5 10 Difference between the MSE’s Week Tw o st ag e − EC M E γ = 0 γ = 2/3 γ = 4/3 γ = 2 Figure 3.10: MSE of the estimate of β for each γ value in one sample simulation as described in Section 3.7.2. Compare the ECME method with the two stage method. 64 Chapter 3. Linear mixed models for measurement error in functional regression 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 α = 0.01 γ Pr op or tio n of ti m es re jec tin g H o: β = 0 0.000 0.667 1.333 2.000 0 0 Uw Uf Figure 3.11: Proportion of times Ho is rejected using level α = 0.01, where Ho : β(t) = 0, for all t ∈ [−1, 60] in one sample simulation as described in Section 3.7.2. Two test statistics are considered, Uw and Uf . 65 Chapter 3. Linear mixed models for measurement error in functional regression 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 α = 0.05 γ Pr op or tio n of ti m es re jec tin g H o: β = 0 0.000 0.667 1.333 2.000 0.04 0.02 Uw Uf Figure 3.12: Proportion of times Ho is rejected using level α = 0.05, where Ho : β(t) = 0, for all t ∈ [−1, 60] in one sample simulation as described in Section 3.7.2. Two test statistics are considered, Uw and Uf . 66 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 20 0 10 30 True βs Week βs γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 0 40 80 12 0 MSE of the ECME estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 0 40 80 12 0 MSE of the two stage estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 − 4 − 2 0 2 4 6 Difference between the MSE’s Week Tw o st ag e − EC M E γ = 0 γ = 2/3 γ = 4/3 γ = 2 Figure 3.13: MSE of the estimate of βs for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. 67 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 40 − 20 0 20 True βc Week βc γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 50 15 0 25 0 MSE of the ECME estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 50 15 0 25 0 MSE of the two stage estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 − 5 0 5 10 Difference between the MSE’s Week Tw o st ag e − EC M E γ = 0 γ = 2/3 γ = 4/3 γ = 2 Figure 3.14: MSE of the estimate of βc for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. 68 Chapter 3. Linear mixed models for measurement error in functional regression 0 10 20 30 40 50 60 − 20 0 20 40 True βs − βc Week βs − βc γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 10 0 30 0 50 0 MSE of the ECME estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 10 0 30 0 50 0 MSE of the two stage estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 0 10 20 30 40 50 60 0 5 10 15 Difference between the MSE’s Week Tw o st ag e − EC M E γ = 0 γ = 2/3 γ = 4/3 γ = 2 Figure 3.15: MSE of the estimate of βs − βc for each γ value in two sample simulation as described in Section 3.7.3. Compare the ECME method with the two stage method. 69 Chapter 3. Linear mixed models for measurement error in functional regression 0. 0 0. 2 0. 4 0. 6 0. 8 α = 0.01 γ Pr op or tio n of ti m es re jec tin g H o: βs = βc 0.000 0.667 1.333 2.000 0 0.02 Ul Uw Ue Uf Figure 3.16: Proportion of times Ho is rejected using level α = 0.01, where Ho : β s = βc, for all t ∈ [−1, 60] in two sample simulation as described in Section 3.7.3. Four test statistics are considered Ul, Uw, Ue and Uf . 70 Chapter 3. Linear mixed models for measurement error in functional regression 0. 0 0. 2 0. 4 0. 6 0. 8 α = 0.05 γ Pr op or tio n of ti m es re jec tin g H o: βs = βc 0.000 0.667 1.333 2.000 0.02 0.05 Ul Uw Ue Uf Figure 3.17: Proportion of times Ho is rejected using level α = 0.05, where Ho : β s = βc, for all t ∈ [−1, 60] in two sample simulation as described in Section 3.7.3. Four test statistics are considered Ul, Uw, Ue and Uf . 71 Chapter 3. Linear mixed models for measurement error in functional regression 10 20 30 40 50 60 − 40 − 20 0 20 True β Week β γ = 0 γ = 2/3 γ = 4/3 γ = 2 10 20 30 40 50 60 0 10 0 20 0 30 0 40 0 MSE of the ECME estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 10 20 30 40 50 60 0 10 0 20 0 30 0 40 0 MSE of the two stage estimate Week M SE γ = 0 γ = 2/3 γ = 4/3 γ = 2 10 20 30 40 50 60 0 5 15 25 35 Difference between the MSE’s Week Tw o st ag e − EC M E γ = 0γ = 2/3 γ = 4/3 γ = 2 Figure 3.18: MSE of the estimate of β for each γ value using truncated log body mass as described in Section 3.7.4. Compare the ECME method with the two stage method. 72 Bibliography [1] James, G. (2002) Generalized linear models with functional predictors. Journal of the Royal Statistical Society B 64, 3, 411-432. [2] James, G. M. and Silverman, B. W. (2005) Functional adaptive model estimation. Journal of the American Statistical Association 100, 470, 565-576. [3] Laird, N. M. (1982) Computation of variance components using the E-M Algorithm. Journal of Statistical Computation and Simulation 14, 295-303. [4] Liu, C. H. and Rubin, D. B. (1994) The ECME Algorithm - a simple ex- tension of EM and ECM with faster monotone convergence. Biometrika 81 4, 633-648. [5] Magnus, J. R. and Neudecker, H. (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley and Sons. [6] Morgan, T.J., T. Garland, Jr., and Carter, P. A. (2003) Ontogenetic trajectories in mice selected for high wheel- running activity. I. Mean ontogenetic trajectories. Evolution 57, 646-657. [7] Müller, H.G. (2005) Functional modeling and classification of longitu- dinal data. Scandinavian Journal of Statistics 32, 223-240. [8] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. 2nd edition, Springer Series in Statistics. [9] Wang, W. (2007) Identifiability in linear mixed effects models. Manuscript, Department of Statistics, University of British Columbia. 73 Bibliography [10] Yao, F., Müller, H.G., Wang, J.L. (2005) Functional data analysis for sparse longitudinal data. Journal of American Statistical Association 100, 577-590. 74 Chapter 4 Dependence of average lifetime wheel-running on body mass ontogeny 4.1 Introduction In natural populations of animals, strong correlations between large body size and increased Darwinian fitness are frequently measured: large body size is related to increased fecundity, reduced predation and, in general, an increased ability to survive in a given environment (Sogard 1997, Hone and Benton 2005). Variation in body mass is also associated with variation in a number of physiological traits, including metabolic rate and heart mass, and much effort has been expended understanding allometric relationships in animals (Garland and Carter, 1994, and references therein). Body mass has even been shown to impact mutation rates in nuclear and mitochondrial DNA (Gillooly et al., 2005; Estabrook et al., 2007). Body mass is also associated with locomotor performance; however, these relationships are complex. Body mass is positively correlated with home range area and daily movement distances in wild populations of mammals (Garland 1983), but negatively correlated with laboratory measures of voluntary activity (Dewsbery 1980). Replicate lines of mice which have been selected for high levels of vol- untary wheel running activity are a powerful model system with which to 0A version of this chapter will be submitted for publication. Authors: Carter, P. A., Heckman, N. and Wang, W. 75 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny investigate the relationships between body mass and voluntary locomotor activity (Garland 2003). After 10 generations of selection, mice selected for high voluntary wheel running activity at 8 weeks of age ran significantly more than their unselected controls; however, body mass at the age at which wheel running was measured and selected was not a significant predictor of the amount of wheel running nor was it significantly different between se- lected and control mice (Swallow et al. 1998). A subsequent study, 14 generations after selection, examined wheel running and body mass in this system of mice after 8 weeks of access to running wheels (Swallow et al., 1999). Three major results emerged from this study. First, body mass and wheel running share a negative genetic correlation of −.50; selected mice were significantly lighter at maturity than control mice. Second, this result is not mediated solely through wheel running activity; the difference in body mass persisted even in mice without wheel access, and in mice with wheel ac- cess even when total wheel running was statistically removed. Third, access to running wheels did result in a decrease in body mass, but this decrease was only about a third of the decrease caused by selection. A detailed study on body mass and wheel running ontogenies in the 16th generation of this system of mice was conducted by Morgan et al (2003). This experiment utilized 320 mice, half from the selected lines and half from the control lines; half of each of these groups were given lifetime access to running wheels while the other half did not have wheel access. Morgan et al (2003) compared the ontogenetic trajectories of the different experimental groups using repeated measures analysis and identified several interesting results. First, both the position and shape of the wheel running trajectory had responded to selection on wheel running at 8 weeks of age; selected mice ran more across the trajectory and showed a steeper decline in activity at older ages than control mice. This correlated response of the wheel run- ning ontogeny to early-age selection on wheel running indicates a positive genetic covariance between wheel running at younger ages and wheel run- ning at older ages, with the covariance declining as age increases. Second, the position but not the shape of the body mass trajectory had responded to selection, regardless of access to running wheels; selected mice with and 76 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny without wheel access were significantly lighter than their counterpart con- trol mice across all ages. This result indicates a negative response of the body mass ontogeny to selection on early-age wheel running and hence a negative genetic covariance between wheel running at 8 weeks of age and body mass across ontogeny; in addition, it indicates strong positive genetic covariances among body masses at different ages. Finally, the analysis of the wheel running trajectory revealed that age-specific body mass was a significant covariate for age-specific wheel running; smaller individuals ran more at each age than did larger individuals. The fact that Swallow et al. (1999) measured a negative genetic covari- ance between age-specific wheel running and body mass and that Morgan et al. (2003) showed that body mass is a significant covariate for wheel run- ning at each age, with smaller mice running more than larger mice, raises the question of whether wheel running through ontogeny explicitly depends on the body mass trajectory. Given that body mass at each age was a sig- nificant covariate for wheel running at that age, we hypothesized that wheel running is significantly dependent on the body mass trajectory. In addi- tion, given the age-specific negative genetic covariance between body mass and wheel running, we can ask whether this dependence has evolved in re- sponse to early-age selection on wheel running. We hypothesize that this dependence has evolved such that we will measure quantitative differences between the selected and control mice in the dependence of wheel running on the body mass trajectory. To test these hypotheses, we will generally take a function-valued ap- proach. In essence, we regress average wheel running on the log body mass trajectory. We then test if the “regression coefficients” are zero within each group and if they are equal between two groups. To model the log body mass trajectory, we use basis functions with expansion coefficients modeled as random effects. 77 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 4.2 Methods Experiments were performed on house mice (Mus domesticus) from the 16th generation of a selective-breeding experiment for increased voluntary wheel- running exercise (Swallow et al. 1998; Garland 2003). The selection experi- ment comprises four selection and four control lines; each line is propagated with 10 breeding pairs that contribute one male and one female to the next generation, with the condition that siblings are not mated. Breeders are chosen either randomly with respect to wheel running (control lines) or se- lectively, within family, as the mice that run the most revolutions on days five and six of six days of wheel exposure administered at eight weeks of age (selection lines). The individuals in this report are the offspring of the generation 15 breeders for generation 16 of the selection experiment. Five breeding pairs from each of the eight lines were remated to produce second litters at Washington State University. Details of this design are presented in Morgan et al (2003). Briefly, pups were weaned at 21 days of age and placed in treatment groups at 28 + / − 3 days of age in the animal care facility at Washington State University. Four males and four females from each family were used, with half designated for the active treatment and half designated for the sedentary treatment. Each activity group thus con- tained two females and two males from each of the five families within each of the eight lines, for a total of 160 individuals per activity group, i.e., 320 individuals in the aging colony. Mice in the active treatment group were placed individually in cages with a 11.75 cm radius running wheel and elec- tronic wheel-revolution counter built into the cage-top. The mouse thus had the option of voluntarily getting into the wheel and running, or of remain- ing in the cage and not running. On the same day, sedentary mice were placed individually in standard rodent cages. Photoperiod was 12 : 12 h, and water was available ad libitum. Mice were provided excess food weekly (Harland Teklad rodent diet W8604) and apparent food consumption was determined by weighing food hoppers. This measure does not account for possible variation in food wasting, as when mice shred food pellets and allow fragments to drop in the litter. A study of food wasting at generation 10 78 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny found no significant differences between selection and control lines, but sig- nificant variation among the replicate lines within selection group (Koteja et al. 2003). Additionally, once a week each animal was weighed and its weekly wheel revolutions downloaded from the counter device. Cages were cleaned weekly and running wheels were cleaned monthly. Extra sibs from all 40 families were placed in similar housing and were used as sentinels to monitor the colony monthly for the presence of specific pathogen exposure. At 24 months post start date, a sentinel tested positive for MHV exposure, presumably from a barrier breakdown. No other sentinels tested positive. No treatment was initiated for the virus. Subsequent necropsied mice were monitored specifically for the presence of diseased liver tissue. None were observed to have contracted hepatitis (necropsies performed by LAR veteri- narians at WSU). Before carrying out any analysis, we eliminate line effects in log body mass and wheel-running by subtracting out line means within each group (selection and control) and gender. Suppose individual i is in line k of the selection group and is female. We replace individual i’s log body mass with original log body mass minus line k’s average plus the selection group average. The line k average and the selection group average only involve females. Similarly, we do this for the wheel-running. Let Zi(·) be the ith mouse’s log body mass trajectory. We study how Zi(·) affects Yi, the averaged wheel running response over weeks 5 to 60. We use average wheel-running since the wheel running distance data have many missing values and are very noisy. In addition, since the running wheels were cleaned every four weeks, we see spikes in wheel-running activity every fourth week. In the following, we describe our model for the dependence of wheel- running on log body mass. This dependence is parameterized through an unknown function β. We estimate β and provide standard errors. We also test the null hypothesis that β = 0 (no dependence of wheel-running on log body mass) and test that two groups have the same β’s. We use permutation type methods to calculate the p-values. For more details of the statistical methodology, see Heckman and Wang (2007). 79 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny We model Zi using random effects with a set of basis functions φ1, . . . , φK as Zi(t) = µ(t) + K∑ k=1 xikφk(t), where µ is a smooth unknown function. The random effects xi ≡ (xi1, . . . , xiK)′ are assumed normally distributed as N(0,Σx). The log body mass trajec- tory Zi is observed with error at a sequence of weeks, with the observed value of Zi at week j equal to zij ≡ Zi(j) + ij. The ij ’s are independent and normally distributed with mean 0 and variance σ2 . Let zi contain the observed values of the zij’s. We can then write zi = µ+ Axi + i, µ = (µ(−1), . . . , µ(60))′, Ajk = φk(j), i = (i1, . . . , in)′. The dependence of Yi on Zi(·) is reflected through the unknown function β(·) as Yi = β0 + ∫ 60 −1 β(t) [ Zi(t)− E(Zi(t)) ] dt+ ei, where ei is an error term distributed as N(0, σ 2). The functional parameter β is of primary interest and can be viewed as a generalization of the con- ventional multiple regression coefficient vector. Choosing a basis for β (for ease, the same as for the Zi(·)), we approximate β as β(t) = K∑ k=1 βkφk(t). Together with our modelling of Zi(·), we can write the model of Yi as Yi = β0 + β ′Txi + ei, β = (β1, . . . , βK) ′, Tjk = ∫ 60 −1 φj(t) φk(t) dt. The goal is to use information of the observed data, the zi’s and Yi’s, to estimate the coefficient vector β. 80 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny Before estimating β along with the other unknown parameters in the model, we must first estimate the basis functions, the φk’s. We apply a smoothed eigenanalysis to the zij ’s, penalizing an approximation of the sec- ond derivative of the eigenfunction. We choose the amount of smoothing by eye. A detailed description of choosing the eigenfunctions is provided in Sec- tion 4.3. Treating the estimated φk’s as fixed and known, we then estimate the model parameters via maximum likelihood. We use an iterative numer- ical method, an implementation of the ECME (Expectation/Conditional Maximization Either) algorithm of Liu and Rubin (1994). Obtaining the estimate β̂, our estimate of β(t) is β̂(t) = ∑K k=1 β̂kφk(t). We estimate the covariance matrix of β̂ by both a Hessian matrix based method and a boot- strapped method. Using the estimated covariance matrix, we can estimate the variance of β̂(t), and thus make inference for β. To determine if Yi depends on Zi(·), we test Ho : β(t) = 0, for all t ∈ [−1, 60] via an integrated t-statistic Uf = ∫ 60 −1 β̂2(t) v̂ar(β̂(t)) dt. To calculate the null distribution of Uf , and thus calculate a p-value, we use a permutation type method, one that does not rely on distributional assumptions. Under the null hypothesis that β(t) ≡ 0 for all t ∈ [−1, 60], Yi and zi are independent. Therefore the joint distribution of (zi, Yi) is the same as that of (zi, Yi′) where i ′ is chosen at random from {1, . . . , N} and N is the number of individuals. To simulate the null distribution of Uf , we make Q = 500 new data sets via permuting the Yi’s: the qth data set is simply {z1, Yi1 , . . . , zN , YiN } where i1, . . . , iN is a random permutation of 1, . . . , N . The p-value is then calculated as the proportion of the resulting Q statistic values larger than the original observed value. Consider the two-sample problem where there are observations indepen- dently from a selection group and from a control group. To see if the β’s 81 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny governing the Y ’s and Z’s in the selection group and the control are equal, we test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], where the superscript “s” or “c” indicates the group membership, “selection” or “control” respectively. To test Ho, we use the two sample test statistic Uf = ∫ [β̂s(t)− β̂c(t)]2 v̂ar(β̂s(t)) + v̂ar(β̂c(t)) dt. Again, we use a permutation method to calculate the null distribution of the two sample Uf and thus the p-values. Under the null hypothesis that βs(t) = βc(t) for all t ∈ [−1, 60], the dependence of Yi on zi is identical in both groups. We generate Q = 500 “data sets” from the original data set, data sets that follow the null hypothesis. To construct the qth “data set”, we randomly split the N s +N c individuals into two groups of size N s and N c and calculate the resulting Uf . We use the empirical distribution of the obtained Q test statistic values to approximate the null distribution of our statistic. The p-value is then calculated as the proportion of the Q statistic values larger than the original observed value. The use of function-valued traits in evolutionary biology was first pro- posed by Kirkpatrick and Heckman (1989). The methodology has found applications in animal breeding, via basis function expansions with random coefficients. See Meyer (2005) and references therein. 4.3 Results The body mass trajectories for each experimental group are presented in Figure 4.1. For all four groups, body mass is a roughly monotone increasing function of age, with the highest rate of increase from ages −1 to 4 weeks. The body mass trajectories of males, both selected and control, are more variable than those of the females. Histograms of average wheel running from ages 5 to 60 are presented in Figure 4.2; within each sex, selected mice ran more than control mice and females ran more than males (Morgan et 82 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny al., 2003). In order to determine a few functions which adequately describe the body mass trajectories, a smoothed eigenanalysis of the trajectories was conducted. This smoothed eigenanalysis yielded a first eigenfunction that was close constant, indicating that the biggest source of variability in body mass was overall size of the mouse. Since a constant eigenfunction is biologically meaningful, the first basis function was forced to be constant and the remaining functions calculated via a smoothed eigenanalysis on the trajectories, centered as follows. Let z̄i = 1 58 60∑ k=−1 k 6=34,38,39,50 zik be the ith mouses average body mass. We apply a smoothed eigenanalysis to the sample covariance matrix of the centered body mass zij − z̄i j = −1, . . . , 60, j 6= 34, 38, 39, 50. Using the constant as the first eigenfunction, we found that the corre- sponding first principal component, overall size, accounts for 85% of the variability in the male selection group, 72% in the male control group, 70% in the female selection group and 63% in the female control group. Fig- ure 4.3 shows the proportion of cumulative variance explained by the next ten principal components in each group after removing the variability of the first principal component. The first three smoothed eigenfunctions in each group capture about 90% of the remaining variability of the body mass trajectories so these three eigenfunctions plus the constant function will be used in the subsequent analyses. Figure 4.4 shows each of these four leading eigenfunctions; the functions of the four experimental groups are presented together in each panel. Clearly the smoothed eigenfunctions are very similar among the four experimental groups. We model the mean of the averaged wheel running as E(Y ) = β0 + ∫ β(t)[Z(t)− EZ(t)]dt, 83 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny where Z(t) is the logarithm of body mass at age t. Since the observed body mass curves are somewhat rougher than one would expect, we model the observed log body mass at age t as Z(t) plus error. To estimate β(·), we approximate both β(·) and Z(·) using basis function expansions. The basis functions used are the estimated eigenfunctions. To model within individual variability in body mass, we use a random effects model, that is, we assume that the coefficients in each individuals Z(·) expansion are random. We then estimate all parameters by maximum likelihood. The β̂ is estimated for each group from the smoothed eigenfunctions. These β̂’s are presented in Figure 4.5. Both the male selection and control groups show regions of β̂ that are non-overlapping with the 0 line and for- mal testing shows that β̂ is significantly different from 0 for both selected and control males (Figure 4.5; Table 4.1; p = 0.008 and 0.034, respectively). Hence in male selection and control mice average wheel running significantly depends on body mass trajectory. Interestingly, β̂ for the selection males is positive at very young ages, then negative just prior to 10 weeks, positive again at middle age and then negative or not different from 0 at older ages. For females, β̂ overlaps 0 and the results of the formal testing are not signif- icant for either selection or control groups (Figure 4.5; p = 0.264 and 0.656, respectively). Hence for all females, there is no evidence that average wheel running depends on the body mass trajectory. Figure 4.6 shows both the pointwise estimates and the model estimates of the covariance between average wheel running and body mass trajectory for all four experimental groups. The general shapes of both estimates are similar to each other and they are similar to the shapes of the curves seen for the β̂ estimates in Figure 4.5. For instance, in each group except for the female control, at the first few ages the covariance between average wheel- running and log body mass is large. Correspondingly, at these ages each β̂ is also large, where we believe that average wheel running is positively dependent on log body mass. The β̂ for selection and control mice were compared within each gender to test for the evolution of the relationship of body mass trajectory and average wheel running (Figure 4.7). No significant differences were detected 84 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny between selected and control mice in either gender (Table 4.2); no evolution of the dependence of wheel running on body mass trajectory in response to selection on age-specific wheel running can be detected. 4.4 Discussion As hypothesized, both male selected and control mice showed a significant dependence of wheel running on the body mass ontogeny. This result gen- erally fits those of Swallow et al. (1999), who measured a negative genetic correlation of −0.5 between wheel running and body mass, and Morgan et al. (2003) who measured a negative correlated response of the body mass on- togeny to selection on early-age wheel running. However, the analyses used herein have revealed complexity that was not identified in the two previous studies: both previous studies showed clear negative relationships between body mass and wheel running, whereas we have shown that the relationship is sometimes negative and sometimes positive, depending on age. As can be seen in Figures 4.5 and 4.6, body mass has a positive effect on wheel running at young and some middle ages, with a negative effect at some younger ages and at older ages. Interestingly Swallow et al. (1999) conducted their study at the younger ages (˜ weeks 8 to 15) when the relationship is negative, and although Morgan et al. (2003) clearly showed a negative genetic covariance between the body mass ontogeny and early-age wheel running, that does not preclude the phenotypic covariance from being at times positive. Hence our results do not disagree with those of the earlier studies but rather elucidate complexity that the earlier studies were not able to identify. A significant dependence of wheel running on the body mass trajectory was not identified in females. Different results in males and females is not surprising in the context of this model system of mice, in which males and females have differed in traits as diverse as wheel running itself (Swallow et al., 1998) and longevity (Bronikowski et al., 2006). On the other hand, the lack of a measurable difference in females may be caused by a lack of power. Given that Morgan et al. (2003) showed a negative correlated response of the wheel running trajectory to selection on early age body mass, we 85 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny had hypothesized that the dependence of body mass on the wheel running trajectory would have evolved and hence be different between selected and control mice. However, no such differences were identified. Hence we can not demonstrate any evolution of the dependence of wheel running on body mass trajectory. Several reasons may explain this. First, it is possible that the dependence is evolving, just slowly, and that a similar study conducted on a later generation would reveal differences between selected and control mice. Second, the dependence of wheel running on body mass may be constrained by genetic covariances between wheel running and body mass, effectively making it difficult to alter the dependence. Such a constraint could be identified by detailed quantitative genetic analyses of the wheel running and body mass trajectories (Kingsolver et al., 2001). Third, the inability to demonstrate any evolution dependence of wheel running on body mass trajectory could simply be because of a lack of power. Developing powerful yet flexible statistical techniques to analyze data sets with a large number of observations on each individual is a challenge. We often do not have enough degrees of freedom to carry out a “usual” regression analysis, modelling Yi = β0+ ∑ βjZ(tj)+eij : for our data, within one group we have around 40 individuals and 58 values of βj . The function- valued approach effectively reduces the number of parameters in the model. Using eigenfunctions as basis functions has allowed us to fit our data using only β0 and 4 βj ’s. Griswold, Gomulkiewicz and Heckman (2007) have found via simulation that basis-function type techniques are more powerful than traditional ones. Although they have considered a slightly different problem, applicable to a comparision of log body mass trajectories of two groups, one would expect their results to hold qualitatively for the problem considered here. The statistics methodology was developed by Heckman and Wang (2007), with this its first application. Here we see its benefits: the method has exposed patterns in dependence that were previously unknown. The method allows full use of information from all individuals, through the use of random effects modelling. And by using a function-valued approach, we can handle large number of observations per individual. 86 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny Group p-value Male selected 0.008 Male control 0.034 Female selected 0.264 Female control 0.656 Table 4.1: P-values of the test Ho : β(t) = 0, for all t ∈ [−1, 60], using test statistic Uf . Gender p-value Male 0.110 Female 0.444 Table 4.2: P-values of the test Ho : β s(t) = βc(t), for all t ∈ [−1, 60], using test statistic Uf . 87 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Male Selected Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Male Control Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Female Selected Week Lo g bo dy M as s 0 10 20 30 40 50 60 2. 0 2. 5 3. 0 3. 5 Female Control Week Lo g bo dy M as s Figure 4.1: Plots of log body mass versus week for the four groups of labo- ratory mice. 88 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny Male Selected Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Male Control Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Female Selected Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Female Control Wheel Running (km/week) D en si ty 20 40 60 80 120 0. 00 0 0. 01 0 0. 02 0 0. 03 0 Figure 4.2: Histogram of the response: averaged wheel running from weeks 5 to 60 for the four groups of laboratory mice. 89 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 2 4 6 8 10 Male Selected Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Male Control Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Female Selected Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 2 4 6 8 10 Female Control Eigenvalue Index Va ria nc e Ex pl ai ne d 0. 5 0. 6 0. 8 0. 9 1. 0 Figure 4.3: Plots of the proportion of cumulative variance of the centered log body mass explained by the first ten principal components for four groups of mice, after the individual average log body mass has been removed. 90 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 0 10 20 30 40 50 60 0. 6 0. 8 1. 0 1. 2 1. 4 Week PC F un ct io n 1 0 10 20 30 40 50 60 − 0. 5 − 0. 3 − 0. 1 0. 1 Week PC F un ct io n 2 0 10 20 30 40 50 60 − 0. 2 0. 0 0. 1 0. 2 Week PC F un ct io n 3 0 10 20 30 40 50 60 − 0. 2 0. 0 0. 1 0. 2 0. 3 Week PC F un ct io n 4 Figure 4.4: The constant function and the first three smoothed eigenfunc- tions of the covariance of centered log body mass of four group. The four groups’ functions are together in one panel. The eigenfunctions are calcu- lated as described in Section 4.3. 91 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Male Selected Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Male Control Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Female Selected Week β^ (t ) 0 10 20 30 40 50 60 − 60 − 20 0 20 40 Female Control Week β^ (t ) Figure 4.5: Plots of β̂ and its standard errors computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for four groups of mice. Standard errors are calculated as described in Section 4.2. 92 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 0 10 20 30 40 50 60 − 0. 2 0. 2 0. 4 0. 6 0. 8 Male Selected Week Co v 0 10 20 30 40 50 60 − 0. 2 0. 2 0. 4 0. 6 0. 8 Male Control Week Co v 0 10 20 30 40 50 60 − 0. 2 0. 2 0. 4 0. 6 0. 8 Female Selected Week Co v 0 10 20 30 40 50 60 − 0. 2 0. 2 0. 4 0. 6 0. 8 Female Control Week Co v Figure 4.6: Plots of the covariances between the average wheel running and the log body mass trajectory at week j, j = −1, . . . , 60, for four groups of mice. Solid lines are the covariance from the model and dash-dot lines are the pointwise sample covariances between average wheel running and log body mass. For the calculation, refer to Heckmand and Wang (2007). 93 Chapter 4. Dependence of average lifetime wheel-running on body mass ontogeny 0 10 20 30 40 50 60 − 40 0 20 60 Male Week βs ( t)− βc ( t) 0 10 20 30 40 50 60 − 40 0 20 60 Female Week βs ( t)− βc ( t) Figure 4.7: Plots of β̂s − β̂c, the difference between the selection β̂ and the control β̂, and the standard errors of the difference computed from the Hessian matrix (solid) and from the bootstrap (dash-dot) for both genders. 94 Bibliography [1] Bronikowski, A. M., Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2006) The evolution of aging and age-related physical decline in mice selectively bred for high voluntary exercise. Evolution 60, 1494-1508. [2] Dewsbury, D. A. (1980) Wheel-running behavior in 12 species of muroid rodents. Behav. Process 6, 271280. [3] Estabrook, G. F., Smith, G. R. and Dowling, T. E. (2007) Body mass and temperature influence rates of mitochondrial DNA evolution in North American Cyprinid fish. Evolution 61, 1176-1187. [4] Garland, T., Jr. (1983) Scaling the ecological cost of transport to body mass in terrestrial mammals. Am. Nat 121, 571-587. [5] Garland, T., Jr. (2003) Selection experiments: an under-utilized tool in biomechanics and organismal biology. Pages 23-56 in Bels, V. L., Gasc, J. P., Casinos, A. (2003) Vertebrate biomechanics and evolution. BIOS Scientific Publishers, Oxford, U.K. [6] Garland, T., Jr and Carter, P. A. (1994) Evolutionary physiology. Annu. Rev. Physiol 56, 579621. [7] Gillooly, J. F., Allen, A. P., West, G. B. and Brown, J.H. (2005) The rate of DNA evolution: effects of body size and temperature on the molecular clock. PNAS, 102, 140-145. [8] Gomulkiewicz, R. and Kirkpatrick, M. (1992) Quantitative genetics and the evolution of reaction norms. Evolution 46, 390-411. 95 Bibliography [9] Gomulkiewicz, R. and Beder, J. H. (1996) The selection gradient of an infinite-dimensional trait. SIAM Journal of Applied Mathematics 56, 509-523. [10] Griswold, C. K., Gomulkiewicz, R. and Heckman, N. (2007) Statistical power and hypothesis testing with functional data. manuscript. [11] Heckman, N. and Wang, W. (2007) Linear mixed models for measure- ment error in functional regression. manuscript, Department of Statis- tics, University of British Columbia. [12] Hone, D. W. E. and Benton, B. J. (2005) The evolution of large size: how does Cope’s Rule work? Trends in Ecology and Evolution 20, 1, 4-6. [13] Kingsolver, J.G., R.S. Gomulkiewicz, and Carter, P. A. (2001) Varia- tion, selection, and evolution of function-valued traits. Genetica 112, 87-104. [14] Kirkpatrick, M. and Heckman, N. (1989) A quantitative-genetic model for growth, shape, reaction norms, and other infinite-dimensional char- acters. Journal of Mathematical Biology 27, 429-450. [15] Koteja, P., Carter, P. A. and Swallow, J. G. and Garland, T., Jr. (2003) Food wasting in house mice: variation among individuals, families, and genetic lines. Physiology & Behavior 80, 375-383. [16] Meyer, K. (2005) Advances in methodology for random regression anal- yses. Australian Journal of Experimental Agriculture 45, 847-859. [17] Morgan, T. J., Garland, T., Jr. and Carter, P. A. (2003) Ontogenies in mice selected for high voluntary wheel-running activity. I. Mean onto- genies. Evolution 57, 646-657. [18] Sogard, S. M. (1997) Size-selective mortality in the juvenile stage of teleost fishes: A review. Bulletin of Marine Science 60, 1129-1157 96 Bibliography [19] Swallow, J. G., Carter, P. A. and Garland, T., Jr. (1998) Artificial selection for increased wheel-running behavior in house mice. Behavior Genetics 28, 227-237. [20] Swallow, J. G., Koteja, P. and Carter, P. A. and Garland, T., Jr. (1999) Artificial selection for increased wheel-running activity in house mice results in decreased body mass at maturity. Journal of Experimental Biology 202, 2513-2520. 97 Chapter 5 Conclusions and future plan In the thesis, we study identifiability of linear mixed effects models in Chap- ter 2. Particularly in Section 2.5, we give necessary and sufficient conditions for identifiability of a model used in functional data analysis. The model is our study object in Chapter 3. There we show how to use the ECME al- gorithm to estimate the model parameters, develop statistics for hypothesis testing of the functional coefficient β and derive expressions of the resid- uals to check for outliers or influential points. We also apply the model to analyze a biological data set in Section 3.6. Chapter 4 provides a de- tailed background of the biological experiment, a description of the data, and biological interpretations of our analysis. In this chapter, we give possible directions for future research. In Sec- tion 5.1, we adjust our Chapter 3 to account for correlated individuals. In Section 5.2, we consider generalizing our Chapter 3 model from a scalar response to a functional response. 5.1 Models with correlated individuals In Chapter 4, we apply our model (3.4) and (3.5) to analyze a biological data set. Specifically, the observed ith mouse’s log body mass, zi, is modelled as zi = µ+ Axi + i (5.1) and the ith mouse’s average wheel running, Yi, is modelled as Yi = β0 + β ′Txi + ei. (5.2) 98 Chapter 5. Conclusions and future plan We assume the random effects, the xi’s, are independent across individuals. However, in the biological experiment, mice may come from the same fam- ily and thus traits may be genetically correlated. This fact would lead to correlations among the random effects, the xi’s. Clearly, our independence model assumption is then violated and we should take the dependence of individuals into consideration. In this section, we show how to handle this family correlation more directly, by adjusting our model to account for this dependence. In the experiment, two mice came from the same family. Let the sub- script i index the family and the subscript k = 1 or 2 differentiate the two family members. Models of the ikth mouse’s log body mass and average wheel running are respectively zik = µ+ Axik + ik, (5.3) Yik = β0 + β ′Txik + eik, k = 1, 2. (5.4) We still assume that Cov(xi1) = Σx = Cov(xi2) but, in the ith family, xi1 and xi2 are correlated as Cov(xi1,xi2) = αΣx, (5.5) where α is a known constant. The constant, α, comes from genetic theory (see, for instance, Heckman, 2003). We still assume the measurement errors are independent of the random effects and are independent across individuals as ik i.i.d.∼ N(0, σ2 R), eik i.i.d.∼ N(0, σ2), (x11,x12), (x21,x22), . . . , (xN1,xN2), 11, 12, . . . , N1, N2, e11, e12, . . . , eN1, eN2 are mutually independent. The model (5.3) and (5.4) now includes the dependence between the two family members. In the following, we give another expression of the model, dropping the subscript k. 99 Chapter 5. Conclusions and future plan We first group each family’s information together. Let zi = ( zi1 zi2 ) , yi = ( Yi1 Yi2 ) , xi = ( xi1 xi2 ) , i = ( i1 i2 ) , and ei = ( ei1 ei2 ) . To write a model for the ith family, let µ̃ = ( µ µ ) , β̃0 = ( β0 β0 ) , Ã = ( A 0 0 A ) , and β̃′T = ( β′T 0 0 β′T ) . From (5.3) and (5.4), our model for the ith family is zi = µ̃+ Ãxi + i, (5.6) yi = β̃0 + β̃ ′Txi + ei. (5.7) To derive the covariances of the (z′i,y ′ i), we let ΣAx = AΣxA ′, ΣA,β = AΣxT ′β and Σβ,x = β ′TΣxT ′β. From (5.5), it is not hard to see the following holds, Cov(zi) = ( ΣAx + Σ αΣAx αΣAx ΣAx + Σ ) , Cov(zi,yi) = ( ΣA,β αΣA,β αΣA,β ΣA,β ) , Cov(yi) = ( Σβ,x + σ 2 αΣβ,x αΣβ,x Σβ,x + σ 2 ) . Model (5.6) and (5.7) uses individuals’ family information together and the correlation between family members is reflected in the model’s covari- ance structure. To estimate the model parameters, we can use the ECME algorithm. But we may need to modify the algorithm as the model covari- ance structure is not in a simple form. 100 Chapter 5. Conclusions and future plan 5.2 Proposing models with a functional response In Chapter 3, we applied our model to analyze the mouse wheel running data set. In the experiment, the wheel running distances were observed every week for 60 consecutive weeks. We took the averaged wheel running distance as a scalar response, averaged to reduce nuisance effects in the wheel running. However, we would like to consider the entire wheel-running trajectory as a response and to study its dependence on the log body mass trajectory. In this section, we introduce a model which generalizes a scalar response to a functional response. We give an outline of the plan to study the model. Finally, we give a literature review of current work on this functional response and predictor model. 5.2.1 Models of a functional response Recall our modelling of the scalar response Yi is Yi = β0 + ∫ b a β(s) [Zi(s)− E(Zi(s))] ds+ ei. Now we have a functional response Yi indexed by t. Replacing the univariate β(·) function with a bivariate β(·, ·), we consider generalizing the scalar response model to accomodate the functional response as Yi(t) = α(t) + ∫ b a β(t, s) [ Zi(s)− E(Zi(s)) ] ds+ ei(t). (5.8) We consider a tensor-product expansion of β(·, ·) using basis functions, ϕ1, . . ., ϕJ and ψ1, . . ., ψL, as β(t, s) = J∑ j=1 L∑ l=1 bjlϕj(t)ψl(s) ≡ ϕ(t)′Bψ(s), B[j, l] = bjl. (5.9) The modelling of the predictor Zi remains the same as Zi(s) = µ(s) + K∑ k=1 xikφk(s) ≡ µ(s) + φ(s)′xi, 101 Chapter 5. Conclusions and future plan µ smooth, xi ≡ (xi1, . . . , xiK)′ i.i.d.∼ N(0,Σx). Thus we can write Yi(t) = α(t) +ϕ(t) ′BTxi + ei(t), Tlk = ∫ b a ψl(s) φk(s) ds ≡ ∫ ψ(s)φ(s)′ds. (5.10) Both the Zi and Yi are observed at a sequence of discrete time points. Typically, we assume the observed Zi are perturbed with measurement er- rors. Let ij represent the measurement error of Zi at time sij, j = 1, . . . , ni. Let zij be the contaminated observed value as zij = Zi(sij) + ij ≡ µ(sij) + φ(sij)′xi + ij . (5.11) Let µ = (µ(si1), . . . , µ(sini)) ′, i = (i1, . . . , ini) ′ represent errors and zi = (zi1, . . . , zini) ′ be the observed values. Write zi = µ+ Axi + i, Ajk = φk(sj), j = 1, . . . , ni, k = 1, . . . ,K, i ∼ N(0,Σi), x1, . . . ,xn, 1, . . . , n are mutually independent. (5.12) Suppose the process Yi is observed at a sequence of discrete time points (t1, . . . , tmi). Let α = (α(t1), . . . , α(tmi)) ′, ei = (ei(t1), . . . , ei(tmi)) ′ repre- sent errors and yi = (yi(t1), . . . , yi(tmi)) ′ ≡ (yi1, . . . , yimi)′ be the observed values. We then write yi = α+ ΦBTxi + ei, Φij = ϕj(ti), ei independent of both xi and i, ei ∼ N(0,Σei), x1, . . . ,xn, 1, . . . , n, e1, . . . , en are mutually independent. (5.13) 102 Chapter 5. Conclusions and future plan The data available are the observed yi’s and the zi’s and the unknown parameter of most interest is B. We then use (5.12) and (5.13) as our model of the functional response and the functional predictor for data analysis. 5.2.2 Outline of the model study We initiate our plan to study the model (5.12) and (5.13) in this section. The steps basically follow those in Chapter 3: model identifiability, parameter estimation, hypothesis testing and model diagnostics. Throughout, the basis functions are assumed known and fixed. We give a discussion of the choice of basis functions at the end. In Section 2.5 we gave conditions of model identifiability for the scalar response model. We can use similar methods to study identifiability of this functional response model with covariance parameters now {Σx,Σi,Σei , i = 1, . . . , N}. New results will be needed to guarantee the model identifiability with restrictions on the Σi’s and the Σei ’s. After identifying the model parameters, we can proceed to estimate the unknown parameters using an ECME algorithm assuming the basis functions known and fixed. The parameter of most interest is the matrix B in (5.9). After obtaining the estimate, similar approaches using the Hessian matrix or bootstrap can be applied to get the standard errors of B̂. For hypothesis testing of β, it may be of interest to know if, at certain fixed t point, β(t, ·) is zero or not. So the statistics developed in Section 4.4 can also be used to test the hypothesis Ho : β(t, s) = 0 for all s ∈ [a, b] when t is fixed. After fitting the model, residuals of the fit of yi can be obtained based on the BLUP of the random effects xi. However, for each individual, the fitted value and the residual of yi now are vectors, not scalars. Appropriate methods for residual analysis need to be proposed. The ϕj ’s and ψl’s basis functions for β and the φk’s for Zi can be chosen using an eigenanalysis approach. As in the scalar response case, we apply an eigenanalysis to the observed zij’s, and take φ1, . . . , φK as the first K estimated eigenfunctions. We take L = K and ψk = φk to determine one set of the basis functions of β(·, ·) in (5.9), the set of basis functions that 103 Chapter 5. Conclusions and future plan appear in the integral T in (5.10). The other set of basis functions, the ϕj ’s, can be chosen as the estimated eigenfunctions of the Yi process. An alternative to the use of a tensor-product expansion of β is to consider using bivariate basis function expansions. Let the ζl’s be bivariate basis functions. We can approximate β as β(s, t) = L∑ l=1 βlζl(s, t) ≡ ζ(s, t)′β. But we also need to consider what basis functions to choose. 5.2.3 Literature review There are two research results closest to our work on the functional response and functional predictor model. Ramsay and Silverman (2005, chapter 16) study a slight modification of model (5.8) yi(t) = α(t) + ∫ β(t, s) Zi(s) ds+ ei(t). Besides using the tensor-product expansion of β as in (5.9), the authors expand α using the ϕj ’s as α(t) = J1∑ j=1 αlϕj(t) ≡ ϕ(t)′α. Assuming the Zi’s are observed without error, the authors estimate α and B by minimizing the integrated residual sum of squares, which is N∑ i=1 ∫ [ yi(t)−ϕ(t)′α− ∫ Zi(s)ψ(s) ′Bϕ(t)ds ]2 dt. Yao, Müller and Wang (2005) consider (5.8) with the Zi process observed with error. They use φk’s equal to the eigenfunctions of the Zi covariance function in (5.11). The random effects xi has a diagonal covariance matrix with elements σ2k’s being the eigenvalues of the Zi covariance function. The errors ij are assumed to be i.i.d. with mean zero and variance σ 2 . 104 Chapter 5. Conclusions and future plan Yao et al. carry out an eigenanalysis of the Yi’s covariance function, with resulting eigenfunctions equal to the ψl’s and corresponding eigenval- ues equal to the σ2l ’s. They expand the observed yik, contaminated with measurement errors, as yik = µY (tik) + L∑ l=1 uilψl(tik) + eik, µY smooth, Euil = 0, Eu 2 il = σ 2 l , Euiluil′ = 0, l 6= l′. eik i.i.d.∼ (0, σ2e ), independent of the u′ils. (5.14) Let C(s, t) denote the two-dimensional scatterplot smoothed cross-covariance surface Cov(Z(s), Y (t)) of the zij ’s and yik’s. The authors estimate β as β̂(s, t) = K∑ k=1 L∑ l=1 σkl σ2k φk(s)ψl(t), where σkl is calculated as σkl = ∫ ∫ φk(s) C(s, t) ψl(t) ds dt. Similarities between our modelling and Yao et al’s are that the coefficient vector xi in the expansion of the Zi is random and the Zi’s are observed with measurement error. However, Yao et al’s xi has a diagonal covariance structure while our covariance matrix of the xi, Σx, can be more general. We use both the zi’s and the yi’s information to estimate Σx and other model parameters. In Yao et al., variance elements σ2k’s of the xi and variance of the error ij , σ 2 , are estimated completely by the zi’s. Similarly, the σ 2 l ’s of the ui and σ 2 e of the eik are estimated completely using the yi’s. 105 Bibliography [1] Heckman, N. (2003) Functional data analysis in evolutionary biology. Recent Advances and Trends in Nonparametric Statistics, 49-60, Akri- tas, M. G. and Politis, D. N. (editors). [2] Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. 2nd edition, Springer Series in Statistics. [3] Yao, F., Müller, H.G., Wang, J.L. (2005) Functional linear regression analysis for longitudinal data. Annals of Statistics 33, 6, 2873-2903. 106 Appendix A Appendix to Chapter 2 A.1 Proof of Corollary 2.4.1 To prove it, we use the following result from Graybill, 1983, p.206. Lemma A.1.1 Given two scalars a and b, the characteristic equation of the matrix C = (a− b)I + bJ in λ is (a+ (n− 1)b− λ) (a− b− λ)n−1, and hence n− 1 characteristic roots are equal to a− b and one root is equal to a+ (n− 1)b. Proof : To prove the corollary, we use Theorem 2.4.1 and a proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, HZJ = J. Fix Σ ∈ Θ̃. Let s > 1, σ∗2 = sσ2 and ρ∗ = (ρ − 1)/s + 1. It is not hard to check −1/(n − 1) < ρ∗ < 1. De- fine Σ ∗ = σ∗2 [(1− ρ∗)I + ρ∗J]. Then Σ − Σ∗ = (σ2 − σ∗2)J and, since HZJ = J, it is clear that (2.3) is satisfied. We now show that, for any Σu ∈ Θu, there exists s∗ > 1 so that Σ∗u defined as in (2.4) is positive defi- nite whenever 1 < s < s∗. This will show that the model is not identifiable, which contradicts our assumption. Plugging Σ − Σ∗ = (σ2 − σ∗2)J into (2.4) yields Σ∗u = Σu + σ 2(1− s)(Z′Z)−1Z′JZ(Z′Z)−1. (A.1) By assumption 1′Z 6= 0 and Z is of full column rank, the matrix (Z′Z)−1Z′JZ(Z′Z)−1 is non-negative definite and of rank one since J = 11′. Let λ be its non- zero and thus the largest eigenvalue of (Z′Z)−1Z′JZ(Z′Z)−1. Let λm be the 107 Appendix A. Appendix to Chapter 2 smallest eigenvalue of the matrix Σu, and let s ∗ = λm/(λσ 2) + 1. For any x ∈ <q, x 6= 0, x′Σ∗ux > 0 by the following argument. x′Σ∗ux = x ′Σux + σ 2(1− s)x′(Z′Z)−1Z′JZ(Z′Z)−1x ≥ λmx′x + σ2(1− s)λx′x > 0 Now suppose that HZJ 6= J and suppose, by contradiction, that the model is not identifiable. Then, by Theorem 2.4.1, there exist nonidentical Σ and Σ ∗ satisfying (2.3) and, since the rank of HZ is q, the rank of Σ −Σ∗ is at most q. We have Σ −Σ∗ = [ (σ2 − σ∗2)− (σ2ρ− σ∗2ρ∗) ] I + (σ2ρ− σ∗2ρ∗)J. By Lemma A.1.1, the eigenvalues of Σ−Σ∗ are (σ2−σ∗2)− (σ2ρ−σ∗2ρ∗), which is of multiplicity n − 1 and (σ2 − σ∗2) + (n − 1)(σ2ρ − σ∗2ρ∗), of multiplicity 1. Since Σ −Σ∗ is not the zero matrix, all of the eigenvalues cannot be equal to 0: we must either have no eigenvalues equal to 0, one eigenvalue equal to 0, or n − 1 eigenvalues equal to 0. In order to have rank(Σ − Σ∗) ≤ q, the n − 1 multiple eigenvalues have to be zero since 1 ≤ q < n − 1 by assumption. That is, σ2 − σ∗2 = σ2ρ − σ∗2ρ∗ and so Σ − Σ∗ = (σ2 − σ∗2)J. But plugging this into (2.3) yields HZJ = J, contradicting our assumption. 2 A.2 Proof of Corollary 2.4.2 Proof : We first note a fact about the matrix HZ. Since HZ is symmetric and idempotent, HZ[k, k] = ∑ l (HZ[k, l]) 2 = (HZ[k, k]) 2 + ∑ l 6=k (HZ[k, l]) 2 . 108 Appendix A. Appendix to Chapter 2 Thus, if HZ[k, k] = 1, then HZ[k, i] = HZ[i, k] = 0 for all i 6= k. To prove the corollary, we use Theorem 2.4.1 and a proof by contradic- tion. First suppose that the model is identifiable and suppose, by way of contradiction, a diagonal element of HZ is equal to 1. Without loss of generality, we assume HZ[1, 1] = 1. Then by our observation, HZ[1, i] = HZ[i, 1] = 0 for all i 6= 1. Fix Σ = diag{σ21 , . . . , σ2n} ∈ Θ̃. Let σ∗21 satisfy 0 < σ∗1 2 < σ21 and define Σ ∗ = diag{σ∗12, σ22 , . . . , σ2n}. Then Σ − Σ∗ = diag{σ21−σ∗12, 0, . . . , 0}. It is not hard to check that (2.3) is satisfied. Clearly, for any Σu ∈ Θu, Σ∗u defined as in (2.4) is also in Θu. Thus, the model is not identifiable, which contradicts our assumption. Now suppose that no diagonal element of HZ is equal to one and sup- pose, by contradiction, that the model is not identifiable. Then there exists nonidentical diagonal matrices, Σ and Σ ∗, satisfying (2.3). As Σ 6= Σ∗, at least one of the diagonal elements of Σ −Σ∗ is not zero. Suppose the k-th diagonal element is not zero. By (2.3), the k-th diagonal element of HZ must be one, contradicting our assumption. 2 109 Appendix B Appendix to Chapter 3 In this appendix, we provide the calculations in Sections 3.3.2 and 3.3.4 where we find the updates of Σx and {β, σ2} in the ECME procedure. In Section B.4, we derive the first order condition (3.10) and the Hessian matrix (3.14) of Section 3.3.2 where we maximize the log-likelihood ΛN over Σx while holding the other parameters fixed.. In Section B.5, we derive the first order conditions (3.22) and (3.23), and the Hessian matrix (3.26) of Sec- tion 3.3.4 where we maximize Λ̃N over {β, σ2} holding the other parameters fixed. We use the tool of matrix differential calculus, calculating first differen- tials to obtain the first order conditions and second differentials to obtain the Hessian matrices. The book by Magnus and Neudecker (1988) gives an elegant description on this subject. In Sections B.1-B.3, we follow the book to introduce some definitions and provide some background, mainly from Part Two of the book. We keep the same notation as in the book. Throughout this section, chapters and page numbers all refer to (Magnus and Neudecker, 1988). B.1 Definition of the first differential We first give the definition of the first differential for a vector function (a vector valued function with a vector argument). We show that the function’s first differential is connected with its Jacobian matrix. We then give an extension of the definition to a matrix function (a matrix valued function with a matrix argument) and show how to identify the Jacobian matrix from the first differential. 110 Appendix B. Appendix to Chapter 3 Definition B.1.1 Let f : S → <m be a function defined on a set S in <n. Let c be an interior point of S, and let B(c; r) be an n-ball lying in S centred at c of radius r. If there exists a real m× n matrix A, depending on c but not on u, such that f(c + u) = f(c) + A(c)u + rc(u) for all u ∈ <n with ||u|| < r and lim u→0 rc(u) ||u|| = 0, then the function f is said to be differentiable at c; the m× n matrix A(c) is then called the first derivative of f at c, and the m× 1 vector df(c;u) = A(c)u, which is a linear function of u, is called the first differential of f at c (with increment u). If f is differentiable at every point of an open subset E of S, we say f is differentiable on E. After calculating the first differential, we identify the Jacobian matrix as follows. Let Df be the m × n Jacobian matrix of f whose ijth element is Djfi: the partial derivative of the ith component function fi of f with respect to the jth coordinate. The First Identification Theorem (p.87) states that the first derivative A(c) is Df(c) when f is differentiable at c. To extend the definition of a vector function to a matrix function with a matrix argument is straightforward using the vec operator. The vec operator transforms a matrix into a vector by stacking the columns of the matrix one underneath the other. Recall the norm of a real matrix X is defined by ||X|| = (trX′X)1/2. 111 Appendix B. Appendix to Chapter 3 Let <n×q contains all the real n × q matrices. Define a ball B(C; r) with center C and radius r in <n×q by B(C; r) = {X : X ∈ <n×q, ||X−C|| < r}. Let F : S → <m×p be a matrix function defined on a set S in <n×q. That is, F maps an n × q matrix into an m × p matrix F(X). We consider the the vector function f : vec S → <m×p defined by f(vec X) = vec F(X) and the following gives the definition of the first differential of F. Definition B.1.2 Let F : S → <m×p be a matrix function defined on a set S in <n×q. Let C be an interior point of S and let B(C; r) ⊂ S be a ball with center C and radius r. If there exists RC and a real (mp)×(nq) matrix A, depending on C but not on U, such that vecF(C + U) = vecF(C) + A(C)vecU + vecRC(U) for all U ∈ <n×q with ||U|| < r and lim U→0 RC(U) ||U|| = 0, then the function F is said to be differentiable at C. Let dF(C;U) = A(C)vecU. Although this is a vector of length (mp), it can be formed into a matrix of dimension m× p, in the usual natural way. This m× p matrix dF(C;U) is called the first differential of F at C with increment U and the (mp)× (nq) matrix A(C) is called the first derivative of F at C. From the definition, it is clear that the differential of F and f are related by vec dF(C;U) = df(vecC; vecU). 112 Appendix B. Appendix to Chapter 3 The Jacobian matrix of F at C is defined as DF(C) = Df(vecC). This is an (mp)× (nq) matrix, whose ijth element is the partial derivative of the ith component of vecF(X) with respect to the jth element of vecX, evaluated at X = C. The First Identification Theorem for matrix functions (p.96) states that if F is differentiable at C, then vec dF(C;U) = DF(C)vecU. So we can calculate the differential of F to identify its Jacobian matrix. B.2 Definition of the second differential We first introduce the definition of twice differentiable on which the defini- tion second differential is based. The definitions are restricted to real valued functions as in our calculations we only need to consider second differentials of real valued functions. Then we connect the Hessian matrix with the sec- ond differential. As in the first differential case, we give an extension of the definition to a real valued function with a matrix argument and also show how to identify the Hessian matrix from the second differential. Definition B.2.1 Let f : S → < be a real valued function defined on a set S in <n, and let c be an interior point of S. If f is differentiable in some n-ball B(c) and each of the partial derivatives Djf is differentiable at c, then we say that f is twice differentiable at c. If f is twice differentiable at every point of an open subset E of S, we say f is twice differentiable on E. The following is the definition of the second differential. Definition B.2.2 Let f : S → < be twice differentiable at an interior point c of S ⊂ <n. Let B(c) be an n-ball lying in S such that f is differentiable at every point in B(c), and let g : B(c) → < be defined by the equation g(x) = df(x;u). 113 Appendix B. Appendix to Chapter 3 Then the differential of g at c with increment u, i.e. dg(c;u), is called the second differential of f at c (with increment u), and is denoted by d2f(c;u). To calculate the second differential of f , by the definition, we just need to calculate the differential of the first differential of f , i.e. d2f = d(df). We have seen that the Jacobian matrix can be identified from the first differential. Similarly, we can identify the Hessian matrix from the second differential. The Second Identification Theorem (p.107) states that if f is twice differentiable at c, then d2f(c;u) = u′ (Hf(c))u, where Hf(c) is the n × n symmetric Hessian matrix of f at c with (i, j) entry equal to ∂2f(c)/(∂ci∂cj). Therefore, once we have calculated the second differential, the Hessian matrix is obtainable. Similarly as in Section B.1, the extension of the second differential from vector functions to matrix functions is straightforward using the vec opera- tor. As we only consider real valued functions, we restrict the extension to a real valued function with a matrix argument. We follow the notation in the definition of the first differential of matrix functions. Let the domain of f be S ⊆ <n×q and let C be an interior point of S. Let B(C; r) ⊂ S be a ball with center C and radius r and let U be a point in <n×q with ||U|| < r, so that C + U ∈ B(C; r). The second differential of f at C is then defined as d2f(C;U) = d2f(vecC; vecU). The Second Identification Theorem for matrix functions (p.115) says if f is twice differentiable at C, then d2f(C;U) = (vecU)′ Hf(C) vecU, 114 Appendix B. Appendix to Chapter 3 where Hf(C) is the nq×nq symmetric Hessian matrix of f at C defined as Hf(C) ≡Hf(vecC). That is, the ijth element of Hf(C) is the second order derivative of f with respect to the ith and jth element of vecX where X ∈ S, evaluated at C. B.3 Matrix algebraic and differential rules In this section, we list the matrix algebraic and differential rules (chap.8) which will be used without specific reference in our derivations. In the following, we let A, B, C and D denote constant matrices, u denote a vector function and U and V denote matrix functions. We let ⊗ stand for the Kronecker product. The rules are the following. • tr(AB) = tr(BA), provided AB is square. • tr(A′B) = (vecA)′vecB. • tr(ABCD) = (vecD)′(A ⊗ C′)(vecB′), provided ABCD is defined and square. • dA = 0. • dAU = AdU. • d(U + V) = dU + dV. • d(UV) = (dU)V + U(dV). • d(U′) = (dU)′. • d (ln detU) = trU−1(dU). • d (U−1) = −U−1(dU)U−1. • d (trU) = tr (dU). • d(vecU) = vec (dU). • d(u′Au) = u′(A + A′)du = (du)′(A + A′)u. 115 Appendix B. Appendix to Chapter 3 B.4 Calculations in Section 3.3.2 Recall the observed data log-likehood has the expression ΛN = −N 2 ln det ( CΣxC ′ + Σd )−1 2 N∑ i=1 (Wi−µW )′ ( CΣxC ′ + Σd )−1 (Wi−µW ). We want to maximize it over Σx while holding the other parameters fixed. In this section, holding the other parameters fixed, we calculate the first differential of ΛN to obtain the first order condition (3.10) and calculate the second differential to obtain the Hessian matrix in (3.14). As we treat Σx as the only unknown parameter, it immediately follows from the expression of ΣW in (3.9) dΣW = CdΣxC ′. (B.1) In our derivation, we will use the shorter notation dΣW before we reach (3.10) or (3.14). We have dΛN = −N 2 d (lnΣW )− 1 2 N∑ i=1 (Wi − µW )′ ( d Σ−1W ) (Wi −µW ) = −N 2 tr [ Σ−1W dΣW ] + 1 2 N∑ i=1 (Wi − µW )′Σ−1W (d ΣW )Σ−1W (Wi −µW ) = −N 2 tr [ Σ−1W dΣW ] + 1 2 tr [ Σ−1W N∑ i=1 (Wi − µW )(Wi − µW )′Σ−1W (d ΣW ) ] = −N 2 tr [ Σ−1W (ΣW − SW )Σ−1W (dΣW ) ] (B.2) Recalling (B.1), we now have dΛN = −N 2 tr [ C′Σ−1W (ΣW − SW )Σ−1W CdΣx ] . By the First Identification Theorem for matrix functions mentioned in Sec- 116 Appendix B. Appendix to Chapter 3 tion B.1, we obtain the Jacobian matrix of ΛN at Σx as D ΛN (Σx) = vec { C′Σ−1W (ΣW − SW )Σ−1W C } . Equating it to zero yields C′Σ−1W (ΣW − SW )Σ−1W C = 0 which is equivalent to the first order condition (3.10). Next we calculate d2ΛN to identify the Hessian matrix in (3.14). Starting from (B.2), we have d2ΛN = −N 2 tr [ (dΣ−1W )(ΣW − SW )Σ−1W (dΣW ) ] − N 2 tr [ Σ−1W d(ΣW − SW )Σ−1W (dΣW ) ] −N 2 tr [ Σ−1W (ΣW − SW )dΣ−1W (dΣW ) ] = N 2 tr [ Σ−1W (dΣW )Σ −1 W (ΣW − SW )Σ−1W (dΣW ) ] − N 2 tr [ Σ−1W (dΣW )Σ −1 W (dΣW ) ] + N 2 tr [ Σ−1W (ΣW − SW )Σ−1W (dΣW )Σ−1W (dΣW ) ] . The first term and the last term at the right hand side are equal, and so they can be combined into one term Ntr [ Σ−1W (dΣW )Σ −1 W (ΣW − SW )Σ−1W (dΣW ) ] . Then d2ΛN = Ntr [ Σ−1W (dΣW )Σ −1 W ( 1 2 ΣW − SW )Σ−1W (dΣW ) ] Recall (B.1). Right hand side of the above = Ntr [ Σ−1W CdΣxC ′Σ−1W ( 1 2 ΣW − SW )Σ−1W CdΣxC′ ] = Ntr [ C′Σ−1W CdΣxC ′Σ−1W ( 1 2 ΣW − SW )Σ−1W CdΣx ] = N(vec dΣx) ′ [ C′Σ−1W C⊗C′Σ−1W ( 1 2 ΣW − SW )Σ−1W C ] vec(dΣx). 117 Appendix B. Appendix to Chapter 3 When evaluated at the critical point Σ̂x which satisfies the first order con- dition (3.10), d2ΛN is then −N 2 (vec dΣx) ′ [ C′Σ̂ −1 W C⊗C′Σ̂ −1 W C ] vec(dΣx). By the Second Identification Theorem for matrix functions mentioned in Section B.2, we have at Σ̂x the Hessian matrix is equal to H(Σ̂x) = −(N/2) ( D̂⊗ D̂ ) , where D̂ = C′Σ̂ −1 W C. This is the matrix we saw in (3.14). B.5 Calculations in Section 3.3.4 Recall we want to maximize the log-likelihood Λ̃N = −N 2 ln(β′Kβ + σ2)− 1 2(β′Kβ + σ2) N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 over {β, σ2} to find the update while fixing the other parameters. In this section, we derive the first order conditions (3.22) and (3.23) via calculating the first differential of Λ̃N with respect to {β, σ2}. Calculating the second differential of Λ̃N then gives us the Hessian matrix (3.26). The following two differentials will facilitate our calculation in dΛ̃N and d2Λ̃N . Recall the expression of σ 2 Y |z in (3.19). We have dσ2Y |z ≡ d ( β′Kβ + σ2 ) = 2β′Kdβ + dσ2. (B.3) Let g(β) = N∑ i=1 ( Yi − β0 − β′G(zi − µ) ) (zi − µ)′G′. (B.4) We then obtain d [ N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2] = −2 N∑ i=1 ( Yi − β0 − β′G(zi − µ) ) (dβ)′G(zi − µ) 118 Appendix B. Appendix to Chapter 3 = −2 N∑ i=1 ( Yi − β0 − β′G(zi − µ) ) (zi − µ)′G′(dβ) = −2g(β)dβ (B.5) To calculate dΛ̃N , we use the terms σ 2 Y |z and β ′Kβ+σ2 interchangeably. dΛ̃N = −N 2 d [ ln(β′Kβ + σ2) ] − d [ 1 2(β′Kβ + σ2) ] N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 − 1 2(β′Kβ + σ2) d [ N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2] By (B.3) and (B.5), the right hand side above is equal to −N 2 2β′Kdβ + dσ2 σ2Y |z + 2β′Kdβ + dσ2 2σ4Y |z N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 + 1 σ2Y |z g(β)dβ. Let c(β, σ2) = [ N∑ i=1 ( Yi − β0 − β′G(zi − µ) )2 −Nσ2Y |z ] . (B.6) Then dΛ̃N is equal to dσ2 2σ4Y |z c(β, σ2) + 1 σ4Y |z [ β′K c(β, σ2) + σ2Y |zg(β) ] dβ. (B.7) By the First Identification Theorem mentioned in Section B.1, we obtain the first order conditions 1 σ4Y |z [ β′K c(β, σ2) + σ2Y |zg(β) ] = 0, 1 2σ4Y |z c(β, σ2) = 0 which lead to (3.22) and (3.23). Calculating d2Λ̃N to identify the Hessian matrix is lengthy and tedious. In fact, we don’t need the closed form of the Hessian matrix but the Hes- 119 Appendix B. Appendix to Chapter 3 sian matrix evaluated at the critical points {β̂, σ̂2} given in (3.26). So in our derivation, we will make use of the first order conditions to simplify calculation. We notice, equivalently, the critical points {β̂, σ̂2} only need to satisfy g(β̂) = 0 (B.8) c(β̂, σ̂2) = 0. (B.9) From (B.6), using (B.3) and (B.5) we have dc(β, σ2) = −2 N∑ i=1 ( Yi − β0 − β′G(zi − µ) ) (zi − µ)′G′dβ − 2Nβ′Kdβ −Ndσ2 = −2g(β)dβ − 2Nβ′Kdβ −Ndσ2, which is a function of β. By (B.8), dc(β̂, σ2) = −2N(dβ)′Kβ −Ndσ2. (B.10) Now we calculate d2Λ̃N starting from (B.7). We first calculate d [ dσ2 2σ4Y |z c(β, σ2) ] which is d ( 1 2σ4Y |z ) dσ2 c(β, σ2) + dσ2 2σ4Y |z d c(β, σ2). When at {β̂, σ̂2}, by (B.9) and (B.10), it is equal to −Ndσ 2 2σ̂4Y |z ( 2(dβ)′Kβ̂ + dσ2 ) = − N σ̂4Y |z ( dβ′ dσ2 )( Kβ̂ 1/2 ) dσ2. (B.11) Then we calculate d [ 1 σ4Y |z [ β′K c(β, σ2) + σ2Y |zg(β) ] dβ ] 120 Appendix B. Appendix to Chapter 3 which is d ( 1 σ4Y |z )[ β′K c(β, σ2) + σ2Y |zg(β) ] dβ+ 1 σ4Y |z d [ β′K c(β, σ2) + σ2Y |zg(β) ] dβ. (B.12) At {β̂, σ̂2}, the term in the first square brackets vanishes by (B.8) and (B.9). Thus, at {β̂, σ̂2}, (B.12) is equal to 1 σ̂4Y |z [ (dβ)′Kc(β̂, σ̂2) + β′Kdc(β̂, σ2) + dσ2Y |zg(β̂) + σ̂ 2 Y |zdg(β̂) ] dβ. From (B.4), we have dg(β) = −(dβ)′ N∑ i=1 G(zi − µ)(zi − µ)′G′. (B.13) Again by (B.8)-(B.10) and that dc(β̂, σ2) is a scalar, (B.12) is equal to 1 σ̂4Y |z [( −2N(dβ)′Kβ̂ −Ndσ2 ) β̂ ′ K− σ̂2Y |z(dβ)′ N∑ i=1 G(zi − µ)(zi − µ)′G′ ] dβ = − N σ̂4Y |z ( dβ′ dσ2 ) 2Kβ̂β̂′K + σ̂2Y |zN ∑Ni=1 G(zi − µ)(zi − µ)′G′ β̂ ′ K dβ (B.14) Combining (B.11) and (B.14), eventually we get, at (β̂, σ̂2), d2Λ̃N (β̂, σ̂ 2) = ( dβ′ dσ2 ) HΛ̃(β̂, σ̂2) ( dβ dσ2 ) , where HΛ̃(β̂, σ̂2) = − N σ̂4Y |z 2Kβ̂β̂′K + σ̂2Y |zN ∑Ni=1 G(zi − µ)(zi −µ)′G′ Kβ̂ β̂ ′ K 1/2 . By the Second Identification Theorem mentioned in Section B.2, H Λ̃(β̂, σ̂2) is the Hessian matrix and we have seen it in (3.26). 121
Thesis/Dissertation
2008-05
10.14288/1.0066202
eng
Statistics
Vancouver : University of British Columbia Library
University of British Columbia
Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
Graduate
functional regression
mixed models
Linear mixed effects models in functional data analysis
Text
http://hdl.handle.net/2429/253