To see that this model is not identifiable, use the same argument as in Example 2.2.1 and note that the constructed $\Sigma_u^*$ is in $\tilde\Theta_u$ if $\Sigma_u$ is in $\tilde\Theta_u$. In practice, one usually assumes a more specific structure for $\Sigma_\epsilon$, such as $\Sigma_\epsilon = \sigma_\epsilon^2 I$. Restrictions may lead to identifiability, and such restrictions and their effects on identifiability will be discussed in the next two sections.

2.3 Simple sufficient conditions of identifiability

In this section, we find sufficient conditions for identifiability of model (2.1), assuming $\tilde\Theta = \Theta$. A further examination of (2.2) gives us the following sufficient conditions.

Clearly, if $\Sigma_u$ is known, then $Z\Sigma_u Z'$ is known, and so $\Sigma_\epsilon$ is completely determined.

If $\Sigma_\epsilon$ is known, then the model is identifiable. To see this, consider (2.2) with $\Sigma_\epsilon = \Sigma_\epsilon^*$. It follows that $Z\Sigma_u Z' = Z\Sigma_u^* Z'$ and so $\Sigma_u = \Sigma_u^*$, since $Z$ is of full column rank.

If $Z\Sigma_u Z'\Sigma_\epsilon^{-1} = K$, where $K$ is known and $K + I$ is of full column rank, then the model is identifiable. Suppose, by way of contradiction, that the model is not identifiable. Then (2.2) holds for $(\Sigma_u, \Sigma_\epsilon) \neq (\Sigma_u^*, \Sigma_\epsilon^*)$, both in $\tilde\Theta$, with, by assumption, $Z\Sigma_u Z' = K\Sigma_\epsilon$ and $Z\Sigma_u^* Z' = K\Sigma_\epsilon^*$. Substituting these expressions into (2.2) yields $K\Sigma_\epsilon + \Sigma_\epsilon = K\Sigma_\epsilon^* + \Sigma_\epsilon^*$, that is, $(K + I)(\Sigma_\epsilon - \Sigma_\epsilon^*) = 0$. Since $K + I$ is of full rank, we must have $\Sigma_\epsilon = \Sigma_\epsilon^*$. But, as shown in the previous paragraph, this implies that $\Sigma_u = \Sigma_u^*$, a contradiction.
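As a quick numerical illustration of the second condition (a sketch with arbitrarily chosen matrices, not data from the thesis): when $Z$ has full column rank and $\Sigma_\epsilon$ is known, pre- and postmultiplying $V - \Sigma_\epsilon = Z\Sigma_u Z'$ by the left inverse of $Z$ recovers $\Sigma_u$ uniquely.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 6, 2

# Full-column-rank design and arbitrary positive definite Sigma_u, Sigma_eps.
Z = rng.standard_normal((n, q))
A = rng.standard_normal((q, q)); Sigma_u = A @ A.T + np.eye(q)
B = rng.standard_normal((n, n)); Sigma_eps = B @ B.T + np.eye(n)

V = Z @ Sigma_u @ Z.T + Sigma_eps          # marginal covariance of y

# With Sigma_eps known, pre-/post-multiplying by the left inverse of Z
# recovers Sigma_u from V - Sigma_eps = Z Sigma_u Z'.
ZtZ_inv = np.linalg.inv(Z.T @ Z)
Sigma_u_rec = ZtZ_inv @ Z.T @ (V - Sigma_eps) @ Z @ ZtZ_inv

print(np.allclose(Sigma_u_rec, Sigma_u))   # True: Sigma_u is uniquely determined
```

The same pre- and postmultiplication by $(Z'Z)^{-1}Z'$ is the algebraic step used repeatedly in this chapter.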
The last condition is similar to a common condition for identifiability in simple linear regression models with measurement error. That model assumes $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $\epsilon_i \sim (0, \sigma_\epsilon^2)$, where $x_i$ is observed with error having variance $\sigma_u^2$. The response $y_i$ then has variance $\sigma_u^2 + \sigma_\epsilon^2$. One of the common conditions for model identifiability is to assume that the ratio $\sigma_u^2/\sigma_\epsilon^2$ is known. The inverse $\Sigma_\epsilon^{-1}$ appearing in our last condition can be viewed as a multivariate version of the "denominator". If there are any supplementary data, we may then be able to find an estimate of $\Sigma_u$, $\Sigma_\epsilon$ or $K$, and we can treat this estimate as the true value. The sufficient conditions for identifiability can then be satisfied.

2.4 Sufficient conditions of identifiability for a structured $\Sigma_\epsilon$

As we observed in Examples 2.2.1 and 2.2.2, the model is not identifiable even if we restrict $\Sigma_u$ to be a scalar multiple of a known matrix. In this section, we study the effect of putting restrictions on $\Sigma_\epsilon$. In Theorem 2.4.1 below, we give a necessary and sufficient condition for nonidentifiability, a condition that relies mainly on the design matrix $Z$ via $H_Z = Z(Z'Z)^{-1}Z'$. The theorem leads to four corollaries: Corollaries 2.4.1 and 2.4.2 give necessary and sufficient conditions for identifiability when $\Sigma_\epsilon$ arises from an exchangeable covariance structure or is diagonal. Corollary 2.4.3 states an easily checked condition on $\Sigma_\epsilon$ that guarantees identifiability of the model. That corollary is then applied to two commonly used error structures. Using Corollary 2.4.4, we can generalize a known identifiability result, giving a shorter proof under weaker conditions.

Theorem 2.4.1 Let $\tilde\Theta \subseteq \Theta$ and define $H_Z = Z(Z'Z)^{-1}Z'$.
Then model (2.1) with parameter space $B \otimes \tilde\Theta$ is nonidentifiable if and only if there exist $(\Sigma_\epsilon, \Sigma_u) \in \tilde\Theta$ and $(\Sigma_\epsilon^*, \Sigma_u^*) \in \tilde\Theta$, with $\Sigma_\epsilon^* \neq \Sigma_\epsilon$, such that

$$H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*] = \Sigma_\epsilon - \Sigma_\epsilon^*, \quad (2.3)$$

and

$$\Sigma_u^* = \Sigma_u + (Z'Z)^{-1}Z'[\Sigma_\epsilon - \Sigma_\epsilon^*]Z(Z'Z)^{-1}. \quad (2.4)$$

Proof: Nonidentifiability of the model is equivalent to the existence of $(\Sigma_\epsilon, \Sigma_u)$ and $(\Sigma_\epsilon^*, \Sigma_u^*)$ in $\tilde\Theta$, not equal, satisfying (2.2). Note that this is equivalent to having $(\Sigma_\epsilon, \Sigma_u)$ and $(\Sigma_\epsilon^*, \Sigma_u^*)$ in $\tilde\Theta$ with $\Sigma_\epsilon^* \neq \Sigma_\epsilon$ satisfying

$$Z(\Sigma_u - \Sigma_u^*)Z' = \Sigma_\epsilon^* - \Sigma_\epsilon. \quad (2.5)$$

Suppose the model is nonidentifiable. We premultiply (2.5) by $Z'$, postmultiply it by $Z$, and then pre- and postmultiply by $(Z'Z)^{-1}$ to get

$$\Sigma_u - \Sigma_u^* = (Z'Z)^{-1}Z'[\Sigma_\epsilon^* - \Sigma_\epsilon]Z(Z'Z)^{-1}. \quad (2.6)$$

This gives (2.4). To derive (2.3), premultiply (2.6) by $Z$ and postmultiply (2.6) by $Z'$ to get

$$Z(\Sigma_u - \Sigma_u^*)Z' = H_Z[\Sigma_\epsilon^* - \Sigma_\epsilon]H_Z, \quad (2.7)$$

which, by (2.5), is the same as

$$\Sigma_\epsilon - \Sigma_\epsilon^* = H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*]H_Z. \quad (2.8)$$

Premultiplying (2.8) by the idempotent matrix $H_Z$ gives $H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*] = H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*]H_Z$. Substituting (2.8) into the right side of the above yields (2.3).

To prove the converse, we want to show that (2.3) and (2.4) lead to (2.5). It is clear from (2.4) that (2.7) holds.
If we can show that (2.8) holds then we are done, since substituting (2.8) into the right side of (2.7) yields (2.5). To show (2.8): from (2.3) and the symmetry of $\Sigma_\epsilon - \Sigma_\epsilon^*$, we see that $H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*] = [\Sigma_\epsilon - \Sigma_\epsilon^*]H_Z$. Premultiplying this identity by the idempotent matrix $H_Z$ gives $H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*] = H_Z[\Sigma_\epsilon - \Sigma_\epsilon^*]H_Z$. Substituting (2.3) for the left side of this equation, we see that (2.8) holds. $\Box$

The proofs of the next two corollaries are in Appendix A.

Corollary 2.4.1 Let $\mathbf{1}$ be the $n$-vector with each element equal to one. Suppose that the distribution of $\epsilon_1, \ldots, \epsilon_n$ is exchangeable, that is, the covariance matrix of $\epsilon$ is of the form $\sigma^2[(1-\rho)I + \rho J]$, where $J = \mathbf{1}\mathbf{1}'$. Let $\tilde\Theta_\epsilon = \{\Sigma_\epsilon = \sigma^2[(1-\rho)I + \rho J]:\ \sigma^2 > 0,\ -1/(n-1) < \rho < 1\}$. Suppose the matrix $Z$ satisfies $\mathbf{1}'Z \neq 0$ and $\mathrm{rank}(Z) = q$ with $1 \le q < n-1$. Suppose the parameter space is $B \otimes \tilde\Theta_\epsilon \otimes \Theta_u$. Then model (2.1) is identifiable if and only if $H_Z J \neq J$.

Comments. The condition $H_Z J = J$ means that the sum of the elements of each row of $H_Z$ is equal to one, and this is an easy condition to check. For the case $q = 1$, i.e. $Z$ is a column vector $(z_1, \ldots, z_n)'$ with $\sum z_i \neq 0$,

$$H_Z = \frac{1}{s_z^2}\begin{pmatrix} z_1^2 & z_1 z_2 & \cdots & z_1 z_n \\ \vdots & & & \vdots \\ z_n z_1 & z_n z_2 & \cdots & z_n^2 \end{pmatrix}, \quad \text{where } s_z^2 = \sum z_i^2.$$

The model is identifiable if and only if $Z$ is not a constant vector. When $q = 2$, suppose we have the usual simple linear regression model with centered covariates:

$$Z = \begin{pmatrix} 1 & z_1 \\ \vdots & \vdots \\ 1 & z_n \end{pmatrix}, \quad \text{with } \sum z_i = 0. \quad (2.9)$$

Then

$$H_Z = \begin{pmatrix} \frac{1}{n} + \frac{z_1^2}{s_z^2} & \frac{1}{n} + \frac{z_1 z_2}{s_z^2} & \cdots & \frac{1}{n} + \frac{z_1 z_n}{s_z^2} \\ \vdots & & & \vdots \\ \frac{1}{n} + \frac{z_n z_1}{s_z^2} & \frac{1}{n} + \frac{z_n z_2}{s_z^2} & \cdots & \frac{1}{n} + \frac{z_n^2}{s_z^2} \end{pmatrix},$$

and each row of $H_Z$ sums to one. Thus, unfortunately, the model is not identifiable under this $Z$ combined with the exchangeable covariance structure.

Corollary 2.4.2 Suppose that $\tilde\Theta_\epsilon$ equals the collection of all diagonal positive definite $n \times n$ matrices. Then model (2.1) with parameter space $B \otimes \tilde\Theta_\epsilon \otimes \Theta_u$ is identifiable if and only if none of the diagonal elements of $H_Z$ is equal to one.

Comments. Again, the condition on $H_Z$ is easy to check. Consider the case $q = 1$. As we have seen, the diagonal elements of $H_Z$ equal $z_i^2/\sum z_j^2$, $i = 1, \ldots, n$. Therefore, the model is identifiable if and only if $Z$ does not have $n-1$ zero elements. Consider $q = 2$ with $Z$ as in (2.9). The model is identifiable provided, for all $i$, $(1/n) + z_i^2/\sum z_j^2$ does not equal $1$. So typically, the model is identifiable.

The following corollary provides a sufficient condition for identifiability, a condition that can sometimes be easily checked. Consider (2.3). Note that the rank of $H_Z(\Sigma_\epsilon - \Sigma_\epsilon^*)$ is at most $q$, since the rank of $H_Z$ is $q$. Thus, for (2.3) to hold, we must be able to find some $\Sigma_\epsilon$ and $\Sigma_\epsilon^*$ with the rank of $\Sigma_\epsilon - \Sigma_\epsilon^*$ less than or equal to $q$. This proves the following.

Corollary 2.4.3 Suppose that $\tilde\Theta \subseteq \Theta$. Then model (2.1) with parameter space $B \otimes \tilde\Theta$ is identifiable if $\mathrm{rank}(\Sigma_\epsilon - \Sigma_\epsilon^*) > q$ for all distinct $\Sigma_\epsilon$, $\Sigma_\epsilon^*$ in the parameter space.
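Both corollaries can be probed numerically. In this sketch (with an illustrative centered design of our own choosing), we check whether $H_Z J = J$, i.e. whether every row of $H_Z$ sums to one, and whether any diagonal element of $H_Z$ equals one:

```python
import numpy as np

def hat(Z):
    """Projection matrix H_Z = Z (Z'Z)^{-1} Z'."""
    return Z @ np.linalg.inv(Z.T @ Z) @ Z.T

n = 8
z = np.arange(n, dtype=float) - (n - 1) / 2.0   # centered covariate, sum(z) = 0
Z = np.column_stack([np.ones(n), z])            # intercept plus centered slope, q = 2

H = hat(Z)
J = np.ones((n, n))

# Corollary 2.4.1: exchangeable errors are identifiable iff H_Z J != J.
# Here the intercept column puts 1 in the column space of Z, so every
# row of H_Z sums to one and the model is NOT identifiable.
print(np.allclose(H @ J, J))                 # True -> not identifiable

# Corollary 2.4.2: diagonal Sigma_eps is identifiable iff no diagonal
# element of H_Z equals one.
print(np.any(np.isclose(np.diag(H), 1.0)))   # False -> identifiable
```

The contrast matches the comments above: the same design that defeats the exchangeable structure still identifies a diagonal $\Sigma_\epsilon$.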
Now we apply Corollary 2.4.3 to show model identifiability under the "multiple of a known positive definite matrix" and the "MA(1)" covariance structures, respectively, in the next two examples.

Example 2.4.1 Multiple of a known positive definite matrix. Fix $R$, symmetric and positive definite, and suppose $(\Sigma_\epsilon, \Sigma_u) \in \tilde\Theta$ implies that $\Sigma_\epsilon = \sigma^2 R$ for some $\sigma^2 > 0$. Consider $\Sigma_\epsilon = \sigma^2 R$ and $\Sigma_\epsilon^* = \sigma^{*2} R$ with $\sigma^{*2} \neq \sigma^2$. Clearly $\Sigma_\epsilon - \Sigma_\epsilon^* = (\sigma^2 - \sigma^{*2})R$ is invertible, and so is of rank $n$, which we have assumed is greater than $q$. Thus, the model is identifiable.

To show that the model in Example 2.4.2 below is identifiable, we need the following lemma, which is a result in (Graybill, 1983, p. 285).

Lemma 2.4.1 Let $T$ be the $n \times n$ Toeplitz matrix with ones on the two parallel subdiagonals and zeroes elsewhere. Given two scalars $a_0$ and $a_1$, the eigenvalues of the $n \times n$ matrix $C = a_0 I + a_1 T$ are

$$\mu_i = a_0 + 2|a_1| \cos\frac{i\pi}{n+1}, \quad i = 1, \ldots, n.$$

Example 2.4.2 MA(1). Suppose that $n - 1 > q$. Let the components of $\epsilon$ have the MA(1) covariance structure, i.e. of the form $\sigma^2(I + \rho T)$. Let $\tilde\Theta_\epsilon = \{\Sigma_\epsilon = \sigma^2(I + \rho T):\ \sigma^2 > 0,\ |\rho| < 1/2\}$ and suppose $(\Sigma_\epsilon, \Sigma_u) \in \tilde\Theta$ implies that $\Sigma_\epsilon \in \tilde\Theta_\epsilon$. Let $\Sigma_\epsilon$ and $\Sigma_\epsilon^* \in \tilde\Theta_\epsilon$. By Lemma 2.4.1, the eigenvalues of the difference matrix $\Sigma_\epsilon - \Sigma_\epsilon^* = (\sigma^2 - \sigma^{*2})I + (\sigma^2\rho - \sigma^{*2}\rho^*)T$ are

$$\lambda_i = (\sigma^2 - \sigma^{*2}) + 2\left|\sigma^2\rho - \sigma^{*2}\rho^*\right| \cos\frac{i\pi}{n+1}, \quad i = 1, \ldots, n.$$
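Lemma 2.4.1's eigenvalue formula, which drives this example, is easy to confirm numerically; a minimal sketch with arbitrarily chosen $n$, $a_0$, $a_1$:

```python
import numpy as np

n = 7
# Tridiagonal Toeplitz T: ones on the two parallel subdiagonals, zeroes elsewhere.
T = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

a0, a1 = 2.0, -0.8          # a1 may be negative; the lemma uses |a1|
C = a0 * np.eye(n) + a1 * T

i = np.arange(1, n + 1)
mu = a0 + 2 * abs(a1) * np.cos(i * np.pi / (n + 1))   # Lemma 2.4.1

# The computed spectrum matches the closed form (as sets of eigenvalues).
print(np.allclose(np.sort(np.linalg.eigvalsh(C)), np.sort(mu)))  # True
```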
Given any $(\sigma^2, \rho)$ and $(\sigma^{*2}, \rho^*)$ with $(\sigma^2, \rho) \neq (\sigma^{*2}, \rho^*)$, the number of zero $\lambda_i$'s is at most one. Hence, the rank of the difference matrix is greater than or equal to $n - 1$, which exceeds $q$ by assumption. Therefore, model (2.1) is identifiable under this MA(1) covariance structure.

In longitudinal or functional data analysis, there are usually $N$ individuals, with the $i$th individual modelled as in (2.1):

$$y_i = X_i\beta + Z_i u_i + \epsilon_i, \quad u_i \sim N(0, \Sigma_u), \quad \epsilon_i \sim N(0, \Sigma_{\epsilon i}), \quad (\Sigma_u, \Sigma_{\epsilon i}) \in \Theta_u \otimes \tilde\Theta_{i\epsilon}, \quad u_i \text{ and } \epsilon_i \text{ independent}. \quad (2.10)$$

Statistical inference is normally based on the joint model, the model of these $N$ individuals. The following corollary gives sufficient conditions for identifiability of the joint model. The intuition behind the result is that, if we can identify $\Sigma_u$ from one individual, then we can identify all of the $\Sigma_{\epsilon i}$'s.

Corollary 2.4.4 If an individual model (2.10) is identifiable, then the joint model is identifiable.

Proof: We notice that each individual model (2.10) shares a common parameter, the covariance matrix $\Sigma_u$. If one individual model uniquely determines $\Sigma_u$ and its $\Sigma_{\epsilon i}$, the identified $\Sigma_u$ will then yield identifiability of all the individual $\Sigma_{\epsilon i}$'s since, if $Z_i\Sigma_u Z_i' + \Sigma_{\epsilon i} = Z_i\Sigma_u Z_i' + \Sigma_{\epsilon i}^*$, clearly $\Sigma_{\epsilon i} = \Sigma_{\epsilon i}^*$. Therefore, the joint model is identifiable. $\Box$

Corollary 2.4.4 reduces the verification of a joint model's identifiability to the individuals'. For instance, if the $i$th individual model has $Z_i$ of full column rank and $\Sigma_{\epsilon i} = \sigma_\epsilon^2 I_{n_i}$, where $n_i$ is the length of $y_i$, then this individual model is identifiable by Example 2.4.1, and thus so is the joint model.
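The intuition in the proof of Corollary 2.4.4 can be illustrated with a small numerical sketch (all matrices chosen arbitrarily for illustration): once $\Sigma_u$ is identified from one individual, any other individual's $\Sigma_{\epsilon j}$ follows by subtraction, even when its $Z_j$ is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 2

# Sigma_u, as identified from one identifiable individual model.
A = rng.standard_normal((q, q)); Sigma_u = A @ A.T + np.eye(q)

# Another individual, whose Z_j need NOT have full column rank.
n_j = 5
Zj = np.column_stack([np.ones(n_j), np.ones(n_j)])   # rank 1 < q
Sigma_eps_j = np.diag(rng.uniform(0.5, 2.0, n_j))

Vj = Zj @ Sigma_u @ Zj.T + Sigma_eps_j   # marginal covariance of y_j

# Knowing Sigma_u determines Sigma_eps_j by subtraction.
Sigma_eps_rec = Vj - Zj @ Sigma_u @ Zj.T
print(np.allclose(Sigma_eps_rec, Sigma_eps_j))   # True
```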
Note that the other individual models can still have $Z_j$'s that are not of full column rank. Demidenko (2004, Chapters 2 & 3) studies the joint model but assumes the covariance matrix of $\epsilon_i$ is $\sigma^2 I_{n_i}$. Suppose each $Z_i$ is of dimension $n_i \times q$. Demidenko shows that the joint model is identifiable if at least one matrix $Z_i$ is of full column rank and $\sum_{i=1}^N (n_i - q) > 0$. Using our argument in the previous paragraph, the condition $\sum_{i=1}^N (n_i - q) > 0$ can be dropped. Furthermore, our result can be applied to more general $\Sigma_\epsilon$'s.

2.5 Extensions

In this section, we discuss identifiability of a model in functional regression for a functional predictor $y(\cdot)$ and a scalar response $w$. We derive a necessary and sufficient condition of nonidentifiability for this model. We model $y(t)$ as $\sum_j \alpha_j\phi_j(t) + \sum_k u_k\psi_k(t)$ plus error, with the $\phi_j$'s and $\psi_k$'s known, the $\alpha_j$'s unknown, and the $u_k$'s unknown and random. The dependence of the response $w$ on $y$ is modelled through an unknown functional coefficient $\beta$: $w = \beta_0 + \int \beta(t)[y(t) - E(y(t))]\,dt + \eta$, where $\eta$ is mean-zero normal noise. Thus, for appropriately defined $\rho$ and with $u = (u_1, \ldots, u_q)'$, we can write $w = \beta_0 + \rho'u + \eta$, $\beta_0 \in \mathbb{R}$, $\rho \in \mathbb{R}^q$.

Given $\theta^{(t)}$, we update $\Sigma_x$, $\sigma_\epsilon^2$, $\beta$, and $\sigma^2$, as described below.

3.3.2 Updating $\Sigma_x^{(t)}$

We update $\Sigma_x^{(t)}$ by maximizing $\Lambda_N$ over $\Sigma_x$ while keeping the other parameters fixed. Let $S_W = \sum_{i=1}^N (W_i - \bar{W})(W_i - \bar{W})'/N$. We show that if $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive and if $S_W - \Sigma_d^{(t)} > 0$, then our update $\Sigma_x^{(t+1)}$ is positive definite, and using it in the log likelihood instead of $\Sigma_x^{(t)}$ increases the log likelihood. With detailed derivation in Section B.4, differentiating $\Lambda_N$ with respect
to $\Sigma_x$ and equating to zero yields the first order condition

$$C'\Sigma_W^{-1}C = C'\Sigma_W^{-1} S_W \Sigma_W^{-1} C. \quad (3.10)$$

Here, $C$ depends on $\beta^{(t)}$, and $\Sigma_W = C\Sigma_x C' + \Sigma_d^{(t)}$ from (3.9). Equation (3.10) holds provided $\Sigma_W$ is invertible at the critical value of $\Sigma_x$. Since we assume that $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive, $\Sigma_d^{(t)}$ is positive definite. So $\Sigma_W$ will be invertible provided $\Sigma_x$ is non-negative definite.

We now solve (3.10) for $\Sigma_x$, first deriving two useful identities, (3.11) and (3.12). For ease, we drop the hats and the superscript $t$'s on the parameter estimates that are being held fixed, that is, on $\hat\mu_W$, $\sigma_\epsilon^{2(t)}$, $\beta^{(t)}$, and $\sigma^{2(t)}$. Direct multiplication and some manipulation of the left hand side of the following shows that

$$\left(C'\Sigma_d^{-1}C\right) \left[\left(C'\Sigma_d^{-1}C\right)^{-1} + \Sigma_x\right] C'\Sigma_W^{-1} = C'\Sigma_d^{-1}.$$

Solving this for $C'\Sigma_W^{-1}$ yields

$$C'\Sigma_W^{-1} = \left[\left(C'\Sigma_d^{-1}C\right)^{-1} + \Sigma_x\right]^{-1} \left(C'\Sigma_d^{-1}C\right)^{-1} C'\Sigma_d^{-1}. \quad (3.11)$$

Postmultiplying both sides of identity (3.11) by $C$ yields

$$C'\Sigma_W^{-1}C = \left(\left(C'\Sigma_d^{-1}C\right)^{-1} + \Sigma_x\right)^{-1}. \quad (3.12)$$

Substituting (3.11) into the right side of (3.10) and (3.12) into the left side of (3.10) yields

$$\left(C'\Sigma_d^{-1}C\right)^{-1} + \Sigma_x = F S_W F', \quad \text{where } F = \left(C'\Sigma_d^{-1}C\right)^{-1} C'\Sigma_d^{-1}.$$

Note that $F$ is of full row rank. Thus, the critical point is

$$\hat\Sigma_x = F S_W F' - \left(C'\Sigma_d^{-1}C\right)^{-1} = F(S_W - \Sigma_d)F', \quad (3.13)$$

which is strictly positive definite since $S_W - \Sigma_d > 0$ and $F$ is of full row rank. And so, clearly, $\Sigma_W$ is invertible at the critical point.
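The closed form (3.13) can be checked against the first order condition (3.10) numerically; in this sketch the dimensions and matrices are arbitrary, with $S_W - \Sigma_d$ forced to be positive definite as assumed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3

C = rng.standard_normal((n, p))                      # full column rank (a.s.)
B = rng.standard_normal((n, n)); Sigma_d = B @ B.T + np.eye(n)
E = rng.standard_normal((n, n)); S_W = E @ E.T + Sigma_d + np.eye(n)  # S_W - Sigma_d > 0

Sd_inv = np.linalg.inv(Sigma_d)
F = np.linalg.inv(C.T @ Sd_inv @ C) @ C.T @ Sd_inv   # F as in (3.13)

Sigma_x = F @ (S_W - Sigma_d) @ F.T                  # the critical point (3.13)
Sigma_W = C @ Sigma_x @ C.T + Sigma_d
SW_inv = np.linalg.inv(Sigma_W)

# First order condition (3.10) holds at the critical point.
lhs = C.T @ SW_inv @ C
rhs = C.T @ SW_inv @ S_W @ SW_inv @ C
print(np.allclose(lhs, rhs))                         # True
```

The same run also exhibits the positive definiteness of $\hat\Sigma_x$ claimed after (3.13), since $F(S_W - \Sigma_d)F'$ has all eigenvalues positive here.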
To see that the updated $\hat\Sigma_x$ leads to an increase in $\Lambda_N$, we show that the Hessian matrix $H(\Sigma_x)$ evaluated at $\hat\Sigma_x$ is negative definite. The $ij$th element of $H(\Sigma_x)$ is the second order partial derivative of $\Lambda_N$ with respect to the $i$th and $j$th elements of the vectorized $\Sigma_x$. From calculations in Section B.4, we have

$$H(\hat\Sigma_x) = -\frac{N}{2}\left(\hat{D} \otimes \hat{D}\right), \quad \text{where } \hat{D} = C'\hat\Sigma_W^{-1}C, \quad (3.14)$$

which is clearly negative definite.

3.3.3 Updating $\sigma_\epsilon^{2(t)}$

We update $\sigma_\epsilon^{2(t)}$, holding all other parameter estimates fixed, using one E-step and one M-step of the EM algorithm. We show that if $\sigma^{2(t)}$ and $\sigma_\epsilon^{2(t)}$ are positive and if $\Sigma_x^{(t)} > 0$, then our update $\sigma_\epsilon^{2(t+1)}$ is positive. The increase of the log likelihood after replacing $\sigma_\epsilon^{2(t)}$ by $\sigma_\epsilon^{2(t+1)}$ is guaranteed by the properties of the EM algorithm.

Recall that $(z_i, Y_i, x_i)$, $i = 1, \ldots, N$, are our complete data and $W_i \equiv (z_i', Y_i)'$, $i = 1, \ldots, N$, are the observed data. In conditional expectations, we let "$\cdot$" stand for the observed data. Abusing notation slightly, we let $f$ denote a generic density function, with the exact meaning clear from the arguments. The E-step of the EM algorithm calculates $E_{\theta^{(t)}}\left(\sum_{i=1}^N \ln f(z_i, Y_i, x_i) \mid \cdot\right)$ and the M-step maximizes this conditional expectation over $\sigma_\epsilon^2$ to obtain $\sigma_\epsilon^{2(t+1)}$. By the conditional independence of $z_i$ and $Y_i$ given $x_i$,

$$\ln f(z_i, Y_i, x_i) \equiv \ln f(z_i \mid x_i) + \ln f(y_i \mid x_i) + \ln f(x_i).$$

Since only $\ln f(z_i \mid x_i)$ contains $\sigma_\epsilon^2$, we can ignore the last two terms and obtain $\sigma_\epsilon^{2(t+1)}$ by maximizing $E_{\theta^{(t)}}\left(\sum_{i=1}^N \ln f(z_i \mid x_i) \mid \cdot\right)$ over $\sigma_\epsilon^2$. From (3.4), we first get

$$\sum_{i=1}^N \ln f(z_i \mid x_i) = -\frac{N}{2}\ln(\det \sigma_\epsilon^2 R) - \frac{1}{2\sigma_\epsilon^2}\sum_{i=1}^N (z_i - \mu - Ax_i)'R^{-1}(z_i - \mu - Ax_i).$$
Following (3.6), we have

$$\mathrm{Cov}(W_i, x_i) = C\Sigma_x, \quad (3.15)$$

which then leads to the conditional mean and covariance of $x_i$ given $W_i$:

$$E[x_i \mid W_i] \equiv \mu_{x_i|W_i} = \Sigma_x C'\Sigma_W^{-1}(W_i - \mu_W), \quad (3.16)$$
$$\mathrm{Cov}[x_i \mid W_i] \equiv \Sigma_{x|W} = \Sigma_x - \Sigma_x C'\Sigma_W^{-1}C\Sigma_x. \quad (3.17)$$

Let

$$\tilde{s} = \sum_{i=1}^N (z_i - \hat\mu - A\mu^{(t)}_{x_i|W_i})'R^{-1}(z_i - \hat\mu - A\mu^{(t)}_{x_i|W_i}).$$

Routine calculations yield

$$E_{\theta^{(t)}}\left(\sum_{i=1}^N \ln f(z_i \mid x_i) \mid \cdot\right) = -\frac{N}{2}\ln(\det R) - \frac{nN}{2}\ln\sigma_\epsilon^2 - \frac{1}{2\sigma_\epsilon^2}\left[\tilde{s} + N\,\mathrm{tr}(R^{-1}A\Sigma^{(t)}_{x|W}A')\right].$$

Differentiating this conditional expectation with respect to $\sigma_\epsilon^2$ and equating the derivative to zero yields

$$\sigma_\epsilon^{2(t+1)} = \frac{1}{nN}\tilde{s} + \frac{1}{n}\mathrm{tr}\left[R^{-1}A\Sigma^{(t)}_{x|W}A'\right]. \quad (3.18)$$

We now show that the update $\sigma_\epsilon^{2(t+1)}$ is positive. The first term in $\sigma_\epsilon^{2(t+1)}$ is positive by assumption (c) and the fact that $R$ is positive definite. The second term is nonnegative by the following argument. Using the well-known matrix identity

$$(V\Sigma V' + \Sigma_0)^{-1} = \Sigma_0^{-1} - \Sigma_0^{-1}V\left(\Sigma^{-1} + V'\Sigma_0^{-1}V\right)^{-1}V'\Sigma_0^{-1},$$

provided the matrix orders are properly defined, we see that

$$\Sigma^{(t)}_{x|W} = \left(\Sigma_x^{(t)-1} + C^{(t)\prime}\Sigma_d^{(t)-1}C^{(t)}\right)^{-1},$$

which is positive definite. Given $\Sigma^{(t)}_{x|W} > 0$, assumption (a) then implies that $A\Sigma^{(t)}_{x|W}A' \ge 0$. Together with the fact that $R > 0$, the second term in (3.18) is thus nonnegative.

3.3.4 Updating $\beta^{(t)}$ and $\sigma^{2(t)}$

The updates of $\beta^{(t)}$ and $\sigma^{2(t)}$ maximize $\Lambda_N$ over $\beta$ and $\sigma^2$, holding the other parameters fixed. Suppose that $\sigma_\epsilon^{2(t)} > 0$ and $\Sigma_x^{(t)} > 0$.
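The two forms of the conditional covariance used in the positivity argument above, expression (3.17) and its Woodbury-type rewriting, can be verified numerically on arbitrary illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 3

C = rng.standard_normal((n, p))
A = rng.standard_normal((p, p)); Sigma_x = A @ A.T + np.eye(p)
B = rng.standard_normal((n, n)); Sigma_d = B @ B.T + np.eye(n)

Sigma_W = C @ Sigma_x @ C.T + Sigma_d
SW_inv = np.linalg.inv(Sigma_W)

# (3.17): conditional covariance from the joint normal of (x_i, W_i).
form1 = Sigma_x - Sigma_x @ C.T @ SW_inv @ C @ Sigma_x

# Woodbury-type rewriting, manifestly positive definite.
form2 = np.linalg.inv(np.linalg.inv(Sigma_x) + C.T @ np.linalg.inv(Sigma_d) @ C)

print(np.allclose(form1, form2))   # True
```

The second form makes positive definiteness immediate, which is exactly the role it plays in showing the second term of (3.18) is nonnegative.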
We find unique critical points, $\hat\beta$ and $\hat\sigma^2$, and show that they increase the log likelihood provided $\hat\sigma^2 > 0$. Note that $\ln f(Y_i, z_i) = \ln f(Y_i \mid z_i) + \ln f(z_i)$, and that $\ln f(z_i)$ does not depend on $\beta$ or $\sigma^2$. We also note that, given $z_i$, $Y_i$ is normal with mean $E(Y_i \mid z_i) \equiv \beta_0 + \beta'G(z_i - \mu)$ and variance

$$\sigma^2_{Y|z} \equiv \mathrm{Var}(Y_i \mid z_i) = \beta'K\beta + \sigma^2, \quad (3.19)$$

where

$$G = T\Sigma_x A'\Sigma_z^{-1} \quad (3.20)$$

and $K = T\Sigma_x T' - T\Sigma_x A'\Sigma_z^{-1}A\Sigma_x T'$. Therefore, to maximize $\Lambda_N$ with respect to $\beta$ and $\sigma^2$, we maximize

$$\tilde\Lambda_N = -\frac{N}{2}\ln(\beta'K\beta + \sigma^2) - \frac{1}{2(\beta'K\beta + \sigma^2)}\sum_{i=1}^N \left(Y_i - \beta_0 - \beta'G(z_i - \mu)\right)^2. \quad (3.21)$$

With detailed derivation in Section B.5, equating $\partial\tilde\Lambda_N/\partial\beta$ and $\partial\tilde\Lambda_N/\partial\sigma^2$ to zero yields, respectively,

$$\frac{1}{\beta'K\beta + \sigma^2}\sum_{i=1}^N \left(Y_i - \beta_0 - \beta'G(z_i - \mu)\right)G(z_i - \mu) = 0, \quad (3.22)$$

$$\frac{1}{(\beta'K\beta + \sigma^2)^2}\left[\sum_{i=1}^N \left(Y_i - \beta_0 - \beta'G(z_i - \mu)\right)^2 - N(\beta'K\beta + \sigma^2)\right] = 0. \quad (3.23)$$

Note that $G$ is of full row rank because of the following two observations. First, $T$ is of full row rank by assumption (b). Second, the matrix $\Sigma_z = \Sigma_x^{(t)} + \sigma_\epsilon^{2(t)}R$ is invertible since it is positive definite. Let

$$M = G\sum_{i=1}^N (z_i - \mu)(z_i - \mu)'G'.$$

Then, by assumption (c), $M$ is positive definite. Solving (3.22) for $\beta$ and (3.23) for $\sigma^2$ gives

$$\beta^{(t+1)} = \hat\beta = M^{-1}G\sum_{i=1}^N (z_i - \mu)(Y_i - \beta_0),$$
$$\sigma^{2(t+1)} = \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N \left(Y_i - \beta_0 - \hat\beta'G(z_i - \mu)\right)^2 - \hat\beta'K\hat\beta. \quad (3.24)$$

Unfortunately, we are not guaranteed that $\hat\sigma^2$ is positive. However, in all of our data analyses and simulation studies, the final estimate of $\sigma^2$ was always positive. Again, to check that the update increases $\tilde\Lambda_N$, we show that the Hessian matrix is negative definite. We notice that (3.24) implies

$$\hat\sigma^2_{Y|z} \equiv \hat\beta'K\hat\beta + \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N \left(Y_i - \beta_0 - \hat\beta'G(z_i - \mu)\right)^2, \quad (3.25)$$

which is positive by assumption (d). With detailed calculation in Section B.5, the Hessian matrix $H_{\tilde\Lambda}(\beta, \sigma^2)$ evaluated at $\hat\beta$ and $\hat\sigma^2$ equals

$$H_{\tilde\Lambda}(\hat\beta, \hat\sigma^2) = -\frac{N}{(\hat\sigma^2_{Y|z})^2}\begin{pmatrix} 2K\hat\beta\hat\beta'K + \frac{\hat\sigma^2_{Y|z}}{N}M & K\hat\beta \\ \hat\beta'K & \frac{1}{2} \end{pmatrix}. \quad (3.26)$$

It follows that $H_{\tilde\Lambda}(\hat\beta, \hat\sigma^2) < 0$ by the following argument. Let $x_1 \in$

0 by the following argument.

$$x'\Sigma_u^* x = x'\Sigma_u x + \sigma^2(1-s)\,x'(Z'Z)^{-1}Z'JZ(Z'Z)^{-1}x \ge \lambda_m x'x + \sigma^2(1-s)\lambda x'x > 0.$$

Now suppose that $H_Z J \neq J$ and suppose, by contradiction, that the model is not identifiable. Then, by Theorem 2.4.1, there exist nonidentical $\Sigma_\epsilon$ and $\Sigma_\epsilon^*$ satisfying (2.3) and, since the rank of $H_Z$ is $q$, the rank of $\Sigma_\epsilon - \Sigma_\epsilon^*$ is at most $q$. We have

$$\Sigma_\epsilon - \Sigma_\epsilon^* = \left[(\sigma^2 - \sigma^{*2}) - (\sigma^2\rho - \sigma^{*2}\rho^*)\right]I + (\sigma^2\rho - \sigma^{*2}\rho^*)J.$$

By Lemma A.1.1, the eigenvalues of $\Sigma_\epsilon - \Sigma_\epsilon^*$ are $(\sigma^2 - \sigma^{*2}) - (\sigma^2\rho - \sigma^{*2}\rho^*)$, which is of multiplicity $n-1$, and $(\sigma^2 - \sigma^{*2}) + (n-1)(\sigma^2\rho - \sigma^{*2}\rho^*)$, of multiplicity $1$. Since $\Sigma_\epsilon - \Sigma_\epsilon^*$ is not the zero matrix, not all of the eigenvalues can equal $0$: we must have either no eigenvalues equal to $0$, one eigenvalue equal to $0$, or $n-1$ eigenvalues equal to $0$. In order to have $\mathrm{rank}(\Sigma_\epsilon - \Sigma_\epsilon^*) \le q$, the eigenvalue of multiplicity $n-1$ must be zero, since $1 \le q < n-1$ by assumption. That is, $\sigma^2 - \sigma^{*2} = \sigma^2\rho - \sigma^{*2}\rho^*$, and so $\Sigma_\epsilon - \Sigma_\epsilon^* = (\sigma^2 - \sigma^{*2})J$. But plugging this into (2.3) yields $H_Z J = J$, contradicting our assumption. $\Box$

A.2 Proof of Corollary 2.4.2

Proof: We first note a fact about the matrix $H_Z$. Since $H_Z$ is symmetric and idempotent,

$$H_Z[k,k] = \sum_l (H_Z[k,l])^2 = (H_Z[k,k])^2 + \sum_{l \neq k} (H_Z[k,l])^2.$$

Thus, if $H_Z[k,k] = 1$, then $H_Z[k,i] = H_Z[i,k] = 0$ for all $i \neq k$.
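The fact just noted, that a unit diagonal entry of the symmetric idempotent $H_Z$ forces the rest of its row to vanish, can be seen in a one-column sketch (an arbitrary design of our own choosing):

```python
import numpy as np

n = 5
# A design whose first coordinate is alone in its direction: Z = 3 * e_1.
Z = np.zeros((n, 1)); Z[0, 0] = 3.0

H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T   # H_Z = Z (Z'Z)^{-1} Z'

print(np.isclose(H[0, 0], 1.0))        # True: diagonal element equal to one ...
print(np.allclose(H[0, 1:], 0.0))      # True: ... forces the rest of row 1 to vanish
```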
To prove the corollary, we use Theorem 2.4.1 and a proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, that a diagonal element of $H_Z$ is equal to $1$. Without loss of generality, assume $H_Z[1,1] = 1$. Then, by our observation, $H_Z[1,i] = H_Z[i,1] = 0$ for all $i \neq 1$. Fix $\Sigma_\epsilon = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_n^2\} \in \tilde\Theta_\epsilon$. Let $\sigma_1^{*2}$ satisfy $0 < \sigma_1^{*2} < \sigma_1^2$ and define $\Sigma_\epsilon^* = \mathrm{diag}\{\sigma_1^{*2}, \sigma_2^2, \ldots, \sigma_n^2\}$. Then $\Sigma_\epsilon - \Sigma_\epsilon^* = \mathrm{diag}\{\sigma_1^2 - \sigma_1^{*2}, 0, \ldots, 0\}$. It is not hard to check that (2.3) is satisfied. Clearly, for any $\Sigma_u \in \Theta_u$, $\Sigma_u^*$ defined as in (2.4) is also in $\Theta_u$. Thus, the model is not identifiable, which contradicts our assumption.

Now suppose that no diagonal element of $H_Z$ is equal to one and suppose, by contradiction, that the model is not identifiable. Then there exist nonidentical diagonal matrices, $\Sigma_\epsilon$ and $\Sigma_\epsilon^*$, satisfying (2.3). As $\Sigma_\epsilon \neq \Sigma_\epsilon^*$, at least one of the diagonal elements of $\Sigma_\epsilon - \Sigma_\epsilon^*$ is not zero. Suppose the $k$th diagonal element is not zero. By (2.3), the $k$th diagonal element of $H_Z$ must be one, contradicting our assumption. $\Box$

Appendix B  Appendix to Chapter 3

In this appendix, we provide the calculations of Sections 3.3.2 and 3.3.4, where we find the updates of $\Sigma_x$ and $\{\beta, \sigma^2\}$ in the ECME procedure. In Section B.4, we derive the first order condition (3.10) and the Hessian matrix (3.14) of Section 3.3.2, where we maximize the log-likelihood $\Lambda_N$ over $\Sigma_x$ while holding the other parameters fixed. In Section B.5, we derive the first order conditions (3.22) and (3.23), and the Hessian matrix (3.26) of Section 3.3.4, where we maximize $\tilde\Lambda_N$ over $\{\beta, \sigma^2\}$ holding the other parameters fixed.
We use the tools of matrix differential calculus, calculating first differentials to obtain the first order conditions and second differentials to obtain the Hessian matrices. The book by Magnus and Neudecker (1988) gives an elegant description of this subject. In Sections B.1-B.3, we follow the book to introduce some definitions and provide some background, mainly from Part Two of the book. We keep the same notation as in the book. Throughout this section, chapters and page numbers all refer to (Magnus and Neudecker, 1988).

B.1 Definition of the first differential

We first give the definition of the first differential for a vector function (a vector-valued function with a vector argument). We show that the function's first differential is connected with its Jacobian matrix. We then give an extension of the definition to a matrix function (a matrix-valued function with a matrix argument) and show how to identify the Jacobian matrix from the first differential.

Definition B.1.1 Let $f : S \to$