0. To see that this model is not identifiable, use the same argument as in Example 2.2.1 and note that the constructed Σu* is in Θ̃u if Σu is in Θ̃u. In practice, one usually assumes a more specific structure for Σ, such as Σ = σ²I. Restrictions may lead to identifiability, and such restrictions and their effects on identifiability are discussed in the next two sections.

2.3 Simple sufficient conditions of identifiability

In this section, we find sufficient conditions for identifiability of model (2.1), assuming Θ̃ = Θ. A further examination of (2.2) gives the following sufficient conditions.

Clearly, if Σu is known, then ZΣuZ′ is known, and so Σ is completely determined.

If Σ is known, then the model is identifiable. To see this, consider (2.2) with Σ = Σ*. It follows that ZΣuZ′ = ZΣu*Z′, and so Σu = Σu* since Z is of full column rank.

If ZΣuZ′Σ⁻¹ = K, where K is known and K + I is of full column rank, then the model is identifiable. Suppose, by way of contradiction, that the model is not identifiable. Then (2.2) holds for (Σu, Σ) ≠ (Σu*, Σ*), both in Θ̃, with, by assumption, ZΣuZ′ = KΣ and ZΣu*Z′ = KΣ*. Substituting these expressions into (2.2) yields KΣ + Σ = KΣ* + Σ*, that is, (K + I)(Σ − Σ*) = 0. Since K + I is of full rank, we must have Σ = Σ*. But, as shown in the previous paragraph, this implies that Σu = Σu*.

The last condition is similar to a common condition for identifiability in simple linear regression models with measurement error. That model assumes yi = β0 + β1 xi + εi, εi ∼ (0, σε²), where xi is observed with error having variance σu². The response yi then has variance σu² + σε². A common condition for model identifiability is that the ratio σu²/σε² be known. The inverse Σ⁻¹ appearing in our last condition can be viewed as a multivariate version of the "denominator".
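The claim that a known Σ determines Σu can be checked numerically. The following sketch (not from the thesis; the dimensions and matrices are arbitrary illustrative choices) recovers Σu from the marginal covariance V = ZΣuZ′ + Σ using the left inverse of a full-column-rank Z:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 6, 2

# Design matrix of full column rank and a true random-effects covariance.
Z = rng.standard_normal((n, q))
L = rng.standard_normal((q, q))
Sigma_u = L @ L.T + np.eye(q)           # positive definite
Sigma = 0.5 * np.eye(n)                 # error covariance, assumed known

# Marginal covariance of y: V = Z Sigma_u Z' + Sigma.
V = Z @ Sigma_u @ Z.T + Sigma

# With Sigma known and Z of full column rank, Sigma_u is recovered by
# pre- and postmultiplying V - Sigma by the left inverse (Z'Z)^{-1} Z'.
ZtZ_inv = np.linalg.inv(Z.T @ Z)
Sigma_u_hat = ZtZ_inv @ Z.T @ (V - Sigma) @ Z @ ZtZ_inv

print(np.allclose(Sigma_u_hat, Sigma_u))  # True
```

The recovery map (Z′Z)⁻¹Z′(·)Z(Z′Z)⁻¹ is exactly the one that appears later in (2.4) and (2.6).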
If supplementary data are available, we may be able to estimate Σu, Σ or K and treat this estimate as the true value. The sufficient conditions for identifiability can then be satisfied.

2.4 Sufficient conditions of identifiability for a structured Σ

As we observed in Examples 2.2.1 and 2.2.2, the model is not identifiable even if we restrict Σu to be a scalar multiple of a known matrix. In this section, we study the effect of putting restrictions on Σ. In Theorem 2.4.1 below, we give a necessary and sufficient condition for nonidentifiability, a condition that relies mainly on the design matrix Z via HZ = Z(Z′Z)⁻¹Z′. The theorem leads to four corollaries. Corollaries 2.4.1 and 2.4.2 give necessary and sufficient conditions for identifiability when Σ arises from an exchangeable covariance structure or is diagonal. Corollary 2.4.3 states an easily checked condition on Σ that guarantees identifiability of the model; that corollary is then applied to two commonly used error structures. Using Corollary 2.4.4, we can generalize a known identifiability result, giving a shorter proof under weaker conditions.

Theorem 2.4.1 Let Θ̃ ⊆ Θ and define HZ = Z(Z′Z)⁻¹Z′. Then model (2.1) with parameter space B ⊗ Θ̃ is nonidentifiable if and only if there exist (Σ, Σu) ∈ Θ̃ and (Σ*, Σu*) ∈ Θ̃ with Σ* ≠ Σ such that

    HZ[Σ − Σ*] = Σ − Σ*,   (2.3)

and

    Σu* = Σu + (Z′Z)⁻¹Z′[Σ − Σ*]Z(Z′Z)⁻¹.   (2.4)

Proof: Nonidentifiability of the model is equivalent to the existence of (Σ, Σu) and (Σ*, Σu*) in Θ̃, not equal, satisfying (2.2). This, in turn, is equivalent to having (Σ, Σu) and (Σ*, Σu*) in Θ̃ with Σ* ≠ Σ satisfying

    Z(Σu − Σu*)Z′ = Σ* − Σ.   (2.5)

Suppose the model is nonidentifiable. We premultiply (2.5) by Z′, postmultiply it by Z, and then pre- and postmultiply by (Z′Z)⁻¹ to get

    Σu − Σu* = (Z′Z)⁻¹Z′[Σ* − Σ]Z(Z′Z)⁻¹.   (2.6)

This gives (2.4).
To derive (2.3), premultiply (2.6) by Z and postmultiply it by Z′ to get

    Z(Σu − Σu*)Z′ = HZ[Σ* − Σ]HZ,   (2.7)

which, by (2.5), is the same as

    Σ − Σ* = HZ[Σ − Σ*]HZ.   (2.8)

Premultiplying (2.8) by the idempotent matrix HZ gives HZ[Σ − Σ*] = HZ[Σ − Σ*]HZ. Substituting (2.8) into the right side of the above yields (2.3).

To prove the converse, we want to show that (2.3) and (2.4) lead to (2.5). It is clear from (2.4) that (2.7) holds. If we can show that (2.8) holds, then we are done, since substituting (2.8) into the right side of (2.7) yields (2.5). To show (2.8): from (2.3) and the symmetry of Σ − Σ*, we see that HZ[Σ − Σ*] = [Σ − Σ*]HZ. Premultiplying this identity by the idempotent matrix HZ gives HZ[Σ − Σ*] = HZ[Σ − Σ*]HZ. Substituting (2.3) for the left side of this equation, we see that (2.8) holds. □

The proofs of the next two corollaries are in Appendix A.

Corollary 2.4.1 Let 1 be the n-vector with each element equal to one. Suppose that the distribution of ε1, …, εn is exchangeable, that is, the covariance matrix of ε is of the form σ²[(1 − ρ)I + ρJ], where J = 11′. Let Θ̃ = {Σ = σ²[(1 − ρ)I + ρJ] : σ² > 0, −1/(n − 1) < ρ < 1}. Suppose the matrix Z satisfies 1′Z ≠ 0 and rank(Z) = q with 1 ≤ q < n − 1. Suppose the parameter space is B ⊗ Θ̃ ⊗ Θu. Then model (2.1) is identifiable if and only if HZJ ≠ J.

Comments. The condition HZJ = J means that each row of HZ sums to one, an easy condition to check. For the case q = 1, i.e. Z a column vector (z1, …, zn)′ with ∑ zi ≠ 0, the (i, j) element of HZ is zi zj/sz², where sz² = ∑ zi². The model is identifiable if and only if Z is not a constant vector. When q = 2, suppose we have the usual simple linear regression model with a centered covariate:

    Z = [1, z], the n × 2 matrix with first column 1 and second column z = (z1, …, zn)′, where ∑ zi = 0.   (2.9)

Then the (i, j) element of HZ is 1/n + zi zj/sz², and each row of HZ sums to one. Thus, unfortunately, the model is not identifiable under this Z combined with the exchangeable covariance structure.

Corollary 2.4.2 Suppose that Θ̃ equals the collection of all diagonal positive definite n × n matrices. Then model (2.1) with parameter space B ⊗ Θ̃ ⊗ Θu is identifiable if and only if none of the diagonal elements of HZ is equal to one.

Comments. Again, the condition on HZ is easy to check. Consider the case q = 1. As we have seen, the diagonal elements of HZ equal zi²/∑j zj², i = 1, …, n. Therefore, the model is identifiable if and only if Z does not have n − 1 zero elements. Consider q = 2 with Z as in (2.9). The model is identifiable provided, for all i, 1/n + zi²/∑j zj² does not equal 1. So typically, the model is identifiable.

The following corollary provides a sufficient condition for identifiability, a condition that can sometimes be easily checked. Consider (2.3). Note that the rank of HZ(Σ − Σ*) is at most q, since the rank of HZ is q. Thus, for (2.3) to hold with Σ ≠ Σ*, the rank of Σ − Σ* must be less than or equal to q. This proves the following.

Corollary 2.4.3 Suppose that Θ̃ ⊆ Θ. Then model (2.1) with parameter space B ⊗ Θ̃ is identifiable if rank(Σ − Σ*) > q for all distinct Σ, Σ* in the parameter space.

We now apply Corollary 2.4.3 to show model identifiability under the "multiple of a known positive definite matrix" and the "MA(1)" covariance structures, respectively, in the next two examples.

Example 2.4.1 (Multiple of a known positive definite matrix) Fix R, symmetric and positive definite, and suppose (Σ, Σu) ∈ Θ̃ implies that Σ = σ²R for some σ² > 0. Consider Σ = σ²R and Σ* = σ*²R with σ*² ≠ σ². Clearly Σ − Σ* = (σ² − σ*²)R is invertible, and so is of rank n, which we have assumed is greater than q. Thus, the model is identifiable.
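The HZ conditions in Corollaries 2.4.1 and 2.4.2 are straightforward to verify numerically. A small sketch (illustrative values, not from the text) for the centered design (2.9):

```python
import numpy as np

# Centered simple linear regression design (2.9): intercept plus a
# centered covariate; the covariate values are arbitrary choices.
z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
n = len(z)
Z = np.column_stack([np.ones(n), z])          # sum(z) = 0

HZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

# Corollary 2.4.1: each row of HZ sums to one, so HZ J = J and the
# model is NOT identifiable under the exchangeable error structure.
print(np.allclose(HZ.sum(axis=1), 1.0))       # True

# Corollary 2.4.2: no diagonal element 1/n + z_i^2 / sum(z_j^2)
# equals one, so the model IS identifiable for diagonal Sigma.
print(np.any(np.isclose(np.diag(HZ), 1.0)))   # False
```

The same two checks apply to any candidate design: compute HZ once, then inspect its row sums and its diagonal.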
To show that the model in Example 2.4.2 below is identifiable, we need the following lemma, which is a result in (Graybill, 1983, p. 285).

Lemma 2.4.1 Let T be the n × n Toeplitz matrix with ones on the two parallel subdiagonals and zeroes elsewhere. Given two scalars a0 and a1, the eigenvalues of the n × n matrix C = a0I + a1T are

    μi = a0 + 2|a1| cos(iπ/(n + 1)), i = 1, …, n.

Example 2.4.2 (MA(1)) Suppose that n − 1 > q. Let the components of ε have the MA(1) covariance structure, i.e. of the form σ²(I + ρT). Let Θ̃ = {Σ = σ²(I + ρT) : σ² > 0, |ρ| < 1/2} and suppose (Σ, Σu) ∈ Θ̃ implies that Σ ∈ Θ̃. Let Σ and Σ* ∈ Θ̃. By Lemma 2.4.1, the eigenvalues of the difference matrix Σ − Σ* = (σ² − σ*²)I + (σ²ρ − σ*²ρ*)T are

    λi = (σ² − σ*²) + 2|σ²ρ − σ*²ρ*| cos(iπ/(n + 1)), i = 1, …, n.

Given any (σ², ρ) and (σ*², ρ*) with (σ², ρ) ≠ (σ*², ρ*), at most one of the λi is zero. Hence, the rank of the difference matrix is at least n − 1 > q. Therefore, model (2.1) is identifiable under this MA(1) covariance structure.

In longitudinal or functional data analysis, there are usually N individuals, with the ith individual modelled as in (2.1):

    yi = Xiβ + Ziui + εi, ui ∼ N(0, Σu), εi ∼ N(0, Σi),
    (Σu, Σi) ∈ Θu ⊗ Θ̃i, ui and εi independent.   (2.10)

Statistical inference is normally based on the joint model, the model of these N individuals. The following corollary gives sufficient conditions for identifiability of the joint model. The intuition behind the result is that, if we can identify Σu from one individual, then we can identify all of the Σi's.

Corollary 2.4.4 If an individual model (2.10) is identifiable, then the joint model is identifiable.

Proof: Each individual model (2.10) shares a common parameter, the covariance matrix Σu.
If one individual model uniquely determines Σu and its Σi, the identified Σu then yields identifiability of all the individual Σi's since, if ZiΣuZi′ + Σi = ZiΣuZi′ + Σi*, clearly Σi = Σi*. Therefore, the joint model is identifiable. □

Corollary 2.4.4 reduces the verification of a joint model's identifiability to that of the individual models. For instance, if the ith individual model has Zi of full column rank and Σi = σε²Ini, where ni is the length of yi, then this individual model is identifiable by Example 2.4.1, and thus so is the joint model. Note that the other individual models can still have Zj's that are not of full column rank. Demidenko (2004, Chapters 2 & 3) studies the joint model but assumes the covariance matrix of εi is σ²Ini. Suppose each Zi is of dimension ni × q. Demidenko shows that the joint model is identifiable if at least one matrix Zi is of full column rank and ∑_{i=1}^N (ni − q) > 0. Using our argument in the previous paragraph, the condition ∑_{i=1}^N (ni − q) > 0 can be dropped. Furthermore, our result can be applied to more general Σ's.

2.5 Extensions

In this section, we discuss identifiability of a model in functional regression for a functional predictor y(·) and a scalar response w. We derive a necessary and sufficient condition for nonidentifiability of this model. We model y(t) as ∑j αjφj(t) + ∑k ukψk(t) plus error, with the φj's and ψk's known, the αj's unknown, and the uk's unknown and random. The dependence of the response w on y is modelled through an unknown functional coefficient β: w = β0 + ∫ β(t)[y(t) − E(y(t))] dt + η, where η is mean-zero normal noise. Thus, for appropriately defined ρ and with u = (u1, …, uq)′, we can write w = β0 + ρ′u + η, with β0 ∈ ℝ and ρ ∈ ℝq.

Chapter 3. Linear mixed models for measurement error in functional regression

Given θ^(t), we update Σx, σε², β, and σ², as described below.

3.3.2 Updating Σx^(t)

We update Σx^(t) by maximizing ΛN over Σx while keeping the other parameters fixed. Let SW = ∑_{i=1}^N (Wi − W̄)(Wi − W̄)′/N.
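SW is the divide-by-N sample covariance of the observed vectors Wi. As a quick sanity check (a sketch with simulated data, not part of the derivation), it agrees with NumPy's biased covariance estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 3
W = rng.standard_normal((N, p))               # row i is W_i'

# S_W = sum_i (W_i - Wbar)(W_i - Wbar)' / N  (note the 1/N, not 1/(N-1))
Wbar = W.mean(axis=0)
S_W = (W - Wbar).T @ (W - Wbar) / N

print(np.allclose(S_W, np.cov(W.T, bias=True)))  # True
```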
We show that if σε^2(t) and σ^2(t) are positive and if SW − Σd^(t) > 0, then our update Σx^(t+1) is positive definite, and using it in the log likelihood in place of Σx^(t) increases the log likelihood.

With detailed derivation in Section B.4, differentiating ΛN with respect to Σx and equating to zero yields the first order condition

    C′ΣW⁻¹C = C′ΣW⁻¹ SW ΣW⁻¹C.   (3.10)

Here, C depends on β^(t), and ΣW = CΣxC′ + Σd^(t) from (3.9). Equation (3.10) holds provided ΣW is invertible at the critical value of Σx. Since we assume that σε^2(t) and σ^2(t) are positive, Σd^(t) is positive definite, so ΣW is invertible provided Σx is nonnegative definite.

We now solve (3.10) for Σx, first deriving two useful identities, (3.11) and (3.12). For ease, we drop the hats and superscript t's on the parameter estimates that are being held fixed, that is, on μ̂W, σε^2(t), β^(t), and σ^2(t). Direct multiplication and some manipulation of the left hand side of the following shows that

    (C′Σd⁻¹C) [(C′Σd⁻¹C)⁻¹ + Σx] C′ΣW⁻¹ = C′Σd⁻¹.

Solving this for C′ΣW⁻¹ yields

    C′ΣW⁻¹ = [(C′Σd⁻¹C)⁻¹ + Σx]⁻¹ (C′Σd⁻¹C)⁻¹ C′Σd⁻¹.   (3.11)

Postmultiplying both sides of identity (3.11) by C yields

    C′ΣW⁻¹C = [(C′Σd⁻¹C)⁻¹ + Σx]⁻¹.   (3.12)

Substituting (3.11) into the right side of (3.10) and (3.12) into the left side of (3.10) yields

    (C′Σd⁻¹C)⁻¹ + Σx = F SW F′, where F = (C′Σd⁻¹C)⁻¹C′Σd⁻¹.

Note that F is of full row rank. Thus, the critical point is

    Σ̂x = F SW F′ − (C′Σd⁻¹C)⁻¹ = F(SW − Σd)F′,   (3.13)

which is strictly positive definite, since SW − Σd > 0 by assumption and F is of full row rank. And so ΣW is clearly invertible at the critical point. To see that the updated Σ̂x leads to an increase in ΛN, we show that the Hessian matrix H(Σx) evaluated at Σ̂x is negative definite.
The ijth element of H(Σx) is the second order partial derivative of ΛN with respect to the ith and jth elements of the vectorized Σx. From calculations in Section B.4, we have

    H(Σ̂x) = −(N/2)(D̂ ⊗ D̂), where D̂ = C′Σ̂W⁻¹C,   (3.14)

which is clearly negative definite.

3.3.3 Updating σε^2(t)

We update σε^2(t), holding all other parameter estimates fixed, using one E-step and one M-step of the EM algorithm. We show that if σε^2(t) and σ^2(t) are positive and if Σx^(t) > 0, then our update σε^2(t+1) is positive. The increase of the log likelihood after replacing σε^2(t) by σε^2(t+1) is guaranteed by the properties of the EM algorithm.

Recall that (zi, Yi, xi), i = 1, …, N, are our complete data and Wi ≡ (zi′, Yi)′, i = 1, …, N, are the observed data. In conditional expectations, we let "·" stand for the observed data. Abusing notation slightly, we let f denote a generic density function, with its exact meaning clear from the arguments. The E-step of the EM algorithm calculates E_θ(t)(∑_{i=1}^N ln f(zi, Yi, xi) | ·) and the M-step maximizes this conditional expectation over σε² to obtain σε^2(t+1). By the conditional independence of zi and Yi given xi,

    ln f(zi, Yi, xi) = ln f(zi|xi) + ln f(Yi|xi) + ln f(xi).

Since only ln f(zi|xi) contains σε², we can ignore the last two terms and obtain σε^2(t+1) by maximizing E_θ(t)(∑_{i=1}^N ln f(zi|xi) | ·) over σε². From (3.4), we first get

    ∑_{i=1}^N ln f(zi|xi) = −(N/2) ln det(σε²R) − (1/(2σε²)) ∑_{i=1}^N (zi − μ − Axi)′R⁻¹(zi − μ − Axi).

Following (3.6), we have

    Cov(Wi, xi) = CΣx,   (3.15)

which then leads to the conditional mean and covariance of xi given Wi:

    E[xi|Wi] ≡ μ_{xi|Wi} = ΣxC′ΣW⁻¹(Wi − μW),   (3.16)

    Cov[xi|Wi] ≡ Σx|W = Σx − ΣxC′ΣW⁻¹CΣx.   (3.17)

Let

    s̃ = ∑_{i=1}^N (zi − μ̂ − Aμ^(t)_{xi|Wi})′R⁻¹(zi − μ̂ − Aμ^(t)_{xi|Wi}).

Routine calculations yield

    E_θ(t)(∑_{i=1}^N ln f(zi|xi) | ·) = −(N/2) ln(det R) − (nN/2) ln σε² − (1/(2σε²))[s̃ + N tr(R⁻¹AΣ^(t)_{x|W}A′)].
Differentiating this conditional expectation with respect to σε² and equating the derivative to zero yields

    σε^2(t+1) = (1/(nN)) s̃ + (1/n) tr[R⁻¹AΣ^(t)_{x|W}A′].   (3.18)

We now show that the update σε^2(t+1) is positive. The first term in (3.18) is positive by assumption (c) and the fact that R is positive definite. The second term is nonnegative by the following argument. Using the well-known matrix identity

    (VΣV′ + Σ0)⁻¹ = Σ0⁻¹ − Σ0⁻¹V(Σ⁻¹ + V′Σ0⁻¹V)⁻¹V′Σ0⁻¹,

valid provided the matrix orders are properly defined, we see that

    Σ^(t)_{x|W} = (Σx^(t)⁻¹ + C^(t)′Σd^(t)⁻¹C^(t))⁻¹,

which is positive definite. Given Σ^(t)_{x|W} > 0, assumption (a) then implies that AΣ^(t)_{x|W}A′ ≥ 0. Together with the fact that R > 0, the second term in (3.18) is thus nonnegative.

3.3.4 Updating β^(t) and σ^2(t)

The updates of β^(t) and σ^2(t) maximize ΛN over β and σ², holding the other parameters fixed. Suppose that σε^2(t) > 0 and Σx^(t) > 0. We find unique critical points, β̂ and σ̂², and show that they increase the log likelihood provided σ̂² > 0.

Note that log f(Yi, zi) = log f(Yi|zi) + log f(zi), and that log f(zi) does not depend on β or σ². We also note that, given zi, Yi is normal with mean E(Yi|zi) ≡ β0 + β′G(zi − μ) and variance

    σ²_{Y|z} ≡ Var(Yi|zi) = β′Kβ + σ²,   (3.19)

where

    G = TΣxA′Σz⁻¹   (3.20)

and K = TΣxT′ − TΣxA′Σz⁻¹AΣxT′. Therefore, to maximize ΛN with respect to β and σ², we maximize

    Λ̃N = −(N/2) ln(β′Kβ + σ²) − (1/(2(β′Kβ + σ²))) ∑_{i=1}^N (Yi − β0 − β′G(zi − μ))².   (3.21)

With detailed derivation in Section B.5, equating ∂Λ̃N/∂β and ∂Λ̃N/∂σ² to zero yields, respectively,

    (1/(β′Kβ + σ²)) ∑_{i=1}^N (Yi − β0 − β′G(zi − μ)) G(zi − μ) = 0,   (3.22)

    (1/(β′Kβ + σ²)²) [∑_{i=1}^N (Yi − β0 − β′G(zi − μ))² − N(β′Kβ + σ²)] = 0.   (3.23)

Note that G is of full row rank because of the following two observations. First, T is of full row rank by assumption (b).
Second, the matrix Σz = AΣx^(t)A′ + σε^2(t)R is invertible, since it is positive definite. Let

    M = G ∑_{i=1}^N (zi − μ)(zi − μ)′ G′.

Then, by assumption (c), M is positive definite. Solving (3.22) for β and (3.23) for σ² gives

    β^(t+1) = β̂ = M⁻¹G ∑_{i=1}^N (zi − μ)(Yi − β0),

    σ^2(t+1) = σ̂² = (1/N) ∑_{i=1}^N (Yi − β0 − β̂′G(zi − μ))² − β̂′Kβ̂.   (3.24)

Unfortunately, we are not guaranteed that σ̂² is positive. However, in all of our data analyses and simulation studies, the final estimate of σ² was always positive.

Again, to check that the update increases Λ̃N, we show that the Hessian matrix is negative definite. We note that (3.24) implies

    σ̂²_{Y|z} ≡ β̂′Kβ̂ + σ̂² = (1/N) ∑_{i=1}^N (Yi − β0 − β̂′G(zi − μ))²,   (3.25)

which is positive by assumption (d). With detailed calculation in Section B.5, the Hessian matrix H_Λ̃(β, σ²), when evaluated at β̂ and σ̂², equals

    H_Λ̃(β̂, σ̂²) = −(N/(σ̂²_{Y|z})²) [ 2Kβ̂β̂′K + (σ̂²_{Y|z}/N)M    Kβ̂
                                        β̂′K                        1/2 ].   (3.26)

It follows that H_Λ̃(β̂, σ̂²) < 0 by the following argument. Let x1 ∈

0 by the following argument.

    x′Σu*x = x′Σux + σ²(1 − s) x′(Z′Z)⁻¹Z′JZ(Z′Z)⁻¹x ≥ λm x′x + σ²(1 − s)λ x′x > 0.

Now suppose that HZJ ≠ J and suppose, by contradiction, that the model is not identifiable. Then, by Theorem 2.4.1, there exist nonidentical Σ and Σ* satisfying (2.3), and, since the rank of HZ is q, the rank of Σ − Σ* is at most q. We have

    Σ − Σ* = [(σ² − σ*²) − (σ²ρ − σ*²ρ*)]I + (σ²ρ − σ*²ρ*)J.

By Lemma A.1.1, the eigenvalues of Σ − Σ* are (σ² − σ*²) − (σ²ρ − σ*²ρ*), of multiplicity n − 1, and (σ² − σ*²) + (n − 1)(σ²ρ − σ*²ρ*), of multiplicity 1. Since Σ − Σ* is not the zero matrix, the eigenvalues cannot all equal 0: we must have either no eigenvalue equal to 0, one eigenvalue equal to 0, or n − 1 eigenvalues equal to 0. In order to have rank(Σ − Σ*) ≤ q, the eigenvalue of multiplicity n − 1 must be zero, since 1 ≤ q < n − 1 by assumption. That is, σ² − σ*² = σ²ρ − σ*²ρ*, and so Σ − Σ* = (σ² − σ*²)J. But plugging this into (2.3) yields HZJ = J, contradicting our assumption. □

A.2 Proof of Corollary 2.4.2

Proof: We first note a fact about the matrix HZ. Since HZ is symmetric and idempotent,

    HZ[k, k] = ∑_l (HZ[k, l])² = (HZ[k, k])² + ∑_{l≠k} (HZ[k, l])².

Thus, if HZ[k, k] = 1, then HZ[k, i] = HZ[i, k] = 0 for all i ≠ k.

To prove the corollary, we use Theorem 2.4.1 and proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, that a diagonal element of HZ is equal to 1. Without loss of generality, assume HZ[1, 1] = 1. Then, by the observation above, HZ[1, i] = HZ[i, 1] = 0 for all i ≠ 1. Fix Σ = diag{σ1², …, σn²} ∈ Θ̃, let σ1*² satisfy 0 < σ1*² < σ1², and define Σ* = diag{σ1*², σ2², …, σn²}. Then Σ − Σ* = diag{σ1² − σ1*², 0, …, 0}. It is not hard to check that (2.3) is satisfied. Clearly, for any Σu ∈ Θu, the Σu* defined as in (2.4) is also in Θu. Thus, the model is not identifiable, which contradicts our assumption.
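The construction in the preceding paragraph can be reproduced numerically. A sketch (with an arbitrary small example, not from the text): take Z = e1, so that HZ[1, 1] = 1, and perturb only the first diagonal element of a diagonal Σ; condition (2.3) then holds, confirming nonidentifiability:

```python
import numpy as np

n = 4
# A design whose hat matrix has a diagonal element equal to one:
# Z = e_1 gives H_Z = diag(1, 0, ..., 0).
Z = np.zeros((n, 1))
Z[0, 0] = 1.0
HZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
print(np.isclose(HZ[0, 0], 1.0))              # True

# Two diagonal covariances differing only in the first entry, as in
# the proof: Sigma - Sigma* = diag(sigma_1^2 - sigma_1*^2, 0, ..., 0).
Sigma      = np.diag([2.0, 1.0, 1.0, 1.0])
Sigma_star = np.diag([1.5, 1.0, 1.0, 1.0])
D = Sigma - Sigma_star

# Condition (2.3) holds, so the model is nonidentifiable.
print(np.allclose(HZ @ D, D))                 # True
```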
Now suppose that no diagonal element of HZ is equal to one and suppose, by contradiction, that the model is not identifiable. Then there exist nonidentical diagonal matrices Σ and Σ* satisfying (2.3). As Σ ≠ Σ*, at least one diagonal element of Σ − Σ*, say the kth, is not zero. By (2.3), the kth diagonal element of HZ must be one, contradicting our assumption. □

Appendix B
Appendix to Chapter 3

In this appendix, we provide the calculations of Sections 3.3.2 and 3.3.4, where we find the updates of Σx and {β, σ²} in the ECME procedure. In Section B.4, we derive the first order condition (3.10) and the Hessian matrix (3.14) of Section 3.3.2, where we maximize the log likelihood ΛN over Σx while holding the other parameters fixed. In Section B.5, we derive the first order conditions (3.22) and (3.23), and the Hessian matrix (3.26), of Section 3.3.4, where we maximize Λ̃N over {β, σ²} while holding the other parameters fixed.

We use the tools of matrix differential calculus, calculating first differentials to obtain the first order conditions and second differentials to obtain the Hessian matrices. The book by Magnus and Neudecker (1988) gives an elegant treatment of this subject. In Sections B.1–B.3, we follow the book to introduce some definitions and provide background, mainly from Part Two of the book. We keep the same notation as in the book. Throughout this section, chapter and page numbers all refer to (Magnus and Neudecker, 1988).

B.1 Definition of the first differential

We first give the definition of the first differential for a vector function (a vector valued function with a vector argument) and show that a function's first differential is connected with its Jacobian matrix. We then extend the definition to a matrix function (a matrix valued function with a matrix argument) and show how to identify the Jacobian matrix from the first differential.
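As a concrete instance of the connection between the first differential and the Jacobian (a numerical sketch, not from Magnus and Neudecker): for the linear vector function f(x) = Ax, the first differential is df = A dx, so the Jacobian identified from the differential is A itself, which a finite difference approximation confirms:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
f = lambda x: A @ x                            # linear vector function

# First differential: df(x; dx) = A dx, so the Jacobian is A.
# Check against a forward finite-difference approximation.
x0 = rng.standard_normal(4)
h = 1e-6
J = np.column_stack([(f(x0 + h * e) - f(x0)) / h for e in np.eye(4)])
print(np.allclose(J, A, atol=1e-4))            # True
```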
Definition B.1.1 Let f : S →