o', o,4>o) \u2014 H{o,), where H((f>o, 4>o) and H(4>o, ) are defined i n Theorem 6.2 (notice H(o, o) is the negative entropy, where the entropy H(Q) is defined i n Section 6.4). A s mentioned above, K provides a measure of distance between parameter points ; the definit ion of H(o, 4>) i n Theorem 6.2 shows that K(o, 4>) is the large-sample average K u l l b a c k - L e i b l e r divergence per observation between pn(yi,- \u2022 \u2022 ,yn\\o) a n d pn(yi, \u2022 \u2022 \u2022 ,yn] )- J u a n g and Rabiner (1985) use the cross-entropy as a measure of distance between hidden M a r k o v models i n a numerical study of the effects of s tar t ing values and observation sequence length on m a x i m u m l ike l ihood estimates. In this section we prove a result needed for the large sample analysis of m a x i m u m l ike l i -hood estimators, that the cross-entropy between two different points is posit ive. O b t a i n i n g this result is surpris ingly difficult and w i l l lead to another s tudy of the asymptot ic be-haviour of the log- l ikel ihood. K i n g m a n ' s subaddit ive ergodic theorem which was used in 86 the previous section does not include a representation of the hmi t as the expected value of some random variable, as does the classical ergodic theorem. In this section, we w i l l direct ly establish the convergence of the normalized log-l ikel ihood a n d , using the previous section's results to identify the h m i t random variable w i t h the constant H((j>o, ), obtain such a representation for H((j>o,). A s i n Section 6.4, we w i l l s tudy the log-l ikel ihood using the relat ion n log pn(yi, \u2022 \u2022 \u2022, yn; 4) = \u00a3 log Pi(yi | y , _ i , . . . , yi; 4>). However, instead of approx imat ing pi(Yi | Y i _ i , . . . , Y i ; 4>) by a stat ionary process, we define a new probabi l i ty measure (on an augmented probabi l i ty space), under which {pi(Y{ | Y f _ i , . . . , Y i ; )} is itself stationary. T h e quantities derived under this new probabi l -i t y space w i l l then be related back to quantities defined i n terms of the or iginal probabi l i ty space. T h e mot ivat ion for using this approach came from Furstenburg and Kesten (1960), who studied the convergence of products of random matrices and also f rom Petr ie (1969) who used the latter results to obtain the convergence of the log- l ikel ihood for f inite-valued h idden M a r k o v models. There is a connection w i t h K i n g m a n ' s theorems which were used i n the previous section; K i n g m a n applied his results to obta in those of Furstenburg and Kesten (1960). O n the other hand, the hmi t results for {qn} obtained using K i n g m a n ' s theorems could be proved using arguments s imilar to those of Furstenburg and Kesten (1960). T h e approach to be followed requires a careful accounting of the probabi l i ty spaces and measures involved. W e begin w i t h the probabi l i ty measure P$0 defined on (y, B); for notat ion, see Section 6.3. L e t f i be the set of sequences {(yn,u^)}, where the yn are reals and the are ro-dimensional vectors. Let P ^ o ^ be the probabi l i ty measure on f i defined as the image of P ^ 0 on the subset where = ctj(o), the s tat ionary probabil i t ies of the 87 stochastic m a t r i x [OJJA:(^ O)]> and \u2022 r \u00bb = E ; ' ^ ( ^ , * = i \u00bb , . = i , 2 . . . . (..ID (0\/0 is taken to be 0); {u^n^} is determined by {yn} on this subset, so this definit ion is meaningful . Le t YN and be the coordinate mappings on f i . T h e goal is to define a probabi l i ty measure on f i , under w h i c h {U^} is a stat ionary sequence, while {YN} has the same dis t r ibut ion as it does under P^. Let TQ be the shift transformation on f i , that is , Tu{(yn, u^)} = { ( j \/ n + 1 , \u00ab ( n + 1 ) ) } . Le t P^T^k be the prob-a b i l i t y measure on f i which is the inverse image of P$ $ under the kth iterate of T Q , that is , P'MT~k{A) = P^{u e f i : e A}, A e % (So, is the B o r e l a-field of f i , defined i n the same way as for y; see Section 6.3). Define new probabi l i ty measures P^^ = H^oPfo^T\"1 \/l, for \/ = 1,2, T h e fol lowing l emma is essentially due to Furstenburg and Kesten (1960). L e m m a 6.4 There is a subsequence {4} and a probability measure P^O )0 such that (i) for every p, the joint distribution of (Yi,U^),..., (YP,U^) under P^]p converges weakly to the corresponding joint distribution under P^0>4>; (ii) {(Yn,U^)} is a stationary process under Pfo^; and (iii) {Yn} has the same distribution under P as under P ^ 0 . Proof : Let Qpl) be the jo int d is t r ibut ion of ( Y i , tfW),..., (Yp, U^) under P ^ . B y the Hel ley selection theorem (Bil l ingsley, 1979, Theorem 25.9), for each p there is a subsequence ) U^) {lk }k=i s u c h that Qpk converges to a measure Qp w i t h mass less than or equal to one; we show Qp has mass one. G i v e n e > 0, let Kt be a compact subset of Rp for which Pj,0(Kt) > 1 \u2014 e; observe also that the range of is contained i n the compact set [0, l ] m . N o w , because a l l of the P^ $ agree on events defined i n terms of {Yn}, i f = 88 {((YIMl)),...,(yP,uW)): (yi,...,yP) G Kt,(uW,...,uW) e [0,1]\u2122}, then Q)P(KP) = Pfo(Kc) > 1 \u2014 e for every \/. Since is compact , this proves {QP^} is t ight , and i t follows that Qp has mass one (Bi l l ingsley, 1979, Theorem 29.3). B y construct ion, the measures Qp are consistent i n the sense that Qp+i agrees w i t h Qp on sets defined i n terms of the first p coordinates. Therefore, by the K o l m o g o r o v extension theorem, there is a measure -P^0 l <\/, on ft w i t h pth-OTdeT f inite dimensional d is t r ibut ion Qp for every p. A l s o , since {\/jjf4^} can be assumed to be a subsequence of {\/j^ }, the diagonal method (Bi l l ingsley, 1979, p . 292) implies that there is a subsequence {lk} such that the pth-ordeT f inite dimensional distr ibutions of P^\\ converge to Qp for every p. Since {Yn} is a stationary process under P^0t^, a l l P^o 0,i agree w i t h P^0oAT~lA)i A\u00a3Ba\\ t l l i s follows f rom i=0 \u2022 R e m a r k It can further be shown that P$*\\ itself converges weakly to -P<^ o,0, but this w i l l not be needed. For the case 4> \u2014 0{Xi = j | Y J _ i , . . . , Y 1 } under P^0i^o; hence, the operat ion of shift ing the 89 t ime scale and tak ing the l i m i t to obta in has the effect of convert ing C\/j ' in to a con-di t iona l probabi l i ty depending on inf initely many past values of {Yi} . M o r e precisely, C\/j1^ represents P^0{X\\ = j \\ Y 0 , Y _ 1 ( . . . } i n the sense (to be proved i n L e m m a 6.6 below) that the condit ional density of Y i , . . . , Y ; given under P^ 0 > ( \/ , 0 , is U^pi(yu..., y{ | j;0). Therefore, the entropy H(o), defined in Section 6.4 by H(4>o) = E0[\u2014 l o g { \u00a3 j P^0{Xi = j | Y0fY-1,...}f(Y1,ej)}], is seen to be equal to E M [ - l o g ^ - uf* f(Yx, 9j)}], w i t h the consequence that this representation can be extended to parameters other t h a n Q, as i n the fol lowing result. L e m m a 6.5 Assume conditions 1, 3, and 6 hold. Then, for every ))}]. Proof : T h e ergodic theorem implies ^ { ^ ^ E i < \u00ab { E ^ 0 \/ ( y . - , W))> = z} = i, i=i j where Z is a random variable w i t h EM[Z) = E^^og^j Uj^f(Xu 8j())}]-Fi rs t we show E^,0t^[Z] < H()- T h e recursion relations (6.11) i m p l y \u00a3 ui2)f{Y2, ek{)) = \u00a3 \u00a3 ^ \/ ( Y x , W ) ) \/ ( Y 2 , ek{4>))i \u00a3 u^m, e^)) k j k j = S ^ f t e . ^ I * * ) \/ \u00a3 cfW* I 3\\4>), (6-12) and i terat ing gives E^ , V (y\u00bb ,W)) = J\/fW^ i,..., Yi | j; <\u00a3)\/ O-fWiC^, \u2022 \u2022 \u2022, I J'; <\/>)\u2022 (6-13) Therefore, \u00a3 l o g { \u00a3 c f \/ ( K - , W ) ) } = i o g { \u00a3 f \/ f p n ( ^ , - - - ^ n l i ; ^ ) } !=1 j J < l o g g n ( Y i , . . . , Y n ; 0 ) , 90 and , since {Y,} has the same dis t r ibut ion under P ^ 0 as under Pfa^, the proof of Theorem 6.2 gives Z < H(4>0,4>). N e x t we show E^0t^[Z\\ > H(4>o, 4>). Assume, wi thout loss of generality, H( - o o . U s i n g the fact that the joint d is t r ibut ion of (Y\\, U^) under P ^ J , converges weakly to the corresponding d is t r ibut ion under p0 o ,0 , we get (Bi l l ingsley, 1979, Theorem 25.11) l i m s u p \/ log{\u00a3t\/f)\/(^,W))}^S< \/ l o g i ^ J \/ f \/ m , ^ ^ ) ) } ^ ^ , k J a j J a j where A = {log{\u00a3i ^ \/ ( Y i , * ; ( > ) ) } < 0}. A l s o , since (log{\u00a3jUPf(Yi, W))})+ is uni formly integrable w i t h respect to P$*\\ by condi t ion 6, l i m \/ l o g { \u00a3 c f V ( ^ i , W))}^i = \/ l o g i E ^ \/ m ^ i W ) ) } ^ , * -\u00ab JQ\\A j Jn\\A j Therefore, Z = E+0,4\\og{Y,uVm,93i))}} > h m s u p ^ ^ l o g i ^ C \/ f ^ Y ! , ^ ^ ) ) } ] j k i = l i m s u p I \u00a3 l ^ P o g E ffj'VW, W))}] = l i m s u p l o g { \u00a3 c f \/ ( Y \u201e 9M))}] k k !=1 j = l imsupj f3^ 0 [ log{\u00a3<* j { (h )p i k (Y i , . . . ,Y [ k \\ j;)}] k the second last equality follows f rom (6.13) and f\/ j 1^ = aj(o) on the support of P^o ^, and the last f r o m the proof of Theorem 6.2, using aj((j>o) > 0 (which follows f rom the i r reducib i l i ty of [co)])- 1=1 T h e representation in the lemma is used next to prove that the cross-entropy between two different parameter points is positive. 91 L e m m a 6.6 Assume conditions 1-3, 5, and 6 hold. For every E $c, K(o, ) > 0. \/\/ rf, o, ) > 0. Proof : T h e first step is a verif ication of the property (described fol lowing L e m m a 6.4) of the jo int d i s t r ibut ion of Y i , . . . ,Yi, IjW under P ^ 0 , ^ 0 , namely that ( Y i , . . . , YJ) has the con-d i t i o n a l density J2j UJ^pi(yi, ...,yi\\ j; 4>o), given U^. T h e case i = 2 w i l l be considered; the general case is verified s imilar ly . Le t Q be the dis t r ibut ion of U^> under P ^ 0 , ^ 0 ; then, i f B is a cont inui ty set of Q, P*o,*o{(Yi,Y2)eA,uW\u20acB} 1 ^ f f = l i m r \u00a3 \/ \/ E u i J ) P 2 ( y i , i \/ i + i I i j ^ o J d\/i^ O r f M i \/ i + O d Q K ^ ) = \/ \/ \u00a3 uiP2 ( 2\/i,yi+i I J\\4>o)dii{yi)du(y2)dQ{-lk\\u) k J B J A 3 = j z\\2u3P2(yi,yi+i \\ 3]o)d\u00bb(yi)dp(y2)dQ{u), where Q{ and Q(0 are the distr ibutions of \u00a3\/(') under P ^ ^ and \u00a3 ( - = 1 ^ 0 fl respec-t ively ; the second equal i ty follows f rom Ijf = P0,0) = ^ o , 0 o [ l o g { E ^ 1 ) \/ ( y i , ^ ( ^ o ) ) } + l o g{j : J \/ j 2 )\/(y2 , ^ ( ^ o ) ) } ] 3 3 = E^logiJ^U^MYi^ | j ; ^ o ) } ] 3 = J J J E u i K(!'i^2 | j;0o)log{^ttiP2(yi,W I J;o)}d\/J,(yi)dfj,(y2)dQ(u). 3 3 Next we extend the construct ion of L e m m a 6.4 to simultaneously include two sequences, { w h i c h satisfies (6.11) w i t h the parameter value 0, and { V W } w h i c h satisfies (6.11) 92 w i t h the parameter value (f>. T h e n , as above, 2#Oo,<\u00a3) = ^0,41og{X V^p2{Yu Y2 | j; )}] 3 = JjJ ^ \u00a3 ^ 2 ( ^ 1 , 2 \/ 2 1 j ; < M l o g { \u00a3 ^ ^ 3 3 where Q'(-, \u2022) is the d is t r ibut ion of ( { j W , W 1 ).) under P^0lo,) = HUE jUjp2(yi,y2 | j; Q) log j ^ y j P 2 ( y i *yl\\j;) } dn{y\\)dn(y2)dQ'(u, v). Since the inner integral , for f ixed u, v, is the K u l l b a c k - L e i b l e r divergence between two mixture densities, K(o, 4>) > 0 a n d , i f K(o, ) = 0, then this K u l l b a c k - L e i b l e r divergence is zero for almost every pair u, v (wi th respect to Q'(-, \u2022))\u2022 La the latter case, L e m m a 2.1 and L e m m a 2.2 i m p l y Y,llu3a3k{o\u00a5(ej(^)M*o)) = \u00a3 \u00a3 W * W(*>(*)A(*)) 3 k j k for almost every pair u, v (wi th respect to Q'(-, \u2022)), where 8 denotes the d i s t r ibut ion func-t ion of a point mass. Since has the dis t r ibut ion of P^Xi = j | Y 0 , Y L j , . . . ; 0}, E$o,Wj1^] \u2014 (Xji&o)- Therefore, K(0,) = 0 implies E E a i ( ^ ) a i ^ W ( \u00ab y ( * o ) A W o ) ) = HYlf v3dQ'(u,v)ajk()M)); 3 k 3 k hence and o define the same stat ionary laws for (#A'I ^ X2) a n ( i s o 4> ~ n 6.7 Consistency of the m a x i m u m likelihood estimator W e can now present the m a i n result, which concerns the consistency of the m a x i m u m l ikel ihood estimator. T h e results of the previous sections al low the appl icat ion of the basic 93 strategy invented by W a l d (1949) and further developed by Kiefer and Wol fowitz (1956). Consistency must be stated i n terms of convergence of the equivalence class of the max-i m u m l ike l ihood estimate 4>n (see Section 6.2). W e w i l l obtain convergence i n the quotient topology defined relative to the equivalence relation ~ . Redner (1981) used convergence i n this sense for estimators of the parameters of a finite mix ture d is t r ibut ion (see Section 2.4). Consistency i n the sense of the quotient topology s imply means that any open subset of the parameter space $ c w h i c h contains the equivalence class (fo of the true parameter must , for large n, contain the equivalence class of < n^. Theorem 6.3 Assume conditions 1-6 hold. Let o be the true parameter value and let 4>n be a maximum likelihood estimator. Then (f>n converges to ) denote l o g g n ( l i , . . . , Yn; )]\/n = II(0,) < H (o), by L e m m a 6.6 and the proof of L e m m a 6.2; hence there is an e > 0 and integer n e such that E^0[logqne((j))]\/n\u20ac < H(o,(f>o) \u2014 e. N o w , qne{-) is continuous, and , using the integrabi l i ty condi t ion 6, J5^ o [ ( logsup0. e C ) ^ q , n c ( < ^ ' ) ) + ] < oo, for a smal l enough neighbourhood 0$ of \\ therefore, there is an open neighbourhood Oj, for w h i c h E')}\/ne < E^log qn\u20ac()]\/ne + e\/2 < H{0, 0) - e\/2. A s i n the proof of L e m m a 6.2, 0 > log{ sup pn((f>')\/ sup qn{4>')} > log m i n a 1 ; 1 * , and so log{sup^\/ e C ) ^ Pn(,')}\/n and l o g { s u p ^ e C ) ^ qn(4>')}\/n have the same l i m i t i n g behaviour as n \u2014*\u2022 oo. A l s o , Wst \u2014 l \u00b0 g s u p 0 \/ g C ^ qt-9(') satisfies the conditions of K i n g m a n ' s subad-di t ive ergodic theorem (see equations (6.8), (6.9), and (6.10)), and hence def ff(<\u00a3o,')}\/n = H(0, ; O ^ ) , w i t h probabi l i ty one. B y another property of subaddit ive processes ( K i n g m a n , 1976, The-orem 1.1), H(o,;C>4,) = i n f ^ 0 [ l o g sup qn(')]\/n, 71 so that H(0,;\u20ac><\/,)< Epilog sup qM')]\/nt. Therefore, w i t h probabihty one, l i m l o g sup pn(')\/n = H(4>0,\\O4>) < H(0,4>0) - e\/2. Let C be a closed subset of $ c , not containing any points of the equivalence class 4>Q. Since $ c is compact , C is compact and so is covered by the union U^ = 1OA , where {<\/>i,... ,4>d} is a finite subset of C and Oh = 0(f,h. Therefore, w i t h probabi l i ty one, sup(logp n (<\u00a3) - logp n (<\u00a3 0 ) ) = max{ log sup pn() - l o g p n ( ^ 0 ) } ->\u2022 - o o , 4>ec h e0h w h i c h implies that , for any open subset O of <&c which contains the equivalence class 0, 4>n \u00a3 O for large n. If follows that the m a x i m u m l ike l ihood estimator converges to o i n the quotient topology, w i t h probabihty one. \u2022 95 Chapter 7 F u t u r e r e s e a r c h This chapter concerns three topics for future research: (i) a class of doubly stochastic Poisson processes which has the features of the hidden Markov model in continuous time; (ii) extensions of the large sample results of Chapter 5 to allow inference based on the maximum likelihood estimators for hidden Markov models; and (iii) hypothesis testing in mixture models for independent observations. We will give suggestions of specific problems and, in some cases, describe some preliminary work addressing these problems. 7.1 D o u b l y stochastic Poisson processes Let X(t) be a continuous time Markov chain with states 1,... , m and, conditional on the entire sample path of X, let N(t) be a non-homogeneous Poisson process with intensity function Ox(t)- Then N(t) is an example of a doubly stochastic Poisson process or Cox process. This class of point processes was proposed by Cox (1955) and has been studied extensively (see GrandeU, 1976, Snyder, 1975, and Karr, 1986). We have begun to investigate the computation of maximum likelihood estimates by the E M algorithm, using an analogy with the hidden Markov model with Poisson components. Conditional probabilities for X(t) given observation of N(t) on an interval can be calculated using a forward-backward algorithm. The recursive equations of Section 5.2 are replaced by 96 differential equations to describe the evolution over t ime intervals between observed events and u p d a t i n g equations to describe the evolut ion at event times. T h e differential equations yield explicit so lut ion, thus a l lowing simple evaluation of the required condit ional means. T h e relevant equations for P{X(t) = j \\ N(s) : 0 < s < t}, j \u2014 1 , . . . , m , have been given i n studies of f i l ter ing for C o x processes (Yashin , 1970, R u d e m o , 1972, and Snyder, 1972). W e mention that S m i t h and K a r r (1985) discuss m a x i m u m l ike l ihood est imation for the case that the intensity process takes only two values, one of which is zero, and D e m b o and Ze i touni (1986) use the E M algor i thm for a C o x process w i t h an intensity process which is a diffusion. T h e consistency of the m a x i m u m l ikel ihood estimators, as the observation period be-comes infinitely long, could perhaps be established using extensions of the arguments of Chapter 6 to the continuous t ime setting. For instance, the not ion of identi f iabi l i ty seems to carry over immediate ly w i t h an inf initesimal m a t r i x of transi t ion rates replacing the transi t ion probabihty matr ix . T h e subaddit ive ergodic theorem of K i n g m a n admits certain extensions to continuous t ime (see the discussion of this point i n K i n g m a n , 1976). Some of the other constructions, the cross-entropy, for instance, require a l i t t le more work. T h e questions of asymptot ic normal i ty of the estimators, x2 approximations of l ikel ihood rat io test statistics, and computat ion of the observed informat ion arise. T h e computat ion of the observed informat ion matr ix can be achieved by apply ing the general formula of Louis (1982). These questions should be addressed because of the great importance attached to C o x processes i n the field of point process model l ing. A l s o , the test of the one-component model against models containing more than one component provides a test of the hypothesis of a Poisson process which is sensitive to clustering of events. Ogata (1978) and Konecny (1986) develop asymptot ic theory of m a x i m u m l ikel ihood estimators for C o x processes, but 97 their results do not immediate ly apply to the model considered here. 7.2 Inference for hidden M a r k o v models It seems reasonable to believe that the m a x i m u m l ike l ihood estimators for the hidden M a r k o v model of Chapters 5 and 6 are asymptot ica l ly normal , w i t h large sample variances obtained f rom the informat ion m a t r i x i n the usual way; some restrict ion on the true param-eter value s imilar to that required for finite mixture distr ibutions w i l l l ikely be necessary. (Louis 's method for calculat ing informat ion can also be applied i n this model.) A potent ia l ly impor tant appl icat ion of the large sample results is to testing independence of a sequence of r a n d o m variables; since the independent mix ture model is nested w i t h i n the h idden M a r k o v model , the l ikel ihood rat io test could be used. T h e test statistic based on the m-component h idden M a r k o v model might have the usual large sample x2-distribution under the right conditions on the n u l l value of the parameter (for example, that it defines a mix ture d is t r ibut ion wi th exactly m components). 7.3 Hypothesis testing in mixture models In Section 2.4, we explained why the usual large sample x2 approximat ion for the l ikel ihood rat io test statist ic does not hold (at least cannot be obtained i n the usual way) for a n u l l hypothesis specifying a parametric fami ly of densities, w i t h the alternative specifying a class of mixtures f rom that family. H a r t i g a n (1985) showed, for a normal mixture family, that the x2 approx imat ion certainly does not ho ld , as the test statist ic converges to oo w i t h the sample size, under the n u l l hypothesis. G h o s h and Sen (1985) proved that a l ikel ihood rat io test stat ist ic , obtained under some constraints on the parameters, possesses a l i m i t i n g d is t r ibut ion expressed i n terms of a funct ional of a Gaussian process; this result appears to 98 have l i m i t e d use because of the diff iculty of determining the d is t r ibut ion of this funct ional . A s i d e f r o m the importance of mix ture models, tests of mixture hypotheses can be applied to hypothesis test ing situations not direct ly related to mixtures . For instance, the l ikel ihood rat io test of a pure Poisson dis t r ibut ion against a Poisson mixture yields a Poisson goodness of fit test w i t h the potent ia l to be sensitive to over-dispersion. W e undertook a smal l s imulat ion s tudy to compare the power of this test w i t h the test based on the index of dispersion. T h i s s tudy indicated that the usual x2 approximat ion does not hold for the l ikel ihood ratio test statistic and also that the potent ia l for improvement i n power over the dispersion test is rather l i m i t e d . 99 References A s k a r , M . and D e r i n , H . (1981). A recursive a lgor i thm for the Bayes solution of the smooth-i n g problem. IEEE Trans. Automatic Control, 26, 558-560. B a u m , L . E . and E a g o n , J . A . (1967). A n inequality w i t h applications to statist ical estima-t ion for probabi l is t ic functions of M a r k o v processes and to a model for ecology. Bull. Amer. Math. Soc, 73, 360-363. B a u m , L . E . and Petr ie , T . A . (1966). Stat is t ical inference for probabi l is t ic functions of finite state M a r k o v chains. Ann. Math. Statist., 37, 1554-1563. B a u m , L . E . , Petr ie , T . , Soules, G . , and Weiss, N . (1970). A maximiza t ion technique occur-r i n g i n the stat ist ical analysis of probabil ist ic functions of M a r k o v chains. Ann. Math. Statist, 41, 164-171. B b h n i n g , D . (1982). Convergence of Simar's a lgor i thm for f inding the M L E of a compound Poisson process. Ann. Statist., 10, 1006-1008. B o h n i n g , D . (1985). N u m e r i c a l est imation of a probabi l i ty measure. J. Stat. Plan. Inf., 11, 57-69. Bi l l ingsley, P . (1965). Ergodic Theory and Information. W i l e y , N e w Y o r k . Bi l l ingsley, P . (1979). Probability and Measure. W i l e y , New Y o r k . B u t l e r , R . W . (1986). Predic t ive l ike l ihood inference w i t h applications ( w i t h Discussion). J.R.S.S. B 48, 1-38. C h u r c h i l l , G . A . (1989). Stochastic models for heterogeneous D N A sequences. Bull. Math. Biol, 51, 79-94. C o x , D . R . (1955). Some stat ist ical models connected w i t h series of events. J.R.S.S. B 17, 129-164. D e m b o , A . and Ze i touni , O . (1986). Parameter est imation of part ia l ly observed continuous t ime stochastic processes v i a the E M algor i thm. Stoch. Proc. Appl., 23, 91-113. Dempster , A . P . , L a i r d , N . M . , and R u b i n , D . B . (1977). M a x i m u m l ikel ihood est imation f rom incomplete data v i a the E M algor i thm (wi th discussion). J.R.S.S. B , 39, 1-38. Devi jver , P . A . (1985). B a u m ' s forward-backward a lgor i thm revisited. Pattern Recognition Letters 3, 369-373. E d e l m a n , D . (1988). E s t i m a t i o n of the m i x i n g d is t r ibut ion for a n o r m a l mean w i t h appl i -cations to the compound decision problem. Ann. Stat. 16, 1609-1622. Feller, W . (1943). O n a general class of contagious distr ibutions. Ann. Math. Stat., 14, 389-399. Furstenberg, H . and Kes ten , H . (1960). Products of random matrices. Ann. Math. Stat., 31, 457-469. Geisser, S. (1975). T h e predict ive sample reuse method w i t h applicat ions. J.A.S.A., 70, 320-328. 100 G h o s h , J . and Sen, P . K . (1985). O n the asymptot ic performance of the log l ikel ihood ratio statistic for the mixture model and related results. Proc. of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, V o l . II ( L . M . L e C a m and R . A . Olshen, eds.), p p . 789-806. G i l b e r t , E . J . (1959). O n the identi f iabi l i ty problem for functions of finite M a r k o v chains. Ann. Math. Stat, 30, 688-697. G r a n d e l l , J . (1976). Doubly Stochastic Poisson Processes. Springer-Verlag, B e r l i n . H a r t i g a n , J . A . (1985). A failure of l ikel ihood asymptotics for normal mixtures . Proc. of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, V o l . II, ( L . M . Le C a m and R . A . Olshen, eds.), pp . 807-810. Hathaway, R . J . (1985). A constrained formulat ion of maximum-l ike l ihood est imation for N o r m a l mixture dis tr ibut ions . Ann. Stat. 13, 795-800. H e c k m a n , J . and Singer, B . (1984). A method for m i n i m i z i n g the impact of d is t r ibut ional assumptions i n econometric models for durat ion data . Econometrica, 52, 271-320. H e n n a , J . (1985). O n est imating the number of constituents of a finite mixture of continuous dis tr ibut ions . Ann. Inst. Statist. Math. 37, P a r t A , 235-240. Jeffreys, H . (1932). A n alternative to the refection of observations. Proc. Royal. Soc. Lon-don A 137, 78-87. Jewel l , N . P . (1982). M i x t u r e s of exponential distr ibutions. Ann. Statist., 10, 479-484. Juang , B . - H . and Rabiner , L . R . (1985). A probabil ist ic distance measure for hidden M a r k o v models. AT&T Technical Journal, 64, 391-408. K a r l i n , S. and Taylor , H . M . (1975). A First Course in Stochastic Processes. Academic Press, N e w Y o r k . K a r r , A . F . (1986). Point processes and their statistical inference. Dekker , N e w Y o r k . Kie fer , J . , and Wol fowi tz , J . (1956). Consistency of the m a x i m u m l ikel ihood estimator i n the presence of infinitely m a n y nuisance parameters. Ann. Math. Statist., 27, 887-906. K i n g m a n , J . F . C . (1976). Subaddit ive processes. In Ecole d'Ete de Probabilites de Saint-Flour V-1976, E d : P . L . Hennequin , Lecture Notes i n Mathemat ics 539, pp . 167-223, Springer-Verlag, B e r l i n . K i t a g a w a , G . (1987). N o n - G a u s s i a n state-space model ing of nonstationary t ime series. J.A.S.A., 82, 1032-1041. K o h n , R . and Ansley , C F . (1987). Comment on K i t a g a w a (1987). J.A.S.A., 82, 1041-1044. Konecny , F . (1986). M a x i m u m l ikel ihood est imation for doubly stochastic Poisson processes w i t h par t ia l observations. Stochastics, 16, 51-63. L a i r d , N . M . (1978). Nonparametr ic m a x i m u m l ikel ihood est imation of a m i x i n g dis tr ibu-t ion . J.A.S.A., 73, 805-811. 101 Levinson , S . E . , Rab iner , L . R . , and Sondhi , M . M . (1983). A n introduct ion to the applica-t ion of the theory of probabil ist ic functions of a M a r k o v process i n automatic speech recognit ion. Bell System Technical Journal, 62, 1035-1074. L i n d g r e n , G . (1978). M a r k o v regime models for mixed distr ibutions and switching regres-sions. Scand. J. Statist., 5, 81-91. L indsay , B . G . (1981). Properties of the m a x i m u m l ikel ihood estimator of a m i x i n g dis tr i -b u t i o n . In Statistical Distributions in Scientific Work (Eds . C . Ta i l l ie et al.), V o l . 5, 95-110. L indsay , B . G . (1983a). T h e geometry of m i x i n g l ikel ihoods: a general theory. Ann. Statist., 11, 86-94. L indsay , B . G . (1983b). T h e geometry of m i x i n g l ikel ihoods, part II: the exponential family. Ann. Statist, 11, 783-792. L i n h a r t , H . and Z u c c h i n i , W . (1986). Model Selection. W i l e y , N e w Y o r k . Loeve, M . (1977). Probability Theory I, 4 th edit ion. Springer-Verlag, N e w Y o r k . L o u i s , T . A . (1982). F i n d i n g the observed information matr ix when using the E M algor i thm. J.R.S.S. B 44, 226-233. M a c d o n a l d , P . D . M . (1975). E s t i m a t i o n of finite d is t r ibut ion mixtures . In Applied Statistics, R . P . G u p t a ( E d . ) . N o r t h - H o l l a n d , A m s t e r d a m , pp . 231-245. M c L a c h l a n , G . J . and Bas ford , K . E . (1988). Mixture Models. Dekker , N e w Y o r k . Newcomb, S. (1886). A generalized theory of the combinat ion of observations so as to obtain the best result. Amer. J. Math., 8, 343-366. Ni jenhuis , A . and W i l f , H . S . (1978). Combinatorial Algorithms, 2nd ed. Academic Press, N e w Y o r k . O g a t a , Y . (1978). T h e asymptot ic behaviour of m a x i m u m l ikel ihood estimators for station-ary point processes. Ann. Inst. Statist. Math., 30, P a r t A , 243-261. O r c h a r d , T . and W o o d b u r y , M . A . (1972). A missing informat ion principle : theory and ap-plicat ions. Proc. of the Sixth Berkeley Symp. on Math. Statist, and Prob. 1, 697-715. Petr ie , T . (1969). Probabi l i s t i c functions of finite state M a r k o v chains. Ann. Math. Statist., 40, 97-115. P f a n z a g l , J . (1988). Consistency of m a x i m u m l ikel ihood estimators for certain nonparamet-ric families, in part icular : mixtures . J. Stat. Plan. Inf., 19, 137-158. Phelps , R . R . (1966). Lectures on Choquet's Theorem. V a n N o s t r a n d , Pr ince ton . Redner , R . (1981). Note on the consistency of the m a x i m u m l ikel ihood estimate for non-identif iable distr ibutions. Ann. Stat., 9, 225-228. Redner , R . A . and Walker , H . F . (1984). M i x t u r e densities, m a x i m u m l ikel ihood and the E M a lgor i thm. SI AM Rev., 26, 195-239. 102 Robbins , H . (1956). A n empir ica l Bayes approach to statistics. Proc. of the Third Berkeley Symp. on Math. Statist, and Prob. 1, 157-163. Roberts , A . W . and Varberg , D . E . (1973). Convex Functions. Academic Press, New Y o r k . R u d e m o , M . (1972). D o u b l y stochastic Poisson processes and process control . Adv. Appl. Prob. 4, 318-338. Schwarz, G . (1978). E s t i m a t i n g the dimension of a model . Ann. Statist., 6, 461-464. Silvey, S . D . (1980). Optimal Design. C h a p m a n and H a l l , L o n d o n . S imar , L . (1976). M a x i m u m l ikel ihood est imation of a compound Poisson process. Ann. Statist, 4, 1200-1209. S m i t h , J . and K a r r , A . (1985). Stat is t ica l inference for point process models of rainfal l . Water Resour. Res., 21, 73-79. Snyder , D . L . (1972). F i l t e r i n g and detection for doubly stochastic Poisson processes. IEEE Trans. Info. Thy., IT-18, 91-102. Snyder , D . L . (1975). Random Point Processes. W i l e y , N e w Y o r k . Stone, M . (1974). Cross-val idatory choice and assessment of stat ist ical predictions (wi th Discussion) . J.R.S.S. B , 36, 111-147. Stone, M . (1977). A n asymptot ic equivalence of choice of model by cross-validation and Akaike 's cr i ter ion. J.R.S.S. B , 39, 44-47. Sundberg, R . (1974). M a x i m u m l ikel ihood theory for incomplete data f r o m an exponential family. Scand. J. Statist. 1, 49-58. Sundberg, R . (1976). A n iterative method for solution of the l ike l ihood equations for i n -complete data f r o m exponential families. Comm. Stat. B 5, 55-64. Teicher, H . (1960). O n the mixture of distr ibutions. Ann. Math. Statist., 31, 55-73. Teicher, H . (1961). Identif iabil i ty of mixtures . Ann. Math. Statist, 32, 244-248. Teicher, H . (1963). Identif iabil i ty of finite mixtures . Ann. Math. Statist, 34, 1265-1269. Teicher, H . (1967). Identif iabil i ty of mixtures of product measures. Ann. Math. Statist., 38, 1300-1302. T i t t e r i n g t o n , D . M . , S m i t h , A . F . M . and M a k o v , U . E . (1985). Statistical Analysis of Finite Mixture Distributions. W i l e y , N e w Y o r k . Varadara jan , V . S . (1958). O n the convergence of sample probabi l i ty distr ibutions. Sankhya 19, 23-26. W a l d , A . (1949). Note on the consistency of the m a x i m u m l ikel ihood estimate. Ann. Math. Statist, 20, 595-601. West , M . , H a r r i s o n , P . J . , and M i g o n , H . S . (1985). D y n a m i c generalized l inear models and Bayesian forecasting. J.A.S.A., 80, 73-83. 103 W i t t m a n n , B . K . , R u r a k , D . W . , and Taylor , S. (1984). Real - t ime ultrasound observation of breathing and body movements i n fetal lambs f rom 55 days gestation to term. Abstract presented at the XI Annual Conference, Society for the Study of Fetal Physiology, O x f o r d . Y a s h i n , A . I . (1970). F i l t e r i n g of j u m p processes. E n g l , trans, of Avtomatika i Telemekhanika 5, 52-58. 104 Appendix A Fortran program for computing maximum penal-ized likelihood estimates C This program calculates maximum likelihood estimates for mixtures of C Poisson distributions. For each value of m = # of components, C the constrained mle i s found using the EM algorithm C with starting values obtained from partitions of the data values. real lambda(6),lmax(6),p(6),pmax(6) real 11, llprev, Umax, aic, bic, t o l , mean integer f(8),x(8),group(8),fpos(8),t,h,r(99).total,rtotal integer m,k,j,flag,npos,conv,n,max,it,dim,totit,maxit C Read in the data, maximum # of iterations and stopping criterion. read(l,120) (f(j),j=l,8) 120 format(8i5) read(1,130) maxit,tol 130 format(i6,f9.6) max=l do 10 j=l,8 10 if(f(j).gt.0) max=j n=f(l) do 15 j=2,max 15 n=n+f(j) write(7,705)(f(j),j=l,max) 705 formatC Frequency distribution: ',15i5\/) write(7,706)maxit,tol 706 format(' max. # i t e r . : ' , i 6 , ' stopping criterion:',g8.1) mean=f(2) do 20 j=3,max 20 mean=mean+f(j)*(j-l) mean=mean\/n if(max.gt.2) goto 25 write(7,710)mean 710 format(' Only 0,1 counts so mle has 1 component: ',f8.4) stop C Output results for 1 component. 25 m=l llprev=(mean*alog(mean)-mean) *n dim=l aic=llprev-dim bic=llprev-0.5*dim*alog(float(n)) write(7,718) 718 formatC\/' ',23x,' # EM ',' log-') write(7,719) 719 formatC ',23x,'iterations ', + 'likelihood' ,10x, 'AIC ,12x, 'BIC') write(7,720) llprev,aic,bic 720 formatC\/' 1 component:',21x,3gl5.6) write(7,730) mean 730 formatC Estimate of the mean: ',fll.5) C Replace the frequencies with non-zero frequencies and corresponding values. npos=0 do 30 j=l,max if(f(j).eq.0) goto 30 npos=npos+l 105 fpos(npos)=f(j) x(npos)=j-l 30 continue C Iterate over the number of components. 98 m=m+l totit=0 dim=2*m-l flag=0 llmax=llprev C Enumerate a l l partitions of the data into m groups. For each c a l l the mle C procedure with starting values calculated from the partition. total=npos-m r(l)=total t=total h=0 do 911 k=2,m 911 r(k)=0 915 rtotal=0 do 40 k=l,m lambda(k)=0.0 p(k)=0.0 do 42 j=rtotal+l,rtotal+r(k)+l group(x(j)+l)=k p(k)=p(k)+fpos(j) 42 lambda(k)=lambda(k)+x(j)*fpos(j) if(p(k).lt.l.0) goto 900 lambda(k)=lambda(k)\/p(k) p(k)=float(p(k))\/n 40 rtotal=rtotal+r(k)+l write(8,840)(x(j),j =1,npos) 840 formatC count:', 15i4) write(8,845)(group(x(j)+l),j=l,npos) 845 formatO group:15i4) conv=0 c a l l em(fpos,x,npos,p,lambda,n,m,maxit,tol,conv,it,11) totit=totit+it if(11.It.Umax) goto 900 flag=l llmax=ll convmax=conv do 50 k=l,m pmax(k)=p(k) 50 lmax(k)=lambda(k) C Here we generate the next partition of the data. 900 if(r(m).eq.total) goto 999 i f ( t . g t . l ) h=0 h=h+l t=r(h) r(h)=0 r(l)=t-l r(h+l)=r(h+l)+l goto 915 C A l l partitions have been generated. 999 if(flag.eq.l) goto 199 write(7,740) m 740 format(' No increase in likelihood for',i2,' components') stop 199 aic=llmax-dim bic=llmax-0.5*dim*alog(float(n) ) 106 ifCconvmax.eq.O) goto 101 write(7,770) m,totit,Umax,aic,bic 770 formatC\/' ' , i l , ' components:',llx,i6,3x,3gl5.6) goto 102 101 write(7,771) m,totit,Umax,aic,bic 771 formatC\/' ' , i l , ' components:',' (no conv.)',i6,3x,3gl5.6) 102 write(7,769) 769 formatC Estimates: probability mean') write(7,790)(pmax(k).ImaxCk),k=l,m) 790 formatC ', 19x,f7.5,f 9.5) if(llmax-llprev.lt.1.0e-6*abs(llprev)) stop llprev=llmax if(m.lt.npos) goto 98 stop end C This subroutine computes log-likelihood and posterior prob's of E-step. subroutine llik(f,x,max,p,lambda,m,loglik,cp) real lambda(6),p(6),wsum,cp(8,6),loglik integer f(8),x(8),j,max,m,k loglik=0.0 do 31 j=l,max wsum=0.0 do 32 k=l,m ifCxCj).eq.0) goto 37 if(lambda(k).It.1.0e-9) goto 38 cp(j,k)=p(k)*exp(x(j)*alog(lambda(k))-lambda(k)) goto 32 38 cp(j,k)=0.0 goto 32 37 cp(j ,k)=p(k)*exp(-lambda(k)) 32 wsum=wsum+cp(j,k) do 33 k=l,m 33 cp(j ,k)=cp(j ,k)\/wsum 31 loglik=loglik+f(j)*alog(wsum) return end C This subroutine performs the EM algorithm with given starting values. subroutine em(f,x,max,p,lambda,n,m,maxit,tol,conv,iter,loglik) real lambda(6),p(6),lambnew(6),pnew(6),cp(8,6).error,loglik,tol integer f(8),x(8),j,n,k,m,iter,maxit,max,conv iter=0 1 iter=iter+l C Compute posterior probabilities cp(j,k) and log-likelihood, c a l l llik(f,x,max,p,lambda,m,loglik,cp) i f ( i t e r . g t . l ) goto 199 write(8,870) loglik 870 formatC Starting values: (loglik= ',gl8.7,')') write(8,871) 871 formatC' probability mean') write(8,872) CpCk).lambdaCk),k=l,m) 872 formatC \\fll.6,f9.6) C Now compute the updated estimates, lambnewCk).pnewCk). 199 do 40 k=l,m pnewCk)=0.0 lambnewCk)=0.0 do 45 j=l,max pnewCk)=pnewCk)+fCj)*cpCj,k) 45 lambnewCk)=lambnewCk)+xCj)*fCj)*cpCj,k) 1ambne w C k)=1ambnewCk)\/pnewCk) 107 40 pnew(k)=pnew(k)\/n C Check whether anything has changed much since last iteration, if(iter.eq.1) goto 55 do 50 k=l,m error=pne w ( k) -p ( k) if(pnew(k).lt.l.0e-7) goto 51 if(abs(error)\/pnew(k).gt.tol) goto 55 51 error=lambnew(k)-lambda(k) if(lambnew(k).lt.l.0e-7) goto 50 if(abs(error)\/lambnew(k).gt.tol) goto 55 50 continue conv=l goto 99 C If anything has changed, update the values of p(k), lambda(k). 55 do 56 k=l,m lambda(k)=lambneo(k) 56 p(k)=pnew(k) if(iter.It.maxit) goto 1 99 write(8,880) loglik 880 formatC Final values: (loglik= ',gl8.7,')') write(8,872) (p(k).lambda(k),k=l,m) if(conv.eq.0) goto 849 write(8,873) iter 873 formatC Convergence achieved after ',i5,' iterations.'\/) goto 850 849 write(8,874) iter 874 formatC A l l ',i5, + ' iterations performed - convergence not achieved.'\/) 850 return end Appendix B Fortran program for computing maximum likeli-hood estimates for hidden Markov models C This program computes the maximum likelihood estimate for hidden Markov models C with Poisson components and given # of components, using the EM algorithm, C with starting values obtained from partitions of the data values. real lambda(6),lmax(6),p(6,6),pmax(6,6),u(6,240),v(6,6,240) real alpha(6) ,11,Umax,aic.bic,mean,tol.epsilon integer y(240),f(8),x(8),group(8),r(99),n,max,m,dim,npos integer j,k,i,conv,cwarn,maxit,flag,t,h,total,totit,it,npart C Read i n data, # components, maximum # iterations, and stopping criterion. read(1,500) n 500 format(i3) read(l,510)(y(i),i=l,n) 510 format(30i2) read(l,520) maxit,tol.m.epsilon 520 format(i6,lx,f8.6,lx,il,lx,f8.6) C Calculate the frequency distribution and mean, do 5 j=l,8 5 f(j)=0 max=l 108 do 10 i=l,n j=y(i)+l if(j.gt.max) max=j 10 f(j)=f(j)+l mean=f(2) do 15 j=3,max 15 mean=mean+f(j)*(j-l) mean=mean\/n write(7,700) n 700 formatC Data: (n=',i4,')') write(7,710)(y(i),i=l,n) 710 format(30i2) write(7,720) (f(j),j=l.max) 720 formatC frequency distribution:', 15i4) write(7,730) epsilon.tol 730 formatC epsilon:' ,f8.6,' stopping criterion: ',f8.6) llmax=(mean*alog(mean)-mean)*n aic=llmax-l bic=llmax-0.5*alog(float(n)) write(7,740) 740 format(\/' ',33x,' log-') write(7,750) 750 formatC ',33x,'likelihood',9x,'AIC,12x,'BIC') orite(7,760) llmax.aic,bic 760 format(\/' 1 component:',20x,3gl5.6) write(7,770) mean 770 formatC Estimated rate:' ,f9.5) if(m.le.npos) goto 100 write(7,780) 780 formatC m must be <= number of distinct data values. ') stop C Replace the frequencies with non-zero frequencies and corresponding values. 100 npos=0 do 20 j=l,max if(f(j).eq.0) goto 20 npos=npos+l x(npos)=j 20 continue totit=0 cwarn=l flag=0 dim=m*m C Enumerate a l l partitions of the observed counts into m groups. Each C partition yields i n i t i a l values of v and em is entered at the M-step. npart=0 total=npos-m r(l)=total t=total h=0 do 900 i=2,m 900 r(i)=0 910 npart=npart+l rtotal=0 do 25 j=l,m alpha(j)=0.0 do 27 i=rtotal+l,rtotal+r(j)+l alpha(j)=alpha(j)+f(x(i)) 27 group(x(i))=j alpha (j ) =alpha (j )\/n 109 25 rtotal=rtotal+r(j)+l write(8,800) ( x ( j ) - l , j=l ,npos) 800 formatC count:',8i4) write(8,810) (group(x(j)),j=l,npos) 810 formatC group: ',8i4) do 29 i=l,n do 30 j=l,m do 31 k=l,m 31 v(j,k,i)=epsilon*alpha(j)*alpha(k) 30 u(j,i)=epsilon*alpha(j) 29 continue do 33 i=2,n j=group(y(i-l)+l) k=group(y(i)+l) v(j ,k,i)=(l .0-epsilon)+v(j , k, i) 33 u(j,i-l)=(1.0-epsilon)+epsilon*alpha(j) j=group(y(n)+l) u(j ,n)=(1.0-epsilon)+epsilon*alpha(j) conv=0 c a l l em(n,y,max,m,p,lambda,u,v,maxit,tol,conv,it,11) if(conv.eq.0) cwarn=0 totit=totit+it if(11.It.Umax) goto 50 flag=l llmax=ll do 45 j=l,m do 47 k=l,m 47 pmax(j ,k)=p(j ,k) 45 lmax(j)=lambda(j) C Here we generate the next partition of the data. 50 if(r(m).eq.total) goto 999 i f ( t . g t . l ) h=0 h=h+l t=r(h) r(h)=0 r(l)=t-l r(h+l)=r(h+l)+l goto 910 C A l l partitions have been generated. 999 write(7,791) npart,totit 791 formatC ',i3,' partitions,' ,i7,' EM iterations') if(cwarn.eq.l) goto 60 write(7,788) 788 formatC WARNING: not a l l starting values gave convergence.') 60 if(flag.eq.l) goto 55 write(7,789) Umax 789 formatC No estimate found with likelihood >',gl5.6) stop 55 aic=llmax-dim bic=llmax-0.5*dim*alog(float(n)) write(7,790) m,Umax,aic,bic 790 format(\/' ' , i l , ' components:',19x,3gl5.6) write(7,794) 794 formatC Estimates:'\/' rates trans, probs.') do 80 j=l,m 80 write(7,796)lmax(j),(pmax(j,k),k=l,m) 796 formatC ' ,f9.6,2x,7f9.6) stop end 110 C This subroutine performs the EM algorithm. subroutine em(n,y,max,m,p,lambda,u,v,maxit,tol,conv,it,loglik) integer y(240),n,m,i,j,k,maxit,it,conv,likexp,max real p(6,6) , lambda(6),pnew(6,6) , lambneH(6),pdf(8,6) real u(6,240),v(6,6,240),vsuml(6),vsum2,usum real error,tol,loglik,lik.logten logten=alog(10.0) it=0 goto 5 1 it=it+l C Compute Poisson pdf, conditional prob.s, log-lik. for current estimates, do 2 j=l,m pdf(1,j)=exp(-lambda(j)) do 3 i=2,max 3 pdf(i,j)=pdf(i-1,j)*lambda(j) 2 continue c a l l posterior(n,y,m,lambda,p,pdf,u,v,lik,likexp) loglik=alog(lik)+likexp*logten i f ( i t . g t . l ) goto 5 write(8,820) loglik 820 formatC Starting values: (log-lik=',gl4.6,')'\/ +' rates trans, prob.s') do 10 j=l,m 10 write(8,830) lambda(j),(p(j,k),k=l,m) 830 formatC ' ,f9.6,2x,7f9.6) C Compute updated estimates lambnew(j), pnew(j.k). 5 do 20 j=l,m usum=0.0 lambnew(j)=0.0 do 30 i=l,n lambnew(j)=lambnew(j)+u(j,i)*y(i) 30 usum=usum+u(j,i) 1ambnew(j)=lambnew(j)\/usurn vsum2=0.0 do 40 k=l,m vsuml(k)=0.0 do 50 i=2,n 50 vsuml(k)=vsuml(k)+v(j,k,i) 40 vsum2=vsum2+vsuml(k) do 60 k=l,m 60 pnew(j ,k)=vsuml(k)\/vsum2 20 continue C Check whether anything has changed much since last iteration. if(it.eq.O) goto 130 do 110 j=l,m do 120 k=l,m error=pnew(j,k)-p(j,k) if(pnew(j,k).lt.l.0e-7) goto 120 if(abs(error)\/pnew(j,k).gt.tol) goto 130 120 continue error=lambnew(j)-lambda(j ) if(lambnew(j).lt.l.0e-7) goto 110 if(abs(error)\/lambnew(j).gt.tol) goto 130 110 continue conv=l goto 160 C If anything has changed, update the values of p(m) and lambda(m). 130 do 140 j=l,m do 150 k=l,m 111 150 p(j ,k)=pnew(j ,k) 140 lambda(j)=lambnew(j) if(it.It.maxit) goto 1 160 write(8,840) loglik 840 formatC Final estimates: (log-lik=' ,gl4.6,') ') do 170 j=l,m 170 write(8,830) lambda(j),(p(j,k),k=l,m) if(conv.eq.0) goto 180 write(8,850) i t 850 formatC Convergence achieved after ',i5,' iterations.'\/) goto 190 180 write(8,860) i t 860 formatC A l l ',i5,' iterations performed - no convergence.'\/) 190 return end C This subroutine computes the conditional probabilities in the E-step. subroutine posterior(n,y,m,lambda,p,pdf,u,v,lik,likexp) integer y(240),ascale(240),bscale(240),e(240) integer n,m,i,j,k,likexp,asc,bsc real a(6,240),b(6,240),u(6,240),v(6,6,240),p(6,6),lambda(6) real asum,bsum,lik,pdf(8,6) ascale(l)=0 do 10 j = l,m 10 a(j,l) = pdf(y(l)+l,j)\/float(m) do 20 i = 2,n asum =0.0 do 30 j = l,m a(j,i) = 0.0 do 40 k = l , i 40 a(j,i) = a(j,i) + a(k,i-l)*p(k,j)*pdf(y(i)+l,j) 30 asum=asura+a(j,i) asc=0 19 if(asum.gt.1.0e-l) goto 18 asc=asc+l asum=asum*10.0 goto 19 18 do 50 j = l,io 50 a(j ,i)=a(j ,i)*10.0**asc ascale(i)=ascale(i-1)+asc 20 continue do 60 j = l,m 60 b(j,n) = 1 bscale(n)=0 do 70 i = n-1,1,-1 bsum = 0.0 do 80 j = l,m b(j,i) = 0.0 do 90 k = l,m 90 b(j,i) =b(j,i) + b(k,i+l)*p(j,k)*pdf(y(i+l)+l,k) 80 bsum=bsum+b(j , i) bsc=0 69 if(bsum.gt.1.0e-l) goto 68 bsc=bsc+l bsum=bsum*10.0 goto 69 68 do 100 j = l.m 100 b(j,i)=b(j,i)*10.0**bsc bscale(i)=bscale(i+1)+bsc 70 continue 112 lik=0.0 do 110 j=l,m 110 lik=lik + a(j,n) do 120 i=l,n e(i)=ascale(n)-ascale(i)-bscale(i) do 130 j=l,m 130 u(j,i)=a(j,i)*b(j,i)*10.0**e(i)\/lik 120 continue do 140 i=2,n e(i)=ascale(n)-ascale(i-1)-bscale(i) do 150 j=l,m do 160 k=l,m 160 v(j,k,i) = + p(j,k)*a(j,i-l)*pdi(y(i)+l,k)*b(k,i)*10.0**e(i)\/lik 150 continue 140 continue likexp=-ascale(n) return end 113 ","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/hasType":[{"value":"Thesis\/Dissertation","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt":[{"value":"10.14288\/1.0098216","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/language":[{"value":"eng","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline":[{"value":"Statistics","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/provider":[{"value":"Vancouver : University of British Columbia Library","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/publisher":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/rights":[{"value":"For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https:\/\/open.library.ubc.ca\/terms_of_use.","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#scholarLevel":[{"value":"Graduate","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/title":[{"value":"Maximum likelihood estimation for mixture distributions and hidden Markov models","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/type":[{"value":"Text","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#identifierURI":[{"value":"http:\/\/hdl.handle.net\/2429\/29176","type":"literal","lang":"en"}]}}