UBC Theses and Dissertations
Maximum likelihood identification of linear discrete-time systems De Glas, Michel 1976-12-31

MAXIMUM LIKELIHOOD IDENTIFICATION OF LINEAR DISCRETE-TIME SYSTEMS

by

Michel De Glas

Ingénieur, École Supérieure d'Informatique, Électronique et Automatique, France, 1973

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1976

© Michel De Glas, 1976

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

ABSTRACT / RÉSUMÉ

Abstract

The theoretical properties of the Maximum Likelihood estimator, for both single input-single output and multivariable systems, are considered. New results relative to the convergence properties of some identification methods for single input-single output systems are obtained. A unified approach to the Maximum Likelihood identification of multivariable systems is proposed. Numerical tests on a computer are performed.

Résumé

Nous considérons les propriétés théoriques de l'estimateur du Maximum de Vraisemblance dans le cas de systèmes monovariables et de systèmes multivariables. Nous étudions des méthodes d'identification de systèmes monovariables, et de nouveaux résultats relatifs à la convergence de ces méthodes sont obtenus. Nous proposons une approche globale de l'identification des systèmes multivariables au sens du Maximum de Vraisemblance.
Cette procédure est illustrée par des exemples numériques.

TABLE OF CONTENTS

INTRODUCTION 1

Chapter 1 SINGLE INPUT - SINGLE OUTPUT SYSTEMS 6
1 Introduction 6
2 Preliminaries 7
3 Moving Average noise model 12
3.1 Introduction 12
3.2 Global minimum points 14
3.3 Local extremum points 16
4 Autoregressive noise model 20
4.1 Introduction 20
4.2 Global minimum points 21
4.3 Local extremum points 25
4.4 Comments 30
5 Conclusion 31

Chapter 2 MULTIVARIABLE SYSTEMS 32
1 Introduction 32
2 Canonical structures 34
2.1 Introduction 34
2.2 Preliminaries 35
2.3 Canonical forms 36
3 Canonical input-output equations 42
3.1 Introduction 42
3.2 Noise-free version 43
3.3 Noisy version 55
3.4 Comments 59
4 Maximum Likelihood estimator 61
4.1 Introduction 61
4.2 Equation of error 61
4.3 Statistical properties of the estimates 65
5 Minimizing the loss functions 71
5.1 Introduction 71
5.2 Minimization procedures using derivatives 72
5.3 Minimization procedures without using derivatives 73
6 Conclusion 75

Chapter 3 EXPERIMENTS 78
1 Introduction 78
2 Experiment 1 78
3 Experiment 2 79

CONCLUSIONS 84
REFERENCES 85

LIST OF TABLES

3.1 Identification results. Third order system (S/N = ∞) 80
3.2 Identification results. Third order system (S/N = 10) 81
3.3 Identification results. Fourth order system (S/N = 10) 82
3.4 Identification results. Fourth order system (S/N = 3) 83

ACKNOWLEDGEMENT

I would like to express my gratitude to Dr. E.V. Bohn, my supervisor, for his guidance during the research and writing of my thesis. I wish to thank Dr. A.C. Soudack for his helpful suggestions. I am thankful to the Canada Council for the financial support afforded me in the form of a Graduate Scholarship from 1974 to 1976.

INTRODUCTION

Preliminaries

During the past decade, increasing attention has been devoted to the different aspects of system identification.
However, although these topics have been discussed in a multitude of papers and a rather large number of survey papers [3,8,9,25] have been published, the field of identification does not appear as a unified subject. It is then of importance to have in mind the basic concepts characterizing an identification problem. One of the basic ingredients in the formulation of the identification problem is the choice of the class of models [21] (parametric, non-parametric) and of the model structure (linear or non-linear, continuous or discrete time, ...). Such choices cannot be made systematically and depend on the a priori knowledge of the case treated and on the purpose of the identification. Due to the fact that, in most realistic situations, the measurements made on the system considered are corrupted by random disturbances, one may investigate the identification problem through statistical methods, leading to an estimation problem. The estimation problem can be formulated as the choice of an estimator (Least-Squares, Markov, Bayes, Maximum Likelihood). Such a formulation makes it possible to derive mathematical properties of the estimates and provides a rigorous framework for the field of identification. It must be noticed that the probabilistic interpretation can be eluded by taking into account the effects of the disturbances from an empirical point of view. In any situation (probabilistic or deterministic), the identification problem can conceptually be stated as the finding of the model, subjected to the same input signals as the system, that is optimal in some sense. The optimality has to be defined using a criterion with respect to the output signals, which are related to the parameter values through a functional relationship. The criterion is generally defined by means of a loss function, leading to an optimization problem. The choice of an optimization method (derivative-type methods, search methods) is, once again, strongly related to the problem under study.
All these aspects are equally important steps of the identification problem, which then cannot be considered as a simple estimation of a set of parameters or as a simple optimization problem.

Aim of Thesis

The present work is concerned with the identification of linear discrete-time systems. As outlined above, the choice of an estimator is a crucial step of the identification problem. A very popular choice is that of the Least-Squares (LS) estimator. However, it is well known that the LS estimator is generally biased. In order to overcome the problem of correlated noise, one may postulate a system with correlated noise, by introducing a noise model. Depending on the a priori knowledge of the system and of the noise, the following estimators may be chosen:
- the Markov estimator, for which the knowledge of the covariance matrix of the noise is needed
- the Bayes estimator, for which the knowledge of the probability density function of the noise and of the parameter values is needed
- the Maximum Likelihood estimator, for which the probability density function of the noise is needed.

The assumption of a known covariance matrix of the noise, or of a known probability density function of the parameter values, severely limits the practical applicability of the first two estimators. The aim of Chapter 1 is to analyze, for single input-single output systems, the mathematical properties of the Maximum Likelihood estimator. Two new results will be obtained:
- in the case of a Moving Average noise model (Åström [2]), it will be shown that the Maximum Likelihood estimates may converge to wrong values. Counterexamples to general convergence will be given (Section 3).
- in the case of an Autoregressive noise model, it will be established that, under suitable assumptions, the Maximum Likelihood estimates converge to the true values of the parameters (Section 4).
This result provides a convergence proof of the well-known Generalized Least-Squares (GLS) method proposed by Clarke [6].

In Chapter 2, multivariable systems will be considered. The identification of multivariable systems cannot be viewed as a simple generalization of the single input-single output case and, before formulating the identification problem, the following considerations must be taken into account: the choice generally made for the class of models (state-space representation) implies the determination of a suitable canonical form and the derivation of a canonical set of input-output relations. The problem of finding state-space canonical forms has been investigated by Luenberger [15], Gopinath [12] and Mayne [16]. It can be shown that the use of a selector matrix or of a set of indexes allows one to describe the structure of a canonical model. In Section 2, the results relative to this problem will be reported. Although the problem of deriving a set of canonical input-output relations has been studied by many authors, no general approach is yet available in literature. The methods proposed by Gopinath [12], Ackerman [1], Zuercher [28] or Guidorzi [13] are subject to strong restrictions. In Section 3, the solution to this problem in the most general case is proposed and a comparison with the previous methods is established. The basic differences between the approach presented in this Section and the previous ones lie in the fact that it allows a discrimination of the input-output data and that it is extended to the noisy case. In Section 4, using the results of Chapter 1, a Maximum Likelihood estimator is defined and the consistency of the estimates, in the case of correlated noise, is proved. In Section 5, the optimization problem is discussed. The various types of minimization algorithms are described and compared. Finally, in Chapter 3, experimental results are presented.
The theoretical results of Chapter 1 and Chapter 2 are tested on numerical examples.

Chapter 1: SINGLE INPUT - SINGLE OUTPUT SYSTEMS

1 - INTRODUCTION

As pointed out in the introductory Chapter, a type of identification problem may be obtained by embedding it in a probabilistic framework, leading to an estimation problem. Such a point of view allows a rigorous approach to the field of identification and constitutes one of its main aspects. The aim of this Chapter is to investigate the mathematical properties of the Maximum Likelihood estimation method for various types of noise model structures. (i) In Section 3, the Moving Average noise model will be considered. (ii) In Section 4, the Autoregressive noise model will be studied. In both cases, two classes of Maximum Likelihood estimators will be investigated and new results relative to the convergence properties of some existing methods will be established.

2 - PRELIMINARIES

The class of linear discrete-time single input-single output systems considered is represented by the difference equation

A(z^{-1}) y(k) = B(z^{-1}) u(k) + n(k)    (1.1)

where {y(k), k = 1, ..., N} is the system output, {u(k), k = 1, ..., N} is the system input, {n(k), k = 1, ..., N} is an additive zero-mean noise, and

A(z^{-1}) = 1 + \sum_{i=1}^{n_a} a_i z^{-i}    (1.2)

B(z^{-1}) = \sum_{i=0}^{n_b} b_i z^{-i}    (1.3)

It is assumed that the noise process can be expressed as a process driven by a white noise e(k):
n(k) = H(z^{-1}) e(k)    (1.4)

where

H(z^{-1}) = \frac{C(z^{-1})}{D(z^{-1})}    (1.5)

C(z^{-1}) = 1 + \sum_{i=1}^{n_c} c_i z^{-i}    (1.6)

D(z^{-1}) = 1 + \sum_{i=1}^{n_d} d_i z^{-i}    (1.7)

The Likelihood function L can then be defined as the probability density function of \epsilon(k), where the numbers \epsilon(k) are the so-called residuals defined by

\hat H(z^{-1}) \epsilon(k) = \hat A(z^{-1}) y(k) - \hat B(z^{-1}) u(k)    (1.8)

Assuming that e is gaussian (0, \sigma_e^2), the residuals \epsilon(k) form a sequence of independent gaussian (0, \sigma_e^2) variables and the log-Likelihood function \Lambda takes the form

\Lambda = -\frac{1}{2\sigma_e^2} \sum_{k=1}^{N} \epsilon^2(k) - N \log \sigma_e - \frac{N}{2} \log 2\pi    (1.9)

The Maximum Likelihood estimation procedure can then be interpreted as the finding of a model

\hat A(z^{-1}) y(k) = \hat B(z^{-1}) u(k) + \hat H(z^{-1}) \epsilon(k)    (1.10)

in such a way that the log-Likelihood function \Lambda is maximized. From (1.9), it is clear that maximizing \Lambda is equivalent to minimizing the loss function

V_N(\hat a, \hat b, \hat c, \hat d) = \frac{1}{2N} \sum_{k=1}^{N} \epsilon^2(k)    (1.11)

where \epsilon(k) is obtained through the equation of error

\epsilon(k) = \hat H^{-1}(z^{-1}) \hat A(z^{-1}) y(k) - \hat H^{-1}(z^{-1}) \hat B(z^{-1}) u(k)    (1.12)

In the following, there is an advantage in putting

p^T = [a^T  b^T  c^T  d^T]    (1.13)

where a^T = (a_1, a_2, ..., a_{n_a}), b^T = (b_0, b_1, ..., b_{n_b}), c^T = (c_1, c_2, ..., c_{n_c}) and d^T = (d_1, d_2, ..., d_{n_d}). The Maximum Likelihood estimator can thus be defined as follows: the Maximum Likelihood estimate(s) of p, say \hat p, is (are) the absolute minimum point(s) of V_N:

E_1 :  \hat p \in S_1 = \{ \hat p : V_N(\hat p) = \min_p V_N(p) \}    (1.14)

However, from equation (1.12), it is clear that \epsilon is non-linear in \hat c and \hat d. Consequently, the finding of \hat p has to be done iteratively using a search routine. It is then of importance to know whether V_N(p) has local extremum points. This leads us to redefine the Maximum Likelihood estimator in the following way:

E_2 :  \hat p \in S_2 = \{ \hat p : \frac{dV_N(\hat p)}{d\hat p} = 0 \}    (1.15)

The estimator E_1 (resp. E_2) will be consistent if and only if

S_1 (resp. S_2) = \{p\}    (1.16)

Clearly S_1 \subseteq S_2, and E_2 consistent implies E_1 consistent.
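To make the preliminaries concrete, the following sketch simulates the system (1.1) and evaluates the residuals (1.8) and the loss function (1.11) for the MA noise case H(z^{-1}) = C(z^{-1}). This is a minimal illustration, not code from the thesis; the function names and the zero initial conditions are my own assumptions.

```python
import numpy as np

def simulate(u, n, a, b):
    """Simulate A(z^-1) y(k) = B(z^-1) u(k) + n(k), eq. (1.1).
    a = (a_1,...,a_na), b = (b_0,...,b_nb); samples before k = 0 are zero."""
    N = len(u)
    y = np.zeros(N)
    for k in range(N):
        acc = n[k] + sum(bi * u[k - i] for i, bi in enumerate(b) if k - i >= 0)
        acc -= sum(ai * y[k - i] for i, ai in enumerate(a, 1) if k - i >= 0)
        y[k] = acc
    return y

def residuals(y, u, a, b, c):
    """Residuals from C(z^-1) eps(k) = A(z^-1) y(k) - B(z^-1) u(k),
    i.e. eq. (1.8) with H = C (MA noise model)."""
    N = len(y)
    eps = np.zeros(N)
    for k in range(N):
        v = y[k] + sum(ai * y[k - i] for i, ai in enumerate(a, 1) if k - i >= 0)
        v -= sum(bi * u[k - i] for i, bi in enumerate(b) if k - i >= 0)
        v -= sum(ci * eps[k - i] for i, ci in enumerate(c, 1) if k - i >= 0)
        eps[k] = v
    return eps

def loss(eps):
    """Loss function V_N = (1/2N) sum_k eps^2(k), eq. (1.11)."""
    return 0.5 * float(np.mean(eps ** 2))
```

Evaluating `residuals` at the true parameter values recovers the driving white noise e(k), which is the mechanism behind the consistency arguments of this Chapter.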
From a practical point of view, it is highly desirable that E_2 is consistent. E_2 inconsistent means that, even if the global minimum point of the loss function coincides with the true value of the parameter vector p (E_1 consistent), the Maximum Likelihood estimation procedure may not converge to it and may then give a wrong estimate. The equation (1.4) can be rewritten as

D(z^{-1}) n(k) = C(z^{-1}) e(k)    (1.17)

which implies that the noise process is modeled as an autoregressive moving average (ARMA) process. In the next Sections, the following cases will be investigated:

a) MA noise model (D = 1). This case has been treated by Åström [2], Åström and Söderström [4] and Söderström.
b) AR noise model (C = 1). This choice of noise structure is the basic ingredient in the formulation of the Generalized Least-Squares algorithm proposed by Clarke [6].

In both cases, the statistical properties of the estimates will be considered. It will be shown that
- for a MA noise model (Section 3), the estimator E_1 is consistent while E_2 is inconsistent.
- for an AR noise model (Section 4), the estimators E_1 and E_2 are consistent.

The concept of persistently exciting signals will be used in the following. A signal u(k) is said to be persistently exciting of order n [2] if

R_u(j) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} u(k) u(k+j)    (1.18a)

exists and

A_n = [R_u(i-j)],  i, j = 1, ..., n    (1.18b)

is positive definite. If u(k) is persistently exciting of order n, then

\sum_{i=0}^{n} h_i u(k-i) = 0    (1.19a)

implies

h_i = 0,  i = 0, ..., n    (1.19b)

3. MOVING AVERAGE NOISE MODEL

3.1. Introduction

If a Moving Average (MA) noise model is used, the output data are governed by the difference equation

A(z^{-1}) y(k) = B(z^{-1}) u(k) + C(z^{-1}) e(k)    (1.20)

The model structure is then given by

\hat A(z^{-1}) y(k) = \hat B(z^{-1}) u(k) + \hat C(z^{-1}) \epsilon(k)    (1.21)

To apply the Maximum Likelihood (ML) method, it is assumed that
- A1.
All the processes are ergodic.
- A2. The polynomials A(z^{-1}) and C(z^{-1}) have all their zeros inside the unit circle [this condition implies that the system (1.20) is stable].
- A3. The polynomials A(z^{-1}), B(z^{-1}) and C(z^{-1}) are relatively prime [this condition implies that the system (1.20) is controllable either from u or from e].

The estimate of p^T = [a^T b^T c^T], say \hat p^T = [\hat a^T \hat b^T \hat c^T], is obtained by minimization of the loss function

V_N(\hat p) = \frac{1}{2N} \sum_{k=1}^{N} \epsilon^2(k)    (1.22)

where the residual \epsilon(k) depends on \hat p via (1.21):

\epsilon(k) = \frac{\hat A(z^{-1})}{\hat C(z^{-1})} y(k) - \frac{\hat B(z^{-1})}{\hat C(z^{-1})} u(k)    (1.23)

As a first result, the global minimum points of the loss function will be considered and conditions will be given for the estimator E_1 to be consistent. This will be the object of Section 3.2. As mentioned above, it is of importance to know whether there is a unique local minimum point of V_N(\hat p). In practice, there are cases for which the loss function can have more than one local minimum. The reasons for such difficulties are the following:
- the model structure may not be appropriate
- the number of data, N, may be too small
- the model structure may have the inherent property that V_N(\hat p) has several local minima.

In order to avoid the first two pitfalls, the following additional assumptions will be made:
- A4. It is assumed that (1.20) really holds.
- A5. The asymptotic loss function

V(\hat p) = \lim_{N \to \infty} V_N(\hat p) = \frac{1}{2} E[\epsilon^2]    (1.24)

will be considered instead of V_N(\hat p).

The case covered by the third explanation will be the main object of Section 3.3, where it will be shown that the model structure considered implies that, in general, a unique local minimum cannot be obtained and consequently that E_2 is inconsistent.

3.2. Global minimum points

Let us define
- \hat p_N, the global minimum point(s) of V_N(\hat p):  V_N(\hat p_N) = \min_p V_N(p)    (1.25)
- \hat p_M, the global minimum point(s) of V(\hat p):  V(\hat p_M) = \min_p V(p)    (1.26)

Åström has shown [2] that

\hat p_M = p    (1.27)

\lim_{N \to \infty} \hat p_N = p  with probability 1    (1.28)

provided u is persistently exciting of order n_a + n_b.
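The persistent-excitation condition (1.18), under which (1.27)-(1.28) hold, can be checked numerically from sample autocovariances. A hedged sketch, not from the thesis: the normalization by N - j and the eigenvalue tolerance are my own finite-sample choices.

```python
import numpy as np

def is_persistently_exciting(u, n, tol=1e-8):
    """Check condition (1.18): the Toeplitz matrix A_n = [R_u(i-j)],
    built from sample autocovariances R_u(j), is positive definite."""
    N = len(u)
    # Sample version of R_u(j); dividing by N - j keeps a constant
    # signal exactly singular instead of nearly singular.
    R = np.array([np.dot(u[:N - j], u[j:]) / (N - j) for j in range(n)])
    An = np.array([[R[abs(i - j)] for j in range(n)] for i in range(n)])
    eig = np.linalg.eigvalsh(An)
    return bool(eig.min() > tol * abs(R[0]))
```

White noise is persistently exciting of any order, while a constant signal is persistently exciting of order 1 only; by (1.19), this is what guarantees that the coefficients excited by u are uniquely determined.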
These two relations establish that the global minimum point(s) of the loss function V_N(\hat p) converge(s), as N \to \infty, to the unique global minimum point of the asymptotic loss function V(\hat p), which coincides with the true value of the parameters, p. This means that, under the assumptions A1-A5, S_1 = \{p\} and that

Theorem 1.1: The ML estimator E_1 is asymptotically consistent.

The results of this Section do not give any information about local extremum points and thus do not prove that \hat p converges to p. The convergence properties of the ML method crucially depend on the existence of multiple local extrema of the loss function. It must be noticed that, although the ML method has been extensively applied to systems whose model is given by (1.20), no general result relative to local extremum points is available in literature, except in the case of a pure ARMA process (B = 0) [4]. In the next Section, a class of counterexamples to a unique local minimum is given: this implies the inconsistency of E_2 and the non-convergence of the method described above, which is one of the most commonly used in system identification.

3.3. Local extremum points

Combining (1.20) and (1.21) and dropping the arguments z^{-1} and k, for convenience, yields

\epsilon = \frac{\hat A B - A \hat B}{A \hat C} u + \frac{\hat A C}{A \hat C} e    (1.29)

Since u and e are assumed to be uncorrelated, the asymptotic loss function takes the form

2V = E\left[\left(\frac{\hat A B - A \hat B}{A \hat C} u\right)^2\right] + E\left[\left(\frac{\hat A C}{A \hat C} e\right)^2\right]    (1.30)

which can be rewritten, using Parseval's theorem, as

2V = \frac{\sigma_e^2}{2\pi i} \oint \left[\frac{\hat A C}{A \hat C}\right](z) \left[\frac{\hat A C}{A \hat C}\right](z^{-1}) \frac{dz}{z} + \frac{1}{2\pi i} \oint \left[\frac{\hat A B - A \hat B}{A \hat C}\right](z) \left[\frac{\hat A B - A \hat B}{A \hat C}\right](z^{-1}) \Phi_u(z) \frac{dz}{z}    (1.31)

where the integration path is the unit circle and \Phi_u(z) is the discrete power spectrum of u. We can now consider the stationary points of V, i.e. the points which are solutions of

\frac{\partial V}{\partial \hat p} = 0    (1.32)

Let us first consider the solutions of \partial V / \partial \hat c_j = 0, j = 1, ..., n_c. After simple calculations, it can be written

\frac{\partial V}{\partial \hat c_j} = -\frac{\sigma_e^2}{2\pi i} \oint \frac{\hat A C}{A \hat C \hat C} \frac{\hat A^* C^*}{A^* \hat C^*} z^{j-1} dz - \frac{1}{2\pi i} \oint \frac{\hat A B - A \hat B}{A \hat C \hat C} \frac{(\hat A B - A \hat B)^*}{A^* \hat C^*} \Phi_u(z) z^{j-1} dz,  j = 1, ..., n_c    (1.33a)

or
\frac{\partial V}{\partial \hat c_j} = \frac{1}{2\pi i} \oint \frac{f(z)}{A^* \hat C^*} z^{j-1} dz,  j = 1, ..., n_c    (1.33b)

where

f(z) = -\sigma_e^2 \frac{\hat A C \hat A^* C^*}{A \hat C \hat C} - \frac{(\hat A B - A \hat B)(\hat A B - A \hat B)^*}{A \hat C \hat C} \Phi_u(z)    (1.34a)

A^*(z) = z^{n_a} A(z^{-1})    (1.34b)

B^*(z) = z^{n_b} B(z^{-1})    (1.34c)

C^*(z) = z^{n_c} C(z^{-1})    (1.34d)

Since the integrand of \partial V/\partial \hat c_j has n_a + n_c poles (f(z) is analytic), the equation \partial V/\partial \hat c_j = 0 has several solutions in \hat c.

Let us now consider \partial V/\partial \hat a and \partial V/\partial \hat b. Using (1.29), one may write

\frac{\partial V}{\partial \hat a_j} = E\left[\left(\frac{B}{A \hat C} z^{-j} u + \frac{C}{A \hat C} z^{-j} e\right)\left(\frac{\hat A B - A \hat B}{A \hat C} u + \frac{\hat A C}{A \hat C} e\right)\right],  j = 1, ..., n_a    (1.35)

\frac{\partial V}{\partial \hat b_j} = -E\left[\frac{1}{\hat C} z^{-j} u \left(\frac{\hat A B - A \hat B}{A \hat C} u + \frac{\hat A C}{A \hat C} e\right)\right],  j = 0, ..., n_b    (1.36)

The equations

\frac{\partial V}{\partial \hat a} = 0,  \frac{\partial V}{\partial \hat b} = 0    (1.37a)

can then be rewritten, for fixed \hat C, as the linear system

\begin{bmatrix} M & -N \\ N^T & -P \end{bmatrix} \begin{bmatrix} \hat a \\ \hat b \end{bmatrix} = \begin{bmatrix} -q \\ -r \end{bmatrix}    (1.37b)

where M, N, P, q and r are defined by

M_{ij} = E\left[\left(\frac{B}{A \hat C} z^{-i} u + \frac{C}{A \hat C} z^{-i} e\right)\left(\frac{B}{A \hat C} z^{-j} u + \frac{C}{A \hat C} z^{-j} e\right)\right]    (1.39a)

N_{ij} = E\left[\left(\frac{B}{A \hat C} z^{-i} u + \frac{C}{A \hat C} z^{-i} e\right) \frac{1}{\hat C} z^{-j} u\right]    (1.39b)

P_{ij} = E\left[\frac{1}{\hat C} z^{-i} u \cdot \frac{1}{\hat C} z^{-j} u\right]    (1.39c)

q_j = E\left[\frac{1}{\hat C} y \left(\frac{B}{A \hat C} z^{-j} u + \frac{C}{A \hat C} z^{-j} e\right)\right]    (1.40a)

r_j = E\left[\frac{1}{\hat C} y \cdot \frac{1}{\hat C} z^{-j} u\right]    (1.40b)

For any \hat C, the matrices M and P are positive definite. The equation (1.37b) has then, at least, one solution in \hat a and \hat b, for any \hat c. Since (1.33) has several solutions in \hat c, the gradients of V may vanish for several values of \hat p. The points \hat p satisfying (1.33) and (1.37b) belong to S_2. This means that E_2 is inconsistent.

Remark: The above analysis is valid for any (A, B, C). The non-uniqueness of the estimates is thus an inherent property of the model considered.

We have thus shown the existence of counterexamples to a unique local minimum and consequently to general convergence of the ML method in the case of a MA noise model. This drawback is an inherent property of the model structure considered. In the next section, a similar analysis for the Autoregressive noise model case is proposed.

4. AUTOREGRESSIVE NOISE MODEL

4.1.
Introduction

In the case of an Autoregressive (AR) noise model, the equations (1.8) and (1.10) take the form

A(z^{-1}) y(k) = B(z^{-1}) u(k) + \frac{1}{C(z^{-1})} e(k)    (1.41)

\hat A(z^{-1}) y(k) = \hat B(z^{-1}) u(k) + \frac{1}{\hat C(z^{-1})} \epsilon(k)    (1.42)

Similarly to the previous case, the estimate \hat p is obtained by minimizing the loss function

V_N(\hat p) = \frac{1}{2N} \sum_{k=1}^{N} \epsilon^2(k)    (1.43)

where the residual \epsilon(k) is given by

\epsilon(k) = \hat C(z^{-1}) \hat A(z^{-1}) y(k) - \hat C(z^{-1}) \hat B(z^{-1}) u(k)    (1.44)

Assuming the conditions A1-A5 of Section 3 still hold, we shall analyze first the global minimum points (Section 4.2) and second the local extremum points (Section 4.3) of the asymptotic loss function V(\hat p). The asymptotic consistency of the estimators E_1 and E_2 will be proved.

4.2. Global minimum points

The purpose of this section is to prove the asymptotic consistency of the ML estimator E_1. To do so, we shall proceed as follows:
- First, we shall prove that the global minimum point \hat p_M of the asymptotic loss function V(\hat p) coincides with the true value of the parameter vector, p.
- Second, we shall show that the global minimum point(s) \hat p_N of V_N(\hat p) converge(s) to p as N \to \infty.

Combining (1.41) and (1.42) allows us to rewrite the equation (1.44) and to obtain the equation of error

\epsilon = \hat C \frac{\hat A B - A \hat B}{A} u + \frac{\hat A \hat C}{A C} e    (1.45)

It then follows that

2V = E\left[\left(\hat C \frac{\hat A B - A \hat B}{A} u\right)^2\right] + E\left[\left(\frac{\hat A \hat C}{A C} e\right)^2\right]    (1.46)

and

2V = E\left[\left(\hat C \frac{\hat A B - A \hat B}{A} u\right)^2\right] + \frac{\sigma_e^2}{2\pi i} \oint \left[\frac{\hat A \hat C}{A C}\right](z) \left[\frac{\hat A \hat C}{A C}\right](z^{-1}) \frac{dz}{z}    (1.47)

where the integration path is the unit circle |z| = 1.
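Minimizing (1.43) with the residuals (1.44) is the problem addressed by the Generalized Least-Squares method of Clarke [6]. As an illustration only (not code from the thesis; the function names and the fixed iteration count are my own), the classical GLS iteration alternates least squares on C-filtered data with an AR fit to the correlated residuals:

```python
import numpy as np

def filt(x, c):
    """Apply C(z^-1) = 1 + sum_i c_i z^-i to a sequence (zero initial state)."""
    xa = np.asarray(x, dtype=float)
    out = xa.copy()
    for i, ci in enumerate(c, 1):
        out[i:] += ci * xa[:-i]
    return out

def apply_b(x, b):
    """Apply B(z^-1) = sum_i b_i z^-i (no leading 1) to a sequence."""
    xa = np.asarray(x, dtype=float)
    out = np.zeros(len(xa))
    for i, bi in enumerate(b):
        if i == 0:
            out += bi * xa
        else:
            out[i:] += bi * xa[:-i]
    return out

def ls_arx(y, u, na, nb):
    """Ordinary least squares for A(z^-1) y = B(z^-1) u + e."""
    k0 = max(na, nb)
    Phi = np.array([[-y[k - i] for i in range(1, na + 1)]
                    + [u[k - i] for i in range(nb + 1)]
                    for k in range(k0, len(y))])
    th = np.linalg.lstsq(Phi, y[k0:], rcond=None)[0]
    return th[:na], th[na:]

def gls(y, u, na, nb, nc, n_iter=10):
    """GLS iteration: since C A y = C B u + e by (1.41), estimate (a, b) by
    LS on C-filtered data, then refit C by LS on the correlated residuals
    n(k) = A y - B u, and repeat."""
    c = np.zeros(nc)
    for _ in range(n_iter):
        a, b = ls_arx(filt(y, c), filt(u, c), na, nb)
        n = filt(y, a) - apply_b(u, b)          # correlated residuals
        Phi = np.array([[-n[k - i] for i in range(1, nc + 1)]
                        for k in range(nc, len(n))])
        c = np.linalg.lstsq(Phi, n[nc:], rcond=None)[0]
    return a, b, c
```

The convergence of exactly this kind of iteration, under assumptions A1-A6, is what Theorems 1.2 and 1.3 establish.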
We can now postulate the

Lemma 1.1:  2V \ge \sigma_e^2    (1.48)

The equality holds if and only if \hat p = p (\hat A = A, \hat B = B and \hat C = C).

Proof: Let us consider the integral

I = \frac{\sigma_e^2}{2\pi i} \oint \left(\left[\frac{\hat A \hat C}{A C}\right](z) - 1\right)\left(\left[\frac{\hat A \hat C}{A C}\right](z^{-1}) - 1\right) \frac{dz}{z} \ge 0    (1.49)

Then

I = \frac{\sigma_e^2}{2\pi i} \oint \left[\frac{\hat A \hat C}{A C}\right](z) \left[\frac{\hat A \hat C}{A C}\right](z^{-1}) \frac{dz}{z} - \frac{2\sigma_e^2}{2\pi i} \oint \left[\frac{\hat A \hat C}{A C}\right](z) \frac{dz}{z} + \frac{\sigma_e^2}{2\pi i} \oint \frac{dz}{z}    (1.50)

Clearly, since A(z^{-1}) and C(z^{-1}) have all their zeros inside the unit circle (assumption A2) and A(0) = C(0) = 1, it follows that

\frac{1}{2\pi i} \oint \left[\frac{\hat A \hat C}{A C}\right](z) \frac{dz}{z} = \frac{\hat A(0) \hat C(0)}{A(0) C(0)} = 1,  \frac{1}{2\pi i} \oint \frac{dz}{z} = 1    (1.51)

so that

I = \frac{\sigma_e^2}{2\pi i} \oint \left[\frac{\hat A \hat C}{A C}\right](z) \left[\frac{\hat A \hat C}{A C}\right](z^{-1}) \frac{dz}{z} - \sigma_e^2 \ge 0    (1.52)

Hence, using (1.47),

2V = E\left[\left(\hat C \frac{\hat A B - A \hat B}{A} u\right)^2\right] + I + \sigma_e^2 \ge \sigma_e^2    (1.53)

The equality holds if and only if
(i) I = 0, which implies

\hat A \hat C = A C    (1.54)

(ii) E[(\hat C (\hat A B - A \hat B) u / A)^2] = 0, which is equivalent, provided u is persistently exciting of order n_b + n_c, to

\hat A B = A \hat B    (1.55)

Combining (1.54) and (1.55) and using Assumption A3 yields

\hat A = A,  \hat B = B,  \hat C = C    (1.56)

Q.E.D.

For all N, V_N(\hat p) and V(\hat p) are polynomials. Then the assertion of convergence of V_N to V as N \to \infty implies that V_N(\hat p) converges uniformly to V(\hat p) on every compact set. Hence, considering the estimates in a compact set G allows us to establish the

Lemma 1.2:  \lim_{N \to \infty} \hat p_N = \hat p_M  with probability 1    (1.57)

Proof: Let \alpha be an arbitrarily small positive real number and G_\alpha be the set

G_\alpha = \{ \hat p : \| \hat p - \hat p_M \| \le \alpha \}    (1.58)

Since V is continuous, there exists \beta > 0 such that

\min_{G - G_\alpha} V(\hat p) > V(\hat p_M) + 3\beta    (1.59)

Since V_N converges to V uniformly, there exists an integer M such that if N > M then

|V_N(\hat p) - V(\hat p)| < \beta    (1.60)

for all \hat p \in G. Thus

\min_{G - G_\alpha} V_N(\hat p) > \min_{G - G_\alpha} V(\hat p) - \beta > V(\hat p_M) + 2\beta > V_N(\hat p_M) + \beta    (1.61)

This means that \hat p_N is in G_\alpha. Q.E.D.

Finally, the Lemmas 1.1 and 1.2 yield the

Theorem 1.2: The ML estimator E_1 is asymptotically consistent.

4.3. Local extremum points

The goal of this Section is to show that the estimator E_2 is asymptotically consistent, i.e. that S_2 = \{p\}. To do so, we shall prove that the gradients of V(\hat p) with respect to \hat p vanish only at \hat p = p. In a recent paper [23], Söderström has shown the existence of counter-examples to a unique local minimum of V for "small" signal-to-noise ratios: this is a major drawback to the implementation of this method.
However, it will be shown that the introduction of one additional assumption allows us to overcome this problem. We shall thus investigate the conditions for the gradients of V to vanish, i.e.

\frac{\partial V}{\partial \hat p} = 0    (1.62)

Defining the polynomials

F = \hat C \hat A = 1 + \sum_{k=1}^{n} f_k z^{-k}    (1.63)

H = \hat C \hat B = \sum_{k=0}^{m} h_k z^{-k}    (1.64)

where

n = n_a + n_c    (1.65a)

m = n_b + n_c    (1.65b)

confers to V the form

2V = E\left[\left(\frac{F B - H A}{A} u\right)^2\right] + E\left[\left(\frac{F}{A C} e\right)^2\right]    (1.66)

and allows us to rewrite (1.62), via the chain rule (with the conventions \hat a_0 = \hat c_0 = 1 and zero coefficients outside their ranges), as

\frac{\partial V}{\partial \hat a_j} = \sum_{k=1}^{n} \hat c_{k-j} \frac{\partial V}{\partial f_k} = 0,  j = 1, ..., n_a    (1.67a)

\frac{\partial V}{\partial \hat b_j} = \sum_{k=0}^{m} \hat c_{k-j} \frac{\partial V}{\partial h_k} = 0,  j = 0, ..., n_b    (1.67b)

\frac{\partial V}{\partial \hat c_j} = \sum_{k=1}^{n} \hat a_{k-j} \frac{\partial V}{\partial f_k} + \sum_{k=0}^{m} \hat b_{k-j} \frac{\partial V}{\partial h_k} = 0,  j = 1, ..., n_c    (1.67c)

In the following, it will be assumed that

- A6. The polynomials \hat A and \hat C, and \hat B and \hat C, are relatively prime.

With this assumption, (1.67) implies

\frac{\partial V}{\partial f_j} = 0,  j = 1, ..., n    (1.68a)

\frac{\partial V}{\partial h_j} = 0,  j = 0, ..., m    (1.68b)

Using (1.66), (1.68) can be rewritten as

E\left[\frac{F B - H A}{A} u \cdot \frac{B}{A} z^{-j} u\right] + E\left[\frac{F}{A C} e \cdot \frac{1}{A C} z^{-j} e\right] = 0,  j = 1, ..., n    (1.69a)

E\left[\frac{F B - H A}{A} u \cdot z^{-j} u\right] = 0,  j = 0, ..., m    (1.69b)

Expanding F and H by (1.63)-(1.64) gives

\sum_{i=1}^{n} f_i \left( E\left[\frac{B}{A} z^{-i} u \cdot \frac{B}{A} z^{-j} u\right] + E\left[\frac{1}{A C} z^{-i} e \cdot \frac{1}{A C} z^{-j} e\right] \right) - \sum_{i=0}^{m} h_i E\left[z^{-i} u \cdot \frac{B}{A} z^{-j} u\right] + q_j = 0    (1.70a)

\sum_{i=1}^{n} f_i E\left[\frac{B}{A} z^{-i} u \cdot z^{-j} u\right] - \sum_{i=0}^{m} h_i E\left[z^{-i} u \cdot z^{-j} u\right] + r_j = 0    (1.70b)

where q_j and r_j collect the terms arising from the leading coefficient of F. Let us define the matrices

(M_1)_{ij} = E\left[\frac{1}{A C} z^{-i} e \cdot \frac{1}{A C} z^{-j} e\right],  i, j = 1, ..., n    (1.71)

(M_2)_{ij} = E\left[\frac{B}{A} z^{-i} u \cdot \frac{B}{A} z^{-j} u\right],  i, j = 1, ..., n    (1.72)

N_{ij} = E\left[\frac{B}{A} z^{-i} u \cdot z^{-j} u\right],  i = 1, ..., n,  j = 0, ..., m    (1.73)

P_{ij} = E\left[z^{-i} u \cdot z^{-j} u\right],  i, j = 0, ..., m    (1.74)

and the vectors

f = (f_1, ..., f_n)^T,  h = (h_0, ..., h_m)^T    (1.75)-(1.76)

q = (q_1, ..., q_n)^T,  r = (r_0, ..., r_m)^T    (1.77)-(1.78)

The equations (1.70) then take the form

(M_1 + M_2) f - N h + q = 0    (1.79a)

N^T f - P h + r = 0    (1.79b)

or, equivalently,

(M_1 + M_2 - N P^{-1} N^T) f = N P^{-1} r - q    (1.80a)

h = P^{-1} N^T f + P^{-1} r    (1.80b)

Defining

v = \frac{B}{A} u    (1.81)

we have

M_2 = [R_v(i-j)],  N = [R_{vu}(i-j)],  N^T = [R_{uv}(i-j)],  P = [R_u(i-j)]    (1.82)

Clearly, the matrix M_2 - N P^{-1} N^T is positive semi-definite (it is a Schur complement of the covariance matrix of (v, u)). Since M_1 is positive definite, the matrix M_1 + M_2 - N P^{-1} N^T, as the sum of a symmetric positive definite and a symmetric positive semi-definite matrix, is positive definite and hence non-singular.
The equation (1.80) has thus a unique solution, given by

f = (M_1 + M_2 - N P^{-1} N^T)^{-1} (N P^{-1} r - q)    (1.83a)

h = P^{-1} N^T (M_1 + M_2 - N P^{-1} N^T)^{-1} (N P^{-1} r - q) + P^{-1} r    (1.83b)

This proves that the gradients of V vanish for one and only one value of \hat p. From Theorem 1.2, this value coincides with p. We have thus established that, under assumptions A1-A6, S_2 = \{p\} and that

Theorem 1.3: The ML estimator E_2 is asymptotically consistent.

A direct consequence of the above study is that, provided the assumptions A1-A6 are verified, the ML method, in the case of an AR noise model, is convergent. It must be noticed that the Generalized Least-Squares (GLS) identification algorithm, proposed by Clarke [6], is nothing else than a ML method for which the loss function is minimized via a gradient algorithm. Theorems 1.2 and 1.3 thus establish the convergence proof of the GLS method. Before concluding, we shall make some comments about the various assumptions which have been made.

4.4. Comments

The reason why a finite N can cause difficulties is that the values of V_N(\hat p), for fixed \hat p, are stochastic variables, while V(\hat p), for fixed \hat p, is a deterministic function. The assumption A6 is a restrictive assumption, since no minimization algorithm can guarantee that this condition is fulfilled at each stage of the procedure. It is however easy, when a stationary point is reached, to test whether it is so or not. In the whole section, it was assumed that the noise process may be modelled as an autoregressive process

C(z^{-1}) n(k) = e(k)    (1.84)

This means that there exists a noise whitening filter which can be represented by its truncated impulse response \{c_i, i = 0, ..., n_c\}. This assumption has the drawback that there are no systematic rules for choosing the order n_c of the autoregression such that the c_i's, for i > n_c, can be neglected. However, we can reasonably argue the following points:
- an analysis of the noise can facilitate the choice of the order of the autoregression
- n_c can be determined iteratively by increasing it until the reduction of the loss function is no longer significant.

5 - CONCLUSION

In this Chapter, the ML estimator, for linear single input-single output systems, has been investigated. The statistical properties of the ML estimator have been established: they have been shown to depend on the noise model structure considered. For this purpose, two new results have been obtained:
- for a MA noise model, the ML method may not converge. Counterexamples to general convergence can be found.
- for an AR noise model, the ML method is convergent.

In the next Chapter, the above study will be extended to the multivariable case, where it will be shown that, once a canonical structure is defined, the extension is straightforward.

Chapter 2: MULTIVARIABLE SYSTEMS

1 - INTRODUCTION

As outlined in the previous chapter, the general definition of system identification allows many degrees of freedom in the formulation of the identification problem, leading one to take into account the following points:
- The choice of a class of models: impulse response, transfer function, state-space representation.
- The determination of a class of input signals.
- The choice of the estimator.
- The choice of the optimization algorithm.

When formulating and solving such problems, it is important to have the particular type of the system treated and the final goal of the identification procedure in mind. This cannot then be done from a purely mathematical point of view. It is, however, highly desirable to achieve a unified approach to these problems and, by embedding them in an abstract framework, obtain general results.
(i) The choice made for the class of models of multivariable systems is generally that of the state-space representation. Such a choice implies
- the determination of the system order and of a suitable canonical form
- the derivation of a set of input-output relations.

The use of a selector matrix S, as suggested by Gopinath [12], or of a set N of indexes, as shown by Mayne [16], allows one to describe the structure of a canonical model. These approaches have the drawback that there is no method for the choice of S or N. However, since such a method would have to be applied to input-output sequences, this problem cannot be solved in realistic cases, i.e. when no precise a priori knowledge of the noise characteristics is available. The various aspects of the determination of a canonical structure will be treated in Section 2.

Although the problem of deriving a set of input-output relations from the canonical state-space representation has been investigated by many authors, no unified approach can be found in literature. The methods proposed by Gopinath [12], Ackerman [1], Zuercher [28] or Guidorzi [13] are subject to strong restrictions. In Section 3, it is shown how to work out this problem in the most general case, permitting the decomposition of the system into subsystems. Moreover, this approach is extended to the noisy case, in which a suitable noise model is introduced in such a fashion as to preserve the advantages of the decomposition, even in the case of correlated noise.

(ii) In Section 4, a Maximum Likelihood estimator is described and the statistical properties of the Maximum Likelihood estimates are investigated.

(iii) Finally, in Section 5, the computational aspects are discussed. Two classes of optimization algorithms are described and compared.
2 - CANONICAL STRUCTURES

2.1. Introduction

The class of linear discrete-time multivariable systems considered is represented by the state equations

z(k+1) = F z(k) + G u(k)    (2.1a)

y(k) = H z(k)    (2.1b)

where z(k) \in R^n (state vector), u(k) \in R^m (input vector), y(k) \in R^r (output vector), F \in R^{n \times n} (state matrix), G \in R^{n \times m} (control matrix) and H \in R^{r \times n} (observation matrix). It is well known (see e.g. [15]) that (F_1, G_1, H_1) and (F_2, G_2, H_2) are similar, i.e. they lead to the same transfer function K(z) = H(zI - F)^{-1} G, if there exists a non-singular matrix T \in R^{n \times n} such that

F_2 = T F_1 T^{-1}    (2.2a)

G_2 = T G_1    (2.2b)

H_2 = H_1 T^{-1}    (2.2c)

The non-uniqueness of (F, G, H) results in a degree of arbitrariness in the realization algorithm K \to (F, G, H), which cannot be removed by simple normalization. The next sections show that each representation (F, G, H) is characterized by a set of integers (n_1, n_2, ..., n_s) and that, if (n_1, n_2, ..., n_s) is known, the representation (F, G, H) of (2.1) is unique.

2.2. Preliminaries

Let

\Gamma = [H^T (HF)^T ... (HF^{n-1})^T]^T,  \Delta = [G  FG ... F^{n-1}G]    (2.3)

be, respectively, the observability and controllability matrices. (F, G) and (F, H) are assumed to be, respectively, controllable and observable, which is equivalent to

rank(\Gamma) = rank(\Delta) = n    (2.4)

Let h_i^T denote the i-th row of H and let I and J denote the sets

I = \{1, 2, ..., r\}    (2.5)

J = \{j_1, j_2, ..., j_s\}    (2.6)

where 1 \le j_k \le r for all k = 1, 2, ..., s, s \le r. J is a permutation of any subset of I. Let T be the non-singular (n \times n) matrix defined by

T = [T_{j_1}^T  T_{j_2}^T ... T_{j_s}^T]^T,  where T_i has rows h_i^T, h_i^T F, ..., h_i^T F^{n_i},  i \in J    (2.7)

Since the pair (F, H) is observable, T non-singular implies

\sum_{i \in J} (n_i + 1) = n    (2.8)

T = P \Gamma,  where P is a selector matrix    (2.9)

From (2.7) and (2.9), it is clear that the transformation matrix T is completely defined by either the set \{n_{j_1}, n_{j_2}, ..., n_{j_s}\} or the matrix
P. This matrix is nothing else than the selector matrix, which is the basic concept of the approach suggested by Gopinath [12]. Although these two definitions of T are equivalent, the set of indexes {n_{j_1}, n_{j_2}, ..., n_{j_s}}, rather than P, will be used in the following.

2.3. Canonical forms
Defining the new state vector x(k) = T z(k) yields the state-space equations
   x(k+1) = A x(k) + B u(k)    (2.10a)
   y(k) = C x(k)               (2.10b)
where (A, B, C) is obtained from (F, G, H) through (2.2).
We can now establish the
Lemma 2.1: Let
- a_i^T be the i-th row of A
- b_i^T be the i-th row of B
- c_i^T be the i-th row of C
- e_i^T be the i-th unit vector
and define, for i = 1, 2, ..., s, the indexes
   ν_1 = 1,   ν_i = Σ_{k=1}^{i-1} n_{j_k} + i
Then
(i)  a_i^T = e_{i+1}^T,     i ∈ {1, 2, ..., n} − {ν_1+n_1, ν_2+n_2, ..., ν_s+n_s}    (2.11a)
     a_i^T unconstrained,   i ∈ {ν_1+n_1, ..., ν_s+n_s}    (2.11b)
(ii) c_{j_i}^T = e_{ν_i}^T,   i = 1, 2, ..., s    (2.12a)
     c_i^T unconstrained,     i ∈ {1, 2, ..., r} − {j_1, ..., j_s}    (2.12b)
Before investigating these results further, it is necessary to prove the
Lemma 2.2: The triplet (A, B, C) defined by (2.11)-(2.12) is unique.
Proof: Let us assume there exist two representations (A_1, B_1, C_1) and (A_2, B_2, C_2) of (2.10), and let Γ_1 and Γ_2 be the corresponding observability matrices. Since the representations are similar,
   Γ_2 = Γ_1 T^{-1}    (2.13)
and, by the structure (2.11)-(2.12), there exists a non-singular permutation matrix P such that
   P Γ_1 = [I_n^T  M_1^T]^T,   P Γ_2 = [I_n^T  M_2^T]^T    (2.14)
where I_n is the n × n unit matrix. Combining (2.13) and (2.14) yields
   P Γ_2 = P Γ_1 T^{-1}    (2.15)
whose first n rows give I_n = I_n T^{-1}, i.e. T = I_n.  QED
Lemmas 2.1 and 2.2 establish that the knowledge of a set of indexes {n_1, n_2, ..., n_s} implies the uniqueness of the triplet (A, B, C) characterizing the state representation of a multivariable system. In the following, we can assume, without loss of generality, that
   j_i = i,   i = 1, ..., s    (2.16)
i.e.
   J = {1, 2, ..., s},   1 ≤ s ≤ r    (2.17)
Hence, the canonical form of the triplet (A, B, C), given in (2.11) and (2.12), can be rewritten.
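The rank conditions (2.4) on the observability and controllability matrices (2.3) are easy to check numerically. The following minimal Python sketch builds Γ and Δ for a hypothetical second-order system (the matrices F, G, H below are illustrative values of ours, not taken from the thesis) and verifies (2.4):

```python
import numpy as np

def observability(F, H, n):
    # Gamma = [H; HF; ...; HF^{n-1}], stacked row-wise as in (2.3)
    return np.vstack([H @ np.linalg.matrix_power(F, k) for k in range(n)])

def controllability(F, G, n):
    # Delta = [G, FG, ..., F^{n-1}G]
    return np.hstack([np.linalg.matrix_power(F, k) @ G for k in range(n)])

# hypothetical second-order example
F = np.array([[0.0, 1.0], [-0.5, 1.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
n = 2
rank_G = np.linalg.matrix_rank(observability(F, H, n))
rank_D = np.linalg.matrix_rank(controllability(F, G, n))
print(rank_G, rank_D)  # both equal n = 2, so (2.4) holds for this example
```

When either rank drops below n, no non-singular T of the form (2.7) exists and the canonical construction below does not apply.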
The matrices A, B and C exhibit the following structure.
(i) A matrix: A is obtained by stacking s blocks,
   A = [A_1^T  A_2^T  ...  A_s^T]^T,   Σ_{i=1}^{s} (n_i + 1) = n    (2.18)
where A_i, i = 1, 2, ..., s, is the [(n_i+1) × n] matrix whose first n_i rows are the unit vectors e_{ν_i+1}^T, ..., e_{ν_i+n_i}^T and whose last row is the parameter row a_{ν_i+n_i}^T:
   A_i = [e_{ν_i+1}  e_{ν_i+2}  ...  e_{ν_i+n_i}  a_{ν_i+n_i}]^T    (2.19)
(ii) B matrix: correspondingly,
   B = [B_1^T  B_2^T  ...  B_s^T]^T    (2.20)
where B_i, i = 1, 2, ..., s, is the (n_i+1) × m matrix
   B_i = [b_{ν_i}  b_{ν_i+1}  ...  b_{ν_i+n_i}]^T    (2.21)
(iii) C matrix: the first s rows of C are the unit vectors e_{ν_1}^T, ..., e_{ν_s}^T and the last r−s rows are the parameter rows c̄_1^T, ..., c̄_{r−s}^T:
   C = [e_{ν_1}  ...  e_{ν_s}  c̄_1  ...  c̄_{r−s}]^T    (2.22)
The state equations then take, for each subsystem i = 1, ..., s, the form
   x_{ν_i}(k+1)       = x_{ν_i+1}(k)   + b_{ν_i}^T u(k)
   ...
   x_{ν_i+n_i−1}(k+1) = x_{ν_i+n_i}(k) + b_{ν_i+n_i−1}^T u(k)
   x_{ν_i+n_i}(k+1)   = a_{ν_i+n_i}^T x(k) + b_{ν_i+n_i}^T u(k)    (2.23a)
   y_1(k) = [x_{ν_1}(k)  x_{ν_2}(k)  ...  x_{ν_s}(k)]^T,   y_2(k) = [c̄_1  ...  c̄_{r−s}]^T x(k)    (2.23b)
where
   y_1(k) = [y_1(k)  y_2(k)  ...  y_s(k)]^T      (2.24a)
   y_2(k) = [y_{s+1}(k)  y_{s+2}(k)  ...  y_r(k)]^T    (2.24b)
Having obtained a canonical structure for the matrices A, B and C, a set of input-output equations can now be deduced from (A, B, C). Remarkable features of this formulation are that it exhibits the structure of the system and that it shows the link between the internal (state-space) representation and the external (input-output) equations. This extends the results of Gopinath [12], Zuercher [28] and Guidorzi [13].

3 - CANONICAL INPUT-OUTPUT EQUATIONS

3.1 Introduction
A direct consequence of the structure of the matrices A and C, derived in the above Section, is that the original system has been decomposed into s interconnected subsystems. The main goal of this Section is to derive an input-output representation of the system such that the subsystems can be identified separately. Section 3.2 is concerned with the noise-free case, while the noisy case is treated in Section 3.3.

3.2. Noise-free version
According to (2.23), the output vector y(k) can be divided into two components y_1(k) and y_2(k), y_1(k) [resp. y_2(k)] being related to x(k) through a relation whose parameters are known [resp. unknown].
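The fixed part of the canonical structure (2.18)-(2.22) can be assembled mechanically from the indexes n_i. The Python sketch below (the helper name is ours, not the thesis's) fills in the unit rows of A and C, leaving the parameter rows a_{ν_i+n_i}^T and c̄_i^T free; with n_1 = 1, n_2 = 0 it reproduces the structure used in Experiment 1 of Chapter 3.

```python
import numpy as np

def canonical_templates(n_list, r):
    # n_list = [n_1, ..., n_s]; 0-based nu_i here, whereas Lemma 2.1 is 1-based
    n = sum(ni + 1 for ni in n_list)
    A = np.zeros((n, n))
    nu = []
    row = 0
    for ni in n_list:
        nu.append(row)                    # start of block i
        for j in range(ni):               # trivial rows: a_i^T = e_{i+1}^T (2.11a)
            A[row + j, row + j + 1] = 1.0
        row += ni + 1                     # last row of the block stays free (2.11b)
    C = np.zeros((r, n))
    for i, v in enumerate(nu):            # c_{j_i}^T = e_{nu_i}^T (2.12a)
        C[i, v] = 1.0
    return A, C, nu

A, C, nu = canonical_templates([1, 0], 2)
print(nu)    # [0, 2]
print(C)     # rows e_1 and e_3: the C matrix of (3.2)
```

Rows 2 and 3 of A (the rows a_{ν_i+n_i}^T) and any rows of C beyond the first s remain zero here; they are the free parameters to be identified.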
The aim of this section is the derivation of an input-output equation u → y_1 involving the parameters of the matrices A and B, and then an input-output equation u, y_1 → y_2 containing the unknown parameters of C.

3.2.1. Parameters of A and B
From (2.19) and (2.21), it follows that the i-th subsystem, i = 1, 2, ..., s, is described by the set of equations
   x_{ν_i}(k+1) = x_{ν_i+1}(k) + b_{ν_i}^T u(k)
   x_{ν_i+1}(k+1) = x_{ν_i+2}(k) + b_{ν_i+1}^T u(k)
   ...
   x_{ν_i+n_i−1}(k+1) = x_{ν_i+n_i}(k) + b_{ν_i+n_i−1}^T u(k)
   x_{ν_i+n_i}(k+1) = a_{ν_i+n_i}^T x(k) + b_{ν_i+n_i}^T u(k)    (2.25)
Combining (2.22) and (2.25) leads to
   x_{ν_i}(k) = y_i(k)
   x_{ν_i+1}(k) = z y_i(k) − b_{ν_i}^T u(k)
   x_{ν_i+2}(k) = z^2 y_i(k) − z b_{ν_i}^T u(k) − b_{ν_i+1}^T u(k)
   ...
   x_{ν_i+n_i}(k) = z^{n_i} y_i(k) − b_{ν_i+n_i−1}^T u(k) − ... − z^{n_i−1} b_{ν_i}^T u(k)    (2.26)
Remark: It must be emphasized that z^{-1} is the time-delay operator, defined by z^{-j} f(k) = f(k−j).
Then, since
   x(k) = [x_{ν_1}(k) ... x_{ν_1+n_1}(k) ... x_{ν_s}(k) ... x_{ν_s+n_s}(k)]^T    (2.27)
the whole state vector can be expressed as
   x(k) = M(z) y_1(k) − N(z) u(k)    (2.28)
where M(z) is the (n × s) block-diagonal matrix whose i-th diagonal block is the column [1  z  ...  z^{n_i}]^T,
   M(z) = diag{[1 z ... z^{n_1}]^T, ..., [1 z ... z^{n_s}]^T}    (2.29)
and N(z) is the (n × m) matrix whose rows, grouped by subsystem, are
   row ν_i:        0^T
   row ν_i+1:      b_{ν_i}^T
   ...
   row ν_i+n_i:    b_{ν_i+n_i−1}^T + z b_{ν_i+n_i−2}^T + ... + z^{n_i−1} b_{ν_i}^T    (2.30)
The substitution of (2.28) into (2.10a) leads to the input-output relation
   [(zI − A) M(z)] y_1(k) = [(zI − A) N(z) + B] u(k)    (2.31)
In (2.31), however, only the (ν_2 − 1)-th, (ν_3 − 1)-th, ..., (ν_s − 1)-th and n-th equations are significant, the remaining ones being identities. By removing these identities, (2.31) takes the form
   P(z) y_1(k) = Q(z) u(k)    (2.32)
where P and Q are polynomial matrices in z
   P(z) = [P_ij(z)],   i = 1, ..., s,  j = 1, ..., s    (2.33a)
   Q(z) = [Q_ij(z)],   i = 1, ..., s,  j = 1, ..., m    (2.33b)
The polynomials P_ij(z) and Q_ij(z) are obtained by simple inspection of (2.31). It follows that
   P_ij(z) = δ_ij z^{n_j+1} − a_{ν_i+n_i, ν_j+n_j} z^{n_j} − ... − a_{ν_i+n_i, ν_j+1} z − a_{ν_i+n_i, ν_j}    (2.34a)
   Q_ij(z) = β_{ν_i+n_i, j} z^{n_i} + ... + β_{ν_i+1, j} z + β_{ν_i, j},   i = 1, ..., s,
j = 1, ..., m    (2.34b)
where the β's are related to the a's and b's through the equation
   [β_ij] = L [b_ij]    (2.35)
in which [β_ij] and [b_ij] are the (n × m) matrices of the coefficients β_ij and b_ij, and L = [L_ij], i, j = 1, ..., s, is a block matrix whose blocks L_ij are [(n_i+1) × (n_j+1)] upper-triangular matrices built by inspection from the entries of the parameter rows a_{ν_i+n_i}^T, the diagonal blocks L_ii having unit diagonal (2.36)-(2.37).
Remark: The matrix L is always non-singular: det(L) = 1, since det(L_ii) = 1.
We then obtain s subsystems whose input-output description is
   P_i1(z) y_1(k) + ... + P_is(z) y_s(k) = Q_i1(z) u_1(k) + ... + Q_im(z) u_m(k),   i = 1, ..., s    (2.38)
Defining the integers
   n̄_i = max(n_1, n_2, ..., n_{i−1}, n_i+1, n_{i+1}, ..., n_s),   i = 1, ..., s    (2.39)
and the parameters
   p^i_{jk} = −a_{ν_i+n_i, ν_j+n_j−k+1},   j = 1, ..., s,  k = 1, ..., n_j+1
   q^i_{jk} = β_{ν_i+n_i−k+1, j},          j = 1, ..., m,  k = 1, ..., n_i+1,    i = 1, ..., s    (2.40)
allows us to define the reciprocal polynomials
   P_ij(z^{-1}) = δ_ij z^{n_j−n̄_i+1} + p^i_{j1} z^{n_j−n̄_i} + ... + p^i_{j,n_j+1} z^{−n̄_i}
               = δ_ij z^{n_j−n̄_i+1} + Σ_{k=1}^{n_j+1} p^i_{jk} z^{n_j−n̄_i−k+1},   i = 1, ..., s,  j = 1, ..., s    (2.41)
   Q_ij(z^{-1}) = q^i_{j1} z^{n_i−n̄_i} + q^i_{j2} z^{n_i−n̄_i−1} + ... + q^i_{j,n_i+1} z^{−n̄_i}
               = Σ_{k=1}^{n_i+1} q^i_{jk} z^{n_i−n̄_i−k+1},   i = 1, ..., s,  j = 1, ..., m    (2.42)
and to rewrite (2.38) in the form
   P_i1(z^{-1}) y_1(k) + ... + P_is(z^{-1}) y_s(k) = Q_i1(z^{-1}) u_1(k) + ... + Q_im(z^{-1}) u_m(k),   i = 1, ..., s    (2.43)
In summary, a set of s input-output equations of the form
   Σ_{j=1}^{s} P_ij(z^{-1}) y_j(k) = Σ_{j=1}^{m} Q_ij(z^{-1}) u_j(k)    (2.44a)
   P_ij(z^{-1}) = δ_ij z^{n_j−n̄_i+1} + Σ_{k=1}^{n_j+1} p^i_{jk} z^{n_j−n̄_i−k+1},   j = 1, ..., s    (2.44b)
   Q_ij(z^{-1}) = Σ_{k=1}^{n_i+1} q^i_{jk} z^{n_i−n̄_i−k+1},   j = 1, ..., m,   i = 1, ..., s    (2.44c)
has been obtained. Recalling (2.40), (2.44) can be used in order to estimate the parameters of A and B.

3.2.2. Parameters of C
In equations (2.44), which allow us to identify the system dynamics, only the information contained in u and y_1 is used. In order to identify the matrix C, we shall now derive a set of equations making use of the information provided by y_2.
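Before turning to C, the elimination of the state leading to (2.38) can be verified numerically. The sketch below (our reconstruction, in Python) simulates the canonical structure with n_1 = 1, n_2 = 0 that reappears in Experiment 1 of Chapter 3, and checks the first-subsystem relation obtained by substituting (2.26) into (2.25); the constant and first-order coefficients of u correspond to the β's of (2.35).

```python
import numpy as np

# canonical structure with n_1 = 1, n_2 = 0 (nu_1 = 1, nu_2 = 3, 1-based);
# numerical values are the "true values" of Table 3.1
a1 = np.array([-0.5, -0.25, 0.25])    # parameter row a_{2,.}
a2 = np.array([1.375, 1.5, -0.75])    # parameter row a_{3,.}
A = np.vstack([[0.0, 1.0, 0.0], a1, a2])
B = np.array([[0.2, 0.3], [0.1, -0.275], [0.85, 0.95]])

rng = np.random.default_rng(3)
N = 50
u = rng.normal(size=(N, 2))
x = np.zeros((N, 3))
for k in range(N - 1):
    x[k + 1] = A @ x[k] + B @ u[k]
y1, y2 = x[:, 0], x[:, 2]             # C picks states x_1 and x_3

# first-subsystem input-output relation, state eliminated as in (2.26)-(2.31):
# y1(k+2) - a12 y1(k+1) - a11 y1(k) - a13 y2(k)
#   = b_1^T u(k+1) + (b_2 - a12 b_1)^T u(k)
b1, b2 = B[0], B[1]
lhs = y1[2:] - a1[1] * y1[1:-1] - a1[0] * y1[:-2] - a1[2] * y2[:-2]
rhs = u[1:-1] @ b1 + u[:-2] @ (b2 - a1[1] * b1)
print(np.allclose(lhs, rhs))  # True: the relation holds exactly, noise-free
```

The left-hand side carries the p-parameters of (2.40), the right-hand side the q-parameters, which is exactly the information exploited by the identification scheme.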
According to (2.23), it follows that
   y_2(k) = C̄ x(k)    (2.45)
where
   C̄ = [c̄_1  c̄_2  ...  c̄_{r−s}]^T    (2.46)
Then, combining (2.46) and (2.28) yields
   y_2(k) = C̄ M(z) y_1(k) − C̄ N(z) u(k)    (2.47)
or, equivalently,
   P'(z) y_1(k) − y_2(k) = Q'(z) u(k)    (2.48)
where P'(z) and Q'(z) are the matrices
   P'(z) = C̄ M(z) = [P'_ij(z)],   i = 1, ..., r−s,  j = 1, ..., s    (2.49a)
   Q'(z) = C̄ N(z) = [Q'_ij(z)],   i = 1, ..., r−s,  j = 1, ..., m    (2.49b)
whose constitutive elements P'_ij and Q'_ij are obtained from simple calculations:
   P'_ij(z) = c̄_{i,ν_j+n_j} z^{n_j} + c̄_{i,ν_j+n_j−1} z^{n_j−1} + ... + c̄_{i,ν_j+1} z + c̄_{i,ν_j},   i = 1, ..., r−s,  j = 1, ..., s    (2.50a)
   Q'_ij(z) = γ_{jn̄,i} z^{n̄−1} + γ_{jn̄−1,i} z^{n̄−2} + ... + γ_{(j−1)n̄+2,i} z + γ_{(j−1)n̄+1,i},   i = 1, ..., r−s,  j = 1, ..., m    (2.50b)
in which
   n̄ = max_i (n_i)    (2.51)
   γ_{(j−1)n̄+k, i} = Σ_{u=1}^{s} Σ_{v=1}^{n_u−k+1} c̄_{i,ν_u+v+k−1} b_{ν_u+v−1, j}    (2.52)
We then obtain r−s subsystems whose input-output description is
   P'_i1(z) y_1(k) + ... + P'_is(z) y_s(k) − y_{s+i}(k) = Q'_i1(z) u_1(k) + ... + Q'_im(z) u_m(k),   i = 1, ..., r−s    (2.53)
Recalling (2.51) and introducing the parameters
   p'^i_{jk} = c̄_{i,ν_j+n_j−k+1},   j = 1, ..., s,  k = 1, ..., n_j+1
   q'^i_{jk} = γ_{jn̄−k+1, i},       j = 1, ..., m,  k = 1, ..., n̄,    i = 1, ..., r−s    (2.54)
allows us to define the reciprocal polynomials
   P'_ij(z^{-1}) = z^{−n̄} P'_ij(z)
                = p'^i_{j1} z^{n_j−n̄} + p'^i_{j2} z^{n_j−n̄−1} + ... + p'^i_{j,n_j+1} z^{−n̄}
                = Σ_{k=1}^{n_j+1} p'^i_{jk} z^{n_j−n̄−k+1},   i = 1, ..., r−s,  j = 1, ..., s    (2.55)
   Q'_ij(z^{-1}) = z^{−n̄} Q'_ij(z)
                = q'^i_{j1} z^{−1} + q'^i_{j2} z^{−2} + ... + q'^i_{j,n̄} z^{−n̄}
                = Σ_{k=1}^{n̄} q'^i_{jk} z^{−k},   i = 1, ..., r−s,  j = 1, ..., m    (2.56)
and to rewrite (2.53) in the form
   P'_i1(z^{-1}) y_1(k) + ... + P'_is(z^{-1}) y_s(k) − y_{s+i}(k) = Q'_i1(z^{-1}) u_1(k) + ... + Q'_im(z^{-1}) u_m(k),   i = 1, ..., r−s    (2.57)
Remark: Using (2.54) and (2.52), the q'^i_{jk}'s can be expressed in terms of the p'^i_{jk}'s:
   q'^i_{jk} = Σ_{u=1}^{s} Σ_{v=1}^{n_u−n̄+k} p'^i_{u,n_u−v−n̄+k+1} b_{ν_u+v−1, j}    (2.58)
so that the number of unknown parameters in (2.53) is n.
In summary, a set of r−s input-output equations of the form
   Σ_{j=1}^{s} P'_ij(z^{-1}) y_j(k) − y_{s+i}(k) = Σ_{j=1}^{m} Q'_ij(z^{-1}) u_j(k)    (2.59a)
   P'_ij(z^{-1})
= Σ_{k=1}^{n_j+1} p'^i_{jk} z^{n_j−n̄−k+1},   j = 1, ..., s    (2.59b)
   Q'_ij(z^{-1}) = Σ_{k=1}^{n̄} q'^i_{jk} z^{−k},   j = 1, ..., m,   i = 1, ..., r−s    (2.59c)
has been obtained. Recalling (2.54), (2.59) allows us to estimate the parameters of the matrix C̄.
Equations (2.44) and (2.59) constitute the canonical input-output description of (2.10). The equivalence between the class of canonical state-space representations (2.10) and the input-output description (2.44)-(2.59) has thus been obtained. In the next Section, the results obtained above are extended to the case of data corrupted by noise.

3.3. Noisy version
If we assume that the measured input and output sequences are corrupted by additive noise,
   y_i(k) = ȳ_i(k) + v_i(k),   i = 1, ..., r    (2.60)
   u_i(k) = ū_i(k) + w_i(k),   i = 1, ..., m    (2.61)
where ȳ and ū denote the noise-free signals, the equations (2.44) and (2.59) take the form (the signs of the zero-mean noises being irrelevant, they are absorbed into v and w)
   P y_1 = Q u + P v_1 + Q w    (2.62)
   P' y_1 − y_2 = Q' u + P' v_1 + v_2 + Q' w    (2.63)
where
   v_1(k) = [v_1(k)  v_2(k)  ...  v_s(k)]^T      (2.64)
   v_2(k) = [v_{s+1}(k)  v_{s+2}(k)  ...  v_r(k)]^T    (2.65)
   v(k) = [v_1(k)^T  v_2(k)^T]^T    (2.66)
   w(k) = [w_1(k)  w_2(k)  ...  w_m(k)]^T    (2.67)
The i-th component n_i(k) of the noise vector
   n(k) = P v + Q w    (2.68)
defined by
   n_i(k) = Σ_{j=1}^{s} P_ij(z^{-1}) v_j(k) + Σ_{j=1}^{m} Q_ij(z^{-1}) w_j(k)    (2.69)
has a power-density spectrum given by
   Φ_{n_i}(ζ) = Σ_{j=1}^{s} P_ij(ζ) P_ij(ζ^{-1}) Φ_{v_j}(ζ) + Σ_{j=1}^{m} Q_ij(ζ) Q_ij(ζ^{-1}) Φ_{w_j}(ζ)    (2.70)
where Φ_{v_j} and Φ_{w_j} are the power-density spectra of v_j and w_j. Since Φ_{n_i}(ζ) can be expressed as the product
   Φ_{n_i}(ζ) = T_{n_i}(ζ) T_{n_i}(ζ^{-1})    (2.71)
then
   H_i(ζ^{-1}) = constant / T_{n_i}(ζ^{-1})    (2.73)
is the transfer function of a realizable filter. Since, in addition, (2.73) is equivalent to
   H_i(ζ) Φ_{n_i}(ζ) H_i(ζ^{-1}) = σ_i^2    (2.74)
H_i(ζ^{-1}) is the transfer function of a whitening filter. It follows immediately that
   n_i(k) = H_i^{-1}(z^{-1}) e_i(k)    (2.75)
where e_i(k) is a sequence of independent random variables with
   E[e_i] = 0    (2.76a)
   E[e_i(j) e_i(k)] = σ_i^2 δ_jk    (2.76b)
If the whitening filter is represented by its impulse response, the process (2.75) takes the form of an autoregressive process
   n_i(k) = e_i(k) / D_i(z^{-1})    (2.77)
where D_i
(z^{-1}) = 1 + Σ_j d^i_j z^{-j}    (2.78)
The equation (2.62) can then be rewritten in the form
   P y_1 = Q u + D e    (2.79)
where
   e(k) = [e_1(k)  e_2(k)  ...  e_s(k)]^T    (2.80)
   D(z^{-1}) = diag{1/D_1(z^{-1}), ..., 1/D_s(z^{-1})}    (2.81)
The equation (2.63), in the same manner, becomes
   P' y_1 − y_2 = Q' u + D' e'    (2.82)
where
   e'(k) = [e'_1(k)  e'_2(k)  ...  e'_{r−s}(k)]^T    (2.83)
   D'(z^{-1}) = diag{1/D'_1(z^{-1}), ..., 1/D'_{r−s}(z^{-1})}    (2.84)
We then obtain the noisy version of the equations (2.44) and (2.59):
   Σ_{j=1}^{s} P_ij(z^{-1}) y_j(k) = Σ_{j=1}^{m} Q_ij(z^{-1}) u_j(k) + e_i(k)/D_i(z^{-1}),   i = 1, ..., s    (2.85a)
   Σ_{j=1}^{s} P'_ij(z^{-1}) y_j(k) − y_{s+i}(k) = Σ_{j=1}^{m} Q'_ij(z^{-1}) u_j(k) + e'_i(k)/D'_i(z^{-1}),   i = 1, ..., r−s    (2.85b)
These results generalize those obtained in Section 3.2. The equivalence between the state-space representation and the input-output description of a multivariable system has thus been established in the case of data corrupted by noise. It must be pointed out that, as revealed by (2.85), the advantage of the decomposition of the system into subsystems has been preserved - in the sense that these subsystems can be identified separately - although no restrictive assumption on the noise characteristics has been made.
It is now a simple matter to formulate the identification problem. However, before stating and solving this problem, we shall make some comments.

3.4. Comments
The structure of the matrices A, B and C, given in Section 2, and, consequently, the structure of the input-output equations derived in Section 3, result from an arbitrary choice of the indexes n_i. It would be of importance to propose an efficient method for finding the set {n_1, n_2, ..., n_s} that is optimal with respect to a given criterion. However, whatever criterion we choose, such a method would have to be based on the measured input-output sequences, which leads to the following problem: how can one take into account noise whose statistics are unknown? For example, the minimal-representation case leads us to consider input-output data matrices: how can we assert whether these matrices are singular or not?
(Is a matrix whose determinant is 10^{-10} singular?) It then seems that this problem cannot be solved in realistic cases. Nevertheless, the choice of the n_i's can be guided by some physical considerations:
- The structure may be assigned by the physical meaning of the state variables.
- If the j-th output of the system is markedly more free from noise than the remaining ones, a maximization of the corresponding n_j could be advantageous.
- The number of unknown parameters is
     in (2.85a): n + m(n_i+1) + n_d,   i = 1, ..., s
     in (2.85b): n + n_d,   i = 1, ..., r−s
  If n is a multiple of s, it can be judicious, in order to minimize the computing requirements, to choose n_i = n/s, since the subsystems are identified separately.
Another problem arises with regard to the noise model: introducing a noise filter increases the number of unknown parameters, which affects the identification procedure, since the efficiency of any optimization algorithm decreases when the number of parameters increases. However, the noise model allows us
- to reduce the effect of the choice of the n_i's on the identification results, since this effect is directly related to the amount of noise corrupting the different outputs;
- to improve the efficiency of some simple minimization algorithms, which are sensitive to noise.
Moreover, such an approach provides a probabilistic framework which permits us to establish the mathematical properties of the estimates.

4 - MAXIMUM LIKELIHOOD ESTIMATOR

4.1. Introduction
The object of this Section is to present a Maximum Likelihood estimator. The equation of error and the likelihood function will be derived (Section 4.2) and the consistency of the Maximum Likelihood estimates will be proved (Section 4.3).

4.2. Equation of error
Recalling (2.85) leads us to choose the model
   Σ_{j=1}^{s} P̂_ij(z^{-1}) y_j(t) = Σ_{j=1}^{m} Q̂_ij(z^{-1}) u_j(t) + ε_i(t)/D̂_i(z^{-1}),   i = 1, ..., s    (2.86a)
   Σ_{j=1}^{s} P̂'_ij(z^{-1}) y_j(t) − y_{s+i}(t) = Σ_{j=1}^{m} Q̂'_ij(z^{-1}) u_j(t) + ε'_i(t)/D̂'_i(z^{-1}),   i = 1, ..., r−s    (2.86b)
where
   P̂_ij(z^{-1}) = δ_ij z^{n_j−n̄_i+1} + Σ_{k=1}^{n_j+1} p̂^i_{jk} z^{n_j−n̄_i−k+1}    (2.87)
   Q̂_ij(z^{-1}) = Σ_{k=1}^{n_i+1} q̂^i_{jk} z^{n_i−n̄_i−k+1}    (2.88)
   P̂'_ij
(z^{-1}) = Σ_{k=1}^{n_j+1} p̂'^i_{jk} z^{n_j−n̄−k+1}    (2.89)
   Q̂'_ij(z^{-1}) = Σ_{k=1}^{n̄} q̂'^i_{jk} z^{−k}    (2.90)
   D̂_i(z^{-1}) = 1 + Σ_{j=1}^{n_d} d̂^i_j z^{-j}    (2.91)
   D̂'_i(z^{-1}) = 1 + Σ_{j=1}^{n_d} d̂'^i_j z^{-j}    (2.92)
The residuals ε_i(t) [ε'_i(t)] are independent and Gaussian (0, σ_i) [(0, σ'_i)]. Rewriting (2.86) in the form
   ε_i(t) = D̂_i(z^{-1}) [ Σ_{j=1}^{s} P̂_ij(z^{-1}) y_j(t) − Σ_{j=1}^{m} Q̂_ij(z^{-1}) u_j(t) ],   i = 1, ..., s    (2.93a)
   ε'_i(t) = D̂'_i(z^{-1}) [ Σ_{j=1}^{s} P̂'_ij(z^{-1}) y_j(t) − y_{s+i}(t) − Σ_{j=1}^{m} Q̂'_ij(z^{-1}) u_j(t) ],   i = 1, ..., r−s    (2.93b)
yields the so-called equations of error
   ε_i(t) = Σ_{h=0}^{n_d} d̂^i_h [ Σ_{j=1}^{s} Σ_{k=1}^{n_j+1} p̂^i_{jk} y_j(t+n_j−n̄_i−k−h+1) + y_i(t+n_i−n̄_i−h+1)
            − Σ_{j=1}^{m} Σ_{k=1}^{n_i+1} q̂^i_{jk} u_j(t+n_i−n̄_i−k−h+1) ],   i = 1, ..., s    (2.94a)
   ε'_i(t) = Σ_{h=0}^{n_d} d̂'^i_h [ Σ_{j=1}^{s} Σ_{k=1}^{n_j+1} p̂'^i_{jk} y_j(t+n_j−n̄−k−h+1) − y_{s+i}(t−h)
            − Σ_{j=1}^{m} Σ_{k=1}^{n̄} q̂'^i_{jk} u_j(t−k−h) ],   i = 1, ..., r−s    (2.94b)
with d̂^i_0 = d̂'^i_0 = 1.
The Gaussianness of ε_i and ε'_i allows us to define the log-likelihood functions
   L_i = −(1/(2σ_i²)) Σ_{t=1}^{N} ε_i²(t) − N log σ_i − (N/2) log 2π,   i = 1, ..., s    (2.95a)
   L'_i = −(1/(2σ'_i²)) Σ_{t=1}^{N} ε'_i²(t) − N log σ'_i − (N/2) log 2π,   i = 1, ..., r−s    (2.95b)
where N is the number of available data. The Maximum Likelihood estimates of P̂_ij, Q̂_ij, P̂'_ij and Q̂'_ij are obtained by maximization of L_i and L'_i, or equivalently by minimization of the loss functions
   W_i = (1/N) Σ_{t=1}^{N} ε_i²(t),   i = 1, ..., s    (2.96a)
   W'_i = (1/N) Σ_{t=1}^{N} ε'_i²(t),   i = 1, ..., r−s    (2.96b)
Having defined the Maximum Likelihood estimation problem as the finding of the global minima of W_i and W'_i, we can determine the statistical properties of the estimates by examining these minima.

4.3. Statistical properties of the estimates
Throughout this Section, the following assumptions are made:
A1. To ensure that the model structure is appropriate, it is assumed that (2.85) really holds.
A2. For all (i, j), the polynomials P̂_ij(z^{-1}), P_ij(z^{-1}), D̂_i(z^{-1}) and D_i(z^{-1}) are assumed to have all their zeros inside the unit circle. This implies that the equations (2.85) are stable.
A3. For all i = 1, ..., s and j = 1, ..., min(m, s) [i = 1, ..., r−s and j = 1, ..., min(m, r−s)], the polynomials P_ij
(z^{-1}) and Q_ij(z^{-1}) [resp. P'_ij(z^{-1}) and Q'_ij(z^{-1})] are relatively prime. These conditions are fulfilled provided the system is controllable from u.
A4. All stochastic processes are ergodic.
Moreover, since, for a finite number of data N, the properties of W_i and W'_i may change drastically from one experiment to another, the asymptotic loss functions
   V_i = lim_{N→∞} W_i,   i = 1, ..., s    (2.97a)
   V'_i = lim_{N→∞} W'_i,   i = 1, ..., r−s    (2.97b)
will be considered in the following. For convenience, Ē[f(t)] will denote
   Ē[f(t)] = lim_{N→∞} (1/N) Σ_{t=1}^{N} f(t)    (2.98)
provided the limit exists. If f(t) is an ergodic stochastic process, Ē[f(t)] is the expected value of f(t).
The output data are governed by
   P y_1 = Q u + D e    (2.99a)
   P' y_1 − y_2 = Q' u + D' e'    (2.99b)
while the model is defined by
   P̂ y_1 = Q̂ u + D̂ ε    (2.100a)
   P̂' y_1 − y_2 = Q̂' u + D̂' ε'    (2.100b)
Let us first consider (2.99a) and (2.100a). Combining these two equations yields
   ε = D̂^{-1} P̂ P^{-1} D e + [D̂^{-1} P̂ P^{-1} Q − D̂^{-1} Q̂] u    (2.101)
Then, assuming that u and e are uncorrelated, and setting
   F = D̂^{-1} P̂ P^{-1} D    (2.102)
   G = D̂^{-1} P̂ P^{-1} Q − D̂^{-1} Q̂    (2.103)
allows us to write
   Ē[ε ε^T] = Ē[F e e^T F^T] + Ē[G u u^T G^T]    (2.104)
which clearly implies that
   Ē[ε ε^T] ≥ Ē[F e e^T F^T] = V    (2.105)
where A ≥ B means that A − B is positive semi-definite.
Applying Parseval's theorem, V becomes (Assumption A4)
   V = (1/(2πi)) ∮_{|ζ|=1} F(ζ) R F^T(ζ^{-1}) dζ/ζ    (2.106)
where
   R = Ē[e e^T]    (2.107)
Let us consider the integral
   J = (1/(2πi)) ∮_{|ζ|=1} [F(ζ) − I] R [F^T(ζ^{-1}) − I] dζ/ζ ≥ 0    (2.108)
Then
   J = V + J_1 − 2J_2    (2.109)
where
   J_1 = (1/(2πi)) ∮_{|ζ|=1} R dζ/ζ = R    (2.110)
   J_2 = (1/(2πi)) ∮_{|ζ|=1} F(ζ) R dζ/ζ = R    (2.111)
The latter equality follows from the fact that the elements of F(ζ) have no singularities inside the unit circle (Assumption A2) and that
   F(0) = D̂^{-1}(0) P̂(0) P^{-1}(0) D(0) = I
since D_i(0) = D̂_i(0) = 1 and P̂(0) = P(0). Hence
   J = V − R ≥ 0    (2.112)
and, recalling (2.105) and (2.107),
   Ē[ε ε^T] − Ē[e e^T] ≥ 0    (2.113)
Applying Sylvester's theorem implies that the diagonal elements must be non-negative:
   V_i = Ē[ε_i²] ≥ Ē[e_i²] = σ_i²,   i = 1, ..., s    (2.114)
Equality in (2.114) occurs if and only if
(i) J = V − R = 0, i.e.
   F = D̂^{-1} P̂ P^{-1} D = I    (2.115)
(ii) the second term of the right-hand side of (2.104) vanishes, i.e. - provided u is persistently exciting of order 2n + n_d + 1 (see Chapter 1) - if
   G = D̂^{-1} P̂ P^{-1} Q − D̂^{-1} Q̂ = 0    (2.116)
Combining (2.115) and (2.116) yields
   D P̂ = D̂ P    (2.117)
   D Q̂ = D̂ Q    (2.118)
which implies (Assumption A3)
   P̂ = P    (2.119)
   Q̂ = Q    (2.120)
   D̂ = D    (2.121)
We have then established the
Lemma 2.3: V_i ≥ σ_i², i = 1, ..., s (2.122). The equalities are obtained if and only if P̂ = P, Q̂ = Q and D̂ = D.
Lemma 2.3 states that the estimates of the parameters of the matrices A and B converge to the true values as N → ∞.
Let us now consider (2.99b) and (2.100b). Remarking, from (2.99a), that y_1 is given by
   y_1 = P^{-1} Q u + P^{-1} D e    (2.123)
the equations (2.99b) and (2.100b) can be rewritten as
   y_2 = (P' P^{-1} Q − Q') u + P' P^{-1} D e − D' e'    (2.124a)
   y_2 = (P̂' P^{-1} Q − Q̂') u + P̂' P^{-1} D e − D̂' ε'    (2.124b)
from which we can deduce
   ε' = [D̂'^{-1} (P̂' − P') P^{-1} Q − D̂'^{-1} (Q̂' − Q')] u + [D̂'^{-1} (P̂' − P') P^{-1} D] e + [D̂'^{-1} D'] e'    (2.125)
Proceeding as above leads us to the
Lemma 2.4: V'_i ≥ σ'_i², i = 1, ..., r−s (2.126). The equalities are obtained if and only if P̂' = P', Q̂' = Q' and D̂' = D'.
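The inequality (2.114) can be illustrated by a small Monte-Carlo experiment in a scalar special case: for data generated by y(k) = a y(k−1) + b u(k−1) + e(k) with white e, the empirical loss exceeds the noise variance σ² at any wrong parameter values and approaches σ² at the true ones. The sketch below is our illustration (all numerical values arbitrary), not part of the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
a, b, sigma = 0.7, 1.0, 0.5        # illustrative true parameters
N = 200000
u = rng.normal(size=N)
e = sigma * rng.normal(size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = a * y[k - 1] + b * u[k - 1] + e[k]

def V(a_hat, b_hat):
    # empirical counterpart of the asymptotic loss (2.97): mean squared residual
    eps = y[1:] - a_hat * y[:-1] - b_hat * u[:-1]
    return np.mean(eps ** 2)

print(V(a, b))                      # close to sigma^2 = 0.25
print(V(0.5, 1.0) > V(a, b))        # biased dynamics: loss strictly larger
print(V(0.7, 0.8) > V(a, b))        # biased input gain: loss strictly larger
```

This is exactly the mechanism behind Lemmas 2.3 and 2.4: the global minimum of the asymptotic loss is attained only at the true parameters, with minimum value σ_i².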
Lemma 2.4 proves that the estimates of the parameters of the matrix C converge to the true values as N → ∞. We then obtain the
Theorem 2.1: The Maximum Likelihood estimate of the triplet (A, B, C) is asymptotically consistent.
The remaining problem is the minimization of the loss functions. Inspecting (2.96) and (2.94) shows that minimizing W_i and W'_i is a non-linear optimization problem. The object of the next Section is to present the various classes of non-linear optimization algorithms.

5 - MINIMIZING THE LOSS FUNCTIONS

5.1. Introduction
The nonlinear minimization problem can be formally stated as
   Minimize V(p),   p ∈ E^n    (2.127)
A large number of methods have been proposed to solve the general nonlinear minimization problem [20, 14]. These methods can be divided into two classes:
- the methods that use derivatives
- the methods that do not use derivatives.
In the following, the most commonly used methods are briefly described.

5.2. Minimization procedures using derivatives
We first consider how to solve problem (2.127) by algorithms that make use of first and, possibly, second derivatives of V(p).

5.2.1. Gradient methods [14]
At the k-th stage, the transition from a point p^(k) to another point p^(k+1) is given by
   p^(k+1) = p^(k) − λ_k ∇V(p^(k))    (2.128)
The negative of the gradient gives the direction for optimization but not the magnitude of the step, so that various steepest-descent algorithms are possible, depending on the choice of λ_k. Many methods of selecting λ_k are available. It can be shown that, under suitable conditions, the method converges as k → ∞. However, this theoretical result is of little interest in practice because the rate of convergence can be intolerably slow.

5.2.2. Newton's methods [19]
The second-derivative methods, among which Newton's method is the best known, originate from the quadratic approximation of V(p).
The transition from p^(k) to p^(k+1) is
   p^(k+1) = p^(k) − [∇²V(p^(k))]^{-1} ∇V(p^(k))    (2.129)
The convergence is guaranteed if the Hessian matrix of V is positive definite. This is a major drawback of the method, since for functions which are not strictly convex, Newton's method can diverge.

5.2.3. Quasi-Newton methods [14]
Quasi-Newton methods approximate the Hessian matrix or its inverse but use information from only first-order derivatives. At stage (k+1), p^(k+1) is computed from p^(k) through
   p^(k+1) = p^(k) − λ_k A(p^(k)) ∇V(p^(k))    (2.130)
where A is an approximation of the inverse of the Hessian. The problem of approximating the Hessian has been investigated by Pearson [17], Davidon - Fletcher - Powell [14] and Fletcher [10].

5.3. Minimization procedures without derivatives
This Section is concerned with derivative-free methods (search methods). As a general rule, first-derivative and second-derivative methods converge faster than search methods. However, in practice, the derivative-type methods have two main drawbacks. First, it can be laborious or impossible to provide analytical expressions for the derivatives. Second, the derivative-type methods require a large amount of problem preparation as compared with search methods. Two of the many existing search algorithms are briefly described; a complete description of these two methods is given in [14].

5.3.1. Rosenbrock's method [22]
Rosenbrock's method locates p^(k+1) by successive unidimensional searches from an initial point along a set of orthonormal directions v_1^(k), ..., v_n^(k) generated by the Gram-Schmidt procedure.

5.3.2. Powell's method [18]
Powell's method locates the minimum of V by successive unidimensional searches from an initial point along a set of conjugate directions.
Choosing an algorithm is not a simple matter, since no one method appears to be far superior to all the others.
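The steepest-descent transition (2.128) can be sketched in a few lines; the version below uses a central-difference gradient and a fixed step λ on a strictly convex quadratic (a toy stand-in for W_i; all numerical values are illustrative, and function names are ours):

```python
import numpy as np

def num_grad(f, p, h=1e-6):
    # central-difference gradient, useful when analytic derivatives of V are laborious
    g = np.zeros_like(p)
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = h
        g[i] = (f(p + dp) - f(p - dp)) / (2 * h)
    return g

def steepest_descent(f, p0, lam=0.1, iters=500):
    # p^(k+1) = p^(k) - lambda * grad V(p^(k))   as in (2.128), fixed lambda
    p = p0.astype(float)
    for _ in range(iters):
        p = p - lam * num_grad(f, p)
    return p

V = lambda p: (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2
p_star = steepest_descent(V, np.array([5.0, 5.0]))
print(np.round(p_star, 3))  # close to the minimizer [1, -0.5]
```

On an ill-conditioned loss the fixed step must be taken very small, which is precisely the "intolerably slow" convergence mentioned above; line searches or the quasi-Newton update (2.130) address this.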
The choice of a particular algorithm rests on the formulation of the problem and the experience of the practitioner. For the identification of multivariable systems, search methods seem more appropriate than derivative-type methods because of the generally large number of parameters. However, if the computational (storage) requirements are considered, a quasi-Newton method can be chosen. In Chapter 3, the various aspects of these problems are treated on some examples.

6 - CONCLUSION
In this chapter a unified approach to the identification of multivariable systems has been presented.
Although the problem of deriving the input-output description of a multivariable system from its state-space representation has been deeply investigated by many authors, no general result is available in the literature. The results of Section 3 bridge this gap. Compared with the previous methods, the present one has the following advantages.
- It is the most general approach, since all the previous methods are special cases of the above method:
  a) If the number of subsystems is equal to the number of outputs (s = r), one obtains the input-output representation derived by Gopinath [12] and Zuercher [28].
  b) If the n_i's are chosen as large as possible, one obtains the so-called minimal realization derived by Ackermann [1].
  c) If the two above conditions are satisfied simultaneously, the input-output representation suggested by Guidorzi [13] is obtained.
- The identification of the system dynamics (matrices A and B) is based on the information provided by u and y_1. This allows a discrimination of the input-output data, which is of interest if some outputs of the system are markedly more free from noise than the others.
- Equations (2.28) and (2.25) provide a very simple state estimator. Having estimated the parameters of P and Q (i.e.
A and B), equation (2.99a) gives an estimate of the output vector y_1, say ŷ_1, from which the noise has been removed:
   P ŷ_1 = Q u    (2.131)
Then, provided u is noise-free, the state estimator
   x̂(k) = M(z) ŷ_1(k) − N(z) u(k)    (2.132)
gives an unbiased estimate of x(k). This formulation leads to an alternative approach to the identification of the parameters of the observation matrix: the parameters of C can be estimated through equation (2.45).
The Maximum Likelihood estimator, introduced in Section 4, allows us to establish the consistency of the estimates in the case of correlated noise. All these results constitute a global approach to the identification of linear multivariable systems.
The whole procedure described in the previous Sections can be extended, in an entirely obvious way, to systems whose input-output structure is known a priori (transfer functions, impulse response). Due to the recursive structure of the computation scheme, the procedure can be extended, with only minor modifications, to an on-line identification procedure.

Chapter 3 : EXPERIMENTS

1 - INTRODUCTION
The aim of this Chapter is to present some examples of Maximum Likelihood identification of multivariable systems. For this purpose, tests will be made on simulated data
- for different minimization algorithms
- for different noise powers.

2 - EXPERIMENT 1
We consider the following third-order system with 2 inputs and 2 outputs:

   x(k+1) = | 0     1     0    |        | b_11  b_12 |
            | a_11  a_12  a_13 | x(k) + | b_21  b_22 | u(k)    (3.1)
            | a_21  a_22  a_23 |        | b_31  b_32 |

   y(k) = | 1  0  0 | x(k)    (3.2)
          | 0  0  1 |

This system has been simulated on an IBM 370 computer. The inputs and the outputs were sequences of length N = 250. The loss function was minimized via the Fletcher algorithm [10]. Tables 3.1 and 3.2 give the results for different values of the signal-to-noise ratio η (the ratio of the output-signal power to the noise power over the N samples).

3 -
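The noise-free simulation underlying Experiment 1 can be sketched as follows. This is our reconstruction of the set-up, not the original IBM 370 code: the parameter values are the "true values" of Table 3.1, and the canonical structure makes y(k) = (x_1(k), x_3(k)).

```python
import numpy as np

# system (3.1)-(3.2) with the true parameter values of Table 3.1
A = np.array([[0.0,   1.0,   0.0 ],
              [-0.5, -0.25,  0.25],
              [1.375, 1.5,  -0.75]])
B = np.array([[0.2,  0.3  ],
              [0.1, -0.275],
              [0.85, 0.95 ]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(2)
x = np.zeros(3)
ys = []
for k in range(250):                  # N = 250 input-output samples
    u = rng.normal(size=2)            # white test input (our choice)
    ys.append(C @ x)                  # outputs are the states x_1 and x_3
    x = A @ x + B @ u
ys = np.asarray(ys)
print(ys.shape)  # (250, 2)
```

With these values the state matrix has eigenvalues −0.5, −0.25, −0.25 (spectral radius 0.5), so the simulated sequences stay bounded; adding output noise of prescribed power to ys then produces the finite-η cases of Tables 3.1-3.2.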
EXPERIMENT 2
The following system has been simulated on an IBM 370 computer:

   x(k+1) = | 0     1     0     0    |        | b_11  b_12 |
            | a_11  a_12  a_13  a_14 | x(k) + | b_21  b_22 | u(k)    (3.3)
            | 0     0     0     1    |        | b_31  b_32 |
            | a_21  a_22  a_23  a_24 |        | b_41  b_42 |

   y(k) = | 1     0     0     0    |
          | 0     0     1     0    | x(k)    (3.4)
          | c_11  c_12  c_13  c_14 |

The identification of this system has been achieved by using the Rosenbrock minimization algorithm for a set of N = 250 input-output data. The results are reported in Tables 3.3 and 3.4.
The results given in Tables 3.1 - 3.4 show the usefulness, in the case of small signal-to-noise ratios, of the noise model, which allows one to eliminate the bias on the parameters. This confirms the theoretical results of the previous chapters.

   parameters   true values   estimated values
   a_11         -0.5          -0.5000
   a_12         -0.25         -0.2499
   a_13          0.25          0.2500
   a_21          1.375         1.3749
   a_22          1.5           1.5000
   a_23         -0.75         -0.7500
   b_11          0.2           0.1999
   b_12          0.3           0.2999
   b_21          0.1           0.0999
   b_22         -0.275        -0.2750
   b_31          0.85          0.8499
   b_32          0.95          0.9500

Table 3.1 : η = ∞

   parameters   true values   estimated (n_d = 0)   estimated (n_d = 3)
   a_11         -0.5           0.083                -0.489
   a_12         -0.25         -0.236                -0.252
   a_13          0.25          0.018                 0.249
   a_21          1.375         1.005                 1.328
   a_22          1.5           1.435                 1.506
   a_23         -0.75         -0.595                -0.751
   b_11          0.2           0.181                 0.210
   b_12          0.3           0.329                 0.296
   b_21          0.1           0.084                 0.097
   b_22         -0.275        -0.237                -0.259
   b_31          0.85          0.426                 0.879
   b_32          0.95          0.874                 0.898

Table 3.2 : η = 10 (n_d : noise model order)

   parameters   true values   estimated (n_d = 0)   estimated (n_d = 3)
   a_11         -1            -1.274                -1.061
   a_12          0.5           0.067                 0.506
   a_13          2             2.186                 2.003
   a_14          1.5           1.324                 1.491
   a_21          1             0.645                 0.979
   a_22          2.5           2.484                 2.485
   a_23          0.25         -0.016                 0.261
   a_24          1             0.715                 1.083
   b_11          0.5           0.674                 0.501
   b_12          0.5           0.323                 0.518
   b_21          2             2.711                 2.215
   b_22         -1            -1.617                -1.184
   b_31          4             3.977                 3.899
   b_32          3.2           1.006                 2.994
   b_41          2.7          -0.199                 2.645
   b_42         -0.2          -1.157                -0.247
   c_11          1            -0.874                 1.012
   c_12         -1            -1.015                -1.024
   c_13         -1            -1.637                -1.127
   c_14          1             0.421                 0.919

Table 3.3 : η = 10 (n_d : noise model order)

   parameters   true values   estimated (n_d = 0)   estimated (n_d = 3)
   a_11         -1             1.429                -1.064
   a_12          0.5          -0.126                 0.512
   a_13          2             2.114                 2.004
   a_14          1.5           1.677                 1.494
   a_21          1             0.429                 0.875
   a_22          2.5           2.734                 2.515
   a_23          0.25         -0.126                 0.299
   a_24          1             0.834                 1.077
   b_11          0.5           0.722                 0.494
   b_12          0.5           0.299                 0.577
   b_21          2             2.612                 2.129
   b_22         -1            -1.739                -1.227
   b_31          4             2.984                 3.725
   b_32          3.2           1.008                 2.999
   b_41          2.7          -0.237                 2.825
   b_42         -0.2          -0.975                -0.251
   c_11          1             0.827                 1.108
   c_12         -1            -1.115                -1.009
   c_13         -1            -1.725                -1.118
   c_14          1             0.622                 0.905

Table 3.4 (n_d : noise model order)

CONCLUSIONS
The purpose of this thesis is to investigate the Maximum Likelihood identification of linear discrete-time systems.
The statistical properties of the Maximum Likelihood estimator for single input-single output systems are analyzed. Two different types of noise model structure are considered and two new results relative to the convergence properties of identification methods are obtained.
A general method for deriving the input-output description of a multivariable system from its state-space representation is proposed. It is shown that this approach is superior, from different points of view, to those proposed in the literature.
Based on the results obtained in the single input-single output case, a Maximum Likelihood estimation procedure for multivariable systems is described and convergence properties are derived. It is shown that the results can be generalized in various directions.
Finally, numerical results of Maximum Likelihood identification are obtained.

REFERENCES
[1] Ackermann, J.E. and Bucy, R.S., "Canonical minimal realization of a matrix of impulse response sequences", Inform. Contr. 19, pp. 224-231, 1971.
[2] Åström, K.J. and Bohlin, T., "Numerical identification of linear dynamic systems from normal operating records", IFAC Symp. on theory of self-adaptive systems, Teddington, England, 1965. In Theory of self-adaptive control systems (ed. by P.H. Hammond), Plenum Press, New York, 1966.
[3] Åström, K.J. and Eykhoff, P., "System identification - A survey", Automatica 7, pp. 123-162, 1971.
o It It tl [4] Astrom, K.J., and Soderstrom, T. , "Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model", IEEE Trans. Aut. Control, Special issue on identification and time series analysis, AC-19, pp. 769-773, 1974. [5] Bohlin, T., "On the problem of ambiguities in maximum likelihood identification", Automatica 7, pp. 199-210, 1971. [6] Clarke, D.W., "Generalized Least-Squares estimation of the para meters of a dynamic model", IFAC Symp. on identification in autom. . control systems, Prague, paper 3.17, 1967. [7] Cramer, H., "Mathematical methods of statistics", Princeton University Press, 1946. [8] Cuenod, M. and Sage, A., "Comparison of some methods used for process identification", Automatica 4, pp. 235-269, 1968. [9] Eykhoff, P., "System identification, parameter and state estimation", Wiley, 1974. [10] Fletcher, R., "A new approach to variable metric methods", Computer J. 13, pp. 317-322, 1970. [11] Fletcher, R. and Powell, M.J.P., "A rapidly convergent descent method for minimization", Computer J. 6, pp. 163-168, 1963. [12] Gopinath, B., "On the identification of linear time invariant systems from input-output data", Bell. Syst. Tech. J. 48, pp. 1101-1113, 1969. [13] Guidorzi, R., "Canonical structures in the identification of multivariable systems", Automatica 11, pp. 361-374, 1975. [14] Himmelblau, P.M., "Applied nonlinear programming", McGraw Hill 1972. 86 [15] Luenberger, D.G., "Canonical forms for linear multivariable systems", IEEE Trans. Aut. Control, AC-12, pp. 290-293, 1967. [16] Mayne, D.Q., "Parametrization and identification of linear multivariable systems", Lecture notes in mathematics, Springer Verlag, 1972. [17] Pearson, J.D., "Unconstrained minimization of a function of several variables", Computer J. 13, pp. 171-178, 1969. [18] Powell, M.J.D., "An efficient method of finding the minimum of a function of several variables without calculating derivatives", Computer J. 7, pp. 155-162, 1965. 
[19] Powell, M.J.D., A survey of numerical methods for unconstrained optimization, SIAM Rev. 12, pp. 79-97, 1970. [20] Richalet, J., Rault, A. and Pouliquen, R. , "Process identifi cation by the model method" (in French), Gordon and Breach, 1971. [21] Rosenblueth, A. and Wiener, N., "The role of models in science", Philosophy of Science 12, pp. 316-321, 1945. [22] Rosenbrock, H.H., "An automatic method for finding the greatest or least value of a function", Computer J. 3, pp. 175-184, 1970. [23] SciderstrHm, T., "Convergence properties of the Generalized Least-Squares identification method", Automatica 10, 1974. [24] SBderstrcltn, T., "On the uniqueness of Maximum Likelihood identification", Automatica 11, pp. 193-197, 1975. [25] Soudack, A.C, Suryanarayanan, K.L. and Rao, S.G. , "A unified approach to discrete-time systems identification", Int. J. Control, Vol. 14, No. 6, pp. 1009-1029, 1971. [26] Wilde, D.J., "Optimum seeking methods", Prentice Hall, 1964. [27] Woo, K.T., "Maximum Likelihood identification of noisy systems." Second IFAC Symp. identification and process parameter estima tion, Prague, paper 3.1, 1970. [28] Zuercher, H., "Linear systems identification algorithms for a minicomputer", M.A.Sc. Thesis, Dept. of Elect. Engg.,^University of British Columbia, 1974. 
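As an illustration of how such a numerical experiment can be set up, the following sketch simulates a system with the companion structure of eqs. (3.3)-(3.4) and forms the output-error cost that a derivative-free optimizer such as Rosenbrock's would minimize. This is a hedged sketch, not the thesis's code: the a and b parameter values are placeholders chosen small enough for stability (only the c values follow Table 3.3), the input signal is arbitrary, and all function names are the author's assumptions.

```python
import numpy as np

def make_system(theta):
    # Unpack the 20 parameters of eqs. (3.3)-(3.4):
    # 8 a's (rows 2 and 4 of A), 8 b's (B), 4 c's (third row of C).
    A = np.zeros((4, 4))
    A[0, 1] = 1.0            # first companion row  [0 1 0 0]
    A[1, :] = theta[0:4]     # [a11 a12 a13 a14]
    A[2, 3] = 1.0            # third companion row  [0 0 0 1]
    A[3, :] = theta[4:8]     # [a21 a22 a23 a24]
    B = theta[8:16].reshape(4, 2)
    C = np.vstack([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   theta[16:20]])
    return A, B, C

def simulate(theta, u):
    """Run x(k+1) = A x(k) + B u(k), y(k) = C x(k) from x(0) = 0."""
    A, B, C = make_system(theta)
    x = np.zeros(4)
    y = np.empty((u.shape[0], 3))
    for k in range(u.shape[0]):
        y[k] = C @ x
        x = A @ x + B @ u[k]
    return y

rng = np.random.default_rng(1)

# Placeholder "true" parameters (a's kept small so the sketch is stable;
# the thesis's actual true values are those listed in Tables 3.3 and 3.4).
theta_true = np.concatenate([
    [0.10, -0.20, 0.05, 0.10],   # a11..a14 (illustrative)
    [0.05, -0.10, 0.20, 0.10],   # a21..a24 (illustrative)
    rng.uniform(-0.5, 0.5, 8),   # b11..b42 (illustrative)
    [1.0, -1.0, -1.0, 1.0],      # c11..c14 (true values of Table 3.3)
])

u = rng.normal(size=(250, 2))    # N = 250 input samples, 2 inputs
y = simulate(theta_true, u)      # noise-free output record

def cost(theta):
    # Output-error criterion; the thesis minimizes the related negative
    # log-likelihood (with a noise model of order n_d) using Rosenbrock's
    # derivative-free method.
    e = y - simulate(theta, u)
    return float(np.sum(e * e))
```

Passing `cost` to any derivative-free minimizer plays the role of the Rosenbrock step in the experiments above; adding output noise at a given signal-to-noise ratio, with and without a noise model, is what distinguishes the n_d = 0 and n_d = 3 columns of Tables 3.2-3.4.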

