APPLICATION OF AUTOREGRESSIVE FILTERING TECHNIQUES TO FLOOD FLOW PREDICTION

by

PETER G. CAHOON
B.Sc., Dalhousie University, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, The Department of Civil Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
March, 1978
© Peter G. Cahoon, 1978

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Civil Engineering
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Date: April 5, 1978

ABSTRACT

This thesis describes a study of the application of multichannel deconvolution, an autoregressive filtering technique, to the problem of flood flow prediction. The application is divided into four component segments:

1. The description of the behaviour of a multiple input, multiple output basin using multichannel autoregressive techniques with multiple lags of finite length. These descriptions fall into two categories: a) the Wiener autoregressive technique; b) minimum entropy autoregression.

2. The restatement of the input/output problem as a time-varying state-space description with a feedback mechanism for incorporating information having a unit time delay.

3. The analysis and characterization of the state-space and autoregressive methods using standard spectral analysis techniques and statistical confidence limits.

4. The linking of the state-space/autoregression characterizations for snowpack depletion and rainfall/runoff into a finite state machine algorithm coalescing the two processes so they may be linked to satellite information.

Several case studies were used in which the multiple precipitation records and multiple flow records were characterized. These steady-state characteristics were then updated using the Kalman state-space description to provide an "online" information update.

The snowpack depletion problem was treated as a multiple input (rainfall, temperature, humidity), single output (snowmelt) phenomenon and characterized by a single multichannel autoregression. A second class of characterizations was employed to cope with the "spike" arrivals caused by rapid snowmelt flowing into a single output. This minimum entropy technique is developed for "flash flood" prediction. Finally, a finite state machine algorithm is developed to link the snowpack depletion to the rainfall/runoff problem in such a way that it can be readily linked to satellite-produced data streams.

For those unfamiliar with the complex and obtuse language of deconvolution and control theory, the role of this thesis can be described by analogy: you have arrived in the middle of a party where the participants are all quite intoxicated. They are huddled together in small knots, talking excitedly but without much clarity about decon operators, M.E.D. processes, frames and finite state machines.
My role is to catch you up on the conversation and help you find your way to the bar.

TABLE OF CONTENTS

Chapter
1 INTRODUCTION
  A Brief Literature Review
2 THE CONVOLUTION PROBLEM
3 KALMAN FILTERING AND THE RECURSIVE LEAST SQUARES PROBLEM
4 STATE SPACE REPRESENTATION OF THE DECONVOLUTION PROCESS
5 IMPLEMENTATION OF AN ADAPTIVE KALMAN FILTER TO ESTIMATING RUNOFF
  Computation of the Kalman Filter
  Wiener-Kalman Formulation Assumptions
6 CASE STUDIES
  Computation of the Confidence Bounds on Predicted Traces
  Considerations
  Other Applications
  Other Techniques: Compatibility
  Numerical Examples: Example No. 1
  Discussion
7 FINITE STATE MACHINES
  Numerical Example No. 2 (Snowpack Depletion at Mission)
  Numerical Example No. 3 (Squamish problem using Wiener filters as error filters)
8 OTHER CLASSES OF FILTERS
  Minimum Entropy Deconvolution
9 CONCLUSIONS
BIBLIOGRAPHY

ACKNOWLEDGEMENTS

The author wishes to express his appreciation to his supervisor, S.O. Russell, whose patience and guidance made possible the research and preparation of this thesis. The author also wishes to acknowledge the others involved at various stages of the thesis preparation: Bill Caselton for his constructive criticism; Tad Ulrych for his generous loan of countless algorithms and his introduction to the Wiener processes; and Colin Walker for his explanation of them.

Chapter 1

INTRODUCTION

The problems associated with computing flood flow predictions from meteorologic data have occupied hydrologists for a number of years. Approaches to hydrologic forecasting have proliferated through recent years into a long and ever expanding list, running from purely deterministic models through to spectral analysis. Present in all time series analysis is a dualism between deterministic observations and theoretical formulation that is fundamental to all systems. Usually, the data available represent an inadequate base for thoroughly explaining the properties and limitations of the theoretical representation. Through the capabilities of high speed digital processors, this dualism has taken on an increasingly intricate network formulation, making the prediction process and subsequent analysis expensive and complex.

Efforts to simplify this dualism between measurement and theory, to achieve a satisfactory compromise of simplicity and reality, have taken various forms. Typical of these is the assignment of probabilities of occurrence to the deterministic data, followed by a frequency tabulation to separate the stochastic variability into bands of probability. From this tabulation a "best" estimate is chosen as representative of the process.

In 1960 Rudolf Kalman published a paper that described a way to incorporate stochastic variability into a system model and to update this system as new information was received. This type of model has two simple parts: a model of the way the system itself undergoes change, and a model of the measurement process used to monitor the data. The measure of agreement between the variations induced by the system model and those of the measuring process serves as the correcting term at each time of evaluation. This paradigm provides a simple, compact scheme for incorporating all types of stochastic variability that could occur in the model.
However, as in all formalisms, the accuracy of the initial guess at the system mechanics ultimately determines the usefulness of the Kalman formulation. The inaccuracies in these initial guesses can be the principal downfall of any simulation procedure.

This presentation is directed towards the rainfall/runoff problem from a multichannel point of view, that is, having a number of gauging points for the precipitation records as well as flows monitored from several rivers: specifically, the meteorologic stations at Vancouver Airport, Surrey Municipal Hall and Kwantlen Park, with the corresponding flows recorded on the Salmon and Nicomekl Rivers and at Murray Creek, all in the lower Fraser Valley. From this information various "filters" are proposed to predict flood flows on these rivers as a group. The filters, as originally calculated, are time-invariant matrices of suitable coefficients that can be updated at discrete points in time using the Kalman model. The filters themselves represent the solution to a (matrix) ordinary differential equation with constant coefficients; the Kalman formulation allows the coefficients to become time-varying, subject to changes in the input precipitation. The various flood conditions change from time period to time period; consequently, a set of these filters is used for the same phenomena, depending on particular conditions. Thus, rather than one restricted form of model, this approach is pluralistic, adapting itself to the individual situation rather than "calibrating" some deterministic representation of the whole process.

A BRIEF LITERATURE REVIEW

Various types of models have been employed in attempts to piece together the time variations and the linear overview composing the rainfall/runoff problem. Typical of these is the article by Hino, M., Journal of the Hydraulics Division, Proc. of the Am. Soc. of Civil Engineers, Vol. 96, 1970, p. 681. Each of these approaches develops a linear filter (a set of autoregressive weights) that, when convolved with the input precipitation at a finite lag over a discrete interval, yields an output value (flow) at some future time. Considerable time and effort is expended to find the correct number of lags to employ so as to minimize the error in estimation. Unfortunately these coefficients are invariant in time; consequently, revision of the filter weights in response to new information is not possible. A further step towards time adaptation is to take an average across the sampling period and compute a second set of weights to compensate for the variance not explained by the first set. This is a popular approach to the problem, but it requires considerably more computation.

The application of the time-varying approach to a rainfall/runoff problem is discussed by P. Young, Institute of Mathematics and Its Applications Control Division, Dept. of Engineering, University of Cambridge, 1974. He applies the Kalman formulation to a simple single-lag filter. The realization of this formulation may be simply represented as

y_k = a_k u_k + e_k

where y_k is the flow at time k, u_k the rain at k, e_k the error at k, and a_k is some function of air temperature and the difference in flows at any two points on a river. He then proceeds to continually update a_k, using information on the actual flow as it is received, to correct the predicted flow. The value of a_k is changed until its fluctuations fall within some required tolerance.
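Young's single-lag formulation is compact enough to sketch numerically. The following is a minimal sketch, not Young's implementation: the drifting parameter, the noise levels, and the small random-walk term that keeps the gain responsive are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-lag system y_k = a_k * u_k + e_k (all values illustrative).
n = 200
true_a = 0.6 + 0.2 * np.sin(np.linspace(0.0, 3.0, n))  # slowly drifting parameter
u = rng.gamma(2.0, 1.0, n)                             # "rainfall" input
y = true_a * u + rng.normal(0.0, 0.05, n)              # observed "flow"

a_hat, P, r = 0.0, 1e3, 0.05 ** 2                      # diffuse prior on a_k
for k in range(n):
    P = P + 1e-4                       # assumed random-walk drift keeps the gain alive
    K = P * u[k] / (r + u[k] * P * u[k])
    a_hat = a_hat + K * (y[k] - a_hat * u[k])  # correct a_k from the new flow value
    P = P - K * u[k] * P
print(round(a_hat, 3), round(true_a[-1], 3))   # the estimate tracks the drift
```

The fluctuations of the estimate shrink as P does, which is the tolerance criterion described above.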
The advantage of this approach is that a_k is now a time-varying parameter, responsive to changes in the new incoming information. Autoregression and time variation are now neatly combined into a single unit. An enormous amount of work has gone into adapting this idea to multiflow systems, employing varying numbers of lags in the Kalman formulation. However, the crux of the matter rests in finding the right model (the correct filter weights) for each situation. In most cases the weights do not vary a great deal, allowing one to pick a typical starting set and update it upon receipt of new information. This poses yet another problem, typical of the forecasting of extreme flows. Given that one has historically found the "best" set of these parameters and has a new set of precipitation values, but no new flow information until some time after the process has been started, the question becomes: how can one then assess the credibility of these parameters, or indeed the model structure, when applying the model to a slightly different case?

It is to this end that this model estimation is directed: to enable the construction of a multiriver model with multiple inputs. Some of the well documented geophysical techniques are borrowed and re-examined in the water resource paradigm, in particular the work of Crump, N.D., Geophysics, Vol. 39, 1974, pp. 1-14. In order to do this, several concepts from spectral analysis need to be reformulated in terms of the runoff problem. This will be done in a step by step manner, with the suitable water resource adaptations.

Chapter 2

THE CONVOLUTION PROBLEM

The word convolution means "folding". When a signal is passed into a filter, the output is the convolution of the signal and the impulse response of the filter. Similarly, when a signal (rainfall) is passed into the earth, the output runoff represents the convolution of the input signal with the response function of the earth. Deconvolution means unfolding: the deconvolution operator is the one which produces the inverse operation to the given convolution operator. Consequently, when one deconvolves runoff records with a digital computer, one is trying to unfold from the original heterogeneous, complex earth some simpler components so that its basic structure can be interpreted. This system deconvolution then separates out the effects of the data collection system.

This is a different problem from simply separating signal from noise. For example, consider a radar receiver that receives a signal masked with noise; this noise is generated by the system and has nothing to do with the signal. Compare this with the runoff records: nearly all of this information contains meaningful information about the total system, and the problem is to unfold the recorded records into simpler components so that one can see the contributions from primary events, ghost events or other causes.

The information being considered consists of primary events, or innovations. These innovations are unpredictable; they occur randomly, with random amplitudes. Associated with each and every one of these innovations is a response. This response consists of the reverberations and repercussions generated by the innovation, and each response persists for a relatively short period of time before damping out. Consequently, the resulting time series consists of all of these primary events overlapped, subsequently masking the onset of each primary event.
One wishes, therefore, to unfold these primary events. To enable this process one uses a measure of the information available, referred to as entropy. The ratio of the entropy per symbol to the maximum value it could have is called the relative entropy. Viewed in terms of rainfall/runoff, the redundancy in recording represents the factor by which the average lengths of messages are increased beyond the minimum necessary to transmit the desired information. In natural processes, nature has built in enough redundancy to overcome noise. However, this redundancy causes so much overlapping and mutual interference that it is not possible to interpret the raw records as provided. For this type of situation the process of deconvolution in effect separates the new information from the redundant information as time progresses. This separation permits interpretation of both kinds of information (new and redundant), and both have value in comprehending the basic mechanisms. The new information consists of the primary events or innovations; the redundant information, of the attached responses and repercussions.

There is one additional assumption necessary before one applies the deconvolution technique: that the process is minimum delay. Inherent in this concept are the following two assumptions:

1. The deterministic hypothesis that the percolation through the layered earth has the same effect on all of these primary events, i.e., each of the events is of a minimum delay shape.

2. The statistical hypothesis that these primary events are randomly spaced in time and have random amplitudes.

The deterministic hypothesis assumes that the primary response of the basin occurs on or near the surface. This means that the surface responds like a linear reservoir at a fixed lag and the subsurface behaves like a linear reservoir at a different lag; however, the primary response for each has essentially the same formulation and appearance. The statistical position is based on the assumption that the primary events (infiltration) occur in a manner that is not systematic and depends on the characteristics of the individual earth layers, which were laid down with no systematic scheme in mind. The evidence of this behaviour is realized in the runoff records in the form of a time series. The computed autocorrelation of these series averages out the non-systematic events of the time series and preserves the systematic ones. These systematic variations are realizable in the amplitude (autocorrelation) spectrum of the system. In order for this condition to be consistent with the two primary assumptions, one must combine autocorrelation with the minimum phase condition.

The minimum delay condition can be heuristically argued as follows. As the input signal (precipitation) percolates downward through the layered medium of the earth, it is deflected by the different density layers and correspondingly attenuates in strength. For flood conditions, or for that matter most other conditions, not all of the flow is retained; hence, the more the signal is reflected, the more attenuated it becomes. Therefore the impulse in a progressing wave form appears at its beginning rather than at its end. This, then, is the condition that is defined as minimum delay.
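The minimum delay idea can be made concrete with a short numerical sketch. The wavelet below is an assumed, illustrative response rather than one estimated from runoff records; the root test on its z-transform is the usual discrete statement of the minimum delay (minimum phase) condition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse innovations: random arrival times and random amplitudes.
n = 300
innovations = np.zeros(n)
innovations[rng.choice(n, size=12, replace=False)] = rng.normal(0.0, 1.0, 12)

# An assumed minimum-delay wavelet: its energy is concentrated at its front.
wavelet = np.array([1.0, 0.5, 0.25, 0.125])

# The recorded series is the convolution: overlapping responses mask the onsets.
record = np.convolve(innovations, wavelet)[:n]

# Minimum-delay check: all zeros of w0*z^3 + w1*z^2 + w2*z + w3 lie inside
# the unit circle (here they all have magnitude 0.5).
print(np.all(np.abs(np.roots(wavelet)) < 1.0))
```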
Chapter 3

KALMAN FILTERING AND THE RECURSIVE LEAST SQUARES PROBLEM

Since 1960 the use of recursive least squares filtering has changed the foundations of estimation theory as it is applied to both discrete and continuous problems. The addition of the recursion property to this well known technique is its first innovation since Gauss (1809) and Legendre (1806) originally applied it to analyze observational data. This innovation came about through the presentation of a paper by Rudolf Kalman, a control theorist. His formulation allowed the unknown parameters to be described by inherently time-variable states, which in turn were described by a general set of linear stochastic state equations. This departs from the classical least squares definition of solving for a set of time-invariant parameters described in a linear set of equations.

As a brief introduction to the idea of recursive least squares, one should consider the original least squares problem as posed by Gauss. Consider the linear regression problem in which a variable x_0 is known to be related to n other linearly related variables through the relation

x_0 = a_1 x_1 + a_2 x_2 + ... + a_n x_n

where a_j, j = 1, 2, ..., n are the n unknown constant parameters. The variables x_j are assumed to be known exactly, but x_0 is seen only in the presence of some noise. If one calls the observation of x_0 y, one then has y = x_0 + \eta_y, or

y_i = a_1 x_{1i} + a_2 x_{2i} + ... + a_n x_{ni} + \eta_{y_i},   i = 1, 2, ..., k   (1.0)

where \eta_{y_i} is the error associated with the ith observation. Suppose that this sequence of errors has the following statistical properties for i = 1, ..., k:

1. It is a sequence of random variables with mean zero, i.e., E[\eta_{y_i}] = 0.

2. The \eta_{y_i} are uncorrelated in time and have a constant variance \sigma^2: E[\eta_{y_i} \eta_{y_j}] = \sigma^2 \delta_{ij}, where \delta_{ij} = 1 for i = j and 0 for i \neq j.

3. The \eta_{y_i} are independent of the variables x_{ji}.

This is the formulation of a simple, well known problem. The same problem can be reposed from an estimation point of view so as to set up the recursion property. What is now required is a "best" estimate of the values of the unknown parameters a_j given the information y_i, x_{1i}, x_{2i}, ..., x_{ni}; i = 1, ..., k. The least squares formulation of this situation is to minimize the sum of squares in a cost function C, where

C = \sum_{i=1}^{k} [\sum_{j=1}^{n} x_{ji} a_j - y_i]^2

Since this has only one parameter space (the a_j), the minimization with respect to the a_j should be set to zero. This yields a set of linear equations, the "normal equations" as they are referred to in regression analysis. These can then be solved to obtain the least squares parameter estimates \hat{a}_j.

For the case where there is more than one parameter, one can simply rewrite (1.0) in vector form:

x_0 = x^T a, where x^T = [x_1, x_2, ..., x_n], a^T = [a_1, a_2, ..., a_n]   (2.0)

Then C becomes

C = \sum_{i=1}^{k} [x_i^T a - y_i]^2   (3.0)

y_i = x_i^T a + \eta_{y_i},   i = 1, 2, ..., k   (4.0)

The normal equations become

\nabla_a C = \sum_{i=1}^{k} x_i x_i^T a - \sum_{i=1}^{k} x_i y_i = 0   (5.0)

where \nabla_a is the gradient (Jacobian) of C with respect to all elements of a. Provided that \sum x_i x_i^T is nonsingular, the solution to (5.0) is

\hat{a}_k = P_k b_k   (6.0)

where

P_k = [\sum_{i=1}^{k} x_i x_i^T]^{-1};   b_k = \sum_{i=1}^{k} x_i y_i   (7.0)
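Before the recursion is added, the batch solution (6.0)-(7.0) can be sketched directly; the data and dimensions below are synthetic, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic observations y_i = x_i^T a + noise, as in (4.0).
k, n = 50, 3
a_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(k, n))              # row i holds x_i^T
y = X @ a_true + rng.normal(0.0, 0.1, k)

# Batch solution (6.0)-(7.0): a_hat = P_k b_k,
# with P_k = [sum_i x_i x_i^T]^-1 and b_k = sum_i x_i y_i.
P = np.linalg.inv(X.T @ X)
b = X.T @ y
a_hat = P @ b
print(np.round(a_hat, 3))                # close to a_true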
To extend (6.0) to a recursive form, where after k samples \hat{a}_k is a linear sum of the estimate \hat{a}_{k-1} obtained after k-1 samples plus some correction term based on the new information y_k and x_k received on the kth sample, one must use (7.0) so that P_k and b_k can be related to P_{k-1} and b_{k-1} through the following equations:

P_k^{-1} = P_{k-1}^{-1} + x_k x_k^T   (8.0)

b_k = b_{k-1} + x_k y_k   (9.0)

Equation (8.0) can be rearranged as

P_k = P_{k-1} - P_{k-1} x_k [1 + x_k^T P_{k-1} x_k]^{-1} x_k^T P_{k-1}   (10.0)

(also called the "matrix inversion lemma"). To obtain \hat{a}_k in terms of \hat{a}_{k-1}, put (8.0) and (9.0) into (6.0) to obtain

\hat{a}_k = \hat{a}_{k-1} + k_k [y_k - x_k^T \hat{a}_{k-1}]   (11.0)

k_k = P_{k-1} x_k [1 + x_k^T P_{k-1} x_k]^{-1}   (12.0)

Finally, a simpler expression for (11.0) can be found by multiplying k_k by P_k P_k^{-1} and substituting from (8.0) for P_k^{-1}, which gives k_k = P_k x_k; consequently (11.0) becomes

\hat{a}_k = \hat{a}_{k-1} - P_k [x_k x_k^T \hat{a}_{k-1} - x_k y_k]   (13.0)

This, then, is the recursion form of the original least squares problem (1.0). The only thing missing from this structure is that it does not make any use of the statistical errors \eta_{y_i} in the observations. To incorporate these into (13.0) one needs only to retrace the steps used in the simple linear regression development. This requires (1) a zero mean estimation error, E[\tilde{a}_k] = 0, and (2) the variance-covariance matrix P*_k = E[\tilde{a}_k \tilde{a}_k^T], which is related to P_k through the simple relation P*_k = \sigma^2 P_k. The updates of \hat{a}_k and P*_k corresponding to (13.0) and (10.0) then become

\hat{a}_k = \hat{a}_{k-1} - (P*_k / \sigma^2) [x_k x_k^T \hat{a}_{k-1} - x_k y_k]   (14.0)

P*_k = P*_{k-1} - P*_{k-1} x_k [\sigma^2 + x_k^T P*_{k-1} x_k]^{-1} x_k^T P*_{k-1}   (15.0)

This algorithm not only supplies the parameter estimates at each sampling instant, but also provides an indication of the accuracy of these estimates through the error covariance matrix P*_k. This is extremely useful for looking at things "on line". It also provides an estimate of the rate of convergence of the parameter estimates.

If one considers the statistical interpretation of P*_k as an estimation error covariance matrix, it is then intuitively reasonable to choose P*_0 consistent with the level of confidence one has in the initial estimate \hat{a}_0. This is a direct link with the Bayesian point of view, since \hat{a}_0 and P*_0 represent the a priori statistics of the estimate, which will be updated with the input y_1 and x_1 to yield the a posteriori covariance matrix P*_1. This then proceeds in a recursive manner for each step. In reality one knows little about the parameters, in which case setting \hat{a}_0 = [0] is a reasonable start. Similarly, P*_0 is set to large diagonal elements (of order 10^3), indicating that there is very little confidence in the initial estimate and no idea of the cross-covariance terms.

This interpretation can be applied to the correction term in (14.0),

(P*_k / \sigma^2) [x_k x_k^T \hat{a}_{k-1} - x_k y_k]

One can say that this is an instantaneous measure of the gradient of C at the kth instant, modulated by P*_k / \sigma^2. The Gaussian statistical properties ensure that P*_k is a strictly decreasing function of time; this allows the interpretation that as the estimation proceeds and confidence increases, the weighting decreases and consequently the correction reduces. P*_k then tends to "filter" out the inaccuracy of the observations. This filtering idea applied to the least squares algorithm allowed Y.C. Ho
to apply this recursive analysis to a multidimensional stochastic approximation, with the scalar gain replaced by P*_k / \sigma^2. This analogy permits an overview that the consistency of the estimates is not just a statistical definition, but part of the actual mechanism. It is this link that unites least squares analysis with pattern recognition, machine learning, state variable estimation and filtering.

Since the idea of least squares applied in a recursive form may not be a familiar form of analysis, a simple example from physics will be presented to aid in clarification. Consider a body moving in a straight line with a velocity V. One knows that the distance covered S_i at any time t_i is given by S_i = S_0 + V t_i, where S_0 is the position at t_0. Suppose there are k noisy observations y_i of S_i (the position at t_i) contaminated with noise \eta_{y_i}:

y_i = S_0 + V t_i + \eta_{y_i},   i = 1, 2, ..., k

So we have a_1 = S_0, a_2 = V, with x_{1i} = 1.0 and x_{2i} = t_i, and the linear least squares solution is trivial. Consider a vector form of the same problem, where

x_i = [1, t_i]^T;   a = [S_0, V]^T

Consequently, the simultaneous estimation of S_0 and V by reference to y_i is a two-parameter system which can also be solved in the least squares manner. A solution to a set of observations for this problem was presented by Graybill (1960), computed for \hat{a}_0 = [0] and several choices of P_0:

Figure 2. Parameter estimates versus number of samples.

Figure 3. Evolution of the elements P_11, P_12, P_21, P_22 of P_k versus number of samples.

Figure 4. Estimates of V with S_0 known, for P_0 = 100.0, 1.0 and 0.1, compared with a stage-wise least squares.

Here one can also see the strictly decreasing nature of P_k. Figure 2 shows the results of the parameter estimation. Figure 3 shows how P_k evolves. Figure 4 compares the results of the system when S_0 is known a priori (only one unknown, V) computed with different values of P_0 (which is now a scalar, since S_0 is known). The results are compared to a stage-wise least squares; for P_0 > 10^2 there is no real difference.

There is one catch to this otherwise elegant algorithm: things are assumed to be relatively constant over the interval of observation. If this is not true, then using this analysis directly is clearly not a good idea. The next sequential step in the evolution towards the Kalman model was to allow the process to have a memory (ostensibly by weighting past values). This in itself is not reasonable, since all parameters are treated alike and no criterion exists for selectivity. The next step in the adaptation of this regression analysis is to lift the restriction that the parameters be invariant. This means that where before a_k = a_{k-1} for all k, one now assumes that the parameters vary in a manner described by the following stochastic matrix equation:

a_k = \Phi_{k,k-1} a_{k-1} + \Gamma_{k,k-1} q_{k-1}   (16.0)

\Phi_{k,k-1} is a transition matrix (n x n), also written as \Phi. \Gamma_{k,k-1} is an input matrix, also written as \Gamma. q_{k-1} is a vector (noise) of independent random variables, where

E{q_k} = 0;   E{q_k q_j^T} = Q_k \delta_{kj}

An example of this is the random walk, where a_k = a_{k-1} + q_{k-1}. The form of (16.0) provides additional a priori information that can be exploited to find \hat{a}_k. To demonstrate this, consider the now familiar system

y_k = x_k^T a_k + \eta_{y_k}

from which one requires \hat{a}_k. It is assumed that a_k will vary in a stochastic manner that can be described by (16.0).
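A minimal numerical sketch of the recursion (14.0)-(15.0), applied to the straight-line motion example above; the particular values of S_0, V and sigma, and the diffuse P_0, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy positions y_i = S0 + V*t_i + noise; estimate a = [S0, V] recursively.
S0, V, sigma = 1.0, 2.0, 0.5
t = np.arange(1, 21, dtype=float)
y = S0 + V * t + rng.normal(0.0, sigma, t.size)

a_hat = np.zeros(2)                  # a_0 = [0]: no prior knowledge
P = 1e3 * np.eye(2)                  # large diagonal P_0: little initial confidence
for ti, yi in zip(t, y):
    x = np.array([1.0, ti])          # x_i = [1, t_i]^T
    # covariance update (15.0), then estimate correction (14.0)
    P = P - np.outer(P @ x, x @ P) / (sigma ** 2 + x @ P @ x)
    a_hat = a_hat - (P / sigma ** 2) @ (np.outer(x, x) @ a_hat - x * yi)
print(np.round(a_hat, 2))            # approaches [S0, V] as the trace of P shrinks
```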
At the kth sampling (estimation), this additional prior information allows one to make a priori updates (predictions), called \hat{a}_{k/k-1} and P*_{k/k-1}, to the estimate \hat{a}_{k-1} and covariance P*_{k-1} obtained at the (k-1)th sampling. Simply stated, this means that because one knows how the true parameter vector will vary between samplings, one can use this information to vary the parametric estimate in a similar manner prior to the receipt of the new data at the kth instant. To demonstrate this prediction, first recall that E[a_k] = \Phi a_{k-1} (from the restrictions on q_k). So an estimate of a_k prior to the new data arriving can be obtained by

\hat{a}_{k/k-1} = \Phi \hat{a}_{k-1}   (17.0)

where k/k-1 means "at k, based on k-1". The error of the a priori estimate is defined as \tilde{a}_{k/k-1} = \hat{a}_{k/k-1} - a_k; then from (16.0) and (17.0),

\tilde{a}_{k/k-1} = \Phi \tilde{a}_{k-1} - \Gamma q_{k-1}

and in a similar manner (see Gelb, pp. 107, 110),

P*_{k/k-1} = \Phi P*_{k-1} \Phi^T + \Gamma Q \Gamma^T   (18.0)

where P*_{k/k-1} = E[\tilde{a}_{k/k-1} \tilde{a}_{k/k-1}^T] and Q is the covariance matrix of the parameter variation disturbance in (16.0). One now has a correction mechanism for the system given a new observation, plus the prediction mechanism

\hat{a}_{k/k-1} = \Phi \hat{a}_{k-1};   P*_{k/k-1} = \Phi P*_{k-1} \Phi^T + \Gamma Q \Gamma^T

The formulation can be simply demonstrated by again using the random walk as an example. Here \Phi = \Gamma = I (the identity matrix), so that

\hat{a}_{k/k-1} = \hat{a}_{k-1};   P*_{k/k-1} = P*_{k-1} + Q

Q is the covariance of the random walk and need only be a diagonal matrix. One has at this point all of the mechanisms of the Kalman filter as they are to be used in this paper. This formulation of the state space representation needs only to be linked to the deconvolution process to make the mechanism complete.

Chapter 4

STATE SPACE REPRESENTATION OF THE DECONVOLUTION PROCESS

Before one can effectively link the deconvolution process to the Kalman state space representation, an equivalence between these two operations must be established, along with some fundamental ideas concerning the modelling of random processes. To clarify the change in formulation, a simple circuit example has been chosen.

Figure 5. An RL circuit with input u(t) and output y(t).

Figure 5 represents a system with an input u(t) and an output y(t). The manner in which energy is transferred from input to output is the solution to a pair of linear, homogeneous, ordinary differential equations. Formally stated, the pair is

u(t) = L di(t)/dt + R i(t)   (input)
y(t) = R i(t)   (output)   (19.0)

The solution H(t) represents the system response to the input u(t) and has the form

H(t) = (R/L) exp(-(R/L) t)

To make the transition between the Kalman formalism and the transfer function form, one need only define the state variable x(t) = i(t), with y(t) = R x(t). The differential equations become

\dot{x}(t) = -(R/L) x(t) + (1/L) u(t)   (20.0)
y(t) = R x(t)

There are now two things that become immediately apparent. First, the state space formulation permits R and L to vary as functions of time, whereas the transfer function formulation does not: in the transfer function formulation there is no explicit solution for R(t) and L(t). Secondly, the transfer function formalism has been replaced with a state formulation. In terms of the deconvolution process, the deconvolution operator is equivalent to H(t). This has assumed that the system parameters are fixed (as are R and L).
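The circuit example can also be run numerically. The following Euler integration of (20.0) is an illustrative sketch (the values of R and L and the input pulse are assumed) that exhibits the fixed-parameter impulse-response behaviour just described.

```python
import numpy as np

# Euler integration of (20.0): xdot = -(R/L) x + (1/L) u,  y = R x.
R, L, dt = 2.0, 0.5, 1e-3            # fixed parameters, as assumed above
t = np.arange(0.0, 3.0, dt)
u = (t < 1.0).astype(float)          # an assumed unit input pulse

x = 0.0
y = np.empty(t.size)
for k in range(t.size):
    x += dt * (-(R / L) * x + u[k] / L)   # state equation
    y[k] = R * x                          # output equation
# after the pulse, y decays like exp(-(R/L) t): the shape of H(t)
```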
The physical system, such as the layered earth, does not have its system parameters fixed; they change with time. Consequently, to link this evolution of the parameters to the best "fixed" approximation provided by the deconvolution process, one needs to restate the problem in the state space configuration. This is effectively done by making the appropriate substitutions. If one now returns to the Kalman system model formulation, the restating of the deconvolution process in state space terms becomes straightforward:

\dot{x}(t) = A x(t) + B u(t)   (21.0)
y(t) = H x(t) + C v(t)

where A, B, H, C are matrices and x, y, u, v are vectors. H is not the transfer function. In terms of the previous example,

x = x(t);   A = -R/L;   B = 1/L;   u(t) = u(t);   C = v = 0;   H = R

Equations (21.0) are a general description of a time-varying linear system. The state space description, in terms of state vector functions of time, has replaced H(t) in the linear system. If A and B are constant, (21.0) can be solved as an ordinary differential equation:

x(t) = e^{A(t - t_0)} x(t_0) + \int_{t_0}^{t} e^{A(t - \tau)} B u(\tau) d\tau

If u(\tau) = 0 then x(t) = e^{A(t - t_0)} x(t_0), which is similar in form to H(t). The portion of this solution containing e^{A(t - t_0)} is the transition matrix, which describes the evolution of the state vector as a function of time. When A, B, H and C are not constant, one then has to solve an associated estimation problem.

Before embarking on the estimation problems one must also consider the problems associated with modelling a random process. By analogy with the previous examples, consider again u(t) as the input to a linear system with a transfer function denoted by T and an associated output y(t). To enable a representation of this system as a random white noise process driving T to yield an output y, the following argument from Bayless and Brigham is presented. They consider the power spectral density, that is, the function whose integral over any sampling interval represents the contribution to the variance from that sampling, denoted \phi. There are two contributions to the variance in the linear system under consideration: \phi_y(\omega), the contribution from the output, and \phi_u(\omega), the contribution from the input. Therefore

\phi_y(\omega) = |T(j\omega)|^2 \phi_u(\omega)

and since u(t) can be considered as a white noise process, \phi_u(\omega) = 1. This now implies

\phi_y(\omega) = |T(j\omega)|^2

For the case where \phi_y(\omega) is the variance contribution of a nondeterministic random process, \phi_y(\omega) can be factored into T(j\omega) T*(j\omega). Hence any given power spectral density can be factored to give T(j\omega). This means that there exists a T(j\omega) such that the random process u(t) can be represented as a white noise process driving the linear system T to give \phi_y(\omega). As a result of this argument one can look at (19.0) as

u(t) -> T(j\omega) -> y(t)

where u(t) is white noise. When translating this formulation to the state space description, T(j\omega) is replaced with a system of differential equations and output equations, as can be visualized in Figure 6.

Figure 6. Block diagram of the state variable equivalent of the transfer function T(j\omega).

As has been previously noted, the system matrices are generally functions of time. At this point the process description of (19.0) has undergone a two-stage transformation.
We have, first and foremost, allowed that the mean and autocorrelation of a random process u(t) can be modelled as a linear system driven by white noise. Secondly, the transfer function representation of this linear system has been replaced by the state variable description. It now remains only to dovetail these two attributes into the Kalman representation. A simplified description of the deconvolution process can be visualized in Figure 7.

Figure 7. The deconvolution process: a message x(t) plus additive noise v(t) forms the recorded signal Z(t); the impulse response h(t) acts on Z(t) to produce the estimate \hat{x}(t).

Here one wants to find an h acting on the input Z(t) so as to minimize the mean square error between \hat{x}(t) and x(t). Stated more formally,

\hat{x}(t) = \int_{-\infty}^{\infty} h(\tau) Z(t - \tau) d\tau

where h(\tau) satisfies the relation

(autocorrelation matrix) * (filter response) = (cross-correlation of input and desired output)

This is the classical Wiener-Hopf integral equation, defined by

\int_{-\infty}^{\infty} h(\tau) R_zz(t - \tau) d\tau = R_zx(t)   (22.0)

where R_zz is the autocorrelation of the input and R_zx is the cross-correlation of the input and the desired output. In its most general sense (22.0) is written as

\int_{t_0}^{t} h(t, \tau) R_zz(\tau, k) d\tau = R_xz(t, k), with \hat{x}(t) = \int_{t_0}^{t} h(t, \tau) z(\tau) d\tau   (23.0)

This equation is difficult to solve in general, and the contribution from Kalman was that he changed (23.0) into a nonlinear differential equation whose solution determines the optimum filter. For the discrete digital case the Kalman equations take on the following formulation:

x_k = \Phi_{k/k-1} x_{k-1} + B_{k/k-1} u_k   (24.0)
y_k = H_k x_k + v_k

This system is of the same form as (21.0), with \Phi being the transition (propagation) matrix for the discrete state vector equations. In this case one wishes to minimize E[(x - \hat{x})^T (x - \hat{x})] based on the previous k measurements of y (the output; runoff). The sampling interval for this process was one time interval. Implicit in this formulation (following Crump, 1974) are the following assumptions:

1. \Phi_{k/k-1}, B_k and H_k are known for all times k.

2. The driving input u_k has zero mean and known variance, and is uncorrelated with the additive noise v_k for all times k:

E[u_k] = 0;   E[u_k u_j^T] = Q_k \delta_{kj};   E[u_k v_j^T] = 0 for all k and j

where Q_k is a known diagonal matrix.

3. The additive noise v_k has zero mean and known variance for all times k:

E[v_k] = 0;   E[v_k v_j^T] = V_k \delta_{kj}

where V_k is a diagonal matrix.

4. The initial state vector has known mean and variance:

E[x_0] = \bar{x}_0;   E[x_0 x_0^T] = M_0

It should be noted at this point that at each time k, second order statistics are required for the noise processes u_k and v_k. This is the feature that allows for non-stationarity in u_k and v_k, whose statistics change from time step to time step.

Chapter 5

IMPLEMENTATION OF AN ADAPTIVE KALMAN FILTER TO ESTIMATING RUNOFF

The classical Kalman formulation is sufficient when one has continuing information on the output signal (runoff). However, as is the case in hydrologic prediction, the information on runoff stops at some point and the only continuum of information is the input signal (rainfall). Estimating the error covariance initially is not a problem; neither is estimating the system noise. Consequently, the Kalman filter can be implemented using only minor modifications to its structure. One begins with x_k. This is not a vector in this case but rather a matrix describing the change in the diffusing (percolation/retention) properties of the layered earth model.
Therefore \Phi_{k,k-1} is the matrix describing the change in the solution form as a function of time, and v_k is the noise associated with this measurement:

x_k = \Phi_{k,k-1} x_{k-1} + v_k

To establish the historic behaviour of this system under peak (flood) conditions, one uses the "significant" storm records from several years of records to establish a typical design storm. This input storm can be chosen to reflect a variety of conditions; however, for flood prediction one would typically want the "worst" of the data set. "Worst" can imply two things: first, the signal that has the maximum intensity over a fixed period, or secondly, the maximum duration and maximum intensity. These signals are multichannel, implying that the same storm is monitored at a number of meteorologic stations.

Associated with this input signal (at a suitable time lag) are the resulting monitored flow records. These are also multichannel (i.e., monitored simultaneously on several different rivers). One wishes to associate the output trace with the input signal in such a way as to characterize the system behaviour. This is done using the deconvolution process employing a standard Wiener filter. The Wiener filter is the digital operator which solves, in a linear least squares sense, the Wiener-Hopf equation (23.0). It is formulated in such a way that one produces a linear least squares prediction of a stationary time series by means of a realizable, time-invariant linear operator. (For a discussion of Wiener filters see Robinson, Chapter 6.) One of these realizations would typically look like

\hat{y}_t = \sum_{j=1}^{m} A_{t-j} u_{t-j}

where the {u_{t-j}} are the vectors of input signals of length n at lags t-j, the A_{t-j} are the matrices of deconvolution operators at lags j, and \hat{y}_t is the estimated output at time t. In the Kalman notation,

y_k = A_k u_k + v_k, where A_k u_k = \sum_{j=1}^{m} A'_{k-j} u_{k-j}

and A' is to be calculated from

A'_k = \Phi_{k,k-1} A_{k-1} + w_k

Consequently, the system and measurement models look like

A_k = \Phi_{k,k-1} A_{k-1} + w_k   (system)
y_k = A_k u_k + v_k   (measurement)

In this application one has only estimates of y_k, called \hat{y}_k, whereas the measurement process is on the input signals u_k. One therefore needs to rewrite the updating procedure in a somewhat different manner, i.e.,

\hat{y}_k = \sum_{j=0}^{m} A'_{k-j} u_{k-j} + v_k   (signal estimate using deconvolution)

The transition then becomes

A'_k = \Phi_{k,k-1} A_{k-1};   \Phi_{k,k-1} = A_k A_{k-1}^{-1}

\hat{y}'_k = A'_k u_k + \sum_{j=1}^{m} A_{k-j} u_{k-j} + v_k

From this one uses the Kalman algorithm to update A_k, i.e.,

\hat{A}_k = A'_k + K_k [y_k - \hat{y}'_k]

In terms of the entire process, the step by step computation becomes as follows. Compute P(0): this is the matrix of terms defined by E[(A_k - \hat{A}_k)(A_k - \hat{A}_k)^T]. Since one does not know the exact solution to the differential system satisfied by A_k, it suffices to use a diagonal matrix with terms of order 10^3 (for the motivation behind this, see the preliminary discussion). Set A = A_t, the deconvolution operator computed for time t. Now compute

P'_k = \Phi_{k,k-1} P_{k-1} \Phi_{k,k-1}^T + Q_{k-1}

where Q is a diagonal matrix describing the covariance of the signal measurement. Compute

K_k = P'_k H_k^T [H_k P'_k H_k^T + V_k]^{-1}

where V_k is the measure of the noise in the measurement process. Then compute

\hat{A}_k = A'_k + K_k [y_k - \hat{y}'_k], where A'_k = \Phi_{k,k-1} A_{k-1} and \hat{y}'_k = A'_k u_k + \sum_{j=1}^{m} A_{k-j} u_{k-j}

Then compute

P_k = P'_k - K_k H_k P'_k

increment k, and recompute P'_k.
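The step-by-step procedure above can be sketched compactly if each row of the leading-lag operator is given its own scalar-measurement Kalman recursion. This row-by-row treatment, and all of the dimensions, noise levels and synthetic records below, are simplifying assumptions for illustration, not the computation actually used in the case studies.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each row a_i of the leading-lag operator A maps the input channels to one
# flow trace: y_i = a_i^T u + noise. All data here are synthetic stand-ins.
n_out, n_in = 3, 6
A_true = 0.1 * rng.normal(size=(n_out, n_in))       # operator generating the flows
A = A_true + 0.05 * rng.normal(size=(n_out, n_in))  # Wiener-step estimate, slightly off
P = [1e3 * np.eye(n_in) for _ in range(n_out)]      # P(0): diffuse diagonal, order 1e3
Q = 1e-4 * np.eye(n_in)                             # assumed random-walk covariance
v = 0.1                                             # flow measurement noise (std dev)

for k in range(72):                                 # one step per hour of record
    u = rng.gamma(2.0, 1.0, n_in)                   # new multichannel input sample
    y = A_true @ u + rng.normal(0.0, v, n_out)      # newly received flow values
    for i in range(n_out):
        Pp = P[i] + Q                               # predict (Phi = I: random walk)
        K = Pp @ u / (u @ Pp @ u + v ** 2)          # Kalman gain for channel i
        A[i] = A[i] + K * (y[i] - A[i] @ u)         # correct row i with the innovation
        P[i] = Pp - np.outer(K, u @ Pp)             # updated error covariance
```

Each pass pulls A toward the flow-generating operator while the trace of each P[i] falls, mirroring the behaviour of P_k discussed below.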
The Kalman equations as they are stated above are effectively minimizing the trace of P_k. The matrix A'_k is the best projected estimate of A_k, that is, the best estimate for A_k based on the estimates \hat{y}_{k-1} and the data through u_{k-1}. The matrix P'_k is the error covariance matrix of the projected estimate A'_k, influenced by the most recent data u_k. Finally, \hat{y}'_k is the best estimate of the measurement based on the samples through u_{k-1} and the estimates through \hat{y}_k. The nature of this discrete formulation can be visualized using the block diagram formulation from Crump (Figure 8). The Kalman filter diagram contains the same delay and feedback system as the signal model. It is also clear that the estimate \hat{A}_k is obtained using the weighted sum of two things:

1. The best predicted estimate A'_k.
2. The latest measurement sample u_k.

The relative weighting of these two quantities is determined by the system dynamics model, invoked through the definitions of \Phi and the A_{k-j}, and defined by Q and V.

Figure 8. Block diagram of signal model and discrete Kalman filter.

Computation of the Kalman Filter

To aid in the visualization of the actual mechanics of the Kalman process, a flow description of the procedure is outlined below.

1. Initialize the filter by defining the initial estimations: P_0 = M_0, the covariance of the state matrix, and \hat{x}_0 = x_0, the initial estimate of the state of the system. This corresponds to the deconvolution operator at zero lag.

2. For each time step perform the following calculations:

(a) Compute

P'_k = \Phi_k P_{k-1} \Phi_k^T + Q_k   (25.0)
K_k = P'_k H_k^T [H_k P'_k H_k^T + V_k]^{-1}

(b) Compute the updated state matrix

\hat{x}_k = \Phi_k \hat{x}_{k-1} + K_k [y_k - H_k \Phi_k \hat{x}_{k-1}]

(c) Compute a new matrix P_k:

P_k = P'_k - K_k H_k P'_k

Increment k and repeat.

It is also useful to re-examine

\hat{x}_k = \Phi_k \hat{x}_{k-1} + K_k [y_k - \hat{y}'_k]

This is where the prediction of the next system state is computed. If one writes this explicitly in the form

\hat{x}_k = \Phi_k \hat{x}_{k-1} + K_k [y_k - H_k \Phi_k \hat{x}_{k-1}]

one can observe the following properties:

1. If the model were perfect and there were no noise in the system, the output would satisfy y_k = \hat{y}_k, which means that \hat{x}_k = \Phi_k \hat{x}_{k-1}. Consequently, once the transition matrix \Phi_k is obtained from the physical model, one could predict the state vector exactly for all time steps as long as y = \hat{y}. If this is not so, then K_k [y_k - \hat{y}'_k] is the correction to the prediction of the state vector at the next time step.

2. The matrix K_k (the Kalman gain) has useful physical properties that can be examined as the iteration proceeds. The optimality of this Kalman filter is contained in the structure and specification of this gain matrix. The mathematical least squares structure has been discussed at length; consequently, an intuitive logic should also be included with its properties. This can be examined from the form

K_k = P'_k H_k^T R_k^{-1}, where R_k = H_k P'_k H_k^T + V_k

represents the noise in the system. To better facilitate the physical interpretation of K_k, assume for the moment that H_k is an identity matrix, so that P_k and R_k are n x n matrices. If R^{-1} (the system noise) is diagonal, meaning that the noise between channels is not related, the matrix K_k results from multiplying each column of the error covariance matrix P_k by the inverse of the mean square measurement noise.
"This implies that each element of K_k is essentially the ratio between statistical measures of the uncertainty in the state estimate and the uncertainty in a measurement." Simply stated, this means that the gain matrix is proportional to the uncertainty in the estimate, and inversely proportional to the measurement noise. If the measurement noise is large and the state estimate errors are small, the innovation (in the original equations 23.0) is due primarily to noise, and only small changes to the state estimate should be made. However, a small measurement noise and a large uncertainty in the state estimate say that the innovation contains a great deal of information on the errors in the estimate; the difference between the actual and predicted measurement is then used as the correction basis. One can see from the formulation of K_k that this does indeed agree with the intuitive motivation.

As a final observation one should also consider the interpretation of the covariance matrix P_k. The effect of system disturbances on the error covariance growth is the same as that observed when measurements are not available. As the statistical parameters of the disturbance become larger, their effects are reflected in the "size" of the Q_k matrix in (25.0). The more pronounced the effect of the disturbances reflected in the "size" of this matrix, the more rapidly the error covariance will increase. The effect of measurement noise on the behaviour of the error covariance can be seen in

P_k(+)^{-1} = P_k(-)^{-1} + H_k^T R_k^{-1} H_k

where (+) means after receiving information and (-) means before (as used in Gelb). Large measurement noise (R_k^{-1} small) provides only a small increase in the inverse error covariance P_k^{-1}, which means a small decrease in the error in P_k when the measurement is used. Such measurements therefore contribute little to the reduction of the estimation errors. However, small errors in measurement (large R_k^{-1}) cause a large decrease in the error covariance when the measurement is used. If, on the other hand, one does not have measurement noise, the appropriate equation for P_k is

P_k(+) = [I - K_k H_k] P_k(-)

since R_k^{-1} does not appear. To summarize the general behaviour of the P_k matrix:

1. Larger measurement noise will cause the error covariance to decrease slowly, or conversely to increase, depending on the system, the disturbances and the initial value of P_0.

2. Smaller noise will cause the filter estimates to converge on the true values more rapidly.

As an illustration, Figure 9, from Gelb, is used.

Figure 9. Computational flow of the discrete Kalman filter (after Gelb): predict the state estimate and covariance, compute the gain K_k, update the estimate with reasonableness checks, then set k = k + 1 and repeat.

Correspondingly, the cross-correlation between two series x and y looks like

\phi_xy(s) = E{x_{t+s} y_t^T}

If one lets \phi_{x_i y_j}(s) = E{x_i(t+s) y_j(t)}, the matrix \phi_xy(s) is the vehicle through which one can look at the degree of interrelation between any two time series. For computational ease, \phi_xy(s) is looked at in the frequency of occurrence domain, defined by

\Phi_xy(f) = \sum_s \phi_xy(s) e^{-2\pi i f s}

which is the Fourier transform of the \phi_xy(s) representation. This enables one to calculate the spectral density, or the contribution to the variance in the series by the individual components.
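A computational sketch of these two estimates; the synthetic series, the lag of their shared component, and the one-sided summation over lags are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two synthetic hourly series sharing a lagged component.
n = 256
x = rng.normal(size=n)
y = 0.8 * np.roll(x, 3) + 0.3 * rng.normal(size=n)   # y echoes x at a 3 hr lag

def crosscorr(x, y, max_lag):
    # sample estimate of phi_xy(s) = E{x(t+s) y(t)} for s = 0..max_lag
    xc, yc = x - x.mean(), y - y.mean()
    return np.array([np.mean(xc[s:] * yc[:yc.size - s]) for s in range(max_lag + 1)])

phi = crosscorr(x, y, 32)
print(int(np.argmax(phi)))                           # ~3: the built-in lag

# discrete one-sided form of Phi_xy(f) = sum_s phi_xy(s) exp(-2*pi*i*f*s)
lags = np.arange(phi.size)
freqs = np.linspace(0.0, 0.5, 65)                    # up to f = 1/(2*dt) cycles/hr
spec = np.array([np.sum(phi * np.exp(-2j * np.pi * f * lags)) for f in freqs])
```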
These spectra provide an easier statistic with which to characterize the differences between the two series. (For further discussion of the implications of this procedure see Jenkins and Watts.) The end result of this manipulation is to examine the maximum power (energy) shared by these two series at any sampling rate. This is formally bounded by the geometric mean, \sqrt{\Phi_xx(f) \Phi_yy(f)}, of the two autospectra, at frequencies (sampling rates) of up to

f = 1/(2 \Delta t);   \Delta t = 1 hr;   f = 0.5 cycles/hr

If the predicted deconvolution operator is the solution to the same problem (same basic characteristics) as the previous one, the two operators will be coherent at the same frequencies, with perhaps different phase lags (response rates). If this is not the case, either the historical characterization is inadequate, or the prediction is defining conditions quite different from normal. In that case the coherencies for the predicted flood flows and another similar historic situation should be checked, using the same signal input. For either of these two cases there is a consistency test to apply to establish the credibility of the predictions. A similar test should be run when the signals being used are radically different from the historical records, e.g. the "100 year storm". When this signal is applied to historical data, the deconvolution operator should be the same (i.e., have corresponding coherencies) over several sets of data. Consequently, one can place confidence intervals on the predictions one makes using these operators.

Computation of the Confidence Bounds on Predicted Traces

Part of the computation performed by the Wiener algorithm is the estimation of the mean square error along the trace of the computed operator. This can be formalized as the matrix

E{e_t e_t^T} = [ E{e_1^2(t)} ... E{e_1(t) e_m(t)} ; ... ; E{e_m(t) e_1(t)} ... E{e_m^2(t)} ]

where e(t) is the error at time t and m is the number of lags. The trace of this,

I = tr E{e(t) e(t)^T} = \sum_{j=1}^{m} E{e_j^2(t)}

is to be a minimum. This minimum is returned as a percentage. Unfortunately, it is only an estimate of the agreement with the target data set and does not provide bounds for the filter's performance using new input data. The innovation variance structure has been dealt with at length in the literature (see, for example, Plackett, 1960) and will consequently be treated from a computational point of view only. Consider the autoregressive representation used for the Surrey data, i.e.,

y_t = \sum_{j=0}^{m} A_{t-j} u_{t-j} + Z   (Z is the noise)

Since one is now casting into the future without observation, Z = 0. The variance imparted by this estimation is given as

Var[\hat{y}] = Var[Au] + Var[Z] = u' C u + \sigma^2

where ' denotes transpose and C is the covariance of the estimator. Since

Var[u' \hat{a}] = E{u' (\hat{a} - a)(\hat{a} - a)' u} = u' V u

where V now represents the covariance of the estimated coefficients, V = \sigma^2 (X'X)^{-1}, one can substitute this equality into the original Var[\hat{y}] expression to obtain

Var[\hat{y}] = (u' (X'X)^{-1} u + 1) \sigma^2

Further to this, one only has an estimate s^2 of the actual \sigma^2, and since the population is assumed normal, the errors are then distributed with a Student's t-distribution (see Jenkins and Watts, Chapter 3, p. 83). This is denoted by t(1 - \alpha/2). In this case there are n - m degrees of freedom (since it takes m lags to define the process); consequently the distribution value desired is t_{n-m}(1 - \alpha/2). For the Surrey data, n = 72, m = 7; n - m = 68 degrees of freedom. For the 95% confidence interval, t_68 is approximately 2.
The final form of the confidence on \hat{y} looks like

\hat{y} \pm t_68(1 - \alpha/2) s \sqrt{1 + u'(X'X)^{-1} u}

The average s is approximately 19.8, and the expression involving \sqrt{1 + u'(X'X)^{-1} u} evaluates to 0.8 with the covariance evaluated at t = 8. This gives a confidence band of

\hat{y} \pm 31.6 cfs at t = 8 hours

The covariance matrices of the u's will obviously change as time proceeds and will (hopefully) settle down to some stable level. In this case the band settles down to approximately \pm 15 cfs after t = 26 hours. For other situations, such as the Squamish River data, the bands will vary radically, depending on the arrival of the flash flood conditions. The prediction confidence then depends on the similarity between the historic flood and snowpack conditions of the test situation and the new conditions and target flows. As has been discussed previously, the arrival of one of these events gives rise to an enormous error in the filter, whose confidence band is also proportional to the new covariance of the signal, which will also be large. Recently a somewhat different type of deconvolution process has been developed that deals precisely with the arrival of these types of events. This minimum entropy technique was put forth by R. Wiggins, 1977.

Considerations

When applying these filters some purely "engineering" criteria should be applied. These break down into two simple categories: those dealing with the external behaviour of the system and those concerned with its internal behaviour. The Wiener formulation of the filter design calls for the solution of the Wiener-Hopf integral equation, where the unknown is the impulse response function of the optimum filter. In the discrete case one simply solves a set of normal equations to obtain the coefficients of this impulse response function. The impulse response describes the external behaviour of the filter, that is, its performance specifications. In most cases this becomes the input to the system design. In discrete filtering, the "RLC circuit" view of the filter no longer applies; the physical information is now included in a statistical procedure (the Kalman filter). As a result, one can now question the internal behaviour of the system in a systematic manner. The stability and delay characteristics are all part of the internal representation and cannot be disregarded. The net result of this combination of the two processes is that one specifies the stochastic behaviour of a process that will change over time.

Other Applications

Given the ability to systematically characterize a basin or basins in terms of these filters, one obvious extension of the prediction capability exists. One begins by assuming that the spatial configuration of the problem looks like Figure 12.

Figure 12. Three adjacent basins: basins 1 and 2 with gauged rivers a-f, basin 3 with ungauged streams g, h, i.

Here 1, 2 and 3 are basins in some arbitrary region. One knows the precipitation and flows for basins 1 and 2's various rivers a, b, c, d, e and f. Consequently, one can define a set of general filters describing 1 and 2. One also knows the precipitation for 3, but the streams have not been gauged (i.e., g, h and i are not gauged). One can determine from frequency distributions of rainfall a general frequency for basins 1 and 2. The convolution of the rainfall frequency distribution \Phi_12(f) with the Fourier transform of the predictive filter will yield a frequency distribution of flows for 3.
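A sketch of this spectral transfer under stated assumptions: the filter taps standing in for the basins 1-2 predictive filter and the basin-3 rainfall record are both synthetic.

```python
import numpy as np

rng = np.random.default_rng(6)

# Phi_flow(f) ~ |T(f)|^2 * Phi_rain(f): push the ungauged basin's rainfall
# spectrum through the predictive filter borrowed from the gauged basins.
taps = np.array([0.9, 0.5, 0.2, 0.1])    # assumed predictive filter from basins 1, 2
rain = rng.gamma(2.0, 1.0, 512)          # assumed basin-3 hourly precipitation
rain = rain - rain.mean()

phi_rain = np.abs(np.fft.rfft(rain)) ** 2 / rain.size  # rainfall periodogram
T = np.fft.rfft(taps, n=rain.size)                     # filter transfer function
phi_flow = np.abs(T) ** 2 * phi_rain                   # implied flow spectrum for 3
freqs = np.fft.rfftfreq(rain.size, d=1.0)              # cycles per hour, up to 0.5
```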
Now assume that one has some information on the flows in 3, either as historic data or at some future time. The predictive deconvolution in the Kalman algorithm will correct the behaviour of 3 as initially estimated from 1 and 2. Another characteristic can be observed from the filters resulting from 1 and 2: the change in the phase lag over time. (The time base here would be in the order of years.) The change in this time lag would monitor the "drift" of the flow with time, or the rate of urbanization. This can be done internally, i.e., for basin 1 or 2, or as a cross comparison between the two regions.

Other Techniques: Compatibility

The compatibility of the various types of estimation techniques has been explored mathematically by numerous groups of individuals. The results by Kalman show that every second order process (that is, a process that takes into account only those stochastic properties determined by means and variances) has a proper rational spectrum, and this spectrum has a Markov realization. He also shows that the starting assumptions for the Wiener filter and the Kalman filter are the same. The fact that every process defined in this manner has a Markovian realization means that techniques that are autoregressive with a memory of only one time lag (such as Clark's method) produce results that represent limited approximations to the Kalman-Wiener formulation. Hence, one could have predictions for basin 3 from some other autoregressive formulation and perform subsequent analysis of the results using the Kalman-Wiener paradigm. This is attractive if one already has estimation techniques established and understood for modelling particular modes of the operation.

Numerical Examples: Example No. 1

The system of rivers chosen for the first example is from the Surrey basin defined in the map (Fig. 10). It consists of the Salmon and Nicomekl Rivers along with Murray Creek. The input signals to this system consist of precipitation data from the stations at Vancouver Airport, White Rock and the Surrey Municipal Hall. For the actual computation of the deconvolution operator, three channels of runoff data from the 1973 flood flows were also included. The lags for the six channels of input data can be visualized in Figure 13.

Figure 13. Input data lags: precipitation over 0-72 hr; antecedent runoff (prior to the 1973 peaks) lagged +12 hr; the 1973 flows as target.

The assumption here is that the results of the precipitation are lagged by a minimum of 12 hours from the flows, and that the flows are to be shifted in a corresponding manner. With this set-up one wishes to compute the operator that, when convolved with the signal, will yield the 1973 peak flood flows. One further assumption is made: since both the precipitation signal and the antecedent-flow signal are trying to predict the same three channels of information, one can either constrain the output traces to be three in number, or allow them to be redundant, i.e., put in three extra traces. These three extra traces are the same as the original ones. This can be visualized in Figure 14.
Having found this operator for the 1973 precipitation and antecedent flows, the new 1974 precipitation and antecedent flows are substituted and convolved with the operator to predict the 1974 flows, i.e.,

    (1974 Signal) * (1973 Operator) = (1974 Peaks)

The results and a sample of the operator are shown in the figures that follow. The computed operator is now read into the Kalman state equations as the matrix representation of xₖ, where xₖ is the leading (0-lag) decon operator matrix and Φ = xₖ′xₖ the transition matrix. The Kalman update proceeds in the following manner. After each time step of prediction using the decon operator, the actual flow values for the 1974 flows are read in. The covariance of their agreement is used to update xₖ. The new xₖ is then convolved with the signal to produce the new updated flows. This simulates the "on line" situation. The resulting changes are plotted in Figures 18a and 18b.

The Murray Creek trace was consistently overestimated, due to the size of its peaks in comparison to the Salmon and Nicomekl. However, one can see the modification of the fit by the updating mechanism. With sufficient historical data the overestimation can be simply subtracted out as a constant amplitude distortion.

Discussion:

Several of the general characteristics described in Chapters 2, 3 and 4 become immediately visible. First, the realization of the minimum phase characteristic of each of the matrix operators. Plots of the magnitude of the coefficients of the filter and their distribution with lag can be seen in Figures 14a, b, c. The leading (zero) lag of the filter has the greatest amplitude and passes most of the information. This is consistent with the minimum phase condition described in Chapter 2. This is also important when one is using the Kalman filter program to update these coefficients. Since the Kalman filter in this application is an error filter, it is applied only to this leading term of the deconvolution operator, on the motivation that this is where most of the information is carried and where most of the error is accounted for. A table of the order of change to this computed operator for the Surrey problem is included in Table 1.

In terms of the new information being passed, the error update from the Kalman filter is overlaid on the original Wiener estimate, modulating this smoothed estimate with the pulse of new flow information. This can be easily imagined when comparing the realization of the Wiener fitting (Figures 16, 17, 18) to the change in fit caused by the Kalman updating (Figures 18a, b, c).

TABLE 1
CHANGE IN THE LEADING LAG OF THE DECONVOLUTION OPERATOR AS f(t)

t = 8 hours
Channel No.
1    0.3655   0.3926  -0.6841   0.7326  -0.5981  -0.2391
2    0.3223   0.0639  -0.3700   0.1184  -0.4052  -0.0441
3    0.4378  -0.0365  -0.3739   0.0221  -0.5829   0.0964
4    0.4765  -0.1174  -0.3051   0.0005  -0.0213   0.0787
5   -0.1752  -0.4881   0.6094   0.0746   0.3730   0.0218
6   -0.1821  -0.4077   0.6089  -0.0421   0.1774  -0.0189

t = 72 hours
Channel No.
1    0.3644   0.3915  -0.6852   0.7315  -0.5992  -0.2401
2    0.3223   0.0639  -0.3700   0.1184  -0.4052  -0.0445
3    0.4375  -0.0368  -0.3742   0.0218  -0.5832   0.0961
4    0.4764  -0.1175  -0.3052   0.0004  -0.0215   0.0785
5   -0.1740  -0.4869   0.6106   0.0758   0.3742   0.0230
6   -0.1824  -0.4077   0.6080  -0.0420   0.1774  -0.0189
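A minimal sketch of the update cycle described above follows: predict with the current operator, read in the actual 1974 flows, and correct only the leading (0-lag) matrix with a Kalman-type gain. The scalar innovation-variance form, the shared gain across the three output channels and the noise levels Q and R are simplifying assumptions, not the thesis program.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out, n_steps = 6, 3, 72

x0 = rng.normal(0.0, 0.3, (n_in, n_out))   # leading (0-lag) decon operator matrix
P = np.eye(n_in)                           # error covariance of the estimate
Q, R = 1e-4, 0.5                           # assumed process / measurement noise levels

signal = rng.gamma(2.0, 1.0, (n_steps, n_in))   # six input channels
actual = rng.gamma(2.0, 2.0, (n_steps, n_out))  # observed 1974 flows

for t in range(n_steps):
    u = signal[t]                      # current input vector
    y_pred = u @ x0                    # predict flows with the current operator
    innovation = actual[t] - y_pred    # agreement with the measured flows

    # Kalman-type correction of the leading lag only.
    P = P + Q * np.eye(n_in)
    S = u @ P @ u + R                  # innovation variance (scalar here)
    K = (P @ u) / S                    # gain, one weight per input channel
    x0 = x0 + np.outer(K, innovation)  # updated operator
    P = P - np.outer(K, u @ P)
```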
Figures 14a, 14b, 14c: magnitude of the filter coefficients plotted against lag (days) for each channel.

Figures 18a, 18b, 18c: realizations of the Kalman-updated fits.

Secondly, the problems with, and the importance of, having a common reference point for all of the channels (rivers) are apparent in the realizations of the seventy-two hour predictions. Murray Creek is very small (in flow volume) compared to either the Nicomekl or Salmon Rivers. As a consequence it rises to peak flood conditions much more quickly than the other two. So what one sees is the dissipation of energy from Murray Creek, the peak from the Nicomekl River, and the Salmon River approaching its peak, all lagged by several hours. This set of three different lags can be observed in the plots of the lag in degrees comparing the various channels. This representation shows how far behind the Salmon River is compared to Murray Creek, and how the Nicomekl River starts ahead of the Salmon River in its energy build-up, gradually gets into phase with it at one point and then out again.

For larger systems (order ≥ 10) of rivers the phase relations become more complex and the power spectrum (the distribution of the covariance with frequency) becomes more important. The analysis of power spectra is a standard tool in any time series analysis and is of particular interest in this case. The plots in Figures 19, 20 and 21 show that the distribution of variance (power) with frequency can be readily scaled using logarithms, so that confidence intervals can be imposed on the probability of a particular channel exceeding the maximum power for any time period. The logarithmic choice is natural because this type of scaling makes the changes in power proportional. The realization of these confidence bands for the 95% confidence region can be seen in Figures 22, 23 and 24. The spectrum for each channel has been smoothed using a Tukey (6, p. 91) window. This windows out the spurious (high) frequency oscillations and enhances the persistent frequencies.

Figures 25, 26, 27: amplitude versus frequency (in days) for the actual and predicted spectra.

As is apparent from Figures 25, 26 and 27, the confidence intervals are quite wide, so that when one compares the actual and predicted power spectra (using the Wiener filter) the band is wide enough to include the prediction also. This widely used mechanism is a simple test for consistency of prediction and provides an accurate indication of the "predictability" of any given system. It is also a useful measure of the interrelation of the various information channels and points out relationships at different times that otherwise are not apparent. This and the phase relations (Fig. 28a, b, c) bring out periodic components residing in the data structure.

The same sort of analysis is valuable when performed on the filter itself. For the case of the "three channel, four-lag filter" used here, the roots of the characteristic polynomial matrix can be analyzed in the following manner. If one performs Z-transforms on the polynomial

    Aₜxₜ + Aₜ₋₁xₜ₋₁ + … + Aₜ₋ₘxₜ₋ₘ = yₜ ,   where here m = 4,

one obtains a matrix polynomial in powers of z:

    Aₜz⁰ + Aₜ₋₁z¹ + … + Aₜ₋ₘzᵐ = 0 ,

which has complex roots describing the eigenvalues of the system. The real parts of the roots describe the rates of decay of the system and the complex position the fundamental frequency. A realization of the eigenvalues of this filter is in Figure 28d.
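A minimal sketch of this root extraction follows, forming the block companion matrix of the matrix polynomial and reading decay rates and fundamental frequencies from its eigenvalues. The coefficient matrices, the lag spacing dt and the random values are illustrative assumptions; the routine relabels the coefficients as A[0] … A[m].

```python
import numpy as np

def ar_eigenvalues(A):
    """Eigenvalues of A[0] x_t + A[1] x_{t-1} + ... + A[m] x_{t-m} = y_t,
    computed from the block companion matrix of the homogeneous part."""
    m = len(A) - 1
    p = A[0].shape[0]
    A0_inv = np.linalg.inv(A[0])
    C = np.zeros((m * p, m * p))
    C[p:, :-p] = np.eye((m - 1) * p)       # shift the stacked state down one lag
    for k in range(1, m + 1):
        C[:p, (k - 1) * p : k * p] = -A0_inv @ A[k]
    return np.linalg.eigvals(C)

rng = np.random.default_rng(3)
A = [np.eye(3)] + [rng.normal(0, 0.2, (3, 3)) for _ in range(4)]  # three channels, four lags

lam = ar_eigenvalues(A)
dt = 24.0                                   # hours per lag step (illustrative)
decay_rates = np.log(np.abs(lam)) / dt      # real parts: rates of decay of the system
freqs = np.angle(lam) / (2 * np.pi * dt)    # cycles/hour; 1/freq gives the return time
```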
The motivation for their calculation in this problem is to determine a range, as a function of ft/time, describing the rate of energy dissipation by the system, along with the range of return times of the system. The return time/fundamental frequency is analogous to the harmonic frequency of a simple pendulum. The difference in this case is that one is dealing with a system of pendula.

Figure 28a: phase relations among the Surrey basin channels.

Figure 28d: filter characteristics for the Surrey basin, 1974; fundamental frequencies (in hours) of 9.348, 9.348, 3.394 and 3.394, with the dissipation rate in in./hour.

Chapter 7

FINITE STATE MACHINES

This approach of treating the rainfall/runoff problem as an information and feedback control problem is not new. The applications date back several years (see, for example, Young (17)). However, considering the different methodologies employed as a whole, it would appear that little has been done towards implementing the problem as a whole. The Kalman filter model is traditionally used by itself on the problem, without the use of the pre-optimization provided by the Wiener filter. When the Kalman filter is used, the obvious spectral analysis segments should be exploited too.

The vast majority of problems arising from the prediction aspects of both the Wiener and Kalman models are inherently nonlinear in nature. However, the structure of both techniques lends itself to adaptation to a "finite state machine" mode of realization. To examine briefly the possibility of this form of the structure, a simple analogy is used. Suppose that a machine existed that had local Wiener filters to describe the movement and behaviour of each of a large number of gears. Suppose also that a mechanism existed to switch the focus of attention from section to section, so that one could monitor the change from state to state. If one defined the Kalman formulation as the switching mechanism (not necessarily linear), then the problem can be broken down into a set of these machines that together would describe the entire operation. In terms of the RLC circuit used originally, the equivalent operation would be to have n of these simple "irreducible" mechanisms combined to build up an arbitrary finite state machine. To focus this aspect into the resource management field one need only consider an entire system process for a basin. This system consists of a snowpack (for B.C.), a precipitation system, an infiltration system and finally a runoff system. The focus of this paper has been on a small part of the overall problem (one section of the circuit, one state of the machine). The structure for all of these processes is the same, whether they are nonlinear or linear. Consequently, the formulations and formalisms presented in this thesis are merely case examples for a finite state machine. This extension to finite state machines is discussed in Krohn (14) and in Robinson (7), Ch. 5.

To further define some of the methodology that would be employed, consider the problem where the nonlinear relation x is to be estimated statistically and functionally using two processes A and B. This means that we are looking for A and B so that A·B is the "best" approximation to x. This is by definition nonlinear because it involves multiplying A and B. However, employing this idea of statistical machines, the nonlinearity can be converted into a local linear problem. Assume one has an initial value for A, i.e., A₀, and an estimate for B, which yields B₀. One now has a linear estimation problem: since A₀ is given, the problem is no longer A·B with both functions unknown, but rather A₀·B, where B is now a linear unknown. This can then be solved using linear methods. This means that at a local level one has constructed a linear problem. Given this information one switches back to the first state with A as A₁ and a new estimate B₂, and so on. Since this has the features of a contraction mapping, the estimates will converge to a limiting set of values representing the final A and B. This is a well known result and readily applies to the estimation of snowmelt given temperature and precipitation records.
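A minimal sketch of this alternating scheme follows, with temperature and precipitation regressors standing in for the processes A and B; the design matrices, true coefficients and noise level are illustrative assumptions. Each pass fixes one factor and solves the now-linear problem for the other.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
U = np.column_stack([np.ones(n), rng.normal(10, 3, n)])    # temperature terms
W = np.column_stack([np.ones(n), rng.gamma(2.0, 1.0, n)])  # precipitation terms

a_true, b_true = np.array([0.5, 0.3]), np.array([1.0, 0.8])
x = (U @ a_true) * (W @ b_true) + rng.normal(0, 0.1, n)    # melt record to be explained

# Alternate: with A = U a fixed, x = diag(A) W b is linear in b, and vice versa.
a = np.array([1.0, 0.0])                                   # initial value A_0
for _ in range(20):
    A = U @ a
    b, *_ = np.linalg.lstsq(A[:, None] * W, x, rcond=None)
    B = W @ b
    a, *_ = np.linalg.lstsq(B[:, None] * U, x, rcond=None)
# The iterates settle to a limiting pair whose product A*B best approximates x.
# (a, b) are only determined up to a reciprocal scale factor between the two.
```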
The ability to link past and present inputs is also required to produce current outputs. This can be done through an "asynchronous sequential machine". This machine does not require a time-incrementing procedure to link transitions between different states. This means that a transition can take place following any change in inputs. On the other hand, a synchronous machine uses a clock to initiate changes. To summarize these procedures, the methodology from Robinson (7) is utilized. To put these ideas of using "machines" to link various sections of a problem together, the synthesis mechanism is as follows:

1. Prepare a complete description of the final behaviour by means of a transition table (or graph).
2. Reduce this table to its shortest possible form.
3. Synthesize the resulting network.

Consider the following sketch to represent this network:

Figure 31: a network of machines with finite states s₁ and s₂, a finite number of inputs x₁, x₂ and a finite number of outputs y₁, y₂.

This network in Figure 31 is composed of machines with finite states s₁ and s₂, a finite number of inputs x₁, x₂ and a finite number of outputs y₁, y₂. Let the time be represented by n, where n is an integer. This is a synchronous machine according to our definition. The present output yₙ and the present state sₙ are functions of the present input xₙ and the previous state sₙ₋₁. In terms of the sketch, the pair (sₙ, yₙ) labels each branch leaving a previous state: for example, one branch takes s₁ → s₂, while the branch s₂ → s₂ implies xₙ = x₁, yₙ = y₂. This can be summarized in a transition table:

                  xₙ = x₁        xₙ = x₂
    sₙ₋₁         sₙ    yₙ       sₙ    yₙ
    s₁           s₁    y₁       s₂    y₁
    s₂           s₂    y₂       s₁    y₂

Let s₁ = 0, s₂ = 1; x₁ = 0, x₂ = 1; then one has y₁ = 0, y₂ = 1:

                  xₙ = 0         xₙ = 1
    sₙ₋₁         sₙ    yₙ       sₙ    yₙ
    0            0     0        1     0
    1            1     1        0     1

or, stated as a graph, Figure 32.

Figure 32: transition graph of the simplified machine.

This is a simplified Kalman formulation. The trick to all of this is determining from one of these transition graphs how many feedback loops are necessary and what type of behaviour should be embodied in the function generator. A general formulation from Robinson (7) for blocks of an arbitrary number of channels can be seen in Figure 33.

Figure 33: general block formulation, where
xₙ is the input;
yₙ is the output, a function of xₙ and sₙ₋D;
sₙ is the present state, = f(xₙ, sₙ₋D);
sₙ₋D is the previous state;
D is the delay in changing from one state to another (variable);
sₙ₋D = sₙ except when a transition is in progress.

This operator would work in the following manner:

1. A change in xₙ → a change in sₙ (sₙ is the new internal state of the system).
2. sₙ changes sₙ₋D after a delay D.
3. The new sₙ₋D changes yₙ (with the system entering a new internal state simultaneously with sₙ₋D).
4. sₙ₋D can cause a change in sₙ, as xₙ did.

Thus one has an algorithm to separate nonlinear behaviour into linear filters and a "machine" to control their global behaviour in a finite manner.
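A minimal sketch of the synchronous machine of Figure 32 follows, driven directly by its transition table; the input stream is an illustrative choice.

```python
# Transition table of Figure 32, keyed by (previous state, input) -> (next state, output),
# with s1 = 0, s2 = 1, x1 = 0, x2 = 1, y1 = 0, y2 = 1.
TABLE = {
    (0, 0): (0, 0), (0, 1): (1, 0),
    (1, 0): (1, 1), (1, 1): (0, 1),
}
# Read off the table: the output repeats the previous state (y_n = s_{n-1})
# and the state toggles whenever the input is 1 (s_n = s_{n-1} XOR x_n).

def run(inputs, state=0):
    """Synchronous machine: one transition per clock tick n."""
    outputs = []
    for x in inputs:
        state, y = TABLE[(state, x)]
        outputs.append(y)
    return outputs, state

outputs, final_state = run([0, 1, 1, 0, 1])
print(outputs, final_state)   # [0, 0, 1, 0, 0] 1
```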
In light of this new information from the machine structure, the state space (Kalman) formulation takes on a unifying role, linking the choice of filter types to the real time, "on line" situation. One has defined, therefore, a "frame". This "frame" has attributes that define its state (S₁ and S₂ in the example). S₁ and S₂ could, for this problem, be the snowpack depletion and the rainfall/runoff problem. Since the transition graph for the example and the two-stage physical process are the same, they can be linked in a synchronous manner to a continuous input data stream (x₁, x₂) from satellite observations and recursively processed using selection criteria (arrival type) on the type of filter to be used. The state space formulation moves the prediction from S₁ to S₂, or from sₙ₋D to sₙ. The outputs y₁ and y₂ can be checked for accuracy with the next information reception and changes made accordingly. This composes a very trivial algorithm in ALGOL that allows for recursion; schematically it cycles between the two states, predicting, checking and updating.

Example #2 (Snowpack Depletion at Mission):

In conjunction with the evolution towards the finite state machine approach, a second example was performed using the Wiener filters to predict snowpack depletion for the Mission area (see Fig. 29), using an operator computed from McBride snowpack depletion data. These two sets of snowpillow data were "driven" by a signal containing temperature, humidity and a historic record. This type of filter is slightly different, in that there are three input signals and only one output trace. It is called a constrained energy filter and is discussed by Robinson (7), pp. 261-269. The difference in appearance is that the filter weights are now row vectors, and the sum of squares of the output yₜ is minimized using Lagrange multipliers. A realization of the agreement, using the Mission 1972 temperature and humidity convolved with the McBride 1973 filter, can be seen in Figure 29.

Figure 29: agreement between the observed Mission snowpack depletion and the prediction from the Mission 1972 temperature and humidity convolved with the McBride 1973 filter.

With this result one now has set up the two stages of a "machine". The communication between these state machines can take on several forms. The simplest of these is to consider the rainfall/runoff machine and the snowpack machine separately, i.e., as a pair of time invariant filters which can be linked by augmenting the input of the second with the output of the first, e.g., finding the flow equivalent of both rain and snow depletion for that area and including the snow-water equivalent as an additional signal. However, if one were to run this model in an "on line" situation, the update on each of these models would be performed through the Kalman error filter, thus emulating a "continuous" (discrete) feedback system. Such would be the case if one were sensing the snowpack depletion from a satellite track and then merging this with satellite information on precipitation, temperature, etc. The overhead for this would be some type of pattern matching device (program) and a language to monitor all the incoming information before invoking the various filter mechanisms.
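A minimal sketch of the simplest linkage just described follows: the snowpack machine's output (snow-water equivalent) is appended to the runoff machine's input signal. Both weight vectors here are placeholders standing in for the fitted constrained energy filters, and the series are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
temp = rng.normal(5, 4, n)
humidity = rng.uniform(0.4, 1.0, n)
rain = rng.gamma(2.0, 1.0, n)

# Stage 1: snowpack machine, three inputs and one output (constrained energy form).
w_snow = np.array([0.05, 0.3, 0.1])              # row-vector filter weights (placeholder)
melt = np.column_stack([rain, temp, humidity]) @ w_snow

# Stage 2: rainfall/runoff machine, its input augmented by stage 1's output.
inputs = np.column_stack([rain, melt])           # snow-water equivalent as an extra signal
w_runoff = np.array([0.6, 0.9])                  # placeholder operator
flow = inputs @ w_runoff

# In an "on line" setting, each stage's operator would be corrected by the
# Kalman error filter as new observations arrive, closing the feedback loop.
```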
Example #3 (Squamish problem using Wiener filters as error filters):

As a final example of the Wiener filters used as energy constraints and as a state machine, the following case study of floods on the Squamish River was performed. Peculiar to this particular basin is the contribution of rapidly melted snow, caused by a cold front followed by "warm" turbulent air from a southwest direction (see Figure 30a). The majority of the melting occurs at about the 5000 ft elevation. This apparently happens only when the air is practically isothermal from the valley floor to the 5000 ft (850 millibar) range. This condition, when accompanied by heavy rains, gives rise to flood conditions of short duration and high intensity. A realization of the five significant flood years for the month of October can be seen in Figure 30a. (These particular conditions are in evidence in October and in July, but predominate in October.) For this case one must budget (accumulate) snow at 5000 ft depending on temperature. This results in temperature, precipitation and snowmelt as the inputs and the single flood flow as the output.

Figure 30a: flow traces for the Squamish floods for October of the five significant flood years (1955, 1958, 1963, 1965, 1966); the time axis is in days and the marked peaks are flows > 24 × 10³ cfs.

Since the record variation is so rapid, the Wiener filter is apt to be out of phase with the actual event. Consequently, one wishes to exploit one further property of the error behaviour of these constrained energy filters. This characteristic can be argued as follows. The Wiener prediction error filter is realized in this case to minimize the output energy due to noise. Consequently, the amplitudes of the filter's output that are due to noise alone are small (least-squares small). If at some time tₖ in the input stream a pulse (large snowmelt) arrives, it cannot be predicted from the past values. Therefore, a large error is observed in the predicted output. However, the very fact that this error has occurred is an accurate means of detection. This pulse, it appears, is very close to the fundamental frequency of the system (approximately 3.2 days), and its appearance in the input signal indicates consistently (at least with this data) the appearance of a flood. This phenomenon is an extremely useful tool for situations where spikes are evident in the output traces.
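A minimal sketch of this detection property follows: a least-squares prediction filter is fit to quiet (no-flood) data, and an arrival is flagged wherever the prediction error jumps well above its background level. The filter length, the injected pulse and the 4-sigma threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 400, 6
quiet = rng.normal(0, 1, n)

# Fit a one-step prediction filter on quiet (no-flood) data.
X = np.column_stack([quiet[k : n - p + k] for k in range(p)])
target = quiet[p:]
f, *_ = np.linalg.lstsq(X, target, rcond=None)

# New record with a melt pulse added at t = 250.
record = rng.normal(0, 1, n)
record[250] += 12.0
Xr = np.column_stack([record[k : n - p + k] for k in range(p)])
err = record[p:] - Xr @ f

# The pulse cannot be predicted from past values, so the error spikes there.
threshold = 4.0 * np.std(err[:200])          # background error level
arrivals = np.where(np.abs(err) > threshold)[0] + p
print(arrivals)                              # flags the neighbourhood of t = 250
```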
Chapter 8

OTHER CLASSES OF FILTERS

One of the principal problems encountered with Wiener-type deconvolution is that although it provides signal compression, it also causes, in a large number of cases, excessive amplification of noise components. As is apparent from the Squamish River floods, the noise and the arrival of large spikes (floods) dominate the traces. This pulse arrival also violates the minimum phase conditions required for a realizable Wiener filter. The use of the Wiener filter as a signal (seen in the large error) of the arrival of these events does not, however, indicate the amplitudes accurately.

Recently (Wiggins (13), 1977) another class of deconvolution operators has been developed in an attempt to cope with pulse arrivals at varying lags. Since this technique uses the kurtosis as its optimizing criterion (as opposed to the Wiener least squares), only the components contributing large variance components are followed. Implicit in this phaseless formulation of the deconvolution process is that the selection of the norm one uses is restricted only by the requirement that it represent a uniform, monotonically increasing distribution. In simple terms this means that one can selectively redistribute the variance contributions of the observed data in any 'suitable' manner. This implies that, as long as the redistribution function increases with increasing variance (up to some defined cut-off point), one can choose any transformation that appears to suit the data.

Included in this chapter are two possible reformulations of the deconvolution problem. This type of transformation is the direction towards which the entire unfolding process will move as more classes of transformations belonging to this type are uncovered. In addition to this "simplifying" of the structure, the filter weights are vectors and not matrices. This will speed up the calculations considerably, as well as the adaptability to the Kalman formulation.

Minimum Entropy Deconvolution:

As a brief summary of the minimum entropy deconvolution (M.E.D.) idea and its water resource applications, one needs to imagine the situation where the input consists only of a few large spikes which are separated by different time intervals while the operator itself remains relatively unchanged. In terms of the Squamish flood situation, the spikes are pulses of flow caused by the rapid melting of the snowpack at 5000 ft. These arrivals are dependent on the wind direction, the temperature and the amount of snowpack. The basin (operator) response is considered to be the same under all of these conditions. What the linear operator proposed by Wiggins (13) does is convert these signals to a "simple" appearance, "simple" meaning that the output consists of a few large spikes of "unknown sign and location" which are in turn separated by nearly zero terms. This approach maximizes the order of these spikes, or "equivalently minimizes the entropy of the spikes; hence the name minimum entropy deconvolution."

To simplify the behaviour of this process he chooses the kurtosis of the samples. The kurtosis measures the contribution to the tails of the distribution of the samples; in other words, those spikes most deviant from the mean sample. To restate this in terms of the rainfall/runoff context is straightforward: for the situation caused by high intensity rainfall accompanied by rapid melting, a spike arrival is seen to appear in the runoff. The intermediate lesser peaks are of no importance; consequently one can ignore (statistically) their contribution in favour of the extreme (tail) events.

In terms of the general autoregressive formulation used throughout, the problem appears as follows:

    y_ij = Σ_{k=1}^{NF} f_k u_{i,j-k} ,   i = 1, …, NS;  j = 1, …, NT

where NS is the number of trace elements, NT the number of time periods, and NF the number of filter components. The norm (like the least squares norm) is to be found such that it maximizes the variance. This is denoted a "varimax" norm and has the following form:

    V (varimax norm) = Σ_{i=1}^{NS} V_i ,   V_i = Σ_j y⁴_ij / ( Σ_j y²_ij )² ,

which is the sum of the squares of the normalized variances. For a single output which is everywhere zero except for a single spike, V = 1. The more nonzero spikes of similar size there are, the smaller V becomes.
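A minimal sketch of the varimax norm follows, verifying that a single isolated spike gives V = 1 while several spikes of similar size give a smaller value; the trace lengths and spike amplitudes are illustrative.

```python
import numpy as np

def varimax(Y):
    """V = sum_i [ sum_j y_ij^4 / (sum_j y_ij^2)^2 ] for the output traces Y (NS x NT)."""
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    return float(np.sum(np.sum(Y**4, axis=1) / np.sum(Y**2, axis=1) ** 2))

single = np.zeros(50); single[20] = 7.0
double = np.zeros(50); double[[20, 35]] = 7.0

print(varimax(single))   # 1.0 -- one spike, everywhere else zero
print(varimax(double))   # 0.5 -- two similar spikes lower the norm
```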
The derivation by Wiggins (13) of the filter coefficients is straightforward and appears as follows. Since V is to be a maximum, ∂V/∂f_k = 0. Substituting this into the expression for V_i, writing

    p_i = Σ_j y²_ij   (the variance of y),

and setting ∂y_ij/∂f_k = u_{i,j-k}, one finally obtains

    Σ_l f_l Σ_i (V_i / p_i) Σ_j u_{i,j-l} u_{i,j-k} = Σ_i (1 / p_i) Σ_j y³_ij u_{i,j-k} ,

which Wiggins (13) rewrites as:

    R f = g

R is the autocorrelation matrix, consisting of a weighted sum of the autocorrelations of the input signals, and g is the cross correlation vector, composed of a weighted sum of the cross correlations of the outputs cubed with the inputs. As it stands the system is nonlinear, but it can be solved iteratively with an assumed f. The solution is not a unique one, but it provides an adequate estimate of the maximum. It is interesting to note that the effect of cubing a time series is similar to zeroing it, with the exception of a few of the largest values. The filter weights adapt themselves to finding a filter shaped to the spikes of the cubed outputs. From the results it would appear that if the initial guess at f is a spike, then the output (runoff) spikes coincide with the actual flood spikes. The bursts of snowmelt in the input data seem to control the phasing, or the delay, of the output events. The way around this is to time shift the outputs after comparing the actual and predicted traces for some similar target data.

There are several features of this process that make it attractive for this type of event. If all of the output traces contain only random noise (which is inevitable) then R is proportional to the unit matrix, meaning that the cross correlation matrix will have only one non-zero value. The same is true for the final output filter. This value is the same as the initial estimate for f. Wiggins (13) demonstrates that the process is quite stable in the presence of large noise, i.e. it leaves white noise unmodified. The most significant departure from the normal deconvolution process is the absence of the minimum phase restriction on the resulting filter. In fact, there are no phasing requirements at all. This behaviour is also quite useful in other respects. When one views predictive deconvolution filter characteristics generally, it can be said that they all tend to amplify noise excessively. The M.E.D. (minimum entropy deconvolution), on the other hand, does just the opposite. While emphasizing the spike behaviour, it quite selectively suppresses bands where the signal to noise ratio is smallest and accentuates the coherent signals. A sample of the type of agreement attainable for these spikey events is included in Figure 31.
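A minimal single-channel sketch of this iteration follows: starting from a spike filter, g is re-formed from the cubed outputs and R f = g re-solved until the spikes stand out. The two-pulse input, filter length and iteration count are illustrative assumptions, and the variance weighting of the multichannel form collapses here to an overall scale, removed by renormalization.

```python
import numpy as np

def med(u, nf=11, n_iter=10):
    """Iterative minimum entropy deconvolution (after Wiggins (13)):
    solve R f = g with R the autocorrelation matrix of the input and
    g the cross correlation of the cubed outputs with the input."""
    n = len(u)
    U = np.column_stack([np.concatenate([np.zeros(k), u[: n - k]]) for k in range(nf)])
    R = U.T @ U                        # autocorrelation (Toeplitz-like) matrix
    f = np.zeros(nf); f[nf // 2] = 1.0 # initial guess: a spike
    for _ in range(n_iter):
        y = U @ f
        g = U.T @ y**3                 # cross correlation of outputs cubed with inputs
        f = np.linalg.solve(R, g)
        f /= np.sqrt(f @ f)            # fix the scale; R f = g determines shape only
    return f, U @ f

rng = np.random.default_rng(7)
spikes = np.zeros(300); spikes[[60, 200]] = [9.0, -6.0]   # "unknown sign and location"
wavelet = np.array([0.2, 1.0, 0.6, 0.3])                  # unchanging basin response
u = np.convolve(spikes, wavelet)[:300] + rng.normal(0, 0.05, 300)

f, y = med(u)   # y recovers a few large spikes separated by near-zero terms
```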
As is visible in Figure 33, the largest contribution to the variance is made by the largest spike (at the top right hand corner). The predominant contribution to the trace is made by the small amplitude flows seen near the origin. The algorithm to compute the weights approaches this maximum in 10 iterations (Figure 34) and the resulting filter configuration is visualized in Figure 33a.

Figure 34: convergence of the varimax norm over the iterations. Figure 33a: the resulting filter configuration.

Further classes of transformations on the distribution of y_ij have been investigated by Ooe & Ulrych (15, 1978). The motivation behind the transformation of y can be understood simply in terms of the normalized plots of variance vs. y in Figure 35. The kurtosis distributes the variation in the variance contributions as per Figure 36. This may, however, be too radical a weighting for anything other than spike inputs. The process for redistributing the rate at which this contribution is accumulated is to introduce a transformation on y such that the resulting shape is determined by the transformed variable z. A suitable normalization is introduced and then the whole process is resubstituted into the varimax formulation and solved for the set of filter weights f_k.

As an example of this, consider the transformation proposed by Ooe and Ulrych (15):

    z = 1 − exp(−a y²) ,   a = 1/(2σ²) ,

with k a constant determining the slope through σ. If one substitutes z into the varimax norm, the expansion and substitution for ∂V/∂f_k uses

    ∂z_ij/∂f_k = 2 a y_ij (1 − z_ij) u_{i,j-k} ,

from which one obtains a similar form, R f = g, as produced in Wiggins (13), to solve for f. This is just one of a class of norms possible for emphasizing different aspects of the data structure, depending on conditions. As a comparison between the straight varimax transformation and the "z" transformation, plots of both are shown in Figure 36.

Contained in this z transformation is the parameter k, which ultimately determines the shaping of the distribution. Experiments have found a stable value of k to lie between 1.4 and 3. The choice of k is such that one wants z to "look like" y² for y near y_max. Restated more formally: for z = 1 − exp(−ay²) we want z ≈ y² at y_max. Since a = 1/(2σ²), where σ² is the variance, the standard deviation σ is chosen so that σ = y_max/1.4 or y_max/3. This choice rests on the assumption that z is roughly like a normal distribution, so that 3σ should approximately include the tails of this distribution (see Figure 38).
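A minimal sketch of this exponential transformation and its shaping by k follows. The form a = 1/(2σ²) is an assumption consistent with the normal-distribution analogy above (the original defines a through the variance σ²); the sample values are illustrative.

```python
import numpy as np

def z_transform(y, k=2.0):
    """Exponential transformation z = 1 - exp(-a y^2), with a = 1/(2 sigma^2)
    and sigma = y_max / k. The form of `a` is an assumption; the original
    defines it through the variance sigma^2."""
    sigma = np.max(np.abs(y)) / k
    a = 1.0 / (2.0 * sigma**2)
    return 1.0 - np.exp(-a * y**2)

y = np.linspace(-10.0, 10.0, 401)
curves = {k: z_transform(y, k) for k in (1.4, 2.0, 3.0)}
# Larger k narrows sigma, so z saturates sooner and behaves more like the
# spike-only (kurtosis) weighting; k between 1.4 and 3 is the stable range.
```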
Chapter 9

CONCLUSIONS

Each of these classes of predictive filters performs well under different sets of conditions. As has been found by trial and error, no one class does it all. What has become most apparent, however, is the applicability of all of these filters to the state-space configuration and the simplicity of their incorporation into finite state machines. On a modular level the filter (system) characteristics are simply and directly interpretable using standard spectral techniques. This assemblage of tools, in the form of general algorithms, presents a formidable assessment package for a wide variety of multichannel problems. Although they do not represent all possibilities, they do provide a viable means of assessing flood prediction from a multichannel point of view.

BIBLIOGRAPHY

1. Kalman, R.E., "A New Approach to Linear Filtering and Prediction Problems", Trans. J. of Basic Engineering, March 1960, p. 35-46.
2. Ho, Y.C., J. Math. Analysis and Applications, 1962, 6, p. 152.
3. Graybill, F.A., "An Introduction to Linear Statistical Models", Vol. 1, McGraw Hill, New York, 1960.
4. Bayless, J.W. and Brigham, E.O., 1970, "Applications of Kalman Filters to Continuous Signal Restoration", Geophysics, Vol. 35, p. 2-23.
5. Wiener, N., 1949, "Extrapolation, Interpolation and Smoothing of Stationary Time Series", J. Wiley & Sons, New York.
6. Hamming, R.W. and Tukey, J.W., "Measuring Noise Colour", Bell Telephone Lab, Murray Hill, N.J., 1960.
7. Robinson, E.A., "Multichannel Time Series Analysis with Digital Computer Applications", Holden Day, San Francisco, 1967.
8. Gelb, A., "Applied Optimal Estimation", M.I.T. Press, 1974, Chapter 4.
9. Gauss, K.F., Werke, Göttingen, 4, 1821 (collected works, 1873).
10. Crump, N.D., Geophysics, 1974, Vol. 39, p. 1-14.
11. Clark, C.O., "Storage and the Unit Hydrograph", Trans. A.S.C.E., Vol. 110, p. 1419-1488.
12. Plackett, R.L., "Principles of Regression Analysis", Clarendon Press, Oxford, 1960.
13. Wiggins, R.A., Del Mar Technical Associates, P.O. Box 1083, Del Mar, California, 92014.
14. Krohn, K. and Rhodes, J., "Development of New Mathematical Techniques for Biosignal Analysis", Annual Report to N.A.S.A., Krohn-Rhodes Research Institute, Berkeley, California, 1967.
15. Ooe, M. and Ulrych, T., 1978.
16. Jenkins, G. and Watts, D., "Spectral Analysis and Its Applications", Holden Day, San Francisco, California, 1968.
17. Young, P., "Recursive Approaches to Time Series Analysis", Institute of Mathematics and Its Applications, Department of Engineering, University of Cambridge, 1974.