H Y D R A U L I C N E T W O R K ANALYSIS A N D M O D E L C A L I B R A T I O N W I T H ARTIFICIAL N E U R A L NETS By Vavid A. Tayegh B. A . Math. Queen's University B. Sc. Eng. Queen's University M . A . Sc. University of British Columbia A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T O F T H E REQUIREMENTS FOR T H E D E G R E E O F D O C T O R OF PHILOSOPHY in T H E F A C U L T Y O F G R A D U A T E STUDIES (Applied Science) We accept this thesis as conforming to the required standard T H E UNIVERSITY OF BRITISH COLUMBIA 1993 (c) Vavid A. Tayegh, 1993 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Date: To: Special Collections Librarian Re: Restricted Thesis of David Fayegh At our request, the commencement of the period for which the partial license shall operate shall be delayed from October 28th, 1993 for a period of twelve (12) months; such operation may be delayed for an additional period with good cause, as determined by the undersigned. ABSTRACT Computation of steady-stateflowrates and the pressure distribution in a hydraulic network of given topology, pipe carrying capacities, and nodal demand discharges plays an essential role in the design, analysis, and operation of water and other fluid distribution systems. When accurate estimates of the hydraulic network model parameters (namely, pipe resistances and demand discharges) are available, the pipe segmentflowrates and nodal energy heads can be easily calculated by means of a number of efficient algorithms that iteratively solve the basic non-linear system of hydraulic equations. For existing networks, however, reliable estimates of model parameters are rarely available since the nodal demands are in a continual state of change and the pipe segment carrying capacities gradually change with time. Artificial neural networks1 have recently emerged as general problem-solving tools whose potential has only begun to be exploited in engineering. The objectives of this dissertation are three fold: first, to identify the most appropriate class of ANN architectures and learning algorithms suitable for modeling two key problems associated with the design and operation of water distribution systems, namely, hydraulic network analysis and model calibration; secondly, to develop a relatively general framework for representing arbitrarily complex hydraulic networks by means of multilayered feedforward ANNs; andfinally,to experimentally investigate the convergence and performance behavior of a number of feedforward networks trained by error back propagation for hydraulic network analysis and model calibration under typical operating conditions. Specifically, several sets of input/output2 training and testing vector pairs were generated and used to train and test the convergence and performance of a number of feedforward networks used to represent three characteristically different classes of hydraulic network problems. *ANNs henceforth. I/0 henceforth. 2 ii Convergence behavior was observed to improve after the appropriate preprocessing of the I/O training vector pairs generated by the hydraulic network model simulator. In part, this process involved application of linear and non-linear fuzzy transformation functions to the various groups of variables and parameters in the input and output vector pairs followed by the algebraic normalization of the input to the feedforward networks. The effects on convergence and performance as a result of variations in a number of experimental parameters such as the choice of transformation functions, changes in the underlying probability distribution from which seed hydraulic network data were drawn, the presence of noise in training and testing data, choice of the activation and output functions and their parameters, the size of training I/O data pairs, and the number of layers used in the feedforward network have also been investigated. Examination of the results indicates that the standard error back propagation algorithm converges relatively slowly for even medium-sized distribution systems although the number of training cycles required to learn an assigned class of tasks remains relatively invariant of the number of variables and parameters of the hydraulic network under investigation. The success of any particular feedforward network was largely dependent on the representation of the hydraulic networks (such as identification of the I/O, number of layers and nodes/layer in the ANN) and the choice of the transformation functions applied to I/O vector pairs. The techniques developed for transformation and normalization of the 1/0 training and testing data in addition to the appropriate ANN representation of the hydraulic network were observed to result in significant improvements in performance both during training, as measured in terms of the cycles required to achieve a desired root mean square error3 computed with both the training and testing data sets. 3 RMSE henceforth. iii T A B L E OF C O N T E N T S ABSTRACT ii LIST OF T A B L E S viii LIST OF FIGURES x LIST OF S Y M B O L S xiv ACKNOWLEDGEMENTS 1 2 xxiv INTRODUCTION 1 1.1 Motivation 1 1.2 Scope of the Present Investigation 3 1.3 Organization 8 H Y D R A U L I C N E T W O R K ANALYSIS A N D M O D E L C A L I B R A T I O N 10 2.1 Introduction 10 2.2 The Steady-State Hydraulic Network Analysis Problem 11 2.2.1 12 2.3 2.4 Matrix Formulation of the Basic Equations Iterative Solution Methods '. 20 2.3.1 Solution of the Pipe Flow Equations . 21 2.3.2 Solution of the Node Equations 22 2.3.3 Solution of the Loop Equations . . . 24 2.3.4 The Newton-Raphson Method . . 25 The Hydraulic Network Model Calibration Problem 29 2.4.1 32 Related Research iv 2.4.2 2.5 Hydraulic Network Model Calibration by Optimization Conclusions 34 3 T H E BUILDING B L O C K C O M P O N E N T S OF A N N S 3.1 3.2 3.3 3.4 3.5 4 Introduction 33 .' 37 . 37 3.1.1 Research History 41 3.1.2 Why Artificial Neural Networks? . . .. . . 43 The Functional Components of ANNs 46 3.2.1 A Simplified Processing Element 51 3.2.2 Connection Data Types and Interconnection Geometries 54 3.2.3 ANNs as Trainable Dynamical Systems 55 . . . : Hierarchical Representation of ANNs . 57 3.3.1 Static and Dynamic Components of ANNs 58 3.3.2 Associative Artificial Neural Networks 60 3.3.3 Mapping Networks 70 3.3.4 Stochastic Networks . . . 76 3.3.5 Spatiotemporal Networks 78 3.3.6 Hierarchical Networks . . 80 A N N Learning Laws and Training Methods 81 3.4.1 Learning as Parameter Estimation . . . 81 3.4.2 Example of A N N Learning as Parameter Estimation 86 Conclusions 91 D E V E L O P M E N T OF T H E N E T W O R K M O D E L S 93 4.1 Introduction 93 4.1.1 95 4.2 Why Learning by Error Backpropagation in Multilayered ANNs? Error Backpropagation in Multilayered Feedforward ANNs 4.2.1 The Forward-Backward Signal Propagation Cycle . . . v 98 100 4.3 4.4 4.5 4.6 4.2.2 Performance Measurement and Function Approximation Accuracy . . . . 103 4.2.3 Derivation of the Basic Self Adaptation Equations . 107 Integrating Hydraulic Network Models with Feedforward ANNs Ill 4.3.1 116 Mapping Hydraulic Networks to Feedforward ANNs Implementation of the Network Simulators . 120 4.4.1 Measures of ANN Performance 121 4.4.2 The WDN Model Simulator .126 4.4.3 The ANN Model Simulator 128 Specific Application Problems 130 4.5.1 Problem 1A: Hydraulic Network Analysis with Complete Data 130 4.5.2 Problem IB: Hydraulic Network Analysis with Limited Data 133 4.5.3 Problem 2A: Hydraulic Network Model Calibration with Complete Data . 134 4.5.4 Problem 2B: Hydraulic Net Model Calibration with Limited Data . . . . 135 4.5.5 Problem 3: Learning Arbitrary Functions Conclusions 135 137 E X P E R I M E N T A L V E R I F I C A T I O N A N D ANALYSIS O F RESULTS 138 5.1 Introduction 138 5.1.1 140 5.2 5.3 5.4 Experimental Approach The Experiments 143 5.2.1 Overview of Group I Experiments 5.2.2 Overview of Group II Experiments . 144 147 Representation and Details of the Hydraulic Network Examples 147 5.3.1 Example 1: The 3-Node/3-Segment/l-Loop Hydraulic Network 148 5.3.2 Example 2: The 7-Node/ll-Segment/5-Loop Hydraulic Network . . . . . 150 5.3.3 A Method for Hydraulic Network Decomposition . . . . . . . . . . . . . . Synthetic Data Generation 5.4.1 . . . . 152 154 Range of System Operation and the Size of Training and Testing Data Setsl56 vi 5.5 5.6 6 5.4.2 Development of the Input/Output Data Transformation Functions . . . .160 5.4.3 Choice of the Activation and Transfer Functions 165 5.4.4 Weight Initialization and Learning Parameters 167 Details of Experiments and Simulation Results 171 5.5.1 Simulation Results for Group I Experiments 171 5.5.2 Simulation Results for Group II Experiments 185 5.5.3 Discussion and Summary of Results 189 Conclusions 194 SYNOPSIS A N D CONCLUSIONS 197 BIBLIOGRAPHY 202 APPENDICES 210 A 210 S U M M A R Y OF RESULTS vii LIST O F T A B L E S 2.1 Summary of WDN state variables and parameters . 14 3.2 Examples of common activation functions 53 3.3 Template for hierarchical specification of ANN architecture 61 5.4 Pipe segment data for the 3-Node hydraulic network . . 149 5.5 Node data for the 3-Node hydraulic network 149 5.6 Node data for the 7-Node hydraulic network 150 5.7 Pipe segment data for the 7-Node hydraulic network 151 5.8 Results Table 1.1: Group I experiments, Class 1A Problem, Noiseless, T = 14 . . 173 5.9 Results Table 1.2: Group I experiments, Class 1A Problem, Noisy, T = 14 . . . . 174 5.10 Results Table 1.3: Group I experiments, Class 1A Problem, Noiseless, T = 54 . . 175 5.11 Results Table 1.4: Group I experiments, Class 1A Problem, Noisy, T = 54 . . . . 175 5.12 Results Table 2.1: Group I experiments, Class 2A Problem, Noiseless, T = 14 . . 177 5.13 Results Table 2.2: Group I experiments, Class 2A Problem, Noisy, T = 14 . . . . 178 5.14 Results Table 2.3: Group I experiments, Class 2A Problem, Noiseless, T = 54 . . 179 5.15 Results Table 2.4: Group I experiments, Class 2A Problem, Noisy, T = 54 . . . . 180 5.16 Results Table 3.1: Group I experiments, Class 3 Problem, Noiseless, T = 14 . . . 181 5.17 Results Table 3.2: Group I experiments, Class 3 Problem, Noisy , T = 14 . . . . 182 5.18 Results Table 3.3: Group I experiments, Class 3 Problem, Noiseless, T — 54 . . . 183 5.19 Results Table 3.4: Group I experiments, Class 3 Problem, Noisy, T = 54 185 5.20 Results Table 1 for Group II experiments (Problems IB and 2B, ANN with 1-Hidden layer) 187 viii 5.21 Results Table 2 for Group II experiments (Problems IB and 2B, A N N with 5-Hidden layers) 188 ix LIST O F F I G U R E S 2.1 Schematic of a 3-Node hydraulic network 12 2.2 Schematic of generic pipe segment 13 2.3 Illustration of zero finding with Newton's method 26 3.4 The basic components of the von Neumann computer 38 3.5 Example of ANN with 3 input units and 10 processing elements 39 3.6 The Internal Structure of the FLowTron . 49 3.7 A Simplified Processing Element 52 3.8 Illustration of a typical feedforward associative ANN architecture . . . . . . . . . 62 3.9 Illustration of a typical recurrent associative ANN architecture 63 3.10 Illustration of the Linear Associator architecture 65 3.11 Illustration of the Hopfield Network 68 3.12 Illustration of Kohonen's Self-Organizing Layer . 71 3.13 Illustration of counterpropagation 74 3.14 Example of the topology of a recurrent network 79 3.15 The Architecture of the ADAptive Linear Element 87 4.16 Illustration of the forward-backward signal propagation cycle 99 4.17 The Forward signal propagation phase in multilayered ANN . 101 4.18 The error signal backpropagation phase in feedforward ANN 104 4.19 Unknown Source of Data for Supervised Training . 112 4.20 Interfacing Multilayered Feedforward ANN with Existing WDN Models and Calibration Methods 4.21 ANN Training scheme for learning the desired WDN output 113 114 4.22 ANN Training Scheme For Estimating WDN Model Parameters 116 4.23 Interaction between the network simulators during training (Problem 1A) . . . . 131 4.24 Training scheme for learning model outputs with partial inputs (Problem IB) . . 132 4.25 Interaction between the network simulators during training (Problem 2A) . . . . 133 4.26 Training scheme for learning model parameters with partial input (Problem 2B) 135 4.27 Interaction between the network simulators during training (Problem 3) 136 5.28 Example 1: The simplest 3-Node/3-Segment/l-Loop hydraulic network 141 5.29 Example 2: The 7-Node/ll-Segment/5-Loop hydraulic network example 142 5.30 Overview of Group I Experiments for the 3-Node Hydraulic Net . 145 5.31 Overview of Group II Experiments 146 5.32 The smallest Feed-forward ANN used during simulations (Problem 3, 3-Node/3Segment example) • • • 5.33 Hydraulic network decomposition by "cut" along segment • • 148 . 153 5.34 Form of the input transformation functions 163 5.35 Parameters of the logistic activation function 166 5.36 Illustration of Overtraining . . 190 A.37 Result Figure 1.1: Error Evolution for Problem 1, 3-Node Net, T:= 14 (Normal) . 211 A.38 Result Figure 1.2: Error Evolution for Problem 1, 3-Node Net, T:=54 (Normal) . 212 A.39 Result Figure 1.3: Error Evolution for Problem 1, 3-Node Net, T==14 (Uniform) . 213 A.40 Result Figure 1.4: Error Evolution for Problem 1, 3-Node Net, T==54 (Uniform) . 214 A.41 Result Figure 1.5: Error Evolution for Problem 1, 7-Node Net, T==14 (Normal) . 215 A.42 Result Figure 1.6: Error Evolution for Problem 1, 7-Node Net, T:=54 (Normal) . 216 A.43 Result Figure 1.7: Error Evolution for Problem 1, 7-Node Net, T==14 (Uniform) .217 A.44 Result Figure 1.8: Error Evolution for Problem 1, 7-Node Net, T==54 (Uniform) . 218 A.45 Result Figure 2.1: Error Evolution for Problem 2, 3-Node Net, T==14 (Normal) . 219 A.46 Result Figure 2.2: Error Evolution for Problem 2, 3-Node Net, T==54 (Normal) . 220 xi A.47 Result Figure 2.3: Error Evolution for Problem 2, 3-Node Net, T=:14 (Uniform) . 221 A.48 Result Figure 2.4: Error Evolution for Problem 2, 3-Node Net, T=:54 (Uniform) . 222 A.49 Result Figure 2.5: Error Evolution for Problem 2, 7-Node Net, T== 14(Normal) . 223 A.50 Result Figure 2.6: Error Evolution for Problem 2, 7-Node Net, T==54 (Normal) 224 A.51 Result Figure 2.7: Error Evolution for Problem 2, 7-Node Net, T= 14 (Uniform) 225 A.52 Result Figure 2.8: Error Evolution for Problem 1, 7-Node Net, T= 54 (Uniform) 226 A.53 Result Figure 3.1: Error Evolution for Problem 3, 3-Node Net, T= 14 (Normal) 227 A.54 Result Figure 3.2: Error Evolution for Problem 3, 3-Node Net, T= 54 (Normal) 228 A.55 Result Figure 3.3: Error Evolution for Problem 3, 3-Node Net, T= 14 (Uniform) 229 A.56 Result Figure 3.4: Error Evolution for Problem 3, 3-Node Net, T= 54 (Uniform) 230 A.57 Result Figure 3.5: Error Evolution for Problem 3, 7-Node Net, T= 14 (Normal) 231 A.58 Result Figure 3.6: Error Evolution for Problem 3, 7-Node Net, T= 54 (Normal) 232 A.59 Result Figure 3.7: Error Evolution for Problem 3, 7-Node Net, T= 14 (Uniform) 233 A.60 Result Figure 3.8: Error Evolution for Problem 3, 7-Node Net, T= 54 (Uniform) . 234 A.61 Result Figure 3.9: Comparison of RMSE t and RMSE , k Problem 3, 3-Node Net, Noiseless 235 A.62 Result Figure 3.10: Comparison of RMSE t and RMSE , k Problem 3, 3-Node Net, Noisy . . . •. , A.63 Result Figure 3.11: Comparison of RMSE t and RMSE , k .236 Problem 3, 7-Node Net, Noiseless 237 A.64 Result Figure 3.12: Actual vs. A N N computed values of K , (E£j£,K™" ), n s Problem 3, Noiseless Normal, Fuzzy Type 1, T = 54 . 238 A.65 Result Figure 3.13: Actual vs. A N N computed values of K , s (E££,K™ ), ox Problem 3, Noiseless Normal, Fuzzy Type 1, T = 54 239 A.66 Result Figure 3.14: Actual vs. A N N computed values of K , s Problem 3, Noiseless Normal, Fuzzy Type 1, T = 54 xii (E££*,K™ ), n 240 A.67 Result Figure 3.15: Actual vs. ANN computed values of K , (E££*, K™ ), ax s Problem 3, Noiseless Normal, Fuzzy Type 1, T = 54 .241 A.68 Result Figure 3.16: Actual vs. ANN computed values of K , (E££!,KJ" ), n 3 Problem 3, Noisy Normal, Fuzzy Type 2, T = 54 242 A.69 Result Figure 3.17: Actual vs. ANN computed values of K , (E£™, K™ ), ax s Problem 3, Noisy Normal, Fuzzy Type 2, T = 54 243 A.70 Result Figure 3.18: Actual vs. ANN computed values of K , (E££ , K™ ), x in s Problem 3, Noisy Normal, Fuzzy Type 2, T = 54 244 A.71 Result Figure 3.19: Actual vs. ANN computed values of K , s (E^,K™ ), Problem 3, Noisy Normal, Fuzzy Type 2, T = 54 A.72 Error Evolution for Problem lb, 7-Node Net, ANN with 1 Hidden Layer a x 245 246 ,A.73 Error Evolution for Problem lb, 7-Node Net, ANN with 5 Hidden Layers . . . . 247 A.74 Error Evolution for Problem 2b, 7-Node Net, ANN with 1 Hidden Layer 248 A.75 Error Evolution for Problem 2b, 7-Node Net, ANN with 5 Hidden Layers . . . .249 A.76 Error Evolution for Problem 2b, 7-Node Net, ANN with 1 Hidden Layer xiii 250 LIST O F S Y M B O L S SYMBOL DEFINITION EQUATION a Positive integer 2.1 a,- Cross-sectional area of pipe segment i 2.5 o,-(.)" Activation function implemented at PE j 3.28 «}(•) Activation function implemented at PEj b Positive integer 2.1 Bias parameter stored at PE j 3.41 Bias parameter stored at PEj 3.20 Elements of incidence matrix, B j 2.2 Initial values of the ANN" biases 3.40 Constant coefficients of error polynomial 5.101 Integer exponent of percentage error 2.19 Diameter of pipe segment i 4.93 *!• bo Ci,---,C 4 do Initial size of square grid around the winning PE w rft Size of square grid at training iteration t d 6-dimensional vector of pipe segment diameters s . exp(.) The exponential function /*(•) Output transfer function implemented at PE j /j(-) Output transfer function implemented at PEj 3.29 Friction factor of pipe segment i 4.92 Acceleration due to gravity 2.5 Polynomial of degree 4 5.101 xiv SYMBOL DEFINITION hi{.) Function of energy head loss in pipe segment i EQUATION mj-dimensional vector of (weight) functions at PEj k Number of sink nodes in hydraulic network kt Number of sink nodes in loop I U Length of pipe segment i h s-dimensional vector of pipe segment lengths m Number of external inputs to A N N rrij Number of inputs to P E j n Number of external outputs from A N N n Number of connections (weight parameters) in A N N ne Total number of nodes in loop I Tli Real-valued exponent of volumetric n Number of P E s centered at the wining P E , w P(-) Probability density function c w 2.9 2.5 3.24 flowrate 2.4 5.100 Joint probability density function p The input/output cross correlation vector 3.65 9 Expectation of one-dimensional output 3.65 Average volumetric flowrate in pipe segment i 2.2 Inflow to segment i at node j — a Qbi comp q „obs Outflow from segment i at node j = b Rate of change of flowrate with respect to time 2.5 Model computed flowrate in pipe segment i, load I 2.19 Observed flowrate in pipe segment i, load I 2.19 Vector of pipe segment flowrates with s elements 2.6 xv SYMBOL DEFINITION r Number of source nodes in hydraulic net EQUATION 2.9 Number of source nodes in loop I sign(.) The sign function 3.31 s Number of segments in hydraulic net 2.2 std. abbreviates standard deviation Si The i Si Number of pipe segments in loop £ Smax Maximum value of s 5.95 Smin Minimum value of s 5.95 So Observed value of s 5.100 Mean value of s 5.97 5 M th element of the input vector s S m-dimensional external input vector to the ANN Sjfe k testing input vector of I/O vector pair t training input vector of 1/0 vector pair th th 3.29 3.48 ANN computed value of the m-dimensional input vector m-dimensional mean of the input vector to PE j Mean of s; at PE j 3.29 mj-dimensional input vector to PEJ 3.21 tanh(.) The hyperbolic tangent function 3i27 u Uniform deviate in the interval [0,1] u(t) (m + iV)-dimensional input vector •v(<) (n + (N — n))-dimensional output vector w Weight parameter stored at PEj with source PE i 3.24 Weight parameter stored at PEj with source PE i 3.41 xvi SYMBOL DEFINITION w Derivative of the weight ji with respect to time EQUATION I maximum value of weight parameter in layer / w.max i minimum value of weight parameter in layer / Wj m-dimensional weight vector stored at PE j 3.41 mj-dimensional weight vector stored at PEj 3.21 w,max w* The weight vector for which £ ( w * ) = £ , > 0 X Net-input to PE j Xj m a i Maximum value of net-input to PE j in layer / X / jmin X I lmax r 3 6 m n 3.41 Minimum value of net-input to PE j in layer / Net-input to PE\ 3.21 Vector of net-inputs 3.42 State of activation of PE j 3.26 State of activation of PEj 3.22 Max. output from PE\ with max. number of PEs/layer Output from PEj 3.26 Derivative of the output of PEj with respect to time 3.32 Output of PE] 3.23 ,'-1 Output vector from the source layer / — 1 Zt The t Zt+1 th estimate of the 1-dimensional root z* of F(z) 2.14 ANN computed output for training I/O pair t of T 3.63 The desired output of PE j (training I/O pair t of T) 3.43 The (t + l) 2.14 th estimate of the root z* of F(z) Mean value of z xvii SYMBOL DEFINITION z The t th t EQUATION estimate of the root z* of F ( z ) 2.14 ANN computed output vector for training I/O pair t of T z i The (t + l) zo Initial estimate of the root z* z (k + r + s)-dimensional vector of unknowns z* (k + r + s)-dimensional root of the system of equations Zj Desired ANN output vector in training data set ADALINE abbreviates ADAaptive LINear Element ANN abbreviates Artificial Neural Network ANN A four tuple representation of a generic ANN A Set of ANN" activation functions Ao Initial values of the parameters of A A j(.,.) Activation function implemented at PEj Bj $ x (k + r) Incidence matrix Bfc s x k Incidence submatrix 2.7 B s x r Incidence submatrix 2.8 t+ l r th estimate of the root z* of F ( z ) BP abbreviates BackPropagation C Set of constraints on ANN parameters 2.14 2.6 3.34 3.38 3.40 3.34 C ANN" weight constraints 3.39 Cb ANN" bias constraints 3.39 ANN" input transformation parameter constraints 3.39 w C^4 ANN activation function parameter constraints 3.39 CT ANN transfer function parameter constraints 3.39 xviii SYMBOL DEFINITION EQUATION ANN performance function constraints 3.39 Conductance of segment i d Coupled differential or difference equation DE abbreviates differential equation Ei Energy head at node j = a of pipe segment i a Energy head at node j = b of pipe segment i 3.32 2.1 2.1 Ej Energy head at node j 2.9 gcomp Model computed value of energy head at node j, load £ 2.19 Jjjobs Observed value of energy head at node j, load t 2.19 €(.) Error performance function 3.38 ^max Maximum allowable value of error performance function 3.40 St Error performance computed with the t 4.74 I/O pair th k-dimensional vector of nodal energy heads at sink nodes 2.6 fc-dimensional vector Of maximum nodal energy heads t72, Em it A;-dimensional vector of minimum nodal energy heads r-dimensional vector of nodal energy heads at source nodes 2.10 (k + r)-dimensinal vector of nodal energy heads E [.] Expectation operator (statistical mean/average) T Set of ANN" output transfer functions fo Initial values of the parameters of T F{z) = 0 Conditional equation with single root z* n Derivative of F(z) with respect to z xp o 3.38 • Output transfer function at PEj Output transfer function implemented at PEj xix 3.40 SYMBOL DEFINITION EQUATION F(z) = 0 Conditional system of equations with root z* 2.12 JF The Jacobian matrix 2.17 F(-,.) Function of the input and weight vector 4.74 Gi Set of PEs in grid / 3.37 1 Interconnection configuration of ANN" 3.34 I/O abbreviates input/output J ANN interconnection configuration 3.34 Ki Energy head loss parameter of segment i 2.4 LMS abbreviates Least Mean Squared L, L Number of hidden layers in feedforward ANN L Number of non-overlapping closed loops n h 2.13 DE (learning law) at PE j Ni Number of nodes in layer I \rmax l-l Source layer / — 1 with max. number of PEs N h Total number of nodes in hydraulic net 2.13 N n Total number of processing elements in ANN 4.88 NL Number of loading conditions 2.19 NN Number of pressure measurement nodes 2.19 NS Number offlowmeasurement segments 2.19 P[] Event probability PE abbreviates Processing Element PE'j Processing Element j in layer / Qj External flowrate at node j ly 3.59, 5.100 XX 2.2 SYMBOL DEFINITION EQUATION A:-dimensional vector of external flowrates at sink nodes Maximum values of.flowrates at A; sink nodes \max Ik Ql Q 2.7 . Intermediate values of flowrates at k sink nodes ntl int3 Intermediate values of flowrates at A; sink nodes i mrn k Maximum values of flowrates at k sink nodes Qr r-dimensional vector of flowrates at r source nodes Qk+r (k -f r)-dimensinal vector of external flowrates Ri Resistance of pipe segment i R s-dimensional vector of pipe segment resistances s 2.8 2.3 Input auto correlation matrix R R , R n m n and m-dimensional Eucledian space RMSEk Root mean square error computed with testing data 4.92 RMSE Root mean square error computed with training data 4.91 RMSEf Desired RMSEt Sh, s Number of pipe segments in the hydraulic net 2.13 S m x T training input matrix 3.44 st The pseudoinverse of S 3.46 s ANN structure 3.34 T Transpose of a vector/matrix when used as a superscript t Total number of training I/O vector pairs Total number of iterations r ANN" topology 3.34 w m x m weight matrix 3.44 w Time derivative of matrix W xxi SYMBOL DEFINITION W Initial values of weight matrix W 0 EQUATION 3.40 WDN abbreviates Water Distribution Network Z n x T training output matrix Q The momentum parameter •<*j Positive real-valued constant parameter of sigmoid functions at Integer constant of proportionality 4.89 6 Arbitrary positive constant 4.87 The difference between desired and ANN computed outputs. 3.69 Rational number 4.87 Error of approximation 5.100 c* The difference between- values of iterates z t + i and Zf 2.15 V Learning rate parameter Vo Initial value of the learning rate parameter m Learning rate parameter at training iteration t X Real-valued constant 4.87 K-) Input/output transformation functions 3.38 Initial values of the parameters of y, 3.40 € V"max 3.44 Maximum value of the transformed input /output fi 5.95 Minimum value of the transformed input/output fi 5.95 Input transformation applied to input s 5.94 s s /*«(•) Input transformation applied to input si /*5(.,.) Net-input function implemented at PEj xxii SYMBOL DEFINITION EQUATION Output transformation applied to output z As The transformed A N N input vector 5.102 The transformed normalized A N N input vector 5.102 The transformed A N N output vector °] Variance of input at P E j T Temperature parameter in stochastic ANNs T j Time constant 4> Unknown mapping (function) approximated by A N N r Inverse of mapping <j> 1 3.33 i, Continuous real monotonically increasing function 4.87 r Set of AJW transition functions 3.38 AEi Energy head loss in pipe segment i 2.1 AE Vector of energy head losses with s elements 2.9 S Vector of corrective loop flowrates of size t ©0 Set of initial states of ANN- n Designates Program/Procedure xxiii 3.40 ACKNOWLEDGEMENTS I would like to thank Dr. S. O. Russell for providing valuable guidance and encouragement throughout the preparation of this thesis. This dissertation has also benefited from discussions and consultation with my supervisory committee Drs. W. F. Caselton, M. S. Davies, J. J. Little, and A. D. Russell. To my family and friends, especially "Dr. Bob" and M.J.S, a special note of thanks for your support and understanding. • I o_ l_ _8_o ( . / / I > |_o q / ]_ / c 4_ . xxiv -. o_ ]_o_ ( _ . , IV I |_) > |_ / _ o ( _ /. Chapter 1 INTRODUCTION 1.1 Motivation Urban water distribution systems are the most common type of hydraulic networks with a large number of branches and segments interconnected every few hundred meters at junctions and other hydraulic components such as storage reservoirs, pumps, flow and pressure regulating devices, etc.. The system must be designed, operated, and maintained to meet continually changing discharge requirements at the points of demand while keeping pressures within allowable ranges. The overall design of large-scale water distribution systems involves determining the demands and the location of major consumption and supply nodes in the area of interest, determination of pipe lengths and diameters and selection of pipe material, and associated equipment such as pumps, pressure regulating devices, secondary storage reservoirs, and so on. A number of design guidelines, such as those set out by the American Water Works Association1 are often followed to ensure that, for example, adequateflowis established in every pipe (to avoid stagnation); that water can reach the critical points of demands in the network without interruption during periods of pipe or pump failure (usually accomplished by incorporating some degree of redundancy in the design); and that the discharge quantities required tofighta ' A W W A henceforth. 1 Chapter 1. INTRODUCTION 2 major fire can be provided when and where needed. Estimation of design values of water demand largely depends on the size and type of the distribution system under consideration. For example, in the case of residential networks, demand varies on an hourly, daily, and seasonal basis. During preliminary design, the total demand is often estimated based on a consumption rate of roughly 200 to 700 liters per person per day; nodal pressures may be required to be in the range of 200 to 600 kPa; pipe diameters are at least 75 mm (3 inches) in diameter; frictional energy head losses are assumed to be acceptable if in the range from 0.2 to 1.5 meters for a 500 meter length of pipeline; and fire flow requirements are usually in the range of 100 to 300 L/s ( [Brebbia83] and [Finch93]). Once a network has been designed and constructed, a number of parameters, such as the nodal demand requirements and the pipe carrying capacities undergo continual change. Load varies on a daily, seasonal, and yearly basis as industrial and domestic consumers change their demand patterns. Deterioration of the fluid network with age adds to the uncertainty associated with the parameter estimates, in particular when some of the elements in the network are affected by physical changes such as pipe "tuberculosis" (resulting in an overall reduction in the cross-sectional area of older pipes and high line energy losses) or when there is excessive leakage from the network. Computer modeling and simulation of the behavior of water distribution systems is an essential tool not only during the design stages, but also throughout the day-to-day operation of the hydraulic network. In practice, before a network can be analyzed using conventional analysis models, it must often be skeletonized by reduction of series and/or parallel pipe circuits Chapter 1. INTRODUCTION 3 into equivalent segments. Corresponding values of the equivalent pipe resistance and the endnode discharges are then estimated, of necessity on the basis of incomplete knowledge of the resistances of the pipe segments, and demands at the various nodes of the network being skeletonized. This is particularly true for large-scale urban water distribution networks2 with a large number of pipes and junctions where it may not be practical to include all nodes and segments in the analysis process. As a result of the uncertainties discussed above, simulation results can be misleading and the computed values may not be representative of the actual state of the network. Computed values of flow rates and pressures rarely, if ever, accurately match actual measured flowrates and pressures. In practice, the calibration problem is often solved by matching the hydraulic model computed values of pipeflowratesand nodal pressures to measured values by adjusting the values of demand discharges and pipe carrying capacities using ad hoc trial-and-error techniques. These provide no guarantee offindingthe "best" values. 1.2 Scope of the Present Investigation The objectives of this dissertation are three fold: first, to identify the most appropriate class of ANN architectures and learning algorithms suitable for modeling two key problems associated with the design and operation of water distribution systems, namely, hydraulic network analysis and model calibration; secondly, to develop a relatively general framework for representing arbitrarily complex hydraulic networks by means of multilayered feedforward ANNs; and finally, 2 W D N s henceforth. Chapter 1. INTRODUCTION 4 to experimentally investigate the convergence and performance behavior of a number of feedforward networks trained by error backpropagation for hydraulic network analysis and model calibration under typical operating conditions. The first aim of this research was to assess the feasibility of artificial neural network models as an alternative to conventional methods of hydraulic network analysis and model calibration. Artificial neural networks are distributed-parallel information processing structures that may 3 be best viewed as general-purpose input/output "black-box" models capable of learning possibly unknown relationships between designated sets of inputs and outputs. Although ANNs are promising and extremely attractive in principle, many factors must be carefully considered and appropriate choices made before a given system can be effectively represented and subsequently trained by means of such a parallel-distributed information processing structure. This approach is quite different from the standard W D N models. 1. What types of water/fluid distribution system problems are most suitable for being modeled using ANNs? Possible candidates are: i. Hydraulic network analysis and model calibration ii. On-line monitoring and automatic operation/control iii. Optimal hydraulic network design 2. What is the best A N N representation of a given hydraulic network? In particular, i. What should be the inputs to and outputs from the A N N and how do they correspond to the inputs to and outputs from the WDN? 3 A N N s henceforth. Chapter 1. INTRODUCTION 5 ii. How many processing elements (other than the input/output units) are needed to represent a given WDN? iii. Should the processing elements in the ANN be divided into layers? If so, how many layers and how many units per layer? iv. How should the various processing elements in the ANN be connected? v. How should the state of activation of the various elements be updated? should the updating scheme be deterministic or stochastic? synchronous or asynchronous? vi. What form of the transfer or output function should be implemented at each processing element in the ANN? 3. What learning algorithm is the most appropriate for the chosen architecture and how should its convergence and performance be best evaluated both during training and testing stages? i. How can the number of nodes and pipe segments in the hydraulic network be used to determine the number of units and layers in an ANN? ii. Does the ANN require a "teacher" during training or can it learn a task unsupervised? Where do training data come from and how reliable are these data? How many training input/output vector pairs are needed and how many times must this set be presented to the ANN before the desired error tolerance is reached? iii. Can the ANN be trained on-line as measurement data becomes available or must the training phase be separated from the operation phase and carried out off-line? Chapter 1. INTRODUCTION 6 4. How can an A N N be best simulated in software and what types of data structures and related information processing operations must be implemented? i. How can an A N N simulator be best integrated and interfaced with existing hydraulic network simulators and what memory model should be used for efficient exchange of I data between the two different classes of network models? ii. How can the inputs to and the outputs from the two classes of network models be best interfaced during training? iii. How can knowledge about the hydraulic network be used to specify the most suitable A N N architecture for a given class of problems? The above questions have been addressed at varying levels of detail in this study. A general framework for representing arbitrarily complex hydraulic networks by means of multilayered feedforward ANNs was initially developed and a hybrid model for interfacing the I/O to/from the A N N and W D N model simulators during training and testing stages was then implemented in order to assess the feasibility and versatility of multilayered feedforward ANNs for hydraulic network analysis and model calibration under typical operating conditions. Specifically, several sets of input/output (I/O) training and testing vector pairs were generated and used to train and test the convergence and performance of a number of feedforward networks used to represent three characteristically different classes of hydraulic network problems. Convergence behavior was observed to improve after the appropriate preprocessing of the I/O training vector pairs generated by the hydraulic network model simulator. In part, this process involved application of linear and non-linear fuzzy transformation functions to the / Chapter 1. INTRODUCTION 7 various groups of variables and parameters in the input and output vector pairs followed by the algebraic normalization of the input to the feedforward networks. The effects on convergence and performance as a result of variations in a number of experimental parameters such as the choice of transformation functions, changes in the underlying probability distribution from which seed hydraulic network data were drawn, the presence of noise in training and testing data, choice of the activation and output functions and their parameters, the size of training I/O data pairs, and the number of layers used in the feedforward network have also been investigated. In summary, • the hybrid model implements a number of practical schemes for integrating and interfacing the I/O to/from the two distinct classes of network model simulators (i.e., WDN and ANN) during the training and testing stages, • the WDN model simulator implements an efficient and accurate method for generating reliable training and testing I/O data without the need for the conventional iterative solution methods for finding the zeros of the non-linear system of hydraulic equations, • the ANN model simulator implements a unique method for transforming the various groups of hydraulic network model variables and parameters by means of linear and nonlinear fuzzy I/O membership functions (transformations), • the representational framework is general and can be easily used to efficiently and systematically represent arbitrarily complex WDNs of any given topology. The WDN-ANN Chapter 1. INTRODUCTION 8 (topological) mapping methods together with the hydraulic network decomposition technique developed in the remaining chapters of this thesis can be easily used as the basis for modeling and solving a number of practical real-world problems associated with the design and operation of large-scale WDNs (such as optimal hydraulic network design, detection of damaged and leaky lines, and perhaps most importantly, on-line monitor and control (flow and pressure control), etc.), • the hybrid model represents a unique method for hydraulic network analysis and model calibration with limited data. This problem is equivalent to the general problem of finding the zeros of an underdetermined system of non-linear (hydraulic) equations with more unknowns than equations. 1.3 Organization The remainder of this dissertation is divided into five chapters. Chapter 2 formulates the basic problems of hydraulic network analysis and model calibration, discusses the weaknesses and strengths of some of the conventional methods used in practice, and provides an overview of the algorithm used by the W D N model simulator to generate training and testing data. Chapter 3 investigates the capabilities and limitations of a number of existing A N N architectures and identifies the basic building block components of ANNs while emphasizing a hierarchical framework for the specification of their topology and structure. The objectives of Chapters 4 and 5 are to illustrate the feasibility and versatility of multilayered feedforward ANNs trained by error backpropagation for hydraulic network analysis Chapter 1. INTRODUCTION 9 and model calibration. Chapter 4 outlines the details of the error backpropagation algorithm and develops a number of schemes for interfacing the I / O to/from the W D N and A N N simulators during training and testing stages. In addition, two methods are suggested for mapping arbitrarily complex hydraulic networks to their equivalent multilayered feedforward A N N representation. Chapter 5 outlines the details and the key results of a relatively comprehensive set of numerical experiments carried out to assess the convergence and performance behavior of multilayered feedforward A N N s for hydraulic network analysis and model calibration under typical real-world operating conditions. Finally, Chapter 6 concludes this thesis by summarizing the main results and suggests a number of possible extensions for future research. Chapter 2 HYDRAULIC N E T W O R K ANALYSIS A N DMODEL CALIBRATION 2.1 Introduction Computation of steady-state flowrates and the pressure distribution i n a hydraulic network of a given topology, pipe carrying capacities and nodal demand discharges plays an essential role in the design, analysis, and operation of water and other fluid distribution systems. When accurate estimates of the hydraulic network model parameters (namely, pipe resistances and demand discharges) are available, the pipe segment flowrates and nodal energy heads can easily be calculated by means of a number of efficient algorithms that iteratively solve the basic nonlinear system of hydraulic equations. For aging networks, however, reliable estimates of model parameters are rarely available, since the nodal demands are i n a continual state of change and the various pipe segment carrying capacities gradually change with time. The purpose of this chapter is to formulate the basic problems of hydraulic network analysis and mo del calibration, discuss the strengths and weaknesses of some of the existing solution procedures, and provide an overview of the algorithm used by the hydraulic network simulator used during the experiments. 10 Chapter 2. HYDRAULIC 2.2 NETWORK ANALYSIS AND MODEL CALIBRATION 11 The Steady-State Hydraulic Network Analysis Problem During the operation of water distribution systems, the goal of steady-state hydraulic network analysis is to ensure that adequate minimum pressure is maintained at all locations while meeting the flow requirements at demand points in the network. Steady-state network analysis requires the formulation and solution of a system of non-linear equations which are expressed in terms of: 1. the unknownflowratesin the pipes segments (element or flow equations), 2. the unknown energy heads at junctions (node or head equations), 3. the unknown corrective flowrates in the pipe segments of non-overlapping closed loops (loop equations). The various equations are based on the application of the two fundamental physical laws: the conservation of mass and the conservation of energy written at the nodes and around the pipe segments of a loop respectively. In analogy with electrical networks, these laws are essentially the same as Kirchhoff's current and voltage laws. The solution of the system of equations requires that, at at-least one reference point in the network, the pressure be some known function of the external flow at that point. However, the external flow at nodes of this type (such as the source nodes) must now be known before the energy head can be calculated and hence the total number of equations and unknowns remains the same. It has been shown by Warga [Warga54] that if the energy head change along each pipe segment is a known function of the flow in that segment, under certain conditions, the resulting system of equations has a Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 12 [2]E 2 Q.2 191 Qi mi Figure 2.1: Schematic of a 3-Node hydraulic network with s = 3 segments, r = 1 source node, and k = 2 internal sink nodes. unique solutiontWargaS^1. In contrast to electrical networks, theflow-pressurerelationships in hydraulic networks are always non-linear. 2.2.1 Matrix Formulation of the Basic Equations A WDN may be best visualized as an interconnected system of physical elements, the pipe segments, joined at a specified number of junctions and mapped to a two dimensional Cartesian grid. To illustrate, consider a generic hydraulic network with s pipe segments, r source nodes (where the energy heads are known functions of externalflows)(reservoirs, tanks, etc.), and k internal sink nodes. Figure 2.1 is the schematic of a simple 3-Node network with Given that theflowratesin every pipe segment i can be expressed as some, known function of the energy head loss between the two .end-points of a pipe segment, i.e., qi = hi(E i — En), then, for all segments, the system of equations has a unique solution for the nodal energy heads under steady-state flows. The function hi has the following properties: (1) hi(x) — —hi(—x), (2) hi(x) is continuous for all x, (3) for all segments t, hi(x) is non-decreasing as x increases, and (4) there exists a path between any two nodes in the hydraulic network along which every pipe segment has an hi(x) that increases as x increases[Shamir68]. 1 a Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 13 AE; Figure 2.2: Schematic of generic pipe segment i with diameter d,-, length /,-, pipe resistance Ri, loss factor AT,- (not shown), inflow g ,-, and outflow (ft,- Note that = q { = g;,,- if there is no leakage from the segment. The energy loss in the direction of flow is designated by AEi. 0 a 5 = 3 segments, r = 1 source node, and k = 2 internal sink nodes. Pumps and other hydraulic devices can be easily included in this formulation by locating appropriate nodes at the inflow and outflow sides for the various devices. A critical step in the analysis of WDNs is the determination of the behavior of the individual pipe segments under typical operating conditions. Figure 2.2 is a schematic of a generic hydraulic element that connects nodes j — a and j — b. The energy heads at the end nodes of the segments, ai and bi, are denoted by E i and E\,i respectively. The total energy head loss a for the element is given by: AE{ — E i - En (2.1) a For a generic network with s segments, r source nodes, and k sink nodes, the known quantities or parameters of the conventional hydraulic network model may be summarized as follows: , Chapter 2. HYDRAULIC 14 NETWORK ANALYSIS AND MODEL CALIBRATION Vector Notation Size Dimension State Variable Name Flowrates in pipe segments Energy heads at sink nodes External Flowrates at source nodes Parameter Name = [qi,---, Qs) Efc = [Eu-- ,EkV Qr - [Qk+i, • • • , Q k + r ) s T Pipe Resistances/Loss Factors Rj = [Ri,-- •, R ] Demand discharges at the sink nodes Qk = [Qi,-- •,Q ] E = [Ek+i, ••,E r] Energy heads at the source nodes T 1 3 T k r k+ r k r s k r T L T T T? L T 3 L Table 2.1: Summary of WDN state variables and parameters. Other design parameters such as pipe lengths l = [Zx, • • •, / S ] T and diameters d = [d1? • • •, d ] are not included in this table. T s s s • Corresponding to the s pipe segments (or elements) there are 1. s pipe diameters, 2. s pipe lengths, 3. s pipe resistances or loss factors The values of pipe lengths and diameters are design parameters and are known for a network of given topology. • k predicted demand discharges at the sink nodes, • r energy heads as functions of external flow at source nodes, . The unknown quantities or state variables in the analysis are: • k energy heads at the interior sink nodes: • sflowratesin the pipe segments: Chapter 2. HYDRAULIC NETWORK 15 ANALYSIS AND MODEL CALIBRATION • r externalflowratesat source nodes Table 2.1 provides a summary of the vector notation used to represent the various state variables and parameters. The application of the law of conservation of mass at each node in the network results in the continuity equation which requires that the algebraic sum offlowsat a node must be equal to zero. For any interior sink node j in the network one may write an equation in the form: s £hi<n t=i = Qj 1 (2-2) where g,- is the averageflowratein the segment, Qj is the demand discharge at the node, and +1 ifflowenters node j — 1 ifflowemanates from node j 0 otherwise The flow equations relating the volumetricflowrateto the energy head loss, AEi, along the length of pipe segment i may be written as: R iqi = AEi = E - E , ai u for i=l, • • • , s (2.3) Each segment i connects nodes a and b and the direction offlowis assumed to be from a to b, if a > b, where a and b are positive integers used to enumerate the end nodes. With this assumption, the computed value offlowratein the segment is positive if the actualflowdirection is also from a to b and negative otherwise. After a distribution system has been skeletonized, the various nodes may be numbered in any order with the exception that all. the source nodes must be numbered last and the direction offlowin the various pipe segments is assumed to be from the higher numbered node of the segment to the lower numbered node, as shown in Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 16 the schematic of the 3-node network. A negative value of the flowrate computed for a segment indicates that the actual direction of flow in the segment is opposite to the assumed direction (i.e., from the lower to the higher numbered node). The flow at sink nodes can be either zero or have a negative value (i.e. fluid is discharged from the node). The pipe resistances, denoted by .ft,, may be expressed either in terms of flow or energy head losses. There are a number of formulas in use today ranging from Colebrook and White's formula to empirical formulae such as Hazen-Williams or Manning formulas for the determination of pipe resistances[Jeppson77]'. The exponential or power formula is widely used in practice to express .ft, and has the form: Ri = K^F'- (2.4) 1 where Ri > 0, Ki is a constant depending on pipe material/geometry and fluid properties, and 1.8 < n,- < 2.0 is the range of values commonly used for n,-. If estimates for the g,'s are available for all i then the values of iZ,'s can be easily calculated and a linear system of equations results which can then be solved to determine the pipe flow rates. These new values of pipe flows may then be used to update the values of .R,'s and the process repeated until the difference between the values of two consecutive iterates converge to within a prescribed tolerance. Once the flows are known, the energy heads at the nodes may be computed from the known head-flow relationships in each pipe segment . Rearranging Equation 2.3, allows the calculation of the 2 It is noteworthy that based on rigid water column theory, the fluid in a pipe may be treated as an incompressible mass, and the equation of motion for each pipe segment in the network is given by a first order non-linear differential equation[Onizuka86] 2 AE, = —q + K \q \ - qi dig ni i i i 1 (2.5) where is the inertia of pipe segment i with length U and cross-sectional area aj. At equilibrium, <j; = 0 in all pipe segments and AEi = #i|gr;| * g;. n -1 Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 17 segment flowrate i n terms of the energy head loss between its two nodes. = CiAEi where C, is often referred to as the conductance of the segment. If flow is assumed to be positive when going from node a to node 6, then Ea{ is greater than E^. A flowrate at each of the segment end nodes can also be defined (see Figure 2.2). Assuming that a nodal discharge will be positive when going into the node, and negative otherwise, two equations can be written for each pipe segment, qai = CiAEi = d(Ea - Eb) and qbi = -dAEi = -d(Ea - Eh) Qai — qbi if there is no leakage from the segment. In matrix notation, the above equations may be expressed as: Qai = d qbi +1 -1 Ei -1 +1 Ebi a where Qai Qbi . +1 -1 _ + 1 1 , and Eai Ebi are the segment's nodal discharge vector, characteristic matrix, and nodal energy head vector, respectively. To analyze the entire network, the interactions among the various segments must be taken into account by considering the equilibrium that exists at any given node i n the network (including any external inflow or discharge) based on continuity. A compact matrix formulation Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 18 is presented below (similar to that of [Nielsen89]) that makes use of an s x (k + r) incidence matrix, B/ constructed by concatenating an s x r matrix B to the right of an s x k matrix r Bfc (corresponding to the r source and k interior sink nodes respectively). The elements of B/, i.e., bji, have already been introduced in Equation 2.2 and can be either +1, -1, or 0. For the 3-node network depicted in Figure 2.1, the incidence matrix is: +1 B/ = -1 0 0 +1 -1 +1 o -1 where +1 B = k -1 0 , and, B = .0 4-1 +1 -1 r -1 0 The various unknown quantities may be summarized in vector notation as follows: z = I E (2.6) f c Qr with 9i Qk+l and, Q = r Qs Qk+r E k The matrix formulation of the continuity equations written at the k sink nodes of the network may be written as: (2.7) Similarly, at the source nodes, one may write: Bjq s +Q =0 r (2.8) Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 19 The energy head losses may be expressed as: k+r AEi = -J2bji Ej (2.9) so that the vector AEr AES = AE, satisfies A E S = -BfcEfc - B r E r (2.10) The flow Equation 2.3 can hence be written R q + BfcEjt = - B r E r s (2.11) s where R (z) = diag[Ri, • • •, Ra] and depends on z = [q , Ej,,Qr] T- The basic system of equas s tions for the determination of z is: R,(z) Bjt 0 q BrEr s B[ 0 0 E* Bj 0 1 Q + -Qk ~° ( 2- 12) 0 r The above equation may simply be written in the following general form F(z) = 0 Note that, for the r source or similar special nodes where the relationship between the energy head and externalflowis known, there are only (k + r) — 1 independent continuity equations since there is an additional mass-balance equation that may be written for the hydraulic network requiring that the sum of external inflows and external outflows be equal, i.e., k+r k j=k+i j=i Chapter 2. HYDRAULIC 2.3 NETWORK ANALYSIS AND MODEL CALIBRATION 20 Iterative Solution Methods The previous section has formulated the basic system of non-linear equations needed for steadystate hydraulic network analysis. The purpose of this section is to briefly summarize the most popular iterative solution techniques currently used in practice. For a network with Nh = k + r nodes, Sh = s pipe segments, and Lh = I non-overlapping closed loops, it can be shown by induction that: S = L +lN -l) h h h (2.13) The implication of this simple result from graph theory is that it is no longer necessary to directly solve the basic system of Sh, + (Nh — 1) equations and a reduction in the total number of equations is possible. A number of standard techniques may be used to reduce the total number of equations from the number of pipe segments, Sh, to the combined number of sink and source nodes minus one (Nh — 1), or to the number of non-overlapping closed loops Lh (see Jeppson [Jeppson77]). Several algorithms are in use today for solving the simultaneous non-linear system of equations iteratively. The basic idea is to first linearize the non-linear system of equations and then iteratively solve the resulting linear system until no further improvement on the linear approximation is possible. Wood and Rayes[Wood81] identify five basic classes of algorithms commonly used for pipe network analysis and provide an interesting discussion of the relative accuracy and reliability of the various techniques. The various algorithms are classified based on whether the equations are expressed in terms of the unknown flowrates in the pipe segments or the unknown energy heads at the nodes. One of these procedures is based on solving the Sh Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 21 pipe equations, two are based on solving the Lh loop equations, and two are based on solving the Nh — 1 node equations[Karney84]. The remainder of this section briefly summarizes the strengths and weaknesses of some of the existing analysis methods and outlines the characteristics of the adopted solution technique. 2.3.1 Solution of the Pipe Flow Equations Wood and Charles[Wood72] presented a new method for hydraulic network analysis based on a system of s equations with pipe segment flowrates as the unknowns. The total system of equations is made-up of the (Nh — 1) continuity equations at the nodes and the Lh nonlinear loop equations. The system of equations is solved by the linearization of the non-linear relationship between the energy head loss and flowrate in each pipe segment resulting in Lh linear equations. Using the linear theory method 3 , the energy head loss in a pipe segment is expressed as before: R iqi in which Ri = Ki\qi\ '~ . n x = AEi With an initial estimate for theflowratein each pipe segment i, the value of Ri can be calculated and hence a linear system of equations results which can then be solved for theflowratesin all the pipe segments in the network. Once these new pipeflowratesare computed, they can be used to re-calculate Ri (for all i) and the process is repeated until consecutive estimates offlowratesconverge to the desired tolerance. Once the pipeflowsare determined, the energy heads at the various nodes are then calculated using the known relationship between the energy head loss andflowrate.Note that initial estimate 3 L T henceforth. Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 22 of pipeflowrates,g,-, can be easily computed by simply estimating pipe resistances, Ri, with pipe loss factors, K,;[Karney84] whose values are independent of q,s. In general, this method is relatively reliable with good convergence and stability properties and does not require the initial estimates of theflowratesto be close to the final solution. i 2.3.2 Solution of the Node Equations There are two methods in existence today that formulate the basic equations with energy heads as the unknowns [Wood81]: • The single node adjustment algorithm pioneered by Hardy Cross[Cross36] and • the simultaneous node adjustment algorithm initially proposed by Shamir and Howard [Shamir68]. With initial assignment of energy heads to every node in the hydraulic network, the loop equations will be trivially satisfied and the number of equations can be reduced from Sh, to (iV/j — 1). In addition, with known values of the nodal energy heads, the pipe segment flowrates can be computed from the known relationship between energy head loss andflowratein the pipe segment. However, the calculation of theflowratesin this way will generally not satisfy the continuity equations. Corrections to the initial values of the energy heads may be applied so that the imbalance in the continuity equation is made arbitrarily small after a number of iterations. Iterations are required since the corrections to the values of the energy heads are based on a linearized version of the continuity equation expressed in terms of energy head losses or gains in the pipe segments as the unknowns. Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 23 The two techniques that utilize this method are identical in their formulation of the equations but differ in the way the energy head corrections are calculated before being applied to the each node. Hardy Cross [Cross36] pioneered this basic method of balancing the flows by suggesting that each correction be calculated on a node-by-node basis ignoring its interaction with the rest of the network. At each node, a single non-linear equation is expressed in terms of the unknown energy head and the zero of the system of equations is then approximated based on the Newton's method. Shamir and Howard [Shamir68] proposed an improvement to the traditional Hardy Cross method by reasoning that the convergence rate could be improved if the energy head corrections to be applied to each node took into consideration its interaction with all other nodes in the hydraulic network. This formulation results in an (Nh — 1) by (Nh — 1) system of non-linear equations which may then be solved simultaneously using the Newton-Raphson4 method. Both these methods have been observed to experience significant convergence problems since they require "close-enough" estimates of values for the unknown energy heads to initiate the solution procedure[Wood81]. The performance of both these methods may be improved by initiating the solution with sets of values for energy heads which are "closer" to the solution. However, as pointed out by Wood[Wood81], neither method is reliable since they do not provide any systematic guidelines for improving the solution from one iteration to the next. 4 NR henceforth. Chapter 2. HYDRAULIC 2.3.3 NETWORK ANALYSIS AND MODEL CALIBRATION 24 Solution of the Loop Equations There are two pipe network analysis methods in existence today that formulate the hydraulic network equations with pipe correctiveflowratesas the unknowns: • the single path adjustment method originally developed by Hardy Cross, • the simultaneous path adjustment method devised by Epp and Fowler[Epp70]. To assign initial values offlowratesto the various pipe segments in the hydraulic network such that the continuity equations are satisfied, the number of equations to be solved may be further decreased to Lh- With the initial assignment offlows,the energy head loss in each segment can be calculated using the known energy head-flow relationship. However, the sum of the energy head losses around each closed loop, as required by the law of conservation of energy, will generally be different from zero. Hence corrections must be applied to the initial assignments offlowratesin each segment of the various loops so as to reduce this difference to zero. As with the methods for the solution of the node equations, the corrections will generally not be exact because they are calculated from a linearized version of the loop equations and hence a number of iterations will be necessary in order to achieve convergence. The two solution methods mentioned above use different approaches for calculating the Lh loopflowcorrections but are identical in the way in which they formulate the basic equations. The method of Hardy Cross[Cross36] for solving the loop equations essentially solves a single non-linear equation based on Newton's method, as was done with the node equations. An unknown loop flowrate is applied in order to correct the imbalance in the loop equation ignoring the interaction between the segment and the other components of the distribution Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 25 system. This method has been found to have significant convergence problems when applied to medium to large scale water distribution systems. The poor performance of the Hardy Cross method, mainly due to ignoring the interactions between system components, was overcome by the simultaneous path adjustment method of Epp and Fowler[Epp70]. In this method, the pipe flow corrections in each loop are treated as a vector of unknowns and are computed by solving a system of equations that takes into account the interaction with the adjacent loops. The NR method is used to linearize the system of Lh non-linear equations by evaluation of their derivatives at the initial values of the flowrates. Convergence with this method has been found to be very good when good initial estimates of the flowrates in the various pipe segments are supplied. 2.3.4 The Newton-Raphson Method Some of the above methods are based on the Newton extrapolation rule to eliminate non-linear terms from the Taylor series expansion of the equations. Both the LT and NR methods start with an initial vector, ZQ and compute a series of iterates zi, Z 2 , • • •, that will hopefully converge to the solution z* such that F(z*) = 0 Newton's method is a root finding method that requires the evaluation of the function F(z) and its derivative, F'(z), at successive values of the iterate. Geometrically, the NR method consists of extending the tangent line at the t th estimate of the z*, i.e., z t , until it crosses the z-axis at z +\ as shown in Figure 2.3. z +i is the (t + l) th estimate of z*. t t Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL Figure 2.3: Illustration of zero finding with Newton's method. tangent drawn at F(zt) crosses the z axis. Zt+i CALIBRATION 26 is the point at which the Algebraically, the method simply uses the first two linear terms of the Taylor series expansion of the function, F(z) in the neighborhood £t = (zt+i — z ) of the t th estimate of z*, i.e., t Zt, to compute the next iterate. In one dimension, F(zt+1) « F(zt) + (zt+1 - zt) F'(zt) or F(zt + Q « F(zt) + (t F'(zt) If F(zt+i) is equal to zero, then zt+i must be the root and the right hand side of the above equation constitutes an equation for the root Zt+i. 0 = F(zt) + (zt+1 - zt) F'(zt) Chapter 2. HYDRAULIC 27 NETWORK ANALYSIS AND, MODEL CALIBRATION Solving for zt+\ yields an iterative formula for finding the zeros of the function F(z): Z t + 1 = ~ Ffr) Z t = Z t (2-14) ~ '^)~ ^) F lF Substituting £t = zt+i — zt and then solving for £t results in: V - - F'(z vii,t) \ (2.15) zt+i = zt + (t (2.16) which can be used to write The NR formula in vector notation may be written as: zt+i = zt- J where the vectors z , z i, t t+ respectively in 2.14. J F - 1 and F ' (2.17) replace the single variables zt, zt+i and the function F, _ 1 F F(z ) t which is the inverse of the Jacobian matrix5(since division by a matrix is undefined) replaces F'(zt) -1 in the NR formula for a single variable. Note, however, that when the vector form of the NR formula is used to iteratively find the roots of the non-linear system of equations, this inverse is never obtained and premultiplied by F as Equation 2.17 suggests. Instead, the solution vector of the linear system, JpCt = F(z ), i.e., Ct = J F t _ 1 F(Z ), 4 is added to the current (i.e., t ) estimate of the unknowns. The NR formula hence obtains a ih The Jacobian matrix is a function derived from a set of n = (k + r + s) variables the value of which at any point is the n x n determinant of the Jacobian matrix of partial derivatives of these equations evaluated at that point. If 5 = ••-,*•), •••,/»(*!,• ••,*«)] . then ' «LL 0*1 ... 21*. Szn Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 28 solution to a non-linear system of equations by iteratively solving a linear system of equations. The NR formula can hence be written in the form: z ( + i = z + Ct (2.18) t The LT and the NR methods reportedly exhibit excellent convergence properties [Wood81]. The NR method is known to have quadratic convergence6 given that the initial estimate of the root vector, z -o, is "close enough" to the true zeros, z*, of the system of equations [Burden85]. t Although the LT method oscillates when the values of the successive iterates get close to the solution this method does not require "close" initialization of the unknowns and often converges in a relatively small number of iterations [Wood81] and hence can be used to provide excellent starting values for the NR iteration[Nielsen89]. In the context of the notation introduced in the previous section the LT method may be obtained by simply replacing Rs(z) with Rs(zt) and then solving for z +i. As demonstrated t and explained by [Nielsen89], this method oscillates when z is close to the root z*. t When solving the network equations involving the flowrates in each pipe segment or the nodal energy heads as the unknowns, the vector z is replaced with vectors q and Efc respectively. s Similarly, when solving the equations resulting from the conservation of energy around nonoverlapping closed loops in the network, z must be replaced with Aq^ (£ — Lh)\ the corrective flowrates in the various segments of all the loops. i.e., the error associated with the next estimate z +i is proportional to the square of the error associated with the current estimate z of z*. 6 t t Chapter 2. HYDRAULIC 2.4 NETWORK ANALYSIS AND MODEL CALIBRATION 29 The Hydraulic Network Model Calibration Problem A very important phase in the process of simulating the operational behavior of an existing WDN is the comparison and matching of the model computed values of the nodal pressures and pipe flowrates with field measurements of pressures and flowrates at a selected number of points in the distribution system. If values of model parameters, such as the pipe resistances (loss-factors) and demand requirements at the discharge nodes, are known with sufficient accuracy (as is the case with newly designed systems), and the computer simulated network is an accurate representation of the actual distribution system, then the model calculated values of the unknowns (state variables) should closely match the observed values. However, values of the various model parameters (namely, pipe resistances/loss-factors and nodal demands) are subject to change as the network ages and deteriorates or is damaged (with possible leakage from older pipes) and/or the demand patterns at the various discharge nodes throughout the network changes with changing land-use and population density of the area served by the delivery system. The sources of inaccuracies in WDN model computed values fall under the following categories [Walski83a]: • incorrect estimates of demand at sink nodes, • incorrect estimates of the pipe carrying capacities, • incorrect estimates of the energy heads available (or produced) at the various control nodes (such as nodes representing pumps, pressure regulating valves, primary/secondary storage reservoirs, etc.), Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 30 • inadequate representation of the distribution system. In a report entitled "Water Distribution Research and Applied Development needs", the AWWA7[AWWA74] Research Committee on Distribution Systems remark that " . . . the major source of error in simulation of contemporary performance will be the assumed loading distributions and their variations," while Eggener and Polkowski [Eggener76] state that " . . . the weakest piece of input information is not the assumed loading condition, but the pipe friction factor." It seems that the importance of each factor varies from one network to the next and largely depends on the particular characteristics of the hydraulic network[Ormsbee86a]. Shamir and Howard [Shamir77] define model calibration as the process which " . . . consists of determining the physical and operational characteristics of an existing system and determining the data that when input to the computer model will yield realistic results." Similarly, the AWWA Research Committee on Distribution Systems [AWWA74] describes the process of model calibration and verification as: "System simulation is considered verified during preliminary analysis for design when calculated pressures are satisfactorily close to observed field gage readings for given field source send-out and storage conditions. If simulation is not satisfactory, the possibility of local aberrations, such as open boundary valves, is investigated. In the absence of other expected causative factors, the assumed local arterial network loads are adjusted until computed and observed field pressures are within reasonable agreement for various levels of extremes of demand, pumping, and storage." Walski [Walski83a] provides a more precise definition: "Calibration of a water distribution model is a two step process consisting of: (1) comparison of pressures and flows predicted with observed pressures and flows for known operating conditions (i.e., pump operation, tank levels, pressure reducing valve settings); and (2) adjustment of the input data for the model to improve agreement between observed and predicted values. A model is considered calibrated 7 American Water Works Association Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 31 for a set of operating conditions and water uses if it can predict flows and pressures with reasonable agreement. Calibration at one set of operating conditions and water use does not necessarily imply calibration in general, although confidence in the accuracy of results from the model should increase with an increase in the range of conditions for which the model is calibrated." The surrogate objective of the calibration process is to match the observed values of the state variables with those generated by the hydraulic network model. In practice, with a given set or sets of measurements, the engineer attempts to match the model computed values to the observed values using ad-hoc trial-and-error techniques to vary the parameters. The calibration process hence can be an extremely tedious task with no guarantee of obtaining a reasonable match. The task becomes more complicated as the number of parameters that must be estimated increases. As mentioned previously, the carrying capacities of pipe segments slowly change over time while the nodal demand discharges undergo continual change. Patterns vary not only from one distribution system to the next, but can also vary within a given system. In addition, energy losses that occur through pressure regulating devices such as valves often increase with time, or they may be closed when believed to be open[Lansey91]. Furthermore, reliable values of nodal demands are usually not known and may have to be estimated at any instant or over a period of time. The process is further complicated due to the lack of a sufficient amount of field data required to provide estimates of the various model parameters with sufficient precision. As a distribution system expands with time and new pipes and other hydraulic devices are added to meet changing demand and supply patterns, additional data are often necessary to calibrate the hydraulic model. If field measurement data are collected only for a single load pattern, compensating errors may result in situations where more than one set of parameters values are applicable and result in a deceptively good match. In summary, a Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 32 calibration method should be capable of analyzing multiple demand patterns while considering all the known and unknown model parameters. 2.4.1 Related Research In recent years a number of explicit and implicit algorithms have been proposed for hydraulic network model calibration based on the use of analytical, simulation, or optimization techniques. Walski utilizes a number of analytic formulas for adjusting model parameters under single load patterns [Walski83a]. : Simulation methods have been used when the number of unknown parameters is less than or equal to the number of the original unknowns or state variables. It is possible to rework the system of equations with the parameters as the unknowns which can then be solved iteratively (see [Donachie74], [Wagner88a], [Rahal80], [Gofman81], [Ormsbee86a] [Ormsbee89], and [Wagner88b]). Unfortunately, these techniques are generally restricted to only one load pattern and can not be used in situations where there are more unknown parameters than state variables. There are a number of mathematical optimization techniques in use for hydraulic network model calibration (see [Shamir74], [Coulbeck85], [Bargiela89], [Ormsbee86a], [Ormsbee89]). Shamir [Shamir74] proposed an algorithm for model calibration that could be used for a single loading condition. Coulbeck [Coulbeck85] later linearized the network equations and used lumped parameters in an optimization procedure, also for a single loading condition. Ormsbee [Ormsbee89] combined a simulation model and an extended version of the complex method of Box [Box65] as the optimization routine for both single and extended period demand patterns. Lansey, et al. [Lansey91] presents a similar formulation but instead of a direct search method, Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 33 a gradient-based optimization algorithm is used to consider both independent and extended period loading conditions. Both these methods, however, rely on the solution of the non-linear system of equations (using the hydraulic network model which is being calibrated) to provide the optimization model with values of the state variables. 2.4.2 Hydraulic Network Model Calibration by Optimization Hydraulic network model calibration may be formulated as a non-linear constrained mathematical programming problem with an objective function and an associated set of linear and non-linear constraints (see [Ormsbee89] and [Lansey91] for full details of two alternate formulations). The objective of the calibration problem is to minimize some function of the difference between the measured (observed) and the WDN model computed (predicted) values of the pipe segmentflowratesand nodal energy heads at various points in the system under different loading conditions. As before, the basic decision variables are: 1. the k nodal demands at the sink nodes, 2. the s pipe head loss factors, 3. the r energy heads at the source or other special control nodes. The objective function may be expressed as: NL minimize : where E^ mp (2.1.9) and q£ mv are the model computed values of the nodal energy heads and pipe segmentflowratesand E^\ s and q°% s are the observed or measured values under loading condition Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 34 £. NL is the number of loading conditions (for example, NL = 24 for each hour of operation during the day), NN and NS are the number of nodes and pipe segments at which measurements are being made, and d is a positive integer. Note that the objective function is the sum of percentage errors raised to the power of d, for all measurements over all loading conditions. Ormsbee [Ormsbee89] formulates the above objective function with d = 1 subject to three different types of constraints which are briefly described below. • Implicit System Constraints These constraints include (1) the conservation of mass at the nodes, (2) the network's conservation of mass, and (3) the conservation of energy around a closed loop. • Implicit Bound Constraints This group of constraints are used to specify the acceptable bound of the measurement values of nodal energy heads and pipe segment flowrates related to the accuracy of either the pressure gage or theflowmeter used to obtain the measurement values. • Explicit Bound Constraints This group of constraints may be used to specify upper and lower bounds for the (k + r + s) decision variables of the hydraulic network. 2.5 Conclusions Over the past decade, the ability to model and simulate ever larger and more complex WDNs has improved significantly mainly due to the wide-spread availability of relatively cheap and powerful personal computers. There are currently a number of commercial WDN models that are used in the industry on a routine basis that allow the engineer to model, simulate, evaluate, Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 35 and optimize certain aspects of the overall behavior and performance of WDNs. A review of the methods advocated for hydraulic network analysis and model calibration revealed that all of these techniques directly or indirectly incorporate the continuity and energy equations in the formulation and usually require the steady-state solution of the system of hydraulic equations as part of the overall solution. There are a number of disadvantages associated with strict reliance on the results obtained from steady-state network analysis based on the system of equations formulated in this chapter. Even with the most reliable methods, thereis no absolute assurance of convergence([Jeppson77], [Wood81], [Nielsen89]). In addition, in most situations, the model is only a simplification of the real-world hydraulic network and can not realistically model every detailed aspect of the distribution system. Finally, under a number of circumstances (in particular when the network is aging and deteriorating with as much as 50% loss of water due to leakage[Jowitt90]) WDNs model's estimates of flows and pressures have often been reported to be unreliable despite the calibration efforts[Ayala91]. For the purpose of generating data needed to train artificial neural nets for hydraulic network analysis and model calibration, the hydraulic equations were formulated with the s pipe flowrates and the (k + r) nodal external flowrates as the unknowns. The advantage of this approach is that the resulting system of equations can be solved directly for the unknowns (given estimates of the range of the (k + r) nodal energy heads and s loss factors) without the need to use the iterative solution methods outlined in the previous sections. This approach forms the basis for data generation by the hydraulic network simulator outlined in Chapter 4. The representation of the hydraulic network analysis and model calibration by means of Chapter 2. HYDRAULIC NETWORK ANALYSIS AND MODEL CALIBRATION 36 ANNs require not only the identification of the building block components of ANNs, but also the specification of the interactions between the two classes of network models. These topics are the subject of the next two chapters. Chapter 3 T H E BUILDING B L O C K C O M P O N E N T S OF ANNS 3.1 Introduction Artificial neural networks have recently emerged as practical problem-solving tools whose po1 tential has only begun to be exploited in engineering. Inspired by the structure of the human brain, neural networks have been successfully applied to model and solve a number of problems that are either intractable or are cumbersome to solve using traditional solution techniques on the conventional computer. The conventional von Neumann machine is the realization of a serial model of computation where threads of instructions are executed sequentially and consists of four basic components: the central processing unit (CPU), internal memory, an external processor bus, and an I/O system. Figure 3.4 is a simplified schematic of the basic components of a von Neumann computer. A C P U is in turn made up of a controller, an arithmetic and logic unit (ALU), a set of internal (instruction and data registers), and an internal processor bus for data movement. Once an instruction is loaded in the program counter (a special register), the CPU reads the data required by that instruction from the appropriate memory locations, executes the elementary operations on the data, and finally stores any intermediate or final results back in * A N N s henceforth. 37 Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 38 address bus mi Tf e c o M E M 0 R Y e I/O C P U TJ Z I/O data bus Figure 3.4: The basic components of the von Neumann computer its internal registers. The final result of running a program may be in turn instructions and/or data communicated to the various external I/O devices. Data read from and instructions/data written to the external memory and other devices are carried out via the I/O system. The point being emphasized here is not the details of the internal structure of the C P U , but rather to stress that the conventional programmable computer is based on a serial model of computation where the task of running every program, independent of the degree of sophistication or the level of analysis that the program may offer, is systematically reduced to that of repetitively executing a large number of well-defined elementary arithmetic and logic operations. The various operations are executed one at a time, at successive steps of the clock. In contrast, an A N N is a parallel, distributed information processing structure that consists of many processing elements , each with its own local memory and the ability to carry out 2 specialized information processing operations. The processing elements are interconnected by 2 PEs henceforth. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 39 Hidden Layer Figure 3.5: Example of A N N with 3 input units and 10 processing elements in the hidden and output layers. The Nn = 14 PEs are labeled with integers 1, • • •, 14. means of unidirectional signal transmission channels or connections (each represented by a directed line segment or an arc in a directed graph) configured such that the overall network structure effectively models the specific details of the problem being considered. Each P E in such a structure may receive input signals from other PEs in the network or the external environment and has a single output connection that branches out or "fans out" (with the same signal data type) into as many connections as desired (see Figure 3.5). The information processing operations that go on within each P E are local (i.e., they depend only on the values of the impinging input signals and values of the weight parameters stored in the local memory of the PE) and may be arbitrarily defined. Note, however, that a processing element in an A N N is simpler and performs a smaller number of operations (for example, a summation function may be performed on the inputs to the P E followed by an update of the values of the weights and other parameters associated with each class of inputs) than the von Neumann machine. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 40 The strength of the serial model of computation is that it allows the implementation of sophisticated algorithms from the combination of a large number of elementary operations. Programming languages such as C and Prolog, for example, provide the engineer with a relatively small number of well-defined high-level operators and constructs for composing complex programs which are eventually decomposed by the compiler or the interpreter into a large number of basic elementary operations before they are finally executed by the CPU one operation at a time. Independent of the language used, programming the conventional von Neumann computer basically involves the implementation of some relatively well-understood algorithm (and the specification of the corresponding data structures) and ultimately the specification of the exact number and sequence of operations that must be performed on the program input to produce the desired output. But how are algorithms specified to begin with? A good programmer can systematically turn any given algorithm into code that is eventually reduced to the corresponding set of elementary operations by the compiler or interpreter and executed on the architecture for which the program was intended. However, independent of the level of sophistication of a given application, the engineer must initially specify, implement, and interface a number of algorithms in the context of some specific problem in significant detail before implementation can even begin. The implication of this observation is that the range of tasks that may be automated by utilizing the conventional programmable computer are limited to those tasks for which the engineer can specify an algorithm[Hecht90]. With ANNs, the task of modeling the specific details of the problem is transferred from the programmer to the program itself and the task of the programmer is to ensure that the ANN performs the specified tasks correctly. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 41 In contrast to the conventional algorithmic computing paradigm, neurocomputing more closely resembles the process by which the human brain encodes and processes information and "learns its programs": by its distribution among a large number of parallel distributed and massively interconnected neurons (processing elements). This paradigm promises to revolutionize many aspects of system design and analysis by means of powerful computing structures that allow many intractable problems to be modeled as "black-boxes" where, for example, it is not possible to represent and/or exhaustively search the solution space using the conventional paradigms. ANN architectures depart from traditional parallel processing architectures in a number of respects. For example, the various processing elements in an ANN are highly interconnected and there are more interconnections than there are processors. State-of-the-art parallel processing architectures typically have a much smaller ratio of interconnections to processing units with individual processors that are comparable in complexity to single-processor architectures. ANNs depart from this arrangement by containing a much larger number of simpler processing elements. While the performance of von Neumann machines are benchmarked by the number of instructions per second, in sequence, by a single processor; the information processing power of an ANN is often measured by the number of interconnection updates per second needed during training and activation updates during operation. 3.1.1 Research History The origins of ANNs may be traced back to the 1943 paper of McCulloch and Pitts [McCull43] in which they proposed a general theory of information processing based on networks of binary Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 42 switching or decision elements. McCulloch and Pitts showed that such networks can, in principle, carry out any desired computation, similar to a programmable von Neumann computer or its mathematical abstraction, the Turing machine[Muller90]. The main draw-back of these first neural network models was their inability to learn. The interest in understanding neuronal behavior continued with the work of Hebb [Hebb49], a Canadian psychologist, who proposed a model of learning based on a qualitative description of the process by which synaptic connections are modified in response to external stimuli. During the years 1957 and 1958, Frank Rosenblatt and colleagues successfully implemented the first character recognition ANN known as the Mark I Perceptron [Rosenb59] with 400 sensor inputs and 512 weights implemented by an 8 X 8 x 8 array of electric motor-driven potentiometers [Hecht90]. However, the analysis of Minsky and Papert [Minsky67] which correctly pointed out serious restrictions in the learning abilities of the perceptron caused a diversion of effort to other areas of Al research. Many researchers also realized that significant advances in hardware technologies were needed to allow the development and implementation of practical and affordable ANNs for solving complex real-world problems. In addition, the lack of analytical methods for the assessments of the capabilities and limitations of ANN models of computation in this period led to a number of widely publicized and unreasonable predictions about the future of brain-like computers that were to completely replace human workers. Consequently, the period from 1967 to 1982 saw very little research directly focussed on ANN architectures and learning algorithms. It wasn't until the early 1980s that ANN research became popular again. In the years 1982 and 1984 John Hopfield proposed an important class of ANN models capable of storing and Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 43 recollecting abstract association lists in fully connected network configurations ( [Hopfield82], [Hopfield84]). Around the same period, Kohonen carried out some very interesting work with two-layered associative networks and designed* an ANN to implement a content addressable memory [Kohonen84]. Other researchers, such as Grossberg et al. [Grossberg82] continued with the development of mathematical models of the brain function mainly oriented towards psychological and biological research. The 1986 publication of the classical two volume publication: Parallel Distributed Pro- cessing: Explorations in the Microstructure of Cognition, ( [Rumel86a] and [Rumel86b]) may be viewed as the precursor to the publication of a large number of ANN related papers and text books in the following years. In 1987 the International Neural Network Society (INNS) was formed. In 1988 the journal of Neural Networks was founded by INNS followed by the journal of Neural Computation in 1989, the IEEE Transactions of Neural Networks in 1990, and subsequently a number of others. 3.1.2 Why Artificial Neural Networks? As with almost all new technologies, ANNs are yet another tool in the already well-equipped engineer's "toolbox". ANN models allow exploration of machine intelligence from a dynamical system viewpoint. These dynamical systems may be modeled as stochastic processes that may be.expressed as large systems of differential or difference equations describing the local and/or global interactions of non-linear parallel processes[Kosko92a]. This notion of "intelligence" is characteristically different from the traditional artificial intelligence (Al) approach to machine intelligence where symbolic processing and tree search algorithms are the primary means of Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 44 simulating intelligent behavior. ANNs excel in "model-free" parameter estimation. As a trainable dynamical system, training I/O data sets implicitly program their time evolution[Kosko92a]. Unlike statistical parameter estimation, ANN models may be used to approximate a function without an explicit mathematical model or description of how the outputs (dependent variable) depend on the inputs (independent variables). ANNs have the ability to "learn" their "program" from numerical or linguistic sample I/O data by encoding the sampled information in a parallel-distributed structure3. In summary, the main advantages of ANNs over the traditional von Neumann model of computation, are: • ANNs are very robust and are endowed with fault tolerance due to the large number of interconnections used to encode pattern or function information. This distributed encoding makes ANNs more immune to random noise in the values of the inputs and enables them to compute the desired output even when the input pattern is, incomplete. Unlike the conventional von Neumann computer where component malfunction often results in abrupt failures, the performance of ANNs degrades gracefully with the loss of connections. • ANNs can learn new rules, features, input-output relationships, regardless of whether the various rules, features, or, relationships are known explicitly a priori. As demonstrated in Chapter 5, ANNs can be used to learn functional relationships that can not be explicitly denned by extracting the unknown relation from sample data. The parallel distributed processing framework is often viewed as a constraint network where each PE represents a hypothesis that a certain feature is present in the input pattern and the connection weights represent the constraints among the PEs or hypotheses[Rumel86b]. 3 Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 45 • ANNs can process new inputs and generalize to produce an output given effective representation of the problem under investigation and enough on-line or off-line training data. By far the largest area of activity in ANN research is that of apphcations([Muller90], [Dayhoff90], [Hecht90]). As with almost all applications of mathematical models to the solution of engineering problems, the key to the success of ANNs seems to be domain-specific knowledge pertaining to the problem under investigation. The approach adopted in investigating ANN models for the solution of the hydraulic network analysis and model calibration problems outlined in Chapter 2 was both top-down and bottom-up. The top-down approach attempted to systematically answer the following questions: • Given a hydraulic network of known geometric dimensions and topology what is the most suitable ANN architecture for the solution of that class of problem? How large should the ANN be and how should the various processing elements be interconnected? Can the problem be decomposed into a number of smaller subproblems, each of which, in turn, can be modeled and solved with different types of ANNs that can then be interfaced? • What are the traditional methods for hydraulic network analysis and model calibration and is an ANN really needed to solve these problems? How well does an ANN of a given architecture solve the various problems in comparison to existing well-understood methods? • What are the inputs and outputs to/from existing hydraulic network simulators for a Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 46 given class of distribution system problems and how should they correspond to the inputs and outputs of the ANN? Where do training data come form? The bottom-up approach attempted to provide practical answers to the following questions: • What class of ANN architectures are most suitable for hydraulic network analysis and model calibration? • For a given ANN architecture what are the advantages and disadvantages of the corresponding learning laws and training methods? • What are the requirements of efficiently representing and implementing ANNs in conventional software and how can such a simulator be best integrated with hydraulic network simulators? 3.2 The Functional Components of ANNs In this section, a generalized framework for representing the basic building block components of ANNs, namely^ the processing element (also referred to as unit, or node4) is briefly sketched. In a directed graph representation of an ANN, the various processing elements or units in an ANN may be viewed as a set of nodes along with a set of directed line segments or arcs between them[Hecht90]. • The various nodes of the graph represent the processing elements, • The arcs of the graph represent the connections between the nodes and act as instantaneous uni-directional signal-conduction paths. 4 Not to be confused with nodes in a hydraulic network Chapter 3, THE BUILDING BLOCK COMPONENTS OF ANNS 47 • Each processing element may have any number of input connections, • Each processing element may have any number of outgoing connections, however, the signals in all of these connections are assumed to be the same. In effect, each processing element has a single output connection that "fans out" into several copies to form multiple output lines each carrying the exact same signal. • Each processing element has its own local memory. • Each processing element may have both an activation and an output or transfer function which utilizes and/or alters (based on the input and error signals) the values of the various parameters stored in the local memory to update the node's level of activation and produce the processing element's output signal. • Input signals to an ANN from the outside environment arrive via connections that originate from the outside world and outputs from the network to the outside world are connections that leave the network. In light of the above definitions, it is now possible to outline a relatively general mathematical description of an ANN as a collection of PEs. Note that this description of the processing elements is independent of the manner in which they may be interconnected or may be implemented, in software or hardware. Consider an ANN with a set of N artificial neurons or processing elements labeled with n integers 1,2, • • -,N . n Figure 3.6 is a schematic of a typical processing element, j, in such a network. In general, each PE may have rrij inputs and produces one output signal Zj which Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 48 then branches or fans out and becomes the input to other PEs in the network. The signals carried by the I/O connections to/from a PE and the local memory variables may have any desired mathematical data type and are not limited to Boolean, integer, andfloatingpoint real variables. In vector notation, the various inputs to PE j in layer I (PEj) depicted in Figure 3.6 may be denoted by: The PEs in the input layer of the network often only receive a single signal from the external environment (and not from the other PEs in the hidden and output layers of the ANN). Dropping the layer superscript and setting rrij := m, the external input vector to the ANN (i.e., to layer / = 0) may be represented by either s = [si,---,sm] T or by some appropriately transformed vector Ps = [rlsl(si), • • •, M*m(-Sm)] T where /i5,(.)'s (i = l,---,m) represents some observable function of each component of the raw external input vector s,- the form and properties of which are largely dependent on the application problem under consideration (see Chapter 5 for specific examples). The various PEs in an ANN communicate by means of signals that propagate over a set of uni-directional signal conduction paths. Associated with the rrij inputs to PEj in a destination hidden or output layer are hence m0, (i = 1, • • •, mj) connection weight parameters local to the Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS' 49 Figure 3.6: The Internal Structure of the FLowTron. For PE j in hidden layer /, the input vector to the unit are the outputs of some source layer / — 1 and s - = [z l{~ x, • • •, z^ 1] 7. l PE Note that the PE may implement some possibly non-linear function of the weight parameters ^ = [Mi( il)'---^m,(^)f W When the signals impinging on a unit originate from source PEs in the input or hidden layer(s) of the ANN, the transformations applied to compute the net-input to PEj (I > 1) has the form (3.20) x'j = j(z T ,w ) + b l fi l 1 l j where fi j(.,.) is a function of both the incoming signals, z ' l - 1 (the outputs from the source layer are denoted by superscript (/ - 1) and originate from all the units in the various layers from which PE'j (in the current destination layer /) receives inputs from (i = 1, • • •, rrij)), and the Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 50 local weight parameters. Each processing element in the hidden and output layers of the ANN may also store the value of an associated constant input bj (not shown in Figure 3.6) commonly referred to as the bias in the literature. Figure 3.6 summarizes some of the notation introduced thus far by depicting a specialized PE in the first hidden layer (1 = 1) receiving signals only from the units in the input layer (/ = 0) with m ^ = T, U )iMsi) h w i=i (3.21) + b} where / J , S (i = 1, • • - ,m) are appropriate functions of the various components of the external 3 input vector implemented at the input PEs. During each time interval, each PEj transforms its net-input x j(t), to update its level of l activation, y j(t), based on the evaluation of Aj, which is some function of the net-input to the l PE and the level of activation at the previous time step, i.e., (3.22) y j(t) = A j(x j{t)Jj(t-\)) l l l This updated value of the level of activation is then used to compute an output value, Zj, for the PE which then fans out and becomes the input to the other PEs or the external environment (3.23) z j(t) = F]{y j{t),z j{t-l)) l l l Note that the exact form of the functions, /^s,(.), ftji-, •), h ji(-)i A j(., •), and FJ(.,.) often depend l l on the computational requirements and specific details of the application problem. Chapter 3. THE BUILDING BLOCK COMPONENTS 3.2.1 OF ANNS 51 A Simplified Processing Element Many existing representations of the processing element make a number of simplifying assumptions about the forms of the various functional components of the generalized PE described above. For example, it is often assumed that for the PEs in the input layer / V = \Psl(Sl),- •-,Msm(5m)] T = S = [si,---,S ] T m and, resulting in a linear model in the parameters of the ANN. In addition, the net-input is often computed by the sum of products of the inputs received from the source layer (1—1) and the corresponding weights, i.e., For the PEs receiving signals only from the input layer, m t=i The above is perhaps one of the most common expressions used for calculating the netinput to the various PEs in an ANN. Figure. 3.7 is a schematic of a simplified processing element (without the layer superscript) with 771 j ^ E ^ - ' t'=l 771j +^ E ^ " 1 (- ) 3 24 «'=0 where w j = b'j and s'0 = 1. In this form, the effective or net-input to a PE is simply l 0 the dot product of the input vector to the PE from the outputs of the PEs in the source layer Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 52 SjQ = ji s —•- jm.j — l —#- s jmj s —#- Figure 3.7: A Simplified Processing Element z ' - 1 = [ z o _ 1 »• • *. m X ] T a z n dt h e weight vector wj- = [w , • • •, w' .] , i.e., l T j0 x'j = z ' _1 jm (3.25) • w{ Note that this inner product is proportional to the projection of z/_i onto wj and hence the closer the direction of these two vectors the greater the net-input to the PE. Other simplifications to the above general architecture are based on the assumption that the activation and transfer functions do not depend on the values of these functions at the previous time step. That is, y (t) = A' (x (t),y (t-l))^ l 4(4(i)) l j j j j and zj(r) = Fj(yj(i),4(* - 1)) « f M(t)) l = y){t) In addition, it is often assumed that the various activation and transfer functions are the same for different groups of PEs. Typical examples of activation functions often used with the simplified version of the generic processing element depicted in Figure 3.7 are summarized in Table 3.2. Chapter 3. THE BUILDING BLOCK COMPONENTS Function Logistic Domain (I) [-oo,+oo] HyperTan [-00,4-00] Function: Gaussian Derivative: aj(xj) 1 aj(xj)(l x aj(l a [ - 0 0 , 4-00] Sign [ - 0 0 , 4-00] aj(xj)) Range ( 0 ) [0,4-1] [-1,4-1] a j(xj)) 2 +e -2aje ^ ^. [0,4-1] 0 {0,4-1} a [ - 0 0 , 4-00] Heaviside - 53 a'j(xj) - aj j<. j) - j(*j) a e OF ANNS. \ 1 if XJ > 0 [ 0 if XJ < 0 j 4-1 { -1 x if Xj > 0 0 if XJ < 0 {-1,4-1} Table 3.2: Examples of common activation functions Note tbat the Logistic function given by: 1 -f e -0l > >o) -a: (3.26) and the hyperbolic tangent function given by: ctj(xj(t)-xj ) e 0 _ -aj(xj(t)-xj ) e 0 aj(xj(t)-xjo) + -otj(xj(t)-xj ) e t&nh(aj(xj(t) - e 0 Xj )) 0 (3.27) are examples of continuous differentiable, monotone nondecreasing, sigmoid functions for positive scaling constant ctj > 0 governing steepness. The Gaussian (xj) aj = e - a ^ 2 (3.28) is an important class of non-monotone functions that can be used both as input transformation and activation functions (see Chapter 5). An interesting property of the Gaussian is that, for Chapter 3. THE BUILDING BLOCK COMPONENTS «j > 0, the sign of its derivative (—2aj0,j(xj)) OF ANNS 54 is always the opposite of the sign of a,j. Note that the generalized Gaussian function (often referred to as the Radial Basis function in the literature[Kosko92b]) has the form: 1 m (s) = exp[-—^ aj 3 «'=1 where s = [si, • • • ,sm] T in R at PE j , and Sj - s»2] (3.29) is the external input vector, cr2 is the estimate of the variance m = [SJI,- ••,Sjm] T is the estimate of the mean vector. This function defines a spherical receptive field in the input space R , centered on the mean vector SJ, and localized TO by (Tj (note that the radius of this field approaches R m as <jj approaches oo [Kosko92a]). The Heaviside or linear threshold function given by { 1 if (xj(t) - x ) > 0 j0 0 if (xj(t) - x ) j0 < 0 (3.30) may be converted to the sign function (by applying the simple assignment: Xj := 2XJ — 1) = WW = i ( i ( 0 ~ 3o) = sign(*i(*) - jo) o 3.2.2 z x x (3-31) Connection Data Types and Interconnection Geometries The signal carried by a connection may have any desired mathematical data type. Binary, integer, real, and complex numbers are examples of I/O datatypes commonly used that effectively influence the choice of activation and transfer functions. A PE often processes different groups of incoming signals separately and can identify each connection impinging on the unit with a layer or grid to which it belongs. For simplicity, it is often assumed that the various groups of PEs in a layer or grid evaluate the same activation and transfer functions and they all receive input from the same layers. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 55 In addition to the specification of a connection's signal type and its source and destination grids, a "wiring diagram." is needed to indicate from which unit the connection originates from and where it terminates. In an ANN with Nn total PEs (including the units in the input layer), an interconnection matrix, similar to the incidence matrix introduced in Chapter 2 for the hydraulic network, can be specified in which bji = +1 if there is a connection to processing element j from processing element i; bji = — 1 if there is a connection to processing element i from PE j; and zero otherwise. However, for large ANNs, it is often impractical to construct an incidence matrix this way since the number of entries grows with N%. The various PEs in an ANN can be divided into three basic grids. • TO PEs in the input grid/layer • n PEs in the output grid/layers • N — (m + n) nodes in the hidden grid/layers n Each grid can further be divided into layers as dictated by the requirements of the problem under investigation. Note that, based on this definition, any ANN can be represented as a collection of layers simply by requiring each layer to have a single PE. 3.2.3 ANNs as Trainable Dynamical Systems The time evolution of an ANN may be described by a system of coupled differential or difference equations that model the time rate of change of the outputs of the various PEs in the network. In general, ZJ = DJ(ZJ,XJ) : (3.32) Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 56 For example, a system of coupled first order ordinary differential equations of the form Tjij = -ZJ + aj(xj) (3.33) where TJS are suitable time constants, may be used to describe theevolution of the various outputs over time. If the Wji connections between the units are symmetric (i.e. Wji = w tJ ), the solution to Zj(t) in the above system of equations is known to settle down to a stable equilibrium solution (see [Hertz91] for proof). This description of an ANN as a system of coupled differential equations (coupled since XJ depends on the outputs of many other PEs) provides a mathematical basis for the study of an ANN as a dynamical system (see Kosko [Kosko92a] for a comprehensive treatment). If the inputs to the PEs are sustained for a sufficiently large period of time (with Xj > 0), the output values will reach equilibrium when Zj(t) = 0 for all j. This results m-Zj(t) = a,j(xj(t)) which is often referred to as an equilibrium attractor of the dynamical system [Kosko92a]. An example of a,j is the Logistic activation function which has a saturation nonlinearity such that a,j(xj) levels off and approaches fixed limits for large positive and negative values of Xj (see Chapter 5). In the simple case when a,j(xj) = XJ is the identity function, the equilibrium attractor Zj = Xj is a fixed point of the dynamical system. Continuous valued PEs obeying Equation .3.33 are of particular interest when implementing ANNs using electrical circuits (see [Hopfield84] and [Hertz91]) where the system of equations is often numerically integrated before circuit implementation. It is also useful to study the evolution of weight parameters over time. This is particularly important when learning involves weight modification. One may write a system of differential Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 57 equations that describe the evolution of the weight parameters over time wji = Lj(wji,Zj,Sji,---) where Lj represents the learning law implemented at PE j [Freeman91]. The learning process involves finding or estimating the weight parameters that somehow encode the knowledge that the ANN has to learn. For most realistic ANNs, it is not a trivial (and often impossible) task to find a closed-form solution to the above system of equations and hence iterative solution methods are often used. 3.3 Hierarchical Representation of ANNs There are currently a relatively wide variety of different ANN architectures that have been successfully used to model a diverse number of problems. Hecht-Nielsen identifies five basic classes of ANN architectures5 each with its own advantages, strengths, and domain of application [Hecht90]. In this section a hierarchical framework for the specification of the topology and functional 5 In addition, Hecht-Nielsen [Hecht90] identifies five learning laws: 1. coincidence or Hebbian, 2. performance, 3. competitive, 4. filter learning, 5. spatiotemporal and three training methods: 1. supervised 2. reinforcement 3. self-organization Some of these laws and methods are independent of the topology of an ANN and are discussed in the context of the five popular architectures outlined in Subsections 3.3.2 to 3.3.6. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 58 components Of a generic ANN architecture is initially described and a number of popular architectures are then briefly summarized. This framework was motivated by Fiesler's [Fiesler92] formalism and can be used as a template for the preliminary specification of the topology and function of characteristically different ANNs using the same notation. 3.3.1 Static and Dynamic Components of ANNs The static and dynamic components of an ANN may be defined by a four tuple [Fiesler92]: AJW = (T,T,C,& ) 0 (3.34) where • T is the network topology and is a description of both the basic network structure and the interconnection configuration (wiring) among its PEs. The topology of an ANN may in turn be represented by a two tuple: T = (S,1) (3.35) 1. The network structure, S, defined by the total number of PEs in the ANN (N ) and n the number of grids (L + 2 layers or grids which includes the L hidden layers as' well as the input and output grids with m and n units respectively) may be represented by an ordered set S - {GQ,GI, • • -,GL,GL+I} (3.36) Each grid, Gi (I — 0, • • -L + 1), may be defined by its dimension, di and the number of PEs, Ni, in the grid6. Go and Gx+i represent the input and output layers of the A layer may be defined as a simple grid where the dimension of the grid is at most 2 (1-D or 2-D layers). For simplicity, the terms grid and layer are used interchangeably throughout. 6 Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 59 ANN and the various PEs in these grids are often treated separately and implement specialized transition functions. The various PEs in each grid Gi (I = 0, • • -L + 1) may also be represented by an ordered set of PEs indexed by their number, j, in a total of N processing elements and the grid number /, i.e., n Gi = {PE'j} (3.37) 2. The specification of the interconnection configuration, I, requires consideration of the following: i. the type of interconnection scheme used (interlayer, supralayer, and intralayer)7, ii. the direction of weighted signal propagation (uni-directional (asymmetric), where connection weights are only used in one direction, or, bidirectional (symmetric), where the signals are propagated in both directions (the connection weights are equal in both directions in the symmetric case), iii. the connection order which specifies the number of PEs whose output signals are combined as they propagate over a single connection8, iv. the details of specific interconnections made (wiring diagram) between the PEs in all source and target/sink grids in the ANN 9 . In multilayered ANNs, these 3 basic types of connection schemes are denned as follows: 1) An interlayer connection specifies the wiring scheme between PEs in adjacent source and destination grids, 2) a supralayer connection specifies the wiring scheme between PEs in non-adjacent source and destination grids (i.e., at least one layer is "skipped"), and 3) an intralayer connection which specifies the wiring scheme between PEs in the same grid. Self-connections, where a connection originates and terminates at the same PE, are a sub-class of intralayer connections. A high order ANN has connection order greater than or equal to 2. All ANNs discussed in this presentation' have order 1. An incidence matrix (similar to that described in Chapter 2 in the context of hydraulic networks) may be used to specify the wiring diagram between the various PEs in an ANN. A network with N„ total PEs and L + 2 layers, may be represented by either a single (N x N ) incidence matrix, or, by L + 1 incidence matrices each with (Ni-i + Ni) x (Ni-i + Ni) elements and ( '-i + <)* elements. 7 8 9 n n N N t o t a J Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 60 • r represents the set of ANN" Transition functions T = { M ,A,T,£} F 3 (3.38) where 1. fi is a set of input transformation functions, 3 2. A is a set of state activation functions, 3. !F is a set of output transfer functions, and 4. £ is the error performance (or objective) function being minimized by the learning law specifying how the weights and biases are updated10. • C is a set of constraints on the minimum and maximum values of a number of ANN parameters such as weights, biases, input transformation, activation, and transfer functions parameters C = {C ,C ,Cn,CA,Cr,Ce} w 0 (3.39) • ©o is the initial state of the ANN, i.e., initial values of weights, thresholds, input transformation, activation, and transfer function parameters, etc. at the start (t = 0) ©o = {W ,bo,n ,Ao,Fo,£ ax} 0 3.3.2 s0 m (3.40) Associative Artificial Neural Networks Associative ANNs may be best viewed as data transformation mechanisms (with an input layer Go and a single functional layer, G\) used to associate a given set of arbitrarily chosen vectors Note that the ANN may implement an ontdgenic function specifying how the ANN topology must change during training. 10 Chapter 3. THE BUILDING BLOCK COMPONENTS ARCHITECTURE: TOPOLOGY: OF ANNS ANN = (T, r , C , 0 ) T =.(£,!) Structure: 61 o S — {Go, • • •, GL+I} Grid: G T Interconnection: TRANSITIONS: T =•{?., A, X = {W ,_, } N xNl F,£} I/O Transformations: State Activation: Output Transfer: Error Performance: CONSTRAINTS: C = INITIAL STATE: Connection Weights: T hres holds /biases: Input Transformations: State activations: Output Transfer: Error Performance: © 0 = {W ,b , PsO, «4o,^b, £max} Us A T e {CwtC^C^C^C^fCe} 0 {Cw} {C } b {C„> {CA} {Ce} 0 Connection Weights: Thresholds/biases: Input Transf'tions: State activations: Output Transfer: Error Performance: Table 3.3: Template for hierarchical specification of ANN architecture Wo b 0 PsO A 0 To £ Chapter 3. THE BUILDING BLOCK COMPONENTS sit S OF ANNS ^® *" W® * ^>0- " —*'^1^H Z 62 U 3* z *T m t Figure 3.8: Illustration of a typical feedforward associative ANN architecture. Presentation of the key pattern s € R to the single association layer results in the associated pattern z € R . m n in R , i.e., {si, • • - ,ST} to another set of arbitrarily chosen vectors in R , i.e., {z^, • • - , Z y } . m n The most important application of associative networks is storage and retrieval of associations defined as a simple connection of abstract patterns [Kohonen87] the simplest of which is the binary relation11. An association is an observed pair of I/O patterns {(si, zf), • • •, (ST, z^)} where St = [su, • • •, s ] T mt is the key pattern and z\ — \z\t, • • •, z%t] T is the associated pattern at time t. Associative networks may be either feedforward or recurrent. • In feedforward associative networks (see Figure 3.8 for schematic), the input vector impinging on the network fans out to the PEs in the functional layer and the network outputs are computed in a single feedforward pass. • In a recurrent associative ANN, the output signals of the PEs in the functional layer at time (t - 1) are also used as inputs to the PEs in the output layer (see Figure 3.9 for A pattern may be represented by a vector in R with real valued coordinates. In most applications, however, it is often assumed that s g [—1, +1] or s € [0,1] for all t = 1, • • •, T example training patterns. 1 1 m t t Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 63 Sit St m Figure 3.9: Illustration of a typical recurrent associative ANN architecture. Presentation of the key pattern s 6 R to the single association layer with n PEs results in the associated pattern z £ R". As shown above, the input to each PE also includes the output of that PE at the previous time step. m an example of a recurrent net). When the network is presented with an input vector, the feedback loop becomes active initiating the dynamic evolution of the network output12. During supervised training, the ANN "experiences" or observes T I/O vector associations ({(si,z^),- • -,(sj,Zy)}) or examples of functional mappings from the inputdomain in R m to the output range in R". The ANN attempts to approximate the behavior of the unknown function implicitly from the T I/O sample data, as they become available. In many associative nets, such as the Hopfield net, the output of the PEs are wrapped around and become the inputs to the PEs in the functional layer of the network (i.e., S J ( I ) = Zj(t — 1) or Sj(t + 1) = Zj(t) depending on the convention used). With this assumption it is often possible to completely replace the external inputs to PE j in the functional layer, i.e., sJtNote that if the time evolution of the outputs are ignored until the network converges (i.e., stabilizes as t —> oo), then the ANN is said to exhibit accretive behavior[Hecht90]. More precisely, in an accretive ANN, <j>(s +c) = Zt where c is a sufficiently small error vector. In contrast, in an interpolative network, <j>(s +c) = z +S, where \5\ 0 as |e| 0[Hecht90]. 12 t t t Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 64 (i = 1, • • -m), with the values of the output signals from all the other PEs in the functional layer of the network, i.e., = Sji(t) Z{(t — 1) (with = m, n = 1,— i m and i 7= j if there is no self-connection). The net-input to a PE in such a network (with a self-connection) can hence be computed by: m ) = Hi x w z + i b w i • z + J (- ) b 3 41 1=1 where W j = [WJI, • • •, Wj ] , m T z = [21, • • •, z ] m T (m — n). Using vectors to represent the connection weight vector, the net-input, and the ANN outputs results in a compact matrix formulation (bj = 0 for j = 1, • • -n): x = Wz (3.42) where W T l ^11 • •• Wlm W = [ W l ,...,w T O ] = T T m and x = [xi,---,x ] T m The network weight matrix, W , lies in the m2-dimensional Euclidean space and is made up of weights that are elements of local memories of the PEs. The set of all possible parameter or weight vectors determines the set of all possible combinations of the ANN with a fixed number of PEs. In most popular ANN implementations, self-adaptation is accomplished by the application of some learning law that guides the updating of the weight vectors to a state that results in better performance. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 65 zlt z •sit d - — z ]t d •Sit > « ^ X J\J\J^K*7— \ z ^ •Smt ^ Figure 3.10: Illustration of the Linear Associator architecture. After the presentation of the input vector s G R , the network produces an output vector Zfc G R . PEj in the functional layer computes its output Zj using Zj = YALI SiWjim n k t The Linear Associator An important example of associative networks is the linear associator ANN most widely applied for storing association lists [Anderson72] (see Figure 3.10). In order to store the association list {(si, zf), • • •, (s^, Zj)} (the training data set), a simple form of Hebb's learning rule as prescribed by [Kohonen87] and [Anderson86] may be applied. For example, the weights may be computed using T ^ = E Z J ^ ( 3 - 4 3 ) t=i with i = 1, • • - ,m and j = 1, • • - ,n. The weight matrix W for the linear associator can be computed by the m x n matrix W = ZST (3.44) where S = [si, • • •, ST] denotes the m x T input matrix and Z = [zf, • • •, z ] denotes a n n x T T output matrix. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 66 Once the weight matrix has been computed, a test input sjt will produce a corresponding output Zk — vectors Wsfc = Z S S j t T = Ylt=i( t s • s it) • zf. Note that this is a linear combination of the with coefficients determined by the inner product (s • Sk). If the input vectors in the t testing set are orthonormal to the input vectors in the training set, then the linear associator with weights computed as described above will exhibit perfect reca/Z[Kohonen87]. This implies that the maximum number of associations that can be perfectly stored by the linear associator ANN is equal to the dimension of the input vector m. If the input vectors are not orthonormal, then the Hebb learning law is no longer optimal and an error performance function must be defined. The error performance function may be defined by T m * = £E(4-*i«) <=i i=i a (- ) 3 45 Kohonen [Kohonen87] has shown that the matrix W which minimizes the above measure of error performance may be computed by W = zst (3.46) where st is the pseudoinverse of S*3. For the cases where the columns of are independent, then = Z T ( Z Z T ) _ 1 is another solution ([Hertz91], [Camargo90]). Otherwise the relatively expensive algorithm outlined by Fogleman for computing the pseudoinverse may be applied[Fogleman87]. Other alternatives include the use of any gradient technique which attempts to adjust the weight values in the directions that reduce the error the most. An important technique in this category is the "Defined as a generalization of the inverse of S such that SS^S = S, S^SS^ = S^, SS^ = (SS^) , and S S = (S S) . T t t T Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 67 method suggested by Widrow and Hoff [Widrow60] and discussed in detail in the next section. Note that Hebbian learning is often referred to as coincidence learning in the literature [Hecht90]. With this class of learning laws, weight changes occur simultaneously and are completely local to the PEs resulting from computations local to the PE. The mathematical expression for the learning law based on Hebb's observation of learning in biological neurons has the form wait + 1) = Wji(t) + zf(t)si(t) where zj(t) and S{(t) are the j th and i th (3.47) components of the zf and St respectively. In matrix notation W ( t + 1 ) = W + zfo (3.48) T t The changes to the weight vector after T presentations of training I/O vectors and consequently T application of the above update rule, results in (3.49) W = z(sj + ••• + z£s£ Hopfleld Networks The discrete (binary) and continuous Hopfield networks are important examples of fully connected recurrent ANNs used often for storing association lists (see Figure 3.11 for topology). The inputs to a PE in such a network is comprised of the outputs from all the other units creating a recurrent dynamic system capable of storing and retrieving any desired type of association list. Recurrent associative networks are often said to be autoassociative if the output vectors are made to be exactly the same as the input vectors (i.e., n = m and z = s for all t t Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 68 The Hopfield Network 23T (?) ( J ) " ® ® (J) 22li H2L ™23_ ™32 t»ni »3m tOna "3t "ml Figure 3.11: Illustration of the Hopfield Network. The Hopfield net is an example of a recurrent associative net with m PEs fully connected to all the others. The zeros in the diagonal of the weight matrix indicate that the PEs do not have self-connections. The energy function associated with a Hopfield net always decreases when ever a PE changes activation states. Starting at some initial state, the system state vector moves towards a local minimum of the energy function (guaranteed to be completed in a fixed number of steps [Hecht90]). t = 1, • • • ,T). Autoassociative networks have been successfully applied for the reconstruction of partially corrupted patterns[Anderson86]. Associative networks are said to be heteroassociative when z ^ s . t t The net-input to each PE in a Hopfield net is the weighted sum of its inputs according to the equation n Xj = ]T ZiWji + bj In combinatorial optimization applications, each PE updates its state using the activation rule: Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 69 if Xj > 0 then Zj = 1 else if Xj < 0 then Zj = 0. In other applications, such as development of associative memory structures, Zj = sign(xj) is used to represent the states of each PE. 1 if Xj > bj + 1) = 4 zj{t) if XJ = bj —1 if X j (3.50) bj < For the latter application, an "energy" function [Hopfield82] may be defined by ^ £ n n n / y X) / J fif*i*i ~Jl-J-l ( ) = -o9 X) - £/ z i=it=i 6 i i 2 j -J~J (3.51) i=i and it can be proved that independent of the update policy14 the network with the dynamics described above converges to a state where AS < 0. Note that £ is a strictly decreasing operator (i.e., AS < 0) in the transient phase of the time evolution of the network and constant (i.e., AS = 0) in the steady state phase [Camargo90]. Continuous Hopfield networks have the same ANN topology as the binary Hopfield net with symmetric weights (i.e., Wji = W{j). The major differences is that the PEs are no longer limited to binary (0/1) or bipolar (+/—) values and the Logistic or hyperbolic tangent activation functions may also be used at the PEs to update the level of activation. (1) Synchronous where all the PEs update their state and output simultaneously at specified points in time, (2) Asynchronous where the PEs update their states randomly and independent of one another, however, often with the same average frequency and standard deviation (note that it is often assumed that only one PE updates its state at any given moment), (3) Sequential where the PEs update their states one after the other following some pre-specified order, and (4) Block-sequential where groups of PEs update their states and outputs in parallel, however, the various groups are updated according to some pre-specified order. 14 Chapter 3. THE BUILDING BLOCK COMPONENTS 3.3.3 OF ANNS 70 Mapping Networks . This class of ANNs are of special interest in this study and generally implement a bounded mapping or function <f> : S G R m i—• Z C R where S is a bounded subset of the m- n dimensiqnal Euclidean space and 4>(S) is abounded subset of n-dimensional Euclidean space15. This mapping from (or on) the set S to (into/onto) the set Z is often performed by means of supervised training based on T examples of the mapping, i.e., (si, z^), • • •, (sj, Zj). Ideally, the various examples of the mapping should be obtained by randomly selecting vectors s from S t with a fixed but unknown probability density function p(s) which is assumed to be zero outside of S. Multilayered Feedforward Networks The multilayered feedforward ANN trained by error backpropagation (often referred to as "BP networks" in the literature) applied for solving the problems of hydraulic network analysis and model calibration outlined in Chapter 2, has a hierarchical topology consisting of interconnected layers of PEs. The architecture of this class of networks are discussed in detail in Chapter 4 and 5 and may be used to approximate any function <j> from some arbitrary compact16 subset S in R m onto a bounded subset <^>[S] in R . n During supervised training, the network must be supplied with a sequence of T training Note that R is the domain (of definition) of (j> and R is the codomain of <j>. Since S is a subset of R , the image under <j> of S, i.e., 0[S], is the set of elements in R such that z = ^(s). The set of values taken by _4>, that is, the set Z = {z.g R"|(3s)[z = <£(s)]} is called the range of (f> and is generally smaller than R . If the range of ^ is equal to the codomain' of </>, i.e., Z = R , then <j> is a function onto R" (i.e. surjective). i.e., closed and bounded such as [0,1]. 15 m n m n n n 16 Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 71 Figure 3.12: Illustration of Kohonen's Self-Organizing Layer I/O data pairs (si,Zj), • • •, (ST,Z ). T In many situations, the various (s ,zt) I/O pairs used dur- ing training are examples of somefixedfunction ( Zt = <f>(st) or physical process. The objective of the learning laws is to find a set of weights that minimize some specified performance or cost function. Other important examples of performance learning laws is the Widrow learning algorithm [Widrow60] discussed in the following section and the error backpropagation algorithm outlined in Chapter 4. The objective in both cases are to estimate the elements of the weight vector that somehow minimizes the mean square error between the desired output of the process being modeled and the output of the ANN. Kohonen's Self-Organizing Network Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 72 Kohonen's self-organizing ANN is another example of mapping networks used to approximate a continuous topological mapping <f>: S C R m '—> Z C R™ only by means of T randomly selected examples of z 6 Z. S is a rectangular subset of m-dimensional Euclidean space and Z is a bounded subset of the n- dimensional Euclidean space on which some fixed but unknown probability density function p(z) is defined. As shown in Figure 3.12, the self-organizing architecture has two basic layers. The first layer is the input layer (Go) and the second layer, organized as a two dimensional grid and is often referred to as a competitive layer (Gi). The two layers are fully interconnected and signals propagate forward from the input layer to the competitive layer. When an input is presented to the network, each PE in the competitive layer computes a value which measures the extent to which the weights associated with the input units, i.e., [wj\,• • •, Wj ] , matches the input [s\,• • •,s ] . The measure often used is the distance beT T m m tween s and W j , i.e., IN- Wi|| = (3.52) The PE with the smallest distance (i.e., the closest match) is said to "win" the competition. Competitive learning is based on the notion that from among some or all of the processing elements of the ANN engaged in a competition only "winners" are allowed to modify their weights. This category of learning was initially introduced by Kohonen [Kohonen84] and differs from Hebbian learning in that the ANN learns as a process of training by self-organization rather than by supervision. In general, the PEs involved in the competition apply the following input transformation function *i = A*i(S|W,-) (3.53) Chapter 3. THE BUILDING BLOCK COMPONENTS where W j = [WJ\, • • •, Wj ] T m OF ANNS 73 and s = [si, • • -,sm], and /Zj(s, W j ) is a "similarity" measurement function, such as the Euclidean distance / / j ( s , W j ) = ||s — WJ|| = yYXLi( i ~ ji) s w 2 0- -> E S U M of the squares of the arithmetical differences of the components of the vectors s and W j ) . For example, the processing elements in a Kohonen's layer compete with one another on the basis of which one of them has its weight vector Wj closest to the input vector s as measured by the distance function. Only the winner is allowed to set its output to Zj = 1 with all the other output signals set to 0. Once the winning PE has been identified, the next step is to define, some neighborhood (square grids of different widths dt, at training iteration t, with n — d% total units centered at w the winning PE w) and proceed with modifying the weights of the PEs in this neighborhood using A ={ Vt(sjt — WJU) if unit j is in the specified neighborhood (3.54) 0 otherwise rjt is the learning rate and may be determined using rjt = 770(1 — -y) where t is the current Wji training iteration and T is the total number of training iterations to be performed. Note that ao is often chosen in the range 0.2 — 0.5[Dayhoff90]. The neighborhood centered at the wining PE may be initially chosen as the largest do x do square grid (with possible cut-off if beyond the edge of the grid in either of the dimensions of a possibly non-square 2-D grid) around the winning PE. The size of this grid may then decreased according to the equation: d = do(l — -f) t which assures a gradual linear decrease in d to zero as t approaches T. t Chapter 3. THE BUILDING BLOCK COMPONENTS 0 1 2 OF ANNS 3 74 4 z d z d Z d Figure 3.13: Illustration of counterpropagtion. The Counterpropagation Network The "counterpropagation" network is another important example of mapping networks initially invented by Hecht-Nielsen[Hecht87]. Figure 3.13 illustrates the direction of signal propagation and topology of this multilayered network. The network is most suited for approximating a continuous function <j> : S C R m •—•> Z C R n defined on a compact set S. However, this architecture works best if the inverse of <f>, i.e., <f>~ , 1 exists (which requires <f> to be one-to-one and onto) and <^_1 : Z C R n "—> • S C R m is continuous [Hecht87]. As shown in Figure 3.13, the network has five layers: • Two input layers (0 and 4 in Figure 3.13 corresponding to the m-dimensional input (ss) and the n-dimensional output (z ) vectors in the training set) that fan-out to all the PEs d in the hidden competitive layer 2 and to a single corresponding PE in the appropriate output layers 3 and 1 respectively. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 75 • A hidden competitive layer (2) (similar to the Kohonen layer described previously) PEs of which receive inputs from the m + n fan-in units and fan-out to m + n units in the output layers 1 and 3. • Two output or Grossberg layers (1 and 3) where each PE is fully connected to the units in the hidden layer (2) as well as to a single corresponding input unit in layers 4 and 0 respectively. During training, both or either one of the input and the output vectors are simultaneously input to the network at layers 0 and 4 where they are propagated in opposite or counter flow directions resulting in the ANN computed values of the outputs and the inputs (or either one) denoted by z and s' , respectively. t t The full counterpropagation ANN can be used to learn both the mapping z = (f>(s) and its inverse, i.e., s = <f>~ (z), at the same time. If this inverse does not exist, the feedforward-only x version with only three layers (and no intralayer connections) may be used to learn the desired mapping <j>. Considering the feedforward-only version, when the input vector s is presented to t the input layer (L = 0) it fans-out z° to the PEs in the competitive hidden layer 2 which in turn performs a weighted sum x"j = YliLo u °iT z w 2 where w 0 , - 2 denotes the weights connecting layers 0 and 2. The PE with the highest value of its net-input, xj wins the competition and outputs zj = 1. All the other units in the competitive layer set their output to 0. Once the competition is over, the PEs in the output layer 1 compute the weighted sum zj = X? = Zw^i it °jT z w 2 a s their output. However, since all the output values of the PEs in the hidden layer are zero with the exception of the winner i*, the PEs in the output layer take on the values of the weights Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 76 associated with the connection from the winner, i.e, zj = w®~. . 2 The m weights associated with the incoming signals to the competitive PE j = i* in layer 2 from input layer 0 may be modified as follows: w -\t+l) 0 j = w -\t) + n(s -w° r\t)) ' 0 j j it (3.55) where 0 < rj < 1.0 is the value of the learning rate parameter. Note that all the other weights where j 7= i* remain unchanged, i.e., w°^ (t + 1) = w°^ (t) 2 2 if j 7= i*. This weight adaptation is followed by the algebraic normalization of the weight vector stored at the winning PE W j . The PEs in the output layer adjust their weights based on Widrow-Hoff rule: w)r\t + 1) = w]r\ ) + r,czht)(z}\t) - z)(t)) t (3.56) where r) is the learning rate parameter. Note that since only the winning PE in the competitive c layer outputs zf = 1 when i = i* the above equation reduces to: ™]r\t+l)={ 3.3.4 >y JT (3.57) Stochastic Networks In this class of ANNs the level of activation and output of individual PEs are stochastic (rather than deterministic) function of their inputs. The Boltzmann machine is an important example of stochastic ANNs and has the same topology as the discrete Hopfield network with bipolar {+, —} or binary {0,1} outputs. The Boltzmann machine architecture has been successfully applied to solve a number of combinatorial optimization and constraint satisfaction problems (see [Kirkpat83] and [Hopfield85]). Similar to feedforward networks, learning requires training Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 77 1/ O vector pairs and involves adjustment of the weight parameters by an error minimization algorithm known as simulated annealing in analogy with metallurgical annealing (see [Aarts89] for a comprehensive treatment). The underlying idea is based on the observation that the global process of finding the minimum of the energy (error performance) function of the ANN can be reduced to a simple localized process of energy modification. Given a current state o of the network with an associated energy £„, then a subsequent state may be generated by implementing & perturbation mechanism that can be used to cause a change from the current state to the next[Aarts89]. If the change in energy, AS = £b — £ai between the two consecutive states of the network is less than or equal to zero then the change and the new state is accepted with probability exp(^^) T (3.58) where and r denotes the "temperature" of the network in appropriate units. Note that similar to the Hopfield network, only one PE is allowed to update its state at a time which is often selected uniformly at random [Hecht90]. A network with n = m binary or bipolar PEs in the functional layer can be in any of 2 m possible configurations. The probability that the PEs will be in a given configuration with energy (error) £ t at temperature r is given by the Boltzmann distribution: It has been shown that, if the network changes according to the above probability, then the network will eventually reach a dynamic "thermal equilibrium", i.e., the PEs will continue to change their states, however, the probability of finding the network in any given global state, for Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 78 a constant temperature, remains constant and obeys the Boltzmann distribution[Camargo90]. Similar to the Hopfield network, the energy or error is defined by 71 ^ € () = -o z n n £ WjiZjZi j=li=l 2 bjZj j=l which is identical to Equation 3.51. The transfer function of the Boltzmann machine is similar to that of the Hopfield net modified to use the simulated annealing procedure [Hecht90] t Zj{t +1)={ -Zj(t) if AS < 0 - (t) if AS > 0 and u < e x p ( ^ ) Zj{t) otherwise Zj (3.60) where r is the temperature, u is the uniform deviate between 0 and 1, and TO AS = z (t)fiT w z (t)-b ] «'=i j 3.3.5 / ji j j (3.61) Spatiotemporal Networks Similar to multi-layered feedforward networks, a Spatiotemporal ANN learns to approximate an unknown mapping or function <f> from S C RTO •—>Zc R n and are hence generalizations of the mapping networks introduced previously. However, the elements of both the input vector s and the output vector z are now functions of time (see Figure 3.14 for example) as well as space. Assuming that the operation of the network starts at time t = 1, the inputs to and the outputs from the network at time t may be denoted by s(r — 1) and z(t), respectively and the network is simulated forward in time for T discrete time steps. At each step in the simulation, after the network is presented with the input s(t — 1) (t = T), it outputs a corresponding Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 79 Figure 3.14: Example of the topology of a recurrent network output response vector z t . The objective of the network is to map the set S = {z(0),{s(0),s(l),---,s(T-l)}} into the set of m-dimensional vectors Z = {z(l),z(2),.-.,z(T)} Recurrent networks trained by error backpropagation are an important example of spatiotemporal networks (originally introduced by Rumelhart, Hinton, and Williams ( [Rumel86a], [Rumel86b]) and has been expanded and improved upon by many others. As illustrated in Figure 3.14, the network has a functional layer with N PEs, n of which output to the external environment. The outputs of the PEs in this layer are also fed back to the network via N fanout units. The inputs to the N PEs in the functional layer receive inputs from (m + N) fan-in Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 80 units, m of these inputs originate from the the external environment and N fan-out the values of the outputs at the previous time. The input vector to the functional layer at time t may be represented by u(t) = [si(t-1), • • •, s (t-1), z (t-1), • • •, z (t-1), z ( n + 1 ) (i-1), • • •, z (tm x n N 1)]T and the outputs by v(i) = \z\(i), • • •, z (t), Z(n+1)(2), • • •, z/v(£)]T the first m and n components n of which are the inputs from and the outputs to the external environment. The operation of such a network may be viewed as performing a mapping from the set of u's provided at each time step in the simulation to the set of v's produced by the network over the entire simulation run. The recurrent architecture is suitable in applications where the inputs are functionally dependent on the earlier outputs. 3.3.6 Hierarchical Networks A "hierarchical" ANN is another example of a multi-layer/grid network architecture with many layers of PEs. The most notable application was originally introduced and implemented by Fukushima called the neocognitron [Fukush88] and is based on the notion that the PEs in any given grid receive connections only from a restricted and localized subset of the PEs in the previous layer of the network. This restricted interconnection configuration is advantageous in a number of applications since the connection weights can now be characterized by sparse and localized connectivity matrices between the layers. The hierarchical ANN of Fukushima is perhaps the most complicated pattern recognition network to date [Hecht90] the details of which are beyond the scope of this section. Chapter 3. THE BUILDING BLOCK COMPONENTS 3.4 OF ANNS 81 A N N Learning Laws and Training Methods A major characteristic of ANNs is their ability to learn or adapt. As adaptive parallel-distributed information processing systems, ANNs can learn new associations, new patterns, and new functional dependencies from sample data. Learning is the process of continuously encoding the information in the sampled data by making changes in the values of the parameters of the constituent PEs. How well a network can learn from on-line or off-line sample data is often assessed based on some measure of performance or cost criteria. An ANN learns if the parameter matrix has a non-zero time derivative. Learning stops when W =0 Using an ANN to estimate a function <j> : S C R m •—> Z C R™, may be viewed as estimating the joint probability density p(s,z). The solution points, (s, F(s)) should then reside in the high-probability regions of the I/O product space S x Z [Kosko92a]. 3.4.1 Learning as Parameter Estimation For a number of applications, such as hydraulic network analysis and model calibration, learning in ANNs may be best viewed as a parameter estimation method[Barto89]. Parameter estimation methods have been extensively applied to many fields including adaptive control ([Goodwin84] and [Astrom89]) pattern recognition ([Duda73]), and adaptive signal processing ([Sklansky81]). In pattern recognition applications, for example, one may construct a pattern classifier by assuming that the classification or decision rule is a member of some specific class of rules. Each rule in the class is specified by selecting values for a set of parameters[Barto89]r Adaptation Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 82 or learning may be viewed as the process of adjusting parameter values based on how well the classification or decision rule can classify a set of patterns whose correct classifications are already known. In adaptive control and signal processing applications, parameter estimation is used for system identification with the objective offindingthe parameters of a process model (according to some adaptive law) such that the response of the model to an input signal approximates the response of the real system to the same input. In a similar manner, parameter estimation may be used to identify the inverse of a system. It is customary to assume that a model has a specific mathematical structure, expressed in terms of unknown parameters, and then proceed with estimating the model parameters from measured data. For example, in the context of the hydraulic network analysis and model calibration problems considered in Chapter 2, the objectives of a parameter estimation method would be to identify a model capable of computing pipe segmentflowratesand nodal pressures that are in close agreement with the real values (field measurements) of these unknowns. The inverse of this model is capable of computing the values of pipe resistances and demand discharges based on acceptable ranges of design values of these new unknowns. Many ANNs may be viewed as information storage/retrieval networks with a corresponding memory model interpretation of the network dynamics. In a number of applications, such as associative memory networks, the various connections in the ANN are utilized for storing many associations and the parameter values are hence the contents of the memory locations. In this distributed memory model the retrieval process takes place after all the associations are stored, or learned. Learning in associative networks, however, is interpreted as the storing Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 83 of information in a memory structure and not estimating the parameters of some best-fitting model in a class of models [Barto89]. In comparison to the traditional look-up table model of memory, associative memory networks have the desired properties of content addressability and afford a much improved degree of robustness against physical damage. Viewed as system identification procedures, they may be criticized since they often fail to model the data generating process accurately. Thus, despite the close relationship between associative memory and system identification, they are evaluated by different criteria. The similarities inherent in these two interpretations of learning provides a means for linking traditional memory-based problem-solving models of Al to parameter estimation methods used for system identification in engineering applications [Barto89]. In the context of ANN learning, the parameter estimation problem may be decomposed into the following components ([Goodwin84], and [Barto89]): • Problem Representation: An important characteristic of ANN research is that the parallel distributed processing paradigm in some way captures how the human brain carries out its information processing operations. In practice, when applying ANNs to model and simulate real-world engineering systems, representation corresponds to deciding on how the signals reaching the network's input units as well as how the network's output must be related to the problem under study and identification of relevant variables and/or parameters. There is little theory available to guide the engineer in representing the problem under investigation as system (or decision rule) identification. The questions that need to be considered include: Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 84 1. What features of objects, inputs, states, etc. should be measured to provide data for system identification? 2. If basic physical variables are obvious candidates, should they be represented simply as real-valued measurements, or should they be encoded in some other way? • Class of Decision Rules or Models: This aspect of parameter estimation is guided by relatively little theory [Barto89]. The engineer must some how select an appropriate class of decision rules or identification models along with the number of parameters to use and specify how they define the decision rule or model. As with most problems of this type, a compromise must be made between rule or model complexity and its adequacy for capturing the essential features of the problem under consideration. In the context of ANNs, these design choices correspond to specification of the internal structure of the PEs, the topology of the network, and the constraints among the weight parameters that need to be enforced during training. • Performance Criteria: How can the performance of different decision rules or iden- tification models be best compared? The objective is to be able somehow to assess the relative performance of the system based on different model simulations without having to actually implement them. For instance, the "best" measure of performance may evaluate how well the system actually makes use of the decision rule or model. In the context of parameter estimation, it is common to evaluate performance by the average of the squared error between the output of the decision rule or model and some desired or target values representing the observed behavior of the system. Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 85 • Estimation Methods: There are two major classes of parameter estimation methods that are of interest (both in the context of learning in ANNs and hydraulic network analysis and model calibration problems discussed in Chapter 2):off-line and on-line methods. 1. Off-line methods are applicable only when a sufficient amount of training data are available. Such data must be collected prior to performing the analysis that determines the parameter values. 2. On-line methods, on the other hand, do not always require training data in advance and can process such data as it becomes available and must hence utilize efficient methods to keep-up with the temporal flow of events. In practice, however, they usually start with an a priori block of data and then incorporate new data recursively. • Use of a priori Knowledge: Parameter estimation can benefit greatly from the use of a priori knowledge. This may allow the selection of the class of decision rules (process or identification models) and/or in the form of initial parameter values and constraints. ANN learning methods are often studied under the assumption that there is little prior information about the problem being investigated, however, this assumption can often lead to poor results[Barto89]. On the contrary, as will be discussed in a later section, the incorporation of existing domain knowledge can significantly accelerate learning and economize the size of ANNs. Chapter 3. THE BUILDING BLOCK COMPONENTS 3.4.2 OF ANNS 86 Example of A N N Learning as Parameter Estimation Adaptive least mean square17 error minimization method forms the basis of Widrow's learning algorithm [Widrow60] and has been successfully applied to a number of applications such as adaptive noise cancellation in long-distance telephone calls and high-speed modems [Widrow78]. The LMS algorithm is similar to adaptive Kalman filter [Kalman60] since both techniques recursively estimate the parameters of the normal equations of linear least square regression18. The remainder of this section considers the details of how the LMS error may be minimized in the context of one of the simplest PE architectures referred to as the ADAaptive LINear Element19 in the literature. Widrow learning [Widrow60] is based on the idea that there is some unknown process, physical system, or plant that is being modeled with an ADALINE based entirely on T I/O pairs of observations of the process or system. Given a single I/O pair, (s, z ), it is relatively d straightforward to estimate the values of the weight parameters w = (WQ, • • •, w ) which will m result in an ANN output z, which is in close agreement with the desired value, z . With T d real-valued I/O vectors, {(s\,zf), • • •, (ST,Z )}, with each St having a corresponding, perhaps T unique, desired output value, zf, (t — 1,- • - ,T) the problem of finding a single weight vector that can be used to calculate the desired or target output value is no longer as simple. The ADALINE performs a sum-of-products calculation on the incoming input signals and the values of the weight parameters stored in its local memory. In relation to the generalized PE described in Section 3.2, both the activation and transfer functions are the identity function L M S henceforth. The difference is that the Kalman filter adds a Gauss-Markov state equation to describe system dynarhics[Kosko92a]. Note that the Kalman filter is not usually adaptive. ADALINE henceforth. 17 18 1 9 Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 87 Figure 3.15: The Architecture of the ADAptive Linear Element. The input layer contains m linear PEs and the output layer has only one PE that performs a sum of products calculation. and equal to the net-input to the ADALINE as shown in Figure 3.15 The ADALINE may be described by the simple mathematical model: m Zt = Wot+ SitWi( -i) t t=l i) »=o = where the vector w = [w ,w£JT t 0 (3.62) w (t-lft represents the weight parameter vector associated with the random realization of input vector s = [so , • • •, s t] T t t m at time t. The LMS algorithm may be studied in a stochastic approximation framework where {zf}, {s }, and the residual or instantaneous error {S } (i.e., the desired minus ADALINE ret t sponse S = zf — z ) are random processes indexed by time. The LMS error goal is to minimize t t the mean square error given by: £(*)• = lim (hJ2(4 - zt? Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 88 Um(i)f>?-wTst)2 -°° 1=1. T T (3.63) E [(zf - z f] xp t Note that the above limit is simply the definition of the expectation operator, .Expf.], commonly used in statistics to designate the mean or average of the quantity within its brackets, based on the assumption that this quantity is a random variable with some fixed probability density function. Using the properties of the expectation operator, namely, that the expectation of a sum is the sum of the expectations of the terms within the brackets, and that constants can be taken out of the expectation operator, the expression for £(w) may be expanded as follows: £(w) = E [(zf-ztf] xp = E [(z - s ) ] d xp T W 2 t = E [zfzf-2z w s + w s sJw = E [ztz ]-2w E [z s ] + w E [ sJ] T xp t d T t t T xp t t xp St Defining the matrix R = Exp[stst ] as the input auto correlation matrix, p = T as the I/O cross correlation vector, and q as (3.64 T xp W E [z s ] d xp t E [zfz ] the error function may be written as d xp [Hecht90] [Kosko92a]: £(w) = q + w T Rw - 2pTw (3.65) Note that w is treated as a constant and the dependence of € on w is quadratic and defines a paraboloidal surface with height equal to £(w) at w. The mean square error function can be minimized by partially differentiating both sides Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 89 of the above equation with respect to components of the parameter vector w and setting the result to zero evaluated at optimal w*, 0£(w) dw = 2 R w - wp 2Rw*-2p Rw* = (3.66) 0 = p (3.67) w* = R ^ p if the matrix R scalar, ffl d£ 9 _ 1 exists. It is noteworthy that although the mean least square error term is a is a vector V wo rflg(w) — aw d€(w) wr The above procedure locates a point where the slope of the function £(w) is zero which may, in general, by either a maximum or minimum. However, since the error surface is hyperparaboloidal, and like a 3-dimensional paraboloid, it has a unique "trough" that lies at its minimum value. Although the hyper-paraboloid may be degenerate with more than one "trough", the points along the bottoms of these troughs will all be at the same minimum S height[Hecht90]. Unfortunately, the analytical solution of the optimum weight vector can be cumbersome both due to the possibly large dimension of w and the fact that R and p are expectations and their calculation requires a priori knowledge of the distribution from which the input signals were drawn from. As a solution to this problem, Widrow and Hoff [Widrow60] developed a method for computing w* by starting from any arbitrary initial value of w, and simply sliding down the least mean square error surface until the minimum is reached. Since the error surface is a simple Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 90 paraboloid, there are no local minima to prevent a direct slide towards the global minimum with the smallest possible value of mean square error. At any point on the error paraboloid, the vector — V £(w) w points in the direction in which £(w) will decrease at the fastest possible rate. Hence, the basic idea behind Widrow's algorithm is that once an estimate of — V £ ( w ) t0 can be computed, then the weight vector can simply be moved in this direction by a small amount which is closer to w*. The problem thus becomes that of estimating V„,£(w). Starting from the limit form of £, V £(w) can be shown to reduce to[Hecht90]: w i V £(w) w T = Urn (-) J2 (3.68) ~ t)(s ) z t Since 6 was used to denote the error (z( — z ) made by the ADALINE on the t th t t input vector, st, then V £(w) w = U m ^°° T (I)£2(^-zt)(-st) 7=1 T = " Hm(i)5>t(-st) = -2E [6 s ] xp In other words, in order to estimate V £(w), w t t (3.69) all that needs to be done is to average a large number of 6ts vectors and then multiply the result by -2. Widrow and Hoff extended this t procedure one step further and simply used T}6ts as the correction to w, i.e., t w t + i •= w + vStSt t (3.70) where rj is a positive constant whose value is typically selected by trial and error. As pointed out by Hecht-Nielsen, the task of finding the upper bound for n is not trivial [Hecht90]. Generally, if 77 is too large, the weight vector will not converge to w* and if it is too small, convergence Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 91 may take a very long time. Typical values range from 0.01 to 10.0 with a value of 0.1 as the starting value [Hecht90]. Presently, there are two popular variants of the Widrow learning law (also known as the LMS learning law, and the delta rule), known as the batching and the momentum versions [Hecht90]. In the batch mode, the weight vector is updated only after a number of st input vectors have been presented with a fixed weight vector. In the multiple input episode or batch version, the Stst vectors are averaged and used to replace the single input episode StSt used above. This averaging has the effect of changing the weight only after a reasonable estimate of the gradient of £(w) has been found. The momentum variant of LMS learning law requires the updating of the weight vector after each presentation of the input vector and is given by the following equation: w 1 + i = wf + 77(1 - a)6tst + a(wt - wt_i) (3.71) where a is the momentum parameter. 3.5 Conclusions This chapter has described the building block components of ANNs while emphasizing a hierarchical framework for the specification of their topology and functional components. A number of important ANNs were summarized with the objective of identifying the most suitable architectures for hydraulic network analysis and model calibration problems formulated in Chapter 2. Steady-state network analysis as described in Chapter 2 involves finding the solution vector z = z* of the system of n = (k + r + s) basic hydraulic equations such that F(z*) = 0. In Chapter 3. THE BUILDING BLOCK COMPONENTS OF ANNS 92 this study, the solution to this zero-finding problem has been viewed as the task of finding or approximating the unknown mapping, <j>: S — • > Z, from the set of all possible values of the hydraulic network model parameters, S, onto the set of all possible values of the state variables, Z, such that F(<*>(s)) = 0 (3.72) when z = <£(s) = z". Similarly, hydraulic network model calibration may be viewed as identifying the inverse of the mapping, <£ : Z i - t S , such that -1 F-V-^z)) = 0 (3.73) when s = <£ - 1 (z) = s". The questions of which mapping network architecture is most suitable and how the I/O of the WDN and ANN can be interfaced during training are addressed in the following chapter. Chapter 4 D E V E L O P M E N T OF T H E N E T W O R K MODELS 4.1 Introduction Although the parallel-distributed information processing architecture presented in the previous chapter is extremely attractive in principle, there are a number of questions that remain to be answered at the outset before a successful implementation can result: • What is the "best" ANN architecture needed to adequately represent a given class of hydraulic network problems? Specifically, in the context of hydraulic network analysis and model calibration problems outlined in Chapter 2 1. What are the various input/output variables and/or parameters for a given class of WDN problems and how do they correspond to the I/O to/from the ANN? 2. What type of PE architecture and ANN topology is most appropriate? (i.e., type and number of PEs and their interconnection configuration or wiring)? • Should and can the two network simulators be integrated and how can their I/O be best interfaced? How can each class of network simulator be implemented in software and what types of data structures and related information processing operations must be implemented? 93 Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 94 • Given that an appropriate ANN topology for representing some specific class of hydraulic network problems can be identified, what are the most efficient learning algorithms and how well are they expected to perform in response to noisy input data? Does the algorithm require training data, and if so, where do these data come from and how can they be best generated synthetically? The objectives of this and the following chapter are to illustrate the versatility of multilayered feedforward ANNs trained by error backpropagation1 for hydraulic network analysis and model calibration. Chapters 2 and 3 have identified the basic components of the two classes of networks and the corresponding set of governing equations. The numerical experiments detailed in Chapter 5 are based on the multilayered feedforward network topology and the error BP algorithm derived in this chapter. In this and the following sections, the reasons motivating the choice of the error backpropagation algorithm for supervised training is initially described and a number of possible schemes for interfacing the 1/ 0 of the two network simulators during training are then outlined. Two methods are suggested for mapping arbitrarily complex hydraulic networks to their equivalent feedforward ANN representation. The algorithms at the heart of the WDN and ANN network model simulators are then presented followed by the description of three specific classes of problems used to numerically verify the hybrid model. BP 1 henceforth. Chapter 4. DEVELOPMENT 4.1.1 OF THE NETWORK MODELS 95 Why Learning by Error Backpropagation in Multilayered ANNs? The backpropagation algorithm has a colorful history and has been responsible for much of the reported success of ANN applications and renewed interest in learning research. The basic supervised weight modification scheme which has been developed independently by several researchers was popularized by Rumelhart et al. [Rumel86a], [Rumel86b] in the Parallel Distributed Processing volumes mentioned in Chapter 3. Over the past few years, multilayered feedforward ANNs trained by error backpropagation have been applied to solve a number of diverse problems with varying degree of success. Applications have included learning to play backgammon, non-linear prediction of noisy time series, control of robotic manipulators, prediction of stock and commodity prices, learning to pronounce English text, to name a few (see [Muller90], [Dayhoff90], and [Freeman91] for more examples and details of these and many other applications). Currently, for a number of practical applications, the "fastest" learning algorithm is the error BP[Fahlman88]. The most practical property of multilayered ANNs that is of interest in this study is their ability to approximate functions that are known only at a relatively small number of discrete points. The question of what functional forms can be approximated by multilayered feedforward networks trained by error BP is of great importance to many engineering applications. Perhaps the most clear insight into the versatility of multilayered feedforward ANNs for function approximation came as a result of a general theorem of Kolmogorov [Hecht90] concerning the representation of continuous functions of several variables. The interpretation of this theorem in the context of multilayered feedforward network provides the assurance that continuous Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 96 mappings of an m-dimensional vector on the unit hypercube2 can be implemented exactly by . a three-layered feedforward ANN (with 2m + 1 PEs in the hidden layer) given the appropriate choice of activation and transfer functions in the hidden and output layers. In addition, Hecht-Nielsen [Hecht90] has shown that a three layered feedforward network with the correct weights is able to approximate any square-integrable function of interest to any desired degree of accuracy. However, there are no known learning algorithms that can be used to obtain reliable estimates of the weight parameters. Lapedes and Farber [Lapedes87] were able to show that at most two hidden layers are required for the representation of arbitrary, "reasonable" functions of any number of continuous arguments given "enough" PEs in the hidden layers. There are literally hundreds of commercial, public domain, and "shareware" ANN simulators in existence today each with its own advantages and limitations. A comprehensive review and comparison of the various existing ANN implementations is beyond the scope of this chapter (see, for example, the frequently asked questions (FAQ) monthly posting in Usenet newsgroup, comp.ai.neural-nets). During the early stages of this study, however, a number of these simulators were used to study the suitability of different types of ANN architectures for solving a bench-mark elementary 3-node hydraulic network. After much experimentation with a number of different architectures, it became evident that multilayered feedforward ANNs trained by error BP are the most suitable network architecture and learning algorithm for representing and modeling hydraulic networks. For the hydraulic network analysis and model calibration problems discussed in Chapter 2, multilayered feedforward networks trained by error BP appear most promising for a number 2 or any compact, i.e., closed and bounded set[Hecht90]. Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 97 of reasons: • the number of connections in a multilayered feedforward network with N = J2f=o Ni n units is n = Y^u=o ^l^l+i which is less than n = N in a fully connected network. 2 c c • Due to their layered structure, feedforward networks are ideal for representing hydraulic networks in a hierarchical framework. • The theoretical results briefly discussed above can be used as the basis for the specification of the size (i.e., number of layers and PEs per layer) of the feedforward ANN based on the number of unknowns in the non-linear system of hydraulic equations formulated in Chapter 2 and the number of nodes, pipe segments, and loops in the hydraulic network. • The performance of feedforward networks trained by error BP is relatively robust and the algorithm is well-understood and easy to implement in software, • The conventional iterative methods currently used to find zeros of the non-linear system of hydraulic equations (or the more accurate direct method outlined in Chapter 2 and summarized in Subsection 4.4.2) can be used as the basis for assessing the performance of multilayered feedforward ANNs in comparison to these well-understood methods. • The fact that the appropriate performance or error function derivatives can be calculated by backpropagating the errors has two important consequences[Hertz91]: 1. The parameter (weight) update rule is local and hence to compute the parameter change for a given connection, only quantities available at the two ends of a connection are needed. This feature makes the BP algorithm appropriate for parallel Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 98 implementation. 2. The computational complexity of multilayered feedforward ANNs is less than might Yli=o NiNi+i), the be expected. In a feedforward network with n connections (nc = c computation of the error or performance function requires order n operations. Comc putation of the n derivatives directly would require order n operations. However, 2 c the error BP algorithm allows the indirect calculation of the n derivatives requiring c only of order n c operations. In the following sections the BP algorithm is initially presented for the general case of a feedforward network with m inputs, n outputs, and L hidden layers. The presentation extends the standard PDP presentation of Rumelhart et al. [Rumel86a] from one to L hidden layers and highlights Kosko's [Kosko92a] emphasis on the stochastic nature of the learning process. A framework is then presented for integrating and interfacing the I/O to/from the two classes of network models followed by the development of a new method for the systematic specification of the number of layers and PEs/layer in the feedforward ANN based on the number of nodes, pipe segments and loops in the hydraulic network. 4.2 Error Backpropagation in Multilayered Feedforward ANNs The backpropagation algorithm is applicable to multilayered feedforward networks with m units in the input layer (/ = 0), n units in the output layer (/ = L + 1) and J2h=i Ni units in the L hierarchically ordered hidden layers or grids. In its classical form, the algorithm (referred to as the Generalized Delta Rule by Rumelhart et al. [Rumel86a]) may be best viewed as the Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 99 Figure 4.16: Illustration of the forward-backward signal propagation cycle. Note that in both the feedforward and backward signal propagation i indexes the layer from, which the signal originates and j indexes the PEs of the current layer receiving input or error signals. non-linear extension of the least mean square (or LMS) learning algorithm (also known as the Delta Rule) outlined in Section 3.4. The PEs in the input layer (I = 0) are simple fan-out units with no local weight parameters that receive inputs from the external environment and distribute them after the application of the appropriate linear or non-linear transformation function. The n processing elements in the outputlayer (/ = L +1) are the only PEs that transmit signals to the external environment. As shown in Figure 4.16, a typical PE in hidden layer /, denoted by PEj, is fully connected to the PEs of the layer to its left (I - 1) and all the PEs in the layer to its right (/ + 1)During forward signal propagation, PEj, receives input from every PE in layer / — 1 to Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 100 its left and transmits its outputs to all PEs in layer / + 1 to its right. During training, in addition to the feedforward connections, each PEj receives an error feedback signal from each of the units in layer / + 1 to its right and uses these signals to update the values of the local weight and other parameters in its local memory. Once the values of the weights have been updated, the PEj transmits values of (w'j^^Sj) (computed with the old values of the weight parameters and hence the (t — 1) subscript) to the PEs in layer / — 1 to its left from which it receives input during forward propagation. Note that unlike the forward connections, the feedback connections from a PE are not "fanned-out" copies of a single value and represent different signal values for each feedback connection. 4.2.1 The Forward-Backward Signal Propagation Cycle The information processing operations performed by a multilayered feedforward network during training consists of both a forward and a backward propagation of input and error signals through the network. The forward signal propagation starts with the presentation of the t th m-dimensional vector S{ to the input layer which transmits each and every component of the input vector to each and every PE in the first hidden layer after carrying out the appropriate transformations on the various groups of inputs. Each processing elements of the first hidden layer then calculates the net-input to each PEj as the sum-of-products of the incoming signal and the value of the corresponding weight parameter. The PE then updates its level of activation and finally transmits its outputs to all the units in the next hidden layer, I + 1. Similarly, the net-input and the outputs of each of the units in the following hidden layer are then computed and Chapter 4. DEVELOPMENT Input Layer Layer: PEs/layer: 1 =3 m Outputs: MODELS Output Layer l =L I= L + 1 h- 2m+1 rn x (2m + 1) n =3 (2m->[• 1) x n ) S{ —H < 101 Hidden Layer 1=0 Connections: Inputs: OF THE NETWORK = (IT Figure 4.17: Feed-forward ANN with m inputs, n outputs and 1 = 1 hidden layer. The number of nodes in the hidden layer is iVi = 2m + 1. The number of elements in weight matrix W is equal to the number of nodes in layer / multiplied by the number of nodes in the layer to the previous or left layer. l Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 102 transmitted to all the PEs in the following layer, and so on, until, the PEs in the output layer, PEj'* 1 calculate their net-input, update their level of activations and finally emit the n-dimensional output vector denoted by z representing the feedforward network's estimate of t the desired output (z^). Figure 4.17 depicts the forward signal propagation phase for a simple example feedforward network with L = 1 hidden layer, m = 3 units in the input layer, and n = 3 units in the output layer and summarizes the details of the various equations. The choice of the transformation functions at the input layer and the activation and output functions implemented at all the other PEs in the hidden and output layers is largely dependent on the domain of application and the specification of these functional forms in the context of hydraulic network analysis and model calibration are discussed in Chapter 5. The backward error signal propagation begins after an input vector has been presented to the feedforward net. As shown in Figure 4.18, each of the output units is then supplied with the corresponding component of the desired output vector T.\ in the training set. The subscript t indicates that the network is processing the t th random sample "drawn" from the input space and applied to the unknown relationship being approximated by the ANN and may be best viewed as discrete points in time at which the 1/0 behavior of the hydraulic network is observed. At any point in time the pattern of activation over the feedforward ANN corresponds to the activation values of the constituent PEs and may take on different values at different points in time. Although the information processing operations performed by a PE are local, the changes in the values of the state variables (or the pattern of activation) over some period of time is not described by the localized computational process of the PE. The bounds within which the values of these activations vary are determined by the form of the activation Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 103 functions. However, the processes that cause the variations in the pattern of activation over the entire network are not governed by the mathematical form of the various functions [Rumel86b]. Each PE in the output layer (L + 1) calculates a correction term, and updates the values of the its local weight parameters associated with the incoming signals. In addition, PEj'* 1 computes and transmits an error signal backwards (equal to to the PEs in the last hidden layer L. PEj's then use these values to modify their weights using w^ = it ji(t-i) + ^ ji w w a n d finally transmit the error signal (equal to f^ -i)^ft) w t back to the PEs in the hidden layer I = L — 1. The subscript (t — 1) is used to denote that non-updated weight values used in the forward pass are used for the calculation of the error signal. This process is repeated until the PEs in the first hidden layer, PEj, have updated the values of the weight parameters in their local memories. 4.2.2 Performance Measurement and Function Approximation Accuracy Feedforward ANNs trained by error backpropagation excel in learning unknown functional forms or mappings by encoding the information present in a set of T training I/O data pairs. This information processing operation may be mathematically characterized as the approximation of an unknown bounded function or mapping <f> : S C R m i — • Z C R from a compact set n S of the m- dimensional Euclidean space to a bounded subset <-A(S) of n- dimensional Euclidean space by means of T training I/O examples ( { ( s i , Z j ) , • • • , (ST,ZT)}) of the mapping, where z = <j){s ),t = 1, • •-,T[Hecht90]. As before, it is assumed that examples of the mapping, <f>, d t are generated by selecting the input vectors randomly from S in accordance with some fixed but unknown probability density function p(s). Chapter 4. DEVELOPMENT Layer: PEs/layer: OF THE NETWORK Input Layer Hidden Layer Output Layer /=0 I=L l = L+l m =3 2m+l n =3 Connections: m x (2m + 1) 104 (2m+l)xn Parameter Update: *>••' MODELS W? = W?_ ri6] z} 1+ - - **K« (zf t t t z )a% jt Figure 4.18: The error signal backpropagation phase in feedforward ANN starting with the outputlayer. Each node in layer / computes a 6 j and backpropagates unique error signals equal to w g 6 to the units in layer / - 1 to its left. The PEs in layer / - 1 (with the exception of the last hidden layer L) then compute SjJ and update local weights. l t l ld l jt 1 Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 105 During training, a set of inputs with known target or desired outputs are initially propagated forward through the network producing an ANN computed output vector. This output is a function of the input vector, s and the connection weight parameter vector, w, at the t previous time-step and is denoted by F(s 1 ,w t _ 1 ). The BP algorithm is an extension of the LMS learning algorithm outlined in Chapter 3 and minimizes a quadratic performance or error function of the difference between the ANN computed values, F(s t ,w t _i), and observations of some unknown data generating function, <t>(*t), i-e., £,(st,wt_i) =' \<f>(s ) - F(s t ,w t _!)| 2 (4.74) t I J I 2 (4.75) St is the square of the approximation error made by the feedforward ANN on the t th 1/0 pair given (i.e., fixed) the weight matrix representing all the interconnections between the units at the beginning of this time step. In general, the ANN computed values of the outputs, z< = Ft(s , W t _ i ) will not be the t same as the target or desired values zf. Assuming that w = w t _i is fixed, then as with the LMS method, for each training I/O vector pairs, the mean squared error function of the ANN over all the T I/O pairs in the training sample can be expressed as: 1 S(w) = lim — K ' T^OO T T fr{ S (4.76) t which is simply the expected value of £(w) for a large enough sample size T, The sum of the square of the approximation error may be written as: S t = Ms )-F(8 ,w -i)\ 2 t t t Chapter 4. DEVELOPMENT OF THE NETWORK A MODELS 106 3=1 The average error over all T training I/O vector pairs is defined as £( ) = ^ E E ( 4 - ^ ) w 2 t=i j=i ' (- ) 4 78 with the factor | has been introduced in the literature for convenience and does not affect the derivation. Clearly, £ t and £(w) are > 0 since they are the sum of non-negative quantities and hence the error surface of the feedforward ANN, is "above" the weight space and is defined by € = £(w) in the (l+J2i=o NiNi+i) dimensional space of vectors (w, £(w)). The dimension of the parameter vector w, n = Yli=oNiNi+i,"\s the total number of connections in the feedforward c ANN with m inputs, n outputs, and L hidden layers. Thus, £(w) defines a non-negative error in the nc-dimensional space, R" , for each w, and represents the ANNs average squared error c of approximating the function <j> over all training I/O vector pairs. The BP algorithm prescribes how to modify the weight vector w in light of training I/O vector pairs so that £(w) will decrease towards a minimum of the error function resulting in an ANN that will improve its ability to approximate the unknown function <f>. Typically the global minimum of the error will not be zero since the feedforward net is unable to exactly implement the desired function mapping. The algorithm improves its estimate of the desired output by varying the weight parameters in a manner that reduces £ t as fast as possible with the objective of finding a weight vector w* for which £(w*) = £m,-„ > 0. Slightly different results are obtained depending on whether the basic gradient descent search for the global minimum of the ANN error is carried out in the weight space on the basis of £ t or £. In the former case, the corrections to the weights are made after each and every Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 107 presentation of an I/O vector pairs in the training set. In other words, function information is extracted and encoded by changes made to the values of the weight vector one I/O vector pair at a time. This sequential procedure for minimization of the approximation error, St, is not the same as the minimization of S which is characteristic of statistical least squares and regression models[Pao89]. Procedures for minimization of the total sample error S ("off-line" or batch mode) modify the weight vector once after the presentation of the sequence of T I/O vector pairs. 4.2.3 Derivation of the Basic Self Adaptation Equations To a large extent, the versatility of the BP algorithm is due to its general adaptive model of the learning process that allows arrangement and formation of patterns of connectivity between individual units by adaptive weight modification. The power of the BP algorithm lies in its ability to organize the PEs in the hidden layers to act as "feature detectors" that respond to specific features in the input vectors. The special nodes in the input layer either "fan-out" the impinging signals unchanged or apply some specific transformation function to the various classes of inputs before distributing the transformed signal (see Chapter 5 for examples of these signal transformation functions). With the exception of the special PEs in the input layer, all other PEs in the hidden and output layers (PEj, for j = 1, • • - ,m and / = 1, • • -L + 1) first perform a sum-of-product calculation to determine their net-input. For example, in the first hidden layer, for j = l,---,Ni, PEj Chapter 4. DEVELOPMENT.OF THE NETWORK MODELS 108 calculates: «=i and dx) t ~ it T S dw Jt or more generally, and dx) t This sum-of-product calculation is then followed by the evaluation of some suitable activation function followed by the evaluation of an often linear transfer function: where a] and fj are sigmoid, monotonically increasing, and differentiable functions implemented by PEj. Note that _ i _ and since v~i — 1, dzjt _ dzj dyj _ dyj t )t dx yjt jt d dx t t )t dx It is often assumed that both the form and parameter values of these functions are the same for all the units in some or all the hidden and output layers. Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 109 The net-input to any node j in the hidden and output layers (/, / = 1,---,L+ 1) may be computed by: 'jt = J2 v'iidT x 1 and 4f = f}(iit) = i6t Note that \t _ l—l dx "it and 'jt dx' dz jt = *Vjt dx' jt The expression for Et computed after the presentation of a single I/O training pair may be written as: (zjt = zf +l t with the layer superscript) In theory, convergence towards improved values of the connection weight parameters can be achieved by applying incremental changes to Awji that are proportional to the rate change of S with respect to the value of the weight parameter Wji associated with any connection to t PE j from unit i. In other words, Aw jit =- V ^ - (4.80) Chapter 4. DEVELOPMENT OF THE NETWORK MODELS where n is referred to as the learning "rate" parameter. Note that Awj^t+1^ 110 — Wj^ ) t+1 — WJH, or equivalently, AWJH = WJH — Wj,(t_i) depending on the convention used. This chapter uses the latter convention for convenience while Chapters 2 and 5 use the former. The partial derivative AWJH = —V^ffi can be evaluated by iterated application of the well-known "chain" rule from differential calculus ji dw jt dx \ dwji) (4.81) Defining the weight correction as 6j — t the apphcation of the chain rule as shown above results in: 6 = {zft - z)t)f;{a)(x l]t)) l jt (4.82) which can be used to write a compact expression for calculating the amount by which to modify the value of each weight parameter for the output layer: Awf+ l = nSf+'zJl (4.83) Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 111 The above expression can only be used to calculate the values of the weight parameters associated with the output layer with / = L + 1 since for the PEs of the hidden layers, the desired or target values of PE j , i.e., zjf, are not known in advance. For the PEs in the hidden layers, l t the expression for Aw j may be derived as follows: Aw j = ^Jj^iS l l it d£ dw' t = j{ _ it dS dx' dx' \ [dw'jj t jt jt dS ( d dx< <w, ,=i h "jt t ' dZt "*jt dz[ dzL dxL t Jttt j-i i=l = »=i E'^W'c^)))* 1 ; (- ) 4 84 The change to be apphed to the parameters are thus Aw l 4.3 = 6 z£ l jit V l jt (4.85) Integrating Hydraulic Network Models with Feedforward ANNs Supervised training methods, such as the error backpropagation and the LMS algorithm outlined in this and the previous chapter all require a "teacher" that can correctly generate or Chapter 4. DEVELOPMENT OF THE NETWORK ANN Outputs? + ANN MODELS ? 112 Desired Outputs WDN Figure 4.19: Unknown source of data for supervised training. The desired WDN model output may be obtained from field measurements of pipeflowratesand nodal energy heads represents the desired or target WDN model outputs. The desired ANN outputs (i.e., the control inputs to the WDN) so that it can follow the desired n-dimensional equilibrium WDN output trajectory are not known but are needed to train the multilayered feedforward ANN by error backpropagation. sample a "sufficient" number of training I/O data pairs representative of the stimulus-response behavior of the system under typical operating conditions. Similar to the choice of the network architecture and learning algorithm, the decision on the source and size of the training data are important design decisions and largely dependent on the specific details of the problem under investigation (see Chapter 5). To illustrate, suppose that in the context of the hydraulic network model calibration problem outlined in Chapter 2, a feedforward ANN must be configured and trained to correctly compute the WDN model parameters given incomplete sampled data (field observations) at a limited number of points in the distribution system. These sampled data, represent the desired or target outputs that the WDN model must try and match. For example, the surrogate objective of hydraulic network model calibration is on-line Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 113 ANN WDN Calibration Method WDN Figure 4.20: Interfacing multilayered feedforward ANNs with existing WDN models and Calibration Methods. During backpropagation the WDN model calibration method supplies the feedforward ANN with training data. The input vectors to both the ANN and the model calibration method are supplied by the WDN model successively at each cycle. control of a water distribution system by matching the measured values of the various state variables to those prescribed by the hydraulic network simulator and to be followed by the WDN or plant. As shown in Figure 4.19 the source of data needed to train feedforward ANNs to perform this class of tasks is not immediately obvious. This problem arises due to the difficulty associated with establishing the source of the desired or target ANN model outputs needed to train the feedforward network by error BP. Clearly, if the target ANN outputs (i.e., WDN inputs) needed to achieve the desired WDN outputs were known, then both the hydraulic network model calibration and the surrogate control problem would be solved. Figure 4.20 is a schematic of how the I/O to/from the ANN may be interfaced with the Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 114 ANN WDN Figure 4.21: ANN Training scheme for learning the desired WDN output. During training, the multilayered feedforward ANN uses the outputs generated by the hydraulic network simulator as the taTget output trajectory of the WDN to follow. If training is successful, then the ANN has in effect learned to duplicate the WDN model output for the same range of input values. I/O to/from any existing WDN model calibration methods to duplicate the results of that particular method. During training, the desired/target ANN output for a given input is simply set to the output produced by the calibration method for the same input. This scheme has been applied by Widrow to a version of the pole-balancing problem [Widrow85] and may be viewed as a method for constructing an "expert" system by encoding the knowledge acquired from an existing "expert" method in the parameter values of the ANN[Barto89]. In this way, training data directly provided by the human engineer can be implicitly encoded in the connection weight parameters of the ANN. With enough training, such an ANN will gradually outperform any existing calibration method since it can incorporate training data supplied by different methods. Although the usefulness of this configuration in light of an existing WDN model calibration Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 115 method may be questioned, this training scheme may prove more efficient in situations where the use of existing calibration methods based on optimization or other methods are available but impractical or cumbersome to perform every time. In addition, this technique provides a basis for comparing the ANN computed values with those computed with an existing model and the assessment of the performance of some specific ANN implementation. Figure 4.21 suggests a scheme for supplying training data to an ANN for the purpose of learning the correct WDN model outputs (as defined in Chapters 2 and 5) in conjunction with estimates of the outputs based onfieldobservations of flowrates and pressures at a small number of points in the hydraulic network. At each step during the training phase, the input and output lines of both the ANN and WDN are the same. The configuration above only depicts the situation where the ANN is mapping from the WDN model inputs to outputs, but more complex types of configurations, such as delayed or transformed inputs to the ANN, may also be specified in a similar manner. Figure 4.22 shows how an ANN can be configured to learn the appropriate WDN model inputs. In contrast to the configuration shown in Figure 4.21, in this scheme, the input to the ANN is the output from the WDN model and the target output of the ANN is the WDN model input. Note that only during training is the WDN model input supplied to the ANN as the desired/target output. This general configuration has many applications in the context of design, analysis and on-line operation of WDNs. Chapter 4. DEVELOPMENT 1 OF THE NETWORK -** MODELS 116 ANN WDN Figure 4.22: ANN Training Scheme For Estimating WDN Model Parameters. During training, the ANN uses the inputs to the hydraulic plant as target/desired outputs and the target/desired outputs of the plant as inputs. The ANN here is trained to identify the inverse of WDN model. 4.3.1 Mapping Hydraulic Networks to Feedforward ANNs In order to specify the most suitable ANN architecture for representing hydraulic networks with given invariant geometry, it became necessary to first decompose the networks into a set of static and dynamic components and then proceed withfindingthe appropriate mapping between the corresponding sets of static and dynamic components of the two classes of networks. A methodology was needed to assist with the systematic mapping from the basic topology of the hydraulic network (specifically, the number of nodes, pipe segments, and loops) to the appropriate feedforward topology (i.e., number of layers and PEs/layer). The advantages of such a mapping method is that the same multilayered feedforward ANN representation of the hydraulic network can be used to study different classes of problems associated with the design, analysis, and operation of water distribution systems, as defined by arbitrary groupings of the I/O variables and parameters. Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 117 In the remainder of this subsection, two methods are suggested for the determination of the number of nodes, layers, and nodes per layer in a feedforward ANN based on the number of nodes, pipe segments, and loops in the hydraulic network. The first method utilizes Kolmogorov's Mapping Neural Network Existence Theorem that establishes an interesting proof regarding the requirements for the representation of arbitrary continuous functions from the m-dimensional cube [0, l ] m to R using feedforward ANNs [Hecht90]. n Kolmogorov's Mapping A N N Existence Theorem Given any continuous function <f>: [0, l ] i—• R , <f>(s) m n = z, <j> can be implemented exactly by a three-layer feedforward neural network having m fanout PEs in the input layer, (2m + 1) PEs in the middle hidden layer, and n PEs in the output layer. The processing elements in the hidden layer of the Kolmogorov mapping network implement the following general form of transfer function[Hecht90] m 2}=EAM*+i0+J t=i (4-86) V> is a continuous real monotonically increasing function which is independent of <j) (but dependent on m), A is a real constant, and e is a rational number 0 < e < S, where 6 is any arbitrarily chosen positive constant. The n units in the output layer implement the following transfer function: 2m+l 4 = E /;(*) (- ) 4 87 where the fj (j = l,---,n) are real continuous functions dependent on <j> and e. Unfortunately, there are no specific examples of the functions xf) and'/j and the constant e are known. In addition, there are no constructive methods for their specifications (although arbitrarily Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 118 chosen functions can be easily verified) and hence the above theorem is strictly an existence theorem[Hecht90]. Although there are no procedural methods yet available that can be used to find the exact form of the various functions implemented at the PE's of such a network, it provides a valuable heuristic for network sizing and the assurance that any continuous function mapping from vectors on the unit cube can be represented by a feedforward ANN with exactly one hidden layer[Hecht90]. Kolmogorov's theorem may be used to specify the topology (or more specifically, structure) of a feedforward network with one hidden layer, by simply setting the number of PEs in the hidden layer to 2(iV/l -f Sh) + 1 where Nh is total number of sink and source (k + r) nodes and Sh is the number of pipe segments (s) in the hydraulic network. Although Kolmogorov's theorem proves that one hidden layer is always enough and an appropriate architecture must exist, when modeling real-world hydraulic network problems, it may be necessary to have more than one hidden layer. This is due to the fact that a feedforward ANN with one hidden layer could require an unreasonable number of PEs and connections. An adequate solution can often be obtained with smaller numbers of PEs per layer and total connections. An integer linear programming formulation is presented below for the specification of the topology of feedforward ANNs with more than one hidden layer based on the number of nodes, pipe segments, and loops in the hydraulic network. This formulation is based on the following assumptions: 1. The number of hidden layers in the ANN (L ) is equal to the number of non-overlapping n Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 119 loops in the hydraulic network (Lh), i.e., L = L = Lh n 2. The number of PE's in a layer is greater than or equal to the sum of the number of nodes and pipe segments in the loop corresponding to that layer multiplied by 2, plus one. 3. The total number of connections in a multilayered network with more than one hidden layer is less than or equal to the number of connections in a feedforward network with only one hidden layer. With these assumptions, the objective function and the constraints of the linear program may be written as: L Minimize: N = N + • • ••+ N n 0 L+1 = N + £Ni i=i 0 +N L+1 (4.88) subject to: 1. N = m 0 2. Ni > 0; / = 0,---,L + 1 3. N L+1 =n 4. Ni < 2(ne + ri + s{) + 1 where rn, re, and se are the number of sink nodes, source nodes, and pipe segments in the corresponding loops of the hydraulic network respectively. 5- Efco l i+i N N < m(2m + 1) + (2m + l)n = (2m + l)(m + ») It is noteworthy that the number of PEs in the various hidden layers is constrained by the number of nodes and pipe segments in the corresponding loop and is only indirectly dependent on the number of inputs/outputs to the overall network. Constraint 5 above ensures that the Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 120 number of connections in a feedforward ANN with more than one hidden layers is less than or equal to the number of connections in a net with a single hidden layer3. 4.4 Implementation of the Network Simulators The primary objective of developing the ANN and WDN network simulators was to enable efficient representation of the two classes of network models and provisions for interfacing their I/O during training. In general, the implementation goal and the design philosophy were to ensure modularity, portability, and re-usability of the various data structures and procedures while at the same time ensuring that each model simulator could be implemented and tested independently. An important characteristic of the two classes of networks under consideration in this study is their hierarchical structure. Efficient implementation of WDN and ANN networks in software required the specification of the various data structures needed to represent (in conventional computer memory) the underlying topology of the two network classes and the various functional relationships between their constituent components. One way to meet the above requirements is by the provision of some standardized descriptive definition of the various network classes and a set of facts describing the relationships between the components of these classes and among the constituent components within a given 3 Note that the objective function may also be expressed as: L Minimize: N n = N +a N 0 1 x H OCLN l + N +i L = N + ^ogNi 0 + N +i L with ai is the closest integer proportional to the total volume of the various pipe segments in loop £. (4.89) Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 121 class. This organizational scheme allows knowledge about each class of network to be represented by making use of two basic concepts: the existence of identifiable standard network components making up a particular network class and the existence of relations between the various components. In this way, interrelated knowledge about the structure of the two network classes, their idealized constituent components, and their physical and abstract configuration in space and time can be easily retrieved and interpreted. Efficient memory management required the identification of a set of common data structures and procedures used by the various components of the two network simulators. The objective was to minimize the interaction and overlap between the various data structures used to represent the two independent classes of networks, while at the same time maximizing the logical grouping of functions within a given class and between the two network classes. Within practical constraints, an attempt was made to ensure that each simulator: • is efficient in execution time, • is efficient in terms of the use of conventional memory, • is relatively easy to use and interface with other application software, • is sufficiently general to allow the simulation of different types of ANN architectures with a minimum amount of modification to the programs. 4.4.1 Measures of A N N Performance The performance and capabilities of software or hardware implementations of ANN architectures in general and feedforward networks trained by error BP in particular, may be assessed based on Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 122 the comparisons of capacity, speed, accuracy, generality, configuration and interface provisions, and cos£[Hecht90]. I. Capacity The most common measure of the capacity of an ANN implementation (in hardware or software) is the total number of networks of specified types and sizes that it can store for immediate online use. This measure of capacity is very important since there are a number of applications that require the integration of multiple network architectures. Other measures of capacity include the maximum number of processing elements and inter-processor connections. Each processing element is required to store, for example, one real-valued floating-point weight parameter for each connection (associated with the incoming signals), one floating-point value for the activation value of the node, and possibly one value for storing the previous output signal value. II. Speed Speed, measured in elapsed real time, is a measure of how fast an ANN of specified size can be moved forward in "network time" (as defined by the various functions implemented at the PEs) by one unit time step. Speed of an ANN implementation is hence determined by how fast the various PEs can all be either updated or moved forward in time either by one network time unit or by the time it takes for the network to undergo one complete cycle during the training phase. For example, in a feedforward network trained by error BP, each training cycle consists of a forward pass to compute the network output and a backward pass to adjust the Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 123 weights. Commercial and "public-domain" or "shareware" ANN implementations are often benchmarked against a feedforward network with a given number of layers and PEs per layer so that the speed measurements can be normalized[Hecht90]. The total number of weights that can be updated in one second is often used to measure the speed of an ANN implementation in units that can be applied to a wide range of feedforward network sizes. In the literature, two speed measurements are often reported - one for the forward pass and one for the combined forward and backward passes. There is currently no well-established, non-empirical methods for measuring and comparing the speed of different feedforward networks trained by error BP. Small differences in the learning parameters can result in learning times that differ by an order of magnitude when implementing essentially the same learning algorithm and solving the same problem [Fahlman88]. III. Accuracy The accuracy of an ANN implementation is a measure of performance often determined by assessing the numerical accuracy of the ANN computed outputs given the specific requirements of the problem under investigation. The accuracy of feedforward networks trained by error BP can be measured using the root mean squared error calculated with both the training and testing data sets (of size T and K respectively). Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 124 IV. Generality Most ANN simulators are suitable only for implementing specific types of ANN architectures (and the corresponding learning algorithms). There are however a few simulators that implement several different architectures and learning algorithms. For a specific list of known architectures, generality of an implementation may hence be viewed as the measure of whether the ANN implementation can or can not be used to simulate the various architectures. The ANN network simulator used throughout the experiments is limited to feedforward ANN architecture and the BP learning algorithm. V . Configuration Provisions For an ANN implementation to be useful, it must be configured to "run" some desired architecture "on the fly". There are two basic approaches that have been used in practice. First, the most popular ANN implementations can best be realized by means of "off-the-shelf" hardware components and software modules that have been customized to ensure the maximum possible execution speed. However, when the suitability of standard ANN architectures and the corresponding learning algorithms are being contemplated for the solution of specific engineering problems, it is essential that the candidate ANN architectures and the corresponding algorithms be initially implemented and tested (entirely) in software. VI. Interface Provisions For many engineering applications, such as hydraulic network analysis and model calibration, it is essential for conventional application programs to be able to call ANNs as "subroutines". Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 125 Hence, the simplicity and ease-of-use with which the application software can be interfaced with the ANN is of great importance for many practical applications and requires careful consideration of a qualitative, subjective set of interface provisions. The interface between the two network simulators is strictly by the use of external files. VII. Implementation Costs Cost is perhaps the most difficult performance measure that must be estimated in a reasonable manner. There are a number of approaches that may be used to assess the cost of a particular ANN implementation. The approach that incorporates all the various costs involves the calculation of the delivered capability cosi[Hecht90]. Given an ANN of a certain architecture and size that is activated from software, the delivered capability cost is the total cost of running the network through some specified number of update iterations [Hecht90]. In addition to the initial cost of design specifications, this cost includes the purchase price of the ANN software and/or hardware components; the installation costs; system administration and maintenance costs, and so on. The running time of the ANN for which the delivered capability cost is being determined is then divided by the total estimated running time of the ANN over its useful life (i.e., the time during which the neurocomputer will actually be running the desired ANN architectures). The total cost of the ANN is often multiplied by this fraction to obtain an estimate of the delivered capability cost for the implementation. As pointed out by Hecht-Nielsen[Hecht90], the advantage of estimating the delivered capability cost is that it includes the initial cost, the projected maintenance costs, and any savings that may result in a multi-user environment [Hecht90]. Chapter 4. DEVELOPMENT 4.4.2 OF THE NETWORK MODELS 126 The W D N Model Simulator The primary purpose of the hydraulic network simulator is to generate appropriate training data for the feedforward network simulator briefly described in the following subsection. Data generation begins by processing data initially read from a hydraulic network definition file containing the number of nodes, pipe segments, number of training/test I/O data pairs in the various data set to generate, the underlying probability distribution to use, and so on. The network definition file also contains other information for each node such as the connectivity information needed to calculate the lengths of the various pipe segments; the mean and standard deviation of the friction factor for each segment followed by estimates of the required energy heads at the various nodes and their heuristically estimated standard deviations. The processing of the "raw" hydraulic network data involved the computation of pipe and externalflowratesbased on random generation of the values of the friction factors and energy heads (see Chapter 5 for details). As pointed out in Chapter 2, this approach to data generation is computationally efficient since it avoids the need for the conventional iterative solution of the non-linear system of equations because a direct solution of the non-linear system of equation (formulated with the pipe segment and externalflowrates as the unknowns) is now possible. The various generated and computed values were then grouped together for the three specific classes of problems and the I/O data was then subject to a number of transformations which were finally normalized before being used to train and test the various ANNs (see Chapter 5 for details). Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 127 Summary of the Data Generation Algorithm The hydraulic network simulator generates training and testing I/O data using the following procedure: begin II: f o r ; = 1,- - ,Nh, do 1.1. get the time seed 1.2. randomly generate nodal energy heads (using mean and variance) end for i = 1, • • - ,Sh, do 2.1. get the time seed 2.2. randomly generate friction factors (using mean and variance) 2.3. calculate the pipe segment loss factors K s 2.4. calculate the pipe segment flowrate 4 end for j = 1, •••,iVfc, do for i = 1,-- -,N (i / j), do h 3.1. if node i is connected to node j, then do add flow in segment connecting node i and j end *The standard Darcy-Weisbach equation was used to directly calculate the pipe segment flowrate qi by rearranging the following equation: A£. A £ ' _ f ~ = i i i l 'd 2g fr t "Ti(jt) Tg f _ 8/riij 2 = K} iq (4.92) where f i is the generated value of the friction factor for segment i, li and di are the pipe length and diameter and vi = qt/ni is the average velocity of the fluid through the pipe, a; is the cross-sectional area of the pipe segment. r Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 128 end end re-arrange values into the desired groups of I/O vector pairs (see Chapter 5). end II . 4.4.3 ,( • ' The A N N Model Simulator The ANN model simulator also processes an external data file containing the feedforward network's definitions such as the number of nodes in the input layer and the target layer(s) to which it is connected, the number of nodes in the output layer and the source layer(s) from which it receives inputs, followed by the number of nodes in each hidden layer and the destination layer(s) to which they output. Once a network definitionfilehas been processed, the ANN simulator initializes the various weight vectors between layers, prompts for the values of the momentum and learning rates to assign (offering default values), the name of thefilecontaining the set of training 1/0 data pairs, the number of cycles to run the simulation for,filesto store the values of root mean squared (RMSE ) t and maximum errors in, whether to propagate test data through the network as learning evolves, and so on. The ANN simulator allows weight adjustment to be made either after a complete sweep over the entire set of training I/O data pairs has been made (off-line or batch mode) or after a single I/O pair has been propagated through the network (on-line or real-time mode). All the feedforward networks outlined in the next chapter were trained on-line. Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 129 Summary of the Error Backpropagation Algorithm The standard error backpropagation algorithm for any feedforward ANN with L hidden layers supplied with a set of T training I/O vector pairs: (si,zi), • • •, (ST, ZT) is summarized below. Chapter 5 outlines more specific details in the context of the experiments reported therein, begin II: initialize weights to appropriate random values for t = 1, • • •, T I/O vector pairs do: 1.1. for j = 1, • • - ,iVo(m) units in the input layer do: 1.1.1. apply the transformation function to input s t jt z = Hj( «) s end 1.2. for / = 1, • • •, L hidden layers do: 1.2.1. for each PEj in each layer do: 1.2.1.1. compute the net-input to the PE jt X — 2^»=0 jit it W z 1.2.1.2 update the level of activation of the PE ?}t = «it(*Jt) 1.2.1.3. evaluate the transfer function and output from the PE \t = z flMt) end end 1.3. for j = 1, • • -NL I + = n in the output layer then do: 1.3.1 compute the net-input values to the unit jt x — 2^,=0 ji w it z 1.3.2 update the level of activation of the PE 1.3.3 evaluate the transfer function and output from the PE 1.3.4 compute the error delta terms using 1.3.5 compute changes to be made to values of weight parameters Chapter 4. DEVELOPMENT V>j OF THE NETWORK j(t-l) - jt W t MODELS 130 jt no z 1.3.6 back propagate error signaltft' ^i[t-i) 1 w end 1.4. for I = L+ 1,--,1 do: 1.4.1 for j = 1, •••,#/ do: 1.4.1.1 update the weights 1.4.1.2 back propagate error signal S jt j%(t-i) l wl end end 1.5. if RMSE t = yJ^iZj-MtJ*? < m S E d t h e n stop else goto step 1 end end II 4.5 Specific Application Problems The numerical experiments outlined in Chapter 5 have been designed to study the convergence and performance of feedforward ANNs for modeling three characteristically different classes of hydraulic network problems using the suggested representation and training schemes developed in this chapter. 4.5.1 Problem 1A: Hydraulic Network Analysis with Complete Data Conventional hydraulic network models provide a steady-state solution of the non-linear system of equations for the s unknown pipe segment flowrates (q ), the A; energy heads (Efc) at the s Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 131 Problem 1A ANN Qfc K WDN Simulator s Er Figure 4.23: Interaction between the network simulators during training (Problem 1A). The output of the ANN is the n = (k + r + s)-dimensional vector z = [E^, q j , Q,J] and the input vector to the ANN is the m = (k + r + s)-dimensional vector s.= [ Q ^ , K j , E ^ ] t . The various input and output vectors are subscripted by their dimension, i.e., E r & Q r are vectors in R r , Ejt & are vectors in R fc , and, q & K are vectors in R a . The ANN is trained to approximate the function from a closed and bounded subset of R m onto R n with m = n = k + r + s. T a s sink nodes, and the r external flowrates (Qr) at the source (reservoir or tank) nodes5. The aim of the numerical experiments for this problem was to evaluate the relative performance of a number of feedforward networks trained by error backpropagation to effectively find the unique solution to the system of non-linear hydraulic equations at which F(z*) = 0 (see Chapter 2 for details). Note that unlike the existing WDN models that solve a reduced system of equations (for example, for the nodal energy heads) and then use the known.energy-loss/flowrate equations to determine the remaining unknowns, the feedforward ANN are configured to solve the basic (and larger) system of equations involving all the unknowns simultaneously. It is noteworthy that a multidimensional root finding problem has been effectively solved As shown in Figure 4.23, during training the feedforward ANN, for a hydraulic network of a given topology, the solution of the (k + r + s) unknowns (outputs from the WDN simulator) requires estimates of (fc + r + s) network parameters, namely, the.s pipe resistances (R ) or loss factors (K ), the k external demand flowrates (Qfc) at the sink nodes, and the r energy heads (E ) at the source nodes (inputs to the WDN model). 5 3 r s Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 132 Problem IB Efc ANN Qfc Qr .Eg WDN Simulator qf Qr d Figure 4.24: Training scheme for learning model outputs with partial inputs (Problem IB). Note that as in Figure 4.23, the output of the ANN is the n = (k + r + s)-dimensional vector z = [E^, qj, Qj] . However, as shown above, during training, the values of pipe flow rates are T not a part of the now (k + r)-dimensional input vector to the ANN and s = [Qf, E;T] . As before, the various input and output vectors are subscripted by their dimension (i.e., E r &; Q r are vectors in R , Ejt & Qjt are vectors in R , and, q is a vector in R ) , however, during error backpropagation, the ANN is not supplied with values of the pipe loss factors, K . The ANN is trained to approximate the function from R to R with m = (k + r) and n = (k + r + s). T r fc a s s m n using gradient search in the weight parameter space of feedforward nets as outlined previously. As pointed out by Press et al., multidimensional root finding is relatively harder than gradient search since the components of the gradient vector are not independent, arbitrary functions and obey integrability conditions that are highly restrictive[Press89]. The error BP algorithm essentially finds a minimum by "sliding downhill" on a single error surface over the weight space and therefore the criteria for moving towards the minimum is onedimensional. There is no analogous one-dimensional criteria for finding a multidimensional root since moving "downhill" now means moving downhill simultaneously inm = n separate function spaces with no simple rule as to how much progress in one dimension should be sacrificed for Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 133 Problem 2A ANN Ki •E WDN Simulator fc Qr Figure 4.25: Interaction between the network simulators during training (Problem 2A). The output of the ANN is the n — (k + r + .s)-dimensional vector z = [ Q , K.J, E ; T ] and the input vector to the ANN is the m = (k + r + s)-dimensional vector s = [Ej, q , Qj] . As before, the various groups of input and output vectors are subscripted by their dimension, i.e., E &: Q are vectors in R , Efc & Qfc are vectors in R , and, q & K are vectors in R . The ANN is trained to approximate the inverse of the function represented in Figure 4.23 from R to R with n = m = k + r + sas before. T T T T r r fc s s r s n m progress in another6. 4.5.2 Problem IB: Hydraulic Network Analysis with Limited Data In order to investigate the more typical real-world situation where reliable estimates of pipe resistances are not available, a number of ANNs were trained to compute the s pipe flowrates (q ), the k energy heads at the sink nodes (E^), and the r source node external inflows (Q ) r s given only the energy heads at the r source nodes (E ) and estimates of the range of the k r demand discharges at the sink nodes (Qfc) (see Figure 4.24). As pointed out by Press et al., a popular method often used in multidimensional root finding is to construct a positive definite function with a global minimum of zero (exactly at all solutions of the original system of nonlinear equations) by adding up the sum of squares of the individual functions[Press89]. As with the error backpropagation algorithm, the main problem with the various minimization algorithms is due to the existence of local minima on the function surfaces indistinguishable from the global minima. 6 Chapter 4. DEVELOPMENT 4.5.3 OF THE NETWORK MODELS 134 Problem 2A: Hydraulic Network Model Calibration with Complete Data The second class of problems investigated experimentally was the hydraulic network calibration problem initially modeled as the inverse of the hydraulic network analysis problem described above. As shown in Figure 4.25, during training, the inputs to and the outputs from the feedforward ANNs were reversed. The goal was to train the ANN to compute the s pipe resistances (or loss coefficients), the k external demand outflows, and the r source node energy heads (i.e., the (k + r + s) parameters of the hydraulic network model) given only estimates of the range of design values in which the s pipeflowrates,the k energy heads, and the r external inflows may he (see Figure 4.25). The aim of the experiments was to evaluate the performance of feedforward networks trained by error BP for finding the solution of the nonlinear system of hydraulic equations written with the (k + r + s) parameters as the unknowns. Note that while the hydraulic network analysis may be viewed as system identification, hydraulic network model calibration may be viewed as the identification of the system inverse. In both cases, training information was obtained by observing the I/O behavior of the plant (i.e., the WDN model simulator) under a typical range of operating conditions and then interfacing this I/O with the I/O of the feedforward ANN as suggested by Figures 4.23 and 4.25. The trained feedforward ANNs for learning the hydraulic network model or plant inputs hence implements a mapping that is the plant inverse and can be used for surrogate control purposes. Chapter 4. DEVELOPMENT OF THE NETWORK MODELS 135 Problem 2B ANN Qfc Efc WDN Simulator Qr Figure 4.26: Training scheme for learning model parameters with partial input (Problem 2B). Note that as in Figure 4.25, the output of the ANN is the n = (k + r + s) dimensional vector = [Qk > J ' r H - However, as shown above, the values of pipe flow rates are not a part of the input vector to the ANN during training and s = [Ej-f, Qj] . The various ANNs were trained to successfully approximate the mapping from R to R + . z K E T T f c + r 4.5.4 fc+r s Problem 2B: Hydraulic Net Model Calibration with Limited Data Similar to the hydraulic network analysis problem, additional experiments were also carried out to investigate the more typical real-world situation where only measurements values for a subset of the state variables are available as inputs during training. Specifically, a number of ANNs were trained to compute the s pipe loss factors (K s ), the k demand discharges (Qjt), and the r source node energy heads (E r ) given only the externalflowratesat the r source nodes (Qr) and measurements of the k energy heads at the sink nodes (Efc) (see Figure 4.26). 4.5.5 Problem 3: Learning Arbitrary Functions This problem investigates the convergence and performance of feedforward ANNs configured for learning implicitly defined mappings between arbitrary groups of variables and/or parameters of Chapter 4. DEVELOPMENT OF THE NETWORK 136' MODELS Problem 3 ANN Qfc WDN Simulator K Figure 4.27: Interaction between the network simulators during training (Problem 3). The input vector to the ANN is the m = (k + r)-dimensional vector s = [Qj,Ej] and the output vector is the n = s-dimensional vector z = K . The ANN is trained to approximate the function mapping from R to R (for the 7-Node/1 l-Segment/5-Loop network the mapping successfully learned was from R to R ) . T s f c + r s 7 11 the hydraulic network model. Three specific examples in this class of problem were considered. The first two example problems considered have already been discussed as special cases of hydraulic network and model calibration with incomplete data (i.e., problems IB and 2B, see Figures 4.24 and 4.26 for training scheme). The example in this class of problems investigated experimentally attempted to test the ability of a feedforward network to learn the underlying relationship between the pipe loss factors as a function of the source node energy heads and sink node demand discharges, i.e., K a = <^>(Qfc,E). The practical goal was to train feedforward networks to estimate pipe loss r coefficients (K ) given only measurements of the r energy heads at the source nodes and the k s demand flowrates at the sink nodes (see Figure 4.27) with minimum amount of data. Chapter 4. DEVELOPMENT 4.6 OF THE NETWORK MODELS 137 Conclusions This chapter has presented a new and general approach for hydraulic network analysis and model calibration with both complete and partial data. In summary, 1. the basic error backpropagation algorithm was described and the self-adaptation equations were derived, 2. a number of training schemes that interface the I/O to/from the two class of network model simulators during training were outlined, 3. two methods for mapping arbitrarily complex hydraulic networks to their equivalent feedforward ANN representation were suggested which utilize the Kolmogorov's feedforward ANN existence theorem for the specification of the number of layers and nodes per layer, 4. the algorithms used to implement the WDN and the ANN network model simulators were presented and three specific classes of problems have been proposed as the basis for experimental verification of the hybrid model. The questions that must now be answered is "how well do the feedforward ANNs solve the assigned class of problems?" In other words, how well do the feedforward ANNs configured as prescribed in this chapter and trained by error BP approximate the unknown underlying functions with limited and possibly noisy training data sets? The following chapter will focuss on these question and outlines the details of a fairly comprehensive set of numerical experiments carried out to verify and compare the convergence and performance behavior of feedforward ANNs in the context of two specific hydraulic network examples. Chapter 5 E X P E R I M E N T A L VERIFICATION A N D ANALYSIS OF RESULTS 5.1 Introduction The suitability of multilayered feedforward ANNs for representing arbitrarily complex hydraulic networks and the generality of the error backpropagation algorithm for function approximation from sample input/output data have been discussed in detail in the previous chapter. A hybrid framework was developed for integrating and interfacing the I/O to/from the two classes of network simulators during training and testing stages. In addition, two possible methods were suggested for mapping hydraulic networks of given topology to their equivalent multilayered feedforward ANN representation. The objective of this chapter is to describe and summarize the key results of two groups of numerical experiments carried out to assess the versatility of the suggested representation and training schemes for solving the 3 different classes of hydraulic network problems detailed in Section 4.5 of the previous chapter and summarized below. Problem 1: The hydraulic network analysis problem outlined in Chapter 2. This problem may be viewed either as: 138 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 139 1 A : the solution of a system of nonlinear hydraulic equations with the number of equations (or known parameters, i.e., the (k + r + s) estimates of external demand discharges at the sink nodes, energy heads at the source nodes, and pipe segment loss factors) equal to the number of unknown state variables (i.e., the (k + r + s) nodal energy heads at the sink nodes, external flowrates at the source nodes, and pipe segment flowrates), ' IB: the solution of an underdetermined system of nonlinear hydraulic equations with (k+r+s) unknowns as before but only with the knowledge of (k+r) model parameters (i.e., without the estimates of the s pipe loss factors). Unlike the conventional iterative solution techniques that require a formulation with the same number of equations as the unknowns, and, "close-enough" estimates of the final solution vector in order to guarantee convergence, feedforward ANNs were trained to find the zeros of the system of equations starting with only random initial values of the interconnection weights. Problem 2: The hydraulic network model calibration problem also described in Chapter 2, which, depending on the number of unknown parameters, may be viewed either as: 2A: the solution of a system of nonlinear equations formulated with the model parameters, namely, nodal demand discharges, source node energy heads, and pipe segment loss factors as the unknowns; or 2B: the solution of a non-linear constrained optimization problem, that attempts to minimize the difference between the observed and measured values of a limited subset Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 140 of the state variables (specifically, without any pipe flow measurements). Problem 3: For learning implicit functional relationships that exist between arbitrary groupings of the various state variables and/or parameters of the hydraulic network model. Specifically, a number of ANN were trained by error BP for computing the s pipe segment loss factors from only estimates of k external flowrates at the sink nodes and r energy heads at the source nodes. 5.1.1 Experimental Approach The various experiments described in this chapter attempted to model and solve the three classes of problems outlined above with reference to two specific examples of hydraulic networks of specified topology (see Figures 5.28 and 5.29 for schematic). Although the 3-Node/3-Segment/l-Loop hydraulic network depicted in Figure 5.28 is the simplest possible example of a hydraulic net, the solution of each class of problem described above for even this network is non-trivial. In addition, there are practical advantages in experimenting with the 3-Node hydraulic network. There are only a few (9) variables and (9) parameters involved and the candidate ANN architectures needed to represent and simulatethe corresponding class of hydraulic network problems were relatively small and easy to train. Finally, although success with the 3-Node network was no guarantee of the success of a given ANN architecture and the corresponding learning algorithm for a larger network, the converse was true and provided a simple heuristic for screening the candidate ANNs. If the ANN representation of the problem does not work with the 3-Node benchmark example hydraulic network, then there is no reason for expecting the candidate ANN to work with a larger network. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS ft Flow rate in pipe segment, i Ki Loss factor in pipe segment i [K1,K2,K3] T Ej Qi Energy heads at node j [E ,E ,E ] [Qi,Q2,Q ] 141 T 1 Externalflowat node j 3 T 2 3 i]E P 2 » * 0-2 > Qi —<30 P Number of loops Number of nodes (source+sink) Lh = l TI + ki = 1 + 2 Number of segments si = 3 Figure 5.28: Example 1: The simplest 3-Node/3-Segment/l-Loop hydraulic network. Following the convention introduced in chapter 2, the source node is numbered last and the direction of flow in the various segments is from the higher to the lower numbered node. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 9i Flow rate in pipe segment i [9i,92, • •• , 9 i o , 9 i i ] Ki Loss factor for pipe segment i [K~i, K , • • •, iiTio, i^n] 7 Ej Energy heads at node j [Ei, E , • ••,E ,E ] Qi External flow at node j 142 T 2 T 2 6 7 ••,Q ,Q.7] T e Number of loops . L =5 Number of nodes (source+sink) N =k+ r= 7 Number of segments Sh = N + (L - 1) = 3 = 11 Number of nodes /loop ** + r/ = 3,for*=l,..-Z f c Number of segments/loop a/ = 3,for/ = l,.r-X A h h h h Figure 5.29: Example 2: the 7-Node/ll-Segment/5-Loop hydraulic network. Similar to the schematic of the 3-Node example (Figure 5.28), the (two) source nodes (6 and 7) are numbered last and the direction of flow in each pipe segment is assumed to be from the higher to the lower numbered end-nodes of the segment. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 143 Subsequent simulations were run for a 7-Node/ll-Segment/5-Loop hydraulic network (see Figure 5.29) representing a 3-fold increase in the number of variables and parameters of the 3-Node example. Experimentation with this network attempted to investigate how well the BP algorithm "scales" with the basic size of the hydraulic net as measured by the number of state variables and parameters. This example network was also used to investigate whether there are any advantages associated with representing the hydraulic net by means of a multi-layered feedforward ANN with the number of layers set equal to the number of closed loops in the hydraulic network as outlined in Chapter 4. Finally, a method is suggested for decomposing any given hydraulic network to arbitrarily smaller sub-networks that could then be trained and studied independently. 5.2 The Experiments The numerical experiments described in this and the following chapter represent a small fraction of the simulations carried out throughout the course of this study and have been carefully chosen to demonstrate the versatility of feedforward ANNs for hydraulic network analysis and model calibration under typical real-world operating conditions. The various experiments may be divided into two basic groups. Figures 5.30 and 5.31 provide an overview of Group I and Group II experiments. The code at the "heart" of the network simulators were written in the C programming language and are architecture-independent and were in fact compiled and run on a number of different conventional computing platforms equipped with an ANSI C compiler. The processing time required to complete a given simulation run was largely dependent on the training I/O sample size, the desired error performance Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 144 criteria, the size of the problem, and the specifics of the serial machine architecture. Different simulations were run simultaneously on a host of close to 50 Sun1 workstations2. 5.2.1 Overview of Group I Experiments Group I experiments studied Problems 1A, 2A, and 3 outlined above (also see Chapter 4) while Group II experiments dealt with Problems IB and 2B. The primary purpose of Group I experiments were to investigate the effects of the following factors/parameters on convergence and performance of a given feedforward ANN representation of the three problems: • choice of the underlying probability distributions (Normal and Uniform) the deviates of which were used to generate the seed training and testing data, • presence of non-normalized random noise in training and testing data, • choice of the form and values of the parameters of the input transformation, activation, and transfer functions implemented at the PEs in the input, hidden, and output layers, • size of the training and testing I/O data set sizes, As shown in Figure 5.30, half (78) of the simulations in Group I experiments involved the 3-Node example and the other half involved the 7-Node example hydraulic network. In half (13) of the twenty six (26) simulations for each of the 3 classes of problems, the I/O data set size used during training was T = 14 while in the other half T = 54 I/O pairs were used to train Trade Mark of Sun Micro Systems Many thanks are due to the CE, CS, and ME departments at UBC for providing generous access to their computing facilities during the course of this research (without which it would have taken of order of years to conduct the experiments on a Sun 4 workstation). 1 2 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS Figure 5.30: Overview of Group I Experiments for the 3-Node Hydraulic Net 145 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS Overview of Group II Experiments 146 ,,—{Noiseless! ~4 Noisy 1 1-Layer ^SConfig-21 ^ Prob IB 5-Layer 16 7-Node 1-Layer ^Conlig-ll (Jonng-2j -JConfig-H ^SConflg-2| Prob 2B Config-l| 5-Layer ^ C o ^ i Figure 5.31: Overview of Group II Experiments the feedforward ANNs. In seven (7) of the 13 simulations3 seed training data was generated using the Normal deviate while the Uniform deviate was used to generate seed data for the remaining six (6) simulations. Four (4) of the 7 experiments with the Normal and half (3) of the 6 simulations with the Uniform investigated the effect on convergence and performance when the fuzzy membership functions outlined in Section 5.3 were used to transform the I/O vector pairs. Note that in experiments when the Uniform deviate was used to generate seed training data, the input vector to the ANNs were transformed using the three fuzzy membership functions and not the i.e., 7 of the 12 groups of 13 simulations. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 147 exceedence probability. Three (3) of the 7 simulations with the Normal and 3 of the 6 training runs with the Uniform attempted to examine convergence and performance behavior of the feedforward ANNs when trained with random noise added to the transformed I/O vector pairs. 5.2.2 Overview of Group II Experiments Group II (16/172) experiments studied the problems of hydraulic network analysis and model calibration with incomplete training data (i.e., Problems IB and 2B, respectively): In addition, these experiments investigated the effects on convergence and performance as a result of changes in the number of hidden layers and nodes per layer in the feedforward ANN representation of these problems. All the experiments in this group involved the 7-Node example hydraulic network. Seed training data was generated using the Normal deviate, the linear fuzzy membership function was used to transform the I/O vector pairs, and the feedforward ANN were all trained with T = 14 noisy and noiseless data. 5.3 Representation and Details of the Hydraulic Network Examples The three classes of problems discussed in Section 4.5 were solved for two specific hydraulic network examples: 1. a 3-Node/3-Segment/l-Loop hydraulic net 2. a 7-Node/ll-Segment/5-Loop hydraulic net Chapter 5. EXPERIMENTAL VERIFICATION Input Layer AND ANALYSIS OF RESULTS Hidden Layer 2m + 1 = 7 148 Output Layer Figure 5.32: Feed-forward ANN representation of the 3-Node/3-Segment/l-Loop example for Class 3 problems used for learning pipe loss factors only with knowledge of the values of A: = 2 external flows at the sink nodes and the value of r = 1 energy head at the source node. Note that fi and fi denote special input and output transformation functions specific to the hydraulic networks. a z The topology of the hydraulic network examples and their feedforward ANN representations in the context of each of the three classes of problems are detailed in the remainder of this section. 5.3.1 Example 1: The 3-Node/3-Segment/l-Loop Hydraulic Network The details of the 3-Node example with k = 2 sink nodes, r = 1 source node, and s = 3 pipe segments are summarized in Tables 5.4 and 5.5. Figure 5.28 is the schematic of this network. Based on Kolmogorov's ANN existence theorem discussed in Chapter 4, for all three classes of problems the feedforward ANN representation of the 3-Node hydraulic net was chosen to be an m — (2m •+ 1) — n feedforward ANN where m and n are the sizes of the input and output vectors respectively. For the hydraulic network analysis and model calibration problems with Chapter 5. EXPERIMENTAL Segment i 1 2 3 From Node 3{io) 2 3 3 VERIFICATION To Node j(ib) 1 2, 1 AND ANALYSIS OF RESULTS Diameter d,(meters) 0.1524 0.2540 0.2032 Length /.(meters) 0500.0 1000.0 1000.0 149 Friction Factor [mean ± 2.5 std.J .02 ± .005 .02 ± .005 .02 ± .005 Table 5.4: Pipe segment data for the 3-Node hydraulic network Node Number X-Coord Y:Coord (meters) (meters) 3 1 968.25 0.00 2 968.25 500.00 3 0.00 250.00 Energy Head Ej [mean ± 2.5 std.] (meters) 050.00 ± 4.0 050.00 ± 6 . 0 125.00 ± 12.5 Table 5.5: Node data for the 3-Node hydrauhc network complete data (Problems 1A and 2A, see Chapter 4 for details), the result was a feedforward ANN with with m = (k + r + s) = 6 units in the input layer, 2m+ 1 = 13 PEs in the hidden layer and n = (k + r + s) = 6 PEs in the output layer. This ANN topology (6 - 13 - 6) was used for representing both the forward (hydraulic network analysis) and the inverse (hydrauhc network model calibration) problems with complete data (i.e., with equal number of state variables and model parameters). Similarly for Problem 3, [m - (2m + 1) - n] = [3 - 7 - 3] feedforward ANNs (see Figure 5.32) were specified and trained to compute the n =.s- = 3-dimensional output vector (i.e., pipe segment loss factors) with m = (k + r) = 3 inputs (i.e., demand discharges at the 2 sink nodes and energy head at the source node). Chapter 5. EXPERIMENTAL VERIFICATION Node Number X-Coord (meters) 3 1 0.00 2 250.00 3 250.00 4 550.00 5 950.00 6 750.00 7 0.00 AND ANALYSIS OF RESULTS Y-Goord (meters) 0.00 0.00 500.00 0.00 -100.00 500.00 500.00 150 Energy Head Ej [mean ± 2.5 std.] (meters) 26.00 ± 2.6 25.00 ± 2.5 22.00 ± 2.2 23.00 ± 2.3 24.00 ± 2.4 45.00 ± 4.5 60.00 ± 0.0 Table 5.6: Node data for the 7-Node hydraulic network 5.3.2 Example 2: The 7-Node/ll-Segment/5-Loop Hydraulic Network In order to investigate how well the BP algorithm "scales" with the basic size of the hydraulic network (as measured by (A: + r + s)), the various experiments were repeated for a 7-Node/llSegment/5-Loop hydraulic net with s = 11, k = 5, and r = 2. For this example hydraulic network (k + r + s) = 18 which represents an increase in the number of basic variables and parameters by a factor of three. Tables 5.6 and 5.7 summarize the data used to generate training and testing I/O pairs for this example. Similar to the 3-Node/3-Segment/l-Loop example, Problems 1A and 2A were represented by means of a [(k + r + s) — 2(k + r + s) — (k + r+s)] = [18-37-18] feedforward A N N . ^ case of Problem 3, [(k+r)-2(k+r)-s] ANNs were trained to compute the pipe segment loss factors. = [7-15-11] Problems IB and 2B were represented and successfully trained by means of [(k + r) — 2(k + r) — (k + r + s)] = [7 — 15 —18] feedforward ANNs without the s = 11 pipe segment loss factors orflowratesavailable as part of the input vector during training. Experiments involving Problems IB and 2B were repeated to examine the effects of the Chapter 5. EXPERIMENTAL Segment i From Node j(io) 1 2 3 4 5 6 7 8 9 10 11 7 2 7 3 3 4 4 6 6 5 6 VERIFICATION To Node j(ib) 1 1 2 7 2 2 3 3 4 4 5 AND ANALYSIS OF RESULTS Diameter Length d{ (meters) li (meters) 0.1524 500.0 0.1524 250.0 0.1524 559.0 0.1270 250.0 0.1016 500.0 0.1270 300.0 0.1016 583.1 0.1524 500.0 0.1270 538.5 0.1270 412.3 0.1270 632.5 151 Friction Factor [mean ± 2.5 std:] .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 .02 ± .005 Table 5.7: Pipe segment data for the 7-Node hydrauhc network number of hidden layers and nodes/layer on convergence and performance. Since all of the 5 loops of the 7-Node example hydrauhc network have the same number of nodes and pipe segments, the solution of the integer linear program suggested for the determination of the number of PE per layer becomes trivial since all of the 5 hidden layers in the feedforward ANN are required to have the same number of PEs (i.e., 13 for a corresponding 18-37-18 net and 7 for a corresponding 7-15-18 net). Consequently, for Problems IB and 2B, 7-13-13-13-13-13-18 and 7 - 7 - 7 - 7 - 7 - 7 - 1 8 ANNs with 5 hidden layers were used to replace the 7-37-18 and the 7—15 — 18 representation of these problems with one hidden layer, respectively4. Note that the number of connections in the feedforward ANNs representing Problems 1A and 2A with one hidden layer is (18 x 37) + (37 x 18) = 1332 while the number of connections in the ANN with 5 hidden layers representing the same problems is (18 x 13) + 4(13 x 13) + (13 x 18) = 1144. This reduction in the number of connections is at the expense of an increase in the total number of units from (18 + 37 + 18) = 73 to (18 + (4 x 13) + 18) = 88. 4 Chapter 5. EXPERIMENTAL 5.3.3 VERIFICATION AND ANALYSIS OF RESULTS 152 A Method for Hydraulic Network Decomposition As discussed previously, although the 3-Node/3-Segment/1-Loop net is the simplest possible example of a hydraulic network, the solution for the various classes of problems is not trivial. For instance, in the context of both Problems 1A and 2A, the solution involves finding the zeros of a system of (k + r + s) = 6 non-linear equations with 6 unknowns. The 7-Node/ll-Segment/5-Loop example represents a three fold increase in the total number of variables and parameters of the hydraulic network model with a total of 2(k + r + s) = 36 variables and parameters (k — 5, r = 2, and 5 = 11). An important observation based on the comparison of the results of the experiments dealing with the two hydraulic network examples is that the number of cycles required to train the feedforward ANNs did not increase with the size of the hydraulic network as measured by (k + r + s). This illustrates the parallel-distributed nature of the error backpropagation algorithm and its suitability for hardware implementation. Larger hydraulic networks were.initially considered after experimentation with the 3Node/3-Segment/l-Loop and the 7-Node/ll-Segment/5-Loop examples in an attempt to investigate the suitability of feedforward networks for representing and solving larger and more realistic water distribution systems. However, a simple method for decomposing a hydraulic network into sub-networks was subsequently developed whose repeated application produces at most Lh simple hydraulic networks that can be studied independently. Based on this method any hydraulic network may be "cut" along any number of pipes as illustrated in Figure 5.33. The two resulting sub-networks have exactly the same pipe segmentflowratesand nodal energy heads as the original network. The external inflows and demand discharges at the source and Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 153 Qs Figure 5.33: Data for the original hydraulic network with Nh — k + r = 5 + 2 = 7, L = 5, and Sh — s = Nh + (Lh — 1) = 7 + (5 — 1) = 11 appear in Tables 5.6 and 5.7. The above figure illustrates how the network may be decomposed into smaller sub-networks by "cutting" the network along one or more pipe segments. Theflowrates in all the pipe segments including the segment along the cut (i.e., qj) remains the same as those in the original network. The external flows at the two nodes along the cut are now given by Qz\ = Qz — qs and Q32 — Qz — 94 + qs for node 3 and Q41 = Q4 — qio — q and Q42 = Q4 + q& for node 4. The energy heads at the various nodes of the sub-networks remain the same as those in the original network. Note that Qzi + Qz2 = Qz + 97 and Q + Q = Q - 97. n 9 iX 42 4 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 154 sink nodes are exactly the same as those of the larger network with the exception of the external flow at the two or possibly more nodes along the cut. As shown in Figure 5.33, pipe segment number 7 along the "cut" carries the same flowrate, g7, in both sub-networks as in the original network. This approach to decomposing the 7-Node/ll-Segment/5-Loop network results in a 5Node/7-Segment/3-Loop and a 4-Node/5-Segment/2-Loop sub-network. Notice that although the total number of nodes and segments to be analyzed increases by the total number of nodes and segments along the "cut", the number of additional unknowns to be solved for in each subnetwork only increases by the number of nodes along the cut. After decomposition, for each class of problems, the various hydrauhc sub-networks may then be represented and trained independently by means of the appropriate feedforward ANN as outlined previously. 5.4 Synthetic Data Generation One of the most important problems associated with model calibration using feedforward ANNs trained by error BP is that in most real-world situations, unlimited supplies of training and testing I/O data pairs are rarely available. With unlimited data, one way to decide on the size of these data sets is to train and test the ANN with successively larger data sets until no significant improvement in performance results or until the root mean squared error computed with both the training and testing data sets agree within the desired tolerance. However, this trial-anderror method for deciding on the training and testing sample sizes is very time-consuming and impractical for many real-world distribution systems where field measurements of pipe segment flowrates and nodal energy heads are available only at a token number of control points in the Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 155 hydraulic network. The hydraulic network simulator described in Chapters 2 and 4 to a large extent overcomes this problem and can be used to synthetically generate reliable and accurate training and testing I/O data pairs for assessing the feasibility and versatility of feedforward ANNs for hydraulic network analysis and model calibration under typical operating conditions. The simulator begins by generating the values of the (Nh = k + r) energy heads and (Sh = -s) pipe friction factors in the prescribed range and then solves the system of equations for the (Sh = s) pipe flow rates (from the energy head loss-flowrate relationship) and the (Nh = k + r) external flows at the nodes directly. The obvious advantage of this approach is that the (k + r) external nodal flowrates and the s pipe segment flowrates can now be calculated without having to resort to the use of the iterative solution methods outlined in Chapter 2 completely avoiding algorithm convergence problems and the corresponding errors. This approach facilitates the evaluation of the convergence behavior of the various feedforward ANNs during training and their performance with test data for the various classes of hydraulic network problems and examples without the need for measurement data. For each of the three classes of problems detailed in Section 4.5 and for each of thetwo hydraulic network examples described in Section 5.3 of this chapter, the various sets of I/O training data pairs, (s ,z ), t d randomly generated over the specified operating range, were trans- formed and normalized before being used to train the various feedforward ANNs described in the previous section (5.2). The purpose of the remainder of this section is to describe the basic approach used in deciding on. the size of the training and testing data sets; outline the basic Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 156 form of the various activation, transfer, and transformation functions implemented at the various processing elements and the choice of their parameters; and finally, describe some of the specific details of the error BP algorithm such as the range of the initial values of the weight parameters, choice of the values of the learning rate and momentum terms and so on. 5.4.1 Range of System Operation and the Size of Training and Testing Data Sets The specified range of the values of the nodal energy heads and friction factors over which training and testing 1/ 0 vector pairs are generated together with the given topology and values of pipe diameters and lengths in effect define the operating range of the hydrauhc network over well-defined boundaries. In order to ensure that the training and testing I/O data representative of the this operating range is present in the various data sets, the following four I/O pairs, generated using the minimum and maximum values of nodal energy heads and the minimum and maximum values of pipe segment loss factors, were included in all data sets: 1. minimum values of the (k + r) nodal energy heads and maximum values of the s pipe segment loss factors (Min-Max used to calculate and define the minimum values of pipe segment and external flowrates), 2. maximum values of the (k + r) nodal energy heads and maximum values of the s pipe segment loss factors (Max-Max used to calculate and define intermediate values of pipe segment and external flowrates), 3. minimum values of the (k + r) nodal energy heads and minimum values of the s pipe segment loss factors (Min-Min used to calculate and define intermediate values of pipe Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 157 segment and external flowrates), and 4. minimum values of the (k + r) nodal energy heads and minimum values of the s pipe segment loss factors (Min-Max used to calculate and define the maximum values pipe segment and external flowrates). During preliminary experiments, only I/O vector pairs corresponding to items 1 and 4 were included in the 1/0 data sets randomly generated over the operating range. The inclusion of I/O vector pairs corresponding to items 2 and 3 above (i.e., Max-Max and Min-Min used to compute intermediate pipe segment and externalflowrates)in the same data set was observed to significantly increase the number of training cycles required to achieve the desired tolerance (by a factor of close to 5). In addition, the presence of these two I/O vector pairs was observed to increase the minimum value of the root mean squared error (computed with both the testing and training data sets) as a function of the training cycles. This result was expected and may be best explained in the context of Problem 3 where the (k + r)-dimensional input vector to the ANNs, made up of the external demand discharges at the sink nodes (Qt) and the source node energy heads ( E ) , is used to estimate the values of r the 5-dimensional ANN output vector (i.e., the pipe segment loss factors ( K ) ) . The underlying s function is not "well-defined" (i.e., many-to-one) since the maximum and minimum generated values of the energy heads (i.e., Ej££™ and E™£f) in conjunction with the minimum and maximum generated values of the pipe segment loss factors (i.e., K m m and K™ ) used to compute ax internal pipe and externalflowrates(i.e., q and (Qfc+r), when re-arranged for Problem 3, result 5 in the presence of the following four I/O vector pairs in both the training and the testing data Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 158 sets: 1 ( E m , n Q^1*71)1 • Kma:r 2 ( E m a : r Q*n'l) y 3 ( E m , n QJJ1*2) 1 I J£max. • J£"*»'n Hence two quite different input vectors are mapped to exactly the same output vectors (I/O pairs 1 & 2 and 3 & 4 above). Despite the fact that the presence of these four I/O pairs was observed to significantly increase the number of cycles required to converge to the desired training RMSE D < 0.01, these four I/O vector pairs were included in all data sets to ensure adequate representation of the various variables and parameters over the specified operating range. Noisy Data Hydrauhc network analysis and model calibration in a real-world scenario often requires the study of the behavior of the water distribution system over its operating range in response to perturbations from steady-state conditions and/or noise. Close to half of Group I experiments investigated the presence of random "noise" in the values of the various groups of state variables and model parameters. The standard of deviation of the noise added to the training and testing I/O data pairs was 3% of the interval5 [/imtn, n x] = [0.1,0.9]. In all the experiments ma Note that the value of the variable/parameter after addition of noise lies in the interval ±2.5 times the standard deviation of the noise, (or ±7.5% of the transformed range) around the actual value. 5 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 159 investigating the effect of non-normalized noisy training and testing I/O data, the highest values of RMSEk were observed when the networks were trained with noisy training 1/0 data and tested with noiseless data (in comparison to networks trained with noiseless data and tested with noisy data or networks trained and tested with noisy data.). The experiments reported in this chapter investigate this worst case scenario. Random noise was added to 6 of the 7 basic data sets (for each sample size and each example) in order to evaluate the effect on convergence and performance of the various feedforward ANNs when trained with non-normalized noisy data. Underlying Probability Distribution Training and testing data needed to teach the various feedforward ANNs specified in Sections 4.5 and 5.3 were generated by initially selecting values of pipe friction factors (effectively the pipe segment loss factors K and nodal energy heads (Efc+ )) from the specified intervals and s r then the pipe segment (q ) and externalflowratesat all the source and sink nodes (Qk+r) were a calculated based on the algorithm outlined in Chapter 4. In order to investigate the effects of the underlying probability distribution, random deviates from both the Uniform and Normal (Gaussian) probability distributions were generated and used in conjunction with the specified means and standard deviations of friction factors and energy heads to calculate the internal pipe and external nodalflowrates.In addition, the effect of the size of the training data sets on convergence and performance of the feedforward ANNs was investigated by using two different sample sizes (14 and 54) to train the ANNs representing the three classes of hydraulic network problems for both the 3-Node and 7:Node examples. Chapter 5. EXPERIMENTAL 5.4.2 VERIFICATION AND ANALYSIS OF RESULTS 160 Development of the Input/Output Data Transformation Functions The input vectors to the feedforward ANNs were transformed using four different transformations three of which (one linear and two non-linear) are very similar to the well-known fuzzy membership functions. These transformations were used to constrain the values of the various groups of input vectors to the ANNs (i.e., the desired combination of the (2x(k+r+s)) variables or parameters) so that each input vector lies mostly in the interval [/i m,/J ] = [0.1,0.9]. ro max Note that although the form of these transformations are the same for all groups of input variables or parameters of the hydrauhc model that make-up the ANN input vector, the parameters of these functions are different for different groups that make-up the training and testing 1/ O vector pairs. This approach has the advantage of providing a consistent method for systematically mapping vaguely specified ranges of values of different groups of hydrauhc network model variables and parameters (used as I/O to/from ANN) to the same arbitrarily defined interval [/zm,n, P-max] = [0.1,0.9]6 . Specifically, the following transformation functions were used (see Figure 5.36): An important feature of these transformation functions is that the various sets of ordered pairs generated in this way (such as A* = {(si, /t i(si), • • •, (s , ti r(s )} where s represents either the externalflowratesor energy heads at the source nodes) are indeed non-normal fuzzy sets. By definition, if S is a collection of points s, then a fuzzy set A, in S is a set of ordered pairs: 6 3 r 3 r T A . = {(«,0.(*))l«€S} where fi (s) is the membership function the value of which represents the grade or membership of s in A . The membership function value of a point s may also be interpreted as the degree to which the deterministic measurement s is compatible with (some vague concept of) A . fis(s) maps the measurement space S to the space of the membership function M . The range of the values of the membership functions are non-negative real'numbers with a finite maximum value. If this upper bound is unity, then the set of ordered pairs A is called normaJ[Pao89]. The importance of the above interpretation of the transformations as fuzzy membership functions is that vague concepts (such as "reasonably highflowrate"or "reasonably low pressure" operating conditions) can now be defined in terms of membership functions of the components which are in turn defined in the domain of the various groups of the measurement variables. The transformed values hence represents the extent to which a particular value of the hydrauhc network model variable or parameter is compatible with the vague concept about that variable or parameter. This has practical implications for the analysis of water distribution networks both during design and operation with incomplete and often vague data. s s s 3 Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 161 1. The Fuzzy Type 1 transformation performs a linear scaling of each of the various groups of state variables and/or model parameters (replace subscript s with z if the variables/parameters are part of the ANN output) based on the following equation: (5.93) p, = a + ais s 0 where t^max Mmt'n / r\ A\ r a0 = 'max — "mm (5.94) and ai = (5.95) s is the original value of the state variable, p, € [/tmin,Mmai]) is the transformed variable, s and is the most likely value of the variable s in the interval [s i ,s ]. m n max The value of s was calculated using s j -f- S m 9 X ^ S m , n (i.e., the most likely value was assumed to be a m n the midpoint of the interval), [n in,Hmax] = [0.1,0.9] for this transformation referred to m as Fuzzy Type 1 in the Results Tables of Section 5.4. 2. The Fuzzy Type 2 transformation used for each of the various groups of ANN inputs is a bell-shaped function with the following form: f*s = (Umax + a(s - s M ) 2 ) - 1 (5.96) where 77 0 1 = — l^max - ^2 = 77. — V"max ^2 ( 5 9 7 ) Chapter 5. EXPERIMENTAL where p and m a x Umin = and p in m VERIFICATION AND ANALYSIS OF RESULTS 162 were chosen to be 1.0 and 0.1 respectively. Note that when p m a x =1 0 the above function takes on the form of a widely used normal membership function and the expression for a is no longer valid. This function is referred to as Fuzzy Type 2 in all the Results Tables of Section 5.4. 3. The following Logistic-like transformation function was also used to fuzzify each of the various groups of ANN inputs with p m a x and p { m set to 0.9 and 0.1 respectively. n M* = (AW + e-^-*"))- 1 (5.98) where a = MT Mmax) 1 ^ Mmin ; ;— = ^(iT .{ min ~ "Sji) 1 Umax N Mmax) i, max ~ s / ^— ' (5.99) fi) s s As with the previous transformations, this is also a generalized form of a commonly used normal membership function when p i m n and p ax are set equal to 0 and 1 respectively m (in which case the expression for a is no longer valid). This function is referred to as Fuzzy Type 3 in the Results Tables of Section 5.4. 4. The cumulative normal distribution function was also used to transform the inputs. The following expression was used to estimate the probability of exceeding a particular observed value [Abramowitz65]: Ps = P[s>s ] 0 where |e(s)| < 2.5 X = l-^[h(s)]- 4 + e(s) (5.100) 10 - 4 and h(s) is a fourth degree polynomial given by: h(s) = (1 + cis + c s + c s + c 5 ) 2 2 3 3 4 4 with C! = 0.196854, c2 = 0.115194, c3 = 0.000344, and c4 = 0.019527. (5.101) Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 163 Fuzzy Type 1 P-max Pmin s min S =- S/j, S max - s Ps Fuzzy Type 2 pmax / \ \ Mm«n s min S = s„ s Smax Ms Fuzzy Type 3 . t^max /' ,/ Mroin s min S = s max »- S Figure 5.34: Form of the input transformation functions. Fuzzy type 1 is the function used to transform the output from the ANN. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 164 The total ANN input vector was algebraically normalized by dividing the value of each of its components by the length of the total input vector. If ft is the transformed m-dimehsional 3 input vector to the ANN, then the normalized unit vector, p , , is simply7: s fis = (/& + • • • + (5-102) The effect of non-normalization of the input vector was also investigated in a number of experiments by adding small random noise to the training data after algebraic normalization of the ANN input vector in an attempt to simulate the worst case scenario. As a result of early experimentation with the 3-Node network, it was established that both the number of training cycles and the performance of the feedforward ANNs improve appreciably when the networks are trained with normalized input. In all experiments (with the exception of the networks trained with noisy data) the input vector to the ANN were normalized. The various groups of variables and parameters in the output vector were also transformed. However, in order to keep the inverse transformation functions needed at the output units of the ANN as simple as possible and to assist with the comparison of results, the components of the output vector were only linearly scaled using Equation 5.93 above (Fuzzy Type 1). In addition, unlike the input, the output vector was not normalized. In a number of experiments where the output vector was transformed using Equations 5.96 and 5.98 (Fuzzy Type 2 and 3), the number of cycles required to achieve the desired error was observed to increase significantly. Chapter 5. EXPERIMENTAL 5.4.3 VERIFICATION AND ANALYSIS OF RESULTS 165 Choice of the Activation and Transfer Functions In order to investigate the effects of the choice of the activation function on convergence and performance, three different functions, namely, the logistic, the hyperbolic tangent, and the Gaussian activations, were initially considered and implemented at the PEs in the hidden and output layers. Prehminary experiments investigating the convergence of feedforward ANNs implementing the Gaussian activation functions indicated that the learning task is computationally more expensive (in comparison to when the logistic activation function was used) requiring more PEs in the hidden layer than that prescribed by Kolmogorov's mapping ANN existence theorem outlined in Chapter 4. This observation may be mainly attributed to the fact that this class of activation functions are not monotonically non-decreasing as required by Kolmogorov's theorem. In practical terms, monotonicity (i.e., monotonically non-decreasing) means that the activation functions implemented at the PEs in the hidden and output layers are "mildly'' nonlinear. In the ANN literature, feedforward ANNs implementing monotonically non-decreasing activation functions are referred to as "quasi-linear" or "semi-linear" networks, in particular when exemplified by the logistic activation function. ANNs implementing Gaussian activations are "highly" nonlinear making analysis and computations relatively more difficult. In addition, although with enough PEs in the hidden layer the Gaussian activation function often increases the feedforward ANN's ability to suppress noise, the convergence and performance behavior of these networks are more difficult to study since nonlinearity often favors dynamic instability and risks computational and analytical intractability[Kosko92a]. As pointed out by Kosko[Kosko92a] ". Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 166 The Logistic Activation Function 1.0 Figure 5.35: Illustration of the parameters of the logistic activation function. The parameter X j o serves as a threshold or bias value. The effect of a positive value of this parameter is to shift the activation function to the right along the horizontal axis. The value of aj governs the steepness of the function. A high value of aj makes the sigmoid function similar to a threshold-logic unit while a low value of aj results in a more gently varying curve. . . monotonicity seems to be nature's evolved balance, since almost all biological neurons have sigmoidal signal characteristics. The logistic and the hyperbolic tangent functions are differentiable, and monotonically non-decreasing functions that map points in their input space to the interval [0,1] and [—1,1] respectively. The logistic activation shown in Figure 5.35, is perhaps the most common type of activation function implemented in feedforward ANNs trained by error backpropagation. As pointed out in Chapter 3 (see Table 3.2), an interesting property of the derivatives of the logistic and the hyperbolic tangent functions is that they can be expressed in terms of the original functions themselves. As expected, the convergence and performance of the feedforward ANNs that implemented these two activation functions were observed to be very Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 167 similar. The logistic activation function was chosen over the hyperbolic tangent since it is a simpler function with essentially the same geometric form and properties. The various logistic activation functions were assumed to be identical for all the units in the hidden and output layers with bias bj (XJQ — —bj) and ctj = 1. Note that in the weighted sum, Xj, bj may be represented by the product WJQSJO (or with the imaginary signal WJQZQ) SJQ (or ZQ) always set equal to unity. ( ) aj Xjt = a(x ) = (1 + e-'*)- (5.103) 1 jt As before, Xj is the net-input to PE j computed after the presentation of the t ih t training input to the ANN using x j t = E^o j,(t-i)4 - 1 w The transfer functions implemented at the various PEs are also the same for all the units in the hidden layers and are simply the identity functions, i.e., Zj(t) = fj(yj(t)) = f(yj(t)) = Vj(t) = a(xj(t)). This is consistent with the derivation of the error backpropagation algorithm presented in Chapter 4. Note that for the PEs in the output layer the transfer functions may be defined as the inverse of the linear fuzzy transformation performed on the output vector. 5.4.4 Weight Initialization and Learning Parameters The minimum, maximum, and the initial random values assigned to the weight parameters can be important since if the initial values are too large then the logistic activation functions implemented at the hidden and output units will saturate from the beginning with the possibility of the system becoming stuck in a local minimum or a very flat plateau during early stages of training. A strategy recommended by Hertz et al. [Hertz91] is to choose the random values of weights w - so that the magnitude of a typical net-input, x j, to unit j in layer / is slightly 1 l Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 168 less than unity. This can be achieved by taking the values of the initial weights to be of order _i of N _\ where iVj_i is the number of PEs in source layer (/ — 1) which feed forward towards t PEj. Other strategies include initialization of the weight parameters (including the bias term) to random values in the interval ±0.5[Freeman91]. The logistic activation function with a = 1 implemented at the various nodes saturates and takes on values that are very close to zero or one when Xj is outside the interval [—9, +9]. Since the interval [0.1,0.9] was used as the range of the ANN outputs (after linear transformations), representing the smallest and largest possible values of the various groups of variables/parameters in the ANN output vector, the worst case occurs when all the outputs from the PEs in a source layer are at their maximum value. The highest/lowest values of the net-input, Xj, hence result when the PE receives these high inputs from a source layer in the network that has the largest number of PEs (N™\ ) with the weights at their maximum/minimum value. x Assuming w' min = -w , l max [—9,+9] = — [ xjmin> xjmax\ r Imax \rmax [ zimax lyl-\ = Solving for w l and w l max min I mini Imax i\jmax imax lyl-l w z I n maxl w [0.9^™r«;| ,- ,0.9^r«'maJ B B results in W' max mm 10 xrmax 1S l-l 10 l-l fjmax ly Based on the above reasoning, the various values of the weights were randomly generated in the interval \w' • ,u>L„ J . L mtn" maxl Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS As discussed in Chapter 4, the correction terms, Awj^t+l) 169 = —W§~h t n a t must be applied to the weight parameters require the selection of a value for the learning rate parameter 77. In general, it is not easy to choose appropriate values for rj. There are a number of methods in use today that adaptively adjust the learning rate parameter. For example, Cater has suggested observing the amounts by which a series of weight updates, for one specific training I/O vector pair, actually changes the error [Cater 8 7]. Learning rate parameters can then be adjusted based on the analysis of error evolution. If the error is observed to consistently decrease over several cycles (in the context of a single I/O pair) then it would be beneficial to increase the learning rate 77 for that I/O pair. On the other hand, if the error increases then the value of 77 is too large and it should be reduced. Other methods used for choosing initial values of n involve assigning different learning rates parameters 77/ to each layer, or using different values of T7j,s for each connection Wji (initial values of which are often chosen proportional to one over the number of fan-in units to PEj) [Jacobs88j. Different authors have suggested different range of values for 77. For example, Freeman et al. recommend small values between 0.05 and 0.25[Freeman91], while Dayhoff [Dayhoff90] suggests values in the interval 0.25 to 0.75. Like other gradient descent minimization methods, the convergence of the error backpropagation algorithm is. sensitive to the values of 77. On the one hand, if 77 is small, then learning can be very slow since a large number of iterations are needed to achieve the desired error tolerance. On the other hand, if 77 is too large then rapid learning is accompanied by wild oscillations in the values of RMSE . The problem may be attributed to the shape of the error t Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 170 surface with steep sides but a shallow slope along the valley floor [Hertz91]. There are a number of approaches that may be used to deal with the above problem including the replacement of the gradient descent error BP algorithm with more sophisticated minimization algorithms, or adaptive adjustment of the learning rate parameter as training proceeds (see, for example, Hertz [Hertz91] and Silva [Silva90]). A conceptually simpler approach originally suggested by Rumelhart, Hinton, and Williams [Rumel86a] is to modify the error correction term by the addition of a momentum term. Aw\-(t + 1) = - r ^ - f + ctAw'^t) (5.104) where a is the momentum parameter and must be between zero and one. A value of 0.9 is recommended by many researchers ([Freeman91], [Dayhoff90], [Hertz91]) and was used for all simulations reported in this chapter. H the amount of movement towards the minimum of the error surface is about the same at two consecutive intervals, i.e., Awj,(i + 1) « Aw'j^t) then the above equation converges to: A „; j ( ( with an effective learning rate of + 1 ) „ _ _ i _ | | ( 5 . 1 0 5 ) (= 10r? when a = 0.9). In an oscillatory situation, however, Avjj^t + 1) responds mostly with coefficient n to instantaneousfluctuationsof The momentum term dictates that the change in similar to the change applied at the t ih at (t + l) th step should be somewhat step. In this way, momentum in the rate of change of the correction is conserved to some degree with the overall effect of accelerating the long term trend by a factor of (= 10 when a = 0.9) without magnifying the oscillations. Chapter 5. EXPERIMENTAL 5.5 VERIFICATION AND ANALYSIS OF RESULTS 171 Details of Experiments and Simulation Results This section reports on the results of a relatively comprehensive set of numerical experiments designed to study the relative convergence and performance of feedforward ANNs trained by error BP. The various experiments may be divided into two basic groups. Figures 5.30 and 5.31 provide an overview of Group I and Group II experiments. Group I experiments studied Problems 1A, 2A, and 3 involving both the 3-Node/3-Segment/l-Loop and the 7-Node/llSegment/5-Loop example hydrauhc networks while Group II experiments studied Problems IB and 2B involving only the 7-Node/ll-Segment/5-Loop example network (see Section 4.5 for details). 5.5.1 Simulation Results for Group I Experiments The first group of 156/172 experiments (Group I) are summarized in Tables 5.8 to 5.19 of this subsection. In total, 52 simulations were carried out for each of the 3 class of problems in this group (i.e., Problems 1A, 2A, and 3 with 26 experiments for each of the two (3-Node and 7-Node) example hydrauhc networks). Figure 5.30 is an overview of the experiments in this group. In all the experiments reported in Tables 5.8 to 5.19, the lowest values of RMSEk are highlighted with the fmarker symbol for each underlying distribution (i.e., Normal and Uniform). Note that although the input vectors in the testing I/O data sets used to assess the performance of the trained ANNs are different for each of the 78 experiments dealing with each example hydrauhc network, the output vectors are the same (all subject to Fuzzy Type 1 output transformation) to provide a basis for studying the relative performance of the various Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 172 networks. During training, the goal was to achieve a desired root mean squared error of RMSEf < 0.01 (i.e., a stringent 1.25% of the transformed output range [0.1 to 0.9]). The maximum number of cycles in each experiment was limited to 1,000,000 and was used as the second stopping criteria. In a number of simulations, training was stopped as soon as the first criteria was met. Although in many experiments (in particular for Problem 1A) convergence to RMSEf < 0.01 was achieved in less than 1,000 cycles, the results tables report the values of RMSEt after 1.000 training cycles in order to enable comparison of the relative convergence and performance behavior of the various feedforward ANNs. The remainder of this subsection summarizes the results of Group I experiments for Problems 1A, 2A and 3. The 12 tables summarizing the results of this group of experiments are indexed using the Problem number (i.e., Results Tables 1.1 to 1.4 for Problem 1A, 2.1 to 2.4 for Problem 2A, and 3.1 to 3.4 for Problem 3) as well as the chapter number followed by the cumulative table number (i.e., Tables 5.8 to 5.11 for Problem 1A, 5.12 to 5.15 for Problem 2A, and 5.16 to 5.19 for Problem 3). Problem 1A: Hydraulic Network Analysis with Complete Data The results of the 52 experiments dealing with Problem 1A are tabulated in Results Tables 1.1 (5.8) to 1.4 (5.11). • The highest observed value of RMSEt was .010000 after 7,000 training cycles and is high-lighted with a • marker symbol (see Results Table 1.1 (Table 5.8, 3-Node network example, Noiseless-Normal, Probability), Chapter 5. EXPERIMENTAL Problem 1A T=14 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform Fuzzy Type 1 Uniform Fuzzy Type 2 Uniform Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 173 Noiseless 3-Node Network RMSE Cycles RMSEk 1,000 .009707 .011868 1,000 .008194 .010894 5,000 .009659 .0093781 7,000 .010000* .009545 3,000 .009964 .011706$ 2,000 .009153 .013409 5,000 .009980 .013108 t 7-Node Network Cycles RMSEt RMSEk 2,000 .006039 .010690 1,000 .005600* .011312 3,000 .008511 .008833i 8,000 .009944 .012424 1,000 .009147 .0174391 1,000 .009580 .018480 4,000 .009565 .017835 Table 5.8: Results Table 1.1: Group I experiments, Class 1A Problem, Noiseless, T = 14. • After 1,000 training cycles, the lowest observed value of RMSE was .005600 and is hight lighted with a * marker symbol (see Results Table 1.1 (Table 5.8), T = 14, 7-Node network example, Noiseless-Normal, Fuzzy Type 2) and does not correspond to the ANN with the minimum value of RMSEk- • The lowest value of RMSEk computed with the testing data set size K = 14 was .005920 after 8,000 training cycles (see Results Table 1.2 (Table 5.9), T - 14, 7-Node network example, Noisy-Normal, Fuzzy Type 3) with a corresponding RMSEt — .009562. • The highest value of RMSEk computed with the testing data set size K = 14 was .042143 ( « 5% of the transformed output interval [0.1,0.9]) after 27,000 training cycles (highlighted with the • marker symbol) corresponding to the 3-Node network example (see Results Table 1.4 (Table 5.11), T = 54, Noisy-Normal, Fuzzy Type 3). Note however that the lowest RMSEk computed with the same testing I/O data set and tabulated in the same table was only .016039 (Noisy-Uniform, Fuzzy Type 1). Chapter 5. EXPERIMENTAL Problem 1A T=14 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 174 Noisy 3-Node Network RMSEk Cycles RMSE 10,000 .009746 .024891 3,000 .008712 .0164861 8,000 .009832 .038455 6,000 .009764 .024530 3,000 .009778 .0209631 6,000 .009939 .038525 t 7-Node Network Cycles RMSEt RMSE 2,000 .009512 .016748 2,000 .008015 .018728 8,000 .009562 .005920* 2,000 .008538 .022016* 2,000 .009695 .024502 4,000 .009320 .036039 k Table 5.9: Results Table 1.2: Group I experiments, Class 1A Problem, Noisy, T = 14. • When noiseless data were used to train the feedforward ANNs, an increase in the training data set size to T = 54 from T = 14 was observed to result in relatively smaller error fluctuations as function of the training cycles before the desired RMSE d < 0.01 was achieved with no apparent improvement in the minimum values of RMSEk (calculated with the same testing I/O data set for each hydrauhc network example (compare Results Tables 1.1 and 1.3 (Tables 5.8 and 5.10))). In addition, on the average, the number of training cycles required to achieve the desired RMSE < 0.01 seem to remain relatively d unchanged. • When noisy data were used to train the feedforward ANNs, an increase in the training data set size to T = 54 from T = 14 was observed to generally increase the number of training cycles required to converge to the desired RMSE < 0.01 with no apparent improvement d in the minimum values of RMSEk (compare Results Tables 1.2 and 1.4 (Tables 5.9 and 5.11 respectively)). Chapter 5. EXPERIMENTAL Problem 1A T=54 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 175 Noiseless 3-Node Network Cycles RMSEt RMSEk 1,000 .007703 .0097511 1,000 .007964 .011760 2,000 .009427 .014941 4,000 .009670 .011488 1,000 .008418 .019655 3,000 .009905 .017275 2,000 .008829 .011542} 7-Node Network Cycles RMSEt RMSEk 1,000 .007735 .011545 1,000 .006485 .007586} 1,000 .009990 .015632 4,000 .009726 .010801 2,000 .009985 .014431} 3,000 .009477 .014637. 6,000 .009878 .014447 Table 5.10: Results Table 1.3: Group I experiments, Class 1A Problem, Noiseless, T = 54. Problem 1A Noisy T=54 3-Node Network 7-Node Network Input Transformation Cycles RMSEt RMSEk Cycles RMSEt RMSEk Normal, Fuzzy Type 1 9,000 .009897 .023635 3,000 .009837 .016039} Normal, Fuzzy Type 2 13,000 .009926 .017784} 3,000 .009628 .018985 Normal, Fuzzy Type 3 27,000 .009996 .042143* 6,000 .009261 .034985 Uniform, Fuzzy Type 1 28,000 .009989 .016039} 14,000 .009921 .025933 Uniform, Fuzzy Type 2 36,000 .009884 .025113 10,000 .009630 .025563} Uniform, Fuzzy Type 3 432,000 .009993 .034485 34,000 .009988 .033945 Table 5.11: Results Table 1.4: Group I experiments, Class 1A Problem, Noisy, T = 54. • With, the exception of a few simulations (see for example, Results Table 1.3 (Table 5.10), Noiseless-Uniform, Fuzzy Type 1), the number of cycles required to train the ANNs to the desired RMSE d < 0.01 corresponding to the 7-Node hydraulic network example were observed to be less than the number of cycles required to train the ANNs corresponding to the 3-Node network example (with one third the total number of basic hydraulic model variables and parameters). Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 176 The evolution of error as a function of training cycles for the above 52 simulations are summarized in results Figures A.37 [1.1] to A.44 [1.8] in Appendix A. Problem 2A: Hydraulic Network Model Calibration with Complete Data The results of the 52 experiments dealing with Problem 2A are tabulated in Results Tables 2.1 to 2.4 (Tables 5.12 to 5.15). Examination and comparison of these tables with Tables 1.1 to 1.4 (Tables 5.8 to 5.11) for Problem 1A reveal that the model calibration (or inverse system identification) Problem IB is considerably harder. The various feedforward ANNs required, on the average, an order of magnitude (or more) training cycles to achieve the desired RMSE < d 0.01. The reason for this increase in the number of training cycles may be attributed to the fact that the system of equations describing the hydrauhc network's inverse are relatively more "non-linear" and computationally more "expensive" to solve even by means of conventional iterative techniques. • The highest observed value of RMSEt for this class of problems was .020260 reached after 1,000,000 training cycles and is high-lighted with the • marker symbol (see Results Table 2.4 (Table 5.15) T = 54, 3-Node network example, Noisy-Uniform, Fuzzy Type 3). • The lowest observed value of RMSEt was .008131 after 88,000 training cycles and is highlighted with the * marker symbol (see Results Table 2.2 (Table 5.13), T = 14, 3-Node network example, Noisy-Uniform, Fuzzy Type 2) and does not correspond to the ANN with the lowest value of RMSEk- Chapter 5. EXPERIMENTAL Problem 2A T=14 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 177 Noiseless 3-Node Network RMSEk Cycles RMSE 62,000 .009978 .054427 27,000 .009874 .049144 108,000 .009995 .0288931 140,000 .009883 ,036635 46,000 .009876 .026659* 43,000 .009842 .051158 33,000 .009995 .045581 t 7-Node Network Cycles RMSEt RMSEk 7,000 .009871 .016315 7,000 .009601 .023628 13,000 .009127 .012403* 22,000 .009770 .022050 13,000 .008829 .050960 22,000 .008612 .039799* 30,000 .009180 .040874 Table 5.12: Results Table 2.1: Group I experiments, Class 2A Problem, Noiseless, T = 14. • The lowest computed value of RMSEk was .011004 after 5,000 training cycles (see Results Table 2.3 (Table 5.14), T = 54, 7-Node network example, Noiseless-Normal, Fuzzy Type 1) with a corresponding RMSE = .009936. t • The highest value of RMSEk computed with the testing I/O data set was .099225 ( « 12% of the interval [0.1,0.9]) after 79,000 training cycles and is highlighted with the • marker symbol (see Results Table 2.2 (Table 5.13), T = 14, 3-Node network example, NoisyUniform, Fuzzy Type 3). The RMSE for this experiment is only .008939. Note, however, t that the lowest value of RMSEk computed with the same testing I/O data set in the same table (Results Table 2.2) is only 0.024620 with a corresponding RMSE = 0.009871 after t 94,000 training cycles (T = 14, 3 Node example, Noisy-Normal, Fuzzy Type 1). • When noiseless data were used to train the feedforward ANNs, the increase in the training I/O data set size from T = 14 to T = 54 pairs was observed to lead to smaller error fluctuations as a function of.the number of cycles required to converge to the desired Chapter 5. EXPERIMENTAL Problem 2A T=14 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION 178 AND ANALYSIS OF RESULTS Noisy 3-Node Network RMSEt RMSEk Cycles 94,000 .009871 .024620} 56,000 .009838 .040718 120,000 .009081 .098838 39,000 .009789 .083429} 88,000 .008131* .092889 79,000 .008939 .099225* 7-Node Network RMSE Cycles RMSE 2,000 .009984 .024545} 14,000 .009742 .071981 17,000 .009469 .045295 5,000 .008475 .050466} 29,000 .008604 .053376 13,000 .009466 .059151 t k Table 5.13: Results Table 2.2: Group I experiments, Class 2A Problem, Noisy, T = 14. RMSE < 0.01 with noticeable improvement in the minimum value of RMSEk calculated d with the same test data set in some cases (compare Results Tables 2.1 and 2.3 (Tables 5.12 and 5.14). • When noisy data were used to train the feedforward ANNs, the increase in the training I/O data set size from T = 14 to T = 54 was observed to noticeably increase the number of training cycles required to achieve the desired RMSE < 0.01 in 21 of the 24 experiments d summarized in Results Tables 2.2 and 2.4 (Tables 5.13 and 5.15) with no improvement (and in some cases, deterioration) in the minimum value of RMSEk computed with the same testing I/O data set. • In all simulations studying Class 2A Problems the number of cycles required to train the ANNs (to the desired RMSE < 0.01) for the 7-Node hydraulic network example were d observed to be less than or equal to the number of cycles required to train the ANNs corresponding to the 3-Node network example. Chapter 5. EXPERIMENTAL Problem 2 A T=54 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform, Fuzzy Type 1 Uniformj Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 179 Noiseless 3-Node Network RMSEk Cycles RMSE 25,000 .009841 .014553 19,000 .009790 .014444 34,000 .009640 .012817* 94,000 .009995 .018707 86,000 .009881 .041771* 523,000 .009997 .041810 1,000,000 .011467 .037548 t 7-Node Network Cycles RMSEt RMSE 5,000 .009936 .011004* 8,000 .009743 .011019 6,000 .009710 .011907 22,000 .009715 .012775 27,000 .009961 .043996 78,000 .009967 .050688 234,000 .009992 .041532* k Table 5.14: Results Table 2.3: Group I experiments, Class 2A Problem, Noiseless, T = 54. • In 4 of the 26 (4/26) experiments involving the 3-Node network example, the desired RMSE < .01 was not reached even after 1,000,000 training cycles (see Results Table 2.3 d (Table 5.14), T = 54, 3-Node net, Noiseless-Uniform, Fuzzy Type 3, and, Results Table 2.4 (Table 5.15), T = 54, 3-Node net, Noisy-Uniform, Fuzzy Type 1, 2, and 3). However, in the same tables, the desired RMSE < 0.01 was obtained at significantly fewer cycles. d For example, as shown in Results Table 2.3 (Table 5.14), a value of RMSE = .009640 t was obtained only after 34,000 training cycles with a corresponding RMSEk = .012817 (Noiseless-Normal, Fuzzy Type 3). Similarly, as shown in Results Table 2.4 (Table 5.15), a value of RMSEt = .009985 was obtained only after 37,000 cycles with a corresponding RMSEk = -042223 (Noisy-Normal, Fuzzy Type 1). The error evolution for the 52/156 experiments studying Problem 2A are summarized in results Figures 2.1 (A.45) to 2.8 (A.52) in Appendix A. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS Problem 2A 180 Noisy T=54 3-Node Network RMSE RMSEk Input Transformation Cycles Normal, Fuzzy Type 1 37,000 .009985 .042223} Normal, Fuzzy Type 2 411,000 .010118 .067625 Normal, Fuzzy Type 3 176,000 .009998 .089438 Uniform, Fuzzy Type 1 1,000,000 .012303 .087936 Uniform, Fuzzy Type 2 1,000,000 .015630 .085770 Uniform, Fuzzy Type 3 1,000,000 .020260* .078820} t 7-Node Network Cycles RMSEt RMSEk 17,000 .009417 .023941} 241,000 .009995 .061814 14,000 .009992 .046301 5,000 .009884 .059862} 297,000 .009859 .075808 150,000 .009955 .077725 Table 5.15: Results Table 2.4: Group I experiments, Class 2A Problem, Noisy, T = 54. Problem 3: Learning Arbitrary Functions The results of the 52 experiments dealing with Problem 3 are tabulated in Results Tables 3.1 to 3.4 (Tables ( 5.16) to 5.19). Examination and comparison of these tables with Results Tables 1.1 to 1.4 (Tables 5.8 to 5.11) for Problem 1A and Results Tables 2.1 to 2.4 (Tables 5.12 to 5.15) for Problem 2A reveal that this many-to-6ne mapping problem was the "hardest" to learn and the various feedforward ANNs often required several orders of magnitude more cycles before the desired RMSE < 0.01 was reached. d The reason for the observed increase in the number of training cycles may be attributed to the fact that this problem involved learning a many-to-one function. As explained in Subsection 5.4.1, included in both the training and testing data sets are two examples (4 I/O vector pairs) where two different input vectors are mapped to exactly the same outputs. • The highest observed value of RMSEt for this class of problems was .070736 reached after 1,000,000 training cycles (highlighted with the • marker symbol in Results Table 3.4 (Table 5.19), T = 54, 3-Node network example, Noisy-Uniform, Fuzzy Type 3). Note Chapter 5. EXPERIMENTAL VERIFICATION Problem 3 T=14 Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 AND ANALYSIS OF RESULTS 181 Noiseless 3-Node Network Cycles RMSEt RMSEk 1,000,000 .010621 .047678 1,000,000 .025781 .120454 388,000 .009960 .093764 589,000 .009991 .023190} 250,000 .008800 .076819 1,000,000 .042604 .066131} 1,000,000 .015548 .080446 7-Node Network Cycles RMSEt RMSEk 56,000 .009972 .020870 15,000 .007970* .079750 76,000 .009839 .044503 53,000 .009591 .010533} 23,000 .009027 .086938} 10,000 .008091 .113721 84,000 .007927 .120350 Table 5.16: Results Table 3.1: Group I experiments, Class 3 Problem, Noiseless, T = 14. however that in the same table, a lower value of RMSEt = 0.036277 was reached also after 1,000,000 training cycles (Noisy-Normal, Fuzzy Type 1). • The lowest observed value of RMSEt was .007927 reached after 250,000 training cycles (highlighted with the • marker symbol in Results Table 3.1 (Table 5.16), T = 14, 7-Node network example, Noiseless-Normal, Fuzzy Type 2). The corresponding value of RMSEk for this simulation was 0.079750 and does not correspond to the minimum RMSEk = .010533 computed with the same testing I/O set after 53,000 cycles (see Results Table 3.1 (Table 5.16), RMSE = .009591, T = 14, 7-Node network example, Noiseless-Normal, t Probability). • With the exception of a few simulations, most of the ANNs did not converge to the desired RMSE d < 0.01 (1.25% of the transformed output range) during training even after 1,000,000 cycles. However, in most cases (42/52), an RMSE < 0.04 (i.e., only < 5% t of the interval [0.1,0.9]) was reached after considerably fewer cycles. Chapter 5. EXPERIMENTAL Problem 3 T=14 Input Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 VERIFICATION AND ANALYSIS OF RESULTS 182 Noisy 3-Node Network 7-Node Network RMSEk Cycles RMSEt Cycles RMSEt RMSEk 1,000,000 .029213 .061422* 72,000 .009824 .108578 1,000,000 .039957 .074372 17,000 .009989 .043935 1,000,000 .031467 .064795 26,000 .008886 .025854* 403,000 .009031 .107196* 23,000 .009901 .112752* 1,000,000 .034340 .120036 12,000 .009536 .116588 1,000,000 .059352 .119718 43,000 .009999 .120850* Table 5.17: Results Table 3.2: Group I experiments, Class 3 Problem, Noisy, T = 14, • The highest computed value of RMSEk was .120850 ( « 15% of the interval [0.1,0.9]) after 43,000 training cycles (highlighted with the • marker symbol in Results Table 3.2 (Table 5.17, T = 14, 7-Node network example, Noisy-Uniform, Fuzzy Type 3, RMSE = .009999). t Note however that, in the same table, the minimum computed value of RMSEk — .025854 was reached after only 26,000 training cycles with a corresponding RMSEt = 0.008886. • When noiseless data were used to train the feedforward ANNs, an increase in the training 1/0 data set size from T = 14 to T •= 54 vector pairs was observed to sometimes increase the training cycles required to achieve the desired RMSE . d In addition, the values of RMSEt were sometimes higher when the ANNs were trained with T = 54 I/O pairs (compare Results Tables 3.1 and 3.3 (Tables 5.16 and 5.18)). In addition, the increase in the training I/O data set size did not significantly improve the minimum values of RMSEk calculated with the same testing I/O data set. Note that although in some of these simulations the increased training data set size improved the value of RMSEk (for example compare the values of RMSEk for the 7-Node example, Noiseless-Uniform, Fuzzy Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS Problem 3 T=54 Transformation Normal, Fuzzy Type 1 Normal, Fuzzy Type 2 Normal, Fuzzy Type 3 Normal, Probability Uniform, Fuzzy Type 1 Uniform, Fuzzy Type 2 Uniform, Fuzzy Type 3 183 Noiseless 3-Node Network 7-Node Network RMSE Cycles RMSEt Cycles RMSEt RMSEk 1,000,000 .018015 .064998 1,000,000 .010441 .015318 1,000,000 .046203 .0494521 216,000 .009901 .033626 1,000,000 .043028 .103369 308,000 .009970 .035281 1,000,000 .043580 .067356 140,000 .009981 .010927} 1,000,000 .070531 .079110 1,000,000 .020118 .053409} 1,000,000 .040928 .061875} 1,000,000 .015942 .085372 1,000,000 .064709 .070660 1,000,000 .018195 .055976 k Table 5.18: Results Table 3.3: Group I experiments, Class 3 Problem, Noiseless, T = 54. Type 3 in Tables 3.1 (T = 14) and 3.3 (T = 54) (Tables 5.16 (T = 14) and 5.18 (T = 54))), the minimum values of RMSEk, for both the 3-Node and 7-Node examples were obtained when T = 14 I/O,pairs were used to train the corresponding ANNs. • When noisy data were used to train the feedforward ANNs, in a number of simulations, an increase in the sample size from T = 14 to T = 54 was observed to increased the number of training cycles needed to converge to the desired RMSEt (in particular for the 7-Node hydraulic network example; compare Results Tables 3.2 and 3.4 (Tables 5.17 and 5.19). In some simulations the increase in the training data set size improved the value of RMSEk computed with the same testing data set (for example compare the values of RMSEk for the 7-Node example, Uniform, Fuzzy Type 3 in Tables 3.2 and 3.4 (Tables 5.17 (T = 14) and 5.19 (T = 54))). However, the minimum values of RMSEk, for both the 3-Node and 7-Node network examples were observed when T = 14 I/O pairs were used to train the corresponding ANNs. Chapter 5. EXPERIMENTAL VERIFICATION 184 AND ANALYSIS OF RESULTS • In all of the 52/156 simulations studying Problem 3, the training cycles required to achieve the desired RMSE < 0.01 for the ANNs corresponding to the 7-Node example hydrauhc d network were observed to be less than or equal to the training cycles needed to train the ANNs corresponding to the 3-Node example hydrauhc network. • Although for 31 of the 52 (31/52) experiments investigating Problem 3 the desired RMSEf < 0.01 was not achieved even after 1,000,000 cycles, in most cases (42/52), an RMSE < 0.04 t (i.e., 5% of the transformed output range) was reached after considerably fewer cycles. The error evolution of the 52 experiments involving Problem 3 are summarized in results Figures 3.1 (A.53) to 3.8 (A.60) in Appendix A. In addition, Results Figures 3.9 to 3.11 (Figures A.61 to A.63) in Appendix A compare the values of the RMSEt and RMSEk for 6/52 experiments. As demonstrated by these figures, the values of RMSEk deteriorate with increasing training cycles after they reach a minimum around 10,000 to 20,000 cycles. The increase in the training I/O data set size from T = 14 to T = 54 pairs results in fewer oscillations of RMSEt and RMSEk as a function of training cycles with slight overall improvements. When the feedforward ANNs were trained with T = 14 "over-training" (see Subsection 5.5.3 for discussion) seems to occur earlier and faster rates of divergence were observed as the ANNs become , overtrained (see Figure 3.9 (A.61), 3.10 (A.62), and 3.11 (A.63) in Appendix A). Results Figures 3.12 to 3.19 (Figures A.64 to A.71) in Appendix A compare the actual versus ANN computed values of the pipe segment loss factors for the 7-Node/ll-Segment/5Loop hydraulic network example where the corresponding feedforward ANNs corresponding were trained with T = 54 I/O pairs and tested with K = 14. The first 4 figures (results Figures Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 185 Problem 3 Noisy T=54 3-Node Network 7-Node Network RMSE RMSE Input Transformation Cycles Cycles RMSEt RMSE Normal, Fuzzy Type 1 1,000,000 .036277 .098385 1,000,000 .013345 .062545} Normal, Fuzzy Type 2 1,000,000 .055559 .115772 704,000 .010301 .093004 Normal, Fuzzy Type 3 1,000,000 .037486 .062974} 1,000,000 .030013 .062974 Uniform, Fuzzy Type 1 1,000,000 .056445 .120463 1,000,000 .036738 .074124} Uniform, Fuzzy Type 2 1,000,000 .065792 .120531 1,000,000 .021868 .091382 Uniform, Fuzzy Type 3 1,000,000 .070736* .079039} 1,000,000 .022067 .092316 t k k Table 5.19: Results Table 3.4: Group I experiments, Class 3 Problem, Noisy, T = 54. 3.12 to 3.15 (Figures A.64 to A.67)) compare the ANN computed values of the pipe segment loss factors with the actual values when the ANNs were trained with Noiseless-Normal, Fuzzy Type 1, T = 54 I/O data pairs. Results Figures 3.16 to 3.19 (Figures A.68 to A.71) compare the ANN computed and actual values of pipe segment loss factors, K , when the feedforward s ANNs were trained with Noisy-Normal, Fuzzy Type 2, T = 54 I/O data pairs. Note that the four points at which the values are compared correspond to the points in the testing data set with the highest values of maximum errors and are the same as the four I/O vector pairs deliberately included in both the training and testing data sets. As pointed out in Subsection 5.4.1, these points define the desired operating range of the hydraulic network and correspond to the minimum and maximum values of the nodal energy heads and pipe segment loss factors. As expected, the largest errors correspond to the smallest diameter pipes in the hydraulic network. 5.5.2 Simulation Results for Group II Experiments Sixteen (16/172) experiments studied Problems IB and 2B, i.e., hydraulic network analysis and model calibration with incomplete training data (see Figures 4.24 and 4.26 in Chapter Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 186 4). This group of experiments also examined the effects on convergence and performance as a result of changes in the number of layers and PEs per layer (see Figure 5.31 for overview). As mentioned previously, all the experiments in this group involved the 7-Node example hydrauhc network. Seed training data were generated using the normal deviate, the linear Fuzzy Type 1 membership function was used to transform the 1/0 vector pairs, and the networks were all trained with T = 14 noisy and noiseless data pairs. The following ANN configurations were used for both.Problems IB and 2B (which have the same number of basic inputs (m = 7) and outputs (n = 18)): 1. A 7-15-18 ANN where the number of PEs in the hidden layer was set equal to (2m + 1) as prescribed by Kolmogorov's ANN existence theorem. 2. A 7-7-7-7-7-7-18 ANN where the number of layers and PEs per layer were determined based on the integer linear programming formulation suggested in Chapter 4 with Ni = 2nt + 1 where Ni is the number of units in the hidden layer I and ni = 3 (for £ = 1, • • •, 5) is the total number of nodes in loop £. Since all the five loops of the 7-Node example hydrauhc network have the same number of nodes and segments, the solution requires that the number of PEs in each of the 5 layers of the ANN to also be the same8. 3. A 7-37-18 ANN where the number of PEs in the hidden layer were increased to 37 (2 x (k + r + s) + l) over that prescribed by Kolmogorov's theorem (i.e., 15 = 2 x (k + r) + 1). The number of PEs in the hidden layer were set equal to the number of PEs in the hidden Note that while the number of connections in the 7-7-7-7-7-7-18 network (equal to (7 x 7)+4(7 x 7 ) + (7 x 18) = 371) is less than the total number of connections in a 7-15-18 network (equal to (7 x 15) + (15 x 18) = 375), satisfying the constraint on the maximum number of allowable connections, the total number of PEs (including the units in the input layer) has increased to 60 from 40. 8 Chapter 5. EXPERIMENTAL Group II Normal, Fuzzy 1 Problem/Data IB, Noiseless IB, Noisy 2B, Noiseless 2B, Noisy VERIFICATION AND ANALYSIS OF RESULTS 187 1-Hidden Layer 7-37-18 Network 7-15-18 Network RMSE} RMSE RMSE™" RMSE' RMSE\ RMSE\ 0.014333 0.0047161 0.017274 0.014668 0.002143 0.032408 0.020827 0.005916 0.044160 0.021341 0.006400 0.063666 0.017465 0.004924 0.026447 0.021732 0.004321 0.030590 0.019451 0.002577 0.041079 0.020004 0.001705 0.045394 iW W m Table 5.20: Results Table 1 for Group II experiments, Class IB and 2B Problems, T = 14 noiseless and noisy normal training data were used to train the ANNs with a single hidden layer. layer of the ANNs corresponding to Problems 1A and 2A with a 18-37-18 topology. 4. A 7-13-13-13-13-13-18 feedforward ANN with the number of hidden layers set equal to the number of non-overlapping loops in the hydraulic network. The number of PEs in the hidden layers were set equal to the number of PEs per layer computed based on the integer linear programming formulation applied to Problems 1A and 2A with 2(fc 4- r 4- s) = 36 state variables and model parameters (m = n = 18). For both these problems in the context of the 7-Node example hydraulic network, the number of PEs, Ni, in all the hidden layers (/ = 1, • • •, L) are the same and equal to Ni = 2 x (ne + si) + 1 = 2 x (3 + 3) + 1 — 13 where ng = 3 and se = 3 are the number of nodes and segments in loop I respectively. Note that the number of connections in an 18-13-13-13-13-13-18 ANN is (18 X 13) + 4(13 x 13) + (13 x 18) = 1144 and is less than the number of connections in an 18-37-18 feedforward ANN ((18 x 37) + (37 x 18) = 1332) satisfying the constraint on the maximum number of allowable connections. Results Tables 5.20 and 5.21 tabulate the simulation results for Problems IB and 2B. Chapter 5. EXPERIMENTAL Group II Normal, Fuzzy 1 Problem/Data IB, Noiseless IB, Noisy 2B, Noiseless 2B, Noisy VERIFICATION AND ANALYSIS OF 188 RESULTS 5-Hidden Layers 7-5 (13)-18 Network 7-5(7)-18 Network RMSE) RMSE] RMSE °° 0.000885 0.024970 0.001115 0.000054 0.015227 0.019572 0.019816 0.036973 00 0.018695 0.024970 0.041351 0.043441 RMSE] l k 0.019428 0.028434 0.040719 0.042854 RMSE™ RMSE °° 0.013275 0.028430 0.040717 0.042852 0.016137 0.018632 0.037965 0.038610 l k Table 5.21: Results Table 2 for Group II experiments, Class IB and 2B Problems, T = 14 noiseless and noisy normal training data were used to train the ANNs with five hidden layers. The values of RMSE t are shown after 1,000 and 100,000 cycles (i.e., respectively). Values of RMSEk RMSE] are shown only after 100,000 cycles (i.e., and RMSE™ ). 0 RMSE] 00 For the ANNs with a single hidden layer, although an increase in the number of units from 15 to 37 does not improve either values of RMSE] or values of RMSE] , 00 RMSE™ 0 noticeably improve (see Table 5.20). In comparison, for the ANNs with 5 hidden layers, while the change in topology from 7-5(7)-18 to 7-5(13)-18 significantly improves values of remaining relatively unchanged), values of RMSE 100 RMSE] 00 (with values of RMSE] do not improve proportionally. The error evolution of the 16 experiments summarized in Tables 5.20 and 5.21 are compared in results Figures A.72 to A.76 in Appendix A. Figure A.76 compares the evolution of RMSEt and RMSEk at 10,000 cycle intervals. Both networks are over-trained after approxi- mately 10,000 cycles where the values of RMSEk reach their minimum. As can be seen in this figure ( A.76) the ANN with the larger number of PEs in the hidden layer results in generally lower values of RMSEk as a function of training cycles. Chapter 5. EXPERIMENTAL 5.5.3 VERIFICATION AND ANALYSIS OF RESULTS 189 Discussion and Summary of Results In general, the number of training cycles required to reach the desired RMSE < 0.01 varied d with and was sensitive to the choice of the underlying probability distribution, the form and parameter values of the input/output transformation functions, the presence of noise and the sample size of the training and testing I/O data sets, the form and parameters of the activation functions, and learning parameters. The various experiments reported in this chapter examined the worst case scenarios in a number of respects in an attempt to address a number of practical questions associated with the type, quality, and quantity of the data available for hydraulic network analysis and model calibration. In particular, 1. Problems IB and 2B investigated the suitability of feedforward ANNs for hydraulic network analysis and model calibration with incomplete data (i.e., without any knowledge of pipe segment loss factors and flowrates, respectively), 2. experiments involving Problem 3 studied how well feedforward ANNs can be trained to approximate implicit and unknown relationships between arbitrary groupings of state variables and model parameters (an attempt was made to simulate the worst case situation with minimal measurement data where energy head losses were estimated without any knowledge of pipe segmentflowratesand nodal energy heads at the sink nodes), 3. the various feedforward ANNs were trained with noiseless and noisy data and tested with noiseless 1/0 pairs (although generally better results are possible when testing with noisy I/O pairs), Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 190 I RMSE k RMSE™ in RMSE? in • €t Cycles Figure 5.36: Illustration of Overtraining. The training and testing errors, Zt and £jt are shown as functions of the number of training cycles. As illustrated in the figure, RMSE was reached at half the number of cycles required to converge to RMSE™ . The values of RMSEk computed with an over-trained feedforward network can be an order of magnitude large than the values of RMSE™ . mxn tn in 4. random noise was added to the input component of the I/O vector pairs after algebraic normalization although after preliminary experiments it was established that convergence and performance behavior of the various ANNs improve noticeably when noise was added before algebraic normalization of the I/O data, 5. the various feedforward ANNs were tested with only K = 14 I/O vector pairs although performance as measured by RMSEk was observed to improve significantly when the sample size of the testing 1/0 data sets used was increased to K = 54. Examination of the training and performance errors for a number of experiments reveal that after around 10,000 to 20,000 cycles the ANNs performance as measured by RMSEk (with K = 14) deteriorates with increasing number of training cycles. This phenomenon, referred Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 191 to as overtraining in the literature (see Figure 5.36 for schematic) and may be explained by the fact that the relatively flat error surface at the beginning of the simulation deforms and becomes more "wrinkled" as the ANN tries to improve on RMSEt and better fit the training I/O vector pairs[Hecht90]. However, after a certain number of cycles the ability of the network to interpolate between these points decreases since, in effect, the ANN has "over fit" (or "over learned") the training data. Figures 3.9 (A.61) to 3.11 (A.63) in Appendix A compare values of the RMSE and RMSEk t for 6 experiments dealing with Problem 3, the "hardest" problem investigated in this study. As demonstrated by these figures, the various ANNs' performance as measured by RMSEk deteriorates with increasing number of training cycles. The increase in the training I/O data set size from T = 14 to T = 54 pairs results in fewer oscillations of RMSE and RMSEk as a t function of cycles with no considerable overall improvements. For feedforward ANNs trained with T = 14 "over-training" seems to occur earlier with faster overall rates of divergence than ANNs trained with T = 54 (see Figure 3.9 (A.61), 3.10 (A.62), and 3.11 (A.63) in Appendix A). In summary, • The number of training cycles required to converge to the specified error tolerance was observed to be relatively independent of the size of the hydraulic network as measured by (fc + r + s). In most experiments in Group I, the number of cycles required to converge to the desired RMSE was found to be less for the 7-Node network. This observation may be t partly explained by the relatively small number of connections in the feedforward ANNs used to represent the 3-Node net. The smallest ANN was a 3-7-3 net with 42 connections Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 192 (Problem 3, 3-Node example). In addition, the energy head differences along the pipe segments are larger for the 3-Node example; this example hydraulic network has longer pipe segments with larger diameters and hence generally higher flowrates throughout the net. • The addition of random noise to the normalized training I/O data was observed to improve convergence to the desired error tolerance in a few experiments while in others the number of cycles required to converge to the desired RMSE < 0.01 was observed to increase. Both d RMSEt and RMSEk were observed to oscillate the most when Noisy-Uniform I/O data were used during training. • Values of RMSEt and RMSEk oscillate more noticeably as a function of training cycles when the feedforward ANNs were trained with the smaller (T = 14) data set size. • The choice of the underlying probability distribution used to generate seed training data was observed to influence ANN convergence and performance behavior. In most experiments, in particular when noisy data sets were used during training, the various feedforward ANN were observed to converge faster requiring less cycles when the seed data were generated using the normal deviate. As an example, with respect to Problem 2A, the 4 experiments where RMSE d < 0.01 was not achieved even after 1,000,000 cycles, the networks were trained with noisy uniform. On the other hand, in a few experiments when the various ANNs were trained with Noiseless-Uniform, convergence to the desired error was achieved after fewer cycles. However, with the exception of one experiment, the values of the performance RMSEk were consistently lower ( « one half) when the networks Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 193 were trained with either noiseless or noisy normal I/O pairs. • The form of the I/O transformation functions and the values of their parameters were observed to significantly affect the number of training cycles needed to achieve the desired training and performance error tolerances. The values of RMSE as a function of the t training cycles were very similar when the various groups of inputs were transformed by means of Fuzzy Type 1 and Fuzzy Type 2 transformations. However, the values of RMSEk computed with the testing set seems to be consistently lower when the Fuzzy Type 1 transformation was used, in particular with respect to the relatively "harder" Problem 3. • Problem 1 was found to be the "easiest" problem for the ANNs to learn in that it required much fewer training cycles to converge to the desired RMSE < 0.01 and all the d simulations were successful in this respect. In addition, the values of RMSEk were found to be in closer agreement with RMSE for this class of problems. t • Problem 2 was relatively more difficult than Problem 1 and required an order of magnitude more training cycles before convergence to the desired RMSE < 0.01 was achieved. In d only four (4/52) experiments the desired RMSE < 0.01 was not achieved. d • Problem 3 (in particular for the 3-Node hydrauhc network example) was found to be the most difficult problem and required often 3 orders of magnitude more training cycles than problem 1. Although, in thirty one (31/52) experiments the desired RMSE < 0.01 was d not achieved, in 42/52 simulations, values of RMSEt less than 0.04 (5% of the interval [0.1,0.9]) was obtained at often close to two orders of magnitude fewer cycles. Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 194 • When exceedence probabilities of the various groups of inputs were used, in a number of instances, the convergence behavior was observed to be more oscillatory, however, in a few experiments the desired RMSEf < 0.01 was reached after fewer cycles. • The number of cycles required for convergence to the desired RMSE < 0.01 was observed d tofluctuatewith the training data set size. The values of RMSEk were generally observed to be less oscillatory (and often in closer agreement with the values of RMSE ) when the t feedforward ANNs were trained with the larger training data set. However, increase in training data set size (in particular, when noise was added) do not seem to necessarily improve the minimum observed values of RMSEk• Increasing the number of units in the hidden layer (from 15 to 37) over (2m + 1) (where m = 7 was the number of units in the input layer) does not seem to improve RMSE t with some improvements in the values of RMSEk- This observation is in agreement with Kolmogorov's ANN existence theorem discussed in Chapter 4. Comparison of values of RMSEt and RMSEk seem to indicate that the ANNs with more units in the hidden layer(s) diverge at a slower rate after the networks become overtrained. 5.6 Conclusions This chapter has demonstrated the versatility of multilayered feedforward ANNs trained by error backpropagation for representing and solving 3 characteristically different classes of hydraulic network problems. The experiments demonstrate that even when the ANNs are trained with noisy data, or when only partial input data were used during training, the specified feedforward Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 195 nets were successful in learning even arbitrarily defined and unknown many-to-one mappings between the desired groups of hydrauhc network model variables and/or parameters. Examination of the results indicate that the convergence and performance of the standard error BP algorithms implemented to train feedforward ANNs representing even medium-sized hydrauhc networks is largely dependent on the identification of the I/O, representation of the topology of the hydrauhc network in terms of the number layers and PEs/layer as well as the form and parameter values of the ANN transition functions. The techniques for fuzzy transformation and normalization of the I/O training and testing 1/ 0 vector pairs in addition to the methods suggested for the representation of hydrauhc networks by means of feedforward ANNs were observed to result in significant improvement in performance both during training (with respect to the number of cycles required to achieve a desired RMSEt) and during the testing stage (as measured by RMSEk). Part of the reason for slow convergence rates observed with the error BP algorithm is due to the high dimension of the weight space and the shape of the error surface at each iteration point with possibly many local minima. The various I/O transformations seem to make this error surface smoother and hence assure convergence, at possibly slower rate but with fewer oscillations. Due to the numerous combination of ANN and WDN parameters and initial conditions of interest, the empirical studies presented in this chapter are necessarily incomplete. However, the objective was to demonstrate the versatility of multilayered feedforward ANNs for representing arbitrarily complex hydrauhc networks and the performance of the general error BP algorithm for learning to solve three different classes of hydrauhc network problems. This objective has been achieved by the design and analysis of the results of a relatively comprehensive set of Chapter 5. EXPERIMENTAL VERIFICATION AND ANALYSIS OF RESULTS 196 numerical experiments that studied the convergence and performance of feedforward ANNs for solving three practical and generalizable engineering problems under stringent and often worst-case conditions. Chapter 6 SYNOPSIS A N D CONCLUSIONS This study has investigated the feasibility of ANNs in general and multilayered feedforward networks trained by error backpropagation in particular for hydraulic network analysis and model calibration under typical real-world operating conditions. Chapter 2 focussed on the identification and formulation of the basic problems of hydraulic network analysis and model calibration and summarized some of the conventional solution techniques used in practice. A review of the advocated solution methods revealed that all of these techniques directly or indirectly incorporate the continuity and energy equations in the formulation and usually require the steady-state solution of the non-linear system of hydraulic equations as part of the overall solution. For the purpose of generating training I/O vector pairs needed for supervised training of multilayered feedforward ANNs, the hydraulic equations were formulated with the s pipe flow rates and the (k + r) nodal external flow rates as the unknowns. The advantage of this approach is that the resulting system of equations can be solved directly for the unknowns without the need to resort to the use of existing solution methods completely avoiding the corresponding algorithm convergence problems and the resulting errors. Chapter 3 outlined the building block components of ANNs while emphasizing a hierarchical framework for the specification of their topology and functional components. A number of important ANNs were summarized with the objective of finding a suitable architecture for 197 Chapter 6. SYNOPSIS AND CONCLUSIONS 198 modeling and solving the hydrauhc network analysis and model calibration problems. These problems were modeled and represented with the mapping neural network architecture for estimating both the underlying function and it inverse that performs a mapping from the set of all possible values of the hydrauhc network model parameters/variables, S / Z , onto the set of all possible values of the state variables/parameters, Z / S Chapter 4 investigated the suitability of the backpropagation algorithm and presented a new and general approach for hydraulic network analysis and model calibration with both complete and partial data. The specification of the topology of multilayered feedforward ANNs was based on the number of nodes, pipe segments, and loops in the WDN. The specific details of interfacing the input/output to/from the WDN and ANN simulators during training for three specific classes of hydrauhc network problems were then outlined to assist with the experimental verification of the hybrid model. Chapter 5 reported on the key results of two groups of numerical experiments that investigated the suitability of the suggested representation and training schemes for solving the 3 specific classes of problems, compared the convergence of the feedforward nets with different choices of input transformations, and studied their performance when trained and tested with both noisy and noiseless input/output data sets. A method is also suggested for decomposing any given hydrauhc network to arbitrarily smaller sub-networks that could then be trained and studied independently. Analyses of results seem to strongly suggest the suitability of multilayered feedforward ANNs trained by error backpropagation for modeling a wide range of WDN problems as demonstrated by their success for hydrauhc network analysis and model calibration. For these key Chapter 6. SYNOPSIS AND CONCLUSIONS 199 hydraulic network problems, even when the feedforward ANNs were trained with noise and tested with non-noisy data sets (worst case scenario with the highest error), or when incomplete training input data were used, the specified feedforward nets were observed to successfully learn to approximate the underlying functions. The unexpected success of the experiments with problem 3 investigating the convergence and performance of feedforward nets configured and trained to approximate arbitrarily defined and explicitly unknown many-to-one functions ( K = <f>(Qk, E ) ) with a small set of noisy training and testing data, is additional experimens r tal evidence pointing to the versatility and robustness of the parallel distributed processing framework as a powerful computational tool for modeling and solving a variety of real-world engineering problems. In summary, a new approach for representing, modeling, and simulating three characteristically different classes of hydraulic network problems with multilayered feedforward ANNs has been presented. The developed framework is relatively general and can be used to represent and simulate many aspects of the operation of arbitrarily complex hydraulic networks and has the following distinctive features: 1. It can be used to efficiently and systematically decompose, represent, analyze, and calibrate arbitrarily complex hydraulic networks, 2. The specification of the feedforward networks are based on the topology of the hydraulic network and can be trained and used as a stand-alone module in applications that require estimates of hydraulic network model parameters and/or variables with limited data. 3. The hydraulic network decomposition and feedforward ANN representation techniques Chapter 6. SYNOPSIS AND CONCLUSIONS 200 developed can easily be used as the basis for solving a number of related problems, such as detection of damaged and leaky lines or on-line monitoring and control applications. 4. It implements an efficient method for generating reliable training input/output data pairs by direct solution without the need for the conventional iterative solution techniques currently used to solve the non-linear system of hydraulic equations. 5. It incorporates a convenient method for transforming the various groups of hydrauhc network model variables and parameters to the same interval by utilizing non-normal fuzzy membership functions. In closing, although the main objectives of this thesis have been achieved, further experimental and analytical extensions are desirable. Experimentation with different architectures and other characteristically different classes of problems would help further verify the generalizability of the developed framework. For example, a series of experiments with the objective of studying the convergence and performance behavior of feedforward nets configured for solving the following problems are of special interest in practice: 1. Detection of damaged and leaky pipe segments 2. Hydrauhc network analysis during transient conditions The analytical studies would be concerned with the study of the computational requirements of recurrent networks trained by error backpropagation or other minimization algorithms Chapter 6. SYNOPSIS AND CONCLUSIONS 201 for the prediction of the behavior of hydrauhc networks between equilibrium states under transient conditions. A final area of study currently being researched is concerned with the investigation of the requirements of applying the developed framework to the analysis and calibration of large-scale WDN models in conjunction with measurement data obtained on-line. BIBLIOGRAPHY [Aarts89] Aarts, E., and Korst, J. Simulated Annealing and Boltzmann Machines, 1989, John Wiley & Sons, Toronto, Ontario. [Abramowitz65] Abramowitz, M., Handbook of mathematical functions, with formulas, graphs, and mathematical tables, 1965, New York Dover Publications. [Anderson72] Anderson, J. A., "A simple neural network generating an interactive memory", Mathematical Bioscience, 1972, Vol. 14, pages 197-220. [Anderson86] Anderson, J. A., "Cognitive Capabilities of a Parallel System", NATO ASI Series, Disordered Systems and Biological Organization, 1986, edited by Bienenstock et al., Volume F20, pages 260-276. [Astrom89] Astrom, K. and Wittenmark, B., Adaptive Control, 1989, AddisonWesley Publishing Company, Don Mills, Ontario. [AWWA74] "Water Distribution and Applied Development Needs", American Water Research Association Committee on Distribution Systems, Journal of the American Water Works Association, June 1974, Vol. 66, No. 6, pages 385390. [Ayala91] Ayala, G., and Liggett, J. A., "Flow in Damaged Pipe Networks", Journal of the Hydraulics Division, February 1991, A S C E , Vol. 117 No. 2, pages 230-239. [Bao90] Bao, Y., and Mays, L. W., "Model for Water Distribution System Reliability", Journal of the Hydraulics Division, September 1990, A S C E , Vol. 116, No. 9, pages 1119-1137. [Bargiela89] Bargiela, A. and Hainsworth, G. D., "Pressure and Flow Uncertainty in Water Systems", Journal of the Hydraulics Division, 1989, A S C E , Vol. 115 No. 2, , pages 212-229. [Barto89] Barto, G. A., Connectionist Learning for Control: An Overview, COINS Technical Report, September 1989, Department of Computer and Information Science, University of Massachusetts, Amherst MA. [Box65] Box, M. J., "A New Method of Constrained Optimization and a Comparison with other Methods", 1965 Computer Journal, No. 8, pages 45-52. 202 BIBLIOGRAPHY 203 [Brebbia83] Brebbia, C. A. and Ferrante, A. J., Computational Hydraulics, Butterworth and Co., 1983. [Brion91] Brion, L.M., and Mays, L. W., "Methodology for Optimal Operation of Pumping Stations in Water Distribution Systems", Journal of the Hydraulics Division, November 1991, A S C E , Vol. 117 No. 11, pages 15511569. [Burden85] Burden, R. L. and Faires, J. D., Numerical Analysis, 3 Prindle, Weber & Schmidt, Boston. [Camargo90] Camargo, F. A., Learning Algorithms in Neural Networks, CUCS-062-90, The DCC Laboratory, Computer Science Department, Columbia University, New York, N.Y., 10027. [Cater87] Cater, J. P., "Successfully Using Peak Learning Rates of 10 (and Greater) in Back-Propagation Networks with the Heuristic Learning Algorithm", IEEE First International Conference on Neural Networks, 1987, San Diego, edited by M. Caudill and C. Bulter, vol. II, pages 645-651, New York. [Collins79] Collins, M. A., and Cooper, L., "Multiple Operating Points in Complex Pump Networks", Journal of the Hydraulics Division, March 1979, A S C E , Vol. 105 No. HY3, pages 229-244. [Coulbeck85] Coulbeck, B., "An Application of Hierarchical Optimization in Calibration of Large-Scale Water Networks", Optimal Control Applications and Methods, June 1985, 6, pages 31-42. [Cross36] Cross, H., "Analysis of Flow in Conduits or Conductors", 1936, Bulletin No. 286, University of Illinois Engineering Experimental Station, Urbana, Illinois. [Dayhoff90] Dayhpff, J. E., Neural Network Architectures A n Introduction, 1990, Van Norstrand Reinhold, New York. [Donachie74] Donachie, R. P., "Digital Program for Water Network Analysis", Journal of the Hydraulics Division, 1974, A S C E , Vol. 100 No. 3, pages 393-403. [Duda73] Duda, R. 0. and Hart, P. E., Pattern Classification and Scene Analysis, 1973, Wiley, New York. [Eggener76] Eggener, C. and Polkowski, L., "Network Models and impact of modeling assumptions", Journal of AWWA, 1976, 68(4). [Epp70] Epp, R. and Fowler, A. G., "Efficient Code for Steady-State Flows in Networks", Journal of the Hydraulics Division, January 1970, A S C E , Vol. 96, No. HY1, pages 43-56. rd edition, 1985, BIBLIOGRAPHY 204 [Fahlman88] Fahlman, S. E. "An Empirical Study of Learning Speed", 1988 CMU-CS88-162, Computer Science Department, Carnegie Melon University. [Feather83] Featherstone, R. E. and El-Jumaily, K., "Optimal Diameter Selection for Pipe Networks", Journal of the Hydraulics Division, February 1983, A S C E , Vol. 109, No. 2, pages 221-234. [Fiesler92] Fiesler, E., and Caulfield, H. J., "Neural Network Formalization", 1992, Technical Report, Alabama A & M University, Normal, Alabama [Finch.93] Finch G. R., and Howard, C. D. D., Hydraulics, Analysis, arid Field Testing of Water Distribution Systems, Charles Howard and Associates Ltd., Victoria, British Columbia. [Fogleman87] Fogelman, F., Automata networks in computer science: Theory and Applications, 1987, edited by R. Yves and R. Tchuente, Princeton University Press. [Freeman91] Freeman, J. A., and Skapura, D. M., Neural Networks Algorithms, Applications, and Programming Techniques, 1991, Addison- Wesley Publishing Company, Don Mills, Ontario. [Fukush88] Fukushima, K., "A hierarchical neural network model for selective attention", Nerual Computers, 1988, edited by R. Eckmiller and C. von der Malsberg, pages 89-90, Springer-Verlag, Berlin. [Gofman81] Gofman, E., and Rodeh, M., "Loop Equations with Unknown Pipe Characteristics", Journal of the Hydraulics Division, September 1981, A S C E , Vol. 107 No. HY9 , pages 1047-1060. [Goodwin84] Goodwin, G. C , and Sin, K. S. Adaptive Filtering Prediction and Control, 1984, Prentice-Hall Inc., Engelwood Cliffs, New Jersey. [Grossberg82] Grossberg, S., Studies of Mind and Brain: Neural principles of learning, perception, development, cognition, and motor control, Reidel Press, Boston, 1982. [Hamberg88a] Hamberg, D., and Shamir, U., "Schematic Models for Distribution Systems Design. I: Combination Concept", Journal of Water Resources Planning and Management, March 1988, A S C E , Vol. 114 No. 2, pages 129-140. [Hamberg88b] Hamberg, D., and Shamir, U., "Schematic Models for Distribution Systems Design. II: Continuum Approach", Journal of Water Resources Planning and Management, March 1988, A S C E , Vol. 114 No. 2, pages 141-162. [Hebb49] Hebb, D. 0., The Organiztion of Behavior, 1949, Wiley, New York. BIBLIOGRAPHY 205 [Hecht87] Hecht-Nielsen, R. "Counterpropagation Networks", Proc. of Int. Conf. on Neural Networks, II, June 1987, pages 19-32, IEEE Press, New York. [Hecht90] Hecht-Nielsen, R. Neurocomputing, Addison- Wesley Publishing Company, 1990 Don Mills, Ontario, Canada. [Hertz91] Hertz, J., Krogh, A., and Palmer, R., Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company, 1990 Don Mills, Ontario, Canada. [Hopfield82] Hopfield, J. J. "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proceeding of National Academy of Science, Biophysics, 1982, No. 79, pages 2554-2558. [Hopfield84] Hopfield, J. J. "Neuron with Graded Response have Collective Computational Properties like those of two-state Neurons", Proceeding of National Academy of Science, Biophysics, 1984, No. 81, pages 3088-3092. [Hopfield85] Hopfield, J. J. and Tank, D. W., "Neural Computation of Decisions in Optimization Problems", Biological Cybernetics 1987, No. 52, pages 141152. [Jacobs88] Jacobs, R. A., "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks 1, 1988, pages 295-307. [Jeppson77] Jeppson, R. W., Analysis of Flow in Pipe Networks, 1977, Ann Arbor Science, Ann Arbor, Michigan. [Jowitt90] Jowitt, P. W., and Xu C , "Optimal Valve Control in Water Distribution Networks", Journal of Water Resources Planning and Management, July/August 1990, A S C E , Vol. 116 No. 4, pages 455-472. [John80] John, J. E. A., and Haberman, W. L., Introduction to Fluid Mechanis, 1980, Prentice-Hall, Inc., Englewood Cliffs, New Jersey. [Kalman60] Kalman, R. E., "A New Approach to Linear Filtering and Prediction Problems", Transactions of ASME, Journal of Basic Engineering, 1960, Vol. 82, pages 35-46. [Karney84] Karney, B. W., Analysis of Fluid Transients in Large Distribution Networks, Ph.D. thesis, Faculty of Graduate Studies, The University of British Columbia, September 1984. [Kirkpat83] Kirkpatrick, S., Gelatt, C. D. Jr., and Vecchi, M. P., "Optimization by simulated annealing", Science, 1983, 220 (4598), pages 671-680. [Klopf88] Klopf, A. H., "A neuronal model of classical conditioning", Psychobiology, 1988, 16 (2), pages 85-125. BIBLIOGRAPHY 206 [Kohonen84] Kohonen, T., Self-Organization and Associative Memory, 1984 (Second Edition, 1988), Springer-Verlag, Berlin. [Kohonen87] Kohonen, T., "Self-learning inference rules by dynamically expanding context", 1987, Proc. of Int. Conf. on Neural Networks, II, 3-9, IEEE Press, New York. [Kosko88] Kosko, B., "Bidirectional Associative Memories", IEEE Trans. Systems, Man and Cybernetics 1988, 18(1), pages 49-60. [Kosko92a] Kosko, B., Neural Networks and Fuzzy Systems, 1992, Prentice-Hall, Inc., Englewood Cliffs, New Jersey. [Kosko92b] Kosko, B., Neural Networks For Signal Processing, 1992, PrenticeHall, Inc., Englewood Cliffs, New Jersey. [Kuo91] Kuo, J. T., Yen, B. C , and Hwang, G.P., "Optimal Design for Storm Sewer System with Pumping Stations", Journal of Water Resources Planning and Management, January/February 1991, A S C E , Vol. 117 No. 1, pages 11-27. [Kuo87] Kuo, B. C , Automatic Control Systems, 1987, Prentice-Hall, Inc., Englewood Cliffs, New Jersey. [Lansey89] Lansey, K. E., and Mays, L. W., "Optimization Model for Water Distribution System Design", Journal of the Hydraulics Division, October 1989, A S C E , Vol. 115 No.. 10, pages 1401-1418. [Lansey91] Lansey, K. E., and Basnet, C , "Parameter Estimation for Water Distribution Networks", Journal of Water Resources Planning and Management, January/February 1991, A S C E , Vol. 117 No. 1, pages 126-144. [Lapedes87] Lapedes, A. and Farber, R., "Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling", Technical Report LA-UR87-2662, Los Alamos National Laboratory, Los Alamos, NM. [Little89] Little, K. W., and McCrodden, B. J., "Minimization of Raw Water Pumping Costs Using MILP", Journal of the Water Resources Planning and Management, July 1989, A S C E , Vol. 115 No. 5, pages 511-522. [Ljung83] Ljung, L., and Soderstrom, T., Theory and Practice of Recursive Identification, The MIT Press, Cambridge, MA, 1983. [McCull43] McCulloch, W. S. and Pitts W., "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, 1943, No. 5, pages 115-143. BIBLIOGRAPHY 207 [McClel88] McClelland, J. L., Rumelhart, D. E., editors, Explorations in P D P : A Handbook of Models, Programs and Exercises The MIT Press, 1988, Cambridge, MA. [Minsky67] Minsky, M. L. and Papert, S. A., Perceptrons: A n Introduction to Computational Geometry MIT Press, Cambridge, MA, extended edition, 1988. [Muller90] Miiller, B. and Reinhardt, J. (Physics of ) Neural Networks A n Introduction, 1990, Springer-Verlag, Berlin. [Nielsen89] Nielsen, H. B., "Methods of Analyzing Pipe Networks", Journal of the Hydraulics Division, February 1989, A S C E , Vol. 115, No. 2, pages 139157. [Onizuka86] Onizuka, K., "System Dynamics Approach to Pipe Network Analysis", Journal of the Hydraulics Division, August 1986, A S C E , Vol. 112 No. 8 , pages 728-749. [Ormsbee86a] Ormsbee, L. E. and Wood, D. J., "Explicit Pipe Network Calibration", Journal of the Hydraulics Division, April 1986, A S C E , Vol. 112 No. 2, pages 166-182. [Ormsbee86b] Ormsbee, L. E. and Wood, D. J., "Hydrauhc Design Algorithms for Pipe Networks", Journal of the Hydraulics Division, December 1986, A S C E , Vol. 112 No. 12, pages 1195-1207. [Ormsbee89] Ormsbee, L. E., "Implicit Network Calibration", Journal of the Water Resources Planning and Management, March 1989, A S C E , Vol. 115 No. 2, pages 243-257. [Orr91] Orr, C. H., Parkar, M. A., and Tennant, S. T., "Implementation of On-Line Control Scheme for City Water Systems", Journal of the Water Resources Planning and Management, September/October 1990, A S C E , Vol. 116 No. 5, pages 708-726. [Pao89] Pao, Y., Adaptive Pattern Recognition and Neural Networks, Addison- Wesley Publishing Company, 1989, Don Mills, Ontario, Canada. [Press89] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T., Numerical Recipes in C, The Art of Scientific Computing, Cambridge University Press, 1989, New York. [Rahal80] Rahal, C M . , "Parameter Tuning for Simulation Models of Water Distribution Networks", Proceedings of Institution of Civil Engineers, September 1980, London, U.K., 69, pages 751-762. BIBLIOGRAPHY 208 [Rosenb59] Rosenblatt, F. "Two theorems of statistical stability in the perceptron", Symposium on the Mechanization of Thought Processes, 1959, pages 421456 [Rumel86a] Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes I Rumelhart, D. E., and McClelland, J. L. editors, The MIT Press, 1987, Cambridge, MA. [Rumel86b] Rumelhart, D. E., and McClelland, J. L. editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes II The MIT Press, 1986, Cambridge, MA. [Shamir68] Shamir, TJ. and Howard, D. D., "Water Distribution Systems Analysis", Journal of the Hydraulics Division, January 1968, A S C E , Hyl, pages 219234. [Shamir74] Shamir, U., "Optimal Design and Operation of Water Distribution Systems", Water Resource Research 1974, Vol. 10, No. 1, pages 27-36. [Shamir77] Shamir, U. and Howard, C. D. D., "Engineering Analysis of Water Disr tribution Systems", Journal of the American Water Works Association, September 1977, A S C E , Vol. 69 No. 9, pages 510-514. [Silva90] Silva, F. M., and Almedia, L. B, "Acceleration Techniques for the Backpropagation Algorithm", Neural Networks, 1990, EURASIP Workshop, pages 110-119, Springer-Verlag, New York. [Sklansky81] Sklansky, J. and Wassell, G. N. Pattern Classifiers and Trainable Machines, 1981, Springer-Verlag, Berlin. [Su87] Su, Y.C., Mays, L.W., and Duan, N. "Reliability-Based Optimization Model for Water Distribution Systems", Journal of the Hydraulics Division, 1987, A S C E , Vol. 114 No. 12, pages 1539-1556. [Tarquin89] Tarquin, A.J., and Dowdy, J. "Optimal Pump Operation in Water Distribution", Journal of the Hydraulics Division, February 1989, A S C E , Vol. 115 No. 2, pages 158-168. [Wagner88a] Wagner, J. M., and Shamir, U., "Water Distribution Rehability: Analytical Methods", Journal of the Water Resources Planning and Management, May 1988, A S C E , Vol. 114 No. 3, pages 253-275. [Wagner88b] Wagner, J. M., and Shamir, U., "Water Distribution Reliability: Simulation Methods", Journal of the Water Resources Planning and Management, May 1988, A S C E , Vol. li4 No. 3, pages 276-294. BIBLIOGRAPHY 209 [Walski83a] Walski, T. M., "Techniques for Calibrating Network Models", Journal of the Water Resources Planning and Management, October 1983, A S C E , Vol. 109 No. 4, pages 360-372. [Walski84] Walski, T. M., Analysis of Water Distribution Systems, Van Nostrand-Reinhold, New York, NY, 1984. . [Walski86] Walski, T. M., "Case Study: Pipe Network Model Calibration Issues", Journal of the Water Resources Planning and Management, April 1986, A S C E , Vol. 112 No. 2, pages 238-249. [Warga54] Warga, J., "Determination of Steady-State Flows and Currents in a Network", Proceedings of the Instrument Society of America, 1954, Vol. 9. Pt. 5, Paper 54-43-4. [Widrow60] Widrow, B., and M. E. Hoff, "Adaptive Switching Circuits", IRE WESCON Convention Record, 1960, 96-104, New York. [Widrow78] Widrow, B., J. McCool, and B. Medoff, "Adaptive Control by Inverse Modeling", \2 Asilomar Conference on Circuits, Systems, and Computers, 1978. th [Widrow85] Widrow, B. and Stearns, S. D., Adaptive Signal Processing, 1985, Prentice-Hall, Inc., Englewood Cliffs-, New Jersey. [Wood72] Wood, D. J. and Charles, CO.A., "Hydraulic Network Analysis Using Linear Theory", Journal of the Hydraulics Division, July 1972, A S C E , Vol. 98 No. HY7, pages 1157-1170. [Wood81] Wood, D. J. and Rayes, A. G., "Reliability of Algorithms for Pipe Network Analysis", Journal of the Hydraulics Division, October 1981, A S C E , Vol. 107 No. HY10, pages 1145-1161. [Zessler89] Zessler, U. and Shamir, U., "Optimal Operation of Water Distribution Systems", Journal of the Water Resources Planning and Management, November 1989, A S C E , Vol. 115 No. 6, pages 735-752. Appendix A. SUMMARY OF RESULTS 211 1 • ! • i ; j* •* • >* a a • E * » i a . ^•s «5 K* » j *CX + OH^W 3j Appendix A S U M M A R Y OF RESULTS 210 Appendix A. SUMMARY OF RESULTS 212 tlu • 1 J • • \'\ • E I • a a • • • .! * * a i •i * ! 11 j *0<+OB«B Figure A.38: [1.2]: Error Evolution for Problem 1, 3-Node Net, T=54 (Normal) Appendix A. SUMMARY OF RESULTS 213 E o * El •D IP •5 ffltaa4 jojJ3 swy Figure A.39: [1.3]: Error Evolution for Problem 1, 3-Node Net, T=14 (Uniform) Appendix A. SUMMARY OF RESULTS 214 E 3 • E f ~i E • *C f • 3 » Z Z « • *• > S£ X M 9 N 9 N 9 • 3 • > • X »l M N 9 c 3 M M 9 «1 K 2 J ° ^ 3 swa Figure A.40: [1.4]: Error Evolution for Problem 1, 3-Node Net, T=54 (Uniform) Appendix A. SUMMARY OF RESULTS Figure A.41: [1.5]: Error.Evolution for Problem 1, 7-Node Net, T=14 (Normal) 215 Appendix A. SUMMARY OF RESULTS 216 m 1 F 1a - - i j 1•i • MmE • • w * t s * z i ; i a O Figure A.42: [1.6]: Error Evolution for Problem 1, 7-Node Net, T=54 (Normal) Appendix A. SUMMARY OF RESULTS 217 t • rm E • * a • o • • 3 © W. z e O Z M ? E E • R &1 S s N 1 19 B 3 m « a M • X »1 •1 •> S K N •• 1 4 Figure A.43: [1.7]: Error Evolution for Problem 1, 7-Node Net, T=14 (Uniform) Appendix A. SUMMARY OF RESULTS 218 Appendix A. SUMMARY OF RESULTS 219 • _ f j * s: 1 | • i i • M k • • • • a s * * * m 2 11 E * 0< + O B « N o o m o o a in o o o tr> o o o in o o o in o o jojj3 sm Figure A.45: [2.1]: Error Evolution for Problem 2, 3-Node Net, T=14 (Normal) Appendix A. SUMMARY OF RESULTS Figure A.46: [2.2]: Error Evolution for Problem 2, 3-Node Net, T=54 (Normal) 220 Appendix A: SUMMARY OF RESULTS 221 i . Ok CI Figure A.47: [2.3]: Error Evolution for Problem 1, 3-Node Net, T=14 (Uniform) Appendix A. SUMMARY OF RESULTS 222 E E C k • • • • > • "•5 z © X z N E E t k • • e Z •e X r » a o X H Kl ft 2 15J z •3 B H S t t j 4 ,H o• in a in o in o in o o d o d o d o d d M M in M M 3 o Q JOJJ3 swa Figure A.48: [2.4]: Error Evolution for Problem 2, 3-Node Net, T=54 (Uniform) Appendix A. SUMMARY OF RESULTS 223 ;,a a i\ a a & a <' C X - r - O B O M Figure A.49: [2.5]: Error Evolution for Problem 2, 7-Node Net, T=14 (Normal) Appendix A. SUMMARY OF RESULTS Figure A.50: [2.6]: Error Evolution for Problem 2, 7-Node Net, T=54 (Normal) 224 Appendix A. SUMMARY OF RESULTS 225 fj • i c *e 1 c a a 3 « — • • "« K • e • « a 3 • > • • O z z - M N M N iI — if 5£ I • 3 B B S * ! 4 Figure A.51: [2.7]: Error Evolution for Problem 2, 7-Node Net, T=14 (Uniform) OF RESULTS 226 1 rm Appendix A. SUMMARY *£ E m • E CE oc "c • •5 • * s • > • X *z5 "Sz Xa Xa X « *i •*) HMNN N H 9 999 Figure A.52: [2.8]: Error Evolution for Problem 1, 7-Node Net, T=54 (Uniform) Appendix A. SUMMARY OF RESULTS 227 "3 • E X « >. _! 1 E• i • • X J»jJ li» • o • "5 9 < 5 M Xk. > * > M OJ *O X X X X fi fi n « a « X X M 9 3 i .2 s £ 2 -3o< + OB*M£ M J» o M & O o JOJJ 3 S W y Figure A.53: [3.1]: Error Evolution for Problem 3, 3-Node Net, T=14 (Normal) Appendix A. SUMMARY OF RESULTS 228 _ 1 j E • • E • a i I » j : • » •S a « • i 9£ i M • > i a E * i 0 < + O B * W a Q o o in w r» o in o in o to o i/> o in o in K) in o in <n q q O q CoM cs « qCO cqo r» q o o O o IqD in q o a a a o o o d d d ci d o d d d d d d d d d d J0^3 SrNd Figure A.54: [3.2]: Error Evolution for Problem 3, 3-Node Net, T=54 (Normal) Appendix A. SUMMARY OF RESULTS Figure Ar55: [3.3]: Error Evolution for Problem 3, 3-Node Net, T=14 (Uniform) 229 Appendix A. SUMMARY OF RESULTS 230 E Unlfo • • t« "5 • •t 3 a • i 3 • » a 3m •» ms z • * M « 1 •2• 2 2m1 m m• c a "c g c S o v E f Im •1 •< M N O M N "5 z o ot o in d d DO q .070 d m © o .075 o o 080 o o a ° ° JOJJ3 m to p o o q d m m o 6 o m q d sm Figure A.56: [3.4]: Error Evolution for Problem 3, 3-Node Net, T=54 (Uniform) Appendix A. SUMMARY OF RESULTS 231 • _ I «! • • 1 • • « « ! • • m 1 « • « » a a • • ! i * S j m •* • i •j •k * a « • 2 CXJ + O H O H JOJJ3 swa Figure A.57: [3.5]: Error Evolution for Problem 3, 7-Node Net, T=14 (Normal) Appendix A. SUMMARY OF RESULTS Figure A.58: [3.6]: Error Evolution for Problem 3, 7-Node Net, T=54 (Normal) 232 Appendix A. SUMMARY OF RESULTS 233 c f E *• E "c Eo 3c ••• a « • • 5 •5 »•> m • • • "5z "S a * C o M N » <N ft M N X & 2i » V) M s t o o o o o JOJJ3 swa Figure A.59: [3.7]: Error Evolution for Problem 3, 7-Node Net, T=14 (Uniform) Appendix A. SUMMARY OF RESULTS 234 • 5 I E e c I h< ,4 i • 3 o 3 m — a "a a 3 * a • a *o B o a ar x Z Z • • • - « a 3 a X « •* M N N N 2 3 •3SBQtl4 Figure A.60: [3.8]: Error Evolution for Problem 3, 7-Node Net, T=54 (Uniform) Appendix A. SUMMARY OF RESULTS 235 • • • • • • • 5 j ,• •"• M 3 5 11 Jt * * X I at m m\ o o o d d d d d JOJJ3 s n y Figure A.61: [3.9]: Comparison of RMSE t and RMSEk, Problem 3, 3-Node Net, Noiseless Appendix A. SUMMARY OF RESULTS Figure A.62: [3.10]: Comparison of RMSE 236 t and RMSEk, Problem 3, 3-Node Net, Noisy Appendix A. SUMMARY OF RESULTS Figure A.63: [3.11]: Comparison of RMSE 237 t and RMSE , k Problem 3, 7-Node Net, Noiseless Appendix A. SUMMARY OF RESULTS 238 Figure A.64: [3.12]: Actual vs. ANN computed values of K , (Egf", ? ), less Normal, Fuzzy Type 1, T = 54 K 3 in Problem 3, Noise- Appendix A. SUMMARY OF RESULTS 239 c in z CD z < <u • _J * m z o ?e I/) CD _D O > CD | O UF " I -7 c (0 > < d oo d d d -rm d -Td -rd j o p q j s s c q n j s i u 6 a s s d y (j, adXj^ Xzznj) p a p o g Figure A.65: [3.13]: Actual vs. ANN computed values of K , Problem 3, ( E £ ™ , K less Normal, Fuzzy Type 1, T = 54 s maa; ) , Noise- Appendix A. SUMMARY OF RESULTS 240 XJ c OJ o <u _J o o o c i d d c j o 4> OJ z z < z Q • s 0 j o p q j sso"i |usuu6as a d y (j, adAj. Xzznj) p a | O D $ Figure A.66: [3.14]: Actual vs. ANN computed values of K , Problem 3, (E£!£*,K™ ), Noiseless Normal, Fuzzy Type 1, T = 54 n s Appendix A. SUMMARY OF RESULTS 241 XI » OJ z D> z OJ < _l z o •c • o> d to d r>s d to d m d •+ d tot>4 d d ~~. o 0 j o p q j sso"i |U9iu6as a d y (y adXj^ Xzznj) p a p o s Figure A.67: [3.15]: Actual vs. ANN computed values of K , Problem 3, less Normal, Fuzzy Type 1, T = 54 s (E££ , R X K ), max Noise- Appendix A. SUMMARY OF RESULTS 242 Figure A.68: [3.16]: Actual vs. ANN computed values of K , Problem 3, ( E £ * ? , K ™ n ) , Noiseless Normal, Fuzzy Type 2, T = 54 s Appendix A. SUMMARY OF RESULTS 243 V) O > O UT > < j o p q j s s o ] | u s a i 6 a s a d y ([, adAj^ Xzznj) p a p o s - Figure A.69: [3.17]: Actual vs. ANN computed values of K , Problem 3, (E£f", K™ 0 *), Noiseless Normal, Fuzzy Type 2, T = 54 s Appendix A. SUMMARY OF RESULTS — i i d d j o p p j sso-| 244 i o i d i o HjaLLi6as 8 C ' ! d (l i i i i d d d ° 9 C ^1 \-o ^zznj) p a p o ^ Figure A.70: [3.18]: Actual vs. ANN computed values of K , Problem 3, (E™^, K™' n ), Noiseless Normal, Fuzzy Type 2, T = 54 a Appendix A. SUMMARY OF RESULTS 245 Figure A.71: [3.19]: Actual vs. ANN computed values of K , Problem 3, (E ™,K ), less Normal, Fuzzy Type 2, T = 54 m s max Noise- Appendix A. SUMMARY OF RESULTS 246 m • z© Z•O m "o Z «j z z•0 z •a1 m TT K 7 toT < K o Figure A.72: Error Evolution for Problem lb, 7-Node Net, ANN with 1 Hidden Layer Appendix A. SUMMARY OF RESULTS 247 * • • • • • •• • c X > • a «J X z s X z •a •j m •J 7 TJ 2 M £ T £ • V *- t 7 a» oKJ < + o •3 o o o d i n d o d i o d o i d A O d JOJJ3 i d n d o u d i o o shd Figure A.73: Error Evolution for Problem lb, 7-Node Net, ANN with. 5 Hidden Layers Appendix A. SUMMARY OF RESULTS 248 • • M O > z •j z z m •a T3 7 7 C • ft ft X • K K o < Figure A.74: Error Evolution for Problem 2b, 7-Node Net, ANN with 1 Hidden Layer Appendix A. SUMMARY OF RESULTS 249 * • a J e ; * M E X ) *9 9 X •9 1 1 1 M i I -I « I 1 • X • •XJ 1 5£ *> <n • I K K o< + o Figure A.75: Error Evolution for Problem 2b, 7-Node Net, ANN with 5 Hidden Layers Appendix A. SUMMARY OF RESULTS 250 3 35 9 & IT M M Vta K 3 1 aac aat a •C c •X • Xz X ak •a • •9 1 "o T r•»» 7 7 s ? 1 7 •A 1 9 • m -• JOJJ3 swa Figure A.76: Error Evolution for Problem 2b, 7-Node Net, ANN with 1 Hidden Layer
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Hydraulic network analysis and model calibration with...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Hydraulic network analysis and model calibration with artificial neural nets Fayegh, David A. 1993
pdf
Page Metadata
Item Metadata
Title | Hydraulic network analysis and model calibration with artificial neural nets |
Creator |
Fayegh, David A. |
Date Issued | 1993 |
Description | Computation of steady-state flow rates and the pressure distribution in a hydraulic network of given topology, pipe carrying capacities, and nodal demand discharges plays an essential role in the design, analysis, and operation of water and other fluid distribution systems. When accurate estimates of the hydraulic network model parameters (namely, pipe resistances and demand discharges) are available, the pipe segment flow rates and nodal energy heads can be easily calculated by means of a number of efficient algorithms that iteratively solve the basic non-linear system of hydraulic equations. For existing networks, however, reliable estimates of model parameters are rarely available since the nodal demands are in a continual state of change and the pipe segment carrying capacities gradually change with time. Artificial neural networks1 have recently emerged as general problem-solving tools whose potential has only begun to be exploited in engineering. The objectives of this dissertation are three fold: first, to identify the most appropriate class of ANN architectures and learning algorithms suitable for modeling two key problems associated with the design and operation of water distribution systems, namely, hydraulic network analysis and model calibration; secondly, to develop a relatively general framework for representing arbitrarily complex hydraulic networks by means of multilayered feedforward ANNs; and finally, to experimentally investigate the convergence and performance behavior of a number of feedforward networks trained by error back propagation for hydraulic network analysis and model calibration under typical operating conditions. Specifically, several sets of input/output2 training and testing vector pairs were generated and used to train and test the convergence and performance of a number of feedforward networks used to represent three characteristically different classes of hydraulic network problems. *ANNs henceforth. 2I/O henceforth. Convergence behavior was observed to improve after the appropriate preprocessing of the I/O training vector pairs generated by the hydraulic network model simulator. In part, this process involved application of linear and non-linear fuzzy transformation functions to the various groups of variables and parameters in the input and output vector pairs followed by the algebraic normalization of the input to the feedforward networks. The effects on convergence and performance as a result of variations in a number of experimental parameters such as the choice of transformation functions, changes in the underlying probability distribution from which seed hydraulic network data were drawn, the presence of noise in training and testing data, choice of the activation and output functions and their parameters, the size of training I/O data pairs, and the number of layers used in the feedforward network have also been investigated. Examination of the results indicates that the standard error back propagation algorithm converges relatively slowly for even medium-sized distribution systems although the number of training cycles required to learn an assigned class of tasks remains relatively invariant of the number of variables and parameters of the hydraulic network under investigation. The success of any particular feedforward network was largely dependent on the representation of the hydraulic networks (such as identification of the I/O, number of layers and nodes/layer in the ANN) and the choice of the transformation functions applied to I/O vector pairs. The techniques developed for transformation and normalization of the 1/0 training and testing data in addition to the appropriate ANN representation of the hydraulic network were observed to result in significant improvements in performance both during training, as measured in terms of the cycles required to achieve a desired root mean square error3 computed with both the training and testing data sets. 3RMSE henceforth. |
Extent | 11149818 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-03-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0087845 |
URI | http://hdl.handle.net/2429/6228 |
Degree |
Doctor of Philosophy - PhD |
Program |
Civil Engineering |
Affiliation |
Applied Science, Faculty of Civil Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1996-091902.pdf [ 10.63MB ]
- Metadata
- JSON: 831-1.0087845.json
- JSON-LD: 831-1.0087845-ld.json
- RDF/XML (Pretty): 831-1.0087845-rdf.xml
- RDF/JSON: 831-1.0087845-rdf.json
- Turtle: 831-1.0087845-turtle.txt
- N-Triples: 831-1.0087845-rdf-ntriples.txt
- Original Record: 831-1.0087845-source.json
- Full Text
- 831-1.0087845-fulltext.txt
- Citation
- 831-1.0087845.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0087845/manifest